Pages 360 Page size 198.48 x 324.48 pts Year 2008
DECISION MODELING AND BEHAVIOR IN COMPLEX AND UNCERTAIN ENVIRONMENTS
Springer Optimization and Its Applications VOLUME 21 Managing Editor Panos M. Pardalos (University of Florida) Editor—Combinatorial Optimization Ding-Zhu Du (University of Texas at Dallas) Advisory Board J. Birge (University of Chicago) C.A. Floudas (Princeton University) F. Giannessi (University of Pisa) H.D. Sherali (Virginia Polytechnic and State University) T. Terlaky (McMaster University) Y. Ye (Stanford University)
Aims and Scope Optimization has been expanding in all directions at an astonishing rate during the last few decades. New algorithmic and theoretical techniques have been developed, the diffusion into other disciplines has proceeded at a rapid pace, and our knowledge of all aspects of the field has grown even more profound. At the same time, one of the most striking trends in optimization is the constantly increasing emphasis on the interdisciplinary nature of the field. Optimization has been a basic tool in all areas of applied mathematics, engineering, medicine, economics and other sciences. The series Optimization and Its Applications publishes undergraduate and graduate textbooks, monographs and state-of-the-art expository works that focus on algorithms for solving optimization problems and also study applications involving such problems. Some of the topics covered include nonlinear optimization (convex and nonconvex), network flow problems, stochastic optimization, optimal control, discrete optimization, multiobjective programming, description of software packages, approximation techniques and heuristic approaches.
DECISION MODELING AND BEHAVIOR IN COMPLEX AND UNCERTAIN ENVIRONMENTS Edited by TAMAR KUGLER University of Arizona, Tucson, AZ J. COLE SMITH University of Florida, Gainesville, FL TERRY CONNOLLY University of Arizona, Tucson, AZ YOUNG-JUN SON University of Arizona, Tucson, AZ
Editors Tamar Kugler University of Arizona Management and Organizations Eller College Tucson AZ 85721 [email protected]
Terry Connolly University of Arizona Eller College of Management 1130 E. Helen St. Tucson AZ 85721 430Z McClelland Hall [email protected]
J. Cole Smith University of Florida Dept. Industrial & Systems Engineering 303 Weil Hall P.O. Box 116595 Gainesville FL 32611-6595 [email protected]
Young-Jun Son University of Arizona College of Engineering Dept. Systems & Industrial Engineering 1127 E. James E. Rogers Way P.O. Box 210020 Tucson AZ 85721-0020 [email protected]
Managing Editor: Panos M. Pardalos Department of Industrial and Systems Engineering University of Florida Gainesville, FL 32611 [email protected]
Editor, Combinatorial Optimization: Ding-Zhu Du Department of Computer Science and Engineering University of Texas at Dallas Richardson, TX 75083 [email protected]
ISSN: 1931-6828 ISBN: 978-0-387-77130-4 DOI: 10.1007/978-0-387-77131-1
e-ISBN: 978-0-387-77131-1
Library of Congress Control Number: 2008926556 Mathematics Subject Classification (2000): 91A30, 91B16, 91A35, 62Cxx, 91B06, 90B50, 91A80, 91E10, 91E40, 68T05, 91E45 © 2008 Springer Science+Business Media, LLC All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper springer.com
To our spouses and families, who patiently tolerated the inevitable demands that the preparation of this book made on time that was rightfully theirs.
Contents
Preface . . . ix
Part I Integrating Decision Analysis and Behavioral Models
Improving and Measuring the Effectiveness of Decision Analysis: Linking Decision Analysis and Behavioral Decision Research
Robert T. Clemen . . . 3
Reducing Perceptual and Cognitive Challenges in Making Decisions with Models
Jenna L. Marquard and Stephen M. Robinson . . . 33
Agricultural Decision Making in the Argentine Pampas: Modeling the Interaction between Uncertain and Complex Environments and Heterogeneous and Complex Decision Makers
Guillermo Podestá, Elke U. Weber, Carlos Laciana, Federico Bert, and David Letson . . . 57
Part II Innovations in Behavioral Model Analysis
On Optimal Satisficing: How Simple Policies Can Achieve Excellent Results
J. Neil Bearden and Terry Connolly . . . 79
There Are Many Models of Transitive Preference: A Tutorial Review and Current Perspective
Michel Regenwetter and Clintin P. Davis-Stober . . . 99
Integrating Compensatory and Noncompensatory Decision-Making Strategies in Dynamic Task Environments
Ling Rothrock and Jing Yin . . . 125
Repeated Prisoner’s Dilemma and Battle of Sexes Games: A Simulation Study
Jijun Zhao, Ferenc Szidarovszky, and Miklos N. Szilagyi . . . 143
Part III Innovations in Descriptive Behavior Models
Individual Values and Social Goals in Environmental Decision Making
David H. Krantz, Nicole Peterson, Poonam Arora, Kerry Milch, and Ben Orlove . . . 165
Does More Money Buy You More Happiness?
Manel Baucells and Rakesh K. Sarin . . . 199
Cognitive Biases Affect the Acceptance of Tradeoff Studies
Eric D. Smith, Massimo Piatelli-Palmarini, and A. Terry Bahill . . . 227
Part IV Experimental Studies in Behavioral Research
Valuation of Vague Prospects with Mixed Outcomes
David V. Budescu and Sara Templin . . . 253
Collective Search in Concrete and Abstract Spaces
Robert L. Goldstone, Michael E. Roberts, Winter Mason, and Todd Gureckis . . . 277
Braess Paradox in the Laboratory: Experimental Study of Route Choice in Traffic Networks with Asymmetric Costs
Amnon Rapoport, Tamar Kugler, Subhasish Dugar, and Eyran J. Gisches . . . 309
Non-Euclidean Traveling Salesman Problem
John Saalweachter and Zygmunt Pizlo . . . 339
Index . . . 359
Preface
On February 27 and 28, 2006, the University of Arizona held a workshop entitled “Decision Modeling and Behavior in Uncertain and Complex Environments,” sponsored by the Air Force Office of Scientific Research under a Multidisciplinary University Research Initiative (MURI) grant. The purpose of the workshop was to assemble preeminent researchers studying problems at the interface of behavioral and cognitive science, decision analysis, and operations research. This book is a compilation of 14 chapters based on the presentations given during the workshop. These contributions are grouped into four general areas, which describe in some detail the challenges in conducting novel research in this field. Part One is concerned with the need for integrating decision analysis and behavioral models. Robert T. Clemen discusses how the fields of behavioral research and decision analysis have diverged over time, and makes a compelling case for establishing new links between the disciplines. He recommends leveraging lessons learned from behavioral studies in prescriptive decision analysis, and evaluating how well those prescriptive techniques actually help decision makers achieve their objectives. Jenna L. Marquard and Stephen M. Robinson address eleven common “traps” that face decision model analysts and users. Understanding these traps reveals which modeling features help or hurt the decision-making process. The authors link theory and practice by examining a set of case studies across a diverse array of model scenarios, and provide a checklist of recommendations for analysts confronted by these eleven traps. Guillermo Podestá, Elke U. Weber, Carlos Laciana, Federico Bert, and David Letson present a promising illustration of the use of behavioral analysis in prescriptive modeling. The application under consideration is agricultural production in the Argentine pampas region.
At issue is the allocation of land among different crops in the region, with the farmers’ objectives modeled using expected utility theory and prospect theory. The authors demonstrate that the optimal actions for landowners and tenants can differ significantly according to the objective being modeled and the parameters used for that objective. Part Two of this book examines innovations in behavioral model analysis, particularly mathematical analyses of models. J. Neil Bearden and Terry Connolly consider a decision maker who is restricted to analyzing sequential decision problems via “satisficing,” a type of threshold-based decision making that many humans employ in complex scenarios. The chapter shows that decision makers who satisfice in the best possible manner can often perform well, in the sense that their decisions are near-optimal. This result contradicts the conventional wisdom that satisficing decision makers are inherently and substantially suboptimal. Decision makers are also known to exhibit nonintuitive behavior regarding the transitivity of preference among options. Michel Regenwetter and Clintin P. Davis-Stober examine mathematical models that better explain the transitivity (and intransitivity) of preferences. The authors pay particular attention to resolving the difficulties encountered when testing logical transitivity axioms against experimental choice data. Ling Rothrock and Jing Yin summarize an analysis of Brunswik’s lens model for compensatory and noncompensatory decision-making behaviors. The authors discuss a traditional lens model for compensatory behaviors and a rule-based variation for noncompensatory decision making. The classical prisoner’s dilemma and battle of sexes games receive a new analysis from a combined game-theoretic and simulation perspective. Jijun Zhao, Ferenc Szidarovszky, and Miklos N. Szilagyi examine the influence of multiple agents acting according to various profiles in these games, taking media influences into account. Their agent-based simulation approach yields insights into the impact of individual agent personality on overall societal behavior in these games.
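To make the idea of “optimal satisficing” concrete, the Python sketch below compares a single aspiration-level rule with the optimal policy in a toy sequential search. The setup is an assumption for illustration only and is not the model used by Bearden and Connolly: n options arrive one at a time, their values are independent Uniform(0, 1) draws, each option must be accepted or rejected on arrival, and the last option must be taken if reached. The optimal policy comes from backward induction; the satisficing rule accepts the first option whose value reaches a fixed threshold t, and the best t is found by grid search. All function names are hypothetical.

```python
"""Satisficing vs. optimal stopping in a toy sequential search (illustrative assumptions)."""


def optimal_value(n):
    """Expected value of the optimal policy with n Uniform(0, 1) options.

    With one option left we must take it (expected value 1/2). With m + 1
    left, we accept the current draw X only if it beats the continuation
    value V_m, so V_{m+1} = E[max(X, V_m)] = (1 + V_m**2) / 2.
    """
    v = 0.5
    for _ in range(n - 1):
        v = (1 + v * v) / 2
    return v


def satisficing_value(n, t):
    """Expected value of 'accept the first option >= t, else take the last one'."""
    p_skip = t ** (n - 1)        # all of the first n - 1 options fall below t
    accepted_mean = (1 + t) / 2  # E[X | X >= t] for Uniform(0, 1)
    return (1 - p_skip) * accepted_mean + p_skip * 0.5


def best_threshold(n, grid=1000):
    """Grid-search the single aspiration level t that maximizes expected value."""
    candidates = (i / grid for i in range(grid + 1))
    return max(candidates, key=lambda t: satisficing_value(n, t))


if __name__ == "__main__":
    n = 10
    t_star = best_threshold(n)
    opt = optimal_value(n)
    sat = satisficing_value(n, t_star)
    print(f"best threshold t = {t_star:.3f}")
    print(f"optimal policy   = {opt:.4f}")
    print(f"satisficing rule = {sat:.4f} ({100 * sat / opt:.1f}% of optimal)")
```

Under these assumptions, with ten options the best single threshold recovers roughly 98 to 99 percent of the optimal expected value, even though the optimal policy uses a different threshold at every step.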
Part Three of this volume presents innovations in descriptive behavior models. David H. Krantz, Nicole Peterson, Poonam Arora, Kerry Milch, and Ben Orlove observe that contemporary models that do not account for “social goals” cannot accurately model certain observed behavioral phenomena. The authors establish the importance of social goals as an element of rational decision making, and they identify environmental decision-making problems as a key application in which social goals are a significant decision component, an assertion further supported by the Podestá et al. study presented in Part One. Manel Baucells and Rakesh K. Sarin examine the behavioral characteristics that explain why, in practice, more money does not reliably buy more happiness. The authors assert that “projection bias,” the tendency of people to project their current reference levels into the future, causes people to overestimate the value of additional money and misallocate their budgets. Eric D. Smith, Massimo Piatelli-Palmarini, and A. Terry Bahill likewise identify biases as a primary reason why people tend to reject tradeoff studies. The authors demonstrate that tradeoff studies are susceptible to human cognitive biases and provide examples of biases that produce variability in decisions made via tradeoff studies. As with the Marquard and Robinson chapter in Part One, knowledge of these biases can help decision makers improve their understanding of, and confidence in, tradeoff studies. Part Four of the text focuses on experimental studies in behavioral research. David V. Budescu and Sara Templin examine how people value vague prospects with mixed outcomes, that is, prospects with vague probabilities and a risky outcome that could be either a gain or a loss (as opposed to traditional studies that involve only varying levels of losses). Prior research in this area yielded a generalization of prospect theory (PT) for options with vaguely specified attributes, and the current chapter reinforces those findings regarding subjects’ tendencies to seek vagueness in the domain of gains but avoid vagueness in the domain of losses. Robert L. Goldstone, Michael E. Roberts, Winter Mason, and Todd Gureckis present a study of collective search behavior from a complex systems perspective. Their chapter describes interactive experiments in which subjects perform real-time collective search in order to achieve a certain goal. The authors report two experiments: one a physical resource search problem, and the other an abstract search problem concerning the dissemination of innovations in social networks. The chapter then explores the group behavior observed in these experiments in order to provide a foundation for future analysis of collective search tasks. Amnon Rapoport, Tamar Kugler, Subhasish Dugar, and Eyran J. Gisches examine group behavior arising in traffic networks. It is well known in the routing literature that selfish routing (each agent choosing its own best path) can result in longer average travel times than centralized routing.
The Braess paradox shows how striking this suboptimality can be: adding a zero-delay link to a network can, through the agents’ selfish route choices, actually increase average travel times. This chapter demonstrates through laboratory experiments that the Braess paradox is a real phenomenon, even on relatively simple networks. Finally, as the last chapters of this book move from simple decisions to complex ones, John Saalweachter and Zygmunt Pizlo examine human decision-making models for solving the traveling salesman problem, which is NP-hard and thus among the most difficult combinatorial optimization problems. Although the traveling salesman problem is usually studied on the two-dimensional Euclidean plane, it becomes even more difficult when obstacles that block some straight-line paths are added. The authors show that in their experiments, subjects continued to perform well in solving the traveling salesman problem in the presence of simple obstacles. However, when complex obstacles were present, not only did the subjects’ performance decline, but the complexity of their mental mechanisms increased. The compilation of these chapters provides a basis for understanding several state-of-the-art research avenues in the field of descriptive and prescriptive decision making. Our hope is that the book inspires further interdisciplinary research in these areas, and illustrates the benefit of studies that tie together concepts from the diverse fields represented by the contributions of this book.
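For readers unfamiliar with the Braess paradox mentioned in the summary of the Rapoport et al. chapter, the Python sketch below checks it on the standard textbook four-node network, not on the networks used in their experiments. The numbers are assumptions for illustration: 4,000 drivers travel from S to E via A or B; links S-A and B-E take f/100 minutes when f drivers use them, links A-E and S-B take a fixed 45 minutes, and the added link A-B takes zero time. The helper names are hypothetical. The code tests whether any single driver could shorten their own trip by switching routes.

```python
"""Braess paradox on the textbook four-node network (illustrative numbers)."""
from collections import Counter

N_DRIVERS = 4000

# Travel time (minutes) on each link as a function of how many drivers use it.
LATENCY = {
    "S-A": lambda f: f / 100,  # congestible link
    "A-E": lambda f: 45.0,     # fixed-time link
    "S-B": lambda f: 45.0,     # fixed-time link
    "B-E": lambda f: f / 100,  # congestible link
    "A-B": lambda f: 0.0,      # the added zero-delay link
}

BASE_ROUTES = {"upper": ["S-A", "A-E"], "lower": ["S-B", "B-E"]}
BRAESS_ROUTES = {**BASE_ROUTES, "zigzag": ["S-A", "A-B", "B-E"]}


def route_costs(routes, counts):
    """Travel time on every route, given the number of drivers on each route."""
    link_flow = Counter()
    for name, links in routes.items():
        for link in links:
            link_flow[link] += counts.get(name, 0)
    return {name: sum(LATENCY[link](link_flow[link]) for link in links)
            for name, links in routes.items()}


def is_equilibrium(routes, counts, tol=1e-9):
    """True if no single driver can strictly shorten their own trip by switching."""
    current = route_costs(routes, counts)
    for src in routes:
        if counts.get(src, 0) == 0:
            continue
        for dst in routes:
            if dst == src:
                continue
            moved = dict(counts)
            moved[src] -= 1
            moved[dst] = moved.get(dst, 0) + 1
            if route_costs(routes, moved)[dst] < current[src] - tol:
                return False
    return True


def average_time(routes, counts):
    """Mean travel time over all drivers."""
    costs = route_costs(routes, counts)
    return sum(counts.get(r, 0) * costs[r] for r in routes) / N_DRIVERS


even_split = {"upper": 2000, "lower": 2000}
all_zigzag = {"zigzag": 4000}

print(is_equilibrium(BASE_ROUTES, even_split), average_time(BASE_ROUTES, even_split))      # True 65.0
print(is_equilibrium(BRAESS_ROUTES, even_split))                                           # False
print(is_equilibrium(BRAESS_ROUTES, all_zigzag), average_time(BRAESS_ROUTES, all_zigzag))  # True 80.0
```

In this toy network the free link turns a stable 65-minute even split into a stable outcome in which every driver takes the zigzag route and spends 80 minutes, because each driver’s individually best choice ignores the congestion it imposes on others.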
Tucson, AZ and Gainesville, FL January 2008
Tamar Kugler J. Cole Smith Terry Connolly Young-Jun Son
Improving and Measuring the Effectiveness of Decision Analysis: Linking Decision Analysis and Behavioral Decision Research
Robert T. Clemen
Fuqua School of Business, Duke University, Durham, NC 27708-1208, USA
[email protected]
Summary. Although behavioral research and decision analysis began with a close connection, that connection appears to have diminished over time. This chapter discusses how to re-establish the connection between the disciplines in two distinct ways. First, theoretical and empirical results in behavioral research in many cases provide a basis for crafting improved prescriptive decision analysis methods. Several productive applications of behavioral results to decision analysis are reviewed, and suggestions are made for additional areas in which behavioral results can be brought to bear on decision analysis methods in precise ways. Pursuing behaviorally based improvements in prescriptive techniques will go a long way toward re-establishing the link between the two fields. The second way to reconnect behavioral research and decision analysis involves the development of new empirical methods for evaluating the effectiveness of prescriptive techniques. New techniques, including behaviorally based ones such as those proposed above, will undoubtedly be subjected to validation studies as part of the development process. However, validation studies typically focus on specific aspects of the decision-making process and do not answer a more fundamental question: are the proposed methods effective in helping people achieve their objectives? More generally, if we use decision analysis techniques, will we do a better job of getting what we want over the long run than we would if we used some other decision-making method? In order to answer these questions, we must develop methods that allow us to measure the effectiveness of decision-making methods. In our framework, we identify two types of effectiveness.
We begin with the idea that individuals typically make choices based on their own preferences and often before all uncertainties are resolved. A decision-making method is said to be weakly effective if it leads to choices that can be shown to be preferred (in a way that we make precise) before consequences are experienced. In contrast, once the decision maker actually experiences the consequences, the question is whether decision analysis helps individuals do a better job of achieving their objectives in the long run. A decision-making method that does so is called strongly effective. We propose some methods for measuring effectiveness, discuss potential research paradigms, and suggest possible research projects. The chapter concludes with a discussion of the beneficial interplay between research on specific prescriptive methods and effectiveness studies.
1 Introduction
Decision analysts are quick to point out the distinction between decision process and decision outcome, and that even the best decision process can be derailed by an unlucky outcome (e.g., [8,11,57]). So why should we use decision analysis (DA) techniques? Typical answers include “gaining insight” and “being coherent,” but the best reason to use DA would be that doing so makes us more likely to get what we want. Are there any results indicating that this is the case? If an individual or group uses DA, will they be more likely to get what they want? In the long run, are users of DA healthier, wealthier, safer, or in general more satisfied with the consequences of their decisions? Certainly DA can show an individual how to be coherent in making inferences and choices. That is, adherence to DA principles can ensure that your decisions will not be self-contradictory and that the inferences you make will be consistent with the laws of probability. Thus, DA tells us what one should do on the basis of logical argument. However, behavioral decision research (BDR) has shown that people do not always make coherent decisions and internally consistent inferences (e.g., [28,36,38]). DA and BDR began with rather close ties, with BDR topics arising largely from questions about the appropriateness of subjective expected utility as a descriptive model of human behavior. As such, much of the early BDR literature looked at empirical questions about how well people could judge probabilities and about inconsistencies in preference judgments and decisions. In the late 1960s and early 1970s, it was not uncommon to see experiments that evaluated DA methods for assessing subjective probabilities or utilities. However, in reviewing the two literatures over the past 25 years, one comes away with the sense that BDR and DA have taken somewhat different paths.
BDR has increasingly focused on psychological processes, with less emphasis on helping to improve DA’s prescriptive techniques. On the other hand, although DA learned many important lessons from the early BDR work, much of current practice ignores recent developments in BDR and still relies on methods developed in the 1970s and early 1980s. For example, probability assessment in practice still follows principles laid out by Spetzler and Staël Von Holstein [67] (e.g., see [39,49,50]). In very general terms, DA helps decision makers address their decisions in careful and deliberate ways. Thus, dual-process theories from psychology, especially those of Sloman [63] and Kahneman [34], can provide a useful framework. Many of the behavioral phenomena studied in BDR can be explained as resulting from an intuitive or nonconscious process, which Kahneman [34] calls System I. DA can be characterized as avoiding System I’s distortions and biases by careful and conscious deliberation (System II). Put simply, DA works to get decision makers past System I and into System II. This chapter has two goals. To begin, we take the position that current best practices in DA are not always successful in avoiding cognitive biases that arise from System I, and that a reasonable strategy is to find specific ways to counteract such biases. Thus, our first goal is to show how researchers can use recently developed psychological models from BDR to develop improved prescriptive methods for DA. Our proposal goes beyond general statements about how awareness of behavioral biases can help decision makers avoid pitfalls. Instead, we appeal to researchers to take BDR results and models directly into the DA domain and to develop precise prescriptive methods that, according to the proposed theory, should improve judgment and/or decision making in a specific and systematic way. In Section 2 we briefly review research on probability and preference assessment, highlighting several recent examples that have used BDR results as a basis for new DA methods. Suppose that we are indeed able to develop new prescriptive methods. How will we know if they work? Although it may be possible to demonstrate in the lab that a method reduces a particular behavioral effect or bias, the question remains whether the method, applied in real decision settings, will genuinely be of value to decision makers. Put another way, the question is whether specific, BDR-based prescriptive methods will be more effective than current DA practice or unaided intuitive judgment in getting a decision maker what he or she wants. Thus, our second goal in the chapter is to describe research paradigms that could be used to measure the effectiveness of DA methods. Studies of effectiveness may complement the development of prescriptive DA methods by highlighting important behavioral questions about why various methods perform as they do, and behavioral research can in turn suggest specific ways to improve decision analysis. In evaluating DA and other decision-making techniques, the basic research questions are, “Is XYZ decision-making technique effective in getting people what they want? How does XYZ compare with ABC in terms of effectiveness?” Answering such questions is somewhat problematic.
One cannot, for example, compare two different alternatives that an individual might have chosen in a particular decision situation; one person can follow only one path in the decision tree, and it is impossible to compare what actually happens with what might have happened under different circumstances. Measuring the effectiveness of decision-making techniques is a research problem fraught with challenges. For some of these challenges, satisfactory approaches are easily identified, but responding to others will require creative new methods. Nevertheless, comparative studies can be performed using readily available approaches, and we describe some ways researchers can carry out such studies. The remainder of this chapter is organized as follows. The next section delves into the relationship between BDR and DA, with particular attention to existing examples of a productive relationship and suggestions for other areas that could yield important improvements in DA methodology. Our review is selective and meant to highlight particular streams of research. In the following two sections, we turn to the chapter’s second goal of measuring the effectiveness of DA methods; we define the concepts of strong and weak effectiveness and describe in some detail a variety of possible studies and a general research agenda. The final section concludes with a discussion of the potential benefits that BDR and DA can gain from the development of specific prescriptive methods and studies of effectiveness.
2 Using BDR to Improve Prescriptive DA Methods
DA methodology derives from the subjective expected utility paradigm, with a strong focus on subjective judgment of probabilities and assessment of personal preferences. Much of BDR has likewise focused on these issues. There are, of course, many aspects of decision making that fall outside the scope of subjective expected utility, such as generating alternatives, identifying objectives to consider, or identifying and modeling relevant risks [24]. DA has developed methods for dealing with these and other aspects of decision making, and BDR has studied some of them. For our purposes in this chapter, however, we focus on subjective assessment of probabilities and on modeling and assessment associated with value and utility functions.
2.1 Probability Assessment
Early Work
Early work on subjective probability judgments by Kahneman and Tversky and others (see [36]) emphasized how heuristic judgment processes can lead to biases. This work was important for the development of standard DA procedures for eliciting subjective probabilities [67]. For example, the anchor-and-adjust heuristic has played a particularly important role in DA, because insufficient adjustment away from an initial estimate produces overly narrow ranges, and overconfidence is one of the most persistent biases that decision analysts face. For the assessment of probability distributions for continuous variables, Spetzler and Staël Von Holstein advocated pushing experts to reason about scenarios that could lead to extreme events and to adjust their probabilities after such reasoning. Much of the early behavioral work on probability judgments focused on calibration of subjective probabilities (see [41]), demonstrating the extent to which subjective probability judgments did not match the objective frequency of occurrence of corresponding events. Some efforts were made to find ways to improve calibration.
For example, Staël Von Holstein [68,69] and Schaefer and Borcherding [60] reported that short and simple training procedures could improve the calibration of assessed probabilities, although their results did not show an overwhelming improvement in performance. Fischhoff [17] discussed debiasing techniques intended to improve the quality of subjective probability assessments. More recently, specific lines of inquiry have yielded results that are important for DA; we mention two here. The first comes from the work of Gigerenzer and colleagues [26,27]: asking the expert questions that are “ecologically consistent” with those typically encountered in his or her domain of expertise, and framing assessment questions in terms of relative frequencies. Both can substantially improve calibration, but neither is a panacea. By their very nature, risk assessments often require experts to go beyond their day-to-day experience (e.g., “What is the probability of a failure of a newly redesigned system in a nuclear reactor?”). Also, not all assessment tasks can be readily reframed in frequency terms. Consider the nuclear reactor question. Given an entirely new system design, how would the analyst describe an equivalence class for which the expert could make a relative-frequency judgment? The second is the literature on decomposition of probability judgments: breaking an assessment down into smaller and presumably more manageable judgment tasks, making these simpler judgments, and then recombining them using the laws of probability to obtain the desired overall probability. For example, Hora et al. [31] showed that decomposition can improve assessment performance, and Clemen et al. [10] found similar results in the context of aggregating expert judgments. Morgan and Henrion [50] reviewed the empirical support for decomposition in probability judgment.
Recent Directions: Underlying Processes
More recent BDR work on probability judgment has turned to understanding the processes underlying observed biases. One example is the notion of dual processing systems. Sloman [63], for instance, makes the case for associative and rule-based reasoning systems and their impact on judgments of uncertainty. Kahneman and Frederick [35] and Kahneman [34] propose “System I,” the quick, intuitive processing system, and “System II,” the deliberative reasoning system.
They argue that when an individual makes a probabilistic judgment, the process begins in System I, which is subject to a variety of nonconscious effects, one of which is the substitution of features of an object for the characteristic being judged. For example, when asked for a probability judgment, an individual may use availability, representativeness, or affect as substitutes for genuine (and more deliberative) judgments of likelihood. Another example that emphasizes psychological process is the affect heuristic (e.g., [65]), whereby an individual makes judgments based on his or her affective response to the stimulus. The affect heuristic applies to both probability and preference judgments; Slovic et al. discuss the underpinnings of this heuristic, including how it ties into the fundamental workings of memory and emotion. A third example is support theory [58,72], which provides a theoretical framework for understanding the psychological process by which an individual generates probability statements. An important feature of support theory is the support function, a modeling construct that represents how an individual summarizes the recruited evidence in favor of a hypothesis (a particular description of a possible event). The notation $s(A)$ represents the support for hypothesis A; that is, it represents the individual’s evaluation of the strength of evidence in favor of A. According to support theory, $s(A)$ is not description invariant; different descriptions of the same event do not necessarily carry the same support. For example, suppose A is “precipitation tomorrow,” which can be decomposed into “rain or frozen precipitation tomorrow.” Although the two descriptions are meant to designate the same event, support theory contends that s(precipitation tomorrow) may not be equal to s(rain or frozen precipitation tomorrow). In fact, support theory typically assumes that describing an event A explicitly as a union of disjoint events ($A_1$ or $A_2$) increases support, and that the sum of the supports for the disjoint events is typically larger than the support of their explicit union. In symbols, $s(A) \le s(A_1 \text{ or } A_2) \le s(A_1) + s(A_2)$. In turn, differences in support due to different descriptions can lead to different stated probabilities for the same event A, depending on how A is described. Fox and Tversky [23] argue that this process is separate from, and prior to, the decision-making stage in which an individual must make a decision under uncertainty. More recently, Fox and his colleagues [21,22,62] have shown that judgments of probabilities are subject to a bias they call partition dependence. For example, when experimental participants were asked to assess the probability that the NASDAQ stock index would close in particular intervals, the assessed probabilities depended strongly on which intervals were specified. Fox and colleagues argue that partition dependence stems from a heuristic in which the individual begins with a default prior probability distribution that assigns equal likelihood to each element in the state space; they dub this default distribution the ignorance prior.
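The effect of anchoring on the ignorance prior can be sketched numerically. The Python below assumes, purely for illustration, a linear anchoring model in which a stated probability is a weighted average of the ignorance prior (k/n for an event that covers k of the n presented intervals) and the judge’s evidence-based belief. The linear form, the anchoring weight of 0.4, the belief of 0.30, and the function name are all hypothetical and are not taken from the work of Fox and colleagues.

```python
"""Partition dependence under a hypothetical linear anchoring model."""


def stated_probability(belief, k, n, anchor_weight):
    """Probability stated for an event covering k of n presented intervals.

    belief:        the judge's evidence-based probability for the event
    k / n:         the ignorance prior, i.e. equal weight on every interval
    anchor_weight: fraction of the judgment left anchored on k / n
                   (0 = full adjustment, 1 = pure ignorance prior)
    """
    ignorance_prior = k / n
    return anchor_weight * ignorance_prior + (1 - anchor_weight) * belief


# One event, "index closes above X", with underlying belief 0.30,
# elicited under three different partitions of the possible closing values.
belief, w = 0.30, 0.4
print(stated_probability(belief, k=1, n=2, anchor_weight=w))  # above X vs. below X
print(stated_probability(belief, k=1, n=4, anchor_weight=w))  # "below X" split into 3 intervals
print(stated_probability(belief, k=3, n=4, anchor_weight=w))  # "above X" split into 3 intervals
```

Under this assumed model, the same belief of 0.30 yields stated probabilities of about 0.38, 0.28, and 0.48 depending only on how the intervals are drawn, which is the signature of partition dependence; within any single partition the stated probabilities still sum to one.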
Because individuals tend not to adjust the ignorance prior sufficiently to account for their information, judged probabilities can depend strongly on how the state space is partitioned in the first place. In their conclusion, Fox and Clemen [21] connect the idea of partition dependence and the ignorance prior to standard DA practices in probability assessment. They argue that probability assessment occurs in three separate stages, and that different biases are likely to operate in different stages. In particular, they argue that standard DA practice is well suited to reducing biases that occur in the first and second stages (interpretation of categories and assessment of support). For example, working with experts to define and elaborate the interpretation of each event to be judged can reduce effects due to ambiguity, and in the assessment of support the analyst can encourage an expert to fully articulate her reasoning to reduce effects such as availability or representativeness. However, biases in the third stage, mapping of support into stated probabilities, include the tendency to anchor on the ignorance prior. Fox and Clemen argue that this bias may resist correction because it is not particularly amenable to conscious reflection. They suggest a number of ways in which the analyst can work with the expert to minimize partition
Effectiveness of Decision Analysis
dependence, including the use of multiple partitions in the assessment process. Fox and Clemen [21] make an explicit connection between their behavioral results and DA practice; Clemen and Ulu [13] take this connection one step further. Building on the idea of partition dependence, they construct a model of the probability judgment process that is consistent with a number of known properties of subjective probabilities, including partition dependence as well as binary complementarity and subadditivity (e.g., [72]). In addition, they show that their model is consistent with interior additivity, a property observed by Wu and Gonzalez [78]. In one of their experiments, they observed that they could calculate the revealed or “indirect” probability of event A as $P'(A) = P(A \cup B) - P(B)$ for a variety of specifications of the auxiliary event B, and their various calculations of $P'(A)$ tended to be highly consistent. Furthermore, an individual’s direct assessment of $P(A)$ tended to differ substantially from the indirect probabilities $P'(A)$. Clemen and Ulu [13] present empirical evidence in support of their model. More importantly for our purposes, they show that indirect probabilities, after being normalized to sum to one across the state space, are not biased by the ignorance prior (according to their model) and hence should not display partition dependence. Their empirical results, although preliminary, support this contention. Thus, Clemen and Ulu suggest that decision analysts can use normalized indirect probabilities as a way to counteract partition dependence. Although early empirical BDR work was able to provide good general guidance to decision analysts, Clemen and Ulu’s work shows that it is possible to build on psychological theory to develop a precise prescriptive procedure for DA. Whether the use of normalized indirect probabilities genuinely improves on standard practice will no doubt be the subject of future empirical studies. 
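The mechanics of indirect probabilities can be illustrated with a small Python sketch. It rests on a deliberately simplified judgment model that is our own assumption, not necessarily Clemen and Ulu's specification: every stated probability is a linear mix, with weight `lam`, of an ignorance-prior anchor and the assessor's "true" probability. Direct judgments use the anchor 1/n for an event presented as one cell of an n-cell partition, so the same event receives different stated probabilities under different partitions. For indirect judgments we assume (hypothetically) that A ∪ B and B are both judged against the same fixed anchor, so the anchor cancels in P'(A) = P(A ∪ B) − P(B), and normalizing across the state space recovers the true distribution. All names and numbers are illustrative.

```python
from typing import List, Set


def stated(p_true: float, anchor: float, lam: float) -> float:
    """Hypothetical linear judgment model: mix of an anchor and the true belief."""
    return lam * anchor + (1.0 - lam) * p_true


def direct(p: List[float], event: Set[int], n_cells: int, lam: float) -> float:
    """Direct judgment of an event shown as one cell of an n-cell partition.

    The ignorance-prior anchor is 1/n_cells (equal weight per cell).
    """
    return stated(sum(p[j] for j in event), 1.0 / n_cells, lam)


def indirect(p: List[float], i: int, aux: Set[int], lam: float,
             anchor: float = 0.5) -> float:
    """P'(A_i) = P(A_i or B) - P(B) for an auxiliary event B disjoint from A_i.

    Assumption (hypothetical): both judgments share the same anchor.
    """
    p_b = sum(p[j] for j in aux)
    return stated(p[i] + p_b, anchor, lam) - stated(p_b, anchor, lam)


def normalized_indirect(p: List[float], lam: float) -> List[float]:
    """Indirect probability of every state, rescaled to sum to one."""
    n = len(p)
    raw = [indirect(p, i, {(i + 1) % n}, lam) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


true_p = [0.1, 0.2, 0.3, 0.4]  # hypothetical "true" beliefs over four states
lam = 0.4                      # hypothetical weight on the ignorance prior

# Partition dependence: state 3 judged within a 4-cell vs. a 2-cell partition.
print(round(direct(true_p, {3}, 4, lam), 2))  # 0.34
print(round(direct(true_p, {3}, 2, lam), 2))  # 0.44
print([round(x, 3) for x in normalized_indirect(true_p, lam)])  # [0.1, 0.2, 0.3, 0.4]
```

Under this assumed model P'(A) equals (1 − lam) times the true probability for every choice of B, which mirrors interior additivity; if compound events were judged against a different anchor than B, the cancellation would no longer be exact.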
2.2 Understanding and Assessing Preferences
Assessing Utility Functions for Risky Choices
Much of the work on preferences under risk has focused on the extent to which expressed preferences for lotteries are internally consistent, as exemplified by the Allais paradox [1,2] or Tversky and Kahneman’s [70] work on framing. An especially relevant early example is the phenomenon of preference reversal described by Lichtenstein and Slovic [42]. Stated preferences may reverse depending on response mode (choosing between two risky alternatives versus specifying a value, typically a probability, that makes the two alternatives equally preferable). The result is robust, having been demonstrated in many different domains and forms. It was important for DA practice then and remains so, because it shows that preference elicitation methods that are equivalent under subjective expected utility do not necessarily yield consistent responses. Ordóñez et al. [53] reviewed the preference
reversal literature and also studied whether preference reversals can be reduced by “debiasing” [17]. Having subjects perform the two assessment tasks simultaneously yielded little improvement; providing financial incentives for consistency, however, did reduce the reversal rate. Moreover, their results are consistent with the notion that the simultaneous judgment tasks, in the presence of adequate financial incentives, can lead to a merging of the preference patterns displayed in the different tasks. Hershey et al. [30] discussed biases induced by different preference-elicitation approaches in spite of their formal equivalence. One such bias is the certainty effect [70], whereby individuals tend to overweight certain or nearly certain outcomes. Understanding the certainty effect is important for DA, because standard methods for assessing utility functions under risk use reference lotteries that compare a risky lottery to a sure outcome. In order to account for the certainty effect, McCord and de Neufville [48] propose an assessment method in which the individual compares two lotteries. Wakker and Deneffe [76] took this idea one step further with their tradeoff method, in which the individual compares two 2-outcome lotteries at a time, each involving the same probabilities of winning and losing. The individual’s task is to specify one of the four outcomes ($x_i$) so that she is indifferent between the two lotteries. The assessed value is then used to construct the next pair of lotteries, whose assessed value ($x_{i+1}$) in turn leads to the next pair, and so on. Given the way in which the lotteries are constructed, it is possible to show that the assessed values $x_1, \ldots, x_n$ are equally spaced in terms of their utility values. Moreover, the tradeoff method works for assessing utility functions under uncertainty or for assessing value functions in nonexpected utility models. 
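The chaining logic of the tradeoff method can be simulated in a few lines of Python. The sketch assumes, purely for illustration, a respondent who maximizes expected utility with u(x) = √x, a 50-50 probability p, and two fixed gauge outcomes big = 4 > small = 1; bisection stands in for the respondent's indifference judgment. Each question asks for the x that makes (x, p; small) as good as (x_i, p; big), and the answer seeds the next question. The function and parameter names are hypothetical labels, not Wakker and Deneffe's notation.

```python
import math
from typing import Callable, List

Utility = Callable[[float], float]


def eu(x: float, y: float, p: float, u: Utility) -> float:
    """Expected utility of the lottery: x with probability p, otherwise y."""
    return p * u(x) + (1.0 - p) * u(y)


def elicit_next(x_i: float, p: float, big: float, small: float, u: Utility,
                upper: float = 1e6) -> float:
    """Simulated answer: the x with (x, p; small) ~ (x_i, p; big), via bisection."""
    target = eu(x_i, big, p, u)
    lo, hi = x_i, upper
    for _ in range(200):  # fixed iteration count; converges to float precision
        mid = (lo + hi) / 2.0
        if eu(mid, small, p, u) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def tradeoff_sequence(x0: float, n: int, p: float, big: float, small: float,
                      u: Utility) -> List[float]:
    """Chain of assessments x0, x1, ..., xn; each answer seeds the next question."""
    xs = [x0]
    for _ in range(n):
        xs.append(elicit_next(xs[-1], p, big, small, u))
    return xs


# Hypothetical respondent with u(x) = sqrt(x); the analyst never observes u.
xs = tradeoff_sequence(x0=9.0, n=3, p=0.5, big=4.0, small=1.0, u=math.sqrt)
print([round(x, 6) for x in xs])  # [9.0, 16.0, 25.0, 36.0]
print([round(math.sqrt(b) - math.sqrt(a), 6) for a, b in zip(xs, xs[1:])])  # [1.0, 1.0, 1.0]
```

Under expected utility each step adds the same utility increment, ((1 − p)/p)(u(big) − u(small)), which equals 1 here; the analyst obtains equally spaced utility levels without ever specifying u.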
Wakker and Deneffe’s [76] tradeoff method is a good example of a specific DA method that was developed to account for a particular behavioral phenomenon. Another example comes from Bleichrodt et al. [6], who, like Clemen and Ulu [13], develop a prescriptive method on the basis of a specific behavioral model. Bleichrodt et al. argue that utility assessments are systematically biased by loss aversion and probability weighting as specified by prospect theory [37,71], regardless of the particular assessment method used. They further show how to use the prospect theory model to remove the bias. Doing so, of course, requires estimating the model parameters. Although ideally one would estimate parameters for the particular individual making the assessment, the authors find that even using the aggregate estimates from Tversky and Kahneman [71] can improve consistency across different utility assessment methods.
Assessing Multiattribute Preferences
Although many different biases have been identified in the DR literature on multiattribute assessment, we focus here on two key issues: the scale compatibility and attribute splitting effects. Slovic et al. [66] define the scale compatibility effect as a bias that occurs when an attribute’s weight is enhanced because the scale on which that
attribute is measured is compatible with (or easily commensurable with) the scale of the response mode. For example, suppose a decision maker must judge weights for several attributes, some of which are naturally represented by dollars (e.g., profits, cost, or taxes) and others that are not (e.g., lives lost or environmental damage). A typical assessment method requires the decision maker to “price out” the various alternatives, or to identify an amount of each attribute that is consistent with a particular dollar value. Such an approach is relatively common, for example, in contingent valuation methods. However, due to the scale compatibility effect, those attributes that are already in dollar terms or can be easily converted to dollars will tend to be overweighted, whereas those that are not readily represented in dollar terms will tend to be underweighted. Tversky et al. [74] proposed scale compatibility as a key explanation of preference reversals. If the scale compatibility effect stems from choosing a particular attribute to be the numeraire in judging weights, then a reasonable approach is to avoid singling out any one attribute as the numeraire. This is the approach taken by Delquié [14], who proposes bidimensional matching as a prescriptive assessment method. Instead of changing only one attribute in order to identify a preferentially equivalent option, Delquié suggests changing two attributes at once. The two attributes are varied in a systematic way until indifference is found. Delquié’s experiments show that this approach does reduce the scale compatibility effect. Anderson and Hobbs [3] take a different approach. They develop a model in which the scale compatibility effect is represented by a bias parameter. Using Bayesian statistical methods, they show that one can use a set of tradeoff assessments to derive a posterior distribution for the bias parameter and for the individual’s weights. 
Thus, Anderson and Hobbs’s approach is similar to that of Clemen and Ulu [13] and Bleichrodt et al. [6] in processing an individual’s judgments ex post in order to adjust for anticipated biases. Another behavioral issue in assessing multiattribute weights is the attribute splitting effect [77]. The attribute splitting effect has to do with how attributes are structured in a hierarchy. For example, consider the two hierarchies shown in Figure 1. The task would be to assess global weights $w_A$, $w_B$, and $w_C$ for attributes A, B, and C, respectively. In the left-hand panel, the decision maker would judge these three weights directly. In the right-hand panel, the assessment is broken down. The decision maker judges the local weights $v_A$ and $v_{A'}$ for A and A′ separately from the local weights $v_B$ and $v_C$ for B and C. Combining the local weights to obtain the global weights, we have $w_A^* = v_A$, $w_B^* = v_{A'} v_B$, and $w_C^* = v_{A'} v_C$. The problem arises from the fact that $w_A^*$ obtained using the two-level hierarchy tends to be greater than $w_A$ obtained from the one-level hierarchy. The attribute splitting effect is especially problematic for decision analysts, because any value hierarchy with more than two attributes can be represented in multiple ways (and hence may lead to different weights), and there is no canonical representation. The attribute splitting effect is similar to the ignorance prior effect discussed above [21,22,62]. A reasonable hypothesis might be that individuals
Fig. 1. Two value trees for assessing weights for attributes A, B, and C.
begin with equal weights for attributes at the same level in the hierarchy, but then adjust insufficiently. Starting from this premise, Jacobi and Hobbs [33] offer four possible models to account for attribute splitting. They test their four models in the context of an electrical utility firm evaluating environmental and economic attributes of alternative plans to expand generation capacity. The model that performs best by a variety of measures is similar to Clemen and Ulu’s [13] linear model of probability judgment: a weighted linear combination of a default weight (equal for all attributes at the same level of the hierarchy) and the decision maker’s “true” weight. Thus, like Clemen and Ulu [13], Bleichrodt et al. [6], and Anderson and Hobbs [3], Jacobi and Hobbs [33] take the approach of adjusting the decision maker’s judgments ex post to account for the attribute splitting bias.
Constructed Preferences, Emotions, and DA
Although ex post adjustment of assessed weights makes some sense as a way to correct for the attribute splitting effect, it is not clear that this is the most appropriate way to frame the problem. An alternative is to view the effect as arising from the individual’s process of constructing a response to the assessment question or, more fundamentally, from constructing preferences themselves in a situation where the issues, attributes, and tradeoffs are unfamiliar and poorly understood. In such a situation, preferences may not be readily articulated, and articulating them may require careful thinking. The problem, of course, is that the way in which the assessment questions are asked can direct that thinking in particular ways that may affect the eventual preference judgments. Other judgmental phenomena can be viewed in terms of constructed preferences [54,64]. One particularly intriguing example is the role that emotions play in decision making. 
We mentioned the affect heuristic [65] in the context of probability judgment, but affect plays perhaps a larger role in preference assessment. For example, Loewenstein et al. [43] characterize how individuals respond emotionally to risky situations, and how such responses can have an impact on both judgments and decisions, and Luce et al. [44] show that negative emotion can arise from thinking about tradeoffs in decisions, and that the emotion can affect decisions. Hsee and Rottenstreich [32] show that when
feelings predominate, judgments of value tend to reflect only the presence or absence of a stimulus, whereas when deliberation predominates, judgments reflect sensitivity to scope of the stimulus. Exploring the interplay between emotions and decision making leads to deep psychological issues. Although the literature in this area is vast, we offer two examples that relate affect and cognitive functioning. First, in studying self-regulation, Baumeister and his colleagues have found that self-regulation, which includes suppressing one’s emotions, generally appears to consume some of the brain’s executive resources [5,51] and can lead to reduced cognitive functioning [61]. Second, Ashby et al. [4] observe that positive affect generally improves cognitive functioning, and they offer a neurochemical explanation. Positive affect is associated with increased levels of dopamine, which in turn has been shown to be associated with various aspects of cognitive function, including improved creative problem solving and cognitive flexibility. If such effects are occurring at the general level of cognitive functioning, it seems evident that emotions, positive or negative, and having to cope with those emotions in a complex decision situation can have substantial impacts on an individual’s ability to think deliberately about tradeoffs. By extension, in a situation where preferences are not well articulated, emotions could have a profound effect on preference construction. Is it possible to develop prescriptive DA methods that respond to the issues raised by the constructed-preference view? Although the idea of constructed preferences has been known to decision analysts for some time (e.g., [16,75]), no specific methodology has been proposed. If anything, the argument has become circular. For example, Gregory et al. [29] critique the contingent valuation methodology, adopting a constructive-preferences view. 
As a replacement for contingent valuation, they recommend multiattribute utility analysis, a mainstay of DA methods, arguing that the decomposition approach of multiattribute utility can help individuals think deliberately through complex and unfamiliar tradeoffs. Likewise, Payne et al. [55] appeal to many DA techniques, including multiattribute utility assessment, in describing a “building code” for constructed preferences. As we have seen above, however, even standard DA methods can affect the way in which preferences are constructed and expressed. For the time being, finding good prescriptive ways to manage the construction of preferences appears to put BDR and DA at an impasse. Before continuing to an equally difficult topic, measuring the effectiveness of DA methods, we look for hopeful signs. Both behavioral researchers and decision analysts have a growing understanding of constructed preferences. That understanding is undoubtedly the first step. And if DA methods are viewed as a basis for avoiding many of the pitfalls in constructed preferences, it may be time for behavioral researchers and decision analysts to find productive ways to collaborate on this problem.
3 What Does “Effectiveness” Mean?
3.1 Strong and Weak Effectiveness
The simple answer to the question, “What does effectiveness mean?” is that DA and other decision-making techniques are effective to the extent that they help us achieve what we want to achieve. Thus, we must measure the quality of the consequences we get, in terms of what we want, as a function of the decision-making method used. This perspective is consequentialist; that is, it embodies the notion that the ultimate value of expending effort on decision making lies in its ability to help one obtain preferred consequences [24]. In particular, a consequentialist perspective does not include any value that might be obtained from the decision-making process itself. In the spirit of Keeney [40], we assume that it is possible to identify a decision maker’s objectives at the time of a decision. Measuring effectiveness then requires measuring achievement of these objectives. If a decision method tends to lead to consequences that represent a high level of achievement of the decision objectives, we say that the method is strongly effective. In contrast, if a method tends to generate choices that informed judges generally view as preferable at the time the action is taken, then we say that the method is weakly effective. Although weak effectiveness may appear to be trivial (of course the decision maker must prefer the chosen alternative!), it is not when viewed more broadly. Showing weak effectiveness may be accomplished by showing that alternatives chosen by decision makers using a particular technique are judged to be preferred, or even dominant, when compared to alternatives chosen by other methods. The judgment of alternatives is made ex ante (i.e., in the context of making the decision before experiencing the consequence) by an appropriate sample of individuals. 
We expand on this approach below and make precise what we mean by “an appropriate sample of individuals” and how we can use their judgments to measure weak effectiveness. Does strong effectiveness imply weak effectiveness? It is certainly tempting to answer in the affirmative; if a technique produces alternatives that in turn lead to preferred consequences, would it not be the case that the decision makers would have evaluated those alternatives as having greater expected utility? Unfortunately, no compelling reason exists to believe this would be the case. In fact, one can imagine that a prescriptive technique could mislead a decision maker into thinking that the recommended alternative dominates all others, although the consequence eventually obtained from the recommended alternative would be inferior compared to the consequences from other alternatives. The issue here is not only the distinction between decision utility and experience utility in a specific decision situation, but also the extent to which a decision method itself can lead to a discrepancy between the two.
3.2 Elements of Value
Saying that a technique is effective when it helps one to achieve his objectives to a greater degree raises the question of what those objectives might be. Keeney [40] describes the process of identifying one’s objectives for decision-making purposes. Although we can legitimately expect different decision makers to have their own objectives in specific contexts, some basic classes of objectives may be common to certain types of decision-making units. Table 1 lists some generic objectives that may be of interest to individuals, small groups, corporations, or public-policy organizations. In what follows, we use the term “decision maker” to refer to the decision-making unit, regardless of whether that unit consists of one or more individuals. The objectives in Table 1 are intended to be representative, not exhaustive. These objectives describe typical reasons why the decision maker cares about any decisions within its purview. For the individual, we might characterize the objectives as “why we live.” In contrast, a small interest group’s objectives can be said to represent “why we join” voluntarily with others in common endeavors. The objectives of policy makers include notions of fairness, efficiency, and the management of externalities; we might call these objectives “why we govern,” and the corporation’s objectives could be described as “why we engage in economic activity.” Table 1 provides guidance as to what sort of objectives must be measured in order to determine effectiveness. Although specific decision contexts may have specific objectives, each such objective is probably related to one of the objectives in Table 1. For example, the objective “maximize starting salary” in the context of an individual’s job search is related to a wealth objective. 
Knowing what to measure to determine effectiveness is crucial; we want to be sure that we are concerned with the extent to which the decision maker’s lot is improved, according to his or her perspective, by using one specific decision method or another.
Table 1. Typical objectives of different decision makers.
• Individual: Health, wealth, safety, wisdom, love, respect, prestige
• Small Interest Group: Impact on community, influence, social standing, camaraderie, goals specific to group’s mission
• Public Policy Maker: Efficient use or allocation of resources, productivity, environmental quality, safety, health, fair decision processes and outcomes
• Corporation: Profit, market share, stock price (wealth), sales, lower costs, worker satisfaction
Adequate measures for many of the objectives listed in Table 1 can often be found using standard DA techniques [40]. However, although achievement of some objectives may be straightforward to measure, others are quite difficult. For example, measuring overall health
of a group of constituents may be achieved using standard epidemiological survey methods; likewise, wealth can in principle be measured in the relevant currency, although there is the typical problem of obtaining truthful responses to questions about private matters. How does one measure wisdom, though, or respect? For a small group, how does one measure its impact? In policy making, fairness depends on perceptions of the distribution of outcomes as well as the process that led to the allocation. Boiney [7] and Fishburn and Sarin [18,19] provide decision-theoretic procedures for evaluating fairness of allocations (including risky allocations) using the concept of envy among stakeholders. Measuring fairness of process is somewhat more problematic, but not impossible. For example, stakeholders who have a “voice” in a public-policy decision often perceive the process to be fairer than those who have no voice in the decision (e.g., [20]). Thus, one possibility for measuring process fairness would be to measure the extent to which stakeholders are given a voice in a decision (and the extent to which the stakeholders perceive themselves as having a voice). Aside from the objectives listed in Table 1, it is also possible for a decision maker or organization to obtain value from the decision-making process itself. For example, an individual may enjoy the process of discovering her values, or may gain useful experience that can be applied to similar problems later. In an organization, improved communication among workers and heightened commitment to a path of action can result from decision making [12]. Although we acknowledge the importance of value that derives from the process itself, in this chapter we focus on consequence-oriented objectives such as those in Table 1 rather than process-oriented objectives.
4 Measuring Effectiveness
Virtually no research has been done that compares DA with other decision-making techniques in terms of strong effectiveness. Relatively little work has been done to show weak effectiveness. As mentioned above, much of the early work on behavioral decision research was motivated by expected utility and implicitly asked whether DA techniques that stem directly from expected utility theory (e.g., probability and utility assessment) were weakly effective in the narrow sense of being able to provide accurate models of a decision maker’s preferences or beliefs about uncertainty. However, as argued by Frisch and Clemen [24], many elements of decision making fall outside the expected utility paradigm per se. For example, expected utility theory sheds little light on how to identify one’s objectives or how to find new alternatives. In this section, we have two goals. First, we describe some research paradigms that might be used to measure the effectiveness of DA and other decision-making methods. Second, we give examples of specific effectiveness studies.
4.1 Measuring Strong Effectiveness
Longitudinal Studies
Studies of strong effectiveness must ultimately embrace the challenge of longitudinal studies. In most important decisions for individuals, small interest groups, corporations, or public-policy decision makers, consequences are experienced over time. Thus, one obvious way to measure effectiveness is to recruit participants, subject them to manipulations regarding the use of particular decision-making methods, and track over time the extent to which identified objectives are achieved. Aside from the complicated logistics of tracking a group of mobile individuals over long time spans (and of maintaining long-term funding for doing so), an important issue is identifying an appropriate decision situation and an adequate sample of participants who face that situation. For example, consider college graduates making decisions about careers. At a large school with a strong alumni program, it may be possible to keep track of individuals who have gone through a particular manipulation as part of making career choices. In modeling multiattribute preferences for jobs or careers, for instance, their judgments may be taken at face value or adjusted for scale compatibility effects as discussed above [3]. Another example might be upcoming retirees at a large corporation; as employees approach retirement, it may be possible to recruit some individuals as participants in a study that manipulates decision techniques for retirement planning. In this case, one might study the effectiveness of Bleichrodt et al.’s [6] approach for assessing risk-averse utility functions, which are then used for making portfolio allocation decisions. A third group might be entrepreneurs, segregated into subgroups according to decision methods used. 
For entrepreneurs, appropriate objectives to measure may include number of profitable ventures launched, capital attracted from outside investors, or total return on investment over a specified period of time. Here, one could imagine testing different methods for counteracting partition dependence in the entrepreneurs’ probability judgments, using methods suggested in Fox and Clemen [21] or Clemen and Ulu [13]. One area that seems particularly apt for longitudinal studies is the medical arena. What are needed are clinical trials of decision-making methods; patients with the same condition and treatment options could be randomized into different groups in which individual decisions would be made based on different methods. The study would follow the progress of the patients and compare their conditions after specified periods of time depending on the particular condition being treated. The results would compare the effectiveness of the different decision methods under investigation. Some related work has been done. For example, Clancy et al. [9] and Protheroe et al. [56] showed that using decision analysis can influence individual medical decisions (screening or vaccinating for Hepatitis B in the former, treatment of atrial fibrillation in the latter). However, in neither case were patients followed in order to
track their health outcomes. Fryback and Thornbury [25] showed that the use of decision analysis, even informally, can affect the diagnoses physicians make when evaluating radiological evidence of renal lesions. Their results showed that the DA-based diagnoses tended to be more accurate. The study was retrospective; the physicians examined existing patient records for which the actual outcome was known (but not to the physicians in the study). Thus, the use of DA should improve the expected outcome for a renal lesion patient. A genuine clinical trial as described above would be needed to confirm this conclusion.
Simulation Studies
The logistical difficulties of real-time longitudinal studies and clinical trials reduce their attractiveness as research methods. Simulations may provide a suitable alternative. Corporate and industry simulations, for example, are common fare in business curricula; similar games that would be amenable to manipulations in decision-making techniques could provide a testbed for the effectiveness of those techniques. Such games would have at least two advantages: the time dimension is highly compressed, and the environment (including in part the objectives of the participants) can be tightly controlled. Games could be designed around individual decisions, corporate strategy, or public policy; the main necessary ingredients are realistic decision situations and outcomes, along with appropriate incentives to engage the participants in the exercise. An example might be a game that requires participants to make a series of marketing strategy decisions for their simulated “firms,” which interact as members of an industry. Different groups could use specific techniques (e.g., a particular computer decision aid versus generic use of DA modeling methods, including decision trees, Monte Carlo simulation, and optimization). A control group having no specific training or decision-making instructions would provide a benchmark. 
Each group’s results would be measured in terms of the objectives specified in the game and could be compared across groups.
4.2 Measuring Weak Effectiveness
Comparing Expected Values
In contrast to strong effectiveness, studies of weak effectiveness need not be designed to track outcomes and consequences over time. The simplest approach, exemplified by Clemen and Kwit [12], is to compare the expected values of alternatives that are analyzed in a series of decisions. Clemen and Kwit make the comparison by calculating the difference between the expected value of the chosen alternative and the average of the expected values of the other alternatives analyzed. If it is possible to document the strategy that would have been taken without the analysis (sometimes called the “momentum strategy”), then one can calculate the difference between the expected value of the chosen strategy and the expected value of the momentum strategy.
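Both comparisons reduce to simple arithmetic on the expected values produced by each analysis. A minimal Python sketch follows, assuming those expected values are already recorded for each project; the function names, alternative names, and dollar figures are hypothetical.

```python
from statistics import mean
from typing import Dict


def value_added_vs_others(evs: Dict[str, float], chosen: str) -> float:
    """EV of the chosen alternative minus the mean EV of the other analyzed alternatives."""
    others = [v for k, v in evs.items() if k != chosen]
    return evs[chosen] - mean(others)


def value_added_vs_momentum(evs: Dict[str, float], chosen: str, momentum: str) -> float:
    """EV of the chosen alternative minus the EV of the documented momentum strategy."""
    return evs[chosen] - evs[momentum]


# Hypothetical project: expected NPVs ($M) of the alternatives that were analyzed.
evs = {"expand": 12.0, "license": 9.0, "status quo": 6.0}
print(value_added_vs_others(evs, "expand"))                  # 4.5
print(value_added_vs_momentum(evs, "expand", "status quo"))  # 6.0
```

Summing either metric over many recorded projects gives a portfolio-level estimate of the value added by the analysis, which is why consistent record keeping matters.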
Regardless of the specific metric used, this approach requires substantial and consistent record keeping over many decision-making projects. Results that document positive value of the analysis indicate “bottom line” value added, but do not necessarily document value obtained in other ways. For example, if an organization has an objective of improving communication across functional areas, consistently using DA on projects that cut across such areas may help achieve this objective by imposing a common language for discussing decisions. However, such value is not likely to be documented in the calculation of incremental expected value added by DA.
Panel Preferences
Because it is not always possible to capture all aspects of value in a bottom-line analysis, we broaden the question to ask whether the alternatives generated by a particular decision-making method are viewed as preferable. To operationalize this notion, we propose using a panel of judges. Because we are concerned here with the notion of decision utility, it would be natural to have a panel of judges (e.g., individuals sampled from the same population as the original decision makers) express their preferences for those alternatives. These preferences could be based on holistic judgments, full-fledged preference models, or something in between. Holistic judgments would appear to be unsatisfactory; the decomposition approach of DA as well as other formal decision-making methods challenges the view that holistic judgments adequately capture an individual’s preferences. On the other hand, forcing an individual into a specific preference model requires selection of a particular structure and possibly a particular modeling or assessment technique. Thus, it would appear that some in-between approach is needed, one that requires the judge to make relatively easy assessments regarding the candidate alternatives. As one possible method, consider the problem of comparing multiattribute alternatives. 
We can ask each member of a panel of judges to rate each alternative on a set of relevant attributes. With data of this nature, the researcher can explore all of the dimensions of preference. The strongest result would be to show that a particular decision-making technique tends to generate a high proportion of nondominated, or efficient, alternatives. An alternative is dominated if another alternative is at least as good on every attribute and strictly better on at least one. The set of nondominated alternatives is often called the efficient frontier, because if one member of this set is chosen, it is not possible to switch to another without reducing achievement of at least one objective. In order to operationalize this approach, we need a way to measure an alternative’s efficiency, its closeness to the efficient frontier. Figure 2 presents a graphical example for measuring the relative efficiency of a set of alternatives evaluated on two attributes. A decision maker has identified two attributes, X and Y, that are important in evaluating the alternatives and has rated alternatives A, B, and C in terms of X and Y using functions U(x) and V(y). From the graph it is clear that C is dominated by A and that neither A nor B is dominated. Because A and B are both
Robert T. Clemen
on the efficient frontier with respect to this particular set of alternatives, we will set their efficiency measures $E_A$ and $E_B$ to 100%. We desire a measure that yields a value of less than 100% for $E_C$. Assuming an additive value function, C lies on an indifference curve defined by $aU(x) + (1 - a)V(y) = t$, where $a$ and $1 - a$ can be thought of as weights in a two-attribute additive utility function. Using the same $a$, the greatest utility achievable is $t^*$, represented by the line segment AB, which is parallel to the indifference curve through C. Thus, we can define $E_C$ to be the ratio $t/t^*$. This is equivalent to calculating the ratio of the distance DC to the distance DE in Figure 2.
Fig. 2. Measuring the relative efficiency of an alternative. Assuming an additive utility function, alternative C lies on an indifference curve having utility $t$. A measure of efficiency for C relative to the alternatives included in the evaluation set is $E_C = t/t^*$. Using the same logic, the efficiency measures for A and B would each be 100%.
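To make the ratio concrete, here is a minimal Python sketch of the efficiency score, including the maximization over the weight $a$ that the next paragraph formalizes as $E_i$. It assumes the single-attribute scores $U(x)$ and $V(y)$ are already on a common 0-to-1 scale and strictly positive. The alternatives and their scores are hypothetical, and the grid search over $a$ is a simplification assumed here, not a solution method prescribed in the chapter.

```python
from typing import Dict, Tuple

# Hypothetical scores (U(x), V(y)) on a common 0-to-1 scale. They are NOT read
# off Figure 2; they are chosen only so that A and B are nondominated and A
# dominates C, matching the situation the figure describes.
Scores = Dict[str, Tuple[float, float]]
ALTERNATIVES: Scores = {"A": (0.9, 0.6), "B": (0.3, 0.9), "C": (0.5, 0.5)}


def additive_value(a: float, u: float, v: float) -> float:
    """Two-attribute additive value a*U + (1 - a)*V."""
    return a * u + (1.0 - a) * v


def ratio(name: str, a: float, alts: Scores) -> float:
    """t / t* for a fixed weight a: this alternative's value over the best value in the set."""
    t = additive_value(a, *alts[name])
    t_star = max(additive_value(a, u, v) for u, v in alts.values())
    return t / t_star  # assumes strictly positive scores, so t_star > 0


def efficiency(name: str, alts: Scores, steps: int = 1000) -> float:
    """E_i: the largest t / t* over weights a in [0, 1], approximated on a grid."""
    return max(ratio(name, k / steps, alts) for k in range(steps + 1))


def is_dominated(name: str, alts: Scores) -> bool:
    """True if another alternative is at least as good on both attributes and better on one."""
    u0, v0 = alts[name]
    return any(
        u >= u0 and v >= v0 and (u > u0 or v > v0)
        for other, (u, v) in alts.items()
        if other != name
    )


if __name__ == "__main__":
    for name in ALTERNATIVES:
        print(name, round(efficiency(name, ALTERNATIVES), 3), is_dominated(name, ALTERNATIVES))
```

With these invented numbers, A and B each score 1.0, while C scores about 0.714 (exactly 0.5/0.7 at $a = 1/3$, the weight at which A and B receive equal value, so the maximizing line is the one through A and B).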
We can improve this measure of efficiency slightly. Note that we could have used any of the line segments on the efficient frontier. Each one corresponds to a different set of weights for the additive utility function. To make C appear in the best light possible, we choose the efficient frontier segment that maximizes the ratio $t/t^*$. Mathematically, suppose we have a set $\mathcal{A}$ of alternatives, and we wish to calculate the efficiency score $E_i$ for alternative $A_i \in \mathcal{A}$, where $A_i$ has attribute values $x_i$ for attribute X and $y_i$ for attribute Y. We define $E_i$ to be
$$E_i = \max_{a} \left[ \frac{aU(x_i) + (1 - a)V(y_i)}{\sup_{A_j \in \mathcal{A}} \left[ aU(x_j) + (1 - a)V(y_j) \right]} \right].$$
For a particular $a$, the numerator inside the brackets calculates the utility $t$ for $A_i$, and the denominator finds the largest possible utility $t^*$ over all elements of $\mathcal{A}$. The maximization finds the values for $a$ that maximize the ratio $t/t^*$
for alternative $A_i$. This formula generalizes easily to more than two attributes and to more general forms of multiattribute utility functions.
Why would such an approach be useful? Suppose we have a set of alternatives representing choices made by decision makers in a particular decision situation and using specified decision-making methods. A panel of judges can score the alternatives on a set of attributes (either of their own choosing or predetermined by the experimenter), and then for each judge the data can be used to generate an efficiency measure for each alternative. With efficiency measures calculated for all of the judges, statistical analysis can be used to compare the efficiency measures across alternatives. The judgmental inputs satisfy our desire for something between holistic judgments of preference and a full-blown preference model; the ratings for each attribute are straightforward judgments that still capture the richness of the judges’ preferences. No assessment of utility weights by the judges is needed, and the statistical analysis can determine whether alternatives generated by one decision-making method tend to be more efficient than alternatives from another method.
Using panel preferences is not without problems. As with all decision makers, panel judges may be susceptible to many of the biases discussed in the first part of this chapter. For example, we described the problem of constructed preferences. Individual preferences can be shaped by aspects of a decision situation and the task-contingent strategies people use in judgment and choice [54,73]. If we are unsure of the quality of our elicited panel preferences, of what scientific value are the calculated efficiency scores? Although we cannot completely escape the implications of this question, we offer two mitigating considerations.
First, to the extent possible, judges should be exposed to the same decision-making environment as the decision makers and should be encouraged to deliberate about the value of the alternatives on the various attributes according to standard procedures and following principles consistent with Payne et al.’s [55] “building code” for decision making. Doing so may not remove all biases, but standard procedures should minimize biases and variability across judges. Second, the approach described above calculates an efficiency score specific to each judge, and therefore agreement across judges is not required. In fact, the procedure for calculating efficiency scores for alternatives for each judge appropriately takes into account variability in preferences across judges; the objective is to identify a subset of alternatives that are preferred in the sense of having (statistically) higher efficiency scores than other alternatives.
4.3 Some Research Projects
The paragraphs above describe some general experimental paradigms that could be used to measure the effectiveness of DA and other techniques. In this section we speculate on some specific studies that might be performed.
Probabilistic Forecasting Competition
Imagine a complex forecasting situation that involves many kinds and levels of uncertainty, such as forecasting crude oil prices, diagnosing a disease based on patient signs and symptoms, or troubleshooting a computer software installation. A variety of analytical and modeling techniques are available for problems such as these, ranging from the construction of complex belief nets or other artificial intelligence models to decomposed or even holistic probability judgments made by experts. Given one or more prespecified domains, a competition could be held, pitting different probabilistic forecasting techniques against each other. It would be most natural to hold such a competition in real time, in which case the results could be used to determine the strong effectiveness of the techniques. Strictly proper scoring rules provide natural performance measures that can capture both calibration and skill [52]. In a probabilistic forecasting environment, enough outcomes would have to be recorded to calculate meaningful average scores for the various techniques. Competitions such as this have been run in the forecasting field [45–47].
An alternative to a real-time exercise would be to construct a simulation in which participants would have to make probabilistic forecasts and in which their overall performance would be measured by their average scores. For example, a business simulation in which participants must make judgments and take calculated risks could be designed to incorporate participants’ skill in assessing probabilities related to aspects of their business such as marketing, R&D, production, or competitive analysis. Aside from creating such a game, the challenge would be to implement different probability forecasting options within the context of the game in a way that permits experimental manipulation of the techniques.
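As an illustration of comparing techniques by average score, the following Python sketch scores binary-event forecasts with two standard strictly proper rules, the quadratic (Brier) score and the logarithmic score. The chapter does not commit to a particular rule; the choice of these two rules, the forecasts, and the outcomes are all assumptions made for illustration. Under a strictly proper rule, a forecaster optimizes expected score only by reporting his or her true probability.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def brier_score(p: float, outcome: int) -> float:
    """Quadratic (Brier) score for a binary event; lower is better."""
    return (p - outcome) ** 2


def log_score(p: float, outcome: int) -> float:
    """Logarithmic score: log of the probability given to what happened; higher is better."""
    # Assumes 0 < p < 1: a confident forecast of 0 or 1 that turns out wrong
    # would make math.log raise ValueError.
    return math.log(p if outcome == 1 else 1.0 - p)


def average_scores(probs: Sequence[float], outcomes: Sequence[int]) -> Tuple[float, float]:
    """Mean Brier score and mean log score over a record of forecasts and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    pairs = list(zip(probs, outcomes))
    return mean(brier_score(p, o) for p, o in pairs), mean(log_score(p, o) for p, o in pairs)


# Hypothetical record: two techniques forecasting the same five binary events.
outcomes = [1, 0, 1, 1, 0]
technique_a = [0.8, 0.3, 0.7, 0.9, 0.2]
technique_b = [0.6, 0.5, 0.5, 0.6, 0.4]

for label, probs in (("A", technique_a), ("B", technique_b)):
    avg_brier, avg_log = average_scores(probs, outcomes)
    print(f"technique {label}: mean Brier {avg_brier:.3f}, mean log score {avg_log:.3f}")
```

With these invented numbers, technique A has the lower mean Brier score (0.054 versus 0.196) and the higher mean log score, so both rules rank it ahead; with real competition data, many more outcomes would be needed before such a difference could be called meaningful.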
Scoring rules need not be the basis for comparing probability forecasting techniques, especially if the probability forecast can be related to a specific decision context. For example, in the business simulation game, one might want to measure stock price, profits, market share, or some other objective important to a real or fictitious corporation. Other possibilities are to choose a context such as college choice or retirement planning and have proponents of different techniques develop systems that lead users through the necessary uncertainty judgments and modeling before offering alternatives from which to choose. By tracking the experiences of the participants in a longitudinal study, one could measure the effectiveness of the different systems and, implicitly, of the underlying techniques.
Value Structuring and Creativity
In his book Value-Focused Thinking (VFT), Keeney [40] recommends that decision makers identify their objectives as a first step in the decision-making process, if possible before identifying alternatives. VFT provides many
tools and techniques for identifying, structuring, and using objectives in decision making. The overall process has become an important element in the toolkit of decision analysts who work with decision makers on complex multiobjective problems. In VFT, Keeney stresses that his approach is valuable for many things, not least of which is its potential for generating creative alternatives. How would one determine whether a technique generates creative alternatives? One must start with a relatively unstructured problem that admits the possibility of creative problem solving; highly structured textbook-like problems typically do not provide adequate leeway for the decision maker to find creative answers. But if we want to be able to evaluate creative alternatives, by definition we may be considering alternatives that we have not yet seen, so no scoring system for the alternatives can be established in advance. Fortunately, a procedure such as the one described in the previous subsection for measuring weak effectiveness can be used. Suppose that VFT and other techniques are used to generate alternatives in some decision situation, either real or simulated. A panel of judges can evaluate the alternatives by rating them on each of several dimensions, and the subsequent efficiency analysis can determine whether VFT tends to generate more efficient alternatives than do the other techniques. To the extent that more creative alternatives are also efficient, such a study can indicate the potential of DA techniques to enhance creativity.
Decision-Making Tournament
Several different decision paradigms exist, among them DA, the analytic hierarchy process [59], and goal programming, to name a few. A tournament could be held by having proponents of different methods address a prespecified set of decision problems.
The quality of the decisions chosen could be evaluated by a panel of judges or, if suitable, by tracking the downstream consequences to the decision makers either in a real-world or simulated environment. An important issue that must be faced in such a tournament is coming up with decision problems that present a reasonably level playing field for the various techniques to be tested. Such problems would presumably consist of “case studies” that are rich in realistic and detailed information. Care must be taken not to present information in a way that artificially predisposes a decision maker toward a particular technique; for example, expressing uncertainty explicitly in terms of subjectively assessed probability distributions might create a bias toward DA, whereas explicit indication of pairwise comparisons might predispose a decision maker toward the analytic hierarchy process.
Canonical Decision Problems
Researchers might want to run the tournament just described with technique proponents as participants more than once for each set of cases. After the
initial run, it would be necessary in subsequent runs to ensure that the participants were not prejudiced one way or the other by the outcomes and decisions of prior runs. An interesting twist on the tournament idea would be to develop a set of canonical decision problems and an experimental procedure that could be used as a way to test new techniques as they are developed. A parallel can be found in the field of mathematical programming, which has adopted a few computational problems that are commonly used as benchmarks for comparing algorithm performance. Although no similar set of canonical problems exists for decision making, the creation of such a collection would facilitate the comparison of decision-making techniques. A strict procedure must be established, however, in which “naïve” subjects would be instructed in the use of a particular technique prior to applying it to the canonical problem. The prior knowledge of the subjects, regarding both the problems they would face and previous results on other decision techniques, must be carefully controlled. Finally, because decisions would not necessarily be made contemporaneously, the panel approach to judging the quality of alternatives is not appropriate. Instead, the decision problems must have appropriate built-in measures for determining the quality of the decisions, and those built-in measures must be directly related to the objectives of the roles adopted by the participants.
Ethnological Study of Decision Making
A final example is to take an ethnological approach. In this case, one would collect accounts of decision-making styles and techniques in different contexts. For example, a database might be developed that contained accounts of individual decision making, including decision context, framing, techniques used, and choices made. Similar databases could be created for corporate or public-policy decisions, although gag rules may render collection of such data quite difficult.
To be able to compare effectiveness, such a database must be augmented with either judgments of quality of the alternative chosen (possibly done by a panel reviewing all decisions in the database) or a measure of the consequences to the decision maker (via later reports of performance). A large database could be analyzed to determine the characteristics of the most effective decision makers.
It is clear that a program to study the effectiveness of DA techniques has many facets and issues to which the researcher must attend. The first order of business is to create appropriate experimental and analytical paradigms. For example, the analysis of panel judgments as described above must be refined, tested, and extended to decision situations other than multiattribute choices under certainty. We must learn how to handle the ethics and logistics of longitudinal studies. Good case studies must be developed and refined for use in simulation studies or as canonical decision problems for ongoing research on decision effectiveness. Methodological work of this nature may not appear attractive in and of itself unless it can directly address substantive questions
of interest. Nevertheless, the importance of such work cannot be overstated for the research program described here; these methodological problems must be solved if definitive results on effectiveness are to be obtained. With an array of experimental methods available, research can address many aspects of the effectiveness of decision-making methods. Some general examples include the following.
• Compare specific techniques such as DA, the analytic hierarchy process, naïve decision making, and so on, as overall paradigms for making decisions.
• Explore different aspects of decision making: problem structuring, uncertainty modeling and assessment, preference assessment, and so on.
• Examine different types of decision-making contexts, such as personal decisions, corporate strategy, public policy, or multiple-stakeholder decisions.
• Investigate decisions in specific domains such as environmental risk assessment, college or career choice, consumer product marketing strategy, research and development, or municipal waste facility location.
• Study how effectiveness of different methods varies depending on individual characteristics, such as age, education, attitudes (e.g., attitudes toward quantitative analysis or technology), or cognitive abilities (e.g., ability to work with quantitative information).
Many other, more specific studies are also possible, and they might stem from the development of new DA techniques based on BDR findings, or from the introduction of new decision methods.
5 Conclusion: The Interplay of BDR and DA
In the first part of this chapter, we discussed ways in which BDR can be brought to bear on the development of new DA methods. Recent efforts have shown how to use BDR results and theory as a basis for developing better DA methods. We discussed new directions that might prove fruitful, such as considering the role of emotions in decision making and the implications for DA methods. In the second part of the chapter, we discussed in detail how researchers can evaluate decision-making methods. The concepts of strong and weak effectiveness provide a framework for studying effectiveness of decision-making methods, and we considered some generic research approaches and specific projects.
Although we have discussed these two main topics as if they are largely separate, they actually come together in two interesting ways. First, methods from BDR may be useful in developing research paradigms and methods to study effectiveness. Second, the two topics can lead to a research cycle. We have seen that BDR can inform the development of new DA methods. When the new methods are evaluated in terms of effectiveness, knowledge from BDR can be used to explain the results of the effectiveness studies and in turn help researchers refine the methods. As the cycle continues, DA methods improve,
and we would hope that developing and evaluating new DA methods will contribute to the BDR body of knowledge.
An important dilemma that researchers will have to face is the problem of preferences that change over time. For example, in longitudinal studies of strong effectiveness, the researcher must follow participants as they mature and experience consequences over time. The reasons for making the original choice may be less compelling at a later date, leading a decision maker to regret the choice (and, perhaps, the decision to be a participant in the study!). Alternatively, the decision maker may have developed a new rationalization for the original choice based on his new preferences. Yet another possibility is that the decision-making method used led the decision maker to construct his preferences in a particular way, but experiencing the consequences leads to somewhat different preferences. Such complications must be dealt with in order to evaluate strong effectiveness. What does it mean to say that a decision technique is effective at getting us what we want if what we want has changed substantially by the time we experience the consequences?
Throughout, we have implicitly made three assumptions that some readers may find controversial. First, we have implicitly assumed that deliberative, System II thinking generally leads to better decisions. Some researchers have offered evidence to the contrary. Recently, for example, Dijksterhuis et al. [15] found that, when making consumer decisions, individuals made “better decisions” when they did not process the information consciously compared to when they did. However, their conclusions are not particularly relevant to our assumption for two reasons. First, their experiments were conducted under conditions vastly different from what one would expect when using DA methods. Dijksterhuis et al.
presented information about a variety of different product attributes fairly quickly, focusing on each one for a matter of seconds, and providing four minutes for an individual to think about his or her choice. In contrast, we might expect conscious processing to involve considerably more time, including time to examine claims in detail, acquire and assess new information, discuss the matter with others, and so on. Second, their measure of decision quality was subjective in three of their four experiments (either postdecision satisfaction or attitude toward the target object). In the fourth experiment (their Study 1), the experimenters identified the “best decision” as the choice with the most positive aspects, even though different individuals may weight these aspects very differently in their choices. If the “best decision” was a dominant choice (better than any of the other choices on all aspects), their method seems reasonable; otherwise, it does little to suggest either strong or weak effectiveness.
Our second implicit assumption has been that DA methods and the underlying subjective expected utility paradigm are the appropriate deliberative basis for making a decision. However, our argument does not rely on the assumption that DA is “the best” or the “only rational” way to make a decision. All of our arguments could be applied to other deliberative decision-making frameworks, such as the analytic hierarchy process [59] or other multicriteria
methods. In fact, studies of effectiveness could profitably compare different decision-making frameworks. Decision analysts who have held tenaciously to the belief that DA is the best method may be surprised!
Third, in suggesting that a decision-maker’s or expert’s judgments or preference statements may be improved by adjusting them ex post, we have implicitly assumed that this is a legitimate thing to do. However, it leads to situations that may appear somewhat paradoxical. For example, take the Bleichrodt et al. [6] approach to adjusting a decision-maker’s stated risk preferences to account for distortions due to prospect theory. One can imagine the decision maker saying, “My risk tolerance is X,” and the analyst saying, “No, your risk tolerance is really Y.” Is it reasonable for the analyst to take such a bold step? We offer two justifications. First, the decision maker can be educated (on the basis of sound research) that his or her statements may indeed be biased and that correcting for the biases can improve judgments and choices. Second, in a public-policy setting, experts and stakeholders may state their judgments or tradeoff weights and certify that those statements are the best representation of their beliefs and preferences that they are capable of making. Similarly, the analyst can certify that any adjustments reflect up-to-date scientific knowledge about how to account for known biases. Such certification is already implicitly practiced, for example, by scientists who certify that a particular complex mathematical model represents up-to-date scientific understanding.
Helping individuals and organizations find paths that lead to their objectives is the ultimate goal of research in decision making. Regardless of the issues raised in the paragraphs above, BDR and effectiveness studies can lead to better decision-making methods and thus can ultimately help decision makers achieve their objectives.
Acknowledgments
I am grateful to colleagues at the Fuqua School of Business, the University of Texas at Austin, and the Workshop on Decision Modeling and Behavior in Uncertain and Complex Environments (Tucson AZ, 2006) for the opportunity to present this work and for hours of discussion. Thanks especially to Susan Brodt, Ellen Peters, and an anonymous reviewer for their comments and insights. This work was supported in part by the National Science Foundation under Grant No. SES-0317867. Any opinions, findings, conclusions, or recommendations expressed herein are those of the author and do not necessarily reflect the views of the National Science Foundation.
References
1. M. Allais. Le comportement de l’homme rationnel devant le risque: Critique des postulats et axiomes de l’école américaine. Econometrica, 21:503–546, 1953.
2. M. Allais and J. Hagen. Expected Utility Hypotheses and the Allais Paradox. Reidel, Dordrecht, The Netherlands, 1979.
3. R. M. Anderson and B. F. Hobbs. Using a Bayesian approach to quantify scale compatibility bias. Management Science, 48:1555–1568, 2002.
4. F. G. Ashby, A. M. Isen, and A. U. Turken. A neuropsychological theory of positive affect and its influence on cognition. Psychological Review, 106:529–550, 1999.
5. R. F. Baumeister and T. F. Heatherton. Self-regulation failure: An overview. Psychological Inquiry, 7:1–15, 1996.
6. H. Bleichrodt, J. L. Pinto, and P. P. Wakker. Making descriptive use of prospect theory to improve the prescriptive use of expected utility. Management Science, 47:1498–1514, 2001.
7. L. G. Boiney. When efficient is insufficient: Fairness in decisions affecting a group. Management Science, 41:1523–1537, 1995.
8. D. Bunn. Applied Decision Analysis. McGraw-Hill, New York, 1984.
9. C. M. Clancy, R. D. Cebul, and S. V. Williams. Guiding individual decisions: A randomized, controlled trial of decision analysis. American Journal of Medicine, 84:283–288, 1988.
10. R. Clemen, S. K. Jones, and R. L. Winkler. Aggregating forecasts: An empirical evaluation of some Bayesian methods. In D. Berry, K. M. Chaloner, and J. K. Geweke, editors, Bayesian Analysis in Statistics and Econometrics, pages 3–14. Wiley, New York, 1996.
11. R. T. Clemen. Making Hard Decisions: An Introduction to Decision Analysis. Duxbury, Belmont, CA, second edition, 1996.
12. R. T. Clemen and R. C. Kwit. The value of decision analysis at Eastman Kodak Company, 1990–1999. Interfaces, 31:74–92, 2001.
13. R. T. Clemen and C. Ulu. Interior additivity and subjective probability assessment of continuous variables. Unpublished manuscript, Duke University, 2006.
14. P. Delquié. “Bimatching”: A new preference assessment method to reduce compatibility effects. Management Science, 43:640–658, 1997.
15. A. Dijksterhuis, M. W. Bos, L. F. Nordgren, and R. B. van Baaren. On making the right choice: The deliberation-without-attention effect. Science, 311:1005–1007, 2006.
16. G. W. Fischer. Utility models for multiple objective decisions: Do they accurately represent human preferences? Decision Sciences, 10:451–479, 1979.
17. B. Fischhoff. Debiasing. In D. Kahneman, P. Slovic, and A. Tversky, editors, Judgment Under Uncertainty: Heuristics and Biases, pages 422–444. Cambridge University Press, Cambridge, UK, 1982.
18. P. C. Fishburn and R. K. Sarin. Fairness and social risk I: Unaggregated analyses. Management Science, 40:1174–1188, 1994.
19. P. C. Fishburn and R. K. Sarin. Fairness and social risk II: Aggregated analyses. Management Science, 43:115–126, 1997.
20. R. Folger. Distributive and procedural justice: Combined impact of “voice” and improvement on experienced inequity. Journal of Personality and Social Psychology, 35:108–119, 1977.
21. C. R. Fox and R. T. Clemen. Subjective probability assessment in decision analysis: Partition dependence and bias toward the ignorance prior. Management Science, 51:1417–1432, 2005.
22. C. R. Fox and Y. Rottenstreich. Partition priming in judgment under uncertainty. Psychological Science, 14:195–200, 2003.
23. C. R. Fox and A. Tversky. A belief-based account of decision under uncertainty. Management Science, 44:879–895, 1998.
24. D. Frisch and R. T. Clemen. Beyond expected utility: Rethinking behavioral decision research. Psychological Bulletin, 116:46–54, 1994.
25. D. G. Fryback and J. R. Thornbury. Informal use of decision theory to improve radiological patient management. Radiology, 129:385–388, 1978.
26. G. Gigerenzer. How to make cognitive illusions disappear: Beyond heuristics and biases. European Review of Social Psychology, 2:83–115, 1991.
27. G. Gigerenzer, U. Hoffrage, and H. Kleinbölting. Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98:506–528, 1991.
28. T. Gilovich, D. Griffin, and D. Kahneman, editors. Heuristics and Biases: The Psychology of Intuitive Judgment. Cambridge University Press, Cambridge, UK, 2002.
29. R. Gregory, S. Lichtenstein, and P. Slovic. Valuing environmental resources: A constructive approach. Journal of Risk and Uncertainty, 7:177–197, 1993.
30. J. Hershey, H. C. Kunreuther, and P. J. Schoemaker. Sources of bias in assessment of utility functions. Management Science, 28:936–954, 1982.
31. S. C. Hora, N. G. Dodd, and J. A. Hora. The use of decomposition in probability assessments on continuous variables. Journal of Behavioral Decision Making, 6:133–147, 1993.
32. C. K. Hsee and Y. Rottenstreich. Music, pandas, and muggers: On the affective psychology of value. Journal of Experimental Psychology: General, 133:23–30, 2004.
33. S. K. Jacobi and B. F. Hobbs. Quantifying and mitigating splitting biases in value trees. Unpublished manuscript, Johns Hopkins University, Baltimore, MD, 2006.
34. D. Kahneman. Maps of bounded rationality: Psychology for behavioral economics. American Economic Review, 93:1449–1475, 2003.
35. D. Kahneman and S. Frederick. Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin, and D. Kahneman, editors, Heuristics and Biases: The Psychology of Intuitive Judgment, pages 49–81. Cambridge University Press, New York, 2002.
36. D. Kahneman, P. Slovic, and A. Tversky, editors. Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press, Cambridge, UK, 1982.
37. D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47:263–291, 1979.
38. D. Kahneman and A. Tversky. Choices, Values, and Frames. Cambridge University Press, Cambridge, UK, 2000.
39. R. Keeney and D. von Winterfeldt. Eliciting probabilities from experts in complex technical problems. IEEE Transactions on Engineering Management, 38:191–201, 1991.
40. R. L. Keeney. Value-Focused Thinking: A Path to Creative Decision Making. Harvard University Press, Cambridge, MA, 1992.
41. S. Lichtenstein, B. Fischhoff, and L. D. Phillips. Calibration of probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic, and A. Tversky, editors, Judgment Under Uncertainty: Heuristics and Biases, pages 306–334. Cambridge University Press, Cambridge, UK, 1982.
42. S. Lichtenstein and P. Slovic. Reversals of preference between bids and choices in gambling decisions. Journal of Experimental Psychology, 89:46–55, 1971.
43. G. F. Loewenstein, C. K. Hsee, E. U. Weber, and N. Welch. Risk as feelings. Psychological Bulletin, 127:267–286, 2001.
44. M. F. Luce, J. R. Bettman, and J. W. Payne. Choice processing in emotionally difficult decisions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23:384–405, 1997.
45. S. Makridakis, A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, and R. Winkler. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting, 1:111–153, 1982.
46. S. Makridakis, C. Chatfield, M. Hibon, M. Lawrence, T. Mills, K. Ord, and L. Simmons. The M-2 competition: A real-time judgmentally based forecasting study. International Journal of Forecasting, 9:5–22, 1993.
47. S. Makridakis and M. Hibon. The M3-competition. International Journal of Forecasting, 16:451–476, 2000.
48. M. McCord and R. de Neufville. Lottery equivalents: Reduction of the certainty effect problem in utility assessment. Management Science, 32:56–60, 1986.
49. M. W. Merkhofer. Quantifying judgmental uncertainty: Methodology, experiences, and insights. IEEE Transactions on Systems, Man, and Cybernetics, 17:741–752, 1987.
50. M. G. Morgan and M. Henrion. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge University Press, Cambridge, UK, 1990.
51. M. Muraven and R. F. Baumeister. Self-regulation and depletion of limited resources: Does self-control resemble a muscle? Psychological Bulletin, 126:247–259, 2000.
52. A. H. Murphy and R. L. Winkler. Scoring rules in probability assessment and evaluation. Acta Psychologica, 34:273–286, 1970.
53. L. D. Ordóñez, B. A. Mellers, S.-J. Chang, and J. Roberts. Are preference reversals reduced when made explicit? Journal of Behavioral Decision Making, 8:265–277, 1995.
54. J. W. Payne, J. R. Bettman, and E. J. Johnson. The Adaptive Decision Maker. Cambridge University Press, Cambridge, UK, 1993.
55. J. W. Payne, J. R. Bettman, and D. A. Schkade. Measuring constructed preferences: Towards a building code. Journal of Risk and Uncertainty, 19:243–270, 1999.
56. J. Protheroe, T. Fahey, A. A. Montgomery, and T. J. Peters. The impact of patients’ preferences on the treatment of atrial fibrillation: Observational study of patient based decision analysis. British Medical Journal, 320:1380–1384, 2000.
57. H. Raiffa. Decision Analysis. Addison-Wesley, Reading, MA, 1968.
58. Y. Rottenstreich and A. Tversky. Unpacking, repacking, and anchoring: Advances in support theory. Psychological Review, 104:406–415, 1997.
59. T. Saaty. The Analytic Hierarchy Process. McGraw-Hill, New York, 1980.
60. R. E. Schaefer and K. Borcherding. The assessment of subjective probability distributions: A training experiment. Acta Psychologica, 37:117–129, 1973.
61. B. J. Schmeichel, K. D. Vohs, and R. F. Baumeister. Intellectual performance and ego depletion: Role of the self in logical reasoning and other information processing. Journal of Personality and Social Psychology, 85:33–46, 2003.
Effectiveness of Decision Analysis
62. K. E. See, C. R. Fox, and Y. Rottenstreich. Between ignorance and truth: Partition dependence and learning in judgment under uncertainty. Unpublished manuscript, University of Pennsylvania, 2006. 63. S. Sloman. The empirical case for two systems of reasoning. Psychological Bulletin, 119:3–22, 1996. 64. P. Slovic. The construction of preferences. American Psychologist, 50:364–371, 1995. 65. P. Slovic, M. Finucane, E. Peters, and D. G. MacGregor. The affect heuristic. In T. Gilovich, D. Griffin, and D. Kahneman, editors, Heuristics and Biases: The Psychology of Intuitive Judgment, pages 397–420. Cambridge University Press, Cambridge, UK, 2002. 66. P. Slovic, D. Griffin, and A. Tversky. Compatibility effects in judgment and choice. In R. Hogarth, editor, Insights in Decision Making: A Tribute to Hillel J. Einhorn, pages 5–27. University of Chicago Press, Chicago, IL, 1990. 67. C. S. Spetzler and C.-A. S. Staël Von Holstein. Probability encoding in decision analysis. Management Science, 22:340–352, 1975. 68. C.-A. S. Staël Von Holstein. The effect of learning on the assessment of subjective probability distributions. Organizational Behavior and Human Performance, 6:304–315, 1971. 69. C.-A. S. Staël Von Holstein. Two techniques for assessment of subjective probability distributions: An experimental study. Acta Psychologica, 35:478–494, 1971. 70. A. Tversky and D. Kahneman. The framing of decisions and the psychology of choice. Science, 211:453–458, 1981. 71. A. Tversky and D. Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5:297–323, 1992. 72. A. Tversky and D. J. Koehler. Support theory: A nonextensional representation of subjective probability. Psychological Review, 101:547–567, 1994. 73. A. Tversky, S. Sattath, and P. Slovic. Contingent weighting in judgment and choice. Psychological Review, 95:371–384, 1988. 74. A. Tversky, P. Slovic, and D. Kahneman. The causes of preference reversal. The American Economic Review, 80:204–217, 1990. 75. D. von Winterfeldt and W. Edwards. Decision Analysis and Behavioral Research. Cambridge University Press, Cambridge, UK, 1986. 76. P. Wakker and D. Deneffe. Eliciting von Neumann-Morgenstern utilities when probabilities are distorted or unknown. Management Science, 42:1131–1150, 1996. 77. M. Weber, F. Eisenführ, and D. von Winterfeldt. The effects of splitting attributes on weights in multiattribute utility measurement. Management Science, 34:431–445, 1988. 78. G. Wu and R. Gonzalez. Nonlinear decision weights in choice under uncertainty. Management Science, 45:74–85, 1999.
Reducing Perceptual and Cognitive Challenges in Making Decisions with Models

Jenna L. Marquard and Stephen M. Robinson

Department of Industrial and Systems Engineering, University of Wisconsin-Madison, 1513 University Avenue, Madison, WI 53706, USA
Summary. Decision makers face perceptual and cognitive challenges in understanding presentations of formal models used to support decisions. These presentations, often made by the analysts who created the models, must allow the decision makers not only to acquire information but also to process that information into a form that they can use effectively for decision making. This chapter examines the occurrence of eleven well-known perceptual and cognitive challenges that decision makers may face as they receive and use the results of formal models. It also discusses features of the modeling process that have either aided or hindered decision makers in overcoming these challenges, and draws some conclusions about ways in which people might improve the modeling process to reduce the severity of perceptual and cognitive challenges and thereby improve the decision-maker’s effectiveness. Using retrospective case analysis, with the known set of perceptual and cognitive challenges as themes, we examine the presence and nature of these themes across five modeling projects. The selected projects span diverse disciplines and vary in the type and complexity of the models developed and in the characteristics of the decision makers. In the case studies we see evidence of five of the perceptual and cognitive challenges and some indication of two more. These challenges stem from the nature of the model presentation, from the roles of the analyst and decision maker in the modeling process, or from factors external to that process. From the results of the case analysis we derive a condensed checklist of recommendations for analysts.
By identifying perceptual and cognitive traps along with their sources, this work provides insight not only into what challenges are likely to exist when decision makers use models as a part of their decision-making process, but also into how the structure of the modeling process and the model presentation allow those challenges to arise.
1 Background

People often view formal models as useful tools for systematic analysis to support informed decision making. However, creating formal models does not necessarily produce informed decision making. Decision makers face perceptual and cognitive challenges in understanding presentations of these models, which are often made by the analysts who created the models. Decision makers must not only acquire information from a model presentation, but also process that information into a form that they can use effectively for decision making. Perceptual and cognitive challenges in acquiring and processing information from model presentations may take many forms, including mental shortcuts that a decision maker takes to reduce a problem’s complexity. These challenges may lead to serious errors in decision making. In particular, a decision maker may misjudge elements of a model presentation, such as the uncertainties, assumptions, and biases underlying the model, and may then draw incorrect conclusions from the presentation.

This chapter draws on the analysis of a series of case studies to illuminate the occurrence of these perceptual and cognitive challenges in decision-makers’ reception of model presentations. The case studies include a personnel model used by the army, a navy shipbuilding model, a simulation model of air quality in the San Joaquin Valley, a modeling project undertaken to support economic development of the Obergurgl region in Austria, and an analysis of water management for the Netherlands. We also discuss features of the modeling process that have either aided or hindered decision makers in overcoming these challenges. From these case studies and discussions we draw some conclusions about ways in which people might improve the modeling process to reduce the severity of perceptual and cognitive challenges and thereby improve the decision-maker’s effectiveness.
1.1 The Judgmental Gap

In complex or uncertain situations, even experienced decision makers may be unable to produce good decisions using only conventional wisdom or mental models [5]. This limitation has led to advocacy for basing decisions on systematic analysis. Formal models can help in constructing these systematic analyses because they allow for comprehensive structuring of a situation, organizing information with which an unaided decision maker may be unable to cope [13]. However, decision makers using such models need both sufficient analytic tools to supplement their decision-making skills and the ability to use these tools appropriately. The mere presence of analytic tools does not suffice to produce informed decision making. A body of work from the past forty years describes a disconnect between the tools produced through systematic analysis and what decision makers need in order to address a problem [13,22,23]. Raiffa [22] postulates
the presence of a judgmental gap between the outputs of formal models and decision makers embedded in the real world. In [22, Figure 10.3], he shows objective inputs feeding into a formal model that produces output, and on the other side the real world, with a judgmental gap separating the two sides. He also states that a wide judgmental gap may result in analysis that is irrelevant, provides no meaningful insights, and may therefore be ignored. Although Raiffa describes the potential consequences of the judgmental gap, he does not describe how the gap is produced or how the decision maker embedded in the real world might detect it. The judgmental gap described by Raiffa may render a model unusable, or may severely limit its usefulness to the decision maker. Work on characterizing the factors that produce the judgmental gap, as well as the means by which a decision maker detects and responds appropriately to that gap, might help to increase models’ usefulness in supporting decision making. Factors producing the judgmental gap might include the assumptions upon which a model is built or the biases of the analysts who created the model. Our emphasis in this chapter is on perceptual and cognitive challenges that may affect whether and how a decision maker detects and responds to the judgmental gap.

1.2 Models as Analytical Tools

Models are one type of analytical tool, and the term “model” can refer to a broad array of representations of reference systems. A model of whatever type is a representation, or abstraction, aimed at mimicking some real-world phenomenon. Greenberger et al. [13] propose that a model “represents selected features of the reference system so that one can learn about the reference system by working with the model.” They also distinguish between classes of models.
At the highest level, formal models differ from mental models in being expressed in a formalized language that facilitates examination and revision. Within formal models, Greenberger et al. distinguish four model classes: schematic models, physical models, symbolic models, and role-playing models [13]. According to their definition, one can learn about a building to be constructed by examining its blueprint (schematic), a molecule by handling a physical model of its structure (physical), a manufacturing plant by manipulating a mathematical model of its operations (symbolic), and the real-estate business by playing Monopoly (role-playing). Within formal models, this chapter focuses specifically on the class of symbolic models. This class includes but is not limited to mathematical models, including those that use computers either to determine the results of calculations or as devices to support simulation or graphical visualization.

1.3 Conveyance: Information versus Models

Two important role types, the analyst and the decision maker, participate in using models and other analytical tools to inform decision making. Roughly
speaking, a decision maker is an individual or group, possibly embedded within a larger set of decision makers, that makes a commitment toward a certain course of action. An analyst is an aide in the decision-making process, eliciting systemic understanding of a problem situation from one or more decision makers or other stakeholders, forming that information into a coherent analysis and then presenting analytical findings to the decision makers or stakeholders. Like the decision maker, the analyst may or may not be embedded within a larger set of analysts. Stakeholders, a superset of decision makers, include those individuals who have an interest in the decision. The simplified diagram in Figure 1 shows these role types and their relation to one another.
Fig. 1. Conveyance of information.
The existence of these roles produces a set of processes for conveying information from one role type to another. The simplified diagram in Figure 1 displays two pathways for conveying information between individuals in these roles: decision makers convey situational and personal preference information to analysts, and analysts convey analytical findings back to decision makers, in what can be an iterative and tightly coupled process. These conveyance pathways suggest that each role type possesses complementary knowledge, and both kinds of knowledge are needed for informed decisions. Entire disciplines are dedicated to conveying technical or complex information to lay people; however, no similar body of work has been produced to assist in conveying technical models to lay people. By “lay people” we mean those who lack the particular technical skills needed for a deep understanding of the model at issue, although they may be highly qualified in other fields.
The field of risk communication, for instance, focuses on conveying concepts or messages to lay people with the goal of specific understanding or action [5,9]. This field tends to focus on four conveyance elements: the message, the messenger, the audience, and the context [26]. By contrast, the issue in conveying models to lay people is how to convey relationships and structures, with the goal of understanding and appropriate use of the model. Work in this area, including that described in this chapter, thus complements disciplines such as risk communication by addressing the conveyance of model structures and interactions rather than of content and messages. Risk communication and model conveyance also differ in the level of understanding and type of action required of the decision maker. Risk communication messages attempt to induce lay people to take some action or gain some understanding, often in a timely manner, but do not necessarily require comprehension of the logic behind that action or understanding [20]. Model conveyance, on the other hand, requires the decision maker to better comprehend the logic underlying the model, but may or may not create a need for her to take action.

1.4 Model Presentation: Gateway to Understanding

Significant empirical work has gone into examining and improving the ability of models to represent two things: a particular reference system, and how a decision maker would ideally behave in a given situation. Figure 2 shows these two representations. Much of the work in the former area lies in model verification and validation, whereas work in the latter area has included the elicitation of decision-maker preferences and utility values [18,19,24].

Fig. 2. Representations: reference system and ideal behavior.

By contrast, there is a lack of empirical work examining how decision makers understand and use these models. As mentioned previously, Raiffa conceptualized this shortfall as a judgmental gap, but he focused on the potential results of the judgmental gap rather than on its constituent factors [22]. In many cases, those who have extensive anecdotal evidence of these factors work in roles, such as consulting, in which they may have neither the facilities nor the incentive to illuminate and address these factors.
The representations shown in Figure 2 can also be viewed as the main directions in which researchers in operations research have moved in trying to close the judgmental gap described in Section 1.1. Work in operations research has included refining the use of decision-maker preferences and utilities in models [5,18], reducing uncertainties in models [21], increasing model complexity, and undertaking extended sensitivity analyses [8]. This focus on building more accurate models as a way to close the judgmental gap may derive in part from increases in computational power over the past thirty years and from a general focus on computation within the operations research and engineering fields. Despite this substantial body of work, the problem of fully closing the judgmental gap has remained unsolved for almost forty years.

To participate fully in the decision-making process, lay decision makers need to understand the analyst’s presentation, or conveyance, of the formal model that they are using to support their decision. This understanding is important because the analyst’s presentation serves as the gateway to how a decision maker understands and uses a model. Figure 3 shows this view of model presentation as the gateway or conduit from analyst to decision maker. The figure also shows the information flows and avenues of influence when models are involved in the decision-making process. For instance, a decision maker’s influence on the real world is driven by information flows both from the model presentation and from the real world.

Fig. 3. Model presentation.

We have not seen in the existing literature any treatment of the model presentation as central to whatever assistance the model gives to the decision maker. Many articles have dealt with aiding decision makers, but the
typical view seems to envision the decision maker as communicating directly with the model and, in many cases, understanding its technical foundations. For example, Borcherding and Schaefer [6] note the “tendency to use heuristic, ineffective, and often incorrect rules when making intuitive judgments,” and offer procedures and approaches to aid decision makers in complex decision situations, including multiattribute utility analysis and a discussion of issues in the assessment of probabilities. By contrast, we envision a decision maker without a technical understanding of just what the model is doing, for whom the presentation is therefore an essential interface.

This chapter focuses specifically on the information flow from model presentation to decision maker. If decision makers can effectively understand a model presentation, they may be better able to critique and use the underlying model. For instance, based on her understanding of a model presentation, a decision maker may choose not to use a particular model. When decision makers can appropriately calibrate a model’s usefulness, they are likely to be better positioned to make truly informed decisions. Despite the need for decision makers to better understand the models that they use, via effective model presentations, as far as we are aware no guidelines currently exist to help analysts convey existing and newly developed models to the relevant decision makers. Without such guidelines, analysts may create and refine models without understanding, and sometimes in complete ignorance of, key factors that influence how their models are received or used.

1.5 Perception and Cognition in Decision Making

To understand how decision makers might make better use of models, it should be beneficial to examine the factors that obstruct the information flow from model presentation to decision maker shown in Figure 3.
Although many factors probably influence this information flow, we examine here how perceptual and cognitive challenges faced by decision makers may affect how they understand a given model presentation and thereby understand a given model. This work views perception as the process of a decision maker’s acquiring information from a given model presentation, and views cognition as how the decision maker processes information acquired from the model presentation. Perceptual and cognitive challenges, if present, are likely to interfere with the decision-makers’ ability to acquire information from a model presentation and subsequently to process that information into a form that they can use effectively for decision making. Fortunately, we already have much research describing the ways that perceptual and cognitive factors impede decision makers. Much of the well-known work of Tversky and Kahneman examined how heuristics—useful cognitive shortcuts for reducing problem complexity—can lead to “severe and systematic errors” in assessing probabilities and predicting values for uncertain events; see the collection [17]. Hammond et al. [14] also describe what they term “psychological traps,” again addressing the potential errors resulting
from perceptual and cognitive shortcuts in decision making. Janis and Mann [16] focus on decision makers embedded in complex and stressful situations, examining why certain patterns in decision making “interfere with vigilant information processing” and what the consequences of these patterns are. Although the works of Hammond, Keeney, and Raiffa and of Janis and Mann look at the process of decision making generally, they do not specifically address how these “psychological traps” and patterns of less-than-vigilant information processing affect decision makers who use formal models as a basis for their decision making.

Across the fields of psychology and decision making, the list of potential perceptual and cognitive challenges faced by decision makers is long: a scan of the relevant literature produced a list of seventy. Because this work is based in the decision sciences, we build on the work of Hammond et al. [14], using their list of “psychological traps” as a foundation for the challenges faced by those using formal models as a basis for decision making. Table 1 lists these perceptual and cognitive challenges, as adapted from [14].

Table 1. Psychological traps, adapted from [14].

Challenge                     Description
Anchoring                     Disproportionately weighting initial information
Status quo trap               Bias toward alternatives that perpetuate the current situation
Sunk-cost trap                Weighing past decisions, or costs, in the current decision
Confirming-evidence trap      Searching for or interpreting information in a way that supports one’s preconceptions
Framing trap                  Framing of a question or problem to influence the answer; e.g., as gains versus losses or with different reference points
Overconfidence trap           Overconfidence in the accuracy of one’s predictions
Recallability trap            Overestimating the probability of memorable or dramatic events
Base-rate trap                Neglecting a base rate in an assessment
Prudence trap                 Compounding of error due to multiple “safe” judgments
Outguessing randomness trap   Viewing patterns in random phenomena
Surprised-by-surprises trap   Failure to recognize reality as sometimes surprising
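Two of the traps in Table 1, the base-rate trap and the prudence trap, rest on simple arithmetic that is easy to get wrong intuitively. The Python sketch below is our own illustration, not material from [14]: it applies Bayes’ rule to show how neglecting a low base rate inflates a probability assessment, and it shows how stacking several independently “safe” judgments (each set at, say, a 95th percentile) describes a scenario far more extreme than any single safety margin suggests. The function names and all numbers (a 1% base rate, a 90% hit rate, a 10% false-alarm rate, three independent inputs) are hypothetical.

```python
"""Illustrative arithmetic for the base-rate and prudence traps (hypothetical numbers)."""


def posterior(base_rate: float, hit_rate: float, false_alarm_rate: float) -> float:
    """P(condition | positive signal) by Bayes' rule.

    base_rate        -- prior probability of the condition
    hit_rate         -- P(positive signal | condition)
    false_alarm_rate -- P(positive signal | no condition)
    """
    true_pos = hit_rate * base_rate
    false_pos = false_alarm_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def joint_tail_probability(tail_prob: float, n_inputs: int) -> float:
    """Probability that n independent inputs all fall beyond their own 'safe'
    percentile, when each one is exceeded with probability tail_prob."""
    return tail_prob ** n_inputs


if __name__ == "__main__":
    # Base-rate trap: judging by the 90% hit rate alone ignores the 1% base rate.
    p = posterior(base_rate=0.01, hit_rate=0.90, false_alarm_rate=0.10)
    print(f"P(condition | positive signal) = {p:.3f}")

    # Prudence trap: three independent inputs, each set at its 95th percentile.
    q = joint_tail_probability(tail_prob=0.05, n_inputs=3)
    print(f"P(all three inputs that extreme) = {q:.6f}")
```

Under these assumptions the posterior is 1/12, about 0.083, rather than the 0.90 that an intuitive reading of the hit rate suggests, and the probability that all three inputs are simultaneously that extreme is 0.05³ = 0.000125. The independence assumption matters: correlated inputs would make the compounding less severe.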
2 Methods

Case studies form one end of a research methodology spectrum whose other end is experimental design [10]. Bromley [7] defines the case study, as a research method, as a “systematic inquiry into an event or a set of related events which aims to describe and explain the phenomenon of interest.” Qualitative research such as case studies can be helpful in exploring a complex research area about which little is known. This way of employing case studies differs strikingly from their use in teaching, where the studies are exemplars of a lesson and elements of a case may therefore be omitted or exaggerated [28,29]. Teaching case studies, such as those found in the Harvard Business Review, can be entirely fictitious while retaining their usefulness: in teaching cases, the lesson to be learned prevails over accuracy and completeness. In case study research, the opposite is true. Here we use cases to illustrate ways in which various perceptual and cognitive challenges can affect how a decision maker understands a model presentation and thereby understands and uses a given model. This illustration may then help to suggest methods for mitigating the effects of these challenges. We follow Yin’s case study methodology [28,29], examining multiple cases to address the following questions:

1. How do the known perceptual and cognitive challenges described in Table 1 affect how decision makers understand a model presentation?
2. What are viable means for reducing the impact of these challenges on decision-makers’ understanding of a model presentation?

Using the set of perceptual and cognitive challenges shown in Table 1 as themes, we examine the presence and nature of these themes in five case studies. The chosen case studies, described below, span diverse disciplines and vary in the type and complexity of the models developed and in the characteristics of the decision makers using the models.
Following this examination, we summarize the results in Table 3 of Section 3.12 below.

2.1 Compensation, Accessions, and Personnel Management (CAPM) Model

The compensation, accessions, and personnel management (CAPM)¹ model is designed to help military decision makers understand how various personnel policies affect the retention behavior of military personnel [2–4]. These policies affect pay scales for military personnel, retirement benefits, reenlistment bonuses, and early retirement incentives. CAPM is an econometric model based on an assumption of rational decision making, viz. that an individual in the military will decide whether to remain in the service by comparing the economic value of remaining with the economic value of leaving. CAPM was built as a standalone Microsoft Excel-based tool, in which decision makers can manipulate the model and its underlying assumptions directly through the spreadsheet and its associated graphical interface.

¹ This model has no connection with the capital asset pricing model, also known as CAPM, that is ubiquitous in finance. As the latter model plays no role in this chapter, we use the abbreviation CAPM for the compensation, accessions, and personnel management model with no risk of confusion.

2.2 Navy Shipbuilding and Force Structure Analysis Tool

The shipbuilding and force structure analysis tool integrates disparate planning tools that the navy uses for future force structure decision making. The goal of this integrated tool is to help navy decision makers determine how to use the nation’s shipbuilding resources effectively, given limited funding [1]. These decision makers must project future force structure needs, planning for the design and construction of new additions to the fleet given an existing fleet with various capabilities and operational costs. The tool integrates four existing models: the force transition model, the industrial base model, the operating and support cost model, and the financial adjustments and assumptions model. It was built on Microsoft Access and allows the user to manipulate all aspects of the model.

2.3 San Joaquin Valley Air Pollution Control

In the early 1990s an $18 million project was undertaken to develop SARMAP, a model to help decision makers determine how to combat severe air quality problems in the San Joaquin Valley region of California [11]. The air quality problems in this region stem from a combination of topography, meteorology, and human-caused emissions, and the modeling efforts were aimed at understanding how to attain an acceptable air quality level by lowering human-caused emissions.
The model simulated the effects of various emission sources on air quality, and planners used the simulation model to create an attainment plan for achieving acceptable air quality levels. Despite the planning and modeling efforts, the attainment plan did not achieve the federal ozone standard, and the attainment date was moved from 1999 to 2010.

2.4 Obergurgl: A Microcosm of Economic Development

In the 1960s, two researchers from the University of British Columbia began a project to form a more cohesive relationship between analysts and decision makers while addressing problems concerning the economic development of the Obergurgl region of Austria. The project, headed by C. S. Holling and Carl Walters, addressed what they saw as “a lack of congruence or communication between the modeler and intended user” [15,23,27]. Their approach to
this problem involved bringing modelers together with various experts and lay representatives from the Obergurgl region and having them jointly build a model of human impact on the alpine ecosystem [15]. These joint development sessions were initially highly problematic because participants differed in how they conceived of the problems and of what building a model meant. Through continued development sessions, participants “began to communicate not via the model but around the model” [23]. In this case no final model was ever built, but the exercise of jointly working toward a model helped the participants to understand the problems facing them more systematically and to collaborate better in creating joint policies.

2.5 POLANO: Protecting an Estuary from Floods

After a severe storm caused unprecedented damage to the delta region of the Netherlands in 1953, the Dutch government began extensive planning to protect this region from future flooding [12]. One of the most complex efforts involved evaluating means for protecting the largest estuary in the region, the Oosterschelde. This planning effort, called POLANO, produced a common scorecard framework for comparing the impacts of three alternatives: an impermeable dam, a storm surge barrier, and keeping the mouth of the estuary open while building dikes around its perimeter. The major impacts studied were cost, security, ecology, and economic and social impacts. Various modeling efforts were used to generate the impact measure values in the scorecard framework.
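A scorecard of the kind POLANO used keeps each impact in its own column instead of collapsing the alternatives into a single score. The Python sketch below is a hypothetical illustration of that data structure only: the three alternatives are labeled A, B, and C, the impact categories follow the list above, and every score (on an arbitrary 1 to 5 scale where higher is more favorable, except for cost, where lower is better) is invented for the example and does not come from the POLANO study.

```python
from typing import Dict

# Impact categories from the POLANO description. Whether higher or lower is
# better for each one is an assumption made for this illustration.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "cost": False,
    "security": True,
    "ecology": True,
    "economic": True,
    "social": True,
}

# alternative -> impact -> score
Scorecard = Dict[str, Dict[str, float]]


def best_per_impact(card: Scorecard) -> Dict[str, str]:
    """Return, for each impact, the alternative with the most favorable score.

    The impacts are deliberately not weighted or summed: the scorecard leaves
    the trade-offs among them to the decision maker.
    """
    best: Dict[str, str] = {}
    for impact, higher_is_better in HIGHER_IS_BETTER.items():
        choose = max if higher_is_better else min
        best[impact] = choose(card, key=lambda alt: card[alt][impact])
    return best


def print_scorecard(card: Scorecard) -> None:
    """Print the alternatives-by-impacts table."""
    impacts = list(HIGHER_IS_BETTER)
    print("alternative".ljust(12) + "".join(i.rjust(10) for i in impacts))
    for alt, scores in card.items():
        print(alt.ljust(12) + "".join(f"{scores[i]:10.0f}" for i in impacts))


if __name__ == "__main__":
    # Invented placeholder scores, not POLANO results.
    card: Scorecard = {
        "A": {"cost": 3, "security": 5, "ecology": 1, "economic": 2, "social": 3},
        "B": {"cost": 5, "security": 4, "ecology": 4, "economic": 4, "social": 4},
        "C": {"cost": 4, "security": 2, "ecology": 5, "economic": 3, "social": 2},
    }
    print_scorecard(card)
    print(best_per_impact(card))
```

In this made-up table A is best on cost and security, C on ecology, and B on economic and social impacts, so no alternative dominates; the structure makes the trade-offs visible rather than resolving them, which is the design choice that distinguishes a scorecard from a single aggregated ranking.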
3 Results

The case studies described above showed evidence of several of the perceptual and cognitive challenges listed in Table 1, faced by the decision makers upon model presentation. The evidence is organized below by challenge.

3.1 Anchoring

Anchoring occurs when a decision maker gives disproportionate weight to initial or past information in a current situation. In forecasting, for instance, future projections are anchored on past values. Questions can also be anchored by including values in the question itself. Anchoring is not always harmful, but it does influence the answers people generate: forecasts may vary depending on whether they take past information into account, and the answer to a judgmental question will often vary depending on whether the question was anchored with initial values. In modeling, anchoring matters because initial information provided by the analyst will influence how a decision maker uses or interprets the model.
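The anchoring effect in forecasting can be pictured with a stylized anchoring-and-adjustment rule, in which a judge starts from an anchor and adjusts toward what the evidence suggests, but not far enough. The Python sketch below is a toy model for illustration, not a claim about how any decision maker in the case studies actually reasons; the function name, the linear form, and the adjustment fraction of 0.6 are all assumptions.

```python
def anchored_estimate(anchor: float, evidence_value: float, adjustment: float = 0.6) -> float:
    """Stylized anchoring-and-adjustment (hypothetical model).

    Starts at `anchor` and moves toward `evidence_value`, covering only the
    fraction `adjustment` of the distance. adjustment == 1 means no anchoring.
    """
    if not 0.0 < adjustment <= 1.0:
        raise ValueError("adjustment must lie in (0, 1]")
    return anchor + adjustment * (evidence_value - anchor)


if __name__ == "__main__":
    evidence_value = 100.0  # what the evidence alone would support (hypothetical)
    # Same evidence, three different starting anchors (e.g., different defaults).
    for anchor in (50.0, 100.0, 150.0):
        estimate = anchored_estimate(anchor, evidence_value)
        print(f"anchor={anchor:6.1f} -> estimate={estimate:6.1f}")
```

With the same evidence, a low anchor of 50 yields an estimate of 80 and a high anchor of 150 yields 120; only when the anchor happens to match the evidence does the estimate land on 100. Under this toy model, the starting point an analyst supplies shifts the final judgment even though the evidence is unchanged.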
Jenna L. Marquard and Stephen M. Robinson
In both the CAPM and navy shipbuilding models, many default values are set for the user. In the model presentation, users are then encouraged to alter these default values to match their own situation. The presence of these default values, however, is likely to influence the data that the user will enter into the model. The assumptions used by analysts can also act as anchors, influencing how a decision maker uses a given model. The navy shipbuilding model, for instance, is built on the assumption that the user should first determine force structure needs, with the resulting expenditures and the plan for achieving that force structure being the outputs of the model. Another framework for addressing this problem would be to provide budgetary constraints and some ideal force structure, with the model coming as close to the ideal force structure as possible given the budget. This assumption about how the model is to be used strongly anchors and affects how the decision maker views and uses the particular model.

3.2 Status Quo Trap

The status quo trap occurs when maintaining a current situation appears easier or less risky than opting for change, creating a bias toward alternatives that perpetuate the current situation. This trap is easy to fall into, because the consequences of the status quo will generally either be known or be relatively easy to visualize, whereas understanding the consequences of a change may require a strong imaginative effort, and even then they may be highly uncertain. None of the case studies showed direct evidence of bias toward the status quo; however, the San Joaquin air quality model decision makers may have been biased in this way through their past tendency to support business interests over citizen interests [11]. This past tendency may have influenced the significant amount of industry participation in the modeling process or the apparent leniency toward industry in the modeling recommendations.
Users of the CAPM model may also fall into the status quo trap, especially if the personnel changes to be made would decrease wages, retirement bonuses, or reenlistment bonuses.

3.3 Sunk-Cost Trap

The sunk-cost trap occurs when past costs or decisions influence a current decision. For instance, if decision makers have spent money on a project that is now not going well, they may be reluctant to stop the project because of the costs already put into it. Although it was not evident in any of the case studies, the sunk-cost trap is important for decision makers to keep in mind when using models. There may be a tendency to use a model simply because money was spent to create it. In fact, the money spent on building the model is a sunk cost, and the assessments of whether and how to use the model
Reducing Perceptual and Cognitive Challenges
are new decisions that should not be affected by the sunk cost involved in building it.

3.4 Confirming-Evidence Trap

The confirming-evidence trap occurs when decision makers search for or interpret information in a way that supports their preconceptions. This error is essentially one of determining a solution before conducting an analysis. Because models of complex situations are laden with uncertainty, analysts must be aware that decision makers may wish to use their model to justify conclusions they have already reached. In the San Joaquin air quality model, industry representatives were well represented on the planning boards. Although involvement from stakeholders such as industry may be beneficial, other prominent stakeholders such as public interest and community groups were not represented on these boards. When the air quality model was built and policies were recommended, the recommendations appeared lenient toward industry. The Netherlands estuary project provided a novel form of output to decision makers through the use of a scorecard tool. The disaggregated nature of this tool, and the fact that no ideal solution was provided to decision makers, allowed various stakeholders to interpret the results of the work differently. Those worried about the fishing industry may have favored the solution that had the least impact on the fish population in the estuary, and those concerned about budgeting may have leaned toward the solution that cost the least. As in the Netherlands estuary project, the Obergurgl project involved a variety of stakeholders and no unique solution. Through joint development of a model, these stakeholders could argue for model inputs and a model structure that benefited their priorities.

3.5 Framing Trap

The framing trap occurs when the way a problem is framed influences the answer.
A well-known example of this trap, taken from [25], deals with two groups faced with the decisions given in Table 2 for a situation in which 600 people were at risk. It is clear from the program descriptions that the effects of Programs A and C are the same (saving 200 of 600 people means that 400 die), as are those of Programs B and D. In the first group of subjects, where framing was in terms of gains, about three-quarters of the respondents chose the certain program, A, a risk-averse choice. In the second group, where framing was in terms of losses, about three-quarters chose the risky program, D, a risk-seeking choice.

Table 2. Framing effect in a risky decision with 600 people at risk, from [25].

Group 1
Program choices: If Program A is used, 200 people will be saved. If Program B is used, either 600 people will be saved (p = 1/3) or no people will be saved (p = 2/3).
Choice patterns: 72% preferred Program A; 28% preferred Program B.
Expected value of gamble: (1/3)(600) + (2/3)(0) = 200 lives saved.
Risk preference: Risk-averse.

Group 2
Program choices: If Program C is used, 400 people will die. If Program D is used, either nobody will die (p = 1/3) or 600 people will die (p = 2/3).
Choice patterns: 22% preferred Program C; 78% preferred Program D.
Expected value of gamble: (1/3)(0) + (2/3)(600) = 400 lives lost.
Risk preference: Risk-seeking.

The Netherlands estuary project showed particularly strong potential vulnerability to framing effects. In this project, analysts determined which impacts to display, their ordering, and their units of expression on the scorecards. On the scorecards, some of the impacts were displayed as positive, or in terms of the gains brought about by an option. Other impacts were displayed as
negative, or in terms of the detriments brought about by each option. This mixture of positive and negative framing may have influenced how decision makers evaluated the impacts. It should be recalled that this project was conducted before the work of Tversky and Kahneman had produced widespread awareness of framing effects.

3.6 Overconfidence Trap

Overconfidence occurs when decision makers believe in the accuracy of their predictions more strongly than the evidence warrants. In the case of modeling, overconfidence occurs when a decision maker places unwarranted faith in the accuracy of an analyst’s predictions. This overconfidence is especially likely when decision makers view analysts as experts and analysts have not explicitly expressed the uncertainties in the communication of their models. This lack of explicit uncertainty shows up in the Netherlands estuary project scorecards, the navy shipbuilding and CAPM models, and the San Joaquin air quality model. In the description of the San Joaquin air quality model, those involved in the modeling process are described as “highly qualified experts with several decades of experience” using a “robust database for input data.” Decision makers were so confident of the accuracy of the model predictions that they made little or no provision for a margin of error. In essence, their action plan relied on the idea that the real world would act exactly as the model predicted.
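The cost of planning with no margin of error can be quantified under simple assumptions. The sketch below assumes, purely for illustration, that the realized outcome (for example, a pollutant concentration) equals the model’s point prediction plus an unbiased, normally distributed prediction error with a hypothetical standard deviation `sigma`; the function names are our own. Under these assumptions, a plan designed to land exactly on a regulatory limit exceeds that limit about half the time.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_exceeding_limit(margin: float, sigma: float) -> float:
    """P(realized value > limit) when the plan puts the predicted value
    `margin` units below the limit and prediction errors are Normal(0, sigma).

    Normality and unbiasedness of the errors are assumptions of this sketch.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 1.0 - normal_cdf(margin / sigma)


if __name__ == "__main__":
    sigma = 10.0  # hypothetical standard deviation of the prediction error
    for margin in (0.0, 10.0, 20.0):
        p = prob_exceeding_limit(margin, sigma)
        print(f"margin = {margin:4.1f}  P(exceed limit) = {p:.3f}")
```

With no margin, the chance of exceeding the limit is 0.5; margins of one and two standard deviations reduce it to about 0.16 and 0.02. The point does not depend on the particular numbers: any plan that treats a point prediction as certain inherits the full prediction error.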
3.7 Recallability Trap

The recallability trap occurs when the probability of memorable or dramatic events is overestimated. Although not mentioned directly in any of the case studies, this error may have occurred in the Netherlands estuary project. Given the catastrophic nature of the storm that drove the project, decision makers may have given additional weight to solutions that would never allow that type of flooding to happen again.

3.8 Base-Rate Trap

The base-rate trap occurs when conclusions are drawn while ignoring a base rate. An example of this given by Hammond et al. [14] asks the question, “A man is shy; is it likely he’s a salesman or a librarian?” In this situation, although librarians may tend to be shyer than salesmen, there are many more male salesmen than male librarians. Given this, unless shyness is overwhelmingly more common among librarians, the man is more likely to be a salesman. This trap did not appear in any of the case studies examined.

3.9 Prudence Trap

The prudence trap occurs when errors are compounded because several “safe” estimates are combined. As a simple example, suppose that one is observing four mutually independent random variables, each of which can take one of two values, good or bad. If for each variable the probability of good is 90%, then the probability that all four take good values is not 90%; rather, it is (0.9)^4 = 0.6561, which is less than 2/3. This is an instance of the well-known principle from reliability theory that a series system may exhibit low reliability even though the reliability of each of its components is high. The case study where this may have occurred is the navy shipbuilding model. Because this model aggregates four existing models, a decision maker who uses conservative estimates in each of the four base models could be faced with serious errors when the models are aggregated. Conversely, the Netherlands estuary project purposefully did not aggregate the various impacts. In this way, any errors are limited to single scorecard elements.
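The arithmetic behind the base-rate and prudence traps can be checked directly. In the sketch below, the four components and the 90% probability come from the prudence example above; the base-rate numbers (ten male salesmen for every male librarian, and shyness rates of 40% among librarians and 10% among salesmen) are hypothetical, chosen only to illustrate Bayes’ rule. The third function, also our own illustration, shows a related form of compounding: if each of four independent, normally distributed inputs is set at its own 90th percentile as a “safe” estimate, their sum lands near the 99.5th percentile of the total, far more conservative than intended.

```python
from fractions import Fraction
from statistics import NormalDist


def posterior(prior_a, likelihood_a, prior_b, likelihood_b):
    """Bayes' rule for two exhaustive hypotheses A and B: P(A | evidence)."""
    joint_a = prior_a * likelihood_a
    joint_b = prior_b * likelihood_b
    return joint_a / (joint_a + joint_b)


def series_reliability(probabilities):
    """Probability that every independent component takes its 'good' value."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result


def stacked_percentile(n_inputs, level):
    """Percentile of the total reached when each of n i.i.d. normal inputs is
    set at its own `level` percentile and the cushioned values are summed.

    Each cushion is z * sigma, so the summed cushion is n * z * sigma, while
    the standard deviation of the sum is only sqrt(n) * sigma.
    """
    z = NormalDist().inv_cdf(level)
    return NormalDist().cdf(n_inputs ** 0.5 * z)


if __name__ == "__main__":
    # Base rate (hypothetical): librarian prior 1/11, salesman prior 10/11;
    # shyness rate 2/5 among librarians, 1/10 among salesmen.
    p_lib = posterior(Fraction(1, 11), Fraction(2, 5), Fraction(10, 11), Fraction(1, 10))
    print("P(librarian | shy) =", p_lib)  # 2/7, so a salesman is more likely

    # Prudence: four independent components, each good with probability 0.9.
    print("P(all four good) =", float(series_reliability([Fraction(9, 10)] * 4)))  # 0.6561

    # Four inputs, each set at its own 90th percentile, then summed.
    print("Percentile of summed estimate:", round(stacked_percentile(4, 0.9), 4))
```

With these hypothetical inputs, the probability that the shy man is a librarian is only 2/7, even though librarians are four times as likely to be shy; the base rate dominates.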
3.10 Outguessing Randomness Trap

When decision makers believe they can detect patterns in phenomena that are actually random, this is known as the outguessing randomness trap. This trap did not appear explicitly in any of the case studies.

3.11 Surprised-by-Surprises Trap

In some situations, seemingly surprising events may in fact have high probability. When a decision maker fails to see these surprises as probabilistic events, this is the surprised-by-surprises trap. This trap did not appear explicitly in any of the case studies.
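Both of these traps concern how ordinary probability produces apparent patterns and apparent surprises. The sketch below, with hypothetical inputs and function names of our own, computes (a) the exact probability that a sequence of fair coin flips contains a streak of at least k identical outcomes, a common source of illusory patterns, and (b) the probability that at least one of many independent rare events occurs.

```python
def prob_run_at_least(n, k):
    """Exact probability that n fair coin flips contain a run of at least k
    identical outcomes (heads or tails), by dynamic programming.

    counts[r] holds the number of sequences so far whose final run has
    length r and which have not yet produced a run of length k.
    """
    if n <= 0:
        return 0.0
    if k <= 1:
        return 1.0
    counts = [0] * k
    counts[1] = 2  # first flip: H or T, current run length 1
    for _ in range(n - 1):
        new_counts = [0] * k
        for r in range(1, k):
            c = counts[r]
            new_counts[1] += c          # next flip differs: run restarts at 1
            if r + 1 < k:
                new_counts[r + 1] += c  # next flip matches: run grows
        counts = new_counts
    return 1.0 - sum(counts) / 2 ** n


def prob_at_least_one(p, n):
    """P(at least one of n independent events occurs), each with probability p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    print(f"P(streak of 4+ in 20 flips) = {prob_run_at_least(20, 4):.3f}")
    print(f"P(at least one of 50 risks at 2% each) = {prob_at_least_one(0.02, 50):.3f}")
```

In 20 flips of a fair coin, a streak of four or more identical outcomes occurs with probability of about 0.77; and if 50 independent events each have a 2% chance, at least one of them occurs with probability of about 0.64. Neither outcome should be surprising, yet each is easy to read as a pattern or a shock.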
3.12 Summary of Observations

Table 3 summarizes the observations in the preceding sections, showing which traps occurred in each of the cases we considered.

Table 3. Psychological traps observed in various cases. An X in the table indicates that we found evidence of that trap in that case; an I indicates some indication but not conclusive evidence.

Cases: CAPM, Navy, San Joaquin, POLANO, Obergurgl
Anchoring: CAPM X; Navy X
Status quo: CAPM I; San Joaquin I
Sunk cost: no entries
Confirming evidence: San Joaquin X; POLANO X; Obergurgl X
Framing: POLANO X
Overconfidence: CAPM X; Navy X; San Joaquin X; POLANO X
Recallability: POLANO I
Base-rate: no entries
Prudence: Navy X
Outguessing randomness: no entries
Surprised by surprises: no entries
4 Discussion

Through examination of the case studies, we can see evidence of the various perceptual and cognitive challenges, or psychological traps, defined by Hammond et al. [14]. Interestingly, not all of these traps appeared in the chosen case studies. Possible reasons for the nonappearance of a particular trap could be that (1) decision models may in general not be vulnerable to it; (2) some decision models may be vulnerable to it but the cases we studied were not; or (3) the trap was actually present in one or more of the studied cases but did not appear in the documentation that we used as a basis for the case analysis. The first of these explanations would indicate that the modeling process somehow eliminates some of the perceptual and cognitive challenges known to exist in the decision-making process. We have no basis for an opinion as to whether this is or is not true. The second explanation, if correct, would warrant examining a wider body of case studies. The final explanation rests on the nature of the case study documentation, rather than the case itself. Documentation of models for decision makers is usually focused on the model itself: how to use it and how to draw conclusions from it, as opposed to how the model was received and used by decision makers. This explanation, if
correct, suggests a need for prospective evaluation of cases where models are developed for decision makers, as opposed to the retrospective method used here. The perceptual and cognitive challenges identified in the various cases are quite interesting, particularly because of the diversity of the areas from which the challenges stem: these include (A) the nature of the model presentation, (B) the roles of the analyst and decision maker in the modeling process, and (C) factors external to the modeling process. One can usefully view these areas in terms of their scope of influence, from micro- to macroinfluencers, as seen in Figure 4.
Fig. 4. Scope of influence: micro to macro.
4.1 Model Presentation

Challenges stemming from the nature of the model presentation involved both the content of the presentation and the method of presentation. Challenges stemming from presentation content arose from the default values and assumptions embedded in both the CAPM and navy shipbuilding models. Because decision makers interact directly with these modeling tools, the anchoring effect of this embedded content may be troublesome. Differences in challenges can also be seen across cases that employed different presentation methods. The CAPM and navy shipbuilding model presentations consisted of decision makers directly interacting with the developed models, whereas the Netherlands estuary project presented models to decision makers through a set of visual and numerical scorecards. Although the CAPM and navy shipbuilding model presentations displayed evidence of anchoring challenges, as described above, the Netherlands estuary project model presentations displayed evidence of
framing challenges because of the way model outputs were displayed on the scorecards. Similarly, the navy shipbuilding model aggregated four existing models, but allowed the decision maker to generate the inputs for these models. By aggregating these models in the model presentation, this model was subject to prudence challenges arising from the decision maker’s compounding of conservative estimates across the four models. The use of a scorecard presentation method in the Netherlands estuary project purposefully disaggregates errors and thus may combat the prudence trap, but thoughtful design of these scorecards is then necessary to avoid framing challenges. These examples suggest that analysts need to consider, for each method of model presentation, the perceptual and cognitive challenges to which that method may be subject. If decision makers are to interact with a model, special attention must be given to the default values and assumptions provided to the decision maker, as well as to how the decision maker is likely to make estimates for the data embedded in the model. If decision makers are to be presented with model outputs (e.g., via scorecards), analysts must understand that in the presence of uncertainty the framing of these outputs is likely to influence how the decision maker understands and judges them. In practice, neither of the authors has ever seen such explicit consideration of perceptual and cognitive traps by working analysts designing model presentations.

4.2 Roles of the Analyst and Decision Maker

In some cases, vulnerability to psychological traps might stem from how analysts and decision makers view themselves and their roles in the modeling process, how they view each other, and how they interact with each other. If a decision maker views an analyst as an expert, for instance, this could increase vulnerability to the overconfidence trap.
Extreme overconfidence occurred in the San Joaquin air quality model, where decision makers relied on the real world’s acting exactly as the analysts’ model predicted it would. Another challenge associated with the roles of analysts and decision makers can occur when decision makers do not separate the decision to invest in the modeling process from the decision to use the model, thereby failing to treat the investment in the modeling process as a sunk cost. Analysts could help combat these challenges by clearly communicating both the uncertainties embedded in their models and the limitations of those models, and by encouraging decision makers’ critical evaluation of both the model presentations and (to the extent feasible) the models themselves.

4.3 Factors External to the Modeling Process

Factors external to the modeling process may create perceptual and cognitive challenges for decision makers as they help construct and interpret models. These factors might include recent or current events, political pressures on
the decision maker, and expectations that others have of the decision maker. In two of the case studies, lay participants were heavily involved in the model construction process. In these cases, the backgrounds of the participants and their viewpoints may have influenced the construction of the model, leading to challenges associated with building models that confirm participants’ preconceptions or desires. In building the San Joaquin air quality model, industry representatives were much more heavily represented than public interest and community groups. Such a discrepancy in participation may lead to a model whose outputs unfairly favor the interests of the well-represented group. In the Obergurgl project, stakeholders with various backgrounds were in a position to try to structure the model to benefit their own interests. The cross-section of participants in that project, however, is likely to have tamed the impact of any one group’s interests. Analysts need to understand the basis for the information they receive from decision makers and affected stakeholders: if uncritically used, this information may lead to models that simply confirm the conceptions decision makers already had of the situation. The structuring of the Netherlands estuary project model presentation allowed decision makers to view the model outputs in a way that confirmed their preconceptions or desires. Because the model presentation did not aggregate impacts, decision makers could argue for or against solutions based on how they were expected to affect factors of interest to them. In this case, analysts must be aware that certain model presentations may lend themselves to decision makers’ interpreting the presentation to confirm their preconceptions. Decision makers may be influenced by outside pressures to make decisions that do not drastically alter the status quo. 
Although a model might justify a decision maker’s altering the status quo, the decision maker might have difficulty in choosing such a course of action. For instance, in the CAPM model, any policy change would require some subset of military personnel to receive lowered wages and/or bonuses despite another subset’s having higher wages and/or bonuses. As decision makers can directly manipulate the inputs and structure of the CAPM model, they could be driven to structure the model in such a way that it produces status quo outputs. In such a case, during the initial structuring of the modeling process the participants would do well to consider how much direct control decision makers should have over the model’s inputs. Of course, in many cases such consideration might be subject to the decision maker’s veto, and might therefore be unlikely to happen. Finally, if a modeling project is driven by a dramatic and/or catastrophic event, as in the Netherlands estuary project, model construction and interpretation may be heavily driven by this external event. Because of the nature and recency of the event, modelers may overweight the chances of such an event occurring again and decision makers may seek solutions that would minimize any repeat occurrence of the event. Analysts can best serve the overall goal of informed decisions by understanding and communicating the impact that such an event can have on both model formulation and interpretation, although that may be very difficult to do.
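How a disaggregated scorecard, like the Netherlands one discussed earlier in this section, can support several preconceptions at once can be illustrated with a small sketch. The alternatives, the criterion scores, and the function name below are hypothetical and are not the POLANO results; each score is on a 0 to 10 scale where higher is better. A stakeholder who reads only the criterion they care about finds a different “winner” in the same scorecard.

```python
# Hypothetical scorecard: alternatives x criteria, scored 0-10, higher is better.
SCORECARD = {
    "Alternative A": {"cost": 8, "security": 6, "ecology": 3},
    "Alternative B": {"cost": 4, "security": 9, "ecology": 5},
    "Alternative C": {"cost": 5, "security": 4, "ecology": 9},
}


def favorite(scorecard, criterion):
    """Alternative preferred by a stakeholder who looks only at `criterion`."""
    return max(scorecard, key=lambda alt: scorecard[alt][criterion])


if __name__ == "__main__":
    for criterion in ("cost", "security", "ecology"):
        print(f"{criterion:>8}: {favorite(SCORECARD, criterion)}")
```

Each of the three single-criterion readers can cite the same scorecard as support for a different alternative, which is why analysts presenting disaggregated outputs should anticipate selective reading.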
4.4 Recommendations and Limitations

Here is a condensed checklist of recommendations for analysts, based on the above discussion.
• Weigh carefully whether to provide default values and assumptions in a model to the user: they can influence the input data and assumptions a decision maker uses when interacting with the model.
• Understand that the way model outputs are framed (i.e., positively or negatively) can influence how a decision maker compares alternatives.
• If decision makers are to interact with a model, determine the extent and implications of the various uncertainties in the input values they will provide.
• When determining the method of model presentation, understand that each method of presentation requires weighing tradeoffs in the types of challenges to which the method is subject.
• Clearly communicate model uncertainties, assumptions, biases, and limitations.
• Encourage decision makers to understand and critique the model.
• Maintain awareness that decision makers may provide inputs or evaluate model presentations so as to justify maintaining a status quo situation or to confirm previous beliefs about the problem or system.
• Understand that recent or dramatic events can be overestimated in probability or can heavily influence how a model presentation is interpreted.

This exploratory work has several limitations. First, the small number of cases examined and the nature of the documentation available almost certainly provide only a partial view of the perceptual and cognitive challenges that affect decision makers’ use of formal models as part of their decision-making process. To fully understand these challenges, it may be necessary to conduct a series of prospective case study analyses of modeling projects. This prospective type of analysis may then lead to a more informative set of recommendations for analysts.
Second, this work looks solely at the perceptual and cognitive challenges that might impede how a decision maker understands and uses a model. The impact of other factors, such as the degree of involvement of the decision maker in the modeling processes, will also be important for further understanding of how decision makers receive and use formal models.
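One way to act on the framing recommendation, offered here as our own suggestion rather than a method used in the case studies, is to report each outcome in both gain and loss terms side by side. The sketch below does this for the Table 2 programs; the helper names are hypothetical. It also confirms that the paired programs have the same expected outcome, so any difference in choices reflects framing rather than substance. Dual framing is not guaranteed to eliminate framing effects, so it should be treated as a mitigation to test, not a cure.

```python
from fractions import Fraction

AT_RISK = 600

# Each program is a list of (probability, number saved) pairs, from Table 2.
# Programs C and D are the loss-framed descriptions of A and B.
PROGRAMS = {
    "A/C": [(Fraction(1), 200)],
    "B/D": [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],
}


def expected_saved(outcomes):
    """Expected number of people saved."""
    return sum(p * saved for p, saved in outcomes)


def both_frames(outcomes, at_risk=AT_RISK):
    """Describe every branch in gain terms (saved) and loss terms (die)."""
    return "; ".join(
        f"p={p}: {saved} saved, {at_risk - saved} die" for p, saved in outcomes
    )


if __name__ == "__main__":
    for name, outcomes in PROGRAMS.items():
        ev = expected_saved(outcomes)
        print(f"Program {name}: {both_frames(outcomes)} "
              f"(expected {ev} saved, {AT_RISK - ev} die)")
```

Both programs report an expected 200 saved and 400 dead; only the spread of outcomes differs.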
5 Conclusions

Formal models can add significant value in assisting informed decision making, but the creation of these models does not necessarily produce informed decisions. Decision makers using such models need sufficient analytic tools (models, in this case) to supplement their decision-making skills, as well as the ability to use these tools appropriately. Although there is a significant body of
work addressing how to build these models, there is a lack of empirical work examining how decision makers understand and use them. To participate fully in the decision-making process, lay decision makers need to understand the analyst’s presentation or conveyance of the formal model that is likely to support their decision. Many factors probably influence this understanding; significant among them are perceptual and cognitive challenges to understanding. The “psychological traps” framework of Hammond et al. [14] provides a helpful set of themes for retrospective analysis of the role of these challenges. Analysis of the following five case studies disclosed evidence for seven of the eleven psychological traps listed in Table 1:

1. Compensation, accessions, and personnel management (CAPM) model,
2. Navy shipbuilding and force structure analysis tool,
3. San Joaquin Valley air pollution control,
4. Obergurgl: a microcosm of economic development, and
5. POLANO: protecting an estuary from floods.
From this analysis we concluded that the psychological traps identified in the case studies stemmed from three factors: (A) the nature of the model presentation, (B) the roles of the analyst and decision maker in the modeling process, and (C) factors external to the modeling process. We offered recommendations to help analysts improve the presentation of their models. By identifying perceptual and cognitive traps along with their sources, this work provides insight not only into what challenges are likely to exist when decision makers use models as part of their decision-making process, but also into how the structure of the modeling process and the model presentation allow these challenges to arise. That insight can then help analysts combat the traps, and thus improve both the effectiveness of the modeling process and the quality of the resulting decisions.
Acknowledgments

The work of the first author was supported by a National Science Foundation (NSF) Graduate Research Fellowship, and that of the second author was supported in part by the Air Force Research Laboratory under agreement number FA9550-04-1-0192, and in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant number DAAD19-01-1-0502. The U.S. Government has certain rights in this material, and is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the sponsoring agencies or the U.S. Government.
References

1. M. V. Arena, J. F. Schank, and M. Abbott. The Shipbuilding and Force Structure Analysis Tool: A User’s Guide. RAND Corporation, Santa Monica, CA, 2004.
2. J. A. Ausink. User’s Guide for the Compensation, Accessions, and Personnel Management (CAPM) Model. RAND Corporation, Santa Monica, CA, 2003.
3. J. A. Ausink, J. A. K. Cave, and M. J. Carrillo. Background and Theory Behind the Compensation, Accessions, and Personnel Management (CAPM) Model. RAND Corporation, Santa Monica, CA, 2003.
4. J. A. Ausink and A. A. Robbert. A Tutorial and Exercises for the Compensation, Accessions, and Personnel Management (CAPM) Model. RAND Corporation, Santa Monica, CA, 2003.
5. P. L. Bernstein. Against the Gods: The Remarkable Story of Risk. John Wiley & Sons, New York, 1996.
6. K. Borcherding and R. E. Schaefer. Aiding decision making and information processing. In M. Irle, editor, Decision Making: Social Psychological and Socioeconomical Analysis, chapter 17, pages 627–673. Walter de Gruyter, Berlin, 1982.
7. D. B. Bromley. Academic contributions to psychological counselling. 1. A philosophy of science for the study of individual cases. Counselling Psychology Quarterly, 3:299–307, 1990.
8. D. G. Cacuci. Sensitivity and Uncertainty Analysis. Chapman & Hall/CRC, Boca Raton, FL, 2003.
9. J. C. Davies. Comparing Environmental Risks: Tools for Setting Government Priorities. Resources for the Future, Washington, DC, 1996.
10. J. R. Feagin, A. M. Orum, and G. Sjoberg. A Case for the Case Study. University of North Carolina Press, Chapel Hill, 1991.
11. J. D. Fine and D. Owen. Technocracy and democracy: Conflicts between models and participation in environmental law and planning. Hastings Law Journal, 56:901–982, 2005.
12. B. F. Goeller, A. Abrahamse, J. H. Bigelow, J. G. Bolten, D. M. DeFerranti, J. C. DeHaven, T. F. Kirkwood, and R. Petruschell. Protecting an estuary from floods. Vol. 1, Summary report: A policy analysis of the Oosterschelde. Technical Report R-2121/1-NETH, RAND Corporation, Santa Monica, CA, 1977.
13. M. Greenberger, M. A. Crenson, and B. L. Crissey. Models in the Policy Process: Public Decision Making in the Computer Era. Russell Sage Foundation, New York, 1976.
14. J. S. Hammond, R. L. Keeney, and H. Raiffa. Smart Choices: A Practical Guide to Making Better Decisions. Harvard Business School Press, Boston, 1999.
15. C. S. Holling. Adaptive Environmental Assessment and Management. John Wiley & Sons, Chichester, England and New York, 1978.
16. I. L. Janis and L. Mann. Decision Making: A Psychological Analysis of Conflict, Choice, and Commitment. Free Press, New York, 1977.
17. D. Kahneman, P. Slovic, and A. Tversky. Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press, Cambridge, UK; New York, 1982.
18. R. L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Cambridge University Press, Cambridge, England, 1993.
19. J. G. March. Bounded rationality, ambiguity, and the engineering of choice. The Bell Journal of Economics, 9:587–608, 1978.
20. National Research Council. Improving Risk Communication. National Academy Press, Washington, DC, 1989.
21. A. Onatski and N. Williams. Modeling Model Uncertainty. National Bureau of Economic Research, Inc., March 2003.
22. H. Raiffa. Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Addison-Wesley, Reading, MA, 1968.
23. H. Raiffa. The Art and Science of Negotiation. Belknap Press of Harvard University Press, Cambridge, MA, 1982.
24. R. G. Sargent. Verification and validation of simulation models. In S. Chick, P. J. Sanchez, D. Ferrin, and D. J. Morrice, editors, Proceedings of the 2003 Winter Simulation Conference, pages 37–48, 2003.
25. A. Tversky and D. Kahneman. The framing of decisions and the psychology of choice. Science, 211:453–458, 1981.
26. University of Maryland. EnviRN. Online document, July 2005. http://envirn.umaryland.edu/.
27. C. J. Walters. Adaptive Management of Renewable Resources. Macmillan, New York, 1986.
28. R. K. Yin. Applications of Case Study Research, volume 34 of Applied Social Research Methods Series. Sage, Thousand Oaks, CA, 2003.
29. R. K. Yin. Case Study Research: Design and Methods, volume 5 of Applied Social Research Methods Series. Sage, Thousand Oaks, CA, third edition, 2003.
Agricultural Decision Making in the Argentine Pampas: Modeling the Interaction between Uncertain and Complex Environments and Heterogeneous and Complex Decision Makers

Guillermo Podestá¹, Elke U. Weber², Carlos Laciana³, Federico Bert⁴, and David Letson¹

¹ Rosenstiel School of Marine and Atmospheric Science, University of Miami, USA
² Columbia University, Department of Psychology and Graduate School of Business, and Center for Research on Environmental Decisions (CRED), USA
³ Facultad de Ingeniería, University of Buenos Aires, Argentina
⁴ Facultad de Agronomía, University of Buenos Aires, Argentina
Summary. Simulated outcomes of agricultural production decisions in the Argentine Pampas were used to examine “optimal” land allocations among different crops identified by maximization of the objective functions associated with expected utility and prospect theories. We propose a more mathematically tractable formulation for the prospect theory value-function maximization, and explore results for a broad parameter space. Optimal actions differ among some objective functions and parameter values, especially for land tenants, whose enterprise allocation is less constrained by rotations. Our results demonstrate in a nonlaboratory decision context that psychologically plausible deviations from EU maximization matter.
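To illustrate how the two kinds of objective function can recommend different land allocations, the sketch below compares expected utility with a prospect-theory value function on a toy two-crop problem. It is not the chapter’s formulation: the crop returns, the three equally likely climate scenarios, the reference point of zero, and the omission of probability weighting are all assumptions made for illustration. The value-function parameters (α = β = 0.88, λ = 2.25) are the median estimates reported by Tversky and Kahneman (1992); the expected-utility function uses constant absolute risk aversion (CARA) and reduces to expected value when r = 0.

```python
import math

# Hypothetical net returns ($/ha) of two crops in three equally likely
# climate scenarios; these numbers are invented for illustration only.
RETURNS_A = [500.0, 200.0, -150.0]
RETURNS_B = [200.0, 150.0, 100.0]
PROBS = [1 / 3, 1 / 3, 1 / 3]


def outcomes(f):
    """Per-hectare outcome in each scenario when a fraction f goes to crop A."""
    return [f * a + (1 - f) * b for a, b in zip(RETURNS_A, RETURNS_B)]


def eu_value(f, r=0.0):
    """Expected utility with CARA utility; r = 0 reduces to expected value."""
    def u(x):
        return x if r == 0 else (1 - math.exp(-r * x)) / r
    return sum(p * u(x) for p, x in zip(PROBS, outcomes(f)))


def pt_value(f, alpha=0.88, beta=0.88, lam=2.25, ref=0.0):
    """Prospect-theory value relative to `ref`, without probability weighting."""
    def v(x):
        d = x - ref
        return d ** alpha if d >= 0 else -lam * (-d) ** beta
    return sum(p * v(x) for p, x in zip(PROBS, outcomes(f)))


def best_fraction(objective, steps=20):
    """Grid search for the share of land in crop A over {0, 1/steps, ..., 1}."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=objective)


if __name__ == "__main__":
    print("Risk-neutral EU optimum (share in crop A):", best_fraction(eu_value))
    print("CARA EU optimum, r = 0.01:", best_fraction(lambda f: eu_value(f, r=0.01)))
    print("Prospect-theory optimum:", best_fraction(pt_value))
```

With these invented numbers, the risk-neutral optimum puts all land into crop A, whose mean return is higher, while the loss-averse prospect-theory objective settles on a mixed allocation with roughly a quarter to two-fifths of the land in crop A, because larger shares of crop A create a loss in the third scenario.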
1 Introduction

The world faces the dual challenge of feeding a burgeoning 21st-century population of perhaps 9 billion while not depleting the ecosystems that sustain life and well-being. In recent decades, agricultural output has outpaced human population growth and has reduced famine. The food supply must continue to expand, however, and it must do so with reduced environmental consequences [29]. New environmental information and its innovative use will be central to this expansion. Agricultural stakeholders consistently rank climate variability among the top sources of risk to production or profits. The use of climate information
58
Guillermo Podest´ a et al.
worldwide is evolving. Whereas decisions used to be based on analysis of historical records, now there is an increasing capability to monitor and predict seasonal regional climate. The increase in scientific and technological capabilities, an increasing appreciation for the importance of climate on human endeavors (including sustainable development, poverty mitigation, and food security), and a greater demand for climate information are all providing greater incentives for the provision of climate services, which can be defined as the timely production and delivery of useful climate data, information, and knowledge to decision makers [7,30]. On seasonal-to-interannual scales, the El Ni˜ no–Southern Oscillation (ENSO) phenomenon is the major single source of climate variability in many parts of the world [38]. The emerging ability to forecast regional climate based on ENSO [2,11,23] offers agricultural decision makers the opportunity to mitigate unwanted impacts and take advantage of expected favorable conditions [15,18,27,28]. However, any efforts to foster effective use of climate information and forecasts in agriculture must be grounded in a firm understanding of the goals, objectives, and constraints of decision makers in the target system, for three reasons. First, climate data, forecasts, and technical assistance with the use of climate information are often publicly provided and highly subsidized. Estimates of the economic value of climate information and forecasts help justify investments in such publicly provided technology and infrastructure by comparing rates of return to those available from investments in other innovations. Research that estimates value of information (VOI) by simulating optimal forecast responses can provide useful insights, but actual use of climate information in agricultural production decisions and the production decisions themselves will most likely deviate from the prescriptions of normative models. 
Tests and validations of alternative descriptive models of risky decision making and probabilistic information use are thus crucial to obtaining realistic estimates of the value added by climate information. Such estimates should be based on alternative models closely linked to observed decision processes, and the impact of alternative assumptions about decision processes and goals needs to be examined. Second, the goals and objectives of farmers’ decisions (i.e., their objective functions, in decision-theoretic terms) influence how climate information (both historical data and forecasts) is used. In turn, this has implications for how climate information should be presented and communicated (i.e., the design of climate forecasts and of tutorials on climate information use). Current decisions about the contents and formats of climate forecasts make implicit assumptions about what farmers are trying to achieve and how such information will be used. It will be useful to make these assumptions explicit and put them to the test. The probabilistic nature of climate forecasts needs emphasis and explanation for all users, as probabilistic thinking is a relatively recent evolutionary accomplishment [13] and does not come naturally even to highly trained professionals [8]. Expecting a deterministic forecast that will turn out to be either “correct” or “false” is especially damaging in situations where the decision maker will experience postdecisional regret after believing that she acted on a “false” forecast. A better understanding of the outcome variables that matter to farmers will also provide guidelines on whether and how best to “translate” climate forecasts. If, for example, crop yields or the costs of production inputs get particular attention, it makes sense to “translate” a climate forecast into its implications for agronomic yields and/or costs. Third, decision makers in numerous domains have been shown to have poor insight into their own decision processes, goals, and objectives. This offers opportunities for interventions that help farmers improve their decisions. When made aware of the objective function and goals implicit in their past decisions, decision makers tend to react in one of two ways. Some are surprised by the identified objectives and the associated cues or information they are using in their decisions. Furthermore, once aware of these objectives and cues, these decision makers may wish they were not using them: examples may include unconscious gender discrimination in hiring decisions or, possibly, crop yield maximization rather than profit maximization in farm production decisions. Other decision makers may concur with the identified goals, objectives, and associated information cues once these are made apparent to them, and refuse to give them up (e.g., greater sensitivity to losses than to gains), even if they violate normative models. Identifying objective functions and decision goals will provide feedback to farmers about their implicit decision processes, which can then be reviewed and either explicitly acknowledged and accepted, or rejected, leading to the modification of decision processes.
2 Background

2.1 Choice Theories: Expected Utility and Prospect Theory

The work by von Neumann and Morgenstern [40] provided an explicit formulation of expected utility (EU) and an axiomatic foundation for it. Subsequent extensions and variations are described by Schoemaker [37]. The EU model has been central in the analysis of choice under risk and uncertainty. It has been successful not only because of its compelling axiomatic foundation and its ability to describe economic choices, but also because of its mathematical tractability [41]. Despite its obvious strengths, EU maximization as the (sole) objective of risky choice has encountered some opposition in recent years. There is both experimental and real-world evidence that individuals often do not behave in a manner consistent with EU theory [4,24]. A central assumption of EU theory is that the utility of decision outcomes is determined entirely by the final wealth they generate, regardless of context; that is, utility is an absolute, or reference-independent, construct. Yet decision makers’ evaluation of outcomes appears to be influenced by a variety of relative comparisons [19].
Prospect theory (PT) [20] and its modification, cumulative prospect theory [9,39], have become the most prominent alternatives to EU theory. Prospect theory formalizes one type of relative comparison observed when decision makers evaluate the utility of decision outcomes. Its value function v(·) is defined in terms of relative gains or losses, that is, positive or negative deviations from a reference point. Value is therefore determined by changes in wealth, rather than by reference-independent states of wealth as in utility theory [19]. Furthermore, the value function is steeper for losses than for gains, resulting in a sharp kink at the reference point. This feature of the value function models the phenomenon of loss aversion, that is, the observation that the negative experience or disutility of a loss of a given magnitude is larger than the positive experience or utility of a gain of the same magnitude. Empirical studies have consistently confirmed loss aversion as an important aspect of human choice behavior [4,5,36]. Rabin [33] emphasized the growing importance of loss aversion as a psychological finding that should be integrated into economic analysis.

2.2 EU Formulation

We define a risky prospect q = (p_1, w_1; …; p_n, w_n) as the ensemble of possible wealth/outcome values w_i with associated probabilities p_i that are nonnegative and add up to one. A common formulation (p. 104 of [16]) states that a decision maker evaluates the expected utility of prospect q as

$$ EU(q) = \sum_{i} p_i \, u(w_i) . \tag{1} $$
The following real-valued utility function u(·) is given by Pratt [32]:

$$ u(w) \propto \begin{cases} \dfrac{w^{1-r}}{1-r} & \text{if } r \neq 1, \\ \ln w & \text{if } r = 1, \end{cases} \tag{2} $$
where r is the coefficient of constant relative risk aversion (CRRA). CRRA implies that preferences among risky prospects are unchanged if all payoffs are multiplied by a positive constant [16]. The curvature of the utility function, defined by parameter r, captures all information concerning risk attitude.

2.3 Prospect Theory Formulation

In prospect theory [20], the subjective value of a prospect is defined as

$$ V(q) = \sum_{i} \Omega(p_i) \, v(\Delta w_i) , \tag{3} $$

where Δw_i represents the difference between outcome w_i and a reference point w_ref, a free parameter that separates perceived gains from perceived losses.
The subjective evaluation of this difference can be expressed as suggested by Tversky and Kahneman [39], using for simplicity the same exponent for losses and gains (which tends to be a good approximation based on empirical estimates of the two parameters and which can, of course, be changed to the more general case if so desired). That assumption allows us to write the PT value function in the following more compact form:

$$ v(\Delta w) = h(\Delta w) \, |\Delta w|^{\alpha} , \tag{4} $$

where h(Δw) is the step function

$$ h(\Delta w) = \begin{cases} 1 & \text{if } \Delta w \ge 0, \\ -\lambda & \text{if } \Delta w < 0, \end{cases} \tag{5} $$
and λ is a parameter (λ > 1) that reflects the degree of loss aversion. The exponent α in (4) ranges between 0 and 1 and describes the nonlinearity of the value function. Because h(·) changes sign at the reference point, the same exponent describes the degree of risk aversion (concavity) in the gains region and the degree of risk seeking (convexity) in the losses region. The evaluation of risky prospects is based on subjective probability weights that typically do not correspond to the objective probabilities. Tversky and Kahneman [39] propose the nonlinear function

$$ \Omega(p) = \frac{p^{\gamma}}{\left( p^{\gamma} + (1 - p)^{\gamma} \right)^{1/\gamma}} \tag{6} $$

to model the subjective weight of event probabilities, which overweights objective probabilities for outcomes at the extremes of the distribution of possible outcomes and underweights outcomes in the middle. The value of Ω(p) depends on the positive parameter γ, which must be estimated empirically.
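A minimal Python sketch of (3) to (6) follows, useful for checking how α, λ, and γ shape the value of a prospect. It applies Ω to each outcome probability separately, exactly as (3) is written here, rather than the rank-dependent weighting of cumulative prospect theory. The default γ = 0.61 is an assumption (Tversky and Kahneman's estimate for gains), since no value of γ is given in this section.

```python
def h(dw: float, lam: float) -> float:
    """Step function of Eq. (5): 1 for gains (dw >= 0), -lam for losses."""
    return 1.0 if dw >= 0 else -lam


def pt_value(dw: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    """Value function of Eq. (4): v(dw) = h(dw) * |dw|**alpha."""
    return h(dw, lam) * abs(dw) ** alpha


def weight(p: float, gamma: float = 0.61) -> float:
    """Probability weighting function of Eq. (6); gamma = 0.61 is an assumed default."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def prospect_value(probs, deltas, alpha=0.88, lam=2.25, gamma=0.61):
    """Eq. (3): sum of Omega(p_i) * v(dw_i), where dw_i = w_i - w_ref."""
    return sum(weight(p, gamma) * pt_value(d, alpha, lam)
               for p, d in zip(probs, deltas))


# A 50/50 chance of gaining or losing 100 relative to the reference point.
print(weight(0.5))                                   # below 0.5: p = 0.5 is underweighted
print(prospect_value([0.5, 0.5], [100.0, -100.0]))   # negative: the loss looms larger
```

With λ > 1 a symmetric gamble around the reference point has negative value, which is the loss-aversion property described in Section 2.1.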
3 A Case Study

We compare and contrast the objective functions or choice criteria associated with expected utility and prospect theory in a real-world optimization problem in agricultural management. The decisions examined relate to the production of cereals and oilseeds in the pampas region of central-eastern Argentina, one of the most important agricultural regions in the world [14]. In particular, we examine the nature and magnitude of differences among simulated agricultural production decisions identified as “optimal” by maximizing the objective functions associated with EU and PT. EU maximization is a widely used criterion in agricultural economics, and thus is a useful benchmark against which to compare the results of other objective functions. We argue that as proven and mathematically tractable alternatives to the EU model become available, agricultural and resource economists should at least begin to consider alternative objective functions and explore how they might improve analysis and insight [41]. The case study is organized as follows. First, we describe the agricultural production systems in the target region. We then define a set of cropping enterprises that encompasses a realistic range of initial soil conditions and management options for the typical crops in the region, namely maize, soybean, and a wheat–soybean double crop (wheat followed during the same cropping cycle by a shorter-cycle soybean). Next, we describe how yields and economic returns are simulated for each cropping enterprise using historical climate data, biophysical models, and realistic cost estimates. These results are subsequently used as input to optimization procedures. Finally, we show and discuss optimal enterprise allocations for the two objective functions considered.

3.1 The Area of Study

The geographic focus of this study is the region of central-eastern Argentina known as the pampas, one of the most productive agricultural areas in the world [14] and of major importance to the Argentine economy (51% of exports and 12% of GDP over 1999–2001 [6]). The climate, soils, and cropping systems of the Argentine pampas have been characterized by Hall et al. [14]. In particular, we focus on the region near Pergamino (33°56′S, 60°33′W), the most productive subregion of the pampas [31]. Two characteristics of agricultural production in the study region have implications for the optimization described below. First, agriculture in the pampas is market-oriented and technology-intensive. As a consequence, a broad spectrum of agronomic management options exists and can be explored in the optimization process. Second, a considerable proportion of the area currently farmed is not owned by the farmers exploiting it. Very short land leases (usually one year) provide incentives for tenants to maximize short-term profits via highly profitable crops.
In contrast, landowners tend to rotate crops to steward the long-term sustainability of production and soil quality [22]. Given the differences in decision-making goals and constraints between landowners and tenants, we model the two groups separately.

3.2 Crop Enterprises

We defined 64 different cropping enterprises that reflect a realistic range of cultivation options for the study area. Each enterprise combines (a) a given crop (maize, full-cycle soybean, or wheat–soybean), (b) various agronomic decisions (cultivar/hybrid, planting date, fertilization options), and (c) a set of initial conditions (water and nitrogen in the soil at planting) that result from previous production decisions. That is, several enterprises may be associated with the same crop while involving different management options.
3.3 Simulation of Yields: Agronomic Models

Yields for each enterprise were simulated using the crop models in the Decision Support System for Agrotechnology Transfer (DSSAT) package [17]: the generic CERES models [34] for maize and wheat, and CROPGRO [3] for soybean. These models have been calibrated and validated under field conditions in several production environments, including the pampas [12,25,26]. The information required to run the DSSAT models includes (i) daily weather data (maximum and minimum temperature, precipitation, solar radiation), (ii) “genetic coefficients” that describe physiological processes and developmental differences among crop hybrids or varieties, (iii) a description of crop management, and (iv) soil parameters, including soil moisture and N content at the beginning of simulations. Historical (1931–2003) daily weather data for Pergamino provided the information in category (i). Genetic coefficients, the management options that defined the enterprises, and likely ranges of initial soil conditions were provided by the Asociación Argentina de Consorcios Regionales de Experimentación Agrícola (AACREA), a nonprofit farmers’ group (similar in goals to the U.S. Agricultural Extension Services) that partnered with us in this study. Simulations assumed no irrigation, a very infrequent practice in the pampas. For each enterprise, 72 simulated yields were obtained (one for each cropping cycle in the 1931–2003 historical weather record used).

3.4 Simulation of Economic Outcomes

Economic outcomes were simulated for a hypothetical 600-hectare farm, the median size of AACREA farms in the Pergamino region. We computed net economic returns per hectare π_ij for year i and enterprise j as the difference between income and costs:

$$ \pi_{ij} = Y_{ij} P_j - \left( F_j + V_{ij} + S_i + T_i \right) . \tag{7} $$
Gross incomes per hectare Y_ij P_j were the product of the simulated yield for a year and enterprise (Y_ij) and a constant output price for each crop (P_j). Assumed output prices were the median of 2000–2005 prices during the month when most of the harvest is marketed (April, May, and January for maize, soybean, and wheat, respectively). After deduction of the export taxes charged by the Argentine government, these prices were 78.9, 166.0, and 112.0 US$ ton⁻¹ for maize, soybean, and wheat, respectively. Four kinds of costs were involved in the computation of net returns per hectare:

(i) Fixed costs F_j for enterprise j are independent of yield. For landowners, fixed costs included (a) crop production inputs (e.g., fertilizer, seed, field labor) and (b) the farmer’s salary, health insurance, and a fixed fiscal contribution. For land tenants, fixed costs also included (c) land rental (assumed to be 232.5 $ ha⁻¹, equivalent to the price of 1.4 tons of soybean) and (d) management costs (12 $ ha⁻¹).

(ii) Variable costs V_ij are a function of yield in year i for enterprise j. These costs included (a) harvesting costs, estimated as 8% of gross income (Y_ij P_j), (b) transportation costs (about 10 $ ton⁻¹), and (c) sales tax and commissions, estimated as 8% of gross income. Variable costs were the same for landowners and tenants.

(iii) Structural costs S_i apply only to landowners and cover (a) maintenance of farm infrastructure, (b) real estate taxes, and (c) management and technical advice. Structural costs are independent of farm activities or enterprise yields. For simplicity, however, they were approximated following a criterion used by AACREA: a percentage (23%, 18%, and 20% for maize, soybean, and wheat–soybean, respectively) of income per hectare after subtracting variable costs (Y_ij P_j − V_ij). Because structural costs are incurred even if part of the farm is not cultivated, an implicit but not unreasonable assumption, given the high cost of land around Pergamino, is that the entire 600-ha area of the hypothetical farm is cultivated.

(iv) Income tax T_i applies equally to landowners and tenants and was computed as follows:

$$ T = \begin{cases} b(\pi - a) + c & \text{if } \pi \ge a, \\ c & \text{if } \pi < a, \end{cases} \tag{8} $$

where a is a threshold income above which farmers pay an average tax rate b = 0.32. Below a, farmers pay a minimum tax c, assumed to be 59.33 $ ha⁻¹. To simplify calculations, an average annual income π of 177.5 $ ha⁻¹ (57.6 $ ha⁻¹) was assumed for owners (tenants).

3.5 Optimization Procedure

A whole-farm production model was used to identify optimal decisions for the objective functions associated with EU and prospect theory. The choice variable in the optimization is the vector x = (x_1, …, x_64) of areas in the 600-hectare hypothetical farm allocated to each of the 64 alternative cropping enterprises considered. Different land amounts allocated to the 64 enterprises were considered by the optimization of each objective function.
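The per-hectare bookkeeping in (7) and (8) can be sketched as below. This is an illustration under stated assumptions, not the study's model: the yield, the enterprise fixed cost, and the tax threshold a are hypothetical inputs (the text does not report them), and the income tax is passed in as a fixed per-hectare amount, in the spirit of the assumed average incomes above.

```python
def variable_costs(yield_t_ha: float, price: float) -> float:
    """V_ij in $/ha: harvesting (8% of gross income), transport (10 $/ton),
    and sales tax plus commissions (8% of gross income)."""
    gross = yield_t_ha * price
    return 0.08 * gross + 10.0 * yield_t_ha + 0.08 * gross


def income_tax(pi: float, a: float, b: float = 0.32, c: float = 59.33) -> float:
    """Eq. (8): average rate b on income above the threshold a, plus the minimum tax c."""
    return b * (pi - a) + c if pi >= a else c


def net_margin(yield_t_ha: float, price: float, fixed: float, tax: float,
               structural_share: float = 0.0) -> float:
    """Eq. (7): pi_ij = Y_ij * P_j - (F_j + V_ij + S + T), all in $/ha.

    structural_share is 0 for tenants; for owners it is the AACREA share of
    income net of variable costs (0.23 maize, 0.18 soybean, 0.20 wheat-soybean).
    """
    gross = yield_t_ha * price
    v = variable_costs(yield_t_ha, price)
    s = structural_share * (gross - v)
    return gross - (fixed + v + s + tax)


# Hypothetical owner soybean enterprise: 3.0 t/ha at 166 $/ton, fixed costs of
# 150 $/ha, and a hypothetical tax threshold a = 100 $/ha applied to the
# assumed average owner income of 177.5 $/ha.
tax = income_tax(177.5, a=100.0)
print(round(net_margin(3.0, 166.0, fixed=150.0, tax=tax, structural_share=0.18), 2))
```

Keeping each cost item in its own function makes it easy to switch between the owner and tenant cases, which differ only in the fixed costs and in whether structural costs apply.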
The optimization was performed using the MINOS5 algorithm in the GAMS software package [10]. For comparability, all objective functions are expressed in terms of a decision maker’s wealth, either in an absolute sense (for EU and regret-adjusted EU) or as a difference from a specified reference level (in prospect theory). The total wealth of a decision maker at the end of cropping year i is

$$ w_i = w_0 + \pi_i , \tag{9} $$
where w_0 is the decision maker’s initial wealth (i.e., prior to production decisions for year i) and π_i is the farmwide income during year i, after deducting costs. Farmwide income π_i is calculated as

$$ \pi_i = \sum_{j=1}^{m} x_j \, \pi_{ij} , \tag{10} $$

where π_ij is the net margin for year i and enterprise j (see (7)) and x_j is the amount of land allocated to enterprise j (i.e., a component of the land allocation vector x).

Expected Utility Optimization

The expected utility (1) of final wealth can be expressed as

$$ EU(\mathbf{x}) = \sum_{i=1}^{n} p_i \, u\left[ w_i(\mathbf{x}) \right] , \tag{11} $$
where p_i is the probability of a given climate scenario for year i. A climate scenario is defined as the climate conditions over an entire production cycle. We assume that all climate scenarios in the historical record have the same probability (i.e., p_i = 1/n, where n is the number of cropping cycles in the historical climate data, in this case 70 years). Therefore, we can write

$$ EU(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} u\left[ w_i(\mathbf{x}) \right] . \tag{12} $$
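Equations (2) and (9) to (12) combine into a small evaluator for a given allocation. This is a minimal sketch under stated assumptions: allocations are expressed as land shares summing to one rather than hectares, so that income and wealth stay in $ ha⁻¹ like the w_0 values of Section 3.7; the proportionality constant in (2) is dropped; and the margin table is hypothetical.

```python
import math


def crra_utility(w: float, r: float) -> float:
    """Eq. (2) without the proportionality constant: w**(1-r)/(1-r), or ln w if r == 1."""
    if w <= 0:
        raise ValueError("CRRA utility requires positive wealth")
    if r == 1.0:
        return math.log(w)
    return w ** (1.0 - r) / (1.0 - r)


def farm_income(shares, margins_by_year):
    """Eq. (10) per hectare: pi_i = sum_j x_j * pi_ij for each year i."""
    return [sum(x * m for x, m in zip(shares, row)) for row in margins_by_year]


def expected_utility(shares, margins_by_year, w0, r):
    """Eqs. (9) and (12): equally weighted mean of u(w0 + pi_i) over the years."""
    incomes = farm_income(shares, margins_by_year)
    return sum(crra_utility(w0 + pi, r) for pi in incomes) / len(incomes)


# Hypothetical net margins ($/ha) for two enterprises over three years:
# enterprise 1 has the higher mean, enterprise 2 the lower dispersion.
margins = [[100.0, 50.0], [-50.0, 40.0], [200.0, 90.0]]
for r in (0.0, 3.0):
    print(r, expected_utility([1.0, 0.0], margins, 100.0, r),
          expected_utility([0.0, 1.0], margins, 100.0, r))
```

With r = 0 the evaluator ranks allocations by mean income alone; with r = 3 and low initial wealth the less variable enterprise scores higher, the same trade-off that drives the tenant results in Section 4.1.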
The next step is the optimization

$$ \max_{\mathbf{x}} EU(\mathbf{x}) = EU(\mathbf{x}^{\star}) , \tag{13} $$
where x* = (x*_1, …, x*_64) indicates the proportion of land allocated to each enterprise that maximizes the value of EU.

Prospect Theory Value Optimization

In prospect theory, value is defined by changes in wealth rather than reference-independent wealth states. Outcomes w_i are evaluated as gains or losses with respect to reference value w_ref:

$$ \Delta w_i = w_i - w_{\mathrm{ref}} . \tag{14} $$
One plausible reference value of wealth that determines whether a farmer thinks of another wealth level as a gain or a loss is the income w_r that a farmer could achieve with minimal effort (e.g., by renting his land) added to the decision maker’s initial wealth:

$$ w_{\mathrm{ref}} = w_0 + w_r . \tag{15} $$
Combining (9) and (14) with (15), the initial wealth w_0 cancels, and we obtain

$$ \Delta w_i = (w_0 + \pi_i) - (w_0 + w_r) = \pi_i - w_r . \tag{16} $$
The total value function for prospect theory (3) can then be rewritten as

$$ V(\mathbf{x}) = \sum_{i=1}^{n} \Omega(p_i) \, v\left[ \Delta w_i(\mathbf{x}) \right] . \tag{17} $$
As for EU, all climate scenarios are assumed to have the same probability (i.e., p_i = 1/n); therefore Ω(p_i) is independent of i. Rewriting (17), we obtain

$$ V(\mathbf{x}) = \Omega\!\left( \frac{1}{n} \right) \sum_{i=1}^{n} v\left[ \Delta w_i(\mathbf{x}) \right] , \tag{18} $$

which shows that the constant Ω(1/n), being positive, is irrelevant for the optimization; thus one need not worry about the functional form of Ω. The optimization is performed in a way analogous to (13):

$$ \max_{\mathbf{x}} V(\mathbf{x}) = V(\mathbf{x}^{\star}) . \tag{19} $$
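Because Ω(1/n) multiplies every term of (18), it rescales V(x) without changing which allocation maximizes it. The sketch below makes this concrete with a brute-force search over a finite list of candidate allocations, a stand-in for the GAMS/MINOS solver used in the study rather than a reproduction of it. Allocations are land shares, the margin table is hypothetical, and the function names are illustrative.

```python
def pt_value(dw: float, alpha: float, lam: float) -> float:
    """Eqs. (4)-(5): gains raised to alpha; losses raised to alpha and scaled by -lam."""
    return abs(dw) ** alpha if dw >= 0 else -lam * abs(dw) ** alpha


def pt_objective(shares, margins_by_year, w_r, alpha=0.88, lam=2.25, omega=1.0):
    """Eq. (18): omega * sum_i v(pi_i - w_r), using Eq. (10) for pi_i and Eq. (16) for dw_i."""
    total = 0.0
    for row in margins_by_year:
        pi = sum(x * m for x, m in zip(shares, row))
        total += pt_value(pi - w_r, alpha, lam)
    return omega * total


def best_allocation(candidates, margins_by_year, w_r, omega=1.0, **params):
    """Eq. (19) by brute force: the candidate allocation with the largest V(x)."""
    return max(candidates,
               key=lambda s: pt_objective(s, margins_by_year, w_r, omega=omega, **params))


# Hypothetical margins ($/ha) for two enterprises over three years; shares on a 10% grid.
margins = [[100.0, 50.0], [-50.0, 40.0], [200.0, 90.0]]
candidates = [(k / 10, 1 - k / 10) for k in range(11)]
print(best_allocation(candidates, margins, w_r=20.0))
print(best_allocation(candidates, margins, w_r=20.0, omega=0.42))  # same allocation
```

Setting α = 1 and λ = 1 turns v into the identity, so the search then simply picks the allocation with the highest mean income, matching the risk-neutral EU case.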
Optimizing the value function with the GAMS software [10] available to us was problematic because of the discontinuity of function h(·) (defined in (5)) at Δw_i = 0, where prospect theory’s value function has a sharp kink and is not differentiable. To address the problem, we used a continuous function h̃(·) that closely approximates h(·):

$$ \tilde{h}(x) = \tfrac{1}{2} \left[ 1 - \lambda + (1 + \lambda) \tanh(\varrho x) \right] , \tag{20} $$
where ϱ is an arbitrary parameter such that ϱ > 1; larger values of ϱ (we used ϱ = 10) reproduce function h(·) more closely.

3.6 Optimization Constraints

Allocation of land to cropping enterprises differs for landowners and tenants in the Pergamino region. Landowners tend to adhere to a crop rotation that offers advantages for soil conservation and for the control of pests and diseases [22]. In contrast, land tenants seek high profits during short leases (usually one year) and thus usually select the enterprises with the greatest economic returns. These clear differences in enterprise allocation between land tenure regimes led us to explore optimal decisions separately for landowners and tenants. With three major cropping systems (maize, soybean, and a wheat–soybean double crop), the rotation advocated by AACREA allocates about 33.3% of the land to each of these cropping systems in a given year. To allow owners some flexibility in land allocation, we introduced two constraints in the optimization procedure: the land assigned to a crop could be no less than 25% and no more than 45% of the farm area. These constraints did not apply to land tenants, who could allocate the entire farmed area to a single crop. The lack of allocation constraints is consistent with the increase in soybean monocropping observed in the pampas in the last few years [35]. A final constraint specified that 100% of the land had to be assigned to some enterprise (i.e., no land could be left uncultivated).
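The smoothing in (20) and the land constraints of Section 3.6 can be sketched together as below, assuming allocations at the crop level (maize, soybean, wheat–soybean) on a 5% grid; the helper names are hypothetical. A gradient-based solver such as the one used in the study needs the smooth h̃, whereas a grid enumeration like this one could use h directly, so the two are shown side by side only for comparison.

```python
import math
from itertools import product


def h_step(x: float, lam: float) -> float:
    """Eq. (5): 1 for x >= 0, -lam for x < 0."""
    return 1.0 if x >= 0 else -lam


def h_smooth(x: float, lam: float, rho: float = 10.0) -> float:
    """Eq. (20): tanh-smoothed h; approaches h as rho * |x| grows."""
    return 0.5 * (1.0 - lam + (1.0 + lam) * math.tanh(rho * x))


def feasible_shares(step: float = 0.05, lo: float = 0.0, hi: float = 1.0):
    """Yield (maize, soybean, wheat-soybean) shares on a grid that sum to 1,
    with every share in [lo, hi] (no land is left uncultivated)."""
    n = round(1.0 / step)
    for i, j in product(range(n + 1), repeat=2):
        k = n - i - j
        if k < 0:
            continue
        shares = (i / n, j / n, k / n)
        if all(lo - 1e-9 <= s <= hi + 1e-9 for s in shares):
            yield shares


owners = list(feasible_shares(lo=0.25, hi=0.45))   # rotation bounds for landowners
tenants = list(feasible_shares())                  # no per-crop bounds for tenants
print(len(owners), len(tenants))
print(h_step(-0.5, 2.25), h_smooth(-0.5, 2.25))
```

Note that h̃(0) = (1 − λ)/2, halfway between the two branches of h. Exhaustive enumeration is practical only at this coarse crop-level granularity; with 64 enterprises a nonlinear solver is the natural choice.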
3.7 Parameter Space Explored for Each Objective Function

Each objective function has a set of parameters. In some cases, the value of a given parameter describes a personality characteristic (e.g., degree of risk aversion or loss aversion) that may vary among decision makers. Because there are no widely accepted values for these parameters, a broad range of plausible values should be considered. In this section, we describe and justify our choice of central (or nominal) parameter values.

Expected Utility

The expected utility function has two parameters: (i) the decision maker’s initial wealth w_0 and (ii) the risk-aversion coefficient r. Initial wealth w_0 is defined as liquid assets. For landowners, this quantity was estimated as 40% of the value of the farmland. The definition is based on the assumption that a farmer will not sacrifice future income potential by selling cropland, but can borrow up to 40% of her land value. The 1994–2003 average value of land for Pergamino was 3541 $ ha⁻¹, making w_0 approximately 1400 $ ha⁻¹ (3541 $ ha⁻¹ × 0.4 ≈ 1416 $ ha⁻¹). For land tenants, we assumed a w_0 value of 1000 $ ha⁻¹, the liquid assets required to finance two complete cropping cycles (i.e., in case of a total loss in one cycle, the farmer still has capital to fund a second cycle). For the risk-aversion coefficient r, we followed Anderson and Dillon’s [1] classification: 0.5, hardly risk averse; 1.0, somewhat risk averse (normal); 2.0, rather risk averse; 3.0, very risk averse; and 4.0, extremely risk averse. We also included risk neutrality by considering r = 0.0. The range of r values was the same for owners and tenants.

Prospect Theory Value Function

The value function is defined by (i) a reference wealth w_r that separates outcomes perceived as gains from those perceived as losses, (ii) a risk preference parameter α, and (iii) a loss aversion parameter λ that quantifies the relative impact of gains and losses. The combination of all three parameters defines risk aversion in PT.
For landowners, w_r was estimated as the income easily achieved by renting out the land instead of farming it: 232.5 $ ha⁻¹ (a rental fee of 1.4 ton ha⁻¹ of soybean times a price of 166 $ ton⁻¹). For land tenants, w_r was estimated as the income obtained by placing the tenant’s initial wealth (w_0 = 1000 $ ha⁻¹, as described for EU) in a bank for six months (the duration of a cropping season) at an annual interest rate of 4% (representative of current rates in Argentina). The nominal w_r value, then, was 20 $ ha⁻¹. For the risk-aversion parameter α and the loss-aversion parameter λ, we used the values empirically estimated by Tversky and Kahneman [39], 0.88 and 2.25, respectively, for both owners and tenants, but also explored a broader range of values.
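The nominal wealth and reference-income values above follow from simple arithmetic, recomputed below as a check. The only assumption added here is that the tenant's six-month deposit earns simple (not compound) interest.

```python
# Nominal parameter values of Section 3.7, recomputed from the quantities in the text.
LAND_VALUE = 3541.0      # $/ha, 1994-2003 average land value for Pergamino
LIQUID_SHARE = 0.40      # share of land value a landowner can borrow against
SOY_PRICE = 166.0        # $/ton, net of export taxes
RENT_IN_SOY = 1.4        # ton/ha of soybean paid as rent
TENANT_W0 = 1000.0       # $/ha, liquid assets for two cropping cycles
ANNUAL_RATE = 0.04       # annual bank interest rate
SEASON_YEARS = 0.5       # six-month cropping season

owner_w0 = LIQUID_SHARE * LAND_VALUE                # about 1416; rounded to 1400 in the text
owner_wr = RENT_IN_SOY * SOY_PRICE                  # about 232.4; quoted as 232.5 in the text
tenant_wr = TENANT_W0 * ANNUAL_RATE * SEASON_YEARS  # simple interest over one season

print(round(owner_w0, 1), round(owner_wr, 1), round(tenant_wr, 1))
```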
4 Results

This section describes the land allocations (i.e., the proportion of land assigned to different enterprises) identified as optimal for each objective function. Only seven of the 64 possible cropping enterprises were selected by the various optimizations.

4.1 EU Maximization

Landowners

The enterprise allocation that maximized expected utility for landowners was constant over the full range of initial wealth and risk aversion values explored (Figure 1). The maximum area allowed for one crop by the optimization constraints defined for owners (45% of total land) was allocated to full-cycle soybean Soy14, the enterprise with the highest average economic returns (x̄ = 188.1 $ ha⁻¹) over the 70 simulated cropping cycles. Conversely, the minimum area required by the constraints (25%) was allocated to maize, the crop with the lowest average profits; Ma23, the enterprise with the highest average profits for this crop (x̄ = 116.5 $ ha⁻¹), was selected. The remainder of the area (30%) was allocated to the wheat–soybean enterprise SW21, whose average profits were between those of full-cycle soybean and maize (x̄ = 168.8 $ ha⁻¹). The stability of results for all parameter combinations illustrates the importance of ecological or logistic constraints associated with maintaining a crop rotation: these constraints clearly override any financial or personality characteristics of a decision maker.

Land Tenants

For land tenants, only two enterprises (full-cycle soybean Soy14 and wheat–soybean SW21) were involved in the maximization of expected utility. Because of the markedly lower profits of maize (due to higher production costs and low prices) and the lack of area constraints for tenants, this crop did not appear at all in the optimal land allocations. The relative proportions of the two selected enterprises (Soy14 and SW21) depended on the combination of parameters. Figure 2 has four panels with increasingly higher levels of risk aversion r (from top to bottom).
In each panel, the optimal allocation of land is shown as a function of initial wealth w0 . For a risk-neutral decision maker (r = 0; Figure 2, upper panel), the optimal action was to allocate the entire area to the double crop enterprise SW21; this result is constant for the entire range of values considered for w0 . Because the decision maker is risk-neutral, the selection of SW21 was based only on its higher mean profit relative to Soy14 (77.6 $ ha−1 versus 69.4 $ ha−1 ), and ignored the higher risks associated with the considerably larger dispersion of profits (122.0 $ ha−1 versus 89.0 $ ha−1 for Soy14). For moderate risk aversion
Agricultural Decision Making in the Argentine Pampas
69
Fig. 1. Land allocation (as proportion of the hypothetical 600-ha farm) that maximizes expected utility for landowners. The selected combination of enterprises is constant for all initial wealth w0 and risk aversion r.
values (r = 1.0; Figure 2, second panel from top) and low w0 (below about 1100 $ ha−1 ), the optimal action involved about 75% of the land allocated to SW21 and about 25% to Soy14. For higher w0 values, the optimal action was to allocate the entire area to the double crop enterprise SW21. When slightly higher amounts of risk aversion are considered (r = 1.5; Figure 2, third panel from top), the optimal action involved diversification of enterprises for most values of w0 . For low w0 , diversification is highest: 60% of the land was allocated to SW21 and 40% to Soy14. As w0 increases (and, thus, decision makers can afford higher financial risks), the proportion of land assigned to SW21 grew until this enterprise occupied the entire area, resembling results for risk-neutrality. In other words, increasing initial wealth compensates, to some degree, for the effects of risk aversion. Finally, for a highly risk-averse decision maker (r = 3.0; Figure 2, bottom panel), the optimal land allocation was fairly conservative, as maximum crop diversification (comparable proportions of SW21 and Soy14) prevailed throughout the w0 range. 4.2 Prospect Theory Landowners The land allocation that maximized PT’s value for landowners was fairly similar to results for EU. As for EU, full-cycle soybean Soy14 was the enterprise with the largest area (45%). The area allocated to maize (25%) was again the minimum required by optimization constraints. Unlike EU, though, three different maize enterprises (Ma21, Ma23, and Ma24) were selected for
70
Guillermo Podest´ a et al.
Fig. 2. Land allocation (as proportion of the hypothetical 600-ha farm) that maximizes expected utility for land tenants. The four panels show the results for risk neutrality (r = 0, upper panel), small risk aversion (r = 1.0), moderate risk aversion (r = 1.5), and pronounced risk aversion (r = 3.0, bottom panel), in each case plotted as a function of initial wealth w0 .
different portions of the parameter space (figure not shown). All three enterprises had very similar average returns (113.2, 116.5, and 116.3 $ ha−1 for Ma21, Ma23, and Ma24, respectively). The maize enterprise with the highest dispersion (Ma21, SD = 106.8 $ ha−1 ) prevailed for risk-seeking parameter combinations, whereas the less variable enterprise (Ma23, SD = 84.1 $ ha−1 ) was characteristic of moderate and high-risk aversion. Ma21 only appeared for intermediate reference wealth and lower risk preferences. Nevertheless, Kolmogorov–Smirnov tests showed that distributions of economic returns for the three maize enterprises were not significantly different from one another, therefore any differences in land allocation to maize can be considered minor. The wheat–soy double crop (in most cases, enterprise SW21, but also SW20)
Agricultural Decision Making in the Argentine Pampas
occupied the remaining area. As for EU, results are consistent with the relative average profitability of each crop. Furthermore, the similarity with the EU results suggests that constraints associated with maintaining the crop rotation prevail over personality characteristics, so that optimal allocations are similar even for fairly different objective functions.

Land Tenants

Just as for EU, the land allocation that maximized prospect theory's value function for tenants involved two enterprises: full-cycle soybean (Soy14) and wheat–soybean (SW21). As for EU, the specific proportions of these enterprises depended on the combination of parameters. The top-left panel of Figure 3 (wr = 10 $ ha−1 and λ = 1.00) can be used as a reference to discuss the consequences of varying prospect theory's parameters. In this panel, there is no loss aversion. Also, a low reference wealth puts most outcomes into the domain of gains, where low α values imply a more risk-averse decision maker. As a result, a diversified land allocation including two enterprises (Soy14 and SW21) is selected. As α increases and the decision maker becomes less risk-averse, the allocation shifts toward an increasingly higher proportion of the more profitable but riskier SW21, until monoculture is reached. As we move along the top row of Figure 3, we find a mixture of the two dominant enterprises in the top-center and top-right panels. Nevertheless, there is always a higher proportion of the less risky Soy14. These conservative land allocations reflect the effect of increasing loss aversion. If we move down the left column of Figure 3, the switch from diversification to a monoculture of SW21 begins at progressively lower values of α. This is because, as wr increases, an increasing proportion of outcomes is perceived as losses, a domain in which α indicates risk seeking.
Risk-seeking to risk-neutral decision makers choose the riskier option (SW21) in search of higher profitability, and thus enterprise selection in the bottom-left panel is identical to that of risk-neutral EU maximizers (top panel in Figure 2). This pattern is also apparent in the bottom-center panel of Figure 3, where there is now also loss aversion (λ = 2.25). The high reference wealth (wr = 80 $ ha−1) implies that a high proportion of outcomes is perceived as losses. For low α values, the decision maker is more risk-seeking and thus selects the riskier SW21 in order to attain higher profits and get out of the domain of losses. As α increases, risk seeking decreases and the selected allocation becomes diversified, as loss aversion now takes effect. The result is a higher proportion of the less variable enterprise Soy14. When loss aversion is even stronger, as in the right column of Figure 3 (λ = 3.50), this effect takes over and dictates diversification across the whole range of α, even more so at higher levels of reference wealth, where more outcomes fall in the domain of losses and are hence subject to loss aversion.
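To make the roles of wr, α, and λ concrete, here is a minimal sketch of a prospect-theory valuation. It assumes the common power form of the value function (as in Tversky and Kahneman [39]), with the same exponent α for gains and losses, outcomes measured relative to the reference wealth wr, and no probability weighting; the chapter's exact specification may differ. The two prospects are hypothetical and are not enterprise returns from this study.

```python
def pt_value(x: float, ref: float, alpha: float, lam: float) -> float:
    """Prospect-theory value of outcome x relative to reference wealth ref.

    Assumed power form: gains d = x - ref >= 0 are valued d**alpha and
    losses are valued -lam * (-d)**alpha, so lam > 1 means losses loom larger.
    """
    d = x - ref
    if d >= 0:
        return d ** alpha
    return -lam * (-d) ** alpha


def prospect_value(outcomes, probs, ref, alpha, lam):
    """Probability-weighted sum of values (no decision weights: an assumption)."""
    return sum(p * pt_value(x, ref, alpha, lam) for x, p in zip(outcomes, probs))


# Hypothetical returns in $ ha-1, not enterprise data from this study:
SAFE = ([100.0], [1.0])               # a sure 100
RISKY = ([0.0, 220.0], [0.5, 0.5])    # 50/50 gamble with a higher mean (110)

if __name__ == "__main__":
    for alpha in (0.5, 1.0):
        safe = prospect_value(*SAFE, ref=10.0, alpha=alpha, lam=1.0)
        risky = prospect_value(*RISKY, ref=10.0, alpha=alpha, lam=1.0)
        print(f"alpha={alpha}: safe={safe:.2f}, risky={risky:.2f}")
```

With wr = 10 and λ = 1, the sure prospect scores higher at α = 0.5 (9.49 versus 5.66), whereas the gamble scores higher at α = 1 (100.00 versus 90.00). This echoes how rising α shifts the top-left panel of Figure 3 toward the riskier SW21.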
Fig. 3. Land allocation (as proportion of the hypothetical 600-ha farm) that maximizes prospect theory’s value function for land tenants. The different panels correspond to combinations of wr (reference wealth in $ ha−1 , increasing from top to bottom) and λ (loss aversion parameter, increasing from left to right). In each panel, the optimal allocation of land is shown as a function of the risk aversion coefficient α.
5 Discussion

Our results demonstrate in a nonlaboratory decision context that, in some cases, psychologically plausible deviations from EU maximization lead to differences in optimal land allocation decisions. For example, for nominal or central values of the parameters (considered typical of many decision makers), EU maximization generally suggests that tenants should allocate a much greater proportion of land to the riskier SW21 enterprise than PT value maximization does. The loss aversion that is essential in PT's formulation dictates more conservative strategies (predominance of Soy14) for this objective function. Nevertheless, in situations where there are prescribed constraints on land allocation (e.g., those associated with maintaining an ecologically sound crop rotation), results are very similar for quite different objective functions (EU and PT's value) and for a broad range of personality and economic characteristics of decision makers. This consistency illustrates one of the important goals of institutions and social norms, namely, to make behavior more predictable.
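Why both r and w0 matter for EU can be illustrated with a risk premium. This minimal sketch assumes, since this part of the chapter does not restate the utility form, a constant relative risk aversion (CRRA) utility of final wealth, u(w) = w^(1−r)/(1−r) with log utility at r = 1, where r is relative risk aversion in Pratt's sense [32]. The ±100 $ ha−1 gamble is hypothetical.

```python
import math


def crra_utility(w: float, r: float) -> float:
    """CRRA utility of final wealth w > 0 (assumed form); log utility at r == 1."""
    if w <= 0:
        raise ValueError("final wealth must be positive")
    if r == 1.0:
        return math.log(w)
    return w ** (1.0 - r) / (1.0 - r)


def inverse_crra(u: float, r: float) -> float:
    """Final wealth whose CRRA utility equals u."""
    if r == 1.0:
        return math.exp(u)
    return (u * (1.0 - r)) ** (1.0 / (1.0 - r))


def risk_premium(w0, outcomes, probs, r):
    """Expected gain minus certainty equivalent of a gamble added to wealth w0."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    eu = sum(p * crra_utility(w0 + x, r) for x, p in zip(outcomes, probs))
    certainty_equivalent = inverse_crra(eu, r) - w0
    return mean - certainty_equivalent


GAMBLE = ([-100.0, 100.0], [0.5, 0.5])  # hypothetical +/-100 $ ha-1 swing

if __name__ == "__main__":
    for r in (0.0, 1.0, 1.5, 3.0):
        row = [risk_premium(w0, *GAMBLE, r) for w0 in (200.0, 1000.0)]
        print(f"r={r}: premium at w0=200: {row[0]:.1f}, at w0=1000: {row[1]:.1f}")
```

Under these assumptions the premium is zero at r = 0, grows with r, and shrinks as w0 rises (for log utility, from about 26.8 at w0 = 200 to about 5.0 at w0 = 1000). This is the same qualitative pattern as in Figure 2, where higher initial wealth partly offsets risk aversion.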
Optimization of any utility or value function reflects a tradeoff between the expected profits of an enterprise and its risk, or dispersion of outcomes. Interestingly, the different objective functions shape this tradeoff in different ways, each consistent with the characteristics of the function. In EU optimization, more risk-averse land allocation is encouraged by greater risk aversion (a higher parameter r) and by lower initial wealth w0. In contrast, in PT value optimization, risk-averse behavior is encouraged by a lower reference wealth (which divides the perception of returns into gains versus losses) and much more by the individual loss aversion parameter λ than by the individual risk preference parameter α. The importance of loss aversion is not surprising, given the centrality of this process in PT. Similarly, more risk-seeking land allocation is encouraged by different processes and parameters under the different objective functions. For EU maximization, both r (less risk aversion) and w0 (greater initial wealth) are deciding factors (top panel, and right ends of the two middle panels, of Figure 2). In PT value optimization, on the other hand, less risk-averse land allocations come about when the decision maker has no loss aversion but a high reference value, so that most outcomes are in the domain of losses, where choices are either risk-seeking or at best risk-neutral (bottom-left panel of Figure 3).

5.1 Relevance of Results

We envision three main applications of the work presented here. First, an improved understanding of individual differences in preferences and objective functions (when they induce different optimal land allocations) may allow the development of agronomic advice tailored to the personality characteristics of different types of farmers. Such advice will be more effective than the common "one size fits all" agronomic recommendations.
Second, knowledge of individual preferences may be helpful to guide the framing and to assess the acceptability of regional or national policies of agricultural sustainability (e.g., policies that encourage crop diversification). Finally, an understanding of production decisions in agriculture may contribute to a better understanding and thus better planning and implementation of a range of related issues, such as adoption of technological innovations and adaptation to climate change.
Acknowledgments

This work was supported by an NSF Biocomplexity in the Environment grant (BE-0410348), an NSF Center grant under the Decision Making under Uncertainty Initiative (SES-0345840), and a NOAA Human Dimensions grant (GC04-159). F. Bert is supported by a doctoral fellowship from Argentina's Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET). The authors are grateful to the management, technical advisors, and
farmer members of the Asociación Argentina de Consorcios Regionales de Experimentación Agrícola for their commitment to this research.
References

1. J. R. Anderson and J. L. Dillon. Risk Analysis in Dryland Farming Systems. Number 2 in Farm Systems Management Series. FAO, Rome, 1992.
2. A. G. Barnston, M. H. Glantz, and Y. He. Predictive skill of statistical and dynamical climate models in SST forecast during the 1997–98 El Niño episode and the 1998 La Niña onset. Bulletin of the American Meteorological Society, 80:217–243, 1999.
3. K. J. Boote and J. W. Jones. Simulation of crop growth: CROPGRO model. In R. M. Peart and R. B. Curry, editors, Agricultural Systems Modeling and Simulation, pages 651–692. Marcel Dekker, New York, 1998.
4. C. Camerer. Prospect theory in the wild. In D. Kahneman and A. Tversky, editors, Choice, Values, and Frames, pages 288–300. Cambridge University Press, New York, 2000.
5. C. Camerer. Three cheers—psychological, theoretical, empirical—for loss aversion. Journal of Marketing Research, 42:129–133, 2005.
6. D. Díaz. Una fábrica de empleos. Clarín Rural, page 4, 2002.
7. J. A. Dutton. Opportunities and priorities in a new era for weather and climate services. Bulletin of the American Meteorological Society, 83:1303–1311, 2002.
8. D. M. Eddy. Probabilistic reasoning in medicine: Problems and opportunities. In D. Kahneman, P. Slovic, and A. Tversky, editors, Judgment Under Uncertainty: Heuristics and Biases, pages 249–267. Cambridge University Press, New York, 1982.
9. H. P. Fennema and P. P. Wakker. Original and cumulative prospect theory: A discussion of empirical differences. Journal of Behavioral Decision Making, 10:53–64, 1997.
10. P. E. Gill, W. Murray, B. A. Murtagh, M. A. Saunders, and M. Wright. GAMS/MINOS. In GAMS–The Solver Manuals. GAMS Development Corporation, 2000.
11. L. Goddard, S. Mason, S. Zebiak, C. Ropelewski, R. Basher, and M. Cane. Current approaches to seasonal-to-interannual climate predictions. International Journal of Climatology, 21:1111–1152, 2001.
12. E. Guevara, S. Meira, M. Maturano, and G. Coco. Maize simulation for different environments in Argentina. In International Symposium: Modelling Cropping Systems, pages 193–194. European Society of Agronomy, University of Lleida, Catalonia, Spain, 1999.
13. I. Hacking. The Emergence of Probability. Cambridge University Press, New York, 1975.
14. A. J. Hall, C. M. Rebella, C. M. Ghersa, and J. P. H. Culot. Field crops systems of the Pampas. In C. J. Pearson, editor, Field Crops Systems: Ecosystems of the World, pages 413–449. Elsevier, Amsterdam, 1992.
15. G. Hammer, J. Hansen, J. Phillips, J. Mjelde, H. Hill, A. Love, and A. Potgieter. Advances in application of climate prediction in agriculture. Agricultural Systems, 70:515–553, 2001.
16. J. B. Hardaker, R. B. M. Huirne, J. R. Anderson, and G. Lien. Coping with Risk in Agriculture. CABI Publishing, Cambridge, MA, 2004.
17. J. Jones, G. Tsuji, G. Hoogenboom, L. Hunt, P. Thornton, P. Wilkens, D. Imamura, W. Bowen, and U. Singh. Decision support system for agrotechnology transfer. In G. Tsuji, G. Hoogenboom, and P. Thornton, editors, Understanding Options for Agricultural Production, chapter Dordrecht, The Netherlands, pages 157–177. Kluwer, Boston, 1998.
18. J. W. Jones, J. W. Hansen, F. S. Royce, and C. D. Messina. Potential benefits of climate forecasting to agriculture. Agriculture, Ecosystems and Environment, 82:169–184, 2000.
19. D. Kahneman. Perspective on judgment and choice: Mapping bounded rationality. American Psychologist, 58:697–720, 2003.
20. D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47:263–291, 1979.
21. C. E. Laciana and E. U. Weber. Using prospect theory in optimization problems. Technical report, Department of Psychology, Columbia University, New York, 2005.
22. B. Leteinturier, J. L. Herman, F. D. Longueville, L. Quintin, and R. Oger. Adaptation of a crop sequence indicator based on a land parcel management system. Agriculture, Ecosystems & Environment, 112:324–334, 2006.
23. S. J. Mason, L. Goddard, N. E. Graham, E. Yulaeva, L. Sun, and P. A. Arkin. The IRI seasonal climate prediction system and the 1997/98 El Niño event. Bulletin of the American Meteorological Society, 80:1853–1873, 1999.
24. D. McFadden. Rationality for economists? Journal of Risk and Uncertainty, 19:73–105, 1999.
25. S. Meira, E. Baigorri, E. Guevara, and M. Maturano. Calibration of soybean cultivars for two environments in Argentina. In Global Soy Forum, Chicago, IL, August 1999.
26. J. L. Mercau, J. L. Dardanelli, D. J. Collino, J. M. Andriani, A. Irigoyen, and E. H. Satorre. Predicting on-farm soybean yields in the pampas using CROPGRO-Soybean. Field Crop Research, 100:200–209, 2007.
27. J. Mjelde, H. Hill, and J. Griffiths. A review of current evidence on climate forecasts and their economic effects in agriculture. American Journal of Agricultural Economics, 80:1089–1095, 1998.
28. J. Mjelde, T. Thompson, C. Nixon, and P. Lamb. Utilizing a farm-level decision model to help prioritize future climate prediction research needs. Meteorological Applications, 4:161–170, 1997.
29. National Research Council. Our Common Journey: A Transition Towards Sustainability. National Academy Press, Washington, DC, 1999.
30. National Research Council. A Climate Services Vision: First Steps Toward the Future. National Academy Press, Washington, DC, 2001.
31. J. Paruelo and O. Sala. Effect of global change on Argentine maize. Climate Research, 3:161–167, 1993.
32. J. W. Pratt. Risk aversion in the small and in the large. Econometrica, 32:122–136, 1964.
33. M. Rabin. Psychology and economics. Journal of Economic Literature, 36:11–46, 1998.
34. J. Ritchie, V. Singh, D. Godwin, and W. Bowen. Cereal growth, development and yield. In G. Tsuji, G. Hoogenboom, and P. Thornton, editors, Understanding Options for Agricultural Production, pages 79–98. Kluwer Academic, Dordrecht, The Netherlands, 1998.
35. E. H. Satorre. Cambios tecnológicos en la agricultura actual. Ciencia Hoy, 15:24–31, 2005.
36. U. Schmidt and H. Zank. What is loss aversion? Journal of Risk and Uncertainty, 30:157–167, 2005.
37. P. J. H. Schoemaker. The expected utility model: its variants, purposes, evidence and limitations. Journal of Economic Literature, 20:529–563, 1982.
38. K. E. Trenberth and D. P. Stepaniak. Indices of El Niño evolution. Journal of Climate, 14:1697–1701, 2001.
39. A. Tversky and D. Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5:297–323, 1992.
40. J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1944/1947.
41. R. T. Woodward. Should agricultural and resource economists care that the subjective expected utility hypothesis is false? In Meeting of the American Agricultural Economics Association, Salt Lake City, UT, August 1998. http://agecon2.tamu.edu/people/faculty/woodward-richard/paps/AAEA98-Uncertainty.pdf.
On Optimal Satisficing: How Simple Policies Can Achieve Excellent Results

J. Neil Bearden and Terry Connolly
University of Arizona, Department of Management and Organizations, 405 McClelland Hall, Tucson, AZ 85721, USA

Summary. Herbert Simon introduced the notion of satisficing to explain how boundedly rational agents might approach difficult sequential decision problems. His satisficing decision makers were offered as an alternative to optimizers, who need impressive computational capacities in order to maximize their payoffs. In this chapter, we present a simplified sequential search problem for a satisficing decision maker and show how to compute its optimal satisficing search policies. Our analyses provide bounds on the performance of satisficing search policies.
1 Introduction

The assumptions upon which rational choice theories (including neoclassical economic theory, game theory, and the like) are built are often quite strong, and imply implausibly large computational powers in human decision makers. Simon [9,10] recognized this and proposed an alternative to the maximization objective upon which these theories rest. He suggested that actual agents try to find alternatives that are “good enough” rather than those that maximize their payoffs. He referred to the former objective as satisficing. Superficially, it seems that these objectives are at odds and that they should produce quite different decision outcomes. In the current chapter, we show that these suspicions are not warranted: satisficing, if done well, can produce outcomes on par with those obtained by maximizing.

Consider the commonplace problem of trying to find a house to buy. Presumably, one would like to minimize the cost and maximize the quality of the house one chooses. Furthermore, one might wish to maximize the quality of one’s neighbors and of the surrounding schools, and to minimize the distance one will travel to work. And so on and so on. The complexity of optimally selecting a home quickly blows up when one simultaneously considers all of these factors. As traditionally conceived, a satisficing decision maker (DM) need not, for example, solve any difficult nonlinear mathematical programming problems.
J. Neil Bearden and Terry Connolly
She simply decides what she will find acceptable in a home and takes the first one that meets these standards. For example, she might decide that she cannot spend more than $100,000, that she will not live more than 30 minutes from work, and that she must have a tree in her front yard. She does not worry about making complicated trade-offs among the attributes of the homes. She just tries to find one that meets her aspirations. This procedure simplifies the choice process, and the computations involved in executing it, enormously. Similar processes are commonly used by multiperson decision-making groups to avoid the problem of reaching agreement on a group multiattribute utility function. For example, the committee charged with selecting a new university president typically (and intendedly) includes representatives of a wide range of constituents, and would face a huge challenge if they needed to make explicit all the trade-offs they each made in evaluating each candidate. The task is greatly simplified if the committee can merely list minimal criteria and search for candidates that meet them all. Simon [9] suggested that DMs such as our househunter might, for a number of reasons, use such simplified (0–1) payoff functions in order to make decisions. Most important for us here, he suggested that it is often difficult to map a vector of apples (e.g., price) and oranges (e.g., driving time) to a scalar payoff because making appropriate trade-offs is difficult. To get around this, when setting out to shop, a DM can simply decide on the minimum number of apples and the minimum number of oranges she will find acceptable. It is easy to see why a DM might make decisions in a satisficing manner when alternatives are encountered sequentially. What is not clear is how well a (0–1) payoff function captures individual preferences. Presumably, three apples and three oranges at $x are preferred to two oranges and two apples at $x, all things being equal. 
When faced with the two alternative bundles simultaneously, the DM would choose the former. For illustration, let us suppose that a DM is faced with a problem involving just two attributes. For each, he sets an aspiration level, and an alternative that falls below either of his aspiration levels will be rejected. Those alternatives that are not immediately rejected are acceptable. An acceptable alternative satisfices. Figure 1 shows this graphically. The alternatives that satisfice are those that fall within the shaded region. The arrows represent the gradient of the DM’s true (non-(0–1)) payoff function at various combinations of the two attribute values. According to Simon [9], the DM might sensibly set his aspiration level for an attribute at a point where the returns on its values are rapidly diminishing. But, importantly, even within the satisficing set, the gradient is positive: some acceptable alternatives would be preferred to others. Should the DM thus set higher aspiration levels? The satisficing DM faces a dilemma in setting aspiration levels. If they are set too high, the satisficing region will shrink and, ceteris paribus, the odds of finding an alternative that satisfices will decrease. On the other hand, if they are set too low, the DM will easily find an alternative, but it may be quite poor. Because, by definition, a satisficing DM will take the first alternative
Optimal Satisficing
she encounters whose attributes meet all of her aspiration levels, her problem lies in determining how best to set her aspiration levels. In this chapter, we present a procedure for finding aspiration levels for satisficing search policies that maximize expected payoffs. That is, given that one has settled on a satisficing approach, how can the thresholds be best set?
Fig. 1. Partitioning of the feasible decision alternatives into those that satisfice (shaded region) and those that do not (the rest). More is better on each attribute. The arrows represent the gradient field for the DM’s payoff function. Hence, her payoff is increasing in both attributes but is doing so at a decreasing rate as she approaches the acceptable region.
As we conceive it here, a multiattribute satisficing procedure substitutes a simplified, multithreshold decision rule for what is, in fact, a more complex multiattribute utility function. The simplification may be driven by a desire to avoid the complex assessments and computations involved in making explicit multiattribute trade-offs, by political considerations such as opposition to putting explicit dollar values on saving human lives, or as a device to accommodate the desires of multiple actors whose preferences may not be entirely compatible. The simplification will, in general, exact a payoff penalty. A DM might accept an option that narrowly satisfies all her criteria when another available option would have yielded a higher overall payoff—for example, by offering high values on several important attributes with low, perhaps below-threshold values on other, less important ones. It is also clear that some satisficing strategies are better than others. Some will cause the DM to search for a hopelessly ideal alternative; others will cause her to accept inferior alternatives.
There are thus two components to the overall loss in expected payoff associated with using a satisficing rather than a maximizing strategy. One component, which we refer to as inherent penalty, is the result of substituting binary, good/bad evaluations for what are in reality continuous multiattribute utility functions. The second, which we refer to as aspiration error, is loss of expected payoff resulting from the poor choice of aspiration levels. Assessing aspiration error obviously requires us to specify what good aspiration levels would be. Thus the title, and primary purpose, of our chapter: we want to be able to specify, for one important class of problems, the set of aspiration levels that yields the best possible expected payoff to a DM who is constrained to use a satisficing strategy. We want, in short, a tool that can allow us to distinguish between payoff loss resulting from satisficing and payoff loss resulting from satisficing badly. The rest of the chapter is organized as follows. Section 2 presents a formal description of a satisficing problem involving sequential search through a finite set of decision alternatives, and also a general procedure for computing its optimal policy. In Section 3, we present a number of special cases of the general problem and show the optimal policies for each. The general problem can be specified in a very large number of ways, so we have attempted to select special cases that broadly span the space of possibilities. We examine the inherent penalty for following satisficing policies in Section 4 by presenting a procedure for solving the maximization version of our problem, and then comparing its results to those obtained by satisficing. Section 5 describes a method for computing optimal heuristic policies, which are constrained versions of the policies discussed in Sections 2 and 3. 
Results from applying the heuristic policies to the problems studied in Section 3 suggest that little is lost when the satisficing heuristics are constrained to be quite simple. An infinite-horizon version of the satisficing problem in which the DM pays search costs is presented in Section 6. We conclude in Section 7 and discuss some basic implications of this work.
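The two components of loss defined earlier in this section add up to the total loss by construction. In notation introduced here only for illustration, write $\bar{V}$ for the expected payoff of the optimal maximizing policy, $V_1^*$ for that of the optimal satisficing policy, and $V_1(\theta)$ for that of an arbitrary satisficing policy θ (the symbols $V_1^*$ and $V_1(\theta)$ anticipate Section 2):

$$\underbrace{\bar{V} - V_1(\theta)}_{\text{total loss}} = \underbrace{\bar{V} - V_1^*}_{\text{inherent penalty}} + \underbrace{V_1^* - V_1(\theta)}_{\text{aspiration error}}.$$

The aspiration error is nonnegative because $V_1^*$ is the best value over all satisficing policies, and the inherent penalty is nonnegative as long as the maximizer observes the same attributes, since any threshold rule is then one of the policies available to it.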
2 Problem Statement and Optimal Policy

Suppose a DM may examine as many as N decision alternatives. Each alternative $n \in \{1, \ldots, N\}$ is represented by a K-dimensional vector of random variables $X_n = (X_n^1, X_n^2, \ldots, X_n^K)$, the realization of which is denoted $x_n = (x_n^1, x_n^2, \ldots, x_n^K)$. The vector elements correspond to the attributes of the decision alternatives. For each attribute k of each alternative n, we assume that the DM has an aspiration level $\theta_n^k$, which determines which of that attribute’s values she finds acceptable; specifically, an attribute is acceptable if and only if $x_n^k \ge \theta_n^k$. The set of all stage-n aspiration levels is represented by $\theta_n = \{\theta_n^1, \theta_n^2, \ldots, \theta_n^K\}$. An alternative n is acceptable (or satisfices) if and only if $x_n^k \ge \theta_n^k$ for all k. For shorthand, we use the indicator
$$\sigma_n = \begin{cases} 1 & \text{if } x_n^k \ge \theta_n^k \ \forall k, \\ 0 & \text{otherwise.} \end{cases}$$
A DM’s payoff for selecting an alternative n, denoted $\phi(x_n)$, is some function of the alternative’s attribute values. To be consistent with using aspiration levels to make selection decisions, the function should be monotonically increasing in its arguments. For example, the payoff function can be additive in its arguments, $\phi(x_n) = \sum_k x_n^k$; multiplicative, $\phi(x_n) = \prod_k x_n^k$ with $x_n^k \ge 0$ for all n, k; or some combination such as $\phi(x_n) = x_n^1 + x_n^2 x_n^3$ with $x_n^k \ge 0$ for all n, k. The DM sees the N alternatives sequentially and must select one and only one of them. The alternative that is selected is the first one for which $\sigma_n = 1$, that is, the first one that satisfices. Once passed, an alternative cannot be recalled. The DM’s objective is to maximize his expected payoff. He controls his fate only by setting his aspiration levels. In order to set sensible aspiration levels, the DM must know something about the distribution of the alternatives in the world. For tractability, we assume that the DM has full knowledge of the multivariate distribution from which the alternatives are taken, and denote the density for a vector of attributes x at stage n by $f_n(x)$. By allowing the density to vary over n, we can capture, for example, situations in which the quality of encountered alternatives is expected to increase ($E[\phi(X_n)] < E[\phi(X_{n+1})]$), decrease ($E[\phi(X_n)] > E[\phi(X_{n+1})]$), or stay the same ($E[\phi(X_n)] = E[\phi(X_{n+1})]$) over time. Pairs of attributes i and j within an alternative may be correlated ($\rho_{i,j} \ne 0$) or uncorrelated ($\rho_{i,j} = 0$), but we assume that the alternative vectors themselves are uncorrelated. Given a set of aspiration levels at stage n, the probability that an alternative will be acceptable (i.e., that $\sigma_n = 1$) can be expressed as

$$P(\sigma_n = 1 \mid \theta_n) = \int_{\theta_n^1}^{\infty} \cdots \int_{\theta_n^K}^{\infty} f_n(x)\, dx^1 \cdots dx^K.$$
When the DM does not know whether the attributes of an alternative are acceptable, the expected payoff for selecting that alternative is denoted $E[\phi(X_n)]$. When the DM does know that the alternative is acceptable, we denote the expected payoff for selecting it by $E[\phi(X_n) \mid \sigma_n = 1]$. For an arbitrary payoff function φ, these expectations can be computed at stage n by

$$E[\phi(X_n)] = \int \cdots \int f_n(x)\, \phi(x)\, dx^1 \cdots dx^K$$

and

$$E[\phi(X_n) \mid \sigma_n = 1] = \frac{\int_{\theta_n^1}^{\infty} \cdots \int_{\theta_n^K}^{\infty} f_n(x)\, \phi(x)\, dx^1 \cdots dx^K}{\int_{\theta_n^1}^{\infty} \cdots \int_{\theta_n^K}^{\infty} f_n(x)\, dx^1 \cdots dx^K}.$$
Given that the DM will stop on the first alternative that he finds acceptable, how should he set his aspiration levels so that the expected value of his selected alternative will be maximized? Put differently, how can he maximize his expected payoff while satisficing? Note that the DM faces a straightforward multistage decision problem, and if we can optimally solve the problem at each stage n, we can solve the DM’s full problem, from n = 1 to n = N. Given this property, we can find optimal decision policies for the DM using dynamic programming methods [3]. The line of reasoning we invoke here to solve the satisficing problem is closely related to one typically used to solve related optimal stopping problems (see, e.g., [6,7]). At any stage n in the decision problem, the expected payoff, or value, for following some partial policy $\theta_n \cup \{\theta_{n+1}, \ldots, \theta_N\}$ is denoted $V_n(\theta_n)$. Because for all feasible policies the DM must accept the Nth alternative if it is reached, $V_N(\theta_N) = E[\phi(X_N)]$. Then, for any feasible set of aspiration levels at $n < N$, we have

$$V_n(\theta) = P(\sigma_n = 1 \mid \theta)\, E[\phi(X_n) \mid \sigma_n = 1] + P(\sigma_n = 0 \mid \theta)\, V_{n+1}. \tag{1}$$
For each stage n, we wish to find the θ that maximizes $V_n(\theta)$, which we denote $\theta_n^*$. The value of a partial optimal policy $\theta_n^* \cup \{\theta_{n+1}^*, \ldots, \theta_N^*\}$ is represented by $V_n^* \equiv V_n(\theta_n^*)$. Formally, at each stage $n = N-1, N-2, \ldots, 1$, we must solve

$$\theta_n^* = \arg\max_{\theta \in \mathbb{R}^K} V_n(\theta). \tag{2}$$
Optimal policies can, therefore, be found by coupling numerical optimization (to find the θs) with dynamic programming (to update the V s). Note that when K = 1 this problem reduces to a standard full-information optimal stopping problem (e.g., [6]). This equivalence is illustrated below for a special case of the satisficing problem. In the next section, we present optimal policies for a broad range of problems. These problems differ in a number of ways and demonstrate that the nature of the optimal satisficing policies is not always obvious a priori.
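For K = 1 the recursion in Eqs. (1) and (2) fits in a few lines. This sketch assumes X ~ Uni[0, 1] and φ(x) = x, chosen here for illustration, so that P(σn = 1|θ) = 1 − θ and E[φ(Xn)|σn = 1] = (1 + θ)/2. A plain grid search stands in for the constrained optimizer used in the chapter. Setting the derivative of Vn(θ) = (1 − θ²)/2 + θV*_{n+1} to zero gives θ*n = V*_{n+1}, which the grid solution should reproduce.

```python
def stage_value(theta: float, v_next: float) -> float:
    """Eq. (1) for K = 1, X ~ Uni[0, 1], phi(x) = x (illustrative assumptions)."""
    p_accept = 1.0 - theta                   # P(X >= theta)
    mean_if_accepted = (1.0 + theta) / 2.0   # E[X | X >= theta]
    return p_accept * mean_if_accepted + (1.0 - p_accept) * v_next


def solve_policy(n_alternatives: int, grid_points: int = 10_001):
    """Backward induction: V_N = E[X], then maximize Eq. (1) over a grid of thetas."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    values = {n_alternatives: 0.5}  # the last alternative must be accepted
    thresholds = {}
    for n in range(n_alternatives - 1, 0, -1):
        v_next = values[n + 1]
        best = max(grid, key=lambda t: stage_value(t, v_next))
        thresholds[n] = best
        values[n] = stage_value(best, v_next)
    return values, thresholds


if __name__ == "__main__":
    values, thresholds = solve_policy(5)
    for n in range(1, 6):
        print(n, round(values[n], 4), thresholds.get(n))
```

The printed thresholds match θ*n = V*_{n+1} up to the grid spacing, the closed-form rule for this classic full-information problem.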
3 Some Numerical Examples

The problems described here vary in terms of the properties of the distribution from which the attributes are taken, the nature of the payoff function φ, and the number of attributes K. We restrict most of our examples to problems with three or fewer attributes (K ≤ 3), which themselves are computationally quite involved. However, the algorithm described above can be implemented for any arbitrary K. For the examples reported here that required numerical optimization, we used the fmincon function in MATLAB 6.5, which performs constrained nonlinear optimization. Besides the results we report here on the optimal policies
themselves, we did extensive analyses of the nature of the solution space. Figure 2 shows an instance of typical behavior of Vn as a function of θn for two-dimensional problems. Fortunately, Vn tends to be quite well behaved, making finding the maximum on the surface relatively straightforward.
Fig. 2. Example of a value function surface. These results are based on $x_n^1 \sim N(0, 1)$, $x_n^2 \sim N(1, 1)$, $\rho = 0$, and $\phi(x_n) = x_n^1 + x_n^2$.
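In the setting of Fig. 2 (independent normal attributes, additive payoff), the integrals of Section 2 have closed forms, so Vn(θ) can be evaluated directly. This minimal sketch uses Python's statistics.NormalDist. For X ~ N(µ, 1), P(X ≥ θ) = 1 − G(θ − µ) and E[X | X ≥ θ] = µ + g(θ − µ)/(1 − G(θ − µ)), where G and g are the standard normal distribution and density functions. Independence (ρ = 0) lets the acceptance probabilities multiply and the conditional means add. The stage evaluated (n = 4 of a five-alternative problem, with V5 = E[x^1 + x^2] = 1) is an illustrative choice, since the caption does not say which stage is plotted. The grid window and step are also arbitrary.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def tail(theta: float, mu: float) -> tuple:
    """P(X >= theta) and E[X | X >= theta] for X ~ N(mu, 1)."""
    z = theta - mu
    p = 1.0 - STD_NORMAL.cdf(z)
    return p, mu + STD_NORMAL.pdf(z) / p


def stage_value(theta1: float, theta2: float, v_next: float) -> float:
    """Eq. (1) with x^1 ~ N(0, 1), x^2 ~ N(1, 1), rho = 0, phi = x^1 + x^2."""
    p1, m1 = tail(theta1, 0.0)
    p2, m2 = tail(theta2, 1.0)
    p = p1 * p2                                # independent attributes
    return p * (m1 + m2) + (1.0 - p) * v_next  # additive payoff


def best_thresholds(v_next: float, step: float = 0.02):
    """Grid search: theta^1 in [-2, 2], theta^2 in [-1, 3] (window chosen for illustration)."""
    best = (float("-inf"), 0.0, 0.0)
    n_steps = int(round(4.0 / step))
    for i in range(n_steps + 1):
        t1 = -2.0 + i * step
        for j in range(n_steps + 1):
            t2 = -1.0 + j * step
            v = stage_value(t1, t2, v_next)
            if v > best[0]:
                best = (v, t1, t2)
    return best


if __name__ == "__main__":
    v, t1, t2 = best_thresholds(1.0)
    print(f"V_4* ~ {v:.3f} at theta = ({t1:.2f}, {t2:.2f})")
```

The maximum lands near (−0.51, 0.49) with a value of about 1.49, in line with the n = 4 row of the ρ = 0 panel of Table 2 in Example 2.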
Example 1 (Additive Payoffs with Uniform Attributes). Suppose N = 5, K = 2, and, for all n, $x_n^1 \sim \mathrm{Uni}[0, 1]$, $x_n^2 \sim \mathrm{Uni}[0, 2]$, $\rho = 0$, and $\phi(x_n) = x_n^1 + x_n^2$. The optimal policy and its values for this case are shown in Table 1. The optimal aspiration levels for the second attribute, which has a greater mean (µ = 1), are greater than those for the first attribute (µ = 0.50). Also note that the aspiration levels become progressively less strict as the horizon approaches. This is consistent with Simon’s (1955) house seller, who adjusts her aspiration levels downward as time passes and she fails to sell her house.

Table 1. Optimal policies for Example 1.

n    V_n*    θ_n^1*    θ_n^2*
1    2.07    0.33      1.33
2    1.99    0.26      1.26
3    1.89    0.17      1.16
4    1.75    0.00      1.00
5    1.50    –         –
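Table 1 can be reproduced with the backward induction of Eqs. (1) and (2). In this sketch, which is not the authors' MATLAB fmincon code, the independence of the uniform attributes gives closed forms, P(σn = 1|θ) = (1 − θ^1)(2 − θ^2)/2 and E[φ(Xn)|σn = 1] = (1 + θ^1)/2 + (2 + θ^2)/2, and a grid search with step 0.01 replaces the constrained optimizer.

```python
def stage_value(a: float, b: float, v_next: float) -> float:
    """Eq. (1) for Example 1: x^1 ~ Uni[0, 1], x^2 ~ Uni[0, 2], rho = 0, phi = x^1 + x^2."""
    p = (1.0 - a) * (2.0 - b) / 2.0            # P(x^1 >= a) * P(x^2 >= b)
    mean = (1.0 + a) / 2.0 + (2.0 + b) / 2.0   # E[x^1 + x^2 | both accepted]
    return p * mean + (1.0 - p) * v_next


def solve_example1(n_alternatives: int = 5, step: float = 0.01):
    """Backward induction with a grid search over (theta^1, theta^2)."""
    values = {n_alternatives: 1.5}  # V_N = E[x^1 + x^2] = 0.5 + 1.0
    thresholds = {}
    n1, n2 = int(round(1.0 / step)), int(round(2.0 / step))
    for n in range(n_alternatives - 1, 0, -1):
        best = (float("-inf"), None)
        for i in range(n1 + 1):
            for j in range(n2 + 1):
                a, b = i * step, j * step
                v = stage_value(a, b, values[n + 1])
                if v > best[0]:
                    best = (v, (a, b))
        values[n], thresholds[n] = best
    return values, thresholds


if __name__ == "__main__":
    values, thresholds = solve_example1()
    for n in range(1, 6):
        print(n, f"{values[n]:.2f}", thresholds.get(n))
```

The values agree with Table 1 to within rounding. In this particular setup the first-order conditions also force θ^2* = θ^1* + 1, which is visible in every row of the table.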
Example 2 (Additive Payoffs with Normal Attributes). First, imagine that $N = 5$, $K = 2$, and, for all $n$, $x_n^1 \sim N(0, 1)$, $x_n^2 \sim N(1, 1)$, $\rho = 0$, and $\phi(x_n) = x_n^1 + x_n^2$. The optimal policy and its values for this case are shown in the center panel of Table 2. Again, the attribute with the greater mean (Attribute 2) has higher aspiration levels. The behavior of the optimal policy for additive payoffs is quite sensible and consistent with our a priori expectations. But what happens if the attributes are correlated? Using the same parameters but increasing the attribute correlation from $\rho = 0$ to $\rho = 0.50$, we obtain the policy shown in the lower panel of Table 2. The optimal DM's aspiration levels increase considerably when the attributes are positively correlated, as do the expected payoffs $V_n^*$. Given that many things in nature tend to have positively correlated attributes, this is a desirable consequence. Conversely, the expected payoff for satisficing when the attributes are negatively correlated ($\rho = -0.50$) is considerably lower than in the uncorrelated case (see the top panel of Table 2). We examine the effects of attribute correlations further in Section 4.

Table 2. Optimal policies for Example 2 for both uncorrelated and correlated attributes.

Top panel: $\rho = -0.50$
n    $V_n^*$    $\theta_n^{1*}$    $\theta_n^{2*}$
1    1.70       −0.07              0.93
2    1.59       −0.16              0.84
3    1.46       −0.30              0.70
4    1.28       −0.54              0.46
5    1.00       –                  –

Center panel: $\rho = 0.00$
n    $V_n^*$    $\theta_n^{1*}$    $\theta_n^{2*}$
1    2.14       0.11               1.11
2    1.98       −0.01              0.99
3    1.78       −0.19              0.81
4    1.49       −0.51              0.49
5    1.00       –                  –

Bottom panel: $\rho = 0.50$
n    $V_n^*$    $\theta_n^{1*}$    $\theta_n^{2*}$
1    2.50       0.33               1.33
2    2.30       0.19               1.19
3    2.03       −0.02              0.98
4    1.65       −0.38              0.62
5    1.00       –                  –
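The effect of correlation can be checked by simulation. The sketch below is our illustration, not the authors' method: it fixes the stage-wise thresholds from the center and bottom panels of Table 2 and estimates the expected payoff of following them by Monte Carlo. Correlated normal pairs are drawn from a 2×2 Cholesky factor, and each stage applies the recursion $V_n = P(\text{accept})\,E[\phi \mid \text{accept}] + (1 - P(\text{accept}))\,V_{n+1}$, accepting only when every attribute meets its aspiration level. The sample size, seed, and function names are arbitrary assumptions.

```python
"""Monte Carlo value of a fixed satisficing policy with correlated normal
attributes (illustrative sketch).

Assumes x1 ~ N(mu1, 1) and x2 ~ N(mu2, 1) with correlation rho, payoff
phi = x1 + x2, and the stage recursion
    V_n = P(accept) * E[phi | accept] + (1 - P(accept)) * V_{n+1}.
"""
import math
import random


def sample_pairs(mu1, mu2, rho, size, seed=0):
    """Correlated pairs via the Cholesky factor [[1, 0], [rho, sqrt(1 - rho^2)]]."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mu1 + z1, mu2 + rho * z1 + scale * z2))
    return pairs


def policy_value(thresholds, mu1, mu2, rho, size=100_000, seed=0):
    """Estimate V_1 for thresholds [(t1, t2) for n = 1, ..., N-1]."""
    pairs = sample_pairs(mu1, mu2, rho, size, seed)
    v_next = mu1 + mu2                       # V_N = E[phi]: last one is taken
    for t1, t2 in reversed(thresholds):      # stages N-1 down to 1
        total = 0.0
        for x1, x2 in pairs:
            # Stop if both attributes meet their aspiration levels.
            total += (x1 + x2) if (x1 >= t1 and x2 >= t2) else v_next
        v_next = total / size
    return v_next


# Stage thresholds for n = 1..4 from the center and bottom panels of Table 2.
UNCORRELATED = [(0.11, 1.11), (-0.01, 0.99), (-0.19, 0.81), (-0.51, 0.49)]
CORRELATED = [(0.33, 1.33), (0.19, 1.19), (-0.02, 0.98), (-0.38, 0.62)]

v_uncorr = policy_value(UNCORRELATED, 0.0, 1.0, 0.0)
v_corr = policy_value(CORRELATED, 0.0, 1.0, 0.5)
print(round(v_uncorr, 2), round(v_corr, 2))
```

Using one sample for every stage (common random numbers) keeps the comparison between stages and between the two panels less noisy than drawing fresh samples each time.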
Example 3 (Multiplicative Payoffs with Uniform Attributes). The previous examples were based on additive payoff functions. We now examine a multiplicative function, which might be taken to represent payoffs for alternatives with noncompensatory (or nonsubstitutable) attributes. Let us see what happens when $N = 5$, $K = 2$, and, for all $n$, $x_n^1 \sim \mathrm{Uni}[0, 2]$, $x_n^2 \sim \mathrm{Uni}[2, 3]$, $\rho = 0$, and $\phi(x_n) = x_n^1 x_n^2$. The results are shown in Table 3. The optimal policy is not
Optimal Satisficing
entirely intuitive: above all, the DM should ensure that he does not select an alternative with a small value on the first attribute; the aspiration level for the second attribute is near its minimum and therefore does not play much of a role (the second attribute varies only between 2 and 3, so the product is driven mainly by the first). For example, if a house-hunter would be miserable in a house near a freeway, no matter how cheap the house, then she would be well advised to set a high aspiration level for her location attribute.

Table 3. Optimal policies for Example 3.
n    $V_n^*$    $\theta_n^{1*}$    $\theta_n^{2*}$
1    3.88       1.44               2.16
2    3.71       1.37               2.06
3    3.48       1.25               2.00
4    3.12       1.00               2.00
5    2.50       –                  –
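A forward simulation gives an independent check on the backward recursion. The sketch below (illustrative; the episode count and seed are arbitrary) simulates a DM who inspects the five alternatives in order, stops at the first one that clears both aspiration levels in Table 3, and is forced to take the last one otherwise. Its average payoff should approximate the tabulated $V_1^*$.

```python
"""Forward simulation of a satisficing search in Example 3 (illustrative sketch).

Assumes x1 ~ Uni[0, 2] and x2 ~ Uni[2, 3] independent, payoff phi = x1 * x2,
N = 5, and the stage thresholds of Table 3.
"""
import random

# (theta1, theta2) for n = 1..4; at n = 5 the DM must accept whatever appears.
THRESHOLDS = [(1.44, 2.16), (1.37, 2.06), (1.25, 2.00), (1.00, 2.00)]


def draw(rng):
    return rng.uniform(0.0, 2.0), rng.uniform(2.0, 3.0)


def run_episode(rng):
    """Inspect alternatives in order; return the payoff of the one selected."""
    for t1, t2 in THRESHOLDS:
        x1, x2 = draw(rng)
        if x1 >= t1 and x2 >= t2:            # acceptable on every attribute
            return x1 * x2
    x1, x2 = draw(rng)                       # forced choice at n = N
    return x1 * x2


def mean_payoff(episodes=100_000, seed=1):
    rng = random.Random(seed)
    return sum(run_episode(rng) for _ in range(episodes)) / episodes


m = mean_payoff()
print(round(m, 2))
```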
Example 4 (A Three-Attribute Problem). Think about the dilemma a satisficing DM faces as $K$ increases. If he sets his aspiration levels too high, he can expect to find that most alternatives are not acceptable, because it takes only $x_n^k < \theta_n^k$ for a single $k$ to make alternative $n$ unacceptable; and, all else being equal, the probability that at least one attribute is unacceptable increases with the number of attributes. On the other hand, if he sets his aspiration levels too low, finding an acceptable alternative will be easy, but he may end up with a relatively undesirable alternative. What should he do? Consider a problem in which, for all $n$, $x_n^1 \sim N(0, 1)$, $x_n^2 \sim N(0.5, 1)$, $x_n^3 \sim N(1, 1)$, $\rho = 0$, and $\phi(x_n) = x_n^1 + x_n^2 + x_n^3$. Results are shown in Table 4. In isolation, the general nature of the optimal policy is consistent with expectations. However, the results are more interesting if we compare them to those presented in Example 2, in which the DM faced a related problem. In fact, if we replace the payoff function for the current problem with $\phi(x_n) = x_n^1 + 0 \cdot x_n^2 + x_n^3$, we get the Example 2 problem (with $\rho = 0$). Note that the optimal aspiration levels for the $K = 3$ problem are uniformly smaller than the corresponding aspiration levels from Example 2. The intuition for this is quite simple. Suppose a DM used the same aspiration levels for attributes 1 and 3 in the current problem that she used for attributes 1 and 2 in the Example 2 problem. Wherever she sets $\theta_n^2 > -\infty$ in the $K = 3$ problem, there is a nonzero probability that an alternative with acceptable values on attributes 1 and 3 will be rejected because attribute 2 is not satisfactory. Thus, if the DM did use the same aspiration levels in both problems, she would find herself searching deeper into the alternatives when $K = 3$, where, because of the finite horizon, she can expect to do worse. To prevent this, she must lower her aspiration levels as $K$ increases.
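For independent, unit-variance normal attributes with an additive payoff, the stage problem can be solved without a general-purpose optimizer. Differentiating $V_n = P(\text{accept})\,(E[\phi \mid \text{accept}] - V_{n+1}) + V_{n+1}$ with respect to $\theta_n^k$, and assuming an interior optimum, gives the condition $\theta_n^k + \sum_{j \ne k} E[x_n^j \mid x_n^j \ge \theta_n^j] = V_{n+1}$: an alternative that barely meets aspiration level $k$, and is otherwise typical of accepted alternatives, is worth exactly the value of continuing. This derivation is ours, not the chapter's. With equal variances it implies a common standardized threshold $z = \theta_n^k - \mu_k$ that solves $z + (K-1)\lambda(z) = V_{n+1} - \sum_k \mu_k$, where $\lambda$ is the inverse Mills ratio; the left side is increasing in $z$, so bisection finds it. The sketch below (helper names are hypothetical) should reproduce Table 4, and the center panel of Table 2, to within rounding.

```python
"""Backward induction for independent unit-variance normal attributes with an
additive payoff (illustrative sketch; the first-order condition used here is
derived for this setting, not taken from the chapter)."""
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mills(z):
    """Inverse Mills ratio: E[Z | Z >= z] = pdf(z) / P(Z >= z)."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / upper_tail(z)


def common_z(target, k, lo=-10.0, hi=10.0, iters=100):
    """Solve z + (k - 1) * mills(z) = target by bisection (LHS is increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + (k - 1) * mills(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve(means, n_stages=5):
    """Return {n: (V_n*, [theta_n^k* for each k])}."""
    k, mu_sum = len(means), sum(means)
    v_next = mu_sum                          # V_N* = E[phi(X)]
    policy = {n_stages: (v_next, None)}
    for n in range(n_stages - 1, 0, -1):
        z = common_z(v_next - mu_sum, k)
        p_accept = upper_tail(z) ** k        # all K attributes acceptable
        e_accept = mu_sum + k * mills(z)     # sum of truncated means
        v_next = p_accept * e_accept + (1.0 - p_accept) * v_next
        policy[n] = (v_next, [mu + z for mu in means])
    return policy


example4 = solve([0.0, 0.5, 1.0])            # K = 3 (Table 4)
example2 = solve([0.0, 1.0])                 # K = 2, rho = 0 (Table 2, center)
for n in sorted(example4):
    v, thetas = example4[n]
    print(n, round(v, 2), thetas and [round(t, 2) for t in thetas])
```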
We should note that the complexity of finding optimal policies grows rapidly in the number of attributes. The time required to solve the $K = 3$ case was greater than the time required for the corresponding $K = 2$ problem by roughly an order of magnitude. There is some irony in the fact that determining how to satisfice optimally over just five 3-attribute alternatives requires intensive and quite time-consuming computations.

Table 4. Optimal policies for Example 4.
n    $V_n^*$    $\theta_n^{1*}$    $\theta_n^{2*}$    $\theta_n^{3*}$
1    2.78       −0.22              0.28               0.78
2    2.60       −0.33              0.17               0.67
3    2.37       −0.49              0.01               0.51
4    2.04       −0.76              −0.27              0.23
5    1.50       –                  –                  –

Example 5 (A Directly Solvable Problem: Uniform Attributes). In this example, we demonstrate a special case of the satisficing problem in which the optimal policy can be obtained analytically, that is, without resorting to numerical optimization. Assume that we have two independent attributes $x_n^k \sim \mathrm{Uni}[0, 1]$, $k \in \{1, 2\}$, and $\phi(x_n) = x_n^1 + x_n^2$, for all $n$. Substituting into (1), we get
$$V_n(\theta_n) = \left(1 - \theta_n^1\right)\left(1 - \theta_n^2\right)\left[\theta_n^1 + \theta_n^2 + \frac{2 - \theta_n^1 - \theta_n^2}{2}\right] + \left(\theta_n^1 + \theta_n^2 - \theta_n^1\theta_n^2\right)V_{n+1}. \tag{3}$$
Our objective is to find the $\theta_n^1$ and $\theta_n^2$ that maximize this expression. The first-order conditions are
$$\frac{\partial V}{\partial \theta_n^1} = \frac{1}{2}\left(\theta_n^2\right)^2 + \theta_n^1\theta_n^2 - \theta_n^1 - \theta_n^2 V_{n+1} + V_{n+1} - \frac{1}{2} = 0$$
and
$$\frac{\partial V}{\partial \theta_n^2} = \frac{1}{2}\left(\theta_n^1\right)^2 + \theta_n^1\theta_n^2 - \theta_n^2 - \theta_n^1 V_{n+1} + V_{n+1} - \frac{1}{2} = 0.$$
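These conditions factor neatly. The first can be written as $(\theta_n^2 - 1)\,[\theta_n^1 + (1 + \theta_n^2)/2 - V_{n+1}] = 0$ and the second is its mirror image, so away from the corner $\theta_n^1 = \theta_n^2 = 1$ each condition says that one threshold equals $V_{n+1}$ minus the conditional mean of the other attribute. The sketch below (our illustration, not the authors' code) solves the pair by Gauss–Seidel iteration, which converges because each update has slope $-1/2$, and runs the backward recursion from $V_N^* = 1$. Its output can be compared with the closed-form stationary point stated next and with Table 5.

```python
"""Example 5: solve the first-order conditions by Gauss-Seidel iteration and
run the backward recursion (illustrative sketch)."""


def value(t1, t2, v_next):
    """Right-hand side of (3): x1, x2 ~ Uni[0, 1] independent, phi = x1 + x2."""
    both_ok = (1 - t1) * (1 - t2)
    return both_ok * (t1 + t2 + (2 - t1 - t2) / 2) + (t1 + t2 - t1 * t2) * v_next


def solve_foc(v_next, sweeps=100):
    """Iterate the factored conditions theta1 = V - (1 + theta2)/2 and
    theta2 = V - (1 + theta1)/2; each sweep shrinks the error by a factor 4."""
    t1 = t2 = 0.0
    for _ in range(sweeps):
        t1 = v_next - (1 + t2) / 2
        t2 = v_next - (1 + t1) / 2
    return t1, t2


def backward(n_stages=5):
    """Return {n: (V_n*, theta_n*)} starting from V_N* = E[x1 + x2] = 1."""
    v_next = 1.0
    policy = {n_stages: (v_next, None)}
    for n in range(n_stages - 1, 0, -1):
        t1, t2 = solve_foc(v_next)
        v_next = value(t1, t2, v_next)
        policy[n] = (v_next, t1)             # t1 and t2 coincide at the fixed point
    return policy


for n, (v, t) in sorted(backward().items()):
    print(n, round(v, 2), None if t is None else round(t, 2))
```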
Solving the system of equations, we conclude that $\theta_n^1 = \theta_n^2 = \frac{2}{3}V_{n+1} - \frac{1}{3}$ is a stationary point. (There is another stationary point at $\theta_n^1 = \theta_n^2 = 1$, which can be disregarded because it always leaves $V_n = V_{n+1}$.) Evaluating the second partial derivatives for permissible $V_{n+1}$, we find that this point is indeed a maximum. Hence, starting with $V_N^* = 1$ and working backward from $n = N - 1$ to $n = 1$, we obtain $\theta_n^{k*} = \frac{2}{3}V_{n+1}^* - \frac{1}{3}$ for $k \in \{1, 2\}$. Results for a problem with $N = 5$ are shown in Table 5. With $\theta_n^*$ in hand for the current problem, optimal policies for attributes uniformly distributed on any common interval $[a, b]$ follow by a linear change of variables.

Table 5. Optimal policies for Example 5.
n    $V_n^*$    $\theta_n^{1*}, \theta_n^{2*}$
1    1.35       0.54
2    1.30       0.49
3    1.24       0.43
4    1.15       0.33
5    1.00       –

To see the relation to standard optimal stopping problems, consider applying the same reasoning to a problem with a single attribute ($K = 1$). We have
$$V_n(\theta) = \left(1 - \theta_n^1\right)\left[\theta_n^1 + \frac{1 - \theta_n^1}{2}\right] + \theta_n^1 V_{n+1} \tag{4}$$
and
$$\frac{\partial V}{\partial \theta_n^1} = V_{n+1} - \theta_n^1.$$
Setting $\partial V / \partial \theta_n^1 = 0$ and solving for $\theta_n^1$, we find that (4) is maximized when $\theta_n^1 = V_{n+1}$. This is the solution to the standard full-information optimal stopping problem [6]. It is easy to see that (and why) optimal satisficing and standard optimal stopping are equivalent when $K = 1$ for arbitrary densities.

Most generally, assuming that $\theta_n^{k*} = \theta_n^{k'*}$ for all $k, k' \in \{1, \ldots, K\}$, for independent attributes uniformly distributed on $[0, 1]$ and an additive payoff, we get
$$V_n(\theta_n) = \left(1 - \theta_n^k\right)^K \left[K\theta_n^k + \frac{K - K\theta_n^k}{2}\right] + \left[1 - \left(1 - \theta_n^k\right)^K\right] V_{n+1}$$
and
$$\theta_n^{k*} = \frac{2V_{n+1}^* + 1 - K}{K + 1}, \qquad k \in \{1, \ldots, K\},$$
where $V_N^* = K/2$. Expressed this way, we can see what happens to the aspiration levels as $K$ grows. Consider the stage $N - 1$ aspiration levels: substituting $V_N^* = K/2$ gives $\theta_{N-1}^{k*} = 1/(K + 1)$, so $\theta_{N-1}^{k*} \to 0$ as $K \to \infty$. For large $K$, it is unlikely that the DM will ever find an alternative that meets all of his aspiration levels unless he sets them very low.¹

Example 6 (Nonstationary Attributes). Recall that the distributions from which the alternatives are drawn may vary with $n$. Imagine a house-hunting DM who believes that the quality of the houses she will encounter during her search will tend to decrease over time. Let us assume that she believes that $x_n^1 \sim N((N - n)/N, 1)$ and $x_n^2 \sim N(2(N - n)/N, 1)$, with $\rho = 0$, and that, for all $n$, $\phi(x_n) = x_n^1 + x_n^2$. The optimal policy for $N = 5$ is shown in the top panel of Table 6. To provide a reference, the bottom panel of the table shows the optimal policies for this problem when $x_n^1 \sim N(1, 1)$ and $x_n^2 \sim N(2, 1)$ for all $n$. In both cases, the DM's aspiration levels decrease as the horizon approaches. Most important, for the DM whose alternatives are expected to decrease in quality, we observe lower aspiration levels across the range of $n$. Because things are expected to get worse over time, she should be less choosy at all points in time.

¹ We have also obtained the analogous solution for exponentially distributed attributes.
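Nonstationarity is easy to handle in an exact policy evaluation, because each stage simply uses its own distribution. The sketch below (illustrative; the function names are hypothetical) evaluates a given stage-wise policy for independent normal attributes whose means change with $n$, using the closed-form tail probability and truncated mean of a normal variable. With the nonstationary thresholds of Table 6 it should return a value close to the tabulated $V_1^* = 2.77$; for comparison, accepting the very first alternative yields only $E[\phi(X_1)] = 2.4$.

```python
"""Exact value of a stage-dependent satisficing policy when attribute means
drift over time (illustrative sketch for Example 6)."""
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_mean(mu, theta):
    """E[x | x >= theta] for x ~ N(mu, 1)."""
    z = theta - mu
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return mu + pdf / upper_tail(z)


def evaluate(thresholds, means_at, n_stages):
    """V_1 of thresholds {n: (t1, t2)} when x_n^k ~ N(means_at(n)[k], 1), rho = 0."""
    v_next = sum(means_at(n_stages))         # the last alternative is taken
    for n in range(n_stages - 1, 0, -1):
        (m1, m2), (t1, t2) = means_at(n), thresholds[n]
        p_accept = upper_tail(t1 - m1) * upper_tail(t2 - m2)
        e_accept = tail_mean(m1, t1) + tail_mean(m2, t2)
        v_next = p_accept * e_accept + (1 - p_accept) * v_next
    return v_next


N = 5


def declining(n):
    """Example 6 means: ((N - n)/N, 2(N - n)/N)."""
    return ((N - n) / N, 2 * (N - n) / N)


TABLE6_TOP = {1: (0.13, 0.93), 2: (-0.10, 0.50), 3: (-0.35, 0.05), 4: (-0.72, -0.52)}
ACCEPT_FIRST = {n: (-math.inf, -math.inf) for n in range(1, N)}

v_table = evaluate(TABLE6_TOP, declining, N)
v_first = evaluate(ACCEPT_FIRST, declining, N)
print(round(v_table, 2), round(v_first, 2))
```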
Table 6. Optimal policies for Example 6 for nonstationary and stationary distributions.

Nonstationary
n    $V_n^*$    $\theta_n^{1*}$    $\theta_n^{2*}$
1    2.77       0.13               0.93
2    2.15       −0.10              0.50
3    1.52       −0.35              0.05
4    0.83       −0.72              −0.52
5    0.00       –                  –
Stationary Vn? θn1? θn2?
3.99
3.79
3.51
3.05
1.00
0.82
0.53
−0.61
1.50 —
2.00
1.82
1.53
0.38
—
4 Comparing “Maximizing” and “Satisficing” Policies
We assume that the maximizing DM evaluates the decision alternatives using his multiattribute utility function. He does not require an alternative to be acceptable on each of its attributes; instead, he makes his accept–reject decisions on the basis of the actual payoff $\phi(x)$ for selecting an alternative. It follows that the maximizing problem is just a standard full-information optimal stopping problem (e.g., [6,8]). One of the questions we wish to answer is: how great is the inherent penalty for satisficing? That is, in comparison to maximizing, how much does one lose by (optimally) satisficing? The optimal maximizing policy for each stage $n$ is an acceptability threshold $s_n^*$; under it, the DM stops at the first stage $n$ at which $\phi(x_n) \ge s_n^*$. The value of some stage $n < N$ policy $s$ is
$$\tilde V_n(s) = P[\phi(X_n) \ge s]\, E[\phi(X_n) \mid \phi(X_n) \ge s] + P[\phi(X_n) < s]\, \tilde V_{n+1}. \tag{5}$$
As before, $\tilde V_N^* = E[\phi(X)]$. Our objective for $n = N - 1$ down to $n = 1$ is to find
$$s_n^* = \arg\max_{s \in \mathbb{R}} \tilde V_n(s). \tag{6}$$
In order to determine the optimal maximizing policy, we must know the density of $\phi(x)$, denoted $g[\phi(x)]$, which depends on both $\phi$ and $f(x)$. When the payoff function is additive in its arguments and the attributes are independent, $g[\phi(x)]$ can be obtained by convolution. For instance, if two independent normally distributed attributes have means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, then $g[\phi(x)]$ is normal with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. More generally, the convolution can be applied recursively to get the desired density for $K$ normally distributed attributes when the payoff function is additive. In general, however, an analytic expression for $g[\phi(x)]$ will be difficult to obtain. Fortunately, in practice, we do not actually need it. The stage-$n$ stopping probability for some policy $s$ can be obtained numerically by evaluating
$$P[\phi(X) \ge s] = \int \cdots \int I[\phi(x), s]\, f_n(x)\, dx_1 \cdots dx_K,$$
where
$$I[\phi(x), s] = \begin{cases} 1 & \text{if } \phi(x) \ge s, \\ 0 & \text{otherwise.} \end{cases}$$
The expectation $E[\phi(X_n) \mid \phi(X_n) \ge s]$ can be expressed similarly. Given this, (6) can itself be solved quite easily, because by the principle of optimality we know that $s_n^* = \tilde V_{n+1}^*$.
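When $\phi$ is normal, the maximizing recursion has a closed form. For $Y \sim N(\mu, \sigma^2)$ and $d = (s - \mu)/\sigma$, $E[\max(Y, s)] = s + \sigma\,[\varphi(d) - d\,(1 - \Phi(d))]$, where $\varphi$ and $\Phi$ here denote the standard normal density and distribution function (not the payoff function). Combined with $s_n^* = \tilde V_{n+1}^*$, this gives $\tilde V_n^* = E[\max(\phi(X_n), \tilde V_{n+1}^*)]$. The sketch below applies it to the attributes of Example 2. It assumes the two attributes are jointly normal, so that their sum is normal with variance $\sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2$, a standard extension of the independent case just described; the function names are illustrative.

```python
"""Full-information optimal stopping for a normally distributed payoff
(illustrative sketch of the maximizing DM's recursion)."""
import math


def expected_max(mu, sigma, s):
    """E[max(Y, s)] for Y ~ N(mu, sigma^2)."""
    d = (s - mu) / sigma
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(d / math.sqrt(2.0))      # P(Y >= s)
    return s + sigma * (pdf - d * tail)


def maximizing_values(mu, sigma, n_stages=5):
    """Return [V~_1*, ..., V~_N*], using the threshold s_n* = V~_{n+1}*."""
    values = [mu]                                   # V~_N* = E[phi(X)]
    for _ in range(n_stages - 1):
        values.append(expected_max(mu, sigma, values[-1]))
    return values[::-1]


def payoff_params(mu1, mu2, rho, s1=1.0, s2=1.0):
    """Mean and s.d. of phi = x1 + x2 for jointly normal attributes."""
    return mu1 + mu2, math.sqrt(s1 * s1 + s2 * s2 + 2.0 * rho * s1 * s2)


for rho in (-0.5, 0.0, 0.5):
    mu, sigma = payoff_params(0.0, 1.0, rho)
    print(rho, round(maximizing_values(mu, sigma)[0], 3))
```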
Fig. 3. Geometric relationship between optimal maximizing and optimal satisficing policies for a single stage n. The policies are for a problem with xk ∼ Uni[0, 1], k ∈ {1, 2}, and φ(x) = x1 + 2x2 .
Figure 3 graphically depicts the relationship between satisficing and maximizing policies at a single stage $n$. Under the maximizing policy, the DM takes any alternative whose attribute combination falls above the dotted line, which represents attribute combinations with $\phi(x_n) = \tilde V_{n+1}^*$. In contrast, the satisficing DM accepts only those alternatives for which $x^1 \ge \theta^{1*}$ and $x^2 \ge \theta^{2*}$, which fall in the darkened rectangular area in the top right. Note that the satisficing policy both accepts some alternatives that are rejected by the maximizing policy and rejects some that are accepted by it. It is easy to see that the maximizing policy is more compensatory than the satisficing policy: under the maximizing policy, the DM will take an alternative with a relatively small value on $x^1$ ($x^2$) as long as the alternative's $x^2$ ($x^1$) value compensates.

Optimal maximizing policies were obtained for a subset of the problems reported in Section 3. Our interest is not so much in the policies themselves as in the payoffs associated with them. These are shown in Tables 7 and 8. The most striking finding is that very little is lost by satisficing. In most cases, optimal satisficing earns the DM about 99% of what he would earn by maximizing. Importantly, though, how much is lost by satisficing depends, among other things, on the nature of the distribution from which the alternatives are drawn. Considering the examples with additive payoff functions (Examples 1, 2, and 4), we can see that the relative efficiency of satisficing is worse when the attribute values are drawn from normal distributions (Example 2 with $\rho = 0$) than when they are drawn from uniform ones (Example 1). However, when the attributes are positively correlated, the performance of satisficing improves; in the case of normally distributed attribute values, satisficing can earn more than 99% of what is earned by the maximizing policy. Thus, in the wild, where attributes will likely be correlated, the inherent penalty for satisficing may be negligible.
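The disagreement regions in Figure 3 can be quantified for a concrete case. Figure 3's own thresholds are not reported, so the sketch below instead uses Example 1 at stage $n = 4$, where the satisficing DM requires $x^1 \ge 0$ and $x^2 \ge 1$ (Table 1) and the maximizing DM accepts whenever $x^1 + x^2 \ge s_4^* = \tilde V_5^* = 1.5$. It estimates by simulation how often each rule accepts an alternative the other rejects; both regions are triangles of area 1/8 inside the $1 \times 2$ rectangle of possible attribute values, so each probability is exactly 1/16. The seed and sample size are arbitrary.

```python
"""How often satisficing and maximizing rules disagree at one stage
(illustrative sketch: Example 1 at stage n = 4)."""
import random

THETA = (0.0, 1.0)   # satisficing aspiration levels at n = 4 (Table 1)
S = 1.5              # maximizing threshold s_4* = V~_5* = E[x1 + x2]


def decisions(x1, x2):
    """Return (satisficing accepts?, maximizing accepts?)."""
    return (x1 >= THETA[0] and x2 >= THETA[1]), (x1 + x2 >= S)


def disagreement(draws=200_000, seed=2):
    rng = random.Random(seed)
    only_sat = only_max = 0
    for _ in range(draws):
        sat, mx = decisions(rng.uniform(0.0, 1.0), rng.uniform(0.0, 2.0))
        only_sat += sat and not mx           # satisficer takes, maximizer rejects
        only_max += mx and not sat           # maximizer takes, satisficer rejects
    return only_sat / draws, only_max / draws


only_sat, only_max = disagreement()
print(round(only_sat, 3), round(only_max, 3))
```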
Table 8 displays the efficiency of the optimal satisficing policy for a range of correlations for the problem described in Example 2.

Table 7. The efficiency ($V_1^*/\tilde V_1^*$) of the optimal satisficing policy for Examples 1, 3, and 4. (Results for Example 2 are displayed in Table 8.)
Example    $V_1^*/\tilde V_1^*$
1          0.99
3          0.99
4          0.99

Table 8. Efficiency of optimal satisficing policies for several values of $\rho$ for the problem considered in Example 2.
$\rho$     $V_1^*/\tilde V_1^*$
−0.95      0.85
−0.75      0.86
−0.50      0.89
−0.25      0.91
0.00       0.94
0.25       0.95
0.50       0.97
0.75       0.99
0.95       0.99
5 Optimal “Heuristic” Satisficing
In the general formulation of our problem, we allowed the satisficing DM to set aspiration levels separately for each stage $n$. One might complain that the complexity of the resulting policies, which have $K(N - 1)$ separate aspiration levels, is not in the spirit of satisficing. In anticipation of this objection, we now examine a constrained version of the problem in which the DM must use the same aspiration levels $\theta_H = \{\theta_H^1, \ldots, \theta_H^K\}$ for all $n \in \{1, \ldots, N - 1\}$. To keep our expressions from becoming too cumbersome, let us denote $P(\sigma_n = 1 \mid \theta_H)$, which is constant for all $n$, by $P_\sigma$. Now, assuming that the attribute density function does not vary with $n$ (i.e., $f_n = f$ for all $n$), the value of a policy $\theta_H$ can be expressed as
$$V_H(\theta_H) = \sum_{n=1}^{N-1} \left[(1 - P_\sigma)^{n-1} P_\sigma\, E[\phi(X) \mid \sigma_n = 1]\right] + (1 - P_\sigma)^{N-1} E[\phi(X)]. \tag{7}$$
The last term on the right side of (7) corresponds to the event that the DM reaches the final alternative and must accept it. We denote the $\theta_H$ that maximizes (7) by $\theta_H^* = \{\theta_H^{1*}, \ldots, \theta_H^{K*}\}$, and let $V_H^* \equiv V_H(\theta_H^*)$. How much is optimal satisficing affected by the constraint that the aspiration levels must be fixed across the alternatives? To address this question, we found optimal heuristic policies for Examples 1 through 4. The results are summarized in Table 9. Most interestingly, the expected payoffs for following $\theta_H^*$ are quite close to those for following the optimal unconstrained satisficing policy: in all cases, the heuristic policy earns at least 98% of the expected payoff of the optimal unconstrained policy.

Table 9. Optimal heuristic policies for Examples 1–4. $V_H^*/V_1^*$ is the ratio of the expected payoff of the heuristic policy to that of the unconstrained policy. Example 2a corresponds to the problem in Example 2 with $\rho = 0$; 2b corresponds to the one with $\rho = 0.50$.
Example    $V_H^*$    $V_H^*/V_1^*$    $\theta_H^{1*}$    $\theta_H^{2*}$    $\theta_H^{3*}$
1          2.05       0.99             0.22               1.22               –
2a         2.10       0.98             −0.09              0.91               –
2b         2.64       0.98             0.26               1.26               –
3          3.84       0.99             1.34               2.00               –
4          1.00       0.99             −0.40              0.10               0.60
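Equation (7) is cheap to evaluate, so for Example 1 the heuristic policy can be found by brute force. The sketch below (an illustration, not the authors' procedure) plugs the closed-form $P_\sigma$ and conditional mean for independent uniforms on $[0, 1]$ and $[0, 2]$ into (7) with $N = 5$ and searches a 0.01 grid; it should recover values close to the first row of Table 9.

```python
"""Best stage-independent ("heuristic") aspiration levels for Example 1 by grid
search over (7) (illustrative sketch)."""


def heuristic_value(t1, t2, n_stages=5):
    """Equation (7) for x1 ~ Uni[0, 1], x2 ~ Uni[0, 2], phi = x1 + x2."""
    p = (1 - t1) * (2 - t2) / 2                  # P_sigma
    e_accept = (1 + t1) / 2 + (2 + t2) / 2        # E[phi(X) | sigma_n = 1]
    e_phi = 1.5                                  # E[phi(X)]
    stop_early = sum((1 - p) ** (n - 1) * p * e_accept for n in range(1, n_stages))
    return stop_early + (1 - p) ** (n_stages - 1) * e_phi


def best_heuristic(step=0.01):
    grid1 = [i * step for i in range(int(round(1 / step)) + 1)]
    grid2 = [j * step for j in range(int(round(2 / step)) + 1)]
    return max((heuristic_value(t1, t2), t1, t2) for t1 in grid1 for t2 in grid2)


v_h, t1_h, t2_h = best_heuristic()
print(round(v_h, 2), round(t1_h, 2), round(t2_h, 2))
```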
6 Infinite-Horizon Satisficing
We now present a continuous-time, infinite-horizon (control-type) satisficing problem in which the DM must pay a fixed cost $c > 0$ per unit of time she searches. If the cost were 0, the DM could, if infinitely patient, wait forever to select a “perfect” alternative. To get the value function, let us replace the sum in (7) with an integral. Then, removing the final term (which vanishes), adding the cost term, and replacing $n$ with $t$ (time), we get
$$V(\theta_\infty) = \int_0^\infty (1 - P_\sigma)^t \left(P_\sigma E[\phi(X) \mid \sigma = 1] - c\right) dt, \tag{8}$$
where $\theta_\infty = \{\theta_\infty^1, \ldots, \theta_\infty^K\}$ are the aspiration levels for all $t$. We assume that the attribute density function does not vary with time, $f_t = f$ for all $t$. Our objective is now to solve
$$\theta_\infty^* = \arg\max_{\theta_\infty \in \mathbb{R}^K} V(\theta_\infty). \tag{9}$$
As with the previous problems, (9) can be solved numerically. Optimal policies for the infinite-horizon version of the problem in Example 1 for different search costs are shown in Table 10. As expected, as the search costs increase, the DM lowers her aspiration levels and can also expect to earn less.

Table 10. Optimal policies for the infinite-horizon version of Example 1 for different search costs.
c       $V_\infty^*$    $\theta_\infty^{1*}$    $\theta_\infty^{2*}$
0.01    2.32            0.73                    1.73
0.10    1.40            0.47                    1.47
0.25    0.84            0.29                    1.29
0.50    0.40            0.11                    1.11
1.00    0.07            0.00                    0.69
7 Conclusion
In the half century since Simon first proposed “satisficing” as an alternative to the descriptively implausible optimizing model of decision-maker behavior, the term has come to connote a poor, simple-minded, imperfect approach to decisions. We may be constrained, either by situational limits in the task or by our own cognitive limitations, to use satisficing approaches, but we will pay significant penalties for doing so. The primary objective of the present chapter has been to explore this assumption. At least for one general class of decisions, sequential search problems, the assumption seems to be a poor one. Over quite wide ranges of problem characteristics and payoff functions, the penalty for using a satisficing approach was small or negligible.
We distinguished between two components that together make up the loss in expected payoff that a given decision policy incurs compared to an optimal policy. The inherent penalty is the unavoidable (expected) payoff loss a DM incurs for simply employing a satisficing rather than a maximizing policy. The second component, aspiration error, is the additional loss associated with using a satisficing strategy with nonoptimal aspiration levels. Our analyses suggest that, although aspiration errors can be very large, inherent penalties are often quite small.² A satisficing DM can do quite well compared to a maximizer faced with the same kinds of problems. (Similar points have been made elsewhere; see, e.g., [1,4].) This is particularly true in environments in which the attributes that compose the alternatives are correlated. Hence, satisficing policies may even be ecologically rational [5]. It seems to us likely that the class of problems for which optimal satisficing strategies will be of practical decision-aiding value is quite small. Computing optimal satisficing thresholds requires both extensive knowledge of task and DM characteristics and considerable computation, and little more effort would be required to compute genuinely optimal strategies. Unless special circumstances demand that a satisficing approach be used (as in the group decision process discussed earlier), we thus do not envision widespread use of optimal satisficing as a decision aid. The results presented here offer greater potential as benchmarks for behavioral studies of actual decision makers who may adopt satisficing strategies, either spontaneously or in response to special circumstances. In such studies it would be natural to ask what aspiration levels subjects actually adopt, how these levels diverge from optimal levels and at what cost, whether they change over time within and between choice tasks, and so on.
In a companion paper [2], we have shown that experimental subjects who are required to set aspiration levels and who learn only whether the attribute values of encountered alternatives are acceptable (by the definition given above in the problem statement) do as well as subjects who can see the actual attribute values before making selection decisions. In other words, the subjects who were forced to behave as satisficers did as well as those who were free to maximize. We found this using standard methods from experimental economics, including, most importantly, incentive-compatible payoffs. The subjects' aspiration levels did depart systematically from optimality; specifically, they tended to set their aspiration levels too high. In sum, those subjects who were free to maximize behaved somewhat suboptimally. Those who were constrained to satisfice satisficed somewhat suboptimally. But, overall, the two groups did equally well. The core message of the present chapter is that we should treat satisficing not as a specific decision strategy but as a class of strategies. As we have shown, some of the strategies in this class are capable of performance essentially equivalent to that of the global optimizing (or maximizing) strategy, at least for a broad class of problems. Other satisficing strategies yield quite poor overall performance. In our terminology, the inherent penalty associated with the use of a satisficing strategy may often be quite small, although for an ill-chosen satisficing strategy actual losses may be huge. Clearly, given the very large computational costs incurred in optimal satisficing, we are not proposing this as a plausible descriptive model of real DMs. Indeed, it was precisely to avoid such costs that Simon originally proposed the satisficing idea. However, the DM who adopts a satisficing strategy in order to avoid the costs of making explicit interattribute trade-offs is not thereby condemned to suffer large utility losses. The procedures outlined here provide a metric by which the loss associated with any given satisficing strategy can be assessed. This opens the way to developing aspiration-level setting rules that, if not optimal, can at least keep the expected losses small. Descriptively, it allows investigators to explore the extent to which specific DMs have been able to develop such rules for themselves. It may well be that, in some settings, DMs with or without such help have been able to arrive at satisficing decision strategies that yield excellent overall performance for little effort. By clarifying the notion of optimal satisficing, we hope to have shed some light on the descriptively more realistic notion of acceptable satisficing, surely the notion that Simon originally had in mind.

² We will examine in detail the effects of aspiration error in a future paper. Preliminary results can be found in Bearden and Connolly [2].
Acknowledgments
We gratefully acknowledge financial support by a contract F49620-03-1-0377 from the AFOSR/MURI to the Department of Systems and Industrial Engineering and the Department of Management and Organizations at the University of Arizona.
References
1. G. A. Akerlof. Irving Fisher on his head: The consequences of constant threshold-target monitoring of money holdings. The Quarterly Journal of Economics, 93:169–187, 1979.
2. J. N. Bearden and T. Connolly. Multiattribute sequential search. Organizational Behavior & Human Decision Processes, 103:147–158, 2007.
3. R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
4. G. Gallego and G. van Ryzin. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Science, 40:999–1020, 1994.
5. G. Gigerenzer, P. M. Todd, and The ABC Research Group. Simple Heuristics That Make Us Smart. Oxford University Press, New York, 1999.
6. J. Gilbert and F. Mosteller. Recognizing the maximum of a sequence. Journal of the American Statistical Association, 61:35–73, 1966.
7. D. V. Lindley. Dynamic programming and decision theory. Applied Statistics, 10:39–51, 1961.
8. J. B. MacQueen and R. G. Miller. Optimal persistence policies. Operations Research, 8:362–380, 1960.
9. H. A. Simon. A behavioral model of rational choice. The Quarterly Journal of Economics, 69:99–118, 1955.
10. H. A. Simon. Satisficing. In D. Greenwald, editor, The McGraw-Hill Encyclopedia of Economics, pages 881–886. McGraw-Hill, New York, second edition, 1993.
There Are Many Models of Transitive Preference: A Tutorial Review and Current Perspective
Michel Regenwetter¹ and Clintin P. Davis-Stober²
¹ Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL 61820
² Department of Political Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801
Summary. Transitivity of preference is a fundamental rationality axiom shared by nearly all normative, prescriptive, and descriptive models of preference or choice. There are many possible models of transitive preferences. We review a general class of such models and we summarize a recent critique of the empirical literature on (in)transitivity of preference. A key conceptual hurdle lies in the fact that transitivity is an algebraic/logical axiom, whereas experimental choice data are, by design, the outcomes of sampling processes. We discuss probabilistic specifications of transitivity that can be cast as (unions of) convex polytopes within the unit cube. Adding to the challenge, probabilistic specifications with inequality constraints (including the standard “weak stochastic transitivity” constraint on binary choice probabilities) fall victim to a “boundary problem” where the log-likelihood test statistic fails to have an asymptotic χ²-distribution. This invalidates many existing statistical analyses of empirical (in)transitive choice in the experimental literature. We summarize techniques to test models of transitive preference based on two key components: (1) we discuss probabilistic specifications in terms of convex polytopes, and (2) we provide the correct asymptotic distributions to test them. Furthermore, we demonstrate these techniques with examples on illustrative sample data.
1 Introduction
This chapter reviews recent and ongoing progress in formulating and properly testing probabilistic specifications of the concept of “transitive preference.” We first review models of transitive preference and summarize our recent critique of its nearly ubiquitous operationalization via “weak stochastic transitivity.” We emphasize that there are many ways to model transitive preferences, and weak stochastic transitivity is only one of these models. We move the focus to a more canonical representation of “transitive preference” in view of variable/uncertain preferences, namely “mixture models of transitive preference” (and their “random utility representations,” when they exist).
Michel Regenwetter and Clintin P. Davis-Stober
In the terminology of Loomes and Sugden [29], we study “random preference models” whose “core theories” are “linear,” “weak,” “semi-,” “interval,” and “partial orders,” respectively. To put it more formally, we discuss probabilistic choice models that are mathematically equivalent to the “linear,” “weak,” “semi-,” “interval,” and “partial order polytopes” for “binary choice” or “ternary paired comparison probabilities.” These polytopes form a natural hierarchy of probabilistic specifications of transitive preference, depending on our choice of additional axioms about preference. To avoid a combinatorial explosion, we place a special focus on the case of four choice alternatives. We emphasize theoretical work that is closely connected to actual experimental data. More specifically, we highlight conceptually appropriate probabilistic modeling and methodologically sound statistical testing that is compatible with the empirical sample space created by a given experimental or survey setting. Throughout, we assume that the relevant experimental data originate from paired comparisons, either without or with the option of expressing indifference between two choice alternatives. We call the former “binary choices” and the latter “ternary paired comparisons” (using a term we adapted from [3]).¹ Depending on how we group response categories, we furthermore refer to ternary paired comparisons “with” or “without indifference.” We illustrate the theoretical points with example data from an experiment and a survey.
2 Mathematical Models of Transitive Preference
Because any real-world experiment can only use a finite collection of choice options, we assume throughout that the set C of choice options (relevant to the experiment) is finite. In this section, we first review two standard mathematical representations of preference in the deterministic realm: binary (preference) relations, and corresponding real-valued (utility) functions that map choice options into utility values (we follow the standard definitions and terminology of [13–15,28,30,31,41,46]). We then summarize known results about probabilistic models that include the above deterministic models as special cases, namely “mixture models” of transitive preference relations and corresponding “random utility” and “random function” representations. We also briefly review the most commonly used probabilistic specification of transitive preference, namely “weak stochastic transitivity” of binary choice probabilities.
¹ Böckenholt [3] introduced the term “trinary paired comparison.” The term “ternary paired comparison” appears to us more natural, according to most dictionary and encyclopedia entries for the words “trinary” and “ternary.”
Many Models of Transitive Preference
2.1 Transitive Binary Relations
Definition 1. A binary relation R on a set of choice alternatives C is a subset of the Cartesian product of C with itself, that is, $R \subseteq C \times C$. In other words, a binary relation on C is a collection of ordered pairs of elements in C. In preference/choice theory it is standard to write $(x, y) \in R$ as $xRy$ and to read the relationship as “x is preferred to y.” The central mathematical notion of this chapter is the axiom of “transitivity” of binary relations, a property common to nearly all normative, prescriptive, and descriptive theories of preference and choice, including, most prominently, expected utility theory [43,49] and prospect theory [25].
Definition 2. A binary relation R on C satisfies the axiom of transitivity if and only if $\forall x, y, z \in C$:
$$[xRy \text{ and } yRz] \Rightarrow xRz. \tag{1}$$
Before we review the different kinds of transitive binary relations, we need to state some additional axioms.
Definition 3. If R and S are two (binary) relations, we write $RS = \{(z, y) \mid \exists x : zRx \text{ and } xSy\}$ for their relative product.² Let $R^{-1} = \{(x, y) \mid yRx\}$ denote the inverse of the binary relation R. If R is a binary relation on C, thus $R \subseteq C \times C$, we write $\overline{R} = (C \times C) \setminus R$ for the complement of R (with respect to C). The identity relation on C is $\mathrm{Id}_C = \{(c, c) \mid c \in C\}$. A binary relation R on C is
reflexive if $\mathrm{Id}_C \subseteq R$,
complete if $R \cup R^{-1} \cup \mathrm{Id}_C = C \times C$,
strongly complete if $R \cup R^{-1} = C \times C$,
antisymmetric if $R \cap R^{-1} \subseteq \mathrm{Id}_C$,
asymmetric if $R \cap R^{-1} = \emptyset$,
negatively transitive if $\overline{R}\,\overline{R} \subseteq \overline{R}$.
There are many ways to model transitive preference, as transitivity can be combined with axioms in the above list in various coherent ways. We are now ready to define (what we consider to be) the most important families of transitive binary relations.
Definition 4. A strict partial order is an asymmetric and transitive binary relation. An interval order is a strict partial order R with the property that $R\,\overline{R^{-1}}\,R \subseteq R$, that is, $aRb$ and $cRd$ imply $aRd$ or $cRb$. A semiorder is an interval order R with the property that $RR\,\overline{R^{-1}} \subseteq R$, that is, $aRb$ and $bRc$ imply $aRd$ or $dRc$ for every $d \in C$. A strict weak order is an asymmetric and negatively transitive binary relation. A strict linear order is a transitive, asymmetric, and complete binary relation. A partial order is a transitive, reflexive, and antisymmetric binary relation. A weak order is a transitive, reflexive, and strongly complete binary relation. A linear order is a transitive, reflexive, strongly complete, and antisymmetric binary relation.
Figure 1 shows the directed graphs for five strict partial orders. A pair (a, b) belongs to the preference relation if and only if there is an arrow from a to b in the directed graph. Only the leftmost relation is a strict linear order. The two leftmost relations are strict weak orders. The center graph (with triangular nodes) illustrates the axiom $RR\,\overline{R^{-1}} \subseteq R$ used in the semiorder definition, which states that, given the solid arrows, at least one of the two dashed arrows must be present. Similarly, the graph with hexagon-shaped nodes illustrates the axiom $R\,\overline{R^{-1}}\,R \subseteq R$ used in the definitions of semiorders and interval orders, which states that, given the solid arrows, at least one of the two dashed arrows must be present. The rightmost graph shows a strict partial order that is neither an interval order, a semiorder, a strict weak order, nor a strict linear order.

² This means that a binary relation R is transitive if and only if $RR \subseteq R$.
Fig. 1. Directed graphs of five strict partial orders. When the relations are taken as states of preference, a directed arrow from one object to another indicates a strict preference of the first over the second object.
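The families of Definition 4 can also be tested directly from the relational axioms. This is a sketch with hypothetical helper names, encoding relations as Python sets of pairs. It uses two examples:

- the four-element relation a ≻ b, c ≻ d, which is a strict partial order but not an interval order;
- the three-element relation in which only a ≻ c holds, which is a semiorder whose indifference is intransitive.

```python
from itertools import product

def compose(*relations):
    """Relative product R1 R2 ... Rk of finite relations (sets of pairs)."""
    result = relations[0]
    for S in relations[1:]:
        result = {(z, y) for (z, x) in result for (x2, y) in S if x == x2}
    return result

def classify(R, C):
    """Report which families of Definition 4 contain the relation R on C."""
    full = set(product(C, C))
    R_inv = {(b, a) for (a, b) in R}
    R_bar = full - R                   # complement of R
    R_inv_bar = full - R_inv           # complement of R^{-1}
    diagonal = {(c, c) for c in C}
    spo = not (R & R_inv) and compose(R, R) <= R
    interval = spo and compose(R, R_inv_bar, R) <= R
    semi = interval and compose(R, R, R_inv_bar) <= R
    weak = not (R & R_inv) and compose(R_bar, R_bar) <= R_bar
    linear = spo and (R | R_inv | diagonal) == full
    return {"strict partial order": spo, "interval order": interval,
            "semiorder": semi, "strict weak order": weak,
            "strict linear order": linear}

two_plus_two = classify({("a", "b"), ("c", "d")}, set("abcd"))
one_pair = classify({("a", "c")}, set("abc"))
print(two_plus_two)
print(one_pair)
```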
Clearly, strict linear orders, strict weak orders, semiorders, and interval orders are special cases of strict partial orders, whereas linear orders are special cases of both partial orders and weak orders. All these relation types can model transitive preference. However, not all of these models treat indifference as transitive. The following definition is useful to specify which models do. Definition 5. Let R be a binary relation on C. Then define $\mathrm{Inc}_R = \overline{R} \cap \overline{R^{-1}}$ and $\mathrm{Ind}_R = R \cap R^{-1}$.
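Here is a small check of Definition 5, using the same set-of-pairs encoding as before. The function names are hypothetical. The example is the semiorder in which a is strictly preferred to c and b is related to neither. Its $\mathrm{Inc}_R$ contains (a, b) and (b, c) but not (a, c), so it is not transitive. Its $\mathrm{Ind}_R$ is empty because R is asymmetric.

```python
from itertools import product

def inc(R, C):
    """Inc_R: pairs related in neither direction (complement of R and of R^{-1})."""
    full = set(product(C, C))
    R_inv = {(b, a) for (a, b) in R}
    return (full - R) & (full - R_inv)

def ind(R):
    """Ind_R: the intersection of R and R^{-1}."""
    return R & {(b, a) for (a, b) in R}

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

# Semiorder on {a, b, c}: a is strictly preferred to c; b is related to neither.
R = {("a", "c")}
C = {"a", "b", "c"}
print(sorted(inc(R, C) - {(c, c) for c in C}))   # off-diagonal Inc_R pairs
print(is_transitive(inc(R, C)), ind(R))           # False set()
```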
Many Models of Transitive Preference
Observation 1.
(1) If R is a strict weak order, then $\mathrm{Inc}_R$ is transitive. In general, however, if R is a semiorder, an interval order, or another type of strict partial order, then $\mathrm{Inc}_R$ need not be transitive. If R is a weak order, then $\mathrm{Inc}_R = \emptyset$.
(2) If R is a strict partial order, then $\mathrm{Ind}_R = \emptyset$. If R is a partial order, then $\mathrm{Ind}_R$ is transitive.

It is standard to interpret $\mathrm{Ind}_R$ as a state of indifference. That is, $(a, b) \in \mathrm{Ind}_R$ means that a respondent with preference state R is indifferent between a and b. For $\mathrm{Inc}_R$ there are two standard interpretations. One is to interpret it also as indifference. This is a common interpretation for semiorders and interval orders, which were originally motivated by the goal of modeling transitive strict preference combined with intransitive indifference [13,30]. In Figure 1, $\mathrm{Inc}_R$ holds between any pair of objects that are not linked by an arrow at all. The three rightmost graphs illustrate this type of intransitive indifference. The other interpretation of $\mathrm{Inc}_R$ is as incomparability. Under this reading, $(a, b) \in \mathrm{Inc}_R$ means that a respondent with preference state R is unable to compare a and b (with respect to R).³

Much of the decision sciences relies not on combinatorial entities such as binary relations, but on real-valued functions, called utility functions, that represent preferences by real numbers. These are our next topic.

2.2 Real Representations of Transitive Binary Relations

The following real representations, properties, and interpretations are well known (see, e.g., [13–15,28,30,31,41,46]). Observation 2. Let R be a binary relation on C. Suppose that there exist real-valued functions u, l on C with u(c) ≥ l(c) for all c ∈ C, such that, for all a, b ∈ C,
$$aRb \iff l(a) > u(b). \qquad (2)$$
Then R is an interval order on C. If u(x) = l(x) + ε for all x ∈ C, for a fixed positive (utility threshold) ε, then R in the real representation (2) is a semiorder on C. If u = l, then R in (2) is a strict weak order on C. If, in addition, u (and thus also l) is an injective mapping, then R in (2) is a strict linear order. The functions u and l can be interpreted as the upper and lower (bounds on the) utility, respectively. The representation (2), which uses > on the real numbers, implies transitivity of strict preference. However, it is easy to see that indifference need not be transitive when u ≠ l. Most contemporary theories of utility and choice, such as expected utility theory and prospect theory, assume the existence of a single utility function u = l. Under this assumption, both strict preference and indifference are transitive, because transitivity is directly inherited from the relation > on the real numbers. It is well known that, when C is finite and R is a strict linear, strict weak, semi-, or interval order, a corresponding representation (2) always exists. We leave out the details for brevity. For the same reason, we also leave out real (vector-valued) representations of strict partial orders, which are more complicated (see, e.g., [40] for details).

³ Alternatively, incomparability between a and b could also result from an absence of paired comparison observations regarding a and b.

Observation 3. Let R be a binary relation on C. Suppose that there exists a real-valued function u : C → ℝ such that, for all a, b ∈ C,
$$aRb \iff u(a) \ge u(b). \qquad (3)$$
Then R in (3) is a weak order on C. If u is injective, then R in (3) is a linear order. Clearly, the representations (2) and (3) are closely related. Most utility theories use a single utility function u. Either u(a) > u(b) is taken to denote "strict preference of a over b," or, equivalently, u(a) ≥ u(b) is taken to denote "preference of a over b or indifference between a and b." The two conventions agree because $\mathrm{Inc}_> = \mathrm{Ind}_\ge$, where Inc is computed with > as the relation R and Ind with ≥ as the relation R. Both relations hold exactly when u(a) = u(b).

The combinatorial and algebraic models that we have reviewed so far share the characteristic that they are deterministic representations. Yet it is common knowledge that real decision makers, when faced with the same decision multiple times, are prone to fluctuate in their actual choices (unless the decision is very clear-cut). One reason why choices fluctuate is that choices between options with complex trade-offs are difficult, and decision makers may experience uncertainty about their own preferences. A natural way to model preference uncertainty or preference volatility is via "mixture models." These models allow us to separate variability of preference (e.g., inconsistency of preference over time) or preference uncertainty from intransitivity of preference.

2.3 Mixture Models of Transitive Binary Relations

Definition 6. Let $\mathcal{R}$ be a collection of transitive relations on C. A mixture model over $\mathcal{R}$ is a probability distribution $P: \mathcal{R} \to [0, 1]$ that maps each $R \in \mathcal{R}$ to a probability $R \mapsto P_R$, where $P_R \in [0, 1]$ and $\sum_{R \in \mathcal{R}} P_R = 1$. A mixture model of transitive relations is a mixture model over a collection $\mathcal{R}$ of transitive binary relations.

A mixture model of transitive relations can have several different interpretations. It may quantify the variability of preference or the preference uncertainty of a single person by postulating a sample space of potential mental states, all of which are transitive. Alternatively, it may quantify the variability of preference across a population of decision makers, all of whom have transitive preferences. In our illustrative examples below, we consider each of these cases in turn. We first analyze a dataset with repeated choices made by one individual, and then a survey dataset with between-subjects data. As we have already indicated, transitive preferences can be modeled in many different ways. Thus, there are many possible choices of $\mathcal{R}$ on which to build a mixture model. We now introduce shorthand notation for the most important families of binary relations.

Definition 7. Denote by
LO(C) the collection of linear orders on C,
SWO(C) the collection of strict weak orders on C,
WO(C) the collection of weak orders on C,
SO(C) the collection of semiorders on C,
IO(C) the collection of interval orders on C,
SPO(C) the collection of strict partial orders on C,
PO(C) the collection of partial orders on C.
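The following sketch links the representations (2) and (3) to the families just listed by building relations from hypothetical utility values. With a constant threshold ε = 2.5 (our own choice), representation (2) yields a semiorder. In it, b is indifferent to both a and c, although a is strictly preferred to c. Setting u = l instead yields a strict weak order, here even a strict linear order because l is injective. The function names are hypothetical.

```python
def relation_from_bounds(lower, upper):
    """Representation (2): aRb iff lower[a] > upper[b]."""
    return {(a, b) for a in lower for b in lower if lower[a] > upper[b]}

def relation_from_utility(u):
    """Representation (3): aRb iff u[a] >= u[b] (a reflexive weak order)."""
    return {(a, b) for a in u for b in u if u[a] >= u[b]}

# Hypothetical lower utilities and a constant threshold (semiorder case of (2)).
lower = {"a": 4.0, "b": 2.0, "c": 1.0}
eps = 2.5
upper = {c: v + eps for c, v in lower.items()}

semi = relation_from_bounds(lower, upper)          # only a is strictly above c
strict_weak = relation_from_bounds(lower, lower)   # u = l gives a strict weak order
weak = relation_from_utility(lower)                # representation (3)
print(semi, strict_weak, sep="\n")
```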
Definition 8. A collection $(P_{a,b})_{a,b \in C,\, a \neq b}$ is called a system of binary choice probabilities if, for all a, b ∈ C with a ≠ b, we have $0 \le P_{a,b} \le 1$ and $P_{a,b} + P_{b,a} = 1$. This system of binary choice probabilities is induced by linear orders if there exists a probability distribution P on LO(C) such that, for all a, b ∈ C with a ≠ b,
$$P_{a,b} = \sum_{R \in LO(C),\; aRb} P_R. \qquad (4)$$
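The sketch below makes (4) concrete with a hypothetical instance. It computes the binary choice probabilities induced by a probability distribution over linear orders on {a, b, c}, each encoded as a ranking from best to worst. Exact fractions avoid rounding, and the mixture weights are invented for illustration.

```python
from fractions import Fraction

def induced_binary_probs(dist):
    """Marginal P_{a,b} = sum of P_R over linear orders R with aRb, as in (4).

    dist maps a ranking tuple (best first) to its probability P_R.
    """
    items = next(iter(dist))
    probs = {}
    for a in items:
        for b in items:
            if a != b:
                probs[(a, b)] = sum(p for order, p in dist.items()
                                    if order.index(a) < order.index(b))
    return probs

# Hypothetical mixture: half the mass on a > b > c, half on b > c > a.
dist = {("a", "b", "c"): Fraction(1, 2), ("b", "c", "a"): Fraction(1, 2)}
P = induced_binary_probs(dist)
print(P)
```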
In other words, a system of binary choice probabilities is induced by linear orders if there exists a probability distribution over linear orders such that each binary choice probability is the marginal probability of all those linear orders that are consistent with that binary choice. Clearly, if empirical data originate from binary choice probabilities induced by linear orders, this does not prove that preferences must be linear orders. Rather, data of that sort are consistent with a model that postulates linear order preferences.⁴

⁴ We omit the definition and characterization of a system of binary choice probabilities induced by strict linear orders, as it completely matches the case of linear orders.

Definition 9. A collection $(P_{a,b})_{a,b \in C,\, a \neq b}$ is called a system of ternary paired comparison probabilities without indifference if, for all a, b ∈ C with a ≠ b, we have $0 \le P_{a,b} \le 1$ and $P_{a,b} + P_{b,a} \le 1$. This system of ternary paired comparison probabilities without indifference is induced by strict weak orders (respectively, semiorders, interval orders, strict partial orders, or partial orders) if there exists a probability distribution on $\mathcal{R}$ such that, for all a, b ∈ C with a ≠ b,
$$P_{a,b} = \sum_{R \in \mathcal{R},\; aRb} P_R, \qquad (5)$$
with $\mathcal{R}$ equal to the collection SWO(C) (respectively, SO(C), IO(C), SPO(C), or PO(C)). Definition 10. A collection $(P_{a,b})_{a,b \in C,\, a \neq b}$ is called a system of ternary paired comparison probabilities with indifference if, for all a, b ∈ C with a ≠ b, we have $0 \le P_{a,b} \le 1$ and $P_{a,b} + P_{b,a} \ge 1$. This system of ternary paired comparison probabilities with indifference is induced by weak orders if there exists a probability distribution on $\mathcal{R}$ such that, for all a, b ∈ C with a ≠ b, (5) holds with $\mathcal{R}$ = WO(C).

Decision makers may act in accordance with transitive preferences that vary within and between decision makers. To model the different ways in which this can happen, we investigate in Section 4 how binary choice or ternary paired comparison probabilities may be induced by (mixture models over) various families $\mathcal{R}$ of binary relations. In other words, we discuss how to characterize properties of binary choice and ternary paired comparison probabilities. These properties let us check whether the probabilities are induced by various types of preference relations via (4) or (5). First, we review how mixture models can often be recast as equivalent "random utility models."

2.4 Random Utility

Definition 11. Let I be a finite (index) set. Any collection $(U_{c,i})_{c \in C,\, i \in I}$ of jointly distributed real random variables is called a random utility model on C. A random utility model is called noncoincident if, for all i ∈ I and for any distinct c, d ∈ C, $P(U_{c,i} = U_{d,i}) = 0$. When |I| = 1 we omit the index set and the index i. The following theorem combines results from, e.g., [2,22,23,34,36,37].

Theorem 1. A system of binary choice probabilities $(P_{a,b})_{a,b \in C,\, a \neq b}$ is induced by linear orders if and only if there exists a noncoincident random utility model $(U_c)_{c \in C}$ such that, for all a, b ∈ C with a ≠ b,
$$P_{a,b} = P(U_a > U_b). \qquad (6)$$
A system of ternary paired comparison probabilities without indifference $(P_{a,b})_{a,b \in C,\, a \neq b}$ is induced by strict weak orders if and only if there exists a random utility model $(U_c)_{c \in C}$ (not necessarily noncoincident) such that, for all a, b ∈ C with a ≠ b, (6) holds. A system of ternary paired comparison probabilities without indifference $(P_{a,b})_{a,b \in C,\, a \neq b}$ is induced by interval orders if and only if there exists a random utility model $(U_{c,i})_{c \in C,\, i \in \{1,2\}}$, with $U_{c,1} < U_{c,2}$ almost everywhere, such that, for all a, b ∈ C with a ≠ b,
$$P_{a,b} = P(U_{a,1} > U_{b,2}). \qquad (7)$$
A system of ternary paired comparison probabilities without indifference $(P_{a,b})_{a,b \in C,\, a \neq b}$ is induced by semiorders if and only if there exists a random utility model $(U_{c,i})_{c \in C,\, i \in \{1,2\}}$, with $U_{c,1} = U_{c,2} - 1$ almost everywhere, such that, for all a, b ∈ C with a ≠ b, (7) holds. A system of ternary paired comparison probabilities with indifference $(P_{a,b})_{a,b \in C,\, a \neq b}$ is induced by weak orders if and only if there exists a random utility model $(U_c)_{c \in C}$ such that, for all a, b ∈ C with a ≠ b,
$$P_{a,b} = P(U_a \ge U_b). \qquad (8)$$
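Theorem 1 does not specify any particular distribution for the utilities. Purely as an illustration, the following sketch assumes independent normal utilities with unit variance and hypothetical means. It estimates by simulation the probabilities in (6) and, with a threshold of 1 as in the semiorder case, those in (7). Under (6) the two probabilities for each pair sum to 1. Under (7) they sum to less than 1, leaving room for the third (indifference) response. The function names are hypothetical.

```python
import math
import random

def simulate_choice_probs(means, threshold=None, n_draws=50_000, seed=1):
    """Monte Carlo estimate of P_{a,b} under a hypothetical Gaussian random utility model.

    threshold=None: single utility U_c and P_{a,b} = P(U_a > U_b), as in (6).
    threshold=t:    U_{c,1} = U_{c,2} - t and P_{a,b} = P(U_{a,1} > U_{b,2}), as in (7).
    """
    rng = random.Random(seed)
    pairs = [(a, b) for a in means for b in means if a != b]
    wins = dict.fromkeys(pairs, 0)
    for _ in range(n_draws):
        upper = {c: rng.gauss(mu, 1.0) for c, mu in means.items()}   # U_{c,2}
        lower = upper if threshold is None else {c: v - threshold for c, v in upper.items()}
        for a, b in pairs:
            if lower[a] > upper[b]:
                wins[(a, b)] += 1
    return {pair: k / n_draws for pair, k in wins.items()}

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

means = {"a": 1.0, "b": 0.0, "c": -0.5}          # hypothetical mean utilities
binary = simulate_choice_probs(means)
semi = simulate_choice_probs(means, threshold=1.0)
# For independent unit-variance normals, P(U_a > U_b) = Phi((mu_a - mu_b) / sqrt(2)).
exact_ab = normal_cdf((means["a"] - means["b"]) / math.sqrt(2.0))
print(binary[("a", "b")], exact_ab)
```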
As Regenwetter and Marley [40] discuss, there is another equivalent representation, called random functions, in which a probability measure is imposed on a suitably chosen space of utility functions. We leave out the details of this representation, as well as the partial order case, for brevity. This completes our review of random utility representations. We now sketch weak stochastic transitivity.

2.5 Weak Stochastic Transitivity

Definition 12. A system of binary choice probabilities $(P_{a,b})_{a,b \in C,\, a \neq b}$ satisfies weak stochastic transitivity if, for all distinct a, b, c ∈ C,
$$P_{a,b} \ge \tfrac{1}{2} \text{ and } P_{b,c} \ge \tfrac{1}{2} \;\Rightarrow\; P_{a,c} \ge \tfrac{1}{2}. \qquad (9)$$
Theorem 2. [2,32] A system of binary choice probabilities $(P_{a,b})_{a,b \in C,\, a \neq b}$ satisfies weak stochastic transitivity if and only if there exists a real-valued function u : C → ℝ with
$$P_{a,b} \ge \tfrac{1}{2} \iff u(a) \ge u(b). \qquad (10)$$
Let R ⊆ C × C be defined by aRb ⇔ $P_{a,b} \ge \tfrac{1}{2}$ for distinct a, b, with aRa for every a ∈ C. Then R is a weak order. We re-emphasize: there are many ways to model transitive preference. This theorem shows that weak stochastic transitivity implies one particular way to be transitive at the aggregate level, namely via an aggregate weak order.
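Weak stochastic transitivity and the aggregate relation R of Theorem 2 can be checked directly. In this sketch, the names and probabilities are hypothetical, and a system is stored as a dictionary over ordered pairs. The first example contains a cycle of probabilities 0.6 and violates (9). The second satisfies (9) and induces a weak order in which b and c are tied.

```python
from itertools import permutations

def satisfies_wst(P, items):
    """Check (9) for every ordered triple of distinct alternatives."""
    return all(not (P[(a, b)] >= 0.5 and P[(b, c)] >= 0.5) or P[(a, c)] >= 0.5
               for a, b, c in permutations(items, 3))

def aggregate_relation(P, items):
    """aRb iff P_{a,b} >= 1/2, plus aRa; a weak order whenever (9) holds."""
    return {(a, b) for a in items for b in items
            if a == b or P[(a, b)] >= 0.5}

def complete_system(half):
    """Fill in P_{b,a} = 1 - P_{a,b} for a system of binary choice probabilities."""
    P = dict(half)
    P.update({(b, a): 1 - p for (a, b), p in half.items()})
    return P

cyclic = complete_system({("a", "b"): 0.6, ("b", "c"): 0.6, ("a", "c"): 0.4})
consistent = complete_system({("a", "b"): 0.6, ("b", "c"): 0.5, ("a", "c"): 0.7})
print(satisfies_wst(cyclic, "abc"), satisfies_wst(consistent, "abc"))
```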
3 Summary Critique of Prior Work

Elsewhere [39], jointly with other authors, we provide a detailed critique of past approaches to empirically testing "transitive preferences." Davis-Stober [8] discusses the statistical problems involved in analyzing empirical data. Here, we only summarize the main points.

3.1 Critique of the Standard Empirical Paradigm

The standard paradigm in experiments on transitivity is two-alternative forced choice. In this paradigm, the respondent must choose one and only one of two choice alternatives on each experimental trial. This forces each binary choice to be asymmetric and complete. Thus, at the level of binary choices, the experimental paradigm artificially enforces the additional axioms of asymmetry and completeness. As we have seen above, the combination of transitivity with these two axioms leads to a strict linear order. This is a very strong mathematical structure, equivalent to a complete ranking, without ties, from best to worst.

3.2 Critique of Weak Stochastic Transitivity

Weak stochastic transitivity holds if and only if majority rule outcomes [7], applied to individual choice probabilities, yield a transitive social welfare relation. This means that the paradoxes and problematic features of majority rule apply directly to weak stochastic transitivity. We only provide a summary (see [39] for details). Violations of weak stochastic transitivity could be artifacts of data aggregation. They could be instances of the famous Condorcet paradox of social choice theory, in which transitive individual preferences are aggregated into an intransitive social welfare relation. Loomes and Sugden [29] also pointed out this problem previously. Weak stochastic transitivity also ignores the absolute magnitude of binary choice probabilities. Yet, clearly, binary choice probabilities near 1/2 are conceptually different from binary choice probabilities near 0 or 1.
The former might suggest a respondent's overall indifference or extreme uncertainty of preference, whereas the latter might suggest approximately deterministic choice perturbed by error. Last, but not least, weak stochastic transitivity and the equivalent weak utility model (10) induce an aggregate weak order on the choice alternatives. Thus, weak stochastic transitivity is mathematically equivalent to transitivity plus additional axioms at the aggregate level. In particular, weak stochastic transitivity also requires that indifference be transitive. (Of course, stronger forms of stochastic transitivity add further structure.)

3.3 A Volatile Mix

Combining two-alternative forced choice with weak stochastic transitivity can produce artifactual violations of weak stochastic transitivity. Tsetlin et al. [48] argue in a social choice context that such situations exist. In them, several binary choice probabilities are equal to 1/2 (and weak stochastic transitivity holds), yet intransitivities have substantial nonzero probability in samples. A famous example of such a process in social choice theory is called the "impartial culture." The impartial culture over three objects is a uniform distribution over the six possible linear orders on those three objects. In random samples from this impartial culture, the probability of an intransitivity (a majority cycle) converges to approximately .09 as the sample size goes to infinity. In our context, suppose that some binary choices with difficult trade-offs lead to choices "by coin flip." Under somewhat contrived assumptions, this may artificially inflate the chance of observing intransitivities in empirical samples. The reason is that an effective tie by majority rule at the distribution (probability) level will frequently turn into an aggregate cycle (intransitivity) at the sample level.

3.4 Statistical Methodology

The canonical empirical sample space for binary choice or ternary paired comparison is given by a product of independent binomials or trinomials, respectively. The basic idea is to assume that empirical data originate from an independent random sample with replacement from the probabilities $(P_{a,b})_{a,b \in C,\, a \neq b}$. Suppose that a respondent evaluated each paired comparison N times. When we treat this task as a binary choice task, we let $N_{a,b}$ denote the number of times the respondent chose a over b, and we assume that $N_{a,b}$ is a realization of a binomial random variable. We write $B(P_{a,b}, N)$ for the distribution function of a binomial random variable with probability of success $P_{a,b}$ and N repetitions. We also assume that the binomials $[B(P_{a,b}, N)]_{a,b \in C,\, a \neq b}$ are independent, as the different paired comparisons are separated by decoys.
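The sampling model just described also makes the coin-flip concern of Section 3.3 easy to simulate. The sketch below is a simplified variant, not the impartial culture itself. It assumes independent binomials with $P_{x,y} = 1/2$ for every pair on {a, b, c} and an odd, hypothetical N = 31. Each sample majority is then a fair coin, and a majority cycle appears with probability 2/8 = 1/4, even though weak stochastic transitivity holds exactly at the probability level. Under the impartial culture, pairwise majorities are correlated, which is why the limiting cycle probability there is only about .09. The function names are hypothetical.

```python
import random

def sample_counts(P, n, rng):
    """Independent binomial counts N_{x,y} ~ B(P_{x,y}, n), one per unordered pair."""
    return {pair: sum(rng.random() < p for _ in range(n)) for pair, p in P.items()}

def majority_cycle(counts, n):
    """True if the sample majorities on {a, b, c} form a cycle (n odd, so no ties)."""
    ab = counts[("a", "b")] > n / 2
    bc = counts[("b", "c")] > n / 2
    ac = counts[("a", "c")] > n / 2
    # a > b > c > a, or the reverse cycle a > c > b > a
    return (ab and bc and not ac) or (not ab and not bc and ac)

# Every pair is a "coin flip": weak stochastic transitivity holds at the probability level.
P = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}
rng = random.Random(2008)
n, reps = 31, 10_000
freq = sum(majority_cycle(sample_counts(P, n, rng), n) for _ in range(reps)) / reps
print(freq)   # close to 0.25
```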
When we treat the task as a ternary paired comparison task without indifference, we let $N_{a,b}$ denote the number of times the respondent chose a (i.e., neither "indifference" nor b) in the paired comparison between a and b. Here, we assume that $N_{a,b}$ and $N_{b,a}$ result from a trinomial sampling process with probabilities $P_{a,b}$, $P_{b,a}$, and $1 - P_{a,b} - P_{b,a}$ for the three outcome categories. We again assume that these trinomials (each with N repetitions) are independent across choices of a and b, as the different paired comparisons are separated by decoys. In the case of ternary paired comparison with indifference, we let $N_{a,b}$ count the number of times that the respondent chose either a or "indifference" in the paired comparison between a and b. Define $P'_{a,b} = 1 - P_{b,a}$ for all distinct a, b, and consider the trinomials with probabilities $P'_{a,b}$, $P'_{b,a}$, and $1 - P'_{a,b} - P'_{b,a}$ for the three outcome categories. Similarly, let $N'_{a,b} = N - N_{b,a}$ for all distinct a, b. Thus $P'_{a,b}$ and $N'_{a,b}$ refer to strict choices of a. The likelihood of the observed data in these three scenarios takes the form
$$\prod_{a,b \in C,\; a \neq b} P_{a,b}^{N_{a,b}} (1 - P_{a,b})^{N - N_{a,b}} \quad \text{for binary choice,}$$
$$\prod_{a,b \in C,\; a \neq b} P_{a,b}^{N_{a,b}}\, P_{b,a}^{N_{b,a}}\, (1 - P_{a,b} - P_{b,a})^{N - N_{a,b} - N_{b,a}} \quad \text{for ternary paired comparison without indifference,}$$
$$\prod_{a,b \in C,\; a \neq b} {P'_{a,b}}^{N'_{a,b}}\, {P'_{b,a}}^{N'_{b,a}}\, (1 - P'_{a,b} - P'_{b,a})^{N - N'_{a,b} - N'_{b,a}} \quad \text{for ternary paired comparison with indifference.}$$

In our first illustration below, N = 30 denotes the number of (repeated) observations per pair from a single participant in an experiment. In our second illustration, N = 303 denotes the number of participants in a survey, with each participant evaluating each paired comparison once. Using standard techniques, we determine the goodness-of-fit of a given model to a given set of data by computing a log-likelihood ratio test statistic. Here, we need to distinguish between unconstrained choice probabilities and choice probabilities that are constrained by a given model. We write $P_{a,b}$ and $P'_{a,b}$ for the unconstrained probabilities, whose maximum likelihood estimates are the observed choice proportions. We write $\theta_{a,b}$ and $\theta'_{a,b}$ for the corresponding constrained probabilities. The standard log-likelihood ratio is then given by (all products over a, b ∈ C with a ≠ b)
$$2 \ln \frac{\prod P_{a,b}^{N_{a,b}} (1 - P_{a,b})^{N - N_{a,b}}}{\prod \theta_{a,b}^{N_{a,b}} (1 - \theta_{a,b})^{N - N_{a,b}}}, \qquad (11)$$
$$2 \ln \frac{\prod P_{a,b}^{N_{a,b}}\, P_{b,a}^{N_{b,a}}\, (1 - P_{a,b} - P_{b,a})^{N - N_{a,b} - N_{b,a}}}{\prod \theta_{a,b}^{N_{a,b}}\, \theta_{b,a}^{N_{b,a}}\, (1 - \theta_{a,b} - \theta_{b,a})^{N - N_{a,b} - N_{b,a}}}, \qquad (12)$$
$$2 \ln \frac{\prod {P'_{a,b}}^{N'_{a,b}}\, {P'_{b,a}}^{N'_{b,a}}\, (1 - P'_{a,b} - P'_{b,a})^{N - N'_{a,b} - N'_{b,a}}}{\prod {\theta'_{a,b}}^{N'_{a,b}}\, {\theta'_{b,a}}^{N'_{b,a}}\, (1 - \theta'_{a,b} - \theta'_{b,a})^{N - N'_{a,b} - N'_{b,a}}}, \qquad (13)$$
with (11) denoting the case of binary choices, (12) the case of ternary paired comparisons without indifference, and (13) the case of ternary paired comparisons with indifference. If standard regularity assumptions held, these three test statistics would asymptotically follow χ² distributions (see, e.g., [1]). However, in every model we discuss in this chapter, these regularity requirements easily fail.

We first consider weak stochastic transitivity. Iverson and Falmagne [24] considered weak stochastic transitivity geometrically. The disallowed events in weak stochastic transitivity are sets of binary choice probabilities of the form $0 \le (P_{a,b})_{a,b \in C,\, a \neq b} \le 1$ with $\tfrac{1}{2} < P_{x,y}, P_{y,z}, P_{z,x} \le 1$ for some choice of distinct x, y, z ∈ C. Geometrically, these disallowed events are half-unit cubes. Thus, weak stochastic transitivity equals the unit cube of probabilities minus the disallowed half-unit cubes. In more abstract terms, weak stochastic transitivity is a certain union of convex polytopes embedded in the unit cube.⁵ When choice proportions lie outside this union of polytopes, the maximum likelihood estimator, subject to the constraints imposed by weak stochastic transitivity, lies on the boundary of the parameter space. The required regularity assumptions are then violated, so the log-likelihood ratio test statistic does not have an asymptotic χ² distribution (for general discussions of this type of problem, see, e.g., [42]). Iverson and Falmagne's [24] finding invalidates many statistical tests in the existing empirical literature on weak stochastic transitivity. The same statistical constraints also apply to other probabilistic formulations that feature inequality restrictions on parameters [42]. Iverson and Falmagne [24] provided the correct asymptotic distribution of the log-likelihood ratio test statistic for weak stochastic transitivity. Their work has been almost completely ignored by the empirical literature on testing transitivity and intransitivity of preference. Unfortunately, Iverson and Falmagne [24] were unable to develop their technique beyond a limited set of models, owing to limitations in the mathematical and computational tools available at the time. Suck [45] discussed statistical testing of random utility representations in a slightly different context. Most recently, Myung et al. [35] developed a general Bayesian approach to testing weak stochastic transitivity, as well as more general models that involve order constraints on model parameters. Davis-Stober [8] extended Iverson and Falmagne's [24] approach to a broad class of models in which the parameters of a probabilistic model are constrained by a union of convex polytopes. This technique applies directly to the mixture models we discuss here.
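Statistic (11) is straightforward to compute once constrained estimates are available. The sketch below sums one term per unordered pair. In binary choice, the ordered pair (b, a) repeats the information in (a, b), because $P_{b,a} = 1 - P_{a,b}$ and $N_{b,a} = N - N_{a,b}$. The sketch uses the Table 1 counts with the rounded Table 2 estimates. These rounded values do not lie exactly on the relevant face of the polytope, so the result need not match the value of .4919 reported in Section 4.1. The function name is hypothetical, and the sketch does not perform the constrained maximization itself.

```python
import math

def binary_g2(counts, theta, n):
    """Log-likelihood ratio (11) for independent binomials.

    counts[(a, b)]: times a was chosen over b out of n (one key per unordered pair).
    theta[(a, b)]:  model-constrained probability of choosing a over b.
    The unconstrained MLE of P_{a,b} is the observed proportion counts / n.
    """
    g2 = 0.0
    for pair, k in counts.items():
        p_hat, t = k / n, theta[pair]
        for obs, p, q in ((k, p_hat, t), (n - k, 1 - p_hat, 1 - t)):
            if obs > 0:                    # a zero count contributes nothing
                g2 += 2 * obs * math.log(p / q)
    return g2

counts = {("a", "b"): 24, ("a", "c"): 24, ("a", "d"): 14,
          ("b", "c"): 23, ("b", "d"): 15, ("c", "d"): 23}       # Table 1, N = 30
theta = {("a", "b"): 0.80, ("a", "c"): 0.77, ("a", "d"): 0.51,
         ("b", "c"): 0.76, ("b", "d"): 0.50, ("c", "d"): 0.73}   # Table 2 (rounded)
g2 = binary_g2(counts, theta, 30)
print(g2)
```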
The key to analyzing relevant data correctly is to know the proper asymptotic distribution of the log-likelihood ratio test statistic. Davis-Stober [8] uses recent developments in mathematical statistics and convex geometry to derive this distribution. This distribution is a convex combination of χ² distributions with varying degrees of freedom, commonly referred to as a $\bar{\chi}^2$ (chi-bar-squared) distribution. This completes the summary of our recent critique of prior work on testing transitivity of individual preferences. We refer the reader to Regenwetter et al. [39] and Davis-Stober [8] for further details, including a detailed literature review.
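Once the weights of the $\bar{\chi}^2$ mixture are known, the p-value is a weighted sum of χ² tail probabilities. The sketch below handles only 0, 1, or 2 degrees of freedom. It uses the closed forms $P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})$ and $P(\chi^2_2 \ge x) = e^{-x/2}$. This suffices for the weights .30, .46, .24 and the statistic .4919 reported in Section 4.1. Deriving the weights themselves, as in Davis-Stober [8], is not attempted here, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(chi^2_df >= x) for df = 0, 1, 2 via closed forms."""
    if df == 0:                        # point mass at zero
        return 1.0 if x <= 0 else 0.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df <= 2")

def chibar2_pvalue(stat, weights):
    """p-value of a chi-bar-squared mixture: sum over k of w_k * P(chi^2_k >= stat)."""
    return sum(w * chi2_sf(stat, df) for df, w in weights.items())

# Weights and statistic reported in Section 4.1 for the linear order model.
p_value = chibar2_pvalue(0.4919, {0: 0.30, 1: 0.46, 2: 0.24})
print(round(p_value, 2))   # 0.41
```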
4 Testing For Transitive Preferences

We now discuss how to test whether binary choice probabilities or ternary paired comparison probabilities are induced by transitive relations in the sense discussed in Section 2.3.

⁵ For an example of a convex polytope, see Figure 2, which illustrates a (different) polytope that we discuss in Section 4.1.
As an illustrative example, we use data from a ternary paired comparison pilot study that we recently ran in preparation for a major experiment. In each experimental trial, a subject looked at a pair of gambles on a computer display and had three options: choose the left gamble, choose the right gamble, or express indifference between the two gambles. The subject considered each possible pair of gambles 30 times, separated by substantial numbers of decoys. Table 1 shows, for one subject, the proportions of times, $Q_{i,j}$, that the subject chose gamble i over gamble j, for all distinct i, j. Although this respondent was allowed to express indifference between gambles, they in fact never used the "indifference" option.⁶ Thus, for this experimental subject, $Q_{i,j} + Q_{j,i} = 1$, even though the experimental paradigm did not require this. This means that the data from this subject can be treated either as binary choice data or as ternary paired comparison data. It is for this convenient reason that we use this dataset for illustrative purposes here.

Table 1. Experiment using ternary paired comparisons: proportion of choices out of 30 repetitions, reported for one subject.

Gamble pair (i, j)                      (a, b)   (a, c)   (a, d)   (b, c)   (b, d)   (c, d)
Proportion i chosen: Q_{i,j}             24/30    24/30    14/30    23/30    15/30    23/30
Proportion j chosen: Q_{j,i}              6/30     6/30    16/30     7/30    15/30     7/30
Proportion indifference between i & j      0        0        0        0        0        0
On the surface, it is not clear how one would determine, in general, the origin of proportions such as those in Table 1. The question is whether binary choice or ternary paired comparison proportions originated from probabilities that are induced by a given family of binary relations. Even ignoring the fact that we only observe proportions, it is not straightforward to characterize, in general, whether a given collection of binary choice or ternary paired comparison probabilities is induced by such a family of relations. The key lies in the geometry of the space of binary choice probabilities (or of ternary paired comparison probabilities) that are induced by such a family of relations. After fixing a family $\mathcal{R}$ of binary relations of interest, we write each $R \in \mathcal{R}$ using 0/1 coordinates in a suitable high-dimensional space as follows. For each ordered pair (x, y) and each binary relation $R \in \mathcal{R}$, let $R_{xy} = 1$ if xRy, and $R_{xy} = 0$ otherwise. Each binary relation is thereby written as a 0/1 vector indexed by all ordered pairs of distinct elements of C, that is, as a point in $[0, 1]^{n(n-1)}$ when |C| = n. (In the case of linear order preferences, we project this representation down to one that uses one coordinate for each nonordered pair {x, y}.) A probability distribution over $\mathcal{R}$ can thus be represented mathematically as a convex combination of such 0/1 vectors, that is, as a point in the convex hull of finitely many points in $[0, 1]^{n(n-1)}$. The (x, y) coordinate of this point is $\sum_{R} P_R R_{xy}$, which is exactly the right-hand side of (4) or (5). The convex hull of a finite set of points, in turn, is a convex polytope (see Figure 2 for an example). Every convex polytope is a bounded intersection of finitely many closed half-spaces, each of which can be defined by an affine inequality. Thus, characterizing whether binary choice or ternary paired comparison probabilities are induced by binary relations in $\mathcal{R}$ amounts to a specific task. We must find a "complete" description, via a system of affine inequalities, of an appropriately selected convex polytope associated with $\mathcal{R}$. The best understood example of such a problem is the case of binary choice probabilities induced by linear orders. We review this case first. For the remainder of the chapter, we focus primarily on cases where |C| = 4, because the polytopal structure for larger sets C is, in most cases, not yet fully understood. (Figure 2 displays the case when |C| = 3.)

⁶ However, the subject did use the "indifference" option for other pairs of gambles that served as distractor items.

4.1 Testing For Linear Order Preferences

The problem of characterizing binary choice probabilities induced by linear orders is also known as the classical binary choice problem. This problem has been studied very extensively in several disciplines and is notorious for its intractability and complexity (e.g., [4–6,10,16–20,23,26,27,44]).

Definition 13. The linear ordering polytope is the convex polytope whose vertices are the n! 0/1 vectors associated with the linear orders, defined as follows. First, for each nonordered pair {x, y}, arbitrarily fix one of the two possible corresponding ordered pairs, say (x, y). Then, for each linear order R ∈ LO(C), let $R_{xy} = 1$ if (x, y) ∈ R, and $R_{xy} = 0$ otherwise.
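Definition 13 translates directly into code. This sketch, with a hypothetical function name, enumerates the n! vertices of the linear ordering polytope, fixing the ordered pair (x, y) with x listed before y. For |C| = 3, exactly two corners of the unit cube are missing, namely the two 0/1 vectors that encode preference cycles.

```python
from itertools import combinations, permutations, product

def linear_order_vertices(items):
    """0/1 vertices of the linear ordering polytope (Definition 13).

    For each nonordered pair {x, y} we fix the ordered pair (x, y) with x listed
    before y in `items`; coordinate R_xy is 1 if x is ranked above y.
    """
    pairs = list(combinations(items, 2))
    vertices = set()
    for order in permutations(items):
        rank = {c: i for i, c in enumerate(order)}
        vertices.add(tuple(1 if rank[x] < rank[y] else 0 for x, y in pairs))
    return pairs, vertices

pairs, vertices = linear_order_vertices("abc")
cube = set(product((0, 1), repeat=len(pairs)))
print(pairs)                    # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(sorted(cube - vertices))  # [(0, 1, 0), (1, 0, 1)]: the two cyclic patterns
```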
A minimal description of a convex polytope is a shortest possible list of equations and inequalities whose solution set is that polytope. The inequalities in such a description are called facet-defining inequalities because they define the facets of the polytope, that is, its proper faces of maximal dimension. For introductions to polytope theory, see, for example, Grünbaum [21] or Ziegler [50]. Optimization over the linear ordering polytope is NP-hard, so no complete linear description of the polytope is likely to be found for all values of n. Thus, the problem of characterizing binary choice probabilities induced by linear orders is unsolved for large n. We review the case of up to five choice alternatives, for which a minimal description is known.⁷

⁷ According to Fiorini [11], minimal descriptions are known up to |C| = 7.

Theorem 3. (e.g., [11]) Let 3 ≤ |C| ≤ 5. Binary choice probabilities on C are induced by linear orders if and only if the following list of equations and facet-defining inequalities for the linear ordering polytopes on three, four, or five objects is satisfied (with quantifiers omitted, and a, b, c distinct):
LO1: $P_{a,b} + P_{b,a} = 1$,
LO2: $P_{a,b} \le 1$,
LO3: $P_{a,b} + P_{b,c} - P_{a,c} \le 1$.
For instance, for |C| = 4, the binary choice polytope is a six-dimensional polytope. The system of equations LO1 can be thought of as inequalities holding both ways. LO1 and the first set of facet-defining inequalities, LO2, are commonly referred to as trivial inequalities. They are satisfied trivially by the design of the experiment and the implied sample space for binary choice data. In the case of |C| = 4, these define 12 distinct facets. The second set of inequalities, LO3, is commonly referred to as the triangle inequality. It directly reflects the axiom of transitivity and defines eight distinct facets when |C| = 4.

Figure 2 shows the linear ordering polytope for |C| = 3, embedded in a three-dimensional Euclidean space. The six strict linear orders, when written in 0/1 coordinates, form the six vertices of the polytope. These vertices form the zero-dimensional faces, whereas the one-dimensional faces of the polytope are the edges that link adjacent vertices. The maximal-dimensional proper faces (i.e., the facets) are two-dimensional. The full description of this linear ordering polytope has two parts. First, six facet-defining inequalities define the trivial facets, which correspond to the walls of the unit cube in which the polytope is embedded. Second, two facet-defining inequalities define the nontrivial facets, corresponding to the two possible triangle inequalities that one can write out for |C| = 3. Each point belonging to the polytope is a permissible collection of binary choice probabilities for the linear order mixture model. Any point outside the polytope is a collection of binary choice probabilities that is ruled out by the mixture model.
Figure 2 illustrates an example in which the empirical choice proportions lie outside the polytope. Thus the maximum likelihood estimate of the constrained binary choice probabilities lies on the boundary of the polytope, that is, on a facet (or possibly on an intersection of facets). Note that the maximum likelihood estimate (subject to the polytope constraints) need not be the orthogonal projection of the empirical proportions onto the polytope. We now return to the data of Table 1. If we substitute $Q_{i,j}$ for $P_{i,j}$, the proportions $Q_{i,j}$ of Table 1 automatically satisfy the trivial inequalities. However, they violate two instances of the triangle inequality, namely
$$Q_{b,c} + Q_{c,d} - Q_{b,d} = \tfrac{23}{30} + \tfrac{23}{30} - \tfrac{15}{30} = \tfrac{31}{30} \approx 1.03 > 1, \qquad (14)$$
$$Q_{a,c} + Q_{c,d} - Q_{a,d} = \tfrac{24}{30} + \tfrac{23}{30} - \tfrac{14}{30} = \tfrac{33}{30} = 1.1 > 1. \qquad (15)$$
Many Models of Transitive Preference
Michel Regenwetter and Clintin P. Davis-Stober
Fig. 2. The linear ordering polytope for |C| = 3.
This finding means that the binary choice proportions lie outside the linear ordering polytope. It raises the question of how plausible it is that these choice proportions could have been observed if the generating binary choice probabilities were, in fact, restricted by the linear ordering polytope, that is, if they satisfied the triangle inequality. This is a statistical question: do the inequalities (14)–(15) jointly constitute a statistically significant violation of the linear ordering polytope, or could these apparent violations of the linear order model be the result of sampling variability? To answer it, we obtain the maximum likelihood point estimate under the linear ordering polytope constraint and determine the goodness-of-fit. The point estimate lies on the boundary (on a face) of the linear ordering polytope. This implies that the log-likelihood ratio test statistic need not have an asymptotic $\chi^2$ distribution. Instead, the point estimate and the local facial structure at the point estimate determine the $\bar{\chi}^2$ distribution that we must use to obtain a p-value for the goodness-of-fit. The maximum likelihood point estimate, given in Table 2, lies in the intersection of the two facets whose facet-defining inequalities the choice proportions violate (as we show in [38], this need not hold automatically). More precisely, the lowest-dimensional face of the linear ordering polytope that contains this point estimate is the four-dimensional face given by the intersection of the linear ordering polytope and the subspace generated by the following system of two equations:
$$P_{a,c} + P_{c,d} - P_{a,d} = 1, \qquad P_{b,c} + P_{c,d} - P_{b,d} = 1.$$
Using the techniques developed by Davis-Stober [8], we find that the log-likelihood ratio test statistic has a point estimate of .4919, with a $\bar{\chi}^2$ distribution equal to $.3\,\chi^2_0 + .46\,\chi^2_1 + .24\,\chi^2_2$ (where $\chi^2_0$ denotes a point mass at zero). This yields a p-value of .41, indicating a good statistical fit. In other words, the two triangle inequality violations in (14)–(15) do not constitute a statistically significant violation of the mixture model over linear orders.
Table 2. Point estimates for the linear order polytope.
(i, j) | Estimated probability i preferred to j
(a, b) | 0.8
(a, c) | 0.77
(a, d) | 0.51
(b, c) | 0.76
(b, d) | 0.50
(c, d) | 0.73
4.2 Testing For Weak Order Preferences
We now consider a mixture model over weak orders, that is, a probabilistic choice model for the case where the “core theory” [29] states that preferences are weak orders. The natural empirical paradigm here is ternary paired comparison with indifference.
Theorem 4. [11,12] Let $|C| = 4$. Ternary paired comparison probabilities with indifference on $C$ are induced by weak orders if and only if the following facet-defining inequalities for the weak order polytope on four objects are satisfied (with quantifiers omitted, and $a, b, c, d$ distinct):
$$
\begin{aligned}
\mathrm{WO}_1 &:\ P_{a,b} \le 1,\\
\mathrm{WO}_2 &:\ -P_{a,b} - P_{b,a} \le -1,\\
\mathrm{WO}_3 &:\ P_{a,b} + P_{b,c} - P_{a,c} \le 1,\\
\mathrm{WO}_4 &:\ P_{a,b} + P_{b,a} + P_{b,c} + P_{c,b} + P_{b,d} + P_{d,b} - P_{a,d} - P_{d,a} - P_{a,c} - P_{c,a} - P_{c,d} - P_{d,c} \le 1,\\
\mathrm{WO}_5 &:\ P_{a,b} + P_{a,c} + P_{b,d} + P_{c,d} + P_{d,a} - P_{a,d} - P_{b,c} - P_{c,b} \le 2,\\
\mathrm{WO}_6 &:\ P_{a,b} + P_{b,a} + P_{a,c} + P_{c,a} + P_{a,d} - P_{b,c} - P_{b,d} - P_{c,b} - P_{c,d} \le 2,\\
\mathrm{WO}_7 &:\ P_{a,b} + P_{b,a} + P_{a,c} + P_{c,a} + P_{d,a} - P_{b,c} - P_{c,b} - P_{d,b} - P_{d,c} \le 2,\\
\mathrm{WO}_8 &:\ P_{a,b} + P_{b,a} + P_{a,c} + P_{c,a} + P_{b,d} + P_{c,d} - P_{a,d} - P_{b,c} - P_{c,b} \le 3,\\
\mathrm{WO}_9 &:\ P_{a,b} + P_{b,a} + P_{a,c} + P_{c,a} + P_{d,b} + P_{d,c} - P_{b,c} - P_{c,b} - P_{d,a} \le 3.
\end{aligned}
$$
As Fiorini [11] and Fiorini and Fishburn [12] discuss in detail, the weak-order polytope for four choice alternatives is a 12-dimensional polytope with 75 vertices and 106 different facets. Taking the omitted quantifiers into account, there are 12 facets of type $\mathrm{WO}_1$, 6 of type $\mathrm{WO}_2$, 24 of type $\mathrm{WO}_3$, 4 of type $\mathrm{WO}_4$, 12 of type $\mathrm{WO}_5$, 12 of type $\mathrm{WO}_6$, 12 of type $\mathrm{WO}_7$, 12 of type $\mathrm{WO}_8$, and 12 of type $\mathrm{WO}_9$. Notice that this polytope lives in a much higher-dimensional space and therefore has 24 triangle inequalities, compared to only 8 for the linear order polytope. Note, in passing, that Fiorini [11] and Fiorini and Fishburn [12] collapse the 9 classes of facets into 7 groups by exploiting additional symmetries beyond the possible choices of labels $a, b, c, d$. For the empirical illustration of Table 1, the observed proportions lie outside the weak-order polytope. The following seven inequalities on the choice proportions $Q_{i,j}$ contradict facet-defining inequalities of the weak-order polytope (the remaining 99 facet-defining inequalities are satisfied):
$$Q_{c,d} + Q_{d,a} - Q_{c,a} = 1.1 > 1, \tag{16}$$
$$Q_{c,d} + Q_{d,b} - Q_{c,b} = 1.04 > 1, \tag{17}$$
$$Q_{b,c} + Q_{c,d} - Q_{b,d} = 1.04 > 1, \tag{18}$$
$$Q_{d,b} + Q_{b,c} - Q_{d,c} = 1.04 > 1, \tag{19}$$
$$Q_{a,c} + Q_{c,d} - Q_{a,d} = 1.1 > 1, \tag{20}$$
$$Q_{d,a} + Q_{a,c} - Q_{d,c} = 1.1 > 1, \tag{21}$$
$$Q_{a,c} + Q_{b,c} + Q_{c,d} + Q_{d,a} + Q_{d,b} - Q_{a,b} - Q_{b,a} - Q_{d,c} = 2.14 > 2. \tag{22}$$
Table 3. Point estimates for the weak order polytope.
(i, j) | Estimated probability i preferred to j | (i, j) | Estimated probability i preferred to j
(a, b) | 0.8 | (b, a) | 0.2
(a, c) | 0.77 | (c, a) | 0.23
(a, d) | 0.51 | (d, a) | 0.49
(b, c) | 0.76 | (c, b) | 0.24
(b, d) | 0.50 | (d, b) | 0.50
(c, d) | 0.73 | (d, c) | 0.27
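The entries of Table 3 are rounded to two decimals, so the claim that the point estimate meets the seven violated inequalities (16)–(22) with equality can only be checked up to rounding. The minimal Python sketch below evaluates each left-hand side at the Table 3 estimate. The data are transcribed from Table 3. The tolerance rule (at most 0.005 of rounding error per term) and all names are our own assumptions.

```python
# Point estimates from Table 3 (rounded to two decimals in the table).
P = {
    ("a", "b"): 0.80, ("b", "a"): 0.20,
    ("a", "c"): 0.77, ("c", "a"): 0.23,
    ("a", "d"): 0.51, ("d", "a"): 0.49,
    ("b", "c"): 0.76, ("c", "b"): 0.24,
    ("b", "d"): 0.50, ("d", "b"): 0.50,
    ("c", "d"): 0.73, ("d", "c"): 0.27,
}

# Inequalities (16)-(22): (label, terms with +1, terms with -1, right-hand side).
# A string "xy" stands for the coordinate P[x, y].
FACETS = [
    ("(16)", ["cd", "da"], ["ca"], 1),
    ("(17)", ["cd", "db"], ["cb"], 1),
    ("(18)", ["bc", "cd"], ["bd"], 1),
    ("(19)", ["db", "bc"], ["dc"], 1),
    ("(20)", ["ac", "cd"], ["ad"], 1),
    ("(21)", ["da", "ac"], ["dc"], 1),
    ("(22)", ["ac", "bc", "cd", "da", "db"], ["ab", "ba", "dc"], 2),
]


def lhs(pos, neg):
    """Left-hand side of a facet-defining inequality evaluated at P."""
    return sum(P[tuple(t)] for t in pos) - sum(P[tuple(t)] for t in neg)


def tight_up_to_rounding(pos, neg, rhs):
    """Equality within the worst-case error of rounding each term (0.005 per term)."""
    tol = 0.005 * (len(pos) + len(neg))
    return abs(lhs(pos, neg) - rhs) <= tol + 1e-12


for label, pos, neg, rhs in FACETS:
    print(label, round(lhs(pos, neg), 2), "bound", rhs,
          tight_up_to_rounding(pos, neg, rhs))

# The trivial facets WO2 (P[x, y] + P[y, x] >= 1) are also tight at this estimate.
print(all(abs(P[x, y] + P[y, x] - 1) < 1e-9 for (x, y) in P))
```

Every left-hand side lands within rounding of its bound (0.99 for the six triangle inequalities, 1.98 for (22)). Every pair also satisfies $P_{x,y} + P_{y,x} = 1$. Both results are consistent with the face description that follows.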
The fact that the choice proportions lie outside the polytope may or may not indicate that the underlying choice probabilities violate the mixture model over weak orders. The first six inequalities, (16)–(21), present potential violations of the triangle inequality. Inequality (22) presents a potential violation of the inequality constraint $\mathrm{WO}_5$ on the choice probabilities generating the data. Because the choice proportions lie outside the weak-order polytope, the maximum likelihood point estimate (subject to the weak-order polytope constraints) is a point on the boundary of the polytope. Table 3 shows the point estimate vector of ternary paired comparison probabilities. Following Davis-Stober [8], we find that the log-likelihood ratio test statistic (at the point estimate) has a $\bar{\chi}^2$ distribution equal to $\tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$, where $\chi^2_0$ denotes a point mass at zero (with estimated weights rounded to two significant digits). The point estimate of the log-likelihood ratio is 0.4964, which gives a p-value of 0.24 for the goodness-of-fit, indicating a good statistical fit of the weak order model. The maximum likelihood point estimate vector in Table 3 lies in a four-dimensional face of the weak-order polytope, in a subspace given by the following system of 13 (redundant) linear equalities:
$$
\begin{gathered}
P_{a,b} + P_{b,a} = 1,\quad P_{a,c} + P_{c,a} = 1,\quad P_{a,d} + P_{d,a} = 1,\\
P_{b,c} + P_{c,b} = 1,\quad P_{b,d} + P_{d,b} = 1,\quad P_{c,d} + P_{d,c} = 1,\\
P_{c,d} - P_{c,a} + P_{d,a} = 1,\quad P_{c,d} - P_{c,b} + P_{d,b} = 1,\quad P_{b,c} - P_{b,d} + P_{c,d} = 1,\\
P_{b,c} + P_{d,b} - P_{d,c} = 1,\quad P_{a,c} - P_{a,d} + P_{c,d} = 1,\quad P_{a,c} + P_{d,a} - P_{d,c} = 1,\\
P_{a,c} + P_{b,c} + P_{c,d} + P_{d,a} + P_{d,b} - P_{a,b} - P_{b,a} - P_{d,c} = 2.
\end{gathered}
\tag{23--28}
$$
This is the intersection of the facets whose facet-defining inequalities the choice proportions violate, intersected with the trivial facets whose facet-defining inequalities are of the form $\mathrm{WO}_2$. (Recall that this is not automatic.)
4.3 Testing For Partial-Order Preferences
A complete minimal description of the semiorder and interval-order polytopes, even for $|C| = 4$, is beyond the scope of this tutorial because no such theorems have been published in the literature. We provide these descriptions elsewhere [38]. Instead, we now turn to our most general model of transitive preference. Here, the “core theory” [29] states that individual preferences are partial orders: transitive, reflexive, and antisymmetric binary relations. The natural model here is the partial-order polytope for ternary paired comparison probabilities without indifference.
Theorem 5. [11] Ternary paired comparison probabilities without indifference on four choice alternatives are induced by partial orders if and only if the following facet-defining inequalities for the partial-order polytope on four objects are satisfied (with quantifiers omitted, and $a, b, c, d$ distinct):
$$
\begin{aligned}
\mathrm{PO}_1 &:\ -P_{a,b} \le 0,\\
\mathrm{PO}_2 &:\ P_{a,b} + P_{b,a} \le 1,\\
\mathrm{PO}_3 &:\ P_{a,b} + P_{b,c} - P_{a,c} \le 1,\\
\mathrm{PO}_4 &:\ P_{a,b} + P_{b,c} + P_{c,a} - P_{b,a} - P_{c,b} - P_{a,c} \le 1,\\
\mathrm{PO}_5 &:\ P_{a,b} + P_{b,c} + P_{c,d} + P_{d,a} - P_{b,a} - P_{a,c} - P_{d,b} \le 2,\\
\mathrm{PO}_6 &:\ P_{a,b} + P_{b,c} + P_{c,a} + P_{b,d} + P_{d,b} - P_{c,b} - P_{b,a} - P_{d,c} - P_{d,a} \le 2,\\
\mathrm{PO}_7 &:\ P_{a,b} + P_{b,c} + P_{a,d} + P_{d,c} - P_{b,d} - P_{d,a} - 2P_{a,c} \le 2,\\
\mathrm{PO}_8 &:\ P_{a,b} + P_{b,c} + P_{c,b} + P_{b,d} - 2P_{b,a} - 2P_{a,c} - 2P_{c,d} - 2P_{d,b} \le 3.
\end{aligned}
$$
As Fiorini [11] discusses in detail, the partial-order polytope for four choice alternatives is a 12-dimensional polytope with 219 vertices and 128 different facets. It has 12 facets of type $\mathrm{PO}_1$, 6 of type $\mathrm{PO}_2$, 24 of type $\mathrm{PO}_3$, 8 of type $\mathrm{PO}_4$, 24 of type $\mathrm{PO}_5$, 24 of type $\mathrm{PO}_6$, 24 of type $\mathrm{PO}_7$, and 6 of type $\mathrm{PO}_8$. Of these 128 facet-defining inequalities, the observed choice proportions in Table 1 violate 8 and satisfy 120. The triangle inequality $\mathrm{PO}_3$ is the same as before and hence is violated 6 times. In addition, we find the following two violations:
$$Q_{b,c} + Q_{c,d} + Q_{d,b} - Q_{b,d} - Q_{d,c} - Q_{c,b} = 1.08 > 1, \tag{29}$$
$$Q_{a,c} + Q_{c,d} + Q_{d,a} - Q_{a,d} - Q_{d,c} - Q_{c,a} = 1.2 > 1. \tag{30}$$
The log-likelihood ratio test statistic point estimate is 0.4919, with the same distribution and p-value as in the weak-order case (with weights and p-value rounded to two significant digits). The lowest-dimensional face of the partial-order polytope that contains the binary choice point estimate is the intersection of the partial-order polytope with the subspace generated by the following (redundant) system of equations:
$$
\begin{gathered}
P_{a,b} + P_{b,a} = 1,\quad P_{a,c} + P_{c,a} = 1,\quad P_{a,d} + P_{d,a} = 1,\\
P_{b,c} + P_{c,b} = 1,\quad P_{b,d} + P_{d,b} = 1,\quad P_{c,d} + P_{d,c} = 1,\\
P_{c,d} - P_{c,a} + P_{d,a} = 1,\quad P_{c,d} - P_{c,b} + P_{d,b} = 1,\quad P_{b,c} - P_{b,d} + P_{c,d} = 1,\\
P_{b,c} + P_{d,b} - P_{d,c} = 1,\quad P_{a,c} - P_{a,d} + P_{c,d} = 1,\quad P_{a,c} + P_{d,a} - P_{d,c} = 1,\\
P_{b,c} - P_{b,d} + P_{c,d} - P_{c,b} + P_{d,b} - P_{d,c} = 1,\\
P_{a,c} - P_{a,d} + P_{c,d} - P_{c,a} + P_{d,a} - P_{d,c} = 1.
\end{gathered}
\tag{31--37}
$$
We mention in passing that [38] offers a complete minimal description of the semiorder and interval-order polytopes when $|C| = 4$ and carries out an illustrative fit with the data of Table 1. In those analyses, the maximum likelihood point estimates, goodness-of-fit, and $\bar{\chi}^2$ distributions (with weights rounded to two significant digits) match the ones we have reported here for the partial-order polytope. Also, in both cases, the smallest-dimensional face containing the maximum likelihood point estimate again lies in the subspace generated by (31)–(37). To summarize our analyses so far (and those of [38]), the data of Table 1 are compatible with being induced by linear orders, weak orders, semiorders, interval orders, or partial orders.
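Given a reported statistic and its $\bar{\chi}^2$ weights, the p-value is the weighted sum of the upper-tail probabilities of the component $\chi^2_k$ distributions. The point mass $\chi^2_0$ contributes nothing when the statistic is positive. The dependency-free Python sketch below re-evaluates the mixtures reported in this section. Its helper names are hypothetical. It does not derive the weights themselves, since those depend on the local facial structure at the estimate, as in Davis-Stober [8].

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df >= x) for a non-negative integer df.

    Uses the closed forms for integer degrees of freedom; df = 0 is a point mass at 0.
    """
    if df == 0:
        return 0.0 if x > 0 else 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j < df/2} h**j / j!
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h**(j - 1/2) / Gamma(j + 1/2)
    tail = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def chibar_pvalue(stat, weights):
    """p-value of a chi-bar-squared mixture; weights[k] is the weight on chi^2_k."""
    return sum(w * chi2_sf(stat, k) for k, w in enumerate(weights))


# Statistics and rounded weights as reported in the text.
print(round(chibar_pvalue(0.4919, [0.30, 0.46, 0.24]), 2))  # linear orders: 0.41
print(round(chibar_pvalue(0.4964, [0.50, 0.50]), 2))        # weak orders: 0.24
print(round(chibar_pvalue(0.4919, [0.50, 0.50]), 2))        # partial orders: 0.24
```

With the reported (rounded) weights, this reproduces the p-values of .41 and .24 to two decimals.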
5 An Illustration with Indifferences Using Between-Subjects Data
To illustrate an example with nonzero indifference counts and a between-subjects design, we analyze data from a survey taken from Dittrich et al. [9], in which respondents made paired comparisons among European universities. We use the full data on all paired comparisons, restricted to four of the universities, namely London, Paris, St. Gallen, and Barcelona (see [9] for details on the empirical task).⁸ The data on three of the schools were previously analyzed by Böckenholt [3] with the purpose of fitting several models of semiorder preferences within a random utility framework with an error component (see the original paper for more details).
Table 4. Ternary paired comparison among universities, proportion of choices out of 303 survey respondents, for London (a), Paris (b), St. Gallen (c), and Barcelona (d). Lack of choice of a university in a given pair is coded as indifference.
School pair (i, j) | (a, b) | (a, c) | (a, d) | (b, c) | (b, d) | (c, d)
Proportion i chosen over j: $Q_{i,j}$ | 186/303 | 208/303 | 217/303 | 165/303 | 157/303 | 144/303
Proportion j chosen over i: $Q_{j,i}$ | 91/303 | 73/303 | 67/303 | 119/303 | 109/303 | 134/303
Proportion indifferent between i and j | 26/303 | 22/303 | 19/303 | 19/303 | 37/303 | 25/303
The ternary paired comparison proportions in Table 4 can be treated either as ternary paired comparisons with indifference or as ternary paired comparisons without indifference, by collapsing cells accordingly. Thus, we can check whether or not they lie inside the weak order, semiorder, interval-order, or partial-order polytopes. When collapsed into ternary paired comparison proportions with indifference (by adding the third row to each of the first two rows), the resulting proportions satisfy all 106 facet-defining inequalities of the weak-order polytope for four objects. When collapsed into ternary paired comparison proportions without indifference (by ignoring the third row of the table), the data satisfy all 563 facet-defining inequalities of the semiorder polytope [38], all 191 facet-defining inequalities of the interval-order polytope [38], and all 128 facet-defining inequalities of the partial-order polytope for four choice alternatives. This means that the corresponding maximum likelihood point estimates under these model constraints are the ternary paired comparison proportions themselves, and that all four models fit this dataset perfectly. In other words, the ternary paired comparison data in Table 4 are consistent with being induced by weak orders, semiorders, interval orders, and partial orders. Put more generally, these data are consistent with a model in which each individual’s response in each paired comparison in the survey originated from a transitive state of preference (with different paired comparisons by the same respondent possibly originating from different transitive preference states). One disclaimer is in order, however: this does not logically force preferences to be transitive.
⁸ The full data are available at [47].
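The two collapsing rules can be made concrete with the Table 4 counts. The Python sketch below builds both collapsed versions with exact rational arithmetic. It then checks only the trivial and triangle facets ($\mathrm{WO}_1$–$\mathrm{WO}_3$ and $\mathrm{PO}_1$–$\mathrm{PO}_3$). The remaining facet classes of Theorems 4 and 5 would be checked the same way but are omitted to keep the example short. The names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

N = 303  # survey respondents per pair
ITEMS = "abcd"  # London, Paris, St. Gallen, Barcelona
# Table 4 counts for each pair (i, j): (i chosen over j, j chosen over i, indifferent).
COUNTS = {
    ("a", "b"): (186, 91, 26),
    ("a", "c"): (208, 73, 22),
    ("a", "d"): (217, 67, 19),
    ("b", "c"): (165, 119, 19),
    ("b", "d"): (157, 109, 37),
    ("c", "d"): (144, 134, 25),
}


def collapse(with_indifference):
    """Collapse ternary counts into pairwise proportions P[i, j].

    with_indifference=True adds the indifference count to both directions
    (weak-order paradigm); False drops it (partial-order paradigm).
    """
    P = {}
    for (i, j), (ij, ji, ind) in COUNTS.items():
        extra = ind if with_indifference else 0
        P[i, j] = Fraction(ij + extra, N)
        P[j, i] = Fraction(ji + extra, N)
    return P


def triangle_ok(P):
    """P[a, b] + P[b, c] - P[a, c] <= 1 for all distinct a, b, c."""
    return all(P[a, b] + P[b, c] - P[a, c] <= 1 for a, b, c in permutations(ITEMS, 3))


weak = collapse(with_indifference=True)
strict = collapse(with_indifference=False)
pairs = list(permutations(ITEMS, 2))
checks = {
    "WO1": all(weak[p] <= 1 for p in pairs),
    "WO2": all(weak[a, b] + weak[b, a] >= 1 for a, b in pairs),
    "WO3": triangle_ok(weak),
    "PO1": all(strict[p] >= 0 for p in pairs),
    "PO2": all(strict[a, b] + strict[b, a] <= 1 for a, b in pairs),
    "PO3": triangle_ok(strict),
}
print(checks)
```

Using `Fraction` keeps the comparisons exact, so tight inequalities are not misjudged by floating-point error.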
6 Conclusions
Transitive preferences can be modeled in many different ways. The standard operationalization of transitive preference in terms of weak stochastic transitivity is only one of many probabilistic specifications of transitive preference, and it is a problematic one. We have laid out how to formulate and test a broad class of mixture models (and equivalent random utility models when available) that all model transitive preference and that differ in which additional axioms they posit. In particular, they differ in whether they model transitive or intransitive indifference. We have documented on illustrative experimental data how to analyze binary choice or ternary paired comparison data statistically. Note that none of the models in this chapter is an explicit “error model.” One could, however, interpret weak stochastic transitivity as an error model (in which respondents are allowed to make errors with probability arbitrarily close to $\tfrac{1}{2}$), and one could study mixture models that have an error “flavor” to them. A simple example is a parametric family of models that generalize the Mallows φ-model [33]. In this model class, one assumes that there is a single “true” modal binary relation (which may be treated as a free parameter). The other binary relations in a given class are “random distortions” of the modal relation, with probabilities that decrease with the amount of disagreement with the modal preference relation. The original Mallows φ-model is an example of such a model for linear orders. In general, “error theories” could alternatively be added on top of the mixture models we have discussed here. We leave this for future consideration. We also leave it for future work to carry out rigorous and systematic empirical tests of preference transitivity using the framework we have reviewed.
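The Mallows φ-model mentioned above can be sketched in a few lines. This sketch assumes the standard formulation, which the chapter does not spell out. Under it, the probability of a linear order $r$ is proportional to $\varphi^{d(r, r_0)}$, where $r_0$ is the modal order, $0 < \varphi \le 1$, and $d$ is the Kendall distance (the number of discordant pairs). The model is a mixture over linear orders, so the binary choice probabilities it induces automatically satisfy $\mathrm{LO}_1$–$\mathrm{LO}_3$. The function names are hypothetical.

```python
from itertools import combinations, permutations


def kendall_distance(r, s):
    """Number of item pairs ordered differently by rankings r and s."""
    pos_r = {x: i for i, x in enumerate(r)}
    pos_s = {x: i for i, x in enumerate(s)}
    return sum((pos_r[a] < pos_r[b]) != (pos_s[a] < pos_s[b])
               for a, b in combinations(r, 2))


def mallows(modal, phi):
    """Mallows phi-model: P(r) proportional to phi ** d(r, modal), with 0 < phi <= 1."""
    weights = {r: phi ** kendall_distance(r, modal) for r in permutations(modal)}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def binary_choice_probs(dist):
    """Induced P[a, b] = probability that a is ranked above b (a mixture of linear orders)."""
    items = next(iter(dist))
    P = {(a, b): 0.0 for a in items for b in items if a != b}
    for r, p in dist.items():
        for i, a in enumerate(r):
            for b in r[i + 1:]:
                P[a, b] += p
    return P


dist = mallows(("a", "b", "c", "d"), phi=0.5)
P = binary_choice_probs(dist)
print(max(dist, key=dist.get))  # ('a', 'b', 'c', 'd'): the modal order is most probable
print(round(P["a", "b"], 3), round(P["a", "d"], 3))
```

Setting φ = 1 gives the uniform distribution over linear orders, in which every induced $P_{a,b}$ equals 1/2. Smaller φ concentrates mass near the modal order.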
The existing body of work on transitive or intransitive preference has largely ignored the modeling concerns we have addressed above and it has almost completely ignored the statistical challenges that face anyone who tests models with inequality constraints. Furthermore, the empirical literature has yet to incorporate ternary paired comparison systematically (see [3] for an exception). We would not venture to speculate about the prevalence of preference transitivity in the real world. Suffice it to say that it would be premature to make general claims about whether empirical choice proportions may have originated from transitive preference (and whether empirical choice proportions may have originated from preference structures with transitive indifference) without extensive further study. Future work will ideally move beyond four choice alternatives, in order to enhance the parsimony of all the models we have discussed. As we discuss elsewhere [38], these models can all be viewed as being extremely parsimonious when |C| ≥ 5. The key hurdle for the expansion to five or more choice alternatives lies in the combinatorial explosion we face, for example, in the numbers of vertices and facet-defining inequalities for the relevant polytopes.
Acknowledgments
This material is based upon work supported by the Air Force Office of Scientific Research, Cognition and Decision Program, under Award No. FA955005-1-0356 entitled “Testing Transitivity and Related Axioms of Preference for Individuals and Small Groups” (to M. Regenwetter, PI) and by the National Institute of Mental Health under Training Grant Award Nr. PHS 2 T32 MH014257 entitled “Quantitative Methods for Behavioral Research” (to M. Regenwetter, PI). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the Air Force Office of Scientific Research or of the National Institute of Mental Health. The human subjects research in our reported pilot study was approved by the Institutional Review Board of the University of Illinois at Urbana-Champaign under IRB Nr. 05178. We thank the Royal Statistical Society for availability of the survey data we have analyzed here. We are grateful to Ulf Böckenholt for pointers to the survey data and to Lyle Regenwetter for his valuable assistance with preparation of this manuscript. Jean-Paul Doignon, Samuel Fiorini, William Messner, and Reinhard Suck, as well as an anonymous referee, gave helpful comments on earlier versions.
References
1. P. J. Bickel and K. A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, volume 1. Prentice-Hall, Upper Saddle River, NJ, 2001.
2. H. D. Block and J. Marschak. Random orderings and stochastic theories of responses. In I. Olkin, S. Ghurye, H. Hoeffding, W. Madow, and H. Mann, editors, Contributions to Probability and Statistics, pages 97–132. Stanford University Press, Stanford, CA, 1960.
3. U. Böckenholt. Thresholds and intransitivities in pairwise judgments: A multilevel analysis. Journal of Educational and Behavioral Statistics, 26:269–282, 2001.
4. G. Bolotashvili, M. Kovalev, and E. Girlich. New facets of the linear ordering polytope. SIAM Journal on Discrete Mathematics, 12:326–336, 1999.
5. M. Cohen and J.-C. Falmagne. Random scale representations of binary choice probabilities: A counterexample to a conjecture of Marschak. Unpublished manuscript, Department of Psychology, New York University, 1978.
6. M. Cohen and J.-C. Falmagne. Random utility representation of binary choice probabilities: A new class of necessary conditions. Journal of Mathematical Psychology, 34:88–94, 1990.
7. M. Condorcet. Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix (Essay on the application of the probabilistic analysis of majority vote decisions). Imprimerie Royale, Paris, 1785.
8. C. P. Davis-Stober. Multinomial models under inequality constraints: Applications to measurement theory. Manuscript submitted, 2008.
9. R. Dittrich, R. Hatzinger, and W. Katzenbeisser. Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. Journal of the Royal Statistical Society: Series C, 47:511–525, 1998.
10. S. Fiorini. Determining the automorphism group of the linear ordering polytope. Discrete Applied Mathematics, 112:121–128, 2001.
11. S. Fiorini. Polyhedral Combinatorics of Order Polytopes. PhD thesis, Université Libre de Bruxelles, 2001.
12. S. Fiorini and P. C. Fishburn. Weak order polytopes. Discrete Mathematics, 275:111–127, 2004.
13. P. C. Fishburn. Intransitive indifference in preference theory. Journal of Mathematical Psychology, 7:207–228, 1970.
14. P. C. Fishburn. Utility Theory for Decision Making. R. E. Krieger, Huntington, NY, 1979.
15. P. C. Fishburn. Interval Orders and Interval Graphs. Wiley, New York, 1985.
16. P. C. Fishburn. Binary probabilities induced by rankings. SIAM Journal of Discrete Mathematics, 3:478–488, 1990.
17. P. C. Fishburn. Induced binary probabilities and the linear ordering polytope: A status report. Mathematical Social Sciences, 23:67–80, 1992.
18. P. C. Fishburn and J.-C. Falmagne. Binary choice probabilities and rankings. Economic Letters, 31:113–117, 1989.
19. I. Gilboa. A necessary but insufficient condition for the stochastic binary choice problem. Journal of Mathematical Psychology, 34:371–392, 1990.
20. M. Grötschel, M. Jünger, and G. Reinelt. Facets of the linear ordering polytope. Mathematical Programming, 33:43–60, 1985.
21. B. Grünbaum. Convex Polytopes. Wiley, New York, 1967.
22. D. Heyer and R. Niederée. Elements of a model-theoretic framework for probabilistic measurement. In E. E. Roskam, editor, Mathematical Psychology in Progress, pages 99–112. Springer, Berlin, 1989.
23. D. Heyer and R. Niederée. Generalizing the concept of binary choice systems induced by rankings: One way of probabilizing deterministic measurement structures. Mathematical Social Sciences, 23:31–44, 1992.
24. G. J. Iverson and J.-C. Falmagne. Statistical issues in measurement. Mathematical Social Sciences, 10:131–153, 1985.
25. D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47:263–291, 1979.
26. M. Koppen. Random utility representations of binary choice probabilities. In J.-P. Doignon and J.-C. Falmagne, editors, Mathematical Psychology: Current Developments, pages 181–201. Springer, New York, 1991.
27. M. Koppen. Random utility representation of binary choice probabilities: Critical graphs yielding critical necessary conditions. Journal of Mathematical Psychology, 39:21–39, 1995.
28. D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky. Foundations of Measurement, volume 1. Academic Press, San Diego, 1971.
29. G. Loomes and R. Sugden. Incorporating a stochastic element into decision theories. European Economic Review, 39:641–648, 1995.
30. R. D. Luce. Semiorders and a theory of utility discrimination. Econometrica, 26:178–191, 1956.
31. R. D. Luce, D. H. Krantz, P. Suppes, and A. Tversky. Foundations of Measurement, volume 3. Academic Press, San Diego, 1990.
32. R. D. Luce and P. Suppes. Preference, utility and subjective probability. In R. D. Luce, R. R. Bush, and E. Galanter, editors, Handbook of Mathematical Psychology, volume III, pages 249–410. Wiley, New York, 1965.
33. C. L. Mallows. Non-null ranking models I. Biometrika, 44:114–130, 1957.
34. J. Marschak. Binary-choice constraints and random utility indicators. In K. J. Arrow, S. Karlin, and P. Suppes, editors, Proceedings of the First Stanford Symposium on Mathematical Methods in the Social Sciences, pages 312–329. Stanford University Press, Stanford, CA, 1960.
35. J. I. Myung, G. Karabatsos, and G. J. Iverson. A Bayesian approach to testing decision making axioms. Journal of Mathematical Psychology, 49:205–225, 2005.
36. R. Niederée and D. Heyer. Generalized random utility models and the representational theory of measurement: A conceptual link. In A. A. J. Marley, editor, Choice, Decision and Measurement: Essays in Honor of R. Duncan Luce, pages 155–189. Lawrence Erlbaum, Mahwah, NJ, 1997.
37. M. Regenwetter. Random utility representations of finite m-ary relations. Journal of Mathematical Psychology, 40:219–234, 1996.
38. M. Regenwetter and C. P. Davis-Stober. Random preference models of transitive preference. Manuscript in preparation, 2008.
39. M. Regenwetter, J. Dana, and C. P. Davis-Stober. Transitivity of preferences. Manuscript submitted, 2008.
40. M. Regenwetter and A. A. J. Marley. Random relations, random utilities, and random functions. Journal of Mathematical Psychology, 45:864–912, 2001.
41. F. S. Roberts. Measurement Theory. Addison-Wesley, London, 1979.
42. T. Robertson, F. T. Wright, and R. L. Dykstra. Order Restricted Statistical Inference. Wiley, New York, 1988.
43. L. J. Savage. The Foundations of Statistics. Wiley, New York, 1954.
44. R. Suck. Geometric and combinatorial properties of the polytope of binary choice probabilities. Mathematical Social Sciences, 23:81–102, 1992.
45. R. Suck. Probabilistic biclassification and random variable representations. Journal of Mathematical Psychology, 41:57–64, 1997.
46. P. Suppes, D. H. Krantz, R. D. Luce, and A. Tversky. Foundations of Measurement, volume II. Academic Press, San Diego, 1989.
47. The Royal Statistical Society datasets website. http://www.blackwellpublishing.com/rss/Volumes/SeriesCvol47.htm. Last accessed January 2008.
48. I. Tsetlin, M. Regenwetter, and B. Grofman. The impartial culture maximizes the probability of majority cycles. Social Choice and Welfare, 21:387–398, 2003.
49. J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1947.
50. G. M. Ziegler. Lectures on Polytopes. Springer-Verlag, Berlin, 1995.
Integrating Compensatory and Noncompensatory Decision-Making Strategies in Dynamic Task Environments
Ling Rothrock and Jing Yin
The Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA 16802, USA
lxr28,[email protected]
Summary. This chapter summarizes ongoing work to analyze compensatory and noncompensatory decision-making behaviors using a common framework known as Brunswik’s lens model. The authors begin with a survey of existing work in modeling compensatory decision-making behavior using the lens model and an overview of initial forays into noncompensatory modeling. An example is provided of an instance in which both compensatory and noncompensatory decision making may occur in the same task under different circumstances. Formulations of the lens models that account for both types of decision behavior are then discussed, followed by speculations on a consolidated framework that incorporates both models.
1 Background
1.1 Decision-Making Strategies
Decision-making strategies are often characterized as either compensatory or noncompensatory [8,18,24]. Under a compensatory mode of decision making, information is processed exhaustively and trade-offs must be made between attribute cues. Noncompensatory strategies, on the other hand, generally do not make use of all available information, and trade-offs are often ignored. Noncompensatory decision models are consistent with the concept of bounded rationality. These strategies can be used to reduce the number of alternatives to be carefully evaluated and therefore improve processing efficiency [1,18,19]. Moreover, decision makers are sensitive to increases in problem size and to time constraints, and findings suggest that under these conditions they shift to noncompensatory strategies to save cognitive effort and improve processing efficiency [19]. In this chapter, we propose that different analytical approaches should be used to analyze compensatory and noncompensatory decision-making behaviors. We also seek to develop a common framework for analyzing both.
Ling Rothrock and Jing Yin
In compensatory strategies, attributes with high values can compensate for attributes with low values; the influence of any attribute on the choice among alternatives may be offset by other attributes. Common compensatory strategies include:
• Weighted additive: attributes have different weights;
• Equal weight: a special case of the weighted additive strategy in which all attributes have equal weights.
In a noncompensatory strategy, a decision determined by some attributes cannot be reversed by other attributes. Noncompensatory decisions shortcut the compensatory process and make decision making easier. When they make noncompensatory decisions, people usually do not systematically collect all relevant information and often ignore the relative importance of the various attributes. Some noncompensatory strategies include:
• Conjunctive: alternatives that fail to meet a minimum value requirement on each attribute are eliminated from the choice set;
• Disjunctive: alternatives are evaluated on the basis of their maximum attribute values, so that an alternative can be retained on the strength of a single attribute;
• Elimination by aspects [26]: in a sequential process, the decision maker selects an attribute according to its relative importance and eliminates all alternatives that do not meet a specific criterion (i.e., attribute value), continuing until only one alternative remains;
• Lexicographic [16]: alternatives are ranked on the basis of the most important attribute; in the case of ties, the process proceeds sequentially to the next most important attribute until only one alternative remains;
• Take the best [11]: the decision maker uses a sequence of rules to choose an alternative in the face of uncertain knowledge.
Research findings suggest that compensatory strategies usually require more information processing and integration [13,18] and are therefore more difficult to implement than noncompensatory strategies. However, it is also well known that the cognitively more demanding strategies are simpler to model than the less demanding ones.
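The difference between the compensatory and noncompensatory rules listed above shows up as soon as they disagree on the same choice set. The following minimal Python sketch implements four of the strategies (weighted additive, equal weight, conjunctive, lexicographic) on three hypothetical alternatives. The attribute scores, weights, and cutoffs are invented purely for illustration and do not come from the chapter. The disjunctive, elimination-by-aspects, and take-the-best rules are omitted.

```python
# Hypothetical alternatives scored on three attributes (higher is better).
ALTS = {
    "X": {"quality": 9, "price": 2, "location": 5},
    "Y": {"quality": 6, "price": 7, "location": 7},
    "Z": {"quality": 9, "price": 4, "location": 3},
}


def weighted_additive(alts, weights):
    """Compensatory: score = sum of weight * value, so a high value can offset a low one."""
    scores = {a: sum(weights[k] * v for k, v in attrs.items()) for a, attrs in alts.items()}
    return max(scores, key=scores.get)


def equal_weight(alts):
    """Special case of weighted additive with all weights equal."""
    attrs = next(iter(alts.values()))
    return weighted_additive(alts, {k: 1.0 for k in attrs})


def conjunctive(alts, cutoffs):
    """Noncompensatory: keep only alternatives meeting every minimum cutoff."""
    return sorted(a for a, attrs in alts.items()
                  if all(attrs[k] >= c for k, c in cutoffs.items()))


def lexicographic(alts, order):
    """Noncompensatory: compare on the most important attribute; break ties with the next."""
    return max(alts, key=lambda a: tuple(alts[a][k] for k in order))


print(weighted_additive(ALTS, {"quality": 0.5, "price": 0.3, "location": 0.2}))  # Y
print(equal_weight(ALTS))                                                        # Y
print(conjunctive(ALTS, {"quality": 4, "price": 4, "location": 4}))             # ['Y']
print(lexicographic(ALTS, ["quality", "price", "location"]))                    # Z
```

Under the weighted and equal-weight rules, Y's better price and location compensate for its lower quality. The lexicographic rule never gets past quality and price, so Z wins on its tie-breaking price advantage over X.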
The linear regression model (1) is the most commonly used representation for compensatory decisions [4]:
$$f_c(X) = b_0 + \sum_{i=1}^{k} b_i X_i + \varepsilon_c. \tag{1}$$
The value $f_c(X)$ is the utility score of the alternative to be evaluated, $b_i$ is the weight for the $i$th cue, $X_i$ is the value for the $i$th cue, and $\varepsilon_c$ is a random error term. For noncompensatory strategies, a complementary model is offered in (2) [27]:
$$f_n(X) = \left( \bigvee_{s} \bigwedge_{k} X_i \right) + \varepsilon_n, \tag{2}$$
Decision-Making in Dynamic Task Environments
where $\bigvee$ is the Boolean disjunction over $s$, the set containing all rules, and $\bigwedge$ is the Boolean conjunction over $k$, the set containing all cues. The noncompensatory model is represented in disjunctive normal form (DNF): a disjunction (a sequence of ORs) whose terms are each a conjunction (an AND) of one or more literals. Mendelson has shown that any strategy built from the basic logical operators can be reduced to DNF [17]. As such, (2) can be used to model strategies ranging from simple conjunctive and disjunctive rules [8] to the process-dependent fast and frugal heuristics [11,21].
1.2 An Example
A restaurant example illustrates the key difference between the two modes of decision making. The example is an extension of [27]. Suppose that Mr. Smith is choosing among four hypothetical restaurants (A, B, C, and D) for an evening out. Each candidate restaurant has been reviewed by a local newspaper food critic and has an established reputation; their scores are given in Table 1. Desirability represents the attractiveness of the restaurant to the decision maker Mr. Smith (1 = completely desirable; 0 = completely undesirable).
Table 1. Review scores for the restaurants.
Restaurant | Local Reputation | Newspaper Review | Desirability
A | 5.0 | 5.0 | 0.95
B | 4.7 | 3.0 | 0.75
C | 4.1 | 3.5 | 0.70
D | 2.0 | 1.0 | 0.20
As a compensatory model, we can represent the restaurant choice problem as (3):
$$\text{Desirability} = -0.17 + 0.15 \times \text{Local Reputation} + 0.08 \times \text{Newspaper Review}. \tag{3}$$
We see that the desirability of a restaurant is determined by weighing all the available information. Model (3) also indicates that Mr. Smith considers Local Reputation roughly twice as important as Newspaper Review (0.15 versus 0.08) when evaluating restaurants. In this case, he will choose restaurant A. With Desirability as the response variable and Local Reputation and Newspaper Review as predictor variables, a linear regression accounts for nearly all of the variance in the data of Table 1. Now, consider the case in which Mr. Smith is under time pressure to choose a restaurant, eat quickly, and return to a late business meeting. In this situation,
Ling Rothrock and Jing Yin
although he is still concerned about the quality of the restaurant, he is more interested in finishing on time. For the four restaurants, he sees the information in Table 2.

Table 2. General restaurant information.
Restaurant   Seating Line   Occupancy        Desirability
A            Full           Full             0
B            Full           Seat Available   1
C            Empty          Full             1
D            Empty          Seat Available   0
Whereas Restaurant A was highly desirable in the compensatory case, it is undesirable in this time-stressed situation because both its seating line and its dining room are full. Moreover, because an empty restaurant is a likely indicator of poor food quality, the decision maker is not interested in Restaurant D either. Therefore, as a noncompensatory model, we can represent the choice problem as (4):

$$
\begin{aligned}
&[(\text{Seating Line} = \text{Full} \;\text{AND}\; \text{Occupancy} = \text{Full}) \rightarrow (\text{Desirability} = 0)] \\
\text{OR}\;&[(\text{Seating Line} = \text{Full} \;\text{AND}\; \text{Occupancy} = \text{Seat Available}) \rightarrow (\text{Desirability} = 1)] \\
\text{OR}\;&[(\text{Seating Line} = \text{Empty} \;\text{AND}\; \text{Occupancy} = \text{Full}) \rightarrow (\text{Desirability} = 1)] \\
\text{OR}\;&[(\text{Seating Line} = \text{Empty} \;\text{AND}\; \text{Occupancy} = \text{Seat Available}) \rightarrow (\text{Desirability} = 0)]
\end{aligned} \tag{4}
$$
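To make the DNF form concrete, model (4) can be encoded as a list of disjuncts, each pairing a conjunction of cue conditions with a desirability value. The Python sketch below is a hypothetical encoding (the names `RULES_4` and `desirability` are illustrative, not from the chapter): the inner `all(...)` plays the role of AND, and scanning the disjuncts until one fires plays the role of OR. Because the four conjunctions in (4) are mutually exclusive, the scan order does not affect the result.

```python
# Hypothetical encoding of rule set (4): (conjunction of cue conditions, desirability).
RULES_4 = [
    ({"seating_line": "Full", "occupancy": "Full"}, 0),
    ({"seating_line": "Full", "occupancy": "Seat Available"}, 1),
    ({"seating_line": "Empty", "occupancy": "Full"}, 1),
    ({"seating_line": "Empty", "occupancy": "Seat Available"}, 0),
]


def desirability(cues, rules=RULES_4):
    """Return the outcome of the first disjunct whose conjunction holds (None if none fires)."""
    for conditions, outcome in rules:
        # AND over all conditions in this disjunct
        if all(cues.get(cue) == value for cue, value in conditions.items()):
            return outcome
    return None


# Cue values from Table 2
table_2 = {
    "A": {"seating_line": "Full", "occupancy": "Full"},
    "B": {"seating_line": "Full", "occupancy": "Seat Available"},
    "C": {"seating_line": "Empty", "occupancy": "Full"},
    "D": {"seating_line": "Empty", "occupancy": "Seat Available"},
}

predictions = {name: desirability(cues) for name, cues in table_2.items()}
print(predictions)  # {'A': 0, 'B': 1, 'C': 1, 'D': 0}
```

The printed values reproduce the Desirability column of Table 2.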
The noncompensatory restaurant choice model is an example of an exclusive-OR (XOR) relationship, which cannot be modeled using linear methods. The best linear model is the noninformative solution (5), which accounts for none of the variance in the data in Table 2:

$$
\text{Desirability} = 0.5 + 0 \times \text{Seating Line} + 0 \times \text{Occupancy} \tag{5}
$$
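The contrast between (3) and (5) can be reproduced with ordinary least squares. In this Python sketch (assuming NumPy), the 0/1 coding of the Table 2 cues is our own assumption, since the text does not give one; because each cue has only two levels, any other two-level coding yields the same zero slopes. The Table 1 fit should recover coefficients close to those in (3) with an R² near 1, whereas the Table 2 fit collapses to the intercept-only model (5) with R² = 0.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = b0 + X b; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Table 1: (Local Reputation, Newspaper Review) -> Desirability
X1 = np.array([[5.0, 5.0], [4.7, 3.0], [4.1, 3.5], [2.0, 1.0]])
y1 = np.array([0.95, 0.75, 0.70, 0.20])

# Table 2 with an assumed coding: Seating Line Full = 1, Occupancy Full = 1
X2 = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
y2 = np.array([0.0, 1.0, 1.0, 0.0])

coef1, r2_1 = fit_ols(X1, y1)
coef2, r2_2 = fit_ols(X2, y2)
print("Table 1:", np.round(coef1, 2), "R^2 =", round(float(r2_1), 3))
print("Table 2:", np.round(coef2, 2), "R^2 =", round(float(r2_2), 3))
```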
The example demonstrates the key difference between compensatory and noncompensatory, or more specifically rule-based, models of decision making. Although linear additive models can often approximate nonlinear strategies, they are sometimes inadequate to represent rule-based behaviors such as that exhibited in the restaurant example. We therefore argue that different analytical approaches should be used to analyze compensatory and noncompensatory decision-making behaviors. To analyze both types of behavior, we seek to develop a common framework. As a starting point, we begin with Brunswik's lens model.
Decision-Making in Dynamic Task Environments
2 Brunswik's Lens Model

The theoretical basis for the lens model traces back to Brunswik's probabilistic functionalism [5], which specifies a functional relationship between the organism and its environment as mediated through probabilistic relations within a set of environmental cues. A common representation of the lens model is given in Figure 1, where the task ecology is represented in the left half and the human judgment system in the right half. Subjects make judgments $Y_s$ about an environmental variable, measured by the criterion $Y_e$, using a set of cues $X$. The symmetrical structure of the lens model provides a means for evaluating judgment behavior in terms of the relationship between the judgments and their situated task ecology. The lens model has been applied to study a variety of behaviors and conceptual problems [6].
Fig. 1. Brunswik’s lens model.
The existing analytical framework for the lens model uses linear models (i.e., regression) to determine the fitness of human judgment relative to environmental constraints. Its power, however, is limited in cases of contingent decision behavior, where judgments are driven by noncompensatory strategies. We developed a new approach, called the rule-based lens model (RLM), to model noncompensatory decision-making behavior [27]. RLM explores the direct use of rule-based modeling within the lens model framework, and the underlying inductive system for building its rule-based models is logical inference [20]. It differs from the traditional version of the lens model (LM), in which multiple regression analysis is used to generate linear models for both sides of the system. Moreover, a new computational framework based on match frequency was developed to complement the statistical correlation analysis used in LM. The next two sections describe how the lens framework, as originally conceived by [5], is formulated to explicate both types of decision-making behavior.
2.1 The Traditional Lens Model

The traditional lens model can be used to evaluate and characterize compensatory decision behaviors such as (3) in the restaurant choice example. A central component of LM is the use of multiple regression to capture the judge's policy or the task environment model. Historically, multiple linear regression has been the tool most commonly used to capture the judgment policies of an individual decision maker [6]. Nevertheless, LM models do not provide a complete depiction of decision process features [12]. We provide a short description of the mechanisms of LM. Assume that the judge is presented with a series of m profiles, each of which gives the values of the k cues. The judge is required to produce a judgment for each profile. Let $X_1, \ldots, X_k$ denote the cue values, $Y_s$ the value of the judgment, and $Y_e$ the criterion value. The regression equation produced by policy capturing determines the optimal (least-squares) weight for each cue in terms of its predictive contribution to the judgment. The general form of the linear judgment model is

$$
Y_s = b_0 + \sum_{i=1}^{k} b_i X_i + e, \tag{6}
$$

where $b_0$ is the regression constant, $b_i$ is the regression coefficient for cue $i$, and $e$ is an error term. In LM (see Figure 2), the lens model parameters are calculated as statistical correlations. Let $R_e = \operatorname{corr}(Y_e, \hat{Y}_e)$, $R_s = \operatorname{corr}(Y_s, \hat{Y}_s)$, $G = \operatorname{corr}(\hat{Y}_e, \hat{Y}_s)$, $C = \operatorname{corr}[(Y_e - \hat{Y}_e), (Y_s - \hat{Y}_s)]$, and $r_a = \operatorname{corr}(Y_e, Y_s)$. Correlation deals with the problem of finding a linear relationship between two sets of sampled data, and the standard measure of that relationship is the correlation coefficient. The most commonly used measure of correlation for variables measured on interval or ratio scales is Pearson's product-moment correlation coefficient (r). In LM, $r_a$ represents how well human judgments adapt to the actual value of the environmental criterion. Environmental predictability $R_e$ measures how well the environmental model predicts the actual criterion value. Human control $R_s$ indicates how well the linear policy model captures actual human judgments. Linear knowledge $G$ estimates how well the linear prediction model of the environment maps onto the policy model of the human judge. Unmodeled knowledge $C$ measures the extent to which the two models (one for the environment and the other for the human judgment) share aspects that are not captured by the corresponding linear models. A mathematical formulation of the relations among the components of the model was introduced by Hursch et al. [15]. Tucker [25] made important modifications to their work, and the resulting lens model equation (LME) is

$$
r_a = G R_e R_s + C \sqrt{1 - R_e^2}\,\sqrt{1 - R_s^2}. \tag{7}
$$
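As a numerical check of (7), the following Python sketch (assuming NumPy; the data are simulated and purely hypothetical) fits both the criterion model and the judgment model by OLS on the same cues and compares $r_a$ with the right-hand side of the LME. Because both models are fitted with an intercept on the same cues, each model's residuals are uncorrelated with the other model's predictions, so the two sides should agree to floating-point precision. The function name `lens_model_parameters` is illustrative only.

```python
import numpy as np


def ols_predict(X, y):
    """Fit y = b0 + X b by ordinary least squares and return the fitted values."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


def corr(a, b):
    """Pearson product-moment correlation of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


def lens_model_parameters(X, ye, ys):
    """LM parameters for cues X, criterion ye, and judgments ys."""
    ye_hat = ols_predict(X, ye)  # environmental (criterion) model
    ys_hat = ols_predict(X, ys)  # policy-capturing (judgment) model
    return {
        "Re": corr(ye, ye_hat),
        "Rs": corr(ys, ys_hat),
        "G": corr(ye_hat, ys_hat),
        "C": corr(ye - ye_hat, ys - ys_hat),
        "ra": corr(ye, ys),
    }


# Hypothetical data: 200 profiles with k = 3 cues
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
ye = X @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=200)
ys = X @ np.array([0.4, 0.4, 0.0]) + rng.normal(scale=0.7, size=200)

p = lens_model_parameters(X, ye, ys)
lme = p["G"] * p["Re"] * p["Rs"] + p["C"] * np.sqrt(1 - p["Re"] ** 2) * np.sqrt(1 - p["Rs"] ** 2)
print(f"ra = {p['ra']:.6f}, LME right-hand side = {lme:.6f}")
```

If the two models were fitted on different cue sets, the cross terms would no longer vanish and (7) would hold only approximately.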
Fig. 2. Traditional lens model (LM) with labeled statistical parameters [27].
The LME decomposes a judge's achievement into two components. The linear component $G R_e R_s$ captures the linear relationship within the ecology and human judgment systems. The second component, $C\sqrt{1 - R_e^2}\sqrt{1 - R_s^2}$, represents the contribution of unmodeled aspects of the judgment and ecology models. The LME can be viewed as a coherent modeling methodology that links the regression-based policy-capturing model with the regression-based criterion model, so that human judgment and ecological systems can be compared [6].

2.2 Rule-Based Lens Model

To analyze noncompensatory decisions such as (2), we propose another method using a rule-based version of the lens model. RLM is built on the idea that people use rules or heuristics to make decisions in order to cope with the constraints posed by the environment. The key components of RLM are:
• The rule inference system, which builds rule-based models to represent the cue-criterion and cue-judgment relationships
• Rule-based lens model analysis, which complements the lens model equation (7)

Rule Inference System

Most efforts to model rule-based decision strategies have used nonlinear additive formulations [8–10]. Additive models, however, are inadequate to account for the noncompensatory decision behavior described in the restaurant example. Rothrock and Kirlik [20] developed an alternative technique called genetic-based policy capturing (GBPC) to infer noncompensatory rules in DNF from human judgment data.
The inference process can be characterized as a multiobjective optimization problem in which the search methodology must find optimal solutions. The central element of the inductive inference system is the multiobjective fitness evaluation function. In LM, the least-squares criterion is used to determine the optimal fit between the linear models and the actual data. In RLM, however, alternative measures are needed to evaluate the goodness of a rule set. In GBPC, Rothrock and Kirlik [20] defined the fitness of a rule set along three dimensions: completeness, specificity, and parsimony. Rule sets with high fitness values across all three dimensions will not only match a set of exemplars (i.e., human judgment data or environment data), but also resemble economical, noncompensatory judgment strategies. As such, the rule sets learned from the inductive inference system can represent behavior in a manner consistent with the notion of bounded rationality. Suppose we have a set of rules, called a rule set S, which consists of M disjuncts of conjunctive rules $S_1, \ldots, S_M$, and an exemplar set consisting of N judgment instances $E_1, \ldots, E_N$. The goodness-of-fit measures are defined along completeness, specificity, and parsimony as follows.
1. Completeness c (0 ≤ c ≤ 1). The completeness dimension is based on work by DeJong et al. [7] and reflects how well a rule set matches the entire set of exemplars. In other words, it is the proportion of judgment instances that can be derived from rule set S.
2. Specificity s (0 ≤ s ≤ 1). The specificity dimension was first suggested by Holland et al. [14] and reflects how specific a rule set is with respect to the number of wild cards (match-anything conditions) it contains. Rule sets with fewer wild cards are classified as more specific. For rule set S, specificity is the average proportion of specified conditions in its conjunctive rules.
3. Parsimony p (0 ≤ p ≤ 1). The parsimony dimension reflects the goodness of a rule set in terms of necessity. In a parsimonious rule set, there are no unnecessary rules. For rule set S, parsimony is the proportion of conjunctive rules that cover at least one judgment instance.

The restaurant example is used to illustrate the concept and implementation details of the fitness evaluation. The judgment data in Table 2 form an exemplar set of four judgment instances (exemplars), shown in Table 3. Suppose we want to evaluate the fitness of five candidate rule sets (Table 4), which serve as the sample population used to learn the exemplars in the inference system. The rule sets consist of varying numbers of rules; for example, rule set No. 3 consists of three disjuncts of conjunctive rules.

Table 3. Exemplar set for the restaurant example.
Exemplar No.   Exemplar
1              (seating line is full) AND (occupancy is full) AND (desirability is 0)
2              (seating line is full) AND (occupancy is seat available) AND (desirability is 1)
3              (seating line is empty) AND (occupancy is full) AND (desirability is 1)
4              (seating line is empty) AND (occupancy is seat available) AND (desirability is 0)

Table 4. Sample rule sets for the restaurant example.
Rule Set No.   Rule Set
1              (seating line is anything) AND (occupancy is anything) AND (desirability is anything)
2              (seating line is empty) AND (occupancy is full) AND (desirability is 0)
3              [(seating line is full) AND (occupancy is full) AND (desirability is 0)] OR [(seating line is full) AND (occupancy is seat available) AND (desirability is 1)] OR [(seating line is empty) AND (occupancy is anything) AND (desirability is 1)]
4              [(seating line is empty) AND (occupancy is full) AND (desirability is 1)] OR [(seating line is empty) AND (occupancy is seat available) AND (desirability is 0)]
5              [(seating line is full) AND (occupancy is full) AND (desirability is 1)] OR [(seating line is full) AND (occupancy is seat available) AND (desirability is 0)] OR [(seating line is empty) AND (occupancy is anything) AND (desirability is 0)]

The fitness values for the five rule sets are given in Table 5. The completeness dimension discriminates between rule sets matching all exemplars (rule set No. 1), rule sets matching some of the exemplars (rule set No. 3 matches three exemplars, No. 4 matches two, and No. 5 matches one), and rule sets matching none (rule set No. 2). The specificity dimension discriminates between rule sets that contain match-anything conditions (rule set No. 1 contains three; rule set Nos. 3 and 5 each contain one) and rule sets that do not (rule set Nos. 2 and 4). The third dimension, parsimony, discriminates between rule sets with no unnecessary rules (rule set Nos. 1, 3, and 4) and those with unnecessary rules (rule set No. 2 contains one useless rule that does not match any exemplar; rule set No. 5 contains two such useless rules).

Table 5. Fitness values of the sample rule sets.
Rule Set No.   Completeness (c)   Specificity (s)   Parsimony (p)
1              1                  0                 1
2              0                  1                 0
3              0.75               0.8889            1
4              0.5                1                 1
5              0.25               0.8889            0.3333
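The three fitness dimensions can be sketched in Python as follows. This is a minimal illustration, not GBPC itself: we assume rules are encoded as tuples over (seating line, occupancy, desirability) with `None` as the wild card, and in place of the multiobjective search we apply a brute-force Pareto filter to the five candidate rule sets of Table 4. Under these assumptions the sketch reproduces Table 5 and identifies rule sets Nos. 1, 3, and 4 as mutually nondominated.

```python
from typing import Optional, Sequence, Tuple

# A condition of None stands for a match-anything (wild card) condition.
Rule = Tuple[Optional[str], Optional[str], Optional[int]]
Exemplar = Tuple[str, str, int]


def matches(rule: Rule, exemplar: Exemplar) -> bool:
    """A conjunctive rule matches an exemplar if every specified condition agrees."""
    return all(cond is None or cond == value for cond, value in zip(rule, exemplar))


def fitness(rule_set: Sequence[Rule], exemplars: Sequence[Exemplar]):
    """Return (completeness, specificity, parsimony) for a DNF rule set."""
    # Completeness: share of exemplars matched by at least one disjunct
    c = sum(any(matches(r, e) for r in rule_set) for e in exemplars) / len(exemplars)
    # Specificity: average share of specified (non-wild-card) conditions per rule
    s = sum(sum(cond is not None for cond in r) / len(r) for r in rule_set) / len(rule_set)
    # Parsimony: share of rules that cover at least one exemplar
    p = sum(any(matches(r, e) for e in exemplars) for r in rule_set) / len(rule_set)
    return c, s, p


def pareto_optimal(scores):
    """Indices of score vectors not dominated by any other (all objectives maximized)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [i for i, a in enumerate(scores)
            if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)]


# Table 3: (seating line, occupancy, desirability)
exemplars = [("full", "full", 0), ("full", "seat", 1),
             ("empty", "full", 1), ("empty", "seat", 0)]

# Table 4
rule_sets = {
    1: [(None, None, None)],
    2: [("empty", "full", 0)],
    3: [("full", "full", 0), ("full", "seat", 1), ("empty", None, 1)],
    4: [("empty", "full", 1), ("empty", "seat", 0)],
    5: [("full", "full", 1), ("full", "seat", 0), ("empty", None, 0)],
}

scores = {k: fitness(v, exemplars) for k, v in rule_sets.items()}
for k, (c, s, p) in scores.items():
    print(f"Rule set {k}: c = {c:.4g}, s = {s:.4g}, p = {p:.4g}")

keys = list(scores)
front = [keys[i] for i in pareto_optimal([scores[k] for k in keys])]
print("Pareto-optimal rule sets:", front)
```

Exhaustive enumeration is only feasible for toy populations like this one; a realistic rule space calls for a search method such as the genetic approach used in GBPC.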
As Table 5 shows, rule set Nos. 1–4 each score perfectly on some dimensions but not on all at once. For example, although rule set No. 1 matches all exemplars and contains no unnecessary rules, it represents an overgeneralized strategy (if anything then anything), from which we gain no useful information. Rule set No. 2 represents a specific strategy, but it does not match any exemplar. All these rule sets have different merits and drawbacks, and no single rule set scores perfectly on all dimensions simultaneously. Given a finite number of rule sets S whose lengths vary from 1 to M, we want to find the set of rule sets that are Pareto optimal with respect to the three dimensions (c, s, p). The search process can be formulated as a multiobjective integer programming problem [22], in which the decision variables are bounded integers and the objectives are continuous values on three different dimensions. The ideal rule set, therefore, would be maximally complete (matching all behavioral data), specific, and parsimonious. The rule inference technique can be used to model both the human judgment and the task ecology.

Rule-Based Lens Model Analysis

The integration of rule-based models into the lens model framework is the second component of RLM. In the traditional lens model, the lens parameters can be assessed using parametric statistical tools because the cue values and judgments are both measured on an interval scale. However, because different judgment strategies require the decision-making process to operate on different scales [15,20,25], the notable difference between building a functional model of the human-ecology
system in compensatory versus noncompensatory terms lies not only in the mechanisms of induction but also in the types of data to be analyzed. Standard linear regression methods, which require the dependent variable to be measured on an interval scale, are used to study compensatory strategies and ecologies [23]. However, as shown in the modeling of noncompensatory judgment strategies [20], the rule-based formulation assumes that human judgments, ecological cues, and the ecological criterion itself are measured on a nominal scale. A different framework is therefore needed to operate on nominal values. The development of this alternative framework [27] begins with a review of the fundamental differences between the two types of problems. Linear additive problems, which correspond to the linear formulation of the lens model, can usually be defined as predicting the values of a continuous dependent variable from one or more continuous and/or categorical predictor variables. Classification problems, which correspond to the rule-based formulation of the lens model, instead predict the values of a categorical dependent variable from one or more continuous and/or categorical predictor variables. For example, an existing aircraft identification task [2] requires subjects to make identification judgments (hostile, neutral, or friendly) of unknown tracks based on information presented in the task environment, such as range from center, speed, altitude, and point of origination. This is an example of a multiclass classification problem, in which the dependent variable can take multiple categorical values. As previously discussed, statistical correlation serves as the basis for the data correspondence analysis in LM.
Before the same kind of approach is extended to noncompensatory strategies, the assumptions underlying correlation analysis must be examined to see whether the same computational framework can be applied to the rule-based case. Pearson's product-moment correlation coefficient (r) makes the implicit assumption that the two variables are jointly normally distributed. When this assumption is not justified, nonparametric measures such as Spearman's rank correlation coefficient or Kendall's tau can be used, but they still require at least ordinal data. When modeling noncompensatory strategies and ecologies, the data used to represent environmental cues, judgments, and criteria are discrete and categorical, so the numbers assigned to them are on nominal scales. This assignment is arbitrary, because the numbers are merely labels. As a consequence, interval or ratio scale operations are meaningless, and the correlation coefficient cannot be used as a measure of the correspondence between data groups. A new RLM framework was therefore developed to compare the fitted models of both the cognitive strategies and the task ecology. Recall that when continuous data are modeled via regression, errors are always measured as the distances between predicted and actual values. In contrast, when categorical data are modeled, predicting the right category is called a match, and predicting a wrong category should be regarded as
an error no matter which wrong category is predicted. This concept is used to formulate the correspondence relationship between two nominal, categorical datasets. A simple example (see Table 6) illustrates the underlying formulation.

Table 6. Two categorical datasets.
Dataset 1     Dataset 2     Result
Desirable     Undesirable   Mismatch
Undesirable   Desirable     Mismatch
Desirable     Desirable     Match
Desirable     Desirable     Match
Undesirable   Undesirable   Match
The example in Table 6 has five pairs of categorical data, three of which match. We call the ratio of the number of matched cases to the total number of instances the match frequency: m = 3/5 = 0.6. In RLM, the lens model parameters $R_e$, $R_s$, $G$, and $r_a$ (all bivariate correlations in LM) are calculated as match frequencies. For $C$, the "unmodeled knowledge" term, set relationships are introduced into the formulation because subtraction cannot be applied to nominal data [27]. Assume that the ecological criterion to be judged is $Y$, which takes categorical values on a nominal scale, and let $\{Y_1, \ldots, Y_p\}$ be the set of discrete values that $Y$ can take. For instance $i$, $1 \le i \le n$, $Y_{ei}$ is the actual environmental criterion value, $\hat{Y}_{ei}$ is the criterion value predicted by the environmental rule-based model, $Y_{si}$ is the human judgment value, and $\hat{Y}_{si}$ is the judgment value predicted by the human judgment rule-based model. Define

$$
I_{ei} = \begin{cases} 1 & \text{if } Y_{ei} = \hat{Y}_{ei} \\ 0 & \text{otherwise} \end{cases}
\qquad
I_{si} = \begin{cases} 1 & \text{if } Y_{si} = \hat{Y}_{si} \\ 0 & \text{otherwise} \end{cases}
\qquad
I_{ri} = \begin{cases} 1 & \text{if } Y_{ei} = Y_{si} \\ 0 & \text{otherwise} \end{cases}
$$

$$
I_{Gi} = \begin{cases} 1 & \text{if } \hat{Y}_{ei} = \hat{Y}_{si} \\ 0 & \text{otherwise} \end{cases}
\qquad
I_{Ci} = \begin{cases} 1 & \text{if } I_{ei} = I_{si} = 0 \\ 0 & \text{otherwise.} \end{cases}
$$

Using these binary variables, the RLM parameters are defined as follows:
$$
R_e = \sum_{i=1}^{n} I_{ei}/n, \tag{8}
$$

$$
R_s = \sum_{i=1}^{n} I_{si}/n, \tag{9}
$$

$$
r_a = \sum_{i=1}^{n} I_{ri}/n, \tag{10}
$$

$$
G = \sum_{i=1}^{n} I_{Gi}/n, \tag{11}
$$

$$
C = \sum_{i=1}^{n} I_{Ci}/n. \tag{12}
$$
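The nominal-scale argument behind (8)–(12) can be made concrete with a small Python sketch (assuming NumPy; the aircraft-style labels below are hypothetical and not taken from [2]). Recoding the same categories with a different, equally arbitrary numbering changes Pearson's r, whereas the match frequency is unchanged. The first call reproduces m = 0.6 for Table 6.

```python
import numpy as np


def match_frequency(a, b):
    """Proportion of instances whose two nominal labels agree."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a == b))


# Table 6
table_6 = match_frequency(
    ["Desirable", "Undesirable", "Desirable", "Desirable", "Undesirable"],
    ["Undesirable", "Desirable", "Desirable", "Desirable", "Undesirable"],
)

# Hypothetical criterion and judgment labels for a three-class task
criterion = ["hostile", "neutral", "friendly", "hostile", "friendly", "neutral"]
judgment = ["hostile", "friendly", "friendly", "neutral", "friendly", "neutral"]

# Two equally arbitrary numeric codings of the same nominal labels
codings = [
    {"hostile": 0, "neutral": 1, "friendly": 2},
    {"hostile": 1, "neutral": 2, "friendly": 0},
]

results = []
for coding in codings:
    x = np.array([coding[v] for v in criterion])
    y = np.array([coding[v] for v in judgment])
    results.append((np.corrcoef(x, y)[0, 1], match_frequency(x, y)))

print(f"Table 6 match frequency: {table_6:.1f}")
for r, m in results:
    print(f"Pearson r = {r:+.3f}, match frequency = {m:.3f}")
```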
The definition of $I_{Ci}$ differs from that of the other indicators: it equals 1 only if neither the environmental criterion nor the human judgment is captured by the corresponding rule-based model. In such cases, a mapping exists between the irregularities of the two sides (the environment and the human judgment systems). In contrast, if either the human judgment or the environmental criterion is correctly predicted by its inductive rule-based model, the mapping does not exist, in the sense that the irregularities do not occur simultaneously. Consequently, $C$ measures the degree of mapping of the systems' irregularities between the rule-based ecological and judgment models. In the restaurant example, assume that the criterion to be judged is the quality of the restaurant (1, good; 0, bad). The actual criterion values $Y_e$ and judgment data $Y_s$ are shown in Table 7. If we assume that the predicted criterion values $\hat{Y}_e$ and judgment values $\hat{Y}_s$ are generated by the rule-based models produced by the corresponding inference systems, we can calculate the RLM parameters for this example according to (8)–(12).

Table 7. RLM parameters calculation for the restaurant example.
i   Y_ei   Ŷ_ei   Y_si   Ŷ_si   I_ei   I_si   I_ri   I_Gi   I_Ci
1   1      0      0      1      0      0      0      0      1
2   1      1      1      1      1      1      1      1      0
3   1      1      1      1      1      1      1      1      0
4   0      0      0      0      1      1      1      1      0

$R_e = 0.75$, $R_s = 0.75$, $r_a = 0.75$, $G = 0.75$, $C = 0.25$
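Equations (8)–(12) reduce to counting indicator agreements. The Python sketch below is a minimal illustration (the function name `rlm_parameters` is ours, not from [27]); applied to the four instances of Table 7, it returns $R_e = R_s = r_a = G = 0.75$ and $C = 0.25$. The labels can be any comparable values, since only equality is tested, which is exactly what a nominal scale permits.

```python
from typing import Dict, Sequence


def rlm_parameters(ye: Sequence, ye_hat: Sequence,
                   ys: Sequence, ys_hat: Sequence) -> Dict[str, float]:
    """RLM parameters (8)-(12) from four equally long sequences of nominal labels."""
    n = len(ye)
    if not (len(ye_hat) == len(ys) == len(ys_hat) == n):
        raise ValueError("all four sequences must have the same length")
    I_e = [a == b for a, b in zip(ye, ye_hat)]         # environmental model correct
    I_s = [a == b for a, b in zip(ys, ys_hat)]         # judgment model correct
    I_r = [a == b for a, b in zip(ye, ys)]             # judgment equals criterion
    I_G = [a == b for a, b in zip(ye_hat, ys_hat)]     # the two models agree
    I_C = [not e and not s for e, s in zip(I_e, I_s)]  # both models miss on the same instance
    return {
        "Re": sum(I_e) / n,
        "Rs": sum(I_s) / n,
        "ra": sum(I_r) / n,
        "G": sum(I_G) / n,
        "C": sum(I_C) / n,
    }


# Table 7 (restaurant quality: 1 = good, 0 = bad)
params = rlm_parameters(ye=[1, 1, 1, 0], ye_hat=[0, 1, 1, 0],
                        ys=[0, 1, 1, 0], ys_hat=[1, 1, 1, 0])
print(params)
```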
The interpretation of other lens model parameters for RLM is analogous to the linear case. Achievement ra represents the correspondence between human judgments and the actual value of the environmental criterion. The environmental predictability Re measures how well the rule-based environmental
model can be used to predict the criterion value, whereas $R_s$, labeled human control, indicates how well the rule-based cognitive model captures actual human judgments. However, instead of being labeled linear knowledge as in LM, the parameter $G$ now represents how well the rule-based model of the environment maps onto the rule-based model of the human judgment strategy. In RLM, parameters $r_a$, $R_e$, $R_s$, $G$, and $C$ all range over [0, 1]. Parameters $r_a$, $R_e$, $R_s$, and $G$ are calculated as the match frequencies of the corresponding categorical datasets; the closer these values are to unity, the better the achievement, environmental predictability, human control, and modeled knowledge they represent, respectively. For $C$, higher values reveal a higher degree of unmodeled knowledge. Indeed, a high value of $C$ revealed through RLM analysis may indicate a high degree of compensatory cue usage. The integrated RLM framework that accounts for noncompensatory strategies is depicted in Figure 3.
Fig. 3. The rule-based lens model (RLM) with labeled parameters [27].
2.3 Summary

The major characteristics of LM and RLM are summarized in Table 8.

Table 8. LM and RLM comparison.
      Prediction Mechanism   Parameter Calculation
LM    Multiple regressions   Statistical correlations
RLM   Rule learning          Boolean algebra

3 Future Work

The development of RLM is still at an early stage. When building the rule learning system, it is hard for us to use the multidimensional evaluation to establish a necessary condition for inducing noncompensatory rules that are consistent with human judgment behavior. As future work, we seek to refine the formulation of the fitness functions to establish this relationship. In addition, other dimensions could be added to the global fitness of the model to make it more robust. An intelligent search algorithm is also needed to solve the optimization problem more efficiently and possibly to account for continuous cues. To date, no unified and coherent methodology comparable to the LME for compensatory behavior exists for analyzing noncompensatory decisions. We are currently extending Bjorkman's approach [3] by using probability models to refine the formulation of RLM. The ultimate research goal is to develop a mathematical model for RLM that can systematically incorporate the different components of the total system. We believe such a model can greatly advance the theoretical understanding of human decision making as shaped by interaction with the environment.
Acknowledgments The authors would like to gratefully acknowledge the financial support provided by NSF under Grant No. SES-0452416. Any opinions, findings, and conclusions or recommendations expressed in this chapter are those of the authors and do not necessarily reflect the views of the NSF.
References
1. L. R. Beach. Measures of linear and nonlinear models of decision behavior: Process tracing versus policy capturing. Organizational Behavior and Human Decision Processes, 31:331–351, 1993.
2. A. M. Bisantz, A. Kirlik, P. Gay, D. Phipps, N. Walker, and A. D. Fisk. Modeling and analysis of a dynamic judgment task using a lens model approach. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 30:605–616, 2000.
3. M. Bjorkman. Stimulus-event learning and event learning as concurrent processes. Organizational Behavior and Human Performance, 2:219–236, 1967.
4. E. Brunswik. Thing consistency as measured by correlation coefficient. Psychological Review, 47:69–78, 1940.
5. E. Brunswik. Representative design and probabilistic theory in a functional psychology. Psychological Review, 62:193–217, 1955.
6. R. W. Cooksey. Judgment Analysis: Theory, Methods, and Applications. Academic Press, San Diego, 1996.
7. K. A. DeJong, W. M. Spears, and D. F. Gordon. Using genetic algorithms for concept learning. Machine Learning, 13:161–188, 1993.
8. H. J. Einhorn. The use of nonlinear, noncompensatory models in decision making. Psychological Bulletin, 73:221–230, 1970.
9. T. Elrod, R. D. Johnson, and J. White. A new integrated model of noncompensatory and compensatory decision strategies. Organizational Behavior and Human Decision Processes, 95:1–19, 2004.
10. Y. Ganzach and B. Czaczkes. On detecting nonlinear noncompensatory judgment strategies: Comparison of alternative regression models. Organizational Behavior and Human Decision Processes, 61:168–176, 1995.
11. G. Gigerenzer and D. G. Goldstein. Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103:650–669, 1996.
12. P. J. Hoffman. The paramorphic representation of clinical judgment. Psychological Bulletin, 47:116–131, 1960.
13. R. Hogarth. Judgement and Choice. Wiley, New York, second edition, 1987.
14. J. H. Holland, K. F. Holyoak, R. E. Nisbett, and P. R. Thagard. Induction. The MIT Press, Cambridge, MA, 1986.
15. C. J. Hursch, K. R. Hammond, and J. L. Hursch. Some methodological considerations in multiple-cue probability studies. Psychological Review, 71:42–60, 1964.
16. K. R. MacCrimmon and D. M. Messick. A framework for social motives. Behavioral Science, 21:86–100, 1976.
17. E. Mendelson. Introduction to Mathematical Logic. Chapman & Hall, London, fourth edition, 1997.
18. J. W. Payne. Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16:366–387, 1976.
19. J. W. Payne, J. R. Bettman, and E. J. Johnson. The Adaptive Decision Maker. Cambridge University Press, Cambridge, UK, 1993.
20. L. Rothrock and A. Kirlik. Inferring rule-based strategies in dynamic judgment tasks: Toward a noncompensatory formulation of the lens model. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 33:58–72, 2003.
21. L. Rothrock and A. Kirlik. Inferring fast and frugal heuristics from human judgment data. In A. Kirlik, editor, Adaptive Perspectives on Human-Technology Interaction: Methods and Models for Cognitive Engineering and Human-Computer Interaction, pages 131–148. Oxford University Press, Oxford, 2006.
22. L. Rothrock, J. Ventura, and S. Park. An optimization methodology to investigate operator impact on quality of service. In Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), Orlando, FL, 2003.
23. L. G. Schneider and T. I. Selling. A comparison of compensatory and noncompensatory models of judgment: Effects of task predictability and degrees of freedom. Accounting, Organizations and Society, 21:3–22, 1996.
24. O. Svenson. Process descriptions of decision making. Organizational Behavior and Human Decision Processes, 23:86–112, 1979.
25. L. R. Tucker. Suggested alternative formulation in the developments by Hursch, Hammond, and Hursch, and by Hammond, Hursch, and Todd. Psychological Review, 71:528–530, 1964.
26. A. Tversky. Elimination by aspects: A theory of choice. Psychological Review, 79:77–94, 1972.
27. J. Yin and L. Rothrock. A rule-based lens model. International Journal of Industrial Ergonomics, 36:499–509, 2006.
Repeated Prisoner's Dilemma and Battle of Sexes Games: A Simulation Study

Jijun Zhao¹, Ferenc Szidarovszky², and Miklos N. Szilagyi³

¹ Complexity Institute, Qingdao University, Qingdao 266071, China
² Department of Systems and Industrial Engineering, University of Arizona, Tucson, AZ 85721-0020, USA
³ Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721-0104, USA
Summary. Agent-based simulation is used to examine the N-person prisoner's dilemma and battle of sexes games. The N-person binary-state prisoner's dilemma model is first extended to the continuous state, and dynamic extensions are formulated with conformist, Pavlovian, greedy, accountant, and influencing agents, where the influence of the media is also taken into consideration. For the N-person battle of sexes games, deterministic and stochastic dynamic models are developed. In all cases the dynamic equations are complicated large-scale nonlinear difference equations whose asymptotic behavior can be examined only by computer simulation. Numerical results are also presented showing the dependence of the state trajectories and limits on the model parameters and initial states.
1 Introduction

There is a large variety of finite two-person games in which each player has exactly two available choices (strategies) and only the preference orderings of the payoffs are taken into account. Rapoport and Guyer [4] have shown that 576 such game types exist, and that if games differing only by an interchange of the players and their payoffs are treated as the same, this number reduces to 78. The two choices of the players are usually called defection and cooperation, and there are two different payoff functions, one for defectors and one for cooperators. The most famous two-person game with binary choices is the prisoner's dilemma (PD) game. The basic model can be explained as follows. Assume that two criminals have committed a serious crime and been arrested, but the district attorney does not have sufficient evidence to convict them of the full crime, only of a lesser charge, unless at least one of the prisoners confesses. If the two criminals are isolated from each other in prison, they cannot communicate, and each has two alternatives: to confess the full crime they committed or not. In the first case the confessing prisoner defects (D) from his or her partner,
Jijun Zhao et al.
and in the second case cooperates (C) with the partner. Because the prisoners cannot communicate about their decisions, there are four possible outcomes: (C,C), (C,D), (D,C), and (D,D), where the first term shows the choice of prisoner 1 and the second term the choice of prisoner 2. Table 1 shows the consequences of their choices. The rows correspond to the possible choices of prisoner 1, the columns to the choices of prisoner 2, and for each pair of choices the first number gives the punishment of prisoner 1 and the second number that of prisoner 2. These negative numbers can be read as years to be spent in prison. In the case of mutual defection both prisoners receive five years, in the case of mutual cooperation only two years, and if only one of them defects, the defecting prisoner ends up with a very light one-year sentence while the cooperating prisoner receives a very harsh ten-year sentence. In this two-person noncooperative game, the only Nash equilibrium is mutual defection.

Table 1. The prisoner’s dilemma game.

Prisoner 1 \ Prisoner 2    D            C
D                          (−5, −5)     (−1, −10)
C                          (−10, −1)    (−2, −2)
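The claim that mutual defection is the only Nash equilibrium can be checked mechanically from Table 1. The Python sketch below is an illustration, not part of the original chapter; the dictionary layout and the function names `pure_nash_equilibria` and `defection_dominates` are hypothetical. It tests every cell for a profitable unilateral deviation, where a higher (less negative) payoff means fewer years in prison.

```python
# Table 1: PAYOFF[(row, col)] = (prisoner 1, prisoner 2); negative numbers are prison years.
PAYOFF = {
    ("D", "D"): (-5, -5),
    ("D", "C"): (-1, -10),
    ("C", "D"): (-10, -1),
    ("C", "C"): (-2, -2),
}
CHOICES = ("D", "C")


def pure_nash_equilibria(payoff):
    """Return every (row, col) cell in which neither prisoner gains by deviating alone."""
    equilibria = []
    for r in CHOICES:
        for c in CHOICES:
            p1, p2 = payoff[(r, c)]
            # Prisoner 1 deviates by changing the row, prisoner 2 by changing the column.
            row_is_best = all(payoff[(alt, c)][0] <= p1 for alt in CHOICES)
            col_is_best = all(payoff[(r, alt)][1] <= p2 for alt in CHOICES)
            if row_is_best and col_is_best:
                equilibria.append((r, c))
    return equilibria


def defection_dominates(payoff):
    """True if D is strictly better for prisoner 1 whatever prisoner 2 chooses."""
    return all(payoff[("D", c)][0] > payoff[("C", c)][0] for c in CHOICES)


print(pure_nash_equilibria(PAYOFF))  # [('D', 'D')]
print(defection_dominates(PAYOFF))   # True
```

Defection strictly dominates cooperation for each prisoner, yet (C, C) leaves both better off than (D, D) (−2 versus −5), which is what makes the game a dilemma.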
Another famous two-person binary-choice game is the battle of sexes (BOS) game, in which a husband and his wife have to decide on their Saturday night program. Their preferences differ; however, they would like to spend the evening together. If either of them accepts the preferred program of the other, that person is called a cooperator; otherwise he or she is a defector. Because the meanings and names of the participants differ from game to game, in general analysis as well as in N-person extensions the players are usually called agents. Let x be the ratio of cooperators in the neighborhood of any agent; then the payoff of a cooperating agent is C(x) and that of a defecting agent is D(x), where C and D are given functions defined on [0, 1]. In the literature linear payoff functions are usually considered, and the game type is determined by the relative ordering of the values S = C(0), R = C(1), P = D(0), and T = D(1). Hence in this special case there are 4! = 24 game types. Szilagyi [6] claims that a game can be considered a dilemma only if the reward for mutual cooperation is greater than the punishment for mutual defection, but the temptation to defect, that is, defecting alone, gives the largest possible payoff. These conditions can be represented mathematically as P < R < T. In the case of the prisoner’s dilemma S < P < R < T [5,7]. A typical nondilemma is the battle of sexes game, in which T > S > P > R [2]. The outcome of the game depends on a combination of different assumptions and rules, such as
Prisoner’s Dilemma and Battle of Sexes Games
• Order in which the actions of the agents are taken (e.g., sequential or simultaneous)
• Information structure (e.g., whether or not agents are informed about the actions of others)
• Possible coalition formation
• Objectives of the agents, which are determined by their personality types
• Dynamic horizon of the game (e.g., one-shot, finite, or infinite horizon)
• Possible symmetry (same or different payoff functions for the agents)
• Particular forms of the payoff functions
• Distribution of agents in space (topology with possible relocations)
• The definition of the neighborhoods of the agents

In our study we assume that the games are uniform and iterated. The agents are distributed in and fully occupy a finite two-dimensional space, and updates are simultaneous. The payoff functions are linear. Each agent is able to observe the past actions and payoffs of its neighbors, where the number of neighbors is determined by the number of neighborhood layers around the agent. Let (i, j) be an agent; then agent (p, q) belongs to its l-layer neighborhood if and only if |i − p| ≤ l and |j − q| ≤ l. Note that agents close to the boundary have fewer neighbors than agents in the middle. As a special case, the neighborhood may even extend to the entire array of agents. When everything else is fixed, the payoff curves and the personalities of the agents determine the game. For a systematic analysis of binary games see Merlone et al. [3]. Even more outcomes are possible if stochastic elements are introduced into the game. This chapter shows simulation results for two types of games, PD and BOS, and illustrates how specific assumptions on the personality types of the agents affect the outcomes.

This chapter is organized as follows. Section 2 introduces generalized dynamic N-person PD games, describes the personality types, and presents simulation results. The BOS game is introduced and analyzed in Section 3. Final comments and conclusions are presented in the last section.
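The classification of linear games by the ordering of S, R, P, and T can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, ties between corner values are simply rejected, and the battle-of-sexes payoff lines `bos_C` and `bos_D` are invented for the example. The PD payoffs are the ones used for the Pavlovian experiments in Section 2.2.

```python
from itertools import permutations


def corner_values(C, D):
    """S = C(0), R = C(1), P = D(0), T = D(1) for payoff functions on [0, 1]."""
    return {"S": C(0.0), "R": C(1.0), "P": D(0.0), "T": D(1.0)}


def ordering(C, D):
    """Letters S, R, P, T sorted from the smallest to the largest corner value."""
    values = corner_values(C, D)
    if len(set(values.values())) < 4:
        raise ValueError("tied corner values: not one of the 24 strict orderings")
    return "".join(sorted(values, key=values.get))


def is_dilemma(C, D):
    """Szilagyi's condition P < R < T."""
    v = corner_values(C, D)
    return v["P"] < v["R"] < v["T"]


def is_prisoners_dilemma(C, D):
    """Prisoner's dilemma ordering S < P < R < T."""
    return ordering(C, D) == "SPRT"


# Strict orderings of four distinct values: 4! = 24 game types.
print(len(set(permutations("SRPT"))))                        # 24

# Payoffs C(x) = 2x - 1 and D(x) = 2x - 0.5 from Section 2.2.
print(ordering(lambda x: 2 * x - 1, lambda x: 2 * x - 0.5))  # SPRT

# Hypothetical linear payoffs with the BOS ordering T > S > P > R.
bos_C, bos_D = (lambda x: 2 - 2 * x), (lambda x: 1 + 2 * x)
print(ordering(bos_C, bos_D), is_dilemma(bos_C, bos_D))      # RPST False
```

Only the four corner values matter here because, for linear C and D, they fix the whole payoff lines.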
2 Generalized Prisoner’s Dilemma
Each agent is regarded as a person who has an attitude state with a certain degree of cooperation or defection. At each time period t each agent i is characterized by its state, a real value $s_i(t) \in [0, 1]$, where the value 0 corresponds to complete defection, 1 corresponds to unconditional cooperation, and 0.5 corresponds to the neutral state. We consider two types of agents: those that influence others and those that are influenced. The former are called “influencing agents” and the latter just “agents.” In social psychology they are called agents and targets, respectively. An influencing agent tries to influence others in its neighborhood by encouragement or persuasion. The media can also influence agents by propaganda. The reaction of the agents to the actions of others including those of influencing
agents and the media depends on the personality of the agents. Because it is realistic to assume that agents might have combined personalities [10], in our model we assume that the personality of each agent is a combination of the following six personality types and indifference. Let $k_1(i)$, $k_2(i)$, $k_3(i)$, $k_4(i)$, $k_5(i)$, and $k_6(i)$ denote the personality coefficients of the conformist, Pavlovian, persuasion, propaganda, greedy, and accountant types, respectively. It is assumed that $\sum_{l=1}^{6} k_l(i) \le 1$. If $\sum_{l=1}^{6} k_l(i) < 1$, then agent i is indifferent to degree $1 - \sum_{l=1}^{6} k_l(i)$. If $k_l(i) = 1$, then agent i has personality type l completely, and if $k_l(i) = 0$, then the agent does not have this type at all. Next we show the information structure that agents with different personality types use and how their states change in time.

2.1 Personality Types and Dynamics
A conformist agent uses the average state of its neighborhood $u_i$,

$$\bar{s}_i(t) = \frac{1}{|u_i|} \sum_{j \in u_i} s_j(t), \qquad (1)$$

and tends to follow the majority of its neighbors by moving in the direction of this average, so the adjustment in $s_i(t)$ is given as

$$f_1(s_i(t), \bar{s}_i(t)) = c_i \left( \bar{s}_i(t) - s_i(t) \right),$$

where $c_i$ is the speed of adjustment of agent i. It is assumed that $0 < c_i \le 1$.

Pavlovian agents change their states in the following manner. When $s_i(t) = 1$, the reward of agent i is $C(\bar{s}_i(t))$. When $s_i(t) = 0$, its payoff is $D(\bar{s}_i(t))$. Because $s_i(t)$ is usually between 0 and 1, agent i's reward is interpolated as

$$R_i(t) = D(\bar{s}_i(t)) + \left( C(\bar{s}_i(t)) - D(\bar{s}_i(t)) \right) s_i(t). \qquad (2)$$
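Equations (1) and (2) and the conformist term $f_1$ translate directly into array code. The NumPy sketch below is a hypothetical illustration, not the authors' implementation; the function names are invented, and it takes the l-layer neighborhood definition of the introduction literally, so the agent's own cell belongs to $u_i$ and the square is truncated at the lattice boundary.

```python
import numpy as np


def neighborhood_mean(states: np.ndarray, layers: int) -> np.ndarray:
    """Eq. (1): average state over each agent's l-layer neighborhood.

    (p, q) is a neighbor of (i, j) when |i - p| <= l and |j - q| <= l, so the agent's
    own cell is included and the square is cut off at the lattice boundary.
    """
    n_rows, n_cols = states.shape
    means = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            block = states[max(i - layers, 0): i + layers + 1,
                           max(j - layers, 0): j + layers + 1]
            means[i, j] = block.mean()
    return means


def conformist_adjustment(state, mean_state, speed):
    """f1 = c_i * (s_bar_i - s_i), with 0 < c_i <= 1."""
    return speed * (mean_state - state)


def pavlovian_reward(state, mean_state, C, D):
    """Eq. (2): interpolates between D(s_bar) at s = 0 and C(s_bar) at s = 1."""
    return D(mean_state) + (C(mean_state) - D(mean_state)) * state


grid = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])
s_bar = neighborhood_mean(grid, layers=1)   # center averages all 9 cells, corners a 2 x 2 block


def C(x):
    return 2 * x - 1     # reward functions of the Pavlovian experiments in Section 2.2


def D(x):
    return 2 * x - 0.5


rewards = pavlovian_reward(grid, s_bar, C, D)
```

The double loop keeps the neighborhood rule visible; a production version would replace it with shifted-array sums.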
If this value is positive, the agent is encouraged to continue its behavior toward cooperation or defection. Hence $s_i(t)$ moves in the direction given by

$$f_2(R_i(t)) = \begin{cases} \alpha_i R_i(t) & \text{if } s_i(t) \ge 0.5, \\ -\beta_i R_i(t) & \text{if } s_i(t) < 0.5, \end{cases} \qquad (3)$$

where $\alpha_i$ and $\beta_i$ are positive coefficients.

Individual persuasion depends on the distances $d_{ij}$ between agent i and all influencing agents j in its neighborhood. Define first

$$I_{ij}(t) = \exp\left( -\frac{(d_{ij} - 1)^2}{a} \right) \in [0, 1] \qquad (4)$$

for all j, where a is a shape parameter. This function is shown in Figure 1. Then the accumulated influence $I_i(t)$ on agent i is the sum of the individual persuasive effects of all influencing agents in the neighborhood:
$$I_i(t) = \sum_{j \in u_i} I_{ij}(t),$$

and because repeated influence of the same kind loses its effect over time, we assume that $s_i(t)$ is adjusted with the term

$$f_3(t) = I_i(t) \, \frac{I_i(t)}{\sum_{\tau \le t} I_i(\tau)}. \qquad (5)$$
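The persuasion term and its fading can be sketched as follows. This is a hypothetical Python illustration: the chapter does not specify the distance metric, so the distances $d_{ij}$ are passed in directly, and the shape parameter a = 10 and the distance 3 in the example are invented values. The sketch shows that a constant influence yields the adjustments $I, I/2, I/3, \ldots$ under Equation (5).

```python
import math


def persuasion(distance: float, a: float) -> float:
    """Eq. (4): I_ij = exp(-(d_ij - 1)^2 / a); equals 1.0 for an adjacent agent (d = 1)."""
    return math.exp(-((distance - 1.0) ** 2) / a)


def accumulated_influence(distances: list[float], a: float) -> float:
    """I_i(t): sum of the persuasive effects of all influencing agents in u_i."""
    return sum(persuasion(d, a) for d in distances)


def persuasion_adjustments(influence_history: list[float]) -> list[float]:
    """Eq. (5) for t = 0, 1, ...: f3(t) = I_i(t) * I_i(t) / sum_{tau <= t} I_i(tau)."""
    adjustments, total = [], 0.0
    for current in influence_history:
        total += current
        adjustments.append(0.0 if total == 0 else current * current / total)
    return adjustments


# Hypothetical case: one influencing agent at distance 3, a = 10, same influence every period.
influence = accumulated_influence([3.0], a=10.0)
steps = persuasion_adjustments([influence] * 4)   # influence / 1, / 2, / 3, / 4
```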
The effect of media propaganda depends on the frequency F, which shows how many times the new idea (pushing for cooperation) is presented during time period t. The influence on agent i is represented by the function

$$M(F(t)) = \frac{2}{1 + e^{-bF(t)}} - 1, \qquad (6)$$

where b > 0 is a shape parameter. This function is shown in Figure 2. Similar to the case of influencing agents, M(F) also loses its effect in time, so $s_i(t)$ is assumed to be adjusted with the value

$$f_4(M(F)) = M(F(t)) \, \frac{M(F(t))}{\sum_{\tau \le t} M(F(\tau))}. \qquad (7)$$
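Equations (6) and (7) can be sketched in the same way. In this hypothetical Python example the campaign frequency (3 presentations per period) and b = 0.5 are invented values. With a constant frequency the per-period adjustments shrink like $M/(t+1)$, so their running total grows quickly at first and then slows down, which is consistent with the behavior reported for Figure 13.

```python
import math


def media_influence(frequency: float, b: float) -> float:
    """Eq. (6): M(F) = 2 / (1 + exp(-b F)) - 1, equal to 0 at F = 0 and rising toward 1."""
    return 2.0 / (1.0 + math.exp(-b * frequency)) - 1.0


def propaganda_adjustments(frequencies: list[float], b: float) -> list[float]:
    """Eq. (7) period by period: f4(t) = M(F(t)) * M(F(t)) / sum_{tau <= t} M(F(tau))."""
    adjustments, total = [], 0.0
    for f in frequencies:
        m = media_influence(f, b)
        total += m
        adjustments.append(0.0 if total == 0 else m * m / total)
    return adjustments


# Hypothetical campaign: the idea is presented 3 times in every period, b = 0.5.
steps = propaganda_adjustments([3.0] * 10, b=0.5)
cumulative = [sum(steps[: t + 1]) for t in range(len(steps))]
```

Equation (6) is algebraically $\tanh(bF/2)$, which is a convenient check on an implementation.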
Fig. 1. Persuasive effect.
A greedy agent looks for the highest reward received by anyone in its neighborhood and imitates the action of the neighbor who received the highest reward. If $j_i(t)$ denotes the index of the agent in $u_i$ with the highest reward at time t, then it is assumed that a greedy agent i adjusts its state with the term

$$f_5(t) = s_{j_i(t)}(t) - s_i(t). \qquad (8)$$

By adding this adjustment term to the previous state $s_i(t)$ of the agent, the new state becomes a complete imitation of the state of agent $j_i(t)$. In the case of accountant agents the state is adjusted similarly to Pavlovian agents; however, the previous reward $R_i(t)$ is replaced by the average reward
Fig. 2. Propaganda effect.
$$\bar{R}_i(t) = \frac{1}{t + 1} \sum_{\tau = 0}^{t} R_i(\tau) \qquad (9)$$
received for all previous actions. Then, similarly to Equation (3), accountant agents adjust their states by the term

$$f_6(\bar{R}_i(t)) = \begin{cases} \bar{\alpha}_i \bar{R}_i(t) & \text{if } s_i(t) \ge 0.5, \\ -\bar{\beta}_i \bar{R}_i(t) & \text{if } s_i(t) < 0.5, \end{cases} \qquad (10)$$

where $\bar{\alpha}_i$ and $\bar{\beta}_i$ are positive coefficients. The state of any agent with a combined personality changes according to the difference equation

$$s_i(t+1) = s_i(t) + k_1(i) f_1(s_i(t), \bar{s}_i(t)) + k_2(i) f_2(R_i(t)) + k_3(i) f_3(t) + k_4(i) f_4(M(F)) + k_5(i) f_5(t) + k_6(i) f_6(\bar{R}_i(t)). \qquad (11)$$

If at any time period $s_i(t+1)$ becomes negative, its value is changed to zero, and if it becomes larger than 1, its value is changed to 1. This additional adjustment guarantees that $s_i(t)$ remains in the feasible range between 0 and 1 throughout the entire time span under consideration. Equation (11) is a complicated high-dimensional system of difference equations with nondifferentiable right-hand sides, so its asymptotic behavior cannot be examined with standard analytical methods. To gain some insight, computer simulation is used, in which we solve Equation (11) numerically with varying parameter values and observe the long-term behavior of the state trajectories. For the general methodology see, for example, [7,8].

2.2 Simulation Results
The artificial society is located on a 100 by 100 two-dimensional lattice, so the maximum number of agents is 10,000. The dynamics of the system can be illustrated by animation graphs. Any white cell shows that the agent at that location is a complete defector (with state 0), and any black cell indicates that the agent located there is a complete cooperator (with state 1). The levels of darkness in the cells show the states of the agents: darker cells mean larger values of the state (a lower level of defection or a higher level of cooperation). The horizontal axis in the histograms shows the agents’ states from 0 to 1.0, and the vertical axis shows the number of agents.
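Equation (11) and the clipping to [0, 1] combine all six personality terms in one step. The NumPy sketch below is a hypothetical illustration of that step together with the sign rule shared by Equations (3) and (10) and the running average of Equation (9); the function names are invented, the other adjustment terms are assumed to be computed elsewhere and passed in as arrays, and the coefficients α = β = 1 in the example are deliberately large, hypothetical values chosen so that the clipping becomes visible.

```python
import numpy as np


def pavlovian_adjustment(reward, state, alpha, beta):
    """Eqs. (3) and (10): +alpha * R when s_i >= 0.5, -beta * R when s_i < 0.5."""
    return np.where(state >= 0.5, alpha * reward, -beta * reward)


def average_reward(reward_history):
    """Eq. (9): mean of R_i(0), ..., R_i(t); one row per period, one column per agent."""
    return np.mean(np.asarray(reward_history, dtype=float), axis=0)


def update_states(state, k, adjustments):
    """Eq. (11) followed by clipping to [0, 1].

    k and adjustments hold six entries each, ordered as conformist, Pavlovian,
    persuasion, propaganda, greedy, accountant (f1, ..., f6).
    """
    new_state = state + sum(k_l * f_l for k_l, f_l in zip(k, adjustments))
    return np.clip(new_state, 0.0, 1.0)


# Purely Pavlovian agents (k2 = 1) receiving a positive reward, hypothetical alpha = beta = 1.
state = np.array([0.2, 0.6, 0.95])
reward = np.array([0.5, 0.5, 0.5])
zeros = np.zeros_like(state)
f2 = pavlovian_adjustment(reward, state, alpha=1.0, beta=1.0)   # [-0.5, 0.5, 0.5]
new_state = update_states(state, (0, 1, 0, 0, 0, 0), (zeros, f2, zeros, zeros, zeros, zeros))
```

The defecting agent is pushed below 0 and the two cooperating agents above 1, so all three end on the boundary after clipping.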
Fig. 3. Histogram of agents’ states after the first iteration.
Fig. 4. Animation and histogram of states after the third iteration.
Fig. 5. Animation and histogram of states after the fifth iteration.
First we considered conformist agents. A neighborhood depth of 1 was selected, and the initial ratio of cooperators was chosen as 0.5. The evolution of the society is illustrated in Figures 3–8, which show the animations and histograms at t = 1, 3, 5, 10, 18, and 42. Notice that the agents’ states gradually converge to values in the ranges [0.4, 0.5] and [0.5, 0.6], and pieces of clusters merge into two large clusters with a final cooperator ratio of 0.5. Similar results were obtained with different initial numbers of cooperators: the agents’ states always converged to a value close to the initial average state of the entire society. Next we conducted experiments to examine how model parameters affect the simulation results for Pavlovian agents. The reward functions were selected as C(x) = 2x − 1 and D(x) = 2x − 0.5. The learning coefficients in Equation (3) were chosen as $\alpha_i = \beta_i = 0.1$ for all agents. The initial states of the agents were 0 and 1, with an initial ratio of cooperators of 0.4. First we examined the dependence on the neighborhood depth. Figure 9 shows the time series patterns with unit depth and with larger depths. Notice that with unit depth the time series is much smoother, while the limits are almost identical. The effect of the initial ratio of cooperators is illustrated in Figure 10, where the initial ratios of cooperators were 0.4, 0.69, 0.73, and 0.8. The cooperating agents were uniformly distributed on the 100 by 100 lattice.
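The conformist experiment can be reproduced qualitatively on a small lattice. The NumPy sketch below is a hypothetical, scaled-down illustration rather than the authors' simulation: the 10 by 10 lattice, the adjustment speed c = 1, the seed, and the 1000 iterations are assumptions. It uses a vectorized form of Equation (1) (shifted sums over a zero-padded array) and shows the states settling on a common value close to the initial average state.

```python
import numpy as np


def neighborhood_mean(states: np.ndarray, layers: int) -> np.ndarray:
    """Vectorized Eq. (1): mean over the l-layer neighborhood, truncated at the edges."""
    n_rows, n_cols = states.shape
    padded = np.pad(states.astype(float), layers)          # zeros outside the lattice
    inside = np.pad(np.ones((n_rows, n_cols)), layers)     # 1 marks a real cell
    total = np.zeros((n_rows, n_cols))
    count = np.zeros((n_rows, n_cols))
    for di in range(2 * layers + 1):
        for dj in range(2 * layers + 1):
            total += padded[di:di + n_rows, dj:dj + n_cols]
            count += inside[di:di + n_rows, dj:dj + n_cols]
    return total / count


def run_conformists(size=10, coop_ratio=0.5, layers=1, speed=1.0, steps=1000, seed=0):
    """Pure conformists (k1 = 1): s(t+1) = s(t) + c * (s_bar(t) - s(t))."""
    rng = np.random.default_rng(seed)
    states = (rng.random((size, size)) < coop_ratio).astype(float)   # initial states 0 or 1
    initial_mean = states.mean()
    for _ in range(steps):
        # Averages of values in [0, 1] stay in [0, 1], so the clipping of Eq. (11) never binds.
        states = states + speed * (neighborhood_mean(states, layers) - states)
    return initial_mean, states


initial_mean, final = run_conformists()
```

Because boundary agents have smaller neighborhoods, the common limit is a neighborhood-size-weighted average of the initial states, which is why it is close to, but not exactly, the initial mean.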
Fig. 6. Animation and histogram of states after the tenth iteration.
Fig. 7. Animation and histogram of states after the 18th iteration.
Fig. 8. Animation and histogram of states after the 42nd iteration.
Fig. 9. Results with initial ratio of cooperators 0.4: (a) neighborhood depth = 1; (b) neighborhood depth = 10; (c) neighborhood depth = 20; (d) neighborhood depth = 40.
Fig. 10. Results with different initial ratios: (a) initial cooperation ratio = 0.4; (b) 0.69; (c) 0.73; (d) 0.8.
To examine the effect of influencing agents, we placed an influencing agent in the central cell (50, 50). The neighborhood depth of this influencing agent was assumed to be 30. All agents’ initial states were selected as $s_i(0) = 0.5$, and a personality coefficient $k_3(i) = 0.2$ was assigned. Notice that agents close to the influencing agent move into partial or complete cooperation very quickly, before the influence can fade away. For agents in the neighborhood of the influencing agent but far from it, there is only a limited effect, so only small positive amounts are added to their neutral states. We repeated the experiment with two influencing agents. It was assumed that both had a neighborhood depth of 30 and that their neighborhoods overlapped in 20 layers. The results are shown in Figures 11 and 12. The effect of the media is shown in Figure 13, where $k_4(i) = 0.1$ was selected for all agents, each with initial state 0.5. The media influence accumulated fast at the beginning and then slowed down. In examining the case of greedy agents we selected the reward functions C(x) = x and D(x) = 1.65x. We assumed that 50% of the agents had initial state 0.3 and the remaining 50% had initial state 0.7, and the neighborhood depth was 1. Figure 14 shows the animation graphs of the evolution of the game. Because greedy agents imitate each other, the state values (0.3 and 0.7) remain the same; only their ratio changes in time. The time series of the average state, shown in Figure 15, converges.
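A purely greedy population can be sketched as follows. This hypothetical NumPy illustration assumes that the reward greedy agents compare is the interpolated reward of Equation (2) with C(x) = x and D(x) = 1.65x as above; the chapter does not say which reward they compare. The 20 by 20 lattice, the seed, the 30 iterations, and the random (only approximately 50/50) split between 0.3 and 0.7 are also assumptions. Because every new state is copied from an existing one, only the values 0.3 and 0.7 can ever appear.

```python
import numpy as np


def neighborhood_slices(i, j, layers):
    """Row and column slices of the l-layer neighborhood of (i, j), truncated at the edges."""
    return (slice(max(i - layers, 0), i + layers + 1),
            slice(max(j - layers, 0), j + layers + 1))


def greedy_step(states, layers, C, D):
    """One synchronous update of purely greedy agents (k5 = 1), i.e. Eq. (8) applied in full."""
    n_rows, n_cols = states.shape
    mean_state = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            mean_state[i, j] = states[neighborhood_slices(i, j, layers)].mean()
    # Eq. (2): reward of every agent from its own state and its neighborhood average.
    reward = D(mean_state) + (C(mean_state) - D(mean_state)) * states
    new_states = np.empty_like(states)
    for i in range(n_rows):
        for j in range(n_cols):
            window = neighborhood_slices(i, j, layers)
            block_reward = reward[window]
            best = np.unravel_index(np.argmax(block_reward), block_reward.shape)
            new_states[i, j] = states[window][best]   # copy the best-paid neighbor's state
    return new_states


rng = np.random.default_rng(1)
states = np.where(rng.random((20, 20)) < 0.5, 0.3, 0.7)   # roughly half 0.3, half 0.7
averages = []
for _ in range(30):
    states = greedy_step(states, 1, C=lambda x: x, D=lambda x: 1.65 * x)
    averages.append(states.mean())
```

The agent's own cell is part of its neighborhood, so an agent that already has the best reward keeps its state; ties are broken by the first maximum found.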
Fig. 11. Outputs with one influencing agent: (a) graphics output after the third iteration; (b) graphics output after the 20th iteration.
Fig. 12. Outputs with two influencing agents: (a) graphics output after the ninth iteration; (b) graphics output after the 20th iteration.
Fig. 13. The influence of the mass media versus iterations.
Fig. 14. Animation with greedy agents: (a) initialization; (b) iteration 1; (c) iteration 2; (d) iteration 5; (e) iteration 11; (f) iteration 25; (g) iteration 38; (h) iteration 63; (i) iteration 119. Agents are distributed randomly at the beginning of the simulation.
Results with accountant agents are very similar to those with Pavlovian agents, so they are not presented here.
Fig. 15. Time series of the average state.
3 Battle of Sexes Games
The N-person battle of sexes games are straightforward generalizations of the well-known two-person version, in which a man and his wife want to spend an evening together, but the man prefers a football game and his wife a concert. However, both of them prefer to spend the evening together. The decision of each player is one of two choices: football or concert. In the case of N players each agent has the same alternatives. If x is the ratio of agents who select football, then we have four payoff functions $L(x)$, $DL(x)$, $L'(x)$, and $DL'(x)$, where the payoff of football-selecting agents is $L$ or $DL$ depending on whether they like or dislike this choice. The functions $L'$ and $DL'$ are defined similarly for concert-choosing agents. Notice that $L$ and $DL$ are increasing, and $L'$ and $DL'$ are decreasing. Furthermore,

$$L(x) > DL(x), \quad L'(x) > DL'(x), \quad DL(1) > L(1/N), \quad DL'(0) > L'((N-1)/N), \qquad (12)$$

where the last two relations require that any agent who dislikes its choice (when this choice is the same as that of all others) receives a higher payoff compared to the case when the agent likes its choice but is alone. Bean and Kerckhoff [1] give an excellent description of this game.
3.1 Dynamic Extensions
The N agents are identified by the index $i \in \{1, \ldots, N\}$; each agent has two decision alternatives. Let $d_i(t)$ denote the decision of agent i at time t; then $d_i(t) \in \{0, 1\}$, where 1 represents choosing football and 0 represents choosing concert. The ratio of agents selecting football at time t is

$$x(t) = \frac{1}{N} \sum_{i=1}^{N} d_i(t). \qquad (13)$$

It is assumed that each agent’s initial decision $d_i(0)$ is its preference $l_i$, so the initial ratio of football-selecting agents is

$$x(0) = \frac{1}{N} \sum_{i=1}^{N} l_i. \qquad (14)$$
A deterministic dynamic process can be introduced as follows. At each time t, each agent considers the ratio of the other agents who selected football in the previous period t − 1:

$$x^*_i(t-1) = \frac{1}{N} \sum_{j \ne i} d_j(t-1), \qquad (15)$$

and then compares its payoffs from selecting football or concert under the assumption that all other agents keep their previous choices. If the agent selects football, the new ratio of football-selecting agents becomes $x^*_i(t-1) + 1/N$; if it selects concert, the ratio remains $x^*_i(t-1)$. If the agent prefers football, its payoff is $L(x^*_i(t-1) + 1/N)$ with the football choice and $DL'(x^*_i(t-1))$ with the concert choice. Similarly, if the agent prefers concert, its payoff is $L'(x^*_i(t-1))$ with the concert choice and $DL(x^*_i(t-1) + 1/N)$ with the football choice. Depending on its preference, the agent compares the payoffs of these two possible choices and makes the choice that gives the higher payoff. Mathematically, if $l_i = 1$, then

$$d_i(t) = \begin{cases} 1 & \text{if } L(x^*_i(t-1) + 1/N) \ge DL'(x^*_i(t-1)), \\ 0 & \text{otherwise,} \end{cases} \qquad (16)$$

and if $l_i = 0$, then

$$d_i(t) = \begin{cases} 0 & \text{if } L'(x^*_i(t-1)) \ge DL(x^*_i(t-1) + 1/N), \\ 1 & \text{otherwise.} \end{cases} \qquad (17)$$
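The deterministic rule can be sketched with the linear payoffs that Section 3.2 uses for the stochastic runs. This hypothetical NumPy illustration is not the authors' code; the helper names, N = 100, and the 10 iterations are assumptions, and ties are resolved toward the agent's own preference exactly as in (16) and (17). The helper `satisfies_conditions_12` checks relations (12) on a grid of x values.

```python
import numpy as np


# Linear payoffs of Section 3.2 (x = ratio of football-selecting agents).
def L(x):
    return 2 * x + 1            # likes football and chooses it


def DL(x):
    return 2 * x                # dislikes football but chooses it


def L_prime(x):
    return 2 * (1 - x) + 1      # likes concert and chooses it


def DL_prime(x):
    return 2 * (1 - x)          # dislikes concert but chooses it


def satisfies_conditions_12(n, grid=101):
    """Check relations (12) on a grid of x values for the payoffs above."""
    xs = np.linspace(0.0, 1.0, grid)
    return bool(np.all(L(xs) > DL(xs)) and np.all(L_prime(xs) > DL_prime(xs))
                and DL(1) > L(1 / n) and DL_prime(0) > L_prime((n - 1) / n))


def deterministic_step(decisions, prefs):
    """Eqs. (15)-(17): every agent best-responds, assuming the others keep their choices."""
    n = len(decisions)
    x_star = (decisions.sum() - decisions) / n                               # Eq. (15)
    football_fan = np.where(L(x_star + 1 / n) >= DL_prime(x_star), 1, 0)     # Eq. (16)
    concert_fan = np.where(L_prime(x_star) >= DL(x_star + 1 / n), 0, 1)      # Eq. (17)
    return np.where(prefs == 1, football_fan, concert_fan)


def run(pref_ratio, n=100, steps=10):
    """Start from d_i(0) = l_i and return the final ratio x of football choices."""
    prefs = (np.arange(n) < round(pref_ratio * n)).astype(int)
    decisions = prefs.copy()
    for _ in range(steps):
        decisions = deterministic_step(decisions, prefs)
    return decisions.mean()
```

With these payoffs a small football minority gives up (x → 0), a balanced population stays at its preference ratio, and a large majority pulls everyone to football (x → 1).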
The repeated application of this rule determines the state trajectories of all agents. A stochastic process is introduced next. Let $p_i(t)$ denote the probability that agent i selects football at time t. For football-preferring agents a
high initial value $p_i(0)$ is given, and for concert-preferring agents a low initial value $p_i(0)$ is assigned. The actual initial choices of the agents are then drawn according to the probability values $p_i(0)$. At each time t, the probability values are updated by the following rule:

$$p_i(t) = \begin{cases} p_i(t-1) + k_i \left[ L(x^*_i(t-1) + 1/N) - DL'(x^*_i(t-1)) \right] & \text{if } l_i = 1, \\ p_i(t-1) - k_i \left[ L'(x^*_i(t-1)) - DL(x^*_i(t-1) + 1/N) \right] & \text{if } l_i = 0, \end{cases} \qquad (18)$$

which is a straightforward stochastic generalization of relations (16) and (17), where $k_i > 0$ is a learning parameter. Then the actual decisions are made randomly according to the probability values. A theoretical study of the dynamic game described above is given in [9].

3.2 Simulation Results
Initial probability values $p_i(0) = 0.9$ are given to football-preferring agents and $p_i(0) = 0.1$ to concert-preferring agents. The payoff functions $L(x) = 2x + 1$, $DL(x) = 2x$, $L'(x) = 2(1 - x) + 1$, and $DL'(x) = 2(1 - x)$ were selected. The learning parameters were chosen as 0.2 for all agents. The ratio F of football-preferring agents was varied from 0.01 to 0.99 with the small step size 0.001. For F ≤ 0.270 the trajectories always converge to zero; when 0.276 ≤ F < 0.725, x(t) converges to the preference ratio F; and if F ≥ 0.730, the trajectories converge to unity. For 0.270 < F < 0.276, x(t) always converges, and the limit is sometimes 0 and in other cases F. For the specific value F = 0.271 there is an 80% chance of a zero limit; for F = 0.272 the probability decreases to 60%, and at F = 0.275 it is only 5%. For 0.725 ≤ F ≤ 0.729 the limit is either 1 or F. The probability of a unit limit is 97% for F = 0.729, 80% for F = 0.728, 70% for F = 0.727, 10% for F = 0.726, and 3% for F = 0.725. We therefore have five zones of the preference ratio in which the trajectories have different long-term properties. This finding is essentially the same as the theoretical results presented in [9].
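The stochastic rule (18) can be simulated in the same spirit. This hypothetical NumPy sketch uses the initial probabilities 0.9 and 0.1, the learning parameter 0.2, and the payoff functions of the simulation study; the population size N = 1000, the 200 periods, and the seed are assumptions. Because (18) can push $p_i(t)$ outside [0, 1], the sketch clips the probabilities, which the text does not specify.

```python
import numpy as np


def L(x):
    return 2 * x + 1


def DL(x):
    return 2 * x


def L_prime(x):
    return 2 * (1 - x) + 1


def DL_prime(x):
    return 2 * (1 - x)


def simulate_bos(n=1000, pref_ratio=0.5, k=0.2, steps=200, seed=0):
    """Stochastic N-person BOS dynamics of Eq. (18); returns the trajectory x(t)."""
    rng = np.random.default_rng(seed)
    prefs = np.zeros(n, dtype=int)
    prefs[: int(round(pref_ratio * n))] = 1              # l_i = 1: prefers football
    p = np.where(prefs == 1, 0.9, 0.1)                   # p_i(0)
    decisions = (rng.random(n) < p).astype(int)
    trajectory = [decisions.mean()]
    for _ in range(steps):
        x_star = (decisions.sum() - decisions) / n       # Eq. (15)
        gain_football = L(x_star + 1 / n) - DL_prime(x_star)
        gain_concert = L_prime(x_star) - DL(x_star + 1 / n)
        p = np.where(prefs == 1, p + k * gain_football, p - k * gain_concert)
        p = np.clip(p, 0.0, 1.0)   # assumption: keep probabilities feasible
        decisions = (rng.random(n) < p).astype(int)
        trajectory.append(decisions.mean())
    return np.array(trajectory)


for ratio in (0.1, 0.5, 0.9):
    print(ratio, simulate_bos(pref_ratio=ratio)[-1])
```

The three preference ratios in the loop lie well inside the zones described above, so the limits 0, F, and 1 appear reliably. Reproducing the narrow mixed zones near F ≈ 0.27 and F ≈ 0.73 would require many repeated runs and the authors' exact settings, which this sketch does not claim to match.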
Figure 16 shows two convergent trajectories: one tends to zero, the other to F .
4 Conclusions
The behavior of artificial societies depends on the rules under which these societies operate, on the personalities of the members of the society, on the information structure, and on the payoff functions, to mention only a few factors. The evolution of the society can then be described by a large-scale, complicated system of dynamic equations whose asymptotic properties cannot be examined by simple analytic methods. In this chapter two particular games were investigated: the prisoner’s dilemma and the battle of sexes.
Fig. 16. For 0.270 < F < 0.276, BOS game converges to 0 or to the ratio of preferences.
We first introduced a general N-person prisoner’s dilemma game with different personality types. In particular, we considered conformist, Pavlovian, greedy, accountant, and influencing agents, as well as the influence of the media. A detailed simulation study illustrated the effect of the model parameters and of the personality types of the agents. We also investigated N-person battle of sexes games and introduced deterministic and stochastic dynamics. A simulation study showed how the long-term behavior of the game depends on the preference ratio of the entire population. Only two particular games were examined here; however, a very important and interesting problem is how one game evolves into another as the parameter values are gradually altered. Such a systematic approach will be the subject of a future paper.
References
1. F. Bean and A. Kerckhoff. Personality and perception in husband-wife conflicts. Journal of Marriage and the Family, 33:351–359, 1971.
2. R. Cooper, D. DeJong, R. Forsythe, and T. Ross. Forward induction in the battle of the sexes games. American Economic Review, 83:1303–1316, 1993.
3. U. Merlone, F. Szidarovszky, and M. N. Szilagyi. Finite neighborhood games with binary choices. Technical report, Department of Systems and Industrial Engineering, The University of Arizona, Tucson, 2006.
4. A. Rapoport and M. Guyer. A taxonomy of 2 × 2 games. General Systems, 11:203–214, 1966.
5. F. Szidarovszky and M. N. Szilagyi. An analytical study of the n-person Prisoners’ dilemma. Southwest Journal of Pure and Applied Mathematics, 2:22–31, 2002.
6. M. N. Szilagyi. From two-person games to multi-player social phenomena (in Hungarian). Beszelo, 10:88–98, 2005.
7. M. N. Szilagyi, J. Zhao, and F. Szidarovszky. A new algorithm for the solution of n-person Prisoners’ dilemmas with Pavlovian agents. Pure Mathematics and Applications, 14:233–248, 2003.
8. J. Zhao, F. Szidarovszky, and M. N. Szilagyi. An agent based simulation methodology for analysing public radio membership campaigns. Information Technology for Economics and Management, 3:1–34, 2005.
9. J. Zhao, M. N. Szilagyi, and F. Szidarovszky. An n-person battle of sexes game. Technical report, Department of Systems and Industrial Engineering, The University of Arizona, Tucson, 2006.
10. J. Zhao, M. N. Szilagyi, and F. Szidarovszky. A continuous model of n-person Prisoners’ dilemma. Game Theory and Applications, 2006 (in press).
Individual Values and Social Goals in Environmental Decision Making
David H. Krantz¹, Nicole Peterson¹, Poonam Arora¹, Kerry Milch¹, and Ben Orlove²
¹ Columbia University, Department of Psychology, Center for Research on Environmental Decisions (CRED), MC 5501, New York, NY 10027, USA
² University of California, Davis, Department of Environmental Science and Policy, One Shields Way, Davis, CA 95616, USA
Summary. Environmental problems can be viewed as a failure of cooperation: individual choices are seemingly made based on individual benefits, rather than benefits for all. The game-theoretic structure of these problems thus resembles commons dilemmas or similar multiplayer strategic choices, in which the incentives to eschew cooperation can lead to unfavorable outcomes for all the players. Such problems can sometimes be restructured by punishing noncooperators. However, cooperation can also be enhanced when individuals adopt intrinsic values, such as cooperation, as a result of social goals to affiliate with a group. We define a social goal as any goal to affiliate with a group, and also any goal that derives from any group affiliation. We suggest that individual decision making in group contexts depends on a variety of different types of social goals, including conformity to group norms, sharing success, and carrying out group-role obligations. We present a classification of social goals and suggest how these goals might be used to understand individual and group behavior. In several lab- and field-based studies, we show how our classification of social goals can be used to structure observation and coding of group interactions. We present brief accounts of two laboratory experiments, one demonstrating the relationship between group affiliation and cooperation, the second exploring differences in goals that arise when a decision problem is considered by groups rather than individuals. We also sketch three field projects in which these ideas help to clarify processes involved in decisions based on seasonal climate information. Identifying social goals as elements of rational decision making leads in turn to a wide variety of questions about tradeoffs involving social goals, and about the effects of uncertainty and delay in goal attainment on tradeoffs involving social goals. 
Recognizing a variety of context-dependent goals leads naturally to consideration of decision rules other than utility maximization, both in descriptive and in prescriptive analysis. Although these considerations about social goals and decision rules apply in most decision-making domains, they are particularly important for environmental decision making, because environmental problems typically involve many players, each with multiple economic, environmental, and social goals, and because examples
David H. Krantz et al.
abound where the players fail to attain the widespread cooperation that would benefit everyone (compared to widespread noncooperation).
1 Introduction
About 25 years ago, the New York City Council strengthened its “scoop-the-poop” ordinance. Dog walkers were required to scoop and properly discard their dogs’ feces; violators would receive a summons and might have to pay a substantial fine. To some (including one of the authors, who is not a dog owner), this ordinance appeared to be a model of poor legislation: it would be costly to enforce, and if not enforced, it would merely add to the many ways in which New Yorkers already disrespected the legal system. However, this negative prejudgment turned out to be off the mark: there was widespread compliance. Costly enforcement was not necessary. Compliance was not universal, of course, but it did not have to be. The sidewalks of New York City became cleaner. New Yorkers walk a great deal, and walking became more pleasant. This improvement has been largely sustained over the years. Why did it turn out this way, when so many other New York City ordinances are ignored and remain seemingly unenforceable? Was the threat of punishment more credible than for other minor offenses such as littering or jaywalking? Are dog owners, as a group, more afraid of punishment than others? Or are they more civic-minded and law-abiding than wrong-way bicyclists or horn-blowing motorists? Indeed, civic-mindedness and concern about punishment may each have influenced the dog owners. Anticipated social reward (perhaps a feeling of group accomplishment) may also have affected their behavior. In this chapter, we suggest a general framework for analyzing problems of cooperation or compliance. We include both social rewards and social sanctions, and we discuss the relation of social goals to group affiliations. We suggest that individual differences in cooperation relate more to differences in group affiliation than to personality traits such as altruism, fear of punishment, or willingness to punish others.
As applied to the scoop-the-poop puzzle, our analysis suggests that even weak social identities (such as belonging to the group of neighborhood dog owners, or to the group of civic-minded New Yorkers) could have given rise to social goals that played a crucial role for compliance with the ordinance. Many environmentally related problems are partially analogous to the dog feces problem. For example, efforts by individuals, groups, or countries to reduce greenhouse gas emissions, hoping to minimize global climate change, are partly analogous to the scooping actions undertaken by dog owners to minimize unpleasant sidewalks. Decisions by groups or countries are different, of course, from those by individuals, but the values and goals of individuals strongly influence decisions by groups or countries. A deeper understanding of the values and goals that determine compliance and cooperation is important
Values and Goals in Environmental Decision Making
for the design of solutions to a broad range of environmental problems. We suggest here that such understanding is closely tied to understanding the multiple group identities [7] and the resulting social goals of the people whose cooperation is in question for any given problem. For present purposes, we define a social goal as either a goal to affiliate with a group or a goal that is a consequence of an affiliation. We include both temporary and long-term affiliations with groups of any size (from a relationship with one other person to an abstract identity as “good citizen” or the like). Consideration of social goals is important not only for the understanding of behavior but also for prescriptive economic analysis. From the standpoint of prescriptive models, social goals present an array of novel questions. What are the rules describing the tradeoff of one social goal against another, or against an economic goal? How are social goals affected by uncertainty and by delay? These central questions about intergoal tradeoffs, intertemporal tradeoffs, and uncertainty are often handled by adopting sweeping, simplified behavioral assumptions (discounted expected multiattribute utility) [20,35]. The question of when such simplification gives a good approximation for a prescriptive problem can only be raised once the social goals involved have been described, which brings one back to understanding the multiple group identities of the decision makers. Section 2 presents a brief discussion of commons dilemma problems, such as the scoop-the-poop problem. We cannot do justice to the massive literature on this topic. We merely present and discuss a game-theoretic framework in which payoffs are affected by social institutions and social goals. In particular, we use this framework to discuss the importance of intrinsic social rewards for mutual cooperation.
(The huge literature on intrinsic rewards, from Bruner [10] to Deci and Moller [15], is also part of the background but beyond the scope of our discussion.) In Section 3 we discuss group affiliation and the goals that derive from it. The discussion leads to a tentative classification of social goals. Such a classification can serve several purposes. First, it helps one to keep in mind the wide variety of different goals that may be in play for different decision makers. Relatedly, the classification can guide qualitative research on the process of decision making: specifically, the development of coding categories for records of group interactions in decision settings. (Later sections offer some examples in which such group interactions are observed and coded.) Lastly, the classification serves to organize a set of detailed questions about decision mechanisms, that is, about the behavioral mechanisms that govern tradeoffs among different goals and the effects of time delay and uncertainty. Section 4 contains a brief discussion of such decision mechanisms. We emphasize the novel research questions posed by this framework, although answers are scarce. Relating social goals to environmental decisions is one of the major themes of the Center for Research on Environmental Decisions (CRED), and the last two sections discuss some current CRED projects. In Section 5 we describe
David H. Krantz et al.
two lines of laboratory investigation with small groups. The first looks at group identity as a precursor of cooperation, and the second at group decision processes related to reframing decision problems. Section 6 describes field observations of group decision making and additional laboratory studies suggested by them. Finally, Section 7 considers the implications of our work for prescriptive economic analysis. We conclude with a summary.
2 Coercion and Intrinsic Reward in Commons Dilemmas
2.1 Commons Dilemmas and Coercion
The scoop-the-poop problem is a commons dilemma [24]: if many cooperate, the gain for each person is large, but the portion of that gain that stems directly from any one person’s cooperation is too small to repay his or her effort or cost. Therefore each person has an incentive not to cooperate, regardless of whether many or only few others cooperate. Many decision situations have a similar structure. Discharge of chemical wastes into waterways, or of polluting or greenhouse gases or aerosols into the atmosphere, provides a host of examples. As Hardin [24] put it: “The rational man finds that his share of the cost of the wastes he discharges into the commons is less than the cost of purifying his wastes before releasing them.” The collapse of fisheries due to overfishing provides yet another family of examples. Widespread self-restraint would make most users of a fishery better off, and most recognize this to be true, but there is no incentive for self-restraint by any one user, because the improvement in fish stocks produced by any one user’s restraint would be too small to repay that user’s sacrifice. Hardin’s prescribed solution for commons dilemmas was “mutual coercion mutually agreed upon.” Dog owners helped to elect the New York City Council, which enacted the coercive scoop-the-poop ordinance. Similarly, in times of water shortage, city or regional officials impose restrictions on the water usage of the voters to whom they are directly or indirectly responsible. International attempts at mutually agreed coercion are seen in many treaties, including those intended to curb overfishing, to protect endangered species, to curb chlorofluorocarbon emissions (damaging to the protective ozone layer in the stratosphere), and to curb greenhouse gas emissions (involved in global warming). Since Hardin’s work in 1968, there has been much research on commons dilemmas.
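The incentive structure just described can be made concrete with a linear public goods game, the laboratory analogue of a commons dilemma that the chapter mentions again in connection with Table 3. The Python sketch below is illustrative only: the group size `n`, the `cost` of cooperating, and the `multiplier` applied to pooled contributions are hypothetical values, not taken from the chapter. Because the multiplier is smaller than the group size, each person's own contribution returns less to that person than it costs, whatever the others do, yet everyone is better off when all cooperate than when none do.

```python
# Illustrative linear public goods game (a standard commons-dilemma model).
# n, cost, and multiplier are hypothetical numbers, not values from the text.


def payoff(cooperates: bool, others_cooperating: int,
           n: int = 10, cost: float = 1.0, multiplier: float = 3.0) -> float:
    """Payoff to one player, given how many of the other n - 1 players cooperate.

    Every cooperator pays `cost`; pooled contributions are multiplied by
    `multiplier` and shared equally among all n players, cooperators or not.
    """
    cooperators = others_cooperating + (1 if cooperates else 0)
    shared_gain = multiplier * cost * cooperators / n
    return shared_gain - (cost if cooperates else 0.0)


n = 10
for k in range(n):  # k = number of *other* players who cooperate
    # One's own contribution returns multiplier * cost / n = 0.3 to oneself
    # but costs 1.0, so cooperating is worse whatever the others do.
    assert payoff(True, k, n) < payoff(False, k, n)

# Yet universal cooperation beats universal defection for every player.
print(payoff(True, n - 1, n), payoff(False, 0, n))
```

With these numbers, a lone player's cooperation costs 1.0 and returns only 0.3 to that player, while universal cooperation yields 2.0 per person against 0.0 under universal defection.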
Ostrom [39] provides a valuable review and synthesis, covering laboratory studies, field studies, and theory (see also [40]). A major conclusion is that mutual coercion does not always or even usually require formalized institutions stemming from treaties, legally enforced contracts, legislation, or
Values and Goals in Environmental Decision Making
administrative law. Many groups succeed in creating norms for cooperation that command wide compliance. Ostrom finds that trust and reciprocity are central to cooperation, and that norm development and group affiliation enhance cooperation. Among Ostrom’s [39] design principles for the commons, those concerning group membership or boundaries and those concerning participation particularly signal the importance of group affiliation. Both the empirical findings and the theory reviewed by Ostrom emphasize the importance of sanctions (negative consequences) for violations of the relevant norms. Also noteworthy are the laboratory studies of Fehr and his collaborators [11,18], which emphasize the important role of individuals who, despite cost to themselves and lack of gain, are ready to impose sanctions on noncooperators. Evolutionary theory for repeated play in commons dilemmas [22] emphasizes the importance of conditional cooperation: a disposition to cooperate initially and to continue only if cooperation is soon reciprocated. Here, the discontinuation of cooperation can be viewed as a sanction.
2.2 Intrinsic Reward for Cooperation
The threat of sanctions is undoubtedly important, but one should also consider the rewards stemming from cooperation. We discuss this first in the context of the prisoner’s dilemma, which can be characterized abstractly by the two-person payoff matrix shown in Table 1. Player #1 has a choice of two strategies, shown as the two main rows of Table 1, labeled Cooperate and Defect. Player #2 has a similar choice, shown as the two main columns. The outcomes for each player are shown in the cell of the table determined by their two strategy choices, with Player #1’s payoff above and #2’s below.
Table 1. Prisoner’s dilemma outcome ranks. Top entry in each cell is Player #1’s outcome rank; bottom is Player #2’s outcome rank.
                                 Player #2 Strategies
                                 Cooperate    Defect
Player #1 Strategies  Cooperate      3           1
                                     3           4
                      Defect         4           2
                                     1           2
The outcome ranking for each player goes from 1 (low) to 4 (high). For Player #1, row 2 (Defect) dominates, because the outcome for row 1 ranks lower within each column: 3 < 4 and 1 < 2. Similarly, for Player #2, column 2 (Defect) dominates: in row 1, 3 < 4, and in row 2, 1 < 2. In a situation where the game will be played only once, each player should Defect to maximize the rank of her payoff. Empirically, a majority of people do defect on a single isolated play [5], but many do cooperate, and several different situational changes have been shown to induce more people to cooperate [41]. Why do they cooperate? There seem to be two broad categories of explanation: (i) confusion and (ii) group goals. The first type of explanation asserts that people fail to perceive that noncooperation is dominant. In real-life examples, outcomes are usually uncertain, which may contribute to failure of perceived dominance. The ranking from 1 up to 4 may not be perceived as such if people deviate widely from expected utility (for reasons rational or otherwise) in processing the uncertainties. Most laboratory experiments on the prisoner’s dilemma, however, avoid complication due to uncertainty: they present outcomes as certain or they precalculate expected values and display them in the payoff matrix, which should make dominance easier to recognize [6]. Even under certainty, people may not represent the payoffs as a conditional table as in Table 1, and may therefore not recognize dominance [49]. Some people who do recognize the dominance of Defect in the payoffs nevertheless choose Cooperate. Why? This brings us to explanation category (ii): very simply, they want to cooperate. More precisely, some people view mutual cooperation as an additional goal: for them, mutual cooperation is an intrinsic social reward. It is as though Player #1 implicitly adds a nonzero value C to the entry in the (Cooperate, Cooperate) cell of the payoff matrix, and Player #2 adds C′. In Table 2, we denote the increased rewards for mutual cooperation by 3 + C and 3 + C′, respectively. This is not a rigorous notation, because 3 was just the rank of an outcome; we use it only to indicate that the outcome is improved for each player.
The result is shown in Table 2.
Table 2. Prisoner’s dilemma with rewards for mutual cooperation. Top entry in each cell is for Player #1; bottom is for Player #2.
                                 Player #2 Strategies
                                 Cooperate    Defect
Player #1 Strategies  Cooperate    3 + C         1
                                   3 + C′        4
                      Defect         4           2
                                     1           2
If the improvements C and C′ are sufficiently large, then the upper-left cell gives each player his or her highest outcome, and Defect no longer dominates.
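The dominance argument for Tables 1 and 2 can be checked mechanically. The Python sketch below treats the ranks and the addends C and C′ as numbers that can be added, the same loose shorthand the text adopts; the function names and the example values of C and C′ are our own hypothetical choices. In this shorthand, Defect loses strict dominance for a player as soon as 3 plus that player's addend reaches 4, and choosing C ≠ C′ reproduces the asymmetric case in which Defect still dominates for one player.

```python
# Does Defect strictly dominate Cooperate in Tables 1 and 2?
# Ranks and the addends C, C' are added as ordinary numbers, which is the
# text's own loose shorthand rather than a claim about utility scales.

COOPERATE, DEFECT = 0, 1


def payoff_table(c1: float = 0.0, c2: float = 0.0) -> dict:
    """(Player #1, Player #2) payoffs keyed by (row choice, column choice).

    c1 and c2 stand for C and C'; with both at 0 this is Table 1.
    """
    return {
        (COOPERATE, COOPERATE): (3 + c1, 3 + c2),
        (COOPERATE, DEFECT): (1, 4),
        (DEFECT, COOPERATE): (4, 1),
        (DEFECT, DEFECT): (2, 2),
    }


def defect_dominates(table: dict, player: int) -> bool:
    """True if Defect beats Cooperate for `player` (0 = #1, 1 = #2) against both opposing choices."""
    for other in (COOPERATE, DEFECT):
        if player == 0:  # Player #1 chooses the row
            coop, defect = table[(COOPERATE, other)][0], table[(DEFECT, other)][0]
        else:            # Player #2 chooses the column
            coop, defect = table[(other, COOPERATE)][1], table[(other, DEFECT)][1]
        if defect <= coop:
            return False
    return True


for label, (c1, c2) in [("Table 1", (0, 0)), ("C = C' = 2", (2, 2)), ("C = 2, C' = 0", (2, 0))]:
    table = payoff_table(c1, c2)
    print(label, defect_dominates(table, 0), defect_dominates(table, 1))
```

The three lines printed show Defect dominating for both players in Table 1, for neither player when C = C′ = 2, and only for Player #2 when C = 2 and C′ = 0.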
If Player #1 thinks that C′ is high, then he may expect Player #2 to be tempted to maximize her own outcome by cooperating. If C is also high, and if Player #2 knows this, then the two players each recognize that both will be best off through cooperation, and they will be likely to cooperate. With this altered payoff structure, the situation is no longer a prisoner’s dilemma. High values of C and C′ can be thought of as representing the players’ valuation of a group goal. That is, the addend is not associated with the action of cooperation per se, because it does not show up when the other player defects. Rather, it depends on a group outcome, namely, that both cooperate. We keep open the possibility that C ≠ C′, that is, that the intrinsic reward value of mutual cooperation is different for the two players. In particular, Defect might dominate for just one of the players. If the other player knows this, she will assume that her counterpart will defect, and thus, despite having a high intrinsic value for cooperation, she will also likely defect. We defer to Section 3 the consideration of how such an intrinsic reward for mutual cooperation might arise from group affiliation, and to Section 4 a discussion of how such a reward interacts with other goals (represented in a preliminary fashion by the “+” signs in Table 2). To make clear why such intrinsic rewards could play a key role in commons dilemmas and other similar decision situations, we consider a four-person symmetric game with both intrinsic rewards and sanctions. The symmetry assumption greatly reduces generality, but it allows the payoffs for the multiperson game to be represented in Table 3 by a matrix (instead of a multiway array).
Table 3. Reward and punishment in a symmetric four-person game.
Number of      Payoff to each   Payoff to each
defectors      cooperator       defector
0              7 + C4 − q0      —
1              5 + C3 − q1      8 − D1
2              3 + C2 − q2      6 − D2
3              1 + C1 − q3      4 − D3
4              —                2
Here, as in Tables 1 and 2, the extrinsic rewards are ranks, now running from 1 (lowest) up to 8 (highest). If all the intrinsic rewards (Cs), the costs of punishment (qs), and the magnitudes of punishment (Ds) are 0, then the payoff matrix represents a classic commons dilemma or public goods game. No matter how many others cooperate or defect, any one cooperator has an incentive to defect to get a better outcome (7 < 8, 5 < 6, etc.). Thus, the only Nash equilibrium is all-Defect.
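The equilibrium claim for Table 3 can be verified by checking every profile. Because the game is symmetric, a profile is summarized by the number of defectors k, and it is a Nash equilibrium when no cooperator gains by switching to Defect and no defector gains by switching to Cooperate. The Python sketch below is our own illustration: the dictionary layout and the example value C4 = 2 are hypothetical, and ranks are combined with the addends by ordinary arithmetic, as in the text's shorthand.

```python
# Nash equilibria of the symmetric four-person game in Table 3, with a
# profile summarized by its number of defectors k (0 to 4). Ranks and the
# addends C, q, D are combined by ordinary arithmetic (symbolic shorthand).

COOP_RANK = {0: 7, 1: 5, 2: 3, 3: 1}     # cooperator's extrinsic rank, keyed by k
DEFECT_RANK = {1: 8, 2: 6, 3: 4, 4: 2}   # defector's extrinsic rank, keyed by k


def coop_payoff(k: int, C: dict, q: dict) -> float:
    """Cooperator's payoff with k defectors: rank + C[4 - k] - q[k] (e.g., 7 + C4 - q0)."""
    return COOP_RANK[k] + C[4 - k] - q[k]


def defect_payoff(k: int, D: dict) -> float:
    """Defector's payoff with k defectors: rank - D[k]; Table 3 lists no D for k = 4."""
    return DEFECT_RANK[k] - D.get(k, 0)


def equilibria(C: dict, q: dict, D: dict) -> list:
    """Values of k at which no single player gains by switching (ties count as no gain)."""
    result = []
    for k in range(5):
        # A cooperator who switches to Defect turns k defectors into k + 1.
        cooperators_stay = k == 4 or coop_payoff(k, C, q) >= defect_payoff(k + 1, D)
        # A defector who switches to Cooperate turns k defectors into k - 1.
        defectors_stay = k == 0 or defect_payoff(k, D) >= coop_payoff(k - 1, C, q)
        if cooperators_stay and defectors_stay:
            result.append(k)
    return result


no_C = {j: 0 for j in range(1, 5)}   # C1..C4, keyed by number of cooperators
no_q = {k: 0 for k in range(4)}      # q0..q3, keyed by number of defectors
no_D = {}                            # D1..D3 all zero

print(equilibria(no_C, no_q, no_D))             # -> [4]
print(equilibria({**no_C, 4: 2}, no_q, no_D))   # -> [0, 4]
```

With all addends at zero, only k = 4 (all-Defect) survives. With a hypothetical C4 = 2, all-Cooperate (k = 0) also becomes an equilibrium, while all-Defect remains one, so in this shorthand the intrinsic reward makes universal cooperation self-sustaining without by itself ruling out universal defection.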
The standard way to create cooperative equilibria is to punish defectors. Assume that it costs each cooperator qj if there are j defectors to be punished, and that each defector suffers punishment Dj. If the punishments are severe relative to the costs, defection is deterred. Thus (continuing the symbolic addition/subtraction), if 7 − q0 > 8 − D1, then when all cooperate, nobody has any incentive to defect. This, or variants of the same idea, may very well succeed. But even when all do cooperate, the cooperators lose something, because establishing sanctions for defectors is costly. As the number of defectors rises, so typically does the cost of sanctions, and the sanctions in turn typically become less effective. Norms that many people violate become difficult or impossible to enforce. The situation might be much different if the Cs add positive value to cooperation. An intrinsic reward value for mutual cooperation can be viewed as a windfall: the players are promised only 7, the next-to-best individual outcome, if all four cooperate, but they experience something better: the promised outcome plus the intrinsic reward C4. This windfall may be expected by people who have experienced similar intrinsic rewards, and thus it may motivate cooperation. Many factors complicate this simple picture. The intrinsic reward value of cooperation may depend on how many cooperate: C4 may be different from C3, and so on. Symmetry may not hold; the extrinsic outcomes and intrinsic rewards may vary across group members, and the latter may depend as well on which particular people are among the cooperators. Intrinsic rewards associated with group goals may also be accompanied by additional costs: for example, a person’s pursuit of other, unrelated goals may be constrained by group norms, such as when individual financial goals conflict with norms of altruism or generosity.
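The tradeoff between intrinsic reward and sanctions can be made explicit with a little algebra. Adding C4 to the cooperators' payoff turns the condition 7 − q0 > 8 − D1 into 7 + C4 − q0 > 8 − D1, which holds exactly when D1 > 1 + q0 − C4. The Python sketch below evaluates that bound for a few values of C4 with a per-cooperator sanctioning cost q0 = 0.25; all numbers are hypothetical, and adding ranks to rewards is, as before, only the chapter's symbolic shorthand.

```python
# Rearranging 7 + C4 - q0 > 8 - D1 gives D1 > 1 + q0 - C4: the smallest
# sanction that still deters a lone defector falls one-for-one as C4 rises.
# C4 and q0 values are hypothetical; adding ranks and rewards is the text's
# symbolic shorthand, not a measured utility scale.


def all_cooperate_stable(C4: float, q0: float, D1: float) -> bool:
    """True if no cooperator gains by defecting when all four cooperate (Table 3, row 0)."""
    return 7 + C4 - q0 > 8 - D1


def sanction_bound(C4: float, q0: float) -> float:
    """All-cooperate is stable exactly when the sanction D1 exceeds this value."""
    return 1 + q0 - C4


q0 = 0.25  # hypothetical per-cooperator cost of maintaining sanctions
for C4 in (0.0, 0.5, 1.0, 1.5):
    print(f"C4 = {C4}: need D1 > {sanction_bound(C4, q0)}")
```

With these numbers the required sanction falls from above 1.25 at C4 = 0 to above −0.25 at C4 = 1.5. A negative bound means that, in this shorthand, even D1 = 0 keeps all-cooperate stable.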
Even when many people are motivated by intrinsic reward, some may not be; thus, punishment of defectors may still be necessary for maintenance of cooperation. Despite these caveats, the additive formulation in Table 3 shows the possible importance of intrinsic reward. All-cooperate is a Nash equilibrium in Table 3 provided that 7 + C4 − q0 > 8 − D1. The larger C4, the more easily this is attained. The sanctions D threatened for a potential defector can be smaller, and perhaps less costly (smaller q), when C values are large for that individual. In practice, the designers of a formal or informal structure for “mutual coercion” may understand the social goals and intrinsic rewards from cooperation for the people who will be subject to sanctions, and may (consciously or unconsciously) take such social rewards into account in deciding (correctly) what levels of sanction and monitoring will be sufficient. A structure that appears to work because of effective enforcement may in fact work only through
a mixture of sanctions and intrinsic social rewards. This would be our reading of Ostrom’s design principles for the commons. Moreover, sanctions themselves create a second-level commons dilemma: who will cooperate to impose them, and why? Consider the scoop-the-poop problem. In the high population density of New York City, violations are generally noticed, even at off hours. Will someone who observes a violation take the trouble to call the police? Or the trouble and risk of criticizing the offender directly? Insofar as there is some small probability of this, it probably stems from anger or other negative emotion elicited by the violation: usually, the observer cannot in turn be sanctioned for failure to report or to criticize the violator. Such negative emotion does not, however, come merely from one additional bit of dog feces on the sidewalk; rather, it comes from observing a violation of a norm that one considers important. A large C (for achieving a cooperative goal) gives rise to a large disappointment when the goal is not achieved, and consequent negative emotion directed at a person violating the norm. Because of this second-level commons dilemma, even a first-level account based solely on punishment must take into account the social goals of the enforcers.
3 Toward a Classification of Social Goals
Humans are social animals: associations with human groups strongly affect the survival, health, and reproductive fitness of individuals. Unlike other social animals, people affiliate simultaneously with multiple groups, including family, work groups, neighbors, and often many others. Fitness depends not only on the particular groups with which one is affiliated, but on one’s role and one’s status within each such group. The preceding claims seem obviously true, but determining the effects of affiliation, role, and status in detail is complicated. For example, one study [17] exhibits a simple relationship between social role and reproductive fitness, but in horses, not people. Complicated relationships of social status and social roles to human longevity are shown by Samuelsson and Dehlin [46]. Affiliation with groups can be viewed as a fundamental social goal, which in turn is related to many other goals [21]. In some cases, affiliation is merely a subgoal directed toward some purely individual goal. An example would be a hungry person who approaches others chiefly in the hope of being invited for a meal. Often affiliation is an end in itself or a means to other social goals: for example, a person might approach others just in order to be near them (end in itself) or in the hope of conversation, sex, or other social goals. Indeed, a particular affiliation goal may be directed toward a variety of other goals. A partial classification is given in part (A) of Table 4. Schachter [47] studied factors that increase the desire to affiliate. For example, inducing anxiety (about a supposedly forthcoming electric shock) increases a subject’s desire to spend waiting time with another (unknown)
person. The monograph also directly demonstrates an increase in well-being through shared presence: actually spending time with another person reduced anxiety. Turner et al. [53] demonstrated that very slight differentiation among a set of individuals can induce separate group identities, sufficient to motivate discrimination against the “out-group” (or in favor of the “in-group”) with respect to allocation of rewards. Here, affiliation seems to be an end in itself, arising automatically as a consequence of the slight differentiation. We view nepotism toward in-group members or discrimination against the out-group as intrinsic social goals (Category E in Table 4). We treat affiliation as a central concept not only because it arises so readily as a goal but because other social goals can easily be classified relative to specific group affiliations. Table 4 gives an overview of such a classification scheme. Under (A) we list a number of consequences of affiliation. A person affiliated with a particular group thinks of himself as part of that group (group identity) and is perceived as having somewhat altered status (by outsiders, by other members of the group, and by himself). Depending on the nature of the group, the person may also feel (and may be) more secure, physically, economically, or socially, and may feel (and may be) more able to pursue other goals. We use the term “efficacy” for the latter state. Each of these points (identity, status, security, and efficacy) could be discussed at book length, but elaboration is beyond the scope of this brief classificatory venture. Suffice it to say that each of these consequences can be anticipated to some extent and can thus motivate affiliation with the group. Another important consequence of affiliation is increased awareness of group norms: the expectations of one’s behavior held by other group members [19]. Various forms of sharing with other group members also emerge as goals.
Sharing of “mere presence” becomes an important goal in many groups [47,60]. Many specific activities are also shared, depending on the type of group, and usually governed partly by group norms. Table 4 mentions a few examples. Finally, because a member’s status depends in part on that of the group, protection and/or enhancement of group status is a goal that arises as a consequence of affiliation. Included in categories B, C, and D are goals that arise from within-group differentiation of roles and/or status, rather than from affiliation per se. A group may comprise many different roles, designated formally or agreed on tacitly. President or secretary may be roles that are described formally and assigned by election in a large group; strawberry-shortcake-maker might be a role assumed informally by one member of a couple. Category B is role aspiration: a person desires a particular role within the group. Typically, role aspirations are new goals that arise as a consequence of group affiliations, because a person not affiliated is likely to have at most vague awareness of the within-group roles, especially the informal roles. Once such a goal is
Table 4. Classification of social goals relative to a particular group.
(A) Goals related to affiliation per se
    Taking on group identity
    Taking on group status
    Safety: feeling and being secure
    Efficacy: feeling and being capable or powerful
    Adherence to group norms
    Sharing with other group members
      • Mere presence
      • Activities or experiences (vary with group: e.g., meals, conversation, sex, prayer)
    Enhancing group status
(B) Role or status aspirations within the group
    Specific roles (formally or informally designated)
    Within-group status aspirations
    Within-group social comparison
(C) Role-derived obligations (prevention goals)
    Within-group
      • Memory operations (encoding, storage, retrieval)
      • Coordination (scheduling, etc.)
      • Adding or elaborating norms
      • Sanctioning norm violators
    External
      • Protecting group status
      • Intergroup relations and coordination
      • Affiliation with related or umbrella groups
(D) Role-derived opportunities (group promotion goals)
(E) Goals for other group members and nonmembers
    Sharing goals (in-group nepotism)
    Opposing goals (out-group prejudice, schadenfreude)
attained, however, additional goals specific to the particular role are activated and adopted. Category C consists of role obligations: goals that a person fulfilling a particular role ought to adopt. The norms in question may be imposed within the group or from other groups. For example, the role of parent within a family entails numerous obligations, some of them not well anticipated: some imposed by within-family norms, others by outside groups (e.g., extended family,
other parents in one’s cohort, child-care workers, local school system, or state). Just about every role, however, carries obligations. Under (C) we give a partial classification of such role obligations, making use of some functions that most groups require in order to exist as such. For example, individuals do not necessarily require memory or coordination beyond the brain mechanisms acquired in the course of normal human development. However, groups of two or more individuals do not share a brain; thus, they require explicit mechanisms to remember the past and to coordinate current efforts. There are often differentiated roles within groups devoted to these functions. The other categories of role obligation listed under (C) also arise naturally from considering group function. The theory of self-regulation [25] emphasizes the distinction between prevention goals (ought goals, or obligations) and promotion goals (ideal goals, or opportunities). Although the theory has mainly been applied to individual goals, it is interesting to note the large number of potential prevention goals generated by group-role obligations. By contrast, role-derived promotion goals (D) form a peculiar category. A promotion goal involves an opportunity that one is not obligated to act upon; but if an opportunity is role-derived, and pertains to the group in question, there may always be some degree of obligation to act on it. In this connection, of course, one can consider role-derived opportunities that do not pertain to the group in question, and so are not obligations. The examples that come to mind are ugly: seeking a bribe, taking sexual advantage of a dependent, and so on. Possibly there are good examples of role-derived group-pertinent opportunities that do not entail obligations; we leave (D) as a placeholder for further consideration. The final category, (E), consists of goals that one holds on behalf of other group members.
These are numerous and include goals such as wanting one’s child to do well in school or to gain a job promotion, wanting a co-religionist to win a competition, or wanting members of a temporarily associated group to get extra rewards [53]. We also include goals that are specifically held for nonmembers: often negative ones, as in out-group prejudice.
3.1 Mutual Cooperation
The discussion of social dilemmas in the preceding section emphasized the possible importance of a mutual cooperation goal, represented by the additive parameter C or C′ in Table 2, or the Cs in Table 3. How does such a goal fit into the preceding classification? We take it to be affiliation related (A), but within that category, there are several different forms to be considered. First, mutual cooperation can be thought of as shared activity, and shared success. It is more abstract than sharing a meal or sex, but nonetheless falls into this subcategory. Second, it can sometimes enhance group status (or prevent a lowering of group status). The poop-scooping dog owners may feel that unpleasantness in the streets lowers the regard in which they are held, as dog owners, by others in their neighborhood. Note that this is different from
the case of a dog owner who scoops despite the fact that others do not. For such a dog owner, personal status may be at issue; that is different from the person who scoops in the hope that others will do so. Third, the individual may feel that failure to cooperate violates a subgroup norm. Identity as a neighborhood dog owner is strengthened by following what may turn out to be an important norm for that group. Finally, in some cases, where there is no pre-existing group, cooperation may simply be a means of affiliating, even if only temporarily, with others. That may be the driving force in laboratory groups, where the individuals are brought together for the first time and do not expect any continuing relationship. In everyday life, one sees examples of exchange of small kindnesses between strangers who are only momentarily together.
3.2 Observing Social Goals
The preceding classification of social goals is intended to be useful in identifying social goals from observed behavior. Many sorts of behavior can be observed in the study of decisions. One can observe verbal and nonverbal aspects of people’s interactions with one another as they discuss a problem or decision together; one can record thoughts reported by individuals while dealing with a decision problem (under think-aloud or “write your thoughts” instructions); and one can administer pre- and/or post-task interviews or questionnaires. Such observational methods can generate much data, which must be categorized efficiently for analysis. The classification in Table 4 provides a structure for categorizing these observations. Because Table 4 categorizes social goals relative to some particular group affiliation, it suggests an initial question about any particular behavioral observation: which group affiliations might plausibly have influenced that person’s behavior?
For example, at a community meeting to hear a seasonal climate forecast, a woman said to the group, “We should work hard.” This looks like an example of “adding or elaborating norms” (under (C) in Table 4). Is she setting a norm for the entire village? Or for members of her own immediate family? Or some other subgroup? In which subgroup(s) does she have a role that obligates her to contribute toward norms? These are questions that might be answered by interviews with the speaker herself or with other local people who are in a position to know. Often several different affiliations will be relevant to a decision problem. For example, a person selecting a gift may consider his relationship with the intended recipient, with a partner and/or friends (who share the expense), with the larger social group connecting gift-giver and intended recipient, and even the (possibly temporary) dyadic relationship with a merchant. Social goals pertaining to each of these relationships could be probed using the above classification to structure questions or coding categories.
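The coding question raised by this example can be organized as a small data structure. The Python sketch below is hypothetical: the class and field names, the speaker label, and the candidate affiliations are our own illustrations, and only the category letters and titles come from Table 4. Recording the affiliation alongside the category lets the same utterance be coded against several candidate groups until interviews settle which one applies.

```python
# Hypothetical coding-scheme sketch: each coded observation records the
# speaker, the group affiliation the coder judges relevant, and a Table 4
# category. Category letters and titles follow Table 4; the rest is illustrative.
from collections import Counter
from dataclasses import dataclass

TABLE4_CATEGORIES = {
    "A": "Goals related to affiliation per se",
    "B": "Role or status aspirations within the group",
    "C": "Role-derived obligations (prevention goals)",
    "D": "Role-derived opportunities (group promotion goals)",
    "E": "Goals for other group members and nonmembers",
}


@dataclass(frozen=True)
class CodedObservation:
    speaker: str
    utterance: str
    affiliation: str       # group the social goal is coded relative to
    category: str          # key into TABLE4_CATEGORIES
    subcategory: str = ""

    def __post_init__(self) -> None:
        # Reject codes that are not one of the Table 4 categories.
        if self.category not in TABLE4_CATEGORIES:
            raise ValueError(f"unknown Table 4 category: {self.category!r}")


# One utterance coded against two candidate affiliations, pending interviews.
observations = [
    CodedObservation("meeting participant", "We should work hard.",
                     "village", "C", "Adding or elaborating norms"),
    CodedObservation("meeting participant", "We should work hard.",
                     "immediate family", "C", "Adding or elaborating norms"),
]

# Tally codes by (affiliation, category) for later analysis.
counts = Counter((obs.affiliation, obs.category) for obs in observations)
print(counts)
```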
3.3 An Example: Reciprocity in the Framework of Social Goals
As explored above, social goals can be associated with a variety of group affiliations, roles, obligations, or opportunities, often simultaneously. The classification system of the previous section provides a guide to observing and analyzing social goals in context. Here, we use gift-giving as an example. Gift-giving is a topic of considerable theoretical interest. Mauss’s seminal work in anthropology [33] explores how gifts enhance solidarity: “like the market, [gift-giving] supplies each individual with personal incentives for collaborating in the pattern of exchange” [16]. The debate over what a gift is, and whether there are “free gifts,” has been long and involved; here, we are most interested in how gifts fulfill social goals for individuals. Some gifts, viewed in a longer time frame, may be instrumental toward later returns. Such gifts may be viewed as maximizing the giver’s expected utility over time. For example, one may give a large tip in a restaurant in the hope of receiving excellent service on future visits. There is some risk in this (the hope may not be realized), but the expected return may be high enough to compensate for the risk. Social goals are not necessarily involved: the waiter may return excellent service for no reason other than the expectation of more large tips, and large tips may in turn be continued for no reason other than to assure excellent future service. At the opposite extreme, giving a gift may be a social goal in category (E): the recipient’s happiness may be intrinsically rewarding to the giver. Between these two extremes one can identify at least four other social goals that sometimes underlie gifts: fulfillment of role obligations, adherence to group norms (including reciprocation norms), status aspirations, and, perhaps most important, intrinsic reciprocation goals. Roles that lead to gift obligations are ubiquitous.
The existence of strong norms for reciprocation, and the role of gift-giving in seeking high status, are the two themes of Mauss’s study of Pacific Island and Pacific Northwest cultures [33]. Intrinsically motivated reciprocation is a goal that is induced by a gift or favor from someone else: the reciprocation goal is called intrinsic if the recipient desires to give a gift for a gift, independent of a group or cultural norm to reciprocate, and independent of any expectation of future favors. An example might be leaving a notably large tip in return for excellent restaurant service in a circumstance where one is reasonably sure that there will be no opportunity to return to that restaurant. Moreover, reciprocation may typically involve multiple simultaneous goals: one goal may be intrinsic, but it may coexist with norm-satisfying, status-seeking, role-based, or instrumental goals. A large tip, for example, may be motivated by an instrumental as well as an intrinsic goal. Nonetheless, we think that intrinsic reciprocation is an important motivational phenomenon, possibly the basis for other sorts of reciprocation. Similarly, the intrinsic mechanism can give rise to norms for reciprocation: the original gift-giver believes (on the basis of her own reciprocation goals)
that she has induced a goal of reciprocation, and therefore she also expects reciprocation and thereby helps define a norm.
4 Tradeoffs Involving Social Goals
Tables 1–3 implicitly used a utility-maximization framework, because it was assumed that outcomes can be ordered. To break the dominance of Defect, the outcome rank, or utility, was modified by punishment and/or intrinsic reward. We do not believe that a utility-maximization framework is adequate (even for decisions without social goals); however, it is a useful starting point for many questions concerning social goals. In this section, we first discuss four questions about social goals in the utility framework. We then briefly sketch the reasons for considering decision principles other than utility maximization, and point out how some questions about social goals look from the standpoint of alternative decision principles.
4.1 Four Questions About Social Goals
(i) Tradeoff Between an Economic and a Social Goal
One of the first questions about social goals is how much they are actually valued. People want to be fair, but how much will they pay toward that end? One takes the measure of a social goal in terms of some economic equivalent: the maximum that a person will pay in order to attain it. Or better yet, because even a fairly large monetary payment may represent only a small sacrifice for a wealthy person, the economic equivalent is the marginal utility from money that will be sacrificed in order to attain the goal. People do sometimes make large sacrifices in order to cooperate with others, to share with others, or to benefit others. Some social goals are described as “priceless.” We can view this description with some cynicism, yet it does at least convey the idea that tradeoff questions are not lightly answered.
Cooperation in commons dilemmas is a complicated example: relevant factors are not merely the tradeoff of economic and social goals, but also fear of sanctions, concern for reputation (especially in repeated play), expectations about others’ choices, and recognition of others’ expectations and intelligent reasoning (leading sometimes to Nash equilibrium). Nonetheless, the fact that some social goals weigh heavily is an important element. Although we have little confidence that social goals can be valued accurately by studying their tradeoffs with money, we do think that the question of tradeoff between economic and social goals is extremely important. The broad classification of social goals, given above, is intended to place such tradeoff questions in a general context, by recognizing that a wide variety of social goals may be in play simultaneously.
David H. Krantz et al.
(ii) Tradeoffs Among Different Social Goals

This topic may be as important as, and more complex than, tradeoffs between an economic and a social goal. Part of the complexity arises from possible internal conflict among social goals stemming from different group identities. Books have been written about the conflict between work roles and parenting roles, between romantic love and religious affiliation, and so on. Again, we hope that a broad classification of social goals may facilitate thinking about the ways in which such conflicts are resolved in decision making. The utility perspective suggests a complex multiattribute utility function, describing the contribution of each such goal to overall happiness. Because particular actions are typically directed toward achieving several goals at once, such a multiattribute utility function would have to encompass many combinations of simultaneously achievable social (and economic) goals. Only in the case of additive utility, where each goal contributes separably from all others, can one hope for a manageable utility description.

(iii) Uncertainty and Social Goals

Risk and uncertainty have been studied most often with outcomes defined by money value. There is a narrow but important class of decisions with high uncertainty and with gains and losses having clear money values. Gambling is the clearest example; financial investments and insurance purchases are more complicated, because some of the gains and losses in question lie in the future, so risk becomes mixed with intertemporal tradeoff. For insurance, moreover, the financial gains and losses are sometimes hard to separate mentally from accompanying nonmonetary outcomes: physical suffering, social loss, or destruction of loved property [26]. It seems unfortunate that research has focused so heavily on financial goals, because the attainment of health goals, social goals, research goals, or environmental goals is often uncertain.
Utility has been the underlying theoretical concept used to justify neglect of different goal types: according to the expected-utility principle, the effect of uncertainty for any type of outcome is captured by the formula $p(E) \cdot U(y)$, multiplying the utility $U(y)$ of the outcome in question by the subjective probability, that is, the degree of belief $p(E)$ that the event $E$ (defining the circumstances under which the outcome is obtained) actually occurs. According to this idea, the type of outcome $y$ does not matter; only its utility $U(y)$ enters into the evaluation that takes uncertainty into account. On its face, this is a very strong and scarcely plausible assumption. One might, on the contrary, think that people would show higher tolerance for uncertainty for outcomes that are typically highly uncertain, and less for outcomes that can often be nearly assured. The assumption is so pervasive, however, that it scarcely seems to have been tested empirically, even in the literature on health decisions, where there has been the most systematic use of the utility concept apart from utility of money. Brewer et al. [8] demonstrated that patients who believe that blood cholesterol level is related to coronary
consequences adhere more closely to LDL cholesterol-lowering regimens than those who do not hold this belief. This sort of evidence at least shows a connection between beliefs and decisions outside the monetary domain, although it is far from a test of the strong utility model enunciated above. We note that the most important alternative to expected utility is prospect theory [28,56], which was developed and tested solely in the domain of money gambles. Although it is far from obvious how this theory should be generalized to other types of goals (see [29] for one suggestion), the loss-aversion principle derived from the theory has in fact been applied extensively, albeit metaphorically, to nonfinancial goals. In a more general theory, loss aversion may also vary with the goal domain. We hope that a focus on social goals will lead to more thorough investigation of the effects of uncertainty on different sorts of goals. Indeed, it is hard to know how to approach the earlier questions about tradeoffs between social and economic goals, or tradeoffs among social goals, when most of the goals have some associated degree of uncertainty, without also knowing how the valuation of different sorts of goals is affected by uncertainty.

(iv) Intertemporal Tradeoffs Involving Social Goals

The effects of delay, like those of uncertainty, have been studied extensively in the domain of money. Decision theory often treats delay, like uncertainty, by use of a utility discount factor, that is, through the formula $k(t) \cdot U(y)$, where $1 - k(t)$ is viewed as the discount associated with delaying utility $U(y)$ by duration $t$. Chapman [12] reviews some limitations of this model in the domain of health decisions: in particular, discount rates appear different for monetary and health goals. Redelmeier and Heller [45] report differences in discount rates for different health goals, including cases of negative discount rates.
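Both weighting schemes discussed above treat an outcome only through its utility: $p(E) \cdot U(y)$ under uncertainty and $k(t) \cdot U(y)$ under delay. The sketch below makes that type-independence explicit. The `Outcome` type, the goal labels, and the exponential form chosen for $k(t)$ are illustrative assumptions, not part of the theory as stated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    description: str
    goal_type: str   # e.g. "money", "health", "social"; ignored by both formulas
    utility: float   # U(y)

def expected_utility(p_event: float, outcome: Outcome) -> float:
    """p(E) * U(y): only the outcome's utility enters, never its goal type."""
    return p_event * outcome.utility

def discounted_utility(k_t: float, outcome: Outcome) -> float:
    """k(t) * U(y), where 1 - k(t) is the discount for a delay of t."""
    return k_t * outcome.utility

def exponential_k(t_years: float, annual_factor: float = 0.9) -> float:
    """One common (assumed) form of k(t): a constant factor per year of delay."""
    return annual_factor ** t_years

# Two hypothetical outcomes of different types that happen to have equal utility.
cash = Outcome("win $500", "money", utility=10.0)
wetland = Outcome("restore a local wetland", "social", utility=10.0)

print(expected_utility(0.3, cash), expected_utility(0.3, wetland))
print(discounted_utility(exponential_k(5), cash), discounted_utility(exponential_k(5), wetland))
```

Because `goal_type` never enters either formula, a money outcome and a social outcome with equal utility are evaluated identically under any probability or delay; whether people actually behave this way is the empirical question raised in the text.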
Social goals could be an excellent domain for studying effects of delay, because different social goals carry very different time perspectives. Sharing, whether conversation or sex, often seems urgent, whereas role aspirations (becoming a parent, presidency of an organization) are often long-term goals. Group environmental goals (restoring wetlands, reduction of greenhouse-gas emissions) can carry a very long time perspective. As with uncertainty effects, it may be necessary to understand effects of delay prior to thoroughly understanding tradeoffs between social and economic goals, or among different sorts of social goals. Finally, we note that delay often produces additional uncertainty, so there is an important interaction between these two types of effects.

4.2 Limitations of a Utility Approach

Utility maximization is most useful when the tradeoffs among different sorts of goals are stable. Even within the domain of individual goals, however, there
is strong evidence that tradeoffs among goals are far from stable: the adoption and weighting of particular goals can be context-dependent [50], as can the decision rule governing tradeoffs [29]. Three findings are particularly relevant to these conclusions about tradeoffs and decision rules: intransitivity of pairwise choice, contingent weighting of multiple attributes, and paradoxical effects from adding a dominated alternative. We briefly review these findings, and some of the possible tradeoff mechanisms that underlie them, because similar phenomena and choice mechanisms might play an important role in decision processes involving social goals.

(i) Intransitivity

Systematic intransitivity of individual choice was first identified by Tversky [54]. He predicted this phenomenon from a choice model in which individuals first make comparisons between two alternative options, along each of several relevant attributes, then choose between the options by combining the positive and negative differences obtained from the various comparisons. This model can lead to intransitivities. For example, B may be chosen over C on the basis of an improvement in quality, even though C is a little cheaper. Similarly, A may be chosen over B, on the basis of still higher quality, although B is a little cheaper. However, when A is compared with C, the combined price difference may seem important, and one may choose C, sacrificing quality to price. A recent beautiful study of foraging choices by Canadian gray jays [58] illustrates this type of intransitivity. Each bird tended to choose A, which consisted of a single raisin that was obtainable by going 28 cm down a wire mesh tube, rather than B, a cache of two raisins at the more dangerous-seeming distance of 42 cm. Likewise, each bird mostly chose B, the two raisins at 42 cm, rather than C, three raisins at the still deeper distance 56 cm.
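The gray-jay choices can be reproduced with a small sketch of an additive-difference rule of the kind Tversky proposed: compare the two options attribute by attribute, transform each difference, and choose the option favored by the sum. The difference functions here (linear for raisins, a square-root law weighted by 0.3 for distance) are assumptions chosen only so that the perceived danger difference grows more slowly than the raisin difference; they are not fitted to the birds' data.

```python
import math
from typing import NamedTuple

class Cache(NamedTuple):
    name: str
    raisins: int
    distance_cm: float

def phi_raisins(diff: float) -> float:
    """Perceived advantage in cache size: assumed linear in the difference."""
    return diff

def phi_distance(diff: float, weight: float = 0.3, exponent: float = 0.5) -> float:
    """Perceived advantage of being closer: assumed to grow as |diff| ** exponent."""
    return weight * math.copysign(abs(diff) ** exponent, diff)

def prefers(x: Cache, y: Cache, exponent: float = 0.5) -> bool:
    """Additive-difference rule: sum the attribute-wise comparisons; choose x if positive."""
    score = phi_raisins(x.raisins - y.raisins)
    # A shorter distance is better, so the difference is taken as y minus x.
    score += phi_distance(y.distance_cm - x.distance_cm, exponent=exponent)
    return score > 0

A = Cache("A", raisins=1, distance_cm=28)
B = Cache("B", raisins=2, distance_cm=42)
C = Cache("C", raisins=3, distance_cm=56)

print(prefers(A, B), prefers(B, C), prefers(C, A))  # a cycle under the concave law
```

With `exponent=1.0` both difference laws are linear, the rule reduces to comparing weighted sums, and the cycle disappears; the intransitivity comes entirely from the mismatch between the two difference-growth laws.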
However, in a choice between a single raisin at 28 cm versus three at 56 cm, the difference in raisin cache size tended to match or outweigh the appearance of danger. Four of twelve jays chose C, the three-raisin cache, on a clear majority of trials; the other eight hovered around 50–50 for choices between C and A. A similar intransitivity must occur whenever the perceptual magnitudes of differences along different attributes grow according to distinctly different laws. For the jays, the perceived advantage of 3 rather than 1 raisin is perhaps nearly the sum of the perceived advantages of 3 raisins versus 2 and of 2 versus 1, whereas the perceived growth in danger from 28 to 56 cm is perhaps much less than the sum of the magnitudes for 28 versus 42 and for 42 versus 56 cm. Tversky showed that when three attributes are involved, this choice mechanism, via attribute-differencing perceptions, can lead to perfectly transitive choice only when all the difference-growth laws are linear. Intransitivity is of course inconsistent with utility maximization as the mechanism underlying choice, inasmuch as utilities are ordered transitively. This is true whether the quantity to be maximized is expected utility, or
of any other numerical index (including the value function postulated by prospect theory). An alternative choice process, which would lead to transitive choice, involves comparing some “overall evaluation” or index for each option (integrating across all its relevant attributes) rather than combining within-attribute comparisons. We know, however, that initial within-attribute comparison, of the kind considered by Tversky, is ubiquitous in human multiattribute choice [32,42]. The nonintuitive character of choice by numerical index is well illustrated by the regular controversies over college football polls in the United States: anyone whose intuitive ranking disagrees with the index feels free to criticize the index as arbitrary or biased. Social goals may be particularly difficult to integrate with economic goals. Consider a choice between plan A, which offers a large financial benefit to the decision maker, and plan B, which provides a smaller benefit, adheres more closely to the norms of a group with which the decision maker is strongly affiliated, yields some benefits for other group members, and perhaps thereby increases the chance that the decision maker will gain a leadership role in the group. The within-attribute comparisons (greater financial benefit versus better adherence to group norms) are salient. To suppress these comparisons and instead to judge the “overall integrated utility” of the financial benefit, the deviation from group norms, and so on, may be difficult. These salient within-attribute comparisons will play a major role in preference construction, contrary to utility maximization.

(ii) Contingent Weighting

Even when integration across attributes occurs, the relative importance of a given attribute may depend on details of the choice situation. For example, Tversky et al.
[57] showed that a tradeoff between probability of winning and amount to be won can depend strongly on whether people simply state which lottery they would rather play or state a price for each lottery (with the understanding that the higher-priced lottery will be chosen to be played). When asked to state prices, people place higher weight on the amount to be won. They found similar context-dependence for tradeoff between the amount of a prize and the delay before it can be obtained. The possibilities for contingent weighting when individual and social goals are both in play seem obvious, and potentially enormous. In the field of disaster policy (natural hazard, accident, or crime) the tradeoffs among lives, injuries of various sorts, damage to property, and public and private expenditure on prevention and preparedness are notoriously difficult. The weights applied to money expenditure and to lives obviously depend on whose lives are at stake, but also on whether the choices are made as part of a budget process where alternative expenditures are at stake or in some other context. In this arena it seems important to sort out the effects of contingent weighting of goals versus the effects of goals that are influential but not fully articulated. Similar issues arise with respect to health policy and environmental policy choices.
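Contingent weighting can be sketched as a weighted-additive score whose weights depend on the response mode. The weights, lotteries, and prize normalization below are hypothetical and are not the model or estimates of Tversky et al. [57]; they only show how shifting weight toward the amount to be won, as happens when people state prices, can reverse a preference expressed in direct choice.

```python
from typing import NamedTuple

class Lottery(NamedTuple):
    name: str
    p_win: float   # probability of winning
    prize: float   # amount to be won, in dollars

# Hypothetical attribute weights for two response modes (illustrative, not estimated).
MODE_WEIGHTS = {
    "choice": {"p_win": 0.7, "prize": 0.3},
    "pricing": {"p_win": 0.3, "prize": 0.7},  # pricing shifts weight toward the amount
}

def score(lottery: Lottery, mode: str, max_prize: float) -> float:
    """Weighted-additive score; the prize is rescaled to [0, 1] so the weights are comparable."""
    w = MODE_WEIGHTS[mode]
    return w["p_win"] * lottery.p_win + w["prize"] * (lottery.prize / max_prize)

def preferred(lotteries: list[Lottery], mode: str) -> str:
    """Name of the lottery with the highest score under the given response mode."""
    max_prize = max(lot.prize for lot in lotteries)
    return max(lotteries, key=lambda lot: score(lot, mode, max_prize)).name

safe = Lottery("high-probability", p_win=0.9, prize=10.0)
long_shot = Lottery("high-prize", p_win=0.3, prize=30.0)
for mode in MODE_WEIGHTS:
    print(mode, "->", preferred([safe, long_shot], mode))
```

The attribute values never change; only the weights do, which is the sense in which the tradeoff is contingent on the task rather than fixed by a single utility function.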
(iii) Asymmetric Dominance

The fact that the alternative plans or options are an important determinant of the context, and affect the construction of a preference, is demonstrated most directly when a dominated alternative is inserted as a third option in a choice situation [1,27]. The situation is depicted abstractly in Table 5.

Table 5. Price/quality tradeoff and asymmetric dominance.

Plan/Option   Price   Quality
A0            $22     Fair
A             $20     Fair
B             $35     Good+
B0            $35     Good
If the first three options, A0, A, and B, are the only ones available, then A tends to be chosen rather than B (and A0, which is dominated by A, is rarely or never chosen). However, if only the last three, A, B, and B0, are available, then B tends to be chosen (and B0, dominated by B, is rarely or never chosen). In other words, whether A or B tends to be chosen depends on whether a third option is similar to but dominated by A or is similar to but dominated by B. Once again, this phenomenon has not been explored with a mixture of social and individual goals, but it is a robust finding for all sorts of tradeoffs among individual goals. For example, it is seen in female college students’ selection among descriptions of prospective male blind dates, where the dimensions varied are handsomeness and articulateness [48]. Consider Table 6, in which the basic choice is between plan A, good adherence to a group norm with a payment of $220, and plan B, poor adherence with a payment of $550. Assume that there are no extrinsic rewards or punishments for adhering to the norm or not.

Table 6. Adherence to norm versus payment received.

Plan/Option   Payment Received   Adherence to Norm
A0            $200               Good
A             $250               Good
B             $550               Poor
B0            $550               Poor−
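A toy version of the decoy pattern in Table 5 can be written as a dominance check plus a small bonus for each option in the set that a plan dominates. This bonus rule is a hypothetical illustration, not the mechanism proposed in [1,27]; the base appeal values simply assume that A and B are equally attractive on their own, so any difference in choice comes from the third option.

```python
from typing import Mapping, NamedTuple, Sequence

QUALITY_RANK = {"Fair": 1, "Good": 2, "Good+": 3}  # ordinal scale from Table 5

class Plan(NamedTuple):
    name: str
    price: int    # dollars; lower is better
    quality: str  # label from QUALITY_RANK; higher rank is better

def dominates(x: Plan, y: Plan) -> bool:
    """True if x is no worse than y on both attributes and strictly better on one."""
    qx, qy = QUALITY_RANK[x.quality], QUALITY_RANK[y.quality]
    no_worse = x.price <= y.price and qx >= qy
    strictly_better = x.price < y.price or qx > qy
    return no_worse and strictly_better

def choose(plans: Sequence[Plan], base_appeal: Mapping[str, float],
           decoy_bonus: float = 0.1) -> str:
    """Hypothetical decoy rule: base appeal plus a bonus per option the plan dominates."""
    def appeal(p: Plan) -> float:
        return base_appeal[p.name] + decoy_bonus * sum(dominates(p, q) for q in plans)
    return max(plans, key=appeal).name

A0, A = Plan("A0", 22, "Fair"), Plan("A", 20, "Fair")
B, B0 = Plan("B", 35, "Good+"), Plan("B0", 35, "Good")
BASE = {"A0": 0.0, "A": 1.0, "B": 1.0, "B0": 0.0}  # assumes A and B tie when alone

print(choose([A0, A, B], BASE))
print(choose([A, B, B0], BASE))
```

The same kind of check applied to Table 6, with payment in place of price, is one way to frame the open question of whether norm adherence behaves like an ordinary attribute.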
By analogy with Table 5, adding the option B0, with a slightly more egregious violation of the norm, for no more gain, might increase the chances that people feel virtuous in choosing B, whereas adding A0, which allows A the appearance of a bonus for adhering to the norm, might increase the chances that people choose A. Alternatively, tradeoffs between payment received and adherence to a norm might follow quite different rules, and could even show the opposite of the asymmetric dominance phenomenon. This is just one example of the vast terrae incognitae in the domain of social goals.

4.3 Social Goals in a Broader Decision Theory

Krantz and Kunreuther [29] discuss several alternatives to utility maximization. One is within-attribute comparison, extensively documented by Payne et al. [42] and incorporated into Tversky’s [54] additive-difference model. A second is voting by goals: choosing a plan that accomplishes two different goals, rather than a different plan that accomplishes only one, or the like. This idea was derived from a basis for intransitivity quite different from Tversky’s, the intransitivity of majority vote [4,13]. The third is threshold-setting with respect to important goals: only those plans are considered that meet a threshold (with respect to probability, or ease of achievement) for one or more “essential” goals. In our view, one of the tasks in understanding decision making is to diagnose, if possible, the rule or rules actually used to select one plan rather than another. The coding of behavior should therefore include not only the group affiliations and social goals involved in a decision, but also the decision rules that are in play. The examples of CRED research presented in Sections 5 and 6 represent a bare beginning toward a program of study aimed at understanding social goals in environmental decision making and simultaneously aimed at developing decision theory in ways appropriate to multiple goals.
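The threshold-setting and voting-by-goals rules of Section 4.3 can be combined in a short sketch: first discard plans whose probability of achieving an essential goal falls below its threshold, then pick the surviving plan that accomplishes the most goals. Treating a goal as “accomplished” once its probability reaches 0.5 is an assumption made here, as are the plan names, goals, and numbers.

```python
from typing import Mapping, Optional

def screen_and_vote(
    plans: Mapping[str, Mapping[str, float]],
    essential: Mapping[str, float],
    achieve_at: float = 0.5,
) -> Optional[str]:
    """Threshold-setting followed by voting by goals (a simplified sketch).

    plans:      plan name -> {goal: probability that the plan achieves the goal}
    essential:  goal -> minimum acceptable probability for that goal
    achieve_at: probability at which a goal counts as accomplished (assumed)
    """
    # Step 1: keep only plans that clear every essential-goal threshold.
    survivors = [
        name for name, probs in plans.items()
        if all(probs.get(goal, 0.0) >= minimum for goal, minimum in essential.items())
    ]
    if not survivors:
        return None

    # Step 2: among the survivors, choose the plan that accomplishes the most goals.
    def goals_met(name: str) -> int:
        return sum(p >= achieve_at for p in plans[name].values())

    return max(survivors, key=goals_met)

# Hypothetical plans scored on one economic and two social goals.
PLANS = {
    "P1": {"income": 0.9, "group_norm": 0.2, "others_benefit": 0.3},
    "P2": {"income": 0.6, "group_norm": 0.8, "others_benefit": 0.7},
    "P3": {"income": 0.95, "group_norm": 0.9, "others_benefit": 0.1},
}
print(screen_and_vote(PLANS, {"income": 0.5}))
print(screen_and_vote(PLANS, {"income": 0.7}))
```

Raising the essential income threshold from 0.5 to 0.7 eliminates P2 and changes the choice to P3, illustrating how the decision rule, and not only the goals involved, shapes which plan is selected.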
5 Some Laboratory Studies of Group Decision Making

In this section we briefly describe two studies of group interaction and coordination. The first demonstrates the importance of group affiliation for subsequent cooperation in a social dilemma, and the second strongly suggests that group interactions can alter framing effects by introducing group concerns into consideration. Both of these studies support our argument that social goals can themselves motivate behaviors to reflect group affiliation, identity, or belonging, and can lead to additional reward value from mutual cooperation or reciprocation to others.

5.1 Group Identity in a Cooperative Game

In the first study [2,3], groups of four Columbia undergraduates were initially asked to complete a letter-writing task (which varied in its social setting, as
detailed below). They were paid for this task, then asked whether they wished to place half of their earnings in an “investment cooperative” where the return on investment increased with the total number of investors (0 to 4). In terms of monetary payoffs, the investment cooperative was a game with two types of pure-strategy Nash equilibria: a noncooperative equilibrium, where everyone refuses to invest, and cooperative equilibria, with three investors and one noninvestor. Thus, if one believes that all three others will invest, one has an incentive to refuse, keeping one’s pay and reaping the benefits of the others’ investment; and if one believes that at most one other will invest, one should also refuse to invest. Investment produces a net gain over refusal only if exactly two of the three others choose to invest. This investment decision was always made individually; no interaction was permitted. Groups differed only in the presence and collaboration of others on the unrelated letter-writing task. The four randomly assigned conditions were hypothesized to have varying impact on group affiliation: anonymous (no knowledge of group members); symbol (abstract representation of the group, for example, by a blue star); co-present (group members present in the same room, but no interaction); and collaborative (group members engaged in a prior unrelated collaborative task). In the collaborative condition, the experimenters observed the group interaction during the letter-writing task and coded each individual with respect to his or her observed level of group affiliation, as judged by eye contact, tone, and other indicators. Cooperation rate varied as predicted with condition and group affiliation: 43% for anonymous groups, 63% in the symbol condition, 78% in the co-present condition, and 75% for individuals in the collaborative condition.
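The equilibrium structure of the investment cooperative can be checked by brute force over all 16 invest/refuse profiles. The payoff numbers below are hypothetical, since the study's amounts are not given here; they were chosen only to satisfy the stated properties that an investor's return rises with the number of investors and that investing beats refusing exactly when two of the three others invest.

```python
from itertools import product

# Hypothetical payoffs (not the study's amounts).
# REFUSE_PAYOFF[k]: payoff to a non-investor when k of the other three invest.
REFUSE_PAYOFF = [10, 12, 14, 18]
# INVEST_PAYOFF[n]: payoff to an investor when n players in total invest (n = 1..4).
INVEST_PAYOFF = {1: 4, 2: 8, 3: 16, 4: 17}

def payoff(invests: bool, others_investing: int) -> int:
    """Payoff to one player given their own choice and how many others invest."""
    if invests:
        return INVEST_PAYOFF[others_investing + 1]
    return REFUSE_PAYOFF[others_investing]

def is_nash(profile: tuple[bool, ...]) -> bool:
    """True if no single player gains by switching their own choice."""
    total = sum(profile)
    for choice in profile:
        others = total - int(choice)
        if payoff(not choice, others) > payoff(choice, others):
            return False
    return True

def pure_nash_equilibria(n_players: int = 4) -> list[tuple[bool, ...]]:
    """Enumerate every invest (True) / refuse (False) profile and keep the equilibria."""
    return [p for p in product((False, True), repeat=n_players) if is_nash(p)]

for eq in pure_nash_equilibria():
    print(eq, "investors:", sum(eq))
```

Under these payoffs the enumeration finds the all-refuse profile plus the four profiles with exactly three investors, matching the two types of pure-strategy equilibria described above.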
These results suggest that the level of group contact (either real or abstract) influences the willingness of participants to cooperate in a later task. Within the collaborative condition, 94% of individuals with high-rated group affiliation chose to invest, compared with 52% for those with low-rated group affiliation. Figure 1 shows the fit of a logistic-regression model relating rated group affiliation to cooperation within the collaborative condition. These results are consistent with our view that stronger affiliation leads to stronger social motives and thus greater cooperation. We suggest that social goals, derived from affiliation, can be (but are not always) activated by group activity or awareness of the group, and are present in varying degrees. Varying strengths of affiliation thus lead to different outcomes (in this case, cooperation in a later task) through, we posit, activation of social goals that lead to cooperation. In this specific case, we suggest, and our data support the view, that group activity fosters group affiliation (seen, for example, in references to “my group”), which then encourages a norm of cooperation, even in the absence of further reinforcement of this identity or norm. Interestingly, similar effects can occur without actual group co-presence or interaction (symbol condition). This is of course seen in real life in in-group effects involving people who share the same flag, the same religion, and so on.
Fig. 1. Predicted and observed cooperation.
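A minimal sketch of the kind of logistic-regression fit reported in Figure 1, using plain batch gradient ascent on the log-likelihood, is shown below. The 1 to 5 rating scale and the ten observations are invented for illustration and are not the study's data; the sketch only shows how a rating is mapped to a predicted probability of investing.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(invest | rating x) = sigmoid(b0 + b1 * x) by batch gradient ascent
    on the average log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)  # one observation's gradient contribution
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical affiliation ratings (1 = low, 5 = high) and invest (1) / refuse (0) choices.
ratings = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
invested = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(ratings, invested)
for r in range(1, 6):
    print(r, round(sigmoid(b0 + b1 * r), 2))
```

A positive slope `b1` corresponds to the pattern described in the text: the higher the rated affiliation, the higher the predicted probability of cooperating.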
In addition, subjects who were the lone defector in a group with three investors experienced significantly lower satisfaction with their outcome than their fellow group members or other defectors (despite their higher monetary payoffs), and were highly likely to cooperate on a second cooperative task, even without further interaction with group members. This suggests that there is a psychological cost to breaking norms of behavior, even if these norms are only implicit. Anecdotal evidence from other studies also suggests that defectors often experience remorse, even with optimal economic gains. Finally, we also found that condition, and thus level of affiliation with a group, influenced how the dilemma presented in the first decision task was framed. Subjects in the co-present and collaborative conditions were more likely to spontaneously frame the decision as leading to a gain, whereas subjects in a more individual-based setting were more likely to frame the decision as one leading to a potential loss. This hints at the underlying process by which affiliation might be affecting the final decision to cooperate or defect in the dilemma. However, does this difference reveal the underlying process when a subject expresses both views? To answer that question, we controlled for mentions of both the loss and gain frames and found that subjects in the group-based situations were more likely to mention only the gain frame, as compared to subjects in a more individual-based situation, who were more likely to mention
only the loss frame. Spontaneously framing the decision as a gain led to greater cooperation, whereas spontaneously framing the decision as a loss led to greater defection by the subjects. It bears pointing out that cooperation yields greater gains for all, whereas defection results in greater gains for the defecting individual at the cost of the group. Other laboratory studies have also shown the importance of group affiliation for cooperation [9,14]. These studies suggest that there is a connection between affiliation as a group member and behavior in a task that includes cooperation as an option. Here, we explain our results by hypothesizing that activation of group affiliation leads to activation of social goals, in this case of cooperation. This model is not binary, but depends on the degree of perceived affiliation, which can be measured through survey or observational data.

5.2 Framing Effects in a Group Setting

In the second study [36], three-person groups completed several decision tasks similar to ones in which strong framing effects have previously been found for individuals working alone. We focus here on two of these decision problems. In one, modeled on the classic demonstration of framing [55], the choice lay between risky and riskless plans to mitigate the effects of a likely disease outbreak, with the consequences of each plan framed either as severe infection or as protection against infection. The second problem probed the intertemporal discount rate, involving either the delay or acceleration of receipt of prize money. For each decision problem, half of the groups first considered the problem as individuals (all with the same framing), making their own decisions, and discussed the problem later (still with the same frame) to reach group consensus. The other half first encountered that problem as a group.
In this study each set of three subjects was recruited from an existing group (student groups and office groups were solicited) so that the participants had ongoing relationships. The data include the decisions reached, the justifications reported, background data about the individuals’ relationships to their group, and videotapes of the group interactions. The disease problem used a scenario concerning West Nile Virus (a real threat in the region). The individual decisions showed the standard framing effect: the risky mitigation plan was more popular among participants in the loss frame, but selection of the risky option was markedly reduced with gain framing. The choices in this problem were closely related to references to certainty made in justification. For individual decisions, the correlation between choice and certainty justification was very strong: risky choice in the loss frame was justified by avoiding the sure loss, and risk-averse choice in the gain frame was justified by the certain gain. For groups, reference to societal goals (as well as frame) was associated with selection of the risky option. To set up the other decision problem, prize money was made available to one of the participating groups (randomly preselected before the experiment began). Each group was asked to make intertemporal tradeoff decisions: they
either decided how much smaller a prize they would accept if its receipt were accelerated (three months earlier), or how much larger a prize they would demand if its receipt were delayed (three months later). All participants were told that the group decision would be binding, should their group turn out to be the one randomly preselected to have the decision count. The discount factor was calculated using the formula $d = (x_1/x_2)^{1/(t_2 - t_1)}$, where $x_1$ denotes the amount received today ($t_1$) and $x_2$ is the amount seen as equivalent three months from now ($t_2$) [44]. Smaller discount factors signify greater discounting; a discount factor of 1 means no discounting. Previous work on asymmetric discounting has shown that individuals tend to have lower discount factors in delay conditions compared to individuals contemplating acceleration of consumption [30]. Contrary to previous findings, this study did not find a significant framing effect for individual or group discount rates. However, there was a difference between groups as a function of previous exposure to the decision as individuals. For groups without previous exposure, there was a significant effect of frame, although in the direction opposite to that predicted by prior research [59]. In the delay frame, these groups showed less discounting than groups with prior exposure who were in the accelerate frame. Participants’ justifications were coded as to whether they favored delayed receipt of the prize. The delay frame yielded marginally more patient reasons than the accelerate frame. These findings suggest that a different process may be at work when people consider an intertemporal choice with a collective outcome (the decision that counts will be made as a group and will affect the entire group) than when people consider only their own outcome. We are in the process of analyzing the group discussion transcripts to determine whether there is other evidence supporting this hypothesis.
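The discount-factor calculation can be sketched directly. The function below uses the exponent $1/(t_2 - t_1)$, so that a larger later amount yields $d < 1$, as the interpretation "smaller factors signify greater discounting" requires. The dollar amounts are hypothetical, and time is measured in months.

```python
def discount_factor(x1: float, t1: float, x2: float, t2: float) -> float:
    """Per-period discount factor implied by judging x1 at time t1 equivalent to x2 at t2.

    d = (x1 / x2) ** (1 / (t2 - t1)); d < 1 means the later amount must be larger
    (discounting), and d == 1 means no discounting.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    if x1 <= 0 or x2 <= 0:
        raise ValueError("amounts must be positive")
    return (x1 / x2) ** (1.0 / (t2 - t1))

# Hypothetical responses (not data from the study); t1 = 0 is today, t2 = 3 months.
d_delay = discount_factor(x1=100.0, t1=0, x2=120.0, t2=3)  # demands $120 to wait
d_accel = discount_factor(x1=95.0, t1=0, x2=110.0, t2=3)   # accepts $95 to receive it now
print(round(d_delay, 4), round(d_accel, 4))
```

With monthly time units, `d ** 12` gives the implied annual factor. In this made-up example the delay response yields the lower factor, the pattern that prior work on asymmetric discounting would predict.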
Obviously, the results reported here barely scratch the surface of what can be done in understanding collective decisions. The observed connection between social goals and selection of the risky option (affording an opportunity to protect everyone), in the disease mitigation problem, and the observed reversal of the usual framing effect for temporal discounting with a collective outcome each suggest the importance of group processes, and the analysis of justifications offers a picture in which societal goals or group-outcome goals have a strong influence on the decision process.
6 Coding Field Observations of Group Decision Making

For many observers of environmental decision making, the importance of group processes and social goals is obvious and has been taken for granted. Yet there has been a disciplinary gulf that has profoundly affected scientific analysis of group processes. For anthropologists and political scientists, social norms and group processes are fundamental to their disciplines, and have been emphasized in the
analysis of cooperation [34,37,38]. For psychologists and microeconomic theorists, on the other hand, individuals have been the focus of theory. The most important recent work in social psychology has fallen under the heading of social cognition: its focus has been the individual’s perception of the social world, rather than social motives. In applied physical science and engineering, technical information has been the focus; the fact that such information often is not attended to or not understood is a source of frustration, but is not seen as a reason for modifying prescriptive recommendations. From the perspective of mathematical analysis, much of what people do seems irrational, “myopic,” or “politically motivated.” Such behavior leads to questions about how to improve rationality, or how to communicate better, but not questions about the prescriptive analysis itself. Of course, there have been important exceptions: in agricultural economics, especially, bridges have been built across the disciplinary gulf. We return to the issue of prescriptive analysis in Section 7. Our own thinking has been strongly influenced by individually oriented behavioral decision theory (especially the work of Daniel Kahneman, Paul Slovic, and Amos Tversky) but also by field observations of group decision making. Suarez et al. [51] strongly emphasize the importance of communicating uncertain information (in probabilistic format) in community settings. A discussion of integrated-systems modeling [23] concludes with a remarkable statement:

“However, while these models have a considerable degree of utility in their own right, it is argued that the dialogue and learning they generate among the disparate players is equally important to effective applications of climate forecasting. The modeling provides the means to build the trust and effective relationships needed among the disparate players.”
Phillips and Orlove [43] found that Ugandan farmers place high importance on collective discussion in shaping the use of forecasts. They gather spontaneously in public places to converse about forecasts and their possible uses, and sometimes form “listening groups” that assemble to hear and discuss radio programs that present forecasts. Tronstad et al. [52], working in a different culture on quite different issues, noted a similar phenomenon:

“We attribute most of this improvement in comfort level [for their mobile computer lab outreach] to individuals of a more familiar and homogeneous background being in the same room.”

Such field observations make clear that group processes are critical in the communication and use of scientific information, but also suggest how little we understand why this is so. Are the groups that come together to understand scientific information reaching some consensus decision that sets a norm for the individuals? Are they helping one another understand technical material? Are they creating a counterweight, out of numbers, to the prestige of the
Values and Goals in Environmental Decision Making
scientific messengers? Do they simply feel better due to the presence of others [47]? Or is learning facilitated by the mere presence of others [60]? Most of the authors responsible for these field observations come from backgrounds in mathematical modeling (of climate, agriculture, or economic behavior). They observed and reported the importance of social settings and relationships, but did not turn their observations into questions for scientific investigation. It is natural, however, for collaborators with backgrounds in anthropology or psychology to recognize that these phenomena could be explained in many ways and deserve closer investigation. Several of our projects at CRED are devoted to collecting systematic data in field settings in order to understand such phenomena better. More generally, we use field settings both as test sites for theories about social processes and (especially insofar as our theories prove inadequate) as sources of new theoretical ideas, which in turn might be tested in laboratory experiments. We briefly describe three examples here.
(i) Listening Groups in Ugandan Agricultural Communities
This project stemmed from the observations of Phillips and Orlove, noted above. The listening groups are an interesting example of the role of social goals in individual decision making. Decisions about what sort of seed to plant, how much of each, and when to plant each crop can all be influenced by a seasonal climate forecast, because the best choices depend on the onset, continuity, and strength of rain in the rainy seasons. However, each household has its own land, and farming decisions are at least partly made at the household level. Many people have radios and, in principle, could listen to broadcasts of climate forecasts within the household and make whatever decisions seem appropriate. In any case, one would not expect uniformity, because the amount of land, available labor, and other factors vary from one household to another.
It is not surprising that people look to one another for confirmation of their thinking, but the semiformal structure in which people gather to listen to and discuss the forecast seems to require explanation. In current work by Ben Orlove and Carla Roncoli, such meetings are observed. The observations include videotape of the meeting itself, sociolinguistic analysis (by national collaborators) of verbatim transcripts of the discussion, and, finally, interviews with some participants after the meeting. We are not yet able to give detailed results and conclusions, but several preliminary observations can be mentioned. First, the participants have multiple group identities: a person is associated with a particular household, with the farmer group itself (often a group that meets from time to time for various kinds of discussion), and with subgroups and superordinate groups. The presence of visitors particularly evokes superordinate group identities. The village head may decide to attend just because there are outsiders in the village, and the participants’ village identity is thereby invoked. The visitors themselves have relationships with the group, thereby constituting
David H. Krantz et al.
transient but important superordinate identities. For example, most of the population in the research area is Christian, with some Muslims. In one community, the team included the local extension agent, who was known to many of the participants and who happened to be Muslim. The meeting was opened with the Lord’s Prayer, but then the group leader wanted to be sure that the Muslim extension agent felt included in the group. Second, much of the discussion is directed toward establishing a consensus or shared understanding of the farming situation, including an agreed interpretation of the forecast, agreed strategies about when to plant various crops, and sometimes norms concerning the amount of effort to be devoted to agricultural work. The latter norms are specifically directed against free-riding. Villagers are concerned about members of a household (mostly men) who do not contribute enough labor to the household and thus free-ride on other members’ efforts. Other concerns are related to these. One is the positive, or promotion, focus on opportunities for cooperation: if everyone plants the right crop at the right time, the village might attract larger-scale traders, or more traders, and everyone would get a better price for their crops (especially important for villages farther from paved roads). Another is the cooperative effect of everyone working with more effort (the key concept of amaanyi) when they are more certain of the forecast interpretation and when there is agreement within the village. A third observation about the influence of social goals on discussion is that these meetings also serve as an opportunity to review collective goals about nonagricultural issues. Village residents can discuss government programs, market conditions, and regional politics. They can integrate agricultural decision making for the coming season with planning at longer time scales.
(ii) Water Allocation in Northeast Brazil
The Brazilian state of Ceara is subject to frequent droughts and has a system of water reservoirs that are managed to provide irrigation for local agriculture and, in the future, may provide transfers of water to the large urban area of Fortaleza. Water release decisions are made by Water Allocation Committees, a form of participatory democracy sponsored by the Ceara government but controlled to some extent by experts who lay out the set of alternatives to be considered. In collaboration with the International Research Institute for Climate and Society, we at CRED have been studying the decision processes of Water Allocation Committees. Observations include videotapes of the meetings themselves and interviews with some of the participants. As with the Uganda meetings, data analysis has not progressed to the point where detailed conclusions are possible. Two features stand out, however. First, much negotiation takes place outside the formal meeting, so interviews are an extremely important supplement to the videotapes. Second, one observes (here, as in other contexts) strategic use of the uncertainty of forecasts: uncertainty is used or ignored in a position statement in
order to justify a position favorable to the individual or group putting forth the argument. The latter observation has led us back to the laboratory, to observe the degree to which uncertainty changes people’s tradeoffs between their own and others’ outcomes. This research is being conducted by Miguel Fonseca in the CRED laboratory. Underlying the research, of course, is the existence of social goals: people do value others’ outcomes (e.g., those of other farmers or of urbanites in Fortaleza); the question is how much.
(iii) Risk Sharing in East Africa
There are many situations in which strategies are available that transfer risk (often from a buyer to a seller) to the mutual long-term benefit of both parties. A familiar example in industrialized countries is protection against “lemons” by an enforceable guarantee of quality, for which a premium price is paid. If the buyer of a defective item might suffer a serious loss, then she will be willing to pay a premium for an inspected or guaranteed product. The seller thereby assumes part of the risk: he cannot sell a defective item and sustains a loss (but a less serious one than the buyer’s). Such losses are more than compensated, in the long run, by the higher price he receives for the items he does sell. Third-party warranties or insurance can sometimes provide similar benefits when losses are large compared with the wealth of the participants: risk is shared among many purchasers. In a new setting, the introduction of such financial mechanisms can encounter two different sets of problems. One is “fit” with local culture. Transactions may be governed by strong norms, and the real payoffs may therefore involve social as well as economic goals. The second problem is the “long-run” nature of the benefit. On a particular occasion, it may be the loss that is salient, so acceptance of risk may be placed in doubt. There is a large literature showing suboptimal behavior and little learning in stochastic environments.
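The long-run arithmetic of the guarantee example can be made concrete. The following is a minimal sketch under stated assumptions: the defect rate, item value, buyer damage, seller cost, and premium price are all hypothetical numbers chosen for illustration, and both parties are treated as risk neutral. It shows the seller earning more per item produced under the guarantee even though individual items still produce salient losses.

```python
import random

# Hypothetical parameters chosen for illustration; none come from the chapter.
P_DEFECT = 0.1         # probability that an item is defective
VALUE_IF_GOOD = 100.0  # buyer's value for a good item
DAMAGE_IF_BAD = 200.0  # buyer's additional loss from using a defective item
UNIT_COST = 40.0       # seller's cost of producing one item


def buyer_max_price_unguaranteed() -> float:
    """Most a risk-neutral buyer would pay when quality cannot be checked."""
    return (1 - P_DEFECT) * VALUE_IF_GOOD - P_DEFECT * DAMAGE_IF_BAD


def seller_profit_unguaranteed(price: float) -> float:
    """Expected profit per item produced: every item is sold, defective or not."""
    return price - UNIT_COST


def seller_profit_guaranteed(price: float) -> float:
    """Expected profit per item produced when defective items are caught and
    scrapped, so the seller absorbs UNIT_COST on each of them."""
    return (1 - P_DEFECT) * price - UNIT_COST


def simulate_guaranteed(price: float, n_items: int, seed: int = 0) -> list[float]:
    """Per-item outcomes for the seller under the guarantee: an occasional
    loss of UNIT_COST, otherwise a margin of price - UNIT_COST."""
    rng = random.Random(seed)
    return [-UNIT_COST if rng.random() < P_DEFECT else price - UNIT_COST
            for _ in range(n_items)]


if __name__ == "__main__":
    p0 = buyer_max_price_unguaranteed()  # 70 with these numbers
    p1 = 95.0                            # hypothetical premium price
    print(f"no guarantee: price {p0:.1f}, seller {seller_profit_unguaranteed(p0):.1f} per item")
    print(f"guarantee:    price {p1:.1f}, seller {seller_profit_guaranteed(p1):.1f} per item, "
          f"buyer surplus {VALUE_IF_GOOD - p1:.1f}")
    outcomes = simulate_guaranteed(p1, n_items=20, seed=1)
    print(f"losses on {sum(o < 0 for o in outcomes)} of 20 items; net {sum(outcomes):.1f}")
```

With these numbers both parties gain from the guarantee in expectation, yet the simulated sequence still contains individual losses; it is those occasions, rather than the average, that the text suggests may dominate acceptance.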
We are conducting two types of studies relevant to risk sharing in East Africa. One involves onsite studies of the understanding and acceptance of risk-sharing contracts. This involves both “fit” and acceptance of risk. The other research consists of laboratory studies of learning in stochastic environments. Under what circumstances do people learn to think about long-run outcomes rather than immediate losses? Results on both fronts, although very preliminary, are highly encouraging. In collaboration with the International Research Institute for Climate and Society, we have been studying the acceptance of insurance contracts by farmers in Malawi. In the laboratory, we have developed protocols that lead to excellent learning. It remains to be seen to what extent such learning transfers from one type of stochastic setting to another.
7 Implications for Decision Analysis
Incorporating social goals into a utility analysis would be a daunting task. For any one affiliation, there is a considerable list of possible social goals (Table 4), and this is compounded by multiple affiliations. The task is made impossible by the formation of transient affiliations, accompanied by transient social goals. The analysis of Uganda listening groups offered one set of examples: transient affiliation with visitors. Other examples are seen in routine politenesses reciprocated between strangers who are placed together for a few hours (say, in an airplane) or even for a few seconds (say, going through a doorway nearly simultaneously). Fortunately, this sort of utility complexity is already ruled out by the phenomena of individual choice with nonsocial goals, as reviewed in Section 4. The advice for prescriptive analysis remains what it always has been in decision analysis: start by determining the decision maker’s actual goals and the contingencies under which each will be attained, given a particular plan or strategy. A simplified analysis might weight goal values by their probabilities of attainment and sum them. This looks like expected-utility theory, but it is not: it accepts the context-dependence of goals and of the weights placed on them. Decision makers sometimes wish to attain near-guarantees for certain important goals. Often this is possible, if the decision maker is prepared to commit very extensive resources to achieving those goals. It is then important to look carefully at worst-case scenarios, or at least very unfavorable scenarios. Would the decision maker really be willing, if it turned out to be needed, to devote such extensive resources to the given goals? If not, then the proposed decision rule should be revised. This advice of course has nothing directly to do with social goals as such, but it often comes up in connection with social goals that are viewed as extremely important.
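The simplified analysis and the worst-case check described above can be sketched together. This is an illustrative sketch, not the authors’ procedure: the plan names, goal names, attainment probabilities, weights, and resource figures are hypothetical. Weights are passed in per decision context rather than stored as fixed utilities, and a plan whose unfavorable-scenario cost exceeds what the decision maker would actually commit is excluded before scoring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    name: str
    attain_prob: dict[str, float]  # P(goal attained | plan); hypothetical estimates
    worst_case_cost: float         # resources needed in an unfavorable scenario


def weighted_goal_score(plan: Plan, goal_weights: dict[str, float]) -> float:
    """Sum over goals of weight * probability of attainment.
    The weights are elicited for the current decision context; they are not
    assumed to be fixed utilities that carry over to other contexts."""
    return sum(w * plan.attain_prob.get(goal, 0.0) for goal, w in goal_weights.items())


def choose(plans: list[Plan], goal_weights: dict[str, float],
           max_commitment: float) -> Optional[Plan]:
    """Best-scoring plan among those whose worst case the decision maker
    would actually be willing to fund."""
    feasible = [p for p in plans if p.worst_case_cost <= max_commitment]
    if not feasible:
        return None
    return max(feasible, key=lambda p: weighted_goal_score(p, goal_weights))


if __name__ == "__main__":
    plans = [
        Plan("secure_harvest", {"feed_household": 0.95, "help_neighbors": 0.30}, 50.0),
        Plan("village_coordination", {"feed_household": 0.80, "help_neighbors": 0.90}, 120.0),
    ]
    weights = {"feed_household": 3.0, "help_neighbors": 1.0}
    for p in plans:
        print(p.name, round(weighted_goal_score(p, weights), 2))
    print(choose(plans, weights, max_commitment=200.0).name)
    print(choose(plans, weights, max_commitment=100.0).name)
```

Here the higher-scoring plan is dropped once the decision maker admits that its worst case would require more resources than he or she would really commit, which is the kind of revision of the decision rule the text recommends.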
Particularly for environmental decision making, attention has to be paid to the values placed on environmental goals per se and on cooperation per se. Sanctions for noncooperators may be needed, but they will work only if most parties cooperate for intrinsic reasons.
Summary
We define a social goal as either a goal to affiliate (with any size or type of group, temporary or long-term) or a goal that arises from a group affiliation. Several different broad categories of goals derive from group affiliation. Some are linked to affiliation per se, including the goals of adhering to group norms, enhancing group reputation, and simply sharing success with others in the group. Social goals also include aspirations for particular roles within a group, and they include consequences of attaining special group roles, especially role obligations such as norm-setting or norm-enforcement. In addition, affiliation often leads to goals concerning consequences for other people, for example, in-group nepotism or discrimination against out-group members.
Many environmental decision situations nominally appear to be commons dilemmas, in which the dominant strategy for each player is noncooperation, even though all would be better off with large-scale cooperation. These decision situations might be analyzed differently if social goals were taken into account. Goals of social norm adherence and reciprocity can lead to punishment of norm violators and to intrinsic rewards for group success. For example, the efforts in recent years by European countries to reduce greenhouse gas emissions may in part reflect a “European environmental identity” for decision makers and their constituents, which combines a continent-wide social identity with environmental values and leads individuals to act cooperatively despite noncooperation by major greenhouse gas emitters such as the United States and China. The analysis of decision situations should also take into account the complex social identity of decision makers, including multiple simultaneous affiliations, multiple role aspirations, role-derived obligations, and multiple reciprocity goals. The literature on constructed choice makes it very doubtful that all of these multiple goals can be modeled by utility theory (see also [31]). The choice of plans available, as well as other contextual features of the decision situation, determines what goals are activated and how they are weighted. In this context, heretofore unanswered questions come to the fore: How are social goals traded off with economic and other social goals, and how are their relative weights affected by uncertainty and time delays? At CRED, our laboratory experiments and field experiences suggest that group interactions are influenced by these social goals and group identities, as in the social dilemma described above, in which cooperation can be increased by an arbitrary group symbol, and in the study of the effects of groups on decision framing.
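The claim that social goals can change the structure of a commons dilemma can be checked on a toy game. The sketch below assumes a hypothetical linear public-goods game (4 players, endowment 10, pot multiplied by 1.6 and shared equally) and models the social goal, purely as an assumption, as a fixed intrinsic value for adhering to a cooperative norm. With material payoffs alone, free-riding is dominant; once the norm value exceeds the private cost of contributing, 10 × (1 − 1.6/4) = 6, cooperation becomes dominant.

```python
import itertools

# Hypothetical linear public-goods game; parameters are illustrative only.
N = 4             # players
ENDOWMENT = 10.0  # each player contributes all of it or nothing
MULTIPLIER = 1.6  # 1 < MULTIPLIER < N, so the game is a social dilemma


def material_payoff(my_contrib: float, others: tuple[float, ...]) -> float:
    """Keep what you did not contribute, plus an equal share of the multiplied pot."""
    pot = my_contrib + sum(others)
    return ENDOWMENT - my_contrib + MULTIPLIER * pot / N


def payoff_with_social_goal(my_contrib: float, others: tuple[float, ...],
                            norm_value: float) -> float:
    """Adds a fixed intrinsic value for adhering to a cooperative norm."""
    bonus = norm_value if my_contrib == ENDOWMENT else 0.0
    return material_payoff(my_contrib, others) + bonus


def cooperation_dominant(norm_value: float) -> bool:
    """True if contributing strictly beats free-riding whatever the others do."""
    for others in itertools.product([0.0, ENDOWMENT], repeat=N - 1):
        if (payoff_with_social_goal(ENDOWMENT, others, norm_value)
                <= payoff_with_social_goal(0.0, others, norm_value)):
            return False
    return True


if __name__ == "__main__":
    print("all cooperate:", material_payoff(ENDOWMENT, (ENDOWMENT,) * (N - 1)))
    print("all defect:   ", material_payoff(0.0, (0.0,) * (N - 1)))
    for s in (0.0, 4.0, 6.5):
        print(f"norm value {s}: cooperation dominant? {cooperation_dominant(s)}")
```

Everyone is materially better off if all contribute (16 versus 10 each), yet each player gains by defecting unilaterally; only the added social-goal term removes that incentive.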
Our field studies in Uganda and Brazil also highlight the importance of social goals for real-world meetings and decision processes, suggesting that goals of consensus and agreement can drive meetings and discussions about climate forecasts. In continuing this research, we hope to elucidate in greater detail the important role of social goals in environmental decisions.
Acknowledgments Research supported by NSF grant SES-0345840. Many thanks to Elke Weber, Carla Roncoli, and the CRED team for their assistance with this chapter.
References
1. D. Ariely and T. S. Wallsten. Seeking subjective dominance in multidimensional space: An explanation of the asymmetric dominance effect. Organizational Behavior and Human Decision Processes, 63:223–232, 1995.
2. P. Arora and D. H. Krantz. To cooperate or not to cooperate: Impact of unrelated collaboration on social dilemmas. Poster, Society for Judgment and Decision Making, Toronto, 2005.
3. P. Arora, N. Peterson, and D. H. Krantz. Group affiliation and the intrinsic rewards of cooperation. Technical report, Department of Psychology, Columbia University, New York, 2006.
4. K. J. Arrow. Social choice and individual values. Monograph 12, Yale University: Cowles Foundation for Research in Economics, 1951.
5. R. Axelrod and W. D. Hamilton. The evolution of cooperation. Science, 211:1390–1396, 1981.
6. J. Bendor. Uncertainty and the evolution of cooperation. Journal of Conflict Resolution, 37:709–733, 1993.
7. M. B. Brewer. The many faces of social identity: Implications for political psychology. Political Psychology, 22:115–125, 2001.
8. N. T. Brewer, G. B. Chapman, S. Brownlee, and E. A. Leventhal. Cholesterol control, medication adherence and illness cognition. British Journal of Health Psychology, 7:433–448, 2002.
9. M. B. Brewer and R. Kramer. Choice behavior in social dilemmas: Effects of social identity, group size, and decision framing. Journal of Personality and Social Psychology, 50:543–549, 1986.
10. J. S. Bruner. The act of discovery. Harvard Educational Review, 31:21–32, 1961.
11. C. F. Camerer and E. Fehr. When does “economic man” dominate social behaviour? Science, 311:47–52, 2006.
12. G. B. Chapman. Time discounting of health outcomes. In G. Loewenstein, D. Read, and R. Baumeister, editors, Time and Decision: Economic and Psychological Perspectives on Intertemporal Choice, pages 395–417. Russell Sage Foundation, New York, 2003.
13. M. D. Condorcet. Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Paris, 1785.
14. R. M. Dawes and D. M. Messick. Social dilemmas. International Journal of Psychology, 35:111–116, 2000.
15. E. L. Deci and A. C. Moller. The concept of competence: A starting place for understanding intrinsic motivation and self-determined extrinsic motivation. In A. J. Elliot and C. S. Dweck, editors, Handbook of Competence and Motivation, pages 579–597. Guilford, New York, 2005.
16. M. Douglas. No free gifts: Introduction to Mauss’s essay on The Gift. In Risk and Blame: Essays in Cultural Theory. Routledge, London, 1992.
17. C. Feh. Alliances and successes in Camargue stallions. Animal Behavior, 57:705–713, 1999.
18. E. Fehr and U. Fischbacher. Third-party punishment and social norms. Evolution and Human Behavior, 25:63–87, 2004.
19. L. Festinger, S. Schachter, and K. Back. Social Pressure in Informal Groups. Harper and Row, New York, 1950.
20. P. C. Fishburn. A study of independence in multivariate utility theory. Econometrica, 37:107–121, 1969.
21. A. P. Fiske. Five core motives, plus or minus five. In S. J. Spencer, S. Fein, M. P. Hanna, and J. M. Olson, editors, Motivated Social Perception, pages 233–246. Lawrence Erlbaum, Mahwah, NJ, 2003.
22. W. Güth. On ultimatum bargaining experiments: A personal review. Journal of Economic Behavior and Organization, 27:329–344, 1995.
23. G. L. Hammer and H. Meinke. Linking climate, agriculture, and decision making: Experiences and lessons for improving applications of climate forecasts in agriculture. In N. Nichols, G. L. Hammer, and C. Mitchell, editors, Applications of Seasonal Climate Forecasting in Agricultural and Natural Ecosystems. Kluwer Academic Press, The Netherlands, 2000.
24. G. Hardin. The tragedy of the commons. Science, 162:1243–1248, 1968.
25. E. T. Higgins. Beyond pleasure and pain. American Psychologist, 52:1280–1300, 1997.
26. C. K. Hsee and H. Kunreuther. The affection effect in insurance decisions. Journal of Risk and Uncertainty, 20:149–159, 2000.
27. J. Huber, J. W. Payne, and C. Puto. Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98, 1982.
28. D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47:263–291, 1979.
29. D. H. Krantz and H. Kunreuther. Goals and plans in protective decision making. Working paper Wharton Risk Center WP # 06-18, University of Pennsylvania, 2006. http://opim.wharton.upenn.edu/risk/library/06-18.pdf.
30. G. Loewenstein. Frames of mind in intertemporal choice. Management Science, 34:200–214, 1988.
31. J. M. Weber, S. Kopelman, and D. M. Messick. A conceptual review of decision making in social dilemmas: Applying a logic of appropriateness. Personality and Social Psychology Review, 8:281–307, 2004.
32. A. B. Markman and C. P. Moreau. Analogy and analogical comparison in choice. In D. Gentner, K. J. Holyoak, and B. N. Kokinov, editors, The Analogical Mind: Perspectives from Cognitive Science, pages 363–399. The MIT Press, Cambridge, MA, 2001.
33. M. Mauss. The Gift: The Form and Reason for Exchange in Archaic Societies. Routledge & Kegan Paul, New York, 1990.
34. B. McCay. ITQs and community: An essay on environmental governance. Agricultural and Resource Economics Review, 33:162–170, 2004.
35. R. F. Meyer. Preferences over time. In R. L. Keeney and H. Raiffa, editors, Decisions with Multiple Objectives: Preferences and Value Tradeoffs, Chapter 9. Wiley, New York, 1976.
36. K. F. Milch. Framing effects in group decisions revisited: The relationship between reasons and choice. Master’s thesis, Department of Psychology, Columbia University, New York, 2006.
37. B. Orlove. Lines in the Water: Nature and Culture at Lake Titicaca. University of California Press, Berkeley, 2002.
38. E. Ostrom. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press, New York, 1990.
39. E. Ostrom. Collective action and the evolution of social norms. The Journal of Economic Perspectives, 14:137–158, 2000.
40. E. Ostrom, T. Dietz, N. Dolsak, P. Stern, S. Stonich, and E. Weber, editors. The Drama of the Commons. National Research Council, National Academy Press, Washington, DC, 2002.
41. T. R. Palfrey and H. Rosenthal. Private incentives in social dilemmas: The effects of incomplete information about altruism. Journal of Public Economics, 35:309–332, 1988.
42. J. W. Payne, J. R. Bettman, and E. J. Johnson. The Adaptive Decision Maker. Cambridge University Press, New York, 1993.
43. J. Phillips and B. Orlove. Improving climate forecast communication for farm management in Uganda. Final report, NOAA Office of Global Programs, 2004.
44. D. Read. Is time-discounting hyperbolic or sub-additive? Journal of Risk and Uncertainty, 23:5–32, 2001.
45. D. A. Redelmeier and D. N. Heller. Time preference in medical decision making and cost-effectiveness analysis. Medical Decision Making, 13:212–217, 1993.
46. G. Samuelsson and O. Dehlin. Family network and mortality: Survival chances through the lifespan of an entire age cohort. International Journal of Aging and Human Development, 37:277–295, 1993.
47. S. Schachter. The Psychology of Affiliation: Experimental Studies of the Sources of Gregariousness. Stanford University Press, Stanford, CA, 1959.
48. C. Sedikides, D. Ariely, and N. Olsen. Contextual and procedural determinants of partner selection: Of asymmetric dominance and prominence. Social Cognition, 17:118–139, 1999.
49. E. Shafir. Uncertainty and the difficulty of thinking through disjunctions. Cognition, 50:403–430, 1994.
50. P. Slovic. The construction of preferences. American Psychologist, 50:364–371, 1995.
51. P. Suarez, A. Patt, and Potsdam Institute for Climate Impact Research. The risks of climate forecast application. Risk, Decision and Policy, 9:75–89, 2004.
52. R. Tronstad, T. Teegerstrom, and D. E. Osgood. The role of technology for reaching underserved audiences. American Journal of Agricultural Economics, 86:767–771, 2004.
53. J. C. Turner, R. J. Brown, and H. Tajfel. Social comparison and group interest in ingroup favoritism. European Journal of Social Psychology, 9:187–204, 1979.
54. A. Tversky. Intransitivity of preferences. Psychological Review, 76:31–48, 1969.
55. A. Tversky and D. Kahneman. The framing of decisions and the psychology of choice. Science, 211:453–458, 1981.
56. A. Tversky and D. Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5:297–323, 1992.
57. A. Tversky, P. Slovic, and D. Kahneman. The causes of preference reversal. The American Economic Review, 80:204–217, 1990.
58. T. A. Waite. Background context and decision making in hoarding gray jays. Behavioral Ecology and Sociobiology, 12:127–134, 2001.
59. E. U. Weber, E. J. Johnson, K. F. Milch, H. Chang, J. C. Brodscholl, and D. G. Goldstein. Asymmetric discounting in intertemporal choice: A query theory account. Psychological Science, 18:516–523, 2007.
60. R. B. Zajonc. Social facilitation. Science, 149:269–274, 1965.
Does More Money Buy You More Happiness?
Manel Baucells¹ and Rakesh K. Sarin²
¹ Department of Decision Analysis, IESE Business School, University of Navarra, Barcelona, Spain
² Decisions, Operations & Technology Management Area, UCLA Anderson School of Management, University of California, Los Angeles, Los Angeles, California, USA
Summary. Why do we believe that more money will buy us more happiness when in fact it does not? In this chapter, we propose a model to explain this puzzle. The model incorporates both adaptation and social comparison. A rational person who fully accounts for the dynamics of these factors would indeed buy more happiness with money. We argue that projection bias, the tendency to project our current reference levels into the future, prevents people from correctly calculating the utility obtained from consumption. Projection bias has two effects. First, it makes people overrate the happiness that they will obtain from money. Second, it makes people misallocate the consumption budget, either by consuming too much at the beginning of the planning horizon or by consuming too much of adaptive goods.
1 Introduction
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.
– The Declaration of Independence, July 4, 1776
In this chapter, we propose a model of adaptation and social comparison that provides insight into the following puzzle. Why do we believe that more money will buy us more happiness when in fact it does not? The key argument is that people overrate the impact money will have on improving happiness (well-being). They do this because they do not fully account for the adaptation to a higher standard of living that accompanies their higher level of income. Furthermore, a permanent increase in income for all peers (e.g., a companywide pay raise) leaves an individual in the same social position as before the increase. These two forces, adaptation and social
comparison, make it difficult to raise the average well-being of society through economic growth alone. Some segments of the population may indeed benefit from economic growth. For example, the nouveaux riches, who break out from a lower income group to a higher income group, will show a higher level of well-being (at least temporarily). Sophisticated individuals who fully account for adaptation and social comparison can also benefit from economic growth, as they will keep consumption low in early periods in order to be able to sustain an increasingly accelerated consumption plan. In Section 2, we present our model, in which the overall utility of a consumption stream depends on consumption relative to a reference level of consumption. The reference level itself is influenced by one’s past consumption (adaptation) and the average consumption of the peer group (social comparison). In Section 3, we show that our model is consistent with two key findings in the well-being literature: (1) happiness scores in developed countries are flat over time in spite of considerable increases in average income; and (2) there is a positive relationship between individual income and happiness within a society at any given point in time. In Section 4, we derive the optimal consumption plan using our model and show how a rational individual will plan consumption over time. We also derive the indirect utility of income under the assumption of optimal planning. This utility is indeed increasing with income. In Section 5, we resolve the puzzle stated at the start of our chapter using evidence from psychology showing that people underestimate the effects of adaptation, which causes them to overestimate the utility that will be derived from a permanent increase in income. We make a distinction between basic goods and adaptive goods. Basic goods (food, social relationships, sleep) exhibit little or no adaptation.
In Section 6, we show why people tend to allocate a higher-than-optimal share of income to adaptive goods, at the expense of basic goods. Finally, in Section 7, we summarize our findings and discuss some implications of our model for economic policy and well-being research.
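The core of the puzzle outlined above, that underestimating adaptation inflates the anticipated benefit of a permanent raise, can be illustrated numerically. This is a minimal sketch, not the chapter’s model: it assumes a hypothetical linear per-period utility (consumption minus reference level), a hypothetical partial-adjustment rule for the reference level with speed `alpha`, and projection bias in its most extreme form, where today’s reference level is assumed to stay fixed.

```python
def adapt(reference: float, consumption: float, alpha: float) -> float:
    """Reference level moves a fraction alpha of the way toward consumption."""
    return reference + alpha * (consumption - reference)


def actual_total_utility(plan: list[float], r0: float, alpha: float) -> float:
    """Per-period utility is consumption minus the current reference level,
    and the reference level adapts after each period."""
    total, r = 0.0, r0
    for x in plan:
        total += x - r
        r = adapt(r, x, alpha)
    return total


def projected_total_utility(plan: list[float], r0: float) -> float:
    """Extreme projection bias: today's reference level r0 is assumed to stay
    fixed, so adaptation is ignored."""
    return sum(x - r0 for x in plan)


if __name__ == "__main__":
    raise_plan = [3000.0] * 5  # permanent raise from 2000 to 3000 per period
    print(projected_total_utility(raise_plan, r0=2000.0))          # 5000.0
    print(actual_total_utility(raise_plan, r0=2000.0, alpha=0.5))  # 1937.5
```

With half of the gap closed each period, the realized gains shrink from 1000 to 500, 250, 125, and 62.5, so the person who ignores adaptation anticipates more than twice the benefit actually experienced.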
2 Adaptation–Social Comparison Model
Suppose $(x_1, x_2, \ldots, x_T)$ is a consumption stream, where $x_t$ is the consumption in period $t$. What is the total utility that an individual (consumer) obtains from such a stream? The discounted utility (DU) model proposes to evaluate total utility as
$$V(x_1, \ldots, x_T) = \sum_{t=1}^{T} \delta^{t-1} v(x_t),$$
where $v(x_t)$ is the utility of consuming $x_t$ in period $t$, and $\delta^{t-1}$ is the discount factor applied to period $t$. The DU model assumes consumption independence, which means that the utility derived from present consumption is not affected by past consumption [29,42]. It is easy to see that in the DU model an increase in income permits a higher level of consumption and, therefore, total utility does increase as income increases. For a concave $v$, the gains in total utility become smaller as income increases. In Figure 1, average happiness is plotted against income per capita for several countries. Several books discuss measurement and empirical issues concerning happiness within and across countries [17,23,30,47]. It is clear from this figure that average happiness in poorer countries is lower than in wealthier countries. Political issues such as democracy, freedom, and individual rights also influence happiness, which is distinctly lower in former Communist countries [17].
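The DU formula and its implication for income can be checked directly. In this minimal sketch the square-root utility, the discount factor 0.95, the 10-period horizon, and the income levels are hypothetical choices; the function itself follows the formula term by term, with period $t$ weighted by $\delta^{t-1}$ and each $v(x_t)$ computed independently of all other periods (consumption independence).

```python
import math
from typing import Callable, Sequence


def discounted_utility(stream: Sequence[float], delta: float,
                       v: Callable[[float], float] = math.sqrt) -> float:
    """V(x_1, ..., x_T) = sum_{t=1}^{T} delta**(t-1) * v(x_t).
    Consumption independence: v(x_t) ignores every other period."""
    return sum(delta ** (t - 1) * v(x) for t, x in enumerate(stream, start=1))


if __name__ == "__main__":
    T, delta = 10, 0.95
    previous = None
    for income in (20_000, 40_000, 60_000, 80_000):
        V = discounted_utility([float(income)] * T, delta)
        gain = "" if previous is None else f"  (gain {V - previous:.1f})"
        print(f"income {income:>6}: V = {V:.1f}{gain}")
        previous = V
```

Total utility rises with every increase in income, but each equal step adds less than the one before, which is the concave pattern of Figure 1. Under DU, however, income gains never stop raising utility, which is why the model cannot produce the flat time series discussed next.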
Fig. 1. Country comparison of income and happiness. Source: Figure 7.2 and Table 7.1 of Inglehart and Klingemann [22].
In wealthier societies the basic needs of the people are, by and large, satisfied. In poorer countries, progress is needed to address the problems of hunger, shelter, disease, and in some cases social turmoil caused by war and violence. It is therefore not a surprise that average happiness is lower in poorer countries. The happiness curve in Figure 1 is consistent with the diminishing marginal utility of income. Beyond a certain level of income, say $15,000 per year, happiness does not increase much with income. Easterlin [10,12] has argued that happiness has not increased over time in spite of significant increases in real income per capita in wealthier nations. Easterlin’s hypothesis of “no” marginal utility cannot be accommodated by the DU model. Furthermore, consumption independence, a crucial assumption of the DU model, is not supported by empirical and behavioral studies [33]. There is considerable evidence that the utility derived from consumption depends crucially on two factors: adaptation or habituation to previous consumption levels, and social comparison with a reference or peer group [3,4,11,13–16,30]. A woman who drives a rusty old compact car as a student may find temporary joy upon acquiring a new sedan when she lands her first job, but she soon adapts to driving the new car and assimilates it as a part of her lifestyle. Brickman et al. [3] find that lottery winners report only slightly higher levels of life satisfaction than the control group just a year after their win (4.0 versus 3.8 on a 5-point scale). Clark [4] finds evidence that job satisfaction, a component of well-being, is strongly related to changes in pay, but not to levels of pay.
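The finding that satisfaction tracks changes rather than levels follows naturally once utility is measured relative to a reference level. In this sketch the piecewise-linear value function, the loss-aversion factor of 2, and the rule that the reference level fully adapts to last period’s consumption are hypothetical simplifications, not the chapter’s specification. The same $3000 per month yields a gain, nothing, or a loss depending on the reference level, and a raise produces a one-period boost that disappears once the reference catches up.

```python
def gain_loss_value(d: float, loss_aversion: float = 2.0) -> float:
    """Piecewise-linear value of a deviation d from the reference level;
    losses weigh loss_aversion times as heavily as gains (assumed form)."""
    return d if d >= 0 else loss_aversion * d


def period_utility(consumption: float, reference: float) -> float:
    return gain_loss_value(consumption - reference)


def utilities_full_adaptation(stream: list[float], r0: float) -> list[float]:
    """Per-period utilities when the next reference level is simply this
    period's consumption (full, one-period adaptation)."""
    out, r = [], r0
    for x in stream:
        out.append(period_utility(x, r))
        r = x
    return out


if __name__ == "__main__":
    # The same $3000 per month of consumption against three reference levels
    for reference in (2000.0, 3000.0, 4000.0):
        print(reference, period_utility(3000.0, reference))
    # A raise from $2000 to $3000: joy in the first period only
    print(utilities_full_adaptation([2000.0, 3000.0, 3000.0, 3000.0], r0=2000.0))
```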
A crucial implication of adaptation is that the utility derived from the same $3000 per month worth of consumption is quite different for someone who is used to consuming that amount of goods and services than for someone who is used to consuming only $2000 per month.¹ Several authors have proposed models that account for adaptation in the determination of the total utility of a consumption stream [40,41,49,50]. Baucells and Sarin [2] incorporate satiation from past consumption in a modification of the DU model. In addition to adaptation, the utility derived from consumption also depends on the consumption of others in a person’s peer group. Driving a new Toyota sedan when everyone else in the peer group drives a new Lexus sedan feels quite different from driving it when others in the peer group drive economy cars. Frank [13,14] provides evidence from the psychological and behavioral economics literature that well-being or satisfaction depends heavily on social comparison. Solnick and Hemenway [45] asked students in the School of Public Health at Harvard to choose between living in one of two imaginary worlds in which prices were the same:
¹ People may not fully adapt to unemployment, loss of a spouse, noise, and other unfortunate and stressful situations. The adaptation rate is high for material goods, but a healthy marriage or good social relationships provide undiminished joy.
Does More Money Buy You More Happiness?
1. In the first world, you get $50,000 a year, and other people get $25,000 a year (on average).
2. In the second world, you get $100,000 a year, and other people get $250,000 a year (on average).

A majority of students chose the first type of world. People are likely to compare themselves with those who are similar in income and status. A university professor is unlikely to compare himself with a movie star or a homeless person. He will most likely compare his lifestyle to those of other professors at his university and similarly situated colleagues at other, comparable universities. Medvec et al. [35] find that Olympic bronze medalists are happier than Olympic silver medalists, as the former compare themselves to the athletes who got no medal at all, whereas the latter have nightmares of missing the gold. After the unification of Germany, East Germans’ level of happiness fell as their comparison group shifted from people in other former Soviet bloc countries to people from West Germany [30]. Morawetz [36] found that people living in a community where variation in income is small are happier than those living in a community with a higher average income but a more unequal income distribution. It is possible that in a recession or downturn, when everyone gets a uniform pay cut, happiness may not go down, but in prosperity, differential increases in pay can cause unhappiness. We cannot, however, simply improve our happiness by imagining more unfortunate individuals. Kahneman and Miller [26] assert that to influence our hedonic state, counterfactuals must be plausible, not just possible, alternatives to reality. The all-too-common tactic of parents coaxing a child to appreciate food by reminding them of starving children in Africa does not work. Instead, the same child will be far more likely to appreciate a warm apple cider after a Little League game in the cold [38].
We note that it is possible that through spiritual practices such as meditation or prayer, one might gain a better perspective on life and reduce the harmful effects of comparison; however, such a practice requires considerable time, effort, and discipline. For this study, we assume that the social comparison level is exogenously specified (although a theory where the appropriate peer group and social comparison level is endogenous would be useful). We now state our model of adaptation and social comparison:

$$V(x_1, \ldots, x_T) = \sum_{t=1}^{T} \delta^{t-1}\, v(x_t - r_t), \tag{1}$$

$$r_t = \sigma s_t + (1-\sigma)\, a_t, \qquad t = 1, \ldots, T, \tag{2}$$

$$a_t = \alpha x_{t-1} + (1-\alpha)\, a_{t-1}, \qquad t = 2, \ldots, T, \tag{3}$$

where $a_1$ and $s_t$, $t = 1, \ldots, T$, are given. In the above model, $r_t$ is the reference level in period $t$. The reference level is a convex combination of the social comparison level $s_t$ and the adaptation
Manel Baucells and Rakesh K. Sarin
level $a_t$. The adaptation level is the exponentially weighted sum of past consumption levels, in which recent consumption levels are given greater weight than more distant ones.² We interpret total utility V as a measure of happiness over an extended period. Experienced utility or per-period utility v is to be interpreted as a measure of happiness in the period of time under consideration [28]. Occasionally, in order to remove the effects of initial values, we use the long-run values of experienced utility as a measure of happiness. If $s_t$ is assumed to be constant over time, then we use S to denote the social comparison level. The carrier of utility is the gain or loss from the reference level. The reference level is determined by both past consumption and the social comparison level. Consider an example in which an individual has been consuming 6 units per period and his adaptation level has settled to 6 units. The average consumption level of his peer group is 10 units, and his social comparison level is simply the mean consumption of his peer group (10 units). Now the reference level for this individual, assuming σ = 0.5, will be 0.5 × 6 + 0.5 × 10 = 8 units. If this individual were to consume 8 units, then the corresponding utility would be at the neutral level, v(8 − 8) = v(0) = 0. If he consumed more than 8 units, then the utility of current consumption would be positive; if he consumed less than 8 units, then the utility would be negative. When σ = 1, utility is determined solely by social comparison. Similarly, when σ = 0, social comparison plays no role and utility is determined solely by adaptation. The relative weight given to adaptation and social comparison is likely to be domain-specific. For example, social comparison may play little to no role for family life, as one does not readily observe this aspect of one’s peers’ lives.
The utility one derives from a car, house, vacation, or private school for children, however, is likely to be influenced by social comparison. The speed of adaptation is governed by α. For α = 1, adaptation is immediate, and the most recent consumption always serves as the adaptation level. For α = 0, there is no adaptation, and the initial adaptation level, $a_1$, serves as the adaptation level in every period regardless of past consumption. Goods for which α = 0 are called basic goods. Examples of basic goods include food, sleep, friendships, and shelter. These goods are necessary for
² Note that $x - r$ can be written as $x - \sigma s - (1-\sigma)a = \sigma(x - s) + (1-\sigma)(x - a)$. This last expression can be interpreted as describing an individual who uses not one ($r$) but two reference points ($s$ and $a$). The comparison of $x$ with $s$ receives weight σ, and the comparison with $a$ receives weight 1 − σ. Because $a$ is also a convex combination of past consumption levels, one can interpret each level of past consumption as a reference point, with different weights given to each comparison. Similarly, if $s$ is understood as an average consumption in the society or in the peer group, then $x - s$ could be seen as a multiple comparison with each group member.
survival.³ The study of basic goods and their contribution to well-being remains relevant because a large percentage of the world population lives at subsistence level. For these people, more money, and therefore provision of adequate food, shelter, clean water, and health, could indeed improve happiness. The utility function v is assumed to be concave for consumption above the reference level and convex for consumption below the reference level [27]. The neutral utility, v(0) = 0, is realized when consumption equals the reference level. In the next section, we explore the relationship between income and happiness in more depth. In all our numerical examples, we assume $v(x) = x^\beta$ for $x \ge 0$ and $v(x) = -\lambda |x|^\beta$ for $x < 0$, with β = 0.5 and λ = 2.25. The parameter λ measures the degree of loss aversion: a loss is weighted 2.25 times as heavily as a gain of equal size, so that −v(−10) = 2.25 v(10). (A $10 loss matches a $22.5 gain in magnitude only for a linear value function; with β = 0.5, the matching gain is 2.25² × $10 ≈ $50.6.)
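The arithmetic of this section can be checked with a short Python sketch. It implements the reference level (2), one step of the adaptation update (3), and the S-shaped value function with the chapter's β = 0.5 and λ = 2.25. The function names are hypothetical, chosen for illustration rather than taken from the chapter. The sketch reproduces the worked example (adaptation level 6, peer mean 10, σ = 0.5, reference level 8). It also shows that, with β = 0.5, the gain that exactly offsets a 10-unit loss is λ^(1/β) × 10 = 50.625, and that this gain falls to 22.5 only when the value function is linear (β = 1).

```python
# Building blocks of the adaptation-social comparison model (a sketch).
# Function names are illustrative; they are not taken from the chapter.

BETA = 0.5     # curvature of v in the chapter's numerical examples
LAMBDA = 2.25  # loss-aversion coefficient


def reference_level(s, a, sigma):
    """Equation (2): r = sigma * s + (1 - sigma) * a."""
    return sigma * s + (1.0 - sigma) * a


def next_adaptation(x_prev, a_prev, alpha):
    """Equation (3): a_t = alpha * x_{t-1} + (1 - alpha) * a_{t-1}."""
    return alpha * x_prev + (1.0 - alpha) * a_prev


def v(y, beta=BETA, lam=LAMBDA):
    """S-shaped value: y**beta for gains, -lam * |y|**beta for losses."""
    return y ** beta if y >= 0 else -lam * (-y) ** beta


def matching_gain(loss, beta=BETA, lam=LAMBDA):
    """Gain whose value equals the magnitude of the value of a given loss."""
    return lam ** (1.0 / beta) * loss


# Worked example from the text: adaptation level 6, peer mean 10, sigma = 0.5.
r = reference_level(10.0, 6.0, 0.5)
print(r)                                   # 8.0
print(v(8.0 - r), v(9.0 - r), v(7.0 - r))  # 0.0 1.0 -2.25

# alpha = 1: immediate adaptation; alpha = 0: a basic good, no adaptation.
print(next_adaptation(9.0, 6.0, 1.0), next_adaptation(9.0, 6.0, 0.0))  # 9.0 6.0

# Loss aversion with beta = 0.5 versus a linear value function (beta = 1).
print(matching_gain(10.0), matching_gain(10.0, beta=1.0))  # 50.625 22.5
```

For every loss size y, the ratio −v(−y)/v(y) equals λ. This is why λ alone is usually quoted as the loss-aversion coefficient, independently of β.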
3 Income–Happiness Relationship

A very poor, underprivileged person might think that it would be wonderful to have an automobile or a television set, and should he acquire them, at the beginning he would feel very happy. Now if such happiness were something permanent, it would remain forever. But it does not; it goes. After a few months he wants to change the models. The old ones, the same objects now cause dissatisfaction. This is the nature of change.
Path to Tranquility, Dalai Lama, p. 175

The above quote captures the essence of the “Easterlin paradox,” the empirical finding that happiness scores have remained flat despite considerable increases in average income. The most striking example is Japan, where a fivefold increase in real per capita income has led to virtually no increase in average life satisfaction (Figure 2). A similar pattern holds for the United States (Figure 3) and for most other developed countries. Happiness in these surveys is measured by asking people how satisfied they are with their lives. A typical example is the General Social Survey [7], which asks: “Taken all together, how would you say things are these days— Would you say that you are very happy, pretty happy, or not too happy?” In the World Values Survey, Inglehart et al. [21] measure well-being on a 10-point scale, with one representing dissatisfied and ten representing satisfied. Pavot and Diener [39] use five questions, each rated on a scale from one to seven, to measure life satisfaction (Table 1). Davidson et al. [5,6] have found that when people are cheerful and experience positive feelings (for example, while watching funny film clips), there is more activity in the
³ For rich people or those in developed countries, food becomes an adaptive good used for social status (fine wine or a fancy restaurant) and not merely for nutrition.
Fig. 2. Satisfaction with life and income per capita in Japan between 1958 and 1991. Sources: Penn World Tables and World Database of Happiness.
Fig. 3. Income and happiness in the United States. Source: [30].
left front section of the brain. The difference in activity between the left and right sides of the prefrontal cortex seems to be a good measure of happiness. Self-reported measurements of happiness correlate with this measure of brain activity, as well as with the ratings of one’s happiness made by friends and family members [31]. Diener and Tov [8] report that subjective measures of well-being correlate with other types of measurements of happiness such as biological measurements, informant report, reaction time, open-ended
Table 1. The satisfaction with life scale [39].
DIRECTIONS: Below are five statements with which you may agree or disagree. Using the 1–7 scale below, indicate your agreement with each item by placing the appropriate number in the line preceding that item. Please be open and honest in your responding.
1 = Strongly Disagree
2 = Disagree
3 = Slightly Disagree
4 = Neither Agree nor Disagree
5 = Slightly Agree
6 = Agree
7 = Strongly Agree
a. In most ways my life is close to my ideal.
b. The conditions of my life are excellent.
c. I am satisfied with my life.
d. So far I have gotten the important things I want in life.
e. If I could live my life over, I would change almost nothing.
interviews, smiling and behavior, and online sampling. Kahneman et al. [25] discuss biases in measuring well-being that are induced by a focusing illusion in which the importance of a specific factor (income, marriage, health) is exaggerated by drawing attention to the factor. Nevertheless, Kahneman and Krueger [24] argue that self-reported measures of well-being may be relevant for future decisions as the idiosyncratic effects are likely to average out in representative population samples. Frey and Stutzer [18] conclude: “The existing research suggests that, for many purposes, happiness or reported subjective well-being is a satisfactory empirical approximation to individual utility.” If people pursue the goal of maximization of happiness and they report their happiness levels truthfully in the variety of surveys discussed above, then how do we explain the fact that happiness scores have remained flat in spite of significant increases in real income over time? Of course, happiness depends on factors other than income such as the genetic makeup of a person, family relationships, community and friends, health, work (unemployed, job security), external environment (freedom, wars, or turmoil in society, crime), and personal values (perspective on life, religion, spirituality). Income, however, does influence a person’s happiness up to a point and moderates the adverse effects of some life events [44]. As shown in Figure 4, mean happiness for a cross-section of Americans does increase with income, though at a diminishing rate. In fact, in any given society, richer people are substantially happier relative to poorer people (see Table 2 for United States and Britain). Our model of adaptation and social comparison is consistent with the joint empirical finding that happiness over time does not increase appreciably in spite of large increases in real income, but happiness in a cross-section of data does depend on relative levels of income. 
That rich people are happier than poor people at a given time and place is easy to justify by social comparison. By and large, richer people have a favorable evaluation of their own situation compared to others. In contrast,
Fig. 4. Mean happiness and real household income for a cross-section of Americans in 1994. Source: [9].

Table 2. Happiness according to income position (percent of respondents). Source: [30].

| | United States, top quarter | United States, bottom quarter | Britain, top quarter | Britain, bottom quarter |
|---|---|---|---|---|
| Very happy | 45 | 33 | 40 | 29 |
| Quite happy | 51 | 53 | 54 | 59 |
| Not too happy | 4 | 14 | 6 | 12 |
| Total | 100 | 100 | 100 | 100 |
the economically disadvantaged will have an unfavorable evaluation of their relative position in society. Needless to say, some rich people may bring misery upon themselves by comparing themselves with even richer people. Over time, though, both rich and poor people have significantly improved their living standards, but neither group has become happier. Adaptation explains this paradoxical finding. Consider Mr. Yoshi, a young professional living in Japan in the 1950s. He was content to live in his parents’ house, drive a used motorcycle for transportation, wash his clothes in a sink and listen to the radio for entertainment. Also consider Ms. Yuki, a young professional living in Japan in the 1990s. She
earns five times the income of Mr. Yoshi in real terms. She wants her own house, own automobile, washing machine, refrigerator, and television. She travels abroad for vacations and enjoys expensive international restaurants. Mr. Yoshi was consuming 10 units of income per period, but had adapted to that level of consumption. Ms. Yuki consumes 50 units of income per period and has adapted to consuming at that high level. Because Mr. Yoshi and Ms. Yuki are in similar social positions for their times, both will have the same level of happiness. Happiness does not depend on the absolute level of consumption, which is substantially higher for Ms. Yuki. Instead, happiness depends on the level of consumption relative to the adaptation level. Ms. Yuki has adapted to a much higher level of consumption and therefore finds that she is no happier than Mr. Yoshi. Note that experienced utility, v(x − r), remains constant if income x increases from 10 units (Mr. Yoshi) to 50 units (Ms. Yuki) in steps of one unit each year. This holds because, with immediate adaptation (α = 1) and a social comparison level that rises along with everyone's income, r also increases in steps of one unit each year.⁴ To demonstrate the role of adaptation and social comparison in determining experienced utility, we apply our model to the simple case of constant consumption plans. Suppose the social comparison level S and the initial adaptation level $a_1$ are both 10 units. The experienced utility in each period for persons A, B, and C, who have a constant consumption of 12, 10, and 8 units, respectively, is plotted in Figure 5. Figure 5c shows that the poor person C will feel less dissatisfaction over time, whereas the richer person A will experience diminished satisfaction: both are adapting to their respective levels of consumption. Two observations from Figure 5 are of special interest.
First, with adaptation alone (α > 0, σ = 0, Figure 5a), both the poor person C and the rich person A will converge to the neutral level of happiness as each becomes adapted to his or her own past consumption level. Second, with social comparison alone (σ = 1, Figure 5b), the poor person C and the rich person A will remain far apart in happiness. More generally, dispersion in happiness will be about the same as the dispersion in income. This is also the prediction of the discounted utility model, which is a particular case of the pure social comparison model with S = 0. Together, the two factors of adaptation and social comparison provide the more realistic prediction that the discrepancy in reference levels, and therefore in happiness, is less than the discrepancy in income. The reference levels are pulled towards each person's own consumption level (12, 10, or 8), but do not converge to it because of the permanent social comparison with S = 10. This prediction of our model is consistent with Easterlin [11], who
⁴ If income increases at a geometric rate, say four percent per year, then the same conclusion is reached assuming income is measured in logs, as suggested by Layard [30]. In this case, use $v(\ln x - \ln r) = \tilde{v}(x/r)$ instead of $v(x - r)$ in (1), and maintain the updating equations (2) and (3).
Fig. 5. The effect of adaptation and social comparison on experienced utility. Panel (a) shows adaptation alone (α > 0, σ = 0); panel (b) exhibits social comparison alone (σ = 1); and panel (c) is a combination of adaptation and social comparison (α > 0, 0 < σ < 1).
states “the dispersion in norms [reference levels] appears to be, on average, less than that in incomes.” In Figure 6, the relationship between income and happiness is plotted for various weights σ on social comparison. We assume the initial adaptation and social comparison levels to be 10. The horizontal axis represents the constant consumption level x. Note that v(x − r) = 0 at x = r = 10. The vertical axis represents the long-run experienced utility, once the adaptation level has converged to x (assuming α > 0). By (2), the reference level r tends to 10σ + (1 − σ)x, and therefore x − r tends to σ(x − 10). Thus, the long-run experienced utility is given by v(σ(x − 10)). In the absence of social comparison (σ = 0), the long-run experienced utility is independent of income and flat at zero. As the weight on social comparison increases, the richer people (x > 10) become happier and the poorer people (x < 10) become less happy. The happiness function is S-shaped and steeper for losses. Thus, for any rich person, say with x = 17, there is a symmetric poor person, x = 3, such that an increase in the income of the poor person gives higher utility than the same increase in the income of the rich one. If individuals are equally weighted (utilitarian view), then the largest gain in societal happiness is realized by improving the income of the person who is slightly below the average. If the worse-off individuals receive higher weight (rank utilitarian view), then this may not be the case.
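The dynamics behind Figures 5 and 6 can be reproduced with a few lines of Python. The sketch uses S = a₁ = 10 as in the text and the power value function with β = 0.5 and λ = 2.25. For the slowly adapting runs it assumes α = 0.3, a hypothetical value, because the α used in Figure 5 is not reported here. The function names are hypothetical. The sketch also checks the long-run formula v(σ(x − 10)) and the Mr. Yoshi to Ms. Yuki example under immediate adaptation.

```python
# Simulate experienced utility v(x_t - r_t) along a consumption plan, using
# equations (2) and (3). alpha = 0.3 below is a hypothetical adaptation speed.


def v(y, beta=0.5, lam=2.25):
    return y ** beta if y >= 0 else -lam * (-y) ** beta


def experienced_utilities(plan, S, a1, alpha, sigma):
    """Per-period utilities for a plan, with a constant social comparison level S."""
    utilities, a = [], a1
    for t, x in enumerate(plan):
        if t > 0:
            a = alpha * plan[t - 1] + (1 - alpha) * a   # equation (3)
        r = sigma * S + (1 - sigma) * a                 # equation (2)
        utilities.append(v(x - r))
    return utilities


def long_run_utility(x, sigma, S=10.0):
    """Long-run utility of constant consumption x once a_t has converged to x."""
    return v(sigma * (x - S))


# Persons A, B, C consume 12, 10, 8 per period, with S = a1 = 10 (as for Figure 5).
for label, x in (("A", 12.0), ("B", 10.0), ("C", 8.0)):
    adapt_only = experienced_utilities([x] * 30, 10.0, 10.0, 0.3, 0.0)
    social_only = experienced_utilities([x] * 30, 10.0, 10.0, 0.3, 1.0)
    print(label, round(adapt_only[-1], 3), round(social_only[-1], 3),
          round(long_run_utility(x, 0.5), 3))

# Mr. Yoshi to Ms. Yuki: income rises one unit per year from 10 to 50 with
# immediate adaptation (alpha = 1, sigma = 0); every period yields v(1) = 1.
path = [float(x) for x in range(11, 51)]
print(set(experienced_utilities(path, 0.0, 10.0, 1.0, 0.0)))  # {1.0}
```

With σ = 0, the last column of the adaptation-only run drifts toward zero for both A and C. With σ = 1, A and C keep utilities v(2) and v(−2) in every period. This matches the contrast between panels (a) and (b) of Figure 5.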
[Figure 6 plot: long-run experienced utility (vertical axis) versus per-period income (horizontal axis, 0 to 20), with curves for σ = 1, 0.5, 0.2, and 0; α = 1, S = 10.]
Fig. 6. The effect of social comparison on long-run experienced utility.
In Figure 6, we have assumed that the social comparison level is the same for a rich person as it is for a poor person. If, however, the peer group against which the social comparison is made changes with income level, then little gain in happiness may be realized. For example, if rich people compare themselves with other rich people, then S = x and long-run experienced utility becomes zero. The double-edged sword of an increasing adaptation level and an increasing social comparison level may leave happiness unchanged even when income increases substantially. Conversely, if poor people are able to suppress social comparison, or compare themselves more often with even poorer individuals, then they may be able to partially escape the predictions of Figure 6. The argument above does not prove that a rational person who optimally plans consumption by anticipating future adaptation levels will not be happier with more money. We have merely asserted that if society becomes accustomed or adapted to higher levels of consumption as income rises (which will occur if the consumption plan is constant or not sufficiently increasing), then there will be no gain in observed happiness scores. We now examine the optimal consumption plan for the adaptation–social comparison model.
4 Optimal Consumption Plan

Suppose that a consumer wishes to optimally allocate an income I over consumption periods $t = 1, \ldots, T$. For simplicity, assume δ = 1 (no discounting),
a constant unit price, and borrowing and saving at zero percent interest. The consumer chooses $(x_1, \ldots, x_T)$ to solve the following optimization problem:

$$\max\; V(x_1, \ldots, x_T) = \sum_{t=1}^{T} v(x_t - r_t) \tag{4}$$

$$\text{s.t.} \quad \sum_{t=1}^{T} x_t \le I, \tag{5}$$

$$x_t \ge 0, \qquad t = 1, \ldots, T, \tag{6}$$
and $r_t$ satisfying the updating equations (2) and (3). The optimal consumption plan for the discounted utility model is constant, with $x_t = I/T$, $t = 1, \ldots, T$. For our adaptation–social comparison model, the optimal consumption plan depends on reference levels. Because reference levels are influenced by both adaptation and social comparison, the optimal consumption plan shows a richer pattern. For the general case, we can always solve the mathematical program (4)–(6) to obtain the optimal consumption plan and the associated per-period experienced utilities and total utility. To solve (4)–(6) explicitly, it is convenient to define $z_t = x_t - r_t$. We can then recast the problem as one of finding the optimal values of $z_t$, as in the discounted utility model, but with a modified budget constraint. To calculate the new budget constraint, note that for given values of $z_t$, one can easily recover the values of $x_t$ (and of $r_t$ and $a_{t+1}$), $t = 1, \ldots, T$, recursively by means of (2) and (3). Hence, each $x_t$ is a function of $z_\tau$, $\tau = 1, \ldots, t$, and the budget constraint (5) can be written in terms of $z_t$, $t = 1, \ldots, T$. Such an expression for the budget constraint, however, is quite involved for the general model. It is possible to obtain a tractable expression for the special case of α = 1. In this case, $r_t = \sigma s_t + (1-\sigma)x_{t-1}$, so that $x_t = z_t + r_t = z_t + \sigma s_t + (1-\sigma)x_{t-1}$, then $x_{t-1} = z_{t-1} + \sigma s_{t-1} + (1-\sigma)x_{t-2}$, and so on down to $x_1 = z_1 + \sigma s_1 + (1-\sigma)a_1$. It follows that

$$x_t = (1-\sigma)^t a_1 + \sum_{\tau=1}^{t} (1-\sigma)^{t-\tau}\,(z_\tau + \sigma s_\tau). \tag{7}$$
Plugging (7) into (5) yields the desired expression for the budget constraint as a function of $z_t$:

$$(\kappa_0 - 1)\,a_1 + \sum_{t=1}^{T} \kappa_t\,(z_t + \sigma s_t) \le I, \tag{8}$$

where

$$\kappa_t = \sum_{\tau=1}^{T-t+1} (1-\sigma)^{T-t+1-\tau} = \frac{1 - (1-\sigma)^{T-t+1}}{\sigma}, \qquad t = 0, \ldots, T. \tag{9}$$
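Because it is easy to get an index wrong in (7)–(9), a numerical cross-check may help. The sketch below assumes α = 1 and uses made-up values of $z_t$, $s_t$ and $a_1$ that serve only as test inputs, not values from the chapter. It rebuilds consumption from the gaps $z_t$ by the recursion $x_t = z_t + \sigma s_t + (1-\sigma)x_{t-1}$ (with $a_1$ in the role of $x_0$). It then compares each $x_t$ with (7) and total consumption with the left-hand side of (8). The function names are hypothetical, and σ = 0 is handled through the limit $\kappa_t = T - t + 1$.

```python
import math


def kappa(t, T, sigma):
    """Equation (9); for sigma = 0 the limit is T - t + 1."""
    if sigma == 0:
        return float(T - t + 1)
    return (1 - (1 - sigma) ** (T - t + 1)) / sigma


def consumption_from_gaps(z, s, a1, sigma):
    """Recursion behind (7) for alpha = 1, starting from x_0 := a1."""
    x, prev = [], a1
    for zt, st in zip(z, s):
        prev = zt + sigma * st + (1 - sigma) * prev
        x.append(prev)
    return x


def closed_form_x(t, z, s, a1, sigma):
    """Equation (7), with 1-based period index t."""
    return (1 - sigma) ** t * a1 + sum(
        (1 - sigma) ** (t - tau) * (z[tau - 1] + sigma * s[tau - 1])
        for tau in range(1, t + 1))


def budget_lhs(z, s, a1, sigma):
    """Left-hand side of the budget constraint (8)."""
    T = len(z)
    return (kappa(0, T, sigma) - 1) * a1 + sum(
        kappa(t, T, sigma) * (z[t - 1] + sigma * s[t - 1]) for t in range(1, T + 1))


# Arbitrary illustrative gaps z_t and comparison levels s_t (not from the chapter).
z = [0.5, -1.0, 2.0, 0.0, 1.5]
s = [5.0, 5.0, 6.0, 6.0, 7.0]
for sigma in (0.0, 0.2, 1.0):
    x = consumption_from_gaps(z, s, 3.0, sigma)
    same_x = all(math.isclose(x[t - 1], closed_form_x(t, z, s, 3.0, sigma))
                 for t in range(1, len(z) + 1))
    same_total = math.isclose(sum(x), budget_lhs(z, s, 3.0, sigma))
    print(sigma, same_x, same_total)  # True True for each sigma
```

For σ = 1 every $\kappa_t$ equals 1 and $\kappa_0 - 1 = 0$, so (8) collapses to $\sum_t (z_t + s_t) \le I$, which is what pure social comparison implies.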
Using standard calculus, the first-order condition is given by

$$v'(z_t) = \lambda \kappa_t, \qquad t = 1, \ldots, T, \tag{10}$$
where λ is the Lagrange multiplier associated with (8) (here λ denotes the multiplier, not the loss-aversion coefficient). If the nonnegativity constraint (6) is met and v is concave, then the optimal solution is unique and is given by the solution of (10). This will also be the case if v is S-shaped and the solution operates in the gains portion of the value function (i.e., $z_t \ge 0$); otherwise, there may be multiple local optima. To gain further insight, we consider the case of a power value function, $v(z) = z^\beta$, $z \ge 0$. In this case, $v'(z) = \beta / z^{1-\beta}$, and (10) becomes
$$z_t = \left(\frac{\beta}{\lambda \kappa_t}\right)^{1/(1-\beta)}, \qquad t = 1, \ldots, T. \tag{11}$$
Using the budget constraint (8), we solve for the Lagrange multiplier and finally obtain

$$z_t = \frac{I - \sigma\Omega - (\kappa_0 - 1)\,a_1}{K} \left(\frac{1}{\kappa_t}\right)^{1/(1-\beta)}, \qquad t = 1, \ldots, T, \tag{12}$$
where $\Omega = \sum_{t=1}^{T} \kappa_t s_t$ and $K = \sum_{t=1}^{T} \kappa_t^{-\beta/(1-\beta)}$. Using (7) now yields $x_t$, $t = 1, \ldots, T$. It is apparent from (12) that to ensure consumption above the reference level ($z_t \ge 0$), the social comparison levels and the initial adaptation level must be sufficiently low. Essentially, one needs to ensure that the numerator of (12) stays nonnegative, or $\sigma\Omega + (\kappa_0 - 1)a_1 \le I$. Two special cases, σ = 0 and σ = 1, are instructive. When σ = 0, an average income per period, I/T, above the initial adaptation level $a_1$ ensures consumption above the reference level in every period. When σ = 1, a total income greater than $\sum_{t=1}^{T} s_t$ will also ensure that consumption in each period is above the reference level. Of course, if the level of social comparison is constant over time ($s_1 = s_2 = \cdots = s_T = S$), this last condition reduces to an average income per period greater than S. Even though (12) is derived using a power form for the value function, the conclusion that $z_t \ge 0$ if the income is at least $\sigma\Omega + (\kappa_0 - 1)a_1$ follows more generally from (8) and (10). If $z_t$, $t = 1, \ldots, T$, is positive, and assuming $s_t \ge 0$, then it follows from (7) that the optimal consumption plan is increasing. If the social comparison level or the initial adaptation level is sufficiently high, so that $\sigma\Omega + (\kappa_0 - 1)a_1 > I$, then the optimal solution involves some $z_t < 0$. This can yield complex patterns of consumption. Recall that consumption below the reference level implies that the consumer operates on the convex part of the value function. Therefore, the consumer will find it optimal to concentrate as much of the loss as possible in some periods. To do so, the individual will cease consumption in some intermediate periods, with the hope of lowering the adaptation level. Once the reference level is low enough, he may start an
[Figure 7 plot: panels (a) and (b), each showing the consumption plan, adaptation level, and experienced utility (vertical axis, 0 to 30) over time periods 1 to 10; α = 1, S = 5, I = 100.]
Fig. 7. Optimal consumption plan, and associated adaptation level and experienced utility.
increasing consumption plan from then on. Numerical methods can be used to obtain the optimal consumption plans in these complex cases. In Figure 7, we present two possible optimal consumption plans for a fixed income of 100. In this example, $a_1$ is set to 0 and S to 5. In Figure 7a, the optimal consumption plan is increasing. Reference levels and experienced utility are also increasing. In Figure 7b, a greater weight is given to social comparison; the optimal consumption plan, although still increasing, is flatter and shows the moderating effect of social comparison. In the extreme case when the weight of social comparison is set to one, the optimal plan will be flat. The key observation is that by anticipating the change in future reference levels induced by current consumption, a rational consumer could choose a consumption plan that produces substantially higher total utility than a constant consumption plan (16.1 versus 12 when α = 1, σ = 0.2, Figure 7a). For a high σ, the optimal plan becomes flatter, and therefore the total utilities under the optimal plan and the constant consumption plan are close (17.9 versus 17.0 when α = 1, σ = 0.5, Figure 7b). For a given income I, we can solve the consumption planning problem and find the total utility associated with the optimal plan. By varying I, we can repeatedly solve the planning problem and derive the indirect utility of income. In Figure 8, the indirect utility of income is plotted for some specific values of the parameters. It is clear from this figure that an increase in income for a richer person provides less incremental utility than the same increase would provide for a poor person. It is of special interest to note that the indirect utility of income need not be S-shaped even if the per-period utility function v is S-shaped. For σ = 0, there is no social comparison, and a rational consumer derives a positive total utility for all values of income. For example, for I = 20, the
[Figure 8 plot: total utility of the optimal plan (vertical axis, −60 to 30) versus income I (horizontal axis, 0 to 100), with curves for σ = 1, 0.5, 0.2, and 0; α = 1, S = 5.]
Fig. 8. Utility of income for optimal plan.
total utility is 7.7. A person with I = 20 is relatively poor because, with an average consumption of 2 units per period, she cannot keep up with the social comparison level of 5 units per period. As σ increases, her utility decreases. With σ = 1, such a poor person obtains a large negative total utility of −39. In contrast, a rich person (I = 100) has a total utility of 17.1 for σ = 0, and social comparison (σ = 1) further increases her total utility to 22.4. In Figure 9, we compare the utility derived from an optimal consumption plan with the utility derived from a constant consumption plan. As expected, the utility of the optimal plan is substantially higher than the utility of a constant consumption plan. The benefits of optimal planning seem to accrue more to relatively poorer people. For example, the gain in utility through optimal planning for a person with I = 40 is substantially higher [5.5 − (−7.3) = 12.8] than for a person with I = 60 [10.3 − 6.3 = 4]. A person below the average income of 50, but above the threshold of σΩ + (κ0 − 1)a1 = 32.1, can carefully choose an increasing consumption plan that yields positive experienced utility in all periods. However, under constant consumption, such a person consumes below the reference level in every period except the first, thereby realizing negative experienced and total utility.
[Figure 9 plot: total utility (vertical axis, −25 to 20) versus income (horizontal axis, 0 to 100) for the optimal and the constant consumption plans; α = 1, σ = 0.2, S = 5.]
Fig. 9. Utility of income for optimal versus constant consumption plan.
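Because the closed form (12) is explicit, the numerical examples of this section can be recomputed directly. The sketch below assumes α = 1, a constant social comparison level S = 5, a₁ = 0, and the power value function with β = 0.5 for gains, with λ = 2.25 entering only the constant plan's losses. It also assumes T = 10 periods, inferred from the ten-period horizon of Figure 7 and from the remark that I = 20 means 2 units per period. Under these assumptions it returns 16.1 versus 12 and 17.9 versus 17.0 for Figure 7. It also returns 7.7, 17.1 and 22.4 for Figure 8, the I = 40 and I = 60 comparison of Figure 9, and the threshold 32.1. The function names are hypothetical. Below the threshold some $z_t$ are losses, (12) no longer applies, and a numerical search would be needed, so the sketch raises an error instead of extrapolating.

```python
import math


def kappa(t, T, sigma):
    """Equation (9); for sigma = 0 the limit is T - t + 1."""
    if sigma == 0:
        return float(T - t + 1)
    return (1 - (1 - sigma) ** (T - t + 1)) / sigma


def v(y, beta=0.5, lam=2.25):
    return y ** beta if y >= 0 else -lam * (-y) ** beta


def threshold(T, S, a1, sigma):
    """sigma*Omega + (kappa_0 - 1)*a1 with s_t = S: income needed for all z_t >= 0."""
    omega = S * sum(kappa(t, T, sigma) for t in range(1, T + 1))
    return sigma * omega + (kappa(0, T, sigma) - 1) * a1


def optimal_plan(I, T, S, a1, sigma, beta=0.5):
    """Closed-form optimum (12) for alpha = 1, constant S and v(z) = z**beta."""
    numer = I - threshold(T, S, a1, sigma)
    if numer < 0:
        raise ValueError("income below threshold: (12) does not apply")
    k = [kappa(t, T, sigma) for t in range(1, T + 1)]
    K = sum(kt ** (-beta / (1 - beta)) for kt in k)
    z = [numer / K * kt ** (-1 / (1 - beta)) for kt in k]   # equation (12)
    x, prev = [], a1
    for zt in z:  # recover x_t = z_t + sigma*S + (1 - sigma)*x_{t-1}, x_0 := a1
        prev = zt + sigma * S + (1 - sigma) * prev
        x.append(prev)
    return x, sum(zt ** beta for zt in z)


def constant_plan_total(I, T, S, a1, sigma):
    """Total utility (4) of spending I/T in every period, with alpha = 1."""
    x, a, total = I / T, a1, 0.0
    for t in range(T):
        if t > 0:
            a = x  # alpha = 1: adaptation level equals last period's consumption
        total += v(x - (sigma * S + (1 - sigma) * a))
    return total


# T = 10, S = 5, a1 = 0 (assumed setting of Figures 7 to 9).
plan, total = optimal_plan(100, 10, 5.0, 0.0, 0.2)
print(round(total, 1), round(constant_plan_total(100, 10, 5.0, 0.0, 0.2), 1))  # 16.1 12.0
print(round(threshold(10, 5.0, 0.0, 0.2), 1))                                  # 32.1
print([round(x, 2) for x in plan])
```

With β = 0.5 the total utility of the optimal plan simplifies to the square root of (I − σΩ − (κ₀ − 1)a₁)·K. This makes the concave, diminishing-returns shape of the indirect utility of income above the threshold easy to see.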
5 Predicted versus Actual Happiness

The great source of both the misery and disorders of human life, seems to arise from over-rating the difference between one permanent situation and another.
Adam Smith, The Theory of Moral Sentiments, 1979, Part III, Chap. III

So far we have seen that our adaptation–social comparison model is consistent with the empirical findings that within a country richer people are happier than poorer people (social comparison), but that over time well-being does not increase in spite of permanent increases in income for all (adaptation). But the puzzle that we stated at the start of the chapter still needs resolution. Lottery winners may not be happier [3], but most people continue to believe that winning a lottery will make them happier. As we have demonstrated in Section 4, if people plan optimally, then more money indeed buys more happiness, although their happiness will increase at a diminishing rate. Optimal planning, however, requires that one correctly predict the impact of current consumption on future utility. An increase in consumption has two perilous effects on future utility. First, the adaptation level goes up and therefore future experienced utility declines (e.g., people get used to a fancier car, a bigger house, or a vacation abroad). Second, the social comparison level may also go up, which again reduces experienced utility. When one joins a country club or moves to a more prosperous neighborhood,
then the peer group with which social comparisons are made also changes. The individual now compares himself with more prosperous “Joneses,” and comparisons to his previous peer group of less prosperous “Smiths” fade. If our lottery winner foresees all this, then he can appropriately plan consumption over time and realize high total utility in spite of a higher level of adaptation and an upward movement in peer group. The rub is that people underestimate adaptation and possibly changes in peer group. Loewenstein et al. [32] have documented and analyzed the underestimation of adaptation and have called it projection bias. Because of projection bias, a person will realize less happiness than she thinks. The gap between predicted and realized levels of happiness (total utility) further increases if one plans myopically rather than optimally. An example of a myopic plan is to allocate a budget or income equally in each period (constant consumption), as opposed to an increasing plan. A worse form of myopic planning would be to maximize immediate happiness through splurging (large consumption early on), which is what some lottery winners presumably end up doing. We buy too much when hungry [37], forget to carry warm clothing during hot days for cooler evenings, predict that living in California will make us happy [43], and generally project too much of our current state into the future and underestimate adaptation [19,33,34]. van Praag and Frijters [48] estimate a rise of between 35 and 60 cents in what one considers required income for every dollar increase in income. Stutzer [46] also estimates an increase in adaptation level of at least 40 cents for each dollar increase in income. After the very first year, the joy of a one-dollar increase in income is reduced by 40%, but people are unlikely to foresee this reduced contribution to happiness.
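The underestimation of adaptation described here can be illustrated with a small simulation. It uses the one-parameter projection-bias rule that the chapter states a few sentences later: the predicted reference level is π times the current reference level plus (1 − π) times the actual one. The plan (a permanent jump from 10 to 15 units), α = 0.5, σ = 0.2 and S = 10 are hypothetical values chosen only for illustration. Predictions are made once, in the first period. The sketch only compares predicted with realized total utility for a fixed plan. It does not model the period-by-period re-planning that the text goes on to describe. The function names are hypothetical.

```python
# Projection bias (a sketch): predicted reference levels mix the current and the
# actual reference level with weight pi. All parameter values are hypothetical.


def v(y, beta=0.5, lam=2.25):
    return y ** beta if y >= 0 else -lam * (-y) ** beta


def reference_path(plan, a1, S, alpha, sigma):
    """Actual reference levels r_t implied by a plan through equations (2) and (3)."""
    refs, a = [], a1
    for t, x in enumerate(plan):
        if t > 0:
            a = alpha * plan[t - 1] + (1 - alpha) * a
        refs.append(sigma * S + (1 - sigma) * a)
    return refs


def predicted_and_actual(plan, a1, S, alpha, sigma, pi):
    """Total utility as predicted in the first period versus as actually experienced."""
    refs = reference_path(plan, a1, S, alpha, sigma)
    current = refs[0]
    predicted_refs = [pi * current + (1 - pi) * r for r in refs]
    predicted = sum(v(x - r) for x, r in zip(plan, predicted_refs))
    actual = sum(v(x - r) for x, r in zip(plan, refs))
    return predicted, actual


# A person adapted to 10 units who starts consuming 15 units per period.
plan = [15.0] * 10
for pi in (0.0, 0.5, 1.0):
    predicted, actual = predicted_and_actual(plan, 10.0, 10.0, 0.5, 0.2, pi)
    print(pi, round(predicted, 2), round(actual, 2))
```

With π = 0, predicted and realized utility coincide. As π rises toward 1, the forecast ignores more of the coming rise in the reference level and becomes more optimistic. Realized utility stays the same because the plan is held fixed.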
People do qualitatively understand that some adaptation to a change in lifestyle with higher income will take place; they simply underestimate the magnitude of the changes. In our model, the chosen consumption plan determines the actual reference level $r_t$ by means of (2) and (3). In every period, subjects observe the current reference level, but may fail to correctly predict the value of this state variable in future periods. Under projection bias, the predicted reference level lies somewhere between the current reference level and the actual reference level. The relationship between the actual and predicted reference levels can be modeled using a single parameter, π, as follows:
$$\text{Predicted Reference Level} = \pi\,(\text{Current Reference Level}) + (1 - \pi)\,(\text{Actual Reference Level}).$$
Thus, when π = 0, there is no projection bias and the predicted reference level coincides with the actual reference level. If π = 1, then the person adopts the current reference level as the future reference level. An intermediate value of π = 0.5 implies that the person’s predicted reference level is halfway between the current and actual reference levels. The projection bias
218
Manel Baucells and Rakesh K. Sarin
model can be extended to any state variables influencing preferences, such as the satiation level [1]. If consumption stays above the actual reference level over time, then a person with projection bias may be surprised that the actual realized utility in a future period is lower than what was predicted. The reason, of course, is that the actual reference level is higher than anticipated. Actual happiness associated with higher levels of consumption may be much lower than what was hoped for. This gap may motivate the person to work even harder to increase his income in the hope of improving happiness. But this chase for happiness through higher and higher consumption is futile, as the reference level keeps on increasing. To formalize these ideas, let $\tau$ be the current period. The actual and predicted reference levels for a subsequent period $t$ are $r_t$ and $\hat r_{\tau,t}$, respectively. Now, $\hat r_{\tau,t} = \pi r_\tau + (1 - \pi) r_t$, where $r_t$ follows the dynamics governed by (2) and (3). The actual utility is determined by the chosen consumption plan according to the adaptation–social comparison model; however, the chosen consumption plan might not be optimal under projection bias. The reason is that, in period $\tau$, the individual will maximize the predicted utility at $\tau$, given by:
$$\hat V_\tau(x_\tau, x_{\tau+1}, \ldots, x_T \mid r_\tau, \pi) = \sum_{t=\tau}^{T} v(x_t - \hat r_{\tau,t}). \qquad (13)$$
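Once a consumption plan and a path of actual reference levels are given, the predicted utility in (13) can be evaluated mechanically. The Python sketch below is illustrative only. The value function `v` is a hypothetical stand-in (concave for gains, steeper for losses). The reference-level update `r_{t+1} = alpha * x_t + (1 - alpha) * r_t`, with no social comparison (σ = 0), is an assumed simplification of the chapter's equations (2) and (3), which are defined earlier. The sketch compares predicted and actual utility for a constant plan of x = 10 units over T = 10 periods, starting from $r_1 = 0$.

```python
import math

def v(d):
    """Hypothetical value function (an assumption, not the chapter's v):
    concave for gains, steeper for losses."""
    return math.sqrt(d) if d >= 0 else -2.0 * math.sqrt(-d)

def actual_references(plan, alpha, r_first=0.0):
    """Actual reference levels r_1..r_T under the assumed update
    r_{t+1} = alpha * x_t + (1 - alpha) * r_t (sigma = 0)."""
    refs = [r_first]
    for x in plan[:-1]:
        refs.append(alpha * x + (1 - alpha) * refs[-1])
    return refs

def predicted_utility(plan, r_current, r_actual, pi):
    """Eq. (13): sum over t of v(x_t - rhat_{tau,t}), where the predicted
    reference level is rhat_{tau,t} = pi * r_tau + (1 - pi) * r_t."""
    return sum(v(x - (pi * r_current + (1 - pi) * r))
               for x, r in zip(plan, r_actual))

def actual_utility(plan, r_actual):
    """Utility actually experienced along the actual reference path."""
    return sum(v(x - r) for x, r in zip(plan, r_actual))

plan = [10.0] * 10                          # constant consumption, T = 10
refs = actual_references(plan, alpha=1.0)   # [0, 10, 10, ..., 10]
print(round(predicted_utility(plan, refs[0], refs, pi=1.0), 2))  # 31.62
print(round(actual_utility(plan, refs), 2))                       # 3.16
```

With extreme bias (π = 1) the person predicts v(x) in every period. With α = 1, however, the actual utility from Period 2 onward is v(0), which matches the extreme case α = π = 1 discussed in the text. Setting π = 0 makes the two totals coincide.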
The difference between actual and predicted utility can be demonstrated by a simple example. Suppose a person plans a constant consumption of $x$ units per period. In the first period, the utility realized is $v(x)$ if $r_1 = 0$. If his projection bias is extreme ($\pi = 1$), he will predict no changes in reference levels, $\hat r_{1,2} = \hat r_{1,3} = \cdots = \hat r_{1,T} = 0$, and a utility of $v(x)$ for the second and remaining periods. But the actual reference level $r_2$ in Period 2 will be greater than 0 for any $\alpha > 0$, and it will be $x$ for $\alpha = 1$. Thus, the actual utility will be between $v(0)$ and $v(x)$ for any $\alpha > 0$, $\pi > 0$. The gap between the predicted and actual utility for Period 2 onward will be $v(x) - v(0)$ for the extreme case of $\alpha = \pi = 1$. This is the sort of dilemma lottery winners face. Because of projection bias, they overrate the happiness their new situation will bring, and so predict more happiness than they actually realize. We now consider consumption planning under projection bias. We set $\pi = 0.5$, fix the budget at $I = 100$, assume no initial adaptation ($a_1 = 0$), and set the social comparison level to $S = 10$. A person with projection bias maximizes (13) at $\tau = 1$. He obtains $(\hat x_{1,1}, \hat x_{1,2}, \ldots, \hat x_{1,T})$ as the optimal plan, where $\hat x_{\tau,t}$ is the consumption at time $t$ as planned in period $\tau$. The first row of Table 3 shows this Period 1 consumption plan. This person implements $\hat x_{1,1} = 5.8$ and now solves (13) again with a reduced budget of $I - \hat x_{1,1} = 94.2$. The solution now gives $(\hat x_{2,2}, \hat x_{2,3}, \ldots, \hat x_{2,T})$, and $\hat x_{2,2} = 8.5$ is implemented (second row of Table 3). Note that the consumption in Period 2
is revised upward from 8.1 (Period 1 plan for Period 2) to 8.5. Upon reaching Period 3, the person realizes that the actual reference level is higher than what he had thought earlier, so he optimizes again with this new information. The available budget is now $I - \hat x_{1,1} - \hat x_{2,2} = 85.8$. By repeatedly solving (13), we obtain $(\hat x_{1,1}, \hat x_{2,2}, \ldots, \hat x_{T,T})$.
Table 3. Revision of consumption plans under projection bias [α = 1; σ = 0.2; π = 0.5; S = 10; I = 100].
τ \ t                          1     2     3     4     5     6     7     8     9     10    Budget available
1                              5.8   8.1   9.0   9.4   9.6   9.7   9.8   10.2  11.4  17.0  100
2                              –     8.5   9.5   10.0  10.1  10.2  10.3  10.5  11.1  13.9  94.2
3                              –     –     9.7   10.2  10.4  10.5  10.6  10.7  11.0  12.5  85.8
4                              –     –     –     10.3  10.6  10.7  10.8  10.8  11.0  11.8  76.0
5                              –     –     –     –     10.6  10.8  10.8  10.9  11.0  11.6  65.7
6                              –     –     –     –     –     10.8  10.9  10.9  11.0  11.4  55.1
7                              –     –     –     –     –     –     10.9  10.9  11.0  11.4  44.3
8                              –     –     –     –     –     –     –     11.0  11.0  11.4  33.4
9                              –     –     –     –     –     –     –     –     11.1  11.4  22.4
10                             –     –     –     –     –     –     –     –     –     11.4  11.4
Actual $\hat x_{\tau,\tau}$    5.8   8.5   9.7   10.3  10.6  10.8  10.9  11.0  11.1  11.4  –
Optimal $x^*_t$                2.5   4.5   6.1   7.5   8.7   9.8   10.9  12.3  14.7  23.1  100
Note that the person with projection bias is forward looking and does plan optimally, except that he uses his predicted reference levels in arriving at the consumption plan. A consequence of such a plan is that he may overconsume in early periods if he underestimates changes in future reference levels. So by an intermediate period, he has used up a lot more of his budget than he would have had he predicted reference levels accurately. The projection bias consumption plan is therefore flatter than the optimal consumption plan under no projection bias. In Table 3, the projection bias plan (actual $\hat x_{\tau,\tau}$) is compared to the optimal plan ($x^*_t$). As expected, the person overconsumes in early periods compared to the optimal plan. Under projection bias, the actual total utility (8.4) is lower than the optimal total utility (11.7), and much lower than the predicted total utility (21.1) in Period 1. Figure 10 shows the predicted and actual total utilities for different levels of income. The difference between predicted and actual utility increases significantly as the projection bias increases from π = 0.5 to π = 1. It is clear that people think that more money will buy them a lot more happiness than it actually does.
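The two bottom rows of Table 3 let us check directly that the biased planner has used up more of the budget by an intermediate period. The short Python sketch below accumulates those published figures. It does not re-solve the planning problem.

```python
from itertools import accumulate

# Bottom rows of Table 3 (alpha = 1, sigma = 0.2, pi = 0.5, S = 10, I = 100)
ACTUAL_BIASED = [5.8, 8.5, 9.7, 10.3, 10.6, 10.8, 10.9, 11.0, 11.1, 11.4]
OPTIMAL = [2.5, 4.5, 6.1, 7.5, 8.7, 9.8, 10.9, 12.3, 14.7, 23.1]

def cumulative_spending(plan):
    """Budget used up by the end of each period, rounded to one decimal."""
    return [round(total, 1) for total in accumulate(plan)]

biased = cumulative_spending(ACTUAL_BIASED)
optimal = cumulative_spending(OPTIMAL)
for period, (b, o) in enumerate(zip(biased, optimal), start=1):
    print(f"period {period:2d}: biased {b:5.1f}  optimal {o:5.1f}")
```

After five periods the biased planner has spent about 44.9 of the 100-unit budget, compared with about 29.3 under the optimal plan. Both rows total 100.1 rather than 100 only because the tabulated values are rounded.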
[Figure 10: two panels plotting Total Predicted Utility, Total Optimal Utility, and Total Actual Utility (vertical axis, 0 to 30) against Income (0 to 100). Left panel: α = 1; σ = 0.2; π = 0.5; S = I/10. Right panel: α = 1; σ = 0.2; π = 1; S = I/10.]
Fig. 10. Utility of income under projection bias.
If we are wired to underestimate adaptation, then there is little we can do about our incorrect predictions of future states. But we can at least be forward looking and account for the effects of current consumption on future utility. A myopic planner who uses the heuristic of constant consumption will realize even less total utility and suffer from a bigger gap between predicted and realized happiness [20]. Ironically, the DU model with no discounting prescribes this same flat consumption plan as optimal. A person who follows it will realize much less happiness than predicted and experience a great deal of disappointment.
6 Happiness and Budget Allocation
To gain further insight into the relationship between happiness and income, consider a simple model in which one allocates a fixed budget between two goods. The first good is an adaptive good, whereas the second good is a basic good for which the reference level remains constant. The overall utility is additively separable between the two goods. The optimization problem is:
$$\max \sum_{t=1}^{T} \left[ w\, v(x^a_t - r^a_t) + (1 - w)\, v(x^b_t - r^b) \right] \quad \text{s.t.} \quad \sum_{t=1}^{T} \left( x^a_t + x^b_t \right) \le I,$$
where $r^a_t$ is determined by the usual updating equation. If $w = 1/2$, then the adaptive good A will provide less utility because reference levels increase due to past consumption. The basic good B provides greater utility throughout as long as consumption is above its constant reference level ($r^b$).
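For a given allocation, the objective above can be evaluated period by period. The Python sketch below is a hypothetical illustration that scores a plan rather than optimizing it. Two pieces are assumptions standing in for the chapter's definitions: the value function `v`, and the update `r^a_{t+1} = alpha * x^a_t + (1 - alpha) * r^a_t` for the adaptive good, with no social comparison.

```python
import math

def v(d):
    """Hypothetical value function: concave for gains, steeper for losses."""
    return math.sqrt(d) if d >= 0 else -2.0 * math.sqrt(-d)

def two_good_utility(xa, xb, w, rb, budget, alpha=1.0, ra_first=0.0):
    """Score a plan for adaptive good A (xa) and basic good B (xb).

    A's reference level adapts to past consumption (assumed update);
    B's reference level stays fixed at rb.
    """
    if len(xa) != len(xb):
        raise ValueError("xa and xb must cover the same periods")
    if sum(xa) + sum(xb) > budget + 1e-9:
        raise ValueError("plan exceeds the budget I")
    total, ra = 0.0, ra_first
    for a_t, b_t in zip(xa, xb):
        total += w * v(a_t - ra) + (1 - w) * v(b_t - rb)
        ra = alpha * a_t + (1 - alpha) * ra  # A's reference level rises
    return total

# Two periods, w = 1/2, rb = 4, alpha = 1
print(two_good_utility([4, 4], [8, 8], w=0.5, rb=4, budget=100))  # 3.0
```

In this small example, good A contributes w·v(4) in the first period but nothing in the second, because its reference level has risen to 4. Good B keeps contributing (1 − w)·v(4) in both periods.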
[Figure 11: (a) Optimal and actual consumption of adaptive good A by period (1 to 10), with α1 = 1; σ = 0; π = 0.5; w = 0.67. (b) Optimal and actual consumption of basic good B by period, with α2 = 0; σ = 0; π = 0.5; 1 − w = 0.33. Vertical axes run from 0 to 10.]
Fig. 11. Consumption of adaptive (A) and basic (B) goods under projection bias. The reference level for good B is set to rb = 4.
In Figure 11, the optimal allocation of a fixed budget of I = 100 is compared to the allocation that results from projection bias. In this example, we set $r^b = 4$, meaning that a per-period consumption of at least 4 units of the basic good is required to experience positive utility. For the optimal allocation, the adaptive good A receives a low allocation in early periods to keep reference levels under control. The consumption plan for good A is increasing over time (see Figure 11a). The basic good B, in contrast, receives a constant allocation of about 7 units per period (see Figure 11b). Under projection bias, the person overconsumes the adaptive good A in early periods, which raises the reference levels for later periods. To keep up with the increased reference levels of good A, more and more of the budget is allocated to it at the expense of the basic good B. The total utility under projection bias is 8.2 units, compared with 12 units under optimal planning.
Table 4. Fraction of budget allocated to basic good B.
Income                        40   50   60   70   80   90   100
Optimal [%]                   94   89   82   77   73   70   68
Under Projection Bias [%]     91   82   70   62   56   51   47
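Converting the percentages in Table 4 into units of the basic good shows how the shortfall under projection bias grows with income. The Python sketch below only rearranges the tabulated figures. The per-period average at the end assumes the budget is spread over the 10 periods plotted in Figure 11.

```python
# Table 4: percent of the total budget I allocated to the basic good B
INCOMES = [40, 50, 60, 70, 80, 90, 100]
OPTIMAL_PCT = [94, 89, 82, 77, 73, 70, 68]
BIASED_PCT = [91, 82, 70, 62, 56, 51, 47]

def basic_good_units(incomes, pcts):
    """Total units of good B implied by each percentage allocation."""
    return [round(i * p / 100, 1) for i, p in zip(incomes, pcts)]

def shortfall(incomes, optimal_pct, biased_pct):
    """Units of good B forgone under projection bias at each income level."""
    optimal = basic_good_units(incomes, optimal_pct)
    biased = basic_good_units(incomes, biased_pct)
    return [round(o - b, 1) for o, b in zip(optimal, biased)]

print(shortfall(INCOMES, OPTIMAL_PCT, BIASED_PCT))
# -> [1.2, 3.5, 7.2, 10.5, 13.6, 17.1, 21.0]

# Average optimal allocation to B per period at I = 100, assuming T = 10
print(basic_good_units([100], [68])[0] / 10)  # 6.8
```

At I = 100, the biased planner allocates 21 fewer units to the basic good (47 versus 68). The optimal average of 6.8 units per period is consistent with the "about 7 units per period" shown in Figure 11b.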
We can perform this analysis for several levels of income. Table 4 shows the relationship between income I and the percentage of income allocated to the basic good B. Under the optimal plan, the percentage allocated to the basic good decreases as income increases; this is also the case under projection bias. Under projection bias, however, a far smaller percentage of income (compared to the optimal plan) is allocated to the basic good. As shown in Figure 12, the net result is that the actual realized utility at every income level is lower. This misallocation is even greater at higher levels of income, as realized utility becomes flatter as income increases. No one would shed a tear if the rich realize less total utility because they overspend on fancy cars, luxury houses, or expensive hotels. A consequence of projection bias, however, is that even in poorer segments of society a greater-than-optimal allocation is made to addictive goods such as alcohol, drugs, and lottery tickets, thereby leaving less of the budget for basic goods such as nutritious food and hygiene.
[Figure 12: Total Optimal Utility and Total Utility under Projection Bias (vertical axis, 0 to 15) plotted against Income (40 to 100), with σ = 0; π = 0.5; w = 0.67.]
Fig. 12. Utility of income for two goods under projection bias.
7 Conclusions
In this chapter, we have proposed a model of adaptation and social comparison for valuing time streams of consumption. This model explains two widely observed empirical findings in the well-being literature. The first empirical
finding is that within a society richer people are happier than poorer ones. The second finding is that for a given country average well-being has not improved over time in spite of large gains in per capita income. The second finding is not universal: in some countries (e.g., Italy and Denmark) average well-being has improved, although in the majority of countries, including the United States, there has not been an appreciable increase in average well-being. At the individual level, lottery winners who had won from $50,000 to $1,000,000 within the previous year rated their well-being at an average of 4 points on a 5-point scale, barely above the 3.8 points of a control group. Furthermore, these lottery winners rated daily activities as less pleasurable than the control subjects did. This finding is dramatic and counterintuitive, as most people believe that they will be happier if they win the lottery or even obtain a 20% raise in income. We therefore posed a slight modification of the Easterlin puzzle: why do people believe that more money will buy more happiness when in fact it does not? We show that under projection bias this puzzle is resolved, as a person will predict much more happiness than she will actually realize because she fails to account for the changes in reference levels that accompany higher levels of consumption.
Finally, we show that a greater emphasis on basic goods, rather than adaptive goods, will improve happiness. Basic goods include food, shelter, sleep, friendship, spiritual activities, and so on. Great discipline is required, however, to give adequate importance to basic goods. Projection bias will divert resources from basic goods toward adaptive goods even under rational planning. It might be interesting to examine whether activities that provide a better perspective on life (meditation or other spiritual practices) would be able to reduce projection bias in some cases.
Acknowledgments
The authors are thankful to Steven Lippman for helpful suggestions.
References
1. M. Baucells and R. Sarin. Predicting utility under satiation and habituation. Technical report, IESE Business School, University of Navarra, Barcelona, Spain, 2006.
2. M. Baucells and R. Sarin. Satiation in discounted utility. Operations Research, 55:170–181, 2007.
3. P. Brickman, D. Coates, and R. Janoff-Bulman. Lottery winners and accident victims: Is happiness relative? Journal of Personality and Social Psychology, 36:917–927, 1978.
4. A. E. Clark. Job satisfaction in Britain. British Journal of Industrial Relations, 34:189–217, 1996.
5. R. Davidson and Colleagues. Alterations in brain and immune function produced by mindfulness meditation. Psychosomatic Medicine, 65:564–570, 2003.
6. R. Davidson, D. Jackson, and N. Kalin. Emotion, plasticity, context, and regulation: Perspectives from affective neuroscience. Psychological Bulletin, 126:890–906, 2000.
7. J. A. Davis, T. W. Smith, and P. V. Marsden. General Social Survey, 1972–2000, Cumulative Codebook. Roper Center for Public Opinion Research, Storrs, CT, 2001.
8. E. Diener and W. Tov. National subjective well-being indices: An assessment. In K. C. Land, editor, Encyclopedia of Social Indicators and Quality-of-Life Studies. Springer, New York, 2005.
9. R. Di Tella and R. MacCulloch. Some uses of happiness data in economics. Journal of Economic Perspectives, 20(1):25–46, 2006.
10. R. A. Easterlin. Does economic growth improve the human lot? Some empirical evidence in nations and households. In P. A. David and M. W. Reder, editors, Economic Growth: Essays in Honor of Moses Abramovitz, pages 98–125. Academic Press, New York, 1974.
11. R. A. Easterlin. Will raising the incomes of all increase the happiness of all? Journal of Economic Behavior and Organization, 27:35–48, 1995.
12. R. A. Easterlin. Income and happiness: Towards a unified theory. Economic Journal, 111:465–484, 2001.
13. R. H. Frank. Choosing the Right Pond. Oxford University Press, New York, 1985.
14. R. H. Frank. The frame of reference as a public good. The Economic Journal, 107(445):1832–1847, 1997.
15. R. H. Frank. Luxury Fever: Why Money Fails to Satisfy in an Era of Excess. Free Press, New York, 1999.
16. S. Frederick and G. Loewenstein. Hedonic adaptation. In D. Kahneman, E. Diener, and N. Schwarz, editors, Well Being: The Foundation of Hedonic Psychology, pages 302–329. Russell Sage, New York, 1999.
17. B. S. Frey and A. Stutzer. Happiness and Economics. Princeton University Press, Princeton, NJ, 2002.
18. B. S. Frey and A. Stutzer. What can economists learn from happiness research? Journal of Economic Literature, 40(2):402–435, 2002.
19. D. Gilbert. Stumbling on Happiness. Knopf, New York, 2006.
20. R. J. Herrnstein and D. Prelec. A theory of addiction. In G. F. Loewenstein and J. Elster, editors, Choice Over Time. Russell Sage Foundation, New York, 1992.
21. R. Inglehart and Colleagues. World Values Surveys and European Values Surveys, 1981–84, 1990–93, 1995–97. Institute for Social Research, Ann Arbor, MI, 2000.
22. R. Inglehart and H.-D. Klingemann. Genes, culture, democracy, and happiness. In E. Diener and E. M. Suh, editors, Culture and Subjective Well-Being. MIT Press, Cambridge, MA, 2000.
23. D. Kahneman, E. Diener, and N. Schwarz, editors. Well Being: The Foundation of Hedonic Psychology. Russell Sage Foundation, New York, 1999.
24. D. Kahneman and A. B. Krueger. Developments in the measurement of subjective well-being. Journal of Economic Perspectives, 20(1):3–24, 2006.
25. D. Kahneman, A. B. Krueger, D. A. Schkade, N. Schwarz, and A. A. Stone. Would you be happier if you were richer? A focusing illusion. Science, 312(30):1776–1780, 2006.
26. D. Kahneman and D. T. Miller. Norm theory: Comparing reality to its alternatives. Psychological Review, 93(2):136–153, 1986.
27. D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–291, 1979.
28. D. Kahneman, P. P. Wakker, and R. K. Sarin. Back to Bentham? Explorations of experienced utility. The Quarterly Journal of Economics, 112(2):375–405, 1997.
29. T. C. Koopmans. Stationary ordinal utility and impatience. Econometrica, 28(2):287–309, 1960.
30. R. Layard. Happiness: Lessons from a New Science. The Penguin Press, London, UK, 2005.
31. H. S. Lepper. Use of other-reports to validate subjective well-being measures. Social Indicators Research, 44:367–379, 1998.
32. G. Loewenstein, T. O’Donoghue, and M. Rabin. Projection bias in predicting future utility. The Quarterly Journal of Economics, 118(3):1209–1248, 2003.
33. G. Loewenstein, D. Read, and R. Baumeister, editors. Time and Decision. Russell Sage Foundation, New York, 2003.
34. G. Loewenstein and D. A. Schkade. Wouldn’t it be nice: Predicting future feelings. In D. Kahneman, E. Diener, and N. Schwarz, editors, Well Being: The Foundation of Hedonic Psychology, pages 85–108. Russell Sage, New York, 1999.
35. V. Medvec, S. Madey, and T. Gilovich. When less is more: Counterfactual thinking and satisfaction among Olympic medalists. Journal of Personality and Social Psychology, 69:603–610, 1995.
36. D. Morawetz. Income distribution and self-rated happiness: Some empirical evidence. Economic Journal, 87:511–522, 1977.
37. R. E. Nisbett and D. E. Kanouse. Obesity, hunger, and supermarket shopping behavior. Proceedings of the Annual Convention of the American Psychological Association, 3:683–684, 1968.
38. A. Parducci. Happiness, Pleasure, and Judgment: The Contextual Theory and its Applications. Erlbaum, Hove, UK, 1995.
39. W. Pavot and E. Diener. The affective and cognitive context of self-reported measures of subjective well-being. Social Indicators Research, 28(1):1–20, 1993.
40. R. Pollak. Habit formation and dynamic demand functions. Journal of Political Economy, 78:745–763, 1970.
41. H. E. Ryder and G. M. Heal. Optimal growth with intertemporally dependent preferences. Review of Economic Studies, 40:1–33, 1973.
42. P. Samuelson. A note on measurement of utility. Review of Economic Studies, 4:155–161, 1937.
43. D. A. Schkade and D. Kahneman. Does living in California make people happy? A focusing illusion in judgments of life satisfaction. Psychological Science, 9(5):340–346, 1998.
44. D. M. Smith, K. M. Langa, M. V. Kabeto, and P. A. Ubel. Health, wealth, and happiness. Psychological Science, 16(9):663–666, 2005.
45. S. Solnick and D. Hemenway. Is more always better? A survey on positional concerns. Journal of Economic Behavior and Organization, 37:373–383, 1998.
46. A. Stutzer. The role of income aspirations in individual happiness. Journal of Economic Behavior and Organization, 54:89–109, 2003.
47. B. M. S. vanPraag and A. Ferrer-i-Carbonell. Happiness Quantified: A Satisfaction Calculus Approach. Oxford University Press, Oxford, UK, 2004.
48. B. M. S. vanPraag and P. Frijters. The measurement of welfare and well-being: The Leyden approach. In D. Kahneman, E. Diener, and N. Schwarz, editors, Well Being: The Foundation of Hedonic Psychology, pages 413–433. Russell Sage, New York, 1999.
49. L. Wathieu. Habits and the anomalies in intertemporal choice. Management Science, 43(11):1552–1563, 1997.
50. L. Wathieu. Consumer habituation. Management Science, 50(5):587–596, 2004.
Cognitive Biases Affect the Acceptance of Tradeoff Studies
Eric D. Smith (1), Massimo Piattelli-Palmarini (2), and A. Terry Bahill (3)
(1) Department of Engineering Management and Systems Engineering, University of Missouri at Rolla, MO 65409, USA
(2) Cognitive Science Program and Department of Management and Policy, University of Arizona, Tucson, AZ 85721, USA
(3) Systems and Industrial Engineering, University of Arizona, Tucson, AZ 85721, USA
Summary. Tradeoff studies involving human subjective calibration and data updating are often distrusted by decision makers. A review of objectivity and subjectivity in decision making confirms that prospect theory is a good model for actual human decision making. Relationships between tradeoff studies and the elements of experiments in judgment and decision making show that tradeoff studies are susceptible to human cognitive biases. Examples of relevant biases are given. Knowledge of these biases should help give decision makers more confidence in tradeoff studies.
1 Introduction
Tradeoff studies provide a rational method for improving choice among alternatives. Tradeoff studies involve a quantitative consideration of all aspects of the decision, considering all evaluation criteria of the alternatives simultaneously. Without tradeoff studies, humans usually consider alternatives serially, and often fixate on one or a few less-than-optimal criteria. Tradeoff studies are broadly recognized and mandated as the method for simultaneously considering many criteria and many alternatives. They are the primary method for choosing among alternatives given in the Decision Analysis and Resolution process [9] of the Software Engineering Institute’s Capability Maturity Model Integration [7]. However, a 1999 INCOSE (International Council on Systems Engineering) International Symposium tutorial [12] reveals a very different truth:
Over the past few years Bahill has worked with several major aerospace companies . . . and asked for examples of tradeoff analyses that had been done there. He has not yet found one.
Given that multicriterion decision analysis techniques (such as tradeoff studies) are mandated for rational decision making, why do so few decision
228
Eric D. Smith et al.
makers use them? Perhaps because (1) they seem complicated, (2) different techniques have given different preferred alternatives, (3) different life experiences produce different preferred alternatives, and (4) people do not think that way. The goal of this chapter is to inform the reader about common and ever-present biases that produce some of this variability. This variability of results hinders proactive use of tradeoff studies in industry, where tradeoff studies are often written only when required, such as in a project proposal presentation.
This chapter is organized as follows. It starts with a description of tradeoff studies and then describes a dozen biases that affect human decision making. (In this chapter, for simplicity, we group together cognitive illusions, biases in the canonical definition, and the use of heuristics, and call them collectively biases.) The objectivity and subjectivity section of this chapter discusses rational decision making and presents some descriptive models of how humans actually make decisions. The next section presents a dozen biases that especially affect tradeoff studies. Finally, the discussion section suggests how humans can make better decisions if they understand cognitive biases.
2 Components of a Tradeoff Study
Problem statement. Problem stating is often more important than problem solving. The problem statement describes the scope of the problem and the key decisions that must be made.
Evaluation criteria are derived from high-priority tradeoff requirements. Each alternative will be given a value that indicates the degree to which it satisfies each criterion. This should help distinguish between alternatives.
Weights of importance. The decision maker should assign weights to the criteria so that the more important ones will have more effect on the outcome.
Alternative solutions must be proposed and evaluated. Investigation of a broad range of alternatives increases the probability of success of a project and also helps to get the requirements right.
Evaluation data can come from approximations, product literature, analysis, models, simulations, experiments, and prototypes. Evaluation data are measured in natural units and indicate the degree to which each alternative satisfies each criterion.
Scoring functions (utility curves) transform the criteria evaluation data into normalized scores. The shapes of scoring functions should ideally be determined objectively, but usually subjective expert opinion is involved in their preparation. A scoring function package should be created by a team of engineers and re-evaluated with the customer with each use [8,51].
Scores. The numerically normalized 0 to 1 scores obtained from the criteria scoring functions are easy to work with. Assuming that the weights of importance are also normalized, combining these scores leads to a rankable set of scores for the alternatives that preserves the normalized 0 to 1 range.
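The Python sketch below gives a rough illustration of the scoring and weighting steps. It maps evaluation data in natural units to a 0 to 1 score with a simple linear, clamped scoring function, then combines the scores with normalized weights. The linear shape, the thresholds, and the example criteria are all hypothetical. As the text notes, real scoring-function shapes usually involve expert judgment.

```python
def linear_score(value, worst, best):
    """Hypothetical linear scoring function: maps a value in natural units
    to [0, 1], with `worst` -> 0 and `best` -> 1, clamped outside that range.
    Pass worst > best for 'less is better' criteria such as cost."""
    if worst == best:
        raise ValueError("worst and best must differ")
    s = (value - worst) / (best - worst)
    return min(1.0, max(0.0, s))

def weighted_rating(weights, scores):
    """Weighted sum of 0-1 scores; the weights are normalized to sum to 1,
    so the rating also stays in [0, 1]."""
    total = sum(weights)
    return sum((w / total) * s for w, s in zip(weights, scores))

# Hypothetical criteria: speed in m/s (more is better), cost in $ (less is better)
speed = linear_score(7.0, worst=5.0, best=10.0)     # 0.4
cost = linear_score(30.0, worst=50.0, best=10.0)    # 0.5
print(round(weighted_rating([2, 1], [speed, cost]), 3))  # 0.433
```

Because the weights are normalized inside `weighted_rating`, raw importance values such as 2 and 1 can be entered directly, and the rating still stays in the 0 to 1 range.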
Cognitive Biases and Tradeoff Studies
229
Combining functions. The weights and scores must be combined in order to select the preferred alternatives [8]. The most common combining functions are:
Sum Combining Function = $x + y$
Product Combining Function = $x \times y$
Sum Minus Product Combining Function = $x + y - x \times y$
Compromise Combining Function = $\left[x^p + y^p\right]^{1/p}$.
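The four combining functions can be written directly for two normalized scores x and y. The Python sketch below implements them exactly as listed, without weights, because the text does not say how weights enter each function. The compromise exponent is p, and the example inputs are hypothetical.

```python
def sum_combining(x, y):
    return x + y

def product_combining(x, y):
    return x * y

def sum_minus_product_combining(x, y):
    return x + y - x * y

def compromise_combining(x, y, p):
    """[x^p + y^p]^(1/p); p = 1 gives the sum, and large p approaches
    max(x, y) for nonnegative scores."""
    if p <= 0:
        raise ValueError("p must be positive")
    return (x ** p + y ** p) ** (1.0 / p)

x, y = 0.6, 0.5  # hypothetical normalized scores
for name, value in [
    ("sum", sum_combining(x, y)),
    ("product", product_combining(x, y)),
    ("sum minus product", sum_minus_product_combining(x, y)),
    ("compromise, p = 2", compromise_combining(x, y, 2)),
]:
    print(f"{name:18s} {value:.3f}")
```

The unweighted sum can exceed 1 (here 1.1). By contrast, the product and the sum minus product, which equals 1 − (1 − x)(1 − y), stay within [0, 1] whenever both inputs do.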
One must be careful to choose a combining function appropriate to the situation.
Preferred alternatives should arise from the impartial parallel consideration of the scores for the evaluation criteria. The alternative ratings will allow a ranking of alternatives. Care must be taken, however, to eliminate human tendencies that draw the study to a result that is merely subjectively preferred.
Sensitivity analysis identifies the most important parameters in a tradeoff study. In a sensitivity analysis, you change a parameter or an input value and measure changes in outputs or performance indices. A sensitivity analysis of the tradeoff study is imperative.
The tradeoff study components. Evaluation criteria are derived from a problem statement, and possible alternatives are selected. Evaluation data for each evaluation criterion are normalized with a scoring function and combined according to the weights of importance and combining functions, yielding a rating for each alternative. A sensitivity analysis is conducted to determine robustness, and a list of preferred alternatives is written. The relationships of these components are shown in Figure 1.
Fig. 1. Components of a tradeoff study.
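The sensitivity-analysis step can take many forms. The Python sketch below shows one simple version that changes one weight at a time. Each weight is perturbed by a relative amount and the weights are renormalized. The sketch then reports which perturbations change the top-ranked alternative under a weighted-sum combining function. The alternatives, scores, and weights are hypothetical, and this is not presented as the authors' own procedure.

```python
def rating(weights, scores):
    """Weighted-sum rating of one alternative's 0-1 scores."""
    return sum(w * s for w, s in zip(weights, scores))

def best_alternative(weights, alternatives):
    """Name of the highest-rated alternative."""
    return max(alternatives, key=lambda name: rating(weights, alternatives[name]))

def weight_sensitivity(weights, alternatives, delta):
    """Perturb each weight by +/- delta (relative), renormalize, and list the
    perturbations (criterion index, direction, new winner) that change the winner."""
    baseline = best_alternative(weights, alternatives)
    flips = []
    for i in range(len(weights)):
        for sign in (1, -1):
            perturbed = list(weights)
            perturbed[i] *= 1 + sign * delta
            total = sum(perturbed)
            perturbed = [w / total for w in perturbed]
            winner = best_alternative(perturbed, alternatives)
            if winner != baseline:
                flips.append((i, sign, winner))
    return baseline, flips

# Hypothetical tradeoff matrix: scores of two alternatives on three criteria
weights = [0.5, 0.3, 0.2]
alternatives = {"A": [0.9, 0.3, 0.5], "B": [0.6, 0.8, 0.6]}
print(weight_sensitivity(weights, alternatives, delta=0.1))  # ('B', [])
print(weight_sensitivity(weights, alternatives, delta=0.5))
```

With ±10% changes, the winner (B) is stable. With ±50% changes, raising the first weight or lowering the second hands the top rank to A, which flags those two weights as the ones most worth scrutinizing.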
The Pinewood Derby Tradeoff Study is a real-world tradeoff study that also serves as a reference. It has been implemented in Excel with a complete sensitivity analysis, and is available at [34].
3 Inevitable Illusions and Biases
When examining human decision making, the presence of cognitive biases and irrationalities can never be ruled out. (Note: In this chapter, we use the term rational as it is used in the field of decision analysis, meaning agents or decision methods that are guided by the aim of maximizing expected value. Our use of the term irrational is therefore technical and does not reflect on any person’s ability to reason, or sanity.) The universality of biases in human decision making is the central tenet of the “heuristics and biases” research program inaugurated and developed by Kahneman and Tversky, which produced the now widely accepted prospect theory [27] to explain how people respond to choices made under risk and uncertainty. Deeply ingrained human biases and predictable misinterpretations of certain classes of external circumstances, such as the framing of choices, can compromise the rationality of decisions and plans. It is important to note that subjects maintain a strong sense that they are acting rationally while exhibiting these biases. One of us (Piattelli-Palmarini), in the book Inevitable Illusions, introduces the subject this way:
The current term for these biases is “cognitive” illusions, to indicate that they exist quite apart from any subjective, emotional illusion and from any other such habitual, classical, irrational distortion by a particular subject. The pages that follow provide ample documentation, and contain suggestions as to how we may take urgent and sensible precautions against these illusions. It never ceases to surprise me that, more or less 20 years after these illusions were first discovered, and after dozens of books and hundreds of articles have been printed on the subject of cognitive illusions, almost no one except for a select circle of specialists seems to have taken this discovery seriously. In simple and basic fashion, this book proposes to set out the recent scientific discovery of an unconscious . . . that always and unbeknownst to us involves the cognitive; that is, the world of reason, of judgment, of the choices to be made among different opportunities, of the difference between what we consider probable and what we consider unlikely [32].
It is therefore proposed that tradeoff studies be re-examined with the goal of finding subtle biases and cognitive illusions. Below is a summary of common biases and irrationalities that may intervene, unbeknownst to the decider, in what is presumed to be rational decision making.
Because of cognitive illusions, biases, fallacies, and the use of heuristics, humans often make mistakes in doing tradeoff studies. Smith [43] discusses seven dozen biases from the psychology, decision-making, and experimental economics literature that can induce such mistakes. Many of these are also mentioned in [3,32,40] and in popular Web sites such as [29]. A matrix of relations between cognitive biases and tradeoff study elements is available at [34]. 3.1 Judgment and Decision Making The rich scientific literature on judgment and decision making details many cognitive biases, irrationalities, and inconsistencies in the way humans make judgments and decisions. In light of this, it may be cost effective to train decision-making personnel specifically in ways that reduce intangible cognitive biases. Training brings standardization and broadens decision-making abilities, breaking the situation where, “We are prisoners of our own experience.” Ambiguity Aversion, Comparative Ignorance, and Severity Amplifiers The probability of someone accepting a bet can be lowered by the presence of nonrelevant ambiguous material, as in the Ellsberg paradox [16,17]. The decision to play a gamble can be influenced by the perceived intelligence of counterparts, even while the probability of winning remains the same [18,19]. Lack of control, lack of choice, lack of trust, lack of warning, lack of understanding, framing by man, newness, dreadfulness, personalization, and immediacy all amplify perceived severity [1,2]. Confirmation Bias Humans will often try out several means to prove that their current favorite hypothesis is correct, until all efforts fail, and only then will they seriously consider the next hypothesis [50]. Shweder [42] showed a correlation between initially recorded data, and memories of subsequent data. 
The history of science offers many examples of this obstinacy, such as the obdurate and exaggerated application of circular motion to the planetary orbits before Kepler introduced ellipses in 1609 in his Astronomia Nova. Symmetrically, a disconfirmation bias occurs when people apply exaggerated critical scrutiny to information that contradicts their prior beliefs.
Individual Versus Group Decision Making
Individuals may be biased, but so may groups. Which is more biased? A prevalent thesis, backed by good data but by no means universally supported, is that groups are less easily biased than the individuals who compose them. However, there is no clear or general pattern. Social models such as Davis's social decision scheme [10] give good, albeit complicated, answers. In short, cognitive problems in decision making cannot always be solved by requiring group decision making.
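Davis's social decision scheme [10] models a group decision as a mapping from the distribution of members' individual preferences to a group choice. A minimal sketch, assuming a simple “majority wins” scheme, an odd-sized group, and members who judge independently, shows why no general pattern emerges: the probability that the group adopts a biased answer is a binomial tail, which moves away from one half in whichever direction the individuals already lean. The function group_bias_probability is a hypothetical name used only for this illustration.

```python
from math import comb


def group_bias_probability(p_individual: float, group_size: int) -> float:
    """Probability that a 'majority wins' group adopts the biased answer.

    Assumes an odd group size and members who choose independently, each
    favoring the biased answer with probability p_individual.
    """
    if group_size % 2 == 0:
        raise ValueError("use an odd group size so a strict majority always exists")
    majority = group_size // 2 + 1
    # Binomial tail: probability that at least `majority` members favor the bias.
    return sum(
        comb(group_size, k) * p_individual**k * (1 - p_individual) ** (group_size - k)
        for k in range(majority, group_size + 1)
    )


for p in (0.3, 0.5, 0.7):
    print(f"individual {p:.1f} -> five-person group {group_bias_probability(p, 5):.3f}")
```

For a five-person group the sketch gives about 0.84 when each member leans toward the biased answer with probability 0.7, but about 0.16 when that probability is 0.3. Under this assumed scheme, groups amplify a bias that most members share and suppress one that only a minority holds.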
Eric D. Smith et al.
3.2 Framing and Prospect Theory
This section and the next deal with objectively incorrect human subjective evaluations of value and probability. These evaluations are so ingrained that even experts giving their full attention find it difficult to distinguish cognitively driven evaluations from objective ones. In current untutored practice, the effects of cognitive evaluations are thus subtle and difficult to separate from objective assessment. Framing, in a popular sense, is the act of “placing a picture frame” within the full reality of the decision situation in order to reduce processing effort, increase mental assimilation, and simplify the decision.
Anchoring is a psychological term that refers to focusing on a reference point when making decisions. For example, a shopper looking for a dog may anchor on the fact that a puppy is a Labrador Retriever, and ignore temperament, health, and price. Anchoring and insufficient adjustment occurs when a decision maker chooses an anchor and then adjusts his or her judgments myopically with respect to it. For example, a person may anchor on the first new computer she sees and adjust insufficiently to consider fully the characteristics of other computers seen later.
Availability and Typicality
People employ an availability heuristic when they evaluate the likelihood or frequency of an event based on how quickly instances or associations come to mind. For example, people who have friends who smoke usually overestimate the proportion of smokers in the population. Typicality occurs when items with greater family resemblance to a category are judged more prevalent. Tversky and Kahneman [47] give this example of typicality. Subjects were told: imagine an individual X, who was selected at random from a large population. Estimate the following probability.
• Group 1: That X has already suffered at least one heart attack.
• Group 2: That X is over 50 years old and has already suffered at least one heart attack.
Subjects who were shown the Group 2 statement gave, on average, higher estimates than subjects who were shown the Group 1 statement, although the Group 2 event is a special case of the Group 1 event and therefore cannot be more probable.
Loss Aversion
Figure 2 shows that people prefer avoiding losses more strongly than acquiring gains. Would you rather get a 5% discount, or avoid a 5% surcharge? Most people would rather avoid the surcharge. The same change in price, framed differently, has a significant effect on consumer behavior. Economists consider this to be irrational. The asymmetry is important in the fields of marketing and finance.
A delay–speedup asymmetry occurs when people will pay more to get rid of a delay than they would to speed up the schedule by the same amount of time. The explanation for the asymmetry comes from the subjective value function of prospect theory shown in Figure 2. The delay–speedup asymmetry has obvious applications to schedule and risk requirements, which often refer to money available either for ameliorating a delay or for paying for a schedule speedup.
Fig. 2. Subjective worth versus numeric value according to prospect theory.
Loss/Gain Discounting Disparity
Losses are discounted less (and forgotten later) than gains, making humans susceptible to the “sunk costs effect,” a tendency to hold on to manifestly losing investments. This effect explains long-held grudges, and even mental illness stemming from an inability to forget past traumas. Remembering losses more than gains may be a good way to focus on learning from mistakes, but the practice is not part of an objective probabilistic stance in line with rational decision making. For example, project managers should not overemphasize aspects of a project related to a previous project failure.
Figure 2 also shows the reference point, whose importance is demonstrated in this example from Hammond et al. [25]. Would you accept a 50–50 chance of either losing $300 or winning $500? Many people would refuse this bet. Now let us change the reference point. Assume you have $3000 in your checking account and you are asked, “Would you prefer to keep your checking account balance of $3000 or accept a fifty-fifty chance of having either $2700 or $3500?” Most people would take this bet, although it is the same bet as before.
3.3 Subjective Probability and Illusions
To the consternation of believers in human probability judgments, one can take practically any of the axioms of probability, including the one that says 0 ≤ p ≤ 1, and design an experiment that shows that people's spontaneous intuitions violate it. There is always an appeal to “simple” explanations for probabilistic and other illusions. The data can allegedly be explained in terms of absent-mindedness, linguistic ambiguity, skewed implicit presuppositions, lack of memory, lack of attention, lack of motivation, etc. However, the best papers in [cognitive psychology] show that this is not the case. [33]
The following examples show subjective violations of the axioms of probability.
Frequency Illusions
Clausen and Frey [6] and Gigerenzer and Selten [20,21] proposed that humans are better calibrated when using the frequency of occurrence of events instead of probabilities. For the same reason, professionally chosen stock portfolios may do no better than those composed of stocks picked at random.
Conjunction Fallacy
The probability of two events happening together can never exceed the probability of either event happening alone: P(a and b) ≤ P(a) and P(a and b) ≤ P(b). For independent events, P(a and b) = P(a) × P(b), so the rule reads P(a) × P(b) ≤ P(a) and P(a) × P(b) ≤ P(b). However, most people are influenced by the “typicality” or “ease of representation” of some conjoined circumstances, and so wrongly judge that specific conditions are more probable than general ones. Consider this example by Tversky and Kahneman [47]:
Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more likely? (1) Linda is a bank teller, or, (2) Linda is a bank teller and is active in the feminist movement.
85% of those asked chose option 2.
The Sure-Thing Principle
If you are going to perform an action regardless of the appearance of an unrelated outcome, then you should not wait to see the unrelated outcome [38]. A response pattern violating the sure-thing principle would look like this:
• I will do A, if I know that S is the case.
• I will do A, if I know that S is not the case.
• However, because I do not know the state of S yet, I will wait to do A.
An example is choosing a project whether or not a certain employee comes on board [49].
Humans seem to avoid complex disjunctive choices, and opt for decisions that hypothetically will assure less undesirable outcomes. This occurs even when they are told that the outcome, not yet known to them, has already been irrevocably determined. Consequently, most people seem to prefer to settle for “sure” key attributes before advancing into an analysis of other important, but conflicting, attributes.
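The conjunction rule behind both the heart-attack example and the Linda problem holds for any two events, whether or not they are independent: everyone who satisfies both conditions also satisfies each one. The sketch below illustrates this by counting in a small simulated population. The traits, their rates, and their correlation are made-up assumptions, and conjunction_check is a hypothetical helper.

```python
import random


def conjunction_check(n_people: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Estimate P(teller) and P(teller and feminist) in a simulated population.

    All rates are hypothetical; feminism is deliberately made more common among
    tellers, so the two traits are correlated rather than independent.
    """
    rng = random.Random(seed)
    tellers = 0
    tellers_and_feminists = 0
    for _ in range(n_people):
        is_teller = rng.random() < 0.05
        is_feminist = rng.random() < (0.6 if is_teller else 0.3)
        if is_teller:
            tellers += 1
            if is_feminist:
                # Every person counted here was already counted as a teller.
                tellers_and_feminists += 1
    return tellers / n_people, tellers_and_feminists / n_people


p_teller, p_both = conjunction_check()
print(f"P(teller) = {p_teller:.4f}, P(teller and feminist) = {p_both:.4f}")
```

Changing the rates or the correlation changes the size of the gap between the two estimates, but never its direction, because the second count is a subset of the first.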
Extensionality Fallacies
Extensionally equivalent true descriptions of an event should all map onto the same numerical probability. But they do not. Two groups of subjects were shown the following insurance policies.
• Group A. How attractive do you find an insurance that covers hospitalizations due to any cause whatsoever?
• Group B. How attractive do you find an insurance that covers hospitalizations due to diseases or accidents of any sort?
On average, subjects of Group B found their insurance more attractive than did subjects of Group A [26], even though policy A covers everything that policy B covers. Similarly, one can imagine that a specialized weapon system adapted against two or three specifically ominous threats could be preferred to a more general-purpose system with more capabilities.
Certainty Effect
People prefer reducing the chance of something bad happening from some amount to zero, more than reducing it by the same amount but not to zero. Plous [35] cites economist Richard Zeckhauser: “Zeckhauser observed that most people would pay more to remove the only bullet from a gun in Russian roulette than they would to remove one of four bullets.” Notice that the reduction in their probability of being shot is the same in both cases.
3.4 Objectivity and Subjectivity; Prospect Theory
The definitions of value versus utility, and of the objective versus the subjective, must first be settled in order to gain a clear understanding of decision theory as it applies to human decision making. Edwards [14] created Figure 3, which differentiates the four variations of the expected value model. In the upper left is the rational, normative, prescriptive, mathematical theory of expected values, where objective probabilities and isomorphic values of rewards are considered. Isomorphic here means preserving an identical value, determined by a normative rational theory, regardless of the opinion of any individual.
The lower right quadrant represents the subjective, behavioral, “biological,” descriptive theory of subjective expected utility, where subjective probabilities and nonisomorphic utilities are considered. This terminology is clarified in Figure 4, in which cross-hatching indicates the areas of subjective utility and probability. There are thus two models (or views, or theories) of human behavior: the normative and the behaviorally descriptive. The normative, or prescriptive, model arises from the view that humans should make decisions according to rational calculations, for example, by the overall expected value of a choice, which is the sum of objective probabilities multiplied by corresponding objective values. Descriptive models of human decision making, on the other hand, seek to describe and explain human behavior as it is. The most accepted descriptive theory is prospect theory [27,48], which explains the nature of the subjective human decision-making process in terms of the heuristics and biases employed in assessing information, and the common deviations from rational decision making that result.
Fig. 3. Four variations on the expected value model, from Edwards [14].
Fig. 4. Objective and subjective value and probability models.
Framing and Subjective Utility
Prospect theory breaks subjective decision making into a preliminary screening stage and a secondary evaluating stage. The effect of these two stages is that values are considered not in an absolute sense (from zero), but subjectively from a reference point established by the subject's perspective on the situation, based on his self-estimated wealth and his evaluation of the best and worst outcomes before the choice is made. Within prospect theory, the establishment of a subjective reference point is formally called framing. This reference point can be predictably altered by presenting the same choice situation under different, although equally truthful, descriptions. The key graph that shows how objective values translate into subjective worth is shown in Figure 2. Note the significant disparity in magnitude with which gains and losses are subjectively valued: losses can have absolute magnitudes of about 2.25 to 2.5 times those of equal gains, depending on the human subject.
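The value function of Figure 2 is commonly written in a power form, with a curvature for gains, a curvature for losses, and a loss-aversion coefficient λ. The sketch below assumes α = β = 0.88 and λ = 2.25, the median estimates often quoted from Tversky and Kahneman's 1992 study of cumulative prospect theory; individual subjects vary. It evaluates the Hammond et al. bet [25] under both framings, using the stated probabilities directly (the weighting of Figure 5 is ignored). Treating the final balances as gains measured from zero is an assumption made to model the second framing, and the function names are illustrative.

```python
def value(x: float, alpha: float = 0.88, beta: float = 0.88, lam: float = 2.25) -> float:
    """Prospect-theory value of an outcome x measured from the reference point.

    Gains are concave (x ** alpha); losses are convex and scaled by the
    loss-aversion coefficient lam. The parameter defaults are assumed estimates.
    """
    if x >= 0:
        return x**alpha
    return -lam * (-x) ** beta


def prospect(outcomes: list[tuple[float, float]], reference: float) -> float:
    """Sum of probability * value(outcome - reference); probabilities are not reweighted."""
    return sum(p * value(x - reference) for p, x in outcomes)


# Framing 1: a 50-50 bet to lose $300 or win $500, measured from the current state.
as_change = prospect([(0.5, -300), (0.5, 500)], reference=0)

# Framing 2: final balances of $2700 or $3500 versus keeping $3000,
# assuming (hypothetically) that balances are evaluated as gains from zero.
bet_as_balance = prospect([(0.5, 2700), (0.5, 3500)], reference=0)
keep_as_balance = prospect([(1.0, 3000)], reference=0)

print(f"bet framed as change: {as_change:.1f} (negative, so reject)")
print(f"bet framed as balance: {bet_as_balance:.1f} vs keep: {keep_as_balance:.1f}")

# Loss aversion: a 5% surcharge hurts more than a 5% discount pleases.
print(f"value of +5: {value(5):.2f}, value of -5: {value(-5):.2f}")
```

Under these assumptions the bet is worth about −52 when framed as a loss of $300 or a gain of $500, so it is rejected, but about 1180 against 1148 for keeping $3000 when framed as final balances, so it is accepted. The last line reproduces the discount-versus-surcharge asymmetry: a loss of 5 is valued at λ = 2.25 times a gain of 5.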
Fig. 5. The probability weighting function of cumulative prospect theory [36,48].
The probability weighting function of Figure 5 shows the subjective probability weight (how much the stated probability weighs in real-life decision making) as a function of the real probability. The diagonal represents the normative ideal case of a perfectly calibrated, perfectly rational decision maker. In several mathematical models the two curves cross at 1/e ≈ 0.37.
Subjective Probability
Prospect theory describes the subjective evaluation of probabilities according to the experimentally obtained graph in Figure 5. People overweight events with low probabilities, such as being killed by a terrorist or in an airplane crash, and underweight high-probability events, such as adults dying of cardiovascular disease. The existence of state lotteries depends upon such overweighting of small probabilities. One effective way to visualize how small the probability of winning a large-stakes lottery really is: compute how many lotteries one would have to play before the chance of winning at least one approaches 50%. For the largest jackpots, the answer is roughly one new lottery every minute for centuries. At the right side of Figure 5, the probability of a brand new car starting every time is very close to 1.0. But many people put jumper cables in the trunk and buy memberships in AAA.
Now, with some understanding of the key differences between objective and subjective decision making, we apply Edwards' schema to the field of decision making to give Figure 6 (note that prospect theory is not the only mathematical treatment of real-life decision making, but it is the most widely accepted as theoretically sound).
Fig. 6. Rational behavior versus human decision making.
When objective and accurate numerical values are available, tradeoff studies give an exact ranking of alternatives through numerical calculation. In the presence of subjective utilities, when a person expresses judgments or preferences, the best description for human decision making is prospect theory. Humans are far from ideal decision makers because of cognitive illusions and biases, and the use of heuristics. Using tradeoff studies judiciously can help people make rational decisions. We want to help move human decision making from the normal human decision-making lower-right quadrant to the ideal decision-making upper-left quadrant in Figure 6.
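The 1/e crossing mentioned for Figure 5 holds, for example, for Prelec's one-parameter weighting function w(p) = exp(−(−ln p)^γ): for any γ between 0 and 1 it crosses the diagonal exactly at p = 1/e, overweighting smaller probabilities and underweighting larger ones. The sketch below assumes γ = 0.65 only for illustration, applies the same curve to the Russian roulette example of the certainty effect, and checks the lottery arithmetic of the Subjective Probability paragraph for an assumed 5-of-69 plus 1-of-26 jackpot draw. The function name prelec_weight is hypothetical.

```python
import math


def prelec_weight(p: float, gamma: float = 0.65) -> float:
    """Prelec weighting w(p) = exp(-(-ln p) ** gamma); gamma = 0.65 is an assumption."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** gamma))


# Fixed point: w(1/e) = 1/e for any gamma, since (-ln(1/e)) ** gamma = 1.
p_star = 1 / math.e
print(f"w(1/e) = {prelec_weight(p_star):.6f}, 1/e = {p_star:.6f}")

# Certainty effect, six-chamber revolver: both moves remove a 1/6 chance of being shot.
remove_only_bullet = prelec_weight(1 / 6) - prelec_weight(0.0)
remove_one_of_four = prelec_weight(4 / 6) - prelec_weight(3 / 6)
print(f"1 -> 0 bullets: {remove_only_bullet:.3f}, 4 -> 3 bullets: {remove_one_of_four:.3f}")

# Lottery arithmetic: draws needed for a 50% chance of at least one jackpot.
p_jackpot = 1 / (math.comb(69, 5) * 26)  # assumed 5-of-69 plus 1-of-26 format
draws = math.ceil(math.log(0.5) / math.log1p(-p_jackpot))
years = draws / (60 * 24 * 365.25)  # one draw per minute
print(f"{draws:,} draws, about {years:.0f} years at one draw per minute")
```

With these assumptions, removing the only bullet changes the decision weight by about 0.23, whereas removing one of four bullets changes it by about 0.12, even though both remove a 1/6 chance of being shot. For the assumed jackpot format the calculation needs roughly 200 million draws, close to four centuries at one draw per minute.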
4 Biases That Inhibit Acceptance of Tradeoff Studies
Many aspects of tradeoff studies “turn off” human decision makers. Evidence for this is the small number of tradeoff studies reported in the literature. This is a 1999 INCOSE International Symposium tutorial description [12]:
When an engineer performs a tradeoff analysis and submits the results to his or her boss, the boss often says, “No, that is not the right answer.” In response to this, the engineer might change weights, values, scoring functions or tradeoff functions. Part of the reason that engineers do not do tradeoff studies may be that they have seen applications where similar analyses using similar techniques have given different preferred alternatives. Tradeoff analyses fall into the field of decision making under risk or uncertainty, often called utility theory or multicriterion decision making. Perhaps the thought of having to delve into a combination of subjective utility theory and multicriterion decision making discourages human decision makers from performing tradeoff studies.
Overconfidence in Subjective Choice: Often Wrong, but Rarely in Doubt
Griffin and Tversky [24] have this to say about the weighing of evidence and the determinants of confidence:
One of the major findings that has emerged [from cognitive science] is that people are often more confident in their judgments than is warranted by the facts. Overconfidence is not limited to lay judgment or laboratory experiments. The well-publicized observation that more than two-thirds of small businesses fail within 4 years suggests that many entrepreneurs overestimate their probability of success.
Overall, people are too confident in their subjective opinion of the overall (presupposed) final result of a tradeoff study, even before they begin any calculations. They may therefore consider it a waste of time to proceed with the mechanics of a formal tradeoff study. However, there are other reasons for the uneasiness and reluctance to proceed with and complete a tradeoff study. These reasons are described below.
Calibration
Griffin and Tversky [24] provide an answer to the question of uneasiness in conducting tradeoff studies: “Overconfidence is not universal. Studies of calibration have found that with very easy items, overconfidence is eliminated, and underconfidence is often observed [28].” Because the process of completing a tradeoff study involves much calibration, underconfidence can easily occur, and possibly accumulate.
Consider the two questionnaires depicted in Table 1, which were posed to experimental subjects. Half of the subjects got the first questionnaire and the other half got the second one [39]. The Xs show the average answer. On average, the subjects chose the answer near the center of the numerical range, regardless of its numerical value. Apparently, humans cannot be trusted with a problem of calibration, even when it pertains to their own personal experience. In this case, subjects seemed eager to calibrate a number in a way that seemed most compatible with the given range of the scale.
Table 1. Experimental questionnaire [39]. X marks the average answer.
Questionnaire 1: Please estimate the average number of hours you watch television per week.
1–4 | 5–8 | 9–12 (X) | 13–16 | 17–20 | More than 20
Questionnaire 2: Please estimate the average number of hours you watch television per week.
1–2 | 3–4 | 5–6 (X) | 7–8 | 9–10 | More than 10
Discriminability of Updated Information
Griffin and Tversky [24] reviewed how people discriminate between a hypothesis and an alternative hypothesis as information is updated: “People seem to focus on the strength of evidence for a given hypothesis [or a favored hypothesis] and neglect how well the same evidence fits an alternate hypothesis.” Furthermore, Griffin and Tversky note that “studies of sequential updating have shown that posterior probability estimates commonly exhibit conservatism or underconfidence [15].” Thus, the sequential updating of information in a Bayesian fashion can be accompanied by inflexibility and lack of trust in the analyst's mind. The lack of trust, in calibration and in posterior probabilities, deals a double blow to progress within tradeoff studies. People would rather hold on to their false overconfidence in their preliminary subjective opinion of the alternatives.
Law of Small Numbers
The law of small numbers, simply stated, says, “There are not enough small numbers to meet the many demands made of them.” For example, the function ⌈e^((n−1)/2)⌉ gives the sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 (i.e., the first ten Fibonacci numbers) for n = 0, …, 9, although it subsequently continues 91, 149, …, which are not Fibonacci numbers. In tradeoff studies, this law could influence the estimation of weights of importance (or other parameters) whenever critical differences in weights are shrouded by the small number of weight values available. If aware of such limitations, a decision maker who considers tradeoff studies to be a large collection of “small numbers” may not want to proceed with any extended analysis or effort.
In another situation concerning the law of small numbers, someone charged with a decision may want to take some small indicator or data sample, place undue confidence in it, make the decision according to it, and “spare himself the trouble of re-confirming the obvious.” Tversky and Kahneman [46] wrote:
In review, we have seen that the believer in the law of small numbers practices science as follows:
1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power.
2. He has undue confidence in early trends (e.g., the number and identity of significant results). He overestimates significance.
3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals.
4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal “explanation” for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.
For a serious tradeoff study analyst who is focused on finishing the broad structure of the tradeoff study, this aspect of the law of small numbers could result in the mistake of overestimating the significance of small samples used to establish the input values of some criteria, especially if these input values turn out to have the highest sensitivity.
Strength and Weight
In cognitive psychology, “strength and weight” refers to the magnitude of a measure and the measure of its confidence (e.g., sample size, reliability, uncertainty, or variance). Traditional tradeoff studies do not incorporate measures of confidence in the data. Because tradeoff studies usually focus on the strength of the evidence (criteria values or weights of importance, e.g.), they usually lack a critical component of psychological thought, namely, a measure of confidence in each value.
Statistical theory and the calculus of chance prescribe rules for combining strength and weight.
For example, probability theory specifies how sample proportion and sample size combine to determine posterior probability. . . . [Usually] people focus on the strength of the evidence . . . and then make some adjustment in response to its weight. [24]
This usual focus on strength or magnitude in tradeoff studies could thus be beneficial, if the analyst prefers to streamline her decision-making thoughts, or detrimental, if the analyst would prefer to have a strong association between strength and weight in the tradeoff study.
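Griffin and Tversky's example of sample proportion (strength) and sample size (weight) combining into a posterior probability can be made concrete with a Beta–binomial model. The sketch assumes a uniform Beta(1, 1) prior on an option's unknown success rate and compares 7 successes in 10 trials with 70 in 100: the same strength, very different weight. It uses the standard identity that, for integer counts, a Beta(1 + k, 1 + n − k) posterior exceeds one half with probability P(Binomial(n + 1, 1/2) ≤ k). The function name is hypothetical.

```python
from math import comb


def prob_rate_exceeds_half(successes: int, trials: int) -> float:
    """Posterior P(true rate > 0.5) under a uniform Beta(1, 1) prior.

    The posterior is Beta(1 + k, 1 + n - k); for integer counts,
    P(rate > 0.5) = P(Binomial(n + 1, 0.5) <= k), computed exactly here.
    """
    n_plus_1 = trials + 1
    favorable = sum(comb(n_plus_1, j) for j in range(successes + 1))
    return favorable / 2**n_plus_1


for k, n in [(7, 10), (70, 100)]:
    posterior_mean = (k + 1) / (n + 2)  # the "strength" barely changes
    print(
        f"{k}/{n}: posterior mean {posterior_mean:.3f}, "
        f"P(rate > 0.5) = {prob_rate_exceeds_half(k, n):.4f}"
    )
```

Both samples give posterior means near 0.7 (0.667 and 0.696), but the probability that the rate exceeds one half is about 0.89 for the small sample and above 0.999 for the large one. A tradeoff matrix that records only the 70% figure discards exactly this difference, and treating the small sample as if it carried the weight of the large one is the error Tversky and Kahneman attribute to the believer in the law of small numbers.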
Elimination by Aspects; Killer Trades
In tradeoff studies, it is acceptable to narrow down the number of alternatives with a preliminary round of killer trades. Alternatives are eliminated by choosing an important aspect, or criterion, and eliminating all the alternatives that do not satisfy it. The criterion or aspect focused on should obviously be very important. This strategy does not consider the effects of the remaining criteria, which apply to all alternatives, and could mistakenly eliminate a superior alternative by focusing on unimportant criteria; however, the attraction of this strategy is that it can easily eliminate a large number of alternatives and quickly reduce the complexity of a computational ranking decision. Humans usually shy away from computation, because their innate computational faculties are quite limited [41]. Tversky [45] stated,
People are reluctant to accept the principle that (even very important) decisions should depend on computations based on subjective estimates of likelihoods or values in which the decision maker himself has only limited confidence. When faced with an important decision, people appear to search for an analysis of the situation and a compelling principle of choice that will resolve the decision problem by offering a clear-cut choice without relying on estimation of relative weights, or on numerical computations. From a normative standpoint, the major flaw in the principle of elimination by aspects lies in its failure to ensure that the alternatives retained are, in fact, superior to those that are eliminated. In general, therefore, the strategy of elimination by aspects cannot be defended as a rational procedure of choice. On the other hand, there may be many contexts in which it provides a good approximation to much more complicated compensatory models and could thus serve as a useful simplification procedure.
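Tversky's objection can be seen in a toy example. The sketch below uses hypothetical alternatives, criteria, scores, weights, and thresholds. It applies a killer-trade screen in the deterministic form described above (Tversky's original elimination-by-aspects model selects aspects probabilistically), ranks the survivors with a compensatory weighted sum, and compares the result with the weighted-sum winner over all alternatives.

```python
# Hypothetical tradeoff data: scores on a 0-10 scale, higher is better.
weights = {"performance": 0.5, "cost": 0.3, "reliability": 0.2}
alternatives = {
    "A": {"performance": 7.9, "cost": 9.0, "reliability": 9.0},
    "B": {"performance": 8.5, "cost": 4.0, "reliability": 5.0},
    "C": {"performance": 9.0, "cost": 3.0, "reliability": 6.0},
}


def weighted_sum(scores: dict[str, float]) -> float:
    """Compensatory score: weakness on one criterion can be offset by strength on others."""
    return sum(weights[c] * scores[c] for c in weights)


def eliminate_by_aspects(
    alts: dict[str, dict[str, float]], aspects: list[tuple[str, float]]
) -> dict[str, dict[str, float]]:
    """Apply killer trades in order, dropping alternatives below each (criterion, threshold)."""
    remaining = dict(alts)
    for criterion, threshold in aspects:
        survivors = {n: s for n, s in remaining.items() if s[criterion] >= threshold}
        if survivors:  # a screen that would eliminate everything is skipped
            remaining = survivors
    return remaining


best_overall = max(alternatives, key=lambda n: weighted_sum(alternatives[n]))
after_screen = eliminate_by_aspects(alternatives, [("performance", 8.0)])
best_after_screen = max(after_screen, key=lambda n: weighted_sum(after_screen[n]))
print(f"weighted-sum winner: {best_overall}; winner after killer trade: {best_after_screen}")
```

Here alternative A has the best weighted score (8.45) but narrowly misses the performance threshold, so the screen hands the decision to C (6.6). The screen retained alternatives that are not, in fact, superior to the one it eliminated.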
Cognitive Dissonance
People do not like sustained cognitive dissonance, the holding of competing alternatives within their minds. They like to make a decision and “forget about it.” During man's evolutionary trajectory, at least until his settling down to an agricultural life approximately 10,000 years ago, man was not in a position to practice computational decision making. For a strong (over)emphasis on the evolutionary adaptive value of simple heuristics, see Gigerenzer et al. [22].
Experimenter's Regress
It is possible for a feedback loop to form detrimentally between theory and evidence. In science, theories are confirmed by evidence, but evidence is also judged according to theories. If cognitive biases or errors affect input data in tradeoff studies, an alternative can improperly gain or lose perceived strength, leading to an erroneous conclusion, either in the tradeoff study itself or later in studies based on it.
Theories of Randomness
Early in the history of the judgment and decision-making field, theories held that subjective values were determined by rules involving randomness [44]. Luce [30] held that a random element was involved in decision rules. Some randomness will always be present in decision studies, either because it is present in the input data, or because the workings of the human brain involve some random components. In the presence of randomness in data, some people will say, “What is the use of doing detailed calculations on inexact data? I would rather just make a decision with which I feel comfortable.”
The Case for Nonobjectivity
Philosophy has noted a number of reasons why people cannot reason objectively with probabilities.
1. The existence of n probabilities can immediately require the consideration of 2^n conditional probabilities. For n = 300, there are 2^300 (about 10^90) conditional probabilities, a number larger than the Eddington number, 136 × 2^256 (about 1.6 × 10^79), the estimated number of particles in the known universe.
2. Probability is an idealization from frequencies. “Probabilities” are often derived from limited samples, from populations whose extent is often unknown.
3. Philosophically speaking, there are no ultimate definitions for anything, only rules for how we reason about things. It is easy to be caught up in the “objective” and exact consideration of “probabilities,” but there is in fact no permanent and fixed definition of such. From a large enough perspective, reasoning is not deductive, but only practically useful and always defeasible, that is, subject to annulment when the limits of its logic are found.
With a built-in feeling of the infinite, humans often convince themselves that reasoning is of limited value.
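The magnitudes in item 1, and the 2^201 combinations of the Pinewood study discussed in Section 5, can be checked with exact integer arithmetic. The sketch takes Eddington's number as 136 × 2^256 (an assumption about which definition the text intends) and compares base-10 logarithms.

```python
import math

# Eddington's number, taken here as 136 * 2**256 (assumed definition).
EDDINGTON = 136 * 2**256

quantities = {
    "2^300 conditional probabilities (n = 300)": 2**300,
    "2^201 Pinewood parameter combinations": 2**201,
    "Eddington's number": EDDINGTON,
}

# math.log10 handles arbitrarily large Python integers.
for label, magnitude in quantities.items():
    print(f"{label}: about 10^{math.log10(magnitude):.1f}")

print("2^300 exceeds Eddington's number:", 2**300 > EDDINGTON)
print("2^201 exceeds Eddington's number:", 2**201 > EDDINGTON)
```

The output places 2^300 near 10^90.3, Eddington's number near 10^79.2, and 2^201 near 10^60.5: the first exceeds Eddington's number, while the Pinewood count falls far short of it yet remains far beyond exhaustive enumeration.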
Obviating Expert Opinion
Often, lay people or the less trained can converse with an expert, say a decision expert, and conclude that the expert is no better at making decisions than they are. Specific and preponderant evidence of the expert's skill is often needed. Why is this?
1. Although in some cases an expert may come to a very quick, almost instantaneous, assessment of a situation in the “blink” of an eye [23], usually a period of preliminary perception and assessment is necessary, as in the case of chess players evaluating a chess position they have never seen before [11]. During this stage, a chess expert may be quieter than a player who truly does not know the game, just as a decision analyst may only be laying the groundwork for a decision.
2. In tasks only slightly different from those in which a person is an expert, the expert may fare no better than the average person. For example, when simply counting the total number of chess pieces on a chess board, irrespective of type or position, experts fared no better than other subjects; however, in detecting critical patterns, experts performed much better [37].
3. All humans hold about seven units or “chunks” of information at a time in short-term memory [31], irrespective of skill level. However, the chess master's chunks are larger and richer in information than amateurs' chunks [5]. A novice cannot see the forest for the trees.
In tasks outside the field of study, an expert may seem no smarter than the average person. The above effects may combine to convince an evaluator that an expert has nothing to offer, and that any person with no training will make a good decision.
Feeling Invincible
Many bad decisions can be attributed to the decision maker's sense of invincibility. Teenage boys are notorious for thinking, “I will not get caught; I cannot get hurt; I will avoid car accidents.” Many other people think, “Nobody is going to steal my identity; I will be able to quit any time I want to; I do not need sunscreen, I will not get skin cancer; I do not have to back up my hard drive, my computer will not crash.” The Spanish Armada was thought to be invincible in 1588. The Japanese thought they were invincible at Pearl Harbor in 1941. The German military thought it was invincible as it stormed across Europe at the beginning of World War II.
And of course, in 1912, the White Star Line said that the Titanic was “unsinkable.” The invincibility bias will affect the risk analysis and therefore the problem statement of a tradeoff study.
Summary
People rarely do formal tradeoff studies, because the factors listed in this section combine to hinder tradeoff studies in any setting other than one in which a sponsor directly orders a tradeoff study, makes her ongoing interest in getting the study done clear, and provides significant funding and time for the effort.
5 Discussion
Humans usually consider alternatives in series [50], and are often moved to choose one alternative hastily after having their attention drawn, or fixated, to only one or a few criteria. Also, humans tend to form conclusions based on their favorite theories, rather than starting from a complete set of alternatives and narrowing a set of hypotheses through conclusions based on experimentation, empirical data, and data analysis. In order to make good rational choices among alternatives, the decision maker should be an expert in the relevant subject matter, and also be aware of cognitive biases and fallacies. Limited awareness can precipitate poor judgment. Decision makers should also have a complete understanding of the mathematical methods that allow the parallelization of human decision processes through tradeoff studies, and be able to apply them without error.
Despite the difficulties, the rationality and rigor needed to make good decisions are available. Employing a team approach, with the long-term horizon necessary to conduct iterations and public reviews, brings sobriety to the decision process. Decision aids such as tradeoff studies bring rationality to decision making. Brown [4] notes that good decision-making aids (1) emulate or replicate the performance of some more competent decider, (2) replace the decider's current thinking and analyze decisions from scratch, and (3) enhance or improve on the decider's logical thinking.
On the level of an individual tradeoff study analyst, using knowledge of cognitive biases to improve a tradeoff study would involve the analyst stopping periodically and recognizing biases in his own cognitive processes. Such cognitive self-examination would have to be continually motivated, either by the analyst or by a supervisor. A formal mathematical or statistical examination process for cognitive biases, perhaps on a departmental level, has not been documented, and would probably be expensive.
Complex impersonal decisions involving multiple alternatives should not be attempted holistically; at the least, nonexperts should avoid making important decisions with a holistic, mental, feeling-based approach. To establish rationality, the components of the decision must be made explicit, which is possible by focusing on each element individually. The higher-level decision then becomes a calculation based on a broad base of rationally considered components. Consider the difference between an optimization search and a tradeoff study, taking as an example the updated Pinewood tradeoff study, which has 201 parameters. Assuming that each parameter has only two settings, the number of possible combinations is \(2^{201}\), an astronomically large number approaching Eddington’s number of particles in the known universe, \(2^{258}\) [13]. From a combinatorial optimization standpoint, the Pinewood problem is intractable: it could not be searched exhaustively in any feasible amount of time. Yet, after rationality is brought to bear on each
Eric D. Smith et al.
component individually, a preference order of alternatives can be calculated in a fraction of a second with a common computer and spreadsheet. Returning to the issue of complex decisions made in an instant [23], it should be noted that experts capable of making such judgments have probably spent long periods of time in training, during which they have individually, sequentially and rationally examined the components of the decision. Any preconscious parallelization occurring in such an expert’s brain is reproduced in the parallel structure of a tradeoff study, which is ultimately based on hard data analysis.
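The contrast can be made concrete with a small Python sketch. It counts the exhaustive search space for 201 two-setting parameters and then ranks a few alternatives with a weighted-sum score, the kind of calculation a spreadsheet tradeoff study performs. The weighted-sum rule and every criterion name, weight, and score below are illustrative assumptions, not values from the Pinewood study [34].

```python
# Sketch: exhaustive search space vs. a weighted-sum tradeoff study.
# Assumption: criteria, weights, and normalized scores are hypothetical,
# and the scoring rule is a plain weighted sum.

N_PARAMS = 201
search_space = 2 ** N_PARAMS  # exact integer count of two-setting combinations
print(f"2^{N_PARAMS} is about {float(search_space):.2e} combinations")

# Hypothetical tradeoff study: weights sum to 1, scores lie in [0, 1].
WEIGHTS = {"performance": 0.5, "cost": 0.3, "safety": 0.2}
SCORES = {
    "design A": {"performance": 0.9, "cost": 0.4, "safety": 0.7},
    "design B": {"performance": 0.6, "cost": 0.8, "safety": 0.9},
    "design C": {"performance": 0.7, "cost": 0.7, "safety": 0.5},
}


def weighted_sum(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Sum of weight * score over all criteria for one alternative."""
    return sum(weights[c] * scores[c] for c in weights)


# Preference order: highest weighted score first.
ranking = sorted(SCORES, key=lambda alt: weighted_sum(SCORES[alt], WEIGHTS), reverse=True)
for alt in ranking:
    print(f"{alt}: {weighted_sum(SCORES[alt], WEIGHTS):.2f}")
```

The ranking costs one multiply-add per alternative-criterion pair, so its cost grows linearly with the size of the study rather than exponentially with the number of parameters.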
6 Conclusions
Humans usually consider alternatives in series, and are often moved to choose one alternative hastily after their attention has been drawn to, or fixated on, only one or a few criteria. Prospect theory describes an information-editing stage followed by the application of subjective probability and value functions. To choose rationally among alternatives, decision makers must become more aware of cognitive biases and fallacies. Limited awareness and irrationality can reduce the decision maker’s trust in the tradeoff study. The tradeoff study should rest on a broad base of rationally considered components, calculation methods, and assumptions. Decision makers should fully understand the mathematical methods that allow human decision processes to be parallelized through tradeoff studies. Despite the difficulties, tradeoff studies provide a reliable method of rational decision making.
Acknowledgments This work was supported by the Air Force Office of Scientific Research under AFOSR/MURI F49620-03-1-0377.
References
1. A. T. Bahill. Risk analysis of a pinewood derby. http://www.sie.arizona.edu/sysengr/slides/risk.ppt. Last accessed January 2008.
2. A. T. Bahill. Tradeoff Studies: A Systems Engineering Skills Course. BAE Systems, San Diego, 2006.
3. L. R. Beach and T. Connolly. The Psychology of Decision Making: People in Organizations. Sage, Thousand Oaks, CA, 2005.
4. R. Brown. Rational Choice and Judgment: Decision Analysis for the Decider. John Wiley & Sons, Hoboken, NJ, 2005.
5. W. G. Chase and H. A. Simon. Perception in chess. Cognitive Psychology, 4:55–81, 1973.
Cognitive Biases and Tradeoff Studies
6. D. Clausing and D. D. Frey. Improving system reliability by failure-mode avoidance including four concept design strategies. Systems Engineering, 8:245–261, 2005.
7. CMMI. Capability maturity model integration. Software Engineering Institute: http://www.sei.cmu.edu/cmmi/. Last accessed January 2008.
8. J. Daniels, P. W. Werner, and A. T. Bahill. Quantitative methods for tradeoff analyses. Systems Engineering, 4:190–212, 2001.
9. DAR. DAR basics: Applying decision analysis and resolution in the real world, 2004. Software Engineering Institute: http://www.sei.cmu.edu/cmmi/presentations/sepg04.presentations/dar.pdf. Last accessed January 2008.
10. J. H. Davis. Group decision and social interaction: A theory of social decision schemes. Psychological Review, 80:97–125, 1973.
11. A. D. De Groot. Thought and Choice in Chess. Mouton, The Hague, The Netherlands, 1965.
12. Doing tradeoff studies with multicriterion decision making techniques. http://www.incose.org.uk/incose99/tutt05.htm. Last accessed January 2008.
13. A. Eddington. Mathematical Theory of Relativity. Cambridge University Press, Cambridge, UK, 1923.
14. W. Edwards. An attempt to predict gambling decisions. In J. W. Dunlap, editor, Mathematical Models of Human Behavior, pages 12–32. Dunlap and Associates, Stamford, CT, 1955.
15. W. Edwards. Conservatism in human information processing. In B. Kleinmuntz, editor, Formal Representation of Human Judgment, pages 17–52. John Wiley & Sons, New York, 1968.
16. D. Ellsberg. Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75:643–669, 1961.
17. D. Ellsberg. Risk, ambiguity, and decision. PhD thesis, Harvard University, Cambridge, MA, 1962.
18. C. R. Fox and A. Tversky. Ambiguity aversion and comparative ignorance. Quarterly Journal of Economics, 110:585–603, 1995.
19. C. R. Fox and M. Weber. Ambiguity aversion, comparative ignorance, and decision context. Organizational Behavior and Human Decision Processes, 88:476–498, 2002.
20. G. Gigerenzer. How to make cognitive illusions disappear: Beyond heuristics and biases. European Review of Social Psychology, 2:83–115, 1991.
21. G. Gigerenzer and R. Selten, editors. Bounded Rationality: The Adaptive Toolbox. The MIT Press, Cambridge, MA, 2001.
22. G. Gigerenzer, P. M. Todd, and the ABC Research Group. Simple Heuristics that Make us Smart. Oxford University Press, New York, 1999.
23. M. Gladwell. Blink: The Power of Thinking Without Thinking. Little, Brown, New York, 2005.
24. D. Griffin and A. Tversky. The weighting of evidence and the determinants of confidence. Cognitive Psychology, 24:411–435, 1992.
25. J. S. Hammond, R. L. Keeney, and H. Raiffa. Best of HBR: The hidden traps in decision making. Harvard Business Review, 84:118–126, 2006.
26. E. J. Johnson, J. Hershey, J. Meszaros, and H. Kunreuther. Framing, probability distortions, and insurance decisions. Journal of Risk and Uncertainty, 7:35–51, 1993.
27. D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47:263–291, 1979.
28. S. Lichtenstein, B. Fischhoff, and L. D. Phillips. Calibration of probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic, and A. Tversky, editors, Judgment Under Uncertainty: Heuristics and Biases, pages 306–334. Cambridge University Press, Cambridge, UK, 1982.
29. List of cognitive biases. http://en.wikipedia.org/wiki/Cognitive_biases. Last accessed January 2008.
30. R. D. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, New York, 1959.
31. G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63:81–97, 1956.
32. M. Piattelli-Palmarini. Inevitable Illusions: How Mistakes of Reason Rule our Minds. John Wiley & Sons, New York, 1994.
33. M. Piattelli-Palmarini. Knowledge of probability. Graduate seminar on the foundations of judgment and decision making, The University of Arizona, Tucson, 2005.
34. Pinewood Derby tradeoff study. http://www.sie.arizona.edu/MURI/software.html. Last accessed January 2008.
35. S. Plous. The nuclear arms race: Prisoner’s dilemma or perceptual dilemma? Journal of Peace Research, 30:163–179, 1993.
36. D. Prelec. The probability weighting function. Econometrica, 66:497–527, 1998.
37. P. Saariluoma. Chess players’ search for task-relevant cues: Are chunks relevant? In D. Brogan, editor, Visual Search. Taylor & Francis, London, 1990.
38. L. J. Savage. The Foundations of Statistics. Wiley, New York, 1954.
39. N. Schwarz. Assessing frequency reports of mundane behaviors: Contributions of cognitive psychology to questionnaire construction. Review of Personality and Social Psychology, 11:98–119, 1990.
40. E. Shafir, editor. Preference, Belief, and Similarity: Selected Writings. The MIT Press, Cambridge, MA, 2004.
41. R. N. Shepard. On the subjective optimum selection among multiattribute alternatives. In M. W. Shelly and G. L. Bryan, editors, Human Judgments and Optimality. Wiley, New York, 1964.
42. R. A. Shweder. Likeness and likelihood in everyday thought: Magical thinking in judgments about personality. Current Anthropology, 18:637–648, 1977.
43. E. D. Smith. Tradeoff studies and cognitive biases. PhD thesis, Department of Systems and Industrial Engineering, The University of Arizona, Tucson, 2006.
44. L. L. Thurstone. The Measurement of Values. University of Chicago Press, Chicago, 1959.
45. A. Tversky. Elimination by aspects: A theory of choice. Psychological Review, 79:281–299, 1972.
46. A. Tversky and D. Kahneman. Belief in the law of small numbers. Psychological Bulletin, 76:105–110, 1971.
47. A. Tversky and D. Kahneman. Extensional vs. intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90:293–315, 1983.
48. A. Tversky and D. Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5:297–323, 1992.
49. A. Tversky and E. Shafir. The disjunction effect in choice under uncertainty. Psychological Science, 3:305–309, 1992.
50. W. A. Wickelgren. How to Solve Problems: Elements of a Theory of Problems and Problem Solving. Freeman, San Francisco, 1974.
51. A. W. Wymore. Model-based Systems Engineering. CRC Press, Boca Raton, FL, 1993.
Valuation of Vague Prospects with Mixed Outcomes
David V. Budescu (1) and Sara Templin (2)
(1) Department of Psychology, University of Illinois, Champaign, IL 61820, USA [email protected]
(2) Department of Psychology, University of Kansas, Lawrence, KS 66045, USA [email protected]
Summary. Previous work on the joint effects of vagueness in probabilities and outcomes in decisions about risky prospects has documented decision makers’ (DMs’) differential sensitivity to these two sources of imprecision. Budescu et al. [6] report two studies in which DMs provided certainty equivalents (CEs) for precise and vague prospects involving gains or losses. They found (a) greater concern for the precision of the outcomes than for that of the probabilities, (b) vagueness seeking for positive outcomes, (c) vagueness avoidance for negative outcomes, and (d) stronger attitudes towards vague gains than towards vague losses (see also [13]). They proposed and tested a new generalization of prospect theory (PT) for options with vaguely specified attributes. The present work extends this model to the case of vague mixed prospects. We report the results of a new experiment in which 40 DMs valued positive (gains), negative (losses), and mixed (gains and losses) prospects with vague outcomes using two methods: direct judgments of numerical CEs, and CEs inferred from a series of pairwise comparisons. The results confirm the previous findings of vagueness seeking in the domain of gains, vagueness avoidance for losses, and stronger effects of vagueness in the domain of gains. The CEs of mixed prospects are also consistent with this pattern: relative to mixed prospects with precise parameters, DMs overvalue prospects with vaguely specified gains and precise losses, and undervalue prospects with precisely specified gains and imprecise losses. Parameter estimates of the generalized model indicate that in the mixed cases the attitudes to vagueness in the two domains are slightly less pronounced, and more similar to each other, than in the strictly positive or strictly negative cases.
1 Introduction
Following Ellsberg’s seminal paper [15], decision theorists have recognized the importance of the decision maker’s (DM’s) attitude and reaction to the imprecision that characterizes the decision parameters. This imprecision plays no role in classical subjective expected utility theory, and thus is not expected to affect decisions. But most people are not indifferent between an option with a precisely known probability of 50% and one whose best estimate is 50% but could be somewhat higher or lower. Similarly, most people are not indifferent between two events with roughly equal probabilities but different amounts of evidence underlying these estimates (see [19]). Most people are averse to vagueness, and behave as if an imprecise probability is likely to be less than 50%. Vagueness¹ aversion has been demonstrated in numerous laboratory experiments with gambles, and has been generalized to health, environmental, negotiation, and other decision contexts (see [2,9,12,14,17,23,25,26] for partial reviews and illustrative examples).
Recent research has also examined the effect of vague specification of outcomes on decision behavior, following the realization that most real-life decisions involve outcomes that are also characterized by some degree of imprecision. For example, a taxpayer considering a questionable deduction might be uncertain about both the probability of being audited and the severity of the potential penalty [10]; a patient considering a new medical treatment faces uncertainty about both the severity of its side effects and the probability of experiencing them; and insurance companies planning to underwrite policies for “new risks” work with vague probabilities and imprecise loss estimates [28]. In principle, subjective expected utility captures the effects of vague outcomes via the DM’s attitudes toward risk, and Camerer and Weber [9] argue that, normatively speaking, vague outcomes pose no special problems. However, there is clear experimental evidence (e.g., [26,40] and other references cited in [9]) that attitudes towards vagueness are distinct from attitudes towards risk. Experimental work incorporating vagueness in both probabilities and outcomes has generally found its effects to operate similarly and independently on both dimensions [10,20,28].
Previous work [26,27] has focused on the joint and relative effects of vagueness on both dimensions in the domain of losses. In these studies subjects made a series of pairwise choices among hypothetical health or environmental risks that varied in terms of the probability of loss, the magnitude of the loss, and the precision with which each dimension was specified. To allow meaningful comparisons of the two types of vagueness, the values on these dimensions were matched in terms of their precision and their effects on expected losses.²
¹ Ellsberg was the first to use the term ambiguity to refer to imprecisely specified probabilities. Budescu et al. [8] have argued against this convention and advocated using vagueness or imprecision to capture the special characteristics of this situation (see also [10]). According to Budescu and Wallsten’s [7] taxonomy, an event is ambiguous if its description is subject to multiple interpretations (e.g., “in the Fall of 2003, the stock market will increase moderately”), and a probability distribution is vague if the DM cannot associate all events in the sample space with unique and precise numerical probabilities.
² For example, in [26] subjects were asked to indicate their preference (or indifference) between options such as (a) a 0.03 to 0.07 chance of a loss of $75, (b) a 0.05 chance of a loss between $45 and $105, and (c) a 0.05 chance of a loss of $75.
Both studies show that (1) the tendency to avoid (or, for a minority of subjects, to favor) vagueness is stable and consistent across attributes (i.e., DMs who avoid (like) vague probabilities are also likely to avoid (prefer) vague outcomes); (2) DMs’ dimension preference is the single best predictor of choice among risks of equal expected value (i.e., people tend to repeatedly choose the option with the lower probability of a loss, or the option with the lower possible loss; see also [41]); but (3) attitudes toward vagueness become important and critical determinants of choice in cases where dimension prominence does not distinguish between options. Kuhn et al. [27] reduced the probabilities and considerably increased the magnitude of the losses compared with the earlier study. This caused a much higher proportion of subjects to weigh the outcome dimension more heavily, and induced a concurrent increase in the preference for precise outcomes. This suggests that (4) concern for the precision of an attribute may be related to its relative salience in the decision process. Budescu et al. [6] reported further support for this conclusion.
It is well known that the method used to infer a DM’s preferences can affect the salience of an attribute. Many studies (e.g., [18,37,42,44]) have established that probabilities are more prominent in choice tasks, but outcomes are more salient when prospects are evaluated in isolation, especially when the response scale is compatible with the outcomes but not with the probabilities. Budescu et al. [6] report two experiments in which DMs provided certainty equivalents (CEs for short: the cash amount such that the DM is indifferent between playing the risky gamble and getting the cash) for positive (involving only gains) and negative (involving only losses) prospects. They manipulated the precision of the probability of the target outcome and/or of the potential outcome (i.e., the amount one could win or lose). Given that the CEs are determined separately for each prospect and are expressed in monetary units (which are compatible with the outcomes), they predicted that the DMs would be more concerned with the (im)precision of the outcomes. This prediction received strong support, confirming the effect of dimension prominence on the salience of its precision.
² (continued) Options (a) and (b) span identical intervals of expected losses (from $2.25 to $5.25), so they are considered equally vague. In addition, the midpoints of the two intervals ($3.75) coincide with the expected value of the precise option (c). Thus all three options are equated in expected value.
2 Modeling Decisions Involving Vague Prospects
Budescu et al. [6] proposed a generalized form of prospect theory (PT for short) to model the CEs of vague prospects. PT, originally proposed by Kahneman and Tversky [24] and refined by Tversky and Kahneman [43], distinguishes between an editing phase and an evaluation phase, and assumes that outcomes are represented by a domain-specific value function and that uncertainty operates through a probability weighting function. PT makes some qualitative assumptions about the properties of these two functions, and suggests that a given prospect is evaluated by a simple combination rule (bilinear in form) of the two. To extend the model to vague prospects, Budescu et al. [6] assumed that the DM encodes any imprecise parameter (probability and/or outcome) as an interval of feasible values that can be described by its lower and upper bounds.³ In many cases these bounds are explicitly provided by the environment, but in other circumstances the DM infers them from the information available to her. The novel feature of the model is a new editing operation, invoked to resolve the prospect’s vagueness prior to its evaluation. This vagueness-resolving operation maps each interval (of probabilities and/or outcomes) into a single precise value. The implication is that, everything else being equal, the DM would be indifferent between the interval and its equivalent precise representation, and would make the same decision using either. Because each interval is described by its two endpoints, the interval can be represented by a weighted average of these focal points.⁴ These weights are additional parameters in the model that can be related to the DM’s attitudes towards vagueness and interpreted in a meaningful way. Budescu et al. [6] further assume that the same value and weighting functions that drive DMs’ responses to precisely specified prospects also determine their responses to vague prospects. Let \(CE_{ij}\) be the CE of a vague prospect where \(p_{li}\) and \(p_{ui}\) are the endpoints of the probability interval and \(x_{lj}\) and \(x_{uj}\) are the lower and upper bounds of the outcome range. The model predicts that the value of the CE, \(U(CE_{ij})\), is:
\[ U(CE_{ij}) = U\left[w_x x_{lj} + (1 - w_x)\, x_{uj}\right] \times f\left[w_p p_{li} + (1 - w_p)\, p_{ui}\right] \qquad (1) \]
In this equation \(U(x)\) and \(f(p)\) are the regular value and probability weighting functions of PT; \(w_x\) is a parameter that captures the attitude toward vagueness in outcomes, which we refer to as the outcome vagueness coefficient; and \(w_p\) is the probability vagueness coefficient. Both parameters are bounded: \(0 \le w_p, w_x \le 1\). Thus, if either vagueness coefficient equals 0.5, the DM represents the interval by its midpoint. This weighting is insensitive to the width of the range, so we interpret this pattern as insensitivity (indifference) to vagueness. If, however, \(w > 0.5\), the lower end of the range is overweighted, indicating vagueness aversion. Conversely, when \(w < 0.5\), the upper end of the interval is more salient, indicating vagueness preference. Note that the more extreme the coefficient (i.e., the farther it is from 0.5 in absolute terms, and the closer it is to the relevant endpoint), the greater the intensity of the attitude toward vagueness. Given that \(w_p\) and \(w_x\) are defined and scaled in similar fashion, one can compare \(|w_p - 0.5|\) and \(|w_x - 0.5|\) to determine the DM’s relative sensitivity to the two types of vagueness.
The general model (1) assumes the presence of vagueness in both probabilities and outcomes. If \(p_{li} = p_{ui}\) or \(x_{lj} = x_{uj}\), one can model situations where one of the dimensions is precisely specified (e.g., the classical scenario of Ellsberg’s paradox). And, if \(p_{li} = p_{ui}\) and \(x_{lj} = x_{uj}\), the model reduces to the regular PT for precise prospects. The model deals simultaneously with, and disentangles the effects of, the DM’s distinct attitudes towards probabilities (through the decision weighting function, \(f(p)\)), risk (through the value function, \(U(x)\)), and vagueness on the two dimensions (through the vagueness coefficients, \(w_p\) and \(w_x\)).
³ No further assumptions are made about the likelihood of specific values in the interval. If one assumes a particular probability distribution over the interval, the vagueness is mapped into a second-order probability distribution [9].
⁴ This approach is consistent with other descriptive models of decision making under risk (e.g., [33,39]) that incorporate the DM’s focus on particular outcome values. It is also similar to Ellsberg’s [15] suggestion that, when faced with gambles with vague probabilities, a DM will choose according to some weighted combination of a gamble’s expected utility and its minimal expected utility.
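Equation (1) and the reading of the vagueness coefficients can be sketched in Python as follows. The value function U and weighting function f are passed in as arguments, because Eq. (1) does not fix their form; the linear U and identity f in the usage lines are placeholders chosen only to keep the arithmetic easy to follow, and all function names are hypothetical.

```python
from typing import Callable


def resolve_interval(lower: float, upper: float, w: float) -> float:
    """Vagueness-resolving edit: replace [lower, upper] by w*lower + (1 - w)*upper."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("vagueness coefficient must lie in [0, 1]")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return w * lower + (1.0 - w) * upper


def vague_prospect_value(x_l: float, x_u: float, p_l: float, p_u: float,
                         w_x: float, w_p: float,
                         U: Callable[[float], float],
                         f: Callable[[float], float]) -> float:
    """Right-hand side of Eq. (1): U(resolved outcome) * f(resolved probability)."""
    return U(resolve_interval(x_l, x_u, w_x)) * f(resolve_interval(p_l, p_u, w_p))


def vagueness_attitude(w: float, tol: float = 1e-9) -> str:
    """w > 0.5 overweights the lower end (aversion); w < 0.5 the upper end (seeking)."""
    if abs(w - 0.5) <= tol:
        return "insensitive"
    return "averse" if w > 0.5 else "seeking"


def more_sensitive_dimension(w_x: float, w_p: float) -> str:
    """Compare |w_x - 0.5| with |w_p - 0.5| to see which kind of vagueness matters more."""
    dx, dp = abs(w_x - 0.5), abs(w_p - 0.5)
    if dx == dp:
        return "equal"
    return "outcomes" if dx > dp else "probabilities"


# Placeholder value and weighting functions, for illustration only.
def linear_U(x: float) -> float:
    return x


def identity_f(p: float) -> float:
    return p


# A gain of $3 to $9 with a precise probability of 0.5.
seeking = vague_prospect_value(3, 9, 0.5, 0.5, w_x=0.25, w_p=0.5, U=linear_U, f=identity_f)
neutral = vague_prospect_value(3, 9, 0.5, 0.5, w_x=0.5, w_p=0.5, U=linear_U, f=identity_f)
print(seeking, neutral, vagueness_attitude(0.25))  # prints: 3.75 3.0 seeking
```

With \(w_x = 0.25\) the resolved outcome is $7.50 rather than the $6 midpoint, so the vague gain is valued above its precise midpoint counterpart.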
3 Empirical Tests of the Model
In Budescu et al. [6], DMs provided CEs for “regular” prospects with precise probabilities and precise outcomes (denoted PP) and for imprecise prospects. We denote prospects with precise probabilities and vague outcomes as PV, their counterparts with vague probabilities and precise outcomes as VP, and prospects with vague probabilities and vague outcomes as VV. For each prospect, subjects provided a single monetary value whose sure receipt they would find equally attractive to the opportunity to play the gamble. The PV and VP prospects were equally vague and were matched in expected value with the PP prospects. To motivate careful responses, the DMs were given the chance to earn money based on a random subset of their CEs.
A simple and straightforward way to assess the impact of vagueness is to compare the subjects’ valuations of PP, PV, and VP prospects matched in expected value and vagueness. These (model-free) comparisons indicate that (a) in both domains subjects were more concerned with the (im)precision of the outcomes than with the vagueness of the probabilities, (b) for gains, subjects displayed a consistent (and significant) preference for vague outcomes, and (c) for losses, subjects showed a reversed pattern of consistent (and significant) avoidance of vague outcomes. These patterns were recently replicated by Du and Budescu [13].
To fit the model, Budescu et al. [6] used the functions advocated by Tversky and Kahneman [43].⁵ They used a two-parameter piecewise (domain-specific) power value function:
\[ U(x) = \begin{cases} x^{\alpha} & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -(-x)^{\beta} & \text{if } x < 0, \end{cases} \qquad (2) \]
and a one-parameter nonlinear decision weighting function:
\[ f(p) = \frac{p^{\gamma}}{\left[p^{\gamma} + (1 - p)^{\gamma}\right]^{1/\gamma}}, \qquad \gamma > 0. \qquad (3) \]
⁵ We make no special claims regarding the superiority or unique status of these functions. We tested other forms that also fit the data quite well. We focus on these functions because they are simple (one parameter for value and one for probabilities) and are frequently used in the literature.
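Equations (2) and (3) translate directly into code. The Python sketch below also inverts Eq. (2), which is what turns a predicted value \(U(CE)\) back into a dollar certainty equivalent. The parameter values in the usage lines (\(\alpha = \beta = 0.88\), \(\gamma = 0.61\)) are illustrative only and are not estimates from this study; the function names are hypothetical.

```python
def value(x: float, alpha: float, beta: float) -> float:
    """Eq. (2): piecewise power value function with U(0) = 0."""
    if x > 0:
        return x ** alpha
    if x < 0:
        return -((-x) ** beta)
    return 0.0


def weight(p: float, gamma: float) -> float:
    """Eq. (3): one-parameter probability weighting function (gamma > 0)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def value_to_ce(v: float, alpha: float, beta: float) -> float:
    """Inverse of Eq. (2): the sure amount whose value equals v."""
    if v > 0:
        return v ** (1.0 / alpha)
    if v < 0:
        return -((-v) ** (1.0 / beta))
    return 0.0


# Precise prospect: a 50% chance to win $6 (illustrative parameter values).
alpha = beta = 0.88
gamma = 0.61
ce = value_to_ce(weight(0.5, gamma) * value(6.0, alpha, beta), alpha, beta)
print(f"CE of a 50% chance at $6: ${ce:.2f}")
```

With \(\gamma < 1\) the function overweights small probabilities and underweights large ones, and with \(\alpha < 1\) the CE of this gain prospect falls below its $3 expected value.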
The model was fit separately for each subject across all of that subject’s CEs. The global fit of the model was good,⁶ and the estimated vagueness coefficients captured the observed trends in the data well. The coefficients of vagueness for outcomes were more extreme than their counterparts for probabilities, indicating a higher concern for their precision. In the domain of gains the parameters indicated strong vagueness preference for outcomes, and there was a clear shift toward avoidance of vague losses (16 of the 24 DMs switched from preference for vague gains to dislike of vague losses). In both domains the probability vagueness coefficients were close to 0.5.
⁶ The median root mean squared (RMS) deviation between the observed and the fitted CEs was $0.99 in the first study and $1.19 in the second.
4 The Present Study
This switch in the DMs’ attitude towards the vagueness of outcomes from one domain to another is an intriguing novel finding. We believe that it is related to the distinctive focus of attention induced by positive and negative prospects, and we suggest that this is an instance of goal framing [29]. When determining CEs for positive and negative prospects, the DMs are motivated by different concerns: they strive to maximize gains and minimize losses, anchoring on, and paying more attention to, the relevant focal endpoint. The CE of an imprecise prospect in the domain of gains is the value that matches the gamble’s potential to maximize gains; hence the salience of the higher end of the range of values and the tendency towards vagueness seeking. Conversely, when facing potential losses, the focal point of the range is its lower end (i.e., the worst possible loss), so vagueness avoidance is induced.
In the present study we examine more carefully the differential attitude to vagueness and its impact on the valuation of gains and losses. We address this issue by eliciting and analyzing CEs of a new class of stimuli that, to the best of our knowledge, have not been studied before, namely vague mixed-outcome prospects of the type, “You have a probability p to obtain a positive outcome between \(\$G_l\) and \(\$G_u\) and the complementary probability (1 − p) to lose an amount between \(-\$L_l\) and \(-\$L_u\).” Traditionally, research on the distinction between gains and losses has relied, almost exclusively, on simple and direct comparisons between decisions involving strictly positive prospects on the one hand, and decisions involving strictly negative prospects on the other. Most of the empirical results used to characterize the value function of PT [24,43] are in this category, as is the Budescu et al. [6] study of the effects of vagueness. Recently, Levy and Levy [30,31] have argued that most real-life decisions involve mixed prospects (i.e., possibilities of gains and/or losses) and speculated that some of the empirical regularities that seem to support PT’s value function may be inaccurate and biased because of this methodological problem. Although their analysis was shown to be flawed in its details (see [1,46]), their general concern that mixed gambles are understudied is valid. In recent years researchers have studied decisions involving the more complex mixed prospects more systematically (see [3,11,38,47]). In the present study we extend this work to cases where the outcomes are imprecise.
The current specification of the model (1) assumes that all outcomes are in the same domain (either all gains or all losses), so we replace it with a more general form that can handle mixed prospects. Consistent with PT [43] and many other modern models of individual decision making under risk (see [34]), we assume that a mixed prospect can be decomposed into its gain and loss components (but see [47] for a challenge to this assumption). More specifically, we assume that the overall worth of a mixed gamble (with precise probability) can be represented by a simple sum of its positive and negative components. Let CE be the CE of a mixed vague prospect, and let \(p_{gain}\) be the probability of a gain. To stress the distinction between the two domains, we use the terms \(G_{lower}\) and \(G_{upper}\) for the endpoints of the range of potential gains and \(L_{lower}\) and \(L_{upper}\) for the corresponding endpoints of the range of potential losses. We also distinguish between two domain-specific coefficients for outcome vagueness, \(w_{gain}\) and \(w_{loss}\) (both bounded, i.e., \(0 \le w_{gain}, w_{loss} \le 1\)). The value of the CE of the vague mixed prospect, \(U(CE)\), is given by:
\[ U(CE) = f(p_{gain})\, U\left[w_{gain} G_{lower} + (1 - w_{gain})\, G_{upper}\right] + f(1 - p_{gain})\, U\left[w_{loss} L_{lower} + (1 - w_{loss})\, L_{upper}\right] \qquad (4) \]
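Equation (4) can be sketched the same way. The Python code below is self-contained, so it repeats the value and weighting functions of Eqs. (2) and (3). It assumes, in line with the goal-framing account above, that \(L_{lower}\) is the worse (more negative) loss, so that \(w_{loss} > 0.5\) signals vagueness avoidance. The function names and the linear, unweighted parameter values in the usage lines are hypothetical and serve only to make the arithmetic transparent.

```python
def value(x: float, alpha: float, beta: float) -> float:
    """Eq. (2): piecewise power value function."""
    if x > 0:
        return x ** alpha
    if x < 0:
        return -((-x) ** beta)
    return 0.0


def weight(p: float, gamma: float) -> float:
    """Eq. (3): probability weighting function."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def value_to_ce(v: float, alpha: float, beta: float) -> float:
    """Inverse of Eq. (2): the sure amount whose value is v."""
    if v > 0:
        return v ** (1.0 / alpha)
    if v < 0:
        return -((-v) ** (1.0 / beta))
    return 0.0


def mixed_ce(p_gain: float, g_lower: float, g_upper: float,
             l_lower: float, l_upper: float, w_gain: float, w_loss: float,
             alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """CE of a vague mixed prospect under Eq. (4); l_lower is the worst loss."""
    gain = w_gain * g_lower + (1.0 - w_gain) * g_upper
    loss = w_loss * l_lower + (1.0 - w_loss) * l_upper
    u = (weight(p_gain, gamma) * value(gain, alpha, beta)
         + weight(1.0 - p_gain, gamma) * value(loss, alpha, beta))
    return value_to_ce(u, alpha, beta)


# Precise benchmark: 50% to win $6, 50% to lose $6.
precise = mixed_ce(0.5, 6, 6, -6, -6, w_gain=0.5, w_loss=0.5)
# Vague gain ($3 to $9) with vagueness seeking, precise loss.
vague_gain = mixed_ce(0.5, 3, 9, -6, -6, w_gain=0.25, w_loss=0.5)
# Precise gain, vague loss (-$9 to -$3) with vagueness avoidance.
vague_loss = mixed_ce(0.5, 6, 6, -9, -3, w_gain=0.5, w_loss=0.75)
print(precise, vague_gain, vague_loss)  # prints: 0.0 0.75 -0.75
```

With these chosen coefficients the sketch reproduces, by construction, the ordering described in the Summary: the prospect with a vague gain is valued above the precise benchmark, and the one with a vague loss below it.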
The data obtained in this experiment will allow us to test the generality of the previous results. More specifically, we aim to (a) replicate some of the results from the previous studies using strictly positive/negative prospects, (b) test a set of related new predictions about mixed prospects, (c) compare the attitudes towards vagueness in (appropriately matched) mixed and strictly positive/negative prospects, and (d) estimate the model’s parameters (especially the vagueness coefficients) separately for the two types of prospects.
A second goal of this study is methodological. We plan to obtain valuations of the vague mixed prospects by means of two distinct measurement procedures and compare the respective patterns of results. There are numerous methods for eliciting value and utility functions (for partial reviews see [4,16,21,22,34]). Tversky et al. [44] were the first to highlight the assumption of procedural invariance that is implicit in most decision models. In a nutshell, the expectation is that a rational DM would reveal identical preferences that, in turn, would be mapped into identical utility functions with any of the various assessment methods. However, many empirical studies have shown this somewhat naïve hypothesis to be false. For example, Hershey and Schoemaker [22] have documented inconsistencies between the certainty- and probability-equivalence methods, McCord and de Neufville [35] have illustrated inconsistencies in lottery-equivalent assessments, and there is a substantial literature on the famous “reversal of preferences” among pricing, matching, and choice procedures (e.g., [5,36,42,44,45]). There are also several forms of CE elicitation, and they do not always agree completely (see the results reported by Bostic et al. [5] and by Loomes [32]). The most straightforward procedure is the one employed in Budescu et al. [6]: DMs are instructed to provide for each prospect a single monetary value whose sure receipt they would find equally attractive to the opportunity to play the gamble.⁷ Loomes [32] refers to this as the standard valuation procedure. Adopting Luce’s [34] terminology, we call the value obtained by this direct judgment procedure the judged certainty equivalent (JCE). A natural concern is that people respond differently to risky and riskless outcomes, and this may bias JCEs in systematic ways. An alternative approach infers the CE indirectly from a series of pairwise choices. The DM is asked to choose between the target prospect and a variety of fixed outcomes spanning a wide range of values. Presumably, the DM would prefer high sure outcomes to the prospect, and would choose the prospect over low ones. Thus, there must exist an intermediate value that the DM would find equivalent to the prospect. Luce [34] refers to this inferred value as the choice certainty equivalent (CCE).
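A choice certainty equivalent can be inferred from a sequence of pairwise choices by repeatedly halving the range of sure amounts. The Python sketch below uses plain bisection; it is one possible sequential procedure, not necessarily the algorithm in [34] or the one used in this study, and it assumes the DM’s choices are noise-free and monotone (once a sure amount is preferred to the prospect, every larger amount is too). The `prefers_sure` callback and the simulated DM are hypothetical stand-ins for real responses.

```python
from typing import Callable


def choice_certainty_equivalent(prefers_sure: Callable[[float], bool],
                                low: float, high: float,
                                tol: float = 0.01, max_steps: int = 100) -> float:
    """Bisection over sure amounts in [low, high].

    prefers_sure(x) is True when the DM chooses the sure amount x over the
    prospect. Assumes prefers_sure(low) is False and prefers_sure(high) is True.
    """
    lo, hi = low, high
    for _ in range(max_steps):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2.0
        if prefers_sure(mid):
            hi = mid  # sure amount chosen: the CE is at or below mid
        else:
            lo = mid  # prospect chosen: the CE is above mid
    return (lo + hi) / 2.0


# Simulated noise-free DM whose (hypothetical) indifference point is $4.20.
def simulated_dm(x: float) -> bool:
    return x >= 4.20


cce = choice_certainty_equivalent(simulated_dm, low=-9.0, high=9.0)
print(f"inferred CCE: ${cce:.2f}")
```

Each choice halves the interval, so about eleven choices narrow an $18 range to within one cent. Real responses are noisy, and a single inconsistent choice sends bisection into the wrong half, which is one reason sequential procedures are often paired with consistency checks.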
In some cases (e.g., [43]) all the outcomes to be compared to the target prospect are predetermined and shown simultaneously (in the form of a list) to the DM. In other cases (e.g., [5,32]) these values are shown sequentially (one at a time) and are calculated by a special algorithm (e.g., [34], Appendix C), contingent on the DM’s previous responses. In the present study, DMs evaluate the various prospects and provide both JCEs and CCEs. Thus, we are in a position to assess the similarity of the results and to test the invariance of the conclusions regarding attitudes to various types of vagueness across elicitation methods.
7 Alternatively, DMs are told that they own the prospect and are asked to determine a minimal selling price, or they are asked for the maximal amount they would be willing to pay for it.
Valuation of Vague Prospects with Mixed Outcomes
4.1 Method
Participants
Forty subjects were recruited by posting a classified ad online at the University of Illinois at Urbana-Champaign. Subjects were told they would be paid $15 for completing two experimental sessions, with additional monetary compensation awarded for a gambling task. The ages of the 40 subjects (22 male and 18 female) ranged from 19 to 42, with a mean of 24.55 and a median of 23.
Experimental Design
Subjects were randomly assigned to four experimental conditions defined by the assignment of the two elicitation methods, judged certainty equivalent (JCE) and choice certainty equivalent (CCE), to the two sessions. In the first session, half of the subjects were assigned to the JCE condition and half to the CCE condition. In the second session, half of the subjects in each of these groups switched to the other elicitation method, and the other half used the same method. All the subjects assessed the same prospects, which were presented in a random order (that varied from one subject to another and from one session to another). Each prospect was defined by three components: the probability of winning, the potential gain, and the possible loss. Each of the three could be precise or vague. The upper and lower bounds of the probability ranges were created by pairing three probabilities, 0.25, 0.50, and 0.75, with the constraint that the lower bound must be lower than, or equal to, the upper bound. This yields three precise probabilities (lower bound = upper bound) and three vague probabilities (lower bound < upper bound). In the current chapter we only analyze the decisions involving precise probabilities. The upper and lower bounds of the outcome ranges were determined by pairing each of three equally spaced values, −$9, −$6, and −$3 for losses and +$3, +$6, and +$9 for gains, with the same constraint that the lower bound must be lower than, or equal to, the upper bound. The outcome ranges define four groups: three precise gains ($3, $6, $9), three vague gains ($3 to $6, $3 to $9, $6 to $9), three precise losses (−$9, −$6, −$3), and three vague losses (−$3 to −$6, −$3 to −$9, −$6 to −$9).
Subjects evaluated 162 prospects in each session: 27 strictly positive prospects, 27 strictly negative prospects, and 108 mixed prospects. The various types of prospects are abbreviated by one- or two-letter codes. The first letter denotes the precision of the gains (Pg is used for precise gains and Vg for vague gains). The next letter denotes the precision of the losses: Pl and Vl designate a precise and a vague loss, respectively. The relevant 36 strictly positive and strictly negative prospects (with precise probabilities) are described in full by a single letter: there are 9 Pg, 9 Pl, 9 Vg, and 9 Vl prospects. The 108 mixed prospects were defined by assigning a precise probability to a (precise or vague) gain, and the complementary probability to a (precise or vague) loss. They are denoted by two letters that describe the precision of the gains and the losses, respectively. Thus, we had 27 Pg Pl prospects, 27 Pg Vl prospects, 27 Vg Pl prospects, and 27 Vg Vl prospects.
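The counts above can be checked with a short Python sketch that enumerates the precise-probability part of the design: outcome ranges are all (lower, upper) pairs with lower ≤ upper, and a mixed prospect combines one probability, one gain range, and one loss range. The helper names are hypothetical, and the vague-probability prospects are deliberately left out, since the chapter analyzes only precise probabilities.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

PROBS = [0.25, 0.50, 0.75]  # precise probabilities of a gain


def outcome_ranges(values):
    """All (lower, upper) pairs with lower <= upper, as in the design."""
    return list(combinations_with_replacement(sorted(values), 2))


def precision(r):
    """'P' for a precise outcome (lower == upper), 'V' for a vague range."""
    return "P" if r[0] == r[1] else "V"


GAINS = outcome_ranges([3, 6, 9])      # 3 precise + 3 vague gain ranges
LOSSES = outcome_ranges([-9, -6, -3])  # 3 precise + 3 vague loss ranges

# Mixed prospects: probability p of a gain in one range, 1 - p of a loss.
mixed = list(product(PROBS, GAINS, LOSSES))
mixed_types = Counter(precision(g) + "g " + precision(l) + "l" for _, g, l in mixed)

# Strictly positive prospects with precise probabilities.
positive_types = Counter(precision(g) + "g" for _, g in product(PROBS, GAINS))

print(len(GAINS), len(mixed))       # 6 108
print(sorted(mixed_types.items()))  # [('Pg Pl', 27), ('Pg Vl', 27), ('Vg Pl', 27), ('Vg Vl', 27)]
print(sorted(positive_types.items()))  # [('Pg', 9), ('Vg', 9)]
```

The 27 prospects per mixed pattern are simply 3 probabilities × 3 gain ranges × 3 loss ranges of the relevant precision.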
Procedure
Subjects were run individually in a computerized lab (the number of subjects who were run simultaneously varied from one to five). They were seated in individual cubicles. Instructions were given at the beginning of the computer program, and a hard copy was also available for reference. The two sessions were run approximately a week apart. General instructions and procedures were identical for both CE elicitation methods. Subjects were asked to determine a “for sure” dollar amount that would be as attractive (or unattractive) as the opportunity to play the gamble. The lottery was described to the subject by four values in table form: the probability of a gain, the potential gain, the probability of a loss, and the possible loss.
Fig. 1. Screenshot of the display used to elicit judged certainty equivalents (JCEs).
In the JCE method, subjects were simply asked to type that dollar amount into an empty response box on the computer screen. A screenshot of the task as presented to the subjects at the beginning of a trial is reproduced in Figure 1. In the CCE method, subjects were asked to choose between a lottery and each of the (precise) values on a list. For each value listed, subjects had to indicate whether they preferred the sure money or the gamble. The values were listed in ascending order and spanned the range from the lowest to the highest possible value of the prospect. A screenshot of the task as presented to the subjects at the beginning of the trial is reproduced in Figure 2. The example refers to a Vg Vl gamble that offered a 0.75 probability to win an amount in the $6 to $9 range, and a 0.25 probability to lose an amount in the −$3 to −$9 range. Seven values ranging from −$9 to $9 in $3 increments are listed, and the DM needs to specify whether he or she prefers the lottery or the cash. Initially, the lottery is marked as the preferred choice, but after the DM indicates a preference for a certain cash amount (say $3), all higher amounts are automatically selected by the program. When the subject indicated that he or she was satisfied with all the pairwise choices, the program identified the range where the CE most likely lay (the region where the subject’s preferences switched between the lottery and the sure amount). Next, the subjects were shown a second list of values within this narrow range and asked to choose the preferred option. For example, if subjects preferred the lottery for values at or below $0, but preferred the sure gain for values at or above $3, the new values would span the $0 to $3 range in $0.50 increments (see the second screen in Figure 2). The center of the range where the switch of preferences occurred is the prospect’s CCE.
Fig. 2. Screenshots of the displays used in (a) the first stage and (b) the second stage of the elicitation of the choice certainty equivalents (CCEs).
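A minimal Python sketch of the two-stage CCE procedure just described, assuming (as the program enforced by auto-selecting higher amounts) that choices are monotone, so the switch from lottery to cash happens once on each list. The simulated DM, the function names, and the default grid settings (−$9 to $9 in $3 steps, then $0.50 steps) are illustrative, taken from the Figure 2 example rather than from the study's software.

```python
from typing import Callable, List, Tuple


def grid(start: float, stop: float, step: float) -> List[float]:
    """Evenly spaced sure amounts from start to stop inclusive."""
    n = round((stop - start) / step)
    return [start + i * step for i in range(n + 1)]


def switch_interval(values: List[float],
                    prefers_cash: Callable[[float], bool]) -> Tuple[float, float]:
    """Adjacent listed amounts between which the choice flips from the
    lottery (lower amount) to the sure amount (higher amount)."""
    for lo, hi in zip(values, values[1:]):
        if not prefers_cash(lo) and prefers_cash(hi):
            return lo, hi
    raise ValueError("choices never switch within the listed range")


def two_stage_cce(prefers_cash: Callable[[float], bool],
                  low: float = -9.0, high: float = 9.0,
                  coarse: float = 3.0, fine: float = 0.5) -> float:
    # Stage 1: coarse list spanning the prospect's possible outcomes.
    lo, hi = switch_interval(grid(low, high, coarse), prefers_cash)
    # Stage 2: finer list inside the bracket found in stage 1.
    lo, hi = switch_interval(grid(lo, hi, fine), prefers_cash)
    # The CCE is the centre of the final switching interval.
    return (lo + hi) / 2


# Simulated DM who takes the cash whenever it is at least $1.20.
print(two_stage_cce(lambda x: x >= 1.20))  # 1.25
```

With these settings the CCE is resolved to within $0.25 of the DM's switch point, using at most 7 + 7 pairwise choices.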
Subjects had three practice trials (with the option of adding more, if they wanted to), with in-depth instructions to help them understand the nature of the task and of the lotteries. The instructions stressed that to maximize gains one should provide honest and accurate CEs. At the conclusion of the last trial, subjects played a number of lotteries to accrue their additional compensation (on top of the $15 flat fee). We used a procedure developed by Mellers et al. [36] that encourages subjects to give their true valuations: ten prospects were randomly selected and paired, and in each of the five pairs subjects got to play the lottery they valued more (based on its stated JCE or inferred CCE). Subjects were allowed to proceed at their own pace through both the practice trials and the judgment task. After half of the trials were completed, subjects were allowed to take a five-minute break. Each session lasted approximately one hour. Before playing the lotteries, subjects were shown the relevant values (the chance of winning and the payoffs) and 10 numbers (0 to 9). They were allowed to select their own winning numbers (the number of selections was proportional to the probability of winning the lottery). For instance, if subjects had a 50% chance of winning, they could select any five numbers. The subjects activated a random number generator using the computer mouse. The random numbers were visible but scrolled very quickly on the screen. Subjects could stop the random number generator at any time with a click of the mouse. If the number that appeared matched one of the numbers they had selected, the subject won and the corresponding amount was added to her total. If the subject lost the lottery, the corresponding loss was subtracted from her total. The minimum amount received was $15, even if the total after the completion of the lotteries was below $15. The average amount earned was $38.
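The incentive scheme can be sketched in Python as follows. This is a hypothetical reconstruction under stated assumptions: outcomes are precise (the text does not say how vague amounts were resolved), the stopped scrolling display is modeled as a uniform draw, ties in stated CEs go to the first prospect of a pair, and losses are stored as negative numbers. With ten numbers, picking p × 10 of them gives exact odds only when that product is whole (true for 0.50 but not for 0.25 or 0.75); because the text does not say how those cases were handled, the sketch raises an error unless `n_numbers` makes the product whole (e.g., 20, a hypothetical choice).

```python
import random
from typing import Sequence, Tuple

# (probability of a gain, gain, loss); losses are negative, 0 if there is none.
Prospect = Tuple[float, float, float]


def winning_numbers(p_win: float, n_numbers: int = 10) -> int:
    """How many of n_numbers equiprobable numbers the subject may pick."""
    k = p_win * n_numbers
    if abs(k - round(k)) > 1e-9:
        raise ValueError("p_win * n_numbers must be a whole number")
    return round(k)


def play(prospect: Prospect, rng: random.Random, n_numbers: int = 10) -> float:
    """Resolve one precise prospect; returns the amount won (or lost, if negative)."""
    p_gain, gain, loss = prospect
    picked = set(rng.sample(range(n_numbers), winning_numbers(p_gain, n_numbers)))
    drawn = rng.randrange(n_numbers)  # models the stopped scrolling display
    return gain if drawn in picked else loss


def payment(prospects: Sequence[Prospect], valuations: Sequence[float],
            rng: random.Random, fee: float = 15.0, n_numbers: int = 10) -> float:
    """Pair ten randomly chosen prospects, play the higher-valued one in each
    of the five pairs, and never pay less than the flat fee."""
    chosen = rng.sample(range(len(prospects)), 10)
    total = fee
    for a, b in zip(chosen[0::2], chosen[1::2]):
        better = a if valuations[a] >= valuations[b] else b  # ties: first of pair
        total += play(prospects[better], rng, n_numbers)
    return max(total, fee)


rng = random.Random(2008)
prospects = [(0.5, 6.0, -3.0)] * 12        # hypothetical: all 50/50, +$6 or -$3
stated_ces = [0.5 * i for i in range(12)]  # hypothetical stated CEs
print(payment(prospects, stated_ces, rng))  # one of 15.0, 18.0, 27.0, 36.0, 45.0
```

Because only the higher-valued member of each pair is played, understating or overstating a CE can only lead to playing a lottery the subject actually likes less, which is the rationale for the scheme.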
4.2 Results
All DMs evaluated all prospects twice (a few days apart and, in half the cases, using different elicitation methods). Most of our analyses of the key predictions are based only on the results of the first session, where the DMs’ responses are uncontaminated by previous exposure to the same problems.
Attitudes Towards Vague Outcomes in the Domains of Gains and Losses
The goal of this analysis is to confirm the differential pattern of attitudes to vague outcomes in the domains of gains and losses first documented in Budescu et al. [6]. To this end we compare the CEs provided by the subjects to the strictly positive and strictly negative prospects matched in terms of their expected values. The absolute values of the CEs were submitted to a four-way ANOVA with one between-subjects factor (elicitation method = JCE or CCE) and three within-subjects factors (domain = gains or losses, precision of outcomes = P or V, and absolute expected value). The mean CEs are presented in Table 1. There was a significant effect of elicitation method (F(1, 38) = 10.22, p < 0.01), indicating that the JCEs were, on average, higher than the CCEs (mean |JCE| = 3.26 versus mean |CCE| = 2.49). Naturally, we found a significant monotonic relationship between the prospects’ (absolute) expected values and their (absolute) CEs (F(3, 114) = 51.62, p < 0.01). There was a significant interaction between elicitation method and the absolute expected values (F(3, 114) = 5.37, p < 0.01), indicating that the JCEs were more sensitive to the EVs than the corresponding CCEs.
Table 1. Mean CEs of the strictly positive and strictly negative prospects as a function of the elicitation method, the domain, the (absolute) expected value, and the precision type.
|EV|  Precision          Loss JCE  Loss CCE  Loss Mean  Gain JCE  Gain CCE  Gain Mean
1.5   Precise               −1.94     −1.08      −1.51      2.08      1.86       1.97
1.5   Vague                 −2.26     −1.49      −1.88      2.22      3.25       2.74
1.5   Vague − Precise       −0.32     −0.41      −0.37      0.14      1.39       0.77
2.25  Precise               −2.89     −1.39      −2.14      2.51      3.07       2.79
2.25  Vague                 −2.84     −1.57      −2.20      3.14      3.06       3.10
2.25  Vague − Precise        0.05     −0.18      −0.06      0.63     −0.01       0.31
3.0   Precise               −2.72     −1.68      −2.20      3.27      2.88       3.08
3.0   Vague                 −3.17     −1.89      −2.53      4.51      3.86       4.19
3.0   Vague − Precise       −0.45     −0.21      −0.33      1.24      0.98       1.11
4.5   Precise               −4.50     −2.35      −3.43      4.91      4.09       4.50
4.5   Vague                 −4.07     −2.60      −3.34      5.10      3.78       4.44
4.5   Vague − Precise        0.43     −0.25       0.09      0.19     −0.31      −0.06
Mean  Precise               −3.01     −1.62      −2.32      3.19      2.98       3.09
Mean  Vague                 −3.09     −1.89      −2.49      3.74      3.49       3.62
Mean  Vague − Precise       −0.08     −0.27      −0.17      0.55      0.51       0.53
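The marginal means quoted in the text can be recovered from the bottom ("Mean") rows of Table 1. The Python sketch below averages the relevant cells without weights, which assumes balanced cells; it reproduces the reported values (3.35 vs. 2.40 for gains vs. losses, 3.05 vs. 2.70 for vague vs. precise, 3.26 vs. 2.49 for JCE vs. CCE) to within rounding. The names `CELLS` and `marginal_mean` are hypothetical.

```python
from statistics import mean

# Absolute CEs from the bottom ("Mean") rows of Table 1,
# keyed by (domain, elicitation method, precision).
CELLS = {
    ("gains", "JCE", "precise"): 3.19, ("gains", "JCE", "vague"): 3.74,
    ("gains", "CCE", "precise"): 2.98, ("gains", "CCE", "vague"): 3.49,
    ("losses", "JCE", "precise"): 3.01, ("losses", "JCE", "vague"): 3.09,
    ("losses", "CCE", "precise"): 1.62, ("losses", "CCE", "vague"): 1.89,
}


def marginal_mean(domain=None, method=None, precision=None):
    """Unweighted mean of the cells matching every filter that is not None
    (assumes equal cell sizes)."""
    wanted = (domain, method, precision)
    return mean(value for key, value in CELLS.items()
                if all(w is None or w == k for w, k in zip(wanted, key)))


for label, kwargs in [("gains", {"domain": "gains"}),
                      ("losses", {"domain": "losses"}),
                      ("vague", {"precision": "vague"}),
                      ("precise", {"precision": "precise"}),
                      ("JCE", {"method": "JCE"}),
                      ("CCE", {"method": "CCE"})]:
    print(f"{label:8s}{marginal_mean(**kwargs):.3f}")
```

Filtering on two factors at once (e.g., `marginal_mean(domain="gains", method="CCE")`) gives the domain-by-method means used to interpret the interaction.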
On average, the strictly positive prospects were valued higher (mean |CE| = 3.35) than their negative counterparts (mean |CE| = 2.40), and the effect of the domain was significant (F(1, 38) = 66.01, p < 0.01). We also found an interaction between the domain and the elicitation method (F(1, 38) = 20.52, p < 0.01), because the difference between positive and negative prospects was more pronounced for CCEs (mean |CCE|gain = 3.23 versus mean |CCE|loss = 1.76) than for JCEs (mean |JCE|gain = 3.46 versus mean |JCE|loss = 3.05). Finally, and most importantly for our present purposes, we confirmed that the prospects with vague outcomes were assigned higher (absolute) CEs than the prospects with precise outcomes (mean |CE|vague = 3.05 versus mean |CE|precise = 2.70). Table 1 displays this effect for each level of |expected value|. This difference is significant (F(1, 38) = 9.01, p < 0.01), and none of the interactions involving this factor was significant. To summarize, the results of this analysis indicate that, on average, subjects are vagueness seekers for gains (i.e., CE(Vg) > CE(Pg)) and vagueness avoiders for losses (i.e., CE(Vl) < CE(Pl)). In general, vague outcomes yield larger absolute values in both domains (i.e., |CE(V)| > |CE(P)|), but the effect of outcome imprecision is stronger for gains (i.e., |CE(Vg)| − |CE(Pg)| > |CE(Vl)| − |CE(Pl)|).
Attitudes Towards Vague Outcomes in Mixed Prospects
Having documented the differential effects of imprecise outcomes in each domain, we turn now to the analysis of the mixed prospects. Consistent with the standard modeling of mixed prospects (e.g., [43]), we assume that for any probability of gain, Πg, the total value of a binary mixed prospect is a weighted sum of its positive and negative components:
$$U(\text{Prospect}) = f(\Pi_g)\,U^{+}(\text{gains}) + f(1 - \Pi_g)\,U^{-}(\text{losses}),$$
where f(·) is a monotonic weighting function, and U+(·) and U−(·) are monotonic value functions for the domains of gains and losses, respectively. If these three functions are invariant, it is possible to derive predictions about the various types of prospects that vary in terms of their (im)precision, based on the previous regularities documented by Budescu et al. [6] and Du and Budescu [13].
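To see how the decomposition above can yield the ordering predicted in the next paragraph, here is a toy Python sketch. Its simplifications are ours, not the chapter's: f(p) = p, linear U+ and U− (so the computed value is itself the CE), and a vague range valued as a weighted average of its endpoints, with a hypothetical weight w on the lower (less favorable) end. With w_gain = 0.3 (vagueness seeking) and w_loss = 0.6 (milder vagueness avoidance), the gain effect is the larger one, and the four patterns line up as predicted.

```python
from typing import Callable, Tuple

Range = Tuple[float, float]  # (lower, upper); lower == upper for a precise outcome


def collapse(r: Range, w: float) -> float:
    """Hypothetical device: value a range as a weighted average of its
    endpoints, with weight w on the lower (less favourable) endpoint."""
    lower, upper = r
    return w * lower + (1 - w) * upper


def mixed_value(p_gain: float, gain: Range, loss: Range, w_gain: float, w_loss: float,
                f: Callable[[float], float] = lambda p: p,
                u_plus: Callable[[float], float] = lambda x: x,
                u_minus: Callable[[float], float] = lambda x: x) -> float:
    """U(prospect) = f(p) U+(gain) + f(1 - p) U-(loss)."""
    return (f(p_gain) * u_plus(collapse(gain, w_gain))
            + f(1 - p_gain) * u_minus(collapse(loss, w_loss)))


P_GAIN, P_LOSS = (6, 6), (-6, -6)  # precise outcomes
V_GAIN, V_LOSS = (3, 9), (-9, -3)  # vague ranges with the same midpoints
W_GAIN, W_LOSS = 0.3, 0.6          # illustrative weights, not estimates

values = {
    "Vg Pl": mixed_value(0.5, V_GAIN, P_LOSS, W_GAIN, W_LOSS),
    "Vg Vl": mixed_value(0.5, V_GAIN, V_LOSS, W_GAIN, W_LOSS),
    "Pg Pl": mixed_value(0.5, P_GAIN, P_LOSS, W_GAIN, W_LOSS),
    "Pg Vl": mixed_value(0.5, P_GAIN, V_LOSS, W_GAIN, W_LOSS),
}
ranking = sorted(values, key=values.get, reverse=True)
print(ranking)  # ['Vg Pl', 'Vg Vl', 'Pg Pl', 'Pg Vl']
```

If the loss weight were pushed far enough above 0.5 that the loss effect outweighed the gain effect, Vg Vl would drop below Pg Pl; the middle of the predicted ordering therefore hinges on the gain effect being stronger.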
In particular, the pattern of vagueness seeking (avoidance) for gains (losses), together with the observation that the effects in the positive domain are stronger, leads us to predict the following order of the four CEs:
CE(Vg Pl) > CE(Vg Vl) > CE(Pg Pl) > CE(Pg Vl).
To test this prediction we compared the average CEs assigned by the subjects to the mixed prospects under the four combinations of (im)precision. Table 2 displays the mean CE values across all subjects and elicitation methods. The columns of the table are arranged according to the predicted order. The key prediction is supported: the Vg Pl prospects are assigned the highest CEs, and the Pg Vl ones the lowest CEs, for all probabilities. However, the relative valuations of the all-vague (Vg Vl) and the all-precise (Pg Pl) prospects are not consistent across the various probabilities. Given this pattern, it is more informative to analyze the data separately for each probability, and thereby eliminate the potential effects of differential weighting of low and high probabilities (see [38] on the critical impact of this probability).
Table 2. Mean CEs of the mixed prospects as a function of the probability of gain and the precision of the gains and losses.
Probability of gain    Vg Pl    Vg Vl    Pg Pl    Pg Vl
0.25                   −0.80    −1.41    −0.96    −1.50
0.50                    1.25     0.92     0.95     0.49
0.75                    3.08     2.96     2.77     2.38
Mean                    1.22     0.86     0.85     0.51
(Vg Pl = vague gains, precise losses; Vg Vl = vague gains, vague losses; Pg Pl = precise gains, precise losses; Pg Vl = precise gains, vague losses.)
The mean CEs across the nine cases of each type were submitted to two-way ANOVAs with one between-subjects factor (elicitation method = JCE or CCE) and one within-subjects factor (pattern of precision of outcomes), separately for each probability. The effect of the pattern of precision was significant in all cases (F(3, 114) = 8.77, 6.23, and 6.13 for probabilities 0.25, 0.50, and 0.75, respectively; all p < 0.01). Furthermore, the CEs of the Vg Pl prospects were significantly higher than those of their Pg Vl counterparts in all cases (F(1, 114) = 18.58, 18.40, and 15.92 for probabilities 0.25, 0.50, and 0.75, respectively; all p < 0.01). Overall, the mean CCE (1.40) is higher than the mean JCE (0.32), but the difference was significant only for the lowest probability, 0.25 (F(1, 38) = 14.88, p < 0.01). The interaction between the two factors is not significant. In summary, the differential attitude to vague gains and losses is clearly observed in the valuation of mixed prospects: prospects involving the most desirable combination of vague gains and precise losses are valued most highly, whereas the opposite combination of precise gains and vague losses is assigned the lowest CEs, with the all-vague and all-precise combinations in between. This pattern holds for all probabilities and both elicitation methods.
Are Attitudes Towards Vague Outcomes Amplified or Attenuated with Mixed Prospects?
We have obtained direct CEs of the mixed prospects as well as CEs of their (positive and negative) constituents. In the previous sections we have shown that, in both cases, our subjects assigned high (low) CEs to prospects with vague gains and precise losses (vague losses and precise gains).
To determine whether this tendency is affected by the presentation mode, we calculated for each prospect the difference between its CE under the mixed presentation and the sum of the CEs of its positive and negative components (i.e., mixed − (positive + negative)). The null hypothesis, consistent with the standard decomposition, is that this difference is uniformly 0. The intriguing alternative is that the tendencies to favor vague gains and dislike vague losses are accentuated by the joint presentation mode (mixed prospect), so that the difference (a) is positive and (b) varies systematically across the four patterns of (im)precision, with the highest discrepancy in the vague gains and precise losses case and the lowest in the precise gains and vague losses case. The differences between the CEs of the mixed prospects and their constituents were submitted to a three-way ANOVA with one between-subjects factor (elicitation method = JCE or CCE) and two within-subjects factors (probability of gain and pattern of precision of outcomes). Table 3 displays the mean differences across all subjects. The overall mean difference (−0.06) is not significantly different from 0 (F(1, 39) = 0.18, p > 0.05). The mean difference between the mixed prospects and their positive and negative constituents increased monotonically as a function of the probability of a gain (F(2, 78) = 137.27, p < 0.05). However, there were no significant differences between the two CE elicitation methods. There were significant differences between the four patterns of (im)precision (F(3, 117) = 4.26, p < 0.05), although none of the individual means differed significantly from 0. Post hoc Tukey tests revealed that only one pair of patterns (Pg Pl and Vg Vl) was responsible for this difference. In the all-precise (Pg Pl) case the mixed prospects were valued higher than the sum of their positive and negative components, whereas in the all-vague (Vg Vl) case we found the opposite pattern. We conclude that although the attitudes to vague gains and losses are very similar under both presentation formats, there are some subtle differences: when both gains and losses are precise, DMs value the mixed prospects more than the sum of their positive and negative components, but when both gains and losses are vague, the pattern is reversed.
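The difference score behind this analysis is simple to compute. In the hypothetical Python sketch below, each record pairs a mixed prospect with the CEs of its matched strictly positive and strictly negative constituents (our reading of "constituents"), and the function averages mixed − (positive + negative) within each precision pattern; under the null hypothesis every average is near 0. The numbers in `toy` are made up for illustration and are not the study's data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# (precision pattern, CE of mixed prospect, CE of positive part, CE of negative part)
Record = Tuple[str, float, float, float]


def decomposition_gaps(records: Iterable[Record]) -> Dict[str, float]:
    """Mean of CE(mixed) - (CE(positive) + CE(negative)) per precision pattern.

    Under the standard decomposition every mean should be close to 0."""
    gaps = defaultdict(list)
    for pattern, ce_mixed, ce_pos, ce_neg in records:
        gaps[pattern].append(ce_mixed - (ce_pos + ce_neg))
    return {pattern: mean(diffs) for pattern, diffs in gaps.items()}


# Hypothetical CEs for three prospects (not data from the study).
toy = [("Pg Pl", 1.0, 3.0, -2.5), ("Pg Pl", 0.8, 2.9, -2.2), ("Vg Vl", 0.2, 3.1, -2.7)]
print(decomposition_gaps(toy))  # approximately {'Pg Pl': 0.3, 'Vg Vl': -0.2}
```

A positive average means the mixed prospect is valued above the sum of its parts, as in the all-precise case of Table 3; a negative one, as in the all-vague case, means it is valued below.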
Table 3. Mean difference between the CEs of the mixed prospects and the predicted CEs based on their positive and negative components (i.e., mixed − (positive + negative)), as a function of the probability of gain, the pattern of precision, and the method of elicitation.
Probability  Method     Pg Pl    Pg Vl    Vg Pl    Vg Vl    Mean
0.25         JCE        −0.37    −0.72    −0.60    −0.96    −0.66
0.25         CCE        −0.47    −0.78    −0.74    −0.90    −0.73
0.25         Overall    −0.42    −0.75    −0.67    −0.93    −0.69
0.50         JCE         0.39     0.12     0.011    0.12     0.16
0.50         CCE         0.24     0.21     0.160   −0.01     0.15
0.50         Overall     0.32     0.17     0.084    0.06     0.16
0.75         JCE         0.56     0.50     0.39     0.46     0.48
0.75         CCE         0.25     0.25     0.22     0.30     0.25
0.75         Overall     0.41     0.37     0.30     0.38     0.36
Mean         JCE         0.19    −0.03    −0.07    −0.13    −0.01
Mean         CCE         0.00    −0.11    −0.12    −0.20    −0.11
Mean         Overall     0.10    −0.07    −0.09    −0.17    −0.06
(Pg Pl = precise gains & precise losses; Pg Vl = precise gains & vague losses; Vg Pl = vague gains & precise losses; Vg Vl = vague gains & vague losses.)
Fitting a Generalized PT Model to the JCE Data
To further illustrate the previous points, the generalized version of the PT model (see (4)) was fit to the CEs. We encountered some estimation problems with the CCEs, so we only report the results based on the direct JCE values. The extended PT model (4) was fit to the data using the standard PT functional forms (see (2) and (3)). Because we used only three distinct probabilities of gains, we did not estimate the parameter γ. Instead, its value was set to 0.65, the average of the values estimated for the gain and loss conditions (0.61 and 0.69, respectively) by Tversky and Kahneman [43]. The model was fit separately to the CEs of the mixed prospects and to the CEs of the two domain-specific (positive and negative) prospects.8 Thus, we have two independent sets of estimates of the four parameters of this model: wgain, wloss, α, and β. Table 4 summarizes the group-level analysis. It displays the parameter estimates obtained using the median CE (across all 20 DMs) for each prospect. We found slight risk seeking for gains (α > 1) and risk neutrality for losses (β ≈ 1). The vagueness coefficients confirm the pattern of vagueness seeking in the domain of gains (wgain < 0.5) and vagueness avoidance for losses (wloss > 0.5) for both types of prospects. The differentiation between gains and losses is somewhat reduced in the estimates based on the mixed prospects. Table 5 summarizes the results obtained when fitting the model separately for each DM; it displays the median individual parameter estimates. The results are consistent with those in Table 4. Most importantly, we replicate the reversal of attitudes to vagueness across domains. We calculated for each DM the difference between the parameters estimated from the domain-specific (positive or negative) and the mixed prospects. There are no significant differences between the two sets of individual estimates of α (t(19) = 1.09, p > 0.05), β (t(19) = 1.66, p > 0.05), and wgain (t(19) = −0.80, p > 0.05). There was a significant difference between the two sets of estimates of wloss (t(19) = −3.05, p < 0.05), indicating that the estimates based on the mixed prospects are higher (closer to 1). However, when subjects for whom wloss(mixed) = 1 are removed from the analysis, the two sets are indistinguishable (t(15) = −1.52, p > 0.05). In summary, the model fits the JCEs quite well, and the parameter estimates confirm the patterns uncovered in the original (model-free) analyses: vagueness seeking in the domain of gains and vagueness avoidance in the domain of losses. The two sets of parameters are very similar to each other, but the mixed gambles display a slightly higher symmetry between the domains of gains and losses.
8 The estimation was performed with the PROC NLIN procedure in SAS.
Table 4. Estimates of model parameters based on median CEs.
Prospects              wgain   wloss   α      β      RMSE
Positive or negative   0.18    0.65    1.59   1.03   0.12
Mixed                  0.36    0.68    1.15   1.03   0.03
Table 5. Median estimates of model parameters based on fitting of individual subjects.
Prospects              wgain   wloss   α      β      RMSE
Positive or negative   0.29    0.59    1.13   0.78   1.03
Mixed                  0.36    0.69    1.18   1.02   2.45
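Equations (2)–(4) are given earlier in the chapter and are not reproduced in this section, so the following Python sketch of the fitting step rests on assumptions: the Tversky–Kahneman (1992) weighting function with γ fixed at 0.65, power value functions with no loss-aversion parameter (consistent with the four free parameters listed above), and a vague range collapsed to a weighted average of its endpoints with weight w on the less favorable end (so w_gain < 0.5 means vagueness seeking and w_loss > 0.5 vagueness avoidance). An exhaustive grid search stands in for SAS PROC NLIN. Fitting noise-free synthetic CEs generated from hypothetical parameters checks that the search recovers them.

```python
from itertools import product

GAMMA = 0.65  # fixed, as in the text (mean of 0.61 and 0.69)


def weight(p: float, gamma: float = GAMMA) -> float:
    """Tversky-Kahneman (1992) probability weighting function (assumed form)."""
    if p <= 0.0 or p >= 1.0:
        return p
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


def value(x: float, alpha: float, beta: float) -> float:
    """Power value function with no loss-aversion parameter (assumed form)."""
    return x ** alpha if x >= 0 else -((-x) ** beta)


def inverse_value(v: float, alpha: float, beta: float) -> float:
    return v ** (1 / alpha) if v >= 0 else -((-v) ** (1 / beta))


def collapse(r, w):
    """Weight w on the lower (less favourable) endpoint of a range (assumed)."""
    lower, upper = r
    return w * lower + (1 - w) * upper


def predicted_ce(prospect, w_gain, w_loss, alpha, beta):
    """prospect = (p_gain, gain_range or None, loss_range or None)."""
    p, gain, loss = prospect
    total = 0.0
    if gain is not None:
        total += weight(p) * value(collapse(gain, w_gain), alpha, beta)
    if loss is not None:
        total += weight(1 - p) * value(collapse(loss, w_loss), alpha, beta)
    return inverse_value(total, alpha, beta)


def fit_grid(prospects, ces, w_grid, exp_grid):
    """Least squares by exhaustive grid search (a stand-in for PROC NLIN).
    Returns ((w_gain, w_loss, alpha, beta), RMSE)."""
    best_sse, best = float("inf"), None
    for params in product(w_grid, w_grid, exp_grid, exp_grid):
        sse = sum((predicted_ce(pr, *params) - ce) ** 2
                  for pr, ce in zip(prospects, ces))
        if sse < best_sse:
            best_sse, best = sse, params
    return best, (best_sse / len(ces)) ** 0.5


gains = [(3, 3), (3, 6), (3, 9), (6, 6), (6, 9), (9, 9)]
losses = [(-9, -9), (-9, -6), (-9, -3), (-6, -6), (-6, -3), (-3, -3)]
design = [(p, g, l) for p in (0.25, 0.5, 0.75) for g in gains for l in losses]
truth = (0.2, 0.6, 1.2, 1.0)  # hypothetical parameter values
ces = [predicted_ce(pr, *truth) for pr in design]
params, rmse = fit_grid(design, ces, [0.2, 0.4, 0.6, 0.8], [0.8, 1.0, 1.2, 1.4])
print(params, rmse)  # (0.2, 0.6, 1.2, 1.0) 0.0
```

With real JCEs one would use a much finer grid or a gradient-based nonlinear least-squares routine; the coarse grid here only keeps the example small and fast.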
5 Summary and Discussion
The main goal of this study was to examine some implications of the intriguing pattern of attitudes towards vague outcomes first uncovered in Budescu et al. [6] and replicated by Du and Budescu [13]. These studies found that, contrary to the typical generalizations regarding the effects of vagueness (e.g., [9]), most DMs prefer positive prospects with vague outcomes. Subsequently, Du and Budescu [13] showed that this tendency is strongest for relatively low levels of vagueness. Budescu et al. [6] speculated that when determining CEs for positive prospects, DMs seek a precise value that matches the vague prospect’s potential to maximize gains. This highlights the salience of the upper end of the interval of values and induces vagueness seeking. The second interesting result in these studies is the reversal of the attitude towards imprecision in the domain of losses (see [13] for additional examples of the sensitivity and malleability of these attitudes). Budescu et al. [6] attributed this shift to the distinctive focus of attention induced by positive and negative prospects, and suggested that this is an instance of goal framing ([29]). When determining CEs for negative prospects, DMs seek to minimize losses, so they pay more attention to (and, indeed, anchor their judgments on) the interval’s lower end (i.e., the worst possible loss). This results in vagueness avoidance.
The present study can be viewed as an elaborate replication designed to validate these regularities and to establish their robustness. The design was generalized in two distinct ways, related to the elicitation method and to the format of the prospects. Unlike the previous studies, in which the CEs of the target prospects were obtained by a direct judgment method (JCE), we added a more complex choice-based elicitation method (CCE). And whereas all previous studies used only strictly positive or negative prospects, in the present study we used, for the first time, vague mixed-outcome prospects of the type “You have a probability p to obtain a positive outcome between $Gl and $Gu, and the complementary probability (1 − p) to lose an amount between −$Ll and −$Lu.” Generally speaking, both results (vagueness seeking for gains accompanied by vagueness aversion for losses) were replicated with the new elicitation method and the mixed format. Consider first the elicitation method. For the positive and negative prospects the JCEs were slightly higher, more sensitive to the probabilities, and less sensitive to the gain/loss differences than the CCEs. For the mixed prospects, on the other hand, the CCEs were higher than the JCEs. This may be due to the fact that in the second stage of the CCE elicitation the DMs often faced sure amounts from only one domain (i.e., only gains or only losses, depending on the location of the interval identified at the end of the first stage), whereas in the JCE task DMs are always forced to consider both the potential gains and losses. Despite these differences, the basic results associated with the effects of the prospects’ precision were remarkably stable across methods.
We did not find any significant interactions between the precision of the positive and/or negative components and the elicitation method, which attests to the robustness of the results across elicitation methods. Even more reassuring is the fact that the results were replicated, almost perfectly, with the CEs of the mixed prospects. In fact, the model’s parameters as estimated from the CEs of these prospects are practically indistinguishable from the estimates based on the CEs of the strictly positive and strictly negative prospects. We have argued that being presented with positive or negative prospects causes DMs to direct their attention to different focal points (the most extreme gain or the most severe loss, respectively), and that this causes differential attitudes toward vagueness in the two cases. How do DMs react when facing mixed prospects that involve positive and negative components? In this case, there is no single dominant goal, and both objectives (maximize gains and minimize losses) must be considered simultaneously. It appears that our subjects balance these two objectives. Interestingly, the relative attention they seem to direct to the two is driven by the probability of a gain (and the complementary probability of a loss). In other words, DMs overweight (underweight) the positive (negative) component as the probability of a gain increases. This is best seen in the JCE results summarized in the various rows of Table 3.
These results show that (a) when the probability of a gain is low (0.25), the mean CE of the mixed prospects is lower than the sum of the CEs of their positive and negative constituents, indicating a higher concern with the loss component; (b) conversely, when the probability of a gain is high (0.75), the mean CE of the mixed prospects is higher than the sum of the CEs of their positive and negative constituents, indicating that more attention is directed to the gain component; and (c) when the probabilities of gain and loss are equal, the mean CE of the mixed prospects matches the sum of its positive and negative constituents quite closely. One of the most intriguing results is the pattern of differences between the CEs of the mixed prospects and the CEs of their positive and negative constituents under the distinct patterns of precision (summarized in the various columns of Table 3). Although the difference is not significantly different from 0 (overall, and for each class), we found that it is not uniform for all types of prospects. It is significantly higher (and positive) in the all-precise cases (Pg Pl) than in the all-vague cases (Vg Vl), where it is negative. This pattern suggests that the standard decomposition assumption (mixed = positive + negative) does not hold universally (see [47]) and, interestingly, that it may be sensitive to the (im)precision of the constituents. We hope to address this issue in future work.
Acknowledgments
This work was supported in part by a grant from the Research Board of the University of Illinois, and by National Science Foundation grant SBC 20050017 awarded to the Rand Corporation and the University of Illinois. We wish to thank Michael Coates and Tzur Karelitz for their help in designing and programming the experiment.
References 1. M. Baucells and F. H. Heukamp. Reevaluation of the results of Levy and Levy (2002a). Organizational Behavior and Human Decision Processes, 94:15–21, 2004. 2. S. W. Becker and F. O. Brownson. What price ambiguity? On the role of ambiguity in decision making. Journal of Political Economy, 72:62–73, 1964. 3. M. H. Birnbaum. Evidence against prospect theories in gambles with positive, negative and mixed consequences. Journal of Economic Psychology, 27:737–761, 2006. 4. H. Bleichrodt, J. L. Pinto, and P. P. Wakker. Making descriptive use of prospect theory to improve the prescriptive use of expected utility. Management Science, 47:1498–1514, 2001. 5. R. Bostic, R. J. Herrnstein, and R. D. Luce. The effect on the preference-reversal phenomenon of using choice indifferences. Journal of Economic Behavior and Organization, 13:193–212, 1990.
Valuation of Vague Prospects with Mixed Outcomes
David V. Budescu and Sara Templin
26. K. M. Kuhn and D. V. Budescu. The relative importance of probabilities, outcomes, and vagueness in hazard risk decisions. Organizational Behavior and Human Decision Processes, 68:301–317, 1996. 27. K. M. Kuhn, D. V. Budescu, J. R. Hershey, K. M. Kramer, and A. K. Rantilla. Attribute tradeoffs in low probability/high consequence risks: The joint effects of dimension preference and vagueness. Risk, Decision, and Policy, 4:31–46, 1999. 28. H. Kunreuther, J. Meszaros, R. M. Hogarth, and M. Spranca. Ambiguity and underwriter decision processes. Journal of Economic Behavior and Organization, 26:337–352, 1995. 29. I. P. Levin, S. L. Schneider, and G. J. Gaeth. All frames are not created equal: A typology and critical analysis of framing effects. Organizational Behavior and Human Decision Processes, 76:149–188, 1998. 30. H. Levy and M. Levy. Experimental tests of prospect theory value function: A stochastic dominance approach. Organizational Behavior and Human Decision Processes, 89:1058–1081, 2002. 31. M. Levy and H. Levy. Prospect theory: Much ado about nothing? Management Science, 48:1334–1349, 2002. 32. G. Loomes. Different experimental procedures for obtaining valuations of risky actions: Implications for utility theory. Theory and Decision, 25:1–23, 1988. 33. L. L. Lopes and G. C. Oden. The role of aspiration level in risky choice: A comparison of cumulative prospect theory and SP/A theory. Journal of Mathematical Psychology, 43:286–313, 1999. 34. R. D. Luce. Utility of Gains and Losses: Measurement-Theoretical and Experimental Approaches. Erlbaum, Mahwah, NJ, 2000. 35. M. McCord and R. de Neufville. Lottery equivalents: Reduction of the certainty effect problem in utility assessment. Management Science, 32:56–60, 1986. 36. B. A. Mellers, S. Chang, M. H. Birnbaum, and L. D. Ord´ on ˜ez. Preferences, prices, and ratings in risky decision making. Journal of Experimental Psychology: Human Perception and Performance, 18:347–361, 1992. 37. B. A. Mellers, E. U. 
Weber, L. D. Ord´ on ˜ez, and A. D. J. Cooke. Utility invariance despite labile preferences. The Psychology of Learning and Motivation, 32:221–246, 1995. 38. J. W. Payne. It’s whether you win or lose: The importance of the overall probabilities of winning or losing in risky choice. Journal of Risk and Uncertainty, 30:5–19, 2005. 39. S. L. Schneider and L. L. Lopes. Reflection in preferences under risk: Who and when may suggest why. Journal of Experimental Psychology: Human Perception and Performance, 12:535–548, 1986. 40. P. J. H. Schoemaker. Preference for information on probabilities versus prizes: The role of risk-taking attitudes. Journal of Risk and Uncertainty, 2:37–60, 1989. 41. P. Slovic. Relative importance of probabilities and payoffs in risk taking. Journal of Experimental Psychology, 78:18–27, 1968. 42. P. Slovic. The construction of preferences. American Psychologist, 50:364–371, 1995. 43. A. Tversky and D. Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 26:297–323, 1992. 44. A. Tversky, S. Sattath, and P. Slovic. Contingent weighting in judgment and choice. Psychological Review, 95:371–84, 1988.
Valuation of Vague Prospects with Mixed Outcomes
275
45. A. Tversky, P. Slovic, and D. Kahneman. The causes of preference reversal. The American Economic Review, 80:204–217, 1990. 46. P. P. Wakker. The data of Levy & Levy (2002) “Prospect theory: Much ado about nothing?” actually support prospect theory. Management Science, 49:979–981, 2003. 47. G. Wu and A. B. Markle. An empirical test of gain-loss separability in prospect theory. Working paper, Graduate School of Business, University of Chicago, IL, 2005.
Collective Search in Concrete and Abstract Spaces
Robert L. Goldstone, Michael E. Roberts, Winter Mason, and Todd Gureckis
Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47401, USA
rgoldsto,robertsm,wimason,[email protected]
Summary. Our laboratory has been studying the emergence of collective search behavior from a complex systems perspective. We have developed an Internet-based experimental platform that allows groups of people to interact with each other in real-time on networked computers. The experiments implement virtual environments where participants can see the moment-to-moment actions of their peers and immediately respond to their environment. Agent-based computational models are used as accounts of the experimental results. We describe two paradigms for collective search: one in physical space and the other in an abstract problem space. The physical search situation concerns competitive foraging for resources by individuals inhabiting an environment consisting largely of other individuals foraging for the same resources. The abstract search concerns the dissemination of innovations in social networks. Across both scenarios, the group-level behavior that emerges reveals influences of exploration and exploitation, bandwagon effects, population waves, and compromises between individuals using their own information and information obtained from their peers.
1 Introduction
The purpose of much of human cognition can be characterized as solving search problems. Concrete examples of important search problems facing us are finding food, friends, a mate, shelter, and parking places. More abstract examples of search are scouring the Web for valuable data, finding a scientific research area that is novel yet impactful, and finding sequences of moves that will allow us to play an effective game of chess. In the concrete cases, the sought-after resources are distributed in an actual physical space, and people have to move through this space to sample locations for the desired resource. In the abstract cases, if spaces exist at all, they are formal constructions; they will often have more than three dimensions, and they may not obey standard Euclidean metric assumptions. Despite these differences between abstract and concrete search problems, there is a growing sentiment that they may share fundamental cognitive operations [31,63]. Pursuing this premise,
we juxtapose concrete and abstract search paradigms to reveal some of this shared cognitive underpinning.
The notion that cognition centrally involves search is hardly new. It was the motivation for Newell and Simon’s classic work [47] on solving problems by searching for sequences of operators that would take an agent from an initial state to a goal state. However, in this work, and in the considerable body of subsequent work in cognitive science that it inspired, the focus has been on individuals solving problems on their own or in teams. Our work focuses on individuals searching for solutions in an environment that consists of other individuals who are also searching. The motivation for this focus is that individuals rarely solve important problems in isolation from one another, in controlled laboratory cubicles. For example, one can think of the continued advancement of science and technology as a massive, real-world collective search problem. Although we might view the individual scientist as a single solution-searching unit, in fact each scientist’s work is influenced by their own discoveries and by the successes, failures, and current efforts of others.
Indeed, the presence of peers with similar motivations fundamentally changes the search process for any individual. Depending upon the circumstances, individuals will be either attracted to or repelled by the solutions of others. Attraction occurs when the cost of exploring a problem space on one’s own is high [6] and when others can act as informative scouts for assessing the quality of solutions that would be costly to gauge personally. Repulsion occurs when competition for resources is keen and early consumers of a resource can preempt its effective consumption by subsequent individuals. In a world that consists of many individuals competitively searching their environment for resources, it is not enough to employ efficient, independent search strategies.
Individuals also need to know when to modify their own search to follow others’ leads or, conversely, to avoid the crowd. One’s peers are a big part of one’s environment. The importance of one’s peers does not stop there, because peers change the rest of the environment as well. When limited resources (like bananas, but unlike Web sites) are used, they are no longer available for others. In these cases, one individual’s consumption negatively affects the resources available for the rest of the group. However, the influences that individuals have on the environment need not be purely competitive. For Web sites, use may facilitate subsequent use, as when early users’ preferences are used to make helpful suggestions for subsequent users [10]. A generalization of this notion of affecting peers by affecting their environment is stigmergy: a form of indirect communication between agents that is achieved by agents modifying their environment and also responding to these modifications [12]. This effect has been well documented in ant swarms, in which ants lay down pheromones as they walk, and these pheromones attract subsequent ants [62]. An analogous stigmergic effect is achieved by “swarms” of humans that make a terrain more attractive to others by wearing down the vegetation with their own steps [26,27,30]. Stigmergy has recently been proposed as an important
mechanism for achieving multirobot cooperation [32] and robustly interacting software systems [50]. For these reasons, searching in a group changes the essential nature of the search process. The experiments and computational models that we describe concern this process of collective search. Our approach stresses the macroscopic, group-level behavior that emerges when the groups’ members pursue their self-oriented strategies. This work is an effort to complement cognitive scientists’ tendency to focus on the behavior of single individuals thinking and perceiving on their own. Social phenomena such as rumors, the emergence of a standard currency, transportation systems, the World Wide Web, resource harvesting, crowding, and scientific establishments arise because of individuals’ beliefs and goals, but the eventual form that these phenomena take is rarely dictated by any individual [25].
2 Foraging for Concrete Resources
A problem faced by all mobile organisms is how to search their environment for resources. Animals forage their environment for food, Web users surf the Internet for desired data, and businesses mine the land for valuable minerals. When an organism forages in an environment that consists, in part, of other organisms that are also foraging, unique complexities arise. The resources available to an organism are affected not just by its own foraging behavior, but also by the simultaneous foraging behavior of all of the other organisms. The optimal foraging strategy for an organism is no longer a simple function of the distribution of resources and movement costs; it also depends on the strategies adopted by other organisms.
2.1 Group Experiments on Foraging
One model in biology for studying the foraging behavior of populations is the ideal free distribution (IFD) model [16,58]. This model assumes that animals distribute themselves among patches so as to maximize the resources they gain. The specific assumptions of the model are that animals (1) are free to move between resource patches without cost, (2) have correct (“ideal”) knowledge of the rate of food occurrence at each patch, and (3) are equal in their abilities to compete for resources. The model predicts an equilibrium distribution of foragers such that no forager can profit by moving elsewhere. This condition is met if the distribution of foragers matches the distribution of resources across patches. Consistent with this model, groups of animals often distribute themselves in a nearly optimal manner, with their distribution matching the distribution of resources. For example, Godin and Keenleyside [21] distributed edible larvae to two ends of a tank filled with cichlid fish. The food was distributed in ratios of 1:1, 2:1, or 5:1. The cichlids quickly distributed themselves in rough accord with the relative
rates of the food distribution before many of the fish had even acquired a single larva and before most fish had acquired larvae from both ends. Although animals frequently distribute themselves in approximate accord with an ideal free distribution, systematic deviations are also observed. One common result is undermatching, defined as a distribution of animals that is less extreme than the distribution of resources [36]. An example would be a 75/25 distribution of foragers when the resources have an 80/20 distribution. When undermatching occurs, there are fewer animals at the richer patch, and more animals at the leaner patch, than is optimal. The few experiments that have examined group foraging behavior with humans have also found undermatching [38,42]. Our experiments extend the previous studies of group foraging in humans in four directions. First, we have developed a computer-based platform for the foraging experiment that allows us to manipulate experimental variables that would be difficult to manipulate in a more naturalistic environment. Second, we collect second-by-second data on the amount of resources and number of participants at different pools, which allows us to explore variation in resource use with high temporal resolution. Third, although our environment is virtual, it is naturalistic in one important respect: resources are distributed in a continuous spatial environment rather than at two discrete locations. Fourth, we do not designate or identify the resource alternatives to participants; as in many natural situations [36], the participants must discover the number and locations of resource patches themselves. Using our virtual environment with interacting participants, we manipulated the relative outputs of the different resource pools and the knowledge possessed by the agents. In Godin and Keenleyside’s experiment with cichlids [21], every cichlid could see the other cichlids as well as the larvae resources at both ends of the tank.
Gallistel [17] argued that this kind of information is important for the cichlids to distribute themselves rapidly in accord with the resource distribution: they learn about the resource distributions by observing events that do not directly involve themselves. However, in individual reinforcement learning situations, an agent only has access to the outcomes of its own actions; it does not have access to the values of options not selected. Both situations occur naturally, and it is possible that the ability of a group to distribute itself efficiently to resources depends on the information at each agent’s disposal [61]. In Experiment 1 [23], two resource pools were created with different rates of replenishment. The participants’ task was to obtain as many resource tokens as possible during an experiment. A participant obtained a token by being the first to move on top of it. Resources were split between the two pools 50/50, 65/35, or 80/20. In our visible condition, each participant could see all of the other participants and the entire food distribution. In our invisible condition, participants could not see one another, and they gradually acquired knowledge of the resource distributions by virtue of their reinforcement histories. Participants were 166 undergraduate students from Indiana University, run in eight sessions with about 21 participants in each session.
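The ideal free distribution benchmark discussed above can be made concrete for these three splits. The Python sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes each pool's food is shared equally among the foragers present (per-capita intake = pool rate / number of foragers) and that moving is free. Foragers are placed one at a time wherever a newcomer would receive the most, which yields an allocation in which no forager can raise its intake by switching pools. The function names and the tolerance used to label undermatching are hypothetical.

```python
from typing import List


def ifd_allocation(rates: List[float], n_foragers: int) -> List[int]:
    """Greedy sketch of an ideal free distribution (hypothetical helper).

    Assumes each pool's food is shared equally by the foragers in it and that
    moving between pools is free. Each forager joins the pool where a newcomer
    would get the highest per-capita intake.
    """
    counts = [0] * len(rates)
    for _ in range(n_foragers):
        best = max(range(len(rates)), key=lambda i: rates[i] / (counts[i] + 1))
        counts[best] += 1
    return counts


def per_capita_intake(rates: List[float], counts: List[int]) -> List[float]:
    """Food per forager in each pool under the equal-sharing assumption."""
    return [r / c for r, c in zip(rates, counts)]


def is_equilibrium(rates: List[float], counts: List[int]) -> bool:
    """True if no single forager could raise its intake by switching pools."""
    for i, ci in enumerate(counts):
        if ci == 0:
            continue
        current = rates[i] / ci
        for j, cj in enumerate(counts):
            if j != i and rates[j] / (cj + 1) > current:
                return False
    return True


def matching_label(forager_share: float, resource_share: float,
                   tol: float = 0.01) -> str:
    """Compare the richer pool's share of foragers with its share of food.

    `tol` is an arbitrary illustrative threshold, not a statistical test.
    """
    if forager_share < resource_share - tol:
        return "undermatching"
    if forager_share > resource_share + tol:
        return "overmatching"
    return "matching"


if __name__ == "__main__":
    for rates in ([50, 50], [65, 35], [80, 20]):
        counts = ifd_allocation(rates, 20)
        print(rates, counts, is_equilibrium(rates, counts))
    # The 75/25 foragers versus 80/20 food example from the text
    print(per_capita_intake([80, 20], [75, 25]))
    print(matching_label(0.75, 0.80))
```

With 20 foragers the sketch places 10/10, 13/7, and 16/4 foragers, reproducing the food proportions. In the 75/25 example, each forager at the lean pool receives 0.8 units against about 1.07 at the rich pool, so a move would pay; that gap is what the text means by undermatching being suboptimal.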
Participants worked at their own computers and were told that they were being asked to participate in an experiment on group behavior. They were instructed to try to pick up as many “food” resources as possible by moving their icon on top of food locations. Participants within a group co-existed in a virtual environment consisting of an 80 × 80 grid of squares populated by replenishing resource pools and other human-controlled agents. Participants controlled their position within this world by moving up, down, left, and right using the four arrow keys on their computers’ keyboards. Each participant was represented by a yellow dot. In the “visible” condition, all of the other participants’ locations were represented by blue dots, and available food resources were represented by green dots. In the “invisible” condition, participants only saw their own position on the screen and any food gathered by that participant in the last two seconds; after this interval, these consumed food pieces disappeared. The rate of food delivery was based on the number of participants, with one piece of food delivered every 4/N seconds, where N is the number of participants. This yields an average of one food piece per participant per four seconds. When a piece of food was delivered, it was assigned to a pool probabilistically according to the distribution rate. For example, in the 80/20 condition, the food would be delivered to the more plentiful pool 80% of the time, and to the less plentiful pool 20% of the time. The location of the food within the pool followed a Gaussian distribution. Every experiment was divided into six 5-minute sessions. These six games consisted of all combinations of the two levels of knowledge (visible versus invisible) and the three levels of resource distribution (50/50, 65/35, 80/20). The locations of the resource pools were different for each of the six sessions so that participants could not carry knowledge over from one session to the next.
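The food-delivery schedule can be sketched in a few lines. This is a minimal Python illustration, not the experiment's software: the pool centres, the standard deviation of the Gaussian, and the rounding and clipping of coordinates to the 80 × 80 grid are assumptions, since the text does not specify them. Pieces arrive every 4/N seconds, each piece is assigned to a pool with probability equal to that pool's share, and its location is drawn from a Gaussian around the pool centre.

```python
import random
from typing import List, Sequence, Tuple

GRID_SIZE = 80  # the 80 x 80 world described in the text


def deliver_food(n_participants: int, duration_s: int,
                 shares: Sequence[float],
                 centres: Sequence[Tuple[float, float]],
                 sd: float, seed: int = 0) -> List[Tuple[float, int, int, int]]:
    """Return (time_s, pool, x, y) for each food piece delivered in one game.

    One piece every 4/N seconds, i.e. N/4 pieces per second in total, which
    averages one piece per participant every four seconds. `centres` and `sd`
    are hypothetical placement parameters.
    """
    rng = random.Random(seed)
    n_pieces = duration_s * n_participants // 4
    events = []
    for k in range(n_pieces):
        t = 4 * k / n_participants
        # choose a pool in proportion to its share (e.g. 0.8 versus 0.2)
        pool = rng.choices(range(len(shares)), weights=shares)[0]
        cx, cy = centres[pool]
        # Gaussian placement around the pool centre, clipped to the grid
        x = min(GRID_SIZE - 1, max(0, round(rng.gauss(cx, sd))))
        y = min(GRID_SIZE - 1, max(0, round(rng.gauss(cy, sd))))
        events.append((t, pool, x, y))
    return events


if __name__ == "__main__":
    # 21 participants, one 5-minute (300 s) game, 80/20 split
    food = deliver_food(21, 300, (0.8, 0.2), ((20, 20), (60, 60)), sd=5.0)
    share_rich = sum(1 for _, pool, _, _ in food if pool == 0) / len(food)
    print(len(food), round(share_rich, 3))
```

With 21 participants, a 300-second game yields 1,575 pieces, and roughly 80% of them land in the richer pool. Because pool assignment is random per piece, the realized split in any one game only approximates the nominal 80/20.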
However, the distances between the two resource pools were kept constant across the sessions. As a preliminary analysis of the distribution of agents across resource pools, Figure 1 shows the frequency with which each of the 80 × 80 grid cells was visited by participants, broken down by the six experimental conditions. The brightness of a cell increases proportionally with the number of times the cell was visited. The few isolated bright specks can be attributed to participants who decided not to move for extended periods of time. The thick and thin circles show one standard deviation of the food distribution for the more and less plentiful resources, respectively. An inspection of this figure indicates that agents spend the majority of their time within relatively small regions centered on the two resource pools. The concentration of agents in pools’ centers is greater for visible than invisible conditions, and is greater for the more plentiful pool. For the invisible conditions, there is substantial scatter of travel outside of one standard deviation of the pools’ centers. The dynamics of the distribution of agents to resources is shown in Figure 2, broken down by the six conditions. In this figure, the proportion of agents in the two pools is plotted over time within a session. We plot the proportion of agents in the pools, only counting agents that are within three standard
deviations of one pool or the other. For the visible and invisible resources conditions, an average of 0.7% and 16.7% of participants, respectively, were excluded because they were not in either resource pool. This large difference in exclusion rates is most likely due to the need for exploratory foraging in the invisible resources condition. The 50/50 graph is not particularly diagnostic, because matching and undermatching both predict an even split there. For the 65/35 graph, horizontal lines indicate the proportions that would match the distribution of food. Although fast adaptation takes place, the asymptotic distribution of agents systematically undermatches the optimal probabilities. For example, for the 65/35 distribution the 65% pool attracts an average of 60.6% of the agents in the 50–300 second interval, a value that is significantly different from 65%. Undermatching is similarly found for the 80/20 distribution. So, if we were efficiency consultants, we would recommend that some foragers in the less productive pool move to the more productive pool, where the resources are relatively underutilized.
Fig. 1. A frequency plot of participants’ visits to each grid square in Experiment 1.
A final analysis of interest explores the possibility of periodic fluctuations in resource use. Informal experimental observations suggested the occurrence of waves of overuse and underuse of pools. Participants seemed to congregate heavily at a pool for a period of time and then become frustrated with the difficulty of collecting food there (due to the large population in the pool), precipitating a migration from this pool to the other pool. If a relatively large subpopulation within a pool decides at roughly the same time to migrate from one pool to another, then cyclic waves of population change may emerge. To test this, we applied a Fourier transformation to the time series data. Fourier transformations translate a time-varying signal into a set
of sinusoidal components. Each sinusoidal component is characterized by a frequency, an amplitude, and a phase (its offset in time, which determines where it crosses the Y-axis). For our purposes, the desired output is a frequency plot of the amount of power at different frequencies; large power at a particular frequency indicates a strong periodic response. The frequency plots for Experiment 1 show significantly greater power at low frequencies for invisible than visible conditions. The power at lower frequencies is particularly high for the invisible condition with an 80/20 distribution. For all three invisible conditions, the peak power is at approximately 0.02 cycles/second. This means that the agents tend to have waves of relatively dense crowding at one pool that repeat about once every 50 seconds (the reciprocal of 0.02 cycles/second). This 50-second period includes both the time to migrate from the first pool to the second pool and the time to return to the first pool. A pronounced power peak at lower frequencies is absent for the visible condition.
Fig. 2. Changes in group sizes over the course of a session in Experiment 1.
One account for the difference in the two visibility conditions is that in the visible condition, each agent can see whether other agents are closer than themselves to underexploited resource pools. The temptation to leave a dissatisfying pool for a potentially more lucrative pool would be tempered by the awareness that other agents are already heading away from the dissatisfying pool and toward
the lucrative pool. However, in the invisible condition, agents may become dissatisfied with a pool populated with many other agents, but as they leave the pool they would not be aware that other agents are also leaving. Thus, the ironic consequence of people’s shared desire to avoid crowds is the emergence of migratory crowds! In a related irony, Rapoport et al. [55] report in this volume that when individuals within a group each choose paths so as to have a low-cost journey, the collective result can be high average journey costs. The problem in both their experiments and ours is that people end up adopting similar courses of action despite their best intentions, and inefficient congestion arises. In a second experiment [24], we wished to decouple the visibility of agents and food resources. Accordingly, we ran groups of participants in conditions where food resources, but not fellow foragers, were visible, and vice versa. This decoupling allows us to determine whether people use other people as sources of information about where food may be located, and if so, how this information is used. An organism may be attracted toward patches occupied by its conspecifics, because it can use the prevalence of conspecifics in a patch as information that the patch is highly productive. Consistent with this hypothesis, field experiments on migratory birds have shown that the presence of birds attracts other birds to a region [53]. Adding birds to a site makes it more likely for still more birds to choose the site for nesting, which is why duck hunters put out decoys. Another familiar example is the tendency of buzzards to use the presence of other buzzards as an indicator of possible food sources, and therefore to fly to where there is a large group of buzzards. On the other hand, an animal may avoid sites that already have a crowd of conspecifics.
Pulliam and Danielson’s ideal preemptive distribution hypothesis [54] is that the first animals to arrive in an area will take the best territory, with subsequent arrivals taking the best remaining territories. This pattern has been observed with aphids. One of the central questions examined by Experiment 2 is: are people more like buzzards or aphids with respect to the influence of conspecifics on foraging strategies? The results, shown in Figure 3, indicate both systematic undermatching and overmatching in the distribution of human participants to resources over time. Consistent overmatching was found when resources were visible but other agents were invisible. When agents but not resources were visible, undermatching was found. The results support the hypothesis of ideal preemptive distribution rather than conspecific attraction. If participants were attracted to a resource pool because of the presence of other foragers at the pool, then overmatching would have been predicted with invisible resources and visible agents. That is, in a situation where direct knowledge of resources was lacking but the popularity of a pool could be used to estimate the pool’s productivity, the presence of a relatively large number of participants at the richer pool would be expected to draw still more participants to the pool. In fact, a modest level of undermatching was observed in this condition. By contrast, according to the ideal preemptive distribution hypothesis, individuals at a site
preempt other individuals from occupying that site. This is consistent with the undermatching observed when agents but not resources are visible, and it is also consistent with the release from undermatching (i.e., overmatching) observed when resources but not agents are visible. By this account, overmatching is found because participants are attracted to the rich, productive pools and are not dissuaded from approaching them by the presence of other participants (who are invisible).
Fig. 3. Changes in group sizes over the course of a session in Experiment 2. The thicker line represents the more productive resource pool.
As with Experiment 1, a Fourier analysis suggested cyclic waves of population migration. Together, the results suggest that our participants did not naturally make second-order inferences that other participants would be influenced by the same factors (i.e., a dearth of resources in a pool, or a sudden onset of food resources) that influenced them.
2.2 An Agent-Based Model of Collective Foraging
In developing EPICURE [56], an agent-based computational model, our intention was to build as simple a model as possible, including only those strategies that the empirical results directly implicated. An interactive version of the resulting model is available at [15]. We populated a world with agents that probabilistically decided from moment to moment toward which spatial grid location they would move. The likelihood that a particular location is selected as the target destination is based on the location’s value relative to
all other locations. Value is influenced by several factors. The first is distance: the closer a location is, the more likely it is to be selected as a target destination. Second, once a location has been selected as a target, we increase its value so that it will tend to be selected as a target at the next moment too. This is a way of incorporating inertia to target locations, and we call this goal bias. Our empirical results showed that people tend to stay near a resource for a while and then switch to the other pool if they know where it is. We do not see people start to move toward a pool, get midway into “no-man’s land,” and then head back to the pool they just left. However, computational foragers did exactly this until we incorporated inertia for their destinations. There are different rule variants for the foragers in the visible and invisible conditions. For the agents who can see all of the other agents and food, a third factor is that the value of a location increases as the density of food in its vicinity increases, and a fourth is that the location’s value decreases as the density of other agents increases. The motivation for this is that if other agents are present in a region, there will be more competition for the food, and hence the region is less attractive. Neither of these sources of information is available in the invisible condition. In the invisible condition, agents must gradually accumulate a personal history of where they have found food. Every time food is found in a location, the location’s value increases, and this increase diffuses to the nearby locations. Conversely, if a cell is visited but contains no food, then its value and its neighbors’ values decrease. EPICURE is able to account for the empirically observed pattern of overmatching and undermatching for the four visibility conditions in Experiments 1 and 2.
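The value-based target choice just described can be illustrated with a small sketch. This Python code is not the EPICURE implementation. Several details are assumptions: the way the factors are combined (a distance discount times an exponential of a food-minus-crowding score, times a goal-bias multiplier), all weights, the use of the four nearest neighbours for diffusion, and selection of targets with probability proportional to value (a Luce-style choice rule). The memory update mirrors the invisible condition: value rises where food was found and falls where it was not, and part of each change spreads to neighbouring cells.

```python
import math
import random
from typing import Dict, Tuple

Cell = Tuple[int, int]


def location_value(distance: float, score: float, is_current_goal: bool,
                   goal_bias: float = 2.0) -> float:
    """Hypothetical value of a candidate target location.

    `score` summarizes what the agent knows about the location: in the visible
    condition, e.g. visible_score(...); in the invisible condition, the agent's
    learned memory for that cell.
    """
    value = math.exp(score) / (1.0 + distance)  # nearer and better cells score higher
    if is_current_goal:
        value *= goal_bias  # inertia: keep heading to the current target
    return value


def visible_score(food_density: float, agent_density: float,
                  w_food: float = 1.0, w_agents: float = 0.5) -> float:
    """More food nearby raises value; more competitors nearby lowers it."""
    return w_food * food_density - w_agents * agent_density


def choose_target(values: Dict[Cell, float], rng: random.Random) -> Cell:
    """Pick a cell with probability proportional to its value."""
    cells = list(values)
    return rng.choices(cells, weights=[values[c] for c in cells])[0]


def update_memory(memory: Dict[Cell, float], cell: Cell, found_food: bool,
                  step: float = 1.0, spread: float = 0.5) -> None:
    """Invisible condition: reward or penalize a visited cell and its neighbours."""
    delta = step if found_food else -step
    x, y = cell
    memory[cell] = memory.get(cell, 0.0) + delta
    for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        memory[nb] = memory.get(nb, 0.0) + spread * delta


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = {
        (10, 10): location_value(3.0, visible_score(0.6, 0.2), False),
        (40, 40): location_value(30.0, visible_score(0.9, 0.1), True),
    }
    print(choose_target(candidates, rng))

    memory: Dict[Cell, float] = {}
    update_memory(memory, (5, 5), found_food=True)
    update_memory(memory, (6, 5), found_food=False)
    print(memory[(5, 5)], memory[(6, 5)])
```

The exponential keeps every value positive, so the proportional choice rule stays well defined even when learned memories become negative after empty visits.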
For the visible condition of Experiment 1, EPICURE’s agents rapidly converge on a distribution that approximates the food distributions but consistently undermatches them (as shown in Figure 3). For the invisible condition, we also find that the agents can learn where the food clusters are, given a bit more time, but that there is also stable asymptotic undermatching over the entire experiment. Finding undermatching was surprising to us because we had anticipated that we would only get undermatching if we included an explicit bias in our agents to assume that all discovered resource patches had approximately equal frequencies of outputs. In fact, our agent-based system spontaneously produced undermatching, as our human participants did, even though it makes none of the typical assumptions posited by the biology literature to explain undermatching [36]. It does not need a bias to spend equal time at different resource pools, unequal competitive abilities among foragers, or foraging interference. Why does EPICURE predict undermatching? The critical notion is spatial turfs. A single agent can efficiently patrol a compact region of about 10 squares despite large differences in food productivity. Although the 80% pool has four times the productivity of the 20% pool, both pools have the same spatial extent and variance, and so can support numbers of agents that are more similar than the pools’ productivities would predict.
Collective Search in Concrete and Abstract Spaces
Fig. 4. EPICURE matching results for the visible (top) and invisible (bottom) 80/20 food distribution conditions with different uniform variance pool sizes. In each condition, the first number at the top of the graphs indicates the radius of the 80% pool (square regions were used for simplicity), and the second number indicates the radius of the 20% pool. For example, 16 versus 8 indicates that the 80% pool covers a total area of (16 × 2) × (16 × 2) = 1024 cells whereas the 20% pool covers a total area of (8 × 2) × (8 × 2) = 256 cells.
A corollary of this account of undermatching is that the matching of agents to resources should depend heavily on the variances of the pools. EPICURE’s predictions are shown in Figure 4. When a resource pool occupies a larger area, then, all else being equal, it will attract more EPICUREans. When the pool variances are identical (8 versus 8), the agents slightly undermatch the resources. When the 80% pool has half the variance of the 20% pool (8 versus 16), dramatic undermatching occurs because it takes more agents in the 20% pool to cover the much larger area. When the variances are reversed (16 versus 8), the rarely observed phenomenon of overmatching occurs. The explanation lies in the fact that the densities of the pools are equal, but the coverage times are unequal because the food rate is low. Agents in the 20% pool have less area to cover and fewer pieces of available food, so the average pick-up time remains relatively low and food does not last long enough to attract agents, as it does in the 80% pool. Finally, in the 16 versus 16 condition, nearly perfect matching is observed. EPICURE’s prediction, comparing the 8 versus 8 to the 16 versus 16 conditions, is that as the spatial extent of two pools increases, better matching should be found. Baum and Kraft [4] found similar results with pigeons competitively foraging for food from either two bowls (small resource pools) or two troughs (large resource pools).
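The area arithmetic in the Figure 4 caption, and the claim that the 16 versus 8 pools have equal densities, can be checked with a few lines. This sketch assumes, for illustration, that “density” means a pool’s share of the total food rate divided by the number of cells it covers.

```python
import math


def pool_area(radius):
    """Cells in a square pool of the given radius (side = 2 * radius)."""
    side = 2 * radius
    return side * side


def food_density(food_share, radius):
    """Share of the total food rate delivered per cell of the pool."""
    return food_share / pool_area(radius)


# (radius of the 80% pool, radius of the 20% pool), as in Figure 4.
for r_rich, r_poor in [(8, 8), (8, 16), (16, 8), (16, 16)]:
    rich = food_density(0.8, r_rich)
    poor = food_density(0.2, r_poor)
    print(f"{r_rich} vs {r_poor}: areas {pool_area(r_rich)}/{pool_area(r_poor)}, "
          f"per-cell density ratio {rich / poor:.1f}, equal density: {math.isclose(rich, poor)}")
```

Only the 16 versus 8 configuration equalizes per-cell density, which is the condition the text links to overmatching; in the 8 versus 16 case the richer pool is sixteen times denser per cell but four times smaller in area.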
Robert L. Goldstone et al.
EPICURE also correctly predicts our results from the mixed visibility conditions of Experiment 2. When agents and resources are both visible, there is a tendency for participants not to go to the more prolific pools because of the restraining effect of high agent density. However, when only the resources are visible, agents cannot see the high agent density of the prolific resource, and hence the empirically observed overmatching is also found in the model. EPICURE also spontaneously produces population waves revealed by Fourier analysis, and, as in our empirical data, the strongest frequency response was at about 0.02 cycles/second. The model correctly predicts an average of about 2–3 pool switches across a five-minute experiment. Several other results in the literature are also predicted by the model. Greater undermatching is predicted and empirically found as the number of agents increases [19]. As the number of agents increases or the resource area becomes more constricted [4], the number of unoccupied turfs decreases and the resources are well covered by agents staking out their small turfs. The spatial coverage of the agents then becomes more important than resource pool productivity in determining the distribution of agents. A final, somewhat counterintuitive prediction of EPICURE is that increasing the distance between two resource pools, and hence travel cost, should decrease, not increase, undermatching. For example, in one set of simulations, we expanded the gridworld to 120 × 120 cells and compared simulations with pool centers at (20, 20) and (100, 100) to simulations with pool centers at (40, 40) and (80, 80). With visible agents, the increased distance between pools led to nearly perfect matching. These results agree with the empirical pigeon foraging results found by Baum and Kraft [4].
As the distance between pools increased, the pigeons more closely matched the ideal free distribution model, and they switched pools significantly fewer times. Milinski [46] also found that stickleback fish switch between pools significantly less often as the distance between pools increases. Likewise, in our simulation, the far-apart pools led to significantly fewer average switches than the closer pools. The dynamics of the constrained visible model offer a simple explanation for both the switching and matching results. As the pools become more separated, it is much less likely that an agent will probabilistically choose to switch pools, because the other pool’s resources are so far away. Furthermore, if the agent does decide to switch, the longer distance means there are more opportunities for the agent to change its decision and choose a food pellet in the previous pool, although the goal bias tempers this change in decision. The decreased switching, in turn, promotes better matching because the new pool must appear to be consistently better in order for the agent to complete the journey. The burden of switching is higher, so agents are more likely to switch only when there is a true advantage. An additional consideration is that as travel costs increase, by further separating pools, undermatching increases because it becomes less likely that an agent will sample from both pools, and if the pools are equally large, then they have approximately equal chances to originally capture an agent.
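A frequency of 0.02 cycles/second corresponds to a period of 50 seconds, or about six population waves in a five-minute session. The sketch below shows one way such a peak can be located: a plain discrete Fourier transform over a population time series, picking the strongest non-constant component. The series is synthetic (a 50-second sine wave sampled once per second), standing in for the share of agents in one pool; neither the sampling rate nor the use of a bare DFT is taken from the chapter.

```python
import cmath
import math


def dominant_frequency(series, sample_rate_hz):
    """Frequency (Hz) of the strongest non-constant component of a plain DFT."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]          # drop the constant (DC) term
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):                # positive frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        if abs(coeff) ** 2 > best_power:
            best_k, best_power = k, abs(coeff) ** 2
    return best_k * sample_rate_hz / n


# Synthetic stand-in: share of agents in one pool, sampled once per second for
# five minutes, oscillating with a 50-second period.
samples = [0.5 + 0.1 * math.sin(2 * math.pi * 0.02 * t) for t in range(300)]
print(dominant_frequency(samples, sample_rate_hz=1.0))
```

For long recordings a fast Fourier transform from a numerical library would replace the quadratic-time loop; the plain version is kept here so the computation is visible.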
It might be argued that EPICURE can be fit to experimental results but does not genuinely predict outcomes from foraging experiments. In EPICURE’s defense, many of the model’s behaviors were observed before we knew of their empirical support. These predictions are robust to changes in parameters within reasonable ranges. This includes its surprising predictions of decreased undermatching with increased travel costs, and of increased undermatching with smaller resource patches. However, for the future record, we also offer further genuine predictions of EPICURE for which we do not currently know of supporting evidence. First, EPICURE predicts greater disparity of wealth (resources retrieved) among agents in the invisible than in the visible resources condition. If resources are invisible, then some fortunate agents can effectively monopolize a resource pool without other agents getting a chance to sample from it. Second, EPICURE predicts greater disparity of wealth among agents in the more prolific, compared to the less prolific, pool. Third, EPICURE predicts strong reliance on social information in an environment containing a few large resource pools that stochastically appear and deplete in different regions of the environment, but predicts that foragers will rely on privately obtained information if those same resource pools are small and quickly depleted. We are currently conducting experiments to test these predictions. In summary, we believe that EPICURE provides an elegant synthesis of several results from the literature on human and animal foraging. Our empirical results highlight the importance of knowledge for group-level behavior.
We find three empirical inefficiencies in our groups’ behavior: (1) undermatching, in the sense that there were too many participants in the less plentiful resource and too few participants in the more plentiful resource, (2) participants were more scattered than the food resources were, and (3) systematic cycles of population change are apparent, whereby the migration of people from one pool to another is roughly synchronized. All three of these inefficiencies were more pronounced for invisible than for visible conditions. Knowledge of food distributions allows an agent to match those distributions more effectively, whereas knowledge of other agents allows an agent to decouple its responses from those of others more effectively. The importance of agent and food information led us to feature both in our computational model of foraging. One of the best ways to evaluate a complex adaptive system model is to see whether behaviors arise that are not explicitly forced by the rules: are we getting more out of the model than we knew we were putting into it? By this measure, the model does a good job of explaining collective foraging behavior. EPICURE shows high-level behaviors such as undermatching and population cycles even though it was not built with the intention of creating them, and the model also predicts the specific dependencies of these high-level behaviors on population size and on the location, variance, and productivity of resources.
3 Propagation of Innovations in a Group
The previous experiments and computational model described a situation with competitive foraging for spatial resources. We now turn our attention to a situation with collective foraging for abstract resources. In the concrete spatial foraging task, resources consumed by one agent could not be consumed by others. However, in our abstract search scenario resources are less tangible, and thus those used by one agent can still be used by another. This is akin to searching for the solution to a math problem which, once found, can be imitated with no loss to the discoverer. Nevertheless, as with the foraging task, agents can still benefit from knowing where other agents are searching. Furthermore, as in the foraging task, the abstract search situation is not genuinely cooperative, because each agent is still trying to maximize its own performance, not the group’s.
3.1 Group Experiments on Innovation Propagation
Humans are uniquely adept at adopting each other’s innovations. Although imitation is commonly thought to be the last resort of dull and dim-witted individuals, cases of true imitation are rare among nonhuman animals [5], requiring complex cognitive processes of perception, analogical reasoning, and action preparation. This capacity for imitation has been termed “no-trial learning” by Bandura [3], who stressed that, by imitating one another, people perform behaviors that they would not have otherwise considered. When combined with variation and adaptation based on reinforcement, imitation is one of the most powerful methods for quick and effective learning. Cultural identity is largely due to the dissemination of concepts, beliefs, and artifacts across people. The tendency for humans to imitate is so ubiquitous that Meltzoff [44] has even suggested that humans be called “Homo imitans.” In social psychology, there has been a long and robust literature on conformity in groups [9,59].
The usual finding is that people conform to majorities in groups. To some degree, conformity is found because people desire to obtain social approval from others. For example, sometimes when people give their answers privately, they are less likely to conform to the group’s opinion than when responding publicly [11]. However, at other times, the conformity runs deeper than this, and people continue to conform to the group’s opinion even privately [59]. In our experiments and modeling, we are interested in the use of information provided by others even when social approval motivations are minimized, because the group members never meet one another and are anonymous. Conformity to others’ ideas has been a major field of research not only in social psychology, but also in economics, political science, and sociology. It is common in models of collective action to base an individual’s decision to participate on his expectations of how many other people will participate [8]. A common outcome of a collective “I’ll do it if you do it” mentality is for “tipping points” to arise, in which adding a couple more participants to an action leads to a positive feedback cycle in which still more participants sign on, leading to an exponential increase in participation for a time [20]. This behavior is a sensible policy both because the likelihood of success of an innovation depends upon its public adoption rate [7] and because other people may have privileged information unavailable to the individual making a choice. The potential cost of this bandwagon behavior is wasted time, money, and effort in adopting new innovations [57,60]. Our studies explore the diffusion of innovative ideas among a group of participants, each of whom is individually trying to find the best solution that she can to a search problem. The work fills an important gap in research. There are several excellent computational models for how agents in a population exchange information [2,35,48]. There is also excellent work in social psychology on how individuals conform to or use information provided by others [18]. Fieldwork also explores actual small groups of people engaged in cooperative problem solving [1]. However, there is very little work under laboratory-controlled conditions that explores the dynamics of a group of participants solving problems as they exchange information. One related study is Latané and L’Herrou’s [40] exploration of participants sending e-mail messages to each other (see also [39]) as they tried to predict which of two options their group would select. Over the course of message exchanges, neighboring participants in the network tended to adopt similar choices (consolidation), but there was also continued diversity of choices across the entire network. In contrast to this work, our research predominantly focuses on situations where participants are trying to find good solutions to a problem rather than trying to conform to their neighbors.
For example, farmers may discuss the benefits of various crop rotation techniques with their neighbors, and may be convinced to try a new one by a neighbor’s success, but there is no reward for conforming to a neighbor’s behavior in itself. In creating a paradigm for studying information dissemination, our desiderata were: (1) a problem to solve with answers that vary continuously on a quantitative measure of quality, (2) a problem search space that is sufficiently large that no individual can cover it all in a reasonable amount of time, and (3) simple communications between participants that are amenable to computational modeling. We settled on a minimal search task in which participants guess numbers between 0 and 100 and the computer reveals to them how many points were obtained from the guess by consulting a hidden fitness function [43]. In addition, random noise was added to the points earned, so that repeated sampling was necessary to determine accurately the underlying function relating guesses to scores. Over 15 rounds of guesses, participants tried to maximize their earned points. Importantly, participants get feedback not only on how well their own guess fared, but also on their neighbors’ guesses. In this manner, participants can choose to imitate high-scoring guesses from their neighbors. We experimentally manipulated the network topology that determines who counts as neighbors, as well as the fitness function that converts guesses to earned points.
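The task can be sketched as a hidden payout function plus noise. In this illustration the Gaussian shape, its peak at 70 and its height of 50 are invented; the noise variance of 30 is the value the chapter reports using in its simulations. Averaging repeated guesses at the same number recovers the hidden payout, which is why repeated sampling (or sampling by neighbors) is informative.

```python
import math
import random


def hidden_fitness(guess, peak=70.0, width=10.0, height=50.0):
    """Hypothetical unimodal payout over guesses 0-100 (shape and numbers assumed)."""
    return height * math.exp(-((guess - peak) ** 2) / (2 * width ** 2))


def noisy_points(guess, rng, noise_var=30.0):
    """Points shown to a participant: hidden payout plus zero-mean Gaussian noise."""
    return hidden_fitness(guess) + rng.gauss(0.0, math.sqrt(noise_var))


rng = random.Random(1)
one_shot = noisy_points(70, rng)
averaged = sum(noisy_points(70, rng) for _ in range(200)) / 200
print(f"single sample: {one_shot:.1f}, mean of 200: {averaged:.1f}, "
      f"true: {hidden_fitness(70):.1f}")
```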
Fig. 5. Examples of the different network structures for groups of ten participants. Circles represent participants and lines indicate communication channels.
We created neighborhoods of participants according to random, regular lattice, fully connected, and small-world graphs. Examples of the graph topologies for groups of ten participants are shown in Figure 5. In the random graph, connections are randomly created under the constraint that the resulting graph is connected: there is a path from every individual to every other individual. Random graphs have the property that individuals tend to be connected to other individuals via paths that do not require passing through many other individuals. This property has been popularized as the notion of “six degrees of separation” connecting any two people in the world, and has been experimentally supported [45]. More formally, the average path length connecting two randomly selected nodes in a random graph is approximately ln(N)/ln(K), where N is the number of nodes and K is the average number of neighbors connected to each node. The regular lattice can be used to represent a group with an inherent spatial ordering such that people are connected to each other if and only if they are close to each other. The regular lattice also captures the notion of social “cliques” in that if there is no short path from A to Z, then there will be no direct connection from any of A’s neighbors to any of Z’s neighbors. In regular lattices, connecting two individuals requires going through about N/(2K) other individuals on average. Thus, the paths connecting people are much longer, on average, for lattice than for random graphs.
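The two path-length approximations can be compared directly. Both are large-N approximations, so for very small groups such as the ten-person networks of Figure 5 they are only rough guides; the network sizes below are arbitrary examples chosen to show how differently the two expressions grow.

```python
import math


def random_graph_path_length(n, k):
    """Approximate mean shortest path in a random graph: ln(N) / ln(K)."""
    return math.log(n) / math.log(k)


def lattice_path_length(n, k):
    """Approximate mean shortest path in a regular ring lattice: N / (2K)."""
    return n / (2 * k)


for n, k in [(1_000, 10), (1_000_000, 10)]:
    print(n, k, round(random_graph_path_length(n, k), 2), lattice_path_length(n, k))
```

Growing the population a thousandfold only doubles the random-graph path length but multiplies the lattice path length by a thousand.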
Random graphs have short paths but, unfortunately from the perspective of modeling social phenomena, do not contain cliques. Lattices show cliques but do not have short path lengths. Recently, considerable interest has been generated in networks that have both desirable properties, so-called “small-world networks.” These networks can be formed by starting with a lattice and randomly rewiring (or adding new connections, in the case of our experiments and Figure 5) a small number of connections [64]. The result is a graph that still has cliques, because nodes that are connected to the same node tend to be spatially close themselves, yet also has a short average path length. From an information processing perspective, these are attractive networks because the spatial structure of the networks allows information search to proceed systematically, and the short-cut paths allow the search to proceed quickly [37]. Notice, in Figure 5, that all three of the described networks have a total of 12 connections between 10 participants. Thus, if there is a difference in information dissemination in these networks, then it must be due to the topology, not the density, of the connections. A fourth network, a fully connected graph, allows every participant to see the guesses and outcomes of every other participant. We compared two hidden functions for converting guessed numbers to points. The unimodal function has a single best solution that can always eventually be found with a hill-climbing method (see Figure 6). The trimodal function increased the difficulty of the search by introducing local maxima. A local maximum is a solution that is better than all of its immediate neighboring solutions, yet is not the best solution possible. Thus, a simple hill-climbing method might not find the best possible solution.
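The small-world construction by adding connections can be illustrated with breadth-first search. The sketch builds a ring lattice in which each node is linked to its two nearest neighbors on each side, measures the average shortest path, then adds a few random shortcuts without removing any lattice edges, the “adding” variant used for Figure 5. The network size and the number of shortcuts are arbitrary illustrative choices.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node linked to its k/2 nearest neighbours on each side (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_shortcuts(adj, m, rng):
    """Add m random new edges, keeping every existing lattice edge."""
    nodes = list(adj)
    free = len(nodes) * (len(nodes) - 1) // 2 - sum(len(v) for v in adj.values()) // 2
    if m > free:
        raise ValueError("not enough unconnected pairs for that many shortcuts")
    added = 0
    while added < m:
        a, b = rng.sample(nodes, 2)
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


base = average_path_length(ring_lattice(100, 4))
small_world = add_shortcuts(ring_lattice(100, 4), 20, random.Random(0))
print(round(base, 2), round(average_path_length(small_world), 2))
```

For the pure lattice the exact value is 1275/99 ≈ 12.88, close to the N/(2K) = 12.5 approximation; a handful of shortcuts shortens paths sharply while leaving the local neighborhoods in place.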
(a) Unimodal fitness function
(b) Multimodal fitness function
Fig. 6. Examples of the unimodal and multimodal fitness functions that convert guesses into obtained points.
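The difference between the two fitness functions can be illustrated with a simple hill climber that moves one unit at a time toward a better neighboring guess. The functions below are hypothetical stand-ins for Figure 6, not the ones used in the experiments: a single broad peak at 70, and a trimodal variant whose global peak at 70 is flanked by lower local peaks at 20 and 45. Every starting guess reaches the global maximum on the unimodal function, but only starts inside the global peak’s basin do so on the trimodal one.

```python
import math


def unimodal(x):
    return 50 * math.exp(-((x - 70) ** 2) / (2 * 15 ** 2))


def trimodal(x):
    # Hypothetical shape: global peak at 70, local peaks at 20 and 45.
    return (50 * math.exp(-((x - 70) ** 2) / (2 * 5 ** 2))
            + 35 * math.exp(-((x - 20) ** 2) / (2 * 5 ** 2))
            + 25 * math.exp(-((x - 45) ** 2) / (2 * 5 ** 2)))


def hill_climb(f, start, lo=0, hi=100):
    """Step to the better integer neighbour until neither neighbour improves."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return x
        x = best


def success_rate(f, target):
    """Fraction of integer starting guesses from which hill climbing reaches target."""
    starts = range(0, 101)
    return sum(hill_climb(f, s) == target for s in starts) / len(starts)


for name, f in [("unimodal", unimodal), ("trimodal", trimodal)]:
    print(name, round(success_rate(f, target=70), 2))
```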
Twelve groups of Indiana University undergraduate students, ranging in size from 7 to 18 people with a median of 14 people per group, participated for partial course credit, for a total of 153 participants. Each group participated in eight experiments that consisted of every combination of the four network types (Figure 5) and two fitness functions (Figure 6). Participants were told to try to maximize their total number of points acquired over 15 rounds of number guessing, and that the same guess would be worth about the same number of points from round to round, but that a certain amount of randomness was added to the earned points. Participants were also told that they would see the guesses and points earned by some of the other participants, and that these others would also see the participants’ guesses and earnings. The results from this experiment are shown in Figure 7, expressed in terms of the percentage of participants within one-half standard deviation of the global maximum for a fitness function. Over the 15 rounds, increasingly many participants find the global maximum. For the unimodal function, the fully connected network finds the global maximum most quickly, and the advantage of the fully connected network over the other three networks is particularly striking for Rounds 2–4. Around Round 5, the small-world network catches up to the performance level of the fully connected network, and for the rest of the rounds, these two network types continue to outperform the other two networks. This pattern of results is readily explainable in terms of the propensity of a network to disseminate innovations quickly. Innovations disseminate most quickly in the full network because every individual is informationally connected to every other individual. For the multimodal payout function, the small-world network performs better than the fully connected network for the first six rounds.
One account for its superiority over the full network is that the small-world network is able to search the problem space thoroughly. The fully connected groups frequently get stuck on local maxima because the groups prematurely converge on a good, but not great, solution. The small-world structure is an effective compromise between fully exploring a search space and quickly disseminating good solutions once they are found. Much as our foraging simulations counterintuitively revealed a better distribution of agents to resources when the environment limited the ability of the agents to easily explore each site, the most surprising aspect of these results is that the truism of “the more information, the better” is not supported. Giving each participant all of the results from all of the agents does not lead to the best group solution for the multimodal problem; the problem with this policy is that with the fully connected network, everybody ends up knowing the same information. Participants thereby become too like-minded, acting as a single explorer rather than a federation of independent explorers. The general point from this first experiment is that before one decides how to connect a group, one should know about the nature of the problem the group needs to solve. A candidate generalization is that the more exploration a group needs to do, the more clustered and locally connected the network should be. Conversely, the more quickly a group needs to exploit emerging solutions, the more globally connected individuals should be. Problem spaces that require considerable exploration to find the global maximum should benefit from networks that have relatively well-isolated neighborhoods that can explore different regions of a problem space. To test this hypothesis, in a separate experiment we also tested the more difficult fitness function shown in Figure 8, which we call the needle function. This function features one very broad local maximum and one hard-to-find global maximum. We tested 12 groups of participants with needle functions like this, with each group connected in the same four network topologies we used before. For this function, Figure 9 shows that the lattice network performed better than the other three network types, starting by Round 7 if not earlier. The lattice network fosters the most exploration because of its spatially segregated network neighborhoods. Exploration of the problem space is exactly what is needed for the needle function because of its hard-to-find global maximum.
(a) Unimodal payout function
(b) Multimodal payout function
Fig. 7. Percentage of participants within one standard deviation of the global maximum on each round for the unimodal and multimodal payout functions.
Fig. 8. An example of the “needle” payout function. This function features one broad local maximum that is easy to find and one narrow global maximum that is difficult to find.
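Why exploration pays off for the needle function can be seen with simple probability. If a guess is drawn uniformly from the 101 integers between 0 and 100, the chance that at least one of n independent guesses lands on a needle of width w is 1 − (1 − w/101)^n. The sketch below assumes a hypothetical needle three values wide and compares one guess, one explorer over 15 rounds, and 14 independent explorers (the median group size) over 15 rounds. Treating every guess as an independent random draw is an idealized upper bound, not a model of participants; conformity lowers the number of effectively independent guesses.

```python
def prob_hit_needle(needle_width, n_random_guesses, space_size=101):
    """Chance that at least one uniform random integer guess lands on the needle."""
    p_miss = 1.0 - needle_width / space_size
    return 1.0 - p_miss ** n_random_guesses


# A hypothetical 3-value-wide needle in the 0-100 guess space.
for guesses in (1, 15, 14 * 15):
    print(guesses, round(prob_hit_needle(3, guesses), 3))
```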
The three payout functions are ordered by the demands they place on broad exploration of a problem space. The benefit of exploration increases going from the unimodal to the multimodal to the needle function. In parallel, the network structures are ordered by their preservation of local cliques of nodes. Cliquishness increases going from full to small-world to lattice networks. These two progressions are coordinated, with the full network performing best with the unimodal function, the small-world network performing best with the multimodal function, and the lattice performing best with the needle function. In contrast to arguments for a general informational advantage of small-world networks [64], we find that which network is best depends on the kind of problem a group must solve (see also [41]). As broader exploration is needed to discover good solutions, increasingly cliquish networks are desirable.
Fig. 9. Performance for the four network structures with the needle payout function. For this function, the lattice network performs better than the other three network types.
3.2 A Computational Model of Innovation Propagation
We have developed an agent-based computational model of our experiments based on the premise that members of a group can choose to explore a problem space on their own or take advantage of the solutions found by others. In the model, called SSEC (for self-, social-, and exploration-based choices), every agent on every round probabilistically chooses among three strategies: using their own guess on the last round, using their neighbors’ best guess on the last round, and randomly exploring. Each agent randomly chooses among these strategies, with the likelihood of each strategy based on its intrinsic bias and also its observed success. Guesses also include normally distributed randomness to avoid perfect imitation. The model can thus be expressed as
$$p(C_x) = \frac{B_x S_x}{\sum_{n} B_n S_n} \qquad (1)$$
where $p(C_x)$ is the probability of using Strategy $x$, $B_x$ is the bias associated with the strategy, and $S_x$ is the score obtained from the strategy. The participant’s guess is then $G_x + N(\mu = 1, \sigma = 1)$, including normally distributed randomness, with $G_x$ being the guess associated with Strategy $x$. When the random exploration strategy is selected, a uniform distribution is used to select the next guess. This model is motivated by the particle swarm algorithm [35]. However, unlike the swarm algorithm, the SSEC model allows sudden jumps in guesses rather than smoothly changing patterns of oscillations around promising solutions. The experimental results showed that participants frequently jumped from one guess to a completely different guess, a behavior that the original particle swarm algorithm does not accommodate. The simplest version of this model, with mostly default parameter values for the biases, was able to accommodate some, if not all, of the trends in the results. In particular, we tested a version of the model in which B1 (the bias for using one’s own previous guess) is 1, B2 (the bias for using one’s neighbor’s best-scoring guess) is 1, and B3 (the bias for randomly exploring) is 0.1. This is essentially a one-parameter control of biases because B1 and B2 were constrained to be equal, and only the relative, not absolute, value of B3 matters given the choice model used to determine strategy choice. In addition, the value of σ that determines the mutation/drift rate for guesses was set to 3, and noise with a variance of 30 and a mean of 0 was added to the fitness function’s output, just as it was to experiment scores. Each of the four network types was run 1000 times with each of the three fitness functions for 15 rounds of guessing and 15 agents per group. This model showed the best performance with the full network for the unimodal function, the best performance with the small-world network for the multimodal function, and a two-way tie for the best final performance between the lattice and small-world networks for the needle function. This pattern is the same as that observed empirically, except for the needle function, where we found the lattice network performing better than all other network types.
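A minimal sketch of one SSEC round is given below, under stated assumptions. The biases (1, 1, 0.1), σ = 3, the noise variance of 30, and 15 agents over 15 rounds follow the text. Everything else is a hypothetical choice for illustration: the unimodal fitness function, the score assigned to the exploration strategy (taken here as the mean of the other two scores, since the chapter does not say how exploration is scored), clamping negative noisy scores to a small positive value so that Equation (1) stays a valid probability, and zero-mean guess noise (the text writes the perturbation as N(µ = 1, σ = 1) but later sets σ to 3).

```python
import math
import random


def strategy_probabilities(biases, scores):
    """Equation (1): p(C_x) = B_x S_x / sum_n B_n S_n."""
    weighted = [b * s for b, s in zip(biases, scores)]
    total = sum(weighted)
    return [w / total for w in weighted]


def lattice_with_links(n, extra, rng):
    """Ring lattice (two neighbours each) plus `extra` random long-range links."""
    nbrs = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    if extra > n * (n - 1) // 2 - n:
        raise ValueError("more links requested than the graph can hold")
    added = 0
    while added < extra:
        a, b = rng.sample(range(n), 2)
        if b not in nbrs[a]:
            nbrs[a].add(b)
            nbrs[b].add(a)
            added += 1
    return nbrs


def ssec_round(guesses, scores, nbrs, fitness, rng,
               biases=(1.0, 1.0, 0.1), sigma=3.0, noise_var=30.0):
    """One round: each agent reuses its guess, copies its best neighbour, or explores."""
    new_guesses, new_scores = [], []
    for i, own_guess in enumerate(guesses):
        best = max(nbrs[i], key=lambda j: scores[j])
        own_s = max(scores[i], 1e-6)          # assumption: clamp negative noisy scores
        social_s = max(scores[best], 1e-6)
        explore_s = (own_s + social_s) / 2    # assumption: score for exploring
        probs = strategy_probabilities(biases, [own_s, social_s, explore_s])
        choice = rng.choices(range(3), weights=probs, k=1)[0]
        if choice == 0:
            g = own_guess + rng.gauss(0.0, sigma)
        elif choice == 1:
            g = guesses[best] + rng.gauss(0.0, sigma)
        else:
            g = rng.uniform(0.0, 100.0)       # random exploration
        g = min(max(g, 0.0), 100.0)
        new_guesses.append(g)
        new_scores.append(fitness(g) + rng.gauss(0.0, math.sqrt(noise_var)))
    return new_guesses, new_scores


def simulate(fitness, n_agents=15, rounds=15, extra_links=0,
             biases=(1.0, 1.0, 0.1), seed=0):
    rng = random.Random(seed)
    nbrs = lattice_with_links(n_agents, extra_links, rng)
    guesses = [rng.uniform(0.0, 100.0) for _ in range(n_agents)]
    scores = [fitness(g) + rng.gauss(0.0, math.sqrt(30.0)) for g in guesses]
    for _ in range(rounds):
        guesses, scores = ssec_round(guesses, scores, nbrs, fitness, rng, biases)
    return guesses


def unimodal(x):
    return 50.0 * math.exp(-((x - 70.0) ** 2) / (2 * 10.0 ** 2))  # hypothetical


final = simulate(unimodal, extra_links=20)
# Hypothetical success criterion: within 5 units of the peak.
print(sum(abs(g - 70.0) < 5.0 for g in final) / len(final))
```

A parameter sweep in the spirit of the one described next could loop B1 from 0 to 1 with biases (B1, 1 − B1, 0.1) and vary `extra_links`, averaging the success fraction over many seeds.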
Given the promising results of this original set of simulations, we parametrically manipulated the network connectivity to shift continuously from a regular lattice with only local connectivity to a fully connected network in which every agent is directly connected to every other agent. This was achieved by connecting 15 agents via a lattice and then adding a number of additional random connections between agents. As the number of random connections increases, the network initially transforms from a regular lattice to a small-world network. Then, as the connectivity further increases, the network transforms from a small-world network to a fully connected network. If more information communicated in a network always increases group performance, then we would expect better performance (shown by brightness in Figures 10–12) as connectivity increases. Independently, we manipulated the relative weight given to information obtained from oneself compared to others. Keeping B3 constant at 0.1, we varied B1 from 0 to 1 and set B2 equal to (1 − B1). Thus, we varied the degree to which each agent’s guesses were based on his own previous guess compared to others’ guesses. In Figures 10–12, going from left to right, we go from “sheepish” agents that base their guesses completely on others’ guesses (and an occasional random guess) to “mavericks” that always go their own way without any influence of others.
Fig. 10. Group performance for the unimodal function. This graph shows the interaction between the bias for self- versus other-obtained information and the number of random links added to a regular lattice. Group performance is measured by the percentage of individuals within one standard deviation of the global maximum of the fitness function. The brightness of each square indicates the group’s performance after 15 rounds of number guessing. For this simple problem space, group performance increases monotonically with increased reliance on others’ information and network connectivity.
Fig. 11. Group performance for the multimodal function. The best performance is found for a combination of using self- and other-obtained information, and for intermediate levels of network connectivity.
Fig. 12. Group performance for the needle function. This function benefits from even greater reliance on self-obtained information and decreased global network connectivity.
Figures 10–12 show that the influences of connectivity and agent independence are not constant, but rather depend on the shape of the problem space. For the easy-to-solve unimodal problem, Figure 10 shows that group performance increases monotonically with both increased reliance on others’ information and increased connectivity. Both trends can be explained by the fast propagation of innovations obtained when agents follow their best peers and have many peers to follow. For single-peaked problems, there are no local maxima and so no concern about hasty collective convergence on suboptimal solutions. For the multimodal function (Figure 11), optimal group performance involves intermediate levels of both connectivity and self-reliance. These two factors trade off with each other such that increases in connectivity can be offset by decreases in conformity. Networks that have only local connectivity and self-reliant individuals perform relatively poorly because good solutions are inefficiently spread. Conversely, networks that have global connectivity and conformist individuals also perform poorly because the group frequently converges on local rather than global maxima. Good group performance is found when a group can both search a problem space for good solutions and spread those solutions quickly once they are found. This is achieved when conformist individuals are combined with a network that limits connectivity, or when self-reliant individuals are combined with more broadly connected networks. If one is able to engineer a social network, then one’s target network should depend on both the problem and the “personalities” (mavericks versus sheep) of the nodes in the network.
Collective Search in Concrete and Abstract Spaces
301
For the trickier needle function (Figure 12), the best performing networks are pushed even further in the direction of increasing self-reliance and decreasing connectivity. Consistent with our empirical results, the needle function requires more exploration, and both limiting connectivity and increasing self-reliance promote independent exploration by group members. As with the multimodal function, there is a tradeoff between network connectivity and individual self-reliance. A major conclusion from both the experiments and the modeling is that propagating more information is not always good for the group. Lazer and Friedman’s computational model [41] converges on the same maxim: full access to what everybody else in a group is doing can lead human and computational agents to converge prematurely on suboptimal local maxima. Networks that preserve spatial neighborhoods promote exploration, and this can explain why the full network is the best network for the unimodal function, the small world network with its intermediate level of connectivity does best with the trimodal function, and the lattice network with no long-range connections does best with the difficult needle function. Although more information is not always better as far as the group goes, it is always in the best interest of individuals to use all of the information at their disposal. Accordingly, our innovation propagation paradigm provides an unexpected example of a social dilemma [25,49]. Individuals, looking out for their own self-interest, will seek out as much information from others as possible, but this can inhibit the group as a whole from widely exploring a search space. Thus, obtaining information from as many peers as possible is noncooperative behavior even though it does make links between individuals. Searching a problem space on one’s own is cooperative in the sense of allowing the group as a whole to collect the most points possible, by avoiding local maxima.
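The role of long-range links in this argument can be checked by measuring how far information must travel. The sketch below rewires a ring lattice in the style of Watts and Strogatz [64] and computes the mean shortest-path length with breadth-first search; the function names, the rewiring details, and the assumption `n > 2k` are illustrative choices, not the networks used in the experiments. A rewiring probability `p` of 0 gives the lattice, and larger values add shortcuts that let good solutions (and premature consensus) spread in fewer steps.

```python
import random
from collections import deque


def small_world(n, k, p, seed=0):
    """Ring lattice (k neighbors per side) with each edge rewired with probability p.

    Assumes n > 2k so that the initial lattice has no duplicate edges.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    for j in range(1, k + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                # Move the far end of edge (i, old) to a node i is not yet linked to.
                candidates = [c for c in range(n) if c != i and c not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def mean_path_length(adj):
    """Average shortest-path length over all ordered pairs that are connected."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

For reasonably large lattices, a small `p` typically shortens the mean path length considerably while leaving most local neighborhoods intact, which corresponds to the intermediate connectivity the text associates with the small world network.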
Our simulations show that every individual agent is best off linking to as many other agents as possible. Agents with relatively many links outperform those with relatively few links. However, if every agent links maximally to every other agent, then the entire group performs poorly because of premature convergence on good, but not optimal, solutions. Sensitivity to this conflict between individual and group interests may help in the design of adaptive social networks. Designing for the greater common good may sometimes entail placing limits on individuals’ ability to connect with each other. Problems with difficult, hard-to-find solutions often drive people to look to others for hints and clues, but these are exactly the kinds of problems for which limited local connectivity is advantageous. This analysis of the conflict between the good of the individual and that of the group becomes particularly relevant when we turn to situations where people can choose their connectivity, rather than having it imposed. Pursuing experimental paradigms in which people can create their own social networks would be valuable because it would connect with both the mathematical literature on the evolution of networks [13] and the social science literature on coalition formation [34]. In many naturally occurring groups, people have some choice about whom they share information with and what information they reveal. From our perspective on human groups as complex systems, one of the interesting issues will be to study the global efficiency of information transmission in self-organized networks, and how incentives to individuals can be structured so that globally advantageous networks emerge.
4 Towards Unifying Collective Search Paradigms
We have presented two cases of collective search, in concrete and abstract problem spaces. Although collective resource harvesting in space seems, at first, to have little to do with how innovations are propagated in a community, we are increasingly impressed by the continuities and parallels between them. To see these connections, it is useful to consider some salient differences between the paradigms in terms of concreteness, competition, and the importance of information sharing. In terms of concreteness, the physical space of the foraging environment clearly presents search constraints missing in the number-guessing scenario. Foragers must pass through intermediate locations and accrue travel costs when trekking over the spaces between resource pools. In the number-guessing experiments, participants can hop from any number to any other number. However, in both scenarios, the resources themselves are “clumpy”: they are distributed in compact physical regions for the foragers, and along smooth fitness functions in the innovation propagation experiments. Furthermore, the physical constraints in the forager environment can easily be relaxed. The simple methodological change of allowing participants to point and click on destinations and be transported there immediately eliminates the constraints of space. Conversely, a version of the number-guessing game in which participants can only alter their guesses slowly serves to implement a one-dimensional space. These modifications make a strong case for unifying the two paradigms. Travel costs, metric spaces, and ordered dimensions can be either present or absent in both paradigms, for both participants’ movements and the resource distributions. A second difference between the paradigms concerns competitiveness. In the foraging experiments, a food token that is picked up by one participant is no longer available to others. Resource collection is competitive.
By contrast, in the number-guessing experiment, any number of participants can guess the same number without penalty. Resources are not consumed as they are sampled. This difference in competition yielded different results. For our foragers, the presence of people at a resource pool often dissuaded others from harvesting the pool, but for our number-guessers, there was no evidence for being deterred by crowds. However, it is easy to imagine versions of foraging without competition. For example, when “foraging” for good stock investments, stock prices do not fall, but rather typically increase, as more individuals buy a stock. Alternatively, competition can be introduced into the number-guessing experiment by dividing the points earned by a numeric guess by the number of individuals making similar guesses. In fact, exactly this tactic of “resource sharing” has been proposed by genetic algorithm researchers in order to increase the diversity of a population of evolved solutions to multidimensional problems [22]. The competitiveness of resource harvesting is an important factor in determining the diversity of solutions and the advantage of traveling with and apart from crowds, but it is also a factor that may or may not accompany either scenario. Finally, the paradigms seem to differ in the importance of information sharing. For the number-guessing experiment, participants who do not ever copy others are at a distinct disadvantage. Sharing information about solution quality was sufficiently important that we explicitly manipulated the network configuration that determined the pathways for this sharing. Network connectivity was not manipulated in the foraging experiments, and a forager can perform well without ever following another forager’s lead. However, as we imagine expanding the forager world, decreasing the visibility of resources, and increasing the visibility of fellow foragers, the importance of information sharing increases. Even in our relatively small foraging environments, people often used other people as information scouts when people but not food resources were visible. All of these factors affect real-world search tasks. The results of our empirical foraging task and subsequent model can be extended towards concrete foraging tasks in the real world, such as chimpanzees foraging for fruit [29], the migratory patterns of early hominid foragers, and guppies foraging for mates [14]. Furthermore, many of the same principles also apply to abstract foraging tasks, such as human information foraging [52], information foraging on the World Wide Web, and Internet dating sites.
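The “resource sharing” manipulation mentioned in this section, dividing the points earned by a guess by the number of individuals making similar guesses, can be stated in a few lines. In this sketch, two guesses count as “similar” when they lie within a hypothetical `radius` of each other; genetic-algorithm implementations of fitness sharing [22] usually use a sharing function that decays smoothly with distance rather than this hard cutoff.

```python
def shared_points(guesses, fitness, radius):
    """Split each guess's points among all guesses within `radius` of it.

    The crowd count includes the guess itself, so an isolated guess keeps its
    full score and k identical guesses each receive 1/k of it.
    """
    shared = []
    for g in guesses:
        crowd = sum(1 for h in guesses if abs(h - g) <= radius)
        shared.append(fitness(g) / crowd)
    return shared
```

Under this rule, four participants crowding a peak worth 100 points receive 25 points each, while a lone participant on a peak worth 60 keeps all 60, which is what pushes a population toward more diverse solutions.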
Meanwhile, the results from our number-guessing scenario indicate that the social networking structure can determine the dynamics of collective search tasks. Together, these studies may lead to a greater understanding of individuals’ usage of personal information versus public information, and how this usage is contingent on the social context of a task. For example, undergraduate students face a dilemma when choosing courses for the semester. They certainly want to take courses that match their own interests and satisfy requirements, but they also want to make use of social information that indicates which classes are easy, which professors are the best, and so on. The social information changes as students begin relying on Web sites’ ratings of professors and courses, rather than simply on their local networks of friends and acquaintances. The students’ local networks are presumably a more reliable source of knowledge, but the Web sites offer a greater breadth and accumulation of knowledge. This shift in public information usage can have real consequences for enrollment in particular courses and for subsequent course offerings. Similar issues of acquiring and weighting information occur in the purchasing of consumer goods. Individuals choose cars and cellphones according to features and style, but they also care about relevant information from their peers [33], whether that information concerns the popularity of a product or simply the product’s reliability. Sometimes the criteria of the search space are relatively well defined, such as a car’s mechanical reliability or the average number of dropped calls on a cellphone, but the criteria often include intangibles, such as the aesthetics of a cellphone’s sleek style. Furthermore, we generally receive this product information from a variety of social networks, including our real-world social network, Web social networks such as MySpace, and search engines that may approximate full connectivity via their abundance of links. Clearly these search situations can become increasingly complex as individuals weigh their own personal information and the public information acquired from various sources, but our empirical paradigm provides a rich environment for both isolating and combining these factors.
In our current work, we are exploring several tasks to better elucidate these real-world scenarios. In one experiment, we are examining the effects of a social network on an individual’s rating of musical pieces. Unlike in the number-guessing scenario, there is no single correct or optimal solution. In a second experiment, we are studying a task very similar to the number-guessing task, except that it involves guessing two-dimensional pictures by filling in squares and receiving both individual feedback and feedback regarding network neighbors’ guesses. This task involves a more difficult search space, and as a result, we expect an increased reliance on social information. Finally, we are also conducting an experiment that examines the relative participation in producer and consumer roles in a social network. In this experiment, we are trying to gauge collective search performance when, on each round, individuals choose either to search for their own solutions or to consume their neighbors’ current guesses.
We expect to find population oscillations as individuals alternate between scavenging group information and being forced to search for their own information, but we may also find that a mix of strategies is crucial for good group performance. Ultimately, we plan to use the results from these studies to explore increasingly rich collective search tasks. By manipulating concreteness, competition, the importance of information sharing, and the inclusion of multiple sources of personal and public information, we can create a family of experiments unified by general principles of collective search. In the current chapter, we have described several key principles gleaned from our initial collective search experiments. These principles, exemplified in both of our paradigms, include: (1) a tradeoff between exploration and exploitation, (2) a compromise between individuals using self- versus other-obtained information, and (3) the emergence of group-level resource use patterns that result from individual interests but are not always favorable to these interests. Unfavorable manifestations of these principles include inefficient population waves, bandwagon effects, mismatches between agent and resource distributions, disadvantages for highly connected networks, and premature convergence of populations on local maxima.
Despite these difficulties, collective search continues to be an immensely powerful case of distributed cognition [28] for the simple reason that individual search often fails to provide a solution in a limited time. The shopping Web site Amazon has a feature that tracks one’s searches and uses them as information to provide recommendations to others with similar searches. By including this additional information, the Web site transforms the individual’s search for good products into an instance of collective search, with the same benefits and foibles as described in this chapter. Similarly, the search for missing children or wanted criminals relies on a group of individuals searching and sharing information about their search. If the features of the search are known (e.g., how far the criminal could have traveled, or whether the search problem is easy or difficult), the communications between searchers can be set up to benefit the group and avoid unfavorable outcomes. In light of the peril and promise, it behooves cognitive scientists of all stripes to work together toward solving the problem of understanding and improving collective search.
Acknowledgments
The authors wish to express thanks to Katy Borner, Rich Shiffrin, William Timberlake, Peter Todd, and Thomas Wisdom for helpful suggestions on this work. This research was funded by Department of Education, Institute of Education Sciences grant R305H050116, and National Science Foundation grant 0527920. More information about the laboratory and on-line versions of experiments related to the current work can be found at [51].
References
1. H. Arrow, J. E. McGrath, and J. L. Berdahl. Small Groups as Complex Systems: Formation, Coordination, Development, and Adaptation. Sage, Newbury Park, CA, 2000.
2. R. Axelrod. The dissemination of culture: A model with local convergence and global polarization. Journal of Conflict Resolution, 4:203–226, 1997.
3. A. Bandura. Behavioral modification through modeling procedures. In L. Krasner and L. P. Ulmann, editors, Research in Behavior Modification: New Development and Implications, pages 310–340. Rinehart and Winston, New York, 1965.
4. W. M. Baum and J. R. Kraft. Group choice: Competition, travel, and the ideal free distribution. Journal of the Experimental Analysis of Behavior, 69:227–245, 1998.
5. S. J. Blackmore. The Meme Machine. Oxford University Press, Oxford, UK, 1999.
6. R. Boyd and P. J. Richerson. Culture and the Evolutionary Process. University of Chicago Press, IL, 1985.
7. B. Bullnheimer, H. Dawid, and R. Zeller. Learning from own and foreign experience: Technological adaptation by imitating firms. Computational and Mathematical Organizational Theory, 4:267–282, 1998.
8. M. S.-Y. Chwe. Structure and strategy in collective action. American Journal of Sociology, 105:128–156, 1999.
9. R. B. Cialdini and N. J. Goldstein. Social influence: Compliance and conformity. Annual Review of Psychology, 55:591–621, 2004.
10. A. Clark. Natural Born Cyborgs. Oxford University Press, Oxford, UK, 2003.
11. M. Deutsch and H. B. Gerard. A study of normative and informative social influences upon individual judgment. Journal of Abnormal Social Psychology, 51:629–636, 1955.
12. M. Dorigo, E. Bonabeau, and G. Theraulaz. Ant algorithms and stigmergy. Future Generation Computer Systems, 16:851–871, 2000.
13. S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford, UK, 2003.
14. L. A. Dugatkin. Sexual selection and imitation: Females copy the mate choice of others. American Naturalist, 139:1384–1389, 1992.
15. Epicure: An agent-based foraging model. http://cognitrn.psych.indiana.edu/Epicure.html. Last accessed January 2008.
16. S. D. Fretwell and H. J. Lucas. Ideal free distribution. Acta Biotheory, 19:16–21, 1970.
17. C. R. Gallistel. The Organization of Learning. MIT Press, Cambridge, MA, 1990.
18. D. Gigone and R. Hastie. The impact of information on group judgment: A model and computer simulation. In J. H. Davis and E. Witte, editors, Understanding Group Behavior: Consensual Action by Small Groups, volume 1, pages 221–251. Erlbaum, Hillsdale, NJ, 1996.
19. D. M. Gillis and D. L. Kramer. Ideal interference distributions: Population density and patch use by zebrafish. Animal Behavior, 35:1875–1882, 1987.
20. M. Gladwell. The Tipping Point. Little Brown, London, 2000.
21. M. J. Godin and M. H. A. Keenleyside. Foraging on patchily distributed prey by a cichlid fish (Teleostei, Cichlidae): A test of the ideal free distribution theory. Animal Behavior, 32:120–131, 1984.
22. D. E. Goldberg. Genetic Algorithms. Addison-Wesley, Reading, MA, 1989.
23. R. L. Goldstone and B. C. Ashpole. Human foraging behavior in a virtual environment. Psychonomic Bulletin and Review, 11:508–514, 2004.
24. R. L. Goldstone, B. C. Ashpole, and M. E. Roberts. Knowledge of resources and competitors in human foraging. Psychonomic Bulletin and Review, 12:81–87, 2005.
25. R. L. Goldstone and M. A. Janssen. Computational models of collective behavior. Trends in Cognitive Science, 9:424–430, 2005.
26. R. L. Goldstone, A. Jones, and M. E. Roberts. Group path formation. IEEE Transactions on System, Man, and Cybernetics, Part A, 36:611–620, 2006.
27. R. L. Goldstone and M. E. Roberts. Self-organized trail systems in groups of humans. Complexity, 15:43–50, 2006.
28. T. M. Gureckis and R. L. Goldstone. Thinking in groups. Pragmatics and Cognition, 14:293–311, 2006.
29. C. Hashimoto, S. Suzuki, Y. Takenoshita, J. Yamagiwa, A. K. Basabose, and T. Furuichi. How fruit abundance affects the chimpanzee party size: A comparison between four study sites. Primates, 44:77–81, 2003.
30. D. Helbing, J. Keltsch, and P. Molnár. Modeling the evolution of human trail systems. Nature, 388:47–50, 1997.
31. T. T. Hills. Animal foraging and the evolution of goal-directed cognition. Cognitive Science, 30:3–41, 2006.
32. T. G. Howsman, D. O’Neil, and M. A. Craft. A stigmergic cooperative multi-robot control architecture. In J. Pollack, M. Bedau, P. Husbands, T. Ikegami, and R. A. Watson, editors, Artificial Life IX, pages 88–93. MIT Press, Cambridge, MA, 2004.
33. M. A. Janssen and W. Jager. Simulating market dynamics: The interactions of consumer psychology and structure of social networks. Artificial Life, 9:343–356, 2003.
34. J. P. Kahan and A. Rapoport. Theories of Coalition Formation. Erlbaum, Hillsdale, NJ, 1984.
35. J. Kennedy, R. C. Eberhart, and Y. Shi. Swarm Intelligence. Morgan Kaufmann, San Francisco, 2001.
36. M. Kennedy and R. D. Gray. Can ecological theory predict the distribution of foraging animals? A critical analysis of experiments on the ideal free distribution. Oikos, 68:158–166, 1993.
37. J. Kleinberg. Navigation in a small world. Nature, 406:845, 2000.
38. J. R. Kraft and W. M. Baum. Group choice: The ideal free distribution of human social behavior. Journal of the Experimental Analysis of Behavior, 76:21–42, 2001.
39. B. Latané and M. J. Bourgeois. Dynamic social impact in electronic groups: Consolidation and clustering in computer-mediated communication. Journal of Communication, 46:35–47, 1996.
40. B. Latané and T. L. L’Herrou. Spatial clustering in the conformity game: Dynamic social impact in electronic groups. Journal of Personality and Social Psychology, 70:1218–1230, 1996.
41. D. Lazer and A. Friedman. The parable of the hare and the tortoise: Small worlds, diversity, and system performance. KSG Working Paper RWP05-058, John F. Kennedy School of Government, Harvard University, Cambridge, MA, 2005.
42. G. J. Madden, B. F. Peden, and T. Tamaguchi. Human group choice: Discrete-trial and free-operant tests of ideal free distribution. Journal of the Experimental Analysis of Behavior, 78:1–15, 2002.
43. W. A. Mason, A. Jones, and R. L. Goldstone. Propagation of innovations in networked groups. Journal of Experimental Psychology: General, in press.
44. A. N. Meltzoff. Infant imitation after a 1-week delay: Long-term memory for novel acts and multiple stimuli. Developmental Psychology, 24:470–476, 1988.
45. S. Milgram. The small world problem. Psychology Today, 2:60–67, 1967.
46. M. Milinski. Competition for non-depleting resources: The ideal free distribution in sticklebacks. In A. C. Kamil, J. R. Krebs, and H. R. Pulliam, editors, Foraging Behavior, pages 363–388. Plenum Press, New York, 1987.
47. A. Newell and H. A. Simon. Human Problem Solving. Prentice Hall, Englewood Cliffs, NJ, 1972.
48. A. Nowak, J. Szamrej, and B. Latané. From private attitude to public opinion: A dynamic theory of social impact. Psychological Review, 97:362–376, 1990.
49. E. Ostrom, R. Gardner, and J. Walker. Rules, Games, and Common-Pool Resources. University of Michigan Press, Ann Arbor, 1994.
50. H. V. D. Parunak and S. A. Brueckner. Engineering swarming systems. In F. Bergenti, M.-P. Gleizes, and F. Zambonelli, editors, Methodologies and Software Engineering for Agent Systems, pages 341–376. Kluwer, Amsterdam, 2004.
51. Percepts and concepts laboratory. http://cognitrn.psych.indiana.edu. Last accessed January 2008.
52. P. Pirolli and S. Card. Information foraging. Psychological Review, 106:643–675, 1999.
53. H. Pöysä, J. Elmberg, and K. Sjöberg. Habitat selection rules in breeding mallards (Anas platyrhynchos): A test of two competing hypotheses. Oecologia, 114:283–287, 1998.
54. H. R. Pulliam and B. J. Danielson. Sources, sinks, and habitat selection: A landscape perspective on population dynamics. American Naturalist, 137:S50–S66, 1991.
55. A. Rapoport, T. Kugler, S. Dugar, and E. Gisches. Choice of routes in congested traffic networks: Experimental tests of the Braess paradox. In T. Kugler, J. C. Smith, T. Connolly, and Y.-J. Son, editors, Decision Modeling and Behavior in Uncertain and Complex Environments. Springer, New York, 2007.
56. M. E. Roberts and R. L. Goldstone. EPICURE: Spatial and knowledge limitations in group foraging. Adaptive Behavior, 14:291–313, 2006.
57. L. Rosenkopf and E. Abrahamson. Modeling reputational and information influences in threshold models of bandwagon innovation diffusion. Journal of Computational and Mathematical Organizational Theory, 5:361–384, 1999.
58. A. K. Seth. Modeling group foraging: Individual suboptimality, interference, and a kind of matching. Adaptive Behavior, 9:67–91, 2001.
59. M. Sherif. A study of some social factors in perception. Archives of Psychology, 187:1–60, 1935.
60. D. Strang and M. Macy. In search of excellence: Fads, success stories, and adaptive emulation. American Journal of Sociology, 107:147–182, 2001.
61. J. J. Templeton and L. A. Giraldeau. Vicarious sampling: The use of personal and public information by starlings foraging in a simple patchy environment. Behavioral Ecology and Sociobiology, 38:105–114, 1996.
62. G. Theraulaz and E. Bonabeau. Modeling the collective building of complex architectures in social insects with lattice swarms. Journal of Theoretical Biology, 177:381–387, 1995.
63. P. M. Todd and G. Gigerenzer. Simple heuristics that make us smart. Behavioral and Brain Sciences, 23:727–741, 2000.
64. D. J. Watts and S. H. Strogatz. Collective dynamics of “small-world” networks. Nature, 393:440–442, 1998.
Braess Paradox in the Laboratory: Experimental Study of Route Choice in Traffic Networks with Asymmetric Costs
Amnon Rapoport, Tamar Kugler, Subhasish Dugar, and Eyran J. Gisches
Department of Management and Organizations, University of Arizona, Tucson, AZ 85721, USA
Summary. The Braess paradox (BP) in traffic and communication networks is a powerful illustration of the possible counterintuitive implications of the Nash equilibrium solution. It shows that, paradoxically, when one or more links are added to a directed network with affine link cost functions that depend on congestion, and each user selfishly seeks her best possible route, then the equilibrium travel cost of each and every user may increase. We report the results of a traffic network game experiment designed to test the implications of the BP. The experiment included two network games: a basic network game with three alternative routes, and an augmented network game with two additional routes. Both networks included asymmetric link cost functions, and each game was iterated 60 times with complete outcome information. On each round of the game, the subjects were asked to independently choose a route from a common origin to a common destination in an attempt to minimize individual travel cost. Focusing on aggregate and individual frequencies of route choice and route switching, our results show that with experience in traversing the network, aggregate, but not individual, choice frequencies approach the user equilibrium solution as implied by the BP.
1 Introduction
Braess [6] proposed a very simple model of a congested network and showed that, paradoxically, when a new link is added to a network and each user independently seeks her best possible route from a common origin to a common destination, then, for certain combinations of link cost functions and number of users, the equilibrium cost of travel of all users may increase. This situation, subsequently dubbed the Braess paradox (hereafter BP), has attracted considerable attention and instigated much theoretical research in transportation science, computer science, and other disciplines that are interested in decentralized decision making in multiagent systems. Researchers from these disciplines have attempted to classify networks in which the addition
310
Amnon Rapoport et al.
of one or more links could degrade network performance [13,27], discover new paradoxes [7,9,12,18], and quantify the degree of degradation in network performance due to unregulated “selfish” behavior [25]. An excellent discussion of the BP appears in a recent research monograph by Roughgarden [24]. The BP is not the only example of a Nash equilibrium that does not, in general, maximize social welfare. Two other well-known examples are the finitely iterated prisoner’s dilemma and centipede games. In the prisoner’s dilemma game, each player has a dominant strategy. Simultaneous choice of the n dominant strategies yields a unique Pareto-deficient equilibrium in pure strategies. (See, e.g., [10], or any other introductory textbook on game theory, for definitions of the concepts “dominant strategy,” “Pareto efficiency,” “Nash equilibrium,” and “mixed-strategy equilibrium.”) When the game is iterated a finite and commonly known number of times with a commonly known termination point, backward induction yields mutual defection on every round of play. For certain choices of parameter values, this result seems highly counterintuitive. A second, less familiar example is the centipede game [2–4,23], an extensive form game with complete information and a binary strategy space, the same for each player: {“continue”, “exit”}. In the two-player version of the game with J moves and exponentially increasing payoffs, payoffs are structured in such a way that: (1) a player who exits on stage j, $j = 1, \ldots, J$, earns more than the other player; (2) payoffs associated with stage j + 1 are obtained by multiplying each of the two payoffs on stage j by the same constant and reversing the player order; (3) both players are better off if the game continues for at least two more stages; and (4) if one of the players continues on stage j and the other exits on stage j + 1, then the former is worse off and the latter is better off (see, e.g., [20] for a nontechnical presentation of the game). Figure 1 displays a two-person six-move centipede game taken from Aumann [2]. The equilibrium solution constructed by backward induction is for each player to exit on every stage, so the first player exits on stage 1. Most reasonable people are not willing to accept this Pareto-deficient solution, or at least believe that it is of no practical value. Aumann [2] considers the game to be one of the most “disturbing counterintuitive examples of rational interactive decision-making” (p. 219). The finitely iterated prisoner’s dilemma game, the centipede game, and network games susceptible to the BP provide three different perspectives for examining noncooperative decision situations in which Nash equilibria do not maximize social welfare. The iterated prisoner’s dilemma game is a complete information game in strategic form that is repeated in time with no change in the payoff matrix. The centipede game is a one-shot extensive form game with joint payoffs that increase from one move to the next. The BP is a concatenation of two network games. Unlike in the previous two games, the implications of the BP are not attributed to backward induction; rather, they result from the expansion of a given network. It is the implications of these games that deepen
Braess Paradox in the Laboratory
311
Fig. 1. A six-move two-player centipede game [2]. The two players are marked by the letters A and B, and the moves by numbers from 1 to 6.
our understanding of the Nash equilibrium concept, clarify the inherent conflict that often exists between individual and group rationality, and force our intuition about distributed multiagent decision making to evolve. A critical element in this ongoing process is the experimental investigation of interactive decision making in these classes of games that complements and frequently instigates additional theoretical research. In the last 50 years or so since the publication of the book Games and Decisions by Luce and Raiffa [15], variants of the finitely iterated prisoner’s dilemma game have been studied in literally thousands of experiments (see, e.g., [8] for a partial literature review). Perhaps because it was introduced and studied much later, experimental studies of the centipede game have only appeared in the last 15 years [5,11,16,17,22]. However, we are familiar with only a single experimental study of the BP by Rapoport et al. [21] (hereafter RKDG) that, in contrast to the results of the two-person prisoner’s dilemma and centipede experiments, has provided strong support to the equilibrium solution. Extending this work, the purpose of the present study is to test the descriptive power of the BP in a topologically richer network with asymmetric cost functions. The rest of the chapter is organized as follows. Section 2 introduces terminology and illustrates the BP in a simple traffic network with only two parallel routes from a common origin to a common destination and affine cost functions. Section 3 briefly summarizes the findings of two experiments conducted by RKDG. Both experiments include cost functions that give rise to equilibrium solutions in which players divide themselves in equal numbers across routes. Equal division of the set of players across routes, besides being atypical, provides a focal point that may considerably facilitate coordination in route selection. Section 4 reports the design and results of another
Amnon Rapoport et al.
experiment with a different network in which subsets of network users of different size are predicted to choose different routes in equilibrium. The results are summarized and discussed in Section 5.
2 The Braess Paradox: Terminology and Examples

It is customary (e.g., [14]) to model a two-terminal network by a graph G = (V, E, O, D), where V is a finite set of vertices (nodes), E is a finite set of edges (arcs, links), and O (the origin) and D (the destination) are two distinct vertices in V. Each edge e ∈ E has a tail t(e) ∈ V and a head h(e) ∈ V; we interpret e as a one-way road segment from t(e) to h(e). A route (path) in the network G is a sequence of the form $v_0, e_1, v_1, e_2, v_2, \ldots, v_{g-1}, e_g, v_g$, where $v_0, v_1, \ldots, v_g$ are vertices, $v_0 = O$, $v_g = D$, and $e_1, e_2, \ldots, e_g$ are edges satisfying $t(e_i) = v_{i-1}$ and $h(e_i) = v_i$ for $i = 1, \ldots, g$. We consider noncooperative games that are associated with the network G and have the following components.

• A finite set of players (users) $N = \{1, \ldots, n\}$. Transportation science distinguishes between the case where n is infinite, so that each user has an infinitesimally small impact on road (edge) congestion, and the case where n is finite and commonly known.
• An assignment of costs to edges that depends on the number of users who traverse these edges. We denote by $c_{ij}(f_{ij})$ the cost to each user of moving along an edge (i, j) with tail t(e) = i and head h(e) = j, where $f_{ij}$ is the total number of users of (i, j). In the context of traffic networks, $c_{ij}(f_{ij})$ is taken to represent the travel cost for road segment (i, j) when it is chosen by $f_{ij}$ users.

In our game, every user travels from a common origin O to a common destination D. The strategy space of each user is the set of routes in G. If the n users choose travel routes so that a unilateral change of route by any one user does not lower that user’s travel cost, given that the other n − 1 users do not change their routes, then the n route choices are said to be in equilibrium. We consider in this chapter networks with affine cost functions, where for each edge (i, j) ∈ E, $c_{ij}(f_{ij}) = a_{ij} f_{ij} + b_{ij}$ for some nonnegative constants $a_{ij}$ and $b_{ij}$.
The cost function includes two components: a fixed component $b_{ij}$, which is interpreted as the time to traverse edge (i, j) by a single user, and a variable component $a_{ij} f_{ij}$, which accounts for the effects of congestion. Affine cost functions were chosen because they are intuitive and supported by empirical evidence [27]. Two networks are required to illustrate the BP, which we refer to as the basic network and the augmented network. The augmented network includes the basic network and one or more traversal edges (crossroads, bridges). Figure 2a displays a very simple basic network with four vertices, denoted by the letters O, A, B, and D, four edges, and an antisymmetric [19] arrangement
Braess Paradox in the Laboratory
of the edge costs: $c_{OA} = 10 f_{OA}$, $c_{BD} = 10 f_{BD}$, $c_{AD} = c_{OB} = 25$. Figure 2b depicts the augmented network, which, in addition to the basic network, includes a single edge (A, B) that connects the end of edge (O, A) to the beginning of edge (B, D). In our example, $c_{AB} = 0$: no cost is charged for traversing the “bridge.” This assumption can easily be relaxed by allowing $c_{AB} > 0$. Figures 2a and 2b provide an example of a minimal critical network, whose properties were discussed by Penchina [19].
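The network definitions above map onto a small data model. The Python sketch below is illustrative only and is not the software used in the experiments: it assumes each edge is stored as a (tail, head) pair mapped to its affine coefficients (a_ij, b_ij), represents a route by its vertex sequence v_0, ..., v_g, and uses the augmented network of Figure 2b as its example. The names `is_route`, `route_cost`, and `FIG_2B` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]                 # (tail t(e), head h(e))
Costs = Dict[Edge, Tuple[int, int]]    # edge -> (a_ij, b_ij), c_ij(f) = a_ij * f + b_ij

# Illustrative network: the augmented network of Figure 2b (Game 1B).
FIG_2B: Costs = {
    ("O", "A"): (10, 0),   # c_OA = 10 f_OA
    ("A", "D"): (0, 25),   # c_AD = 25
    ("O", "B"): (0, 25),   # c_OB = 25
    ("B", "D"): (10, 0),   # c_BD = 10 f_BD
    ("A", "B"): (0, 0),    # cost-free bridge, c_AB = 0
}

def is_route(path: List[str], costs: Costs, origin: str = "O", dest: str = "D") -> bool:
    """True if v_0, ..., v_g is an O-D route: v_0 = O, v_g = D, and every consecutive
    pair (v_{i-1}, v_i) is an edge e_i with t(e_i) = v_{i-1} and h(e_i) = v_i."""
    if len(path) < 2 or path[0] != origin or path[-1] != dest:
        return False
    return all(edge in costs for edge in zip(path, path[1:]))

def route_cost(path: List[str], costs: Costs, flows: Dict[Edge, int]) -> int:
    """Travel cost of a route: sum of a_ij * f_ij + b_ij over the edges it uses."""
    total = 0
    for edge in zip(path, path[1:]):
        a, b = costs[edge]
        total += a * flows.get(edge, 0) + b
    return total

if __name__ == "__main__":
    # Both players of Game 1B on route O-A-B-D: f_OA = f_AB = f_BD = 2.
    flows = {("O", "A"): 2, ("A", "B"): 2, ("B", "D"): 2}
    print(is_route(["O", "A", "B", "D"], FIG_2B))           # True
    print(is_route(["O", "B", "A", "D"], FIG_2B))           # False: there is no edge (B, A)
    print(route_cost(["O", "A", "B", "D"], FIG_2B, flows))  # 40
```

Storing edges as (tail, head) keys makes the route condition t(e_i) = v_{i-1}, h(e_i) = v_i a simple dictionary lookup.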
Fig. 2. Basic and augmented networks for Games 1A and 1B in RKDG, Games 2A and 2B in RKDG, and Games 3A and 3B in the present study.
A major feature of our experiment is that the total travel cost is subtracted from a fixed endowment, denoted by E, which assumes the same value for
each of the n players in both the basic and augmented networks. This allows us to manipulate the ratio between the equilibrium travel payoffs (rather than costs) in the basic and augmented networks, which may be used experimentally to enhance the effect of the BP. For the networks in Figures 2a and 2b, assume that n = 2 and E = 45. (We choose n = 2 to illustrate the paradox in its simplest form and to facilitate comparison with the two-person PD and centipede games; the choice is not binding in any way.) It is easy to verify that there are two pure-strategy equilibria in the basic game (called Game 1A), with one player choosing route (O–A–D) and the other route (O–B–D). The individual travel cost is 10 + 25 = 35, and the resulting payoff to each player is 45 − 35 = 10. No player benefits from unilateral deviation. There is also a symmetric mixed-strategy equilibrium in which each route is chosen with equal probability.

Consider next the augmented game in Figure 2b. It is again easy to verify that the augmented game (called Game 1B) has a unique pure-strategy equilibrium in which both players choose route (O–A–B–D), for a total travel cost of 20 + 0 + 20 = 40 and a corresponding payoff of 45 − 40 = 5. The counterintuitive feature of this example is that improving the basic network by adding the cost-free edge (A, B) increases each user’s travel cost by 14.3 percent (from 35 to 40). Because the same endowment E is fixed for both networks, equilibrium play results in a 50 percent drop in individual payoff (from 10 to 5). The paradox is explained by the congestion externalities on the cost-free edge (A, B): because each user ignores the external cost that traversing the cost-free “bridge” imposes on the other user, both choose to take it. The value of $c_{AB}$ affects the users’ inclination to choose routes: the higher the value of $c_{AB}$, the smaller the number of players who are enticed to choose route (O–A–B–D). If $c_{AB} > 5$, then the bridge will never be crossed in equilibrium.
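The equilibrium claims for Games 1A and 1B can be checked by brute force: enumerate every way of splitting the n players over the available routes and keep the splits from which no single player gains by moving alone. The Python sketch below does this for the Figure 2 networks; the function names are illustrative. Because the players are identical, it enumerates splits (numbers of players per route) rather than individual assignments, so the two pure-strategy equilibria of Game 1A appear as the single split (1, 1).

```python
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Costs = Dict[Edge, Tuple[int, int]]   # edge -> (a, b) for c(f) = a * f + b

def splits(n: int, k: int):
    """All ways to place n identical players on k routes (stars and bars)."""
    for bars in combinations(range(n + k - 1), k - 1):
        cuts = (-1,) + bars + (n + k - 1,)
        yield tuple(cuts[i + 1] - cuts[i] - 1 for i in range(k))

def route_costs(costs: Costs, routes: List[List[str]], split) -> List[int]:
    """Cost of every route when split[r] players use route r."""
    flows: Dict[Edge, int] = {}
    for path, k in zip(routes, split):
        for e in zip(path, path[1:]):
            flows[e] = flows.get(e, 0) + k
    return [sum(costs[e][0] * flows[e] + costs[e][1] for e in zip(p, p[1:]))
            for p in routes]

def is_equilibrium(costs: Costs, routes, split) -> bool:
    """No player can lower her own cost by moving alone to another route."""
    current = route_costs(costs, routes, split)
    for r, k in enumerate(split):
        if k == 0:
            continue
        for s in range(len(routes)):
            if s != r:
                moved = list(split)
                moved[r] -= 1
                moved[s] += 1
                if route_costs(costs, routes, moved)[s] < current[r]:
                    return False
    return True

def pure_equilibria(costs: Costs, routes, n: int):
    return [s for s in splits(n, len(routes)) if is_equilibrium(costs, routes, s)]

BASIC = {("O", "A"): (10, 0), ("A", "D"): (0, 25), ("O", "B"): (0, 25), ("B", "D"): (10, 0)}
AUGMENTED = {**BASIC, ("A", "B"): (0, 0)}            # cost-free bridge (A, B)
ROUTES_1A = [["O", "A", "D"], ["O", "B", "D"]]
ROUTES_1B = ROUTES_1A + [["O", "A", "B", "D"]]

if __name__ == "__main__":
    E = 45
    for name, costs, routes in (("Game 1A", BASIC, ROUTES_1A), ("Game 1B", AUGMENTED, ROUTES_1B)):
        for s in pure_equilibria(costs, routes, n=2):
            # cost paid on the routes actually used (equal across used routes here)
            paid = max(c for c, k in zip(route_costs(costs, routes, s), s) if k > 0)
            print(name, s, "cost", paid, "payoff", E - paid)
    # Game 1A (1, 1) cost 35 payoff 10
    # Game 1B (0, 0, 2) cost 40 payoff 5
```

Changing the bridge entry, for example to `("A", "B"): (0, 6)`, is a quick way to confirm the remark that a bridge costing more than 5 is not used in equilibrium.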
There is a flip side to the BP that we find equally surprising. In the example just described, the addition of a cost-free edge to a network makes every user worse off in equilibrium. Conversely, start with the augmented network in Figure 2b, delete the cost-free edge (A, B), and note that, in equilibrium, both users benefit from the degradation of the network. The occurrence of the BP in traffic networks depends on the choice of parameter values, namely, the cost functions $c_{ij}(f_{ij})$ and the number of users n. To further illustrate this dependence, assume the same costs as above but change n from 2 to 4. In the resulting basic Game 1A, equilibrium play dictates that two players choose route (O–A–D) and two others choose route (O–B–D), for a travel cost of 20 + 25 = 45 per player. It is easy to verify that this is also the pure-strategy equilibrium of the augmented network game: a player who deviates to route (O–A–B–D) would pay 20 + 0 + 30 = 50 > 45. Thus, when the number of players in the example above is doubled from 2 to 4, the cost-free edge (A, B) is not traversed in equilibrium. For another example (see RKDG, Experiment 1), assume the same topology as in Figures 2a and 2b, set the edge cost functions at $c_{OA} = 10 f_{OA}$, $c_{BD} = 10 f_{BD}$, $c_{AD} = c_{OB} = 210$, $c_{AB} = 0$, and set the number of players at n = 18. In equilibrium, 9 players choose route (O–A–D) and 9 others route (O–B–D), for a travel cost of 10 × 9 + 210 = 300 per player. The augmented game has a unique pure-strategy equilibrium in which all
18 players choose the cost-free route (O–A–B–D) and, consequently, incur a higher travel cost of 360 per player (a 20 percent increase).
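The dependence on n and on the fixed edge costs can be checked the same way. Because the Figure 2 topology has only three routes, the Python sketch below writes their costs in closed form instead of summing over edges; `route_costs` and `pure_equilibria` are illustrative names, and the bridge cost is held at $c_{AB} = 0$ as in the examples above.

```python
def route_costs(a: int, b: int, c: int, fixed: int):
    """Costs of routes (O-A-D, O-B-D, O-A-B-D) when a, b and c players use them,
    with c_OA = 10 f_OA, c_BD = 10 f_BD, c_AD = c_OB = fixed and c_AB = 0."""
    f_oa, f_bd = a + c, b + c
    return (10 * f_oa + fixed, fixed + 10 * f_bd, 10 * f_oa + 10 * f_bd)

def pure_equilibria(n: int, fixed: int, with_bridge: bool):
    """Splits of n players from which no player gains by a unilateral route change,
    together with the cost paid on the used routes."""
    k = 3 if with_bridge else 2          # routes available in Game 1B vs Game 1A
    found = []
    for a in range(n + 1):
        for b in range(n + 1 - a):
            split = [a, b, n - a - b]
            if k == 2 and split[2] > 0:
                continue                 # the basic game has no bridge route
            cost = route_costs(*split, fixed)
            stable = True
            for r in range(k):
                for s in range(k):
                    if r == s or split[r] == 0:
                        continue
                    moved = split.copy()
                    moved[r] -= 1
                    moved[s] += 1
                    if route_costs(*moved, fixed)[s] < cost[r]:
                        stable = False
            if stable:
                paid = max(cost[r] for r in range(k) if split[r] > 0)
                found.append((tuple(split[:k]), paid))
    return found

if __name__ == "__main__":
    print(pure_equilibria(4, 25, with_bridge=False))    # [((2, 2), 45)]
    print(pure_equilibria(4, 25, with_bridge=True))     # [((2, 2, 0), 45)]
    print(pure_equilibria(18, 210, with_bridge=False))  # [((9, 9), 300)]
    print(pure_equilibria(18, 210, with_bridge=True))   # [((0, 0, 18), 360)]
```

With n = 4 the bridge is left unused, while with n = 18 and fixed cost 210 every player crosses it and pays 360 instead of 300.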
3 Related Experiments

A claim can be made that the BP is no more than a theoretical curiosity, that its effects will diminish with experience, and that networks closer to the complexity of real life would prevent such paradoxes from being realized. After discussing the BP and two related traffic paradoxes, Arnott and Small [1] (p. 451) raised the same question: “Are these paradoxes more than intellectual curiosities?” With the exception of some anecdotal evidence, no solid empirical evidence has been brought forward in more than 35 years to answer this question. Consequently, RKDG pursued an alternative approach that simulates traffic networks susceptible to the BP in a computer-controlled laboratory. In this approach, financially motivated players independently and repeatedly choose routes in these networks both before and after they are expanded. The equilibrium predictions underlying the BP are then tested on the aggregate and individual levels. RKDG report the results of two experiments that studied topologically different networks. We briefly summarize below the main results of Experiment 2, which is most closely related to the present study.

Experiment 1 implemented the traffic networks in Figures 2a and 2b with the cost parameters $c_{OA} = 10 f_{OA}$, $c_{BD} = 10 f_{BD}$, $c_{AD} = c_{OB} = 210$, $c_{AB} = 0$, and n = 18. RKDG report convergence to equilibrium in Game 1B after 40 rounds of play. Experiment 2 studied a topologically richer network, which Roughgarden [24] (Ch. 5) calls a “second Braess graph” (see Figures 2c and 2d). The augmented network (Game 2B) was constructed by adding two cost-free edges to Game 2A, namely, edge (C, A) and edge (B, E). This added two new routes to the network, namely, routes (O–C–A–D) and (O–B–E–D), for a total of five routes (see Figure 2d). Three groups of 18 players each participated in Experiment 2. All the groups first played Game 2A for 80 rounds and then Game 2B for 80 additional rounds.
The experimental findings can be summarized as follows. First, the equilibrium solution accounted quite well for the mean route choices in Game 2A. However, fluctuations around the mean persisted over the 80 rounds, and there was no support for equilibrium play on the individual level. Second, route choice frequencies in the augmented game moved slowly with experience in the direction of the pure-strategy equilibrium solution. However, mean route choices of the two equilibrium routes did not converge to equilibrium by the end of the session. Only 77.7 percent of the players were choosing one of the two equilibrium routes on round 80. When compared with the results of Experiment 1 of RKDG, which used a considerably simpler network, this last finding suggests that the effects of the BP may weaken if the network includes more alternative routes. In fact,
they may altogether disappear with a richer topology or larger demand. As the number of routes in a road network increases, travelers may no longer cope successfully with the information they receive about route choices by all members of the population and, consequently, revert to simple heuristics for choosing their best routes. Moreover, symmetry in choosing routes in traffic networks is the exception, not the rule. Our purpose here is to determine the scope of applicability of the BP and assess its descriptive power by investigating the effects of the topological structure of the network and of the cost functions.
4 A New Network Experiment with Asymmetric Link Costs

4.1 Networks and Their Equilibria

Figure 2e exhibits the network of the basic game, Game 3A, which was examined in the present study. Comparison of Games 2A and 3A shows that they differ only in the edge cost functions, not in the topological structure. Most importantly for our purpose, the cost functions in Game 3A provide no cue about symmetry that may facilitate coordination. There are 18!/(4! 6! 8!) = 9,189,180 pure-strategy equilibria in Game 3A, in which 4, 6, and 8 players choose routes (O–A–D), (O–C–E–D), and (O–B–D), respectively. This can be verified by summing the edge costs for each of these three routes:

$$(76 + 12 \times 4) = (11 + 7 \times 6 + 22 + 2 \times 6 + 1 + 6 \times 6) = (7 \times 8 + 68) = 124.$$

There is also a symmetric mixed-strategy equilibrium in which each player chooses routes (O–A–D), (O–C–E–D), and (O–B–D) with probabilities $p_1$, $p_2$, and $p_3$, respectively. These are computed numerically by solving the following simultaneous equations, where n = 18 and j denotes the number of the other n − 1 players who choose the same route:

$$\sum_{j=0}^{n-1} \binom{n-1}{j} p_1^{\,j} (1 - p_1)^{n-j-1} \, [76 + 12(j+1)] = EV,$$
$$\sum_{j=0}^{n-1} \binom{n-1}{j} p_2^{\,j} (1 - p_2)^{n-j-1} \, [34 + 15(j+1)] = EV,$$
$$\sum_{j=0}^{n-1} \binom{n-1}{j} p_3^{\,j} (1 - p_3)^{n-j-1} \, [68 + 7(j+1)] = EV,$$
$$p_1 + p_2 + p_3 = 1.$$

The solution yields $p_1 = 0.210$, $p_2 = 0.321$, and $p_3 = 0.469$, and an expected travel cost of EV = 130.83.
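One way to reproduce the reported solution is sketched below in Python, with the route cost functions read off the sums above (for example, 34 + 15m for route (O–C–E–D) carrying m players). Because each route cost is affine in j, the binomial sum equals fixed + slope·((n − 1)p + 1), so the system is linear and can be solved in closed form; the sketch then checks the result against the binomial sums themselves. The same probabilities give Binomial(18, p) standard deviations of 1.73, 1.98, and 2.12, the theoretical values reported for Game 3A in Table 2. The function names are illustrative.

```python
from math import comb, sqrt

N = 18
# Route cost with m players on the route: fixed + slope * m.
# O-A-D: 76 + 12m,  O-C-E-D: (11 + 22 + 1) + (7 + 2 + 6)m = 34 + 15m,  O-B-D: 68 + 7m
ROUTES = {"O-A-D": (76, 12), "O-C-E-D": (34, 15), "O-B-D": (68, 7)}

def expected_cost(p: float, fixed: int, slope: int, n: int = N) -> float:
    """Left-hand side of an equilibrium equation: E[fixed + slope * (j + 1)],
    where j ~ Binomial(n - 1, p) counts the *other* players on the route."""
    return sum(comb(n - 1, j) * p**j * (1 - p)**(n - 1 - j) * (fixed + slope * (j + 1))
               for j in range(n))

def solve_mixed(n: int = N):
    """Affine costs make E[.] = fixed + slope * ((n - 1) p + 1). Setting each equal to EV
    gives p_k = (EV - fixed_k - slope_k) / (slope_k (n - 1)); sum_k p_k = 1 fixes EV."""
    inv = sum(1 / (s * (n - 1)) for _, s in ROUTES.values())
    const = sum((f + s) / (s * (n - 1)) for f, s in ROUTES.values())
    ev = (1 + const) / inv
    probs = {r: (ev - f - s) / (s * (n - 1)) for r, (f, s) in ROUTES.items()}
    return probs, ev

if __name__ == "__main__":
    probs, ev = solve_mixed()
    print(f"EV = {ev:.2f}")                    # EV = 130.83
    for route, p in probs.items():
        f, s = ROUTES[route]
        sd = sqrt(N * p * (1 - p))             # s.d. of the number of players on the route
        print(route, f"p = {p:.3f}", f"E[cost] = {expected_cost(p, f, s):.2f}", f"sd = {sd:.2f}")
```

The closed form is used only because the costs are affine; for nonlinear edge costs the same `expected_cost` function could be handed to a numerical root finder instead.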
Figure 2f exhibits the network of the augmented game, Game 3B, which includes two additional cost-free edges, (C, A) and (B, E). There are 18!/(7! 11!) = 31,824 pure-strategy equilibria in this game, in which the three original routes (O–A–D), (O–C–E–D), and (O–B–D) are abandoned and the two new routes (O–C–A–D) and (O–B–E–D) are chosen by 7 and 11 players, respectively. The travel cost associated with each of these equilibria is 144. To establish that the (7, 11) split of the players in Game 3B is an equilibrium, we have to verify that a unilateral deviation from either equilibrium route does not decrease the deviating player’s travel cost. As before, we denote by $c_{ij}(f_{ij})$ the cost to each user of traversing edge (i, j). Checking for deviations from route (O–C–A–D), we have:

• Deviation to route (O–A–D) does not pay off: $c_{OC}(7) + c_{CA}(7) + c_{AD}(7) = 60 + 0 + 84 = 144 < c_{OA}(1) + c_{AD}(7) = 76 + 84 = 160$.
• Deviation to route (O–C–E–D) does not pay off: $c_{OC}(7) + c_{CA}(7) + c_{AD}(7) = 60 + 0 + 84 = 144 < c_{OC}(7) + c_{CE}(1) + c_{ED}(12) = 60 + 24 + 73 = 157$.
• Deviation to route (O–B–D) does not pay off: $c_{OC}(7) + c_{CA}(7) + c_{AD}(7) = 60 + 0 + 84 = 144 < c_{OB}(12) + c_{BD}(1) = 84 + 68 = 152$.
• Deviation to route (O–B–E–D) does not pay off: $c_{OC}(7) + c_{CA}(7) + c_{AD}(7) = 60 + 0 + 84 = 144 < c_{OB}(12) + c_{BE}(12) + c_{ED}(12) = 84 + 0 + 73 = 157$.

Similar computations show that unilateral deviations from the second equilibrium route (O–B–E–D) to any of the other four routes do not pay off either; the closest alternative, route (O–B–D), would cost $c_{OB}(11) + c_{BD}(1) = 77 + 68 = 145 > 144$.

We do not investigate the existence of mixed-strategy equilibria for Game 3B. In our previous study (RKDG), we constructed different classes of mixed-strategy equilibria for Game 2B. When the game was iterated in time, the subjects did not switch their routes from one trial to another with the frequency predicted by equilibrium play, nor did their patterns of switches support randomization.
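The equilibrium claims for Games 3A and 3B can also be checked by exhaustive search over all splits of the 18 players (7,315 splits over the five routes of Game 3B). The Python sketch below assumes edge cost functions inferred from the worked numbers in this section, since Figures 2e and 2f are not reproduced here: $c_{OA} = 76$, $c_{AD} = 12 f$, $c_{OC} = 7f + 11$, $c_{CE} = 2f + 22$, $c_{ED} = 6f + 1$, $c_{OB} = 7f$, $c_{BD} = 68$, and cost-free edges (C, A) and (B, E). Routes are written as vertex strings such as "OCAD"; all names are illustrative. It also reproduces the counts of player assignments quoted above.

```python
from itertools import combinations
from math import comb, factorial

# Edge -> (a, b) for c(f) = a * f + b (assumed, inferred from the text's worked numbers).
EDGES_3A = {
    ("O", "A"): (0, 76), ("A", "D"): (12, 0),
    ("O", "C"): (7, 11), ("C", "E"): (2, 22), ("E", "D"): (6, 1),
    ("O", "B"): (7, 0), ("B", "D"): (0, 68),
}
EDGES_3B = {**EDGES_3A, ("C", "A"): (0, 0), ("B", "E"): (0, 0)}   # two cost-free edges
ROUTES_3A = ["OAD", "OCED", "OBD"]
ROUTES_3B = ROUTES_3A + ["OCAD", "OBED"]

def splits(n, k):
    """All ways to place n identical players on k routes (stars and bars)."""
    for bars in combinations(range(n + k - 1), k - 1):
        cuts = (-1,) + bars + (n + k - 1,)
        yield tuple(cuts[i + 1] - cuts[i] - 1 for i in range(k))

def route_costs(edges, routes, split):
    flows = {}
    for route, m in zip(routes, split):
        for e in zip(route, route[1:]):          # consecutive vertices form an edge
            flows[e] = flows.get(e, 0) + m
    return [sum(edges[e][0] * flows[e] + edges[e][1] for e in zip(r, r[1:])) for r in routes]

def is_equilibrium(edges, routes, split):
    """True if no player can lower her cost by a unilateral route change."""
    current = route_costs(edges, routes, split)
    for r, m in enumerate(split):
        for s in range(len(routes)):
            if m and s != r:
                moved = list(split)
                moved[r] -= 1
                moved[s] += 1
                if route_costs(edges, routes, moved)[s] < current[r]:
                    return False
    return True

def equilibria(edges, routes, n=18):
    return [s for s in splits(n, len(routes)) if is_equilibrium(edges, routes, s)]

if __name__ == "__main__":
    print(equilibria(EDGES_3A, ROUTES_3A))                            # [(4, 6, 8)]
    eq_3b = equilibria(EDGES_3B, ROUTES_3B)
    print((0, 0, 0, 7, 11) in eq_3b)                                  # True
    print(route_costs(EDGES_3B, ROUTES_3B, (0, 0, 0, 7, 11))[3:])     # [144, 144]
    print(factorial(18) // (factorial(4) * factorial(6) * factorial(8)))  # 9189180
    print(comb(18, 7))                                                # 31824
```

The search checks the Nash condition on splits only; the counts 9,189,180 and 31,824 then come from assigning the identical-role players to those splits.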
Consequently, in the present chapter we focus only on pure-strategy equilibria and use them as benchmarks for studying user behavior in Games 3A and 3B. The equilibrium analysis is static and offers no guidance on how to choose routes when Game 3B is iterated in time. To appreciate the difficulties that players
face when choosing their travel routes, we have to consider nonequilibrium behavior. Recall that in the pure-strategy equilibrium for Game 3B, 7 players choose route (O–C–A–D) and 11 players route (O–B–E–D), with an associated travel cost of 144. We have already shown that unilateral deviation from this equilibrium does not pay off. We have also shown that if all the players deviate and distribute themselves such that 4 players choose route (O–A–D), 6 players choose route (O–C–E–D), and 8 players choose route (O–B–D), then all 18 players decrease their travel cost from 144 to 124. However, the route-choice patterns that commonly occur place the players in considerably more difficult positions. To illustrate this claim, suppose that routes (O–A–D), (O–C–E–D), (O–B–D), (O–C–A–D), and (O–B–E–D) are chosen by 2, 2, 2, 5, and 7 players, respectively. This distribution is close to the mean route choices in the present study, which we present in the next section. The associated travel costs are 160, 141, 131, 144, and 118, respectively. It is not clear what signal these travel costs send to players who have to repeat the stage game and wish to reduce their travel costs, as the travel costs of the two nonequilibrium routes (O–C–E–D) and (O–B–D) are lower than the travel cost of one of the two equilibrium routes, namely, route (O–C–A–D). This distribution of route choices, which is not at all uncommon, may therefore elicit switches to another route on the next round. As players are known to continue switching their route choices over hundreds of trials even in much simpler two-route networks with no crossroad [21,26], it is doubtful whether convergence to equilibrium can be reached at all in networks with multiple bridges. Experiment 1 of RKDG has only shown that equilibrium is reached in a four-edge, two-route network with a single bridge.
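The travel costs in this illustrative distribution follow from the edge flows it induces. The short Python sketch below recomputes them under edge cost functions inferred from the worked numbers of Section 4.1 (an assumption, since Figure 2f is not reproduced here); edges and routes are written as vertex strings, and the names are illustrative.

```python
# Edge -> (a, b) with c(f) = a * f + b, inferred from the worked numbers in Section 4.1.
# "CA" and "BE" are the two cost-free bridges of Game 3B.
EDGES_3B = {"OA": (0, 76), "AD": (12, 0), "OC": (7, 11), "CE": (2, 22), "ED": (6, 1),
            "OB": (7, 0), "BD": (0, 68), "CA": (0, 0), "BE": (0, 0)}
ROUTES = ["OAD", "OCED", "OBD", "OCAD", "OBED"]

def edges_of(route: str):
    """Split a vertex string such as 'OCAD' into its edges ['OC', 'CA', 'AD']."""
    return [route[i:i + 2] for i in range(len(route) - 1)]

def route_costs(split):
    """Travel cost of every route when split[r] players choose ROUTES[r]."""
    flows = {e: 0 for e in EDGES_3B}
    for route, m in zip(ROUTES, split):
        for e in edges_of(route):
            flows[e] += m
    return [sum(EDGES_3B[e][0] * flows[e] + EDGES_3B[e][1] for e in edges_of(r))
            for r in ROUTES]

if __name__ == "__main__":
    print(route_costs((2, 2, 2, 5, 7)))    # [160, 141, 131, 144, 118]
    print(route_costs((0, 0, 0, 7, 11)))   # equilibrium split: last two entries are 144
```

In the first profile, a player on (O–C–A–D) pays 144 while players on (O–C–E–D) and (O–B–D) pay 141 and 131, which is the mixed signal described above.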
Based on the results that they report in Experiment 2, the most we may expect are patterns of route choice that gradually move in the direction of the equilibrium distribution over dozens or, more likely, hundreds of trials.

4.2 Method

Participants

The players were 54 students at the University of Arizona who volunteered to participate in a computer-controlled experiment on group decision making for payoff contingent on performance. Males and females participated in almost equal numbers. None of the players had participated in Experiments 1 or 2 of RKDG. The players were divided into three groups (sessions) of 18 members each. A session lasted about 100 minutes. Excluding a $5 show-up bonus for participation, the mean payoff across sessions was $16.

Procedure

All three sessions were conducted in a large laboratory with multiple terminals located in separate cubicles. Upon arrival at the laboratory, each player was
asked to draw a marked chip from a bag that determined her seating. Players were then handed instructions that they read at their own pace. Questions about the procedure and the network game were answered individually by the experimenter. Each session was divided into two parts, and specific instructions were handed to the players at the beginning of each part (see Appendix). The instructions for Part I displayed the traffic network in Game 3A and explained the procedure for choosing one of the three routes in this game. The instructions for Part II, given after Part I was completed, displayed the traffic network in Game 3B and explained the procedure for choosing one of the five routes in this game.

The instructions for Part I graphically displayed the traffic network in Game 3A, explained the cost functions, and illustrated the computation of the travel cost for links with either variable or fixed costs. The basic game was iterated 60 times. At the beginning of each trial, each player was given an endowment of E = 184 travel units (points). The payoff for the trial was computed separately for each player by subtracting her travel cost from this fixed endowment. To choose one of the three routes, (O–A–D), (O–B–D), or (O–C–E–D), the player had to click on the links of this route and then press a “confirm” button. Clicking on a link with the mouse changed the link’s color on the screen to indicate the player’s choice. After all the group members had independently and anonymously registered and then verified their route choices, a new screen was displayed with complete information about choices and outcomes:

• The route chosen by the player
• The number of players choosing each of the routes
• The player’s payoff for the trial
• The payoff associated with traversing each possible route
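The per-trial payoff rule above, combined with the payment rule described in the next paragraph (4 payoff trials drawn from each part, 25 points = $1), determines what a player would earn if every drawn trial ended in the pure-strategy equilibria of Section 4.1 (travel costs 124 in Game 3A and 144 in Game 3B). The Python sketch below carries out that arithmetic; the function names are illustrative, and the all-equilibrium scenario is an assumption, not a description of actual play. It yields $16, the same figure as the mean payoff reported in the Participants paragraph.

```python
ENDOWMENT = 184          # E, points per trial in both Games 3A and 3B
POINTS_PER_DOLLAR = 25   # exchange rate: 25 points = $1

def trial_payoff(travel_cost: int, endowment: int = ENDOWMENT) -> int:
    """Payoff for one trial: the fixed endowment minus the player's travel cost."""
    return endowment - travel_cost

def earnings_in_dollars(payoff_trials) -> float:
    """Points accumulated over the drawn payoff trials, converted to dollars
    (show-up bonus excluded)."""
    return sum(payoff_trials) / POINTS_PER_DOLLAR

if __name__ == "__main__":
    eq_3a = trial_payoff(124)   # 60 points under pure-strategy equilibrium in Game 3A
    eq_3b = trial_payoff(144)   # 40 points under pure-strategy equilibrium in Game 3B
    # 4 payoff trials drawn from Part I and 4 from Part II
    print(earnings_in_dollars([eq_3a] * 4 + [eq_3b] * 4))   # 16.0
```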
The instructions for Part II displayed the traffic network in Game 3B, explained the cost functions, and illustrated the computation of the travel cost for edges with either variable, fixed, or zero cost. Game 3B was also iterated 60 times. Because Game 3B, unlike Game 3A, allows for negative externalities, these were explained in detail and illustrated. After Part II was completed, the participants were paid their earnings in 4 trials drawn from the 60 trials in Part I, and 4 additional trials drawn from the 60 trials completed in Part II. The 8 payoff trials were randomly drawn and publicly displayed by one of the players at the end of the session. Points were accumulated across the 8 payoff trials and then converted to money at the exchange rate of 25 points = $1. Participants were paid their earnings individually and dismissed from the laboratory. We have opted for a within- rather than between-subject design so that the same players would experience the effect of adding the two cost-free edges. This provides a stronger test of the BP than a between-subject design as it allows players who have experienced Game 3A for 60 trials to resist choosing
new routes in Game 3B (possibly by anonymous signals) that would result in considerably lower payoffs. An equally important feature of our design is the provision of the same endowment (E = 184 points) in both Games 3A and 3B. Under pure-strategy equilibrium play by all group members, this would have resulted in payoffs of 184 − 124 = 60 and 184 − 144 = 40 points per trial in Games 3A and 3B, respectively (i.e., the equilibrium payoff in Game 3A is 50 percent higher than in Game 3B).

4.3 Results

We first test for session effects, then test for the implications of the BP on the aggregate level, and conclude the section with the study of switches and individual differences.

Session Effects

We computed the following statistics for each session:

• The number of times (out of 60) that each player chose each of the three routes in Game 3A.
• The number of times (out of 60) that each player chose each of the five routes in Game 3B.
• The frequency of switches (out of a maximum of 59) in Games 3A and 3B. (A switch occurs if a player changes her route from trial t to t + 1.)

Table 1 presents the means and standard deviations of these statistics by session and game. Because each player’s route frequencies sum to 60, only two of the three routes in Game 3A and four of the five routes in Game 3B are independent. This allows for testing 8 statistics (2 in Game 3A, 4 in Game 3B, and the frequencies of switches in Games 3A and 3B). We conducted separate one-way ANOVAs on each of these eight statistics to test the null hypothesis of equality of means across sessions. None of these tests rejected the null hypothesis. Because we found no differences between sessions, we combined the raw data across the three sessions for some of the subsequent analyses.1

Aggregate Route Choice

For each trial we separately counted the number of players (out of 18) who chose routes (O–A–D), (O–B–D), and (O–C–E–D) in Game 3A, and then averaged these counts across trials. Similar means were computed

1 Mean route (O–A–D) choice in Game 3A: F(2, 51) = 0.06, p = 0.94; mean route (O–B–D) choice in Game 3A: F(2, 51) = 0.01, p = 0.99; mean number of switches in Game 3A: F(2, 51) = 0.32, p = 0.728; mean route (O–A–D) choice in Game 3B: F(2, 51) = 1.23, p = 0.3; mean route (O–B–D) choice in Game 3B: F(2, 51) = 0.13, p = 0.88; mean route (O–C–A–D) choice in Game 3B: F(2, 51) = 0.29, p = 0.75; mean route (O–B–E–D) choice in Game 3B: F(2, 51) = 0.31, p = 0.74; mean number of switches in Game 3B: F(2, 51) = 1.37, p = 0.26.
Table 1. Means and standard deviations of frequency of route choice and number of switches by game and session. Standard deviations appear in parentheses.

Game 3A      Route choice frequency                          No. of switches
             (O–A–D)        (O–B–D)        (O–C–E–D)
Session 1    14.28 (8.60)   25.50 (12.49)  20.22 (11.56)     31.39 (12.08)
Session 2    13.56 (9.02)   25.44 (13.50)  21.00 (10.36)     28.83 (11.26)
Session 3    14.61 (9.71)   24.94 (12.40)  20.44 (10.08)     31.78 (12.62)

Game 3B      Route choice frequency                                                     No. of switches
             (O–A–D)      (O–B–D)      (O–C–E–D)    (O–C–A–D)      (O–B–E–D)
Session 1    2.78 (2.69)  6.28 (7.51)  4.83 (5.75)  18.06 (14.64)  28.06 (15.94)      30.56 (15.96)
Session 2    2.28 (3.54)  6.22 (5.04)  4.22 (4.24)  17.89 (12.72)  29.39 (16.68)      26.67 (14.13)
Session 3    4.00 (3.82)  7.17 (5.82)  8.28 (7.13)  15.17 (10.84)  25.39 (13.83)      35.17 (16.04)
for each of the five routes in Game 3B. Table 2 presents the observed means and standard deviations by session (rows 2–4) and across sessions (row 5). The bottom row shows the equilibrium predictions (in boldface). The theoretical standard deviations in the bottom row refer to the symmetric mixed-strategy equilibrium for Game 3A, in which a player chooses route (O–A–D) with probability 0.21, route (O–B–D) with probability 0.47, and route (O–C–E–D) with probability 0.32. The results for Game 3A corroborate the predictions quite well. Across sessions, the three routes were chosen with probabilities 0.24, 0.42, and 0.34 (compared to the predicted values of 0.21, 0.47, and 0.32). The observed mean route choices (4.24, 7.59, and 6.17) closely follow the pure-strategy equilibrium point predictions (4, 8, and 6). The three observed standard deviations do not differ significantly (F < 1) from the values predicted under symmetric mixed-strategy equilibrium play. Similar results were obtained for each of the three sessions. Across sessions, the three nonequilibrium routes (O–A–D), (O–B–D), and (O–C–E–D) in Game 3B were jointly chosen on
Table 2. Means and standard deviations of number of players choosing each route by game. Standard deviations appear in parentheses.

Game 3A       (O–A–D)      (O–B–D)      (O–C–E–D)
Session 1     4.28 (1.94)  7.65 (1.93)  6.07 (2.08)
Session 2     4.07 (1.99)  7.63 (2.05)  6.30 (2.36)
Session 3     4.38 (1.88)  7.48 (2.28)  6.13 (1.94)
All Sessions  4.24 (1.93)  7.59 (2.08)  6.17 (2.12)
Equil.        4 (1.73)     8 (2.12)     6 (1.98)

Game 3B       (O–A–D)      (O–B–D)      (O–C–E–D)    (O–C–A–D)    (O–B–E–D)
Session 1     0.83 (0.76)  1.88 (1.14)  1.45 (0.95)  5.42 (1.55)  8.42 (2.08)
Session 2     0.68 (0.95)  1.87 (1.63)  1.27 (1.19)  5.37 (2.03)  8.82 (2.49)
Session 3     1.20 (0.95)  2.15 (1.20)  2.48 (1.50)  4.55 (1.96)  7.62 (2.17)
All Sessions  0.91 (0.91)  1.97 (1.34)  1.73 (1.34)  5.11 (1.89)  8.28 (2.29)
Equil.        0            0            0            7            11
26 percent of the trials, compared to 0 percent under equilibrium play. Although the two equilibrium routes were chosen significantly more often than the nonequilibrium routes in Game 3B, their mean frequencies (5.11 and 8.28, as compared to 7 and 11) are well below the pure-strategy equilibrium frequencies. Interestingly, the ratios of the two observed mean frequencies (5.11/13.39 = 0.382 and 8.28/13.39 = 0.618, where 13.39 = 5.11 + 8.28) are almost identical to the theoretical ratios (7/18 = 0.389 and 11/18 = 0.611). Of course, we would not expect all the players to switch to the new equilibrium routes in the augmented game on the first trial of Part II or even within the first 10 to 20 trials. In fact, we would anticipate the opposite behavior, with some players inclined
to choose the “old” routes in an attempt to prevent the sharp decline in their payoff as they move from Game 3A to Game 3B. Figures 3 and 4 display the mean number of players choosing each of the three routes in Game 3A and each of the five routes in Game 3B, respectively. In each of these two figures, the results are exhibited by session and across all three sessions. To better exhibit the trends across trials, we display running means in steps of five rather than the individual means for each trial. Consider first Figure 3. It is evident that the mean route choices across sessions in Game 3A begin, in the first five trials, at levels very close to the corresponding pure-strategy equilibrium predictions. We discern no major learning trends across the 60 trials. An examination of the individual session plots shows separation in the frequencies of route choices and no discernible learning trends, with the exception of the diversification of the frequencies of route choice in the last 10 to 15 trials in Session 3. Over time and across sessions, mean frequencies for routes (O–A–D) and (O–B–D) exhibit very mild downward and upward trends, respectively, whereas mean frequencies for route (O–C–E–D) hover around 6. Figure 4 displays results pertaining to the major hypothesis of the present study. Like Figure 3, Figure 4 plots the running means of route choice in steps of five. The two equilibrium routes (O–C–A–D) and (O–B–E–D) are already separated from the three nonequilibrium routes in the first few trials, and this separation increases with experience. On trial 60, the two equilibrium routes (O–C–A–D) and (O–B–E–D) were jointly chosen across the three sessions by 74 percent of the players. Inspection of Figure 4 shows that the players started choosing the two new routes on the early trials and that their frequency increased with experience.
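The aggregate figures quoted in this section (choice probabilities 0.24, 0.42, and 0.34 in Game 3A, the 26 percent share of nonequilibrium choices in Game 3B, and the 0.382/0.618 split between the two equilibrium routes) follow from the “All Sessions” row of Table 2 by simple arithmetic. The Python sketch below reproduces them. Dividing each mean by 18 treats it as a share of the group, which is our reading of how the percentages were obtained rather than a description of the authors’ procedure; route labels use ASCII hyphens and all names are illustrative.

```python
N = 18
# Mean number of players per route, all sessions (Table 2).
GAME_3A = {"O-A-D": 4.24, "O-B-D": 7.59, "O-C-E-D": 6.17}
GAME_3B = {"O-A-D": 0.91, "O-B-D": 1.97, "O-C-E-D": 1.73, "O-C-A-D": 5.11, "O-B-E-D": 8.28}
EQ_SPLIT_3B = {"O-C-A-D": 7, "O-B-E-D": 11}     # pure-strategy equilibrium of Game 3B

def choice_proportions(means):
    """Mean number of players on each route as a share of the group."""
    return {r: m / N for r, m in means.items()}

def nonequilibrium_share(means):
    """Share of all choices going to routes outside the Game 3B equilibrium."""
    return sum(m for r, m in means.items() if r not in EQ_SPLIT_3B) / N

def within_equilibrium_ratios(means):
    """Split of the equilibrium-route choices between the two equilibrium routes."""
    total = sum(means[r] for r in EQ_SPLIT_3B)    # 5.11 + 8.28 = 13.39
    return {r: means[r] / total for r in EQ_SPLIT_3B}

if __name__ == "__main__":
    print({r: round(p, 2) for r, p in choice_proportions(GAME_3A).items()})
    # {'O-A-D': 0.24, 'O-B-D': 0.42, 'O-C-E-D': 0.34}
    print(round(nonequilibrium_share(GAME_3B), 2))                     # 0.26
    print({r: round(x, 3) for r, x in within_equilibrium_ratios(GAME_3B).items()})
    # {'O-C-A-D': 0.382, 'O-B-E-D': 0.618}
    print({r: round(k / N, 3) for r, k in EQ_SPLIT_3B.items()})
    # {'O-C-A-D': 0.389, 'O-B-E-D': 0.611}
```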
Mean choices for route (O–C–A–D) show monotonic convergence to the equilibrium prediction, whereas choices of the other equilibrium route, (O–B–E–D), show upward movement until trial 50 or so. In each session, the three nonequilibrium routes were always chosen and displayed minor declining trends. Overall, the mean route choices of the two equilibrium routes in Game 3B did not converge to the equilibrium prediction in 60 trials, although they steadily moved in this direction.

Switches

Consider first a global definition of a switch as a change of route from trial t to trial t + 1. Because, as reported above, we observed no difference between sessions in the frequency of switches for Games 3A and 3B, the results are averaged across the three sessions for each game separately. Figure 5 displays the running mean (in steps of 5) of the number of switches for Games 3A and 3B. For both games, there is a clear downward trend in the mean number of switches. The mean number of switches in Game 3A declines from about 12.1 in the first five trials or so to about 8.8 in the last trial. The corresponding values for Game 3B are 13.3 and 7.6. A player who adheres to mixed-strategy equilibrium play in Game 3A should switch her route from one trial to another with probability
Fig. 3. Mean number of players choosing each route in Game 3A by session.
Fig. 4. Mean number of players choosing each route in Game 3B by session.
that depends on the route from which she is switching: a player on route k switches with probability $1 - p_k$. This would result, on average, in $18 \sum_k p_k (1 - p_k) \approx 11.4$ players switching their routes on any given trial. Our results show that, on average, players switched their routes in Game 3A, but not as frequently as predicted by mixed-strategy equilibrium play. This supports our decision not to test for mixed-strategy equilibrium play in Game 3B.
Fig. 5. Running mean number of switches by game.
Is it beneficial to switch routes from trial to trial? We answered this question by performing two separate analyses: the first relates the individual frequency of switches to the individual payoff across all trials, and the second compares the payoff after switching with the payoff from not switching. In the first analysis, we counted for each player the number of switches across the session (min = 0, max = 59) in Game 3A and correlated it with the player’s payoff for the session. We repeated the same analysis for Game 3B. Both correlations were negative and highly significant: −0.56 and −0.65 for Games 3A and 3B, respectively. The analysis just reported does not consider the possibility that the payoff following a switch may be lower than the one preceding the switch but nevertheless higher than it would have been had the switch not occurred. To investigate this possibility, whenever we recorded a switch we computed the (counterfactual) payoff that the player would have earned without switching, assuming that the remaining players in the group would have played as they actually did. We then computed the difference between the actual payoff following a switch and the counterfactual payoff, and averaged it first across all switches for a given player and then across all 18 players in each session. If switching is detrimental, then the means of these difference scores should be negative. For Game 3A, these means were −1.23, −6.50, and −3.21 for Sessions 1, 2, and 3, respectively. The corresponding
means for Game 3B were 0.46, −7.96, and −3.48. Taken together, the results of both analyses show that, on average, switching is detrimental. If all players randomized their route choices in the same way, as in the symmetric mixed-strategy equilibrium, then the individual frequencies of switches in Games 3A and 3B should be uncorrelated. To test this implication, we computed the correlation between the individual numbers of switches in Games 3A and 3B (n = 54). The correlation was positive and quite high: r = 0.64, p < 0.01. We conclude from this statistic that the propensity to switch is not strongly affected by the topology of the network. Individuals vary from one another in their propensity to switch; those who switch more often in one network are likely to switch more often in the other. These results clearly refute the hypothesis of randomization by all players with the same probabilities.

Next, we refine the definition of a switch from trial t to trial t + 1 by directly conditioning it on route congestion on trial t. We ask: are there differences in the switching patterns between players who chose the least congested route on trial t and players who did not? It is possible that a player in either game will show a propensity to switch, conditional on having chosen the least congested route on the previous trial. Selten et al. [26] call such play “contrarian.” Such switching patterns suggest a forward-looking approach: a player might wish to deviate from the least congested route on trial t to a different route on trial t + 1 because she expects more players to choose that route on trial t + 1. If this happens, then the player may avoid a possible reduction in payoff by switching to another route on trial t + 1. A detailed analysis of the individual switches provides no evidence of players who chose contrarian play on the majority of the trials.
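The counterfactual comparison described above can be made concrete for Game 3A, whose three routes share no edges, so each route’s cost depends only on the number of players on it (76 + 12m, 34 + 15m, and 68 + 7m, as in the sums of Section 4.1). The Python sketch below is a minimal illustration, not the authors’ analysis code: it assumes the actual route counts on trial t + 1 are known, puts the switcher back on her old route while holding everyone else fixed, and uses hypothetical profiles rather than experimental data. All names are illustrative.

```python
from statistics import mean

ENDOWMENT = 184   # E, points per trial
# Game 3A routes are edge-disjoint: cost with m players on the route = fixed + slope * m.
ROUTE_COST_3A = {"O-A-D": (76, 12), "O-C-E-D": (34, 15), "O-B-D": (68, 7)}

def payoff(route: str, m: int) -> int:
    fixed, slope = ROUTE_COST_3A[route]
    return ENDOWMENT - (fixed + slope * m)

def switch_gain(counts_next: dict, old: str, new: str) -> int:
    """Actual payoff on trial t+1 after switching old -> new, minus the counterfactual
    payoff from staying on `old` while all other players choose as they actually did.
    counts_next: actual players per route on trial t+1 (switcher counted on `new`)."""
    actual = payoff(new, counts_next[new])
    counterfactual = payoff(old, counts_next[old] + 1)   # switcher put back on `old`
    return actual - counterfactual

def player_mean_gain(switches) -> float:
    """Average gain over one player's switches; negative means switching hurt on average."""
    return mean(switch_gain(*sw) for sw in switches)

if __name__ == "__main__":
    # Hypothetical trial t+1 profiles (not experimental data).
    after_a = {"O-A-D": 4, "O-C-E-D": 6, "O-B-D": 8}
    after_b = {"O-A-D": 5, "O-C-E-D": 6, "O-B-D": 7}
    print(switch_gain(after_a, "O-A-D", "O-B-D"))   # 12
    print(switch_gain(after_b, "O-B-D", "O-A-D"))   # -12
    print(player_mean_gain([(after_a, "O-A-D", "O-B-D"), (after_b, "O-B-D", "O-A-D")]))
```

Averaging `player_mean_gain` over the 18 players of a session then gives a session-level difference score of the kind reported above. In Game 3B the same idea applies, but costs must be recomputed from edge flows because routes share edges.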
Amnon Rapoport et al.
Consider first Game 3A, and denote the payoffs on trial t of players choosing routes (O–A–D), (O–B–D), and (O–C–E–D) by P_t(O–A–D), P_t(O–B–D), and P_t(O–C–E–D), respectively. Consider only pairs of trials t and t + 1 in which player i switches from route x on trial t to route y on trial t + 1, where x, y ∈ {(O–A–D), (O–B–D), (O–C–E–D)} and x ≠ y. Call a switch "myopic" if P_t(x) < P_t(y), "contrarian" if P_t(x) > P_t(y), and "inconsequential" if P_t(x) = P_t(y). Then, for each player i compute a vector s_i = (s_i(m), s_i(c), s_i(i)), whose components count the respective frequencies, and denote the total number of switches of player i by S_i (0 ≤ S_i ≤ 59). Finally, call a player a "myopic switcher" if s_i(m)/S_i ≥ k, a "contrarian switcher" if s_i(c)/S_i ≥ k, and an "inconsequential switcher" if s_i(i)/S_i ≥ k, for some 0 < k ≤ 1. Setting k = 2/3 arbitrarily, and considering only players with at least 11 switches (out of a maximum of 59), we classified 18 players as myopic switchers, 0 players as contrarian switchers, and 34 players as inconsequential switchers. Two of the 54 players could not be classified (S_i < 11). Although 33.2 percent of the switches were from a route with a higher payoff (less congested) to a route with a lower payoff (more congested), we find no evidence for players systematically pursuing a contrarian strategy in switching their routes.
We repeated the same analysis for Game 3B, which has five routes rather than three. The results were quite similar: 24, 1, and 24 myopic, contrarian, and inconsequential switchers, respectively, and 7 players unclassified. Once again, contrarian play was observed on 33.5 percent of the trials, but only a single player pursued a contrarian strategy on at least two-thirds (k = 2/3) of his or her switches. We conclude from this analysis that the majority of the players were either myopic or inconsequential switchers, who nevertheless made contrarian switches on a substantial percentage of the trials.
Individual Differences
In this section, we focus on individual differences in route choice and number of switches. Turning first to route choice, Table 2 shows that the three routes (O–A–D), (O–B–D), and (O–C–E–D) in Game 3B were chosen, on average, by 4.61 of the 18 players. However, Table 2 does not address the question of whether some players chose these "old" routes considerably more often than others. Figure 6 answers this question by exhibiting the frequency distribution of old route choices in Game 3B. It shows that only one player never chose any of the three nonequilibrium routes across all 60 trials, and that a total of 16 players chose one or more old routes on no more than 5 trials (mostly at the beginning of Part II). Most of the nonequilibrium route choices were contributed by 17 of the 54 players (31.5 percent), each of whom chose an old route on 21 or more trials. Thus, whatever conclusions we may draw with regard to the realization of the BP in the population may apply to most individual players (about 68.5 percent). We can only speculate about the reasons for the individual differences in Figure 6. They may indicate individual differences in the rate of recognizing the pure-strategy equilibrium routes in Game 3B and switching from the equilibrium routes in Game 3A to the new equilibrium routes in Game 3B.
Under this interpretation, with more experience all players would eventually choose one of the two equilibrium routes. Alternatively, the differences may reflect only partly successful signaling by more sophisticated players who, perceiving the payoff implications of equilibrium play in Game 3B (a major drop in earnings), persist in choosing old routes to prevent a sizeable decline in their earnings. Turning next to individual differences in the frequency of switches, Figure 7 displays the frequency distributions of the individual number of switches for Game 3A (upper panel) and Game 3B (lower panel). Once again, in both the basic and augmented games we observe individual differences that cover almost the entire range from 0 to 59. This finding, too, provides strong evidence against uniform randomization. We cannot tell whether the propensity to switch reflects exploration, confusion, an attempt to influence group behavior in order to exploit it later, boredom with the task, or some combination of these factors.
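The classification of switches used in the switching analysis above can be written out as a small Python sketch. For each switch it takes the pair of trial-t payoffs (P_t(x), P_t(y)) of the abandoned and the chosen route, and treats a move toward the route that paid more on trial t as myopic and a move away from it as contrarian, consistent with the description of contrarian play as leaving the least congested route. The threshold k = 2/3 and the minimum of 11 switches follow the text; the function names are our own.

```python
from collections import Counter

def classify_switch(payoff_prev_route, payoff_new_route):
    """Label one switch by comparing the two routes' payoffs on trial t."""
    if payoff_prev_route < payoff_new_route:
        return "myopic"          # moved toward the route that paid more
    if payoff_prev_route > payoff_new_route:
        return "contrarian"      # moved away from the route that paid more
    return "inconsequential"

def classify_player(switches, k=2/3, min_switches=11):
    """Classify a player from a list of (P_t(x), P_t(y)) pairs, one per switch.

    Returns None if the player has fewer than min_switches switches or if no
    label accounts for at least a fraction k of them.
    """
    if len(switches) < min_switches:
        return None
    counts = Counter(classify_switch(px, py) for px, py in switches)
    total = len(switches)
    for label in ("myopic", "contrarian", "inconsequential"):
        if counts[label] / total >= k:
            return label
    return None
```

Because k = 2/3 exceeds 1/2, at most one label can reach the threshold, so the order in which labels are checked does not matter; with k ≤ 1/2 a player could qualify for two labels and a tie-breaking rule would be needed.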
Fig. 6. Frequency (number of players) of nonequilibrium route choices in Game 3B.
Comparison with Experiment 2 of RKDG
At the end of Section 3 we stated the hypothesis that support for the BP declines as the network becomes more complex. As argued earlier, the equilibrium solutions for Games 3A and 3B, unlike those for Games 2A and 2B, no longer place equal proportions of players on the equilibrium routes, rendering coordination considerably more difficult. Our results flatly reject this hypothesis. In both the present experiment and Experiment 2 of RKDG, the equilibrium solutions for the mean route choice frequencies in the basic game were fully supported. On the last trial of the session, the two equilibrium routes in Game 3B were chosen jointly on 74 percent of the trials, compared to 78 percent in Game 2B. In both experiments, the relative frequency of choosing either of the two equilibrium routes in the augmented game increased with experience. And in both studies, the mean number of switches in the basic and augmented games decreased with experience at roughly the same rate, and the frequency distributions of the individual number of switches assumed a similar form. The effects of the BP thus seem more robust than we originally anticipated, and may generalize to networks with a richer topology and more general cost structures than the ones investigated in the previous and present studies.
Fig. 7. Frequency distribution of number of individual switches in Games 3A and 3B.
5 Conclusions and Discussion
The aim of this chapter was to extend the experimental research on the Braess paradox and determine whether its occurrence is limited to simple symmetric networks or whether it can also be manifested in more complex, asymmetric settings. Although the network in this experiment is still rather simple compared to real-life traffic or communication networks, and more research is needed on more complex networks, we may conclude on the basis of this study that the BP is rather robust. The equilibrium solution in pure strategies accounts surprisingly well for the aggregate route frequencies in the basic games in Experiments 1 and 2 of RKDG. Asymmetry in edge costs does not seem to affect this result. We find no support for pure-strategy equilibrium play on the individual level, as fluctuations around the means in Games 1A, 2A, and 3A persist over time and do not seem to diminish with experience. Selten et al. [26], who studied route choice in a two-terminal traffic network similar to the network in Figure 2a, also reported considerable fluctuations around the mean route frequencies over 200 trials. They proposed the multiplicity of the pure-strategy equilibria as one of the reasons for the nonconvergence and persistence of fluctuations. Route choice frequencies in the augmented Game 3B move with experience in the direction of the Pareto-deficient equilibrium. Convergence to the predicted (7, 11) split on the two equilibrium routes is not reached in 60 trials, possibly because a minority of the players require more experience with the game to achieve equilibrium. However, learning does take place (Figure 4), and the majority of the players learn to avoid nonequilibrium routes within roughly 20 trials (Figure 6). Our results suggest that pure-strategy equilibrium may be approached, but most likely not reached, in real-life traffic networks in which the number of drivers is neither fixed nor commonly known, or in which information about the route choice frequencies for the entire population is not provided. This is a topic for further experimental investigation.
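Whether a given split is a pure-strategy equilibrium can be checked by testing every unilateral deviation. The Python sketch below assumes that Games 3A and 3B use the cost functions given to participants in the appendix, with segments written as S, A, B, C, D, T. The (4, 8, 6) split for Game 3A and the assignment of the (7, 11) split to [S–C–A–T] and [S–B–D–T] are our own calculations from those cost functions, not figures reported in this section.

```python
# Segment costs a + b * load (load = drivers using the segment), from the appendix.
SEGMENTS = {
    "SA": (76, 0), "AT": (0, 12), "SB": (0, 7), "BT": (68, 0),
    "SC": (11, 7), "CD": (22, 2), "DT": (1, 6),
    "CA": (0, 0), "BD": (0, 0),  # zero-cost links added in Game 3B
}
ROUTES = {
    "SAT": ["SA", "AT"], "SBT": ["SB", "BT"], "SCDT": ["SC", "CD", "DT"],
    "SCAT": ["SC", "CA", "AT"], "SBDT": ["SB", "BD", "DT"],
}
GAME_3A = ["SAT", "SBT", "SCDT"]
GAME_3B = GAME_3A + ["SCAT", "SBDT"]

def route_costs(profile, available):
    """Cost of each available route, given profile = {route: number of drivers}."""
    load = dict.fromkeys(SEGMENTS, 0)
    for route, n in profile.items():
        for seg in ROUTES[route]:
            load[seg] += n
    return {r: sum(SEGMENTS[s][0] + SEGMENTS[s][1] * load[s] for s in ROUTES[r])
            for r in available}

def is_pure_equilibrium(profile, available):
    """True if no single driver can strictly lower his or her cost by switching."""
    current = route_costs(profile, available)
    for route, n in profile.items():
        if n == 0:
            continue
        for alt in available:
            if alt == route:
                continue
            moved = dict(profile)
            moved[route] -= 1
            moved[alt] = moved.get(alt, 0) + 1
            if route_costs(moved, available)[alt] < current[route]:
                return False
    return True
```

Under these assumptions, every driver pays 124 in the Game 3A equilibrium (payoff 60) but 144 in the Game 3B equilibrium (payoff 40), illustrating the Braess paradox: adding the two zero-cost links lowers every driver's equilibrium payoff. The Game 3A split is no longer an equilibrium in Game 3B, because a driver on [S–A–T] could switch to [S–B–D–T] and pay 106 instead of 124.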
Acknowledgments
We gratefully acknowledge financial support under contract F49620-03-1-0377 from the AFOSR/MURI to the University of Arizona. We also thank Ryan O. Murphy and Maya Rosenblatt for their assistance in data collection.
Appendix: Instructions for Traffic Network Experiment
Introduction
Welcome to an experiment on route selection in traffic networks. During this experiment you will be asked to make a large number of decisions, and so will the other participants. Your decisions, as well as the decisions of the other participants, will determine your monetary payoff according to rules that will be explained shortly. Please read the instructions below carefully. If you have any questions, please raise your hand and one of the experimenters will come to assist you. Note that from now on communication between the participants is prohibited. If participants communicate with each other in any shape or form, the experiment will be terminated.
The Traffic Network Task
The experiment is fully computerized. You will make your decisions by clicking on the appropriate buttons. A total of 18 persons participate in this experiment (i.e., 17 subjects in addition to you). During the experiment, you will serve as drivers who are asked to choose a route to travel in two traffic networks that are described below. The two networks differ from one another.
You will first receive the instructions for part 1 (the first network). After completing part 1, you will receive new instructions for part 2. You will participate in 60 identical rounds in each part.
Description of Part I
Consider the very simple traffic network exhibited in diagram form on the next page. Each driver is required to choose one of three routes in order to travel from the starting point, denoted by S, to the final destination, denoted by T. The three alternative routes are denoted in the diagram by [S–A–T], [S–B–T], and [S–C–D–T].
Travel is always costly in terms of the time needed to complete a segment of the road, tolls, etc. The travel costs are indicated near each segment of the route you may choose. For example, if you choose route [S–A–T], you will be charged a total cost of 76 + 12X, which consists of a fixed cost of 76 for traveling on segment [S–A] plus 12X for traveling on segment [A–T]. In the expression above, X indicates the number of participants who choose segment [A–T]. Similarly, if you choose route [S–B–T], you will be charged a total travel cost of 7Y + 68, where Y indicates the number of participants who choose segment [S–B]. If you choose route [S–C–D–T], you will be charged a total travel cost of (11 + 7W) + (22 + 2V) + (1 + 6Z), where W, V, and Z indicate the number of participants who choose segments [S–C], [C–D], and [D–T], respectively. Note that the costs charged for segments [S–A] and [B–T] do not depend on the number of drivers choosing them. In contrast, the costs charged for traveling on segments [A–T], [S–B], [S–C], [C–D], and [D–T] increase as the number of drivers choosing them increases. All 18 drivers make their route choices independently of one another, leave point S together, and travel at the same time.
Example
Suppose that you are the only driver who chooses route [S–A–T], 9 drivers choose route [S–B–T], and 8 drivers choose route [S–C–D–T]. Then, your travel cost from point S to point T will be equal to [76 + (12 × 1)] = 88. If, on another round, you and 2 more drivers choose route [S–C–D–T], 10 other drivers choose route [S–B–T], and 5 drivers choose route [S–A–T], then your travel cost for that round will be [(11 + (7 × 3)) + (22 + (2 × 3)) + (1 + (6 × 3))] = 32 + 28 + 19 = 79. At the beginning of each round, you will receive an endowment of 184 points. Your payoff for each round will be determined by subtracting your travel cost for the round from your endowment. To continue the previous example, if your travel cost for the round is 88, your payoff will be 184 − 88 = 96 points. If your travel cost is 79, then your payoff for that round will be 184 − 79 = 105 points. If too many drivers choose the same route, their travel cost may exceed the endowment, in which case their payoff will be negative. For example, if 10 drivers choose route [S–A–T], then their travel cost will be 76 + (12 × 10) = 76 + 120 = 196. Therefore, their payoff for the round will be 184 − 196 = −12. At the end of each round you will be informed of the number of drivers who chose each route and of your payoff for that round. All 60 rounds in part 1 have exactly the same structure.
Procedure
At the beginning of each round the computer will display a diagram with three routes: [S–A–T], [S–B–T], and [S–C–D–T]. You will then be asked to choose which of the three routes you wish to travel. To choose a route, simply click on all the segments of that route. For example, if you want to travel on route [S–C–D–T], then click once on segment [S–C], once on segment [C–D], and once on segment [D–T]. The color of the segments you click on will change to indicate your choice. If you decide to change your route, click on all the segments of the other route. Once you have chosen your route, press the "Confirm" button. You will be asked to verify your choice. After all 18 participants confirm their decisions, you will receive the following information:
• The route you have chosen.
• The number of drivers choosing route [S–A–T] and their payoff.
• The number of drivers choosing route [S–B–T] and their payoff.
• The number of drivers choosing route [S–C–D–T] and their payoff.
• Your payoff for this round.
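The part 1 cost rules and the worked examples above can be reproduced with a few lines of Python. This is an illustrative sketch: the function names and route strings are ours, and the only inputs are the three route counts, which sum to 18.

```python
ENDOWMENT = 184  # points received at the start of every round

def part1_cost(route, n_sat, n_sbt, n_scdt):
    """Travel cost in part 1 given how many drivers chose each route."""
    if route == "S-A-T":
        return 76 + 12 * n_sat              # X = drivers on [A-T]
    if route == "S-B-T":
        return 7 * n_sbt + 68               # Y = drivers on [S-B]
    if route == "S-C-D-T":
        w = v = z = n_scdt                  # only this route uses [S-C], [C-D], [D-T]
        return (11 + 7 * w) + (22 + 2 * v) + (1 + 6 * z)
    raise ValueError(f"unknown route: {route}")

def payoff(cost):
    """Round payoff: endowment minus travel cost (negative if cost > 184)."""
    return ENDOWMENT - cost
```

For example, part1_cost("S-A-T", 1, 9, 8) gives 88 and a payoff of 96 points, and part1_cost("S-C-D-T", 5, 10, 3) gives 79 and a payoff of 105 points, matching the two examples.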
After completing all 60 rounds of part 1, you will receive a new set of instructions for part 2.
Payments
At the end of the experiment, you will be paid for 4 rounds that will be randomly selected from the 60 rounds of part 1. The payment rounds will be selected publicly by drawing 4 cards from a pack of 60 cards. You will be paid in cash for your earnings in these 4 rounds according to a conversion rate of 25 points = $1. In addition, you will receive a show-up fee of $5 (for attending the experiment). This amount will be paid independently of the payments for the randomly selected rounds. Please place the instructions on the table in front of you to indicate that you have completed reading them. Part 1 will begin shortly.
Part II
Part 2 is identical to part 1 except for two segments that were added to the network: one from point C to point A and another from point B to point D. Both segments have zero travel costs. As a result, in choosing a travel route in this new traffic network, you will have to choose among five routes, namely, routes [S–A–T], [S–B–T], [S–C–D–T], [S–C–A–T], and [S–B–D–T]. As in part 1, you will have to choose a single travel route. The traffic network for part 2 is displayed below.
Travel costs are computed exactly as in part 1. If you choose route [S–A–T], you will be charged a total travel cost of (76 + 12X), where X indicates the number of drivers who chose segment [A–T] to travel from S to T via route [S–A–T] or via route [S–C–A–T]. Similarly, if you choose route [S–B–T], you will be charged a total travel cost of (7Y + 68), where Y indicates the number of drivers who chose segment [S–B] to travel from S to T via route [S–B–T] or via route [S–B–D–T]. If you choose route [S–C–D–T], you will be charged a total travel cost of [(11 + 7W) + (22 + 2V) + (1 + 6Z)], where W indicates the number of drivers who chose segment [S–C] to travel from S to T via route [S–C–D–T] or via route [S–C–A–T], V indicates the number of drivers who chose segment [C–D] to travel from S to T via route [S–C–D–T], and Z indicates the number of drivers who chose segment [D–T] to travel from S to T via route [S–C–D–T] or via route [S–B–D–T]. For another example, if you choose route [S–C–A–T], you will be charged a total travel cost of [(11 + 7W) + 0 + 12X], where W indicates the number of drivers who chose segment [S–C] to travel from S to T via route [S–C–D–T] or via route [S–C–A–T], and X indicates the number of drivers who chose segment [A–T] to travel from S to T via route [S–A–T] or via route [S–C–A–T]. Finally, if you choose route [S–B–D–T], you will be charged a total travel cost of [7Y + 0 + (1 + 6Z)], where Z indicates the number of drivers choosing segment [D–T] to travel from S to T via route [S–C–D–T] or via route [S–B–D–T], and Y indicates the number of drivers choosing segment [S–B] to travel from S to T via route [S–B–T] or via route [S–B–D–T]. Note that, unlike part 1, in part 2 drivers who choose route [S–A–T] or route [S–C–A–T] share segment [A–T]. Similarly, drivers who choose route [S–B–T] or route [S–B–D–T] share segment [S–B]. Lastly, drivers who choose route [S–C–D–T] or [S–C–A–T] share segment [S–C], and drivers who choose route [S–C–D–T] or [S–B–D–T] share segment [D–T].
Example
Suppose that:
• you choose route [S–C–D–T],
• 3 other drivers choose route [S–A–T],
• 3 other drivers choose route [S–B–T],
• 5 other drivers choose route [S–C–A–T], and
• 6 drivers choose route [S–B–D–T].
Then, your total travel cost for that round is equal to [(11 + (7 × 6)) + (22 + (2 × 1)) + (1 + (6 × 7))] = 53 + 24 + 43 = 120. Note that in this example, 6 drivers (including you) traveled on segment [S–C] and 7 drivers (again, including you) traveled on segment [D–T]. Also note that 8 drivers traveled on segment [A–T], namely, 3 drivers who chose route [S–A–T] and 5 drivers who chose route [S–C–A–T]. Similarly, a total of 9 drivers traveled on segment [S–B], namely, 3 drivers who chose route [S–B–T] and 6 drivers who chose route [S–B–D–T]. Each of the 3 drivers choosing route [S–A–T] will be charged a travel cost of [76 + (12 × 8)] = 172, each of the 3 drivers choosing route [S–B–T] will be charged a travel cost of [(7 × 9) + 68] = 131, each of the 5 drivers choosing route [S–C–A–T] will be charged a travel cost of [(11 + (7 × 6)) + 0 + (12 × 8)] = 149, and each of the 6 drivers choosing route [S–B–D–T] will be charged a travel cost of [(7 × 9) + 0 + (1 + (6 × 7))] = 106. Exactly as in part 1, at the beginning of each round you will receive an endowment of 184 points. Your payoff for each round will be determined by subtracting your total travel cost from your endowment for that round. The information you receive at the end of each round will be the same as in part 1. In particular, at the end of each round the computer will display:
• The route you have chosen.
• The number of drivers choosing route [S–A–T].
• The number of drivers choosing route [S–B–T].
• The number of drivers choosing route [S–C–D–T].
• The number of drivers choosing route [S–C–A–T].
• The number of drivers choosing route [S–B–D–T].
• Your payoff for that round.
• The payoffs for drivers choosing each of the five routes.
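The segment-sharing rules of part 2 can be checked mechanically. In the Python sketch below, which is illustrative (the function name and dictionary layout are ours), each segment load is the sum of the counts of the routes that use that segment, and each route cost applies the formula given in the instructions. For the example round it returns 120 for [S–C–D–T], 172 for [S–A–T], 131 for [S–B–T], 106 for [S–B–D–T], and 149 for [S–C–A–T], the last following from (11 + 7W) + 0 + 12X with W = 6 and X = 8.

```python
def part2_costs(n):
    """Per-driver travel cost of each route in part 2.

    n maps each route name to the number of drivers choosing it.
    Segment loads follow the sharing rules stated in the instructions.
    """
    X = n["S-A-T"] + n["S-C-A-T"]                  # drivers on [A-T]
    Y = n["S-B-T"] + n["S-B-D-T"]                  # drivers on [S-B]
    W = n["S-C-D-T"] + n["S-C-A-T"]                # drivers on [S-C]
    V = n["S-C-D-T"]                               # drivers on [C-D]
    Z = n["S-C-D-T"] + n["S-B-D-T"]                # drivers on [D-T]
    return {
        "S-A-T": 76 + 12 * X,
        "S-B-T": 7 * Y + 68,
        "S-C-D-T": (11 + 7 * W) + (22 + 2 * V) + (1 + 6 * Z),
        "S-C-A-T": (11 + 7 * W) + 0 + 12 * X,      # [C-A] costs nothing
        "S-B-D-T": 7 * Y + 0 + (1 + 6 * Z),        # [B-D] costs nothing
    }

# The example round: you on [S-C-D-T] plus the other 17 drivers.
example = {"S-A-T": 3, "S-B-T": 3, "S-C-D-T": 1, "S-C-A-T": 5, "S-B-D-T": 6}
costs = part2_costs(example)
```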
Payoffs will be determined exactly as in part 1 (4 payment rounds randomly drawn out of 60), paid according to an identical exchange rate: 25 points = $1. Therefore, across parts 1 and 2, you will be paid according to your earnings in 8 randomly chosen rounds. Thank you for your participation.
References
1. R. Arnott and K. Small. The economics of traffic congestion. American Scientist, 82:446–455, 1994.
2. R. J. Aumann. Irrationality in game theory. In P. Dasgupta, D. Gale, and E. Maskin, editors, Economic Analysis of Markets and Games, pages 214–227. The MIT Press, Cambridge, MA, 1992.
3. R. J. Aumann. Backward induction and common knowledge of rationality. Games and Economic Behavior, 8:6–19, 1995.
4. R. J. Aumann. On the centipede game. Games and Economic Behavior, 23:97–105, 1998.
5. G. Bornstein, T. Kugler, and A. Ziegelmeyer. Individual and group decisions in the centipede game: Are groups more rational players? Journal of Experimental Social Psychology, 40:599–605, 2004.
6. D. Braess. Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung, 12:258–268, 1968.
7. J. E. Cohen and F. P. Kelly. A paradox of congestion in a queueing network. Journal of Applied Probability, 27:730–734, 1990.
8. A. Colman. Game Theory and Its Applications in the Social and Biological Sciences. Butterworth-Heinemann, Oxford, UK, 1995.
9. S. C. Dafermos and A. Nagurney. On some traffic equilibrium theory paradoxes. Transportation Research, Series B, 18:101–110, 1984.
10. A. Dixit and S. Skeath. Games of Strategy. Norton, New York, second edition, 2004.
11. M. Fey, R. D. McKelvey, and T. R. Palfrey. An experimental study of constant-sum centipede games. International Journal of Game Theory, 25:269–287, 1996.
12. C. Fisk. More paradoxes in the equilibrium assignment problem. Transportation Research, Series B, 13:305–309, 1979.
13. M. Frank. The Braess paradox. Mathematical Programming, 20:283–302, 1981.
14. R. Holzman and N. Law-yone. Network structure and strong equilibrium in route selection games. Mathematical Social Sciences, 46:193–205, 2003.
15. R. D. Luce and H. Raiffa. Games and Decisions. John Wiley & Sons, New York, 1957.
16. R. D. McKelvey and T. R. Palfrey. An experimental study of the centipede game. Econometrica, 60:803–836, 1992.
17. R. Nagel and F. F. Tang. Experimental results on the centipede game in normal form: An investigation on learning. Journal of Mathematical Psychology, 42:356–384, 1998.
18. E. I. Pas and S. L. Principio. Braess' paradox: Some new insight. Transportation Research, Series B, 31:265–276, 1997.
19. C. M. Penchina. Braess paradox: Maximum penalty in a minimal critical network. Transportation Research, Series A, 31:379–388, 1997.
20. A. Rapoport. Games: Centipede. In L. Nadel, editor, Encyclopedia of Cognitive Science, pages 196–203. Macmillan, London, 2003.
21. A. Rapoport, T. Kugler, S. Dugar, and E. Gisches. Choice of routes in congested traffic networks: Experimental tests of the Braess paradox. Unpublished manuscript, Department of Management and Organizations, University of Arizona, Tucson, 2005.
22. A. Rapoport, W. E. Stein, J. E. Parco, and T. E. Nicholas. Equilibrium play and adaptive learning in a three-person centipede game. Games and Economic Behavior, 43:239–265, 2003.
23. R. W. Rosenthal. Games of perfect information, predatory pricing and the chain-store paradox. Journal of Economic Theory, 25:92–100, 1981.
24. T. Roughgarden. Selfish Routing and the Price of Anarchy. The MIT Press, Cambridge, MA, 2005.
25. T. Roughgarden and E. Tardos. How bad is selfish routing? Journal of the ACM, 49:236–259, 2002.
26. R. Selten, M. Schreckenberg, T. Chmura, T. Pitz, S. Kube, S. F. Hafstein, R. Chrobok, A. Pottmeier, and J. Wahle. Experimental investigation of day-to-day route-choice behavior and network simulations of Autobahn traffic in North Rhine-Westphalia. In M. Schreckenberg and R. Selten, editors, Human Behavior and Traffic Networks, pages 1–21. Springer, Berlin, 2004.
27. R. Steinberg and W. I. Zangwill. On the prevalence of the Braess's paradox. Transportation Science, 17:301–318, 1983.
Non-Euclidean Traveling Salesman Problem
John Saalweachter and Zygmunt Pizlo
Department of Psychological Sciences, Purdue University, West Lafayette, IN 47907-1364
Summary. The traveling salesman problem (TSP) is usually studied on a Euclidean plane. When obstacles are placed on the plane, the distances are no longer Euclidean, but they still satisfy the metric axioms. Three experiments are reported in which subjects were tested on the TSP and on the shortest-path problem with obstacles. When the obstacles were simple, and they did not change the global structure of the problem, the subjects were able to produce near-optimal solutions, but the complexity of the mental mechanisms was higher than in the case of the Euclidean TSP. When obstacles were complex and changed the problem’s global structure, the solutions were no longer near-optimal. Several computational models are proposed that can account for the psychophysical results.
1 Introduction
Scientific research on human thinking and problem solving started around the time of the gestalt revolution [18]. Gestalt psychologists emphasized the role of organizing principles in both perception and thinking. In perception, the organizing principles took the form of a simplicity principle determining figure-ground organization and perceived shape. In problem solving, the organizing principles led to the concept of insight. In both perception and thinking, experimental data came from introspection and verbal protocols. This kind of qualitative information seemed sufficient to demonstrate the operation of the organizing principles. However, it was not sufficient to study the nature of the underlying mental mechanisms. In order to learn what information is used and how it is analyzed, one has to perform parametric studies, in which behavioral responses (accuracy and response times) are measured while the nature of the stimulus is systematically varied. Parametric studies were nothing new in perception: the concept of threshold had been known for about a century before the gestalt revolution. However, parametric studies were, and still are, new in thinking and problem solving. Problems, unlike physical objects, are not represented by continuous variables. For example, the size of an object can be manipulated with arbitrary precision, and one can produce many objects that are identical except for size. However, physics and math problems are best represented by graphs, and one problem cannot be changed into another in arbitrarily small steps. This makes parametric studies difficult. A given subject is usually tested with only one instance of a given physics or math problem. But performance (accuracy and response time) obtained from one instance is insufficient to infer the underlying mental mechanisms. Introspection and verbal protocols seemed the only way to go.
This state of affairs started to change a decade ago, when the interest of cognitive psychologists shifted to optimization problems, such as the traveling salesman problem (TSP). TSP is defined as follows: given a set of points (called cities), find the tour of the points with the shortest length. A tour is a closed path that passes through each point exactly once and returns to the starting point. The number of distinct tours in a problem with N points is (N − 1)!/2. Clearly, finding a shortest tour is an optimization problem. This problem is difficult because the number of tours is large, even for moderate values of N. Optimization problems, such as TSP, naturally lend themselves to parametric studies. There are a large number of instances of TSP, the instances can be systematically varied, and performance can be measured quantitatively by response time and accuracy. It is worth noting that optimization problems are ubiquitous in cognition. Minimizing a cost function has been a standard way to describe perception of objects [6,12], figure-ground organization [7,14], decision making and games [17,20], motor control [3], categorization [10], formulating scientific theories [9,11], as well as human communication [16]. The fact that the human mind optimizes is not surprising considering that the mind is a result of a long evolutionary process, in which the best adaptation was achieved when the best solutions to everyday life problems were provided.
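To make the (N − 1)!/2 count concrete, the Python sketch below enumerates every distinct tour of a small instance by fixing the first city and discarding mirror-image orders. It is illustrative only; exhaustive search is feasible just for very small N (there are already 181,440 distinct tours at N = 10).

```python
from itertools import permutations
from math import dist, factorial

def tour_length(points, order):
    """Length of the closed tour visiting points in `order` and returning home."""
    return sum(dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def brute_force_tsp(points):
    """Exhaustive search over the (N - 1)!/2 distinct tours (tiny N only)."""
    best_order, best_len = None, float("inf")
    # Fixing city 0 as the start removes rotations; keeping only orders whose
    # first city has a smaller index than the last removes mirror images.
    for rest in permutations(range(1, len(points))):
        if rest and rest[0] > rest[-1]:
            continue
        order = (0,) + rest
        length = tour_length(points, order)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len

def number_of_tours(n):
    """Distinct tours of n >= 3 cities: (n - 1)!/2."""
    return factorial(n - 1) // 2
```

For the four corners of a unit square, brute_force_tsp returns the perimeter order (0, 1, 2, 3) with length 4.0 after examining number_of_tours(4) = 3 candidates.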
If optimization is the sine qua non of cognition, then cognition, including thinking and problem solving, should be studied in optimization tasks. Finding the shortest TSP tour is difficult because TSP is NP-hard. This means that no known algorithm is guaranteed to find the shortest tour in time polynomial in N: in the worst case, exact methods may require time that grows exponentially with N. Because of the computational intractability of TSP, there has been growing interest in designing algorithms that find tours close to the shortest tour in time that does not grow too fast with the problem size N. Reviews of such algorithms can be found in Lawler et al. [8] and Gutin and Punnen [2]. Can humans solve TSP well? The answer is in the affirmative. More specifically, humans can produce optimal or near-optimal solutions in time that grows linearly with N, as long as the problem is presented on a Euclidean plane. A review of recent results on human performance with Euclidean TSP (E-TSP) can be found in the first issue of the Journal of Problem Solving [19]. The next section presents our model of the mental mechanisms involved in solving E-TSP. The following two sections present psychophysical experiments and corresponding computational models of human performance in the shortest-path problem (SPP) and in E-TSP in the presence of obstacles (E-TSP-O). The chapter concludes with a summary and suggestions for future research.
2 A Pyramid Model for Euclidean-TSP (E-TSP)
Pyramid algorithms have been used extensively to model human visual perception [5,13,14]. These algorithms were a natural choice for modeling TSP mental mechanisms because TSP is presented to the subjects as a visual task. Our pyramid model for E-TSP works by first generating a multiresolution (pyramid) representation of the problem, and then constructing a tour using this representation and a sequence of top-down refinements. The first version of this model was presented in Graham et al. [1], and the second version in Pizlo et al. [15]. Both versions received support in experiments in which subjects solved E-TSP with 6–50 cities. The more recent version of the model is briefly described below.
2.1 Construction of the Pyramid Representation
A pyramid representation involves a set of "images" that carry more precise, local information on the lower layers of the pyramid and coarser, global information on the top layers. The base of the pyramid is the set of points in the problem. This information is exact, but also very local: there is no information about the relationships among the points in the problem; only the coordinates of each city are stored. In the layer above the base, some points are grouped together to form clusters, each of which is abstracted away as a single point representing the cluster's center of gravity. The exact positions of points are not present at this layer, but new information about the relationships among points has been gained. Near the apex of the pyramid, the problem may be reduced to a handful of clusters, each cluster representing dozens or even hundreds of points. Information about the general shape of the problem replaces information about the exact positions of points. The first version of the E-TSP algorithm used bottom-up clustering involving Gaussian blurring [1].
In that algorithm, the number of clusters on a given layer was not directly controlled and depended on the distribution of points. In the more recent version, the clustering is top-down and the number of clusters is controlled. Specifically, the nth layer from the apex has $2^{2n}$ (that is, $4^n$) clusters. The child–parent relations between these clusters (regions) on the different layers of the pyramid are fixed (see [15] for details).
2.2 Construction of the E-TSP Tour
Once the model has generated a pyramid representation of an E-TSP problem, it takes advantage of the representation to find a good (optimal or near-optimal) tour. Basically, the overall "shape" of a good TSP tour of the individual points should be an optimal tour of the clusters on the top layers of the pyramid representation. So, the model first finds the best tour of the clusters on the topmost layer of the pyramid, a trivial task because the topmost layer has a small number of clusters, and then "refines" this tour. In doing so, the model relies on the fact that, in a pyramid representation, each cluster on any given layer has child clusters on the layer immediately below it. A cluster in the tour is removed, and its children are inserted into the tour near its location. The exact position is determined by local search, specifically local cheapest insertion: for each child cluster, the model considers a small constant number of positions near the position of the parent node, finds the position that minimizes the length of the tour, and inserts the child cluster at that position. This process of top-down refinement continues until the tour consists of points on the bottom layer of the pyramid. The tour refinement process incorporates "foveating" and "eye movements." In the first version of the model [1], the top-down refinement was carried out one layer at a time: clusters in the tour that were on a given layer were replaced by their child clusters on the layer below, and once all of the clusters on a given layer had been refined, the refinement moved on to the next layer. With the foveating solution process, the top-down refinement focuses on a specific area of the tour. Refinement moves around the E-TSP instance, so that the model produces a complete, finished part of the tour in one region before moving on to the next (see [4] for an animation). This model was shown to provide an acceptable fit to the results of several subjects, as measured by the proportion of optimal solutions and the average solution error.
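The two stages described in Sections 2.1 and 2.2 can be sketched in Python as follows. This is not the authors' implementation (see [15] for that); it assumes that the 4^n regions at layer n come from a quadtree split of the bounding square, that local cheapest insertion considers up to three slots on either side of the parent's slot, and that always refining the first unrefined cluster along the tour is an acceptable stand-in for foveated refinement. All function and parameter names are ours.

```python
from itertools import permutations
from math import dist

def closed_length(coords):
    """Length of the closed polygon through coords in the given order."""
    return sum(dist(coords[i], coords[(i + 1) % len(coords)])
               for i in range(len(coords)))

def build_cells(points, depth):
    """Top-down pyramid: layer n splits the bounding square into 4**n cells.

    Returns {(n, ix, iy): [point indices]} for every non-empty cell, n = 1..depth.
    """
    xs, ys = zip(*points)
    x0, y0 = min(xs), min(ys)
    side = max(max(xs) - x0, max(ys) - y0) or 1.0
    cells = {}
    for n in range(1, depth + 1):
        k = 2 ** n
        for i, (x, y) in enumerate(points):
            ix = min(int((x - x0) / side * k), k - 1)
            iy = min(int((y - y0) / side * k), k - 1)
            cells.setdefault((n, ix, iy), []).append(i)
    return cells

def pyramid_tsp(points, depth=6, window=3):
    """Coarse-to-fine tour: solve the top layer exactly, then refine top-down."""
    cells = build_cells(points, depth)
    centroid = {key: (sum(points[i][0] for i in m) / len(m),
                      sum(points[i][1] for i in m) / len(m))
                for key, m in cells.items()}

    def center(node):
        return points[node[1]] if node[0] == "pt" else centroid[node[1:]]

    def children(node):
        n, ix, iy = node[1:]
        members = cells[(n, ix, iy)]
        if n == depth or len(members) == 1:        # bottom reached: real points
            return [("pt", i) for i in members]
        kids = [(n + 1, 2 * ix + a, 2 * iy + b) for a in (0, 1) for b in (0, 1)]
        return [("cell",) + kid for kid in kids if kid in cells]

    # The topmost layer has at most 4 clusters, so an exhaustive tour is trivial.
    top = [("cell",) + key for key in cells if key[0] == 1]
    tour = list(min(permutations(top),
                    key=lambda t: closed_length([center(v) for v in t])))

    # Refine the first unrefined cluster along the tour, so one region is
    # finished before the next (a crude stand-in for foveated refinement).
    while any(v[0] == "cell" for v in tour):
        pos = next(i for i, v in enumerate(tour) if v[0] == "cell")
        parent = tour.pop(pos)
        for child in children(parent):
            if not tour:
                tour.append(child)
                continue
            # Local cheapest insertion: only slots within `window` of the parent.
            c = center(child)
            best_j, best_delta = 0, float("inf")
            for j in range(max(0, pos - window), min(len(tour), pos + window) + 1):
                a, b = center(tour[j - 1]), center(tour[j % len(tour)])
                delta = dist(a, c) + dist(c, b) - dist(a, b)
                if delta < best_delta:
                    best_j, best_delta = j, delta
            tour.insert(best_j, child)
    return [v[1] for v in tour]
```

Because each insertion looks at only a constant number of nearby slots, the search per child is cheap; the sketch makes no attempt to match the O(N log N) bound quoted for the model, since Python list insertion alone is linear in the tour length.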
The computational complexity of the model is low ($O(N \log N)$). In this chapter we aim to generalize this model to the case of E-TSP in the presence of obstacles (E-TSP-O). Figure 1 illustrates an E-TSP-O problem. The task is to produce a tour that goes “around” the obstacles. When obstacles are present, the distance between two cities is the length of the shortest path around the obstacles rather than the Euclidean distance, so the problem is no longer Euclidean. Nevertheless, these distances satisfy the metric axioms, so the TSP problem is metric. There are two motivations for studying E-TSP-O. The first is to use TSP problems that more closely reflect characteristics of real-life problems. The second is to provide an additional test of the pyramid model. One of the two main aspects of the pyramid model is the use of hierarchical clustering. Clearly, obstacles change proximity relations and therefore modify the clusters. Will introducing obstacles make the problem more difficult for the subjects? Will it make the problem more difficult for the model? The model will surely have to be modified. What is the nature of the required modifications? These questions are answered in the next two sections. Intuitively, E-TSP-O seems more difficult than E-TSP. In the extreme case, when obstacles form a maze, as shown in Figure 2, a human subject can no longer cluster points visually: points that are arbitrarily close in the image may have an arbitrarily long path between them due to obstacles. In such a case, the only way to proceed is to search for paths connecting pairs of points. Are the shortest paths always found? Are obstacles used at the stage of clustering or at the stage of constructing the tour? The first question is addressed in Section 3, and the second question is addressed in Section 4.
Non-Euclidean Traveling Salesman Problem
Fig. 1. E-TSP with obstacles.
Fig. 2. E-TSP in a maze.
3 Shortest-Path Problem
Consider the E-TSP-O problem shown in Figure 3. There are eight square-shaped obstacles, each with two gaps. The gaps are always on opposite sides of a square, and which pair of sides has the gaps (top and bottom versus left and right) changes from one square to another. This problem resembles the TSP in a maze shown in Figure 2. The optimal solution of this problem is shown as well. It is not obvious that this solution is indeed optimal. In particular, it is not obvious that the paths involved in this solution are the shortest paths between the corresponding pairs of cities. They must be; otherwise, the tour would not be the shortest one.
Fig. 3. Ten-city E-TSP with eight complex obstacles. The optimal tour is shown on the right.
Before we can understand and model how humans solve E-TSP in the presence of obstacles, we have to understand how humans find paths between pairs of points in the presence of obstacles. Do they always find the shortest path? If not, what path do they choose?
Experiment 1. Human performance on the SPP problem.
Subjects. The two authors served as subjects. Each ran the same problems in a different, randomly determined order.
Stimuli. The SPP problems were displayed on a computer screen in a window of 512 × 512 pixels. There were eight sessions in total, crossing two types of obstacles (simple versus complex) with four numbers of obstacles (1, 2, 4, and 8). There were 25 randomly generated problems per session. Figure 4 illustrates examples of several of these conditions.
Results. The total solution time for the two subjects is shown in Figure 5a. Clearly, it took substantially longer to produce a path with complex obstacles. But this difference could, at least partially, be related to the fact that the subjects had to move and click the mouse more often when complex obstacles were used. To remove this confounding factor, the time per vertex was plotted in Figure 5b. Time per vertex is defined as the total time taken to solve a problem divided by the number of vertices (including the points representing the start and the goal) in the polygonal line representing the solution path. Now, the difference between simple and complex obstacles is substantially smaller. Note that the times for both simple and complex obstacles tend to increase with the number of obstacles, and the rate of increase is somewhat higher with complex obstacles. This suggests that the average computational complexity of the mental mechanisms is higher than linear. This contrasts with E-TSP, where the average time per city does not depend on the number of cities [15]. This comparison has interesting implications. E-TSP is computationally more difficult than SPP (E-TSP is NP-hard, whereas SPP can be solved in polynomial time). However, the mental mechanisms are computationally more complex in the case of SPP than in the case of E-TSP. This difference is most likely related to the fact that the human visual system can analyze large parts of an E-TSP problem, but not of an SPP problem, in parallel. When obstacles are present, the shortest path between a pair of cities is not a straight line. Establishing which path is the shortest involves examining several alternative paths, one after another.
(a) SPP with eight complex obstacles
(b) SPP with four complex obstacles
(c) SPP with two complex obstacles
(d) SPP with one complex obstacle
(e) SPP with eight simple obstacles
Fig. 4. SPP with various obstacles, with the shortest path appearing in the right-hand panel.
(a) Total solution times in SPP
(b) Solution time per vertex in SPP
Fig. 5. Subject solution times for Experiment 1.
The proportion of optimal solutions is shown in Figure 6a, and the average error is shown in Figure 6b. Average error is computed by subtracting the length of the shortest path from the length of the path produced by the subject and dividing the result by the length of the shortest path. The results for simple obstacles indicate that the subjects are almost always able to produce the shortest paths; the small departures from optimality are likely to result from visual noise. If the visual system could measure distances precisely, all paths produced by the subjects would have been shortest paths. This was not the case with complex obstacles. Here, the proportion of optimal solutions dropped, and the average error increased substantially, as the number of obstacles increased. Clearly, the subjects were rarely able to produce shortest paths in this case. They performed search when they solved the SPP problem, as indicated by the analysis of solution times, but the search was limited and did not guarantee finding the shortest path when complex obstacles were used. This suggests that with complex obstacles the subjects used a greedy algorithm, most likely because of the limitations of short-term memory. Specifically, the subjects cannot store all partial paths, as required by the optimal algorithm. The algorithm that guarantees finding the shortest path, as well as a greedy algorithm that does not, are presented next.
(a) Proportion of optimal solutions in SPP
(b) Average solution error in SPP
Fig. 6. Quality of solutions obtained in Experiment 1.
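The two measures used in Experiment 1, relative error and time per vertex, are straightforward to compute. The minimal Python sketch below assumes that a response is stored as a list of (x, y) vertices that includes the start and the goal; the function names, the trial coordinates, and the 6-second solution time are hypothetical. The same functions apply to E-TSP-O tours with `closed=True`, where the vertices are the cities plus the obstacle endpoints the tour passes.

```python
import math


def polyline_length(vertices, closed=False):
    """Length of the polygonal line through (x, y) vertices in order;
    closed=True adds the edge back to the first vertex (a TSP tour)."""
    pts = list(vertices) + ([vertices[0]] if closed else [])
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def relative_error(response_length, optimal_length):
    """(response - optimal) / optimal; multiply by 100 for a percentage."""
    return (response_length - optimal_length) / optimal_length


def time_per_vertex(total_time, vertices):
    """Total solution time divided by the number of vertices in the response
    (start and goal included for SPP; cities plus obstacle endpoints for a tour)."""
    return total_time / len(vertices)


# Hypothetical SPP trial with one obstacle from (5, -3) to (5, 4).
optimal = [(0, 0), (5, -3), (10, 0)]
response = [(0, 0), (5, 4), (10, 0)]
err = relative_error(polyline_length(response), polyline_length(optimal))
print(f"error = {100 * err:.1f}%")     # about 9.8%
print(time_per_vertex(6.0, response))  # 2.0 seconds per vertex
```

Dividing by the number of vertices rather than by the number of obstacles follows the text's rationale: each vertex costs one mouse click, so the measure discounts the motor part of the response.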
Simulation models for SPP
First, an algorithm for finding the shortest path is described. We applied Dijkstra’s algorithm to the “visibility graph.” The visibility graph determines which points in a 2D image are “visible” from any given point, that is, which pairs of points can be connected by a straight segment that does not cross an obstacle. If the point representing the goal is visible from the start point, then the straight line segment connecting these two points is the shortest path. Otherwise, all endpoints of obstacles that are visible from the start point are found, and the endpoint that is closest to the start point is selected. Next, all endpoints of obstacles that are visible from the selected point are found, and the shortest path from the start point to each of these points is stored. Again, the endpoint that is closest to the start point (along the stored paths) is selected. The process is repeated until the goal is selected; the stored path to the goal is then guaranteed to be the shortest path from the start to the goal. Recall that the subjects almost always found the shortest path when simple obstacles were used (Figure 6). Subjects’ performance can be modeled by this algorithm when visual noise is added to the information about the points and obstacles.
A greedy algorithm, a modification of the algorithm described above, was used to model the subjects’ performance with complex obstacles. Visual noise was incorporated in the algorithm by adding Gaussian noise to every estimate of distance. The mean of the noise was zero and its standard deviation was 3% of the estimated distance (3% is the Weber fraction in a line-length discrimination task [21]). The algorithm described above was applied to the points representing the start and the goal and to the first $k_1$ obstacles, counting from the start point. ($k_1$ represents the limitations of short-term memory. For example, when $k_1 = 2$, there are at most four different paths that have to be stored in short-term memory.)
Once the SPP for the first $k_1$ obstacles was determined, the algorithm made $k_2$ steps to go around the first $k_2$ obstacles, and the process was repeated with the start point replaced by the point reached after $k_2$ steps. Obviously, $k_2 \le k_1$. Figure 7 shows the model’s average solution error for several values of $k_1$ and $k_2$. Figure 8 compares the model’s performance for $k_1 = k_2 = 2$ with that of the subjects. Although the fit is not perfect, the graphs show that the effect of the number of complex obstacles on the solution error and on the proportion of optimal solutions is similar for the model and the two subjects. If the model had been fit to individual problems, the fit would have been substantially better. In the next section, we describe how the subjects’ performance in E-TSP-O with three types of obstacles was measured and modeled. The obstacles varied in size and shape.
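The optimal SPP algorithm described above can be sketched in Python as Dijkstra’s algorithm on a visibility graph whose nodes are the start, the goal, and the obstacle endpoints. This sketch assumes that obstacles are straight line segments given by their two endpoints, computes visibility lazily, and treats a segment that merely touches an obstacle endpoint as unobstructed. The optional `weber` argument is an assumed rendering of the visual-noise model: each edge-length estimate receives zero-mean Gaussian noise with a standard deviation of that fraction of the length (0.03 for the 3% Weber fraction). The greedy $k_1$/$k_2$ variant is not implemented here, and all names are hypothetical.

```python
import heapq
import math
import random


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def visible(p, q, obstacles):
    """True unless segment pq properly crosses some obstacle segment.
    Touching an obstacle endpoint does not block the view, so paths may
    pass around obstacle ends."""
    for a, b in obstacles:
        if (_orient(p, q, a) * _orient(p, q, b) < 0
                and _orient(a, b, p) * _orient(a, b, q) < 0):
            return False
    return True


def shortest_path(start, goal, obstacles, weber=0.0, rng=None):
    """Dijkstra's algorithm on the visibility graph (nodes: start, goal and all
    obstacle endpoints). With weber > 0 every edge-length estimate is perturbed
    by Gaussian noise with SD = weber * length.
    Returns (true length of the chosen path, list of path vertices)."""
    rng = rng or random.Random()
    nodes = [start, goal] + [end for seg in obstacles for end in seg]
    estimate = {start: 0.0}  # best (possibly noisy) distance found so far
    previous = {}
    heap = [(0.0, start)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u == goal:
            break
        for v in nodes:
            if v in settled or not visible(u, v, obstacles):
                continue
            length = math.dist(u, v)
            if weber > 0:
                length = max(0.0, length + rng.gauss(0.0, weber * length))
            if d + length < estimate.get(v, math.inf):
                estimate[v] = d + length
                previous[v] = u
                heapq.heappush(heap, (estimate[v], v))
    if goal not in settled:
        return math.inf, []  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return sum(math.dist(a, b) for a, b in zip(path, path[1:])), path


# One vertical obstacle between start and goal; its lower end is the shorter detour.
length, path = shortest_path((0, 0), (10, 0), [((5, -3), (5, 4))])
print(path)  # [(0, 0), (5, -3), (10, 0)]
```

Returning the true length of the chosen path, rather than the noisy estimate, makes it easy to compute the model’s solution error the same way as the subjects’.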
Fig. 7. Average error of the SPP model for several levels of local search.
4 Euclidean TSP with Obstacles (E-TSP-O)
Two experiments tested the effect of geometrical properties of obstacles (size and shape) on performance. In the first experiment, obstacles were straight lines with different lengths. In the second experiment, the obstacles had different shapes.
Experiment 2. Human performance on E-TSP-O: straight-line obstacles.
Subjects. The two authors were tested. Each ran the same problems in a different, randomly determined order.
Stimuli. The TSP problems were displayed on a computer screen in a window of 512 × 512 pixels. Four sets of 25 problems were used. Each problem consisted of 20 randomly placed points and 10 randomly placed obstacles. The obstacle length was constant within each set of problems but differed across sets: it was 100, 144, 208, and 300 pixels in the four sets (the length increased by a constant factor of about 1.44 from one set to the next). Examples of problems for two obstacle lengths are shown in Figure 9. If a city was isolated because of the placement of obstacles, a randomly placed gap was produced in the relevant obstacle to connect this city to the rest (see Figure 9, right panel).
Results. The average solution time per vertex is shown in Figure 10, and the average solution error is shown in Figure 11 (results from Experiment 3 are superimposed). The number of vertices in the solution tour is equal to the number of cities plus the number of obstacle endpoints that had to be included in the solution. Figure 12 shows an example of a solution tour with 25 vertices. The rationale for using time per vertex is the same as that for using time per city in Euclidean TSP: each is fairly insensitive to the time it takes to move and click the mouse. Longer obstacles are likely to lead to more clicks because more obstacles interfere with the solution tour. The error is computed by subtracting the length of the shortest tour from the length of the subject’s tour and dividing the result by the length of the shortest tour. Solution time was systematically affected by the length of the obstacles.
It increased from 0.8 to 1.3 sec per vertex for JS and from 1.4 to 2.3 sec per vertex for ZP. For comparison, time per city in 20-city Euclidean TSP was 0.94 sec for JS and 1.21 sec for ZP. The solution error, however, was not systematically affected by the obstacle length. Furthermore, the errors with obstacles were not very different from the errors without obstacles: JS’s solution error with 20-city Euclidean TSP was 1.2% and ZP’s error was 3.1%.
(a) The comparison of the model and the subjects’ average errors in SPP
(b) The comparison of the model and the subjects’ proportion of optimal solutions in SPP
Fig. 8. Comparison of simulation model and subjects’ performances.
Fig. 9. Examples of E-TSP-O stimuli in Experiment 2. Left: the second-shortest obstacles (length 144); right: the longest obstacles (length 300).
Experiment 3. Human performance on E-TSP-O: the role of obstacle shape.
Subjects. The two authors were tested.
Stimuli. Two types of obstacles, C- and L-shaped, were used. The total length of each obstacle was 208 or 300 pixels. Each problem contained 20 cities and 10 obstacles. Figure 13 shows examples of problems with each of these two types of obstacles. Each subject was tested with four sets of 25 problems. The order of the sets was random. All other aspects of this experiment were the same as those of Experiment 2.
Results. It can be seen in Figures 10a and 10b that solution times for L and C obstacles are similar to those for straight-line obstacles. The errors, however, are overall somewhat higher. The pattern of results for errors is not systematic: long L obstacles were easier for JS, whereas long C obstacles were easier for ZP. Our simulation analyses indicate that long obstacles reduce the number of short tours (tours whose length is close to that of the shortest tour). So, if the subject makes the right decisions in planning the tour, decisions that eliminate large errors, the solution tour may be fairly short. Apparently, JS was able to exploit this feature during the session with L obstacles, whereas ZP did so with C obstacles. The two subjects ran the conditions in different orders, so practice could have been responsible for this improvement.
Discussion. The effect of the obstacle length on the time per vertex suggests that the complexity of the mental mechanisms increases with obstacle length. This seems intuitively obvious: longer obstacles force the subject to perform more search. However, once the subject performs the search, the tours are not necessarily longer, as measured by the solution error. How should the pyramid model for E-TSP be modified to account for these results on E-TSP-O?
Simulation Models
A pyramid model for E-TSP developed by Pizlo et al. [15] was elaborated into a pyramid model for E-TSP-O. Two versions of the E-TSP-O model are presented here. The first version, called Model 1, differed from the E-TSP model only in the way it performed the cheapest insertion during the top-down tour refinement. The hierarchical clustering was performed the same way as in the E-TSP model of Pizlo et al. [15]; namely, obstacles were ignored during clustering.
(a) Subject JS
(b) Subject ZP
Fig. 10. Time per vertex in E-TSP-O.
They were used only at the second stage, when the tour was produced. While performing cheapest insertion, Model 1 determined and used the shortest path between a given pair of cities. The shortest path was determined by applying Dijkstra’s algorithm to the visibility graph (see Section 3), without visual noise. Our preliminary simulations showed that including visual noise in the TSP model is not essential, because errors are more likely to arise at the stage of producing the TSP tour than at the stage of solving the SPP. The second version of the E-TSP-O model, called Model 2, is identical to Model 1, except that the obstacles are also used at the stage of clustering. Specifically, after the clusters are formed without obstacles, the shortest paths among the centers of gravity of the clusters are computed. If a given child node is closer to another parent node than to its own parent, the link in the pyramid representation is changed to represent this proximity relation.
(a) Subject JS
(b) Subject ZP
Fig. 11. Average error in E-TSP-O.
Fig. 12. An optimal tour for 20-city E-TSP-O. This tour has 25 vertices.
Fig. 13. 20-city E-TSP-O with L and C obstacles, length 300.
Fitting the model to the subjects’ results was done for each problem individually. Specifically, the model tried all points as starting points as well as both directions of the tour (clockwise and counterclockwise). For each starting point and direction, the amount of local search was varied by changing the value of the parameter K (see [15]). This parameter specifies how many positions near the parent node were tried in the cheapest insertion method. For example, K = 1 means that cheapest insertion checks only the edges directly next to the parent node, whereas K = 4 means that a total of 8 edges are considered, the four edges on either side of the parent node. The tour whose error was closest to the error of the subject on a given problem was taken as the best-fitting tour. Figures 14 and 15 show the average solution errors of the best-fitting tours for both models and both subjects. Both models provide a reasonable fit, with Model 2 being slightly better. At this point it is impossible to decide between these two models; this decision might become possible when results from additional subjects are available. It seems, however, that each model is a plausible model of the underlying mental mechanisms. Model 1 can represent trials in which the subject starts solving the problem without examining the distribution of cities and obstacles in any great detail; the information about obstacles is taken into account during the solution process. Model 2 can represent trials in which the subject begins by examining the problem and determining the actual distances among clusters, and only then starts producing the tour. Reports of the two subjects, as well as the actual data, suggest that both approaches are used.
It is unclear at this point how the subject decides which approach (model) to use for a given problem.
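Model 2’s relinking step and the per-problem fitting procedure can be sketched as two small Python functions. This is an illustration under stated assumptions, not the authors’ implementation: clusters are identified by hypothetical ids and represented by their centers of gravity, `dist` stands for the obstacle-aware shortest-path distance between centers (for example, the length returned by a visibility-graph shortest-path routine; a toy “wall” distance is used below), a relinked child is moved to its nearest parent, and each model run is summarized as a hypothetical (start city, direction, K, error) tuple.

```python
import math


def relink(child_centers, parent_of, parent_centers, dist=math.dist):
    """Model 2 relinking (sketch). child_centers / parent_centers map cluster ids
    to centers of gravity; parent_of maps each child id to its current parent id.
    A child is moved to the nearest parent under `dist` whenever that parent is
    strictly closer than its own parent. Returns a new child -> parent mapping."""
    new_parent = dict(parent_of)
    for child, center in child_centers.items():
        nearest = min(parent_centers, key=lambda p: dist(center, parent_centers[p]))
        own = parent_of[child]
        if dist(center, parent_centers[nearest]) < dist(center, parent_centers[own]):
            new_parent[child] = nearest
    return new_parent


def best_fitting_run(subject_error, runs):
    """runs: (start_city, direction, K, model_error) tuples, one per model run
    (all starting cities, both directions, several K). Returns the run whose
    error is closest to the subject's error on the same problem."""
    return min(runs, key=lambda run: abs(run[3] - subject_error))


def wall_dist(p, q):
    """Toy obstacle-aware distance: crossing the line x = 3 costs a 100-unit detour."""
    return math.dist(p, q) + (100 if (p[0] < 3) != (q[0] < 3) else 0)


parents = {"A": (0, 0), "B": (9, 0)}
print(relink({"c": (4, 0)}, {"c": "A"}, parents))             # {'c': 'A'}
print(relink({"c": (4, 0)}, {"c": "A"}, parents, wall_dist))  # {'c': 'B'}
```

With plain Euclidean distance the child stays with parent A, but once the wall makes A effectively far away, the link moves to B, which is the kind of proximity change Model 2 is meant to capture.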
(a) Model 1
(b) Model 2
Fig. 14. Performance of models in E-TSP-O superimposed on results of JS.
(a) Model 1
(b) Model 2
Fig. 15. Performance of models in E-TSP-O superimposed on results of ZP.
5 Summary and Conclusions
The present study showed that humans can find near-optimal solutions to TSP problems not only with Euclidean distances but also with non-Euclidean ones, when obstacles are placed on a Euclidean plane. As such, this study generalizes prior results to TSP problems that are closer to real-life applications. In both types of problems, humans can solve the problems quite well without performing exhaustive search. However, the complexity of the mental mechanisms is higher in the case of E-TSP-O, owing to the greater amount of search needed to establish clusters and/or solve the SPP problem. In order to account for the subjects’ results, an SPP model was formulated and the E-TSP model was elaborated into an E-TSP-O model. For the kinds of obstacles used in Experiments 2 and 3, the subjects appear to solve the SPP problem optimally. But for more complex obstacles, such as those used in Experiment 1, the SPP problem is not solved optimally. In such cases, the subjects use a greedy algorithm, which is likely to lead to larger errors in E-TSP-O. The E-TSP-O model involved two modifications of the E-TSP model. The first version of the new model performed clustering of points without obstacles. The information about obstacles was used only at the stage of tour refinement, when the cheapest insertion was performed. This model already produced quite good fits to the subjects’ data. The second version involved a modification of the pyramid structure, changing the links depending on the actual distances among clusters. The actual distances were computed by solving the SPP problem with obstacles. This version provided slightly better fits to the data. In fact, there is reason to believe that humans use both models, depending on whether they decide to examine the distribution of obstacles before they start solving the problem.
Our future research will use non-Euclidean TSP problems (nE-TSP) without obstacles. This will be accomplished by placing the points (cities) on 3D surfaces, such as spheres. In such cases, the geodesic distances are not Euclidean distances, but they satisfy the metric axioms. Will the subjects be able to produce near-optimal TSP tours?
Acknowledgments This project was supported by a grant from AFOSR. The authors are grateful to Dr. Walter Kropatsch for his comments and suggestions. In particular, we would like to acknowledge his suggestion to apply the visibility graph to the shortest-path problem.
References
1. S. M. Graham, A. Joshi, and Z. Pizlo. The traveling salesman problem: A hierarchical model. Memory and Cognition, 28:1191–1204, 2000.
2. G. Gutin and A. P. Punnen. The Traveling Salesman Problem and Its Variations. Kluwer, Boston, 2002.
3. C. M. Harris. On the optimal control of behavior: A stochastic perspective. Journal of Neuroscience Methods, 83:73–88, 1998.
4. Human problem solving difficult optimization tasks workshop. http://psych.purdue.edu/tsp/workshop/downloads.html. Last accessed January 2008.
5. J. M. Jolion and A. Rosenfeld. A Pyramid Framework for Early Vision. Kluwer, Dordrecht, 1994.
6. D. C. Knill and W. Richards. Perception as Bayesian Inference. Cambridge University Press, Cambridge, UK, 1996.
7. K. Koffka. Principles of Gestalt Psychology. Harcourt, Brace, New York, 1935.
8. E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. The Traveling Salesman Problem. Wiley, New York, 1985.
9. M. Li and P. Vitanyi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, New York, 1997.
10. R. M. Nosofsky. Attention, similarity and the identification-categorization relationship. Journal of Experimental Psychology: General, 115:39–57, 1986.
11. M. A. Pitt, J. Myung, and S. Zhang. Toward a method of selecting among computational models of cognition. Psychological Review, 109:472–491, 2002.
12. Z. Pizlo. Perception viewed as an inverse problem. Vision Research, 41:3145–3161, 2001.
13. Z. Pizlo, A. Rosenfeld, and J. Epelboim. An exponential pyramid model of the time-course of size processing. Vision Research, 33:1089–1107, 1995.
14. Z. Pizlo, M. Salach-Golyska, and A. Rosenfeld. Curve detection in a noisy image. Vision Research, 37:1217–1241, 1997.
15. Z. Pizlo, E. Stefanov, J. Saalwaechter, Z. Li, Y. Haxhimusa, and W. G. Kropatsch. Traveling salesman problem: A foveating algorithm. Journal of Problem Solving, 1:83–101, 2006.
16. W. V. Quine. Word and Object. MIT Press, Cambridge, MA, 1960.
17. H. A. Simon. The Sciences of the Artificial. MIT Press, Cambridge, MA, 1996.
18. R. M. Steinman, Z. Pizlo, and F. J. Pizlo. Phi is not beta, and why Wertheimer’s discovery launched the Gestalt revolution. Vision Research, 40:2257–2264, 2000.
19. The Journal of Problem Solving. http://docs.lib.purdue.edu/jps/. Last accessed January 2008.
20. J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1944.
21. R. J. Watt. Scanning from coarse to fine spatial scales in the human visual system after the onset of a stimulus. Journal of the Optical Society of America, A4:2006–2021, 1987.
Index
adaptation in evolution, 340 level, 203, 204, 209–211, 213, 214, 216, 217 to wealth, 199, 200, 202–204, 207–210, 212, 216–218, 220, 222 adaptive good, 200, 205, 220, 221, 223 affiliation, 166, 167, 169, 171, 173–175, 177, 178, 183, 185–188, 194, 195 agent, 144–150, 157–160, 278, 280–291, 294, 297, 298, 300, 301, 304 accountant, 147, 148, 157 conformist, 146, 150 greedy, 147, 153, 156 in agent-based computational model, 143, 285, 286, 297 influencing, 145, 146, 153–155 Pavlovian, 146, 147, 150 agriculture, 57, 58, 61–64, 66–69, 71–73, 190–192 Allais paradox, 9 alternative, 5, 6, 9, 11, 12, 14, 16, 18–24, 40, 43, 44, 52, 79–84, 86–90, 92–95, 100, 101, 108, 113, 116, 118–121, 125, 126, 131, 182, 184, 227–229, 238–240, 242, 243, 245, 246 acceptable, 80, 82–84, 87, 90, 95 ambiguity, 8, 254 aversion, 231 linguistic, 234 analytic hierarchy process, 23, 25, 26 anchor-and-adjust, see heuristic, anchor-and-adjust
anchoring, 8, 40, 43, 44, 49, 232, 271 anthropology, 178, 189, 191 aspiration, 80 error, 82, 95 level, 80–87, 89, 93–96 role, 174, 181, 195 status, 178 asymmetric dominance, 184, 185 attribute, 11, 12, 19–21, 26, 80–84, 87, 88, 91, 92, 95, 125, 126, 182, 183, 234, 253, 255 acceptable, 82 correlated, 86 exponentially distributed, 89 nonstationary, 89 normal, 86 splitting, 10–12 uniform, 85, 86, 88, 89 availability, 7, 8, 232 basic good, 200, 204, 205, 220–223 battle of sexes (BOS), 143–145, 157, 159, 160 bias, 4–6, 8–10, 21, 23, 27, 34, 35, 44, 183, 207, 227, 228, 230, 231, 243–246, 259, 260, 286, 297–299 confirmation, 231 disconfirmation, 231 goal, 286, 288 inhibiting tradeoff study acceptance, 238, 239 parameter, 11 partition dependence, 8, 9, 17 projection, 199, 217–223
bidimensional matching, 11 binary choice problem, 113 binary relation, 101 classes of, 101 complement, 101 inverse, 101 transitivity, 101 Braess paradox (BP), 309–312, 314–316, 319, 320, 328, 329 Brunswik’s lens model, 125, 128–131, 134, 135 case study Argentine pampas, 57, 62 as research method, 41 Compensation, Accessions, and Personnel Management, 41 East Africa risk sharing, 193 European university comparison, 120 in research versus teaching, 41 Northeast Brazil water allocation, 192, 193 Obergurgl development, 42 Pinewood derby tradeoff study, 230 POLANO estuary study, 43 prospective use, 52 San Joaquin Valley model, 42 Shipbuilding and Force Structure Analysis, 42 Ugandan agricultural communities, 190–192 centipede game, 310, 311, 314 certainty effect, 10, 235 certainty equivalent (CE), 253, 255 choice, see choice certainty equivalent judged, see judged certainty equivalent choice certainty equivalent (CCE), 260, 261, 263 climate, 57–59, 62, 65, 66, 73, 165, 166, 177, 190–193, 195 cognitive trap as observed in cases, 48 possibly eliminated by modeling, 48 possibly exacerbated by roles, 50 recommendations to reduce, 52 reduced by clear communication, 50 requiring analysts’ consideration, 50
commons dilemma, 165, 167–169, 171, 173, 179, 195 communication of uncertainty, 50 to reduce cognitive traps, 50 comparison probability binary, 99, 100, 105–115 ternary paired, 100, 106, 111–113, 117 with indifference, 106, 107, 116 without indifference, 105–107, 118 compensatory strategy, 92, 125, 126, 128, 130, 135, 242 equal weight, 126 weighted additive, 126 Condorcet paradox, 108 conformity, 146, 165, 290, 291, 300 conjunction fallacy, 234 consolidation, 291 constant relative risk aversion (CRRA), 60 consumption, 189, 199, 200, 202, 204, 205, 209–211, 213–223, 278 budget, 199 independence, 201, 202 plan, 211–215, 217–220 stream, 202 contingent decision behavior, 129 valuation method, 11, 13 weighting, 182, 183 conventional wisdom, 34 convex hull, 113 debiasing, 6, 10 disjunctive normal form (DNF), 127, 131 dual processing system, 7 dynamic process deterministic, 158 stochastic, 158 dynamic programming, 84 Easterlin paradox, 205 ecological criterion, 135, 136 effectiveness, 3, 5, 13–16, 21, 22, 24, 25, 27 strong, 3, 5, 14, 22, 26 longitudinal studies, 17, 18
simulation, 18 weak, 3, 5, 14, 23, 25, 26 expected value, 18, 19 panel preferences, 19–21 efficient frontier, 19, 20 Ellsberg paradox, 231, 253, 257 enterprise allocation, 62 enterprise (cropping), 57, 62–66, 68, 69, 71–73 EPICURE, 285–289 ethnology, 24 expected utility, 14, 16, 57, 59–61, 64–73, 170, 178, 180–182, 194, 256 subjective, 4, 6, 9, 26, 235, 253, 254 expected utility theory, 16, 59, 60, 101, 103 extensionality fallacy, 235 eye movement, 342 facet, 113–121 foraging, 277, 279, 280, 282, 284, 286–290, 294, 302, 303 ideal free distribution (IFD), 279, 280, 288 ideal preemptive distribution hypothesis, 284 forecasting, 22, 43, 58, 59, 177, 190–192, 195 foveating, 342 framing, 7, 9, 24, 45, 46, 50, 185, 188, 189, 195, 230–232, 236, 237 goal, 258, 270 frequency illusion, 234 game, 18, 22, 79, 143–145, 153, 165, 167, 170, 171, 186, 281, 302, 310, 311, 314, 315, 317–323, 326–331 battle of sexes, see battle of sexes (BOS) centipede game, see centipede game cooperative group identity, 185 dynamic, 159 iterated, 145 network, 310, 314, 315, 319 noncooperative, 144, 312 prisoner’s dilemma, see prisoner’s dilemma
Gaussian blurring, 341 Gaussian distribution, 281 Gaussian noise, 348 gestalt revolution, 339 group-role obligation, see role obligation heuristic, 6–8, 39, 82, 93, 127, 131, 220, 228, 230–232, 236, 238, 242, 316 affect, 7, 12 anchor-and-adjust, 6 ignorance prior, 8, 9, 11 imitation, 147, 153, 290, 292, 297 impartial culture, 109 incomparability, 103 indifference, 10, 11, 20, 67, 100, 102–104, 106–110, 112, 116, 119–121, 146, 253–256 information conveyance pathway, 36 inherent penalty, 82, 90, 92, 95, 96 innovation, 58, 73, 277, 290, 291, 294, 297, 300, 302 propagation paradigm, 301, 302 interior additivity, 9 intransitivity, 103, 104, 108, 109, 111, 121, 182, 185 intrinsic reward, 167, 169–173, 179, 195 judged certainty equivalent (JCE), 260–262 judgmental gap, 35 efforts to close, 38 factors producing, 35 lens model equation (LME), 130 lens model, see Brunswik’s lens model loss aversion, 10, 60, 67, 71–73, 181, 205, 232 loss aversion parameter, see parameter, loss aversion medium propaganda, see propaganda minimum critical network, 313 mixture model, see model, mixture model adaptation–social comparison, 211, 212, 216, 218 assumptions, 35
crop CROPGRO, 63 generic-CERES, 63 discounted utility (DU), 200–202, 209, 212, 220 formal, 34 classes of, 35 versus mental models, 35 Mallows, 121 mixture, 99, 100, 104–106, 111, 114, 116, 117, 121 of transitive relations, 104 pyramid, 341, 342, 351, 354, 357 random preference, 100 random utility, 106, 107, 121 noncoincident, 106 SSEC, 297, 298 transitive preference, 99–114, 116–122 weak utility, 108 model conveyance lack of guidelines for, 39 model presentation, 38 as gateway for decision maker, 38 momentum strategy, 18 Nash equilibrium, 144, 171, 172, 179, 186, 309–311 no-trial learning, 290 noncompensatory strategy, 125, 126, 128, 129, 131, 132, 135, 138 conjunctive (see also, satisficing), 126 disjunctive, 126 elimination by aspects, 126 lexicographic, 126 take the best, 126 NP-hard, 113, 340 optimal stopping, 84, 88–90 optimization, 18, 61, 62, 64, 66, 68, 69, 72, 73, 79, 94, 96, 113, 132, 139, 219, 220, 245, 340 nonlinear, 84 GAMS, 64, 66 MINOS5, 64 numerical, 84, 88 of expected utility, 65 of prospect theory value, 65 of traveling salesman problem, 340 order
classes of, 100–102 overmatching, 284–288 parameter, 10, 57, 60, 61, 63, 66–68, 70–73, 86, 143, 148, 150, 159, 160, 176, 229, 245, 253, 256–259, 268–271, 289, 298, 310 bias, see bias, parameter in cheapest insertion, 354 lens model, 130, 131, 134, 136–138 loss aversion, 61, 67, 72, 73 network, 314, 315 reference wealth, 67 reference level, see reference level risk aversion, 67, 73, 205 risk preference, 67, 73 risk seeking, 70 shape, 146, 147 weak stochastic transitivity, 111 parametric study, 339, 340 particle swarm algorithm, 297, 298 partition dependence, see bias, partition dependence Pearson’s correlation coefficient, 130, 135 polytope, 99, 100, 111, 113–115, 117 binary choice, 114 interval order, 118, 119 linear order, 113–117 minimal description, 113 partial order, 100, 118, 119 weak order, 116–118, 120 preference, 3, 4, 7, 10, 13, 16, 19, 26, 27, 36–38, 60, 73, 80, 81, 100, 143, 144, 158–160, 184, 218, 238, 246, 254, 255, 260, 263, 278 assessment and elicitation, 5, 6, 9, 10, 12, 25 construction, 12, 13, 183 multiattribute, 10, 17 panel, see effectiveness, weak, panel preferences random, see model, random preference reversal, 9, 11, 260 transitive, see model, transitive preference under risk, 9 vagueness, 256–258
Index
prisoner’s dilemma, 143–145, 160, 169–171, 310, 311 probability assessment, 4–8, 16, 39 and prospect theory, 237, 238 propaganda, 145–148 prospect theory, 10, 27, 59–62, 64–67, 69, 71–73, 101, 103, 181, 183, 227, 230, 232, 233, 235–238, 246, 253, 255–257, 259, 268 cumulative, 60 random graph, 292, 293 reference level, 64, 199, 200, 203–205, 209, 210, 212–215, 217–221, 223 actual, 217–219 predicted, 217–219 regression, 126, 127, 129–131, 135, 139, 186 regret, 59, 64 resource sharing, 303 risk, 6, 7, 9, 10, 12, 16, 22, 25, 27, 57–61, 67–69, 71, 73, 173, 178, 180, 188, 189, 193, 230, 233, 239, 244, 253–257, 259, 260 aversion, 17, 45, 60, 61, 67–73, 188 communication, 37 neutrality, 67–71, 73, 269 preferences under, 27 seeking, 45, 61, 70, 71, 73, 269 risk communication versus model conveyance, 37 role obligation, 165, 175, 176, 178, 194 rule, 7, 81, 96, 126–128, 144, 158, 159, 165, 167, 182, 185, 194, 243, 256, 286, 289 inference system, 131 set, 132–134 rule-based lens model (RLM), 129, 131, 132, 134–139 sanction, 166, 169, 171–173, 175, 179, 194 satisficing, 79–96 decision maker, 79 heuristic, 82, 93 infinite horizon, 94 multiattribute, 81, 90 set, 80
versus maximizing, 79, 82, 90–92 scale compatibility, 10, 11, 17 search behavior, 277, 278, 301, 304, 305, 343, 347, 351 abstract, 277, 278, 290, 302 collective, 277, 278, 291, 294, 300, 302–305 concrete, 277, 302 self-regulation, 13, 176 shortest-path problem (SPP), 340, 343, 344, 348 Dijkstra’s algorithm, 348, 352 with obstacles, 339, 344, 346, 348 simulation, 18, 22, 23, 34, 35, 42, 63, 64, 143, 145, 148–157, 159, 160, 228, 288, 294, 298, 301, 348, 351, 354 social comparison, 175, 199, 200, 202–204, 207, 209–217, 222 level, 203, 204, 209–211, 213, 215, 216, 218 social goal, 165–168, 172–179, 182, 183, 185, 186, 188, 189, 191–195 and broader decision theory, 185 and environmental decisions, 167 economic tradeoffs, 179 intertemporal tradeoffs, 167, 181 reciprocity, 178, 179 tradeoffs among different goals, 180 uncertainty, 180, 181 social identity, 166, 195 social network, 277, 300, 301, 303, 304 stakeholder, 16, 25, 27, 36, 45, 51, 57 stigmergy, 278 subjective expected utility, see expected utility, subjective support theory, 7, 8 sure-thing principle, 234 switching, 71, 288, 309, 318, 320, 321, 323, 326–330 contrarian, 327 inconsequential, 327 myopic, 327 tipping point, 291 tradeoff, 12, 13, 27, 52, 73, 80, 227–231, 238–246, 301, 304 assessments, 11 components, 229 in social goals, see social goal
method, 10 transitive preference, see model, transitive preference traveling salesman problem (TSP), 339–341, 349, 350 Euclidean (E-TSP), 340–344, 346, 351, 354, 356, 357 with obstacles, 341–343, 349–351, 354, 356, 357 triangle inequality, 114–117, 119 two-alternative forced choice, 108 typicality, 232, 234 undermatching, 280, 282, 284–289 unmodeled knowledge, 130, 136, 138 utility, 10, 12–14, 19–21, 26, 37, 39, 60, 96, 99, 100, 103, 104, 126, 167, 179–183, 185, 190, 194, 195, 199–202, 204, 205, 207, 210, 212, 214–222, 235, 239 actual, 218, 219 discounted, see model, discounted utility (DU) expected, see expected utility experienced, 204, 209–211, 214, 216 function, 6, 10, 17, 20, 21, 60, 67, 80–82, 90, 103, 104, 107, 180, 205, 259, 260 maximization, 165
predicted, 218, 219 random, see model, random utility subjective, see expected utility, subjective threshold, 103 vague, 255, 257, 261, 268 gain, 253, 258, 261, 267, 268 loss, 253, 258, 261, 267, 268 outcome, 253–255, 257, 264, 266, 270 probability, 254–256, 261 prospect, 253, 255, 256, 270 mixed(-outcome), 253, 259 vagueness, 253–260, 269–271 aversion, 253, 254, 256, 258, 266, 271 avoidance, 269, 270 coefficient, 257–259, 269 outcome, 256 probability, 256, 258 insensitivity to, 256 preference, see preference, vagueness resolving operation, 256 seeking, 253, 258, 266, 269–271 value attribute, see attribute economic, 42, 58 of information (VOI), 58 utility, see utility value-focused thinking (VFT), 22, 23