The Geometry of Opinion


The Geometry of Opinion: Jeffrey Shifts and Linear Operators. Bas C. van Fraassen. Philosophy of Science, Vol. 59, No. 2 (Jun., 1992), pp. 163-175. Published by The University of Chicago Press.


Philosophy of Science

June, 1992



BAS C. VAN FRAASSEN

Department of Philosophy

Princeton University

Richard Jeffrey and Michael Goldstein have both introduced systematic approaches to the structure of opinion changes. For both approaches there are theorems which indicate great generality and width of scope. The main questions addressed here will be to what extent the basic forms of representation are intertranslatable, and how we can conceive of such programs in general.

1. Expectation. The creation of the modern theory of probability in the correspondence between Fermat and Pascal (1654) was quickly followed by a definitive monograph by Huyghens (1656/57), and by an announcement of its philosophical significance in the Port-Royal Logic (1662): To judge what one must do to obtain a good or avoid an evil, it is necessary to consider not only the good and the evil in itself, but also the probability that it happens or does not happen; and to view geometrically the proportion that all these things have together. This story is well known, as is the rapid assimilation and development of the mathematical theory by the Bernoullis, Buffon, Bayes, and so on (see David 1962, chaps. 8 and 11). I wish here to express a conjecture: that the clue to this rapid development lies in the word "geometrically" in the above quotation. By the time Huyghens wrote his monograph on probability, he and his contemporaries were doing physics with the analytic geometry which Descartes had created only shortly before. This

*Received February 1991; revised April 1991. Send reprint requests to the author, Department of Philosophy, 1879 Hall, Princeton University, Princeton, NJ 08544-1006, USA.

Philosophy of Science, 59 (1992) pp. 163-175. Copyright © 1992 by the Philosophy of Science Association.



new form of geometry is the context in which it makes sense to say that value (expectation, expected value, Huyghens's "value of the hope"; the French term is still "espérance") is geometrically determined. An explanation of this point will provide us here with the right setting and terminology to approach twentieth-century projects of representing opinion change. To limit the scope of this paper, I will throughout concentrate on discrete quantities with finite range, also called "simple" random variables (rv). Examples are the number of dollars earned in a given day, the rainfall rounded off to the nearest centimeter, and the indicator quantity which takes value 1 if there is oxygen present and 0 otherwise. Such quantities may be functions of time or other parameters, but we will assume that the values of all such parameters are set. Each such quantity depends on a division (partition) of possibilities, and has a numerical value in each case listed in that division (in each cell of the partition). At the same time, each case has its own probability. If the cases are A_1, . . ., A_n then quantity X has value x_i in case A_i, which has a certain probability p_i. The expectation value of X is then p_1x_1 + . . . + p_nx_n. In general, if PP is the partition characteristic of quantity X, and p a probability function defined for PP, then the expectation is


E_p(X) = Σ{p(A)x_A : A in PP}


where x_A is the value X takes in case A. All of this is only semiprecise here as yet; I will give more exact definitions below. If we are looking at a single finite partition PP, we can thereby concentrate on the set of rv X such that PP = PP(X), and the probability functions matter only in their restriction to the subfield generated by PP. Hence all of these can be thought of as vectors with the same dimensionality; for example, if PP = {A(1), . . ., A(n)} then we can write

E_p(X) = (p,X)

identifying p and X with the vectors (p(A(1)), . . ., p(A(n))) and (x_1, . . ., x_n), where the inner product, here symbolized by a comma, has algebraic definition ((x_1, . . ., x_n), (y_1, . . ., y_n)) = Σx_iy_i. But this product also has a geometric interpretation:

(X,Y) = |X| |Y| cos(X∠Y)

where X∠Y is the angle between the representing vectors X and Y in the plane which contains both, and |X| is the length of vector X. It is clear then that expectation values can be compared geometrically, in some cases even visually, by considering lengths and angles.



Here is how a simple decision problem could be handled. Suppose that X = (1,3) and Y = (2.5,2) represent the values of the possible outcomes (in states of nature A(1), A(2)) of two actions, and that our probability vector for this partition is p = (0.2, 0.8). Naively construed, at least, the crucial question is which is the greater, E_p(X) or E_p(Y)? We draw a diagram in which each of the three vectors X, Y, p is represented, and then project p on X. The length of the projection is |p|cos(p∠X), which we measure and then multiply by the length |X| to get E_p(X). We make a similar construction and measurement for Y. Note that with the numbers all given, this can all be done on graph paper with pencil and ruler; no separate calculation of cos(p∠X) is involved. But if the lengths of X and Y are nearly equal, for example, we will get our answer just by visually comparing the angles (p∠X) and (p∠Y). In our example, even the crudest graphing shows the lengths of X and Y to be nearly the same (approximately 3.16 and 3.20) and the angle p makes with X much smaller than the one it makes with Y (approximately 5 and 40 degrees). Since the cosine varies inversely with the angle, "closer is better", and E_p(X) must be significantly greater than E_p(Y). The cosines of 5 and 40 degrees are approximately in proportion 1.25 : 1, so this very rough depiction shows already that the expectation of X must be about one and a quarter times as much as that of Y. A much more careful graphing, or analytic calculation, would show the more nearly precise figures: E_p(X) = 2.6 and E_p(Y) = 2.1.
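The construction just described is easy to check numerically. The following sketch (the helper names are my own; the standard library stands in for graph paper) recovers the figures quoted above:

```python
import math

# Van Fraassen's two-outcome decision example, done numerically:
# X, Y are payoff vectors over states A(1), A(2); p is the probability vector.
X = (1.0, 3.0)
Y = (2.5, 2.0)
p = (0.2, 0.8)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def length(u):
    return math.sqrt(dot(u, u))

def angle(u, v):
    # the angle u∠v in degrees, recovered from (u,v) = |u||v|cos(u∠v)
    return math.degrees(math.acos(dot(u, v) / (length(u) * length(v))))

E_X = dot(p, X)   # expectation as inner product with the probability vector
E_Y = dot(p, Y)
print(round(E_X, 10), round(E_Y, 10))              # 2.6 2.1
print(round(length(X), 2), round(length(Y), 2))    # 3.16 3.2
# the "rough graph" angles of about 5 and 40 degrees:
print(round(angle(p, X), 1), round(angle(p, Y), 1))
```

Note that no trigonometry is needed for the decision itself: comparing the two dot products settles it, just as comparing projected lengths does on paper.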

Except in very simple cases, there is no advantage to the visual graphing method. But the important point remains: The basic forecasting tools of the new theory of probability were exactly the basic tools of analytic geometry, the new mathematics which made mathematical physics possible. This way of thinking about the representation of opinion can also be extended beyond the elementary level.

2. The Manifold of Simple rv. Looking at simple rv as vectors automatically raises the question about how they combine with each other through addition, multiplication, and so forth. They form a manifold with a good deal of structure, which can be described as follows. A sample space is a couple S = (K,F) where F is a field of sets that has nonempty set K as its maximal element. The members (points) of K represent different possibilities (events, situations, worlds), and the sets in F are called the measurable sets (also called events, or propositions). A random variable (rv) on S is a map X : K → real numbers, such that for all Borel sets E, the set X^-1(E) = {x in K : X(x) in E} is measurable. If X has only countably many distinct values it is discrete, and if only finitely many values, simple. In those cases the preceding condition takes the equivalent form: the set X^-1({r}) = {x in K : X(x) = r} is measurable for each value r of X.

Let RV(S) be the set of rv on space S. This is a vector space on the field of real numbers. It is closed under the operations of vector addition and scalar multiplication defined by

(X + Y)(x) = X(x) + Y(x) and (aX)(x) = a(X(x))

which makes vector addition associative and commutative, with the properties (definitive of such a vector space):

a(X + Y) = aX + aY, (a + b)X = aX + bX, (ab)X = a(bX), 1X = X, X + φ = X

where φ, the null vector, is defined by φ(x) = 0 for all x in K. In addition, there is a product operation, point-wise multiplication, which I will symbolize with a dot:

(X · Y)(x) = X(x)Y(x).

This is also associative and commutative, and there is a unit element I : I(x) = 1 for all x in K. We have

X · (Y + Z) = (X · Y) + (X · Z)

but there is no multiplicative inverse for most elements (if X(x) = 0 for any x in K at all, then X · Y ≠ I for any Y). However, this point-wise multiplication, despite its irrelevance to the vector space structure, will play a role in some of our constructions below. All of the above holds for the subset Q(S) of RV(S), the set of simple rv on S. Of course there are differences in structure between RV(S) and Q(S), but our focus will be on Q(S) with the larger space here only present as background for discussion. The distributivity of the operations noted above entails that the operations (X · ) and (X + ) are linear. In general an operator V on this manifold is linear exactly if

V(aX + bY) = aV(X) + bV(Y).
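These operations are concrete enough to model directly. A toy sketch (the dict representation and all names are my own, not the paper's) of the vector-space operations and the point-wise dot, with a check that (X · ) is linear:

```python
# Simple rv over a finite sample space K, represented as dicts point -> value.
K = ['k1', 'k2', 'k3']

def add(X, Y):            # vector addition, point-wise
    return {k: X[k] + Y[k] for k in K}

def scale(a, X):          # scalar multiplication
    return {k: a * X[k] for k in K}

def mult(X, Y):           # point-wise product, the paper's "dot"
    return {k: X[k] * Y[k] for k in K}

I = {k: 1 for k in K}     # unit element for the dot
phi = {k: 0 for k in K}   # null vector

X = {'k1': 1, 'k2': 2, 'k3': 3}
Y = {'k1': 4, 'k2': 5, 'k3': 6}
Z = {'k1': 7, 'k2': 8, 'k3': 9}

# (X · ) is linear: X · (2Y + 3Z) = 2(X · Y) + 3(X · Z)
lhs = mult(X, add(scale(2, Y), scale(3, Z)))
rhs = add(scale(2, mult(X, Y)), scale(3, mult(X, Z)))
print(lhs == rhs)         # True
print(mult(X, I) == X)    # True: I is the multiplicative unit
```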




The effect of V on X must be distinguished from point-wise multiplication such as Y · X; the notation "VX" is short for "V(X)". There are many different sorts of linear operators, and the project below will be to single out a family of them that correspond, in a certain way, to changes of opinion (specifically, to changes by Jeffrey Conditionalization). For use below, I will here give two examples of linear operators, which will not fit the description of that family but instead can illustrate certain distinctions. Example 1: Let K be finite, containing exactly the numbers 0, . . ., N, and let F contain all the subsets of K. Let PP be the partition {A_i = {i} : i = 0, . . ., N} and let I_A(i) be the indicator of set A_i : I_A(i)(x) = 1 if x is in A_i and equal to 0 otherwise. All rv on this space are linear combinations of these indicator functions. We can identify a particular linear operator on this space by its effect on these "atoms". For example, the cyclic shift

UI_A(i) = I_A(i+1) for i < N, and UI_A(N) = I_A(0)

extends by linearity to all rv on the space. Example 2: Let K be the unit interval [0,1] and F the (sigma-)field of all its Borel sets. Then if h maps [0,1] into itself, and we define H by

HX(x) = X(h(x))

we find that H too is linear, no matter what function h is like:

H(aX + bY)(x) = aX(h(x)) + bY(h(x)) = (aHX + bHY)(x)

but, of course, whether HX is always an rv does depend on what h is. A somewhat "smaller" example results if we replace F by the field generated by finitary union and intersection from the subintervals of [0,1]. If V is a linear operator and VX = kX, we call X an eigenvector of V corresponding to eigenvalue k. The family of these eigenvectors is the (k-)eigenspace of V; it is again a linear manifold. In Example 1, eigenvectors of U must be as follows (setting N = 2 for clarity):

U(a_0I_A(0) + a_1I_A(1) + a_2I_A(2)) = k(a_0I_A(0) + a_1I_A(1) + a_2I_A(2))

which means that a_2 = ka_0, a_0 = ka_1, a_1 = ka_2. This is true trivially if either k or a_0, a_1, a_2 are zero. It is also true if k = 1 and a_0 = a_1 = a_2, which makes X a multiple of the unit rv I. But it cannot be true any other way (for the equations entail that a_0 = k^2a_2 = k^3a_0, so k^3 = 1 if a_0 is not zero). So U has only two eigenspaces, both trivial: one containing only the null rv φ (the 0-eigenspace) and one containing the multiples of I (the 1-eigenspace). In general, if PP is a measurable partition of K, then it corresponds to a set {I_A : A in PP} of indicator functions. Because the cells in a partition are mutually disjoint we have

I_A · I_B = φ if A ≠ B, and I_A · I_A = I_A.
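Before moving on, the cyclic Example 1 (with N = 2) can itself be checked numerically. A sketch of my own, representing an rv by its triple of coefficients on the atoms A(0), A(1), A(2):

```python
# The "cyclic" Example 1 with N = 2, so K = {0, 1, 2}.
# U shifts the indicators cyclically: U I_A(i) = I_A(i+1 mod 3), so the
# coefficient of I_A(0) in UX comes from a2, that of I_A(1) from a0, etc.
def U(a):
    a0, a1, a2 = a
    return (a2, a0, a1)

I = (1, 1, 1)        # the unit rv
print(U(I) == I)     # True: multiples of I are 1-eigenvectors

# Any other eigenvector is ruled out: a_0 = k^2 a_2 = k^3 a_0 forces k^3 = 1.
# Spot check: no real k turns this X into an eigenvector of U.
X = (1, 2, 3)
print(any(U(X) == tuple(k * x for x in X)
          for k in (-2, -1, 0, 0.5, 1, 2)))   # False
```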

Moreover, if PP is finite, then we also have

Σ{I_A : A in PP} = I and Σ{X · I_A : A in PP} = X

where X is any rv. This discussion of partitions utilizes point-wise multiplication. Apart from that, we can divide Q(S) into finite-dimensional subspaces in various ways. The set of linear combinations of a given set XX of simple rv is the span of XX, a subspace of Q(S). Given any partition PP and any simple Y, we can find a refinement PP' = {A ∩ B : A in PP, B in PP(Y)} on whose cells Y is constant; thus Y belongs to the span of {I_A : A in PP'}. Therefore, for any partition PP, the space Q(S) is the union of the subspaces spanned each by the set of indicators {I_B : B in PP'} of cells of some finite refinement PP' of PP. So although in general Q(S) is not finite-dimensional itself, it can be usefully approached through its finite-dimensional subspaces. Finally, a probability measure p on a sample space S = (K,F) is an additive map of F into [0,1] (sigma-additive if F is a sigma-field) with p(∅) = 0 and p(K) = 1. The triple S(p) = (K,F,p) is a probability space. The expectation value E_p(X) is a linear functional mapping RV(S) into the real numbers:

E_p(aX + bY) = aE_p(X) + bE_p(Y).

This is true also for the extended definition of expectation to all rv (by integration); in the case of simple rv these properties follow readily from the equation E_p(X) = Σ{p(A)x_A : A in PP(X)} with the characteristic partition and values of X symbolized as above.
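Summing up the section computationally, here is a small sketch (the space, measure, and names are my own illustration) of expectation as a linear functional, together with the partition identity that the indicators of a partition sum to the unit rv I:

```python
# Expectation for simple rv on a finite space, with a partition check.
K = ['a', 'b', 'c', 'd']
p = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}

def E(p, X):
    # E_p(X) = sum of p({k}) * X(k) over the points of K
    return sum(p[k] * X[k] for k in K)

def indicator(A):
    return {k: 1 if k in A else 0 for k in K}

# a partition PP of K; the identity Σ{I_A : A in PP} = I, checked point-wise
PP = [{'a', 'b'}, {'c'}, {'d'}]
total = {k: sum(indicator(A)[k] for A in PP) for k in K}
print(total == {k: 1 for k in K})   # True

# E_p is linear: E_p(2X + 3Y) = 2 E_p(X) + 3 E_p(Y)
X = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
Y = {'a': 0, 'b': 1, 'c': 0, 'd': 1}
Z = {k: 2 * X[k] + 3 * Y[k] for k in K}
print(abs(E(p, Z) - (2 * E(p, X) + 3 * E(p, Y))) < 1e-12)   # True
```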

3. Kinematic Shifts. Supposing that opinion is represented by a probability function, and given some constraint on what initial opinion should change to, can we find a rule to construct or select the probability function which represents the final opinion? That is the general problem, and the



first partial solution is associated with the name of Bayes. Suppose the constraint is that the posterior probability of given "evidence" E must be 1; then the posterior probability measure p' is related to the prior p by:

(SC) p'(·) = p(· | E) = p(· ∩ E)/p(E).

This is the rule of Simple Conditionalization, applicable provided p(E) is positive. How such a rule can be justified is discussed in Hughes and van Fraassen (1984), and in van Fraassen (1986, 1989). Richard Jeffrey (1965) created probability kinematics proper by extending this partial solution. Let the constraint on the posterior take the more general form that the members A of a given partition PP should receive probability q_A. Then Jeffrey's rule relates the posterior p' to the prior p by

(JC) p'(·) = Σ{q_A p(· | A) : A in PP}

applicable provided p is positive everywhere on PP, PP is countable, and the numbers q_A are nonnegative and sum to 1. The shift from p to p' induces of course a corresponding shift in the expectation values of random variables. This corresponding shift can be represented geometrically (in linear algebra/geometry) in a way that builds naturally on the foregoing. As before, the discussion will be limited to cases involving finite partitions. The map p → p', defined on those probability functions which give a positive value to each cell of PP, described by (JC), I call a Jeffrey shift on partition PP. Let us now see how this shift affects expectation. Suppose X has characteristic partition PP(X), with X = Σ{x_B I_B : B in PP(X)}. Then we can write the expectation value of X for the prior probability p as

E_p(X) = Σ{x_B p(B) : B in PP(X)} = Σ{x_B p(B ∩ A) : A in PP, B in PP(X)} (3.1)

by the theorem of total probability. We can similarly write

E_p'(X) = Σ{x_B p'(B) : B in PP(X)}
= Σ{x_B p'(B ∩ A) : A in PP, B in PP(X)}
= Σ{x_B p'(A)p'(B | A) : A in PP, B in PP(X)}
= Σ{x_B [q_A/p(A)]p(B ∩ A) : A in PP, B in PP(X)}. (3.2)


We see therefore that what is needed to change the expectation, from prior to posterior, is the systematic insertion of coefficients [q_A/p(A)], which depend solely on the given prior and the nature of the shift. Done properly, this insertion changes the last line of (3.1) to the last line of (3.2); hence on the probability space S(p) = (K,F,p) the Jeffrey shift can be duplicated by the effect of a linear operator V^q on the rv on S.
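Jeffrey's rule (JC) itself is a one-line computation per point of a finite space. A minimal sketch (the particular space, partition, and posterior numbers q_A are my own illustration):

```python
# Jeffrey Conditionalization: p'(.) = Σ{q_A p(. | A) : A in PP}.
K = ['a', 'b', 'c', 'd']
p = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}
PP = [{'a', 'b'}, {'c', 'd'}]   # the partition, with p positive on each cell
q = [0.7, 0.3]                  # posterior probabilities q_A for the cells

def jeffrey_shift(p, PP, q):
    post = {}
    for qA, A in zip(q, PP):
        pA = sum(p[k] for k in A)       # prior probability of cell A
        for k in A:
            post[k] = qA * p[k] / pA    # q_A * p({k} | A)
    return post

p2 = jeffrey_shift(p, PP, q)
print(round(sum(p2.values()), 10))      # 1.0: a probability measure again
# the cells get exactly their assigned posterior probabilities:
print(round(p2['a'] + p2['b'], 10), round(p2['c'] + p2['d'], 10))   # 0.7 0.3
# odds inside a cell are preserved: p'(a)/p'(b) = p(a)/p(b)
print(abs(p2['a'] / p2['b'] - p['a'] / p['b']) < 1e-12)             # True
```

The preserved within-cell odds are the signature of a Jeffrey shift, a point that becomes important in the discussion of Example 2 below.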



Recalling that I_A is the indicator which takes value 1 on A and value 0 elsewhere, we note that p(A) = E_p(I_A). Hence we can rewrite (3.2) above as (where now all sums will be over all free indices, with A ranging over PP and B over PP(X)):

E_p'(X) = E_p(Σ[q_A/p(A)]x_B I_B · I_A) = E_p(Σ[q_A/p(A)]X · I_A)

which compares to E_p(X) = E_p(ΣX · I_A) as posterior to prior expectation value of X. Defining the operator V^q by:

V^qX = Σ{[q_A/p(A)]X · I_A : A in PP} (3.3)

we deduce accordingly that

E_p'(X) = E_p(V^qX) for all rv X. (3.4)
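The duplication claim, E_p'(X) = E_p(V^qX), can be tested numerically. A sketch (the toy space and all names are my own illustration):

```python
# The operator V^q X = Σ{[q_A/p(A)] X · I_A : A in PP}, checked against
# the Jeffrey-shift posterior p'.
K = ['a', 'b', 'c', 'd']
p = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}
PP = [{'a', 'b'}, {'c', 'd'}]
q = [0.7, 0.3]

def E(p, X):
    return sum(p[k] * X[k] for k in K)

def Vq(X):
    out = {}
    for qA, A in zip(q, PP):
        pA = sum(p[k] for k in A)
        for k in A:
            out[k] = (qA / pA) * X[k]   # multiply by u_A = q_A/p(A) on cell A
    return out

# the Jeffrey-shift posterior, as in (JC)
p2 = {k: (qA / sum(p[j] for j in A)) * p[k]
      for qA, A in zip(q, PP) for k in A}

X = {'a': 5, 'b': -1, 'c': 2, 'd': 0}
# posterior expectation equals prior expectation of the transformed rv
print(abs(E(p2, X) - E(p, Vq(X))) < 1e-12)   # True
```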


This is the basic correspondence between a Jeffrey shift characterized by the partition PP and associated numbers q_A : A in PP, and an operator V^q on the vector space RV(S) or Q(S). We must now investigate its properties. First of all, V^q has the numbers [q_A/p(A)] as eigenvalues, with at least the corresponding eigenvectors:

V^qI_B = [q_A/p(A)]I_B if B is a subset of cell A of PP.

For in that case I_B · I_A = I_B, and I_B · I_C = φ for every other cell C of PP. Let us abbreviate:

u_A = [q_A/p(A)].

Thirdly, the indicated eigenvalues are the only ones (other than zero), for suppose V^qX = kX. Then kX = Σu_A X · I_A, so if x is in A then kX(x) = u_A X(x), for all A in PP. This is possible only if either k = u_A or else X(x) = 0. Hence if X is not φ then k = u_A for some A in PP. The eigenvalues u_A are restricted because of the special characteristics of the numbers q_A and p(A). Since q_A = u_A p(A), and the former numbers



are positive and sum to one, this is also true of the numbers u_A p(A). But because p assigns numbers in [0,1], their sum 1 is a weighted average of the eigenvalues u_A, so 1 must lie in the interval spanned by them. The eigenvalues "surround" 1, in the sense that if any of them is higher (lower) than 1 there is also an eigenvalue lower (higher) than 1. The eigenvectors of V^q span the space Q(S). For suppose Y = Σ{y_B I_B : B in PP(Y)}. Then also Y = Σ{y_B I_B · I_A : B in PP(Y), A in PP}. But I_B · I_A = I_{B∩A}, the indicator of a subset of cell A in PP. Hence Y is a linear combination of eigenvectors of V^q. This is of course related to the observation near the end of the previous section that Q(S) is the union of its finite-dimensional subspaces associated with finite partitions. We can sum this up: (I) Let q map finite partition PP (on which p is positive) into a set of positive real numbers summing to 1, and let p' be defined by (JC). Then the operator V^q defined by (3.3) satisfies (3.4) and is linear, with eigenvalues "surrounding" the number 1, and with all indicators of measurable subsets of cells of PP as eigenvectors. The eigenvectors of V^q span the space Q(S) of simple rv. We know from our examples that not all linear operators fit this sort of picture. There is however a complication which allows some other operators to play the role of V^q in the basic correspondence described by (3.4). If the field of measurable sets is finite, it is possible to use the partition formed by its atoms to describe almost any probability function as obtained by a Jeffrey shift. Therefore even our "cyclic" Example 1 can correspond there to a Jeffrey shift. In larger probability spaces that will not be so. For that reason, let us expand for a moment on Example 2. Let h and H be as described, and ask if there can be a finite partition PP and assignment q of posterior probabilities to its cells such that the Jeffrey shift yielding posterior p' is such that

E_p'(X) = E_p(HX) for all rv X.

It is characteristic of the Jeffrey shift that odds are preserved inside the cells, that is,

p'(B)/p'(C) = p(B)/p(C)

for subsets B, C of the same cell A. So we would need here

E_p(HI_B)/E_p(HI_C) = p(B)/p(C)

for subsets B, C of A. But HI_B(x) = 1 iff I_B(h(x)) = 1 and zero otherwise,



so HI_B = I_{Bh} where Bh = {x : h(x) is in B}. So, with p being Lebesgue measure in our examples, we require

(3.5) p(Bh)/p(Ch) = p(B)/p(C)

for subsets B, C of the same cell. To continue we should now try to choose h so that HX is still an rv if X is, and such that (3.5) will not be true, for certain subsets B, C of any cell A of any relevant finite partition. Let us take h(x) = x^0.5. If A is a measurable subset of [0,1], then so is Ah = {y : y^0.5 in A} = {z^2 : z in A}. Now if A has positive measure, then it must have the intervals [m/2^(n+2), m/2^(n+1)] and [m/2^(n+1), m/2^n] as subsets for some numbers m and n, since these fractions are dense in the interval. The measures of these two intervals are in proportion 1 : 2. Their images under squaring, however, are in proportion 1 : 4. Therefore equation (3.5) does not hold for all measurable subsets of any Borel subset of [0,1] with positive measure. Accordingly we have here a linear operator which cannot correspond to any Jeffrey shift in the requisite way.
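The 1 : 2 versus 1 : 4 computation can be verified directly (a sketch; the particular m and n are an arbitrary choice keeping both intervals inside [0,1]):

```python
# Two adjacent dyadic intervals have measures in proportion 1 : 2,
# but their images under squaring (i.e. under x -> x^2, which is how
# h(x) = x^0.5 pulls back sets) are in proportion 1 : 4.
m, n = 3, 2   # any m, n with both intervals inside [0,1]
B = (m / 2 ** (n + 2), m / 2 ** (n + 1))   # length m/2^(n+2)
C = (m / 2 ** (n + 1), m / 2 ** n)         # length m/2^(n+1)

def measure(iv):
    return iv[1] - iv[0]

def square(iv):
    # image of an interval under x -> x^2; this is Bh = {x : x^0.5 in B}
    return (iv[0] ** 2, iv[1] ** 2)

print(measure(C) / measure(B))                   # 2.0
print(measure(square(C)) / measure(square(B)))   # 4.0
```

Since the Jeffrey shift must preserve within-cell odds, no assignment of cell probabilities can reproduce this distortion.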

4. Inner Product. The identification of a linear operator on rv corresponding to a Jeffrey shift on a given probability space calls to mind Michael Goldstein's program (1981, 1987). In that program, opinion change in general is represented by self-adjoint operators on the space of rv. To establish a further bridge, it is necessary to introduce Goldstein's inner product (which he studied following on a suggestion of De Finetti). An inner product on a real vector space is a symmetric bilinear form; that is, it is a binary operation, mapping vectors into scalars, such that

(X,Y) = (Y,X) and (aX + bY,Z) = a(X,Z) + b(Y,Z).

It follows that (φ,φ) = 0. The inner product is positive definite exactly if (X,X) > 0 whenever X ≠ φ (positive semidefinite if (X,X) ≥ 0 for all X). The square root of (X,X) is |X|, the norm or length of X; X is orthogonal to Y iff (X,Y) = 0. Let S(p) = (K,F,p) be a probability space and define

(X,Y) = E_p(X · Y).


Then ( , ) is an inner product on RV(S) and on Q(S). It is positive semidefinite, and it is positive definite if p is "strictly coherent", that is, assigns zero only to the null set. Otherwise, if A is a nonempty measurable set with zero measure, then I_A ≠ φ but (I_A, I_A) = 0. Strict coherence is of course an exception, and this fact distances our subject somewhat from geometry, where being positive definite is often incorporated in the definition of inner product. (Compare Curtis 1990, which does, and Kaplansky



1974, which does not.) It is possible here to switch from probability spaces to probability algebras, which result if the field of measurable sets is reduced modulo differences of measure zero. (Compare Kaplansky 1974, 4, exercises 2 and 3.) This is a natural move and restores the geometric character without loss, but for now we will keep that in reserve. It is possible in our case to describe more precisely the set of exceptions which would make the inner product fail to be positive definite on the space Q(S) of simple rv. The radical of the inner product is the set of vectors X such that (X,Y) = 0 for all Y. This is a subspace. If X is in the radical then (X,X) = 0. But if X is a simple rv and (X,X) = 0 then X is in the radical. For let X = Σx_A I_A, so (X,X) = Σ(x_A)^2 p(A). This equals zero only if for every A in PP(X), either x_A or p(A) equals zero. But then for Y = Σy_B I_B we have (X,Y) = E_p(Σx_A y_B I_A · I_B) = Σx_A y_B p(A ∩ B) = 0 too, since each term vanishes: either x_A = 0 or p(A ∩ B) ≤ p(A) = 0. Thus the radical is at the same time the set of exceptions to positive definiteness. On a real inner product space, an operator V is self-adjoint if (X,VY) = (VX,Y) for all X,Y. Our Example 1 is not self-adjoint: (UI_A(1), I_A(2)) = (I_A(2), I_A(2)) = p(A(2)), but (I_A(1), UI_A(2)) = (I_A(1), I_A(3)) = p(A(1) ∩ A(3)) = 0. But the linear operator V^q associated with a Jeffrey shift is self-adjoint:

(V^qX, Y) = (Σu_A X · I_A, Y)
= Σu_A E_p(X · I_A · Y)
= Σu_A E_p(X · (Y · I_A))
= E_p(X · Σu_A Y · I_A)
= (X, V^qY).


We can therefore modify our finding (I) by replacing "is linear" with "is linear and self-adjoint"; let us call the result (II).
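Finding (II) can be spot-checked numerically: under the inner product (X,Y) = E_p(X · Y), a Jeffrey-shift operator passes the self-adjointness test while the cyclic U of Example 1 fails it. (A sketch; the space and numbers are my own illustration.)

```python
# Self-adjointness under Goldstein's inner product on a three-point space.
K = [0, 1, 2]
p = {0: 0.2, 1: 0.3, 2: 0.5}

def ip(X, Y):                 # (X,Y) = E_p(X · Y)
    return sum(p[k] * X[k] * Y[k] for k in K)

# the cyclic U of Example 1: U I_A(i) = I_A(i+1 mod 3)
def U(X):
    return {0: X[2], 1: X[0], 2: X[1]}

# V^q for the partition of atoms, with posterior q; eigenvalues q_k/p_k
q = {0: 0.5, 1: 0.1, 2: 0.4}
def Vq(X):
    return {k: (q[k] / p[k]) * X[k] for k in K}

X = {0: 1, 1: 2, 2: 3}
Y = {0: -1, 1: 0, 2: 2}
print(abs(ip(Vq(X), Y) - ip(X, Vq(Y))) < 1e-12)   # True: self-adjoint
print(ip(U(X), Y) == ip(X, U(Y)))                 # False: U is not
```

Note also that the eigenvalues q_k/p_k here (2.5, 1/3, 0.8) "surround" 1 in the required sense.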

5. Representation Theory. We would like some assurance that in the above discussion we have identified in principle all the relevant geometric features of Jeffrey Conditionalization. The best assurance would be to have theorems that say, "An inner product space of a certain sort with a linear operator V as described above corresponds to a probability space with a Jeffrey shift in the indicated way". We can conclude from the above discussion that the central case will be the correspondence between a Euclidean space and a strictly coherent probability space. I will take up this "central" case. A Euclidean space is a finite-dimensional real vector space with positive-definite inner product. The inner product can be identified in terms of any orthonormal basis B: vector X = Σ{(X,Y)Y : Y in B} and (X,Z) = Σ{(X,Y)(Z,Y) : Y in B}. We have the following theorem (see Kaplansky 1974, 58):



THEOREM 5.1. A linear operator V on a Euclidean space is self-adjoint if and only if the space has an orthonormal basis consisting of eigenvectors of V.

Let a Euclidean structure be a Euclidean space together with a self-adjoint linear operator on that space, whose eigenvalues "surround" the number 1, in the sense we had above, and which is itself strictly positive (the space has a basis consisting of eigenvectors corresponding to positive eigenvalues). As a particular Euclidean structure, let SE be the real vector space with inner product ( , ) and let T be the designated linear operator. Let B be an orthonormal basis of SE consisting of eigenvectors of T, and let TY = t_Y Y for Y in B, with t_Y positive and {t_Y} surrounding 1, as indicated. We now construct a probability space as follows:

K = B (the basis vectors themselves serve as the points);

F = class of all subsets of K; p is a probability measure on F such that p(X) = 0 only if X is empty, and Σ{p({Y})t_Y : Y in B} = 1. The choice of p is possible because of the restriction on the eigenvalues of T, and indeed, many such choices are possible. In this case, since K is finite, RV(S) = Q(S) and we look for a correspondence between SE and RV(S), endowed with Goldstein's inner product. Define the map g:

(a) if Y is in B, g(Y) = p({Y})^(-1/2) I_{{Y}},
(b) g(Σ{a_Y Y : Y in B}) = Σ{a_Y g(Y) : Y in B}.

It is clear that g is 1-1, and is onto RV(S), although the unit basis vectors do not correspond to the unit subsets of K. However, Goldstein's inner product now works as follows, when g(X) = Σa_Y g(Y) and g(Z) = Σb_Y g(Y), with the sums over Y in B:

(g(X), g(Z)) = Σa_Y b_Y E_p(g(Y) · g(Y)) = Σa_Y b_Y = (X,Z).

So the correspondence g preserves inner product: it is an isomorphism, so that RV(S) with Goldstein's inner product is also a Euclidean space.
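The map g can be checked in a small case: the images g(Y) of the basis vectors come out orthonormal under Goldstein's inner product, as the isomorphism claim requires. (A sketch; the three-point space and weights are my own illustration.)

```python
import math

# g(Y) = p({Y})^(-1/2) I_{{Y}}: basis vectors double as sample points (K = B).
B = ['Y1', 'Y2', 'Y3']
p = {'Y1': 0.2, 'Y2': 0.3, 'Y3': 0.5}

def g_basis(Y):
    # the rv that is p({Y})^(-1/2) at the point Y and 0 elsewhere
    return {k: (1 if k == Y else 0) / math.sqrt(p[Y]) for k in B}

def ip(X, Z):
    # Goldstein's inner product (X,Z) = E_p(X · Z)
    return sum(p[k] * X[k] * Z[k] for k in B)

# the Gram matrix of the images should be the identity
gram = [[round(ip(g_basis(Y), g_basis(Z)), 12) for Z in B] for Y in B]
print(gram == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
```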



Next, we can define q_Y = t_Y p({Y}) to identify the corresponding Jeffrey shift:

p'({Y}) = q_Y for each Y in B

which is possible (the numbers q_Y are positive and sum to 1) because of the restrictions observed on the numbers t_Y and the measure p. We have therefore in every desired respect an exact correspondence between the Euclidean structure and a probability space with Jeffrey shift.

REFERENCES

Curtis, M. L. (1990), Abstract Linear Algebra. New York: Springer-Verlag.

David, F. N. (1962), Games, Gods and Gambling. New York: Hafner.

De Finetti, B. (1974), Theory of Probability: A Critical Introductory Treatment, vol. 1. Translated by A. Machi and A. Smith. London: Wiley.

Goldstein, M. (1981), "Revising Previsions: A Geometric Interpretation", Journal of the Royal Statistical Society B43: 105-130.

Goldstein, M. (1987), "Can We Build a Subjectivist Statistical Package?", in R. Viertl (ed.), Probability and Bayesian Statistics. New York: Plenum, pp. 203-215.

Hughes, R. I. G. and van Fraassen, B. (1984), "Symmetry Arguments in Probability Kinematics", in P. Kitcher and P. Asquith (eds.), PSA 1984, vol. 2. East Lansing, MI: Philosophy of Science Association, pp. 851-869.

Jeffrey, R. C. (1983), Logic of Decision, 2d ed. Chicago: University of Chicago Press.

Kaplansky, I. (1974), Linear Algebra and Geometry. New York: Chelsea.

van Fraassen, B. (1986), "A Demonstration of the Jeffrey Conditionalization Rule", Erkenntnis 24: 17-24.

van Fraassen, B. (1989), Laws and Symmetry. Oxford: Oxford University Press.