ESSAYS IN SYNTACTIC THEORY
The essays in this important collection explore wide-ranging aspects of the syntax and semantics of human languages. The book also examines methods in generative linguistics, and the role of rules vs. principles in syntactic theory. Key topics covered in the work include:
• movement phenomena and the syntax of logical form
• the representation and semantic interpretation of certain empty categories
• the formation and representation of A-chains, Ā-chains and verb-chains.
The book presents a bold hypothesis concerning methods of determining unification vs. non-unification of syntactic principles, based on comprehensive theoretical case-study. It is argued that Case and Theta subsystems are in fact distinct, and thus unification should not be sought. In addition, the book addresses the fundamental question of whether syntax is rule-based vs. principle-based, and advances a rule-based, “UNprincipled,” derivational approach. Essays in Syntactic Theory makes a vital contribution to substantive and methodological debates in linguistic theory, and should therefore be of interest to any serious scholar of the discipline.
Samuel David Epstein is currently an associate Professor at the University of Michigan, and taught linguistics at Harvard University for nine years. He is co-editor of the journal Syntax, and co-author of A Derivational Approach to Syntactic Relations.
ROUTLEDGE LEADING LINGUISTS Series editor Carlos Otero 1 ESSAYS ON SYNTAX AND SEMANTICS James Higginbotham 2 PARTITIONS AND ATOMS OF CLAUSE STRUCTURE Subjects, Agreement, Case and Clitics Dominique Sportiche 3 THE SYNTAX OF SPECIFIERS AND HEADS Collected Essays of Hilda J.Koopman Hilda J.Koopman 4 CONFIGURATIONS OF SENTENTIAL COMPLEMENTATION Perspectives from Romance Languages Johan Rooryck 5 ESSAYS IN SYNTACTIC THEORY Samuel David Epstein 6 ON SYNTAX AND SEMANTICS Richard K.Larson 7 COMPARATIVE SYNTAX AND LANGUAGE ACQUISITION Luigi Rizzi
ESSAYS IN SYNTACTIC THEORY Samuel David Epstein
London and New York First published 2000
First published 2000 by Routledge 11 New Fetter Lane, London EC4P 4EE Simultaneously published in the USA and Canada by Routledge 29 West 35th Street, New York, NY 10001 Routledge is an imprint of the Taylor & Francis Group This edition published in the Taylor & Francis e-Library, 2005. “To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.” © 2000 Samuel David Epstein All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging in Publication Data Epstein, Samuel David. Essays in Syntactic Theory/Samuel David Epstein. p. cm. Includes bibliographical references and index. 1. Grammar, Comparative and general—Syntax. 2. Generative Grammar. I. Title. P291.E66 2000 415–dc21 99–058651 CIP ISBN 0-203-45470-7 Master e-book ISBN
ISBN 0-203-76294-0 (Adobe eReader Format) ISBN 0-415-19235-8 (alk. paper)
THIS BOOK IS DEDICATED TO MOLLY AND SYLVIE
CONTENTS
Acknowledgments
Introduction
1 A Note on Functional Determination and Strong Crossover
2 Quantifier-pro and the LF Representation of PROarb
3 The Local Binding Condition and LF Chains
4 Adjunction and Pronominal Variable Binding
5 Quantification in Null Operator Constructions
6 Differentiation and Reduction in Syntactic Theory: A Case Study
7 Derivational Constraints on Ā-Chain Formation
8 Overt Scope Marking and Covert Verb-Second
9 “UN-Principled” Syntax and the Derivation of Syntactic Relations
Index
ACKNOWLEDGMENTS
I am indebted to the following people for extremely helpful comments about earlier versions of the introduction: Noam Chomsky, Joshua Epstein, Jon Gajewski, Sam Gutmann, Elaine McNulty, Steve Peter, and Daniel Seely. For preparing the manuscript, I am unimaginably grateful to Steve Peter— who did it all under very difficult circumstances. (Thanks to Christer Platzack for making materials available to Steve.) I also thank Liz Brown, Senior Editorial Assistant at Routledge, for her patience, invaluable assistance, and for her invariably kind and encouraging email messages. Last, but by no means least, I am indebted to Carlos Otero, the series editor. For granting me/Routledge permission to reprint these articles here, I am grateful to Jay Keyser and MIT Press (for “Quantifier-pro and the LF Representation of PROarb,” “The Local Binding Condition and LF Chains,” “Adjunction and Pronominal Variable Binding,” “Quantification in Null Operator Constructions,” “Derivational Constraints on Ā-Chain Formation,” and “Overt Scope Marking and Covert Verb-Second”), to Kluwer Academic Publishers (for “Differentiation and Reduction in Syntactic Theory: A Case Study”), and to Foris Publications/Mouton de Gruyter (for “A Note on Functional Determination and Strong Crossover”). I am also grateful to MIT Press for granting me permission to reprint here “‘UN-Principled’ Syntax and the Derivation of Syntactic Relations,” which appeared in Working Minimalism (MIT Press, 1999). Thanks also to Oxford University Press for allowing this paper, very similar to chapter 1 of Epstein, Groat, Kawashima, and Kitahara’s A Derivational Approach to Syntactic Relations (1998), to appear here.
INTRODUCTION
1 The (Mentalistic) Framework
The publications appearing in this volume are the result of research conducted between 1984 and 1997. Each article appearing in this volume reflects research conducted within the generative theory of linguistics, pioneered by Noam Chomsky. The analyses are conducted within the syntactic frameworks of Chomsky’s Government and Binding Theory, and subsequent modifications of it, currently culminating in the so-called “Minimalist Program”. The central goal of the generative enterprise remains constant, regardless of the framework adopted: to construct an explanatory theory of human knowledge of language, specifying the contributions of the environment on the one hand and of the organism (biology) on the other, in determining the linguistic knowledge-states attained (attainable) by a normal human being under the standard environmental conditions under which such knowledge develops or, perhaps more accurately, “grows”. Such a knowledge-state (for example, knowing English, or for example, knowing Tagalog) constitutes one property of the overall mental state, or mind, of the person in such a linguistic knowledge-state. Thus, linguistics is said to be “mentalistic”. Linguistic mentalism is thought to confront a serious problem, namely the so-called mind-body problem, with respect to which I adopt Chomsky’s perspective and therefore believe that there simply is no problem; hence, seeking a solution to “it” is misguided. The “problem” amounts to the following question, which constitutes the linguistic variant of the more general problem, Why is it that when we cut open someone’s head—someone who knows, for example, French—we do not find the rule systems for French postulated by linguists and attributed by linguists to people who know French?
The question—in particular, its materialist perspective—seems to me no more worrisome than the following, Why, when we dig up the earth, do we not find gravitational forces—or whatever alleged property of the earth it is that is hypothesized to exert such forces? No serious science requires that all its postulates refer to tangible, material entities. Thus, I don’t expect to find rules in your head, in the same way that I do not expect to find gravitational forces in the dirt. At the same time, I am willing to (tentatively) attribute to the earth gravitational forces which, by hypothesis, the earth exerts, since I might be able to explain certain phenomena by doing so. I am, I think similarly, willing to (tentatively) attribute rules of French to someone who knows French, in order to explain or describe what it is they know (and how they use this knowledge when engaged in linguistic performance; for example, speech comprehension). As Chomsky (1996, 45) puts it:
It is mere stipulation to include [as physical entities] gravitational attraction, fields, Kekule’s structural formulas, curved space-time, quarks, superstrings etc., but not the processes, events, entities and so on postulated in the study of mental aspects of the world.
As Wilhelm von Humboldt said it:
The demand of science, in all its manifold appearances, is always the recognition of the invisible in the visible. (Cowan (1963, 106/423), from: Über die Bedingungen, unter denen Wissenschaft und Kunst in einem Volke gedeihen (On the Proper Conditions for Science and Art, 1814 [fragment]))
Indeed, some of the roots of the generative revolution in linguistics, within which the mentalistic study of language is pursued in a manner entirely consistent with Humboldt’s conception of science, are traced back, by Chomsky, not only to Humboldt, but to Descartes:
We have so far extracted from “Cartesian linguistics” certain characteristic and quite important doctrines regarding the nature of language and have, quite sketchily, traced their development during the period from Descartes to Humboldt. As a by-product of this study of “langue,” and against the background of rationalist theory of mind, certain views emerged as to how language is acquired and used…The central doctrine of Cartesian linguistics is that the general features of grammatical structure are common to all languages and reflect certain fundamental properties of the mind. (Chomsky (1966, 59))
Descartes’s philosophy, in turn, has been characterized as follows:
That Cartesianism was, most characteristically, a philosophy of science of the New Science is the key, not only to its continuing relevance, but to both its “revolutionary” anti-Scholasticism and its Humanism. The New Science, essentially committed to an apparatus of “ideal,” physically unrealizable systems—the frictionless plane, the perfect sphere, the freely falling body—was, intrinsically, very loosely associated to experiment and virtually independent of sheer observation. If Galileo ever did, in fact, drop leaden and wooden balls from the Leaning Tower, he most assuredly was not looking for experimental verification or confirmation of his theory of motion; he could only have done it in an effort to convince those recalcitrant Aristotelian dunderheads of the ruinous consequences for science of their intransigent and unsophisticated observationalism, which, blinded by such immediately apparent motions as the upward and downward movements of real fire and actual earth, could not see profoundly important universal laws of motion, unrealized, in any simple way, in any actual motion. (Joseph Epstein (1965, Introduction))
In conclusion, I hope that you, the reader, will find the following decidedly non-material material neither immaterial, nor without substance!
2 The Articles
The articles appear in chronological order. Here, I hope to clearly review each analysis, discuss some relations among them, and point out their relations to other analyses and to my own current thinking about the broader issues raised. In writing this introduction, I realized that the first article, “A Note on Functional Determination and Strong Crossover,” has something in common with the last, “‘UN-Principled’ Syntax and the Derivation of Syntactic Relations”. Each presents arguments that certain representational constructs are inadequate, and that one empirically and conceptually preferable theory instead makes reference to properties of the derivation; that is, to ordered rule application, as opposed to representations. The first of these two articles suggests that the Functional Determination algorithm, coupled with a representational definition of “variable,” is in certain respects inadequate to the task of accounting for Strong Crossover (SCO) configurations, by appeal to (only) Conditions A and B of the Binding Theory. That is, these algorithms which apply to representations in order to determine the feature content of certain categories appearing in these representations overgenerate certain Strong Crossover configurations. Notice also that intermediate representations are in fact implicitly postulated even in this “representational approach”—namely, those representations containing empty categories whose featural content is not yet established. The Functional Determination algorithm then applies to these representations and maps them into another representation in which the feature content of the empty categories is specified. Thus, implicitly there is D-Structure, transformations, another structure, application of the Functional Determination algorithm, and ultimately S-Structure.
This now seems to me derivational, implicitly postulating transformed intermediate representations (neither D-Structure nor S-Structure) from which the application of Functional Determination derives the S-Structure representation. It is suggested that one alternative account, in which the feature content of an “empty” (i.e. phonologically null) category is determined by the formal properties of the rule that created it, avoids the problem confronting an analysis incorporating both Functional Determination and the representational definition of “variable”. This “Intrinsic Feature” account, whereby the move(r) determines properties of the “trace,” accords with current conceptions of trace theory, in particular the copy theory as presented in Chomsky (1995). (But for further discussion of trace theory, and its possible elimination, by appeal to derivational mechanisms, see below.) The last article, “‘UN-Principled’ Syntax and the Derivation of Syntactic Relations,” is in a sense similar, although more far-reaching, in that it argues against defining any syntactic relations on trees/representations, and suggests instead that such relations might be explained by appeal to the independently-motivated, partially-ordered rule applications that constitute the Minimalist derivation. This derivational approach to syntactic relations is explored further in Epstein, Groat, Kawashima and Kitahara (1998) and is being investigated in ongoing research. This theory of syntactic relations extends the rule-based Minimalist approach in an attempt to derive syntactic relations from the properties of the derivation—certainly preferable to defining relations on trees without explaining why these particular definitions/tree configurations (and not any of the infinite number of definable others) are syntactically significant.
Similarly filters/principles, that is, definitions of what constitutes an ill-formed property of a tree, are targeted for elimination, and the Y-model itself, with the two interface levels derived only at the end of the line—that is, posttransformationally—is significantly revised, yielding a level-free system with interpretation of each transformational operation—that is, derivational interpretation—with the possible elimination of trace theory, chain theory, or some subparts of these systems of representation. The idea is that traces and chains (among other representational postulates) are induced by the Y-model’s postponement of interpretation until after transformational application is complete, necessitating representational annotations (e.g. chains
and traces) that allow important aspects of the derivational history to be encoded in the representational output. (See also Epstein and Seely (1999) and Kitahara (forthcoming) for some of the more recent analyses exploring this derivational approach.) As claimed by Epstein and Seely (1999), I think one central question regarding this approach, as compared to “standard Minimalism,” is not, Which is preferable, the representational or the derivational theory? The standard Minimalist theory is derivational, since copy traces and chains, among perhaps other postulates, encode properties of the derivation, including where a category was at an earlier stage of the derivation and where the rule Move has moved it to. So the more salient question becomes, Which of the two derivational theories is preferable—one eliminating traces and chains and all such encoding mechanisms, if possible, or one which is inexplicably “mixed,” since it is: (i) “enriched” with these seemingly redundant (and perhaps also empirically problematic) representational/notational encodings of the derivational history, while (ii) concurrently incorporating “truly” derivational concepts such as the transformational rules Merge and Move/Attract, feature deletion (occurring after one, but before another, transformational application), the Cycle, etc.? The first article, “Functional Determination and Strong Crossover,” not only relates to the last, as just noted, but is also related to the sixth, “Differentiation and Reduction in Syntactic Theory”. As noted, the first discusses potential empirical problems confronting an elegant, and, in a sense, eliminative analysis of Strong Crossover, that is, one that eliminates appeal to Condition C of the Binding Theory in accounting for SCO. 
“Differentiation and Reduction in Syntactic Theory” is also concerned with problems confronting another elegant eliminative analysis, namely the Visibility analysis seeking to eliminate the Case Filter—a worthy goal, since filters are, by their very nature, non-explanatory, describing (without explaining) what is an illicit representation. The article constitutes a case study of a broader methodological issue confronting linguistic research: namely how we might determine which rules/principles exclude which data. In the case study, the question is, How do we determine whether a particular string is to be correctly ruled out by the Case Filter or by the Theta Criterion? Notice, if we were concerned only with (the set-theoretic conception of) E-language (as explicated in the sense of Chomsky (1986)), the question is irrelevant; that is, we don’t care how we rule out any particular ungrammatical string; we just want to rule them all out in any way we can. With the shift of focus from E-language to I-language, whereby linguistics is properly construed as a cognitive science, seeking to formally characterize human knowledge of language correctly, it does matter how we rule things out. For example, a theory that rules out each of the following on exactly the same grounds—that is, by exactly the same formal apparatus—we would assume is wrong:
(1) I hope John to sleep.
(2) John is likely that Bill sleeps.
(3) I hope John to be likely that Bill sleeps.
A (unified) theory in which each of these strings is ruled out by the same single principle P should, I think, be assumed to be an inadequate theory. Since there seem to be different degrees and kinds of ungrammaticality, it is incumbent upon the linguist to characterize them correctly. A differentiated, modularized syntactic theory, with many different rules/laws and components, can in principle distinguish what should be distinguished.
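The contrast between a differentiated and a reduced theory can be made concrete with a small sketch. It is only an illustration under stated assumptions: the feature annotations for "John" in (1)-(3) are supplied by hand (they reflect one standard reading of these examples, not anything computed), and the names `NPStatus`, `differentiated`, and `reduced` are hypothetical. The reduced function models a Visibility-style theory in which a Caseless NP cannot be theta-marked, so that every failure surfaces as a Theta Criterion violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NPStatus:
    """Hand-annotated status of the NP 'John' in one example (assumed, not computed)."""
    has_case: bool
    has_theta_role: bool


# Assumed annotations for 'John' in examples (1)-(3).
EXAMPLES = {
    1: NPStatus(has_case=False, has_theta_role=True),   # I hope John to sleep.
    2: NPStatus(has_case=True, has_theta_role=False),   # John is likely that Bill sleeps.
    3: NPStatus(has_case=False, has_theta_role=False),  # I hope John to be likely that Bill sleeps.
}


def differentiated(status):
    """Two independent modules: each reports its own violation."""
    violations = set()
    if not status.has_case:
        violations.add("Case Filter")
    if not status.has_theta_role:
        violations.add("Theta Criterion")
    return frozenset(violations)


def reduced(status):
    """Visibility-style reduction: an NP without Case is invisible for
    theta-marking, so any failure is reported as a Theta Criterion violation."""
    theta_marked = status.has_theta_role and status.has_case
    return frozenset() if theta_marked else frozenset({"Theta Criterion"})


for number, status in EXAMPLES.items():
    print(number, sorted(differentiated(status)), sorted(reduced(status)))
```

Under these assumptions the differentiated theory assigns three distinct diagnoses to (1)-(3), while the reduced theory assigns the same diagnosis to all three, which is the loss of distinctions at issue.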
For example, a theory with both the Case Filter and the Theta Criterion distinguishes Case Filter violations from Theta Criterion violations from violations of both (from violations of neither), whereas an eliminative theory reducing the former to the latter cannot, by definition, make such distinctions. There is evidence indicating that these three types of violations are different from one another, and hence the reduced theory seems empirically inadequate, since it fails to distinguish them. More
generally, we should not blindly seek reduction, or invariably seek to “eliminate extra machinery”. Rather, we should seek the right theory, which makes the right distinctions, thereby correctly accounting for what seems to be differentiated human knowledge of language. It is worth noting here that in Chomsky (1995) there is another (elegant) attempt to explain the Case Filter. There it is argued that Case, being uninterpretable, cannot appear in LF representation and therefore must be checked (eliminated) before LF. As Epstein and Seely (1999) note, this “delayed LF enforcement” of Case requirements, under which Case requirements are not enforced until LF, may well be inadequate, to the extent that Case bearing/checking is, as has been assumed, in fact a phonological requirement, with Case constituting a PF-illegitimate feature (perhaps in addition to being an LF-illegitimate feature). The seventh article, “Derivational Constraints on Ā-Chain Formation,” is itself eliminative, and derivational. It seeks to eliminate a number of descriptive—and seemingly unrelated—filters and constraints, by reducing them all to a single Derivational Economy condition, Last Resort, in the sense of Chomsky (1991). (This unified analysis is of course itself subject to the same kind of scrutiny just discussed concerning its empirical adequacy in accounting for degrees and types of ungrammaticality.) In this article, I investigate properties of certain Ā-chains, and seek to derive them from overarching derivational constraints, under which the “function” of Movement is to satisfy what are now called “the bare output conditions”. The idea was to extend A-chain Last Resort to Ā-chains. This is consonant, to a certain degree, with more contemporary conceptions of the nature of Movement, namely to optimally derive convergent interface representations containing no illegitimate morpholexical features; that is, features that are (naturally) uninterpretable at the interfaces.
Again, the attempt is to explain—or better explain—properties of Ā-chains merely described by a formal apparatus like filters on representation, or descriptive, ad hoc constraints on movement and/or chain formation. The fifth article, “Quantification in Null Operator Constructions,” seeks to deduce an unrecognized, apparently correct, prediction concerning quantifier interpretation, from the independently-motivated Null Operator analysis of certain constructions. The question posed was, Why is there no narrow scope reading, derived by Quantifier Lowering of the subject quantifier, in cases like sentence (4)?
(4) Many people are easy to talk to.
Sentence (4) cannot mean: It is easy to address a large group of people. The idea was that although Quantifier Lowering from a thetaless position is allowed, the lowered quantifier adjoined to the embedded infinitival projection (IP) would fail to bind the Null Operator occupying the embedded [Spec, CP] at LF. At the same time, binding of the Null Operator by the matrix subject, a thetaless trace of Quantifier Lowering, fails to “strongly bind” the Null Operator, which is as a result present yet uninterpretable at LF, yielding an illicit representation at that level, thus providing independent evidence for the presence of a Null Operator in the LF representation of such sentences. The analysis directly addresses three general topics explored in some of the articles remaining to be discussed here: Adjunction, Quantification and (Local) Binding. The third article, “The Local Binding Condition and LF Chains,” examines different definitions of the central relation “local binding”. The definitions, although formally different, had been assumed to be empirically equivalent. This article investigates an independently-motivated analysis within which the definitions are empirically distinguishable.
The analysis concerns the well-formedness of certain cases of LF anaphor cliticization in English as compared to the ill-formedness of the analogous, but overt, anaphoric cliticization in Italian. Here, as in the eighth article, I adopt the influential and intriguing idea (I believe originating in Huang (1982)) that in certain grammars there exist (“unheard of”) LF operations akin to those
found overtly in other grammars, the fundamental idea being that the syntactic component is virtually universal/invariant, but the point in the syntactic derivation which undergoes phonological interpretation is subject to parametric variation. Notice that under this kind of analysis the perceived vast (or infinite) diversity of human languages (grammars) is an observed illusion. In Joseph Epstein’s (1965) terms (quoted above), it is a case of being “blinded by the immediately apparent”. Consistent with Humboldt’s (1814) conception of scientific practice (quoted above), this blindness which is induced by the immediately apparent can be overcome by “recognizing the invisible in the visible,” an apparent methodological prerequisite (as it is in other sciences, I think) to productively pursuing Chomsky’s central unifying thesis that, contrary to appearances, there is in essence only one human language grammar. This very same leading idea is adopted in the eighth article, “Overt Scope Marking and Covert Verb-Second,” in which I suggest that English incorporates covert operations generating verb-second configurations in the LF component. In addition, overt scope-marking conditions are investigated, and it is also proposed that certain LF representations (derived by covert V movement ultimately to C) are category-neutralized VP-recursion structures lacking functional checking categories and their projections. This checking-induced deletion and the resulting derived LF constituent structures are argued to permit simplification of index-dependent, head-government ECP requirements. The article also proposes a “mixed” theory of adjunction, with overt adjunction duplicating the category targeted by adjunction (as recently advocated by Lasnik and Saito (1992)), while covert adjunction segments the target (May (1985)).
This segmental theory of adjunction is the topic addressed in the fourth article, “Adjunction and Pronominal Variable Binding,” which examines a potential problem confronting May’s influential segmental theory of adjunction. Perhaps the central facet of the segmentation analysis, in direct contrast to category-duplicating adjunction, is that it predicts that an adjoined category commands a slightly larger domain than is predicted under duplication. I suggest that this slightly increased “permissiveness” results in the overgeneration of certain cases of quantificational weak crossover. Scope and quantification are also central aspects of the second article, “Quantifier-pro and the LF Representation of PROarb,” which has two central goals. The first is to try to explicate the precise meaning of “arbitrary interpretation,” a type of interpretation perhaps unique (hence suspect), borne only by PROarb. The second goal is to try to eliminate, or at least reduce, the number of cases presumed to be instances of uncontrolled PRO (undergoing, by hypothesis, arbitrary interpretation). The article was, I think, the first to suggest that a non-null-subject grammar, namely (knowledge of) English, had pro in its lexical inventory. The basic idea is that uncontrolled PROarb is in fact controlled PRO, controlled by an implicit argument, pro, which itself undergoes pronominal coreferent interpretation or, if free, is interpreted as a universal quantifier, and as such undergoes Quantifier Raising in LF. Arbitrary interpretation is thereby reanalyzed as universal quantification, and at least one type of allegedly uncontrolled PRO is reduced to PRO that undergoes obligatory control. I hope you find the articles interesting, and please remember: comments are (still) welcome.
Samuel David Epstein
Ann Arbor, Michigan
February 1999
References
Chomsky, N. (1966) Cartesian Linguistics: A Chapter in the History of Rationalist Thought, Lanham, Md. and London: University Press of America.
Chomsky, N. (1986) Knowledge of Language: Its Nature, Origin, and Use, New York: Praeger.
Chomsky, N. (1991) “Some Notes on Economy of Derivation and Representation,” in Freidin, R. (ed.) Principles and Parameters of Comparative Grammar, Cambridge, Mass.: MIT Press.
Chomsky, N. (1995) The Minimalist Program, Cambridge, Mass.: MIT Press.
Chomsky, N. (1996) Powers and Prospects: Reflections on Human Nature and the Social Order, Boston: South End Press.
Cowan, M. (trans. and ed.) (1963) An Anthology of the Writings of Wilhelm von Humboldt: Humanist Without Portfolio, Detroit: Wayne State University Press.
Epstein, J. (ed.) (1965) René Descartes: A Discourse on Method and Other Works, New York: Washington Square Press.
Epstein, S.D., Groat, E., Kawashima, R., and Kitahara, H. (1998) A Derivational Approach to Syntactic Relations, New York/Oxford: Oxford University Press.
Epstein, S.D. and Hornstein, N. (eds.) (1999) Working Minimalism, Cambridge, Mass.: MIT Press.
Epstein, S.D. and Seely, D. (1999) “On the Non-existence of the EPP, A-chains, and Successive Cyclic A-movement,” manuscript, University of Michigan and Eastern Michigan University.
Huang, C.-T. J. (1982) “Logical Relations in Chinese and the Theory of Grammar,” unpublished doctoral dissertation, MIT.
Kitahara, H. (forthcoming) “Two (or More) Syntactic Categories vs. Multiple Occurrences of One,” Syntax: A Journal of Theoretical, Experimental, and Interdisciplinary Research.
Lasnik, H. and Saito, M. (1992) Move α, Cambridge, Mass.: MIT Press.
May, R. (1985) Logical Form: Its Structure and Derivation, Cambridge, Mass.: MIT Press.
1 A NOTE ON FUNCTIONAL DETERMINATION AND STRONG CROSSOVER
In this letter the analysis of Strong Crossover (SCO) phenomena provided in Koopman and Sportiche (1983) (K and S) is examined. It will be shown that certain cases of SCO are beyond the scope of that analysis (an analysis also discussed in Chomsky (1982)). Alternatives will be presented which handle the entire range of cases. The SCO configuration is illustrated by K and S as follows:
(1) (=K and S (22a))
In Section 1, I will briefly summarize the K and S analysis of SCO. The ill-formedness of such configurations is claimed to be derivable without appealing to Principle C of the Binding Theory, the properties of such configurations following from four independently motivated principles, namely:
(2) a. The K and S definition of “variable”
b. The Functional Determination algorithm (Chomsky (1982))
c. Principle A of the Binding Theory (Chomsky (1982))
d. Principle B of the Binding Theory (Chomsky (1982))
In Section 2, I will examine a structure of Standard English exhibiting the SCO configuration (1). We shall see that the K and S analysis of SCO incorrectly fails to rule out such structures. In light of this problem, I will then supplement the K and S system (2a-d) with the following principles:
(2) e. Principle C of the Binding Theory (Chomsky (1982))
f. Control Theory
g. Case Theory
h. The Theta Criterion (in particular the notion “theta-chain” (Chomsky (1981))) and
i. The (or A) Resumptive Pronoun Parameter (Chomsky (1982))
We shall see that the derived system (2a-i), also fails to rule out certain SCO configurations of Standard English. I shall suggest that the inability to rule out such instances of SCO is a consequence of the incorporation of both principles (2a) and (2b). In Section 3, I provide two alternative analyses of SCO in Standard English. In the first analysis, principle (2a), the K and S definition of “variable”, is abandoned. Under this analysis, Functional Determination and Binding Theory are shown to be superfluous with respect to ruling out SCO configurations. Under the second analysis, principle (2b), Functional Determination, is altogether abandoned. The correctness of this analysis will indicate that Functional Determination plays, at most, a superfluous role in ruling out instances of SCO in Standard English. 1 The K and S Analysis of SCO K and S propose the following (universal) definition of “variable” (a definition presumed to apply at all syntactic levels), under which variables need not be empty categories: (3) (= K and S (21))
α is a variable if α is in an A-position and is locally Ā-bound
They note that the Bijection Principle (BP): (4) (= K and S (20))
There is a bijective correspondence between variables and Ā-positions
requires the incorporation of (3). That is to say that Weak Crossover constructions such as: (5) [S′ whoi [S does hisi mother love ei]]
can be ruled out by the BP only if both the (overt) pronoun “hisi”, as well as “ei” are defined as variables. K and S argue that (3), the definition of “variable”, is independently motivated in that SCO configurations can be ruled out under this definition by Principles A and B of the Binding Theory, i.e. without appealing to Principle C of the Binding Theory. Thus, for example, consider a structure exhibiting the SCO configuration: (6) (=K and S (22b)) *Whoi does hei think ei left
K and S account for the ill-formedness of (6) as follows. First, they note that under definition (3), “…it is the pronoun “he” which is interpreted as a variable, and no longer the trace ei of the wh-phrase “who”.” Crucially then, the structure is not ruled out by some principle prohibiting the local Ā-binding of the pronominal “he”. In fact, notice that “he” is not a pronominal; it is by definition (3), a variable. Rather, such structures are ruled out by Functional Determination, as applied to ei, and Principles A and B of the Binding Theory. In particular, (6) is ruled out because: [ei is]…locally A-bound to “he”, ignoring traces of successive cyclic movement in the intermediate COMPs which appear to play no particular role. “He” has an independent θ-role, so ei is an empty pronominal, i.e. a PRO.
But principles A and B of the Binding Theory…require PRO to be ungoverned and ei…is governed: hence [(6) is]…ruled out by these principles. The SCO violations are thus explained by [(3)] and principles A and B of the Binding Theory. (Koopman and Sportiche (1983, 148)) The reader will notice that the above analysis also correctly rules out SCO configurations containing empty objects such as (7) (=K and S (22c)) *Whoi does hei think you saw ei
2 Generable SCO Configurations
Under the K and S analysis, S-structures of the following type (noted independently in Epstein (1983) and in Sportiche (1983, 35)) are incorrectly generable: (8) [S′ Whoi [S did hei try [S′ ei [S ei to go]]]]
In (8), as in (6), “he” is locally Ā-bound, hence a variable (under (2a)). The subject ei (again, ignoring traces in COMP) is locally A-bound by “he”, which has an independent theta-role. (Notice that if Subjacency is a constraint on movement (see e.g. Lasnik and Saito (1984)) the trace in COMP need not be present. Even if this trace is present it does not count as an Ā-binder under Functional Determination (see Chomsky (1982)).) Consequently the ei subject is PRO (under (2b)). Under Principles A and B of the Binding Theory (2c and 2d), (8) is generable since ei (i.e. PRO) is ungoverned. Thus, under the K and S analysis (2a-2d) such SCO configurations are generable. (Sportiche (1983, 35) claims that the ungrammaticality of examples such as (8) is “…due to the accidental property of English of not allowing resumptive pronouns in subject position…” However, this account fails to specify the formal principles and/or parameters governing the distribution of resumptive pronouns (see also Sportiche (1983, 149–150)).) Notice that incorporating Principle C of the Binding Theory (2e) is without effect here. Functional Determination identifies the ei subject in (8) as PRO. Consequently, no Binding violation results. Furthermore, notice that the Theory of Control (2f) is satisfied in (8); PRO is properly controlled. In addition, we cannot rule out (8) under Case Theory (2g). Specifically, (8) cannot be ruled out under the assumption that wh-trace (or variables) require Case. Such an assumption is orthogonal here because the ei subject in (8) is, by Functional definition, PRO, not wh-trace (nor a variable). Concerning the Case-status of the (lexical) NP “who” in (8), notice first that this operator (a nonargument in an Ā-position) does not require a theta-role (see Chomsky (1981, 179–180)). Consequently, under the reduction of the Case-Filter to the Theta Criterion (see e.g. Chomsky (1981, 336)), this operator need not be Case-marked, i.e. visible for theta-role assignment.
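The reasoning applied to (6), (7) and (8) can be restated as a small sketch. It is only an illustration under simplifying assumptions, not a full implementation of Functional Determination: the class and function names are hypothetical, the government status of each empty category is simply stipulated, and only Principles A and B (PRO must be ungoverned) are checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binder:
    """Local binder of an empty category (illustrative fields only)."""
    name: str
    a_position: bool          # True: A-position; False: A-bar position
    independent_theta: bool   # does the binder bear its own theta-role?


@dataclass
class EmptyCategory:
    in_a_position: bool
    local_binder: Optional[Binder]  # None would mean the category is free
    governed: bool                  # stipulated here, not computed


def determine(ec: EmptyCategory) -> str:
    """Classify an empty category along the lines summarized in the text."""
    b = ec.local_binder
    if b is not None and not b.a_position and ec.in_a_position:
        return "variable"           # locally A-bar-bound, in an A-position
    if b is None or b.independent_theta:
        return "PRO"                # free, or bound by a theta-bearing element
    return "NP-trace"


def licensed(ec: EmptyCategory) -> bool:
    """Principles A and B only: PRO must be ungoverned."""
    if determine(ec) == "PRO":
        return not ec.governed
    return True                     # other principles are not modelled here


he = Binder("he", a_position=True, independent_theta=True)
ex6 = EmptyCategory(True, he, governed=True)    # subject of finite "left"
ex7 = EmptyCategory(True, he, governed=True)    # object of "saw"
ex8 = EmptyCategory(True, he, governed=False)   # subject of "to go"

for label, ec in [("(6)", ex6), ("(7)", ex7), ("(8)", ex8)]:
    print(label, determine(ec), "generable" if licensed(ec) else "ruled out")
```

Under these stipulations the empty category is classified as PRO in all three structures, but only in (8) is it ungoverned, which is why (8) escapes the K and S account.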
(See also McNulty (in preparation) for further discussion of these issues.) Turning now to (2h), notice that if we were to replace Functional Determination with the following principle: (9) Freely Assign the features [± anaphor, ± pronominal] to an empty category (see Brody (1983))
we would still be unable to rule out (8). Free assignment of features certainly allows the assignment of the features [+anaphor, +pronominal] to the ei subject, in which case no principle of grammar is violated. In
particular notice that the Theta Criterion (Chomsky (1981)) apparently provides us with no means by which to rule out (8). Identifying (8) as a Theta Criterion violation would seem to require that “theta-chain” be defined in such a way that the constituents [“whoi”, “hei”, “ei”] obligatorily constitute a single theta-chain. Under such a definition, this three-membered chain in (8) would be assigned two theta-roles. The structure would then be ruled out as a violation of the Theta Criterion. However, such a definition of “theta-chain” seems untenable, since it would presumably entail that in, for example, (10) [S′ [S hei tried [S′ [S ei to go]]]]
there also exists a single theta-chain, namely [“hei”, “ei”], which is illicitly assigned two theta-roles. This, of course, is an unwanted result. The theory of theta-chains would thus seem to require the standard assumption that any occurrence of PRO heads a theta-chain. Thus, we see that the theta-theory subsystem of grammar (2h) appears to provide us with no means by which to rule out (8). Finally, notice that (8) is generable by movement, i.e. without base-generating in COMP (2i), an operation assumed to be available only in the Resumptive Pronoun languages (see Chomsky (1982)). In summary, we see that not only does the K and S analysis of SCO (2a–2d) fail to rule out SCO configurations such as (8), but also the system of principles properly including the K and S system (namely 2a–2i) fails to rule out such structures. I would like to suggest that the failure to rule out such structures is a consequence of the incorporation of both principles (2a) and (2b). That is to say that we are unable to rule out structures such as (8) for two reasons. First, under the K and S definition of “variable”, nothing precludes the local Ā-binding of the pronominal “he” in (8). Second, under Functional Determination the ei subject is determined as PRO. This, in turn, renders principles (2c–2h) ineffective with respect to ruling out such structures. In the following Section, I present two alternative analyses of SCO in Standard English. The first involves abandoning the K and S definition of “variable”; the second involves abandoning Functional Determination.
3 Solutions
As a first solution to the SCO problem, suppose we retain Functional Determination and abandon the K and S definition of “variable”, adopting instead presumably parameterized conditions governing the Ā-binding (or variable-binding) of pronominals in Standard English. Suppose, for example, that we adopt the Reindexing Rule of Higginbotham (1980). (But see K and S for arguments against adopting such an analysis.)
Essentially, the Reindexing Rule requires the presence of an empty category to the left of the pronominal as a necessary condition for variable-binding of the pronominal. (For the present purposes, I will assume that the empty category in question must be in an A-position. But see Higginbotham (1980, fn. 18) for speculation otherwise.) Under the Reindexing Rule, SCO configurations such as (6), (7) and (8) are ruled out. More precisely, such structures are simply not generable; the pronominal “he” cannot become coindexed with the operator “who” because there exists no empty category to the left of the pronominal. But notice now that since such structures are apparently precluded by independent principles (or rules) governing pronominal variable-binding, Functional Determination and Binding Theory become superfluous with respect to ruling out SCO configurations. As a second solution to the SCO problem, suppose we maintain the K and S definition of “variable” and abandon Functional Determination, (re-)adopting instead the notion “intrinsic feature content of empty
categories”. (Recall that under the K and S definition of “variable”, neither Functional Determination nor free assignment of the features [± anaphor, ± pronominal] provided us with a means by which to rule out examples such as (8).) Under this analysis, the empty category created by movement to COMP is, by definition, wh-trace, an R-expression (see Freidin and Lasnik (1981)). Under the incorporation of Principle C of the Binding Theory (see Chomsky (1982)), SCO configurations such as (6), (7) and (8) represent Binding violations; the wh-trace is illicitly A-bound in the domain of the Operator that Ā-binds it. (Notice that under the intrinsic feature analysis, some, but not all SCO configurations violate (2g), in particular the requirement that wh-trace have Case.)
4 Summary
We have seen that the K and S analysis of SCO fails to rule out certain instances of SCO in Standard English. I have suggested that this is a consequence of the incorporation of both the K and S definition of “variable” and the Functional Determination algorithm. I have presented two alternative analyses of SCO in Standard English. Under the first analysis, I retain Functional Determination and abandon the K and S definition of “variable”, adopting instead presumably parameterized conditions governing pronominal variable-binding in Standard English (see Higginbotham (1980)). Under this analysis, Functional Determination and Binding Theory are shown to play a superfluous role in ruling out instances of SCO. Under the second analysis, the K and S definition of “variable” is retained. As we have seen, under such a definition, neither Functional Determination nor free assignment of the features [± anaphor, ± pronominal] provides us with a means by which to rule out examples such as (8). Consequently, the notion “intrinsic feature content of empty categories” is (re-)adopted. Under the incorporation of Principle C of the Binding Theory, SCO configurations represent Binding violations.
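The two alternative analyses can be compared in a rough sketch. It rests on strong simplifications that are not part of the analyses themselves: a structure is a flat left-to-right list, linear precedence stands in for c-command (adequate only for right-branching examples like these), traces in COMP are omitted, and all names are hypothetical. The non-crossover string "who e thinks he left" is added purely as a contrast case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    form: str    # surface form, e.g. "who", "he", "e"
    kind: str    # "operator", "pronoun" or "empty"
    index: str   # referential index


def reindexing_allows(items: List[Item], pronoun_pos: int) -> bool:
    """First analysis: a pronoun may be variable-bound only if an empty
    category bearing the same index occurs to its left."""
    pronoun = items[pronoun_pos]
    return any(x.kind == "empty" and x.index == pronoun.index
               for x in items[:pronoun_pos])


def violates_principle_c(items: List[Item]) -> bool:
    """Second analysis: wh-trace is an R-expression, so a preceding
    coindexed pronoun (in an A-position) yields a Binding violation."""
    for pos, x in enumerate(items):
        if x.kind == "empty" and any(
                y.kind == "pronoun" and y.index == x.index
                for y in items[:pos]):
            return True
    return False


# SCO pattern shared by (6), (7) and (8): operator ... pronoun ... empty
sco = [Item("who", "operator", "i"), Item("he", "pronoun", "i"),
       Item("e", "empty", "i")]
# Contrast case (assumed for illustration): operator ... empty ... pronoun
no_crossover = [Item("who", "operator", "i"), Item("e", "empty", "i"),
                Item("he", "pronoun", "i")]

print(reindexing_allows(sco, 1), violates_principle_c(sco))
print(reindexing_allows(no_crossover, 2), violates_principle_c(no_crossover))
```

In this toy setting both analyses exclude the crossover pattern and admit the contrast case, each by a different route.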
The correctness of this analysis indicates that Functional Determination plays, at most, a superfluous role in ruling out instances of SCO in Standard English.
Acknowledgment
* I thank Howard Lasnik for many helpful comments on an earlier draft of this paper. I am also grateful to Elaine McNulty for helpful discussion. Finally, thanks to the anonymous Linguistic Review reviewers for their suggestions. Any errors are, of course, my own.
References
Brody, M. (1983) “On Contextual Definitions and the Role of Chains,” manuscript, University College London.
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press.
Epstein, S.D. (1983) “Comments on Topicalization and Left-Dislocation,” unpublished paper, University of Connecticut, Storrs.
Freidin, R. and Lasnik, H. (1981) “Disjoint Reference and Wh-Trace,” Linguistic Inquiry 12:39–53.
Higginbotham, J. (1980) “Pronouns and Bound Variables,” Linguistic Inquiry 11:679–708.
Koopman, H. and Sportiche, D. (1983) “Variables and the Bijection Principle,” The Linguistic Review 2:139–160.
Lasnik, H. and Saito, M. (1984) “On the Nature of Proper Government,” Linguistic Inquiry 15:235–289.
McNulty, E. (in preparation) “On Operators,” University of Connecticut.
Sportiche, D. (1983) “Structural Invariance and Symmetry in Syntax,” unpublished doctoral dissertation, MIT.
2 QUANTIFIER-PRO AND THE LF REPRESENTATION OF PROARB
In this squib I will argue that, in English, PROarb is correctly represented at the level of logical form (LF) as a variable bound by a universal quantifier. In attempting to derive such an LF representation of PROarb, we will be led to the conclusion that pro exists in English and is capable of being interpreted as a universal quantifier. In a large class of constructions containing PROarb, it is pro, interpreted as a universal quantifier, that binds PROarb. That pro is capable of receiving universal quantifier interpretation is apparently not unique to the non-null subject language English. A re-examination of the data of Suñer (1983) suggests that pro, occupying subject position in the null subject language Spanish, is also capable of receiving universal quantifier interpretation. (Suñer, it should be noted, does not identify the pro-subject she discusses as quantificational, but rather identifies this element as proarb.) Hence, I tentatively assume that this quantificational interpretation of pro is universally available. Consider the interpretation of the sentence associated with the S-structure (1): (1) [S′ [S It is fun [S′ [S PRO to play baseball]]]]
Subject to further refinement, the sentence associated with (1) is interpreted as It is fun for anyone to play baseball. Suppose we attempt to formally represent this interpretation by making the natural assumption that, contrary to the above claim that PROarb is a variable, PROarb is a quantifier phrase that obligatorily undergoes Quantifier Raising (QR) in the sense of May (1977). Under this analysis, PROarb is represented at LF as a universal quantifier that binds a variable (namely, the LF trace of the quantifier-raised PROarb), said variable occupying the position occupied by PROarb at S-structure. Notice however that, contrary to the unmarked application of May’s rule of QR, the LF rule raising PROarb is not clause bounded. The application of a clause-bounded rule of QR would derive from (1) the following LF representation, in which PROarb, a universal quantifier, is adjoined to the S-node immediately dominating it: (2) [S′ [S It is fun [S′ [S (∀x1) [S t1 to play baseball]]]]]
(2) clearly is not the correct LF representation of (1); the scope of the universal quantifier is misrepresented. (2) is the LF representation of a sentence asserting that If everyone plays baseball, it is fun. In the correct LF representation of (1), the universal quantifier has wide (matrix) scope. Thus, it appears that PROarb not only can, but must, be quantifier-raised out of its clause.
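That the two scope readings differ in truth conditions can be checked in a toy model. The model is purely illustrative and rests on assumptions not made in the text: a proposition of the form "these people play baseball" is represented by the set of players, "fun" is treated as a predicate over such sets, and the individuals and facts are invented.

```python
# Toy domain of individuals (hypothetical).
domain = ["Lucy", "Joe"]

# Hypothetical facts: it is fun when Lucy plays and fun when Joe plays,
# but the situation in which everyone plays is not in the "fun" set.
fun = {frozenset({"Lucy"}), frozenset({"Joe"})}


def is_fun(players):
    """True if the proposition 'exactly these people play' counts as fun."""
    return frozenset(players) in fun


# Narrow scope, as in (2): it is fun [for every x, x plays baseball].
narrow = is_fun(domain)

# Wide (matrix) scope: for every x, it is fun [x plays baseball].
wide = all(is_fun([x]) for x in domain)

print("narrow scope:", narrow)
print("wide scope:", wide)
```

In this model the wide-scope reading is true while the narrow-scope reading is false, so the two LF representations cannot be treated as equivalent.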
Notice that the non-clause-bounded LF movement of PROarb must proceed from Comp to Comp, assuming the correctness of Huang’s (1982) proposal that the Empty Category Principle (ECP) applies at LF. It is a theorem of the binding theory (Chomsky (1981; 1982)) that PRO occupies an ungoverned position at S-structure. Consequently, an intermediate trace in Comp is required to properly govern the subject trace of the quantifier-raised PRO.1 Thus, for the purposes of satisfying the ECP while also correctly representing the scope of the universal quantifier, the S-structure (1) must be mapped into the LF representation (3): (3) [S′ (∀x1) [S It is fun [S′ t1 [S t1 to play baseball]]]]
Another property of the LF movement of PROarb must be accounted for, namely, that this element moves twice and only twice. Recall that PROarb cannot be moved only once, as demonstrated by the inadequacy of the LF representation (2).2 That PROarb does not move more than twice can be seen by considering the quantificational interpretation of, for example, (4): (4) Josh said it is fun to play baseball3
Given the S-structure representation of (4) in (5), if PROarb is moved three times, the LF representation (6) is derived: (5) [S′ [S Josh1 said [S′ [S it is fun [S′ [S PRO2 to play baseball]]]]]] (6) [S′ (∀x2) [S Josh1 said [S′ t2 [S it is fun [S′ t2 [S t2 to play baseball]]]]]]
(6) is the LF representation of a sentence asserting that Josh said something about every x, namely, that it is fun for x to play baseball. This is not the correct LF representation of the quantificational interpretation of sentence (4). The correct LF representation of (5) (again, as in the case of (1)) is derived by moving PROarb exactly twice: first into the lower Comp, then from Comp to Comp, yielding (7): (7) [S′ [S Josh1 said [S′ (∀x2) [S it is fun [S′ t2 [S t2 to play baseball]]]]]]
Thus, it appears that if we assume that PROarb is a quantifier, we require an explanation for the marked LF movement of this element. A reconsideration of the LF representation (3) suggests a possible solution. (For the sake of simplicity, the particulars of the following arguments pertain to the LF representation (3). The arguments, of course, obtain identically for (7) and all similar constructions.) In fact, (3) is an inadequate LF representation of the sentence associated with (1). The inadequacy of this representation becomes apparent upon considering the interpretation imposed on the sentence in question. This sentence is obligatorily interpreted as (∀x) if x plays baseball, it is fun for x. The sentence cannot be interpreted either as (∀x) if x plays baseball, it is fun or as (∀x) if x plays baseball, it is fun for y. The correct LF representation of (1) must contain two argument variables, each bound by the same universal quantifier with matrix scope, namely, a variable occupying subject position of the infinitival and a variable occupying the complement argument NP position to the adjective. The correct LF representation of the sentence is (8): (8) [S′ (∀x1) [S It is fun (for) x1 [S′ (for) [S x1 to play baseball]]]]
Syntactic evidence for the existence of these two NP positions (each occupied by a variable in (8)) is provided by, for example, (9): (9) It is fun for Lucy for Joe to play baseball
The S-structure of (9) is (10): (10) [S′ [S It is fun for Lucy [S′ for [S Joe to play baseball]]]]
Furthermore, consider (11): (11) It is fun for Lucy to play baseball
Notice that a control interpretation can be imposed on this sentence; that is, (11) allows the S-structure analysis shown in (12): (12) [S′ [S It is fun for Lucy1 [S′ [S PRO1 to play baseball]]]]
That the prepositional phrase for Lucy occurs in the matrix clause in (12) is shown by the fact that it can be preposed, as in (13): (13) [S′ [S [For Lucy1]2] [S it is fun t2 [S′ [S PRO1 to play baseball]]]]4
Returning to representation (8), assuming the correctness of the Projection Principle (Chomsky (1981)), it follows that if the (θ-marked) complement argument to the adjective is present at LF, then it must be present at all levels of representation. Consequently, the LF representation (8) cannot be derived from the S-structure (1). Rather, it can be assumed (Epstein (1983)) that in the correct S-structure representation of the sentence, so-called PROarb is controlled by (obligatorily coindexed with) a base-generated quantificational empty category, namely, pro, occupying the governed complement NP position of the adjective.5 At LF, then, an unmarked (i.e. clause-bounded) rule of QR applies only to this controlled argument.6 In this way PROarb is correctly represented at LF as a variable bound by this universal quantifier, namely, pro, which takes matrix scope. The problem of explaining the marked LF movement of PROarb thus disappears. In addition, the fact that the correct LF representation contains two argument variables (PRO and the trace of the controlling quantifier-raised pro), each bound by the same universal quantifier, namely, pro, is explained. That pro is capable of receiving the same universal quantifier interpretation in Spanish is suggested by the data of Suñer (1983). Her glosses of the impersonal se construction (her examples (3a-c)) include translations of pro as “one”. Surely, this suggests universal quantifier interpretation of pro.7 Under this analysis, PROarb is identified as PRO, controlled at S-structure by a universal quantifier, namely, pro.8 At LF PROarb is represented as a variable bound by this quantifier.9
Notes
* I thank Howard Lasnik for many useful comments on earlier drafts of this squib and for invaluable discussion of the issues addressed here. My thanks also to those graduate students and faculty in the Department of Linguistics at the University of Connecticut who were kind enough to discuss these matters with me. Finally, I thank the anonymous LI reviewers for their helpful suggestions. Any errors are, of course, my own.
1 Alternatively, it could be assumed that the LF movement of PROarb consists of a series of Adjunction-to-S operations. The ECP as formulated by Huang (1982) requires only that the first move be local (either movement to the embedded Comp or Adjunction to the embedded S). In what follows, in the absence of discriminating data, I will simply assume that the movement proceeds from Comp to Comp.
2 In the LF representation (2), May’s formalism is used. The representation is equally inadequate under the assumption that the universal quantifier is in the embedded Comp, as opposed to being adjoined to the embedded S. Either way, the scope of the quantifier is misrepresented.
3 In sentence (4) PRO may also be interpreted as coreferential with the NP Josh. This fact will be explained below.
4 What is at issue here is the preposability of this constituent. I provisionally assume the adjunction structure given in (13).
5 That so-called PROarb reduces to controlled PRO here suggests the possibility that, in general, PRO must be controlled. If, in fact, PRO must be controlled, this would explain why (at least in the standard cases) “PROarb” is possible only in structures in which a controlled argument is possible. Thus, for example, the contrast between (ii) and (iv) would be explained:
(i) To play baseball is fun for John
(ii) To play baseball is fun
(iii) * To play baseball is certain for John
(iv) * To play baseball is certain
In (iv), then, there exists an uncontrolled, hence illicit, PRO.
6 This element (pro) receives universal quantifier interpretation in English only if it is antecedentless. In the correct S-structure representation of sentence (4) as well, PRO is controlled by pro. A representation of the coreferential interpretation of PRO is obtained only if pro is freely coindexed with the NP Josh. If pro is freely contraindexed (and is therefore antecedentless), a representation of the quantifier-bound variable interpretation of PRO is derived. The fact that there exist two possible interpretations of sentences such as (4) is thus explained. For standard theory analysis of sentences such as (4), see for example Grinder (1970), who claims that such sentences are derived by the application of Super-Equi (an unbounded rule, collapsible with the local rule of Equi (see Rosenbaum (1967)), which deletes infinitival subjects under identity). By contrast, Kimball (1971) identifies a dative argument in the analysis of such sentences. He argues that Equi deletes the infinitival subject under identity with the dative argument and that the dative argument is deleted under the application of Dative Deletion, an unbounded deletion-under-identity rule. (See also Grinder’s (1971) reply to Kimball (1971).) Since each of the above analyses rests solely on deletion-under-identity, notice that neither is capable of accounting for the facts concerning PROarb. However, Kimball (1971, 145) does briefly mention a rule of Indefinite Deletion, which apparently is not a deletion-under-identity rule. This rule deletes the dative phrase for someone, containing an existential quantifier.
7 By contrast, Suñer (1983) identifies the pro of her examples (3a-c) as proarb. For evidence that quantifier-pro has the feature [+ human] in English, as Suñer (1983) claims what she calls “proarb” has in Spanish, see Epstein (1983).
8 The proposed analysis is consistent with the algorithm for the functional determination of empty categories proposed in Chomsky (1982, 84). Regarding the distribution of pro, I assume, following Chomsky (1982), that this element must be locally determined. We might tentatively assume that benefactive θ-role assignment constitutes local determination in English, as it may well in other grammars. In English, perhaps this constitutes the only form of local determination. Notice, however, that this account alone fails to exclude (i), for example, in which pro is apparently locally determined.
(i) * [S′ [S It is fun for pro1 [S′ [S PRO1 to play baseball]]]]
For the purpose of ruling out structures such as (i), we could additionally assume the following surface filter, under which Case-marked pro is ill-formed.
(ii)
Notice that by adopting the characterization of the null subject languages given in Chomsky (1981, 257) (rule R optionally applies in the syntax), we could provisionally assume that the above filter is universal, since in such languages, subject position need not be governed (hence Case-marked) by Agr. Under such an account, structures such as (iii)
(iii) pro laughed
are ruled out in the non-null subject languages because pro is invariably governed by Agr and hence Case-marked, in violation of the filter. (Following standard assumptions regarding the Agr element of such languages, (iii) also represents a failure to locally determine pro.) Notice that, whereas the filter (ii) alone correctly rules out (iii) (in a non-null subject language) as well as (i), it does not rule out, for example, (iv):
(iv) * It was arrested pro
Such an example reveals that pro must not only satisfy the filter (ii), but also be locally determined.
9 Notice that in structures such as (i)
(i) [S′ [S John1 knows [S′ how [S PRO2 to solve the problem]]]]
PRO again receives the interpretation of a variable bound to a universal quantifier. Given this quantificational interpretation, the LF representation of (i) is presumably (ii):
(ii) [S′ (∀x2) [S John1 knows [S′ how [S x2 to solve the problem]]]]
(Notice that the scope facts obtaining in the LF representations of these constructions are apparently identical to those of the adjectival constructions discussed above.) The preceding analysis may fail to cover such constructions as these, in which there is no syntactic A-position in which pro could be base-generated. However, see Epstein (1983) for a possible solution to this problem.
References
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press.
Epstein, S.D. (1983) “On the Interpretation of the Base-Generated Empty Categories in English,” manuscript, University of Connecticut, Storrs.
Grinder, J. (1970) “Super Equi-NP Deletion,” in Papers from the Sixth Regional Meeting of the Chicago Linguistic Society, Chicago: University of Chicago.
Grinder, J. (1971) “A Reply to Super Equi-NP Deletion as Dative Deletion,” in Papers from the Seventh Regional Meeting of the Chicago Linguistic Society, Chicago: University of Chicago.
Huang, C.-T.J. (1982) “Logical Relations in Chinese and the Theory of Grammar,” unpublished doctoral dissertation, MIT.
Kimball, J. (1971) “Super Equi-NP Deletion as Dative Deletion,” in Papers from the Seventh Regional Meeting of the Chicago Linguistic Society, Chicago: University of Chicago.
May, R. (1977) “The Grammar of Quantification,” unpublished doctoral dissertation, MIT.
Rosenbaum, P. (1967) The Grammar of English Predicate Complement Constructions, Cambridge, Mass.: MIT Press.
Suñer, M. (1983) “proarb,” Linguistic Inquiry 14:188–191.
3 THE LOCAL BINDING CONDITION AND LF CHAINS
Following Chomsky (1981), Chomsky (1984) proposes the following condition on A-chains: (1) The Local Binding Condition In each link (α, β) of a chain of A-positions, α locally binds β.
Chomsky (1984) defines local binding (and its component terms) as follows:
(2) Local binding: α locally binds β if α binds β and there is no γ such that α binds γ and γ binds β.
(3) Binds: α binds β if α c-commands and is coindexed with β.
(4) C-command: α c-commands every element of its domain that is not contained within α.
(5) Domain: The domain of α is the least maximal projection containing α.
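Definitions (2)-(5) can be stated over a simple tree structure. The following is a minimal sketch under stated assumptions: the `Node` class, the `is_max` flag marking maximal projections, and the sample structure (a passive matrix clause embedding a finite clause with a coindexed pronoun and empty category) are all hypothetical, and "containing α" in (5) is read as proper containment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality: each node is unique
class Node:
    label: str
    index: Optional[str] = None      # referential index, if any
    is_max: bool = False             # maximal projection? (stipulated)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)


def contains(a, b):
    return any(d is b for d in descendants(a))


def domain(a):
    """(5): the least maximal projection (properly) containing a."""
    p = a.parent
    while p is not None and not p.is_max:
        p = p.parent
    return p


def c_commands(a, b):
    """(4): a c-commands every element of its domain not contained in a."""
    d = domain(a)
    return d is not None and b is not a and contains(d, b) and not contains(a, b)


def binds(a, b):
    """(3): a c-commands b and is coindexed with b."""
    return a.index is not None and a.index == b.index and c_commands(a, b)


def locally_binds(a, b, candidates):
    """(2): a binds b, and no g is such that a binds g and g binds b."""
    return binds(a, b) and not any(binds(a, g) and binds(g, b) for g in candidates)


# Hypothetical structure: [John_i is believed [that [he_i likes e_i]]]
e = Node("NP e", index="i")
he = Node("NP he", index="i")
john = Node("NP John", index="i")
lower = Node("Infl''", is_max=True, children=[
    he, Node("V''", is_max=True, children=[Node("V likes"), e])])
root = Node("Infl''", is_max=True, children=[
    john, Node("V''", is_max=True, children=[
        Node("V believed"),
        Node("Comp'", is_max=True, children=[Node("Comp that"), lower])])])

nps = [john, he, e]
print(locally_binds(john, he, nps))  # John locally binds he
print(locally_binds(he, e, nps))     # he locally binds e
print(locally_binds(john, e, nps))   # he intervenes between John and e
```

In this structure the coindexed pronoun counts as an intervening γ, so the highest NP binds the empty category but does not locally bind it.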
Although Chomsky discusses certain violations of the Local Binding Condition that are reducible to independent principles, he notes that “…it is not entirely clear that the condition can be reduced to others in its entirety” (Chomsky (1984, 254)). Of course, whether or not a condition can be reduced, and if so how, depends on the formal properties of the condition in question. In this article I will attempt to reveal certain empirical problems that emerge under the Local Binding Condition. I will show that under a particular LF analysis proposed in Chomsky (1984), certain LF A-chains, required by the θ-Criterion, are incorrectly precluded by the Local Binding Condition. As we will see, this is the result of the definition of local binding given in (2). I will then propose alternative, empirically distinct definitions of local binding. If the analysis proposed here is correct, then a new question emerges: can the Local Binding Condition, incorporating one of these alternative definitions of local binding, be reduced to independent principles? The question of whether the Local Binding Condition, incorporating the definition (2), can or cannot be reduced, disappears. Throughout the remainder of this study what is at issue is the empirically correct definition of local binding. Since local binding (or locally binds) forms part of the definition of a well-formed A-chain (see (1)), the correct definition of A-chain is also at issue. Furthermore, since the definition of A-chain in turn forms an integral part of the definition of the θ-Criterion (it is A-chains to which the θ-Criterion, by definition, applies), the correct formulation of this principle too is at stake. Finally, since we will be
investigating the application of the θ-Criterion to different levels of syntactic representation, the correct formulation of the Projection Principle also comes under examination.
1 Evidence for a Local Binding Condition
In this section I will briefly discuss certain S-Structure representations that Lasnik (1985) and Rizzi (1982) present as motivation for the incorporation of a local binding condition on A-chains. I will examine these S-Structure representations within the framework of Chomsky (1984), showing that they are correctly excluded within this framework (just as they are excluded within the frameworks adopted by Lasnik and Rizzi). Throughout, the reader should note that both Lasnik and Rizzi adopt a definition of local binding formally distinct from (2), both discuss alternative accounts of local binding violations, and both suggest that certain conditions on A-chains are, or should be, reducible to deeper principles of grammar (see the works cited for details). Lasnik (1985) notes that Chomsky (1981, 332) provides no explicit argument for the incorporation of a local binding condition on A-chains. Lasnik argues, however, that precisely such a condition appears to be required to exclude the S-Structure representations he presents, such as (6): (6) * [Comp′ [Infl″ Johni is believed [Comp′ that [Infl″ hei likes ei]]]]1
As Lasnik notes, (Johni, ei) must form an A-chain under the θ-Criterion, yet this required chain structure is precluded by the local binding condition he assumes. Notice that under definition (2), Johni in (6) does not locally bind ei. Consequently, (Johni, ei) does not constitute an A-chain under the Local Binding Condition (1). The structure is thus correctly ruled out by the θ-Criterion applying at S-Structure. Rizzi (1982) provides independent evidence from Italian for a local binding condition on A-chains. Rizzi notes that in Italian, passive constructions containing anaphoric clitics, bound by the derived subject, are ungrammatical: (7) * Gianni si è stato affidato. Gianni to himself was entrusted ‘Gianni was entrusted to himself.’
Rizzi notes that the θ-Criterion requires that the S-Structure representation of such sentences contain two chains (Giannii, ei) and (sii, e′i):
(8)
Rizzi argues that the ill-formedness of such S-Structure representations is due to the violation of a local binding condition on A-chains.2 Before we examine (8) in detail, it should be noted that Rizzi assumes (as will I, for present purposes) that (a) clitic traces are nonarguments (on a par with NP-trace), subject to condition A of the binding theory, and (b) clitics are chain-heading arguments occupying an Ā-position (see also Chomsky (1982)).3 Notice that in (8), under definition (3) of binding, the NP Giannii binds (in fact, asymmetrically binds) each nominal element in V″, namely sii, ei, and e′i. Further, any nominal element in V″ symmetrically binds any other nominal element in V″. Thus, (8) contains the following substructure: (9)
Consequently, under definition (2) of local binding, Giannii does not locally bind ei. Both sii and e′i represent γ’s disallowed by the definition. (Giannii binds both sii and e′i, and both sii and e′i bind ei.) Thus, (Giannii, ei) does not constitute an A-chain under the Local Binding Condition (1). The structure is thus correctly ruled out by the θ-Criterion applying at S-Structure.
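The reasoning just given can be checked mechanically. The following Python sketch is offered only as an illustration: it encodes the binding relations of substructure (9), as described in the text, as a set of ordered pairs and evaluates definition (2) over them (α locally binds β if α binds β and there is no γ such that α binds γ and γ binds β). The function names and string labels are assumptions introduced here and have no theoretical status.

```python
# Binding facts of substructure (9), as described in the text.
# A pair (x, y) is read "x binds y"; the labels are illustrative only.
BINDS_9 = {
    ("Gianni", "si"), ("Gianni", "e"), ("Gianni", "e'"),  # asymmetric binding
    ("si", "e"), ("e", "si"),      # nominal elements in V'' bind one another
    ("si", "e'"), ("e'", "si"),
    ("e", "e'"), ("e'", "e"),
}


def interveners(alpha, beta, binds):
    """Return every gamma such that alpha binds gamma and gamma binds beta."""
    nodes = {x for pair in binds for x in pair}
    return sorted(g for g in nodes if (alpha, g) in binds and (g, beta) in binds)


def locally_binds(alpha, beta, binds):
    """Definition (2): alpha binds beta and no gamma intervenes."""
    return (alpha, beta) in binds and not interveners(alpha, beta, binds)


if __name__ == "__main__":
    print(interveners("Gianni", "e", BINDS_9))    # ["e'", 'si']
    print(locally_binds("Gianni", "e", BINDS_9))  # False
```

On these assumptions both the clitic and its trace come out as disallowed γ's, so (Giannii, ei) fails the Local Binding Condition, as stated above.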
Although we have just seen that (Giannii, ei) does not constitute an A-chain under the Local Binding Condition (and thus that (8) is ruled out by the θ-Criterion), we might ask, in the interest of clarity, whether a local binding condition should be assumed to constrain clitic chains such as (sii, e′i) in (8). (Notice that in (8) sii does not locally bind e′i, under definition (2).) As Lasnik (1985) notes, S-Structure representations such as (10) seem to indicate that clitic chains, unlike A-chains, are not subject to a local binding condition: (10)
(Under (2), sii does not locally bind ei.) The grammaticality of the sentence represented leads Lasnik to suggest that the local binding condition on chains is perhaps restricted to A-chains (as suggested in Chomsky (1981; 1984)).4 I will tentatively assume such a restriction, and consequently only the A-chain (Giannii, ei) in (8) violates the Local Binding Condition (1). 2 LF A-Chains Assuming (as we have) that chains are the formal objects to which θ-roles are assigned, we adopt the definition of the θ-Criterion given in Chomsky (1981, 335), updated with the Local Binding Condition of Chomsky (1984):
(11) The θ-Criterion
Given the structure S, there is a set K of chains, K = {Ci}, where Ci = (αi1, …, αin) [satisfying the Local Binding Condition (S.D.E.)], such that:
i. If α is an argument of S, then there is a Ci ∈ K such that α = αij and a θ-role is assigned to Ci by exactly one position P.
ii. If P is a position of S marked with the θ-role R, then there is a Ci ∈ K to which P assigns R, and exactly one αij in Ci is an argument.
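To make the two clauses of (11) concrete, the following Python sketch checks a candidate set of chains against them. It is a simplification under stated assumptions, not a formalization taken from the sources cited: positions are identified with the elements that occupy them, "assigned a θ-role by exactly one position" is modeled as "exactly one member of the chain occupies a θ-marked position," and local binding is supplied as a function built from definition (2). The binding facts given for (13) and for the LF substructure (the subject and the Infl-adjoined anaphor binding each other and both binding the trace) are likewise assumptions made for illustration, and all names are hypothetical.

```python
def make_def2(binds):
    """Build a local-binding test from definition (2) over a set of
    (binder, bindee) pairs."""
    nodes = {x for pair in binds for x in pair}

    def locally_binds(a, b):
        blocked = any((a, g) in binds and (g, b) in binds for g in nodes)
        return (a, b) in binds and not blocked

    return locally_binds


def theta_criterion(arguments, theta_positions, chains, locally_binds):
    """Check clauses (i) and (ii) of (11) for a candidate chain set."""
    # Bracketed condition in (11): every link of a chain must be a
    # local-binding link; candidate chains that fail are discarded.
    K = [tuple(c) for c in chains
         if all(locally_binds(a, b) for a, b in zip(c, c[1:]))]
    # Clause (i): each argument is in a chain with exactly one theta-position.
    for arg in arguments:
        if not any(arg in c and sum(x in theta_positions for x in c) == 1
                   for c in K):
            return False
    # Clause (ii): each theta-position is in a chain with exactly one argument.
    for pos in theta_positions:
        if not any(pos in c and sum(x in arguments for x in c) == 1
                   for c in K):
            return False
    return True


# S-Structure of "John was entrusted e to himself" (assumed binding facts).
S_BINDS = {("John", "e"), ("John", "himself"), ("e", "himself")}
s_ok = theta_criterion({"John", "himself"}, {"e", "himself"},
                       [("John", "e"), ("himself",)], make_def2(S_BINDS))

# LF substructure with the anaphor adjoined to Infl, restricted to the
# A-chain (John, e): John and himself bind each other and both bind e.
LF_BINDS = {("John", "himself"), ("himself", "John"),
            ("John", "e"), ("himself", "e")}
lf_ok = theta_criterion({"John"}, {"e"}, [("John", "e")], make_def2(LF_BINDS))

if __name__ == "__main__":
    print(s_ok, lf_ok)  # True False
```

Under these assumptions the check succeeds at S-Structure and fails at LF solely because definition (2) removes the chain (John, e) from K.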
Given this definition of the θ-Criterion, the Projection Principle can be regarded as the requirement that the θ-Criterion be satisfied at all levels of syntactic representation: D-Structure, S-Structure, and LF. In the structures discussed thus far, we have seen that the θ-Criterion is violated at S-Structure. The A-chain structures required by the θ-Criterion are precluded by the Local Binding Condition governing A-chain structure. Thus, in these derivations LF representation is irrelevant, since the S-Structure representations are ill-formed. One uninvestigated question concerns the status of derivations in which the Local Binding Condition on A-chains is satisfied at D-Structure (trivially) and at S-Structure, yet is violated at LF. The question is interesting, since
At the LF-level, the θ-criterion is virtually a definition of well-formedness, uncontroversial in its essentials, though the nature of the syntax of LF, hence the precise way in which the θ-criterion applies at this level, is an important if difficult empirical issue. (Chomsky (1981, 343)) Under the above conception of the θ-Criterion and the Projection Principle we would of course predict that such derivations are excluded since the θ-Criterion is not satisfied at LF. Consider first the grammatical English sentence (12): (12) John was entrusted to himself.
The S-Structure chain structure of this sentence is clearly well-formed: (13)
But now consider its LF representation. Following Chomsky’s (1984) analysis, under which overt anaphors undergo LF cliticization, the overt anaphor himself in (13) is obligatorily adjoined to Infl (i.e. “cliticized”) at LF.5 Thus, the following LF representation is derived from (13): (14)
Notice that the LF chain structure of (14) appears to be quite similar to the Italian S-Structure chain structure of the ungrammatical passive anaphoric clitic construction discussed above (see (8)). There are, however, two notable differences between the Italian S-Structure representation (8) and the English LF representation (14). First, under definition (3) of binding, in (14), unlike (8), the indirect object,
dominated by the maximal projection P″, does not bind the direct object. That this is a correct result is demonstrated by the grammaticality of (15): (15) I entrusted John to himself.
If the indirect object were a binder of the direct object, incorrectly, condition C of the binding theory would be violated. The second difference between the LF representation (14) and the S-Structure representation (8) is that in the former I assume (following Chomsky (1984)) that (the “clitic”) himself is adjoined to Infl, whereas in the latter I assume (following Rizzi (1982)) that the clitic si is adjoined to V.6 Despite these differences, in the LF representation (14) the Local Binding Condition on A-chains (1) is nonetheless violated. Under the LF application of the θ-Criterion, (Johni, ei) must form an A-chain. Under the Local Binding Condition (1), Johni must locally bind ei. Notice however that under definition (2) of local binding, Johni does not locally bind ei. Under definition (3) of binding, the LF representation (14) contains the following substructure (recall that e′i is not a binder): (16)
Under definition (2) of local binding, Johni does not locally bind ei since himselfi represents a γ disallowed by the definition; that is, Johni (α) binds himselfi (γ) and himselfi (γ) binds ei (β). Thus, in the LF representation (14) the Local Binding Condition on A-chains is violated. This LF representation (and any other possible chain structure) is ruled out by the θ-Criterion applying at LF (such application being required by the Projection Principle). This, of course, is an incorrect result. Unlike the S-Structure representations (6) and (8), the LF representation (14) must be generable since the sentence it represents (namely, (12)) is grammatical. Precisely the same problem is evidenced by the grammatical Italian analogue of (12): (17) Gianni è stato affidato a se stesso. Gianni was entrusted to himself
The S-Structure representation of (17) is analogous to (13):
(18)
The LF representation of (18) (derived by adjoining se stesso to Infl) is identical in all relevant respects to the English LF representation (14): (19)
In (19), under definition (2) of local binding, the Local Binding Condition on A-chains is incorrectly violated. The Italian LF representation (19), like the English LF representation (14), is thus incorrectly ruled out by the θ-Criterion applying at LF. Notice that the grammaticality of both the English sentence (12) and the Italian analogue (17), as contrasted with the ungrammaticality of the Italian sentence (7), does not seem to reveal some parametric distinction between English and Italian. Rather, it seems to reveal a universal distinction between S-Structure and LF requirements. For the purposes of allowing the LF representations (14) and (19), while still excluding the S-Structure representations (6) and (8), the following partial organization of Universal Grammar would appear to be motivated: (20) Under definition (2) of local binding, the Local Binding Condition on A-chains, (1), constrains S-Structure representation but does not constrain LF representation.7
This organization is, however, problematic. Given that the Local Binding Condition (1) in effect represents part of the definition of (well-formed) A-chain, this approach entails that the definition of A-chain varies at different levels. At S-Structure, the definition of A-chain incorporates the Local Binding Condition. At LF, it does not.
Taken in isolation, this is not necessarily problematic. It would indicate only that a particular principle (a definition of well-formedness) constrains S-Structure representation but not LF representation (see, for example, Huang’s (1982) discussion of Subjacency). But given that the definition of A-chain forms an integral part of the θ-Criterion (see (11)), it follows that the θ-Criterion itself has varying, level-dependent properties. Under (20), the θ-Criterion, by definition, applies to different types of formal objects at different levels. This of course raises the question of whether the θ-Criterion can be maintained as a unitary formal object (that is, a single principle of grammar). This in turn raises questions concerning the Projection Principle, which, under standard assumptions, requires that a single principle, the θ-Criterion, be satisfied at all levels of syntactic representation. 3 Solutions We have seen that under the LF cliticization analysis as proposed within the framework of Chomsky (1984), certain LF representations are incorrectly excluded by the θ-Criterion applying at LF. Such LF representations are incorrectly excluded by the θ-Criterion as a result of the LF application of the Local Binding Condition (1), incorporating the definition of local binding (2). I have argued that a problematic characterization of Universal Grammar results from the assumption that LF representation (unlike S-Structure representation) is exempt from the Local Binding Condition (1). In the remainder of this section I propose two alternative definitions of local binding. The incorporation of either definition allows the Local Binding Condition (1) to apply uniformly to all levels of syntactic representation. Consequently, under either definition, the definition of A-chain, the θ-Criterion, and the Projection Principle can be maintained as invariant across levels. 
3.1 Solution 1 Recall the problematic definition of local binding, from Chomsky (1984): (2) α locally binds β if α binds β and there is no γ such that α binds γ and γ binds β.
Now consider the following definition of local binding, adapted from Chomsky (1981, 185): (2A) α locally binds β iff α [X-]binds β, and if γ [Y-]binds β, then either γ [Y-]binds α or γ=α, where X and Y may be independently replaced by A or Ā.8
Under the assumption that any binder occupies either an A- or an Ā-position, the bracketed information in (2A) may be omitted, deriving (2B) (equivalent to (2A) under this assumption): (2B) α locally binds β iff α binds β, and if γ binds β, then either γ binds α or γ=α.
Although it is hard to imagine a generable structure providing a predictive distinction between (2) and (2A)=(2B), there is at least one type of structure with respect to which, in principle, the predictions of (2) and (2A)=(2B) diverge. Consider a structure exhibiting the following schema:
(21)
What is crucial in (21) is that the two binders of β, namely α and γ, bind each other. Under definition (2), α does not locally bind β, since α binds γ and γ binds β. Under (2A)=(2B), however, α does locally bind β. Even though γ binds β, since γ also binds α, α locally binds β.9 Notice now that (21) is equivalent to the substructure (16) of the LF representation (14). In the LF representation (14) Johni is equivalent to α of (21), himselfi is equivalent to γ of (21), and ei is equivalent to β of (21) (again, in (14) e′i is not a binder). Consequently, under (2A)=(2B), as opposed to (2), Johni does locally bind ei in the LF representation (14). The A-chain (Johni, ei) satisfies the Local Binding Condition, and the LF representation (14) thus satisfies the θ-Criterion. The Italian LF representation (19), which is structurally identical to (14), is generable in precisely the same way. In addition, under (2A)=(2B), in the S-Structure representation (6) the Local Binding Condition correctly excludes the A-chain (Johni, ei), and (6) is correctly ruled out by the θ-Criterion. Finally, the S-Structure representation (8) is also correctly excluded. As is clear in the substructure (9), Giannii asymmetrically binds both sii and e′i, each of which binds ei. Thus, both sii and e′i represent γs disallowed by (2A)=(2B), with the result that Giannii does not locally bind ei. Consequently, (Giannii, ei) is not an A-chain under the Local Binding Condition and (8) is correctly ruled out by the θ-Criterion. Thus, under (2A)=(2B), adapted from Chomsky (1981), as opposed to (2), from Chomsky (1984), we need not assume that the Local Binding Condition on A-chains constrains S-Structure representation yet fails to constrain LF representation. Rather, we may assume that the Local Binding Condition applies uniformly to all levels of syntactic representation.
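The divergence between (2) and (2B) can be illustrated with a short Python sketch. It is a hedged illustration rather than part of the analysis: binding is modeled as a set of ordered (binder, bindee) pairs, the schema (21) is encoded from its prose description (α and γ bind each other and both bind β), and the S-Structure representation (6) is encoded with the subject binding the pronoun and the trace, and the pronoun binding the trace. All identifiers are hypothetical.

```python
def nodes_of(binds):
    """Collect every element mentioned in the binding relation."""
    return {x for pair in binds for x in pair}


def locally_binds_2(a, b, binds):
    """Definition (2): a binds b and no g has a binding g and g binding b."""
    return (a, b) in binds and not any(
        (a, g) in binds and (g, b) in binds for g in nodes_of(binds))


def locally_binds_2b(a, b, binds):
    """Definition (2B): a binds b, and every binder g of b either
    binds a or is a itself."""
    return (a, b) in binds and all(
        g == a or (g, a) in binds
        for g in nodes_of(binds) if (g, b) in binds)


# Schema (21): the two binders of beta bind each other.
SCHEMA_21 = {("alpha", "gamma"), ("gamma", "alpha"),
             ("alpha", "beta"), ("gamma", "beta")}

# S-Structure (6): asymmetric binding of the intervening pronoun.
S6 = {("John", "he"), ("John", "e"), ("he", "e")}

if __name__ == "__main__":
    print(locally_binds_2("alpha", "beta", SCHEMA_21))   # False
    print(locally_binds_2b("alpha", "beta", SCHEMA_21))  # True
    print(locally_binds_2("John", "e", S6))              # False
    print(locally_binds_2b("John", "e", S6))             # False
```

The two definitions agree on the asymmetric configuration of (6) and disagree only where the binders of β bind each other, which is the configuration the LF representations (14)/(19) instantiate.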
Under (2A)=(2B) we encounter none of the problems engendered by definition (2) concerning level-dependent definitions of A-chain, the θ-Criterion, and the Projection Principle. 3.2 Solution 2 The (2A)=(2B) definition of local binding is empirically adequate since in the ill-formed S-Structure representations, α (the head of the requisite A-chain) asymmetrically binds γ (any binder of the tail of the requisite A-chain), whereas in the well-formed LF representations α (the head of the requisite A-chain) and γ (any binder of the tail of the requisite A-chain) bind each other. Recall that the local binding definition (2A)=(2B) in effect incorporates no requirements concerning the positions (A versus Ā) occupied by α, γ, and β. (Of course, in an A-chain α and β both occupy A-positions.) An empirically correct alternative definition of local binding can be proposed under which the definition of local binding imposes requirements on the positions occupied by α and γ.
Reconsider the S-Structure representations (6) and (8), which must be excluded, and the structurally identical LF representations (14)/(19), which must be generable. Disregarding Ā-binding relations, we have the following substructures: (6)
(8)
(14)/(19)
In the S-Structure representations (6) and (8) the tail of the requisite A-chain (namely, ei) has an A-binder, other than the head of the requisite A-chain. In the LF representations (14)/(19) this is not the case. The only A-binder of the tail of the requisite A-chain (ei) is the head of the requisite A-chain (Johni). Recall that in all these structures the clitic occupies an Ā-position and in (14)/(19), unlike (8), the indirect object does not bind the direct object. A distinction between (6) and (8), on the one hand, and (14)/(19), on the other, can be formally instantiated on the basis of these observations. Reconsider definition (2) of local binding: (2) α locally binds β if α binds β and there is no γ such that α binds γ and γ binds β.
Recall that under (2), the requisite A-chain in the LF representations (14)/(19) is incorrectly excluded, because himselfi, occupying an Ā-position, represents a γ disallowed by the definition. Suppose then that we modify the definition as follows, imposing the requirement that α and γ occupy the same type of position: (2C) α locally binds β if α X-binds β and there is no γ such that α X-binds γ and γ X-binds β, where X is uniformly replaced by either A or Ā throughout.
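Definition (2C) can be sketched in Python by adding a position type to each binder. The sketch below is illustrative and rests on assumptions: "α X-binds β" is modeled as "α binds β and α occupies a position of type X," so that the position of α fixes X; the binding facts for (6), (8), and (14)/(19) follow the substructures and observations given above; and the labels "A" and "A-bar" and all identifiers are hypothetical.

```python
def locally_binds_2c(a, b, binds, position):
    """Definition (2C): a X-binds b and no g is such that a X-binds g
    and g X-binds b, with X fixed by the position type of a."""
    if (a, b) not in binds:
        return False
    x = position[a]
    nodes = {n for pair in binds for n in pair}
    return not any(
        (a, g) in binds and (g, b) in binds and position[g] == x
        for g in nodes)


# S-Structure (6): the pronoun is an A-binder of the trace.
S6 = {("John", "he"), ("John", "e"), ("he", "e")}
POS6 = {"John": "A", "he": "A", "e": "A"}

# S-Structure (8): the clitic is in an A-bar position; its trace e' is in
# an A-position and binds the direct object trace e.
S8 = {("Gianni", "si"), ("Gianni", "e"), ("Gianni", "e'"),
      ("si", "e"), ("e", "si"), ("si", "e'"), ("e'", "si"),
      ("e", "e'"), ("e'", "e")}
POS8 = {"Gianni": "A", "si": "A-bar", "e": "A", "e'": "A"}

# LF (14)/(19): the raised anaphor is in an A-bar position; its trace,
# inside P'', does not bind e and is therefore omitted.
LF14 = {("John", "himself"), ("himself", "John"),
        ("John", "e"), ("himself", "e")}
POS14 = {"John": "A", "himself": "A-bar", "e": "A"}

if __name__ == "__main__":
    print(locally_binds_2c("John", "e", S6, POS6))      # False
    print(locally_binds_2c("Gianni", "e", S8, POS8))    # False
    print(locally_binds_2c("John", "e", LF14, POS14))   # True
```

On this encoding the only intervening A-binder in (8) is the clitic trace; the clitic itself, being in an Ā-position, is ignored when X is A.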
Under (2C), the S-Structure representation (6) is still excluded. Johni (α) in an A-position does not locally bind ei (β), since hei represents a γ such that Johni (α) A-binds hei (γ) and hei (γ) A-binds ei (β). Under (2C), the S-Structure representation (8) is also correctly excluded. Giannii (α), in an A-position, does not locally bind ei (β) since e′i represents a γ such that Giannii (α) A-binds e′i (γ) and e′i (γ) A-binds ei (β). The LF representations (14)/(19) are correctly generable. Johni, in an A-position, is the only A-binder of ei. Recall that the overt anaphor (“clitic”) occupies an Ā-position and e′i, dominated by the maximal projection P″, does not bind ei. Thus, in (14)/(19) there is no substitution instance for γ of (2C). Under (2C), Johni locally binds ei. Consequently, the A-chain (Johni, ei) is well-formed and the θ-Criterion is satisfied at LF.10 Under this analysis, the crucial distinction between Italian S-Structure representations such as (8) and LF representations such as (14)/(19) is that only in the former does the indirect object bind the direct object. That the indirect object does in fact bind the direct object in S-Structure representations such as (8) is suggested by Rizzi’s discussion of the ill-formedness of anaphoric cliticization in all Italian S-Structure representations with derived subjects: …formation of the appropriate chain structure would be inevitably blocked by the intervening binder si (and by the trace of si), hence no well-formed output would ever result. (Rizzi (1982, 9)) Under (2C), the above parenthetical is what is crucial.
In (8) the clitic sii, in an Ā-position, plays no role in “preventing” the N″ Giannii, occupying an A-position, from locally binding ei.11 Under (2C), then, the crucial distinction between Italian S-Structure indirect object cliticization, such as (8), and English/Italian LF indirect object cliticization, such as (14)/(19), is that S-Structure indirect object cliticization does not involve extraction of the clitic from the maximal projection P″, whereas LF indirect object cliticization does. The result is that only S-Structure indirect object cliticization creates a local A-binder of the direct object position. LF indirect object cliticization creates no such binder.12 Further evidence for this distinction between S-Structure cliticization and LF cliticization is provided by the following paradigm (see Rizzi (1982) for an analysis of the S-Structure facts): (22)
(23) [Comp′ [Infl″ I entrusted Johni [P″ to himselfi]]] (=(15))
(24)
Following Chomsky (1984), in the LF representations of (22) and (23) we tentatively assume that the reflexive is adjoined to V″ (as opposed to Infl), in which position the reflexive is assumed to be governed by its antecedent. (LF adjunction to V″ is presumably a marked option corresponding to the marked option (selected in both English and Italian) of allowing a direct object as the sole binder of an indirect object (Chomsky (1984)).) Assuming that the binding theory applies at LF (Chomsky (1982; 1984)), the LF representations of (22) and (23) do not violate condition C. (25) Condition C An R-expression is A-free (in the domain of the head of its chain). (Chomsky (1984))
Johni/Giannii, an R-expression, is Ā-bound by the reflexive clitic, an allowable configuration (in fact providing evidence of a further distributional similarity between wh-trace and “names” (see Freidin and Lasnik (1981))). Moreover, in the LF representations of (22) and (23) the trace of the raised reflexive does not bind Johni/Giannii, since the trace, unlike the R-expression, is dominated by the maximal projection P″. In the ill-formed S-Structure representation (24), again the binding of Giannii by sii, in an Ā-position, is allowed under condition C.13 Nonetheless, condition C is violated in (24) since the trace of the clitic illicitly A-binds the R-expression Giannii. This paradigm thus provides further evidence for the S-Structure absence (and the LF presence) of the maximal projection P″ in structures exhibiting indirect object anaphoric cliticization. In summary, under the (2C) definition of local binding, the S-Structures (6) and (8) are correctly excluded by the Local Binding Condition (1) and hence the θ-Criterion. The LF representations (14)/(19) are correctly generable. Under (2C), as opposed to (2), we need not assume that the Local Binding Condition on A-chains constrains S-Structure representation yet fails to constrain LF representation. Rather, the Local Binding Condition (1), incorporating the definition of local binding (2C), can apply uniformly to all levels of syntactic representation. Consequently, none of the above problems concerning level-dependent definitions of A-chain, the θ-Criterion, and the Projection Principle emerge. 4 Summary and Discussion We have seen that under the LF cliticization analysis as proposed within the framework of Chomsky (1984), certain LF A-chains are incorrectly excluded by the Local Binding Condition (1), incorporating the definition of local binding (2): (2) α locally binds β if α binds β and there is no γ such that α binds γ and γ binds β.
The exclusion of such LF A-chains results in the incorrect exclusion of certain LF representations by the θ-Criterion applying at LF, such application being required by the Projection Principle.
It has been argued that the following partial organization of Universal Grammar represents a problematic solution: (20) Under definition (2) of local binding, the Local Binding Condition on A-chains, (1), constrains S-Structure representation but does not constrain LF representation.
Since the Local Binding Condition in effect represents part of the definition of A-chain, this organization entails that the definition of A-chain varies at different levels. Moreover, given that the definition of A-chain forms an integral part of the definition of the θ-Criterion, it follows that the θ-Criterion itself has varying, level-dependent properties. This raises the question of whether the θ-Criterion can be maintained as a single (invariant) principle of grammar. This in turn raises questions concerning the Projection Principle, which, under standard assumptions, requires that a single principle, the θ-Criterion, be satisfied at all levels of syntactic representation. As solutions, I have proposed two alternative definitions of local binding: (2A)=(2B), adapted from Chomsky (1981), and (2C): (2A) α locally binds β iff α [X-]binds β, and if γ [Y-]binds β, then either γ [Y-]binds α or γ=α, where X and Y may be independently replaced by A or Ā. (2B) α locally binds β iff α binds β, and if γ binds β, then either γ binds α or γ=α. (2C) α locally binds β if α X-binds β and there is no γ such that α X-binds γ and γ X-binds β, where X is uniformly replaced by either A or Ā throughout.
Under the incorporation of either definition, the above data are accounted for under the Local Binding Condition on A-chains applying as a constant to all levels of syntactic representation. Consequently, under either definition of local binding, the definition of A-chain, the θ-Criterion, and the Projection Principle can be maintained as invariant across levels. A number of unanswered questions of course remain. First, provided they are empirically distinct, which (if either) of the two alternative definitions of local binding is correct? The answer to this requires further investigation of the (multilevel) properties of chains. One issue that is relevant to the further investigation of LF chains concerns the definition of antecedent. Recall that LF cliticization of overt anaphors is, in effect, forced by the LF requirement that an overt anaphor be governed by its antecedent. Chomsky (1984) suggests that in structures with a subject binder, the overt anaphor is adjoined to Infl (see fn. 5 and Chomsky (1984)), whereas in (marked) structures with an object binder, the overt anaphor is adjoined to V″ (see discussion of (22) and (23), and Chomsky (1984)). The LF representation (14) is based on the assumption that in the S-Structure representation (13) (13) [Comp′ [Infl″ Johni was entrusted ei to himselfi]]
Johni is the antecedent of the anaphor himselfi (hence, in the LF representation (14), himselfi is adjoined to Infl). However, this is not altogether clear. Perhaps the direct object (trace) is the antecedent of himselfi. If so, then in the LF representation of (13) himselfi is adjoined to V″ (not to Infl as in (14)), yielding the structure (26):
(26)
Assuming that (26) is the only LF representation of (13), notice first that under definition (2) it is incorrectly ruled out since Johni does not locally bind ei. In addition, (26) provides an empirical distinction between (2A)=(2B) and (2C). Under (2A)=(2B), the LF representation (26) is incorrectly ruled out ((Johni, ei) is not an A-chain since Johni does not locally bind ei). Under (2C), the LF representation is correctly generable ((Johni, ei) is an A-chain since Johni does locally bind ei). Determining whether (14), (26), or some other structure is the LF representation of (13) requires further specification of the term antecedent.14 In the absence of such further specification, (2A)=(2B) and (2C) would appear to be equally adequate definitions of local binding. Another approach to determining the relative adequacy of these two alternative definitions of local binding might well involve the investigation of principles other than the Local Binding Condition that incorporate the term locally binds. Finally, an important distinction between these two alternative definitions may rest on the answer to the question posed at the outset: can a local binding condition, defined in terms of the (2A)=(2B) or the (2C) definition of local binding (as opposed to (2)), be entirely reduced to independent principles? Determining the relative reducibility of these alternative definitions of the Local Binding Condition awaits further research. Notes * I thank Howard Lasnik and Elaine McNulty for many useful comments on an earlier draft of this article. Thanks also to an anonymous LI reviewer for helpful suggestions. Any errors are, of course, my own. 1 Following Chomsky (1984), I assume the following clausal structure:
(i)
In addition, I will simply assume the chain formation algorithm is free (that is, chains are freely formed). The arguments presented below do not depend upon this assumption.
2 Representation (8) is adapted from Rizzi (1982). The prime notation occurring on the indirect object, e′i, is used here (and below) only to distinguish the indirect object empty category from the direct object empty category; it has no theoretical status. Rizzi notes that the order of ei and e′i is irrelevant; e′i could occur to the left of ei. Regardless of the order of ei and e′i, the configurational relations remain the same. Since we are concerned here only with such relations, I will simply assume structure (8).
3 There is a problem here. If condition A demands A-binding, then structures containing clitic traces (anaphors) bound solely by the clitic (occupying an Ā-position) will incorrectly violate it. Rizzi (1982, fn. 15) suggests that NP-trace and clitic-trace could perhaps be unified and distinguished from wh-trace if the binding theory were to distinguish clause-internal binding (NP-trace and clitic-trace) from clause-peripheral binding (wh-trace).
4 As Lasnik notes, the postulation of such a restriction must be tentative, since such cases of “clitic climbing,” as in (10), are limited to a small set of verbs.
5 Under the binding theory proposed in Chomsky (1984), Agr is not a binder. The governing category of the anaphor in an “NIC violation” structure, such as (i), is the matrix S: (i) * [Comp′ [Infl″ Johni thinks [Comp′ (that) [Infl″ himselfi left]]]] In (i), then, condition A is satisfied; the anaphor is bound in its governing category. Chomsky (1984) assumes an LF requirement under which an overt anaphor must be governed by its antecedent.
He proposes that in the LF of (i) the overt anaphor is raised to the matrix Infl, in which position it is governed by its antecedent, the subject: (ii) * [Comp′ [Infl″ Johni [Infl′ [Infl himselfi [Infl [V″ thinks [Comp′ (that) [Infl″ ei left]]]]]]]] The LF representation (ii) is now ruled out by the ECP (see, for example, Lasnik and Saito (1984)), for the trace is not properly governed. Notice that at no point in the derivation is binding condition A violated. The approach thus seeks to eliminate a redundancy between binding condition A and the ECP, both of which would otherwise be violated in the following structure, for example: (iii) * [Comp′ [Infl″ Johni seems [Comp′ (that) [Infl″ ei left]]]]
See Chomsky (1984) for further details and consequences.
6 Notice that under definition (3) of binding, the clitic in the LF representation (14) binds all nominal elements in V″, including its trace. That the clitic binds its trace would appear to be a necessary result. If clitics are chain-heading arguments occupying an Ā-position (as is assumed by Rizzi (1982) and Chomsky (1982) for S-Structure
clitics), then LF clitics (arguments in an Ā-position) must bind their traces so as to enter into a chain and thereby satisfy the θ-Criterion.
7 Under this approach, to allow the LF representations (14)/(19) we must also assume that LF Ā-chains (like S-Structure Ā-chains (see (10) and fn. 4)) are exempt from a local binding condition. Furthermore, assuming that the binding theory applies at LF (see Chomsky (1982; 1984)), LF clitic traces, like S-Structure clitic traces, must be anaphors. If the clitic trace were pronominal or an R-expression, binding conditions B or C respectively would incorrectly exclude (14)/(19). Within the framework of Brody (1984), notice that the clitic trace could only be an anaphor.
8 The definition of locally binds in (2A) is an adaptation of the definition of is locally bound by provided in Chomsky (1981, 185). I have changed Chomsky’s definition of is locally bound by to active voice, thereby deriving a definition of locally binds. In addition, for comparison with (2), in deriving (2A) I have systematically interchanged the variables α and β as they occur in Chomsky’s (1981, 185) definition.
9 Notice that in (21), under definition (2), β has no local binder. Under (2A)=(2B), both α and γ locally bind β.
10 Under the (2C) definition of local binding, all the clitic chains discussed above represent instances of local binding. (Since the clitic is the only Ā-binder of the clitic trace, the clitic locally binds the clitic trace.) Notice also that definition (2C) is incompatible with the functional determination algorithm proposed in Chomsky (1982, 35), which crucially incorporates the term local binding. For example, under (2C), the empty category in the following strong crossover configuration is both locally Ā-bound and locally A-bound: (i) * [Comp′ Whoi [Infl″ did hei see ei]]
Under the functional determination algorithm, ei is a variable. Since the functional determination analysis (see Chomsky (1982) and Koopman and Sportiche (1983)) allows locally Ā-bound pronouns, such as hei in (i), while also eliminating condition C of the binding theory, (i) is incorrectly generable. This is problematic to the extent that the functional determination analysis provides a tenable account of strong crossover and, more generally, an account of the distribution of empty categories. For arguments that the functional determination analysis does not provide a general account of strong crossover, see Epstein (1984). For an analysis eliminating functional determination altogether, see Brody (1984). 11 Rizzi (1982, 2) assumes the following definition of local binding, which is (roughly) the “negated equivalent” of (2B) (adapted from Chomsky (1981)): (i) α is the local binder of β iff α is a binder of β, and there is no γ such that γ is a binder of β, and γ is not a binder of α. Under this definition, in (for example) the structure (8) both the clitic sii and the clitic trace e′i locally bind ei. Notice that the definition Rizzi assumes places no restriction on the positions (A versus Ā) occupied by α, β, and γ. Rizzi does however suggest the possibility of an alternative analysis of the Italian S-Structure facts under which “…the intervention effect…is somehow restricted to A-positions” (1982, fn. 18). This issue is left open pending further investigation of the clitic-trace relation. 12 This structural distinction is arguably a consequence of Case theory requirements both that overt anaphors such as himselfi be Case-marked at S-Structure and that S-Structure indirect object clitic traces be Caseless. Given Chomsky’s (1981) Visibility Principle, under which the Case Filter is reduced to the θ-Criterion, the above Case requirements would perhaps be analyzable as θ-theory requirements. 
If such an analysis is tenable, then the structural distinction between S-Structure and LF indirect object cliticization (which plays a crucial role in determining the θ-theoretic notion of A-chain well-formedness) is itself a consequence of θ-theory—that is, not an “accidental” consequence of requirements imposed by the Case Filter. 13 Rizzi (1982, 29) formulates condition C as (i):
(i) A lexical element is free. Rizzi claims that (24) is ruled out under condition C because the lexical element Giannii is (Ā-)bound by the clitic. Rizzi (1982, fn. 16) leaves open the role of traces (in A-positions) in determining such ill-formedness. The above formulation of condition C would appear to preclude a unified analysis of “names” and wh-trace, since the latter may (in fact, by definition, must) be Ā-bound—that is, under this formulation of condition C wh-trace must not be a lexical element. In addition, given the application of the binding theory at LF (see Chomsky (1982; 1984)), which Rizzi (1982) was presumably not assuming, this version of condition C is problematic since it would incorrectly rule out the LF representations of (22) and (23) (as well as (14)/(19)), in which the lexical elements Johni/Giannii are (Ā-)bound by the reflexive “clitics,” himselfi/se stessoi. 14 An LI reviewer suggests another conceivable LF representation of (13), one in which himself is adjoined to P″, the direct object ei being the antecedent: (i) [Comp′ [Infl″ Johni [V″ was entrusted ei [P″ himselfi [P″ to e′i]]]]] The reviewer suggests that (i), hence (12), is generable without modifying Chomsky’s (1984) framework. This suggestion embodies two essential claims: first, that (i) satisfies Chomsky’s LF requirement, himself being governed by the antecedent ei (crucially, the adjunct P″ must not be a barrier to government), and second, that Chomsky’s (1984) Local Binding Condition is satisfied (that is, John locally binds ei). Strictly speaking, (i) is not generable within Chomsky’s (1984) framework. Consider first Chomsky’s (1984, 229) definition of government: “…a category α governs a maximal projection X″ if α and X″ c-command each other….” Now if ei fails to govern himself, then the LF requirement on overt anaphors is violated. If, on the other hand, ei governs himself, then, by definition, himself must c-command ei (and conversely).
But if himself c-commands ei, then, since these categories are coindexed, himself must bind ei. Thus, (i) must exhibit the following substructure: (ii)
Given (ii), (i) is ruled out by Chomsky’s (1984) Local Binding Condition; that is, John does not locally bind ei. Hence, whether or not ei governs himself, (i) is not generable. Having noted problems with the reviewer’s analysis of anaphors with object antecedents, I should also note a problem with Chomsky’s (1984) analysis of this phenomenon. Recall, in cases in which an anaphor is bound solely by an object, it is tentatively suggested that the anaphor is adjoined to V″ at LF. However, under adjunction to V″, the object (antecedent) fails to c-command—hence fails to govern—the anaphor, thereby violating the LF requirement. Chomsky (1984, 247) acknowledges this problem and suggests that some slight revision of the definition of c-command may be needed not only for such cases of LF adjunction to V″ but also for other cases not referenced or discussed in that work. Notice that in (26), for example, the required revision of c-command is presumably one under which himself and the direct object would c-command each other—that is, one under
which himself would be governed by its antecedent, the direct object. But again, if these two categories were to c-command each other, then, since they are coindexed, himself would (still) certainly bind the direct object just as in (26) or (ii)—that is, the Local Binding Condition of Chomsky (1984) would be violated, just as in (26) or (ii).
References
Brody, M. (1984) “On Contextual Definitions and the Role of Chains,” Linguistic Inquiry 15:355–380.
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press.
Chomsky, N. (1984) “Knowledge of Language: Its Nature, Origin, and Use,” manuscript, MIT.
Epstein, S.D. (1984) “A Note on Functional Determination and Strong Crossover,” The Linguistic Review 3:299–305.
Freidin, R. and Lasnik, H. (1981) “Disjoint Reference and Wh-Trace,” Linguistic Inquiry 12:39–53.
Huang, C.-T.J. (1982) “Logical Relations in Chinese and the Theory of Grammar,” unpublished doctoral dissertation, MIT.
Koopman, H. and Sportiche, D. (1983) “Variables and the Bijection Principle,” The Linguistic Review 2:139–160.
Lasnik, H. (1985) “Illicit NP Movement: Locality Conditions on Chains?” Linguistic Inquiry 16:481–490.
Lasnik, H. and Saito, M. (1984) “On the Nature of Proper Government,” Linguistic Inquiry 15:235–289.
Rizzi, L. (1982) “On Chain Formation,” manuscript, Università della Calabria.
4 ADJUNCTION AND PRONOMINAL VARIABLE BINDING
In this article I examine the analysis of pronominal variable binding proposed by May (1985), an analysis that he claims independently supports the theory of adjunction he proposes. I point out certain empirical problems confronting this analysis of pronominal variable binding and suggest solutions to them. My results indicate that the pronominal variable binding phenomena examined provide no independent support for May’s theory of adjunction and are fully consistent with an alternative standard analysis of adjunction.
In sections 1 and 2 I review the central aspects of May’s theory of adjunction and the crucial role it plays in his analysis of pronominal variable binding. In section 1 I present the Scope Principle from May (1985), including the concept relative scope, and introduce the Path Containment Condition, as adapted by May (1985) from Pesetsky (1982). In section 2 I discuss the concept absolute scope and its role in May’s proposed analysis of pronominal variable binding. In section 3 I point out empirical problems confronting this analysis, and in section 4 I provide a solution to these problems and discuss the consequences of adopting this solution.
1 The Scope Principle, Relative Scope, and the Path Containment Condition
One of the central phenomena with which May is concerned is the contrast in interpretation between sentences such as (1) and (2):
(1) (=May’s (12), p. 38) What did everyone buy for Max?
(2) (=May’s (16), p. 39) Who bought everything for Max?
(1) is ambiguous; it can be interpreted as a single question or as a “distributed” question. Under the former interpretation, an appropriate answer would be Everyone bought Max a car. Under the latter interpretation, an appropriate answer would be John bought Max a pen, Bill bought Max a pencil, and Fred bought Max a paper clip. By contrast, (2) displays no such ambiguity; it is interpretable only as a single (undistributed) question. How is this contrast between (1) and (2) to be accounted for? Within May’s framework, the LF representation of (1) is derived by adjoining the quantifier phrase (QP) everyone to S, yielding (3): (3) (=May’s (14), p. 38)
[S′ what2 [S everyone3 [S e3 bought e2 for Max]]]
Under May’s analysis, this single LF representation represents both interpretations of (1). This follows from his proposal that
a single multiple quantified LF representation can be seen to manifest a uniquely specifiable class of interpretations just in case the quantified phrases…govern…one another. (p. 33)
Following, in essentials, Aoun and Sportiche (1983), government and c-command are defined as follows:
(4) (=May’s (8), p. 33) α governs β=df α c-commands β and β c-commands α, and there are no maximal projection boundaries between α and β.
(5) (=May’s (9), p. 34) α c-commands β=df every maximal projection dominating α dominates β, and α does not dominate β.
Under these definitions, what and everyone c-command each other in (3).1 Further, these two operators govern each other. Notice that to derive the latter result there must be no maximal projection boundary between what and everyone. Since the outermost S bracket is indeed a projection boundary occurring between what and everyone (see May, p. 38), May (p. 34) assumes (6): (6) S is not a maximal projection.
Hence, there is no maximal projection boundary between what and everyone, with the result that these two operators govern each other. The ambiguity of (1) is then formally accounted for by (7) and (8):
(7) The Scope Principle “…members of Σ-sequences are free to take on any type of relative scope relation.” (p. 34)
(8) Σ-sequence “…a class of occurrences of operators Ψ [is] a Σ-sequence if and only if for any Oi, Oj a member of Ψ, Oi governs Oj, where ‘operator’ means ‘phrases in Ā-positions at LF,’…” (p. 34)
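Definitions (4) through (6) can be made concrete with a small computational sketch. The Python below is an illustration only and is not part of May's proposal: the `Node` class, the `category` tag that groups the segments of one projection, and the reading of "maximal projection boundary between α and β" as a maximal node that dominates exactly one of the two are all assumptions introduced here. The tree is a simplified version of (3), built once with S nonmaximal, as in (6), and once with S maximal for contrast.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality: the two S segments are distinct nodes
class Node:
    label: str
    category: str            # hypothetical tag: segments of one projection share it
    maximal: bool = False
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node):
    out = []
    while node.parent is not None:
        node = node.parent
        out.append(node)
    return out


def segments(root, category):
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.category == category:
            found.append(node)
        stack.extend(node.children)
    return found


def dominates(root, category, node):
    # A category dominates a node only if every one of its segments does.
    above = ancestors(node)
    return all(seg in above for seg in segments(root, category))


def c_commands(root, a, b):
    # (5): every maximal projection dominating a dominates b, and a does not dominate b.
    maximal = {n.category for n in ancestors(a) if n.maximal}
    dominating = [c for c in maximal if dominates(root, c, a)]
    return (all(dominates(root, c, b) for c in dominating)
            and not dominates(root, a.category, b))


def governs(root, a, b):
    # (4): mutual c-command, and no maximal projection boundary between a and b.
    if not (c_commands(root, a, b) and c_commands(root, b, a)):
        return False
    above_a, above_b = ancestors(a), ancestors(b)
    between = [n for n in above_a if n not in above_b]
    between += [n for n in above_b if n not in above_a]
    return not any(n.maximal for n in between)


def build_3(s_is_maximal):
    # Simplified (3): [S' what2 [S everyone3 [S e3 [VP e2]]]]
    what, everyone = Node("what2", "what2"), Node("everyone3", "everyone3")
    e3, e2 = Node("e3", "e3"), Node("e2", "e2")
    vp = Node("VP", "VP", maximal=True).add(e2)
    s_low = Node("S", "S", maximal=s_is_maximal).add(e3, vp)
    s_high = Node("S", "S", maximal=s_is_maximal).add(everyone, s_low)
    root = Node("S'", "S'", maximal=True).add(what, s_high)
    return root, what, everyone


for flag in (False, True):
    root, what, everyone = build_3(flag)
    print(flag, c_commands(root, everyone, what), governs(root, what, everyone))
```

Under these assumptions, what and everyone govern each other only when S is nonmaximal, whereas everyone c-commands what on either setting, since it is dominated by only one segment of S.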
Given that the ambiguity of (1) is accounted for, how is the interpretation of (2) predicted? To predict the nonambiguity of (2), the following LF representation must not be generated: (9) (=May’s (21), p. 41) [S′ who3 [S everything2 [S e3 bought e2 for Max]]]
If such a representation were generated, the Scope Principle would apply, thereby (wrongly) predicting that (2) is ambiguous. To rule out representation (9), May adapts the Path Containment Condition (PCC) of Pesetsky (1982), defining this condition as follows:
(10) Path Containment Condition “Intersecting Ā-categorial paths must embed, not overlap.” (p. 118)
May explicates this condition as follows:
A path is a set of occurrences of successively immediately dominating categorial nodes connecting a bindee to its binder…. Each contiguous pair of nodes within a path constitutes a path segment, and a path, more precisely, is just a set of such segments. I will refer to a set of such paths associated with an LF representation as its path structure. Paths intersect only if they have a common path segment. Consequently, paths sharing a single node do not intersect. If the paths do intersect, then the PCC requires that one of the paths must properly contain all the members of the other. (p. 118)
The LF representation (9) is ruled out by the PCC since it exhibits the following path structure (see May, p. 120):
(11) path e3 {S,S,S′}
path e2 {VP,S,S}
The paths of e3 and e2 intersect; that is, they have a common path segment, (S,S). Thus, the PCC is applicable. It is violated because neither path is a proper subset of the other. Given that (9) is ruled out, what then is the LF representation of (2)? May (p. 42) proposes that in the LF representation of (2) everything is adjoined to VP, yielding (12): (12) [S′ who3 [S e3 [VP everything2 [VP bought e2 for Max]]]]
This representation has the following path structure: (13) path e3 {S, S′} path e2 {VP, VP}
(13) satisfies the PCC, which is, in fact, inapplicable since the paths do not intersect. In addition, given that (12) is the only LF representation of (2), it is correctly predicted that this sentence is unambiguous. This prediction is derived because the Scope Principle fails to apply to (12); who and everything do not govern each other since the maximal projection boundary of VP (the outermost VP bracket) intervenes between these two operators. Thus, the Scope Principle is inapplicable, with the result that
the grammar will make available only a dependent interpretation, relative scope order being fixed simply as a function of constituency, determined in a “top-to-bottom” fashion, from structurally superior to structurally inferior phrases. (p. 35)
Thus, the relevant interpretive property of sentence (2) is predicted: in the LF representation (12) everything has relative scope narrower than who.
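The way the PCC separates the path structures (11) and (13) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than May's own formalization: each path is written as a bottom-up list of node names, the names S1/S2 and VP1/VP2 are hypothetical labels distinguishing the lower and higher segments of an adjunction structure, and "embed" is read as a proper-subset relation between sets of path segments.

```python
from itertools import combinations


def path_segments(path):
    # A path is listed bottom-up; its segments are the contiguous pairs of nodes.
    return {(path[i], path[i + 1]) for i in range(len(path) - 1)}


def intersect(p, q):
    # Paths intersect only if they share a segment; sharing one node is not enough.
    return bool(path_segments(p) & path_segments(q))


def satisfies_pcc(paths):
    # (10): intersecting paths must embed, i.e. one is a proper subset of the other.
    for p, q in combinations(paths, 2):
        sp, sq = path_segments(p), path_segments(q)
        if sp & sq and not (sp < sq or sq < sp):
            return False
    return True


# Hypothetical node names: S1/S2 and VP1/VP2 mark lower and higher segments.
paths_11 = [["S1", "S2", "S'"], ["VP", "S1", "S2"]]   # path structure (11), for (9)
paths_13 = [["S", "S'"], ["VP1", "VP2"]]              # path structure (13), for (12)

print(satisfies_pcc(paths_11))
print(satisfies_pcc(paths_13))
```

On this encoding the two paths in (11) share the segment (S1, S2) without either containing the other, so the check fails, while the paths in (13) share no segment and the condition is vacuously met.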
2 Absolute Scope and Pronominal Variable Binding
As seen in the LF representation (12), a VP-adjoined operator has relative scope narrower than an operator in the next higher Comp. Despite having different relative scopes, however, such operators have the same absolute scope, given that the absolute scope of an operator is simply its c-command domain. The fact that the wh-phrase in Comp and the VP-adjoined quantifier have the same absolute scope (equivalently, the same c-command domain) follows directly from May’s theory of adjunction. Within this theory, the VP-adjoined quantifier is not dominated by VP but rather is dominated by only one node (equivalently, segment) of the VP projection. Consequently, the minimal maximal projection dominating a VP-adjoined operator is S′; hence, its absolute scope is S′, the same as that of a wh-phrase in Comp. May best explains this distinction between relative and absolute scope:
What is the scope of [an] NPi, adjoined to VP?—we see that it is its c-command domain, which is S′. This maximal projection is the minimal one dominating NPi, which is dominated not by the VP-projection, but only by its higher member node. Thus, the “absolute” scope of NPi extends outside the VP to the clausal level as does the scope of S-adjoined quantifiers and wh-phrases in COMP, although, as we have seen, VP-adjoined phrases will have relative scope shorter than their counterparts in COMP or adjoined to S. (pp. 58–59)
Given this distinction between relative and absolute scope, May argues that the interpretive properties of sentences exhibiting “crossed” pronominal variable binding, such as (14), are predicted (see May, p. 59):
(14) Which pilot who shot at it hit every MIG that chased him?
According to May, sentences such as (14) exhibit the following two interpretive properties: (15) The QP every MIG that chased him is interpreted as having narrower scope than the wh-phrase which pilot who shot at it, and (16) The pronoun it can be interpreted as a variable bound by the QP every MIG that chased him and the pronoun him can be interpreted as a variable bound by the wh-phrase which pilot who shot at it.
(14) is exactly like (2) in that it contains a subject wh-phrase and an object QP. Thus, the object QP must be adjoined to VP in the LF representation of (14); were it adjoined to S, the PCC would be violated (see (9)). Hence, I assume that the LF representation of (14) must be (17), in which the wh-phrase occupies Comp and the QP is adjoined to VP: (17) [S′ [NP2 which pilot who shot at it3][S e2 [VP [NP3 every MIG that chased him2] [VP hit e3]]]]
In (17) the Scope Principle is inapplicable; the QP has narrower scope than the wh-phrase (see (12)). Thus, the interpretive property (15) is accounted for. But given the LF representation (17), how is (16) accounted for? That is, how is the possibility of “crossed” pronominal variable binding predicted? To answer this, we must first identify the conditions under which pronominal variable binding is allowed. May assumes the following condition on LF representation:
(18) “A pronoun is a bound variable only if it is within the scope of a coindexed quantifier phrase.” (p. 21)
Given that the term scope, as it occurs in (18), is to be interpreted as absolute scope (which May defines as c-command domain), it follows that each pronoun in (17) is indeed a bound variable. Each pronoun is a bound variable because each pronoun is within the absolute scope (c-command domain) of the operator with which it is coindexed. Under this analysis, then, each pronoun in (17) is a bound variable, since each operator has absolute scope over S′ and can therefore variable-bind the pronoun with which it is coindexed. Nonetheless, in (17) the Scope Principle is inapplicable; the QP is interpreted as having relative scope narrower than the wh-phrase.
3 Problems with the Analysis of Pronominal Variable Binding
Although (18) is a necessary condition for pronominal variable binding, it cannot be strengthened to a sufficient one. To see this, consider (19) and (20), only the latter of which allows a bound variable reading of the pronoun his:
(19) Who does his mother admire?
(20) Who admires his mother?
The LF representation of (19) is (21): (21) (=May’s (66), p. 146) [S′ [C who2][S [NP [Det his2][N′ mother]][VP admire e2]]]
In this LF representation (18) is met; that is, the pronoun is indeed within the scope of a coindexed QP (who), yet pronominal variable binding is in fact impossible. How then are LF representations such as (21) excluded? May (section 5.5) proposes that all locally Ā-bound categories, be they empty categories or overt pronominals, generate paths to their binders.2 The unavailability of the bound variable reading of the pronoun in (19) then follows from the fact that the LF representation of such a reading—namely, (21)— violates the PCC. Since (21) contains two locally Ā-bound categories, its path structure consists of two paths (see May, p. 146): (22) path his2 {NP,S,S′} path e2 {VP,S,S′}
Thus, the PCC is violated. By contrast, the LF representation of the bound variable reading of the pronoun in (20) satisfies the PCC: (23) (=May’s (67), p. 147) [S′ [C who2][S e2 [VP admires [NP [Det his2][N′ mother]]]]]
As May (p. 147) notes, the structure contains only a single Ā-path, namely, that of e2; his generates no path at all since it is not locally Ā-bound. Since the path structure of (23) consists of only one path, the PCC is, of course, satisfied. Because condition (18) is also satisfied, his is a bound variable.3 Consider next the quantificational analogues of (19) and (20):4 (24) His mother admires every man. (25) Every man admires his mother.
(24), like (19), disallows a bound variable reading of the pronoun, whereas (25), like (20), permits it. The LF representation of (24) is (26): (26) [S′ [S every man2 [S [NP [Det his2][N′ mother]] [VP admires e2]]]]
The path structure of (26) is like that of (21), violating the PCC: (27) path his2 {NP,S,S} path e2 {VP,S,S}
By contrast, consider the LF representation of (25): (28) [S′ [S every man2 [S e2 [VP admires [NP [Det his2][N′ mother]]]]]]
Given that e2 is the only locally Ā-bound category, (28) has a path structure consisting of only one path; hence, the PCC is satisfied. Thus, it appears that the contrast in interpretation between (24) and (25) is accounted for. However, to derive this contrast, yet another LF representation of (24) must be ruled out. Recall that distinguishing the interpretive properties of (1) and (2) rested on the possibility of adjoining quantifiers to VP in the LF component. Given this possibility, consider the following LF representation of (24): (29) [S′ [S [NP his2 mother][VP every man2 [VP admires e2]]]]
Now, since the absolute scope of a VP-adjoined quantifier is S′ (see (17)), his is indeed within the scope of every man, just as the pronoun it is within the scope of the VP-adjoined QP in (17). Since these phrases are coindexed, his satisfies principle (18) and is thus a variable bound by the quantifier adjoined to VP. Furthermore, the path structure of (29) satisfies the PCC. Giving the PCC the best possible chance of ruling out (29), assume that both his and e generate paths to every man. Under this assumption, the path structure of (29) is (30): (30) path his2 {NP,S,VP} path e2 {VP,VP}
Recalling from the above discussion of the PCC that “…paths sharing a single node do not intersect” (see May, p. 118), the paths in (30), having only the higher VP node in common, do not intersect. Consequently, the PCC is satisfied.5 Thus, (29) (with the path structure (30)) is generable, with the result that it is incorrectly
predicted that sentences like (24) allow a bound variable reading of the pronoun. Thus, quantificational weak “crossover” is incorrectly generable under this analysis of pronominal variable binding.6
To summarize the problem May’s analysis confronts: The analysis of “crossed” pronominal variable binding (see (17)) entails that a VP-adjoined QP c-commands (hence, can variable-bind) a pronoun dominated by a higher Comp. Such pronominal variable binding provides independent support for May’s theory of adjunction. Within this theory, a VP-adjoined QP is not dominated by VP, with exactly the desired result that it does indeed c-command (hence, can variable-bind) a pronoun dominated by a higher Comp. But if it is true that a VP-adjoined QP is not dominated by VP (as dictated by May’s theory of adjunction) and it therefore c-commands (hence, can variable-bind) categories external to the VP, then LF representations such as (29) are incorrectly generable.
4 A Solution
In developing a solution for this problem, let us begin by trying to exclude LF representations such as (29). As we have seen, such representations are allowed within the analysis examined here because (a) the pronoun is within the scope of the VP-adjoined QP (satisfying (18)) and (b) the PCC is satisfied. As a solution, we will retain (18), the (standard) requirement that a pronoun must be within the scope of a coindexed QP to receive bound variable interpretation, but we will abandon the PCC as a constraint on pronominal variable binding, since it is ineffective in excluding cases like (29). Thus, we will assume (31):
(31) Pronouns do not generate paths.
Under (31) we now must exclude two different LF representations of sentence (24), namely, (26) and (29): (26) [S′ [S every man2 [S [NP [Det his2][N′ mother]][VP admires e2]]]] (29) [S′ [S [NP his2 mother][VP every man2 [VP admires e2]]]]
Given (31), (26) and (29) each have only one path; hence, the PCC is trivially satisfied. Bound variable interpretation of the pronoun is then incorrectly allowed, since each pronoun is within the scope of a coindexed QP. To exclude such representations, we adopt the analysis of pronominal variable binding proposed in Higginbotham (1980), itself a formalization of the Leftness Condition of Chomsky (1976):7 (32) The Leftness Condition A variable cannot be the antecedent of a pronoun to its left.
Under Higginbotham’s analysis, quantifiers and pronouns are initially contraindexed. A pronoun can reindex and thereby come to have the same index as a QP only if an empty category that is coindexed with the QP occurs to the left of the pronoun. That is, Higginbotham (1980, 689) proposes the following rule: (33) The Reindexing Rule In a configuration: …ei…pronounj…
Optionally reindex pronounj to pronouni.
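Because rule (33) refers only to left-to-right order, its effect can be sketched as a check over the terminal string of an LF representation. The Python below is a rough illustration, not Higginbotham's formulation: the token encoding, the function names, and the initial pronoun index 1 (standing in for the initial contraindexing) are assumptions made here, and the further details of Higginbotham (1980) are ignored. The two strings correspond to the terminal order of (26) and (28).

```python
from typing import List, Optional, Set, Tuple

# (kind, index); kind is "op", "ec", "pron", or "word" (hypothetical encoding)
Token = Tuple[str, Optional[int]]


def reindex_options(tokens: List[Token], pron_pos: int) -> Set[int]:
    # Rule (33): a pronoun may take the index of any empty category to its left.
    assert tokens[pron_pos][0] == "pron"
    return {idx for kind, idx in tokens[:pron_pos]
            if kind == "ec" and idx is not None}


def can_be_bound(tokens: List[Token], pron_pos: int, qp_index: int) -> bool:
    # The pronoun can come to share the QP's index only through reindexing.
    return qp_index in reindex_options(tokens, pron_pos)


# Terminal order of (26): every man2 [his1 mother admires e2]
lf_26 = [("op", 2), ("pron", 1), ("word", None), ("word", None), ("ec", 2)]
# Terminal order of (28): every man2 [e2 admires his1 mother]
lf_28 = [("op", 2), ("ec", 2), ("word", None), ("pron", 1), ("word", None)]

print(can_be_bound(lf_26, 1, 2))
print(can_be_bound(lf_28, 3, 2))
```

In the first string no empty category precedes the pronoun, so reindexing is unavailable; in the second, e2 precedes it and the pronoun may take index 2.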
Thus, pronominal variable binding is allowed only if some empty category that is coindexed with the QP occurs to the left of the pronoun (see Higginbotham (1980) for further details). This analysis provides exactly the desired result. The LF representations (26) and (29) are not generable. In each representation the pronoun cannot reindex (cannot have the same index as the QP) since there is no empty category that is both coindexed with the QP and located to the left of the pronoun. Rather, the empty category occurs to the right of the pronoun. Since reindexing is impossible, the pronoun must retain its initial index, an index distinct from that of the quantifier, with the result that bound variable interpretation of the pronoun is correctly blocked by (18). Thus, representations (26) and (29) and all other representations like them are correctly excluded. But now that such representations are ruled out, how can we predict the possibility of “crossed” pronominal variable binding interpretation of sentences such as (14)? Recall that in the LF representation of (14) the wh-phrase is in Comp and the QP must be adjoined to VP: (34) [S′ [NP2 which pilot who shot at it4] [S e2 [VP [NP3 every MIG that chased him5] [VP hit e3]]]]
The LF representation (34) is the representation prior to the application of the Reindexing Rule; in other words, in (34) the QPs and the pronouns are contraindexed. Correctly, the Reindexing Rule predicts that the pronoun him5 can reindex to him2. Such reindexing is allowed since an empty category with the index 2— namely, e2—occurs to the left of him. Thus, the Reindexing Rule permits him5 to become him2 as in (35): (35) [S′ [NP2 which pilot who shot at it4] [S e2 [VP [NP3 every MIG that chased him2] [VP hit e3]]]]
In (35) the pronoun him satisfies condition (18), and thus it is correctly predicted that him is interpretable as a variable bound by the wh-phrase. But we now confront an apparent problem. To derive a representation of “crossed” pronominal variable binding interpretation, not only must the pronoun him be coindexed with the wh-phrase, but also the pronoun it must be coindexed with the QP every MIG that chased him. However, the Reindexing Rule prohibits coindexing it with the QP. That is, the pronoun it4 cannot reindex to it3 since no empty category e3 occurs to the left of it4. Rather, e3 occurs to the right of it4, and consequently reindexing is blocked. Thus, Higginbotham’s (1980) formalization of Chomsky’s (1976) Leftness Condition predicts that it cannot be interpreted as a variable bound by the QP. Hence, this analysis predicts that such “crossed” pronominal variable binding interpretation is impossible.8 Although this may appear at first to be a problematic result, I believe it is exactly the right prediction. In fact, the pronoun it cannot be interpreted as a variable bound by the QP, just as this analysis predicts.9 Examples like (36) demonstrate the same effect: (36) Which pilot who shot at it hit every MIG?
Here again, a pronoun within a subject wh-phrase cannot be interpreted as a variable bound by an object QP. The same phenomenon occurs in examples in which a QP replaces the wh-phrase. For instance:
(37) Some pilot who shot at it hit every MIG (that chased him).
With or without the parenthesized phrase, it is once again the case that the pronoun it cannot be interpreted as a variable bound by the object QP, exactly as predicted. We now have a solution to the problem with which we began this section. LF representations such as (29) are correctly excluded by appealing to Higginbotham’s formalization of Chomsky’s Leftness Condition. This analysis also predicts that “crossed” pronominal variable binding is impossible. I believe this prediction is exactly right.10 Crucially, these results concerning pronominal variable binding have direct implications for the theory of adjunction proposed in May (1985). Consider again, for example, the LF representation (35): (35) [S′ [NP2 which pilot who shot at it4][S e2 [VP [NP3 every MIG that chased him2] [VP hit e3]]]]
If it is true, as assumed here, that the VP-adjoined QP cannot in fact variable-bind the pronoun it, then such pronominal variable-binding phenomena provide no motivation for assuming that the QP adjoined to VP c-commands the pronoun it. Since there is no reason to assume that the VP-adjoined QP c-commands the pronoun it, there is no motivation for assuming May’s theory of adjunction for these cases. Rather, we could just as well make the standard assumption regarding adjunction structures under which adjunction to VP results in the appearance of two maximal VP projections. Under this analysis, a QP adjoined to VP would not c-command any position external to the higher maximal VP projection. As we have seen, this analysis of adjunction structures is apparently consistent with the facts concerning pronominal variable binding.11, 12
Notes
* This article is a modified version of a subsection of Epstein (1987). I am indebted to Andy Barss, Howard Lasnik, Elaine McNulty, and Esther Torrego for very helpful discussion. Portions of this material were presented at the University of Texas at Austin and at Harvard University. I wish to thank the audiences at each institution for insightful discussion. Thanks also to two anonymous LI reviewers for helpful suggestions. Unless otherwise noted, references to May’s work refer to May (1985). 1 Regardless of whether S is assumed to be maximal or not, everyone c-commands what since, under the theory of adjunction proposed in May (1985), everyone is not dominated by the S projection but rather is dominated by only one node of this projection. 2 Although I believe this is May’s assumption, there seems to me to be some potential unclarity surrounding the question of which categories generate paths to their binders. I assume throughout that May’s proposal is that all and only locally Ā-bound categories generate paths to their binders (see below).
However, in certain passages May states that all Ā-bound categories (whether locally Ā-bound or not) generate paths. For example: (i) “…paths are more generally associated with all Ā-bound elements, regardless of whether they are lexical or not. More specifically,… locally Ā-bound pronouns generate paths to their binders….” (p. 146) (ii) With respect to the LF representation [S′ who2 [S e2 admires his2 mother]], “This structure contains only a single Ā-path, that of the trace. The pronoun generates no path at all, since it is locally A-bound by the trace.” (p. 147) (iii) “…all Ā-bound categories give rise to paths.” (p. 147) (iv) With respect to the following LF representation in which every pilot is in an Ā-position and c-commands and is coindexed with him
[S [NP2 every pilot] [S e2 [VP [NP3 some MIG that chased him2] [VP hit e3]]]] “Here the pronoun is no longer Ā-bound; its local binder is the empty category in the subject position. That is, it is A-bound and therefore not associated with a path.” (p. 148) (v) “…all Ā-bound categories, be they empty or lexical, generate paths….” (p. 154) As further concerns this potential unclarity, I should add that I am unable to locate definitions of binds, locally binds, and locally Ā-binds in May (1985). For standard definitions, see Chomsky (1981; 1986). For discussion of empirical differences between the definitions of local binding given in Chomsky (1981) and Chomsky (1986), see Epstein (1986). 3 Although I am unable to find any discussion of example (14) within the context of the PCC (see May, section 5.5), its LF representation (17) should be discussed with respect to this principle. (17) must be well-formed since it is the LF representation of what is assumed to be an available interpretation of (14). I assume that the path structure of (17) is as follows and therefore satisfies the PCC: (i) path e2 {S,S′} path e3 {VP,VP} path it3 {…NP2,S′,S,VP} A number of comments regarding this path structure are in order. First, by analogy with (23) I assume that him in (17) is not associated with a path. Second, following May (p. 148) I use “…” to refer to the nodes in the path of it, each of which has the following two properties: (i) it dominates it, and (ii) it is dominated by NP2. Finally, I assume that it is associated with a path (see fn. 2). The status of it in (17) is very much like that of him in the following LF representation (that is, each pronoun is c-commanded by a coindexed phrase occurring to its right): (ii) (=May’s (68a), p. 148) [S [NP3 some MIG that chased him2][S [NP2 every pilot] [S e2 hit e3]]]
Crucially, May assumes that him in (ii) is indeed associated with a path; the path structure of (ii) violates the PCC as desired (see May, p. 148) precisely because the paths of him and e overlap yet fail to embed: (iii) path e2 {S,S} path him2 {…NP,S,S} path e3 {VP,S,S,S} Thus, given that him in (ii) generates a path, I assume that it in (17) does too. 4 The discussion of weak crossover and the PCC in May (section 5.5) concerns only wh-phrases, not QPs. 5 To see why it must be the case that paths sharing a single node do not intersect, consider, for example, inversely linked sentences such as (i) (from May, p. 150): (i) Somebody from every city despises it. May (p. 151) assumes that the pronoun can be interpreted as a bound variable (see also May (1977) for discussion of inverse linking). He assumes the following LF representation and path structure for this bound variable interpretation of (i):
(ii) (=May’s (74b), p. 151; I assume that despised here versus despises in (i) is a typographical error) [S [NP3 every city2 [NP3 somebody from e2]] [S e3 [VP despised it2]]] (iii) path e2 {…NP3,NP3} path it2 {VP,S,S,NP3} path e3 {S,S} The assumption that such structures are well-formed motivates the condition that paths sharing a single node do not intersect. If such paths did intersect, (ii) would be ruled out by the PCC; the paths of e2 and it2 share a single node (the higher NP3 node), yet neither path is a proper subset of the other (see May (1985) for discussion of other well-formed structures consisting of paths that are not in a proper subset relation but do share a common node). Given that paths sharing a single node do not intersect, notice that the PCC in fact fails to apply to (29)/(30).
6 A similar problem confronts the analysis of sentences such as (i): (i) Every MIG that chased him hit some pilot. May (p. 147) claims that, regardless of the scope relations of the two quantifiers, the pronoun cannot be interpreted as a bound variable. However, consider the following LF representation of (i) in which the object QP is adjoined to VP: (ii) [S′ [S [NP2 every MIG that chased him3][S e2 [VP some pilot3 [VP hit e3]]]]] In (ii) him is within the absolute scope of the VP-adjoined QP some pilot (just as it is within the absolute scope of the VP-adjoined QP every MIG that chased him in (17)). Since these phrases are coindexed, (18) is satisfied and the pronoun is thus a bound variable. Further, the PCC is satisfied, given that (ii) has the following path structure: (iii) path him3 {…NP2,S,S,VP} path e2 {S,S} path e3 {VP,VP} Thus, it is also incorrectly predicted that sentences like (i) allow bound variable interpretation of the pronoun.
7 See also Higginbotham (1981; 1983) for modifications (irrelevant here) of the analysis presented in Higginbotham (1980).
For different analyses excluding weak crossover, see, for example, Koopman and Sportiche (1982), Safir (1984), Reinhart (1983), and Stowell and Lasnik (1987).
8 It is not an idiosyncratic property of the Leftness Condition that it excludes such “backward” pronominal variable binding. Other analyses that also predict the impossibility of such pronominal variable binding include those of Postal (1970), Kuno (1972), Lasnik (1976), Jacobson (1977), and Wasow (1979).
9 An LI reviewer notes that if S were taken to be maximal (namely, I″), contra May (1985), then this would equally well predict that the VP-adjoined QP in (35) could not (variable-) bind it. Nevertheless, postulating that S is maximal does not exclude the central problematic structure (29) (nor is it excluded by assuming that S is nonmaximal). By contrast, (29) is, as proposed above, readily excluded by the Leftness Condition; hence, I will continue to assume this constraint.
10 An LI reviewer has informally polled ten native speaker linguists regarding the interpretation of (14). The reviewer reports that no one was able to get the crossed interpretation; more specifically, no one could interpret it as a bound variable. This accords with the judgments I have also obtained.
However, the reviewer provides the following two counterexamples to the Leftness Condition for which s/he can get a bound variable reading of its: (i) A plaque commemorating its incorporation stands/can be found at the boundary of every New England township. (ii) A parade honoring its leader takes place every year in every African republic. The reviewer argues that the scope and θ-relations in (i) and (ii) differ from those in (14), that this might be relevant, and that if it is, it is not captured by the Leftness Condition. The exact status of these particular data and the adequacy of the Leftness Condition with respect to accounting for them, I leave as an open issue. See the references in footnote 8 for further detailed discussion of the data. It should also be mentioned that, in contrast to “crossed” pronominal variable binding, crossing coreference is certainly well-formed: (iii) The pilot who shot at it hit the MIG that chased him. This follows from the fact that representations of coreferential interpretation are not subject to the constraints on representations of bound variable interpretation.
11 Lasnik and Saito (forthcoming) present evidence (relating to the Empty Category Principle) against May’s theory of adjunction. Further problems (unrelated to pronominal variable binding) confronting May’s analysis are presented in Williams (1988). See the latter study for an alternative to May’s analysis of sentences such as (1) and (2).
12 An LI reviewer notes that if Higginbotham’s formulation of Chomsky’s Leftness Condition were adopted in May’s system, then (17) and (29) would be correctly excluded by this principle and consequently May’s analysis of adjunction could be retained. The reviewer is right that May’s analysis could be retained; however, what I have argued is that there is no motivation for retaining May’s theory of adjunction for these cases and that we could just as well make the standard assumption regarding adjunction structures.
In other words, the data examined here provide no independent support for May’s theory of adjunction and are fully consistent with an alternative, standard analysis of adjunction.
References
Aoun, J. and Sportiche, D. (1983) “On the Formal Theory of Government,” The Linguistic Review 2:211–235.
Chomsky, N. (1976) “Conditions on Rules of Grammar,” Linguistic Analysis 2:303–351.
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1986) Knowledge of Language: Its Nature, Origin, and Use, New York: Praeger.
Epstein, S.D. (1986) “The Local Binding Condition and LF Chains,” Linguistic Inquiry 17:187–205.
Epstein, S.D. (1987) “Empty Categories and Their Antecedents,” unpublished doctoral dissertation, University of Connecticut, Storrs.
Higginbotham, J. (1980) “Pronouns and Bound Variables,” Linguistic Inquiry 11:679–708.
Higginbotham, J. (1981) “Anaphora and GB,” in Jensen, J.T. (ed.) Proceedings of the Tenth Annual Meeting, NELS (Cahiers Linguistiques d’Ottawa 9), Department of Linguistics, University of Ottawa, Ontario.
Higginbotham, J. (1983) “Logical Form, Binding, and Nominals,” Linguistic Inquiry 14:395–420.
Jacobson, P. (1977) “The Syntax of Crossing Coreference Sentences,” unpublished doctoral dissertation, University of California at Berkeley.
Koopman, H. and Sportiche, D. (1983) “Variables and the Bijection Principle,” The Linguistic Review 2:139–160.
Kuno, S. (1972) “Functional Sentence Perspective: A Case Study from Japanese and English,” Linguistic Inquiry 3:269–320.
Lasnik, H. (1976) “Remarks on Coreference,” Linguistic Analysis 2:1–22.
Lasnik, H. and Saito, M. (forthcoming) “Move Alpha,” Cambridge, Mass.: MIT Press.
May, R. (1977) “The Grammar of Quantification,” unpublished doctoral dissertation, MIT.
May, R. (1985) Logical Form: Its Structure and Derivation, Cambridge, Mass.: MIT Press.
Pesetsky, D. (1982) “Paths and Categories,” unpublished doctoral dissertation, MIT.
Postal, P. (1970) “On Coreferential Complement Subject Deletion,” Linguistic Inquiry 1:429–500.
Reinhart, T. (1983) Anaphora and Semantic Interpretation, Chicago: University of Chicago Press.
Safir, K. (1984) “Multiple Variable Binding,” Linguistic Inquiry 15:603–638.
Stowell, T. and Lasnik, H. (1987) “Weakest Crossover,” manuscript, UCLA and University of Connecticut, Storrs.
Wasow, T. (1979) Anaphora in Generative Grammar, SIGLA 2, Ghent: E. Story-Scientia.
Williams, E. (1988) “Is LF Distinct from S-Structure? A Reply to May,” Linguistic Inquiry 19:135–146.
5 QUANTIFICATION IN NULL OPERATOR CONSTRUCTIONS
In this article I present new phenomena concerning quantificational interpretation in certain null operator constructions and argue that these phenomena receive a natural explanation within a Government-Binding-type theory of LF representation. In particular, I present a new form of evidence indicating both that Universal Grammar does incorporate an LF Quantifier Movement rule (as in May (1977; 1985)) and that the representation of certain constructions does indeed contain a phonetically null operator (NO) (as in Chomsky (1982; 1986b)). I will show that (under certain (re-) formulations) the Quantifier Movement analysis and the NO analysis interact to explain the phenomena presented below. In section 1 I briefly review the core analysis of LF quantifier movement as presented in May (1977; 1985). In section 2 I summarize the central aspects of Chomsky’s (1982; 1986b) analysis of NO constructions. Finally, in section 3 I present new data regarding these constructions and show how a natural explanation follows from certain principled revisions of the two analyses reviewed in the previous sections.
1 Quantifier Movement: Raising and Lowering
Under the analysis provided in May (1977; 1985), a Quantifier Phrase (QP) undergoes the following rule in the LF component:1 (1) Adjoin QP (to S). (adapted from May (1977))
Rule (1) applies freely. For example, consider S-Structure representations such as (2a) and (2b): (2) a. [S′ [S every childi thinks [S′ that [S John left]]]] b. [S′ [S every childi tried [S′ [S PROi to leave]]]]
Application of (1) produces two outputs for each of these: (3) a. i. [S′ [S every childi [S ei thinks [S′ that [S John left]]]]] ii. [S′ [S ei thinks [S′ that [S every childi [S John left]]]]] b. i. [S′ [S every childi [S ei tried [S′ [S PROi to leave]]]]] ii. [S′ [S ei tried [S′ [S every childi [S PROi to leave]]]]]
In (3ai) and (3bi) the QP has been adjoined to the matrix S (“raised”) by rule (1). These are well-formed LF representations since each trace/variable is bound by a QP and each QP binds a trace/variable (see May (1977; 1985) for further discussion). In (3aii) and (3bii) the QP has been adjoined to the embedded S (“lowered”) by rule (1). The derived representations are filtered. (3aii) exhibits two violations: the QP fails to bind a trace/variable, violating the prohibition against vacuous quantification, and the matrix subject, a θ-marked, Case-marked free empty category, is presumed to be illicit (see May (1977; 1985)). In (3bii) the matrix empty category is similarly illicit (whether vacuous quantification also occurs depends on whether PRO can function as a variable in the requisite sense). Thus, (3ai) and (3bi) are well-formed, but (3aii) and (3bii) are not. Given hierarchically defined principles of interpretation, this is the right result: sentences like (2a) and (2b) are scopally unambiguous. The QP obligatorily takes wider scope than the matrix predicate. This is directly represented in (3ai) and (3bi) given that scope of a QP is defined as every node within the c-command domain of the QP. Under this analysis, one might suspect that Quantifier Lowering never yields a well-formed LF representation. May (1977; 1985) proposes one licit type of lowering, however. Consider an S-Structure representation with a QP in subject position of a Raising predicate such as seem: (4) [S′ [S some childi seems [S ei to be intelligent]]]
As May notes, the sentence represented is two-ways scopally ambiguous; the QP can be interpreted as having scope either wider than or narrower than the matrix predicate. Thus, ignoring tense differences (needed to preserve grammaticality), this sentence is interpretable as either (5a) or (5b): (5) a. As for some child, it seems that s/he is intelligent. b. It seems that some child is intelligent.
Within May’s (1977) framework (compare May (1985)), there must be two well-formed LF representations of (4). First, rule (1) can adjoin the QP to the matrix S, deriving the LF representation of the wide-scope interpretation, (6a). May argues that the narrow scope interpretation is properly represented by lowering the QP, thereby deriving (6b): (6) a. [S′ [S some childi [S ei seems [S ei to be intelligent]]]] b. [S′ [S ei seems [S some childi [S ei to be intelligent]]]]
But why is the LF representation (6b) well-formed whereas (3aii) and (3bii) are not? Lowering is apparently allowed only with Raising-type predicates. Only with such predicates will the free empty category created by Quantifier Lowering occupy a nonthematic position. May argues that the category itself is nonthematic—that is, an expletive. Hence, in (6b) the matrix empty category is licit, whereas in (3aii) and (3bii) it is not (see May (1977; 1985)). Furthermore, (6b) is arguably an LF representation in which the QP binds a variable (namely, the embedded subject), thereby satisfying the prohibition against vacuous quantification.2 This completes our outline of Quantifier Movement as analyzed in May (1977; 1985).3 Before we look at strong additional support for this analysis, let us briefly review Chomsky’s analysis of null operator constructions.
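The filtering logic of this section can be restated as a small sketch. It is only an illustration under stated assumptions, not part of May's formal system: the class `SStructure`, the function `lf_violations`, and the flag `pro_is_variable` are hypothetical names, and each sentence is reduced to two properties, namely whether the matrix subject position is θ-marked and what occupies the embedded subject position. The status of PRO as a variable is left open in the text, so it is exposed as a parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SStructure:
    """Hypothetical summary of an S-Structure with a QP in matrix subject position."""
    label: str
    matrix_subject_thematic: bool  # is the matrix subject position theta-marked?
    embedded_subject: str          # "trace", "PRO", or "lexical"


def lf_violations(s: SStructure, site: str, pro_is_variable: bool = False) -> List[str]:
    """Filters violated when rule (1) adjoins the QP to `site` ("matrix" or "embedded")."""
    if site == "matrix":
        # Raising: the QP binds its own trace in matrix subject position.
        return []
    violations = []
    if s.matrix_subject_thematic:
        # Lowering leaves a free empty category that cannot be an expletive.
        violations.append("free thematic empty category")
    binds_variable = s.embedded_subject == "trace" or (
        s.embedded_subject == "PRO" and pro_is_variable
    )
    if not binds_variable:
        # The lowered QP has nothing to bind in its scope.
        violations.append("vacuous quantification")
    return violations


THINK = SStructure("(2a) think", matrix_subject_thematic=True, embedded_subject="lexical")
TRY = SStructure("(2b) try", matrix_subject_thematic=True, embedded_subject="PRO")
SEEM = SStructure("(4) seem", matrix_subject_thematic=False, embedded_subject="trace")

for s in (THINK, TRY, SEEM):
    for site in ("matrix", "embedded"):
        print(s.label, site, lf_violations(s, site) or "well-formed")
```

On this encoding only the Raising structure survives lowering, which matches the contrast between (6b) and (3aii)/(3bii).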
2 Null Operators
Following Chomsky (1977; 1981), Chomsky (1982; 1986b) proposes that the S-Structure representation of a sentence such as (7a) is (7b): (7) a. The men are too stubborn to talk to. b. [S′ [S the meni are too stubborn [S′ Oi [S PROj to talk to ei]]]]
Under Chomsky’s analysis, this structure is derived by syntactic movement of the null operator O (hereafter NO) from the position of e to Comp.4 The empty category e created by NO Movement is a variable in the (syntactic) sense that it is locally Ā-bound by an operator. However, as concerns interpretation, Chomsky argues that an NO is incapable of assigning a range to the variable it binds. Thus, with respect to the NO, e is a free variable, a category type prohibited at LF. Consequently, at this level e must be bound by the men. Chomsky presents two independent sources of evidence to support this analysis. First, if syntactic movement is indeed involved, we should expect Subjacency-type effects, and this expectation is apparently borne out:5 (8) * John is too stubborn to visit anyone who talked to. (from Chomsky (1986b, 110))
Second, the movement analysis entails the existence of an indexed Ā-binder at S-Structure, predicting that parasitic gaps are licensed (see Chomsky (1982)). This prediction is also confirmed:6 (9) John is too charming [Oi [PRO to talk to ei] without liking ei] (from Chomsky (1986b, 111))
Chomsky’s (1982; 1986b) NO analysis concentrates on degree-clause constructions such as (7a). Hereafter we will be concerned more with so-called tough constructions, as exemplified by (10): (10) John is easy to talk to.
As is standard, I assume that tough constructions, like degree-clause constructions, involve NO Movement (see Browning (1987) for detailed discussion). Thus, I assume that the S-Structure representation of (10) is (11): (11) [S′ [S Johni is easy [S′ Oi [S PROj to talk to ei]]]]
The NO Movement analysis of this construction is well motivated in that, like degree-clause constructions, tough constructions appear to exhibit Subjacency-type effects and license parasitic gaps, as shown by (12) and (13), respectively: (12) * John is easy to visit anyone who talked to. (13) John is easy to talk to without offending.
This strongly suggests that in the S-Structure representation of tough constructions, as in that of degree-clause constructions, an NO occupies Comp.7
However, as is well known, there is a crucial thematic difference between tough and degree-clause constructions. Consider first (14a) and (14b): (14) a.* Itexpl is too stubborn to talk to the men. b. The men are too stubborn for Mary to convince them.
Example (14a) is excluded by the θ-Criterion (Chomsky (1981)) since the matrix subject position is θ-marked yet no argument is present. As expected, (14b) is well-formed. Unlike degree-clause subjects, the subject position of tough predicates is nonthematic; hence the contrast in (15): (15) a. Itexpl is easy for Bill to talk to John. b.* John is easy for Bill to talk to Fred.
Later we will return to this crucial difference between the two constructions. This concludes our brief reviews of Quantifier Movement and NO constructions.8 In the next section I present new evidence strongly supporting the core features of these analyses while also elucidating the exact formulation of certain very specific aspects of these accounts.
3 Quantification in NO Constructions
Consider the following degree-clause construction and its S-Structure representation: (16) Many people are too stubborn to talk to. (17) [S′ [S many peoplei are too stubborn [S′ Oi [S PROj to talk to ei]]]]
Such a sentence is scopally unambiguous: the QP is obligatorily interpreted as having scope wider than the matrix predicate. This is predicted straightforwardly under the Quantifier Movement analysis. Given free application of rule (1), the QP could be adjoined either to the matrix or to the embedded S, deriving either (18) or (19), respectively: (18) [S′ [S many peoplei [S ei are too stubborn [S′ Oi [S PROj to talk to e′i]]]]] (19) [S′ [S ei are too stubborn [S′ Oi [S many peoplei [S PROj to talk to e′i]]]]]
(18) is a well-formed LF representation, representing the only possible interpretation of sentence (16). (19), however, is ill-formed; ei occupies a thematic position and therefore cannot be an expletive. In this respect, (19) is identical to (3aii) and (3bii): it is excluded by virtue of containing an illicit free thematic empty category, namely, the matrix subject. Thus, degree-clause constructions with QP subjects are scopally unambiguous. Next, consider the tough construction (20a) and its S-Structure representation (20b): (20) a. Many people are easy to talk to. b. [S′ [S many peoplei are easy [S′ Oi [S PROj to talk to ei]]]]
Like (16), (20a) is scopally unambiguous. Again, the QP is obligatorily interpreted as having wider scope than the matrix predicate. Thus, informally, (20a) is obligatorily interpreted as (21):9 (21) There are many people x, such that it is easy to talk to x.
In other words, a narrow-scope interpretation is impossible; the sentence cannot be interpreted as It is easy to talk to a large group of people. Formally, the QP cannot take narrow scope but rather must take scope over the matrix predicate.10, 11 This nonambiguity is surprising, since it means that the tough construction exhibits exactly the same interpretive properties as (16) (=too stubborn), (3a) (=think), and (3b) (=try) while displaying interpretive properties different from those of the Raising structure (4) (=seem). This is precisely the opposite of what we might expect. Quantification in a tough construction “should” pattern with quantification in a Raising structure since both have nonthematic subjects. However, quantification in a tough construction does not in fact pattern with quantification in Raising structures but instead patterns with quantification in constructions having θ-marked subjects. I will now argue that this fact is readily explained under certain principled assumptions regarding the Quantifier Movement and NO analyses reviewed in sections 1 and 2. To begin with, the freely applying Quantifier Movement rule (1) applying to the S-Structure representation (20b) yields two possible outputs: one in which the QP is adjoined to the matrix S and another in which it is adjoined to the embedded S: (22) [S′ [S many peoplei [S ei are easy [S′ Oi [S PROj to talk to e′i]]]]] (23) [S′ [S ei are easy [S′ Oi [S many peoplei [S PROj to talk to e′i]]]]]
(22) is a well-formed LF representation. First, the matrix subject is a licit category (a variable) since it is locally Ā-bound by the QP many people. Furthermore, e′i, locally Ā-bound by the NO, has its value properly determined since it is bound by the matrix subject (as well as the QP many people). This is the desired result. The LF representation is well-formed, and it is indeed a representation of the interpretation of this sentence: the QP has scope over the matrix predicate. Now consider (23). In deriving this representation, rule (1) has lowered the QP and adjoined it to the embedded S. In this representation the QP has scope narrower than the matrix predicate. This is not a possible interpretation of the sentence; hence, this representation must be excluded. Is (23) a well-formed LF representation? Notice first that there is nothing wrong with the matrix subject empty category. This category enjoys exactly the same status as the empty category in the LF representation (6b). That is, it is a licit expletive. The next thing to consider is the object empty category. Is it licit? Recall that the NO is unable to range-assign e′; hence, e′ must meet other requirements. With respect to nonquantificational NO constructions, such as (7a), Chomsky (1986b, 85) explicitly states the relevant requirement as follows:12
A variable must not only be bound by an operator…but must be bound in a still stronger sense: Either its range must be determined by its operator, or its value must be determined by an antecedent that binds it. Let us call this property strong binding as distinct from ordinary binding. Then, a further principle is: A variable must be strongly bound.
The object empty category in (23) clearly meets this requirement since its range is indeed determined by a binding operator, namely, the QP many people. Thus, there is no failure of range assignment; the
nonvacuous operator many people strongly binds the object empty category and it therefore range-assigns this category. This analysis therefore apparently predicts (23) to be a well-formed LF representation. This is not the desired result, however, since the sentence represented in fact allows no narrow-scope reading of the QP. I will now argue that under certain modifications, representation (23) can be correctly excluded. Suppose first that in addition to requiring that a variable must be strongly bound, we assume that an NO also must be strongly bound:13 (24) Variables and NOs must be strongly bound.
Now, under this requirement, is (23) excluded? The answer to this question depends entirely on the point in the derivation at which (24) applies. Suppose first that the requirement could be satisfied prior to the application of Quantifier Lowering. This analysis will not exclude (23) since prior to Quantifier Lowering both the NO and the variable are strongly bound. Thus, if the strong binding requirement (24) could be satisfied at any point in the derivation prior to Quantifier Lowering, then both the NO and the variable would be strongly bound by the QP occupying matrix subject position. After such strong binding occurred, the QP could then be lowered. All requirements would be satisfied and (23) would be wrongly generated. To avoid this result, let us assume, as is natural, that (24) is a constraint on LF representation: (25) Variables and NOs must be strongly bound in LF representation.
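Requirement (25) can be read as a check over a finished LF representation. The following is a minimal sketch under stated assumptions: the binding relation is supplied by hand as (binder, bindee) pairs rather than computed from a tree, and the set `contentful` (elements able to assign a range or determine a value) is a hypothetical encoding, with NOs and expletives left out of it. Whether the QP binds the NO in (23) depends on the definition of binding adopted, so that pair is a parameter. Only pairs relevant to the items being checked are listed.

```python
from typing import Set, Tuple

Binding = Set[Tuple[str, str]]  # (binder, bindee) pairs, supplied by hand


def strongly_bound(item: str, binds: Binding, contentful: Set[str]) -> bool:
    """True if some binder of `item` can assign its range or determine its value."""
    return any(binder in contentful for binder, bindee in binds if bindee == item)


def satisfies_25(must_be_bound: Set[str], binds: Binding, contentful: Set[str]) -> bool:
    """(25): every variable and NO in the LF representation is strongly bound."""
    return all(strongly_bound(x, binds, contentful) for x in must_be_bound)


# (22): QP adjoined to the matrix S; "e" is the matrix subject variable.
# Assumption: "e" counts as an antecedent that can determine a value.
binds_22 = {("QP", "e"), ("QP", "NO"), ("QP", "e'"),
            ("e", "NO"), ("e", "e'"), ("NO", "e'")}
ok_22 = satisfies_25({"e", "NO", "e'"}, binds_22, contentful={"QP", "e"})


def binds_23(qp_binds_no: bool) -> Binding:
    """(23): QP lowered; "expl" is the matrix expletive empty category."""
    base = {("expl", "NO"), ("expl", "e'"), ("NO", "e'"), ("QP", "e'")}
    return base | ({("QP", "NO")} if qp_binds_no else set())


for flag in (False, True):
    result = satisfies_25({"NO", "e'"}, binds_23(flag), contentful={"QP"})
    print("(23), QP binds NO =", flag, "->", "satisfies (25)" if result else "violates (25)")
print("(22) ->", "satisfies (25)" if ok_22 else "violates (25)")
```

On this encoding the status of (23) turns entirely on whether the QP binds the NO, since the object variable is strongly bound either way.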
Let us further assume that (25) is not a principle of grammar but a theorem derivable from the principle of Full Interpretation of Chomsky (1986b). It is thus natural that it be treated as a constraint on LF representation applying to null (that is, vacuous) operators, just as it does to variables. Given this organization of the grammar, we must now ask, Is the strong binding requirement (25) satisfied in the LF representation (23)? As before, the object variable is strongly bound by a range-assigning operator, namely, the QP many people. But is the NO strongly bound? Crucially, the matrix empty category, which is an expletive under May’s analysis, cannot strongly bind the NO since expletives by definition lack semantic content and therefore can neither range-assign nor determine the value of any category.14 Can any category in (23) function as a strong binder for the NO? The only potential binder (in the standard sense of the term bind) of the NO is the QP many people, which is adjoined to S in (23). If this QP did in fact bind the NO, then both the NO and the variable would be strongly bound, with the result that (23) would satisfy (25). The result we apparently want, then, is that many people does not bind the NO in the LF representation (23) and therefore cannot strongly bind the NO in this representation. Deriving this result requires that we do not adopt the definition of binding given in May (1985). Under that definition, the QP many people, adjoined to S, is predicted to bind the NO since c-command is defined as follows (see Aoun and Sportiche (1983)): (26) C-Command X c-commands Y=df every maximal projection dominating X dominates Y and X does not dominate Y.
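Definition (26) can be made concrete with a short sketch over the embedded S′ of (23). This is an illustration only: the `Node` class and the flag `s_is_maximal` are hypothetical, each S node is treated as a full category (segments are not modeled), and the exclusion of an element c-commanding itself is an added assumption. The flag lets one compare the view that S is nonmaximal with the view that S is a maximal projection.

```python
class Node:
    """A tree node; `maximal` marks maximal projections."""

    def __init__(self, label, maximal=False, children=()):
        self.label = label
        self.maximal = maximal
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Yield every node properly dominating `node`, bottom-up."""
    node = node.parent
    while node is not None:
        yield node
        node = node.parent


def dominates(x, y):
    return any(a is x for a in ancestors(y))


def c_commands(x, y):
    """(26): every maximal projection dominating X dominates Y, and X does not dominate Y."""
    if x is y or dominates(x, y):  # self-exclusion is an assumption, not part of (26)
        return False
    return all(dominates(m, y) for m in ancestors(x) if m.maximal)


def build_23(s_is_maximal):
    """Embedded clause of (23): [S' O [S QP [S ...]]]. Returns (QP, NO)."""
    qp = Node("QP", maximal=True)
    lower_s = Node("S", maximal=s_is_maximal)
    upper_s = Node("S", maximal=s_is_maximal, children=[qp, lower_s])
    null_op = Node("O")
    Node("S'", maximal=True, children=[null_op, upper_s])
    return qp, null_op


for s_is_maximal in (False, True):
    qp, null_op = build_23(s_is_maximal)
    print("S maximal:", s_is_maximal, "| QP c-commands NO:", c_commands(qp, null_op))
```

With S nonmaximal, the first maximal projection above the adjoined QP is S′, which also dominates the NO; with S maximal and the upper S a full projection, that S dominates the QP but not the NO.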
If, by contrast, binding were defined so that many people did not bind the NO, then the NO would not be strongly bound and (23) could therefore be excluded as a violation of (25). This result could be obtained by assuming (contra May (1985)) both that S is a maximal projection (namely, I″) and that adjunction to a
projection X results in the appearance of a second full X-projection (that is, not a segmented, single X-projection).15 Under these assumptions, the QP many people does not bind the NO in the LF representation (23), with the result that the NO is not strongly bound. Consequently, the representation is excluded by (25), that is, by Full Interpretation, and, as desired, the unavailability of a narrow-scope interpretation of such sentences is explained. It should be noted, however, that alternative analyses of the ill-formedness of (23) could be provided within May’s (1985) framework. Thus, contrary to the above analysis, suppose that we do adopt May’s (1985) theory of adjunction and binding. Under this analysis, many people and the NO bind each other. Consequently, the NO in (23) is licit since it is bound, hence strongly bound by the QP many people. Nonetheless, under this analysis, the representation (23) might still be excluded by the prohibition against vacuous quantification. Recalling that many people and the NO bind each other, suppose we adopt (27): (27) X is a variable only if it is in an A-position and is locally Ā-bound.
(On the definition of variable, see among others Chomsky (1981; 1982; 1986b), Koopman and Sportiche (1983), May (1983), Borer (1980), Epstein (1984b; 1987), and Lasnik and Uriagereka (1988).) Under (27), the object empty category will be a variable only if it is locally Ā-bound. Does either the NO or many people locally Ā-bind the object empty category in (23)? As noted in Epstein (1986), neither does under (28): (28) Local Binding X locally binds Y iff X binds Y and there is no Z such that X binds Z and Z binds Y. (from Chomsky (1986b))
Thus, since the NO and many people bind each other, (28) predicts that neither the NO nor many people locally binds the object empty category in (23); hence, this category is not locally Ā-bound and therefore is not a variable. Consequently, many people (as well as the NO) fails to bind a variable and the prohibition against vacuous quantification correctly excludes (23). May’s (1985) theory of adjunction and binding could also exclude (23) in yet a different way. Suppose local binding were alternatively defined as follows: (29) Local Binding X locally binds Y iff X binds Y and if Z binds Y then Z binds X. (adapted from Chomsky (1981))
As Epstein (1986) points out, under this definition, both many people and the NO locally Ā-bind the object empty category: a variable. Consequently, (23) is excluded by the first conjunct of the Bijection Principle of Koopman and Sportiche (1983): Every variable is locally bound by one and only one Ā-position. (But for problems with the (second conjunct of the) Bijection Principle as it pertains to null operators, see fn. 6 and Stowell and Lasnik (1987).) Thus, under May’s (1985) theory of adjunction and binding, there are at least two conceivable alternative analyses excluding (23). These two analyses, like the first one presented, crucially rest on an interaction between the Quantifier Movement and NO analyses, thereby similarly providing independent support for each. (But for problems with May’s (1985) analysis of adjunction and binding, upon which each of the latter two alternative analyses rest, see Epstein (1989), Lasnik and Saito (forthcoming), and Williams (1988).)
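The difference between definitions (28) and (29) can be checked mechanically. The sketch below is illustrative and rests on assumptions: binding is given as hand-written (binder, bindee) pairs, only the QP, the NO, and the object empty category e′ of (23) are included (the matrix empty category is omitted for simplicity), and in (29) the quantification over Z is taken to exclude X itself. The set `ASYMMETRIC` is a hypothetical configuration added purely for comparison.

```python
from typing import List, Set, Tuple

Binding = Set[Tuple[str, str]]  # (binder, bindee) pairs


def _items(binds: Binding) -> Set[str]:
    return {x for pair in binds for x in pair}


def locally_binds_28(binds: Binding, x: str, y: str) -> bool:
    """(28): X binds Y and there is no Z such that X binds Z and Z binds Y."""
    if (x, y) not in binds:
        return False
    return not any((x, z) in binds and (z, y) in binds for z in _items(binds))


def locally_binds_29(binds: Binding, x: str, y: str) -> bool:
    """(29): X binds Y and every Z that binds Y also binds X."""
    if (x, y) not in binds:
        return False
    # Assumption: Z ranges over binders of Y other than X itself.
    return all((z, x) in binds for z in _items(binds) if z != x and (z, y) in binds)


def local_binders(definition, binds: Binding, y: str) -> List[str]:
    return sorted(x for x in _items(binds) if definition(binds, x, y))


# (23) under May's (1985) theory: the QP and the NO bind each other.
MUTUAL = {("QP", "NO"), ("NO", "QP"), ("QP", "e'"), ("NO", "e'")}
# Hypothetical comparison case: the QP binds the NO but not conversely.
ASYMMETRIC = {("QP", "NO"), ("QP", "e'"), ("NO", "e'")}

for name, binds in (("mutual", MUTUAL), ("asymmetric", ASYMMETRIC)):
    print(name, "(28):", local_binders(locally_binds_28, binds, "e'"))
    print(name, "(29):", local_binders(locally_binds_29, binds, "e'"))
```

Under mutual binding, (28) yields no local binder for e′ while (29) yields two, which are the two situations the alternative analyses exploit; in the asymmetric case the two definitions agree.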
In sum, this article has presented new evidence indicating both that UG does incorporate an LF Quantifier Movement rule, as proposed in May (1977; 1985), and that the representation of tough constructions does indeed contain a phonetically null operator, as proposed in Chomsky (1982; 1986b).16, 17
Notes
* For very helpful discussion, I thank Maggie Browning, Howard Lasnik, Elaine McNulty, and Esther Torrego. I also thank an LI reviewer for very insightful comments.
1 Many assume that a QP can also be adjoined to VP (see, for example, May (1985) and the references cited). I will ignore this possibility since it is irrelevant to the analysis presented here. For problems related to VP adjunction, see, for example, Epstein (1989) and Lasnik and Uriagereka (1988).
2 Notice, however, that the variable bound by the QP, although thematic, is not Case-marked. I leave open the question of how LF representations derived by Quantifier Lowering might satisfy Case-related conditions on chains (if any).
3 Notice that it appears to be the case that Quantifier Movement either moves a QP to a position that c-commands the extraction site (“raising”) or moves a QP to a position that is c-commanded by the extraction site (“lowering”). Barss (1986) presents evidence that this reflects a constraint on movement that prohibits “sideward” movement, that is, movement from a position X to a position Y where neither position c-commands the other. See Barss (1986) for further discussion. See also Chomsky (1986b) for discussion of a similar property of A-to-A movement.
4 See Browning (1987) for an extensive analysis of a number of different NO constructions.
5 In fact, the operative constraint seems to be more restrictive than Subjacency, as evidenced by the ungrammaticality of examples such as the following: (i) * John is too stubborn to tell anyone that Mary talked to. See Lasnik and Fiengo (1974).
6 If, as argued, there is an NO in Comp in such constructions, we might expect (i) to exhibit a weak crossover effect. In fact, however, it does not. (i) [S′ [S the men are too stubborn [S′ Oi [S PRO to talk to theiri friends about ei]]]] The grammaticality of this type of case follows from the analysis of Stowell and Lasnik (1987). As they note, NOs do not induce weak crossover effects, an effect they argue to be restricted to instances of true quantification not exhibited by NOs. Given this, the grammaticality of (i) is consistent with the NO analysis.
7 The exact derivation and S-Structure representation of such constructions remain in certain respects unclear (see Lasnik and Uriagereka (1988) for discussion). Here, all that is at issue is that an NO indeed exists in Comp. For further discussion and analysis of the derivation of tough constructions, see Epstein (in progress).
8 See May (1977; 1985), Chomsky (1982; 1986b), and Browning (1987) as well as the references cited for much more detailed discussion.
9 Here I am ignoring the so-called arbitrary interpretation of the infinitival subject. See Epstein (1984a) for an analysis of this.
10 This effect seems to be a syntactic one; that is, regardless of definiteness/indefiniteness, no QP occupying this position permits a narrow-scope interpretation. One might think that plural verbal agreement (somehow) forces a wide-scope reading for the QP many people. This is not true, as evidenced by the existence of both a wide- and a narrow-scope reading for (i) and (ii), which also exhibit plural agreement:
(i) Many people are likely to leave.
(ii) Many people seem to leave.
11 Postal (1971) notes that true indefinites do not undergo Tough Movement:
(i) * A car which I gave Bill is difficult for him to drive slowly.
(ii) The car which I gave Bill is difficult for him to drive slowly.
He takes such data as evidence for an idiosyncratic constraint prohibiting indefinites from undergoing Tough Movement. Lasnik and Fiengo (1974), arguing against a movement analysis, propose that the facts are explained by a Deep Structure constraint prohibiting indefinites as subjects of predicates denoting characteristics. (See Postal (1971) and Lasnik and Fiengo (1974) for further discussion.) See also Postal (1974) for discussion of quantifier scope in tough constructions. Postal notes the forced wide-scope reading of examples such as (iii) and (iv):
(iii) Nothing is hard for Melvin to lift.
(iv) Few girls would be difficult for Jim to talk to.
12 Chomsky (1982, 31) states a requirement that is, for all intents and purposes, identical to this requirement.
13 Stowell (1985a) argues that an NO receives certain features including [±wh] from its local binder. Given this, he accounts for the fact that (i) has no multiple interrogation reading whereas (ii) does (for some speakers):
(i) Whati did John buy ti [Oi without reading ei]?
(ii) Whoi did [Oi your stories about ei] amuse ti?
Under Stowell’s analysis, (ii) allows (in fact must obligatorily have) a multiple interrogation reading, since O is locally bound by who. (This analysis requires that coindexation permit independent construal, which seems to necessitate some reinterpretation of coindexation.) By contrast, (i) allows no multiple interrogation reading since the NO is locally bound by the trace t, which, as shown by Lasnik and Saito (1984), is a [−wh] category. For this analysis of (i) to succeed, it must be assumed that the trace is indeed the local binder of the NO (as argued by Contreras (1984)). However, Stowell and Lasnik (1987) present arguments that (contra Contreras) the trace does not bind into the adjunct clause. If this is correct, the NO in (i) is locally bound by the wh-phrase, incorrectly predicting the existence of (or necessity of) a multiple interrogation reading of (i). An additional potential problem confronting Stowell’s analysis is pointed out by Browning (1986). As she notes, in (iii) the NO is locally bound by who, yet a multiple interrogation reading is not possible (see Browning (1986) for further discussion):
(iii) Whoi did you see [NP a picture of ti][Oi before meeting ei]?
Despite these potential problems, Stowell’s (1985a) Range Assignment Principle appears to entail the proposal made here that an NO must be strongly bound:
(iv) Range Assignment Principle: The range of a null quantifier is determined by its local (A- or Ā-) binder.
14 Here, I will not pursue Chomsky’s (1986b) proposal that expletives are replaced at LF (an analysis facing a number of complications; see Chomsky (1986b, 179)). As far as I can tell, expletive replacement is inconsequential to the analysis presented here.
15 Evidence for this theory of adjunction is presented in Lasnik and Saito’s (forthcoming) analysis of the Empty Category Principle (ECP). See also Epstein (1989) for discussion of the theory of adjunction and pronominal variable binding.
16 Authier (1989) presents some evidence suggesting that (contrary to standard assumptions) null operators might not occupy Comp (=Spec CP) at S-Structure. He assumes (for concreteness) that they are instead adjoined to S at this level. First, notice that if Authier’s suggestion regarding the S-Structure position of null operators is correct, this is in no way inconsistent with the first (strong binding) analysis presented here: all that is necessary for that analysis to succeed is that the null operator in (23) occupy Comp/Spec CP (or, for that matter, any position outside the scope of the S-adjoined QP) at LF. In other words, the S-Structure position of the null operator is not directly relevant. However, there is one derivation that is potentially problematic for the strong binding analysis. Suppose, following Authier, that null operators are indeed barred from Spec CP at S-Structure (for whatever reason) and are instead adjoined to S at S-Structure. Now, if the null operator is S-adjoined in the S-Structure representation of (20a) and if the null operator does not move in the LF component, then the following LF representation (of the nonexistent narrow-scope reading) is apparently derivable by Quantifier Lowering:
(i) [S′ [S ei Infli are easy [S′ [S many peoplei [S Oi [S PROj to talk to e′i]]]]]]
In (i) the strong binding requirement (25) is presumably met. Nonetheless, a host of distinct analyses, each excluding (i), present themselves. First, the latter two analyses of (23) given in the text (provided within May’s (1985) framework) can exclude it. Thus, under May’s (1985) definitions, combined with Chomsky’s (1986b) definition of local binding, e′i has no local binder and therefore is not a variable. Consequently, the principle barring vacuous quantification (Full Interpretation) is violated. Second, by combining May’s (1985) definitions with Chomsky’s (1981) definition of local binding, both many peoplei and Oi locally bind e′i, thereby violating the Bijection Principle. A third way to exclude (i) is to assume that O is, in certain respects, indistinguishable from an intermediate trace (see Aoun and Clark (1984)). In particular, suppose it is subject to the ECP. Since an adjunct trace requires an antecedent governor (see Lasnik and Saito (1984)), O, being an adjunct that is indistinguishable from a trace, must be antecedent-governed. Now, if only a head can properly govern (see, for example, Davis (1984; 1987), Stowell (1985b), Rizzi (1986), Lasnik and Saito (forthcoming)), the QP many people does not antecedent-govern O. Rather, the only potential head-governor for O is the matrix Infl. But if S′ is a barrier to antecedent government, except for elements in Comp/Spec CP (see Lasnik and Saito (1984)), then O violates the ECP. (Thus, the ECP “forces” O into Comp/Spec CP, thereby deriving (23), which violates the strong binding requirement (25).) A fourth independently motivated way of excluding (i) is to prohibit any instance of “double adjunction” to a single node, a constraint most recently assumed, for independent reasons, in Chomsky (1986a) and in May (1985). In sum, if it is true that null operators are S-adjoined at S-Structure, this presents no difficulty whatsoever for the analysis presented here.
17 An LI reviewer suggests that the impossibility of a narrow-scope reading in tough constructions would be easily explained if Quantifier Lowering were treated as a chain-internal phenomenon: Lowering appears to be possible only when a lowered quantifier binds a trace of its own movement.
Notice first that the analyses I presented in the text entail that Quantifier Lowering is possible only when a QP binds “its own trace.” Whether an equally successful explanation of this fact could be proposed in terms of the construct chain depends entirely on the precise definition of the term chain and on the exact analysis of the chain structure of tough constructions (see fn. 7). If a QP subject of a tough predicate does form a chain with the NO and its trace (as appears to be dictated by the Projection Principle, while perhaps running counter to the standard assumption that nonsingleton chains are histories of movement), then it must be explained why chain-internal lowering is illicit with tough predicates but is otherwise allowed. In other words, chain-internal does not provide the requisite distinction. One answer would
be that lowering is prohibited with tough predicates for precisely the reasons given above, namely, that independently motivated LF principles are necessarily violated. On the other hand, if a QP subject of a tough predicate does not form a chain with the NO and its trace then it could indeed be claimed that “chain-internal lowering is allowed whereas chain-external lowering is not.” But notice that this can be explained; chain-external lowering either leaves a θ-marked empty category free at LF (as in (3aii) and (3bii)) or violates other LF principles (as shown above).
References
Aoun, J. and Clark, R. (1984) “On Non-Overt Operators,” manuscript, UCLA.
Aoun, J. and Sportiche, D. (1983) “On the Formal Theory of Government,” The Linguistic Review 2:211–235.
Authier, J.M.P. (1989) “Two Types of Empty Operator,” Linguistic Inquiry 20:117–125.
Barss, A. (1986) “Chains and Anaphoric Dependence: On Reconstruction and Its Implications,” unpublished doctoral dissertation, MIT.
Borer, H. (1980) “On the Definition of Variable,” Journal of Linguistic Research 1:17–40.
Browning, M. (1986) “Null Operator Constructions,” manuscript, MIT.
Browning, M. (1987) “Null Operator Constructions,” unpublished doctoral dissertation, MIT.
Chomsky, N. (1977) “On Wh-Movement,” in Culicover, P., Wasow, T. and Akmajian, A. (eds.) Formal Syntax, New York: Academic Press.
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press.
Chomsky, N. (1986a) Barriers, Cambridge, Mass.: MIT Press.
Chomsky, N. (1986b) Knowledge of Language: Its Nature, Origin, and Use, New York: Praeger.
Contreras, H. (1984) “A Note on Parasitic Gaps,” Linguistic Inquiry 15:698–701.
Davis, L.J. (1984) “Arguments and Expletives: Thematic and Non-thematic Noun Phrases,” unpublished doctoral dissertation, University of Connecticut, Storrs.
Davis, L.J. (1987) “Remarks on Government and Proper Government,” Linguistic Inquiry 18:311–321.
Epstein, S.D. (1984a) “Quantifier-pro and the LF Representation of PROarb,” Linguistic Inquiry 15:499–504.
Epstein, S.D. (1984b) “A Note on Functional Determination and Strong Crossover,” The Linguistic Review 3:299–305.
Epstein, S.D. (1986) “The Local Binding Condition and LF Chains,” Linguistic Inquiry 17:187–205.
Epstein, S.D. (1987) “Empty Categories and Their Antecedents,” unpublished doctoral dissertation, University of Connecticut, Storrs.
Epstein, S.D. (1989) “Adjunction and Pronominal Variable Binding,” Linguistic Inquiry 20:307–319.
Epstein, S.D. (in progress) “On the Derivation of Tough-Constructions,” manuscript, Harvard University.
Koopman, H. and Sportiche, D. (1983) “Variables and the Bijection Principle,” The Linguistic Review 2:139–160.
Lasnik, H. and Fiengo, R. (1974) “Complement Object Deletion,” Linguistic Inquiry 5:535–571.
Lasnik, H. and Saito, M. (1984) “On the Nature of Proper Government,” Linguistic Inquiry 15:235–289.
Lasnik, H. and Saito, M. (forthcoming) Move Alpha, Cambridge, Mass.: MIT Press.
Lasnik, H. and Uriagereka, J. (1988) A Course in GB Syntax: Lectures on Binding and Empty Categories, Cambridge, Mass.: MIT Press.
May, R. (1977) “The Grammar of Quantification,” unpublished doctoral dissertation, MIT.
May, R. (1983) “Autonomy, Case, and Variables,” Linguistic Inquiry 14:162–168.
May, R. (1985) Logical Form: Its Structure and Derivation, Cambridge, Mass.: MIT Press.
Postal, P. (1971) Crossover Phenomena, New York: Holt, Rinehart and Winston.
Postal, P. (1974) On Raising: One Rule of English Grammar and Its Theoretical Implications, Cambridge, Mass.: MIT Press.
Rizzi, L. (1986) “Null Objects in Italian and the Theory of pro,” Linguistic Inquiry 17:501–558.
Stowell, T. (1985a) “Licensing Conditions on Null Operators,” in Proceedings of the West Coast Conference on Formal Linguistics 4:314–326.
Stowell, T. (1985b) “Null Operators and the Theory of Proper Government,” manuscript, UCLA.
Stowell, T. and Lasnik, H. (1987) “Weakest Crossover,” manuscript, UCLA and University of Connecticut, Storrs.
Williams, E. (1988) “Is LF Distinct from S-Structure? A Reply to May,” Linguistic Inquiry 19:135–146.
6 DIFFERENTIATION AND REDUCTION IN SYNTACTIC THEORY: A CASE STUDY
1 Introduction
The Government-Binding theory is modular, composed of distinct subcomponents, each consisting of differentiated syntactic principles. The development of this theory, like that of other theories, has, in large part, consisted of the empirical and conceptual task of determining the class of differentiated principles and components and the nature of the differences among them. Over the course of the evolution of this theory, it has often been argued that what was thought to be a distinct subcomponent or principle is in fact not, but is reducible to other, independent principles of grammar. To take but one prime example of this (and there are many others) consider the existence of phrase structure rules. Until quite recently, it was assumed that natural language grammars contain a differentiated (distinct) phrase structure component, consisting of a set of such rules. However, it has been argued recently that phrase structure rules may well be eliminated by virtue of being reducible to independently motivated grammatical principles (see Chomsky (1981, sections 2.1 and 2.5; 1982), Stowell (1981)). Similar kinds of reduction have also been proposed with respect to the transformational component, now arguably reduced to the single rule Move-alpha, as well as to other components of Universal Grammar.
A classic example of syntactic differentiation is seen in the development of the distinct subcomponent called (Abstract) Case theory (see Lasnik and Uriagereka (1988) for discussion of the origins of this theory). The core principle of Case theory is the Case Filter, a filter which characterizes as ill-formed any S-Structure representation containing a lexical NP lacking Case (for further refinements, see below). Recently, it has been suggested that this principle too is reducible. Chomsky (1981; 1986b) proposes that, under the postulation of a Visibility Principle (defined below), Case Filter violations are reducible to violations of the Theta Criterion.
Hence, it is argued, the Case Filter is eliminable. This article constitutes a study of this particular reduction of the Case Filter to Theta theory. We will introduce a form of evidence which can be brought to bear quite directly on the correctness of this or any such reduction. In the specific case at hand, one clear consequence of reducing the Case Filter to the Theta Criterion is that the resulting system contains exactly one less filter, i.e. the resulting system lacking the Case Filter is less differentiated than was its unreduced predecessor incorporating this Filter. The empirical question is: Which is correct, the differentiated system containing the Case Filter or the reduced system lacking it? One rather direct test for the existence of differentiated principles is to be found in the existence of degrees of (or contrasts in) grammaticality. As early as Chomsky (1965) it was noted that a system of
differentiated principles (in this case, a differentiated rule system) could, in principle, provide an empirically adequate model of differentiated knowledge of syntax, as exhibited by differentiated grammaticality judgements. For example, Chomsky (1965, 149) noted a three-way grammaticality contrast in the following data:
(1) John compelled. [1iii]
(2) Golf plays John. [2ii]
(3) John plays golf. [3ii]
Clearly, (3) is well-formed, whereas (1) and (2) are not. But further, notice that the ungrammaticality of (1) and (2) is not the same kind of ungrammaticality. With respect to this data, Chomsky (1965) writes:
A descriptively adequate grammar should make all of these distinctions on some formal grounds, and a grammar of the type just described seems to make them in some measure, at least. It distinguishes perfectly well-formed sentences such as (3) from the sentences of (1) and (2), which are not directly generated by the system of grammatical rules. It further separates…sentence…(1), generated by relaxing strict subcategorization rules, from sentence…(2),…generated when selectional rules are relaxed. Thus, it takes several steps toward the development of a significant theory of “degree of grammaticalness.” (p. 150)
Thus, (1) and (2) violate different rule types, hence their grammaticality differs.1 Crucially, in terms of this framework, any reduction of strict subcategorization to selection (or conversely) would be (ceteris paribus) empirically problematic since the resulting reduced system (containing only a single rule type where there once were two) would be unable to account for the distinction in grammaticality. I argue here that the reduction of the Case Filter to the Visibility Principle similarly results in a grammar incapable of predicting certain differences in (and degrees of) ungrammaticality. The correct predictions are obtained within a differentiated system incorporating the Case Filter as an independent unreduced principle.2 This case-study presents a general method for determining the overall organization of syntactic principles.
2 The Reduction of the Case Filter
To begin with, consider a sentence such as
(4) * I hope John to think that Bill left.
The S-Structure representation of (4) is (5).
(5) * [S′ [S I hope [S′ [S John to think that Bill left]]]]
This sentence is unquestionably ungrammatical. This is predicted, under standard treatments, by the Case Filter (see Freidin and Lasnik (1981)):
(6) At S-Structure: *NP, where NP is lexical or a wh-trace and NP lacks (Abstract) Case
In (5) the NP John is ungoverned (on the definition of government see e.g. Chomsky (1981; 1982; 1986a; 1986b)). Under the standard assumption (e.g. in Chomsky (1981)) that government by a Case assigner is necessary for Case assignment, this NP is Caseless, violating (6). Chomsky (1981, 336; 1986b, 94) provides an alternative analysis within which the Case Filter is eliminated, being reduced to a Visibility Principle.3 This principle constrains the assignment of theta roles to argument chains:
(7) The Visibility Principle: Suppose that the position P is marked with the theta role R and $C = (a_1, \ldots, a_n)$ is a chain. Then C is assigned R by P if and only if for some i, $a_i$ is in position P and C has Case or is headed by PRO. (Chomsky (1981, 334))
In (7) “C has Case” is defined thus:
(8) The chain $C = (a_1, \ldots, a_n)$ has the Case K if and only if for some i, $a_i$ occupies a position assigned K (adapted from Chomsky (1981, 334))
The Theta Criterion is then defined as follows:
(9) The Theta Criterion: Given the structure S, there is a set K of chains, $K = \{C_i\}$, where $C_i = (a^i_1, \ldots, a^i_n)$, such that:
(i) if $a$ is an argument of S, then there is a $C_i$ which is a member of K such that $a = a^i_j$ and a theta role is assigned to $C_i$ by exactly one position P,
(ii) if P is a position of S marked with the theta role R, then there is a $C_i$ which is a member of K to which P assigns R, and exactly one $a^i_j$ in $C_i$ is an argument.
(Chomsky (1981, 335))
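As an informal illustration only, the Case-related half of (7), together with definition (8), can be read as a simple predicate over chains. The sketch below is not part of Chomsky's formulation: the names `Link`, `has_case` and `is_visible` are hypothetical, a chain is modeled as a list ordered from its head, and the first conjunct of (7) (that some member of the chain occupies the theta position P) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Link:
    """One member of a chain (hypothetical representation)."""
    label: str                      # e.g. "John", "PRO", "t"
    in_case_position: bool = False  # occupies a position assigned Case
    is_pro: bool = False            # the element is PRO


def has_case(chain: Sequence[Link]) -> bool:
    """(8): the chain has Case iff some member occupies a Case position."""
    return any(link.in_case_position for link in chain)


def is_visible(chain: Sequence[Link]) -> bool:
    """Case-related conjunct of (7): the chain has Case or is headed by PRO."""
    return bool(chain) and (has_case(chain) or chain[0].is_pro)


# (5): John is ungoverned, so the singleton chain (John) is Caseless.
john_in_5 = [Link("John")]
# A chain headed by PRO needs no Case to be visible.
pro_chain = [Link("PRO", is_pro=True)]
# Illustrative two-member chain: Case-marked head, Caseless trace.
moved_chain = [Link("John", in_case_position=True), Link("t")]
```

On this reading, `is_visible(john_in_5)` is false, so theta assignment to the chain is blocked, while the other two chains are visible.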
Under the Visibility Principle, an argument chain is visible for theta assignment only if it is headed by PRO or contains a Case-marked position.4 Chomsky argues that the Case Filter is eliminable under Visibility, i.e. the Case Filter is reduced to the Theta Criterion.5 To see how this analysis works, consider again, for example, the S-Structure (5). In this representation the NP John must form a singleton argument chain. Under Visibility, it is a prerequisite to theta assignment that some position in the chain be Case-marked. Since no position in the chain receives Case, theta assignment is precluded by Visibility. Consequently, the Theta Criterion/Projection Principle is violated. Hence, it appears that the Case Filter is eliminable since such structures now violate the Theta Criterion, given the Visibility Principle.
Perhaps the most fundamental distinction between the Case Filter analysis and the Visibility analysis is that the Visibility Principle is not a filter in the sense of Chomsky and Lasnik (1977). That is, in direct contrast to the Case Filter, application of the Visibility Principle does not map an unstarred representation into a starred representation.6 Rather, the Visibility Principle imposes a prerequisite to theta assignment: a chain is eligible for theta assignment only if it is visible. We will return to this fundamental and crucial distinction between the Case Filter and Visibility analyses below.
3 Contrasts in Ungrammaticality: Evidence for Differentiation
Next, consider the contrast in grammaticality between the ungrammatical sentence (4) repeated here and the ungrammatical sentence (10):
(4) * I hope John to think that Bill left.
(10) * I hope John to be likely that Bill left.
While (4) and (10) are each patently ungrammatical, it seems clear that the nature of the ungrammaticality differs. In both (4) and (10) there is “something wrong with” the lexical subject of the infinitive; the two cases are identical in this regard. However, (10) differs from (4) in that, in this example, “something else is wrong”, namely something interpretive. Thus the two can be said to differ in degree of grammaticality: (10) is worse than (4).7 As discussed above, at least as early as Chomsky (1965) it was noted that degrees of ungrammaticality (or degrees of deviance) exist. Chomsky proposed that the degree of ungrammaticality of a string is to be explained in terms of the type of rule of grammar violated by the representation/derivation of the string. This idea can also be instantiated within a principles and parameters framework. Within this type of approach, it is often assumed that degrees of ungrammaticality are to be expressed in terms of the type and/or number of principles a derivation violates. While this idea has not been fully formalized, it is nonetheless conceptually appealing and, in addition, it has enjoyed empirical success in, for example, Lasnik and Saito’s (1984) and Chomsky’s (1986a, section 7) analyses of the contrast in grammaticality between sentences whose derivations violate Subjacency and those whose derivations violate both Subjacency and the ECP.
I propose the following (partial) characterization of “degree of ungrammaticality”:8
(11) Grammaticality Metric: If a sentence S has a derivation D incurring violations V1-Vn at Levels L1-L4, and another sentence S′:
(a) has a derivation D′ incurring exactly the same violations at exactly the same levels, then S and S′ have the same degree of ungrammaticality
(b) has a derivation D′ incurring exactly the same violations at exactly the same levels, plus at least one other additional violation, then S′ has a greater degree of ungrammaticality than does S
(c) meets neither (a) nor (b), then S and S′ display different types (as opposed to different degrees) of ungrammaticality.
(Note: two types of ungrammaticality can, in principle, be analyzed as two different degrees of ungrammaticality if e.g. different types of violations are assigned different relative strengths. For an early concrete proposal of this kind see Chomsky (1965, section 4.1.1)).
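The three clauses of (11) amount to a comparison of violation sets, which can be sketched as follows. This is only an illustrative reading under stated assumptions: each violation is encoded as a hypothetical (level, principle) pair, the function name `compare` is invented here, and the case in which S′ incurs a proper subset of the violations of S is reported separately, on the assumption that it is simply (11b) with the roles of S and S′ exchanged.

```python
from typing import Iterable, Tuple

Violation = Tuple[str, str]  # (level, principle): an assumed encoding


def compare(s: Iterable[Violation], s_prime: Iterable[Violation]) -> str:
    """Classify the contrast between S and S' following (11a-c)."""
    v, v_prime = frozenset(s), frozenset(s_prime)
    if v == v_prime:
        return "same degree"        # clause (11a)
    if v < v_prime:                 # proper subset: S' adds violations
        return "S' worse than S"    # clause (11b)
    if v_prime < v:                 # (11b) with S and S' exchanged
        return "S worse than S'"
    return "different types"        # clause (11c)


# Abstract placeholders, not an analysis of any particular sentence.
d = {("L1", "V1")}
d_more = {("L1", "V1"), ("L2", "V2")}
d_other = {("L2", "V3")}
```

With these placeholder sets, `compare(d, d_more)` falls under (11b) and `compare(d, d_other)` under (11c). Assigning relative strengths to violation types, as the note to (11) allows, would require a further weighting step that is not modeled here.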
Given (11), can we now account for the fact that (10) displays a greater degree of ungrammaticality than does (4)? First, consider a standard theory incorporating both the Theta Criterion and the Case Filter. The derivation of sentence (4) is as follows: the D-Structure is well-formed, whereas the S-Structure violates the Case Filter. Now, consider (10). In its derivation the D-Structure violates the Theta Criterion while the S-Structure violates both the Case Filter and (given the Projection Principle) the Theta Criterion. Thus, under the Case Filter analysis, (4) and (10) each violate the Case Filter at S-Structure. However, (10) incurs additional violations, namely Theta Criterion violations at both D-Structure and at S-Structure. Thus, (11b) is applicable: the violations incurred in the derivation of (4) are a proper subset of the violations incurred in (10). Consequently, (10) is correctly analyzed as more ungrammatical than (4) under the Case Filter approach.9
Consider next the Visibility account of the contrast between (4) and (10). As we shall see, Visibility can also account for the contrast, provided we make an independently necessary (yet suspect) stipulation. First, consider the D-Structure of (4) in which the singleton chain [NP John] is neither Case-marked nor headed by PRO. Does it satisfy the Theta Criterion under Visibility? It does, given the stipulation that Visibility does
not apply at D-Structure. This stipulation is independently necessitated by, for example, passives with lexical subjects, such as
(12) John was arrested.
The D-Structure representation of such a sentence is
(13) [S′ [S e was arrested John]]
To allow such a D-Structure, it must be stipulated that Visibility does not apply at D-Structure (if Visibility did apply at this level, (13) (which is required by the Theta Criterion) would be excluded; the singleton chain John does not have Case nor is it headed by PRO). Notice that this necessary stipulation is suspect in that Visibility is a constraint on theta assignment yet it does not apply to D-Structure, the level at which theta structure is directly represented. Nonetheless let us simply accept this stipulation without further discussion. Thus, under the Visibility analysis, (4) satisfies the Theta Criterion at D-Structure. At S-Structure, under Visibility, (4) violates the Theta Criterion. Now consider (10). At D-Structure the Theta Criterion is violated. At S-Structure, the Theta Criterion is violated again. Thus, under Visibility (assumed not to apply at D-Structure) (4) and (10) each violate the Theta Criterion at S-Structure, yet (10) also violates the Theta Criterion applying to D-Structure, while (4) does not. Thus the violations in the derivation of (4) are a proper subset of the violations in the derivation of (10). Hence, under (11b), the Visibility analysis can account for the contrast between (4) and (10) just as the Case Filter account does. Each analysis correctly predicts that (10) has a greater degree of ungrammaticality than does (4). The two analyses are summarized as follows:
(14)
                      CASE FILTER                                            VISIBILITY
Derivation of (4):
  DS:                 no violation                                           no violation (Visibility inapplicable at D-Structure)
  SS:                 Case Filter violation                                  Theta Criterion violation
Derivation of (10):
  DS:                 Theta Criterion violation                              Theta Criterion violation
  SS:                 Theta Criterion violation and Case Filter violation    Theta Criterion violation
However, a problem emerges when we consider a third type of ungrammatical example; an example which differs from both (4) and (10). The example is given in (15).
(15) * I want John to be likely that Bill left.
The matrix verb want in (15) is a so-called “S′-deleter” which, in contrast to hope, is capable of assigning Case to the infinitival subject. While this example, too, is ungrammatical, it is quite clear that the nature of the ungrammaticality is different from that of both (4) and (10). In fact, the data clearly exhibits a three-way contrast, i.e. no two sentences of the triplet {(4), (10), (15)} should receive identical analyses. First, what do the Visibility and Case Filter analyses predict regarding (15)? Under the Visibility analysis, (15) violates the Theta Criterion at both D-Structure and S-Structure. Under the Case Filter analysis, exactly the same violations occur:
(16)
                      CASE FILTER                    VISIBILITY
Derivation of (15):
  DS:                 Theta Criterion violation      Theta Criterion violation
  SS:                 Theta Criterion violation      Theta Criterion violation
Now, what do the two analyses predict regarding the contrast between (15) and (4)? Under (11b), Visibility predicts that (15) is more ungrammatical than (4) (compare diagrams (14) and (16)): each derivation violates the Theta Criterion at S-Structure, yet at D-Structure (15) violates the Theta Criterion while (4) does not. By contrast, under the Case Filter analysis of (15) vs. (4), (11b) is inapplicable: the violations incurred in the two derivations do not stand in a proper subset relation. Hence (11c) applies and the sentences are characterized as having different types (as opposed to degrees) of ungrammaticality. As noted above, under (11c), types of ungrammaticality can be translated into degrees of ungrammaticality provided different violations/principles are assigned different relative strengths. In the examples at hand, the Case Filter analysis could (like the Visibility analysis) analyze (15) as more ungrammatical than (4), if it were assumed that Theta Criterion violations at D-Structure and at S-Structure produce a greater degree of ungrammaticality than a single Case Filter violation. While it is clear that the violations in (4) and (15) are of a different kind, it is not altogether clear which is worse.10 As noted: (i) each analysis directly expresses the fact that there is a difference between (4) and (15), i.e. under each analysis, (11a) is inapplicable; (ii) under (11b), Visibility predicts (15) is worse than (4); and (iii) the Case Filter analysis could similarly predict that (15) is worse than (4) under an appropriate interpretation of (11c). Since the facts are not altogether clear and since the two analyses can be made empirically equivalent with respect to this contrast, I leave this for further inquiry. Summarizing to this point, the Case Filter analysis and the Visibility analysis yield the same predictions regarding (10) vs. (4) and can be made to yield the same predictions regarding (15) vs. (4).
Now, however, consider the final contrast, that between (10) and (15). It is clear that there is a difference between (10) and (15). (Further, it seems that there is not only a difference, but that (10) is worse than (15).11) However, the Visibility analysis cannot account for any contrast whatsoever between the two. Under Visibility, (15) violates the Theta Criterion at both D-Structure and S-Structure. Thus (15) is analyzed as identical to (10) (as shown in (14) and (16)). This analysis is incorrect. There is clearly a difference. While (10) and (15) are equally uninterpretable, there is a difference between the two with respect to the well-formedness of the lexical subject of the infinitival. As a clear illustration of this difference, suppose a non-native speaker were to utter (10). A complete correction requires at least two separate instructions. First, that likely is semantically different from e.g. proud (I am proud/*likely that Bill left). And second, that hope is different from e.g. want in that the former prohibits a lexical subject in its infinitival complement (I want/*hope John to leave). Given the two corrections, (10) can be changed into a grammatical sentence: I want (*hope) John to be proud (*likely) that Bill left. The crucial point is that there are two distinct facts regarding the grammar of English to be learned here; in “theory-neutral” terms, one concerns the meaning of adjectives such as likely and the other concerns the distribution of lexical infinitival subjects. In direct contrast to (10), if a non-native speaker were to utter (15), only the former correction (concerning the interpretation of likely) would be necessary, i.e. there is nothing anomalous concerning the infinitival subject. Thus, in contrast to (10), there is only one fact regarding English grammar to be learned here.
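The pairwise comparisons discussed so far can be tabulated mechanically from the violation profiles in (14) and (16). The following is a hedged sketch rather than part of the analysis itself: the dictionary names, the (level, principle) encoding and the helper `relation` are hypothetical, and the comparison is simply the proper-subset reading of (11).

```python
# Violation profiles transcribed from (14) and (16); encoding is assumed.
DS_THETA, SS_THETA = ("DS", "Theta Criterion"), ("SS", "Theta Criterion")
SS_CASE = ("SS", "Case Filter")

CASE_FILTER = {
    "(4)": {SS_CASE},
    "(10)": {DS_THETA, SS_THETA, SS_CASE},
    "(15)": {DS_THETA, SS_THETA},
}
VISIBILITY = {
    "(4)": {SS_THETA},
    "(10)": {DS_THETA, SS_THETA},
    "(15)": {DS_THETA, SS_THETA},
}


def relation(first: set, second: set) -> str:
    """Apply (11) to two violation sets."""
    if first == second:
        return "same degree"         # (11a)
    if first < second:
        return "second is worse"     # (11b)
    if second < first:
        return "first is worse"      # (11b), roles exchanged
    return "different types"         # (11c)


def contrasts(profile: dict) -> dict:
    """Classify the three pairs of the triplet {(4), (10), (15)}."""
    pairs = [("(4)", "(10)"), ("(4)", "(15)"), ("(15)", "(10)")]
    return {pair: relation(profile[pair[0]], profile[pair[1]]) for pair in pairs}


if __name__ == "__main__":
    for name, profile in (("Case Filter", CASE_FILTER), ("Visibility", VISIBILITY)):
        print(name, contrasts(profile))
```

Under these encodings the Visibility profile classifies (15) and (10) as having the same degree of ungrammaticality, whereas the profile that retains the Case Filter keeps all three sentences distinct.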
The minimal and crucial point is that (10) and (15) do differ, leaving aside the question of which is worse, yet this is not expressed under the Visibility analysis, an analysis which provides identical analyses of the two. In contrast to the Visibility analysis, a theory incorporating the Case Filter correctly analyzes (10) as different from (in fact, worse than) (15). Each derivation violates the Theta Criterion at both D-Structure and S-Structure. But (10) additionally violates the Case Filter at S-Structure whereas (15) does not. Thus, (11b) is applicable. The violations in the derivation of (15) form a proper subset of those in (10). Hence (10) is correctly analyzed as not only different from, but worse than, (15). Crucially, the contrast is expressible precisely because there exist two independent filters within this analysis, namely the Theta Criterion and the Case Filter. An otherwise identical theory in which the Case Filter is reduced to a Visibility requirement on theta assignment is unable to account for the data presented here precisely because the Case Filter is eliminated with the result that there exists one and only one relevant filter: the Theta Criterion.
In conclusion, I have argued against a reduction of the Case Filter, suggesting instead that this filter is a distinct, differentiated principle of Grammar. This form of argumentation is applicable to all proposals regarding the organization of syntactic theory and thus provides a method for determining the precise nature of differentiation and reduction throughout the syntactic component.12
Notes
* I am very grateful to Howard Lasnik, Elaine McNulty and Esther Torrego. This article has benefited greatly from discussions with them. In addition, I thank two reviewers for their helpful comments. It should also be noted that one of the two reviewers reported explicitly that s/he was in agreement with all of the crucial judgements. This reviewer’s judgements are separate from those reported below. Portions of this material were presented at MIT.
1 In addition to accounting for differences in grammaticality (the grammaticality of S1 differs from that of S2) Chomsky also constructs a more precise system within which certain contrasting degrees of grammaticality are predicted (S1 is more ungrammatical than S2). This is pursued further below.
2 For other problems directly confronting the Visibility Principle, see Davis (1984; 1986), Epstein (forthcoming). For arguments that, contra the Visibility analysis, there is no Case-inheritance within (certain) expletive-argument chains see Pollock (1981), Belletti (1986; 1988) and Lasnik (1989). A reviewer notes that another potential problem confronting Visibility emerges if there exists Theta-marked pro lacking Case. Evidence for the existence of exactly this type of category is provided in Epstein (1984).
3 Chomsky (1986b, 94) attributes the Visibility approach to Joseph Aoun. The motivation for eliminating the Case Filter is that this filter imposes a disjunctive requirement on an unnatural class of categories, namely, lexical NPs and wh-traces. As we shall see, the Visibility Principle also expresses a number of unnatural requirements. See Epstein (forthcoming) for arguments that the unnaturalness of the Case Filter is eliminable by reformulating it as a requirement imposed only on lexical NPs, i.e. there is no requirement that wh-trace be Case-marked.
4 This is one of the unnatural requirements alluded to in footnote 3.
5 As Chomsky (1981, 337) notes, the Case Filter is not really reduced to completely independently motivated principles under this analysis. Rather, the Case Filter is replaced by incorporating Visibility, a nonfilter, yet a similarly Case-theoretic principle.
6 More precisely, it does not map a representation bearing n stars into a representation bearing n+1 stars (n>1 being the case of multiple violations). See Chomsky and Lasnik for arguments that filters are, or at least are similar to, transformational operations.
7 This is my own judgement. In addition I have informally polled four linguists and four nonlinguists. All judged (10) worse than (4).
8 In (11), “has a derivation” should be read as “has a best possible derivation”, i.e. we disregard derivations exhibiting gratuitous violations. Notice that we are clearly assuming that a grammar “derivatively generates” ungrammatical strings as discussed above in Section 1 and in Chomsky (1965, 227). For more recent discussion of these and related issues, see Chomsky (1986b, chapter 2, especially p. 26).
9 For the sake of simplicity, I will ignore LF representation. Uniform application of the Theta Criterion at this level (as dictated by the Projection Principle) has no effect on the arguments presented here.
10 Three of the four linguists I polled felt that (15) was worse than (4). The fourth had the opposite judgement. Interestingly, the fourth did say that the members of this pair seem more different from each other than the members of all the other possible pairs. One other linguist similarly noted that these seemed to be violations of a “different kind”. I share this judgement. Of the nonlinguists, two felt (15) was worse than (4) while the other two had the opposite intuition. My own judgement as to which is worse is unclear, but, if forced to choose, I would judge (15) worse than (4).
11 All four nonlinguists felt (10) was worse than (15), and three of the four linguists had this same intuition (as do I). The fourth linguist judged there to be no difference between the two.
12 The methods employed here are presumably applicable to research concerning the organization of nonsyntactic components as well.
References
Belletti, A. (1986) “Unaccusatives as Case Assigners,” Lexicon Project Working Paper #8, Lexicon Project, Center for Cognitive Science, MIT.
Belletti, A. (1988) “The Case of Unaccusatives,” Linguistic Inquiry 19:1–35.
Chomsky, N. (1965) Aspects of the Theory of Syntax, Cambridge, Mass.: MIT Press.
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press.
Chomsky, N. (1986a) Barriers, Cambridge, Mass.: MIT Press.
Chomsky, N. (1986b) Knowledge of Language: Its Nature, Origin and Use, New York: Praeger.
Chomsky, N. and Lasnik, H. (1977) “Filters and Control,” Linguistic Inquiry 8:425–504.
Davis, L.J. (1984) “Arguments and Expletives,” unpublished doctoral dissertation, University of Connecticut at Storrs.
Davis, L.J. (1986) “Remarks on the Theta Criterion and Case,” Linguistic Inquiry 17:564–568.
Epstein, S.D. (1984) “Quantifier-pro and the LF Representation of PROarb,” Linguistic Inquiry 15:499–504.
Epstein, S.D. (forthcoming) Empty Categories and Their Antecedents, New York/Oxford: Oxford University Press.
Freidin, R. and Lasnik, H. (1981) “Core Grammar, Case Theory and Markedness,” Proceedings of the 1979 GLOW Conference, Scuola Normale Superiore di Pisa, pp. 407–421.
Lasnik, H. (1989) “Case and Expletives: Notes Toward a Parametric Account,” paper presented at the 1989 Princeton Workshop on Comparative Syntax, Princeton University.
Lasnik, H. and Saito, M. (1984) “On the Nature of Proper Government,” Linguistic Inquiry 15:235–289.
Lasnik, H. and Uriagereka, J. (1988) A Course in GB Syntax: Lectures on Binding and Empty Categories, Cambridge, Mass.: MIT Press.
Pollock, J.-Y. (1981) “On Case and Impersonal Constructions,” in May, R. and Koster, J. (eds.) Levels of Syntactic Representation, Dordrecht: Foris, pp. 219–252.
Stowell, T. (1981) “Origins of Phrase Structure,” unpublished doctoral dissertation, MIT.
7 DERIVATIONAL CONSTRAINTS ON Ā-CHAIN FORMATION
In this article I examine a number of phenomena, each representing a seemingly distinct restriction on Ā-chain formation. Some of these phenomena have not yet been explained. Others have been analyzed in different ways. Where relevant, I will discuss previous analyses and note potential problems with them. So far no analysis has provided a unified explanation for these restrictions. I will demonstrate that Chomsky’s (1991) Economy Constraint, a constraint motivated on entirely independent grounds, does just that. In the course of the discussion I will also examine the Earliness Principle, a derivational constraint proposed by Pesetsky (1989) as a replacement for the Economy Constraint.
The article is organized as follows: In section 1 I discuss previous analyses of a descriptive constraint on LF wh-movement originating from Comp, and I propose a more satisfactory account. In section 2 I explain a constraint on LF movement of syntactically topicalized quantifier phrases. In section 3 I provide an analysis of the impossibility of syntactic topicalization of wh-phrases. In section 4 I deduce the [–Wh] Comp Filter, a filter banning occupancy of a [–wh] Comp by a wh-phrase at S-Structure (and at LF). In section 5 I provide an explanation for a certain restriction on syntactic VP-adjunction needed within Chomsky’s (1986a) framework. Each of the phenomena discussed in these five sections is shown to be explained by the Economy Constraint. In section 6 I provide an explanation for the [+Wh] Comp Filter, a parameterized S-Structure filter that requires of all and only those languages having syntactic wh-movement that a [+wh] Comp contain a wh-phrase at S-Structure. I show that S-Structure application of this filter can be deduced, not from Economy, but from the Earliness Principle proposed (on independent grounds) by Pesetsky (1989). However, I suggest that, contra Pesetsky (1989), the Earliness Principle does not subsume the Economy Constraint.
I conclude by suggesting instead that (with a certain modification) these two principles operate as distinct yet interacting principles of Universal Grammar.
1 LF Wh-Movement from Comp
Baker (1970) noted that a sentence of the following type is only two ways ambiguous:
(1) Who wonders where we bought what?
The ambiguity lies solely in the fact that what can be interpreted as having either wide (matrix) scope or narrow (embedded) scope.1 By contrast, the scope possibilities for who and where are more restricted: where has only narrow (embedded) scope, and who has only wide (matrix) scope. The S-Structure representation of (1) is (2).2
(2) [S′ whoi [S ti wonders [S′ wherej [S we bought what tj]]]]
It is the facts concerning the scope of where (and who) that will concern us here. These facts can be descriptively characterized as follows:
(3) If a wh-phrase W occupies a Comp C in an S-Structure representation S, then W occupies C in the LF representation of S.
Thus, in the LF representation of (2), who occupies the matrix Comp and where occupies the embedded Comp. The problem is, How is the descriptive constraint (3) to be explained? The fact that who remains in the matrix Comp at LF is readily accounted for. Following Lasnik and Saito (1984) (hereafter L&S (1984)), I assume (4). (4) In LF representation each wh-phrase must occupy a [+wh] Comp.
I also assume that the following theorem (see Chomsky (1981; 1982), Koopman and Sportiche (1983)) is derivable from Full Interpretation (FI) (see Chomsky (1986b; 1991)): (5) In LF representation each operator must bind a variable and each variable must be bound.
From (4) and (5) we can derive the fact that who in (2) occurs in the matrix [+wh] Comp at LF. If who occupied any other [+wh] Comp at this level of representation, (5) would be violated. Thus, the scope of who is explained. What about the scope of where? Can we explain that this wh-phrase, occupying the embedded Comp at S-Structure, must occur in this same Comp at LF? Under (4), where must occupy a [+wh] Comp at LF. Assuming that adjunction to Comp is available in the LF component (see L&S (1984)), suppose that where adjoins to the matrix Comp at LF. This yields the LF representation (6). (6) [[[wherej] whoi] [ti wonders [(tj) [we bought what tj]]]]
I assume that the trace in the embedded Comp is only optionally present. That is, I assume that trace-leaving is not a defining (hence, stipulated) property of movement and that trace distribution is instead to be derived from other principles (see Chomsky (1981), Stowell (1981), Pesetsky (1982), and L&S (1984)). Accordingly, a trace in Comp is only optionally left by movement. The two LF representations encoded in (6) (one with a trace in Comp, one without) are both excluded. Given that traces are not [+wh] categories (L&S (1984)), both representations violate (7). (7) At LF a [+wh] Comp must contain a [+wh] phrase.
The representation containing a trace in Comp also violates the Empty Category Principle (ECP), given (8). (8) All traces present at LF are subject to the ECP. (L&S (1984))
Now reconsider the LF representation (6) lacking a trace in Comp. This representation violates only (7). But, as L&S (1984) note, this violation can be overcome if what moves into the embedded Comp. That is,
no violation occurs if in the mapping from S-Structure to LF, where adjoins to the matrix Comp (leaving no trace in the embedded Comp) and then what moves into the embedded [+wh] Comp. In such a derivation (2) is mapped into the LF representation (9). (9) [S′ [wherej [whoi]] [S ti wonders [S′ whatk [S we bought tk tj]]]]
Since where has no matrix scope interpretation, this derivation must be prevented.3 Of course, (3) provides no principled means to achieve this, since (3) is merely a descriptive generalization. The question is, Is there a principled, independently motivated constraint that excludes such derivations? A number of constraints have been proposed that, in one way or another, prevent derivations like this. I will review these in sections 1.1–1.7, noting problems with each, and in section 1.8 I will propose an alternative analysis.
1.1 A θ-Based Constraint on Rule Application
Aoun, Hornstein, and Sportiche (1981) (hereafter AHS) propose the following constraint on LF wh-movement:
(10) Wh-R (=LF wh-movement) only affects wh-phrases in argument [A-] positions.
(10) correctly prevents the unwanted derivation (since where cannot be moved at LF) and more generally, it enforces the descriptive constraint (3). But beyond this, AHS note that (10) also excludes LF Comp-to-Comp movement altogether. As L&S (1984) note, this is potentially problematic since Huang (1982) has (subsequently) argued that LF Comp-to-Comp movement exists in (for example) cases of long-distance LF adjunct movement in languages like Japanese and Chinese that lack syntactic wh-movement. Thus, for example, consider a Japanese S-Structure-to-LF mapping such as the following:
(11) S-Structure (=L&S’s (37b))
Bill-wa [S′ John-ga naze kubi-ni natta tte] itta no?
Bill-TOP John-NOM why was fired Comp said Q
‘Why did Bill say that John was fired t?’
(12) LF (=L&S’s (38b))
[S′ [S Bill-wa [S′ [S John-ga ti kubi-ni natta]S tte t′i]S′ itta]S nazei]S′ no
To satisfy the ECP, a trace must be created in the intermediate Comp at LF. This apparently requires LF Comp-to-Comp movement. The mapping from (11) to (12) reveals another, more serious problem confronting (10). (10) precludes not only LF Comp-to-Comp movement but also any LF movement of the adjunct naze. That is, since naze occupies an Ā-position, LF wh-movement cannot affect this category.4 Consequently, the derived LF representation (like any LF representation containing a wh-adjunct) necessarily violates the prohibition against vacuous quantification (a constraint presumably reducible to Full Interpretation; see Chomsky (1986b; 1991)). Thus, under (10) the target LF representation (12) cannot be derived.
In summary, (10) entails (3), as desired. However, the exclusion of LF Comp-to-Comp movement by (10) may be problematic. Moreover, (10) precludes LF wh-adjunct movement altogether. Finally, even if these empirical problems were somehow overcome, (10) would nonetheless remain descriptive.5 1.2 A θ-Based Chain Formation Constraint As noted above, Huang (1982) presents evidence for the existence of LF Comp-to-Comp movement. Thus, he confronts the problem of allowing such movement while at the same time ruling out unwanted derivations like that from (2) to (9). In order to do this, Huang (1981/82, fn. 13) proposes the following constraint: (13) Every chain of movement (either at SS or LF) [must] originate from an argument [A-] position.
(13) rules out the unwanted derivation from (2) to (9) since LF movement of where would entail that the chain of LF movement of this category originates in an Ā-position, namely, Comp. By contrast, instances of LF movement from one Comp to another are not excluded provided the chain of movement originates in an A-position. Although this approach does exclude the unwanted derivation, it nonetheless confronts a now familiar empirical problem. Consider again the mapping from the S-Structure representation (11) to the LF representation (12). The descriptive constraint (13), like (10), precludes any movement of the adjunct naze since the chain of LF movement of naze originates in an Ā-position. 1.3 A Structure-Based Chain Formation Constraint In contrast to the preceding analyses, Chomsky (1986a) proposes the following θ-independent, purely structural constraint on LF wh-movement: (14) Chain formation can only be initiated from IP-internal position.
(14) is intended to prevent derivations from S-Structure representations like (2) to LF representations like (9) by preventing a category in Comp at S-Structure from undergoing LF movement. Since Comp is IP-external, movement (i.e., chain formation) cannot be initiated from this position. However, the empirical problem with this purely structural constraint, based on the notion “IP-internal,” is that it does not really exclude the unwanted derivation, in which where, occupying Comp at S-Structure, undergoes LF movement. Such LF movement is allowed by (14) because Comp is, in fact, an IP-internal position; that is, the embedded Comp is internal to IP, in this case, the matrix IP. However, the unwanted derivation can be excluded by modifying (14) as follows:
(15) Chain formation can only be initiated from a position that is internal to all IPs.
This correctly blocks LF movement of where in (2), but it also wrongly excludes LF movement of who(m) in structures like (16). (16) [CP whoi [IP ti told who(m)j [IP PROj to go]]]
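The difference between (14) and (15) can be made concrete with a small Python sketch. The encoding is an assumption made for illustration only: a position is represented by the set of IP (S) nodes that dominate it, and all function and variable names are hypothetical.

```python
def allowed_by_14(dominating_ips):
    """(14) read literally: the position is internal to at least one IP."""
    return len(dominating_ips) > 0


def allowed_by_15(dominating_ips, all_ips):
    """(15): the position is internal to every IP in the structure."""
    return set(all_ips) <= set(dominating_ips)


# Structure (2): two IPs (S nodes), the matrix one and the embedded one.
ips_2 = {"matrix", "embedded"}
where_in_embedded_comp = {"matrix"}   # Comp lies outside the embedded S
what_in_situ = {"matrix", "embedded"}

# Structure (16): the object who(m) is outside the infinitival IP.
ips_16 = {"matrix", "infinitival"}
whom_object = {"matrix"}

print(allowed_by_14(where_in_embedded_comp))         # True: (14) fails to block where
print(allowed_by_15(where_in_embedded_comp, ips_2))  # False: (15) blocks where
print(allowed_by_15(what_in_situ, ips_2))            # True: what may still move
print(allowed_by_15(whom_object, ips_16))            # False: (15) also blocks who(m)
```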
Thus, in addition to being largely descriptive, such structural constraints appear to be empirically problematic. 1.4 The Strict Cycle Condition in LF Another approach to excluding the derivation from (2) to (9) is to assume that the Strict Cycle Condition (SCC) (Chomsky (1973)) applies in the LF component: (17) Strict Cycle Condition No rule can apply to a domain dominated by a cyclic node A in such a way as to affect solely a proper subdomain of A dominated by a node B which is also a cyclic node.
This constraint on movement would indeed prevent the unwanted derivation we have been discussing.6 However, as L&S (1992, chap. 2) point out, it does not prevent other unwanted derivations proceeding from the S-Structure representation (2) and terminating in the LF representation (9). For example, proceeding from (2), suppose the following LF operations occur:
(18) a. First, where adjoins to the matrix Comp, leaving no trace in the embedded Comp (a matrix-S′ operation).
b. Second, what adjoins to the matrix Comp (a matrix-S′ operation).
c. Third, what moves into the embedded Comp (a matrix-S′ operation).
Notice that if wh-phrases can adjoin to VP, other derivations from (2) to (9) are also wrongly allowed. For example:
(19) a. First, where adjoins to the embedded VP, leaving no trace in the embedded Comp (an embedded-S′ operation).
b. Second, what moves into the embedded Comp (an embedded-S′ operation).
c. Third, where adjoins to the matrix Comp (a matrix-S′ operation).
Again, no SCC violation occurs. Thus, assuming that the SCC applies in the LF component is not sufficient to exclude mapping the S-Structure representation (2) into the LF representation (9). 1.5 An S-Structure Condition Van Riemsdijk and Williams (1981) (hereafter VR&W) consider certain reorganizations of the Y-model of grammar advocated in Chomsky and Lasnik (1977). Among these is the L-model, within which the scope of a quantifier or wh-phrase is represented by adjoining its index to a dominating S-node, an operation performed by the rule of Quantifier Interpretation.7 Within this model, the unwanted derivation from (2) to (9) is as follows:8 (20) a. D-Structure [S′ [S whoi wonders [S′ [S we bought whatj wherek]]]]
b. Move NP → NP-Structure c. Quantifier Interpretation → LF [S′ [Sk [Si [S whoi wonders [S′ [Sj [S we bought whatj wherek]]]]]]] d. Wh-Movement → S-Structure [S′ [Sk [Si [S whoi wonders [S′ wherek [Sj [S we bought whatj
This derivation must be excluded; where occupies the embedded Comp at S-Structure but has matrix-S scope. VR&W propose the following filter:9
(21) A wh-phrase immediately dominated by Comp must govern its own index. (adapted from VR&W)
(21) excludes the S-Structure representation (20d); more generally, the descriptive generalization (3) is (mutatis mutandis) expressed by (21). However, since this S-Structure filter explicitly mentions “a wh-phrase immediately dominated by Comp [at S-Structure]” and requires “close structural proximity” between such a phrase and its scope-indicating index, it too seems largely descriptive.10 That is, if it is true, it should be deduced from deeper principles.
1.6 LF Filters
In contrast to the constraints thus far discussed, L&S (1984) exclude the unwanted derivation with an LF filter. First, they propose that, at each level, Comp obligatorily receives the index of its head (see AHS and L&S (1984) for discussion of the Comp indexing rule). Thus, a more detailed representation of the S-Structure representation (2) is (22):
(22) [S′ [Ci whoi] [S ti wonders [S′ [Ck wherek] [S we bought whatj tk]]]]
Now if, in the LF component, the unwanted derivation from (2) to (9) occurs (i.e., where first adjoins to the matrix Comp and then, to satisfy LF Comp requirements, what substitutes into the embedded [+wh] Comp), then the embedded Comp in the resulting LF representation will appear as follows: (23) [Ck whatj]
The following filter will properly exclude the resulting LF representation containing (23): (24) * [compi Headj], where i and j are not identical indices. (L&S (1984))
As discussed in Epstein (1987; 1991), this filter, though it successfully excludes the unwanted derivation, nonetheless seems insufficiently general. That is, there are other unwanted derivations that display intuitively illicit forms of Comp indexing at LF, yet are not excluded by (24). For example, the following unwanted derivation escapes the filter (24) as well as the ECP and is consequently overgenerated by L&S’s (1984) analysis: (25) a. S-Structure
[S′ whyi [S do you believe [NP the claim [S′ that [S John said [S′ [compi ti] [S Bill left ti]]]]]]] b. LF [S whyi [S do you believe [NP the claim [S′ that [S John said [S′ [Compi ] [S Bill left ti]]]]]]] [+γ]
Given that nonarguments are not γ-marked at S-Structure (L&S (1984)), neither trace is assigned [−γ] (or [+γ]) in (25a). Thus, (25a) is well formed with respect to the ECP. Under the Comp-indexing rule, the most deeply embedded Comp is indexed at S-Structure. In the LF component, the trace in this Comp deletes, but the index on Comp remains. Consequently, at LF, there is no “offending” trace in Comp, and furthermore, the remaining trace is indeed antecedent-governed by a head (namely, the indexed Comp), as required by the ECP. Thus, incorrectly, no ECP violation occurs. One way to exclude such derivations is to assume (or derive from independent principles) that wh-adjuncts and their traces have no indices at S-Structure and therefore the problematic Comp indexing triggered by the adjunct trace in Comp in (25a) cannot occur. This is the approach proposed in Epstein (1987; 1991):11 (26) Wh-adjuncts and their traces have no indices at S-Structure.
An alternative to this analysis is to assume, not that (25a) cannot be generated, but that there is something wrong with the LF representation (25b)— in particular, with the indexed, yet unoccupied Comp. Pursuing this line of inquiry, Epstein (1987; 1991) proposes the following filter: (27) * [Compi], where Comp has no head with index i.
This filter excludes the unwanted derivation in (25) by excluding (25b). In addition, filter (27) subsumes L&S’s filter (24). (This is so because any Comp contraindexed with its head (i.e., the type of Comp excluded by filter (24)) is necessarily an indexed Comp lacking a coindexed head (the type of Comp excluded by filter (27)).) Thus, (27) successfully rules out derivation (25), and it also rules out the unwanted derivation from (2) to (9). Although I know of no empirical problems with filter (27), it too seems descriptive and ad hoc. Being able to deduce it from independently motivated principles would clearly be preferable.
1.7 Summary of the Previous Approaches
We have seen that none of the accounts thus far investigated provides a completely satisfactory explanation of the descriptive generalization (3) and therefore of the impossibility of mapping S-Structure representations such as (2) into LF representations such as (9). The relevant principles are repeated here:
(10) Wh-R (=LF wh-movement) only affects wh-phrases in argument [A-] positions. (AHS)
(13) Every chain of movement (either at SS or LF) [must] originate from an argument [A-] position. (Huang (1981/82))
(14) Chain formation can only be initiated from IP-internal position. (Chomsky (1986a))
(17) The Strict Cycle Condition applies in the LF component.
(21) A wh-phrase immediately dominated by Comp must govern its own index. (adapted from VR&W)
(24) * [compi Headj], where i and j are not identical indices. (L&S (1984))
(27) * [Compi], where Comp has no head with index i. (Epstein (1987; 1991))
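The relation between the two LF filters in this list, (24) and (27), can be sketched in Python. The encoding is an assumption of the example: a Comp is reduced to a pair of its own index and the index of its head, with `None` standing for a missing index or a missing head. The function names are hypothetical.

```python
from itertools import product


def violates_24(comp_index, head_index):
    """(24): Comp and its head are both indexed, and the indices differ."""
    return (
        comp_index is not None
        and head_index is not None
        and comp_index != head_index
    )


def violates_27(comp_index, head_index):
    """(27): Comp bears index i but has no head bearing index i."""
    return comp_index is not None and head_index != comp_index


indices = [None, "i", "j", "k"]  # None = no index (or, for the head, no head)

# Every configuration excluded by (24) is also excluded by (27).
subsumed = all(
    violates_27(c, h)
    for c, h in product(indices, repeat=2)
    if violates_24(c, h)
)
print(subsumed)  # True

print(violates_24("k", "j"), violates_27("k", "j"))    # True True: the Comp in (23)
print(violates_24("i", None), violates_27("i", None))  # False True: the empty Comp in (25b)
```

Over this small index set the check agrees with the text: (27) excludes everything (24) excludes, and additionally excludes the indexed but unoccupied Comp of (25b).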
In the following section we will explore an alternative analysis that not only is empirically successful but also seems to provide a more principled account of the facts. 1.8 An Economy Approach The derivation from the S-Structure representation (2) to the LF representation (9) has the notable property that (intuitively) LF movement of where from the embedded Comp is unnecessary. The question is, In what formal sense is it unnecessary? To begin with, we have assumed (following L&S (1984; 1992)) both that wonder selects a [+wh] Comp and that (28) In English, a [+wh] Comp must have a [+wh] head at S-Structure and at LF.
In (2), then, S-Structure application of this filter to the Comp selected by wonder is satisfied. Further, if where fails to undergo LF movement, LF application of the filter will also be satisfied. That is to say, movement of where is unnecessary. The unwanted derivation can now be blocked by assuming that unnecessary movement is prohibited, or equivalently, that movement is allowed only when movement is necessary (Chomsky (1991)). The notion “necessity,” as we have assumed here, can be defined only with respect to some specified set of filters. That is, movement, or more generally Affect α, will be necessary if and only if failing to apply Affect α results in the violation of at least one filter. I will follow Chomsky (1991) in assuming that Full Interpretation indirectly defines the set of LF filters, or more precisely the set of ill-formed LF objects. (This indirect definition is achieved by defining the set of well-formed “legitimate” LF objects (precisely those permitted by Full Interpretation). The complement set is, of course, the set of ill-formed objects. See Chomsky (1991, section 6.2) for further definitions.) Clearly, in the S-Structure representation (2) the chain headed by where is a legitimate LF object, that is, a legitimate Ā-chain that satisfies the prohibition against vacuous quantification (Full Interpretation) and satisfies Comp filters as well as scope requirements imposed on the wh-phrase itself. Consequently, movement is unnecessary. Unnecessary movement can then be prohibited by the following principle (see Chomsky (1991)):
(29) Economy Constraint
Satisfy filters using the fewest possible applications of Affect α.
Under (29), where occupying the embedded [+wh] Comp at S-Structure cannot undergo LF movement. Such movement is blocked precisely because it is unnecessary. That is, there exists a well-formed derivation (one satisfying all filters) in which such movement fails to occur—or, to put it yet another way, there exists a “shorter” well-formed derivation. The unwanted derivation is thus excluded on general and independently motivated grounds, and the fact that where has no wide-scope interpretation is now explained. More generally, we now have an explanation for the descriptive generalization (3).
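The selection imposed by (29) can be illustrated with a minimal Python sketch. It is not an implementation of the theory: the candidate LF derivations from (2) are listed by hand, whether each satisfies the filters is stipulated following the discussion above, and all names (`most_economical`, `candidates`, the step labels) are hypothetical.

```python
def most_economical(derivations):
    """Keep the well-formed derivations that use the fewest steps."""
    well_formed = [d for d in derivations if d["satisfies_filters"]]
    if not well_formed:
        return []
    fewest = min(len(d["steps"]) for d in well_formed)
    return [d for d in well_formed if len(d["steps"]) == fewest]


# LF continuations of S-Structure (2). Filter outcomes are entered by hand.
candidates = [
    {"name": "no LF movement", "steps": [],
     "satisfies_filters": False},  # what is not in a [+wh] Comp, so (4) fails
    {"name": "what: narrow scope", "steps": ["what -> embedded Comp"],
     "satisfies_filters": True},
    {"name": "what: wide scope", "steps": ["what -> matrix Comp"],
     "satisfies_filters": True},
    {"name": "derivation of (9)",
     "steps": ["where -> matrix Comp", "what -> embedded Comp"],
     "satisfies_filters": True},
]

print([d["name"] for d in most_economical(candidates)])
# ['what: narrow scope', 'what: wide scope']
```

Under these stipulations only the two one-step derivations survive, which matches the two-way ambiguity of (1); the two-step derivation of (9) is discarded because a shorter well-formed derivation exists.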
2 Topicalized Quantifier Phrases In section 1 we saw that the Economy Constraint (29) correctly predicts that a wh-phrase occupying a particular [+wh] Comp at S-Structure necessarily occupies the same Comp at LF. In this section we will see that the same constraint predicts analogous effects with quantifiers. To begin with, the following descriptive constraint appears true: (30) A quantifier phrase (QP) that has been topicalized at S-Structure occurs in its S-Structure position at LF.
Direct evidence that (30) is a correct description is provided by Lasnik and Uriagereka (1988). They note the following contrast (p. 156): in (31a) a wide-scope interpretation of every problem is (at least marginally) possible, whereas in (31b), where the QP is topicalized, it is not.
(31) a. Someone thinks that Mary solved every problem.
b. Someone thinks that every problem, Mary solved.
They also note the following contrast: (32) a. I don’t think that Mary solved any problems. b.*I don’t think that any problems, Mary solved.
They suggest that (32) could be explained if the negative polarity item any (problem) must move at LF to the negative licensor, but cannot, given (30). As they note, the question remains why (30) holds. Similarly, Kiss (1987) notes that the S-Structure configuration of fronted QPs in Hungarian fixes their scope. Thus, for example, the following sentences are scopally unambiguous:
(33) (= Kiss’s (157))
[S″ János [S′ ′minden feladatot [S′ ′többször is [S′ ′gondosan [S′ ′át [S nézett]]]]]]
John every problem-ACC several-times also carefully over looked
‘As for John, every problem was several times surveyed by him carefully.’
(34) (= Kiss’s (158))
[S″ János [S′ ′többször is [S′ ′minden feladatot [S′ ′gondosan [S′ ′át [S nézett]]]]]]
John several-times also every problem-ACC carefully over looked
‘As for John, on several occasions he surveyed every problem carefully.’
If indeed the grammar of Hungarian incorporates an LF level of representation, this too would appear to provide evidence for (30). The question remains, How is (30) to be explained? This descriptive constraint is in fact deducible from the Economy Constraint. As concerns, for example, (31b), suppose it is true, as Baltin (1982) proposes, that topicalization is correctly analyzed as adjunction to S. At the same time, suppose that May (1977; 1985) is right in proposing that Quantifier Raising (QR) is (in effect) obligatory and similarly can adjoin a QP to S. It then follows from the Economy Constraint that if a QP is topicalized at S-Structure, it (or the chain it heads) is a legitimate LF object; that is, further movement is unnecessary and therefore is blocked by
Economy. Specifically, the wide-scope reading of every problem in (31b) is blocked because deriving the wide-scope representation of this quantifier requires moving it from its S-Structure topicalized position to a position in which it is adjoined to the matrix S. In all, this derivation requires that every problem move twice (for simplicity, I am not counting the constant single additional LF movement of someone):
(35) a. D-Structure
[S′ [S someone thinks [S′ [S Mary solved every problem]]]]
b. S-Structure
[S′ [S someone thinks [S′ [S every problemi [S Mary solved ti]]]]]
c. LF
[S′ [S every problemi [S someonej [S tj thinks [S′ [S Mary solved ti]]]
However, there is a shorter well-formed derivation involving only one move, namely, a derivation in which the LF movement occurring in the mapping from (35b) to (35c) fails to occur. Therefore, such LF movement is prohibited by Economy, and it is correctly predicted that every problem in (31b) has no wide-scope reading.12 A similar explanation presumably extends to the Hungarian examples (33) and (34). Notice that since a one-move derivation from the D-Structure representation (35a) is well formed, all derivations from this D-Structure representation involving more than one application of Affect α are excluded by Economy (including, as just shown, (35) itself). This entails that the (marginally available) wide-scope reading of every problem in (31a) (represented by adjunction of every problem to the matrix S at LF) must be obtained in one (fell-swoop) movement of every problem. This in turn entails that Subjacency does not constrain LF movement (Huang (1982)) and that lexical proper government is sufficient for the ECP (L&S (1984)). (These entailments will be discussed more in the following sections.) Given this characterization of LF, (32b) can now be excluded, as desired. Movement to the negative licensor in this example will involve two moves: S-Structure topicalization and LF movement to the licensor. However, direct (one-fell-swoop) LF movement to the negative licensor (a derivation involving only one move) is allowed. Therefore, the longer two-move derivation, including S-Structure topicalization, is prohibited. The ungrammaticality of (32b) is thus explained. More generally, the descriptive generalization (30) is deduced from the Economy Constraint (29).
3 The Nontopicalizability of Wh-Phrases
Consider the following S-Structure representations:13
(36) a. [S′ whoi [S ti said [S′ that [S John likes Mary]]]]
b. [S′ whoi [S ti said [S′ that [S Maryj [S John likes tj]]]]]
c. [S′ whoi [S ti said [S′ that [S John likes who]]]]
d.*[S′ whoi [S ti said [S′ that [S whoj [S John likes tj]]]]]
As Lasnik and Uriagereka (1988, 156) note, “The descriptive generalization seems to be that a wh-phrase can’t undergo topicalization; but why that should be remains unclear.” To begin with, notice that given the LF requirement (4), repeated here, the topicalized wh-phrase in (36d) must undergo LF movement.
(4) In LF representation each wh-phrase must occupy a [+wh] Comp.
The question then is, What prevents a derivation from the S-Structure representation (36d) to the LF representation (37)? (Recall that I assume trace-leaving is optional.) (37) [S′ [whoj [whoi]] [S ti said [S′ that [S John likes tj]]]]
No principle appears to be violated. In particular, Economy appears to be satisfied: since LF movement is necessary (in order to satisfy (4)), it is allowed to apply. Nonetheless, notice that the derivation that includes the S-Structure representation (36d) and the LF representation (37) involves two movements of the object wh-phrase: syntactic topicalization plus LF movement to the [+wh] Comp as required by filter (4). But crucially, there exists a shorter derivation resulting in the same LF representation. In this shorter derivation, the object wh-phrase remains in situ at S-Structure (i.e., it does not topicalize), and at LF it moves in one fell swoop to the [+wh] Comp:
(38) a. S-Structure [S′ whoi [S ti said [S′ that [S John likes whoj]]]]
b. LF [S′ [whoj [whoi]] [S ti said [S′ that [S John likes tj]]]]
I assume (again) that such direct LF movement is allowed, given that Subjacency does not constrain LF movement and given that the object trace satisfies the ECP by virtue of lexical proper government. The question now is, How can the shorter derivation (38) be forced and the longer derivation including the S-Structure representation (36d), containing a topicalized wh-phrase, be excluded? Exactly this result follows from Economy. This constraint thus correctly blocks S-Structure topicalization of wh-phrases: since a wh-phrase must occupy a [+wh] Comp at LF (filter (4)), S-Structure topicalization always represents an “extra” (i.e., unnecessary) move. That is, there will always be a shorter derivation, resulting in the same LF representation, in which S-Structure topicalization fails to occur. So far we have seen only that this result obtains with respect to object wh-phrases, as in (36d). Consider topicalization of a subject wh-phrase, as in (39) and (40).
(39) * [S′ whoi [S ti thinks [S′ [S whoj [S tj left]]]]]
(40) * [S′ whoi [S ti thinks [S′ [S whoj [S Bill said [S′ tj [S tj left]]]]]]]
If subject position is lexically properly governed by Infl at LF (see L&S (1992)), then each of these examples violates Economy. That is, there is a shorter (one-move) well-formed derivation in which the subject whoj remains in situ at S-Structure and moves in one fell swoop to the matrix [+wh] Comp at LF.14 Thus, the Economy Constraint predicts that S-Structure topicalization of wh-phrases is barred, and the ungrammaticality of examples like (36d) is explained.15
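The comparison that Economy performs in this section can be made concrete with a small sketch. What follows is only an illustration under simplifying assumptions, not a formalization proposed in the text: the class `Derivation`, the function `economy`, and the string labels are all hypothetical, each derivation is reduced to its list of movements, and filter satisfaction is recorded as a given flag rather than computed. As in the discussion of (35), the constant movement of the matrix who is not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One candidate derivation from a fixed D-Structure (hypothetical model)."""
    label: str
    moves: tuple             # each entry stands for one application of Affect α
    satisfies_filters: bool  # assumed given, e.g. whether filter (4) is met at LF


def economy(candidates):
    """Keep only the filter-satisfying derivations that use the fewest moves."""
    well_formed = [d for d in candidates if d.satisfies_filters]
    if not well_formed:
        return []
    fewest = min(len(d.moves) for d in well_formed)
    return [d for d in well_formed if len(d.moves) == fewest]


# Candidates for the object wh-phrase in (36d)/(38).
candidates = [
    Derivation(
        "(36d) to (37)",
        ("topicalize who_j at S-Structure", "move who_j to [+wh] Comp at LF"),
        True,
    ),
    Derivation("(38)", ("move who_j to [+wh] Comp at LF",), True),
    # Leaving who_j in situ throughout violates filter (4).
    Derivation("no movement", (), False),
]

for d in economy(candidates):
    print(d.label, len(d.moves))
```

On these assumptions only the one-move derivation (38) survives, which is the result the section derives: the two-move derivation through (36d) loses to a shorter competitor, and the zero-move derivation is not a competitor at all because it fails filter (4).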
4 Elimination of the [−Wh] Comp Filter
Derivations of the following kind, in which a [+wh] phrase occupies a [−wh] Comp at S-Structure, must also be prevented:
(41) a. D-Structure [S′ [S who thinks [S′ [S John saw what]]]]
b. S-Structure [S′ whoi [S ti thinks [S′ whatj [S John saw tj]]]]
c. LF [S′ [whatj [whoi]] [S ti thinks [S′ [S John saw tj]]]]
One approach to excluding such derivations is to assume with L&S (1984) that the following Comp filter applies not only at LF but also at S-Structure: (42) A [−wh] Comp cannot have a [+wh] head.
Given that think selects a [−wh] Comp, the S-Structure representation (41b) violates (42) and the derivation is blocked. However, there are at least conceptual grounds for assuming that the [−Wh] Comp Filter (42) does not apply at S-Structure. First, if the Comp filters are in fact enforcing interpretive requirements (e.g., think does not select an indirect question), then these requirements should be imposed not at S-Structure but only at LF (see Chomsky (1986a) for an analysis within which selectional requirements are imposed only at this level). Second, if it is true, as Chomsky (1986b) suggests, that S-Structure should be deducible from the correct formulation of constraints on the three interpretive levels and from constraints on Affect α, then there should be no S-Structure filters at all.16 Regardless of the force of these conceptual arguments, let us assume that S-Structure application of (42) is eliminated—in other words, that there is no S-Structure [−Wh] Comp Filter. Consequently, the derivation in (41) apparently violates nothing. How is it to be excluded? The answer is that it violates Economy. Specifically, it involves two applications of Affect α: what moves first into the [−wh] Comp at S-Structure and then into the [+wh] Comp as required by filter (4). But there is another derivation, resulting in exactly the same LF representation, which involves only one transformational application. In this one-step derivation what remains in situ at S-Structure and adjoins to the matrix Comp, in one fell swoop, at LF. Thus, Economy excludes the unwanted derivation in (41), rendering S-Structure application of the [−Wh] Comp Filter, a largely descriptive filter, unnecessary.
This analysis has at least five important consequences/entailments. First, the derivation in (41) is now excluded by a very general and independently motivated derivational constraint. The impossibility of (41), in which a wh-phrase occupies a [−wh] Comp at S-Structure, is now seen to be exactly the same as the impossibility of the French example (43) (see Pollock (1989)), in which Infl lowers to V (instead of V raising to Infl), with the result that “additional” uneconomical LF raising is forced (see Chomsky (1991)).
(43) * Jean souvent embrasse Marie.
Jean often kisses Marie
(41) and the derivation of (43) each involve “too many” transformational steps; that is, in each case the grammar makes available a shorter derivation satisfying all filters. Thus, the ad hoc [−Wh] Comp Filter can
be eliminated, and its effects are enforced by a very general constraint on Affect α that provides a unified analysis of examples as seemingly different as (41) and (43).17 Second, the success of the Economy analysis of (41), like the analyses presented in the previous sections, rests on the assumption that Subjacency does not constrain LF movement. Recall that Economy predicts that (41) is excluded because the following derivation, resulting in exactly the same LF representation, is shorter than (41), precisely because what moves only once:
(44) a. D-Structure [S′ [S who thinks [S′ [S John likes what]]]]
b. S-Structure [S′ whoi [S ti thinks [S′ [S John likes what]]]]
c. LF [S′ [whatj [whoi]] [S ti thinks [S′ [S John likes tj]]]]
If Subjacency constrained LF movement, then LF movement of what would be forced to proceed through (at least) the embedded Comp, thereby involving two steps. But this derivation would then be no shorter than the unwanted derivation in (41), and the Economy analysis would therefore incorrectly predict it to be possible. We would then apparently be forced to appeal to the S-Structure [−Wh] Comp Filter to exclude (41). Third, the success of this analysis (an analysis allowing (44)) rests on the sufficiency of lexical proper government. As we have seen, the assumption that lexical proper government is a sufficient condition for satisfaction of the ECP is also crucial to the analyses presented in the previous sections. These analyses, which reduce a number of descriptive constraints to an explanatory principle, thus provide support for the existence of lexical proper government. Fourth, the [−Wh] Comp Filter can now be eliminated from the grammar. As we have seen, S-Structure application is unnecessary, given Economy. But as L&S (1992) show, LF application is also unnecessary since the effect of this filter at LF is entailed by the independently necessary filter (4): a [−wh] Comp cannot have a [+wh] head at LF because if it did, (4) would be violated. Fifth, in eliminating a (seemingly necessary) S-Structure filter by appeal to a constraint on Affect α, we have (perhaps) moved one step closer to eliminating all S-Structure principles and thereby deducing the properties of this level, in this case from constraints on the application of Affect α.
In summary, this section has shown that S-Structure application of the [−Wh] Comp Filter can be reduced to the Economy Constraint. This Filter can then be eliminated from the grammar.
5 On Syntactic VP-Adjunction of Wh-Phrases
Within the framework of Chomsky (1986a), syntactic VP-adjunction of wh-phrases is an available operation. For example, it is employed in order to satisfy Subjacency in cases like the following:18
(45) [CP whati do [IP you [VP ti [VP like ti]]]]
However, as Epstein (1986) points out, if syntactic VP-adjunction is permitted, derivations like (46) are incorrectly allowed.
(46) a. D-Structure [CP [IP who [VP saw what]]]
b. S-Structure [CP whoi [IP ti [VP whatj [VP saw tj]]]]
c. LF [CP whatj whoi [IP ti ([VP) [VP saw tj] (])]]
Such derivations are in relevant respects completely analogous to derivations like (41); that is, they involve “unnecessary” movement to what must be a permissible syntactic landing site. Within the framework assumed here, however, derivations like (46) are excluded as desired. This is so because there exists a shorter, well-formed derivation in which what moves only once—namely, the following derivation, in which syntactic VP-adjunction fails to occur:
(47) a. D-Structure [CP [IP who [VP saw what]]]
b. S-Structure [CP whoi [IP ti [VP saw what]]]
c. LF [CP whatj whoi [IP ti [VP saw tj]]]
Thus, given the existence of derivations like (47), the Economy Constraint successfully excludes derivations like (46).
6 The [+Wh] Comp Filter
In section 4 we saw that the [−Wh] Comp Filter can be eliminated. In contrast to this filter, which blocks occupancy of Comp by wh-phrases, the [+Wh] Comp Filter requires occupancy of Comp by wh-phrases. Despite this difference, a natural question is whether the [+Wh] Comp Filter can be eliminated as well. Here, I will argue that it can be partially eliminated in that its S-Structure application can be deduced. Consequently, we can assume that this filter applies only at LF.
To begin with, notice that S-Structure application of the [+Wh] Comp Filter seems to be parameterized (see L&S (1984)). In languages that have syntactic wh-movement, such as English, a [+wh] Comp must have a [+wh] head at S-Structure. Thus, the S-Structure representation in (48b) is well formed, but the one in (48a) is not.
(48) a.
b.
But in languages lacking syntactic wh-movement, such as Chinese, the analogue of (48a) is well formed; hence, there is no S-Structure requirement. To explain this, L&S (1984) assume that (49) is an implicational universal. (49) If language L has syntactic wh-movement, then a [+wh] Comp must have a [+wh] head at S-Structure in L.
For L&S (1984), then, the parameter is whether a language has or does not have syntactic wh-movement. Again, if it is desirable to eliminate S-Structure filters in general and/or to restrict Selection to LF, then an alternative to (49) should be explored. Chomsky (1986a) proposes one such alternative. This analysis rules out English derivations that include (48a), but without appealing to an S-Structure filter. Chomsky assumes that, universally, Selection applies only at LF. Hence, the S-Structure representation (48a) is well formed both in English and in Chinese. A derivation including (48a) is ruled out in English yet allowed in Chinese given the following parameterized constraint on LF wh-movement:19 (50) At LF, wh-phrases move only to a position occupied by a wh-phrase.
English has the positive setting of this constraint. Thus, in English, in deriving the LF representation for (48a), the wh-phrase cannot move to the empty Comp selected by wonder, and Selection (the requirement that a [+wh] Comp have a [+wh] head)—applying at LF, and LF only—is violated. Thus, the example is excluded without appealing to an S-Structure filter. In contrast to English, Chinese has the negative setting of (50); hence, in deriving the LF representation of the Chinese analogue of (48a), LF movement of the wh-phrase to the empty [+wh] Comp is allowed and is in fact required by Selection. Thus, provided the wh-phrase moves to the Comp selected by wonder at LF, a derivation including (48a) will be well formed in Chinese. This analysis thus eliminates S-Structure application of the [+Wh] Comp Filter by appealing to a parameterized LF movement constraint. However, LF parameters are potentially problematic because the language learner confronts little direct evidence regarding LF properties, and it is therefore unclear how LF parameters can be set. In the case of (50), however, this parameter could be implicationally linked to an S-Structure parameter: languages with S-Structure wh-movement have the positive setting, whereas languages without syntactic wh-movement have the negative setting. This approach would overcome the learnability problem: the LF value of the parameter is set on the basis of S-Structure properties “revealed” in the primary linguistic data. Under this analysis, then, a given setting of the syntactic wh-movement parameter entails a particular setting of the LF parameter (50). Thus, under this analysis, as under the one proposed by L&S (1984), a pair of languages like English and Chinese differ with respect to two properties: (1) syntactic wh-movement and (2) the implied difference that follows. Now we must ask, Can the facts be explained without appealing to implicational universals? 
Is there an alternative analysis postulating only one relevant difference between such languages? To begin with, suppose, following L&S (1984), that there is a syntactic parameter of the following form:20 (51) ±syntactic wh-movement
Suppose further that the following filter is universal and applies only at LF: (52) A [+wh] Comp must contain a [+wh] phrase.
These two principles alone cannot explain the fact that the S-Structure representation (48a) is ill formed in English yet its analogue is well formed in Chinese. Notice that Economy is of no help here. The facts are that in English the wh-phrase moves to the [+wh] Comp at S-Structure whereas in Chinese such movement takes place at LF. In contrast to Economy, the following principle, adapted from Pesetsky (1989), yields the desired prediction: (53) Earliness Principle Satisfy filters as early as possible on the hierarchy of levels (D-Structure) > S-Structure > LF.
Since English has syntactic wh-movement, filter (52) can be satisfied at S-Structure.21 Hence, it must be satisfied at S-Structure, and (48a) is ill formed. By contrast, since Chinese lacks syntactic wh-movement, the earliest the LF filter can be satisfied is in the LF component. Therefore, the Chinese analogue of (48a) is well formed. Under this analysis, there is no difference between English and Chinese with respect to S-Structure Comp filters or LF movement constraints. Rather, there is only one formal difference between the two: English has syntactic wh-movement, and Chinese does not. The rest follows from non-implicational, unparameterized universal principles: in particular, the [+Wh] Comp Filter (52), applying universally at LF and only at LF, and the Earliness Principle.
This elimination of implicational movement parameters in the account of English- versus Chinese-type languages parallels the argument Pesetsky (1989) provides in eliminating the implicational account of the difference between Polish-type languages, which have multiple syntactic wh-movement, and English-type languages, which do not. Pesetsky (1989) notes that he earlier (1987) assumed the following two differences between Polish and English:
(54)                                   Polish   English
     Multiple syntactic wh-movement      +        −
     LF movement                         −        +
He points out that this analysis raises at least the following question: Why is it true that languages with multiple syntactic movement lack LF movement? He then argues that this follows from the Earliness Principle. That is, in languages with multiple syntactic wh-movement, the requirement that each wh-phrase be assigned scope can be satisfied at S-Structure; hence, by Earliness, it must be satisfied at S-Structure. In other words, all wh-phrases must move in the syntax. Earliness thus predicts that wh-in-situ is prohibited in such languages.22 Under Earliness, it no longer need be assumed that Polish-type and English-type languages differ in two respects. Rather, it can be assumed that the two types of languages differ in only one way, as (55) (from Pesetsky (1989)) illustrates. (55)
Thus, under Earliness, Pesetsky (1989) eliminates an LF movement parameter in explaining the Polish/English difference. In exactly the same fashion we have argued that the Chinese/English difference reduces
to a single difference, regarding syntactic wh-movement, between these two languages. That is, given Earliness, we need not appeal either to a parameterized S-Structure [+Wh] Comp Filter or to a parameterized LF movement constraint. Combining Pesetsky’s (1989) results and those obtained here, we instead derive the following distinctions among Polish, English, and Chinese wh-movement: (56)
We have argued in this section that the obligatory S-Structure occupancy of a [+wh] Comp by a wh-phrase in languages like English can be deduced from Earliness, but not from Economy. It is natural to ask what the relationship between these two principles might be.23 I would like to tentatively suggest that they coexist as independent principles of Universal Grammar. As I have argued, each one explains phenomena that the other cannot. I will conclude by illustrating a general and empirically correct interaction of these two principles. Consider again the D-Structure representation of (1). (57)
As shown in this section, under Earliness, we can eliminate S-Structure application of the [+Wh] Comp Filter, a desirable result. Earliness correctly predicts (without appealing to descriptive S-Structure filters, implicational universals, or LF parameters) that each [+wh] Comp must be occupied by a [+wh] phrase at S-Structure. By contrast, Economy makes no such prediction. Thus, Earliness, but not Economy, correctly forces syntactic movement, yielding the S-Structure representation (2), repeated here.24
(2) [S′ whoi [S ti wonders [S′ wherej [S we bought what tj]]]]
But now notice that the converse state of affairs obtains in the LF component. In order to block the illicit derivation from the S-Structure representation (2) to the LF representation (9), Economy is needed. By contrast, Earliness does not block this derivation (all filters are satisfied as early as possible in this illicit derivation). Thus, Economy (but not Earliness) correctly blocks further LF movement of the syntactically moved wh-phrases, whereas Earliness (but not Economy) correctly forces syntactic movement. The correct predictions can be obtained (syntactic movement is forced, LF movement is blocked) if Earliness and Economy are assumed to be independent principles of grammar. However, there is also some evidence indicating that certain interactions of these two principles yield incorrect predictions (at least within the particular analysis presented here). To see this, first recall two assumptions that were made: (1) Subjacency constrains the syntax but not LF, and (2) lexical proper government suffices for the ECP. With this in mind, consider long-distance wh-movement originating from a D-Structure representation such as (58).
(58)
(Recall also that under the Earliness analysis presented in this section, the [+Wh] Comp Filter does not apply at S-Structure.) With respect to (58), Earliness would appear to demand that movement must occur in the syntax. But notice that syntactic movement requires (at least) two moves by virtue of Subjacency. Given this, Economy requires that movement must be postponed until LF, at which level what can be moved to the [+wh] Comp in one movement (not two, or more). Contrary to appearances, the two principles do not impose contradictory requirements. Rather, Economy simply “overrides” Earliness in the following way: Earliness says, “Satisfy the [+Wh] Comp Filter as early as possible.” But under Economy, the earliest possible level at which this filter can be satisfied is LF. This is because a derivation in which the filter is satisfied at S-Structure requires too many moves and is therefore impossible; in other words, Economy excludes it.25 Thus, movement is incorrectly postponed until LF, and this interaction of the two principles therefore appears to yield the wrong result. I tentatively suggest that this problem can be overcome while retaining the beneficial aspects of both the Economy Constraint and the Earliness Principle. This can be achieved by modifying Economy (a derivational constraint) so that it refers not only to the set of filters but also to Earliness, itself a derivational constraint: (59) Economy Constraint (final version) Satisfy filters and the Earliness Principle, using the fewest possible applications of Affect α.
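The difference between the original Economy Constraint and the final version in (59) can be sketched as two selection procedures over the same candidate set. This is a minimal illustration under stated assumptions rather than a formalization from the text: the names `Derivation`, `economy_original`, and `economy_final` are hypothetical, every candidate is assumed to satisfy all filters by LF, the move counts are those described for a D-Structure like (58) (the move counts cover the long-distance wh-phrase only), and the three-move candidate is invented purely to show that superfluous steps are still excluded.

```python
from dataclasses import dataclass

# Hierarchy of levels from the Earliness Principle (53), earliest first.
LEVELS = ("D-Structure", "S-Structure", "LF")


@dataclass(frozen=True)
class Derivation:
    """A filter-satisfying candidate derivation (hypothetical model)."""
    label: str
    moves: int         # number of applications of Affect α
    satisfied_at: str  # earliest level at which the [+Wh] Comp Filter is met


def earliest(candidates):
    """Earliness: keep the candidates that satisfy the filter soonest."""
    best = min(LEVELS.index(d.satisfied_at) for d in candidates)
    return [d for d in candidates if LEVELS.index(d.satisfied_at) == best]


def fewest_moves(candidates):
    """Keep the candidates with the smallest number of moves."""
    best = min(d.moves for d in candidates)
    return [d for d in candidates if d.moves == best]


def economy_original(candidates):
    """Economy before revision: fewest moves among all candidates."""
    return fewest_moves(candidates)


def economy_final(candidates):
    """Economy as in (59): fewest moves among Earliness-satisfying candidates."""
    return fewest_moves(earliest(candidates))


candidates = [
    Derivation("syntactic, successive-cyclic", 2, "S-Structure"),
    Derivation("syntactic, with a superfluous step", 3, "S-Structure"),  # invented
    Derivation("in situ, then one LF move", 1, "LF"),
]

print([d.label for d in economy_original(candidates)])
print([d.label for d in economy_final(candidates)])
```

Under these assumptions the original constraint selects the one-move LF derivation, which is the incorrect postponement described above, whereas the revised constraint selects the two-move syntactic derivation while still excluding the invented three-move one.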
Thus, Economy makes allowances for—in other words, it “tolerates”—the Earliness Principle. That is, filters must be satisfied as early as possible regardless of the number of applications of Affect α this might require. In a certain sense this is natural: if we think of Economy as blocking only unnecessary movement, then we may think of the class of necessary movements (those allowed by Economy) as being not only the set of movements forced by filters, but also the set of movements forced by derivational constraints, such as Earliness.26 However, much more research is required to determine whether these principles are in fact organized in exactly this way. (Notice that if the solution tentatively proposed here (and any others) should prove to be untenable—that is, if Economy and Earliness are shown to be incompatible—then the “worst” result is only that we cannot simultaneously maintain both the Economy explanations discussed in sections 1–5 and the Earliness analysis eliminating S-Structure application of the [+Wh] Comp Filter, discussed in section 6.) A comprehensive determination of the widespread empirical predictions (beyond those investigated here) made by these two overarching derivational constraints and their interaction awaits further inquiry.
Notes
* I thank Noam Chomsky, Howard Lasnik, Elaine McNulty, David Pesetsky, and Esther Torrego for very helpful discussion. Thanks also to an LI reviewer for a careful reading and helpful suggestions. Portions of this material were presented in February 1990 at an MIT colloquium.
1 This is my judgment, as well as the judgment given in Baker (1970), Lasnik and Saito (1984; 1992), Pesetsky (1987), and Chomsky (1973). See Pesetsky (1987, fn. 12) and the references cited for discussion of divergent
views regarding the possibility of a wide scope reading of what. As will become clear momentarily, the scope facts concerning what are not at issue here.
2 Throughout, I will use “S′/S” notation. I presume that the analyses presented here translate readily into a “CP/IP” analysis.
3 The ECP does not suffice to exclude the derivation. That is, I assume that where can γ-mark its trace at S-Structure and/or that the trace of where is lexically properly governed (see Huang (1982)), as demonstrated by its ability to occur grammatically in situ. Regardless of the status of where, the following analogous types of examples, lacking where, would remain problematic:
(i) Who wonders what we gave to who(m)?
(ii) Who wonders who bought what?
Thus, the existence of the problematic type of derivation that we are trying to exclude does not rest on the status of where.
4 There is some unclarity concerning the notion “A-position.” AHS (fn. 10) define A-positions as those “base-generated under S.” As we will see, such a structural definition of “A-position” is problematic since all Comps, except the matrix Comp, are “base-generated under (the matrix) S.” Thus, by this definition, (10) permits the derivation from (2) to (9), precisely the type of derivation it was intended to exclude.
5 The fact that (10) is descriptive is noted by AHS (p. 75), who write, “We do not have any good explanation as to why wh-R [=LF wh-movement] cannot apply in these cases. [The constraint (10)]…simply state[s] what appears to be a fact….”
6 Following arguments given in Epstein (1987; 1991), I am assuming that the SCC is not a general constraint on Affect α, but rather constrains only movement.
7 See VR&W for motivation for this model and discussion of other models.
8 The scope of what is irrelevant (see again fn. 1).
9 See VR&W (section III.5) for further discussion and suggestions regarding the deduction of this filter. I have omitted certain technical details (irrelevant here) from their filter (68′), p. 197.
10 VR&W (p. 197) in fact discuss the apparent descriptiveness of their filter (68′), of which (21) is a simplified version.
11 As Epstein (1987; 1991) shows, (26) need not be stipulated but rather follows from the indexing algorithm motivated on entirely independent grounds in Chomsky (1982). Under that algorithm there is no free Ā-indexing at S-Structure; furthermore, movement cannot create indices, but can only carry over indices that have already been assigned. Thus, it follows that wh-adjuncts and their traces have no indices at S-Structure. Moreover, they cannot be assigned indices until Free LF Ā-Indexing applies at the LF level. This excludes the unwanted derivation in (25) since (25a) cannot be generated; in other words, since the adjunct and its traces are not indexed at S-Structure, S-Structure Comp indexing is precluded. Epstein (1987; 1991) also shows that this lack of indexing on wh-adjunct chains until the LF level explains why each adjunct trace must be antecedent governed at LF. This follows since, given their lack of an index until LF, antecedent government (which itself requires binding, hence indexing) isn’t possible until LF and hence must obtain at this level. (For an analysis of wh-adjuncts very much like the one proposed in Epstein (1987; 1991), see Rizzi (1990), where it is similarly proposed that an impoverished indexing on wh-adjuncts plays a central role in explaining their distribution with respect to the ECP.)
12 The one-move derivation is itself allowed—that is, (31b) is grammatical—because there is no well-formed derivation involving zero moves, given the (in effect) obligatoriness of QR. Thus, Economy correctly predicts not only that topicalization of quantifiers is allowed (as in (31b)) but also that topicalized quantifiers cannot undergo LF movement. These are precisely the results we intended to derive in this section. There is however a potential problem for the Earliness Principle (discussed below) regarding topicalized quantifiers. Suppose quantifiers must undergo QR (i.e., S-adjunction) at LF (May (1977; 1985)). Since they can
undergo topicalization (i.e., S-adjunction) at S-Structure (see (31b)), Earliness would predict that they must undergo topicalization at S-Structure—that is, Earliness would predict that (31a) is ungrammatical. It should also be noted that topicalization of nonquantificational categories (in contrast to topicalization of quantifiers, as discussed in this section) is potentially problematic for Economy. For example, since the zero-move derivation in (i) is possible (unlike the case with quantifiers, which must undergo QR), Economy predicts that the seemingly longer one-move derivation in (ii) is ungrammatical since it contains an unnecessary move.
(i) I like Jim.
(ii) Jim, I like.
As Chomsky (1991) notes, such apparent cases of this particular type of optionality may well be a serious problem for Economy.
13 Some find embedded topicalization as in (36b) marginal; others find it acceptable. Both groups detect a contrast between (36b) and (36d). It is this contrast with which we are concerned here.
14 If Infl becomes a proper governor at LF by virtue of adjoining to S at this level (see L&S (1992)), then the shorter derivation, in which whoj remains in situ at S-Structure, actually involves two moves: Infl-adjunction to S and one-fell-swoop movement of whoj. The longer derivations in (39) and (40) would still each involve unnecessary movement and are therefore still excluded by Economy. Notice that we have not discussed topicalization of wh-adjuncts as in (i), for example.
(i) [S′ whoi [S ti thinks [S′ [S whyj [S Bill left tj]]]]]
Such cases uniformly violate the ECP at LF.
15 If Japanese, unlike English, does permit syntactic S-adjunction of wh-phrases, as argued by Saito (1989), this would clearly present a problem for the Economy analysis (as do other uneconomical derivations (involving Japanese scrambling and reconstruction) that Saito argues for). A descriptive account might nonetheless be adapted from the preliminary discussion of the Japanese/English difference provided by Saito (1989, sec. 3). He argues that S-adjoined position has a different status in the two languages. Only in Japanese is this a position in which an NP can appear at D-Structure (what Saito calls a “D-position”); this is due to the existence of the Japanese multiple subject construction (see Saito (1989) and the references cited for details). Thus, in Japanese, but not English, S-adjoined position is a D-position. Saito then describes the Japanese/English difference regarding the possibility of syntactic S-adjunction of wh-phrases directly in terms of the notion “D-position.” His description can be adapted into the Economy analysis as follows:
(i) Economy does not count movement to and from D-positions.
Of course, (i) has precisely the status Saito ascribes to his own analysis; it provides a mere description that, if correct, remains to be explained. Also, an LI reviewer suggests that descriptions like (i) “weaken” the Move α generalization in that Economy treats different types of movement differently.
16 As Howard Lasnik (personal communication) notes, it is a problem for this approach that (at least) Condition C of the binding theory and Case requirements seem to be imposed at S-Structure.
17 Epstein (1990) argues that certain unified analyses are empirically incorrect. That is, unification is not per se desirable, but rather is subject to empirical test. Consider the most extreme (hypothetical) case of unification in which Universal Grammar is reduced to one and only one principle P. Such a unified, one-principle theory necessarily analyzes every case of ungrammaticality as the same type of ungrammaticality, namely, “a violation of principle P.” However, there is empirical evidence against such a theory of our knowledge of syntax. 
Speakers not only know what is grammatical and ungrammatical; they distinguish different types and degrees of ungrammaticality. As Chomsky (1965) points out, such knowledge of different types and degrees of (un)grammaticality can in principle be explained by a theory of syntax incorporating a number of different principles (or rules). Different types and degrees of ungrammaticality can then be analyzed as violations of different principles; that is, the facts can be predicted. By contrast, the hypothetical one-principle theory is unable to predict the different types and degrees of ungrammaticality that exist; in other words, it is unable to explain our differentiated knowledge of syntax. Epstein (1990) investigates a non-hypothetical case of unification—namely, the reduction of the Case Filter to the θ-Criterion, as proposed by Chomsky (1981)—and argues that this unified analysis, under which two principles (the Case Filter and the θ-Criterion) are reduced to one (the θ-Criterion), is empirically problematic to the extent that there exist different types of ungrammaticality that it cannot distinguish or predict. (See Epstein (1990) for more detailed argumentation.) The unified analysis provided by an overarching derivational constraint like Economy (as well as the unified analyses provided by other overarching principles) might well confront a similar type of empirical problem, but this is a matter I leave for future research.
18 See Chomsky (1986a) for further details.
19 This is adapted from Chomsky (1986a). I am ignoring complexities introduced by the Vacuous Movement Hypothesis adopted there.
20 An LI reviewer suggests that principles like (51), which specifically mention a certain type of movement, may represent a “weakening” of the Move α generalization.
21 Under the standard interpretation of satisfaction, this is clearly (and trivially) false; that is, since filter (52) does not apply at S-Structure, it cannot be satisfied at this level. I leave this general issue confronting the Earliness account open. See Pesetsky (1989, 6) for discussion. 
In fact, as Pesetsky (1989) argues, Earliness does not exactly predict that wh-in-situ is prohibited in languages with multiple syntactic wh-movement. Rather, it predicts that a wh-phrase that must be assigned scope by movement (i.e., a non-D-linked wh-phrase) cannot remain in situ at S-Structure. That is, if a wh-phrase is assigned scope by movement, and the language can move it as early as S-Structure, then it must move at S-Structure. But if a wh-phrase is assigned scope, not by movement, but by coindexation with a Q-morpheme at LF (as Pesetsky argues is the case for D-linked wh-phrases), then Earliness does not demand that the phrase move at S-Structure. The prediction then is that languages with multiple syntactic wh-movement will allow wh-in-situ, in the event that the wh-phrase is not going to undergo LF movement—that is, in the event that the wh-phrase is D-linked. Pesetsky argues that this is true: in such languages D-linked wh-phrases can occur in situ. Pesetsky further argues that, in fact, not all wh-phrases in situ are D-linked in such languages. A non-D-linked wh-phrase can also remain in situ but only if, for some reason, syntactic movement is blocked. This effect too is predicted by Earliness: in such cases the earliest possible movement is at LF; hence, LF movement is allowed. (See Pesetsky (1989) for further details.) Pesetsky (1989) argues that the Economy Constraint should be eliminated (and replaced with the Earliness Principle). His argument rests largely on the claim that unnecessary movement (movement not forced by any filter) exists, but is incorrectly excluded by Economy. In a nutshell, Pesetsky argues for the existence of unnecessary movement as follows: (i) a. In English, Infl is [+θ-opaque, –Case-opaque]; in other words, a verb in Infl cannot assign a θ-role via its trace (Pollock (1989)), but it can assign Case via its trace (Lasnik (1989)). b. 
In English, the position u (a position Chomsky (1991) calls AGR-O and Pollock (1989) calls AGR) is [−θ-opaque, +Case-opaque]; in other words, a verb in u can assign a θ-role via its trace, but cannot assign Case via its trace. (ii) Given (i), verbs such as existential be (a Case assigner, but not a θ-assigner) can move to Infl (see (a) below) but cannot move to u (see (b)). a. (adapted from Pesetsky’s (80a)) There [Infl arei] never ti any cops when you need them. b. (adapted from Pesetsky’s (80b)) *My whole life, there [Infl have] [u beeni] never ti any cops when I’ve needed them.
(iii) Given the Head Movement Constraint (or the ECP), verb movement to Infl, as in (iia), cannot be direct but rather must pass through u as an intermediate step. (iv) But given that be cannot move to u (see (ii)), the only way to satisfy the Head Movement Constraint (or the ECP) is for u to be absent from the structure. (v) Therefore, u is only an optionally present position. (vi) Since u is optional, no filter can require movement to it. (vii) Certain (main) verbs do move to u (see Pesetsky (1989, section 3.1.2)). (viii) Movement to u, when it occurs, is therefore unnecessary movement, required by no filter. (ix) Such movement exists (see (vii)), but it is unnecessary (see (viii)); therefore, Economy is overly restrictive.
There are at least two potential problems with this argument (as Pesetsky notes). First, with respect to (vi), it could be that u is optionally present, but when it is present, movement to it is nonetheless required (see Pesetsky (1989, 38)). Second, with respect to (ii), example (iib) does not show that be cannot move to u. Rather, it shows only that be cannot occupy u at S-Structure. For further detailed discussion, see Pesetsky (1989, section 3.3). 24 What, instead of where, could move to the embedded Comp. This is irrelevant to the present point. 25 I am assuming that even though the S-Structure representation itself might be well formed, it is not possible to satisfy a filter at S-Structure, if the S-Structure representation in question is not part of a well-formed derivation. 26 An LI reviewer suggests that the relation between Economy and Earliness expressed in (59) might follow from a more general principle of constraint relations dictating that it is always the definition of the transderivational constraint (Economy) that incorporates the constraint on individual derivations (Earliness).
References Aoun, J., Hornstein, N. and Sportiche, D. (1981) “Some Aspects of Wide Scope Quantification,” Journal of Linguistic Research 1: 69–95. Baker, C.L. (1970) “Notes on the Description of English Questions: The Role of an Abstract Question Morpheme,” Foundations of Language 6: 197–219. Baltin, M.R. (1982) “A Landing Site Theory of Movement Rules,” Linguistic Inquiry 13:1–38. Chomsky, N. (1965) Aspects of the Theory of Syntax, Cambridge, Mass.: MIT Press. Chomsky, N. (1973) “Conditions on Transformations,” in Anderson, S. and Kiparsky, P. (eds.) A Festschrift for Morris Halle, New York: Holt, Rinehart and Winston. Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris. Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press. Chomsky, N. (1986a) Barriers, Cambridge, Mass.: MIT Press. Chomsky, N. (1986b) Knowledge of Language: Its Nature, Origin, and Use, New York: Praeger. Chomsky, N. (1991) “Some Notes on Economy of Derivation and Representation,” in Freidin, R. (ed.) Principles and Parameters in Comparative Grammar, Cambridge, Mass.: MIT Press. [Originally published (1989) in MIT Working Papers in Linguistics 10, Department of Linguistics and Philosophy, MIT.] Chomsky, N. and Lasnik, H. (1977) “Filters and Control,” Linguistic Inquiry 8:425–504. Epstein, S.D. (1986) “On Lexical Proper Government and LF Wh-Movement,” manuscript, University of Connecticut, Storrs. Epstein, S.D. (1987) “Empty Categories and Their Antecedents,” unpublished doctoral dissertation, University of Connecticut, Storrs.
Epstein, S.D. (1990) “Differentiation and Reduction in Syntactic Theory: A Case Study,” Natural Language and Linguistic Theory 8:313–323. Epstein, S.D. (1991) Traces and Their Antecedents, New York/Oxford: Oxford University Press. Huang, C.-T. J. (1981/82) “Move WH in a Language without WH-Movement,” The Linguistic Review 1:369–416. Huang, C.-T. J. (1982) “Logical Relations in Chinese and the Theory of Grammar,” unpublished doctoral dissertation, MIT. Kiss, K. (1987) Configurationality in Hungarian, Budapest: Akademiai Kiado and Dordrecht: Reidel. Koopman, H. and Sportiche, D. (1983) “Variables and the Bijection Principle,” The Linguistic Review 2:139–160. Lasnik, H. (1989) “Case and Expletives: Notes Toward a Parametric Account,” paper presented at the 1989 Princeton Workshop on Comparative Syntax, Princeton University. Lasnik, H. and Saito, M. (1984) “On the Nature of Proper Government,” Linguistic Inquiry 15:235–289. Lasnik, H. and Saito, M. (1992) Move α, Cambridge, Mass.: MIT Press. Lasnik, H. and Uriagereka, J. (1988) A Course in GB Syntax: Lectures on Binding and Empty Categories, Cambridge, Mass.: MIT Press. May, R. (1977) “The Grammar of Quantification,” unpublished doctoral dissertation, MIT. May, R. (1985) Logical Form: Its Structure and Derivation, Cambridge, Mass.: MIT Press. Pesetsky, D. (1982) “Paths and Categories,” unpublished doctoral dissertation, MIT. Pesetsky, D. (1987) “Wh-in-Situ: Movement and Unselective Binding,” in Reuland, E.J. and ter Meulen, A.G.B. (eds.) The Representation of (In)definiteness, Cambridge, Mass.: MIT Press. Pesetsky, D. (1989) “Language Particular Processes and the Earliness Principle,” paper presented at the GLOW Colloquium, Utrecht, The Netherlands. Pollock, J.-Y. (1989) “Verb Movement, Universal Grammar, and the Structure of IP,” Linguistic Inquiry 20:365–424. Riemsdijk, H. van and Williams, E. (1981) “NP-Structure,” The Linguistic Review 1:171–217. Rizzi, L. (1990) Relativized Minimality, Cambridge, Mass.: MIT Press. 
Saito, M. (1989) “Scrambling as Semantically Vacuous Ā-Movement,” in Baltin, M.R. and Kroch, A.S. (eds.) Alternative Conceptions of Phrase Structure, Chicago: University of Chicago Press. Stowell, T. (1981) “Origins of Phrase Structure,” unpublished doctoral dissertation, MIT.
8 OVERT SCOPE MARKING AND COVERT VERB-SECOND
In this article I reanalyze certain superiority phenomena and investigate consequences of the account I propose. In section 1 I briefly review two standard analyses of superiority effects and a problematic type of case (from Lasnik and Saito (1992)) that neither analysis handles correctly. In section 2 I present the Operator Disjointness Condition (ODC), which Lasnik and Saito (L&S) (1992) propose as the central principle accounting for superiority effects (including the problematic cases they present). In section 3 I argue that the ODC can be reformulated as a more natural principle of scope marking: the Scope-Marking Condition (SMC). I then show that the success of this reformulation rests crucially on incorporating the theory of linked chains independently motivated by Chomsky and Lasnik (1993). In section 4 I present independent support for the SMC. In section 5 I tentatively propose a new theory of adjunction. In section 6 I investigate how subject position in English becomes properly governed by I-raising at LF (a process crucial to L&S’s analysis of superiority). In particular, I seek an explanation for the fact that LF I-to-C movement, but not syntactic I-to-C movement, renders subject position properly governed. I argue that a principled account can be provided within the framework of checking theory (Chomsky (1991; 1993)). I propose, among other things, that (a) English is a “covert verb-second” grammar, (b) LF representations derived by V-raising are VP-recursion structures resulting from checking and deletion of functional heads and their projections, and (c) index-sensitive head government conditions may be at least partially eliminable and replaced by simpler and arguably more natural requirements on traces (chains), given the derived (VP-recursion) constituent structure I propose for checking-induced deletion. In section 7 I reformulate certain aspects of the proposed analysis within the bare theory of phrase structure presented in Chomsky (1994). 
1 Superiority Effects: Previous Analyses and Counterexamples
The classic example of the superiority effect is given in (1).
(1) a. [Whoi [ti bought what]]
b. *[Whatj did [who buy tj]]
One account of this phenomenon is provided by Chomsky’s (1973, 246) Superiority Condition. (2) The Superiority Condition
a. No rule can involve X, Y in the structure…X…[…Z… WYV…]…, where the rule applies ambiguously to Z and Y, and Z is superior to Y. b. “[T]he category A is ‘superior’ to the category B…if every major category dominating A dominates B as well but not conversely.”
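The definition of "superior" in (2b) can be checked mechanically on a small tree. The following Python sketch is illustrative only: the `Node` class, the `MAJOR` label set, and the simplified D-Structure assumed for (1b), in which S′, S, and VP are the major categories dominating the two wh-phrases, are assumptions made for the example and are not part of Chomsky's formulation.

```python
class Node:
    """A labelled tree node; parent links are set when children are attached."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

# Assumption: the labels that count as "major categories" in this toy fragment.
MAJOR = {"S'", "S", "VP", "NP"}

def major_dominators(node):
    """Return the set of major-category nodes that properly dominate `node`."""
    found = set()
    current = node.parent
    while current is not None:
        if current.label in MAJOR:
            found.add(current)
        current = current.parent
    return found

def superior(a, b):
    """(2b): every major category dominating a dominates b, but not conversely."""
    dom_a, dom_b = major_dominators(a), major_dominators(b)
    return dom_a <= dom_b and not dom_b <= dom_a

# Assumed D-Structure of (1b): [S' Comp [S who [VP buy what]]]
who = Node("NP")
what = Node("NP")
vp = Node("VP", [Node("V"), what])
s = Node("S", [who, vp])
s_bar = Node("S'", [Node("Comp"), s])

print(superior(who, what))   # True
print(superior(what, who))   # False
```

On this toy structure VP dominates what but not who, so the subject comes out as superior to the object and not the other way around.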
This constraint on rule application renders (1b) ungenerable. In the D-Structure representation, X=Comp, Z=who, and Y=what. Therefore, wh-movement, applying ambiguously to Z(who) or to Y(what), is not allowed to “involve” X(Comp) and Y(what), since Z(who) is superior to Y(what). By contrast, (1a) is generable. Another account of cases like (1b) is provided by Aoun, Hornstein, and Sportiche (1981, 81), who attribute the account to Chomsky (see Chomsky (1981)). Under this alternative, the Superiority Condition is reduced to the Empty Category Principle (ECP), given the following three assumptions: (3) a. Comp-Indexing: At S-Structure, [Comp XPi…]→ [Compi XPi…] iff Comp dominates only i-indexed elements. b. At LF all wh-phrases occupy Comp. c. Only an indexed Comp (Compi) coindexed with a subject trace (ti) can properly govern ti.
Aoun, Hornstein, and Sportiche consider the following ungrammatical French analogue of (1b): (4) * Quel manteau qui a acheté? what (sort of) overcoat who has bought
Given (3a), the S-Structure representation of (4) is (5). (5) [S′ [Compj quel manteauj] [S quii a acheté tj]]
Given (3b), qui moves to Comp in the LF component, yielding (6). (6) * [S′ [Compj quel manteauj quii] [S ti a acheté tj]]
Given (3c), the ECP is violated: ti is not properly governed. Thus, there are at least two general types of approaches to superiority phenomena: a constraint on rule application and an alternative hypothesis under which the constraint on rule application arguably reduces to the ECP, a constraint on representation. A problem for both analyses is the grammaticality of examples like (7), from L&S (1992). (7) a. Who wonders what who bought? b. [CP Whoi Ci [IP ti wonders [CP whatj Cj [whok bought tj]]]]
Incorrectly, the embedded CP in (7b) is ungenerable, given the Superiority Condition. Under the alternative account, the LF representation of (7b) violates the ECP, since the trace left by LF movement of whok cannot be properly governed. This incorrect result comes about as follows: First, I assume that (3a) can be reformulated in more current terms as (8).
(8) Specifier-head coindexing occurs at S-Structure.
The effects of (8) are indicated by the indices in (7b). Second, I assume (3b) can be reformulated as (9). (9) In LF representation, each wh-phrase either occupies [Spec, CP] or is adjoined to a wh-phrase occupying [Spec, CP].
Consequently, whok must move in the LF component, leaving a trace in subject position. Third, I assume the following descriptive generalization, discussed in detail in Epstein (1992):1 (10) A wh-phrase occupying [Spec, CP] at S-Structure does not undergo LF movement.
Under (10), both whoi and whatj remain in [Spec, CP] in the LF component. Consequently, the only way for whok to satisfy (9) is to adjoin to a wh-phrase in [Spec, CP]. But in such an adjoined (nonspecifier) position, specifier-head indexing (in LF) is blocked. Therefore, no Comp (=C0) will bear the index k. Consequently, proper government of the subject trace is precluded under the following definition of antecedent government (see L&S (1992)), which dictates that a trace must be bound by an X0 in order to be antecedent-governed by it: (11) Antecedent government A antecedent-governs B iff a. A=X0, and b. A binds B, and c. B is subjacent to A.
In summary, if (1b) is excluded either by the Superiority Condition or under the reduction of the Superiority Condition to the ECP, then (7) will be wrongly excluded along with it. The grammaticality of (7) is not the only fact to be explained: its interpretation must be accounted for as well. In particular, whok, the wh-phrase in the embedded subject position at S-Structure, has no narrow scope interpretation; instead, it obligatorily takes wide scope (see L&S (1992)). This is suggested by the fact that (12a) is, but (12b) is not, an appropriate answer (as indicated by “#”).2 (12) a. Sue wonders what Bill bought. b.#Sue wonders what who bought.
Thus, in the LF representation of (7b), whok obligatorily adjoins to whoi, occupying the matrix [Spec, CP], yielding an LF representation roughly as follows (see below for further details): (13)
Notice that principle (9) alone fails to account for the scope of whok.
Thus, (at least) two questions are raised by examples like (7):3 (14) a. How can we at the same time exclude (1b) and allow (7b)? b. Why must whok in (7b) take wide scope; that is, why must whok adjoin to whoi, in the LF component?
2 The Operator Disjointness Condition
In this section, I present L&S’s (1992) answers to the questions posed in (14). First, with respect to (14a), the exclusion of the central case of superiority (1b), they reject an account in terms of the Superiority Condition, since, as noted, this condition wrongly excludes (7b) as well. They also reject the ECP account, which rests on the assumption that, in English, a trace left by a subject undergoing LF movement cannot be properly governed. As L&S note, this assumption cannot be maintained. The well-formedness of the LF representation (13) indicates that a wh-phrase in subject position at S-Structure can indeed undergo LF movement (in fact, long-distance LF movement) and leave a trace that is somehow (see below) properly governed. Thus it appears that subject position somehow becomes properly governed in LF, and therefore the ECP account of (1b) cannot be maintained. As L&S discuss, further evidence for rejecting this analysis is that it is unable to account for “pure” superiority effects such as the following (see Hendrick and Rochemont (1982)): (15) * I wonder [CP whoi Ci [you told whom [CP (ti) [IP PRO to see ti]]]]
That is, LF movement of whom satisfies the ECP, given the following disjunctive formulation (from L&S (1992, 28, 52)): (16) a. ECP A trace must be properly governed. b. A trace is properly governed iff it is antecedent-governed (see (11)) or lexically properly governed. c. Lexical proper government A trace is lexically properly governed iff it is assigned Case or a θ-role by a lexical head.
Under (16c), even though the LF trace of whom cannot be antecedent-governed (since whoi has indexed C0 at S-Structure (see (8)) and remains in this position in LF (see (10))), it satisfies the ECP by being lexically properly governed. Therefore, L&S argue, one can appeal neither to the ECP nor to the Superiority Condition to explain superiority phenomena, in particular, the contrast between the ungrammaticality of (1b) and the grammaticality of (7). In order to describe the contrast, L&S propose the following condition (pp. 120–121): (17) The Operator Disjointness Condition (ODC) a. A wh-phrase X in [Spec, CP] is O-disjoint (operator-disjoint) from a wh-phrase Y if the assignment of the index of X to Y would result in the local Ā-binding of Y by X (at S-Structure). b. If two wh-phrases X and Y are O-disjoint, then they cannot undergo Absorption.
(Regarding Absorption, see Higginbotham and May (1981).) First, consider how (17) excludes (1b). The S-Structure representation of (1b) is (18). (18) [CP Whatj did [IP whoi buy tj]]
The first step is to determine whether whatj is O-disjoint from whoi (17a). This is done by hypothetically assigning the index j to who. This results in the hypothetical representation (19). (19) [CP Whatj did [IP whoj buy tj]]
Now we ask, Does whatj locally Ā-bind whoi in this representation? It does, given the following definition (see Chomsky (1981; 1986) and Epstein (1986; 1987; 1991) for alternative, empirically distinct definitions): (20) X locally Ā-binds Y iff a. X binds Y, and b. X occupies an Ā-position, and c. if Z binds Y, then Z binds X.
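The hypothetical-indexing test in (17a), together with definition (20), can be sketched as a small procedure. The Python below is a toy model and not L&S's formalization: the `Node` class, the `a_bar` flag, and the bare tree skeletons are hypothetical; binding is taken to be coindexation plus c-command, with c-command assumed to be computed from the first branching node above a constituent; indices that heads might bear are ignored; and in (20c) Z is taken to range over elements other than X itself.

```python
class Node:
    """Tree node with an optional referential index and an A-bar position flag."""
    def __init__(self, label, children=(), index=None, a_bar=False):
        self.label, self.index, self.a_bar = label, index, a_bar
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def dominates(x, y):
    """True iff x properly dominates y."""
    node = y.parent
    while node is not None:
        if node is x:
            return True
        node = node.parent
    return False

def c_commands(x, y):
    """Assumed definition: the first branching node above x dominates y."""
    if x is y or dominates(x, y) or dominates(y, x):
        return False
    node = x.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, y)

def nodes(root):
    """Yield every node of the tree rooted at `root`."""
    yield root
    for child in root.children:
        yield from nodes(child)

def locally_abar_binds(x, y, root, index_of):
    """Definition (20), evaluated under the index assignment `index_of`."""
    def binds(a, b):
        return (a is not b and index_of(a) is not None
                and index_of(a) == index_of(b) and c_commands(a, b))
    if not (x.a_bar and binds(x, y)):
        return False
    # (20c): every other binder Z of Y must also bind X.
    return all(binds(z, x) for z in nodes(root) if z is not x and binds(z, y))

def o_disjoint(x, y, root):
    """(17a): hypothetically give Y the index of X, then test local A-bar binding."""
    return locally_abar_binds(x, y, root, lambda n: x.index if n is y else n.index)

# (18): [CP what_j did [IP who_i buy t_j]]
what = Node("what", index="j", a_bar=True)
who = Node("who", index="i")
cp18 = Node("CP", [what, Node("C'", [Node("did"),
            Node("IP", [who, Node("VP", [Node("buy"), Node("t", index="j")])])])])

# (7b), heads omitted: [CP who_i [IP t_i wonders [CP what_j [IP who_k bought t_j]]]]
who_i = Node("who", index="i", a_bar=True)
what_j = Node("what", index="j", a_bar=True)
who_k = Node("who", index="k")
lower_cp = Node("CP", [what_j,
               Node("IP", [who_k, Node("VP", [Node("bought"), Node("t", index="j")])])])
cp7b = Node("CP", [who_i,
           Node("IP", [Node("t", index="i"), Node("VP", [Node("wonders"), lower_cp])])])

print(o_disjoint(what, who, cp18))      # True
print(o_disjoint(what_j, who_k, cp7b))  # True
print(o_disjoint(who_i, who_k, cp7b))   # False
```

In the model of (7b), the matrix subject trace also binds the embedded who under hypothetical indexing but does not bind the matrix who, so clause (20c) fails and the two instances of who are not O-disjoint.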
Since the hypothetical assignment of the index of what to who in (19) does indeed yield local Ā-binding of who by what in the hypothetical representation (19), these two operators are marked as O-disjoint in the (non-hypothetical) S-Structure representation (21). (21)
Now, in the LF component, (9) applies, dictating that who must either occupy [Spec, CP] or be adjoined to a wh-phrase in [Spec, CP] in the LF representation. As discussed above, who cannot occupy [Spec, CP] since this position is occupied by what. Consequently, who has only one option: it must adjoin to what. However, in the resulting configuration, in which one wh-phrase is adjoined to another, the structural description of Absorption is met. Assuming (with L&S) that Absorption is obligatory, the two wh-phrases must undergo this process, but they cannot, by (17b), since they are (were) O-disjoint. Hence (1b) has no well-formed LF representation and is therefore excluded.4 Now that (1b) is excluded, how is (7) allowed? Consider again its S-Structure representation. (22) [CP whoi [IP ti wonders [CP whatj [IP whok bought tj]]]]
Whatj and whok are O-disjoint in (22). This is because hypothetical assignment of j to whok does indeed yield hypothetical local Ā-binding of the embedded subject who by what.5 Since whok and whatj are O-disjoint, whok cannot be adjoined to whatj in LF since obligatory Absorption will be blocked by (17b). So far this is exactly what happened in (1b). Unlike in (1b), however, in (22), whok has another possible LF landing site: it can be adjoined to whoi occupying the matrix [Spec, CP]. In fact, this is a licit landing site: whoi and whok are not O-disjoint (i.e., hypothetical assignment of i to whok does not yield local Ā-binding of whok by whoi
because of the “intervening” matrix subject ti). Thus, provided whok adjoins to whoi in LF, the derivation is well formed: the obligatory rule of Absorption can apply. Thus, only one LF representation can be derived from the S-Structure representation (22), roughly like (13). (23)
In summary, the ODC (17) has answered the question posed in (14a): (1b) is excluded and (7) is generable. Question (14b) has been answered too: whok must take wide scope, as in (23), since, at S-Structure, whatj and whok are O-disjoint. There is, however, at least one outstanding question. The LF movement of whok constitutes (wh-island hopping) long-distance subject movement, the result of which produces a “classic Comp-trace” configuration. Why then does the LF representation in (23) satisfy the ECP, when the S-Structure representation in (24), exhibiting similar long-distance subject movement, violates it? (24) *[CP whoi [C do] [IP the men wonder [CP whatj Cj [IP ti Ii bought tj]]]]
The violation in (24) is explicable given the definition of the ECP in (16). The trace ti, created by long-distance subject movement of whoi, is not lexically properly governed since it receives neither a Case feature nor a θ-role from a lexical head (Case is assigned by I; θ-role is assigned by VP compositionally, or, under the VP-Internal Subject Hypothesis, the subject position lacks a θ-role (see Koopman and Sportiche (1986) and references cited there)). Furthermore, ti is not antecedent-governed according to (11) since it is not subjacent to a head that binds it. The only head coindexed with ti is I. ti is indeed subjacent to Ii; moreover, they are coindexed. To obtain the desired result that Ii does not in fact antecedent-govern ti, it must be the case that I (= head of IP) does not bind ti (= specifier of IP). This result can be obtained if the definition of binds in (11b) incorporates the branching node definition of c-command given in Reinhart (1979) (a definition I seek to deduce in Epstein (1994; to appear) and which is discussed both in further detail and in a broader theoretical context in Epstein et al. (to appear)). (25) Binds in (11b)=X binds Y iff a. X and Y are coindexed, and b. X c-commands Y. (26) X c-commands Y iff the first branching node dominating X dominates Y and neither X nor Y dominates the other.
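Definitions (25) and (26) lend themselves to a direct check. The sketch below is an illustration under stated assumptions: the `Node` class is hypothetical, and the embedded IP of (24) is given an X-bar skeleton with an intermediate I′ projection, which the bracketing in (24) does not itself show.

```python
class Node:
    """Tree node with an optional referential index."""
    def __init__(self, label, children=(), index=None):
        self.label, self.index = label, index
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def dominates(x, y):
    """True iff x properly dominates y."""
    node = y.parent
    while node is not None:
        if node is x:
            return True
        node = node.parent
    return False

def c_commands(x, y):
    """(26): the first branching node dominating x dominates y,
    and neither x nor y dominates the other."""
    if x is y or dominates(x, y) or dominates(y, x):
        return False
    node = x.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, y)

def binds(x, y):
    """(25): x and y are coindexed and x c-commands y."""
    return x.index is not None and x.index == y.index and c_commands(x, y)

# Assumed skeleton of the embedded IP in (24): [IP t_i [I' I_i [VP bought t_j]]]
t_i = Node("t", index="i")
infl = Node("I", index="i")
i_bar = Node("I'", [infl, Node("VP", [Node("bought"), Node("t", index="j")])])
ip = Node("IP", [t_i, i_bar])

print(binds(infl, t_i))  # False
print(binds(t_i, infl))  # True
```

The first branching node above I is I′, which does not dominate the specifier, so coindexation alone does not let the head bind the subject trace; the specifier, by contrast, does c-command the head.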
Given (25), an (unmoved) head does not bind, and therefore cannot antecedent-govern, its specifier. Thus, in (24) ti is neither antecedent-governed nor lexically properly governed; it therefore violates the ECP as desired. But notice now that the LF representation (23) will be wrongly excluded, along with the S-Structure representation (24), since the subject trace similarly fails to be properly governed. We are thus led to believe that (23) is not in fact the LF representation of (7). Rather, there would seem to be some LF operation that renders the subject position properly governed at this level. L&S (1992, 117) propose:6
(27) In the LF component, I can substitute into C or can adjoin to IP.
Under either option, the subject trace in (23) will be bound (under definition (25)) and hence antecedent-governed, and the ECP will therefore be satisfied. Thus, L&S account for all the relevant properties of (7).
3 Reformulating the ODC
To summarize up to this point: L&S account for the grammaticality of (7) by rejecting both Chomsky’s (1973) account and an ECP account of superiority phenomena and by postulating (27), whereby the subject position can become properly governed at LF.7 The unique wide-scope interpretation of (7) and the ungrammaticality (or uninterpretability) of (1b) (the classic case of the superiority effect) are accounted for by the ODC (17). However, the ODC is descriptive and, as L&S note, the question arises whether it can be deduced from, or replaced by, natural principles. I will show that this is indeed possible, by reformulating the ODC as a principle of scope marking. To begin with, consider again the ODC (17) and the central contrast (1a) versus (1b) (repeated here). (1) a. [Whoi [ti bought what]] b.*[Whatj did [who buy tj]]
In a nutshell, (1a) is allowed by the ODC because the trace of the wh-phrase who (who in [Spec, CP]) c-commands the wh-in-situ at S-Structure; (1b) is excluded because this condition is not met. Suppose then that we simply restate the ODC as the following (preliminary) condition: (28) In the LF component, a wh-in-situ Y can adjoin to a wh-phrase X occupying [Spec, CP] only if the trace of X c-commands Y at S-Structure. (preliminary formulation)
With this in mind, let us consider the following example: (29) *[CP [IP John likes who]]
Apart from the echo-question interpretation, (29) is ungrammatical (see, e.g., Chomsky (1986, 53)). Given the LF requirement (9), the derivation in which who moves to [Spec, CP] in LF must be excluded. Neither the ODC (17) nor the alternative (28) applies to this type of derivation; hence neither can exclude it. Following Chomsky (1986, 53), suppose we assume that a grammar like English incorporates the positive value of the following parameter (whereas the grammar of, say, Chinese, incorporates the negative value):8 (30) In the LF component, wh-in-situ must adjoin to a wh-phrase in [Spec, CP] (LF substitution into empty [Spec, CP] is prohibited).
This parameter is in turn arguably derivable from parameterizing the morphological features of C0 (as proposed in Chomsky (1993)); that is, in English, [+wh] C0 has strong N-features. Thus, when a [+wh] C0 is generated, a wh-phrase must move to [Spec, CP] prior to Spell-Out, so that the strong [+wh] feature (an illegitimate PF object) can be checked and hence deleted. Thus, in (29) (assuming the matrix C0 is [+wh])
the postponement of wh-movement until LF entails that the strong N-feature of C0 is not checked prior to Spell-Out and therefore remains at PF, at which level it is illegitimate. This derivation thus does not converge (i.e., it “crashes”) at PF. If, in (29), the matrix C0 is [−wh] (possible if matrix (hence lexically unselected) C0 is freely assigned [+/−wh], as in L&S (1992)), then there is simply no position in which who can have its [+wh] feature checked, since only [+wh] C0 can check a [+wh] feature. This unchecked feature therefore remains throughout the derivation. Such unchecked [wh] features borne by wh-in-situ must be legitimate at PF (if they were not, what in (1a), and more generally all instances of wh-in-situ, would induce nonconvergence at PF). I assume then that these features are illegitimate LF objects; hence, wh-in-situ must move to the checking domain of a [+wh] C0 in LF.9 Summarizing this analysis, I assume: (31) a. Universal: Wh-phrases bear a [+wh] feature that can be checked only in the checking domain of a [+wh] C0. b. Universal: An unchecked [+wh] feature borne by a wh-phrase is an illegitimate LF object. c. Parameter: [+wh] C0=strong vs. weak (value in English: strong)
Under this analysis, there is no convergent derivation that includes the S-Structure representation (29). By contrast, (31) alone fails to block superiority violations such as (1b), where C0 checks the [+wh] feature on what prior to Spell-Out and the [+wh] feature on who in LF. I therefore incorporate the movement constraint (28), which restricts the class of configurations in which a wh-in-situ can adjoin to a wh-element in [Spec, CP] (required for convergence at LF). In the configuration displayed by (1b), for instance, (28) prevents LF movement of wh-in-situ and therefore precludes convergence at LF; that is, the LF-illegitimate [+wh] feature borne by who-in-situ remains unchecked at LF. Regarding (28), notice also that if the trace of X c-commands Y at S-Structure (or Spell-Out), then X itself (=the wh-phrase in [Spec, CP]) will also c-command Y at S-Structure, given that the Proper Binding Condition on traces applies at this level.10 This then suggests (see below) that (28) can be redefined as (32a), with the definition of chain c-command given in (32b). (32) a. In the LF component, a wh-in-situ can adjoin to a wh-phrase X occupying [Spec, CP] only if the chain of X c-commands Y at S-Structure. (preliminary formulation) b. A chain C c-commands a position P iff every member of C c-commands P.
The constraint (32a) amounts to the arguably natural condition that a c-commanding wh-chain is a scope marker for wh-in-situ. However, with respect to (32a), one might ask why adjunction of a wh-phrase in situ Y to the wh-phrase X should depend on the configurational relation of the chain of X (as opposed to the wh-phrase X itself) to Y. The relevance of the configurational relation between the chain of X, on the one hand, and Y itself, on the other (and the irrelevance of the X/Y relation itself) is expected under the following provisional proposal: (33) In the LF component wh-in-situ is not adjoining to a wh-phrase (=part of a chain) occupying [Spec, CP]; rather wh-in-situ is adjoining to the wh-chain headed by the wh-phrase in [Spec, CP].
It just happens that adjunction to a chain head and adjunction to a chain yield identical configurations. As intended, (33) allows us to restate (32a) as the more natural Scope-Marking Condition.
(34) Scope-Marking Condition (SMC)
In the LF component, a wh-in-situ Y can adjoin to a wh-chain X only if X c-commanded Y at S-Structure. The SMC amounts to the (simple) assumptions stated in (35).11 (35) a. Only a c-commanding wh-chain is a scope marker for wh-in-situ. b. Scope marking is determined at S-Structure.
(Regarding (35b), see fn. 17.) From here on, I will refer to (33) and (34) together as the SMC analysis. The classic cases of superiority (typified by (1b)), the problematic cases provided by L&S (typified by (7)), and the “pure” cases (typified by (15)) are all accounted for by the SMC analysis, suggesting that the arguably less natural ODC is eliminable.12 Before I conclude this section, I should address one question in particular regarding the reduction of the ODC to the SMC. Notice that the chain-based SMC says that in order for a wh-in-situ Y to adjoin to a wh-phrase (chain) X in LF every member of the chain X must c-command Y at S-Structure. By contrast, the ODC, which the SMC is intended to supplant, entails only the following: in order for a wh-in-situ Y to adjoin to a wh-phrase X in LF, it is sufficient that merely a single trace of X c-commands Y at S-Structure. This is sufficient because the presence of such a single trace of X is enough to prevent local Ā-binding under hypothetical indexing. In this respect, the ODC and the SMC would appear to diverge. One relevant type of case to consider in this regard is the grammatical (36). (36) [CP whoi [IP t′i seems to whom [IP ti to be intelligent]]]
Clearly, the ODC makes the right prediction. Who is not O-disjoint from whom; consequently whom can adjoin to who in LF and these two operators can undergo Absorption, as the ODC analysis requires. In contrast to the ODC, it appears that the SMC analysis (wrongly) predicts that (36) is ungrammatical. Since the three-membered chain of who, (whoi, t′i, ti), does not c-command whom at S-Structure (i.e., it is not the case that every member of the chain c-commands whom), it appears as though this chain is not a scope marker for whom. Consequently, in LF either whom does not adjoin to this chain, resulting in nonconvergence at LF, or whom does adjoin to the chain, violating the SMC (34). The problem is only apparent, however. There is, in fact, no chain of the form (whoi, t′i, ti) in (36). Following the analysis presented by Chomsky and Lasnik (1993), I assume that syntactic movement of who results in the appearance of two chains in structures like (36), namely, those in (37). (37) C1=(whoi, t′i) C2=(t′i, ti)
Chain C1 is an operator-variable (wh-) chain; chain C2 is an argument chain. (These two chains are said to be “linked” at the position t′i.) Given that (36) exhibits the chain structure (37), the SMC correctly predicts that in the LF component whom can indeed adjoin to the wh-chain C1. This is a licit operation because, at S-Structure, the wh-chain C1 does indeed c-command whom. Thus, Chomsky and Lasnik’s (1993) analysis of chain-structure allows us to maintain the SMC (in the face of examples like (36)) precisely because the SMC is formulated in terms of the arguably natural notion “c-commanding chain”.13 In the following section I investigate certain other aspects of the SMC analysis.14
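The chain-based requirement just discussed (every member of a chain must c-command the wh-in-situ) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the text: the bracketings of (36) and (1b) are simplified toy trees, c-command is approximated as "the parent of X dominates Y and neither X nor Y dominates the other" (adequate here because every non-terminal in the toy trees branches), and the names `Node`, `c_commands`, and `chain_c_commands` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Link each child back to its mother node.
        for child in self.children:
            child.parent = self


def dominates(x: Node, y: Node) -> bool:
    """True if x properly dominates y (x is an ancestor of y)."""
    n = y.parent
    while n is not None:
        if n is x:
            return True
        n = n.parent
    return False


def c_commands(x: Node, y: Node) -> bool:
    """Simplified c-command (an assumption of this sketch, see lead-in)."""
    if x is y or dominates(x, y) or dominates(y, x) or x.parent is None:
        return False
    return dominates(x.parent, y)


def chain_c_commands(chain: List[Node], y: Node) -> bool:
    """A chain c-commands y only if every member of the chain does."""
    return all(c_commands(member, y) for member in chain)


# (36): [CP who_i [IP t'_i seems to whom [IP t_i to be intelligent]]]
t = Node("t_i")
whom = Node("whom")
vp = Node("VP", [Node("seems"), Node("PP", [Node("to"), whom]),
                 Node("IP", [t, Node("to be intelligent")])])
t_prime = Node("t'_i")
who = Node("who_i")
cp_36 = Node("CP", [who, Node("IP", [t_prime, vp])])

three_membered = chain_c_commands([who, t_prime, t], whom)  # hypothetical chain
c1 = chain_c_commands([who, t_prime], whom)                 # wh-chain C1 of (37)

# (1b): [CP what_j did [IP who buy t_j]]
t_j = Node("t_j")
who_in_situ = Node("who")
what = Node("what_j")
cp_1b = Node("CP", [what, Node("C'", [Node("did"),
                    Node("IP", [who_in_situ, Node("VP", [Node("buy"), t_j])])])])

superiority = chain_c_commands([what, t_j], who_in_situ)

print(three_membered, c1, superiority)
```

Under these assumptions, the hypothetical three-membered chain fails to c-command whom, the wh-chain C1 succeeds, and the chain (whatj, tj) in (1b) fails to c-command who, matching the predictions described in the text.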
4 The Scope-Marking Condition
Consider the following example: (38) * Why did Bill buy what?
Although L&S (1984) judge (38) grammatical, Huang (1982a, 557) judges it less than perfect, and Lee (1993, 179) judges examples like it “…grammatical, or, at worst, slightly marginal.” Like Huang, I (and those I have informally polled) find such cases ungrammatical (i.e., worse than the grammatical Why did Bill buy a house?), but not strongly so. For example, ECP violations, such as (39), seem worse (as Huang notes). (39) ** What did Bill buy why?
The contrast between (38) and (39) follows from the ECP. The LF representation of (38) is (40). (40) [CP whatj whyi [Ci (did)] [IP Bill buy tj ti]]
The trace of why (ti) is antecedent-governed (16c) by the X0 C that (as a result of specifier-head indexing) binds ti, which is in turn subjacent to C. The trace tj is lexically properly governed. Thus, (38) satisfies the ECP. By contrast, (39), the LF representation of which is given in (41), violates the ECP since ti is neither antecedent-governed nor lexically properly governed. (41) [CP whyi whatj [Cj (did)] [IP Bill buy tj ti]]
Thus, (39) violates the ECP. Although the ECP predicts a contrast between (38) and (39), it leaves the ungrammaticality of (38) unexplained. Regarding degree of ungrammaticality, (38) seems in fact on a par with (1b), the classic case of superiority, suggesting that we should pursue a unified analysis of these two types of examples. Further motivation for such unification is provided by the following facts. Recall from section 1 that when the classic case of superiority (1b) is (in effect) embedded, as in (7a), the result is grammatical and, moreover, the wh-in-situ requires a wide-scope reading. (1) b.* What did who buy? (7) a. Who wonders what who bought?
Example (38) displays exactly the same properties. When it is (in effect) embedded, as in (42), the result is grammatical and, furthermore, the wh-in-situ requires wide scope interpretation (as evidenced by (43)). (42) Who wonders why Bill bought what? (43) a. #Fred wonders why Bill bought what. b. Fred wonders why Bill bought a car.
Given this similarity, it seems reasonable to seek a unified analysis of (1b) and (38).15
Can the SMC analysis provide a unified account? To begin, recall the SMC analysis of the S-Structure representation (1b). (1) b.* [Whatj did [who buy tj]]
The SMC analysis predicts ungrammaticality: the wh-chain (whatj, tj) is not a scope marker for who since it is not the case that every member of the chain c-commands who. Consider now the S-Structure representation of (38). (44) [CP whyj did [IP Bill buy what tj]]
The SMC analysis could exclude (44) exactly as it excludes (1b) if we could maintain that, at S-Structure, tj fails to c-command the wh-in-situ what and therefore the chain (whyj, tj) is not a scope marker for this wh-in-situ. However, it seems implausible that tj fails to c-command what. Whatever the exact location of the adjunct trace, it presumably c-commands the direct object what.16 In particular, if tj is a sister to the V or higher, it will c-command what. Thus, the chain (whyj, tj) presumably does c-command what at S-Structure and therefore is indeed a scope marker for it. In this way, the SMC allows what to adjoin to this chain in LF and, incorrectly, no principles are violated. Hence, the SMC analysis apparently fails to account both for the deviance of (38) and for its presumed similarity to (1b). Notice however that the SMC analysis would yield exactly the right result if the S-Structure representation of (38) were not (44), but (45), in which no adjunct trace is present. (45) [CP whyj did [IP Bill buy what]]
Just like the wh-in-situ in (1b), the wh-in-situ in (45) has no scope marker since “only a c-commanding wh-chain is a scope marker for wh-in-situ” (see (35a)). In (45) why is not a wh-chain c-commanding what (since there is no variable). Thus, there is no wh-chain c-commanding what at S-Structure, with the result that the SMC disallows the required LF-movement.17 Given the S-Structure representation (45), we obtain the unified analysis with (1b). But is (45), containing no wh-adjunct trace, in fact the S-Structure representation of (38)? It is, given the following independently motivated assumption (see Epstein (1987; 1991) and L&S (1992) for ways to deduce (46)): (46) S-Structure representations contain no traces of wh-adjuncts.
This assumption is independently motivated by (among other phenomena) the fact that (47) is better than the ECP violation (48) (see L&S (1984; 1992)).18 (47) * Whoi do you wonder [CP whether [IP John said [CP t′i [IP ti left]]]] (48) ** Whyi do you wonder [CP whether [IP John said [CP t′i [IP Sue left ti]]]]
Informally, the contrast in grammaticality indicates that “short” movement (i.e., movement into a “local” [Spec, CP]) followed by “island-hopping” movement is worse with an adjunct (48) than it is with a subject (47). To account for the contrast, L&S (1984; 1992, 52) formalize “proper government” (see (16a)) as feature assignment (49a–b) constrained by the stipulation in (49c).
(49) a. A properly governed trace is assigned [+γ]. b. A trace that is not properly governed is assigned [−γ]. c. Only an argument is assigned a γ-feature at S-Structure.
Under (49), (47) escapes the ECP as follows: at S-Structure, t′ (or, in conformity with (16), C0) assigns [+γ] to t. t′, being a nonargument, receives no γ-feature at S-Structure and subsequently deletes in LF.19 The ECP, formulated as the following filter, is satisfied (regardless of its level of application): (50) ECP *t [−γ]
An analogous (ECP-satisfying) derivation of (48) is blocked by the stipulation (49c); that is, in (48) the nonargument t cannot be assigned [+γ] at S-Structure (with subsequent deletion of t′ in LF). Rather, (49c) dictates that γ-assignment of the nonargument t in (48) cannot occur at S-Structure. Therefore, γ-assignment must be postponed until the LF level, assuming (51) (L&S (1984, 92)). (51) γ-assignment occurs at levels of representation only.
At the LF level, t′ must be present to assign a γ-feature to t. But t′ then violates the ECP: it is neither lexically properly governed (16c) nor antecedent-governed (11). Thus, (49c) yields the desired results: (47) escapes the ECP, but (48) violates it precisely because the nonargument t is not allowed to receive a γ-feature at S-Structure. But the question that arises with respect to (48) is, Why is the adjunct trace invisible to γ-assignment at S-Structure? One truly explanatory answer is, Because it is not present at S-Structure. In order to maintain such an explanatory answer, I will therefore assume that the descriptive generalization in (46) holds. To summarize, there is independent motivation concerning the ECP for assuming that traces of adjuncts are not present at S-Structure (46), the truth of which entails that (45) is indeed the S-Structure representation of (38). Given (45), the results we sought are obtained: the deviance of (38) is explained by the SMC analysis, and therefore a unified treatment of (38) and the classic superiority case (1b) is provided (i.e., (38) and (1b) receive identical analyses, as do (7a) and (42)).20 More generally, (46) entails (52). (52) At S-Structure, a wh-adjunct occupying [Spec, CP] is neither a chain nor a chain member; hence, it never functions as a scope marker for wh-in-situ.
Having provided independent support for the SMC, formulated as a movement constraint,21 I turn to the formal mechanism, LF I-raising, whereby long-distance LF subject movement, forced by the SMC (see (7)), can satisfy the ECP at LF. I will present evidence favoring an analysis in which I undergoes movement at LF over one postulating I-in-situ at this level. The remainder of the article is concerned with determining the precise properties of such LF I-movement.
5 I as a Proper Governor at LF
How can the LF representation (23), derived by long-distance LF subject movement, in effect “forced” by the SMC, satisfy the ECP? Recall that L&S (1992) propose that subject position (coindexed with I via specifier-head indexing) becomes antecedent-governed at LF by virtue of (the disjunctive) (27). (27) In the LF component, I can substitute into C or can adjoin to IP.
Thus, there are two well-formed LF representations associated with the S-Structure representation (7b), namely, (53) and (54). (53)
(54)
Given (27), there should be no ECP effects with subject wh-in-situ. But consider the (I believe previously unnoted) contrast between the following two cases of embedded topicalization: (55) [CP Whoi [IP ti thinks that [IP this problemj [IP John solved tj]]]] (56) * [CP Whoi [IP ti thinks that [IPB this problemj [IPA whok Ik solved tj]]]]
Following Baltin (1982) and L&S, I assume, as indicated in (55) and (56), that (57) Topicalization is S- (=IP-) adjunction.
Regarding the contrast in (55) and (56), I should first note that some speakers find embedded topicalization, as in (55), somewhat marginal. Nonetheless, (56) seems to be significantly less grammatical. Importantly, the ungrammatical (56) differs only “minimally” from L&S’s grammatical example (7). In (7) the object undergoes syntactic wh-movement to [Spec, CP]. In (56), by contrast, the object undergoes topicalization (not wh-movement); the result is strongly ungrammatical and arguably violates the ECP.22 (Clearly, the SMC is not implicated: obligatory LF adjunction of whok to the c-commanding wh-chain headed by whoi is licensed.) How, then, can the ill-formedness of (56) be explained? I will (tentatively) adopt the LF I-movement analysis postulated by L&S, but I will argue that their disjunctive analysis (27) must be reformulated in order to account for (56).23
First, with respect to (56) itself, I assume (with L&S and contra May (1985)) that each of the two IPs in this representation, IPA and IPB, is a distinct maximal projection. In other words, I assume (58). (58) Adjunction to an Xmax yields two distinct maximal projections, as opposed to a single segmented maximal projection.
Thus, the more precise representation of (56) is (59). (59)
Evidence for the existence of two separate maximal IPs in topicalized structures is provided by the following contrast adapted from L&S (1992, 96): (60) a. [CP whoi [IP ti likes this book]] b.* [CP whoi [IP this bookj [IP ti likes tj]]]
Under May’s (1985) segmentation theory of adjunction, the “distance between” ti and its antecedent (whoi) is identical in (60a) and (60b); that is, in each case there is exactly one maximal projection, IP, dominating ti and excluding whoi. The grammaticality contrast is thus (arguably) unexplained. By contrast, under a (standard) duplication theory of (Chomsky-) adjunction, there is only one IP “separating” ti from whoi in (60a), whereas there are two IPs separating ti and whoi in (60b). Exploiting precisely this distinction provided by the standard, duplication theory of adjunction they adopt, L&S analyze (60b) as a violation of (Subjacency and) the ECP. As in the standard theory of (English) bounding nodes, their analysis entails that two (distinct maximal projection) IPs form a barrier for Subjacency (and for antecedent government). Thus, the “island-forming” property of topicalization is accounted for. The relevant definitions from L&S (1992, 74, 87, 102, 183) are as follows (supplementing those cited in (11) and (16)): (61) B is subjacent to A if for every C, C a barrier for B, C m-commands A. (62) A m-commands B iff the minimal maximal projection dominating A dominates B. (See Aoun and Sportiche (1983).) (63) C is a barrier for B if a. C is a maximal projection, and b. C is not an Ā-binder, and c. C is not L-marked, and d. C dominates B. (See also Chomsky (1986).) (64) A L-marks B iff A is a lexical category that θ-governs B. (See Chomsky (1986).) (65) A θ-governs B iff A is an X0 that θ-marks B, and A and B are sisters. (See Chomsky (1986).)
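The definitions in (61)–(63) can be made concrete with a small Python sketch applied to (60a) and (60b) under the duplication theory of adjunction. Several things here are assumptions of the sketch rather than claims of the text: the trees are simplified, L-marking and Ā-binder status are stipulated by hand as flags, and a barrier that itself dominates A is skipped when checking subjacency (the text considers only barriers that exclude the antecedent). The names `Node`, `barriers_for`, and `subjacent` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    maximal: bool = False       # is this node a maximal projection?
    l_marked: bool = False      # (64); stipulated by hand in this sketch
    a_bar_binder: bool = False  # (63b); stipulated by hand in this sketch
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def dominates(x: Node, y: Node) -> bool:
    """True if x properly dominates y."""
    n = y.parent
    while n is not None:
        if n is x:
            return True
        n = n.parent
    return False


def m_commands(a: Node, b: Node) -> bool:
    """(62): the minimal maximal projection dominating a dominates b."""
    n = a.parent
    while n is not None and not n.maximal:
        n = n.parent
    return n is not None and dominates(n, b)


def barriers_for(b: Node) -> List[Node]:
    """(63): maximal, not an A-bar binder, not L-marked, dominating b."""
    found = []
    n = b.parent
    while n is not None:
        if n.maximal and not n.a_bar_binder and not n.l_marked:
            found.append(n)
        n = n.parent
    return found


def subjacent(b: Node, a: Node) -> bool:
    """(61), skipping barriers that dominate a (assumption, see lead-in)."""
    return all(m_commands(c, a) for c in barriers_for(b) if not dominates(c, a))


# (60a): [CP who_i [IP t_i likes this book]]
t_a, who_a, c0_a = Node("t_i"), Node("who_i"), Node("C0")
ip_a = Node("IP", [t_a, Node("likes this book")], maximal=True)
cp_a = Node("CP", [who_a, Node("C'", [c0_a, ip_a])], maximal=True)

# (60b): [CP who_i [IP this book_j [IP t_i likes t_j]]], two distinct IPs
t_b, who_b, c0_b = Node("t_i"), Node("who_i"), Node("C0")
ip_low = Node("IP", [t_b, Node("likes t_j")], maximal=True)
ip_high = Node("IP", [Node("this book_j"), ip_low], maximal=True)
cp_b = Node("CP", [who_b, Node("C'", [c0_b, ip_high])], maximal=True)

ok_60a = subjacent(t_a, who_a) and subjacent(t_a, c0_a)
ok_60b = subjacent(t_b, who_b) or subjacent(t_b, c0_b)
print(ok_60a, ok_60b)
```

In this toy model the subject trace in (60a) is subjacent both to who and to C0, whereas in (60b) the lower IP is a barrier whose minimal dominating maximal projection (the upper IP) dominates neither who nor C0.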
Under this analysis, (60a) is predicted to be grammatical. By contrast, (60b) violates Subjacency and the ECP. The lower IP in (60b) is a barrier for ti (i.e., it is not L-marked, it dominates ti, it is not an Ā-binder, and, under the duplication theory of adjunction, it is a maximal projection in its own right). Since the lower
IPmax is a barrier for ti and it fails to m-command whoi (precisely because the upper IP is also a maximal projection, given “duplication-adjunction”), ti is not subjacent to who, violating Subjacency. Since this lower IP similarly fails to m-command C0 (which, under specifier-head indexing, binds ti), ti also fails to be antecedent-governed (=subjacent to a binding head) and thereby violates the ECP. With this analysis of adjunction, Subjacency, and ECP in hand, let us now return to (59). Recall, given (27), subject wh-in-situ should not give rise to ECP violations, yet (59) is, by hypothesis, precisely such an example. Under (27) and the duplication theory of adjunction, one possible LF representation of (59) is (66); that is, whok undergoes long-distance LF movement, and Ik adjoins to IPAmax, creating a second (duplicate) IPAmax. (66)
Later I will point out another logically possible LF representation of (59), but (as I will show) it also yields the wrong result; that is, like (66), it does not violate the ECP. Suppose, following L&S (all of whose evidence, I believe, concerns syntactic adjunction) that (67) is correct. (67) Syntactic adjunction to Xmax creates a second Xmax. As discussed, this accounts for the contrast between (60a) and (60b). At the same time it could well be the case, following May (1985) (all of whose evidence, I believe, instead concerns LF adjunction), that (68) holds.24 (68) LF adjunction to Xmax segments the Xmax; that is, it does not create a second Xmax. Thus, contra standard assumptions, I believe there is in fact no incompatibility between the theory of adjunction proposed by L&S and that proposed by May (1985). In other words, I tentatively suggest that the former (duplication) characterizes syntactic adjunction, whereas the latter (segmentation) characterizes LF adjunctions.25 Even if this “mixed theory of adjunction” is correct, it provides no solution to the problem of capturing (59) as an ECP violation (at LF). That is, even if LF I-adjunction segmented IPA (as opposed to duplicating it, as indicated in (66)), there would still be no violation. (69)
In (69), as in (66), tk is antecedent-governed—the wrong result. That is, in both (66) and (69) the minimal barrier dominating the subject tk is IPAmax. Since IPAmax (in both representations) m-commands Ik, tk is (by definition) subjacent to this binding head; equivalently, tk is antecedent-governed by Ik, thereby satisfying the ECP. How can (66) and (69) be blocked? A natural proposal is that Ik cannot be adjoined to IPA. I therefore propose that the constraint on adjunction shown in (70) (the relevant aspects of which will be deduced
below) prevents Ik from adjoining to IPA in (56). (The question of whether such adjunction (now prohibited) would segment or duplicate IPA disappears.) Hence, (66) and (69) are not generable. (70) Affect α, applying to the adjunction configuration …[XPB [XPA ]]… cannot adjoin a category to XPA.
Given (70), when the S-Structure representation (59) enters the LF component, LF I-adjunction to the maximal projection IPA is prohibited.26 Consequently, the ECP-satisfying LF representations (66) and (69) cannot be derived from (59). Rather, under (70), the “closest” that Ik can get to the subject trace (while binding it) is a position adjoined to the “outer” IP— namely, IPB, the IP (maximal projection) created by syntactic topicalization. In order to have the worst possible chance of succeeding in analyzing this as an ECP violation, let us assume the mixed theory of adjunction under which LF I-adjunction to IPBmax results in the segmentation of IPBmax (putting Ik closer to tk than it would be if each IPB were a distinct maximal projection, as would be the case under the duplication theory of adjunction applying at LF). This yields the following LF representation, derived by long-distance LF movement of whok and segmentation adjunction of Ik to IPBmax. (71)
As desired, (71) violates the ECP. The subject tk is not subjacent to (hence not antecedent-governed by) the binding head Ik. Recall the definitions given earlier, repeated in (72). (72) a. B is subjacent to A if for every C, C a barrier for B, C m-commands A. b. A m-commands B iff the minimal maximal projection dominating A dominates B.
For the subject trace tk to be antecedent-governed in (71), it must be subjacent to a binding head. The only head binding tk (under c-command) in (71) is Ik. In order for tk to be subjacent to Ik, every barrier for tk must m-command Ik. The maximal projection IPA is a barrier for tk. But this barrier does not m-command Ik: the minimal maximal projection dominating IPA is the bisegmental maximal projection IPB, but this bisegmental maximal projection IPB does not dominate Ik, given (73): (73) A is dominated by B only if it is dominated by every segment of B. (Chomsky (1986))
Thus, the subject tk is not antecedent-governed (nor is it lexically properly governed). The ECP is therefore violated, as desired, and we now have an account of the ungrammaticality of cases like (56).27
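The role played by (73) in this argument can be sketched in Python. The sketch models only the m-command step for the barrier IPA, comparing segmentation adjunction of Ik to IPB, as in (71), with segmentation adjunction of Ik to IPA, as in (69). The tree shapes are simplified reconstructions from the prose, since the diagrams themselves are not reproduced here, and the names `Node`, `category_dominates`, and `m_commands` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    category: Optional[str] = None  # maximal projection this node is a segment of
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def ancestors(n: Node) -> List[Node]:
    out = []
    while n.parent is not None:
        n = n.parent
        out.append(n)
    return out


def segments(root: Node, category: str) -> List[Node]:
    """All nodes in the tree that are segments of the given category."""
    found, stack = [], [root]
    while stack:
        n = stack.pop()
        if n.category == category:
            found.append(n)
        stack.extend(n.children)
    return found


def category_dominates(root: Node, category: str, x: Node) -> bool:
    """(73): x is dominated by a category only if every segment dominates x."""
    anc = ancestors(x)
    return all(any(seg is a for a in anc) for seg in segments(root, category))


def m_commands(root: Node, a: Node, b: Node) -> bool:
    """(62), with dominance computed over categories as in (73)."""
    for up in ancestors(a):
        if (up.category is not None and up.category != a.category
                and category_dominates(root, up.category, a)):
            return category_dominates(root, up.category, b)
    return False


# (71): I_k adjoined to IP_B, which is segmented: [IP_B I_k [IP_B topic [IP_A t_k ...]]]
t71, i71 = Node("t_k"), Node("I_k")
ipa71 = Node("IP_A", [t71, Node("solved t_j")], category="IP_A")
ipb71_low = Node("IP_B", [Node("this problem_j"), ipa71], category="IP_B")
root71 = Node("IP_B", [i71, ipb71_low], category="IP_B")

# (69): I_k adjoined to IP_A, which is segmented: [IP_B topic [IP_A I_k [IP_A t_k ...]]]
t69, i69 = Node("t_k"), Node("I_k")
ipa69_low = Node("IP_A", [t69, Node("solved t_j")], category="IP_A")
ipa69_high = Node("IP_A", [i69, ipa69_low], category="IP_A")
root69 = Node("IP_B", [Node("this problem_j"), ipa69_high], category="IP_B")

barrier_m_commands_71 = m_commands(root71, ipa71, i71)
barrier_m_commands_69 = m_commands(root69, ipa69_high, i69)
print(barrier_m_commands_71, barrier_m_commands_69)
```

Under these assumptions the barrier IPA fails to m-command Ik in the configuration of (71), because the lower segment of IPB does not dominate Ik, whereas it does m-command Ik in the configuration of (69), which is why adjunction to IPA has to be excluded independently by (70).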
6 Syntactic vs. LF I-Raising within Checking Theory
In the preceding sections I have investigated in detail some consequences of LF I-raising, in particular the resulting antecedent government of subject position by I at LF (L&S (1992)).28 Antecedent government of subject position by virtue of LF I-raising must somehow be restricted to the LF component since subject position is not in fact antecedent-governed at S-Structure, even when syntactic I-raising applies. This is shown in Rizzi (1990) by, for example, the ill-formedness of a matrix wh-question displaying both I-to-C movement and heavy NP shift of the subject.29 (74)
The ill-formedness of (74) is not predicted by the analysis assumed here; that is, (74) satisfies the ECP. Given S-Structure specifier-head indexing (see (8)), the subject trace and the I-trace are coindexed. Consequently, I and the subject are coindexed. The subject trace is therefore antecedent-governed by Ii in C. How can it be that LF I-raising renders the subject position antecedent-governed but syntactic I-raising does not? In Chomsky’s (1991; 1993) checking theory framework, there is a crucial structural difference between syntactic I-raising in English, as in (74), and LF I-raising. (The latter, I will argue, is in fact covert verb-second.) I believe this independently motivated structural difference between the syntax and LF explains, in a principled way, the contrast in antecedent government.
6.1 Background Assumptions: Checking Theory
First, regarding the category I have been calling “I,” Chomsky (1993, 7) (elaborating proposals in Pollock (1989)) proposes the clausal structure (75), in which there exist at least three inflectional heads: AgrS, AgrO, and T(ense). (75) [CP C [AgrSP AgrS [TP T [AgrOP AgrO [VP V]]]]]
Regarding the movement of these three inflectional heads, Chomsky makes the (nonstandard) assumption in (76). (76) There is no affix lowering (i.e., Agr-lowering or T-lowering) onto the verb.
Rather, the verb exits the lexicon fully inflected, bearing agreement and tense features. However, these features must be checked. An inflectional feature of a verb (e.g., the verb’s tense feature) can be checked only if the verb becomes a member of the checking domain of T, which it does if it adjoins to T.30 Thus, agreement and tense checking is achieved in structure (75) by adjoining the verb to the inflectional heads, AgrO, AgrS, and T (rendering the verb a member of the checking domain of each), that is, by “V-raising.” There is, of course, parametric variation concerning the level at which V-raising (checking) occurs. For
example, in French the verb raises (i.e., inflectional checking is performed) in the syntactic component, whereas in English this happens at LF.31 This picture is however complicated by the fact that, in English, certain verbs raise in the syntax (finite have/be) whereas main verbs raise in LF. Why should this be? Chomsky (1993, 24) assumes the following: (77) In general, English performs V-raising in LF; that is, inflectional features are checked at LF.
Why then do have/be exhibit the exceptional property of undergoing syntactic raising? Following a long tradition within which it is assumed that the “deficient” semantics of these verbs is responsible for their undergoing syntactic movement (see, e.g., Pollock (1989), Chomsky (1993), and the references cited there), Chomsky (1993, 31) assumes LF Movement Visibility (78). (78) LF Movement Visibility In the LF component, semantically deficient (underspecified) categories are not visible to Move α; hence, they cannot be moved by Move α.
Thus, finite have/be cannot be moved in the LF component. That these verbs raise in the syntax now follows from (78) and the above assumption that checking of the verb’s inflectional features is obligatory and can be achieved only by raising the inflected verb and adjoining it to the functional inflectional head. But this raises one other question. Why is checking obligatory? Chomsky (1993, 27) proposes the following explanation. Full Interpretation (FI)/economy of representation requires (naturally) that every symbol in an LF representation have an interpretation. Agr and (I will assume) T are semantically vacuous. They therefore must delete in order for a legitimate LF representation to be generated.32 Suppose now that (79) obtains. (79) Functional categories can be deleted only after performing their checking.
It now follows that checking (achieved only in the configuration generated by V-raising) must occur at some point prior to LF; that is, raising/checking is obligatory. Notice, however, that (79) leaves unspecified the precise point in a convergent derivation at which a functional category (e.g., AgrS, AgrO, T) checks and deletes. (This will be crucial below.) Regarding this issue, Chomsky in effect assumes that economy of representation holds at intermediate levels as well, entailing that (79) must be reformulated as (80). (80) Functional categories can be deleted only after performing checking, and they must delete immediately after performing checking (since they have no function other than checking).
6.2 A Difference between Syntactic V-Raising and LF V-Raising
One of the central contrasts Chomsky (1991, section 3; 1993, 30) seeks to explain is that syntactic V-raising can violate the Head Movement Constraint (HMC) but LF V-raising cannot. This is illustrated by the following S-Structure representations: (81) [AgrSP John hasi…[NegP not ti seen Mary]] (82) [AgrSP John ____…[NegP not see+s Mary]]
In (81) syntactic movement of has violates the HMC; that is, movement over the intervening head of NegP (not) has occurred. But the ECP is (somehow) satisfied. In (82) the (lexically inflected) verb see+s must raise in LF in order to have its inflectional features checked and thereby effect the deletion of the functional heads, as required by FI/economy of representation. For some reason, this LF movement, which would seem to correspond exactly to the licit syntactic movement in (81), violates the ECP. How is the contrast between syntactic and LF V-raising to be explained? Consider first the case of syntactic raising in (81). The D-Structure representation of (81) is (83).33 (83) [AgrSP John AgrS [TP T [NegP Neg [AgrOP AgrO [VP has [VP seen Mary]]]]]]
In order to have its inflectional features checked, has must raise in the syntax, since it is invisible to LF Move α (see (78)). First, it adjoins to AgrO creating [AgrO [V has]+AgrO]. Even if the agreement features in has are now checked by AgrO, AgrO does not delete at this point since it has not in fact completed all of its checking. (Chomsky assumes that AgrO is also involved in the checking of the accusative Case feature with which the NP Mary exits the lexicon. That is, he assumes that NPs exit the lexicon bearing structural Case (e.g., accusative), just as Vs exit the lexicon bearing inflectional features. In the LF component, Mary moves to [Spec, AgrO] (= “object shift”), in which position the category [AgrO V+AgrO] (or, more precisely, the trace of this category) can check accusative Case (in a specifier-head configuration).) Next, in order to check the tense features on has, the category [AgrO has+AgrO] must adjoin to T. But this violates the HMC because the head Negation (Neg0) intervenes. However, this movement is movement of AgrO; hence, it leaves an AgrO-trace in the head position of AgrOP. It is this trace that would appear to violate the ECP. However, recall that after LF object shift of Mary to [Spec, AgrO], AgrO (or, more precisely, the trace of AgrO) will check accusative Case on Mary, thereby completing its checking of both inflectional and Case features; it will consequently delete automatically. The offending trace is therefore absent from the LF representation, and the ECP (applying only to LF; see, e.g., Epstein (1987; 1991)) is therefore satisfied. Hence, checking theory explains why syntactic V-raising over negation is allowed. But then, why isn’t the same (ECP-satisfying) movement allowed in LF in (82)? (That is, why does LF movement in (82) violate the ECP?) I will now propose an analysis within which this corresponding movement is in fact precluded in LF by an independently motivated principle, with the result that the ECP is violated.
The articulated D-Structure representation of (82) is (84). (84)
In the LF component, see+s first adjoins to AgrO yielding [AgrO see+s+AgrO]. In this configuration the agreement features of see+s are checked by AgrO. We must now somehow prevent what happened in the well-formed syntactic derivation from happening in LF: namely, raising of [AgrO see+s+AgrO] to T (violating the HMC) and subsequent deletion (by virtue of object shift) of the offending AgrO-trace left by this movement. I propose that movement of [AgrO see+s+AgrO], although allowed in the syntax, is in fact prevented in LF by an independently motivated principle. Under LF Movement Visibility, the AgrO category [AgrO see+s+AgrO] cannot be moved. Since AgrO is a semantically vacuous category, Move α applying in LF cannot see it, hence cannot move it. If no further movement applies, FI/economy of
representation is violated since, without further raising, the functional heads T and AgrS cannot check the tense and agreement features in see+s and therefore cannot delete. These illegitimate LF objects would consequently be present in the LF representation, but FI/economy of representation disallows this. In other words, this derivation crashes at LF. There is one other derivation of the ungrammatical (82) that must be blocked. Suppose that, after [V see+s] adjoins to AgrO in LF, yielding the movement-invisible [AgrO see+s+AgrO], object shift of Mary to [Spec, AgrO] occurs. Given this, AgrO (to which the V is adjoined) can now mediate checking of accusative Case on Mary. Since AgrO has now checked both the (verbal) agreement features in see+s and the (nominal) accusative Case on Mary, its checking is completed, and it therefore automatically deletes. I assume that deletion of AgrO results in the “excision repair” (sub)structure (85) in which AgrO is absent and V comes to be immediately dominated by AgrO′; that is, the deletion of AgrO, to which V was adjoined, puts V in the head of AgrOP (see Epstein (1993)). (85)
In general, I will assume (crucially) that if a head Z adjoins to a head Y, as in (86a), and Y subsequently deletes, then the derived constituent structure is (86b), in which Z occupies the head position of YP; that is, Z is immediately dominated by Y′ (just as it was prior to Y’s deletion).34 (86)
Crucially, (86b) is inconsistent with X-bar theory (Y′ fails to immediately dominate Y0), an important fact that I will take up below. For the moment I return to (85), derived by (a) V-to-AgrO movement, (b) object shift of Mary, and (c) AgrO-deletion. Given AgrO-deletion, which has produced the (intermediate) representation in (85), the V, now in the head position of AgrOP, is a category that is indeed visible for movement by LF Move α. So that its tense features can be checked, the V now adjoins to T. Crucially, this movement of a V leaves a V-trace. (87) [AgrSP John…[see+s+T] [NegP not [AgrOP Mary [AgrO′ [tv…]]]]]
The V-trace, unlike an AgrO-trace, has semantic content (it is, or is part of, a legitimate LF object) and is therefore not deletable (Chomsky (1991)). The ECP is therefore violated and the ungrammaticality of cases like (82) is explained.35 The contrast between syntactic V-raising over negation (81) and LF V-raising over negation (82) is accounted for. Syntactic raising is, in fact, movement of the deletable functional category AgrO. By contrast, the independently motivated and natural principle of LF Movement Visibility dictates that LF raising must be raising of the nonfunctional (undeletable) lexical category V. In what follows, I will exploit this crucial distinction.36
6.3 An Answer
I am now in a position to answer the question with which this section began: How is it that LF I-raising renders the subject position antecedent-governed but syntactic I-raising does not? The analysis depends crucially on the independently motivated presence of a functional category in the syntax, and on its absence at LF. I will begin with the question of why syntactic I-to-C raising does not cause the subject to be antecedent-governed in (74). Recall that syntactic V-raising first adjoins the V to AgrO, yielding [AgrO [V has]+AgrO]. Recall also that further syntactic raising leaves an AgrO-trace in the head of AgrOP. This trace does not delete in the syntax, but only in LF. I assume that AgrO itself (to which the V has is adjoined) deletes only when the AgrO-trace is deleted in LF. The fact that AgrO itself is not deleted before LF and hence is not deleted in the syntax is deducible from the Proper Binding Condition, which, following L&S, I will assume applies to the output of each application of Affect α. That is, if AgrO itself deleted prior to LF, the AgrO-trace (occupying the head position of AgrOP) would be unbound, violating the Proper Binding Condition. Thus, since the AgrO-trace does not delete until LF, AgrO itself does not delete until LF.
It now follows that in the syntax the raised verb is still adjoined to AgrO. Now, I also assume that after syntactic adjunction of V to AgrO, the resulting [AgrO [V has]+AgrO] next adjoins to AgrS. As in Chomsky (1993), AgrS has T adjoined to it in the syntax. This happens because T (having strong N-features) must check Case on an NP in the syntax and can do so only if adjoined to AgrS.37 Thus, in the syntax, T adjoins to AgrS and then checks the Case on an NP occupying [Spec, AgrS].38 That is, under the VP-Internal Subject Hypothesis, the subject must raise in the syntax to [Spec, AgrS] so that T can do what it must, namely, check Case on an NP in [Spec, AgrS] in the syntactic component. Thus, in the part of the derivation of (74) shown in (88), [AgrO [V has]+ AgrO], occupying the head of AgrOP, now adjoins to AgrS (which has T adjoined to it). (88)
a. [V has]-to-AgrO movement b. [AgrO V+AgrO]-to-AgrS movement c. T-to-AgrS movement d. heavy NP shift
Notice that movement b. over the trace of T violates HMC. However, the ECP (a constraint on LF representations) will not be violated, since the offending trace of AgrO deletes in LF. Finally, I-to-C (i.e., AgrS-to-C) movement applies to (88), moving the entire AgrS-complex to C.39 Movement of the AgrS-complex to C in (88) yields the S-Structure representation in (89). (89)
a. AgrS-to-C movement b. heavy NP shift (repeated here)
Notice that, as a result of S-Structure specifier-head coindexing, the subject trace of heavy NP shift and the trace of AgrS bear index i, and so does AgrS itself (occupying C). Assuming with L&S (1992) that subject position is indelibly γ-marked at S-Structure, the subject trace is assigned a γ-feature in (89). As usual, the head of AgrSP (= the trace of AgrS) does not properly govern its specifier. But AgrS in C does antecedent-govern the subject; that is, the subject trace is indeed subjacent to the binding head AgrS in C. Therefore, the subject trace is assigned [+γ] and the ECP is satisfied. This, of course, is the wrong result. Suppose then that we supplement L&S’s definition of antecedent government (repeated in (90a-c)) with Rizzi’s (1990) condition on proper head government (90d), a condition Rizzi motivated precisely to prevent I-to-C from causing a subject to be properly head-governed in English.40
(90) Antecedent government
A antecedent-governs B iff a. A=X0, and b. A binds B, and c. B is subjacent to A, and d. A and B are both dominated by the single-bar projection of A.
Given (90d), AgrS in C is now prevented from antecedent-governing the subject trace. Even though the subject trace is indeed subjacent to the binding head AgrS, (90d) prevents antecedent-government since it is
not the case that both AgrS in C and the subject trace are dominated by AgrS′. The subject trace is therefore indelibly marked [−γ] in the S-Structure representation (89), and the ECP filter is therefore violated at LF. Thus, adopting (90d) gives exactly the result Rizzi intended: syntactic I-raising does not render subject position properly governed. In fact, this analysis, including (89) with AgrS in C, is for all intents and purposes identical to the I-to-C analysis Rizzi assumes. This brings us to the final question: Why is it that “I”-raising in LF does yield antecedent government of the subject, as for example in the derivation of (7a)? (7) a. Who wonders what who bought?
If we assume the VP-Internal Subject Hypothesis, the embedded CP of (7a) appears as follows, prior to movement: (91)
In the mapping to S-Structure, who raises to [Spec, AgrS], in which position obligatory syntactic checking of nominative Case by T adjoined to AgrS can take place. Moreover, what moves to [Spec, CP] so that obligatory syntactic checking of [+wh] by C can occur. Thus, the embedded CP of (7a) appears as (92) at S-Structure. (92)
(Notice, by specifier-head coindexing, the trace of who (ti in [Spec, VP]) and boughti are coindexed, and therefore who and bought are coindexed as well.) In the LF component, who must undergo long-distance movement, adjoining to the wh-phrase in the matrix [Spec, CP] and leaving a trace in the embedded [Spec, AgrSP]. LF V-raising (checking) also applies. Recall that, given LF Movement Visibility, functional categories cannot be moved in the LF component. This means that, in this case, each LF movement must be movement of a verb. LF movement therefore proceeds in the following order: (93) a. V adjoins to AgrO (the resulting AgrO category is now invisible to LF Move α). b. Object-shift (of the direct object trace of what, tj) to [Spec, AgrO]. c. AgrO checks both Case features on tj and V-features on bought. d. AgrO automatically deletes, leaving a (movement-visible) V in the head position of AgrOP. e. V adjoins to AgrS (“hopping over” the trace of T, in violation of the HMC).
Thus, in conformity with LF Movement Visibility, each movement is movement of a V (leaving a V-trace). The V-movement in (93), plus long-distance LF movement of the subject who, yields the intermediate representation (94) of the embedded CP in the LF component (irrelevant details omitted). Notice that LF movement of the verb bought from the head of AgrOP to AgrS-adjoined position (=b.) violates the HMC owing to the intervening trace of T. The ECP would appear to be violated because this LF movement (of a V) leaves a V-trace, which is not deletable. Thus, just like the LF movement of a V over negation, this LF movement of a V over T-trace “should” violate the ECP. But in fact the ECP, a constraint on LF representations, will be satisfied. This is so because T (and T-trace) is, as I have assumed, a functional checking category that undergoes deletion in the LF component; hence, the intervening T-trace, the presence of which induced the HMC violation, will be absent in the LF representation (see below). The ECP will then be satisfied. (Thus, there are at least two types of movement violating the Head Movement Constraint that can nevertheless satisfy the ECP: (a) the movement leaves a deletable trace (Chomsky (1991)) and (b) the intervening head (more generally, the barrier to movement) deletes. I will return to T-trace deletion, in particular the derived constituent structure, momentarily.) (94)
a. V-to-AgrO movement (LF) b. V-to-AgrS movement (LF) c. T-to-AgrS movement (syntax) d. long-distance movement of who (LF)
Now, suppose LF “I”-to-C (i.e., AgrS-to-C) movement applies to (94). Once again, under LF Movement Visibility, the functional category AgrS cannot be moved. Thus, it must check and delete. However, this deletion leaves both the V0 bought and T unattached, and it is not clear to me what the resulting structure is. I will simply assume (although I believe this is not crucial) that in (94) both AgrS and T check and delete, leaving only the V in the head of AgrSP. The V, a category visible for LF movement, is then moved to C.41 (95)
a. long-distance movement of who (repeated here)
b. V-to-C movement
If (95) were in fact the LF representation (of the embedded CP of (7a)), the ECP would be violated: the subject trace is not antecedent-governed. Although clauses (90a-c) of the definition of antecedent government are satisfied (i.e., the subject trace is subjacent to the binding head bought), clause (90d) is not satisfied since the subject trace is not subjacent to bought within V′. But (95) is not in fact the relevant LF representation. Given that C0 is a purely functional category in this structure, FI/economy of representation dictates that it too must delete before a legitimate LF representation is obtained (I will return to the question of whether C0-deletion (like Agr- and T-deletion) is a result of C0-checking). The deletion of C0 yields an intermediate representation in which [V bought] is immediately dominated by C′.
(96) …[CP whatj [C′ [Vi bought] [AgrSP ti ti [TP…]]]]
This is still not an LF representation. As a result of C0-deletion, X-bar theory is now violated since CP does not have a C head, but a V head. Similarly, the functional Xmax categories AgrSP and AgrOP are also headed by a verbal category (namely, a V-trace of bought), in violation of X-bar theory. (Recall, in LF, it was the movement-visible verb that moved from the head of AgrOP (to AgrS-adjoined position) and from the head of AgrSP (to C), leaving V-traces in the head of both AgrOP and AgrSP.) In order to produce an LF representation that is consistent with X-bar theory, the V-headed AgrOP, the V-headed AgrSP, and, most crucially, the V-headed CP must each become VP-projections. This yields the (still intermediate) representation (97) of the embedded CP (7a) in LF, in which all projections (except TP) have “become” VP-projections. (97)
a. VP-internal subject (=who) raising to [Spec, AgrSP] (syntax) b. wh-movement (syntax) c. V-to-AgrO movement (LF) d. object shift (of direct object trace of what) (LF) e. V-to-C movement (V2) (LF)
Thus (as a result of functional head deletion (rendering the projection V-headed) and requirements imposed by X-bar theory), all projections (except TP) are transformed into VP projections. But what about TP? Recall that LF V-movement over T never moved bought through (i.e., never adjoined it to) T0, so the head position of TP, unlike the head of the two Agr projections, is not verbal (i.e., not a V or V-trace). In fact, I assume the head of TP is, at this point in the derivation, e; that is, an empty head devoid of features. I make this assumption because T (adjoined to AgrS) has checked and deleted in the LF component. Under the Proper Binding Condition, if T deletes, its trace must too; otherwise, it would be unbound. (Alternatively, the T0 chain is the object that undergoes deletion.) I assume this deletion leaves e, a category devoid of features (Chomsky (1991)) in the head of TP. But there is now (arguably) an unwanted ECP violation in (97) since the V-trace in the head of what was AgrOP is not close enough to the V-trace in the head of what was AgrSP, because of the intervening e in the head of TP (assuming e “counts” as an intervening head). Over and above this possible ECP violation, the representation is not consistent with X-bar theory since T′ fails to immediately dominate a T head. Following Chomsky (1991), I assume that T-deletion, leaving e in the head of TP, entails that the TP projection must be transformed as follows in order to satisfy X-bar theory.
(98) [eP [e′ e]]
But now, if (as in Chomsky (1991)) eP has no features, then I assume (contra Chomsky) that FI/economy of representation prohibits it from appearing in an LF representation since it presumably has no interpretation. I suggest that a legitimate LF representation is derived by substituting the V-trace occupying the head position of (what was) AgrOP into the empty e (leaving a V-trace in the departure site).42 In accordance with X-bar theory, the eP, now bearing a verbal head, becomes VP, just like the other functional projections did.43 This yields the (fully interpretable, convergent and X-bar-theory-consistent) LF representation (99) of the embedded CP of (7a). Notice that, given the transformation of TP/eP to VP, the V-chain now unequivocally satisfies the ECP. Moreover, the central result we have been seeking is now obtained: the subject trace of long-distance LF movement of who (occupying the specifier position of the former AgrSP) is antecedent-governed in this LF representation. That is, not only is the subject trace subjacent to the binding verb bought (90a-c), but also, in conformity with (90d), the subject trace is subjacent to the binding verb bought, within the single-bar projection (the V′ projection) of bought.44
(99)
a. V-to-C movement (LF) b. long-distance movement of the subject who (LF) c. V-trace movement
We have now an answer to the question, Why does LF “I”-raising to C0 properly govern the subject position? In a nutshell, LF “I”-raising is in fact V-raising to C, that is, covert V2. This yields a (“category neutral”) LF representation like (99) in which all projections are (non-functional) VP projections, as (I have argued) is independently dictated by economy of representation at LF (prohibiting purely functional checking categories) and X-bar theory. Therefore, in LF, unlike in the syntax, clause (90d) of antecedent government is satisfied: the fronted verb does antecedent-govern the subject within the V′ projection of the verb.45 Consequently, long-distance LF subject movement (unlike long-distance syntactic subject movement) can satisfy the ECP. Under the analysis proposed here, English has V2 (albeit in LF), a process known to render subject position properly governed (see Torrego (1984) concerning Spanish and Rizzi (1990) concerning languages like German, for a proposal regarding proper government of subject position at S-Structure by a syntactically fronted verb). By contrast, syntactic “I”-raising in English does not render subject position properly governed. As in Rizzi (1990), I-to-C (here, AgrS-to-C movement) does not result in the subject’s being antecedent-governed. Since C and its projections remain present at S-Structure, the fronted AgrS in C does not bind the subject within AgrS′. Rather, the only single-bar projection dominating both the fronted AgrS and the subject is C′, and antecedent government therefore does not obtain at S-Structure. Thus, like Chomsky’s analysis of V-raising over negation, the analysis proposed here rests on the (independently motivated) presence of certain functional categories in the syntax and on their (principled) absence at LF. The significant aspects of the differences produced by syntactic “I”-to-C and LF “I”-to-C movement are illustrated in (100).
(100)
Under Rizzi’s (1990) analysis, in the S-Structure representation AgrS does not antecedent-govern the subject since it is not the case that they are both dominated by AgrS′. By contrast, in LF, a fronted V does indeed antecedent-govern the subject. This is a direct result of (a) functional-head deletion (demanded by FI/economy of representation), which yields a CP with a V head, and (b) X-bar theory, which requires the V-headed CP to be transformed into a VP. In fact, conforming to X-bar theory (but see section 7) requires each V-headed functional projection (produced by functional-head checking and deletion) to be transformed into a VP. This yields, at LF, “category neutral” VP-recursion structures, such as (99) (similar in certain respects to those Larson (1988) proposes for double-object constructions).46 In the following section I examine the arguably unnatural definition (90) of antecedent government, upon which I have been relying.
6.4 Toward the Elimination of Index-sensitive Antecedent (Head) Government Conditions
Consider again the index-sensitive (hybrid) definition of antecedent (head) government used thus far.
(90) Antecedent government
A antecedent-governs B iff
a. A=X0, and b. A binds B, and c. B is subjacent to A, and d. A and B are both dominated by the single-bar projection of A.
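Clause (90d) is a purely configurational condition, so it can be checked mechanically on a tree. The following is a minimal sketch, not part of the original analysis: the `Node` class, the function names, and the simplified trees (loosely modeled on the S-Structure representation (89) and the LF representation (99)) are all illustrative assumptions. In particular, the caller states which node counts as the single-bar projection of A. Clauses (90a-c) are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A phrase-structure node; `parent` is set when its mother is built."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def dominates(x: Node, y: Node) -> bool:
    """True iff x properly dominates y (x is an ancestor of y)."""
    p = y.parent
    while p is not None:
        if p is x:
            return True
        p = p.parent
    return False


def clause_90d(a: Node, b: Node, a_bar: Node) -> bool:
    """(90d): A and B are both dominated by the single-bar projection of A.
    Assumption: the caller supplies that projection as `a_bar`."""
    return dominates(a_bar, a) and dominates(a_bar, b)


# S-Structure, simplified from (89): AgrS has raised into C.
agrs, subj = Node("AgrS"), Node("t_i")
agrs_bar = Node("AgrS'", [Node("t_AgrS"), Node("TP")])
agrsp = Node("AgrSP", [subj, agrs_bar])
cp = Node("CP", [Node("Spec"), Node("C'", [Node("C", [agrs]), agrsp])])

# LF, simplified from (99): the functional projections have become VPs.
bought, subj_lf = Node("bought"), Node("t_i")
low_vp = Node("VP", [subj_lf, Node("V'", [Node("t_V"), Node("VP")])])
top_v_bar = Node("V'", [bought, low_vp])
vp = Node("VP", [Node("what"), top_v_bar])

print(clause_90d(agrs, subj, agrs_bar))        # False
print(clause_90d(bought, subj_lf, top_v_bar))  # True
```

On these toy trees the check fails at S-Structure, since neither AgrS in C nor the subject trace is dominated by AgrS′, and succeeds at LF, where the fronted verb and the subject trace both sit under the V′ projection of the verb.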
As noted above, (90b) and (90d) each serve to block proper government of a specifier by its head in the following configuration: (101) [XP Spec [X′ X YP]]
Proper government is blocked under (90b) because X does not c-command (see (26)) Spec, and under (90d) because Spec is not dominated by X′. The question I would like to address here is this: Can the index-sensitive head government condition (90) be reduced to a more natural condition (as dictated by the Minimalist Program (Chomsky (1993)))? Notice that Spec in (101) is a member of the checking domain of X0, but is not a member of the internal domain of X0. Given this observation, I propose the following sufficient condition for the satisfaction of the ECP:
(102) Proper Government
t is properly governed (i.e., satisfies the ECP) if t is a member of the internal domain of an X0 chain.
As desired, (102) does not predict that Spec in (101) is properly governed. Moreover, this definition allows us to unify three seemingly distinct cases of proper government: (103) a. Direct object (V-sister) complements b. The (non-V-sister) “inner subject” in VP shell configurations—for example, the book in
c. [Spec, AgrSP] (=“subject position”) after V2 applies (e.g., ti in the LF representation (100b)) [+γ]
Given (102), cases (103b) and (103c) are equated under the proposed analysis; that is, the LF representation (100b) is identical in the relevant respects to one derived by V-raising (in a VP-shell). That is, both V-substitution and V-adjunction to a checking X0 with subsequent deletion of the checking X0 (103c) are processes whereby a specifier becomes a member of the internal domain of an X0 chain and as such is nondistinct from a direct object complement (103a)—the position-type canonically satisfying the ECP. But even though (102) equates (103b) and (103c) while reducing each to (103a), it confronts a potentially serious problem. (102) would seem to be overly permissive in that it apparently incorrectly predicts proper government in (89)/(100a): the ti subject (=[Spec, AgrSP], trace of heavy NP shift) is apparently a member of the internal domain of the AgrS chain. This incorrect result would obtain even if AgrS were instead adjoined to C0. Depending on whether such AgrS-movement is substitution or adjunction to C0, there is a possible way of preventing proper government from obtaining in (89)/(100a). Recall that the internal domain of the AgrS chain is the minimal subset of the domain of the AgrS chain reflexively dominated by the complement of α1 (where α1 is the head of the AgrS chain, itself). The question, then, is this: In (100a) (=
(89)), or in the corresponding structure in which AgrS is instead adjoined to C0, what is the complement of AgrS? I suggest the two following alternative definitions of complement:
(104) β is the complement of α0 iff the set of categories dominating β is equivalent to the set of categories dominating α0.
(105) β is the complement of α0 iff the set of categories containing β is equivalent to the set of categories containing α0.
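Definition (104) compares two sets of dominating categories, which makes it easy to test on small trees. The sketch below is illustrative only: the `Node` class, the helper names, and the two toy structures are assumptions, as is the requirement that β be distinct from α0. Only (104) is modeled; (105), which refers to containment rather than dominance, would additionally require a segment-based representation of adjunction.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(eq=False)
class Node:
    """A phrase-structure node; `parent` is set when its mother is built."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def dominators(n: Node) -> FrozenSet[int]:
    """Identities of all categories properly dominating n."""
    found = set()
    p = n.parent
    while p is not None:
        found.add(id(p))
        p = p.parent
    return frozenset(found)


def is_complement_104(beta: Node, alpha: Node) -> bool:
    """(104): beta is the complement of alpha iff both are dominated by
    exactly the same categories. Assumption: beta must differ from alpha."""
    return beta is not alpha and dominators(beta) == dominators(alpha)


# Base configuration, as in (101): [XP Spec [X' X YP]]
x, yp, spec = Node("X"), Node("YP"), Node("Spec")
xp = Node("XP", [spec, Node("X'", [x, yp])])

# AgrS substituted into C, simplified from (89)/(100a):
# [CP Spec [C' [C AgrS] AgrSP]]
agrs, agrsp = Node("AgrS"), Node("AgrSP")
cp = Node("CP", [Node("Spec"), Node("C'", [Node("C", [agrs]), agrsp])])

print(is_complement_104(yp, x))        # True
print(is_complement_104(spec, x))      # False
print(is_complement_104(agrsp, agrs))  # False
```

In the substitution structure C dominates AgrS but not AgrSP, so the two dominator sets differ and AgrSP does not qualify as the complement of AgrS.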
If AgrS is substituted into C0 (100a)/(89), then under either (104) or (105), AgrS has no complement whatsoever. (Crucially, AgrSP is not the complement of AgrS, since C dominates AgrS but not AgrSP.) Consequently, the internal domain of the AgrS chain is the null set. As desired, [Spec, AgrSP] (=the trace of heavy NP Shift in (89)) is therefore not a member of the internal domain of the AgrS chain and is therefore not properly governed under (102). (If AgrS is adjoined to C, and if we assume, contra May (1985), that adjunction to C duplicates C (as in L&S (1992)), then, once again, either (104) or (105) suffices to block proper government of [Spec, AgrSP] under (102).) If, on the other hand, AgrS is adjoined to C and May’s segmental theory of adjunction obtains (as in Chomsky (1993)), then only (105) (i.e., not (104)) suffices to block proper government under (102). Given (105), AgrSP is not a complement of AgrS (C is a container of AgrS but is not a container of AgrSP). In fact, the internal domain of the AgrS chain is (again) the null set. Therefore, as desired, [Spec, AgrSP] is not a member of the internal domain of the AgrS chain and is not properly governed under (102).47 To summarize this section: I have sought to eliminate the index-sensitive antecedent (head) government condition (90) and have suggested that membership in the internal domain of an X0 chain is sufficient for (and in fact represents the canonical case of) satisfaction of the ECP. This analysis unifies the three cases in (103) and incorporates only independently motivated primitives. Moreover, it makes no appeal to indices, “θ-marking by an X0,” or “head government.”48
7 The Bare Phrase Structure Theory
The purpose of this section is to suggest that the analysis proposed here (which in one respect depends upon a certain aspect of (stipulated) X-bar theory) is entirely consistent with (if not entailed by) the theory of bare phrase structure (BPS) outlined in Chomsky (1994).
I have argued that when a V adjoins to a functional checking head (which heads a functional projection), the checking head checks and deletes, leaving the V, immediately dominated by the single-bar projection of the (deleted) checking head (as it in fact was before the checking head deleted). For example, I have postulated that for (7a), V-raising to AgrS and then to C, with checking and deletion of AgrS and C, yields the intermediate structure (106) in the LF component.
(106)
(In (106), (P) and (′) appearing in category labels are parenthesized since, within BPS, such notations are eliminated given that the phrase structure status (minimal projection, maximal projection, minimal and maximal projection, neither minimal nor maximal projection) is relationally determined (see Muysken (1982), Freidin (1992)).) In order to exclude such representations (at least at the interface levels, but perhaps altogether), I have claimed that such (postchecking and -deletion) structures, displaying a V-headed C projection and a V-headed AgrS projection, are “excluded by X-bar theory,” which Chomsky (1993) construes as a constraint on the output of the application of the generalized transformation GT. BPS, I believe, excludes these structures as well. Within BPS, the category labels C(P) and C(′), as well as AgrS(P) and AgrS(′), lack the parenthetical material (as noted) and are in fact identical to the head; in other words, “the head…is…the label…” (Chomsky (1994, 16)). Consequently, if the head (C or AgrS) checks and deletes, these “higher” projection labels must (automatically) delete since they are identical to the head, from which they are projected.49 Thus, within BPS too, structures like (106) cannot exist; in particular, if the C head checks and deletes, the identical C(′) and C(P) labels delete too. Similarly, checking and deletion of the AgrS head entails deletion of the identical AgrS(′) and AgrS(P) projection labels. But then, if these four projection labels are absent, what are the projection labels appearing in (106) as generated within BPS? Let us begin with AgrS(′). This label must be either the head V-copy trace or the head of (its sister) TP, but this too is identically V-copy trace. Thus, the label AgrS(′) must be V; there is no other possibility. Exactly the same pertains to the label C(′); this label, too, must be identical to the head of one of its two daughters, but again “the two are identically” V.
What about the maximal projection labels AgrS(P) and C(P)? Each of these labels must be V as well. This follows since (as Chomsky (1994) shows) in cases of substitution (in (106), specifier substitution), it is deducible that a moved category (in (106) who and what) can never project; hence the head of the N(P) sister in (106)—namely, the category V—must project. Thus AgrS(P) and C(P) are (like AgrS(′) and C(′)) labeled V in BPS. Thus, it appears that BPS (like stipulated X-bar theory) precludes structures like (106) and moreover forces their “transformation” into VP-recursion structures, just as I have proposed. There are, however, two respects in which my analysis might appear to be inconsistent with BPS. First, the VP-recursion structures produced in LF appear to be derived by a type of movement that Chomsky (1994) refers to as “self-attachment”. Although (as far as I can tell) Chomsky’s (1994, 19, 20) discussion of self-attachment provides no direct empirical support for prohibiting it, the empirical matter need not concern us here since my analysis does not in fact involve (Greed-violating) self-attaching movement; that is, the V was not in fact adjoined to a VP (labeled V) that it headed prior to movement, but rather was adjoined to Agr/C (i.e., a checking head), thereby satisfying Greed and thereby not representing a case of
self-attaching movement (although the derived VP-recursion structures look as if they were derived by self-attaching movement). As a result, Chomsky’s (1994) conceptual arguments against self-attachment (namely, the unambiguous deduction of target (not mover) projection for cases of movement) are unaffected by my analysis, which invokes only (familiar) V-adjunction to a checking head. The final respect in which my analysis might appear to be inconsistent with BPS is that I assume that the checking head literally deletes (as in Chomsky (1991; 1993)), whereas Chomsky (1994, 13, fns. 23, 24) suggests the possibility that checked functional heads might not delete but might instead be rendered “invisible” (noting (fn. 24) that “A question arises about implementation; deletion raising technical issues…”). There seem to me to be at least two potential problems with this alternative to literal deletion. First, conceptually, there is always a potential problem with explaining why and how it is that symbols are present in a representation yet the computational system and interface treat them exactly as if they were absent, a fact that could be readily (but not necessarily uniquely) explained if they were indeed absent, as is assumed here and in Chomsky (1991; 1993). Second, there is an empirical problem. If a checking head (e.g., AgrS) is merely rendered “invisible” (i.e., not deleted), then given that the projection labels AgrS(′) and AgrS(P) in, say, (106) are identical to the AgrS head in BPS, then they too, being identical, would remain but would be rendered (like the head itself) “invisible.” But this presumably is the wrong result, since (e.g.) 
the category labeled AgrS(P) in (106) is (I assume) in fact a maximal projection and is therefore visible both for computation and at the interface, as is captured under the literal deletion analysis within which this category is a (fully visible) maximal projection bearing the category label V.50 Beyond the investigation presented here, a more comprehensive determination of the precise formal properties and empirical content of checking and checking-induced deletion within a BPS-based Minimalist Program analysis awaits further research.
8 Summary
The central aspects of the analyses presented here can be summarized as follows:
• The Operator Disjointness Condition (L&S (1992)) can be replaced by the arguably more natural Scope-Marking Condition, the empirical adequacy of which relies on the theory of linked chains.
• A new theory of adjunction, the mixed theory of adjunction, perhaps obtains.
• Consistent with a leading idea concerning parametric variation, English generates covert verb-second.
• (Certain) LF representations/subtrees are “category-neutral” VP-recursion structures, lacking “functional” checking categories and their projections.
• Index-sensitive head government ECP requirements may be reducible to simpler, more natural conditions, given certain independently motivated principles of checking theory, and the derived constituent structures I have proposed for checking-induced deletions.
• The analysis of checking-induced deletion proposed here is argued to be consistent with, if not “forced by,” the bare phrase structure analysis proposed in Chomsky (1994).
Notes
* This article is a revised version of a manuscript completed in July 1992, which appeared as Epstein (1993). The analysis presented in section 4.1 first appeared in Epstein (1989b). For very insightful discussion of the analyses presented here, I thank Maggie Browning, Noam Chomsky, Bob Freidin, Günther Grewendorf, Howard Lasnik,
Geoff Poole, Daiko Takahashi, and Höskuldur Thráinsson. I am especially indebted to Erich Groat, Madelyn Kissock, Hisatsugu Kitahara, Elaine McNulty, Robyne Tiedeman, and Esther Torrego for their extensive and invaluable comments. I am also grateful to all the Harvard University students in Linguistics 213r (Spring 1993) who listened patiently and thoughtfully to the material presented here. Portions of this article were presented at the University of Massachusetts at Amherst in February 1993, the University of Connecticut at Storrs in October 1993, and Princeton University in December 1993, and I thank the audiences at these presentations, too, for their helpful comments. Finally, I am indebted to an anonymous LI reviewer for carefully reading the manuscript and providing insightful commentary.
1 In Epstein (1992) I deduce (10) from Chomsky’s (1991) principle of derivational economy. For other discussion of (10) see Aoun, Hornstein, and Sportiche (1981), Huang (1982b), Chomsky (1986), van Riemsdijk and Williams (1981), Lasnik and Saito (1984), Epstein (1987; 1991).
2 An LI reviewer correctly notes that (12b) is not just infelicitous, but also ungrammatical. This, too, will be predicted by the analysis presented below.
3 For other current analyses of superiority phenomena concerning data not analyzed here, see, among others Kitahara (1993a), Williams (1994, section 5.2.7), and Freidin (1995).
4 An LI reviewer asks whether there is any independent motivation for L&S’s assumption that Absorption is obligatory. L&S (1992, 189, fn. 22) attribute the obligatory wide-scope reading of what in (i) to the impossibility of Absorption (“…in approximately the sense of Higginbotham and May (1981)…”) between what and whether: (i) whoi ti wonders whether John bought what If Absorption in this sense were merely optional, the representation of the unavailable reading would presumably be generable. From L&S (1992, 189, fn. 
29), we can infer that L&S assume that, in the absence of Absorption, the LF representation of, say, (ii) (ii) Who bought what?
would contain a free variable, namely, the trace of what. Thus, Absorption would be, in effect, obligatory in that failure to apply it invariably yields ill-formedness.
5 For ease of exposition, I have suppressed C0 indexing in (22). Under specifier-head indexing the embedded CP in (22) appears as in (i). (i) …[CP whatj [C0j] [IP whok bought tj]] Hypothetical indexing changes whok to whoj. Even though C0j “intervenes,” whatj locally Ā-binds who provided that the second occurrence of bind in (20c) is defined as in (ii), (ii) X binds Y iff (a) X and Y are coindexed, and (b) X m-commands Y. where m-command (see Aoun and Sportiche (1983)) may be defined as in (iii). (iii) X m-commands Y iff the minimal maximal projection dominating X dominates Y. Another possibility is that heads are simply ignored in calculating local binding relations among maximal projections (contra Epstein (1991)). To an LI reviewer, the latter possibility appears preferable, since the former incorporates an m-command definition of binding (internal to the definition of local binding) that, as we shall see
momentarily, would coexist with a c-command definition of binding (internal to the definition of antecedent government), an arguably unattractive state of affairs.
6 Regarding the disjunction in (27), L&S note that LF I-substitution into C is not sufficient to generate (i), for example, given Quantifier Raising. (i) Everyone left. LF application of QR (May (1977; 1985)) and I-to-C movement yields the following representation: (ii) * [CP [ Ii [IP everyonei [IP ti left]]] The ECP is violated: two IPs “intervene” between the ti subject and Ii. Therefore, ti is not subjacent to Ii (see L&S (1992) and below), with the result that ti is not antecedent-governed by Ii given (11). In order to generate a well-formed LF representation of (i), L&S assume that, as one option, in addition to adjoining to C, I can adjoin to IP, yielding the LF representation in (iii), which satisfies the ECP. (iii) [CP [ ] [IP everyonei [IP Ii [IP ti tIi left]]]] I will return to this disjunction in condition (27) below, in particular the non-structure-preserving adjunction of I0 to IP, expressed in the second disjunct.
7 Regarding (27), the question arises why syntactic I-to-C movement does not render subject position properly governed at S-Structure. This difference between the syntax and LF is analyzed in section 6.
8 Chomsky writes, “In English…a wh-phrase in situ must move to specifier of CP [at LF]…and it may do so…only if this position is already occupied by a wh-phrase; wh-situ is ‘attracted’ by a wh-phrase in ‘scopal’ position.” This “minimalistic” analysis thus derives (9).
9 “Checking domain” is defined below.
10 For discussion of the Proper Binding Condition, see May (1977); for an attempt to derive it, see Collins (1994). L&S argue that the condition applies to the output of each application of Move α and therefore it does indeed apply to S-Structure.
11 The SMC (34) (and (33)) is arguably odd in that it refers, on the one hand, to a wh-phrase in situ Y and, on the other, to a wh-chain X. However, given that (a) a wh-phrase in situ is never either chain-medial or a chain-tail, but is rather a chain-head, and (b) chain links exhibit binding (in fact local binding; see Chomsky (1981), Rizzi (1986), Lasnik (1985), Epstein (1986)), it follows that if a wh-chain X c-commands a wh-phrase in situ Y, then X c-commands the chain of which Y is the head; that is, every member of the chain X c-commands every member of the chain of Y. Consequently, we could restate the SMC (34) as a purely chain-theoretic condition, eliminating reference to “…a wh (i.e., a wh-phrase) in situ…” This might be done as follows: (i) In the LF component, a wh-chain in situ Y can adjoin to a wh-chain X only if X c-commands Y at S-Structure.
Of course, we would then need to assume that the operation of adjoining a chain Y to a chain X appears identical to the operation of adjoining the head of Y to the head of X. In what follows, I will continue to assume (33)–(35) as given in the text.
12 It might seem at this point that the SMC analysis is empirically equivalent to Pesetsky’s (1987) Path Containment Condition (PCC) analysis of superiority phenomena; the PCC disallows “interlocked” Ā-chains, and this is apparently what the SMC prohibits as well. However, the SMC and the PCC are in fact empirically (quite) distinct. As an illustration, note that the PCC (incorrectly) disallows the LF representation of (7b), in which the Ā-chain headed by whatj and the Ā-chain headed by whok “interlock” (see (23)). By contrast (as we have seen), the SMC (correctly) allows such representations containing “interlocked” Ā-chains; in other words, the fact that the
whok chain and the whatj chain interlock in the LF representation (23) is irrelevant in the eyes of the SMC, a principle permitting LF adjunction of whok to the whoi chain that c-commands it at S-Structure.
13 See Kitahara (1992; 1996) for other arguments directly supporting this analysis of chains. For an alternative analysis of superiority phenomena, see Kitahara (1993a).
14 Before proceeding, I should note that the SMC and ODC also diverge with respect to the following type of derivation: (i) a. S-Structure [CP [IP whoi wonders [CP whoj [tj left]]]] b. LF [CP [IP ti wonders [CP whoi whoj [tj left]]]] The ODC allows such LF movement of whoi since whoi and whoj are not O-disjoint in (ia). By contrast, the SMC prohibits such movement and is therefore partially redundant with the Proper Binding Condition applying to ti at LF (and/or the ban on vacuous quantification/Full Interpretation). If this redundancy is in fact undesirable (see Epstein (1990a) for arguments that redundant analyses can be empirically motivated), we could eliminate it by adding (iib) to the existing formulation of the SMC (given informally in (iia)). (ii) At S-Structure a wh-chain C is a scope marker for a wh-in-situ X iff a. every member of C c-commands X (see (34)/(35)), or b. no member of C c-commands X. With the addition of (iib), the SMC, like the ODC, would permit derivation (i), thereby eliminating the redundancy. The addition of (iib) would also allow “inversely linked LF wh-movement” (allowed by the ODC), as in (iii). (iii) a. S-Structure [CP [NPi whose picture of whom] [ti fell]] b. LF [CP [NP whomj [NPi whose picture of tj [IP ti fell]]] That is, since no member of the chain ([whose picture of whom]i, ti) c-commands whom at S-Structure, whom can adjoin to the chain in LF, as in (iiib). Notice finally that the disjunction in (ii) can be eliminated by restating this condition as a condition under which a wh-chain is not a scope marker.
(iv) At S-Structure a wh-chain C is not a scope marker for a wh-in-situ X iff one member of C does c-command X and some other member of C does not c-command X. I believe that the addition of (iib) to the existing analysis (iia) is irrelevant in what follows.
15 Although I have abandoned Chomsky’s (1973) Superiority Condition, it is interesting to note that it cannot account either for the deviance of (38) or for its similarity to (1b). In (38), unlike in (1b), the presumably superior phrase why was indeed selected to undergo wh-movement. Hence, (38) satisfies the Superiority Condition whereas (1b) violates it. The ECP is similarly ineffective. As noted, (38) satisfies the ECP and (as discussed in section 2) (1b) does too.
16 Rizzi (1990, 47) suggests that reason adverbials are adjoined to either TP or AgrP.
17 I assume that a trace of why appears in the LF component. Given the existence of such a trace, in LF there will be a chain c-commanding what, unlike at S-Structure. This is the only empirical motivation I know of for (35b) = “scope marking is determined at S-Structure,” hence the only impediment to reformulating the SMC as a purely LF condition, consistent with the leading idea (Chomsky (1986, 1993)) that there are no S-Structure conditions. If wh-adverbials like why in fact bind no trace in LF (see Rizzi (1990, 47)) or if, alternatively (perhaps by
Procrastinate), once the trace of why is created, the LF representation has by definition been created (hence, wh-situ, by definition, cannot move), then scope marking could take place entirely in LF; that is, the S-Structure condition (35b) could by hypothesis be eliminated.
18 As L&S (1984) note, (47) is certainly better than the ECP violation (i). (i) ** whoi do you wonder [CP whether [IP John said [CP t′i that [IP ti left]]]]
19 Chomsky (1991, 441) deduces such trace deletion from Full Interpretation. That is, (whoi, t′i, ti) is neither an operator-variable chain, nor a homogeneous/uniform Ā-chain, nor an argument chain. Deleting t′ yields the legitimate LF object (whoi, ti), an operator-variable chain. See Browning (1987) for an earlier discussion of such chain uniformity.
20 The data presented in this section are accounted for, as are the following types of examples: (i) * Why did who fix the car? (*SMC analysis) (ii) ** Why did John fix the car how? (*SMC, *ECP) Huang (1982a, 557) judges the following perfect, hence better than (38), to which he assigns “?”: (iii) Tell me why you bought what. No account for the purported contrast is given (see Huang 1982a, 585–586). I am not sure that I find (iii) any better than (38), but if it is, the SMC might provide an explanation. Example (iii) is an imperative, the felicitous use of which is constrained by certain discourse factors. It may be, then, that (iii) forces or biases for a D-linked interpretation of what in the sense of Pesetsky (1987). If what is D-linked, then, under Pesetsky’s analysis, it does not undergo LF-movement; hence, the SMC is irrelevant and lack of a scope-marker in (iii) does not induce ungrammaticality. The inapplicability of the SMC to D-linked wh-phrases would also account for the suppression of superiority effects (noted by Pesetsky) in analogues to (1b) such as (iv). (iv) Which woman did which man see?
21 The SMC is a constraint on, and hence presupposes the existence of, LF wh-movement. 
However, the SMC could be rendered compatible with an analysis in which wh-in-situ does not undergo LF wh-movement but is instead subject to an LF interpretive rule of scope assignment. That is, the SMC could be reformulated as the following interpretive (not movement) constraint: (i) In the LF component, interpret a wh-in-situ W as having scope identical to that of the head of a wh-chain CH, where CH c-commanded W at S-Structure. However, I will suggest below that certain directly relevant data can readily be explained under the LF wh-movement analysis presumed here, but not under an interpretive (nonmovement) reanalysis of wh-in-situ. Consequently, I will continue to assume LF wh-movement and therefore will retain the original SMC formulated as a constraint on LF wh-movement. The interpretive constraint (i) is not incorporated. To begin with, consider the raising structure (ii). (ii) [IP Johni is [AP likely [IP ti to win]]] (iii) [CP [APj how likely ti to win] [C [Ik is]] [IP Johni tj]]
(ii) is an example of what might be called “the least controversial case of a predicate-internal subject”; that is, John originates internal to the “predicate” AP and moves to [Spec, IP]. In (iii), the subject John has similarly
raised from the position ti to [Spec, IP], and, in addition, the AP has undergone wh-movement to [Spec, CP]. I will assume that ti, the trace of subject raising, is an anaphor. Applying the chain-binding algorithm proposed by Barss (1986; 1988), I further assume that, even though this anaphor is free at S-Structure, it is nonetheless licit by virtue of being chain-bound by its antecedent John. Informally, ti is licit because John is a local c-commander of the trace tj, the trace of the container of the anaphor. (I in fact depart here from Barss’ analysis of (iii), which assumes that ti is not anaphoric and is therefore not subject to the binding theory; see Barss (1986, 409).) Now consider the following minimal pair: (iv) [CP whoi [IP ti wonders [CP [APj how likely tk to win] [IP Johnk is tj]]]] (v) * [CP whoi [IP ti wonders [CP [APj how likely tk to win] [IP whok is tj]]]] The grammatical (iv) is essentially a case of embedding a structure like (iii). The ungrammatical (v) differs from (iv) only in that the raised subject whok is a wh-phrase. Under the SMC, notice that (v) is like the grammatical (7) in that whok cannot adjoin to the wh AP-chain in LF since this chain fails to c-command whok. Rather (just as in (7)), whok must adjoin to (the chain headed by) whoi. This yields an LF representation of the following form: (vi) [CP whok whoi [IP ti wonders [CP [APj how likely tk to win] [IP t′k is tj]]]] This derivation exhibits the following (I believe) previously unobserved property: a trace (tk) created by syntactic A-movement of a wh-phrase (whok) becomes locally Ā-bound by the wh-phrase (whok) at LF. There are two independently motivated analyses under which LF representations like (vi) might be excluded. The first is (vii). (vii) a. Locally Ā-bound categories are variables (Chomsky (1982), Koopman and Sportiche (1983)), and b. Variables require Case (Chomsky (1981)). (vii) excludes (vi) since tk created by syntactic A-movement is a variable lacking Case in LF. 
As Brody (1984) notes, cases such as the following may present a problem for (viia): (viii) [CP whoi did [IP [ei losing the race] [VP upset ti]]]
That is, the locally Ā-bound ei is arguably PRO, not a variable. If, however, PRO bears (null) Case (Chomsky (1986; 1993)), this problem disappears. (Arguments against (viib) are provided in Borer (1981) and in Epstein (1987; 1991).) An alternative approach to excluding (vi) is to adopt Koopman and Sportiche’s (1983) Bijection Principle, entailing (ix). (For potential problems confronting the Bijection Principle, see Epstein (1984), Safir (1984), and Stowell and Lasnik (1991). For discussion of the definition of local binding, see Epstein (1987).) (ix) Every Ā-position locally binds one and only one A-position. (ix) is violated in (vi) since whok locally binds both tk and t′k. Within this account, (vi) is thus analyzed as a case of LF weak crossover, similar to (x). (x) S-Structure [CP whoi [IP ti expects [IP hisj mother to love whoj]]]
LF [CP whoj whoi [IP ti expects [IP hisj mother to love tj]]] This analysis faces a problem, however: (vi) is not exactly the relevant LF representation. In order for t′k (the trace of long-distance LF subject movement) to be properly governed, recall that Ik substitutes (or can substitute) into C, yielding the LF representation (xi). (xi) [CP whok whoi [IP ti wonders [CP [APj how likely tk to win] [Cj [Ik is] [IP t′k tIk tj]]]]] In (xi) tk is the one and only A-position that whok locally Ā-binds. Crucially, t′k is not locally Ā-bound by whok since the raised binder Ik “intervenes.” Consequently, (ix) is satisfied; that is, the derivation satisfies the Bijection Principle and the proposed alternative to (vii) fails. One possible solution to this problem is to assume that heads, such as I, are ignored in computing local binding relations between maximal projections; hence, these are Bijection Principle violations (see fn. 3). To summarize this digression: I have provided two analyses of the ungrammaticality of strings like (v). LF representations such as (vi) may violate the Case requirement on variables (vii) and/or the Bijection Principle (see (ix)). I have noted potential problems with each analysis, and I leave open which (if either) is correct. Note that under either analysis, the wh-AP must not be reconstructed at LF. That is, in (vi), if the wh-AP (or any subconstituent containing tk) were moved back to the position of the AP-trace tj in LF (as a way of properly A-binding the NP-trace tk at LF), both (vii) and (ix) would be satisfied. There would be no violation. Thus, under the analyses presented here, the ungrammaticality of (v) can be taken as evidence against LF reconstruction. 
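As an informal illustration of how (ix) separates (vi) from (xi), the following Python sketch treats an LF representation as a table from Ā-binders to the A-positions they locally bind. The table layout, the function name `bijection_violations`, and the position labels are hypothetical conveniences, not part of Koopman and Sportiche’s formulation. The binding facts entered for (vi) and (xi) are simply those stated above, and only the bindings of whok are recorded.

```python
def bijection_violations(local_binding):
    """Return the A-bar binders that fail condition (ix).

    local_binding is a hypothetical encoding: it maps each A-bar
    position to the list of A-positions it locally binds. Condition
    (ix) requires exactly one such A-position per binder.
    """
    return {binder: bound
            for binder, bound in local_binding.items()
            if len(bound) != 1}


# (vi): who_k locally binds both t_k and t'_k, as stated in the text.
lf_vi = {"who_k": ["t_k", "t'_k"]}

# (xi): the raised I_k intervenes, so who_k locally binds only t_k.
lf_xi = {"who_k": ["t_k"]}

print(bijection_violations(lf_vi))  # reports who_k as a violator
print(bijection_violations(lf_xi))  # reports no violators
```

On this encoding (vi) is flagged and (xi) is not, which is the problem noted above: once Ik counts as an intervener, the Bijection Principle no longer excludes the derivation.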
Notice also that the following examples display exactly the same contrast in grammaticality as the subject raising examples (iv) and (v): (xii) [CP whoi [IP ti wonders [CP [APj how tall] [IP John is tj]]]] (xiii) * [CP whoi [IP ti wonders [CP [APj how tall] [IP who is tj]]]] The similarity of these examples to the raising cases (iv) and (v) can be explained by, and hence provide support for, an analysis within which tall (like likely) is analyzed as a raising predicate. If either of these independently motivated conditions ((vii) or (ix)) is in fact correct, (v) is explained. By contrast, under an interpretive analysis of wh-in-situ (lacking LF movement) there is apparently no ready way to distinguish the ungrammatical (v) from the grammatical (iv). I will therefore (tentatively) continue to assume both LF movement of wh-in-situ and the original SMC constraining such movement. I do not adopt the alternative constraint (i).
22 As is characteristic of an ECP effect, there also seems to be a subject-object asymmetry. (i) * [CP whoi [IP ti thinks [CP that [IP on the shelfj [IP who put a book tj]]]]] (ii) ? [CP whoi [IP ti thinks [CP that [IP on the shelfj [IP John put what tj]]]]]
23 Tiedeman (1990) provides an alternate analysis, which, however, cannot account for the ungrammaticality of (56). For discussion, see Epstein (1993).
24 For detailed discussion of May’s analysis, see, among others, Williams (1988), L&S (1992, chapter 5), and Epstein (1989a). Note that in Epstein (1989a) I do not argue against May’s segmental theory of adjunction. Rather, I argue that a purported argument for that theory confronts certain empirical problems.
25 Poole (1996) points out that if Chomsky’s (1993) extension version of the strict cycle constrains not only substitution but also XP-adjunction (see Chomsky (1993, 23)), then syntactic segmentation adjunction would be disallowed, since a dominator of the node targeted by the generalized transformation GT is not created. 
Thus, Poole notes, duplication adjunction is forced by the extension version of the strict cycle in the syntax. If, as
Chomsky (1993) proposes (cf. Kitahara (1993b)), the extension version of the strict cycle does not apply at LF, then segmentation adjunction would be allowed (but not ensured), as Poole further notes.
26 I leave open the possibility suggested by Howard Lasnik (personal communication) that (70) might be derivable from the A-over-A Constraint discussed in Chomsky (1972, 51).
27 The trace of I-movement in (71) also violates the ECP since it, like the subject trace, is not subjacent to Ik. Hence, it fails to be antecedent-governed. Although L&S (1992) do not discuss X0-movement with respect to the ECP, notice that I-to-C movement “over” topicalization as in (i) is ungrammatical and is similar to L&S’s core case of subject extraction (ii). (i) * [CP [C willi [IP this bookj [IP John ti buy tj]]] (ii) * [CP whoi [Ci] [IP this bookj [IP ti bought tj]]] This suggests that L&S’s formulation of the ECP might in fact extend at least partially to head movement (see Epstein (1990b) for discussion). But I will not pursue this issue here.
28 Some potentially interesting consequences of the proposed analysis regarding the application of subject Quantifier Raising in LF are explored in Epstein (1993).
29 See also Koopman (1983), where (i) is analyzed as an ECP effect (not unlike (74)). (i) * Who did see Mary?
30 The relevant definitions from Chomsky (1993) are as follows: (i) The domain of a head α=the set of nodes contained in Max(α) that are distinct from and do not contain α. (p. 11) a. A category α dominates β if every segment of α dominates β. (p. 11) b. Max(α)=the least full-category maximal projection dominating α. (p. 11) c. A category α contains β if some segment of α dominates β. (p. 11) Two subsets of the domain of a head α are then defined: the complement domain of α and the residue of α. (ii) The complement domain of α=the subset of the domain reflexively dominated by the complement of the construction. (p. 11) a. 
A category α reflexively dominates all categories dominated by α, and α itself. b. Complement of the construction is not defined in Chomsky (1993). (iii) The residue of α=the domain of α minus the complement domain of α. (p. 11) As a simple illustration of the three sets just defined, consider (iv). (iv)
a. Domain of X (=α) is {ZP, Z′, Z, YP, Y′, Y}. b. Complement domain of X (=α) is {YP, Y′, Y}. (assuming “complement of the construction” = “complement of α”) c. Residue of X (=α) is {ZP, Z′, Z}. Assuming that X=α enters into syntactically significant relations with only ZP (=specifier-head) and YP (=head-complement), the complement domain of X (=α) needs to be “narrowed down” or “minimized” to the unit set {YP} and, similarly, the residue of X (=α) needs to be “minimized” to the unit set {ZP}. To attain precisely this result, “minimal subset of a set S” (=Min(S)) is defined: (v) Min(S)=the smallest subset K of S such that for all γ ∈ S, some β ∈ K reflexively dominates γ. (p. 12) Loosely speaking, this picks out “the ‘topmost’ node from a set.” As Akira Watanabe (personal communication) points out, a perhaps more perspicuous informal but empirically equivalent definition of Min(S) is (vi). (vi) Min(S)=all members of a set S that are not dominated by a member of S. As discussed, with respect to (iv), (vii) The minimal complement domain of X (=α) is {YP}. (viii) The minimal residue of X (=α) is {ZP}. Finally, (ix) The internal domain of α=the minimal complement domain of α. (p. 12) (x) The checking domain of α=the minimal residue of α. (p. 12) All definitions thus far presented in this footnote pertain to a head α, that is, a singleton X0 chain. For nonsingleton head chains the definitions are (naturally) modified as follows: (xi) Suppose there exists an X0 chain: a. CH={α1,…,αn} b. The domain of CH=the set of nodes contained in Max(α1) and not containing any αi. c. The complement domain of CH=the subset of the domain of CH reflexively dominated by the complement of α1. d. Complement is not defined in Chomsky (1993). e. The residue of CH=the domain of CH minus the complement domain of CH. Minimizing the sets defined in (xic) and (xie) (by “applying” definitions (v) and (vi)) yields the checking domain and the internal domain of CH. 
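The definitions in (i)–(x) can be sketched computationally. The Python fragment below is only an illustration under stated assumptions: the tree is a guess at the shape of (iv), namely [XP ZP [X′ X YP]] with ZP and YP each projecting through Z′ and Y′ to Z and Y, chosen because it yields the sets listed in (iva–c); the class `Node`, the function names, and the convention that maximal projections have labels ending in “P” are all hypothetical. Adjunction is not modeled, so the segment-based distinction between “dominates” and “contains” collapses, and Min(S) is computed using the informal version in (vi).

```python
class Node:
    """A tree node; this sketch has no adjunction, hence no segments."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()

    def dominates(self, other):
        return any(node is other for node in self.descendants())


def max_projection(head):
    # Assumption: maximal projections are the nodes labeled "...P".
    node = head.parent
    while node is not None and not node.label.endswith("P"):
        node = node.parent
    return node


def domain(head):
    # (i): nodes in Max(head) distinct from and not containing head.
    top = max_projection(head)
    return {n for n in top.descendants()
            if n is not head and not n.dominates(head)}


def complement_domain(head, complement):
    # (ii): part of the domain reflexively dominated by the complement.
    return {n for n in domain(head)
            if n is complement or complement.dominates(n)}


def residue(head, complement):
    # (iii): the domain minus the complement domain.
    return domain(head) - complement_domain(head, complement)


def minimal(nodes):
    # (vi): members of the set not dominated by another member.
    return {n for n in nodes if not any(m.dominates(n) for m in nodes)}


def labels(nodes):
    return {n.label for n in nodes}


# Assumed shape of (iv): [XP [ZP [Z' Z]] [X' X [YP [Y' Y]]]]
Z = Node("Z"); Z_bar = Node("Z'", [Z]); ZP = Node("ZP", [Z_bar])
Y = Node("Y"); Y_bar = Node("Y'", [Y]); YP = Node("YP", [Y_bar])
X = Node("X"); X_bar = Node("X'", [X, YP]); XP = Node("XP", [ZP, X_bar])

print(labels(domain(X)))
print(labels(minimal(complement_domain(X, YP))))  # internal domain, (ix)
print(labels(minimal(residue(X, YP))))            # checking domain, (x)
```

On the assumed tree, the internal domain comes out as the unit set containing YP and the checking domain as the unit set containing ZP, matching (vii) and (viii). Extending the sketch to nonsingleton chains as in (xi) would require the domain computation to exclude nodes containing any chain member, which is not attempted here.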
31 This analysis is attractive in a number of respects. First, there is no affix lowering, hence no free trace at S-Structure (cf. Chomsky (1991)). Second, the analysis is consistent with the leading idea that natural language
grammars incorporate the same instantiations of Affect α (in this case, V-raising) but differ with respect to the level at which these operations apply. Third, this analysis permits a French sentence and an English sentence that are synonymous to have structurally identical LF representations, a natural if not necessary result precluded if French has V-raising and English has affix lowering.
32 T might not be semantically vacuous (see Chomsky (1991) regarding finite T). I nonetheless assume that T0 is absent from LF representations. This is deducible from FI/economy of representation, which would prohibit an LF representation containing, redundantly, both the tense features lexically specified on the verb and the same features on the functional checking category T0, as in (i), for example.
Given that a sentence has only one tense, (i) cannot be an LF representation. As demonstrated in Epstein (1987; 1991), features of a head cannot be deleted (e.g., the masculine feature of himself, the [−γ] feature on a trace, the tense feature on the verb). Rather, only entire syntactic categories are subject to deletion. In (i) V cannot delete since it has semantic content not recoverable from T. However, T can delete since its feature is fully recoverable from the verb’s (identical) tense feature. If T can delete, it follows that it is an unnecessary symbol in the LF representation. By economy of representation, then, it must delete.
33 I omit VP-internal subject raising, which is irrelevant here.
34 Notice that, under May’s (1985) segmental theory of adjunction, the immediate dominator (see (73)) of Z is Y′, in both (86a) and (86b).
35 For a treatment adopting aspects of the analysis presented here, see Kissock (1995, chapter 4).
36 Notice that, in the analysis just presented, order of “rule” application, stipulated in the standard theory (entailing concomitant learnability problems), is deduced from the presumably universal principle of LF Movement Visibility, an independently motivated and arguably natural (as opposed to “purely formal”) principle of the LF component.
37 See Jonas (1992) and Jonas and Bobaljik (1993) for arguments that T-to-AgrS movement is subject to parametric variation and that in (e.g.) Icelandic, T-in-situ can check nominative Case on an NP occupying [Spec, TP], an operation underlying transitive expletive constructions, apparently unavailable in English.
38 The obligatoriness of Case checking by T of an NP occupying [Spec, AgrS] constitutes, in large part, the Extended Projection Principle.
39 There is a potential question regarding which category moves from the head of AgrSP to C in I-to-C movement. This question arises because it is not altogether clear which, if any, of the functional heads check and delete in the syntax. In fact, this potential unclarity is irrelevant to the proposed analysis. There are only four logical possibilities: only (a) a V, (b) a T, (c) an AgrO, or (d) an AgrS could be the category that moves to C. Under any of these possibilities the analysis will succeed. It will explain why syntactic “I”-to-C movement (i.e., syntactic V-, T-, AgrO-, or AgrS-movement to C) does not yield proper government of the subject. In the text, I assume that the entire AgrS complex moves to C. I believe this is what Chomsky’s (1993) framework predicts; that is, since AgrS, AgrO, and T need not check V-features in the syntax (i.e., English is not a syntactic V-raising language), economy of representation predicts that they cannot. Thus, these categories do not delete in the syntax, since they have checking to do at LF.
40 There is a redundancy between clauses (90b) and (90d). Each prevents a head X from antecedent-governing the specifier position YP in (i). (i) [XP YPi [X′ [Xi]]] For expository purposes, I will leave the definition as is.
41 If T had not deleted prior to movement to C, it would delete after movement to C, leaving the V bought in C, exactly as in (95). This is the crucial property, yielding proper government of the subject trace in LF, as we will see momentarily.
42 See Chomsky (1991) and Kitahara (1994; 1997) for proposals regarding the movement of Xmax-traces.
43 Other analyses of the problem raised by V-movement over the intervening T0/e0 are imaginable. First, as suggested in Thráinsson (1996), it might be that English, in contrast to (e.g.) Icelandic, has an unsplit I, as in analyses preceding Pollock (1989). If this is correct, the problem simply disappears. Second, even if English has both AgrS and T, it is not altogether clear that LF V-movement from the head of AgrOP to AgrS-adjoined position, over the head of TP/eP, is in fact a violation. Given Ferguson and Groat’s (1994) formulation of the “Shortest Movement” Condition, such V-movement over the tail of the T0/e0 chain would be allowed, since no closer V-checking position was “skipped-over.” Third, if e0, being semantically null, is (perhaps naturally) invisible to LF operations, its presence during the movement would be irrelevant. Finally, an LI reviewer suggests that “[i]t may be possible to assume that the empty projection is pruned away, excised.” This, too, is a possibility, but notice that the entire empty “projection” (i.e., the syntactic categories labeled T′ or TP in (97)) of course cannot be deleted (in any standard sense of deletion). By contrast, deleting just the nodes labeled TP, T′ and e in (97) would constitute deletion (Affect α) of entities which are not syntactic categories (a type of operation the existence of which I leave open for further research).
44 Recall that bought and who are coindexed via specifier-head coindexing at S-Structure. LF specifier-head coindexing coindexes the trace of who in the specifier of what was AgrSP with the trace of bought in the head of what was AgrSP. 
Thus, the trace of who and bought are indeed coindexed, so that binding does in fact obtain. Further, binding within the V′ projection of bought obtains because, as a result of C0-deletion, CP is transformed into VP. The (complex) index-sensitive head government condition (90) will be reformulated as a more natural condition in section 6.5.
45 An alternative analysis should be noted. Suppose we abandon Rizzi’s (1990) (90d) and adopt only L&S’s definition of antecedent government as given in (90a–c). As desired, in an LF representation (like (99)), the subject is antecedent-governed according to this definition. But why is there no antecedent government when I-to-C movement applies in the syntax (see (89))? Suppose (contra (89)) that I-to-C movement does not substitute AgrS into C but rather adjoins AgrS to C. As a result of such adjunction, AgrS would not bind the subject trace under L&S’s independently motivated branching definition of c-command. Therefore, as desired, syntactic I-to-C movement does not result in the subject’s being antecedent-governed, whereas in LF antecedent government obtains.
46 Several issues arise regarding the analysis presented in this section. The first concerns LF V2 and parameterization concerning that-t effects. Law (1991) provides a (pre-Minimalist Program) analysis of West Flemish (WF) complementizer agreement postulating [V+I] movement to C in LF. (I thank Daiko Takahashi for bringing this article to my attention.) Within Law’s analysis, the suppression of syntactic that-t effects, as illustrated by the grammaticality of (i), is ascribed to the head government of the subject trace produced by LF movement of [V+I] to C. ((i) is adapted from Law (1991, (14), (16)).)
(i) a. den vent da Pol peinst da Valère gezien heet
the man that Pol thinks that Valère seen has
‘the man that Pol thinks that saw Valère’
b. S-Structure den vent [CP Oi [da [IP Pol peinst [CP t′i [da [IP ti Valère gezien heet+I]]]]]]
c. LF da (that)-replacement den vent [CP Oi [peinst [Pol [CP t′i [heet+I]j [ti Valère gezien tj]]]]]
Thus, contra Rizzi (1990, 39), where it is proposed that the proper head government condition must be met at S-Structure, and contra L&S (1984; 1992), in which argument (subject) traces are indelibly γ-marked at S-Structure, and contra the “*-marking” analysis proposed in Chomsky and Lasnik (1993, 546), Law postulates that the subject trace in (ib), which violates the head government requirement at S-Structure (and at the moment of its
creation), can be “saved” by LF head movement of [V+I] to C, which renders the subject trace head-governed in LF. Given his analysis, Law argues that English must lack [V+I] movement to the overt finite complementizer that in LF; if English had such movement, that-t configurations as in (ii) would be incorrectly predicted to be grammatical, being salvageable by LF head movement, just like the WF case (i). (ii) * [CP whoi do [IP you think [CP ti [that [ti Ii left]]]]] However (leaving aside his discussion of negative inversion), Law’s proposal that English (otherwise) lacks LF movement to finite complementizers (that and Ø) leaves not only the central example (7a) unexplained, but also the following type of case, which is certainly far better than (ii): (iii) who1 thinks [CP that [IP who2 I left]] (I find (iii) grammatical, but see L&S (1992, 116) and Aoun, Hornstein, and Sportiche (1981, fn. 14, and references cited there) for discussion of the grammaticality status and analysis of such examples.) So that the long-distance trace of who2 created in LF can be properly head-governed, LF V-to-C movement presumably applies in (iii). But then the following question emerges: If (a) Law’s analysis is correct, and (b) WF has LF head movement that can “salvage” a subject trace created in the syntax, and (c) English indeed has LF head movement properly governing a subject trace created in LF ((7a) and section 5), then why is it that English syntactic that-t violations like (ii) cannot be salvaged by LF head movement, as (Law’s analysis proposes) they are in the WF case (i)? 
One way (out of perhaps many) of resolving this possible problem (which arises only if Law’s analysis and the one presented here are both on the right track) is to assume a subtle and by no means unprecedented type of distinction between the two grammars: in English the head government requirement is in effect a constraint on movement (Chomsky and Lasnik (1993)) or perhaps an S-Structure requirement (L&S (1984, 92), Rizzi (1990)). Thus, for (ii), there is no salvation. In WF the constraint is representational, applying only in LF. It is interesting in this context to note that “the ECP” as formulated in Chomsky and Lasnik (1993) has both a derivational property (*-marking) and a representational aspect (the * filter applying to LF representations). The second issue that arises from the discussion in this section involves syntactic V2 and parametric variation regarding the “timing” of C0-deletion. Within the (checking-based) analysis proposed here, parametric variation concerns the level at which checking by a functional category occurs and critically involves Procrastinate (a principle that, I assume here, entails that operations apply as late as possible to yield convergence). Here I simply mention this issue; for a more complete discussion, see Epstein (1993).
47 Notice that when the internal domain is the null set, the checking domain will (consequently) be equivalent to the minimal domain itself. The empirical consequences of this require further investigation. Notice also that, within the account I am proposing, the bifurcation of the minimal domain into internal and checking domains differs from the bifurcation assumed in Chomsky (1993). However, the minimal domain of the chain is the same in my account and Chomsky’s. 
Consequently, my analysis is consistent with Chomsky’s (1993, 12) “equidistance” analysis of apparent Relativized Minimality violations induced by object shift, an analysis resting only on the notion “minimal domain” (and hence not sensitive to the particular bifurcation of the minimal domain into checking and internal domains).
48 For an alternative analysis of the LF suppression of subject ECP/wh-island effects, see Martin (1996). Martin also postulates a crucial role for LF head movement. In his analysis, [+wh] C is an LF affix and undergoes LF adjunction to the higher V. In (7a), the embedded [+wh] C adjoins to the V wonder in LF. Such movement renders the embedded [+wh][Spec, CP] (=what) and the matrix VP-adjoined position equidistant (in the sense of Chomsky (1993)) from the embedded wh-subject in situ (who). The Minimal Link Condition therefore allows the
wh-subject to “skip over” the “filled” [+wh][Spec, CP], thereby escaping this wh-island in its LF journey (via matrix VP-adjoined position) to the matrix [+wh][Spec, CP].
49 I am indebted to Noam Chomsky for discussion of this and other points raised in this section. Any errors are mine.
50 For a critical examination of this analysis, see Groat (1994).
References Aoun, J., Hornstein, N. and Sportiche, D. (1981) “Some Aspects of Wide Scope Quantification,” Journal of Linguistic Research 1:69–95. Aoun, J. and Sportiche, D. (1983) “On the Formal Theory of Government,” The Linguistic Review 2:211–235. Baltin, M. (1982) “A Landing Site Theory of Movement Rules,” Linguistic Inquiry 13:1–38. Barss, A. (1986) “Chains and Anaphoric Dependence,” unpublished doctoral dissertation, MIT. Barss, A. (1988) “Paths, Connectivity, and Featureless Empty Categories,” in Cardinaletti, A., Cinque, G. and Giusti, G. (eds.) Constituent Structure: Papers from the 1987 GLOW Conference, Dordrecht: Foris. Borer, H. (1981) “On the Definition of Variable,” Journal of Linguistic Research 1:17–40. Brody, M. (1984) “On Contextual Definitions and the Role of Chains,” Linguistic Inquiry 15:355–380. Browning, M. (1987) “Null Operator Constructions,” unpublished doctoral dissertation, MIT. [Published in 1991 by Garland Press, New York.] Chomsky, N. (1972) Language and Mind, enlarged edition, New York: Harcourt, Brace Jovanovich. Chomsky, N. (1973) “Conditions on Transformations,” in Anderson, S. and Kiparsky, P. (eds.) A Festschrift for Morris Halle, New York: Holt, Reinhart, and Winston. Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris. Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press. Chomsky, N. (1986) Barriers, Cambridge, Mass.: MIT Press. Chomsky, N. (1991) “Some Notes on Economy of Derivation and Representation,” in Freidin, R. (ed.) Principles and Parameters in Comparative Grammar, Cambridge, Mass.: MIT Press. Chomsky, N. (1993) “A Minimalist Program for Linguistic Theory,” in Hale, K. and Keyser, S.J. (eds.) The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, Cambridge, Mass.: MIT Press. Chomsky, N. 
(1994) “Bare Phrase Structure,” MIT Occasional Papers in Linguistics 5, MITWPL, Department of Linguistics and Philosophy, MIT. [Also published in 1995 in Webelhuth, G. (ed.) Government and Binding Theory and the Minimalist Program, Oxford: Blackwell Press.] Chomsky, N. and Lasnik, H. (1993) “The Theory of Principles and Parameters,” in Jacobs, J., von Stechow, A., Sternefeld, W. and Vennemann, T. (eds.) Syntax: An International Handbook of Contemporary Research, Berlin: Walter de Gruyter. Collins, C. (1994) “Economy of Derivation and the Generalized Proper Binding Condition,” Linguistic Inquiry 25: 45–61. Epstein, S.D. (1984) “A Note on Functional Determination and Strong Crossover,” The Linguistic Review 3:299–305. Epstein, S.D. (1986) “The Local Binding Condition and LF Chains,” Linguistic Inquiry 17:187–205. Epstein, S.D. (1987) “Empty Categories and Their Antecedents,” unpublished doctoral dissertation, University of Connecticut, Storrs. Epstein, S.D. (1989a) “Adjunction and Pronominal Variable Binding,” Linguistic Inquiry 20:307–319. Epstein, S.D. (1989b) Unpublished review of Lasnik and Saito (1992), manuscript, Harvard University. Epstein, S.D. (1990a) “Differentiation and Reduction in Syntactic Theory: A Case Study,” Natural Language and Linguistic Theory 8:313–323. Epstein, S.D. (1990b) “X0 Movement, Xmax Movement and a Unified Definition of ‘Antecedent Government’,” manuscript, Harvard University.
Epstein, S.D. (1991) Traces and Their Antecedents, Oxford: Oxford University Press. Epstein, S.D. (1992) “Derivational Constraints on Ā-Chain Formation,” Linguistic Inquiry 23:235–259. Epstein, S.D. (1993) “Superiority”, Harvard Working Papers in Linguistics 3:14–64. Epstein, S.D. (1994) “The Derivation of Syntactic Relations,” paper presented at the Harvard University Forum in Synchronic Linguistic Theory. Epstein, S.D. (To appear) “‘UN-Principled’ Syntax and the Derivation of Syntactic Relations,” in Browning, M. (ed.) Working Minimalism, Cambridge, Mass.: MIT Press. Epstein, S.D., Groat, E., Kawashima, R., and Kitahara, H. (To appear) A Derivational Approach to Syntactic Relations, New York/Oxford: Oxford University Press. Ferguson, K.S. and Groat, E. (1994) “Defining ‘Shortest Move’,” paper presented at the 17th GLOW Colloquium, Vienna. Freidin, R. (1992) Foundations of Generative Syntax, Cambridge, Mass.: MIT Press. Freidin, R. (1995) “Superiority, Subjacency and Economy,” in Campos, H. and Kempchinsky, P. (eds.) Evolution and Revolution in Linguistic Theory, Washington, D.C.: Georgetown University Press. Groat, E. (1994) “Against Functional Category Deletion: A Bare Theory Argument,” Harvard Working Papers in Linguistics 4:52–62. Hendrick, R. and Rochemont, M. (1982) “Complementation, Multiple wh and Echo Questions,” manuscript, University of North Carolina and University of California at Irvine. Higginbotham, J. and May, R. (1981) “Questions, Quantifiers, and Crossing,” The Linguistic Review 1:41–80. Huang, C.-T.J. (1982a) “Logical Relations in Chinese and the Theory of Grammar,” unpublished doctoral dissertation, MIT. Huang, C.-T.J. (1982b) “Move wh in a Language Without wh-Movement,” The Linguistic Review 1:369–416. Jonas, D. (1992) “Checking Theory and Nominative Case in Icelandic,” Harvard Working Papers in Linguistics 1: 175–196. Jonas, D. and Bobaljik, J. 
(1993) “Specs for Subjects: The Role of TP in Icelandic,” in MIT Working Papers in Linguistics 18: Papers on Case and Agreement I, MITWPL, Department of Linguistics and Philosophy, MIT. Kissock, M. (1995) “Reflexive-Middle Constructions and Verb Raising in Telugu,” unpublished doctoral dissertation, Harvard University. Kitahara, H. (1992) “Checking Theory and Scope Interpretation Without Quantifier Raising,” Harvard Working Papers in Linguistics 1:51–72. Kitahara, H. (1993a) “Deducing ‘Superiority’ Effects from the Shortest Chain Requirement,” Harvard Working Papers in Linguistics 3:109–119. Kitahara, H. (1993b) “Target-α: Deducing Strict Cyclicity from Principles of Economy,” paper presented at the 16th GLOW Colloquium, Lund. Kitahara, H. (1994) “Target-α: A Unified Theory of Movement and Structure-Building,” unpublished doctoral dissertation, Harvard University. Kitahara, H. (1996) “Raising Quantifiers Without Quantifier Raising,” in Abraham, W., Epstein, S.D., Thráinsson, H., and Zwart, J.-W. (eds.) Minimal Ideas: Syntactic Studies in the Minimalist Framework, Amsterdam: John Benjamins. Kitahara, H. (1997) Elementary Operations and Optimal Derivations, Cambridge, Mass.: MIT Press. Koopman, H. (1983) “ECP Effects in Main Clauses,” Linguistic Inquiry 14:346–350. Koopman, H. and Sportiche, D. (1983) “Variables and the Bijection Principle,” The Linguistic Review 2:139–160. Koopman, H. and Sportiche, D. (1986) “A Note on Long Extraction in Vata and the ECP,” Natural Language and Linguistic Theory 4:357–374. Larson, R. (1988) “On the Double Object Construction,” Linguistic Inquiry 19:335–391. Lasnik, H. (1985) “Illicit NP Movement: Locality Conditions on Chains?” Linguistic Inquiry 16:481–490. Lasnik, H. and Saito, M. (1984) “On the Nature of Proper Government,” Linguistic Inquiry 15:235–289. Lasnik, H. and Saito, M. (1992) Move α, Cambridge, Mass.: MIT Press. Law, P. (1991) “Verb Movement, Expletive Replacement and Head Government,” The Linguistic Review 8:253–285.
Lee, E.-J. (1993) “Superiority Effects and Adjunct Traces,” Linguistic Inquiry 24:177–183. Martin, R. (1996) “On LF-Movement and wh-islands,” in UCONN Syntax in the Minimalist Program, MITWPL, Department of Linguistics and Philosophy, MIT. May, R. (1977) “The Grammar of Quantification,” unpublished doctoral dissertation, MIT. May, R. (1985) Logical Form: Its Structure and Derivation, Cambridge, Mass.: MIT Press. Muysken, P. (1982) “Parametrizing the Notion ‘Head’,” Journal of Linguistic Research 2:57–75. Pesetsky, D. (1987) “Wh-in-Situ: Movement and Unselective Binding,” in Reuland, E.J. and ter Meulen, A. (eds.) The Representation of (In)definiteness, Cambridge, Mass.: MIT Press. Pollock, J.-Y. (1989) “Verb Movement, Universal Grammar, and the Structure of IP,” Linguistic Inquiry 20:365–424. Poole, G. (1996) “Deducing the X′-Structure of Adjunction,” in Di Sciullo, A.-M. (ed.) Essays on X′-Structure and Adjunction, Somerville, Mass.: Cascadilla Press. Reinhart, T. (1979) “Syntactic Domains for Semantic Rules,” in Guenthner, F. and Schmidt, S.J. (eds.) Formal Semantics and Pragmatics of Natural Language, Dordrecht: Reidel. Riemsdijk, H.van and Williams, E. (1981) “NP-Structure,” The Linguistic Review 1:171–217. Rizzi, L. (1986) “On Chain Formation,” in Borer, H. (ed.) Syntax and Semantics 19, New York: Academic Press. Rizzi, L. (1990) Relativized Minimality, Cambridge, Mass.: MIT Press. Safir, K. (1984) “Multiple Variable Binding,” Linguistic Inquiry 15:603–638. Stowell, T. (1981) “Origins of Phrase Structure,” unpublished doctoral dissertation, MIT. Stowell, T. and Lasnik, H. (1991) “Weakest Crossover,” Linguistic Inquiry 22:687–720. Thráinsson, H. (1996) “On the (Non-)Universality of Functional Categories,” in Abraham, W., Epstein, S.D., Thráinsson, H. and Zwart, J.-W. (eds.) Minimal Ideas: Syntactic Studies in the Minimalist Framework, Amsterdam: John Benjamins. Tiedeman, R. 
(1990) “An S-Structure/LF Asymmetry in Subject Extraction,” Linguistic Inquiry 21:661–667. Torrego, E. (1984) “On Inversion in Spanish and Some of Its Effects,” Linguistic Inquiry 15:103–129. Williams, E. (1988) “Is LF Distinct from S-Structure? A Reply to May,” Linguistic Inquiry 19:135–146. Williams, E. (1994) Thematic Structure in Syntax, Cambridge, Mass.: MIT Press.
9 “UN-PRINCIPLED” SYNTAX AND THE DERIVATION OF SYNTACTIC RELATIONS
0 Preface
Nash (1963) writes:
…having learned to reject, as delusive, the hope that theoretical premises are, or can be made, self-evident—we cannot but recognize that always our explanations are incomplete. Hall (1956) attributed to Galileo and Newton the opinion that: “The explanation of phenomena at one level is the description of phenomena at a more fundamental level...” Complete understanding then fails by the margin of those theoretical premises which are stipulated, perhaps “described”, but certainly not themselves explained or explicable for so long as they remain our ultimate premises…Resolved to maximize our understanding, we find ourselves committed to a highly characteristic effort to minimize the number of theoretical premises required for explanation. Einstein (1954) speaks of: “The grand aim of science, which is to cover the greatest possible number of empirical facts by logical deductions from the smallest possible number of hypotheses or axioms.” Some centuries earlier Newton had expressed the same “grand aim” in the first of his Rules of Reasoning: “We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances.” …Each “quality” imputed to a premised entity figures as an additional postulate. Our desire for parsimony of postulates thus evokes a search for theoretical posits having the slenderest possible qualitative endowment.
1 Introduction: Syntactic Relations
Alongside explicit specifications of “syntactic feature” and “permissible syntactic feature-bundle” (=a possible syntactic category), perhaps the most fundamental construct postulated within syntactic theory is that of “a syntactic relation.” In fact, broadly conceived, syntactic theory is precisely a theory of relations
and the elements that enter into them. For example, the following are each considered to be core syntactic relations.
(1) a. (Subject-Verb 3rd-Person Singular) Agreement: The dog kicks walls.
b. (Object) Theta-Role Assignment Relation: The dog kicks walls.
c. (Accusative) Case Relation: The dog kicks them. (*they)
d. (Reflexive) Binding-Relation: The dog kicks herself.
e. (Passive) Movement Relation: (Sadly) the dog was kicked t.
f. The “is a” Relation: The dog [VP [V kicks] [DP walls]] (“kicks” (=V) and “walls” (=DP) together constitute, that is, are a VP).
Such relations are apparently very heavily constrained; that is, we do not find empirical evidence for the logically possible syntactic relations, but instead (at least from the perspective of unified theories, to be discussed momentarily) we find only one, or perhaps “a few,” to be distinguished on a principled basis. Given that syntactic relations are very heavily constrained, the questions we confront include: “What are they?”/“How are they to be formally expressed?” and more deeply “Why do we find these and not any of the infinite number of logically possible others?”
Within the framework of Chomsky (1981; 1982) there were, by hypothesis, at least two fundamental syntactic relations. The first, “the government relation” (see also Chomsky (1986)) was a unified construct, a binary relation, under which all the seemingly disparate phenomena illustrated in the English examples (1a–e) (but not (1f)) were to be captured. By contrast, the “is a” relation (1f) was not a government relation, but rather, a relation created by the base-component, upon which government relations were defined. For example, the trinary relation in (1f), “V and DP are a VP,” is not a government relation. This type of theory of syntactic relations arguably confronts a number of conceptual (and empirical) problems:
• Unification: Unification is, in a sense, precluded in that the “is a” relation is divorced from the government relation. Government relations are defined on base-generated representations “already” exhibiting “is a” relations. Why isn’t there just one relation? And if there is not, why are (1a–e) (these five cases) unified, under government, with (1f), the “is a” relation the one “left out”?
• Explanation: Explanation at a certain level is lacking to the extent that the (or one of the) fundamental relations, “government” (Chomsky (1981; 1986)), or more recently “Minimal Domain” (Chomsky (1993)), a binary relation defined on representations, is merely a definition. Hence true explanation is lacking at this level; that is, the following is unanswered: “Why is ‘government’ as defined in (3) below or ‘Minimal Domain’ (see (38)) the fundamental syntactic relation and not any of the infinite number of other logically possible syntactically definable relations?”
• Primitive Constructs: Government/Minimal Domain are not in fact primitives in that they each incorporate a more fundamental binary relational construct, command. Thus “government” or “Minimal Domain” is not just an (unexplained) definition, it is in fact, contrary to standard assumption, not the fundamental unexplained definition, but rather the relation command is. Of course, if command is to express a fundamental syntactic relation, and it remains an (unexplained) definition, then, in the exact same sense as above, explanation is lacking: “Why is this relation, so defined, syntactically significant?”
• Complexity: “Government” or “Minimal Domain” definitions are “complex”. Of course, this claim has no substance whatsoever in the absence of an explicit principled complexity metric. Hence, I will leave it to the intuition of the reader that the alternative theory of syntactic relations proposed in this paper achieves significant simplification and (we hope) exhibits no associated loss, and perhaps even a gain, in empirical adequacy.
In this paper I will address each of these four (closely related) problems confronting the fundamental construct “syntactic relation” as it is expressed in contemporary syntactic theories. The analysis will be couched within the Minimalist Program (Chomsky (1993; 1994)) within which, importantly: (i) D-Structure is eliminated and along with it, the bifurcation of the D-Structure-generating base component and the transformational component, and (ii) Generalized Transformation (=Merge), arguably unifiable with Singulary Transformation (=Move-α), as proposed in Kitahara (1994; 1995; 1997), is reinstated. The central hypothesis I will propose here can be expressed in an informal (inexact) and preliminary fashion as follows:
(2) Preliminary Hypothesis:
a. The fundamental concept “Syntactic Relation” (e.g., “Government” or “Minimal Domain”) is not an unexplained definition defined on representations (i.e., “already built-up” phrase structure representations). Rather, syntactic relations are properties of independently motivated, simple and minimal transformations. That is, syntactic relations are established between a syntactic category X and a syntactic category Y when (and only when) X and Y are transformationally concatenated (thereby entering into sister-relations with each other) by either Generalized Transformation or Move-α during the tree-building, iterative, universal rule application which constitutes the derivation.
b. The fundamental structure-building operation is not Move-α (Chomsky (1981; 1982)) nor Affect α (Lasnik and Saito (1992)) but rather Concatenate, as is entirely natural and minimal in that concatenation requires at least two objects: Concatenate X and Y, thereby forming Z.
The analysis I will propose is entirely natural in that concatenation and only concatenation establishes syntactic relations between categories. Given this hypothesis, and given that any syntactic system, by definition, requires concatenation, it follows that the fundamental construct “syntactic relation” should “fall out”; that is, it will be deducible from (hence explained by appeal to) the independently and strongly motivated postulate “Concatenate” as simply expressed by Merge and Move in the Minimalist Program. To the extent that we are correct that “syntactic relation” is, contra contemporary theories of syntax, an explicable derivational construct, not a definitional (unexplained) representational notion, the four central and “forbidding” obstacles listed above, confronting all current syntactic explanation, will be overcome. 
Before proceeding, I would like to briefly place my hypothesis, namely that fundamental, representational, unexplained definitions can be replaced by derivational explanations, in a broader historical context. As is well-known, the “rule-less,” representation-based Principles and Parameters theory evolved from earlier rule-based systems. The construction-specificity and language-specificity of the phrase structure and transformational rules postulated represented a serious obstacle to explanatory adequacy: “How does the learner select this grammar with these rules, based on exposure to this (highly ‘degenerate’) data?” An entirely natural development was the gradual abandonment of rule-based grammars and the concomitant postulation of universal constraints on representations or principles (expressing the properties common to rules) which were consequently neither construction- nor language-particular entities. The residue, namely the language-specific properties of rules, to be fixed by “experience,” were ascribed the status of parameters with the hope that construction-specificity would be altogether eliminated from core grammar. While the abandonment of rule-systems and the adoption of Principles— that is, filters or wellformedness conditions on rule-generated representations—was an entirely natural development, there is an alternative, which I believe is reflected (perhaps only implicitly) in Chomsky (1991; 1993; 1994). The
alternative is this: given that it was the language-specificity and construction-specificity of rules and not the fact that they were rules per se that apparently threatened explanatory adequacy, an alternative to the postulation of principle-based theories of syntax is to retain a rule-based framework, but eliminate from the rules their language-particular and construction-particular formal properties. That is, instead of universal Principles (=constraints on representations, for example Binding Theory, Case Theory, X-bar Theory, Theta Theory, as postulated within a rule-“free,” principle-based system), an alternative is to postulate universal iteratively applied rules thereby maintaining a “strongly derivational” theory of syntax, like Standard Theory in that it incorporates iterative application of rules but unlike Standard Theory in that the rules are “universalized,” as are Generalized Transformation (Merge) and Move (Chomsky (1993; 1994)), purged of language-particular and construction-specific properties: their apparent cross-linguistic differences being attributed, by hypothesis, to irreducible morphological variation. In this paper, I will argue that this strongly derivational universal-rule approach, in which iterative rule application characterizes syntactic derivations, and perhaps constraints on output representations and constraints on levels of representation (hence levels themselves) are altogether eliminated (see Chomsky (1994) and Fall 1994 lectures), exhibits vast explanatory advantages over the existing, representational, “rule-free,” principle-based (hence representation-based) theories, at least in the domain of accounting for the absolutely central construct “syntactic relation”.
2 Syntactic Relations in Principle-Based Theory
In the pre-Minimalist, principle-based framework of Chomsky (1981; 1986), Representations (Phrase Structure Trees) are built “freely” by an unconstrained (implicit) base. 
The output is constrained by the X′-schema, a filter on output representations. The unifying construct “government” is a binary “unidirectional” (asymmetric) syntactic relation holding between two syntactic categories in a derived representation. The unifying syntactic relation, “government,” is defined as follows:
(3) Government:
a. X governs Y iff
   i. X m-commands Y, and
   ii. there is no Z, Z a barrier for Y, such that Z excludes X.
b. X m-commands Y iff the minimal maximal projection dominating X dominates Y. (See Aoun and Sportiche (1983).)
c. X excludes Y iff no segment of X dominates Y.
d. X dominates Y only if every segment of X dominates Y. (See Chomsky (1986, fn. 10).)
e. Z is a barrier for Y iff
   i. Z immediately dominates W, W a blocking category for Y, OR
   ii. Z is a blocking category for Y and Z≠IP.
f. A maximal projection X immediately dominates a maximal projection Y iff there is no maximal projection Z such that X dominates Z and Z dominates Y.
g. Z is a blocking category for Y iff
   i. Z≠L-marked, and
   ii. Z dominates Y.
h. X L-marks Y iff X is a lexical category that theta-governs Y.
i. X theta-governs Y iff
   i. X is a zero-level category, and
   ii. X theta-marks Y, and
   iii. X and Y are sisters.
While such an approach constitutes an impressive, highly explicit and unified analysis of a number of seemingly very disparate syntactic phenomena, it arguably suffers from the four bulleted problems noted above. First, it is not wholly unified since the “is a” relation is not a government relation. Second, “government” is a definition hence it is entirely unexplained why syntactic phenomena, by hypothesis, conform to this particular relation and not to any of the other infinite, alternative, and syntactically definable, relations. Third, “government” is not really a primitive relation since it incorporates the more primitive relation “m-command” (3a.i, 3b). Fourth, the definition of “government” is arguably “complex” (but see section 1 for serious unclarities surrounding such inexplicit claims). Let us begin by addressing the question of what is, by hypothesis, primitive. Following Chomsky (1993, fn. 9), I will assume that m-command in fact plays no role. I will however assume that c-command is indeed a primitive. In the next section, I will attempt to show that contrary to all syntactic analyses since Reinhart (1979), and including, most recently, Kayne (1993; 1994), c-command need not be expressed as a representational unexplained definition but can instead be expressed as a natural explicable derivational construct, assuming Chomsky’s (1993; 1994) elimination of a distinct base-component and, along with it, the elimination of a (base-generated) Deep Structure level of representation and the postulation of a syntactic component in which derivations are characterized by iterative, “bottom-up” application of “universalized” simple, and perhaps unifiable (Kitahara (1994; 1995; 1997)) rules, Merge (Generalized Transformation) and Move (Singulary Transformation).
3 C-Command
3.1 Representational C-Command
Consider definition (4).
(4) The Representational Definition of “c-command”
A c-commands B iff:
a. The first branching node dominating A dominates B, and
b. A does not dominate B, and
c. A does not equal B. (Reinhart (1979))
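Definition (4) can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the `Node` class, the function names, and the node labels are hypothetical, and the tree is a reconstruction of the clause structure that the prose attributes to (5) (the internal structure of the subject Da is invented for the example, since the text does not give it).

```python
# A minimal sketch of representational c-command as stated in (4).
# The Node class, helper names, and labels are illustrative assumptions.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if `other` lies properly inside the subtree rooted at self."""
        return any(c is other or c.dominates(other) for c in self.children)


def first_branching_node(a):
    """Lowest node properly dominating `a` that has at least two daughters."""
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node


def c_commands(a, b):
    fbn = first_branching_node(a)
    return (
        fbn is not None
        and fbn.dominates(b)      # clause (4a)
        and not a.dominates(b)    # clause (4b)
        and a is not b            # clause (4c)
    )


def all_nodes(root):
    yield root
    for child in root.children:
        yield from all_nodes(child)


# Clause structure as described for (5); the subject's daughters are assumed.
d_the, n_movie = Node("D_the"), Node("N_movie")
d_obj = Node("D", [d_the, n_movie])
v_like = Node("V_like")
vp = Node("VP", [v_like, d_obj])
infl = Node("INFL_will")
i_bar = Node("I'", [infl, vp])
d_a = Node("D_a", [Node("D_subj"), Node("N_subj")])
ip = Node("IP", [d_a, i_bar])

# Everything that the subject D_a c-commands under definition (4).
commanded = {n.label for n in all_nodes(ip) if c_commands(d_a, n)}
```

On this reconstructed tree, `commanded` contains exactly the seven categories that the text lists for [Spec, IP], and the stipulative character of clauses (4a–c) shows up as three separate conjuncts in `c_commands`.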
The first thing to notice is that this is a definition, hence explanation is lacking; that is, we have no answer to the question “Why is this particular binary relation syntactically significant?” As an illustration, consider, for example, (5). In (5), [Spec, IP] (=Da) c-commands I′, INFLwill, VP, Vlike, D, Dthe, Nmovie—and nothing else. The question is “Why?” It is exactly as if the other categories in (5) are, with respect to Da, (inexplicably)
“invisible,” hence Da enters into no relations with these other categories. That is, Da c-commands none of these others (although Da is c-commanded by some). We should also note some other important properties of c-command. In addition to being a definition, hence nonexplanatory, it is, secondly, pervasive and fundamental, apparently playing a unifying role throughout the different subcomponents of the syntax. Third it is persistent; that is, despite substantive changes in the theory of syntax, Reinhart’s definition, proposed 19 years ago, remains, by hypothesis, linguistically significant. Fourth, as noted, it is representational; that is, it is a relation defined on representation. (5) A Schematic Illustration of C-Command (AgrP and TP movements omitted (irrelevant))
The unanswered questions confronting c-command are thus, at least, those given in (6).
(6) a. Why does it exist at all? Why doesn’t A enter relations with all constituents in the tree?
b. Why is the first branching node relevant? Why not: “the first or second or third (nth?) node dominating A must dominate B?”
c. Why is branching relevant?
d. Why doesn’t A c-command the first branching node dominating A, but instead c-commands only categories dominated by the first branching node?
e. Why must A not dominate B?
f. Why must A not equal B?
Thus, arguably, one of the most fundamental unifying relations is expressed as a purely stipulated representational definition. Explanation is thereby precluded. The hypothesis I will advance is that the properties of c-command just noted are not accidental, but are intimately related. First, I believe it is fundamental, pervasive, and persistent because it is, by hypothesis, indeed a syntactically significant relation. Second, I propose that it is definitional (nonexplanatory) precisely because it has been formulated/construed as a representational relation. Third, I will propose that c-command is in fact derivational, that is, a relation between two categories X and Y established in the course of a derivation (iterative universal-rule application) when and only when X and Y are Paired (Concatenated) by Transformational Rule, either
Merge or Move. Construed derivationally, the unanswered questions confronting the representational definition will receive natural answers.
3.2 The Derivation of C-Command
To begin with, I will assume (7).
(7) a. Merge and Move (Chomsky (1993; 1994)) are at least partly unifiable (as proposed in Kitahara (1993; 1994; 1995; 1997)) in that each “Pairs” (=Concatenates) exactly two categories, “A” and “B,” rendering them sisters immediately dominated by the same (projected) mother “C,” (where C=the head of A or of B (Chomsky (1994))).
b. Given (7a), there is a fundamental operation, common to or shared by both Merge and Move alike, namely Concatenate A and B forming C (C=the head of A or of B).
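The operation assumed in (7b) can be sketched as set formation. The following Python is a minimal illustration, not a claim about any particular formalization: the function names `head`, `merge`, and `daughters` are hypothetical, lexical items are modeled as plain strings, and the choice of which input projects is passed in explicitly as an assumption.

```python
# A sketch of "Concatenate A and B forming C", with C = {label, {A, B}}
# and label = the head of A or of B. All names here are illustrative.

def head(obj):
    """Head (label) of a lexical item or of a set built by merge()."""
    if isinstance(obj, str):
        return obj
    (label,) = [m for m in obj if isinstance(m, str)]
    return label


def merge(a, b, projector):
    """Pair exactly two distinct objects; the projector supplies the label."""
    if a == b:
        raise ValueError("Concatenate requires two distinct objects")
    if projector not in (a, b):
        raise ValueError("the label must be the head of A or of B")
    return frozenset({head(projector), frozenset({a, b})})


def daughters(c):
    """The two sisters created by the Merge application that built c."""
    (pair,) = [m for m in c if isinstance(m, frozenset)]
    return pair


# The object written {Vlikes, {Vlikes, Dit}} in the text.
v_b = merge("Vlikes", "Dit", projector="Vlikes")
# A second, independent application, then Merge of the two outputs.
d_a = merge("Dthe", "Ndog", projector="Dthe")
v_c = merge(d_a, v_b, projector=v_b)
```

A single call to `merge` creates both the sisterhood of its two inputs (recoverable with `daughters`) and their common mother, so neither relation needs a separate definition in this sketch. One limitation of modeling items as strings in sets is that two occurrences of the same lexical item would collapse into one.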
Crucially then, what the universalized transformational rules Merge and Move each do is establish a syntactic relation between two concatenated syntactic categories A and B by virtue of placing the two in the “is a” relation with C, the projected category. I will also assume, with Chomsky (1994), that Merge operates “bottom-up”; that is, it applies cyclically (=a universal (independently motivated) constraint on a universal rule) and Move does so as well. Consider, for example, the derivation in (8). (8) Merging Vlikes and Dit yields, informally,
The “lower” Vlikes (=A) and Dit (=B) are by virtue of undergoing Merge, in a relation; namely, they are “the sister constituents of a V-phrase/projection C, labelled ‘Vlikes’”. Thus what Merge does is create sisters; that is, it concatenates exactly two categories A and B, and projects their mother C (C=the head of A or of B). Crucially A and B cannot be or become sisters without having a common mother. Conversely, if nonbranching projection is disallowed, and only binary branching is permitted, then there cannot be a mother C without exactly two daughters (=the sisters A and B). In a nutshell, both the sisterhood relation and the motherhood relation (the latter, the “is a” relation) are simultaneously created in one fell swoop, by (internal to) a single-application of Merge, generating C. Thought of in terms of standard-theory transformations, Vlikes and Dit constitute the Structural Description of Merge. The Structural Change (perhaps deducible, given the Structural Description (see Chomsky (1994))) specifies the categorical status of the mother or output tree/set. Thus invariably the two entities in the Structural Description are rendered sisters, that is, are placed in the “is a” relation to the projected (perhaps predictable) mother C, all of this internal to a single Merge application. Consequently, there is no need for a representational definition of “sister” or “mother”/“is a” since these two relations are clearly expressed (and unified) within the independently-motivated, universal structure-building rules themselves. Representational definitions would therefore be entirely redundant, and as definitions, nonexplanatory. The tree in (8) is formally represented as {Vlikes, {Vlikes, Dit}}. This object (set) consists of three “terms,” as given in (9).
(9) a. The entire tree/set (=C)
b. Vlikes (=A)
c. Dit (=B)
That is, following Chomsky (1994), I assume the definition (10).
(10) Term (“constituent”): For any structure K
(i) K is a term of K (the entire set or tree is a term).
(ii) if L is a term of K, then the members of the members of L are terms of K. (Chomsky (1994, 12))
(iii) K={Vlikes, {Vlikes, Dit}}=one term
(iv) K has two members: Member 1=“Vlikes”=“the label”; Member 2=a two-membered set={Vlikes, Dit}, with members M1 (=Vlikes) and M2 (=Dit)
(v) M1 and M2=members of a member, that is, each is a member of Member 2 of K. Therefore each is a term.
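The recursive definition in (10) can be checked mechanically. The sketch below is an illustration only: the function name `terms` and the encoding of structures as nested `frozenset` objects with string lexical items are assumptions made for the example.

```python
# A sketch of definition (10): K is a term of K, and the members of the
# members of any term are terms. Encoding and names are assumptions.

def terms(k):
    """Return the list of terms of structure k."""
    found = [k]                              # clause (i): K is a term of K
    if isinstance(k, frozenset):
        for member in k:
            if isinstance(member, frozenset):    # the label has no members
                for sub in member:               # clause (ii)
                    found.extend(terms(sub))
    return found


# K = {Vlikes, {Vlikes, Dit}}
k = frozenset({"Vlikes", frozenset({"Vlikes", "Dit"})})

# A larger structure built from K and {Dthe, {Dthe, Ndog}}.
d_a = frozenset({"Dthe", frozenset({"Dthe", "Ndog"})})
v_c = frozenset({"Vlikes", frozenset({d_a, k})})

terms_of_k = terms(k)
terms_of_v_c = terms(v_c)
```

Here `terms_of_k` has three elements (the whole set, Vlikes, and Dit), while the label itself contributes nothing further because it is not a set with members. For the larger structure the function returns seven terms, one for each node of the corresponding informal tree.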
Thus, “Terms correspond to nodes of the informal representations, where each node is understood to stand for the subtree of which it is the root” (Chomsky (1994, 12)). Continuing with the derivation, suppose, after constructing (8), we now Merge Dthe and Ndog, yielding informally the structure in (11). (11)
The tree is formally represented as {Dthe, {Dthe, Ndog}}, similarly consisting of three terms: the entire two-membered set and each of the two categories that are members of a member of the two-membered set (namely Dthe and Ndog). Now, having constructed the two three-membered trees in (8) and (11), suppose we Merge these two, yielding the structure in (12). (12)
But now notice that there exists a massive redundancy: • The representational definition of c-command (4) stipulates c-command relations between sisters in the derived representation (12).
• But, sisters are precisely the objects A and B which invariably undergo Merge in building the representation.
Thus, the relations in (13) obtain.
(13) In (12):
i. a. Dthe representationally c-commands Ndog: They were MERGED.
   b. Ndog representationally c-commands Dthe: They were MERGED.
ii. a. Vlikes representationally c-commands Dit: They were MERGED.
   b. Dit representationally c-commands Vlikes: They were MERGED.
iii. a. Da representationally c-commands Vb: They were MERGED.
   b. Vb representationally c-commands Da: They were MERGED.
iv. Vc representationally c-commands nothing: It has not undergone MERGE with another category.
v. In (12), the ten binary dominance relations (“X dominates Y”) are, by pure stipulation in (4b), not c-command relations: They were not MERGED.
vi. No category representationally c-commands itself (by pure stipulation in (4c)): No category is MERGED with itself.
Thus we see that Merge, an entirely simple, natural, minimal, and independently-motivated Structure Building Operation (i.e., transformational rule), seems to capture representational c-command relations. That is, if X and Y are Concatenated, they enter into (what we have called) “c-command relations”. Consequently, it would seem that we can entirely eliminate the stipulated, unexplained representational definition of “c-command” (4), since the relation is expressed by an independently motivated transformational rule. There is however a problem with this suggestion: when Merge pairs two categories, this establishes only symmetrical (reciprocal) c-command relations. Consider, for example, (14). Correctly, the arrows in (14a) each indicate c-command relations. But, Merge does not totally subsume the representational definition of “c-command”, precisely because there exist c-command relations between two categories that were not Merged. Thus, (14b) is true, but (14c) is false. (14)a
b. If A and B were Merged, then A c-commands B and B c-commands A.
c. If A c-commands B, then A and B were Merged.
To see the falsity of (14c) consider (15), in which Da and Vb are Merged. (15)
Da (=Spec) c-commands Head (Vlikes) and Complement (Dit), but Da (Spec) was not Merged with the Head nor with the Complement. As a solution to this problem confronting our attempt to entirely deduce representational, definitional c-command from Merge, notice that although Da (Spec) was not Merged with Vlikes (Head) nor with Dit (Complement), Da was Merged with Vb. But, now recall that Vb={Vb, {Vlikes, Dit}}; that is, Vb consists of three terms:
(16) (i) {Vb, {Vlikes, Dit}} (the whole Vb subtree in (15)), and
(ii) Vlikes, and
(iii) Dit
Thus, recall “…each node is understood to stand for the subtree of which it is the root”. Given that a syntactic category is a set of terms (in dominance/precedence relations), and that Vb, for example, consists of three terms, we can propose the following, natural derivational definition of “c-command.”
(17) Derivational C-Command (Preliminary Version): X c-commands all and only the terms of the category Y with which X was Merged in the course of the derivation.
Thus, Da (Spec) c-commands Vb (X′) and all terms of Vb. Recall that Move, the other Structure Building Operation, also Pairs/Concatenates exactly two categories, projecting the head of one, and in this respect is identical to Merge (Kitahara (1993; 1994; 1995)). Therefore, since “is a” relations are created by Move in the same manner as they are created by Merge, we can now propose the version in (18).
(18) Derivational C-Command (Final Version): X c-commands all and only the terms of the category Y with which X was Paired by Merge or by Move in the course of the derivation.
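Definition (18) can be read as a bookkeeping instruction: at each Pairing, record a relation from each of the two paired categories to every term of the other. The sketch below does just that, under stated assumptions: the Node and Derivation classes, the pair method and the node names (D_a, V_b, V_c, and so on) are hypothetical illustrations, names are assumed unique, and Merge and Move are not distinguished, since (18) treats them alike.

```python
from dataclasses import dataclass

@dataclass(eq=False)  # identity-based comparison: V_b and V_likes stay distinct
class Node:
    name: str
    children: tuple = ()

    def terms(self):
        """The node itself plus, recursively, the terms of its daughters."""
        out = [self]
        for child in self.children:
            out += child.terms()
        return out

class Derivation:
    def __init__(self):
        self.c_command = set()  # (X, Y) means "X c-commands Y"

    def pair(self, x, y, name):
        """Merge or Move: concatenate x and y, recording the relations in (18)."""
        for t in y.terms():
            self.c_command.add((x.name, t.name))
        for t in x.terms():
            self.c_command.add((y.name, t.name))
        return Node(name, (x, y))

d = Derivation()
the, dog, likes, it = Node("D_the"), Node("N_dog"), Node("V_likes"), Node("D_it")
V_b = d.pair(likes, it, "V_b")
D_a = d.pair(the, dog, "D_a")
V_c = d.pair(D_a, V_b, "V_c")

for relation in sorted(d.c_command):
    print(relation)
```

Nothing is computed over the finished tree: the relation set is simply a log of what was Paired with what. In particular, the root V_c never appears as a c-commander, because it was never Paired with anything.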
Given (18), consider the case of Move in (19).
(19)
By Reinhart’s (1979) definition (4), in the representation (19) Dhe, by unexplained definition, representationally c-commands the five categories (subtrees) I′, INFLwas, V, Varrested, Dt, and nothing else. But this unexplained state of affairs is explicable derivationally: Dhe was Paired/Concatenated (in this case by Move) with I′, and I′ is a five-term category/tree/set consisting of precisely I′, INFLwas, V, Varrested and Dt. It is entirely natural then that, since Dhe was Paired with a five-term object, and Pairing/Concatenation is precisely the establishment of syntactic relations, Dhe enters into a relation (what has hitherto been called “c-command”) with each of these five terms, and with nothing else.
Notice that, given this analysis, a certain (correct) asymmetry is also captured. While it follows that Dhe, as just noted, c-commands each of the five terms of I′, the converse is not true; that is, it is not true that each of the five terms of I′ c-commands Dhe. For example, INFLwas is a term of I′, but INFLwas does not c-command Dhe; rather, since in the course of the derivation INFL was Paired, this time by Merge, with V, our analysis rightly predicts that INFLwas c-commands each of the three terms of V, namely V itself, Varrested, and Dt, and nothing else.
Given the derivational definition of “c-command” (18), we can now answer questions that were unanswerable given the representational definition of c-command (4).
(20) a. Q: (Really an infinite number of questions:) Why is it that X c-commands Y if and only if the first branching node dominating X dominates Y?
A: It is the first (not, e.g., the fifth, sixth, nth (n=any positive integer)) node that appears relevant since this is the projected node created by Pairing of X and Y as performed by both Merge and by Move.
b. Q: Why doesn’t X c-command the first branching node dominating X, but instead only the categories dominated by the first branching node?
A: X was not Paired with the first branching node dominating X by Merge or by Move.
c. Q: Why is branching node relevant?
A: Assuming Bare Phrase Structure (Chomsky (1994)), no category is dominated by a nonbranching node; that is, Free Projection (as in Chomsky (1993)) is eliminated: Structure Building (Merge and Move) consists of Pairing, hence it invariably generates binary branching.
d. Q: Why must X not equal Y; that is, why doesn’t X c-command itself?
A: Because X is never Paired with itself by Merge or by Move.
e. Q: Why is it that in order for X to c-command Y, X must not dominate Y?
A: If X dominates Y, X and Y were not Paired by Merge or by Move.
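The claim that Pairing reproduces the content of the first-branching-node definition can be checked mechanically on a single tree. The following is a sketch under stated assumptions, not a proof: a tree is a nested tuple (name, left, right) with strings as leaves, node names are assumed unique, the derivation is assumed to be a cyclic, Merge-only one replayed bottom-up, and the function names are hypothetical.

```python
def name(n):
    return n if isinstance(n, str) else n[0]

def kids(n):
    return () if isinstance(n, str) else n[1:]

def terms(n):
    out = [name(n)]
    for k in kids(n):
        out += terms(k)
    return out

def derivational(n, rel=None):
    """Replay the derivation bottom-up; each binary node records one Pairing."""
    rel = set() if rel is None else rel
    if kids(n):
        left, right = kids(n)
        derivational(left, rel)
        derivational(right, rel)
        rel |= {(name(left), t) for t in terms(right)}
        rel |= {(name(right), t) for t in terms(left)}
    return rel

def representational(n, rel=None):
    """First-branching-node definition, computed on the finished tree."""
    rel = set() if rel is None else rel
    for x in kids(n):
        # n is the first branching node dominating x (binary branching assumed)
        dominated_by_n = [t for k in kids(n) for t in terms(k)]
        x_and_below = terms(x)  # x itself and everything x dominates
        rel |= {(name(x), y) for y in dominated_by_n if y not in x_and_below}
        representational(x, rel)
    return rel

tree_12 = ("V_c", ("D_a", "D_the", "N_dog"), ("V_b", "V_likes", "D_it"))
print(derivational(tree_12) == representational(tree_12))
```

The two functions arrive at the same set of pairs for this tree, but only the second has to mention dominance, non-identity and "first branching node" explicitly; in the first, those restrictions fall out of what was Paired with what.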
Thus, as is entirely natural, we propose that Pairing/Concatenating X and Y, by application of the universal transformational rules Move and Merge, expresses syntactic relations such as c-command. We have, thus far, provided what I believe to be strong explanatory arguments for the derivational construal of c-command proposed here. However, since we have thus far sought only to deduce precisely the empirical content of representational c-command, we have not provided any arguments that representational c-command is empirically inadequate. I will now provide just one such argument, suggesting that representational c-command “must” be abandoned; that is, it is inconsistent with an independently-motivated hypothesis. By contrast, derivational c-command will be shown to display no such inconsistency. Consider again a tree such as (21), in which Vb and Vc each=Vlikes and Da=Dthe. (21)
Recall that in the input to (i.e., the Structural Description of) Merge, there were the two categories in (22).
(22) a. Da=three terms:
1. Da itself (K is a term of K, see (10))
2. Dthe
3. Ndog, and
b. Vb=three terms:
1. Vb itself
2. Vlikes
3. Dit
Given that Da and Vb were Merged, derivational c-command (18) entails (23).
(23) Da c-commands Vb, Vlikes, and Dit.
Vb c-commands Da, Dthe, and Ndog.
But assuming a relational analysis of a syntactic category’s phrase-structure status (Muysken (1982), Freidin (1992)), in the representation (21) Vb, being neither a minimal nor a maximal projection of V (verb), is not a term (or is an “invisible term”) of (21) (Chomsky (1994)). Therefore, (algorithmically speaking) Vb is “stricken from the record” in (23); that is, it is not a c-commander at all. Consequently, Kayne’s (1994) reanalysis of Spec as an X′-adjunct is not required for LCA-compatibility, exactly as Chomsky (1994) proposed. Nor is Vb (=X′) c-commanded. Thus, in the informal representation (21), we have only the following relations, a proper subset of those in (23):
(24) a. Da asymmetrically c-commands Vlikes and Dit.
b. Dthe symmetrically c-commands Ndog.
c. Vlikes symmetrically c-commands Dit.
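The step of striking Vb “from the record” can be sketched as a filter over the derivationally established relations. The relation set below is written out by hand from (18) as applied to the derivation of (21); the underscore names and the variables derived, invisible, visible, asymmetric and symmetric are illustrative assumptions only.

```python
# Relations entailed by (18) for the derivation of (21), written out by hand:
# the sister pairs inside D_a and V_b, plus the relations listed in (23).
derived = {
    ("D_the", "N_dog"), ("N_dog", "D_the"),
    ("V_likes", "D_it"), ("D_it", "V_likes"),
    ("D_a", "V_b"), ("D_a", "V_likes"), ("D_a", "D_it"),
    ("V_b", "D_a"), ("V_b", "D_the"), ("V_b", "N_dog"),
}

# X'-invisibility: V_b is neither minimal nor maximal, so it is not a term.
invisible = {"V_b"}

# Strike every relation in which an invisible category takes part.
visible = {(x, y) for (x, y) in derived
           if x not in invisible and y not in invisible}

asymmetric = {(x, y) for (x, y) in visible if (y, x) not in visible}
symmetric = visible - asymmetric

print(sorted(asymmetric))
print(sorted(symmetric))
```

What remains after filtering is the content of (24): the Specifier asymmetrically c-commands the Head and the Complement, and each pair of Merged sisters c-commands each other. The relations were fixed when Pairing applied, so removing V_b afterwards creates no new relation from Head or Complement to Spec.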
But these are, by hypothesis, the desired results (facts). Importantly, V′ (=Vb), although representationally “invisible,” that is, not a term in the resulting representation (21), nonetheless blocks c-command of [Spec, VP] (=Da) by Vlikes, the Head, and by Dit, the Complement (see (24a)). But given that V′ is representationally invisible, the representational definition of c-command fails to even stipulate the apparent fact stated in (24a). That is, neither Vlikes nor Dit is a term of some other visible term which excludes [Spec, VP] (=Da) in the resulting representation (21). But, in direct contrast to representational c-command, since Vlikes and Dit were Merged with each other, derivational c-command (18) entails that they c-command each other and nothing else. Notice, Vlikes and Dit were at one derivational point, members of a term, namely Vb, which was a maximal term (Vmax) which excluded [Spec, VP] immediately after Merging Vlikes and Dit. However, given X′ invisibility, in the resulting representation, neither Vlikes nor Dit is a member of some term (other than themselves) which excludes [Spec, VP]; that is, there is no (visible) node (term) which dominates Vlikes and Dit, and also excludes [Spec, VP] (=Da). This suggests that the derivational construal of c-command proposed here is not only natural and explanatory as argued above, but given the empirical motivation for X′-invisibility, derivational c-command is in effect necessitated to the extent that the representational definition of c-command wrongly predicts that categories “immediately dominated” by a representationally invisible single-bar projection, for example, the Complement, c-command the Specifier, and (“worse yet”) all members of the Specifier.
3.3 Summary
The derivational definition of c-command (18) proposed here eliminates massive redundancy (see (13)), provides principled answers to an infinite number of unanswered questions confronting the definition of representational c-command (see (20)), and also overcomes empirical inadequacies (just noted) resulting from the interaction of the X′-invisibility hypothesis (Chomsky (1994)) and representational c-command (Reinhart (1979)). Moreover, the derivational definition is an entirely natural subcase of a more general hypothesis (explored below): namely all syntactic relations are formally expressed by the operation Concatenate A and B (=the Structural Description) forming C (=the Structural Change), common to both the structure-building operations (transformational rules) Merge and Move. Thus what Merge and Move do is establish relations including the “is a” relation and the c-command relation by virtue of concatenating categories. Nonetheless, despite its very significant advantages over representational c-command, the derivational definition is just that, a definition (albeit a very natural one), but as a definition, we must ask why it obtains. Thus we still have not answered at least one very deep question confronting the derivational approach, namely (6a), concerning c-command.
(6) a. Why does it exist at all? Why doesn’t A enter relations with all constituents in the tree?
The derivational definition of c-command does not answer this question; rather, it (very naturally) asserts that X enters into c-command relations with all and only the terms of the category with which it is transformationally Concatenated. But why doesn’t X enter into c-command relations with other categories/terms? I will attempt to address this in the following section.
3.4 Towards a Deduction of the Derivational Definition of C-Command
First, let us consider the case of two categories such that neither c-commands the other. Consider (25), in which Vb and VP each = Vlikes. (25)
In (25), Dthe and Dit are such that neither c-commands the other, illustrating the generalization that members of Spec do not c-command X′-members, and X′-members do not c-command members of Spec. The first conjunct of this generalization is illustrated by, for example, the Binding violation in (26).
(26) * [Spec This picture of John] [X′ upsets himself.]
The derivational definition (18) correctly entails that John fails to c-command himself in (26). But, the nonexistence of such c-command relations is, I think, deducible. Consider what I will call “The First Law,” namely that the largest syntactic object is the single phrase structure tree. Interestingly, this hypothesis is so fundamental, it is usually left entirely implicit. The standard, that is, representational construal can be stated as given in (27).
(27) The First Law: Representational Construal
A term (=tree=category=constituent) T1 can enter into a syntactic relation with a term T2 only if there is at least one term T3 of which both T1 and T2 are member terms.
Informally, by the most fundamental definition of “syntax,” there are no syntactic relations from one tree to another distinct tree; that is, the laws of syntax are intra-tree laws; X and Y can enter into syntactic relations only if they are both in the same tree. Note, in (25), the Merger-derived representation, there is indeed a tree (=the entire tree in (25)) such that Dthe (member of Spec) and Dit (the Complement) are both in it. But, as shown above (see (12)), derivationally in particular, prior to Cyclic Merger, Da (the Spec tree) and the Vb (=X′) tree were literally two unconnected trees. By the definition of “syntax” there can be no relation, including c-command, between members of two unconnected trees. To capture this, I propose that we reformulate the implicit First Law as a derivational, not a representational, law, which I give in (28).
(28) The First Law: Derivationally Construed
T1 can enter into c-command (perhaps, more generally, syntactic) relations with T2 only if there exists NO DERIVATIONAL POINT at which:
a. T1 is a term of K1 (K1≠T1), and
b. T2 is a term of K2 (K2≠T2), and
c. There is no K3 such that K1 and K2 are both terms of K3.
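The derivational First Law (28) lends itself to a small operational sketch: list the trees that exist at each derivational point and block every pair of categories that are, at some point, proper terms of two different unconnected trees. Everything in the encoding is an assumption made for illustration: trees are nested tuples (name, left, right), names are unique, a derivation is given as a hand-written list of stages, lexical items still in the Numeration are not listed as trees, and blocked_pairs is a hypothetical helper, not the formalization of Groat (1997).

```python
def name(n):
    return n if isinstance(n, str) else n[0]

def kids(n):
    return () if isinstance(n, str) else n[1:]

def proper_terms(n):
    """Terms of n other than n itself (conditions K1≠T1 and K2≠T2 in (28))."""
    out = []
    for k in kids(n):
        out.append(name(k))
        out += proper_terms(k)
    return out

def blocked_pairs(stages):
    """stages: one list of root trees per derivational point.
    Two distinct roots at the same point share no containing K3."""
    blocked = set()
    for roots in stages:
        for i, k1 in enumerate(roots):
            for j, k2 in enumerate(roots):
                if i != j:
                    for t1 in proper_terms(k1):
                        for t2 in proper_terms(k2):
                            blocked.add((t1, t2))
    return blocked

# Derivation of (25): two branching trees exist side by side before Merger.
V_b = ("V_b", "V_likes", "D_it")
D_a = ("D_a", "D_the", "N_dog")
stages_25 = [[V_b], [V_b, D_a], [("VP", D_a, V_b)]]

# Derivation of (32): the lexical item D_John is Merged directly with the V tree.
# "V_upsets" is an illustrative label for the verb of (31).
V_b2 = ("V_b", "V_upsets", "D_himself")
stages_32 = [[V_b2], [("VP", "D_John", V_b2)]]

print(sorted(blocked_pairs(stages_25)))
print(sorted(blocked_pairs(stages_32)))
```

For (25) every pairing of a member of Spec with a member of X′ is blocked in both directions, while D_a and V_b themselves are not, since each is a root rather than a proper term. For (32) nothing is blocked, because two unconnected branching trees never coexist.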
Informally, there are no relations between members of two trees that were unconnected at any point in the derivation. For a formal explication of the First Law, making the intuition presented here explicit, see Groat (1997) and Epstein, Groat, Kawashima and Kitahara (1998, chapter 6). In the derivation of (24), assuming Cyclicity, deducible for Move as hypothesized in Kitahara (1993; 1994; 1995) (a universal constraint on universal-rule application), there was necessarily a derivational point at which Dthe was a member of Da/Spec and Dit was a member of Vb/X′ but there did not “yet” exist a tree containing both the branching Da tree (Spec) and the Vb tree (X′). Therefore it follows from the derivational construal of The First Law, perhaps the most fundamental law of syntax, that there is no relation between Dthe and Dit. More generally, there are no relations between members of Spec and members of X′. We thus (at this point, partially) derive fundamental syntactic relations like c-command and entirely derive the nonexistence of an infinite number of logically possible, but apparently nonexistent, syntactic relations, each of which is representationally definable, for example, the relation from X to X’s great-great-great (…) aunt. Notice we do so with no stipulations, no “technicalia,” nothing ad hoc, but rather by appeal only to The First Law, derivationally construed. Notice, incidentally, that in (29) the two Merged trees, namely Da and Vb themselves, can enter into syntactic relations even though at one derivational point they were unconnected. That is, (27) entails that, since neither is a member of a term/tree other than itself (that is, each equals a root node), neither has undergone Merge or Move, hence each is (like a lexical entry) not “yet” a participant in syntactic relations. (29)
To summarize, for two nodes (trees/terms/categories) X and Y, where neither c-commands the other, we do not need to stipulate representational c-command (4) to block the relations. In fact, we do not even need to appeal to the far more natural (redundancy-eliminating, X′-invisibility-consistent) derivational definition of c-command (18). The derivational construal of The First (Most Fundamental) Law is sufficient: there are no syntactic relations between X and Y if they were, at any derivational point, members of two unconnected trees. As a simple illustration, consider again the Binding violation, repeated here with additional labels.
(30) * [Spec/Da This picture of John] [X′/Vb upsets himself.]
This type of Binding phenomenon now receives a very simple analysis. A reflexive requires an antecedent of a particular morphosyntactic type, by hypothesis an irreducible lexical property. “To have an antecedent” is to “enter into a syntactic relation”. However, The First Law, derivationally construed, precludes the reflexive from entering into any syntactic relation with the only morphosyntactically possible candidate John since, given Cyclic Merger, there existed a point in the derivation at which: (i) John was a member of Da (Spec), and (ii) himself was a member of Vb/X′, and (iii) Da (an ex-Spec-tant Spec) and Vb/X′ were unconnected trees. In direct contrast to (30), notice that (31) represents a grammatical sentence. This will be discussed presently.
(31) [Spec John] [X′ upsets himself.]
This completes our discussion of the deduction of those aspects of the derivational definition of c-command pertaining to two categories X and Y where neither c-commands the other. Next consider the case of asymmetric c-command; that is, X c-commands Y, but Y does not c-command X, as in (32), the tree representation of (31), in which Spec representationally c-commands the Complement but not conversely. The generalization to be accounted for is that Spec asymmetrically c-commands the Complement. The derivation is as in (32), given Cyclic Merge. Notice that DJohn (Spec) was never a member of some tree that did not contain Dhimself. Rather, Merge 2 Pairs/Concatenates DJohn itself (a member of the Numeration) with a tree containing Dhimself. Thus, correctly, The First Law (28) allows (i.e., does not block) a c-command relation from John (=T1 of (28)) to himself (=T2 of (28)). Such a relation is allowed by The First Law precisely because there were, in the course of the derivation, never two unconnected trees with one containing John and the other containing himself. (32)
In fact, notice that The First Law, a relationship blocker, is altogether inapplicable to this derivation since there never appeared two unconnected trees in this derivation. Rather, Merge 1 merges two members of the Numeration (Vlikes and Dhimself), formally forming {Vlikes, {Vlikes, Dhimself}}, while Merge 2 merges yet another element of the Numeration, DJohn (not a set/tree), with this object, yielding (33). (33)
Since The First Law (a relationship blocker) is inapplicable, we now in fact confront a problem: All relations are now allowed, not just the empirically supported (“c-command”) relation from Spec to Complement, but also incorrectly a c-command relation from Complement to Spec. That is, in the absence of any supplementary constraints (relationship blockers), the inapplicability of The First Law allows Complement to c-command Spec. As a possible solution to this problem, first recall that in the Minimalist Program (Chomsky (1993; 1994)), all Concatenation/Pairing is performed by either Merge or Move. As we have claimed above, what Merge and Move do, naturally enough, is express syntactic relations, including the “is a” relation. Now, if
the universal rules Merge and Move are the sole relationship-establishers, and in addition they apply, by universal-rule constraint, cyclically, it is altogether natural, if not necessary, that a relation between X and Y is established exactly at the derivational point at which X and Y are Concatenated. Given this, we now have a potential solution to our problem: a Complement never bears any relation to (e.g., never c-commands) Spec, because when a Complement is transformationally introduced, for example, Dhimself in (32a), Spec does not “yet” exist. Thus, a Complement and all members of a Complement invariably bear no relation to Spec. This is so simply because an entity X can never bear a relation to “a nonexistent entity”. Thus, derivational c-command and perhaps more generally the fundamental concept “syntactic relation” appear to be at least partially deducible from a derivational construal of The First Law (“the unconnected tree Law”) and from derivational preexistence (X cannot bear a relation to Y where Y is nonexistent). Importantly, the empirical facts are explained by appeal only to the independently-motivated and quite simple, formal properties of universalized transformational rules (Chomsky (1994)), of which there are two, perhaps unifiable as one, and their universalized, similarly simple, and perhaps explicable, mode of Cyclic application (Kitahara (1993; 1994; 1995; 1997)), acting in concert with the quite fundamental, perhaps irreducible First Law, derivationally construed, and the derivationally-construed law of preexistence asserting only that “Nothing can (ever) bear a relation to nothing.” In the following sections, we propose a derivational approach to two other apparently fundamental relations, the Head-Complement and the Spec-Head relations.
4 The Head-Complement Relation
Consider, for example, (34). In the derivation of (34), Vlikes was paired with Da by Merge. Da is a seven-term category, consisting of the terms listed in (35).
As discussed in the previous section, Vlikes c-commands all seven terms of Da, the term with which Vlikes was Paired. Thus, V enters into relations with (c-commands) only these seven terms since nothing else existed when Vlikes and Da were Cyclically Merged, and these seven terms are in effect “what Da is”. In fact, if a syntactic category/tree/term is, in part, defined as a set of terms (in dominance/precedence relations), then the theory makes a prediction. In (34), given that Vlikes was Paired with the seven-term Da tree, the theory in fact predicts that there should exist two types of relations, given in (36).
(35) 1. The Da tree/set itself
2. The branching N tree/set
3. The branching P tree/set
4. Dthat
5. Npicture
6. Pof
7. Dit
(36) a. A relation between Vlikes and each of the seven terms of Da, including Da itself (=c-command), and
b. A relation between Vlikes and Da, the seven-term tree itself (=the Head-Complement relation).
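The two relation types predicted in (36) can be read off a single Pairing, as the following sketch illustrates. The bracketing assumed for Da, roughly [D that [N picture [P of it]]], is inferred from the term list in (35); the tuple encoding, the node names N and P for the two branching categories, and the function pair are hypothetical.

```python
def name(n):
    return n if isinstance(n, str) else n[0]

def kids(n):
    return () if isinstance(n, str) else n[1:]

def terms(n):
    out = [name(n)]
    for k in kids(n):
        out += terms(k)
    return out

def pair(head, complement):
    """Relations predicted in (36) when a lexical head is Paired with `complement`."""
    return {
        # (36a): one relation per term of the complement, the complement included
        "c_command": [(head, t) for t in terms(complement)],
        # (36b): one relation to the very object that was Paired
        "head_complement": (head, name(complement)),
    }

# Assumed structure of the seven-term category D_a listed in (35).
D_a = ("D_a", "D_that", ("N", "N_picture", ("P", "P_of", "D_it")))

relations = pair("V_likes", D_a)
print(len(relations["c_command"]))
print(relations["head_complement"])
```

Both relation types are by-products of one and the same operation. No separate notion of a minimized domain is needed to single out D_a, since D_a is simply the second argument of the Pairing.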
Da itself is special among the seven terms (including Da) which constitute Da, since Vlikes was Paired with Da itself; that is, Vlikes and Da constituted the Structural Description of Merge. This completely natural analysis, couched in derivational/transformational terms, captures part of the representational definition of “minimal domain” proposed in Chomsky (1993). Consider (37), an “enriched” representation of (34). (37)
In order to account for (among other things) the Head-Complement relation, Chomsky (1993) incorporates the definition in (38).
(38) A Representational Definition of (Spec-Head and) Head-Complement Relations:
a. The Domain of a head α=the set of nodes contained in Max (α) that are distinct from and do not contain α.
i. Max (α)=the least full-category maximal projection dominating α. (In (37), for α=Vlikes, Max (α)=the set consisting of all categories in the circle and all categories in the square=a set of 10 categories/terms.)
b. The Complement Domain of a head α=the subset of the domain reflexively dominated by the Complement of the construction. (In (37)=the seven terms in the square which constitute Da.)
c. The Minimal Complement Domain of a head α=all members of the Complement Domain which are not dominated by a member of the Complement Domain. (In (37)=Da itself.)
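For comparison, the representational route to the same category can be sketched as follows. Only (38b) and (38c) are modelled, and the complement is taken as given, since the full tree (37) is not reconstructed here; the bracketing of Da, the tuple encoding and the function names are assumptions made for illustration.

```python
def name(n):
    return n if isinstance(n, str) else n[0]

def kids(n):
    return () if isinstance(n, str) else n[1:]

def reflexive_terms(n):
    """n together with everything n dominates."""
    out = [name(n)]
    for k in kids(n):
        out += reflexive_terms(k)
    return out

def subtrees(n):
    out = [n]
    for k in kids(n):
        out += subtrees(k)
    return out

def complement_domain(complement):
    """(38b): everything reflexively dominated by the complement."""
    return reflexive_terms(complement)

def minimal_complement_domain(complement):
    """(38c): members of the complement domain dominated by no other member."""
    members = subtrees(complement)
    dominated = set()
    for m in members:
        dominated |= set(reflexive_terms(m)[1:])  # what m properly dominates
    return [name(m) for m in members if name(m) not in dominated]

# Assumed structure of D_a, following the term list in (35).
D_a = ("D_a", "D_that", ("N", "N_picture", ("P", "P_of", "D_it")))

print(complement_domain(D_a))
print(minimal_complement_domain(D_a))
```

The representational procedure first collects all seven terms and then has to discard six of them by inspecting dominance, only to arrive at the one category that was Paired with the head in the first place.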
This fundamental “complex” representational definition, like its predecessor, the representational definition of “government” (see (3)), is just that: a definition, hence explanation is lacking; that is, the following questions are unanswered: “Why is the Complement Domain of a head (defined in (38b)) significant?”, “Why is the Minimal Complement Domain of a head (as defined in (38c)) significant?”, and “Why are these and not any of the other, infinite, logically possible, syntactically definable relations linguistically significant?” By contrast, derivationally the fundamental nature of the Head-Complement relation is revealed; that is, what the syntax does (more specifically, what Merge and Move each do) is establish syntactic relations by Pairing (two) categories. Derivationally, Vlikes was indeed Paired with Da, a seven-term category. Thus it is entirely natural, if not in fact predicted, that (39) obtains.
(39) a. Vlikes bears a relation to Da itself, namely the “Head-Complement” relation. Thus the representational (nonexplanatory) definition (38c) is unnecessary; and
b. Vlikes bears a relation to each member of Da (=the Complement Domain “un-minimized” as defined in (38b)), since these members constitute Da. This is the relation we have called “c-command;” and
c. the converse of (b) does not hold; that is, correctly, it is not the case that each member of Da bears a relation to Vlikes; certain members of Da underwent Pairing prior to the syntactic introduction of Vlikes, thereby “fixing” their derivationally-established relations “forevermore.”
5 The Spec-Head Relation
Consider the trees in (40). In (40a), INFL (=Head) is assumed to check Agreement and Nominative Case on [Spec, IP].1 In (40b), V assigns Agent to [Spec, VP]. The hypothesized generalization is thus: Case and Agreement (and external theta role) can be (perhaps “can only be”) assigned from Head to Spec. (40)
There has however been a long-standing problem confronting the expression of this relation; that is, the Head does not c-command the Spec, given the “first branching” node definition (4) from Reinhart (1979). This motivated m-command as in Aoun and Sportiche (1983). Under the derivational analysis proposed here, Spec c-commands the Head, but the Head, having been Cyclically Merged with the Complement, is created prior to and in the absence of Spec, and therefore the Head bears a relation only to Complement and members of Complement; that is, there is no relation from Head to Spec. There is, however, at least one possible solution to this apparent problem. First recall that when we Merge two categories A and B, they form a new category, the label of which is identical to the head of either A or of B. Thus, for example, consider the tree in (41). (41)
Thus (Chomsky (1994, 11)) “…Merge…is asymmetric, projecting one of the objects to which it applies, its head becoming the label of the complex formed.” This reflects a more general and very heavily restricted hypothesis concerning the inventory of syntactic entities (see section 1), namely (Chomsky (1994, 27)) “There are…only lexical elements and sets constructed from them.” Therefore, when [Spec, VP] is Paired with the entire tree/set in (41), it is Paired with (and thus the Structural Description contains) a category “head-labeled” Vlikes. This is shown in (42).
(42)
Thus, Spec is indeed Merged with a (complex) category bearing the morphological features (label) of the head Vlikes. If this analysis is maintainable, the Spec-Head relation can also be captured as a relation established by Merge/Move. If feasible, this would represent a clear advance over nonunified, nonexplanatory theories invoking not only a representational definition of “c-command” but in addition, a representational definition of “m-command” (the latter postulated precisely to capture the relation from Head-to-Spec), inexpressible as a representationally-defined c-command relation.2
6 Summary and Discussion
In this paper, I have proposed a syntactic theory in which, arguably (at least some of) the most fundamental syntactic relations posited, including “c-command,” “is a,” “Spec-Head,” and “Head-Complement,” are not formally expressed as unexplained representational definitions. I have proposed instead that such syntactic relations are derivational constructs expressed by the formally simple (“virtually conceptually necessary” (Chomsky (1994))) and unified (Kitahara (1993; 1994; 1995; 1997)) universalized transformational rules Merge and Move, each motivated on entirely independent grounds in Chomsky (1993). The theory of syntactic relations proposed here, seeking to eliminate central representational definitions such as “Government,” “Minimal Domain,” and “c-command” is entirely natural and, I think, explanatory. Concatenation operations are by (minimal) hypothesis, a necessary part of the Syntax; that is, there must exist some concatenative procedure (the application of which, by hypothesis, yields representations of sentences). But while there “must” be concatenative operations of some sort, it is not the case that in the same sense there “must” be Principles, that is, Filters or Well-Formedness Conditions on representation.
The question I have sought to investigate here is thus: “Are the simple, independently-motivated, virtually conceptually necessary, Structure-Building Operations themselves, specifically the universalized transformational rules Merge and Move, iteratively applied in conformity with the Cycle, sufficient to capture (the) fundamental syntactic relations?” The tentative answer is that they seem to be. If they are, a theory of syntax expressing this will, as a result, attain a much more unified, non-redundant, conceptually simple, and correspondingly explanatory account of what is a most fundamental syntactic construct, that of “syntactic relation,” known in advance of experience by virtue of what is by hypothesis a (uniquely) human biological endowment for grammar formation.
Notes
* This is a revised version of a draft originally written in the Summer of 1994. Portions of this material were presented at the Harvard University Linguistics Department Forum in Synchronic Linguistic Theory in December of 1994. I thank the members of that audience for very helpful discussion, in particular Naoki Fukui, M.Koizumi, and Ken Wexler. A later version was presented in April 1995 at the Linguistics Department at the University of Maryland. I thank members of that department as well for their hospitality and for very insightful comments, especially Norbert Hornstein, Juan Carlos Castillo, and Jairo Nunes. I am especially grateful to the following people for extensive discussion of the ideas presented here: Maggie Browning, Noam Chomsky, Robert Freidin, Erich Groat, Hisa Kitahara, Elaine McNulty, Esther Torrego, Sam Gutmann, Günther
Grewendorf, Joachim Sabel, Robert Berwick, David Lieb, Robert Frank, and Larry Wilson. I am also particularly indebted to Suzanne Flynn and to Höskuldur Thráinsson for their help during this project. Finally, I also thank Matthew Murphy, Elizabeth Pyatt, and Steve Peter for indispensable editorial assistance with this manuscript. A modified version of this article appears as chapter 1 of Epstein, Groat, Kawashima and Kitahara (1998). I gratefully acknowledge Oxford University Press, and in particular, Peter Ohlin for granting Routledge permission to publish this article in this collection.
1 Here for the purposes of illustration, I assume a pre-Pollock (1989) unsplit Infl. In fact, the unsplit Infl might not be simply illustrative, but may be empirically correct for English, as Thráinsson (1996) argues. By contrast, Icelandic would display a truly Split Infl, AgrS0 and T0 (see Jonas and Bobaljik (1993), Bobaljik and Jonas (1996), and Bobaljik and Thráinsson (1998)).
2 For a different analysis of Spec-Head Relations, see Epstein, Groat, Kawashima and Kitahara (1998).
References
Aoun, J. and Sportiche, D. (1983) “On the Formal Theory of Government,” The Linguistic Review 2:211–235.
Bobaljik, J. and Jonas, D. (1996) “Subject Positions and the Roles of TP,” Linguistic Inquiry 27:195–236.
Bobaljik, J. and Thráinsson, H. (1998) “Two Heads Aren’t Always Better Than One,” Syntax: A Journal of Theoretical, Experimental and Interdisciplinary Research 1:37–71.
Chomsky, N. (1981) Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press.
Chomsky, N. (1986) Barriers, Cambridge, Mass.: MIT Press.
Chomsky, N. (1991) “Some Notes on Economy of Derivation and Representation,” in Freidin, R. (ed.) Principles and Parameters in Comparative Grammar, Cambridge, Mass.: MIT Press.
Chomsky, N. (1993) “A Minimalist Program for Linguistic Theory,” in Hale, K. and Keyser, S.J. (eds.) The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, Cambridge, Mass.: MIT Press.
Chomsky, N. (1994) “Bare Phrase Structure,” MIT Occasional Papers in Linguistics 5, MIT Working Papers in Linguistics, Department of Linguistics and Philosophy, MIT.
Epstein, S.D., Groat, E., Kawashima, R. and Kitahara, H. (1998) A Derivational Approach to Syntactic Relations, New York/Oxford: Oxford University Press.
Freidin, R. (1992) Foundations of Generative Syntax, Cambridge, Mass.: MIT Press.
Groat, E. (1997) “A Derivational Program for Syntactic Theory,” unpublished doctoral dissertation, Harvard University.
Jonas, D. and Bobaljik, J. (1993) “Specs for Subjects: The Role of TP in Icelandic,” in Bobaljik, J. and Phillips, C. (eds.) Papers on Case and Agreement I, MIT Working Papers in Linguistics, Department of Linguistics and Philosophy, MIT.
Kayne, R. (1993) “The Antisymmetry of Syntax,” manuscript, City University of New York.
Kayne, R. (1994) The Antisymmetry of Syntax, Cambridge, Mass.: MIT Press.
Kitahara, H. (1993) “Deducing Strict Cyclicity from Principles of Derivational Economy,” paper presented at the 16th GLOW Colloquium, Lund, Sweden.
Kitahara, H. (1994) “Target-α: A Unified Theory of Movement and Structure-Building,” unpublished doctoral dissertation, Harvard University.
Kitahara, H. (1995) “Target α: Deducing Strict Cyclicity from Derivational Economy,” Linguistic Inquiry 26:47–78.
Kitahara, H. (1997) Elementary Operations and Optimal Derivations, Cambridge, Mass.: MIT Press.
Lasnik, H. and Saito, M. (1992) Move α, Cambridge, Mass.: MIT Press.
Muysken, P. (1982) “Parameterizing the Notion ‘Head’,” Journal of Linguistic Research 2:57–75.
Nash, L.K. (1963) The Nature of the Natural Sciences, Boston: Little, Brown, and Co.
Pollock, J.-Y. (1989) “Verb Movement, Universal Grammar, and the Structure of IP,” Linguistic Inquiry 20:365–424.
Reinhart, T. (1979) “Syntactic Domains for Semantic Rules,” in Guenthner, F. and Schmidt, S. (eds.) Formal Semantics and Pragmatics of Natural Language, Dordrecht: Reidel. Thráinsson, H. (1996) “On the (Non-)Universality of Functional Categories,” in Abraham, W., Epstein, S.D., Thráinsson, H. and Zwart, J.-W. (eds.) Minimal Ideas: Syntactic Studies in the Minimalist Framework, Amsterdam: John Benjamins.
INDEX
A-bound categories 56–8, 62, 75, 169 A-chains/Ā-chains: derivational constraints 95–122; LF 29–49; local binding conditions 8, 29–49 A-indexing at S-Structure 118 absolute scope 51, 54–6 Absorption 129, 130–31, 135, 164–5 Abstract Case theory see Case theory adjunct trace 79 Adjunction 24, 64, 65, 74, 78, 125, 135, 164; category-duplicating 10; mixed theory 10; and pronominal variable binding 9, 51–65; segmental theory 10, 161, 174; VP 76, 111 Affect-α 109–11, 117, 118, 173, 186 anaphor 49; clitics 9, 31; overt 44, 47 antecedent 44, 45 antecedent (head) government 79, 175, 128, 145, 152; index-sensitive 125, 159–61, 164 Aoun, J. 52, 74, 79, 91, 101, 103, 126, 142, 164, 176, 207 arbitrary interpretation 10 Authier, J.M.P. 78
Baker, C.L. 96, 117 Baltin, M.R. 106, 141 bare phrase structure theory 125, 161–3 barriers 142, 188 Barss, A. 61, 76 Belletti, A. 91 Berwick, R. 209 Bijection Principle 14, 75, 78, 170 Binding Theory 3, 5, 9, 187; Empty Category Principle see Empty Category Principle; Local Binding Condition 9, 29–49, 78, 170; Principle A 15–16; Principle B 13, 15–16; Principle C 7, 13, 15, 18, 120; strong binding 73–4, 78; and Strong Crossover 13–18; violations 18, 202 blocking categories 188 Bobaljik, J. 174, 209 Borer, H. 75, 170 branching 190 Brody, M. 16, 46, 47, 169 Browning, M. 76, 77, 78, 164, 168, 209 c-command 52, 54, 56, 74, 189–204; derivation 191–9, 200–4 Case bearing/checking 8 Case Filter 7–8, 47–8, 83–5, 92; analysis 87–91; elimination 91–2, 111; reduction to Theta Criterion 16, 47, 83–6, 120; reduction to Visibility Principle 84, 85–6; violations 7–8, 83–5, 87–91 Case marking 16, 26, 76, 86 Case Theory 13, 16, 47, 83, 187 Castillo, Juan Carlos 209 category-neutralized VP-recursion structures 9 chain formation: minimal domain 177; structure 136; structure-based constraint 99–100, 103 chain-internal lowering 79 chain theory 6 checking theory 125, 145–61 Chomsky, Noam;
A-to-A movement 76; bare phrase structure theory 125, 145–61; Binding Theory 3, 13–14; Case Filter 8, 92; central unifying thesis 9; chain formation 103; checking theory 125, 145–7; copy theory 6; definition of variable 75; derivational economy principle 164; difference 91; differentiation principles 83–4; E-language 7; Economy Constraint 95–122; Empty Category Principle 22; Full Interpretation 73, 96, 98, 104, 168; Functional Determination 13, 47; Government 3, 48, 85; Government and Binding Theory see Binding Theory; gravitational forces 4; & Lasnik 101, 125, 136, 176; Last Resort 8; Leftness Condition 59, 60–61, 64, 65; local binding 29–49, 62, 78; Minimal Domain 185, 205; Minimalist Program 3, 6, 159, 186, 203; necessity 104; null subject languages 26, 67, 69–71, 76; principle-based framework 187, 188; Projection Principle 24; Resumptive Pronoun Parameter 14, 17; Selection 113; Strict Cycle Condition 100–101, 171; Strong Crossover 13; Superiority Condition 167; syntactic relations 184–7; syntactic VP-adjunction of wh-phrases 111; Theta Criterion 14, 16, 33; transformational rules 204; ungrammaticality 87, 120; variables 169; Visibility Principle 47, 83, 86; V-raising 158; X′-invisibility hypothesis 199; Y-model 101 Clark, R. 79 clitics: anaphoric 9, 31;
chains 32–3; LF 37, 44, 46; traces 47 coindexing 58–9, 62, 77, 121, 151 Collins, C. 166 COMP 17–18; embedded 22, 24; indexing rule 102–3 complexity 185 Concatenates 186, 191, 194–7 Concatenation/Pairing 203 contraindexing 60 Contreras, H. 77 Control Theory 13 copy theory 6 Covert Verb-Second 9, 125–77 Cowan, M. 4 Cyclicity 7, 201, 202, 204, 208 D-position 119–20 D-Structure 5, 33, 87–91, 107, 147 Dative Deletion 25 Davis, L.J. 79, 91 Deep Structure 77, 189 definitions 188 degree-clause constructions 70, 71 deletion: checking-induced 164; deletion-under-identity rule 25 Derivational Economy Principle 8, 164 derivational theories 6–7, 111–12 Descartes, René 4 differentiation: syntactic 83–92; ungrammaticality 86–91 distributed questions 51–2 domination 144, 188 Earliness Principle 95, 114–17, 119, 120–22 Economy Constraint 95, 104–12, 114–22 Einstein, A. 183 E-language 7 eliminative theory 8 empty categories 18, 25 Empty Category Principle (ECP) 22, 46, 64, 78–9, 98, 102, 116–17, 126–32, 152, 171, 176; violations 87, 97, 127, 136–7, 142, 143, 148, 150, 156 Epstein, Joseph 4–5, 9
Epstein, S.D. 6, 8, 24–6, 47, 61–2, 75–8, 91–2, 102–4, 111, 118, 120, 132, 138, 164, 167, 170–71, 173, 176, 201, 209 explanation 185 expletives 71, 72, 74; replacement 78 Fiengo, R. 76, 77 Filters 6, 208; LF 101–3 First Law 201–4 Flynn, S. 209 Frank, R. 209 Free assignment of features 16 free empty category 69 Freidin, R. 18, 42, 85, 162, 164, 198, 209 Fukui, Naoki 209 Full Interpretation (FI)/economy of representation 73–4, 78, 96, 98, 104, 146–8, 156, 167–8, 173 Functional Determination 5–6, 13–18, 47; empty categories 25; and Strong Crossover 13–18 Galileo Galilei 183 Generalized Transformation (Merge) 7, 186–99, 208 Government 3, 48, 52, 85, 184–5, 188–9 Government-Binding theory see Binding theory gravitational forces 3–4 Greed 162–3 Grewendorf, Günther 164, 209 Grinder, J. 25 Groat, E. 6, 164, 177, 201, 209 Gutmann, S. 209 Head-Complement Relation 204–6 head domains 172–3 head movement 171 Head Movement Constraint (HMC) 121, 147; violation 148, 151, 153–4 Higginbotham, J. 17, 18, 59, 60–61, 64, 65, 129 Hornstein, N. 101, 103, 126, 164, 176, 209 Huang, C.-T. J. 9, 22, 24, 37, 98, 99, 103, 107, 117, 136, 164, 168 Humboldt, Wilhelm von 4, 9 I-language 7 I as proper governor at LF 140–44 illicit free thematic empty categories 71 Indefinite Deletion 25
indexing algorithm 118 indexing on wh-adjuncts 118 index-sensitive head government 10, 125, 159–61, 164 infinitival projection (IP) 9 Intrinsic Feature 6 I-raising 158; at LF 125; syntactic versus LF 145–61 I-to-C raising 150 Jacobson, P. 64 Jonas, D. 174 Kawashima, R. 6, 201, 209 Kayne, R. 189, 198 Kimball, J. 25 Kiss, K. 105–6 Kissock, M. 164, 174 Kitahara, H. 6, 164, 166, 171, 174, 186, 189, 191, 195, 201, 204, 208, 209 knowledge-state, linguistic 3 Koizumi, M. 209 Koopman, H. 13–18, 47, 64, 75, 96, 131, 169, 170, 171 Kuno, S. 64 L-marking 142 L-model 101 Larson, R. 159 Lasnik, H. 10, 15, 18, 24, 30–32, 42, 45–6, 61, 64, 75–9, 83, 85–7, 91–2, 96–112, 113, 117, 119, 125, 128–34, 136, 138–43, 151–2, 164, 176, 186 Last Resort 8 Law, P. 175–6 Lee, E.-J. 136 Leftness Condition 59, 60–61, 64, 65 Lexical proper government 110–11, 129 LF (logical form): A-chains 29–49; anaphor cliticization 9; Chains 9, 29–49; cliticization 37, 44, 46; Comp-to-Comp movement 98–9; delayed enforcement 8; filters 101–3; forced raising 110; I as proper governor 140–44; LF-illegitimate feature 8; Move-alpha 149; Movement Visibility 146–7, 148, 150, 153–4, 174;
Null Operator 9; Quantifier Movement rule 67; representation see LF representation; Strict Cycle Condition 100–101, 103, 171; Theta Criterion 37; V-raising 153 LF representation 8; category-neutral 164; Government-Binding-type theory 67; of PROarb 10, 21–6 Lieb, D. 209 Local Binding Condition 9, 29–49, 62, 78, 170 m-commands 142, 144, 188, 189 McNulty, E. 16, 45, 61, 76, 91, 117, 164, 209 Martin, R. 177 matrix clause 23 matrix empty categories 68, 69, 72, 74 matrix predicate 72 matrix scope 24, 77, 96, 106–7 matrix subject 70, 71 maximal projections 74 May, R. 10, 21, 24, 51–65, 67–9, 74, 77–9, 119, 129, 141, 143, 161, 171, 174 mentalism 3–4 Merge rule (Generalized Transformation) 7, 186–99, 208 mind-body problem 3 Minimal Complement Domain 206 Minimal Domain 185, 205 minimal domain of the chain 177 Minimal Link Condition 177 Minimalist Program 3, 4, 6, 159, 163, 186, 203 Move rule (Singulary Transformation) 6, 7, 186–99, 208 Move-α rule 83, 120, 149, 186 Movement 8 Movement Visibility 146–7, 148, 150, 153–4, 174 multiple interrogation reading 77–8 Murphy, M. 209 Muysken, P. 162, 198 Nash, L.K. 183 natural language grammars 83 necessity 104 New Science 4 Newton, I. 183 non-null subject languages 26 nonsingleton chains 79 Null Operator constructions 9; quantification 8–9, 67–79;
S-Structure position 78, 79 null subject languages 21, 26, 67, 69–71, 76 Nunes, Jairo 209 object antecedents 49 object empty category 73, 75 Ohlin, P. 209 Operator Disjointedness Condition (ODC) 125, 128–32, 163, 167; reformulating 132–6 Overt Scope Marking 9, 125–77 Pairing 197 Pairs/Concatenates 194–7, 202 parasitic gaps 69–70 Path Containment Condition 51–4, 166 paths: intersection 53; segments 53; structure 53, 62–3 Pesetsky, D. 51, 53, 95, 97, 114, 115, 117, 120–22, 166, 168 Peter, S. 209 PF-illegitimate feature 8 phrase structure rules 83 Phrase Structure Trees 188 Pollock, J.-Y. 91, 110, 145, 146, 174, 209 Poole, G. 164, 171 Postal, P. 64, 77 Primitive constructs 185 Principle-Based theory 188–9 Principle P 120 PRO 16; null Case 170; pro: quantifier-pro 10, 21–6; universal quantifier interpretation 21, 24 PROarb: LF representation 10, 21–6; as quantifier phrase 21–4; as variable 21 Projection Principle 24, 30, 33, 37, 42, 43, 79; violation 86 pronominal variable-binding 17–18, 51–65, 78; Adjunction 9, 51–65; analysis 56–8; backward 64; crossed 55, 58, 60–61, 64 Proper Binding Condition 134, 150, 156, 166, 167
proper government 138–44, 159–60 Pyatt, E. 209 Quantification 9, 10; null operator constructions 8–9, 67–79; vacuous 68, 69, 74, 78, 167 Quantifier Interpretation 101 Quantifier Lowering 8–9, 67–9, 73, 76, 78, 79 Quantifier Movement analysis 67–9, 71, 72 Quantifier Phrase 21–4, 67–9; topicalized 105–7 Quantifier Raising 10, 21–2, 67–9, 72, 76, 106 Quantifier-pro 10, 21–6 Range Assignment Principle 73, 74, 78 Recall Move 195 reduction 83–92 Reindexing Rule 17–18, 59–60 Reinhart, T. 64, 132, 189, 196, 199, 207 relative scope 51–4 Relativized Minimality 177 representational approach 5 Representations (Phrase Structure Trees) 188 Resumptive Pronoun Parameter 14, 16, 17 R-expression 18, 42 Riemsdijk, H. van 101, 103, 118, 164 Rizzi, L. 30–32, 34, 41, 45–8, 79, 118, 152, 157–8, 168, 175–6 Rosenbaum, P. 25 S, embedded 24 S-adjunction 119 S-Structure 5, 21–4, 30–37, 39–42, 44, 69–71, 101; A-indexing 118; Case Filter violation 87–91; Case requirements 120; filter elimination 91–2, 111; null operator conditions 78, 79; specifier-head coindexing 151; Theta Criterion violation 87–91; [+Wh] Comp Filter application 112–17 Sabel, J. 209 Safir, K. 64, 170 Saito, M. 10, 15, 46, 64, 75, 77–9, 87, 96–112, 119–20, 138–43, 151–2, 186 satisfaction 120 scope: absolute 51, 54–6; narrow (embedded) 72, 73, 77, 79, 96;
overt marking 9, 125–77; unambiguous 71; wide (matrix) 24, 77, 96, 106–7, 132 Scope-Marking Condition (SMC) 125, 135–40, 163, 166, 167 Scope Principle 10, 51–4, 68 Seely, D. 6, 8 Selection 113 self-attachment 163 singleton chain 86; Visibility 88 Singulary Transformation 6, 7, 186–99, 208 Spec-Head Relation 206–8 Spell-Out 133–4 Sportiche, D. 13–18, 47, 52, 64, 74–5, 96, 101, 103, 126, 131, 142, 164, 169, 170, 176, 207 standard Minimalism 6 Standard Theory 187 Stowell, T. 64, 75, 76, 77–8, 79, 83, 97, 170 Strict Cycle Condition 100–101, 103, 171 Strong Crossover (SCO) 5, 7, 13–18; and Binding Theory 13–18; generable configurations 15–17; solutions 17–18 Structural Change 192 Structural Description 192 Structure Building Operations 195, 197, 208 Subjacency 15, 37, 70, 76, 110, 111, 116, 144; violations 87, 142 Suñer, M. 21, 24, 25 Super-Equi 25 Superiority Condition 126–8, 129, 167 superiority effects 125–77 syntactic adjunction 143 syntactic differentiation 83–92 syntactic relations 6, 184–7; in principle-based theory 188–9 syntax, UN-principled 183–209 Takahashi, Daiko 164 Theory of Control 16 Theta Criterion 7, 14, 16, 29–37, 187; constraint on wh-movement 98–100; D-Structure violation 87–91; theta-chain 14, 16–17; violations 7–8, 17, 33, 83–5, 87–91 Thráinsson, H. 164, 174, 209 Tiedeman, R. 164 topicalization 141, 144
Torrego, E. 61, 76, 91, 117, 164, 209 tough constructions 70, 71–2, 76, 77, 79 Tough Movement 77 trace theory 6 transformational component 83 transformational rules 6–7, 204; Merge 7, 186–99, 208; Move 6, 7, 187–99, 208 transformations 5 trees/representations 6 undistributed questions 51–2 ungrammaticality 86–91, 107, 120, 132, 150 unification 120, 184–5 Universal Grammar 37, 43, 67, 83, 95, 115, 120 universal quantifier 21, 24 UN-Principled Syntax 6, 183–209 Uriagereka, J. 75, 76, 83, 105, 107 V-movement 153 V-raising 125, 153, 158, 160, 173; syntactic versus LF 147–50 Vacuous Movement Hypothesis 120 vacuous quantification 68, 69, 74, 78, 167 variables 169; binding 17, 73; definition 6, 13, 14–15, 75; strong binding 73–4 Visibility Principle 7, 47, 84, 85–6, 88–92 VP Adjunction 76, 111 VP-Internal Subject Hypothesis 131, 150, 152 Wasow, T. 64 Weak Crossover 10, 14, 58, 76 Well-Formedness Conditions 208 Wexler, K. 209 wh-phrases 16, 18, 42; A-binding 130–31; absorption 130–31; assigned scope by movement 120–21; Case 18, 92; coindexing 60; D-linked 121, 168; Economy Constraint 105; LF movement 129; local binding 77; Movement from Comp 96–105; non-topicalizability 107–9; scope interpretation 54–6, 127–8;
scope-marking condition 136–40; syntactic movement 113–16; syntactic VP-adjunction 111–12; theta-based constraint 98–100; [+Wh] Comp Filter 112–17; [−Wh] Comp Filter elimination 109–11 wide-scope interpretation 24, 77, 96, 106–7, 132 Williams, E. 64, 75, 101, 103, 118, 164, 171 Wilson, L. 209 X-bar Theory 149, 156, 161–3, 187 X′-invisibility hypothesis 199 X-projection 74 Y-model 6, 101