From Orthography to Pedagogy: Essays in Honor of Richard L. Venezky

Richard Lawrence Venezky, 1938-2004.

A call from the hospital during the Festschrift meeting in which the participants listened intently to Dick Venezky's recognition of the proceedings. (Photo by Dominic Massaro)

FROM ORTHOGRAPHY TO PEDAGOGY Essays in Honor of Richard L. Venezky

Edited by

Tom Trabasso University of Chicago

John Sabatini Educational Testing Service, Princeton, NJ

Dominic W. Massaro University of California, Santa Cruz

Robert C. Calfee University of California, Riverside

2005

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS Mahwah, New Jersey London

Copyright © 2005 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without the prior written permission of the publisher. Lawrence Erlbaum Associates, Inc., Publishers 10 Industrial Avenue Mahwah, New Jersey 07430 www.erlbaum.com

Cover design by Kathryn Houghtaling Lacey

Library of Congress Cataloging-in-Publication Data From orthography to pedagogy : essays in honor of Richard L. Venezky / edited by Tom Trabasso . . . [et al.]. p. cm. Includes bibliographical references and index. ISBN 0-8058-5089-9 (h. : alk. paper) 1. Reading—Code emphasis approaches. 2. English language—Orthography and spelling—Study and teaching. I. Venezky, Richard L. II. Trabasso, Tom. LB1050.22.F76 2005 372.46'5—dc22

2004053325 CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability. Printed in the United States of America 10 9 8 7 6 5 4 3 2 1

Contents

In Memoriam
Preface

1 The Exploration of English Orthography
  Robert C. Calfee

2 Phonological Variation and Spelling
  Rose-Marie Weber

3 The Magic of Reading: Too Many Influences for Quick and Easy Explanations
  Dominic W. Massaro & Alexandra Jesse

4 When Actions Can't Speak for Themselves: How Might Infant-Directed Speech and Infant-Directed Action Influence Verb Learning?
  Khara Pence, Roberta Michnick Golinkoff, Rebecca J. Brand, & Kathy Hirsh-Pasek

5 The Role of Causal Reasoning in Understanding Narratives
  Tom Trabasso

6 Theory and Practice of Using Information Text for Beginning Reading Instruction
  Michael L. Kamil, Diane Lane, & Emma Nicolls

7 Research and Theory Informing Instruction in Adult Literacy
  Banu Oney & Aydin Yucesan Durgunoglu

8 What Does It Mean to Comprehend or Construct Meaning in Multimedia Environments: Thoughts on Cognitive and Assessment Construct Development
  John Sabatini

9 From Real Virtuality in Lascaux to Virtual Reality Today: Cognitive Processes With Cognitive Technologies
  David Mioduser

10 Gaining Perspective Through Science: A History of Research Synthesis in Reading
  Timothy Shanahan

11 Literacy and New Technologies: Ten Principles for Assisting the Poor
  Daniel A. Wagner

12 Problematic and Promising Trends in Holding Teacher Education Programs Accountable
  Frank B. Murray & Elaine M. Stotko

13 High-Stakes Testing: Literacy by the Numbers
  Dale D. Johnson & Bonnie Johnson

14 Time Is of the Essence: An Overview of Quantitative Methodologies for the Study of Change
  David Kaplan & Ximena Uribe-Zarain

15 The Dictionary of Old English: The Next Generation(s)
  Antonette diPaolo Healey

Author Index

Subject Index

In Memoriam

On June 11, 2004, three weeks following the Festschrift, Dick Venezky died due to complications from leukemia. He spent the night before with a sister, his wife, and his son, watching a basketball game. All made plans for the next day and said goodnight. Everyone expected Dick to make it. ...

Venezky's career is richly but implicitly reflected in the contents of the Festschrift volume. The spectrum of his accomplishments is rather amazing. For more than three decades Venezky served as an authority on literacy, spelling, and educational technology. Among his diverse activities were roles as National Research Director for the U.S. Secretary of Education's Initiative on Reading and Writing (1995-1998), Director of Computing for the Dictionary of Old English at the University of Toronto, and Senior Researcher at the Organisation for Economic Co-operation and Development (OECD) in Paris (1999-2001). He was the Benton Visiting Scholar in Education at the University of Chicago (1994-95), Scholar in Residence at the U.S. Department of Education (1997-98), and a consultant to the Office of Educational Research and Improvement (OERI) in Washington, DC.

Richard Lawrence Venezky was born on April 16, 1938 in Pittsburgh, PA. He earned a BEE and MA in linguistics from Cornell University, and a PhD in linguistics at Stanford University. From 1965 to 1977, he was a member of the faculty at the University of Wisconsin, Madison, in the Departments of English and Computer Sciences. From 1975 to 1977 he was the chairman of the Computer Sciences department, and in 1977 he was appointed Unidel Professor of Educational Studies at the University of Delaware, where he also served as a member of the Computer and Information Sciences and Linguistics faculties. His curriculum vitae includes books and journal articles on the design of computer-assisted instruction, English orthography, reading instruction, and the psychology of reading, including the 1999 publication of The American Way of Spelling: The Structure and Origins of American English Orthography. He developed computer analysis systems for two major dictionary projects, the Dictionary of Old English and the Dictionary of American Regional English (DARE), in addition to assisting with the Oxford English Dictionary. He contributed to several instructional programs for prereading, reading, spelling, language arts, and multimedia. He was awarded the Distinguished Service Award by the Society for the Scientific Study of Education, and was a member of the Hall of Fame in the International Reading Association.

In 1964, Dick married Karen Gauz, a soul mate and friend. He took great delight in the birth of their children, Dina and Elie, who have wonderful memories of him reading or working in his home office in the evenings, driving his 1919 Nash and tending the garden on the weekends, and taking family vacations in the spring and winter. When asked, "What does your Dad do?", they would try to answer: he's a professor in three departments, he teaches teachers how to teach reading, he works on textbooks, he made word lists for Speak and Spell, he helps with an Old English Dictionary. Neither child felt they did him justice, because they could remember only a fraction of what he was doing, but they were proud.

Dick made juggling multiple projects look easy to those around him. His family and friends might joke about his travel schedule, but it always included time to enjoy life and his family. An example is his 1970 sabbatical in Israel, where he was joined by Calfee.
From one perspective it was a busman's holiday, as the two explored the school system's literacy programs. But it was also an excuse to roam the country from Acco in the north to Eilat in the south, losing themselves in the rocky fields outside of Hevron along the way. A favorite image: Dick and Karen excavating an enormous urn they found on the beach at Caesarea, which they seriously considered bringing back to the States in their laps (Elie and Dina were yet to be conceived).

Stories of Dick's personal qualities are legend, and thread through this volume in many ways. He was an amazing intellect. His ways with words were a hallmark, reflecting the spectrum of his interests and accomplishments, but also his fascination with language. His imagination led to creative adventures that obviously transcended disciplinary boundaries, as reflected in his academic appointments. But almost any arena provided opportunities for his playfulness: who else would organize a formal
event in an automobile scrap yard, complete with a string quartet and an auto-crushing event a la James Bond. Guests were provided with champagne and sledge hammers. Dick was resplendent in a tuxedo with short pants and boots. He was an incredibly nice guy, not always the case with smart people. He was soft spoken, supportive but able to offer criticism in ways that felt good. He was renowned as a discussant at convention presentations, where he would typically begin by offering a few "wise words of wisdom." He invited those around him to share with him the interesting things they saw and did. Dick's first grandson, Benjamin, was born on his 64th birthday. Discussing Benjamin's development with Dick was delightful. He would pose questions to Benjamin's parents in an unassuming way and the interaction was wonderfully natural. One would think that he might be an expert in child development. In September of 2002, Dick almost collapsed at the top of the stairs in the Paris Metro, dragging his luggage from a meeting in Germany and a trapeze performance by his son. Some thought that his travel schedule had caught up with him, but on his return home he was diagnosed with leukemia. He spent the next 20 months battling a variety of conditions related to the disease and treatment, all the while continuing his research projects and maintaining his prized garden—all with amazing persistence, style, and grace. Between hospital visits he managed several wonderful months with family, friends, and colleagues, including a vacation to the family home in Sag Harbor. Dick entered the hospital in early May with fever, lymphomas in his throat, chest, and spine, and masses in his lungs. He fought these ailments with such tenacity and dignity that few of us fully comprehended their impact. His doctors were optimistic that he could attend the Festschrift, but he was hospitalized with a fever the night before the event. 
Dick was deeply moved by the Festschrift, monitoring the presentations from his hospital room. During the few weeks that followed, he fully expected to return home, an expectation shared by the rest of us. He spent his last days reading, walking through the halls, watching movies with Karen, and tending to projects. In early June, it seemed at one point that he was about to return home to his garden and office, but his condition suddenly worsened. He spent his last week with Karen, his two sisters, and his son Elie. He was thrilled when he learned the name of his second grandson, Nathaniel Read, who was born in California three days after his death.

Dina Venezky and Robert Calfee
September 26, 2004

Preface

On May 21 and 22, 2004, friends and colleagues of Richard Venezky assembled outside Wilmington, DE, on the grounds of the Winterthur estate to conduct a Festschrift in his honor. For a scholar and activist whose career centered on literacy, the idea of a "feast of writing" seemed especially appropriate. Dick had been hospitalized the previous weeks, but participated in the events telephonically; more significantly, his spirit permeated the various presentations and discussions that filled the day, which remains springlike bright and clear in the minds of all who attended.

This volume presents our formal contributions, our efforts to capture the several facets of Venezky's incredible career. The following prefatory notes provide snapshots of the chapters to guide the reader through the expositions in a manner that we think Dick would approve. The chapters provide an appropriate archival account of the domains, an important outcome in its own right. They partially convey the spirit of the Winterthur event, the celebration of a life well lived both personally and professionally. The chapters are arranged roughly in order of the appearance of various projects during Dick's career.

In his dissertation, conducted at Stanford under the supervision of Ruth Weir, Venezky argued that, notwithstanding the wishes of Mark Twain and other proponents of a consistent English orthography, there would never be an exact deterministic relationship between spelling and sound. Robert Calfee engages us in an insightful tutorial on Venezky's analyses
of the structure of English orthography and the American way of spelling. Children learning to read have to crack the code, complicated by multiple pronunciations and variabilities reflecting a word's heritage. Words change depending on phonological and syntactic context. We articulate "Did you eat?" differently from the words did, you, and eat pronounced in isolation. Calfee's paper describes the Venezkian regularities that permeate these variabilities, and the implications for early phonics instruction. Rose-Marie Weber offers an engaging discussion of the challenges faced by young children in bridging sound and spelling, elaborating on Venezky's analysis of the connection. She sketches the dimensions of phonological variation in relation to spelling, a topic seldom addressed theoretically or practically, focusing on current variation in American English. There is regional variation throughout the country; some speakers employ more vowels than others. There is variation by social group, such as gender, ethnicity, and age; speakers in the same region may sound different from others, including teachers and children. There is stylistic variation by occasion; speakers have choices in pronunciation that signal the nature of an event or a personal connection. These dimensions of variation permeate young children's speech in the classroom, and they have to cope with these variations as they begin to read and write using an idealized version of English for learning sounds and letters. Weber considers how instruction can assist young students to become aware of their speech patterns while learning to spell, an awareness that can shape social futures. Words are fundamental to reading, and yet a century of research has not resolved controversies around how words are recognized. Dominic Massaro and Alexandra Jesse review old and new research disproving simplistic ideas such as words are read as wholes or are simply mapped directly to spoken language. 
Inspired by Dick Venezky's work and many years of collaboration, the authors describe orthography and phonology and how they relate to each other, and report new experiments on how these sources of information are processed. Tasks include lexical decision, perceptual identification, and naming. Dependent measures are reaction time, accuracy of performance, and a new measure, initial phoneme duration. The findings bearing on these controversies show that reading has multiple determinants, including the type of task, familiarity of the test items, and measurement accuracy. The authors also address potential limitations of measures related to the mapping between orthography and phonology, and show that the existence of a sound-to-spelling consistency effect does not require interactive activation, but can be explained and predicted by a feed-forward model, the Fuzzy Logical Model of Perception.

Golinkoff's chapter is concerned with how spoken language is acquired, and the importance of infant-directed speech. Although Dick's analyses usually began with the child who had already acquired spoken language, he would have been sympathetic to important stimulus properties of the input analogous to those he studied in reading. After all, as Golinkoff argued in her presentation, reading is parasitic on language. In order to learn to read, children must do more than recognize the letters and find the spelling-to-sound correspondences. They must recognize that reading involves the extraction of meaning from text. And what is it that determines what the sentences we read are about? The verbs. The verb is the architectural centerpiece of the sentence and Golinkoff and her colleagues describe the difficulty of verb learning (in comparison to noun learning) and how the input parents provide—of both a linguistic and a non-linguistic nature—conspire to provide children with the data they need to parse the events verbs label and learn the verbs themselves. Venezky understood that decoding skill served as a portal to reading comprehension. Tom Trabasso's chapter surveys his illuminating program of theoretical and empirical research on narrative comprehension. The studies describe how readers use knowledge of human psychological and physical causation to understand why and how characters behave as they do. These explanatory inferences form the basis for learning connections between the explanation and what is being explained, enabling the reader to form a mental representation of the text situation in memory. The representation constitutes a network of causal-like motivations that allow the reader to perform a variety of high-level interpretive tasks. Most reading instruction is delivered using stories or narrative text. The research described by Michael Kamil stems from a conversation with Dick Venezky dating from the late 1960s. 
Dick pointed out the lack of information text in basal readers, and he was concerned that instruction in narrative or story would not prepare students for literacy demands they would encounter as adults. Kamil describes an instructional approach that uses information text instead of stories to teach reading. In this approach, 50% or more of the text used for first grade reading instruction was information. Students appear to be more motivated and learn as well or better than they did in more conventional, story-based instruction. The second part of the chapter briefly describes an observational study of first grade classrooms to determine what teachers currently do if and when they have students reading information text. These data are important to determine what sorts of professional development would be needed to allow teachers to use methods that would place greater emphasis on information text used for early reading instruction.

Venezky repeatedly pointed out the adult literacy problem America faces and suggested ways to improve the effectiveness and efficiency of adult literacy instruction. Oney and Durgunoglu discuss the complex interactions between cognitive-linguistic and affective factors and the social and instructional contexts in which they operate in an effort to understand the specific relationships that support and hinder literacy development in adults. They use their general model of literacy development as a lens to study adult literacy acquisition. They describe the development and implementation of their Functional Adult Literacy Program to demonstrate building reciprocal links between research, theory, and practice for effective adult literacy instruction. Given Venezky's interests in both reading and technology, it is only natural that he would puzzle about how the latter might serve the former. Dick appreciated that humans are multimodal multisensory organisms, and that creative technology might enhance the experience of reading by providing a richer sensory environment, the topic of John Sabatini's chapter. The chapter has two main theses. First, the time has come to merge what is typically called "media education" (Tynan, 1998) with the reading literacy/language arts curriculum, which will facilitate the design of assessment (and cognitive) constructs to address the relationships (and seeming contradictions that often arise) between traditional print literacy and media, electronic or other. Second, the activity of assessment construct development promises to transform comprehension assessment to better align with changing intellectual, social, and technological perspectives and practices of a multiple media educational environment. Technology in the service of human activity was a constant theme in Venezky's work. 
David Mioduser shared with Venezky a concern for the interaction between natural (human) and artificial cognitive processes as a fascinating field for inquiry and reflection. In his chapter, Mioduser describes the potential that technology has for enhancing our cognitive functioning when the individual and their goals are foremost. He surveys nine approaches for analyzing and discussing this interaction between cognition and technology, along with the implications for education. Cognitive technologies are conceived as supporting, respectively, the acquisition, extension, consolidation, externalization, internalization, construction, collaborative creation, compensation, and evolution of cognitive processes and skills. Regarding practical implications, the various approaches have found their way into the educational arena in the form of computer- or network-based learning environments and tools.

Venezky was a "history nut." His study of orthography was fundamentally grounded in history, and historical themes permeated much of the rest of his work. This dimension clearly influenced the chapter by his former student, Tim Shanahan, who leads the reader on a journey through reading research. The federal government has recently turned to research synthesis to guide education policy. Shanahan shows how research synthesis in reading has provided the basis for policy-oriented, panel-centered, and formal meta-analyses of 21st-century educational research.

Given these interests, it is no surprise that Venezky would become founding co-Director of Penn's federally funded National Center on Adult Literacy, along with Dan Wagner. Working together for nearly a decade, Venezky and Wagner collaborated on numerous projects and copublications on the national and international scene, including Literacy: An International Handbook (1999). More recently, Venezky and Wagner collaborated in the area of education, literacy, and technology. Wagner's chapter on Literacy and New Technologies summarizes some of the findings on the effectiveness of the use of technology, with a special emphasis on developing countries. Wagner concludes that literacy work in developing countries will inevitably have to take advantage of new technological tools, and that efforts in this regard are only now coming into effective use.

Although he held academic positions in Linguistics, Psychology, and Computer Sciences, it was in the discipline of Education that Venezky established his professional base, which meant a commitment to understanding the work of teachers and teaching. In fact, Venezky's work can be seen as one complex argument and base for enabling teaching eventually to take its place as one of the learned professions. Frank Murray and Elaine Stotko compare the profession of teaching to other learned professions, investigating size, work, and regulatory practices. The authors ponder why, though the teaching profession appears to have the quality assurance indicators of other professions (licenses, degrees, standards boards and so on), the outcomes of these measures of quality are mistrusted by the public in ways that those of other professions, such as medicine or law, are not.
They explore the mechanisms that have been put into place to hold teacher education programs accountable for the quality of their graduates, including standardized testing, Title II reporting, and assessments of student achievement. They then examine the nature and validity of evidence and scholarship in the field of teacher education, asking whether such evidence is strong enough to assure the public that teachers are well educated and competent. The authors suggest, as Venezky certainly would have, that multiple measures be used by teacher education programs as more compelling evidence of teacher quality, and consider the way that such measures might fit into program accreditation.

Dale and Bonnie Johnson, motivated by earlier work of Venezky and Linda Winfield on school-wide reform, describe their experiences during the 2000-2001 school year when they taught in a public school in rural Louisiana, a school lacking a library, playground equipment, hot water, art classes, and adequate teaching materials. Their students were African American children from impoverished homes. They explore the impact of high-stakes testing pressures and inequities in public school funding that impede teaching and learning. The Johnsons conclude that when achievement demands are imposed with inadequate funding, the negative consequences for public schools far outweigh any positive impact for either teachers or students.

Change, the Janus side of history, was reflected in a collaborative research project with David Kaplan and Ximena Uribe-Zarain, which examined cross-national differences in the development of instructional computing technology skills. The chapter provides an historical overview and critique of modern methodologies for the study of change, with particular relevance to issues of educational evaluation and policy. The chapter addresses the technical aspects of the topic: time series analysis, the analysis of difference scores, structural equation modeling of longitudinal data, growth curve modeling and extensions, and models for the transition between stages over time.

Venezky's undergraduate degree was in Electrical Engineering, and he delighted in unpacking complex problems. The Dictionary of Old English (DOE), one of the first computer-corpus-based dictionary projects, engaged Richard Venezky's attention and benefited from his computer expertise. Beginning in the late 1960s, he looked forward with delight to the annual meetings in Canada, which featured marvelous medieval banquets in the stately halls of the University of Toronto. In her chapter, Antonette diPaolo Healey describes the DOE project, which relies on a comprehensive survey of English manuscripts written between 600 and 1150 A.D.
Together with the now-completed Middle English Dictionary (2001), which covers records from 1100 to 1500 A.D., and the Oxford English Dictionary, which documents English to the present day, the DOE gives us a remarkable account of the development of the English language. Venezky possessed the rare combination of talents, technological and linguistic, that the project needed. Launching the project on a technologically sound footing, he became its Director of Computing. The project drew on his imagination of a dictionary for the electronic age, tempered by his engineer's common sense. The essay charts the recent journeys of DOE as it moved from the closed proprietary systems of the previous century to the open source technology of the present. The effects of this migration have been profound, and have marked the greatest transformation in the project's history as it makes the transition from microfiche publication to CD-ROM. The account describes the many challenges in telling the story of the English language.

All of these projects remain works in progress, of course. We still have much to learn about how to help teachers help students in the acquisition of literacy, from kindergarten through adulthood, from decoding to comprehension and composition. We have much to learn to better connect the technical and the human aspects of these endeavors: classroom observations and computer codes. We have much to learn about connecting across cultures and nations; not represented in this collection are Dick's endeavors with OECD, in Israel, and in that most curious of environments, Washington DC. Sadly, we must proceed without his enthusiasms, curiosities, and incredible insights. On Friday, June 11, 2004, he passed away from us at much too early an age.

1 The Exploration of English Orthography

Robert C. Calfee
University of California, Riverside

When 35-year-old one-armed John Wesley Powell launched forth in 1869 with his party of 10 to explore the Grand Canyon of the Colorado, he knew that it existed, that it had shape and substance, land and people. To be sure, the images were largely penumbrae, promises, and perils. The start of his trip was easy enough, but each turn thereafter offered surprise and challenge. When Richard Venezky set out a century later to explore English orthography, his journey moved likewise through strange and uncertain territories, whose very existence was questioned.

As for many adventures, the exploration of English orthography required sensitivity to time and space, to history and culture. It required technical prowess and mastery of antiquarian machineries of the day. Venezky considered prevailing ideas about the lay of the land, but equally important was his willingness to investigate alternative models and methods. Powell relied on Paiute locals for information about riverine conditions at the bottom of the Grand Canyon, but he also knew when to pack the boats and walk. Treatises on English spelling were available when Venezky started his journey (e.g., Bloomfield & Barnhart, 1961; Fries, 1962), but early on he decided to pursue other paths. Finally, pioneers were driven by commitment and perseverance. Once in the canyon, Powell could have bailed out, but he didn't. Venezky, despite distractions from many other explorations, made time to extend his original investigations in ways that inform instructional contexts (Venezky, 1970a; see also 1967, 1970b) and to create, along with his wife, Karen, a monograph designed to amuse and delight (Venezky & Venezky, in press).

This chapter touches first on the distinctiveness of Venezky's findings and the significance of the ideas for designing the effective and efficient acquisition of English literacy and then summarizes results from a series of educational investigations incorporating these discoveries. Several decades passed following Powell's trip before the voyage became safe for others. Voyages through English orthography remain perilous for many in today's schools, but Venezky's explorations provide the foundation for smoother trips in years ahead.

THE ENGLISH POLYGLOT

Reading is about phonics; ask almost anyone in the United States (where, incidentally, the "approved" language is English). Phonics is about speech sounds; look in the dictionary. Letters clearly have something to do with the matter, but print does not generally "talk," leaving readers to figure out what letters "say." In English, however, each letter stands for a lot of sounds, including no sound at all. What a mess! The amazing thing is that lots of people actually learn to read English, even when a text is nonphonic. Imagine the following cartoon bubble over a woman's head: "DNO'T BLNIK YUOR EYES AT ME LKIE YOU DNID'T UDNERSNATD A SIGNLE WROD I JSUT SIAD. AND DNO'T GVIE ME THAT WMOEN ARE FORM VEUNS, MEN ARE FORM MRAS TINHG— WE BTOH ARE SPAEKNIG ELINGSH, FOR CRIYN' OUT LUOD!" (Lee, 2003). This text troubles my spell checker, but most readers can handle it. And if the inconsistencies in this sample seem unusual, the Venezkys' monograph provides delicious examples from "real" English texts.

How do children manage this orthographic maze? They all emerge from the womb uttering little more than a yelp, but within a few years they have moved from babbling to coherent communication using oral language, along the way acquiring phonological, semantic, and syntactic systems and remarkable mastery of discourse structures. Literacy is another matter. For many children, early reading seems to proceed much like speech; most such children are immersed from conception in rich and meaningful linguistic environments that blend spoken and written communications. Reading becomes a formal school activity around the age of 5 or 6 years, with prescribed sequences and performances. Somewhere between the introduction of reading instruction and the onset of adolescence, substantial numbers of students become certified reading failures.
Practically speaking, they appear to be word blind: They cannot translate print into speech with sufficient fluency to make sense of the messages on

1. EXPLORATION OF ORTHOGRAPHY


a page. By the time they enter the middle school years, these children have spent the better part of a decade denied access to the academic content and linguistic register available mostly from books. Such access is a primary benefit of literacy, both for entry to the resources we associate with the concept of the library and to succeed as a player in the mainstream culture. Current discussions emphasize entry over activity. The first grader who can pronounce the words on a page with some degree of speed and accuracy can "read." This dogma is problematic for several obvious reasons, but for present purposes I focus on decoding and on the task of creating early reading programs in which this element is handled in an effective and efficient manner—the goal is for all students to quickly acquire the essentials of English orthography, so they can get on with the good stuff.

English Spelling Is a Mess, But Someone Has to Teach It

Sunday supplements, augmented of late by internet spam, routinely offer examples of the irregularities in English spelling. The gimmick is to highlight irregularities with vowels of almost any variety, but especially vowel digraphs and ough. Researchers also contribute to the mischief. In the 1960s, computers were used to study the spelling system by examining bigram and trigram frequencies; the studies provided counts of pairs and triplets of letters occurring in a large lexicon, sometimes with the pronunciations correlated with each pattern. Consider the trigrams in doughnut: DOU, OUG, UGH, GHN, HNU, NUT. Assuming that a pronunciation can be assigned to each unit, the consistencies are then tallied. The result often appeared rather chaotic (for a recent review, cf. Goswami, 2000).

With the emergence of basal reading series in the 1940s, "accountants" took charge of curriculum design. Behavioral objectives spelled out what first graders needed to learn.
Detailed scope and sequence matrices ensured that every essential spelling-sound pattern was introduced, practiced, and tested. Three principles undergirded these designs. The onset-rime principle (Moats, 2000) used vowel-consonant rhyming patterns, such as -at and -im, combined with onset consonants to produce short, simple, frequent words. The second principle dealt with very high frequency words like those found in the Dolch list (Fry, Kress, & Fountoukidis, 2000). Onset-rime patterns offer little help with the, of, was, the days of the week, and the numbers from one to ten, all critical in the first-grade environment, so these words were taught by rote. Finally, the decodability principle restricted student texts to spelling-sound patterns that had been previously taught. Swimmy (Lionni, 1963) might be a favorite of kindergartners, but a phrase such as "a lobster like a giant water-moving machine" clearly placed the text out of bounds for first graders. To be sure, writers like Theodor Geisel demonstrated creative possibilities within these limits (e.g., The Cat in the Hat; Seuss, 1957).

In combination, these principles and practices reflect a view of English spelling as inherently complex and inconsistent. Students need to first learn to read, during which they are taught a large collection of basic decoding patterns. Then they can read to learn. Learning to read means learning the print system. Whatever the program label—analytic phonics, synthetic phonics, or look-say—the first grade focus is on the letter-sound system, simple consonant-vowel-consonant (CVC) patterns, memorization of high-frequency "handy" words, and oral reading of simple stories of prescribed readability. Second grade provides opportunities to rehearse these patterns and to take on the challenge of learning to spell, which is even more problematic than decoding. In third grade, students begin to be sorted into sheep and goats. Those lacking oral reading fluency are assigned to special programs resembling first grade activities. Others move ahead to regular school programs, into demanding science and history texts with complex vocabulary (e.g., photosynthesis), where they seldom need to demonstrate competence with English orthography except in writing assignments, where spelling counts. By high school, a small cadre emerge who have mastered the orthography. They can decode almost any word they encounter and can accurately render anything they want to put into print. Before Sputnik and Nation at Risk (National Commission on Excellence in Education, 1983), the nation seemed adequately served by this elite. In today's age of accountability, the high-standards expectation is that all high school graduates will perform at levels previously attainable by only a favored few. English orthography (along with algebra) stands as a major challenge to this aspiration.
The practical reality is that a substantial proportion of U.S. citizens, after 13 years of formal schooling, remain puzzled by the eccentricities of the spelling system. Venezky's quest led him to search for order in what appeared to be chaos.

The Morphophonemic Principle

Writing is an artifice, an invention, a human construction taking many forms, much like motor cars and computers. Human beings invented these systems; nature provides little help in learning to use them. Some writing systems are rather transparent, in that the relation between print symbols and spoken correlates is relatively direct; Spanish and Finnish are examples. Other systems are transparent in a different way, with symbols that bypass speech altogether; Chinese and Egyptian are often mentioned, but the icons sprinkled around today's world offer another example.


The English lexicon fits neither of these models. The letter-sound correspondences are certainly not transparent, and the more than 500,000 entries in the lexicon exceed the possibilities for iconal representations. Where might rhyme and reason be found in the system? What is the connection between cat and catastrophic? Hop and hopeless? Save and salvation? How to deal with the phone book; under T in my directory, Tabb, Tague, Takei, Talamante, and Taylor appear on the first page. The phone book provides a segue for introducing Venezky's conceptualization of English orthography. Despite the claims of English-only advocates, our language is a polyglot—many tongues. An epiphany emerges from study of the history of the English language, available from several engaging sources (McCrum, Cran, & MacNeil, 1986). The parallel development of the written language is likewise documented in various monographs (Gaur, 1992). Venezky's assignment as a research assistant for Paul Hanna was to assist in the design of a spelling curriculum, a challenging task. Venezky transformed the assignment to encompass a more general analysis of the spelling-sound system. He proceeded along two lines. The first was conceptual—The Structure of English Orthography (Venezky, 1970a, henceforth, Structure) examined the "spelling mess" through a historical lens, and chaos began to take on order. The key to bridging the many languages that make up English was the morphophonemic principle, the notion that it is important to carve a complex word along its morphological or word-part joints before studying the phonological patterns. For instance, doughnut should be divided into dough-nut before looking at letter-sound patterns. The second was empirical—drawing on his engineering background, Venezky developed computer programs to conduct sophisticated but comprehensive investigations of letter-sound correspondences. 
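The contrast between counting letter windows over a whole word and carving the word at its morphological joints first can be sketched in a few lines. The Python below is only an illustration under stated assumptions: the function names are hypothetical, the morpheme split for doughnut is supplied by hand rather than computed, and nothing here is taken from Venezky's own programs, which are not described in detail in this chapter.

```python
def trigrams(word):
    """Return every run of three adjacent letters in word, in order."""
    word = word.upper()
    return [word[i:i + 3] for i in range(len(word) - 2)]


def trigrams_by_morpheme(morphemes):
    """Collect trigrams inside each morpheme, never across a boundary."""
    result = []
    for morpheme in morphemes:
        result.extend(trigrams(morpheme))
    return result


# Windowing the whole word yields units that straddle the dough/nut joint.
whole_word = trigrams("doughnut")

# Hypothetical hand segmentation: the split is given, not discovered.
split_first = trigrams_by_morpheme(["dough", "nut"])

# Trigrams that exist only because the window crossed the morpheme boundary.
straddling = [t for t in whole_word if t not in split_first]

print(whole_word)
print(split_first)
print(straddling)
```

In this sketch the straddling units are the ones for which no sensible pronunciation can be assigned, which is one way to picture why whole-word letter counts looked chaotic while morpheme-first analysis did not.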
The history of the English language is quite a tale, which must necessarily include both speech and writing. In spoken English, we have seldom said no to a word. The narrative begins with early Celtic foundations, moves through the four-letter-word amalgamations of the Angles and Saxons and (later) Jutes, past the polysyllabic intrusions of the Norman French conquerors, on to the exponential growth of the Elizabethan era (the sun never set on the empire, and Limeys brought back both loot and language), and finally to the growing independence of the colonies (including the United States, "Two great nations separated by a common language," according to Churchill). Advocates of simplified spelling systems misunderstand a fundamental point: The amalgamation that is spoken English is inherently multilingual at its core (McCrum et al., 1986). The English writing system emerged along the way through parallel play during various social, cultural, and linguistic upheavals. The beginning of the Current Era (AD 0) saw the British Isles peopled by Celtic


tribes (Britons and Irish). Roman invaders arrived on the scene, establishing outposts such as London and erecting Hadrian's wall to control the natives. Celtic runes and Latin letters serve as evidence of these literate cultures. Midway through the first millennium, waves of seafaring bands arrived from the German-Dutch Lowlands and the Danish peninsula, and later the North Sea region, invading from the north and east and moving southward. The Romans returned home, and the Celts moved westward. The invaders were accomplished warriors but less inclined toward reading and writing. The spoken language scrambled into a mishmash of Germanic and Scandinavian dialects. Short four-letter words became the hallmark of Anglo-Saxonish. Because the language was spoken more than written, it is better described as consonant-vowel-consonant syllables, with a smattering of syntactic markers (most of the original conjugations and declensions fell by the wayside), free-floating prepositions, and a propensity toward word compounding. The total lexicon was in the low tens of thousands. In AD 597, Augustine arrived with a band of clerics to convert the pagans, who by that time had settled on farms and created a sophisticated culture. A century later, both spoken and written English had undergone significant changes. Latinate "church" words had entered the lexicon, introducing speech patterns quite unlike Anglo-Saxon. The Latin alphabet became dominant in the spelling of Anglo-Saxon texts. Two centuries later, the Vikings overran England, threatening at one point to occupy the entire country—we might speak Norse today except for the achievements of Alfred the Great, who managed a political reconciliation and then campaigned to translate "certain books which are most necessary for all men to know into the language that we can all understand" (McCrum et al., 1986, p. 53). English and Norse persisted as parallel languages for several decades, gradually merging into a common spoken version. 
Spelling patterns adjusted along the way. By 1066, Old English manuscripts had settled into spellings recognizable (with a little help) to the modern eye. The Norman Conquest brought enormous changes in culture, language, and writing. William clearly won the battle, but over time the English flair for assimilation won the war. The monks brought Church Latin, but the Normans brought French, which they imposed as the official language. Over the next two centuries, English absorbed much of the French lexicon, anglicizing the pronunciations while the spellings remained largely unchanged (cf. nation, actuelle). By 1400, Geoffrey Chaucer emerged as the exemplar of a transformed language, Middle English, readily readable by most moderns. By 1417, Henry V was writing his documents in English rather than French, and in 1476 William Caxton brought the printing press to London, where he proceeded to make a fortune printing Chaucer's works and anything else that he could beg, borrow, and translate. He directly confronted the idiosyncrasies of English spelling; individual and regional variations might be acceptable for manuscripts copied in the dozens but not for books printed in the hundreds. He took steps to regularize English spelling—up to a point. Elizabeth I reigned as queen from 1558 to 1603, followed for two decades by James I, during which time England underwent a renaissance. The growth of British political and economic power during this time is well known but equally important were advances in scholarship and science: Shakespeare and Bacon, Newton and Boyle, translation of the Bible, and worldwide domination by the British Navy. It was a time of enormous growth in the language, partly in response to the advances in the academic arenas but also through the influx of words from around the world. Some words came with spellings attached; most required transliteration. And the beat goes on. With the emergence of the United States as a force majeure during the past century, the language has undergone further expansions and inclusions. A forthcoming volume by Robert MacNeil and his collaborators (MacNeil, 2003) will focus on American English, which they view as a story in its own right. Documenting the lexicon boggles the imagination, more than half a million words and climbing—and those are just the words in print. Amazingly, English speakers can generally communicate with one another, and English readers can make sense of most texts. Teaching young children to handle the system, however, is another matter.

Venezky's analysis was grounded in the history. English spelling is fundamentally alphabetic, in the sense that letters and sounds are related, no matter the origin of a word, rather than hieroglyphic or syllabic. But the system is clearly not a one-to-one mapping of letters to sounds.
The 26 letters of the alphabet must account for more than 40 phonemes, disregarding regional variations. The lexicon encompasses different structural systems reflecting the different origins: one-syllable words (e.g., cat), compounds (e.g., catbox) and simple affixes (catboxes), root-plus-affix combinations (e.g., concatenation), and lots of simply long words (e.g., catywampus). CAT may seem straightforward in the previous examples, but what about delicate, application, and catheter? The key to untangling the apparent complexities in English spelling-sound patterns is the morphophonemic principle mentioned earlier, which springs from the historical analysis. Finding order in English letter-sound correspondences requires separating words into the fundamental meaning-bearing entities—the morphemes or "bodies." These elements, the basis for the primary spelling patterns, reflect linguistic origins that have metamorphosed over time, hence the importance of historical analysis. Absent this initial "chunking," the letter h appears quite volatile in samples like mighty, hothouse, rough, rehabilitate, yacht, and orthography. Separation into morphemic units is an essential first step in identifying the underlying grapheme-phoneme regularities—notice the shift from letters and sounds. Venezky defined the relational unit as a grapheme pattern associated with a phoneme, sometimes a single letter representing a single sound (e.g., hot-house), sometimes two or more letters for a single sound (e.g., or-tho-gra-phy, along with might-y, rough, yacht). Graphemes also functioned as markers, especially for vowels, where a small set of letters had to represent a wide variety of sounds. For instance, markers such as final e multiplied the correspondences generated by the letters a, e, i, o, and u.

The empirical investigation—Venezky's (1970a; but also 1967, 1970b) dissertation, published as The Structure of English Orthography—analyzed the grapheme-phoneme regularities in a collection of 160,000 words, the largest such effort up to that point. The results were comprehensive, coherent, and considerate. Comprehensiveness was promoted by consideration of productivity. Some grapheme-phoneme patterns appear in a large number of different words or tokens. Structure highlighted consistencies rather than celebrating the unusual. But it also sought to encompass the full spectrum of English orthography, quite unlike compilations found in other sources. In a tightly crafted chapter published at the same time as Structure, Venezky (1970b) moved beyond the language of rules and regularities to discuss the predictabilities in English orthography. He argued that many patterns could be classified as invariantly predictable, the case for many simple consonant spellings. Others are variantly predictable when markers are considered. The unpredictable category was sorted into a variety of subcategories.
The chapter concludes with a brief but essential remark: "The importance of this classification is that it separates patterns according to the pedagogy that can be employed to teach them" (Venezky, 1970b, p. 41), which conveys the notion of priorities for deciding what to teach and when to teach it. Coherence was attained in part through simple devices such as the distinction between vowels and consonants and the major subcategories within each. The marker concept was especially important for consolidating patterns that otherwise take shape as shopping lists or mystical rules. Consider the free-checked system in Anglo-Saxon vowels, more commonly known as the long-short variation. The usual rule is that adding final-e to a CVC pattern makes the vowel long—it says its name, as in fat/fate and bit/bite. The correspondence, attributed to Caxton, is highly productive for Anglo-Saxon words. Exceptions like give and come bring stories of their own but are relatively rare. Venezky also underscored the


broader marking system at work here. The long-short pattern doubled the number of vowel sounds represented by the five Latin vowel letters in the CVCs that are the basis for the four-letter Anglo-Saxon lexicon. But, Venezky pointed out, final e is only part of the marking system, which also included common suffixes like -ed, -er, and -ing. There is rat and rate but also ratted and rated, ratter and rater, ratting and rating. Once the entire pattern is laid out, including doubled consonants, it can be grasped by first graders—and even grown-ups—for both decoding and spelling. In current phonics programs, the final-e is still presented as a pattern and twinned consonants as something else. Considerateness was reflected in the large number of examples and explanations provided in Structure. Descriptions of spelling patterns are presented not only as counts but by characterizations, not only by labels but by rich examples. To be sure, Structure was written for scholars, and the language is generally academic in tone. But the playfulness with words presented many opportunities for teachers and curriculum designers. The challenge has been to translate the analysis into workable programs.

WHAT DO READERS NEED TO KNOW AND WHEN DO THEY NEED TO KNOW IT?

Given the longstanding persistence of the phonics wars, little territory might seem left to be ploughed, trenched, or mined. Contemporary battles are fought on old ground. In the National Reading Panel Report (National Reading Panel, 2001), teaching children to read is presented as a simple matter of inculcating phoneme awareness in kindergarten, teaching basic phonic skills in first grade, and then getting on with reading to learn. Data from the First-Grade Reading Studies (Bond & Dykstra, 1967) onward suggest that this strategy does not yield sustained effects for children who have not learned to read before they enter kindergarten and is also ineffective for children who for other reasons do not benefit from such programs. The usual alternative—immersion in engaging children's literature—relies on the natural acquisition of the orthographic system, which calls for a substantial leap of faith, both conceptually and empirically. Both positions present a view of reading that emphasizes oral performance in the primary grades, gives little attention to comprehension, neglects writing altogether, and relies on surface-level indicators to gauge achievement. Multiple-choice tests, Dynamic Indicators of Basic Early Literacy Skills (DIBELS), and Running Records (Clay, 2000) provide an uncertain foundation for monitoring the full spectrum of reading achievement in the primary grades—and afterward.


What might be the alternatives? The two following segments describe efforts to employ Venezky's morphophonemic concept as the foundation for introducing young students to English orthography in ways fundamentally different from the two previous strategies, neither of which incorporates the distinctive features of English orthography. Both segments feature the metaphonic concept, the idea that children will benefit through understanding the orthographic system to the point that they can explain grapheme-phoneme patterns. Another consideration is the notion of parallel progress; middle-class parents seldom insist that their children first memorize the alphabet, then simple words, and finally get to the good stuff. The stage strategy for early reading instruction flies in the face of what works for most successful readers. The recommendation is that phonics be included in a program that also supports the development of oral language, including vocabulary, engagement with stories and reports, and compositions, both oral and written. An ambitious agenda, to be sure.

The "Phonemic" in Morphophonemic

As noted earlier, morphemes constitute the critical first pass for detecting regularities in English orthography. For instance, in the compounds nothing and nuthouse, the medial th is a relational unit in the first word but not the second. If one looks only at TH, the grapheme might seem inconsistent but not if the reader can "see" the base words. The morphemic process, which is covered in more detail later, might seem to apply to more advanced readers, but even the youngest readers can learn this principle. Once the print lexicon is "chunked" into morphemes, then how to proceed? The answer depends on the word origin, and this section focuses on Anglo-Saxon CVC patterns.
Although the number of productive grapheme-phoneme correspondences is relatively small (100 or so), the variety of CVC combinations is enormous (in the tens of thousands), even if onset rimes serve as the unit for instruction (in the thousands; e.g., Fry et al., 2000). The vowels, the glue that holds syllables together, provide the basis for the simple solution (Table 1.1). English morphemes (not necessarily words) tend to be one or two syllables in length, and syllables tend to consist of seven or fewer letter-sound elements. One of these elements must be a vowel, hence, the importance of the vocalic-center-unit. A CVC combination is functionally much like a sandwich. The bread comes in great variety, including open face, and the challenge is with the filling—the vowels. Vowel patterns vary with word origins, but within any given layer of the language, the five primary vowel patterns—a, e, i, o, u—are surprisingly consistent and quite productive; when the long-short marking system is taken into account, the vocalic foundation consists of 10 grapheme-phoneme patterns.

TABLE 1.1
Matrix Summarizing Productive Grapheme-Phoneme Correspondences for Anglo-Saxon Sources in English Orthography (After Calfee & Associates, 1981)

Consonants
  Single:             b, c, d, f, g, h, j, k, l, m, n, p, q, r, t, w, x, y, z
  Blends, initial:    bl, br, cl, cr, dr, fl, fr, gl, gr, sl, pr, tr, sc, sk, scr, spl, sm, squ, sn, str, sp, st, sw, tw, thr
  Blends, final:      -ft, -mp, -nt, -lk
  Digraphs, initial:  sh, wh, ch, th, gh
  Digraphs, final:    -ng, -ck, -sh

Vowels
  Single      Short                         Long
  a:          mad                           made
  e:          pet                           Pete
  i:          Tim                           time
  o:          hop, hops, hopped, hopping    hope, hopes, hoped, hoping
  u:          cut                           cute

  r- and l-controlled
  ar:         park, lard, charm
  or:         for, horn, short
  er:         her, stern, fern
  ir:         bird, thirst, sir
  ur:         fur, churn, burn
  al:         hall, fall, call; halter, falter; walk, talk, balk

  Digraphs, one sound
  ai/ay:      pain, play
  ee:         meet
  ie:         piece
  oi/oy:      foil, toy
  oa:         boat
  au/aw:      laud, law
  ew:         few

  Digraphs, two sounds
  ea:         breath, breathe
  ei:         seize, eight
  oo:         noon, cook
  ou:         round, soul
  ow:         cow, snow

Vowel digraphs provide other fillings that delight those who challenge claims of spelling-sound consistencies; this category includes about 20 diverse grapheme-phoneme patterns. Consonants—the bread—are quite varied but also consistent. The basic building blocks for Anglo-Saxon consonant spellings include a little more than four dozen grapheme-phoneme correspondences. About 20 consonant phonemes are consistently represented by a single letter; here the alphabetic principle holds. Several other consonant phonemes are consistently represented by consonant digraphs (e.g., ch, sh, th, wh, ck, ng; letters are notoriously unfaithful in Anglo-Saxon spellings). Adding to the mix


are the consonant blends, also commonplace in Anglo-Saxon words, with combinations that stretch the imagination. By the time one combines the 50 or so consonant spellings that can occur at the beginning and end of a syllable with the roughly 30 vowel elements that provide the glue, the CVC combinations reach into the several tens of thousands. Many CVCs are real words, and most others serve in two-syllable words (e.g., rabbit). Of these commonplace words, Nist (1966) states the following:

English remains preeminently Anglo-Saxon at its core: in the suprasegmentals of its stress, pitch and juncture patterns and in its vocabulary. No matter whether a man is American, British, Canadian, Australian, New Zealander or South African, he still loves his mother, father, brother, sister, wife, son and daughter; lifts his hand to his head, his cup to his mouth, his eye to heaven and his heart to God; hates his foes, likes his friends, kisses his kin and buries his dead; draws his breath, eats his bread, drinks his water, stands his watch, wipes his sweat, feels his sorrow, weeps his tears and sheds his blood; and all these things he thinks about and calls both good and bad. (p. 9)
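The arithmetic behind "several tens of thousands" can be checked directly. The sketch below assumes the round figures quoted above (about 50 consonant spellings and roughly 30 vowel elements) and treats every onset-vowel-coda combination as permissible, which overstates the number of spellings actually found in English. The function name is hypothetical.

```python
# Round figures from the text; both are approximations, not exact inventories.
CONSONANT_SPELLINGS = 50   # single letters, digraphs, and blends
VOWEL_ELEMENTS = 30        # the "roughly 30" vowel elements cited above


def cvc_upper_bound(consonants, vowels):
    """Count onset + vowel + coda strings if every combination were allowed."""
    return consonants * vowels * consonants


total = cvc_upper_bound(CONSONANT_SPELLINGS, VOWEL_ELEMENTS)
print(total)  # 75000
```

The same function applied to a starter set of seven consonants and one short vowel gives 7 x 1 x 7 = 49 patterns, so the growth from a handful of elements to a very large construction set is a matter of simple multiplication.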

The challenge for early literacy instruction is to provide efficient and productive access to this enormous print lexicon for both reading and writing. Most words are short in primary grade texts. The most popular strategy in current practice—the onset-rime technique—approaches English spelling as a syllabary. Rimes—vowel-consonant units, such as -AT, -ALL, -IG, and -IGHT—are introduced as learning objectives, to be combined with onset consonants to produce real words. Each of the hundreds of rimes is introduced, practiced, and tested over several lessons. Presumably, students will perceive these units in more complex words like lobster and (perhaps) water-moving machine. A more ambitious hope might aim toward transfer beyond rime units to more general principles, such as the functional role of vowels as critical elements across the entire English spelling system. Investigations over the past half century suggest that such transfer is uncommon; errors in both decoding and spelling center around vowels through middle school and beyond. Why not teach students directly about consonants and vowels at the phoneme level, rather than resorting to onset-rime patterns? A hundred grapheme-phoneme correspondences can generate the predictable patterns in Anglo-Saxon words, supporting the productivity principle. Students are usually taught about consonants and vowels—they can cheerfully recite "A, E, I, O, U and sometimes Y!" Unfortunately, the recitation does not guarantee any understanding of the functionality of vowel elements. Identifying letters as consonants or vowels can rest on rote learning, but perceiving the phonemes in the speech stream is another matter. Consider your perceptual experience while saying BLENDER, for instance—stretch the word and study the flow of sounds.


The recent emphasis on phonemic awareness attempts to raise the sensitivity of early readers to speech patterns by practice with rhyming patterns and initial consonants. These activities, assigned to kindergartners, can be quite complex: delete a sound ("Say STOP without the T") or move a phoneme ("Move the first sound in STOP to the end of the word"). Prereaders, without letters to help them, are often puzzled by such requests, as are many primary-grade teachers (cf. Calfee & Norman, 1999, for a comprehensive review). The bottom line is that early reading programs employ the onset-rime approach because it is difficult to teach young prereaders to deal with individual phonemes. As Alvin Liberman and colleagues (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) demonstrated in a brilliant series of investigations, spoken words are complex; individual phonemes overlap and shingle in ways best revealed in articulatory patterns. The phoneme /p/, for example, no matter the context, is produced when a puff of air is released through pursed lips, unaccompanied by vocalization. The grapheme P generally captures this action. If, however, you try to hear this stimulus, you can easily be tricked (Moats, 2000). It is far easier to leave vowels connected to consonants, as in Japanese and Spanish, rather than try to teach prereaders to become junior phonologists (cf. the Venezky-Calfee collaborations referenced in Calfee & Norman, 1999). The challenge to the onset-rime approach in English arises from the enormous number of syllabic combinations, in Anglo-Saxon and beyond. My empirical investigations during the past several years (Calfee, 1998; Calfee, Norman, Trainin, & Wilson, 2001; Calfee, Norman, Miller, Wilson, & Trainin, in press) explored an instructional strategy in which primary students, after puttering with consonant articulation patterns linked to graphemes, learn to build words by gluing two consonants together using a vowel. 
In first grade lingo, "'PAT' tells you to put your lips together and blow, and then you make the 'aaaa' sound for the glue letter, and then you put the tip of your tongue at the top of your mouth and blow again. And you stretch it out so you can think about what you are doing."

The WordWork construction zone incorporates several principles from Structure and other facets of Venezky's early work (Calfee & Norman, in press). First, phoneme awareness is developed through articulation of the simple consonants; /p/, /t/, /k/, /f/, /s/, /m/, /n/, /b/, /d/, /g/, /v/, /z/ are regularly and productively linked to simple graphemes, which in turn can be directly linked to letter tiles. From the outset of instruction, letters are paired with sounds, graphemes with phonemes. Vowels are then introduced as glue letters, allowing students to build words (CVCs). Leading the students in choral response, the teacher models blending and stretching. PAT is not /puh/ /a/ /tuh/, but /p-aaaa-t/. Short-a leads the curricular list because it is the first letter in the alphabet,


and the short pronunciation sets the phoneme apart from the letter name. The first seven consonants in the previous list, along with medial /a/, produce 49 CVC patterns, most of which are real words and all of which (along with the -VC rimes) are pronounceable and useful. Add another few consonants (taking care to avoid semivowels like r and l) and the remaining short vowels, and the first grader has access to several hundred building blocks. Additional consonant blends and digraphs move the collection into the thousands. All of the constructions are inherently phonemic and syllabic, and all have morphemic potential in the Anglo-Saxon layer of the language—they enter into compound words and other morphemic combinations. Secondary patterns are brought into the mix based on productivity and efficiency—how many CVC units can be generated by each element and how simple is the element. Consonant blends are quite productive and easy to add to the mix; -tch and qu- don't help that much and can be difficult to explain. An aside: The grapheme-phoneme correspondences can be used for both reading and writing, allowing students to compose their own decodable texts. Metaphonic exchanges among students are fostered by the availability of a technical language and purposeful discussions about spelling patterns. The technical language captures important constructs in language appropriate for young children, hence glue letter for vowels and popping and nose sounds for plosives and nasals. Small-group discussions provide the basis for transfer of the concepts, as does the cumulating curriculum. After a couple of weeks on short-a patterns, short-i is introduced as another glue letter, but with ongoing review of short-a. As they are acquired, CVC patterns are mentioned when they appear in reading and employed for writing activities but without restrictions on either readings or writings.
After the five short vowels, students are next introduced to the Anglo-Saxon marking system for the primary vowels. Final-e is presented as a buddy that lets the vowel say its name. Students act out the relation using neck cards. Again following Structure, common Anglo-Saxon suffixes (e.g., -ING, -ED, -ER) are added to the mix to explain the function of geminate consonants:

Build MAT.
Add -E as a buddy—MATE.
Add -ED to MAT, with another T to break up the buddy—MATTED.
Add -ED to MATE and forget -E—MATED.
Add -ING to MAT, with an extra T—MATTING.
Add -ING to MATE and drop -E because any glue letter can be a buddy—MATING.
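The MAT and MATE transformations can be read as two small rules: drop a final-e buddy before a suffix that begins with a vowel, and double the final consonant of a short-vowel word so that the suffix vowel cannot act as a buddy. The Python sketch below is a deliberately simplified rendering of those two rules for one-syllable bases. The function name and the exclusion of w, x, and y from doubling are assumptions added for illustration, and the sketch ignores the many exceptions found in real spelling.

```python
# Simplified sketch of the final-e and consonant-doubling rules.
# Assumption: one-syllable bases and vowel-initial suffixes such as -ed, -ing, -er.
VOWELS = set("aeiou")

def add_suffix(base: str, suffix: str) -> str:
    """Attach a suffix to a base, applying the two marking rules."""
    base = base.lower()
    suffix = suffix.lower().lstrip("-")
    if not suffix or suffix[0] not in VOWELS:
        return base + suffix
    # Final-e "buddy": drop it, because the suffix vowel takes over as marker.
    if len(base) >= 3 and base[-1] == "e" and base[-2] not in VOWELS:
        return base[:-1] + suffix
    # Short vowel plus single consonant: double the consonant to block the buddy.
    ends_in_single_consonant = (
        len(base) >= 2
        and base[-1] not in VOWELS
        and base[-1] not in "wxy"
        and base[-2] in VOWELS
        and (len(base) == 2 or base[-3] not in VOWELS)
    )
    if ends_in_single_consonant:
        return base + base[-1] + suffix
    return base + suffix

for base in ("mat", "mate"):
    for suffix in ("-ed", "-ing"):
        print(base, suffix, add_suffix(base, suffix))
```

The point of the sketch is only that the doubled T in MATTED and the dropped E in MATING follow from the same marking principle, which is what the neck-card activity lets children discover for themselves.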

The apparent complexity of the relations vanishes as first graders act out the transformations with neck cards, arguing and explaining as the actors move from one position to another. Meaning springs from the constructive process. It matters little whether the patterns are semantically real or familiar to the children (no need to discuss the meanings of mating). Transfer of the marking system from A to the other primary vowels is typically rapid, doubling the number of vowel patterns in a matter of a few weeks. The suffixes, taught as units, are almost free of charge because of their high frequency of occurrence in authentic texts. By the time students have completed the 10 primary vowel patterns, their capacity for word attack provides a solid foundation for both reading and writing and for conversations about spelling-sound patterns. At this point, attention can be given to secondary vowel patterns, including vowel digraphs and variations associated with semivowels (the -R and -L-affected vowels). These patterns, which will have already been encountered in reading, are introduced as such, using charts to lay out various patterns. The material basis for the program is a scope-and-sequence chart featuring busybody ants, letter tiles, handy-word lists, neck cards, and a few wall charts. The program does pose challenges to the teacher accustomed to a more traditional phonics program. Curricularly, there may not appear enough to do. Instructionally, the task is to draw out student responses rather than teach. Small-group activities, often with student pairs, are essential but require skills in managing and encouraging group interactions. The linguistic concepts are generally unfamiliar to teachers, as is the idea of cumulative learning, of reviewing previous learning rather than moving from one disconnected objective to another. 
Achievement results have proven positive in several experiments across a wide range of contexts (Calfee, 1998; Calfee et al., 2001; Calfee et al., in press). The strategy, when implemented by a knowledgeable teacher, appears effective, efficient, transferable, and sustaining. Unfortunately, WordWork is difficult to package, leaving administrators and teachers to wonder how it can possibly be a real program.

The "Morpho" in Morphophonemic WordWork covers the Anglo-Saxon patterns found in children's books in the primary grades. By fourth grade, vocabulary in social studies and (especially) science texts begins to be populated by words from Romance and Greek origins. Second graders may read a story about sandpipers digging for clams; fifth graders are more likely to encounter an exposition about the excavation of molluscs by Scolopacidae. The latter words seldom occur with high frequency but are highly informational. These texts include many familiar elements, including handy words and recognizable syllable patterns. But a new set of morphological structures arrives on the scene, leading to longer words with less obvious junctures. Consider submission: the obvious chunks include sub (a sandwich or a stand-in teacher), miss (a failure to hit or perhaps how to address the stand-in), and ion (have heard it somewhere). The surface analysis helps neither phonologically nor semantically. Anglo-Saxon vowel marking systems no longer have the same degree of consistency—consider sane versus sanity. Anglo-Saxon stress patterns are fairly simple: Destress function words and most affixes, but everything else (including compound words) receives equal emphasis. A word like internationalization, by contrast, is an auditory rollercoaster, with a muted schwa around every bend.

Finally, oral reading is finished by fourth grade. Comprehension is featured, mostly answering questions from the teacher or textbook. Vocabulary occurs regularly, mostly as memorizing word meanings. Roots and affixes appear occasionally and rather disjointedly. Spelling is tested every Friday, usually as a list of unrelated words. If students haven't already come to puzzle about English print patterns, spelling tests are likely to drive them to distraction. For instance, elementary students are often taught that when two vowels go walking, the first does the talking. This rule is wrong most of the time in any event, but it never works with -sion, -tial, and their many relatives. Rarely do students learn about the history of the language.

WordWork II (Calfee & Norman, in press) extends the primary-grades program to the late elementary grades and beyond, following the same basic themes from WordWork—efficiency, productivity, constructivism, and metacognition.
Students are introduced to the linguistic history, both oral and orthographic. The program features the morphophonemic building blocks for words from Romance origins—root words, prefixes, and suffixes—much as consonants and vowels are laid out in WordWork I. The letter-sound correspondences for these building blocks are simpler and more consistent, by and large, than for Anglo-Saxon morphemes, and CVC spelling patterns often transfer directly. The collection of building blocks varies in productivity and ease of access, and the curriculum begins with easiest and most useful elements: pre-, post-, uni-, bi-, pro-, anti-, tele-; -vid-, -scrib/pt-, -graph-, -man-, -ped-; -t/sion, -t/cious, -ist/ive, -ology. The components are then assembled much like the popular mix-and-match animal books. Some constructions from the list emerge as real words, familiar in their own right, such as television, or to be found in the dictionary. Others require the creation of a meaning, as in antipedology (one possibility, "the study of New York cab drivers").
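The mix-and-match assembly described above amounts to taking every combination of one prefix, one root, and one suffix. The Python sketch below uses a subset of the listed building blocks. The one-word glosses are ordinary dictionary meanings supplied here as assumptions for illustration, and the function name is hypothetical.

```python
# Illustrative sketch: assemble prefix + root + suffix constructions.
# Assumption: the glosses are common dictionary meanings, not program materials.
from itertools import product

PREFIXES = {"pre": "before", "post": "after", "uni": "one", "bi": "two",
            "anti": "against", "tele": "far"}
ROOTS = {"graph": "write", "man": "hand", "ped": "foot"}
SUFFIXES = {"ology": "study of", "ist": "one who"}

def mix_and_match():
    """Yield (word, literal gloss) for every prefix + root + suffix combination."""
    for (p, p_gloss), (r, r_gloss), (s, s_gloss) in product(
            PREFIXES.items(), ROOTS.items(), SUFFIXES.items()):
        yield p + r + s, f"{s_gloss} [{p_gloss} + {r_gloss}]"

constructions = dict(mix_and_match())
print(len(constructions))             # 6 prefixes x 3 roots x 2 suffixes
print(constructions["antipedology"])  # literal parts only
```

The literal gloss is only a starting point. As with antipedology, students still have to work out how the parts relate before they arrive at a usable meaning.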

Meaning moves to center stage in this activity; the meaning of the individual morphemes provides a starting point, but establishing the relations is critical. Conventional pronunciation is of secondary importance in early learning. Accenting the right sy-LAB-le can be remedied once the student is close. The aim of the exercise is the recognition that English words spring from different origins that call for distinctive word-attack strategies. Awareness of word origin is fostered in two ways. First is the realization that long words cannot always be divided into simple compounds but sometimes need to be tackled differently. Second is sensitivity to the suffixes that mark Romance words. Once a Romance word is identified, chunking is a matter of identifying the prefix and the root—suffixes generally play a secondary role. In addition to the mix-and-match activity, the strategy can be applied on the fly during content area reading. The pieces of multicultural may not have been covered in previous lessons, but the -al is a signal, multi- is likely to be familiar from other contexts, and culture remains as the root. Teacher knowledge and skill are critical for implementing the strategy. Program materials are available from several sources (Calfee & Associates, 2004; Calfee & Norman, in press; Henry, 2003), but implementing the strategy where it counts most—content area reading and writing—requires the teacher to seize the moments. As noted earlier, surveys have demonstrated that primary-grade teachers are not especially knowledgeable about linguistics in general and orthographic principles in particular. It is easy to trick them with patterns like fox in socks. The same situation probably holds true for teachers in the later grades. Latin and Greek vanished from the curriculum several decades ago (along with most other foreign languages). By the middle-school grades, the pressure to cover the content is enormous, and the emphasis is on reading rather than thinking. 
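The chunking strategy for a word such as multicultural (spot the suffix that signals a Romance word, peel off a familiar prefix, and treat what remains as the root) can be sketched in a few lines of Python. The affix lists and the function name below are hypothetical and far from complete, and the leftover stem is a spelling stem rather than a dictionary root: culture loses its final e before -al.

```python
# Rough sketch of prefix/root/suffix chunking for Romance-origin words.
# Assumption: tiny, hand-picked affix lists chosen only for this illustration.
PREFIXES = ("multi", "anti", "tele", "post", "pre", "uni", "bi")
SUFFIXES = ("ology", "ist", "ive", "al")

def chunk(word: str) -> tuple:
    """Return (prefix, stem, suffix); empty strings mean nothing was matched."""
    word = word.lower()
    # Try longer affixes first and leave at least a short stem behind.
    prefix = next((p for p in sorted(PREFIXES, key=len, reverse=True)
                   if word.startswith(p) and len(word) > len(p) + 2), "")
    rest = word[len(prefix):]
    suffix = next((s for s in sorted(SUFFIXES, key=len, reverse=True)
                   if rest.endswith(s) and len(rest) > len(s) + 1), "")
    stem = rest[:len(rest) - len(suffix)]
    return prefix, stem, suffix

print(chunk("multicultural"))
print(chunk("antipedology"))
```

A mechanical split of this kind is only the first step. The text's own example of submission shows that surface chunks can mislead, which is why the teacher's knowledge of word origins matters.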
WordWork II requires students (and teachers) to "slow down and make the moment last." It does not require a degree in Romance languages; rather, as students examine the morphological patterns that appear in science and social studies, the teacher learns from their experiences. As an aside, students for whom Spanish is the first language bring expertise that can serve others because their first language is quite similar to the "fancy" French words bequeathed by William the Conqueror a millennium ago.

AROUND THE NEXT BEND ...

The phonics fury has afflicted English-speaking countries, especially the United States, for generations. Extremes of the dispute—to teach every spelling pattern directly or to rely on nature—have dominated much of the discourse. The recent calls for balance (Pressley, 2002) seem reasonable on the surface, but how to design and manage the substance is less obvious. Venezky's analysis suggests a different strategy, one that unearths from events during the past two millennia a small set of core concepts that provide the foundation for leading students to an understanding of English orthography as they move from kindergarten to the high school years, through an emerging awareness of the origins of this rather incredible language. The continuing challenge to full realization of the potential of these discoveries is effective engineering, a notion that Venezky would certainly appreciate.

ACKNOWLEDGMENT

Prepared for the Festschrift, in Honor of Richard L. Venezky. Support provided by Spencer Grant 199900046.

REFERENCES

Bloomfield, L., & Barnhart, C. L. (1961). Let's read: A linguistic approach. Detroit, MI: Wayne State University Press.
Bond, G. L., & Dykstra, R. (1967). The cooperative research program in first-grade reading instruction. Reading Research Quarterly, 2(4).
Calfee, R. C. (1998). Phonics and phonemes: Learning to decode and spell in a literature-based program. In J. L. Metsala & L. C. Ehri (Eds.), Word recognition in beginning literacy (pp. 315-340). Mahwah, NJ: Lawrence Erlbaum Associates.
Calfee, R. C., & Associates. (1981). THE BOOK: Components of reading instruction. A generic manual for reading teachers (rev.). Unpublished, Stanford University, School of Education.
Calfee, R. C., & Associates. (2004). Project READ-Plus materials. Riverside, CA: University of California. http://www.education.ucr.edu/read_plus
Calfee, R. C., & Norman, K. A. (1999). Psychological perspectives on the early reading wars: The case of phonological awareness. Teachers College Record, 100, 242-274.
Calfee, R. C., & Norman, K. A. (in press). Working with words. New York: Guilford Press.
Calfee, R. C., Norman, K., Miller, R. G., Wilson, K., & Trainin, G. (in press). Learning to do educational research. In R. J. Sternberg & M. Constas (Eds.), Translating theory and research into practice. Mahwah, NJ: Lawrence Erlbaum Associates.
Calfee, R. C., Norman, K. A., Trainin, G., & Wilson, K. (2001). Conducting a design experiment for improving early literacy, or, what we learned in school last year. In C. Roller (Ed.), Learning to teach reading: Setting the research agenda (pp. 166-179). Newark, DE: International Reading Association.
Fries, C. C. (1962). Linguistics and reading. New York: Holt, Rinehart & Winston.
Fry, E. B., Kress, J. E., & Fountoukidis, D. L. (2000). The reading teacher's book of lists (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Gaur, A. (1992). A history of writing. New York: Cross River Press.
Goswami, U. (2000). Phonological and lexical processes. In M. L. Kamil, P. B. Mosenthal, P. D. Pearson, & R. Barr (Eds.), Handbook of reading research, Volume III (pp. 251-268). Mahwah, NJ: Lawrence Erlbaum Associates.
Henry, M. K. (2003). Unlocking literacy: Effective decoding and spelling instruction. Baltimore: Brookes.
Lee, V. (2003, October 27). Pardon my planet (Cartoon).
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Lionni, L. (1963). Swimmy. New York: Dragonfly.
MacNeil, R. (2003, November). The new story of English. Keynote address to convention of the National Council of Teachers of English, San Francisco, CA.
McCrum, R., Cran, W., & MacNeil, R. (1986). The story of English. New York: Penguin Press.
Moats, L. C. (2000). Speech to print: Language essentials for teachers. Baltimore: Brookes.
National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Government Printing Office.
National Reading Panel. (2001). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Washington, DC: NICHD/NIH.
Nist, J. (1966). A structural history of English. New York: St. Martin's Press.
Pressley, M. P. (2002). Reading instruction that works: The case for balanced teaching (2nd ed.). New York: Guilford Press.
Seuss, Dr. (1957). The cat in the hat. New York: Random House.
Venezky, R. L. (1967). English orthography: Its graphical structure and its relation to sound. Reading Research Quarterly, 2, 75-105.
Venezky, R. L. (1970a). The structure of English orthography. Paris: Mouton.
Venezky, R. L. (1970b). Regularity in reading and spelling. In H. Levin & J. P. Williams (Eds.), Basic studies on reading (pp. 30-42). New York: Basic Books.
Venezky, R. L., & Venezky, K. (in press). An ABC of odd and unusual spellings.

2 Phonological Variation and Spelling

Rose-Marie Weber
University at Albany, State University of New York

English speech varies in the ways that it is pronounced from one place to another, from one group to another, and, by any individual, from one occasion to another. English spelling, on the other hand, varies very little, so that its fixedness conceals these facets of dialect and style. In The American Way of Spelling of 1999, Richard Venezky offered us his wordlywise analysis of the written forms of English, upholding and consolidating the insights from 30 years of scholarship. To draw the connections between spelling and sounds, he chose a single variety for the sound patterns, General American English, the better to concentrate on the graphical patterns that his analysis revealed. At the same time, he revisited topics relevant to dialect variation. For one thing, he gave attention to the origins and evolution of English spelling, recognizing systematic changes in the sounds of the language through its history as a principal force in shaping the distinctive contours of English spelling today. For another, he emphasized the need for an abstract level of organization in moving from spelling to sound in order to capture the underlying consistency in spelling morphemes, in spite of differences in pronunciation; thus major/majority. For still another, he drew implications from his analysis of American spelling—not to mention his empirical studies of children and adults—for teaching the correspondences between spelling and sounds to novice readers, lightly touching on dialect differences in pronunciation across the country.

For this occasion, I would like to explore issues surrounding phonological variation in American English as they relate to children learning to read and spell as they begin school. Although we all notice variation in the real talk in our neighborhoods and in our classrooms, we have not generally brought it to bear on our thinking about the sound side of sound-letter correspondences. Yet in these times, attention to phonological awareness, invented spelling, and types of phonics instruction for teaching to read and write has intensified. Teachers and children in classrooms across the country are spending a good deal of energy paying attention to sounds and their connection to letters and printed words. They are segmenting words into parts, stretching syllables, creating sets of rhyming words, writing inventively, and doing other sorts of word work that give weight to sounds as well as letters. Research on the substance and extent of such activities has proliferated and continues to raise many questions (e.g., Calfee & Norman, 1998; Dahl, Scharer, & Lawson, 1999; Ehri et al., 2001). There is evidence that many children truly benefit from systematic instruction on sounds and letters, but there is also evidence that many children break into the sound-letter connection through intense experience reading and writing on their own terms.

At the same time, scholarship on the dimensions and distribution of phonological variation in American English has thrived. Not only has the traditional study of regional dialects been invigorated, but it also has been extended by the study of the variation that cuts across regions—the speech patterns that individuals adopt, reject, and evaluate as members of social groups and, beyond that, the range of styles they draw on to create and take part in particular speech events.
A good deal of this work has been undertaken to address core questions about the very nature of phonological systems, how they change through time and space and how they should be studied as social phenomena. All together this work has provided evidence that the kinds of sound changes in the past that led to today's dialect variation are continuing into the present and shaping the future. In several instances, whole sets of vowels are shifting their qualities together in ways that recall the Great Vowel Shift in the history of English. The place of children in the picture of variation and change is only beginning to emerge (Chambers, Trudgill, & Schilling-Estes, 2002; Labov, 2001; Pederson, 2001; Wolfram & Schilling-Estes, 1998). My plan for this chapter, then, is to sketch the regional, social, and stylistic dimensions of research on phonological variation in American English and then to take stock of how these may be involved in current thinking, research, and practice in learning and teaching to read as children move into reading and writing. First, though, it is relevant to say a few words about spelling theory, a field where Venezky has been an active player, and to consider the ways that phonological variation is taken into account.

Theoretical discussion about the relationship between a standard spelling, such as English, and the variation in the spoken language that it necessarily accommodates is sparse (Coulmas, 1989; Cummings, 1988; Daniels & Bright, 1996; Luelsdorff, 1987; Vachek, 1989). When writing systems are considered in general, it is assumed, implicitly if not explicitly, that a spoken language has unvarying phonetic form across regions and speakers. When a writing system is examined in particular, an idealized form of the spoken language is chosen as a frame of reference. In many cases, it is a reflection of the variety that has become established as the standard, identified with the well educated for use in public settings and upheld through schooling, the media, and in some places even a language academy. This idealization of the spoken language, cut away from its contexts of use and cut up into segments for analysis, can then be examined for its relationship to its written form with few disturbances. Dialects are susceptible to being viewed as falling short of the idealized spoken language, handled incidentally, or ignored. Theoretical efforts in spelling that grew from generative phonology have included reference to dialect variation in an incidental way. Abstract representations, roughly comparable to Venezky's (1999) morphophonemic representations, are required to capture the phonological relationships in sets like major/majority or courage/courageous. Carol Chomsky (1970), for instance, proposed that such abstract representations in a generative grammar—she called them lexical spellings—correspond closely to conventional spelling. In this respect, they are free of phonetic detail and resistant to change, so they transcend dialect variation in English around the world. 
She acknowledged, though, that children are only on their way to acquiring the lexical spellings for sets like major/majority as they begin school, constructing and modifying them as they gain experience in the phonology and learn to read and spell. Generative phonology, with its cognitive slant, invited researchers to frame their thinking about abstract representations that children may call on as they bridge varieties of spoken and written English.

In the United States, General American English is the idealized variety often chosen for analyzing spelling and its relation to phonology. The label is applied, for instance, as a cover term for American as opposed to British (McMahon, 2002) or contrasted with varieties such as African American English (Lanehart, 2001). The features of General American English have been chosen as the basis for the pronunciation of sounds in many dictionaries, in materials for teaching phonics, and in materials for teaching American English abroad. As noted by Pederson (2001, p. 261), it is rooted in the Inland Northern regional dialect of the area around Chicago, the native pronunciation of so many scholars of 20th century American speech—including Venezky (he grew up in Peoria!). General American English is sometimes described as lacking regional features, providing the model for so-called Network English and for the elite speech type that Kretzschmar (1997) identifies as coexisting with regional and social varieties across the country. General American English has provided the fixed frame of reference for linking spelling to sounds in our educational traditions.

Standard American English, on the other hand, suggests a somewhat different abstraction because it accommodates regional variation in pronunciation across the United States, as one moves from Boston to Atlanta to St. Louis to San Francisco. It also encompasses ethnic and other social varieties that flourish across the nation free of stigmatized features. Further, it includes pronunciation used for informal, everyday exchanges—the vernacular—as well as more formal speech. Standard American English, given the varieties it encompasses, is complexly related to our spelling, given its consistency and stability. When it comes to varieties considered nonstandard for their stigmatized features, the links to spelling are more tortuous.

Of course, there are some challenges to spelling that hint at the everyday variation in pronunciation, standard or otherwise (Weber, 1986). They include the narrow range of conventions for representing informal speech in dialog and advertising (hafta and gimme) and the spellings associated with cultural eddies such as hip-hop (missundastood) and instant messaging (u c?). These variations have not been welcomed into the language arts curriculum, given its societal function to maintain Standard American English and the spelling system that supports it.

The question of how young children face the English spelling and the phonology of schooling, speaking as they do when they first begin formal study of literacy, needs to take into account the variation they encounter.
Given that phonological awareness, invented spelling, and phonics programs are central to current early literacy curriculums, such variation may raise practical issues for young learners and their teachers, especially at a time when children are quickly expanding their linguistic capacities. In that period when learners set out to master the system, they are the ones who will bridge the differences between their speech and its written representation.

THE STUDY OF VARIATION

Ways to study how American English phonology varies by region have been extended to allow more precise thinking about other aspects of variation, its place in the history of English, and especially its complex relation to social factors. The methods go far beyond the traditional questionnaires about words of interest from key speakers in the field, though they have not been abandoned. They now centrally include long interviews that provide multiple tokens and contexts of sound segments of interest from carefully chosen samples of speakers differentiated by social characteristics. These data are subjected to sophisticated analysis—linguistic, social, statistical, acoustic, cartographic, and more—as a foundation for significant sociolinguistic generalizations as to variation by region, by social identity of speakers, by occasion, and by the interconnections among them. A leading figure in these matters is the linguist William Labov (Chambers et al., 2002; Labov, 2001; Labov, Ash, & Boberg, 2004; Wolfram & Schilling-Estes, 1998).

These methods for studying variation have been refined and reinforced by the guiding principle that phonological change, the source of dialect variation, is at root socially motivated. They have yielded results that emphasize the importance of social arrangements and group identity to the ways people speak, whether with conscious intention or not. Furthermore, with respect to both theoretical and practical issues, as in education, they have led to outcomes that at times sustain and other times break down a sharp distinction between regional dialects, social dialects, and style.

To take one brief example, the choice of -in' over -ing in verbs, long-standing in the language, offers a stylistic choice. In the South, however, -in' is generally favored for most occasions. The choice of -ing, favored by Northerners in relatively careful speech, may contribute to Southerners' judgments of Northern Inland speech as "overly precise, self-conscious, and unnatural" (Pederson, 2001, p. 262). The choice of -in', though, may be favored by some Northerners, who might well agree with this judgment and avoid -ing except under the most formal circumstances—men more than women, children more than adults, plumbers more than teachers. There are other complexities.
Yet speakers with some education, by the way, spell the suffix -ing, no matter how they pronounce it, unless they're writing dialog or otherwise intentionally marking the text for a particular voice.

As a matter of course, linguists who study such variation for theoretical purposes give attention to Americans who remain at the margins of our affluence and our educational system. They have been concerned to apply their perspective to facing everyday issues in schools, especially teaching and evaluating children whose speech does not line up closely with their teachers' or with the words that they see in their books or on tests (e.g., Labov, 2003; Wolfram, Adger, & Christian, 1999).

PHONOLOGICAL DIFFERENCES BETWEEN VARIETIES

The idealized variety General American English serves our thinking about phonology in more than one way. It provides a general frame of reference for relating spelling patterns to sounds—and vice versa. It is the one that is conventionally represented in our materials for teaching language arts and for preparing teachers. It also provides a frame of reference for showing differences, an analytic convenience for describing how the phonology varies from one region to another. Types of phonological variation can be sketched relative to the set of phonemes of General American English, taking into account their phonetic features.

From this perspective on phonological systems, three types of differences from one variety to another are worth noting (McMahon, 2002). The first is systemic. A different variety may have a different number of phonemes. To take an example, General American English, as Venezky (1999) presented it, has 12 simple vowels, but some regions have only 11, and others have 13. On the side of fewer vowel sounds, in particular, the vowels in words such as cot and caught, body and bawdy have merged. Speakers without the contrast in their speech have a larger number of homophones than those who maintain it. In reading, the vowel sounds traditionally called short-o and aw are pronounced as one, but in spelling the vowel letters must be distinguished by the identity of the word without support from pronunciation.

A second type of difference is distributional. In this case, the difference in pronunciation from one variety to another is not a matter of number of phonemes but rather how the phonemes appear in words. There are two subtypes. In one case, a different phoneme occurs in a particular word or morpheme from one variety to another: You say greasy, I say greezy, you say shrimp, I say srimp. We both say ring, but specifically for the suffix -ing, we sometimes choose ringing, sometimes ringin'. In the other case, the distribution of the phoneme depends on phonological conditions and can be captured by phonological rules. For example, short-e and short-i merge into a single sound before the class of nasal phonemes in some varieties, as in ten and tin.
Other frequent examples involve the loss of phonemes or phoneme classes relative to General American, such as the loss of stops like d in certain consonant clusters at the ends of words, as ben' for bend and han' for hand. These distributional differences can involve many details and play havoc with spelling patterns.

A third difference is realizational, involving the way phonemes are realized phonetically. From one variety to another, the sounds considered the same phoneme have different qualities. Though the pronunciations differ, they do not cross the boundaries of phonemes. Short-a, as in the words bat, batch, and back, is pronounced in the Hudson Valley of New York State as Americans have pronounced it for generations, but in the western part of the state and beyond to the cities around the Great Lakes, short-a words such as bat are widely pronounced with the vowel quality of care and kale. The differences can be captured by rules specifying the conditions for their occurrence, such as when they precede the voiceless class of consonants, but they are not phonemic. Likewise, the dentalization of consonants t, d, n, l and the th- sounds, so that this sounds like dthis, creates a distinctive quality in words, but these consonants are variants of their phonemes. Our spelling does not capture differences in sound below the phoneme level, so such realizational differences do not directly involve sound-letter links.

DIMENSIONS OF VARIATION

Differences in phonology permeate English in ways that Americans may note, even stigmatize, or hardly notice at all. They can be seen to vary systematically by region, by the social membership of speakers, and by style. The differences come into play especially among children at the time when they begin school. It is important to underline, though, that these dimensions are not entirely separable and that a particular feature often crosses regional, social, and stylistic boundaries as Americans use their English for innumerable social purposes. Phonology is only part of the picture, as grammatical features, vocabulary, and ways of using the language show variation as well.

Regional Variation

The region where children grow up offers the model for their pronunciation. In the United States today, four broad regions are generally recognized in the study of dialectal differences and their ongoing changes: the North, the Midland, the South, and the West (Labov et al., 2004; Pederson, 2001; cf. Hartman, 1985; Wolfram & Schilling-Estes, 1998). These largely reflect the history of settlement in the country. Needless to say, the dialects are far from uniform and their boundaries far from fixed. Features other than phonological, especially vocabulary, often support but may also cross-cut pronunciation patterns. The broad regions include many distinctive dialect centers, such as New York City. One might think that regional variation in pronunciation may be lessening in these times of constant radio and television, high levels of schooling, shifts in occupations, mobility in the population, and the continuing expansion of metropolitan areas.
But differences among the regions are thriving and even appear to be intensifying. Sound change is still operating in English, in spite of spelling, just as it has through the history of the language. It is largely out of awareness, fast enough to be tracked by intricate methods (Labov, 2001) but slow enough to maintain the coherence of the phonological system as a carrier of meanings.

The most widespread systemic difference concerns the vowels of cot/caught, body/bawdy. In much of the Midland and the West, not to mention Canada, the distinction no longer exists. The sounds have merged, reducing the number of phonemes in those dialects relative to the rest of the country. The merger has not taken over entirely in these regions, the large cities in California showing diversity. Mergers like this increase the number of homophones for their speakers. As readers, they may take the homophones in stride, as they would in speech, and learn to benefit from the distinctions in meaning that the spelling represents. As spellers they may need to learn to track those distinctions more closely than other learners (Varnhagan, Boechler, & Steffler, 1999).

Other mergers of phonemes are apparent by region, but their distribution is restricted to certain phonological environments. The merger of short-i and short-e, as in pin/pen, gym/gem, holds only before nasal sounds, especially m and n, distinguishing the South regionally. The merger of vowels preceding -r in words, as in fairy/ferry/carry and very/vary, is widespread across the country, but the vowels remain distinct in some places. For another example, in parts of the South, long-e and short-i have merged before l, so that words like feel/fill, heel/hill sound the same. In these cases, the vowels in question are distinctive in other phonological environments and accord with their spelling, but in the particular environments, the distinctiveness disappears (Labov et al., 2004; Pederson, 2001).

Social Dialects

Cutting across the variation in English by region is variation by the identity of speakers: their social class, ethnicity, gender, age, place of residence, and the like. The systematic study of social factors in relation to linguistic form, phonology in particular, has expanded tremendously, clarifying yet complicating issues of linguistic theory and educational practice. It has led to intense examination of the historical sources of variation and the social motivation for phonological changes, such as the Northern Cities Shift going on around us.
It has brought into focus the evaluations that Americans make of one another through speech, especially what we have come to call nonstandard English, a perennial concern of schools and society (Labov, 2001; Wolfram & Schilling-Estes, 1998).

The phonology of nonstandard varieties, such as Appalachian English and African American Vernacular English (AAVE), differs from the General American frame of reference, especially with regard to consonants. In AAVE, for instance, voiced th in words such as them and that may be pronounced as d, as in many other varieties, and voiceless th after vowels in words such as tooth and breath may be pronounced as f. Specific types of consonant clusters, such as -st and -nd, are reduced at the ends of words, so that innumerable English words such as fact and find are pronounced fac' and fin'; often these reductions involve suffixes, permeating the workings of the grammatical system and so playing a role in nonstandard features beyond phonology, such as reducing the past tense suffix of fined in They fined me ten dollars relative to General American.

Close study has revealed that for many features, such as -ct reduction, the consonant clusters are not reduced every time they occur in running speech by speakers of a variety. In large samples, the nonreduced forms occur as well, suggesting that speakers have knowledge of standard forms and don't always put them into play. Some of the reductions are conditioned by whether or not they are followed by vowels in the following word or suffix: act of God but ac' mean. Some speakers of AAVE, though, may reduce them even before vowels. At any rate, what stands out is that few consonant reductions are unique to AAVE but rather occur in the rapid or vernacular speech of standard English speakers as well, though not with the same frequency (Wolfram & Schilling-Estes, 1998).

Social dialects reflect class and ethnic roots that arouse concern about the differences between the standard variety spoken in schools, fixed spelled forms, and the lively speech children bring from home. Their study and implications for education show the paradoxes of living in a complex linguistic world. The prestige dialect clearly has antiprestige counterparts (so-called covert prestige) that speakers value on their own terms. In these times, teachers are often urged to accept nonstandard varieties of English in their classrooms. In this way, students can use their home variety as a tool for thinking, for taking stances, for exchanging ideas, and, in time, as a foundation for acquiring the standard variety that holds in schools and in white-collar occupations (Wolfram & Schilling-Estes, 1998).

Style and the Vernacular

Style is the type of variation in the language that covers the range from formal to informal, careful to casual speech, lento to allegro. Style changes with the nature of occasions; speakers choose their vocabulary, pronunciation, sentence types, and discourse to create occasions, to project their identities onto a situation, to express their deference, and the like. The informal style used among family and friends over everyday matters, the vernacular, may not always be appropriate in classrooms for instructional matters. But it is considered by variationists to be the most authentic phonologically. When speakers use the language with least attention to its form, they put into play the variety that is most closely aligned with the pronunciation of other members of the speech community, that shows the most regular patterning, and that can be seen as the primary locus of sound change (Labov, 2001; Wolfram & Schilling-Estes, 1998).

Vernacular style and social dialect overlap in many ways. To sound casual or familiar, speakers may use a broad range of features common in their social group, such as infrequent r after vowels in words such as far, farm, even foreign. To sound more formal, they may overrule such features and in this case move toward a higher incidence of r, choosing to sound perhaps clearer, more precise, and, incidentally, closer to spelling. Less formality is available as a stylistic choice in the written language through contractions such as we'll and can't. But these contractions can only hint at the stylistic variation available in speech.

In primary-grade classrooms, teachers may use a range of styles. They may project their voice so that all in the class can hear, speaking forcefully and deliberately to present a problem, to explain a concept, to give instructions, to model pronunciations of words under study, and to discipline. In reading a story to the class, they may exaggerate their pitch range, stretch out words and phrases, and otherwise decorate their formal style. But to an individual child, they may softly say, Keep goin' or Dja lose y'r pencil? In this less formal speech, they may omit consonants, lower the stress on certain vowels and weaken their quality, choose forms such as in' for ing, and, in a rule-governed way, blur word boundaries. In other words, they sometimes choose to sound the way they often do at home with their own children. Their students, though, may not yet have developed their deliberate speech.

Children's Variety

Still another dimension of variation, the children's own emergent knowledge of the phonological system, may come into play as children begin formal instruction in reading and writing English. From their point of view, they face a complex writing system as they shape their ideas about what a word in print is, how letters constitute words, and how these relate to the speech they use to express meaning.
It is likely that young school-age children come to school with a somewhat different phonological system from adults, especially their teachers. The differences can fall along the several regional, social, and stylistic lines and the ways they are intertwined in their families and community. Although children and teachers live in the same region, they may well be from a different social group, growing up around a somewhat different vernacular.

Studies of variation in the speech of children have provided many examples indicating that children learn variation in phonology as they acquire the phonology of the speech community and, by the way, take part in ongoing sound change (Roberts, 2002). To take a few examples, even preschool children adjust their pronunciation according to topic, to communicative intent, to conversational partner, for instance, playing doctor (Andersen, 1990) and talking with adults in contrast to other children (Wyatt, 2001), so that they are flexible. Yet social identity expressed through speech gets started early, as shown by the boys ages 3 to 10 years who chose -in' over -ing more frequently than the girls (Fischer, 1958). Children show that they change through the school years, for instance, the African American children in Grades 1 to 4 who came to select -th over -f in words such as with and both, especially in structured speaking situations (Wyatt, 2001). At the same time, there is evidence that when families move to a new regional dialect, children of school age tend to be less successful acquiring the new phonological features of the dialect than younger children (Roberts, 2002).

More fundamentally, children may hear and think about speech sounds in subtly different ways from adults. For instance, 3 of 15 first graders, when recently assessed by my students for rhyming ability as an aspect of phonological awareness, resisted hearing given words as rhymes on a published test. In our part of upstate New York, dog and frog do not rhyme, though this is the only possibility according to the test. These 6-year-olds heard the contrast, while my students, their teachers, were puzzled by it.

The firm relevant evidence comes mainly from studies of 5- to 7-year-olds' invented spelling, not to mention the observations of teachers, since the possibility for creative spelling entered our national curriculum. When Charles Read (1975), Rebecca Treiman (1998), and others took a close look at the spellings children chose, they came to appreciate how sensitive the children were to the sounds in the words they were writing down. Memorable examples, such as CHRAP for trap and JRAGIN for dragon, showed that the children applied the alphabetic principle to the sound characteristics they heard in their own speech, recognizing the palatalization.
Furthermore, Read (1975) and Treiman (1998), in their somewhat independent ways, concluded that the children not only were paying attention to low-level phonetic detail, but also showing their own conception of phonemic structure. The children in their studies were classifying sounds into abstract categories on their own terms. Many of them, for instance, identified the first sound of words like trap and try as belonging to ch rather than to t. Many also gave examples of invented spelling concerning nasals, familiar to teachers who encourage invented spelling. For pump, the children in Read's study wrote PUP; for bent, they wrote BET; for sink, they wrote SIK, omitting the nasal sounds m, n, or ng before consonants. Read (1975) noted that children apparently did not categorize such nasal sounds with nasals in other positions because they wrote the m in mad or the n in night. They heard that pump differs from pup and bent differs from bet, but they had no symbol to represent the acoustic fact that the nasalization is in the vowel, not a segment that we write with m or n. Read concluded that children may well regard the nasal as an aspect of the preceding vowel or of the following consonant, unlike adults.

Labov (2003) noted how the phonological systems of children who speak African American Vernacular English may differ from their teachers', emphasizing their broad capacities. Speaking of the representations of words in their mental lexicon, Labov insists they must be stored with respect to "phonological categories, general and abstract enough to cover the wide variety of phonetic variants that the child encounters in everyday life. Yet we cannot assume that these categories match the sophisticated categories of adults, which are influenced to some degree by the writing system" (p. 130). An example is the category of l sounds, as in leap, tool, and people. He points out that these are different sounds and that to young speakers, of AAVE in particular, the l of people is close to the second vowel of into and even identified as such.

The origins of children's differences may be developmental because they are still shaping their phonological systems as they begin formal reading instruction. But they are likely to be grounded in the social dialects and styles they hear and engage in day to day. For many children, their lives may not have involved much attention to careful or formal speech, so the instructional style of their teachers may be somewhat novel. Labov (2001) likes to say that we all speak our mother's vernacular (p. 415). At any rate, some of these details of difference may exacerbate the distance that children perceive between spelled words and spoken words.

VARIATION IN THE CLASSROOM

In these times of phonological awareness, direct phonics instruction, and all sorts of activities with individual words and their parts, it is worth considering what actually happens in classrooms, given the vernacular that children bring to class.
My informal survey of teachers concerning variation, especially nonstandard forms, elicited such remarks as "I work with it," "I provide the model," "I teach the kids the standard." My observations of teachers in classrooms, also informal, have shown that these remarks validly reflect what goes on. Teachers use their clear instructional style, focused on the task at hand and projected out over a group of students. The children are the ones who, as listeners and learners, make the adjustments.

To see how variation may be recognized in teaching, it is instructive to consider how publishers of materials have decided to address it. I examined nearly 30 examples of commercial products published since the mid-1990s, with an eye on the primary grades. These included materials to teach phonological awareness, phonics, and spelling; manuals for basal reading series; and books for teachers recommending ways to teach spelling-to-sound connections. By and large, little attention is given to differences. It seems that General American English is the variety represented in the materials. The cot/caught difference is maintained for all. Lists of words are offered that are supposed to rhyme (long, tong, strong, prong) for everyone; the word ten is chosen as a key word for short-e, even for areas where it would not differ from tin; the well-documented consonant-cluster reduction in AAVE is overlooked.

There were several exceptions. For instance, a book for teachers on the richness of word patterns for phonics study included this quick note regarding key words to exemplify pronunciations: "Remember that vowel sounds are pronounced differently in different parts of the country. If your students do not pronounce these words with the given sound, do not use them as examples of that sound" (Cunningham, 2000, p. 150). A spelling program (Harris, Graham, Zutell, & Gentry, 1995) provided multiple remarks on diverse pronunciations for teachers. For instance, the notes suggested that they give their r-less students practice pronouncing words like farm and there with the r sound, and prompted them to watch for possible spelling attempts, such as MAYIL for mail by students who say such words with two syllables. (The discussion of variation in a later edition of the program is much diminished.)

The density of social dialect features (not just phonological) in children's speech has been shown to be related to achievement in literacy, in particular among children who come to school speaking African American Vernacular English. By and large, those speaking with a higher density in kindergarten achieve less in later grades than those who began with low dialect density. But linguistic differences are not the clear cause, as various studies have shown (Washington & Craig, 2001).
Many other social factors may be implicated, including unequal educational opportunities, meager preparation for school, and low teacher expectations.

There have been several efforts to pinpoint specifically phonological dialect features as a source of difficulty for speakers of AAVE. Labov (1970, 2003), in the course of his scholarly work on linguistic variation and change, has sustained an intense interest in confronting reading failure, especially among inner city children. He recently embarked on directing an instructional program to teach AAVE speakers what they need to know about the code to read connected text, based not only on the overall connections between spelling and sound but also the dialect-specific intricacies of reduced consonant clusters at the ends of words. The children improved on clusters at the beginning of words, which had been difficult for them. But they did not improve on clusters at the ends of words such as left and test, where their spoken language had reduced clusters. He was cautious about attributing the children's difficulties to dialect as such because his project did not compare them to the problems of children speaking other dialects. But he concluded that the children may benefit from strengthening the abstract morphophonemic form, the type that fuses major/majority, in words with complex consonant clusters at the ends of words. This would be done by demonstrating that lef' and tes' in their speech have a t that shows up in the speech of other speakers of English, even in phrases such as left out and test on spelling (Labov, 2003).

Consonant cluster reduction as a dialect feature has been shown to come into play in a study of performance on a phoneme-deletion task by 7- and 8-year-old children who speak AAVE compared with children who speak the standard (Sligh & Conners, 2003). The children were matched for reading level yet showed different patterns of deletion, the standard-speaking children performing more accurately on word-final clusters (flakt > flat) than on beginning clusters (croal > coal) but, as expected, the AAVE-speaking children performing more accurately on beginning clusters than on word-final. Whereas the scores of the standard-speaking children correlated more strongly with reading measures, the overall scores of the AAVE-speaking children were on the whole more accurate, suggesting a greater phonological sensitivity among children who are on their way to becoming bidialectal, adding classroom style to their vernacular.

CLOSING COMMENTS

It is difficult to weigh the significance of phonological features of regional dialects, social dialects, and stylistic variation in American English for beginning readers and writers. They have to learn a good deal about the spelling-sound code through spoken surface forms. In the classroom, they have to learn how to listen and talk about these forms: names of letters, sounds, rhymes, words, beginning and end of word. The details accumulate quickly. For many children, most of the learning is below awareness, as is most of their language acquisition and, incidentally, most of adopting and maintaining a social dialect. From instruction in spelling patterns, they may become aware of what they have already learned. Then, if all goes well, their awareness is overtaken by automatic recognition of one word after another.

In light of the complexity and abstractness of the relationship between sounds and spelling, it may well be that differences between the children's vernacular and the instructional style of classrooms play a minimal part in connecting with print. At the same time that children are expected to become aware of the core correspondences and general regularities between spelling patterns and sound in English, they also face the inescapable irregularities in frequent words: of/off, the/she, do/go, own/down. They need to accommodate words that defy the patterns, even as they learn the powerful rules linking sound and spelling, aware of them or not. Expecting and accommodating such irregularity in the sound-spelling links, they may also come to expect and accommodate variation (a certain irregularity) rooted in their regional dialect, their social dialect, and their everyday speech. Perhaps more fundamentally, the variation they have come to expect and accommodate in speech may prepare them for the irregularities in sound-spelling patterns that exist for all English speakers. Irregularities, whatever the source, can cut into the confidence that children need to control and use the correspondence rules, but that is a different story.

Perhaps what many children really become phonologically aware of is how different their teachers sound, especially when working on how speech is written down. The study of variation in English suggests that, for some, talking like their teacher may very much be a part of their agenda in working out their social identity, though for others it may not be. The study of variation also points to how flexible and aspiring children are regarding the many sides of the phonology of their language. When it comes to bridging the gaps between English spelling, the idealized variety that teachers may refer to, the standard but regional English of the classroom, and their own vernacular, children do most of the work.

REFERENCES

Andersen, E. S. (1990). Speaking with style: The sociolinguistic skills of children. New York: Routledge.
Calfee, R. C., & Norman, K. A. (1998). Psychological perspectives on the early reading wars: The case of phonological awareness. Teachers College Record, 98, 242-275.
Chambers, J. K., Trudgill, P., & Schilling-Estes, N. (Eds.). (2002). The handbook of language variation and change. Malden, MA: Blackwell.
Chomsky, C. (1970). Reading, writing, and phonology. Harvard Educational Review, 40, 287-309.
Coulmas, F. (1989). The writing systems of the world. Cambridge, MA: Blackwell.
Cummings, D. W. (1988). American English spelling. Baltimore: Johns Hopkins University Press.
Cunningham, P. (2000). Systematic sequential phonics they use. Greensboro, NC: Carson-Dellosa.
Dahl, K. L., Scharer, P. L., & Lawson, L. L. (1999). Phonics instruction and student achievement in whole language first-grade classrooms. Reading Research Quarterly, 34, 312-341.
Daniels, P. T., & Bright, W. (1996). The world's writing systems. New York: Oxford University Press.
Ehri, L. C., Nunes, S. R., Willows, D. M., Schuster, B. V., Yaghoub-Zadeh, Z., & Shanahan, T. (2001). Phonemic awareness instruction helps children learn to read: Evidence from the National Reading Panel's meta-analysis. Reading Research Quarterly, 36, 250-287.
Fischer, J. L. (1964). Social influences in the choice of a linguistic variant. In D. Hymes (Ed.), Language in culture and society (pp. 483-488). New York: Harper & Row. (Reprinted from Word, 14, 47-56.)
Harris, K. R., Graham, S., Zutell, J., & Gentry, J. R. (1995). Spell it–write. Columbus, OH: Zaner-Bloser.
Hartman, J. W. (1985). Guide to pronunciation. In F. Cassidy (Ed.), Dictionary of American Regional English (Vol. 1, pp. xli-lxi). Cambridge, MA: Harvard University Press.
Kretzschmar, W. A., Jr. (1997). American English in the 21st century. In E. W. Schneider (Ed.), Englishes around the world (Vol. 1, pp. 307-323). Philadelphia: John Benjamins.
Labov, W. (1970). The reading of the -ed suffix. In H. Levin & J. P. Williams (Eds.), Basic studies on reading (pp. 222-245). New York: Basic Books.
Labov, W. (2001). Principles of linguistic change, Vol. 2: Social factors. Malden, MA: Blackwell.
Labov, W. (2003). When ordinary children fail to read. Reading Research Quarterly, 38, 128-131.
Labov, W., Ash, S., & Boberg, C. (2004). The atlas of North American English: Phonetics, phonology, and sound change. Berlin: Mouton de Gruyter.
Lanehart, S. L. (Ed.). (2001). Sociocultural and historical contexts of African American English. Philadelphia: John Benjamins.
Luelsdorff, P. A. (Ed.). (1987). Orthography and phonology. Amsterdam: John Benjamins.
McMahon, A. (2002). An introduction to English phonology. New York: Oxford University Press.
Pederson, L. (2001). Dialects. In J. Algeo (Ed.), The Cambridge history of the English language: English in North America (Vol. 6, pp. 253-290). New York: Cambridge University Press.
Read, C. (1975). Children's categorization of speech sounds in English (NCTE Research Report No. 17). Urbana, IL: National Council of Teachers of English.
Roberts, J. (2002). Child language variation. In J. K. Chambers, P. Trudgill, & N. Schilling-Estes (Eds.), The handbook of language variation and change (pp. 333-348). Malden, MA: Blackwell.
Sligh, A., & Conners, F. (2003). Relation of dialect to phonological processing: African American Vernacular English vs. Standard American English. Contemporary Educational Psychology, 28, 205-228.
Treiman, R. (1998). Beginning to spell in English. In C. Hulme & R. M. Joshi (Eds.), Reading and spelling: Development and disorders (pp. 371-393). Mahwah, NJ: Lawrence Erlbaum Associates.
Vachek, J. (1989). Written language revisited. Amsterdam: John Benjamins.
Varnhagan, C. K., Boechler, P. M., & Steffler, D. J. (1999). Phonological and orthographic influences on children's vowel spelling. Scientific Studies of Reading, 3, 363-379.
Venezky, R. L. (1999). The American way of spelling: The structure and origins of American English orthography. New York: Guilford.
Washington, J. A., & Craig, H. K. (2001). Reading performance and dialectal variation. In J. L. Harris, A. G. Kamhi, & K. E. Pollock (Eds.), Literacy in African American communities (pp. 147-168). Mahwah, NJ: Lawrence Erlbaum Associates.
Weber, R.-M. (1986). Variation in spelling and the special case of colloquial contractions. Visible Language, 20, 415-426.
Wolfram, W., Adger, C. T., & Christian, D. (1999). Dialects in schools and communities. Mahwah, NJ: Lawrence Erlbaum Associates.
Wolfram, W., & Schilling-Estes, N. (1998). American English: Dialects and variation. Malden, MA: Blackwell.
Wyatt, T. A. (2001). The role of family, community, and school in children's acquisition and maintenance of African American English. In S. L. Lanehart (Ed.), Sociocultural and historical contexts of African American English (pp. 261-280). Philadelphia: John Benjamins.

3

The Magic of Reading: Too Many Influences for Quick and Easy Explanations

Dominic W. Massaro
Alexandra Jesse
University of California, Santa Cruz

A skilled reader cannot help but read even the blandest banners on the information highway and real highways. Like listening, contact with the linguistic signal is all that seems to be necessary. This behavior is easily exposed by the Stroop color word test. You are asked to name the color of the print of each of the words in a list. When the words are the names of other colors (e.g., the word blue printed in red), however, you either switch gears into slow motion or name the written words rather than the colors (i.e., in our example, you incorrectly answer "blue" rather than "red"). The written word overrides your intention to name the color, contributing to the impression that reading is clearly magical.

The goal of this chapter is to show that reading of words, though indeed magical, is a magic that has been well examined and basically involves the ability of the reader to exploit multiple sources of information in an (overlapping) series of information-processing stages. Many of these sources and stages were studied by Dick Venezky, which makes this chapter a tribute to his insights into the magic of reading.

Our proposal is grounded in the assumption that reading words is fundamentally a pattern recognition process, which involves imputing meaning to an input pattern. As our guide to the understanding of visual word recognition, we use a pattern-recognition model, the fuzzy logical model of perception (FLMP), that has achieved scientific success in reading as well as in several other domains of information processing.

The general assumption of the FLMP is that well-learned patterns, such as written words, are recognized by applying a general algorithm, regardless of the modality or the nature of the pattern (see, e.g., Massaro, 1998). The FLMP assumes three operations: feature evaluation, feature integration, and decision. All three processes are successive but overlapping. Feature evaluation provides the degree to which each feature of the stimulus matches the corresponding feature in each prototype in memory. Prototypes are summary descriptions and contain a conjunction of various ideal properties (features) that a member of this prototype category should have. Fuzzy truth values (Zadeh, 1965) reflect the degree to which a given stimulus matches the features of a prototype. The fuzzy truth values lie between completely false (0) and completely true (1). In addition to the multiple bottom-up sources of information, various top-down sources are assumed. These sources in reading are the orthographic, phonological, syntactic, semantic, and pragmatic structure, as well as the sublexical mappings from print to sound. Continuous information is available from each source, and the output of the evaluation of each source is independent of the output of another source (see Fig. 3.1).

FIG. 3.1. Schematic representation of the FLMP to include learning with feedback. The three recognition processes are shown to proceed left to right in time to illustrate their necessarily successive but overlapping processing. These processes make use of prototypes stored in long-term memory. The sources of information are represented by uppercase letters. Auditory information is represented by A_i and visual information by V_j. The evaluation process transforms these sources of information into psychological values (indicated by lowercase letters a_i and v_j). These sources are then integrated to give an overall degree of support, S_k, for each alternative k. The decision operation maps the outputs of integration into some response alternative, R_k. The response can take the form of a discrete decision or a rating of the degree to which the alternative is likely. The feedback is assumed to tune the prototypical values of the features used by the evaluation process.
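The three operations can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions, not the authors' implementation: integration is taken to be multiplication of the fuzzy truth values, and the decision is taken to be each alternative's support divided by the summed support of all alternatives, as in standard presentations of the FLMP. The function names, the two-letter example, and the particular truth values are hypothetical.

```python
from math import prod


def integrate(truth_values):
    """Integration: multiply the fuzzy truth values (each between 0 and 1)."""
    return prod(truth_values)


def decide(support):
    """Decision: each alternative's support relative to the summed support.

    Assumes at least one alternative has nonzero support.
    """
    total = sum(support.values())
    return {alt: s / total for alt, s in support.items()}


def flmp(evaluations):
    """Map each alternative's per-source truth values to response probabilities."""
    support = {alt: integrate(vals) for alt, vals in evaluations.items()}
    return decide(support)


def prob_c(visual_c, context_c):
    """Hypothetical two-alternative case: is the letter 'c' or 'e'?

    visual_c and context_c are the degrees to which the visual features and
    the orthographic context support 'c'. Support for 'e' is taken to be the
    complement, a simplifying assumption for two alternatives.
    """
    return flmp({
        "c": [visual_c, context_c],
        "e": [1 - visual_c, 1 - context_c],
    })["c"]


# Effect of shifting the context from 0.3 to 0.7 at two levels of visual clarity.
effect_ambiguous = prob_c(0.5, 0.7) - prob_c(0.5, 0.3)  # visual source ambiguous
effect_clear = prob_c(0.9, 0.7) - prob_c(0.9, 0.3)      # visual source fairly clear
print(effect_ambiguous > effect_clear)
```

Under these assumptions, the same change in the context source moves the response more when the visual source is ambiguous (0.5) than when it is fairly clear (0.9).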

Feature integration combines all degrees of matches from each source of information for each prototype. The outcome of this process is the total degree to which each prototype matches the stimulus. The third process in the model makes a decision based on a relative goodness rule (Massaro & Friedman, 1990), that is, the relative support of one alternative compared to the support for all other alternatives. The model predicts that one feature has its greatest effect when a second feature is the most ambiguous. Through this assumption, the model predicts that the time for decision increases with the ambiguity of the information available to the decision stage (Massaro, 1987).

Consider the elaboration of the FLMP, depicted in Fig. 3.2, as a description of how the many different sources of information can influence letter and word processing in reading. The presentation of a letter pattern initiates a sequence of processing stages. Visual features are evaluated, and this information has several consequences. First, complete or even partial information from the features can activate letter patterns in long-term memory. Needless to say, the more visual information available, the more easily letter and word recognition can take place. Second, recognition of letters can be supplemented by the reader's knowledge of how letter patterns occur in the language. We call the form of this knowledge orthographic structure. Letters that occur together more often should be easier to recognize than those in an infrequent or unlawful arrangement because of the contribution of orthographic structure. Letter information activates words and spoken language representations, which we call phoneme information. Because readers also know the relationship between sounds and spellings, the activation of phonemes in turn activates a set of spelling patterns. Like the information about the association of letters to phonemes, the activated spelling patterns associated with phoneme information also feed forward to the lexical level and can aid or hinder word activation. A phoneme pattern limited in the number of ways it can be spelled would facilitate lexical access because only these spellings would activate the lexicon. When a phoneme pattern can be spelled in many different ways, it would hinder lexical access because a larger set of different possible spellings would be activating the lexicon. The information passed from this sound-to-spelling source (sound-to-spelling fluency) does not affect evaluation or integration but can influence the time needed for decision making (Massaro, 1987).

FIG. 3.2. The different processes between presentation of a letter string and access to the lexicon as described by an elaboration of the fuzzy logical model of perception, which shows the processing streams of the many different sources of information that can influence letter and word processing in reading.

MASSARO AND JESSE

Using this model as a framework, we discuss three of the sources potentially involved in the word recognition process in detail. The first source is visual influences, such as the features of letters and the overall shape of a word. Second, we describe research indicating that knowledge about the orthographic structure of a word might help its recognition. Finally, we discuss evidence that the two-way association between orthographic and phonological information influences the word recognition process.

INFLUENCES IN WRITTEN WORD RECOGNITION

In our model, letters and words are recognized via the visual features that make them up. Features can be elemental or relatively global depending on how much of a letter they describe. Elemental features of uppercase E include three horizontal lines and one vertical. A global feature of lowercase c, e, and o is a circular envelope that distinguishes them from other letters, such as f, h, or j. Discovering the functional features in reading is a challenging empirical endeavor (for reviews, see also Massaro & Sanocki, 1993). Our goal here is simply to provide the reader with the flavor of what is already known and of recent studies addressing this problem.

Reading research began as an active area of psychological inquiry at the turn of the century (see Huey, 1968; Woodworth, 1938). For the last three decades, after a period of relative inactivity during the heyday of behaviorism, the process of reading written words has been intensely studied. One finding that led to this renewed interest was the demonstration that a


letter could be better recognized when presented in the context of a word than when presented in a random letter string or even when presented alone. This advantage, called the word advantage or word superiority effect, was shown to exist even if the possibilities of postperceptual guessing and memory loss were eliminated (Reicher, 1969). What was it about words that contributed to this word advantage? A natural interpretation of the word superiority effect is that words are recognized as wholes without intermediate processing of the features of letters that make them up. This little paragraph circulated in cyberspace in the last quarter of 2003, with the implication that words are read as wholes:

Aoccdrnig to a rscheeahcr at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
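A passage of this kind can be generated mechanically. The sketch below is only an illustration of the manipulation the passage claims to use: the first and last letters of each word stay in place and the interior letters are shuffled. The function names and the fixed seed are assumptions, and the circulated paragraph itself does not follow this rule strictly.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters of a word; keep first and last in place."""
    if len(word) <= 3:
        return word  # fewer than two interior letters, nothing to rearrange
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]


def scramble_text(text, seed=0):
    """Scramble every alphabetic run in text; punctuation and spacing are kept."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(scramble_text("According to a researcher at an English university"))
```

Because the shuffle is random, a word can occasionally come out unchanged. Comparing reading speed or accuracy on such output with the intact text would be a direct way to check the claim that internal letter order does not matter.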

Are you impressed that you were able to read this passage? Maybe you shouldn't be because you read much more slowly and laboriously than normal. Reading aloud would have also revealed the added difficulty created by scrambling the internal letters. Holistic word recognition is an old idea in reading research. Like John Updike, we are not fans of holism: "Next to the indeterminacy principle, I have learned in recent years to loathe most the term 'holistic,' a meaningless signifier empowering the muddle of all the useful distinctions human thought has labored at for two thousand years" (Roger Lambert, in John Updike's Roger's Version, p. 171). Some researchers and educators (Haber, Haber, & Furlin, 1983; Johnson, 1975) proposed that words are recognized as patterns of unique shapes rather than as unique sequences of letters. We call these properties global supraletter features because they supposedly are composed of multiletter patterns and even whole word patterns. The earlier paragraph shows convincingly that we can read scrambled words, even if they are misspelled or incomplete (like rscheeahcr or iprmoetnt). But are we actually reading words as a whole? And do we need the first and last letter to stay in their original position? A little thought reveals that global features cannot be sufficient for even the expert reader. One of the strongest arguments against the idea of supraletter features is the small potential contribution of supraletter features to reading. Overall word shape, for example, does not sufficiently differentiate among the words of a language. In a classic study, Groff (1975) examined the shapes of high-frequency words taken from schoolbooks. The shape was defined by drawing a contour around the letters.


Only 20% of the 283 words was represented by a unique shape. Groff rightly concludes that the small number of words that can be represented by a unique shape precludes the use of this cue for accurate word recognition. Using a much larger sample of words, Paap, Newsome, and Noel (1984) also showed that there is not sufficient uniqueness of word shapes that could be used to mediate word recognition. There is also experimental evidence against the idea of word recognition based on supraletter features. Adams (1979) asked whether disrupting word shape (mixing upper- and lowercase and type fonts of letters) eliminates the identification advantage of words over nonword letter strings. If the word shape is contributing to the word advantage, because it is used to access the lexicon, then the advantage should diminish when the shape of words is altered and can therefore no longer be used to access the mental lexicon. The word advantage did not change when the global word shape was eliminated (see also Thompson & Massaro, 1973). One would think that the word shape idea was sufficiently demolished but Paap and his colleagues (1984) tested whether the number of words that share a certain word shape could still influence word recognition. When a shape matches a small set of words (e.g., cellar), then the shape feature restricts the lexical search to this small set of candidates, and therefore all words of this small set should be processed faster or more accurately than words in a larger set (e.g., recall). When the shape is shared by a large set of words, a response cannot be given until letter identification is almost completed. Contrary to this expectation, Paap et al. (1984) actually found that words with rare shapes are not accessed faster than words with common shapes, falsifying the word shape hypothesis. Although three decades of empirical evidence indicate that words are not read as a whole, the first and last letters may be more important than the medial ones. 
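The uniqueness argument about word shape can be sketched in code. The sketch assumes a simplified shape coding in which each letter is classed as an ascender, a descender, or an x-height letter. This is a stand-in for the contour drawn around the letters, not the procedure used in the studies cited, and the word list must be supplied by the user.

```python
from collections import Counter

ASCENDERS = set("bdfhklt")   # assumed letter classes for a lowercase typeface
DESCENDERS = set("gjpqy")


def word_shape(word):
    """Code a word's outline as a string of A (ascender), D (descender), x."""
    codes = []
    for ch in word.lower():
        if ch in ASCENDERS:
            codes.append("A")
        elif ch in DESCENDERS:
            codes.append("D")
        else:
            codes.append("x")
    return "".join(codes)


def fraction_unique(words, key):
    """Proportion of distinct words whose key value is shared by no other word."""
    distinct = sorted(set(words))
    counts = Counter(key(w) for w in distinct)
    unique = sum(1 for w in distinct if counts[key(w)] == 1)
    return unique / len(distinct)


if __name__ == "__main__":
    sample = ["wish", "wash", "dog", "cat", "the"]  # illustrative words only
    print({w: word_shape(w) for w in sample})
    print(fraction_unique(sample, word_shape))
```

Applied to a high-frequency word list, `fraction_unique` gives the kind of percentage reported above. The exact value depends on the shape coding and on the list.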
The paragraph of scrambled words that was sent so actively over the Internet could have been inspired by the research of Jordan and colleagues (Jordan, Thomas, Patching, & Scott-Brown, 2003). Jordan et al.'s study goal was to show that exterior letters (i.e., the first and the last letter of a word) are special in reading. Indeed, there is some truth to the hypothesis that first and last letters have an advantage over their embedded letter cohort. This advantage occurs because neighboring letters are not always kind to one another. Lateral masking refers to the interference that a letter has on its neighbor(s). An embedded letter in a word has two interfering neighbors, whereas the first and last letters have only one. Accordingly, a letter will necessarily be (ceteris paribus) more visible at the first and last position than in the middle of a word. Jordan et al.'s results could be simply evidence of this lateral masking rather than implication of a special functional unit of exterior letters used to access the mental lexicon.


If the first and last letters were responsible for word recognition, then we would also expect that words would be uniquely defined by their first and last letters in analogous fashion to what we expected from word shape. A quick look at the 1,000 most frequent words in English reveals that there are many words that share their first and last letters, even when word length is controlled:

wish     while    that
wash     whole    test
short    whose    step
shoot    where    stop
share    week     shake
shape    weak     share
wide     tree     scale
wife     true     scene
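The overlap among the listed words can be checked directly. The sketch below groups the words above by an assumed signature of first letter, last letter, and length, and reports the groups that contain more than one word. The function names are hypothetical.

```python
from collections import defaultdict

# Words taken from the list above (the repeated "share" is listed once).
WORDS = [
    "wish", "wash", "short", "shoot", "share", "shape", "wide", "wife",
    "while", "whole", "whose", "where", "week", "weak", "tree", "true",
    "that", "test", "step", "stop", "shake", "scale", "scene",
]


def exterior_signature(word):
    """First letter, last letter, and word length."""
    return (word[0], word[-1], len(word))


def collisions(words, signature):
    """Map each signature shared by two or more words to those words."""
    groups = defaultdict(list)
    for word in sorted(set(words)):
        groups[signature(word)].append(word)
    return {sig: ws for sig, ws in groups.items() if len(ws) > 1}


if __name__ == "__main__":
    for sig, group in sorted(collisions(WORDS, exterior_signature).items()):
        print(sig, group)
```

Every word in this list falls into a group with at least one other word, so exterior letters plus length cannot separate any of them.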

In the spirit of finding a magical solution, we thought that it would be valuable to combine the whole word shape and first-last letters solutions and determine if these two factors in combination provide sufficient information for reading words. We found that only 9% of the 1,000 most frequent words were uniquely defined by their exterior letters. Adding word length as a defining feature increased this percentage to 40%. In comparison, only 24% of the words have a unique word shape. When exterior letters, interior word shape, and length were considered as features, 75% of the thousand most frequent words were uniquely described. At first glance, the reader might believe that three out of four times is not bad. However, this requires the reader to recognize the first and last letters, the length of the word, and the word shape of the interior letters. This is not a trivial amount of processing merely to bypass the simple strategy of processing the letters of the word.

Although we have rejected minimalist hypotheses about reading words, we have not yet accounted for the magic of word recognition. What is it about words that makes them so easy for the expert reader to recognize? To better appreciate how words are read, it is important to understand that readers can operate reasonably well with partial information but sometimes must falter. This is a common outcome in pattern recognition more generally. We recognize our friend in a crowd and then discover it was not our friend. Another friend who shaved his beard goes unnoticed. All of us have experienced misunderstanding a sentence because we recognized a word incorrectly. This shows that we do not usually require complete unambiguous information before making a decision in word recognition. Second, we use multiple sources of information in pattern recognition. Many sources of nonvisual information supplement the featural information from the letters.
In our infamous paragraph, syntactic and semantic constraints facilitated its reading. A colleague's skilled fourth grade reader had trouble with the paragraph, ostensibly because she had less knowledge that was critical to reading its visually degenerate form. Another important source of information is knowledge about the orthographic structure of the language (Massaro, 1975; Venezky & Massaro, 1987).

ORTHOGRAPHIC STRUCTURE INFLUENCES IN WRITTEN WORD RECOGNITION

Orthographic structure refers to the fact that a written language, such as English, follows certain rules of spelling. These rules prohibit certain letter combinations and make some letters and combinations much more likely in certain positions of words than others. There is evidence that readers use these constraints in the written language in word recognition. Venezky's (1970) seminal analysis of English orthography offered this perspective as an alternative account of the word superiority effect. He found that there was a considerable amount of sublexical structure in English that could be used in reading and spelling. His early empirical studies carried out with Calfee and colleagues (e.g., Calfee, Chapman, & Venezky, 1972) tracked the growth of this understanding across the development of reading skill. Isolating these sublexical influences on word recognition is, however, not easy. There are methodological and technical challenges that impede progress, as well as theoretical controversies that continue unabated.

An important question is the nature of a reader's knowledge about orthographic structure. It is possible to distinguish between two broad categories of orthographic structures: statistical redundancy and rule-governed regularity (Massaro, Taylor, Venezky, Jastrzembski, & Lucas, 1980; Venezky & Massaro, 1979, 1987). The first category includes all descriptions derived solely from the frequency of letters and letter sequences in written texts. The second category includes all descriptions derived from the phonological constraints in English and scribal conventions for writing words as sequences of letters. Although these two descriptions are highly correlated in written English, it is possible to create letter strings that allow the descriptions to be orthogonally varied.
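One way to make the statistical redundancy description concrete is a summed positional bigram frequency. The sketch below assumes that bigram counts are kept separately for each word length and letter position and are weighted by word frequency. The toy counts are invented for illustration, and the tabulation used in the cited studies may differ in detail.

```python
from collections import Counter


def positional_bigram_counts(word_freqs):
    """Tabulate frequency-weighted bigram counts by word length and position.

    word_freqs maps a word to its frequency of occurrence in some corpus.
    """
    counts = Counter()
    for word, freq in word_freqs.items():
        w = word.lower()
        for i in range(len(w) - 1):
            counts[(len(w), i, w[i:i + 2])] += freq
    return counts


def summed_positional_bigram_frequency(string, counts):
    """Sum the counts of each bigram of string at its own position."""
    s = string.lower()
    return sum(counts[(len(s), i, s[i:i + 2])] for i in range(len(s) - 1))


if __name__ == "__main__":
    toy_corpus = {"that": 10, "than": 5, "chat": 2}  # hypothetical frequencies
    table = positional_bigram_counts(toy_corpus)
    for item in ("that", "thta"):
        print(item, summed_positional_bigram_frequency(item, table))
```

A string such as "thta" scores lower than "that" on this measure purely because of letter-sequence frequency, with no reference to phonological or scribal rules. That separation is what allows the two descriptions to be varied independently.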
Our collaborative studies indicated some psychological reality for both frequency and the regularity description of orthographic structure. The results of these studies provided evidence for the use of top-down knowledge in the perceptual processing of letter strings. Lexical status, orthographic regularity, and frequency appear to be important components of the higher order knowledge that is used (Massaro et al., 1980). In addition, an item analysis of Waters and Seidenberg's study (1985) found that word frequency, spelling-to-sound correspondences, and orthographic regularity influence the time needed to identify and name a word as well as the accuracy


of this recognition performance (Massaro & Cohen, 1994; Venezky & Massaro, 1987).

SPELLING-TO-SOUND INFLUENCES IN WRITTEN WORD RECOGNITION

Returning to the reading model shown in Fig. 3.2, it can be seen that letter patterns can be mapped into spoken language, and this information can be used to recognize printed words. The best-known models built on Venezky's seminal 1970 book, which, based on his dissertation, gave the first systematic analysis of the correspondence between orthography and phonology in English. Dual-route models (Coltheart, 1978; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Forster & Chambers, 1973) assumed a mostly rule-based mapping of the letter string into its pronunciation. Pronunciations for regular words like hint and for nonwords can be assembled using grapheme-phoneme correspondence rules. This process will succeed for regular words but not for irregular words, such as pint, because an incorrect phonological code will be assembled. Correct pronunciations for irregular words must therefore be retrieved along a second route directly by accessing the lexicon.

Evidence supporting the dual-route assumption was that regular words were named more quickly than exception words (Baron & Strawson, 1976; Gough & Cosky, 1977; Stanovich & Bauer, 1978). The dual-route model and its implementation predict this result (Coltheart et al., 2001). The model assumes that for irregular items the information sent from the lexical and from the nonlexical route to the phoneme system will conflict. The size of the effect is determined by the difference in speed of the lexical route in comparison to the nonlexical route. This predicts an interaction between regularity and frequency. For high-frequency irregular words, phonological information from the lexicon is available sooner than for low-frequency words and therefore has less of a chance to be inhibited by information from the grapheme-phoneme correspondence route.
This mechanism, in addition to the assumption of serial left-to-right processing, also predicts a serial position effect of regularity (Rastle & Coltheart, 1999; but see Rastle & Coltheart, 2000; Zorzi, 2000). Our model differs from the dual-route model in that there are many parallel influences in word recognition, not separate routes. We also prefer the descriptor streams to describe the continuous and temporal overlapping nature of these influences. As with other sources of information, an empirical challenge is to determine to what extent sound-to-spelling information influences word recognition. In addition, it is important to understand how this influence occurs in the processing leading up to


word recognition. We now present two different views about how sound-to-spelling information influences word recognition.

Lexical Consistency

Contrary to the idea that sublexical spelling patterns can be mapped to sound, Glushko (1979) proposed a new concept of lexical consistency. In his activation and synthesis model, Glushko defined words that activate only similarly pronounced words as consistent and words that activate words with other pronunciations as inconsistent. One important difference between spelling-to-sound regularity and lexical consistency is that words are not consistent or inconsistent based on their own spelling but only in relation to other words that are activated while processing them. Given these descriptions of regularity and consistency, words can be irregular and inconsistent, irregular and consistent, regular and consistent, or regular and inconsistent. If consistency is psychologically meaningful, then consistent regular words (e.g., _EEK as in WEEK, which shares the pronunciation with all other words including _EEK, i.e., CHEEK, CREEK, MEEK, REEK, SEEK, and SLEEK) should be named more quickly than regular inconsistent words (e.g., _ORK as in CORK, which shares the pronunciation of _ORK with FORK and PORK, but not with _ORK in WORK). Results from Glushko (1979) and others (e.g., Andrews, 1982; Jared, 2002; Seidenberg, Waters, Barnes, & Tanenhaus, 1984) support this prediction. This indicates that consistency is a meaningful concept and that regularity cannot fully account for the mapping between orthography and phonology during word recognition, because regularity does not predict a difference between inconsistent-regular and consistent-regular words. The evidence for the lexical consistency account was thought to falsify models that incorporated a rule-governed conversion from spelling to sound. However, this is not necessarily the case.
For example, the dual-route model (Coltheart, Curtis, Atkins, & Haller, 1993; Coltheart et al., 2001) traditionally assumed only a rule-based mapping from spelling to sound. However, Coltheart and colleagues (Coltheart et al., 2001) show that their dual-route model can simulate spelling-to-sound consistency effects. Therefore, the consistency effect no longer falsifies the dual-route model. This new assumption morphs their model into one that is much more similar to the FLMP depicted in Fig. 3.2.

Spelling-to-Sound Fluency

We offered an alternative to the lexical consistency description by formalizing a fluency metric that was meant to capture systematic occurrences that exist between spelling and sound in the input language (Venezky & Massaro, 1987). A written letter string would have high fluency to the extent that its spelling patterns mapped in a consistent way to spoken language. Low fluency would correspond to a case in which the sublexical spelling patterns of a word are not very predictive of its pronunciation. We also assumed that a critical variable for the spelling-to-sound fluency was the frequency of occurrence of the spelling-to-sound associations. Frequency of exposure is an important influence on behavior. Infants, for example, can be attuned to systematic occurrence of speech segments by a very short exposure (Saffran, Newport, & Aslin, 1996).

According to the sublexical fluency approach, the correspondences of the sublexical units, not just the correspondence of the word, are functional. Zero-order fluency is a simple measure of single letters and their pronunciation. The letters of the word THIN would be mapped in the following way: T to /θ/, H to a blank, I to /I/, and N to /n/. First-order fluency allows the input spelling to be partitioned into multiletter spelling units (e.g., CHIN is treated as a sequence of three grapheme units, CH, I, and N). The second-order fluency measure acknowledges that the positions of the graphemes would be informative (e.g., the CH in CHIN would have a different fluency measure than the CH in ACHE). Venezky and Massaro (1987) found that second-order fluency independently predicted 14% of the variance in both naming and lexical decision tasks, after other sources of variance (e.g., word frequency) were partialed out. We now turn to another potential influence in word recognition, which concerns how sound maps into spelling.
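The three levels of description differ in the units over which spelling-to-sound associations are counted. The sketch below shows only that partitioning step and does not compute fluency values. The grapheme inventory and the three position labels are hypothetical simplifications, and the metric itself would additionally need frequency counts of spelling-to-sound associations, which are not given here.

```python
# Hypothetical inventory of multiletter spelling units, for illustration only.
MULTILETTER_UNITS = ("ch", "th", "sh")


def zero_order_units(word):
    """Zero-order description: every letter is its own unit."""
    return list(word.upper())


def first_order_units(word, multiletter=MULTILETTER_UNITS):
    """First-order description: partition into (possibly multiletter) units,
    preferring the longest inventory entry that matches at each point."""
    w = word.lower()
    by_length = sorted(multiletter, key=len, reverse=True)
    units, i = [], 0
    while i < len(w):
        match = next((m for m in by_length if w.startswith(m, i)), w[i])
        units.append(match.upper())
        i += len(match)
    return units


def second_order_units(word, multiletter=MULTILETTER_UNITS):
    """Second-order description: the same units tagged with a coarse position."""
    units = first_order_units(word, multiletter)
    last = len(units) - 1
    tagged = []
    for index, unit in enumerate(units):
        position = "initial" if index == 0 else "final" if index == last else "medial"
        tagged.append((unit, position))
    return tagged


if __name__ == "__main__":
    print(zero_order_units("thin"))
    print(first_order_units("chin"))
    print(second_order_units("chin"), second_order_units("ache"))
```

With position attached, the CH of CHIN and the CH of ACHE become different units, so their spelling-to-sound associations can be counted separately.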

SOUND-TO-SPELLING INFLUENCES IN WRITTEN WORD RECOGNITION

Lately, researchers have tried to show that a critical variable is not only how letter patterns map into spoken language but also how spoken language maps back into written language. Stone, Vanhoy, and Van Orden (1997) operationalized this idea in terms of the concept of feedback (sound-to-spelling) consistency. This two-way street of word recognition was inspired by interactive activation. The principle of interactive activation assumes that the activation is transmitted back and forth between different layers of neural units. In contrast, noninteractive models, such as our FLMP (Massaro & Cohen, 1994), suggest a strict feedforward flow of information. Stone et al. (1997) used a lexical consistency framework to analyze whether a spoken language segment can be spelled in more than one way. For example, the segment /_ip/ can be spelled either _EAP as in HEAP or _EEP as in DEEP. Therefore, a word with this segment is sound-to-spelling inconsistent. In contrast, the segment /_ob/ could only be spelled as _OBE, as in the words PROBE and GLOBE, which are therefore called sound-to-spelling consistent words. Using this measure, Stone et al. not only replicated the spelling-to-sound consistency effect but also showed that sound-to-spelling consistency played a role in lexical decision. Ziegler, Montant, and Jacobs (1997) replicated Stone et al.'s results successfully with French monosyllabic words in the lexical decision task.

Methodological Issues

Peereman and colleagues (Peereman, Content, & Bonin, 1998) argued that Ziegler et al.'s (1997) results were due to a confound of subjective familiarity. The subjective familiarity measure is based on a rating of how familiar a typical reader is with a word. Peereman et al. found that when subjective familiarity is entered as a covariate in Ziegler et al.'s study, no significant sound-to-spelling consistency effect is found. Peereman et al. were also not able to replicate sound-to-spelling consistency effects in the naming task or in the lexical decision task for French words when subjective frequency in print, as estimated by independent ratings, was controlled. Not surprisingly, however, the Peereman et al. (1998) study can be criticized on several counts. Importantly, they did not control for the second phonemes in their test words. There were more consonant cluster onsets in the sound-to-spelling inconsistent condition than in the sound-to-spelling consistent condition. Peereman et al. found no significant sound-to-spelling consistency effect, but if consonant-cluster onsets decrease reaction time, then this effect could have cancelled out the sound-to-spelling consistency effect (see also Kessler, Treiman, & Mullennix, 2002). Given these ambiguous findings, we decided to explore further the existence and generality of the sound-to-spelling consistency effect.
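The two directions of lexical consistency can be stated as simple set checks over a lexicon. The sketch below uses only a handful of the words mentioned in this chapter, with informal rime labels that are assumptions rather than phonemic transcriptions. A classification from such a toy lexicon holds only relative to that lexicon.

```python
from collections import defaultdict

# (word, spelling body, rime label); rime labels are informal and illustrative.
LEXICON = [
    ("heap", "eap", "ip"),
    ("deep", "eep", "ip"),
    ("probe", "obe", "ob"),
    ("globe", "obe", "ob"),
    ("cork", "ork", "ork"),
    ("fork", "ork", "ork"),
    ("pork", "ork", "ork"),
    ("work", "ork", "urk"),
]


def consistency(word, lexicon=LEXICON):
    """Classify a word in both directions relative to the given lexicon."""
    body_to_rimes = defaultdict(set)
    rime_to_bodies = defaultdict(set)
    entries = {}
    for w, body, rime in lexicon:
        body_to_rimes[body].add(rime)
        rime_to_bodies[rime].add(body)
        entries[w] = (body, rime)
    body, rime = entries[word]
    return {
        # the body is pronounced the same way in every word that contains it
        "spelling_to_sound_consistent": len(body_to_rimes[body]) == 1,
        # the rime is spelled the same way in every word that contains it
        "sound_to_spelling_consistent": len(rime_to_bodies[rime]) == 1,
    }


if __name__ == "__main__":
    for w in ("heap", "probe", "cork"):
        print(w, consistency(w))
```

Crossing the two classifications gives the four cells of a 2 x 2 design. In this toy lexicon HEAP is inconsistent only from sound to spelling, CORK only from spelling to sound, and PROBE in neither direction.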
A Systematic Investigation of the Sound-to-Spelling Consistency Effect

The first two studies attempted to replicate both the spelling-to-sound and sound-to-spelling effects in a 2 x 2 factorial design in naming aloud while circumventing methodological problems of previous studies (Peereman et al., 1998; Ziegler et al., 1997). To avoid potential problems with a voice key, we recorded participants' responses and analyzed them with offline visualization methods to determine the onset of the articulation in the sound wave. To be able to record a more direct measure of articulation with the use of offline visualization procedures, we used a postvocalic naming task (Kawamoto, Kello, Jones, & Bame, 1998). In this task, the participant was cued before each test trial and was asked to initiate and to produce continuously an "uhhhh" sound when the stimulus was presented. In this postvocalic naming task, the participant must stop production of the vowel sound before the test word can be named aloud (Kawamoto et al., 1998). This task is based on the assumption that the offset of the vowel sound "uhhhh" is equal to the onset of articulation of the target stimulus. Mean initial phoneme duration was used in addition to mean naming latency as dependent variables. Kawamoto and colleagues (Kawamoto et al., 1998) showed the informativeness of this dependent variable. The duration of pronouncing a phoneme preceding an inconsistently pronounced vowel is longer than when the same phoneme precedes a vowel with a regular and consistent pronunciation. The finding was interpreted to mean that readers start articulation for a word as soon as the necessary information for the first phoneme is available.

Seventy-two monosyllabic, English, four-letter words were used as test items. The two independent variables of spelling-to-sound and sound-to-spelling consistency each had two levels, lexically consistent and inconsistent items. Neighborhood structure (Coltheart, Davelaar, Jonasson, & Besner, 1977; Grainger, O'Regan, Jacobs, & Segui, 1992) and subjective familiarity (taken from the MRC Psycholinguistic Database, 1987) were equated across the four sets of words. Words were also matched on various variables that are known to influence written word recognition, such as frequency in print (Kucera & Francis, 1967) and summed positional bigram frequency (Massaro, Jastrzembski, & Lucas, 1981; Massaro et al., 1980). The word sets were also matched on initial phonemes (i.e., an equal number of items with the same manner of articulation). All words were phonological consonant-vowel-consonant (CVC) words. Two postvocalic naming tasks were conducted: The first one was an immediate naming task, and the second one a delayed version with six different delays between 150 and 1,400 ms.
This delayed naming task allows the participant to complete lexical access ("delays of 650 ms or greater," according to Goldinger, Azuma, Abramson, & Jain, 1997, p. 191), and any consistency effects could be attributed to postlexical processing. The immediate and the delayed postvocalic naming task showed similar results (see Table 3.1 for an overview of all results). Sound-to-spelling consistency influenced initial phoneme duration of the response as well as its reaction time. Spelling-to-sound consistency only affected the initial phoneme duration of the response. There was no interaction between the two types of consistency. Delay in the delayed naming task shortened reaction times as delay was increased up to 1,150 ms. Most important, however, delay did also not interact with consistency. Following the logic of the delayed naming task (Balota & Chumbley, 1985; Forster & Chambers, 1973), it seems safest to conclude that the significant effects were at least partially produced by postperceptual processes.


TABLE 3.1
Results of Consistency Studies

                                                          Consistency
                                              Spelling-to-Sound      Sound-to-Spelling
Type of Task                         DV       Subjects   Items       Subjects   Items
Immediate postvocalic naming task    RT       ns         ns          p < .05    ns
                                     IPdur    p < .01    ns          p < .01    ns
Delayed postvocalic naming task      RT       ns         ns          p < .01    ns
                                     IPdur    p < .01    ns          p < .01    ns
German fragmentation task            IT       p < .05    ns          ns         ns
                                     % err    ns         ns          p < .05    ns
                                     % T      p < .01    p < .05     p < .01    ns
English fragmentation task           IT       ns         ns          ns         ns
                                     % err    ns         ns          p < .05    ns
                                     % T      ns         ns          ns         ns
Lexical decision task (words)        RT       --         --          ns         ns
                                     % err    --         --          ns         ns

Note. Dependent variables (DV) are mean reaction time (RT), mean initial phoneme duration (IPdur), mean level of correct identification (IT), mean error rate in percentage (% err), and overall mean performance (% T).

Replacing the voice key with digital offline processing, adding a new informative dependent variable (initial phoneme duration of the response), controlling for all previous confounds, and including delayed response conditions improved the validity of the naming task. With this improvement, no convincing evidence was found for consistency influences on word recognition.

In successive studies, we also investigated whether consistency effects would occur in other tasks, such as perceptual identification and lexical decision, as well as in other languages, such as German. Do consistency effects differ in consistent (shallow) orthographies, such as German, compared with inconsistent (deep) orthographies such as English and French (Ziegler, Jacobs, & Stone, 1996; Ziegler, Stone, & Jacobs, 1997)? The next two experiments examined consistency in the fragmentation task (Snodgrass & Mintzer, 1993; Snodgrass & Poster, 1992; Ziegler, Rey, & Jacobs, 1998), which presents test words that are only partially displayed. Participants are instructed to type in a word as soon as they think there is enough information. There were eight stimulus levels, ranging from minimal information displayed to a presentation of the complete word. The level of information is increased systematically until the participant responds. This feature makes the fragmentation task similar to speed-accuracy trade-off tasks (Ziegler et al., 1998). Both accuracy and how much stimulus information was presented when a response was made are necessary to describe performance.

For the German study, spelling-to-sound consistent items are produced at less informative levels and with a lower error rate than spelling-to-sound inconsistent items. The same results were found for sound-to-spelling consistency. Unfortunately, this result could be due to a difference in subjective familiarity because it could not be controlled for German (see Table 3.1 for a detailed overview of results of the German and English fragmentation tasks). When subjective familiarity was controlled in the English fragmentation study, there was no difference between consistent and inconsistent items of either type of consistency for level of correct identification. There was also no interaction between the two types of consistency for any of the dependent variables. However, words that have rimes that can be spelled in more than one way produce significantly higher error rates than words with rimes that are always spelled the same way.

A last study was conducted that investigated whether consistency influences lexical decision. Spelling-to-sound and sound-to-spelling consistency effects had been found for lexical decision for English and French (Stone et al., 1997; Ziegler et al., 1997). However, our word list eliminated the confounds in these previous studies (i.e., neighborhood structure and subjective familiarity). Given these constraints in controlling for a variety of variables, there was a limitation in constructing matching nonwords, and the design had to be reduced to just one independent variable, sound-to-spelling consistency. The 36 spelling-to-sound consistent words from the naming and English fragmentation studies were used. Half of them were sound-to-spelling consistent, the other half inconsistent. Thirty-six four-letter, spelling-to-sound consistent, monosyllabic CVC nonwords were selected.
All of the nonwords were created based on bodies of real words. A body of a monosyllabic word is the end of a word starting at the first vowel. An onset is the part of the word that precedes the body. All of the nonwords had bodies of monosyllabic four-letter words, but the onset of each was replaced by a new single consonant onset (e.g., SISK created out of DISK). The onsets used were the same as for the word items so that lexicality was not confounded with a certain onset. Half of the nonwords were sound-to-spelling consistent (e.g., _IFE in the nonword TIFE, based on FIFE, KNIFE, LIFE, STRIFE, WIFE); the other half were inconsistent (e.g., _AKE in the nonword PAKE, based on CAKE, BAKE, BRAKE, DRAKE, FAKE, FLAKE, LAKE, MAKE, QUAKE, RAKE, SHAKE, but also on ACHE and BREAK). Only bodies were used that were shared by words from the same consistency group. The word and nonword stimuli were matched on positional bigram frequency. Therefore, nonwords highly resembled English words in their orthographic structure. This should increase the reliance on phonological (or semantic) information to distinguish words from nonwords. Sublexical orthographic information could not be used to make a decision.

In the lexical decision task, none of the analyses for words showed a significant difference between sound-to-spelling consistent and inconsistent items. Words with a pronunciation that could be spelled multiple ways (e.g., MALL), and therefore resembled many different words, were not more difficult to reject than nonwords that have a pronunciation that can only be spelled in one way (e.g., CAPE; see Table 3.1 for an overview of the results of the lexical decision task).

In sum, the existing literature and our current experiments provide very little evidence that lexical consistency influences word recognition (lexical access). There is little evidence that the observed consistency effects are caused by perceptual influences on word recognition. We believe that the reason for this outcome is that consistency (as currently defined in terms of word neighbors that share the body or its phonological equivalent, the rime) is not the appropriate measure of sublexical influences in reading. We believe that fluency, described earlier in the account of spelling-to-sound influences, is a more psychologically valid description of these influences (Massaro & Cohen, 1994; Venezky & Massaro, 1987). Although our fluency measure has been formalized only for spelling-to-sound influences, we describe how it can be easily extended to sound-to-spelling influences.

Fluency and Modeling Sound-to-Spelling Influences

Independent of any methodological issues in the previous empirical studies, there are two important questions to address. First, if indeed sound-to-spelling consistency influences written word recognition, what is the best description of this sound-to-spelling structure?
Second, if it is psychologically functional in written word recognition, is feedback necessary to account for this influence, or can this influence be accounted for in a feed-forward model?

With respect to the description of consistency, the sound-to-spelling consistency manipulation in our experiments was based, as in Stone et al. (1997), on the body or rime of a word. This definition and operationalization of orthographic feedback consistency is only one of many possible ones. A different method of segmenting the spoken language would lead to different measures of feedback consistency. The type of definition that is most psychologically real would also inform the debate about written word recognition in terms of whether words are read via
sublexical letter patterns or whether they are read as simply being selected from activated words in the reader's lexicon.

Given the previous formalization of spelling-to-sound fluency, we now develop an analogous fluency measure for sound-to-spelling consistency (Jesse, 2000; Massaro & Jesse, 2000). Any database used to compute fluency of the mapping from spelling to sound can be used to compute the mapping from sound to spelling. Instead of asking how likely it is that a grapheme is pronounced as a given phoneme, we ask how likely it is that a phoneme is spelled by a given grapheme. This might be a better measure of sound-to-spelling influences than the current ones. The fluency score of a phoneme-to-grapheme correspondence is based on the number of occurrences of the mapping relative to the sum of all mappings for the phoneme. The zero-order measure would force each phoneme into a single letter, whereas the first-order measure would allow the phoneme to be mapped into multiletter graphemes. An example to illustrate is the mapping of the word THIN: For the first-order fluency measure, the phonemes of THIN (i.e., /θɪn/) have each to be mapped into one grapheme: /θ/ would map into TH, /ɪ/ to I, and /n/ to N. The second-order fluency measure would consider the position of the phonemes in the word. For all versions of the fluency measure, the fluency score for a whole word would be the average of the fluency scores for its constituent parts.

A similar development process using our earlier approach to spelling-to-sound correspondences (Venezky & Massaro, 1987) was employed by Perry, Ziegler, and Coltheart (2002) to create a comparable fluency measure for sound-to-spelling correspondences. Our measure differs from Perry et al.'s (2002) in that it is calculated not on the basis of all monosyllabic words, but on all words.
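As a rough illustration of the proportional score described earlier in this section, the following Python sketch computes a first-order sound-to-spelling fluency table and averages it over the phonemes of a word. It is a minimal sketch under stated assumptions, not the implementation used in our studies: the toy lexicon, its phoneme-grapheme alignments, the ARPAbet-style phoneme labels, and the function names mapping_fluency and word_fluency are all hypothetical, and mappings are counted over word types without any frequency weighting.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon (an assumption for illustration only).
# Each word is a list of (phoneme, grapheme) pairs; phonemes use
# lowercase ARPAbet-style labels, graphemes use uppercase letters.
TOY_LEXICON = {
    "THIN":  [("th", "TH"), ("ih", "I"), ("n", "N")],
    "FIN":   [("f", "F"), ("ih", "I"), ("n", "N")],
    "KNIT":  [("n", "KN"), ("ih", "I"), ("t", "T")],
    "PHONE": [("f", "PH"), ("ow", "O_E"), ("n", "N")],
    "MOAT":  [("m", "M"), ("ow", "OA"), ("t", "T")],
    "VOTE":  [("v", "V"), ("ow", "O_E"), ("t", "T")],
    "GROW":  [("g", "G"), ("r", "R"), ("ow", "OW")],
}


def mapping_fluency(lexicon):
    """Return {(phoneme, grapheme): proportion} for every observed mapping.

    The proportion is the count of the mapping divided by the count of
    all mappings observed for that phoneme.
    """
    counts = defaultdict(Counter)
    for pairs in lexicon.values():
        for phoneme, grapheme in pairs:
            counts[phoneme][grapheme] += 1
    return {
        (phoneme, grapheme): n / sum(by_grapheme.values())
        for phoneme, by_grapheme in counts.items()
        for grapheme, n in by_grapheme.items()
    }


def word_fluency(pairs, table):
    """Average the mapping fluency over the constituent parts of a word."""
    return sum(table[pair] for pair in pairs) / len(pairs)


table = mapping_fluency(TOY_LEXICON)
print(round(word_fluency(TOY_LEXICON["THIN"], table), 3))  # 0.917
print(round(word_fluency(TOY_LEXICON["MOAT"], table), 3))  # 0.75
```

In this toy lexicon, THIN scores higher than MOAT because /o/ has three competing spellings (O_E, OA, OW), whereas the phonemes of THIN are spelled almost uniformly. A position-sensitive (second-order) variant could be sketched by adding the position of the phoneme to the dictionary key.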
Also their use of a truncated frequency measure is problematic, and we think that our measure based on logarithmic frequency deals better with the problem of inflating the measure by the inclusion of a few single high-frequency items. These two fluency measures also differ in their definitions of how different orders of fluency are determined. Our fluency metric also differs significantly from the definition of sound-to-spelling consistency used by Stone et al. (1997). Because their analysis is based on lexical consistency, potential spelling patterns are limited to existing words in the language rather than potentially legal sublexical spellings. Furthermore, their definition of sound-to-spelling consistency precludes certain spelling patterns for spoken language segments even though these would be admissible in the language. For example, they claim that /ob/ can be written only as OBE. This is because consistency is defined in terms of the rime or body of existing words in the lexicon (Bowey, 1990, 1993; Treiman, 1994; Treiman & Chafetz, 1987; Treiman, Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995; Treiman &
Zukowski, 1988). The word GLOBE would be broken up into the onset /gl/ and the rime /ob/. Thus, they ask, what are the possible spellings in English of /ob/? Because all monosyllabic words that end in /ob/ are spelled OBE, GLOBE is categorized as a sound-to-spelling consistent word, which should produce faster reaction times in a lexical decision task. In contrast, the speech segment /o/ can be spelled in many different ways, such as OA as in the word MOAT, OE as in HOE, OW as in GROW, or O_E as in VOTE. By this criterion, OBE would not be sound-to-spelling consistent. The sublexical unit at which consistency is defined is crucial for the word's classification as consistent or inconsistent. An appropriate definition is needed to describe and predict reading performance adequately.

The second issue in modeling sound-to-spelling consistency effects is what structure a model needs to account for them. Any influence of how a phonological segment is spelled is usually implemented as an interaction among different levels in a connectionist model. To understand how our feed-forward model is sufficient to describe sound-to-spelling influences in word recognition, it is valuable to understand how such influences would work in a feedback model. Stone and colleagues (1997) inappropriately define feedback models as all models that can behave as if they contained a feedback loop. This can be realized through interactive activation, where information between layers is transmitted via a forward and a feedback loop. A letter string is presented, which activates the letter representation. Processing of this letter representation activates phonemes, supposedly on the basis of something like the influence of our fluency variable but most commonly described as a consistency measure. This activation of phonemes in turn activates a set of spelling patterns. These spelling patterns then feed back to the letter representation and activate it accordingly.
To the extent that the phoneme level is mapped into a single spelling pattern, the letter representation would be greatly biased to this spelling pattern. To the extent that the phoneme level is mapped into several different spelling patterns, the letter representation would be much noisier and therefore less informative about which written word was presented. This description of how feedback consistency would work in a connectionist model is exactly analogous to how context is assumed to operate in the interactive activation model.

Peereman et al. (1998) stated in their article on sound-to-spelling consistency that the "purpose of the present research (on the existence of the sound-to-spelling consistency effect) is to explore whether word recognition, as indexed by the lexical decision task, entails interactive activation between orthographic and phonological codes" (p. 152). As can be seen in the account given by the FLMP, an influence of sound-to-spelling consistency, if one exists, does not require interactive activation, and therefore the test of sound-to-spelling consistency is not a test of interactive activation. Stone et al. (1997) acknowledge that their results do not necessarily require a model with a feedback loop. It also can be implemented with a "simple-match procedure" (Stone et al., 1997, p. 353), in which the processing of one source of information is not altered by any information flowing back from another source of information (Massaro, 1979; Paap, Newsome, McDonald, & Schvaneveldt, 1982). Stone and colleagues still call this kind of model feedback, although the flow of information is strictly forward. We suggest that this blurring of the difference between feedback and feed-forward is inappropriate.

Postulating this source of information is not completely a post hoc explanation because, in addition to orthography itself (e.g., Seidenberg & Tanenhaus, 1979; Tanenhaus, Flanigan, & Seidenberg, 1980; Whatmough, Arguin, & Bub, 1999), sound-to-spelling correspondences have been shown to influence auditory recognition (Ziegler & Ferrand, 1998). Ziegler and Ferrand (1998) found for auditory word recognition that French words with phonological rimes that can be mapped into multiple spellings produce longer auditory lexical decision latencies and more errors than sound-to-spelling consistent words. The FLMP does predict that the relationship between orthography and phonology plays a role in auditory word perception as it assumes a general algorithm across modalities. Therefore, it is not justified to say that the FLMP does not "naturally predict a feedback effect" (Ziegler et al., 1997, p. 535). Contrary to Peereman et al. (1998), the existence of feedback consistency does not require a model of interactive activation. The FLMP can account for a feedback consistency effect by assuming that sound-to-spelling correspondences are influential.
As the word's letters are recognized, their corresponding pronunciations are made available, which then provide independent information about which letters and which word are present. We do not need a feedback loop to explain the feedback consistency effect. The FLMP and connectionist models (e.g., the multiple read-out model including phonology by Jacobs, Grainger, Rey, & Ziegler, 1998), therefore, make similar predictions. However, it has been shown in several other studies that it is not necessary to assume interactive activation to explain common phenomena in language processing (Massaro, 1989; Massaro & Cohen, 1991). The FLMP can, for example, account for the influence of context on stimulus identification without assuming a feedback connection (Massaro & Cohen, 1991, 1994; Massaro & Friedman, 1990) better than interaction models with feedback can. It can also account better for the influences of bottom-up and top-down sources of information in speech perception (Massaro, 1989)
than its interactive-activation opponent, the TRACE model (McClelland & Elman, 1986). More recent evidence comes from a study of masked priming in Hebrew, obtaining about 100,000 data points from 160 participants (Frost, 2003; Frost, Ahissar, Gotesman, & Tayeb, 2003). In the masked priming study, a short presentation of a priming word is preceded by a mask and followed by a test word. A lexical decision is made based on the test word. Participants do not report the priming word supposedly because of the preceding mask and the test word functioning as a backward mask. The priming word is either similar to or different from the test word in spelling or pronunciation, or both. As expected, a test that differs by two letters from the prime is responded to more slowly in the lexical decision task than if the prime differs by just a single letter. In both cases, the prime was homophonic with the test. This effect of orthography shows that the lexical decision task is sensitive to the letter processing of the test word. Effects of phonology were also apparent. The homophonic (i.e., identical sounding) prime (e.g., KLIP as a prime for CLIP) facilitates the response to the test word in the lexical decision task relative to a priming word that differs in its pronunciation by the initial phoneme (e.g., PLIP as a prime for CLIP). This klip-CLIP priming effect as originally found by Lukatela and colleagues (Lukatela, Frost, & Turvey, 1998) had been criticized because replications usually failed (Coltheart et al., 2001). It has been argued that it could be observed only under certain light conditions. However, Frost (2003) shows in his replication in Hebrew that this phonological priming effect can be found for a wide range of levels of luminance and stimulus onset asynchronies (SOAs). This strong result shows that prelexical phonological effects appear to be influential very early in processing, if we can assume that the lexical decision task is measuring this early processing. 
A postperceptual process might also be responsible in that the test word might be recognized without any influence from the prime but that the prime speeds up the response selection and production process. Somehow a similar sounding word to the test would speed up lexical decision relative to a dissimilar sounding word. A control analogous to the delayed naming task would be to present the prime just after the test word is recognized (about 200 ms after its onset) and determine whether the nature of the prime is still influential. The phonological effects in masked priming, if accepted to reflect early processing of the test word, are strong evidence against interactive activation. The reason is that there would not be sufficient time for feedback and interactive activation to occur. Frost (2003) makes an analogous argument against the dual route cascaded (DRC) model because the orthographic lexicon would have to be accessed before phonological information gets activated in the DRC.

CONCLUSION

We began by acknowledging that reading was magical but composed of a magic that was well studied and, perhaps, even fairly well understood. We hope that our guided travel through the research and theories about word recognition has illuminated the complexity involved when multiple sources of information are influential. To support our road map, we presented evidence that visual word recognition relies, just like any other form of pattern recognition, on the successful exploitation of multiple sources of information. We discussed the contribution of visual, orthographic, and phonological sources of information to the visual word recognition process. We showed that some old ideas, such as reading words as wholes or without the influence of spoken language, have been successfully defeated. However, they have been replaced with new controversies that are currently unresolved. Such controversies include the question of the appropriate sublexical units used in reading and the role of sound-to-spelling mappings. We described a series of experiments to shed light on these questions.

We can conclude that although phonological information in general certainly plays a role in written word recognition, there is little evidence that consistency influences word recognition. We believe that a reason for this might be that the definition of consistency is based on the body and rime unit, which is not the appropriate sublexical unit in reading. Fluency, as developed by Venezky, is a more psychologically valid description of sublexical phonological influences (Massaro & Cohen, 1994; Venezky & Massaro, 1987). Fluency had previously only been defined as a spelling-to-sound measure. We proposed a new sound-to-spelling version of fluency. Finally, we outlined a feed-forward model of visual word recognition, the fuzzy logical model of perception, that can also account for sound-to-spelling fluency effects.
Sound-to-spelling effects do not require interactive activation to be accounted for by a model. Therefore, their value for discriminating between current competing model candidates in visual word recognition is low.

ACKNOWLEDGMENTS

This work was supported in part by grants from the National Science Foundation (NSF CHALLENGE Grant CDA-9726363 and NSF Grant BCS9905176), a grant from the Public Health Service (Grant PHS R01 DC00236), cooperative grants from the Intel Corporation and the University of California Digital Media Program (D97-04), and grants from the University of California, Santa Cruz. This research was also supported by a Fulbright Scholarship to the second author.
Parts of the research were presented at the meeting of experimental psychologists (TEAP) at Leipzig, Germany, March 1999. The research presented in this chapter was part of the second author's master's thesis at the University of California, Santa Cruz. Send correspondence to Dominic W. Massaro, Department of Psychology, University of California, Santa Cruz, CA 95064, USA. Electronic mail may be sent to [email protected].

REFERENCES

Adams, M. J. (1979). Models of word recognition. Cognitive Psychology, 11, 133-176.
Andrews, S. (1982). Phonological recoding: Is the regularity effect consistent? Memory and Cognition, 10, 565-575.
Balota, D. A., & Chumbley, J. I. (1985). The locus of word frequency effects in the pronunciation task: Lexical access and/or production? Journal of Memory and Language, 24, 89-106.
Baron, J., & Strawson, C. (1976). Use of orthographic and word-specific knowledge in reading words aloud. Journal of Experimental Psychology: Human Perception and Performance, 2, 386-393.
Bowey, J. A. (1990). Orthographic onsets and rimes as functional units of reading. Memory and Cognition, 18, 419-427.
Bowey, J. A. (1993). Orthographic rime priming. Quarterly Journal of Experimental Psychology, 46, 247-271.
Calfee, R., Chapman, R., & Venezky, R. (1972). How a child needs to think to learn to read. In L. W. Gregg (Ed.), Cognition in learning and memory (pp. 139-182). Oxford, UK: Wiley.
Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of information processing (pp. 151-216). London: Academic Press.
Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review, 100(4), 589-608.
Coltheart, M., Davelaar, E., Jonasson, J. T., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), Attention and performance VI (pp. 535-555). London: Academic Press.
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204-256.
Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time. Journal of Verbal Learning and Verbal Behavior, 12, 627-635.
Frost, R. (2003). The robustness of phonological effects in fast priming. In S. Kinoshita & S. J. Lupker (Eds.), Masked priming: The state of the art (pp. 173-191). New York: Psychology Press.
Frost, R., Ahissar, M., Gotesman, R., & Tayeb, S. (2003). Are phonological effects fragile? The effect of luminance and exposure duration on form priming and phonological priming. Journal of Memory and Language, 48, 346-378.
Glushko, R. J. (1979). The organization and synthesis of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674-691.
Goldinger, S. D., Azuma, T., Abramson, M., & Jain, P. (1997). Open wide and say "Blah!": Attentional dynamics of delayed naming. Journal of Memory and Language, 37(2), 190-216.
Gough, P. B., & Cosky, M. J. (1977). One second of reading again. In N. J. Castellan, D. B. Pisoni, & G. R. Potts (Eds.), Cognitive theory (Vol. 2, pp. 271-288). Hillsdale, NJ: Lawrence Erlbaum Associates.
Grainger, J., O'Regan, J. K., Jacobs, A. M., & Segui, J. (1992). Neighborhood frequency effects and letter visibility in visual word recognition. Perception and Psychophysics, 51, 49-56.
Groff, P. (1975). Research in brief: Shapes are cues to word recognition. Visible Language, 9, 67-71.
Haber, L. R., Haber, R. N., & Furlin, K. R. (1983). Word length and word shape as sources of information in reading. Reading Research Quarterly, 18(2), 165-189.
Huey, E. B. (1968). The psychology and pedagogy of reading. Cambridge, MA: MIT Press. (Original work published 1908)
Jacobs, A. M., Grainger, J., Rey, A., & Ziegler, J. C. (1998). MROM-P: An interactive activation, multiple read-out model of orthographic and phonological processes in visual word recognition. In J. Grainger & A. M. Jacobs (Eds.), Localist connectionist approaches to human cognition (pp. 147-188). Mahwah, NJ: Lawrence Erlbaum Associates.
Jared, D. (2002). Spelling-sound consistency and regularity effects in word naming. Journal of Memory and Language, 46(4), 723-750.
Jesse, A. (2000). Consistency effects in the fragmentation task. Unpublished master's thesis, University of California at Santa Cruz.
Johnson, N. F. (1975). On the function of letters in word identification: Some data and a preliminary model. Journal of Verbal Learning and Verbal Behavior, 24(1), 17-29.
Jordan, T. R., Thomas, S. M., Patching, G. R., & Scott-Brown, K. C. (2003). Assessing the importance of letter pairs in initial, exterior, and interior positions in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(5), 883-893.
Kawamoto, A. H., Kello, C. T., Jones, R., & Bame, K. (1998). Initial phoneme versus whole-word criterion to initiate pronunciation: Evidence based on response latency and initial phoneme duration. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(4), 862-885.
Kessler, B., Treiman, R., & Mullennix, J. (2002). Phonetic biases in voice key response time measurements. Journal of Memory and Language, 47(1), 145-171.
Kucera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Lukatela, G., Frost, S. J., & Turvey, M. T. (1998). Phonological priming by masked nonword primes in the lexical decision task. Journal of Memory and Language, 39, 666-683.
Massaro, D. W. (1975). Understanding language: An information-processing analysis of speech perception, reading, and psycholinguistics. New York: Academic Press.
Massaro, D. W. (1979). Letter information and orthographic context in word perception. Journal of Experimental Psychology: Human Perception and Performance, 5(4), 595-609.
Massaro, D. W. (1987). Speech perception by eye and by ear: A paradigm for psychological inquiry. Hillsdale, NJ: Lawrence Erlbaum Associates.
Massaro, D. W. (1989). Testing between the TRACE model and the fuzzy logical model of speech perception. Cognitive Psychology, 22(3), 398-421.
Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press.
Massaro, D. W., & Cohen, M. M. (1991). Integration versus interactive activation: The joint influence of stimulus and context in perception. Cognitive Psychology, 23(4), 558-614.
Massaro, D. W., & Cohen, M. M. (1994). Visual, orthographic, phonological, and lexical influences in reading. Journal of Experimental Psychology: Human Perception and Performance, 20, 1107-1128.
Massaro, D. W., & Friedman, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97, 225-252.
Massaro, D. W., Jastrzembski, J. E., & Lucas, P. A. (1981). Frequency, orthographic regularity, and lexical status in letter and word perception. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 15, pp. 163-200). New York: Academic Press.
Massaro, D. W., & Jesse, A. (2000). Explorations of reading processes within the framework of the fuzzy-logical model of perception. Unpublished manuscript.
Massaro, D. W., & Sanocki, T. (1993). Visual information processing in reading. In D. M. Willows & R. S. Kruk (Eds.), Visual processes in reading and reading disabilities (pp. 139-161). Hillsdale, NJ: Lawrence Erlbaum Associates.
Massaro, D. W., Taylor, G. A., Venezky, R. L., Jastrzembski, J. E., & Lucas, P. A. (1980). Letter and word perception: Orthographic structure and visual processing in reading. Amsterdam: North-Holland.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1-86.
MRC Psycholinguistic Database: Machine Usable Dictionary (Version 2.00) [Electronic Database]. (1987). Didcot, OX, Australia: Informatics Division Science and Engineering Research Council Rutherford Appleton Laboratory Chilton.
Paap, K. R., Newsome, S. L., McDonald, J. E., & Schvaneveldt, R. W. (1982). An activation verification model for word and letter recognition: The word superiority effect. Psychological Review, 89, 573-594.
Paap, K. R., Newsome, S. L., & Noel, R. W. (1984). Word shape's in poor shape for the race to the lexicon. Journal of Experimental Psychology: Human Perception and Performance, 20(3), 413-428.
Peereman, R., Content, A., & Bonin, P. (1998). Is perception a two-way-street? The case of feedback consistency in visual word recognition. Journal of Memory and Language, 39, 151-174.
Perry, C., Ziegler, J., & Coltheart, M. (2002). How predictable is spelling? Developing and testing metrics of phoneme-grapheme contingency. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 55(3), 897-915.
Rastle, K., & Coltheart, M. (1999). Serial and strategic effects in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 25(2), 482-503.
Rastle, K., & Coltheart, M. (2000). Serial processing in reading aloud: Reply to Zorzi (2000). Journal of Experimental Psychology: Human Perception and Performance, 26(3), 1232-1235.
Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 81, 275-281.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35, 606-621.
Seidenberg, M. S., & Tanenhaus, M. K. (1979). Orthographic effects on rhyme monitoring. Journal of Experimental Psychology: Human Learning and Memory, 5, 546-554.
Seidenberg, M. S., Waters, G. S., Barnes, M. A., & Tanenhaus, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383-404.
Snodgrass, J. G., & Mintzer, M. (1993). Neighborhood effects in visual word recognition: Facilitatory or inhibitory? Memory and Cognition, 21, 247-266.
Snodgrass, J. G., & Poster, M. (1992). Visual-word recognition thresholds for screen-fragmented names of the Snodgrass and Vanderwart pictures. Behavior Research Methods, Instruments and Computers, 24(1), 1-15.
Stanovich, K. E., & Bauer, D. W. (1978). Experiments on the spelling-to-sound regularity effect in word recognition. Memory and Cognition, 6, 410-415.
Stone, G. O., Vanhoy, M., & Van Orden, G. C. (1997). Perception is a two-way street: Feedforward and feedback phonology in visual word recognition. Journal of Memory and Language, 36, 337-359.
Tanenhaus, M. K., Flanigan, H. P., & Seidenberg, M. S. (1980). Orthographic and phonological activation in auditory and visual word recognition. Memory and Cognition, 8(6), 513-520.
Thompson, M. C., & Massaro, D. W. (1973). Visual information and redundancy in reading. Journal of Experimental Psychology, 98, 49-54.
Treiman, R. (1994). To what extent do orthographic units in print mirror phonological units in speech? Journal of Psycholinguistic Research, 23, 91-110.
Treiman, R., & Chafetz, J. (1987). Are there onset- and rime-like units in written words? In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading (pp. 281-298). Hillsdale, NJ: Lawrence Erlbaum Associates.
Treiman, R., Mullennix, J., Bijeljac-Babic, R., & Richmond-Welty, E. D. (1995). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General, 124, 107-136.
Treiman, R., & Zukowski, A. (1988). Units in reading and spelling. Journal of Memory and Language, 27(4), 466-477.
Updike, J. (1986). Roger's version. New York: Knopf.
Venezky, R. L. (1970). The structure of English orthography. Paris: Mouton.
Venezky, R. L., & Massaro, D. W. (1979). The role of orthographic regularity in word recognition. Theory and practice of early reading (Vol. 1, pp. 85-107). Hillsdale, NJ: Lawrence Erlbaum Associates.
Venezky, R. L., & Massaro, D. W. (1987). Orthographic structure and spelling-sound regularity in reading English words. In D. A. Allport, D. G. MacKay, W. Prinz, & E. Scheerer (Eds.), Language perception and production: Relationships between listening, speaking, reading, and writing (pp. 159-179). London: Academic Press.
Waters, G. S., & Seidenberg, M. S. (1985). Spelling-sound effects in reading: Time course and decision criteria. Memory and Cognition, 13, 557-572.
Whatmough, C., Arguin, M., & Bub, D. (1999). Cross-modal priming evidence for phonology-to-orthography activation in visual word recognition. Brain and Language, 66, 275-293.
Woodworth, R. S. (1938). Experimental psychology. New York: Holt.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338-353.
Ziegler, J. C., & Ferrand, L. (1998). Orthography shapes the perception of speech: The consistency effect in auditory word recognition. Psychonomic Bulletin and Review, 5(4), 683-689.
Ziegler, J. C., Jacobs, A. M., & Stone, G. O. (1996). Statistical analysis of the bidirectional inconsistency of spelling to sound in French. Behavior Research Methods, Instruments, and Computers, 28, 504-515.
Ziegler, J. C., Montant, M., & Jacobs, A. M. (1997). The feedback consistency effect in lexical decision and naming. Journal of Memory and Language, 37, 533-554.
Ziegler, J. C., Rey, A., & Jacobs, A. M. (1998). Simulating individual word identification thresholds and errors in the fragmentation task. Memory and Cognition, 26, 490-501.
Ziegler, J. C., Stone, G. O., & Jacobs, A. M. (1997). What's the pronunciation for _OUGH and the spelling for /u/? A database for computing feedforward and feedback inconsistency in English. Behavior Research Methods, Instruments, and Computers, 29(4), 600-618.
Zorzi, M. (2000). Serial processing in reading aloud: No challenge for a parallel model. Journal of Experimental Psychology: Human Perception and Performance, 26(2), 847-856.


4

When Actions Can't Speak for Themselves: How Might Infant-Directed Speech and Infant-Directed Action Influence Verb Learning?

Khara Pence
University of Virginia

Roberta Michnick Golinkoff
University of Delaware

Rebecca J. Brand
Villanova University

Kathy Hirsh-Pasek
Temple University

Imagine Dina, a new mother sitting on the floor with her 1.5-year-old son. She announces her intention to roll a big blue ball, saying, "Look Benjamin ... I'm going to roll the ball." She labels the action as it is in progress and in rough synchrony with the movement of the ball when she says, "Wow, it's roooollllling!" Benjamin's mother places the verb roll at the end of the sentence, a beneficial position for parsing it from the speech stream because it is bounded by a pause (Golinkoff & Alioto, 1995). The pause may help Benjamin realize that what he just heard was the end of a unit. Furthermore, he may get some idea that that unit is likely a verb (although Benjamin has no conscious awareness of the category "verb") by noting that rolling ends in /ing/ (Golinkoff, Hirsh-Pasek, & Schweisguth, 2001). Benjamin may well remember the verb roll in this sentence because it profits from the recency effect by being at the end of the utterance. Without even realizing it, Benjamin's mother repeats this new verb many more times as they play. She also emphasizes the verb by elongating its vowel, placing stress on the word and uttering it in isolation. Moreover, Benjamin's mom uses larger-than-life movements, such as throwing her arms out in front of her body in an exaggerated fashion and holding them


out there as the ball rolls away from her. Benjamin's attention is maintained throughout their 5-min ball-rolling game. Although Benjamin is learning several new words per day, he has no action verbs in his productive vocabulary. On this occasion, however, he is being afforded the opportunity to learn the name of an action he is engaged in. How often do young language learners find themselves in similar situations? Do parents typically exaggerate the actions they perform or acoustically emphasize the verbs that label those actions in speech to their children? As we describe, verb learning is not easy. A combination of exaggerated input in what the child sees when an action is performed and in what the child hears when that action is named might assist infants in learning verbs. Knowing verbs is tantamount to knowing grammar, as verbs form the architectural centerpiece of the sentence. As Bloom (1978) argued many years ago, although it took a number of years for the field to catch up (e.g., Hirsh-Pasek & Golinkoff, in press; Tomasello & Merriman, 1995), "As important as nouns are, ... there has been the repeated finding that the verb is the real hero in determining what children learn about language structure ... Verbs ... reflect conceptual development (and) the semantics of the verbs that children learn have a mediating effect on language learning" (pp. 1-2). Verbs perform their heroic acts by being terms that are inherently relational. Without verbs, we are limited largely to commenting on the perceptual attributes and functions of the things around us. Verbs broaden our possibilities immensely. With verbs, not only can we talk about the color of a dog or the function of his doghouse, but also we can describe how he likes to bury bones under his doghouse only to have the neighbors' dog dig them up and carry them away when he is not looking. Verbs are also crucially important in learning to read.
As Gibson and Levin (1975) argued long ago, the purpose of reading is the extraction of meaning from the printed page. Since verbs are what the sentence is about, children must understand them in order to get the thread of the text. Verbs are important, yet we do not know much about how they are learned. Furthermore, we do not know how they appear in input to young language learners or whether the exaggerated actions parents perform assist infants in parsing events into actions in the first place.

THE PLAN OF THIS CHAPTER

This chapter discusses the role that infant-directed (henceforward referred to as ID) input (including ID speech and ID action) may play in the mapping of verbs to actions. We begin by describing what it takes to learn a verb. We then introduce the concept of ID speech and describe infants' preference for it. We evaluate some of the evidence that suggests that ID speech may facilitate language acquisition, although the majority of research on ID speech focuses on nouns. Finally, we discuss a form of ID action that may assist infants in parsing actions from the flow of events they observe in the world. We propose a separate but related line of research to examine whether ID input (i.e., a combination of ID speech and ID action) facilitates infants' mapping of verbs to actions.

WHAT'S IT TAKE TO LEARN A VERB?

Consider Benjamin's task. Not only must he interpret the movements he sees as a series of discrete actions, he must additionally discern the intentions motivating those actions, parse corresponding lexical items (viz., action verbs) from the speech stream as he hears them, and finally associate those verbs with the actions they label. Unfortunately, actions cannot speak for themselves, so Benjamin and young language learners like him must depend on the verbal and nonverbal cues in the input that might facilitate the remarkably complex aforementioned task. To master the verb system in any language, children must conquer three preliminary tasks. First, they must attend to and individuate actions in their environment. Research suggests that infants are keenly aware of movement and use movement to individuate objects (e.g., Mandler, 1992, 1998) and actions (Sharon & Wynn, 1998; Wynn, 1996). Second, infants must be able to form categories of actions without language. The action of jumping, for example, refers to a decontextualized category of jumping motions that includes different kinds of jumps made by the same actor (e.g., Elmo jumping off tables and chairs) and the same action performed by different actors (e.g., Elmo jumping off the chair and Big Bird jumping off the chair). Third, children must be able to map words to actions and action categories. Consider how inefficient language would be if we needed a new label for each nonidentical instance of the action of jumping.
That infants have action words among their first words is testimony to the fact that they can successfully form word-action mappings (e.g., Bloom, 1993; Smith & Sachs, 1990).

WHY ARE VERBS SO HARD TO LEARN?

According to diary studies, the majority of children's early words are nouns and not verbs (Gentner, 1978). Even at 5 years of age, English- and Japanese-speaking children struggle in the task of mapping novel verbs to novel actions (Imai, Haryu, & Okada, 2003; Meyer, Leonard, Hirsh-Pasek, Golinkoff, Imai, Haryu, Pulverman, & Addy, 2003). They tend to favor novel objects as the recipients of novel verb labels. Verbs are harder to learn than concrete object labels for several reasons, which may account for the disparity we find in the ages of acquisition for these two types of words.


Actions and the verbs that label them are conceptually more complex than objects and the nouns that label them (Gentner, 1982; Gentner & Boroditsky, 2001). Actions are ephemeral and are thus more difficult to label than concrete, tangible, and permanent objects in the environment. Complicating matters further is that many verbs are polysemous. For example, the Merriam-Webster Online Dictionary contains 3 entries for the noun bottle and an extraordinary 42 entries for the verb run (Merriam-Webster, 2002). Granted, many of the entries for run are semantically identical, and these dictionary entries are only meant to serve as an illustration of how verbs can be polysemous. Still, arguably there exists a larger number of contexts in which a young language learner might hear the verb run than the noun bottle. The large number of possible word-world mappings poses an inherent challenge for learning action verbs. Golinkoff and her colleagues (Golinkoff, Jacquet, Hirsh-Pasek, & Nandakumar, 1996) present four additional reasons for why verbs may appear more slowly than nouns in children's vocabularies. First, to learn a verb, one must uncover that verb's semantic components (e.g., causation, manner, path) and then select the corresponding surface elements. As Talmy (1985) explains, this is no straightforward task given that "a combination of semantic elements can be expressed by a single surface element, or a single semantic element by a combination of surface elements. Or again, semantic elements of different types can be expressed by the same type of surface element, as well as the same type by several different ones" (p. 57). As a first step, English-learning infants have been found to attend to manner and path in motion events (Pulverman, Sootsman, Golinkoff, & Hirsh-Pasek, 2002). For example, infants recognize the difference between an actor performing a spinning manner of motion and a jumping-jack manner of motion.
They also recognize the difference between an actor traversing a path above versus below a fixed point of reference. This ability is crucial because verbs are often distinguished according to these very event components, and infants may benefit from analyzing events when they are able to attend to those aspects of events that may be lexicalized as verbs. Second, verbs may be harder to acquire than nouns because it may be more difficult to notice invariant features across actions and events than to detect invariant features of objects. Because actions contain a greater array of semantic components than objects, they necessarily encompass a greater amount of potential variability. New research is uncovering how infants are able to detect certain invariant properties across actions and events. By 9 months, infants are able to detect changes in the manner of a novel action as well as the agent performing the action and are able to use those invariant features as a basis for category formation (Salkind, Sootsman, Golinkoff, Hirsh-Pasek, & Maguire, 2002). Research to be reviewed later further suggests that adults may offer infants nonlinguistic cues, such as the exaggerated actions performed by Benjamin's mother in the vignette, for parsing actions from events. Third, just as verbs are complicated in that a single verb can describe several different actions or events, many different verbs can be used to describe a single action. For example, the act of eating may be labeled with the verbs devour, munch, consume, or ingest. Fourth, to use a verb correctly, children must master the necessary argument structure required by that verb. For example, transitive verbs must contain a direct object, whereas ditransitive verbs require a direct object and an indirect object. Observing events in the world allows children to follow the number of participants in events (Fisher, 1996). But to express these arguments linguistically, children must know the names of the participants to fill the argument positions that surround a verb. Gillette, Gleitman, Gleitman, and Lederer (1999) propose that knowledge of these nouns, in combination with event observation, is what enables children to bootstrap their way into the meanings of new verbs. In addition to the conceptual and structural properties of action verbs, these words may not be featured prominently in the speech to young language learners. Because verb learning has only very recently been a focus of critical attention, there is little research addressing how and when parents begin to introduce verbs to their infants. Research examining how action verbs are used in speech to children may hold the key to understanding why verbs are so difficult to learn.

WHAT IS INFANT-DIRECTED SPEECH?

Upon observing virtually anyone address a baby, one is struck by the uniqueness of this speech register. ID speech is a term used to describe the speech used in communicative situations with young language learners and is characterized by a high overall pitch, exaggerated pitch contours, and slower tempos as compared to adult-directed (henceforward referred to as AD) speech. ID speech has been found to attract infants' attention, and infants prefer it to AD speech, even as neonates (Fernald & Kuhl, 1987). ID speech also aids in communicating emotion and the communicative intent of speakers (Fernald, 1989; Trainor & Desjardins, 2002). Simply using this register when introducing new lexical items should, at the very least, capture infants' attention and increase the chances that they will focus on the speech they hear. The use of ID speech by Benjamin's mother in the vignette could open the door to verb learning that Benjamin might


have missed had his mother described the same rolling action as she would to another adult. The prosodic and paralinguistic modifications thought to capture and maintain infants' attention have been documented not only in English but cross-linguistically in several other languages, including German (Fernald & Simon, 1984), French, Italian, Japanese, British English (Fernald et al., 1989), and Mandarin Chinese (Papousek, Papousek, & Symmes, 1991). Examinations of these languages provide evidence for possible universal prosodic modifications. Fernald et al. (1989), who also found common patterns in ID speech across languages, posit that these cross-cultural commonalities might serve to facilitate language acquisition by regulating infant arousal and attention, communicating affect, and aiding speech perception and language comprehension. Differences in speech modifications (e.g., pitch raising) across languages, which serve different functions cross-linguistically, offer proof that ID speech in and of itself may indeed be universal, yet the functions of speech modifications are sociolinguistically determined (Bernstein Ratner & Pye, 1984). Research shows that when lexical content is removed, infants prefer the fundamental frequency (F0) patterns of ID speech but do not exhibit the same preference for ID amplitude or duration patterns (Fernald & Kuhl, 1987). Although the reason that infants prefer the F0 patterns of ID speech is debatable, these characteristic intonation contours appear to be highly salient auditory stimuli for infants and likely play an important role in the development of meaning (Fernald, 1984). Incredibly, even nonparents and adults possessing little experience with infants tend to use higher and more variable pitch when speaking to infants, suggesting that a biologically based propensity for these modifications might exist (Jacobson, Boersma, Fields, & Olson, 1983). 
No matter the reason or reasons for the ID speech preference, infants are placed at a distinct advantage for acquiring language in that, as they listen, they take in the speech sounds, intonation patterns, and other regularities comprising their native language. Because infants enjoy listening to ID speech, they are likely to benefit from the modifications present in ID speech that have been found to facilitate language acquisition.

DOES ID SPEECH FACILITATE THE ACQUISITION OF LANGUAGE?

Now that we know that infants prefer to listen to ID speech, are there any ways in which it can facilitate language acquisition? The answer is yes. First, vowel production appears to be modulated by the perceived language ability of the listener. Adults tend to clarify words in ID speech by producing vowels that contain little overlap between vowel phoneme categories (Bernstein Ratner, 1986). This finding appears to be true of vowel production in other languages as well. Kuhl et al. (1997) found acoustically more extreme vowels and a stretching of vowel space in ID speech compared with AD speech for mothers speaking Russian, Swedish, and English. In other words, infants likely benefit from the acoustic stretching of vowel space by adults as they begin to process the inventory of vowels in their native language. ID speech also appears to exaggerate preboundary vowel lengthening and pauses, both salient cues for the identification of major syntactic units in speech (Bernstein Ratner, 1986). In addition to properties that may facilitate the formation of vowel categories and the detection of clause and phrase boundaries, ID speech appears to contain features that work to facilitate language acquisition at the word level. For example, English-learning toddlers have been found to learn new lexical items better when those lexical items are presented in ID speech than in AD speech (Golinkoff, Hirsh-Pasek, & Alioto, 1995). Why might this be? Fernald and Mazzie (1991) explored some of the specific modifications contained within ID speech that likely promote the learning of new lexical items. Using a picture book, adults described new articles of clothing on each page to their 14-month-olds. They found that adults consistently placed newly introduced object nouns on exaggerated pitch peaks in utterance-final position, whereas prosodic emphasis in AD speech was more variable. Marking focused words by placing them on pitch peaks at the ends of utterances may facilitate speech processing for infants by designating highlighted lexical items as meaningful units. Cross-linguistic support for these adjustments is also available.
Aslin (1993) reports that while Turkish- and English-speaking mothers do not consistently use target words in isolation and do not avoid difficult-to-segment word boundaries, they typically highlight target words in utterances by using exaggerated pitch contours and consistently place target words in utterance-final position, even when this placement violates the word order of their language. Fernald (2000) suggests that the manner in which adults interacting with infants highlight focused words is one of many ways we provide contextual support on perceptual levels that are accessible to infants, even those in the earliest stages of language learning. Although she doesn't realize it, by placing the word roll in salient utterance-final position on exaggerated pitch peaks, Benjamin's mother has highlighted the very action word she is attempting to teach him. Granted, the majority of research on ID input has focused exclusively on the characteristics of ID speech and infants' preference for this type of input, as well as the specific linguistic abilities promoted by ID speech, including the acquisition of lexical items. Now though, equipped with this knowledge, the potential exists to better understand how ID input might contribute to other abilities. We know that infants not only notice but also enjoy listening to speech presented in an ID fashion and that ID speech may facilitate the acquisition of smaller linguistic units and eventually lexical items. Now that we know that ID speech may facilitate language acquisition, we must confront the possibility that nouns might be favored in ID speech; indeed, the majority of research on ID speech focuses on nouns.

ARE NOUNS FAVORED IN INFANT-DIRECTED SPEECH?

Infants and toddlers seem to map nouns to the objects they label with relative ease. By 18 months of age, children are producing anywhere between 2 and 65 words, and girls at the 50th percentile for their age group understand an average of 65 words (boys understand an average of 56 words) (Fenson et al., 1994). According to diary studies, the majority of these early words are nouns (Gentner, 1978). In fact, the argument has been presented that there exists a universal noun bias such that nouns comprise the semantic class of words most easily acquired (Gentner, 1978, 1982; Gentner & Boroditsky, 2001). Perceptually, objects are readily accessible targets to which labels can be mapped. Not only are their features stable, but objects are also long lasting and oftentimes permanent fixtures in our environment. In addition to conceptual properties, nouns may also be more easily acquired by English learners by virtue of how they are presented in the input. Here, cross-linguistic differences come into play to reveal how discourse patterns and acoustic highlighting may influence how and when verbs are acquired. Nouns are often the objects of focus in communicative situations with children. In English, mothers frequently elicit noun production but rarely elicit verbs (Goldfield, 2000).
Even in languages where argument dropping is licensed (e.g., Japanese, Korean, Mandarin Chinese), nouns have been found to predominate in the discourse in many instances (but see later examples of how this is not always the case). In one pro-drop language, Japanese, mothers often use objects to engage their infants in social routines (Fernald & Morikawa, 1993). Tardif, Shatz, and Naigles (1997) uncovered cross-linguistic differences in the examination of another set of pro-drop languages. They found that in Italian (as in English) caregivers' speech was more noun oriented, whereas in Mandarin caregivers' speech was more verb oriented. Camaioni and Longobardi (2001) later postulated that these differences might be attributed to the diverse morphological environments in which Italian verbs are presented (in comparison with the morphologically transparent environments of Mandarin Chinese verbs). Researchers are also beginning to examine how context might mediate the extent to which nouns and verbs are used in input to young children. For example, two separate studies confirm that English-speaking mothers emphasize nouns both in book-reading and toy-play contexts, whereas Korean mothers emphasize nouns in book-reading contexts and focus more on actions in toy-play contexts (Choi, 2000; Gopnik, Choi, & Baumberger, 1996). Tardif, Gelman, and Xu (1999) similarly found that Mandarin- and English-speaking mothers used more noun types than verb types in book-reading contexts but used more verb types than noun types when given toys to play with. Although nouns appear to dominate children's speech in a variety of languages and communicative contexts, there are some cases in which verbs are emphasized over nouns. For example, Italian-speaking mothers have been found to produce verb types and tokens more frequently than noun types and tokens and to place verbs in initial or final position more than nouns (Camaioni & Longobardi, 2001). Korean mothers have been reported to engage in activity-oriented discourse and provide more action verbs than English-speaking mothers (Choi & Gopnik, 1995), and Mandarin-speaking mothers have been found to produce roughly twice as many verb tokens as noun tokens and to place verbs in salient positions (utterance-initial and utterance-final) more frequently than nouns (Tardif, 1993). Most striking is that the input children receive may be related to the age and rate at which they acquire verbs. For example, Korean children experience a verb spurt around 19 months, during which they acquire 10 or more verbs over a 1-month period (Choi & Gopnik, 1995).
A verb spurt for English-speaking children has not yet been found, meaning that English-speaking children must acquire the majority of verbs at a much later age. Mandarin-speaking children also produce more action words than object words at 22 months (Tardif, 1996). It is quite possible that the linguistic environment, including the sheer frequency of verbs, and the simplicity of their morphological environments might propel children's production of verbs, despite their inherent conceptual complexities. Conversely, a minimized focus on action verbs by English-speaking parents, when combined with conceptual and syntactic intricacies, could explain why verbs are so difficult to acquire. In addition to being the focus of discourse, nouns in English appear to benefit from prominent acoustic properties. In communicative situations with children, caregivers tend to place object nouns in salient (utterance-final) positions and emphasize them acoustically (Fernald & Morikawa, 1993; Tardif et al., 1997). There is a limited amount of research demonstrating that caregivers highlight object nouns in speech to their children with primary stress and beneficial utterance-final placement, which might promote their acquisition. The few researchers to examine parents' established tendencies (for highlighting object nouns in various ways) empirically in comprehension tasks have revealed positive results. Shady and Gerken (1999) found that young 2-year-old children were better able to comprehend familiar object nouns placed in utterance-final position than those presented in medial position, and 1- and 2-year-olds in a separate study were also able to learn the names of novel lexical items best when they were placed in utterance-final position in the context of ID speech (Golinkoff et al., 1995). Furthermore, Golinkoff and Alioto (1995) demonstrated that English-speaking adults were better able to learn new nouns in Mandarin Chinese when those words were presented in ID speech and placed in utterance-final position as opposed to medial position. In general, caregivers (especially English-speaking caregivers) tend to use object-focused discourse and highlight nouns acoustically in their speech. According to the few available comprehension studies, the way in which object nouns are presented in ID speech appears beneficial for language learners. We do not yet know whether action verbs benefit from the same kinds of highlighting that are characteristic of object nouns in ID speech. Until recently, research examining how action verbs are presented in ID speech had not been a critical focus of attention. Pence and her colleagues (Pence, Golinkoff, Winn, Salkind, & Hirsh-Pasek, 2003) are exploring how verbs are presented to early-lexical (14- to 16-month-olds) and advanced-lexical (21- to 23-month-olds) infants in a storytelling context and whether input changes as children develop. When asked to describe familiar actions and activities in a picture book titled What Are These People Doing?
mothers appear to avoid commenting on actions and activities in favor of the objects depicted, even with explicit instruction to focus on actions. Their descriptions tend to focus largely on familiar objects in the illustrations. In terms of acoustic properties, parents of both early-lexical and advanced-lexical infants used target verbs that were significantly longer than nouns (when matched for utterance position and syllable length). However, neither group of parents highlighted target verbs with an increased pitch range in the same way that target nouns have been found to be highlighted. Additionally, there were no significant developments in either acoustic highlighting strategy between the two ages sampled. A developing focus on verbs, including a prosodic focus, is critical because this input may positively affect verb learning. These findings point to one reason why verb learning may be initially so challenging for infants: Action verbs do not appear to benefit from the same kinds of prosodic highlighting or discourse focus characteristic of object nouns. Although ID speech alone may not suffice in highlighting actions


and the verbs that label them, a combination of ID speech and ID action could provide an appropriate boost for infants.

INFANT-DIRECTED ACTION

Recall that ID speech is the term used to characterize the register used in communicative interactions with young language learners. In addition to ID speech, there is a special type of input known as ID action that we use with infants and young children as well. Brand, Baldwin, and Ashburn (2002) demonstrated that infant-directed action is more repetitive and simplified, is enacted with a larger range of motion and in closer proximity to the partner, and is accompanied by greater enthusiasm and bids for interaction than adult-directed action. In most situations, ID speech and ID action coincide, but there are some cases in which the two are used independently of one another. For example, on some occasions, adults may use only speech and facial expressions to communicate with infants, such as when a mother is positioned with her infant in one arm and a bottle in the other. In other instances, adults may act out motions or sequences of motions for infants in the absence of speech, such as when demonstrating how to shake a rattle. Although ID action and ID speech usually co-occur, there are special instances in which ID speech and ID action are coordinated in temporal synchrony, such as in the vignette when Benjamin's mom extended the word rolling (rrrrooooollllliiiinnng) to match the ball's movement. This type of input, known as multimodal motherese, has been found to help infants learn new words (Gogate & Bahrick, 1998, 2001; Gogate, Bahrick, & Watson, 2000). Interestingly, ID input is not limited to oral language. Deaf mothers too alter the characteristics of their sign language when communicating with infants (Masataka, 1992, 1996, 1998).
Until recently, research on ID communication has given little consideration to gestures, actions, and other types of nonverbal input that may contribute to infants' understanding of events, actions, and the verbs that label them. We now know that mothers modify action they present toward infants versus adults. Interestingly, the repetition and exaggeration characteristic of ID action are identical to the properties of ID speech that capture and maintain infants' attention. These modifications may serve not only to make action more salient and attention grabbing but also to mark boundaries between actions (just as ID speech marks boundaries between clauses) and to instruct about the meaning of actions (e.g., the emotional consequences of completed or failed actions) just as ID speech cues infants to the emotional content of speech. In other words, ID action may give young language learners a boost as they parse actions from larger


events and interpret others' actions above and beyond the effects of ID speech.

ACTION ANALYSIS

Language development, and verb acquisition more specifically, may depend on action analysis skills for several reasons. For one, infants' ability to learn verbs, that is, the words for action units, trades on their ability to extract those units out of the ongoing flow of action. For example, an infant could not be said to know the word throw without knowing where the "picking up" part of the action stream ended and the "throwing" action began. Second, infants must be able to analyze the communicative intentions of others to learn verbs at all. That is, actions have goals, and infants must understand the goal of an action to correctly interpret the meaning of the verb that is paired with it. For example, pour and spill are perceptually similar actions, but the verbs that label them carry very different meanings. Pour denotes an intentional action, whereas spill denotes an accidental action. The same holds for the action of rolling a ball in the opening vignette as compared with dropping a ball. Benjamin's mother might emphasize her intentional rolling action by exaggerating her movements, perhaps by carrying out the action more slowly than if she had accidentally dropped the ball and watched it roll toward her son. Exaggerated, slow actions are advantageous to observers because they can easily be parsed from the stream of ongoing action. Tomasello and Kruger's (1992) finding that verb learning is easier when an action is labeled prior to its being carried out also has important implications for this situation. In addition to exaggerating her movements, Benjamin's mother announced her intention to roll the ball before performing the action. This cue alerted Benjamin to his mother's goal and undoubtedly helped him to realize that she was describing the purposeful action that would follow her label. Tomasello and Kruger argue that when actions are particularly difficult or engaging, children may miss verb labels accompanying them.
Thus, hearing a verb and observing an action at separate times may reduce processing demands for young verb learners. Third, infants may benefit from analyzing events when they are able to attend to those aspects of events that may be lexicalized as verbs. For example, when hearing a novel word in the presence of a novel action, how do we narrow the possibilities to determine which aspect of the event is being labeled? We would be off to a good start by first considering those semantic components that are most commonly lexicalized in our own language. Pulverman et al. (2002) found that English-learning infants attend

4. VERB LEARNING


to manner and path in novel motion events. Infants habituated to a computer-animated starfish moving in a given manner along a given path dishabituated to changes in both manner and path. This ability is crucial because verbs are often distinguished according to these very event components. By this we mean that if we were to see one starfish spinning in circles and another starfish flapping its arms, we would not only recognize two distinct manners of action, but we would also label the manners of action with different words (spinning vs. flapping). ID action is rich with properties that help attract and maintain infants' attention. Additionally, the exaggeration and repetition characteristic of ID action may help infants parse actions from ongoing events and provide cues about the intentions of actions. It may be that the combination of ID speech and ID action is just the cocktail to facilitate the mapping of verbs to the actions they label.

HOW MIGHT ID INPUT (A COMBINATION OF ID SPEECH AND ID ACTION) FACILITATE INFANTS' DEVELOPMENT OF UNDERSTANDING ABOUT ACTIONS AND THE VERBS THAT LABEL THEM?

In addition to characteristics of actions themselves, infants may learn about action from input that predictably coincides with it, such as an ongoing narration. This possibility, that ongoing sound or speech might help infants divide the action stream into units, was proposed and termed acoustic packaging by Hirsh-Pasek and Golinkoff (1996). However, as of yet, no empirical evidence to support this hypothesis has been offered. Gogate and colleagues' findings that young infants learned connections between vocal sounds and objects better when such synchrony was present than not, together with the Brand et al. (2002) work demonstrating the use of motionese, provide support for the notion that infant-directedness goes beyond the characteristics of speech in isolation; rather, infant-directed modifications involve both adults' actions and the coordination of actions with speech. However, Brand et al.'s work looked at action independent of speech, and Gogate and colleagues' (1998, 2001) work looked at the coordination of speech and action but focused on the learning of speech sounds. Neither piece of work directly investigates whether infant-directed speech that accompanies action may facilitate infants' learning about action per se. Current work by Brand is investigating this new question. More specifically, the first study investigates infants' sensitivity to correspondences between actions and others' infant-directed narrations of those actions, whereas the second study directly explores whether infant-directed narration of an action sequence facilitates infants'


PENCE ET AL.

ability to divide that sequence into meaningful units. The combination of action and language is just beginning to be explored. It may be that a combination of cues in the input (ID action accompanied by ID speech) is really what helps to propel verb learning forward for infants.

SO WHAT HAPPENS WHEN ACTIONS CAN'T SPEAK FOR THEMSELVES?

Let's return to Benjamin in the opening vignette. We presented Benjamin as a toddler playing with his mother while observing and listening to her intently. We explained how, as a young language learner, Benjamin would be faced with having to interpret the ball-rolling event as a series of discrete actions (e.g., picking up the ball, pushing it on the floor), discern the intentions motivating those actions (i.e., mother wanting to pass the ball to her son), parse corresponding action verbs from the speech stream (i.e., roll), and finally associate those verbs with the actions they label. How does Benjamin manage these difficult tasks when the actions he witnesses do not speak for themselves? This chapter addressed the value of the special kinds of infant-directed input that are likely to accompany those actions and activities and facilitate a whole host of processes related to the understanding of actions and the association of those actions with verbs. Research surrounding ID input, including adjustments made in speech, action, and the combination of speech and action, will undoubtedly continue to provide answers to wide-ranging developmental dilemmas for generations of Benjamins to come.

ACKNOWLEDGMENT

Richard Venezky recently became a grandfather. His interest in how it is that children learn language was rekindled as a result. This chapter is dedicated to the linguistic segment of Dick's multifaceted brain, the side that got his PhD in linguistics and continues to be a keen observer of linguistic phenomena. The diversity of Dick's interests is well known and attested to by the diversity found in this volume. Hopefully Benjamin will remember his wonderful grandfather as he continues to acquire language.

REFERENCES

Aslin, R. N. (1993). Segmentation of fluent speech into words: Learning models and the role of maternal input. In B. de Boysson-Bardies, S. de Schonen, P. W. Jusczyk, P. McNeilage,


& J. Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life (pp. 305-315). Norwell, MA: Kluwer. Bernstein Ratner, N. (1986). Durational cues which mark clause boundaries in mother-child speech. Journal of Phonetics, 14(2), 303-309. Bernstein Ratner, N., & Pye, C. (1984). Higher pitch in BT is not universal: Acoustic evidence from Quiche Mayan. Journal of Child Language, 11, 515-522. Bloom, L. (1978). The semantics of verbs in child language. Paper presented at the meeting of the Eastern Psychological Association, New York. Bloom, L. (1993). The transition from infancy to language: Acquiring the power of expression. New York: Cambridge University Press. Brand, R. J., Baldwin, D. A., & Ashburn, L. (2002). Evidence for "motionese": Modifications in mothers' infant-directed action. Developmental Science, 5, 72-83. Camaioni, L., & Longobardi, E. (2001). Noun versus verb emphasis in Italian mother-to-child speech. Journal of Child Language, 28, 773-785. Choi, S. (2000). Caregiver input in English and Korean: Use of nouns and verbs in bookreading and toy-play contexts. Journal of Child Language, 27, 69-96. Choi, S., & Gopnik, A. (1995). Early acquisition of verbs in Korean: A cross-linguistic study. Journal of Child Language, 22, 497-529. Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thai, D. J., & Pethick, S. J. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, 59(5). Fernald, A. (1984). The perceptual and affective salience of mothers' speech to infants. In L. Feagans, C. Garvey, & R. M. Golinkoff (Eds.), The origins and growth of communication (pp. 5-29). Norwood, NJ: Ablex. Fernald, A. (1989). Intonation and communicative intent in mothers' speech to infants: Is the melody the message? Child Development, 60, 1497-1510. Fernald, A. (2000). Speech to infants as hyperspeech: Knowledge-driven processes in early word recognition. Phonetica, 57, 242-254. 
Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development, 10, 279-293. Fernald, A., & Mazzie, C. (1991). Prosody and focus in speech to infants and adults. Developmental Psychology, 27(2), 209-221. Fernald, A., & Morikawa, H. (1993). Common themes and cultural variation in Japanese and American mothers' speech to infants. Child Development, 64, 637-656. Fernald, A., & Simon, T. (1984). Expanded intonation contours in mothers' speech to newborns. Developmental Psychology, 20(1), 104-113. Fernald, A., Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., & Fukui, I. (1989). A cross-linguistic study of prosodic modifications in mothers' and fathers' speech to preverbal infants. Journal of Child Language, 16, 477-501. Fisher, C. (1996). Structural limits on verb mapping: The role of analogy in children's interpretation of sentences. Cognitive Psychology, 31(1), 41-81. Gentner, D. (1978). On relational meaning: The acquisition of verb meaning. Child Development, 49, 988-998. Gentner, D. (1982). Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In S. Kuczaj (Ed.), Language development: Language, thought, and culture, Volume 2 (pp. 301-334). Hillsdale, NJ: Lawrence Erlbaum Associates. Gentner, D., & Boroditsky, L. (2001). Individuation, relativity and early word learning. In M. Bowerman & S. Levinson (Eds.), Language acquisition and conceptual development (pp. 215-256). Cambridge, UK: Cambridge University Press. Gibson, E. J., & Levin, H. (1975). The psychology of reading. Cambridge, MA: MIT Press. Gillette, J., Gleitman, H., Gleitman, L., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73, 135-176.


Gogate, L. J., & Bahrick, L. E. (1998). Intersensory redundancy facilitates learning of arbitrary relations between vowel sounds and objects in seven-month-old infants. Journal of Experimental Child Psychology, 69, 133-149. Gogate, L. J., & Bahrick, L. E. (2001). Intersensory redundancy and 7-month-old infants' memory for arbitrary syllable-object relations. Infancy, 2, 219-231. Gogate, L. J., Bahrick, L. E., & Watson, J. D. (2000). A study of multimodal motherese: The role of temporal synchrony between verbal labels and gestures. Child Development, 71, 878-894. Goldfield, B. A. (2000). Nouns before verbs in comprehension vs. production: The view from pragmatics. Journal of Child Language, 27, 501-520. Golinkoff, R., & Alioto, A. (1995). Infant-directed speech facilitates language learning in adults hearing Chinese: Implications for language learning. Journal of Child Language, 22, 703-726. Golinkoff, R. M., Hirsh-Pasek, K., & Alioto, A. (1995, November). Infants' word learning is facilitated when novel words are presented in infant-directed speech in sentence-final position. Paper presented at the Boston University Conference on Language Development, Boston. Golinkoff, R. M., Hirsh-Pasek, K., & Schweisguth, M. A. (2001). A reappraisal of young children's knowledge of grammatical morphemes. In J. Weissenborn & B. Hoehle (Eds.), Approaches to bootstrapping: Phonological, syntactic and neurophysiological aspects of early language acquisition (pp. 167-188). Philadelphia: John Benjamins. Golinkoff, R. M., Jacquet, R. C., Hirsh-Pasek, K., & Nandakumar, R. (1996). Lexical principles may underlie the learning of verbs. Child Development, 67, 3101-3119. Gopnik, A., Choi, S., & Baumberger, T. (1996). Cross-linguistic differences in early semantic and cognitive development. Cognitive Development, 11, 197-227. Hirsh-Pasek, K., & Golinkoff, R. M. (Eds.). (1996). The origins of grammar: Evidence from early language comprehension. Cambridge, MA: MIT Press.
Hirsh-Pasek, K., & Golinkoff, R. M. (Eds.), (in press). Action meets word: How children learn verbs. New York: Oxford University Press. Imai, M., Haryu, E., & Okada, H. (2003, October). The role of argument structure and object familiarity in Japanese children's verb learning. Paper presented at the 28th annual Boston University Conference on Language Development, Boston. Jacobson, J. L., Boersma, D. C., Fields, R. B., & Olson, K. L. (1983). Paralinguistic features of adult speech to infants and children. Child Development, 54, 236-442. Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., Stolyarova, E. I., Sundberg, U., & Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277, 684-686. Mandler, J. M. (1992). How to build a baby: II. Conceptual primitives. Psychological Review, 99, 587-604. Mandler, J. M. (1998). Representation. In D. Kuhn & R. Siegler (Eds.), Cognition, perception, and language: Vol. 2: Handbook of child psychology (pp. 255-308). New York: Wiley. Masataka, N. (1992). Motherese in a signed language. Infant Behavior and Development, 15, 453-460. Masataka, N. (1996). Perception of motherese in a signed language by 6-month-old deaf infants. Developmental Psychology, 32, 874-879. Masataka, N. (1998). Perception of motherese in Japanese sign language by 6-month-old hearing infants. Developmental Psychology, 34, 241-246. Merriam-Webster. (2002). Merriam-Webster online dictionary. Retrieved August 2, 2004, from http://www.m-w.com Meyer, M., Leonard, S., Hirsh-Pasek, K., Golinkoff, R. M., Imai, M., Haryu, E., Pulverman, R., & Addy, D. (2003, October). Making a convincing argument: A cross-linguistic comparison of noun and verb learning in Japanese and English. Poster presented at the 28th annual Boston University Conference on Language Development, Boston.


Papousek, M., Papousek, H., & Symmes, D. (1991). The meanings of melodies in motherese in tone and stress languages. Infant Behavior and Development, 14, 415-440. Pence, K., Golinkoff, R. M., Winn, M. B., Salkind, S. J., & Hirsh-Pasek, K. (2003, November). Coming into focus: Emergence of parents' conversational focus on verbs. Paper presented at the American Speech Language Hearing Association, Chicago. Pulverman, R., Sootsman, J. L., Golinkoff, R. M., & Hirsh-Pasek, K. (2002, April). Infants' nonlinguistic processing of motion events: One-year-old English speakers are interested in manner and path. Paper presented at the Proceedings of the 34th Child Language Research Forum, Stanford. Salkind, S. J., Sootsman, J. L., Golinkoff, R. M., Hirsh-Pasek, K., & Maguire, M. J. (2002, April). Lights, camera, action! Infants and toddlers create action categories. Poster presented at the International Conference on Infant Studies, Toronto, Ontario, Canada. Shady, M., & Gerken, L. (1999). Grammatical and caregiver cues in early sentence comprehension. Journal of Child Language, 26, 163-175. Sharon, T., & Wynn, K. (1998). Individuation of action from continuous motion. Psychological Science, 9, 357-362. Smith, C. A., & Sachs, J. (1990). Cognition and the verb lexicon in early lexical development. Applied Psycholinguistics, 11, 409-424. Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and the lexicon, Vol. III: Grammatical categories and the lexicon. Cambridge, UK: Cambridge University Press. Tardif, T. (1993). Adult-to-child speech and language acquisition in Mandarin Chinese. Unpublished doctoral dissertation, Yale University, New Haven, CT. Tardif, T. (1996). Nouns are not always learned before verbs: Evidence from Mandarin speakers' early vocabularies. Developmental Psychology, 32, 492-504. Tardif, T., German, S. A., & Xu, F. (1999). Putting the "noun bias" in context: A comparison of English and Mandarin.
Child Development, 70, 620-635. Tardif, T., Shatz, M., & Naigles, L. (1997). Caregiver speech and children's use of nouns versus verbs: A comparison of English, Italian, and Mandarin. Journal of Child Language, 24, 535-565. Tomasello, M., & Kruger, A. (1992). Acquiring verbs in ostensive and non-ostensive contexts. Journal of Child Language, 19, 311-333. Tomasello, M., & Merriman, W. E. (Eds.). (1995). Beyond names for things: Young children's acquisition of verbs. Hillsdale, NJ: Lawrence Erlbaum Associates. Trainor, L. J., & Desjardins, R. N. (2002). Pitch characteristics of infant-directed speech affect infants' ability to discriminate vowels. Psychonomic Bulletin and Review, 9(2), 225-340. Wynn, K. (1996). Infants' individuation and enumeration of actions. Psychological Science, 7, 164-169.


5 The Role of Causal Reasoning in Understanding Narratives

Tom Trabasso
University of Chicago

This chapter surveys a program of research on the comprehension of narrative texts. The central question that guided the research is, How do we achieve coherence in understanding a narrative text? The research program began about 30 years ago. In the 1970s, largely through the sponsorship of the National Institute of Education, an intense interest arose in the study of text: its nature, its structure, its influence on comprehension, and its representation by readers. Amidst these developments, Dick Venezky and I worked on a committee for the National Assessment of Reading. At our meetings, we had several very lively discussions about reading comprehension and its measurement. These discussions have continued to the present. I am indebted to Dick for broadening my interest in reading and convincing me of the central importance of reading to human communication and learning.

APPROACH

Readers interact with texts to create meaning (Bartlett, 1932). They achieve meaning by using relevant knowledge to interpret and relate the concepts that are in the text. Because the text provides the necessary information for the reader to interpret and structure concepts, it plays a central role in promoting coherent understanding.


Our approach focuses on the influence of the text as well as on what readers do in interaction with the text to achieve coherent understanding. The research begins with a discourse analysis of a narrative text. Discourse analysis reveals the concepts and relations that the reader must infer to construct a coherent representation of the text (Dewey, 1933/1963). A discourse analysis is a careful reading and interpretation of the text. It identifies the kind of relational inferences allowed by an interaction with the text. In our work, we focused on the reader's causal inferences that explain what is happening in the text. Causal inferences explain how and why the events occur in a narrative. The inferences made by the reader connect or link the content of the text's clauses or sentences to one another. Coherence is achieved when a text is considerate and provides the conditions necessary for the reader to causally relate the text's clauses or sentences. Incoherent texts omit some necessary conditions and place a burden on the reader, forcing him or her to guess what happened and why it happened. As the reader makes causal connections between the text's words, clauses, or sentences, he or she constructs a mental representation of the text in long-term memory. The memory representation is subsequently used to perform postreading comprehension tasks, such as recalling what occurred or answering questions. The discourse analysis of the text creates an ideal reader's mental representation of concepts and relations as a causal network (Trabasso, Secco, & van den Broek, 1984; Trabasso, van den Broek, & Suh, 1989). The causal network captures the structural properties of an understood text. The properties of interest are the causal connections of concepts and the causal distance between concepts in the network. These properties are psychologically valid as predictors of comprehension behavior.
Coherence and ease of understanding and memory depend on the number of connections made between the concepts. Causal connections conceptually integrate the contents of the reader's mental representation of the text. The connections between concepts affect their accessibility in the reader's memory. In general, narratives that are conceptually coherent are more interconnected and, hence, are more easily comprehended and are more accessible in memory than those texts that are less interconnected (Trabasso, Suh, & Payton, 1994). Given that a causal discourse analysis has been made on a narrative and its causal network is known, the next step is to validate psychologically the causal network against comprehension data. The properties of the causal network are used to predict behavioral data as a test of its psychological validity. For example, Trabasso et al. (1984) predicted recall of stories and individual clauses by knowledge of their causal connectivity. Recall was strongly related to the number of causal connections identified

5. ROLE OF CAUSAL REASONING


by the causal discourse analysis. The assumption that is validated is that readers actually made the causal inferences revealed by the a priori discourse analysis of the text. The approach of Trabasso et al. (1984) was two-pronged: a discourse analysis of the causal relations in a network representation and the validation of the causal network properties (e.g., causal connections) by prediction of comprehension data (e.g., recall). This approach was modified and extended to include a third prong, namely, a model for processing the causal network (Langston & Trabasso, 1999). Magliano and Graesser (1991) had argued for the combination of discourse analysis, the process model (McClelland & Rumelhart, 1988), and validation against behavioral data. The addition of the connectionist model provided a theory of how the mental representation of the text is constructed, how the representation changes over the course of reading, and quantitative measures of how accessible the text constituents are over the course of reading a text. Of these, the quantitative measurement of accessibility of the information in the mental representation at any time during the course of reading and its use in predicting comprehension data were important advances. The present approach relies heavily on the discourse analysis of Trabasso et al. (1989) and the connectionist model of Langston and Trabasso (1999). The discourse analysis provides logical criteria of necessity and other procedures for identifying causal relations that could be inferred between text conditions. The conditions and relationships determine the causal network representation. The connectionist model processes the conditions and relationships, one condition and its connections at a time. The processing of one condition at a time mimics the reader's understanding of a single clause or sentence. 
The model integrates the new condition with prior conditions to which it is connected by spreading activation and updating the connection strengths between pairs of conditions in memory. The addition of each new condition changes the accessibility of all the conditions in the memory representation. The model updates quantitatively the changes in accessibility by modifying the connection strengths between all conditions after integration of the new condition. The measures of connection strengths between conditions change dynamically over the course of reading the text. The importance of the measure of connection strength is that it indexes accessibility of information from the text representation in memory and provides quantitative predictions of comprehension behavior. The measures of accessibility of information may be obtained either during or after the completion of reading. Thus, online or offline comprehension performance can be modeled by combining a discourse analysis of the text with connectionist modeling of the text representation, whose quantitative measures are used to predict behavioral measures of comprehension. The main validity test is how well the connection strength measures predict empirical findings quantitatively. Langston and Trabasso (1999) report a high degree of success of the model in the simulation of findings from a wide range of text comprehension studies. Some of the comprehension phenomena simulated by the Langston and Trabasso model are judgments of importance and coherence, reading time, recall, and decision making (see also Trabasso & Bartolone, in press; Trabasso & Wiley, 2002). The success of these simulations depends mainly on what the text offers by way of potential connections and on the reader's making of the causal inferences that link the ideas expressed in sentences or clauses in the text.

CAUSALITY, COHESION, AND COHERENCE

When we witness a series of events, we do not experience them as an isolated series, but rather we experience them as a coherent whole. John Dewey (1933/1963) wrote, "To grasp the meaning of a thing, an event, or a situation is to see it in its relations to other things; to note how it operates or functions, what consequences follow from it; what causes it, what uses it can be put to" (p. 135). Coherence is established by causal explanations. The central question for research is, What is it about a series of events that enables their interpretation as a coherent whole? Discovering the reasons or causes of actions, events, or states in a text leads to an experience of a connected rather than a disconnected series. Schank (1975) characterized a story as a sequence of actions and states. He assumed that the states and actions could be linked via inferences into a causal chain. He speculated that events on the causal chain, being more connected and integrated, would be better recalled than those states and actions not on the chain. Stein and Glenn (1979) also assumed that causal and temporal relationships were central to linking events both within and between episodes.
Causal chaining of events also played a role in Black and Bower's (1980) state transition hierarchies for stories and Omanson's (1982) goal-outcome sequences. In creating taxonomies of inferences, Nicholas and Trabasso (1980), Trabasso (1981), Trabasso and Nicholas (1980), and Warren, Nicholas, and Trabasso (1979) considered causal inferences to be the most important way in which readers could link narrative events.

IDENTIFYING CAUSAL INFERENCES

A major problem is how to identify causal relations that readers might infer between events. A conceptual understanding of causality, a means to identify it, and its role in the analysis of text should not be left entirely to


intuition. We began our work on the analysis of causation in narrative understanding with the goal of developing and testing reliable, valid, and explicit procedures for identifying causal relations in narrative text (Trabasso et al., 1984). To illustrate how causal relations inferred by the reader play a role in understanding a text, consider how a series of events might be linked. The following example story is taken from Myers, Shinjo, and Duffy (1987):

Sentence 1: Joey went to play baseball with his brother.
Sentence 2: Joey got angry with his brother in a game.
Sentence 3: Joey began fighting with his older brother.
Sentence 4: Joey's brother punched him again and again.
Sentence 5: The next day his body was covered with bruises.

The series of events depicted in Sentences 1-5, when read and understood, can be perceived as forming a coherent sequence rather than an isolated series. One can infer that there is a causal relationship between each successive pair of sentences. For example, Joey's playing baseball with his brother enables Joey to become angry with him in the game; Joey's anger motivates him to fight with his brother; in fighting, punching is an attempt to achieve a goal of injuring one's opponent; Joey's engagement in a fight enables him to become a victim of his brother's blows; and his brother's blows physically cause his bruises. In this analysis, there are implied, conceptual dependencies between adjacent pairs of sentences. These dependencies, when inferred by a reader, form local, causal cohesive ties between the concepts. Causal cohesion may occur in conjunction with or may be independent of other cohesive devices (e.g., pronominal anaphor or nominal reference as studied by Halliday & Hasan, 1976, and Kintsch & van Dijk, 1978).
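
The pairwise analysis of the Joey story can be written down as a small directed graph. The Python sketch below is illustrative only: the link list simply transcribes the relations named in the preceding paragraph, and the names LINKS, build_network, and connection_counts are hypothetical, not part of the published analyses.

```python
from collections import defaultdict

# Causal links between the five sentences of the Joey story,
# written as (cause, effect, relation) following the text's analysis.
LINKS = [
    (1, 2, "enables"),            # playing baseball -> getting angry
    (2, 3, "motivates"),          # anger -> fighting
    (3, 4, "enables"),            # fighting -> being punched
    (4, 5, "physically causes"),  # punches -> bruises
]


def build_network(links):
    """Return (causes, effects): for each node, its direct causes and direct effects."""
    causes = defaultdict(set)
    effects = defaultdict(set)
    for cause, effect, _relation in links:
        effects[cause].add(effect)
        causes[effect].add(cause)
    return causes, effects


def connection_counts(links, nodes):
    """Count the causal connections (incoming plus outgoing) of each node."""
    causes, effects = build_network(links)
    return {n: len(causes[n]) + len(effects[n]) for n in nodes}


counts = connection_counts(LINKS, range(1, 6))
for sentence, n in counts.items():
    print(f"Sentence {sentence}: {n} causal connection(s)")
```

In this toy network the opening and closing sentences each have one connection and the middle sentences have two, which is the kind of connectivity count that the chapter relates to recall.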

VALIDATION OF THE DISCOURSE ANALYSIS AND CAUSAL NETWORKS

Trabasso et al. (1984), in a seminal study, tested Schank's (1975) claim about causal chains. They identified causal relations by the use of logical criteria of necessity based on the philosophical, historical, and legal literature on causality (e.g., Collingwood, 1938; Fischer, 1970; Hart & Honore, 1959; Hospers, 1953; Mackie, 1980). Collingwood (1938) argued that people select causes from a set of available conditions. The selection is made from a particular perspective used to explain an event. A given perspective offers a unique explanation, but


an explanation may be valid within the framework of the perspective taken. Mackie (1980) termed these frameworks causal fields and, like Collingwood, claimed that a person's causal field determines which conditions are selected as causes. Whenever important events occur that are unexpected, people are generally motivated to trace causes. For example, Hart and Honore (1959) pointed out that in the law both causes and consequences are traced to determine responsibility and assign liability for harmful or unlawful outcomes. The tracing of causes also occurs generally with disasters such as that of the crash of the shuttle Challenger in 1986 (Hilton, Mathews, & Trabasso, 1992). The reason that causes are traced is that their discovery (e.g., erosion of O-rings in the rocket fuel tanks) could enable the prevention of future disasters. Of interest to the analysis of causes, counterfactuals may be used to test beliefs about potential causal explanations (e.g., if the temperature had been above 57°, then the O-rings would have been flexible enough to expand and prevent fuel from exiting and burning through). The destruction of the space shuttle Columbia during reentry prompted a similar search for a causal explanation in February 2003 and ensuing months. A candidate cause, namely debris that hit the Columbia's left wing during takeoff, was observed and reported as a possible cause. Counterfactually, if the debris from the foam protection had not struck the left wing during takeoff, the shuttle would not have been destroyed. The causal search should produce design changes that prevent a repetition of the problem of falling debris during takeoff. Mackie (1980), among other philosophers, recognized that dependencies exist between causes and their effects. In Mackie's view, a cause is but one of several jointly necessary conditions for a consequence.
If a condition, A, is removed from the set of jointly necessary conditions (called the circumstances) for condition B, condition B will not occur. Counterfactually, if not-A, then not-B, in the circumstances. The argument is that if condition A had not happened in the circumstances of the story, then condition B would not have happened. Referring to the Joey story, if Joey had not become angry with his brother, in the circumstances of the story, he would not have had a fight with him. Likewise, if his brother had not hit him, the bruises would not have occurred, in the circumstances of the story. The sufficiency of a condition selected as a cause is weak in Mackie's (1980) view. The reasons are that one condition is not sufficient and that other sets of conditions that do not contain a particular condition can serve as causes. Mackie describes how one of several, different, disjunctive sets of individually necessary conditions can cause an outcome. If other, jointly necessary conditions are lacking, the condition that is put into the story may not lead to the consequent. Thus, a condition is necessary but not sufficient as a cause.
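
The counterfactual criterion lends itself to a small worked example. The Python sketch below is one possible reading of necessity in the circumstances, not a procedure taken from Mackie (1980) or from Trabasso et al. (1984). It assumes that each effect is listed with one or more alternative sets of jointly necessary conditions, and it tests A as a cause of B by removing A and checking whether B still occurs. The condition names, the rule tables, and the alternative cause "fell" are all invented for illustration.

```python
def occurs(base_conditions, rules):
    """Return every condition that holds, given base conditions and causal rules.

    rules maps an effect to a list of alternative sets of jointly necessary
    conditions; the effect occurs if any one of those sets is fully present.
    """
    present = set(base_conditions)
    changed = True
    while changed:
        changed = False
        for effect, alternatives in rules.items():
            if effect not in present and any(s <= present for s in alternatives):
                present.add(effect)
                changed = True
    return present


def necessary_in_circumstances(a, b, base_conditions, rules):
    """Counterfactual test: B occurs as things stand, but not once A is removed."""
    actual = occurs(base_conditions, rules)
    if a not in actual or b not in actual:
        return False
    # Remove A: drop it from the base conditions and from the producible effects.
    rules_without_a = {e: alts for e, alts in rules.items() if e != a}
    counterfactual = occurs(set(base_conditions) - {a}, rules_without_a)
    return b not in counterfactual


# The Joey story as a chain of conditions (hypothetical labels).
BASE = {"plays_baseball"}
RULES = {
    "angry": [{"plays_baseball"}],
    "fight": [{"angry"}],
    "punched": [{"fight"}],
    "bruises": [{"punched"}],
}

# A variant with a second, disjunctive route to the bruises (invented).
ALT_RULES = dict(RULES, bruises=[{"punched"}, {"fell"}])

print(necessary_in_circumstances("angry", "fight", BASE, RULES))
print(necessary_in_circumstances("punched", "bruises", BASE, RULES))
print(necessary_in_circumstances("punched", "bruises", BASE | {"fell"}, ALT_RULES))
```

The last call shows why the criterion is tied to the circumstances: once a fall is also present, the punches no longer pass the counterfactual test for the bruises, even though they would still produce them.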


The logical criterion for a causal relation, then, is one of necessity in the circumstances. Trabasso et al. (1984) assumed that the circumstances of the story inferred by the reader provided the set of conditions for the necessity of a particular condition as a cause. The reader infers a cause by taking into account several conditions present or implicit in the story. It is not a general cause that is inferred but one that operates only within the particular circumstances of the story. It may also be consistent with general knowledge of physical and psychological causation. Joey's bruises required several blows to his body, and these are not mentioned in the story. The reader might infer that Joey sustained several blows during the fight that were necessary for him to be bruised. Therefore, the logical criteria for necessity require assuming a set of circumstances in which an identified condition occurs. The condition identified as a cause is individually necessary in the circumstances of the story. Negating the condition counterfactually leads to a belief that its consequence would not occur. If Joey had not been hit by his brother, in the circumstances of the story, he would not have suffered bruises. The counterfactual became the primary test of an inferred causal relation that links two events in the causal analysis of a story (Trabasso et al., 1984, 1989).

EMPIRICAL VALIDATION OF THE APPROACH

Causal Connectivity, Chains, and Recall

Trabasso et al. (1984) adopted Mackie's (1980) criteria of logical necessity to identify potential causal relations in narrative texts. They used these criteria to identify causal chains that could be inferred by readers in a story. They identified causal chain membership of clauses to test Schank's (1975) claim that events on the chain would be better remembered than those not on the chain. Trabasso et al.
(1984) carried out a causal analysis on stories that Stein and Glenn (1979) used to study recall by elementary school children in Grades 1, 3, and 5. In the analysis of these stories, Trabasso et al. (1984) first parsed the stories into clauses. Each clause had one main verb predicate. Causal relations were then found between clauses. Each related pair was tested counterfactually. A causal network representation was constructed for each story where a clause was a node and causal relations between clauses were directed arrows. The causal chain was a connected path from a clause in the beginning of the story to a clause at the end of each story. Causal chain clauses each had a linked antecedent and consequent except for the one that started or ended the chain. In contrast, dead-end clauses were either isolated and had no causal connections, or had a cause but no


TRABASSO

consequence, or had no antecedent cause but had a consequence, or were in a chain that represented an offshoot of the main causal chain.

The clauses that form the causal chain are likely to have more causal connections, on average, than those that are dead ends. Causal chain clauses are therefore likely to be more integrated and accessible than dead-end clauses, and they should be recalled more often. The higher memory accessibility of causes on the causal chain was supported in predictions of the recall data of Stein and Glenn (1979). Clauses in the causal chain were recalled more often than dead-end clauses by a 2:1 ratio (80% vs. 40%). Causal chain clauses were also better retained 1 week later. Their retention was at the same level as in immediate recall; the dead-end clauses were retained far less well after a week, showing a greater loss of accessibility with time. Recall was the same for a cause given its consequent and for a consequent given a cause. This bidirectional effect indicates mutual accessibility of one clause from another.

For total story recall, the percentage of story events in the causal chain predicted perfectly the percentage of recall for each of the four stories. In addition, the patterns of story grammar category recall reported by Stein and Glenn (1979) and by Mandler and Johnson (1977) were also perfectly predicted by their percentage of causal chain events. In sum, the identification of causal relations by counterfactual criteria was a psychologically valid predictor of what the children recalled after reading or listening to stories. The validity of causal connectivity indicates that readers made the causal inferences identified by the a priori discourse analysis of the texts.

In a related study that preceded the Trabasso et al.
(1984) analysis of causal chains, Omanson (1982), in a doctoral thesis directed by the author, examined central versus peripheral events in terms of differences in accessibility in memory. Central events had both causes and consequences. Peripheral events, in contrast, had either a cause but not a consequence or vice versa, making them dead-end clauses. In this sense, Omanson studied causal connectivity in a way similar to the causal chain analysis of Trabasso et al. (1984). The important contribution of Omanson was that he controlled the content across versions but varied the causal connectivity of the same events. Omanson found that central events were recalled better in tests of immediate and delayed recall and that they were rated as more important and were summarized twice as often as peripheral events.

Trabasso and van den Broek (1985) reanalyzed the stories of Omanson (1982) and Stein and Glenn (1979) using the logical criteria of Trabasso et al. (1984). They derived causal networks for the stories. They added, as a variable, the episodic story grammar categories of clauses. Here, each clause was categorized as an initiating event; an internal response, such as an emotion, perception, or cognition; a goal; an attempt; or an outcome. These categories correspond to those in the story grammars of Mandler and Johnson (1977) and Stein and Glenn (1979). In regression analyses, Trabasso and van den Broek (1985) found that causal chain membership and the number of causal connections accounted for substantial proportions of variance in both the Omanson (1982) and Stein and Glenn (1979) data. Story grammar categories and causal chains contributed unique variance, but they each overlapped substantially with causal connections. Other properties of the clauses, such as concreteness, serial position, and argument overlap, failed to account for any unique variance. For the Omanson (1982) data, the proportions of variance accounted for in immediate and delayed recall, summaries, and importance judgments were, respectively, .33, .46, .44, and .66. For Stein and Glenn's (1979) recall data, the proportions of variance accounted for were .61 and .66 for immediate and delayed recall. The findings lend further support to the claims that causal connectivity is a primary predictor of comprehension findings and that readers make the causal inferences identified by the discourse analysis.

Importance of Events

Identification by readers of important units in a text has had a long history. Binet and Henri (1894) regarded what was recalled from a story as important. They argued that if what was recalled long after reading or listening to a story was removed from the original story, the story would make no sense. Thorndike (1917) suggested that the importance of a statement depended on its relations to other statements in a text but offered no independent or a priori way by which to judge its relational role. In practice, the reader's ability to identify the importance of ideas in a text has been taken as a measure of reading comprehension (Brown & Smiley, 1977). Despite its popularity, there was no theoretical basis for importance as a measure of comprehension. Trabasso et al.
(1984) had used the percentage of events in the causal chain to predict the rank order of importance ratings by children of a set of episodic story grammar categories (data from Stein & Glenn, 1979). This finding indicated that causal connectivity might provide a theoretical basis by which participants judge the importance of an event. The basic idea here is that the reader or listener intuits the number and kind of relations that a clause has to other clauses in judging its importance to the story. Trabasso and Sperry (1985) tested whether causal connectivity was a predictor of judgments of importance. To do this, they carried out discourse analyses of six Chinese folktales that were used by Brown and Smiley (1977) to obtain importance judgments. Trabasso and Sperry


found causal networks for each story, employing the same logical criteria as Trabasso et al. (1984). Identification of causal relations was highly reliable (96% agreement). The six Chinese folktales had large networks, with the number of clauses per story ranging from 72 to 126. From the networks, Trabasso and Sperry (1985) found the number of connections and the causal chain membership for each clause in each story. In a regression analysis, the number of connections accounted for substantial degrees of unique and total variance (15% and 27%, respectively), whereas causal chain membership accounted for only 2% unique variance. The average importance judgment was nearly perfectly correlated with the number of causal connections (range = 0 to 11) of a clause. These data show a high degree of influence of causal connectivity on the importance of clauses in a text.

Psychological Validity of Logical Necessity Criteria

Trabasso et al. (1989) asked whether untrained participants would independently judge the same clauses as related that were a priori identified as causally related by the discourse analysis. The participants were asked to judge pairs of clauses as to whether the first clause was necessary or sufficient for the second clause. Trabasso et al. (1989) sampled pairs of sentences that varied in the kind of relationship. In their sample, pairs were included where (1) initiating events psychologically caused appraisals, emotions, goals, or other internal states; (2) goals motivated attempts and subordinate goals; (3) attempts enabled or physically caused outcomes; and (4) outcomes, acting like events, psychologically caused internal states (see also Schank & Abelson, 1977, for a similar taxonomy of causal relationships). Since Graesser (1981) had shown that why and how questions could be validly used to identify causal and enabling relations, Trabasso et al.
(1989) adopted his method of posing why and how questions to assist them in the identification of causal relations. Trabasso et al. (1989) identified causal relationships in narratives by (1) parsing the stories into clauses; (2) categorizing the clauses in terms of their episodic function; (3) using the categories to look for conditions that were psychologically caused, physically caused, motivated, or enabled; (4) posing why or how explanatory questions on candidate conditions when a cause was hypothesized; and (5) testing pairs of clauses by the use of counterfactuals. The use of these several criteria enabled highly reliable (r > .90) identification of causal relationships. Once identified, the causally related pairs were joined together in a causal network representation of each story.

Trabasso et al. (1989) sampled several pairs of clauses from the causal networks. The pairs of clauses varied in their causal distance from 0, a direct cause, to 6, an indirect cause in a causal chain. The values 0 to 6 give the number of intervening clauses between a pair of clauses in the causal network. College student participants read each Stein and Glenn (1979) story. They then saw a list of pairs of clauses and rated each pair on a 7-point scale. They rated either how necessary clause A was for the occurrence of clause B or whether a clause was a sufficient cause of another clause. Necessity was rated more highly than sufficiency for the pairs of causes. Participants also differentiated between the kinds of causes. Their ratings of the kind of relationship, ordered from the highest to the lowest, were physical, motivational, psychological, and enabling. Regression analyses showed that the number of causal connections accounted for substantially more variance in the judgments than did the presence of argument overlap (i.e., the presence of common nouns across clauses).

Question Asking

Questions play a central role in the assessment and promotion of understanding (Dillon, 1982; Graesser, 1981; Graesser & Black, 1985; Lehnert, 1978). Questions can direct how a reader understands a text, assess whether the reader can explain an event, or assist the reader in the discovery of meaningful relations. Despite a long history of interest in questioning, the results on questioning the reader per se are rather mixed (Anderson & Biddle, 1975; Trabasso & Bouchard, 2000). Trabasso, van den Broek, and Liu (1988) used causal discourse analysis to provide a theoretical basis for where in the text one should pose how and why questions to maximize integration, comprehension, and memory. Suppose a story character has a goal to attain a particular object (e.g., Jimmy wants a bicycle) but meets with an initial failure (e.g., Jimmy's mother refuses to buy him a bike). Jimmy generates a subordinate goal (e.g., to obtain a job) to earn the money to buy a bike.
If one asks why Jimmy obtained a job, the reader could integrate the higher order goals of earning money and buying the bike. If one asks how Jimmy obtained the job, the reader could answer and integrate the goal of job attainment with a plan that Jimmy followed to get the job (e.g., applying for a job in a grocery store and delivering groceries to earn money). The backward answering of why questions and the forward answering of how questions are direct ways in which a teacher could pose questions to maximize understanding and integration of the story.

Van den Broek, Tzeng, Risden, Trabasso, and Basche (2001) posed questions on narrative comprehension to 4th-, 7th-, and 10th-grade students as well as college students. They found a questioning effect on specifically targeted information for college students, namely, that questioned content and answers were recalled better than content that was not questioned.


They also found that questioning the youngest children actually interfered with recall of both general and targeted information. Questioning had no beneficial effect on children in 7th and 10th grade. Van den Broek et al. (2001) concluded that questioning might be beneficial to highly skilled readers who are more proficient and automatic in their reading and language skills. Younger and less skilled readers' processing capacity may be challenged by the additional task of answering questions. This explanation might account for some of the mixed results on question asking and comprehension reviewed by Anderson and Biddle (1975) and Trabasso and Bouchard (2000).

Goal Hierarchies and Causal Connectivity

A number of researchers who studied discourse comprehension assumed that statements high in a text structure hierarchy would be recalled better than those lower in the hierarchy (Black & Bower, 1980; Lichtenstein & Brewer, 1980; Rumelhart, 1975; Thorndyke, 1977). Van den Broek and Trabasso (1986) varied the level of a goal in a hierarchy by making it either a superordinate or a subordinate goal. They also varied the number of connections of a goal. If a goal had a large number of connections, it would motivate a large number of subordinate goals and actions. They found that superordinate goals were summarized often when they had a large number of causal connections but not when they had a small number of connections.

Van den Broek (1988), in a published dissertation study, varied connectivity across levels of a three-goal hierarchy. Those goals with more connections were best recalled, both immediately and after a delay of 2 days; were most frequently summarized; and were rated as most important, regardless of their location in the hierarchy. This result is of interest because in a story, a lower order goal can dominate the action and assume greater importance than its higher order goal.
Causal Connections in Narration

Berman and Slobin (1994a) carried out a cross-linguistic study on narration of a picture book by children and adults. The picture book (Mayer, 1969) depicts the adventures of a frog who is captured and kept by a boy but then escapes from a jar and returns to its family. Individuals from a large number of different language groups were asked to tell the story by looking at each of the 24 pictures in their normal temporal order. One corpus of protocols obtained from 52 American speakers by Berman and Slobin (1994b) was made available to the author. The protocols came from four groups with 10 children each (3-, 4-, 5-, and 9-year-olds) and one group of 12 adults.


Trabasso and Nickels (1992) applied Trabasso et al.'s (1989) analysis to the protocols. They represented each story as a causal network from the perspective of each character (a boy, the frog, a dog, a gopher, bees, a deer, and the family of frogs), a procedure originated by Warren et al. (1979). Each network was allowed to interact with other networks where the characters had effects on one another and where causal relations existed across the networks. Of interest was that the main character, the boy, had the longest causal chain from the beginning to the end of the story.

An important building block of a story is the relationship among different kinds of events that organizes them into episodes (Stein & Glenn, 1979). A core episode has a causally interrelated, goal-attempt-outcome (GAO) unit: for example, "the boy wanted to find his frog, so he called out the window but the frog did not come" or "the boy looked for his frog in a hole in the ground, but a gopher jumped out and bit him on the nose." In the analysis of the story generation protocols, 9-year-old children resembled adults in the number of GAO units required to tell a coherent story. The 9-year-olds did not elaborate within the episodes as much as did the adults. This suggests that episodic structuring enables richer, elaborated memories.

The main developmental findings, however, were in the 3- to 5-year range. Three-year-olds mainly described who and what were in the pictures; they did not include actions and did not structure the sequence episodically. Four-year-olds described attempts and outcomes but omitted reasons or purposes of the actions. Five-year-old children described attempts and outcomes but introduced goals as reasons or added clear goal success or failure outcomes of attempts. The developmental picture showed a sharp increase in causally connected GAOs from 3 and 4 to 5 years of age and then another, lesser increase from 5 to 9 years of age.
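The network notions used throughout this chapter (clauses as nodes, causal relations as directed arrows, the causal chain, dead-end clauses, and the number of connections per clause) can be illustrated with a short sketch. The clauses and links below are a hypothetical fragment loosely modeled on the frog story, not the network from the published analysis, and the choice of opening and closing clauses is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical clauses; each link (a, b) means that clause a passed the
# counterfactual test as a cause of clause b.
clauses = [
    "dog sleeps on bed", "frog escapes", "boy wants frog back",
    "boy calls out window", "frog does not come", "boy looks in hole",
    "gopher bites boy", "boy looks behind log", "boy finds frog",
]
links = [
    ("frog escapes", "boy wants frog back"),
    ("boy wants frog back", "boy calls out window"),
    ("boy calls out window", "frog does not come"),
    ("frog does not come", "boy looks in hole"),
    ("boy looks in hole", "gopher bites boy"),
    ("frog does not come", "boy looks behind log"),
    ("boy wants frog back", "boy looks behind log"),
    ("boy looks behind log", "boy finds frog"),
]

successors = defaultdict(set)
predecessors = defaultdict(set)
for cause, effect in links:
    successors[cause].add(effect)
    predecessors[effect].add(cause)


def reachable(start, neighbors):
    """All clauses that can be reached from `start` by following `neighbors`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


opening, closing = "frog escapes", "boy finds frog"

# A clause is on the causal chain if it lies on some connected path
# from the opening clause to the closing clause.
causal_chain = reachable(opening, successors) & reachable(closing, predecessors)

# Everything else is a dead end: isolated clauses and offshoots.
dead_ends = set(clauses) - causal_chain

# Number of causal connections (antecedents plus consequences) per clause.
connections = {c: len(successors[c]) + len(predecessors[c]) for c in clauses}
```

In this fragment the isolated setting clause and the gopher offshoot fall off the chain, and the goal clause is among the most highly connected. The choice of opening and closing clauses is made by the analyst; the sketch simply takes it as given.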
By the time they enter kindergarten, children are becoming proficient in encoding events with basic episodic structure. Knowledge of goals and plans of action that enable causal explanation of actions and outcomes develops rapidly over the 3- to 5-year range. Four-year-olds, however, do know why actions are being carried out, even though they omit the reasons when they encode picture stories. Trabasso, Stein, Rodkin, Munger, and Baughn (1992) asked 4-year-olds why each encoded attempt occurred and found that the 4-year-olds understood the reason for the action (e.g., the boy wanted to find his frog). Five-year-olds appear to be mastering the art of storytelling. They used settings that introduced the characters and their relationships (boy and pets). These serve to orient the listener to the importance of the initiating event of the frog escaping from the jar.

The developmental sequence of description, action, and goal-directed action represents a general developmental finding on the use of causal reasoning to generate and understand stories. Stein and Policastro (1984)


asked children, 3 to 11 years of age, to generate stories from two setting statements that introduced characters in space and time (see also Stein, 1988; Stein & Albro, 1997a, 1997b). They found a developmental trend where the youngest children mainly described what the characters were like, preferred, or possessed. Next came action sequences where the children encoded actions and outcomes but without causal relations. The remaining four types of sequences generated by the older children contained causal and enabling relations and increased in their episodic completeness and complexity. Some of these were classified as reactive sequences, where the character mainly reacted passively to events and goal-directed action was lacking. The other causal sequences were incomplete, complete, and complex episodes that contained goals and goal-directed actions.

Trabasso and Stein (1994) studied how 4- and 6-year-old children encoded and recalled six events in an episode from two frog stories of Mayer (1967, 1973). Encoding was observed by having the child tell the story in the pictures. In encoding, children showed the same developmental sequence as was found in story generation by Stein and Policastro (1984) and Stein and Albro (1997a, 1997b). The younger children described a series of isolated setting statements that named characters and identified objects. Next, slightly older children generated isolated action sequences. Then the older children generated complete episodes with causal connections. In recall, the children who encoded the isolated series showed strong serial position effects; those who encoded the series causally recalled all the events at a high level independent of the order of events in the series. These data show the power of encoding events in terms of their causal relationships and creating a coherent memory representation that aids in subsequent recall.

Coherence and Story Structure

Trabasso et al.
(1994) studied the relationship between coherence and recall of stories that varied in causal structure. Hierarchical stories are organized around a main goal that fails in its attainment, and a set of subordinate goals and actions are carried out to attain the main goal. This embedded goal process gives the structure higher causal connectivity than the comparison, sequential story because the latter is a linear structure with a series of goals and successes. Participants read and rated the coherence of each story. The expectation was that the hierarchical structure would be rated higher because it is more interconnected. This was in fact the case. Of interest, the average number of connections per story accounted for 91% of the variance in coherence ratings of the 16 stories.


Online Comprehension

The strategies and processes that readers use to achieve understanding can be revealed by the use of thinking-out-loud procedures during reading (Kucan & Beck, 1997; Suh & Trabasso, 1993; Trabasso & Magliano, 1996a, 1996b; Trabasso & Suh, 1993). Trabasso and Suh (1993) and Suh and Trabasso (1993) asked a group of eight college student readers to "think out loud" by reading one sentence of a story at a time, trying to understand the sentence in the context of the story, and reporting their understanding as they read. The protocols indicated where the readers made inferences. Suh and Trabasso focused on those inferences that involved the goals of the protagonist. They found that readers referred to goals most frequently at the same locations predicted by the discourse analysis and causal networks.

In a set of three experiments involving priming (facilitation of processing sentences that contain inferences), Trabasso and Suh (1993) and Suh and Trabasso (1993) probed readers with a question whose answer would have been primed at that location by a goal inference (e.g., "Betty went to the department store" would prime a goal to "buy her mother a present"). The participants were college students. Readers were faster to answer probe questions on goals during reading (e.g., "Had Betty wanted to buy her mother a present?") in those locations where the discourse analyses and causal networks predicted that the reader should infer a goal to explain either a related subordinate goal or an attempt. Hence, both think-aloud and priming data indicated that readers made online inferences where they were predicted to make them by the a priori discourse analysis. Further, the average priming response times and the number of goal references were highly correlated across groups and conditions. These studies show that readers made inferences while reading a story at those locations where the causal network analysis indicated that they should.
To understand how inferences are made on information in texts, Trabasso and Magliano (1996a) performed a process analysis on the adult, skilled reader think-out-loud protocols of Suh and Trabasso (1993) and Trabasso and Suh (1993). Three working memory operations were identified in the protocols: activation of relevant world knowledge in working memory, maintenance and carrying over of information in working memory, and retrieval of text and prior thoughts from a long-term memory store. These operations are functionally necessary to three kinds of inferences: explanation, elaboration, and prediction. Figure 5.1 shows a representation of the memory and inference processes that were studied by Trabasso and Magliano (1996a). During reading, the reader has to understand the current, focal sentence (S) in working memory. In understanding this sentence, the reader can maintain it by


FIG. 5.1. Memory and inference processes during reading of a sentence (based on figure from Trabasso & Magliano, 1996a).

repeating or paraphrasing (sentence S-1) in working memory. The reader can explain a focal sentence (S) by carrying over a prior sentence (S-1) or by retrieving from long-term memory information from prior sentences (e.g., sentence S-k) or from prior thoughts that were elaborated from world knowledge (not shown). The reader may make predictions about what is going to happen based on the focal sentence or can elaborate it by activating relevant information from world knowledge. Of these processes, memory and retrieval along with explanation and accurate prediction serve best to associate, learn, integrate, and update the memory representation of the text (Langston & Trabasso, 1999), although activation and elaboration through the use of world knowledge is also a necessary contributor (Cote, Goldman, & Saul, 1998; Stein & Trabasso, 1992). Trabasso and Magliano (1996a) found that understanding was largely explanation based. The readers' protocols showed that their thoughts were 75% inferences; 4% metacomments, including comprehension monitoring; and 21% paraphrases. For inferences, explanations predominated (70%), followed by elaborations (19%) and predictions (11%). Most elaborative inferences were based on activation of relevant knowledge. Memory processes provided access to information that readers used to think about the current contents in working memory. Prior or new thoughts were often used to explain focal sentences. Learning through elaborating and integrating the text information was largely achieved through local processing (i.e., keeping information available in working memory through maintenance and carryovers). Retrieval into working memory also played a substantial role and accounted for a large percentage of the use of text-based sources. Global coherence was achieved through integration of text by higher order


goals that were maintained over successive sentences and by retrieving information at distances as far as 16 sentences away.

Trabasso, Suh, Payton, and Jain (1995) had 24 nine-year-old third graders silently read one story and then read and think out loud on a second story, or vice versa. After both stories were read, the children recalled the first and then the second story. They found that those children who made the most integrative inferences, namely, explanation and prediction, recalled the most information.

Asking a child questions about one story and then asking the child to think out loud on another shows that question asking can benefit comprehension. Trabasso et al. (1994) had children think out loud when they read a first story. Then they asked questions during the reading of a second story, which the children answered while reading silently. They asked why and how questions (e.g., "How did Ivan defeat the giant?" "Why did Ivan want to battle the giant?"). They also asked what questions (e.g., "What did Ivan do to the giant?"). A third story served as a transfer condition. Here, thinking out loud was used as a measure of the effect of the questioning. Compared with controls who were not questioned, explanatory and predictive inferences increased during thinking out loud.

Trabasso and Magliano (1996b) analyzed the protocols of Trabasso et al. (1994) and compared their findings on adult data (Trabasso & Magliano, 1996a) with those on third grade readers (Trabasso & Magliano, 1996b). Explanatory inferences favored the more expert, college readers over the more novice, third grade readers (58% vs. 32%). The third graders showed more elaborative (21% vs. 12%) and predictive (17% vs. 11%) inferences, and they paraphrased relatively more (30% vs. 19%) than college-level readers. The distributions of these proportions were, however, comparable across the kinds of inferences, with explanation being the modal inference.
With respect to knowledge activation and memory operations, college and third grade readers performed in nearly identical ways: generation of new activation (63% vs. 60%), use of prior activation (5% vs. 6%), and use of prior text (33% each). These data indicate the kind of differences in thinking aloud one might expect.

Simulation of Comprehension by a Connectionist Model

Langston and Trabasso (1999) and Langston, Trabasso, and Magliano (1998) simulated construction of a mental representation during reading. They did this by using a connectionist model that operates on the causal network as input. The connectionist model is implemented in the software of the construction-integration model of Kintsch and others (Goldman & Varma, 1995; Kintsch, 1988, 1992; Kintsch, Welsch, Schmalhofer, & Zimny, 1990; Tapiero & Denhiere, 1995). The Langston and Trabasso (1999) version


allows access to long-term memory during the processing of new information because information stored in long-term memory has to be kept directly accessible by means of retrieval cues in short-term memory. The Langston and Trabasso (1999) model has a long-term storage called text representation that contains nodes, connections between nodes, and quantitative values that change over time as each new node is processed. The nodes correspond to clauses or sentences from the discourse being modeled. Each node has an associated activation value that changes over time as the text representation is constructed. Each connection represents the relationship between a pair of clauses in the text, as identified by the discourse analysis. Each connection between nodes has a connection strength. Connection strengths change as new nodes and their connections are integrated into the existing text representation. The activation values and connection strengths reflect, through processing, both the current memory representation of the reader and the history of his or her comprehension. The network is processed one node at a time to simulate reading. The model incorporates each new node and its connections into an existing text representation and spreads activation among the nodes. Once activation has settled, the activation values are used to adjust the strengths of the connections between the nodes. At any time during or after this processing, connection strength values may be obtained. Although activation values, settling rates, or connection strengths index accessibility in memory and each could be used in simulation, Langston et al. (1998) found empirically that connection strength was the most reliable and psychologically meaningful predictor. In the simulation, each network is presented to the connectionist model for processing, one pair of conditions at a time. 
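The processing cycle described above (add a node and its connections, spread activation until it settles, then adjust connection strengths) can be sketched as follows. This is a minimal illustration and not the published model: the unit link weights, the normalization by the peak activation, and the update rule that strengthens a link by the product of the two settled activations are all assumptions borrowed from generic construction-integration practice, and the function names are hypothetical.

```python
def settle(weights, max_cycles=200, tolerance=1e-6):
    """Spread activation over `weights` until the pattern stops changing."""
    n = len(weights)
    activation = [1.0] * n
    for _ in range(max_cycles):
        spread = [sum(weights[i][j] * activation[j] for j in range(n))
                  for i in range(n)]
        peak = max(spread)  # positive, because every node has a self-link
        spread = [value / peak for value in spread]
        change = max(abs(new - old) for new, old in zip(spread, activation))
        activation = spread
        if change < tolerance:
            break
    return activation


def simulate_reading(n_clauses, links):
    """Process clauses one at a time; return the n x n strength matrix."""
    linked = {frozenset(pair) for pair in links}
    strengths = [[0.0] * n_clauses for _ in range(n_clauses)]
    for newest in range(n_clauses):
        size = newest + 1
        # Construction: the clauses read so far, with unit self-links and
        # unit weights on each causal link among them (an assumption).
        weights = [[1.0 if i == j or frozenset((i, j)) in linked else 0.0
                    for j in range(size)] for i in range(size)]
        activation = settle(weights)
        # Integration: strengthen each link by the product of the settled
        # activations of the two clauses it joins (an assumed rule).
        for i in range(size):
            for j in range(size):
                if i != j:
                    strengths[i][j] += weights[i][j] * activation[i] * activation[j]
    return strengths


def average_strength(strengths, node):
    """Mean connection strength of `node` to every other node."""
    others = [s for k, s in enumerate(strengths[node]) if k != node]
    return sum(others) / len(others)


# Hypothetical four-clause story in which clause 0 causes clauses 1, 2, and 3.
strengths = simulate_reading(4, [(0, 1), (0, 2), (0, 3)])
```

In this toy network the highly connected clause 0 ends up with a larger average connection strength than any of its consequences, and links that were established earlier in reading accumulate more strength than later ones.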
The model outputs an n x n matrix of connection strengths between all possible pairs of conditions processed up to any point in the story. From this matrix, average connection strength of a condition or the connection strength between a pair of conditions is obtained. Connection strength indexes how accessible one condition is from other conditions, either on average from all other conditions or from a particular condition. Average connection strength assumes that accessibility is made by activation from multiple conditions to a particular condition. For example, in free recall, multiple connections are activated, so the appropriate measure is the average connection strength of a node. If a condition is prompted or primed by another condition, then the connection strength between a pair of conditions is an appropriate index of accessibility. In priming, for example, a single sentence, clause, or word can activate another sentence, clause, or word. In this case, the connection strength between a pair of conditions serves as the predictor. Langston and Trabasso (1999) report a number of successful simulations. Using the connection strength between nodes to predict accessibility of one


node from another, Langston and Trabasso simulated the amount of time taken to read the second sentence of a pair of sentences. The second sentence varied in how many nodes separated it from the first sentence. This variation is termed causal distance in the mental representation. The connection strength between the pair of sentences varied with causal distance and predicted reading time as well as ratings of causal relatedness for pairs of sentences (data from Myers et al., 1987). The connection strength between a pair of conditions also has been found by Langston and Trabasso to predict the number of goal references made during reading in think-out-loud protocols (data from Lutz & Radvansky, 1997; Suh & Trabasso, 1993), the priming of online goal inferences (data from Lutz & Radvansky, 1997; Rizzella & O'Brien, 1996; Suh & Trabasso, 1993), and ratings of causal relatedness for pairs of sentences (data from Myers et al., 1987).

Langston and Trabasso (1999) also simulated accessibility from multiple sources in the text. Here, the average connection strength of a node indexes the node's accessibility from all other nodes in the mental representation. Average connection strength successfully predicted substantial variance in judgments of how well a sentence fits into the context of the story (data from Magliano, Trabasso, & Langston, 1995), importance judgments (data from Trabasso & Sperry, 1985), immediate and delayed recall of sentences (data from Trabasso et al., 1994, 1995), and the recall and coherence of entire stories (data from Trabasso et al., 1994, 1995).

Connection strength between clauses or sentences is directly related to causal distance or directness of causation. This index decreases as the reader has to make more inferences that connect a pair of clauses or sentences or as the text introduces a number of intervening causes between the pair.
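Causal distance, as used here, is the number of clauses that intervene between a pair on the shortest causal path, with 0 marking a direct cause. A minimal sketch of that count using breadth-first search is given below; the function name and the clauses, adapted from the bicycle example, are illustrative assumptions.

```python
from collections import deque


def causal_distance(links, source, target):
    """Number of clauses between `source` and `target` on the shortest
    causal path (0 for a direct cause); None if there is no path."""
    successors = {}
    for cause, effect in links:
        successors.setdefault(cause, []).append(effect)

    queue = deque([(source, 0)])  # (clause, clauses passed since source)
    seen = {source}
    while queue:
        clause, passed = queue.popleft()
        for nxt in successors.get(clause, []):
            if nxt == target:
                return passed
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, passed + 1))
    return None


# Hypothetical causal chain for the bicycle story.
links = [
    ("Jimmy wants bike", "mother refuses"),
    ("mother refuses", "Jimmy wants job"),
    ("Jimmy wants job", "Jimmy delivers groceries"),
    ("Jimmy delivers groceries", "Jimmy earns money"),
    ("Jimmy earns money", "Jimmy buys bike"),
]

direct = causal_distance(links, "mother refuses", "Jimmy wants job")
indirect = causal_distance(links, "mother refuses", "Jimmy earns money")
```

Here the first pair is a direct cause and the second pair is separated by two intervening clauses, so a model of the kind described would be expected to assign the second pair a weaker connection strength.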
Average connection strength is directly related to the number of causal connections that one clause or sentence has to other clauses or sentences. Thus, the connectionist model provides a numerical index of the properties of the causal network derived from a discourse analysis of a narrative. The success of the simulations supports the assumption that readers make the inferences identified through the a priori discourse analysis of the text.

Decision Making

Since the publication of the Langston and Trabasso (1999) chapter, the range of simulations has been expanded. Simulations now cover decision making, including counterfactual reasoning (data of Kahneman & Tversky, 1982), where readers must access alternative conditions (Trabasso & Bartolone, in press), and hindsight bias (Trabasso & Wiley, 2002; data from Wasserman, Lempert, & Hastie, 1991), where readers access conditions to predict an outcome (Fischhoff, 1975; Hawkins & Hastie, 1990).

Hindsight bias is a situation where memory updating is important. In hindsight bias, knowledge of the outcome of an event strongly influences decision-making predictions and judgments. After-the-fact reasoning is not the same as before-the-fact reasoning. In the studies on hindsight bias, readers read a story about a forthcoming battle and were given several conditions that favored one side or the other. In one condition, the readers were asked to predict an event without knowledge of the outcome (e.g., a battle between the British and the Gurkas in 1814). In another condition, readers were told the outcome (e.g., the Gurkas won) but were asked to predict who would have won the battle as if they did not know who won. Knowledge of the outcome is difficult to disregard. The bias is a shift in the judged likelihood of who would win, in the direction of the outcome knowledge. So, for example, knowing that the Gurkas won against the British increases the prediction that they would win.

Trabasso and Wiley (2002) used discourse analysis and connectionist modeling to predict accurately the data reported by Wasserman et al. (1991). Wasserman et al. report data on seven groups of readers. They varied the causal elaboration of outcomes across the groups. Trabasso and Wiley showed that the outcome and its causal elaboration affected the updating of connection strengths of the conditions relevant to the outcome in a systematic manner. In making a prediction in the absence of knowledge of the outcome, readers were assumed to access and weigh conditions that were favorable relative to both favorable and unfavorable conditions. For example, the size of the Gurka force is a favorable condition, whereas the discipline of the British soldiers is an unfavorable condition for the prediction of a Gurka victory. Knowledge of the outcome that the Gurkas won updates and maintains the favorable conditions for the Gurkas and decreases those favorable to the British.
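
The weighing of favorable conditions against all accessed conditions can be sketched as a simple ratio. This is a hypothetical illustration, not the Trabasso and Wiley (2002) simulation: the function name `weighted_proportion`, the ratio form, and all connection strengths are assumptions chosen only to show the direction of the effect.

```python
def weighted_proportion(favorable, unfavorable):
    """Share of total connection strength held by conditions favoring one side.

    Assumption: a prediction is modeled as the summed strength of favorable
    conditions divided by the summed strength of all accessed conditions.
    """
    total = sum(favorable) + sum(unfavorable)
    return sum(favorable) / total


# Hypothetical strengths before the outcome is known: the conditions favoring
# the Gurkas and those favoring the British are equally accessible.
before = weighted_proportion(favorable=[0.5, 0.5], unfavorable=[0.5, 0.5])

# Assumed effect of learning that the Gurkas won: conditions favoring the
# Gurkas are maintained or strengthened, those favoring the British weakened.
after = weighted_proportion(favorable=[0.6, 0.6], unfavorable=[0.4, 0.4])

if __name__ == "__main__":
    print(round(before, 2), round(after, 2))
```

Under these assumed numbers the predicted likelihood of a Gurka victory rises once the outcome is known, which is the direction of the hindsight effect described in the text.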
This updating increases the weighted proportion of favorable conditions for the Gurkas and predicts the changes in predictions after knowledge of the outcome. Likewise, the causal elaboration of outcomes changes the connection strengths of conditions that favor the winner in the outcome relative to those that favor the loser. Thus, when accessing conditions to make a prediction after knowledge of an outcome, the reader is more likely to access conditions that are relevant to the outcome; hence the hindsight bias. The hindsight bias is better understood as a change in accessibility of relevant information as a function of learning than as an unexplained bias.

Minimalism Revisited

McKoon and Ratcliff (1992) took a minimalist position with respect to understanding narratives. They claimed that readers made only local connections and did not make inferences at a distance in the text. They presented three experiments in which they probed for goal or attempt information after a sentence near the end of the story. The probes yielded priming data on accessibility of goals and attempts. Trabasso (2003) obtained their stories for the three experiments, performed causal discourse analyses, obtained the causal networks for each story, and simulated the data. Trabasso used the Langston and Trabasso (1999) model to obtain estimates of accessibility of target clause information following priming by a goal or action in a series of recognition tests following reading. Priming data from all three experiments were very accurately simulated. The success of the simulations questions the minimalist claim that readers do not make causal inferences based on goals at a distance in the text. The simulations also show the value of using both a discourse analysis and a process model to predict behavioral data.

Anaphoric and Spatial Referents

Readers must resolve anaphoric references (who and what) to know which person or object is the focus or topic. Trabasso (2000) simulated accessibility of anaphoric referents where a pronoun had to be disambiguated as to which person is being discussed (Greene, Gerrig, McKoon, & Ratcliff, 1994; Lea, Mason, Albrecht, Birch, & Myers, 1998) and spatial referents in a building (Rinck & Bower, 1995, 2000). Trabasso (2000) predicted the time it took to locate a person as the referent of a pronoun or an object in a room after the reader memorized a map of a laboratory and read a story in which a character negotiates his way through the lab. In both instances, the time to access the referent depended directly on the strength of the connection between the last sentence read and the sentence in which the person or room was last mentioned in the text.
These findings indicate that readers do not search for referents, but rather referents to pronouns or locations are passively activated by the current focus of attention during reading, and the degree of activation as indexed by connection strength determines the ease of access.

CONCLUSION

The studies surveyed in this chapter provide strong support for readers' use of knowledge of human psychological and physical causation to explain why and how characters behave as they do in the circumstances of the story. These explanatory inferences form the basis for learning connections between the explanation and what is being explained. The connections enable the reader to form a mental representation of the text situation in memory. The mental representation has properties analogous to a
causal network, and the nature of the representation affects the reader's ability to access and use what was read to perform a variety of comprehension tasks. This chapter surveyed a number of theoretical and empirical studies that support these ideas and demonstrate the importance of explanatory understanding of narratives.

ACKNOWLEDGMENTS

The writing of this chapter was funded, in part, by a grant from the National Institute of Child Health and Human Development, Grant # HD 38895. The author thanks John Sabatini for reviewing the original and revised drafts and for his constructive suggestions on improving its exposition.

REFERENCES

Bartlett, F. (1932). Remembering: A study in experimental and social psychology. Cambridge, UK: Cambridge University Press.
Berman, R. A., & Slobin, D. I. (Eds.). (1994a). Different ways of relating events in narrative: A cross-linguistic study. Hillsdale, NJ: Lawrence Erlbaum Associates.
Berman, R. A., & Slobin, D. I. (1994b). Narrative structure. In R. A. Berman & D. I. Slobin (Eds.), Relating events in narrative: A cross-linguistic developmental study (pp. 39-84). Hillsdale, NJ: Lawrence Erlbaum Associates.
Binet, A., & Henri, V. (1894). La mémoire des phrases. L'Année Psychologique, 1, 24-59.
Black, J. B., & Bower, G. H. (1980). Story understanding and problem solving. Poetics, 9, 223-250.
Brown, A. L., & Smiley, S. S. (1977). Rating the importance of structural units of prose passages: A problem of metacognitive development. Child Development, 48, 1-8.
Collingwood, R. G. (1938). On the so-called idea of causation. In H. Morris (Ed.), Freedom and responsibility: Readings in philosophy and law (pp. 303-312). Stanford, CA: Stanford University Press.
Cote, N., Goldman, S., & Saul, E. U. (1998). Students making sense of informational text: Relations between processing and representation. Discourse Processes, 25, 1-53.
Dewey, J. (1933). How we think. Oxford, England: Heath.
Dillon, J. T. (1982). The multidisciplinary study of questioning. Journal of Educational Psychology, 74, 147-165.
Emmott, C. (1997). Narrative comprehension. Oxford, UK: Clarendon Press.
Fischer, D. H. (1970). Historian's fallacies: Towards a logic of historical thought. New York: Harper & Row.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1, 288-299.
Goldman, S. R., & Varma, S. (1995). CAPping the construction-integration model of discourse comprehension. In C. A. Weaver, III, S. Mannes, & C. R. Fletcher (Eds.), Discourse comprehension: Essays in honor of Walter Kintsch (pp. 337-358). Hillsdale, NJ: Lawrence Erlbaum Associates.
Graesser, A. C. (1981). Prose comprehension beyond the word. New York: Springer-Verlag.

Graesser, A. C., & Black, J. B. (Eds.). (1985). The psychology of questions. Hillsdale, NJ: Lawrence Erlbaum Associates.
Graesser, A. C., Singer, M., & Trabasso, T. (1994). A constructionist theory of inference generation during narrative text comprehension. Psychological Review, 101, 371-395.
Greene, S. B., Gerrig, R., McKoon, G., & Ratcliff, R. (1994). Unheralded pronouns and the management of common ground. Journal of Memory and Language, 33, 511-526.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hart, H. L. A., & Honore, A. M. (1959). Causation in the law. Oxford, UK: Clarendon Press.
Hawkins, S. A., & Hastie, R. (1990). Hindsight: Biased judgment of past events after the outcomes are known. Psychological Bulletin, 107, 311-327.
Hilton, D. J., Mathews, R. H., & Trabasso, T. (1992). The study of causal explanation in natural language: Analyzing reports of the Challenger disaster in The New York Times. In M. L. McLaughlin, M. J. Cody, & S. J. Read (Eds.), Explaining one's self to others: Reason-giving in a social context (pp. 41-59). Hillsdale, NJ: Lawrence Erlbaum Associates.
Hospers, J. (1953). Introduction to philosophical analysis. Englewood Cliffs, NJ: Prentice Hall.
Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 201-208). New York: Cambridge University Press.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163-182.
Kintsch, W. (1992). How readers construct situation models for stories: The role of syntactic cues and causal inferences. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), Essays in honor of William K. Estes (pp. 261-278). Hillsdale, NJ: Lawrence Erlbaum Associates.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363-394.
Kintsch, W., Welsch, D., Schmalhofer, F., & Zimny, S. (1990). Sentence memory: A theoretical analysis. Journal of Memory and Language, 29, 133-159.
Kucan, L., & Beck, I. (1997). Thinking aloud and reading comprehension research: Inquiry, instruction, and social interaction. Review of Educational Research, 67, 271-299.
Langston, M., & Trabasso, T. (1999). Modeling causal integration and availability of information during comprehension of narrative texts. In H. van Oostendorp & S. Goldman (Eds.), The construction of mental representations during reading (pp. 29-69). Mahwah, NJ: Lawrence Erlbaum Associates.
Langston, M. C., Trabasso, T., & Magliano, J. P. (1998). Modeling on-line comprehension. In A. Ram & K. Moorman (Eds.), Computational models of reading and understanding (pp. 181-225). Cambridge, MA: MIT Press.
Lea, R. B., Mason, R. A., Albrecht, J. E., Birch, S. L., & Myers, J. L. (1998). Who knows about whom: The role of common ground in accessing distant information. Journal of Memory and Language, 39, 70-84.
Lehnert, W. G. (1978). The process of question answering. Hillsdale, NJ: Lawrence Erlbaum Associates.
Lichtenstein, E. H., & Brewer, W. T. (1980). Memory for goal-directed events. Cognitive Psychology, 12, 412-440.
Lutz, M. F., & Radvansky, G. A. (1997). The fate of completed goal information in narrative comprehension. Journal of Memory and Language, 36, 293-310.
Mackie, J. L. (1980). The cement of the universe. Oxford, UK: Clarendon.
Magliano, J. P., & Graesser, A. C. (1991). A three-pronged method for studying inference generation in literary text. Poetics, 20, 193-232.
Magliano, J., Trabasso, T., & Langston, M. (1995, November). Cohesion and coherence in sentence and story understanding. Paper presented at the meetings of the Psychonomic Society, Albuquerque, NM.

Mandler, J. M., & Johnson, N. S. (1977). Remembrance of things parsed: Story structure and recall. Cognitive Psychology, 9, 111-151.
Mayer, M. (1967). A boy, a dog, and a frog. New York: Dial Press.
Mayer, M. (1969). Frog, where are you? New York: Dial Press.
Mayer, M. (1973). Frog on his own. New York: Dial Press.
McClelland, J. L., & Rumelhart, D. E. (1988). Explorations in parallel distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: Bradford Books.
McKoon, G., & Ratcliff, R. (1992). Inference during reading. Psychological Review, 99, 440-466.
Myers, J. L., Shinjo, M., & Duffy, S. A. (1987). Degree of causal relatedness and memory. Journal of Memory and Language, 26, 453-465.
Nicholas, D. W., & Trabasso, T. (1980). Towards a taxonomy of inferences. In F. Wilkening, J. Becker, & T. Trabasso (Eds.), Information integration by children (pp. 243-266). Hillsdale, NJ: Lawrence Erlbaum Associates.
Omanson, R. C. (1982). The relation between centrality and story category variation. Journal of Verbal Learning and Verbal Behavior, 21, 326-337.
Rinck, M., & Bower, G. H. (1995). Anaphora resolution and the focus of attention in situation models. Journal of Memory and Language, 34, 110-131.
Rinck, M., & Bower, G. H. (2000). Temporal and spatial distance in situation models. Memory and Cognition, 28, 1310-1320.
Rizzella, M. L., & O'Brien, E. J. (1996). Accessing global causes during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1208-1218.
Rumelhart, D. E. (1975). Notes on a schema for stories. In D. G. Bobrow & A. Collins (Eds.), Representation and understanding: Studies in cognitive science. New York: Academic Press.
Schank, R. C. (1975). The structure of episodes in memory. In D. G. Bobrow & A. M. Collins (Eds.), Representation and understanding: Studies in cognitive science (pp. 237-272). New York: Academic Press.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals and understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Stein, N. L. (1988). The development of storytelling skill. In M. B. Franklin & S. Barten (Eds.), Child language: A book of readings (pp. 282-297). New York: Cambridge University Press.
Stein, N. L., & Albro, E. R. (1997a). Building complexity and coherence: Children's use of goal-structured knowledge in telling good stories. In M. Bamberg (Ed.), Learning how to narrate: New directions in child development (pp. 5-70). San Francisco: Jossey-Bass.
Stein, N. L., & Albro, E. R. (1997b). The emergence of narrative understanding: Evidence for rapid learning in personally relevant contexts. Contemporary Issues in Education, 60, 83-98.
Stein, N. L., & Glenn, C. G. (1979). An analysis of story comprehension in elementary school children. In R. O. Freedle (Ed.), New directions in discourse processing, Vol. 2 (pp. 53-120). Norwood, NJ: Ablex.
Stein, N. L., & Policastro, M. (1984). The concept of a story: A comparison between children's and teachers' perspectives. In H. Mandl, N. L. Stein, & T. Trabasso (Eds.), Learning and comprehension of text (pp. 113-155). Hillsdale, NJ: Lawrence Erlbaum Associates.
Stein, N. L., & Trabasso, T. (1992). Scientific reasoning and explanatory patterns: The effects of thinking aloud and pictorial representation. Paper presented at American Educational Research Association, San Francisco, CA.
Suh, S., & Trabasso, T. (1993). Inferences during on-line processing: Converging evidence from discourse analysis, talk-aloud protocols, and recognition priming. Journal of Memory and Language, 32, 279-301.
Tapiero, I., & Denhiere, C. (1995). Simulating recall and recognition by using Kintsch's construction-integration model. In C. A. Weaver, III, S. Mannes, & C. R. Fletcher (Eds.), Discourse comprehension: Essays in honor of Walter Kintsch (pp. 211-232). Hillsdale, NJ: Lawrence Erlbaum Associates.

Thorndike, E. L. (1917). Reading as reasoning: A study of mistakes in paragraph reading. Journal of Educational Psychology, 8, 323-332.
Thorndyke, P. W. (1977). Cognitive structures in comprehension and memory of narrative discourse. Cognitive Psychology, 9, 77-110.
Trabasso, T. (1981). On the making and the assessment of inferences during reading. In J. T. Guthrie (Ed.), Comprehension and teaching: Research reviews (pp. 56-76). Newark, DE: International Reading Association.
Trabasso, T. (2000, January). What makes information accessible during or after narrative comprehension? Paper presented at Winter Text Conference, Jackson Hole, WY.
Trabasso, T. (2003, January). Minimalism, maximalism, or whatever is necessary for coherence: Discourse analysis and simulation of McKoon and Ratcliff (1992). Paper presented at Winter Text Conference, Jackson Hole, WY.
Trabasso, T., & Bartolone, J. (2003). Counterfactual thinking. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 904-923.
Trabasso, T., & Bouchard, E. (2000). Teaching children how to comprehend what they read: A review of experimental research on direct instruction of reading comprehension. In Report of the National Reading Panel. Teaching Children to Read: An Evidence-Based Assessment of the Scientific Research Literature on Reading and Its Implications for Reading Instruction (NIH Publication No. 00-4769, pp. 39-118). Washington, DC: U.S. Government Printing Office.
Trabasso, T., & Magliano, J. P. (1996a). Conscious understanding during comprehension. Discourse Processes, 21, 255-287.
Trabasso, T., & Magliano, P. A. (1996b). How do children understand what they read and what can we do to help them? In M. Graves, P. van den Broek, & B. Taylor (Eds.), The first R: A right of all children (pp. 160-188). New York: Teachers College Press.
Trabasso, T., & Nicholas, D. W. (1980). Memory and inferences in the comprehension of narratives. In F. Wilkening, J. Becker, & T. Trabasso (Eds.), Information integration by children (pp. 215-242). Hillsdale, NJ: Lawrence Erlbaum Associates.
Trabasso, T., & Nickels, M. (1992). The development of goal plans of action in the narration of picture stories. Discourse Processes, 15, 249-275.
Trabasso, T., Secco, T., & van den Broek, P. (1984). Causal cohesion and story coherence. In H. Mandl, N. L. Stein, & T. Trabasso (Eds.), Learning and comprehension of text (pp. 83-111). Hillsdale, NJ: Lawrence Erlbaum Associates.
Trabasso, T., & Sperry, L. L. (1985). Causal relatedness and importance of story events. Journal of Memory and Language, 24, 595-611.
Trabasso, T., & Stein, N. L. (1994). Using goal/plan knowledge to merge the past with the present and the future in narrating events on-line. In M. M. Haith, J. B. Benson, R. J. Roberts, Jr., & B. F. Pennington (Eds.), The development of future oriented processes (pp. 85-106). Chicago: University of Chicago Press.
Trabasso, T., Stein, N. L., Rodkin, P. C., Munger, G. P., & Baughn, C. (1992). Knowledge of goals and plans in the on-line narration of events. Cognitive Development, 7, 133-170.
Trabasso, T., & Suh, S. (1993). Understanding text: Achieving explanatory coherence through on-line inferences and mental operations in working memory. Discourse Processes, 16, 3-34.
Trabasso, T., Suh, S., & Payton, P. (1994). Explanatory coherence in communication about narrative understanding of events. In M. A. Gernsbacher & T. Givon (Eds.), Text coherence as a mental entity (pp. 189-214). Amsterdam: John Benjamins.
Trabasso, T., Suh, S., Payton, P., & Jain, R. (1995). Explanatory inferences and other strategies during comprehension and their effect on recall. In R. F. Lorch, Jr. & E. J. O'Brien (Eds.), Sources of coherence in reading (pp. 219-239). Hillsdale, NJ: Lawrence Erlbaum Associates.
Trabasso, T., & van den Broek, P. (1985). Causal thinking and the representation of narrative events. Journal of Memory and Language, 24, 612-630.

Trabasso, T., van den Broek, P., & Liu, L. (1988). A model for generating questions that assess and promote comprehension. Question Exchange, 2, 25-38.
Trabasso, T., van den Broek, P., & Suh, S. (1989). Logical necessity and transitivity of causal relations in stories. Discourse Processes, 12, 1-25.
Trabasso, T., & Wiley, J. (2002, January). Dynamic understanding and updating of memory: Simulation of hindsight bias. Paper presented at the Winter Text Conference, Jackson Hole, WY.
van den Broek, P. W. (1988). The effect of causal relations and goal failure position on the importance of story statements. Journal of Memory and Language, 27, 1-22.
van den Broek, P., & Trabasso, T. (1986). Causal networks versus goal-hierarchies as predictors of importance of story statements. Discourse Processes, 9, 1-15.
van den Broek, P., Tzeng, Y., Risden, K., Trabasso, T., & Basche, P. (2001). Inferential questioning: Effects on comprehension of narrative texts as a function of grade and timing. Journal of Educational Psychology, 93, 521-529.
Warren, W. H., Nicholas, D. W., & Trabasso, T. (1979). Event chains and inferences in understanding narratives. In R. Freedle (Ed.), New directions in discourse processing (Vol. 2, pp. 23-52). Hillsdale, NJ: Lawrence Erlbaum Associates.
Wasserman, D., Lempert, R. O., & Hastie, R. (1991). Hindsight and causality. Personality and Social Psychology Bulletin, 17, 30-35.

6

Theory and Practice of Using Information Text for Beginning Reading Instruction

Michael L. Kamil
Stanford University

Diane Lane
Hilliard, Ohio, Public Schools

Emma Nicolls
Stanford University

In 1968, I had a conversation with Dick Venezky in which he pointed out the overwhelming preponderance of story text in basal readers. At the time I viewed the comment as a curiosity. A similar sentiment was repeated (Venezky, 1982) in print. However, it simply remained an interesting comment until sometime in 1994 when I taught a course called "Trends and Issues in Reading." As part of that course students were required to read an article by Pappas (1993), which reported some of her research on information text. In that article, Pappas reported that there were conditions under which kindergarten students processed information text as easily as stories. Diane Lane, a teacher in a local school district, was a student in that course and became intrigued with including information text in reading instruction. She invited me into her classroom to do research shortly thereafter. We have been collaborators in using information text for reading instruction ever since.

Another line of the work to be described here was done with Emma Nicolls when she was a master's student at Stanford University. Emma read not only the Pappas article but also some of the work that Diane Lane and I had done. By that time, Duke (2000) had published a study in which she estimated the number of minutes (3.6) students could read information text in many classrooms. Emma decided to look at the instruction teachers delivered in support of those 3.6 minutes.

This line of inquiry represents the true tradition of academic research. A conversation held some 25 years earlier continues to inspire research projects that involve new generations of students. The studies in this line of research have been largely descriptive and formative in nature. I spent many hours in Diane Lane's classrooms when I lived in Ohio. When I moved from Ohio, I still returned to visit. What follows is a presentation of some of the theory and the research conducted by Diane Lane, Emma Nicolls, and me. More extensive discussion of the research and issues in each of the studies is available (Kamil & Lane, 1998; Kamil & Nicolls, 2001; Nicolls, 2001).

RESEARCH ON TEXTS USED IN INSTRUCTION

Beginning reading instruction has been focused on the use of story text on the assumption that stories are easier to comprehend because of their predictable structure (e.g., Mandler & Johnson, 1977; Stein & Glenn, 1979). This assumption is so deeply ingrained that almost all of the available programs for beginning reading instruction are based on story text. Although estimates vary, Hoffman et al. (1994) suggest that only 12% of text in basal readers is nonnarrative text. Moss and Newton (1998, 2002) report data in line with that of Hoffman et al. from six basal reading series. They found that the mean percentage of selections devoted to informational literature ranged from 16% to 20% across grade levels. The largest number of pages was devoted to fiction (66%), followed by informational literature (20%), biography (6%), poetry (5%), and plays (3%). The mean percentage of pages devoted to informational literature ranged from 18% to 24% across grade levels. Overall, 20% of the pages at all grade levels were devoted to informational literature.

It is important to think about the task demands of reading in school in light of the preponderance of fiction over information, either by assumption or empirical estimate.
It takes only a cursory examination of any school curriculum to notice that beyond the earliest grades, the primary genre of text to which children are exposed is informational. That is, outside of reading instruction or language arts, students must read and process text that conveys information as its primary function. Some of the researchers who made this point include Venezky (1982), Chambliss and Calfee (1989), Flood and Lapp (1986), Kamil and Lane (1997, 1998), and McKenna and Robinson (1997).

Because information text is so important in later schooling and in the world after school, it is difficult to reconcile the nearly exclusive use of story or literary text in reading instruction. The assumption behind the nearly exclusive use of literary text is that students will find it easier, in beginning reading, to process stories than information. Corollary to this is the assumption that students prefer stories to information text. It is difficult to find the roots of this set of assumptions. Relatively recent research paints a somewhat different picture.

Pappas (1993) has raised questions about the efficacy of stories as opposed to information text for kindergarten children. In her study, kindergarten students showed equal abilities to recall information and stories. They expressed preferences for information, at least under some circumstances. Although Pappas did not have the students read the story, she did find that they could process the text in ways that are contrary to the story-is-always-better assumption.

Educational Testing Service (1995, p. 4) reported evidence that suggests that fourth grade students who read more types of text have higher reading scores than students who read fewer types. In this analysis, students reported whether they read storybooks, magazines, or information books. Students who read all three types of materials had the highest proficiency.

In a series of small-scale studies, my students and I (Kamil, 1994) showed that students checked out a significantly greater number of storybooks, compared with information books, from the school library. However, there was a larger number of information books, compared with storybooks, checked out of a neighborhood library (in the same neighborhood as the school). The librarians reported that there was no apparent difference in the proportions of information and storybooks in the two libraries. In another study, teachers reported they were interested in doing more work with expository material but felt constrained by the curriculum. At the same time, approximately 80% of the teachers used predominantly narrative materials because they felt that expository materials would be too hard for the students.
In yet another line of research, we demonstrated that reading information text was correlated with science achievement, whereas reading story text was not (Bernhardt, Destino, Kamil, & Rodriguez-Munoz, 1995; Kamil & Bernhardt, 2004). This finding suggests that the two types of text call on separable reading skills. Students need to be taught the complete set of skills that would make them proficient at reading all of the texts they encounter. To concentrate only on story text would limit the types of texts students would be able to read later.

At the very least, the pieces of evidence presented in this section, taken together, strongly suggest that the rigid emphasis on story materials in reading instruction has been misguided. What is needed, at least, is an approach where students are taught and learn strategies appropriate for different types of texts.

WHY IS THE FOCUS ON TYPES OF TEXT IMPORTANT?

There is a prevalent belief in educational contexts that reading is reading and that it does not matter what genre of text is used for instruction. According to this belief, the same reading skills are always in use and what is different is the content of the text. The experiences we have had over the past decade or more with students learning to read information texts suggest that the situation is far more complex. The following argument assumes that there are at least two types of reading skills: those that are generic and useful in any reading context and those that are specific to the type of text being read. For example, a generic reading skill might be the ability to decode words to sound. This is a skill that does not change much, if at all, from genre to genre. Decoding words is the same in stories as it is in information text. On the other hand, determining the setting in which a story takes place is a skill that applies only to stories. Exposition does not usually "take place" in a setting; it is typically assumed to be universal. The skill of identifying the setting is not only not useful for information text; it may not even result in meaningful information if applied. Similarly, being able to determine the main idea of an information text is not applicable to story text. Stories have their own grammar, and main idea is not a part of that grammar.

CRITICAL DIFFERENCES BETWEEN STORY AND INFORMATION TEXT

As a basic premise, it is assumed that there are critical differences in the way in which readers must approach different types of text. Further, it is assumed that differences between text types are so important that information about text types must be part of instruction. There are four dimensions on which we believe the reading of different text types differs: truth value, unity, structure, and instrumentality. In the next sections, we discuss each of these in turn.

Truth Value

By truth value we mean the default assumption of whether what is contained in a text is true. Truth value refers to the default assumption about what is in the text, not the determination of whether it actually is true or not. The differences can be seen by thinking about these assumptions in different types of text. For example, a story can be equally good, entertaining, or enjoyable, whether or not it really happened. There is no need to assume that the content of a story is true for it to be enjoyable.

By comparison, information text has the opposite assumption. Information is not useful if it is untrue, with a few exceptions. In general, a reader must make the assumption that the text is true until such time as there is evidence it is not. Otherwise there would be no point in trying to read the information for a further purpose. Think about reading information about repairing a car. If the information is incorrect, it yields inappropriate results.

This distinction has important implications for instruction. If students are provided reading instruction only in stories, they cannot receive instruction in how to evaluate texts along this dimension of truth value because they have no counterexamples. Rather, this is a dimension that is irrelevant in reading stories, as noted previously. Consequently, students may not even consider this dimension important in comprehension when they move to reading expository text in content areas because it was irrelevant in the stories in which they learned to read.

Unity

The second dimension reflects the amount of text one needs to read for appropriate comprehension. In other words, the question is whether the text can be read in parts or whether it has to be read as a whole to be comprehended. Stories generally have to be read completely to be fully appreciated. It is difficult to imagine reading the very end of a story and understanding it without having read the beginning and middle. (Of course, there are minor exceptions, such as detective stories where a reader might skip to the end to find out who committed the crime. However, the reader still must have read earlier parts of the text to make this meaningful.) By comparison, information text is most often read in small bits as the reader searches for relevant information.
Typographical aids are built into information texts to make this possible: tables of contents, indexes, headings, chapters, and the like. For example, in a social studies text, a section might have a heading labeled "Bill of Rights." The student who wants to find out about those rights can look in that section, rather than in another section that might be labeled "Checks and Balances in Government." Comparable text aids are typically not available in stories, although, again, there are exceptions. Stories are sometimes divided into chapters or scenes, but, in general, these divisions have no critical relevance to the story. Rather, they may serve as devices to signal changes in action, but the reader cannot know what the change is without reading the sections in their entirety. These divisions generally do not help the reader to locate specific information in the story. (Of course, older novels did have brief descriptions of the contents of each chapter preceding the chapter itself. This style is only rarely seen in contemporary fiction.)

If students do not learn the strategy of reading only what is necessary in information texts, they may have to work their way through much information that is irrelevant to their purposes and, ultimately, distracting for the task at hand, whatever it may be. It seems almost humorous to think about formulating a question before reading a story and then searching for headings that might help answer it, without reading the remainder of the story.

Structure

A great deal of the research on text has focused on differences in structure of the texts. Early work focused on stories and story grammars. The idea of predicting what will happen in a story becomes a simple task after a student has learned the basic grammar of stories. Stein and Glenn (1979) showed that a story schema could predict comprehension. Although stories may or may not adhere rigidly to a single story grammar, there is far more uniformity in the structure of stories than in the structure of exposition. Researchers and theorists have identified many different structures of expository text. Meyer (1975) suggested five different types: description, collection, comparison, problem-solution, and causation. What is important about this is not the specific structures but that there are so many of them. Thus, identification of structure of text is far more difficult when reading nonnarrative material. Consequently, we believe that students need, at the very least, instructional exposure to different types of text.

Purpose or Instrumentality

Readers have different purposes for reading stories from those that they have for reading information text. There are also differences in what readers can do with the results of reading different types of text. For information text, readers generally select a text that will fulfill a specific need for information.
Selection of text is typically done on the basis of what a reader would need to know or do. For stories, the purpose is more typically a matter of aesthetic enjoyment, not necessarily what a reader would or would not need to do subsequently.

Some of the dimensions we listed interact with each other. Thus, the purpose might determine the amount of text a reader would need to read. There is a critical need to be able to identify, for example, whether a text might be accurate and reliable (high on the truth value dimension) in order to decide whether to read it. All of these considerations have direct and important instructional implications.

It is important to note that the argument presented does not suggest that instruction in story text is irrelevant. What is suggested is that stories should not be the entire focus of reading instruction, as they are in many current programs. The conception of reading skills has strong implications for instruction. Most obviously, the implications are that reading instruction should consist of both generic skills and skills that are unique to each genre. In the next section, a brief description of a study is presented in which these instructional implications were used as the basis for an instructional intervention.

BACKGROUND OF THE STUDY

The study grew out of a concern about the nature of the materials being used for reading instruction. As set forth in the previous argument, there are reading skills that cannot be taught in the context of stories and there are reading skills that cannot be taught in the context of exposition. Consequently, we thought it important to attempt to teach reading with expository materials on an equal footing with narrative materials, beginning at the first grade level.

The study was a 3-year observational/intervention study of three first grade classes and one second grade class with the same teacher. The major intervention was the inclusion of an approximately equal amount of information text in reading instruction. Concomitantly, instruction was provided in strategies that were necessary for students to read information books efficiently. Students were also given parallel instruction and opportunities to write information text as well as story text, again in more nearly equal proportions than is typical.
The instructional program involved teaching students how to recognize different genres of text, how to make use of text features (e.g., indexes and tables of contents) in information text, how to assess information text in critical ways, and how to make use of multiple sources of information. The program stressed writing as much as reading, and all instruction was balanced between information and story text. Free reading and writing were encouraged in both information texts and storybooks.

INSTRUCTIONAL PROCEDURES

Early on, students learn to use different features of expository text that are usually not present in or appropriate for narrative text. Among the first of these lessons is the ability to distinguish between information and story text. (These words are chosen to make it simpler for first grade students to remember.) Lessons in the use of a table of contents are given early in the year. This is taught in conjunction with the skill of learning how much of a text one has to read. Students need to know that they have to read a story in its entirety to understand it fully, whereas an information book almost always can be read in smaller, independent pieces. The students seem to take to this distinction easily and are able to extend it.

This attitude was taught by extensive use of the K-W-L procedure (Ogle, 1986) to guide reading in information text. In this procedure, students are taught to formulate what they Know before they begin reading, generate questions that relate to What they want to find out as they read, and eventually to summarize what they Learned after they read. Students need to be fairly specific about what they want to know and then do reading that is targeted at finding out that information. This version of purpose setting provided opportunities to point out to students that they can learn different parts of the answers from different books. Reinforcement was provided by teacher-directed questions (embedded in an information reading context): "Did that answer your question?" or "Is it a good idea to read two information books?"

Instruction also emphasized other features of expository books that do not appear in narrative books: glossaries, indexes, tables of contents, and so on. The students quickly came to realize, however, that the distinction is not as clear as they might wish. Some storybooks have true information in them; some information in information books is not true. Although this mixed genre problem is difficult conceptually, students did not seem to have difficulty understanding it.

Another instructional principle was that story texts were used after appropriate information texts had been read.
The typical pattern (e.g., Freeman & Person, 1992) has been to use narrative to provide background knowledge for information text. In the current intervention, the attempt was made to read information prior to reading stories. The background knowledge obtained from information text selections often makes the stories more meaningful.

Most of the instruction in reading information was embedded in themed units: farm animals, pets, mammals, rain forest, and so on. This gave students wide latitude to contribute what they know and what they need to find. The products of these projects could be books, large murals, or individualized writing about the topic. Students did research by reading books that allowed them to answer the "W" (What do we want to learn?) questions posed at the outset of the unit. Parents often helped their children in this by taking them to the library outside of school hours, reading the books with them, and even helping with some of the transcribing of compositions.

The instructional intervention was implemented and refined over 3 years in two first grade classes and another that was team taught. There were 75 students who participated for some or all of these 3 years. Because this was a version of a design experiment, there were few consistent, formal measures.

RESULTS AND DISCUSSION

For each of the 3 years, general reading achievement was measured by taking running records in materials that were leveled by Reading Recovery standards. Running records target general reading achievement. Their use in this context is simply to ensure that students did not fall behind in general reading ability.

Students from the 1st year learned to read at a relatively high level. They averaged 9.3 levels passed in Reading Recovery materials. Although this is partially confounded with their native abilities, they performed above teacher expectations independently and showed great motivation. A significant portion of the 2nd-year class qualified for some sort of special reading program. They averaged 7.6 levels passed. Although they did not perform as well as the first class, their improvement was substantial. Students from the 3rd year showed an average of 9.6 levels passed in January. By September of the same year, when they had entered second grade, they averaged 19.0 levels.

Text Genre Results

In the 1st year, students were given opportunities to express preferences for either story or information text at various points during the year. They showed differential preferences for the two genres. Twenty-two students said they preferred reading information text, compared with only 5 who preferred reading stories. When asked which they liked to write, 17 still indicated a preference for exposition, whereas 8 said they would rather write stories. Two were undecided. One of the students indicated that he liked to write stories because he had to read books to find out information to put into his information writing.

In one class, the students read a book called Monkeys in the Jungle, a fictional story about animals and where they live. The only problem that the students found with it was that the author did not use only jungle animals. The story includes a penguin, bear, goats, and the like.
An explanation was offered that in fiction authors can include any animal they want, but that did not satisfy the group. The teacher offered the class the opportunity to change it into a nonfiction book. The students listed different animals in the jungle. Each person was given an animal, and the class wrote their own jungle book. Some examples include "There are toucans in the canopy," "Giraffes live outside the rain forest," and "There are ants on the forest floor." When the students did use animals that didn't live in the jungle, as with the giraffe example, they noted the exclusion. The students had learned that information needs to be more specific than that in fiction and should adhere more closely to reality.

Difficulty of Material

Much of what the students read was well above their grade placement. Despite this, students learned strategies for dealing with complicated information text that, at least as judged by readability, should have been beyond their capabilities. They routinely consulted books and reference works that were at adult level, as measured by Fry readability calculations.

An important strategy was that students would try to locate the neighborhood of the text where the information was located, even if they could not read the passage. They would then ask one of the adults in the room (a teacher, an aide, a researcher, or a parent) to read a relevant section to them. For example, they might look for numbers in a National Geographic Encyclopedia article about whales when they were trying to determine sizes of whales. They knew that the request for reading would more likely be honored if it were targeted at the information they wanted, instead of a broader request.

By the end of February of the 1st year, students were using two books about whales that measured between the sixth and seventh grade level in readability. By the end of April, one of the better readers brought an information sheet about bullfrogs to read. She read it without errors, even though it had a readability level measured at the mid-10th grade level.

Sources of Information

Perhaps the most important instructional feature of this intervention is that students should learn when to use texts for different purposes.
When students in the 1st year were asked where they would get information about something we wanted to know, they responded with a long list of possibilities: movies, books, shows on TV, field trips, parents, and so forth. Students in the 2nd year were asked the same question in the context of the farm unit. They listed a farmer, parents, a farm, and a library. For the students at the beginning of the year, books were not listed as a source of information. By the end of the year, all students reported that they could find information by looking in books.

Another example of the value of reading information text first occurred when the students read Eric Carle's (1988) book A House for Hermit Crab. The importance of the story is bound up with knowing that hermit crabs move from home to home using abandoned shells for houses. Without having learned about hermit crabs first, the students would have had little or no knowledge of the important characteristics used in the story.

Students seemed to pick up the various ways of locating information rapidly. They liked the idea of being able to select from a table of contents in a book only one chapter or a few pages without having to read the entire book. They also clearly began to understand the various uses of other text features for finding information. One interesting extension was when the students questioned the teacher about the list of other books in the same series. One student misidentified it as an index. When the teacher explained it, another student asked whether or not the book she was reading would be in the list of other books in the series. Clearly, the students were not just learning these features; they were thinking about them in sophisticated ways.

CONCLUSION

This study demonstrates that it is not only possible but desirable to teach students at the first grade level about information text genres, features, and uses. Students of widely differing abilities were able to understand, learn, and make use of this information in reading and writing while making normal or above average progress. Moreover, it is possible to conduct early reading instruction in information texts, rather than relying solely on literary forms. Given the text demands of later schooling and life after school, these are important skills that students need to acquire early.

An important caution is needed. We are not advocating the elimination of literary or story text from the curriculum, even for reading instruction. We are advocating a balance that we feel is missing at present. In the present study, approximately 50% of reading instruction in first grade was conducted in the context of expository or information texts. The remainder was accomplished with story or literary forms. Students in the classes we studied read a great deal in literary formats. However, they also read far more information texts than most students ever do in reading instruction.

This last is a crucial point for the perspective we have adopted. Many programs emphasize the combining of literary materials and information. Almost all of them, without fail, suggest that exposition should be used in a secondary role to narrative. However, the emphasis in this intervention is different. It assumes that these genres are sufficiently different and important that reading instruction needs to account for both. It is also assumed that exposition should not be presented as a secondary text genre.

Although it would be wonderful to claim that this instructional intervention accounted for the results, there are many factors that account for the findings. There was a deliberate attempt on the part of the teacher to make literacy authentic and comprehensive. Parents were enlisted to help their children and participate in the classroom. Also, literacy activities were clearly valued in the classroom, and everyone participated, including the adults.

Given the strong rationale for having students learn about reading information, these results were encouraging. The students seemed to do at the very least as well as they would have in the more traditional story-based curriculum. That evidence, informal as it is, suggests that this intervention is worth the investment.

The information text intervention was judged to be effective, but to determine whether its elements could be transplanted, it was important to examine current information text practices of teachers in other classrooms. There was very little guidance, except for the observation that students were generally exposed to no more than 3.6 min of information text reading per day (Duke, 2000).

WHAT HAPPENS IN OTHER CLASSROOMS DURING THE 3.6 MINUTES?

Duke (2000) found that the amount of expository text that first grade students could be exposed to in schools would account for 3.6 min per day of reading. Most of the text that students encounter is story or narrative. What this means is that students simply do not have access to a large amount of information text.
There would be little opportunity to practice whatever was taught about information text. Emma Nicolls became interested in the issue of instructional practices during the 3.6 min per day that students did encounter information text. To address this issue, she undertook an observational study.

THE STUDY

During spring 2001, 36 one-hour observations were carried out in four Northern California first grade classrooms. The four teachers were interviewed, and genre assessments of the four classroom libraries were conducted. The participating teachers chose the time to be observed during their most focused periods of reading instruction. The classrooms were chosen to reflect a range of socioeconomic status (SES) and limited English proficient (LEP) children. A summary of some relevant characteristics of each of the classrooms is given in Table 6.1.

TABLE 6.1
Selected Characteristics of Schools

School   Students   ELL   Teacher Experience (years)   Classroom Environment
P        18         11    4                            Mid-SES, literacy focus, class library with fiction and nonfiction sections
N        15         2     6                            Mid-SES, literacy focus, guided reading, four ability groups
B        14         4     30                           Title I school, lower SES, literacy focus, book corner, guided reading, four ability groups
H        20         17    1 (noncredential)            Lower SES, Open Court program

Note. ELL = English language learners.

Classroom Observations

Two types of observations were made: timed coded data and open notebook narratives. For the timed observations, the focus was on the teaching of specific skills, which were categorized into three broad areas: skills for reading nonfiction, such as how to use an index; skills for reading fiction, such as story grammar; and skills generalizable across all genres, for example, phonics. The definitions of these three categories were expanded and refined as the research progressed. Any instructional activity that could not be classified in one of the three categories was labeled as "Other."

Classroom Library Assessments

The content of classroom libraries was coded using five categories: storybooks; following the definitions used by Duke (2000), narrative-informational, informational-poetic, and informational; and all others. After initial scans of the libraries, a sixth category was created: information/story.

Teacher Interviews

An interview with each participating teacher occurred near the end of the research. Each teacher was asked the same questions so as not to unfairly influence the results. One teacher (Teacher P) was asked an extra question about genre because this was felt to be appropriate given that she placed more emphasis on genre in her teaching. Each interview was tape recorded and transcribed.

FINDINGS

Table 6.2 displays the total proportion of time spent on each of the four categories during the observations in each school. What follows is a brief summary of the observations and interviews for each teacher and class.

TABLE 6.2
Percentages of Instructional Time for Types of Skills (out of 8 hours)

                                       School
Type of Skill Instruction    P     N     B     H     Average
Generalizable                63    67    61    64    64
Fiction                      15    21    29    30    24
Nonfiction                   19     4     5     1     7
Other                         3     8     5     5     5

School P Observations

What is initially striking is the high proportion of time (19%) spent in School P on nonfiction instruction compared with the overall average of 7%. Teacher P personally thought that it was important for students to understand the functions of reading. Principal P had a particular interest in literacy acquisition issues and promoted the implementation of new research findings and an innovative attitude to literacy throughout the school.

The classroom library assessment data showed that Class P had the greatest number of books and by far the greatest number of books classified within the information groupings. In her interview, Teacher P stated that she saw it as part of her role as an educator to have a wide range of books within the classroom so that all students can access something they can enjoy. Class P frequently visited the school library to browse and choose books.

It is also important to examine more closely the nature of the 19% of instructional time spent with information. Teacher P explicated the role of nonfiction and strategies to read it. Teacher P specifically instructed the students about the difference between fiction and nonfiction, the use of nonfiction for answering questions they had and for raising new ones, the fact that it can be read selectively, and the use of the contents page. This level of attention to nonfiction was not observed in any other classroom.

Teacher P also modeled the discovery of information from the Internet. She told them that she needed to find out some facts about the habitat of the luna moth (which had hatched in the classroom that morning) and that she was going to look on the Web site Yahooligans. The information, which she subsequently printed out, was shared with the class.

The generalizable skills observed in Class P were taught predominantly in storybooks. So, for example, when children were observed reading and were helped by Teacher P to sound out a word, they were almost always reading from a fiction text.

School N Observations

Classroom N has the highest proportion of "Other" out of the four sites (8%). This time was not spent dealing with either behavior management issues or outside interruption but rather on transitions between their literacy stations; four stations necessitated three changes in the 1.5-hour reading period.

Class N spent the most time on generalizable skills (67%). Students reading to the teacher and their guided reading group, or reading to a peer, accounted for the majority of this activity. Teacher N stated that she thought fluency and confidence were very important to successful reading. The majority of time students spent with the teacher during the observations was spent reading aloud, reading to a partner, or reading silently (the latter was less common). As she said in her interview, all the children were reading successfully, so the decoding skills were less of a focus.

As at School P, generalizable skills were taught using story text. This was reflected in the fiction-related skills, where implicit comprehension was the dominant focus. Teacher N also frequently asked the students questions about the text as they were reading it: "What do you think that means?" and "How does that make him feel?"
Teacher N made a few references to more explicit comprehension strategies, such as pointing out to students on three separate occasions that "good readers go back and reread what they do not understand." A large amount of the 4% of time spent in nonfiction skills instruction was in a single session where students were required to read preprinted workbooks detailing the development of a larva into a butterfly. It was also observed that Teacher N discussed with the students that they must report on exactly what they see, as opposed to writing about the way they felt: "We can't say caterpillars are nice, but we can say things like caterpillars crawl." This activity illustrated to the students that information should be recorded differently from story. However, the discussion was short-lived. Other informational reading instruction was about content.

When an information book was used in Classroom N, it was more often an information/story-type text, and any skills instruction surrounding it was about the story genre.

School B Observations

Teacher B said that she was especially interested in elementary literacy acquisition and that this had been the topic of her master's thesis. Within this area of interest, she was particularly concerned about children's literature. Her focus was predominantly on story text during the observed teaching of reading. However, children in Class B had a much wider range of book choices than students in other classes. Teacher B had the second highest number of information-type texts in her classroom library. She stated in her interview that she recognized the value of nonfiction texts as part of providing diverse text choices, as well as encouraging reluctant readers.

Most of the total instructional time was spent in guided reading sessions with four ability groups. However, less time was spent on transitions between activities in Class B because continuous activities were occurring while the guided reading was being completed with each group.

As with the other sites, generalizable skills were overwhelmingly taught in story text. Teacher B took an active interest in providing the children with quality story texts, poems, or plays as opposed to what she perceived as the dry and often boring basal reader selections. Guided reading was carried out with carefully chosen texts (matched to students' reading levels), and students were free to take home a large array of classroom library books.

Due to the wider variety of reading abilities within this classroom, Teacher B spent more time than Teacher N teaching generalizable skills, such as phonemic awareness skills, phonics rules, naming of words and letters, and blending. So less of the 61% spent on generalizable skills was spent on fluency, although with the higher ability groups more time was spent actually listening to the students read.
When Teacher B spent time on implicit comprehension skills, she spent much of it discussing whether and why students liked or disliked the text, rather than on strategies for comprehending. The vast majority of the time on nonfiction skills was spent discussing issues of content, rather than specific nonfiction reading strategies. This usually occurred when Teacher B asked the students to reread or discuss what they had read during their morning circle time.

At least two thirds of the books being read during free reading time were nonfiction. The evidence from this and the other classes suggests that, when students have more access to nonfiction, their personal book choices are more balanced between fiction and nonfiction.

School H Observations

Reading instruction in Class H happened both before and after recess in the morning. Generally, instruction on generalizable skills would occur before recess. All the generalizable skills were taught following the Open Court teaching manual. Teacher H held the manual in her hand throughout most of the instruction time. The time spent (1%) teaching nonfiction-related skills was observed on four occasions when Teacher H told individual students to look up a word in the picture dictionary that children had under their desks. Apart from this, all the reading instruction occurred in fiction texts.

When oral reading occurred, students would read to the whole class and the teacher. Due to the variety of English language ability in Class H, this system often proved problematic. Teacher H spent much of the 5% of time classified as "Other" in behavior management. Class H was often restless and distracted, perhaps because of different levels of English language ability.

Class H had the largest proportion of instruction in fiction-related skills, albeit only slightly more than that of Class B. The importance of comprehension strategy teaching was illustrated on a number of occasions in Class H. In comparison with the other three sites, it seemed more likely that students would not understand what they were reading. For example, even after reading aloud fluently to the class, some students could not answer simple questions about what they had just read.
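The figures reported in Table 6.2 can be checked with simple arithmetic. The following is a minimal sketch, not part of the original study: it assumes the percentages as printed in the table, and the names `TABLE_6_2`, `averages`, and `school_totals` are hypothetical labels chosen for illustration. It recomputes the "Average" column as the unweighted mean across the four schools and confirms that each school's four categories account for all of the observed time.

```python
# Percentages of instructional time, transcribed from Table 6.2.
# The dictionary layout and all names here are illustrative assumptions.
TABLE_6_2 = {
    "Generalizable": {"P": 63, "N": 67, "B": 61, "H": 64},
    "Fiction": {"P": 15, "N": 21, "B": 29, "H": 30},
    "Nonfiction": {"P": 19, "N": 4, "B": 5, "H": 1},
    "Other": {"P": 3, "N": 8, "B": 5, "H": 5},
}


def averages(table):
    """Unweighted mean across schools, rounded to a whole percent."""
    return {
        skill: round(sum(by_school.values()) / len(by_school))
        for skill, by_school in table.items()
    }


def school_totals(table):
    """Sum of the category percentages for each school."""
    totals = {}
    for by_school in table.values():
        for school, pct in by_school.items():
            totals[school] = totals.get(school, 0) + pct
    return totals


if __name__ == "__main__":
    print(averages(TABLE_6_2))
    print(school_totals(TABLE_6_2))
```

Under these assumptions, the recomputed means agree with the printed averages (64, 24, 7, and 5), and each school's column sums to 100, so the 1% to 19% range for nonfiction discussed in the text is read directly from the table's nonfiction row.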

CONCLUSIONS AND IMPLICATIONS

The first and most obvious conclusion from these observations is that there is much variability in the use of information texts for reading instruction. The range of time spent with information text (1% to 19%) is relatively large. Except for a single classroom, the amount of time spent with information text does not reach parity with stories. The results also show that skills specific to information text are not part of primary reading instruction. These observations suggest that early reading instruction does not prepare children for effective reading of information text.

A second finding is that even when nonfiction text was used during reading instruction, the instructional focus was on the content rather than on skills for reading nonfiction. What happened was a type of instructional compartmentalization with regard to genre. Story texts are used for most reading instruction. When information text was used, there was no focus on generalizable skills instruction. Comprehension strategies taught were more appropriate to story genre (e.g., encouraging prediction and imaginative responses). Learning only story-based comprehension strategies could be a serious problem for effective reading of information text.

Perhaps the most striking contrast in the observations was that there was little or no instructional overlap with the principles on which our original intervention was based. That is, most teachers did not teach about information text structure or how to read information text.

WHAT CAN (AND SHOULD) BE DONE ABOUT INFORMATION TEXT?

On the basis of these findings and those described in the first part of this chapter, it can be argued that there is evidence for including more nonfiction in early reading instruction. However, these data also suggest that simply including more nonfiction would not be sufficient. Teachers must focus on teaching how and why to read information texts in addition to the content. To prepare teachers to do so will require a concerted effort in professional development to raise awareness of the need to teach genre-specific skills.

It is important to provide students with the reading skills they need for success in school and in the workplace. Given the overwhelming preponderance of information text in those settings, students must be prepared to read those texts. We have an idea of what teachers do in reading instruction when it comes to information texts. We have a reasonable model of how to teach students to read information texts. What we need now is a way to raise the collective consciousness of the profession about the need to teach these critical reading skills. A spate of research, published books, and emphasis in state standards about teaching information text shows that some change is beginning to occur. We need to continue to push for instruction in information text to ensure that students are prepared to confront the reading that awaits them in school, in the workplace, and everywhere else.

REFERENCES

Bernhardt, E., Destino, T., Kamil, M., & Rodriguez-Munoz, M. (1995). Assessing science knowledge in an English/Spanish bilingual elementary school. Cognosos, 4(1), 4-6.
Carle, E. (1988). A house for hermit crab. New York: Simon & Schuster.
Chambliss, M. J., & Calfee, R. C. (1989). Designing science textbooks to enhance student understanding. Educational Psychologist, 24, 307-322.
Duke, N. K. (2000). 3.6 minutes per day: The scarcity of informational texts in first grade. Reading Research Quarterly, 35, 202-224.
Educational Testing Service. (1995). A synthesis of data from NAEP's 1992 Integrated Reading Performance Record at Grade 4. Washington, DC: Office of Educational Research and Improvement, U.S. Department of Education.
Flood, J., & Lapp, D. (1986). Types of texts: The match between what students read in basals and what they encounter in tests. Reading Research Quarterly, 21, 284-297.
Freeman, E., & Person, D. (1992). Using nonfiction trade books in the elementary classroom: From ants to zeppelins. Urbana, IL: National Council of Teachers of English.
Hoffman, J. V., McCarthy, S. J., Abbott, J., Christian, C., Corman, L., Curry, C., Dressman, M., Elliott, B., Matherne, D., & Stahle, D. (1994). So what's new in the new basals? A focus on first grade. Journal of Reading Behavior, 26, 47-73.
Kamil, M. L. (1994, April). Matches between reading instruction and reading task demands. Paper presented at the American Educational Research Association Conference, New Orleans, LA.
Kamil, M., & Bernhardt, E. (2004). The science of reading and the reading of science: Successes, failures, and promises in the search for prerequisite reading skills for science. In E. W. Saul (Ed.), Crossing borders in literacy and science instruction: Perspectives on theory and practice (pp. 123-139). Newark, DE: International Reading Association.
Kamil, M., & Lane, D. (1997, December). Using information text for first grade reading instruction: Theory and practice. Paper presented at the National Reading Conference, Scottsdale, AZ.
Kamil, M. L., & Lane, D. (1998, December). Information text, task demands for students, and readability of text on the Internet. Paper presented at the National Reading Conference, Austin, TX.
Mandler, J., & Johnson, N. (1977). Remembrance of things parsed: Story structure and recall. Cognitive Psychology, 9, 111-151.
McKenna, M., & Robinson, R. (1997). Teaching through text: A content literacy approach to content area reading. New York: Longman.
Meyer, B. J. F. (1975). The organization of prose and its effects on memory. Amsterdam: North-Holland.
Moss, B., & Newton, E. (1998, December). An examination of the informational text genre in recent basal readers. Paper presented at the National Reading Conference, Austin, TX.
Moss, B., & Newton, E. (2002). An examination of the informational text genre in basal readers. Reading Psychology, 23(1), 1-13.
Nicolls, E. (2001). Instructional implementation of informational literacy in first grade. Unpublished master's thesis, Stanford University, Stanford, CA.
Nicolls, E., & Kamil, M. L. (2001, December). Instruction implementation of information text. Paper presented at the National Reading Conference, San Antonio, TX.
Ogle, D. (1986). K-W-L: A teaching model that develops active reading of expository text. Reading Teacher, 39, 564-570.
Pappas, C. (1993). Is narrative "primary"? Some insights from kindergarteners' pretend readings of stories and information books. Journal of Reading Behavior, 25, 97-129.
Stein, N. L., & Glenn, C. G. (1979). An analysis of story comprehension in elementary school children. In R. O. Freedle (Ed.), New directions in discourse processing, Vol. 2 (pp. 53-120). Norwood, NJ: Ablex.
Venezky, R. (1982). The origins of the present-day chasm between adult literacy needs and school literacy instruction. Visible Language, 16, 113-126.

7 Research and Theory Informing Instruction in Adult Literacy

Banu Oney
University of Illinois at Chicago

Aydın Yücesan Durgunoğlu
University of Minnesota, Duluth

By the summer of 2003, a pilot project that had started with fewer than 100 participants and 5 teachers in 1995 in Istanbul had expanded to become a model, evidence-based adult literacy program with well-established assessment, evaluation, and staff development components and had reached a diverse group of 35,000 participants from 17 provinces in Turkey. In this chapter, we describe the development and implementation of the Functional Adult Literacy Program (FALP) with special emphasis on how research and theory in literacy acquisition informed and shaped the nature of the program and how lessons learned through the implementation of the program, in turn, shaped our conceptualization of the model for adult literacy acquisition.

Research and theory have had limited impact on instructional decisions in adult literacy programs. More often, traditional solutions, opinions, and trial-and-error rather than evidence-based models of literacy acquisition have guided educational practice. In most cases, it is difficult to identify the theory of adult literacy acquisition that programs are based on. There is another side of the coin as well: Practitioner knowledge and lessons learned through practice rarely have had an impact on the development of theory and research in adult literacy education.

Unlike children, adults are expected to acquire complex literacy skills over a short time. They also do not have the advantage of a school environment that allows for a high level of interaction with print. In fact, adults usually return to settings in which the practice of literacy skills is rarely encouraged. Therefore, adult literacy programs have to rely on intensive, effective, evidence-based instructional strategies that are capable of developing and sustaining literacy.

Through the development, implementation, and revision cycles of our FALP system, we have tried to bridge this gap between the components of the adult education process and to build an effective program based on a solid model of adult literacy acquisition informed by literacy research and theory. This program has been modified and revised through feedback from program evaluation and practitioner knowledge, both of which have expanded our understanding of the processes underlying adult literacy acquisition.

THE FRAMEWORK: A MULTIDIMENSIONAL MODEL OF LITERACY ACQUISITION

As opposed to adopting a strictly cognitive or social-anthropological perspective, we developed a multidimensional model of literacy acquisition that assumes that the affective, social, cultural, and instructional circumstances of an individual should be analyzed along with cognitive processes, such as decoding, comprehension, and vocabulary development, in understanding literacy acquisition. In this model, the interactions of the cognitive-linguistic component with the affective component are situated within the specific social, cultural, and instructional contexts of individuals. This model is culled from studies with children and adults, ours (Durgunoglu, 2000; Durgunoglu & Oney, 1999, 2002; Oney & Durgunoglu, 1997) and others' (Adams, 1990; Juel, Griffith, & Gough, 1986; Langer, Bartolome, & Vasquez, 1990; Lomax & McGee, 1987; Snow, Barnes, Chandler, Hemphill, & Goodman, 1991; Tunmer, Herriman, & Nesdale, 1988), which have focused on different subparts of the model.

FIG. 7.1. The multidimensional model of literacy acquisition.

The Cognitive-Linguistic Component

The cognitive-linguistic component of the model consists of facilitators, building blocks, and outcomes of reading acquisition.

Outcomes: Reading and Writing. The outcomes of literacy acquisition are fluent reading with comprehension and writing proficiency. Neither of these processes is simple and unidimensional. Besides understanding of text, reading comprehension also consists of responding to and learning from text and using reading and writing for daily practices. Likewise, writing proficiency includes the mechanics of writing and expressing thoughts coherently and appropriately for a given audience.

Building Blocks: Listening Comprehension and Decoding. Listening and reading both require comprehension of the language. However, skills required for reading comprehension are not limited to understanding the semantic and syntactic aspects of conversational language. Especially relevant to literacy acquisition is the ability to comprehend decontextualized academic language. Still other important dimensions of listening comprehension are vocabulary and background knowledge, which grow through a learner's experiences with oral and written languages within a culture. It must be noted that, in this model, listening comprehension rather than productive fluency is included as a building block.

In reading, phonological information has to be extracted from print using orthographic decoding routines.
Quick and effortless recognition of words is an integral component of fluent reading, and unskilled decoding is regularly associated with poor comprehension. When the individual words of a text are read inaccurately or too slowly, comprehension suffers because integrative processes are disturbed. Likewise, when spelling is laborious, it interferes with the quality of writing.

Facilitators: Metalinguistic Awareness and Skills. Before a beginning reader can progress to the analytic stage and begin to systematically use the correspondences between graphemes and phonemes, several developments need to occur. These insights can be grouped under the metalinguistic skills of phonological awareness, functional awareness, and syntactic awareness. They facilitate the building blocks as well as each other. These insights develop as a result of literacy experiences both at home and at school. Another metalinguistic insight, one tied to comprehension rather than to decoding, is metacognitive skill.

Before beginning readers can understand how orthography represents spoken language, they need to be aware of the relevant units in spoken language, such as words, syllables, onset-rimes, and phonemes. Phonological awareness is highly correlated with word recognition and spelling, and it is an important factor for adult beginning readers as well as children. Studies in English and in other languages suggest that adults who cannot read have difficulty in manipulating phonemes.

Syntactic awareness refers to the beginning readers' ability to reflect on the internal grammatical structure of sentences. Even though unable to articulate a relevant rule, a beginning reader may still be aware of the regularities in a language. Syntactic awareness can affect decoding and listening comprehension. It can enable readers to monitor ongoing comprehension and notice when a word does not fit the ongoing representation of the text. It can also influence reading by enhancing or verifying the incomplete visual and phonological information that an inexperienced reader has extracted from text. Also, syntactic awareness as measured by morphological knowledge predicts spelling performance.

Functional awareness refers to the beginning readers' developing notions about the functions and conventions of written language. Through interactions with written language, readers develop concepts about print. This awareness also includes an understanding of when and why print is used. For adults who are used to navigating through society mostly with the help of spoken language, relying on print is a new challenge.
Through interactions with print, readers also develop sensitivity to the structural relations present in text, which in turn aids comprehension.

Finally, metacognition involves awareness of what skills, strategies, and resources are needed to perform and monitor a task effectively. These skills are affected by domain knowledge. Prior knowledge and experience in a content area facilitate the use of metacognitive strategies. Although control strategies can also apply at the word level (e.g., figuring out the meaning of an unfamiliar word), most of the interest in metacognition is at the comprehension level for both listening and reading. Typically, poor readers adopt decoding rather than comprehension goals during reading, and they are less accurate in monitoring comprehension failures.

The Affective Component

As Comings, Parrella, and Soricone (2000) summarized, a key difference between adult and child literacy education is that adults have a choice in attending classes. Therefore, adults' goals, needs, and expectations affect their attendance, persistence, and effort level and, through those, the outcomes. In our previous work, while we were looking at the cognitive components of adult literacy development, we were struck by the strong influence of the affective components that determine whether an individual will persevere or drop out and how much effort they will put into their literacy development. Informal observations and feedback from instructors and participants made us notice that a sense of self-efficacy, optimism (as can be defined by their explanatory style, that is, how they explain the causes of bad events in their lives), and clear goals were traits of individuals who stayed in the program (Durgunoglu, 2000). In addition, participating in adult literacy classes produces not only cognitive but also affective outcomes, such as increased self-confidence, independence, and awareness of the world (Durgunoglu, 2000, 2003).

In a recent study, Skilton-Sylvester and Carlo (1998) interviewed 100 English as a second language (ESL) students to understand their goals and expectations about adult literacy classes. Several themes emerged from that research. The participants were hoping to communicate better with both the native speakers in their communities and their own children, who were rapidly becoming monolingual English speakers. They also reported the need to become independent and take care of their business without relying on others' help. Another reason given was to be able to help children. Those latter two goals were also the ones reported most often by our FALP participants in Turkey (Durgunoglu, 2000). Finally, adults also discussed the more functional goals of attending classes for educational and economic improvement rather than just to learn survival skills in English. Given anecdotal evidence, we can expect that programs addressing the needs of the participants are likely to lead to better attendance, more effort, and persistence.
Another factor that affects participants' expectations is previous schooling. Individuals who have had positive experiences at school have a more solid literacy base. In addition, they have experienced success and have higher motivation levels (Comings et al., 2000). The academic self-concept and expectations develop very early in childhood. Children with positive academic self-concepts perform literacy tasks much better than those children with negative academic self-concepts (Chapman, Tunmer, & Prochnow, 2000).

The Instructional and Social Context

In our model, the cognitive and affective components of literacy development evolve within the social contexts of family, community, and culture. Furthermore, the specific educational orientations of the instructional programs guide the nature of literacy acquisition. The social and instructional factors are included in our model as the contexts of literacy development. Hence the constructs of the model in Fig. 7.1 are embedded within these contexts of development.

The Social Context for Literacy Development. The study of literacy has become more sensitive to cultural, linguistic, gender-related, and ethnic factors. Focusing on power relations among groups, Cummins (1994) proposes that these are reflected not only at the macrolevel in the society but also at the microlevel as in the interactions of teachers and students. Literacy is interlaced with numerous symbol systems and located within social contexts of differential power distribution and identity. The meanings assigned to literacy by social and cultural groups determine the nature of individuals' attempts to acquire and use literacy. For example, Street (1984) and Wagner (1993), focusing on Quranic study, showed how literacy practices carried meaning primarily through their embeddedness in specific cultural values and orientations. Adult literacy programs are effective to the extent that the models on which they are founded allow them to be sensitive to the social and cultural nuances around literacy in particular cultures.

Recently we broadened how our framework is conceptualized to include social contexts and consequences. In our model of literacy development, we acknowledge that literacy develops within the social context of family, peers, community, and culture and that these factors critically shape the nature of development. Social contexts determine the value and use of literacy for a participant (Durgunoglu & Verhoeven, 1998). These include not only practical matters, such as the possibility of a better job and furthering one's education, but also self-identity and independence. The sociocultural contexts shape the learners' attitudes and expectations about literacy.
Adult literacy programs often fail because of a lack of attention to important social factors that eventually lead to student dropout from programs. The dropout rate from adult basic education classes is reported to be 50% in the National Evaluation of Adult Education Programs (Young, Fleischman, Fitzgerald, & Morgan, 1994). Although some factors such as the desire for a higher income are positive forces that support persistence in adult education, others, such as lack of free time or the lack of child care, are negative forces that act as barriers to adult literacy participation and persistence. In a study on student persistence among adult basic education students, Comings et al. (2000) identified the positive and negative forces on student persistence. The strongest positive force they identified was the support of friends, teachers, and fellow students.

In our work we also found that adult literacy participants experience various obstacles as well as supports for adult literacy participation. Many successful adult literacy participants had spouses and children who were supportive of their education (Durgunoglu, 2000). They also reported various barriers, such as having young children, unsupportive family members, and inflexible jobs, that often were negatively affecting success in literacy acquisition.

The Instructional Context for Literacy Development. Continuing with our theme of complementary frameworks of cognitive and affective factors, the instructional context can also be analyzed in terms of cognitive content and affective atmosphere. The cognitive-instructional context refers to the instructional practices and strategies, whereas the affective-instructional context refers to the affective climate created in the classroom.

Literacy education is offered by a wide variety of agencies. The federal government, state governments, nongovernmental organizations, and private industry all commit funding to adult literacy issues in settings as diverse as local schools, libraries, correctional facilities, and churches. Local communities may also initiate their own programs. Given such variability among programs, the type and quality of instruction as well as the nature of the participants' experiences are determined by the funding sources of the programs.

Any instructional effort has its own assumptions about the nature of the skill to be learned, the potentials of the learners, and the best way to implement it for optimal learning. A program will take one shape if literacy is assumed to be a process of acquiring decoding skills, and it will take a different shape if it is assumed that literacy is a lifelong, ongoing process of acquiring the meanings and functions of language.
Across programs, curricula for adult literacy range from those carefully defined in terms of what is to be learned, what materials are to be used, and how teachers and participants are to progress through the materials to those where the curriculum is grounded in learners' own interests and objectives. In the latter case, the belief is that learners best extend their literate abilities by using reading and writing for authentic purposes, rather than building them up from direct tuition in component processes (Soifer et al., 1990). In these programs, students may enter and exit as they choose, and they generally choose their own goals and content interests (Venezky, Bristow, & Sabatini, 1994). However, Wagner and Venezky (1999) question whether this approach is effective for either the adult participants or the overall outcomes of adult literacy programs.

The affective and social-instructional contexts also determine the nature of interactions and consequences in adult literacy programs. The affective and social-instructional contexts include educators' ideas about the learners, including their potential for learning, their ways of thinking as well as their background knowledge of the world, the subject matter, and values about what is to be learned. Also included are the educators' strategies and skills as teachers and the engagement of the learners. In short, the affective-instructional context determines how the literacy program unfolds on a daily basis. This context determines how accepted and comfortable participants feel in a given program, which in turn affects their motivation, persistence, and desire to learn. Most adult education participants are unsure about their cognitive capabilities; therefore, support from the teacher as well as the fellow participants is quite important. Of course, here it is the perceived climate of the classroom reported by the participant that is of interest.

Technical reports by Sherman, Tibbetts, Woodruff, and Weidler (1999) and Condelli (1996) list indicators of instructor competence and program quality in U.S. adult education programs (but they do not report outcome measures). Among the indicators they highlight are learning in a comfortable, safe environment; learning from peers; relating new learning to previous experiences; and applying theory/information to practical situations in their lives. They also note the following teacher quality indicators: keeping up with current knowledge, communicating with colleagues and learners, and working positively and nonjudgmentally with diverse populations.

THE FUNCTIONAL ADULT LITERACY PROGRAM

In 1995, at the invitation of the Turkish Ministry of Education, and with support from the Mother-Child Education Foundation, we developed the Functional Adult Literacy Program in Turkey. FALP is an intensive 3-month, 120-hour program that provides basic literacy skills to adults. FALP materials consist of three textbooks: participants' textbook, instructor's guide, and a theoretical guide to literacy for teachers (Durgunoglu et al., 2002).

The Participants

The target population for the FALP has been different from that in adult literacy programs in the United States or other Western countries. Adult literacy participants in the United States typically need to correct learning problems not resolved in their schooling or they do not have English as their home language (Auerbach, 1996). FALP, on the other hand, has targeted women who have very little or no schooling and who have Turkish as their home language. (This year, we just started working with participants who have a different home language.)

FALP participants are primarily women between the ages of 15 and 65 years. The majority of them are migrants from rural areas, currently living in urban settings. These women have had little or no schooling, not because of cognitive deficits but because they were not allowed to go to school or had married early and had children. In interviews and classroom discussions, they express their feelings about being marginalized and oppressed because of their lack of literacy. Most report feeling inadequate and worthless, and others express fears of being ridiculed by others because of their lack of literacy. Most of these women report being confined to their home because they feel limited in negotiating with the demands of a literate world and unable to perform tasks such as taking a bus, conducting business in a bank, or helping their children with schoolwork.

At the beginning of FALP courses, participants either cannot recognize letters and words or they have very limited knowledge of both. They also have a limited and narrow conceptualization of literacy and assume that literacy is being able to decode the symbols on the page. The overwhelming majority of FALP participants come to literacy classes in hopes that literacy acquisition will help them with basic life skills, such as being able to take a bus or finding one's way in a hospital. Participants usually report a strong social network with other women, mostly relatives and neighbors, often coming to literacy classes together. Unlike the pattern in most adult literacy programs (Gillette, 1987), the dropout rates for FALP participants are modest (about 25-30%) compared with almost 50% in some programs in the United States (Wagner & Venezky, 1999).

Teachers and Teacher Training

FALP relies entirely on volunteer instructors. We use the slogan "with an amateur spirit but professionally."
The instructors give their time and effort with no material gain, sometimes spending their own money to cover copying and transportation expenses. When selecting volunteer instructors, we pay specific attention to their communication skills, making sure that they will be able to interact respectfully and warmly with a group of learners who may be unsure about their cognitive abilities and chances for success. Instructors have at least a high school education, but they do not need to have prior teaching experience. They undergo an intensive, 3-week professional development program, which introduces them to FALP's philosophy, principles, and curriculum. The professional development program includes not only the technical aspects of literacy instruction but also interpersonal communication skills. We encourage the teacher candidates to get to know the goals, needs, and support systems of the participants in their classes and respect their existing knowledge and experience.

Each instructor is assigned a program liaison/consultant when they begin to teach. These consultants observe instructors, provide feedback, and establish a network among a group of instructors by coordinating information flow.

Program Philosophy

Although adult literacy programs intend to strengthen the ability to read, write, and calculate, they need to be embedded in a coherent philosophical and instructional approach to human learning and development. A general goal of FALP is to introduce the many dimensions of literacy and to help the participants use literacy to empower themselves (Freire, 1970). Above and beyond any instructional concerns, we try to offer a program built on a solid foundation of respect for the individual as an intelligent adult. Most adult literacy programs in Turkey have a deficit model of illiteracy. According to this view, literacy instruction is expected to "fix" some basic deficit by crossing a decoding threshold. FALP teacher-training sessions as well as the teacher guide have a strong focus on reversing this approach, establishing a respectful, trusting climate in the classroom, and viewing literacy development as a continuous process.

FALP's curriculum clearly sets specific learning goals and supplies the materials necessary to reach those goals. The aim is to provide a structured framework for instruction and evaluation that can be used by volunteer instructors even without prior teaching experience. However, the program is continuously revised in light of teacher and participant input, and the instructors are urged to bring in or develop materials consistent with their understanding of the potentials, goals, and interests of the participants. Participants are encouraged to engage in personal literacy activities reflecting their interests and needs. Thus, FALP invites teachers and participants to make literacy more relevant to their own social reality.
FALP uses various approaches to literacy instruction, including those based on reciprocal facilitation of skills through group practices. In small group activities, participants learn to support and enrich each other's educational experience. The aim is to build a cooperative learning environment where participants share responsibility for learning and supporting each other. A Typical Lesson To describe how the theory informed the program development, we next describe a typical lesson of FALP and discuss how the implementation in the classroom reflects the components of the model we just described. Course participants use a textbook containing 25 units (Durgunoglu et al.,


2002). The book is currently in its third edition, and every edition has been revised in light of the feedback from the instructors as well as the participants. The teachers have a detailed annotated edition of the textbook. Each textbook unit has a short passage and its accompanying picture describing the same family in different situations, such as taking a bus, going to the hospital, and dealing with problems among family members. The textbook also has decoding exercises, applications (e.g., filling out a form), informational materials (e.g., first aid), and additional reading passages of poetry and stories. Finally, there are also math problems. A typical lesson in FALP has multiple components covered in every class in the particular order they are presented here. Some components may be more central than others as participants acquire particular skills during the course. Reviewing the Previous Day's Homework. There are several reasons for making homework assignments an integral part of FALP. The first reason is affective: It helps develop a sense of identity as a student and understanding of the school discipline and expectations. This is important for the participants because some have never attended school before. Another affective reason is to motivate the participants and enable them to justify their efforts and motivation both to themselves and to their family members. The homework evaluation includes teachers signing or putting a star by tasks that are well done. This is highly valued by the participants and is a strong motivator. Although we encourage them to work on their own at home, many participants ask for help from family members, thus homework also facilitates home support. For many women, homework and its appreciation by the teacher is validation of the hard work that they do in class. During homework check, teachers inquire about the participants who are absent and ensure that those individuals get the materials from a classmate that night. 
This builds a community and encourages the participants to watch out for each other and get to know each other. There are also cognitive reasons for including homework assignments. To automatize certain skills—especially decoding, fluent reading, spelling, and arithmetic—practice is needed. FALP lasts only about 3 months; therefore, homework is an essential part of the curriculum. Reviewing the material at home is a way to add to the instruction time. Finally, in some cases teachers may give different homework assignments, depending on what a participant needs to practice. This way, although the curriculum is highly scripted, individualized attention can be given to the participants. Putting the Day's Date on the Board and Discussing Any Events Related to That Date. This component leads to practicing the essential functional skills of reading and writing the date, which is required on


many transactions, such as filling out forms. It also enables the participants to practice and automatize writing numerals and the days of the week. This activity also provides many teachable moments about civics and history, for example, if there is a national holiday coming up that celebrates an important event in history. Reading One or Two Articles From a Newspaper. Teachers bring a newspaper to the class and read one or two articles that may be of interest. Participants usually prefer health care and human interest-related stories. Initially, teachers select and read a story, but as the class gets more comfortable with each other, participants are invited to select and read stories. During the professional development seminars, teachers are given explicit instructions on how to use the newspaper as a tool. Teachers ask inference questions about the news items and also inquire about the participants' relevant experiences as well as their opinions regarding that news item. Participants are usually quite vocal and animated in discussing these news items. In fact, a common problem is how to limit these discussions rather than getting them going. There are several benefits to this activity. Newspapers provide a first step in encouraging the participants to start using print rather than only spoken language as an information resource, thus understanding the functional uses of print. This activity also improves listening comprehension, inference making, and understanding of the context of a news story and its implications. There are also affective benefits. Increased knowledge about one's community and country as well as topics such as nutrition and health can be a source of empowerment. In addition, the classroom provides a safe environment for expressing one's thoughts, feelings, and opinions. This helps build a learning community and emphasizes that the participant's voices, usually silent in the community, are valued and respected. 
These discussions also validate that adult participants have a rich knowledge base; they are not "ignorant" as the society usually labels them. (In Turkish, the term cahil means both "ignorant" and "unschooled/illiterate.") Newspaper discussions also invite critical thinking by encouraging the participants to evaluate what is said and how it is said in the paper. Discussing the Picture in the Textbook at the Beginning of the Unit. As mentioned before, each unit starts with a short passage about a family. There is also a picture that depicts the events in the passage. In this exercise, the participants hear the title of the passage and look at the picture in their books. They then make predictions about what the passage may be about and also discuss any relevant experiences that the picture conjures. Normally, this prereading activity is used more heavily in the


beginning of the course, before the participants are able to read the passage on their own. The goal of picture discussion is to encourage the participants to make predictions and inferences and also activate any relevant background knowledge before they start reading the passage. As they progress in their reading ability, they discuss the picture in relation to their reading of the text. Participants also practice storytelling and review vocabulary items. Language practice such as this and activation of relevant background information are directly related to reading comprehension. At the affective level, this activity reveals that the participants may have different interpretations of the picture, but, as the class values different opinions, the participants learn that diversity of point of view and opinion is acceptable. Listening to the Passage That Was Depicted by the Picture. When the picture discussion is over, the teacher reads the passage aloud, modeling the intonation at the same time. The participants are asked to compare their picture-discussion predictions to what was actually read in the passage. This highlights that text and pictures may serve different functions. Also, among FALP students, one general tendency is to answer any comprehension questions based on their own experiences rather than on what was in print. This activity focuses attention on comparing what was in print with what was initially activated based on their own experiences. The teacher usually points out the differences or similarities in the participants' knowledge and the information presented in the text. This activity highlights the decontextualized nature of print. Story retelling and other listening comprehension skills are practiced further as well. As the teacher reads and reasons about print, some of the metalinguistic and metacognitive skills routinely used by skilled readers are made public and available to the participants. 
Decoding: Sound-Letter-Syllable-Word-Sentence Activities. In this component, word recognition and spelling proficiency are the focus. The activities center around sounds (phonological awareness); letters; letter-sound correspondences, especially at the syllabic level; word recognition; and spelling. The decoding activities are quite intense during the 1st month of classes. Our previous research with children (Durgunoglu & Oney, 1999; Oney & Durgunoglu, 1997; Oney & Goldman, 1984) showed that decoding proficiency develops effortlessly in Turkish beginning readers due to the systematic correspondence between symbols and sounds. Thus, FALP explicitly teaches these correspondences. First, the identities and sounds of letters are established (i.e., hearing that particular sound in the beginning, middle, and end of different words). This activity serves as both a vocabulary-building and a phonological awareness exercise. The next step is to create syllables, which are salient units in Turkish (Oney & Durgunoglu, 1997). The syllables are combined in different orders to create many new words. These words are then used in sentences to integrate decoding with comprehension. When all the letters are learned, we also practice syllabification. Turkish is a highly inflected language in which seven- to eight-letter words containing three to four syllables are quite common. To facilitate the recognition of such long words, syllabification rules are explicitly taught and practiced.

Reading the Passage and Answering Comprehension Questions. We emphasize that reading is not only word recognition. It requires thinking, reasoning, and inferencing, as well as activating prior knowledge on a topic. Good readers activate prior knowledge to help them integrate new text information (Wilson & Anderson, 1986), they are strategic in their processing of text, they monitor their understanding of text (Baker, 1989), and they are sensitive to the structural characteristics of text. Thus, FALP emphasizes explicit cognitive strategy instruction in addition to word recognition and spelling. In this component, the participants go back to the original passage that was read aloud by the teacher and try to read it on their own. Because they are quite familiar with the passage by now, the decoding burden is not as heavy. Here the activities depend on the level of the class. In more advanced classes, participants pair up and read to each other, whereas in less advanced classes, the whole class reads each sentence of the text with the help of the teachers, sometimes in a round robin fashion, sometimes in choral reading. Here, in addition to decoding words accurately and fluently, reading the text with intonation and meaning is also emphasized. To facilitate this, punctuation marks are analyzed and noted.
After reading the passage, the participants answer more comprehension questions about the text. The questions involve not only factual information but also inferences. In this activity, comprehension monitoring and going back to text for clarification are encouraged. There is also writing practice. Comprehension questions require writing single-word answers in the early classes (e.g., name of the character) but require more detailed explanations as the course progresses.

Application Exercises. In FALP, as the name implies, we try to make literacy relevant to the lives of the participants and their articulated needs. Our goal is to enable the participants to use reading/writing as tools in their lives. We include applications and assignments that involve an awareness of the functions of literacy in everyday life, what Rogers (1999) calls real literacies. The textbooks have been revised to include topics for which the participants have expressed a need. For example, in the beginning, it never occurred to us that the participants may want to practice how to sign their names for filling out documents. As this and other needs became obvious, they were incorporated into the program. In this component of the program, real-world materials are examined (such as a telephone bill) or completed (a form to register a child in school). There may also be class trips (such as to the post office to mail the holiday greetings that were written). These activities make literacy a part of the participants' lives.

Additional Reading Comprehension Exercises. Because reading involves language, another goal was to enable the participants to hear the nuances and the richness in texts. Every unit includes additional reading materials that further develop reading comprehension and writing proficiencies. Some reading materials are informational (e.g., preventive health), but some are purely aesthetic, such as poetry. This highlights the affective dimensions of literacy. Reading and writing are not only for acquiring knowledge but also for emotional enrichment, to lift our spirits, and to make us feel empathy, anger, joy, interest, and curiosity—in short, to make us human.

Mathematics. When we noticed that we could not even ask the participants to turn to a certain page, we realized that numeracy is an important part of literacy instruction (Durgunoglu & Oney, 2002). In FALP, we start with number recognition and counting to make these processes as automatic as possible. We also add simple arithmetic and functional exercises, such as reading the clock, reading transportation time tables, recognizing price tags, and making change with money.

Keeping a Journal.
One of the goals of FALP is to ensure that the participants have confidence in their voices and can express their thoughts, feelings, and opinions. To facilitate this process, we encourage participants to keep a journal and share it with their teacher, if they wish. At the cognitive level, this activity builds spelling and writing skills. However, there is also a benefit at the affective level. Although some journal entries may include routine descriptions of what they did the day before, we also find that many participants write poems, express their appreciation for their teachers, and describe their innermost thoughts. Seeing this trust


built between the teachers and the participants is one of the indicators that a safe learning community has been created in the classroom.

RESEARCH, PROGRAM EVALUATION, AND STUDENT AND PRACTITIONER FEEDBACK INFORMING PROGRAM EFFECTIVENESS

We have had various opportunities for research and evaluation since the first implementation of FALP in 1995. These research and evaluation efforts ranged from basic research on the processes of adult literacy acquisition to evaluation of the cognitive, social, and personal outcomes of program participation. Furthermore, one of us actually taught an FALP course and conducted a participant-observation study. During these evaluations, we used multiple research approaches, with methods ranging from experimental manipulations to interviews, focus groups, and qualitative analyses to study the cognitive, affective, social, and instructional factors in adult literacy acquisition. The results of our research and evaluation led to modifications of our theoretical model of literacy development as well as to three distinct revisions of our program and its implementation.

Testing the Model of Literacy Acquisition

Because earlier versions of our model of literacy development were based primarily on research with children, we first had to determine whether this model captured the nature of adult literacy acquisition. In a study on the cognitive-linguistic component of the model, we focused on an in-depth assessment documenting the progress of a group of FALP participants (Durgunoglu & Oney, 2002), measuring the facilitators, building blocks, and outcomes of literacy acquisition. This study showed that participants who started FALP with very low levels of literacy proficiency made significant improvements in their letter and word recognition, spelling, phonological awareness, and reading comprehension levels after only 90 hours of attendance in the program.
We observed that word recognition proficiency was significantly related to both letter recognition and phonological awareness, replicating the patterns we observed with children (Durgunoglu & Oney, 1999; Oney & Durgunoglu, 1997). This study helped establish parallels between adult and child literacy acquisition, test the cognitive-linguistic component of our model, and alert us to the need to further develop the affective, social, and instructional components of our model because the cognitive component alone was not sufficient to describe the complex process of literacy acquisition fully.


Program Evaluation

FALP has been evaluated several times during various phases of implementation, and each evaluation led to successive revisions of the instructional content, the materials, as well as the staff development program. These evaluations were of two sorts. One set of evaluations compared FALP participants with participants of the traditional adult literacy program in Turkey. The second set of evaluations took a closer look at FALP participants and studied their development over time. The details of these evaluations can be found in Durgunoglu (2000), Durgunoglu and Oney (2002), and Durgunoglu, Oney, and Kuscul (2002).

To summarize the evaluations briefly, comparisons of FALP with the traditional adult literacy program in Turkey consisted of in-depth assessments and interviews as well as quick diagnostic tests of participant performance (Durgunoglu & Oney, 2003). These evaluation studies indicated that FALP is effective in establishing basic literacy skills, especially in participants who were more likely to be marginalized. We also learned that the initial 90 hours of instruction was unrealistic for achieving the goals of fluent word recognition and comprehension. A closer look at FALP participants indicated that the program had the biggest impact on those individuals who already had a foundation for literacy, such as some letter knowledge and some decoding skills. For these participants, gains were immediate and long lasting.

The evaluations revealed several factors key to FALP effectiveness. First, explicit teaching of phonological awareness, spelling-sound correspondences, and reading comprehension strategies (such as inferring, predicting, and questioning) was found to be critical. Second, the real-world exercises that involve filling out forms and reading bills, newspapers, and bus signs were also key to the functional relevance dimension.
Finally, the poetry, folktales, and drama that provoke an emotional response to print were key to support the affective dimension. We believe that a balanced combination of these factors makes FALP an effective program. The patterns of results we observed also indicate that the threshold model of literacy, where a certain level of performance is considered the goal, is inadequate. Instead, we now view literacy as a continuous development shaped by the social contexts and personal needs of participants. Exit interviews and focus groups with FALP participants informed us about their perceptions of the program, the materials, and their instructors. There was overwhelming consensus that the course structure and materials met participants' goals. Also, participants have consistently rated the curriculum and their teachers as very effective. Our formal and informal observations, as well as one of us teaching a course, indicated


that this supportive, nonthreatening climate perceived by the participants facilitates persistence and engagement. The system of volunteer instructors supported by intensive training and a support network functions well. However, the participants unanimously expressed that the duration of the course was too short. We have responded by developing a follow-up course to allow participants to further their literacy development. This second course, named FALP2, lasts 60 hours and focuses on reading comprehension, writing, and knowledge acquisition. As with FALP1, there is a participants' textbook and an instructors' annotated edition (Durgunoglu, Oney, Dagidir, & Kuscul, 2000). Teachers and teacher trainers also provided valuable information that helped us restructure the 3-week teacher-training content and materials. Our strategic decision to rely on volunteer instructors rather than professional teachers was the result of evaluations by teacher trainers. Our experiences during teacher training as well as interviews with teachers showed that professional teachers were less likely to adopt FALP's instructional strategies, which were often contrary to their beliefs and practices around literacy instruction. Many professional teachers especially had difficulty implementing FALP's approach to decoding and comprehension instruction. We are currently conducting a study on teacher change and development as a result of FALP participation to better understand teachers' needs, goals, and experiences during FALP participation. We hope these studies will help us to learn how to reduce instructor turnover. Teacher trainers have made significant contributions to the program revisions we performed over the years. Their systematic observations have been invaluable in detecting ineffective aspects of the program and improving them. 
Beyond the Cognitive Component: Assessing the Affective, Social, and Cultural Factors

Our work on the cognitive factors of adult literacy led us to further investigate the affective, social, and cultural factors that are closely associated with them. In a study focusing on the growth and development of individuals within their immediate communities (Durgunoglu, 2000), we documented development at the completion of the program as compared to the situation in the beginning. Employing methodologies based on both social-constructivist and cognitive perspectives, we used a battery of cognitive measures as well as in-depth interviews and home visits. Thus, we were able to document the social and cultural contexts in which social development takes place.


Besides the cognitive outcomes of program participation, we documented reasons why participants were not schooled, their thoughts on schooling, their experiences of the difficulties of illiteracy, roles in the family, perceived quality of life, daily activities, goals in life and expectations about literacy acquisition, as well as self-reports of change due to program participation. At the end of the course, participants expressed a strong sense of self-efficacy, describing the tasks they can now perform on their own with minimal dependence on others (Durgunoglu, 2000). However, in addition to the course experience, participants' own personalities and motivational levels also played a big role in their development. In an effort to understand why some participants were successful in literacy acquisition while others were not, we also selected a smaller sample of success stories from our sample and searched for commonalities. The most striking characteristic of this group was that they were overcoming incredible odds and obstacles to learn to read and write. They were all remarkably resilient individuals who tried to make the most of the very limited opportunities they had. Thanks to a high level of motivation, these women had learned the foundations of literacy on their own, had deliberately searched for adult literacy classes, and wanted to improve themselves. They persisted and succeeded even when faced with obstacles. These participants also shared belief in their own abilities and faith that literacy was going to improve their lives. We also conducted a qualitative comparative analysis of four of the participants in the previous study (Durgunoglu, 2003). For this study, from a group of participants with comparable cognitive skills and beginning literacy levels, we selected two participants who had completed the program and matched each with a participant who had dropped out. 
These comparisons clearly illustrated how important affective variables such as motivation and self-efficacy are in determining whether participants will complete or drop out of an adult literacy class.

CONCLUSION

In 1995, we developed FALP as a research-based model program. Across the years and through several implementation/revision cycles, research continues to shape the nature of our instructional program. However, at the same time, our experiences with FALP, its participants, and its teachers continue to shape our theoretical conceptualization of adult literacy development. This reciprocal link makes FALP a rewarding experience, not only for the teachers and participants but also for us, the program developers.


ACKNOWLEDGMENTS

We owe a debt of gratitude to the Mother-Child Education Foundation (MOCEF) in Istanbul and to our colleagues, Hilal Kuscul, Meltem Canturk, Filiz Aslan, Mutlu Yasa, and Gulseren Kumus, for their efforts and hard work in organizing, implementing, overseeing, and supporting FALP. We also thank FALP volunteer instructors and their consultants. Their dedication to making this adult literacy program effective in improving the lives of individuals is truly inspiring.

REFERENCES

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.

Auerbach, E. (1996). Adult ESL literacy: From the community to the community. Mahwah, NJ: Lawrence Erlbaum Associates.

Baker, L. (1989). Metacognition, comprehension monitoring and the adult reader. Educational Psychology Review, 1, 3-38.

Chapman, J. W., Tunmer, W. E., & Prochnow, J. E. (2000). Early reading-related skills and performance, reading self-concept, and the development of academic self-concept: A longitudinal study. Journal of Educational Psychology, 92, 703-708.

Comings, J., Parrella, A., & Soricone, L. (2000). Helping adults persist: Four supports. Focus On Basics, Volume 4, Issue A, 3-6.

Condelli, L. (1996). Evaluation systems in the adult education program. The role of quality indicators. Washington, DC: Pelavin Research Institute.

Cummins, J. (1994). From coercive to collaborative relations of power in the teaching of literacy. In B. M. Ferdman, R.-M. Weber, & A. Ramirez (Eds.), Literacy across languages and cultures (pp. 295-331). Albany: State University of New York Press.

Durgunoglu, A. Y. (2000). Adult literacy: Issues of personal and community development. Unpublished manuscript.

Durgunoglu, A. Y. (2003). Affective dimensions of adult literacy development. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Durgunoglu, A. Y., & Oney, B. (1999). A cross-linguistic comparison of phonological awareness and word recognition. Reading and Writing, 11, 281-299.

Durgunoglu, A. Y., & Oney, B. (2002). Phonological awareness in literacy development: It's not only for children. Scientific Studies of Reading, 6, 245-266.

Durgunoglu, A. Y., Oney, B., Dagidir, F. Z., & Kuscul, H. (2000). Functional literacy Level 2. Istanbul, Turkey: Mother-Child Education Foundation.

Durgunoglu, A. Y., Oney, B., & Kuscul, H. (2002). Development and evaluation of an adult literacy program in Turkey. International Journal of Educational Development, 23, 17-36.

Durgunoglu, A. Y., Oney, B., Kuscul, H., Dagidir, F. Z., Aslan, F., Canturk, M., & Yasa, M. (2002). Functional literacy Level 1. Istanbul, Turkey: Mother-Child Education Foundation.

Durgunoglu, A. Y., & Verhoeven, L. (1998). Epilogue: Multilingualism and literacy development across different cultures. In A. Y. Durgunoglu & L. Verhoeven (Eds.), Literacy development in a multilingual context: Cross-cultural perspectives (pp. 289-298). Mahwah, NJ: Lawrence Erlbaum Associates.

Freire, P. (1970). Pedagogy of the oppressed. New York: Herder and Herder.

Gillette, A. (1987). The experimental world literacy program. A unique international effort revisited. In R. F. Arnove & H. J. Graff (Eds.), National literacy campaigns: Historical and comparative perspectives (pp. 197-218). New York: Plenum Press.

Juel, C., Griffith, P. L., & Gough, P. B. (1986). Acquisition of literacy: A longitudinal study of children in first and second grade. Journal of Educational Psychology, 78, 243-255.

Langer, J. A., Bartolome, L. I., & Vasquez, O. A. (1990). Meaning construction in school literacy tasks: A study of bilingual students. American Educational Research Journal, 27, 427-471.

Lomax, R. G., & McGee, L. M. (1987). Young children's concepts about print and reading: Toward a model of word reading acquisition. Reading Research Quarterly, 22, 237-256.

Oney, B., & Durgunoglu, A. Y. (1997). Learning to read in Turkish: A phonologically transparent orthography. Applied Psycholinguistics, 18, 1-15.

Oney, B., & Goldman, S. (1984). Decoding and comprehension skills in Turkish and English: Effects of the regularity of grapheme-phoneme correspondences. Journal of Educational Psychology, 76, 557-568.

Rogers, A. (1999). Improving the quality of adult literacy programmes in developing countries: The "real literacies." International Journal of Educational Development, 19, 219-234.

Sherman, R., Tibbetts, J., Woodruff, D., & Weidler, D. (1999). Instructor competencies and performance indicators for the improvement of adult education programs. Washington, DC: U.S. Department of Education.

Skilton-Sylvester, E., & Carlo, M. S. (1998). "I want to learn English": Examining the goals and motivations of adult ESL students in three Philadelphia learning sites (Technical Report No. TR98-08). Philadelphia: National Center for Adult Literacy.

Snow, C. E., Barnes, W. S., Chandler, J., Hemphill, L., & Goodman, I. F. (1991). Unfulfilled expectations: Home and school influences on literacy. Cambridge, MA: Harvard University Press.

Soifer, R., Irwin, M., Crumrine, B., Honzaki, E., Simmons, B., & Young, D. L. (1990). The complete theory-to-practice handbook of adult literacy: Curriculum design and teaching approaches. New York: Teachers College Press.

Street, B. (1984). Literacy in theory and practice. Cambridge, UK: Cambridge University Press.

Tunmer, W. E., Herriman, M. L., & Nesdale, A. R. (1988). Metalinguistic abilities and beginning reading. Reading Research Quarterly, 23, 134-158.

Venezky, R. L., Bristow, P. S., & Sabatini, J. P. (1994). Measuring change in adult literacy programs: Enduring issues and a few answers. Educational Assessment, 2, 101-131.

Wagner, D. A. (1993). Literacy, culture, and development: Becoming literate in Morocco. Cambridge, UK: Cambridge University Press.

Wagner, D. A., & Venezky, R. L. (1999). Adult literacy: The next generation. Educational Researcher, 28, 21-29.

Wilson, P. T., & Anderson, R. C. (1986). What they don't know will hurt them: The role of prior knowledge in comprehension. In J. Orasanu (Ed.), Reading comprehension: From research to practice (pp. 31-48). Hillsdale, NJ: Lawrence Erlbaum Associates.

Young, M., Fleischman, H., Fitzgerald, N., & Morgan, M. (1994). National evaluation of adult education programs: Patterns and predictors of client attendance. Arlington, VA: Development Associates.


8

What Does It Mean to Comprehend or Construct Meaning in Multimedia Environments: Thoughts on Cognitive and Assessment Construct Development

John Sabatini
Educational Testing Service

Those who have insinuated that Menard devoted his life to writing a contemporary Quixote besmirch his illustrious memory. Pierre Menard did not want to compose another Quixote, which is surely easy enough—he wanted to compose the Quixote. Nor, surely, need one be obliged to note that his goal was never a mechanical transcription of the original; he had no intention of copying it. His admirable ambition was to produce a number of pages which coincided—word for word and line for line—with those of Miguel de Cervantes. (Borges, p. 91)

The increasing use and integration of multiple media¹ in schools and the culture more broadly is placing novel demands on traditional assessment constructs designed to measure student proficiencies in gaining information and constructing understanding from media sources. In this chapter, I explore lines of thinking that could inform new media constructs, with an eye toward more formative than summative purposes. The argument of the essay has two main theses. The first is that the time has come to merge what has been typically called media education (Tyner, 1998) with the traditional reading literacy/language arts curriculum. This will help in the design of assessment (and cognitive) constructs to address the relationships (and seeming contradictions that often arise) between traditional print literacy and media products, electronic or other. The second is that the process of assessment construct development should be viewed as a heuristic research tool for furthering our theoretical and pragmatic understanding of multimedia comprehension. This latter argument is predicated on the observation that the accelerated rate of intellectual, social, and technological change in literacy, media, and comprehension warrants flexible, provisional assessment tools that can be rapidly deployed and replaced to keep pace with this rate of change. I introduce the example of evidence-centered design as such a heuristic research tool for transforming comprehension assessment to better align with changing and technological perspectives and practices of the multimedia educational environment. I have not attempted an exhaustive review of the basic research on these topics but rather have relied primarily on edited volumes and monographs in which authors have attempted to synthesize and characterize the current and historic treatment of media and literacy issues. The arguments and citations represent this selective approach.

¹ Throughout I use the term "multiple media" to be loosely inclusive of traditional print forms (books, newspapers, texts of all kinds, etc.), audio forms (radio, narrations, musical recordings, audio recordings of all kinds), visual forms (graphs, maps, pictures, photos), live presentation forms (plays and other performances), hybrids of these traditional forms (television, movies, filmstrips, movies of dramatic performances), and all the hybrid electronic and technology-based information-communication forms emerging (email, websites, hypermedia, multimedia, simulations). Later in the chapter, the term media is itself used as a short-hand term inclusive of "multiple media."

The Borges short story is revealing of a conundrum of comprehension assessment as it has popularly been conceived. In the Borges story, the 20th century author Pierre Menard sets out to rewrite Don Quixote, word for word, except from his own experiences, history, and perspectives. Let's assume Menard was successful in reproducing, word for word, the entirety of Don Quixote. If you or I were to read Menard's Quixote, would we interpret it the same as we do Cervantes's novel?
It consists of the same words in the same sequence, so on some levels of analysis it must be the same. But, for instance, can we ascribe the same author's intent? Put another way, would we ask ourselves the same questions when interpreting the text of a 20th century writer describing 16th century Spain as we would a 16th century author describing his or her world? I would suggest that at least some of our interpretations of the text would be altered by this change in authorship. If we can conceive of the same words in the same sequence as having multiple meanings, what are the implications for the design of constructs that measure text comprehension? Modern conceptions of text comprehension recognize that meaning extends beyond the written text to include considerations of at least the author, the reader, the context, and their interactions (e.g., Farr & Jongsma, 1997; National Assessment Governing Board, 2002).

Menard's quest also hints at a cognitive perspective of comprehension as constructed meaning. It is often remarked about great literary works that one should read them when young, in middle age, and when aged. Each time, new meaning, understanding, and interpretation are reached. This observation reflects that at each reading the reader has changed and so has the world around him or her.² One can take this thought one step further. Does reading the same text once, then again immediately afterwards, result in the same comprehension and interpretation? In a cognitive sense, there is some overlap and reinforcement of one's initial construction of meaning; however, different interpretations, different meaning, and different nuances may also emerge when rereading. Cognitive research supports the conclusion that all the readings described may result in different memory representations of text, that is, differing constructed meanings (e.g., Britton & Graesser, 1996; Kintsch, 1998; National Reading Panel, 2000). The initial memory structure is an interaction of previous knowledge and experiences. This newly formed structure interacts with the subsequent reading, changing the network of representations yet again, strengthening some connections, altering others. And, of course, those constructed meanings vary across different readers, in interaction with their own background knowledge and experiences and knowledge of texts. Add information about author and context, and new interpretations emerge. Communication among readers (or teachers) further strengthens, weakens, and alters representations, thus the salience of the phrase constructing meaning from texts. From this cognitive perspective, reading is truly a mutually constitutive system. Background knowledge shapes our reading experience. Our reading experience reshapes our background knowledge. Following this line of reasoning leads to questions concerning what meaning or interpretations may be constant or at least constrained by a printed text versus all the possible, potential meanings (see Trabasso, this volume, for one approach).
The same questions hold for other representational forms; that is, what might be the implications for constructing meaning from multi- or mixed-media sources?

² In the case of Menard, the perspective shift actually takes place in the writer's head.

THE CONSEQUENCES OF CROSS-MEDIA STUDY ON THE DEVELOPMENT OF THINKING AND UNDERSTANDING

The effects of literacy on intellectual and social change are not straightforward ... it is misleading to think of literacy in terms of consequences. What matters is what people do with literacy, not what literacy does to people. Literacy does not cause a new mode of thought, but having a written record may permit people to do something they could not do before—such as look back, study, re-interpret, and so on.... Literacy is important for what it permits people to do—to achieve their goals or to bring new goals into view. (Olson, Hildyard, & Torrance, 1985, p. 14)

Much has been written analyzing the history of literacy and orality (Cipolla, 1969; Clanchy, 1993; Eisenstein, 1979; Goody, 1968; Goody & Watt, 1968; Graff, 1979, 1987; Kittay, 1991; Olson, 1991; Pattanayak, 1991; Saenger, 1991; Scholes, 1995; Taylor & Olson, 1995; Venezky, 1984); communication, digital, media, visual, computer, network, and technology literacies (Bolter, 1998; Christ, 1997; Jonassen, 2001; Kubey, 2001; Masterman, 2001; McClure, 2001; Messaris, 2001; Potter, 2001; Salomon, 1997; Silverblatt, 2001; Tyner, 1998; Warschauer, 1999; Wulff, 1997); and their relation to traditional literacy and language arts (Flood, Heath, & Lapp, 1997; Kamil & Lane, 1998; Lemke, 1998; Reinking, McKenna, Labbo, & Kieffer, 1998; Venezky, 1997). It is beyond the scope of this chapter to summarize this body of work, and it will probably be some time before an agreed-on synthesis is reached, given the rapidly changing technological environment in and out of schools.

It may be said that print media has dominated the pedagogical functions of schooling in the United States to the present.³ This dominance reinforced the value and centrality of print literacy to the exclusion of media. We know that a dramatic play is a different experience when read, heard, or seen. We know the experience differs in viewing a live stage performance, a film, or a television adaptation. However, it is the printed representation that is most likely—and that we expect most often—the central target of course assessments, especially with the high value placed on reading comprehension as a primary outcome of formal schooling. We anticipate that some meaning is common across these different representations and that some meanings are unique. Two questions arise: Are we more, less, or equally interested in the commonalities or distinctions among these representation forms, and is there anything special, anything unique, anything precious about written/printed text representations and the cultural artifacts and processes they have helped generate and afford?⁴

³ This "dominance" may be as much perceived as documented. Whether students learn more or spend more time with books and other printed materials in comparison to classroom lectures and discussion or TV/Internet usage in recent years cannot be easily inferred from the existing literature. Nonetheless, taking a historic sweep of the past 100 years of formal schooling, the centrality of print in published curriculum materials is evident and persistent.

⁴ The concept of affordances in the Gibsonian sense has been a theme throughout my study with Dick Venezky and again came to the forefront as we first discussed the abstract and broad aims for this essay. What affordances were there that differentiated the uses of different media, technologies, and printed text that should be addressed in cognitive and measurement constructs?


Olson's (1991) examination of literacy's relationship to language/orality addresses, to some extent, the latter issue (also see Goody & Watt, 1968; Graff, 1987; Scribner & Cole, 1981; Tyner, 1998). He reviews the fallacy of general claims about the effects of literacy on cognition and modernity. For example, he notes that participation in a literate society acculturates one to literacy knowledge, skills, and practices, even for individuals who have never learned to decode printed text. He examines several common hypotheses of how written texts may uniquely influence cognition. In his analysis, he rejects the modality hypothesis (input through the eye vs. ear produces different forms of thought), the medium hypothesis (speaking and writing as distinctive forms of discourse), and the mental skills hypothesis (learning to think like a reader or writer leads inevitably to new forms of transferable thought). Olson concludes that the metalinguistics hypothesis is the most viable—that reading, as a secondary linguistic activity, makes language into an object of thought and discourse. Olson argues that writing, by its very nature, is a metalinguistic activity. The reflection and new awareness of language, over time, cultivates literate forms of thought that transcend printed text. At the same time, it calls into question assumptions of the primacy of printed text literacy to thinking and learning.

Olson's (1991) reasoning is consistent with recent findings in the neurosciences. Though it may be premature to draw firm conclusions about findings that emerge from neuropsychology, we are learning from magnetic resonance imaging and other techniques that the networked systems of the brain are quite complex, interactive, and redundantly represented (see Berninger & Richards, 2002). Simple acts of reading texts or performing memory tasks access multiple functional areas of brain activity, including visual imagery, phonological (audio) centers, and perceptual/motor routines.
The function of memory and cognition is to enable thinking about concepts in the absence of perceptual stimuli. Reading of a text potentially enables all those functions. We are likely to find functional overlap from learners' critical analysis of media as well. Some tasks will use perceptual and modality specific networks in the initial construction of understanding (e.g., orthographic and phonological processing networks for visual word recognition). Other tasks will call up overlapping cognitive processes that serve as secondary supports (e.g., accessing the spelling of a word heard or seen in visual media). Thus, the research points to both modality specific structures that process inputs before they are modified and communicated to other areas of the brain as well as diverse functional areas that can be called on in creating multiple representations and meanings. The neuroscience evidence can be interpreted as generally supportive of moving beyond a narrow view of print-based comprehension as a unitary construct with unique cognitive consequences to a more inclusive assessment and curricular agenda that acknowledges both modality/functionally specific processes, such as print processing, as well as integrated, cross-modality processing of symbolic representations.

Salomon (1997) expands on this reasoning to include explicitly other forms of symbolic representational media. He writes the following:

In other words, different symbolic forms of representation address different aspects of the world around us and thus afford us the opportunity to learn something different about the world from each form of representation. This much is, of course, pretty obvious. What is less obvious is that even when different symbol forms of representation address the same field of reference, conveying (what appears to be) the same information—say, a verbal description of an event and a video rendering of it—the meaning we would derive from each might be pretty different. (p. 377)

This is at the heart of the problem of any assessment construct that might differentiate what is unique to printed text representations as opposed to other (mixed) media or symbolic representational forms. Even the transition from a written text to one that is read aloud undergoes an interpretive shift that could influence meaning construction. Consider, for example, the differing interpretations that may arise from alternative dramatic renderings of a text. Should we care whether an individual gains insights about the human condition from a written, radio, or live dramatic production of Hamlet? We might but not merely out of a prejudgment of print's superiority in conveying such insights.

Salomon (1997) provides guidance as to when and why the forms of representation might matter to the kinds of understanding we seek to foster and assess in school-based instruction. Salomon suggests that the media form may matter most when the learner's pre-existing domain knowledge is lean. When it is rich, well organized, and well developed with respect to the domain, the representation system may matter less to the meaning constructed. "In other words, the extent to which symbolic forms of representation affect meaning is a matter of balance between the richness of one's schemata: the less knowledge already available to the learner, the more the symbolic forms of representation will make a difference in the meanings the learner arrives at" (Salomon, 1997, p. 377).

The seeming paradox of this formulation is that the more work, i.e., cognitive effort, required in processing and engaging a representational form, the more learning is likely to ensue, i.e., change or growth in internal representations. This phenomenon has been demonstrated in studies of reading comprehension processes. For skilled readers, a somewhat disorganized presentation of information, imperfect coherence, or missing information that requires bridging inferences can stimulate deeper processing, with a resulting reorganizing of knowledge and greater learning from text (Bisanz, Das, Varnhagen, & Henderson, 1992; Kintsch, 1998). This line of reasoning has also been offered as a putative reason for questioning the pedagogical value of film and television productions. The claim is that viewers typically exert less effort in comprehending a television or video program (i.e., the much lamented passive viewing habits of citizens), and, because it requires less effort, they, in general, retain and learn less from it. However, this criticism may be less a function of the media than the pedagogy⁵ (or lack thereof) when teaching with that media. Schramm (1977) notes the following:

Students learn from any medium, in school or out, whether they intend to or not, whether it is intended or not that they should learn (as millions of parents will testify), provided that the content of the medium leads them to pay attention to it. Many teachers argue that learning from media is not the problem; it is hard to prevent a student from learning from media, and the real problem is to get him to learn what is intended to learn. (p. 267, cited in Krendl, Ware, Reid, & Warren, 2001)

We conclude that there will always be an overlap between the knowledge, skills, and strategies acquired and applied in response to printed texts and other forms of media representation, cognitively as well as pedagogically. The issue is one of focusing attention on the proficiencies we value in students. We also conclude that no clear, categorical distinctions will emerge that differentiate what should count as a printed text versus the various hybrid forms of media representations that may include print, audio, and visual content and may be arrayed in temporally and spatially novel configurations (e.g., hypermedia Web sites). Therefore, looking forward, there is reason to question the continued treatment of reading literacy/language arts as distinct from other forms of media representation. Instead, an assessment strategy that seeks convergent and divergent qualities across media could serve to highlight unique affordances of specific symbolic forms. Pedagogies based on such activities could produce proficiencies within learners that allow them to understand, go beyond, and transform many different kinds of symbolic forms. To smooth this transition, assessment techniques are necessary that cross the domains of reading literacy and media for gathering evidence of these forms of student proficiency.

⁵ It is assumed that the term pedagogy as used here includes specified curricular aims; that is, one knows what one intends to learn, not merely generic teaching techniques.


READING AND MEDIA LITERACY ASSESSMENT AND CURRICULA

How then is reading literacy assessment conceptualized in comparison with media constructs? The National Assessment of Educational Progress (NAEP) 2003 Reading Framework (National Assessment Governing Board, 2002) represents one well-known approach. The NAEP is used to report on how well 4th-, 8th-, and 12th-grade student samples perform in reading various texts and responding to those texts. The NAEP framework builds on the view from the National Reading Panel (2000) report that "In the cognitive research, reading is purposeful and active (Pressley & Afflerbach, 1995). According to this view, a reader reads a text to understand what is read, to construct memory representations of what is understood, and to put this understanding to use" (p. 4-39). Table 8.1 summarizes the NAEP goals for reading literacy, the contexts for reading, and the aspects of reading used as construct specification in developing the assessment.

Two other international assessment efforts, the Progress in International Reading Literacy Study (PIRLS; International Association for the Evaluation of Educational Achievement, 2000) and the Program for International Student Assessment (PISA; Organization for Economic Cooperation and Development, 2000), have similar definitions of reading literacy. The PIRLS defines reading literacy as "the ability to understand and use those written language forms required by society and/or valued by the individual. Young readers can construct meaning from a variety of texts. They read to learn, to participate in communities of readers, and for enjoyment" (p. 3). The PISA defines reading literacy as "understanding, using, and reflecting on written texts in order to achieve one's goals, to develop one's knowledge and potential, and to participate in society" (p. 18). Note first that the specific charge of the NAEP, PISA, and PIRLS is to assess reading literacy, that is, written texts and written language forms.
Even so, the texts may include charts, bus schedules, forms, and various graphic and visual stimuli. Second, with some revision, the goals, contexts, and aspects need not be restricted to reading or printed materials at all. The focus is primarily on applying skills and strategies toward constructing meaning, understanding, learning, and performing tasks. Compare these definitions with those of the National Leadership Conference on Media Literacy:

A media literate person—and everyone should have the opportunity to become one—can decode, evaluate, analyze, and produce both print and electronic media. The fundamental objective of media literacy is critical autonomy in relationship to all media. Emphasis in media literacy training range widely, including informed citizenship, aesthetic appreciation and expression, social advocacy, self-esteem, and consumer competence. The range of emphasis will expand with the growth of media literacy. (Aufderheide, 2001, p. 79)

This media definition encompasses print literacy (see also Calfee, 1997, and Olson, 1991, for integrated views).

TABLE 8.1
NAEP 2003 Reading Framework

Goals for Reading Literacy

The goals for reading literacy are to develop good readers who
• read with enough fluency to focus on the meaning of what they read;
• form an understanding of what they read and extend, elaborate, and critically judge its meaning;
• use various strategies to aid their understanding and plan, manage, and check the meaning of what they read;
• apply what they already know to understand what they read;
• can read various texts for different purposes; and
• possess positive reading habits and attitudes.

Contexts for Reading

Three contexts for reading are assessed:
1. Reading for literary experience—involves the reader in exploring themes, events, characters, settings, problems and the language of literary works. Readers explore events, characters, themes, settings, plots, actions, and the language of literary works by reading novels, short stories, poems, plays, legends, biographies, myths, and folktales.
2. Reading for information—involves the engagement of the reader with aspects of the real world. Readers gain information to understand the world by reading materials such as magazines, newspapers, textbooks, essays, and speeches.
3. Reading to perform a task—involves reading in order to accomplish or do something. Readers apply what they learn from reading materials such as bus or train schedules, directions for repairs or games, classroom procedures, tax forms (grade 12), maps, and so on.

The Aspects of Reading

• To form a general understanding the reader must consider the text as a whole and provide a global understanding of it. (p. 11)
• To develop an interpretation, the reader must extend initial impressions to develop a more complete understanding of what was read. This process involves linking information across parts of a text as well as focusing on specific information. (p. 11)
• To make reader/text connections, the reader must connect information in the text with knowledge and experience. This might include applying ideas in the text to the real world. (p. 12)
• Examining content and structure requires critically evaluating, comparing and contrasting, and understanding the effects of such features as irony, humor, and organization. (p. 12)

Note. From Reading Framework for the 2003 National Assessment of Educational Progress, by the National Assessment Governing Board, 2002, Washington, DC: U.S. Department of Education.


For a variety of attitudinal and economic reasons (especially in the U.S.), media has not become a primary subject or object of study in the K-12 school curriculum (Masterman, 2001). "Media" is more often analyzed and understood in the context of news and entertainment industries, only secondarily in K-12 education. Nonetheless, media education scholars, such as Aufderheide (2001), Kubey (2001), Masterman (2001), Potter (2001), and Tyner (1998), have been building a rationale and foundation for media education. In doing so, they have been cautious in their direct criticism of the existing, historically based U.S. literature and language arts curriculum. In their cautious stance, however, can be detected a hint of a pragmatic, strategic, and rhetorical agenda. This agenda includes articulating the aims and goals of literature and language arts under the broader mantle of media. It is not entirely accidental that decoding, evaluating, analyzing, and producing are parallel terms that can be used both with print and electronic media. Media scholars also consistently argue for critical interpretation as fundamental to media education. The higher level processes of critical awareness, analysis, and evaluation have been present and foundational, in one way or another, to all historical and present formulations of the aims of a reading, literature, and language arts curriculum as well. Where media formulations may go beyond traditional literature and language arts purposes is in their examination of mass media influences (e.g., Hobbs & Frost, 2003). Such formulations encompass a broader communication perspective that analyzes communicator, message content, channel, and audience (e.g., Silverblatt, 2001).
In media studies, message content is often the least important consideration, given how it is shaped by decisions of what content might fit the channel (e.g., what stories can be visualized in a TV broadcast), what audience is targeted (e.g., what demographic is to be reached), and what the communicator's aims are (e.g., to engage a viewer audience that might buy a sponsor's product). Masterman (2001) refers to the rationale for this formula of media study as the inoculation view of media; that is, we must examine mass media or fall prey to its unconscious influences on our behavior. This view is inclusive of print media, though the emphasis is generally placed on public news sources (newspapers, magazines), print advertisements, and other print genres in which the marketing aims of the communicator are clear or seem to take precedence over content/message (e.g., genre fiction such as romance novels).

There are straightforward analogs of the communications perspective in interpreting print from a language arts perspective. Examining author's intent, including considerations of what audience the author may have been writing to reach, is analogous to analyzing the communicator and audience. A focus on the genre, document form, structure of language, and literary devices does approach an analysis of the unique features of the printed text channel. The mass media channel—print publishing—is less frequently an object of analysis in school curriculum. Venezky (1992), for one, has examined how the contexts of textbook publishing and adoption influence message content. This kind of examination could be a more central object of study within the school curriculum as soon as it is developmentally appropriate and then continuously addressed throughout the learner's education. This is not to say that every kindergarten classroom reading of Sendak's Where the Wild Things Are must be accompanied by a discussion of the various ways in which the visual characters, stuffed animals, T-shirts, televised cartoons, and advertisements are used across the culture to appeal and sell to children and adult audiences. However, it is a fact that many published products that gain popularity or acceptance in the school curriculum these days also spawn accompanying marketing and media accoutrements. In contemporary U.S. culture, any curriculum inclusion (e.g., Sendak, Seuss, Shakespeare, and so forth), just as any television, movie, musical, or other popular mass media product, has the potential to appeal to a mass audience of developing children and youth. Furthermore, as students transition to adult consumers and producers in the society, they will (as they become advertisers, movie makers, writers) draw on this cultural iconography to reproduce it in new forms to appeal to audiences they now represent en masse. This reproductive cycle is initiated in the institutions of schooling; it could be studied there as well. Although the correlates to media studies exist in the instructional content and aims of reading and literacy instruction, the primary focus of outcomes and assessment constructs, however, continues to be on the text content or message.
Implicit in these constructs are a view of reading as a private, individual activity, with message content as the primary concern of comprehension and interpretation. Analogously, printed texts in schools have traditionally been viewed by educators as largely neutral pedagogical products, less so the products of communicators using a particular channel to reach an audience.6 The channels of print are expanding to include print-related media, such as hyperlinked Web sites and interactive multimedia software. Au6 As documented by Venezky (1992), one need not adopt a cynical view to acknowledge that children's books, educational materials, and even the repackaging of the classics of literature are themselves a mass-produced media with at least one aim of creating a consumptive audience of the educators and future citizenry. The target audience of school materials includes the purchaser of school curricula via state or local adoption and governing bodies, as well as teachers and student readers. An uncritical acceptance of the printed texts, especially texts produced by the publishing industries for school-age children, as neutral and existing outside the contexts that inform them, is untenable.

160

SABATINI

dio, visual, and animated ancillary content are common, blurring the traditional media channel category distinctions of newspaper, radio, television, and film. This suggests limitations in a view that too narrowly focuses on those attributes of meaning that are uniquely resident in printed text content and more amenable to the broader constructs of media studies. With the merging of media forms on the Internet and other mixed media devices, any artificial separation of print and media in popular cultural forms is both difficult and diminishes the possibilities of multiple media education. There is reason to believe that the adoption of multiple and multimedia forms into U.S. school context, albeit traditionally slower than in the culture at large, will continue and perhaps accelerate.7 The process of integration into classroom instruction is well under way (Flood, Heath, and Lapp, 1997).8 The language arts classroom has always included activities that require analysis and reflection across the boundaries of media representations. Using the metalanguage of media study can enrich such comparative analyses. For example, how does the channel of print convey message content differently from oral speech? What choices might be made by the communicator/author to convey meaning (e.g., the stage directions for an actor's oral rendition vs. the poetic text imagery)? How do visual media artists convey time passing, setting, and tone? And, of course, how is the meaning the same and different in each representational form? In sum, printed texts are one form of media representation. Multiple forms of media now inevitably intrude into the learning traditionally conducted on and with printed texts in schooling and pedagogy (Bazalgette, 2001). The constructs we have traditionally defined and assessed as reading literacy will and probably should take their place as a subset of media 7

In fact, the National Council of Teachers of English (NCTE) Web site (http:// www.ncte.org/) has clear resolutions for supporting and promoting media literacy, evaluation, and composition as part of the language arts curriculum dating back from 1970 and continuing to the present. 8 A partial explanation of previous failures for media to become fully integrated in formal schooling lay in the high cost to produce and distribute. Access to media remained the province of specialized technicians who used expensive production technologies. Recent events have made media more accessible and less expensive for a wider range of users. The digital age not only increased access to computers, networks, and mass storage of visual and audio imagery, but it also made it easier to use and more accessible to the citizenry. Even the status of text writing (even if one accepts the technology-enhanced word processing as a text writing) as a primary form of student skill and evidence is yielding to mixed media productions. It is arguably no simpler technologically for a student to compose a hand written (or typed) essay than it is to compose a multimedia presentation (with audio and visual supports) to convey thoughts on a topic. Again, on the NCTE Web site is a 2003 resolution specifically supporting and encouraging multimedia presentation as a form of composition and repeating their continued support for media literacy.

8. MEDIA CONSTRUCTS


forms in the curricula. This is not to say that reading comprehension, as visual text processing, should be relegated to a secondary role. It is premature to consign printed materials and visual text processing to the dustbin of antiquated pedagogy. Print remains a unique and valuable representation system and cannot be discarded easily in a posttypographical age (Mackey, 2003; Reinking, 1998; Venezky, 1997). It is not too soon, however, to reexamine our assumptions as to why visual text remains an important, unique, and valuable symbolic representational system of information and communication technology. However, the construct of "comprehension" can and should be more broadly conceptualized across media and symbolic representation systems and need not be limited to visual texts (e.g., Farr & Jongsma, 1997; Kintsch, 1998; Tierney, 1997). Capturing this interplay in assessment constructs has been difficult. How might one address or disentangle the mixture of print, media, and other cultural forms? How might one plan for individual differences in exposure to the various forms in assessing comprehension? These topics will be addressed in the final sections of this chapter.

WHY A CONSTRUCT?

In the edited volumes and monographs reviewed for this chapter, assessment/measurement is often an afterthought to the more central topics of learning from and with media. Historical/social/cultural aspects, research methodologies, practice, and policy represent the typical categories for dividing up edited volumes and handbooks. Assessment is given a few chapters in a work, if any, though those who do focus on it provide forward-thinking treatments (e.g., Calfee, 1997; Farr & Jongsma, 1997).9 Absent new conceptions and models of assessment, there is a risk that even expert practitioners may revert to traditional measurement practices that distort and defeat the new conceptions of learning being promoted. Whether constructed in the classroom or imposed from without, the tendency is to underestimate the uncertainty we have about what precisely we truly value as proficiencies, which tasks truly require application of these proficiencies (assuming we did know), and how to evaluate the evidence produced by such tasks (if they truly did require those proficiencies). In a dynamic social, cultural, historical, social science enterprise such as education, assessment resides at a nexus between the theoretical and the pragmatic. Pure research, qualitative or quantitative, employs data collection procedures, measurement instruments, and analytic processes that range from open-ended observation, interview, or artifact review to standardized tests and laboratory experiments.

9 Jonassen's (2001) Handbook of Research for Educational Communications and Technology has no chapters devoted directly to the issue of assessment and only three citations in the topic index. Flood, Heath, and Lapp (1997) present a richer treatment of the topic including several chapters.

SABATINI

The various forms of assessment available today overlap considerably with any and every form of research data collection (e.g., portfolio and performance assessment techniques). However, they are typically put to pragmatic ends that influence the environments of their employ. This influence can be formative information to learners and teachers to guide learning and instruction; placement, selection, or credentialing assessments that may greatly affect individuals' life choices and goals; or accountability purposes that influence governance of entire systems and institutions. At the core of the assessment is validity (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999; Calfee, 1997; Messick, 1989, 1994, 1995). The construct is designed to address, simultaneously, questions including the following: What is it we are measuring? Why (i.e., why should we care)? With whom? For what purpose? The construct is always a social construct, a statement of values, that potentially perturbs the very system it is designed to predict or describe. The construct defines a goal state. We first describe what a proficient learner would be able to say or do with knowledge and skills and then we build our educational means toward achieving that end. Accordingly, this section treats assessment as a means of helping us better understand what we would like learners to be able to do with emerging multimedia forms. Research into media and communication takes many forms, both quantitative and qualitative (see reviews in Jonassen, 2001; Kamil & Lane, 1998; Knupfer & McLellan, 2001; Krendl et al., 2001). 
The rapid social and technological change occurring in schools and society has led to research aimed at expanding thought about, and the meaning of, media forms (e.g., Tierney, 1997). It is not a movement toward parsimony or synthesis. The research methodologies often called for or recommended for media literacy study mirror those in literacy studies broadly. A greater emphasis is placed on descriptive and qualitative methods, less on quantitative comparisons (Calfee, 1997), though the latter are still common in psychological studies.10 The measurement tools recommended are open-ended: observation, interview, and artifact collection. The analytic tools applied to the data would be precise and rigorous (e.g., microethnographic, ethnographic, and discourse analyses and thick description; see Kamil et al., 2002); however, such techniques are not easily transferable to assessment purposes.

10 Of 13 chapters on Methods of Inquiry in Flood, Heath, and Lapp (1997), nearly all emphasize more qualitative than quantitative techniques. In the Handbook of Reading Research, Volume III (Kamil et al., 2002), which emphasizes recent or emerging techniques as applied to literacy, 9 of 10 chapters on method focused on qualitative and mixed methods, none on traditional experimental designs: teacher research, programmatic interventions (evaluation designs), historical, narrative, critical, ethnographic, verbal reports and protocol analysis, single-subject case study, discourse and sociocultural, and research synthesis.

The view that has been emerging from this research approach tends toward a broad view of literacy, multiple literacies, or multiliteracies (see Chandler-Olcott & Mahar, 2003; Hagood, 2003; Hobbs, 2001; Mackey, 2003; New London Group, 1996). The view encompasses all forms of media and technology influences as well as sociocultural factors. Each literacy can be defined as a unique constellation of skills and knowledge applied to a context or set of practices. Texts and reading are conceived beyond printed texts (e.g., an audio narrative, a performance, a sequence of events and practices, a soccer match, a musical score, a film, the world). Although I am uncomfortable with a proliferation of literacies, I am less concerned with a proliferation of measurement constructs. Constructs can be seen as reasonably bounded theoretical frameworks and pragmatic judgments of what is valued. The types of assessments that are afforded or produced from these constructs, and the depth and breadth of claims or inferences from assessment results, can be compared and contrasted. These analyses help us refine our purposes, processes, and products. This is a formative conceptualization of assessment predicated on a willingness to refine or discard constructs frequently. We have technical and logical machinery to help us decide whether we are measuring what we are describing and whether those measurements are helping us to better understand learners and the domain of interest or to accomplish the learning ends we strive toward. Thus, I am arguing for the use of disposable constructs to call attention to the nexus of thought and theory with the pragmatics of educational aims. 
In a time of rapid changes in forms, uses, and notions, a research agenda that attempts to describe all possible worlds and interpretations as multiplying literacies seems unmanageable. A strict set of comparisons that pits competing theories against each other experimentally to identify a single, unified construct suggests a universal stability that is also untenable. The forms of literacy, media, and language and our interpretations of them may seem to be stable over time. Closer examination, however, reveals that that perception is often illusory.11 This is especially true when iterative cycles feed back and reconstruct our social constructs. Setting a goal of iteratively identifying what we value, how we measure it, and whether we are succeeding is a strategy worth considering.

11 Saenger (1991) and Venezky (1993) analyzed how changes in print spacing, format, and scribal practice would have demanded different cognitive strategies and knowledge for reading continuous text silently with or without meaning. Early texts were not meant to be read silently at all, but aloud. Furthermore, the social context was that reading was a privilege of the well-to-do classes. The reader may have been a servant or slave calling out the words, that is, not meant to be comprehending the reading at all. Such changes in social practices, which alter the cognitive demands, are likely a continuous, ongoing evolution that a more dynamic assessment strategy needs to accommodate.

EVIDENCE-CENTERED DESIGN

Emerging techniques might be used to examine convergent and divergent constructs across reading and nonreading dimensions. For example, narrative transcends the forms it takes in reading but has consistent features when represented in text genres. Analyzing materials and individual responses to what changes and what is common across modalities (reading, writing, speaking, listening, viewing, and interacting) is a tried and tested research technique. This has been exploited for drawing inferences in psychology, linguistics, and neuroscience. Looking at the intersections of narrative in text, text with visuals, animation, movie or TV programs, oral recitations, and various blends helps to demarcate the adaptations of cognitive systems and what is needed to acquire coordinated, expert subsystems that can construct meaning within and across variants. Correspondingly, contrasting narrative, nonnarrative, and mixed narrative forms (e.g., docudrama) may reveal new measurement challenges and insights about what is interesting and worth valuing as an end in education and therefore celebrating by measuring in new constructs. The evidence-centered design (ECD) approach is one analytic framework that could serve such ends. ECD builds on modern conceptions of validity. It serves as an ongoing research process of evidentiary reasoning, "that is, how we draw inferences about what students know, can do, or understand as more broadly construed, from the handful of particular things they say, do, or make in an assessment setting" (Mislevy, Wilson, Ercikan, & Chudowsky, 2003, p. 489). 
Following terminology provided by Toulmin (1958), the goal of an ECD approach is to use substantive theories and accumulated experience to reason from particular data to particular claims. Inferences from data to claims are justified by warrants, are backed by theory and experience, and are qualified by addressing alternative explanations. In ECD, a design process is used that separates a student model (what we want to say about what a student knows or can do), a task model(s) (situations we can set up in the world in which the student can be observed saying or doing something that gives clues to knowledge and skills in the student model), and an evidence model(s) (the scoring rules or rubrics that determine which observations to select and how to evaluate the selected evidence). One of the great values of an ECD approach is the separation of the three models, which enables many-to-one relationships among student, task, and evidence models. Many evidence models can be applied to a single task datum, and multiple task models can be evaluated against a single scoring model. Coupled with student models that recognize developmental differences in anticipated proficiencies (e.g., the elaboration and critical skills expected of an elementary vs. secondary student who read the same text), the ECD process yields a powerful set of operations for addressing convergent and divergent understanding.

The initiating activity in an ECD approach is domain analysis, the gathering of substantive information about the domain from a variety of sources, followed by domain modeling, which organizes the information into evidentiary arguments in relation to each of the three models. The result of the first two stages is not a unified, single conception of a domain. In fact, quite the opposite is true. Competing, alternate, and contradictory theoretical and experiential organizations of claims and arguments are expected. The goal is to make these arguments more transparent for the next stage. In forming the conceptual assessment framework, the operational elements of an assessment are constructed by selecting from the domain model that set of student variables, evidence, and task models that represent the construct of interest. Competing conceptualizations now can and should either be addressed in the assessment design or be acknowledged as weakening the evidentiary argument. Unaddressed, these options diminish the validity of inferences or claims about the student. Thus, a key concept incorporated into ECD becomes the awareness and influence of construct-relevant versus construct-irrelevant variance. For example, if a text inference would have been obvious had a person spoken the same text aloud (and we can demonstrate that fact empirically), is the difficulty associated with the textual inference construct relevant or irrelevant? It depends on the claims we wish to make about the proficiencies of the learner. 
If we are interested in text processing, then perhaps it is relevant. If we are interested in advanced thinking skills, then perhaps it's not relevant. Thus, different aspects of proficiency and performance can be highlighted by instantiating multiple conceptual assessment frameworks from a given domain model. The conceptual machinery of ECD, by making the entire design process more transparent, allows us to go back and reanalyze or refine the constructs. In this way we improve definitions of what we value to be measured. We can also examine whether the measures provide evidence of valued proficiencies and determine the inferences that the proficiencies warrant. The ECD approach was intentionally conceptualized to be in better alignment with advances in cognitive and instructional sciences, as well as with advances in technology for evoking, capturing, and analyzing complex performance. For example, consider the reading of Miguel de Cervantes's novel Don Quixote versus viewing a film adaptation of the musical stage play Man of


La Mancha. Imagine a general, familiar prompt of the following form: "How would you characterize the theme (or tone or setting) in this text excerpt? ... In this film scene? Are they similar or different? Explain how and why. Describe elements of the text/film that support your responses." Some questions on the construction of meaning, such as theme, might transcend the representational format. Other aspects, such as perceived tone or setting, may differ significantly. The evidence collected and scoring rules could be, though need not be, quite different, depending on what one considers construct relevant versus construct irrelevant variance. If one's interest were the student's proficiency in recognizing a canonical theme, tone, or setting as taught by the instructor, then responding to a prompt based on either the text or film with the anticipated response is appropriate. If the interest were how a theme, setting, or mood is conveyed using literary or cinematic techniques, then the prompt and scoring rules can be revised. Thus, one can systematically manipulate whether one could answer adequately or proficiently based on either reading the text or viewing the film versus rules that are contingent on presenting evidence from one form or the other. As this example illustrates, ECD does not require assessments that are unfamiliar or radically different from those techniques commonly used in classrooms and on outcome assessments. The previous prompt could be taken from any high school language arts classroom or advanced placement exam. The added value stems from the transparency of the articulation of the relationships of student proficiency to task to evidence rules to claims. This permits comparisons across representation formats, reducing the risk of inferring proficiencies falsely. 
If one's interest were to isolate a construct labeled reading comprehension, then background knowledge about Don Quixote from movies and culture in general would seem to be construct irrelevant and might interfere with inferring proficiency in constructing meaning from the texts. The task itself would need to specify clearly and provide opportunities to construct a response that was derived primarily from the text, for example, requiring specific language or quotes cited from the text. Of course, there could be considerable overlap between the text and film representations, so this puts an additional technical requirement on the learner and assessment designer and might require ample opportunity to search the text in constructing the response. These examples underscore some of the limitations of current reading comprehension constructs that attempt to restrict themselves to visual text processing. The risk of construct-irrelevant intrusions (e.g., prior experience or background knowledge) privileges localized inferences and meaning construction because, as broader, more complex meaning construction is probed, the opportunities for construct-irrelevant variance to intrude via background knowledge and multiple media representations are multiplied. In addressing this constraint, the general approach has been to select short texts and tasks delivered in a single sitting. Furthermore, the texts are chosen so as to be somewhat novel at the time of presentation to the majority of test takers, though the content and context are expected to be part of learners' general cultural knowledge. This approach has proven efficient and functional for large-scale assessment purposes (e.g., NAEP assessment design). Moving beyond these models will present challenges. However, constructs of comprehension that permit the contrasts of different representational forms may provide richer evidence about both specific and general proficiencies.
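
The separation of student, task, and evidence models described above can be illustrated with a minimal sketch in Python. It is offered only as an illustration under stated assumptions, not as an implementation of ECD as specified by Mislevy et al. (2003): every class name, variable name, and scoring rule below is hypothetical, and the responses are assumed to have been coded by a human rater beforehand. The sketch shows how the same observed response to the Don Quixote/Man of La Mancha prompt can support different claims, depending on whether the evidence rule treats knowledge drawn from the film as construct relevant or construct irrelevant.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class StudentVariable:
    """Student model: a proficiency we want to make a claim about."""
    name: str


@dataclass(frozen=True)
class Task:
    """Task model: a situation in which the student can be observed."""
    prompt: str
    modality: str  # "text" or "film" in this toy example


@dataclass(frozen=True)
class Response:
    """Hypothetical, already-coded observation of one student's answer."""
    task: Task
    identifies_theme: bool
    cites_evidence_from: FrozenSet[str]  # modalities the answer draws on


@dataclass(frozen=True)
class EvidenceRule:
    """Evidence model: a scoring rule tied to one student variable."""
    variable: StudentVariable
    score: Callable[[Response], int]


THEME = StudentVariable("recognizes theme")
TEXT_MEANING = StudentVariable("constructs meaning from printed text")

# Rule 1: modality does not matter, so film knowledge is construct relevant.
theme_rule = EvidenceRule(THEME, lambda r: int(r.identifies_theme))

# Rule 2: credit only when the answer cites the text, so knowledge drawn
# solely from the film is treated as construct irrelevant.
text_rule = EvidenceRule(
    TEXT_MEANING,
    lambda r: int(r.identifies_theme and "text" in r.cites_evidence_from),
)


def score_response(response: Response, rules: List[EvidenceRule]) -> Dict[str, int]:
    """Apply every evidence rule to the same observed response."""
    return {rule.variable.name: rule.score(response) for rule in rules}


PROMPT = "How would you characterize the theme? Cite supporting elements."
film_task = Task(PROMPT, "film")
text_task = Task(PROMPT, "text")

# Two hypothetical coded responses: one grounded in the film, one in the text.
film_only = Response(film_task, True, frozenset({"film"}))
text_based = Response(text_task, True, frozenset({"text"}))

rules = [theme_rule, text_rule]
film_scores = score_response(film_only, rules)
text_scores = score_response(text_based, rules)
print(film_scores)
print(text_scores)
```

Under these assumed rules, the film-based response earns credit for recognizing the theme but not for constructing meaning from printed text, whereas the text-based response earns credit for both. Keeping the rules separate from the tasks is what allows the same prompt to serve either construct.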

FINAL THOUGHTS

Venezky (1997) seemed confident that books and texts, enhanced by electronic forms as they emerge, would be appreciated and valued as long as educators and learners adapted to the affordances they provide. He wrote:

We have no memory today of the time when textbooks were made of clay tablets or quires of papyrus, nor do we lament their passing, even though they lasted nearly a thousand years. There will probably be a time in the future when memory of books printed on paper sheets gathered between covers will disappear. Audiences in theaters, should there still be such assembly places, will express mild surprise and benign amusement when viewing videos of reading in the 20th century, and librarians (or their equivalents) will puzzle over the tactical problems in our time of providing space for so many millions of printed volumes. Yet we have no reason to believe the cognitive processes involved in the writing or reading of good fiction will change dramatically in the near or distant future, nor the need to teach plot, character, author intent, or any component of the fictive process. In other words, there is no imminent threat to the content of literary textbooks from technology. Its future continues to lie within the hands of educators who select the canon and method of presentation. Whether the aesthetics of literature will continue to be appreciated, even by a few, depends more on the backbone of the language arts educators than it does on how many pixels can be packed onto a delimited surface or how cheaply solid state devices can be manufactured. But however safe the literary textbook may be for the immediate future, the technologies described here offer opportunities for advancing literary instruction that should not be ignored. All of them place more resources in the hands of teachers and students than ever before, and require, for full utilization, more independent exploration and testing by students than is common now in language arts classrooms. Students will need encouragement and instruction in how to use on-line information such as author comments, instructor interpretations, definitions and so on, to improve their understanding and appreciation of fiction. ... We are far beyond the era of computer-as-threat; each new technology can be evaluated for its potential in enhancing literacy instruction, as well as for its potential drawbacks. In a world in which only a small proportion of even the college-educated adults read serious fiction, we can all use all the help we can get in making reading enjoyable, rewarding, and enhancing. (pp. 534-535)

Many of the authors reviewed in the chapter share Venezky's optimism that the reading of texts, fiction or other, will not soon disappear in a posttypographical age. However, there are risks. After all, the continuation of visual text processing as a common social practice requires that succeeding generations continue to acquire complex cognitive modules to ensure rapid automatic word recognition and fluency in silent reading of continuous prose pieces of considerable duration (e.g., chapters in this volume), with varying cognitive challenges across languages and writing systems (Taylor & Olson, 1995; Venezky, 1995, 1999) in a world still largely non-literate with respect to written/printed scripts. So much of U.S. cultural and intellectual history is still contained in books that the task of transferring them to alternative symbolic representations (oral readings, films, animations, even hypertexts), though ongoing, is not likely to be completed in the near term.12 Also, as argued, transfer to other representations would be altering their meaning and interpretation as printed documents. Thus, those of us who were privileged to learn visual text processing have access, while others not so trained would be denied. Equity and fairness then suggest that we should provide education so that anyone could have a choice of access based on print text reading proficiencies. However, the era of printed text as the primary receptacle and transmission technology of cultural knowledge is passing, presaging greater urgency in promoting print media so that the next generation of learners will value and seek it out in all its forms and the intellectual traditions it subsumes.

12 Technically, everything in printed text form can be easily translated into spoken language via an optical scan, if necessary, and a text-to-speech synthesizer. However, universal accessibility, ease of use, and acceptance of such technologies for the non-disabled general public is probably still some years in the future.

REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.


Aufderheide, P. (2001). Media literacy: From a report of the National Leadership Conference on Media Literacy. In R. Kubey (Ed.), Media literacy in the information age: Current perspectives (Vol. 6, pp. 79-86). New Brunswick, NJ: Transaction Publishers. Bazalgette, C. (2001). An agenda for the second phase of media. In R. Kubey (Ed.), Media literacy in the information age: Current perspectives (Vol. 6, pp. 69-78). New Brunswick, NJ: Transaction Publishers. Berninger, V. W., & Richards, T. L. (2002). Brain literacy for educators and psychologists. Boston: Academic Press. Bisanz, G. L., Das, J. P., Varnhagen, C. K., & Henderson, H. K. (1992). Structure and components of reading times and recall for sentences in narratives: Exploring changes with age and reading ability. Journal of Educational Psychology, 84, 102-114. Bolter, J. D. (1998). Hypertext and the question of visual literacy. In D. Reinking, M. C. McKenna, L. D. Labbo, & R. D. Keiffer (Eds.), Handbook of literacy and technology: Transformations in a post-typographic world (pp. 3-14). Mahwah, NJ: Lawrence Erlbaum Associates. Britton, B. K., & Graesser, A. C. (Eds.). (1996). Models of understanding text. Mahwah, NJ: Lawrence Erlbaum Associates. Calfee, R. (1997). Assessing development and learning over time. In J. Flood, S. B. Heath, & D. Lapp (Eds.), Handbook of research on teaching literacy through the communicative and visual arts (pp. 144-166). New York: Macmillan. Chandler-Olcott, K., & Mahar, D. (2003). "Tech-savviness" meets multiliteracies: Exploring adolescent girls' technology-related practices. Reading Research Quarterly, 38(3), 356-385. Christ, W. G. (1997). Defining media education. In W. G. Christ (Ed.), Media education assessment handbook (pp. 3-22). Mahwah, NJ: Lawrence Erlbaum Associates. Cipolla, C. (1969). Literacy and development in the west. Harmondsworth, England: Penguin. Clanchy, M. T. (1993). From memory to written record: England 1066-1302 (2nd ed.). Cambridge, MA: Blackwell. 
Eisenstein, E. L. (1979). The printing press as an agent of change: Communications and cultural transformations in early modern Europe. Cambridge, UK: Cambridge University Press. Farr, R., & Jongsma, E. (1997). Accountability through assessment and instruction. In J. Flood, S. B. Heath, & D. Lapp (Eds.), Handbook of research on teaching literacy through the communicative and visual arts (pp. 592-606). New York: Macmillan. Flood, J., Heath, S. B., & Lapp, D. (Eds.). (1997). Handbook of research on teaching literacy through the communicative and visual arts. New York: Macmillan. Goody, J., & Watt, I. (1968). The consequences of literacy. In J. Goody (Ed.), Literacy in traditional societies (pp. 27-68). Cambridge, UK: Cambridge University Press. Graff, H. J. (1979). The literacy myth: Literacy and social structure in the nineteenth-century city. New York: Academic Press. Graff, H. J. (1987). The legacies of literacy: Continuities and contradictions in Western culture and society. Bloomington: Indiana University Press. Hagood, M. C. (2003). New media and online literacies: No age left behind. Reading Research Quarterly, 38(3), 387-391. Hobbs, R. (2001). Expanding the concept of literacy. In R. Kubey (Ed.), Media literacy in the information age: Current perspectives (Vol. 6, pp. 163-183). New Brunswick, NJ: Transaction Publishers. Hobbs, R., & Frost, R. (2003). Measuring the acquisition of media-literacy skills. Reading Research Quarterly, 38(3), 330-355. International Association for the Evaluation of Educational Achievement. (2000). Framework and specifications for PIRLS assessment 2001. Chestnut Hill, MA: International Study Center, Boston College. Jonassen, D. H. (Ed.). (2001). Handbook of research for educational communications and technology. Mahwah, NJ: Lawrence Erlbaum Associates.


Kamil, M., Mosenthall, P., Pearson, D., & Barr, R. (Eds.). (2002). Handbook of reading research, Volume III. Mahwah, NJ: Lawrence Erlbaum Associates. Kamil, M. L., & Lane, D. M. (1998). Researching the relation between technology and literacy: An agenda for the 21st century. In D. Reinking, M. C. McKenna, L. D. Labbo, & R. D. Keiffer (Eds.), Handbook of literacy and technology: Transformations in a post-typographic world (pp. 323-342). Mahwah, NJ: Lawrence Erlbaum Associates. Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge, UK: Cambridge University Press. Kittay, J. (1991). Thinking through literacies. In D. R. Olson & N. Torrance (Eds.), Literacy and orality (pp. 165-173). Cambridge, UK: Cambridge University Press. Knupfer, N. N., & McLellan, H. (2001). Descriptive research methodologies. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (pp. 1196-1213). Mahwah, NJ: Lawrence Erlbaum Associates. Krendl, K. A., Ware, W. H., Reid, K. A., & Warren, R. (2001). Learning by any other name: Communication research traditions in learning and media. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (pp. 93-111). Mahwah, NJ: Lawrence Erlbaum Associates. Kubey, R. (Ed.). (2001). Media literacy in the information age: Current perspectives (Vol. 6). New Brunswick, NJ: Transaction Publishers. Lemke, J. L. (1998). Metamedia literacy: Transforming meanings and media. In D. Reinking, M. C. McKenna, L. D. Labbo, & R. D. Keiffer (Eds.), Handbook of literacy and technology: Transformations in a post-typographic world (pp. 283-302). Mahwah, NJ: Lawrence Erlbaum Associates. Mackey, M. (2003). Researching new forms of literacy. Reading Research Quarterly, 38(3), 403-407. Masterman, L. (2001). A rationale for media education. In R. Kubey (Ed.), Media literacy in the information age: Current perspectives (Vol. 6, pp. 15-68). New Brunswick, NJ: Transaction Publishers. Messaris, P. 
(2001). Visual "literacy" in cross-cultural perspective. In R. Kubey (Ed.), Media literacy in the information age: Current perspectives (Vol. 6, pp. 135-162). New Brunswick, NJ: Transaction Publishers. Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (Vol. 50, pp. 13-103). New York: American Council on Education. Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessment. Educational Researcher, 23, 13-23. Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749. Mislevy, R. J., Wilson, M. R., Ercikan, K., & Chudowsky, N. (2003). Psychometric principles in student assessment. In T. Kellaghan & D. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 489-531). Dordrecht, The Netherlands: Kluwer Academic Press. National Assessment Governing Board. (2002). Reading framework for the 2003 National Assessment of Educational Progress. Washington, DC: U.S. Department of Education. National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Washington, DC: National Institute of Child Health and Human Development. New London Group. (1996). A pedagogy of multiliteracies: Designing social futures. Harvard Educational Review, 66, 60-92. Olson, D. R. (1991). Literacy as a metalinguistic activity. In D. R. Olson & N. Torrance (Eds.), Literacy and orality (pp. 251-270). Cambridge, UK: Cambridge University Press.


Olson, D. R., & Torrance, N. (Eds.). (1991). Literacy and orality. Cambridge, UK: Cambridge University Press. Organization for Economic Cooperation and Development. (2000). Measuring student knowledge and skills: The PISA 2000 assessment of reading, mathematics, and scientific literacy. Paris: Author. Pattanayak, D. P. (1991). Literacy: An instrument of oppression. In D. R. Olson & N. Torrance (Eds.), Literacy and orality (pp. 90-104). Cambridge, UK: Cambridge University Press. Potter, W. J. (2001). Media literacy (2nd ed.). Thousand Oaks, CA: Sage. Pressley, M., & Afflerbach, P. (1995). Verbal protocols of reading: The nature of constructively responsive reading. Mahwah, NJ: Lawrence Erlbaum Associates. Reinking, D. (1998). Introduction: Synthesizing technological transformations of literacy in a post-typographic world. In D. Reinking, M. C. McKenna, L. D. Labbo, & R. D. Keiffer (Eds.), Handbook of literacy and technology: Transformations in a post-typographic world (pp. xi-xxx). Mahwah, NJ: Lawrence Erlbaum Associates. Reinking, D., McKenna, M. C., Labbo, L. D., & Keiffer, R. D. (Eds.). (1998). Handbook of literacy and technology: Transformations in a post-typographic world. Mahwah, NJ: Lawrence Erlbaum Associates. Saenger, P. (1991). The separation of words and the physiology of reading. In D. R. Olson & N. Torrance (Eds.), Literacy and orality (pp. 198-214). Cambridge, UK: Cambridge University Press. Salomon, G. (1997). Of mind and media: How culture's symbolic forms affect learning and thinking. Phi Delta Kappan, 78(5), 375-380. Scholes, R. J. (1995). Orthography, vision, and phonemic awareness. In I. Taylor & D. R. Olson (Eds.), Scripts and literacy: Reading and learning to read alphabets, syllabaries, and characters (pp. 359-373). Dordrecht, The Netherlands: Kluwer Academic Publishers. Schramm, W. (1977). Big media, little media. Beverly Hills, CA: Sage. Scribner, S., & Cole, M. (1981). The psychology of literacy. Cambridge, MA: Harvard University Press. 
Silverblatt, A. (2001). Media literacy: Keys to interpreting media messages (2nd ed.). Westport, CT: Praeger. Taylor, I., & Olson, D. R. (1995a). An introduction to reading the world's scripts. In I. Taylor & D. R. Olson (Eds.), Scripts and literacy: Reading and learning to read alphabets, syllabaries, and characters (pp. 1-15). Dordrecht, The Netherlands: Kluwer Academic Publishers. Taylor, I., & Olson, D. R. (Eds.). (1995b). Scripts and literacy: Reading and learning to read alphabets, syllabaries, and characters. Dordrecht, The Netherlands: Kluwer Academic Publishers. Tierney, R. J. (1997). Learning with multiple symbol systems: Possibilities, realities, paradigm shifts and developmental considerations. In J. Flood, S. B. Heath, & D. Lapp (Eds.), Handbook of research on teaching literacy through the communicative and visual arts (pp. 286-300). New York: Macmillan. Toulmin, S. (1958). The uses of argument. Cambridge, UK: Cambridge University Press. Tynan, K. (1998). Literacy in a digital world: Teaching and learning in the age of information. Mahwah, NJ: Lawrence Erlbaum Associates. Venezky, R. L. (1984). The development of literacy in the industrialized nations of the West. In R. Barr, M. Kamil, P. Mosenthall, & D. Pearson (Eds.), Handbook of reading research (pp. 46-67). New York: Longman. Venezky, R. L. (1992). Textbooks in school and society. In P. W. Jackson (Ed.), Handbook of research on curriculum (pp. 436-461). New York: Macmillan. Venezky, R. L. (1993). History of interest in the visual component of reading. In D. M. Willows & R. S. Kruk (Eds.), Visual processes in reading and reading disabilities (pp. 3-30). Hillsdale, NJ: Lawrence Erlbaum Associates. Venezky, R. L. (1995). How English is read: Grapheme-phoneme regularity and orthographic structure in word recognition. In I. Taylor & D. R. Olson (Eds.), Scripts and liter-

172

SABATINI

acy: Reading and learning to read alphabets, syllabaries, and characters (pp. 111-130). Dordrecht, The Netherlands: Kluwer Academic Publishers. Venezky, R. L. (1997). The literary text: Its future in the classroom. In J. Flood, S. B. Heath, & D. Lapp (Eds.), Handbook of research on teaching literacy through the communicative and visual arts (pp. 528-535). New York: Macmillan. Venezky, R. L. (1999). The American way of spelling: The structure and origins of American English orthography. New York: Guilford. Warschauer, M. (1999). Electronic literacies. Mahwah, NJ: Lawrence Erlbaum Associates. Wulff, S. (1997). Media literacy. In W. G. Christ (Ed.), Media education assessment handbook (pp. 123-142). Mahwah, NJ: Lawrence Erlbaum Associates.

9

From Real Virtuality in Lascaux to Virtual Reality Today: Cognitive Processes With Cognitive Technologies

David Mioduser
Tel-Aviv University

Cogito ergo sum
(Descartes, 1637)

Digito ergo =(SUM)
(ABC, 1939; ENIAC, 1946)

Let no attempt be made to sap the strength from the meaning of the relation: relation is mutual
(M. Buber, 1958, p. 20)

Fifteen thousand years ago, the Paleolithic denizens of the Lascaux, Pech Merle, or Altamira caves labored to represent aspects of reality that were vital to their life: the animals on which they fed. It has been suggested that a crucial motivation for these creations was their belief in the power of representations and imitations of reality to affect and modify aspects of that reality (Fisher, 1963; Hauser, 1951). The hunter who masqueraded as the animal he wished to hunt, and who through identification with the animal intended to increase the yield of the hunt, used the power of imitation to control reality. The cave paintings were not just representations of reality; they were conceived as reality itself (real virtuality?), and any action directed at them (such as the throwing of arrows or the casting of spears) signified an action affecting a reality destined to take place (virtual reality?). The representations are a human creation, but their presentation on the walls of the cave evoked a complex relationship between the creator and his creation with regard to the represented reality. In more contemporary terms, this situation can be seen as ancient evidence of the reciprocal relationship between the means and products of knowledge technology and the thoughts and beliefs of the creator of these technological means and products.

Perceptions, beliefs, and knowledge about ourselves, about the world around us, and about the cosmos have undergone a number of essential transformations since humans first endowed their cave paintings with qualities representative of the actual hunt. A common claim is that these transformations are not a consequence of any change in "hardware" (i.e., in the physiological characteristics of the human brain). For more than 100,000 years, the human brain has remained stable at a volume of about 1,300 cm³ and in the configuration found in modern humans, Homo sapiens sapiens. Yet with the same physiological infrastructure, human beings in different periods have created different or even opposing conceptions and systems of knowledge. A typical example is the series of drastic transformations in conceptual perspective (formal theories) and intuitive approach (personal conceptions) that characterize the evolution and development of humans' perception of the universe. Some other examples include the following:

A well-known Egyptian drawing portrays the divinity of the firmament as an arch (the sky) wrapped around the Earth, who eats the Sun every day in the west, thereby causing night, and releases it in the east for the beginning of a new day.

After about 2,000 years of the prevailing cosmological theory of Aristotle, according to whom the celestial elements are organized in perfect order in eight layers that revolve around the Earth, Copernicus removed the Earth from the center of the universe and shifted it to the heavens.
Additional conceptual transformations followed, from Galileo to Kepler to Descartes to Newton and so on, complementing, expanding, contradicting, and replacing each other. And conceptual transformations continued not just with regard to the cosmos but also with regard to every aspect of the natural and artificial worlds in which we live (Kuhn, 1970; Simon, 1985).

Of the range of factors that presumably influence the development of knowledge, theories, beliefs, and conceptions, the present chapter addresses one specific factor: the knowledge technologies with which we interact. This chapter elaborates on the claim that the development of knowledge, skills, and cognitive processes is influenced by the demands and constraints presented by available knowledge technologies (Olson, 1985; Pea, 1994). From this perspective, cognitive processes are viewed not just as a basic characteristic of mind but also as a consequence of the interaction between cognitive structures and cognitive technologies.


At this point, it is appropriate to define knowledge technology and cognitive technology because these two terms occur throughout the chapter. Technology may be defined as human knowledge applied in solving problems and in the creation of the artificial world (Mioduser, 1998). Since the first creatures we call humans began to solve problems by means of modifying nature, they (we) have been engaged in the exciting enterprise of devising the artificial world. Although this world was once referred to as the manmade world, I prefer to use a broader definition here: the mind-made world, that is, all products of the human mind (either symbolic or physical) that aim to cope with problems and satisfy needs (e.g., a theory, a chair, a computer). Technology serves to complement and augment natural human abilities, such as vision beyond the distance at which the unaided eye can see or at microscopic scales, or movement from one place to another at speeds that far exceed the speed of walking or running (McLuhan, 1964).

Knowledge or cognitive technology may be defined as all the means (either instrumental or methodological) that contribute to the completion and expansion of the natural abilities of the human mind, in processes relating to the handling of knowledge, thinking, learning, and solving problems (Pea, 1985). Examples of cognitive technologies are writing and all kinds of symbol systems, the various types of measuring instruments and data collection tools, calculation tools, means for storing and processing knowledge, means for simulating events, phenomena, and processes, and planning methods. Technologies for the handling of knowledge (generation, organization, storage, processing, retrieval, and transmission) are among the quintessential products of human reason. Between the Lascaux cave paintings and the feats of virtual reality that are materializing before our very eyes lies a rich history of the development of knowledge technologies.
Inspection of the reciprocal relationship between knowledge technologies and human reason reveals a complex system: (a) knowledge technology is a product of reason; (b) while being knowledge itself, it is intended for handling knowledge; (c) the handling of knowledge itself, again, is a vital function of reason, which (d) uses technology (its product) to carry out that same function. These intricate relationships have long since preoccupied those interested in knowledge technologies, and such reflection is of unparalleled relevance in today's microprocessor-based digital technology world. In the sections that follow, I survey various interpretations of these reciprocal relationships and the way in which these are reflected in learning environments and in computer-based teaching aids. The survey of the different approaches includes references to published theoretical and empirical work as well as brief descriptions of research projects by our Knowledge Technology Lab researchers at Tel-Aviv University's School of Education.


INTERPRETATIONS OF THE RECIPROCAL RELATIONSHIPS BETWEEN COGNITIVE PROCESSES AND KNOWLEDGE TECHNOLOGIES

Key questions regarding the nature of the reciprocal relationships between technology and cognitive processes were already raised more than 2,000 years ago by Plato. In response to a question posed by his student, Phaedrus, Socrates refers to one of the gods in Egypt who presented to the King his recent invention, writing: "Here, O king, is a branch of learning that will make the people of Egypt wiser and improve their memories: my discovery provides a recipe for memory and wisdom" (Hamilton & Cairns, 1961, p. 520). Writing is presented as a new and powerful cognitive technology that may liberate the mind from the necessity of carrying a large number of details and facts. By allowing the storage of knowledge elsewhere, "outside" the mind, it becomes possible to divert valuable cognitive energy to more complex thought processes. But King Thamus's response was not long in coming:

If men learn this, it will implant forgetfulness in their souls; they will cease to exercise memory because they rely on that which is written, calling things to remembrance no longer from within themselves, but by means of external marks. What you have discovered is a recipe not for memory, but for reminder. ... By telling them of many things without teaching them you will make them seem to know much, while for the most part they know nothing. (Hamilton & Cairns, 1961, p. 520)

This sweeping criticism of Theuth's invention is interesting for a number of reasons. First, it clearly establishes the parties to an argument that is still very topical nowadays, when computer and communication technologies are being introduced into schools. One party stresses the potential inherent in a technology designed to carry out processes that, up until its appearance, were performed exclusively by human reason.
The other party points out the dangers involved in relegating a growing number of human mental functions to technology, lest this lead to degeneration of human cognition. Second, the king doubts the quality of the wisdom created by means of the technology. Replacing students' own involvement in creating knowledge, the appropriation of externally represented knowledge can, in his view, only lead to "seemingly rather than truly wise" people. Finally, the king warns of another factor that may generate seemingly knowledgeable but actually ignorant people: the depreciation of teaching (as perceived by the philosopher) as an essential means for negotiating knowledge. Again, this is a very contemporary issue in view of the claim that


technology is now making it possible to create environments in which the student learns without being taught (e.g., "Logo"; Papert, 1980). A number of valuable conclusions can be drawn from Plato's discussion regarding writing. First, there is the very idea that the interaction with knowledge technologies affects cognition at various levels (e.g., regarding the nature of acquired knowledge and skills; the strategies for the acquisition, storage, and retrieval of knowledge and skills; or the mental modeling of aspects from the physical and social worlds). Second, the perception of this interaction and its value is affiliated with, and shaped by, people's overall philosophical, cultural, and moral conceptions regarding cognition, learning, and teaching. Finally, the need to reformulate and reconsider essential questions regarding this interaction becomes crucial whenever a new relevant technology is developed. Today's information and communication technologies embody processes once the sole preserve of human cognition. As such, the interaction between natural (human) and artificial cognitive processes becomes a fascinating field for inquiry and reflection. For more than five decades, since the appearance of the first electronic digital computers and associated technologies, these complex reciprocal relationships have received various interpretations in the form of both theoretical and research work. In addition, these different approaches have had their counterparts regarding the educational implications of the integration of the technology into teaching and learning processes. In the following sections, I briefly refer to nine approaches for analyzing and discussing these relationships, along with their implications for education. 
From these different perspectives, cognitive technologies are conceived as supporting, respectively, the acquisition, extension, consolidation, externalization, internalization, construction, collaborative creation, compensation, and evolution of cognitive processes and skills (see Fig. 9.1). Regarding practical implications, the various approaches have found their way into the educational arena in the form of computer- or network-based learning environments and tools. Next, I refer to each approach, along with examples of its instantiation by means of educational technologies.

[Figure 9.1. Diagram: COGNITIVE TECHNOLOGY supports the acquisition, extension, consolidation, externalization, internalization, compensation, construction, collaborative creation, and evolution of COGNITIVE RESOURCES (e.g., knowledge, skills, processes, metacognition).]

FIG. 9.1. Approaches to the description of reciprocal relations between cognition and technology.

Acquisition

In the 6th century, Pope Gregory claimed that the sculptural pieces in the churches are the books for those who are not able to read. This claim is archetypal of the deeply rooted perception of cognitive technologies as powerful means of supporting the acquisition of knowledge and skills. This perception is embedded in the rationales for the (educational) use of a wide range of tools and means throughout history (e.g., visual and mnemonic aids in preliterate societies, the book for the last 500 years, or today's computer-assisted instruction [CAI] software packages). In pre-Gutenberg times, churches were considered encyclopedias of glass and stone. These immersive environments fulfilled didactic functions, serving as content conveyers (e.g., of knowledge, values, norms) and memory agents by means of systematic resources (e.g., conventions regarding color usage, motifs, order and sequence of images). A popular learning aid among scholars and liberal professionals in the Middle Ages was the Ad Herennium, a book that aimed to teach techniques and strategies for empowering memory, a key cognitive resource in those times (Burke, 1985).

In the transition from the print to the digital era, the programmed instruction model was developed (Garner, 1966). The model incorporates principles based on the behaviorist approach toward learning (e.g., reinforcements, repeated practice) and translates them into a systematic repertoire of instructional methods and techniques (e.g., meticulous dissection of the topic or task into discrete units or frames, formulation of clear operational objectives and success criteria for each unit, precise definition of sequences of units, and planning of reinforcements). First instantiated in mechanical teaching machines (e.g., Skinner machines), the


model found its optimal instantiation with the advent of computer technology, in the form of classical CAI (Venezky & Osin, 1991). Notwithstanding the numerous transformations in the model components over several decades (e.g., in the complexity of the underlying branching/adaptive algorithm; in the incorporation of interactive modes and of multiple representational resources such as text, still images, sound, and video; or in the didactic value of the feedback reinforcement supplied), its essential principles still characterize many of the instructional products being developed today, even within the Internet environment (Mioduser, Nachmias, Oren, & Lahav, 1999). These examples, and countless additional teaching means developed over time, are the products of a common approach: the perception of cognitive technology as a powerful facilitator for the acquisition of knowledge and skills.

Extension of Natural Capabilities

This approach adopts the perception of any technology as an extension of people's natural capabilities (McLuhan, 1964). Tools and machines extend and augment the activities of the muscular system, means of locomotion extend the locomotive ability of the legs, and means of shelter (clothing, housing) provide more efficient protection than that naturally provided by the skin. In a similar fashion, cognitive technologies expand and augment the functions of the central nervous system: They aid the memorization of large quantities of knowledge through means external to the mind (books, CDs), the performance of fast and precise calculations (the multiplication table, calculator), or the running of knowledge-intensive decision-making processes (expert systems). Writing is an instance of a technology that plays a crucial role in extending natural capabilities, as discussed earlier. Leibnitz, in the 17th century, referred to various symbol systems such as "...
words; letters; chemical, astronomical, Chinese and hieroglyphical figures; musical, stenographic, arithmetic and algebraic marks; and all others we use for things when thinking ... signs are the more useful the more they express the concept of the thing they denote, so that they may do service not only in representation but also in reasoning" (Dascal, 1987, p. 181). Today, generic tools for processing words, images, sounds, or numbers are part of the students' learning environment in all subject areas, extending the learner's capability to carry out complex manipulations of information in varied representational forms. In addition, communication networks enhance students' ability to interact with large repositories of knowledge, as well as with people (e.g., teachers, peers, experts), beyond school time and space constraints. The extension of natural capabilities


approach is implemented in learning environments in several modalities or through a number of metaphors:

• The Olympic games motto metaphor (i.e., Citius, Altius, Fortius) perceives technology simply as a means of empowering (e.g., amplifying) the learners' cognitive abilities.
• The division of labor metaphor perceives the learner as someone who concentrates on performing high-order cognitive functions by delegating to the technology functions that are routine, repetitive, time-consuming, and computation-intensive.
• The artifactual extension metaphor conceives of the technological device in use by the individual as inscribed within his or her cognitive contour (e.g., the blind person's cognitive contour includes the cane, or the searcher's cognitive contour includes the browser, the search engine, the complex communication network, and the servers containing the target information) (Gibson, 1986). In an extreme version of this metaphor, Logan (2000) suggests that "if media are extensions of our psyches, the interconnectivity of the internet means that its users will become extensions of each other's psyches" (p. 31).

Activation and Consolidation

In this approach, technology constitutes a facilitating milieu for the activation and consolidation of cognitive skills and abilities at various stages of the learner's cognitive development (diSessa, 2000; Lajoie & Derry, 1993). The unique features of the technology supply opportunities, as well as demands, for the activation of cognitive resources that are otherwise either latent or underdeveloped. Classic examples of this approach relate to the use of generic tools, such as spreadsheets, word or image processors, or search engines. Most of the core skills involved in using these tools are normally acquired without any connection with computer technology (e.g., writing, drawing, information search, or number manipulation skills).
But the claim is that the use of the technological tools triggers the emergence of qualitatively different manifestations of these core skills (McCullough, 1996). In the spreadsheet example, the abilities required for the definition of the mathematical content of each individual cell are not exclusive to this tool. But the spreadsheet enables the development of more sophisticated abilities (e.g., the perception of the tool as phenomenon-modeling means, therefore implying the perception of the problem to be solved as a modelable entity; the ability to define a problem in terms of an array of cells, their content, and the link-paths among them; or the ability to selectively manipulate cells' values to explore hypotheses).
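
As a hedged illustration of the spreadsheet abilities just described, the following minimal Python sketch represents a problem as a set of cells, their contents, and the link-paths among them, and then changes one cell's value to explore a hypothesis. It is not taken from any of the studies cited; the class name Sheet, the cell names, and all the numbers are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from typing import Callable, Dict, Union


class Sheet:
    """A toy spreadsheet (hypothetical): a cell holds a number or a formula."""

    def __init__(self) -> None:
        self.cells: Dict[str, Union[float, Callable[[Sheet], float]]] = {}

    def set(self, name: str, content: Union[float, Callable[[Sheet], float]]) -> None:
        # Store either a plain value or a formula that reads other cells.
        self.cells[name] = content

    def get(self, name: str) -> float:
        content = self.cells[name]
        # A formula is evaluated on demand, following its links to other cells.
        return content(self) if callable(content) else content


# Model a simple savings problem as linked cells (all values are assumed).
model = Sheet()
model.set("deposit", 100.0)
model.set("rate", 0.05)
model.set("years", 2.0)
model.set(
    "balance",
    lambda s: s.get("deposit") * (1 + s.get("rate")) ** s.get("years"),
)

baseline = model.get("balance")

# Explore a hypothesis by selectively changing one cell's value.
model.set("rate", 0.10)
what_if = model.get("balance")

print(round(baseline, 2), round(what_if, 2))
```

In this sketch the "balance" cell is never edited directly; changing "rate" is enough to change its value, which is the sense in which the problem has been defined as a modelable entity rather than as a single calculation.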


Externalization

Cognitive technologies allow the processes and products of cognitive activity to be presented externally to the mind. We make use of various types of artifacts (such as writing, sketches, diagrams, or graphs) as thinking aids for the presentation, organization, analysis, or summary of an idea or of a problem solution (Miller, 1984). Various types of computerized tools support the presentation of cognitive processes. For example, information retrieval software allows texts and images received during a search to be copied and added to one's own electronic notebook to create a personal document relevant to the current task (Oren & Chen, 1992). Moreover, the queries and searches made are also recorded. These documents constitute a picture or reflection of the search process, of the unfolding and development of the queries, of decisions regarding the relevance of the material, and of decisions regarding the structure of the document formed.

Another form of externalization support consists of computer tools that incorporate mechanisms aiming to explicitly encourage students' reflection on their performance. An example of this kind of tool is a program for the investigation of meteorological phenomena and their prediction ("The Weather Machine"; Mioduser, Venezky, & Gong, 1998). The learning is conducted as a dialogue between the learner and a computerized partner. At a certain point, they both propose a weather forecast based on the meteorological data defined at a preceding stage. If there are discrepancies between the two forecasts, the student is asked to examine the events included in his or her forecast that the computerized expert did not include and to point out which specific factors (such as temperature, cloudiness, or barometric pressure) led the student to the inclusion of these events.
The computerized expert then examines the possible effect of the factors mentioned by the student, presents its feedback (e.g., "It is possible that factor X will cause strong winds from the east, but because of the interaction between factor X and factor Y it is not plausible for this phenomenon to occur"), and asks the student to suggest how to proceed with the discussion. In programs that incorporate such mechanisms, the processes of externalization, thinking-about, and reflection are intentionally and explicitly supported by external aids.

Perhaps a strong example of a situation in which technology supports the externalization of thinking processes is computer programming, in particular, educational programming (diSessa, 2000; Papert, 1980). Program code is an explicit mirror of the programmer's problem-solving process. When a bug occurs in a program, this is in fact a programmer's (conceptual) bug (e.g., a branching point incorrectly defined, a missing variable, or an endless-loop construct). Navigating the code, reconstructing the (unexpected) process, and locating and repairing faulty segments are part of a process in which the program and its programmer's thinking are simultaneously debugged. Even with today's sophisticated tools, due to which traditional programming has been relegated behind the scenes, far from the user's apparent space of action (e.g., using friendly Web-page editors we do HTML tagging without actually writing source code, or using a word processor's formatting menus we generate printing specifications within a document without actually dealing with code), debugging remains a fundamental reflective process.

Internalization

Cognitive scientists investigate the information processing carried out by cognitive beings ("Informavores," Miller, 1984; "Cognizers," Pylyshyn, 1984), taking place in either the human mind or a machine. Many years of research have yielded formal and computational models of various aspects of information processing (such as perception, representation of information, memory storage and retrieval, and problem solving). All these are intellectual artifacts, the products of human reason, yet they simultaneously serve as aids to the functioning of human reason. There is a large variety of well-known intellectual artifacts that are often found in schools (e.g., the multiplication table, the set of rules or operators for the solution of algebraic equations, grammatical rules, procedures for the analysis of a literary work). The assumption that guides the teaching of these artifacts is that once they are assimilated by learners they enrich their cognitive functioning (Ohlsson, 1993). A similar claim is maintained with regard to work with sophisticated knowledge technologies. Some work belonging to this approach focuses on the internalization or internal mapping of qualities implicitly present in the technology.
The interaction with the technology is supposed to leave cognitive traces that go beyond the work as such and may be generalized for use in new situations and areas (Salomon, Perkins, & Globerson, 1991). As an example, at a basic level, we can consider how metaphors originating in the interaction with technology are used in everyday life to refer to situations (individual, social) unrelated to technology, having become useful devices of language as well as thought. At a more complex level, we can relate to people's cognitive assimilation of defining components of the continuously developing visual language of our time (from the first movie scenes shot, through the evolution of television, to the current digital worlds). Significant examples are our ability to reason visually in terms of (and feel comfortable with) zooming in and out beyond our natural capabilities, simultaneous multiple perspectives, visual transformations, or the fast bombardment of visual fragments out of which we compose stories.


From a different perspective, educators deal with the explicit instruction of complex cognitive aids (Novak, 1990; Wideman & Owston, 1993). In the project reported by Mioduser and Santa Maria (1995), fifth- and sixth-grade students worked in a computerized environment to create treelike structured knowledge bases (the knowledgeable tree). The product is a (textual and visual) knowledge base that the students or their classmates can navigate to carry out tasks and write projects. All stages in the construction of the knowledge tree (e.g., mapping the content, defining the knowledge units or nodes, establishing hierarchy levels and links) are performed following the conceptual model embedded in the computer tool. The expectation is that this model will be assimilated as a powerful knowledge-manipulation resource to be activated in future scenarios. Indeed, the study showed a clear impact of the learning not only on the students' knowledge representation abilities but also on their propensity to use, outside the computer environment, the analytical, organizational, and representational skills they had applied in the course of working with the technology.

Construction

This approach focuses on the role of technology in the construction of knowledge and skills. According to this approach, artifacts (including computerized environments) receive the status of "objects to think with" (Granott, 1991; Kindfield, 1994; Papert, 1980). The umbrella of this approach is a wide one, covering observers of the reciprocal relationships between technology and reason from a number of theoretical viewpoints, such as constructivism/constructionism, situated cognition, and cognitive apprenticeship. According to the constructivist viewpoint, learning is a process of knowledge structuring. To this, constructionists add that this structuring will be significantly supported by constructing something in reality, a public entity (Harel & Papert, 1991).
Computerized environments, such as the Logo language, or technological systems, such as Logo-controlled robotics kits, enable this sort of construction, thereby fostering "the turning of making into thinking" (Mitcham, 2001, p. 31).

From the situationist point of view, knowledge structures are anchored within the context in which they are developed, and they have a direct relationship with the situation, the tasks, the objects, and the human partners in that context (Brown, Collins, & Duguid, 1989). Knowledge technology makes it possible to create rich and significant learning tools supporting natural processes of learning (e.g., microworlds; Papert, 1980) or inquiry processes in authentic activities (e.g., scientific visualization learning tools; Edelson, Gordin, & Pea, 1999). Sometimes the use of technology is the only way for learners to encounter a certain reality that would otherwise have been inaccessible to them due to, for instance, complexity, cost, or a high level of risk (Lajoie & Derry, 1993).

The third perspective, cognitive apprenticeship, addresses the need to include strategies of training and support in the learning environment to help the learner in the gradual construction of knowledge and skills (scaffolding, coaching). The basic idea is to provide the learner with contextual help and to gradually remove it when it is no longer required (Brown & Palincsar, 1989). Certain computerized systems include mechanisms that combine continuous evaluation of the learner's performance with the creation of support and feedback based on that evaluation (or a student model), in an attempt to achieve maximal individual adaptation of the coaching or tutoring (Albrecht, Koch, & Tiller, 2000; Wenger, 1987).

Collaborative Creation

The core idea behind this approach is the perception of knowledge as a social construct (Vygotsky, 1979). Knowledge construction is perceived as the consequence of a social process, or at least a process that includes, in addition to the learner, an "other" with whom one cooperates and shares tasks in the process of creating the knowledge (Koschmann, 1994; Nachmias, Mioduser, Oren, & Ram, 2000; Salomon, 1993). The question "Can one expect the lone learner to think and function as a researcher or scientist (biologist, historian, or physicist)?" is replaced with "Can one turn the group (the class) into a knowledge-building community, similar to those that presently advance each and every area of human endeavor?" (Scardamalia & Bereiter, 1994). Along these lines, technology may play a number of roles. At the basic level, technology provides the physical infrastructure for the interaction, such as electronic mail or electronic discussion groups.
Yet beyond this level, one can find complex systems that support varied kinds of processes, such as collaborative writing, collaborative design, or shared annotation systems for collaborative Web browsing (Amory, 1999; Kennedy & McNaught, 2001; Verdejo & Barros, 1999). An example of a collaborative Web-based environment is the Knowmagine Virtual Park for Science Technology and Culture (Mioduser & Oren, 1998). This collaborative learning site on the Internet allows students to enter the park from different locations at the same time, walk through its three-dimensional graphic space, activate exhibits, consult the knowledge center, perform short-term learning tasks interacting with occasional distant visitors, as well as long-term collaborative projects with previously committed distant fellows (e.g., a replica of Galileo's trial in the form of a 2-week role-playing event).

Unique collaborative entities, which emerged with the development of network technologies and continue to evolve into interesting configurations, are virtual learning communities (Jones, 1997). These are defined as social spaces where people with common (academic or professional) interests carry out learning transactions. By means of varied collaborative models (e.g., online workshops, peer review and friendly critique of ideas, synchronous and asynchronous discussion groups, info-boots), individual and social learning takes place, continuously enriching the shared body of knowledge. Oren, Nachmias, Mioduser, and Lahav (2000) suggest a comprehensive model for virtual learning communities, or Learnets, defined as novel educational systems based on the combination of three components: a virtual community (social dimension), hosted by an appropriate virtual environment (technological dimension), and embodying advanced pedagogical ideas (educational dimension). Through its different variables (e.g., extent of presence, alternative-status definitions, immersivity, multiuser features, communication means, educator functions, hypercurriculum), the model aims to serve both the development and the evaluation of virtual learning communities.

Compensation

In this approach, technological means are perceived as cognitive prostheses, providing compensatory resources for the development of cognitive abilities whenever fundamental resources are impaired (e.g., as in blindness or quadriplegia). Cognitive and developmental theories stress the crucial role of interaction with the world (the natural, artificial, and social surroundings) for the evolvement of cognitive functions and skills. Partial or total impairment of an essential interaction channel or means (e.g., vision, hearing, motor ability) seriously affects a person's cognitive functioning.
Here, cognitive technologies may supply alternatives in place of the damaged path, providing substitute resources for appropriate cognitive development or functioning. Recent work by my colleagues and me within this approach relates to cognitive-technological support for vision-impaired and blind people. In one case, a computer tool was used to supply semi-intelligent remedial treatment for spelling difficulties of vision-impaired people (Mioduser, Lahav, & Nachmias, 2000). Further recent work focuses on supporting blind people's cognitive mapping of new (unknown) spaces by navigating a virtual model of these spaces based on haptic and auditory feedback (Lahav & Mioduser, 2004). The work within the force-feedback environment supports cognitive processes, such as holistic recognition of the space (as opposed to linear or object-to-object paths characteristic of cane-based navigation) or the composition of spatial advance organizers to be probed and corrected during the actual navigation of the real space (Fig. 9.2).

FIG. 9.2. Dynamic log of a blind person's navigation of a new space, in a training session using the haptic feedback-based virtual environment.

Another example relates to the development of spatial knowledge (e.g., directionality, perspective taking) by severely handicapped persons. Developmental theories claim that motor activity has a critical role in the evolvement of cognitive abilities, among them spatial knowledge. In the case of severe motor impairment, a person is deprived of the possibility to actively manipulate objects in the world, and this may affect his or her cognitive development. In recent work, subjects with quadriplegia due to cerebral palsy worked with a robotics system to deal with spatial problem solving (Gliksman, 1999). The system served for the mediated manipulation of objects, supplying the subjects with the possibility to act on the physical world and solve spatial problems in ways that would not have been feasible without the technology.

Evolution

This controversial approach embraces the biological model of evolution and adapts it to analyze and explain in philosophical, social, and cultural terms the role of the interaction between humans and technology in shaping cognition. The main claim is that when human evolution is examined (against that of other living creatures), one unique and distinctive aspect of it, cultural evolution, must be considered. The evolution of living creatures is mainly endosomatic and is expressed in changes in various organs. Humans are mainly characterized by exosomatic evolution, which is expressed in the creation of new organs—artifacts—outside the body (Olson, 1985). The new external organs related to information generation and handling (such as books or computers) crucially affect the development of new cognitive abilities.

A classic example of this approach frequently given in the literature is the shift from spoken to written language (Havelock, 1973). The invention of writing has clear cognitive implications, such as the liberation of memory from the need to bear the tremendous body of knowledge that was hitherto transferred orally from one generation to the next. This shift decreased the importance of the skills that until then had been used to organize and represent knowledge in a way appropriate for memorizing (rhymes, sayings, a powerful plot). At the same time, it provoked the development of new skills, such as the ability to derive conclusions regarding (large sets of) logical statements, which are stored by means of memorization tools external to the mind. The interaction with cognitive technologies leads to an intellectual or social evolution, which finds expression in the reorganization of cognitive abilities and the development of new skills, structures, and cognitive functions.

Recently Logan (2000) suggested that "the computer and the Internet are the most recent in a long series of techniques and technologies that organize human thought" and that these "are part of an evolutionary chain of languages that also includes speech, writing, mathematics, and science" (pp. 61-62). These languages are presented as evolutionary solutions generated vis-a-vis the increasing complexity of human thought.
Regarding computer technology, it is enough to mention such issues as the cognitive connotations of different approaches to computer programming (e.g., algorithmic, functional, object oriented) or of the very process of debugging, which have no precedent in previous technologies. The sixth language, the Internet, has unique semantic and syntactic characteristics and therefore implies new cognitive modes (e.g., hypertext, which demands novel approaches to writing and reading, or search processes, which demand appropriate retrieval skills and strategies to exploit the power of immediate accessibility to every unit of information in congested databases).

ADVANCED KNOWLEDGE TECHNOLOGIES IN EDUCATION

Technology continues to develop at a rapid pace. There is no sense in trying to predict how the scene will look beyond the immediately foreseeable future, and this too must be done cautiously, because an innovation that seems to be no more than a technical development may suddenly combine with other innovations, resulting in a qualitative (not just a quantitative) leap (Dede, 1996; Tubin, Nachmias, Mioduser, & Forkosh-Baruch, 2003; Venezky & Davis, 2002). Reality indicates that the assimilation of technology in education is a very slow process (not to mention the almost certain inability of educational systems to cope with updates and innovations taking place daily). Against this backdrop it seems that any analysis of the way in which the reciprocal relationship between technology and cognition is expressed in education must maintain a comprehensive outlook, rather than latch onto particular technical aspects, which will undoubtedly change and improve in the future. Bearing this perspective in mind, I conclude with the presentation of a number of issues, or evolving questions, which are worth considering.

Fostering an Integration of Approaches

The previous presentation and classification of the various approaches was of course done for purposes of survey and analysis. In reality, one may view a certain computerized environment as embodying the combined qualities of a number of approaches. An electronic spreadsheet program may serve as an aid in calculation (extension), as a tool for building and representing a model of a phenomenon (externalization), as a tool for the acquisition of skills for analyzing a phenomenon by means of various representations (construction), and more. Any learning process comprises many stages and functions. Determining the importance of a tool that serves as a practice aid (through repeated experience) supporting the automation of a skill, as compared with the importance of an inquiry tool that serves to discover causal relationships between factors, must be relative. This determination is a function of needs and goals.
It seems that the analysis of these needs and goals, in combination with a theoretical perspective, must guide the shaping of computerized environments appropriate for supporting the various functions, while combining various modes of interaction between the learner and the technological environment.

The Action Space for Cognitive Processes

One of the more interesting implications of technological development is how it fosters increasing flexibility and expansion of the action space of cognition. The technology that hundreds of years ago enabled humans to examine the celestial bodies more comprehensively than they were able to do with the naked eye has led to an essential transformation in the perception of the universe. Today, technology creates a space with ever-expanding boundaries. Let us take the prehistoric cave as a metaphor for present-day technology. By means of the equipment of virtual reality (head mount, glove), a cave (any simulated reality) surrounds us directly, encompassing us as an additional layer beyond clothing. On the other hand, by means of computer communication technologies, the cave expands up to the entire world in the form of a giant network connecting millions at every point on the face of the earth—the cyberspace (Rucker, Sirius, & Mu, 1992).

Where is the school realm (its physical and human boundaries) actually located on the continuum between the private cave and the networked world? Learning institutions have, and will in the foreseeable future continue to have, definite physical boundaries, with a certain structure and organizational configuration. We may refer to the school and the institutionalized learning activities taking place within it as a window or a frame encompassing a defined field of vision. Then we may ask, can we move this frame back and forth, according to individual and social needs, along the previously mentioned continuum between private and collective cognition in cyberspace? What additional (new, other) schooling and learning functions, modes, and configurations must be developed? What kind of support may cognitive technologies supply for the definition and implementation of these new functions and modes?

Theory Versus Reality

There is harsh criticism of the attempts to use technology for teaching and learning, which may be summed up with a statement such as "They're throwing computers into schools, but nobody knows what to do with them." The criticism, in other words, is directed against an overenthusiasm for technological gadgets and effects without proper reference to educational content and needs. It has also been claimed that, in many cases, technology has been used to merely duplicate existing methods (Scardamalia & Bereiter, 1994). Many educational ventures are directed at exploiting technological potential rather than at satisfying educational needs and are technology based rather than educational-theory based (Mioduser et al., 1999). The previously surveyed theoretical approaches regarding the interaction between technology and cognition are still very far from materialization in the everyday reality of schools. We must learn to take advantage of the accumulating theoretical knowledge as leverage for the mindful implementation of technological advances in education and not just as a means of explaining why a particular application is or (in many cases) is not significant to learning.

This task is not an easy one. It is therefore of great importance to drive for a proper combination and just balance between research efforts, development efforts, and application efforts to achieve a better assimilation of cognitive technologies into educational processes and practices.

CODA

The walls of our present cave are once again full of representations of reality, but now the walls are not real, nor are the paintings they bear; everything takes place inside a helmet, goggles, a glove, and a tiny processor. Today, like 15,000 years ago, the helmet wearer manipulates representations of reality instead of manipulating reality itself, but the difference between the two situations is highly significant: The caveperson used reason to manipulate reality (representing reality and manipulating that representation, in the belief that he or she was manipulating reality); the present-day helmet wearer uses (virtual) reality to manipulate reason (representing reality and manipulating that representation, in the belief that he or she is acquiring skills or knowledge). Even if we refer to less sophisticated means than those of virtual reality (such as simulations, intelligent systems, or even commonly used word- or image-processing tools), these imply the existence of a uniquely powerful feedback system: Reason creates technology (its products), and work with that technology influences, in turn, that same reason and its products. As to education, a deep understanding of the nature of this feedback cycle is fundamental for the appropriate assimilation of cognitive technologies (either existent or still to come) into teaching and learning processes.

REFERENCES

Albrecht, F., Koch, N., & Tiller, T. (2000). SmexWeb: An adaptive Web-based hypermedia teaching system. Journal of Interactive Learning Research, 11(3-4), 367-388. Amory, A. (1999, June). Writing on the Web: Technology and implications. Paper presented at the ED-MEDIA Conference, Seattle, WA. Brown, A. L., & Palincsar, A. S. (1989). Guided cooperative learning and individual knowledge acquisition. In L. B. Resnick (Ed.), Knowing, learning and instruction: Essays in honor of Robert Glaser (pp. 393-451). Hillsdale, NJ: Lawrence Erlbaum Associates. Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18, 32-42. Buber, M. (1958). I and thou. Edinburgh: T. & T. Clark. Burke, J. (1985). The day the universe changed. Boston: Little, Brown. Dascal, M. (1987). Leibniz—Language, signs, and thought. Amsterdam: John Benjamins. Dede, C. (1996). Emerging technologies and distributed learning. The American Journal of Distance Education, 10(2), 4-36.

diSessa, A. (2000). Changing minds—Computers, learning and literacy. Cambridge, MA: MIT Press. Edelson, D., Gordin, D., & Pea, R. (1999). Addressing the challenges of inquiry-based learning through technology and curriculum design. The Journal of the Learning Sciences, 8(3-4), 391-450. Fisher, E. (1963). The necessity of art. Harmondsworth, UK: Penguin Books. Garner, W. (1966). Programmed instruction. New York: Center for Applied Instruction. Gibson, J. (1986). The ecological approach to visual perception. Hillsdale, NJ: Lawrence Erlbaum Associates. Gliksman, S. (1999). Effect of learning with a robotics environment on severely motor-handicapped students' acquisition of spatial knowledge. Unpublished master's thesis, Tel-Aviv University. Granott, N. (1991). Puzzled minds and weird creatures: Phases in the spontaneous process of knowledge construction. In I. Harel & S. Papert (Eds.), Constructionism (pp. 295-310). Norwood, NJ: Ablex. Hamilton, E., & Cairns, H. (Eds.). (1961). The collected dialogues of Plato. Princeton, NJ: Princeton University Press. Harel, I., & Papert, S. (Eds.). (1991). Constructionism. Norwood, NJ: Ablex. Hauser, A. (1951). The social history of art. London: Routledge & Kegan Paul. Havelock, E. A. (1973). Prologue to Greek literacy. In D. Boulter (Ed.), Lectures in memory of Louise Taft Semple, second series, 1966-1971. Cincinnati: University of Oklahoma Press. Jones, Q. (1997). Virtual-communities, virtual-settlements and cyber-archaeology: A theoretical outline. Journal of Computer Mediated Communication, 3(3). Kennedy, D., & McNaught, C. (2001). Computer-based cognitive tools: Description and design. In C. Montgomery & J. Vitely (Eds.), Proceedings of Ed-Media 2001 (pp. 925-930). Tampere, Finland: AACE. Kindfield, A. C. (1994). Biology diagrams: Tools to think with. The Journal of the Learning Sciences, 3(1), 1-36. Koschmann, T. (1994). Toward a theory of computer support for collaborative learning. 
The Journal of the Learning Sciences, 3(3), 219-225. Kuhn, T. S. (1970). The structure of scientific revolutions. Chicago: University of Chicago Press. Lahav, O., & Mioduser, D. (2004). Exploration of unknown spaces by people who are blind, using a Multisensory Virtual Environment (MVE). Journal of Special Education Technology, 19(3), 15-23. Lajoie, S. P., & Derry, S. J. (Eds.). (1993). Computers as cognitive tools. Hillsdale, NJ: Lawrence Erlbaum Associates. Logan, K. (2000). The sixth language—Learning a living in the Internet age. Toronto, Canada: Stoddart. McCullough, M. (1996). Abstracting craft—The practiced digital hand. Cambridge, MA: MIT Press. McLuhan, M. (1964). Understanding media: The extensions of man. New York: McGraw-Hill. Miller, G. A. (1984). Informavores. In F. Machlup & U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 111-113). New York: Wiley. Mioduser, D. (1998). Framework for the study of the cognitive nature and architecture of technological problem solving. Journal of Technology Education and Design, 8(2), 167-184. Mioduser, D., Lahav, O., & Nachmias, R. (2000). Using computers to teach computer remedial spelling to a student with low vision: A case study. Journal of Visual Impairment and Blindness, 94(1), 15-25. Mioduser, D., Nachmias, R., Oren, A., & Lahav, O. (1999). Web-based learning environments (WBLE)—Current implementations and evolving trends. Journal of Network and Computer Applications, 22, 233-247. Mioduser, D., & Oren, A. (1998). Knowmagine—A virtual knowledge park for cooperative learning in cyberspace. International Journal of Educational Telecommunications, 4(1), 75-95.

Mioduser, D., & Santa Maria, M. (1995). Students' construction of structured knowledge representations. Journal of Research on Computing in Education, 28(1), 63-84. Mioduser, D., Venezky, R. L., & Gong, B. (1998). The Weather Lab: An instruction-based assessment tool built from a knowledge-based system. Journal of Computers in Mathematics and Science Teaching, 17(2-3), 239-263. Mitcham, C. (2001). Dasein versus design: The problematic of turning making into thinking. International Journal of Technology and Design Education, 11, 27-36. Nachmias, R., Mioduser, D., Oren, A., & Ram, J. (2000). Web-supported emerging collaboration in higher education courses. Journal of Educational Technology and Society, 3(3), 94-104. Novak, J. D. (1990). Concept maps and vee diagrams: Two metacognitive tools to facilitate meaningful learning. Instructional Science, 19, 1-25. Ohlsson, S. (1993). Abstract schemas. Educational Psychologist, 28(1), 51-66. Olson, D. (1985). Computers as tools for the intellect. Educational Researcher, 4, 5-8. Oren, A., & Chen, D. (1992). New knowledge organizations in the history classroom. History and Computing, 4(2), 120-131. Oren, A., Nachmias, R., Mioduser, D., & Lahav, O. (2000). Learnets—Virtual learning communities. International Journal of Educational Telecommunications, 6(2), 141-158. Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. New York: Basic Books. Pea, R. (1994). Seeing what we build together: Distributed multimedia learning environments for transformative communications. The Journal of the Learning Sciences, 3(3), 285-299. Pea, R. D. (1985). Beyond amplification: Using the computer to reorganize mental functioning. Educational Psychologist, 20(4), 167-182. Pylyshyn, Z. W. (1984). Computation and cognition: Toward a foundation for cognitive science. Cambridge, MA: MIT Press. Rucker, R., Sirius, R. U., & Mu, Q. (Eds.). (1992). Mondo 2000—A user's guide to the new edge. New York: HarperCollins. Salomon, G. (1993). 
On the nature of pedagogic computer tools: The case of the writing partner. In S. P. Lajoie & S. J. Derry (Eds.), Computers as cognitive tools (pp. 179-196). Hillsdale, NJ: Lawrence Erlbaum Associates. Salomon, G., Perkins, D. N., & Globerson, T. (1991). Partners in cognition: Extending human intelligence with intelligent technologies. Educational Researcher, 20(3), 2-9. Scardamalia, M., & Bereiter, C. (1994). Computer support for knowledge building communities. The Journal of the Learning Sciences, 3(3), 265-283. Simon, H. A. (1985). The sciences of the artificial. Cambridge, MA: MIT Press. Tubin, D., Nachmias, R., Mioduser, D., & Forkosh-Baruch, A. (2003). Domains and levels of pedagogical innovation in schools using ICT: Ten innovative schools in Israel. Education and Information Technologies, 8(2), 127-145. Verdejo, M., & Barros, B. (1999, June). Combining user-centered design and activity concepts for developing computer-mediated collaborative learning environments: A case example. Paper presented at the ED-MEDIA Conference, Seattle, WA. Venezky, R., & Davis, C. (2002). Que vademus? The transformation of schooling in a networked world. Washington, DC: OECD/CERI. Venezky, R., & Osin, L. (1991). The intelligent design of computer-assisted instruction. New York: Longman. Vygotsky, L. (1979). Mind in society. Cambridge, MA: Harvard University Press. Wenger, E. (1987). Artificial intelligence and tutoring systems: Computational approaches to the communication of knowledge. Los Altos, CA: Morgan Kaufmann. Wideman, H., & Owston, R. D. (1993). Knowledge base construction as a pedagogical activity. Journal of Educational Computing Research, 9(2), 165-196.

10

Gaining Perspective Through Science: A History of Research Synthesis in Reading

Timothy Shanahan
University of Illinois at Chicago

In 1883, James McKeen Cattell, 23 years old, stood on the steel deck of the Cunard steamship Servia, gazing east through the moist autumn air in eager anticipation. It was not his first passage to Europe; that had taken place 3 years earlier, and it was on that trip that he decided to abandon literature and philosophy in favor of the new science of psychology (Sokal, 1981). The young Cattell yearned to accomplish greatness in the labs of Leipzig. He could already anticipate the nature of the studies he would undertake in Wundt's experimental psychology laboratory—studies he had already begun under a brief tutelage with G. Stanley Hall at Johns Hopkins (Boring, 1929). But it is unlikely, as he stood in the sea air, that he could have imagined a time would come—and come soon—when there would be a need to synthesize the findings and procedures from a growing plethora of psychological investigations—then just a trickle—so that one could make ultimate sense of them.

Cattell earned his PhD at Leipzig in 1886 and returned home to Pennsylvania. Over the next three decades, he peripatetically contributed to the early rapid growth of experimental psychology—and, inadvertently, reading research—during stints at the University of Pennsylvania (twice), Bryn Mawr, Cambridge, and, finally, Columbia University. During this period, he edited six important journals in psychology and science; conducted several original investigations on reaction times, mental measurements, association, and legibility; and produced more than 50 doctoral students under his mentorship—including Thorndike, Woodworth, Dearborn, and Gates—all of whom made important contributions to reading research (Woodworth, 1944).

For all of that, Cattell's early interest was less in synthesizing a burgeoning body of work than in trying to find outlets for the primary work that was then being done. Although Hall had created a fine journal, the American Journal of Psychology (AJP), he tightly controlled access to it, only publishing studies out of Johns Hopkins and Clark—his universities—along with those of a few outsiders. Cattell's 4 months at Hopkins were enough for Hall to claim credit for his development (Hall, 1895; James, Ladd, Baldwin, & Cattell, 1895) but not enough to permit Cattell to publish in AJP, especially in light of Cattell's charges that Hall had misappropriated his work (Sokal, 1981). Not surprisingly, given Cattell's energies, he along with James Mark Baldwin began publishing a competing journal, much to Hall's chagrin—Psychological Review, which they published together for 10 years. At that point, Cattell and Baldwin had a painful falling-out, and Baldwin continued alone with Psychological Review, but he also began publishing, significantly, Psychological Bulletin, a journal devoted to accumulating the research literature or making sense of it (Sokal, 1981). Baldwin—unlike Cattell—was more devoted to theory than research, and the Bulletin would play a valuable role in turning experimental results into theory, a role that the journal continues to play today (Boring, 1929).

Cattell stimulated research on reading with his choice of the perception of letters and words as appropriate phenomena for psychological investigation, he helped develop the careers of several men who became pioneers in reading research, and he opened access to—or created—several research journals in which such investigations could be published.
He helped set free a flood of studies, but it was his estranged colleague, James Mark Baldwin, who best recognized the need for a place to guide this deluge of data into a more focused and productive channel.

THE BEGINNINGS OF READING SYNTHESIS

The earliest Psychological Bulletin article devoted to anything akin to literacy was published in its eighth volume by June E. Downey (1911), and it provided a review of 13 studies published in 1910-1911 dealing with the psychology of graphic functions—a summary of studies of drawing, handwriting, and other conscious marking activities. Downey continued publishing summaries of this line of work in five of the following six years. In 1913, the psychologist E. H. Cameron began a similar series specifically on reading, which he continued through 1919 (Cameron, 1913).

The first of Cameron's reading reviews is illustrative of most early examples of the genre. Cameron examined four disparate studies (one on legibility of print, one on speech curves in oral reading, one attempting to locate the neurological center of reading behavior, and one on eye movement measurement). In the following years, Cameron's reviews of the reading research summarized four, five, seven, seven, and two studies, respectively. These summaries were useful in keeping psychologists apprised of a growing body of related work, but they were not true synthesis in the way that we talk about it now. They were really no more than a series of almost random bibliographic entries.

Synthesis, in which a scholar attempts to evaluate the collective meaning of a body of research, was at that time largely relegated to the textbook rather than journals. This was certainly the case in reading. Downey and Cameron's annual compendia of studies pale in contrast to the substantive, deep, and highly theoretical synthesis of reading research provided by Edmund Burke Huey in his 1908 book, The Psychology and Pedagogy of Reading (Huey, 1908/1968). Huey had studied under Hall at Clark University, and, unlike Cattell, he was permitted to publish his primary research findings in AJP. What perhaps is most remarkable about Huey's book is the insightful and deep manner in which he reviewed and synthesized 75 studies in English, German, and French, drawn from the experimental work on reading, over the first 186 pages of this volume (the rest of the book is a historical analysis of the teaching of reading, in which he applies the practical insights drawn from the earlier presented research synthesis; this surprising practical turn, a harbinger of the synthesis work in the second half of the 20th century, might be due to the fact that Huey first taught in a normal school or college of education prior to taking his doctoral training, apparently the only early experimental psychologist with such credentials at that time).
Huey's review set out to answer several questions about the processes of reading, such as eye movement, visual perception, and legibility, and he strove to develop a more coherent picture of what was going on in reading by bringing the various findings together. It is not an explicitly or formally critical analysis of the research—Huey did not reject any research design, nor did he openly set aside any works because of flaws of design, execution, or logic. However, he did attempt to emphasize differences among conclusions and to adjudicate these to some extent, especially when there were differences with his own primary research findings: "This experimenter opposes Goldscheider and Muller's conclusion ..." or "The conclusions of Cattell, Erdmann and Dodge, and others as to the perception in word-wholes are also thought to be incorrect." In these critical forays, Huey doesn't reject the research evidence itself so much as challenge the conclusions drawn by the original researchers. For Huey, psychological phenomena are primary, and he seems to trust that the researchers' investigations will describe those phenomena with accuracy, though their conclusions drawn from these phenomena might be faulty.

Huey's effort at a synthesis of reading research is remarkable, as the contrast between Cameron's annual compendia and his work reveals. It should also be noted in this regard that no other scholar attempted anything like another thorough integrative review and synthesis of such a wide-ranging body of research on reading for more than two decades. The next forays into this particular arena were made by Miles Tinker, who published an important review of 110 titles on the experimental psychology of reading in a 1931 issue of Psychological Bulletin (Tinker, 1931), and M. D. Vernon (1931), who published a book in the same year that also reviewed this literature in the Hueyian tradition.

THE SUMMARY OF INVESTIGATIONS RELATED TO READING

Cameron's early bibliographical work bore quicker-ripening fruit than Huey's, and this is most apparent in the work of a true giant in the field of reading and reading education, William S. Gray. Gray began as a 19-year-old teacher of 15 students (including one who was 2 years older than the teacher) in a one-room schoolhouse on the Illinois prairie. A year later, he became principal of a junior high school (where he still was required to teach 40 students daily, fifth through eighth grades), and in this role he began training some of his graduates to become teachers, as high school diplomas were not a requirement for entry into the field at the time (Stevenson, 1985). From that practical beginning, Gray proceeded to Illinois State Normal University, a teachers college, and went from there to the University of Chicago, where he studied with the experimental psychologist Charles Judd, interspersing across these years some additional work with E. L. Thorndike, Cattell's student, at Columbia. Gray was trained as an experimental psychologist, but he brought a rich practical understanding of and commitment to education in general and literacy education in particular.
His doctoral dissertation focused on the development of the first psychometric reading test, the still-used Gray Oral (Stevenson, 1985). Gray had a long research career, during which he developed a number of remarkable doctoral students and conducted many primary studies. Most pertinent here, however, were his efforts at research synthesis. In 1925, he published a monograph, the Summary of Investigations Relating to Reading, in which he provided a compendium of all of the research on reading that he was able to find (Gray, 1925). It is apparent that this publication seemed valuable both to Gray and his readers because less than 2 years later he issued a follow-up article in the Elementary School Journal covering the research of 1924-1925 (Gray, 1926). The outlet chosen for this
summary of research suggests a more pedagogical—as opposed to theoretical—purpose and content—though from the beginning Gray included studies of the psychological processes of reading. This compendium, though much broader in scope, is very much in the tradition of Cameron's early annual reviews, a listing of studies with a brief, not particularly critical summary of the report, and it is hard to imagine that Gray was not influenced directly by Cameron in this regard. Gray's Summary of Investigations was, remarkably, an almost annual publication for the next 70 years, appearing in the Elementary School Journal through 1932 then in the Journal of Educational Research from 1933 through 1959. Although Gray's involvement with the Summary ended at that point, the Summary itself, now under the direction of some of his students, most notably Sam Weintraub, continued on for the next 33 years, using the same style and organizational plan developed by Gray. During the 1960s, the Summary was published annually in The Reading Teacher and later appeared as an annual issue of Reading Research Quarterly, and, finally, in the same form that it had begun—as an annual monograph, in this case published by the International Reading Association. Gray's contribution to synthesis extends beyond the Summary of Investigations and the ancillary Gray collection (a compendium of 9,325 titles compiled from key journals, books, research reports, and monographs published between 1884 and June 1976 that had been cited in the various annual Summaries). While still working on the Summary of Investigations, Gray published other more synthetic reviews (along with his own primary research). One of these, written with some rather illustrious collaborators in the history of reading research (Gray, Gates, Horn, & Yoakam, 1935), reviewed studies conducted from 1931 to 1934. 
However, in 1941, on his own this time, Gray published one of the most notable summaries of reading research to ever appear, this one in the Encyclopedia of Educational Research. This remarkable review examined nearly 2,000 sources. Unlike his Summary of Investigations, this was more in the tradition of Huey than Cameron, but even with that in many ways it remained a kind of narrative list rather than a true scientific research synthesis. This review showed greater breadth of coverage than what is now typical; the review attempted to answer literally dozens of questions on a broad array of issues covering nearly all aspects of reading and reading education. This breadth allowed him to focus on areas in which only a few research studies were actually available. But even in areas where extensive work had been carried out, he did not attempt to be comprehensive, apparently trusting his expert ability to grasp a central tendency, and he would rely on portions of the original work mainly for purposes of illustration rather than proof, more in the style of a good sermon than a modern scientific treatise, such as in this example:

The data secured in a score of experimental studies, particularly those by Gates and Russell (16: No. 43) and Agnew (16: No. 1), justify the conclusion that a moderate amount of training in phonetic analysis is valuable for most pupils. Evidence presented in some of the reports indicates that the most valuable results are secured if the amount of emphasis on phonics is limited in the first-grade and the training continued in the second and third grade until all the important elements have been learned. (Gray, 1941/1984, p. 69)

Gray, like Huey before him, did not explicitly provide critical analysis of research designs, but even more than Huey he avoided controversy or disagreement and failed to explore contradictory evidence, nor did he explicitly weigh the relative value of different contributions.

As has been noted, the Summary of Investigations continued publication long after the passing of William S. Gray. Eventually, the International Reading Association, the publisher for its final 30 years, retired it due to lack of interest among scholars and practitioners. The end of the annual Summary came about for three reasons. First, the explosive growth in reading research made the need for more synthetic work, as opposed to compendia, paramount for the advancement of the field. As the number and quality of literature reviews on reading rose, the value of the annual Summary declined. Second, and closely related, research specialization increased after World War II, a trend that continues today. It became less likely that any one scholar would have the interest or patience to wade through, let alone read, discrete summaries of investigations in the history of children's books, eye movements, family structure, dyslexia, effects of news coverage, readability, text interpretation, gender and reading, and spelling instruction, all of which were regularly summarized by Gray and his followers, along with material on dozens of other disparate issues. Third, the availability of more wide-reaching electronic summaries, such as ERIC and PsycINFO, made the Summary redundant, and because the annual Summary was harder to search in its paper format, it eventually fell into disuse, with fewer copies sold each year until its demise.

OTHER QUALITATIVE REVIEWS OF NOTE

Before turning to the sea change in research synthesis that marked the final quarter of the 20th century, it would be worthwhile to take a brief detour to consider a few of the more important traditional research reviews on reading that were carried out from the 1920s through the 1970s. Although traditional narrative or qualitative reviews continue to be written today, arguably the importance of these in the development of the field or in the application of research to practice and policy has diminished. Nevertheless, several of these earlier review efforts deserve some note, none more than the continuing series of reviews conducted by Miles Tinker, one of the true high points of synthesis in reading scholarship of the century.

During the first half of the 20th century, no scholar, aside from W. S. Gray, was more productive in synthesizing reading research than Miles Tinker. Tinker was an experimental psychologist at the University of Minnesota who conducted a substantial amount of primary research on issues of legibility or text readability. This empirical work, though completed more than 40 years ago, is still widely cited and used by text designers today. Punctuating his steady productivity in determining how people see print, Tinker published several targeted syntheses of reading research, collected some landmark bibliographies, and published one overarching literature review on all experimental aspects of reading (the 1931 publication noted earlier). After a stint in the U.S. Navy at the end of World War I, Tinker completed his undergraduate and master's work at Clark University and from there went on to Stanford, where he earned his doctorate in experimental psychology while serving as an instructor and assistant professor. Immediately after graduation, Tinker published the first of what was to become a once-per-decade series of research reviews on eye movements in reading (Tinker, 1927). The first of these notable syntheses examined 81 studies that focused on legibility and eye movements. The three follow-up reviews that ensued (1937, 1946, 1958) covered the same research ground, but their coverage shifted increasingly from legibility toward eye movements. From the first of these four remarkable reviews, Tinker did a masterful job of drawing conclusions based on a complex body of work, subjectively conditionalizing findings on the nature of the studies and their limitations.
Early on, for example, he noted that the relationship of perception to reading varied depending on the nature of the materials used to measure perception; the connection was markedly stronger when letters and words made up the target materials than when numbers, shapes, or other materials were the focus. These scholarly reviews sketched a steady but increasingly strong case against eye movement training in reading. Repeatedly, the reviews revealed a strong pattern of research evidence that debunked various ophthalmic approaches to overcoming reading problems. Additionally, Tinker issued an extensive synthesis of 110 studies of visual apprehension and perception only a year after his first on eye movements (Tinker, 1929), and in 1933 he added a two-part bibliography that included 180 titles dealing with the identification and treatment of reading disabilities (Tinker, 1933a, 1933b). Though using none of the formal research review procedures now available, and working alone rather than
in concert with other scholars, Tinker's reviews continue to garner acceptance and are outstanding examples of what can be learned from qualitative or subjective reviews. Another reading research reviewer deserving of some note is Jeanne Chall. She studied with Edgar Dale at Ohio State University, and together they published one of the most widely used measures of text readability, the Dale-Chall formula (Dale & Chall, 1948). She also proposed one of the more widely cited theories of the development of reading (Chall, 1983). During her long career, Chall conducted two major research syntheses, one on readability research (Chall, 1958) and another on studies of phonics instruction. The phonics review, Learning to Read: The Great Debate (Chall, 1967), remains one of the most influential of the qualitative reviews in reading. Although Great Debate falls far short of the methodological standards of current research syntheses, it has historical importance beyond most, perhaps all, other reviews of this period because it became the first of a series of reading research reviews conducted to influence policy matters more than theory. Unlike current research reviews, Great Debate was meant to be neither comprehensive nor balanced. Chall's review did not seek all studies on phonics instruction but all studies conducted between 1910 and 1965 that supported the teaching of phonics. Modern synthesizers recognize that in a burgeoning body of research there are likely to be a range of findings and that it is more valid to determine what the average impact of an approach might be, rather than to identify its high-water mark. Chall's examination of these studies was not a critical analysis, unlike her fine review of readability research, but instead was an attempt to make the strongest case possible for the teaching of phonics in beginning reading. This analysis was supplemented with a thorough practical description of programs and materials for teaching phonics. 
The Great Debate was supported by the Carnegie Foundation and was issued by a popular press publisher rather than appearing in a more typical research venue, such as a refereed journal or an academic press. This was no accident. This report was, from the start, intended to have policy implications for states, school districts, and publishers (it did not emphasize federal concerns because, even as late as the mid-1960s, there was a very limited federal interest or investment in education). The Carnegie Foundation and Harvard University, where Chall taught, teamed up to complete a series of studies about the status of reading education (Austin & Morrison, 1963), professional preparation for reading instruction (Austin & Morrison, 1961), and phonics (Chall, 1967). Of these high-level policy-oriented reports, two were surveys and one was a research synthesis.

The impact of Great Debate was almost immediate. Soon after it appeared, all major basal reading programs in the United States began to include a phonics component. Historically, phonics had been seen as an alternative way of introducing beginning reading, that is, an alternative to the controlled vocabulary approach of the basal reader (Smith, 1965). Largely as a result of Great Debate, this changed, and within a generation phonics was widely accepted as an essential component within any beginning program to teach reading. No previous reading research synthesis had such a dramatic impact within the field, and it paved the way for a series of official reports based on research syntheses that aimed to influence American reading policy and practice more directly than primary research usually had. In fact, during the final quarter of the 20th century, three qualitative research syntheses with major policy motives followed directly in the footsteps of Chall's landmark effort, and all, one way or another, focused on phonics instruction in important ways.

One of these, Becoming a Nation of Readers (Anderson, Hiebert, Scott, & Wilkinson, 1985), was issued by the Commission on Reading of the National Academy of Education. This may be the best written of all of the reports reviewed here; it reads more like a popular press book than a scholarly research report. As with Chall's contribution, this one made no effort to be comprehensive or critical. Instead, it was a highly selective review, with most of the research studies drawn from the work of a single research center (the Center for the Study of Reading at the University of Illinois at Urbana-Champaign). Richard Anderson, one of the authors of Becoming a Nation of Readers, was the director of this center, and the other authors were all associated with the center as well. Their effort was less to determine the relative contributions and limitations of the research findings than to array these findings toward a comprehensive and well-organized view of reading instruction that could influence practice.
Where there were noticeable gaps, the authors reached for studies outside of the reports of their center but relied on center reports as much as possible. Although researchers at the Center for the Study of Reading focused mainly on reading comprehension research, they published a few studies concerned with phonics or word perception, and this research synthesis like all of the major ones noted here included a section on the value of teaching phonics. Despite the existence of Becoming a Nation of Readers, the teaching of phonics began to wane during the 1980s, probably due to the efforts of whole language advocates who managed to persuade state policymakers in California and elsewhere to alter the requirements for textbook purchases in ways that diminished attention to word recognition instruction. As a result, some national legislators began to push the Center for the Study of Reading to accord greater attention to research findings on phonics (Pearson, 1990). Although the center was primarily involved in the conduct of research on reading comprehension, it was pressed into service
to use its funding to conduct and report a major synthesis on the more phonics-oriented aspects of reading. This led to a major research review on phonics, Beginning to Read by Marilyn Jager Adams (1990). This synthesis of research focused more on the basic perceptual findings about how the eye sees print than it did on instruction. The instructional part of this review examined less what works in the teaching of phonics than various side issues, such as the use of various phonics generalizations. Beginning to Read did an exceptional job, however, of making sense of the body of empirical study that had been developed on these issues during the previous three decades. As has been documented, the major research on reading in the 19th and early 20th centuries focused heavily on psychological investigations of word recognition or word perception, during what has been characterized as the Golden Age of reading research (Venezky, 1984). By the early 1920s, this period had ended and been replaced by a relatively nontheoretical emphasis on various aspects of reading instruction. This course was reversed somewhat in the 1950s and 1960s as psychologists again took up the mantle of reading research, driven forward by the advances of information-processing theory and the cognitive revolution. Adams's review took advantage of these advances and showed persuasively how important it was for beginning readers to learn the alphabetic principle. Nevertheless, Adams's research review failed to stanch the movement away from phonics instruction in school textbooks and state standards, and what came to be called the reading wars ensued. 
It is certainly possible that a more direct synthesis of the research on phonics instruction may have held off these bitter disputes about how best to teach beginning reading—Adams's treatment of the effects of such teaching may simply have been too oblique to convince practitioners of the value of phonics—but it is just as likely that this belligerency would have come about anyway (for two very different characterizations of the reading wars, see Taylor, 1998, and Stanovich, 2000). In any event, these debates did rage to the point of awakening official notice at the highest levels of government. In an effort to quell these conflicts, the National Research Council appointed a diverse committee of scholars to determine how reading difficulties could best be prevented in the early grades. The committee represented a wide range of views, and they eventually issued the last of the major qualitative research syntheses on reading of the 20th century. Preventing Reading Difficulties in Young Children (Snow, Burns, & Griffin, 1998) is best characterized as a consensus report on reading. It reflected the agreed-on positions of the participating scholars, but the degree to which this consensus was based on reading research is open to dispute. Some sections of Preventing Reading Difficulties make heavy reliance on research findings, and other sections
have little or no research evidence to bolster the claims. This unevenness of treatment reveals one of the major limitations of qualitative research reviews; conclusions appear to be equally important no matter how disparate the evidentiary basis may be. Despite these limitations, Preventing Reading Difficulties in Young Children did help to temper the bitter arguments in the field to some extent and gave hope that some kind of political consensus might be possible. What it also did was to bring an important chapter of reading research history to a close. It is unlikely that the qualitative review in reading will ever again have the importance in the field that it did up to this time.

SYSTEMATIC RESEARCH SYNTHESIS AND READING POLICY

From the appearance of Huey's 1908 review through the issuance of Preventing Reading Difficulties 90 years later, the field of reading research experienced rapid expansion. When Gray wrote his 1941 review of the research, he attempted to include all investigations that had been carried out during the first 55 years of scientific investigation in the field. He referred to 1,951 studies that had accumulated in that time, an amount that equals approximately 11 months of research productivity today, according to a check of the ERIC and PsycINFO databases. This increasingly rapid accumulation of studies led to the need for synthetic reviews of research, and such reviews began to be conducted and used with some regularity (Shanahan, 2000), including as the basis of public policy. As more research accumulated, and the importance of literature reviews grew, the serious limitations of these qualitative or subjective reviews became increasingly evident.

Before examining how these limitations came to be addressed in the field of reading, it is important to consider the amazing advances in research synthesis that had been taking place in science during the final quarter of the century. It has been noted that the old systems of managing bibliographic sources were changed dramatically by electronic search tools such as ERIC, PsycINFO, and many others (Reed & Baxter, 1994). The first of these, ERIC (the Educational Resources Information Center), got its start in 1966, and the other information indexing services followed from that. These started out as paper resources but by the 1990s had shifted over to the much more flexible and powerful electronic versions. There is little question that the development of these storage and search capacities played an important role in revolutionizing research synthesis.

Another important milestone along this path was the development of publication outlets for research syntheses. Earlier the creation of Psychological Bulletin was noted, and this was followed by other publications solely devoted to the research review, such as Review of Educational Research, which got its start in 1931. Equally important, however, has been the opening up of other research publications as outlets for the publication of research reviews. All major research journals devoted to literacy (including Reading Research Quarterly, Journal of Literacy Research, Scientific Studies of Reading, and Research in the Teaching of English) publish research reviews, and these are seen today as fundamental tools in scientific investigation.

The problems of applying research to practice and policy are not unique to the field of reading, nor are data-based disputes, such as those that erupted in the reading wars. Because of these kinds of problems, a new approach to research review began to evolve. Gene Glass, an educational psychologist, proposed a statistical method for combining the results of studies in a systematic manner (Glass, 1978), and he called this approach meta-analysis. Although a form of meta-analysis had long been available (Pearson, 1904), it was not used much and exerted little impact on any field of science. However, when Glass proposed these procedures in his 1976 presidential address to the American Educational Research Association, his idea fell on fertile ground. Although the term meta-analysis is usually reserved for integrative reviews that statistically combine and analyze results in a particular manner, the basic idea has much wider ramifications for how research reviews are conducted and used. The best description of this approach remains that put forth by Jackson (1980), in which he proposes that a sound literature review should be conducted in the same manner as any primary research study.
That is, it should begin with a question or hypothesis and should include a methods section in which it is explicitly revealed who the subjects of the study were and how and why they were selected (in this case, the subjects are the actual studies to be reviewed), as well as explicit methods for analyzing these data. The idea that Glass proposed and that has been elaborated on and improved steadily for nearly 30 years (Cooper & Hedges, 1994; Lipsey & Wilson, 2001; Rosenthal & DiMatteo, 2001) is that a literature review needs to be as free of bias as any study and needs to be so transparent in its methodology that it can be replicated by others. The history of research synthesis in reading and other fields is one of selective use of studies based strictly on the individual beliefs of a particular scholar. Reviews far too often have amplified minor results of small studies or neglected the accumulation of major findings across many studies. What was needed was a systematic way of identifying and selecting studies for review that would result in either a comprehensive data
pool or at least an unbiased one. Additionally, it was essential that there be systematic ways of evaluating the quality and contribution of each study and methods for combining the results beyond the subjective capacities of an individual. Although some research reviews have been models of scholarship showing remarkable care, patience, and insight, the possibility of doing synthesis this well subjectively declines as the size and complexity of the body of research grow. Since its inception, meta-analysis has gained wide acceptance in educational psychology and many other fields of study. In a recent analysis, it was found that fully 25% of all Psychological Bulletin articles are based on meta-analysis (Glass, 2000), and a keyword search (Google, April 14, 2004) identified 507,000 hits, indicating wide visibility and impact. Nowhere has its use been more evident than in medical research, however, and the next important synthesis development that paved the way for changes in reading research comes from medical science: the Cochrane Collaboration (http://www.cochrane.org). The Cochrane Collaboration is an independent, international organization dedicated to the production and dissemination of systematic reviews of health care interventions. This remarkable effort is based on the work of various collaborative review groups that are formed voluntarily by health care professionals. These groups are sometimes regional but more often are focused on particular issues, such as breast cancer, stroke, and pregnancy and childbirth. The reviews of the Cochrane Collaboration have had a clear impact in the field of medicine, not only in encouraging and channeling new research efforts that are better targeted on the most critical questions but also in forming new standards of professional care on the basis of the very best research reviews. Not surprisingly, given the high stakes of medical research, the Cochrane Collaboration has relied heavily on meta-analytical techniques.
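The statistical combination of study results that the term meta-analysis refers to can be made concrete with a minimal sketch. The code below is an illustration under stated assumptions, not the procedure of Glass or of any panel discussed in this chapter: it computes a standardized mean difference for each study and then a fixed-effect, inverse-variance weighted average. The `Study` record, the function names, and all of the numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Summary statistics for one two-group study (hypothetical record layout)."""
    name: str
    mean_treat: float
    mean_ctrl: float
    sd_treat: float
    sd_ctrl: float
    n_treat: int
    n_ctrl: int


def cohens_d(s: Study) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = (
        (s.n_treat - 1) * s.sd_treat ** 2 + (s.n_ctrl - 1) * s.sd_ctrl ** 2
    ) / (s.n_treat + s.n_ctrl - 2)
    return (s.mean_treat - s.mean_ctrl) / math.sqrt(pooled_var)


def d_variance(d: float, n_treat: int, n_ctrl: int) -> float:
    """Common large-sample approximation to the sampling variance of d."""
    return (n_treat + n_ctrl) / (n_treat * n_ctrl) + d ** 2 / (2 * (n_treat + n_ctrl))


def pooled_effect(studies):
    """Fixed-effect (inverse-variance weighted) mean effect and its standard error."""
    weights = []
    weighted_effects = []
    for s in studies:
        d = cohens_d(s)
        w = 1.0 / d_variance(d, s.n_treat, s.n_ctrl)  # precise studies count more
        weights.append(w)
        weighted_effects.append(w * d)
    mean_d = sum(weighted_effects) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return mean_d, se


# Invented numbers for illustration only; these are not data from any real study.
example_studies = [
    Study("Study A", 52.0, 48.0, 10.0, 10.0, 30, 30),
    Study("Study B", 75.0, 70.0, 12.0, 11.0, 120, 115),
    Study("Study C", 33.0, 32.5, 6.0, 6.5, 45, 50),
]
mean_d, se = pooled_effect(example_studies)
print(f"pooled d = {mean_d:.2f}, 95% CI half-width = {1.96 * se:.2f}")
```

The point of the weighting is the one made in the text: the result no longer depends on which studies an individual reviewer happens to find most persuasive, because every study that meets the selection rules contributes in proportion to its precision.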
As a direct result of the Cochrane Collaboration, there is now a group devoted to research synthesis in the social sciences, the Campbell Collaboration (http://www.campbellcollaboration.org). This collaborative group reviews and synthesizes evidence on social and behavioral interventions and public policy in education, criminal justice, social welfare, and other issues.

It was in this context (reading wars, a lack of responsiveness to the qualitative research reviews that had dominated reading for a century, and widespread use of objective research reviews to determine policy in other fields) that the National Reading Panel (NRP) was formed. In 1998, the U.S. Congress, in response to the ongoing debates on reading education, required that a panel be appointed to evaluate what is known about the teaching of reading. The National Reading Panel examined research on eight topics (phonemic awareness, phonics, oral reading fluency, vocabulary, reading comprehension, reading encouragement, teacher preparation and development, and technology), breaking both with those past reviews that tried to be comprehensive and coherent (such as Becoming a Nation of Readers and Preventing Reading Difficulties in Young Children) and with those that focused on one particular area of research (such as the eye movement reviews or Learning to Read: The Great Debate). Also breaking with the traditions of earlier reviews, NRP was willing to conclude that no policy determination could be made in some areas (technology, preservice teacher education, encouraging reading) because of a lack of sufficient research data. Furthermore, NRP did systematic, replicable searches for research studies in each of these topical areas and early on set rule-based study selection procedures that governed which data would be examined. If a study met these standards, it had to be analyzed, no matter what the personal beliefs or biases of the panelists. In three of the six topical areas in which there were findings, the panel relied on meta-analytic techniques for analyzing the data from the original studies. This approach allowed for two advances not seen before in reviews in the field of reading. First, it became possible to provide an objective appraisal of the quality of the underlying research and to consider this quality as a mediator or moderator of particular findings. Instead of an individual synthesizer making a decision to set aside or weigh less heavily one study or another, NRP was able to systematically examine the impact of quality characteristics. One conclusion that it was able to draw from this was that well-designed phonemic awareness studies (such as those with random assignment) reported significantly higher effect sizes for this teaching practice. Second, the systematic approach taken by NRP allowed for replication and reanalysis of the NRP findings.
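Treating study quality as a moderator, as described above, amounts to pooling effect sizes separately within each quality level and then asking whether the pooled values differ. The sketch below illustrates that logic only; it is not the NRP's actual analysis, and the effect sizes, variances, and the `random_assignment` key are invented for the example.

```python
import math


def pool(effects):
    """Inverse-variance weighted mean and standard error for (d, variance) pairs."""
    weights = [1.0 / var for _, var in effects]
    mean_d = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    return mean_d, math.sqrt(1.0 / sum(weights))


def compare_subgroups(studies, moderator):
    """Pool effects within each level of a moderator and test the difference.

    `studies` is a list of dicts with keys "d", "var", and the moderator name.
    Returns the pooled (mean, standard error) per level and a z statistic for
    the contrast between the two levels, in order of first appearance.
    """
    groups = {}
    for s in studies:
        groups.setdefault(s[moderator], []).append((s["d"], s["var"]))
    pooled = {level: pool(effects) for level, effects in groups.items()}
    if len(pooled) != 2:
        raise ValueError("this sketch handles exactly two moderator levels")
    (d_a, se_a), (d_b, se_b) = pooled.values()
    z = (d_a - d_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return pooled, z


# Hypothetical effect sizes and variances; these are not NRP data.
studies = [
    {"d": 0.90, "var": 0.04, "random_assignment": True},
    {"d": 0.70, "var": 0.04, "random_assignment": True},
    {"d": 0.50, "var": 0.04, "random_assignment": False},
    {"d": 0.30, "var": 0.04, "random_assignment": False},
]
pooled, z = compare_subgroups(studies, "random_assignment")
print(pooled, z)
```

Because the grouping rule and the weights are explicit, another analyst can rerun the comparison with different inclusion rules or quality codes, which is the kind of reanalysis that inexplicit qualitative reviews did not permit.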
Earlier, if you disagreed with the conclusions drawn by a synthesizer about a particular body of research, there was little that could be done to adjudicate the matter. The resolution of this disagreement would have to be a matter of rhetoric rather than science because there was no ultimate way to either objectively evaluate the quality of the original synthesis procedures, because they were usually inexplicit, or to make a convincing case that some other conclusion would be more meritorious. The NRP results (National Institute of Child Health and Human Development, 2000) led to much argument and dispute (Shanahan, 2004), but they also generated two critical, independent analyses that resulted in similar findings (Almasi, Garas, & Shanahan, 2002; Camilli, Vargas, & Yurecko, 2003). The replicability of the NRP findings should increase the confidence with which policymakers and practitioners use the results. Soon after NRP issued its report, President George W. Bush announced that the NRP findings would be the cornerstone of federal literacy policy
in the United States. The U.S. Congress followed suit by appropriating $5 billion, by far the largest amount ever devoted to reading education by the U.S. government, to support professional development, instructional programs and interventions, and classroom assessments that address the NRP findings. This is clearly the largest public impact exerted by any of the influential and important research syntheses in reading. Since its appearance, NRP has received a good deal of criticism, including the widespread complaint that it failed to address all important issues in the field (Shanahan, 2004). As a result, two additional panels have been put in place, and more seem certain to follow. The National Early Literacy Panel (NELP) is synthesizing the research literature on literacy development during the pre-K years, including family literacy, and the National Literacy Panel for Language Minority Children and Youth (NLP) is examining the research on learning to be literate in a second language, both issues neglected by NRP. Each panel is independent and is able to establish its own rules of operation, but each has chosen to follow important precedents established by NRP. For example, both panels are conducting the same kinds of systematic, replicable searches that characterized the earlier work. Although both NELP and NLP have chosen to examine a wider methodological range of research studies than did NRP, both have limited their evaluations of what works instructionally to experimental and quasi-experimental studies that allow a causal link to be established between instructional practices and outcomes in the same manner that NRP did. This is an important advance over earlier reviews that did not typically require an alignment of the research question and the appropriateness of the research to answer such a question. And both panels are relying heavily on meta-analytic tools wherever possible. 
CONCLUSIONS

The development and use of research synthesis in reading have seen breathtaking changes during the 20th century. When one considers the distance between the insightful, idiosyncratic, and highly individual scholarly pursuit of Edmund Burke Huey trying to craft theory from the universe of reading research and compares it to the highly collaborative, systematic, objective, partial, public, and policy-oriented synthesis of the National Reading Panel, one can see the distance between the ox cart moving across a shtetl at the beginning of the 20th century and the shuttle hurtling through outer space. Huey's task was much like that of an artist trying to translate a connect-the-dots (with few dots) into the Mona Lisa. By midcentury, there were too many dots to allow any but the most knowledgeable and skilled synthesizer to end up with a meaningful portrait, and it became increasingly difficult to show the correspondence of the original dots to the resulting picture. And now, we are at a time in reading education when synthesis is a science in its own right, with its own methodology and with an increasingly clear public commitment to using research synthesis as the basis for public standards of educational practice. This chapter is a history of growth and innovation. It has examined changes in journals, bibliographies, and the technology and use of synthesis. But through it all, there is one aspect of research review that has remained unalterable. This might best be revealed by a quote from Chall, in her analysis of Gray's monumental 1941 review in which she points out, "Today we seldom see Gray's warm enthusiasm for the value of scientific studies of reading. We rarely encounter his optimism about the uses of research for the improvement of reading instruction at all ages and for social policy with regard to literacy" (Chall, 1984, p. x). It is this thread of optimism about the value of research evidence in fostering public good that runs through the fabric of research synthesis from the very beginning; it is the thread of hope that links Huey to Gray and Gray to Chall and Adams and ultimately to the National Reading Panel and the panels that follow it.

REFERENCES

Adams, M. J. (1990). Beginning to read. Cambridge, MA: MIT Press.
Almasi, J. F., Garas, K., & Shanahan, L. (2002, December). Qualitative research and the report of the National Reading Panel: No methodology left behind? Paper presented at the annual meeting of the National Reading Conference, Miami, FL.
Anderson, R. C., Hiebert, E. H., Scott, J. A., & Wilkinson, I. A. G. (1985). Becoming a nation of readers: The report of the Commission on Reading. Washington, DC: National Institute of Education.
Austin, M. C., & Morrison, C. W. (1961). The torch lighters: Tomorrow's teachers of reading. Cambridge, MA: Harvard University Press.
Austin, M. C., & Morrison, C. W. (1963). The first r: The Harvard Report on reading in elementary schools. New York: Macmillan.
Boring, E. G. (1929). A history of experimental psychology. New York: Appleton-Century-Crofts.
Cameron, E. H. (1913). Reading. Psychological Bulletin, 10, 351-353.
Camilli, G., Vargas, S., & Yurecko, M. (2003). Teaching children to read: The fragile link between science and federal policy. Education Policy Analysis Archives, 11, 1-52.
Chall, J. S. (1958). Readability: An appraisal of research and application. Columbus: Ohio State University.
Chall, J. S. (1967). Learning to read: The great debate. New York: McGraw-Hill.
Chall, J. S. (1983). Stages of reading development. New York: McGraw-Hill.
Chall, J. S. (1984). Foreword. In J. T. Guthrie (Ed.), Reading: A research retrospective, 1881-1941. Newark, DE: International Reading Association.
Cooper, H., & Hedges, L. V. (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Dale, E., & Chall, J. S. (1948). A formula for predicting readability. Educational Research Bulletin, 27, 11-20, 37-54.
Downey, J. E. (1911). Graphic functions. Psychological Bulletin, 8, 311-317.
Glass, G. V. (1978). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.
Glass, G. V. (2000). Meta-analysis at 25. Retrieved September 10, 2004, from http://glass.ed.asu.edu/gene/papers/meta25.html
Gray, W. S. (1925). Summary of investigations relating to reading (Supplementary Educational Monographs, No. 2). Chicago: University of Chicago Press.
Gray, W. S. (1926). Summary of reading investigations (July 1, 1924, to June 30, 1925). Elementary School Journal, 26, 449-459, 507-518, 574-584, 662-673.
Gray, W. S. (1984). Reading. In J. T. Guthrie (Ed.), Reading: A research retrospective, 1881-1941 (pp. 1-89). Newark, DE: International Reading Association. (Original work published 1941)
Gray, W. S., Gates, A. I., Horn, E., & Yoakam, G. A. (1935). Reading. Review of Educational Research, 5, 54-69.
Hall, G. S. (1895). Editorial note. American Journal of Psychology, 7, 3-8.
Huey, E. B. (1968). The psychology and pedagogy of reading. Cambridge, MA: MIT Press. (Original work published 1908)
Jackson, G. B. (1980). Methods for integrative reviews. Review of Educational Research, 50, 438-460.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
National Institute of Child Health and Human Development (NICHD). (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction: Reports of the subgroups (NIH Publication No. 00-4754). Washington, DC: U.S. Government Printing Office.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical Journal, 1243-1246.
Pearson, P. D. (1990). Foreword: How I came to know about Beginning to Read. In M. J. Adams (Ed.), Beginning to read (pp. v-viii). Cambridge, MA: MIT Press.
Reed, J. G., & Baxter, P. M. (1994). Using reference databases. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 58-70). New York: Russell Sage Foundation.
Rosenthal, T., & DiMatteo, M. R. (2001). Meta-analysis: Recent developments in quantitative methods for literature reviews. Annual Review of Psychology, 52, 59-82.
Shanahan, T. (2000). Research synthesis: Making sense of the accumulation of knowledge in reading. In M. L. Kamil, P. B. Mosenthal, P. D. Pearson, & R. Barr (Eds.), Handbook of reading research (Vol. 3, pp. 209-228). Mahwah, NJ: Lawrence Erlbaum Associates.
Shanahan, T. (2004). Critiques of the National Reading Panel report: Their implications for research, policy, and practice. In P. McCardle & V. Chhabra (Eds.), The voice of evidence in reading research (pp. 235-266). Baltimore: Paul H. Brookes.
Smith, N. B. (1965). American reading instruction. Newark, DE: International Reading Association.
Snow, C. E., Burns, M. S., & Griffin, P. (Eds.). (1998). Preventing reading difficulties in young children. Washington, DC: National Academy Press.
Sokal, M. M. (1981). An education in psychology. Cambridge, MA: MIT Press.
Stanovich, K. E. (2000). Progress in understanding reading. New York: Guilford Press.
Stevenson, J. (1985). William S. Gray, teacher, scholar, leader. Newark, DE: International Reading Association.
Taylor, D. (1998). Beginning to read and the spin doctors of science. Urbana, IL: National Council of Teachers of English.
Tinker, M. A. (1927). Legibility and eye movement in reading. Psychological Bulletin, 24, 621-639.
Tinker, M. A. (1929). Visual apprehension and perception in reading. Psychological Bulletin, 26, 223-240.
Tinker, M. A. (1931). Physiological psychology of reading. Psychological Bulletin, 28, 81-98.
Tinker, M. A. (1933a). Diagnostic and remedial reading I. Elementary School Journal, 33, 293-307.
Tinker, M. A. (1933b). Diagnostic and remedial reading II. Elementary School Journal, 33, 346-358.
Tinker, M. A. (1937). Time taken by eye-movements in reading. Journal of Educational Research, 30, 241-277.
Tinker, M. A. (1946). The study of eye movements in reading. Psychological Bulletin, 43, 93-120.
Tinker, M. A. (1958). Recent studies of eye movements in reading. Psychological Bulletin, 55, 215-231.
Venezky, R. L. (1984). The history of reading research. In P. D. Pearson (Ed.), Handbook of reading research (pp. 3-38). New York: Longman.
Vernon, M. D. (1931). The experimental study of reading. Cambridge, UK: Cambridge University Press.
Woodworth, R. S. (1944). James McKeen Cattell, 1860-1944. Psychological Review, 51, 201-209.

11
Literacy and New Technologies: Ten Principles for Assisting the Poor¹

Daniel A. Wagner
University of Pennsylvania

Few areas of social and economic development have received as much attention, and as few proportionate resources, as adult literacy and adult education. Across the world, in industrialized and developing countries alike, it is widely acknowledged that about 5% or less of national education budgets is spent on the nearly 25% to 50% of the population in need of increased literacy skills (see Wagner, 2000). For several centuries, it has been variously claimed that literacy, a key (if not the key) product of schooling, would lead to economic growth, social stability, a democratic way of life, and other social and economic benefits. Detailed historical reviews have not been so kind to such generalizations (see several chapters in Wagner, Venezky, & Street, 1999). Both universal literacy and universal economic growth have suffered from what has been called at times development fatigue (King, 1991): governments and international agencies have come to feel that a great deal of toil and funding have led to only limited returns on investment.

¹ The author is especially pleased to have this chapter be part of the festschrift for Dick Venezky, who has been a partner and friend on adult literacy issues for many years. In his inimitable words, "Our best ain't none too good," which surely describes the state of evidence today on technology and literacy. This chapter is derived in part from a larger international review by D. A. Wagner and R. Kozma (2003). This report was supported in part by the U.S. Department of Education/Office of Vocational and Adult Education (ED-01-R-0023) to the University of Pennsylvania, under TECH21/UN Literacy Decade modification, as well as by the Spencer Foundation.


Further, the historical record contains not only debate, but considerable doubt, as to the efficacy of literacy campaigns around the world (cf. Arnove & Graff, 1987). Thus, as we enter the United Nations (UN) Literacy Decade (declared in February 2003), one might legitimately ask why we are doing this again. What has changed that leads us to believe that the goals and means for a special Literacy Decade will succeed when decades of prior effort have not "solved" the "problem of illiteracy"? Have the rationale and purpose been clearly staked out? Do we have new or better ideas? In this chapter, I consider some new capabilities for literacy promotion, specifically those offered by new technologies, and how they are beginning to change what can be done, and, indeed, must be done, to promote universal education for the 21st century.

PREVIOUS INTERNATIONAL LITERACY EFFORTS

The 2000 World Education Forum on Education for All (EFA) in Dakar, Senegal, included literacy within three of its six major worldwide goals:

(iii) ensuring that the learning needs of all young people and adults are met through equitable access to appropriate learning and life skills programs;
(iv) achieving a 50 per cent improvement in levels of adult literacy by 2015, especially for women, and equitable access to basic and continuing education for all adults; and
(vi) improving all aspects of the quality of education and ensuring excellence of all so that recognized and measurable learning outcomes are achieved by all, especially in literacy, numeracy and essential life skills.

As part of the worldwide EFA goals, a new approach to learning was emphasized, one that focused on measurable learning achievement (rather than mere class attendance or participation). These challenges, then, have formed the basis for renewed interest in literacy and adult education in recent years.

Concern about illiteracy, of course, has been a focus of human development activity in many parts of the world since well before Dakar. As part of the creation of the United Nations, after World War II, literacy was chosen as a key part of its mandate, and one that has been adopted by nearly all the international and bilateral agencies over the decades that followed. Focused international conferences on literacy also show its importance prior to Dakar in 2000, such as Persepolis (1976), Udaipur (1982), the 1990 World Conference on EFA (Jomtien, Thailand), the Mid-Decade EFA Review (Amman, 1996), World Conference on Literacy (Philadelphia, 1996), and the International Conference on Adult Education (CONFINTEA V, Hamburg, 1997). Although numerous such efforts have been undertaken to promote universal literacy, the fundamental problems, and the global statistics, have changed only moderately, whether in industrialized or developing countries. The United Nations estimates that there are one billion illiterate adults in the world today (about one quarter of the world's adult population), the vast majority of whom are located in the poorest countries of the world (UNICEF, 2000). Furthermore, recent surveys suggest that this situation is even more serious than previously believed: industrialized countries, over the past decade, have come to admit to having very serious problems of their own in literacy and basic skills, with up to 25% of adults considered to be lacking in basic skills needed to function effectively in the workforce (see OECD, 1997; Tuijnman et al., 1997). Moreover, when looking at access to print media, it has been found that while about 26% of adults in OECD countries read a daily newspaper, only 4% do so in developing countries. Due in large part to increasingly competitive and knowledge-based economies across the world, most governments and international/bilateral agencies continue to express concern about the impact of illiteracy and low literacy on economic prosperity and social progress. Resource allocations, however, have remained a disproportionately small fraction of what is contributed to formal schooling.
And even the substantial increase in primary school attendance in many poor countries is problematic, because it has driven the quality of that schooling downward, giving the erroneous policy impression that literacy problems have been solved by primary school attendance (Wagner, 2002).

LITERACY, ECONOMIC DEVELOPMENT, AND TECHNOLOGY

In January 2002, the UN General Assembly proclaimed the years 2003-2012 to be the UN Literacy Decade (United Nations, 2002a), which was officially launched on February 13, 2003. The founding resolution (56/116) reaffirmed the Dakar Framework for Action (mentioned above; UNESCO, 2000), in which the commitment was made to achieve a 50% improvement in adult literacy by 2015, especially for women, and equitable access to basic and continuing education for all adults. The International Action Plan


for implementing Resolution 56/116 states that "literacy for all is at the heart of basic education for all and that creating literate environments and societies is essential for achieving goals of eradicating poverty, reducing child mortality, curbing population growth, achieving gender equality and ensuring sustainable development, peace, and democracy" (United Nations, 2002, p. 3). The action plan calls for a renewed vision of literacy that goes beyond the limited view of literacy that has dominated in the past. The plan elaborates: "it has become necessary for all people to learn new literacies and develop the ability to locate, evaluate and effectively use information in multiple manners" (p. 4). These proposals and plans have come during a period of significant, interconnected economic, social, and technological change in which literacy and education have become even more important to personal, social, and national development. Economists claim that a profound shift has occurred in the role that knowledge and technology play in driving productivity and global economic growth (Stiglitz, 1999), a phenomenon referred to as the knowledge economy (OECD, 1996). From this perspective, knowledge is both the engine and the product of economic growth (OECD, 1999). The production, distribution, and use of new knowledge and information are major contributors to increased innovation, productivity, and the creation of new, high-paying jobs. Developments in human, institutional, and technological capabilities are, in turn, major sources of new knowledge and innovation. A parallel, linked consequence—sometimes called the information society (European Commission, 2000)—is the assertion that a broader social transformation is resulting from the convergence of computers and communication technologies and their assimilation throughout society. 
As information and communication technologies (ICTs), ranging now from laptops wirelessly connected to the Internet to cell phone Web browsers, personal digital assistants, and low-cost video cameras, become more accessible and embedded in society, they offer the potential to make education more widely available, foster cultural creativity and productivity, increase democratic participation and the responsiveness of governmental agencies, and enhance the social integration of individuals and groups with different abilities and of different cultural backgrounds. These economic, social, and technological transformations would have significant implications for the skills needed by both employees of the knowledge economy and citizens of the information society (21st Century Partnership, 2003). In the knowledge economy, there is an increased proportion of the labor force engaged in handling, producing, and using information, rather than producing more tangible economic goods (OECD, 2001). Consequently, employees in the knowledge economy must be able to use ICT to search for and select relevant information, interpret and analyze data, work with distributed teams, and learn new skills as needed. Particularly prized in the knowledge economy is the ability to use information to solve problems and create new knowledge. Similarly, citizens of the information society must be able to use ICT to access information about education, health care, and government services (European Commission, 2000). Participants in the information society need the skills to be creative producers of cultural artifacts and to communicate effectively with others, particularly those of different backgrounds. Furthermore, continued economic, social, and technological developments require that employees and citizens be able to acquire new skills in response to changing circumstances, to assess their own learning needs and progress, and to learn throughout their lifetime; they must become lifelong learners (OECD, 2001).

Although notions of knowledge economy and information society may characterize changes in the developed world, one might question their relevance for less developed countries where GDP, literacy rates, and access to technology are all low. Until relatively recently, developing countries have relied primarily on cheap, unskilled labor to compete in the global market. Although this may be a viable short-term strategy, the United Nations Industrial Development Organization (UNIDO, 2002) encourages developing countries to take the high road to development by building new institutions and infrastructure, along with providing the support needed to create new skills, information, and capabilities. A continuation of the current low-road strategy would mean that developing countries and transitioning economies risk being even further marginalized because their education and training systems are not equipping learners with the skills they need for the future, according to the World Bank (2002).
Thus, it seems clear that skills needed for lifelong learning not only prepare citizens for competition in the global market but also improve their ability to function as members of the community and thus increase social cohesion, reduce crime, and improve income distribution.

IS ICT FEASIBLE FOR PROVIDING LITERACY EDUCATION TO THE POOREST OF POOR?

In an era of increasing globalization, there is no area where one feels the pressure of rapid change more than that of technology and information transfer. And there is no area where the gap between rich and poor seems to be laid bare so starkly. Yet long before the term digital divide (NTIA, 1999) became a common term to describe gaps between the rich and poor in the access and use of ICT, most policymakers, researchers, and practitioners could at least agree on one thing: Using ICT for the


purposes of literacy and basic education seemed, at the very least, counterintuitive. It was, simply put, difficult for government and international agency authorities to imagine how the most destitute of education programs, lacking in basic amenities such as pencils and paper, could afford or manage advanced ICTs. Given the extreme lack of resources available to many adult literacy programs, it has been commonplace to hear exasperation among specialists, such as program managers and government workers, who see ICTs as just another burden placed on these already poorly funded programs or a nonexistent magic bullet that would be incapable of doing more than wasting scarce resources. Even reaching the so-called ordinary poor would entail challenges of electrical power, telecommunications connectivity, human resources infrastructure, and the like (DotForce, 2001). Reaching the poorest would be even more difficult due to wider gaps in those parameters just mentioned. In addition to infrastructural and material problems, there are also a variety of limitations in the human skill competencies of this target population (OECD, 1997). By human competencies, we refer here to a broad range of skills that often fall into the general catchall term literacy, but in fact include a wide variety of discrete skills, including reading and math, language and multilingual fluency, content knowledge in specific domains, eye-hand coordination, typing (and mousing) skills, and so forth.² This list is, in reality, relatively long when operationally specified (OECD/Statistics Canada, 2000). Limitations of human skill competencies, some acquired in schools, others in formal (work) or informal settings, remain a major barrier to the use of ICT tools as we employ them today. As mentioned above, international agencies assert that there are about 900 million adult illiterates in the world today, the majority of whom reside in South Asia and Africa.
Even these large (and growing per annum) numbers are likely to be a serious underestimation of literacy needs in the digital age. Indeed, if the larger set of skill competencies mentioned previously were employed, along with the limited efficiency of adult literacy and second-chance education programs, compounded by the very low quality of many poor rural schools in developing countries, it would probably be more accurate to say that those in need of improved basic skills today represent between 2 and 3 billion individuals (Wagner, 2000; Wagner et al., 1999), nearly three times the current UN estimates (see Fig. 11.1). Of these individuals, we might estimate that at least half are among the poorest of the poor because they will undoubtedly be overrepresented by ethnolinguistic minority groups, especially in postcolonial countries, where access in the metropolitan languages of the digital world (i.e., English, French, Spanish) is quite limited.

FIG. 11.1. Regional variation in traditional and technological illiteracy. (Literacy data adapted from UNESCO, 2000; technological illiteracy data on OECD adapted from OECD, 2002, while regional data are estimates extrapolated from descriptive accounts, as there is as yet no consensus on a definition of the latter term.)

² In the ongoing ALL survey of the Educational Testing Service, ICT literacy is defined as the "skills and abilities that will enable the use of computers and related information technologies to meet personal, educational and labour market goals" (Lowe & McAuley, 2003).

This situation, when considered in its entirety and over decades of promises and goals unmet (both within and across countries), would lead the rational observer to have serious doubts that ICTs might have something special to offer to the poor. Indeed, up until fairly recently, the most usual response from both international and national policymakers, as well as those practitioners on the ground, has often been that such advanced approaches to educational solutions are simply impractical, well beyond the fiscal and human resources of poor countries. However, let us reconsider the situation as we move into the 21st century. In many developing countries, the atmospherics concerning ICT applications have undergone a dramatic change: from what was universally labeled as impractical toward the ways in which ICT can be used most effectively. Even for the poorest population sectors, the benefits of ICT increasingly seem well suited for coping with the problems of basic literacy


and technological literacy and enhancing the socioeconomic consequences for the lives of the users. Why is this so? First, ICT tools can be adapted more appropriately to the diversity of learners. For example, poor people in developing countries (and many in industrialized countries as well) tend to live in dispersed geographical contexts and consist of diverse populations of youth and adult learners, where distance education can be an effective tool and where obtaining high-quality and time-sensitive information has been difficult or impossible. Also, because many poor people have either dropped out of school or are too old for the formal school system, the interactive and asynchronous nature of ICT can provide useful solutions if community-based resources can be made available to them after school hours. In addition, the diversity of the population of poor people (especially by ethnicity, language, and gender) requires the kind of consumer-driven and context-sensitive ICT tools that, when properly deployed and employed, can be far more effective than inadequately trained individual teachers. Second, there is a perennial problem with teacher training in developing countries. Professional expertise is limited and thinly distributed in the poorest parts of poor countries, and it can be enhanced by a variety of ICT-supplemented training tools (see the PDK teacher training multimedia project described in Wagner & Hopey, 2001). Further, beyond teacher competency is the issue of teacher motivation, which is often lacking among teachers posted to geographically isolated areas. Finally, even teachers who are well trained may lack the language skills necessary to be effective with poor, minority-language learners. Thus, minority-language ICT-based tools offer some advantages to ethnolinguistic groups (and their teachers) who do not have mastery of main national languages.
When ICT is mentioned as a solution among the very poor, it is often asked, "How can you give every poor person a computer or access to the Internet when ICT is simply too expensive for everyone, especially poor people?" This is quite right, up to a point. Although the United States is a very wealthy country in per capita income, as recently as 1998, a major digital divide problem was declared, and the government urgently sought ways to support ICT use among minority/ethnic groups. Yet by 2002, data on home Internet use showed that growth among the poorest ethnic minority groups (African American, Hispanic) was higher than that of majority (White) populations (see Fig. 11.2). These data do not imply that similar patterns of change will occur soon in poor countries, but they do indicate that it is not impossible for poor people to realize that putting scarce dollars into ICTs may be a good, indeed critical, investment for family survival and productivity of various kinds.

FIG. 11.2. Internet access in the United States (adapted from U.S. Department of Commerce, 2002).

However, the relatively high cost of ICT may not be the key question. A more pertinent question would be: "What ICT solutions should be considered in the near, medium, and long term with respect to poor populations with very diverse demographic characteristics?" One response in education is to focus on the professional development and training of teachers, because the quality of teachers is known in virtually all countries (rich and poor) to be a key predictor of student achievement. And, as noted earlier, many if not most teachers in poor parts of poor countries lack adequate training for the jobs they are doing and have few in-service training opportunities. Thus, teacher training provides a relevant locus for this kind of effort, assuming the cost constraints can be met. This is so not only because training a teacher can leverage impact on many more beneficiaries but also because it is not so difficult, even in poor countries, to bring most or all teachers to ICT, rather than having to take ICT out to all the teachers. Furthermore, teachers can become intermediaries for community-based programs that serve people who have had little or no prior access to ICT. Teacher training resources can be delivered through existing training institutions and would comprise CD-ROM-based materials, collaboration technology for sharing materials, pupil training resources, and culturally appropriate and multilingual content. Two examples of literacy-supported work for learners and for teachers emanate from the University of Pennsylvania. Recently, the federal government established a National Technology Laboratory for Literacy and Adult Education, also known as Tech21 (see www.tech21.org). Tech21 provides a one-stop location for vetting appropriate ICT solutions, providing results on effectiveness, and adapting for alternative uses in other sectors. Tech21 provides both end-user information and links for learners, as well as a repertoire of resources and training tools for teachers. Internationally, the University is partnering with the World Bank, UNICEF, and


other agencies (public and private) in a collaborative effort known as the Bridges to the Future Initiative (BFI; see www.bridgestothefuture.org), which began in 2003 in India, Ghana, and South Africa. The main goal of this project is to try to answer a key question posed in this chapter, namely, in what ways can ICT-based learning and information resources be put to service to assist the poorest sectors of populations in diverse cultural settings? Again, the BFI project focuses both on the end-user (in this case, usually primary school dropouts) and on teacher training resources. Both are being developed in a multimedia framework with the poorest populations in mind.

POLICY IMPLICATIONS OF DIFFERENT MODELS FOR USING TECHNOLOGY TO FOSTER LITERACY AND ECONOMIC DEVELOPMENT

The United Nations Development Program (2001) presents a model that illustrates the relationship between technology, skill development, and economic development. According to this model, a country's ICT investments can directly enhance the capabilities of its citizens. Increased skill capacity can, in turn, support the further development and increase the productive use of the technological infrastructure. The growing sophistication of the skill base and the technological infrastructure can lead to innovation and the creation of new knowledge and new industries. New knowledge and innovation support the growth of the economy that in turn provides resources needed to further develop the human, economic, and technological infrastructure and the welfare of society. Personal participation in this technology-knowledge-economic development cycle begins with literacy. The connection between literacy, technology, and global progress (with an emphasis on developing countries) is central. ICT is viewed primarily as a set of potential delivery and instructional tools that can be used to help people acquire the skills associated with traditional notions of literacy. In this approach, computer-assisted tutorials and other technology-supported resources can make education more accessible and help adults improve their ability to decode and comprehend prose text, thus increasing their literacy, employability, and continued use of literacy skills to become lifelong learners. The policy implications of this approach are relatively straightforward: Are the expenses associated with providing the hardware, software, and delivery infrastructure for literacy learning less than those required to provide this training by some other means? Or if not less expensive, are technology-based means more effective than traditional means and sufficiently so to justify the added costs?
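The two policy questions just posed can be read as a simple cost-effectiveness comparison. The sketch below is only an illustration under stated assumptions: the `Program` record, the functions `cost_per_unit_gain` and `prefer_ict`, and every figure are hypothetical and do not come from the chapter or from any program evaluation. It assumes that cost and literacy gain per learner can each be summarized as a single number.

```python
from dataclasses import dataclass


@dataclass
class Program:
    """One way of delivering literacy training (all figures hypothetical)."""
    name: str
    cost_per_learner: float  # total delivery cost divided by learners served
    gain_per_learner: float  # average gain on some agreed literacy measure


def cost_per_unit_gain(p: Program) -> float:
    """Cost of producing one unit of measured literacy gain."""
    if p.gain_per_learner <= 0:
        raise ValueError("gain_per_learner must be positive")
    return p.cost_per_learner / p.gain_per_learner


def prefer_ict(ict: Program, traditional: Program) -> bool:
    """Apply the two policy questions in order."""
    # Question 1: is the ICT route no more expensive, with no loss in gain?
    if (ict.cost_per_learner <= traditional.cost_per_learner
            and ict.gain_per_learner >= traditional.gain_per_learner):
        return True
    # Question 2: otherwise, does ICT deliver each unit of gain more cheaply,
    # so that any added cost is justified by added effectiveness?
    return cost_per_unit_gain(ict) < cost_per_unit_gain(traditional)


if __name__ == "__main__":
    # Illustrative numbers only, not data from any real program.
    ict = Program("ICT-supported tutorials", 120.0, 3.0)
    classroom = Program("Classroom instruction", 100.0, 2.0)
    print(prefer_ict(ict, classroom))
```

A single ratio is a deliberately narrow criterion; it leaves out the wider benefits of ICT infrastructure beyond literacy training itself.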


In a second approach, the relationship between literacy, technology, and development can be seen in a more integrated way, one that suggests very different policy implications. With this approach, literacy is defined as a broader set of text and technological skills that includes not only the decoding and comprehension of prose but the ability to access, analyze, evaluate, communicate, and use information to solve problems and create new knowledge (Educational Testing Service [ETS], 2002; International Society for Technology in Education [ISTE], 1998; OECD, 2000b; Quellmalz & Kozma, 2003). From this perspective, ICT is not just a means for delivering literacy skills but is an integral part of an information-literate society. Individual participation in this society not only involves text literacy skills but the skills to use technology as a means to access, disseminate, and create new information and knowledge products for the benefit of the individual and society. From a policy perspective, the costs and uses of ICT need to be considered in a broader educational, social, and economic context. The rationale for ICT investment is not justified merely in terms of a more efficient or effective means to deliver literacy training but also as an environment that sustains literacy and development by providing a wide range of productive tools and information by which literate people can use their skills to promote their own personal improvement and the social and economic development of the country. With this second approach, ICT investments would involve not only the development of the hardware, software, and network infrastructure but also the development of language-appropriate and culturally relevant content software and online information on health, nutrition, family planning, continuing education, employment, agricultural production, and so forth.
In addition, there is a great need for the tools and programs to support the local development and distribution of such relevant content. A significant benefit is that this new ICT infrastructure would be used not only for adult literacy and basic skills learning but also to support elementary and secondary education, improve community service and welfare, and promote the development of businesses. The policies and costs involved in such a coordinated approach are undoubtedly higher than those of the first approach alone—but the potential impact would be much greater.

TEN PRINCIPLES FOR HELPING THE POOR

Although many opportunities for success and failure will present themselves in the future, there are at least 10 principles that are likely to guide such efforts. These are as follows:


1. ICT may be a cost-effective strategy even for poor nations and poor people, especially when compared to the total cost of other traditional human resource-dependent solutions (e.g., the cost of well-trained teachers and institutional schooling).

2. Use of advanced ICT tools should not be restricted from use with the very poor. Indeed, advanced ICT tools may be relatively more cost-effective for the poor than for the rich. It was often thought that old ICTs (such as radio) were necessarily the best route to reaching poor people, whereas advanced ICTs were only cost-effective for the rich. The example of the cellular phone dispelled that thought. The Grameen Bank effort in South Asia showed that even the poorest people can find value and resources to support a system of cellular communications, which has become a driver for micro-enterprise (World Bank, 1994).

3. Learning technologies must have learning and content at their core. Many of the most egregious mistakes in the digital divide era (such as the provision in the 1980s of TV production labs and video libraries when a consistent electrical grid was unavailable or the present-day efforts of some ICT corporations to dump computers into countries with insufficient attention to educational utility) concern an overly narrow focus on ICT equipment, without commensurate focus on learning and content. Projects within the digital divide must first be about learning and about culturally appropriate content.

4. ICT tools must be consumer oriented and context/culture sensitive. Consumer sensitivity is a longstanding buzzword of marketing in the private sector, yet it seems to be sometimes forgotten in supply-side projects that try to marry ICT and education. Especially when focused on the poor, it is critical to pay very close attention to consumer interests and values, which also means ethnic, language, gender, and other cultural and contextual features. Even the Grameen Bank example, mentioned earlier, may be seen as effective only when used by a relatively small number of village (women) entrepreneurs who are sufficiently skilled to master the commercial and technical aspects of maintaining a mobile phone service.

5. Literacy and technology are interdependent. Literacy and technology are tools that have much in common. Neither is an end in itself, but each can amplify human intelligence and human capability. In addition, both are rapidly becoming interdependent. New literacy programs need to take advantage of the power of technology, but ICT work will require an ever more skilled population of workers and consumers.

6. JIT (just-in-time) and JEH (just-enough-help) concepts should be used to help keep the focus on priorities and maximize limited resources. This notion of tailoring assistance to meet the needs of individuals and groups of individuals derives from a substantial body of research on collaborative learning in the realm of ICTs.


7. Collaborations among government, educational, and nongovernmental agencies are critical to addressing digital divide problems for the poorest sector. Programs with staying power (that will be sustainable) are likely to have to reinforce existing government structures (rather than replace them) and enhance as a priority mainly those areas of public education that are most in need of assistance (e.g., teacher training). Further, institutions of higher education can assist in outcome evaluation and monitoring processes that are at the heart of determining whether further investments are warranted. Similarly, research is required to promote promising innovations.

8. Private sector involvement early in the process is essential to take advantage of the latest ICT tools. The private sector can offer advanced knowledge concerning ICT tools that will be coming into the marketplace, and that will inevitably provide cost-effective and cheap tools over time. Corporations can also pass down large numbers of newly obsolete computers, which can be quite serviceable among the poor. There are innumerable examples of such corporate philanthropy in both industrialized and developing countries.

9. Sustainability will nonetheless likely require some type of ongoing, subsidized (non-fee-driven) approaches for the foreseeable future. That is, private sector involvement is necessary but not sufficient. In today's environment, and especially when dealing with the very poor, the concern over sustainability can bias projects in directions that are not necessarily most effective for the end users. There is no single answer to this question, but there is little doubt that the poorest of the poor are unlikely to be able to pay user fees in the same way that the Grameen Bank model of cell phones was able to achieve over the past decade. Commercially viable ICT-based projects, such as fee-driven Internet kiosks, will have some benefits in poor population sectors, but it is unclear whether the poorest people will derive much benefit in the near term. To be more precise, it is clear that the poorest populations (with few exceptions) have neither the literacy (or ICT literacy) skills nor the user-fee resources to take advantage of kiosk-like approaches to ICT access. Further, as with most market-driven approaches, content development for kiosks will be inevitably biased toward those who have more money to spend and thus toward the upper end (financially) of even the poorest of communities. Hence, some type of subsidized (non-fee-driven) approaches will be required for the foreseeable future in such populations.

10. Finally, remember to keep a dedicated focus on the target population, that is, the poorest segments of the population; the wealthier groups will take care of themselves! At present, it is not unusual to find digital divide initiatives that provide better access to ICTs in universities, secondary schools, and primary schools. However, in a great many of these cases, the recipients are those who are already in the middle or upper classes of their respective societies; this is especially true in developing countries, where it is assumed that only middle-class communities can make appropriate use of ICT. The challenge, of course, is to stay focused on the poor; otherwise the digital gap will simply increase further.

CONCLUSIONS

The UN Literacy Decade began only recently. Its success will depend on the mobilization of the best talents that can be brought to bear on worldwide literacy problems. In this chapter, it is argued that the use (indeed, the increased use) of effective and appropriate technologies can play a significant role in creating a more literate world. Conversely, the failure to take appropriate advantage of ICTs to help improve the lives of the poorest and least schooled populations of the world makes it all the more difficult to achieve the Decade's goals. Further, the promise of information and communications technologies to enhance the education and livelihood of poor people is a tremendously challenging area of development work today. Yet, with a set of good principles and a reasonable level of support and an eye toward innovation, a great deal can be achieved to employ ICTs to help the poorest of the poor.

REFERENCES

Arnove, R. F., & Graff, H. J. (Eds.). (1988). National literacy campaigns. New York: Plenum.
DOTForce. (2001). Digital opportunities for all: Meeting the challenge. Report of the Digital Opportunity Task Force. Washington: World Bank/UNDP.
Educational Testing Service [ETS]. (2002). Digital transformation: A framework for ICT literacy. Princeton, NJ: Author.
European Commission. (2000). eEurope: An information society for all. Brussels: Author.
International Society for Technology in Education [ISTE]. (1998). National educational technology standards for students. Eugene, OR: Author.
King, K. (1991). Aid and education in the developing world. London: Longman.
Lowe, G. S., & McAuley, J. (2003). Information and communication technology literacy assessment framework (in the ALL survey). Princeton: ETS.
National Telecommunications and Information Administration (NTIA). (1999). Falling through the net: Defining the digital divide. Washington, DC: U.S. Department of Commerce.
OECD. (1996). The knowledge-based economy. Paris: Author.
OECD. (1997). Literacy skills for the knowledge society: Further results from the International Adult Literacy Survey. Paris: Author.
OECD. (1999). Knowledge management in the learning society. Paris: Author.
OECD. (2000). Literacy in the information age. Paris: Author.
OECD. (2001). Learning to bridge the digital divide. Paris: Author.
OECD. (2002). Measuring the information economy. Paris: Author.
Perraton, H. (2000). Applying new technologies and cost-effective delivery systems in basic education. UNESCO, Paris: World Education Forum, Dakar, Senegal.
Quellmalz, E., & Kozma, R. (2003). Designing assessments of learning with technology. Assessment in Education, 10(3), 389-407.
Stiglitz, J. (1999). Public policy for a knowledge economy. Washington, DC: The World Bank Group.
Tuijnman, A., Kirsch, I., & Wagner, D. A. (Eds.). (1997). Adult basic skills: Innovations in measurement and policy analysis. Cresskill, NJ: Hampton Press.
21st Century Partnership. (2003). Learning for the 21st century. Washington, DC: Author.
UNDP. (2001). Human development report 2001: Making new technologies work for human development. New York: United Nations.
UNESCO. (2000). The Dakar Framework for Action: Education for all: Meeting our collective commitments. Paris: Author.
Unicef. (2000). The state of the world's children. New York: Author.
United Nations. (2002a). Resolution adopted by the General Assembly: 56/116. United Nations Literacy Decade: Education for all. New York: Author.
United Nations. (2002b). United Nations Literacy Decade: Education for all. International plan of action: Implementation of General Assembly Resolution 56/116. New York: Author.
United Nations Industrial Development Organization [UNIDO]. (2002). Annual Report 2002. Vienna: UNIDO.
U.S. Department of Commerce. (2002). A nation online. Washington, DC: Author.
Wagner, D. A. (2000). Global thematic study on literacy and adult education. Paris: World Education Forum.
Wagner, D. A. (2002, May). Analytic review of four LAP country case studies. Philadelphia: ILI/UNESCO.
Wagner, D. A., & Hopey, C. (1999). Literacy, electronic networking and the Internet. In D. A. Wagner, R. L. Venezky, & B. V. Street (Eds.), Literacy: An international handbook. Boulder, CO: Westview Press.
Wagner, D. A., & Kozma, R. (2003). New technologies for literacy and adult education: A global perspective. Technical report. Philadelphia: International Literacy Institute/University of Pennsylvania.
Wagner, D. A., Venezky, R. L., & Street, B. V. (Eds.). (1999). Literacy: An international handbook. Boulder, CO: Westview Press.
World Bank. (1994). Poverty reduction strategy: The Grameen Bank experience. HRO Dissemination Notes. Human Resources Development and Operations Policy, Number 23. Washington, DC: Author.
World Bank. (2002). Lifelong learning in the global knowledge economy: A challenge for developing countries. Washington, DC: Author.


12

Problematic and Promising Trends in Holding Teacher Education Programs Accountable

Frank B. Murray
University of Delaware

Elaine M. Stotko
Johns Hopkins University

Teaching is seemingly different from the other learned professions on several interrelated dimensions. In comparison with other professions, it is massive (e.g., more than 3,000,000 practitioners compared with law's 400,000) and less well compensated.1 There are, for example, approximately 1,300 schools of education in contrast to 180 law schools and 125 medical schools. In teaching, unlike most professions, the client does most of the work (viz., the students labor to learn their lessons, but the lawyer's clients can do very little on their own to produce justice). Unlike other professions, teachers do not set or control the standards for their profession. The skills of the teaching profession, in contrast to the skills of the other professions, seem quite accessible to laypersons. Nearly every adult teaches someone something in the course of daily life and believes he or she could teach school; they simply prefer not to. Indeed, teaching is seen as a natural act that can be readily observed in those who have had no professional training in the profession of teaching. Moreover, professional training in teaching is apparently not very difficult because, unlike the other professions, persons of modest abilities are admitted to teacher education programs, and few of them fail. In fact, almost all earn top marks for their efforts (Darling-Hammond, 2001; Howey & Zimpher, 1996).

1 In 2000-2001, for example, the average salary for beginning teachers was just under $29,000, compared to $42,712 for fields other than teaching (American Federation of Teachers, 2001).


THE LOW STATUS OF QUALITY ASSURANCE IN THE TEACHING PROFESSION

Teaching, despite its ancient and transparent accessibility, seems to have all the modern quality assurance mechanisms of the other professions. It has licenses, certificates, academic degrees, accreditation, standardized examinations, standards boards, prize-awarding professional associations, and so forth, but none of these routine mechanisms for the assurance of quality has the result for the teaching profession that it apparently has for the other professions. Their outcomes in the case of teaching seem afflicted, at least in the public mind, with more false positive mistakes than in the other professions. The public and policymakers, in other words, have come to doubt that the traditional methods of assuring quality in teaching yield what they seem able to yield elsewhere. They believe that many teachers with academic degrees from accredited schools of education and teaching certificates are not competent teachers (Conant, 1963; Judge, Lemosse, Paine, & Sedlak, 1994; Koerner, 1963). Fewer than half the nation's education schools and colleges of education are currently accredited, a fact that appears to have no appreciable consequence for a school's reputation or for the prospects for its graduates. This fact would not be troubling if only the worst schools of education were unaccredited, but some of the leading and nationally ranked schools of education have declined the opportunity to be accredited. Apparently, neither accreditation of the schools of education, nor the degrees they offer, seems to provide a trustworthy basis for professional practice. This depressing trend does not stop here, however. Departing from the practice of other professional national boards, the National Board for Professional Teaching Standards (NBPTS) elected not to require a degree in teacher education or a teaching license for those permitted to sit for its certification examinations.
Alternative routes to a state's teaching license, increasingly popular with policymakers, invariably bypass the teacher education degree and the standards that state-approved programs must meet. The state's requirements for teaching licenses, even when given automatically to graduates from a teacher education program, are easily waived, and the licenses are not required for private school teaching in some states. It would be an unthinkable public policy to require driving licenses only for those who drive publicly owned vehicles or medical licenses only for those who work in public hospitals and clinics. However, nationally, policymakers have required that only teachers employed in public settings be licensed. Also, states regularly grant the teaching license to graduates of unaccredited schools, a practice without parallel in law or medicine. It is rare indeed to find tangible evidence that anyone, inside or outside the profession, has confidence that the education school degree, the teaching license, or the accreditation status of the education school can be trusted to accomplish what they seem to accomplish in other fields. The last reauthorization of the federal Higher Education Act did not permit funds to go to a college or school of education, preferring instead that education schools partner with the "more responsible" public schools or arts and science colleges. To receive the designation Meritorious New Teacher, a program recently proposed by the Mid-Atlantic Regional Teacher Project (MARTP) and supported by the Council for Basic Education (CBE), applicants must submit evidence about the noneducation portion of their training, namely, their scores on the SAT, ACT, or GRE tests.

THE INITIAL POLICY REMEDY FOR THE PERCEPTION OF LOW QUALITY IN TEACHER EDUCATION

In the decade following the release of A Nation at Risk (National Commission on Excellence in Education, 1983), all but 10 states added basic skills tests to their teaching license requirements. The subject matter of the tests (e.g., four-function arithmetic, spelling, basic reading comprehension) ordinarily would be a presumed prerequisite to the college degree. Thirty states eventually went further and retested the graduates' subject-matter knowledge, presumably because the college major, or the degree in education, is an insufficient indicator of competence in the teaching field. So that the public can have assurances not provided by the education degree or by accreditation, Section 211 of Title II of the Higher Education Act required that only colleges of education (not the colleges of business, law, medicine, physical therapy, or nursing) report the pass rate of their graduates on state licensing examinations. One admitted goal of the new Ready to Teach Act2 is the closure of a perceived loophole in the earlier Title II reporting that permits teacher education programs to submit misleading pass rates for their graduates on the state's license examinations.
Congress, in its own attempt to hold education schools accountable for their programs, thought the pass rates on the license tests for the graduates of the nation's colleges of education were the telling indicator and that the public display of the pass rates would have the salutary effect on education schools that was later hoped for in the No Child Left Behind (NCLB) act for public schools. As usual, the devil was in the details: almost immediately, the problems described in the following paragraphs were discovered in the scores reported by the nation's schools of education.

2 See http://edworkforce.house.gov/issues/108th/education/highereducation/2211billsummary.htm.


One issue that plagued the Title II reporting, and also NCLB, from the beginning was that states use different tests and different passing rates, making comparisons among states about the quality of graduates impossible. And which pass rate should be reported when a program's graduates go on to teach in several states, each with its own test and cut score? One solution might be to report only the pass rates of those who actually teach within the state where the teacher education program was delivered. This solution, however, could present a misleading picture of a program's graduates because, for some schools, it would omit many graduates, sometimes even the majority of graduates in a program that draws candidates from several states. These out-of-state candidates are often the superior students in the program owing to the fact that admission standards are typically higher for out-of-state students. Because the modal number of institutions that today's college students attend is approaching three, there is also the problem of which institution should be credited with the pass or fail of the candidate for the license. Should it be the one that actually gave the degree, the one that gave the student teaching course, or the one(s) in which a student enrolled for two semesters (the Ready to Teach Act proposal), or some other proportional allocation of credit? Apart from who is reported and which institution is credited, there are issues about which score to report. Some license tests have several parts that can be taken separately at different times. What should be reported for a candidate who takes only one subtest and declines to take the others or who passes some and fails the others? What score gives the best representation of the competence of the program's graduates? Similarly, most states allow the test to be retaken until a passing score can be obtained. In those instances, what rate should be reported—the highest, the lowest, the average of all the rates? 
Again, which passing rate best represents the program's candidates? Even if one satisfactorily solved the problems of which candidates and which rates to report, there is the larger issue of whether the passing rate information is a credible measure of the quality of an institution or program. Passing rates are problematic for several reasons. First, the current tests aren't that good; no one has established the psychometric validity of the tests or their associations with high or low levels of actual teaching performance.3 The validity methods that the states use, and that the courts accept, are well below the standards teachers are taught to apply to their own classroom tests and those researchers use in their work. Moreover, the tests are not very demanding and have loose connections to the content of teacher education programs in the country. To conclude that an education school is good because large numbers of graduates pass these tests is a risky conclusion because the tests do not assess actual teaching performance. Some teacher education programs, for example, take on challenging instructional assignments, recruit nontraditional students, and succeed as a result with a smaller number of candidates than those programs that take on less-challenging assignments. They may also have lower pass rates as a result. It would be as much of a mistake to infer that these programs are of poor quality as it would be to conclude that the Johns Hopkins Hospital, which has a higher death rate (or lower pass rate) than most local community hospitals, should be closed down because it was a low-quality hospital owing to its low "pass rate." Consequently, it is a fool's errand to create rank-order lists of institutions within states by their pass rates. The formulators of the Ready to Teach Act have expressed their concern, and sometimes their outrage, that education schools earn 100% pass rates by using the license tests as screening tests. It is a curious concern, however. If the tests truly measure something important, as the legislation must presume, of what relevance is the timing of the tests as long as they are passed? The public and the state only want to know that the candidates passed the test, not when it was passed (before, during, or after the program was completed). That candidates can pass the license tests before they enter a program, or complete a program, could mean any number of things. It could mean that they were superior students, that the test was too easy, that the test is irrelevant to matters of teaching, or that teacher education programs don't add much value, and so forth.

3 In fact, the evidence from the initial round of Teacher Education Accreditation Council (TEAC) accredited teacher education programs showed a uniform absence of any meaningful correlation between the license scores of graduates in five states and their teaching performance scores in the program (www.teac.org).
In the current policy climate, because the incentives are exclusively focused on high pass rates, education schools can be expected to take steps to succeed in the areas in which others hold them accountable. The more glaring loophole in the pass-rate accountability approach is, despite all the previous problems, that it is not a stable indicator of candidate competence because the states set different passing scores on the same test, some as low as the 50% mark. The average or most representative score of those the institution has recommended for a teaching license, coupled with a sound measure of variation, would have been the obviously superior choice for the mandated Title II report. Even here, one needs to be vigilant because, for example, a score of 170 out of 190 on PRAXIS I, a common license test and state passing score, would seem to be a demanding 90%. It turns out to be less impressive when it is revealed that the range of scores for PRAXIS I is actually 150-190, not 0-190. This places the 170 passing score in the middle and not the top of the distribution, a place significantly lower than passing scores on most academic tests. The closing of the loophole requires a frank recognition of the inherent weakness in pass-rate data, of course, but the more important undertaking in making education schools accountable is the examination of the evidence schools of education truly rely on to support their public claims that they are preparing competent, caring, and qualified teachers. Those who would hold education schools accountable for the quality of their programs, including the education schools themselves, should scrutinize and verify the evidence the schools actually have and rely on to support their claims of quality. In doing so, there needs to be skepticism about the value of single standardized measures of teacher competence or student competence. The grades the faculty give teacher education students represent some hundred hours of evaluation over 4 years by 40 independent raters. The standardized license test of the same content occurs with sharply reduced sampling on all dimensions and should, on psychometric grounds alone, be seen as less reliable and valid. It should give way on other empirical grounds as well. More than 30 years ago, Green, Ford, and Flammer (1971) reported a case in which children whose test scores indicated that they understood the floating body principle were shown on more careful individual questioning and interviewing to have no suitable or accurate understanding of the principle. There are decades of corroborative empirical findings in the cognitive development literature to the effect that the student's answer to a test question is only the beginning, often a poor beginning, to the eventual revelation of a student's understanding of the topic the test designer hoped to measure.
Ginsburg (1997) makes a persuasive case that the "only solution" to the assessment of the child's thinking "is to create objectively different tests that have equivalent subjective meaning for the people in question. Standardized administration won't work" (p. 13). He is able to show that the interpretation of the student's standardized response is poorly and misleadingly aligned with the responses exhibited in a clinical method of assessment. Harlow and Jones (2003), to take a current example, examined how a sample of students answered questions on the Third International Mathematics and Science Study (TIMSS) and investigated whether the test questions represented the scientific understanding of these students. They found the following results: Only 13% of the written test items actually elicited the knowledge held by the students in the middle school interview sample. For 58% of the items in the test, students had more knowledge than they wrote in their written responses, and for 29% of the items, students who had the "correct" written response did not have a complete understanding of the concept being assessed. (p. 10)
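
The arithmetic behind the PRAXIS I example earlier in this section, and the earlier question of which retake score should count, can be made concrete with a short sketch. The Python below is illustrative only: the function names and the candidate attempt histories are hypothetical and were invented for this example; only the 170 passing score and the 150-190 scale come from the text.

```python
from statistics import mean


def fraction_of_scale(score, low, high):
    """Position of a score within a reported scale: 0.0 is the bottom, 1.0 the top."""
    if not low <= score <= high:
        raise ValueError("score lies outside the reported scale")
    return (score - low) / (high - low)


# Reading 170 as "170 out of 190" assumes the scale starts at zero.
naive = fraction_of_scale(170, 0, 190)      # roughly 0.89
# With the 150-190 range given in the text, the same score sits at the midpoint.
actual = fraction_of_scale(170, 150, 190)   # exactly 0.5


def pass_rate(attempts_by_candidate, cut_score, rule):
    """Share of candidates counted as passing when each candidate's
    attempts are collapsed to a single score by `rule` (max, min, mean)."""
    collapsed = [rule(attempts) for attempts in attempts_by_candidate.values()]
    return sum(score >= cut_score for score in collapsed) / len(collapsed)


# Hypothetical attempt histories, invented for illustration only.
attempts = {
    "candidate_a": [165, 172],
    "candidate_b": [171],
    "candidate_c": [160, 168, 170],
    "candidate_d": [158, 166],
}

# The same cohort and the same cut score, under three reporting rules.
rates = {
    name: pass_rate(attempts, 170, rule)
    for name, rule in (("highest", max), ("lowest", min), ("average", mean))
}
```

With these invented data, the cohort's pass rate is 75% when each candidate's highest attempt is counted and 25% when the lowest or the average attempt is counted, which is one way of seeing why a single reported pass rate is an unstable indicator.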


THE NATURE OF THE EVIDENCE IN TEACHER EDUCATION It is fair to ask if the nature and validity of the evidence and scholarship in the field of teacher education is itself equal to the task of assuring the public that its teachers are well educated and competent. Some have asserted that the knowledge base in teacher education is too weak to even support academic standards (e.g., Meier, 2000; Ohanian, 1999, 2000; Raths, 1999). Any discussion of the evidence that could support the program's claim that its teacher education students have learned their lessons would begin with consideration of the academic grades that are already supposed to measure such learning. There is no question that teacher education faculties currently rely on the grades they give for any number of high-stakes decisions they make about the programs they administer (e.g., who can transfer in, who can continue in the program, who can student teach, who can graduate). Course grades are meant to be a measure of subject-matter understanding, but their validity is threatened by the fact that they are frequently measures of other matters that may have only a tangential relationship or even no relationship with the candidate's mastery of the subject matter. Some of the common threats to the validity of course grades occur when they become influenced by other factors and become as a result measures of these other factors. 
In contemporary higher education, it is fair to say that grades may be, in varying degrees, measures of any, or all, of the following: • A measure of punctuality (when faculty take points off for late work or give extra points for early work) • A measure of gain or growth (when faculty base the grade on the degree of improvement over the course of the semester) • A measure of place in a distribution (when faculty assign grades on the curve, or some predetermined percentage formula, so that the grades only indicate the candidate's percentile or rank in the class) • A measure of dishonesty (when faculty or the university lower the grade for cheating, plagiarism, and so forth, with the result that a low grade is uninterpretable because it may signify a low level of understanding or a low level of honesty) • A measure of extra effort (when faculty give extra points for more work that may not be qualitatively superior to the prior work but is simply quantitatively more than other candidates have done) • A measure of attendance (when faculty deduct points for cutting class or arriving late)


• A measure of writing skill (or some prior expertise separable from the subject matter, as when neatness, rhetoric, or format count)
• A measure of reduced spread (when faculty inflate the grades or reduce the variance as in the quip "the best way to turn C students into B students is to put them in graduate school")
• A measure of motivation and perseverance (when candidates receive the last grade of several unsuccessful attempts at the subject matter, or when effort is rewarded)
• A measure of a candidate's background (when faculty members introduce examples and analogies that speak to some groups of students more than others, or when there is cultural bias in the teaching format⁴)
• A measure of political statement (when faculty are sensitive to the candidate's draft or immigration status, scholarship or grant conditions, graduate/undergraduate status, and gender and take these into consideration in the assignment of course grades)

For all practical purposes, today's teacher education program faculty can typically provide evidence of teacher education candidate learning by some combination of the following categories of evidence, each of which is plagued with known flaws and distortions:

• Grades: overall, in the subject-matter major, pedagogy, clinical skill, education foundations, liberal and general learning, and so forth
• Standardized tests (entrance, exit, and license): given to graduates or the graduates' own students (in rare instances when these scores are available for examination), or both
• Survey results: including those from students, alumni, and employers of the graduates about the graduate's competence
• Ratings by faculty: including those of students' portfolios, work samples, and cases of teaching competence
• Basis for rates: the factors that determine hiring/promotion, certification, graduate study, professional awards, publications, NBPTS certification, and so on

⁴ Consider, for example, the following test of a student's ability to form a category: "Delete the element that doesn't belong with the others: Violin, Drum, Guitar, Cello." Drum is the correct choice, of course, but a student familiar with symphony orchestras could delete guitar, and students unfamiliar with musical instruments might delete cello on the view that the other three were musical instruments and cello was simply unknown. In each case, there would be equivalent mental functioning with respect to the underlying skill of the formation of a class or category, yet in only one case would it have been recognized. In sum, we would be misled in our inference of the abilities of those who deleted guitar or cello.


As with grades, there are serious but manageable validity issues within each of these categories of evidence. Hiring rates in times of teacher shortage, for example, may not be the indicator of candidate accomplishment that they would be in times of teacher oversupply. In times of shortage, hiring rates may indicate nothing about quality, but the rate of first-choice hires may still prove to be a valid indicator of candidate accomplishment. Passing rates on the currently available teaching license tests are surprisingly high, but the passing scores are sometimes set at the 25th percentile of actual cohort performance and, in some cases, with fewer than half the test's items answered correctly (Mitchell & Barth, 1999).

Thus, we are left with a cluster of measures, any one of which may be flawed and weak. However, if taken together, they may converge and align to provide an improved evidentiary base for the program faculty's claim that the program's graduates are competent. This evidentiary base is more persuasive than the consensus standards approach, which in the past would have anchored, for example, the evidence of candidate learning in the course syllabus. Not that long ago, the teacher education accreditor or the state program review panel examined the syllabi for the required math courses as the sole basis for their inference of the candidates' mathematical competence. At best, the syllabus is evidence of a faculty member's intention and may not be evidence of the course that was actually taught to and experienced by the candidates. The use of the single license test score, as problematic as it is, was at least an evolutionary advancement in the quality of evidence accepted in accreditation and state program review.

The overriding problems with accountability by single tests, even if they provide an advancement over no meaningful assessment, are as follows:

1. The tests are seen as disproportionately important.
2. They distort what they are designed to monitor.
3. They define the curriculum.
4. The item formats and rubrics become part of the curriculum.
5. They become, for all intents and purposes, the goal of schooling, with a worrisome shift of power from the school to the agency that funds the test.

THE EVOLUTION OF EVIDENCE

The evolution of evidence in education, that is, the improvement in the validity of the available measures, is slow and is hampered by a discipline that does not seem able to build on past accomplishments. For example, after the report A Nation at Risk (National Commission on Excellence in Education, 1983), the U.S. Secretary of Education posted a wall chart representing the educational health of each state by its mean SAT score (or ACT score). The validity of the evidence was low partly because it was not corrected for participation rates (the number and portion of the population taking the test), which had significant impact on the mean scores (the greater the participation, the lower the mean). Even corrected for participation, the mean SAT was an imperfect, although readily available, measure because the test had been designed to be insensitive to variations in the high school curriculum.

The SAT, an ability test, has gradually been replaced as a measure of educational health in public policy discourse by the National Assessment of Educational Progress (NAEP) tests, which were actually about the curriculum. However, until federal legislation regarding the use and design of NAEP was modified, state norms could not be reported or compared. The shift from the SAT to NAEP, though requiring decades, was largely a change in political wisdom and strategy. Shifts that are based on evolving scholarship, such as are needed for valid measures of teacher competence, require even longer periods and more complex analysis.

Consider the kind of evidence we would accept as conclusively indicating that a child knows the simple school concept that the number remains the same when the configuration of five objects, • • • • •, becomes spread out, •   •   •   •   •. What would the child have to say or do to convince us that he or she knew what we know: that the number is the same even though the spread-out row looks as if it had more objects? This would appear to be a simple matter, far less demanding than what we would need to ask of a prospective teacher to determine teaching competence.
We would ask the child whether the number in the spread-out row was more, less, or the same as the number in the original row and take the child's answer as a solid indicator of what the child understood about number and spatial arrangement. What has frustrated researchers, who were initially inclined to pose and rely on such a question, was that young children (and some adults confronted with more complex arrangements of objects) would argue, even after correctly counting the number, that the spread-out row still had more objects. What did the child's assertion of inequality mean? What was it evidence of? Was it, for example, evidence of a problem in the child's perception, language, cognition, maturation, learning, development, logic, and so on?

The methods that would yield conclusive evidence to this question took more than 30 years (from the 1960s to the 1990s) to analyze and assemble (Murray, 1978, 1990, 1992; Smith, 1993). They have the following eight elements that are emblematic of the work still to be done in the field of teacher education:

1. Judgment. What the child says in response to the question, "Does this row have more, less, or the same number as the other row?"

2. Reasons. The child might support the assertion of equality with appeals to the fact that nothing was added or subtracted, that it was the same objects in the row, that the spread-out row could be put back as it was before, that one row only looks more than the other, and so on. Whether these are adequate reasons is itself a matter of debate, but clearly we would want more than the child's yes/no response to the question in Item 1, whatever it was.

3. Duration. Our confidence would increase if the child responded the same way at a later time on the assumption that ideas truly understood are almost never given up, whereas those held on other grounds often fade.

4. Resistance to countersuggestion. On the assumption that what is truly understood is not easily modifiable, the child could be presented with counterevidence, pressure, and argument in an effort to change the child's response to see if he or she would give up the initial response.

5. Specific transfer. Our confidence that the child truly understood the number concept would increase if the assessment were made with different materials and with different tasks of the same specific form on the assumption that the understanding of number transcends any particular task features.

6. Nonspecific transfer. Our confidence in the evidence might increase further if the child succeeded in a family of tasks in different domains that had a common theoretical structure. If the child also knew that other features of numbers of objects (their mass or weight) were unaffected by spatial reorganization, we might be even more confident that the child truly understood the original number task.

7. Trainability. This criterion for understanding is the converse of the countersuggestion criterion because it assumes that a quick or abrupt change in response accuracy after feedback, hints, cues, argument, and so on indicates that the original response was not a valid indicator of true understanding. Because genuine understanding is a relatively slow process and not amenable to fast change, we assume that a quickly trained response is not based on genuine understanding. If the child were incorrect initially, but was easily trained in the right answer, we would assume that the child really understood the problem all along and that his or her initial failure was due to inattention or misinformation about some aspects of the task.

8. Necessity. If the child truly understood the task, he or she would also know that the outcome had to be what it was and could not be different from what it is and so on. Our confidence increases further still should the child assert that the number is not only equal but also that it would always be equal and would never be affected by the spatial arrangement of the elements.

When all eight measures align themselves in a consistent pattern, we can be more confident that the child truly understands what we understand about the number of objects in the array. If the child's response to Item 1, the item most similar to a standardized test question, were the sole or principal means of assessment, the true assessment outcome would be forever in doubt. It was so frequently in doubt in the research literature that the fact that children's responses to Item 1 were so often false positives was considered a phenomenon in its own right (viz., pseudooperativity or pseudoconservation).

If it proves so difficult to establish compelling evidence about such a simple matter as a child's concept of the invariance of number, and if the available methods of documenting a teacher's competence are of similar limited validity, how might the evidence in this domain evolve and develop to the point that it could serve as a reliable basis for assessing the quality of teacher education programs?

A PROMISING LINE OF EVIDENCE

One promising line of thinking on this question can be found in an examination of Table 12.1, which gives the percentile scores of schoolchildren in Tennessee on the Grade 5 standardized test of mathematics achievement as a function of whether the students were taught in Grades 3 to 5 by teachers who produced low, average, or high gains from their pupils (based on Sanders & Rivers, 1996; Wright, Horn, & Sanders, 1997).
TABLE 12.1
Percentile Achievement in Grade 5 Mathematics on the CTB/McGraw-Hill Tests in Two Tennessee Urban School Systems (A & B) After 3 Years of Instruction by Teachers Who Produced Low, Average, or High Gains in Achievement

                  Pathways Through Grades 3-5 With Low-, Average-, and High-Gain Teachers
Grade 3           Low    Low    Low    Avg.   Avg.   Avg.   High
Grade 4           Low    Low    Low    Avg.   Avg.   Avg.   High
Grade 5           Low    Avg.   High   Low    Avg.   High   High
School System A   44     63     83     61     80     92     96
School System B   29     40     59     39     50     70     83


The data in the table indicate that individual teachers make substantial differences in student achievement, a point that has been doubted since the 1960s when differences in student achievement were attributed primarily to differences in social class and other nonschool factors (Ferguson, 1991). The data reveal that if two pupils of equal standing and ability entered system A or B at the same time, and one had three successive teachers who produced low gains in their pupils (leftmost column), and the other had three successive teachers who produced high gains in their pupils (rightmost column), their fifth-grade math achievement would differ by more than 50 percentile points (96 vs. 44 in System A; 83 vs. 29 in System B), a life-altering difference that seems wholly attributable to their respective teachers. No other factor (class size, school system, heterogeneity of the class, etc.) yielded effects of this magnitude (Sanders & Rivers, 1996). Throughout the table, the data also reveal that a single teacher's beneficial or harmful influence extends over several years with very little if any evidence of compensatory effects from successive teachers.

Rivers (1999) further examined how sequences of low and high gain-producing teachers in Grades 5-8 influenced the performance of fourth graders on a high-stakes minimum competency math test taken in the ninth grade. A sequence of four high-gain teachers (vs. four low-gain teachers) significantly increased the probability that the fourth-grade student would pass the high-stakes minimum competency test. The probability of passing the ninth-grade test rose from .10 to .60 for a bottom-quartile fourth-grade student, for example, and from .40 to .80 for a second-quartile student.
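The size of the gap described above can be checked directly against Table 12.1. The following is a minimal sketch in Python, not part of the original analysis: the percentile values are transcribed from the table, while the names TABLE_12_1, pathway_gap, and extreme_gaps are hypothetical labels introduced here only for illustration.

```python
# Percentile scores transcribed from Table 12.1 (Grade 5 mathematics).
# Each key is a teacher-gain pathway through Grades 3, 4, and 5.
# The variable and function names are illustrative assumptions.
TABLE_12_1 = {
    ("Low", "Low", "Low"): {"A": 44, "B": 29},
    ("Low", "Low", "Avg"): {"A": 63, "B": 40},
    ("Low", "Low", "High"): {"A": 83, "B": 59},
    ("Avg", "Avg", "Low"): {"A": 61, "B": 39},
    ("Avg", "Avg", "Avg"): {"A": 80, "B": 50},
    ("Avg", "Avg", "High"): {"A": 92, "B": 70},
    ("High", "High", "High"): {"A": 96, "B": 83},
}


def pathway_gap(system, better, worse):
    """Return the percentile-point difference between two pathways in one system."""
    return TABLE_12_1[better][system] - TABLE_12_1[worse][system]


def extreme_gaps():
    """Compare three high-gain teachers with three low-gain teachers per system."""
    high = ("High", "High", "High")
    low = ("Low", "Low", "Low")
    return {system: pathway_gap(system, high, low) for system in ("A", "B")}


if __name__ == "__main__":
    for system, gap in extreme_gaps().items():
        print(f"System {system}: {gap} percentile points")
```

Under these assumptions the comparison yields 52 points for System A and 54 points for System B. The same helper can compare any two pathways that appear in the table, for instance a single high-gain teacher in Grade 5 after two low-gain years against three low-gain years.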
If a school of education could show that the graduates of its elementary school teacher education program were overrepresented among the teachers who produced high gains and underrepresented among the teachers who produced low gains, many would take that as solid evidence of the program's success. Such evidence, in fact, might trump any other evidence cited previously. Almost no one would care what the graduates' scores were on the license test if it could be shown that the graduates were those who reliably produced high gains in their pupils.

Of course, there are weaknesses and limitations in the approach represented in Table 12.1. A clinical method of assessment would undoubtedly reveal that the students' true understanding of elementary school mathematics was both significantly stronger in some respects and weaker in other respects than what was revealed in the relatively inexpensive paper-and-pencil assessments Tennessee employed. Those who gained and lost in a clinical method analysis could be quite different from those identified by the Tennessee tests, and consequently the implications for the evaluation of their teachers' effectiveness would be problematic.

The findings in Table 12.1 are also less striking for other areas of the curriculum, such as reading, where teacher effects are attenuated by factors outside the school that contribute significantly to the students' literacy skills. At the moment, there is also some circularity in the value-added argument: high-gain teachers are defined as teachers whose pupils gain more on tests, so the conclusion that they have pupils who gain more and achieve more on tests is not as interesting as the fact that the gains can be attributed to the teacher. In addition, though the public may be gratified to see that there is measurable growth and gain over the course of a year's instruction, they are more interested in the status of the students' competence and whether it is sufficient.

There are as well concerns about the methodologies of the various value-added approaches. McCaffrey, Koretz, Lockwood, and Hamilton (2004) concluded that although Sanders's equations were biased toward finding an effect due to the quality of teaching, there were no plausible alternative explanations for the size of effects that Sanders and his colleagues found. In their view, although the effects may not be as large as claimed (because the equations lead to bias), they can be attributed to differences in teacher quality that persist over time.

Finally, we need to know what these high-gain teachers do differently from average- and low-gain teachers before we can be sure this line of evidence is truly useful in the accountability of teacher education programs. There is some encouraging preliminary evidence on this key point from Hamilton County in Tennessee, where the teaching of a sample of 92 high-gain teachers is being investigated (Carter, 2002). These teachers seem to be reliable high-gain producers across a wide variety of students and teaching assignments. What these teachers are not doing is somewhat more interesting than what they are doing. They are not teaching by drill and kill, and they are not teaching to the tests.
Instead, they hold high, clearly stated expectations for their students as defined by the state's standards; they post both student work and prototypes of work from prior classes that met standards; and they move about the room, monitoring small groups of students, who freely ask questions and comment on their teacher's work.

Few education programs, at this time, have access to the kind of evidence portrayed in Table 12.1, let alone the capacity to determine how their graduates figure into it. As the field evolves, however, this kind of evidence could become pivotal in the case a faculty makes that its teacher education graduates are competent teachers.

THE PLACE OF PROGRAM ACCREDITATION

A large part of the problem with the assessment of teacher education programs, and the other quality assurance measures, is that the profession has not grounded its work in scholarly evidence. Current accreditation practices have evolved to the point that programs are asked to provide evidence that education school graduates can teach effectively (National Council for the Accreditation of Teacher Education) or to provide evidence that the program faculty's claims that its students are competent, caring, and qualified educators are warranted, can withstand scrutiny, and otherwise meet the tests and standards of scholarly evidence (Teacher Education Accreditation Council). This evidence is only one piece of the puzzle, however, a piece that speaks only to whether the students learned what the program faculty taught them about critical professional knowledge, disposition, and skill. The evidence any accreditation system might provide about overall quality is still likely to be inconclusive about whether a particular degree holder should teach.

The public's confidence in the quality of its professional educators must rest on multiple and converging lines of evidence about the quality of the individuals who wish to teach. Accreditation provides only one of these several lines of evidence. The other lines of evidence must come from independent assessments by others of different aspects of the prospective educator's competence. The states must secure independent evidence to warrant the granting of a license, school boards must secure their own evidence with regard to hiring and tenure decisions, standards boards must secure their own evidence to justify the granting of certificates, professional societies must devise their own kinds of evidence for the award of prizes and trophies, and so forth.
Because all the known measures and sources of evidence are subject to documented distortions and flaws, it is critical that the public have independent lines of evidence on the various aspects of educators' competencies—whether they have studied and mastered what matters most, whether they are entitled to a license, whether they should be hired and tenured, whether they deserve merit payments, promotions, awards, and so on. The key point is that there be solid evidence, grounded in the professional literature and standards of scholarship, to warrant the granting of degrees, licenses, certificates, professional positions, tenure, merit payments, promotions, and awards.

ACCOUNTABILITY IN OTHER PROFESSIONAL PROGRAMS

The case of accountability in teacher education is a microcosm of higher education accountability writ large. It is very difficult, for example, to find those who think that American higher education (whether accredited or not) is living up to the trust and confidence the public has invested in it (see, e.g., Blits, 1985). Although the reputations of all civic institutions traditionally held in public trust have eroded, the charge against higher education is that it is not delivering on its promises. Students fail to receive the individual attention and guidance to which they and their tuition-paying parents thought they were entitled; they have not learned, nor do they understand, what their grades indicate they have learned; legislators see fewer services to the state and community for each year's tax dollar subsidy; governors see the universities focused solely on a narrow research agenda that is unresponsive to the needs of the wider community; and the alumni see that standards for academic degrees and honors have slipped to embarrassingly low levels. Accredited institutions, in particular, are seen as excessively costly and self-serving, while failing to meet their obligations and promises. In the view of the most severe critics, the college degree has merely replaced the high school diploma of the 1950s in function and quality, and the mechanisms of accountability, as currently implemented, have not significantly altered these critics' view of the erosion of standards in higher education.

There are preliminary signs that the other learned professions may come to receive the kind of second-guessing treatment that has been reserved for teachers. Physicians are increasingly seeing their professional judgments, warranted by their academic degrees, subordinated to decisions made by health managers. Judges, for example, are finding their professional judgments supplanted by legislatively imposed mandatory sentences that nullify their professional training and experience. Thus, we can expect to find pressures, similar to those found in teacher education, on the accountability mechanisms in other professions.
They, too, will be called on to provide solid evidence that their members are fully competent and qualified if they are to extricate themselves from intrusive and misplaced oversight by other bodies. We should also expect to find that the assurance of quality in the other learned professions is, like teaching, beyond the capacity of accreditation itself and that it inevitably entails the mechanisms of licensure, certification, peer review, employment, and so forth. The decisions made about the granting of employment, the professional license, certificate, merit award, and honors should be based more on solid evidence of accomplishment than on conformity to standards, largely unvalidated, and established by mere consensus of the members of the profession.

REFERENCES

American Federation of Teachers. (2001). Survey and analysis of teacher salary trends. Washington, DC: Author.
Blits, J. (1985). The American university: Problems, prospects and trends. Buffalo, NY: Prometheus Books.
Carter, P. (2002). Highly effective teacher research. Chattanooga, TN: Public Education Foundation.
Conant, J. B. (1963). Education of American teachers. New York: McGraw-Hill.
Darling-Hammond, L. (2001). Standard setting in teaching: Changes in licensing, certification, and assessment. In V. Richardson (Ed.), Handbook of research on teaching (4th ed., pp. 751-776). Washington, DC: American Educational Research Association.
Ferguson, R. (1991). Paying for public education: New evidence of how and why money matters. Harvard Journal on Legislation, 28, 465-498.
Ginsburg, H. (1997). Entering the child's mind: The clinical interview in psychological research and practice. Cambridge, UK: Cambridge University Press.
Green, D., Ford, M., & Flammer, G. (1971). Measurement and Piaget: Perspectives of the CTB/McGraw-Hill conference on ordinal scales of cognitive development. New York: McGraw-Hill.
Harlow, A., & Jones, A. (2003, June). Why students answer TIMSS science test items the way they do. Australian Science Education Research Association Conference, Melbourne, Australia.
Howey, K. R., & Zimpher, N. L. (1996). Patterns in prospective teachers: Guides for designing preservice programs. In F. Murray (Ed.), The teacher educator's handbook: Building a knowledge base for the preparation of teachers (pp. 465-505). San Francisco: Jossey-Bass.
Judge, H., Lemosse, M., Paine, M., & Sedlak, M. (1994). The university and the teachers. Oxford Studies in Comparative Education, 4(1-285).
Koerner, J. D. (1963). The miseducation of American teachers. Boston: Houghton Mifflin.
McCaffrey, D., Koretz, D., Lockwood, J., & Hamilton, L. (2004). Evaluating value-added models for teacher accountability. Santa Monica, CA: RAND Corporation.
Meier, D. (2000). Will standards save public education? Boston: Beacon Press.
Mitchell, R., & Barth, P. (1999). How teacher licensing tests fall short. Thinking K-16, 3(1), 3-23.
Murray, F. (1978). Teaching strategies and conservation training. In A. M. Lesgold, J. W. Pellegrino, S. Fokkema, & R. Glaser (Eds.), Cognitive psychology and instruction (pp. 419-428). New York: Plenum.
Murray, F. (1990). The conversion of truth into necessity. In W. Overton (Ed.), Reasoning, necessity and logic: Developmental perspectives (pp. 183-204). Hillsdale, NJ: Lawrence Erlbaum Associates.
Murray, F. (1992). Restructuring and constructivism: The development of American educational reform. In H. Beilin & P. Pufall (Eds.), Piaget's theory: Prospects and possibilities (pp. 287-308). Hillsdale, NJ: Lawrence Erlbaum Associates.
National Commission on Excellence in Education. (1983). A nation at risk. Washington, DC: U.S. Government Printing Office.
Ohanian, S. (1999). One size fits few: The folly of educational standards. New York: Heinemann.
Ohanian, S. (2000). Goals 2000: What's in a name? Phi Delta Kappan, 81, 345-355.
Raths, J. (1999). A consumer's guide to teacher standards. Phi Delta Kappan, 81(3), 136-142.
Rivers, J. (1999). The impact of teacher effectiveness on math competency achievement. Unpublished doctoral dissertation, University of Tennessee, Knoxville.
Sanders, W., & Rivers, J. (1996). Cumulative and residual effects of teachers on future student academic achievement. Knoxville: University of Tennessee Value-Added Research and Development Center.
Smith, L. (1993). Necessary knowledge: Piagetian perspectives on constructivism. Hillsdale, NJ: Lawrence Erlbaum Associates.
Wright, S., Horn, S., & Sanders, W. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11, 57-67.


13

High-Stakes Testing: Literacy by the Numbers

Dale D. Johnson
Bonnie Johnson
Dowling College

Social justice has become a buzzword among teacher educators, appearing in academic journals, conference presentations, and school of education Web sites. The following example illustrates the point:

As I reflect on my experiences in education, and especially on my experiences in international education, I am constantly reminded that our educational system in the United States is rooted in a firm belief in democracy. Although we may find some exceptions in practice, basically the way we have organized schooling, the way we teach, the questions we ask, and our firm belief in the value of the individual are rooted in a democratic tradition founded on social justice. Education is our prime vehicle for creating the "just" society. In all of our efforts in education we are preparing citizens to lead productive lives in a democratic society characterized by social justice. (Switzer, 2004, pp. 1-2)

Statements like this may present an unrealistic impression to potential teachers about the realities of contemporary classrooms. Such statements by Switzer and others who speak of social justice appear to overlook the resegregation in our nation's schools (Frankenberg, Lee, & Orfield, 2003). Such statements appear to underestimate the increasing proportion of children who come to school hungry and sick from poverty (Johnson & Johnson, 2002). Such statements appear to ignore the life-altering consequences imposed on children judged as failures based on a single, end-of-year test (Edley & Wald, 2002). From our combined 70 years of experience in public education, working with preschoolers through doctoral candidates, we find that such statements ring empty for the thousands of teachers in underfunded public schools serving the children of poverty. A contrasting opinion comes from Katz (1971):

There is a great gap between the pronouncement that education serves people and the reality of what schools do to and for the children of the poor. Despite the existence of free, universal, and compulsory schooling, most poor children become poor adults. Schools are not great democratic engines for identifying talent and matching it with opportunity. The children of the affluent by and large take the best marks and the best jobs. (p. xviii)

A Nation at Risk (National Commission on Excellence in Education, 1983) further entrenched the model of educational oligarchy, arguing that American schools should become more rigorous and decrying the practice of social promotion. This report initiated the public school accountability movement that engulfs the nation today. "High standards" and tests of these standards dictate that students work harder and teachers submit to more regulation. There is little mention of the accountability of policymakers and politicians who control the funds for public education. By 2000, all 50 states had developed standards, and 45 states issued school report cards. Incentives were used by 22 states to reward schools with high test scores and punish those with low test scores. High school graduation was contingent on test performance in 18 states, and in 3 states (Louisiana, New Mexico, and North Carolina) promotion in elementary and middle school was determined by a state test score (Quality Counts, 2000). In 2003, seven states retained elementary schoolchildren based on a single test, contrary to the warnings of professional organizations against basing a life-altering decision on a single test.

In 1999, the two of us accepted professorships at a Louisiana university, where we began to see firsthand the human impact of school accountability. Louisiana was the first state to invoke harsh high-stakes testing measures requiring fourth- and eighth-grade students to pass the annual LEAP (Louisiana Educational Assessment Program) test given in March or to repeat the entire grade. This practice was mandated despite evidence from the National Academy of Sciences, the National Association of School Psychologists, and others showing that grade retention is harmful and that no decision with serious consequences should rest on a single test score (Edley & Wald, 2002).
Each year Louisiana, like many other states, issues a school report card labeling every public school in one of six categories from "School of Academic Excellence" down to "Academically Unacceptable School." Results from the LEAP test and the Iowa Test of Basic Skills

are announced in May in various media—newspapers, television reports, Web sites, radio reports, and state and local pamphlets. Students are also labeled based on LEAP scores. Test pressures on children, teachers, parents, and school administrators begin on the first day of school in late August and continue until tests are completed. Our doctoral students, all teachers, told us of plummeting morale and teachers seeking early retirement or new careers. Teachers were told what to teach, how to teach it, and the minutes a day it should be taught by officials in Baton Rouge, who were far removed from life in classrooms. Meanwhile, most schools were not provided the materials to meet these objectives, leaving teachers to locate materials on their own. Teachers were required to prepare astonishingly detailed weekly lesson plans and told how to grade and evaluate students' assignments. Grades didn't really matter, however, because in fourth or eighth grades passing LEAP was what really mattered. As part of our university duties, we frequently visited classrooms in Louisiana, often unannounced, and invariably came away impressed with the quality of teaching, despite the lack of materials and dismal school buildings. Louisiana had instituted punitive accountability measures in part because of state officials' embarrassment over consistently low scores on state and national assessments. It seemed clear to us that incompetent teachers were not the cause of the low achievement. We felt certain that poverty was a major cause, but we wondered about what else might be contributing. We decided to return to public school teaching to become more knowledgeable about the daily lives of students and teachers. We read again the case studies by Richard Venezky and Linda Winfield (1979) of two urban schools serving low-income students. They concluded that schools with achievement-oriented leaders were effective in raising reading scores. 
We decided to see how the Venezky-Winfield hypothesis held up in these schools. During the 2000-2001 school year, we took unpaid leaves from our university positions to seek positions as elementary teachers. Louisiana faces a severe teacher shortage, so it was not difficult to find positions. We were hired to teach third and fourth grades at Redbud Elementary (pseudonym), a school serving rural pupils from prekindergarten through fourth grade. Redbud served the poorest of the poor. About 95% of the 611 children qualified for free breakfast and lunch. Some 80% of the pupils were African American, most raised by a single mother, an aunt, or a grandmother. The children lived in housing project apartments, substandard mobile homes, or small, unpainted shotgun houses. Some dwellings lacked running water, flooring, or electricity. Parents and guardians were minimum-wage workers in a chicken processing plant, a discount store, and fast

food restaurants. In the town of Redbud, 3,500 residents were equally divided between African Americans and Whites. Most White children attended private Christian academies. Private schools in Louisiana are exempt from high-stakes tests and other accountability mandates, even though they receive state monies. Tuition at the local private academy was $3,000 per year—a sum far beyond the reach of parents of Redbud pupils. Redbud Elementary School had seen few improvements since it was built in 1948. Each room had window heating and cooling units—often plugged with towels to avoid wind, rain, and insects—that worked on occasion. The building was infested with cockroaches, spiders, and rodents. It lacked a library, playground equipment, hot water, adequate pupil bathrooms, art classes, or a school counselor. The single toilet in the teachers' room provided for 72 faculty and staff; a single telephone served all children and teachers. A school nurse visited one morning each week. Classroom materials were limited, and texts were outdated. Our classroom dictionaries were published in 1952 and included a definition for mammy: "a colored woman in charge of white children, an old Negro woman" (Thorndike & Barnhart, 1952, p. 363). Science equipment in one of our rooms consisted of a plastic beaker and a box of small mostly unlabeled rocks. The other room had nothing. We shared a globe (no maps) to help us teach social studies. The 72 staff members included 32 teachers, about one third uncertified or temporarily certified, reflecting Louisiana's teacher shortage. Redbud teachers seldom stayed for long. The year prior to our arrival, one fourth grade class had 14 different teachers. Some teachers lasted only a few days. Louisiana is near the bottom of the nation in average teacher salary (International Reading Association, 2003), and the town of Redbud has among the lowest salaries in Louisiana. During our year, Redbud salaries ranged from $23,515 to $34,904. 
The day that our pupils arrived, some carried school supplies and others came with nothing. The children's reading and writing abilities varied widely, not unexpected. A quick pretest revealed six nonreaders in the third grade class and four in the fourth grade class. Only four third graders were near grade level in reading. Our lessons had to accommodate a wide range of skill and social development. We became keenly aware that many children needed extra help that would not be provided by the school. As the year progressed, we noted pupils wearing clothes several sizes too large, apparently to allow for growth. Redbud children lacked dental care, complaining of tooth pain and bleeding gums. Teachers tried to arrange care for the children through the school nurse. Sick children often stayed at school for good reasons. If a mother had to take off from her

minimum-wage job, she might lose the job. Some parents or guardians could not be reached because the phone had been disconnected. A few parents had no vehicle and could not pick up the children. Teachers kept blankets in their classrooms to keep children comfortable at their desks; the building had no place for a sick child to lie down. During our year at Redbud, we learned that even the youngest students may bring enormous emotional pain to school each day. Family members of two of our students were murdered during the year. One student's brother was killed in a drive-by shooting. One child's grandmother had her throat slashed by a crack user a few blocks from the school. Three pupils lost their homes to house fires during the year. We kept journal records during our Redbud year to document observations and impressions of day-to-day events in the schools (Johnson & Johnson, 2002). In addition, each Monday the fourth graders wrote about the weekend. Here is Emerald's journal entry: My House Fire Thing that happened in my life is that my house got burn up. First it starting at the top and then it just starting to burn. My close, shoes, and bed, toys and my fish tank. It is sad because my brothers game got burn up. One thing about is that everybody come to help my mom. My grandmother tried to run back in there but my mom hold my grandmother back. In my life at my uncle house has been so bad. My cousin always picking on me and they don't like me. My uncle try to make me happy but he cannont make me happy. Things in my life goes bad. (Johnson & Johnson, 2002, p. 95)

The fire occurred shortly before the winter holidays. During Drug Awareness Week, the principal encouraged students to write essays or prepare posters about drug problems. Malcolm wrote: Drugs are everywhere. I see people walking down the street acting like somethin wrong. They need on clean clothes, a bath, hair need washing and combing. They come over the fence in the backyard when we are not home and steal things. They steal our hanging plants late at night. They come to our house trying to sell us things that they might have taken from someone else. They seem like they do not eat any food from looking at their size and seem like they never go home at all. (Johnson & Johnson, 2002, p. 78)

Another fourth grader wrote: Drug affect my family by dad do all kinds of drug. He do dope and weed. My uncle do weed. It just hurt my heart. I feel like one day they goin to get the wrong thing. When my dad had smoke weed, he try to burn my grandmother house. I was crying when my dad tried to burn up my grandmother house and he live there. (Johnson & Johnson, 2002, p. 78)

Despite harsh conditions beyond their control, the children arrived daily with sunny outlooks and an eagerness to learn. They frequently exhibited compassion for their classmates and felt bad when they thought they were letting us down. During a morning recess, a third grader recounted that his mother had left home because she hates him. A classmate standing within earshot came over to the distraught child, put his arm around him, and said, "Don't worry, it's O.K. My daddy hates me. He hasn't seen me or talked to me since I was six months old." Pupils' notes were sincere and endearing. Asked to describe the teacher, a fourth grader wrote: "Mr. Johnson talk funny, but he smell good." Another observed, "He is very old but still smart." When Wendice was told that he wasn't working as hard as he could, he responded, "I know. I'm lazy." Toward the end of the day, the following note appeared on the third grade teacher's desk: "I love you Mrs. Johnson I'm very sorry What I did Bad thing Not Working." The children, with so little themselves, displayed great generosity. No one ever went without a pencil, paper, or something to eat or drink from the free lunch trays if other children could supply these things. They shared without prodding or complaint. Their sense of humor delighted us, and their problem-solving abilities impressed us. After the custodian installed a new pencil sharpener, a student trying out the sharpener discovered that the handle was on the wrong way, so that she had to crank the handle toward herself with her left hand. Several children, including Jaheesa, demonstrated a solution to the problem. Jaheesa squeezed behind the cabinet so that she could sharpen her pencil with her right hand, moving the handle away from her. As it turned out, this problem is not on the LEAP test. We judged our pupils every bit as bright as any we have taught.
They just didn't have the opportunities that high-test-scoring children have, such as books and computers in the home, trips to other cities or states, a full stomach, a bed of their own, and more. Although most children came from homes with at least one caring adult, many did not. When a third grader needed extra help with carrying and regrouping (i.e., borrowing) in math, his teacher suggested his mother help him. "No," the child replied. "When I get home, she drunk. Every day she drunk or gone." The assistant principal reported that the parent had been "a dopehead" and was "a drunk," adding, "You'll get no help from there." Another child confided that he couldn't sleep because his stepdad "watches war movies all the time, and he makes me watch them. When somebody gets blown up, he laughs and says, 'They got him! They got

him!' I don't think kids should watch that stuff. He don't want me to read. But I have a little book hidden under my bed. Sometimes I can read that if it's not too dark" (Johnson & Johnson, 2002, p. 120). There is much more to teaching than preparing lessons, directing activities, and grading papers. Children of poverty could benefit from well-stocked school libraries, frequent field trips to expand background knowledge, and extra help for those falling behind. What they too often receive is skill-and-drill preparation for high-stakes tests. Throughout the year, we witnessed pupils' lack of prior knowledge, the kind of middle-class, White, standard English knowledge useful on a standardized test. Children repeatedly told us that they could "sound out the words," but they didn't know what the words meant. Confronting passages about a restaurant or an airport, children had no personal experiences with either. Only one child had visited a restaurant other than a fast food establishment; that child had been to a cafeteria-style, all-you-can-eat buffet. From a story about an airport, replete with vocabulary about "tugs," "baggage tags," "check-in," and "gates," our pupils drew blanks. One child responded, "I kind of was at an airport. We drove past one once." The school accountability mania that has swept American public education has been particularly harsh on children who attend schools like Redbud. Millions of dollars are squandered on test-preparation booklets, computer test-prep programs, and the purchase and scoring of tests, even as schools lack basic supplies. For example, Redbud had no library, but the superintendent authorized $85,500 of district funds to purchase a test-prep computer program for fourth graders. The program, which crashed frequently, offered little more than electronic workbook pages.
Since 2001, Bush, Cheney, and Cecil Picard, Louisiana's state superintendent of schools, frequently were quoted as saying that teachers would be held accountable. Secretary of Education Paige stated, "Anyone who is against annual testing of children is an apologist for a broken system of education that dismisses certain children and classes of children as unteachable" (Federal File XX 37,2001, p. 25). We do not think that any child is unteachable, but we have seen firsthand the disservice that high-stakes testing can do to poor and minority children in underfunded schools. The time devoted to test preparation is time that otherwise could be devoted to genuine teaching and learning, and funds spent on the testing could be better used for personnel and materials to help underperforming pupils catch up. We wonder why school officials and politicians across the country seem oblivious to or disinterested in the omnipresent correlation between children living in poverty and low test scores. On national assessments,

the states that invariably score at the bottom are the high-poverty states, such as Louisiana, Mississippi, and New Mexico. In Louisiana, the schools with the lowest scores are found in high-poverty areas: inner city New Orleans, the river parishes along the Mississippi delta, and pockets of rural poverty, such as Redbud. Schools with the highest scores are magnets and campus lab schools, often with selective admissions procedures, as well as schools in the affluent suburbs. One morning during our Redbud year, we tuned in to an early local television program before leaving for school to see the principal of a suburban elementary school being interviewed. As the camera panned across well-equipped classrooms, an abundance of children's books, banks of new computers, and a playground with modern play structures, the principal celebrated the school's award as a "School of Academic Distinction." We recognized the school; we had visited it in connection with our university duties. It serves well-fed, mostly White children who live in expensive homes. One of the school's teachers told us that the principal liked to brag about the school's accomplishments. "But look at the neighborhood. Look at the beautiful homes. We should be a school of distinction. The children have everything at home" (Johnson & Johnson, 2002, p. 115). At Redbud, anxieties about high-stakes tests permeated children's thoughts throughout the year. Early in September, fourth graders completed a survey. One question asked, "What is the number one problem today?" Pupil answers included "drugs," "drinking," "guns," "knives," "killing," and "the LEAP test." In mid-October, a fourth grade girl, assigned to write a letter to the Saints (the local professional football team), wrote, "Our school has no playground equipment. Can you help us out we need so much. I'm in the fourth grade and I hope I pass the LEAP test. Will you guys pray for us? If you do, thanks" (Johnson & Johnson, 2002, p. 69).
(The children did not receive a response.) Later in the year, a note appeared on the fourth grade teacher's desk: Dear: Mr. Johnson Happy Birthday I hope you have fun on your specail day. Thank you so much for teach us. I hope everyby pass the leap Test. But one more I didn't get you noting for your Birthday. But I will try really hard to pass the Leaptest just for you. Happy Birthday. Dawnyetta (Johnson & Johnson, 2002, p. 136)

The LEAP test was on fourth graders' minds all year long, and third graders began to worry about the challenges they would face in the following grade. During the spring, test preparation workbooks, drill sheets, and daily work on the computer test-prep program continued unabated. The message for teachers was clear: "Don't worry about science, social studies, or anything but math and English, the subjects of the high-stakes LEAP test." Most teachers managed to sneak enjoyment into the school day, but as test week drew closer, tension built steadily among pupils and teachers alike, in response to the endless drumbeat of reminders that the test was coming. While at Redbud, we experienced a new phenomenon in public education: test-prep pep rallies. High school cheerleaders performed test cheers, the mayor spoke about trying hard on the tests, and test-related skits were presented. Each class was required to write and perform a cheer about the tests. The fourth grade class wrote the following cheer to a hip-hop beat:

We're from Mr. Johnson's homeroom,
And we are the best.
We're not afraid of that old LEAP test.
We have studied and we have learned,
We have lots of energy to burn.

The cheer continued through three more verses. Pep rallies for testing in Louisiana have escalated in the 2 years since we were in the classroom. Rob Nelson (2003) of the New Orleans Times-Picayune described a rally designed to "relieve stress and boost excitement" for the looming LEAP test. The article stated that the rally, held for 10,000 Jefferson Parish students, would "bring together 800 band members, 400 cheerleaders, local elected leaders and television news personalities to help 'psych' students. Navy pilots are also scheduled to fly over the field." In the article, school board member Chris Roberts observed, "We feel like the kids are so stressed. They need some time to relieve some of the tension ... This is just as important as receiving instruction in the classroom." One wondered who was paying the bill: the stadium, fuel for military planes, transportation to and from the field, and so on. The article noted that Zephyr stadium was loaned free of charge, and "Coca-Cola signed on as a corporate sponsor" (p. 2). The LEAP test and the Iowa tests were kept under tight security. The 68-page test administration manual emphasized security. Teachers signed an oath of security and confidentiality. The manual warned of penalties for any breach of security: Any teachers or other school personnel who breach test security or allow breaches in test security shall be disciplined in accordance with provisions of R.S. 17: 416 et seq., R. S. 17: 441 et seq. policy and regulations adopted by the Board of Elementary and Secondary Education, and any and all laws that may be enacted by the Louisiana legislature. (Louisiana Department of Education, 2001, inside cover)

Teachers were not allowed to look at any of the questions on the LEAP or Iowa tests before, during, or after the testing; they could read only the instructions in the manual. We were warned that state inspectors could show up in our classrooms to monitor our behavior during the testing. Two years later, in March 2003, we received an e-mail from a friend who has served for 30 years as an elementary teacher in Louisiana, one of the best teachers we have seen anywhere. She wrote the following: A man, who did not bother to introduce himself said, "I was monitoring the halls during the Iowa test, and I saw you give answers to two students." I was floored. I replied, "You saw me tell two boys that they didn't erase their answers very well, and it looked like they had two answers on one line." He said, "Well what you say and what it looked like are two different things." I said, "Why don't we call the boys in here and ask them?" But he refused. This man was from the state department and he, of course, doesn't know me. But I did expect my principal to stand up for me. She didn't say a word. I was so upset by this. (K. Gandy, personal communication, March 21, 2003)

One has to work under these conditions to realize the fear that has permeated public school classrooms because of high-stakes tests. Principals are as fearful as the teachers. Our friend, the veteran, retired from teaching at the end of the school year, and Louisiana lost another of its best. Another March headline read, "Principal Checks Out Alleged Test Violation" (Wilson, 2003). The article reported a rumor that one LEAP test question had been distributed to students to take home. The curriculum director indicated that the school system was conducting an investigation to determine "if the integrity of the test has been compromised, then we're going to follow state and parish guidelines and deal with the individual whether student or adult" (p. 1). We wistfully recall earlier teaching experiences when tests served instructional purposes to help teachers and parents understand how pupils were progressing and where they needed help. Today, tests are used to accuse, sort, label, and fail children, regardless of home and school environments. Is this the social justice we hear so much about? The executive directors of the National Association of Elementary School Principals and the National Association of Secondary School Principals, Vincent L. Ferrandino and Gerald N. Tirozzi (2003), in Education Week, decried the use of "shame" by policymakers and legislators to raise test scores. They said: It is indeed discomforting, frustrating, and perplexing to have policymakers echoing a belief that the strongest weapon in the arsenal for improving public education is to shame the schools—in effect, principals, teachers, students, and parents. ... The extensive literature promoting successful business and corporate practices and strategies that companies use to move from a "good" to "great" status is replete with references to the importance of quality leadership, staff involvement, treating staff with respect and dignity, and promoting a supportive and constructive work environment. In this body of literature, shame is never mentioned.... Wouldn't shame be better focused on those policymakers who promote the use of testing as a punitive measure rather than as a diagnostic tool to improve instructional practices and student learning? (p. 19)

Across the country, newspapers publish test scores and the names of "failing" schools. Since the passage of the No Child Left Behind Act, front-page headlines such as "State Names Schools that Must Improve" (Draper & Walsh, 2003, p. 1) are commonplace. Students and teachers in "failing" schools are repeatedly and publicly shamed. Realtors use test data to sell homes. Some parents display school rankings on bumper stickers. The LEAP and Iowa tests were administered at Redbud in mid-March. On the first day of the tests, one third grader vomited in his hands and ran to the bathroom before he could write his name on the test sheet. Another student took one look at the first section of the test and began to cry. He then randomly filled in bubbles on the answer sheet, put his head down on his desk, and sobbed. A few fourth grade children who finished the first part of the test put their heads on their desks and fell asleep. The testing continued for 5 days, during which nonreaders and children with severe learning disabilities were required to sit and struggle day in and day out. Teachers could offer neither help nor comfort. Two anxious months passed between test week and the announcement of test scores. During that time, teachers finally were free to teach, not conduct test-prep sessions, although many children seemed to feel that the school year was over and appeared listless. The principal received the LEAP results in May and called the fourth grade teachers to her office, where they learned that 54 of their 118 students had failed the test and would have to attend summer school, be tested again, and spend another year in fourth grade if they failed a second time. Fourteen of these children were in special education. Throughout Louisiana, 17,988 fourth graders and 21,529 eighth graders learned in 2001 that they had failed, for a total of more than 100,000 Louisiana students since 2000.
Each of these numbers represents a child who must, at best, give up the summer to cram facts for a retake and, at worst, spend another year in the same grade. Fourth grade teachers decided to inform students of the results by calling each into the hallway, one at a time, in alphabetical order. A journal entry from one fourth grade teacher (D. Johnson) reported:

My first student is Dario. I tell him that he has passed the English language portion of the LEAP, but he will have to attend summer school because he failed the math portion. He cries quietly. "My papa (grandfather) gonna be mad at me. He will beat me." "I'm sure that if you work very hard you will pass it this summer, Dario," I tell him. "Don't be discouraged. Just be determined to do your best." He reenters the classroom. My words sound hollow to me. How can he not be discouraged? I am discouraged. He just has had his summer jerked away from him and maybe all of next year. Dario is a child who had been locked in a trailer as a preschooler whenever his grandfather and mother left home. There was no adult with him. He began kindergarten with a fear bordering on panic whenever he was left alone. He had limited oral language proficiency, but he was one of my best readers who borrowed books every day to take home. But because he didn't pass the math test, he was labeled a failure. What are policymakers doing to children? The next pupil was Jamal, who is a special education student. In addition to his speech problems, he has severe learning disabilities. I must inform Jamal that he has failed the English language arts and math sections of the test. I plead with him to go to summer school. In Louisiana, special education students must take the week-long LEAP test. When they fail, they must attend summer school and take the test again or repeat fourth grade. If they attend summer school, however, special education pupils are promoted to fifth grade whether or not they pass the retake of LEAP. Why take away their summer and make them sit through the agony again when the results mean nothing? This process continues until I have informed everyone. When I reenter the classroom, most of the children are crying. Those who passed are hugging those who failed and are trying to comfort them. 
"You'll do fine in the summer," LaDelle tells Chalese, a child who has severe learning disabilities and can barely read her name. This scene is repeated in the hallway in the four other fourth grade classrooms. One little girl in the room next door says, "I'm going to kill myself." The teachers are as choked with emotion as the students.

A journal entry from the third grade teacher (B. Johnson) reported: After school, while the children are sitting in the hallway waiting for their buses to be called, it is nearly silent. The pupils have not been this quiet all year. Sounds of fourth graders crying carry down to where my third graders sit. "Why are all those kids crying?" asks Jeron. "They flunked," says Kenziah softly. I walk down to the fourth grade classrooms. I try to console inconsolable children. The pitiful scene is too much for me. I walk back to my third graders. (Johnson & Johnson, 2002, pp. 177-178)

That same year, two Redbud eighth graders attempted suicide after failing the LEAP tests. There can be no social justice in schools as long as we

continue to fail our children based on a single test score when these children are not given the same in-school advantages that more economically privileged children have. Toward the end of the school year, we asked our students for suggestions for the future teachers we would prepare when we returned to the university. A third grader wrote, "Do not be mean to a student or you might destroy their dreams." The insistence on making a child repeat an entire year based on a single state standardized test score certainly can be viewed as mean spirited and a destroyer of dreams. In Louisiana, as elsewhere, test performance is strongly correlated with percentage of students on free lunch, percentage of minority students, and percentage of students in special education. For example, one Louisiana "School of Academic Distinction" had 11% on free lunch, 8% minority students, and 9% in special education. A campus laboratory school had 0% on free lunch, 12% minority students, and 3% in special education. By contrast, Redbud, a "Below Average School," had 95% on free lunch, 80% minority students, and 16% in special education. The same pattern holds across the state, across the nation, and elsewhere in the world. In 1992, Elley reported an international study of reading achievement in 32 nations. Elley concluded that family affluence and national indices of economic development and health were related to reading achievement. More recently, Alloway and Gilbert (2003) reported that, in Australia, 87% of the high-socioeconomic status students passed the required test (National English Literacy Survey), but only 47% of low-socioeconomic status students passed the test. They stated, "Against both reading and writing benchmarks, children from low socio-economic backgrounds scored lower than those from medium and high socio-economic backgrounds" (p. 167). 
The results in Louisiana and elsewhere stem directly from poverty and its associated ills: lack of medical and dental care, poor nutrition, drug and alcohol abuse rooted in despair, scarcity of decent-paying jobs, and the social injustice of racism. We were dumbfounded when Congress passed President George W. Bush's plan to test all public school students in Grades 3-8 every year: No Child Left Behind (NCLB). Our Redbud experience tells us that this testing will accomplish nothing, except to give employment to those in the bureaucratic fiefdoms that will spring up in response and to boost the profits of the corporations that market tests, print and electronic test-preparation materials, and tutoring services for the millions of children who will be forced to take these tests. At the 2003 convention of the Education Industry Association, Brad Baird, vice president of Excelsior Software, said in reference to NCLB, "The day the act was signed by President Bush, we began the due diligence of reading it and finding which provisions applied to our products. There is gold at the end of that rainbow for us" (Walsh, 2003). The testing craze will encompass children even younger than first graders. Beginning in fall 2003, the Bush administration plans to require standardized tests for 4-year-olds in the federal Head Start program. Steinberg (2002) noted, "Those results will be used, at least in part, to determine whether the children's teachers are doing a good job—and whether the government should continue to finance that particular Head Start center" (p. B10). It troubled us to learn recently that, because of the $11.5 billion budget gap in New York, Governor Pataki intended to slash the entire $204 million allocation for prekindergarten. Everyone knows the value of a good preschool education. Wealthy parents send their children to private preschools. In New York City, parents can hire advisers to assist with admission procedures of private preschools. The price of these services starts at $300 an hour, or parents can opt for "a package of services for $3,000" (Gross, 2003, p. A1). But what about the poor children who can benefit most from preschool education? We have heard no mention in New York or any other state of redirecting or reducing the amount of money spent on testing in this time of fiscal crisis to save public preschool education. On the contrary, on March 17, 2003, the New York State Education Department issued a call for proposals for the production of state examinations in various subject matter areas for elementary and secondary schools; bidders were required to hold a bachelor's degree. Test production continues unabated in hard financial times. In July 2003, New York City Schools Chancellor Joel I. Klein announced that he had hired Princeton Review, a test-preparation company, at a cost of $8.2 million over 3 years, to develop and administer more tests.
New York City public schoolchildren will take three additional standardized tests in math and three in English throughout the year, in addition to the annual statewide and citywide exams. Princeton Review annually ranks state accountability systems, giving the highest marks to the states with the most rigid test-driven mandates. New York and Louisiana consistently rank in the top six or seven positions in the nation. In May 2003, the Review ranked New York number one and Louisiana number six. The Review sells multiple titles of test coaching books to seven of the top 10 state programs in its rankings. It has contracts similar to that just signed with New York City with several states and cities. Upon seeing Louisiana's number seven ranking in 2002, Leslie Jacobs, a leading proponent of accountability on the Louisiana Board of Elementary and Secondary Education, called the report "yet another unbiased, third-party affirmation that the testing and accountability policies we've put into place are among the best in the nation" (Hasten,

2002, p. 5A). The Web site of the Louisiana Department of Education (2003) refers to the Review as a "magazine" (p. 1), not a test development firm. On the site, referring to the rankings, Superintendent Picard commented, "I'm proud that Louisiana is finally at the top of education lists. We are truly leading the way" (p. 1). It is hard for us to know what to think about such reactions. Teachers increasingly have come under scrutiny as the causes of many ills that befall students, including harsh statements and claims by those we think should know better. Arthur E. Wise, president of the National Council for the Accreditation of Teacher Education (NCATE), a group that accredits more than half of the nation's teacher education programs, and coauthor Marsha Levine, director of NCATE's standards project (2002), offered a simplistic, 10-step solution to improving student achievement in low-performing schools. Step 1 called for identifying the 10% lowest-performing schools in a school district; Step 2 continued, "Transfer all teachers and administrators in the identified schools. The school clientele should remain the same; the adults should change. New leadership and new faculty who share a commitment to a new mission of student achievement, teacher preparation, and staff development are critical for success" (p. 38). Wise and Levine (2002) clearly think that teachers and administrators are solely at fault. How long has it been since they have taught in an underfunded inner city or rural public school? Where do they think the replacements for the current teachers and administrators will be found? At Redbud, we worked with dedicated, skillful teachers whom we believe any school would be fortunate to employ. They performed heroically in helping children build toward success from weak foundations, but high-stakes tests seldom reflected their hard work. Progress is sometimes better measured in small steps than giant leaps. 
Wise and Levine (2002) imply that teachers like those with whom we worked are failing their pupils. We think that it takes a special person to get physically close to children who don't always smell good, have open sores, wear dirty clothing, have bleeding gums, and cry from illness. It takes a special person to get emotionally close to pupils who are difficult to teach because of traumatic experiences: watching others beaten or a relative murdered. It takes a special person to return daily to a crumbling, ill-equipped classroom to deal with children beset with the specters of poverty.
Equally simplistic are recommendations by Allan Odden, professor of educational leadership and policy analysis at the University of Wisconsin-Madison, and Marc J. Wallace, Jr., founding partner of Teacher Effectiveness through Compensation (2003). They suggest compensating teachers based on gains in student achievement. The authors seem to overlook that high test scores are linked to parental affluence, school district expenditures, children's background experiences, books and computers in the home, and more. A colleague who has taught for 34 years noted, "I'll be glad to be paid according to the kids' test scores when I get to choose who comes into my classroom on the first day of school."
Some flaws are appearing in the accountability movement. In Florida, a group of minority politicians and religious leaders is threatening a boycott of some of the state's largest industries to force suspension of the high school graduation exam recently failed by 13,000 seniors. California's State Board of Education voted unanimously to postpone until 2006 the implementation of the state's high school exit exam when it appeared that 92,000 students would be denied diplomas in 2004. Texas relaxed its third-grade reading standards to forestall retention of thousands of students based on the state test. Georgia changed its end-of-course exams to diagnostic tools rather than graduation requirements. In New York, state officials allowed thousands of students to graduate from high school despite their low scores on the state mathematics exam; officials stated that the test might not have fairly assessed age-appropriate mathematical knowledge. In Houston, New York City, and elsewhere, school officials have encountered criticism when hundreds or thousands of high school dropouts (or "pushouts") were "uncounted" to boost accountability scores. Don Freeman, a former high school principal in the Bronx, observed, "Ten years ago, you could focus on the kids. The pressures were not the same, and you could take some risks. Now you're supposed to focus on the numbers" (Medina & Lewin, 2003, p. B6). In Michigan, parents, teachers, principals, and administrators are in an uproar over federal and state mandated tests for the severely disabled. Winerip (2003) described the situation at Wing Lake Developmental Center in Bloomfield Hills:

The 140 students, between the ages of 3 and 26, have IQs below 30. Ninety percent... wear diapers. Half are in wheelchairs. For the rest of their lives they will need to be cared for by relatives or in supervised group homes.... So why the state tests? ... under the new No Child Left Behind Act, by 2005, all states have to develop math, reading and science tests for the severely retarded. (p. D9)

Arthur Levine (2003), president of Teachers College at Columbia University, speaks to these issues in the following scenario:

Imagine that our nation was facing a terrible disease of the dimension of AIDS. Then imagine that the principal approach our country took to finding a cure was to create benchmarks of wellness, establish high standards for a healthy person and then set up a way to test if people with the disease had achieved those standards. Those with the terrible disease, of course, would fail the test every time. Perhaps hospitals and doctors could be held responsible for their failure. This clearly ludicrous approach would bring us no closer to curing the disease or even to understanding its causes. Yet it is just what our country is doing in education. (p. A12)

Our year as classroom teachers transformed us, and we do not teach our university classes as we did before teaching at Redbud Elementary. We are more realistic in our depiction of life in classrooms today, and we try to prepare our students for the relentless pressures that high-stakes tests put on children and teachers. We remind our students that children come to school with different needs, interests, fears, backgrounds, motivations, language development, and skills. These individual differences cannot be overlooked, despite the constraints of test mandates. Stories from our year at Redbud inform our discussions with candidates of what education is all about—the nurturing and development of young minds. We share our rediscovered joys of working with children—their spontaneity, their compassion for other living things, their pride in their accomplishments, and their eagerness to please. During our year at Redbud, we gained a new appreciation of the meaning of the Venezky-Winfield hypothesis. Achievement-oriented mandates can increase test scores—but at what cost? Is sacrificing science, the arts, social studies, oral language, and more worth small increases in reading and math scores? Is literacy by the numbers what schooling should be about? We have come to understand firsthand how the accountability movement and high-stakes tests can impose damaging stress on young children, their teachers, and their parents and guardians. We have come to appreciate the influence of people far removed from the classroom on what happens in today's classrooms, ignoring and undermining the expertise of teachers. We now know the injustice that school accountability imposes on children of poverty. We believe that professors, teachers, parents, and other citizens must put pressure on legislators and policymakers, holding them accountable for the unacceptable conditions in many schools and for the intolerable practices they have imposed. 
We believe that teacher educators cannot sit mute in their offices or prattle about a democratic society founded on social justice while public education is strangled by decisions and mandates from economically privileged noneducators. We have become convinced that teachers and teacher educators cannot be meek or we will lose a large group of children. Our year of classroom teaching awakened us from the comfortable dormancy of college life, leaving us reenergized and prepared to speak out and rally as never before. Cookson (2003) recently posed three critical questions about the goals of education: "How does one measure the growth of intellect, imagination, and aspiration? How does one measure curiosity, self-confidence,

and hope? Why would we believe that educational potential could be captured by a standardized test?" (p. 30). In this era of accountability, Americans need to ask themselves what they want public schools to accomplish. Should students leave school with the skills and knowledge measured on standardized tests in reading and math, or with a more encompassing education that includes science, social studies, and the arts? No one is arguing against the importance of students' mastery of basic skills. The issue is about how far we want our students to go beyond those skills and the best ways to get there.

REFERENCES

Alloway, N., & Gilbert, P. (2003). Reading literacy test data: Benchmarking success? In H. Fehring (Ed.), Literacy assessment: A collection of articles from the Australian literacy educators' association (pp. 164-176). Newark, DE: International Reading Association.
Cookson, P. W., Jr. (2003, January 22). Standardization and its unseen ironies: Why testing is part of the dumbing down of America. Education Week, pp. 30, 32.
Draper, N., & Walsh, J. (2003, July 8). State names schools that must improve. Star Tribune, p. 1.
Edley, C., Jr., & Wald, J. (2002, December 16). The grade retention fallacy. The Boston Globe, p. A19.
Elley, W. B. (1992). How in the world do students read? Hamburg, Germany: The International Association for the Evaluation of Educational Achievement.
Federal File XX 37. (2001, May 23). The word on tests. Education Week, p. 25.
Ferrandino, V. L., & Tirozzi, G. N. (2003, June 4). For shame. Education Week, p. 19.
Frankenberg, E., Lee, C., & Orfield, G. (2003). A multiracial society with segregated schools: Are we losing the dream? Retrieved February 9, 2003, from http://www.civilrightsproject.harvard.edu/news/pressreleases.php/record_id=27/
Gross, J. (2003, May 28). Right school for 4-year-old? Find an adviser. The New York Times, pp. A1, C17.
Hasten, M. (2002, June 19). Journal ranks Louisiana's testing seventh. The News Star, pp. 3A, 5A.
International Reading Association. (2003). Teacher salaries. Reading Today, 20(4), 12.
Johnson, D. D., & Johnson, B. (2002). High stakes: Children, testing, and failure in American schools. Lanham, MD: Rowman & Littlefield.
Katz, M. B. (1971). Class, bureaucracy, and schools: The illusion of educational change in America. New York: Praeger.
Levine, A. (2003, January 8). Tests find USA, not students, lacking. USA Today, p. A12.
Louisiana Department of Education. (2001). LEAP for the 21st century: Grade 4: English language arts, mathematics, science, and social studies: Test administration manual. Baton Rouge, LA: Author.
Louisiana Department of Education. (2003). Louisiana's testing program ranked sixth in nation. Retrieved May 5, 2003, from http://www.louisianaschools.net
Medina, J., & Lewin, T. (2003, August 1). High school under scrutiny for giving up on its students. The New York Times, pp. A1, B6.
National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Government Printing Office.
Nelson, R. (2003). Rally to psych LEAP students. Retrieved February 28, 2003, from http://www.nola.com
Odden, A., & Wallace, M. J., Jr. (2003, August 6). Leveraging teacher pay: How we can raise student achievement through better systems of compensation. Education Week, p. 64.
Quality counts 2000: Who should teach? (2000). Bethesda, MD: Education Week.
Steinberg, J. (2002, December 4). For Head Start children, taking a turn at testing. The New York Times, p. B10.
Switzer, T. J. (2004). College of Education: Dean's welcome. Retrieved September 10, 2004, from http://www.utoledo.edu/colleges/education/welcome.html
Thorndike, E. L., & Barnhart, C. L. (1952). Thorndike Barnhart beginning dictionary. Chicago: Scott, Foresman.
Venezky, R. L., & Winfield, L. P. (1979). Schools that succeed beyond expectations in teaching reading (University of Delaware Studies on Education Technical Rep. No. 1). Newark, DE: University of Delaware.
Walsh, M. (2003, August 6). Companies jump on "No Child Left Behind" bandwagon. Education Week, p. 8.
Wilson, L. (2003). Principal checks out alleged test violation. Retrieved March 15, 2003, from http://www.thenewsstar.com
Winerip, M. (2003, April 16). Testing fad is farce for the disabled. The New York Times, p. D9.
Wise, A., & Levine, M. (2002, February 27). The 10-step solution: Helping urban districts boost achievement in low-performing schools. Education Week, p. 38.

14
Time Is of the Essence: An Overview of Quantitative Methodologies for the Study of Change
David Kaplan
Ximena Uribe-Zarain
University of Delaware

SECTION 1: INTRODUCTION

Over the years that I (DK) have been at the University of Delaware, I have been fortunate to collaborate with Dick Venezky. Our most recent collaboration, along with the second author (XU-Z), was on an OECD-sponsored longitudinal study of instructional and computing technology (ICT). The expertise that I was able to bring to bear on this problem focused on estimating change over time in ICT skills, and the methodologies I had to offer to the project seemed to fit nicely into the general questions that Dick was addressing. Indeed, Dick seemed quite intrigued by the power of modern methodologies for the study of change. So this chapter is a gift to Dick, providing a didactic overview of classical and modern methodologies for the study of change, highlighting the historical trends that led to the development of these methods. The history and methods described in this chapter are drawn from the fields of statistics, psychometrics, econometrics, and sociology, disciplines that have contributed separately and together in developing methodologies for the analysis of change. Applications of these methods to problems in the social sciences will be provided. Finally, we consider new issues and problems that beset all of the models discussed in this chapter, laying the groundwork for future research in the study of change.

The organization of this chapter is as follows. Section 2 discusses methods that consider time as a predictor of change. We concentrate on the methodologies of time series analysis, the analysis of difference scores, and growth curve modeling and its extensions, and we end with a discussion of models for the transition between stages over time. The chapter closes in Section 3 with a summary and a discussion of some remaining issues that are fundamental to the study of change.

Before beginning, it is important to note that we focus our chapter on the study of models that have been applied to longitudinal data for the purposes of studying change and growth. That is not to suggest that time doesn't factor into other kinds of research questions of major importance. For example, time plays a crucial role in models that primarily address problems of inferring causality. Models that have been used to address issues of causality include cross-lagged panel correlation analysis (see, e.g., Kenny, 1973) and simultaneous equation modeling (Haavelmo, 1943). An excellent discussion of the role of time in discerning cause can be found in Pearl (2000). In addition, time can be an outcome of interest. For example, one may be interested in the time it takes for a teacher to drop out of the teaching profession (Singer, 1993). Such models as discrete-time event history modeling are applicable to questions regarding when an event occurs and predictors of variation in the timing of events (Singer & Willett, 2003).

SECTION 2: METHODOLOGIES FOR THE STUDY OF CHANGE

In this section, we consider a set of classic and modern methodologies for the analysis of change, where time is viewed as the predictor. Our discussion moves from models of change that use highly aggregated data to models that take into account individual differences. We begin with a discussion of time series analysis, a methodology that examines stochastic behavior on a single unit of analysis.
We then move on to models for the analysis of panel data, which combines time series data with cross-sectional data and which constitutes the most common design for longitudinal data collection in the social sciences. Although quite simplistic in many ways, a common method for the study of change at average levels using panel data is the analysis of difference scores. We outline the methodology of difference scores as well as common critiques. We also consider the structural equation modeling approach to the analysis of panel data. Although these methods are of historical importance and are, in fact, still used today, they do not take into account information available from individual growth trajectories. To address this criticism, we then move on to a discussion of the analysis of growth curves and recent extensions. Finally, we close this section with a discussion of how change can be modeled as a stage-sequential process and where the outcome of interest is the probability of passing from one stage to another over time.

Time Series Analysis

At the highest level of aggregation, concern might center on how a single variable fluctuates in value over time. A simple example might be tracking the annual behavior of the gross domestic product (GDP) of the United States from, say, 1929 to 2001. The GDP is an aggregate measure defined as "The total market value of all final goods and services produced annually within the boundaries of the United States, whether by American or foreign-supplied resources" (McConnell & Brue, 1996, p. 120). There are two main objectives to a time series analysis: The first is to identify the statistical structure of the observable phenomena represented in the set of consecutive measurements, and the second is to predict or forecast future values of the time series variable.

A Brief History of Time Series Analysis. According to the detailed history of time series analysis given in Dufour (2003), the earliest recorded time series go back to antiquity and focused on problems in astronomy. For example, the Romans used systematic temporal observations of the sun to determine the length of a year, leading to the advent of the Julian calendar. The first plot of a time series, at least for the Western world, can be found in an ancient manuscript dating from around the 10th or 11th century. This manuscript showed the inclination of the orbits of seven planets as a function of time. However, this ancient plot of a time series seems to be an isolated entity because time series plots were not seen again in the scientific literature until the 18th century. Instead of plots, time series data were collected in tabular format. For example, in 1572 the Danish astronomer Tycho Brahe (1546-1601) recorded an abundance of precise data in tabular format on planetary orbits. Some years later, his assistant, Johannes Kepler (1571-1630), used the data from Brahe's tables to formulate his theory of planetary motion.
This work marked the beginning of an accumulation and informal analysis of astronomic series. Half a century later, time series data were recorded in settings other than astronomy. In 1672, William Petty (1623-1687) noticed the need for correcting the effect of a single 7-year cycle to determine the rent of land. That same year, John Graunt (1620-1674) informally studied population and mortality time series data. By the end of the 17th century, the first empirical demand curve was plotted by Gregory King (1648-1712) who used a time series of harvest and prices to represent the demand for wheat in England. The plot of the demand curve provided by King was ahead of its time—not seen again until the 20th century. It was during the second half of the 18th century that time series plots started appearing in scientific literature. For example, in 1786, the English economist William Playfair (1759-1823) published the Commercial and Po-

litical Atlas, containing 44 graphs of data concerning imports and exports in England from 1700 to 1782. The Swiss philosopher and mathematician Johann Heinrich Lambert (1728-1777) published many different time series plots, including plots of variation of the temperature of the ground in relation to the depth under the surface. Advances in time series analysis during the 18th century were characterized by methods for the detection of multiple periodicities in astronomy. Specifically, astronomers began to realize that Kepler's laws were not exact when they noticed fluctuations and irregularities as well as periodic movements. In 1772, Louis de Lagrange (1736-1813) proposed a mathematical method to detect the periodicities in the orbits of comets. Another French mathematician, Joseph Fourier (1768-1830), proposed that all periodic functions could be written as a series of sinusoidal and cosinusoidal functions. The search for mathematical models to represent periodicity continued through the 19th century. In the 19th century, we begin to see evidence of empirical studies of economic cycles. The relationship between sunspots and wheat prices as well as the search for cycles related to famines were recurrent topics of investigation. The first person to research seasonality fluctuations in the economy was an English banker, James W. Gilbart, who in the mid-19th century examined the circulation of bank bills. Twenty years later, in 1860, Clement Juglar (1819-1905) interpreted commercial crises as a phenomenon that occurred every 10 years, rather than as random accidents. In the 1860s, William Stanley Jevons (1835-1882) also studied commercial fluctuation and asserted the importance of detecting and eliminating such cycles in the data. In 1884, the British physicist John Henry Poynting (1852-1914) posited the so-called moving averages method to eliminate accidental or periodical variations. 
The idea of removing accidental or periodic trends in the data led to subsequent methods of deseasonalization. From an inspection of time series plots, several researchers were interested in studying if observable peaks were statistically significant. Similarly, work on adjustments for trends was evolving. In 1922, H. Hooker studied the relation between marriage rates and the commerce index in England. He used the correlation between variables shifted in time and applied the correlation coefficient to the data to correct for trends. And it was in 1915 that a method for removing the seasonal component was developed by the economist Morris Albert Copeland (1895-1989). Finally, in 1919 the economist Warren Persons (1878-1937) proposed a general model of the decomposition of a time series. This original model consisted of four components: trend, cyclic, seasonal, and the accidental (noise) components. One of his main contributions was the development of methods to decompose economic time series into components that could then be used to reveal the business cycle. Indeed, Persons used this

method to develop an economic indicator that became famous in the 1920s and was known as the Harvard barometer. This is probably the first economic indicator ever developed. Features of Time Series Analysis. The classical decomposition of time series data consists of three parts. First, we may consider that a time series is nothing more than a series of random shocks, which is referred to as the noise component. Second, overlaying the noise component are possible trends that we wish to identify. One pattern can be based on trends over time, such as linear or quadratic trends. Another potential trend might be due to the effects of early observations in the time series. The third trend might be due to the effects of early shocks to the time series. Finally, the third component of a time series might be the identification of a seasonal cycle, referred to as seasonality (Brockwell & Davis, 1991). An example of a seasonal cycle might be the occurrence of viral infections during the winter months. In general, the pattern can be defined by trend and seasonality. These components are not mutually exclusive and may coincide in any given time series. The approach that is widely considered the most efficient for identifying, estimating, and forecasting with time series data is the Box and Jenkins (1970) autoregressive, integrated, moving average, or ARIMA (p, d, q) model. The elements of the ARIMA model consist of the autoregressive component, p, which is concerned with the effects of preceding scores. The component d addresses the trends in the data. The component q concerns the moving average component representing the random shocks to the process. There are three general steps to a time series analysis: identification, estimation, and diagnosis. 
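To make the autoregressive component concrete, the following minimal Python sketch simulates a first-order autoregressive series, which corresponds to ARIMA(1, 0, 0), and computes its sample autocorrelations. The function names, the coefficient of 0.8, the series length, and the seed are illustrative assumptions rather than anything taken from Box and Jenkins (1970); the sketch uses only the standard library and is not a substitute for a full ARIMA routine.

```python
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate x[t] = phi * x[t-1] + shock[t] with standard normal shocks.

    All parameter values used below are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def sample_autocorrelation(series, lag):
    """Lag-k sample autocorrelation: lagged covariance over total variance."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


series = simulate_ar1(n=5000, phi=0.8)
acf = [sample_autocorrelation(series, k) for k in range(4)]
print([round(r, 2) for r in acf])
```

Under these assumptions the theoretical autocorrelations are 0.8, 0.64, and 0.512 at lags 1 to 3, so the sample values should decay roughly geometrically with the lag. A slowly decaying pattern of this kind is the sort of footprint an analyst looks for when judging whether an autoregressive term is needed.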
Through visual aids such as autocorrelation functions and partial autocorrelation functions, it is possible to identify distinctive trends in the data, should they exist, because various types of patterns show specific footprints in the autocorrelation or partial autocorrelation functions (Tabachnick & Fidell, 2001). This corresponds to the d parameter in the ARIMA model. An important step in determining the trend in a time series is to assess if the time series is stationary or nonstationary. A process is said to be stationary if the series varies around a constant mean with a constant variance. However, if the mean and variance are changing over time, the series is said to be nonstationary. The goal is to attempt to transform a non-stationary process into a stationary process for the purpose of using the theory of stationary processes for prediction and forecasting (Brockwell & Davis, 1991). The Box-Jenkins approach to making a series stationary uses the method of differencing—subtracting the value of an earlier observation from the current observation in the series. Differencing by a

lag of one observation (d = 1) removes the linear trend in the data; a second difference, obtained by differencing the differenced series again (d = 2), removes the linear and quadratic trends, and so on. Clearly, d = 0 implies that the process is already stationary. Nonconstant variance is handled by logarithmic transformations of the observations. The second step in a time series analysis is the estimation of the autoregressive component or the moving average component, or both. Typically, these components are tested against a null hypothesis to determine if they are zero. The final step of a time series analysis is to determine if there are any patterns that have not been accounted for by the model thus far. Identifying remaining patterns is accomplished by examining the residuals, that is, the differences between the scores predicted by the model and the actual scores themselves. Should such remaining patterns exist, the model would be fine-tuned until an adequate model is developed. When the model achieves a satisfactory fit to the data, it is ready to be used to predict future values of the process. This step is referred to as forecasting.

Example of Time Series Analysis. An example of a time series analysis applied to a noneconomic problem is that of Caces and Harford (1998). The objective of the Caces and Harford study was to examine the relationship between time series data on the consumption of alcohol and suicide mortality during the era just after Prohibition, specifically from 1934 to 1987. Related goals of the study included examining possible covariate effects of unemployment rate, per capita income, and divorce rates on the relationship between alcohol consumption and suicide mortality. Gender and age effects were also examined. To account for possible effects of World War II, the investigators introduced a dummy variable into the time series model with 1 representing the years 1942-1945 and 0 representing all other years.
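The differencing step of the Box-Jenkins approach, on which the Caces and Harford analysis also relies, can be sketched in a few lines of Python. The helper name `difference` and the two toy series are assumptions made for illustration; they are not part of any Box-Jenkins software or of the study's data.

```python
def difference(series, d=1):
    """Apply first differencing d times (the d in ARIMA(p, d, q)).

    Each pass replaces the series with current-minus-previous values,
    so the result is d observations shorter than the input.
    """
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series


# Hypothetical noise-free series, used only to show what differencing removes.
linear = [3 + 2 * t for t in range(8)]                 # linear trend
quadratic = [1 + 2 * t + 3 * t * t for t in range(8)]  # quadratic trend

once = difference(linear, d=1)      # constant 2: the linear trend is gone
twice = difference(quadratic, d=2)  # constant 6: linear and quadratic gone
print(once, twice)
```

With real data the differenced series would fluctuate around a constant rather than equal it exactly, but the same logic applies: once the trend is removed, the series can be treated with the theory of stationary processes.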
The authors noted that comparisons of time series data in the presence of a long-range trend can yield artificially large correlations. Therefore, Caces and Harford (1998) decided to use the method of first-differencing discussed earlier to remove the confounding effects of the trend. The results of their study indicated a nonsignificant correlation between alcohol consumption and suicide rates. However, when the covariate of unemployment was included in the model, the association between alcohol consumption and suicide rates became significant for both men and women and for young (under 40 years) and middle-age (40-59 years) people, but not for those over 60 years of age.

Analysis of Difference Scores

In the previous paragraphs, we considered a model for explaining change over time in a single variable. We see applications of time series analysis to highly aggregated economic or other social data. In this section, we consider models for the analysis of so-called panel data. Panel data arise from panel studies, where at least one and usually numerous measurements are taken on individual units of analysis, such as individuals, organizations, and so on. Rather than aggregating across these units, panel data simply describe the change in each of these units on the measurement from one measurement period to the next. Thus, panel data are the combination of cross-sectional data and time series data. Panel data are very common in the social sciences. Some major panel studies in the social sciences include the Panel Study of Income Dynamics (ICPSR, 1995), the National Longitudinal Study, and the National Educational Longitudinal Study (NCES, 1988). Panel data allow us to begin to assess the nature of change and its determinants at individual units of analysis. At the simplest level of analysis, change can be described by the analysis of difference scores.

Description of the Difference Score and Criticisms. Measuring a variable at two different points in time leads to the almost natural calculation of a difference score regardless of what the true underlying model of change might be. As its name indicates, a difference (d) or change score is obtained by subtracting the value observed for the variable at Time 1 (X1) from the value observed for the variable at Time 2 (X2), that is, d = X2 - X1. This perception of individual change accounts to a large extent for the attraction of longitudinal data (Plewis, 1985). Throughout the years, the difference score approach to the measurement of change has been criticized on two main grounds: its unreliability and its tendency to yield regression effects (Allison, 1990). With respect to unreliability, Lord (1963) stated that "differences between scores tend to be much more unreliable than the scores themselves" and that "the difference between two fallible measures is frequently much more fallible than either" (Lord, 1963; as cited in D. Rogosa, 1995, p. 12).
To see this, consider the simplest case where X1 and X2 are equally reliable. Then, the reliability of the difference score is

$$\rho_{dd} = \frac{\rho_{xx} - \rho_{12}}{1 - \rho_{12}},$$

where $\rho_{12}$ is the correlation between X1 and X2, and $\rho_{xx}$ is their common reliability. When the correlation between X1 and X2 is positive, as it usually is, the reliability of the difference score will be less than their common reliability. By way of illustration, if $\rho_{xx}$ is 0.8 and $\rho_{12}$ is 0.7, the reliability of the difference score is (0.8 - 0.7)/(1 - 0.7) = 0.333.

The second criticism of the difference score is its susceptibility to regression effects. Allison (1990) describes this problem as the spuriously negative relationship of a third variable to the difference score. He explains that because of regression toward the mean from X1 to X2, the correlation between X1 and the difference score will be negative. In other words, individuals with high pretest scores will tend to evidence lower scores on the posttest. Conversely, individuals with low scores on the pretest will likely show improvement on the posttest. So, if a third variable is positively correlated with X1, it will tend to have a spuriously negative correlation with the difference score. As an example of regression effects, let us take reading scores R1 and R2 for the same population at two different points in time, and socioeconomic status (SES) as a third variable affecting both scores. Assume, reasonably, that SES is positively correlated with the reading scores (i.e., high reading scores are correlated with high SES). If we consider extreme cases, we see first that when R1 is very high, R2 will tend to be lower because of regression toward the mean; therefore, the difference score (R2 minus R1) will tend to be negative. At the same time, in the extreme case of a very low reading score R1, regression toward the mean implies that the reading score R2 will tend to increase, leading to a positive difference score. Because high-SES individuals are overrepresented among those with high R1, SES will be negatively correlated with the difference score (i.e., the higher the SES, the smaller the difference score).

In Defense of the Difference Score. Rogosa, Brandt, and Zimowski (1982) argued that in the behavioral sciences, it has not been fully recognized that two-wave panel data contain very limited information about the individual history of each subject. Although they considered two-wave data to be inadequate for yielding sound answers to crucial questions of change, the authors pointed to some of the strengths of the difference score. Specifically, Rogosa and Willett (1983) closely examined the concept of reliability to untangle the criticism of the difference score. They argued that the reliability of a measure expresses the ability of that measure to differentiate among individuals on a particular true score. Consequently, when all individuals have nearly the same true change, the difference score is not capable of making distinctions among individuals. As a result, the reliability of the difference score will be very low.
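This argument can be made explicit under classical test theory. The following is a sketch rather than the authors' own derivation; the symbols $T_t$, $e_t$, and $\Delta$ are introduced here for illustration, and the measurement errors are assumed to be uncorrelated with each other and with the true scores.

```latex
X_t = T_t + e_t, \qquad t = 1, 2
d = X_2 - X_1 = \underbrace{(T_2 - T_1)}_{\Delta} + (e_2 - e_1)
\text{reliability of } d
  = \frac{\sigma^2_{\Delta}}{\sigma^2_{\Delta} + \sigma^2_{e_1} + \sigma^2_{e_2}}
```

When every individual has nearly the same true change, $\sigma^2_{\Delta}$ approaches zero and the ratio approaches zero regardless of how precisely each occasion is measured. The low reliability then reflects the absence of individual differences in change, not a defect in the difference score as a measure of each individual's change.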
However, when there is variability in the true change among individuals, the reliability of the difference score will be high, and it does make sense to use the difference score. In offering a defense of difference scores, Rogosa et al. (1982) stated that the difference score does distinguish among subjects when individual differences in true change exist. Moreover, they showed that difference scores are not intrinsically unreliable. Most important, "the difference score can be an accurate and useful measure of individual change even in situations where the reliability is low" (Rogosa et al., 1982, p. 730). They argued that when there are individual differences to be detected and a measure has high reliability, the reliability of the difference score will be "respectable." Others, such as Zimmerman and Williams (1982) and Sharma and Gupta (1986), argued that there are common circumstances in which difference scores can be highly reliable (as cited in Allison, 1990). It is recommended, therefore, not to assume that the difference score is always unreliable. Rogosa and Willett (1983) concluded that "the deficiency ... lies in the data, not the measure of individual change" (p. 335), and once more they reiterated that two-wave panel designs offer very little insight into individual change.

An Example Using Difference Scores. Throughout the 1970s and 1980s, a considerable body of research was conducted within the domain of learning potential. For a review of the literature and methodological issues surrounding the area of learning potential, see Glutting and McDermott (1990). Difference scores (also known as gain scores) were commonly used to study learning potential. As an example, Folman and Budoff (1971) used gain scores to study the differences in learning potential in educable mentally retarded adolescents and normal adolescents. In that study, 80 adolescents were individually interviewed on questions related to their vocations. The first part of the interview focused on their present occupational status and degree of interest in their jobs. The second part of the interview was concerned with job aspirations, those jobs they actually expected to obtain, their parents' aspirations for them, and the realism of their plans. The methodology employed in the study consisted of three individual administrations of tests given prior to instruction, 1 day after instruction, and 1 month after coaching. The patterns of performance differences on the learning potential task were the basis for assigning the learning potential status. Students were classified as gainers, high scorers, and nongainers according to differences in performance from the pretest to the posttest. The results showed few differences between educable mentally retarded and normal adolescents in vocational development. On the other hand, within the educable mentally retarded group, larger proportions of gainers and high scorers held after-school jobs and had more realistic aspirations than nongainers.

Structural Equation Modeling for Longitudinal Data

Without question, an extremely important contribution to the analysis of social science data came about with the advent of structural equation modeling (Joreskog, 1977). Although it is beyond the scope of this chapter to describe the details of structural equation modeling (see Kaplan, 2000, for a detailed discussion), we concentrate on the broad ideas of the method and specifically on how it has been used for the study of change.

Historical Review of Structural Equation Modeling. Structural equation modeling represents the fusion of two separate statistical models. The first model is factor analysis, developed in the disciplines of psychology and psychometrics (Mulaik, 1972). The second model is simultaneous equation modeling, developed mainly in econometrics but having an early history in the field of genetics. Following the historical review given in Kaplan (2000), structural equation modeling represents a hybrid of factor analysis and path analysis into one comprehensive statistical methodology. The path analytic origins of structural equation modeling can be traced to the biometric work of Sewall Wright (1918, 1921, 1934, 1960). Wright's major contribution was in demonstrating how the correlations among variables could be related to the parameters of a model as represented by a path diagram, a pictorial device that Wright is credited with inventing. Wright also showed how the model equations could be used to estimate direct effects, indirect effects, and total effects. Wright (1918) first applied path analysis to the problem of estimating size components of the measurements of bones. Interestingly, this first application of path analysis was statistically equivalent to factor analysis and was developed apparently without knowledge of the work of Spearman (1904). Wright also applied path analysis to problems of estimating supply and demand equations and also treated the problem of model identification. These issues formed the core of later econometric contributions to structural equation modeling (Goldberger & Duncan, 1972).

A second line of development occurred in the field of econometrics. Mathematical models of economic phenomena have had a long history, beginning, as discussed earlier, with Petty in 1676. However, the form of econometric modeling of relevance to structural equation modeling must be ascribed to the work of Haavelmo (1943).
Haavelmo was interested in modeling the interdependence among economic variables and developed what was referred to as simultaneous equation modeling. Simultaneous equation modeling was a major innovation in economic modeling. The development and refinement of the simultaneous equations model was the agenda of the Cowles Commission for Research in Economics, a conglomerate of statisticians and econometricians that met at the University of Chicago in 1945 and subsequently moved to Yale (see Berndt, 1991). This group linked the newly developed simultaneous equations model with the method of maximum likelihood estimation and associated hypothesis testing methodologies (see Hood & Koopmans, 1953; Koopmans, 1950). For the next 25 years, a major focus of econometric research was devoted to the fine-tuning of the simultaneous equations approach.

Basic Ideas of Structural Equation Modeling. The combination of the simultaneous equation modeling and factor analysis frameworks into a coherent analytic methodology was based on the work of Joreskog (1973), Keesling (1972), and Wiley (1973). The general structural equation model as outlined by Joreskog (1973) consists of two parts: the measurement part, which links observed variables to latent variables via a confirmatory factor model, and the structural part, which links latent variables to each other via systems of simultaneous equations. The estimation of the model parameters uses maximum likelihood estimation under the assumption of multivariate normality, but newer estimation methods exist that allow this assumption to be relaxed (see, e.g., Muthen, 1984). Structural equation modeling has been used extensively over the years for the analysis of panel data (see Joreskog, 1977). The most important advantage of the structural equation modeling approach is that it estimates the stability of longitudinal relationships among variables while taking into account the fallible nature of the observed variables. The problem of measurement error in the observed variables is handled, as noted earlier, through the specification of an explicit measurement model, linking multiple measures of a hypothetical construct via the factor analysis model. The factors are then related to other factors via regression. In the context of the analysis of panel data, indicators of exogenous factors are measured at Time 1, whereas indicators of endogenous factors may be measured at Time 1 as well as Time 2.

Application of Structural Equation Modeling Approach to Longitudinal Data. A classic example of this use of structural equation modeling for the analysis of longitudinal data is the study of the stability of alienation, reported in Wheaton, Muthen, Alwin, and Summers (1977).
Their study looked at stability over time of the construct of alienation, as measured by two observed scales, one of the feeling of anomia and the other scale measuring feelings of powerlessness. In addition, the background construct of SES was measured by respondents' education and the Duncan Socio-Economic Index. These measures were taken on 932 individuals in two rural areas of Illinois at two points in time, 1967 and 1971. Three models for the stability of alienation were tested. The first model simply regressed the construct of alienation in 1971 on alienation in 1967. The second model added the SES construct. The third model allowed for autocorrelated error among the same measurements of alienation in 1967 and 1971. The results indicate that the model with autocorrelation provided an acceptable fit to the data and showed a significant effect of alienation measured in 1967 on alienation measured in 1971, after controlling for SES.
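The two-part structure of the general model, and the stability regression in the alienation example, can be written compactly. The following is a sketch using the LISREL-style notation that is conventional for this model; the notation itself does not appear in the chapter, and the labels for the alienation equation are hypothetical shorthand.

```latex
% Measurement part: observed indicators linked to latent variables
y = \Lambda_y \eta + \varepsilon, \qquad x = \Lambda_x \xi + \delta
% Structural part: simultaneous equations among latent variables
\eta = B \eta + \Gamma \xi + \zeta
% Stability of alienation, controlling for SES (second model)
\eta_{\text{Alien71}} = \beta\, \eta_{\text{Alien67}} + \gamma\, \xi_{\text{SES}} + \zeta
```

In this notation, the third model corresponds to allowing nonzero covariances between the measurement errors of the same alienation scale in 1967 and 1971, rather than constraining those elements of the error covariance matrix to zero.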

Modeling Individual Growth Curves

The methodologies described earlier analyze change at an aggregate level. Specifically, time series analysis is typically applied to highly aggregated economic data where the interest is in modeling the specific features of the trend for purposes of forecasting. The analysis of difference scores and structural equation modeling are applied to less aggregated data that combine features of cross-sectional designs with time series designs. The power of these methodologies notwithstanding, Rogosa (1995) and others have argued that these methods do not use information from individual growth trajectories in the estimate of change. These and other criticisms have led to intense work on the analysis of change that accounts for individual differences in development. In this section, we describe the method of growth curve modeling and its recent extension.

Early Attempts at Modeling Growth Curves. One of the earliest contributions to the study of growth curves can be traced to the Belgian mathematician Adolphe Quetelet (1796-1874), who developed methods for assessing change in physical, social, and behavioral variables based on his conception of the average man (l'homme moyen). To Quetelet, the average man was a conceptual construct based on the average of relevant variables over large aggregates of individuals. The idea was that, by taking averages, individual variation would wash out and what would remain was a true picture of the phenomenon in question. This idea was the beginning of what Quetelet hoped would be a "'social physics,' the gatekeeper to a mathematical social science" (Stigler, 1986). In Quetelet's Physique Sociale (1835), a considerable amount of information on human social and behavioral variables was compiled on the average man (Hald, 1998).
Data were collected on categories such as birth and mortality; height, weight, and strength; moral and intellectual capabilities; and propensity to commit crimes, to name just a few. Of relevance to this chapter is Quetelet's application of the average man construct to the analysis of growth curves. According to Hald, Quetelet noted that although height at birth was known from medical records, and the height of men conscripted into the French army was known from government records, the average growth rate between these two periods of time was not known. Based on collecting measurements of the height of children in schools and other institutions in Brussels, Quetelet estimated a hyperbolic equation, the linear term of which captured growth from 5 to 15 years of age. According to Hald, Quetelet also assessed the fit of the curve via a method of calculating residuals. In Section 3, we consider models for the analysis of growth curves that do not aggregate to the average man but rather allow for estimation of growth rates based on information obtained from individual growth curves.

Basic Idea of Conventional Growth Curve Modeling. Growth curve modeling is a method that has been advocated for many years by researchers such as Meredith and Tisak (1990), Rogosa et al. (1982), Muthen (1991), Willett (1988), and Willett and Sayer (1994) for the study of intraindividual differences in change. The conventional growth curve model can be specified from two rather distinct perspectives. First, the model can be seen as a special case of multilevel modeling (Raudenbush & Bryk, 2002). In this context, the Level-1 model consists of intraindividual differences in initial starting levels and growth over time. Time-varying predictors can be included in the Level-1 model. These Level-1 growth parameters can be modeled at Level-2 (i.e., the individual level of analysis). At Level-2, variation in the intercepts and slopes can be modeled as functions of time-invariant individual characteristics. Parameters at Level-2 can then be further modeled as functions of Level-3 units of analysis, such as classrooms or schools. Growth curve modeling via the multilevel modeling perspective is an extremely powerful and very natural way of modeling growth. Indeed, the multilevel approach to growth curve modeling is capable of treating individual growth in a highly flexible manner, for example, by taking into account different measurement occasions for different individuals (see, e.g., Raudenbush & Bryk, 2002). The flexibility of the multilevel approach toward modeling growth notwithstanding, research by Muthen (1991) and Willett and Sayer (1994) showed how growth curve models can also be specified within the general structural equation modeling framework described earlier.
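The multilevel specification described above can be written out for the simplest case. The following is a sketch assuming linear growth and a single time-invariant predictor; the symbols ($a_{ti}$ for the time score of individual $i$ at occasion $t$, $x_i$ for the predictor) are illustrative notation in the style of Raudenbush and Bryk rather than equations taken from the chapter.

```latex
% Level 1 (within individual): status and growth over time
y_{ti} = \pi_{0i} + \pi_{1i}\, a_{ti} + e_{ti}
% Level 2 (between individuals): intercepts and slopes as outcomes
\pi_{0i} = \beta_{00} + \beta_{01}\, x_i + r_{0i}
\pi_{1i} = \beta_{10} + \beta_{11}\, x_i + r_{1i}
```

Here $\pi_{0i}$ is the individual's status when $a_{ti} = 0$, $\pi_{1i}$ is the individual's rate of growth, and the variances of $r_{0i}$ and $r_{1i}$ capture individual differences in starting level and growth that remain after accounting for $x_i$. Because $a_{ti}$ carries the subscript $i$, measurement occasions are allowed to differ across individuals.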
As noted in an earlier section, the general structural equation model consists of two parts: a measurement part that links observed variables to underlying factors and a simultaneous equation part that relates the factors to each other. In the context of growth curve modeling, the measurement part links repeated measures of an outcome, say reading achievement, to latent growth factors and to predictors of the repeated measures, and a structural part links latent growth factors to each other and to individual level predictors (see, e.g., Kaplan, 2000). The major advantage of the structural modeling framework to conventional growth curve modeling lies in its flexibility in handling measurement error in the outcomes and predictors.

Applications of Conventional Growth Curve Modeling. In a recent study examining the development of mathematics competencies in young children with and without comorbid reading difficulties, Jordan, Kaplan, and Hanich (2002) studied the achievement of 180 children examined at four time points, spanning second and third grades. In their study, four achievement groups were identified: difficulties in mathematics but not in reading (MD only), difficulties in mathematics as well as in reading (MD-RD), difficulties in reading but not in mathematics (RD only), and normal achievement in mathematics and in reading. In the context of growth curve modeling, IQ, income, ethnicity, and gender were used as time-invariant covariates. Jordan et al. (2002) found that when these covariates were held constant, the MD-only group achieved at a faster rate in mathematics than did the MD-RD group. In reading, the RD-only and MD-RD groups achieved at about the same rate. Reading abilities influenced children's achievement growth in mathematics, but mathematics abilities were not found to influence children's achievement growth in reading. Additional examples of growth curve modeling applied to mathematics achievement can be found in Jordan, Hanich, and Kaplan (2003a, 2003b).

Growth Mixture Modeling. Clearly, conventional growth curve modeling is a powerful and flexible methodology that captures an important form of individual heterogeneity, namely, individual differences in rates of growth. Because of this, conventional growth curve modeling addresses the concerns raised by Rogosa et al. (1982) and others regarding the importance of using information about individual growth in the estimation of growth functions for the population.
The power of conventional growth curve modeling notwithstanding, a fundamental limitation of the method is that it assumes that the observed growth trajectories are a sample from a single finite population of individuals characterized by a single average status parameter and a single average growth rate. However, it may be the case that the sample is derived from a mixture of populations, each having its own unique growth trajectory. For example, children may be sampled from populations exhibiting very different classes of reading growth: some children may have very rapid rates of growth in reading that level off quickly, others may show relatively normal rates of growth, and still others may show very slow or problematic rates of growth (see Kaplan, 2002). If this is so, then conventional growth curve modeling applied to a mixture of populations will result in biased estimates of growth. Moreover, from a policy or clinical perspective, the use of conventional growth curve modeling in the presence of mixtures of populations could result in a lack of power to detect the influence of policy-relevant variables on growth factors. For example, full-day kindergarten may be beneficial for children with problematic growth trajectories but irrelevant for children with positive and steep growth trajectories. Therefore, it is necessary to relax the assumption of a single population of growth and allow for the possibility of mixtures of different populations. This section introduces the methodology of growth mixture modeling developed and advocated by Muthen (2004) and his colleagues, of which the standard growth curve model described earlier is a special case.

Basic Ideas of Growth Mixture Models. Growth mixture modeling begins by combining conventional growth curve modeling with latent class analysis (e.g., Clogg, 1995) under the assumption that there exists a mixture of populations defined by unique trajectory classes. The topic of latent class analysis is described briefly in a later section in the context of latent transition analysis. Suffice it to say here that latent class analysis is a set of methodologies that aims to uncover clusters of individuals who are similar with respect to a set of characteristics measured by a set of binary outcomes. In this respect, latent class analysis has parallels with factor analysis. Latent class analysis can be extended to include covariate information that helps describe the formation of the latent classes. The results of a latent class analysis yield estimates of class probabilities, the relationship of those probabilities to covariates, and the classification of individuals into the latent classes. An extension of latent class analysis sets the groundwork for growth mixture modeling. Specifically, latent class analysis can be applied to repeated measures at different time points. This is referred to as latent class growth analysis (LCGA) (Nagin, 1999). As with latent class analysis, LCGA assumes homogeneous growth within classes.
Growth mixture modeling relaxes the assumption of homogeneous growth within classes and is capable of capturing two important forms of heterogeneity. The first form of heterogeneity is captured by individual differences in growth through the specification of the conventional growth curve model. The second form of heterogeneity is more fundamental, representing heterogeneity in classes of growth trajectories. Finally, the different classes can exhibit different relationships to a set of covariates. For example, adding covariates allows one to test whether attendance in full-day versus part-day kindergarten has a differential effect on growth depending on the shape of the growth trajectories. Again, one might find that there is a small difference between full-day and part-day kindergarten for children with normative or above average rates of growth in reading but that full-day kindergarten has a strong positive effect for those children who show below normal rates of growth in reading.
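The relationship among the conventional growth curve model, LCGA, and growth mixture modeling can be summarized in equations. The following is a sketch assuming linear growth and $K$ latent trajectory classes; the notation ($c_i$ for the class membership of individual $i$, $\pi_k$ for class proportions, $a_t$ for time scores) is introduced here for illustration and is not taken from the chapter.

```latex
% Mixture of K trajectory classes with proportions \pi_k
f(y_i) = \sum_{k=1}^{K} \pi_k \, f_k(y_i), \qquad \sum_{k=1}^{K} \pi_k = 1
% Within class k: a conventional linear growth model
y_{ti} = \eta_{0i} + \eta_{1i}\, a_t + \varepsilon_{ti}
\eta_{0i} \mid (c_i = k) = \alpha_{0k} + \zeta_{0i}, \qquad
\eta_{1i} \mid (c_i = k) = \alpha_{1k} + \zeta_{1i}
```

Setting $K = 1$ recovers the conventional growth curve model, and fixing the variances of $\zeta_{0i}$ and $\zeta_{1i}$ to zero within each class yields LCGA, in which all members of a class share the same trajectory. Growth mixture modeling leaves both sources of variation free: the class-specific means $\alpha_{0k}$ and $\alpha_{1k}$, and the individual deviations around them.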

An Example of Growth Mixture Modeling. Growth mixture modeling is a relatively new procedure, so it has not been applied extensively. That said, we can point to two contributions of the methodology to important policy issues in education. The first example is a recent paper by Kreisman (2003) that concerned the application of growth mixture modeling to the evaluation of academic outcomes of Head Start. Using data from Prospects1 and concentrating on change from Spring of first grade to Spring of third grade, Kreisman found two distinct classes of growth trajectories in reading and mathematics among children receiving Head Start: The first class demonstrated high reading and math scores with modest decline over time, whereas the second class showed very low starting scores and increases over time. Those in the low-starting class did not catch up with their counterparts within the time frame of the study. It should also be noted that a separate group of children that did not participate in any preschool experience showed comparable classes of growth trajectories. Moreover, Kreisman found that the number of years of program participation did not predict the rate of growth in reading or mathematics. Finally, gender gaps and income gaps were found only for children who did not participate in any preschool experience.

Stage Sequential Processes: Latent Transition Analysis

The power and popularity of growth curve modeling and its extensions is now well established in the array of quantitative methodologies in the social and behavioral sciences. However, these methodologies only address change over time in continuous or discrete outcomes. Another type of question that arises in the social and behavioral sciences concerns change in qualitative status over time. For example, concern may rest on changes in developmental states, such as Piaget's or Kohlberg's stages of cognitive and moral development, respectively.
Or, for example, concern may rest on changes in substance abuse status (abstainer to heavy user) during adolescence. This section reviews recent developments and applications of latent transition analysis, a method that extends latent class analysis to problems of stage sequential development over time.

1 Prospects: The Congressionally Mandated Study of Educational Growth and Opportunity was produced by the U.S. Department of Education to evaluate federal Chapter 1 programs (Kreisman, 2003).

A Brief History of Modeling Changes in States. The idea of modeling stage sequential processes can be traced back to the seminal work of A. A. Markov (1856-1922). As cited in Sheynin (1988), Markov was a professor at St. Petersburg University and a member of the St. Petersburg Academy of Sciences. Markov's contribution to probability theory cannot be overestimated. He was the first to prove the central limit theorem, and, by weakening some of the restrictions surrounding this theorem and the law of large numbers, Markov paved the way for probability theory to encompass random variables that possess a dependent structure, such as those linked together by so-called Markov chains (Sheynin, 1988). It appears that Markov began studying the idea of simple chains as early as 1906 and presumably coined the term chain in a paper of that same year. From Sheynin (1988), it seems that the term Markov chain appeared in 1926. However, the notion of dependent structures in data was present in early work, such as problems in Brownian motion, random-walk processes, and the early work of Bachelier (1870-1946), who used the theory of Brownian motion to model the behavior of the Paris stock market, among other examples. It was Markov's work on the probability theory of chains involving discrete random variables that is of specific relevance to methods for studying stage sequential development.

A stochastic model of stage sequential processes is referred to as a Markov process. Following Coleman's (1964) treatment of the subject, a Markov stochastic process specifies several states that an individual can occupy over time. A Markov process, then, "is a stochastic process in which the transition probabilities can be specified by knowing only the present state, and not the past history, of the individual" (Coleman, 1964, p. 23). The outcomes of interest are estimates of the transition probabilities of moving from one state to another. There are a variety of Markov models that can be specified, from very simple to very complex. A simple Markov model consists of a single chain, where predicting the current state of an individual requires data only from the immediately preceding occasion.
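Estimating the transition probabilities of a simple manifest Markov chain amounts to counting moves between states on adjacent occasions. The following is a minimal sketch, not a procedure from the chapter: the function name, the state labels "A" and "B", and the toy sequences are hypothetical, and pooling counts across occasions assumes that the transition probabilities do not change over time and are the same for all individuals.

```python
from collections import Counter


def estimate_transition_matrix(sequences, states):
    """Estimate first-order transition probabilities from observed sequences.

    Each sequence lists one individual's state at successive occasions.
    Returns a nested dict: matrix[previous][current] = estimated probability.
    """
    counts = Counter()
    for seq in sequences:
        # Pair each occasion with the next one: (state at t-1, state at t).
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1

    matrix = {}
    for prev in states:
        row_total = sum(counts[(prev, curr)] for curr in states)
        matrix[prev] = {
            curr: (counts[(prev, curr)] / row_total if row_total else 0.0)
            for curr in states
        }
    return matrix


# Hypothetical data: four individuals observed on three occasions.
sequences = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["B", "B", "B"],
    ["A", "A", "A"],
]
matrix = estimate_transition_matrix(sequences, states=["A", "B"])

if __name__ == "__main__":
    for prev, row in matrix.items():
        print(prev, row)
```

Each row of the resulting matrix sums to one and gives the estimated probability of each state at the next occasion given the current state. Only the state at the preceding occasion enters the calculation, which is the defining property in Coleman's description.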
In a simple Markov model, it is assumed that the transition probabilities are the same for all individuals, that is, that the population is homogeneous. Such a model can be used to study changes in, say, political preference over time, where at each measurement occasion, individuals are asked if their political preference is, say, Republican or Democrat. The transition probabilities are, then, simply the proportions of individuals who maintain or change their political preferences from one occasion to the next. If it is assumed that over time the proportions are the same, this is referred to as a time-invariant Markov model. Time-varying Markov models can be defined as well. More complex Markov models could allow different processes to exist within different groups. Finally, Markov models can be extended to latent variables as well (see, e.g., Bijleveld & van der Kamp, 1998).

Basic Ideas of Latent Transition Analysis. Although the application of Markov models for the analysis of psychological variables goes back to Anderson (1954; as cited in Collins & Wugalter, 1992), most applications at that time focused on single manifest measures. However, as in the early work in factor analysis of intelligence tests, it was recognized that many important psychological variables are latent, in the sense of not being directly observed but possibly measured by numerous manifest indicators. The advantages of measuring latent variables via multiple indicators are the known benefits with regard to reliability and validity. Therefore, it would be preferable to model latent variables in the context of Markov processes and in this way provide an adequate representation of psychological constructs. The appropriate measurement model for categorical latent variables is the latent class measurement model. Latent class models were introduced by Lazarsfeld and Henry (1968) for the purpose of deriving latent attitude variables from responses to dichotomous survey items. In a traditional latent class analysis, it is assumed that an individual belongs to one and only one latent class and that, given the individual's latent class membership, the observed variables are independent of one another, the so-called local independence assumption (see Clogg, 1995).2 The latent classes are, in essence, categorical factors arising from the pattern of response frequencies to categorical items, where the response frequencies play a role similar to that of the correlation matrix in factor analysis (Collins, Hyatt, & Graham, 2000). The analogs of factor loadings are parameters that estimate the probability of a particular response on the manifest indicator given membership in the latent class. Unlike continuous latent variables (factors), categorical latent variables (latent classes) divide individuals into mutually exclusive groups. Of relevance to this chapter is the work of Wiggins (1973), who merged the latent class measurement model with Markov models. Contributions to the problem of estimation were made by van de Pol and de Leeuw (1986) and van de Pol and Langeheine (1989).
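The latent class measurement model and its local independence assumption can be stated in one expression. The following is a sketch; the symbols ($\gamma_c$ for the proportion of the population in latent class $c$, $\rho_{j, y_j \mid c}$ for the probability of response $y_j$ to item $j$ given membership in class $c$) are illustrative notation and are not taken from the chapter.

```latex
P(Y_1 = y_1, \ldots, Y_J = y_J)
  = \sum_{c=1}^{C} \gamma_c \prod_{j=1}^{J} \rho_{j,\, y_j \mid c},
  \qquad \sum_{c=1}^{C} \gamma_c = 1
```

The product over items is where local independence enters: within a class, the probability of a whole response pattern is simply the product of the item response probabilities. The $\rho$ parameters play the role described in the text as the analog of factor loadings, and the constraint on $\gamma_c$ reflects the assumption that each individual belongs to exactly one class.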
The difficulty with these models, as noted in Collins and Wugalter (1992), was that they focused on one manifest indicator of the latent variable. Such an indicator could be, they argued, unreasonably long and complicated. The alternative would be to come up with simpler multiple manifest categorical indicators of the categorical latent variable and combine them with Markov stochastic process models. The combination of multiple indicator latent class models and Markov stochastic process models provided the foundation for latent transition analysis of stage-sequential dynamic latent variables. From Collins and Wugalter (1992), stage sequential dynamic latent variables are metaconstructs composed of other latent variables and the relations among them (p. 134). To make these concepts more concrete, consider a hypothetical stage sequential model of reading development. Indeed, a five-stage model of reading development was considered by Chall (1995). In this model, the first stage represents phonemic awareness ability, the second stage represents phonemic awareness ability and the ability to recognize beginning and ending sounds, and so on. Note that these skills are not observed directly but only indirectly via multiple items tapping their respective skills. Each stage consists of latent classes, where, for this example, there might be two latent classes: "Has ability" and "Does not have ability." Note that for all of these stages, an individual has an array of latent class memberships. At any given point in time, the array of latent class memberships defines his or her latent status. The problem then concerns estimation of the transition probabilities from one latent status to another. Latent transition analysis (Collins & Wugalter, 1992) is used to estimate the transition probabilities from one latent reading status to another.

2 It may be interesting to note that latent class models are special cases of latent Markov models where latent class membership is not changing over time (Bijleveld & van der Kamp, 1998).

An Example of Latent Transition Analysis. An example of latent transition analysis applied to the development of reading skills can be found in Kaplan and Walpole (2004).3 Using data from the Early Childhood Longitudinal Study (NCES, 2001), Kaplan and Walpole addressed four research questions: (1) Are there underlying latent classes of reading competencies inherent in the ECLS-K data for children moving from kindergarten through the end of first grade, and, if so, how many are there? (2) Do the same number of reading competency classes appear at Fall of kindergarten, Spring of kindergarten, Fall of first grade, and Spring of first grade? (3) What are the transition probabilities across the latent statuses over the waves of the study? (4) To what extent do latent classes and latent status transition probabilities differ by poverty level?
The results of the Kaplan and Walpole (2004) study lent empirical support to a stage sequential theory of reading development. Specifically, initial latent class analyses supported the existence of five latent classes.⁴ Moreover, the five latent classes remain relatively stable over the waves of ECLS-K. The findings further showed that there is a moderate probability of moving from an early skill status to the next skill status but a very low probability of jumping a skill status. With regard to poverty level differences, Kaplan and Walpole found that the number and interpretation of latent classes are similar across poverty levels. Children living below poverty were found to be more heavily represented in the low-skill-level classes compared to children living above poverty. Children living below poverty had higher probabilities of staying in lower skill classes than did children living above poverty, except in cases where children have very high skill levels.

³ Results of the Kaplan and Walpole (2004) study were presented at the conference honoring Dick Venezky.
⁴ Based on the structure of the ECLS-K reading assessment, as well as patterns of response probabilities, these classes were named: Low Alphabet Knowledge, Early Phonemic Awareness, Advanced Phonemic Awareness, Word Reading, and High Literacy Skills (see Kaplan & Walpole, 2004, for more details).
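To make the idea of latent status transitions concrete, the following minimal Python sketch treats the five named statuses as states of a first-order Markov chain. Everything numeric here is an assumption made for illustration: the transition probabilities and starting prevalences are invented, they are not the estimates reported by Kaplan and Walpole (2004), and the function name `propagate` is ours. The matrix is only shaped to mimic the qualitative pattern described above, with a moderate probability of moving to the next status, a very low probability of skipping one, and (a further simplifying assumption) no backward movement.

```python
# Hypothetical first-order Markov chain over five latent reading statuses.
STATUSES = [
    "Low Alphabet Knowledge",
    "Early Phonemic Awareness",
    "Advanced Phonemic Awareness",
    "Word Reading",
    "High Literacy Skills",
]

# TRANSITION[i][j]: assumed probability of moving from status i to status j
# between two adjacent waves. These values are made up, not estimates.
TRANSITION = [
    [0.60, 0.35, 0.05, 0.00, 0.00],
    [0.00, 0.55, 0.40, 0.05, 0.00],
    [0.00, 0.00, 0.60, 0.35, 0.05],
    [0.00, 0.00, 0.00, 0.65, 0.35],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]


def propagate(prevalence, transition):
    """Return next-wave prevalences: sum over i of prevalence[i] * transition[i][j]."""
    n = len(transition)
    return [
        sum(prevalence[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Hypothetical status prevalences at the first wave (they sum to 1).
wave1 = [0.50, 0.30, 0.15, 0.04, 0.01]
wave2 = propagate(wave1, TRANSITION)

for name, p in zip(STATUSES, wave2):
    print(f"{name:<28s} {p:.3f}")
```

Comparing two such matrices, one per poverty group, is one way to picture the fourth research question, although an actual latent transition analysis estimates these probabilities jointly with the measurement model rather than fixing them by hand.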

SECTION 3: SUMMARY AND REMAINING ISSUES

This chapter presented an overview of some of the more common methodologies used in the social and behavioral sciences to study change, along with a cursory look at the history that led to those methodologies. Having framed our chapter in terms of models where time is the predictor of change, we see that a sophisticated array of methodologies is at our disposal to analyze problems of change. Our analysis choices extend from models for forecasting highly aggregated macrosocioeconomic variables to fine-grained questions of individual growth. Existing models can also address questions of changes in qualitative status. It is probably not too much of an exaggeration to say that these methodologies are poised to revolutionize our understanding of change and how change is responsive to policy or clinically relevant predictors or interventions. The power of these methodologies notwithstanding, all of them rest on an assumption that at each point in time when measurements are taken the underlying dynamic system is in a state of equilibrium. By equilibrium, we mean simply that any exogenous shocks to the dynamic process under study have worked their way through the system and that the outcome of interest has reached a (possibly) new equilibrium level. Recent work on the equilibrium problem has focused on cross-sectional methodologies applied to inherently dynamic phenomena. Specifically, regression-based models (e.g., structural equation models) are typically applied to cross-sectional data where variables are measured at a single point in time or at discrete time points and where the distance between points in time can sometimes be measured in years. It is well known that cross-sectional data provide only a snapshot of an ongoing dynamic process. As a result, regression-based models applied to cross-sectional data assume that any changes in the system prior to the time point of data collection have manifested their effects and that the system is in a stable equilibrium during estimation. The equilibrium problem has been discussed at length in Tuma and Hannan (1984) and even earlier in Coleman (1964). However, the seriousness of this issue as it pertains to static models of dynamic phenomena has not led to methodological advancements. Indeed, a review of the extant literature on problems of equilibrium shows that studies applying structural equation models to cross-sectional data rarely, if ever, explicitly acknowledge the assumption of a stable equilibrium. Notable exceptions include important methodological papers by Schoenberg (1977) and Sobel (1990). A recent paper by Kaplan, Harik, and Hotchkiss (2000) looking at these issues from the context of structural equation modeling provided further empirical evidence of the seriousness of the problem of applying cross-sectional models to dynamic phenomena that are not in equilibrium. Therefore, as powerful as the models reviewed in this chapter may be, they still provide a rather coarse view of a complex dynamic reality. It is hoped that future methodological research will either adapt existing models to account for the true underlying dynamics of social phenomena or begin to develop a completely new class of models that deals directly with the problem of equilibrium. In this way, the social sciences will move toward a deeper understanding of the role of time in problems of change.

REFERENCES

Allison, P. D. (1990). Change scores as dependent variables in regression analysis. Sociological Methodology, 20, 93-114.
Berndt, E. R. (1991). The practice of econometrics: Classic and contemporary. New York: Addison-Wesley.
Bijleveld, C. C. J. H., & van der Kamp, L. J. T. (1998). Longitudinal data analysis: Designs, models, and methods. Thousand Oaks, CA: Sage.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. Oakland, CA: Holden-Day.
Brockwell, P. J., & Davis, R. A. (1991). Time series: Theory and methods (2nd ed.). New York: Springer-Verlag.
Caces, F., & Harford, T. (1998). Time series analysis of alcohol consumption and suicide mortality in the United States, 1934-1987. Journal of Studies on Alcohol, 59, 455-461.
Chall, J. S. (1995). Stages of reading development (2nd ed.). Belmont, CA: Wadsworth.
Clogg, C. C. (1995). Latent class models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling in the social and behavioral sciences (pp. 81-110). San Francisco: Jossey-Bass.
Coleman, J. S. (1964). Introduction to mathematical sociology. New York: Free Press.
Collins, L. M., Hyatt, S. L., & Graham, J. W. (2000). Latent transition analysis as a way of testing models of stage-sequential change in longitudinal data. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples (pp. 147-161). Mahwah, NJ: Lawrence Erlbaum Associates.
Collins, L. M., & Wugalter, S. E. (1992). Latent class models for stage-sequential dynamic latent variables. Multivariate Behavioral Research, 27, 131-157.
Dufour, J.-M. (2003). Histoire de l'analyse des séries chronologiques. Retrieved 2004 from http://www.fas.umontreal.ca/SCECO/Dufour
Folman, R., & Budoff, M. (1971). Learning potential and vocational aspirations of retarded adolescents. Exceptional Children, 38, 121-130.
Glutting, J. J., & McDermott, P. A. (1990). Principles and problems in learning potential. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children (pp. 296-347). New York: Guilford Press.
Goldberger, A. S., & Duncan, O. D. (1972). Structural equation methods in the social sciences. New York: Seminar Press.
Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica, 11, 1-12.
Hald, A. (1998). A history of mathematical statistics from 1750 to 1930. New York: Wiley.
Hood, W. C., & Koopmans, T. C. (Eds.). (1953). Studies in econometric method (Vol. 14). New York: Wiley.
ICPSR. (1995). Panel study of income dynamics, 1968-1992. Ann Arbor, MI: Author.
Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003a). Arithmetic fact mastery in young children: A longitudinal investigation. Journal of Experimental Child Psychology, 85, 103-119.
Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003b). A longitudinal study of mathematical competencies in children with specific mathematics difficulties versus children with comorbid mathematics and reading difficulties. Child Development, 74, 834-850.
Jordan, N. C., Kaplan, D., & Hanich, L. B. (2002). Achievement growth in children with learning difficulties in mathematics: Findings of a two-year longitudinal study. Journal of Educational Psychology, 94, 586-597.
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 85-112). New York: Academic Press.
Jöreskog, K. G. (1977). Structural equation models in the social sciences: Specification, estimation and testing. In P. R. Krishnaiah (Ed.), Applications of statistics (pp. 265-287). Amsterdam: North-Holland.
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Newbury Park, CA: Sage.
Kaplan, D. (2002). Methodological advances in the analysis of individual growth with relevance to education policy. Peabody Journal of Education, 77, 189-215.
Kaplan, D., Harik, P., & Hotchkiss. (2000). Cross-sectional estimation of dynamic structural equation models in disequilibrium. In R. Cudeck, S. H. C. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future. A Festschrift in honor of Karl G. Jöreskog (pp. 315-339). Lincolnville: Scientific Software International.
Kaplan, D., & Walpole, S. (2004, April). An application of latent transition analysis to the development of reading competencies in young children: Evidence from ECLS-K. Paper presented at the American Educational Research Association, San Diego, CA.
Keesling, J. W. (1972). Maximum likelihood approaches to causal analysis. Unpublished doctoral dissertation, University of Chicago.
Kenny, D. (1973). Cross-lagged and synchronous common factors in panel data. In A. Goldberger & O. Duncan (Eds.), Structural equation models in the social sciences (pp. 153-165). New York: Seminar Press.
Koopmans, T. C. (Ed.). (1950). Statistical inference in dynamic economic time series (Vol. 10). New York: Wiley.
Kreisman, M. B. (2003). Evaluating academic outcomes of Head Start: An application of general growth mixture modeling. Early Childhood Research Quarterly, 18, 238-254.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Lord, F. M. (1963). Elementary models for measuring change. In C. W. Harris (Ed.), Problems in measuring change (pp. 21-38). Madison, WI: University of Wisconsin Press.
McConnell, C. R., & Brue, S. L. (1996). Economics: Principles, problems, and policies (13th ed.). New York: McGraw-Hill.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.
Mulaik, S. (1972). The foundations of factor analysis. New York: McGraw-Hill.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.
Muthén, B. (1991). Analysis of longitudinal data using latent variable models with varying parameters. In L. Collins & J. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 1-17). Washington, DC: American Psychological Association.
Muthén, B. (2004). Latent variable analysis. Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 345-368). Thousand Oaks, CA: Sage.
Nagin, D. S. (1999). Analyzing developmental trajectories: A semi-parametric, group-based approach. Psychological Methods, 4, 139-157.
NCES. (1988). National Educational Longitudinal Study of 1988. Washington, DC: U.S. Department of Education.
NCES. (2001). Early Childhood Longitudinal Study: Kindergarten class of 1998-99: Base year public-use data files user's manual (No. NCES 2001-029). Washington, DC: U.S. Government Printing Office.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Plewis, I. (1985). Analysing change. Chichester, UK: Wiley.
Quetelet, A. (1835). Sur l'homme et le développement de ses facultés, ou Essai de physique sociale. Bachelier, Paris. Pirated ed. Hauman, Bruxelles, 1836. Translated into German in 1838 and into English in 1842.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Rogosa, D. (1995). Myths and methods: "Myths about longitudinal research" plus supplemental questions. In J. Gottman (Ed.), The analysis of change (pp. 3-66). Mahwah, NJ: Lawrence Erlbaum Associates.
Rogosa, D., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92(3), 726-748.
Rogosa, D., & Willett, J. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20(4), 335-343.
Schoenberg, R. (1977). Dynamic models and cross-sectional data: The consequences of dynamic misspecification. Social Science Research, 6, 133-144.
Sharma, K. K., & Gupta, J. K. (1986). Optimum reliability in gain scores. Journal of Experimental Education, 54, 105-108.
Sheynin, O. B. (1988). A. A. Markov's work on probability. Archive for History of Exact Sciences, 39, 337-377.
Singer, J. D. (1993). Are special educators' careers special. Exceptional Children, 59, 262-279.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford, UK: Oxford University Press.
Sobel, M. E. (1990). Effect analysis and causation in linear structural equation models. Psychometrika, 55, 495-515.
Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201-293.
Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge, MA: Harvard University Press.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Tuma, N. B., & Hannan, M. T. (1984). Social dynamics: Models and methods. New York: Academic Press.
van de Pol, F., & de Leeuw, J. (1986). A latent Markov model to correct for measurement error. Sociological Methods and Research, 15, 118-141.
van de Pol, F., & Langeheine, R. (1989). Mover-stayer models, mixed Markov models and the EM algorithm, with an application to labour market data from the Netherlands Socio-Economic Panel. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 485-495). Amsterdam: North Holland.
Wheaton, B., Muthén, B., Alwin, D., & Summers, G. (1977). Assessing reliability and stability in panel models with multiple indicators. In D. R. Heise (Ed.), Sociological methodology 1977 (pp. 84-136). San Francisco: Jossey-Bass.
Wiggins, L. M. (1973). Panel analysis. Amsterdam: Elsevier.
Wiley, D. E. (1973). The identification problem in structural equation models with unmeasured variables. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 69-83). New York: Academic Press.
Willett, J. B. (1988). Questions and answers in the measurement of change. In E. Z. Rothkopf (Ed.), Review of research in education (Vol. 15, pp. 345-422). Washington, DC: American Educational Research Association.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381.
Wright, S. (1918). On the nature of size factors. Genetics, 3, 367-374.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557-585.
Wright, S. (1934). The method of path coefficients. Annals of Mathematical Statistics, 5, 161-215.
Wright, S. (1960). Path coefficients and path regressions: Alternative or complementary concepts? Biometrics, 16, 189-202.
Zimmerman, D. W., & Williams, R. H. (1982). Gain scores in research can be highly reliable. Journal of Educational Measurement, 19, 149-154.

15
The Dictionary of Old English: The Next Generation(s)

Antonette diPaolo Healey
University of Toronto

Richard Venezky, to whom this essay is dedicated, once described the designing of software systems for a scholarly dictionary as akin to preparing a spaceship for a journey to a distant galaxy in outer space (Venezky, 1987, p. 113). As he noted, such a project certainly encompasses many technological generations and perhaps more than one biological generation. The example of the print Middle English Dictionary (MED) may be instructive. In 1930 when the MED began, an American astronomer at Flagstaff, Clyde William Tombaugh, discovered Pluto, the ninth and, so far, the last planet in our solar system; British engineer Frank Whittle was the first to invent and patent the jet engine; Johannes Ostermeier, a German, patented the flashbulb for the camera; and a Canadian research student, William Chalmers, developed the lightweight thermoplastic polymer we still know today as Plexiglas (Ochoa & Corey, 1997, pp. 238-240). By 2001, when the MED was completed at the mature age of 71 (Lewis, 2002), man had not only viewed his universe through a telescope but had long since walked on the moon; jets had become the usual means of transport for ordinary people traveling great (and sometimes, little) distances; satellites were capable of beaming back photographs taken by cameras on Mars; and significant parts of our lives were lived not so much with the concreteness of polymers and plastics but with the virtual reality of the World Wide Web. More recently, Susan Hockey acknowledged that the problem of time is perhaps the most difficult issue facing a historical dictionary, in contrast
with other electronic text projects, because of the need to adapt to one computer system after another over the length of the project and the potential loss of even more time in making the various transitions (Hockey, 2000, p. 149). There are also other complexities. As Richard Venezky presciently observed, what scholars think are their needs at the beginning of a project may change on the journey as technology advances and opportunities present themselves (Venezky, 1987, p. 113). This is an elegant way of stating that lexicographers in the computer age aim at a constantly moving target. They need to recognize the new capabilities and meet the new challenges, while simultaneously clearly focusing on the heart of the work, the actual writing of their dictionaries. The revisiting of objectives is often a salutary exercise because it can be a measure of the distance traveled and a means of insight in charting new directions. The Dictionary of Old English (DOE) at the University of Toronto, one of the first corpus-based dictionary projects, engaged Richard Venezky's attention and benefited from his expertise since 1969 (Venezky, 1973). The project's founding editor, Angus Cameron (1970-1983), planned his dictionary from the start to make use of computer technology (Cameron, 1983). Early by-products of computerization were two microfiche concordances to the Corpus of Old English (Venezky & Butler, 1985; Venezky & Healey, 1980), generated from a full-text database of over 3,000 Old English texts, some 5 million running words of Old English and Latin, embodying 25 million characters (Cameron, Amos, Butler, & Healey, 1981). The concordances represented a linguistic milestone: It was the first time the full corpus of a sizable language had been published in analyzed form. (The first concordance contained all of Old English except for some 200 spellings of high-frequency words; the later concordance contained the high-frequency spellings omitted from the earlier.) 
It was also the first time any dictionary had published its complete citation base. Although these early concordances were hailed as research tools indispensable for the field of Anglo-Saxon studies, their fixed format has since been superseded by the dynamism of interactive concordances created through the powerful search engine developed for the project's Web Corpus (Healey, Price-Wilkin, & Ariga, 1997). In parallel fashion, the electronic Corpus, originally developed through a text-processing system known as LEXICO (Venezky, Relles, & Price, 1977), has migrated from magnetic tape to diskette to its present distribution on CD-ROM and on the Web (Healey, 2002). As is evident, the challenges facing the Dictionary of Old English in the 21st century are quite different from those encountered by other historical dictionaries, which published originally in print and then had technology applied to them retroactively. The MED (McSparran, 2002), the OED (Warburton, 2000), the Dictionary of the Older Scottish Tongue (Rennie,
2001), Das Deutsche Wörterbuch of Jacob and Wilhelm Grimm (Christmann, 2001), to take but four examples, have conceived the recent computerization of their dictionaries as an enterprise distinct from their creation. At the Dictionary of Old English, they are one. Our research drives us to develop or exploit new tools to aid us in the writing of the Dictionary, and these new tools, in turn, help us to explore new questions in Old English which we hope are reflected in the entries we write. Although lexicographers are terrifyingly aware of the rush of time, its passage need not always be a disadvantage: The design constraints of an earlier period may become irrelevant in another. We used to worry about the billing algorithms of computing centers and schedule our work for processing at the cheapest time (Venezky, 1987, pp. 114-115). Fortunately, that is no longer an issue in an era of personal computers, with their powerful memories and capacious hard drives. However, there has been one design constraint that has been with us from the beginning, the problem of the special characters: how to create them, how to tag them, how to display them, and how to print them. Susan Hockey saw this as an immediate concern facing historical lexicographers (Hockey, 2000, p. 149). It is an issue to which I will return as I describe our journey into uncertain territory, one that culminated in July 2003 with the publication of the first electronic version of the first seven letters (out of 22) of the Old English alphabet (Healey et al., 2003). It is a journey marked by a deep concern for making efficient use of available technologies as we tell the story of early English.

THE ELECTRONIC CORPUS: A TESTING GROUND FOR THE DICTIONARY OF OLD ENGLISH: A TO F ON CD-ROM

The creation of the Dictionary of Old English: A to F on CD-ROM began with a series of incremental steps, and it began, as so much of everything else at the project, with the electronic Corpus of Old English texts. We knew that the Dictionary would have to be tagged (i.e., marked up or encoded) to be searched efficiently. A markup scheme allows implicit textual information to be made explicit. The task is complex principally because of the richly embedded information in a standard dictionary entry. Therefore, we started with the less difficult task of tagging the Corpus, which existed as straight text, not only to make it portable and more easily searchable but also to gain experience with text markup. The tagging of the Corpus, in comparison with the later tagging of the Dictionary, was simplicity itself (although it did not seem so at the time), having only three
key elements: a header describing the text, a body of continuous text, and unique identifiers for the system of reference. Emendations were also tagged by a correction tag and languages other than Old English by a foreign tag. The first significant move toward the creation of the electronic Dictionary, therefore, actually took place in 1995, when for the first time the Dictionary of Old English Corpus adhered to the newly published Guidelines issued by the Text Encoding Initiative (Sperberg-McQueen & Burnard, 1994). Ours was among the first corpora of any language to conform fully to what has now become the standard for electronic text. After having made the Corpus available on the World Wide Web with the help of John Price-Wilkin, Humanities Text Initiative, University of Michigan, who created the interface to the Web in late 1997, we were ready to begin tagging the Dictionary, the first six letters of which had already been published on microfiche between 1986 and 1996 (Cameron, Amos, & Healey, 1986-).
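The three key elements, together with the correction and foreign tags, can be pictured with a short Python sketch that builds and queries a toy tagged text. The element names `teiHeader`, `text`, `body`, `corr`, and `foreign` are drawn from the TEI vocabulary, but the root name `document`, the use of `s` elements with an `id` attribute for the system of reference, the identifier `sample.001`, and the placeholder content are all hypothetical; nothing here reproduces the project's actual markup.

```python
import xml.etree.ElementTree as ET


def build_tagged_text(title, units):
    """Build a toy tagged text: a header, a body, and one uniquely
    identified <s> element per reference unit.

    `units` is a list of (ref_id, parts); each part is (tag, content),
    where tag is None for plain text or an element name such as "corr".
    """
    root = ET.Element("document")  # placeholder root name (assumption)
    header = ET.SubElement(root, "teiHeader")
    ET.SubElement(header, "title").text = title
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for ref_id, parts in units:
        s = ET.SubElement(body, "s", {"id": ref_id})
        last = None  # most recent child element, whose tail takes plain text
        for tag, content in parts:
            if tag is None and last is None:
                s.text = (s.text or "") + content
            elif tag is None:
                last.tail = (last.tail or "") + content
            else:
                last = ET.SubElement(s, tag)
                last.text = content
    return root


doc = build_tagged_text(
    "Sample text",
    [
        ("sample.001", [
            (None, "plain Old English text "),
            ("corr", "emended reading"),
            (None, " more text "),
            ("foreign", "Latin words"),
        ]),
    ],
)

# Once the implicit information is explicit, it can be queried directly.
foreign_passages = [el.text for el in doc.iter("foreign")]
by_id = {s.get("id"): s for s in doc.iter("s")}
print(ET.tostring(doc, encoding="unicode"))
print(foreign_passages)
```

The point of the sketch is only that a header, a continuous body, and stable identifiers are enough to make a text both portable and addressable; a real encoding would follow the published Guidelines rather than this simplified layout.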

THE ELECTRONIC DICTIONARY

Between 1998 and 2003 we made the transition from microfiche to electronic publishing, from typographical markup to logical markup. These 5 years have marked the greatest transformation in our project's history. During this period, we moved from the closed proprietary systems of the previous century to the open source technology of the present. It was a seismic undertaking for our small team of eight (five lexicographers, a systems analyst, a copyeditor, a computer editor). If we had fully understood at the outset the time and effort involved, we might not have had the courage to start. However, now that we see the end result, a Dictionary freed from the tyranny of searches only on headwords and capable of multiple points of entry, we know that this migration was the best decision we could have made. Peter Mielke, the project's systems analyst from September 1998 to February 2002, was invaluable in helping us move to a more standard platform by implementing SGML (Standard Generalized Markup Language) across our entire operation. Xin Xiang, who replaced him, was responsible for developing the Dictionary in a Windows environment. The electronic DOE is a tribute to their intelligence, hard labor, and genial support of our editorial efforts. In addition, we relied on colleagues in the field, Peter Baker, University of Virginia, and Demorah Hayes, University of Kentucky, and our Toronto research assistants, Damian Fleming, Rob Getz, and Mark Sundaram, to act as beta testers. Their excellent advice improved both the navigability and the search capability of the electronic DOE.

THE INFORMATION FIELDS

The field structure of a DOE entry formed the basis of the structural tagging of the entries. It consists principally of 10 fields (with a few additional areas for searching): Headword, PartofSpeech, AttestedSpelling, Occurrence, Schema, Definition, Citation, LatinEquivalent, SeeAlso (Old English references), and Secondary Reference. These fields are the paths followed by the search engine for the electronic Dictionary. As we implemented a system of structural tagging on both past and present entries, we undertook a massive correcting of the six legacy fascicles (A, Æ, B, C, D, and E) while writing the entries for F, the third largest letter in the Old English alphabet. Revisions encompassed the correction of substantive errors, the updating of some Old English texts and Latin source material, and the replacement of hand-inserted special characters by electronically generated ones. This last item has not been a trivial matter, and as mentioned at the outset, the problem of the special characters has been with us since the start. Although Unicode provides a larger character set than other character-encoding standards (Sperberg-McQueen & Burnard, 2001), it still does not support all that lexicographers of Old English require. The creation of the fonts for the electronic DOE was facilitated through the great generosity of Peter Baker in allowing us to use his Junicode, a Unicode font for medievalists, as the basis of our display. We have added to Junicode both Greek and Runic and 55 characters not found in Unicode, such as many manuscript abbreviations, including m, n, and w with macrons; certain symbols, such as the does-not-equal sign; the co-occurrence of short and long marks over a vowel with a subscript dot, found in the Middle English Dictionary; and a number of symbols in manuscripts. These characters, which we have designed and tagged, can also be displayed and printed.
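One facet of the special-character problem can be illustrated with a small Python sketch using the standard `unicodedata` module. Unicode has a single precomposed code point for a with macron (U+0101), so normalization to NFC yields one character, whereas m, n, and w with macron have no precomposed code point and remain a base letter followed by the combining macron U+0304. The helper name `nfc_length` is ours, and the sketch says nothing about how the project's own font encodes these glyphs.

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def nfc_length(base):
    """Number of code points left after composing base + macron under NFC."""
    return len(unicodedata.normalize("NFC", base + COMBINING_MACRON))


for letter in "amnw":
    print(letter, nfc_length(letter))
```

A letter that stays at two code points depends on the font and rendering software to stack the macron correctly, which is one reason a project might prefer to design dedicated glyphs.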
We have made the DOE TrueType font for the Windows operating system freely available to colleagues around the world to use for their own purposes. It not only resides on the CD-ROM in the Appendix, but it can also be installed under their system font directory and then used in any application. As I hope is evident, there is already some convergence in Anglo-Saxon studies toward standards.

STANDARDS

The kind of repair work I have just described, correcting, updating, replacing, is absolutely crucial to prevent the databases we create from becoming fossils. For our searches to have integrity, it is essential that our material adhere to various standards: standard editions, standard naming conventions for Old English and Latin texts, SGML for text markup,
and fonts based on the Unicode standard. One advantage a dictionary being compiled in the 21st century has over print dictionaries is that its lexicographers can maintain and update electronic files on a regular basis as they continue their push through the alphabet. At the Dictionary of Old English project, we believe that the publication of no future letter will require as much revision and standardization as we have undertaken for A through F because this has been a period of consolidation and transformation. This investment in the data has immediate benefits: Our files are now portable and searchable, and, because they adhere to the new standards, they should have longevity and outlast their computer systems.

SYSTEM REQUIREMENTS

The search engine for the electronic Dictionary, eponymously labeled "DOEsearch," was developed in a Windows environment. It is the enhanced descendant of the Section Search program developed earlier at the project in the Linux environment for searching and viewing the fields of all tagged entries. The user interface runs on a number of Windows environments (Windows 98/Me/NT/2000/XP) and requires Internet Explorer 5.5 or higher as the browser. The entries are in three formats on the CD-ROM: SGML, XML (Extensible Markup Language), and HTML (Hypertext Markup Language). The HTML format is automatically installed on the hard drive.
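Searching the fields of tagged entries can be pictured with a minimal Python sketch. The attribute names follow the ten fields listed for a DOE entry, but the `Entry` class, the `search` function, and the three abridged sample entries are hypothetical illustrations; they are not the DOEsearch implementation and not actual Dictionary data.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One dictionary entry, with one attribute per information field."""
    headword: str
    part_of_speech: str = ""
    attested_spellings: list = field(default_factory=list)
    occurrence: str = ""
    schema: str = ""
    definitions: list = field(default_factory=list)
    citations: list = field(default_factory=list)
    latin_equivalents: list = field(default_factory=list)
    see_also: list = field(default_factory=list)
    secondary_references: list = field(default_factory=list)


def search(entries, field_name, term):
    """Return headwords of entries whose named field contains `term`."""
    term = term.lower()
    hits = []
    for entry in entries:
        value = getattr(entry, field_name)
        values = value if isinstance(value, list) else [value]
        if any(term in v.lower() for v in values):
            hits.append(entry.headword)
    return hits


# Abridged, illustrative entries (values are assumptions, not DOE content).
entries = [
    Entry("freond", "noun", definitions=["friend"], see_also=["freondleas"]),
    Entry("freondleas", "adjective", definitions=["friendless"]),
    Entry("fyr", "noun", definitions=["fire"]),
]

print(search(entries, "definitions", "friend"))
print(search(entries, "see_also", "freondleas"))
print(search(entries, "headword", "fyr"))
```

Restricting a query to one field is what frees a search from the headword alone: the same term can be looked up as a definition, a cross-reference, or a spelling, each giving a different point of entry.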

GETTING ONE'S BEARINGS

As the homepage of the Dictionary opens (see Fig. 15.1), the left window provides users with a series of choices. They can view the title page of the Dictionary, seek information through the user guide and search tips, scan the updated list of short titles, or seek help about specific problems under "Trouble Shooting." If the letter F is clicked (Fig. 15.2), a side window appears that provides a word index to the 3,016 words in F. Users can scroll down to find a particular headword, such as father, click on it, and then the entry will appear. We believe that this path will be very easy for nontechnical users to follow. We also have tried to take advantage of the conventions to which users are accustomed from browsing with Internet Explorer in other contexts: The left arrow takes users back in the series of pages they have visited; the right arrow takes users forward. To these we have added two more arrows: The up arrow will take users to the previous entry alphabetically; the down arrow, to the next alphabetical entry.

FIG. 15.1. DOE homepage.

FIG. 15.2. Search on the letter "f".

HOTLINKS

To increase navigability, we inserted a number of hotlinks in the Dictionary itself: in the schemas, definitions, short titles, and the Old English references (SeeAlso field). For example, a user is able to move from a sense in the schema in a long entry, such as the preposition/adverb fore 'before', to the corresponding sense in the entry itself and then back again to the schema by clicking on the arrow. In the definitions, we frequently point to other headwords in the Dictionary of Old English or to other senses within the same word. So, for example, in Sense 1.a of the noun freond 'friend' (Fig. 15.3), we suggest that readers look at freondleas 'friendless'. Cross-references such as this are usually highlighted; I say "usually" because we have not been able to identify all the patterns yet. But once a cross-reference has been identified, the user can move to it effortlessly with a click of the mouse.

FIG. 15.3. Hotlink to freondleas in Sense 1.a.

In addition, each of the short titles at the head of a citation has a hotlink to its bibliographic reference, drawn from the data created for the CD-ROM Corpus in 2000 (Healey, Holland, McDougall, & Mielke, 2000) and since updated. If a user clicks on the Old English short title "Or 5" in the entry fyr 'fire' Sense 1.e (Fig. 15.4), the window at the bottom right specifies that the base text is Janet Bately's 1980 edition of The Old English Orosius for the Early English Text Society, together with the system of reference. It should be apparent from this description that we have bundled into the electronic Dictionary what is, in fact, another publication in its own right, a revised and updated bibliography of the texts that comprise the Dictionary of Old English Corpus. The bibliography is available at two points: here, through the individual short title at the start of a citation, and back on the Dictionary homepage, on the sidebar under the heading "Short Titles." As mentioned previously, because the Dictionary has been published between 1986 and the present, it has been a massive task regularizing texts across the letters as new editions appear. For the 2003 publication on CD-ROM, we concentrated on updating across the seven letters much of the homiletic material gathered together in recently published collections. For the publication of G in 2005, we will concentrate on updating the six versions of the Anglo-Saxon Chronicle that form part of the Dictionary's citation base.

FIG. 15.4. Hotlink to bibliographic reference for Or 5.

Finally, to aid the user in navigating through the Dictionary, we provide for each of the references in the "SeeAlso" field hotlinks to other words in the Dictionary as long as they begin with the letters A through F. So, for example, in the entry for the compound fyrdsearu 'equipment for war', we can move directly to the main item in the word family, the simplex fyrd 'military service', through a hotlink in the "SeeAlso" section. Although we have written and input a number of words for other letters of the Dictionary, we decided not to make them hotlinks yet because they have not undergone final revision in light of the responses of specialist readers, experts on the technical vocabulary of Old English, such as the legal, medical, and botanical terms, to whom we send out those entries for review.

Together with the creation of the special characters, the creation of the live links in the "SeeAlso" section was, in programming terms, the most challenging issue in the development of the Dictionary on CD-ROM. The spellings in this section are in an abbreviated format, and the various separators, such as commas and semicolons, can be variously interpreted. It took a number of trials to establish the proper patterns for expanding the spellings to their full format to create the links.

FIELD SEARCH

The paths described so far are relatively easy, straightforward ways of navigating the Dictionary of Old English. Another way of organizing a search, a more powerful path, is through the dropdown menu of the section search (Fig. 15.5). Here the user can search the various fields of a Dictionary entry. Colleagues interested in the formation of compounds can discover the words that end in the masculine abstract suffix -dom and the simplex dom 'doom' itself by searching the "Headword" field for the pattern dom. The search result appears in the left window, and each search result will bring up the actual entry at a click of the mouse (Fig. 15.6; freodom).

FIG. 15.5. Dropdown menu specifying search on Headword.

FIG. 15.6. Search on the Headword field for the suffix dom.

Scholars interested in morphology can run a search on feminine weak nouns by keying in "f\ v wk" as a pattern in the "PartOfSpeech" field and discover there are 224 possible candidates so far in A through F. A click on any of the words listed in the search result window will bring the user directly to the appropriate field in the entry. Also to be found in the "PartOfSpeech" field are rare editorial comments on etymology. Comparative linguists eager to note the influence of or association with the cognate Germanic languages can run searches in this field on Old Saxon, for example (Fig. 15.7), to view the Old English vocabulary which is either derived from Old Saxon or at least shows good grounds for direct comparison with an Old Saxon term.

FIG. 15.7. Search on the PartOfSpeech field for "Old Saxon".

Students who want to know under what headword we may have treated a rare spelling are now able to key in the pattern fures in the "AttestedSpelling" field and discover that we consider this string a form both of furh 'furrow' and of fyr 'fire'. Lovers of hapax legomena can now find all words of single occurrence (4,055 words) in A through F by running a search in the "Occurrence" field and can narrow the search even more to words of single occurrence in poetry (855 words). It is also in the "Occurrence" field that we comment on restrictions in dialect, date, authorship, or genre.

Appreciators of Wulfstan, the 11th-century homilist known for his idiosyncratic style of preaching (Orchard, 1992, p. 240), can now discover those words we thought significant enough to label as characteristic of his usage by searching on his name in the "Occurrence" field. Likewise, historians interested in legal material can search for words whose meanings are predominantly or exclusively legal by keying in the phrase "in laws" (Fig. 15.8) or the term "legal" again in the "Occurrence" field. For particular legal senses in the more general vocabulary, a search on "in laws" or "legal" in the "Definition" field will no doubt be productive. Similarly, distinctively poetic meanings can now more readily be discerned among words with prose senses (Fig. 15.9). Or to aid the exploration of the notion of disease in Anglo-Saxon England, a search on this term in the "Definition" field brings up a host of ailments (Fig. 15.10): from madness to the "royal disease" (probably jaundice) to alopecia, leprosy, palpitations of the heart (perhaps) to something akin to ringworm. A search on "pain" brings up a number of specific ailments in various parts of the body: pain in the bones, bladder, throat, chest, knee, eyes, feet, and buttocks; a sudden sharp pain; the pain of childbirth; and the metaphoric pain or sting of anxiety, to name a few.

FIG. 15.8. Search on the Occurrence field for "in laws".

FIG. 15.9. Search on the Definition field for "in poetry".

FIG. 15.10. Search on the Definition field for "disease".

Or, to further the investigation of the cultural concept of the anxiety of authority (Stanton, 2002), a search in the "Definition" field on "authority" provides a striking example of an author following the model of a text. The citation is from Ælfric of Eynsham, an 11th-century monk known for his orthodoxy. The citation reads: "We follow Augustine's exposition in this gospel." For Ælfric, not following authority would be anxiety producing.

A search on citations of a specific text in the electronic DOE is an invaluable aid for teaching and an especially effective one in a smart classroom. A search on the short title "wan" in the "Citation Reference" field gathers together all the citations of the poem "The Wanderer" which have been cited for the letters A through F, 110 occurrences so far. Users can then narrow their search to a specific reference, "wan 54," by bringing up the Find Dialogue box (Control F) in Internet Explorer and keying in their search. Both teacher and students can then see how the editors of the Dictionary of Old English construe the problematic phrase fleotendra ferhþ, a notorious crux in a canonical text where we are unsure of the meaning of the phrase (? spirit of the floating ones, ? spirit of the fleeting ones) and of its referents (? imagined companions, ? seabirds).

Anglo-Latinists, or those surveying the concept of childhood in Anglo-Saxon England (Crawford, 1999), can know with certainty that at least three Old English words, astyped 'bereft/dispossessed', ealdorleas1 'deprived of parents', and freondleas 'friendless', gloss Latin orphanus 'orphan' by keying in that word in the "LatinEquivalent" field. There undoubtedly will be more as we move through the alphabet. Anglo-Latinists can also discover which words may have been borrowed from Old English into Latin, the reverse of the usual direction, by running a search on the Dictionary of Medieval Latin from British Sources (MLD) in the "SecondaryReference" field. A quick glance through the search results reveals that many of the words, not surprisingly, are legal terms.

Finally, for scholars interested in spellings, one of the most powerful capabilities of the electronic DOE is regular expression searches in various fields of the entry: For example, it is now easy to produce a conflated, alphabetical list of feminine abstract nouns ending in -ung or -ing or a list of headword spellings showing either eo or u between w and r (Fig. 15.11). For scholars of a language with as varied a spelling system as Old English, this is a considerable advance.

FIG. 15.11. Regular expression search on eo or u between w and r.

The searches described previously demonstrate how the electronic DOE opens new areas for exploration in the earliest period of English. Now that searches are no longer restricted to headwords, the DOE can become an important research tool for interrogating legal, medical, social, literary, and cultural issues, and so on, in Anglo-Saxon England, as well as investigating questions of language, such as morphology, spelling, semantics, and even notions of genre or the idiolect of named authors. A dictionary is a significant repository of the culture of an age. A tagged dictionary makes this repository accessible.
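For readers who have not used regular expressions, the field searches described in this section can be illustrated with a minimal sketch in Python. It is not the DOEsearch program, whose internals are not described here. The field names "Headword" and "PartOfSpeech" follow the essay, but the record layout, the function name field_search, the sample entries, and the exact patterns are all assumptions made only for illustration.

```python
import re

# Hypothetical tagged entries. The field names follow the essay; the
# records themselves are illustrative and are not taken from DOE files.
ENTRIES = [
    {"Headword": "dom", "PartOfSpeech": "noun"},
    {"Headword": "freodom", "PartOfSpeech": "noun"},
    {"Headword": "earnung", "PartOfSpeech": "noun"},
    {"Headword": "earning", "PartOfSpeech": "noun"},
    {"Headword": "forweorþan", "PartOfSpeech": "verb"},
    {"Headword": "forwurþan", "PartOfSpeech": "verb"},
    {"Headword": "fyr", "PartOfSpeech": "noun"},
]


def field_search(entries, field, pattern):
    """Return, in alphabetical order, the headwords of all entries
    whose given field matches the regular expression `pattern`."""
    rx = re.compile(pattern)
    return sorted(e["Headword"] for e in entries if rx.search(e.get(field, "")))


# Headwords ending in dom (the suffix and the simplex itself).
print(field_search(ENTRIES, "Headword", r"dom$"))

# A conflated, alphabetical list of spellings ending in -ung or -ing.
print(field_search(ENTRIES, "Headword", r"(ung|ing)$"))

# Spellings showing either eo or u between w and r.
print(field_search(ENTRIES, "Headword", r"w(eo|u)r"))
```

Each search in this sketch is confined to a single named field, which corresponds to the field-by-field searching described above. The end-of-word anchor ($) is a choice made for this sketch; a bare pattern such as dom would also match the string in the middle of a word.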

MICROFICHE PUBLICATION

The electronic Dictionary now supersedes the microfiche publication. However, we still intend to distribute F on microfiche for those colleagues who, for various reasons, are not able to use the electronic DOE. Microfiche publication has been particularly beneficial to colleagues at universities in those parts of the world where electronic resources are scarce or expensive. Although microfiche publication presents a static dictionary rather than a dynamic text, at the very least it allows the dissemination of new research. We wish to provide for colleagues who are promoting the study of Old English under incredibly challenging circumstances the latest results of our research in an inexpensive format they can access. Our expectation is that with the next letter or two, as technology advances more broadly in underprivileged areas, we will be able to cease microfiche publication entirely.

BACK TO THE FUTURE

As we look toward the future, we are very interested in the response of users, especially their suggestions for further improvements. We are genuinely concerned about making the electronic DOE a research tool that answers scholarly needs. If it is possible to implement changes, we will certainly try, given the usual constraints of time and funding. Our own experience in developing and publishing the CD-ROM of A to F has already suggested a few directions in which to proceed. We would like to enhance the "DOEsearch" capability. At the moment, the searches are strictly limited to within specific fields. We envision the creation of Boolean searches so that users can search across fields: for example, a search for all headwords beginning with for-, which are classified "Noun." In our tagging system, this involves a search across the "Headword" and "PartOfSpeech" fields. Or another example might be a search for all words meaning 'helmet' in Beowulf, a search across the "Definition" and "Citation" fields. There are many other cross-field searches that can be devised, limited only by our software and our imaginations. We would also like to bundle in and hotlink an updated bibliography of Latin source material, which we often cite in the parenthetical material following the citation. The Latin short title to the source text could form the hotlink and would parallel our treatment of the Old English short title as the hotlink to the Old English edition.

One major concern will be the question of enhancing performance. As the Dictionary increases in size and as new features are introduced, we want to ensure that searches can still be efficient. As the second largest letter in the Old English alphabet, H, looms ever closer, scalability becomes an important priority. Whatever the future holds, we hope to take advantage of enabling technologies for illuminating the beginnings of the English language. That we are poised for the future is owing to Richard Venezky's long and vital association with our project. His interventions at certain turning points were crucial for the well-being of the Dictionary of Old English. His scholarly generosity and his gift for friendship have sustained us over three decades on our journey. To complete the journey, we require that virtue highly esteemed by the first speakers of English: ellen noun1 'courage, strength' (Fig. 15.12).

FIG. 15.12. A virtue esteemed by the Anglo-Saxons.

ACKNOWLEDGMENTS

The research of the Dictionary of Old English project on the letter F has been funded by the Social Sciences and Humanities Research Council of Canada; the Provost's Office, University of Toronto; the McLean Foundation and the Salamander Foundation, Toronto; the British Academy, London; the National Endowment for the Humanities, an independent federal agency; the Gladys Krieble Delmas Foundation and the Andrew W. Mellon Foundation, New York; the Salus Mundi Foundation, Tucson; and the various contributors to the Dictionary of Old English Fundraising Campaign organized by Patrick Conner, West Virginia University; Joyce Hill, University of Leeds; Nicholas Howe, then at Ohio State University; and Jane Toswell, University of Western Ontario. I am grateful to the project's systems analyst, Xin Xiang, for formatting the screens for this essay.

REFERENCES

Cameron, A. (1983). On the making of the Dictionary of Old English. Poetica, 15-16, 13-22.
Cameron, A., Amos, A. C., Butler, S., & Healey, A. diP. (1981). The Dictionary of Old English Corpus in electronic form. Toronto, Canada: Dictionary of Old English Project.
Cameron, A., Amos, A. C., & Healey, A. diP. (Eds.). (1986-). The Dictionary of Old English: D (1986), C (1988), B (1991), Æ (1992), A (1994), E (1996), F (2004). Toronto, Canada: Pontifical Institute of Mediaeval Studies for the Dictionary of Old English Project.
Christmann, R. (2001). Books into bytes: Jacob and Wilhelm Grimm's Deutsches Wörterbuch on CD-ROM and on the Internet. Literary and Linguistic Computing, 16, 121-133.
Crawford, S. (1999). Childhood in Anglo-Saxon England. Thrupp, UK: Sutton.
Healey, A. diP. (2002). The Dictionary of Old English: From manuscripts to megabytes. Dictionaries, 23, 156-179.
Healey, A. diP., Holland, J., McDougall, D., McDougall, I., Speirs, N., & Thompson, P. (with Haines, D.). (2003). Dictionary of Old English: A to F. With electronic version for Windows developed by X. Xiang. Toronto, Canada: Pontifical Institute of Mediaeval Studies for the Dictionary of Old English Project.
Healey, A. diP., Holland, J., McDougall, I., & Mielke, P. (2000). The Dictionary of Old English Corpus in electronic form. (TEI P3 conformant version, 2000 release on CD-ROM). Toronto, Canada: Dictionary of Old English Project.
Healey, A. diP., Price-Wilkin, J., & Ariga, T. (1997; new release 2000). The Dictionary of Old English Corpus on the World-Wide Web. Ann Arbor: University of Michigan Press.
Hockey, S. (2000). Electronic texts in the humanities: Principles and practice. Oxford, UK: Oxford University Press.
Lewis, R. E. (2002). The Middle English Dictionary at 71. Dictionaries, 23, 76-94.
McSparran, F. (2002). The Middle English compendium: Past, present, future. Dictionaries, 23, 126-141.
Ochoa, G., & Corey, M. (Eds.). (1997). The Wilson chronology of science and technology. New York: Wilson.
Orchard, A. P. McD. (1992). Crying wolf: Oral style and the "sermones lupi." Anglo-Saxon England, 21, 239-264.
Rennie, S. (2001). The Electronic Scottish National Dictionary (eSND): Work in progress. Literary and Linguistic Computing, 16, 153-160.
Sperberg-McQueen, C. M., & Burnard, L. (Eds.). (1994). Guidelines for electronic text encoding and interchange. Chicago: Text Encoding Initiative.
Sperberg-McQueen, C. M., & Burnard, L. (Eds.). (2001). TEI P4 guidelines for electronic text encoding and interchange XML-compatible edition. Retrieved September 26, 2003, from http://www.tei-c.org/P4X/
Stanton, R. (2002). Bible translation and the anxiety of authority. In The culture of translation in Anglo-Saxon England (pp. 101-143). Cambridge, UK: D. S. Brewer.
Venezky, R. L. (1973). Computational aids to dictionary compilation. In R. Frank & A. Cameron (Eds.), A plan for the Dictionary of Old English (pp. 307-327). Toronto, Canada: University of Toronto Press.
Venezky, R. L. (1987). Unseen users, unknown systems: Computer design for a scholar's dictionary. In Proceedings of the Third Annual Conference of the UW Centre for the New Oxford English Dictionary: The Uses of Large Text Databases, November 9-10, 1987 (pp. 113-121). Waterloo, Canada: UW Centre for the New OED.
Venezky, R. L., & Butler, S. (Comps.). (1985). A microfiche concordance to Old English: The high-frequency words. Toronto, Canada: Pontifical Institute of Mediaeval Studies.
Venezky, R. L., & Healey, A. diP. (Comps.). (1980). A microfiche concordance to Old English. Toronto, Canada: Pontifical Institute of Mediaeval Studies.
Venezky, R. L., Relles, N. N., & Price, L. A. (1977). Man-machine integration in a lexical processing system. Cahiers de lexicologie, 30, 17-49.
Warburton, Y. (2000). The Oxford English Dictionary: From OED to OED Online. Euralex Newsletter, International Journal of Lexicography, 13(2), 7-8.


Author Index

A Abbott, J., 108, 125 Abelson, R. P., 90, 104 Abramson, M., 49, 58 Adams, M. J., 42, 58, 129, 146, 202, 208 Add, D., 65, 78 Adger, C. T., 25, 36 Afflerbach, P., 156, 171 Ahissar, M., 56, 58 Albrecht, F., 184, 190 Albrecht, J. E., 101, 103 Albro, E. R., 94, 104 Alioto, A., 63, 69, 72, 78 Allison, P. D., 271, 273, 285 Alloway, N., 257, 262 Almasi, J. F., 206, 208 Alwin, D., 275, 288 Amory, A., 184, 190 Amos, A. C., 290, 306 Andersen, E. S., 31, 35 Anderson, R. C., 140, 347, 201, 208 Andrews, S., 46, 58 Andruski, J. E., 69, 78 Arguin, M., 55, 61 Ariga, T., 290, 306 Ash, S., 25, 27, 28, 36

Ashburn, L., 73, 75, 77 Asian, F., 134, 136, 146 Aslin, R. N., 47, 60, 69, 76 Atkins, P., 46, 58 Auerbach, E., 134, 146 Aufderheide, P., 157, 158, 169 Austin, M. C., 200, 208 Azuma, T., 49, 58

B Bahrick, L. E., 73, 75, 77, 78 Baker, L., 140, 146 Baldwin, D. A., 73, 75, 77 Balota, D. A., 49, 58 Bame, K., 48, 49, 59 Barnes, M. A., 46, 60 Barnes, W. S., 129, 147 Barnhart, C. L., 1, 18, 248, 263 Baron, J., 45, 58 Barros, B., 184, 192 Barth, P., 235, 243 Bartlett, F., 81, 102 Bartolome, L. I., 129, 147 Bartolone, J., 84, 99, 105 Basche, P., 91, 92, 106

309

310 Bates, E., 70, 77 Bauer, D. W., 45, 51, 60 Baughn, C., 93, 205 Baumberger, T., 71, 78 Baxter, P. M, 203, 209 Bazalgette, C., 160, 269 Beck, I., 95, 103 Bereiter, C., 184, 189, 292 Berman, R. A., 92, 202 Berndt, F. R., 274, 285 Bernhardt, E., 109, 224, 225 Berninger, V. W., 153, 269 Bernstein Ratner, N., 68, 69, 77 Besner, D., 49, 58 Bijeljac-Babic, R., 53, 62 Bijleveld, C. C. J. H., 281, 282n2, 285 Binet, A., 89, 202 Birch, S. L., 101, 203 Bisanz, G. L., 155, 269 Black, J. B., 84, 92, 202 Blits, J., 242, 243 Bloom, L., 64, 65, 77 Bloomfield, L., 1, 18 Boberg, C., 25, 27, 28, 36 Boechler, P. M., 28, 36 Boersma, D. C., 68, 78 Bolter, J. D., 152, 269 Bond, G. L., 9, 18 Bonin, P., 48, 54, 55, 60 Boring, E. G., 193, 194, 208 Boroditsky, L., 66, 70, 77 Bouchard, E., 91, 92, 205 Bower, G. H., 84, 92, 101, 202, 204 Bowey, J. A., 53, 58 Box, G. E. P., 269, 285 Brand, R. J., 73, 75, 77 Brandt, D., 272, 277, 278, 287 Brewer, W. T., 92, 103 Bright, W., 23, 35 Bristow, P. S., 133, 247 Brockwell, P. J., 269, 285 Brown, A. L., 89, 102, 184, 290 Brown, J. S., 183, 290 Brue, S. L., 267, 286 Bryk, A. S., 277, 287 Bub, D., 55, 62 Buber, M., 173, 290 Budoff, M., 273, 285 Burke, J., 178, 290 Butler, S., 290, 306

AUTHOR INDEX

C Caces, F., 270, 285 Calfee, K. C., 13, 16, 17, 28 Calfee, R. C., 11t, 13, 15, 17, 28, 22, 35, 44, 58, 108, 224, 157, 161, 162, 269 Camaioni, L., 70, 71, 77 Cameron, A., 290, 306 Cameron, E. H., 194, 208 Camilli, G., 206, 208 Canturk, M., 134, 136, 246 Carle, E., 117, 224 Carlo, M. S., 131, 247 Carter, P., 240, 243 Chafetz, J., 53, 62 Chall, J. S., 200, 208, 208, 209, 283, 285 Chambers, S. M., 45, 49, 58 Chambliss, M. J., 108, 224 Chandler, J., 129, 247 Chandler-Olcott, K., 163, 269 Chapman, J. W., 131, 246 Chapman, R., 44, 58 Chen, D., 181, 292 Chistovich, I. A., 69, 78 Chistovich, L. A., 69, 78 Choi, S., 71, 77, 78 Chomsky, C., 23, 35 Christ, W. G., 152, 269 Christian, C., 108, 225 Christian, D., 25, 36 Christmann, R., 291, 306 Chudowsky, N., 164, 270 Chumbley, J. L, 49, 58 Cipolla, C., 152, 269 Clanchy, M. T., 152, 269 Clogg, C. C., 279, 282, 285 Cohen, M. M., 45, 47, 52, 55, 57, 59 Cole, M., 153, 272 Coleman, J. S., 281, 284, 285 Collingwood, R. C., 85, 202 Collins, A., 183, 290 Collins, L. M., 281, 282, 283, 285 Coltheart, M., 45, 46, 49, 53, 56, 58, 60 Comings, J., 130, 131, 132, 246 Conant, J. B., 228, 243 Condelli, L., 134, 246 Conners, F., 34, 36 Content, A., 48, 54, 55, 60 Cookson, P. W. Jr., 261, 262 Cooper, F. S., 13, 29 Cooper, H., 204, 208

311

AUTHOR INDEX Corman, L., 108, 225 Cosky, M. J., 45, 58 Cote, N., 96, 202 Coulmas, F., 23, 35 Craig, H. K., 33, 36 Cran, W., 5, 6, 19 Crawford, S., 302, 306 Crumrine, B., 133, 147 Cummings, D. W., 23, 35 Cummins, J., 132, 146 Cunningham, P., 33, 35 Curry, C, 108, 125 Curtis, B., 46, 58

E Edelson, D., 183, 191 Edley, C. Jr., 245, 246, 262 Ehri, L. C., 22, 35 Eisenstein, E. L., 152, 269 Elley, W. B., 257, 262 Elliott, B., 108, 225 Elman, J. L., 56, 60 Emmott, C., 202 Ercikan, K., 164, 170

F D Dagidir, F. Z., 134, 136, 144, 146 Dahl, K. L., 22, 35 Dale, E., 200, 209 Dale, P. S., 70, 77 Daniels, P. T., 23, 35 Darling-Hammond, L., 227, 243 Das, J. P., 155, 269 Dascal, M., 179, 290 Davelaar, E., 49, 58 Davis, C., 188, 292 Davis, R. A., 269, 285 de Boysson-Bardies, B., 68, 77 de Leeuw, J., 282, 287 Dede, C., 188, 190 Denhiere, G., 97, 104 Desjardins, R. N., 67, 79 Destino, T., 109, 224 Dewey, J., 82, 84, 102 Dillon, J. T., 91, 102 DiMatteo, M. R., 204, 209 diSessa, A., 180, 181, 191 Downey, J. E., 194, 209 Draper, N., 255, 262 Dressman, M., 108, 225 Duffy, S. A., 85, 99, 204 Dufour, J.-M, 267, 285 Duguid, P., 183, 190 Duke, N. K., 107, 118, 119, 124 Duncan, O. D., 274, 286 Dunn, J., 68, 77 Durgunoglu, A. Y., 129, 131, 132, 133, 134, 136, 139, 140, 141, 142, 143, 144, 145, 146, 247 Dykstra, R., 9, 28

Farr, R., 150, 161, 269 Fenson, L., 70, 77 Ferguson, R., 239, 243 Fernald, A., 67, 68, 69, 70, 71, 77 Ferrand, L., 55, 62 Ferrandino, V. L., 254, 262 Fidell, L. S., 269, 287 Fields, R. B., 68, 78 Fischer, D. H., 85, 202 Fischer, J. L., 31, 36 Fischhoff, B., 99, 202 Fisher, C., 67, 77 Fisher, E., 173, 292 Fitzgerald, N., 132, 147 Flammer, G., 232, 243 Flanigan, H. P., 55, 60 Fleischman, H., 132, 147 Flood, J., 108, 225 Folman, R., 273, 285 Ford, M., 232, 243 Forkosh-Baruch, A., 188, 292 Forster, K. L, 45, 49, 58 Fountoukidis, D. L., 3, 10, 28 Francis, W., 49, 59 Frankenberg, E., 245, 262 Freeman, E., 114, 225 Freire, P., 136, 146 Friedman, D., 39, 55, 59 Fries, C. C., 1, 18 Frost, R., 56, 58, 158, 169 Frost, S. J., 56, 59 Fry, E. B., 3, 10, 28 Fukui, I, 68, 77 Furlin, K. R., 41, 59

312

AUTHOR INDEX

G Garas, K., 206, 208 Garner, W., 178, 191 Gates, A. I., 197, 209 Gaur, A., 5, 18 Gelman, S. A., 71, 79 Gentner, D., 65, 66, 70, 77 Gentry, J. R., 33, 36 Gerken, L., 72, 79 Gerrig, R., 101, 103 Gibson, E. J., 64, 77, 180, 292 Gilbert, P., 257, 262 Gillette, A., 135, 247 Gillette, J., 67, 77 Ginsburg, K, 232, 243 Glass, G. V., 204, 205, 209 Gleitman, H, 67, 77 Gleitman, L., 67, 77 Glenn, C. G., 84, 87, 88, 89, 91, 93, 204, 108, 112, 225 Gliksman, S., 186, 292 Globerson, T., 182, 292 Glushko, R. J., 46, 58 Glutting, J. J., 273, 286 Gogate, L. J., 73, 75, 77, 78 Goldberger, A. S., 274, 286 Goldfield, B. A., 70, 78 Goldinger, S. D., 49, 58 Goldman, S., 96, 97, 202, 139, 247 Golinkoff, R., 63, 65, 66, 69, 72, 74, 78, 79 Gong, B., 181, 292 Goodman, I. F., 129, 247 Goody, J., 152, 153, 269 Gopnik, A., 71, 77, 78 Gordin, D., 183, 292 Goswami, U., 3, 18 Gotesman, R., 56, 58 Gough, P. B., 45, 58, 129, 247 Graesser, A. C., 83, 90, 91, 202, 203 Graff, H. J., 152, 153, 269 Graham, J. W., 282, 285 Graham, S., 33, 36 Grainger, J., 49, 55, 59 Granott, N., 183, 292 Gray, W. S., 196, 197, 198, 209 Green, D., 232, 243 Greene, S. B., 101, 203 Griffith, P. L., 129, 247 Groff, P., 41, 59 Gross, J., 258, 262 Gupta, J. K., 272, 287

H Haavelmo, T., 266, 274, 286 Haber, L. R., 41, 59 Haber, R. N., 41, 59 Hagood, M. C., 163, 269 Haines, D., 291, 306 Hald, A., 276, 286 Hall, G. S., 194, 209 Haller, M., 46, 58 Halliday, M. A. K., 85, 203 Hamilton, L., 240, 243 Hanich, L. B., 278, 286 Hannan, M. T., 284, 287 Harford, T., 270, 285 Harik, P., 285, 286 Harlow, A., 232, 243 Harris, K. R., 33, 36 Hart, H. L. A., 85, 86, 203 Hartman, J. W., 27, 36 Haryu, E., 65, 78 Hasan, R., 85, 203 Hasten, M., 258, 262 Hastie, R., 99, 100, 203, 206 Hauser, A., 173, 292 Havelock, E. A., 187, 292 Hawkins, S. A., 99, 203 Healey, A. diP., 290, 291, 296, 306 Hedges, L. V., 204, 208 Hemphill, L., 129, 247 Henderson, H. K., 155, 269 Henri, V., 89, 202 Henry, M. K., 17, 29 Henry, N. W., 282, 286 Herriman, M. L., 129, 247 Hiebert, E. H., 201, 208 Hilton, D. J., 86, 203 Hirsh-Pasek, K., 63, 65, 66, 69, 72, 74, 78, 79 Hobbs, R., 158, 163, 269 Hockey, S., 290, 291, 306 Hoffman, J. V., 108, 225 Holland, J., 291, 296, 306 Honore, A. M., 85, 86, 203 Honzaki, E., 133, 247 Hopey, C., 218, 225 Horn, E., 197, 209 Horn, S., 238, 243 Hospers, J., 85, 203 Howey, K. R., 227, 243 Huey, E. B., 40, 59, 195, 209 Hyatt, S. L., 282, 285

313

AUTHOR INDEX I

Imai, M., 65, 78 Irwin, M., 133, 247

J

Jackson, G. B., 204, 209 Jacobs, A. M., 48, 49, 50, 51, 55, 59, 62 Jacobson, J. L., 68, 78 Jacquet, R. C., 66, 78 Jain, P., 49, 58 Jain, R., 97, 99, 205 Jared, D., 46, 59 Jastrzembski, J. E., 44, 49, 59, 60 Jenkins, G. M., 269, 285 Jesse, A., 53, 59, 60 Johnson, B., 245, 249, 250, 251, 252, 256, 262 Johnson, D. D., 245, 249, 250, 251, 252, 256, 262 Johnson, N., 108, 225 Johnson, N. F., 41, 59 Johnson, N. S., 88, 89, 204 Jonasson, J. T., 49, 58 Jones, A., 232, 243 Jones, Q., 185, 292 Jones, R., 48, 49, 59 Jongsma, E., 150, 161, 269 Jordan, N. C., 277, 278, 286 Jordan, T. R., 42, 59 Joreskog, K. G., 273, 275, 286 Judge, H., 228, 243 Juel, C., 129, 247

K Kahneman, D., 99, 203 Kamil, M., 108, 109, 224, 225, 152, 162, 270 Kaplan, D., 273, 274, 277, 278, 283, 283n3, 283n4, 285, 286 Katz, M. B., 246, 262 Kawamoto, A. H., 48, 49, 59 Keesling, J. W., 275, 286 Kello, C. T., 48, 49, 59 Kennedy, D., 184, 292 Kenny, D., 266, 286 Kessler, B., 48, 59 Kindfield, A. C., 183, 292

King, K., 211, 224 Kintsch, W., 85, 97, 203, 151, 155, 161, 170 Kittay, J., 152, 270 Knupfer, N. N., 162, 270 Koch, N., 184, 290 Koerner, J. D., 228, 243 Koretz, D., 240, 243 Koschmann, T., 184, 292 Kozhevnikova, E. V., 69, 78 Kozma, R., 211nl, 221, 225 Kreisman, M. B., 280, 280nl, 286 Krendl, K. A., 155, 162, 270 Kress, J. E., 3, 10, 18 Kretzschmar, W. A. Jr., 24, 36 Kruger, A., 74, 79 Kucan, L., 95, 203 Kucera, H., 49, 59 Kuhl, P., 67, 68, 69, 77, 78 Kuhn, T. S., 174, 292 Kuscul, H., 134, 136, 143, 144, 246

L Labov, W., 22, 25, 27, 28, 29, 32, 33, 34, 36 Lacerda, F., 69, 78 Lahav, O., 179, 185, 189, 292, 292 Lane, D. M., 108, 225, 152, 162, 270 Langdon, R., 45, 46, 56, 58 Langeheine, R., 282, 288 Langer, J. A., 129, 247 Langston, M., 83, 84, 96, 97, 98, 99, 101, 203 Lapp, D., 108, 225 Lawson, L. L., 22, 35 Lazarsfeld, P. F., 282, 286 Lea, R. B., 101, 203 Lederer, A., 67, 77 Lee, C, 245, 262 Lee, V., 2, 19 Lehnert, W. G., 91, 203 Lemke, I. L., 152, 270 Lemosse, M., 228, 243 Lempert, R. O., 99, 100, 206 Leonard, S., 65, 78 Levin, H., 64, 77 Levine, A., 260, 262 Levine, M., 259, 263 Lewin, T., 260, 262 Lewis, R. E., 289, 306 Liberman, A. M., 13, 29

314

AUTHOR INDEX

Lichtenstein, E. H., 92, 103 Lionni, L., 3, 19 Lipsey, M. W., 204, 209 Liu, L., 91, 206 Lockwood, J., 240, 243 Logan, K., 180, 187, 191 Lomax, R. G., 129, 247 Longobardi, E., 70, 71, 77 Lord, F. M., 271, 286 Lowe, G. S., 216n2, 224 Lucas, P. A., 44, 49, 59, 60 Lukatela, G., 56, 59 Lutz, M. F., 99, 203

M

Mackey, M., 161, 163, 270 Mackie, J. L., 85, 86, 87, 203 MacNeil, R., 5, 6, 7, 29 Magliano, J., 83, 95, 96, 96f, 97, 98, 99, 103, 205 Magliano, P. A., 95, 97, 205 Maguire, M. J., 66, 79 Mahar, D., 163, 269 Mandler, J. M., 65, 78, 88, 89, 204, 108, 225 Masataka, N., 73, 78 Mason, R. A., 101, 203 Massaro, D. W., 38, 39, 40, 42, 44, 45, 47, 49, 52, 53, 55, 57, 59, 60, 62 Masterman, L., 152, 158, 270 Matherne, D., 108, 225 Mathews, R. H., 86, 203 Mayer, M., 92, 94, 204 Mazzie, C, 69, 77 McAuley, J., 216n2, 224 McCaffrey, D., 240, 243 McCarthy, S. J., 108, 225 McClelland, J. L., 56, 60, 83, 204 McConnell, C. R., 267, 286 McCrum, R., 5, 6, 29 McCullough, M., 180, 191 McDermott, P. A., 273, 286 McDonald, J. E., 55, 60 McDougall, D., 291, 306 McDougall, I., 291, 296, 306 McGee, L. M., 129, 247 McKenna, M., 108, 225 McKoon, C., 100, 101, 203, 204 McLellan, H., 162, 270 McLuhan, M., 175, 179, 292

McMahon, A., 23, 26, 36 McNaught, C., 184, 292 McSparran, F., 290, 306 Medina, J., 260, 262 Meier, D., 233, 243 Meredith, W., 277, 286 Messaris, P., 152, 270 Messick, S., 162, 270 Meyer, B. J. F., 112, 225 Meyer, M., 65, 78 Mielke, P., 296, 306 Miller, G. A., 181, 182, 292 Miller, R. G., 13, 15, 28 Mintzer, M., 50, 60 Mioduser, D., 175, 179, 181, 183, 184, 185, 188, 189, 292, 292 Mislevy, R. J., 164, 270 Mitcham, C., 183, 292 Mitchell, R., 235, 243 Moats, L. C., 3, 13, 29 Montant, M., 48, 62 Morgan, M., 132, 247 Morikawa, H., 70, 71, 77 Morrison, C. W., 200, 208 Moss, B., 108, 225 Mulaik, S., 274, 286 Mullennix, J., 48, 53, 59, 61 Munger, G. P., 93, 205 Murray, F., 236, 243 Muthen, B., 275, 277, 279, 287, 288 Myers, J. L., 85, 99, 101, 203, 204

N

Nachmias, R., 179, 184, 185, 188, 189, 292 Nagin, D. S., 279, 287 Naigles, L., 70, 71, 79 Nandakumar, R., 66, 78 Nelson, R., 253, 263 Nesdale, A. R., 129, 247 Newport, E. L., 47, 60 Newsome, S. L., 42, 55, 60 Newton, E., 108, 225 Nicholas, D. W., 84, 93, 104, 205, 206 Nickels, M., 93, 205 Nicolls, E., 108, 225 Nist, J., 12, 29 Noel, R. W., 42, 60 Norman, K., 13, 15, 16, 17, 28, 22, 35 Novak, J. D., 183, 292 Nunes, S. R., 22, 35

O O'Brien, E. J., 99, 104 O'Regan, J. K., 49, 59 Odden, A., 259, 263 Ogle, D., 114, 125 Ohanian, S., 233, 243 Ohlsson, S., 182, 192 Okada, H., 65, 78 Olson, D. R., 152, 153, 157, 168, 170, 171, 174, 187, 192 Olson, K. L., 68, 78 Omanson, R. C., 84, 88, 89, 104 Oney, B., 129, 134, 136, 139, 140, 141, 142, 143, 144, 146, 147 Orchard, A. P. McD., 300, 306 Oren, A., 179, 181, 184, 185, 189, 191, 192 Orfield, G., 245, 262 Osin, L., 179, 192 Owston, R. D., 183, 192

P

Paap, K. R., 42, 55, 60 Paine, M., 228, 243 Palincsar, A. S., 184, 190 Papert, S., 177, 181, 183, 192 Papousek, H., 68, 79 Papousek, M., 68, 77, 79 Pappas, C., 107, 109, 125 Parrella, A., 130, 131, 132, 146 Patching, G. R., 42, 59 Pattanayak, D. P., 152, 171 Payton, P., 82, 94, 97, 99, 105 Pea, R., 174, 175, 183, 191, 192 Pearl, J., 287 Pearson, K., 204, 209 Pearson, P. D., 201, 209 Pederson, L., 22, 23, 25, 27, 28, 36 Peereman, R., 48, 54, 55, 60 Pence, K., 72, 79 Perkins, D. N., 182, 192 Perraton, H., 225 Perry, C., 45, 46, 53, 56, 58, 60 Person, D., 114, 125 Pethick, S. J., 70, 77 Plewis, I., 271, 287 Policastro, M., 93, 94, 104 Poster, M., 50, 60 Potter, W. J., 152, 158, 171

Pressley, M., 156, 171 Pressley, M. P., 18, 19 Price, L. A., 290, 307 Price-Wilkin, J., 290, 306 Prochnow, J. E., 131, 146 Pulverman, R., 65, 66, 74, 78, 79 Pye, C., 68, 77 Pylyshyn, Z. W., 182, 192

Q Quellmalz, E., 221, 225 Quetelet, A., 276, 287

R Radvansky, G. A., 99, 103 Ram, J., 184, 192 Rastle, K., 45, 46, 56, 58, 60 Ratcliff, R., 100, 101, 103, 104 Raths, J., 233, 243 Raudenbush, S. W., 277, 287 Read, C., 31, 36 Reed, J. G., 203, 209 Reicher, G. M., 41, 60 Reid, K. A., 155, 162, 170 Reinking, D., 161, 171 Relies, N. N., 290, 307 Rennie, S., 290, 306 Rey, A., 50, 51, 55, 59, 61 Reznick, J. S., 70, 77 Richards, T. L., 153, 169 Richmond-Welty, F. D., 53, 61 Rinck, M., 101, 104 Risden, K., 91, 92, 106 Rivers, J., 238, 239, 243 Rizzella, M. L., 99, 104 Roberts, J., 30, 31, 36 Robinson, R., 108, 125 Rodkin, P. C., 93, 105 Rodriguez-Munoz, M., 109, 124 Rogers, A., 141, 147 Rogosa, D., 271, 272, 273, 276, 277, 278, 287 Rosenthal, T., 204, 209 Rumelhart, D. E., 83, 92, 104 Ryskina, V. L., 69, 78

S Sabatini, J. P., 133, 247 Sachs, J., 65, 79 Saenger, P., 152, 163n11, 272 Saffran, J. R., 47, 60 Salkind, S. J., 66, 72, 79 Salomon, G., 152, 154, 272, 182, 184, 292 Sanders, W., 238, 243 Sanocki, T., 40, 60 Santa Maria, M., 183, 292 Saul, E. U., 96, 202 Sayer, A. G., 277, 288 Scardamalia, M., 184, 189, 292 Schank, R. C., 84, 85, 87, 90, 204 Scharer, P. L., 22, 35 Schilling-Estes, N., 22, 25, 27, 28, 29, 36 Schmalhofer, F., 97, 103 Schoenberg, R., 272, 285, 287 Scholes, R. J., 152, 272 Schuster, B. V., 22, 35 Schvaneveldt, R. W., 55, 60 Schweisguth, M. A., 63, 78 Scott, J. A., 201, 208 Scott-Brown, K. C., 42, 59 Scribner, S., 153, 272 Secco, T., 82, 83, 85, 87, 88, 89, 90, 205 Sedlak, M., 228, 243 Segui, J., 49, 59 Seidenberg, M. S., 44, 46, 55, 60, 61 Seuss, Dr., 4, 29 Shady, M., 72, 79 Shanahan, L., 206, 208 Shanahan, T., 22, 35, 203, 206, 207, 209 Shankweiler, D. P., 13, 29 Sharma, K. K., 272, 287 Sharon, T., 65, 79 Shatz, M., 70, 71, 79 Sherman, R., 134, 247 Sheynin, O. B., 280, 281, 287 Shinjo, M., 85, 99, 204 Shramm, W., 155, 272 Silverblatt, A., 152, 158, 272 Simmons, B., 133, 247 Simon, H. A., 174, 292 Simon, T., 68, 77 Singer, J. D., 266, 287 Skilton-Sylvester, E., 131, 247 Sligh, A., 34, 36 Slobin, D. I., 92, 202 Smiley, S. S., 89, 202

Smith, C. A., 65, 79 Smith, L., 236, 243 Smith, N. B., 201, 209 Snodgrass, J. G., 50, 60 Snow, C. E., 129, 147 Sobel, M. E., 285, 287 Soifer, R., 133, 247 Sokal, M. M., 193, 194, 209 Sootsman, J. L., 66, 74, 79 Soricone, L., 130, 131, 132, 246 Spearman, C., 274, 287 Speirs, N., 291, 306 Sperry, L. L., 89, 90, 99, 205 Stahle, D., 108, 225 Stanovich, K. E., 45, 51, 60, 202, 209 Steffler, D. J., 28, 36 Stein, N. L., 84, 87, 88, 89, 91, 93, 94, 96, 104, 105, 108, 112, 225 Steinberg, J., 258, 263 Stevenson, J., 196, 209 Stigler, S. M., 276, 287 Stiglitz, J., 214, 225 Stolyarova, E. I., 69, 78 Stone, G. O., 47, 50, 52, 53, 54, 55, 60, 62 Strawson, C., 45, 58 Street, B., 132, 247 Studdert-Kennedy, M., 13, 29 Suh, S., 82, 83, 87, 90, 93, 94, 95, 97, 99, 204, 205, 206 Summers, G., 275, 288 Sundberg, U., 69, 78 Switzer, T. J., 245, 263 Symmes, D., 68, 79

T Tabachnick, B. G., 269, 287 Taeschner, T., 68, 77 Talmy, L., 66, 79 Tanenhaus, M. K., 46, 55, 60 Tapiero, I., 97, 204 Tardif, T., 70, 71, 79 Tayeb, S., 56, 58 Taylor, D., 202, 209 Taylor, G. A., 44, 49, 60 Taylor, I., 152, 168, 272 Thal, D. J., 70, 77 Thomas, S. M., 42, 59 Thompson, M. C., 42, 62 Thompson, P., 291, 306

Thorndike, E. L., 89, 105, 248, 263 Thorndyke, P. W., 92, 205 Tibbetts, J., 134, 247 Tierney, R. J., 161, 162, 271 Tiller, T., 184, 190 Tinker, M. A., 196, 199, 209 Tirozzi, G. N., 254, 262 Tisak, J., 277, 286 Tomasello, M., 74, 79 Toulmin, S., 171 Trabasso, T., 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 96f, 97, 98, 99, 100, 101, 103, 104, 105, 106 Trainin, G., 13, 15, 18 Trainor, L. J., 67, 79 Treiman, R., 31, 36, 48, 53, 59, 61 Tubin, D., 188, 192 Tuma, N. B., 284, 287 Tunmer, W. E., 129, 131, 146, 147 Turvey, M. T., 56, 59 Tversky, A., 99, 103 Tynan, K., 150, 152, 153, 158, 171 Tzeng, Y., 91, 92, 106

U

Updike, J., 41, 61

V Vachek, J., 23, 36 van de Pol, F., 282, 287, 288 van den Broek, P., 82, 83, 85, 87, 88, 89, 90, 91, 92, 93, 105, 106 van der Kamp, L. J. T., 281, 282n2, 285 van Dijk, T. A., 85, 103 Van Orden, G. C., 47, 52, 53, 54, 55, 60 Vanhoy, M., 47, 52, 53, 54, 55, 60 Vargas, S., 206, 208 Varma, S., 97, 102 Varnhagen, C. K., 28, 36, 155, 169 Vasquez, O. A., 129, 147 Venezky, K., 2, 19 Venezky, R. L., 1, 2, 5, 8, 19, 23, 26, 36, 44, 45, 46, 49, 52, 53, 57, 58, 60, 61, 107, 108, 225, 133, 135, 247, 152, 159, 159n6, 161, 163n11, 167, 168, 272, 272, 179, 181, 188, 292, 202, 209, 247, 263, 289, 290, 291, 307

Verdejo, M., 184, 192 Verhoeven, L., 132, 246 Vernon, M. D., 196, 209 Vygotsky, L., 184, 192

W Wagner, D. A., 132, 133, 135, 147, 211, 211n1, 213, 216, 218, 225 Wald, J., 245, 246, 262 Wallace, M. J. Jr., 259, 263 Walpole, S., 283, 283n3, 283n4, 286 Walsh, J., 255, 262 Walsh, M., 258, 263 Warburton, Y., 290, 307 Ware, W. H., 155, 162, 170 Warren, R., 155, 162, 170 Warren, W. H., 84, 93, 106 Warschauer, M., 152, 172 Washington, J. A., 33, 36 Wasserman, D., 99, 100, 106 Waters, G. S., 44, 46, 60, 61 Watson, J. D., 73, 78 Watt, I., 152, 153, 169 Weber, R.-M., 24, 36 Weidler, D., 134, 147 Welsch, D., 97, 103 Wenger, E., 184, 192 Whatmough, C., 55, 61 Wheaton, B., 275, 288 Wideman, H., 183, 192 Wiggins, L. M., 282, 288 Wiley, D. E., 275, 288 Wiley, J., 84, 99, 100, 206 Wilkinson, I. A. G., 201, 208 Willett, J., 266, 272, 273, 277, 287, 288 Williams, R. H., 272, 288 Willows, D. M., 22, 35 Wilson, D. B., 204, 209 Wilson, K., 13, 15, 28 Wilson, L., 254, 263 Wilson, M. R., 164, 270 Wilson, P. T., 140, 247 Winerip, M., 260, 263 Winfield, L. P., 247, 263 Winn, M. B., 72, 79 Wise, A., 259, 263 Wolfram, W., 22, 25, 27, 28, 29, 36 Woodruff, D., 134, 247 Woodworth, R. S., 40, 61, 194, 209

Wright, S., 238, 243, 274, 288 Wugalter, S. E., 281, 282, 283, 285 Wulff, S., 152, 272 Wyatt, T. A., 31, 36 Wynn, K., 65, 79

X Xu, F., 71, 79

Y Yaghoub-Zadeh, Z., 22, 35 Yasa, M., 134, 136, 146 Yoakam, G. A., 197, 209 Young, D. L., 133, 147 Young, M., 132, 147 Yurecko, M., 206, 208

Z Zadeh, L. A., 38, 61 Ziegler, J., 45, 46, 48, 50, 51, 53, 55, 56, 58, 59, 60, 61 Zimmerman, D. W., 272, 288 Zimny, S., 97, 103 Zimowski, M., 272, 277, 278, 287 Zimpher, N. L., 227, 243 Zorzi, M., 45, 61 Zukowski, A., 54, 61 Zutell, J., 33, 36

Subject Index

A Accreditation, 228, 240-242 Acoustic packaging, 75 ACT (American College Testing), 236 Action verb, 72-73 Ad Herennium, 178 Adult literacy illiteracy statistics, 216 literacy program (See Functional Adult Literacy Program) multidimensional model, 128f-134 affective component, 130-131 cognitive-linguistic component, 129-130 instructional context, 133-134 social context, 132-133 previous international effort, 212-213 program dropout rate, 132 Adult literacy /education for poor feasibility of using technology, 215-220 guiding principles, 221-224 Internet access in US, 219f literacy / economic / technology relation, 213-215 policy implications, 220-221 previous international effort, 212-213 regional variation in literacy, 216-217f

African American Vernacular English (AAVE), 28-29, 32, 33-34 After-the-fact reasoning, 100 Alienation, stability of, 275 American College Testing (ACT) exam, 236 The American Way of Spelling (Venezky), 21 Anaphoric /spatial referent, 101 Anglo-Saxon source in English consonant spelling/blend, 11-12 CVC pattern, 8-9, 10-12 grapheme-phoneme correspondence, 10-12, 11t long vowel, 8-9 vowel marking system, 14-15, 16 ARIMA model, 269-270 Artifactual extension metaphor, 180 Auditory word recognition, 55

B Baird, Brad, 257-258 Baker, Peter, 292 Basal reading series, 3, 107, 108, 200-201

Becoming a Nation of Readers (Anderson, Hiebert, Scott, & Wilkinson), 201, 202-203, 206 Beginning reading texts, See Information text, theory /practice Bigram/trigram frequencies, 3, 49 Blend, 11-12 Box-Jenkins approach, 269-270 Brahe, Tycho, 267 Bridges to the Future Initiative (BFI), 220 Bush, George W., 206-207, 257-258

C Cameron, Angus, 290 Campbell Collaboration, 205 Cattell, James McKeen, 193-194, 195 Causal field, 86 Causal inference, 82, 84-85 Causal reasoning research study causality /cohesion /coherence in, 84 empirical validation of approach anaphoric /spatial referent, 101 causal connection in narration, 92-94 causal connectivity/chain/recall, 87-89 coherence /story structure, 94 connectionist model, 97-99 decision making, 99-100 goal hierarchies /connectivity, 92 importance of clause in text, 89-90 minimalist position, 100-101 online comprehension, 95-97, 96f psychological validity, 90-91 question asking, 91-92 identifying causal inference, 84-85 overview of approach to, 81-84 validating discourse analysis/causal network, 85-87 Chunking, 8, 10, 16 Cochrane Collaboration, 205 Cognitive apprenticeship, 184 Cognitive technology advanced knowledge technology in education, 187-190 balancing theory/reality in, 189-190 to expand action space for cognition, 188-189 for integrating various approaches, 188 approach to cognition /technology relation, 176-187 activation /consolidation, 180

acquisition, 177-179 collaborative creation, 184-185 compensation, 185-186f construction, 183-184 evolution, 186-187 extension of natural capability, 179-180 externalization, 181-182 internalization, 182-183 summary of, 178f Cognitive technology, definition of, 175 Conceptual transformation, 174 Connectionist theory, 83-84, 97-99 Connection strength index, 98-99 Consonant blend, 12, 14 Consonant-cluster at end of word, 26 Consonant-cluster onset, 48 Consonant cluster reduction, 29, 34 Consonant digraph, 11 Consonant-vowel-consonant (CVC), 4, 6, 8-9, 10, 11-12, 13-14, 16 Constructivism / constructionism, 183 Contraction, 30 Copeland, Morris Albert, 268 Council for Basic Education (CBE), 229 Covert prestige dialect, 29 Cowles Commission for Research in Economics, 274

D Dakar Framework for Action, 212-213 Dale-Chall formula, 200 Decoding, 3-4, 129, 139-140 De Lagrange, Louis, 268 Dentalization of consonant, 26-27 Dewey, John, 84 DIBELS (Dynamic Indicators of Basic Early Literacy), 9 Dictionary of Old English (DOE) project example, A to F on CD-Rom, 291-292 field search, 298f-304, 299f-300f, 301f, 302f, 303f future issue in, 304-305 getting started, 294-295f homepage, 295f hotlink, 296f-298, 297f information field, 293 introduction to, 289-291

microfiche publication, 304 standards, 293-294 system requirements, 294 transition to electronic dictionary, 292 Digital divide, 215 Digraph, 14, 15 Discourse analysis, 82, 83, 85-87, 89-90 Discourse comprehension, 92 Division of labor metaphor, 180 Domain analysis, 165 Domain modeling, 165 Dropout rate, adult literacy program, 132 Dual route-cascaded (DRC) model, 56 Dual-route model, in written word recognition, 45 Dynamic Indicators of Basic Early Literacy (DIBELS), 9

E Econometrics, 274 Educational Testing Service, 109 Elite English, 24 Encoding, 94 English language, history of, 5-7 English orthography, 2-3 grapheme-phoneme correspondence, 10-12, 11t morphophonemic principle, 5, 7-8 overview of, 1-2 reading program, 9-17 traditional, 9-13 WordWork, 13-17 spelling and, 3-4 ERIC (Educational Resource Information Clearinghouse), 203 Evidenced-centered design (ECD), 164-167 Exosomatic evolution, 187 Expert system, technological, 179

F Factor analysis, 274, 275 FALP, See Functional Adult Literacy Program First-Grade Reading Studies, 9 First/last letter, in word recognition, 42-43 Fleming, Damian, 292

FLMP (fuzzy logical model of perception), 30f, 37-40, 38f, 55-56 Fourier, Joseph, 268 Fragmentation task, 50-51 Functional Adult Literacy Program (FALP), 127 affective /social /cultural factors in, 144-145 conclusion, 145 participant, 134-135 program evaluation, 143-144 program philosophy, 136 teacher /teacher training, 135-136 testing, 142 typical lesson component additional reading comprehension exercise, 141 application exercise, 140-141 calendar date activity, 137-138 discussing picture at unit beginning, 138-139 homework review, 137 journal keeping, 141-142 listening to passage depicted by picture, 139 mathematics, 141 reading newspaper article, 138 reading passage /answering comprehension question, 140 sound-letter-syllable-word-sentence activity, 139-140 Functional awareness, 130 Fuzzy logical model of perception (FLMP), 30f, 37-40, 38f, 55-56

G GAO (goal-attempt-outcome), 93 General American English, 21-23 phonological system distributional, 26 realizational, 26-27 systemic, 26 phonological variation in, 24-25 differences between varieties, 25-27 dimension of child phonological system, 30-32 classroom variation, 32-34 regional variation, 27-28 social dialect, 28-29 style and vernacular, 29-30

General structural equation model, 277 Generative phonology, 23 Getz, Rob, 292 Gilbart, James W., 268 Glue letter, 13-14 Goal-attempt-outcome (GAO), 93 Goal hierarchies /connectivity, 92 Grameen Bank, 222, 223 Grapheme, 10 Grapheme-phoneme pattern, 8, 10-12, 11t, 14, 45 Graunt, John, 267 Gray Oral, 196

H Handy word, 16 Hayes, Demorah, 292 Helping poor, See Adult literacy /education for poor Higher Education Act, 229 High-stakes testing, See School accountability Hindsight bias, 100 Holistic word recognition, 41 Homophone, 26 Homophonic prime, 56 Huey, Edmund Burke, 207-208

I

Individual growth curve modeling, 276-280 Infant-directed (ID) speech, 67-68 Infant-directed speech /-action, See Verb learning, infant-directed speech /-action influence on Information and communication technologies (ICT), definition of, 216n2 Information society, 214, 215 Information text, theory/practice future of, 124 observation study on information text, 118-123 characteristic of school in, 119t conclusion /implications, 123-124 findings, 120-123 overview of, 118-120 observational /intervention study background of, 113 conclusion of, 117-118 instructional procedure, 113-115 results/ discussion, 115-117 difficulty of material, 116 source of information, 116-117 text genre results, 115-116 research on text used in instruction, 108-109 story/information text difference purpose /instrumentality, 112-113 structure, 112 truth value, 110-111 unity, 111-112 text genre importance, 110 International Action Plan, 213-214 Invented spelling, 31 Iowa Test of Basic Skills, 246-247, 255

J

Jacobs, Leslie, 258 JEH (just-enough-help), 222 Jevons, William Stanley, 268 JIT (just-in-time), 222 Juglar, Clement, 268

K

Kepler, Johannes, 267 King, Gregory, 267 Klein, Joel I., 258 Knowledge economy, definition of, 215 Knowledge technology, definition of, 175 Knowmagine Virtual Park for Science Technology and Culture, 184 K-W-L procedure, 114

L Lambert, Johann Heinrich, 268 Language acquisition, 2, 68-70 Latent class analysis, 279 Latent class growth analysis (LCGA), 279 Latent class measurement, 282 Latent transition analysis, 280-284 Lateral masking, 42

LEAP (Louisiana Educational Assessment Program), 246-247, 252-254, 255-257 Learnet (virtual learning community), 185 Learning to Read: The Great Debate (Chall), 200-201, 206 Lexical consistency, 46, 47-48 Literacy acquisition, 2-3 Long-short variation, 8-9 Long-term memory, 82

M Marker concept, 8-9, 14-15, 16 Markov, A. A., 280-282 Markov chains, 281 Markov models, 281-282 Markov process, 281, 282 Markov stochastic process models, 281, 282 MARTP (Mid-Atlantic Regional Teacher Project), 229 Masked priming, 56 Media constructs, 149-151 cross-media effect on development, 151-155 film /television production, 155 neuroscience evidence, 153-154 print media, 152 evidenced-centered design, 164-167 future of print text, 167-168 reading /media literacy assessment /literacy, 156-161 international assessment program, 156-157t media education, 156-157t printed text /multiple /multimedia form, 158-161 validity construct need, 161-164 Medium hypothesis, 153 Memory, 82, 88, 95-97, 96f, 98, 153 Menard, Pierre, 149-151 Mental skills hypothesis, 153 Meritorious New Teacher designation, 229 Metalinguistic awareness, 129-130 Metalinguistics hypothesis, 153 Metaphonic concept, 10 Mid-Atlantic Regional Teacher Project (MARTP), 229 Mielke, Peter, 292 Mind-made world, 175

Modality hypothesis, 153 Multidimensional model of literacy acquisition, See Adult literacy, multidimensional model Multiple-choice test, 9

N Nasal, 14 National Assessment of Educational Progress (NAEP), 156, 236 2003 reading framework, 157t National Board for Professional Teaching Standards (NBPTS), 228 National Council for the Accreditation of Teacher Education (NCATE), 259 National Council of Teachers of English (NCTE), 160nn7-8, 259 National Early Literacy Panel (NELP), 207 National Leadership Conference on Media Literacy, 156-157 National Literacy Panel for Language Minority Children and Youth (NLP), 207 National Reading Panel (NRP), 9, 156, 205-206 National Research Council, 202 National Technology Laboratory for Literacy and Adult Education (Tech21), 219-220 A Nation at Risk (National Commission on Excellence in Education), 236, 246 Network English, 24 No Child Left Behind (NCLB), 229-230, 255, 257-258 Nonword, 52

O Offline visualization, 48-49 Olympic games motto metaphor, 180 Online comprehension, 95-97, 96f Onset-rime technique, 12-13 Oral reading, 16 Orthographic redundancy, 44 Orthographic regularity, 44-45 Orthographic structure, 39-40, 44-45 rule-governed regularity, 44, 45 statistical redundancy, 44

P

Panel data analysis, 270-273 Pataki, George, 258 Path analysis, 274 Persons, Warren, 268-269 Petty, William, 267 Phoneme awareness, 13 Phonics, 2, 9, 10, 17-18, 22, 201-202 Phonological awareness, 130 Phonological variation /spelling, in General American English, See General American English Physique Sociale (Quetelet), 276-277 Plato, on writing, 176-177 Playfair, William, 267-268 Plosive, 14 Postvocalic naming task, 49 Poynting, John Henry, 268 PRAXIS I, 231-232 Prestige dialect, 29 Preventing Reading Difficulties in Young Children (Snow, Burns, & Griffin), 202-203 Primary vowel, marking system for, 14-15 Priming, 56, 95, 101 Princeton Review, 258-259 Pro-drop language, 70-71 Program for International Student Assessment (PISA), 156 Programmed instruction model, 178-179 Progress in International Reading Literacy Study (PIRLS), 156

Q Quantitative methodology, for studying change equilibrium problem in, 284-285 introduction to, 265-266 methodology, 266-284 latent transition analysis, 280-284 basic idea in, 281-283 example of, 283-284 modeling changes in various states, 280-281 modeling individual growth curve, 276-280 applying conventional growth curve modeling, 277-278

basic idea in conventional growth curve modeling, 277 basic idea in growth mixture models, 279 early attempt at modeling growth curves, 276-277 example, of growth mixture modeling, 280 growth mixture modeling, 278-279 panel data analysis, 270-273 criticism/defense of, 271-273 description of, 271 example of, 273 structural equation modeling, 273-275 applying to longitudinal data, 275 basic idea in, 275 historical review of, 274 time series analysis, 267-270 brief history of, 267-269 example of, 270 feature of, 269-270 Quetelet, Adolphe, 276-277

R Reading fuzzy logical model of perception, 30f, 37-40, 38f decision-making in, 39-40 feature evaluation in, 38 feature integration in, 39 Stroop color word test, 37 written word recognition influence in, 40-44 orthographic structure influence in, 44-45 sound-to-spelling influence, 47-56 consistency effect, 48-53, 50t fluency/modeling, 52-56 methodological issues, 48 spelling-to-sound influence, 45-47 dual-route model, 45 lexical consistency, 46 spelling-to-sound fluency, 46-47 Reading, history of research synthesis in beginning of reading synthesis, 194-195 compendium of select studies, 194-195 compendium of wide-ranging body of studies, 194-196

change in reading research, 203-207 electronic-based search tools, 203 meta-analysis, 204-205 national reading panel, 205-207 publication outlet, 204 conclusion about, 207-208 other qualitative review, 198-203 eye movement, 202, 206 phonics instruction, 201-202 readability research /phonics review, 200-201 reading difficulty in young children, 202-203 visual apprehension /perception, 199-200 synthetic summary of investigations, 196-198, 203 Reading program, 9-17 traditional, 9-13 WordWork, 13-17 Reading wars, 202, 204 Ready to Teach Act, 229, 230, 231 Redbud. See School accountability, study school (Redbud) Regional dialect, 22, 27-28 Regression effects, 272 Research synthesis in reading, See Reading, history of research synthesis in Rime, 12 Roberts, Chris, 253 Romance word, 16-17 Running Records, 9, 115

S Scholastic Aptitude Test (SAT), 236 School accountability annual ranking of state system, 258-259 blaming teacher /administrator for low performance, 259-260 flaw in accountability movement, 260-261 No Child Left Behind and, 257-258 obliviousness to poverty /low test score correlation, 251-252 socioeconomic status effect on test result, 247, 257, 258 study school (Redbud) high-stakes testing at administration /aftermath of, 255-257

anxiety issue, 252-253 security issue, 253-254 lesson learned from, 261 material assets of study school, 248 overview of student /family of study school, 247-249 staff of study school, 248 student homelife, 249-251 student positive trait, 250 teacher at, 259 test-prep pep rally, 253 using "shame" to raise test score, 254-255 Scope-and-sequence chart, 15 Seasonality, 269 Secondary vowel pattern, 15 Semivowel, 14, 15 Simple-match procedure, 55 Simultaneous equation modeling, 274 Situated cognition, 183-184 SOAs (stimulus onset asynchronies), 56 Social dialect, 28-29 Social justice, 245-246 Spatial /anaphoric referent, 101 Spelling, 3-4 invented, 31 written word recognition and sound-to-spelling influences, 47-56 spelling-to-sound correspondences, 44-45 spelling-to-sound fluency, 46-47 spelling-to-sound influences, 45-47 Standard American English, 24, See also General American English Stimulus onset asynchronies (SOAs), 56 Stress pattern, 16 Stroop color word test, 37 Structural equation modeling, 273-275 The Structure of the English Language (Venezky), 5, 8, 9, 13 Subjective familiarity measure, 48, 51 Summary of Investigations Relating to Reading (Gray), 196-197, 198 Sundaram, Mark, 292 Supraletter features, 41-42 Syntactic awareness, 130

T Teacher education, accountability in program accreditation, 228, 240-241

Teacher education (cont.) quality assurance in teaching profession, 228-229 initial policy remedy for, 229-232 reporting licensing pass/fail scores, 230 validity of passing rates, 230-232 teacher education improvement in validity measures, 235-238 validity of competence measure, 233-235 teacher effect on student achievement, 238t-240 teaching license requirement, 228-229 vs. other professional program, 228-229, 241-242 Teacher Education Accreditation Council (TEAC), 230n3 Teacher Effectiveness through Compensation, 259-260 Technology, definition of, 175 Third International Mathematics and Science Study (TIMSS), 232 Time series analysis, 267-270 Title II of Higher Education Act, 229, 230, 231 Truth value, in story vs. information text, 110-111 Turkey. See Functional Adult Literacy Program

U United Nations Development Program, 220 United Nations Industrial Development Organization (UNIDO), 215 United Nations (UN) Literacy Decade, 212, 213

V Venezky, Karen, 2 Venezky, Richard L. biography of, vii-ix on future of literary textbook, 167-168 magic of reading and, 37

morphophonemic concept of, 5, 7-9, 10, 23 orthographic study of, 44 scholarly dictionary and, 289, 290 on spelling/sound in General American English, 21 on text in basal reader, 107 Venezky-Winfield hypothesis, 261 Verb learning, infant-directed speech /-action influence on action analysis, 74 combining action /speech, 75-76 difficulty in learning verbs, 65-67 infant-directed action, 73-74 infant-directed speech, 67-68 as facilitating language acquisition, 68-70 noun favoritism in, 70-73 tasks in learning verbs, 65 vignette about, 63-64 Verb spurt, 71 Virtual learning community (Learnet), 185 Vision-impaired/blind people, 185, 186f Vowel long vowel, 8-9 semivowel, 14, 15 short, 13-14 vowel digraph, 3 vowel marking system, 14-15, 16

W Weintraub, Sam, 197 Where the Wild Things Are (Sendak), 159 Word advantage effect, 41 Word frequency, 44-45 Word origin awareness, 17 Word shape, 41-42, 43 Word superiority effect, 41, 44 WordWork I, 13, 15-16 WordWork II, 16-17 World Bank, 215 World Education Forum on Education for All (EFA), 212, 213 Writing, invention of, 176-177, 187

Z Zero-order fluency, 47