Molecular Biology of the Cell, 5th edition



Pages 1725 Page size 420 x 549 pts Year 2008


Molecular Biology of the Cell
Fifth Edition

Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter

With problems by John Wilson and Tim Hunt

Garland Science, Taylor & Francis Group

Garland Science
Vice President: Denise Schanck
Assistant Editor: Sigrid Masson
Production Editor and Layout: Emma Jeffcock
Senior Publisher: Jackie Harbor
Illustrator: Nigel Orme
Designer: Matthew McClements, Blink Studio, Ltd.
Editors: Marjorie Anderson and Sherry Granum
Copy Editor: Bruce Goatly
Indexer: Merrall-Ross International, Ltd.
Permissions Coordinator: Mary Dispenza

Cell Biology Interactive
Artistic and Scientific Direction: Peter Walter
Narrated by: Julie Theriot
Production Design and Development: Michael Morales

© 2008, 2002 by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter.
© 1983, 1989, 1994 by Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson.

Bruce Alberts received his Ph.D. from Harvard University and is Professor of Biochemistry and Biophysics at the University of California, San Francisco. For 12 years, he served as President of the U.S. National Academy of Sciences (1993-2005).

Alexander Johnson received his Ph.D. from Harvard University and is Professor of Microbiology and Immunology and Director of the Biochemistry, Cell Biology, Genetics, and Developmental Biology Graduate Program at the University of California, San Francisco.

Julian Lewis received his D.Phil. from the University of Oxford and is a Principal Scientist at the London Research Institute of Cancer Research UK.

Martin Raff received his M.D. from McGill University and is at the Medical Research Council Laboratory for Molecular Cell Biology and the Biology Department at University College London.

Keith Roberts received his Ph.D. from the University of Cambridge and is Emeritus Fellow at the John Innes Centre, Norwich.

Peter Walter received his Ph.D. from The Rockefeller University in New York and is Professor and Chairman of the Department of Biochemistry and Biophysics at the University of California, San Francisco, and an Investigator of the Howard Hughes Medical Institute.

All rights reserved. No part of this book covered by the copyright hereon may be reproduced or used in any format in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, or information storage and retrieval systems) without permission of the publisher.

Library of Congress Cataloging-in-Publication Data
Molecular biology of the cell / Bruce Alberts ... [et al.]. -- 5th ed.
ISBN 978-0-8153-4105-5 (hardcover) -- ISBN 978-0-8153-4106-2 (paperback)
1. Cytology. 2. Molecular biology. I. Alberts, Bruce.
QH581.2.M64 2008
571.6--dc22
2007005475 CIP

Published by Garland Science, Taylor & Francis Group, LLC, an informa business, 270 Madison Avenue, New York, NY 10016, USA, and 2 Park Square, Milton Park, Abingdon, OX14 4RN, UK.

Printed in the United States of America
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

Preface

In many respects, we understand the structure of the universe better than the workings of living cells. Scientists can calculate the age of the Sun and predict when it will cease to shine, but we cannot explain how it is that a human being may live for eighty years but a mouse for only two. We know the complete genome sequences of these and many other species, but we still cannot predict how a cell will behave if we mutate a previously unstudied gene. Stars may be 10^43 times bigger, but cells are more complex, more intricately structured, and more astonishing products of the laws of physics and chemistry. Through heredity and natural selection, operating from the beginnings of life on Earth to the present day (that is, for about 20% of the age of the universe), living cells have been progressively refining and extending their molecular machinery and recording the results of their experiments in the genetic instructions they pass on to their progeny. With each edition of this book, we marvel at the new information that cell biologists have gathered in just a few years. But we are even more amazed and daunted at the sophistication of the mechanisms that we encounter. The deeper we probe into the cell, the more we realize how much remains to be understood.

In the days of our innocence, working on the first edition, we hailed the identification of a single protein (a signal receptor, say) as a great step forward. Now we appreciate that each protein is generally part of a complex with many others, working together as a system, regulating one another's activities in subtle ways, and held in specific positions by binding to scaffold proteins that give the chemical factory a definite spatial structure. Genome sequencing has given us virtually complete molecular parts-lists for many different organisms; genetics and biochemistry have told us a great deal about what those parts are capable of individually and which ones interact with which others; but we have only the most primitive grasp of the dynamics of these biochemical systems, with all their interlocking control loops. Therefore, although there are great achievements to report, cell biologists face even greater challenges for the future.

In this edition, we have included new material on many topics, ranging from epigenetics, histone modifications, small RNAs, and comparative genomics, to genetic noise, cytoskeletal dynamics, cell-cycle control, apoptosis, stem cells, and novel cancer therapies. As in previous editions, we have tried above all to give readers a conceptual framework for the mass of information that we now have about cells. This means going beyond the recitation of facts. The goal is to learn how to put the facts to use: to reason, to predict, and to control the behavior of living systems.

To help readers on the way to an active understanding, we have for the first time incorporated end-of-chapter problems, written by John Wilson and Tim Hunt. These emphasize a quantitative approach and the art of reasoning from experiments. A companion volume, Molecular Biology of the Cell, Fifth Edition: The Problems Book (ISBN 978-0-8153-4110-9), by the same authors, gives complete answers to these problems and also contains more than 1700 additional problems and solutions.

A further major adjunct to the main book is the attached Media DVD-ROM disc. This provides hundreds of movies and animations, including many that are new in this edition, showing cells and cellular processes in action and bringing the text to life; the disc also now includes all the figures and tables from the main book, pre-loaded into PowerPoint® presentations. Other ancillaries available for the book include a bank of test questions and lecture outlines, available to qualified instructors, and a set of 200 full-color overhead transparencies.

Perhaps the biggest change is in the physical structure of the book. In an effort to make the standard Student Edition somewhat more portable, we are providing Chapters 21-25, covering multicellular systems, in electronic (PDF) form on the accompanying disc, while retaining in the printed volume Chapters 1-20, covering the core of the usual cell biology curriculum. But we should emphasize that the final chapters have been revised and updated as thoroughly as the rest of the book and we sincerely hope that they will be read! A Reference Edition (ISBN 978-0-8153-4111-6), containing the full set of chapters as printed pages, is also available for those who prefer it.

Full details of the conventions adopted in the book are given in the Note to the Reader that follows this Preface. As explained there, we have taken a drastic approach in confronting the different rules for the writing of gene names in different species: throughout this book, we use the same style, regardless of species, and often in defiance of the usual species-specific conventions.

As always, we are indebted to many people. Full acknowledgments for scientific help are given separately, but we must here single out some exceptionally important contributions: Julie Theriot is almost entirely responsible for Chapters 16 (Cytoskeleton) and 24 (Pathogens, Infection, and Innate Immunity), and David Morgan likewise for Chapter 17 (Cell Cycle). Wallace Marshall and Laura Attardi provided substantial help with Chapters 8 and 20, respectively, as did Maynard Olson for the genomics section of Chapter 4, Xiaodong Wang for Chapter 18, and Nicholas Harberd for the plant section of Chapter 15.

We also owe a huge debt to the staff of Garland Science and others who helped convert writers' efforts into a polished final product. Denise Schanck directed the whole enterprise and shepherded the wayward authors along the road with wisdom, skill, and kindness. Nigel Orme put the artwork into its final form and supervised the visual aspects of the book, including the back cover, with his usual flair. Matthew McClements designed the book and its front cover. Emma Jeffcock laid out its pages with extraordinary speed and unflappable efficiency, dealing impeccably with innumerable corrections. Michael Morales managed the transformation of a mass of animations, video clips, and other materials into a user-friendly DVD-ROM. Eleanor Lawrence and Sherry Granum updated and enlarged the glossary. Jackie Harbor and Sigrid Masson kept us organized. Adam Sendroff kept us aware of our readers and their needs and reactions. Marjorie Anderson, Bruce Goatly, and Sherry Granum combed the text for obscurities, infelicities, and errors. We thank them all, not only for their professional skill and dedication and for efficiency far surpassing our own, but also for their unfailing helpfulness and friendship: they have made it a pleasure to work on the book. Lastly, and with no less gratitude, we thank our spouses, families, friends, and colleagues. Without their patient, enduring support, we could not have produced any of the editions of this book.

Contents

Special Features
Detailed Contents
Acknowledgments
A Note to the Reader

PART I INTRODUCTION TO THE CELL
1. Cells and Genomes
2. Cell Chemistry and Biosynthesis
3. Proteins

PART II BASIC GENETIC MECHANISMS
4. DNA, Chromosomes, and Genomes
5. DNA Replication, Repair, and Recombination
6. How Cells Read the Genome: From DNA to Protein
7. Control of Gene Expression

PART III METHODS
8. Manipulating Proteins, DNA, and RNA
9. Visualizing Cells

PART IV INTERNAL ORGANIZATION OF THE CELL
10. Membrane Structure
11. Membrane Transport of Small Molecules and the Electrical Properties of Membranes
12. Intracellular Compartments and Protein Sorting
13. Intracellular Vesicular Traffic
14. Energy Conversion: Mitochondria and Chloroplasts
15. Mechanisms of Cell Communication
16. The Cytoskeleton
17. The Cell Cycle
18. Apoptosis

PART V CELLS IN THEIR SOCIAL CONTEXT
19. Cell Junctions, Cell Adhesion, and the Extracellular Matrix
20. Cancer

Chapters 21-25 available on Media DVD-ROM
21. Sexual Reproduction: Meiosis, Germ Cells, and Fertilization
22. Development of Multicellular Organisms
23. Specialized Tissues, Stem Cells, and Tissue Renewal
24. Pathogens, Infection, and Innate Immunity
25. The Adaptive Immune System

Glossary
Index
Tables: The Genetic Code, Amino Acids

Special Features

Table 1-1 Some Genomes That Have Been Completely Sequenced
Table 1-2 The Numbers of Gene Families, Classified by Function, That Are Common to All Three Domains of the Living World
Table 2-1 Covalent and Noncovalent Chemical Bonds
Table 2-2 The Types of Molecules That Form a Bacterial Cell
Table 2-3 Approximate Chemical Compositions of a Typical Bacterium and a Typical Mammalian Cell
Table 2-4 Relationship Between the Standard Free-Energy Change, ΔG°, and the Equilibrium Constant
Panel 2-1 Chemical Bonds and Groups Commonly Encountered in Biological Molecules
Panel 2-2 Water and Its Influence on the Behavior of Biological Molecules
Panel 2-3 The Principal Types of Weak Noncovalent Bonds that Hold Macromolecules Together
Panel 2-4 An Outline of Some of the Types of Sugars Commonly Found in Cells
Panel 2-5 Fatty Acids and Other Lipids
Panel 2-6 A Survey of the Nucleotides
Panel 2-7 Free Energy and Biological Reactions
Panel 2-8 Details of the 10 Steps of Glycolysis
Panel 2-9 The Complete Citric Acid Cycle
Panel 3-1 The 20 Amino Acids Found in Proteins
Panel 3-2 Four Different Ways of Depicting a Small Protein, the SH2 Domain
Table 3-1 Some Common Types of Enzymes
Panel 3-3 Some of the Methods Used to Study Enzymes
Table 4-1 Some Vital Statistics for the Human Genome
Table 5-3 Three Major Classes of Transposable Elements
Table 6-1 Principal Types of RNAs Produced in Cells
Panel 8-1 Review of Classical Genetics
Table 10-1 Approximate Lipid Compositions of Different Cell Membranes
Table 11-1 A Comparison of Ion Concentrations Inside and Outside a Typical Mammalian Cell
Panel 11-2 The Derivation of the Nernst Equation
Panel 11-3 Some Classical Experiments on the Squid Giant Axon
Table 12-1 Relative Volumes Occupied by the Major Intracellular Compartments in a Liver Cell (Hepatocyte)
Table 12-2 Relative Amounts of Membrane Types in Two Kinds of Eucaryotic Cells
Table 14-1 Product Yields from the Oxidation of Sugars and Fats
Panel 14-1 Redox Potentials
Table 15-5 The Ras Superfamily of Monomeric GTPases
Panel 16-2 The Polymerization of Actin and Tubulin
Panel 16-3 Accessory Proteins that Control the Assembly and Position of Cytoskeletal Filaments
Table 17-2 Summary of the Major Cell-Cycle Regulatory Proteins
Panel 17-1 The Principal Stages of M Phase (Mitosis and Cytokinesis) in an Animal Cell

Detailed Contents

Chapter 1 Cells and Genomes

All Cells Store Their Hereditary Information in the Same Linear Chemical Code (DNA)
All Cells Replicate Their Hereditary Information by Templated Polymerization
All Cells Transcribe Portions of Their Hereditary Information into the Same Intermediary Form (RNA)
All Cells Use Proteins as Catalysts
All Cells Translate RNA into Protein in the Same Way
The Fragment of Genetic Information Corresponding to One Protein Is One Gene
Life Requires Free Energy
All Cells Function as Biochemical Factories Dealing with the Same Basic Molecular Building Blocks
All Cells Are Enclosed in a Plasma Membrane Across Which Nutrients and Waste Materials Must Pass
A Living Cell Can Exist with Fewer Than 500 Genes
Summary

THE DIVERSITY OF GENOMES AND THE TREE OF LIFE
Cells Can Be Powered by a Variety of Free Energy Sources
Some Cells Fix Nitrogen and Carbon Dioxide for Others
The Greatest Biochemical Diversity Exists Among Procaryotic Cells
The Tree of Life Has Three Primary Branches: Bacteria, Archaea, and Eucaryotes
Some Genes Evolve Rapidly; Others Are Highly Conserved
Most Bacteria and Archaea Have 1000-6000 Genes
New Genes Are Generated from Preexisting Genes
Gene Duplications Give Rise to Families of Related Genes Within a Single Cell
Genes Can Be Transferred Between Organisms, Both in the Laboratory and in Nature
Sex Results in Horizontal Exchanges of Genetic Information Within a Species
The Function of a Gene Can Often Be Deduced from Its Sequence
More Than 200 Gene Families Are Common to All Three Primary Branches of the Tree of Life
Mutations Reveal the Functions of Genes
Molecular Biologists Have Focused a Spotlight on E. coli
Summary

GENETIC INFORMATION IN EUCARYOTES
Eucaryotic Cells May Have Originated as Predators
Modern Eucaryotic Cells Evolved from a Symbiosis
Eucaryotes Have Hybrid Genomes
Eucaryotic Genomes Are Big
Eucaryotic Genomes Are Rich in Regulatory DNA
The Genome Defines the Program of Multicellular Development
Many Eucaryotes Live as Solitary Cells: the Protists
A Yeast Serves as a Minimal Model Eucaryote
The Expression Levels of All the Genes of an Organism Can Be Monitored Simultaneously
To Make Sense of Cells, We Need Mathematics, Computers, and Quantitative Information
Arabidopsis Has Been Chosen Out of 300,000 Species As a Model Plant
The World of Animal Cells Is Represented By a Worm, a Fly, a Mouse, and a Human
Studies in Drosophila Provide a Key to Vertebrate Development
The Vertebrate Genome Is a Product of Repeated Duplication
Genetic Redundancy Is a Problem for Geneticists, But It Creates Opportunities for Evolving Organisms
The Mouse Serves as a Model for Mammals
Humans Report on Their Own Peculiarities
We Are All Different in Detail
Summary
Problems
References

Chapter 2 Cell Chemistry and Biosynthesis

THE CHEMICAL COMPONENTS OF A CELL
Cells Are Made From a Few Types of Atoms
The Outermost Electrons Determine How Atoms Interact
Covalent Bonds Form by the Sharing of Electrons
There Are Different Types of Covalent Bonds
An Atom Often Behaves as if It Has a Fixed Radius
Water Is the Most Abundant Substance in Cells
Some Polar Molecules Are Acids and Bases
Four Types of Noncovalent Attractions Help Bring Molecules Together in Cells
A Cell Is Formed from Carbon Compounds
Cells Contain Four Major Families of Small Organic Molecules
Sugars Provide an Energy Source for Cells and Are the Subunits of Polysaccharides
Fatty Acids Are Components of Cell Membranes, as Well as a Source of Energy
Amino Acids Are the Subunits of Proteins
Nucleotides Are the Subunits of DNA and RNA
The Chemistry of Cells Is Dominated by Macromolecules with Remarkable Properties
Noncovalent Bonds Specify Both the Precise Shape of a Macromolecule and Its Binding to Other Molecules
Summary

CATALYSIS AND THE USE OF ENERGY BY CELLS
Cell Metabolism Is Organized by Enzymes
Biological Order Is Made Possible by the Release of Heat Energy from Cells
Photosynthetic Organisms Use Sunlight to Synthesize Organic Molecules
Cells Obtain Energy by the Oxidation of Organic Molecules
Oxidation and Reduction Involve Electron Transfers
Enzymes Lower the Barriers That Block Chemical Reactions
How Enzymes Find Their Substrates: The Enormous Rapidity of Molecular Motions
The Free-Energy Change for a Reaction Determines Whether It Can Occur
The Concentration of Reactants Influences the Free-Energy Change and a Reaction's Direction
For Sequential Reactions, ΔG° Values Are Additive
Activated Carrier Molecules Are Essential for Biosynthesis
The Formation of an Activated Carrier Is Coupled to an Energetically Favorable Reaction
ATP Is the Most Widely Used Activated Carrier Molecule
Energy Stored in ATP Is Often Harnessed to Join Two Molecules Together
NADH and NADPH Are Important Electron Carriers
There Are Many Other Activated Carrier Molecules in Cells
The Synthesis of Biological Polymers Is Driven by ATP Hydrolysis
Summary

HOW CELLS OBTAIN ENERGY FROM FOOD
Glycolysis Is a Central ATP-Producing Pathway
Fermentations Produce ATP in the Absence of Oxygen
Glycolysis Illustrates How Enzymes Couple Oxidation to Energy Storage
Organisms Store Food Molecules in Special Reservoirs
Most Animal Cells Derive Their Energy from Fatty Acids Between Meals
Sugars and Fats Are Both Degraded to Acetyl CoA in Mitochondria
The Citric Acid Cycle Generates NADH by Oxidizing Acetyl Groups to CO2
Electron Transport Drives the Synthesis of the Majority of the ATP in Most Cells
Amino Acids and Nucleotides Are Part of the Nitrogen Cycle
Metabolism Is Organized and Regulated
Summary
Problems
References

Chapter 3 Proteins

THE SHAPE AND STRUCTURE OF PROTEINS
The Shape of a Protein Is Specified by Its Amino Acid Sequence
Proteins Fold into a Conformation of Lowest Energy
The α Helix and the β Sheet Are Common Folding Patterns
Protein Domains Are Modular Units from which Larger Proteins Are Built
Few of the Many Possible Polypeptide Chains Will Be Useful to Cells
Proteins Can Be Classified into Many Families
Sequence Searches Can Identify Close Relatives
Some Protein Domains Form Parts of Many Different Proteins
Certain Pairs of Domains Are Found Together in Many Proteins
The Human Genome Encodes a Complex Set of Proteins, Revealing Much That Remains Unknown
Larger Protein Molecules Often Contain More Than One Polypeptide Chain
Some Proteins Form Long Helical Filaments
Many Protein Molecules Have Elongated, Fibrous Shapes
Many Proteins Contain a Surprisingly Large Amount of Unstructured Polypeptide Chain
Covalent Cross-Linkages Often Stabilize Extracellular Proteins
Protein Molecules Often Serve as Subunits for the Assembly of Large Structures
Many Structures in Cells Are Capable of Self-Assembly
Assembly Factors Often Aid the Formation of Complex Biological Structures
Summary

PROTEIN FUNCTION
All Proteins Bind to Other Molecules
The Surface Conformation of a Protein Determines Its Chemistry
Sequence Comparisons Between Protein Family Members Highlight Crucial Ligand-Binding Sites
Proteins Bind to Other Proteins Through Several Types of Interfaces
Antibody Binding Sites Are Especially Versatile
The Equilibrium Constant Measures Binding Strength
Enzymes Are Powerful and Highly Specific Catalysts
Substrate Binding Is the First Step in Enzyme Catalysis
Enzymes Speed Reactions by Selectively Stabilizing Transition States
Enzymes Can Use Simultaneous Acid and Base Catalysis
Lysozyme Illustrates How an Enzyme Works
Tightly Bound Small Molecules Add Extra Functions to Proteins
Molecular Tunnels Channel Substrates in Enzymes with Multiple Catalytic Sites
Multienzyme Complexes Help to Increase the Rate of Cell Metabolism
The Cell Regulates the Catalytic Activities of Its Enzymes
Allosteric Enzymes Have Two or More Binding Sites That Interact
Two Ligands Whose Binding Sites Are Coupled Must Reciprocally Affect Each Other's Binding
Symmetric Protein Assemblies Produce Cooperative Allosteric Transitions
The Allosteric Transition in Aspartate Transcarbamoylase Is Understood in Atomic Detail
Many Changes in Proteins Are Driven by Protein Phosphorylation
A Eucaryotic Cell Contains a Large Collection of Protein Kinases and Protein Phosphatases
The Regulation of Cdk and Src Protein Kinases Shows How a Protein Can Function as a Microchip
Proteins That Bind and Hydrolyze GTP Are Ubiquitous Cellular Regulators
Regulatory Proteins Control the Activity of GTP-Binding Proteins by Determining Whether GTP or GDP Is Bound
Large Protein Movements Can Be Generated From Small Ones
Motor Proteins Produce Large Movements in Cells
Membrane-Bound Transporters Harness Energy to Pump Molecules Through Membranes
Proteins Often Form Large Complexes That Function as Protein Machines
Protein Machines with Interchangeable Parts Make Efficient Use of Genetic Information
The Activation of Protein Machines Often Involves Positioning Them at Specific Sites
Many Proteins Are Controlled by Multisite Covalent Modification
A Complex Network of Protein Interactions Underlies Cell Function
Summary
Problems
References

Chapter 4 DNA, Chromosomes, and Genomes

A DNA Molecule Consists of Two Complementary Chains of Nucleotides
The Structure of DNA Provides a Mechanism for Heredity
In Eucaryotes, DNA Is Enclosed in a Cell Nucleus
Summary

Eucaryotic DNA Is Packaged into a Set of Chromosomes
Chromosomes Contain Long Strings of Genes
The Nucleotide Sequence of the Human Genome Shows How Our Genes Are Arranged
Genome Comparisons Reveal Evolutionarily Conserved DNA Sequences
Chromosomes Exist in Different States Throughout the Life of a Cell
Each DNA Molecule That Forms a Linear Chromosome Must Contain a Centromere, Two Telomeres, and Replication Origins
DNA Molecules Are Highly Condensed in Chromosomes
Nucleosomes Are a Basic Unit of Eucaryotic Chromosome Structure
The Structure of the Nucleosome Core Particle Reveals How DNA Is Packaged
Nucleosomes Have a Dynamic Structure, and Are Frequently Subjected to Changes Catalyzed by ATP-Dependent Chromatin Remodeling Complexes
Nucleosomes Are Usually Packed Together into a Compact Chromatin Fiber
Summary

THE REGULATION OF CHROMATIN STRUCTURE
Some Early Mysteries Concerning Chromatin Structure
Heterochromatin Is Highly Organized and Unusually Resistant to Gene Expression
The Core Histones Are Covalently Modified at Many Different Sites
Chromatin Acquires Additional Variety through the Site-Specific Insertion of a Small Set of Histone Variants
The Covalent Modifications and the Histone Variants Act in Concert to Produce a "Histone Code" That Helps to Determine Biological Function
A Complex of Code-Reader and Code-Writer Proteins Can Spread Specific Chromatin Modifications for Long Distances Along a Chromosome
Barrier DNA Sequences Block the Spread of Reader-Writer Complexes and Thereby Separate Neighboring Chromatin Domains
The Chromatin in Centromeres Reveals How Histone Variants Can Create Special Structures
Chromatin Structures Can Be Directly Inherited
Chromatin Structures Add Unique Features to Eucaryotic Chromosome Function
Summary

THE GLOBAL STRUCTURE OF CHROMOSOMES
Chromosomes Are Folded into Large Loops of Chromatin
Polytene Chromosomes Are Uniquely Useful for Visualizing Chromatin Structures
There Are Multiple Forms of Heterochromatin
Chromatin Loops Decondense When the Genes Within Them Are Expressed
Chromatin Can Move to Specific Sites Within the Nucleus to Alter Their Gene Expression
Networks of Macromolecules Form a Set of Distinct Biochemical Environments inside the Nucleus
Mitotic Chromosomes Are Formed from Chromatin in Its Most Condensed State
Summary

Genome Alterations Are Caused by Failures of the Normal Mechanisms for Copying and Maintaining DNA
The Genome Sequences of Two Species Differ in Proportion to the Length of Time That They Have Separately Evolved
Phylogenetic Trees Constructed from a Comparison of DNA Sequences Trace the Relationships of All Organisms
A Comparison of Human and Mouse Chromosomes Shows How the Structures of Genomes Diverge
The Size of a Vertebrate Genome Reflects the Relative Rates of DNA Addition and DNA Loss in a Lineage
We Can Reconstruct the Sequence of Some Ancient Genomes
Multispecies Sequence Comparisons Identify Important DNA Sequences of Unknown Function
Accelerated Changes in Previously Conserved Sequences Can Help Decipher Critical Steps in Human Evolution
Gene Duplication Provides an Important Source of Genetic Novelty During Evolution
Duplicated Genes Diverge
The Evolution of the Globin Gene Family Shows How DNA Duplications Contribute to the Evolution of Organisms
Genes Encoding New Proteins Can Be Created by the Recombination of Exons
Neutral Mutations Often Spread to Become Fixed in a Population, with a Probability that Depends on Population Size
A Great Deal Can Be Learned from Analyses of the Variation Among Humans
Summary
Problems
References

Chapter 5 DNA Replication, Repair, and Recombination

THE MAINTENANCE OF DNA SEQUENCES
Mutation Rates Are Extremely Low
Low Mutation Rates Are Necessary for Life as We Know It
Summary

DNA REPLICATION MECHANISMS
Base-Pairing Underlies DNA Replication and DNA Repair
The DNA Replication Fork Is Asymmetrical
The High Fidelity of DNA Replication Requires Several Proofreading Mechanisms
Only DNA Replication in the 5'-to-3' Direction Allows Efficient Error Correction
A Special Nucleotide-Polymerizing Enzyme Synthesizes Short RNA Primer Molecules on the Lagging Strand
Special Proteins Help to Open Up the DNA Double Helix in Front of the Replication Fork
A Sliding Ring Holds a Moving DNA Polymerase onto the DNA
The Proteins at a Replication Fork Cooperate to Form a Replication Machine
A Strand-Directed Mismatch Repair System Removes Replication Errors That Escape from the Replication Machine
DNA Topoisomerases Prevent DNA Tangling During Replication
DNA Replication Is Fundamentally Similar in Eucaryotes and Bacteria
Summary

THE INITIATION AND COMPLETION OF DNA REPLICATION IN CHROMOSOMES
DNA Synthesis Begins at Replication Origins
Bacterial Chromosomes Typically Have a Single Origin of DNA Replication
Eucaryotic Chromosomes Contain Multiple Origins of Replication
In Eucaryotes DNA Replication Takes Place During Only One Part of the Cell Cycle
Different Regions on the Same Chromosome Replicate at Distinct Times in S Phase
Highly Condensed Chromatin Replicates Late, While Genes in Less Condensed Chromatin Tend to Replicate Early
Well-Defined DNA Sequences Serve as Replication Origins in a Simple Eucaryote, the Budding Yeast
A Large Multisubunit Complex Binds to Eucaryotic Origins of Replication
The Mammalian DNA Sequences That Specify the Initiation of Replication Have Been Difficult to Identify
New Nucleosomes Are Assembled Behind the Replication Fork
The Mechanisms of Eucaryotic Chromosome Duplication Ensure That Patterns of Histone Modification Can Be Inherited
Telomerase Replicates the Ends of Chromosomes
Telomere Length Is Regulated by Cells and Organisms
Summary

DNA REPAIR
Without DNA Repair, Spontaneous DNA Damage Would Rapidly Change DNA Sequences
The DNA Double Helix Is Readily Repaired
DNA Damage Can Be Removed by More Than One Pathway
Coupling DNA Repair to Transcription Ensures That the Cell's Most Important DNA Is Efficiently Repaired
The Chemistry of the DNA Bases Facilitates Damage Detection
Special DNA Polymerases Are Used in Emergencies to Repair DNA
Double-Strand Breaks Are Efficiently Repaired
DNA Damage Delays Progression of the Cell Cycle
Summary

HOMOLOGOUS RECOMBINATION
Homologous Recombination Has Many Uses in the Cell
Homologous Recombination Has Common Features in All Cells
DNA Base-Pairing Guides Homologous Recombination
The RecA Protein and Its Homologs Enable a DNA Single Strand to Pair with a Homologous Region of DNA Double Helix
Branch Migration Can Either Enlarge Heteroduplex Regions or Release Newly Synthesized DNA as a Single Strand
Homologous Recombination Can Flawlessly Repair Double-Stranded Breaks in DNA
Cells Carefully Regulate the Use of Homologous Recombination in DNA Repair
Holliday Junctions Are Often Formed During Homologous Recombination Events
Meiotic Recombination Begins with a Programmed Double-Strand Break
Homologous Recombination Often Results in Gene Conversion
Mismatch Proofreading Prevents Promiscuous Recombination Between Two Poorly Matched DNA Sequences
Summary

TRANSPOSITION AND CONSERVATIVE SITE-SPECIFIC RECOMBINATION
Through Transposition, Mobile Genetic Elements Can Insert Into Any DNA Sequence
DNA-Only Transposons Move by Both Cut-and-Paste and Replicative Mechanisms
Some Viruses Use a Transposition Mechanism to Move Themselves into Host Cell Chromosomes
Retroviral-like Retrotransposons Resemble Retroviruses, but Lack a Protein Coat
A Large Fraction of the Human Genome Is Composed of Nonretroviral Retrotransposons
Different Transposable Elements Predominate in Different Organisms
Genome Sequences Reveal the Approximate Times that Transposable Elements Have Moved
Conservative Site-Specific Recombination Can Reversibly Rearrange DNA
Conservative Site-Specific Recombination Was Discovered in Bacteriophage λ
Conservative Site-Specific Recombination Can Be Used to Turn Genes On or Off
Summary
Problems
References

Chapter6 How CellsReadthe Genome:From DNAto Protein




Portionsof DNASequence AreTranscribed into RNA 552 Transcription Produces RNAComplementary to OneStrandof DNA 5 5 5 CellsProduceSeveral Typesof RNA 335 SignalsEncodedin DNATellRNApolymerase Whereto Startand Stop 336 Transcription Startand StopSignalsAreHeterogeneous in NucleotideSequence 338 Transcription Initiationin Eucaryotes Requires Manyproteins 339 RNAPolymerase ll Requires GeneralTranscription Factors 340 Polymerase ll AlsoRequires Activator, Mediator, andChromatinModifyingProteins 342 Transcription Elongation Produces Superhelical Tension in DNA 343 Transcription Elongationin Eucaryotes lsTightlyCoupledto RNA Processing 345 pre-mRNAs 346 RNACappinglsthe FirstModification of Eucaryotic RNASplicingRemoves IntronSequences from NewlyTranscribed Pre-mRNAs 347 Nucleotide Sequences SignalWhereSplicing Occurs 349 RNASplicingls Performedby the Spliceosome 349 TheSpliceosome UsesATPHydrolysis to producea ComplexSeries of RNA-RNA Rearrangements 351 OtherProperties of Pre-mRNA and lts Synthesis Helpto Explain the Choiceof ProperSpliceSites 352 A Second5et of snRNPs Splicea SmallFractionof IntronSequences in Animals andPlants 353 plasticity RNASplicingShowsRemarkable J55 Spliceosome-Catalyzed RNASplicingprobablyEvolvedfrom Self-Splicing Mechanisms 355 RNA-Processing Enzymes Generate the 3, Endof Eucaryotic mRNAs5 t / MatureEucaryotic mRNAsAreSelectively Exportedfrom tne Nucleus 358 ManyNoncodingRNAsAreAlsoSynthesized and processed in the Nucleus 360 TheNucleolus ls a Ribosome-producing Factory 502 TheNucleusContainsa Varietyof Subnuclear Structures 50J Summarv 366



An mRNA Sequence Is Decoded in Sets of Three Nucleotides
tRNA Molecules Match Amino Acids to Codons in mRNA
tRNAs Are Covalently Modified Before They Exit from the Nucleus
Specific Enzymes Couple Each Amino Acid to Its Appropriate tRNA Molecule
Editing by RNA Synthetases Ensures Accuracy
Amino Acids Are Added to the C-terminal End of a Growing Polypeptide Chain
The RNA Message Is Decoded in Ribosomes
Elongation Factors Drive Translation Forward and Improve Its Accuracy
The Ribosome Is a Ribozyme
Nucleotide Sequences in mRNA Signal Where to Start Protein Synthesis
Stop Codons Mark the End of Translation
Proteins Are Made on Polyribosomes
There Are Minor Variations in the Standard Genetic Code
Inhibitors of Procaryotic Protein Synthesis Are Useful as Antibiotics
Accuracy in Translation Requires the Expenditure of Free Energy
Quality Control Mechanisms Act to Prevent Translation of Damaged mRNAs
Some Proteins Begin to Fold While Still Being Synthesized
Molecular Chaperones Help Guide the Folding of Most Proteins
Exposed Hydrophobic Regions Provide Critical Signals for Protein Quality Control
The Proteasome Is a Compartmentalized Protease with Sequestered Active Sites
An Elaborate Ubiquitin-Conjugating System Marks Proteins for Destruction
Many Proteins Are Controlled by Regulated Destruction
Abnormally Folded Proteins Can Aggregate to Cause Destructive Human Diseases
There Are Many Steps From DNA to Protein
Summary


THE RNA WORLD AND THE ORIGINS OF LIFE
Life Requires Stored Information
Polynucleotides Can Both Store Information and Catalyze Chemical Reactions
A Pre-RNA World May Predate the RNA World
Single-Stranded RNA Molecules Can Fold into Highly Elaborate Structures
Self-Replicating Molecules Undergo Natural Selection
How Did Protein Synthesis Evolve?
All Present-Day Cells Use DNA as Their Hereditary Material
Summary
Problems
References

Chapter 7 Control of Gene Expression

AN OVERVIEW OF GENE CONTROL
The Different Cell Types of a Multicellular Organism Contain the Same DNA
Different Cell Types Synthesize Different Sets of Proteins
External Signals Can Cause a Cell to Change the Expression of Its Genes
Gene Expression Can Be Regulated at Many of the Steps in the Pathway from DNA to RNA to Protein
Summary


DNA-BINDING MOTIFS IN GENE REGULATORY PROTEINS
Gene Regulatory Proteins Were Discovered Using Bacterial Genetics
The Outside of the DNA Helix Can Be Read by Proteins
Short DNA Sequences Are Fundamental Components of Genetic Switches
Gene Regulatory Proteins Contain Structural Motifs That Can Read DNA Sequences
The Helix-Turn-Helix Motif Is One of the Simplest and Most Common DNA-Binding Motifs


Homeodomain Proteins Constitute a Special Class of Helix-Turn-Helix Proteins
There Are Several Types of DNA-Binding Zinc Finger Motifs
β Sheets Can Also Recognize DNA
Some Proteins Use Loops That Enter the Major and Minor Groove to Recognize DNA
The Leucine Zipper Motif Mediates Both DNA Binding and Protein Dimerization
Heterodimerization Expands the Repertoire of DNA Sequences That Gene Regulatory Proteins Can Recognize
The Helix-Loop-Helix Motif Also Mediates Dimerization and DNA Binding
It Is Not Yet Possible to Predict the DNA Sequences Recognized by All Gene Regulatory Proteins
A Gel-Mobility Shift Assay Readily Detects Sequence-Specific DNA-Binding Proteins
DNA Affinity Chromatography Facilitates the Purification of Sequence-Specific DNA-Binding Proteins
The DNA Sequence Recognized by a Gene Regulatory Protein Can Be Determined Experimentally
Phylogenetic Footprinting Identifies DNA Regulatory Sequences Through Comparative Genomics
Chromatin Immunoprecipitation Identifies Many of the Sites That Gene Regulatory Proteins Occupy in Living Cells
Summary

HOW GENETIC SWITCHES WORK
The Tryptophan Repressor Is a Simple Switch That Turns Genes On and Off in Bacteria
Transcriptional Activators Turn Genes On
A Transcriptional Activator and a Transcriptional Repressor Control the Lac Operon
DNA Looping Occurs During Bacterial Gene Regulation
Bacteria Use Interchangeable RNA Polymerase Subunits to Help Regulate Gene Transcription
Complex Switches Have Evolved to Control Gene Transcription in Eucaryotes
A Eucaryotic Gene Control Region Consists of a Promoter Plus Regulatory DNA Sequences
Eucaryotic Gene Activator Proteins Promote the Assembly of RNA Polymerase and the General Transcription Factors at the Startpoint of Transcription
Eucaryotic Gene Activator Proteins Also Modify Local Chromatin Structure
Gene Activator Proteins Work Synergistically
Eucaryotic Gene Repressor Proteins Can Inhibit Transcription in Various Ways
Eucaryotic Gene Regulatory Proteins Often Bind DNA Cooperatively
Complex Genetic Switches That Regulate Drosophila Development Are Built Up from Smaller Modules
The Drosophila Eve Gene Is Regulated by Combinatorial Controls
Complex Mammalian Gene Control Regions Are Also Constructed from Simple Regulatory Modules
Insulators Are DNA Sequences That Prevent Eucaryotic Gene Regulatory Proteins from Influencing Distant Genes
Gene Switches Rapidly Evolve
Summary

THE MOLECULAR GENETIC MECHANISMS THAT CREATE SPECIALIZED CELL TYPES
DNA Rearrangements Mediate Phase Variation in Bacteria
A Set of Gene Regulatory Proteins Determines Cell Type in a Budding Yeast
Two Proteins That Repress Each Other's Synthesis Determine the Heritable State of Bacteriophage Lambda
Simple Gene Regulatory Circuits Can Be Used to Make Memory Devices
Transcriptional Circuits Allow the Cell to Carry Out Logic Operations
Synthetic Biology Creates New Devices from Existing Biological Parts
Circadian Clocks Are Based on Feedback Loops in Gene Regulation
A Single Gene Regulatory Protein Can Coordinate the Expression of a Set of Genes

Expression of a Critical Gene Regulatory Protein Can Trigger the Expression of a Whole Battery of Downstream Genes
Combinatorial Gene Control Creates Many Different Cell Types in Eucaryotes
A Single Gene Regulatory Protein Can Trigger the Formation of an Entire Organ
The Pattern of DNA Methylation Can Be Inherited When Vertebrate Cells Divide
Genomic Imprinting Is Based on DNA Methylation
CG-Rich Islands Are Associated with Many Genes in Mammals
Epigenetic Mechanisms Ensure That Stable Patterns of Gene Expression Can Be Transmitted to Daughter Cells
Chromosome-Wide Alterations in Chromatin Structure Can Be Inherited
The Control of Gene Expression Is Intrinsically Noisy
Summary

POST-TRANSCRIPTIONAL CONTROLS
Transcription Attenuation Causes the Premature Termination of Some RNA Molecules
Riboswitches Might Represent Ancient Forms of Gene Control
Alternative RNA Splicing Can Produce Different Forms of a Protein from the Same Gene
The Definition of a Gene Has Had to Be Modified Since the Discovery of Alternative RNA Splicing
Sex Determination in Drosophila Depends on a Regulated Series of RNA Splicing Events
A Change in the Site of RNA Transcript Cleavage and Poly-A Addition Can Change the C-terminus of a Protein
RNA Editing Can Change the Meaning of the RNA Message
RNA Transport from the Nucleus Can Be Regulated
Some mRNAs Are Localized to Specific Regions of the Cytoplasm
The 5′ and 3′ Untranslated Regions of mRNAs Control Their Translation
The Phosphorylation of an Initiation Factor Regulates Protein Synthesis Globally
Initiation at AUG Codons Upstream of the Translation Start Can Regulate Eucaryotic Translation Initiation
Internal Ribosome Entry Sites Provide Opportunities for Translation Control
Changes in mRNA Stability Can Regulate Gene Expression
Cytoplasmic Poly-A Addition Can Regulate Translation
Small Noncoding RNA Transcripts Regulate Many Animal and Plant Genes
RNA Interference Is a Cell Defense Mechanism
RNA Interference Can Direct Heterochromatin Formation
RNA Interference Has Become a Powerful Experimental Tool
Summary
Problems
References


Chapter 8 Manipulating Proteins, DNA, and RNA

ISOLATING CELLS AND GROWING THEM IN CULTURE
Cells Can Be Isolated from Intact Tissues
Cells Can Be Grown in Culture
Eucaryotic Cell Lines Are a Widely Used Source of Homogeneous Cells
Embryonic Stem Cells Could Revolutionize Medicine
Somatic Cell Nuclear Transplantation May Provide a Way to Generate Personalized Stem Cells
Hybridoma Cell Lines Are Factories That Produce Monoclonal Antibodies
Summary


PURIFYING PROTEINS
Cells Can Be Separated into Their Component Fractions
Cell Extracts Provide Accessible Systems to Study Cell Functions
Proteins Can Be Separated by Chromatography
Affinity Chromatography Exploits Specific Binding Sites on Proteins
Genetically-Engineered Tags Provide an Easy Way to Purify Proteins



Purified Cell-Free Systems Are Required for the Precise Dissection of Molecular Functions
Summary

ANALYZING PROTEINS
Proteins Can Be Separated by SDS Polyacrylamide-Gel Electrophoresis
Specific Proteins Can Be Detected by Blotting with Antibodies
Mass Spectrometry Provides a Highly Sensitive Method for Identifying Unknown Proteins
Two-Dimensional Separation Methods Are Especially Powerful
Hydrodynamic Measurements Reveal the Size and Shape of a Protein Complex
Sets of Interacting Proteins Can Be Identified by Biochemical Methods
Protein-Protein Interactions Can Also Be Identified by a Two-Hybrid Technique in Yeast
Combining Data Derived from Different Techniques Produces Reliable Protein-Interaction Maps
Optical Methods Can Monitor Protein Interactions in Real Time
Some Techniques Can Monitor Single Molecules
Protein Function Can Be Selectively Disrupted with Small Molecules
Protein Structure Can Be Determined Using X-Ray Diffraction
NMR Can Be Used to Determine Protein Structure in Solution
Protein Sequence and Structure Provide Clues About Protein Function
Summary

ANALYZING AND MANIPULATING DNA
Restriction Nucleases Cut Large DNA Molecules into Fragments
Gel Electrophoresis Separates DNA Molecules of Different Sizes
Purified DNA Molecules Can Be Specifically Labeled with Radioisotopes or Chemical Markers in vitro
Nucleic Acid Hybridization Reactions Provide a Sensitive Way of Detecting Specific Nucleotide Sequences
Northern and Southern Blotting Facilitate Hybridization with Electrophoretically Separated Nucleic Acid Molecules
Genes Can Be Cloned Using DNA Libraries
Two Types of DNA Libraries Serve Different Purposes
cDNA Clones Contain Uninterrupted Coding Sequences
Genes Can Be Selectively Amplified by PCR
Cells Can Be Used As Factories to Produce Specific Proteins
Proteins and Nucleic Acids Can Be Synthesized Directly by Chemical Reactions
DNA Can Be Rapidly Sequenced
Nucleotide Sequences Are Used to Predict the Amino Acid Sequences of Proteins
The Genomes of Many Organisms Have Been Fully Sequenced
Summary

STUDYING GENE EXPRESSION AND FUNCTION
Classical Genetics Begins by Disrupting a Cell Process by Random Mutagenesis
Genetic Screens Identify Mutants with Specific Abnormalities
Mutations Can Cause Loss or Gain of Protein Function
Complementation Tests Reveal Whether Two Mutations Are in the Same Gene or Different Genes
Genes Can Be Ordered in Pathways by Epistasis Analysis
Genes Identified by Mutations Can Be Cloned
Human Genetics Presents Special Problems and Special Opportunities
Human Genes Are Inherited in Haplotype Blocks, Which Can Aid in the Search for Mutations That Cause Disease
Complex Traits Are Influenced by Multiple Genes
Reverse Genetics Begins with a Known Gene and Determines Which Cell Processes Require Its Function
Genes Can Be Re-Engineered in Several Ways
Engineered Genes Can Be Inserted into the Germ Line of Many Organisms
Animals Can Be Genetically Altered
Transgenic Plants Are Important for Both Cell Biology and Agriculture


Large Collections of Tagged Knockouts Provide a Tool for Examining the Function of Every Gene in an Organism
RNA Interference Is a Simple and Rapid Way to Test Gene Function
Reporter Genes and In Situ Hybridization Reveal When and Where a Gene Is Expressed
Expression of Individual Genes Can Be Measured Using Quantitative RT-PCR
Microarrays Monitor the Expression of Thousands of Genes at Once
Single-Cell Gene Expression Analysis Reveals Biological "Noise"
Summary
Problems
References

Chapter 9 Visualizing Cells




The Light Microscope Can Resolve Details 0.2 μm Apart
Living Cells Are Seen Clearly in a Phase-Contrast or a Differential-Interference-Contrast Microscope
Images Can Be Enhanced and Analyzed by Digital Techniques
Intact Tissues Are Usually Fixed and Sectioned before Microscopy
Specific Molecules Can Be Located in Cells by Fluorescence Microscopy
Antibodies Can Be Used to Detect Specific Molecules
Imaging of Complex Three-Dimensional Objects Is Possible with the Optical Microscope
The Confocal Microscope Produces Optical Sections by Excluding Out-of-Focus Light
Fluorescent Proteins Can Be Used to Tag Individual Proteins in Living Cells and Organisms
Protein Dynamics Can Be Followed in Living Cells
Light-Emitting Indicators Can Measure Rapidly Changing Intracellular Ion Concentrations
Several Strategies Are Available by Which Membrane-Impermeant Substances Can Be Introduced into Cells
Light Can Be Used to Manipulate Microscopic Objects As Well As to Image Them
Single Molecules Can Be Visualized by Using Total Internal Reflection Fluorescence Microscopy
Individual Molecules Can Be Touched and Moved Using Atomic Force Microscopy
Molecules Can Be Labeled with Radioisotopes
Radioisotopes Are Used to Trace Molecules in Cells and Organisms
Summary

LOOKING AT CELLS AND MOLECULES IN THE ELECTRON MICROSCOPE



The Electron Microscope Resolves the Fine Structure of the Cell
Biological Specimens Require Special Preparation for the Electron Microscope
Specific Macromolecules Can Be Localized by Immunogold Electron Microscopy
Images of Surfaces Can Be Obtained by Scanning Electron Microscopy
Metal Shadowing Allows Surface Features to Be Examined at High Resolution by Transmission Electron Microscopy
Negative Staining and Cryoelectron Microscopy Both Allow Macromolecules to Be Viewed at High Resolution
Multiple Images Can Be Combined to Increase Resolution
Different Views of a Single Object Can Be Combined to Give a Three-Dimensional Reconstruction
Summary
Problems
References


Chapter 10 Membrane Structure


THE LIPID BILAYER
Phosphoglycerides, Sphingolipids, and Sterols Are the Major Lipids in Cell Membranes
Phospholipids Spontaneously Form Bilayers






The Lipid Bilayer Is a Two-Dimensional Fluid
The Fluidity of a Lipid Bilayer Depends on Its Composition
Despite Their Fluidity, Lipid Bilayers Can Form Domains of Different Compositions
Lipid Droplets Are Surrounded by a Phospholipid Monolayer
The Asymmetry of the Lipid Bilayer Is Functionally Important
Glycolipids Are Found on the Surface of All Plasma Membranes
Summary




Membrane Proteins Can Be Associated with the Lipid Bilayer in Various Ways
Lipid Anchors Control the Membrane Localization of Some Signaling Proteins
In Most Transmembrane Proteins the Polypeptide Chain Crosses the Lipid Bilayer in an α-Helical Conformation
Transmembrane α Helices Often Interact with One Another
Some β Barrels Form Large Transmembrane Channels
Many Membrane Proteins Are Glycosylated
Membrane Proteins Can Be Solubilized and Purified in Detergents
Bacteriorhodopsin Is a Light-Driven Proton Pump That Traverses the Lipid Bilayer as Seven α Helices
Membrane Proteins Often Function as Large Complexes
Many Membrane Proteins Diffuse in the Plane of the Membrane
Cells Can Confine Proteins and Lipids to Specific Domains Within a Membrane
The Cortical Cytoskeleton Gives Membranes Mechanical Strength and Restricts Membrane Protein Diffusion
Summary
Problems
References


Chapter 11 Membrane Transport of Small Molecules and the Electrical Properties of Membranes

PRINCIPLES OF MEMBRANE TRANSPORT


Protein-Free Lipid Bilayers Are Highly Impermeable to Ions
There Are Two Main Classes of Membrane Transport Proteins: Transporters and Channels
Active Transport Is Mediated by Transporters Coupled to an Energy Source
Summary


TRANSPORTERS AND ACTIVE MEMBRANE TRANSPORT
Active Transport Can Be Driven by Ion Gradients
Transporters in the Plasma Membrane Regulate Cytosolic pH
An Asymmetric Distribution of Transporters in Epithelial Cells Underlies the Transcellular Transport of Solutes
There Are Three Classes of ATP-Driven Pumps
The Ca2+ Pump Is the Best-Understood P-type ATPase
The Plasma Membrane P-type Na+-K+ Pump Establishes the Na+ Gradient Across the Plasma Membrane
ABC Transporters Constitute the Largest Family of Membrane Transport Proteins
Summary


ION CHANNELS AND THE ELECTRICAL PROPERTIES OF MEMBRANES
Ion Channels Are Ion-Selective and Fluctuate Between Open and Closed States
The Membrane Potential in Animal Cells Depends Mainly on K+ Leak Channels and the K+ Gradient Across the Plasma Membrane
The Resting Potential Decays Only Slowly When the Na+-K+ Pump Is Stopped
The Three-Dimensional Structure of a Bacterial K+ Channel Shows How an Ion Channel Can Work
Aquaporins Are Permeable to Water But Impermeable to Ions
The Function of a Neuron Depends on Its Elongated Structure
Voltage-Gated Cation Channels Generate Action Potentials in Electrically Excitable Cells
Myelination Increases the Speed and Efficiency of Action Potential Propagation in Nerve Cells

Patch-Clamp Recording Indicates That Individual Gated Channels Open in an All-or-Nothing Fashion
Voltage-Gated Cation Channels Are Evolutionarily and Structurally Related
Transmitter-Gated Ion Channels Convert Chemical Signals into Electrical Ones at Chemical Synapses
Chemical Synapses Can Be Excitatory or Inhibitory
The Acetylcholine Receptors at the Neuromuscular Junction Are Transmitter-Gated Cation Channels
Transmitter-Gated Ion Channels Are Major Targets for Psychoactive Drugs
Neuromuscular Transmission Involves the Sequential Activation of Five Different Sets of Ion Channels
Single Neurons Are Complex Computation Devices
Neuronal Computation Requires a Combination of at Least Three Kinds of K+ Channels
Long-Term Potentiation (LTP) in the Mammalian Hippocampus Depends on Ca2+ Entry Through NMDA-Receptor Channels
Summary
Problems
References

Chapter 12 Intracellular Compartments and Protein Sorting

THE COMPARTMENTALIZATION OF CELLS
All Eucaryotic Cells Have the Same Basic Set of Membrane-Enclosed Organelles
Evolutionary Origins Explain the Topological Relationships of Organelles
Proteins Can Move Between Compartments in Different Ways
Signal Sequences Direct Proteins to the Correct Cell Address
Most Organelles Cannot Be Constructed De Novo: They Require Information in the Organelle Itself
Summary


THE TRANSPORT OF MOLECULES BETWEEN THE NUCLEUS AND THE CYTOSOL
Nuclear Pore Complexes Perforate the Nuclear Envelope
Nuclear Localization Signals Direct Nuclear Proteins to the Nucleus
Nuclear Import Receptors Bind to Both Nuclear Localization Signals and NPC Proteins
Nuclear Export Works Like Nuclear Import, But in Reverse
The Ran GTPase Imposes Directionality on Transport Through NPCs
Transport Through NPCs Can Be Regulated by Controlling Access to the Transport Machinery
During Mitosis the Nuclear Envelope Disassembles
Summary

THE TRANSPORT OF PROTEINS INTO MITOCHONDRIA AND CHLOROPLASTS
Translocation into Mitochondria Depends on Signal Sequences and Protein Translocators
Mitochondrial Precursor Proteins Are Imported as Unfolded Polypeptide Chains
ATP Hydrolysis and a Membrane Potential Drive Protein Import Into the Matrix Space
Bacteria and Mitochondria Use Similar Mechanisms to Insert Porins into their Outer Membrane
Transport Into the Inner Mitochondrial Membrane and Intermembrane Space Occurs Via Several Routes
Two Signal Sequences Direct Proteins to the Thylakoid Membrane in Chloroplasts
Summary

PEROXISOMES
Peroxisomes Use Molecular Oxygen and Hydrogen Peroxide to Perform Oxidative Reactions
A Short Signal Sequence Directs the Import of Proteins into Peroxisomes
Summary


THE ENDOPLASMIC RETICULUM
The ER Is Structurally and Functionally Diverse
Signal Sequences Were First Discovered in Proteins Imported into the Rough ER
A Signal-Recognition Particle (SRP) Directs ER Signal Sequences to a Specific Receptor in the Rough ER Membrane
The Polypeptide Chain Passes Through an Aqueous Pore in the Translocator
Translocation Across the ER Membrane Does Not Always Require Ongoing Polypeptide Chain Elongation
In Single-Pass Transmembrane Proteins, a Single Internal ER Signal Sequence Remains in the Lipid Bilayer as a Membrane-Spanning α Helix
Combinations of Start-Transfer and Stop-Transfer Signals Determine the Topology of Multipass Transmembrane Proteins
Translocated Polypeptide Chains Fold and Assemble in the Lumen of the Rough ER
Most Proteins Synthesized in the Rough ER Are Glycosylated by the Addition of a Common N-Linked Oligosaccharide
Oligosaccharides Are Used as Tags to Mark the State of Protein Folding
Improperly Folded Proteins Are Exported from the ER and Degraded in the Cytosol
Misfolded Proteins in the ER Activate an Unfolded Protein Response
Some Membrane Proteins Acquire a Covalently Attached Glycosylphosphatidylinositol (GPI) Anchor
The ER Assembles Most Lipid Bilayers
Summary
Problems
References

Chapter 13 Intracellular Vesicular Traffic

THE MOLECULAR MECHANISMS OF MEMBRANE TRANSPORT AND THE MAINTENANCE OF COMPARTMENTAL DIVERSITY
There Are Various Types of Coated Vesicles
The Assembly of a Clathrin Coat Drives Vesicle Formation
Not All Coats Form Basket-like Structures
Phosphoinositides Mark Organelles and Membrane Domains
Cytoplasmic Proteins Regulate the Pinching-Off and Uncoating of Coated Vesicles
Monomeric GTPases Control Coat Assembly
Not All Transport Vesicles Are Spherical
Rab Proteins Guide Vesicle Targeting
SNAREs Mediate Membrane Fusion
Interacting SNAREs Need to Be Pried Apart Before They Can Function Again
Viral Fusion Proteins and SNAREs May Use Similar Fusion Mechanisms
Summary

TRANSPORT FROM THE ER THROUGH THE GOLGI APPARATUS
Proteins Leave the ER in COPII-Coated Transport Vesicles
Only Proteins That Are Properly Folded and Assembled Can Leave the ER
Vesicular Tubular Clusters Mediate Transport from the ER to the Golgi Apparatus
The Retrieval Pathway to the ER Uses Sorting Signals
Many Proteins Are Selectively Retained in the Compartments in Which They Function
The Golgi Apparatus Consists of an Ordered Series of Compartments
Oligosaccharide Chains Are Processed in the Golgi Apparatus
Proteoglycans Are Assembled in the Golgi Apparatus
What Is the Purpose of Glycosylation?
Transport Through the Golgi Apparatus May Occur by Vesicular Transport or Cisternal Maturation
Golgi Matrix Proteins Help Organize the Stack
Summary





Lysosomes Are the Principal Sites of Intracellular Digestion
Lysosomes Are Heterogeneous
Plant and Fungal Vacuoles Are Remarkably Versatile Lysosomes
Multiple Pathways Deliver Materials to Lysosomes
A Mannose 6-Phosphate Receptor Recognizes Lysosomal Proteins in the Trans Golgi Network
The M6P Receptor Shuttles Between Specific Membranes
A Signal Patch in the Hydrolase Polypeptide Chain Provides the Cue for M6P Addition
Defects in the GlcNAc Phosphotransferase Cause a Lysosomal Storage Disease in Humans
Some Lysosomes Undergo Exocytosis
Summary



Specialized Phagocytic Cells Can Ingest Large Particles
Pinocytic Vesicles Form from Coated Pits in the Plasma Membrane
Not All Pinocytic Vesicles Are Clathrin-Coated
Cells Use Receptor-Mediated Endocytosis to Import Selected Extracellular Macromolecules
Endocytosed Materials That Are Not Retrieved from Endosomes End Up in Lysosomes
Specific Proteins Are Retrieved from Early Endosomes and Returned to the Plasma Membrane
Multivesicular Bodies Form on the Pathway to Late Endosomes
Transcytosis Transfers Macromolecules Across Epithelial Cell Sheets
Epithelial Cells Have Two Distinct Early Endosomal Compartments but a Common Late Endosomal Compartment
Summary

TRANSPORT FROM THE TRANS GOLGI NETWORK TO THE CELL EXTERIOR: EXOCYTOSIS
Many Proteins and Lipids Seem to Be Carried Automatically from the Golgi Apparatus to the Cell Surface
Secretory Vesicles Bud from the Trans Golgi Network
Proteins Are Often Proteolytically Processed During the Formation of Secretory Vesicles
Secretory Vesicles Wait Near the Plasma Membrane Until Signaled to Release Their Contents
Regulated Exocytosis Can Be a Localized Response of the Plasma Membrane and Its Underlying Cytoplasm
Secretory Vesicle Membrane Components Are Quickly Removed from the Plasma Membrane
Some Regulated Exocytosis Events Serve to Enlarge the Plasma Membrane
Polarized Cells Direct Proteins from the Trans Golgi Network to the Appropriate Domain of the Plasma Membrane
Different Strategies Guide Membrane Proteins and Lipids Selectively to the Correct Plasma Membrane Domains
Synaptic Vesicles Can Form Directly from Endocytic Vesicles
Summary
Problems
References

Chapter 14 Energy Conversion: Mitochondria and Chloroplasts

THE MITOCHONDRION
The Mitochondrion Contains an Outer Membrane, an Inner Membrane, and Two Internal Compartments
The Citric Acid Cycle Generates High-Energy Electrons
A Chemiosmotic Process Converts Oxidation Energy into ATP
NADH Transfers Its Electrons to Oxygen Through Three Large Respiratory Enzyme Complexes
As Electrons Move Along the Respiratory Chain, Energy Is Stored as an Electrochemical Proton Gradient Across the Inner Membrane
The Proton Gradient Drives ATP Synthesis


The Proton Gradient Drives Coupled Transport Across the Inner Membrane
Proton Gradients Produce Most of the Cell's ATP
Mitochondria Maintain a High ATP:ADP Ratio in Cells
A Large Negative Value of ΔG for ATP Hydrolysis Makes ATP Useful to the Cell
ATP Synthase Can Function in Reverse to Hydrolyze ATP and Pump H+
Summary

ELECTRON-TRANSPORT CHAINS AND THEIR PROTON PUMPS



Protons Are Unusually Easy to Move
The Redox Potential Is a Measure of Electron Affinities
Electron Transfers Release Large Amounts of Energy
Spectroscopic Methods Identified Many Electron Carriers in the Respiratory Chain
The Respiratory Chain Includes Three Large Enzyme Complexes Embedded in the Inner Membrane
An Iron-Copper Center in Cytochrome Oxidase Catalyzes Efficient O2 Reduction
Electron Transfers in the Inner Mitochondrial Membrane Are Mediated by Electron Tunneling during Random Collisions
A Large Drop in Redox Potential Across Each of the Three Respiratory Enzyme Complexes Provides the Energy for H+ Pumping
The H+ Pumping Occurs by Distinct Mechanisms in the Three Major Enzyme Complexes
H+ Ionophores Uncouple Electron Transport from ATP Synthesis
Respiratory Control Normally Restrains Electron Flow Through the Chain
Natural Uncouplers Convert the Mitochondria in Brown Fat into Heat-Generating Machines
The Mitochondrion Plays Many Critical Roles in Cell Metabolism
Bacteria Also Exploit Chemiosmotic Mechanisms to Harness Energy
Summary

CHLOROPLASTS AND PHOTOSYNTHESIS
The Chloroplast Is One Member of the Plastid Family of Organelles
Chloroplasts Resemble Mitochondria But Have an Extra Compartment
Chloroplasts Capture Energy from Sunlight and Use It to Fix Carbon
Carbon Fixation Is Catalyzed by Ribulose Bisphosphate Carboxylase
Each CO2 Molecule That Is Fixed Consumes Three Molecules of ATP and Two Molecules of NADPH
Carbon Fixation in Some Plants Is Compartmentalized to Facilitate Growth at Low CO2 Concentrations
Photosynthesis Depends on the Photochemistry of Chlorophyll Molecules
A Photochemical Reaction Center Plus an Antenna Complex Form a Photosystem
In a Reaction Center, Light Energy Captured by Chlorophyll Creates a Strong Electron Donor from a Weak One
Noncyclic Photophosphorylation Produces Both NADPH and ATP
Chloroplasts Can Make ATP by Cyclic Photophosphorylation Without Making NADPH
Photosystems I and II Have Related Structures, and Also Resemble Bacterial Photosystems
The Proton-Motive Force Is the Same in Mitochondria and Chloroplasts
Carrier Proteins in the Chloroplast Inner Membrane Control Metabolite Exchange with the Cytosol
Chloroplasts Also Perform Other Crucial Biosyntheses
Summary

THE GENETIC SYSTEMS OF MITOCHONDRIA AND PLASTIDS
Mitochondria and Chloroplasts Contain Complete Genetic Systems
Organelle Growth and Division Determine the Number of Mitochondria and Plastids in a Cell

Mitochondria and Chloroplasts Have Diverse Genomes
Mitochondria and Chloroplasts Probably Both Evolved from Endosymbiotic Bacteria
Mitochondria Have a Relaxed Codon Usage and Can Have a Variant Genetic Code
Animal Mitochondria Contain the Simplest Genetic Systems Known
Some Organelle Genes Contain Introns
The Chloroplast Genome of Higher Plants Contains About 120 Genes
Mitochondrial Genes Are Inherited by a Non-Mendelian Mechanism
Organelle Genes Are Maternally Inherited in Many Organisms
Petite Mutants in Yeasts Demonstrate the Overwhelming Importance of the Cell Nucleus for Mitochondrial Biogenesis
Mitochondria and Plastids Contain Tissue-Specific Proteins that Are Encoded in the Cell Nucleus
Mitochondria Import Most of Their Lipids; Chloroplasts Make Most of Theirs
Mitochondria May Contribute to the Aging of Cells and Organisms
Why Do Mitochondria and Chloroplasts Have Their Own Genetic Systems?
Summary

THE EVOLUTION OF ELECTRON-TRANSPORT CHAINS
The Earliest Cells Probably Used Fermentation to Produce ATP
Electron-Transport Chains Enabled Anaerobic Bacteria to Use Nonfermentable Molecules as Their Major Source of Energy
By Providing an Inexhaustible Source of Reducing Power, Photosynthetic Bacteria Overcame a Major Evolutionary Obstacle
The Photosynthetic Electron-Transport Chains of Cyanobacteria Produced Atmospheric Oxygen and Permitted New Life-Forms
Summary
Problems
References

Chapter 15 Mechanisms of Cell Communication


GENERAL PRINCIPLES OF CELL COMMUNICATION
Extracellular Signal Molecules Bind to Specific Receptors
Extracellular Signal Molecules Can Act Over Either Short or Long Distances
Gap Junctions Allow Neighboring Cells to Share Signaling Information
Each Cell Is Programmed to Respond to Specific Combinations of Extracellular Signal Molecules
Different Types of Cells Usually Respond Differently to the Same Extracellular Signal Molecule
The Fate of Some Developing Cells Depends on Their Position in Morphogen Gradients
A Cell Can Alter the Concentration of an Intracellular Molecule Quickly Only If the Lifetime of the Molecule Is Short
Nitric Oxide Gas Signals by Directly Regulating the Activity of Specific Proteins Inside the Target Cell
Nuclear Receptors Are Ligand-Modulated Gene Regulatory Proteins
The Three Largest Classes of Cell-Surface Receptor Proteins Are Ion-Channel-Coupled, G-Protein-Coupled, and Enzyme-Coupled Receptors
Most Activated Cell-Surface Receptors Relay Signals Via Small Molecules and a Network of Intracellular Signaling Proteins
Many Intracellular Signaling Proteins Function as Molecular Switches That Are Activated by Phosphorylation or GTP Binding
Intracellular Signaling Complexes Enhance the Speed, Efficiency, and Specificity of the Response
Modular Interaction Domains Mediate Interactions Between Intracellular Signaling Proteins
Cells Can Use Multiple Mechanisms to Respond Abruptly to a Gradually Increasing Concentration of an Extracellular Signal
Intracellular Signaling Networks Usually Make Use of Feedback Loops
Cells Can Adjust Their Sensitivity to a Signal
Summary



Some G Proteins Regulate the Production of Cyclic AMP
Cyclic-AMP-Dependent Protein Kinase (PKA) Mediates Most of the Effects of Cyclic AMP
Some G Proteins Activate an Inositol Phospholipid Signaling Pathway by Activating Phospholipase C-β
Ca2+ Functions as a Ubiquitous Intracellular Mediator
The Frequency of Ca2+ Oscillations Influences a Cell's Response
Ca2+/Calmodulin-Dependent Protein Kinases (CaM-Kinases) Mediate Many of the Responses to Ca2+ Signals in Animal Cells
Some G Proteins Directly Regulate Ion Channels
Smell and Vision Depend on GPCRs That Regulate Cyclic-Nucleotide-Gated Ion Channels
Intracellular Mediators and Enzymatic Cascades Amplify Extracellular Signals
GPCR Desensitization Depends on Receptor Phosphorylation
Summary


SIGNALING THROUGH ENZYME-COUPLED CELL-SURFACE RECEPTORS
Activated Receptor Tyrosine Kinases (RTKs) Phosphorylate Themselves
Phosphorylated Tyrosines on RTKs Serve as Docking Sites for Intracellular Signaling Proteins
Proteins with SH2 Domains Bind to Phosphorylated Tyrosines
Ras Belongs to a Large Superfamily of Monomeric GTPases
RTKs Activate Ras Via Adaptors and GEFs: Evidence from the Developing Drosophila Eye
Ras Activates a MAP Kinase Signaling Module
Scaffold Proteins Help Prevent Cross-Talk Between Parallel MAP Kinase Modules
Rho Family GTPases Functionally Couple Cell-Surface Receptors to the Cytoskeleton
PI 3-Kinase Produces Lipid Docking Sites in the Plasma Membrane
The PI-3-Kinase-Akt Signaling Pathway Stimulates Animal Cells to Survive and Grow
The Downstream Signaling Pathways Activated By RTKs and GPCRs Overlap
Tyrosine-Kinase-Associated Receptors Depend on Cytoplasmic Tyrosine Kinases
Cytokine Receptors Activate the JAK-STAT Signaling Pathway, Providing a Fast Track to the Nucleus
Protein Tyrosine Phosphatases Reverse Tyrosine Phosphorylations
Signal Proteins of the TGFβ Superfamily Act Through Receptor Serine/Threonine Kinases and Smads
Serine/Threonine and Tyrosine Protein Kinases Are Structurally Related
Bacterial Chemotaxis Depends on a Two-Component Signaling Pathway Activated by Histidine-Kinase-Associated Receptors
Receptor Methylation Is Responsible for Adaptation in Bacterial Chemotaxis
Summary

SIGNALING PATHWAYS DEPENDENT ON REGULATED PROTEOLYSIS OF LATENT GENE REGULATORY PROTEINS
The Receptor Protein Notch Is a Latent Gene Regulatory Protein
Wnt Proteins Bind to Frizzled Receptors and Inhibit the Degradation of β-Catenin
Hedgehog Proteins Bind to Patched, Relieving Its Inhibition of Smoothened
Many Stressful and Inflammatory Stimuli Act Through an NFκB-Dependent Signaling Pathway
Summary


SIGNALING IN PLANTS
Multicellularity and Cell Communication Evolved Independently in Plants and Animals
Receptor Serine/Threonine Kinases Are the Largest Class of Cell-Surface Receptors in Plants

Ethylene Blocks the Degradation of Specific Gene Regulatory Proteins in the Nucleus
Regulated Positioning of Auxin Transporters Patterns Plant Growth
Phytochromes Detect Red Light, and Cryptochromes Detect Blue Light
Summary
Problems
References

Chapter 16 The Cytoskeleton




Cytoskeletal Filaments Are Dynamic and Adaptable
The Cytoskeleton Can Also Form Stable Structures
Each Type of Cytoskeletal Filament Is Constructed from Smaller Protein Subunits
Filaments Formed from Multiple Protofilaments Have Advantageous Properties
Nucleation Is the Rate-Limiting Step in the Formation of a Cytoskeletal Polymer
The Tubulin and Actin Subunits Assemble Head-to-Tail to Create Polar Filaments
Microtubules and Actin Filaments Have Two Distinct Ends That Grow at Different Rates
Filament Treadmilling and Dynamic Instability Are Consequences of Nucleotide Hydrolysis by Tubulin and Actin
Treadmilling and Dynamic Instability Aid Rapid Cytoskeletal Rearrangement
Tubulin and Actin Have Been Highly Conserved During Eucaryotic Evolution
Intermediate Filament Structure Depends on the Lateral Bundling and Twisting of Coiled Coils
Intermediate Filaments Impart Mechanical Stability to Animal Cells
Drugs Can Alter Filament Polymerization
Bacterial Cell Organization and Cell Division Depend on Homologs of the Eucaryotic Cytoskeleton
Summary

HOW CELLS REGULATE THEIR CYTOSKELETAL FILAMENTS
A Protein Complex Containing γ-Tubulin Nucleates Microtubules
Microtubules Emanate from the Centrosome in Animal Cells
Actin Filaments Are Often Nucleated at the Plasma Membrane
The Mechanism of Nucleation Influences Large-Scale Filament Organization
Proteins That Bind to the Free Subunits Modify Filament Elongation
Severing Proteins Regulate the Length and Kinetic Behavior of Actin Filaments and Microtubules
Proteins That Bind Along the Sides of Filaments Can Either Stabilize or Destabilize Them
Proteins That Interact with Filament Ends Can Dramatically Change Filament Dynamics
Different Kinds of Proteins Alter the Properties of Rapidly Growing Microtubule Ends
Filaments Are Organized into Higher-Order Structures in Cells
Intermediate Filaments Are Cross-Linked and Bundled Into Strong Arrays
Cross-Linking Proteins with Distinct Properties Organize Different Assemblies of Actin Filaments
Filamin and Spectrin Form Actin Filament Webs
Cytoskeletal Elements Make Many Attachments to Membrane
Summary

MOLECULAR MOTORS
Actin-Based Motor Proteins Are Members of the Myosin Superfamily
There Are Two Types of Microtubule Motor Proteins: Kinesins and Dyneins
The Structural Similarity of Myosin and Kinesin Indicates a Common Evolutionary Origin
Motor Proteins Generate Force by Coupling ATP Hydrolysis to Conformational Changes


Motor Protein Kinetics Are Adapted to Cell Functions
Motor Proteins Mediate the Intracellular Transport of Membrane-Enclosed Organelles
The Cytoskeleton Localizes Specific RNA Molecules
Cells Regulate Motor Protein Function
Summary


THE CYTOSKELETON AND CELL BEHAVIOR
Sliding of Myosin II and Actin Filaments Causes Muscles to Contract
A Sudden Rise in Cytosolic Ca2+ Concentration Initiates Muscle Contraction
Heart Muscle Is a Precisely Engineered Machine
Cilia and Flagella Are Motile Structures Built from Microtubules and Dyneins
Construction of the Mitotic Spindle Requires Microtubule Dynamics and the Interactions of Many Motor Proteins
Many Cells Can Crawl Across A Solid Substratum
Actin Polymerization Drives Plasma Membrane Protrusion
Cell Adhesion and Traction Allow Cells to Pull Themselves Forward
Members of the Rho Protein Family Cause Major Rearrangements of the Actin Cytoskeleton
Extracellular Signals Can Activate the Three Rho Protein Family Members
External Signals Can Dictate the Direction of Cell Migration
Communication Between the Microtubule and Actin Cytoskeletons Coordinates Whole-Cell Polarization and Locomotion
The Complex Morphological Specialization of Neurons Depends on the Cytoskeleton
Summary
Problems
References

Chapter 17 The Cell Cycle

OVERVIEW OF THE CELL CYCLE
The Eucaryotic Cell Cycle Is Divided into Four Phases
Cell-Cycle Control Is Similar in All Eucaryotes
Cell-Cycle Control Can Be Dissected Genetically by Analysis of Yeast Mutants
Cell-Cycle Control Can Be Analyzed Biochemically in Animal Embryos
Cell-Cycle Control Can Be Studied in Cultured Mammalian Cells
Cell-Cycle Progression Can Be Studied in Various Ways
Summary

THE CELL-CYCLE CONTROL SYSTEM
The Cell-Cycle Control System Triggers the Major Events of the Cell Cycle
The Cell-Cycle Control System Depends on Cyclically Activated Cyclin-Dependent Protein Kinases (Cdks)
Inhibitory Phosphorylation and Cdk Inhibitory Proteins (CKIs) Can Suppress Cdk Activity
The Cell-Cycle Control System Depends on Cyclical Proteolysis
Cell-Cycle Control Also Depends on Transcriptional Regulation
The Cell-Cycle Control System Functions as a Network of Biochemical Switches
Summary

S PHASE
S-Cdk Initiates DNA Replication Once Per Cycle
Chromosome Duplication Requires Duplication of Chromatin Structure
Cohesins Help Hold Sister Chromatids Together
Summary




M-Cdk Drives Entry Into Mitosis
Dephosphorylation Activates M-Cdk at the Onset of Mitosis
Condensin Helps Configure Duplicated Chromosomes for Separation
The Mitotic Spindle Is a Microtubule-Based Machine


Microtubule-Dependent Motor Proteins Govern Spindle Assembly and Function
Two Mechanisms Collaborate in the Assembly of a Bipolar Mitotic Spindle
Centrosome Duplication Occurs Early in the Cell Cycle
M-Cdk Initiates Spindle Assembly in Prophase
The Completion of Spindle Assembly in Animal Cells Requires Nuclear Envelope Breakdown
Microtubule Instability Increases Greatly in Mitosis
Mitotic Chromosomes Promote Bipolar Spindle Assembly
Kinetochores Attach Sister Chromatids to the Spindle
Bi-Orientation Is Achieved by Trial and Error
Multiple Forces Move Chromosomes on the Spindle
The APC/C Triggers Sister-Chromatid Separation and the Completion of Mitosis
Unattached Chromosomes Block Sister-Chromatid Separation: The Spindle Assembly Checkpoint
Chromosomes Segregate in Anaphase A and B
Segregated Chromosomes Are Packaged in Daughter Nuclei at Telophase
Meiosis Is a Special Form of Nuclear Division Involved in Sexual Reproduction
Summary


CYTOKINESIS
Actin and Myosin II in the Contractile Ring Generate the Force for Cytokinesis
Local Activation of RhoA Triggers Assembly and Contraction of the Contractile Ring
The Microtubules of the Mitotic Spindle Determine the Plane of Animal Cell Division
The Phragmoplast Guides Cytokinesis in Higher Plants
Membrane-Enclosed Organelles Must Be Distributed to Daughter Cells During Cytokinesis
Some Cells Reposition Their Spindle to Divide Asymmetrically
Mitosis Can Occur Without Cytokinesis
The G1 Phase Is a Stable State of Cdk Inactivity
Summary

CONTROL OF CELL DIVISION AND CELL GROWTH
Mitogens Stimulate Cell Division
Cells Can Delay Division by Entering a Specialized Nondividing State
Mitogens Stimulate G1-Cdk and G1/S-Cdk Activities
DNA Damage Blocks Cell Division: The DNA Damage Response
Many Human Cells Have a Built-In Limitation on the Number of Times They Can Divide
Abnormal Proliferation Signals Cause Cell-Cycle Arrest or Apoptosis, Except in Cancer Cells
Organism and Organ Growth Depend on Cell Growth
Proliferating Cells Usually Coordinate Their Growth and Division
Neighboring Cells Compete for Extracellular Signal Proteins
Animals Control Total Cell Mass by Unknown Mechanisms
Summary
Problems
References

Chapter 18 Apoptosis


Programmed Cell Death Eliminates Unwanted Cells
Apoptotic Cells Are Biochemically Recognizable
Apoptosis Depends on an Intracellular Proteolytic Cascade That Is Mediated by Caspases
Cell-Surface Death Receptors Activate the Extrinsic Pathway of Apoptosis
The Intrinsic Pathway of Apoptosis Depends on Mitochondria
Bcl2 Proteins Regulate the Intrinsic Pathway of Apoptosis
IAPs Inhibit Caspases
Extracellular Survival Factors Inhibit Apoptosis in Various Ways
Either Excessive or Insufficient Apoptosis Can Contribute to Disease
Summary
Problems
References

Chapter 19 Cell Junctions, Cell Adhesion, and the Extracellular Matrix

CADHERINS AND CELL-CELL ADHESION


Cadherins Mediate Ca2+-Dependent Cell-Cell Adhesion in All Animals
The Cadherin Superfamily in Vertebrates Includes Hundreds of Different Proteins, Including Many with Signaling Functions
Cadherins Mediate Homophilic Adhesion
Selective Cell-Cell Adhesion Enables Dissociated Vertebrate Cells to Reassemble into Organized Tissues
Cadherins Control the Selective Assortment of Cells
Twist Regulates Epithelial-Mesenchymal Transitions
Catenins Link Classical Cadherins to the Actin Cytoskeleton
Adherens Junctions Coordinate the Actin-Based Motility of Adjacent Cells
Desmosome Junctions Give Epithelia Mechanical Strength
Cell-Cell Junctions Send Signals into the Cell Interior
Selectins Mediate Transient Cell-Cell Adhesions in the Bloodstream
Members of the Immunoglobulin Superfamily of Proteins Mediate Ca2+-Independent Cell-Cell Adhesion
Many Types of Cell Adhesion Molecules Act in Parallel to Create a Synapse
Scaffold Proteins Organize Junctional Complexes
Summary


Tight Junctions Form a Seal Between Cells and a Fence Between Membrane Domains
Scaffold Proteins in Junctional Complexes Play a Key Part In the Control of Cell Proliferation
Cell-Cell Junctions and the Basal Lamina Govern Apico-Basal Polarity in Epithelia
A Separate Signaling System Controls Planar Cell Polarity
Summary


Gap Junctions Couple Cells Both Electrically and Metabolically
A Gap-Junction Connexon Is Made Up of Six Transmembrane Connexin Subunits
Gap Junctions Have Diverse Functions
Cells Can Regulate the Permeability of Their Gap Junctions
In Plants, Plasmodesmata Perform Many of the Same Functions as Gap Junctions
Summary




Basal Laminae Underlie All Epithelia and Surround Some Nonepithelial Cell Types
Laminin Is a Primary Component of the Basal Lamina
Type IV Collagen Gives the Basal Lamina Tensile Strength
Basal Laminae Have Diverse Functions
Summary

INTEGRINS AND CELL-MATRIX ADHESION
Integrins Are Transmembrane Heterodimers That Link to the Cytoskeleton
Integrins Can Switch Between an Active and an Inactive Conformation
Integrin Defects Are Responsible for Many Different Genetic Diseases
Integrins Cluster to Form Strong Adhesions
Extracellular Matrix Attachments Act Through Integrins to Control Cell Proliferation and Survival
Integrins Recruit Intracellular Signaling Proteins at Sites of Cell-Substratum Adhesion
Integrins Can Produce Localized Intracellular Effects
Summary

THE EXTRACELLULAR MATRIX OF ANIMAL CONNECTIVE TISSUES
The Extracellular Matrix Is Made and Oriented by the Cells Within It
Glycosaminoglycan (GAG) Chains Occupy Large Amounts of Space and Form Hydrated Gels
Hyaluronan Acts as a Space Filler and a Facilitator of Cell Migration During Tissue Morphogenesis and Repair
Proteoglycans Are Composed of GAG Chains Covalently Linked to a Core Protein
Proteoglycans Can Regulate the Activities of Secreted Proteins
Cell-Surface Proteoglycans Act as Co-Receptors
Collagens Are the Major Proteins of the Extracellular Matrix
Collagen Chains Undergo a Series of Post-Translational Modifications
Propeptides Are Clipped Off Procollagen After Its Secretion to Allow Assembly of Fibrils
Secreted Fibril-Associated Collagens Help Organize the Fibrils
Cells Help Organize the Collagen Fibrils They Secrete by Exerting Tension on the Matrix
Elastin Gives Tissues Their Elasticity
Fibronectin Is an Extracellular Protein That Helps Cells Attach to the Matrix
Tension Exerted by Cells Regulates Assembly of Fibronectin Fibrils
Fibronectin Binds to Integrins Through an RGD Motif
Cells Have to Be Able to Degrade Matrix, as Well as Make it
Matrix Degradation Is Localized to the Vicinity of Cells
Summary

THE PLANT CELL WALL
The Composition of the Cell Wall Depends on the Cell Type
The Tensile Strength of the Cell Wall Allows Plant Cells to Develop Turgor Pressure
The Primary Cell Wall Is Built from Cellulose Microfibrils Interwoven with a Network of Pectic Polysaccharides
Oriented Cell-Wall Deposition Controls Plant Cell Growth
Microtubules Orient Cell-Wall Deposition
Summary
Problems
References


Chapter 20 Cancer

CANCER AS A MICROEVOLUTIONARY PROCESS
Cancer Cells Reproduce Without Restraint and Colonize Other Tissues
Most Cancers Derive from a Single Abnormal Cell
Cancer Cells Contain Somatic Mutations
A Single Mutation Is Not Enough to Cause Cancer
Cancers Develop Gradually from Increasingly Aberrant Cells
Cervical Cancers Are Prevented by Early Detection
Tumor Progression Involves Successive Rounds of Random Inherited Change Followed by Natural Selection
The Epigenetic Changes That Accumulate in Cancer Cells Involve Inherited Chromatin Structures and DNA Methylation
Human Cancer Cells Are Genetically Unstable
Cancerous Growth Often Depends on Defective Control of Cell Death, Cell Differentiation, or Both
Cancer Cells Are Usually Altered in Their Responses to DNA Damage and Other Forms of Stress
Human Cancer Cells Escape a Built-In Limit to Cell Proliferation
A Small Population of Cancer Stem Cells Maintains Many Tumors
How Do Cancer Stem Cells Arise?
To Metastasize, Malignant Cancer Cells Must Survive and Proliferate in a Foreign Environment
Tumors Induce Angiogenesis
The Tumor Microenvironment Influences Cancer Development
Many Properties Typically Contribute to Cancerous Growth
Summary




Many, But Not All, Cancer-Causing Agents Damage DNA
Tumor Initiators Damage DNA; Tumor Promoters Do Not
Viruses and Other Infections Contribute to a Significant Proportion of Human Cancers
Identification of Carcinogens Reveals Ways to Avoid Cancer
Summary


FINDING THE CANCER-CRITICAL GENES
The Identification of Gain-of-Function and Loss-of-Function Mutations Requires Different Methods
Retroviruses Can Act as Vectors for Oncogenes That Alter Cell Behavior
Different Searches for Oncogenes Have Converged on the Same Gene: Ras
Studies of Rare Hereditary Cancer Syndromes First Identified Tumor Suppressor Genes
Tumor Suppressor Genes Can Also Be Identified from Studies of Tumors
Both Genetic and Epigenetic Mechanisms Can Inactivate Tumor Suppressor Genes
Genes Mutated in Cancer Can Be Made Overactive in Many Ways
The Hunt for Cancer-Critical Genes Continues
Summary

THE MOLECULAR BASIS OF CANCER-CELL BEHAVIOR
Studies of Both Developing Embryos and Genetically Engineered Mice Have Helped to Uncover the Function of Cancer-Critical Genes
Many Cancer-Critical Genes Regulate Cell Proliferation
Distinct Pathways May Mediate the Disregulation of Cell-Cycle Progression and the Disregulation of Cell Growth in Cancer Cells
Mutations in Genes That Regulate Apoptosis Allow Cancer Cells to Survive When They Should Not
Mutations in the p53 Gene Allow Many Cancer Cells to Survive and Proliferate Despite DNA Damage
DNA Tumor Viruses Block the Action of Key Tumor Suppressor Proteins
The Changes in Tumor Cells That Lead to Metastasis Are Still Largely a Mystery
Colorectal Cancers Evolve Slowly Via a Succession of Visible Changes
A Few Key Genetic Lesions Are Common to a Large Fraction of Colorectal Cancers
Some Colorectal Cancers Have Defects in DNA Mismatch Repair
The Steps of Tumor Progression Can Often Be Correlated with Specific Mutations
Each Case of Cancer Is Characterized by Its Own Array of Genetic Lesions
Summary

CANCER TREATMENT: PRESENT AND FUTURE
The Search for Cancer Cures Is Difficult but Not Hopeless
Traditional Therapies Exploit the Genetic Instability and Loss of Cell-Cycle Checkpoint Responses in Cancer Cells
New Drugs Can Exploit the Specific Cause of a Tumor's Genetic Instability
Genetic Instability Helps Cancers Become Progressively More Resistant to Therapies
New Therapies Are Emerging from Our Knowledge of Cancer Biology
Small Molecules Can Be Designed to Inhibit Specific Oncogenic Proteins
Tumor Blood Vessels Are Logical Targets for Cancer Therapy
Many Cancers May Be Treatable by Enhancing the Immune Response Against a Specific Tumor
Treating Patients with Several Drugs Simultaneously Has Potential Advantages for Cancer Therapy
Gene Expression Profiling Can Help Classify Cancers into Clinically Meaningful Subgroups


There Is Still Much More to Do
Summary
Problems
References

Chapters 21-25 available on Media DVD-ROM

Chapter 21 Sexual Reproduction: Meiosis, Germ Cells, and Fertilization


OVERVIEW OF SEXUAL REPRODUCTION
The Haploid Phase in Higher Eucaryotes Is Brief
Meiosis Creates Genetic Diversity
Sexual Reproduction Gives Organisms a Competitive Advantage
Summary


MEIOSIS
Gametes Are Produced by Two Meiotic Cell Divisions
Duplicated Homologs (and Sex Chromosomes) Pair During Early Prophase 1
Homolog Pairing Culminates in the Formation of a Synaptonemal Complex
Homolog Segregation Depends on Meiosis-Specific, Kinetochore-Associated Proteins
Meiosis Frequently Goes Wrong
Crossing-Over Enhances Genetic Reassortment
Crossing-Over Is Highly Regulated
Meiosis Is Regulated Differently in Male and Female Mammals
Summary

PRIMORDIAL GERM CELLS AND SEX DETERMINATION IN MAMMALS
Signals from Neighbors Specify PGCs in Mammalian Embryos
PGCs Migrate into the Developing Gonads
The Sry Gene Directs the Developing Mammalian Gonad to Become a Testis
Many Aspects of Sexual Reproduction Vary Greatly between Animal Species
Summary




An Egg Is Highly Specialized for Independent Development
Eggs Develop in Stages
Oocytes Use Special Mechanisms to Grow to Their Large Size
Most Human Oocytes Die Without Maturing
Summary


SPERM
Sperm Are Highly Adapted for Delivering Their DNA to an Egg
Sperm Are Produced Continuously in the Mammalian Testis
Sperm Develop as a Syncytium
Summary


FERTILIZATION
Ejaculated Sperm Become Capacitated in the Female Genital Tract
Capacitated Sperm Bind to the Zona Pellucida and Undergo an Acrosome Reaction
The Mechanism of Sperm-Egg Fusion Is Still Unknown
Sperm Fusion Activates the Egg by Increasing Ca2+ in the Cytosol
The Cortical Reaction Helps Ensure That Only One Sperm Fertilizes the Egg
The Sperm Provides Centrioles as Well as Its Genome to the Zygote
IVF and ICSI Have Revolutionized the Treatment of Human Infertility
Summary
References

Chapter 22 Development of Multicellular Organisms


UNIVERSAL MECHANISMS OF ANIMAL DEVELOPMENT
Animals Share Some Basic Anatomical Features

Multicellular Animals Are Enriched in Proteins Mediating Cell Interactions and Gene Regulation
Regulatory DNA Defines the Program of Development
Manipulation of the Embryo Reveals the Interactions Between Its Cells
Studies of Mutant Animals Identify the Genes That Control Developmental Processes
A Cell Makes Developmental Decisions Long Before It Shows a Visible Change
Cells Have Remembered Positional Values That Reflect Their Location in the Body
Inductive Signals Can Create Orderly Differences Between Initially Identical Cells
Sister Cells Can Be Born Different by an Asymmetric Cell Division
Positive Feedback Can Create Asymmetry Where There Was None Before
Positive Feedback Generates Patterns, Creates All-or-None Outcomes, and Provides Memory
A Small Set of Signaling Pathways, Used Repeatedly, Controls Developmental Patterning
Morphogens Are Long-Range Inducers That Exert Graded Effects
Extracellular Inhibitors of Signal Molecules Shape the Response to the Inducer
Developmental Signals Can Spread Through Tissue in Several Different Ways
Programs That Are Intrinsic to a Cell Often Define the Time-Course of its Development
Initial Patterns Are Established in Small Fields of Cells and Refined by Sequential Induction as the Embryo Grows
Summary

CAENORHABDITIS ELEGANS: DEVELOPMENT FROM THE PERSPECTIVE OF THE INDIVIDUAL CELL
Caenorhabditis elegans Is Anatomically Simple
Cell Fates in the Developing Nematode Are Almost Perfectly Predictable
Products of Maternal-Effect Genes Organize the Asymmetric Division of the Egg
Progressively More Complex Patterns Are Created by Cell-Cell Interactions
Microsurgery and Genetics Reveal the Logic of Developmental Control; Gene Cloning and Sequencing Reveal Its Molecular Mechanisms
Cells Change Over Time in Their Responsiveness to Developmental Signals
Heterochronic Genes Control the Timing of Development
Cells Do Not Count Cell Divisions in Timing Their Internal Programs
Selected Cells Die by Apoptosis as Part of the Program of Development
Summary


DROSOPHILA AND THE MOLECULAR GENETICS OF PATTERN FORMATION: GENESIS OF THE BODY PLAN
The Insect Body Is Constructed as a Series of Segmental Units
Drosophila Begins Its Development as a Syncytium
Genetic Screens Define Groups of Genes Required for Specific Aspects of Early Patterning
Interactions of the Oocyte With Its Surroundings Define the Axes of the Embryo: the Role of the Egg-Polarity Genes
The Dorsoventral Signaling Genes Create a Gradient of a Nuclear Gene Regulatory Protein
Dpp and Sog Set Up a Secondary Morphogen Gradient to Refine the Pattern of the Dorsal Part of the Embryo
The Insect Dorsoventral Axis Corresponds to the Vertebrate Ventrodorsal Axis
Three Classes of Segmentation Genes Refine the Anterior-Posterior Maternal Pattern and Subdivide the Embryo
The Localized Expression of Segmentation Genes Is Regulated by a Hierarchy of Positional Signals
The Modular Nature of Regulatory DNA Allows Genes to Have Multiple Independently Controlled Functions

Egg-Polarity, Gap, and Pair-Rule Genes Create a Transient Pattern That Is Remembered by Other Genes
Summary

HOMEOTIC SELECTOR GENES AND THE PATTERNING OF THE ANTEROPOSTERIOR AXIS
The Hox Code Specifies Anterior-Posterior Differences

Homeotic Selector Genes Code for DNA-Binding Proteins That Interact with Other Gene Regulatory Proteins
The Homeotic Selector Genes Are Expressed Sequentially According to Their Order in the Hox Complex
The Hox Complex Carries a Permanent Record of Positional Information
The Anteroposterior Axis Is Controlled by Hox Selector Genes in Vertebrates Also
Summary



Conditional and Induced Somatic Mutations Make it Possible to Analyze Gene Functions Late in Development
Body Parts of the Adult Fly Develop From Imaginal Discs
Homeotic Selector Genes Are Essential for the Memory of Positional Information in Imaginal Disc Cells
Specific Regulatory Genes Define the Cells That Will Form an Appendage
The Insect Wing Disc Is Divided into Compartments
Four Familiar Signaling Pathways Combine to Pattern the Wing Disc: Wingless, Hedgehog, Dpp, and Notch
The Size of Each Compartment Is Regulated by Interactions Among Its Cells
Similar Mechanisms Pattern the Limbs of Vertebrates
Localized Expression of Specific Classes of Gene Regulatory Proteins Foreshadows Cell Differentiation
Lateral Inhibition Singles Out Sensory Mother Cells Within Proneural Clusters
Lateral Inhibition Drives the Progeny of the Sensory Mother Cell Toward Different Final Fates
Planar Polarity of Asymmetric Divisions is Controlled by Signaling via the Receptor Frizzled
Asymmetric Stem-Cell Divisions Generate Additional Neurons in the Central Nervous System
Asymmetric Neuroblast Divisions Segregate an Inhibitor of Cell Division into Just One of the Daughter Cells
Notch Signaling Regulates the Fine-Grained Pattern of Differentiated Cell Types in Many Different Tissues
Some Key Regulatory Genes Define a Cell Type; Others Can Activate the Program for Creation of an Entire Organ
Summary

CELL MOVEMENTS AND THE SHAPING OF THE VERTEBRATE BODY
The Polarity of the Amphibian Embryo Depends on the Polarity of the Egg
Cleavage Produces Many Cells from One
Gastrulation Transforms a Hollow Ball of Cells into a Three-Layered Structure with a Primitive Gut
The Movements of Gastrulation Are Precisely Predictable
Chemical Signals Trigger the Mechanical Processes
Active Changes of Cell Packing Provide a Driving Force for Gastrulation
Changing Patterns of Cell Adhesion Molecules Force Cells Into New Arrangements
The Notochord Elongates, While the Neural Plate Rolls Up to Form the Neural Tube
A Gene-Expression Oscillator Controls Segmentation of the Mesoderm Into Somites
Delayed Negative Feedback May Generate the Oscillations of the Segmentation Clock
Embryonic Tissues Are Invaded in a Strictly Controlled Fashion by Migratory Cells
The Distribution of Migrant Cells Depends on Survival Factors as Well as Guidance Cues

Left-Right Asymmetry of the Vertebrate Body Derives From Molecular Asymmetry in the Early Embryo
Summary




Mammalian Development Begins With a Specialized Preamble
The Early Mammalian Embryo Is Highly Regulative
Totipotent Embryonic Stem Cells Can Be Obtained From a Mammalian Embryo
Interactions Between Epithelium and Mesenchyme Generate Branching Tubular Structures
Summary


Neurons Are Assigned Different Characters According to the Time and Place Where They Are Born
The Character Assigned to a Neuron at Its Birth Governs the Connections It Will Form
Each Axon or Dendrite Extends by Means of a Growth Cone at Its Tip
The Growth Cone Pilots the Developing Neurite Along a Precisely Defined Path In Vivo
Growth Cones Can Change Their Sensibilities as They Travel
Target Tissues Release Neurotrophic Factors That Control Nerve Cell Growth and Survival
Neuronal Specificity Guides the Formation of Orderly Neural Maps
Axons From Different Regions of the Retina Respond Differently to a Gradient of Repulsive Molecules in the Tectum
Diffuse Patterns of Synaptic Connections Are Sharpened by Activity-Dependent Remodeling
Experience Molds the Pattern of Synaptic Connections in the Brain
Adult Memory and Developmental Synapse Remodeling May Depend on Similar Mechanisms
Summary

PLANT DEVELOPMENT


Arabidopsis Serves as a Model Organism for Plant Molecular Genetics
The Arabidopsis Genome Is Rich in Developmental Control Genes
Embryonic Development Starts by Establishing a Root-Shoot Axis and Then Halts Inside the Seed
The Parts of a Plant Are Generated Sequentially by Meristems
Development of the Seedling Depends on Environmental Signals
Long-Range Hormonal Signals Coordinate Developmental Events in Separate Parts of the Plant
The Shaping of Each New Structure Depends on Oriented Cell Division and Expansion
Each Plant Module Grows From a Microscopic Set of Primordia in a Meristem
Polarized Auxin Transport Controls the Pattern of Primordia in the Meristem
Cell Signaling Maintains the Meristem
Regulatory Mutations Can Transform Plant Topology by Altering Cell Behavior in the Meristem
The Switch to Flowering Depends on Past and Present Environmental Cues
Homeotic Selector Genes Specify the Parts of a Flower
Summary
References

Chapter 23 Specialized Tissues, Stem Cells, and Tissue Renewal

EPIDERMIS AND ITS RENEWAL BY STEM CELLS


Epidermal Cells Form a Multilayered Waterproof Barrier
Differentiating Epidermal Cells Express a Sequence of Different Genes as They Mature
Stem Cells in the Basal Layer Provide for Renewal of the Epidermis
The Two Daughters of a Stem Cell Do Not Always Have to Become Different

The Basal Layer Contains Both Stem Cells and Transit Amplifying Cells
Transit Amplifying Divisions Are Part of the Strategy of Growth Control
Stem Cells of Some Tissues Selectively Retain Original DNA Strands
The Rate of Stem-Cell Division Can Increase Dramatically When New Cells Are Needed Urgently
Many Interacting Signals Govern Epidermal Renewal
The Mammary Gland Undergoes Cycles of Development and Regression
Summary

SENSORY EPITHELIA
Olfactory Sensory Neurons Are Continually Replaced
Auditory Hair Cells Have to Last a Lifetime
Most Permanent Cells Renew Their Parts: the Photoreceptor Cells of the Retina
Summary


THE AIRWAYS AND THE GUT
Adjacent Cell Types Collaborate in the Alveoli of the Lungs
Goblet Cells, Ciliated Cells, and Macrophages Collaborate to Keep the Airways Clean
The Lining of the Small Intestine Renews Itself Faster Than Any Other Tissue
Wnt Signaling Maintains the Gut Stem-Cell Compartment
Notch Signaling Controls Gut Cell Diversification
Ephrin-Eph Signaling Controls the Migrations of Gut Epithelial Cells
Wnt, Hedgehog, PDGF, and BMP Signaling Pathways Combine to Delimit the Stem-Cell Niche
The Liver Functions as an Interface Between the Digestive Tract and the Blood
Liver Cell Loss Stimulates Liver Cell Proliferation
Tissue Renewal Does Not Have to Depend on Stem Cells: Insulin-Secreting Cells in the Pancreas
Summary

BLOOD VESSELS, LYMPHATICS, AND ENDOTHELIAL CELLS
Endothelial Cells Line All Blood Vessels and Lymphatics
Endothelial Tip Cells Pioneer Angiogenesis
Different Types of Endothelial Cells Form Different Types of Vessel
Tissues Requiring a Blood Supply Release VEGF; Notch Signaling Between Endothelial Cells Regulates the Response
Signals from Endothelial Cells Control Recruitment of Pericytes and Smooth Muscle Cells to Form the Vessel Wall
Summary

RENEWAL BY MULTIPOTENT STEM CELLS: BLOOD CELL FORMATION
The Three Main Categories of White Blood Cells Are Granulocytes, Monocytes, and Lymphocytes
The Production of Each Type of Blood Cell in the Bone Marrow Is Individually Controlled
Bone Marrow Contains Hemopoietic Stem Cells
A Multipotent Stem Cell Gives Rise to All Classes of Blood Cells
Commitment Is a Stepwise Process
Divisions of Committed Progenitor Cells Amplify the Number of Specialized Blood Cells
Stem Cells Depend on Contact Signals From Stromal Cells
Factors That Regulate Hemopoiesis Can Be Analyzed in Culture
Erythropoiesis Depends on the Hormone Erythropoietin
Multiple CSFs Influence Neutrophil and Macrophage Production
The Behavior of a Hemopoietic Cell Depends Partly on Chance
Regulation of Cell Survival Is as Important as Regulation of Cell Proliferation
Summary

GENESIS, MODULATION, AND REGENERATION OF SKELETAL MUSCLE
Myoblasts Fuse to Form New Skeletal Muscle Fibers

1463 1464

Muscle Cells Can Vary Their Properties by Changing the Protein Isoforms They Contain 1465
Skeletal Muscle Fibers Secrete Myostatin to Limit Their Own Growth 1465
Some Myoblasts Persist as Quiescent Stem Cells in the Adult 1466
Summary 1467



Fibroblasts Change Their Character in Response to Chemical Signals 1467
The Extracellular Matrix May Influence Connective-Tissue Cell Differentiation by Affecting Cell Shape and Attachment 1468
Osteoblasts Make Bone Matrix 1469
Most Bones Are Built Around Cartilage Models 1470
Bone Is Continually Remodeled by the Cells Within It 1472
Osteoclasts Are Controlled by Signals From Osteoblasts 1473
Fat Cells Can Develop From Fibroblasts 1474
Leptin Secreted by Fat Cells Provides Feedback to Regulate Eating 1475
Summary 1476



Hemopoietic Stem Cells Can Be Used to Replace Diseased Blood Cells with Healthy Ones 1477
Epidermal Stem Cell Populations Can Be Expanded in Culture for Tissue Repair 1477
Neural Stem Cells Can Be Manipulated in Culture 1478
Neural Stem Cells Can Repopulate the Central Nervous System 1478
Stem Cells in the Adult Body Are Tissue-Specific 1479
ES Cells Can Make Any Part of the Body 1480
Patient-Specific ES Cells Could Solve the Problem of Immune Rejection 1481
ES Cells Are Useful for Drug Discovery and Analysis of Disease 1482
Summary 1482
References 1483

Chapter 24 Pathogens, Infection, and Innate Immunity 1485

INTRODUCTION TO PATHOGENS 1486
Pathogens Have Evolved Specific Mechanisms for Interacting with Their Hosts 1486
The Signs and Symptoms of Infection May Be Caused by the Pathogen or by the Host's Responses 1487
Pathogens Are Phylogenetically Diverse 1488
Bacterial Pathogens Carry Specialized Virulence Genes 1489
Fungal and Protozoan Parasites Have Complex Life Cycles with Multiple Forms 1494
All Aspects of Viral Propagation Depend on Host Cell Machinery 1496
Prions Are Infectious Proteins 1498
Infectious Disease Agents Are Linked To Cancer, Heart Disease, and Other Chronic Illnesses 1499
Summary 1501

CELL BIOLOGY OF INFECTION 1501
Pathogens Cross Protective Barriers to Colonize the Host 1501
Pathogens That Colonize Epithelia Must Avoid Clearance by the Host 1502
Intracellular Pathogens Have Mechanisms for Both Entering and Leaving Host Cells 1504
Virus Particles Bind to Molecules Displayed on the Host Cell Surface 1505
Virions Enter Host Cells by Membrane Fusion, Pore Formation, or Membrane Disruption 1506
Bacteria Enter Host Cells by Phagocytosis 1507
Intracellular Eucaryotic Parasites Actively Invade Host Cells 1508
Many Pathogens Alter Membrane Traffic in the Host Cell 151
Viruses and Bacteria Use the Host Cell Cytoskeleton for Intracellular Movement 1514
Viral Infections Take Over the Metabolism of the Host Cell 1517
Pathogens Can Alter the Behavior of the Host Organism to Facilitate the Spread of the Pathogen

Pathogens Evolve Rapidly
Antigenic Variation in Pathogens Occurs by Multiple Mechanisms
Error-Prone Replication Dominates Viral Evolution
Drug-Resistant Pathogens Are a Growing Problem
Summary

1518 1519 1520 1521





Epithelial Surfaces and Defensins Help Prevent Infection 1525
Human Cells Recognize Conserved Features of Pathogens 1526
Complement Activation Targets Pathogens for Phagocytosis or Lysis 1528
Toll-like Proteins and NOD Proteins Are an Ancient Family of Pattern Recognition Receptors 1530
Phagocytic Cells Seek, Engulf, and Destroy Pathogens 1531
Activated Macrophages Contribute to the Inflammatory Response at Sites of Infection 1533
Virus-Infected Cells Take Drastic Measures to Prevent Viral Replication 1534
Natural Killer Cells Induce Virus-Infected Cells to Kill Themselves 1535
Dendritic Cells Provide the Link Between the Innate and Adaptive Immune Systems 1536
Summary 1537
References 1537

Chapter 25 The Adaptive Immune System


LYMPHOCYTES AND THE CELLULAR BASIS OF ADAPTIVE IMMUNITY 1540
Lymphocytes Are Required for Adaptive Immunity 1540
The Innate and Adaptive Immune Systems Work Together 1541
B Lymphocytes Develop in the Bone Marrow; T Lymphocytes Develop in the Thymus 1543
The Adaptive Immune System Works by Clonal Selection 1544
Most Antigens Activate Many Different Lymphocyte Clones 1545
Immunological Memory Involves Both Clonal Expansion and Lymphocyte Differentiation 1545
Immunological Tolerance Ensures That Self Antigens Are Not Normally Attacked 1547
Lymphocytes Continuously Circulate Through Peripheral Lymphoid Organs 1549
Summary 1551

B CELLS AND ANTIBODIES 1551
B Cells Make Antibodies as Both Cell-Surface Antigen Receptors and Secreted Proteins 1552
A Typical Antibody Has Two Identical Antigen-Binding Sites 1552
An Antibody Molecule Is Composed of Heavy and Light Chains 1552
There Are Five Classes of Antibody Heavy Chains, Each with Different Biological Properties 1553
The Strength of an Antibody-Antigen Interaction Depends on Both the Number and the Affinity of the Antigen-Binding Sites 1557
Antibody Light and Heavy Chains Consist of Constant and Variable Regions 1558
The Light and Heavy Chains Are Composed of Repeating Ig Domains 1559
An Antigen-Binding Site Is Constructed from Hypervariable Loops 1560
Summary 1561

THE GENERATION OF ANTIBODY DIVERSITY 1562
Antibody Genes Are Assembled From Separate Gene Segments During B Cell Development 1562
Imprecise Joining of Gene Segments Greatly Increases the Diversity of V Regions 1564
The Control of V(D)J Recombination Ensures That B Cells Are Monospecific 1565
Antigen-Driven Somatic Hypermutation Fine-Tunes Antibody Responses 1566
B Cells Can Switch the Class of Antibody They Make 1567
Summary 1569



T Cell Receptors (TCRs) Are Antibodylike Heterodimers 1570
Antigen Presentation by Dendritic Cells Can Either Activate or Tolerize T Cells 1571
Effector Cytotoxic T Cells Induce Infected Target Cells to Kill Themselves 1572
Effector Helper T Cells Help Activate Other Cells of the Innate and Adaptive Immune Systems 1573
Regulatory T Cells Suppress the Activity of Other T Cells 1574
T Cells Recognize Foreign Peptides Bound to MHC Proteins 1575
MHC Proteins Were Identified in Transplantation Reactions Before Their Functions Were Known 1575
Class I and Class II MHC Proteins Are Structurally Similar Heterodimers 1576
An MHC Protein Binds a Peptide and Interacts with a T Cell Receptor 1577
MHC Proteins Help Direct T Cells to Their Appropriate Targets 1579
CD4 and CD8 Co-Receptors Bind to Invariant Parts of MHC Proteins 1580
Cytotoxic T Cells Recognize Fragments of Foreign Cytosolic Proteins in Association with Class I MHC Proteins 1581
Helper T Cells Respond to Fragments of Endocytosed Foreign Protein Associated with Class II MHC Proteins 1583
Potentially Useful T Cells Are Positively Selected in the Thymus 1585

Most Developing Cytotoxic and Helper T Cells That Could Be Activated by Self-Peptide-MHC Complexes Are Eliminated in the Thymus 1586
Some Organ-specific Proteins Are Ectopically Expressed in the Thymus Medulla 1587
The Function of MHC Proteins Helps Explain Their Polymorphism 1588
Summary 1588

HELPER T CELLS AND LYMPHOCYTE ACTIVATION 1589
Activated Dendritic Cells Use Multiple Mechanisms to Activate T Cells 1590
The Activation of T Cells Is Controlled by Negative Feedback 1591
The Subclass of Effector Helper T Cell Determines the Nature of the Adaptive Immune Response 1592
TH1 Cells Activate Infected Macrophages and Stimulate An Inflammatory Response 1594
Antigen Binding to B Cell Receptors (BCRs) Is Only One Step in B Cell Activation 1595
Antigen-specific Helper T Cells Are Essential for Activating Most B Cells 1597
A Special Class of B Cells Recognize T-Cell-Independent Antigens 1598
Immune Recognition Molecules Belong to the Ancient Ig Superfamily 1599
Summary 1600
References 1600

Acknowledgments

In writing this book we have benefited greatly from the advice of many biologists and biochemists. We would like to thank the following for their suggestions in preparing this edition, as well as those who helped in preparing the first, second, third and fourth editions. (Those who helped on this edition are listed first, those who helped with the first, second, third and fourth editions follow.)

Chapter 1: W. Ford Doolittle (Dalhousie University, Canada), Jennifer Frazier (Exploratorium®, San Francisco), Douglas Kellogg (University of California, Santa Cruz), Eugene Koonin (National Institutes of Health), Mitchell Sogin (Woods Hole Institute)

Chapter 2: Michael Cox (University of Wisconsin, Madison), Christopher Mathews (Oregon State University), Donald Voet (University of Pennsylvania), John Wilson (Baylor College of Medicine)

Chapter 3: David Eisenberg (University of California, Los Angeles), Louise Johnson (University of Oxford), Steve Harrison (Harvard University), Greg Petsko (Brandeis University), Robert Stroud (University of California, San Francisco), Janet Thornton (European Bioinformatics Institute, UK)

Chapter 4: David Allis (The Rockefeller University), Adrian Bird (Wellcome Trust Centre, UK), Gary Felsenfeld (National Institutes of Health), Susan Gasser (University of Geneva, Switzerland), Eric

(Massachusetts Institute of Technology), Joan Steitz (Yale University), Jack Szostak (Harvard Medical School, Howard Hughes Medical Institute), David Tollervey (University of Edinburgh, UK), Alexander Varshavsky (California Institute of Technology), Jonathan Weissman (University of California, San Francisco)

Chapter 7: Raul Andino (University of California, San Francisco), David Bartel (Massachusetts Institute of Technology), Michael Bulger (University of Rochester Medical Center), Michael Green (University of Massachusetts Medical School), Carol Gross (University of California, San Francisco), Frank Holstege (University Medical Center, The Netherlands), Roger Kornberg (Stanford University), Hiten Madhani (University of California, San Francisco), Barbara Panning (University of California, San Francisco), Mark Ptashne (Memorial Sloan-Kettering Center), Ueli Schibler (University of Geneva, Switzerland), Azim Surani (University of Cambridge)

Chapter 8: Wallace Marshall [major contribution] (University of California, San Francisco)

Washington)

Chapter 5: Elizabeth Blackburn (University of California, San Francisco), James Haber (Brandeis University), Nancy Kleckner (Harvard University), Joachim Li (University of California, San Francisco), Thomas Lindahl (Cancer Research, UK), Rodney Rothstein (Columbia University), Aziz Sancar (University of North Carolina, Chapel Hill), Bruce Stillman (Cold Spring Harbor Laboratory), Steven West (Cancer Research, UK), Rick Wood (University of Pittsburgh)

Chapter 9: Wolfgang Baumeister (Max Planck Institute of Biochemistry, Martinsried), Ken Sawin (The Wellcome Trust Centre for Cell Biology, UK), Peter Shaw (John Innes Centre, UK), Werner Kühlbrandt (Max Planck Institute of Biophysics, Frankfurt am Main), Ronald Vale (University of California, San Francisco), Jennifer Lippincott-Schwartz (National Institutes of Health)

Chapter 10: Ari Helenius (Swiss Federal Institute of Technology Zürich, Switzerland), Werner Kühlbrandt (Max Planck Institute of Biophysics, Frankfurt am Main), Dieter Osterhelt (Max Planck Institute of Biochemistry, Martinsried), Kai Simons (Max Planck Institute of Molecular Cell Biology and Genetics, Dresden)

Chapter 11: Wolfhard Almers (Oregon Health and Science University), Robert Edwards (University of California, San Francisco), Bertil Hille (University of Washington), Lily Jan (University of California, San Francisco), Roger Nicoll (University of California, San Francisco), Robert Stroud (University of California, San Francisco), Patrick Williamson (University of Massachusetts, Amherst)

Chapter 6: Raul Andino (University of California, San Francisco), David Bartel (Massachusetts Institute of Technology), Richard Ebright (Rutgers University), Daniel Finley (Harvard University), Joseph Gall (Carnegie Institution of Washington), Michael Green (University of Massachusetts Medical School), Carol Gross (University of California, San Francisco), Christine Guthrie (University of California, San Francisco), Art Horwich (Yale University School of Medicine), Roger Kornberg (Stanford University), Reinhard Lührman (Max Planck Institute of Biophysical Chemistry, Göttingen), Quinn Mitrovich (University of California, San Francisco), Harry Noller (University of California, Santa Cruz), Roy Parker (University of Arizona), Robert Sauer

Chapter 12: Larry Gerace (The Scripps Research Institute), Ramanujan Hegde (National Institutes of Health), Nikolaus Pfanner (University of Freiburg, Germany), Daniel Schnell (University of Massachusetts, Amherst), Karsten Weis (University of California, Berkeley), Susan Wente (Vanderbilt University


(Cancer Research, UK), Laurence Hurst (University of Bath, UK), Jeremy Hyams (University College London), Tony Hyman (Max Planck Institute of Molecular Cell Biology & Genetics, Dresden), Richard Hynes (Massachusetts Institute of Technology), Philip Ingham (University of Sheffield, UK), Norman Iscove (Ontario Cancer Institute, Toronto), David Ish-Horowicz (Cancer Research, UK), Lily Jan (University of California, San Francisco), Charles Janeway (deceased), Tom Jessell (Columbia University), Arthur Johnson (Texas A & M University), Andy Johnston (John Innes Institute, Norwich, UK), E.G. Jordan (Queen Elizabeth College, London), Ron Kaback (University of California, Los Angeles), Ray Keller (University of California, Berkeley), Douglas Kellogg (University of California, Santa Cruz), Regis Kelly (University of California, San Francisco), John Kendrick-Jones (MRC Laboratory of Molecular Biology, Cambridge), Cynthia Kenyon (University of California, San Francisco), Roger Keynes (University of Cambridge), Judith Kimble (University of Wisconsin, Madison), Robert Kingston (Massachusetts General Hospital), Marc Kirschner (Harvard University), Richard Klausner (National Institutes of Health), Nancy Kleckner (Harvard University), Mike Klymkowsky (University of Colorado, Boulder), Kelly Komachi (University of California, San Francisco), Eugene Koonin (National Institutes of Health), Juan Korenbrot (University of California, San Francisco), Tom Kornberg (University of California, San Francisco), Stuart Kornfeld (Washington University, St. Louis), Daniel Koshland (University of California, Berkeley), Marilyn Kozak (University of Pittsburgh), Mark Krasnow (Stanford University), Werner Kühlbrandt (Max Planck Institute for Biophysics, Frankfurt am Main), John Kuriyan (University of California, Berkeley), Robert Kypta (MRC Laboratory for Molecular Cell Biology, London), Peter Lachmann (MRC Center, Cambridge), Ulrich Laemmli (University of Geneva, Switzerland), Trevor Lamb (University of Cambridge), Hartmut Land (Cancer Research, UK), David Lane (University of Dundee, Scotland), Jane Langdale (University of Oxford), Jay Lash (University of Pennsylvania), Peter Lawrence (MRC Laboratory of Molecular Biology, Cambridge), Paul Lazarow (Mount Sinai School of Medicine), Robert J. Lefkowitz (Duke University), Michael Levine (University of California, Berkeley), Warren Levinson (University of California, San Francisco), Alex Levitzki (Hebrew University, Israel), Ottoline Leyser (University of York, UK), Joachim Li (University of California, San Francisco), Tomas Lindahl (Cancer Research, UK), Vishu Lingappa (University of California, San Francisco), Jennifer Lippincott-Schwartz (National Institutes of Health, Bethesda), Dan Littman (New York University School of Medicine), Clive Lloyd (John Innes Institute, Norwich, UK), Richard Losick (Harvard University), Robin Lovell-Badge (National Institute for Medical Research, London), Shirley Lowe (University of California, San Francisco), Laura Machesky (University of Birmingham, UK), James Maller (University of Colorado Medical School), Tom Maniatis (Harvard University), Colin Manoil (Harvard Medical School), Philippa Marrack (National Jewish Medical and Research Center, Denver), Mark Marsh (Institute of Cancer Research, London), Gail Martin (University of California, San Francisco), Paul Martin (University College London), Joan Massagué (Memorial Sloan-Kettering Cancer Center), Brian McCarthy (University of California, Irvine), Richard McCarty (Cornell University), William McGinnis (University of California, Davis), Anne McLaren (Wellcome/Cancer Research Campaign Institute, Cambridge), Frank McNally (University of California, Davis), Freiderick Meins (Freiderich Miescher Institut, Basel), Stephanie Mel (University of California, San Diego), Ira Mellman (Yale University), Barbara Meyer (University of California, Berkeley), Elliot Meyerowitz (California Institute of Technology), Chris Miller (Brandeis University), Robert Mishell (University of Birmingham, UK), Avrion Mitchison (University College London), N.A. Mitchison (University College London), Tim Mitchison (Harvard Medical School), Peter Mombaerts (The Rockefeller University), Mark Mooseker (Yale University), David Morgan (University of California, San Francisco), Michelle Moritz (University of California, San Francisco), Montrose Moses (Duke University), Keith Mostov (University of California, San Francisco), Anne Mudge (University College London), Hans Müller-Eberhard (Scripps Clinic and Research Institute), Alan Munro (University of Cambridge), J. Murdoch Mitchison (Harvard University), Richard Myers (Stanford University), Diana Myles (University of California, Davis), Andrew Murray (Harvard University), Mark E. Nelson (University of Illinois, Urbana-Champaign), Michael Neuberger (MRC Laboratory of Molecular Biology, Cambridge), Walter Neupert (University of Munich, Germany), David Nicholls (University of Dundee, Scotland), Suzanne Noble (University of California, San Francisco), Harry Noller (University of California, Santa Cruz), Jodi Nunnari (University of California, Davis), Paul Nurse (Cancer Research, UK), Duncan O'Dell (deceased), Patrick O'Farrell (University of California, San Francisco), Maynard Olson (University of Washington, Seattle), Stuart Orkin (Children's Hospital, Boston), Terri Orr-Weaver (Massachusetts Institute of Technology), Erin O'Shea (Harvard University), William Otto (Cancer Research, UK), John Owen (University of Birmingham, UK), Dale Oxender (University of Michigan), George Palade (deceased), Barbara Panning (University of California, San Francisco), Roy Parker (University of Arizona, Tucson), William W. Parson (University of Washington, Seattle), Terence Partridge (MRC Clinical Sciences Centre, London), William E. Paul (National Institutes of Health), Tony Pawson (Mount Sinai Hospital, Toronto), Hugh Pelham (MRC Laboratory of Molecular Biology, Cambridge), Robert Perry (Institute of Cancer Research, Philadelphia), Greg Petsko (Brandeis University), Gordon Peters (Cancer Research, UK), David Phillips (The Rockefeller University), Jeremy Pickett-Heaps (The University of Melbourne, Australia), Julie Pitcher (University College London), Jeffrey Pollard (Albert Einstein College of Medicine), Tom Pollard (Yale University), Bruce Ponder (University of Cambridge), Dan Portnoy (University of California, Berkeley), James Priess (University of Washington, Seattle), Darwin Prockop (Tulane University), Dale Purves (Duke University), Efraim Racker (Cornell University), Jordan Raff (Wellcome/CRC Institute, Cambridge), Klaus Rajewsky (University of Cologne, Germany), George Ratcliffe (University of Oxford), Elio Raviola (Harvard Medical School), Martin Rechsteiner (University of Utah, Salt Lake City), David Rees (National Institute for Medical Research, London), Louis Reichardt (University of California, San Francisco), Fred Richards (Yale University), Conly Rieder (Wadsworth Center, Albany), Phillips Robbins (Massachusetts Institute of Technology), Elaine Robson (University of Reading, UK), Robert Roeder (The Rockefeller University), Joel Rosenbaum (Yale University), Janet Rossant (Mount Sinai Hospital, Toronto), Jesse Roth (National Institutes of Health), Jim Rothman (Memorial Sloan-Kettering Cancer Center), Erkki Ruoslahti (La Jolla Cancer Research Foundation), Gary Ruvkun (Massachusetts General Hospital), David Sabatini (New York University), Alan Sachs (University of California, Berkeley), Edward Salmon (University of North Carolina, Chapel Hill), Joshua Sanes (Harvard University), Peter Sarnow (Stanford University), Lisa Satterwhite (Duke University Medical School), Howard Schachman (University of California, Berkeley), Gottfried Schatz (Biozentrum, University of Basel), Randy Schekman (University of California, Berkeley), Richard Scheller (Stanford University), Giampietro Schiavo (Cancer Research, UK), Joseph Schlessinger (New York University Medical Center), Michael Schramm (Hebrew University), Robert Schreiber (Scripps Clinic and Research Institute), James Schwartz (Columbia

University), Ronald Schwartz (National Institutes of Health), François Schweisguth (ENS, Paris), John Scott (University of Manchester, UK), John Sedat (University of California, San Francisco), Peter Selby (Cancer Research, UK), Zvi Sellinger (Hebrew University, Israel), Gregg Semenza (Johns Hopkins University), Philippe Sengel (University of Grenoble, France), Peter Shaw (John Innes Institute, Norwich, UK), Michael Sheetz (Columbia University), David Shima (Cancer Research, UK), Samuel Silverstein (Columbia University), Kai Simons (Max Planck Institute of Molecular Cell Biology and Genetics, Dresden), Melvin I. Simon (California Institute of Technology), Jonathan Slack (Cancer Research, UK), Alison Smith (John Innes Institute, Norfolk, UK), John Maynard Smith (University of Sussex, UK), Frank Solomon (Massachusetts Institute of Technology), Michael Solursh (University of Iowa), Bruce Spiegelman (Harvard Medical School), Timothy Springer (Harvard Medical School), Mathias Sprinzl (University of Bayreuth, Germany), Scott Stachel (University of California, Berkeley), Andrew Staehelin (University of Colorado, Boulder), David Standring (University of California, San Francisco), Margaret Stanley (University of Cambridge), Martha Stark (University of California, San Francisco), Wilfred Stein (Hebrew University, Israel), Malcolm Steinberg (Princeton University), Paul Sternberg (California Institute of Technology), Chuck Stevens (The Salk Institute), Murray Stewart (MRC Laboratory of Molecular Biology, Cambridge), Monroe Strickberger (University of Missouri, St. Louis), Robert Stroud (University of California, San Francisco), Michael Stryker (University of California, San Francisco), William Sullivan (University of California, Santa Cruz), Daniel Szollosi (Institut National de la Recherche Agronomique, France), Jack Szostak (Massachusetts General Hospital), Masatoshi Takeichi (Kyoto University), Clifford Tabin (Harvard Medical School), Diethard Tautz (University of Cologne, Germany), Julie Theriot (Stanford University), Roger Thomas (University of
Bristol, UK), Vernon Thornton (King's College London), Cheryll Tickle (University of Dundee, Scotland), Jim Till (Ontario Cancer Institute, Toronto), Lewis Tilney (University of Pennsylvania), Nick Tonks (Cold Spring Harbor Laboratory), Alain Townsend (Institute of Molecular

Medicine, John Radcliffe Hospital, Oxford), Paul Travers (Anthony Nolan Research Institute, London), Robert Trelstad (UMDNJ, Robert Wood Johnson Medical School), Anthony Trewavas (Edinburgh University, Scotland), Nigel Unwin (MRC Laboratory of Molecular Biology, Cambridge), Victor Vacquier (University of California, San Diego), Harry van der Westen (Wageningen, The Netherlands), Tom Vanaman (University of Kentucky), Harold Varmus (Sloan-Kettering Institute), Alexander Varshavsky (California Institute of Technology), Madhu Wahi (University of California, San Francisco), Virginia Walbot (Stanford University), Frank Walsh (Glaxo-Smithkline-Beecham, UK), Trevor Wang (John Innes Institute, Norwich, UK), Yu-Lie Wang (Worcester Foundation for Biomedical Research), Anne Warner (University College London), Graham Warren (Yale University School of Medicine), Paul Wassarman (Mount Sinai School of Medicine), Clare Waterman-Storer (The Scripps Research Institute), Fiona Watt (Cancer Research, UK), John Watts (John Innes Institute, Norwich, UK), Klaus Weber (Max Planck Institute for Biophysical Chemistry, Göttingen), Martin Weigert (Institute of Cancer Research, Philadelphia), Harold Weintraub (deceased), Karsten Weis (University of California, Berkeley), Irving Weissman (Stanford University), Jonathan Weissman (University of California, San Francisco), Norman Wessells (Stanford University), Judy White (University of Virginia), Steven West (Cancer Research, UK), William Wickner (Dartmouth College), Michael Wilcox (deceased), Lewis T. Williams (Chiron Corporation), Keith Willison (Chester Beatty Laboratories, London), John Wilson (Baylor University), Alan Wolffe (deceased), Richard Wolfenden (University of North Carolina, Chapel Hill), Sandra Wolin (Yale University School of Medicine), Lewis Wolpert (University College London), Rick Wood (Cancer Research, UK), Abraham Worcel (University of Rochester), Nick Wright (Cancer Research, UK), John Wyke (Beatson Institute for Cancer Research, Glasgow), Keith Yamamoto (University of California, San Francisco), Charles
Yocum (University of Michigan, Ann Arbor), Peter Yurchenco (UMDNJ, Robert Wood Johnson Medical School), Rosalind Zalin (University College London), Patricia Zambryski (University of California, Berkeley).

A Note to the Reader

Structure of the Book
Although the chapters of this book can be read independently of one another, they are arranged in a logical sequence of five parts. The first three chapters of Part I cover elementary principles and basic biochemistry. They can serve either as an introduction for those who have not studied biochemistry or as a refresher course for those who have. Part II deals with the storage, expression, and transmission of genetic information. Part III deals with the principles of the main experimental methods for investigating cells. It is not necessary to read these two chapters in order to understand the later chapters, but a reader will find them a useful reference. Part IV discusses the internal organization of the cell. Part V follows the behavior of cells in multicellular systems, starting with cell-cell junctions and extracellular matrix and concluding with two chapters on the immune system. Chapters 21–25 can be found on the Media DVD-ROM which is packaged with each book, providing increased portability for students.

End-of-Chapter Problems
A selection of problems, written by John Wilson and Tim Hunt, now appears in the text at the end of each chapter. The complete solutions to these problems can be found in Molecular Biology of the Cell, Fifth Edition: The Problems Book.

References
A concise list of selected references is included at the end of each chapter. These are arranged in alphabetical order under the main chapter section headings. These references frequently include the original papers in which important discoveries were first reported. Chapter 8 includes several tables giving the dates of crucial developments along with the names of the scientists involved. Elsewhere in the book the policy has been to avoid naming individual scientists.

Media Codes
Media codes are integrated throughout the text to indicate when relevant videos and animations are available on the DVD-ROM.
The four-letter codes are enclosed in brackets and highlighted in color. The interface for the Cell Biology Interactive media player on the DVD-ROM contains a window where you enter the four-letter code. When the code is typed into the interface, the corresponding media item will load into the media player.

Glossary Terms
Throughout the book, boldface type has been used to highlight key terms at the point in a chapter where the main discussion of them occurs. Italic is used to set off important terms with a lesser degree of emphasis. At the end of the book is the expanded glossary, covering technical terms that are part of the common currency of cell biology; it is intended as a first resort for a reader who encounters an unfamiliar term used without explanation.

Nomenclature for Genes and Proteins
Each species has its own conventions for naming genes; the only common feature is that they are always set in italics. In some species (such as humans), gene names are spelled out all in capital letters; in other species (such as zebrafish), all in lower case; in yet others (most mouse genes, for the most part), with the first letter in upper case and rest in lower case; or (as in Drosophila) with different combinations of upper and lower case, according to whether the first mutant allele to be discovered gave a dominant or recessive phenotype. Conventions for naming protein products are equally varied.

This typographical chaos drives everyone crazy. It is not just tiresome and absurd; it is also unsustainable. We cannot independently define a fresh convention for each of the next few million species whose genes we may wish to study. Moreover, there are many occasions, especially in a book such as this, where we need to refer to a gene generically, without specifying the mouse version, the human version, the chick version, or the hippopotamus version, because they are all equivalent for the purposes of the discussion. What convention then should we use?

We have decided in this book to cast aside the conventions for individual species and follow a uniform rule: we write all gene names, like the names of people and places, with the first letter in upper case and the rest in lower case, but all in italics, thus: Apc, Bazooka, Cdc2, Dishevelled, Egl1. The corresponding protein, where it is named after the gene, will be written in the same way, but in roman rather than italic letters: Apc, Bazooka, Cdc2, Dishevelled, Egl1. When it is necessary to specify the organism, this can be done with a prefix to the gene name.

For completeness, we list a few further details of naming rules that we shall follow. In some instances an added letter in the gene name is traditionally used to distinguish between genes that are related by function or evolution; for those genes we put that letter in upper case if it is usual to do so (LacZ, RecA, HoxA4). We use no hyphen to separate added letters or numbers from the rest of the name. Proteins are more of a problem. Many of them have names in their own right, assigned to them before the gene was named.
Such protein names take many forms, although most of them traditionally begin with a lower-case letter (actin, hemoglobin, catalase), like the names of ordinary substances (cheese, nylon), unless they are acronyms (such as GFP, for Green Fluorescent Protein, or BMP4, for Bone Morphogenetic Protein #4). To force all such protein names into a uniform style would do too much violence to established usages, and we shall simply write them in the traditional way (actin, GFP, etc.). For the corresponding gene names in all these cases, we shall nevertheless follow our standard rule: Actin, Hemoglobin, Catalase, Bmp4, Gfp. Occasionally in our book we need to highlight a protein name by setting it in italics for emphasis; the intention will generally be clear from the context. For those who wish to know them, the table below shows some of the official conventions for individual species, conventions that we shall mostly violate in this book, in the manner shown.


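Organism | Species-specific convention: gene | Species-specific convention: protein | Unified convention used in this book: gene | Unified convention used in this book: protein
Mouse | Hoxa4 | Hoxa4 | HoxA4 | HoxA4
Mouse | Bmp4 | BMP4 | Bmp4 | BMP4
Mouse | integrin α-1, Itga1 | integrin α1 | Integrin α1, Itga1 | integrin α1
Human | HOXA4 | HOXA4 | HoxA4 | HoxA4
Zebrafish | cyclops, cyc | Cyclops, Cyc | Cyclops, Cyc | Cyclops, Cyc
Caenorhabditis | unc-6 | UNC-6 | Unc6 | Unc6
Drosophila | sevenless, sev (named after recessive mutant phenotype) | Sevenless, SEV | Sevenless, Sev | Sevenless, Sev
Drosophila | Deformed, Dfd (named after dominant mutant phenotype) | Deformed, DFD | Deformed, Dfd | Deformed, Dfd
Yeast: Saccharomyces cerevisiae (budding yeast) | CDC28 | Cdc28, Cdc28p | Cdc28 | Cdc28
Yeast: Schizosaccharomyces pombe (fission yeast) | Cdc2 | Cdc2, Cdc2p | Cdc2 | Cdc2
Arabidopsis | GAI | GAI | Gai | GAI
E. coli | uvrA | UvrA | UvrA | UvrA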

Ancillaries

Molecular Biology of the Cell, Fifth Edition: The Problems Book
by John Wilson and Tim Hunt (ISBN: 978-0-8153-4110-9)
The Problems Book is designed to help students appreciate the ways in which experiments and simple calculations can lead to an understanding of how cells work. It provides problems to accompany Chapters 1–20 of Molecular Biology of the Cell. Each chapter of problems is divided into sections that correspond to those of the main textbook and review key terms, test for understanding basic concepts, and pose research-based problems. Molecular Biology of the Cell, Fifth Edition: The Problems Book should be useful for homework assignments and as a basis for class discussion. It could even provide ideas for exam questions. Solutions for all of the problems are provided on the CD-ROM which accompanies the book. Solutions for the end-of-chapter problems in the main textbook are also found in The Problems Book.

MBoC5 Media DVD-ROM
The DVD included with every copy of the book contains the figures, tables, and micrographs from the book, pre-loaded into PowerPoint® presentations, one for each chapter. A separate folder contains individual versions of each figure, table, and micrograph in JPEG format. The panels are available in PDF format. There are also over 125 videos, animations, molecular structure tutorials, and high-resolution micrographs on the DVD. The authors have chosen to include material that not only reinforces basic concepts but also expands the content and scope of the book. The multimedia can be accessed either as individual files or through the Cell Biology Interactive media player. As discussed above, the media player has been programmed to work with the Media Codes integrated throughout the book.
A complete table of contents and overview of all electronic resources is contained in the MBoC5 Media Viewing Guide, a PDF file located on the root level of the DVD-ROM and in the Appendix of the media player. The DVD-ROM also contains Chapters 21–25, which cover multicellular systems. The chapters are in PDF format and can be easily printed or searched using Adobe® Acrobat® Reader or other PDF software.

Teaching Supplements
Upon request, teaching supplements for Molecular Biology of the Cell are available to qualified instructors.

MBoC5 Transparency Set
Provides 200 full-color overhead acetate transparencies of the most important figures from the book.

MBoC5 Test Questions
A selection of test questions will be available. Written by Kirsten Benjamin (Amyris Biotechnologies, Emeryville, California) and Linda Huang (University of Massachusetts, Boston), these thought questions will test students' understanding of the chapter material.

MBoC5 Lecture Outlines
Lecture outlines created from the concept heads for the text are provided.

Garland Science Classwire™
All of the teaching supplements on the DVD-ROM (these include figures in PowerPoint and JPEG format; Chapters 21–25 in PDF format; 125 videos, animations, and movies) and the test questions and lecture outlines are available to qualified instructors online at the Garland Science Classwire™ Web site. Garland Science Classwire™ offers access to other instructional resources from all of the Garland Science textbooks, and provides free online course management tools. For additional information, please visit the Web site or e-mail [email protected]. (Classwire is a trademark of Chalkfree, Inc.)

Adobe and Acrobat are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. PowerPoint is either a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.


Chapter 1

Cells and Genomes The surface of our planet is populated by living things—curious, intricately organized chemical factories that take in matter from their surroundings and use these raw materials to generate copies of themselves. The living organisms appear extraordinarily diverse. What could be more different than a tiger and a piece of seaweed, or a bacterium and a tree? Yet our ancestors, knowing nothing of cells or DNA, saw that all these things had something in common. They called that something “life,” marveled at it, struggled to define it, and despaired of explaining what it was or how it worked in terms that relate to nonliving matter. The discoveries of the past century have not diminished the marvel—quite the contrary. But they have lifted away the mystery as to the nature of life. We can now see that all living things are made of cells, and that these units of living matter all share the same machinery for their most basic functions. Living things, though infinitely varied when viewed from the outside, are fundamentally similar inside. The whole of biology is a counterpoint between the two themes: astonishing variety in individual particulars; astonishing constancy in fundamental mechanisms. In this first chapter we begin by outlining the universal features common to all life on our planet. We then survey, briefly, the diversity of cells. And we see how, thanks to the common code in which the specifications for all living organisms are written, it is possible to read, measure, and decipher these specifications to achieve a coherent understanding of all the forms of life, from the smallest to the greatest.







THE UNIVERSAL FEATURES OF CELLS ON EARTH It is estimated that there are more than 10 million (perhaps 100 million) living species on Earth today. Each species is different, and each reproduces itself faithfully, yielding progeny that belong to the same species: the parent organism hands down information specifying, in extraordinary detail, the characteristics that the offspring shall have. This phenomenon of heredity is central to the definition of life: it distinguishes life from other processes, such as the growth of a crystal, or the burning of a candle, or the formation of waves on water, in which orderly structures are generated but without the same type of link between the peculiarities of parents and the peculiarities of offspring. Like the candle flame, the living organism consumes free energy to create and maintain its organization; but the free energy drives a hugely complex system of chemical processes that is specified by the hereditary information. Most living organisms are single cells; others, such as ourselves, are vast multicellular cities in which groups of cells perform specialized functions and are linked by intricate systems of communication. But in all cases, whether we discuss the solitary bacterium or the aggregate of more than 10^13 cells that form a human body, the whole organism has been generated by cell divisions from a single cell. The single cell, therefore, is the vehicle for the hereditary information that defines the species (Figure 1–1). And specified by this information, the cell includes the machinery to gather raw materials from the environment, and to construct out of them a new cell in its own image, complete with a new copy of the hereditary information. Nothing less than a cell has this capability.
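As a rough illustration of the statement that a whole organism is generated by cell divisions from a single cell, the sketch below counts how many rounds of doubling are needed to reach 10^13 cells. It assumes, unrealistically, that every cell divides in every round and that no cell dies; the function name `min_division_rounds` is a hypothetical name chosen for this example.

```python
def min_division_rounds(target_cells: int) -> int:
    """Smallest number of synchronous doubling rounds that takes a single
    cell to at least `target_cells` cells (idealized: no cell death)."""
    if target_cells < 1:
        raise ValueError("target_cells must be at least 1")
    cells = 1
    rounds = 0
    while cells < target_cells:
        cells *= 2      # every cell divides once per round (assumption)
        rounds += 1
    return rounds


HUMAN_BODY_CELLS = 10**13  # order of magnitude quoted in the text
print(min_division_rounds(HUMAN_BODY_CELLS))  # 44
```

Under these idealized assumptions fewer than fifty rounds of division suffice, because the cell number doubles at every round.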











Figure 1–1 The hereditary information in the fertilized egg cell determines the nature of the whole multicellular organism. (A and B) A sea urchin egg gives rise to a sea urchin. (C and D) A mouse egg gives rise to a mouse. (E and F) An egg of the seaweed Fucus gives rise to a Fucus seaweed. (A, courtesy of David McClay; B, courtesy of M. Gibbs, Oxford Scientific Films; C, courtesy of Patricia Calarco, from G. Martin, Science 209:768–776, 1980. With permission from AAAS; D, courtesy of O. Newman, Oxford Scientific Films; E and F, courtesy of Colin Brownlee.)

All Cells Store Their Hereditary Information in the Same Linear Chemical Code (DNA) Computers have made us familiar with the concept of information as a measurable quantity—a million bytes (to record a few hundred pages of text or an image from a digital camera), 600 million for the music on a CD, and so on. They have also made us well aware that the same information can be recorded in many different physical forms. As the computer world has evolved, the discs and tapes that we used 10 years ago for our electronic archives have become unreadable on present-day machines. Living cells, like computers, deal in information, and it is estimated that they have been evolving and diversifying for over 3.5 billion years. It is scarcely to be expected that they should all store their information in the same form, or that the archives of one type of cell should be readable by the information-handling machinery of another. And yet it is so. All living cells on Earth, without any known exception, store their hereditary information in the form of double-stranded molecules of DNA—long unbranched paired polymer chains, formed always of the same four types of monomers. These monomers have nicknames drawn from a four-letter alphabet—A, T, C, G—and they are strung together in a long linear sequence that encodes the genetic information, just as the sequence of 1s and 0s encodes the information in a computer file. We can take a piece of DNA from a human cell and insert it into a bacterium, or a piece of bacterial DNA and insert it into a human cell, and the information will be successfully read, interpreted, and copied. Using chemical methods, scientists can read out the complete sequence of monomers in any DNA molecule—extending for millions of nucleotides—and thereby decipher the hereditary information that each organism contains.
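To make the comparison with a computer file concrete, here is a minimal sketch that records a DNA sequence as bits. With four possible monomers, each nucleotide can be written in 2 bits. The particular assignment of bit patterns to letters is an arbitrary choice made for this illustration, and the function names are hypothetical.

```python
# Arbitrary 2-bit code for the four DNA monomers (an assumption for illustration).
BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
LETTERS = {bits: base for base, bits in BITS.items()}


def dna_to_bits(sequence: str) -> str:
    """Encode a DNA sequence as a string of 0s and 1s, 2 bits per nucleotide."""
    return "".join(BITS[base] for base in sequence.upper())


def bits_to_dna(bits: str) -> str:
    """Decode a bit string produced by dna_to_bits back into DNA letters."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(LETTERS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def storage_bytes(n_nucleotides: int) -> float:
    """Bytes needed to store n nucleotides at 2 bits each (8 bits per byte)."""
    return n_nucleotides * 2 / 8


encoded = dna_to_bits("GTAA")
print(encoded)                   # 10110000
print(bits_to_dna(encoded))      # GTAA
print(storage_bytes(1_000_000))  # 250000.0
```

On this accounting, a million nucleotides occupy about a quarter of a million bytes before any compression.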



All Cells Replicate Their Hereditary Information by Templated Polymerization The mechanisms that make life possible depend on the structure of the double-stranded DNA molecule. Each monomer in a single DNA strand (that is, each nucleotide) consists of two parts: a sugar (deoxyribose) with a phosphate group attached to it, and a base, which may be either adenine (A), guanine (G), cytosine (C) or thymine (T) (Figure 1–2). Each sugar is linked to the next via the phosphate group, creating a polymer chain composed of a repetitive sugar-phosphate backbone with a series of bases protruding from it. The DNA polymer is extended by adding monomers at one end. For a single isolated strand, these can, in principle, be added in any order, because each one links to the next in the same way, through the part of the molecule that is the same for all of them. In the living cell, however, DNA is not synthesized as a free strand in isolation, but on a template formed by a preexisting DNA strand. The bases protruding from the existing strand bind to bases of the strand being synthesized, according to a strict rule defined by the complementary structures of the bases: A binds to T, and C binds to G. This base-pairing holds fresh monomers in place and thereby controls the selection of which one of the four monomers shall be added to the growing strand next. In this way, a double-stranded structure is created, consisting of two exactly complementary sequences of As, Cs, Ts, and Gs. The two strands twist around each other, forming a double helix (Figure 1–2E).
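The base-pairing rule can be sketched in a few lines of code. The example below is a simplified model, not a description of the enzymatic machinery: it walks along a template and lets the rule A–T, C–G select each new monomer. Because the two backbones run in opposite directions, the new strand is reversed before it is returned, so that both strands are written in the same chemical direction. The function name is hypothetical.

```python
# Pairing rule from the text: A binds to T, and C binds to G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def polymerize_on_template(template: str) -> str:
    """Return the strand complementary to `template`.

    Each template base selects its partner; the result is reversed because
    the new strand has the opposite directionality to the template.
    """
    new_strand = []
    for base in template.upper():
        new_strand.append(PAIRING[base])  # base-pairing selects the monomer
    return "".join(reversed(new_strand))


print(polymerize_on_template("GTAA"))                          # TTAC
print(polymerize_on_template(polymerize_on_template("GTAA")))  # GTAA
```

Applying the function twice returns the original sequence, which is what "exactly complementary" means in practice.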


[Figure 1–2 panel labels: (A) building block of DNA: sugar phosphate + base; (B) DNA strand; (C) templated polymerization of new strand, from nucleotide monomers; (D) double-stranded DNA, with sugar-phosphate backbone and hydrogen-bonded base pairs; (E) DNA double helix.]






























Figure 1–2 DNA and its building blocks. (A) DNA is made from simple subunits, called nucleotides, each consisting of a sugar-phosphate molecule with a nitrogen-containing sidegroup, or base, attached to it. The bases are of four types (adenine, guanine, cytosine, and thymine), corresponding to four distinct nucleotides, labeled A, G, C, and T. (B) A single strand of DNA consists of nucleotides joined together by sugar-phosphate linkages. Note that the individual sugar-phosphate units are asymmetric, giving the backbone of the strand a definite directionality, or polarity. This directionality guides the molecular processes by which the information in DNA is interpreted and copied in cells: the information is always “read” in a consistent order, just as written English text is read from left to right. (C) Through templated polymerization, the sequence of nucleotides in an existing DNA strand controls the sequence in which nucleotides are joined together in a new DNA strand; T in one strand pairs with A in the other, and G in one strand with C in the other. The new strand has a nucleotide sequence complementary to that of the old strand, and a backbone with opposite directionality: corresponding to the GTAA... of the original strand, it has ...TTAC. (D) A normal DNA molecule consists of two such complementary strands. The nucleotides within each strand are linked by strong (covalent) chemical bonds; the complementary nucleotides on opposite strands are held together more weakly, by hydrogen bonds. (E) The two strands twist around each other to form a double helix, a robust structure that can accommodate any sequence of nucleotides without altering its basic structure.


Figure 1–3 The copying of genetic information by DNA replication. In this process, the two strands of a DNA double helix are pulled apart, and each serves as a template for synthesis of a new complementary strand. [Diagram labels: parent DNA double helix; template strand; new strand.]

The bonds between the base pairs are weak compared with the sugar-phosphate links, and this allows the two DNA strands to be pulled apart without breakage of their backbones. Each strand then can serve as a template, in the way just described, for the synthesis of a fresh DNA strand complementary to itself—a fresh copy, that is, of the hereditary information (Figure 1–3). In different types of cells, this process of DNA replication occurs at different rates, with different controls to start it or stop it, and different auxiliary molecules to help it along. But the basics are universal: DNA is the information store, and templated polymerization is the way in which this information is copied throughout the living world.
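The replication scheme just described can be modeled in a few lines. In this sketch, which ignores every enzymatic detail, a double helix is represented as a pair of strings, the two strands are separated, and each old strand templates a new partner. The names `complementary_strand` and `replicate` are hypothetical.

```python
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(template: str) -> str:
    """Complement of `template`, reversed to reflect its opposite directionality."""
    return "".join(PAIRING[base] for base in reversed(template.upper()))


def replicate(duplex):
    """Separate the two strands of `duplex` and use each as a template.

    Returns two daughter duplexes, each made of one old strand and one
    newly synthesized strand.
    """
    strand_a, strand_b = duplex
    daughter_1 = (strand_a, complementary_strand(strand_a))  # old a + new b
    daughter_2 = (complementary_strand(strand_b), strand_b)  # new a + old b
    return daughter_1, daughter_2


parent = ("GTAACCGT", complementary_strand("GTAACCGT"))
daughters = replicate(parent)
print(parent)                                              # ('GTAACCGT', 'ACGGTTAC')
print(daughters[0] == parent and daughters[1] == parent)   # True
```

Both daughters carry the same sequence information as the parent, which is the sense in which replication yields a fresh copy of the hereditary information.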

All Cells Transcribe Portions of Their Hereditary Information into the Same Intermediary Form (RNA) To carry out its information-bearing function, DNA must do more than copy itself. It must also express its information, by letting it guide the synthesis of other molecules in the cell. This also occurs by a mechanism that is the same in all living organisms, leading first and foremost to the production of two other key classes of polymers: RNAs and proteins. The process (discussed in detail in Chapters 6 and 7) begins with a templated polymerization called transcription, in which segments of the DNA sequence are used as templates for the synthesis of shorter molecules of the closely related polymer ribonucleic acid, or RNA. Later, in the more complex process of translation, many of these RNA molecules direct the synthesis of polymers of a radically different chemical class—the proteins (Figure 1–4). In RNA, the backbone is formed of a slightly different sugar from that of DNA—ribose instead of deoxyribose—and one of the four bases is slightly different—uracil (U) in place of thymine (T); but the other three bases—A, C, and G—are the same, and all four bases pair with their complementary counterparts in DNA—the A, U, C, and G of RNA with the T, A, G, and C of DNA. During transcription, RNA monomers are lined up and selected for polymerization on a template strand of DNA, just as DNA monomers are selected during replication. The outcome is a polymer molecule whose sequence of nucleotides faithfully represents a part of the cell’s genetic information, even though written in a slightly different alphabet, consisting of RNA monomers instead of DNA monomers. The same segment of DNA can be used repeatedly to guide the synthesis of many identical RNA transcripts. Thus, whereas the cell’s archive of genetic information in the form of DNA is fixed and sacrosanct, the RNA transcripts are mass-produced and disposable (Figure 1–5). 
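Transcription follows the same templating logic with a slightly different alphabet, and can be sketched as below. The pairing table restates the rule in the text (the A, U, C, and G of RNA pair with the T, A, G, and C of DNA). The function name and the short template are assumptions made for illustration, and the sketch ignores where transcription starts and stops.

```python
# DNA template base -> RNA base that pairs with it (rule stated in the text).
RNA_PARTNER = {"T": "A", "A": "U", "G": "C", "C": "G"}


def transcribe(template_dna: str) -> str:
    """Return the RNA made on `template_dna`.

    The transcript is reversed so that it is written in the same chemical
    direction as the template, reflecting its opposite directionality.
    """
    rna = [RNA_PARTNER[base] for base in template_dna.upper()]
    return "".join(reversed(rna))


template = "TTAC"
# The same DNA segment can be used repeatedly: every transcript is identical.
transcripts = [transcribe(template) for _ in range(3)]
print(transcripts)  # ['GUAA', 'GUAA', 'GUAA']
```

The transcript GUAA matches the DNA strand complementary to the template (GTAA), with U in place of T.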
As we shall see, these transcripts function as intermediates in the transfer of genetic information: they mainly serve as messenger RNA (mRNA) to guide the synthesis of proteins according to the genetic instructions stored in the DNA. RNA molecules have distinctive structures that can also give them other specialized chemical capabilities. Being single-stranded, their backbone is flexible, so that the polymer chain can bend back on itself to allow one part of the molecule to form weak bonds with another part of the same molecule. This occurs when segments of the sequence are locally complementary: a ...GGGG... segment, for example, will tend to associate with a ...CCCC... segment. These types of internal associations can cause an RNA chain to fold up into a specific shape that is dictated by its sequence (Figure 1–6). The shape of the RNA molecule, in turn, may enable it to recognize other molecules by binding to them selectively and even, in certain cases, to catalyze chemical changes in the molecules that are bound. As we see in Chapter 6, a few chemical reactions catalyzed by RNA molecules are crucial for several of the most ancient and fundamental processes in living cells, and it has been suggested that more extensive catalysis by RNA played a central part in the early evolution of life.

Figure 1–4 From DNA to protein. Genetic information is read out and put to use through a two-step process. First, in transcription, segments of the DNA sequence are used to guide the synthesis of molecules of RNA. Then, in translation, the RNA molecules are used to guide the synthesis of molecules of protein. [Diagram labels: DNA synthesis (replication), DNA; RNA synthesis (transcription), RNA; protein synthesis (translation), PROTEIN; amino acids.]

Figure 1–5 How genetic information is broadcast for use inside the cell. Each cell contains a fixed set of DNA molecules, its archive of genetic information. A given segment of this DNA guides the synthesis of many identical RNA transcripts, which serve as working copies of the information stored in the archive. Many different sets of RNA molecules can be made by transcribing selected parts of a long DNA sequence, allowing each cell to use its information store differently. [Diagram labels: strand used as a template to direct RNA synthesis; many identical RNA transcripts.]
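The internal pairing that lets an RNA chain fold back on itself (a ...GGGG... segment associating with a ...CCCC... segment) can be illustrated with a simple search for locally complementary segments. This is only a sketch under stated assumptions: it considers A–U and G–C pairs alone, requires a fixed stem length, and insists on a few unpaired nucleotides between the two segments (`min_loop`, an assumed parameter). It is not a structure-prediction method, and the function names are hypothetical.

```python
RNA_PAIRING = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(segment: str) -> str:
    """Sequence that would pair with `segment` when the chain folds back on itself."""
    return "".join(RNA_PAIRING[base] for base in reversed(segment))


def find_stems(rna: str, stem_len: int, min_loop: int = 3):
    """List (i, j) start positions of two segments of length `stem_len`
    that are complementary and separated by at least `min_loop` bases."""
    rna = rna.upper()
    stems = []
    for i in range(len(rna) - stem_len + 1):
        partner = reverse_complement(rna[i:i + stem_len])
        first_j = i + stem_len + min_loop
        for j in range(first_j, len(rna) - stem_len + 1):
            if rna[j:j + stem_len] == partner:
                stems.append((i, j))
    return stems


print(find_stems("GGGGAAAACCCC", stem_len=4))  # [(0, 8)]
```

Here the GGGG run at position 0 can pair with the CCCC run at position 8, leaving the AAAA stretch as an unpaired loop.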

All Cells Use Proteins as Catalysts Protein molecules, like DNA and RNA molecules, are long unbranched polymer chains, formed by stringing together monomeric building blocks drawn from a standard repertoire that is the same for all living cells. Like DNA and RNA, they carry information in the form of a linear sequence of symbols, in the same way as a human message written in an alphabetic script. There are many different protein molecules in each cell, and—leaving out the water—they form most of the cell’s mass. The monomers of protein, the amino acids, are quite different from those of DNA and RNA, and there are 20 types, instead of 4. Each amino acid is built around the same core structure through which it can be linked in a standard way to any other amino acid in the set; attached to this core is a side group that gives each amino acid a distinctive chemical character. Each of the protein molecules, or polypeptides, created by joining amino acids in a particular sequence folds into a precise three-dimensional form with reactive sites on its surface (Figure















Figure 1–6 The conformation of an RNA molecule. (A) Nucleotide pairing between different regions of the same RNA polymer chain causes the molecule to adopt a distinctive shape. (B) The three-dimensional structure of an actual RNA molecule, from hepatitis delta virus, that catalyzes RNA strand cleavage. The blue ribbon represents the sugar-phosphate backbone; the bars represent base pairs. (B, based on A.R. Ferré-D’Amaré, K. Zhou and J.A. Doudna, Nature 395:567–574, 1998. With permission from Macmillan Publishers Ltd.)


Chapter 1: Cells and Genomes

Figure 1–7 How a protein molecule acts as catalyst for a chemical reaction. (A) In a protein molecule the polymer chain folds up into a specific shape defined by its amino acid sequence. A groove in the surface of this particular folded molecule, the enzyme lysozyme, forms a catalytic site. (B) A polysaccharide molecule (red)—a polymer chain of sugar monomers—binds to the catalytic site of lysozyme and is broken apart, as a result of a covalent bond-breaking reaction catalyzed by the amino acids lining the groove.

1–7A). These amino acid polymers thereby bind with high specificity to other molecules and act as enzymes to catalyze reactions that make or break covalent bonds. In this way they direct the vast majority of chemical processes in the cell (Figure 1–7B). Proteins have many other functions as well—maintaining structures, generating movements, sensing signals, and so on—each protein molecule performing a specific function according to its own genetically specified sequence of amino acids. Proteins, above all, are the molecules that put the cell’s genetic information into action. Thus, polynucleotides specify the amino acid sequences of proteins. Proteins, in turn, catalyze many chemical reactions, including those by which new DNA molecules are synthesized, and the genetic information in DNA is used to make both RNA and proteins. This feedback loop is the basis of the autocatalytic, self-reproducing behavior of living organisms (Figure 1–8).

All Cells Translate RNA into Protein in the Same Way

The translation of genetic information from the 4-letter alphabet of polynucleotides into the 20-letter alphabet of proteins is a complex process. The rules of this translation seem in some respects neat and rational, in other respects strangely arbitrary, given that they are (with minor exceptions) identical in all living things. These arbitrary features, it is thought, reflect frozen accidents in the early history of life—chance properties of the earliest organisms that were passed on by heredity and have become so deeply embedded in the constitution of all living cells that they cannot be changed without disastrous effects.

The information in the sequence of a messenger RNA molecule is read out in groups of three nucleotides at a time: each triplet of nucleotides, or codon, specifies (codes for) a single amino acid in a corresponding protein. Since there are 64 (= 4 × 4 × 4) possible codons, all of which occur in nature, but only 20 amino acids, there are necessarily many cases in which several codons correspond to the same amino acid.

The code is read out by a special class of small RNA molecules, the transfer RNAs (tRNAs). Each type of tRNA becomes attached at one end to a specific amino acid, and displays at its other end a specific sequence of three nucleotides—an anticodon—that enables it to recognize, through base-pairing, a particular codon or subset of codons in mRNA (Figure 1–9).

For synthesis of protein, a succession of tRNA molecules charged with their appropriate amino acids have to be brought together with an mRNA molecule and matched up by base-pairing through their anticodons with each of its successive codons. The amino acids then have to be linked together to extend the growing protein chain, and the tRNAs, relieved of their burdens, have to be released.
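The triplet reading scheme described above lends itself to a short worked example. The sketch below builds the 64-entry table of the standard genetic code, counts how many codons correspond to each amino acid, derives the anticodon that would pair with a codon, and reads a short mRNA three nucleotides at a time. The one-letter amino acid abbreviations, the `*` symbol for the three stop codons, the function names, and the toy mRNA are assumptions added here; the passage itself does not list the code or mention stop codons.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid symbols, with codons ordered
# UUU, UUC, UUA, UUG, UCU, ... ("*" marks the three stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon that pairs with the codon, written in its own reading direction."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def translate(mrna: str) -> str:
    """Read an mRNA in successive triplets until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))                    # 64 possible codons
codons_per_aa = Counter(CODON_TABLE.values())
print(codons_per_aa["L"], codons_per_aa["W"])  # 6 codons for leucine, 1 for tryptophan
print(anticodon("UGG"))                    # CCA, the tryptophan anticodon of Figure 1-9
print(translate("AUGUGGUAA"))              # MW (methionine, tryptophan, then stop)
```

The counts make the redundancy concrete: 61 of the 64 codons are shared among only 20 amino acids, so most amino acids are specified by more than one codon.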
This whole complex of processes is carried out by a giant multimolecular machine, the ribosome, formed of two main chains of RNA, called ribosomal RNAs




Figure 1–8 Life as an autocatalytic process. Polynucleotides (nucleotide polymers) and proteins (amino acid polymers) provide the sequence information and the catalytic functions that serve—through a complex set of chemical reactions—to bring about the synthesis of more polynucleotides and proteins of the same types.





(rRNAs), and more than 50 different proteins. This evolutionarily ancient molecular juggernaut latches onto the end of an mRNA molecule and then trundles along it, capturing loaded tRNA molecules and stitching together the amino acids they carry to form a new protein chain (Figure 1–10).

The Fragment of Genetic Information Corresponding to One Protein Is One Gene

DNA molecules as a rule are very large, containing the specifications for thousands of proteins. Individual segments of the entire DNA sequence are transcribed into separate mRNA molecules, with each segment coding for a different protein. Each such DNA segment represents one gene. A complication is that RNA molecules transcribed from the same DNA segment can often be processed in more than one way, so as to give rise to a set of alternative versions of a protein, especially in more complex cells such as those of plants and animals. A gene therefore is defined, more generally, as the segment of DNA sequence corresponding to a single protein or set of alternative protein variants (or to a single catalytic or structural RNA molecule for those genes that produce RNA but not protein). In all cells, the expression of individual genes is regulated: instead of manufacturing its full repertoire of possible proteins at full tilt all the time, the cell adjusts the rate of transcription and translation of different genes independently, according to need. Stretches of regulatory DNA are interspersed among the segments

Figure 1–9 Transfer RNA. (A) A tRNA molecule specific for the amino acid tryptophan. One end of the tRNA molecule has tryptophan attached to it, while the other end displays the triplet nucleotide sequence CCA (its anticodon), which recognizes the tryptophan codon in messenger RNA molecules. (B) The three-dimensional structure of the tryptophan tRNA molecule. Note that the codon and the anticodon in (A) are in antiparallel orientations, like the two strands in a DNA double helix (see Figure 1–2), so that the sequence of the anticodon in the tRNA is read from right to left, while that of the codon in the mRNA is read from left to right.





Figure 1–10 A ribosome at work. (A) The diagram shows how a ribosome moves along an mRNA molecule, capturing tRNA molecules that match the codons in the mRNA and using them to join amino acids into a protein chain. The mRNA specifies the sequence of amino acids. (B) The three-dimensional structure of a bacterial ribosome (pale green and blue), moving along an mRNA molecule (orange beads), with three tRNA molecules (yellow, green, and pink) at different stages in their process of capture and release. The ribosome is a giant assembly of more than 50 individual protein and RNA molecules. (B, courtesy of Joachim Frank, Yanhong Li and Rajendra Agarwal.)











that code for protein, and these noncoding regions bind to special protein molecules that control the local rate of transcription (Figure 1–11). Other noncoding DNA is also present, some of it serving, for example, as punctuation, defining where the information for an individual protein begins and ends. The quantity and organization of the regulatory and other noncoding DNA vary widely from one class of organisms to another, but the basic strategy is universal. In this way, the genome of the cell—that is, the total of its genetic information as embodied in its complete DNA sequence—dictates not only the nature of the cell’s proteins, but also when and where they are to be made.





Life Requires Free Energy

A living cell is a dynamic chemical system, operating far from chemical equilibrium. For a cell to grow or to make a new cell in its own image, it must take in free energy from the environment, as well as raw materials, to drive the necessary synthetic reactions. This consumption of free energy is fundamental to life. When it stops, a cell decays towards chemical equilibrium and soon dies. Genetic information is also fundamental to life. Is there any connection? The answer is yes: free energy is required for the propagation of information. For example, to specify one bit of information—that is, one yes/no choice between two equally probable alternatives—costs a defined amount of free energy that can be calculated. The quantitative relationship involves some deep reasoning and depends on a precise definition of the term “free energy,” discussed in Chapter 2. The basic idea, however, is not difficult to understand intuitively. Picture the molecules in a cell as a swarm of objects endowed with thermal energy, moving around violently at random, buffeted by collisions with one another. To specify genetic information—in the form of a DNA sequence, for example—molecules from this wild crowd must be captured, arranged in a specific order defined by some preexisting template, and linked together in a fixed relationship. The bonds that hold the molecules in their proper places on the template and join them together must be strong enough to resist the disordering effect of thermal motion. The process is driven forward by consumption of free energy, which is needed to ensure that the correct bonds are made, and made robustly. In the simplest case, the molecules can be compared with spring-loaded traps, ready to snap into a more stable, lower-energy attached state when they meet their proper partners; as they snap together into the bonded arrangement, their available stored energy—their free energy—like the energy of the spring in the trap, is released and dissipated as heat. 
In a cell, the chemical processes underlying information transfer are more complex, but the same basic principle applies: free energy has to be spent on the creation of order. To replicate its genetic information faithfully, and indeed to make all its complex molecules according to the correct specifications, the cell therefore requires free energy, which has to be imported somehow from the surroundings.
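The passage says that the free-energy cost of specifying one bit can be calculated but does not give the relationship. One commonly quoted lower bound, used here purely as an assumption and not taken from the text, is k_B T ln 2 per bit (often called the Landauer limit). The sketch below evaluates it at 310 K, a temperature chosen for illustration as roughly that of the human body; the function name is hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23   # J/K, exact SI value
AVOGADRO = 6.02214076e23   # 1/mol, exact SI value


def min_free_energy_per_bit(temperature_kelvin: float) -> float:
    """Assumed lower bound k_B * T * ln 2, in joules, for one yes/no choice."""
    return BOLTZMANN * temperature_kelvin * math.log(2)


per_bit = min_free_energy_per_bit(310.0)        # joules per bit
per_mole_kj = per_bit * AVOGADRO / 1000.0       # kJ per mole of bits

print(f"{per_bit:.2e} J per bit")               # about 3e-21 J
print(f"{per_mole_kj:.2f} kJ per mole of bits") # a little under 2 kJ/mol
```

On this assumption the theoretical minimum is small, on the order of a few kilojoules per mole of bits. The actual free energy a cell spends per nucleotide added is a separate question that this sketch does not address.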







All Cells Function as Biochemical Factories Dealing with the Same Basic Molecular Building Blocks

Because all cells make DNA, RNA, and protein, and these macromolecules are composed of the same set of subunits in every case, all cells have to contain and





manipulate a similar collection of small molecules, including simple sugars, nucleotides, and amino acids, as well as other substances that are universally required for their synthesis. All cells, for example, require the phosphorylated nucleotide ATP (adenosine triphosphate) as a building block for the synthesis of DNA and RNA; and all cells also make and consume this molecule as a carrier of free energy and phosphate groups to drive many other chemical reactions. Although all cells function as biochemical factories of a broadly similar type, many of the details of their small-molecule transactions differ, and it is not as easy as it is for the informational macromolecules to point out the features that are strictly universal. Some organisms, such as plants, require only the simplest of nutrients and harness the energy of sunlight to make from these almost all their own small organic molecules; other organisms, such as animals, feed on living things and obtain many of their organic molecules ready-made. We return to this point below.

All Cells Are Enclosed in a Plasma Membrane Across Which Nutrients and Waste Materials Must Pass

There is, however, at least one other feature of cells that is universal: each one is enclosed by a membrane—the plasma membrane. This container acts as a selective barrier that enables the cell to concentrate nutrients gathered from its environment and retain the products it synthesizes for its own use, while excreting its waste products. Without a plasma membrane, the cell could not maintain its integrity as a coordinated chemical system.

The molecules forming this membrane have the simple physico-chemical property of being amphiphilic—that is, consisting of one part that is hydrophobic (water-insoluble) and another part that is hydrophilic (water-soluble). Such molecules placed in water aggregate spontaneously, arranging their hydrophobic portions to be as much in contact with one another as possible to hide them from the water, while keeping their hydrophilic portions exposed. Amphiphilic molecules of appropriate shape, such as the phospholipid molecules that comprise most of the plasma membrane, spontaneously aggregate in water to form a bilayer that creates small closed vesicles (Figure 1–12). The phenomenon can be demonstrated in a test tube by simply mixing phospholipids and water together; under appropriate conditions, small vesicles form whose aqueous contents are isolated from the external medium. Although the chemical details vary, the hydrophobic tails of the predominant membrane molecules in all cells are hydrocarbon polymers (–CH2–CH2–CH2–), and their spontaneous assembly into a bilayered vesicle is but one of many examples of an important general principle: cells produce molecules whose chemical properties cause them to self-assemble into the structures that a cell needs.

The cell boundary cannot be totally impermeable. If a cell is to grow and reproduce, it must be able to import raw materials and export waste across its plasma membrane.
All cells therefore have specialized proteins embedded in their membrane that transport specific molecules from one side to the other (Figure 1–13). Some of these membrane transport proteins, like some of the proteins that catalyze the fundamental small-molecule reactions inside the cell,

Figure 1–11 Gene regulation by protein binding to regulatory DNA. (A) A diagram of a small portion of the genome of the bacterium Escherichia coli, containing genes (called LacI, LacZ, LacY, and LacA) coding for four different proteins. The protein-coding DNA segments (red) have regulatory and other noncoding DNA segments (yellow) between them. (B) An electron micrograph of DNA from this region, with a protein molecule (encoded by the LacI gene) bound to the regulatory segment; this protein controls the rate of transcription of the LacZ, LacY, and LacA genes. (C) A drawing of the structures shown in (B). (B, courtesy of Jack Griffith.)

[Figure 1–11 labels: site of protein binding shown in micrograph (B); noncoding DNA segments; segment of DNA coding for protein; protein bound to regulatory segment of DNA; scale bar, 2000 nucleotide pairs]



have been so well preserved over the course of evolution that we can recognize the family resemblances between them in comparisons of even the most distantly related groups of living organisms. The transport proteins in the membrane largely determine which molecules enter the cell, and the catalytic proteins inside the cell determine the reactions that those molecules undergo. Thus, by specifying the proteins that the cell is to manufacture, the genetic information recorded in the DNA sequence dictates the entire chemistry of the cell; and not only its chemistry, but also its form and its behavior, for these too are chiefly constructed and controlled by the cell’s proteins.


A Living Cell Can Exist with Fewer Than 500 Genes

The basic principles of biological information transfer are simple enough, but how complex are real living cells? In particular, what are the minimum requirements? We can get a rough indication by considering a species that has one of the smallest known genomes—the bacterium Mycoplasma genitalium (Figure 1–14). This organism lives as a parasite in mammals, and its environment provides it with many of its small molecules ready-made. Nevertheless, it still has to make all the large molecules—DNA, RNAs, and proteins—required for the basic processes of heredity. It has only about 480 genes in its genome of 580,070 nucleotide pairs, representing 145,018 bytes of information—about as much as it takes to record the text of one chapter of this book. Cell biology may be complicated, but it is not impossibly so. The minimum number of genes for a viable cell in today’s environments is probably not less than 200–300, although there are only about 60 genes in the core set shared by all living species without any known exception.
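The figure of 145,018 bytes quoted above follows from simple arithmetic, assuming that each nucleotide pair is one choice among four possibilities and therefore carries 2 bits, and that a byte is 8 bits. The short sketch below reproduces the calculation; the function name is hypothetical, and the average gene length it prints is only a rough quotient that ignores noncoding DNA.

```python
def genome_bytes(nucleotide_pairs: int, bits_per_pair: int = 2) -> int:
    """Whole bytes needed to store a genome at 2 bits per nucleotide pair."""
    total_bits = nucleotide_pairs * bits_per_pair
    return -(-total_bits // 8)  # integer division rounded up


PAIRS = 580_070   # genome size of Mycoplasma genitalium given in the text
GENES = 480       # approximate gene count given in the text

print(genome_bytes(PAIRS))   # 145018
print(round(PAIRS / GENES))  # about 1208 nucleotide pairs per gene on average
```

The exact quotient is 145,017.5 bytes, which rounds up to the 145,018 bytes stated in the text.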


[Figure 1–13B labels, with the number of different transport proteins of each type: sugars (13); amino acids, peptides, amines (14); ions (16); other (3)]


Figure 1–13 Membrane transport proteins. (A) Structure of a molecule of bacteriorhodopsin, from the archaeon (archaebacterium) Halobacterium halobium. This transport protein uses the energy of absorbed light to pump protons (H+ ions) out of the cell. The polypeptide chain threads to and fro across the membrane; in several regions it is twisted into a helical conformation, and the helical segments are arranged to form the walls of a channel through which ions are transported. (B) Diagram of the set of transport proteins found in the membrane of the bacterium Thermotoga maritima. The numbers in parentheses refer to the number of different membrane transport proteins of each type. Most of the proteins within each class are evolutionarily related to one another and to their counterparts in other species.


Figure 1–12 Formation of a membrane by amphiphilic phospholipid molecules. These have a hydrophilic (water-loving, phosphate) head group and a hydrophobic (water-avoiding, hydrocarbon) tail. At an interface between oil and water, they arrange themselves as a single sheet with their head groups facing the water and their tail groups facing the oil. When immersed in water, they aggregate to form bilayers enclosing aqueous compartments.

Figure 1–14 Mycoplasma genitalium. (A) Scanning electron micrograph showing the irregular shape of this small bacterium, reflecting the lack of any rigid wall. (B) Cross section (transmission electron micrograph) of a Mycoplasma cell. Of the 477 genes of Mycoplasma genitalium, 37 code for transfer, ribosomal, and other nonmessenger RNAs. Functions are known, or can be guessed, for 297 of the genes coding for protein: of these, 153 are involved in replication, transcription, translation, and related processes involving DNA, RNA, and protein; 29 in the membrane and surface structures of the cell; 33 in the transport of nutrients and other molecules across the membrane; 71 in energy conversion and the synthesis and degradation of small molecules; and 11 in the regulation of cell division and other processes. (A, from S. Razin et al., Infect. Immun. 30:538–546, 1980. With permission from the American Society for Microbiology; B, courtesy of Roger Cole, in Medical Microbiology, 4th ed. [S. Baron ed.]. Galveston: University of Texas Medical Branch, 1996.)




Summary

Living organisms reproduce themselves by transmitting genetic information to their progeny. The individual cell is the minimal self-reproducing unit, and is the vehicle for transmission of the genetic information in all living species. Every cell on our planet stores its genetic information in the same chemical form—as double-stranded DNA. The cell replicates its information by separating the paired DNA strands and using each as a template for polymerization to make a new DNA strand with a complementary sequence of nucleotides. The same strategy of templated polymerization is used to transcribe portions of the information from DNA into molecules of the closely related polymer, RNA. These in turn guide the synthesis of protein molecules by the more complex machinery of translation, involving a large multimolecular machine, the ribosome, which is itself composed of RNA and protein. Proteins are the principal catalysts for almost all the chemical reactions in the cell; their other functions include the selective import and export of small molecules across the plasma membrane that forms the cell’s boundary. The specific function of each protein depends on its amino acid sequence, which is specified by the nucleotide sequence of a corresponding segment of the DNA—the gene that codes for that protein. In this way, the genome of the cell determines its chemistry; and the chemistry of every living cell is fundamentally similar, because it must provide for the synthesis of DNA, RNA, and protein. The simplest known cells have just under 500 genes.

THE DIVERSITY OF GENOMES AND THE TREE OF LIFE

The success of living organisms based on DNA, RNA, and protein, out of the infinitude of other chemical forms that we might conceive of, has been spectacular. They have populated the oceans, covered the land, infiltrated the Earth’s crust, and molded the surface of our planet. Our oxygen-rich atmosphere, the deposits of coal and oil, the layers of iron ores, the cliffs of chalk and limestone and marble—all these are products, directly or indirectly, of past biological activity on Earth. Living things are not confined to the familiar temperate realm of land, water, and sunlight inhabited by plants and plant-eating animals. They can be found in the darkest depths of the ocean, in hot volcanic mud, in pools beneath the frozen surface of the Antarctic, and buried kilometers deep in the Earth’s crust. The creatures that live in these extreme environments are generally unfamiliar, not only because they are inaccessible, but also because they are mostly microscopic. In more homely habitats, too, most organisms are too small for us to see without special equipment: they tend to go unnoticed, unless they cause a disease or rot the timbers of our houses. Yet microorganisms make up most of the



total mass of living matter on our planet. Only recently, through new methods of molecular analysis and specifically through the analysis of DNA sequences, have we begun to get a picture of life on Earth that is not grossly distorted by our biased perspective as large animals living on dry land. In this section we consider the diversity of organisms and the relationships among them. Because the genetic information for every organism is written in the universal language of DNA sequences, and the DNA sequence of any given organism can be obtained by standard biochemical techniques, it is now possible to characterize, catalogue, and compare any set of living organisms with reference to these sequences. From such comparisons we can estimate the place of each organism in the family tree of living species—the ‘tree of life’. But before describing what this approach reveals, we need first to consider the routes by which cells in different environments obtain the matter and energy they require to survive and proliferate, and the ways in which some classes of organisms depend on others for their basic chemical needs.

Cells Can Be Powered by a Variety of Free Energy Sources

Living organisms obtain their free energy in different ways. Some, such as animals, fungi, and the bacteria that live in the human gut, get it by feeding on other living things or the organic chemicals they produce; such organisms are called organotrophic (from the Greek word trophe, meaning “food”). Others derive their energy directly from the nonliving world. These fall into two classes: those that harvest the energy of sunlight, and those that capture their energy from energy-rich systems of inorganic chemicals in the environment (chemical systems that are far from chemical equilibrium). Organisms of the former class are called phototrophic (feeding on sunlight); those of the latter are called lithotrophic (feeding on rock). Organotrophic organisms could not exist without these primary energy converters, which are the most plentiful form of life.

Phototrophic organisms include many types of bacteria, as well as algae and plants, on which we—and virtually all the living things that we ordinarily see around us—depend. Phototrophic organisms have changed the whole chemistry of our environment: the oxygen in the Earth’s atmosphere is a by-product of their biosynthetic activities.

Lithotrophic organisms are not such an obvious feature of our world, because they are microscopic and mostly live in habitats that humans do not frequent—deep in the ocean, buried in the Earth’s crust, or in various other inhospitable environments. But they are a major part of the living world, and are especially important in any consideration of the history of life on Earth. Some lithotrophs get energy from aerobic reactions, which use molecular oxygen from the environment; since atmospheric O2 is ultimately the product of living organisms, these aerobic lithotrophs are, in a sense, feeding on the products of past life.
There are, however, other lithotrophs that live anaerobically, in places where little or no molecular oxygen is present, in circumstances similar to those that must have existed in the early days of life on Earth, before oxygen had accumulated. The most dramatic of these sites are the hot hydrothermal vents found deep down on the floor of the Pacific and Atlantic Oceans, in regions where the ocean floor is spreading as new portions of the Earth’s crust form by a gradual upwelling of material from the Earth’s interior (Figure 1–15). Downward-percolating seawater is heated and driven back upward as a submarine geyser, carrying with it a current of chemicals from the hot rocks below. A typical cocktail might include H2S, H2, CO, Mn2+, Fe2+, Ni2+, CH4, NH4+, and phosphorus-containing compounds. A dense population of microbes lives in the neighborhood of the vent, thriving on this austere diet and harvesting free energy from reactions between the available chemicals. Other organisms—clams, mussels, and giant marine worms—in turn live off the microbes at the vent, forming an entire ecosystem analogous to the system of plants and animals that we belong to, but powered by geochemical energy instead of light (Figure 1–16).



Figure 1–15 The geology of a hot hydrothermal vent in the ocean floor. Water percolates down toward the hot molten rock upwelling from the Earth’s interior and is heated and driven back upward, carrying minerals leached from the hot rock. A temperature gradient is set up, from more than 350°C near the core of the vent, down to 2–3°C in the surrounding ocean. Minerals precipitate from the water as it cools, forming a chimney. Different classes of organisms, thriving at different temperatures, live in different neighborhoods of the chimney. A typical chimney might be a few meters tall, with a flow rate of 1–2 m/sec.

[Figure 1–15 labels: sea; dark cloud of hot mineral-rich water; hydrothermal vent; anaerobic lithotrophic bacteria; invertebrate animal community; chimney made from precipitated metal sulfides; sea floor; 350°C contour; percolation of seawater; hot mineral solution; hot basalt]

Some Cells Fix Nitrogen and Carbon Dioxide for Others

To make a living cell requires matter, as well as free energy. DNA, RNA, and protein are composed of just six elements: hydrogen, carbon, nitrogen, oxygen, sulfur, and phosphorus. These are all plentiful in the nonliving environment, in the Earth’s rocks, water, and atmosphere, but not in chemical forms that allow easy incorporation into biological molecules. Atmospheric N2 and CO2, in particular, are extremely unreactive, and a large amount of free energy is required to drive the reactions that use these inorganic molecules to make the organic compounds needed for further biosynthesis—that is, to fix nitrogen and carbon dioxide, so as to make N and C available to living organisms. Many types of living cells lack the biochemical machinery to achieve this fixation, and rely on other classes of cells to do the job for them. We animals depend on plants for our supplies of




Figure 1–16 Living organisms at a hot hydrothermal vent. Close to the vent, at temperatures up to about 120°C, various lithotrophic species of bacteria and archaea (archaebacteria) live, directly fuelled by geochemical energy. A little further away, where the temperature is lower, various invertebrate animals live by feeding on these microorganisms. Most remarkable are the giant (2-meter) tube worms, which, rather than feed on the lithotrophic cells, live in symbiosis with them: specialized organs in the worms harbor huge numbers of symbiotic sulfur-oxidizing bacteria. These bacteria harness geochemical energy and supply nourishment to their hosts, which have no mouth, gut, or anus. The dependence of the tube worms on the bacteria for the harnessing of geothermal energy is analogous to the dependence of plants on chloroplasts for the harnessing of solar energy, discussed later in this chapter. The tube worms, however, are thought to have evolved from more conventional animals, and to have become secondarily adapted to life at hydrothermal vents. (Courtesy of Dudley Foster, Woods Hole Oceanographic Institution.)


Figure 1–17 Shapes and sizes of some bacteria. Although most are small, as shown, measuring a few micrometers in linear dimension, there are also some giant species. An extreme example (not shown) is the cigar-shaped bacterium Epulopiscium fishelsoni, which lives in the gut of a surgeonfish and can be up to 600 µm long.

[Figure 1–17 labels: spherical cells, e.g., Streptococcus; rod-shaped cells, e.g., Escherichia coli, Vibrio cholerae; the smallest cells, e.g., Mycoplasma, Spiroplasma; spiral cells, e.g., Treponema pallidum; scale bar, 2 µm]

organic carbon and nitrogen compounds. Plants in turn, although they can fix carbon dioxide from the atmosphere, lack the ability to fix atmospheric nitrogen, and they depend in part on nitrogen-fixing bacteria to supply their need for nitrogen compounds. Plants of the pea family, for example, harbor symbiotic nitrogen-fixing bacteria in nodules in their roots. Living cells therefore differ widely in some of the most basic aspects of their biochemistry. Not surprisingly, cells with complementary needs and capabilities have developed close associations. Some of these associations, as we see below, have evolved to the point where the partners have lost their separate identities altogether: they have joined forces to form a single composite cell.

The Greatest Biochemical Diversity Exists Among Procaryotic Cells

From simple microscopy, it has long been clear that living organisms can be classified on the basis of cell structure into two groups: the eucaryotes and the procaryotes. Eucaryotes keep their DNA in a distinct membrane-enclosed intracellular compartment called the nucleus. (The name is from the Greek, meaning “truly nucleated,” from the words eu, “well” or “truly,” and karyon, “kernel” or “nucleus”.) Procaryotes have no distinct nuclear compartment to house their DNA. Plants, fungi, and animals are eucaryotes; bacteria are procaryotes, as are archaea—a separate class of procaryotic cells, discussed below.

Most procaryotic cells are small and simple in outward appearance (Figure 1–17), and they live mostly as independent individuals or in loosely organized communities, rather than as multicellular organisms. They are typically spherical or rod-shaped and measure a few micrometers in linear dimension. They often have a tough protective coat, called a cell wall, beneath which a plasma membrane encloses a single cytoplasmic compartment containing DNA, RNA, proteins, and the many small molecules needed for life. In the electron microscope, this cell interior appears as a matrix of varying texture without any discernible organized internal structure (Figure 1–18).

Figure 1–18 The structure of a bacterium. (A) The bacterium Vibrio cholerae, showing its simple internal organization. Like many other species, Vibrio has a helical appendage at one end—a flagellum—that rotates as a propeller to drive the cell forward. (B) An electron micrograph of a longitudinal section through the widely studied bacterium Escherichia coli (E. coli). This is related to Vibrio but has many flagella (not visible in this section) distributed over its surface. The cell’s DNA is concentrated in the lightly stained region. (B, courtesy of E. Kellenberger.)


cell wall


1 mm

ribosomes (A)


1 mm






Figure 1–19 The phototrophic bacterium Anabaena cylindrica viewed in the light microscope. The cells of this species form long, multicellular filaments. Most of the cells (labeled V) perform photosynthesis, while others become specialized for nitrogen fixation (labeled H), or develop into resistant spores (labeled S). (Courtesy of Dave G. Adams.) Scale bar: 10 µm.

Procaryotic cells live in an enormous variety of ecological niches, and they are astonishingly varied in their biochemical capabilities—far more so than eucaryotic cells. Organotrophic species can utilize virtually any type of organic molecule as food, from sugars and amino acids to hydrocarbons and methane gas. Phototrophic species (Figure 1–19) harvest light energy in a variety of ways, some of them generating oxygen as a byproduct, others not. Lithotrophic species can feed on a plain diet of inorganic nutrients, getting their carbon from CO2, and relying on H2S to fuel their energy needs (Figure 1–20)—or on H2, or Fe2+, or elemental sulfur, or any of a host of other chemicals that occur in the environment. Many parts of this world of microscopic organisms are virtually unexplored. Traditional methods of bacteriology have given us an acquaintance with those species that can be isolated and cultured in the laboratory. But DNA sequence analysis of the populations of bacteria in samples from natural habitats—such as soil or ocean water, or even the human mouth—has opened our eyes to the fact that most species cannot be cultured by standard laboratory techniques. According to one estimate, at least 99% of procaryotic species remain to be characterized.

The Tree of Life Has Three Primary Branches: Bacteria, Archaea, and Eucaryotes

The classification of living things has traditionally depended on comparisons of their outward appearances: we can see that a fish has eyes, jaws, backbone, brain, and so on, just as we do, and that a worm does not; that a rosebush is cousin to an apple tree, but less similar to a grass. As Darwin showed, we can readily interpret such close family resemblances in terms of evolution from common ancestors, and we can find the remains of many of these ancestors preserved in the fossil record. In this way, it has been possible to begin to draw a family tree of living organisms, showing the various lines of descent, as well as branch points in the history, where the ancestors of one group of species became different from those of another.

When the disparities between organisms become very great, however, these methods begin to fail. How do we decide whether a fungus is closer kin to a plant or to an animal? When it comes to procaryotes, the task becomes harder still: one microscopic rod or sphere looks much like another. Microbiologists have therefore sought to classify procaryotes in terms of their biochemistry and nutritional requirements. But this approach also has its pitfalls. Amid the bewildering variety of biochemical behaviors, it is difficult to know which differences truly reflect differences of evolutionary history.

Genome analysis has given us a simpler, more direct, and more powerful way to determine evolutionary relationships. The complete DNA sequence of an organism defines its nature with almost perfect precision and in exhaustive detail. Moreover, this specification is in a digital form, a string of letters, that can be entered straightforwardly into a computer and compared with the corresponding information for any other living thing. Because DNA is subject to random changes that accumulate over long periods of time (as we shall see shortly), the number of differences between the DNA sequences of two organisms can provide a direct, objective, quantitative indication of the evolutionary distance between them. This approach has shown that the organisms that were traditionally classed together as “bacteria” can be as widely divergent in their evolutionary origins as

Figure 1–20 A lithotrophic bacterium. Beggiatoa, which lives in sulfurous environments, gets its energy by oxidizing H2S and can fix carbon even in the dark. Note the yellow deposits of sulfur inside the cells. (Courtesy of Ralph W. Wolfe.) Scale bar: 6 µm.


Chapter 1: Cells and Genomes







Figure 1–21 labels: human, Haloferax, Aeropyrum, cyanobacteria, Dictyostelium, Euglena, E. coli, Thermotoga, Aquifex, Trypanosoma, Giardia. Root of the tree: common ancestor cell. Scale bar: 1 change/10 nucleotides.


Figure 1–21 The three major divisions (domains) of the living world. Note that traditionally the word bacteria has been used to refer to procaryotes in general, but more recently has been redefined to refer to eubacteria specifically. The tree shown here is based on comparisons of the nucleotide sequence of a ribosomal RNA subunit in the different species, and the distances in the diagram represent estimates of the numbers of evolutionary changes that have occurred in this molecule in each lineage (see Figure 1–22). The parts of the tree shrouded in gray cloud represent uncertainties about details of the true pattern of species divergence in the course of evolution: comparisons of nucleotide or amino acid sequences of molecules other than rRNA, as well as other arguments, lead to somewhat different trees. There is general agreement, however, as to the early divergence of the three most basic domains—the bacteria, the archaea, and the eucaryotes.

is any procaryote from any eucaryote. It now appears that the procaryotes comprise two distinct groups that diverged early in the history of life on Earth, either before the ancestors of the eucaryotes diverged as a separate group or at about the same time. The two groups of procaryotes are called the bacteria (or eubacteria) and the archaea (or archaebacteria). The living world therefore has three major divisions or domains: bacteria, archaea, and eucaryotes (Figure 1–21). Archaea are often found inhabiting environments that we humans avoid, such as bogs, sewage treatment plants, ocean depths, salt brines, and hot acid springs, although they are also widespread in less extreme and more homely environments, from soils and lakes to the stomachs of cattle. In outward appearance they are not easily distinguished from bacteria. At a molecular level, archaea seem to resemble eucaryotes more closely in their machinery for handling genetic information (replication, transcription, and translation), but bacteria more closely in their apparatus for metabolism and energy conversion. We discuss below how this might be explained.

Some Genes Evolve Rapidly; Others Are Highly Conserved

Both in the storage and in the copying of genetic information, random accidents and errors occur, altering the nucleotide sequence, that is, creating mutations. Therefore, when a cell divides, its two daughters are often not quite identical to one another or to their parent. On rare occasions, the error may represent a change for the better; more probably, it will cause no significant difference in the cell’s prospects; and in many cases, the error will cause serious damage, for example by disrupting the coding sequence for a key protein. Changes due to mistakes of the first type will tend to be perpetuated, because the altered cell has an increased likelihood of reproducing itself. Changes due to mistakes of the second type (selectively neutral changes) may be perpetuated or not: in the competition for limited resources, it is a matter of chance whether the altered cell or its cousins will succeed. But changes that cause serious damage lead nowhere: the cell that suffers them dies, leaving no progeny. Through endless repetition of this cycle of error and trial, of mutation and natural selection, organisms evolve: their genetic specifications change, giving them new ways to exploit the environment more effectively, to survive in competition with others, and to reproduce successfully.

Clearly, some parts of the genome change more easily than others in the course of evolution. A segment of DNA that does not code for protein and has no significant regulatory role is free to change at a rate limited only by the frequency of random errors. In contrast, a gene that codes for a highly optimized essential protein or RNA molecule cannot alter so easily: when mistakes occur, the faulty cells are almost always eliminated. Genes of this latter sort are therefore highly conserved. Through 3.5 billion years or more of evolutionary history, many features of the genome have changed beyond all recognition; but the most highly conserved genes remain perfectly recognizable in all living species. These latter genes are the ones we must examine if we wish to trace family relationships between the most distantly related organisms in the tree of life.

The studies that led to the classification of the living world into the three domains of bacteria, archaea, and eucaryotes were based chiefly on analysis of one of the two main RNA components of the ribosome, the so-called small-subunit ribosomal RNA. Because translation is fundamental to all living cells, this component of the ribosome has been well conserved since early in the history of life on Earth (Figure 1–22).

Most Bacteria and Archaea Have 1000–6000 Genes

Natural selection has generally favored those procaryotic cells that can reproduce the fastest by taking up raw materials from their environment and replicating themselves most efficiently, at the maximal rate permitted by the available food supplies. Small size implies a large ratio of surface area to volume, thereby helping to maximize the uptake of nutrients across the plasma membrane and boosting a cell’s reproductive rate. Presumably for these reasons, most procaryotic cells carry very little superfluous baggage; their genomes are small, with genes packed closely together and minimal quantities of regulatory DNA between them.

The small genome size makes it relatively easy to determine the complete DNA sequence. We now have this information for many species of bacteria and archaea, and a few species of eucaryotes. As shown in Table 1–1, most bacterial and archaeal genomes contain between 10^6 and 10^7 nucleotide pairs, encoding 1000–6000 genes.

A complete DNA sequence reveals both the genes an organism possesses and the genes it lacks. When we compare the three domains of the living world, we can begin to see which genes are common to all of them and must therefore have been present in the cell that was ancestral to all present-day living things, and which genes are peculiar to a single branch in the tree of life. To explain the findings, however, we need to consider a little more closely how new genes arise and genomes evolve.
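The remark that small size implies a large ratio of surface area to volume can be checked with a short calculation. The sketch below assumes an idealized spherical cell, which is a simplification (many procaryotes are rod-shaped), and the radii are illustrative values chosen for the example, not measurements from the text.

```python
import math

def surface_to_volume(radius_um: float) -> float:
    """Surface-area-to-volume ratio (per micrometer) of a sphere."""
    surface = 4.0 * math.pi * radius_um ** 2
    volume = (4.0 / 3.0) * math.pi * radius_um ** 3
    return surface / volume  # algebraically equal to 3 / radius

# Illustrative radii only (assumed values).
for radius in (0.5, 5.0, 50.0):
    ratio = surface_to_volume(radius)
    print(f"radius {radius:5.1f} um -> surface/volume = {ratio:.2f} per um")
```

For a sphere the ratio reduces to 3/r, so each tenfold increase in radius cuts the membrane area available per unit of cell volume tenfold.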














Figure 1–22 alignment rows, top to bottom: human, Methanococcus, E. coli, human.

Figure 1–22 Genetic information conserved since the days of the last common ancestor of all living things. A part of the gene for the smaller of the two main RNA components of the ribosome is shown. (The complete molecule is about 1500–1900 nucleotides long, depending on species.) Corresponding segments of nucleotide sequence from an archaean (Methanococcus jannaschii), a bacterium (Escherichia coli) and a eucaryote (Homo sapiens) are aligned. Sites where the nucleotides are identical between species are indicated by a vertical line; the human sequence is repeated at the bottom of the alignment so that all three two-way comparisons can be seen. A dot halfway along the E. coli sequence denotes a site where a nucleotide has been either deleted from the bacterial lineage in the course of evolution, or inserted in the other two lineages. Note that the sequences from these three organisms, representative of the three domains of the living world, all differ from one another to a roughly similar degree, while still retaining unmistakable similarities.
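The kind of two-way comparison described in this caption can be sketched in a few lines. The three aligned fragments below are hypothetical, invented for illustration (the actual sequences of the figure are not reproduced here), and the function name `compare` is likewise an assumption. A `-` plays the role of the dot that marks a deleted or inserted site.

```python
from itertools import combinations

# Hypothetical aligned fragments; '-' marks a gap in the alignment.
ALIGNED = {
    "archaean":  "GCCGGAGGTAGCTCAAT",
    "bacterium": "GCTGGCGG-AGCTCAGT",
    "eucaryote": "GCCGGTGGTAGTTCAAT",
}

def compare(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (identical sites, compared sites), skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    identical = sum(1 for a, b in pairs if a == b)
    return identical, len(pairs)

# All three two-way comparisons, as in the figure.
for name_a, name_b in combinations(ALIGNED, 2):
    same, total = compare(ALIGNED[name_a], ALIGNED[name_b])
    print(f"{name_a} vs {name_b}: {same}/{total} identical, "
          f"{(total - same) / total:.2f} differences per site")
```

Counting differences per compared site is only the simplest possible distance measure; it takes no account of repeated changes at the same site.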



Table 1–1 Some Genomes That Have Been Completely Sequenced

SPECIES | SPECIAL FEATURES | HABITAT | GENOME SIZE (1000s of nucleotide pairs) | NUMBER OF GENES

BACTERIA
Mycoplasma genitalium | has one of the smallest of all known cell genomes | human genital tract | |
Synechocystis sp. | photosynthetic, oxygen-generating (cyanobacterium) | lakes and streams | |
Escherichia coli | laboratory favorite | human gut | 4639 | 4289
Helicobacter pylori | causes stomach ulcers and predisposes to stomach cancer | human stomach | 1667 | 1590
Bacillus anthracis | causes anthrax | soil | 5227 | 5634
Aquifex aeolicus | lithotrophic; lives at high temperatures | hydrothermal vents | 1551 | 1544
Streptomyces coelicolor | source of antibiotics; giant genome | soil | 8667 | 7825
Treponema pallidum | spirochete; causes syphilis | human tissues | 1138 | 1041
Rickettsia prowazekii | bacterium most closely related to mitochondria; causes typhus | lice and humans (intracellular parasite) | 1111 | 834
Thermotoga maritima | organotrophic; lives at very high temperatures | hydrothermal vents | |

ARCHAEA
Methanococcus jannaschii | lithotrophic, anaerobic, methane-producing | hydrothermal vents | |
Archaeoglobus fulgidus | lithotrophic or organotrophic, anaerobic, sulfate-reducing | hydrothermal vents | |
Nanoarchaeum equitans | smallest known archaean; anaerobic; parasitic on another, larger archaean | hydrothermal and volcanic hot vents | |

EUCARYOTES
Saccharomyces cerevisiae (budding yeast) | minimal model eucaryote | grape skins, beer | |
Arabidopsis thaliana (Thale cress) | model organism for flowering plants | soil and air | |
Caenorhabditis elegans (nematode worm) | simple animal with perfectly predictable development | | |
Drosophila melanogaster (fruit fly) | key to the genetics of animal development | rotting fruit | |
Homo sapiens (human) | most intensively studied mammal | | |

Genome size and gene number vary between strains of a single species, especially for bacteria and archaea. The table shows data for particular strains that have been sequenced. For eucaryotes, many genes can give rise to several alternative variant proteins, so that the total number of proteins specified by the genome is substantially greater than the number of genes.
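The close packing of genes in procaryotic genomes can be read off Table 1–1 by dividing genome size by gene number. The sketch below uses only the seven bacterial rows whose figures are legible in this text. The pairing of each number with a species follows the order in which they appear in the table and should be treated as an assumption, as should the function name `pairs_per_gene`.

```python
# (genome size in thousands of nucleotide pairs, number of genes).
# Pairing of values with species is assumed from their order in Table 1-1.
GENOMES = {
    "Escherichia coli": (4639, 4289),
    "Helicobacter pylori": (1667, 1590),
    "Bacillus anthracis": (5227, 5634),
    "Aquifex aeolicus": (1551, 1544),
    "Streptomyces coelicolor": (8667, 7825),
    "Treponema pallidum": (1138, 1041),
    "Rickettsia prowazekii": (1111, 834),
}

def pairs_per_gene(size_kb: int, genes: int) -> float:
    """Average number of nucleotide pairs of genome per gene."""
    return size_kb * 1000 / genes

for species, (size_kb, genes) in GENOMES.items():
    density = pairs_per_gene(size_kb, genes)
    print(f"{species:26s} {density:7.0f} nucleotide pairs per gene")
```

On these figures every genome listed comes out at roughly one gene per thousand nucleotide pairs, which is what genes packed closely together amounts to in numbers.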

New Genes Are Generated from Preexisting Genes

The raw material of evolution is the DNA sequence that already exists: there is no natural mechanism for making long stretches of new random sequence. In this sense, no gene is ever entirely new. Innovation can, however, occur in several ways (Figure 1–23):

1. Intragenic mutation: an existing gene can be modified by changes in its DNA sequence, through various types of error that occur mainly in the process of DNA replication.
2. Gene duplication: an existing gene can be duplicated so as to create a pair of initially identical genes within a single cell; these two genes may then diverge in the course of evolution.
3. Segment shuffling: two or more existing genes can be broken and rejoined to make a hybrid gene consisting of DNA segments that originally belonged to separate genes.
4. Horizontal (intercellular) transfer: a piece of DNA can be transferred from the genome of one cell to that of another, even to that of another species. This process is in contrast with the usual vertical transfer of genetic information from parent to progeny.

Each of these types of change leaves a characteristic trace in the DNA sequence of the organism, providing clear evidence that all four processes have occurred. In later chapters we discuss the underlying mechanisms, but for the present we focus on the consequences.

Figure 1–23 labels: gene; gene A; gene B; organism A; organism B; organism B with new gene.

Gene Duplications Give Rise to Families of Related Genes Within a Single Cell

A cell duplicates its entire genome each time it divides into two daughter cells. However, accidents occasionally result in the inappropriate duplication of just part of the genome, with retention of original and duplicate segments in a single cell. Once a gene has been duplicated in this way, one of the two gene copies is free to mutate and become specialized to perform a different function within the same cell. Repeated rounds of this process of duplication and divergence, over many millions of years, have enabled one gene to give rise to a family of genes that may all be found within a single genome. Analysis of the DNA sequence of procaryotic genomes reveals many examples of such gene families: in Bacillus subtilis, for example, 47% of the genes have one or more obvious relatives (Figure 1–24).

When genes duplicate and diverge in this way, the individuals of one species become endowed with multiple variants of a primordial gene. This evolutionary

Figure 1–23 Four modes of genetic innovation and their effects on the DNA sequence of an organism. A special form of horizontal transfer occurs when two different types of cells enter into a permanent symbiotic association. Genes from one of the cells then may be transferred to the genome of the other, as we shall see below when we discuss mitochondria and chloroplasts.


Figure 1–24 Families of evolutionarily related genes in the genome of Bacillus subtilis. The biggest family consists of 77 genes coding for varieties of ABC transporters, a class of membrane transport proteins found in all three domains of the living world. (Adapted from F. Kunst et al., Nature 390:249–256, 1997. With permission from Macmillan Publishers Ltd.) Gene counts shown in the figure: 2126 genes with no family relationship; 568 genes in families with 2 gene members; 273 genes in families with 3 gene members; 764 genes in families with 4–19 gene members; 283 genes in families with 38–77 gene members.
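The 47% figure quoted for Bacillus subtilis can be recovered from the gene counts in Figure 1–24. The short tally below is only an arithmetic check; the dictionary keys are shortened labels introduced for the example.

```python
# Gene counts read from Figure 1-24 (Bacillus subtilis).
FAMILY_COUNTS = {
    "no family relationship": 2126,
    "families with 2 members": 568,
    "families with 3 members": 273,
    "families with 4-19 members": 764,
    "families with 38-77 members": 283,
}

total = sum(FAMILY_COUNTS.values())
with_relatives = total - FAMILY_COUNTS["no family relationship"]
share = with_relatives / total

print(f"{with_relatives} of {total} genes ({share:.0%}) have at least one relative")
```

The counts add up to 4014 genes, of which 1888 belong to a family of two or more, in agreement with the 47% stated in the text.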

process has to be distinguished from the genetic divergence that occurs when one species of organism splits into two separate lines of descent at a branch point in the family tree (when the human line of descent became separate from that of chimpanzees, for example). There, the genes gradually become different in the course of evolution, but they are likely to continue to have corresponding functions in the two sister species. Genes that are related by descent in this way (that is, genes in two separate species that derive from the same ancestral gene in the last common ancestor of those two species) are called orthologs. Related genes that have resulted from a gene duplication event within a single genome, and are likely to have diverged in their function, are called paralogs. Genes that are related by descent in either way are called homologs, a general term used to cover both types of relationship (Figure 1–25).

The family relationships between genes can become quite complex (Figure 1–26). For example, an organism that possesses a family of paralogous genes (for example, the seven hemoglobin genes α, β, γ, δ, ε, ζ, and θ) may evolve into two separate species (such as humans and chimpanzees) each possessing the entire set of paralogs. All 14 genes are homologs, with the human hemoglobin α orthologous to the chimpanzee hemoglobin α, but paralogous to the human or chimpanzee hemoglobin β, and so on. Moreover, the vertebrate hemoglobins (the oxygen-binding proteins of blood) are homologous to the vertebrate myoglobins (the oxygen-binding proteins of muscle), as well as to more distant

Figure 1–25 Paralogous genes and orthologous genes: two types of gene homology based on different evolutionary pathways. (A) and (B) The most basic possibilities. (C) A more complex pattern of events that can occur. Panel labels: (A) ancestral organism, gene G; species A, gene GA; species B, gene GB; genes GA and GB are orthologs. (B) ancestral organism, gene G; gene G1, gene G2; genes G1 and G2 are paralogs. (C) early ancestral organism, gene G; later ancestral organism, gene G1, gene G2; species A, gene G1A, gene G2A; species B, gene G1B, gene G2B; all G genes are homologs; G1A is a paralog of G2A and G2B but an ortholog of G1B.


Figure 1–26 tree labels, descending from the ancestral globin: Drosophila globin; shark myoglobin, human myoglobin, chick myoglobin; shark Hb β, chick Hb β, chick Hb ε, chick Hb ρ, human Hb β, human Hb δ, human Hb ε, human Hb Aγ, human Hb Gγ; shark Hb α, human Hb θ-1, chick Hb α-A, human Hb α1, human Hb α2, chick Hb α-D, chick Hb π, human Hb ζ.

genes that code for oxygen-binding proteins in invertebrates, plants, fungi, and bacteria. From the DNA sequences, it is usually easy to recognize that two genes in different species are homologous; it is much more difficult to decide, without other information, whether they stand in the precise evolutionary relationship of orthologs.
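The distinction between orthologs and paralogs can be made concrete with the four genes of Figure 1–25C. The sketch below assumes the scenario of that panel, in which one duplication (producing copies G1 and G2) precedes one speciation (producing species A and B). The `Gene` record and the `relationship` function are hypothetical names introduced for illustration. The example works only because the history is supplied by hand, which is exactly the information that sequence comparison alone does not readily provide.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    copy: str      # duplicate lineage the gene descends from (G1 or G2)
    species: str   # species in which the gene is found today

def relationship(a: Gene, b: Gene) -> str:
    """Classify two homologous genes under the Figure 1-25C scenario."""
    if a == b:
        return "same gene"
    if a.copy != b.copy:
        return "paralogs"   # separated by the duplication event
    return "orthologs"      # same copy, separated by the speciation event

GENES = [
    Gene("G1A", "G1", "A"), Gene("G1B", "G1", "B"),
    Gene("G2A", "G2", "A"), Gene("G2B", "G2", "B"),
]

g1a = GENES[0]
for other in GENES[1:]:
    print(f"{g1a.name} and {other.name}: {relationship(g1a, other)}")
```

The classification reproduces the statement in the figure: G1A is an ortholog of G1B and a paralog of both G2A and G2B.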

Genes Can Be Transferred Between Organisms, Both in the Laboratory and in Nature

Procaryotes also provide examples of the horizontal transfer of genes from one species of cell to another. The most obvious tell-tale signs are sequences recognizable as being derived from bacterial viruses, also called bacteriophages (Figure 1–27). Viruses are not themselves living cells but can act as vectors for gene transfer: they are small packets of genetic material that have evolved as parasites on the reproductive and biosynthetic machinery of host cells. They replicate in one cell, emerge from it with a protective wrapping, and then enter and infect another cell, which may be of the same or a different species. Often, the infected cell will be killed by the massive proliferation of virus particles inside it; but sometimes, the viral DNA, instead of directly generating these particles, may persist in its host for many cell generations as a relatively innocuous passenger, either as a separate intracellular fragment of DNA, known as a plasmid, or as a sequence inserted into the cell’s regular genome. In their travels, viruses can accidentally pick up fragments of DNA from the genome of one host cell and ferry them into another cell. Such transfers of genetic material frequently occur in procaryotes, and they can also occur between eucaryotic cells of the same species.

Horizontal transfers of genes between eucaryotic cells of different species are very rare, and they do not seem to have played a significant part in eucaryote evolution (although massive transfers from bacterial to eucaryotic genomes have occurred in the evolution of mitochondria and chloroplasts, as we discuss below). In contrast, horizontal gene transfers occur much more frequently between different species of procaryotes. Many procaryotes have a remarkable capacity to take up even nonviral DNA molecules from their surroundings and thereby capture the genetic information these molecules carry. By this route, or by virus-mediated transfer, bacteria and archaea in the wild can acquire genes from neighboring cells relatively easily. Genes that confer resistance to an

Figure 1–26 A complex family of homologous genes. This diagram shows the pedigree of the hemoglobin (Hb), myoglobin, and globin genes of human, chick, shark, and Drosophila. The lengths of the horizontal lines represent the amount of divergence in amino acid sequence.



antibiotic or an ability to produce a toxin, for example, can be transferred from species to species and provide the recipient bacterium with a selective advantage. In this way, new and sometimes dangerous strains of bacteria have been observed to evolve in the bacterial ecosystems that inhabit hospitals or the various niches in the human body. For example, horizontal gene transfer is responsible for the spread, over the past 40 years, of penicillin-resistant strains of Neisseria gonorrhoeae, the bacterium that causes gonorrhea. On a longer time scale, the results can be even more profound; it has been estimated that at least 18% of all of the genes in the present-day genome of E. coli have been acquired by horizontal transfer from another species within the past 100 million years.

Sex Results in Horizontal Exchanges of Genetic Information Within a Species

Horizontal exchanges of genetic information are important in bacterial and archaeal evolution in today’s world, and they may have occurred even more frequently and promiscuously in the early days of life on Earth. Such early horizontal exchanges could explain the otherwise puzzling observation that the eucaryotes seem more similar to archaea in their genes for the basic information-handling processes of DNA replication, transcription, and translation, but more similar to bacteria in their genes for metabolic processes. In any case, whether horizontal gene transfer occurred most freely in the early days of life on Earth, or has continued at a steady low rate throughout evolutionary history, it has the effect of complicating the whole concept of cell ancestry, by making each cell’s genome a composite of parts derived from separate sources.

Horizontal gene transfer among procaryotes may seem a surprising process, but it has a parallel in a phenomenon familiar to us all: sex. In addition to the usual vertical transfer of genetic material from parent to offspring, sexual reproduction causes a large-scale horizontal transfer of genetic information between two initially separate cell lineages, those of the father and the mother. A key feature of sex, of course, is that the genetic exchange normally occurs only between individuals of the same species. But no matter whether they occur within a species or between species, horizontal gene transfers leave a characteristic imprint: they result in individuals who are related more closely to one set of relatives with respect to some genes, and more closely to another set of relatives with respect to others. By comparing the DNA sequences of individual human genomes, an intelligent visitor from outer space could deduce that humans reproduce sexually, even if it knew nothing about human behavior.

Sexual reproduction is widespread (although not universal), especially among eucaryotes. Even bacteria indulge from time to time in controlled sexual exchanges of DNA with other members of their own species. Natural selection has clearly favored organisms that can reproduce sexually, although evolutionary theorists dispute precisely what the selective advantage of sex is.

The Function of a Gene Can Often Be Deduced from Its Sequence

Family relationships among genes are important not just for their historical interest, but because they simplify the task of deciphering gene functions. Once the sequence of a newly discovered gene has been determined, a scientist can tap a few keys on a computer to search the entire database of known gene sequences for genes related to it. In many cases, the function of one or more of these homologs will have been already determined experimentally, and thus, since gene sequence determines gene function, one can frequently make a good guess at the function of the new gene: it is likely to be similar to that of the already-known homologs. In this way, it is possible to decipher a great deal of the biology of an organism simply by analyzing the DNA sequence of its genome and using the information we already have about the functions of genes in other organisms that have been more intensively studied.
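The idea of searching a database for relatives of a new gene can be illustrated with a toy example. Everything in the sketch below is hypothetical: the three "database" sequences and their annotations are invented, and the comparison of shared short subsequences (k-mers) is a deliberately crude stand-in for the alignment-based methods that real search programs use.

```python
def kmers(sequence: str, k: int = 4) -> set[str]:
    """All overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of two k-mer sets (0 = nothing shared, 1 = same set)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

# Hypothetical database: invented sequences with invented annotations.
DATABASE = {
    "transporter-like gene": "ATGGCTAAAGGTCTGATTGAAGTTCGT",
    "polymerase-like gene":  "ATGTCCGATCTGCGTAACCATTGGCAA",
    "kinase-like gene":      "ATGAAACCGGGTTTTACCGAGCTGTAA",
}

def best_match(query: str) -> tuple[str, float]:
    """Return the database entry most similar to the query, with its score."""
    scored = ((name, similarity(query, seq)) for name, seq in DATABASE.items())
    return max(scored, key=lambda item: item[1])

# A "new" gene that differs from the first database entry at one site.
query = "ATGGCTAAAGGTCTGCTTGAAGTTCGT"
name, score = best_match(query)
print(f"closest known gene: {name} (similarity {score:.2f})")
```

A match of this kind only suggests a function for the new gene; as the text goes on to stress, the guess still has to be tested experimentally.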

Figure 1–27 The viral transfer of DNA from one cell to another. (A) An electron micrograph of particles of a bacterial virus, the T4 bacteriophage. The head of this virus contains the viral DNA; the tail contains the apparatus for injecting the DNA into a host bacterium. (B) A cross section of a bacterium with a T4 bacteriophage latched onto its surface. The large dark objects inside the bacterium are the heads of new T4 particles in course of assembly. When they are mature, the bacterium will burst open to release them. (A, courtesy of James Paulson; B, courtesy of Jonathan King and Erika Hartwig from G. Karp, Cell and Molecular Biology, 2nd ed. New York: John Wiley & Sons, 1999. With permission from John Wiley & Sons.) Scale bars: 100 nm in (A) and (B).


More Than 200 Gene Families Are Common to All Three Primary Branches of the Tree of Life

Given the complete genome sequences of representative organisms from all three domains (archaea, bacteria, and eucaryotes), we can search systematically for homologies that span this enormous evolutionary divide. In this way we can begin to take stock of the common inheritance of all living things. There are considerable difficulties in this enterprise. For example, individual species have often lost some of the ancestral genes; other genes have almost certainly been acquired by horizontal transfer from another species and therefore are not truly ancestral, even though shared. In fact, genome comparisons strongly suggest that both lineage-specific gene loss and horizontal gene transfer, in some cases between evolutionarily distant species, have been major factors of evolution, at least among procaryotes. Finally, in the course of 2 or 3 billion years, some genes that were initially shared will have changed beyond recognition by current methods.

Because of all these vagaries of the evolutionary process, it seems that only a small proportion of ancestral gene families have been universally retained in a recognizable form. Thus, out of 4873 protein-coding gene families defined by comparing the genomes of 50 species of bacteria, 13 archaea, and 3 unicellular eucaryotes, only 63 are truly ubiquitous (that is, represented in all the genomes analyzed). The great majority of these universal families include components of the translation and transcription systems. This is not likely to be a realistic approximation of an ancestral gene set. A better, though still crude, idea of the latter can be obtained by tallying the gene families that have representatives in multiple, but not necessarily all, species from all three major domains. Such an analysis reveals 264 ancient conserved families. Each family can be assigned a function (at least in terms of general biochemical activity, but usually with more precision), with the largest number of shared gene families being involved in translation and in amino acid metabolism and transport (Table 1–2). This set of highly conserved gene families represents only a very rough sketch of the common inheritance of all modern life; a more precise reconstruction of the gene complement of the last universal common ancestor might be feasible with further genome sequencing and more careful comparative analysis.
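The figure of 264 ancient conserved families can be cross-checked against the per-category counts of Table 1–2. The tally below pairs the eighteen numbers in the table with the eighteen functional categories in the order listed, which is an assumption about how the table reads; the variable names are introduced for the example.

```python
# Shared gene families per functional category, from Table 1-2.
# Numbers are paired with categories in the order in which both are listed.
UNIVERSAL_FAMILIES = {
    "Information processing": {
        "Translation": 63,
        "Transcription": 7,
        "Replication, recombination, and repair": 13,
    },
    "Cellular processes and signaling": {
        "Cell cycle control, mitosis, and meiosis": 2,
        "Defense mechanisms": 3,
        "Signal transduction mechanisms": 1,
        "Cell wall/membrane biogenesis": 2,
        "Intracellular trafficking and secretion": 4,
        "Post-translational modification, protein turnover, chaperones": 8,
    },
    "Metabolism": {
        "Energy production and conversion": 19,
        "Carbohydrate transport and metabolism": 16,
        "Amino acid transport and metabolism": 43,
        "Nucleotide transport and metabolism": 15,
        "Coenzyme transport and metabolism": 22,
        "Lipid transport and metabolism": 9,
        "Inorganic ion transport and metabolism": 8,
        "Secondary metabolite biosynthesis, transport, and catabolism": 5,
    },
    "Poorly characterized": {
        "General biochemical function predicted; specific biological role unknown": 24,
    },
}

totals = {group: sum(rows.values()) for group, rows in UNIVERSAL_FAMILIES.items()}
grand_total = sum(totals.values())

for group, count in totals.items():
    print(f"{group:34s} {count:4d}")
print(f"{'All categories':34s} {grand_total:4d}")
```

The category totals add up to 264, matching the number given in the text, with translation (63) and amino acid transport and metabolism (43) as the two largest entries.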

Mutations Reveal the Functions of Genes

Without additional information, no amount of gazing at genome sequences will reveal the functions of genes. We may recognize that gene B is like gene A, but how do we discover the function of gene A in the first place? And even if we know the function of gene A, how do we test whether the function of gene B is truly the same as the sequence similarity suggests? How do we connect the world of abstract genetic information with the world of real living organisms?

The analysis of gene functions depends on two complementary approaches: genetics and biochemistry. Genetics starts with the study of mutants: we either find or make an organism in which a gene is altered, and examine the effects on the organism’s structure and performance (Figure 1–28). Biochemistry examines the functions of molecules: we extract molecules from an organism and then study their chemical activities. By combining genetics and biochemistry and examining the chemical abnormalities in a mutant organism, it is possible to find those molecules whose production depends on a given gene. At the same time, studies of the performance of the mutant organism show us what role those molecules have in the operation of the organism as a whole. Thus, genetics and biochemistry together provide a way to relate genes, molecules, and the structure and function of the organism.

In recent years, DNA sequence information and the powerful tools of molecular biology have allowed rapid progress. From sequence comparisons, we can often identify particular subregions within a gene that have been preserved nearly unchanged over the course of evolution. These conserved subregions are likely to be the most important parts of the gene in terms of function. We can test their individual contributions to the activity of the gene product by creating in




Table 1–2 The Numbers of Gene Families, Classified by Function, That Are Common to All Three Domains of the Living World

GENE FAMILY FUNCTION | NUMBER OF “UNIVERSAL” FAMILIES

Information processing
Translation | 63
Transcription | 7
Replication, recombination, and repair | 13

Cellular processes and signaling
Cell cycle control, mitosis, and meiosis | 2
Defense mechanisms | 3
Signal transduction mechanisms | 1
Cell wall/membrane biogenesis | 2
Intracellular trafficking and secretion | 4
Post-translational modification, protein turnover, chaperones | 8

Metabolism
Energy production and conversion | 19
Carbohydrate transport and metabolism | 16
Amino acid transport and metabolism | 43
Nucleotide transport and metabolism | 15
Coenzyme transport and metabolism | 22
Lipid transport and metabolism | 9
Inorganic ion transport and metabolism | 8
Secondary metabolite biosynthesis, transport, and catabolism | 5

Poorly characterized
General biochemical function predicted; specific biological role unknown | 24

For the purpose of this analysis, gene families are defined as “universal” if they are represented in the genomes of at least two diverse archaea (Archaeoglobus fulgidus and Aeropyrum pernix), two evolutionarily distant bacteria (Escherichia coli and Bacillus subtilis) and one eucaryote (yeast, Saccharomyces cerevisiae). (Data from R.L. Tatusov, E.V. Koonin and D.J. Lipman, Science 278:631–637, 1997, with permission from AAAS; R.L. Tatusov et al., BMC Bioinformatics 4:41, 2003, with permission from BioMed Central; and the COGs database at the US National Library of Medicine.)
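The counts in Table 1–2 can be tallied to see how the universal families are distributed across the four broad categories. The following is a minimal sketch, not part of the original analysis: the dictionary simply transcribes the table, and the function names are illustrative choices.

```python
# Counts transcribed from Table 1-2 (universal gene families by function).
UNIVERSAL_FAMILIES = {
    "Information processing": {
        "Translation": 63,
        "Transcription": 7,
        "Replication, recombination, and repair": 13,
    },
    "Cellular processes and signaling": {
        "Cell cycle control, mitosis, and meiosis": 2,
        "Defense mechanisms": 3,
        "Signal transduction mechanisms": 1,
        "Cell wall/membrane biogenesis": 2,
        "Intracellular trafficking and secretion": 4,
        "Post-translational modification, protein turnover, chaperones": 8,
    },
    "Metabolism": {
        "Energy production and conversion": 19,
        "Carbohydrate transport and metabolism": 16,
        "Amino acid transport and metabolism": 43,
        "Nucleotide transport and metabolism": 15,
        "Coenzyme transport and metabolism": 22,
        "Lipid transport and metabolism": 9,
        "Inorganic ion transport and metabolism": 8,
        "Secondary metabolite biosynthesis, transport, and catabolism": 5,
    },
    "Poorly characterized": {
        "General biochemical function predicted; specific biological role unknown": 24,
    },
}


def category_totals(table):
    """Sum the family counts within each broad functional category."""
    return {category: sum(rows.values()) for category, rows in table.items()}


def grand_total(table):
    """Total number of universal gene families listed in the table."""
    return sum(category_totals(table).values())


if __name__ == "__main__":
    for category, total in category_totals(UNIVERSAL_FAMILIES).items():
        print(f"{category}: {total}")
    print("All categories:", grand_total(UNIVERSAL_FAMILIES))
```

On these figures, metabolism and information processing together account for most of the universal families, while signal transduction contributes only one.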

the laboratory mutations of specific sites within the gene, or by constructing artificial hybrid genes that combine part of one gene with part of another. Organisms can be engineered to make either the RNA or the protein specified by the gene in large quantities to facilitate biochemical analysis. Specialists in molecular structure can determine the three-dimensional conformation of the gene product, revealing the exact position of every atom in it. Biochemists can determine how each of the parts of the genetically specified molecule contributes to its chemical behavior. Cell biologists can analyze the behavior of cells that are engineered to express a mutant version of the gene. There is, however, no one simple recipe for discovering a gene’s function, and no simple standard universal format for describing it. We may discover, for example, that the product of a given gene catalyzes a certain chemical reaction, and yet have no idea how or why that reaction is important to the organism. The functional characterization of each new family of gene products, unlike the description of the gene sequences, presents a fresh challenge to the biologist’s ingenuity. Moreover, we never fully understand the function of a gene until we learn its role in the life of the organism as a whole. To make ultimate sense of gene functions, therefore, we have to study whole organisms, not just molecules or cells.

Molecular Biologists Have Focused a Spotlight on E. coli

Because living organisms are so complex, the more we learn about any particular species, the more attractive it becomes as an object for further study. Each

[Figure 1–28 scale bar: 5 µm]

Figure 1–28 A mutant phenotype reflecting the function of a gene. A normal yeast (of the species Schizosaccharomyces pombe) is compared with a mutant in which a change in a single gene has converted the cell from a cigar shape (left) to a T shape (right). The mutant gene therefore has a function in the control of cell shape. But how, in molecular terms, does the gene product perform that function? That is a harder question, and needs biochemical analysis to answer it. (Courtesy of Kenneth Sawin and Paul Nurse.)


Figure 1–29 The genome of E. coli. (A) A cluster of E. coli cells. (B) A diagram of the genome of E. coli strain K-12. The diagram is circular because the DNA of E. coli, like that of other procaryotes, forms a single, closed loop. Protein-coding genes are shown as yellow or orange bars, depending on the DNA strand from which they are transcribed; genes encoding only RNA molecules are indicated by green arrows. Some genes are transcribed from one strand of the DNA double helix (in a clockwise direction in this diagram), others from the other strand (counterclockwise). (A, courtesy of Dr. Tony Brain and David Parker/Photo Researchers; B, adapted from F.R. Blattner et al., Science 277:1453–1462, 1997. With permission from AAAS.)

[Figure 1–29B labels: origin of replication; terminus of replication; Escherichia coli K-12, 4,639,221 nucleotide pairs]


discovery raises new questions and provides new tools with which to tackle general questions in the context of the chosen organism. For this reason, large communities of biologists have become dedicated to studying different aspects of the same model organism. In the enormously varied world of bacteria, the spotlight of molecular biology has for a long time focused intensely on just one species: Escherichia coli, or E. coli (see Figures 1–17 and 1–18). This small, rod-shaped bacterial cell normally lives in the gut of humans and other vertebrates, but it can be grown easily in a simple nutrient broth in a culture bottle. It adapts to variable chemical conditions and reproduces rapidly, and it can evolve by mutation and selection at a remarkable speed. As with other bacteria, different strains of E. coli, though classified as members of a single species, differ genetically to a much greater degree than do different varieties of a sexually reproducing organism such as a plant or animal. One E. coli strain may possess many hundreds of genes that are absent from another, and the two strains could have as little as 50% of their genes in common. The standard laboratory strain E. coli K-12 has a genome of approximately 4.6 million nucleotide pairs, contained in a single circular molecule of DNA, coding for about 4300 different kinds of proteins (Figure 1–29). In molecular terms, we know more about E. coli than about any other living organism. Most of our understanding of the fundamental mechanisms of life— for example, how cells replicate their DNA, or how they decode the instructions represented in the DNA to direct the synthesis of specific proteins—has come from studies of E. coli. The basic genetic mechanisms have turned out to be highly conserved throughout evolution: these mechanisms are therefore essentially the same in our own cells as in E. coli.



Summary

Procaryotes (cells without a distinct nucleus) are biochemically the most diverse organisms and include species that can obtain all their energy and nutrients from inorganic chemical sources, such as the reactive mixtures of minerals released at hydrothermal vents on the ocean floor—the sort of diet that may have nourished the first living cells 3.5 billion years ago. DNA sequence comparisons reveal the family relationships of living organisms and show that the procaryotes fall into two groups that diverged early in the course of evolution: the bacteria (or eubacteria) and the archaea. Together with the eucaryotes (cells with a membrane-enclosed nucleus), these constitute the three primary branches of the tree of life. Most bacteria and archaea are small unicellular organisms with compact genomes comprising 1000–6000 genes. Many of the genes within a single organism show strong family resemblances in their DNA sequences, implying that they originated from the same ancestral gene through gene duplication and divergence. Family resemblances (homologies) are also clear when gene sequences are compared between different species, and more than 200 gene families have been so highly conserved that they can be recognized as common to most species from all three domains of the living world. Thus, given the DNA sequence of a newly discovered gene, it is often possible to deduce the gene’s function from the known function of a homologous gene in an intensively studied model organism, such as the bacterium E. coli.

GENETIC INFORMATION IN EUCARYOTES

Eucaryotic cells, in general, are bigger and more elaborate than procaryotic cells, and their genomes are bigger and more elaborate, too. The greater size is accompanied by radical differences in cell structure and function. Moreover, many classes of eucaryotic cells form multicellular organisms that attain levels of complexity unmatched by any procaryote. Because they are so complex, eucaryotes confront molecular biologists with a special set of challenges, which will concern us in the rest of this book. Increasingly, biologists meet these challenges through the analysis and manipulation of the genetic information within cells and organisms. It is therefore important at the outset to know something of the special features of the eucaryotic genome. We begin by briefly discussing how eucaryotic cells are organized, how this reflects their way of life, and how their genomes differ from those of procaryotes. This leads us to an outline of the strategy by which molecular biologists, by exploiting genetic information, are attempting to discover how eucaryotic organisms work.

Eucaryotic Cells May Have Originated as Predators

By definition, eucaryotic cells keep their DNA in an internal compartment called the nucleus. The nuclear envelope, a double layer of membrane, surrounds the nucleus and separates the DNA from the cytoplasm. Eucaryotes also have other features that set them apart from procaryotes (Figure 1–30). Their cells are, typically, 10 times bigger in linear dimension, and 1000 times larger in volume. They have a cytoskeleton—a system of protein filaments crisscrossing the cytoplasm and forming, together with the many proteins that attach to them, a system of girders, ropes, and motors that gives the cell mechanical strength, controls its shape, and drives and guides its movements. The nuclear envelope is only one part of a set of internal membranes, each structurally similar to the plasma membrane and enclosing different types of spaces inside the cell, many of them involved in digestion and secretion. Lacking the tough cell wall of most bacteria, animal cells and the free-living eucaryotic cells called protozoa can change their shape rapidly and engulf other cells and small objects by phagocytosis (Figure 1–31).

It is still a mystery how all these properties evolved, and in what sequence. One plausible view, however, is that they are all reflections of the way of life of a



[Figure 1–30 labels: microtubule; centrosome with pair of centrioles; extracellular matrix; chromatin (DNA); nuclear pore; nuclear envelope; vesicles; actin filaments; nucleolus; peroxisome; ribosomes in cytosol; Golgi apparatus; intermediate filaments; plasma membrane. Scale bar: 5 µm]


primordial eucaryotic cell that was a predator, living by capturing other cells and eating them (Figure 1–32). Such a way of life requires a large cell with a flexible plasma membrane, as well as an elaborate cytoskeleton to support and move this membrane. It may also require that the cell’s long, fragile DNA molecules be sequestered in a separate nuclear compartment, to protect the genome from damage by the movements of the cytoskeleton.

Modern Eucaryotic Cells Evolved from a Symbiosis

[Figure 1–30 label: endoplasmic reticulum]


Figure 1–30 The major features of eucaryotic cells. The drawing depicts a typical animal cell, but almost all the same components are found in plants and fungi and in single-celled eucaryotes such as yeasts and protozoa. Plant cells contain chloroplasts in addition to the components shown here, and their plasma membrane is surrounded by a tough external wall formed of cellulose.

A predatory way of life helps to explain another feature of eucaryotic cells. Almost all such cells contain mitochondria (Figure 1–33). These small bodies in the cytoplasm, enclosed by a double layer of membrane, take up oxygen and harness energy from the oxidation of food molecules—such as sugars—to produce most of the ATP that powers the cell’s activities. Mitochondria are similar in size to small bacteria, and, like bacteria, they have their own genome in the form of a circular DNA molecule, their own ribosomes that differ from those elsewhere in the eucaryotic cell, and their own transfer RNAs. It is now generally accepted that mitochondria originated from free-living oxygen-metabolizing (aerobic) bacteria that were engulfed by an ancestral eucaryotic cell that could otherwise make no such use of oxygen (that is, was anaerobic). Escaping digestion, these bacteria evolved in symbiosis with the engulfing cell and its progeny,

[Figure 1–31 scale bar: 10 µm]

Figure 1–31 Phagocytosis. This series of stills from a movie shows a human white blood cell (a neutrophil) engulfing a red blood cell (artificially colored red) that has been treated with antibody. (Courtesy of Stephen E. Malawista and Anne de Boisfleury Chevance.)


Figure 1–32 A single-celled eucaryote that eats other cells. (A) Didinium is a carnivorous protozoan, belonging to the group known as ciliates. It has a globular body, about 150 µm in diameter, encircled by two fringes of cilia—sinuous, whiplike appendages that beat continually; its front end is flattened except for a single protrusion, rather like a snout. (B) Didinium normally swims around in the water at high speed by means of the synchronous beating of its cilia. When it encounters a suitable prey, usually another type of protozoan, it releases numerous small paralyzing darts from its snout region. Then, the Didinium attaches to and devours the other cell by phagocytosis, inverting like a hollow ball to engulf its victim, which is almost as large as itself. (Courtesy of D. Barlow.)

[Figure 1–32 panels (A) and (B); scale bar: 100 µm]

receiving shelter and nourishment in return for the power generation they performed for their hosts (Figure 1–34). This partnership between a primitive anaerobic eucaryotic predator cell and an aerobic bacterial cell is thought to have been established about 1.5 billion years ago, when the Earth’s atmosphere first became rich in oxygen.



[Figure 1–33 panel (A) scale bar: 100 nm]

Figure 1–33 A mitochondrion. (A) A cross section, as seen in the electron microscope. (B) A drawing of a mitochondrion with part of it cut away to show the three-dimensional structure. (C) A schematic eucaryotic cell, with the interior space of a mitochondrion, containing the mitochondrial DNA and ribosomes, colored. Note the smooth outer membrane and the convoluted inner membrane, which houses the proteins that generate ATP from the oxidation of food molecules. (A, courtesy of Daniel S. Friend.)



Figure 1–34 The origin of mitochondria. An ancestral eucaryotic cell is thought to have engulfed the bacterial ancestor of mitochondria, initiating a symbiotic relationship.

[Figure 1–34 labels: ancestral eucaryotic cell; internal membranes; early eucaryotic cell; nucleus; mitochondria with double membrane]


Many eucaryotic cells—specifically, those of plants and algae—also contain another class of small membrane-enclosed organelles somewhat similar to mitochondria—the chloroplasts (Figure 1–35). Chloroplasts perform photosynthesis, using the energy of sunlight to synthesize carbohydrates from atmospheric carbon dioxide and water, and deliver the products to the host cell as food. Like mitochondria, chloroplasts have their own genome and almost certainly originated as symbiotic photosynthetic bacteria, acquired by cells that already possessed mitochondria (Figure 1–36). A eucaryotic cell equipped with chloroplasts has no need to chase after other cells as prey; it is nourished by the captive chloroplasts it has inherited from its ancestors. Correspondingly, plant cells, although they possess the cytoskeletal equipment for movement, have lost the ability to change shape rapidly and to engulf other cells by phagocytosis. Instead, they create around themselves a tough, protective cell wall. If the ancestral eucaryote was indeed a predator on other organisms, we can view plant cells as eucaryotes that have made the transition from hunting to farming. Fungi represent yet another eucaryotic way of life. Fungal cells, like animal cells, possess mitochondria but not chloroplasts; but in contrast with animal cells and protozoa, they have a tough outer wall that limits their ability to move


[Figure 1–35 labels: chlorophyll-containing membranes; inner membrane; outer membrane. Scale bar: 10 µm]


Figure 1–35 Chloroplasts. These organelles capture the energy of sunlight in plant cells and some single-celled eucaryotes. (A) A single cell isolated from a leaf of a flowering plant, seen in the light microscope, showing the green chloroplasts. (B) A drawing of one of the chloroplasts, showing the highly folded system of internal membranes containing the chlorophyll molecules by which light is absorbed. (A, courtesy of Preeti Dahiya.)



[Figure 1–36 labels: early eucaryotic cell; photosynthetic bacterium; early eucaryotic cell capable of photosynthesis; chloroplasts with double membrane]

rapidly or to swallow up other cells. Fungi, it seems, have turned from hunters into scavengers: other cells secrete nutrient molecules or release them upon death, and fungi feed on these leavings—performing whatever digestion is necessary extracellularly, by secreting digestive enzymes to the exterior.

Eucaryotes Have Hybrid Genomes

The genetic information of eucaryotic cells has a hybrid origin—from the ancestral anaerobic eucaryote, and from the bacteria that it adopted as symbionts. Most of this information is stored in the nucleus, but a small amount remains inside the mitochondria and, for plant and algal cells, in the chloroplasts. The mitochondrial DNA and the chloroplast DNA can be separated from the nuclear DNA and individually analyzed and sequenced. The mitochondrial and chloroplast genomes are found to be degenerate, cut-down versions of the corresponding bacterial genomes, lacking genes for many essential functions. In a human cell, for example, the mitochondrial genome consists of only 16,569 nucleotide pairs, and codes for only 13 proteins, two ribosomal RNA components, and 22 transfer RNAs. The genes that are missing from the mitochondria and chloroplasts have not all been lost; instead, many of them have been somehow moved from the symbiont genome into the DNA of the host cell nucleus. The nuclear DNA of humans contains many genes coding for proteins that serve essential functions inside the mitochondria; in plants, the nuclear DNA also contains many genes specifying proteins required in chloroplasts.

Eucaryotic Genomes Are Big

Natural selection has evidently favored mitochondria with small genomes, just as it has favored bacteria with small genomes. By contrast, the nuclear genomes of most eucaryotes seem to have been free to enlarge. Perhaps the eucaryotic way of life has made large size an advantage: predators typically need to be bigger than their prey, and cell size generally increases in proportion to genome size. Perhaps enlargement of the genome has been driven by the accumulation of parasitic transposable elements (discussed in Chapter 5)—“selfish” segments of DNA that can insert copies of themselves at multiple sites in the genome. Whatever the explanation, the genomes of most eucaryotes are orders of magnitude larger than those of bacteria and archaea (Figure 1–37). And the freedom to be extravagant with DNA has had profound implications. Eucaryotes not only have more genes than procaryotes; they also have vastly more DNA that does not code for protein or for any other functional product molecule. The human genome contains 1000 times as many nucleotide pairs as the genome of a typical bacterium, 20 times as many genes, and about 10,000

Figure 1–36 The origin of chloroplasts. An early eucaryotic cell, already possessing mitochondria, engulfed a photosynthetic bacterium (a cyanobacterium) and retained it in symbiosis. All present-day chloroplasts are thought to trace their ancestry back to a single species of cyanobacterium that was adopted as an internal symbiont (an endosymbiont) over a billion years ago.



Figure 1–37 Genome sizes compared. Genome size is measured in nucleotide pairs of DNA per haploid genome, that is, per single copy of the genome. (The cells of sexually reproducing organisms such as ourselves are generally diploid: they contain two copies of the genome, one inherited from the mother, the other from the father.) Closely related organisms can vary widely in the quantity of DNA in their genomes, even though they contain similar numbers of functionally distinct genes. (Data from W.H. Li, Molecular Evolution, pp. 380–383. Sunderland, MA: Sinauer, 1997.)

[Figure 1–37 labels: E. coli; yeast; FUNGI; Arabidopsis; PLANTS; Drosophila; INSECTS. Horizontal axis: number of nucleotide pairs per haploid genome, with surviving tick marks at 10^7, 10^8, 10^9, and 10^10]



times as much noncoding DNA (~98.5% of the genome for a human is noncoding, as opposed to 11% of the genome for the bacterium E. coli).
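The figure of about 10,000 can be checked against the other numbers quoted in the text. The sketch below is a rough consistency check rather than a measurement; it assumes that E. coli K-12 (about 4.6 million nucleotide pairs, 11% noncoding) stands in for the "typical bacterium" and that the human genome is 1000 times larger and 98.5% noncoding. The function and constant names are illustrative.

```python
# Figures taken from the text; treating E. coli K-12 as the "typical
# bacterium" is an assumption made for this sketch.
BACTERIAL_GENOME_BP = 4.6e6           # approximate E. coli K-12 genome size
HUMAN_TO_BACTERIAL_SIZE_RATIO = 1000  # human genome is ~1000 times larger
HUMAN_NONCODING_FRACTION = 0.985      # ~98.5% of the human genome
ECOLI_NONCODING_FRACTION = 0.11       # 11% of the E. coli genome


def noncoding_bp(genome_bp, noncoding_fraction):
    """Nucleotide pairs of noncoding DNA in a genome of the given size."""
    return genome_bp * noncoding_fraction


def noncoding_ratio():
    """How many times more noncoding DNA the human genome carries."""
    human_bp = BACTERIAL_GENOME_BP * HUMAN_TO_BACTERIAL_SIZE_RATIO
    human_noncoding = noncoding_bp(human_bp, HUMAN_NONCODING_FRACTION)
    ecoli_noncoding = noncoding_bp(BACTERIAL_GENOME_BP, ECOLI_NONCODING_FRACTION)
    return human_noncoding / ecoli_noncoding


if __name__ == "__main__":
    print(f"Human/bacterial noncoding DNA ratio: {noncoding_ratio():.0f}")
```

Under these assumptions the ratio comes out at roughly 9000, which agrees with "about 10,000" to the order of magnitude. The bacterial genome size cancels out of the ratio, so the result depends only on the size ratio and the two noncoding fractions.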

Eucaryotic Genomes Are Rich in Regulatory DNA

Much of our noncoding DNA is almost certainly dispensable junk, retained like a mass of old papers because, when there is little pressure to keep an archive small, it is easier to retain everything than to sort out the valuable information and discard the rest. Certain exceptional eucaryotic species, such as the puffer fish (Figure 1–38), bear witness to the profligacy of their relatives; they have somehow managed to rid themselves of large quantities of noncoding DNA. Yet they appear similar in structure, behavior, and fitness to related species that have vastly more such DNA.

Even in compact eucaryotic genomes such as that of puffer fish, there is more noncoding DNA than coding DNA, and at least some of the noncoding DNA certainly has important functions. In particular, it regulates the expression of adjacent genes. With this regulatory DNA, eucaryotes have evolved distinctive ways of controlling when and where a gene is brought into play. This sophisticated gene regulation is crucial for the formation of complex multicellular organisms.

The Genome Defines the Program of Multicellular Development

The cells in an individual animal or plant are extraordinarily varied. Fat cells, skin cells, bone cells, nerve cells—they seem as dissimilar as any cells could be. Yet all these cell types are the descendants of a single fertilized egg cell, and all (with minor exceptions) contain identical copies of the genome of the species. The differences result from the way in which the cells make selective use of their genetic instructions according to the cues they get from their surroundings in the developing embryo. The DNA is not just a shopping list specifying the molecules that every cell must have, and the cell is not an assembly of all the items on the list. Rather, the cell behaves as a multipurpose machine, with sensors to receive environmental signals and with highly developed abilities to call different sets of genes into action according to the sequences of signals to which the cell has been exposed. The genome in each cell is big enough to accommodate the information that specifies an entire multicellular organism, but in any individual cell only part of that information is used.

A large fraction of the genes in the eucaryotic genome code for proteins that regulate the activities of other genes. Most of these gene regulatory proteins act by

Figure 1–38 The puffer fish (Fugu rubripes). This organism has a genome size of 400 million nucleotide pairs— about one-quarter as much as a zebrafish, for example, even though the two species of fish have similar numbers of genes. (From a woodcut by Hiroshige, courtesy of Arts and Designs of Japan.)


Figure 1–39 Controlling gene readout by environmental signals. Regulatory DNA allows gene expression to be controlled by regulatory proteins, which are in turn the products of other genes. This diagram shows how a cell’s gene expression is adjusted according to a signal from the cell’s environment. The initial effect of the signal is to activate a regulatory protein already present in the cell; the signal may, for example, trigger the attachment of a phosphate group to the regulatory protein, altering its chemical properties.

[Figure 1–39 labels, in sequence: receptor protein in cell membrane detects environmental signal; gene-regulatory protein is activated... ...and binds to regulatory DNA... ...provoking activation of a gene to produce another protein... ...that binds to other regulatory regions... produce yet more proteins, including some additional gene-regulatory proteins. Other labels: regulatory region; protein-coding region]

binding, directly or indirectly, to the regulatory DNA adjacent to the genes that are to be controlled (Figure 1–39), or by interfering with the abilities of other proteins to do so. The expanded genome of eucaryotes therefore not only specifies the hardware of the cell, but also stores the software that controls how that hardware is used (Figure 1–40). Cells do not just passively receive signals; rather, they actively exchange signals with their neighbors. Thus, in a developing multicellular organism, the same control system governs each cell, but with different consequences depending on the messages exchanged. The outcome, astonishingly, is a precisely patterned array of cells in different states, each displaying a character appropriate to its position in the multicellular structure.

Many Eucaryotes Live as Solitary Cells: the Protists

Many species of eucaryotic cells lead a solitary life—some as hunters (the protozoa), some as photosynthesizers (the unicellular algae), some as scavengers (the unicellular fungi, or yeasts). Figure 1–41 conveys something of the variety of forms of these single-celled eucaryotes, or protists. The anatomy of protozoa,

Figure 1–40 Genetic control of the program of multicellular development. The role of a regulatory gene is demonstrated in the snapdragon Antirrhinum. In this example, a mutation in a single gene coding for a regulatory protein causes leafy shoots to develop in place of flowers: because a regulatory protein has been changed, the cells adopt characters that would be appropriate to a different location in the normal plant. The mutant is on the left, the normal plant on the right. (Courtesy of Enrico Coen and Rosemary Carpenter.)




especially, is often elaborate and includes such structures as sensory bristles, photoreceptors, sinuously beating cilia, leglike appendages, mouth parts, stinging darts, and musclelike contractile bundles. Although they are single cells, protozoa can be as intricate, as versatile, and as complex in their behavior as many multicellular organisms (see Figure 1–32). In terms of their ancestry and DNA sequences, protists are far more diverse than the multicellular animals, plants, and fungi, which arose as three comparatively late branches of the eucaryotic pedigree (see Figure 1–21). As with procaryotes, humans have tended to neglect the protists because they are microscopic. Only now, with the help of genome analysis, are we beginning to understand their positions in the tree of life, and to put into context the glimpses these strange creatures offer us of our distant evolutionary past.

A Yeast Serves as a Minimal Model Eucaryote

The molecular and genetic complexity of eucaryotes is daunting. Even more than for procaryotes, biologists need to concentrate their limited resources on a few selected model organisms to fathom this complexity. To analyze the internal workings of the eucaryotic cell, without the additional problems of multicellular development, it makes sense to use a species that is unicellular and as simple as possible. The popular choice for this role of minimal model eucaryote has been the yeast Saccharomyces cerevisiae (Figure 1–42)—the same species that is used by brewers of beer and bakers of bread. S. cerevisiae is a small, single-celled member of the kingdom of fungi and thus, according to modern views, at least as closely related to animals as it is to plants. It is robust and easy to grow in a simple nutrient medium. Like other fungi, it has a tough cell wall, is relatively immobile, and possesses mitochondria but not chloroplasts. When nutrients are plentiful, it grows and divides almost as

Figure 1–41 An assortment of protists: a small sample of an extremely diverse class of organisms. The drawings are done to different scales, but in each case the scale bar represents 10 µm. The organisms in (A), (B), (E), (F), and (I) are ciliates; (C) is a euglenoid; (D) is an amoeba; (G) is a dinoflagellate; (H) is a heliozoan. (From M.A. Sleigh, Biology of Protozoa. Cambridge, UK: Cambridge University Press, 1973.)




Figure 1–42 The yeast Saccharomyces cerevisiae. (A) A scanning electron micrograph of a cluster of the cells. This species is also known as budding yeast; it proliferates by forming a protrusion or bud that enlarges and then separates from the rest of the original cell. Many cells with buds are visible in this micrograph. (B) A transmission electron micrograph of a cross section of a yeast cell, showing its nucleus, mitochondrion, and thick cell wall. (A, courtesy of Ira Herskowitz and Eric Schabatach.)

[Figure 1–42 labels: cell wall; mitochondrion. Scale bars: (A) 10 µm; (B) 2 µm]

rapidly as a bacterium. It can reproduce either vegetatively (that is, by simple cell division), or sexually: two yeast cells that are haploid (possessing a single copy of the genome) can fuse to create a cell that is diploid (containing a double genome); and the diploid cell can undergo meiosis (a reduction division) to produce cells that are once again haploid (Figure 1–43). In contrast with higher plants and animals, the yeast can divide indefinitely in either the haploid or the diploid state, and the process leading from the one state to the other can be induced at will by changing the growth conditions. In addition to these features, the yeast has a further property that makes it a convenient organism for genetic studies: its genome, by eucaryotic standards, is exceptionally small. Nevertheless, it suffices for all the basic tasks that every eucaryotic cell must perform. As we shall see later in this book, studies on yeasts (using both S. cerevisiae and other species) have provided a key to many crucial processes, including the eucaryotic cell-division cycle—the critical chain of events by which the nucleus and all the other components of a cell are duplicated and parceled out to create two daughter cells from one. The control system that governs this process has been so well conserved over the course of evolution that many of its components can function interchangeably in yeast and human cells: if a mutant yeast lacking an essential yeast cell-division-cycle gene is supplied with a copy of the homologous cell-division-cycle gene from a human, the yeast is cured of its defect and becomes able to divide normally.

The Expression Levels of All the Genes of an Organism Can Be Monitored Simultaneously

The complete genome sequence of S. cerevisiae, determined in 1997, consists of approximately 13,117,000 nucleotide pairs, including the small contribution (78,520 nucleotide pairs) of the mitochondrial DNA. This total is only about 2.5 times as much DNA as there is in E. coli, and it codes for only 1.5 times as many distinct proteins (about 6300 in all). The way of life of S. cerevisiae is similar in many ways to that of a bacterium, and it seems that this yeast has likewise been subject to selection pressures that have kept its genome compact. Knowledge of the complete genome sequence of any organism—be it a yeast or a human—opens up new perspectives on the workings of the cell: things that once seemed impossibly complex now seem within our grasp. Using techniques

Figure 1–43 The reproductive cycles of the yeast S. cerevisiae. Depending on environmental conditions and on details of the genotype, cells of this species can exist in either a diploid (2n) state, with a double chromosome set, or a haploid (n) state, with a single chromosome set. The diploid form can either proliferate by ordinary cell-division cycles or undergo meiosis to produce haploid cells. The haploid form can either proliferate by ordinary cell-division cycles or undergo sexual fusion with another haploid cell to become diploid. Meiosis is triggered by starvation and gives rise to spores—haploid cells in a dormant state, resistant to harsh environmental conditions.



proliferation of diploid cells 2n meiosis and sporulation (triggered by starvation) 2n n n mating (usually immediately after spores hatch)



spores hatch n n


proliferation of haploid cells n










DNA/RNA/protein biosynthesis cell cycle

environmental response

developmental processes


to be described in Chapter 8, it is now possible, for example, to monitor, simultaneously, the amount of mRNA transcript that is produced from every gene in the yeast genome under any chosen conditions, and to see how this whole pattern of gene activity changes when conditions change. The analysis can be repeated with mRNA prepared from mutants lacking a chosen gene—any gene that we care to test. In principle, this approach provides a way to reveal the entire system of control relationships that govern gene expression—not only in yeast cells, but in any organism whose genome sequence is known.

Figure 1–44 The network of interactions between gene regulatory proteins and the genes that code for them in a yeast cell. Results are shown for 106 out of the total of 141 gene regulatory proteins in Saccharomyces cerevisiae. Each protein in the set was tested for its ability to bind to the regulatory DNA of each of the genes coding for this set of proteins. In the diagram, the genes are arranged in a circle, and an arrow pointing from gene A to gene B means that the protein encoded by A binds to the regulatory DNA of B, and therefore presumably regulates the expression of B. Small circles with arrowheads indicate genes whose products directly regulate their own expression. Genes governing different aspects of cell behavior are shown in different colors. For a multicellular plant or animal, the number of gene regulatory proteins is about 10 times greater, and the amount of regulatory DNA perhaps 100 times greater, so that the corresponding diagram would be vastly more complex. (From T.I. Lee et al., Science 298:799–804, 2002. With permission from AAAS.)
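The arrow convention used in Figure 1–44 can be written down as a directed graph. The following is only a minimal sketch: the gene names and binding pairs are invented placeholders, not data from the figure, and the helper functions are hypothetical names chosen for illustration. It shows how self-regulating genes (the small circles in the diagram) and the number of regulators acting on each gene can be read off such a representation.

```python
from collections import defaultdict

# Hypothetical binding data: (regulator, target) means that the protein encoded
# by `regulator` binds the regulatory DNA of `target`. Names are placeholders.
bindings = [
    ("geneA", "geneB"),
    ("geneB", "geneC"),
    ("geneC", "geneC"),  # self-loop: the product binds its own regulatory DNA
    ("geneA", "geneC"),
    ("geneD", "geneA"),
]


def build_network(edges):
    """Return a dict mapping each regulator to the set of genes it binds."""
    network = defaultdict(set)
    for regulator, target in edges:
        network[regulator].add(target)
    return dict(network)


def autoregulators(network):
    """Genes whose products bind their own regulatory DNA."""
    return sorted(g for g, targets in network.items() if g in targets)


def in_degree(network):
    """Number of distinct regulators that bind each target gene."""
    counts = defaultdict(int)
    for targets in network.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


network = build_network(bindings)
print(autoregulators(network))      # ['geneC']
print(in_degree(network)["geneC"])  # 3
```

A representation of this kind records only which protein binds where. As the caption notes, binding is presumed, not shown, to imply regulation, and nothing here says whether an interaction activates or represses.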

To Make Sense of Cells, We Need Mathematics, Computers, and Quantitative Information

Through methods such as these, exploiting our knowledge of complete genome sequences, we can list the genes and proteins in a cell and begin to depict the web of interactions between them (Figure 1–44). But how are we to turn all this information into an understanding of how cells work? Even for a single cell type belonging to a single species of organism, the current deluge of data seems overwhelming. The sort of informal reasoning on which biologists usually rely seems totally inadequate in the face of such complexity. In fact, the difficulty is more than just a matter of information overload. Biological systems are, for example, full of feedback loops, and the behavior of even the simplest of systems with feedback is remarkably difficult to predict by intuition alone (Figure 1–45); small changes in parameters can cause radical changes in outcome. To go from a circuit diagram to a prediction of the behavior of the system, we need detailed quantitative information, and to draw deductions from that information we need mathematics and computers.

Figure 1–45 A very simple gene regulatory circuit—a single gene regulating its own expression by the binding of its protein product to its own regulatory DNA. Simple schematic diagrams such as this are often used to summarize what we know (as in Figure 1–44), but they leave many questions unanswered. When the protein binds, does it inhibit or stimulate transcription? How steeply does the transcription rate depend on the protein concentration? How long, on average, does a molecule of the protein remain bound to the DNA? How long does it take to make each molecule of mRNA or protein, and how quickly does each type of molecule get degraded? Mathematical modeling shows that we need quantitative answers to all these and other questions before we can predict the behavior of even this single-gene system. For different parameter values, the system may settle to a unique steady state; or it may behave as a switch, capable of existing in one or other of a set of alternative states; or it may oscillate; or it may show large random fluctuations. [Diagram labels: regulatory DNA; gene coding region; gene regulatory protein.]

These tools for quantitative reasoning are essential, but they are not all-powerful. You might think that, knowing how each protein influences each other protein, and how the expression of each gene is regulated by the products of others, we should soon be able to calculate how the cell as a whole will behave, just as an astronomer can calculate the orbits of the planets, or a chemical engineer can calculate the flows through a chemical plant. But any attempt to perform this feat for an entire living cell rapidly reveals the limits of our present state of knowledge. The information we have, plentiful as it is, is full of gaps and uncertainties. Moreover, it is largely qualitative rather than quantitative. Most often, cell biologists studying the cell’s control systems sum up their knowledge in simple schematic diagrams—this book is full of them—rather than in numbers, graphs, and differential equations. To progress from qualitative descriptions and intuitive reasoning to quantitative descriptions and mathematical deduction is one of the biggest challenges for contemporary cell biology. So far, the challenge has been met only for a few very simple fragments of the machinery of living cells—subsystems involving a handful of different proteins, or two or three cross-regulatory genes, where theory and experiment can go closely hand in hand. We shall discuss some of these examples later in the book.
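The claim that a single self-regulating gene can either settle to a unique steady state or behave as a switch can be illustrated with a toy calculation. The sketch below is not a model taken from this book: it assumes a Hill-type dependence of synthesis rate on protein concentration, first-order degradation, and arbitrary parameter values (`beta`, `K`, `n`, `basal`, `gamma` are all illustrative assumptions), and it integrates the resulting equation with a simple forward Euler step. It is deterministic and has no time delays, so it cannot show the oscillations or random fluctuations mentioned in Figure 1–45.

```python
def production(p, mode, beta=1.0, K=0.5, n=4, basal=0.05):
    """Synthesis rate of the protein as a function of its own concentration p.

    Assumed Hill-type forms (illustrative only):
      activating: the protein stimulates its own transcription
      repressing: the protein inhibits its own transcription
    """
    if mode == "activating":
        return basal + beta * p**n / (K**n + p**n)
    if mode == "repressing":
        return beta * K**n / (K**n + p**n)
    raise ValueError(f"unknown mode: {mode}")


def simulate(p0, mode, gamma=1.0, dt=0.01, t_end=50.0):
    """Integrate dp/dt = production(p) - gamma * p by forward Euler from p0."""
    p = p0
    for _ in range(int(t_end / dt)):
        p += dt * (production(p, mode) - gamma * p)
    return p


# Positive feedback: the final state depends on where the system starts,
# so the circuit behaves as a two-state switch.
low = simulate(0.2, "activating")    # settles in a low state (below 0.1)
high = simulate(0.7, "activating")   # settles in a high state (above 0.9)

# Negative feedback: both starting points converge on the same steady state.
same_a = simulate(0.2, "repressing")
same_b = simulate(0.7, "repressing")

print(low, high)
print(same_a, same_b)
```

With these assumed parameters, changing only the sign of the feedback changes the qualitative behavior, which is the kind of sensitivity the text describes. Different values of `n` or `K` can remove the switch-like behavior altogether.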

Arabidopsis Has Been Chosen Out of 300,000 Species As a Model Plant

The large multicellular organisms that we see around us—the flowers and trees and animals—seem fantastically varied, but they are much closer to one another in their evolutionary origins, and more similar in their basic cell biology, than the great host of microscopic single-celled organisms. Thus, while bacteria and eucaryotes are separated by more than 3000 million years of divergent evolution, vertebrates and insects are separated by about 700 million years, fish and mammals by about 450 million years, and the different species of flowering plants by only about 150 million years. Because of the close evolutionary relationship between all flowering plants, we can, once again, get insight into the cell and molecular biology of this whole class of organisms by focusing on just one or a few species for detailed analysis. Out of the several hundred thousand species of flowering plants on Earth today, molecular biologists have chosen to concentrate their efforts on a small weed, the common thale cress Arabidopsis thaliana (Figure 1–46), which can be grown indoors in large numbers, and produces thousands of offspring per plant after 8–10 weeks. Arabidopsis has a genome of approximately 140 million nucleotide pairs, about 11 times as much as yeast, and its complete sequence is known.

Figure 1–46 Arabidopsis thaliana, the plant chosen as the primary model for studying plant molecular genetics. (Courtesy of Toni Hayden and the John Innes Foundation.)

The World of Animal Cells Is Represented By a Worm, a Fly, a Mouse, and a Human

Multicellular animals account for the majority of all named species of living organisms, and for the largest part of the biological research effort. Four species have emerged as the foremost model organisms for molecular genetic studies. In order of increasing size, they are the nematode worm Caenorhabditis elegans, the fly Drosophila melanogaster, the mouse Mus musculus, and the human, Homo sapiens. Each of these has had its genome sequenced.

Caenorhabditis elegans (Figure 1–47) is a small, harmless relative of the eelworm that attacks crops. With a life cycle of only a few days, an ability to survive in a freezer indefinitely in a state of suspended animation, a simple body plan, and an unusual life cycle that is well suited for genetic studies (described in Chapter 23), it is an ideal model organism. C. elegans develops with clockwork precision from a fertilized egg cell into an adult worm with exactly 959 body cells (plus a variable number of egg and sperm cells)—an unusual degree of regularity for an animal. We now have a minutely detailed description of the sequence of events by which this occurs, as the cells divide, move, and change their characters according to strict and predictable rules. The genome of 97 million nucleotide pairs codes for about 19,000 proteins, and many mutants and other tools are available for the testing of gene functions. Although the worm has a body plan very different from our own, the conservation of biological mechanisms has been sufficient for the worm to be a model for many of the developmental and cell-biological processes that occur in the human body. Studies of the worm help us to understand, for example, the programs of cell division and cell death that determine the numbers of cells in the body—a topic of great importance in developmental biology and cancer research.

Figure 1–47 Caenorhabditis elegans, the first multicellular organism to have its complete genome sequence determined. This small nematode, about 1 mm long, lives in the soil. Most individuals are hermaphrodites, producing both eggs and sperm. The animal is viewed here using interference contrast optics, showing up the boundaries of the tissues in bright colors; the animal itself is not colored when viewed with ordinary lighting. Scale bar, 0.2 mm. (Courtesy of Ian Hope.)

Studies in Drosophila Provide a Key to Vertebrate Development

The fruitfly Drosophila melanogaster (Figure 1–48) has been used as a model genetic organism for longer than any other; in fact, the foundations of classical genetics were built to a large extent on studies of this insect. Over 80 years ago, it provided, for example, definitive proof that genes—the abstract units of hereditary information—are carried on chromosomes, concrete physical objects whose behavior had been closely followed in the eucaryotic cell with the light microscope, but whose function was at first unknown. The proof depended on one of the many features that make Drosophila peculiarly convenient for genetics—the giant chromosomes, with characteristic banded appearance, that are visible in some of its cells (Figure 1–49). Specific changes in the hereditary information, manifest in families of mutant flies, were found to correlate exactly with the loss or alteration of specific giant-chromosome bands.

Figure 1–48 Drosophila melanogaster. Molecular genetic studies on this fly have provided the main key to understanding how all animals develop from a fertilized egg into an adult. (From E.B. Lewis, Science 221:cover, 1983. With permission from AAAS.)

In more recent times, Drosophila, more than any other organism, has shown us how to trace the chain of cause and effect from the genetic instructions encoded in the chromosomal DNA to the structure of the adult multicellular body. Drosophila mutants with body parts strangely misplaced or mispatterned provided the key to the identification and characterization of the genes required to make a properly structured body, with gut, limbs, eyes, and all the other parts in their correct places. Once these Drosophila genes were sequenced, the genomes of vertebrates could be scanned for homologs. These were found, and their functions in vertebrates were then tested by analyzing mice in which the genes had been mutated. The results, as we see later in the book, reveal an astonishing degree of similarity in the molecular mechanisms of insect and vertebrate development.

The majority of all named species of living organisms are insects. Even if Drosophila had nothing in common with vertebrates, but only with insects, it would still be an important model organism. But if understanding the molecular genetics of vertebrates is the goal, why not simply tackle the problem head-on? Why sidle up to it obliquely, through studies in Drosophila? Drosophila requires only 9 days to progress from a fertilized egg to an adult; it is vastly easier and cheaper to breed than any vertebrate, and its genome is much smaller—about 170 million nucleotide pairs, compared with 3200 million for a human. This genome codes for about 14,000 proteins, and mutants can now be obtained for essentially any gene.
But there is also another, deeper reason why genetic mechanisms that are hard to discover in a vertebrate are often readily revealed in the fly. This relates, as we now explain, to the frequency of gene duplication, which is substantially greater in vertebrate genomes than in the fly genome and has probably been crucial in making vertebrates the complex and subtle creatures that they are.

Figure 1–49 Giant chromosomes from salivary gland cells of Drosophila. Because many rounds of DNA replication have occurred without an intervening cell division, each of the chromosomes in these unusual cells contains over 1000 identical DNA molecules, all aligned in register. This makes them easy to see in the light microscope, where they display a characteristic and reproducible banding pattern. Specific bands can be identified as the locations of specific genes: a mutant fly with a region of the banding pattern missing shows a phenotype reflecting loss of the genes in that region. Genes that are being transcribed at a high rate correspond to bands with a “puffed” appearance. The bands stained dark brown in the micrograph are sites where a particular regulatory protein is bound to the DNA. Scale bar, 20 µm. (Courtesy of B. Zink and R. Paro, from R. Paro, Trends Genet. 6:416–421, 1990. With permission from Elsevier.)

The Vertebrate Genome Is a Product of Repeated Duplication

Almost every gene in the vertebrate genome has paralogs—other genes in the same genome that are unmistakably related and must have arisen by gene duplication. In many cases, a whole cluster of genes is closely related to similar clusters present elsewhere in the genome, suggesting that genes have been duplicated in linked groups rather than as isolated individuals. According to one hypothesis, at an early stage in the evolution of the vertebrates, the entire genome underwent duplication twice in succession, giving rise to four copies of every gene. In some groups of vertebrates, such as fish of the salmon and carp families (including the zebrafish, a popular research animal), it has been suggested that there was yet another duplication, creating an eightfold multiplicity of genes.

The precise course of vertebrate genome evolution remains uncertain, because many further evolutionary changes have occurred since these ancient events. Genes that were once identical have diverged; many of the gene copies have been lost through disruptive mutations; some have undergone further rounds of local duplication; and the genome, in each branch of the vertebrate family tree, has suffered repeated rearrangements, breaking up most of the original gene orderings. Comparison of the gene order in two related organisms, such as the human and the mouse, reveals that—on the time scale of vertebrate evolution—chromosomes frequently fuse and fragment to move large blocks of DNA sequence around. Indeed, it is possible, as we shall discuss in Chapter 7, that the present state of affairs is the result of many separate duplications of fragments of the genome, rather than duplications of the genome as a whole.

There is, however, no doubt that such whole-genome duplications do occur from time to time in evolution, for we can see recent instances in which duplicated chromosome sets are still clearly identifiable as such. The frog genus Xenopus, for example, comprises a set of closely similar species related to one another by repeated duplications or triplications of the whole genome. Among these frogs are X. tropicalis, with an ordinary diploid genome; the common laboratory species X. laevis, with a duplicated genome and twice as much DNA per cell; and X. ruwenzoriensis, with a sixfold reduplication of the original genome and six times as much DNA per cell (108 chromosomes, compared with 36 in X. laevis, for example). These species are estimated to have diverged from one another within the past 120 million years (Figure 1–50).

Figure 1–50 Two species of the frog genus Xenopus. X. tropicalis, above, has an ordinary diploid genome; X. laevis, below, has twice as much DNA per cell. From the banding patterns of their chromosomes and the arrangement of genes along them, as well as from comparisons of gene sequences, it is clear that the large-genome species have evolved through duplications of the whole genome. These duplications are thought to have occurred in the aftermath of matings between frogs of slightly divergent Xenopus species. (Courtesy of E. Amaya, M. Offield and R. Grainger, Trends Genet. 14:253–255, 1998. With permission from Elsevier.)

Genetic Redundancy Is a Problem for Geneticists, But It Creates Opportunities for Evolving Organisms

Whatever the details of the evolutionary history, it is clear that most genes in the vertebrate genome exist in several versions that were once identical. The related genes often remain functionally interchangeable for many purposes. This phenomenon is called genetic redundancy. For the scientist struggling to discover all the genes involved in some particular process, it complicates the task. If gene A is mutated and no effect is seen, it cannot be concluded that gene A is functionally irrelevant—it may simply be that this gene normally works in parallel with its relatives, and these suffice for near-normal function even when gene A is defective. In the less repetitive genome of Drosophila, where gene duplication is less common, the analysis is more straightforward: single gene functions are revealed directly by the consequences of single-gene mutations (the single-engined plane stops flying when the engine fails).

Genome duplication has clearly allowed the development of more complex life forms; it provides an organism with a cornucopia of spare gene copies, which are free to mutate to serve divergent purposes. While one copy becomes optimized for use in the liver, say, another can become optimized for use in the brain or adapted for a novel purpose. In this way, the additional genes allow for increased complexity and sophistication. As the genes take on divergent functions, they cease to be redundant. Often, however, while the genes acquire individually specialized roles, they also continue to perform some aspects of their original core function in parallel, redundantly. Mutation of a single gene then causes a relatively minor abnormality that reveals only a part of the gene’s function (Figure 1–51).
Families of genes with divergent but partly overlapping functions are a pervasive feature of vertebrate molecular biology, and they are encountered repeatedly in this book.

The Mouse Serves as a Model for Mammals

Mammals have typically three or four times as many genes as Drosophila, a genome that is 20 times larger, and millions or billions of times as many cells in their adult bodies. In terms of genome size and function, cell biology, and molecular mechanisms, mammals are nevertheless a highly uniform group of organisms. Even anatomically, the differences among mammals are chiefly a matter of size and proportions; it is hard to think of a human body part that does not have a counterpart in elephants and mice, and vice versa. Evolution plays freely with quantitative features, but it does not readily change the logic of the structure.





For a more exact measure of how closely mammalian species resemble one another genetically, we can compare the nucleotide sequences of corresponding (orthologous) genes, or the amino acid sequences of the proteins that these genes encode. The results for individual genes and proteins vary widely. But typically, if we line up the amino acid sequence of a human protein with that of the orthologous protein from, say, an elephant, about 85% of the amino acids are identical. A similar comparison between human and bird shows an amino acid identity of about 70%—twice as many differences, because the bird and the mammalian lineages have had twice as long to diverge as those of the elephant and the human (Figure 1–52). The mouse, being small, hardy, and a rapid breeder, has become the foremost model organism for experimental studies of vertebrate molecular genetics. Many naturally occurring mutations are known, often mimicking the effects of corresponding mutations in humans (Figure 1–53). Methods have been developed, moreover, to test the function of any chosen mouse gene, or of any noncoding portion of the mouse genome, by artificially creating mutations in it, as we explain later in the book. One made-to-order mutant mouse can provide a wealth of information for the cell biologist. It reveals the effects of the chosen mutation in a host of different contexts, simultaneously testing the action of the gene in all the different kinds of cells in the body that could in principle be affected.
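The comparison described above, lining up two orthologous protein sequences and counting identical positions, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two sequences are already aligned to the same length, gap columns (marked `-`) are skipped, and the example fragments are made up for the purpose and are not real hemoglobin sequences. The function name `percent_identity` is a hypothetical choice.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of aligned, non-gap positions at which the residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared


# Made-up aligned fragments, 20 residues each, differing at 4 positions.
fragment_1 = "MVLSPADKTNVKAAWGKVGA"
fragment_2 = "MVLSAADKTNVKTAWGKIGG"

print(percent_identity(fragment_1, fragment_2))  # 80.0
```

Other conventions for the denominator exist (for example, counting gap columns as differences), so a published identity figure may not match this calculation exactly.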

Humans Report on Their Own Peculiarities

As humans, we have a special interest in the human genome. We want to know the full set of parts from which we are made, and to discover how they work. But even if you were a mouse, preoccupied with the molecular biology of mice, humans would be attractive as model genetic organisms, because of one special property: through medical examinations and self-reporting, we catalog our own genetic (and other) disorders. The human population is enormous, consisting today of some 6 billion individuals, and this self-documenting property means that a huge database of information exists on human mutations. The complete human genome sequence of more than 3 billion nucleotide pairs has now been determined, making it easier than ever before to identify at a molecular level the precise gene responsible for each human mutant characteristic. By drawing together the insights from humans, mice, flies, worms, yeasts, plants, and bacteria—using gene sequence similarities to map out the correspondences between one model organism and another—we enrich our understanding of them all.

Figure 1–51 The consequences of gene duplication for mutational analyses of gene function. In this hypothetical example, an ancestral multicellular organism has a genome containing a single copy of gene G, which performs its function at several sites in the body, indicated in green. (A) Through gene duplication, a modern descendant of the ancestral organism has two copies of gene G, called G1 and G2. These have diverged somewhat in their patterns of expression and in their activities at the sites where they are expressed, but they still retain important similarities. At some sites, they are expressed together, and each independently performs the same old function as the ancestral gene G (alternating green and yellow stripes); at other sites, they are expressed alone and may serve new purposes. (B) Because of a functional overlap, the loss of one of the two genes by mutation (red cross) reveals only a part of its role; only the loss of both genes in the double mutant reveals the full range of processes for which these genes are responsible. Analogous principles apply to duplicated genes that operate in the same place (for example, in a single-celled organism) but are called into action together or individually in response to varying circumstances. Thus, gene duplications complicate genetic analyses in all organisms.
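The logic of Figure 1–51 can be expressed with sets. The sketch below is an illustration only: the site names and the assignment of genes to sites are invented, and it makes the simplifying assumption that wherever both genes are expressed, either one alone is fully sufficient. Under that assumption, a site shows a defect only when every gene that acts there has been lost.

```python
# Hypothetical body sites at which each duplicated gene acts (placeholders).
sites = {
    "G1": {"site1", "site2", "site3"},
    "G2": {"site2", "site3", "site4"},
}


def affected_sites(lost_genes, sites):
    """Sites left without any functional gene copy when `lost_genes` are lost.

    Assumes complete functional overlap wherever both genes are expressed.
    """
    still_covered = set()
    for gene, where in sites.items():
        if gene not in lost_genes:
            still_covered |= where
    all_sites = set().union(*sites.values())
    return all_sites - still_covered


print(sorted(affected_sites({"G1"}, sites)))        # ['site1']
print(sorted(affected_sites({"G2"}, sites)))        # ['site4']
print(sorted(affected_sites({"G1", "G2"}, sites)))  # ['site1', 'site2', 'site3', 'site4']
```

Each single mutant reveals only the site where that gene acts alone; the shared sites appear only in the double mutant, which is the point the caption makes about functional overlap.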


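[Figure 1–52 graphic not reproduced. Recoverable labels: vertical axis, time in millions of years, with geological periods including Tertiary, Permian, Carboniferous, Devonian, Silurian, Ordovician, and Cambrian; species pairs human/orangutan, mouse/rat, cat/dog, pig/whale, pig/sheep, human/rabbit, human/elephant, human/mouse, human/sloth, and human/tuna fish; right-hand column, % amino acids identical in hemoglobin α chain (values surviving extraction: 98, 84, 86; 77, 87, 82, 83, 89, 81).]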


We Are All Different in Detail

What precisely do we mean when we speak of the human genome? Whose genome? On average, any two people taken at random differ in about one or two in every 1000 nucleotide pairs in their DNA sequence. The Human Genome Project has arbitrarily selected DNA from a small number of anonymous individuals for sequencing. The human genome—the genome of the human species—is, properly speaking, a more complex thing, embracing the entire pool of variant genes that are found in the human population and continually exchanged and reassorted in the course of sexual reproduction. Ultimately, we can hope to document this variation too. Knowledge of it will help us understand, for example, why some people are prone to one disease, others to another; why some respond well to a drug, others badly. It will also provide new clues to our history—the population movements and minglings of our ancestors, the infections they suffered, the diets they ate. All these things leave traces in the variant forms of genes that have survived in human communities.

Figure 1–52 Times of divergence of different vertebrates. The scale on the left shows the estimated date and geological era of the last common ancestor of each specified pair of animals. Each time estimate is based on comparisons of the amino acid sequences of orthologous proteins; the longer a pair of animals have had to evolve independently, the smaller the percentage of amino acids that remain identical. Data from many different classes of proteins have been averaged to arrive at the final estimates, and the time scale has been calibrated to match the fossil evidence that the last common ancestor of mammals and birds lived 310 million years ago. The figures on the right give data on sequence divergence for one particular protein (chosen arbitrarily)—the α chain of hemoglobin. Note that although there is a clear general trend of increasing divergence with increasing time for this protein, there are also some irregularities. These reflect the randomness within the evolutionary process and, probably, the action of natural selection driving especially rapid changes of hemoglobin sequence in some organisms that experienced special physiological demands. On average, within any particular evolutionary lineage, hemoglobins accumulate changes at a rate of about 6 altered amino acids per 100 amino acids every 100 million years. Some proteins, subject to stricter functional constraints, evolve much more slowly than this, others as much as 5 times faster. All this gives rise to substantial uncertainties in estimates of divergence times, and some experts believe that the major groups of mammals diverged from one another as much as 60 million years more recently than shown here. (Adapted from S. Kumar and S.B. Hedges, Nature 392:917–920, 1998. With permission from Macmillan Publishers Ltd.)
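As a rough worked example of the rate quoted in the caption, suppose two lineages separated a time t ago and each has accumulated changes independently at about 6 per 100 amino acids per 100 million years. The calculation below is a back-of-envelope sketch, not a method from the cited study: it assumes a constant rate in both lineages and ignores repeated changes at the same position, so it tends to overestimate the observed difference. The symbol d(t), the expected fraction of positions that differ, is introduced here for illustration.

```latex
\[
d(t) \;\approx\; 2 \times \frac{6}{100} \times \frac{t}{100\ \text{million years}}
\]
\[
d(310\ \text{million years}) \;\approx\; 2 \times 0.06 \times 3.1 \;\approx\; 0.37,
\qquad
\text{identity} \;\approx\; 1 - 0.37 \;=\; 0.63
\]
```

The factor of 2 appears because both lineages have been changing since the common ancestor. The result, roughly 63% identity for the mammal and bird split at 310 million years, is of the same order as the typical figure of about 70% quoted in the text for human and bird proteins.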

Figure 1–53 Human and mouse: similar genes and similar development. The human baby and the mouse shown here have similar white patches on their foreheads because both have mutations in the same gene (called Kit), required for the development and maintenance of pigment cells. (Courtesy of R.A. Fleischman.)



Knowledge and understanding bring the power to intervene—with humans, to avoid or prevent disease; with plants, to create better crops; with bacteria, to turn them to our own uses. All these biological enterprises are linked, because the genetic information of all living organisms is written in the same language. The new-found ability of molecular biologists to read and decipher this language has already begun to transform our relationship to the living world. The account of cell biology in the subsequent chapters will, we hope, prepare you to understand, and possibly to contribute to, the great scientific adventure of the twenty-first century.

Summary

Eucaryotic cells, by definition, keep their DNA in a separate membrane-enclosed compartment, the nucleus. They have, in addition, a cytoskeleton for support and movement, elaborate intracellular compartments for digestion and secretion, the capacity (in many species) to engulf other cells, and a metabolism that depends on the oxidation of organic molecules by mitochondria. These properties suggest that eucaryotes may have originated as predators on other cells. Mitochondria—and, in plants, chloroplasts—contain their own genetic material, and evidently evolved from bacteria that were taken up into the cytoplasm of the eucaryotic cell and survived as symbionts. Eucaryotic cells have typically 3–30 times as many genes as procaryotes, and often thousands of times more noncoding DNA. The noncoding DNA allows for complex regulation of gene expression, as required for the construction of complex multicellular organisms. Many eucaryotes are, however, unicellular—among them the yeast Saccharomyces cerevisiae, which serves as a simple model organism for eucaryotic cell biology, revealing the molecular basis of conserved fundamental processes such as the eucaryotic cell division cycle. A small number of other organisms have been chosen as primary models for multicellular plants and animals, and the sequencing of their entire genomes has opened the way to systematic and comprehensive analysis of gene functions, gene regulation, and genetic diversity. As a result of gene duplications during vertebrate evolution, vertebrate genomes contain multiple closely related homologs of most genes. This genetic redundancy has allowed diversification and specialization of genes for new purposes, but it also makes gene functions harder to decipher. There is less genetic redundancy in the nematode Caenorhabditis elegans and the fly Drosophila melanogaster, which have thus played a key part in revealing universal genetic mechanisms of animal development.

Which statements are true? Explain why or why not.

1–1 The human hemoglobin genes, which are arranged in two clusters on two chromosomes, provide a good example of an orthologous set of genes.

1–2 Horizontal gene transfer is more prevalent in single-celled organisms than in multicellular organisms.

1–3 Most of the DNA sequences in a bacterial genome code for proteins, whereas most of the sequences in the human genome do not.

Discuss the following problems.

1–4 Since it was deciphered four decades ago, some have claimed that the genetic code must be a frozen accident, while others have argued that it was shaped by natural selection. A striking feature of the genetic code is its inherent resistance to the effects of mutation. For example, a change in the third position of a codon often specifies the same amino acid or one with similar chemical properties. The natural code resists mutation more effectively (is less susceptible to error) than most other possible versions, as illustrated in Figure Q1–1. Only one in a million computer-generated “random” codes is more error-resistant than the natural genetic code. Does the extraordinary mutation resistance of the genetic code argue in favor of its origin as a frozen accident or as a result of natural selection? Explain your reasoning.

Figure Q1–1 Susceptibility of the natural code relative to millions of computer-generated codes (Problem 1–4). Susceptibility measures the average change in amino acid properties caused by random mutations. A small value indicates that mutations tend to cause minor changes. (Data courtesy of Steve Freeland.) [Histogram axes: number of codes (thousands), 0 to 25, against susceptibility to mutation, 0 to 20; the position of the natural code is labeled.]

1–5 You have begun to characterize a sample obtained from the depths of the oceans on Europa, one of Jupiter’s moons. Much to your surprise, the sample contains a life-form that grows well in a rich broth. Your preliminary analysis shows that it is cellular and contains DNA, RNA, and protein. When you show your results to a colleague, she suggests that your sample was contaminated with an organism from Earth. What approaches might you try to distinguish between contamination and a novel cellular life-form based on DNA, RNA, and protein?

1–6 It is not so difficult to imagine what it means to feed on the organic molecules that living things produce. That is, after all, what we do. But what does it mean to “feed” on sunlight, as phototrophs do? Or, even stranger, to “feed” on rocks, as lithotrophs do? Where is the “food,” for example, in the mixture of chemicals (H2S, H2, CO, Mn+, Fe2+, Ni2+, CH4, and NH4+) spewed forth from a hydrothermal vent?

1–7 How many possible different trees (branching patterns) can be drawn for eubacteria, archaea, and eucaryotes, assuming that they all arose from a common ancestor?

1–8 The genes for ribosomal RNA are highly conserved (relatively few sequence changes) in all organisms on Earth; thus, they have evolved very slowly over time. Were ribosomal RNA genes “born” perfect?

1–9 Genes participating in informational processes such as replication, transcription, and translation are transferred between species much less often than are genes involved in metabolism. The basis for this inequality is unclear at present, but one suggestion is that it relates to the underlying complexity. Informational processes tend to involve large aggregates of different gene products, whereas metabolic reactions are usually catalyzed by enzymes composed of a single protein. Why would the complexity of the underlying process, informational or metabolic, have any effect on the rate of horizontal gene transfer?

1–10 The process of gene transfer from the mitochondrial to the nuclear genome can be analyzed in plants. The respiratory gene Cox2, which encodes subunit 2 of cytochrome oxidase, was functionally transferred to the nucleus during flowering plant evolution. Extensive analyses of plant genera have pinpointed the time of appearance of the nuclear form of the gene and identified several likely intermediates in the ultimate loss from the mitochondrial genome. A summary of Cox2 gene distributions between mitochondria and nuclei, along with data on their transcription, is shown in a phylogenetic context in Figure Q1–2.
A. Assuming that transfer of the mitochondrial gene to the nucleus occurred only once (an assumption supported by the structures of the nuclear genes), indicate the point in the phylogenetic tree where the transfer occurred.
B. Are there any examples of genera in which the transferred gene and the mitochondrial gene both appear functional? Indicate them.
C. What is the minimal number of times that the mitochondrial gene has been inactivated or lost? Indicate those events on the phylogenetic tree.
D. What is the minimal number of times that the nuclear gene has been inactivated or lost? Indicate those events on the phylogenetic tree.
E. Based on this information, propose a general scheme for transfer of mitochondrial genes to the nuclear genome.

Figure Q1–2 Summary of Cox2 gene distribution and transcript data in a phylogenetic context (Problem 1–10). The presence of the intact gene or a functional transcript is indicated by (+); the absence of the intact gene or a functional transcript is indicated by (–). mt, mitochondria; nuc, nuclei.
[Tree not shown. Columns: GENE (mt, nuc) and RNA (mt, nuc). Genera: Tephrosia, Galactia, Canavalia, Eriosema, Atylosia, Erythrina, Ramirezella, Vigna, Phaseolus, Calopogonium, Pachyrhizus, Cologania, Pueraria, Pseudeminia, Pseudovigna, Ortholobium, Psoralea, Cullen, Glycine, Neonotonia, Teramnus, Amphicarpa.]

1–11 When plant hemoglobin genes were first discovered in legumes, it was so surprising to find a gene typical of animal blood that it was hypothesized that the plant gene arose by horizontal transfer from an animal. Many more hemoglobin genes have now been sequenced, and a phylogenetic tree based on some of these sequences is shown in Figure Q1–3.
A. Does this tree support or refute the hypothesis that the plant hemoglobins arose by horizontal gene transfer?
B. Supposing that the plant hemoglobin genes were originally derived from a parasitic nematode, for example, what would you expect the phylogenetic tree to look like?

Figure Q1–3 Phylogenetic tree for hemoglobin genes from a variety of species (Problem 1–11). The legumes are highlighted in red.
[Tree not shown. Species labeled: VERTEBRATES (Whale, Rabbit, Cat, Cobra, Chicken, Human, Salamander, Cow, Frog, Goldfish); Alfalfa, Bean; Chlamydomonas; Paramecium.]



1–12 Rates of evolution appear to vary in different lineages. For example, the rate of evolution in the rat lineage is significantly higher than in the human lineage. These rate differences are apparent whether one looks at changes in protein sequences that are subject to selective pressure or at changes in noncoding nucleotide sequences, which are not under obvious selection pressure. Can you offer one or more possible explanations for the slower rate of evolutionary change in the human lineage versus the rat lineage?



Chapter 2: Cell Chemistry and Biosynthesis

It is at first sight difficult to accept the idea that each of the living creatures described in Chapter 1 is merely a chemical system. The incredible diversity of living forms, their seemingly purposeful behavior, and their ability to grow and reproduce appear to set them apart from the world of solids, liquids, and gases that chemistry normally describes. Indeed, until the nineteenth century animals were believed to contain a Vital Force (an “animus”) that was responsible for their distinctive properties.

We now know there is nothing in living organisms that disobeys chemical and physical laws. However, the chemistry of life is special. First, it is based overwhelmingly on carbon compounds, whose study is therefore known as organic chemistry. Second, cells are 70 percent water, and life depends largely on chemical reactions that take place in aqueous solution. Third, and most important, cell chemistry is enormously complex: even the simplest cell is vastly more complicated in its chemistry than any other chemical system known. Although cells contain a variety of small carbon-containing molecules, most of the carbon atoms in cells are incorporated into enormous polymeric molecules, chains of chemical subunits linked end-to-end. It is the unique properties of these macromolecules that enable cells and organisms to grow and reproduce, as well as to do all the other things that are characteristic of life.

In This Chapter
THE CHEMICAL COMPONENTS OF A CELL
CATALYSIS AND THE USE OF ENERGY BY CELLS
HOW CELLS OBTAIN ENERGY FROM FOOD

THE CHEMICAL COMPONENTS OF A CELL

Matter is made of combinations of elements, substances such as hydrogen or carbon that cannot be broken down or converted into other substances by chemical means. The smallest particle of an element that still retains its distinctive chemical properties is an atom (Figure 2–1). However, the characteristics of substances other than pure elements, including the materials from which living cells are made, depend on the way their atoms are linked together in groups to form molecules. In order to understand how living organisms are built from inanimate matter, therefore, it is crucial to know how all of the chemical bonds that hold atoms together in molecules are formed.

Figure 2–1 Highly schematic representations of an atom of carbon and an atom of hydrogen. The nucleus of every atom except hydrogen consists of both positively charged protons and electrically neutral neutrons. The number of electrons in an atom is equal to its number of protons (the atomic number), so that the atom has no net charge. Because it is the electrons that determine the chemical behavior of an atom, all of the atoms of a given element have the same atomic number. Neutrons are uncharged subatomic particles of essentially the same mass as protons. They contribute to the structural stability of the nucleus (if there are too many or too few, the nucleus may disintegrate by radioactive decay), but they do not alter the chemical properties of the atom. Because of neutrons, an element can exist in several physically distinguishable but chemically identical forms, called isotopes, each isotope having a different number of neutrons but the same number of protons. Multiple isotopes of almost all the elements occur naturally, including some that are unstable. For example, while most carbon on Earth exists as the stable isotope carbon 12, with six protons and six neutrons, there are also small amounts of an unstable isotope, the radioactive carbon 14, whose atoms have six protons and eight neutrons. Carbon 14 undergoes radioactive decay at a slow but steady rate. This forms the basis for a technique known as carbon 14 dating, which is used in archaeology to determine the time of origin of organic materials. The neutrons, protons, and electrons are in reality minute in relation to the atom as a whole; their size is greatly exaggerated here. In addition, the diameter of the nucleus is only about 10⁻⁴ that of the electron cloud. Finally, although the electrons are shown here as individual particles, in reality their behavior is governed by the laws of quantum mechanics, and there is no way of predicting exactly where an electron is at any given instant of time.
[Labels in the figure: carbon atom, atomic number = 6, atomic weight = 12; hydrogen atom, atomic number = 1, atomic weight = 1.]

Cells Are Made From a Few Types of Atoms

The atomic weight of an atom, or the molecular weight of a molecule, is its mass relative to that of a hydrogen atom. This is essentially equal to the number of protons plus neutrons that the atom or molecule contains, since the electrons are much lighter and contribute almost nothing to the total. Thus the major isotope of carbon has an atomic weight of 12 and is symbolized as ¹²C, whereas an unstable isotope of carbon has an atomic weight of 14 and is written as ¹⁴C. The mass of an atom or a molecule is often specified in daltons, one dalton being an atomic mass unit approximately equal to the mass of a hydrogen atom.

Atoms are so small that it is hard to imagine their size. An individual carbon atom is roughly 0.2 nm in diameter, so that it would take about 5 million of them, laid out in a straight line, to span a millimeter. One proton or neutron weighs approximately 1/(6 × 10²³) gram, so one gram of hydrogen contains 6 × 10²³ atoms. This huge number (6 × 10²³, called Avogadro's number) is the key scale factor describing the relationship between everyday quantities and quantities measured in terms of individual atoms or molecules. If a substance has a molecular weight of X, 6 × 10²³ molecules of it will have a mass of X grams. This quantity is called one mole of the substance (Figure 2–2).

There are 89 naturally occurring elements, each differing from the others in the number of protons and electrons in its atoms. Living organisms, however, are made of only a small selection of these elements, four of which (carbon (C), hydrogen (H), nitrogen (N), and oxygen (O)) make up 96.5% of an organism's weight. This composition differs markedly from that of the nonliving inorganic environment (Figure 2–3) and is evidence of a distinctive type of chemistry.

The Outermost Electrons Determine How Atoms Interact

To understand how atoms bond together to form the molecules that make up living organisms, we focus on their electrons. Protons and neutrons are welded tightly to one another in the nucleus and change partners only under extreme conditions: during radioactive decay, for example, or in the interior of the sun or of a nuclear reactor. In living tissues, it is only the electrons of an atom that undergo rearrangements. They form the exterior of an atom and specify the rules of chemistry by which atoms combine to form molecules.

Electrons are in continuous motion around the nucleus, but motions on this submicroscopic scale obey very different laws from those familiar in everyday life. These laws dictate that electrons in an atom can exist only in certain discrete states, called orbitals, and that there is a strict limit to the number of electrons that can be accommodated in an orbital of a given type, a so-called electron shell. The electrons closest on average to the positive nucleus are attracted most strongly to it and occupy the innermost, most tightly bound shell. This shell holds a maximum of two electrons. The second shell is farther away from the nucleus, and its electrons are less tightly bound. This second shell holds up to eight electrons. The third shell contains electrons that are even less tightly bound; it also holds up to eight electrons. The fourth and fifth shells can hold 18 electrons each. Atoms with more than four shells are very rare in biological molecules.

The electron arrangement of an atom is most stable when all the electrons are in the most tightly bound states that are possible, that is, when they occupy the innermost shells. Therefore, with certain exceptions in the larger atoms, the electrons of an atom fill the orbitals in order: the first shell before the second, the second before the third, and so on. An atom whose outermost shell is entirely filled with electrons is especially stable and therefore chemically unreactive. Examples are helium with 2 electrons, neon with 2 + 8, and argon with 2 + 8 + 8; these are all inert gases. Hydrogen, by contrast, with only one electron




and only a half-filled shell, is highly reactive. Likewise, the other atoms found in living tissues have incomplete outer electron shells and can donate, accept, or share electrons with each other to form both molecules and ions (Figure 2–4). Because an unfilled electron shell is less stable than a filled one, atoms with incomplete outer shells tend to interact with other atoms in a way that causes them to either gain or lose enough electrons to achieve a completed outermost shell. This electron exchange occurs either by transferring electrons from one atom to another or by sharing electrons between two atoms. These two strategies generate two types of chemical bonds between atoms: an ionic bond is formed when electrons are donated by one atom to another, whereas a covalent bond is formed when two atoms share a pair of electrons (Figure 2–5). Often, the pair of electrons is shared unequally, with a partial transfer between two atoms that attract electrons differently (one being more electronegative than the other); this intermediate strategy results in a polar covalent bond, as we shall discuss later. An H atom, which needs only one electron to fill its shell, generally acquires it by electron sharing, forming one covalent bond with another atom; often this bond is polar, meaning that the electrons are shared unequally. The other common elements in living cells (C, N, and O, with an incomplete second shell, and P and S, with an incomplete third shell; see Figure 2–4) generally share electrons and achieve a filled outer shell of eight electrons by forming several covalent bonds. The number of electrons that an atom must acquire or lose (either by sharing or by transfer) to fill its outer shell is known as its valence.
The crucial role of the outer electron shell in determining the chemical properties of an element means that, when the elements are listed in order of their atomic number, there is a periodic recurrence of elements with similar properties: an element with, say, an incomplete second shell containing one electron will behave in much the same way as an element that has filled its second shell

A mole is X grams of a substance, where X is its relative molecular mass (molecular weight). A mole will contain 6 × 10²³ molecules of the substance.
1 mole of carbon weighs 12 g
1 mole of glucose weighs 180 g
1 mole of sodium chloride weighs 58 g
Molar solutions have a concentration of 1 mole of the substance in 1 liter of solution. A molar solution (denoted as 1 M) of glucose, for example, has 180 g/l, while a millimolar solution (1 mM) has 180 mg/l.
The standard abbreviation for gram is g; the abbreviation for liter is l.

Figure 2–2 Moles and molar solutions.
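The arithmetic of Figure 2–2 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and Avogadro's number is rounded to 6 × 10²³ as in the text.

```python
# Minimal sketch of mole and molarity arithmetic (names are illustrative).
AVOGADRO = 6e23  # molecules per mole, rounded as in the text


def grams_for_solution(molecular_weight, molarity, liters):
    """Grams of solute needed for a solution of the given molarity (moles/liter)."""
    return molecular_weight * molarity * liters


def molecules_in(grams, molecular_weight):
    """Approximate number of molecules in a sample of the given mass."""
    return grams / molecular_weight * AVOGADRO


GLUCOSE_MW = 180  # g/mole, value given in Figure 2-2

print("1 M glucose, 1 liter: ", grams_for_solution(GLUCOSE_MW, 1.0, 1.0), "g")
print("1 mM glucose, 1 liter:", grams_for_solution(GLUCOSE_MW, 0.001, 1.0), "g")
print("atoms in 12 g carbon: ", molecules_in(12, 12))
```

The same two functions cover the other entries in the panel, for example sodium chloride with a molecular weight of 58.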



Figure 2–3 The abundances of some chemical elements in the nonliving world (the Earth's crust) compared with their abundances in the tissues of an animal (the human body). The abundance of each element is expressed as a percentage of the total number of atoms present including water. Thus, because of the abundance of water, more than 60% of the atoms in a living organism are hydrogen atoms. The relative abundance of elements is similar in all living things.


Figure 2–4 Filled and unfilled electron shells in some common elements. All the elements commonly found in living organisms have unfilled outermost shells (red) and can thus participate in chemical reactions with other atoms. For comparison, some elements that have only filled shells (yellow) are shown; these are chemically unreactive.

and has an incomplete third shell containing one electron. The metals, for example, have incomplete outer shells with just one or a few electrons, whereas, as we have just seen, the inert gases have full outer shells. This pattern gives rise to the famous periodic table of the elements, presented in Figure 2–6 with the elements found in living organisms highlighted.
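The shell-filling rule described above, and the periodic recurrence it produces, can be made concrete with a short sketch. It assumes the simplified capacities of 2, 8, and 8 electrons for the first three shells, so it is valid only up to argon; the function names are hypothetical.

```python
# Simplified shell model: capacities of the first three shells as given in the text.
SHELL_CAPACITY = [2, 8, 8]


def fill_shells(atomic_number):
    """Distribute the electrons of a neutral atom over shells, innermost first."""
    shells = []
    remaining = atomic_number
    for capacity in SHELL_CAPACITY:
        if remaining <= 0:
            break
        placed = min(capacity, remaining)
        shells.append(placed)
        remaining -= placed
    if remaining > 0:
        raise ValueError("this simplified model only covers atomic numbers 1 to 18")
    return shells


def electrons_missing(atomic_number):
    """Electrons the atom would need to gain to complete its outermost shell."""
    shells = fill_shells(atomic_number)
    outer = len(shells) - 1
    return SHELL_CAPACITY[outer] - shells[outer]


ELEMENTS = {"H": 1, "He": 2, "Li": 3, "C": 6, "N": 7, "O": 8, "Ne": 10, "Na": 11, "Ar": 18}
for symbol, z in ELEMENTS.items():
    print(symbol, fill_shells(z), "missing:", electrons_missing(z))
```

In this sketch H, C, N, and O are missing 1, 4, 3, and 2 electrons, matching the number of covalent bonds each typically forms, while Li and Na both end with a single outer electron, which is the recurrence the periodic table captures. Metals such as Na usually lose that outer electron rather than gain seven, so the "missing" count should not be read as their valence.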

Covalent Bonds Form by the Sharing of Electrons
All the characteristics of a cell depend on the molecules it contains. A molecule is defined as a cluster of atoms held together by covalent bonds; here electrons are shared between atoms to complete the outer shells, rather than being transferred between them. In the simplest possible molecule, a molecule of hydrogen (H2), two H atoms, each with a single electron, share two electrons, which is the number required to fill the first shell. These shared electrons form a cloud of negative charge that is densest between the two positively charged nuclei and helps to hold them together, in opposition to the mutual repulsion between like charges that would otherwise force them apart. The attractive and repulsive forces are in balance when the nuclei are separated by a characteristic distance, called the bond length. Another property of any bond, covalent or noncovalent, is its bond strength, which is measured by the amount of energy that must be supplied to break that bond. This is often expressed in units of kilocalories per mole (kcal/mole), where a kilocalorie is the amount of energy needed to raise the temperature of one liter







Figure 2–5 Comparison of covalent and ionic bonds. Atoms can attain a more stable arrangement of electrons in their outermost shell by interacting with one another. An ionic bond is formed when electrons are transferred from one atom to the other, giving a positive ion and a negative ion. A covalent bond is formed when electrons are shared between atoms. The two cases shown represent extremes; often, covalent bonds form with a partial transfer (unequal sharing of electrons), resulting in a polar covalent bond (see Figure 2–43).



of water by one degree Celsius (centigrade). Thus if 1 kilocalorie must be supplied to break 6 × 10²³ bonds of a specific type (that is, 1 mole of these bonds), then the strength of that bond is 1 kcal/mole. An equivalent, widely used measure of energy is the kilojoule, which is equal to 0.239 kilocalories.
To understand bond strengths, it is helpful to compare them with the average energies of the impacts that molecules are constantly experiencing from collisions with other molecules in their environment (their thermal, or heat, energy), as well as with other sources of biological energy such as light and glucose oxidation (Figure 2–7). Typical covalent bonds are stronger than the thermal energies by a factor of 100, so they resist being pulled apart by thermal motions and are normally broken only during specific chemical reactions with other atoms and molecules. The making and breaking of covalent bonds are violent events, and in living cells they are carefully controlled by highly specific catalysts, called enzymes. Noncovalent bonds as a rule are much weaker; we shall see later that they are important in the cell in the many situations where molecules have to associate and dissociate readily to carry out their functions.
Whereas an H atom can form only a single covalent bond, the other common atoms that form covalent bonds in cells (O, N, S, and P, as well as the all-important C atom) can form more than one. The outermost shell of these atoms, as we have seen, can accommodate up to eight electrons, and they form covalent bonds with as many other atoms as necessary to reach this number. Oxygen, with six electrons in its outer shell, is most stable when it acquires an extra two electrons by sharing with other atoms and therefore forms up to two covalent bonds. Nitrogen, with five outer electrons, forms a maximum of three covalent bonds, while carbon, with four outer electrons, forms up to four covalent bonds, thus sharing four pairs of electrons (see Figure 2–4).
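The unit conversion and the "factor of 100" comparison above can be checked with a rough calculation. Two inputs are assumptions that do not come from this passage: the thermal energy per mole is approximated by RT (gas constant times absolute temperature), and 83 kcal/mole is used as a commonly quoted value for a C–C single bond.

```python
# Rough sketch: kcal/kJ conversion and covalent bond strength versus thermal energy.
KCAL_PER_KJ = 0.239  # from the text: 1 kilojoule = 0.239 kilocalories
R_KCAL = 1.987e-3    # gas constant in kcal/(mole*K)


def kj_to_kcal(kj):
    return kj * KCAL_PER_KJ


def kcal_to_kj(kcal):
    return kcal / KCAL_PER_KJ


def thermal_energy(temp_kelvin):
    """RT in kcal/mole, used here as a rough measure of thermal energy (assumption)."""
    return R_KCAL * temp_kelvin


CC_BOND_KCAL = 83.0  # assumed typical C-C single bond strength, kcal/mole

body_temp = thermal_energy(310.0)  # about 37 degrees Celsius
print("thermal energy at 310 K:", round(body_temp, 2), "kcal/mole")
print("C-C bond:", CC_BOND_KCAL, "kcal/mole =", round(kcal_to_kj(CC_BOND_KCAL)), "kJ/mole")
print("ratio bond/thermal:", round(CC_BOND_KCAL / body_temp))
```

With these inputs the ratio comes out somewhat above 100, consistent with the order-of-magnitude statement in the text.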
When one atom forms covalent bonds with several others, these multiple bonds have definite arrangements in space relative to one another, reflecting the orientations of the orbits of the shared electrons. The covalent bonds of such an atom are therefore characterized by specific bond angles as well as by bond lengths and bond energies (Figure 2–8). The four covalent bonds that can form around a carbon atom, for example, are arranged as if pointing to the four corners of a regular tetrahedron. The precise orientation of covalent bonds forms the basis for the three-dimensional geometry of organic molecules.
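The tetrahedral arrangement implies a definite bond angle, which can be computed directly. The sketch below places a carbon atom at the origin and its four bonds along the corners of a regular tetrahedron; the coordinates are one convenient choice, not taken from the text.

```python
import math

# Four corners of a regular tetrahedron centered on the origin (the carbon atom).
CORNERS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def angle_between(u, v):
    """Angle in degrees between two bond directions u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


# Every pair of bonds makes the same angle, arccos(-1/3).
angles = [
    angle_between(CORNERS[i], CORNERS[j])
    for i in range(4)
    for j in range(i + 1, 4)
]
print([round(a, 1) for a in angles])
```

All six pairwise angles are equal, about 109.5 degrees, which is the idealized angle for four identical substituents around carbon.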

Figure 2–6 Elements ordered by their atomic number form the periodic table. Elements fall into groups that show similar properties based on the number of electrons each element possesses in its outer shell. For example, Mg and Ca tend to give away the two electrons in their outer shells; C, N, and O complete their second shells by sharing electrons. The four elements highlighted in red constitute 99% of the total number of atoms present in the human body. An additional seven elements, highlighted in blue, together represent about 0.9% of the total. Other elements, shown in green, are required in trace amounts by humans. It remains unclear whether those elements shown in yellow are essential in humans or not. The chemistry of life, it seems, is therefore predominantly the chemistry of lighter elements. Atomic weights, given by the sum of the protons and neutrons in the atomic nucleus, will vary with the particular isotope of the element. The atomic weights shown here are those of the most common isotope of each element.

Figure 2–7 Some energies important for cells. Note that these energies are compared on a logarithmic scale.
[Scale: energy content (kcal/mole). Items marked: average thermal motions, noncovalent bond breakage in water, ATP hydrolysis in cell, C–C bond breakage, green light, complete glucose oxidation.]


Figure 2–8 The geometry of covalent bonds. (A) The spatial arrangement of the covalent bonds that can be formed by oxygen, nitrogen, and carbon. (B) Molecules formed from these atoms have a precise three-dimensional structure, as shown here by ball-and-stick models for water (H2O) and propane (CH3–CH2–CH3). A structure can be specified by the bond angles and bond lengths for each covalent linkage. The atoms are colored according to the following, generally used convention: H, white; C, black; O, red; N, blue.

There Are Different Types of Covalent Bonds
Most covalent bonds involve the sharing of two electrons, one donated by each participating atom; these are called single bonds. Some covalent bonds, however, involve the sharing of more than one pair of electrons. Four electrons can be shared, for example, two coming from each participating atom; such a bond is called a double bond. Double bonds are shorter and stronger than single bonds and have a characteristic effect on the three-dimensional geometry of molecules containing them. A single covalent bond between two atoms generally allows the rotation of one part of a molecule relative to the other around the bond axis. A double bond prevents such rotation, producing a more rigid and less flexible arrangement of atoms (Figure 2–9 and Panel 2–1, pp. 106–107). In some molecules, electrons are shared among three or more atoms, producing bonds that have a hybrid character intermediate between single and double bonds. The highly stable benzene molecule, for example, consists of a ring of six carbon atoms in which the bonding electrons are evenly distributed (although usually depicted as an alternating sequence of single and double bonds, as shown in Panel 2–1).
When the atoms joined by a single covalent bond belong to different elements, the two atoms usually attract the shared electrons to different degrees. Compared with a C atom, for example, O and N atoms attract electrons relatively strongly, whereas an H atom attracts electrons more weakly. By definition, a polar structure (in the electrical sense) is one with positive charge concentrated toward one end (the positive pole) and negative charge concentrated toward the other (the negative pole). Covalent bonds in which the electrons are shared unequally in this way are therefore known as polar covalent bonds (Figure 2–10). For example, the covalent bond between oxygen and hydrogen, –O–H, or between nitrogen and hydrogen, –N–H, is polar, whereas that between carbon and hydrogen, –C–H, has the electrons attracted much more equally by both atoms and is relatively nonpolar.
Polar covalent bonds are extremely important in biology because they create permanent dipoles that allow molecules to interact through electrical forces. Any large molecule with many polar groups will have a pattern of partial positive and negative charges on its surface. When such a molecule encounters a second molecule with a complementary set of charges, the two molecules will be attracted to each other by electrostatic interactions that resemble (but are weaker than) the ionic bonds discussed previously.
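One common way to put numbers on "attracts electrons more strongly" is the Pauling electronegativity scale. The scale and its values are not part of this passage; they are brought in here as an assumption to illustrate why O–H and N–H bonds are polar while C–H is relatively nonpolar.

```python
# Pauling electronegativities (standard tabulated values; not from this passage).
PAULING = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44}


def polarity(atom_a, atom_b):
    """Return (electronegativity difference, atom that draws the shared electrons)."""
    difference = abs(PAULING[atom_a] - PAULING[atom_b])
    stronger = atom_a if PAULING[atom_a] > PAULING[atom_b] else atom_b
    return difference, stronger


for a, b in [("O", "H"), ("N", "H"), ("C", "H")]:
    difference, stronger = polarity(a, b)
    print(f"{a}-{b}: difference {difference:.2f}, electrons drawn toward {stronger}")
```

The differences fall in the order O–H > N–H > C–H, in line with the ranking given in the text; where exactly to draw the line between "polar" and "nonpolar" is a matter of convention.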

Figure 2-9 Carbon-carbon double bonds and single bonds compared. (A) The ethane molecule, with a single covalent bond between the two carbon atoms, illustrates the tetrahedral arrangement of single covalent bonds formed by carbon. One of the CH₃ groups joined by the covalent bond can rotate relative to the other around the bond axis. (B) The double bond between the two carbon atoms in a molecule of ethene (ethylene) alters the bond geometry of the carbon atoms and brings all the atoms into the same plane (blue); the double bond prevents the rotation of one CH₂ group relative to the other.



An Atom Often Behaves as if It Has a Fixed Radius

When a covalent bond forms between two atoms, the sharing of electrons brings the nuclei of these atoms unusually close together. But most of the atoms that are rapidly jostling each other in cells are located in separate molecules. What happens when two such atoms touch?

For simplicity and clarity, atoms and molecules are usually represented schematically, either as a line drawing of the structural formula or as a ball-and-stick model. Space-filling models, however, give us a more accurate representation of molecular structure. In these models, a solid envelope represents the radius of the electron cloud at which strong repulsive forces prevent a closer approach of any second, non-bonded atom (the so-called van der Waals radius for an atom). This is possible because the amount of repulsion increases very steeply as two such atoms approach each other closely. At slightly greater distances, any two atoms will experience a weak attractive force, known as a van der Waals attraction. As a result, there is a distance at which repulsive and attractive forces precisely balance to produce an energy minimum in each atom's interaction with an atom of a second, non-bonded element (Figure 2-11).

Depending on the intended purpose, we shall represent small molecules as line drawings, ball-and-stick models, or space-filling models. For comparison, the water molecule is represented in all three ways in Figure 2-12. When representing very large molecules, such as proteins, we shall often need to further simplify the model used (see, for example, Panel 3-2, pp. 132-133).
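The balance between steep repulsion at short range and weak attraction at slightly greater distances can be illustrated numerically. The sketch below uses the Lennard-Jones 12-6 potential, a common simplified model that this chapter does not itself name; the function names and the values of `epsilon` (well depth) and `sigma` (separation at which the energy is zero) are illustrative assumptions, not measured values for any particular pair of atoms.

```python
def lennard_jones(r, epsilon=0.1, sigma=0.3):
    """Interaction energy (kcal/mole) of two non-bonded atoms at separation r (nm).

    epsilon and sigma are illustrative assumptions, not values from the text.
    The r**-12 term models the steep repulsion, the r**-6 term the weak attraction.
    """
    ratio = sigma / r
    return 4.0 * epsilon * (ratio**12 - ratio**6)


def contact_distance(sigma=0.3):
    """Separation at the energy minimum of the 12-6 potential: 2**(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma


r_min = contact_distance()
for r in (0.28, 0.30, r_min, 0.40, 0.60):
    print(f"r = {r:.3f} nm  energy = {lennard_jones(r):+.4f} kcal/mole")
```

In this model the energy is positive (repulsive) when the atoms are pushed closer than `sigma`, reaches its minimum of `-epsilon` at the contact distance, and climbs back toward zero as the atoms separate, which mirrors the shape of the curve described for Figure 2-11.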



Figure 2-10 Polar and nonpolar covalent bonds. The electron distributions in the polar water molecule (H₂O) and the nonpolar oxygen molecule (O₂) are compared (δ+, partial positive charge; δ-, partial negative charge).

Water Is the Most Abundant Substance in Cells

Water accounts for about 70% of a cell's weight, and most intracellular reactions occur in an aqueous environment. Life on Earth began in the ocean, and the conditions in that primeval environment put a permanent stamp on the chemistry of living things. Life therefore hinges on the properties of water.

In each water molecule (H₂O) the two H atoms are linked to the O atom by covalent bonds (see Figure 2-12). The two bonds are highly polar because the O is strongly attractive for electrons, whereas the H is only weakly attractive. Consequently, there is an unequal distribution of electrons in a water molecule, with a preponderance of positive charge on the two H atoms and of negative charge on the O (see Figure 2-10). When a positively charged region of one water molecule (that is, one of its H atoms) approaches a negatively charged region (that is, the O) of a second water molecule, the electrical attraction between them can result in a weak bond called a hydrogen bond (see Figure 2-15). These bonds are much weaker than covalent bonds and are easily broken by the random thermal motions due to the heat energy of the molecules, so each bond lasts only a short time. But the combined effect of many weak bonds can be profound. Each water molecule can form hydrogen bonds through its two H atoms to two other water molecules, producing a network in which hydrogen bonds are being continually broken and formed (Panel 2-2, pp. 108-109). It is only because of the

Figure 2-11 The balance of van der Waals forces between two atoms. As the nuclei of two atoms approach each other, they initially show a weak bonding interaction due to their fluctuating electric charges. However, the same atoms will strongly repel each other if they are brought too close together. The balance of these van der Waals attractive and repulsive forces occurs at the indicated energy minimum. This minimum determines the contact distance between any two noncovalently bonded atoms; this distance is the sum of their van der Waals radii. By definition, zero energy (indicated by the dotted red line) is the energy when the two nuclei are at infinite separation.

[Figure 2-11 label: van der Waals force equilibrium at this point (the energy minimum)]


Chapter 2: Cell Chemistry and Biosynthesis

[Figure 2-12 labels: (A) H-O-H; van der Waals radius of H = 1.2 Å]



hydrogen bonds that link water molecules together that water is a liquid at room temperature, with a high boiling point and high surface tension, rather than a gas.

Molecules, such as alcohols, that contain polar bonds and that can form hydrogen bonds with water dissolve readily in water. Molecules carrying plus or minus charges (ions) likewise interact favorably with water. Such molecules are termed hydrophilic, meaning that they are water-loving. A large proportion of the molecules in the aqueous environment of a cell necessarily fall into this category, including sugars, DNA, RNA, and most proteins. Hydrophobic (water-hating) molecules, by contrast, are uncharged and form few or no hydrogen bonds, and so do not dissolve in water. Hydrocarbons are an important example (see Panel 2-1, pp. 106-107). In these molecules the H atoms are covalently linked to C atoms by a largely nonpolar bond. Because the H atoms have almost no net positive charge, they cannot form effective hydrogen bonds to other molecules. This makes the hydrocarbon as a whole hydrophobic, a property that is exploited in cells, whose membranes are constructed from molecules that have long hydrocarbon tails, as we shall see in Chapter 10.

Some Polar Molecules Are Acids and Bases

One of the simplest kinds of chemical reaction, and one that has profound significance in cells, takes place when a molecule containing a highly polar covalent bond between a hydrogen and a second atom dissolves in water. The hydrogen atom in such a molecule has largely given up its electron to the companion atom and so resembles an almost naked positively charged hydrogen nucleus: in other words, a proton (H⁺). When water molecules surround the polar molecule, the proton is attracted to the partial negative charge on the O atom of an adjacent water molecule and can dissociate from its original partner to associate instead with the oxygen atom of the water molecule to generate a hydronium ion (H₃O⁺) (Figure 2-13A). The reverse reaction also takes place very readily, so one has to imagine an equilibrium state in which billions of protons are constantly flitting to and fro from one molecule in the solution to another.

The same type of reaction takes place in a solution of pure water itself. As illustrated in Figure 2-13B, water molecules are constantly exchanging protons with each other. As a result, pure water contains an equal, very low concentration of H₃O⁺ and OH⁻ ions, both being present at 10⁻⁷ M. (The concentration of H₂O in pure water is 55.5 M.)

Substances that release protons to form H₃O⁺ when they dissolve in water are termed acids. The higher the concentration of H₃O⁺, the more acidic the solution. As H₃O⁺ rises, the concentration of OH⁻ falls, according to the equilibrium equation for water: [H₃O⁺][OH⁻] = 1.0 × 10⁻¹⁴, where square brackets denote molar concentrations to be multiplied. By tradition, the H₃O⁺ concentration is usually referred to as the H⁺ concentration, even though nearly all H⁺ in an aqueous solution is present as H₃O⁺. To avoid the use of unwieldy numbers, the concentration of H⁺ is expressed using a logarithmic scale called the pH scale, as illustrated in Panel 2-2 (pp. 108-109). Pure water has a pH of 7.0 and is neutral, that is, neither acidic (pH < 7.0) nor basic (pH > 7.0).
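The relationship between the H₃O⁺ concentration, the OH⁻ concentration, and pH can be worked through in a few lines. The sketch below assumes the standard definition pH = -log10 of the molar H⁺ concentration together with the equilibrium constant quoted above; the function names and the sample concentrations are chosen only for illustration.

```python
import math

KW = 1.0e-14  # [H3O+][OH-] for water, as quoted in the text


def ph_from_hydronium(h3o_molar):
    """pH is the negative base-10 logarithm of the molar H3O+ (H+) concentration."""
    return -math.log10(h3o_molar)


def hydroxide_from_hydronium(h3o_molar):
    """[OH-] follows from the equilibrium [H3O+][OH-] = 1.0e-14."""
    return KW / h3o_molar


def classify(ph, tolerance=1e-9):
    """Label a solution as neutral, acidic (pH < 7), or basic (pH > 7)."""
    if abs(ph - 7.0) <= tolerance:
        return "neutral"
    return "acidic" if ph < 7.0 else "basic"


# Illustrative concentrations (assumed values, in mol/L).
for h3o in (1e-3, 1e-7, 1e-10):
    ph = ph_from_hydronium(h3o)
    oh = hydroxide_from_hydronium(h3o)
    print(f"[H3O+] = {h3o:.0e} M  [OH-] = {oh:.0e} M  pH = {ph:.1f}  ({classify(ph)})")
```

Each tenfold rise in H₃O⁺ lowers the pH by one unit and lowers OH⁻ tenfold, so the product of the two concentrations stays fixed.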

Figure 2-12 Three representations of a water molecule. (A) The usual line drawing of the structural formula, in which each atom is indicated by its standard symbol, and each line represents a covalent bond joining two atoms. (B) A ball-and-stick model, in which atoms are represented by spheres of arbitrary diameter, connected by sticks representing covalent bonds. Unlike (A), bond angles are accurately represented in this type of model (see also Figure 2-8). (C) A space-filling model, in which both bond geometry and van der Waals radii are accurately represented.



[Figure 2-13 labels: (A) acetic acid + H₂O ⇌ acetate ion + hydronium ion. (B) H₂O + H₂O ⇌ hydronium ion + hydroxyl ion; a proton moves from one molecule to the other.]

Because the proton of a hydronium ion can be passed readily to many types of molecules in cells, altering their character, the concentration of H₃O⁺ inside a cell (the acidity) must be closely regulated. The interior of a cell is kept close to neutrality, and it is buffered by the presence of many chemical groups that can take up and release protons near pH 7.

The opposite of an acid is a base. Just as the defining property of an acid is that it donates protons to a water molecule so as to raise the concentration of H₃O⁺ ions, the defining property of a base is that it accepts protons so as to lower the concentration of H₃O⁺ ions, and thereby raise the concentration of hydroxyl ions (OH⁻). A base can either combine with protons directly or form hydroxyl ions that immediately combine with protons to produce H₂O. Thus sodium hydroxide (NaOH) is basic (or alkaline) because it dissociates in aqueous solution to form Na⁺ ions and OH⁻ ions. Other bases, especially important in living cells, contain NH₂ groups. These groups directly take up a proton from water: -NH₂ + H₂O → -NH₃⁺ + OH⁻.

All molecules that accept protons from water will do so most readily when the concentration of H₃O⁺ is high (acidic solutions). Likewise, molecules that can give up protons do so more readily if the concentration of H₃O⁺ in solution is low (basic solutions), and they will tend to receive them back if this concentration is high.

Four Types of Noncovalent Attractions Help Bring Molecules Together in Cells

In aqueous solutions, covalent bonds are 10-100 times stronger than the other attractive forces between atoms, allowing their connections to define the boundaries of one molecule from another. But much of biology depends on the specific binding of different molecules to each other. This binding is mediated by a group of noncovalent attractions that are individually quite weak, but whose energies can sum to create an effective force between two separate molecules. We have previously introduced three of these attractive forces: electrostatic attractions (ionic bonds), hydrogen bonds, and van der Waals attractions. Table 2-1 compares the strengths of these three types of noncovalent bonds with that of a typical covalent bond, both in the presence and in the

Table 2-1 Covalent and Noncovalent Chemical Bonds

Bond type                               Length (nm)   Strength (kcal/mole)
                                                      in vacuum   in water
Covalent                                0.15          90          90
Noncovalent: ionic*                     0.25          80          3
             hydrogen                   0.30          4           1
             van der Waals attraction   0.35          0.1         0.1
             (per atom)

*An ionic bond is an electrostatic attraction between two fully charged atoms.

Figure 2-13 Acids in water. (A) The reaction that takes place when a molecule of acetic acid dissolves in water. (B) Water molecules are continuously exchanging protons with each other to form hydronium and hydroxyl ions. These ions in turn rapidly recombine to form water molecules.



absence of water. Because of their fundamental importance in all biological systems, we summarize their properties here:

Electrostatic attractions. These result from the attractive forces between oppositely charged atoms. Electrostatic attractions are quite strong in the absence of water. They readily form between permanent dipoles, but are greatest when the two atoms involved are fully charged (ionic bonds). However, the polar water molecules cluster around both fully charged ions and polar molecules that contain permanent dipoles (Figure 2-14). This greatly reduces the attractiveness of these charged species for each other in most biological settings.

Hydrogen bonds. The structure of a typical hydrogen bond is illustrated in Figure 2-15. This bond represents a special form of polar interaction in which an electropositive hydrogen atom is partially shared by two electronegative atoms. Its hydrogen can be viewed as a proton that has partially dissociated from a donor atom, allowing it to be shared by a second acceptor atom. Unlike a typical electrostatic interaction, this bond is highly directional, being strongest when a straight line can be drawn between all three of the involved atoms. As already discussed, water weakens these bonds by forming competing hydrogen-bond interactions with the involved molecules.

van der Waals attractions. The electron cloud around any nonpolar atom will fluctuate, producing a flickering dipole. Such dipoles will transiently induce an oppositely polarized flickering dipole in a nearby atom. This interaction generates a very weak attraction between atoms. But since many atoms can be simultaneously in contact when two surfaces fit closely, the net result is often significant. Water does not weaken these so-called van der Waals attractions.

The fourth effect that often brings molecules together in water is not, strictly speaking, a bond at all. However, a very important hydrophobic force is caused by a pushing of nonpolar surfaces out of the hydrogen-bonded water network, where they would otherwise physically interfere with the highly favorable interactions between water molecules. Bringing any two nonpolar surfaces together reduces their contact with water; in this sense, the force is nonspecific. Nevertheless, we shall see in Chapter 3 that hydrophobic forces are central to the proper folding of protein molecules.

Panel 2-3 provides an overview of the four types of attractions just described. And Figure 2-16 illustrates schematically how many such interactions can sum to hold together the matching surfaces of two macromolecules, even though each interaction by itself would be much too weak to be effective in the face of thermal motions.
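The way individually weak attractions add up can be shown with simple arithmetic. The sketch below uses the strengths listed for bonds in water in Table 2-1 (taken here to be in kcal/mole) and treats the interactions as simply additive, which is a simplifying assumption. The interface composition is hypothetical and chosen only for illustration.

```python
# Approximate bond strengths in water, from Table 2-1 (assumed units: kcal/mole).
STRENGTH_IN_WATER = {
    "covalent": 90.0,
    "ionic": 3.0,
    "hydrogen": 1.0,
    "van_der_waals_per_atom": 0.1,
}


def total_noncovalent_energy(counts):
    """Sum the strengths of a set of weak interactions, treating them as additive."""
    return sum(STRENGTH_IN_WATER[kind] * n for kind, n in counts.items())


# Hypothetical interface between two macromolecules (counts are made up).
interface = {"ionic": 2, "hydrogen": 8, "van_der_waals_per_atom": 40}

total = total_noncovalent_energy(interface)
fraction_of_covalent = total / STRENGTH_IN_WATER["covalent"]
print(f"combined strength: {total:.1f} kcal/mole")
print(f"relative to one covalent bond: {fraction_of_covalent:.2f}")
```

No single contact in this made-up interface exceeds 3 kcal/mole, yet together the fifty contacts reach a substantial fraction of the strength of a covalent bond.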


Figure 2-14 How the dipoles on water molecules orient to reduce the affinity of oppositely charged ions or polar groups for each other.

A Cell Is Formed from Carbon Compounds

Figure 2-15 Hydrogen bonds. (A) Ball-and-stick model of a typical hydrogen bond. The distance between the hydrogen and the oxygen atom here is less than the sum of their van der Waals radii, indicating a partial sharing of electrons. (B) The most common hydrogen bonds in cells.


Having looked at the ways atoms combine into small molecules and how these molecules behave in an aqueous environment, we now examine the main classes of small molecules found in cells and their biological roles. We shall see that a few basic categories of molecules, formed from a handful of different elements, give rise to all the extraordinary richness of form and behavior shown by living things.

If we disregard water and inorganic ions such as potassium, nearly all the molecules in a cell are based on carbon. Carbon is outstanding among all the elements in its ability to form large molecules; silicon is a poor second. Because it is small and has four electrons and four vacancies in its outermost shell, a carbon atom can form four covalent bonds with other atoms. Most important, one carbon atom can join to other carbon atoms through highly stable covalent C-C



[Figure 2-15 labels: (A) hydrogen bond ~0.3 nm long, between donor atom and acceptor atom; covalent bond ~0.1 nm long]



bonds to form chains and rings and hence generate large and complex molecules with no obvious upper limit to their size (see Panel 2-1, pp. 106-107). The small and large carbon compounds made by cells are called organic molecules.

Certain combinations of atoms, such as the methyl (-CH₃), hydroxyl (-OH), carboxyl (-COOH), carbonyl (-C=O), phosphate (-PO₃²⁻), sulfhydryl (-SH), and amino (-NH₂) groups, occur repeatedly in organic molecules. Each such chemical group has distinct chemical and physical properties that influence the behavior of the molecule in which the group occurs. The most common chemical groups and some of their properties are summarized in Panel 2-1, pp. 106-107.

Cells Contain Four Major Families of Small Organic Molecules

The small organic molecules of the cell are carbon-based compounds that have molecular weights in the range 100-1000 and contain up to 30 or so carbon atoms. They are usually found free in solution and have many different fates. Some are used as monomer subunits to construct the giant polymeric macromolecules of the cell: the proteins, nucleic acids, and large polysaccharides. Others act as energy sources and are broken down and transformed into other small molecules in a maze of intracellular metabolic pathways. Many small molecules have more than one role in the cell, for example, acting both as a potential subunit for a macromolecule and as an energy source. Small organic molecules are much less abundant than the organic macromolecules, accounting for only about one-tenth of the total mass of organic matter in a cell (Table 2-2). As a rough guess, there may be a thousand different kinds of these small molecules in a typical cell.

All organic molecules are synthesized from and are broken down into the same set of simple compounds. Both their synthesis and their breakdown occur through sequences of limited chemical changes that follow definite rules. As a consequence, the compounds in a cell are chemically related and most can be classified into a few distinct families. Broadly speaking, cells contain four major families of small organic molecules: the sugars, the fatty acids, the amino acids, and the nucleotides (Figure 2-17). Although many compounds present in cells do not fit into these categories, these four families of small organic molecules, together with the macromolecules made by linking them into long chains, account for a large fraction of cell mass (see Table 2-2).
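The "about one-tenth" figure can be checked against the values in Table 2-2. The sketch below assumes that the first numeric column of that table gives each class's percentage of total cell weight for a bacterial cell; water and inorganic ions are left out because they are not organic matter. The variable names are illustrative.

```python
# Percent of total cell weight for a bacterial cell, read from Table 2-2
# (the meaning of this column is an assumption based on the surrounding text).
PERCENT_OF_CELL_WEIGHT = {
    "sugars and precursors": 1.0,
    "amino acids and precursors": 0.4,
    "nucleotides and precursors": 0.4,
    "fatty acids and precursors": 1.0,
    "other small molecules": 0.2,
    "macromolecules": 26.0,
}

# Everything except the macromolecules counts as small organic molecules.
small = sum(
    percent
    for name, percent in PERCENT_OF_CELL_WEIGHT.items()
    if name != "macromolecules"
)
organic = small + PERCENT_OF_CELL_WEIGHT["macromolecules"]
small_fraction = small / organic

print(f"small organic molecules: {small:.1f}% of cell weight")
print(f"share of all organic matter: {small_fraction:.2f}")
```

On these numbers the small molecules make up roughly 3% of the cell's weight and the macromolecules 26%, so the small molecules are close to one-tenth of the organic total.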

Sugars Provide an Energy Source for Cells and Are the Subunits of Polysaccharides

The simplest sugars, the monosaccharides, are compounds with the general formula (CH₂O)ₙ, where n is usually 3, 4, 5, 6, 7, or 8. Sugars, and the molecules made from them, are also called carbohydrates because of this simple formula. Glucose, for example, has the formula C₆H₁₂O₆ (Figure 2-18). The formula, however, does not fully define the molecule: the same set of carbons, hydrogens, and

Table 2-2 The Types of Molecules That Form a Bacterial Cell

                                          Percent of total   Number of types
                                          cell weight        of each molecule
Water                                     70                 1
Inorganic ions                            1                  20
Sugars and precursors                     1                  250
Amino acids and precursors                0.4                100
Nucleotides and precursors                0.4                100
Fatty acids and precursors                1                  50
Other small molecules                     0.2                ~300
Macromolecules (proteins, nucleic         26                 ~3000
acids, and polysaccharides)


Figure 2-16 Schematic indicating how two macromolecules with complementary surfaces can bind tightly to one another through noncovalent interactions.



[Figure 2-17 labels: building blocks of the cell (sugars, fatty acids, amino acids, nucleotides) → larger units of the cell]

oxygens can be joined together by covalent bonds in a variety of ways, creating structures with different shapes. As shown in Panel 2-4 (pp. 112-113), for example, glucose can be converted into a different sugar (mannose or galactose) simply by switching the orientations of specific OH groups relative to the rest of the molecule. Each of these sugars, moreover, can exist in either of two forms, called the D-form and the L-form, which are mirror images of each other. Sets of molecules with the same chemical formula but different structures are called isomers, and the subset of such molecules that are mirror-image pairs are called optical isomers. Isomers are widespread among organic molecules in general, and they play a major part in generating the enormous variety of sugars.

Panel 2-4 presents an outline of sugar structure and chemistry. Sugars can exist as rings or as open chains. In their open-chain form, sugars contain a number of hydroxyl groups and either one aldehyde (-CHO) or one ketone (>C=O) group. The aldehyde or ketone group plays a special role. First, it can react with a hydroxyl group in the same molecule to convert the molecule into a ring; in the ring form the carbon of the original aldehyde or ketone group can be recognized as the only one that is bonded to two oxygens. Second, once the ring is formed, this same carbon can become further linked, via oxygen, to one of the carbons bearing a hydroxyl group on another sugar molecule. This creates a disaccharide such as sucrose, which is composed of a glucose and a fructose unit. Larger sugar polymers range from the oligosaccharides (trisaccharides, tetrasaccharides, and so on) up to giant polysaccharides, which can contain thousands of monosaccharide units.

The way that sugars are linked together to form polymers illustrates some common features of biochemical bond formation. A bond is formed between an -OH group on one sugar and an -OH group on another by a condensation reaction, in which a molecule of water is expelled as the bond is formed (Figure 2-19). Subunits in other biological polymers, such as nucleic acids and proteins, are also linked by condensation reactions in which water is expelled. The bonds created by all of these condensation reactions can be broken by the reverse process of hydrolysis, in which a molecule of water is consumed (see Figure 2-19).
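The bookkeeping behind condensation can be made concrete by counting atoms. The sketch below builds the atom counts for a monosaccharide from the general formula (CH₂O)ₙ and removes one H₂O for each bond formed when units are joined in an unbranched chain. The helper names are hypothetical, and only molecular formulas are tracked, not structure or isomer identity.

```python
from collections import Counter

WATER = Counter({"H": 2, "O": 1})


def monosaccharide(n):
    """Atom counts for a simple sugar with the general formula (CH2O)n."""
    return Counter({"C": n, "H": 2 * n, "O": n})


def condense(*units):
    """Join units in an unbranched chain, expelling one water per bond formed."""
    total = Counter()
    for unit in units:
        total.update(unit)
    for _ in range(len(units) - 1):
        total.subtract(WATER)
    return total


def formula(atoms):
    """Render atom counts as a plain-text molecular formula."""
    return "".join(f"{element}{atoms[element]}" for element in ("C", "H", "O") if atoms[element])


glucose = monosaccharide(6)
disaccharide = condense(glucose, glucose)
print(formula(glucose))       # C6H12O6
print(formula(disaccharide))  # C12H22O11
```

Hydrolysis is the same bookkeeping in reverse: adding one H₂O back to the disaccharide's atom counts restores the totals of two separate glucose molecules.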

Figure 2-17 The four main families of small organic molecules in cells. These small molecules form the monomeric building blocks, or subunits, for most of the macromolecules and other assemblies of the cell. Some, such as the sugars and the fatty acids, are also energy sources.

Figure 2-18 The structure of glucose, a simple sugar. As illustrated previously for water (see Figure 2-12), any molecule can be represented in several ways. In the structural formulas shown in (A), (B) and (C), the atoms are shown as chemical symbols linked together by lines representing the covalent bonds. The thickened lines here are used to indicate the plane of the sugar ring, in an attempt to emphasize that the -H and -OH groups are not in the same plane as the ring. (A) The open-chain form of this sugar, which is in equilibrium with the more stable cyclic or ring form in (B). (C) The chair form is an alternative way to draw the cyclic molecule that reflects the geometry more accurately than the structural formula in (B). (D) A space-filling model, which, as well as depicting the three-dimensional arrangement of the atoms, also uses the van der Waals radii to represent the surface contours of the molecule. (E) A ball-and-stick model in which the three-dimensional arrangement of the atoms in space is shown. (H, white; C, black; O, red; N, blue.)


Figure 2-19 The reaction of two monosaccharides to form a disaccharide. This reaction belongs to a general category of reactions termed condensation reactions, in which two molecules join together as a result of the loss of a water molecule. The reverse reaction (in which water is added) is termed hydrolysis. Note that the reactive carbon at which the new bond is formed (on the monosaccharide on the left here) is the carbon joined to two oxygens as a result of sugar ring formation (see Figure 2-18). As indicated, this common type of covalent bond between two sugar molecules is known as a glycosidic bond (see also Figure 2-20).

[Figure 2-19 labels: monosaccharide; condensation (water expelled); hydrolysis (water consumed)]

Because each monosaccharide has several free hydroxyl groups that can form a link to another monosaccharide (or to some other compound), sugar polymers can be branched, and the number of possible polysaccharide structures is extremely large. Even a simple disaccharide consisting of two glucose units can exist in eleven different varieties (Figure 2-20), while three different hexoses (C₆H₁₂O₆) can join together to make several thousand trisaccharides. For this reason it is a much more complex task to determine the arrangement of sugars in a polysaccharide than to determine the nucleotide sequence of a DNA molecule, where each unit is joined to the next in exactly the same way.

The monosaccharide glucose is a key energy source for cells. In a series of reactions, it is broken down to smaller molecules, releasing energy that the cell can harness to do useful work, as we shall explain later. Cells use simple polysaccharides composed only of glucose units (principally glycogen in animals and starch in plants) as energy stores.
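One way to account for the eleven glucose-glucose disaccharides is to enumerate the possible linkages. The sketch below rests on assumptions that the passage above does not spell out: that the bond always involves the ring carbon C1 of one glucose in either the α or β orientation, that it joins either a hydroxyl on C2, C3, C4, or C6 of the second glucose or the second unit's own C1, and that a C1-to-C1 pair is counted once regardless of order because the two units are identical.

```python
from itertools import combinations_with_replacement, product

ANOMERS = ("alpha", "beta")       # orientation of the linking oxygen at C1
ACCEPTOR_CARBONS = (2, 3, 4, 6)   # hydroxyl-bearing carbons on the second glucose

# C1 of the first glucose linked to a non-ring-forming hydroxyl of the second.
one_to_n = [f"{a}1->{c}" for a, c in product(ANOMERS, ACCEPTOR_CARBONS)]

# C1 linked to C1: both units are glucose, so unordered pairs are enough.
one_to_one = [f"{a}1->{b}1" for a, b in combinations_with_replacement(ANOMERS, 2)]

linkages = one_to_n + one_to_one
print(len(linkages))
for name in linkages:
    print(name)
```

Under these assumptions the count is eight C1-to-hydroxyl linkages plus three C1-to-C1 linkages, which matches the eleven varieties stated in the text.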

Figure 2-20 Eleven disaccharides consisting of two D-glucose units. Although these differ only in the type of linkage between the two glucose units, they are chemically distinct. Since the oligosaccharides associated with proteins and lipids may have six or more different kinds of sugar joined in both linear and branched arrangements through glycosidic bonds such as those illustrated here, the number of distinct types of oligosaccharides that can be used in cells is extremely large. For an explanation of α and β linkages, see Panel 2-4 (pp. 112-113). Short black lines ending "blind" indicate OH positions. (Red lines merely indicate disaccharide bond orientations and "corners" do not imply extra atoms.)



Sugars do not function only in the production and storage of energy. They can also be used, for example, to make mechanical supports. Thus, the most abundant organic chemical on Earth, the cellulose of plant cell walls, is a polysaccharide of glucose. Because the glucose-glucose linkages in cellulose differ from those in starch and glycogen, however, humans cannot digest cellulose and use its glucose. Another extraordinarily abundant organic substance, the chitin of insect exoskeletons and fungal cell walls, is also an indigestible polysaccharide, in this case a linear polymer of a sugar derivative called N-acetylglucosamine (see Panel 2-4). Other polysaccharides are the main components of slime, mucus, and gristle.

Smaller oligosaccharides can be covalently linked to proteins to form glycoproteins and to lipids to form glycolipids, both of which are found in cell membranes. As described in Chapter 10, most cell surfaces are clothed and decorated with glycoproteins and glycolipids in the cell membrane. The sugar side chains on these molecules are often recognized selectively by other cells. And differences between people in the details of their cell-surface sugars are the molecular basis for the different major human blood groups, termed A, B, AB, and O.

Fatty Acids Are Components of Cell Membranes, as Well as a Source of Energy

A fatty acid molecule, such as palmitic acid, has two chemically distinct regions (Figure 2-21). One is a long hydrocarbon chain, which is hydrophobic and not very reactive chemically. The other is a carboxyl (-COOH) group, which behaves as an acid (carboxylic acid): it is ionized in solution (-COO⁻), extremely hydrophilic, and chemically reactive. Almost all the fatty acid molecules in a cell are covalently linked to other molecules by their carboxylic acid group.

The hydrocarbon tail of palmitic acid is saturated: it has no double bonds between carbon atoms and contains the maximum possible number of hydrogens. Stearic acid, another one of the common fatty acids in animal fat, is also saturated. Some other fatty acids, such as oleic acid, have unsaturated tails, with one or more double bonds along their length. The double bonds create kinks in the molecules, interfering with their ability to pack together in a solid mass. It is this that accounts for the difference between hard margarine (saturated) and liquid vegetable oils (polyunsaturated). The many different fatty acids found in cells differ only in the length of their hydrocarbon chains and the number and position of the carbon-carbon double bonds (see Panel 2-5, pp. 114-115).

Fatty acids are stored in the cytoplasm of many cells in the form of droplets of triacylglycerol molecules, which consist of three fatty acid chains joined to a glycerol molecule (see Panel 2-5); these molecules are the animal fats found in meat, butter, and cream, and the plant oils such as corn oil and olive oil. When required to provide energy, the fatty acid chains are released from triacylglycerols and broken down into two-carbon units. These two-carbon units are identical to those derived from the breakdown of glucose and they enter the same energy-yielding reaction pathways, as will be described later in this chapter. Triacylglycerols (also called triglycerides) serve as a concentrated food reserve in cells, because they can be broken down to produce about six times as much usable energy, weight for weight, as glucose.

Fatty acids and their derivatives such as triacylglycerols are examples of lipids. Lipids comprise a loosely defined collection of biological molecules that are insoluble in water, while being soluble in fat and organic solvents such as benzene. They typically contain either long hydrocarbon chains, as in the fatty acids and isoprenes, or multiple linked rings, as in the steroids.

The most important function of fatty acids in cells is in the construction of cell membranes. These thin sheets enclose all cells and surround their internal organelles. They are composed largely of phospholipids, which are small molecules that, like triacylglycerols, are constructed mainly from fatty acids and glycerol. In phospholipids the glycerol is joined to two fatty acid chains, however, rather than to three as in triacylglycerols. The "third" site on the glycerol is linked to a hydrophilic phosphate group, which is in turn attached to a small hydrophilic compound such as choline (see Panel 2-5). Each phospholipid

Figure 2-21 A fatty acid. A fatty acid is composed of a hydrophobic hydrocarbon chain to which is attached a hydrophilic carboxylic acid group. Palmitic acid is shown here. Different fatty acids have different hydrocarbon tails. (A) Structural formula. The carboxylic acid group is shown in its ionized form. (B) Ball-and-stick model. (C) Space-filling model.

[Figure 2-21 labels: hydrophilic carboxylic acid head; hydrophobic hydrocarbon tail]


Figure 2-22 Phospholipid structure and the orientation of phospholipids in membranes. In an aqueous environment, the hydrophobic tails of phospholipids pack together to exclude water. Here they have formed a bilayer with the hydrophilic head of each phospholipid facing the water. Lipid bilayers are the basis for cell membranes, as discussed in detail in Chapter 10.

[Figure 2-22 labels: two hydrophobic fatty acid tails; phospholipid molecule]

molecule, therefore, has a hydrophobic tail composed of the two fatty acid chains and a hydrophilic head, where the phosphate is located. This gives them different physical and chemical properties from triacylglycerols, which are predominantly hydrophobic. Molecules such as phospholipids, with both hydrophobic and hydrophilic regions, are termed amphiphilic.

The membrane-forming property of phospholipids results from their amphiphilic nature. Phospholipids will spread over the surface of water to form a monolayer of phospholipid molecules, with the hydrophobic tails facing the air and the hydrophilic heads in contact with the water. Two such molecular layers can readily combine tail-to-tail in water to make a phospholipid sandwich, or lipid bilayer. This bilayer is the structural basis of all cell membranes (Figure 2-22).

Amino Acids Are the Subunits of Proteins

Amino acids are a varied class of molecules with one defining property: they all possess a carboxylic acid group and an amino group, both linked to a single carbon atom called the α-carbon (Figure 2-23). Their chemical variety comes from the side chain that is also attached to the α-carbon. The importance of amino acids to the cell comes from their role in making proteins, which are polymers of amino acids joined head-to-tail in a long chain that is then folded into a three-dimensional structure unique to each type of protein. The covalent linkage between two adjacent amino acids in a protein chain forms an amide (see Panel 2-1), and it is called a peptide bond; the chain of amino acids is also known as a polypeptide (Figure 2-24). Regardless of the specific amino acids from which it is made, the polypeptide has an amino (NH₂) group at one end (its N-terminus) and a carboxyl (COOH) group at its other end (its C-terminus). This gives it a definite directionality: a structural (as opposed to an electrical) polarity.

Each of the 20 amino acids found commonly in proteins has a different side chain attached to the α-carbon atom (see Panel 3-1, pp. 128-129). All organisms,

Figure 2-23 The amino acid alanine. (A) In the cell, where the pH is close to 7, the free amino acid exists in its ionized form; but when it is incorporated into a polypeptide chain, the charges on the amino and carboxyl groups disappear. (B) A ball-and-stick model and (C) a space-filling model of alanine (H, white; C, black; O, red; N, blue).


Chapter 2: Cell Chemistry and Biosynthesis

Figure 2-24 A small part of a protein molecule. The four amino acids shown are linked together by three peptide bonds, one of which is highlighted in yellow. One of the amino acids is shaded in gray. The amino acid side chains are shown in red. The two ends of a polypeptide chain are chemically distinct. One end, the N-terminus, terminates in an amino group, and the other, the C-terminus, in a carboxyl group. The sequence is always read from the N-terminal end; hence this sequence is Phe-Ser-Glu-Lys.

whether bacteria, archaea, plants, or animals, have proteins made of the same 20 amino acids. How this precise set of 20 came to be chosen is one of the mysteries of the evolution of life; there is no obvious chemical reason why other amino acids could not have served just as well. But once the choice was established, it could not be changed; too much depended on it.

Like sugars, all amino acids, except glycine, exist as optical isomers in D- and L-forms (see Panel 3-1). But only L-forms are ever found in proteins (although D-amino acids occur as part of bacterial cell walls and in some antibiotics). The origin of this exclusive use of L-amino acids to make proteins is another evolutionary mystery.

The chemical versatility of the 20 amino acids is essential to the function of proteins. Five of the 20 amino acids have side chains that can form ions in neutral aqueous solution and thereby can carry a charge (Figure 2-25). The others are uncharged; some are polar and hydrophilic, and some are nonpolar and hydrophobic. As we discuss in Chapter 3, the properties of the amino acid side chains underlie the diverse and sophisticated functions of proteins.




Figure 2-25 The charge on amino acid side chains depends on the pH. The five different side chains that can carry a charge are shown. Carboxylic acids can readily lose H+ in aqueous solution to form a negatively charged ion, which is denoted by the suffix "-ate," as in aspartate or glutamate. A comparable situation exists for amines, which in aqueous solution can take up H+ to form a positively charged ion (which does not have a special name). These reactions are rapidly reversible, and the amounts of the two forms, charged and uncharged, depend on the pH of the solution. At a high pH, carboxylic acids tend to be charged and amines uncharged. At a low pH, the opposite is true: the carboxylic acids are uncharged and amines are charged. The pH at which exactly half of the carboxylic acid or amine residues are charged is known as the pK of that amino acid side chain (indicated by yellow stripe). In the cell the pH is close to 7, and almost all carboxylic acids and amines are in their fully charged form. (Labels recovered from the figure: aspartic acid, pK ~4.7; glutamic acid, pK ~4.7.)
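The relationship between pH, pK, and charge described in this caption can be sketched numerically. The short Python example below is only an illustration: it assumes the standard Henderson-Hasselbalch relation (not stated in the text), uses the pK of about 4.7 shown in Figure 2-25 for aspartic and glutamic acid, and uses an assumed, purely illustrative pK of 10.5 for an amine side chain. The function names are hypothetical.

```python
# A minimal sketch, assuming the Henderson-Hasselbalch relation.
# CARBOXYL_PK comes from the Figure 2-25 labels; AMINE_PK is an
# assumed illustrative value that does not appear in the text.

CARBOXYL_PK = 4.7
AMINE_PK = 10.5


def fraction_charged_acid(ph: float, pk: float) -> float:
    """Fraction of a carboxylic acid group that has lost H+ (negatively charged)."""
    return 1.0 / (1.0 + 10.0 ** (pk - ph))


def fraction_charged_amine(ph: float, pk: float) -> float:
    """Fraction of an amine group that has taken up H+ (positively charged)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pk))


if __name__ == "__main__":
    for ph in (2.0, 4.7, 7.0, 12.0):
        acid = fraction_charged_acid(ph, CARBOXYL_PK)
        amine = fraction_charged_amine(ph, AMINE_PK)
        print(f"pH {ph:4.1f}: carboxyl charged {acid:.3f}, amine charged {amine:.3f}")
```

Under these assumptions, exactly half of the carboxyl groups are charged when the pH equals the pK, and at pH 7 both kinds of group are almost entirely in their charged form, as the caption states.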



Figure 2-26 Chemical structure of adenosine triphosphate (ATP). (A) Structural formula. (B) Space-filling model. In (B) the colors of the atoms are C, black; N, blue; H, white; O, red; and P, yellow.




Nucleotides Are the Subunits of DNA and RNA

A nucleotide is a molecule made up of a nitrogen-containing ring compound linked to a five-carbon sugar, which in turn carries one or more phosphate groups (Panel 2-6, pp. 116-117). The five-carbon sugar can be either ribose or deoxyribose. Nucleotides containing ribose are known as ribonucleotides, and those containing deoxyribose as deoxyribonucleotides. The nitrogen-containing rings are generally referred to as bases for historical reasons: under acidic conditions they can each bind an H+ (proton) and thereby increase the concentration of OH- ions in aqueous solution. There is a strong family resemblance between the different bases. Cytosine (C), thymine (T), and uracil (U) are called pyrimidines because they all derive from a six-membered pyrimidine ring; guanine (G) and adenine (A) are purine compounds, and they have a second, five-membered ring fused to the six-membered ring. Each nucleotide is named for the base it contains (see Panel 2-6).

Nucleotides can act as short-term carriers of chemical energy. Above all others, the ribonucleotide adenosine triphosphate, or ATP (Figure 2-26), transfers energy in hundreds of different cell reactions. ATP is formed through reactions that are driven by the energy released by the oxidative breakdown of foodstuffs. Its three phosphates are linked in series by two phosphoanhydride bonds, whose rupture releases large amounts of useful energy. The terminal phosphate group in particular is frequently split off by hydrolysis, often transferring a phosphate to other molecules and releasing energy that drives energy-requiring biosynthetic reactions (Figure 2-27). Other nucleotide derivatives are carriers for the transfer of other chemical groups, as will be described later.

The most fundamental role of nucleotides in the cell, however, is in the storage and retrieval of biological information. Nucleotides serve as building blocks for the construction of nucleic acids: long polymers in which nucleotide subunits are covalently linked by the formation of a phosphodiester bond between the



Figure 2-27 The ATP molecule serves as an energy carrier in cells. The energy-requiring formation of ATP from ADP and inorganic phosphate is coupled to the energy-yielding oxidation of foodstuffs (in animal cells, fungi, and some bacteria) or to the capture of light energy (in plant cells and some bacteria). The hydrolysis of this ATP back to ADP and inorganic phosphate in turn provides the energy to drive many cell reactions.


Figure 2-28 A small part of one chain of a deoxyribonucleic acid (DNA) molecule. Four nucleotides are shown. One of the phosphodiester bonds that links adjacent nucleotide residues is highlighted in yellow, and one of the nucleotides is shaded in gray. Nucleotides are linked together by a phosphodiester linkage between specific carbon atoms of the ribose, known as the 5' and 3' atoms. For this reason, one end of a polynucleotide chain, the 5' end, will have a free phosphate group and the other, the 3' end, a free hydroxyl group. The linear sequence of nucleotides in a polynucleotide chain is commonly abbreviated by a one-letter code, and the sequence is always read from the 5' end. In the example illustrated the sequence is G-A-T-C.

phosphate group attached to the sugar of one nucleotide and a hydroxyl group on the sugar of the next nucleotide (Figure 2-28). Nucleic acid chains are synthesized from energy-rich nucleoside triphosphates by a condensation reaction that releases inorganic pyrophosphate during phosphodiester bond formation.

There are two main types of nucleic acids, differing in the type of sugar in their sugar-phosphate backbone. Those based on the sugar ribose are known as ribonucleic acids, or RNA, and normally contain the bases A, G, C, and U. Those based on deoxyribose (in which the hydroxyl at the 2' position of the ribose carbon ring is replaced by a hydrogen) are known as deoxyribonucleic acids, or DNA, and contain the bases A, G, C, and T (T is chemically similar to the U in RNA, merely adding the methyl group on the pyrimidine ring; see Panel 2-6). RNA usually occurs in cells as a single polynucleotide chain, but DNA is virtually always a double-stranded molecule: a DNA double helix composed of two polynucleotide chains running antiparallel to each other and held together by hydrogen-bonding between the bases of the two chains.

The linear sequence of nucleotides in a DNA or an RNA encodes the genetic information of the cell. The ability of the bases in different nucleic acid molecules to recognize and pair with each other by hydrogen-bonding (called base-pairing), G with C, and A with either T or U, underlies all of heredity and evolution, as explained in Chapter 4.
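The base-pairing rules and the antiparallel arrangement of the two DNA strands can be illustrated with a few lines of Python. This is a sketch only: the function names are hypothetical, and sequences are represented simply as strings written from the 5' end, following the one-letter convention described for Figure 2-28.

```python
# A minimal sketch of base-pairing between antiparallel strands.
# Sequences are plain strings written 5' -> 3'; names are illustrative.

DNA_PAIRS = {"G": "C", "C": "G", "A": "T", "T": "A"}


def partner_strand(seq_5_to_3: str) -> str:
    """Return the base-paired DNA partner strand, also written 5' -> 3'.

    Because the two strands run antiparallel, the partner is read in the
    opposite direction, so the paired bases are emitted in reverse order.
    """
    return "".join(DNA_PAIRS[base] for base in reversed(seq_5_to_3))


def rna_version(dna_seq: str) -> str:
    """Write the same sequence with U in place of T, as found in RNA."""
    return dna_seq.replace("T", "U")


if __name__ == "__main__":
    example = "GATC"  # the sequence illustrated in Figure 2-28
    print(example, "pairs with", partner_strand(example))
    print("RNA form:", rna_version(example))
```

The example sequence G-A-T-C happens to pair with a strand that also reads G-A-T-C from its own 5' end, which is a convenient check that the antiparallel reading direction has been handled.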





The Chemistry of Cells Is Dominated by Macromolecules with Remarkable Properties



By weight, macromolecules are the most abundant carbon-containing molecules in a living cell (Figure 2-29 and Table 2-3). They are the principal building blocks from which a cell is constructed and also the components that confer the most distinctive properties of living things. The macromolecules in cells are polymers that are constructed by covalently linking small organic molecules (called monomers) into long chains (Figure 2-30). Yet they have remarkable properties that could not have been predicted from their simple constituents.

Proteins are especially abundant and versatile. They perform thousands of distinct functions in cells. Many proteins serve as enzymes, the catalysts that

Figure 2-29 Macromolecules are abundant in cells. The approximate composition of a bacterial cell is shown by weight. The composition of an animal cell is similar (see Table 2-3).



Table 2-3 Approximate Chemical Compositions of a Typical Bacterium and a Typical Mammalian Cell
(component amounts as percent of total cell weight)

Component                                          Bacterium         Mammalian cell
H2O                                                70                70
Inorganic ions (Na+, K+, Mg2+, Ca2+, Cl-, etc.)    1                 1
Miscellaneous small metabolites                    3                 3
Proteins                                           15                18
RNA                                                6                 1.1
DNA                                                1                 0.25
Phospholipids                                      [illegible]       [illegible]
Other lipids                                       [illegible]       [illegible]
Polysaccharides                                    [illegible]       [illegible]
Total cell volume                                  2 x 10^-12 cm3    4 x 10^-9 cm3
Relative cell volume                               1                 2000

Proteins, polysaccharides, DNA, and RNA are macromolecules. Lipids are not generally classed as [...] both mammalian and bacterial cells.

[...] assembles to make the cell's long microtubules, or histones, proteins that compact the DNA in chromosomes. Yet other proteins act as molecular motors to produce force and movement, as in the case of myosin in muscle. Proteins perform many other functions, and we shall examine the molecular basis for many of them later in this book. Here we identify some general principles of macromolecular chemistry that make such functions possible.

Although the chemical reactions for adding subunits to each polymer are different in detail for proteins, nucleic acids, and polysaccharides, they share important features. Each polymer grows by the addition of a monomer onto the end of a growing polymer chain in a condensation reaction, in which a [...]

[...] made from a set of monomers that are slightly different from one another, for example, the 20 different amino acids from which proteins are made. It is critical to life that the polymer chain is not assembled at random from these subunits; instead the subunits are added in a particular order, or sequence. The elaborate mechanisms that allow this to be accomplished by enzymes are described in detail in Chapters 5 and 6.

Noncovalent Bonds Specify Both the Precise Shape of a Macromolecule and its Binding to Other Molecules

Most of the covalent bonds in a macromolecule allow rotation of the atoms they join, giving the polymer chain great flexibility. In principle, this allows a macromolecule to adopt an almost unlimited number of shapes, or conformations, as


Figure 2-30 Three families of macromolecules. Each is a polymer formed from small molecules (called monomers) linked together by covalent bonds.


Figure 2-31 Most proteins and many RNA molecules fold into only one stable conformation. If the noncovalent bonds maintaining this stable conformation are disrupted, the molecule becomes a flexible chain that usually has no biological value.

random thermal energy causes the polymer chain to writhe and rotate. However, the shapes of most biological macromolecules are highly constrained because of the many weak noncovalent bonds that form between different parts of the same molecule. If these noncovalent bonds are formed in sufficient numbers, the polymer chain can strongly prefer one particular conformation, determined by the linear sequence of monomers in its chain. Most protein molecules and many of the small RNA molecules found in cells fold tightly into one highly preferred conformation in this way (Figure 2-31).

The four types of noncovalent interactions important in biological molecules were described earlier, and they are reviewed in Panel 2-3 (pp. 110-111). Although individually very weak, these interactions cooperate to fold biological macromolecules into unique shapes. In addition, they can also add up to create a strong attraction between two different molecules when these molecules fit together very closely, like a hand in a glove. This form of molecular interaction provides for great specificity, inasmuch as the multipoint contacts required for strong binding make it possible for a macromolecule to select out, through binding, just one of the many thousands of other types of molecules present inside a cell. Moreover, because the strength of the binding depends on the number of noncovalent bonds that are formed, interactions of almost any affinity are possible, allowing rapid dissociation when necessary.

Binding of this type underlies all biological catalysis, making it possible for proteins to function as enzymes. Noncovalent interactions also allow macromolecules to be used as building blocks for the formation of larger structures. In cells, macromolecules often bind together into large complexes, thereby forming intricate machines with multiple moving parts that perform such complex tasks as DNA replication and protein synthesis (Figure 2-32).


Figure 2-32 Small molecules, proteins, and a ribosome drawn approximately to scale. Ribosomes are a central part of the machinery that the cell uses to make proteins: each ribosome is formed as a complex of about 90 macromolecules (protein and RNA molecules).



Summary

Living organisms are autonomous, self-propagating chemical systems. They are made from a distinctive and restricted set of small carbon-based molecules that are essentially the same for every living species. Each of these molecules is composed of a small set of atoms linked to each other in a precise configuration through covalent bonds. The main categories are sugars, fatty acids, amino acids, and nucleotides. Sugars are a primary source of chemical energy for cells and can be incorporated into polysaccharides for energy storage. Fatty acids are also important for energy storage, but their most critical function is in the formation of cell membranes. Polymers consisting of amino acids constitute the remarkably diverse and versatile macromolecules known as proteins. Nucleotides play a central part in energy transfer. They are also the subunits for the informational macromolecules, RNA and DNA.

Most of the dry mass of a cell consists of macromolecules that have been produced as linear polymers of amino acids (proteins) or nucleotides (DNA and RNA), covalently linked to each other in an exact order. Most of the protein molecules and many of the RNAs fold into a unique conformation that depends on their sequence of subunits. This folding process creates unique surfaces, and it depends on a large set of weak attractions produced by noncovalent forces between atoms. These forces are of four types: electrostatic attractions, hydrogen bonds, van der Waals attractions, and an interaction between nonpolar groups caused by their hydrophobic expulsion from water. The same set of weak forces governs the specific binding of other molecules to macromolecules, making possible the myriad associations between biological molecules that produce the structure and the chemistry of a cell.

CATALYSIS AND THE USE OF ENERGY BY CELLS

One property of living things above all makes them seem almost miraculously different from nonliving matter: they create and maintain order, in a universe that is tending always to greater disorder (Figure 2-33). To create this order, the cells in a living organism must perform a never-ending stream of chemical reactions. In some of these reactions, small organic molecules (amino acids, sugars, nucleotides, and lipids) are being taken apart or modified to supply the many other small molecules that the cell requires. In other reactions, these small molecules are being used to construct an enormously diverse range of proteins, nucleic acids, and other macromolecules that endow living systems with all of their most distinctive properties. Each cell can be viewed as a tiny chemical factory performing many millions of reactions every second.

Figure 2-33 Order in biological structures. Well-defined, ornate, and beautiful spatial patterns can be found at every level of organization in living organisms. In order of increasing size: (A) protein molecules in the coat of a virus; (B) the regular array of microtubules seen in a cross section of a sperm tail; (C) surface contours of a pollen grain (a single cell); (D) close-up of the wing of a butterfly showing the pattern created by scales, each scale being the product of a single cell; (E) spiral array of seeds, made of millions of cells, in the head of a sunflower. (A, courtesy of R.A. Grant and J.M. Hogle; B, courtesy of Lewis Tilney; C, courtesy of Colin MacFarlane and Chris Jeffree; D and E, courtesy of Kjell B. Sandved.)


Figure 2-34 How a set of enzyme-catalyzed reactions generates a metabolic pathway. Each enzyme catalyzes a particular chemical reaction, leaving the enzyme unchanged. In this example, a set of enzymes acting in series converts molecule A to molecule F, forming a metabolic pathway.

Cell Metabolism Is Organized by Enzymes

The chemical reactions that a cell carries out would normally occur only at much higher temperatures than those existing inside cells. For this reason, each reaction requires a specific boost in chemical reactivity. This requirement is crucial, because it allows the cell to control each reaction. The control is exerted through the specialized proteins called enzymes, each of which accelerates, or catalyzes, just one of the many possible kinds of reactions that a particular molecule might undergo. Enzyme-catalyzed reactions are usually connected in series, so that the product of one reaction becomes the starting material, or substrate, for the next (Figure 2-34). These long linear reaction pathways are in turn linked to one another, forming a maze of interconnected reactions that enable the cell to survive, grow, and reproduce (Figure 2-35).

Two opposing streams of chemical reactions occur in cells: (1) the catabolic pathways break down foodstuffs into smaller molecules, thereby generating both a useful form of energy for the cell and some of the small molecules that the cell needs as building blocks, and (2) the anabolic, or biosynthetic, pathways use the energy harnessed by catabolism to drive the synthesis of the many other molecules that form the cell. Together these two sets of reactions constitute the metabolism of the cell (Figure 2-36).

Many of the details of cell metabolism form the traditional subject of biochemistry and need not concern us here. But the general principles by which cells obtain energy from their environment and use it to create order are central to cell biology. We begin with a discussion of why a constant input of energy is needed to sustain living organisms.
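The idea that enzymes acting in series turn the product of one reaction into the substrate of the next can be modeled as a simple chain. The Python sketch below follows the A-to-F example of Figure 2-34; the intermediate names B through E, the enzyme labels, and the `missing_enzymes` parameter are illustrative assumptions rather than anything specified in the text.

```python
# A minimal sketch of a linear metabolic pathway (after Figure 2-34).
# Each substrate maps to (enzyme that acts on it, product it yields).
# Intermediates B-E and all names here are illustrative assumptions.

PATHWAY = {
    "A": ("enzyme 1", "B"),
    "B": ("enzyme 2", "C"),
    "C": ("enzyme 3", "D"),
    "D": ("enzyme 4", "E"),
    "E": ("enzyme 5", "F"),
}


def run_pathway(start, steps, missing_enzymes=()):
    """Follow the chain from `start`, stopping where no enzyme is available."""
    molecule = start
    history = [molecule]
    while molecule in steps:
        enzyme, product = steps[molecule]
        if enzyme in missing_enzymes:
            break  # without its specific catalyst, this step does not proceed
        history.append(product)
        molecule = product
    return history


if __name__ == "__main__":
    print(" -> ".join(run_pathway("A", PATHWAY)))
    print(" -> ".join(run_pathway("A", PATHWAY, missing_enzymes={"enzyme 3"})))
```

In this toy model, withholding a single enzyme stalls the chain at that step, which is one way to picture how requiring a specific catalyst for each reaction gives the cell a point of control.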

Biological Order Is Made Possible by the Release of Heat Energy from Cells

The universal tendency of things to become disordered is a fundamental law of physics, the second law of thermodynamics, which states that in the universe, or in any isolated system (a collection of matter that is completely isolated from the rest of the universe), the degree of disorder only increases. This law has such profound implications for all living things that we restate it in several ways.

For example, we can present the second law in terms of probability and state that systems will change spontaneously toward those arrangements that have the greatest probability. If we consider, for example, a box of 100 coins all lying heads up, a series of accidents that disturbs the box will tend to move the arrangement toward a mixture of 50 heads and 50 tails. The reason is simple: there is a huge number of possible arrangements of the individual coins in the mixture that can achieve the 50-50 result, but only one possible arrangement that keeps all of the coins oriented heads up. Because the 50-50 mixture is therefore the most probable, we say that it is more "disordered." For the same reason,

Figure 2-35 Some of the metabolic pathways and their interconnections in a typical cell. About 500 common metabolic reactions are shown diagrammatically, with each molecule in a metabolic pathway represented by a filled circle, as in the yellow box in Figure 2-34. The pathway that is highlighted in this diagram with larger circles and connecting lines is the central pathway of sugar metabolism, which will be discussed shortly.

Figure 2-36 Schematic representation of the relationship between catabolic and anabolic pathways in metabolism. As suggested here, since a major portion of the energy stored in the chemical bonds of food molecules is dissipated as heat, the mass of food required by any organism that derives all of its energy from catabolism is much greater than the mass of the molecules that can be produced by anabolism.


it is a common experience that one's living space will become increasingly disordered without intentional effort: the movement toward disorder is a spontaneous process, requiring a periodic effort to reverse it (Figure 2-37).

The amount of disorder in a system can be quantified and expressed as the entropy of the system: the greater the disorder, the greater the entropy. Thus, another way to express the second law of thermodynamics is to say that systems will change spontaneously toward arrangements with greater entropy.

Living cells, by surviving, growing, and forming complex organisms, are generating order and thus might appear to defy the second law of thermodynamics. How is this possible? The answer is that a cell is not an isolated system: it takes in energy from its environment in the form of food, or as photons from the sun (or even, as in some chemosynthetic bacteria, from inorganic molecules alone), and it then uses this energy to generate order within itself. In the course of the chemical reactions that generate order, the cell converts part of the energy it uses into heat. The heat is discharged into the cell's environment and disorders it, so that the total entropy (that of the cell plus its surroundings) increases, as demanded by the laws of thermodynamics.

To understand the principles governing these energy conversions, think of a cell surrounded by a sea of matter representing the rest of the universe. As the cell lives and grows, it creates internal order. But it constantly releases heat energy as it synthesizes molecules and assembles them into cell structures. Heat is energy in its most disordered form: the random jostling of molecules. When the cell releases heat to the sea, it increases the intensity of molecular motions there (thermal motion), thereby increasing the randomness, or disorder, of the sea. The second law of thermodynamics is satisfied because the increase in the amount of order inside the cell is more than compensated for by an even greater decrease in order (increase in entropy) in the surrounding sea of matter (Figure 2-38).

Where does the heat that the cell releases come from? Here we encounter another important law of thermodynamics. The first law of thermodynamics states that energy can be converted from one form to another, but that it cannot

Figure 2-37 An everyday illustration of the spontaneous drive toward disorder. Reversing this tendency toward disorder requires an intentional effort and an input of energy: it is not spontaneous. In fact, from the second law of thermodynamics, we can be certain that the human intervention required will release enough heat to the environment to more than compensate for the reordering of the items in this room.
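The coin example used in the text to illustrate the second law can be checked by direct counting. The Python sketch below counts how many distinct arrangements of 100 coins show a given number of heads; the function name is hypothetical, and treating each coin as distinguishable with two equally likely faces is the assumption behind the count.

```python
# A minimal sketch: counting arrangements of 100 distinguishable coins.
from math import comb

N_COINS = 100


def arrangements(heads: int, n_coins: int = N_COINS) -> int:
    """Number of distinct ways that `heads` of the `n_coins` coins can lie heads up."""
    return comb(n_coins, heads)


if __name__ == "__main__":
    total = 2 ** N_COINS  # every coin is independently heads or tails
    print("all heads up:     ", arrangements(N_COINS))
    print("50 heads, 50 tails:", arrangements(50))
    print("chance of exactly 50-50 after thorough shaking:",
          arrangements(50) / total)
```

Only one arrangement has every coin heads up, whereas the number of 50-50 arrangements is on the order of 10^29, which is why random disturbances almost never restore the fully ordered state.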


Figure 2-38 A simple thermodynamic analysis of a living cell. In the diagram on the left the molecules of both the cell and the rest of the universe (the sea of matter) are depicted in a relatively disordered state. In the diagram on the right the cell has taken in energy from food molecules and released heat by a reaction that orders the molecules the cell contains. Because the heat increases the disorder in the environment around the cell (depicted by the jagged arrows and distorted molecules, indicating the increased molecular motions caused by heat), the second law of thermodynamics, which states that the amount of disorder in the universe must always increase, is satisfied as the cell grows and divides. For a detailed discussion, see Panel 2-7 (pp. 118-119).

be created or destroyed. Figure 2-39 illustrates some interconversions between different forms of energy. The amount of energy in different forms will change as a result of the chemical reactions inside the cell, but the first law tells us that the total amount of energy must always be the same. For example, an animal cell takes in foodstuffs and converts some of the energy present in the chemical bonds between the atoms of these food molecules (chemical bond energy) into the random thermal motion of molecules (heat energy). As described above, this conversion of chemical energy into heat energy is essential if the reactions that create order inside the cell are to cause the universe as a whole to become more disordered.

The cell cannot derive any benefit from the heat energy it releases unless the heat-generating reactions inside the cell are directly linked to the processes that generate molecular order. It is the tight coupling of heat production to an increase in order that distinguishes the metabolism of a cell from the wasteful burning of fuel in a fire. Later, we shall illustrate how this coupling occurs. For now it is sufficient to recognize that a direct linkage of the "burning" of food molecules to the generation of biological order is required for cells to create and maintain an island of order in a universe tending toward chaos.

Photosynthetic Organisms Use Sunlight to Synthesize Organic Molecules

All animals live on energy stored in the chemical bonds of organic molecules made by other organisms, which they take in as food. The molecules in food also provide the atoms that animals need to construct new living matter. Some animals obtain their food by eating other animals. But at the bottom of the animal food chain are animals that eat plants. The plants, in turn, trap energy directly from sunlight. As a result, the sun is the ultimate source of the energy used by animal cells.

Solar energy enters the living world through photosynthesis in plants and photosynthetic bacteria. Photosynthesis converts the electromagnetic energy in sunlight into chemical bond energy in the cell. Plants obtain all the atoms they need from inorganic sources: carbon from atmospheric carbon dioxide, hydrogen and oxygen from water, nitrogen from ammonia and nitrates in the

soil, and other elements needed in smaller amounts from inorganic salts in the soil. They use the energy they derive from sunlight to build these atoms into sugars, amino acids, nucleotides, and fatty acids. These small molecules in turn are converted into the proteins, nucleic acids, polysaccharides, and lipids that form the plant. All of these substances serve as food molecules for animals, if the plants are later eaten.

The reactions of photosynthesis take place in two stages (Figure 2-40). In the first stage, energy from sunlight is captured and transiently stored as chemical bond energy in specialized small molecules that act as carriers of energy and reactive chemical groups. (We discuss these "activated carrier" molecules later.) Molecular oxygen (O2 gas) derived from the splitting of water by light is released as a waste product of this first stage.

In the second stage, the molecules that serve as energy carriers are used to help drive a carbon fixation process in which sugars are manufactured from carbon dioxide gas (CO2) and water (H2O), thereby providing a useful source of stored chemical bond energy and materials, both for the plant itself and for any animals that eat it. We describe the elegant mechanisms that underlie these two stages of photosynthesis in Chapter 14.

Figure 2-39 Some interconversions between different forms of energy. All energy forms are, in principle, interconvertible. In all these processes the total amount of energy is conserved. Thus, for example, from the height and weight of the brick in (1), we can predict exactly how much heat will be released when it hits the floor. In (2), note that the large amount of chemical bond energy released when water is formed is initially converted to very rapid thermal motions in the two new water molecules; but collisions with other molecules almost instantaneously spread this kinetic energy evenly throughout the surroundings (heat transfer), making the new molecules indistinguishable from all the rest. (Labels recovered from the figure: potential energy due to position -> kinetic energy -> heat energy; chemical bond energy in H2 and O2 -> kinetic energy -> heat energy; electromagnetic (light) energy -> high-energy electrons -> chemical bond energy.)
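The brick example in this caption can be worked through with elementary mechanics. The Python sketch below is illustrative only: the brick's mass and height are assumed values, gravitational acceleration is taken as roughly 9.81 m/s^2, air resistance is ignored, and all of the kinetic energy at impact is assumed to end up as heat.

```python
# A minimal sketch of energy conservation for a falling brick.
# Mass and height are assumed illustrative values, not from the text.

G = 9.81  # approximate gravitational acceleration near Earth's surface, m/s^2


def potential_energy(mass_kg: float, height_m: float, g: float = G) -> float:
    """Potential energy (joules) of the raised brick, relative to the floor."""
    return mass_kg * g * height_m


def impact_speed(height_m: float, g: float = G) -> float:
    """Speed (m/s) just before impact after falling freely from rest."""
    return (2.0 * g * height_m) ** 0.5


if __name__ == "__main__":
    mass, height = 2.0, 1.5  # assumed: a 2 kg brick raised 1.5 m
    stored = potential_energy(mass, height)
    kinetic = 0.5 * mass * impact_speed(height) ** 2
    print(f"potential energy when raised: {stored:.2f} J")
    print(f"kinetic energy at impact:     {kinetic:.2f} J")
    print(f"heat released on impact:      {kinetic:.2f} J (assuming full conversion)")
```

The three printed quantities agree, which is the sense in which the height and weight of the brick predict the heat released.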


Figure 2-40 Photosynthesis. The two stages of photosynthesis. The energy carriers created in the first stage are two molecules that we discuss shortly: ATP and NADPH.

The net result of the entire process of photosynthesis, so far as the green plant is concerned, can be summarized simply in the equation

light energy + CO2 + H2O -> sugars + O2 + heat energy

The sugars produced are then used both as a source of chemical bond energy and as a source of materials to make the many other small and large organic molecules that are essential to the plant cell.
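The summary equation for photosynthesis is written without coefficients. As a sketch, the Python example below checks the atom bookkeeping for one specific case, assuming the sugar produced is glucose (C6H12O6), which gives the familiar balanced form 6 CO2 + 6 H2O -> C6H12O6 + 6 O2. The choice of glucose and all names in the code are assumptions made for illustration.

```python
# A minimal sketch: atom balance for photosynthesis, assuming the sugar
# is glucose (C6H12O6). Energy terms are not modeled here.
from collections import Counter

CO2 = Counter({"C": 1, "O": 2})
H2O = Counter({"H": 2, "O": 1})
O2 = Counter({"O": 2})
GLUCOSE = Counter({"C": 6, "H": 12, "O": 6})


def total_atoms(side):
    """Sum the atoms on one side of an equation given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, count in formula.items():
            total[element] += coefficient * count
    return total


REACTANTS = [(6, CO2), (6, H2O)]
PRODUCTS = [(1, GLUCOSE), (6, O2)]

if __name__ == "__main__":
    print("reactant atoms:", dict(total_atoms(REACTANTS)))
    print("product atoms: ", dict(total_atoms(PRODUCTS)))
    print("balanced:", total_atoms(REACTANTS) == total_atoms(PRODUCTS))
```

Read in the reverse direction, the same atom count applies to the complete oxidation of glucose to CO2 and H2O.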

Cells Obtain Energy by the Oxidation of Organic Molecules

All animal and plant cells are powered by energy stored in the chemical bonds of organic molecules, whether they are sugars that a plant has photosynthesized as food for itself or the mixture of large and small molecules that an animal has eaten. Organisms must extract this energy in usable form to live, grow, and reproduce. In both plants and animals, energy is extracted from food molecules by a process of gradual oxidation, or controlled burning.

The Earth's atmosphere contains a great deal of oxygen, and in the presence of oxygen the most energetically stable form of carbon is CO2 and that of hydrogen is H2O. A cell is therefore able to obtain energy from sugars or other organic molecules by allowing their carbon and hydrogen atoms to combine with oxygen to produce CO2 and H2O, respectively, a process called respiration.

Photosynthesis and respiration are complementary processes (Figure 2-41). This means that the transactions between plants and animals are not all one way. Plants, animals, and microorganisms have existed together on this planet for so long that many of them have become an essential part of the others' environments. The oxygen released by photosynthesis is consumed in the combustion of organic molecules by nearly all organisms. And some of the CO2 molecules that are fixed today into organic molecules by photosynthesis in a green leaf were yesterday released into the atmosphere by the respiration of an animal, or by that of a fungus or bacterium decomposing dead organic matter. We therefore see that carbon utilization forms a huge cycle that involves the biosphere (all of the living organisms on Earth) as a whole, crossing boundaries





Figure 2-41 Photosynthesis and respiration as complementary processes in the living world. Photosynthesis uses the energy of sunlight to produce sugars and other organic molecules. These molecules in turn serve as food for other organisms. Many of these organisms carry out respiration, a process that uses O2 to form CO2 from the same carbon atoms that had been taken up as CO2 and converted into sugars by photosynthesis. In the process, the organisms that respire obtain the chemical bond energy that they need to survive. The first cells on the Earth are thought to have been capable of neither photosynthesis nor respiration (discussed in Chapter 14). However, photosynthesis must have preceded respiration on the Earth, since there is strong evidence that billions of years of photosynthesis were required before O2 had been released in sufficient quantity to create an atmosphere rich in this gas. (The Earth's atmosphere currently contains 20% O2.)








Figure 2-42 The carbon cycle. Individual carbon atoms are incorporated into organic molecules of the living world by the photosynthetic activity of bacteria and plants (including algae). They pass to animals, microorganisms, and organic material in soil and oceans in cyclic paths. CO2 is restored to the atmosphere when organic molecules are oxidized by cells or burned by humans as fuels.


between individual organisms (Figure 2-42). Similarly, atoms of nitrogen, phosphorus, and sulfur move between the living and nonliving worlds in cycles that involve plants, animals, fungi, and bacteria.

Oxidation and Reduction Involve Electron Transfers

The cell does not oxidize organic molecules in one step, as occurs when organic material is burned in a fire. Through the use of enzyme catalysts, metabolism takes the molecules through a large number of reactions that only rarely involve the direct addition of oxygen. Before we consider some of these reactions and their purpose, we discuss what is meant by the process of oxidation.

Oxidation does not mean only the addition of oxygen atoms; rather, it applies more generally to any reaction in which electrons are transferred from one atom to another. Oxidation in this sense refers to the removal of electrons, and reduction (the converse of oxidation) means the addition of electrons. Thus, Fe2+ is oxidized if it loses an electron to become Fe3+, and a chlorine atom is reduced if it gains an electron to become Cl-. Since the number of electrons is conserved (no loss or gain) in a chemical reaction, oxidation and reduction always occur simultaneously: that is, if one molecule gains an electron in a reaction (reduction), a second molecule loses the electron (oxidation). When a sugar molecule is oxidized to CO2 and H2O, for example, the O2 molecules involved in forming H2O gain electrons and thus are said to have been reduced.

The terms "oxidation" and "reduction" apply even when there is only a partial shift of electrons between atoms linked by a covalent bond (Figure 2-43).













Figure 2-43 Oxidation and reduction. (A) When two atoms form a polar covalent bond (see p. 50), the atom ending up with a greater share of electrons is said to be reduced, while the other atom acquires a lesser share of electrons and is said to be oxidized. The reduced atom has acquired a partial negative charge (δ-) as the positive charge on the atomic nucleus is now more than equaled by the total charge of the electrons surrounding it, and conversely, the oxidized atom has acquired a partial positive charge (δ+). (B) The single carbon atom of methane can be converted to that of carbon dioxide by the successive replacement of its covalently bonded hydrogen atoms with oxygen atoms. With each step, electrons are shifted away from the carbon (as indicated by the blue shading), and the carbon atom becomes progressively more oxidized. Each of these steps is energetically favorable under the conditions present inside a cell. [Panel B sequence: methane, methanol, formaldehyde, formic acid, carbon dioxide]


Chapter 2: Cell Chemistry and Biosynthesis

When a carbon atom becomes covalently bonded to an atom with a strong affinity for electrons, such as oxygen, chlorine, or sulfur, it gives up more than its equal share of electrons and forms a polar covalent bond: the positive charge of the carbon nucleus is now somewhat greater than the negative charge of its electrons, and the atom therefore acquires a partial positive charge and is said to be oxidized. Conversely, a carbon atom in a C-H linkage has slightly more than its share of electrons, and so it is said to be reduced (see Figure 2-43).

When a molecule in a cell picks up an electron (e-), it often picks up a proton (H+) at the same time (protons being freely available in water). The net effect in this case is to add a hydrogen atom to the molecule:

A + e- + H+ → AH

Even though a proton plus an electron is involved (instead of just an electron), such hydrogenation reactions are reductions, and the reverse, dehydrogenation reactions, are oxidations. It is especially easy to tell whether an organic molecule is being oxidized or reduced: reduction is occurring if its number of C-H bonds increases, whereas oxidation is occurring if its number of C-H bonds decreases (see Figure 2-43B).

Cells use enzymes to catalyze the oxidation of organic molecules in small steps, through a sequence of reactions that allows useful energy to be harvested. We now need to explain how enzymes work and some of the constraints under which they operate.
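The C-H counting rule can be tried out on the one-carbon series of Figure 2-43B. The following Python sketch is only an illustration: the bond counts are the standard structural formulas of these compounds, while the function names and the use of formal oxidation states (a bookkeeping convention that the text itself does not use) are assumptions made for the example.

```python
# Illustrative sketch: follow the carbon atom through the series of Figure 2-43B.
# Each entry is (name, number of C-H bonds, number of C-O bonds);
# a C=O double bond is counted as two C-O bonds.
SERIES = [
    ("methane",        4, 0),
    ("methanol",       3, 1),
    ("formaldehyde",   2, 2),
    ("formic acid",    1, 3),
    ("carbon dioxide", 0, 4),
]

def carbon_oxidation_state(c_h_bonds: int, c_o_bonds: int) -> int:
    """Formal oxidation state of the single carbon atom.

    Electrons in a C-H bond are assigned to carbon (-1 per bond);
    electrons in a C-O bond are assigned to oxygen (+1 per bond).
    """
    return c_o_bonds - c_h_bonds

def classify_step(c_h_before: int, c_h_after: int) -> str:
    """Rule of thumb from the text: fewer C-H bonds means oxidation."""
    if c_h_after < c_h_before:
        return "oxidation"
    if c_h_after > c_h_before:
        return "reduction"
    return "no change"

states = [carbon_oxidation_state(ch, co) for _, ch, co in SERIES]
steps = [classify_step(a[1], b[1]) for a, b in zip(SERIES, SERIES[1:])]

for (name, ch, _), state in zip(SERIES, states):
    print(f"{name:15s} C-H bonds: {ch}   carbon oxidation state: {state:+d}")
print(steps)
```

Each step from methane toward carbon dioxide removes one C-H bond, so the rule classifies every step as an oxidation, in agreement with the progressive shift of electrons away from the carbon described in the figure.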

Enzymes Lower the Barriers That Block Chemical Reactions

Consider the reaction

paper + O2 → smoke + ashes + heat + CO2 + H2O

The paper burns readily, releasing to the atmosphere both energy as heat and water and carbon dioxide as gases, but the smoke and ashes never spontaneously retrieve these entities from the heated atmosphere and reconstitute themselves into paper. When the paper burns, its chemical energy is dissipated as heat: not lost from the universe, since energy can never be created or destroyed, but irretrievably dispersed in the chaotic random thermal motions of molecules. At the same time, the atoms and molecules of the paper become dispersed and disordered. In the language of thermodynamics, there has been a loss of free energy, that is, of energy that can be harnessed to do work or drive chemical reactions. This loss reflects a loss of orderliness in the way the energy and molecules were stored in the paper. We shall discuss free energy in more detail shortly, but the general principle is clear enough intuitively: chemical reactions proceed spontaneously only in the direction that leads to a loss of free energy; in other words, the spontaneous direction for any reaction is the direction that goes "downhill." A "downhill" reaction in this sense is often said to be energetically favorable.

Although the most energetically favorable form of carbon under ordinary conditions is CO2, and that of hydrogen is H2O, a living organism does not disappear in a puff of smoke, and the book in your hands does not burst into flames. This is because the molecules both in the living organism and in the book are in a relatively stable state, and they cannot be changed to a state of lower energy without an input of energy: in other words, a molecule requires activation energy (a kick over an energy barrier) before it can undergo a chemical reaction that leaves it in a more stable state (Figure 2-44). In the case of a burning book, the activation energy is provided by the heat of a lighted match. For the molecules in the watery solution inside a cell, the kick is delivered by an unusually energetic random collision with surrounding molecules, collisions that become more violent as the temperature is raised.

In a living cell, the kick over the energy barrier is greatly aided by a specialized class of proteins, the enzymes. Each enzyme binds tightly to one or more molecules, called substrates, and holds them in a way that greatly reduces the activation energy of a particular chemical reaction that the bound substrates can undergo. A substance that can lower the activation energy of a reaction is



Figure 2-44 The important principle of activation energy. (A) Compound Y (a reactant) is in a relatively stable state, and energy is required to convert it to compound X (a product), even though X is at a lower overall energy level than Y. This conversion will not take place, therefore, unless compound Y can acquire enough activation energy (energy a minus energy b) from its surroundings to undergo the reaction that converts it into compound X. This energy may be provided by means of an unusually energetic collision with other molecules. For the reverse reaction, X → Y, the activation energy will be much larger (energy a minus energy c); this reaction will therefore occur much more rarely. Activation energies are always positive; note, however, that the total energy change for the energetically favorable reaction Y → X is energy c minus energy b, a negative number. (B) Energy barriers for specific reactions can be lowered by catalysts, as indicated by the line marked d. Enzymes are particularly effective catalysts because they greatly reduce the activation energy for the reactions they perform. [Panel B label: enzyme lowers activation energy for catalyzed reaction Y → X]

termed a catalyst; catalysts increase the rate of chemical reactions because they allow a much larger proportion of the random collisions with surrounding molecules to kick the substrates over the energy barrier, as illustrated in Figure 2-45. Enzymes are among the most effective catalysts known, capable of speeding up reactions by factors of 10^14 or more. They thereby allow reactions that would not otherwise occur to proceed rapidly at normal temperatures.

Enzymes are also highly selective. Each enzyme usually catalyzes only one particular reaction: in other words, it selectively lowers the activation energy of only one of the several possible chemical reactions that its bound substrate molecules could undergo. In this way, enzymes direct each of the many different molecules in a cell along specific reaction pathways (Figure 2-46).

The success of living organisms is attributable to a cell's ability to make enzymes of many types, each with precisely specified properties. Each enzyme has a unique shape containing an active site, a pocket or groove in the enzyme into which only particular substrates will fit (Figure 2-47). Like all other catalysts, enzyme molecules themselves remain unchanged after participating in a reaction and therefore can function over and over again. In Chapter 3, we discuss further how enzymes work.
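To get a feeling for what a speed-up of 10^14 means in terms of the energy barrier, a rough estimate can be made. The Python sketch below assumes a simple Arrhenius-type relation, in which the rate is proportional to exp(-Ea/RT) and nothing else changes when the barrier is lowered. That relation, the function names, and the choice of 37°C are assumptions introduced for this illustration; the text itself gives only the overall factor.

```python
import math

# RT at 37 degrees C in kcal/mole: gas constant (kcal per mole per kelvin) times 310 K.
R_KCAL = 1.987e-3
T_BODY = 310.0
RT = R_KCAL * T_BODY  # about 0.616 kcal/mole

def rate_enhancement(barrier_drop_kcal: float, rt: float = RT) -> float:
    """Factor by which the rate rises when the barrier falls by barrier_drop_kcal.

    Assumes rate is proportional to exp(-Ea / RT) with an unchanged prefactor.
    """
    return math.exp(barrier_drop_kcal / rt)

def barrier_drop_for(enhancement: float, rt: float = RT) -> float:
    """Barrier reduction (kcal/mole) needed for a given rate enhancement."""
    return rt * math.log(enhancement)

drop = barrier_drop_for(1e14)
print(f"RT at 37 C: {RT:.3f} kcal/mole")
print(f"Barrier drop for a 1e14-fold speed-up: {drop:.1f} kcal/mole")
```

Under these assumptions a reduction of roughly 20 kcal/mole in the activation energy is enough to account for a 10^14-fold acceleration, because the rate depends exponentially on the height of the barrier.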


Figure 2-45 Lowering the activation energy greatly increases the probability of reaction. At any given instant, a population of identical substrate molecules will have a range of energies, distributed as shown on the graph. The varying energies come from collisions with surrounding molecules, which make the substrate molecules jiggle, vibrate, and spin. For a molecule to undergo a chemical reaction, the energy of the molecule must exceed the activation energy barrier for that reaction; for most biological reactions, this almost never happens without enzyme catalysis. Even with enzyme catalysis, the substrate molecules must experience a particularly energetic collision to react (red shaded area). Raising the temperature can also increase the number of molecules with sufficient energy to overcome the activation energy needed for a reaction; but in contrast to enzyme catalysis, this effect is nonselective, speeding up all reactions. [Graph axis: energy per molecule. Labels: molecules with average energy; activation energy for catalyzed reaction (many molecules have enough energy to undergo the enzyme-catalyzed chemical reaction); activation energy for uncatalyzed reaction (almost no molecules have the very high energy needed to undergo an uncatalyzed chemical reaction)]





[Figure 2-46 panel labels: (A) uncatalyzed reaction: waves not large enough to surmount barrier; catalyzed reaction: waves often surmount barrier. (B) uncatalyzed; enzyme catalysis of reaction 1. (C) dry river bed; lake with waves; flowing stream.]

How Enzymes Find Their Substrates: The Enormous Rapidity of Molecular Motions

An enzyme will often catalyze the reaction of thousands of substrate molecules every second. This means that it must be able to bind a new substrate molecule in a fraction of a millisecond. But both enzymes and their substrates are present in relatively small numbers in a cell. How do they find each other so fast? Rapid binding is possible because the motions caused by heat energy are enormously fast at the molecular level. These molecular motions can be classified broadly into three kinds: (1) the movement of a molecule from one place to another (translational motion), (2) the rapid back-and-forth movement of covalently linked atoms with respect to one another (vibrations), and (3) rotations. All of these motions help to bring the surfaces of interacting molecules together.

The rates of molecular motions can be measured by a variety of spectroscopic techniques. A large globular protein is constantly tumbling, rotating about its axis about a million times per second. Molecules are also in constant translational motion, which causes them to explore the space inside the cell very efficiently by wandering through it, a process called diffusion. In this way, every molecule in a cell collides with a huge number of other molecules each second. As the molecules in a liquid collide and bounce off one another, an individual molecule moves first one way and then another, its path constituting a random walk (Figure 2-48). In such a walk, the average net distance that each molecule travels (as the crow flies) from its starting point is proportional to the square root of the time involved: that is, if it takes a molecule 1 second on average to travel 1 μm, it takes 4 seconds to travel 2 μm, 100 seconds to travel 10 μm, and so on.

The inside of a cell is very crowded (Figure 2-49). Nevertheless, experiments in which fluorescent dyes and other labeled molecules are injected into cells


Figure 2-46 Floating ball analogies for enzyme catalysis. (A) A barrier dam is lowered to represent enzyme catalysis. The green ball represents a potential reactant (compound Y) that is bouncing up and down in energy level due to constant encounters with waves (an analogy for the thermal bombardment of the reactant molecule with the surrounding water molecules). When the barrier (activation energy) is lowered significantly, it allows the energetically favorable movement of the ball (the reactant) downhill. (B) The four walls of the box represent the activation energy barriers for four different chemical reactions that are all energetically favorable, in the sense that the products are at lower energy levels than the reactants. In the left-hand box, none of these reactions occurs because even the largest waves are not large enough to surmount any of the energy barriers. In the right-hand box, enzyme catalysis lowers the activation energy for reaction number 1 only; now the jostling of the waves allows passage of the reactant molecule over this energy barrier, inducing reaction 1. (C) A branching river with a set of barrier dams (yellow boxes) serves to illustrate how a series of enzyme-catalyzed reactions determines the exact reaction pathway followed by each molecule inside the cell.

Figure 2-47 How enzymes work. Each enzyme has an active site to which one or more substrate molecules bind, forming an enzyme-substrate complex. A reaction occurs at the active site, producing an enzyme-product complex. The product is then released, allowing the enzyme to bind further substrate molecules. [Figure labels: molecule A (substrate); enzyme-substrate complex; enzyme-product complex; molecule B (product)]
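The square-root relation for a random walk, described in the discussion of molecular motions, can be checked with a small simulation. The Python sketch below is a toy model, not a description of a real cell: the two-dimensional lattice, the unit step length, the number of walkers, and the use of the root-mean-square distance as the measure of "average net distance" are all assumptions chosen for simplicity.

```python
import math
import random

def rms_displacement(n_steps: int, n_walkers: int = 2000, seed: int = 0) -> float:
    """Root-mean-square net distance after n_steps unit steps on a 2-D lattice."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    total_sq = 0.0
    for _ in range(n_walkers):
        x = y = 0
        for _ in range(n_steps):
            dx, dy = rng.choice(moves)
            x += dx
            y += dy
        total_sq += x * x + y * y
    return math.sqrt(total_sq / n_walkers)

def time_to_travel(distance_um: float, seconds_per_um: float = 1.0) -> float:
    """Time needed to cover a net distance when distance grows as sqrt(time).

    seconds_per_um is the time taken, on average, to travel 1 micrometer.
    """
    return seconds_per_um * distance_um ** 2

r100 = rms_displacement(100)
r400 = rms_displacement(400)
print(f"RMS distance after 100 steps: {r100:.1f}")
print(f"RMS distance after 400 steps: {r400:.1f}")
print(f"ratio of the two distances: {r400 / r100:.2f}")
print([time_to_travel(d) for d in (1, 2, 10)])
```

Quadrupling the number of steps should roughly double the net distance, which is the same scaling that turns 1 second for 1 μm into 4 seconds for 2 μm and 100 seconds for 10 μm.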

Y. If the ratio of Y to X increases, the ΔG becomes more negative for the transition Y → X (and more positive for the transition X → Y).

How much of a concentration difference is needed to compensate for a given decrease in chemical bond energy (and accompanying heat release)? The answer is not intuitively obvious, but it can be determined from a thermodynamic analysis that makes it possible to separate the concentration-dependent and the concentration-independent parts of the free-energy change. The ΔG for a given reaction can thereby be written as the sum of two parts: the first, called the standard free-energy change, ΔG°, depends on the intrinsic characters of the reacting molecules; the second depends on their concentrations. For the simple reaction Y → X at 37°C,

$$\Delta G = \Delta G^\circ + 0.616 \ln \frac{[X]}{[Y]} = \Delta G^\circ + 1.42 \log \frac{[X]}{[Y]}$$

where ΔG is in kilocalories per mole, [Y] and [X] denote the concentrations of Y and X, ln is the natural logarithm, log is the base-10 logarithm, and the constant 0.616 is equal to RT: the product of the gas constant, R, and the absolute temperature, T. Note that ΔG equals the value of ΔG° when the molar concentrations of Y and X are equal (log 1 = 0). As expected, ΔG becomes more negative as the ratio of X to Y decreases (the log of a number < 1 is negative).

But as the favorable reaction Y → X proceeds, the concentration of the product X increases and the concentration of the substrate Y decreases. This change in relative concentrations will cause [X]/[Y] to become increasingly large, making the initially favorable ΔG less and less negative. Eventually, when ΔG = 0, a chemical equilibrium will be attained; here the concentration effect just balances the push given to the reaction by ΔG°, and the ratio of substrate to product reaches a constant value (Figure 2-52).

How far will a reaction proceed before it stops at equilibrium?
To address this question, we need to introduce the equilibrium constant, K. The value of K is different for different reactions, and it reflects the ratio of product to substrate at equilibrium. For the reaction Y → X:

$$K = \frac{[X]}{[Y]}$$

The equation that connects ΔG and the ratio [X]/[Y] allows us to connect ΔG° directly to K. Since ΔG = 0 at equilibrium, the concentrations of Y and X at this point are such that:

$$\Delta G^\circ = -1.42 \log \frac{[X]}{[Y]}$$

$$\Delta G^\circ = -1.42 \log K$$
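The relations between ΔG, ΔG°, and K can be explored numerically. The following Python sketch uses the constant 0.616 kcal/mole given in the text for 37°C; the function names are hypothetical, and the value of ΔG° is simply one of the entries of Table 2-4 chosen as an example.

```python
import math

RT = 0.616            # kcal/mole at 37 degrees C, the constant used in the text
LN10 = math.log(10)   # converts a natural log into a base-10 log

def delta_g(delta_g0: float, conc_x: float, conc_y: float) -> float:
    """Free-energy change (kcal/mole) for Y -> X at the given concentrations."""
    return delta_g0 + RT * math.log(conc_x / conc_y)

def equilibrium_constant(delta_g0: float) -> float:
    """K = [X]/[Y] at equilibrium, obtained by setting delta_g to zero."""
    return math.exp(-delta_g0 / RT)

dg0 = -4.3  # kcal/mole, one of the values listed in Table 2-4
k = equilibrium_constant(dg0)

print(f"RT * ln(10) = {RT * LN10:.2f}  (the factor 1.42 in the text)")
print(f"equal concentrations: dG = {delta_g(dg0, 1.0, 1.0):.2f} kcal/mole")
print(f"[X]/[Y] = 100:        dG = {delta_g(dg0, 100.0, 1.0):.2f} kcal/mole")
print(f"K = {k:.0f}")
print(f"[X]/[Y] = K:          dG = {delta_g(dg0, k, 1.0):.1e} kcal/mole")
```

As X accumulates, the calculated ΔG climbs from ΔG° toward zero, and it reaches zero when the ratio [X]/[Y] equals K, which for this example is close to 1000.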

Figure 2-50 The distinction between energetically favorable and energetically unfavorable reactions. [Panel text, favorable reaction: this reaction can occur spontaneously. Unfavorable reaction: if the reaction X → Y occurred, ΔG would be > 0, and the universe would become more ordered; this reaction can occur only if it is coupled to a second, energetically favorable reaction.]

Figure 2-51 How reaction coupling is used to drive energetically unfavorable reactions. [Panel text: the energetically unfavorable reaction X → Y is driven by the energetically favorable reaction C → D, because the net free-energy change for the pair of coupled reactions is less than zero.]



Figure 2-52 Chemical equilibrium. When a reaction reaches equilibrium, the forward and backward fluxes of reacting molecules are equal and opposite.

[Figure panels:
The formation of X is energetically favored in this example. In other words, the ΔG of Y → X is negative and the ΔG of X → Y is positive. But because of thermal bombardments, there will always be some X converting to Y and vice versa.
SUPPOSE WE START WITH AN EQUAL NUMBER OF Y AND X MOLECULES: therefore the ratio of X to Y molecules will increase.
EVENTUALLY there will be a large enough excess of X over Y to just compensate for the slow rate of X → Y. Equilibrium will then be attained.
AT EQUILIBRIUM the number of Y molecules being converted to X molecules each second is exactly equal to the number of X molecules being converted to Y molecules each second, so that there is no net change in the ratio of Y to X.]

Table 2-4 Relationship Between the Standard Free-Energy Change, ΔG°, and the Equilibrium Constant

Using the last equation, we can see how the equilibrium ratio of X to Y (expressed as an equilibrium constant, K) depends on the intrinsic character of the molecules, as expressed in the value of ΔG° (Table 2-4). Note that for every 1.4 kcal/mole (5.9 kJ/mole) difference in free energy at 37°C, the equilibrium constant changes by a factor of 10.

When an enzyme (or any catalyst) lowers the activation energy for the reaction Y → X, it also lowers the activation energy for the reaction X → Y by exactly the same amount (see Figure 2-44). The forward and backward reactions will therefore be accelerated by the same factor by an enzyme, and the equilibrium point for the reaction (and ΔG°) is unchanged (Figure 2-53).
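That a catalyst changes the speed of a reaction but not its equilibrium point can be illustrated with a minimal kinetic model. The Python sketch below assumes simple first-order kinetics in both directions and integrates them with small explicit time steps; the rate constants, the speed-up factor, and the function name are hypothetical values picked only for the illustration.

```python
def equilibrium_ratio(k_forward: float, k_backward: float,
                      dt: float, n_steps: int) -> float:
    """Integrate Y <-> X with first-order kinetics and return the final [X]/[Y].

    k_forward is the rate constant for Y -> X, k_backward for X -> Y.
    Explicit Euler steps, starting from equal amounts of Y and X.
    """
    y, x = 0.5, 0.5
    for _ in range(n_steps):
        net = (k_forward * y - k_backward * x) * dt  # net flux from Y to X
        y -= net
        x += net
    return x / y

# Hypothetical rate constants (per second) for the uncatalyzed reaction.
KF, KB = 1.0e-3, 1.0e-4
# Hypothetical factor by which an enzyme speeds up BOTH directions.
SPEEDUP = 1.0e4

# The uncatalyzed reaction is followed for 50,000 s, the catalyzed one for only 5 s.
uncatalyzed = equilibrium_ratio(KF, KB, dt=1.0, n_steps=50_000)
catalyzed = equilibrium_ratio(KF * SPEEDUP, KB * SPEEDUP, dt=1.0e-4, n_steps=50_000)

print(f"uncatalyzed [X]/[Y] at equilibrium: {uncatalyzed:.3f}")
print(f"catalyzed   [X]/[Y] at equilibrium: {catalyzed:.3f}")
```

Both runs settle at the same ratio, k_forward/k_backward, but the catalyzed reaction gets there in a ten-thousandth of the time, because multiplying both rate constants by the same factor leaves their ratio untouched.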

For Sequential Reactions, ΔG° Values Are Additive

We can predict quantitatively the course of most reactions. A large body of thermodynamic data has been collected that makes it possible to calculate the standard change in free energy, ΔG°, for most of the important metabolic reactions of the cell. The overall free-energy change for a metabolic pathway is then simply the sum of the free-energy changes in each of its component steps. Consider, for example, two sequential reactions

X → Y and Y → Z

whose ΔG° values are +5 and -13 kcal/mole, respectively. (Recall that a mole is 6 × 10^23 molecules of a substance.) If these two reactions occur sequentially, the ΔG° for the coupled reaction will be -8 kcal/mole. Thus, the unfavorable reaction X → Y, which will not occur spontaneously, can be driven by the favorable reaction Y → Z, provided that this second reaction follows the first.

Cells can therefore cause the energetically unfavorable transition, X → Y, to occur if an enzyme catalyzing the X → Y reaction is supplemented by a second enzyme that catalyzes the energetically favorable reaction, Y → Z. In effect, the reaction Y → Z will then act as a "siphon" to drive the conversion of all of molecule X to molecule Y and thence to molecule Z (Figure 2-54). For example,

K = [X]/[Y]      ΔG° in kcal/mole (kJ/mole)
10^5             -7.1 (-29.7)
10^4             -5.7 (-23.8)
10^3             -4.3 (-18.0)
10^2             -2.8 (-11.7)
10^1             -1.4 (-5.9)
1                 0 (0)
10^-1             1.4 (5.9)
10^-2             2.8 (11.7)
10^-3             4.3 (18.0)
10^-4             5.7 (23.8)
10^-5             7.1 (29.7)

Values of the equilibrium constant were calculated for the simple chemical reaction Y ⇌ X using the equation given in the text. The ΔG° given here is in kilocalories per mole at 37°C, with kilojoules per mole in parentheses (1 kilocalorie is equal to 4.184 kilojoules). As explained in the text, ΔG° represents the free-energy difference under standard conditions (where all components are present at a concentration of 1.0 mole/liter). From this table, we see that if there is a favorable standard free-energy change (ΔG°) of -4.3 kcal/mole (-18.0 kJ/mole) for the transition Y → X, there will be 1000 times more molecules in state X than in state Y at equilibrium (K = 1000).






several of the reactions in the long pathway that converts sugars into CO2 and H2O would be energetically unfavorable if considered on their own. But the pathway nevertheless proceeds because the total ΔG° for the series of sequential reactions has a large negative value.

But forming a sequential pathway is not adequate for many purposes. Often the desired pathway is simply X → Y, without further conversion of Y to some other product. Fortunately, there are other more general ways of using enzymes to couple reactions together. How these work is the topic we discuss next.
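The additivity of ΔG° values for sequential reactions can be written out as a short calculation. The Python sketch below uses the two-step example from the text (+5 and -13 kcal/mole) and the constant 0.616 kcal/mole for RT at 37°C; the function names and the dictionary layout are hypothetical.

```python
import math

RT = 0.616  # kcal/mole at 37 degrees C, as in the text

def overall_delta_g0(step_values) -> float:
    """Standard free-energy change of a pathway: the sum over its steps."""
    return sum(step_values)

def equilibrium_constant(delta_g0: float) -> float:
    """Equilibrium constant implied by a standard free-energy change."""
    return math.exp(-delta_g0 / RT)

# Standard free-energy changes (kcal/mole) for the example in the text.
steps = {"X -> Y": 5.0, "Y -> Z": -13.0}

total = overall_delta_g0(steps.values())
k_xy = equilibrium_constant(steps["X -> Y"])
k_yz = equilibrium_constant(steps["Y -> Z"])
k_xz = equilibrium_constant(total)

print(f"overall dG0 for X -> Z: {total:+.1f} kcal/mole")
print(f"K for X -> Y alone: {k_xy:.2e}")
print(f"K for Y -> Z alone: {k_yz:.2e}")
print(f"K for X -> Z:       {k_xz:.2e}")
```

Because the free-energy changes add, the equilibrium constants multiply: the very small K for X → Y alone is more than offset by the very large K for Y → Z, so that at equilibrium Z greatly outnumbers X.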

Figure 2-53 Enzymes cannot change the equilibrium point for reactions. Enzymes, like all catalysts, speed up the forward and backward rates of a reaction by the same factor. Therefore, for both the catalyzed and the uncatalyzed reactions shown here, the number of molecules undergoing the transition X → Y is equal to the number of molecules undergoing the transition Y → X when the ratio of Y molecules to X molecules is 3.5 to 1. In other words, the two reactions reach equilibrium at exactly the same point.

Activated Carrier Molecules Are Essential for Biosynthesis

The energy released by the oxidation of food molecules must be stored temporarily before it can be channeled into the construction of the many other molecules needed by the cell. In most cases, the energy is stored as chemical bond energy in a small set of activated "carrier molecules," which contain one or more energy-rich covalent bonds. These molecules diffuse rapidly throughout the cell and thereby carry their bond energy from sites of energy generation to the sites where energy is used for biosynthesis and other cell activities (Figure 2-55).

The activated carriers store energy in an easily exchangeable form, either as a readily transferable chemical group or as high-energy electrons, and they can serve a dual role as a source of both energy and chemical groups in biosynthetic reactions. For historical reasons, these molecules are also sometimes referred to as coenzymes. The most important of the activated carrier molecules are ATP and two molecules that are closely related to each other, NADH and NADPH, as we discuss in detail shortly. We shall see that cells use activated carrier molecules like money to pay for reactions that otherwise could not take place.

[Figure 2-54 panel labels: equilibrium point for X → Y reaction alone; equilibrium point for Y → Z reaction alone]






Figure 2-71 Two pathways for the anaerobic breakdown of pyruvate. (A) When there is inadequate oxygen, for example, in a muscle cell undergoing vigorous contraction, the pyruvate produced by glycolysis is converted to lactate as shown. This reaction regenerates the NAD+ consumed in step 6 of glycolysis, but the whole pathway yields much less energy overall than complete oxidation. (B) In some organisms that can grow anaerobically, such as yeasts, pyruvate is converted via acetaldehyde into carbon dioxide and ethanol. Again, this pathway regenerates NAD+ from NADH, as required to enable glycolysis to continue. Both (A) and (B) are examples of fermentations. [Figure labels: 2 × pyruvate; 2 × acetaldehyde; 2 × ethanol]


Glycolysis Illustrates How Enzymes Couple Oxidation to Energy Storage

Returning to the paddle-wheel analogy that we used to introduce coupled reactions (see Figure 2-56), we can now equate enzymes with the paddle wheel. Enzymes act to harvest useful energy from the oxidation of organic molecules by coupling an energetically unfavorable reaction with a favorable one. To demonstrate this coupling, we examine a step in glycolysis to see exactly how such coupled reactions occur.

Two central reactions in glycolysis (steps 6 and 7) convert the three-carbon sugar intermediate glyceraldehyde 3-phosphate (an aldehyde) into 3-phosphoglycerate (a carboxylic acid; see Panel 2-8, pp. 120-121). This entails the oxidation of an aldehyde group to a carboxylic acid group in a reaction that occurs in two steps. The overall reaction releases enough free energy to convert a molecule of ADP to ATP and to transfer two electrons from the aldehyde to NAD+ to form NADH, while still releasing enough heat to the environment to make the overall reaction energetically favorable (ΔG° for the overall reaction is -3.0 kcal/mole).

Figure 2-72 outlines the means by which this remarkable feat of energy harvesting is accomplished. The indicated chemical reactions are precisely guided by two enzymes to which the sugar intermediates are tightly bound. In fact, as detailed in Figure 2-72, the first enzyme (glyceraldehyde 3-phosphate dehydrogenase) forms a short-lived covalent bond to the aldehyde through a reactive -SH group on the enzyme, and catalyzes its oxidation by NAD+ in this attached state. The reactive enzyme-substrate bond is then displaced by an inorganic phosphate ion to produce a high-energy phosphate intermediate, which is released from the enzyme. This intermediate binds to the second enzyme (phosphoglycerate kinase), which catalyzes the energetically favorable transfer of the high-energy phosphate just created to ADP, forming ATP and completing the process of oxidizing an aldehyde to a carboxylic acid.

We have shown this particular oxidation process in some detail because it provides a clear example of enzyme-mediated energy storage through coupled reactions (Figure 2-73). Steps 6 and 7 are the only reactions in glycolysis that create a high-energy phosphate linkage directly from inorganic phosphate. As such, they account for the net yield of two ATP molecules and two NADH molecules per molecule of glucose (see Panel 2-8, pp. 120-121).

As we have just seen, ATP can be formed readily from ADP when a reaction intermediate is formed with a phosphate bond of higher energy than the phosphate bond in ATP. Phosphate bonds can be ordered in energy by comparing the standard free-energy change (ΔG°) for the breakage of each bond by hydrolysis. Figure 2-74 compares the high-energy phosphoanhydride bonds in ATP with the energy of some other phosphate bonds, several of which are generated during glycolysis.

Organisms Store Food Molecules in Special Reservoirs

All organisms need to maintain a high ATP/ADP ratio to maintain biological order in their cells. Yet animals have only periodic access to food, and plants need to survive overnight without sunlight, when they are unable to produce sugar from photosynthesis. For this reason, both plants and animals convert sugars and fats to special forms for storage (Figure 2-75).

To compensate for long periods of fasting, animals store fatty acids as fat droplets composed of water-insoluble triacylglycerols, largely in the cytoplasm of specialized fat cells, called adipocytes. For shorter-term storage, sugar is stored as glucose subunits in the large branched polysaccharide glycogen, which is present as small granules in the cytoplasm of many cells, including liver and muscle. The synthesis and degradation of glycogen are rapidly regulated according to need. When cells need more ATP than they can generate from the food molecules taken in from the bloodstream, they break down glycogen in a reaction that produces glucose 1-phosphate, which is rapidly converted to glucose 6-phosphate for glycolysis.






Figure 2-72 Energy storage in steps 6 and 7 of glycolysis. In these steps the oxidation of an aldehyde to a carboxylic acid is coupled to the formation of ATP and NADH. (A) Step 6 begins with the formation of a covalent bond between the substrate (glyceraldehyde 3-phosphate) and an -SH group exposed on the surface of the enzyme (glyceraldehyde 3-phosphate dehydrogenase). The enzyme then catalyzes transfer of hydrogen (as a hydride ion: a proton plus two electrons) from the bound glyceraldehyde 3-phosphate to a molecule of NAD+. Part of the energy released in this oxidation is used to form a molecule of NADH and part is used to convert the original linkage between the enzyme and its substrate to a high-energy thioester bond (shown in red). A molecule of inorganic phosphate then displaces this high-energy bond on the enzyme, creating a high-energy sugar-phosphate bond instead (red). At this point the enzyme has not only stored energy in NADH, but also coupled the energetically favorable oxidation of an aldehyde to the energetically unfavorable formation of a high-energy phosphate bond. The second reaction has been driven by the first, thereby acting like the "paddle-wheel" coupler in Figure 2-56. In reaction step 7, the high-energy sugar-phosphate intermediate just made, 1,3-bisphosphoglycerate, binds to a second enzyme, phosphoglycerate kinase. The reactive phosphate is transferred to ADP, forming a molecule of ATP and leaving a free carboxylic acid group on the oxidized sugar. (B) Summary of the overall chemical change produced by reactions 6 and 7.

[Figure panel text, in order from glyceraldehyde 3-phosphate to 3-phosphoglycerate:
A covalent bond is formed between glyceraldehyde 3-phosphate (the substrate) and the -SH group of a cysteine side chain of the enzyme glyceraldehyde 3-phosphate dehydrogenase, which also binds noncovalently to NAD+.
Oxidation of glyceraldehyde 3-phosphate occurs, as two electrons plus a proton (a hydride ion, see Figure 2-60) are transferred from glyceraldehyde 3-phosphate to the bound NAD+, forming NADH. Part of the energy released by the oxidation of the aldehyde is thus stored in NADH, and part goes into converting the bond between the enzyme and its substrate glyceraldehyde 3-phosphate into a high-energy thioester bond.
A molecule of inorganic phosphate displaces the high-energy bond to the enzyme to create 1,3-bisphosphoglycerate, which contains a high-energy acyl-anhydride bond.
The high-energy bond to phosphate is transferred to ADP to form ATP.
(B) SUMMARY OF STEPS 6 AND 7: Much of the energy of oxidation has been stored in the activated carriers ATP and NADH.]







Figure 2-73 Schematic view of the coupled reactions that form NADH and ATP in steps 6 and 7 of glycolysis. The C-H bond oxidation energy drives the formation of both NADH and a high-energy phosphate bond. The breakage of the high-energy bond then drives ATP formation.





C-H bond oxidation energy




total energy change for step 6 followed by step 7 is a favorable -3 kcal/mole

[Figure 2-74 labels. Left column: type of phosphate bond. Right column: specific examples showing the standard free-energy change (ΔG°) for hydrolysis of phosphate bond, in kcal/mole (kJ/mole in parentheses).]

enol phosphate bond: phosphoenolpyruvate (see Panel 2-8, pp. 120-121), -14.8 (-61.9)
anhydride bond to carbon: for example, 1,3-bisphosphoglycerate (see Panel 2-8)
phosphate bond in creatine phosphate: creatine phosphate (activated carrier that stores energy in muscle)
anhydride bond to phosphate (phosphoanhydride bond): for example, ATP when hydrolyzed to ADP, -7.3 (-30.6)
phosphoester bond: for example, glucose 6-phosphate (see Panel 2-8)

Figure 2-74 Phosphate bonds have different energies. Examples of different types of phosphate bonds with their sites of hydrolysis are shown in the molecules depicted on the left. Those starting with a gray carbon atom show only part of a molecule. Examples of molecules containing such bonds are given on the right, with the free-energy change for hydrolysis in kilocalories (kilojoules in parentheses). The transfer of a phosphate group from one molecule to another is energetically favorable if the standard free-energy change (ΔG°) for hydrolysis of the phosphate bond of the first molecule is more negative than that for hydrolysis of the phosphate bond in the second. Thus, a phosphate group is readily transferred from 1,3-bisphosphoglycerate to ADP to form ATP. The hydrolysis reaction can be viewed as the transfer of the phosphate group to water.
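The transfer rule stated in this caption comes down to a subtraction, which the following minimal sketch illustrates. The function name `transfer_delta_g` is hypothetical, and the two hydrolysis values (in kcal/mole) are taken to be those listed in Figure 2-74 for phosphoenolpyruvate and ATP; it is an illustration of the bookkeeping, not a thermodynamic model.

```python
# Standard free-energy changes for hydrolysis (kcal/mole), assumed here to be
# the values shown in Figure 2-74 for these two molecules.
HYDROLYSIS_DG = {
    "phosphoenolpyruvate": -14.8,
    "ATP": -7.3,  # ATP hydrolyzed to ADP
}


def transfer_delta_g(donor_dg, product_dg):
    """Standard free-energy change for moving a phosphate group from a donor
    to an acceptor: hydrolysis of the donor's bond minus hydrolysis of the
    bond formed in the product. Rounded to one decimal place."""
    return round(donor_dg - product_dg, 1)


# Phosphate transfer from phosphoenolpyruvate to ADP, forming ATP.
dg = transfer_delta_g(HYDROLYSIS_DG["phosphoenolpyruvate"], HYDROLYSIS_DG["ATP"])
print(dg, "kcal/mole:", "favorable" if dg < 0 else "unfavorable")
```

Swapping the two arguments gives a positive value, which corresponds to the unfavorable reverse transfer.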


Chapter 2: Cell Chemistry and Biosynthesis

[Figure 2-75 labels: glycogen granules in the cytoplasm of a liver cell; branch point; glucose subunits; α-1,4-glycosidic bond in backbone]


Figure 2-75 The storage of sugars and fats in animal and plant cells. (A) The structures of starch and glycogen, the storage form of sugars in plants and animals, respectively. Both are storage polymers of the sugar glucose and differ only in the frequency of branch points (the region in yellow is shown enlarged below). There are many more branches in glycogen than in starch. (B) An electron micrograph shows glycogen granules in the cytoplasm of a liver cell. (C) A thin section of a single chloroplast from a plant cell, showing the starch granules and lipid (fat droplets) that have accumulated as a result of the biosyntheses occurring there. (D) Fat droplets (stained red) beginning to accumulate in developing fat cells of an animal. (B, courtesy of Robert Fletterick and Daniel S. Friend; C, courtesy of K. Plaskitt; D, courtesy of Ronald M. Evans and Peter Totonoz.)

[Figure 2-75 label: α-1,6-glycosidic bond at branch point]


Quantitatively, fat is far more important than glycogen as an energy store for animals, presumably because it provides for more efficient storage. The oxidation of a gram of fat releases about twice as much energy as the oxidation of a gram of glycogen. Moreover, glycogen differs from fat in binding a great deal of water, producing a sixfold difference in the actual mass of glycogen required to store the same amount of energy as fat. An average adult human stores enough glycogen for only about a day of normal activities but enough fat to last for nearly a month. If our main fuel reservoir had to be carried as glycogen instead of fat, body weight would increase by an average of about 60 pounds.

Although plants produce NADPH and ATP by photosynthesis, this important process occurs in a specialized organelle, called a chloroplast, which is isolated from the rest of the plant cell by a membrane that is impermeable to both types of activated carrier molecules. Moreover, the plant contains many other cells, such as those in the roots, that lack chloroplasts and therefore cannot produce their own sugars. Therefore, for most of its ATP production, the plant relies on an



Figure 2-76 How the ATP needed for most plant cell metabolism is made. In plants, the chloroplasts and mitochondria collaborate to supply cells with metabolites and ATP. (For details, see Chapter 14.)




export of sugars from its chloroplasts to the mitochondria that are located in all cells of the plant. Most of the ATP needed by the plant is synthesized in these mitochondria and exported from them to the rest of the plant cell, using exactly the same pathways for the oxidative breakdown of sugars as in nonphotosynthetic organisms (Figure 2-76).

During periods of excess photosynthetic capacity during the day, chloroplasts convert some of the sugars that they make into fats and into starch, a polymer of glucose analogous to the glycogen of animals. The fats in plants are triacylglycerols, just like the fats in animals, and differ only in the types of fatty acids that predominate. Fat and starch are both stored in the chloroplast as reservoirs to be mobilized as an energy source during periods of darkness (see Figure 2-75C).

The embryos inside plant seeds must live on stored sources of energy for a prolonged period, until they germinate to produce leaves that can harvest the energy in sunlight. For this reason plant seeds often contain especially large amounts of fats and starch, which makes them a major food source for animals, including ourselves (Figure 2-77).

Most Animal Cells Derive Their Energy from Fatty Acids Between Meals

After a meal, most of the energy that an animal needs is derived from sugars derived from food. Excess sugars, if any, are used to replenish depleted glycogen stores, or to synthesize fats as a food store. But soon the fat stored in adipose tissue is called into play, and by the morning after an overnight fast, fatty acid oxidation generates most of the ATP we need.

Low glucose levels in the blood trigger the breakdown of fats for energy production. As illustrated in Figure 2-78, the triacylglycerols stored in fat droplets in adipocytes are hydrolyzed to produce fatty acids and glycerol, and the fatty acids released are transferred to cells in the body through the bloodstream. While animals readily convert sugars to fats, they cannot convert fatty acids to sugars. Instead, the fatty acids are oxidized directly.

Figure 2-77 Some plant seeds that serve as important foods for humans. Corn, nuts, and peas all contain rich stores of starch and fat that provide the young plant embryo in the seed with energy and building blocks for biosynthesis. (Courtesy of the John Innes Foundation.)



[Figure 2-78 labels: stored fat; glycerol; fatty acids; bloodstream; oxidation in mitochondria]

Figure 2-78 How stored fats are mobilized for energy production in animals. Low glucose levels in the blood trigger the hydrolysis of the triacylglycerol molecules in fat droplets to free fatty acids and glycerol, as illustrated. These fatty acids enter the bloodstream, where they bind to the abundant blood protein, serum albumin. Special fatty acid transporters in the plasma membrane of cells that oxidize fatty acids, such as muscle cells, then pass these fatty acids into the cytosol, from which they are moved into mitochondria for energy production (see Figure 2-80).


Sugars and Fats Are Both Degraded to Acetyl CoA in Mitochondria

The fatty acids imported from the bloodstream are moved into mitochondria, where all of their oxidation takes place. Each molecule of fatty acid (as the activated molecule fatty acyl CoA) is broken down completely by a cycle of reactions that trims two carbons at a time from its carboxyl end, generating one molecule of acetyl CoA for each turn of the cycle. A molecule of NADH and a molecule of FADH2 are also produced in this process.

Sugars and fats are the major energy sources for most nonphotosynthetic organisms, including humans. However, most of the useful energy that can be

[Figure 2-79 labels: (A) 8 trimers of lipoamide reductase-transacetylase + 6 dimers of dihydrolipoyl dehydrogenase + 12 dimers of pyruvate decarboxylase; (B) acetyl CoA]

Figure 2-79 The oxidation of pyruvate to acetyl CoA and CO2. (A) The structure of the pyruvate dehydrogenase complex, which contains 60 polypeptide chains. This is an example of a large multienzyme complex in which reaction intermediates are passed directly from one enzyme to another. In eucaryotic cells it is located in the mitochondrion. (B) The reactions carried out by the pyruvate dehydrogenase complex. The complex converts pyruvate to acetyl CoA in the mitochondrial matrix; NADH is also produced in this reaction. A, B, and C are the three enzymes pyruvate decarboxylase, lipoamide reductase-transacetylase, and dihydrolipoyl dehydrogenase, respectively. These enzymes are illustrated in (A); their activities are linked as shown.



[Figure 2-80 labels: sugars and polysaccharides; fats → fatty acids; CYTOSOL]

Figure 2-80 Pathways for the production of acetyl CoA from sugars and fats. The mitochondrion in eucaryotic cells is the place where acetyl CoA is produced from both types of major food molecules. It is therefore the place where most of the cell's oxidation reactions occur and where most of its ATP is made. The structure and function of mitochondria are discussed in detail in Chapter 14.

extracted from the oxidation of both types of foodstuffs remains stored in the acetyl CoA molecules that are produced by the two types of reactions just described. The citric acid cycle of reactions, in which the acetyl group in acetyl CoA is oxidized to CO2 and H2O, is therefore central to the energy metabolism of aerobic organisms. In eucaryotes these reactions all take place in mitochondria. We should therefore not be surprised to discover that the mitochondrion is the place where most of the ATP is produced in animal cells. In contrast, aerobic bacteria carry out all of their reactions in a single compartment, the cytosol, and it is here that the citric acid cycle takes place in these cells.

The Citric Acid Cycle Generates NADH by Oxidizing Acetyl Groups to CO2

In the nineteenth century, biologists noticed that in the absence of air (anaerobic conditions) cells produce lactic acid (for example, in muscle) or ethanol (for example, in yeast), while in its presence (aerobic conditions) they consume O2 and produce CO2 and H2O. Efforts to define the pathways of aerobic metabolism

Figure 2-81 The oxidation of fatty acids to acetyl CoA. (A) Electron micrograph of a lipid droplet in the cytoplasm (top), and the structure of fats (bottom). Fats are triacylglycerols. The glycerol portion, to which three fatty acids are linked through ester bonds, is shown here in blue. Fats are insoluble in water and form large lipid droplets in the specialized fat cells (called adipocytes) in which they are stored. (B) The fatty acid oxidation cycle. The cycle is catalyzed by a series of four enzymes in the mitochondrion. Each turn of the cycle shortens the fatty acid chain by two carbons (shown in red) and generates one molecule of acetyl CoA and one molecule each of NADH and FADH2. The structure of FADH2 is presented in Figure 2-83B. (A, courtesy of Daniel S. Friend.)
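The bookkeeping of the fatty acid oxidation cycle can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `fatty_acid_oxidation_products` is hypothetical, the chain is assumed to be saturated with an even number of carbons, and the 16-carbon example (palmitate) is chosen for illustration and is not taken from the text. Each turn removes two carbons as acetyl CoA and yields one NADH and one FADH2; the final turn splits the last four carbons into two acetyl CoA molecules, so the number of turns is one less than the number of acetyl CoA molecules.

```python
def fatty_acid_oxidation_products(n_carbons):
    """Tally the products of completely degrading a saturated, even-chain
    fatty acyl CoA to acetyl CoA by repeated turns of the oxidation cycle."""
    if n_carbons < 4 or n_carbons % 2 != 0:
        raise ValueError("sketch handles even-numbered chains of 4+ carbons only")
    acetyl_coa = n_carbons // 2   # every pair of carbons ends up in one acetyl CoA
    turns = acetyl_coa - 1        # the last turn releases two acetyl CoA at once
    return {
        "turns": turns,
        "acetyl_CoA": acetyl_coa,
        "NADH": turns,            # one NADH per turn
        "FADH2": turns,           # one FADH2 per turn
    }


# Illustrative example: a 16-carbon fatty acid.
print(fatty_acid_oxidation_products(16))
```

The acetyl CoA tallied here then enters the citric acid cycle, which is where most of the remaining energy is extracted.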

[Figure 2-81 labels: (B) fatty acyl CoA; fatty acyl CoA shortened by two carbons; acetyl CoA; hydrocarbon tail; ester bond]





eventually focused on the oxidation of pyruvate and led in 1937 to the discovery of the citric acid cycle, also known as the tricarboxylic acid cycle or the Krebs cycle. The citric acid cycle accounts for about two-thirds of the total oxidation of carbon compounds in most cells, and its major end products are CO2 and high-energy electrons in the form of NADH. The CO2 is released as a waste product, while the high-energy electrons from NADH are passed to a membrane-bound electron-transport chain (discussed in Chapter 14), eventually combining with O2 to produce H2O. Although the citric acid cycle itself does not use O2, it requires O2 in order to proceed because there is no other efficient way for the NADH to get rid of its electrons and thus regenerate the NAD+ that is needed to keep the cycle going.

The citric acid cycle takes place inside mitochondria in eucaryotic cells. It results in the complete oxidation of the carbon atoms of the acetyl groups in acetyl CoA, converting them into CO2. But the acetyl group is not oxidized directly. Instead, this group is transferred from acetyl CoA to a larger, four-carbon molecule, oxaloacetate, to form the six-carbon tricarboxylic acid, citric acid, for which the subsequent cycle of reactions is named. The citric acid molecule is then gradually oxidized, allowing the energy of this oxidation to be harnessed to produce energy-rich activated carrier molecules. The chain of eight reactions forms a cycle because at the end the oxaloacetate is regenerated and enters a new turn of the cycle, as shown in outline in Figure 2-82.

We have thus far discussed only one of the three types of activated carrier molecules that are produced by the citric acid cycle, the NAD+-NADH pair (see Figure 2-60). In addition to three molecules of NADH, each turn of the cycle also produces one molecule of FADH2 (reduced flavin adenine dinucleotide) from FAD and one molecule of the ribonucleotide GTP (guanosine triphosphate) from GDP. The structures of these two activated carrier molecules are illustrated in Figure 2-83. GTP is a close relative of ATP, and the transfer of its terminal phosphate group to ADP produces one ATP molecule in each cycle. Like NADH, FADH2 is a carrier of high-energy electrons and hydrogen. As we discuss shortly, the energy that is stored in the readily transferred high-energy electrons of NADH and FADH2 will be utilized subsequently for ATP production through the process of oxidative phosphorylation, the only step in the oxidative catabolism of foodstuffs that directly requires gaseous oxygen (O2) from the atmosphere.

Panel 2-9 (pp. 122-123) presents the complete citric acid cycle. Water, rather than molecular oxygen, supplies the extra oxygen atoms required to make CO2 from the acetyl groups entering the citric acid cycle. As illustrated in the panel,

[Figure 2-82 labels: acetyl CoA (2C); oxaloacetate (4C); STEP 2; STEP 3; intermediates marked 4C and 5C]

Figure 2-82 Simple overview of the citric acid cycle. The reaction of acetyl CoA with oxaloacetate starts the cycle by producing citrate (citric acid). In each turn of the cycle, two molecules of CO2 are produced as waste products, plus three molecules of NADH, one molecule of GTP, and one molecule of FADH2. The number of carbon atoms in each intermediate is shown in a yellow box. For details, see Panel 2-9 (pp. 122-123).
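The per-turn yields given in this caption can be scaled up with a short tally. The sketch below is illustrative only: the names `PER_TURN` and `cycle_products` are hypothetical, and the figure of two turns per glucose rests on the assumption, not stated in this caption, that each glucose molecule gives rise to two acetyl CoA molecules by way of two pyruvates.

```python
# Products of one turn of the citric acid cycle, as listed in the caption.
PER_TURN = {"CO2": 2, "NADH": 3, "GTP": 1, "FADH2": 1}

# Carbon balance for one turn: the two carbons entering as an acetyl group
# are matched by the two carbons leaving as CO2.
ACETYL_CARBONS_IN = 2


def cycle_products(turns):
    """Scale the per-turn products to a given number of turns."""
    return {molecule: count * turns for molecule, count in PER_TURN.items()}


# Assumption: one glucose -> two pyruvate -> two acetyl CoA -> two turns.
per_glucose = cycle_products(2)
print(per_glucose)
print("carbon balanced per turn:", PER_TURN["CO2"] == ACETYL_CARBONS_IN)
```

The carbon check reflects why the cycle regenerates oxaloacetate: no net carbon accumulates in the cycle intermediates from one turn to the next.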




three molecules of water are split in each cycle, and the oxygen atoms of some of them are ultimately used to make CO2.

In addition to pyruvate and fatty acids, some amino acids pass from the cytosol into mitochondria, where they are also converted into acetyl CoA or one of the other intermediates of the citric acid cycle. Thus, in the eucaryotic cell, the mitochondrion is the center toward which all energy-yielding processes lead, whether they begin with sugars, fats, or proteins.

Both the citric acid cycle and glycolysis also function as starting points for important biosynthetic reactions by producing vital carbon-containing intermediates, such as oxaloacetate and α-ketoglutarate. Some of these substances produced by catabolism are transferred back from the mitochondrion to the cytosol, where they serve in anabolic reactions as precursors for the synthesis of many essential molecules, such as amino acids (Figure 2-84).



Figure 2-83 The structures of GTP and FADH2. (A) GTP and GDP are close relatives of ATP and ADP, respectively. (B) FADH2 is a carrier of hydrogens and high-energy electrons, like NADH, and is shown here in its oxidized form (FAD) with the hydrogen-carrying atoms highlighted in yellow.

[Figure 2-84 labels: nucleotides; glucose 6-phosphate; amino sugars; fructose 6-phosphate; glycolipids; glycoproteins; dihydroxyacetone phosphate; amino acids; pyrimidines; phosphoenolpyruvate; alanine]

Figure 2-84 Glycolysis and the citric acid cycle provide the precursors needed to synthesize many important biological molecules. The amino acids, nucleotides, lipids, sugars, and other molecules (shown here as products) in turn serve as the precursors for the many macromolecules of the cell. Each black arrow in this diagram denotes a single enzyme-catalyzed reaction; the red arrows generally represent pathways with many steps that are required to produce the indicated products.



Electron Transport Drives the Synthesis of the Majority of the ATP in Most Cells

Most chemical energy is released in the last step in the degradation of a food molecule. In this final process the electron carriers NADH and FADH2 transfer the electrons that they have gained when oxidizing other molecules to the electron-transport chain, which is embedded in the inner membrane of the mitochondrion (see Figure 14-10). As the electrons pass along this long chain of specialized electron acceptor and donor molecules, they fall to successively lower energy states. The energy that the electrons release in this process pumps H+ ions (protons) across the membrane, from the inner mitochondrial compartment to the outside, generating a gradient of H+ ions (Figure 2-85). This gradient serves as a source of energy, being tapped like a battery to drive a variety of energy-requiring reactions. The most prominent of these reactions is the generation of ATP by the phosphorylation of ADP.

At the end of this series of electron transfers, the electrons are passed to molecules of oxygen gas (O2) that have diffused into the mitochondrion, which simultaneously combine with protons (H+) from the surrounding solution to produce water molecules. The electrons have now reached their lowest energy level, and therefore all the available energy has been extracted from the oxidized food molecule. This process, termed oxidative phosphorylation (Figure 2-86), also occurs in the plasma membrane of bacteria. As one of the most remarkable achievements of cell evolution, it is a central topic of Chapter 14.

In total, the complete oxidation of a molecule of glucose to H2O and CO2 is used by the cell to produce about 30 molecules of ATP. In contrast, only 2 molecules of ATP are produced per molecule of glucose by glycolysis alone.
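The comparison in the last two sentences can be made explicit with a one-line calculation. The sketch below uses only the two figures quoted above (about 30 ATP and 2 ATP per glucose); the constant and function names are hypothetical, and because the text gives "about 30", the result is an approximate ratio rather than an exact yield.

```python
# ATP molecules produced per molecule of glucose, as quoted in the text.
ATP_COMPLETE_OXIDATION = 30  # "about 30": glycolysis + citric acid cycle + oxidative phosphorylation
ATP_GLYCOLYSIS_ALONE = 2


def fold_increase(full_yield, partial_yield):
    """How many times more ATP the complete pathway yields than the partial one."""
    return full_yield / partial_yield


ratio = fold_increase(ATP_COMPLETE_OXIDATION, ATP_GLYCOLYSIS_ALONE)
share = ATP_GLYCOLYSIS_ALONE / ATP_COMPLETE_OXIDATION
print(f"complete oxidation yields about {ratio:.0f}-fold more ATP")
print(f"glycolysis alone accounts for about {share:.0%} of the total")
```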

Amino Acids and Nucleotides Are Part of the Nitrogen Cycle

So far we have concentrated mainly on carbohydrate metabolism and have not yet considered the metabolism of nitrogen or sulfur. These two elements are important constituents of biological macromolecules. Nitrogen and sulfur atoms pass from compound to compound and between organisms and their environment in a series of reversible cycles.

Although molecular nitrogen is abundant in the Earth's atmosphere, nitrogen is chemically unreactive as a gas. Only a few living species are able to incorporate it into organic molecules, a process called nitrogen fixation. Nitrogen fixation occurs in certain microorganisms and by some geophysical processes, such as lightning discharge. It is essential to the biosphere as a whole, for without it life could not exist on this planet. Only a small fraction of the nitrogenous compounds in today's organisms, however, is due to fresh products of nitrogen fixation from the atmosphere. Most organic nitrogen has been in circulation for

[Figure 2-86 labels: pyruvate from glycolysis; CO2; NADH from glycolysis; O2; acetyl CoA]









Figure 2-85 The generation of an H+ gradient across a membrane by electron-transport reactions. A high-energy electron (derived, for example, from the oxidation of a metabolite) is passed sequentially by carriers A, B, and C to a lower energy state. In this diagram carrier B is arranged in the membrane in such a way that it takes up H+ from one side and releases it to the other as the electron passes. The result is an H+ gradient. As discussed in Chapter 14, this gradient is an important form of energy that is harnessed by other membrane proteins to drive the formation of ATP.

Figure 2-86 The final stages of oxidation of food molecules. Molecules of NADH and FADH2 (FADH2 is not shown) are produced by the citric acid cycle. These activated carriers donate high-energy electrons that are eventually used to reduce oxygen gas to water. A major portion of the energy released during the transfer of these electrons along an electron-transfer chain in the mitochondrial inner membrane (or in the plasma membrane of bacteria) is harnessed to drive the synthesis of ATP, hence the name oxidative phosphorylation (discussed in Chapter 14).



some time, passing from one living organism to another. Thus present-day nitrogen-fixing reactions can be said to perform a "topping-up" function for the total nitrogen supply.

Vertebrates receive virtually all of their nitrogen from their dietary intake of proteins and nucleic acids. In the body these macromolecules are broken down to amino acids and the components of nucleotides, and the nitrogen they contain is used to produce new proteins and nucleic acids, or utilized to make other molecules. About half of the 20 amino acids found in proteins are essential amino acids for vertebrates (Figure 2-87), which means that they cannot be synthesized from other ingredients of the diet. The others can be so synthesized, using a variety of raw materials, including intermediates of the citric acid cycle as described previously. The essential amino acids are made by plants and other organisms, usually by long and energetically expensive pathways that have been lost in the course of vertebrate evolution.

The nucleotides needed to make RNA and DNA can be synthesized using specialized biosynthetic pathways. All of the nitrogens in the purine and pyrimidine bases (as well as some of the carbons) are derived from the plentiful amino acids glutamine, aspartic acid, and glycine, whereas the ribose and deoxyribose sugars are derived from glucose. There are no "essential nucleotides" that must be provided in the diet.

Amino acids not used in biosynthesis can be oxidized to generate metabolic energy. Most of their carbon and hydrogen atoms eventually form CO2 or H2O, whereas their nitrogen atoms are shuttled through various forms and eventually appear as urea, which is excreted. Each amino acid is processed differently, and a whole constellation of enzymatic reactions exists for their catabolism.

Sulfur is abundant on Earth in its most oxidized form, sulfate (SO4^2-). To convert it to forms useful for life, sulfate must be reduced to sulfide (S^2-), the oxidation state of sulfur required for the synthesis of essential biological molecules. These molecules include the amino acids methionine and cysteine, coenzyme A (see Figure 2-62), and the iron-sulfur centers essential for electron transport (see Figure 14-23). The process begins in bacteria, fungi, and plants, where a special group of enzymes use ATP and reducing power to create a sulfate assimilation pathway. Humans and other animals cannot reduce sulfate and must therefore acquire the sulfur they need for their metabolism in the food that they eat.

Metabolism Is Organized and Regulated

One gets a sense of the intricacy of a cell as a chemical machine from the relation of glycolysis and the citric acid cycle to the other metabolic pathways sketched out in Figure 2-88. This type of chart, which was used earlier in this chapter to introduce metabolism, represents only some of the enzymatic pathways in a cell. It is obvious that our discussion of cell metabolism has dealt with only a tiny fraction of cellular chemistry.

All these reactions occur in a cell that is less than 0.1 mm in diameter, and each requires a different enzyme. As is clear from Figure 2-88, the same molecule can often be part of many different pathways. Pyruvate, for example, is a substrate for half a dozen or more different enzymes, each of which modifies it chemically in a different way. One enzyme converts pyruvate to acetyl CoA, another to oxaloacetate; a third enzyme changes pyruvate to the amino acid alanine, a fourth to lactate, and so on. All of these different pathways compete for the same pyruvate molecule, and similar competitions for thousands of other small molecules go on at the same time.

The situation is further complicated in a multicellular organism. Different cell types will in general require somewhat different sets of enzymes. And different tissues make distinct contributions to the chemistry of the organism as a whole. In addition to differences in specialized products such as hormones or antibodies, there are significant differences in the "common" metabolic pathways among various types of cells in the same organism.

Although virtually all cells contain the enzymes of glycolysis, the citric acid cycle, lipid synthesis and breakdown, and amino acid metabolism, the levels of


Figure 2-87 The nine essential amino acids. These cannot be synthesized by human cells and so must be supplied in the diet.


Figure 2-88 Glycolysis and the citric acid cycle are at the center of metabolism. Some 500 metabolic reactions of a typical cell are shown schematically with the reactions of glycolysis and the citric acid cycle in red. Other reactions either lead into these two central pathways, delivering small molecules to be catabolized with production of energy, or they lead outward and thereby supply carbon compounds for the purpose of biosynthesis.

these processes required in different tissues are not the same. For example, nerve cells, which are probably the most fastidious cells in the body, maintain almost no reserves of glycogen or fatty acids and rely almost entirely on a constant supply of glucose from the bloodstream. In contrast, liver cells supply glucose to actively contracting muscle cells and recycle the lactic acid produced by muscle cells back into glucose. All types of cells have their distinctive metabolic traits, and they cooperate extensively in the normal state, as well as in response to stress and starvation.

One might think that the whole system would need to be so finely balanced that any minor upset, such as a temporary change in dietary intake, would be disastrous. In fact, the metabolic balance of a cell is amazingly stable. Whenever the balance is perturbed, the cell reacts so as to restore the initial state. The cell can adapt and continue to function during starvation or disease. Mutations of many kinds can damage or even eliminate particular reaction pathways, and yet, provided that certain minimum requirements are met, the cell survives. It does so because an elaborate network of control mechanisms regulates and coordinates the rates of all of its reactions. These controls rest, ultimately, on the remarkable abilities of proteins to change their shape and their chemistry in response to changes in their immediate environment. The principles that underlie how large molecules such as proteins are built and the chemistry behind their regulation will be our next concern.



Summary

Glucose and other food molecules are broken down by controlled stepwise oxidation to provide chemical energy in the form of ATP and NADH. There are three main sets of reactions that act in series, the products of each being the starting material for the next: glycolysis (which occurs in the cytosol), the citric acid cycle (in the mitochondrial matrix), and oxidative phosphorylation (on the inner mitochondrial membrane). The intermediate products of glycolysis and the citric acid cycle are used both as sources of metabolic energy and to produce many of the small molecules used as the raw materials for biosynthesis. Cells store sugar molecules as glycogen in animals and starch in plants; both plants and animals also use fats extensively as a food store. These storage materials in turn serve as a major source of food for humans, along with the proteins that comprise the majority of the dry mass of most of the cells in the foods we eat.


Table Q2-1 Radioactive isotopes and some of their properties (Problem 2-12).

RADIOACTIVE ISOTOPE    EMISSION      HALF-LIFE     MAXIMUM SPECIFIC ACTIVITY (Ci/mmol)
14C                    β particle    5730 years    0.062
3H                     β particle    12.3 years    29
35S                    β particle    87.4 days     1490
32P                    β particle    14.3 days     9120

Which statements are true? Explain why or why not.
2-1 Of the original radioactivity in a sample, only about 1/1000 will remain after 10 half-lives.
2-2 A 10^-8 M solution of HCl has a pH of 8.
2-3 Most of the interactions between macromolecules could be mediated just as well by covalent bonds as by noncovalent bonds.

2-4 Animals and plants use oxidation to extract energy from food molecules.
2-5 If an oxidation occurs in a reaction, it must be accompanied by a reduction.
2-6 Linking the energetically unfavorable reaction A → B to a second, favorable reaction B → C will shift the equilibrium constant for the first reaction.
2-7 The criterion for whether a reaction proceeds spontaneously is ΔG, not ΔG°, because ΔG takes into account the concentrations of the substrates and products.
2-8 Because glycolysis is only a prelude to the oxidation of glucose in mitochondria, which yields 15-fold more ATP, glycolysis is not really important for human cells.
2-9 The oxygen consumed during the oxidation of glucose in animal cells is returned as CO2 to the atmosphere.

Discuss the following problems.
2-10 The organic chemistry of living cells is said to be special for two reasons: it occurs in an aqueous environment and it accomplishes some very complex reactions. But do you suppose it is really all that much different from the organic chemistry carried out in the top laboratories in the world? Why or why not?
2-11 The molecular weight of ethanol (CH3CH2OH) is 46 and its density is 0.789 g/cm3.
A. What is the molarity of ethanol in beer that is 5% ethanol by volume? [Alcohol content of beer varies from about 4% (lite beer) to 8% (stout beer).]
B. The legal limit for a driver's blood alcohol content varies, but 80 mg of ethanol per 100 mL of blood (usually referred to as a blood alcohol level of 0.08) is typical. What is the molarity of ethanol in a person at this legal limit?
C. How many 12-oz (355-mL) bottles of 5% beer could a 70-kg person drink and remain under the legal limit? A 70-kg person contains about 40 liters of water. Ignore the metabolism of ethanol, and assume that the water content of the person remains constant.
D. Ethanol is metabolized at a constant rate of about 120 mg per hour per kg body weight, regardless of its concentration. If a 70-kg person were at twice the legal limit (160 mg/100 mL), how long would it take for their blood alcohol level to fall below the legal limit?
2-12 Specific activity refers to the amount of radioactivity per unit amount of substance, usually in biology expressed on a molar basis, for example, as Ci/mmol. [One curie (Ci) corresponds to 2.22 × 10^12 disintegrations per minute (dpm).] As apparent in Table Q2-1, which lists properties of four isotopes commonly used in biology, there is an inverse relationship between maximum specific activity and half-life. Do you suppose this is just a coincidence or is there an underlying reason? Explain your answer.
2-13 By a convenient coincidence the ion product of water, Kw = [H+][OH-], is a nice round number: 1.0 × 10^-14 M^2.
A. Why is a solution at pH 7.0 said to be neutral?
B. What is the H+ concentration and pH of a 1 mM solution of NaOH?
C. If the pH of a solution is 5.0, what is the concentration of OH- ions?
2-14 Suggest a rank order for the pK values (from lowest to highest) for the carboxyl group on the aspartate side chain



in the following environments in a protein. Explain your ranking.
1. An aspartate side chain on the surface of a protein with no other ionizable groups nearby.
2. An aspartate side chain buried in a hydrophobic pocket on the surface of a protein.
3. An aspartate side chain in a hydrophobic pocket adjacent to a glutamate side chain.
4. An aspartate side chain in a hydrophobic pocket adjacent to a lysine side chain.
2-15 A histidine side chain is known to play an important role in the catalytic mechanism of an enzyme; however, it is not clear whether histidine is required in its protonated (charged) or unprotonated (uncharged) state. To answer this question you measure enzyme activity over a range of pH, with the results shown in Figure Q2-1. Which form of histidine is required for enzyme activity?

Figure Q2-1 Enzyme activity as a function of pH (Problem 2-15).

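The arithmetic behind Problem 2-13 can be sketched in a few lines of Python. This is only an illustration of how the ion product of water links [H+], [OH-] and pH; the function names are hypothetical, and the sketch assumes that NaOH dissociates completely, so that a 1 mM solution contributes 1 mM OH-.

```python
import math

# Ion product of water in M^2, the value quoted in Problem 2-13.
KW = 1.0e-14


def h_from_oh(oh_molar):
    """Return [H+] in M for a given [OH-] in M, using Kw = [H+][OH-]."""
    return KW / oh_molar


def oh_from_ph(ph):
    """Return [OH-] in M for a solution of the given pH."""
    h_molar = 10.0 ** (-ph)
    return KW / h_molar


def ph_from_h(h_molar):
    """Return pH, defined as -log10 of the H+ concentration in M."""
    return -math.log10(h_molar)


if __name__ == "__main__":
    # Neutral solution: [H+] equals [OH-], so each is the square root of Kw.
    neutral_h = math.sqrt(KW)
    print("neutral pH:", round(ph_from_h(neutral_h), 2))

    # 1 mM NaOH, assumed fully dissociated: [OH-] = 1e-3 M.
    print("pH of 1 mM NaOH:", round(ph_from_h(h_from_oh(1.0e-3)), 2))

    # Solution at pH 5.0.
    print("[OH-] at pH 5.0 (M):", oh_from_ph(5.0))
```

The same three helpers cover each part of the problem, because every part is a rearrangement of the single relation Kw = [H+][OH-].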

Figure Q2-2 Three molecules that illustrate the seven most common functional groups in biology (Problem 2-17). 1,3-Bisphosphoglycerate and pyruvate are intermediates in glycolysis and cysteine is an amino acid.

1,3-bisphosphoglycerate    pyruvate



Calculate the instantaneous velocity of a water molecule (molecular mass = 18 daltons), a glucose molecule (molecular mass = 180 daltons), and a myoglobin molecule (molecular mass = 15,000 daltons) at 37°C. Just for fun, convert these numbers into kilometers/hour. Before you do any calculations, try to guess whether the molecules are moving at a slow crawl ( […] 1), reactions with a large increase in S (that is, for which ΔS > 0) are favored and will occur spontaneously.

As discussed in Chapter 2, heat energy causes the random commotion of molecules. Because the transfer of heat from an enclosed system to its surroundings increases the number of different arrangements that the molecules in the outside world can have, it increases their entropy. It can be shown that the release of a fixed quantity of heat energy has a greater disordering effect at low temperature than at high temperature, and that the value of ΔS for the surroundings, as defined above (ΔS_sea), is precisely equal to h, the amount of heat transferred to the surroundings from the system, divided by the absolute temperature (T):

$$\Delta S_{\text{sea}} = \frac{h}{T}$$

THE GIBBS FREE ENERGY, G

When dealing with an enclosed biological system, one would like to have a simple way of predicting whether a given reaction will or will not occur spontaneously in the system. We have seen that the crucial question is whether the entropy change for the universe is positive or negative when that reaction occurs. In our idealized system, the cell in a box, there are two separate components to the entropy change of the universe (the entropy change for the system enclosed in the box and the entropy change for the surrounding "sea"), and both must be added together before any prediction can be made. For example, it is possible for a reaction to absorb heat and thereby decrease the entropy of the sea (ΔS_sea < 0) and at the same time to cause such a large degree of disordering inside the box (ΔS_box > 0) that the total ΔS_universe = ΔS_sea + ΔS_box is greater than 0. In this case the reaction will occur spontaneously, even though the sea gives up heat to the box during the reaction. An example of such a reaction is the dissolving of sodium chloride in a beaker containing water (the "box"), which is a spontaneous process even though the temperature of the water drops as the salt goes into solution.

Chemists have found it useful to define a number of new "composite functions" that describe combinations of physical properties of a system. The properties that can be combined include the temperature (T), pressure (P), volume (V), energy (E), and entropy (S). The enthalpy (H) is one such composite function. But by far the most useful composite function for biologists is the Gibbs free energy, G. It serves as an accounting device that allows one to deduce the entropy change of the universe resulting from a chemical reaction in the box, while avoiding any separate consideration of the entropy change in the sea. The definition of G is

$$G = H - TS$$

where, for a box of volume V, H is the enthalpy described above (E + PV), T is the absolute temperature, and S is the entropy. Each of these quantities applies to the inside of the box only.
The change in free energy during a reaction in the box (the G of the products minus the G of the starting materials) is denoted as ΔG and, as we shall now demonstrate, it is a direct measure of the amount of disorder that is created in the universe when the reaction occurs.

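At constant temperature the change in free energy (ΔG) during a reaction equals ΔH - TΔS. Remembering that ΔH = -h, the heat absorbed from the sea, we have

$$-\Delta G = -\Delta H + T\Delta S$$

$$-\Delta G = h + T\Delta S, \quad \text{so} \quad -\frac{\Delta G}{T} = \frac{h}{T} + \Delta S$$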

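But h/T is equal to the entropy change of the sea (ΔS_sea), and the ΔS in the above equation is ΔS_box. Therefore

$$-\frac{\Delta G}{T} = \Delta S_{\text{sea}} + \Delta S_{\text{box}} = \Delta S_{\text{universe}}$$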

We conclude that the free-energy change is a direct measure of the entropy change of the universe. A reaction will proceed in the direction that causes the change in the free energy (ΔG) to be less than zero, because in this case there will be a positive entropy change in the universe when the reaction occurs.

For a complex set of coupled reactions involving many different molecules, the total free-energy change can be computed simply by adding up the free energies of all the different molecular species after the reaction and comparing this value with the sum of free energies before the reaction; for common substances the required free-energy values can be found from published tables. In this way one can predict the direction of a reaction and thereby readily check the feasibility of any proposed mechanism. Thus, for example, from the observed values for the magnitude of the electrochemical proton gradient across the inner mitochondrial membrane and the ΔG for ATP hydrolysis inside the mitochondrion, one can be certain that ATP synthase requires the passage of more than one proton for each molecule of ATP that it synthesizes.

The value of ΔG for a reaction is a direct measure of how far the reaction is from equilibrium. The large negative value for ATP hydrolysis in a cell merely reflects the fact that cells keep the ATP hydrolysis reaction as much as 10 orders of magnitude away from equilibrium. If a reaction reaches equilibrium, ΔG = 0, the reaction then proceeds at precisely equal rates in the forward and backward direction. For ATP hydrolysis, equilibrium is reached when the vast majority of the ATP has been hydrolyzed, as occurs in a dead cell.
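The bookkeeping in this panel can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not part of the original panel: the function names are hypothetical, the example values of ΔH and ΔS_box are invented purely to show a heat-absorbing reaction that is still spontaneous, and the last function uses the standard relation ΔG = RT ln(Q/K) between free energy and distance from equilibrium, which is assumed here rather than derived in the panel.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def delta_g(delta_h, delta_s_box, temperature):
    """Free-energy change of the box: dG = dH - T*dS_box (J/mol)."""
    return delta_h - temperature * delta_s_box


def delta_s_universe(delta_h, delta_s_box, temperature):
    """Entropy change of the universe in J/(mol*K).

    The heat released to the sea is h = -dH, so dS_sea = h/T = -dH/T,
    and dS_universe = dS_sea + dS_box.
    """
    delta_s_sea = -delta_h / temperature
    return delta_s_sea + delta_s_box


def delta_g_from_displacement(orders_of_magnitude, temperature):
    """dG (J/mol) when the ratio Q/K equals 10**(-orders_of_magnitude).

    Assumes the standard relation dG = R*T*ln(Q/K); a reaction at
    equilibrium (Q = K, zero orders of magnitude) gives dG = 0.
    """
    return R * temperature * math.log(10.0 ** (-orders_of_magnitude))


if __name__ == "__main__":
    # Hypothetical reaction that absorbs heat (dH > 0) but disorders the box.
    dh, ds_box, temp = 4000.0, 40.0, 298.0
    dg = delta_g(dh, ds_box, temp)
    ds_univ = delta_s_universe(dh, ds_box, temp)
    print("dG (J/mol):", dg)
    print("-T * dS_universe (J/mol):", -temp * ds_univ)

    # A reaction held 10 orders of magnitude from equilibrium at 37 C.
    print("dG at 10 orders (kJ/mol):",
          round(delta_g_from_displacement(10, 310.15) / 1000.0, 1))
```

The first two printed values agree, which is the panel's conclusion that ΔG equals -T times the entropy change of the universe. The last value shows why a displacement of 10 orders of magnitude corresponds to a large negative ΔG.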

For each step, the part of the molecule that undergoes a change is shadowed in blue, and the name of the enzyme that catalyzes the reaction is in a yellow box.

STEP 1 Glucose is phosphorylated by ATP to form a sugar phosphate. The negative charge of the phosphate prevents passage of the sugar phosphate through the plasma membrane, trapping glucose inside the cell.

STEP 2 […] from carbon 1 to carbon 2, forming a ketose from an aldose sugar. (See Panel 2-4.)

STEP 3 The new hydroxyl group on carbon 1 is phosphorylated by ATP, in preparation for the formation of two three-carbon sugar phosphates. The entry of sugars into glycolysis is controlled at this step, through regulation of the enzyme phosphofructokinase.

STEP 4 The six-carbon sugar is cleaved to produce two three-carbon molecules. Only the glyceraldehyde 3-phosphate can proceed immediately through glycolysis.

STEP 5 The other product of step 4, dihydroxyacetone phosphate, is isomerized to form glyceraldehyde 3-phosphate.

STEP 6 […] (Figure 2-73)

STEP 7 The transfer to ADP of the high-energy phosphate group that was generated in step 6 forms ATP.

STEP 8 The remaining phosphate ester linkage in 3-phosphoglycerate, which has a relatively low free energy of hydrolysis, is moved from carbon 3 to carbon 2 to form 2-phosphoglycerate.

STEP 9 The removal of water from 2-phosphoglycerate creates a high-energy enol phosphate linkage.

STEP 10 The transfer to ADP of the high-energy phosphate group that was generated in step 9 forms ATP, completing glycolysis.

Structures labeled in the panel include fructose 1,6-bisphosphate (open-chain form), glyceraldehyde 3-phosphate, 1,3-bisphosphoglycerate, 2-phosphoglycerate, and phosphoenolpyruvate.

In addition to the pyruvate, the net products are two molecules of ATP and two molecules of NADH.
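The net yield stated above can be checked with a small tally in Python. This is a sketch for illustration only: the table layout and the function name are hypothetical, and the NADH entry for step 6 is an assumption taken from standard biochemistry, because the text of step 6 is not legible in this copy. Steps 6 to 10 are counted twice per glucose because step 4 yields two three-carbon molecules and step 5 converts the second one to glyceraldehyde 3-phosphate.

```python
# Each entry: (step number, ATP change, NADH change, times run per glucose).
# ATP is consumed in steps 1 and 3 and formed in steps 7 and 10, as the
# panel states. The NADH produced in step 6 is an assumption (see lead-in).
GLYCOLYSIS_STEPS = [
    (1, -1, 0, 1),   # glucose phosphorylated by ATP
    (3, -1, 0, 1),   # second phosphorylation by ATP
    (6, 0, +1, 2),   # oxidation step, assumed to yield NADH
    (7, +1, 0, 2),   # phosphate transferred to ADP
    (10, +1, 0, 2),  # phosphate transferred to ADP
]


def net_yield(steps):
    """Return (net ATP, net NADH) per molecule of glucose."""
    atp = sum(d_atp * times for _, d_atp, _, times in steps)
    nadh = sum(d_nadh * times for _, _, d_nadh, times in steps)
    return atp, nadh


if __name__ == "__main__":
    atp, nadh = net_yield(GLYCOLYSIS_STEPS)
    print("net ATP per glucose:", atp)
    print("net NADH per glucose:", nadh)
```

Four ATP are formed in the second half of the pathway, but two were invested in steps 1 and 3, which is why the net gain is two.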





The complete citric acid cycle. The two carbons from acetyl CoA that enter this turn of the cycle (shadowed in […]) will be converted to CO2 in subsequent turns of the cycle: it is the two carbons shadowed in blue that are converted to CO2 in this cycle.


Diagram labels: acetyl CoA; Step 1; isocitrate (6C); fumarate (4C); succinate (4C); Step 6; succinyl CoA (4C).













Details of the eight steps are shown below. For each step, the part of the molecule that undergoes a change is shadowed in blue and the name of the enzyme that catalyzes the reaction is in a yellow box.

STEP 1 After the enzyme removes a proton from the CH3 group on acetyl CoA, the negatively charged CH2- forms a bond to a carbonyl carbon of oxaloacetate. The subsequent loss by hydrolysis of the coenzyme A (CoA) drives the reaction strongly forward.

STEP 2 An isomerization reaction, in which water is first removed and then added back, moves the hydroxyl group from one carbon atom to its neighbor.

STEP 3 In the first of four oxidation steps in the cycle, the carbon carrying the hydroxyl group is converted to a carbonyl group. The immediate product is unstable, losing CO2 while still bound to the enzyme.

STEP 4 The α-ketoglutarate dehydrogenase complex closely resembles the large enzyme complex that converts pyruvate to acetyl CoA (pyruvate dehydrogenase). It likewise catalyzes an oxidation that produces NADH, CO2, and a high-energy thioester bond to coenzyme A (CoA).

STEP 5 A phosphate molecule from solution displaces the CoA, forming a high-energy phosphate linkage to succinate. This phosphate is then passed to GDP to form GTP. (In bacteria and plants, ATP is formed instead.)

STEP 6 In the third oxidation step in the cycle, FAD removes two hydrogen atoms from succinate.

STEP 7 The addition of water to fumarate places a hydroxyl group next to a carbonyl carbon.

STEP 8 In the last of four oxidation steps in the cycle, the carbon carrying the hydroxyl group is converted to a carbonyl group, regenerating the oxaloacetate needed for step 1.

Structures labeled in the panel include succinate, fumarate, malate, and oxaloacetate.
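The carbon bookkeeping of one turn of the cycle can be sketched as a simple consistency check in Python. This is an illustrative sketch, not part of the panel: the table and function names are hypothetical. The carbon counts for isocitrate, succinyl CoA, succinate and fumarate are those labeled in the diagram; the counts for citrate, α-ketoglutarate, malate and oxaloacetate are filled in from standard biochemistry and should be read as assumptions. As in the diagram, the carbons of coenzyme A itself are not counted.

```python
# Each entry: (step, product of the step, carbons in product, CO2 released).
CITRIC_ACID_CYCLE = [
    (1, "citrate", 6, 0),
    (2, "isocitrate", 6, 0),
    (3, "alpha-ketoglutarate", 5, 1),  # first oxidation, loses CO2
    (4, "succinyl CoA", 4, 1),         # second oxidation, loses CO2
    (5, "succinate", 4, 0),
    (6, "fumarate", 4, 0),
    (7, "malate", 4, 0),
    (8, "oxaloacetate", 4, 0),
]

ACETYL_CARBONS = 2        # carbons entering as the acetyl group
OXALOACETATE_CARBONS = 4  # carbons in the acceptor molecule


def check_carbon_balance(cycle):
    """Walk through the cycle and confirm that carbons are conserved.

    Returns the number of carbons left in the final product. Raises
    ValueError if any step's product does not match the running count.
    """
    carbons = ACETYL_CARBONS + OXALOACETATE_CARBONS
    for step, name, product_carbons, co2 in cycle:
        carbons -= co2
        if carbons != product_carbons:
            raise ValueError(f"carbon mismatch at step {step} ({name})")
    return carbons


def co2_per_turn(cycle):
    """Total CO2 molecules released in one turn of the cycle."""
    return sum(co2 for _, _, _, co2 in cycle)


if __name__ == "__main__":
    print("carbons in regenerated acceptor:",
          check_carbon_balance(CITRIC_ACID_CYCLE))
    print("CO2 released per turn:", co2_per_turn(CITRIC_ACID_CYCLE))
```

Two carbons enter as acetyl CoA and two leave as CO2, so the four-carbon oxaloacetate is regenerated and the cycle can turn again.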





When we look at a cell through a microscope or analyze its electrical or biochemical activity, we are, in essence, observing proteins. Proteins constitute most of a cell's dry mass. They are not only the cell's building blocks; they also execute nearly all the cell's functions. Thus, enzymes provide the intricate molecular surfaces in a cell that promote its many chemical reactions. Proteins embedded in the plasma membrane form channels and pumps that control the passage of small molecules into and out of the cell. Other proteins carry messages from one cell to another, or act as signal integrators that relay sets of signals inward from the plasma membrane to the cell nucleus. Yet others serve as tiny molecular machines with moving parts: kinesin, for example, propels organelles through the cytoplasm; topoisomerase can untangle knotted DNA molecules. Other specialized proteins act as antibodies, toxins, hormones, antifreeze molecules, elastic fibers, ropes, or sources of luminescence. Before we can hope to understand how genes work, how muscles contract, how nerves conduct electricity, how embryos develop, or how our bodies function, we must attain a deep understanding of proteins.

In This Chapter
THE SHAPE AND STRUCTURE OF PROTEINS
PROTEIN FUNCTION

From a chemical point of view, proteins are by far the most structurally complex and functionally sophisticated molecules known. This is perhaps not surprising, once we realize that the structure and chemistry of each protein has been developed and fine-tuned over billions of years of evolutionary history. Yet, even to experts, the remarkable versatility of proteins can seem truly amazing. In this section, we consider how the location of each amino acid in the long string of amino acids that forms a protein determines its three-dimensional shape. Later in the chapter, we use this understanding of protein structure at the atomic level to describe how the precise shape of each protein molecule determines its function in a cell.

The Shape of a Protein Is Specified by Its Amino Acid Sequence

There are 20 types of amino acids in proteins, each with different chemical properties. A protein molecule is made from a long chain of these amino acids, each linked to its neighbor through a covalent peptide bond. Proteins are therefore also known as polypeptides. Each type of protein has a unique sequence of amino acids, and there are many thousands of different proteins, each with its own particular amino acid sequence. The repeating sequence of atoms along the core of the polypeptide chain is referred to as the polypeptide backbone. Attached to this repetitive chain are those portions of the amino acids that are not involved in making a peptide bond and that give each amino acid its unique properties: the 20 different amino acid side chains (Figure 3-1). Some of these side chains are nonpolar and hydrophobic ("water-fearing"), others are negatively or positively charged, some readily form covalent bonds, and so on. Panel 3-1 (pp. 128-129) shows their atomic structures and Figure 3-2 lists their abbreviations.



Chapter 3: Proteins

Labels in Figure 3-1: polypeptide backbone; side chains; amino terminus (N-terminus); methionine (Met); tyrosine (Tyr).

As discussed in Chapter 2, atoms behave almost as if they were hard spheres with a definite radius (their van der Waals radius). The requirement that no two atoms overlap greatly limits the possible bond angles in a polypeptide chain (Figure 3-3). This constraint and other steric interactions severely restrict the possible three-dimensional arrangements of atoms (or conformations). Nevertheless, a long flexible chain, such as a protein, can still fold in an enormous number of ways. The folding of a protein chain is, however, further constrained by many different sets of weak noncovalent bonds that form between one part of the chain and another. These involve atoms in the polypeptide backbone, as well as atoms in the amino acid side chains. There are three types of weak bonds: hydrogen bonds, electrostatic attractions, and van der Waals attractions, as explained in Chapter 2 (see p. 54). Individual noncovalent bonds are 30-300 times weaker than the typical covalent bonds that create biological molecules. But many weak bonds acting in parallel can hold two regions of a polypeptide chain tightly together. In this way, the combined strength of large numbers of such noncovalent bonds determines the stability of each folded shape (Figure 3-4).

Figure 3-1 The components of a protein. A protein consists of a polypeptide backbone with attached side chains. Each type of protein differs in its sequence and number of amino acids; therefore, it is the sequence of the chemically different side chains that makes each protein distinct. The two ends of a polypeptide chain are chemically different: the end carrying the free amino group (NH3+, also written NH2) is the amino terminus, or N-terminus, and that carrying the free carboxyl group (COO-, also written COOH) is the carboxyl terminus or C-terminus. The amino acid sequence of a protein is always presented in the N-to-C direction, reading from left to right.



AMINO ACID                        SIDE CHAIN
Aspartic acid    Asp    D         negative
Glutamic acid    Glu    E         negative
Arginine         Arg    R         positive
Lysine           Lys    K         positive
Histidine        His    H         positive
Asparagine       Asn    N         uncharged polar
Glutamine        Gln    Q         uncharged polar
Serine           Ser    S         uncharged polar
Threonine        Thr    T         uncharged polar
Tyrosine         Tyr    Y         uncharged polar

AMINO ACID                        SIDE CHAIN
Alanine          Ala    A         nonpolar
Glycine          Gly    G         nonpolar
Valine           Val    V         nonpolar
Leucine          Leu    L         nonpolar
Isoleucine       Ile    I         nonpolar
Proline          Pro    P         nonpolar
Phenylalanine    Phe    F         nonpolar
Methionine       Met    M         nonpolar
Tryptophan       Trp    W         nonpolar
Cysteine         Cys    C         nonpolar

Figure 3-2 The 20 amino acids found in proteins. Each amino acid has a three-letter and a one-letter abbreviation. There are equal numbers of polar and nonpolar side chains; however, some side chains listed here as polar are large enough to have some non-polar properties (for example, Tyr, Thr, Arg, Lys). For atomic structures, see Panel 3-1 (pp. 128-129).

A fourth weak force also has a central role in determining the shape of a protein. As described in Chapter 2, hydrophobic molecules, including the nonpolar side chains of particular amino acids, tend to be forced together in an aqueous environment in order to minimize their disruptive effect on the hydrogen-bonded network of water molecules (see p. 54 and Panel 2-2, pp. 108-109). Therefore, an important factor governing the folding of any protein is the distribution of its polar and nonpolar amino acids. The nonpolar (hydrophobic) side chains in a protein, belonging to such amino acids as phenylalanine, leucine, valine, and tryptophan, tend to cluster in the interior of the molecule (just as hydrophobic oil droplets coalesce in water to form one large droplet). This enables them to avoid contact with the water that surrounds them inside a cell. In contrast, polar groups, such as those belonging to arginine, glutamine, and histidine, tend to arrange themselves near the outside of the molecule, where they can form hydrogen bonds with water and with other polar molecules (Figure 3-5). Polar amino acids buried within the protein are usually hydrogen-bonded to other polar amino acids or to the polypeptide backbone.
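Because the distribution of polar and nonpolar residues matters so much for folding, it can be useful to tabulate it for a given sequence. The Python sketch below encodes the side-chain classes listed in Figure 3-2 and reports the composition of a sequence written in one-letter code. The function names and the example sequence are hypothetical, chosen only for illustration, and the sketch says nothing about where in the folded structure each residue ends up.

```python
# Side-chain classes from Figure 3-2, keyed by one-letter abbreviation.
SIDE_CHAIN_CLASS = {
    "D": "negative", "E": "negative",
    "R": "positive", "K": "positive", "H": "positive",
    "N": "uncharged polar", "Q": "uncharged polar", "S": "uncharged polar",
    "T": "uncharged polar", "Y": "uncharged polar",
    "A": "nonpolar", "G": "nonpolar", "V": "nonpolar", "L": "nonpolar",
    "I": "nonpolar", "P": "nonpolar", "F": "nonpolar", "M": "nonpolar",
    "W": "nonpolar", "C": "nonpolar",
}


def classify(sequence):
    """Return the side-chain class of each residue, in N-to-C order."""
    return [SIDE_CHAIN_CLASS[residue] for residue in sequence.upper()]


def nonpolar_fraction(sequence):
    """Return the fraction of residues with nonpolar side chains."""
    classes = classify(sequence)
    if not classes:
        raise ValueError("empty sequence")
    return classes.count("nonpolar") / len(classes)


if __name__ == "__main__":
    example = "MDLYKRFV"  # hypothetical peptide, for illustration only
    for residue, side_chain in zip(example, classify(example)):
        print(residue, side_chain)
    print("nonpolar fraction:", nonpolar_fraction(example))
```

An unknown letter raises a KeyError, which is a deliberate choice here so that typos in a sequence are not silently ignored.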



Figure 3-3 Steric limitations on the bond angles in a polypeptide chain. (A) Each amino acid contributes three bonds (red) to the backbone of the chain. The peptide bond is planar (gray shading) and does not permit rotation. By contrast, rotation can occur about the Cα-C bond, whose angle of rotation is called psi (ψ), and about the N-Cα bond, whose angle of rotation is called phi (φ). By convention, an R group is often used to denote an amino acid side chain (green circles). (B) The conformation of the main-chain atoms in a protein is determined by one pair of φ and ψ angles for each amino acid; because of steric collisions between atoms within each amino acid, most pairs of φ and ψ angles do not occur. In this so-called Ramachandran plot, each dot represents an observed pair of angles in a protein. The cluster of dots in the bottom left quadrant represents all of the amino acids that are located in α-helix structures (see Figure 3-7A). (B, from J. Richardson, Adv. Prot. Chem. 34:174-175, 1981. With permission from Academic Press.)
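The angles φ and ψ in the caption are dihedral (torsion) angles, each defined by four consecutive backbone atoms. The Python sketch below shows one common way to compute such an angle from atomic coordinates and to report which quadrant of a Ramachandran plot a (φ, ψ) pair falls in. It is a sketch under stated assumptions: the function names are hypothetical, the atom order for φ (C, N, Cα, C) and ψ (N, Cα, C, N) and the sign convention are the usual ones but are not spelled out in the caption, and the plot is assumed to have φ on the horizontal axis and ψ on the vertical axis.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle in degrees about the bond p1-p2.

    For phi, pass the coordinates of C(i-1), N, C-alpha, C;
    for psi, pass N, C-alpha, C, N(i+1). The result lies between
    -180 and +180 degrees.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Parts of the outer bonds that are perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def ramachandran_quadrant(phi, psi):
    """Name the plot quadrant, assuming phi is horizontal and psi vertical."""
    horizontal = "left" if phi < 0 else "right"
    vertical = "bottom" if psi < 0 else "top"
    return f"{vertical} {horizontal}"


if __name__ == "__main__":
    # Toy coordinates (not from a real protein): a 90-degree twist.
    angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
    print("toy dihedral (degrees):", angle)
    # Angles near (-57, -47) are often quoted as typical for an alpha helix.
    print("quadrant:", ramachandran_quadrant(-57.0, -47.0))
```

Since both angles of a typical α helix are negative, the helix cluster lands in the bottom left quadrant, consistent with the caption.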



The general formula of an amino acid is H2N-CH(R)-COOH: an amino group (H2N), a carboxyl group (COOH), and a side-chain group (R) are attached to the α-carbon atom. R is commonly one of 20 different side chains. At pH 7 both the amino and carboxyl groups are ionized.

The α-carbon atom is asymmetric, which allows for two mirror-image (or stereo-) isomers, L and D. Proteins consist exclusively of L-amino acids.

The common amino acids are grouped according to whether their side chains are
acidic
basic
uncharged polar
nonpolar

These 20 amino acids are given both three-letter and one-letter abbreviations. Thus: alanine = Ala = A

BASIC SIDE CHAINS
histidine (His, or H)
These nitrogens have a relatively weak affinity for an H+ and are only partly positive at neutral pH.

PEPTIDE BONDS
Amino acids are commonly joined together by an amide linkage, called a peptide bond.

Peptide bond: The four atoms in each gray box form a rigid planar unit. There is no rotation around the C-N bond.

Proteins are long polymers of amino acids linked by peptide bonds, and they are always written with the N-terminus toward the left. The sequence of this tripeptide is histidine-cysteine-valine.

amino- or N-terminus

These two single bonds allow rotation, so that long chains of amino acids are very flexible.



glutamic acid (Glu, or E)
alanine (Ala, or A)
valine (Val, or V)
leucine (Leu, or L)
isoleucine (Ile, or I)
proline (Pro, or P) (actually an imino acid)
phenylalanine (Phe, or F)
methionine (Met, or M)
tryptophan (Trp, or W)
glycine (Gly, or G)
cysteine (Cys, or C)

Although the amide N is not charged at neutral pH, it is polar.
The -OH group is polar.
Disulfide bonds can form between two cysteine side chains in proteins.




Labels in Figure 3-4: glutamic acid; electrostatic attractions; hydrogen bond; van der Waals attractions.

Figure 3-4 Three types of noncovalent bonds help proteins fold. Although a single one of these bonds is quite weak, many of them often form together to create a strong bonding arrangement, as in the example shown. As in the previous figure, R is used as a general designation for an amino acid side chain.


Proteins Fold into a Conformation of Lowest Energy

As a result of all of these interactions, most proteins have a particular three-dimensional structure, which is determined by the order of the amino acids in its chain. The final folded structure, or conformation, of any polypeptide chain is generally the one that minimizes its free energy. Biologists have studied protein folding in a test tube by using highly purified proteins. Treatment with certain […]

[…] sequence contains all the information needed for specifying the three-dimensional shape of a protein, which is a critical point for understanding cell function. Each protein normally folds up into a single stable conformation. However, the conformation changes slightly when the protein interacts with other molecules in the cell. This change in shape is often crucial to the function of the protein, as we see later. Although a protein chain can fold into its correct conformation without outside help, in a living cell special proteins called molecular chaperones often assist in protein folding. Molecular chaperones bind to partly folded polypeptide chains and help them progress along the most energetically favorable folding […]

[…] also on their exact orientation relative to one another. For this reason, two even slightly different conformations of the same protein molecule may differ greatly in their chemistry.

Sequence Comparisons Between Protein Family Members Highlight Crucial Ligand-Binding Sites

Figure 3-[…] An unusually reactive amino acid at the active site of an enzyme. This example is the […]

In this era of extensive genome sequencing, many new protein families have been discovered whose functions are unknown. Once a three-dimensional structure has been determined for one family member, evolutionary tracing allows biologists to determine binding sites for the members of that family, thereby helping to decipher protein function.

Proteins Bind to Other Proteins Through Several Types of Interfaces

Proteins can bind to other proteins in at least three ways. In many cases, a portion of the surface of one protein contacts an extended loop of polypeptide chain (a "string") on a second protein (Figure 3-40A). Such a surface-string interaction, for example, allows the SH2 domain to recognize a phosphorylated polypeptide loop on a second protein, as just described, and it also enables a protein kinase to recognize the proteins that it will phosphorylate (see below). A second type of protein-protein interface forms when two α helices, one from each protein, pair together to form a coiled-coil (Figure 3-40B). This type of protein interface is found in several families of gene regulatory proteins, as discussed in Chapter 7. The most common way for proteins to interact, however, is by the precise matching of one rigid surface with that of another (Figure 3-40C). Such interactions can be very tight, since a large number of weak bonds can form between two surfaces that match well. For the same reason, such surface-surface interactions can be extremely specific, enabling a protein to select just one partner from the many thousands of different proteins found in a cell.

Figure 3-40 Three ways in which two proteins can bind to each other. Only the interacting parts of the two proteins are shown. (A) A rigid surface on one protein can bind to an extended loop of polypeptide chain (a "string") on a second protein. (B) Two α helices can bind together to form a coiled-coil. (C) Two complementary rigid surfaces often link two proteins together.

Antibody Binding Sites Are Especially Versatile

All proteins must bind to particular ligands to carry out their various functions. The antibody family is notable for its capacity for tight selective binding (discussed in detail in Chapter 25). Antibodies, or immunoglobulins, are proteins produced by the immune system in response to foreign molecules, such as those on the surface of an invading microorganism. Each antibody binds tightly to a particular target molecule, thereby either inactivating the target molecule directly or marking it for destruction. An antibody recognizes its target (called an antigen) with remarkable specificity. Because there are potentially billions of different antigens that humans might encounter, we have to be able to produce billions of different antibodies. Antibodies are Y-shaped molecules with two identical binding sites that are complementary to a small portion of the surface of the antigen molecule. A detailed examination of the antigen-binding sites of antibodies reveals that they are formed from several loops of polypeptide chain that protrude from the ends of a pair of closely juxtaposed protein domains (Figure 3-41). Different antibodies generate an enormous diversity of antigen-binding sites by changing only the length and amino acid sequence of these loops, without altering the basic protein structure. Loops of this kind are ideal for grasping other molecules. They allow a large number of chemical groups to surround a ligand so that the protein can link to




h e a v yc h a i n

lr: "fr

l o o p st h a t b i n d a n t i g e n Vs domain --...... NH,

lq \-"l


v a r i a b l ed o m a i n o f l i g h t c h a i n( V r ) 5"rn




it with many weak bonds. For this reason, loops often form the ligand-binding sites in proteins.

The Equilibrium Constant Measures Binding Strength

Molecules in the cell encounter each other very frequently because of their continual random thermal movements. Colliding molecules with poorly matching surfaces form few noncovalent bonds with one another, and the two molecules dissociate as rapidly as they come together. At the other extreme, when many noncovalent bonds form between two colliding molecules, the association can persist for a very long time (Figure 3-42). Strong interactions occur in cells whenever a biological function requires that molecules remain associated for a long time, for example, when a group of RNA and protein molecules come together to make a subcellular structure such as a ribosome.

We can measure the strength with which any two molecules bind to each other. As an example, consider a population of identical antibody molecules that suddenly encounters a population of ligands diffusing in the fluid surrounding them. At frequent intervals, one of the ligand molecules will bump into the binding site of an antibody and form an antibody-ligand complex. The population of antibody-ligand complexes will therefore increase, but not without limit: over time, a second process, in which individual complexes break apart because of thermally induced motion, will become increasingly important. Eventually, any population of antibody molecules and ligands will reach a steady state, or equilibrium, in which the number of binding (association) events per second is precisely equal to the number of "unbinding" (dissociation) events (see Figure 2-52).

From the concentrations of the ligand, antibody, and antibody-ligand complex at equilibrium, we can calculate a convenient measure of the strength of binding, the equilibrium constant (K) (Figure 3-43A). The equilibrium constant for a reaction in which two molecules (A and B) bind to each other to form a complex (AB) has units of liters/mole, and half of the binding sites will be occupied by ligand when that ligand's concentration (in moles/liter) reaches a value that is equal to 1/K. This equilibrium constant is larger the greater the binding strength, and it is a direct measure of the free-energy difference between the bound and free states (Figure 3-43B and C). Even a change of a few noncovalent bonds can have a striking effect on a binding interaction, as shown by the example in Figure 3-44. (Note that the equilibrium constant, as defined here, is also known as the association or affinity constant, Ka.)

We have used the case of an antibody binding to its ligand to illustrate the effect of binding strength on the equilibrium state, but the same principles apply to any molecule and its ligand. Many proteins are enzymes, which, as we now discuss, first bind to their ligands and then catalyze the breakage or formation of covalent bonds in these molecules.

Figure 3-41 An antibody molecule. (A) A typical antibody molecule is Y-shaped and has two identical binding sites for its antigen, one on each arm of the Y. The protein is composed of four polypeptide chains (two identical heavy chains and two identical and smaller light chains) held together by disulfide bonds. Each chain is made up of several different immunoglobulin domains, here shaded either blue or gray. The antigen-binding site is formed where a heavy-chain variable domain (VH) and a light-chain variable domain (VL) come close together. These are the domains that differ most in their sequence and structure in different antibodies. Each domain at the end of the two arms of the antibody molecule forms loops that bind to the antigen. In (B) we can see these fingerlike loops (red) contributed by the VL domain.

Figure 3-42 How noncovalent bonds mediate interactions between macromolecules. Molecule A randomly encounters other molecules (B, C, and D). The surfaces of molecules A and B, and A and C, are a poor match and are capable of forming only a few weak bonds; thermal motion rapidly breaks them apart. The surfaces of molecules A and D match well and therefore can form enough weak bonds to withstand thermal jolting; they therefore stay bound to each other.
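The statement that half of the binding sites are occupied when the ligand concentration equals 1/K can be checked numerically. The sketch below is illustrative only: it assumes simple one-site binding, takes K as the ratio of the association and dissociation rate constants (as in Figure 3-43A), and uses hypothetical rate constants and function names that do not come from the text.

```python
def equilibrium_constant(k_on, k_off):
    """Association constant K (liters/mole) as the ratio of the two rate constants."""
    return k_on / k_off


def fraction_occupied(free_ligand_molar, k_assoc):
    """Fraction of binding sites holding ligand at equilibrium.

    Follows from K = [AB] / ([A][B]): occupied / total = K[B] / (1 + K[B]),
    where [B] is the free ligand concentration.
    """
    x = k_assoc * free_ligand_molar
    return x / (1.0 + x)


if __name__ == "__main__":
    # Hypothetical rate constants, chosen only to give K = 1e8 liters/mole.
    K = equilibrium_constant(k_on=1e6, k_off=1e-2)
    for multiple in (0.01, 0.1, 1.0, 10.0, 100.0):
        ligand = multiple / K
        print(f"[ligand] = {ligand:.1e} M -> "
              f"{fraction_occupied(ligand, K):.3f} of sites occupied")
```

At a ligand concentration of exactly 1/K the function returns one half, and a tenfold change in either direction moves the occupancy to roughly 9% or 91%.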

Enzymes Are Powerful and Highly Specific Catalysts

Many proteins can perform their function simply by binding to another molecule. An actin molecule, for example, need only associate with other actin molecules to form a filament. There are other proteins, however, for which ligand binding is only a necessary first step in their function. This is the case for the large and very important class of proteins called enzymes. As described in Chapter 2, enzymes are remarkable molecules that determine all the chemical transformations that make and break covalent bonds in cells. They bind to one or more ligands, called substrates, and convert them into one or more chemically modified products, doing this over and over again with amazing rapidity. Enzymes speed up reactions, often by a factor of a million or more, without themselves being changed; that is, they act as catalysts that permit cells to make or break covalent bonds in a controlled way. It is the catalysis of organized sets of chemical reactions by enzymes that creates and maintains the cell, making life possible.

We can group enzymes into functional classes that perform similar chemical reactions (Table 3-1). Each type of enzyme within such a class is highly specific, catalyzing only a single type of reaction. Thus, hexokinase adds a phosphate group to D-glucose but ignores its optical isomer L-glucose; the blood-clotting enzyme thrombin cuts one type of blood protein between a particular arginine and its adjacent glycine and nowhere else, and so on. As discussed in detail in Chapter 2, enzymes work in teams, with the product of one enzyme becoming the substrate for the next. The result is an elaborate network of metabolic pathways that provides the cell with energy and generates the many large and small molecules that the cell needs (see Figure 2-35).

Figure 3-43 Relating binding energies to the equilibrium constant for an association reaction. (A) The equilibrium between molecules A and B and the complex AB is maintained by a balance between the two opposing reactions shown in panels 1 and 2. Molecules A and B must collide if they are to react, and the association rate is therefore proportional to the product of their individual concentrations [A] x [B]. (Square brackets indicate concentration.) As shown in panel 3, the ratio of the rate constants for the association and the dissociation reactions is equal to the equilibrium constant (K) for the reaction. (B) The equilibrium constant in panel 3 is that for the reaction A + B ⇌ AB, and the larger its value, the stronger the binding between A and B. Note that for every 1.41 kcal/mole (5.91 kJ/mole) decrease in free energy the equilibrium constant increases by a factor of 10 at 37°C. The equilibrium constant here has units of liters/mole; for simple binding interactions it is also called the affinity constant or association constant, denoted Ka. The reciprocal of Ka is called the dissociation constant, Kd (in units of moles/liter).

Panel 1, dissociation (AB → A + B): the dissociation rate is the dissociation rate constant multiplied by the concentration of AB.

\[ \text{dissociation rate} = k_{\mathrm{off}}\,[AB] \]

Panel 2, association (A + B → AB): the association rate is the association rate constant multiplied by the concentration of A and the concentration of B.

\[ \text{association rate} = k_{\mathrm{on}}\,[A]\,[B] \]

Panel 3, at equilibrium: association rate = dissociation rate.

\[ k_{\mathrm{on}}\,[A]\,[B] = k_{\mathrm{off}}\,[AB], \qquad \frac{[AB]}{[A]\,[B]} = \frac{k_{\mathrm{on}}}{k_{\mathrm{off}}} = K = \text{equilibrium constant} \]

The relationship between free-energy differences and equilibrium constants (37°C)

equilibrium constant      free energy of AB minus     free energy of AB minus
[AB]/([A][B]) = K         free energy of A + B        free energy of A + B
(liters/mole)             (kcal/mole)                 (kJ/mole)
1                         0                           0
10                        -1.4                        -5.9
10^2                      -2.8                        -11.9
10^3                      -4.3                        -17.8
10^4                      -5.7                        -23.7
10^5                      -7.1                        -29.7
10^6                      -8.5                        -35.5
10^7                      -9.9                        -41.5
10^8                      -11.3                       -47.4
10^9                      -12.8                       -53.4
10^10                     [illegible]                 [illegible]
10^11                     -15.6                       [illegible]

Although joules and kilojoules (1000 joules) are standard units of energy, cell biologists usually refer to free energy values in terms of calories and kilocalories. One kilocalorie (kcal) is equal to 4.184 kilojoules (kJ). The relationship between the free-energy change, ΔG, and the equilibrium constant is

\[ \Delta G = -0.00458\,T\,\log K \]

where ΔG is in kilocalories and T is the absolute temperature in Kelvins (310 K = 37°C).
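The relation ΔG = -0.00458 T log K from Figure 3-43 and the molecule counts quoted in Figure 3-44 can both be reproduced with a few lines of arithmetic. The sketch below is a hedged illustration, not the authors' calculation: the function names are invented here, and it assumes that A and B are each present at exactly 10^-9 M, whereas the figure says only "about".

```python
import math


def delta_g_kcal(k_assoc, temp_kelvin=310.0):
    """Free energy of AB minus free energy of A + B, in kcal/mole."""
    return -0.00458 * temp_kelvin * math.log10(k_assoc)


def complexes_at_equilibrium(n_each, total_molar, k_assoc):
    """AB complexes formed from n_each molecules of A and of B, each at total_molar."""
    # With x the fraction bound, K = x*c / ((1 - x)*c)^2, so K*c*(1 - x)^2 = x.
    # This is the quadratic kc*x^2 - (2*kc + 1)*x + kc = 0; take the root below 1.
    kc = k_assoc * total_molar
    b = 2.0 * kc + 1.0
    x = (b - math.sqrt(b * b - 4.0 * kc * kc)) / (2.0 * kc)
    return n_each * x


if __name__ == "__main__":
    for K in (1e10, 1e8):
        bound = complexes_at_equilibrium(1000, 1e-9, K)
        print(f"K = {K:.0e}: about {bound:.0f} AB and {1000 - bound:.0f} free A, "
              f"delta G = {delta_g_kcal(K):.1f} kcal/mole")
```

Under these assumptions the calculation gives about 270 free molecules of A for the stronger interaction and about 916 for the weaker one, close to the 270 and 915 quoted in Figure 3-44; the difference in ΔG between the two cases is about 2.8 kcal/mole.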

Figure 3-44 Small changes in the number of weak bonds can have drastic effects on a binding interaction. This example illustrates the dramatic effect of the presence or absence of a few weak noncovalent bonds in a biological context.

Consider 1000 molecules of A and 1000 molecules of B in a eucaryotic cell. The concentration of both will be about $10^{-9}$ M. If the equilibrium constant (K) for A + B ⇌ AB is $10^{10}$, then one can calculate that at equilibrium there will be 270 A molecules, 270 B molecules, and 730 AB molecules.

If the equilibrium constant is a little weaker at $10^{8}$, which represents a loss of 2.8 kcal/mole of binding energy from the example above, or 2-3 fewer hydrogen bonds, then there will be 915 A molecules, 915 B molecules, and 85 AB molecules.

Substrate Binding Is the First Step in Enzyme Catalysis

For a protein that catalyzes a chemical reaction (an enzyme), the binding of each substrate molecule to the protein is an essential prelude. In the simplest case, if we denote the enzyme by E, the substrate by S, and the product by P, the basic reaction path is E + S → ES → EP → E + P. From this reaction path, we see that there is a limit to the amount of substrate that a single enzyme molecule can process in a given time. An increase in the concentration of substrate also increases the rate at which product is formed, up to a maximum value (Figure 3-45). At that point the enzyme molecule is saturated with substrate, and the rate of reaction (Vmax) depends only on how rapidly the enzyme can process the substrate molecule. This maximum rate divided by the enzyme concentration is called the turnover number. The turnover number is often about 1000 substrate molecules processed per second per enzyme molecule, although turnover numbers between 1 and 10,000 are known.

The other kinetic parameter frequently used to characterize an enzyme is its Km, the concentration of substrate that allows the reaction to proceed at one-half its maximum rate (0.5 Vmax) (see Figure 3-45). A low Km value means that the enzyme reaches its maximum catalytic rate at a low concentration of substrate and generally indicates that the enzyme binds to its substrate very tightly, whereas a high Km value corresponds to weak binding. The methods used to characterize enzymes in this way are explained in Panel 3-3 (pp. 162-163).

Table 3-1 Some Common Types of Enzymes

Hydrolases: general term for enzymes that catalyze a hydrolytic cleavage reaction; nucleases and proteases are more specific names for subclasses of these enzymes.
Nucleases: break down nucleic acids by hydrolyzing bonds between nucleotides.
Proteases: break down proteins by hydrolyzing bonds between amino acids.
Synthases: synthesize molecules in anabolic reactions by condensing two smaller molecules together.
Isomerases: catalyze the rearrangement of bonds within a single molecule.
Polymerases: catalyze polymerization reactions such as the synthesis of DNA and RNA.
Kinases: catalyze the addition of phosphate groups to molecules. Protein kinases are an important group of kinases that attach phosphate groups to proteins.
Phosphatases: catalyze the hydrolytic removal of a phosphate group from a molecule.
Oxido-Reductases: general name for enzymes that catalyze reactions in which one molecule is oxidized while the other is reduced. Enzymes of this type are often more specifically named either oxidases, reductases, or dehydrogenases.
ATPases: hydrolyze ATP. Many proteins with a wide range of roles have an energy-harnessing activity as part of their function, for example, motor proteins such as myosin and membrane transport proteins such as the sodium-potassium pump.

Enzyme names typically end in "-ase," with the exception of some enzymes, such as pepsin, trypsin, thrombin and lysozyme, that were discovered and named before the convention became generally accepted at the end of the nineteenth century. The common name of an enzyme usually indicates the substrate and the nature of the reaction catalyzed. For example, citrate synthase catalyzes the synthesis of citrate by a reaction between acetyl CoA and oxaloacetate.

Figure 3-45 Enzyme kinetics. The rate of an enzyme reaction (V) increases as the substrate concentration increases until a maximum value (Vmax) is reached. At this point all substrate-binding sites on the enzyme molecules are fully occupied, and the rate of reaction is limited by the rate of the catalytic process on the enzyme surface. For most enzymes, the concentration of substrate (Km) at which the reaction rate is half-maximal (black dot) is a measure of how tightly the substrate is bound, with a large value of Km corresponding to weak binding.
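The saturation behaviour described for Figure 3-45 can be made concrete with a small calculation. The sketch below assumes the hyperbolic Michaelis-Menten form, V = Vmax [S] / (Km + [S]), which the text attributes to Panel 3-3; the function names and all numerical values are hypothetical and serve only as an illustration.

```python
def reaction_rate(substrate_molar, v_max, k_m):
    """Reaction rate V at a given substrate concentration (Michaelis-Menten form)."""
    return v_max * substrate_molar / (k_m + substrate_molar)


def turnover_number(v_max, enzyme_molar):
    """Maximum rate divided by enzyme concentration (substrate molecules per second per enzyme)."""
    return v_max / enzyme_molar


if __name__ == "__main__":
    # Hypothetical enzyme: Vmax = 1e-6 M/s at 1e-9 M enzyme, Km = 1e-5 M.
    v_max, k_m, enzyme = 1e-6, 1e-5, 1e-9
    print(f"turnover number: {turnover_number(v_max, enzyme):.0f} per second")
    for multiple in (0.1, 1.0, 10.0, 100.0):
        s = multiple * k_m
        fraction = reaction_rate(s, v_max, k_m) / v_max
        print(f"[S] = {multiple:>5} x Km -> V = {fraction:.2f} x Vmax")
```

With these numbers the rate is exactly half of Vmax when [S] equals Km, and it approaches but never exceeds Vmax as [S] grows, which is the plateau the figure describes.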

Enzymes Speed Reactions by Selectively Stabilizing Transition States

Enzymes achieve extremely high rates of chemical reaction, rates that are far higher than for any synthetic catalysts. There are several reasons for this efficiency. First, the enzyme increases the local concentration of substrate molecules at the catalytic site and holds all the appropriate atoms in the correct orientation for the reaction that is to follow. More importantly, however, some of the binding energy contributes directly to the catalysis. Substrate molecules must pass through a series of intermediate states of altered geometry and electron distribution before they form the ultimate products of the reaction. The free energy required to attain the most unstable transition state is called the activation energy for the reaction, and it is the major determinant of the reaction rate. Enzymes have a much higher affinity for the transition state of the substrate than they have for the stable form. Because this tight binding greatly lowers the energies of the transition state, the enzyme greatly accelerates a particular reaction by lowering the activation energy that is required (Figure 3-46).

By intentionally producing antibodies that act like enzymes, we can demonstrate that stabilizing a transition state can greatly increase a reaction rate. Consider, for example, the hydrolysis of an amide bond, which is similar to the peptide bond that joins two adjacent amino acids in a protein. In an aqueous solution, an amide bond hydrolyzes very slowly by the mechanism shown in Figure 3-47A. In the central intermediate, or transition state, the carbonyl carbon is bonded to four atoms arranged at the corners of a tetrahedron. By generating monoclonal antibodies that bind tightly to a stable analog of this very unstable tetrahedral intermediate, one can obtain an antibody that functions like an enzyme (Figure 3-47B). Because this catalytic antibody binds to and stabilizes the tetrahedral intermediate, it increases the spontaneous rate of amide-bond hydrolysis more than 10,000-fold.
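To get a feel for how much transition-state stabilization a 10,000-fold acceleration implies, one can use the standard exponential dependence of rate on activation energy. This is an assumption added here rather than something derived in the text: the sketch takes the rate to be proportional to exp(-Ea/RT), with everything else unchanged, and uses R = 1.987 x 10^-3 kcal/(mole K) at 37°C. The function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mole K)


def rate_enhancement(barrier_drop_kcal, temp_kelvin=310.0):
    """Fold increase in rate when the activation energy falls by barrier_drop_kcal.

    Assumes rate is proportional to exp(-Ea / RT) and nothing else changes.
    """
    return math.exp(barrier_drop_kcal / (R_KCAL * temp_kelvin))


def barrier_drop_for(fold, temp_kelvin=310.0):
    """Drop in activation energy (kcal/mole) needed for a given fold acceleration."""
    return R_KCAL * temp_kelvin * math.log(fold)


if __name__ == "__main__":
    for fold in (1e4, 1e6, 1e9):
        print(f"{fold:.0e}-fold faster needs a barrier about "
              f"{barrier_drop_for(fold):.1f} kcal/mole lower")
```

Under this assumption, the 10,000-fold acceleration of the catalytic antibody corresponds to lowering the barrier by roughly 5.7 kcal/mole, an amount comparable to the energy of a few noncovalent bonds.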

Figure 3-46 Enzymatic acceleration of chemical reactions by decreasing the activation energy. Often both the uncatalyzed reaction (A) and the enzyme-catalyzed reaction (B) can go through several transition states. It is the transition state with the highest energy (S^T and ES^T) that determines the activation energy and limits the rate of the reaction. (S = substrate; P = product of the reaction; ES = enzyme-substrate complex; EP = enzyme-product complex.) Each panel plots energy against the progress of the reaction and marks the activation energy for the uncatalyzed or the catalyzed reaction.

Figure 3-47 Catalytic antibodies. The stabilization of a transition state by an antibody creates an enzyme. (A) The reaction path for the hydrolysis of an amide bond goes through a tetrahedral intermediate, the high-energy transition state for the reaction. (B) The molecule on the left was covalently linked to a protein and used as an antigen to generate an antibody that binds tightly to the region of the molecule shown in yellow. Because this antibody also bound tightly to the transition state in (A), it was found to function as an enzyme that efficiently catalyzed the hydrolysis of the amide bond in the molecule on the right.

Enzymes Can Use Simultaneous Acid and Base Catalysis

Figure 3-48 compares the spontaneous reaction rates and the corresponding enzyme-catalyzed rates for five enzymes. Rate accelerations range from $10^{9}$ to $10^{23}$. Clearly, enzymes are much better catalysts than catalytic antibodies.

Enzymes not only bind tightly to a transition state, they also contain precisely positioned atoms that alter the electron distributions in those atoms that participate directly in the making and breaking of covalent bonds. Peptide bonds, for example, can be hydrolyzed in the absence of an enzyme by exposing a polypeptide to either a strong acid or a strong base, as illustrated in Figure 3-49. Enzymes are unique, however, in being able to use acid and base catalysis simultaneously, since the rigid framework of the protein binds the acidic and basic residues and prevents them from combining with each other (as they would do in solution) (Figure 3-49D).

The fit between an enzyme and its substrate needs to be precise. A small change introduced by genetic engineering in the active site of an enzyme can have a profound effect. Replacing a glutamic acid with an aspartic acid in one enzyme, for example, shifts the position of the catalytic carboxylate ion by only 1 Å (about the radius of a hydrogen atom); yet this is enough to decrease the activity of the enzyme a thousandfold.

Lysozyme Illustrates How an Enzyme Works

To demonstrate how enzymes catalyze chemical reactions, we examine an enzyme that acts as a natural antibiotic in egg white, saliva, tears, and other secretions. Lysozyme catalyzes the cutting of polysaccharide chains in the cell walls of bacteria. Because the bacterial cell is under pressure from osmotic forces, cutting even a small number of polysaccharide chains causes the cell wall to rupture and the cell to burst. Lysozyme is a relatively small and stable protein that can be easily isolated in large quantities. For these reasons, it has been intensively studied, and it was the first enzyme to have its structure worked out in atomic detail by x-ray crystallography.

The reaction that lysozyme catalyzes is a hydrolysis: it adds a molecule of water to a single bond between two adjacent sugar groups in the polysaccharide chain, thereby causing the bond to break (see Figure 2-19). The reaction is energetically favorable because the free energy of the severed polysaccharide chain is lower than the free energy of the intact chain. However, the pure polysaccharide can remain for years in water without being hydrolyzed to any detectable degree. This is because there is an energy barrier to the reaction, as discussed in Chapter 2 (see Figure 2-46). A colliding water molecule can break a bond linking two sugars only if the polysaccharide molecule is distorted into a particular shape, the transition state, in which the atoms around the bond have an altered geometry and electron distribution. Because of this distortion, random collisions must supply a very large activation energy for the reaction to take place. In an aqueous solution at room temperature, the energy of collisions almost never exceeds the activation energy. Consequently, hydrolysis occurs extremely slowly, if at all.

This situation changes drastically when the polysaccharide binds to lysozyme. The active site of lysozyme, because its substrate is a polymer, is a long groove that holds six linked sugars at the same time. As soon as the polysaccharide binds to form an enzyme-substrate complex, the enzyme cuts the polysaccharide by adding a water molecule across one of its sugar-sugar bonds. The product chains are then quickly released, freeing the enzyme for further cycles of reaction (Figure 3-50).

The chemistry of the binding of lysozyme to its substrate is the same as that for antibody binding to its antigen: the formation of multiple noncovalent bonds. However, lysozyme holds its polysaccharide substrate in a particular way, so that it distorts one of the two sugars in the bond to be broken from its normal, most stable conformation. The bond to be broken is also held close to two amino acids with acidic side chains (a glutamic acid and an aspartic acid) within the active site.

Conditions are thereby created in the microenvironment of the lysozyme active site that greatly reduce the activation energy necessary for the hydrolysis to take place. Figure 3-51 shows three central steps in this enzymatically catalyzed reaction.

1. The enzyme stresses its bound substrate, so that the shape of one sugar more closely resembles the shape of high-energy transition states formed during the reaction.
2. The negatively charged aspartic acid reacts with the C1 carbon atom on the distorted sugar, and the glutamic acid donates its proton to the oxygen that links this sugar to its neighbor. This breaks the sugar-sugar bond and leaves the aspartic acid side chain covalently linked to the site of bond cleavage.
3. Aided by the negatively charged glutamic acid, a water molecule reacts with the C1 carbon atom, displacing the aspartic acid side chain and completing the process of hydrolysis.

The overall chemical reaction, from the initial binding of the polysaccharide on the surface of the enzyme through the final release of the severed chains, occurs many millions of times faster than it would in the absence of enzyme.

Other enzymes use similar mechanisms to lower activation energies and speed up the reactions they catalyze. In reactions involving two or more reactants, the active site also acts like a template, or mold, that brings the substrates together in the proper orientation for a reaction to occur between them (Figure 3-52A). As we saw for lysozyme, the active site of an enzyme contains precisely positioned atoms that speed up a reaction by using charged groups to alter the distribution of electrons in the substrates (Figure 3-52B). In addition, when a substrate binds to an enzyme, bonds in the substrate often bend, changing the substrate shape. These changes, along with mechanical forces, drive a substrate toward a particular transition state (Figure 3-52C). Finally, like lysozyme, many enzymes participate intimately in the reaction by briefly forming a covalent bond between the substrate and a side chain of the enzyme. Subsequent steps in the reaction restore the side chain to its original state, so that the enzyme remains unchanged after the reaction (see also Figure 2-22).

Figure 3-48 The rate accelerations caused by five different enzymes. The scale shows the half-time for reaction, from $10^{6}$ years to 1 sec. (Adapted from A. Radzicka and R. Wolfenden, Science 267:90-93, 1995. With permission from AAAS.)

Figure 3-49 Acid catalysis and base catalysis. (A) The start of the uncatalyzed reaction shown in Figure 3-47A, with blue indicating electron distribution in the water and carbonyl bonds. (B) An acid likes to donate a proton (H+) to other atoms. By pairing with the carbonyl oxygen, an acid causes electrons to move away from the carbonyl carbon, making this atom much more attractive to the electronegative oxygen of an attacking water molecule. (C) A base likes to take up H+. By pairing with a hydrogen of the attacking water molecule, a base causes electrons to move toward the water oxygen, making it a better attacking group for the carbonyl carbon. (D) By having appropriately positioned atoms on its surface, an enzyme can perform both acid catalysis and base catalysis at the same time.

Figure 3-50 The reaction catalyzed by lysozyme. (A) The enzyme lysozyme (E) catalyzes the cutting of a polysaccharide chain, which is its substrate (S). The enzyme first binds to the chain to form an enzyme-substrate complex (ES) and then catalyzes the cleavage of a specific covalent bond in the backbone of the polysaccharide, forming an enzyme-product complex (EP) that rapidly dissociates. Release of the severed chain (the products P) leaves the enzyme free to act on another substrate molecule. (B) A space-filling model of the lysozyme molecule bound to a short length of polysaccharide chain before cleavage. (B, courtesy of Richard J. Feldmann.)

Figure 3-51 Events at the active site of lysozyme. The top left and top right drawings show the free substrate and the free products, respectively, whereas the other three drawings show the sequential events at the enzyme active site. Note the change in the conformation of sugar D in the enzyme-substrate complex; this shape change stabilizes the oxocarbenium ion-like transition states required for formation and hydrolysis of the covalent intermediate shown in the middle panel. It is also possible that a carbonium ion intermediate forms in step 2, as the covalent intermediate shown in the middle panel has been detected only with a synthetic substrate. (See D.J. Vocadlo et al., Nature 412:835-838, 2001.)
SUBSTRATE: This substrate is an oligosaccharide of six sugars, labeled A-F. Only sugars D and E are shown in detail.
PRODUCTS: The final products are an oligosaccharide of four sugars (left) and a disaccharide (right), produced by hydrolysis.
First drawing: In the enzyme-substrate complex (ES), the enzyme forces sugar D into a strained conformation, with Glu35 positioned to serve as an acid that attacks the adjacent sugar-sugar bond by donating a proton (H+) to sugar E, and Asp52 poised to attack the C1 carbon atom.
Middle drawing: The Asp52 has formed a covalent bond between the enzyme and the C1 carbon atom of sugar D. The Glu35 then polarizes a water molecule (red), so that its oxygen can readily attack the C1 carbon atom and displace Asp52.
Last drawing: The reaction of the water molecule (red) completes the hydrolysis and returns the enzyme to its initial state, forming the final enzyme-product complex (EP).

Figure 3-52 Some general strategies of enzyme catalysis. (A) Holding substrates together in a precise alignment: the enzyme binds to two substrate molecules and orients them precisely to encourage a reaction to occur between them. (B) Charge stabilization of reaction intermediates: binding of substrate to enzyme rearranges electrons in the substrate, creating partial negative and positive charges that favor a reaction. (C) Applying forces that distort bonds in the substrate to increase the rate of a particular reaction: the enzyme strains the bound substrate molecule, forcing it toward a transition state to favor a reaction.

WHY ANALYZE THE KINETICS OF ENZYMES?

Enzymes are the most selective and powerful catalysts known. An understanding of their detailed mechanisms provides a critical tool for the discovery of new drugs, for the large-scale industrial synthesis of useful chemicals, and for appreciating the chemistry of cells and organisms. A detailed study of the rates of the chemical reactions that are catalyzed by a purified enzyme (more specifically, how these rates change with changes in conditions such as the concentrations of substrates, products, inhibitors, and regulatory ligands) allows biochemists to figure out exactly how each enzyme works. For example, this is the way that the ATP-producing reactions of glycolysis, shown previously in Figure 2-72, were deciphered, allowing us to appreciate the rationale for this critical enzymatic pathway.

In this Panel, we introduce the important field of enzyme kinetics, which has been indispensable for deriving much of the detailed knowledge that we now have about cell chemistry.

STEADY-STATE ENZYME KINETICS

Many enzymes have only one substrate, which they bind and then process to produce products according to the scheme outlined in Figure 3-50A. In this case, the reaction is written as

\[ E + S \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} ES \xrightarrow{k_{\mathrm{cat}}} E + P \]

Here we have assumed that the reverse reaction, in which E + P recombine to form EP and then ES, occurs so rarely that we can ignore it. In this case, EP need not be represented, and we can express the rate of the reaction, known as its velocity, V, as

\[ V = k_{\mathrm{cat}}\,[ES] \]

where [ES] is the concentration of the enzyme-substrate complex, and $k_{\mathrm{cat}}$ is the turnover number, a rate constant that has a value equal to the number of substrate molecules processed per enzyme molecule each second.

But how does the value of [ES] relate to the concentrations that we know directly, which are the total concentration of the enzyme, [E_0], and the concentration of the substrate, [S]? When enzyme and substrate are first mixed, the concentration [ES] will rise rapidly from zero to a so-called steady-state level, as illustrated below.

[Graph of concentration against time, with two phases marked: pre-steady state, ES forming; steady state, ES almost constant.]

At this steady state, [ES] is nearly constant, so that the rate of ES breakdown equals the rate of ES formation:

\[ k_{-1}\,[ES] + k_{\mathrm{cat}}\,[ES] = k_{1}\,[E]\,[S] \]

or, since the concentration of the free enzyme, [E], is equal to [E_0] - [ES],

\[ [ES] = \left(\frac{k_{1}}{k_{-1} + k_{\mathrm{cat}}}\right)[E]\,[S] = \left(\frac{k_{1}}{k_{-1} + k_{\mathrm{cat}}}\right)\bigl([E_0] - [ES]\bigr)\,[S] \]

Rearranging, and defining the constant $K_m$ as

\[ K_m = \frac{k_{-1} + k_{\mathrm{cat}}}{k_{1}} \]

we get

\[ [ES] = \frac{[E_0]\,[S]}{K_m + [S]} \]

or, remembering that $V = k_{\mathrm{cat}}\,[ES]$, we obtain the famous Michaelis-Menten equation

\[ V = \frac{k_{\mathrm{cat}}\,[E_0]\,[S]}{K_m + [S]} \]

As [S] is increased to higher and higher levels, essentially all of the enzyme will be bound to substrate at steady state; at this point, a maximum rate of reaction, $V_{\max}$, will be reached where $V = V_{\max} = k_{\mathrm{cat}}\,[E_0]$. Thus, it is convenient to rewrite the Michaelis-Menten equation as

\[ V = \frac{V_{\max}\,[S]}{K_m + [S]} \]
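The steady-state assumption used in the kinetics panel can be tested numerically. The sketch below integrates the scheme E + S ⇌ ES → E + P with a simple forward Euler step and compares the resulting [ES] with the steady-state expression [E_0][S] / (Km + [S]). It is an illustration under stated assumptions: the rate constants, concentrations, time step, and function name are all hypothetical, and a production calculation would normally use a proper ODE solver.

```python
def simulate_es(k1, k_minus1, k_cat, e0, s0, dt, n_steps):
    """Forward-Euler integration of E + S <-> ES -> E + P.

    Returns the final concentrations ([S], [ES], [P]) in moles/liter.
    """
    s, es, p = s0, 0.0, 0.0
    for _ in range(n_steps):
        e = e0 - es                            # free enzyme
        formation = k1 * e * s                 # rate of ES formation
        breakdown = (k_minus1 + k_cat) * es    # rate of ES breakdown (both routes)
        d_s = (k_minus1 * es - formation) * dt
        d_es = (formation - breakdown) * dt
        d_p = k_cat * es * dt
        s, es, p = s + d_s, es + d_es, p + d_p
    return s, es, p


if __name__ == "__main__":
    # Hypothetical values: Km = (100 + 100) / 1e6 = 2e-4 M.
    k1, k_minus1, k_cat = 1e6, 100.0, 100.0
    e0, s0 = 1e-6, 1e-3
    s, es, p = simulate_es(k1, k_minus1, k_cat, e0, s0, dt=1e-6, n_steps=10_000)
    k_m = (k_minus1 + k_cat) / k1
    print(f"simulated [ES]    = {es:.3e} M")
    print(f"steady-state [ES] = {e0 * s / (k_m + s):.3e} M")
    print(f"velocity V        = {k_cat * es:.3e} M/s")
```

With these values [ES] rises to its plateau within a few milliseconds and then tracks the steady-state formula closely while substrate is slowly consumed, which is the behaviour the panel's graph describes.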

Tightly Bound Small Molecules Add Extra Functions to Proteins

Although we have emphasized the versatility of proteins as chains of amino acids that perform different functions, there are many instances in which the amino acids by themselves are not enough. Just as humans employ tools to enhance and extend the capabilities of their hands, proteins often use small nonprotein molecules to perform functions that would be difficult or impossible to do with amino acids alone. Thus, the signal receptor protein rhodopsin, which is made by the photoreceptor cells in the retina, detects light by means of a small molecule, retinal, embedded in the protein (Figure 3-53A). Retinal changes its shape when it absorbs a photon of light, and this change causes the protein to trigger a cascade of enzymatic reactions that eventually lead to an electrical signal being carried to the brain.

Another example of a protein that contains a nonprotein portion is hemoglobin (see Figure 3-22). A molecule of hemoglobin carries four heme groups, ring-shaped molecules each with a single central iron atom (Figure 3-53B). Heme gives hemoglobin (and blood) its red color. By binding reversibly to oxygen gas through its iron atom, heme enables hemoglobin to pick up oxygen in the lungs and release it in the tissues.

Sometimes these small molecules are attached covalently and permanently to their protein, thereby becoming an integral part of the protein molecule itself. We shall see in Chapter 10 that proteins are often anchored to cell membranes through covalently attached lipid molecules. And membrane proteins exposed









Figure 3-53 Retinal and heme. (A) The structure of retinal, the light-sensitive molecule attached to rhodopsin in the eye. (B) The structure of a heme group. The carbon-containing heme ring is red and the iron atom at its center is orange. A heme group is tightly bound to each of the four polypeptide chains in hemoglobin, the oxygen-carrying protein whose structure is shown in Figure 3-22.


Table 3-2 Many Vitamins Provide Critical Coenzymes for Human Cells

Vitamin                   Coenzyme                  Reactions requiring the coenzyme
Thiamine (vitamin B1)     thiamine pyrophosphate    activation and transfer of aldehydes
Riboflavin (vitamin B2)   FADH                      oxidation-reduction
Niacin                    NADH, NADPH               oxidation-reduction
Pantothenic acid          coenzyme A                acyl group activation and transfer
Pyridoxine                pyridoxal phosphate       amino acid activation; also glycogen phosphorylase
Biotin                    biotin                    CO2 activation and transfer
Lipoic acid               lipoamide                 acyl group activation; oxidation-reduction
Folic acid                tetrahydrofolate          activation and transfer of single carbon groups
Vitamin B12               cobalamin coenzymes       isomerization and methyl group transfers

on the surface of the cell, as well as proteins secreted outside the cell, are often modified by the covalent addition of sugars and oligosaccharides.

Enzymes frequently have a small molecule or metal atom tightly associated with their active site that assists with their catalytic function. Carboxypeptidase, for example, an enzyme that cuts polypeptide chains, carries a tightly bound zinc ion in its active site. During the cleavage of a peptide bond by carboxypeptidase, the zinc ion forms a transient bond with one of the substrate atoms, thereby assisting the hydrolysis reaction. In other enzymes, a small organic molecule serves a similar purpose. Such organic molecules are often referred to as coenzymes. An example is biotin, which is found in enzymes that transfer a carboxylate group (-COO-) from one molecule to another (see Figure 2-63). Biotin participates in these reactions by forming a transient covalent bond to the -COO- group to be transferred, being better suited to this function than any of the amino acids used to make proteins. Because it cannot be synthesized by humans, and must therefore be supplied in small quantities in our diet, biotin is a vitamin. Many other coenzymes are produced from vitamins (Table 3-2). Vitamins are also needed to make other types of small molecules that are essential components of our proteins; vitamin A, for example, is needed in the diet to make retinal, the light-sensitive part of rhodopsin.

Molecular Tunnels Channel Substrates in Enzymes with Multiple Catalytic Sites

Some of the chemical reactions catalyzed by enzymes in cells produce intermediates that are either very unstable or that could readily diffuse out of the cell through the plasma membrane if released into the cytosol. To preserve these intermediates, enzymes have evolved molecular tunnels that connect two or more active sites, allowing the intermediate to be rapidly processed to a final product without ever leaving the enzyme.

Consider, for example, the enzyme carbamoyl phosphate synthetase, which uses ammonia derived from glutamine plus two molecules of ATP to convert bicarbonate (HCO3-) to carbamoyl phosphate, an important intermediate in several metabolic pathways (Figure 3-54). This enzyme contains three widely separated active sites that are connected to each other by a tunnel. The reaction starts at active site 2, located in the middle of the tunnel, where ATP is used to phosphorylate (add a phosphate group to) bicarbonate, forming carboxyphosphate. This event triggers the hydrolysis of glutamine to glutamic acid at active site 1, releasing ammonia into the tunnel. The ammonia immediately diffuses through the first half of the tunnel to active site 2, where it reacts with the carboxyphosphate to form carbamate. This unstable intermediate then diffuses through the second half of the tunnel to active site 3, where it is phosphorylated by ATP to the final product, carbamoyl phosphate.



Figure 3-54 The tunneling of reaction intermediates in the enzyme carbamoyl phosphate synthetase. (A) Diagram of the structure of the enzyme, in which a red ribbon has been used to outline the tunnel on the inside of the protein connecting its three active sites. The small and large subunits of this dimeric enzyme are color coded yellow and blue, respectively. (B) The path of the reaction. As indicated, active site 1 produces ammonia, which diffuses through the tunnel to active site 2, where it combines with carboxyphosphate to form carbamate. This highly unstable intermediate then diffuses through the tunnel to active site 3, where it is phosphorylated by ATP to produce the final product, carbamoyl phosphate. (A, modified from F.M. Raushel, J.B. Thoden, and H.M. Holden, Acc. Chem. Res. 36:539-548, 2003. With permission from American Chemical Society.)








Several other well-characterized enzymes contain similar molecular tunnels. Ammonia, a readily diffusible intermediate that might otherwise be lost from the cell, is the substrate most frequently channeled in the examples thus far known.

Multienzyme Complexes Help to Increase the Rate of Cell Metabolism

The efficiency of enzymes in accelerating chemical reactions is crucial to the maintenance of life. Cells, in effect, must race against the unavoidable processes of decay, which, if left unattended, cause macromolecules to run downhill toward greater and greater disorder. If the rates of desirable reactions were not greater than the rates of competing side reactions, a cell would soon die. We can get some idea of the rate at which cell metabolism proceeds by measuring the rate of ATP utilization. A typical mammalian cell "turns over" (i.e., hydrolyzes and restores by phosphorylation) its entire ATP pool once every 1 or 2 minutes. For each cell, this turnover represents the utilization of roughly 10^7 molecules of ATP per second (or, for the human body, about 1 gram of ATP every minute).
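The two turnover figures quoted above together imply a size for the ATP pool of a single cell. The following is a minimal sketch of that arithmetic; the function name is illustrative, and the only inputs are the rate and turnover times given in the text.

```python
AVOGADRO = 6.022e23  # molecules per mole

def atp_pool_size(molecules_per_second, turnover_seconds):
    """Pool size implied if the whole pool is consumed once per turnover time."""
    return molecules_per_second * turnover_seconds

rate = 1e7  # ATP molecules used per second per cell (figure quoted in the text)
for minutes in (1, 2):
    pool = atp_pool_size(rate, minutes * 60)
    print(f"{minutes} min turnover: {pool:.1e} molecules ({pool / AVOGADRO:.1e} mol)")
```

On these numbers the pool of one cell is on the order of 10^9 molecules, a tiny amount that must be regenerated continuously.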



The rates of reactions in cells are rapid because enzyme catalysis is so effective. Many important enzymes have become so efficient that there is no possibility of further useful improvement. The factor that limits the reaction rate is no longer the enzyme's intrinsic speed of action; rather, it is the frequency with which the enzyme collides with its substrate. Such a reaction is said to be diffusion-limited (see Panel 3-3, pp. 162-163).

If an enzyme-catalyzed reaction is diffusion-limited, its rate depends on the concentration of both the enzyme and its substrate. If a sequence of reactions is to occur extremely rapidly, each metabolic intermediate and enzyme involved must be present in high concentration. However, given the enormous number of different reactions performed by a cell, there are limits to the concentrations that can be achieved. In fact, most metabolites are present in micromolar (10^-6 M) concentrations, and most enzyme concentrations are much lower. How is it possible, therefore, to maintain very fast metabolic rates?

The answer lies in the spatial organization of cell components. The cell can increase reaction rates without raising substrate concentrations by bringing the various enzymes involved in a reaction sequence together to form a large protein assembly known as a multienzyme complex (Figure 3-55). Because this allows the product of enzyme A to be passed directly to enzyme B, and so on, diffusion rates need not be limiting, even when the concentrations of the substrates in the cell as a whole are very low.

It is perhaps not surprising, therefore, that such enzyme complexes are very common, and they are involved in nearly all aspects of metabolism, including the central genetic processes of DNA, RNA, and protein synthesis. In fact, few enzymes in eucaryotic cells diffuse freely in solution; instead, most seem to have evolved binding sites that concentrate them with other proteins of related function in particular regions of the cell, thereby increasing the rate and efficiency of the reactions that they catalyze.

Eucaryotic cells have yet another way of increasing the rate of metabolic reactions: using their intracellular membrane systems. These membranes can segregate particular substrates and the enzymes that act on them into the same membrane-enclosed compartment, such as the endoplasmic reticulum or the cell nucleus. If, for example, a compartment occupies a total of 10% of the volume of the cell, the concentration of reactants in that compartment may be increased by 10 times compared with a cell with the same number of enzymes and substrate molecules, but no compartmentalization. Reactions limited by the speed of diffusion can thereby be speeded up by a factor of 10.
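The factor of 10 quoted above can be checked with a small calculation. The sketch below assumes a simple bimolecular rate law, in which the rate per unit volume is proportional to the product of the enzyme and substrate concentrations; the rate constant and molecule counts are arbitrary illustrative values, not data from the text.

```python
def total_rate(k, n_enzyme, n_substrate, volume):
    """Total reaction rate for a diffusion-limited (bimolecular) reaction.

    Rate per unit volume is k * [E] * [S]; multiplying by the volume
    gives the total number of reaction events per unit time.
    """
    conc_e = n_enzyme / volume
    conc_s = n_substrate / volume
    return k * conc_e * conc_s * volume

def speedup(volume_fraction):
    """Rate when the same molecules are confined to a fraction of the cell
    volume, relative to the rate when they are spread through the whole cell."""
    k, n_e, n_s = 1.0, 1000.0, 1000.0  # arbitrary illustrative values
    whole_cell = total_rate(k, n_e, n_s, 1.0)
    confined = total_rate(k, n_e, n_s, volume_fraction)
    return confined / whole_cell

print(speedup(0.10))  # close to 10 for a compartment holding 10% of the volume
```

Both concentrations rise 10-fold in the compartment, but the reaction volume shrinks 10-fold, so the total rate rises by a factor of about 10 rather than 100.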

The Cell Regulates the Catalytic Activities of Its Enzymes

A living cell contains thousands of enzymes, many of which operate at the same time and in the same small volume of the cytosol. By their catalytic action, these enzymes generate a complex web of metabolic pathways, each composed of chains of chemical reactions in which the product of one enzyme becomes the substrate of the next. In this maze of pathways, there are many branch points (nodes) where different enzymes compete for the same substrate. The system is so complex (see Figure 2-88) that elaborate controls are required to regulate when and how rapidly each reaction occurs.

Figure 3-55 The structure of pyruvate dehydrogenase. This enzyme complex catalyzes the conversion of pyruvate to acetyl CoA, as part of the pathway that oxidizes sugars to CO2 and H2O (see Figure 2-79). It is an example of a large multienzyme complex in which reaction intermediates are passed directly from one enzyme to another.

Figure labels: 8 trimers of lipoamide reductase-transacetylase + 12 molecules of dihydrolipoyl dehydrogenase + 24 molecules of pyruvate decarboxylase.



Regulation occurs at many levels. At one level, the cell controls how many molecules of each enzyme it makes by regulating the expression of the gene that encodes that enzyme (discussed in Chapter 7). The cell also controls enzymatic activities by confining sets of enzymes to particular subcellular compartments, enclosed by distinct membranes (discussed in Chapters 12 and 14). As will be discussed later in this chapter, enzymes are frequently covalently modified to control their activity. The rate of protein destruction by targeted proteolysis represents yet another important regulatory mechanism (see p. 395). But the most general process that adjusts reaction rates operates through a direct, reversible change in the activity of an enzyme in response to the specific small molecules that it encounters.

The most common type of control occurs when a molecule other than one of the substrates binds to an enzyme at a special regulatory site outside the active site, thereby altering the rate at which the enzyme converts its substrates to products. For example, in feedback inhibition a product produced late in a reaction pathway inhibits an enzyme that acts earlier in the pathway. Thus, whenever large quantities of the final product begin to accumulate, this product binds to the enzyme and slows down its catalytic action, thereby limiting the further entry of substrates into that reaction pathway (Figure 3-56). Where pathways branch or intersect, there are usually multiple points of control by different final products, each of which works to regulate its own synthesis (Figure 3-57). Feedback inhibition can work almost instantaneously, and it is rapidly reversed when the level of the product falls.
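The stabilizing effect of feedback inhibition can be illustrated with a toy simulation. The sketch below is not a model of any real pathway: the two-step scheme (source to Y to Z), the simple inhibition term 1 + Z/K_i, and every parameter value are assumptions chosen only for illustration.

```python
def simulate(feedback, k_use, v_max=10.0, k_i=1.0, k2=1.0, dt=0.01, t_end=200.0):
    """Euler integration of a toy pathway: source -> Y -> Z -> consumed.

    With feedback=True, the end product Z inhibits the first step.
    Returns the level of Z at t_end, which is close to its steady state.
    """
    y = z = 0.0
    for _ in range(int(t_end / dt)):
        inhibition = 1.0 + z / k_i if feedback else 1.0
        v1 = v_max / inhibition      # rate of the first, regulated enzyme
        dy = v1 - k2 * y             # intermediate Y is made, then converted
        dz = k2 * y - k_use * z      # product Z is made, then used by the cell
        y += dy * dt
        z += dz * dt
    return z

for k_use in (1.0, 0.1):             # high demand for Z, then low demand
    print(k_use, round(simulate(False, k_use), 2), round(simulate(True, k_use), 2))
```

In this sketch, a 10-fold drop in demand lets Z pile up 10-fold when there is no feedback, whereas with feedback the first enzyme is throttled and Z rises only a few fold.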

Figure 3-56 Feedback inhibition of a single biosynthetic pathway. The end product Z inhibits the first enzyme that is unique to its synthesis and thereby controls its own level in the cell. This is an example of negative regulation.



Figure 3-57 Multiple feedback inhibition. In this example, which shows the biosynthetic pathways for four different amino acids in bacteria, the red arrows indicate positions at which products feed back to inhibit enzymes. Each amino acid controls the first enzyme specific to its own synthesis, thereby controlling its own levels and avoiding a wasteful, or even dangerous, buildup of intermediates. The products can also separately inhibit the initial set of reactions common to all the syntheses; in this case, three different enzymes catalyze the initial reaction, each inhibited by a different product.


Feedback inhibition is negative regulation: it prevents an enzyme from acting. Enzymes can also be subject to positive regulation, in which a regulatory molecule stimulates the enzyme's activity rather than shutting the enzyme down. Positive regulation occurs when a product in one branch of the metabolic network stimulates the activity of an enzyme in another pathway. As one example, the accumulation of ADP activates several enzymes involved in the oxidation of sugar molecules, thereby stimulating the cell to convert more ADP to ATP.

Allosteric Enzymes Have Two or More Binding Sites That Interact

A striking feature of both positive and negative feedback regulation is that the regulatory molecule often has a shape totally different from the shape of the substrate of the enzyme. This is why the effect on a protein is termed allostery (from the Greek words allos, meaning "other," and stereos, meaning "solid" or "three-dimensional"). As biologists learned more about feedback regulation, they recognized that the enzymes involved must have at least two different binding sites on their surface: an active site that recognizes the substrates, and a regulatory site that recognizes a regulatory molecule. These two sites must somehow communicate so that the catalytic events at the active site can be influenced by the binding of the regulatory molecule at its separate site on the protein's surface.

The interaction between separated sites on a protein molecule is now known to depend on a conformational change in the protein: binding at one of the sites causes a shift from one folded shape to a slightly different folded shape. During feedback inhibition, for example, the binding of an inhibitor at one site on the protein causes the protein to shift to a conformation that incapacitates its active site, located elsewhere in the protein.

It is thought that most protein molecules are allosteric. They can adopt two or more slightly different conformations, and a shift from one to another caused by the binding of a ligand can alter their activity. This is true not only for enzymes but also for many other proteins, including receptors, structural proteins, and motor proteins. In all instances of allosteric regulation, each conformation of the protein has somewhat different surface contours, and the protein's binding sites for ligands are altered when the protein changes shape. Moreover, as we discuss next, each ligand will stabilize the conformation that it binds to most strongly, and thus, at high enough concentrations, will tend to "switch" the protein toward the conformation that the ligand prefers.

Two Ligands Whose Binding Sites Are Coupled Must Reciprocally Affect Each Other's Binding

The effects of ligand binding on a protein follow from a fundamental chemical principle known as linkage. Suppose, for example, that a protein that binds glucose also binds another molecule, X, at a distant site on the protein's surface. If the binding site for X changes shape as part of the conformational change induced by glucose binding, the binding sites for X and for glucose are said to be coupled. Whenever two ligands prefer to bind to the same conformation of an allosteric protein, it follows from basic thermodynamic principles that each ligand must increase the affinity of the protein for the other. Thus, if the shift of the protein in Figure 3-58 to the closed conformation that binds glucose best also causes the binding site for X to fit X better, then the protein will bind glucose more tightly when X is present than when X is absent.

Conversely, linkage operates in a negative way if two ligands prefer to bind to different conformations of the same protein. In this case, the binding of the first ligand discourages the binding of the second ligand. Thus, if a shape change caused by glucose binding decreases the affinity of a protein for molecule X, the binding of X must also decrease the protein's affinity for glucose (Figure 3-59). The linkage relationship is quantitatively reciprocal, so that, for example, if glucose has a very large effect on the binding of X, X has a very large effect on the binding of glucose.
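The quantitative reciprocity described above follows from a thermodynamic cycle. The notation in this sketch is introduced here for illustration and is not taken from the text: P is the protein, G is glucose, X is the second ligand, and each K is an association constant, with a superscript naming the ligand that is already bound.

```latex
\begin{align*}
K_G &= \frac{[PG]}{[P][G]}, & K_X &= \frac{[PX]}{[P][X]},\\
K_G^{X} &= \frac{[PGX]}{[PX][G]}, & K_X^{G} &= \frac{[PGX]}{[PG][X]}.
\end{align*}
% Both routes from P to PGX must give the same overall equilibrium constant:
\[
K_G \, K_X^{G} \;=\; \frac{[PGX]}{[P][G][X]} \;=\; K_X \, K_G^{X}
\quad\Longrightarrow\quad
\frac{K_G^{X}}{K_G} \;=\; \frac{K_X^{G}}{K_X}.
\]
```

The common ratio is greater than 1 for positive linkage and less than 1 for negative linkage. In either case, the factor by which X changes the affinity for glucose equals the factor by which glucose changes the affinity for X.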





Figure 3-58 Positive regulation caused by conformational coupling between two distant binding sites. In this example, both glucose and molecule X bind best to the closed conformation of a protein with two domains. Because both glucose and molecule X drive the protein toward its closed conformation, each ligand helps the other to bind. Glucose and molecule X are therefore said to bind cooperatively to the protein.

Figure labels: molecule X; positive regulation; 10% active; 100% active.

The relationships shown in Figures 3-58 and 3-59 apply to all proteins, and they underlie all of cell biology. They seem so obvious in retrospect that we now take them for granted. But the discovery of linkage in studies of a few enzymes in the 1950s, followed by an extensive analysis of allosteric mechanisms in proteins in the early 1960s, had a revolutionary effect on our understanding of biology. Since molecule X in these examples binds at a site on the enzyme that is distinct from the site where catalysis occurs, it need have no chemical relationship to glucose or to any other ligand that binds at the active site. Moreover, as we have just seen, for enzymes that are regulated in this way, molecule X can either turn the enzyme on (positive regulation) or turn it off (negative regulation). By such a mechanism, allosteric proteins serve as general switches that, in principle, allow one molecule in a cell to affect the fate of any other.

Symmetric Protein Assemblies Produce Cooperative Allosteric Transitions

A single-subunit enzyme that is regulated by negative feedback can at most decrease from 90% to about 10% activity in response to a 100-fold increase in the concentration of an inhibitory ligand that it binds (Figure 3-60, red line). Responses of this type are apparently not sharp enough for optimal cell regulation, and most enzymes that are turned on or off by ligand binding consist of symmetric assemblies of identical subunits. With this arrangement, the binding of a molecule of ligand to a single site on one subunit can promote an allosteric change in the entire assembly that helps the neighboring subunits bind the same ligand. As a result, a cooperative allosteric transition occurs (Figure 3-60, blue line), allowing a relatively small change in ligand concentration in the cell to switch the whole assembly from an almost fully active to an almost fully inactive conformation (or vice versa).


Figure 3-59 Negative regulation caused by conformational coupling between two distant binding sites. The scheme here resembles that in the previous figure, but here molecule X prefers the open conformation, while glucose prefers the closed conformation. Because glucose and molecule X drive the protein toward opposite conformations (closed and open, respectively), the presence of either ligand interferes with the binding of the other.

Figure labels: molecule X; negative regulation; 10% active.





The principles involved in a cooperative "all-or-none" transition are the same for all proteins, whether or not they are enzymes. But they are perhaps easiest to visualize for an enzyme that forms a symmetric dimer. In the example shown in Figure 3-61, the first molecule of an inhibitory ligand binds with great difficulty since its binding disrupts an energetically favorable interaction between the two identical monomers in the dimer. A second molecule of inhibitory ligand now binds more easily, however, because its binding restores the energetically favorable monomer-monomer contacts of a symmetric dimer (this also completely inactivates the enzyme).

As an alternative to this induced fit model for a cooperative allosteric transition, we can view such a symmetrical enzyme as having only two possible conformations, corresponding to the "enzyme on" and "enzyme off" structures in Figure 3-61. In this view, ligand binding perturbs an all-or-none equilibrium between these two states, thereby changing the proportion of active molecules. Both models represent true and useful concepts; it is the second model that we shall describe next.

Figure 3-60 Enzyme activity versus the concentration of inhibitory ligand for single-subunit and multisubunit allosteric enzymes. For an enzyme with a single subunit (red line), a drop from 90% enzyme activity to 10% activity (indicated by the two dots on the curve) requires a 100-fold increase in the concentration of inhibitor. The enzyme activity is calculated from the simple equilibrium relationship K = [IP]/[I][P], where P is active protein, I is inhibitor, and IP is the inactive protein bound to inhibitor. An identical curve applies to any simple binding interaction between two molecules, A and B. In contrast, a multisubunit allosteric enzyme can respond in a switchlike manner to a change in ligand concentration: the steep response is caused by a cooperative binding of the ligand molecules, as explained in Figure 3-61. Here, the green line represents the idealized result expected for the cooperative binding of two inhibitory ligand molecules to an allosteric enzyme with two subunits, and the blue line shows the idealized response of an enzyme with four subunits. As indicated by the two dots on each of these curves, the more complex enzymes drop from 90% to 10% activity over a much narrower range of inhibitor concentration than does the enzyme composed of a single subunit.
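The inhibitor ranges described for Figure 3-60 can be estimated numerically. The sketch below uses the simple equilibrium from the caption for one subunit, and it assumes, as an idealization that the source does not spell out, that n inhibitor molecules bind in an all-or-none fashion to an enzyme with n subunits (a Hill-type expression). The function names and the value of the association constant are illustrative.

```python
def activity(inhibitor, k_assoc=1.0, n=1):
    """Fraction of enzyme still active when n inhibitor molecules bind all-or-none.

    For n = 1 this is the simple equilibrium K = [IP]/([I][P]),
    which gives an active fraction of 1 / (1 + K*[I]).
    """
    return 1.0 / (1.0 + (k_assoc * inhibitor) ** n)

def inhibitor_for_activity(fraction, k_assoc=1.0, n=1):
    """Inhibitor concentration at which the active fraction equals `fraction`."""
    return ((1.0 - fraction) / fraction) ** (1.0 / n) / k_assoc

def fold_change_90_to_10(n):
    """Fold increase in inhibitor needed to go from 90% to 10% activity."""
    return inhibitor_for_activity(0.10, n=n) / inhibitor_for_activity(0.90, n=n)

for n in (1, 2, 4):
    print(n, round(fold_change_90_to_10(n), 1))
```

Under these assumptions a single-subunit enzyme needs an 81-fold increase in inhibitor (the roughly 100-fold range cited in the caption), whereas the idealized two-subunit and four-subunit enzymes need only about 9-fold and 3-fold increases, respectively.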

The Allosteric Transition in Aspartate Transcarbamoylase Is Understood in Atomic Detail

One enzyme used in the early studies of allosteric regulation was aspartate transcarbamoylase from E. coli. It catalyzes the important reaction that begins the synthesis of the pyrimidine ring of C, U, and T nucleotides: carbamoyl phosphate + aspartate → N-carbamoylaspartate. One of the final products of this pathway, cytosine triphosphate (CTP), binds to the enzyme to turn it off whenever CTP is plentiful. Aspartate transcarbamoylase is a large complex of six regulatory and six catalytic subunits. The catalytic subunits form two trimers, each arranged in the shape of an equilateral triangle; the two trimers face each other and are held

Figure 3-61 A cooperative allosteric transition in an enzyme composed of two identical subunits. This diagram illustrates how the conformation of one subunit can influence that of its neighbor. The binding of a single molecule of an inhibitory ligand (yellow) to one subunit of the enzyme occurs with difficulty because it changes the conformation of this subunit and thereby disrupts the symmetry of the enzyme. Once this conformational change has occurred, however, the energy gained by restoring the symmetric pairing interaction between the two subunits makes it especially easy for the second subunit to bind the inhibitory ligand and undergo the same conformational change. Because the binding of the first molecule of ligand increases the affinity with which the other subunit binds the same ligand, the response of the enzyme to changes in the concentration of the ligand is much steeper than the response of an enzyme with only one subunit (see Figure 3-60).











together by three regulatory dimers that form a bridge between them. The entire molecule is poised to undergo a concerted, all-or-none, allosteric transition between two conformations, designated as T (tense) and R (relaxed) states (Figure 3-62).

The binding of substrates (carbamoyl phosphate and aspartate) to the catalytic trimers drives aspartate transcarbamoylase into its catalytically active R state, from which the regulatory CTP molecules dissociate. By contrast, the binding of CTP to the regulatory dimers converts the enzyme to the inactive T state, from which the substrates dissociate. This tug-of-war between CTP and substrates is identical in principle to that described previously in Figure 3-59 for a simpler allosteric protein. But because the tug-of-war occurs in a symmetric molecule with multiple binding sites, the enzyme undergoes a cooperative allosteric transition that will turn it on suddenly as substrates accumulate (forming the R state) or shut it off rapidly when CTP accumulates (forming the T state).

A combination of biochemistry and x-ray crystallography has revealed many fascinating details of this allosteric transition. Each regulatory subunit has two domains, and the binding of CTP causes the two domains to move relative to each other, so that they function like a lever that rotates the two catalytic trimers and pulls them closer together into the T state (see Figure 3-62). When this occurs, hydrogen bonds form between opposing catalytic subunits. This helps widen the cleft that forms the active site within each catalytic subunit, thereby disrupting the binding sites for the substrates (Figure 3-63). Adding large amounts of substrate has the opposite effect, favoring the R state by binding in the cleft of each catalytic subunit and opposing the above conformational change. Conformations that are intermediate between R and T are unstable, so that the enzyme mostly clicks back and forth between its R and T forms, producing a mixture of these two species in proportions that depend on the relative concentrations of CTP and substrates.
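The way the R/T mixture depends on the relative concentrations of CTP and substrates can be sketched with a concerted two-state model of the Monod-Wyman-Changeux type. This is a simplified illustration, not the measured behavior of aspartate transcarbamoylase: it assumes that substrate binds only the R state and CTP only the T state, and the number of sites, the equilibrium constant `l0`, and the dissociation constants are hypothetical values.

```python
def fraction_r(substrate, ctp, n=6, l0=100.0, k_s=1.0, k_ctp=1.0):
    """Fraction of enzyme in the active R state in a concerted two-state model.

    Assumptions (illustrative only): substrate binds only R with dissociation
    constant k_s, CTP binds only T with dissociation constant k_ctp, there are
    n equivalent sites for each ligand, and l0 = [T]/[R] with no ligand bound.
    """
    r_weight = (1.0 + substrate / k_s) ** n
    t_weight = l0 * (1.0 + ctp / k_ctp) ** n
    return r_weight / (r_weight + t_weight)

for substrate, ctp in [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 3.0), (3.0, 10.0)]:
    print(substrate, ctp, round(fraction_r(substrate, ctp), 3))
```

In this sketch a modest rise in substrate switches most of the enzyme population to the R state, and adding CTP switches it back, which mirrors the tug-of-war described in the text.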

Figure 3-62 The transition between R and T states in the enzyme aspartate transcarbamoylase. The enzyme consists of a complex of six catalytic subunits and six regulatory subunits, and the structures of its inactive (T state) and active (R state) forms have been determined by x-ray crystallography. The enzyme is turned off by feedback inhibition when CTP concentrations rise. Each regulatory subunit can bind one molecule of CTP, which is one of the final products in the pathway. By means of this negative feedback regulation, the pathway is prevented from producing more CTP than the cell needs. (Based on K.L. Krause, K.W. Volz, and W.N. Lipscomb, Proc. Natl Acad. Sci. U.S.A. 82:1643-1647, 1985. With permission from National Academy of Sciences.)




Many Changes in Proteins Are Driven by Protein Phosphorylation

Proteins are regulated by more than the reversible binding of other molecules. A second method that eucaryotic cells use to regulate a protein's function is the covalent addition of a smaller molecule to one or more of its amino acid side chains. The most common such regulatory modification in higher eucaryotes is the addition of a phosphate group. We shall therefore use protein phosphorylation to illustrate some of the general principles involved in the control of protein function through the modification of amino acid side chains.

A phosphorylation event can affect the protein that is modified in two important ways. First, because each phosphate group carries two negative charges, the enzyme-catalyzed addition of a phosphate group to a protein can cause a major conformational change in the protein by, for example, attracting a cluster of positively charged amino acid side chains. This can, in turn, affect the binding of ligands elsewhere on the protein surface, dramatically changing the protein's activity. When a second enzyme removes the phosphate group, the protein returns to its original conformation and restores its initial activity.

Second, an attached phosphate group can form part of a structure that the binding sites of other proteins recognize. As previously discussed, certain protein domains, sometimes referred to as modules, appear very frequently as parts of larger proteins. One such module is the SH2 domain, described earlier, which binds to a short peptide sequence containing a phosphorylated tyrosine side chain (see Figure 3-39B). More than ten other common domains provide binding sites for attaching their protein to phosphorylated peptides in other protein molecules, each recognizing a phosphorylated amino acid side chain in a different protein context. As a result, protein phosphorylation and dephosphorylation very often drive the regulated assembly and disassembly of protein complexes (see Figure 15-22).

Reversible protein phosphorylation controls the activity, structure, and cellular localization of both enzymes and many other types of proteins in

Figure 3-63 Part of the on-off switch in the catalytic subunits of aspartate transcarbamoylase. Changes in the indicated hydrogen-bonding interactions are partly responsible for switching this enzyme's active site between active (yellow) and inactive conformations. Hydrogen bonds are indicated by thin red lines. The amino acids involved in the subunit-subunit interaction in the T state are shown in red, while those that form the active site of the enzyme in the R state are shown in blue. The large drawings show the catalytic site in the interior of the enzyme; the boxed sketches show the same subunits viewed from the enzyme's external surface. (Adapted from E.R. Kantrowitz and W.N. Lipscomb, Trends Biochem. Sci. 15:53-59, 1990. With permission from Elsevier.)



eucaryotic cells. In fact, this regulation is so extensive that more than one-third of the 10,000 or so proteins in a typical mammalian cell are thought to be phosphorylated at any given time, many with more than one phosphate. As might be expected, the addition and removal of phosphate groups from specific proteins often occur in response to signals that specify some change in a cell's state. For example, the complicated series of events that takes place as a eucaryotic cell divides is largely timed in this way (discussed in Chapter 17), and many of the signals mediating cell-cell interactions are relayed from the plasma membrane to the nucleus by a cascade of protein phosphorylation events (discussed in Chapter 15).

A Eucaryotic Cell Contains a Large Collection of Protein Kinases and Protein Phosphatases

Protein phosphorylation involves the enzyme-catalyzed transfer of the terminal phosphate group of an ATP molecule to the hydroxyl group on a serine, threonine, or tyrosine side chain of the protein (Figure 3-64). A protein kinase catalyzes this reaction, and the reaction is essentially unidirectional because of the large amount of free energy released when the phosphate-phosphate bond in ATP is broken to produce ADP (discussed in Chapter 2). A protein phosphatase catalyzes the reverse reaction of phosphate removal, or dephosphorylation. Cells contain hundreds of different protein kinases, each responsible for phosphorylating a different protein or set of proteins. There are also many different protein phosphatases; some are highly specific and remove phosphate groups from only one or a few proteins, whereas others act on a broad range of proteins and are targeted to specific substrates by regulatory subunits. The state of phosphorylation of a protein at any moment, and thus its activity, depends on the relative activities of the protein kinases and phosphatases that modify it.

The protein kinases that phosphorylate proteins in eucaryotic cells belong to a very large family of enzymes, which share a catalytic (kinase) sequence of about 290 amino acids. The various family members contain different amino acid sequences on either end of the kinase sequence (for example, see Figure 3-10), and often have short amino acid sequences inserted into loops within it (red arrowheads in Figure 3-65). Some of these additional amino acid sequences enable each kinase to recognize the specific set of proteins it phosphorylates, or to bind to structures that localize it in specific regions of the cell. Other parts of the protein regulate the activity of each kinase, so it can be turned on and off in response to different specific signals, as described below.
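The statement above that a protein's phosphorylation state depends on the relative activities of its kinases and phosphatases can be made concrete with a small calculation. The sketch assumes simple first-order kinetics for both the kinase and the phosphatase steps, which is a simplification introduced here; the rate constants are hypothetical.

```python
def phosphorylated_fraction(k_kinase, k_phosphatase):
    """Steady-state fraction of a protein that carries the phosphate group.

    Assumes first-order kinetics: unmodified protein is phosphorylated at rate
    k_kinase * (1 - f) and the phosphate is removed at rate k_phosphatase * f.
    Setting the two rates equal gives f = k_kinase / (k_kinase + k_phosphatase).
    """
    return k_kinase / (k_kinase + k_phosphatase)

for k_kin, k_phos in [(1.0, 9.0), (1.0, 1.0), (9.0, 1.0)]:
    print(k_kin, k_phos, phosphorylated_fraction(k_kin, k_phos))
```

Under this assumption only the ratio of the two activities matters: raising kinase activity or lowering phosphatase activity shifts the population toward the phosphorylated form by the same amount.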
By comparing the number of amino acid sequence differences between the various members of a protein family, we can construct an "evolutionary tree" that is thought to reflect the pattern of gene duplication and divergence that gave rise to the family. Figure 3-66 shows an evolutionary tree of protein kinases. Kinases with related functions are often located on nearby branches of the tree: the protein kinases involved in cell signaling that phosphorylate tyrosine side chains, for example, are all clustered in the top left corner of the tree. The other kinases shown phosphorylate either a serine or a threonine side chain, and many are organized into clusters that seem to reflect their function in transmembrane signal transduction, intracellular signal amplification, cell-cycle control, and so on.

Figure 3-65 The three-dimensional structure of a protein kinase. Superimposed on this structure are red arrowheads to indicate sites where insertions of 5-100 amino acids are found in some members of the protein kinase family. These insertions are located in loops on the surface of the enzyme where other ligands interact with the protein. Thus, they distinguish different kinases and confer on them distinctive interactions with other proteins. The ATP (which donates a phosphate group) and the peptide to be phosphorylated are held in the active site, which extends between the phosphate-binding loop (yellow) and the catalytic loop (orange). See also Figure 3-10. (Adapted from D.R. Knighton et al., Science 253:407-414, 1991. With permission from AAAS.)
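The idea of building a tree from counts of sequence differences can be illustrated with a minimal sketch. The four 10-residue sequences below are invented for illustration and are not real kinase sequences, and the function names are hypothetical. Real analyses work from full alignments and use more careful distance and clustering methods; this sketch only shows the first step, counting pairwise differences and finding the most closely related pair.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy sequences (invented for illustration only).
toy_kinases = {
    "kinase_A": "GEGTFGKVYK",
    "kinase_B": "GEGTFGKVYR",
    "kinase_C": "GSGAFGTVYK",
    "kinase_D": "GSGAFGTVFK",
}


def count_differences(seq1, seq2):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def pairwise_differences(seqs):
    """Map each unordered pair of names to its number of sequence differences."""
    return {
        (name1, name2): count_differences(seqs[name1], seqs[name2])
        for name1, name2 in combinations(sorted(seqs), 2)
    }


def closest_pair(seqs):
    """Pair with the fewest differences; ties go to the first pair in name order."""
    distances = pairwise_differences(seqs)
    return min(distances, key=distances.get)
```

In this toy set, kinase_A and kinase_B differ at one position, as do kinase_C and kinase_D, whereas members of different pairs differ at three to five positions. A tree built from these counts would therefore place A with B and C with D on neighboring branches.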






Figure 3-64 Protein phosphorylation. Many thousands of proteins in a typical eucaryotic cell are modified by the covalent addition of a phosphate group. (A) The general reaction, shown here, transfers a phosphate group from ATP to an amino acid side chain of the target protein by a protein kinase. Removal of the phosphate group is catalyzed by a second enzyme, a protein phosphatase. In this example, the phosphate is added to a serine side chain; in other cases, the phosphate is instead linked to the -OH group of a threonine or a tyrosine in the protein. (B) The phosphorylation of a protein by a protein kinase can either increase or decrease the protein's activity, depending on the site of phosphorylation and the structure of the protein.



Figure 3-66 An evolutionary tree of selected protein kinases. Although a higher eucaryotic cell contains hundreds of such enzymes, and the human genome codes for more than 500, only some of those discussed in this book are shown.


[Figure 3-66 labels: PDGF receptor and EGF receptor (tyrosine kinase subfamily); cyclic-AMP-dependent kinase; cyclic-GMP-dependent kinase; protein kinase C; myosin light-chain kinases; TGFβ receptor (receptor serine kinase subfamily)]

As a result of the combined activities of protein kinases and protein phosphatases, the phosphate groups on proteins are continually turning over, being added and then rapidly removed. Such phosphorylation cycles may seem wasteful, but they are important in allowing the phosphorylated proteins to switch rapidly from one state to another: the more rapid the cycle, the faster a population of protein molecules can change its state of phosphorylation in response to a sudden change in the phosphorylation rate (see Figure 15-11). The energy required to drive this phosphorylation cycle is derived from the free energy of ATP hydrolysis, one molecule of which is consumed for each phosphorylation event.
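Why a faster cycle responds faster can be seen in a minimal two-state sketch. It assumes, purely for illustration, that a protein is either phosphorylated or not, that the kinase and phosphatase act with first-order rate constants (the hypothetical parameters `k_kinase` and `k_phosphatase`, in arbitrary units of 1/time), and that these rates change instantly when a signal arrives. Under those assumptions the phosphorylated fraction p obeys dp/dt = k_kinase(1 - p) - k_phosphatase·p, which relaxes exponentially to its new steady state at a rate equal to the sum of the two rate constants.

```python
import math


def steady_state(k_kinase, k_phosphatase):
    """Fraction of protein phosphorylated once the cycle has settled."""
    return k_kinase / (k_kinase + k_phosphatase)


def fraction_phosphorylated(t, p0, k_kinase, k_phosphatase):
    """Phosphorylated fraction at time t, starting from fraction p0 at t = 0."""
    p_ss = steady_state(k_kinase, k_phosphatase)
    return p_ss + (p0 - p_ss) * math.exp(-(k_kinase + k_phosphatase) * t)


def half_time(k_kinase, k_phosphatase):
    """Time needed to cover half the distance to the new steady state."""
    return math.log(2) / (k_kinase + k_phosphatase)


# Two cycles that settle at the same 90% phosphorylation,
# but one turns over ten times faster than the other.
slow_cycle = (0.9, 0.1)   # (k_kinase, k_phosphatase)
fast_cycle = (9.0, 1.0)
```

Both cycles reach the same final state, but in this model the fast cycle gets halfway there in one-tenth of the time. The price is that it consumes ATP ten times faster while sitting at steady state.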

The Regulation of Cdk and Src Protein Kinases Shows How a Protein Can Function as a Microchip

The hundreds of different protein kinases in a eucaryotic cell are organized into complex networks of signaling pathways that help to coordinate the cell's activities, drive the cell cycle, and relay signals into the cell from the cell's environment. Many of the extracellular signals involved need to be both integrated and amplified by the cell. Individual protein kinases (and other signaling proteins) serve as input-output devices, or "microchips," in the integration process. An important part of the input to these signal processing proteins comes from the control that is exerted by phosphates added and removed from them by protein kinases and protein phosphatases, respectively. In general, specific sets of phosphate groups serve to activate the protein, while other sets can inactivate it.

A cyclin-dependent protein kinase (Cdk) provides a good example. Kinases in this class phosphorylate serines and threonines, and they are central components of the cell-cycle control system in eucaryotic cells, as discussed in detail in Chapter 17. In a vertebrate cell, individual Cdk proteins turn on and off in succession, as a cell proceeds through the different phases of its division cycle. When a particular kinase is on, it influences various aspects of cell behavior through effects on the proteins it phosphorylates.

A Cdk protein becomes active as a serine/threonine protein kinase only when it is bound to a second protein called a cyclin. But, as Figure 3-67 shows, the binding of cyclin is only one of three distinct "inputs" required to activate the Cdk. In addition to cyclin binding, a phosphate must be added to a specific threonine side chain, and a phosphate elsewhere in the protein (covalently bound to a specific tyrosine side chain) must be removed. Cdk thus monitors a specific set


Figure 3-67 How a Cdk protein acts as an integrating device. The function of these central regulators of the cell cycle is discussed in Chapter 17.


Chapter3: Proteins

[Figure 3-68 labels: fatty acid; scale bar, 500 amino acids]

of cell components (a cyclin, a protein kinase, and a protein phosphatase), and it acts as an input-output device that turns on if, and only if, each of these components has attained its appropriate activity state. Some cyclins rise and fall in concentration in step with the cell cycle, increasing gradually in amount until they are suddenly destroyed at a particular point in the cycle. The sudden destruction of a cyclin (by targeted proteolysis) immediately shuts off its partner Cdk enzyme, and this triggers a specific step in the cell cycle.
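The three-input logic described for Cdk can be written as a minimal sketch of a logical AND gate. The function and argument names are hypothetical, and reducing each input to a simple true or false value is a deliberate simplification of what are really graded, time-dependent activities.

```python
from itertools import product


def cdk_is_active(cyclin_bound, activating_thr_phosphorylated,
                  inhibitory_tyr_phosphorylated):
    """Cdk output is 'on' only when all three inputs are in the permissive state."""
    return (cyclin_bound
            and activating_thr_phosphorylated
            and not inhibitory_tyr_phosphorylated)


def truth_table():
    """All 8 combinations of the three inputs with the resulting Cdk activity."""
    return [(inputs, cdk_is_active(*inputs))
            for inputs in product([False, True], repeat=3)]
```

Of the eight possible input combinations, only one (cyclin bound, threonine phosphorylated, tyrosine not phosphorylated) gives an active kinase. In this simplified picture, the sudden destruction of the cyclin corresponds to setting the first input to false, which switches the output off whatever the state of the other two.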

Figure 3-68 The domain structure of the Src family of protein kinases, mapped along the amino acid sequence. For the three-dimensional structure of Src, see Figure 3-10.

As indicated by the evolutionary tree in Figure 3-66, sequence comparisons suggest that tyrosine kinases as a group were a relatively late innovation that branched off from the serine/threonine kinases, with the Src subfamily being only one subgroup of the tyrosine kinases created in this way. The Src protein and its relatives contain a short N-terminal region that becomes covalently linked to a strongly hydrophobic fatty acid, which holds the kinase at the cytoplasmic face of the plasma membrane. Next come two peptide-binding modules, a Src homology 3 (SH3) domain and a SH2 domain, followed by the kinase catalytic domain (Figure 3-68). These kinases normally exist in an inactive conformation, in which a phosphorylated tyrosine near the C-terminus is bound to the SH2 domain, and the SH3 domain is bound to an internal peptide in a way that distorts the active site of the enzyme and helps to render it inactive. Turning the kinase on involves at least two specific inputs: removal of the C-

processing events that enable the cell to compute logical responses to a complex set of conditions.

Proteins That Bind and Hydrolyze GTP Are Ubiquitous Cellular Regulators

We have described how the addition or removal of phosphate groups on a protein can be used by a cell to control the protein's activity. In the examples discussed so

Figure 3-69 The activation of a Src-type protein kinase by two sequential events. (Adapted from S.C. Harrison et al., Cell 112:737-740, 2003. With permission from Elsevier.)

[Figure 3-69 labels: activating ligand; kinase domain]





far, the phosphate is transferred from an ATP molecule to an amino acid side chain of the protein in a reaction catalyzed by a specific protein kinase. Eucaryotic cells also have another way to control protein activity by phosphate addition and removal. In this case, the phosphate is not attached directly to the protein; instead, it is a part of the guanine nucleotide GTP, which binds very tightly to the protein. In general, proteins regulated in this way are in their active conformations with GTP bound. The loss of a phosphate group occurs when the bound GTP is hydrolyzed to GDP in a reaction catalyzed by the protein itself, and in its GDP-bound state the protein is inactive. In this way, GTP-binding proteins act as on-off switches whose activity is determined by the presence or absence of an additional phosphate on a bound GDP molecule (Figure 3-71).

GTP-binding proteins (also called GTPases because of the GTP hydrolysis they catalyze) comprise a large family of proteins that all contain variations on the same GTP-binding globular domain. When the tightly bound GTP is hydrolyzed to GDP, this domain undergoes a conformational change that inactivates it. The three-dimensional structure of a prototypical member of this family, the monomeric GTPase called Ras, is shown in Figure 3-72.

The Ras protein has an important role in cell signaling (discussed in Chapter 15). In its GTP-bound form, it is active and stimulates a cascade of protein phosphorylations in the cell. Most of the time, however, the protein is in its inactive, GDP-bound form. It becomes active when it exchanges its GDP for a GTP molecule in response to extracellular signals, such as growth factors, that bind to receptors in the plasma membrane (see Figure 15-58).


Figure 3-70 How a Src-type protein kinase acts as an integrating device. The disruption of the SH3 domain interaction (green) involves replacing its binding to the indicated red linker region with a tighter interaction with an activating ligand, as illustrated in Figure 3-69. [Figure 3-70 output label: Src-type protein kinase activity turns on fully only if the answers to all of the above questions are yes.]

Regulatory Proteins Control the Activity of GTP-Binding Proteins by Determining Whether GTP or GDP Is Bound

GTP-binding proteins are controlled by regulatory proteins that determine whether GTP or GDP is bound, just as phosphorylated proteins are turned on and off by protein kinases and protein phosphatases. Thus, Ras is inactivated by a GTPase-activating protein (GAP), which binds to the Ras protein and induces it to hydrolyze its bound GTP molecule to GDP (which remains tightly bound) and inorganic phosphate (Pi), which is rapidly released. The Ras protein stays in its inactive, GDP-bound conformation until it encounters a guanine nucleotide exchange factor (GEF), which binds to GDP-Ras and causes it to release its GDP. Because the empty nucleotide-binding site is immediately filled by a GTP molecule (GTP is present in large excess over GDP in cells), the GEF activates Ras by indirectly adding back the phosphate removed by GTP hydrolysis. Thus, in a sense, the roles of GAP and GEF are analogous to those of a protein phosphatase and a protein kinase, respectively (Figure 3-73).
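The on-off behavior described above can be sketched as a two-state switch. This is a minimal illustration, not a kinetic model: the state and event names are hypothetical labels, each encounter with a regulator is assumed to succeed instantly, and the slow unassisted hydrolysis and nucleotide exchange are ignored.

```python
def next_state(state, regulator):
    """Return the Ras state after an encounter with a GEF or a GAP.

    A GEF only acts on the GDP-bound form (GDP released, GTP rebinds);
    a GAP only acts on the GTP-bound form (GTP hydrolyzed to GDP + Pi).
    Any other encounter leaves the state unchanged.
    """
    if state == "Ras-GDP" and regulator == "GEF":
        return "Ras-GTP"   # switch on
    if state == "Ras-GTP" and regulator == "GAP":
        return "Ras-GDP"   # switch off
    return state


def run_switch(initial_state, regulators):
    """List of states visited as the protein meets each regulator in turn."""
    history = [initial_state]
    for regulator in regulators:
        history.append(next_state(history[-1], regulator))
    return history
```

For example, `run_switch("Ras-GDP", ["GAP", "GEF", "GEF", "GAP"])` leaves the protein off at the first GAP encounter, switches it on at the first GEF, ignores the second GEF, and switches it off again at the final GAP.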

Large Protein Movements Can Be Generated From Small Ones

The Ras protein belongs to a large superfamily of monomeric GTPases, each of which consists of a single GTP-binding domain of about 200 amino acids. Over the course of evolution, this domain has also become joined to larger proteins with additional domains, creating a large family of GTP-binding proteins. Family members include the receptor-associated trimeric G proteins involved in cell signaling (discussed in Chapter 15), proteins regulating the traffic of vesicles between intracellular compartments (discussed in Chapter 13), and proteins that bind to transfer RNA and are required as assembly factors for protein





Figure 3-71 GTP-binding proteins as molecular switches. The activity of a GTP-binding protein (also called a GTPase) generally requires the presence of a tightly bound GTP molecule (switch "on"). Hydrolysis of this GTP molecule produces GDP and inorganic phosphate (Pi), and it causes the protein to convert to a different, usually inactive, conformation (switch "off"). As shown here, resetting the switch requires the tightly bound GDP to dissociate, a slow step that is greatly accelerated by specific signals; once the GDP has dissociated, a molecule of GTP is quickly rebound.



Figure 3-72 The structure of the Ras protein in its GTP-bound form. This monomeric GTPase illustrates the structure of a GTP-binding domain, which is present in a large family of GTP-binding proteins. The red regions change their conformation when the GTP molecule is hydrolyzed to GDP and inorganic phosphate by the protein; the GDP remains bound to the protein, while the inorganic phosphate is released. The special role of the "switch helix" in proteins related to Ras is explained next (see Figure 3-75).

synthesis on the ribosome (discussed in Chapter 6). In each case, an important biological activity is controlled by a change in the protein's conformation that is caused by GTP hydrolysis in a Ras-like domain.

The EF-Tu protein provides a good example of how this family of proteins works. EF-Tu is an abundant molecule that serves as an elongation factor (hence the EF) in protein synthesis, loading each aminoacyl tRNA molecule onto the ribosome. The tRNA molecule forms a tight complex with the GTP-bound form of EF-Tu (Figure 3-74). In this complex, the amino acid attached to the tRNA is improperly positioned for protein synthesis. The tRNA can transfer its amino acid only after the GTP bound to EF-Tu is hydrolyzed on the ribosome, allowing the EF-Tu to dissociate. Since the GTP hydrolysis is triggered by a proper fit of the tRNA to the mRNA molecule on the ribosome, the EF-Tu serves as a factor that discriminates between correct and incorrect mRNA-tRNA pairings (see Figure 6-67 for a further discussion of this function of EF-Tu).

By comparing the three-dimensional structure of EF-Tu in its GTP-bound and GDP-bound forms, we can see how the repositioning of the tRNA occurs. The dissociation of the inorganic phosphate group (Pi), which follows the reaction GTP → GDP + Pi, causes a shift of a few tenths of a nanometer at the GTP-binding site, just as it does in the Ras protein. This tiny movement, equivalent to




Figure 3-73 A comparison of the two major intracellular signaling mechanisms in eucaryotic cells. In both cases, a signaling protein is activated by the addition of a phosphate group and inactivated by the removal of this phosphate. To emphasize the similarities in the two pathways, ATP and GTP are drawn as APPP and GPPP, and ADP and GDP as APP and GPP, respectively. As shown in Figure 3-64, the addition of a phosphate to a protein can also be inhibitory.


a few times the diameter of a hydrogen atom, causes a conformational change to propagate along a crucial piece of α helix, called the switch helix, in the Ras-like domain of the protein. The switch helix seems to serve as a latch that adheres to a specific site in another domain of the molecule, holding the protein in a "shut" conformation. The conformational change triggered by GTP hydrolysis causes the switch helix to detach, allowing separate domains of the protein to swing apart, through a distance of about 4 nm. This releases the bound tRNA molecule, allowing its attached amino acid to be used (Figure 3-75).

Notice in this example how cells have exploited a simple chemical change that occurs on the surface of a small protein domain to create a movement 50 times larger. Dramatic shape changes of this type also cause the very large movements that occur in motor proteins, as we discuss next.

Figure 3-74 An aminoacyl tRNA molecule bound to EF-Tu. The three domains of the EF-Tu protein are colored differently, to match Figure 3-75. This is a bacterial protein; however, a very similar protein exists in eucaryotes, where it is called EF-1. (Coordinates determined by P. Nissen et al., Science 270:1464-1472, 1995. With permission from AAAS.)

Motor Proteins Produce Large Movements in Cells

We have seen that conformational changes in proteins have a central role in enzyme regulation and cell signaling. We now discuss proteins whose major function is to move other molecules. These motor proteins generate the forces responsible for muscle contraction and the crawling and swimming of cells. Motor proteins also power smaller-scale intracellular movements: they help to move chromosomes to opposite ends of the cell during mitosis (discussed in Chapter 17), to move organelles along molecular tracks within the cell (discussed

Figure 3-75 The large conformational change in EF-Tu caused by GTP hydrolysis. (A) The three-dimensional structure of EF-Tu with GTP bound. The domain at the top has a structure similar to the Ras protein, and its α helix is the switch helix, which moves after GTP hydrolysis. (B) The change in the conformation of the switch helix in domain 1 causes domains 2 and 3 to rotate as a single unit by about 90° toward the viewer, which releases the tRNA that was shown bound to this structure in Figure 3-74. (A, adapted from H. Berchtold et al., Nature 365:126-132, 1993. With permission from Macmillan Publishers Ltd. B, courtesy of Mathias Sprinzl and Rolf Hilgenfeld.)

[Figure 3-75 labels: GTP-binding site; switch helix]



in Chapter 16), and to move enzymes along a DNA strand during the synthesis of a new DNA molecule (discussed in Chapter 5). All these fundamental processes depend on proteins with moving parts that operate as force-generating machines.

How do these machines work? In other words, how do cells use shape changes in proteins to generate directed movements? If, for example, a protein is required to walk along a narrow thread such as a DNA molecule, it can do this by undergoing a series of conformational changes, such as those shown in Figure 3-76. But with nothing to drive these changes in an orderly sequence, they are perfectly reversible, and the protein can only wander randomly back and forth along the thread.

We can look at this situation in another way. Since the directional movement of a protein does work, the laws of thermodynamics (discussed in Chapter 2) demand that such movement use free energy from some other source (otherwise the protein could be used to make a perpetual motion machine). Therefore, without an input of energy, the protein molecule can only wander aimlessly.

How can the cell make such a series of conformational changes unidirectional? To force the entire cycle to proceed in one direction, it is enough to make any one of the changes in shape irreversible. Most proteins that are able to walk in one direction for long distances achieve this motion by coupling one of the conformational changes to the hydrolysis of an ATP molecule bound to the protein. The mechanism is similar to the one just discussed that drives allosteric

Figure 3-76 An allosteric "walking" protein. Although its three different conformations allow it to wander randomly back and forth while bound to a thread or a filament, the protein cannot move uniformly in a single direction.

In the model shown in Figure 3-77, ATP binding shifts a motor protein from conformation 1 to conformation 2. The bound ATP is then hydrolyzed to produce ADP and inorganic phosphate (Pi), causing a change from conformation 2

Many motor proteins generate directional movement in this general way, including the muscle motor protein myosin, which walks along actin filaments to generate muscle contraction, and the kinesin proteins that walk along microtubules (both discussed in Chapter 16). These movements can be rapid: some of the motor proteins involved in DNA replication (the DNA helicases) propel themselves along a DNA strand at rates as high as 1000 nucleotides per second.
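The argument that a single irreversible step is enough to turn a random wander into directed movement can be checked with a small simulation. This is a toy sketch under stated assumptions, not a model of any real motor: the protein cycles through three conformations, each transition attempt is forward or backward with equal probability, one full forward cycle counts as one step along the track, and in the ATP-driven case the reversal of the hydrolysis step (conformation 3 back to 2) is simply forbidden. The function name `net_steps` and its parameters are hypothetical.

```python
import random

N_CONFORMATIONS = 3  # conformations 1, 2, 3 are stored as indices 0, 1, 2


def net_steps(n_attempts, atp_driven, seed=0):
    """Net whole steps along the track after n_attempts random transition attempts."""
    rng = random.Random(seed)
    phase = 0  # forward transitions minus backward transitions so far
    for _ in range(n_attempts):
        forward = rng.random() < 0.5
        conformation = phase % N_CONFORMATIONS
        if atp_driven and not forward and conformation == 2:
            # Going from conformation 3 back to 2 would mean reversing
            # ATP hydrolysis; this sketch treats that as impossible.
            continue
        phase += 1 if forward else -1
    # Three transitions in the same direction make one step along the track.
    return int(phase / N_CONFORMATIONS)
```

With a few thousand attempts, the reversible protein typically ends up within a few tens of steps of where it started, in either direction, whereas the ATP-driven protein accumulates hundreds of forward steps. The only difference between the two cases is the single forbidden transition.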

Membrane-Bound Transporters Harness Energy to Pump Molecules Through Membranes


We have thus far seen how allosteric proteins can act as microchips (Cdk and Src kinases), as assembly factors (EF-Tu), and as generators of mechanical force and motion (motor proteins). Allosteric proteins can also harness energy derived from ATP hydrolysis, ion gradients, or electron transport processes to pump specific ions or small molecules across a membrane. We consider one example here; others will be discussed in Chapter 11.

The ABC transporters constitute an important class of membrane-bound pump proteins. In humans at least 48 different genes encode them. These transporters mostly function to export hydrophobic molecules from the cytoplasm,

Figure 3-77 An allosteric motor protein. The transition between three different conformations includes a step driven by the hydrolysis of a bound ATP molecule, and this makes the entire cycle essentially irreversible. By repeated cycles, the protein therefore moves continuously to the right along the thread.

direction of movement



Figure 3-78 The ABC (ATP-binding cassette) transporter, a protein machine that pumps large hydrophobic molecules through a membrane. (A) The bacterial BtuCD protein, which imports vitamin B12 into E. coli using the energy of ATP hydrolysis. The binding of two molecules of ATP clamps together the two ATP-binding subunits. The structure is shown in its ADP-bound state, where the channel to the extracellular space can be seen to be open but the gate to the cytosol remains closed. (B) Schematic illustration of substrate pumping by ABC transporters. In bacteria, the binding of a substrate molecule to the extracellular face of the protein complex triggers ATP hydrolysis followed by ADP release, which opens the cytoplasmic gate; the pump is then reset for another cycle. In eucaryotes, an opposite process occurs, causing substrate molecules to be pumped out of the cell. (A, adapted from K.P. Locher, Curr. Opin. Struct. Biol. 14:426-441, 2004. With permission from Elsevier.)

[Figure 3-78 labels: membrane-spanning subunits; lipid bilayer; ATP-binding subunits; (B) a bacterial ABC transporter; substrate molecule; cytosol]

serving to remove toxic molecules at the mucosal surface of the intestinal tract, for example, or at the blood-brain barrier. The study of ABC transporters is of intense interest in clinical medicine, because the overproduction of proteins in this class contributes to the resistance of tumor cells to chemotherapeutic drugs. And in bacteria, the same type of proteins primarily function to import essential nutrients into the cell.

The ABC transporter is a tetramer, with a pair of membrane-spanning subunits linked to a pair of ATP-binding subunits located just below the plasma membrane (Figure 3-78A). As in other examples we have discussed, the hydrolysis of the bound ATP molecules drives conformational changes in the protein, transmitting forces that cause the membrane-spanning subunits to move their bound molecules across the lipid bilayer (Figure 3-78B).

Humans have invented many different types of mechanical pumps, and it should not be surprising that cells also contain membrane-bound pumps that function in other ways. Among the most notable are the rotary pumps that couple the hydrolysis of ATP to the transport of H+ ions (protons). These pumps resemble miniature turbines, and they are used to acidify the interior of lysosomes and other eucaryotic organelles. Like other ion pumps that create ion gradients, they can function in reverse to catalyze the reaction ADP + Pi → ATP, if the gradient across their membrane of the ion that they transport is steep enough. One such pump, the ATP synthase, harnesses a gradient of proton concentration produced by electron transport processes to produce most of the ATP used in the living world. This ubiquitous pump has a central role in energy conversion, and we shall discuss its three-dimensional structure and mechanism in Chapter 14.


Proteins Often Form Large Complexes That Function as Protein Machines

Large proteins formed from many domains are able to perform more elaborate functions than small, single-domain proteins. But large protein assemblies formed from many protein molecules perform the most impressive tasks. Now that it is possible to reconstruct most biological processes in cell-free systems in the laboratory, it is clear that each of the central processes in a cell (such as DNA replication, protein synthesis, vesicle budding, or transmembrane signaling) is catalyzed by a highly coordinated, linked set of 10 or more proteins. In most such protein machines, an energetically favorable reaction such as the hydrolysis of bound nucleoside triphosphates (ATP or GTP) drives an ordered series of conformational changes in one or more of the individual protein subunits, enabling the ensemble of proteins to move coordinately. In this way, each enzyme can be moved directly into position, as the machine catalyzes successive reactions in a series. This is what occurs, for example, in protein synthesis on a ribosome (discussed in Chapter 6), or in DNA replication, where a large multiprotein complex moves rapidly along the DNA (discussed in Chapter 5).

Cells have evolved protein machines for the same reason that humans have invented mechanical and electronic machines. For accomplishing almost any task, manipulations that are spatially and temporally coordinated through linked processes are much more efficient than the use of individual tools.

Protein Machines with Interchangeable Parts Make Efficient Use of Genetic Information

To probe more deeply into the nature of protein machines, we shall consider a relatively simple one: the SCF ubiquitin ligase. This protein complex binds different "target proteins" at different times in the cell cycle, and it covalently adds multiubiquitin polypeptide chains to these proteins. Its C-shaped structure is formed from five protein subunits, the largest of which is a molecule that serves as a scaffold protein on which the rest of the structure is built. The structure underlies a remarkable mechanism (Figure 3-79). At one end of the C is an E2 ubiquitin-conjugating enzyme. At the other end is a substrate-binding arm, a subunit known as an F-box protein. These two subunits are separated by a gap of about 5 nm. When this protein complex is activated, the F-box protein binds to a specific site on a target protein, positioning the protein in the gap so that some of its lysine side chains contact the ubiquitin-conjugating enzyme. This enzyme can then catalyze the repeated addition of a ubiquitin polypeptide to these lysines (see Figure 3-79C), producing a polyubiquitin chain that marks the target protein for rapid destruction in a proteasome (see p. 393). In this manner, specific proteins are targeted for rapid destruction in


[Figure 3-79 labels: adaptor protein 1; adaptor protein 2; F-box protein (substrate-binding arm); E2 ubiquitin-conjugating enzyme; ubiquitin; two of many possible substrate-binding arms; polyubiquitylated protein targeted for destruction; ubiquitin ligase]

in cells, inasmuch as new functions can evolve for the entire complex simply by producing an alternative version of one of its subunits.

The Activation of Protein Machines Often Involves Positioning Them at Specific Sites

As scientists have learned more of the details of cell biology, they have recognized increasing degrees of sophistication in cell chemistry. Thus, not only do we now know that protein machines play a predominant role, but it has recently become clear that most of these machines form at specific sites in the cell, being activated only where and when they are needed. Using fluorescent, GFP-tagged fusion proteins in living cells (see p. 593), cell biologists are able to follow the repositioning of individual proteins that occurs in response to specific signals. Thus, when certain extracellular signaling molecules bind to receptor proteins in the plasma membrane, they often recruit a set of other proteins to the inside surface of the plasma membrane to form protein machines that pass the signal on. As an example, Figure 3-80A illustrates the rapid movement of a protein kinase C (PKC) enzyme to a complex in the plasma membrane, where it associates with specific substrate proteins that it phosphorylates.

There are more than 10 distinct PKC enzymes in human cells, which differ both in their mode of regulation and in their functions. When activated, these enzymes move from the cytoplasm to different intracellular locations, forming specific complexes with other proteins that allow them to phosphorylate different protein substrates (Figure 3-80B). The SCF ubiquitin ligases can also move to specific sites of function at appropriate times. As will be explained when we discuss cell signaling in Chapter 15, the mechanisms frequently involve protein phosphorylation, as well as scaffold proteins that link together a set of activating, inhibiting, adaptor, and substrate proteins at a specific location in a cell.
This general phenomenon is known as induced proximity, and it explains the otherwise puzzling observation that slightly different forms of enzymes with the same catalytic site will often have very different biological functions. Cells change the locations of their proteins by covalently modifying them in a variety of different ways, as part of a "regulatory code" to be described next.

Figure 3-79 The structure and mode of action of a SCF ubiquitin ligase. (A) The structure of the five-protein complex that includes an E2 ubiquitin ligase. The protein denoted here as adaptor protein 1 is the Rbx1/Hrt1 protein, adaptor protein 2 is the Skp1 protein, and the cullin is the Cul1 protein. (B) Comparison of the same complex with two different substrate-binding arms, the F-box proteins Skp2 (top) and β-trCP1 (bottom), respectively. (C) The binding and ubiquitylation of a target protein by the SCF ubiquitin ligase. If, as indicated, a chain of ubiquitin molecules is added to the same lysine of the target protein, that protein is marked for rapid destruction by the proteasome. (A and B, adapted from G. Wu et al., Mol. Cell 11:1445-1456, 2003. With permission from Elsevier.)



[Figure 3-80 labels: (A) 0 min, 3 min, 10 min; scale bars, 20 µm and 10 µm; (C) unstructured region; structured domain; rapid collisions]

modifications create sites on proteins that bind them to particular scaffold proteins, thereby clustering the proteins required for particular reactions in specific regions of the cell. Most biological reactions are catalyzed by sets of 5 or more proteins, and such a clustering of proteins is often required for the reaction to occur. Scaffolds thereby allow cells to compartmentalize reactions even in the absence of membranes. Although only recently recognized as a widespread phenomenon, this type of clustering is particularly obvious in the cell nucleus (see Figure 4-69).

Many scaffolds appear to be quite different from the cullin illustrated previously in Figure 3-79: rather than holding their bound proteins in precise positions relative to each other, the interacting proteins are linked by unstructured regions of polypeptide chain. This tethers the proteins together, causing them to collide frequently with each other in random orientations, some of which will lead to a productive reaction (Figure 3-80C). In essence, this mechanism greatly speeds reactions by creating a very high local concentration of the reacting species. For this reason, the use of scaffold proteins represents an especially versatile way of controlling cell chemistry (see also Figure 15-61).
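The claim that tethering creates a very high local concentration can be made concrete with a rough estimate. The sketch below assumes, purely for illustration, that a tethered protein is confined to a sphere whose radius equals the reach of its flexible tether; the 10 nm radius used in the example is an assumed value, not one given in the text, and the function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def effective_molar_concentration(tether_radius_nm, n_molecules=1):
    """Molar concentration of n_molecules confined to a sphere of the given radius."""
    radius_dm = tether_radius_nm * 1e-8                 # 1 nm = 1e-8 dm
    volume_liters = (4.0 / 3.0) * math.pi * radius_dm ** 3  # 1 dm^3 = 1 liter
    return n_molecules / (AVOGADRO * volume_liters)


# One partner protein held within an assumed 10 nm of its reaction partner.
local_concentration = effective_molar_concentration(10.0)
```

Under these assumptions, a single molecule held within 10 nm corresponds to a concentration of roughly 0.4 mM. Because the volume scales with the cube of the radius, halving the tether reach raises this local concentration eightfold.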

Figure 3-80 The assembly of protein machines at specific sites in a cell. (A) In response to a signal (here a phorbol ester), the gamma subspecies of protein kinase C moves rapidly from the cytosol to the plasma membrane. The protein kinase is fluorescent in these living cells because an engineered gene inside the cell encodes a fusion protein that links the kinase to green fluorescent protein (GFP). (B) The specific association of a different subspecies of protein kinase C (aPKC) with the apical tip of a differentiating neuroblast in an early Drosophila embryo. The kinase is stained red, and the cell nucleus green. (C) Diagram illustrating how a simple proximity created by scaffold proteins can greatly speed reactions in a cell. In this example, long unstructured regions of polypeptide chain in a large scaffold protein connect a series of structured domains that bind a set of reacting proteins. The unstructured regions serve as flexible "tethers" that greatly speed reaction rates by causing a rapid, random collision of all of the proteins that are bound to the scaffold. (For a simple example of tethering, see Figure 16-38.) (A, from N. Sakai et al., J. Cell Biol. 139:1465-1476, 1997. With permission from The Rockefeller University Press. B, courtesy of Andreas Wodarz, Institute of Genetics, University of Düsseldorf, Germany.)

Many Proteins Are Controlled by Multisite Covalent Modification

We have thus far described only one type of posttranslational modification of proteins: that in which a phosphate is covalently attached to an amino acid side chain (see Figure 3-64). But a large number of other such modifications also occur, more than 200 distinct types being known. To give a sense of the variety, Table 3-3 presents a subset of modifying groups with known regulatory roles. As in phosphate addition, these groups are added and then removed from proteins according to the needs of the cell.

Table 3-3 Some Molecules Covalently Attached to Proteins Regulate Protein Function

MODIFYING GROUP                      SOME PROMINENT FUNCTIONS
Phosphate on Ser, Thr, or Tyr        Drives the assembly of a protein into larger complexes (see Figure 15-19).
Methyl on Lys                        Helps to create histone code in chromatin through forming either mono-, di-, or tri-methyl lysine (see Figure 4-38).
Acetyl on Lys                        Helps to create histone code in chromatin (see Figure 4-38).
Palmityl group on Cys                This fatty acid addition drives protein association with membranes (see Figure 10-20).
N-acetylglucosamine on Ser or Thr    Controls enzyme activity and gene expression in glucose homeostasis.
Ubiquitin on Lys                     Monoubiquitin addition regulates the transport of membrane proteins in vesicles (see Figure 13-58). A polyubiquitin chain targets a protein for degradation (see Figure 3-79).

Ubiquitin is a 76 amino acid polypeptide; there are at least 10 other ubiquitin-related proteins, such as SUMO, that modify proteins in similar ways.

A large number of proteins are now known to be modified on more than one amino acid side chain, with different regulatory events producing a different pattern of such modifications. A striking example is the protein p53, which plays a central part in controlling a cell's response to adverse circumstances (see p. 1105). Through one of four different types of molecular additions, this protein can be modified at 20 different sites (Figure 3-81A). Because an enormous number of different combinations of these 20 modifications are possible, the protein's behavior can in principle be altered in a huge number of ways. Moreover, the pattern of modifications on a protein can determine its susceptibility to further modification, as illustrated by histone H3 in Figure 3-81B.

Cell biologists have only recently come to recognize that each protein's set of covalent modifications constitutes an important combinatorial regulatory code. As specific modifying groups are added to or removed from a protein, this code causes a different set of protein behaviors: changing the activity or stability of the protein, its binding partners, and its specific location within the cell (Figure 3-81C). This helps the cell respond rapidly and with great versatility to changes in its condition or environment.
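To get a feel for how quickly such a combinatorial code grows, the minimal Python sketch below counts the possible modification patterns for a protein with 20 modifiable sites. It rests on a simplifying assumption that is not from the text: every site is treated as independently either modified or unmodified. The function name and parameters are illustrative only.

```python
def count_modification_patterns(num_sites: int, states_per_site: int = 2) -> int:
    """Count distinct modification patterns on one protein.

    Assumption (for illustration): each site independently adopts one of
    `states_per_site` states, for example 2 = modified or unmodified.
    """
    if num_sites < 0 or states_per_site < 1:
        raise ValueError("num_sites must be >= 0 and states_per_site >= 1")
    return states_per_site ** num_sites


if __name__ == "__main__":
    # 20 sites, each simply "on" or "off": 2**20 patterns.
    print(count_modification_patterns(20))  # prints 1048576
```

Even this on-or-off simplification gives more than a million patterns for 20 sites. Allowing alternative groups at a single site would increase the count further, while dependencies between neighboring sites, such as those shown for histone H3, would exclude some patterns.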

A Complex Network of Protein Interactions Underlies Cell Function

There are many challenges facing cell biologists in this "post-genome" era when complete genome sequences are known. One is the need to dissect and reconstruct each one of the thousands of protein machines that exist in an organism such as ourselves. To understand these remarkable protein complexes, each must be reconstituted from its purified protein parts, so that we can study its detailed mode of operation under controlled conditions in a test tube, free from

Figure 3-81 Multisite protein modification and its effects. A protein that carries a post-translational addition to more than one of its amino acid side chains can be considered to carry a combinatorial regulatory code. (A) The pattern of known covalent modifications to the protein p53; ubiquitin and SUMO are related polypeptides (see Table 3-3). (B) The possible modifications on the first 20 amino acids at the N-terminus of histone H3, showing not only their locations but also their activating (blue) and inhibiting (red) effects on the addition of neighboring covalent modifications. In addition to the effects shown, the acetylation and methylation of a lysine are mutually exclusive reactions (see Figure 4-38). (C) Diagram showing the general manner in which multisite modifications are added to (and removed from) a protein through signaling networks, and how the resulting combinatorial regulatory code on the protein is read to alter its behavior in the cell.


Chapter3: Proteins

all other cell components. This alone is a massive task. But we now know that each of these subcomponents of a cell also interacts with other sets of macromolecules, creating a large network of protein-protein and protein-nucleic acid interactions throughout the cell. To understand the cell, therefore, we need to analyze most of these other interactions as well.

We can gain some idea of the complexity of intracellular protein networks from a particularly well-studied example described in Chapter 16: the many dozens of proteins that interact with the actin cytoskeleton in the yeast Saccharomyces cerevisiae (see Figure 16-18).

The extent of such protein-protein interactions can also be estimated more generally. An enormous amount of valuable information is now freely available in protein databases on the Internet: tens of thousands of three-dimensional protein structures plus tens of millions of protein sequences derived from the nucleotide sequences of genes. Scientists have been developing new methods for mining this great resource to increase our understanding of cells. In particular, computer-based bioinformatics tools are being combined with robotics and microarray technologies (see p. 574) to allow thousands of proteins to be investigated in a single set of experiments. Proteomics is a term that is often used to describe such research focused on the large-scale analysis of proteins, analogous to the term genomics describing the large-scale analysis of DNA sequences and genes.

Biologists use two different large-scale methods to map the direct binding interactions between the many different proteins in a cell. The initial method of choice was based on genetics: through an ingenious technique known as the yeast two-hybrid screen (see Figure 8-24), tens of thousands of interactions between thousands of proteins have been mapped in yeast, a nematode, and the fruit fly Drosophila.
More recently, a biochemical method based on affinity tagging and mass spectroscopy has gained favor (discussed in Chapter 8), because it appears to produce fewer spurious results. The results of these and other analyses that predict protein binding interactions have been tabulated and organized in Internet databases. This allows a cell biologist studying a small set of proteins to readily discover which other proteins in the same cell are thought to bind to, and thus interact with, that set of proteins. When displayed graphically as a protein interaction map, each protein is represented by a box or dot in a two-dimensional network, with a straight line connecting those proteins that have been found to bind to each other.

When hundreds or thousands of proteins are displayed on the same map, the network diagram becomes bewilderingly complicated, serving to illustrate how much more we have to learn before we can claim to really understand the cell. Much more useful are small subsections of these maps, centered on a few proteins of interest. Thus, Figure 3-82 shows a network of protein-protein interactions for the five proteins that form the SCF ubiquitin ligase in a yeast cell (see Figure 3-79). Four of the subunits of this ligase are located at the bottom right of Figure 3-82. The remaining subunit, the F-box protein that serves as its substrate-binding arm, appears as a set of 15 different gene products that bind to adaptor protein 2 (the Skp1 protein). Along the top and left of the figure are sets of additional protein interactions marked with yellow and green shading: as indicated, these protein sets function at the origin of DNA replication, in cell cycle regulation, in methionine synthesis, in the kinetochore, and in vacuolar H+-ATPase assembly. We shall use this figure to explain how such protein interaction maps are used, and what they do and do not mean.

1. Protein interaction maps are useful for identifying the likely function of previously uncharacterized proteins. Examples are the products of the genes that have thus far only been inferred to exist from the yeast genome sequence, which are the six proteins in the figure that lack a simple three-letter abbreviation (white letters beginning with Y). One, the product of so-called open reading frame YDR196C, is located in the origin of replication group, and it is therefore likely to have a role in starting new replication forks. The remaining five in this diagram are F-box proteins that bind to Skp1; these are therefore likely to function as part of the ubiquitin ligase, serving as substrate-binding arms that recognize different target proteins. However, as we discuss next, neither assignment can be considered certain without additional data.

2. Protein interaction networks need to be interpreted with caution because, as a result of evolution making efficient use of each organism's genetic information, the same protein can be used as part of two different protein complexes that have different types of functions. Thus, although protein A binds to protein B and protein B binds to protein C, proteins A and C need not function in the same process. For example, we know from detailed biochemical studies that the functions of Skp1 in the kinetochore and in vacuolar H+-ATPase assembly (yellow shading) are separate from its function in the SCF ubiquitin ligase. In fact, only the remaining three functions of Skp1 illustrated in the diagram (methionine synthesis, cell cycle regulation, and origin of replication; green shading) involve ubiquitylation.

3. In cross-species comparisons, those proteins displaying similar patterns of interactions in the two protein interaction maps are likely to have the same function in the cell. Thus, as scientists generate more and more highly detailed maps for multiple organisms, the results will become increasingly useful for inferring protein function. These map comparisons are a particularly powerful tool for deciphering the functions of human proteins. There is a vast amount of direct information about protein function that can be obtained from genetic engineering, mutational, and

[Figure 3-82 labels: methionine synthesis; E2 ubiquitin-conjugating enzyme; Cep3; Cbf2; adaptor protein 1; Ram2; adaptor protein 2; scaffold protein (cullin)]

Figure 3-82 A map of some protein-protein interactions of the SCF ubiquitin ligase and other proteins in the yeast S. cerevisiae. The symbols and/or colors used for the 5 proteins of the ligase are those in Figure 3-79. Note that 15 different F-box proteins are shown (purple); those with white lettering (beginning with Y) are only known from the genome sequence as open reading frames. For additional details, see text. (Courtesy of Peter Bowers and David Eisenberg, UCLA-DOE Institute for Genomics and Proteomics, UCLA.)


Figure 3-83 A network of protein-binding interactions in a yeast cell. Each line connecting a pair of dots (proteins) indicates a protein-protein interaction. (From A. Guimerà and M. Sales-Pardo, Mol. Syst. Biol. 2:42, 2006. With permission from Macmillan Publishers Ltd.)

genetic analyses in model organisms (such as yeast, worms, and flies) that is not available in humans.

The available data suggest that a typical protein in a human cell may interact with between 5 and 15 different partners. Often, each of the different domains in a multidomain protein binds to a different set of partners; in fact, we can speculate that the unusually extensive multidomain structures observed for human proteins may have evolved to facilitate these interactions. Given the enormous complexity of the interacting networks of macromolecules in cells (Figure 3-83), deciphering their full functional meaning may well keep scientists busy for centuries.
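A protein interaction map of the kind described above is, in computational terms, an undirected graph: proteins are nodes and observed binding events are edges. The minimal Python sketch below shows one way such a map might be stored and queried. The protein names echo the SCF example, but the specific pairs listed are an illustrative toy data set, not measured interactions, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy binding data (illustrative only, not experimental results):
# each pair means "these two proteins were found to bind each other".
BINDING_PAIRS = [
    ("Skp1", "F-box protein"),
    ("Skp1", "cullin"),
    ("Skp1", "kinetochore protein"),
    ("cullin", "E2 enzyme"),
]


def build_interaction_map(pairs):
    """Return an undirected map: protein -> set of direct binding partners."""
    partners = defaultdict(set)
    for protein_a, protein_b in pairs:
        partners[protein_a].add(protein_b)
        partners[protein_b].add(protein_a)
    return dict(partners)


def direct_partners(interaction_map, protein):
    """Sorted list of proteins that bind `protein` directly."""
    return sorted(interaction_map.get(protein, set()))


def shared_partners(interaction_map, protein_a, protein_b):
    """Proteins that bind both inputs. A shared partner suggests, but does
    not prove, that the two proteins act in the same process."""
    first = interaction_map.get(protein_a, set())
    second = interaction_map.get(protein_b, set())
    return sorted(first & second)


if __name__ == "__main__":
    interaction_map = build_interaction_map(BINDING_PAIRS)
    print(direct_partners(interaction_map, "Skp1"))
    print(shared_partners(interaction_map, "F-box protein", "kinetochore protein"))
```

In this toy map the F-box protein and the kinetochore protein share Skp1 as a partner, yet, as point 2 above cautions, that alone does not place them in the same functional process.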

Summary

Proteins can form enormously sophisticated chemical devices, whose functions largely depend on the detailed chemical properties of their surfaces. Binding sites for ligands are formed as surface cavities in which precisely positioned amino acid side chains are brought together by protein folding. In this way, normally unreactive amino acid side chains can be activated to make and break covalent bonds. Enzymes are catalytic proteins that greatly speed up reaction rates by binding the high-energy transition states for a specific reaction path; they also perform acid catalysis and base catalysis simultaneously. The rates of enzyme reactions are often so fast that they are limited only by diffusion; rates can be further increased if enzymes that act sequentially on a substrate are joined into a single multienzyme complex, or if the enzymes and their substrates are confined to the same compartment of the cell.

Proteins reversibly change their shape when ligands bind to their surface. The allosteric changes in protein conformation produced by one ligand affect the binding of a second ligand, and this linkage between two ligand-binding sites provides a crucial mechanism for regulating cell processes. Metabolic pathways, for example, are controlled by feedback regulation: some small molecules inhibit and other small molecules activate enzymes early in a pathway. Enzymes controlled in this way generally form symmetric assemblies, allowing cooperative conformational changes to create a steep response to changes in the concentrations of the ligands that regulate them.

The expenditure of chemical energy can drive unidirectional changes in protein shape. By coupling allosteric shape changes to ATP hydrolysis, for example, proteins can do useful work, such as generating a mechanical force or moving for long distances in a single direction. The three-dimensional structures of proteins, determined by x-ray crystallography, have revealed how a small local change caused by nucleoside triphosphate hydrolysis is amplified to create major changes elsewhere in the protein. By such means, these proteins can serve as input-output devices that transmit information, as assembly factors, as motors, or as membrane-bound pumps. Highly efficient protein machines are formed by incorporating many different protein molecules into larger assemblies that coordinate the allosteric movements of the individual components. Such machines are now known to perform many of the most important reactions in cells.

Proteins are subjected to many reversible post-translational modifications, such as the covalent addition of a phosphate or an acetyl group to a specific amino acid side chain. The addition of these modifying groups is used to regulate the activity of a protein, changing its conformation, its binding to other proteins, and its location inside the cell. A typical protein in a cell will interact with more than five different partners. Using the new technologies of proteomics, biologists can analyze thousands of proteins in one set of experiments. One important result is the production of detailed protein interaction maps, which aim at describing all of the binding interactions between the thousands of distinct proteins in a cell.







Figure Q3-1 The kelch repeat domain of galactose oxidase from D. dendroides (Problem 3-9). The seven individual β propellers are indicated. The N- and C-termini are indicated by N and C.


Which statements are true? Explain why or why not.

3-1 Each strand in a β sheet is a helix with two amino acids per turn.

3-2 Loops of polypeptide that protrude from the surface of a protein often form the binding sites for other molecules.

3-3 An enzyme reaches a maximum rate at high substrate concentration because it has a fixed number of active sites where substrate binds.

3-4 Higher concentrations of enzyme give rise to a higher turnover number.

3-5 Enzymes such as aspartate transcarbamoylase that undergo cooperative allosteric transitions invariably contain multiple identical subunits.

3-6 Continual addition and removal of phosphates by protein kinases and protein phosphatases is wasteful of energy, since their combined action consumes ATP, but it is a necessary consequence of effective regulation by phosphorylation.

Discuss the following problems.

3-7 Consider the following statement. "To produce one molecule of each possible kind of polypeptide chain, 300 amino acids in length, would require more atoms than exist in the universe." Given the size of the universe, do you suppose this statement could possibly be correct? Since counting atoms is a tricky business, consider the problem from the standpoint of mass. The mass of the observable universe is estimated to be about $10^{80}$ grams, give or take an order of magnitude or so. Assuming that the average mass of an amino acid is 110 daltons, what would be the mass of one molecule of each possible kind of polypeptide chain 300 amino acids in length? Is this greater than the mass of the universe?

3-8 A common strategy for identifying distantly related proteins is to search the database using a short signature sequence indicative of the particular protein function. Why is it better to search with a short sequence than with a long sequence? Do you not have more chances for a 'hit' in the database with a long sequence?

3-9 The so-called kelch motif consists of a four-stranded β sheet, which forms what is known as a β propeller.
It is usually found to be repeated four to seven times, forming a kelch repeat domain in a multidomain protein. One such kelch repeat domain is shown in Figure Q3-1. Would you classify this domain as an 'in-line' or 'plug-in' type domain?

3-10 Titin, which has a molecular weight of $3 \times 10^{6}$ daltons, is the largest polypeptide yet described. Titin molecules extend from muscle thick filaments to the Z disc; they are thought to act as springs to keep the thick filaments centered in the sarcomere. Titin is composed of a large number of repeated immunoglobulin (Ig) sequences of 89 amino acids, each of which is folded into a domain about 4 nm in length (Figure Q3-2A). You suspect that the springlike behavior of titin is caused by the sequential unfolding (and refolding) of individual Ig

domains. You test this hypothesis using the atomic force microscope, which allows you to pick up one end of a protein molecule and pull with an accurately measured force. For a fragment of titin containing seven repeats of the Ig domain, this experiment gives the sawtooth force-versus-extension curve shown in Figure Q3-2B. When the experiment is repeated in a solution of 8 M urea (a protein denaturant), the peaks disappear and the measured extension becomes much longer for a given force. If the experiment is repeated after the protein has been cross-linked by treatment with glutaraldehyde, once again the peaks disappear but the extension becomes much smaller for a given force.

A. Are the data consistent with your hypothesis that titin's springlike behavior is due to the sequential unfolding of individual Ig domains? Explain your reasoning.

B. Is the extension for each putative domain-unfolding event the magnitude you would expect? (In an extended polypeptide chain, amino acids are spaced at intervals of 0.34 nm.)

C. Why is each successive peak in Figure Q3-2B a little higher than the one before?

D. Why does the force collapse so abruptly after each peak?

3-11 It is often said that protein complexes are made from subunits (that is, individually synthesized proteins) rather than as one long protein because the former is more likely to give a correct final structure.

A. Assuming that the protein synthesis machinery incorporates one incorrect amino acid for each 10,000 it inserts,




Figure Q3-2 Springlike behavior of titin (Problem 3-10). (A) The structure of an individual Ig domain. (B) Force in piconewtons versus extension in nanometers obtained by atomic force microscopy.



calculate the fraction of bacterial ribosomes that would be assembled correctly if the proteins were synthesized as one large protein versus built from individual proteins. For the sake of calculation assume that the ribosome is composed of 50 proteins, each 200 amino acids in length, and that the subunits, correct and incorrect, are assembled with equal likelihood into the complete ribosome. [The probability that a polypeptide will be made correctly, $P_C$, equals the fraction correct for each operation, $f_C$, raised to a power equal to the number of operations, $n$: $P_C = (f_C)^n$. For an error rate of 1/10,000, $f_C = 0.9999$.]

B. Is the assumption that correct and incorrect subunits assemble equally well likely to be true? Why or why not? How would a change in that assumption affect the calculation in part A?

3-12 Rous sarcoma virus (RSV) carries an oncogene called Src, which encodes a continuously active protein tyrosine kinase that leads to unchecked cell proliferation. Normally, Src carries an attached fatty acid (myristoylate) group that allows it to bind to the cytoplasmic side of the plasma membrane. A mutant version of Src that does not allow attachment of myristoylate does not bind to the membrane. Infection of cells with RSV encoding either the normal or the mutant form of Src leads to the same high level of protein tyrosine kinase activity, but the mutant Src does not cause cell proliferation.

A. Assuming that the normal Src is all bound to the plasma membrane and that the mutant Src is distributed throughout the cytoplasm, calculate their relative concentrations in the neighborhood of the plasma membrane. For the purposes of this calculation, assume that the cell is a sphere with a radius of 10 μm and that the mutant Src is distributed throughout, whereas the normal Src is confined to a 4-nm-thick layer immediately beneath the membrane. [For this problem, assume that the membrane has no thickness. The volume of a sphere is $(4/3)\pi r^3$.]

B.
The target (X) for phosphorylation by Src resides in the membrane. Explain why the mutant Src does not cause cell proliferation.

3-13 An antibody binds to another protein with an equilibrium constant, K, of $5 \times 10^{9}\ \mathrm{M}^{-1}$. When it binds to a second, related protein, it forms three fewer hydrogen bonds, reducing its binding affinity by 2.8 kcal/mole. What is the K for its binding to the second protein? (Free-energy change is related to the equilibrium constant by the equation $\Delta G^\circ = -2.3\,RT \log K$, where R is $1.98 \times 10^{-3}$ kcal/(mole K) and T is 310 K.)

3-14 The protein SmpB binds to a special species of tRNA, tmRNA, to eliminate the incomplete proteins made from truncated mRNAs in bacteria. If the binding of SmpB to tmRNA is plotted as fraction tmRNA bound versus SmpB concentration, one obtains a symmetrical S-shaped curve as shown in Figure Q3-3. This curve is a visual display of a very useful relationship between $K_d$ and concentration, which has broad applicability. The general expression for fraction of ligand bound is derived from the equation for $K_d$ ($K_d = [\mathrm{Pr}][\mathrm{L}]/[\mathrm{Pr\text{-}L}]$) by substituting ($[\mathrm{L}]_{TOT} - [\mathrm{L}]$) for $[\mathrm{Pr\text{-}L}]$ and rearranging. Because the total concentration of ligand ($[\mathrm{L}]_{TOT}$) is equal to the free ligand ($[\mathrm{L}]$) plus bound ligand ($[\mathrm{Pr\text{-}L}]$),

fraction bound $= [\mathrm{Pr\text{-}L}]/[\mathrm{L}]_{TOT} = [\mathrm{Pr}]/([\mathrm{Pr}] + K_d)$

For SmpB and tmRNA, the fraction bound $= [\mathrm{tmRNA}]_{bound}/[\mathrm{tmRNA}]_{TOT} = [\mathrm{SmpB}]/([\mathrm{SmpB}] + K_d)$. Using this relationship, calculate the fraction of tmRNA bound for SmpB concentrations equal to $10^{4}K_d$, $10^{3}K_d$, $10^{2}K_d$, $10^{1}K_d$, $K_d$, $10^{-1}K_d$, $10^{-2}K_d$, $10^{-3}K_d$, and $10^{-4}K_d$.

Figure Q3-3 Fraction of tmRNA bound versus SmpB concentration (Problem 3-14).

3-15 Many enzymes obey simple Michaelis-Menten kinetics, which are summarized by the equation

rate $= V_{max}[\mathrm{S}]/([\mathrm{S}] + K_m)$

where $V_{max}$ = maximum velocity, $[\mathrm{S}]$ = concentration of substrate, and $K_m$ = the Michaelis constant. It is instructive to plug a few values of $[\mathrm{S}]$ into the equation to see how rate is affected. What are the rates for $[\mathrm{S}]$ equal to zero, equal to $K_m$, and equal to infinite concentration?

3-16 The enzyme hexokinase adds a phosphate to D-glucose but ignores its mirror image, L-glucose. Suppose that you were able to synthesize hexokinase entirely from D-amino acids, which are the mirror image of the normal L-amino acids.

A. Assuming that the 'D' enzyme would fold to a stable conformation, what relationship would you expect it to bear to the normal 'L' enzyme?

B. Do you suppose the 'D' enzyme would add a phosphate to L-glucose, and ignore D-glucose?

3-17 How do you suppose that a molecule of hemoglobin is able to bind oxygen efficiently in the lungs, and yet release it efficiently in the tissues?

3-18 Synthesis of the purine nucleotides AMP and GMP proceeds by a branched pathway starting with ribose 5-phosphate (R5P), as shown schematically in Figure Q3-4. Using the principles of feedback inhibition, propose a regulatory strategy for this pathway that ensures an adequate supply of both AMP and GMP and minimizes the buildup of the intermediates (A-I) when supplies of AMP and GMP are adequate.

[Figure Q3-4 diagram: linear pathway R5P → A → B → C → D → E, followed by a branch point leading separately to AMP and to GMP]

Figure Q3-4 Schematic diagram of the metabolic pathway for synthesis of AMP and GMP from R5P (Problem 3-18).
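The binding relationship in Problem 3-14 and the Michaelis-Menten equation in Problem 3-15 share the same hyperbolic form, x/(x + constant). For readers who want to explore that shape numerically, here is a minimal Python sketch. The function names, the arbitrary units, and the sample concentrations are illustrative assumptions, and the values used are deliberately not the ones the problems ask for.

```python
def fraction_bound(protein_conc: float, kd: float) -> float:
    """Fraction of ligand bound = [Pr] / ([Pr] + Kd), with both in the same units."""
    if protein_conc < 0 or kd <= 0:
        raise ValueError("protein_conc must be >= 0 and kd must be > 0")
    return protein_conc / (protein_conc + kd)


def michaelis_menten_rate(substrate_conc: float, vmax: float, km: float) -> float:
    """Reaction rate = Vmax [S] / ([S] + Km)."""
    if substrate_conc < 0 or km <= 0:
        raise ValueError("substrate_conc must be >= 0 and km must be > 0")
    return vmax * substrate_conc / (substrate_conc + km)


if __name__ == "__main__":
    kd = 1.0  # arbitrary units, chosen only for illustration
    for multiple in (0.3, 3.0, 30.0):
        bound = fraction_bound(multiple * kd, kd)
        print(f"[Pr] = {multiple} x Kd -> fraction bound = {bound:.3f}")

    vmax, km = 2.0, 1.0  # arbitrary illustrative parameters
    for s in (0.5, 5.0, 50.0):
        print(f"[S] = {s} -> rate = {michaelis_menten_rate(s, vmax, km):.3f}")
```

Plotting either function against the logarithm of concentration reproduces the symmetrical S-shaped curve described for Figure Q3-3, centered where the concentration equals the constant in the denominator.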



DNA, Chromosomes, and Genomes

Life depends on the ability of cells to store, retrieve, and translate the genetic instructions required to make and maintain a living organism. This hereditary information is passed on from a cell to its daughter cells at cell division, and from one generation of an organism to the next through the organism's reproductive cells. These instructions are stored within every living cell as its genes, the information-containing elements that determine the characteristics of a species as a whole and of the individuals within it.

As soon as genetics emerged as a science at the beginning of the twentieth century, scientists became intrigued by the chemical structure of genes. The information in genes is copied and transmitted from cell to daughter cell millions of times during the life of a multicellular organism, and it survives the process essentially unchanged. What form of molecule could be capable of such accurate and almost unlimited replication and also be able to direct the development of an organism and the daily life of a cell? What kind of instructions does the genetic information contain? How can the enormous amount of information required for the development and maintenance of an organism fit within the tiny space of a cell?

The answers to several of these questions began to emerge in the 1940s. At this time, researchers discovered, from studies in simple fungi, that genetic information consists primarily of instructions for making proteins. Proteins are the macromolecules that perform most cell functions: they serve as building blocks for cell structures and form the enzymes that catalyze the cell's chemical reactions (Chapter 3), they regulate gene expression (Chapter 7), and they enable cells to communicate with each other (Chapter 15) and to move (Chapter 16). The properties and functions of a cell are determined largely by the proteins that it is able to make.
With hindsight, it is hard to imagine what other type of instructions the genetic information could have contained. Painstaking observations of cells and embryos in the late tgth century had led to the recognition that the hereditary information is carried on chromosomes,threadlike structures in the nucleus of a eucaryotic cell that become visible by light microscopy as the cell begins to divide (Figure 4-l). Later, as biochemical analysisbecame possible, chromosomes were found to consist of both deoxyribonucleic acid (DNA) and protein. For many decades, the DNA was thought to be merely a structural element. However, the other crucial advance made in the 1940swas the identification of DNA as the likely carrier of genetic information. This breakthrough in our understanding of cells came from studies

Figure 4-1 Chromosomes in cells. (A) Two adjacent plant cells photographed through a light microscope. The DNA has been stained with a fluorescent dye (DAPI) that binds to it. The DNA is present in chromosomes, which become visible as distinct structures in the light microscope only when they become compact, sausage-shaped structures in preparation for cell division, as shown on the left. The cell on the right, which is not dividing, contains identical chromosomes, but they cannot be clearly distinguished in the light microscope at this phase in the cell's life cycle, because they are in a more extended conformation. (B) Schematic diagram of the outlines of the two cells along with their chromosomes. (A, courtesy of Peter Shaw.)





[Figure 4-1 labels: dividing cell; nondividing cell]



Chapter 4: DNA, Chromosomes, and Genomes

of inheritance in bacteria (Figure 4-2). But as the 1950s began, both how proteins could be specified by instructions in the DNA and how this information might be copied for transmission from cell to cell seemed completely mysterious. The mystery was suddenly solved in 1953, when the structure of DNA was correctly predicted by James Watson and Francis Crick. As outlined in Chapter 1, the double-helical structure of DNA immediately solved the problem of how the information in this molecule might be copied, or replicated. It also provided the first clues as to how a molecule of DNA might use the sequence of its subunits to encode the instructions for making proteins. Today, the fact that DNA is the genetic material is so fundamental to biological thought that it is difficult to appreciate the enormous intellectual gap that was filled.

In this chapter we begin by describing the structure of DNA. We see how, despite its chemical simplicity, the structure and chemical properties of DNA make it ideally suited as the raw material of genes. We then consider how the many proteins in chromosomes arrange and package this DNA. The packing has to be done in an orderly fashion so that the chromosomes can be replicated and apportioned correctly between the two daughter cells at each cell division. It must also allow access to chromosomal DNA for the enzymes that repair it when it is damaged and for the specialized proteins that direct the expression of its many genes. We shall also see how the packaging of DNA differs along the length of each chromosome in eucaryotes, and how it can store a valuable record of the cell's developmental history.

In the past two decades, there has been a revolution in our ability to determine the exact sequence of subunits in DNA molecules. As a result, we now know the order of the 3 billion DNA subunits that provide the information for producing a human adult from a fertilized egg, as well as the DNA sequences of thousands of other organisms. Detailed analyses of these sequences have provided exciting insights into the process of evolution, and it is with this subject that the chapter ends.

This is the first of four chapters that deal with basic genetic mechanisms: the ways in which the cell maintains, replicates, expresses, and occasionally improves the genetic information carried in its DNA. This chapter presents a broad overview of DNA and how it is packaged into chromosomes. In the following chapter (Chapter 5) we discuss the mechanisms by which the cell accurately replicates and repairs DNA; we also describe how DNA sequences can be


[Figure 4-2 labels. (A) S strain cells: smooth pathogenic bacterium, causes pneumonia. RANDOM MUTATION. R strain cells: rough nonpathogenic mutant bacterium. Live R strain cells grown in presence of either heat-killed S strain cells or cell-free extract of S strain cells. TRANSFORMATION. Some R strain cells are transformed to S strain cells, whose daughters are pathogenic and cause pneumonia. CONCLUSION: Molecules that can carry heritable information are present in S strain cells. (B) Fractionation of cell-free extract into classes of purified molecules (labels surviving in this copy: lipid, carbohydrate); molecules tested for transformation of R strain cells. CONCLUSION: The molecule that carries the heritable information is DNA.]


Figure 4-2 The first experimental demonstration that DNA is the genetic material. These experiments, carried out in the 1940s, showed that adding purified DNA to a bacterium changed its properties and that this change was faithfully passed on to subsequent generations. Two closely related strains of the bacterium Streptococcus pneumoniae differ from each other in both their appearance under the microscope and their pathogenicity. One strain appears smooth (S) and causes death when injected into mice, and the other appears rough (R) and is nonlethal. (A) An initial experiment shows that a substance present in the S strain can change (or transform) the R strain into the S strain and that this change is inherited by subsequent generations of bacteria. (B) This experiment, in which the R strain has been incubated with various classes of biological molecules purified from the S strain, identifies the substance as DNA.


rearranged through the process of genetic recombination. Gene expression, the process through which the information encoded in DNA is interpreted by the cell to guide the synthesis of proteins, is the main topic of Chapter 6. In Chapter 7, we describe how this gene expression is controlled by the cell to ensure that each of the many thousands of proteins and RNA molecules encrypted in its DNA is manufactured only at the proper time and place in the life of the cell.

THE STRUCTURE AND FUNCTION OF DNA

Biologists in the 1940s had difficulty in conceiving how DNA could be the genetic material because of the apparent simplicity of its chemistry. DNA was known to be a long polymer composed of only four types of subunits, which resemble one another chemically. Early in the 1950s, DNA was examined by x-ray diffraction analysis, a technique for determining the three-dimensional atomic structure of a molecule (discussed in Chapter 8). The early x-ray diffraction results indicated that DNA was composed of two strands of the polymer wound into a helix. The observation that DNA was double-stranded was of crucial significance and provided one of the major clues that led to the Watson-Crick model for DNA structure. But only when this model was proposed in 1953 did DNA's potential for replication and information encoding become apparent. In this section we examine the structure of the DNA molecule and explain in general terms how it is able to store hereditary information.

A DNA Molecule Consists of Two Complementary Chains of Nucleotides

A deoxyribonucleic acid (DNA) molecule consists of two long polynucleotide chains composed of four types of nucleotide subunits. Each of these chains is known as a DNA chain, or a DNA strand. Hydrogen bonds between the base portions of the nucleotides hold the two chains together (Figure 4-3). As we saw in Chapter 2 (Panel 2-6, pp. 116-117), nucleotides are composed of a five-carbon sugar to which are attached one or more phosphate groups and a nitrogen-containing base. In the case of the nucleotides in DNA, the sugar is deoxyribose attached to a single phosphate group (hence the name deoxyribonucleic acid), and the base may be either adenine (A), cytosine (C), guanine (G), or thymine (T). The nucleotides are covalently linked together in a chain through the sugars and phosphates, which thus form a "backbone" of alternating sugar-phosphate-sugar-phosphate. Because only the base differs in each of the four types of subunits, each polynucleotide chain in DNA is analogous to a necklace (the backbone) strung with four types of beads (the four bases A, C, G, and T). These same symbols (A, C, G, and T) are also commonly used to denote the four different nucleotides, that is, the bases with their attached sugar and phosphate groups.

The way in which the nucleotide subunits are linked together gives a DNA strand a chemical polarity. If we think of each sugar as a block with a protruding knob (the 5' phosphate) on one side and a hole (the 3' hydroxyl) on the other (see Figure 4-3), each completed chain, formed by interlocking knobs with holes, will have all of its subunits lined up in the same orientation. Moreover, the two ends of the chain will be easily distinguishable, as one has a hole (the 3' hydroxyl) and the other a knob (the 5' phosphate) at its terminus. This polarity in a DNA chain is indicated by referring to one end as the 3' end and the other as the 5' end.

The three-dimensional structure of DNA, the double helix, arises from the chemical and structural features of its two polynucleotide chains. Because these two chains are held together by hydrogen bonding between the bases on the different strands, all the bases are on the inside of the double helix, and the sugar-phosphate backbones are on the outside (see Figure 4-3). In each case, a bulkier two-ring base (a purine; see Panel 2-6, pp. 116-117) is paired with a single-ring base (a pyrimidine); A always pairs with T and G with C (Figure




[Figure 4-3 labels: building blocks of DNA (phosphate, sugar, base); DNA double helix; hydrogen-bonded base pairs]

Figure 4-3 DNA and its building blocks. DNA is made of four types of nucleotides, which are linked covalently into a polynucleotide chain (a DNA strand) with a sugar-phosphate backbone from which the bases (A, C, G, and T) extend. A DNA molecule is composed of two DNA strands held together by hydrogen bonds between the paired bases. The arrowheads at the ends of the DNA strands indicate the polarities of the two strands, which run antiparallel to each other in the DNA molecule. In the diagram at the bottom left of the figure, the DNA molecule is shown straightened out; in reality, it is twisted into a double helix, as shown on the right. For details, see Figure 4-5.

4-4). This complementary base-pairing enables the base pairs to be packed in the energetically most favorable arrangement in the interior of the double helix. In this arrangement, each base pair is of similar width, thus holding the sugar-phosphate backbones an equal distance apart along the DNA molecule. To maximize the efficiency of base-pair packing, the two sugar-phosphate backbones


[Figure 4-4 labels: sugar-phosphate backbone; hydrogen bond]



Figure 4-4 Complementary base pairs in the DNA double helix. The shapes and chemical structure of the bases allow hydrogen bonds to form efficiently only between A and T and between G and C, where atoms that are able to form hydrogen bonds (see Panel 2-3, pp. 110-111) can be brought close together without distorting the double helix. As indicated, two hydrogen bonds form between A and T, while three form between G and C. The bases can pair in this way only if the two polynucleotide chains that contain them are antiparallel to each other.




[Figure 4-5 labels: minor groove; 0.34 nm; 5' end; 3' end]

Figure 4-5 The DNA double helix. (A) A space-filling model of 1.5 turns of the DNA double helix. Each turn of DNA is made up of 10.4 nucleotide pairs, and the center-to-center distance between adjacent nucleotide pairs is 0.34 nm. The coiling of the two strands around each other creates two grooves in the double helix: the wider groove is called the major groove, and the smaller the minor groove. (B) A short section of the double helix viewed from its side, showing four base pairs. The nucleotides are linked together covalently by phosphodiester bonds that join the 3' hydroxyl (-OH) group of one sugar to the 5' hydroxyl group of the next sugar. Thus, each polynucleotide strand has a chemical polarity; that is, its two ends are chemically different. The 5' end of the DNA polymer is by convention often illustrated carrying a phosphate group, while the 3' end is shown with a hydroxyl.


wind around each other to form a double helix, with one complete turn roughly every ten base pairs (Figure 4-5). The members of each base pair can fit together within the double helix only if the two strands of the helix are antiparallel, that is, only if the polarity of one strand is oriented opposite to that of the other strand (see Figures 4-3 and 4-4). A consequence of these base-pairing requirements is that each strand of a DNA molecule contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand.
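The pairing and polarity rules described above can be illustrated with a short sketch. The Python below is only an illustration under simple assumptions: a strand is represented as a string of the letters A, C, G, and T written from its 5' end to its 3' end, and the names `partner_strand` and `hydrogen_bonds` are hypothetical helpers introduced here, not terms from the chapter.

```python
# Base-pairing rules from the text: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair (Figure 4-4): two for A-T, three for G-C.
BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    Complementing each base gives the partner read 3' to 5'.
    Reversing that result rewrites it in the usual 5'-to-3' direction,
    which is how the antiparallel arrangement shows up in text form.
    """
    complemented = [PAIRING[base] for base in strand_5_to_3.upper()]
    return "".join(reversed(complemented))


def hydrogen_bonds(strand_5_to_3: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(BONDS[base] for base in strand_5_to_3.upper())


top = "ATGCCG"
print(partner_strand(top))   # CGGCAT
print(hydrogen_bonds(top))   # 16
```

Applying `partner_strand` twice returns the original sequence, which is one way of stating that each strand fully specifies the other.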

The Structure of DNA Provides a Mechanism for Heredity

Genes carry biological information that must be copied accurately for transmission to the next generation each time a cell divides to form two daughter cells. Two central biological questions arise from these requirements: how can the information for specifying an organism be carried in chemical form, and how is it accurately copied? The discovery of the structure of the DNA double helix was a landmark in twentieth-century biology because it immediately suggested answers to both questions, thereby providing a molecular explanation for the problem of heredity. We discuss these answers briefly in this section, and we shall examine them in much more detail in subsequent chapters.

DNA encodes information through the order, or sequence, of the nucleotides along each strand. Each base (A, C, T, or G) can be considered as a letter in a four-letter alphabet that spells out biological messages in the chemical structure of the DNA. As we saw in Chapter 1, organisms differ from one another because their respective DNA molecules have different nucleotide sequences and, consequently, carry different biological messages. But how is the nucleotide alphabet used to make messages, and what do they spell out?

As discussed above, it was known well before the structure of DNA was determined that genes contain the instructions for producing proteins. The DNA messages must therefore somehow encode proteins (Figure 4-6). This relationship immediately makes the problem easier to understand. As discussed in Chapter 3, the properties of a protein, which are responsible for its biological function, are determined by its three-dimensional structure. This structure is determined in turn by the linear sequence of the amino acids of which it is composed. The linear sequence of nucleotides in a gene must therefore somehow spell out the linear sequence of amino acids in a protein. The exact correspondence between the four-letter nucleotide alphabet of DNA and the twenty-letter amino acid alphabet of proteins, the genetic code, is not obvious from the DNA structure, and it took over a decade after the discovery of the double helix

[Figure 4-6 labels: gene A, gene B, gene C; gene expression; protein A, protein B, protein C]

Figure 4-6 The relationship between genetic information carried in DNA and proteins (discussed in Chapter 1).



before it was worked out. In Chapter 6 we will describe this code in detail in the course of elaborating the process, known as gene expression, through which a cell converts the nucleotide sequence of a gene first into the nucleotide sequence of an RNA molecule, and then into the amino acid sequence of a protein.

The complete set of information in an organism's DNA is called its genome, and it carries the information for all the proteins and RNA molecules that the organism will ever synthesize. (The term genome is also used to describe the DNA that carries this information.) The amount of information contained in genomes is staggering: for example, a typical human diploid cell contains 2 meters of DNA double helix. Written out in the four-letter nucleotide alphabet, the nucleotide sequence of a very small human gene occupies a quarter of a page of text (Figure 4-7), while the complete sequence of nucleotides in the human genome would fill more than a thousand books the size of this one. In addition to other critical information, it carries the instructions for roughly 24,000 distinct proteins.

At each cell division, the cell must copy its genome to pass it to both daughter cells. The discovery of the structure of DNA also revealed the principle that makes this copying possible: because each strand of DNA contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand, each strand can act as a template, or mold, for the synthesis of a new complementary strand. In other words, if we designate the two DNA strands as S and S', strand S can serve as a template for making a new strand S', while strand S' can serve as a template for making a new strand S (Figure 4-8). Thus, the genetic information in DNA can be accurately copied by the beautifully simple process in which strand S separates from strand S', and each separated strand then serves as a template for the production of a new complementary partner strand that is identical to its former partner. The ability of each strand of a DNA molecule to act as a template for producing a complementary strand enables a cell to copy, or replicate, its genome before passing it on to its descendants. In the next chapter we shall describe the elegant machinery the cell uses to perform this enormous task.
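The templating principle just described can be sketched in a few lines of Python. This is an idealized illustration, not a model of the cell's replication machinery (the subject of Chapter 5): it assumes error-free copying, represents each strand as a string written 5' to 3', and uses the hypothetical names `synthesize_partner` and `replicate`.

```python
# Base-pairing rules: A with T, G with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_partner(template: str) -> str:
    """Build the strand complementary to a template written 5' to 3'.

    The template is read from its 3' end, so the new strand also
    comes out written 5' to 3' (the two strands are antiparallel).
    """
    return "".join(PAIRING[base] for base in reversed(template))


def replicate(duplex):
    """Copy a duplex (S, S') into two daughter duplexes.

    Each daughter keeps one parental strand and gains one newly
    made strand templated on that parental strand.
    """
    s, s_prime = duplex
    daughter_1 = (s, synthesize_partner(s))              # old S, new S'
    daughter_2 = (synthesize_partner(s_prime), s_prime)  # new S, old S'
    return daughter_1, daughter_2


parent = ("ATGCCG", "CGGCAT")   # S and S', each written 5' to 3'
d1, d2 = replicate(parent)
print(d1 == parent and d2 == parent)   # True
```

Both daughter duplexes carry the same sequence as the parent, even though each contains only one of the original strands.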



Figure 4-7 The nucleotide sequence of the human β-globin gene. By convention, a nucleotide sequence is written from its 5' end to its 3' end, and it should be read from left to right in successive lines down the page as though it were normal English text. This gene carries the information for the amino acid sequence of one of the two types of subunits of the hemoglobin molecule, the protein that carries oxygen in the blood. A different gene, the α-globin gene, carries the information for the other type of hemoglobin subunit (a hemoglobin molecule has four subunits, two of each type). Only one of the two strands of the DNA double helix containing the β-globin gene is shown; the other strand has the exact complementary sequence. The DNA sequences highlighted in yellow show the three regions of the gene that specify the amino acid sequence for the β-globin protein. We shall see in Chapter 6 how the cell splices these three sequences together at the level of messenger RNA in order to synthesize a full-length β-globin protein.

In Eucaryotes, DNA Is Enclosed in a Cell Nucleus

As described in Chapter 1, nearly all the DNA in a eucaryotic cell is sequestered in a nucleus, which in many cells occupies about 10% of the total cell volume. This compartment is delimited by a nuclear envelope formed by two concentric lipid bilayer membranes (Figure 4-9). These membranes are punctured at intervals by large nuclear pores, which transport molecules between the nucleus and the cytosol. The nuclear envelope is directly connected to the extensive membranes of the endoplasmic reticulum, which extend out from it into the cytoplasm. And it is mechanically supported by a network of intermediate filaments called the nuclear lamina, which forms a thin sheetlike meshwork just beneath the inner nuclear membrane (see Figure 4-9B). The nuclear envelope allows the many proteins that act on DNA to be concentrated where they are needed in the cell, and, as we see in subsequent



[Figure 4-8 labels: parental DNA double helix (S strand, S' strand); template S strand with new S' strand; template S' strand with new S strand; daughter DNA double helices]

Figure 4-8 DNA as a template for its own duplication. As the nucleotide A successfully pairs only with T, and G with C, each strand of DNA can act as a template to specify the sequence of nucleotides in its complementary strand. In this way, double-helical DNA can be copied precisely, with each parental DNA helix producing two identical daughter DNA helices.

chapters, it also keeps nuclear and cytosolic enzymes separate, a feature that is crucial for the proper functioning of eucaryotic cells. Compartmentalization, of which the nucleus is an example, is an important principle of biology; it serves to establish an environment in which biochemical reactions are facilitated by the high concentration of both substrates and the enzymes that act on them. Compartmentalization also prevents enzymes needed in one part of the cell from interfering with the orderly biochemical pathways in another.

Summary

Genetic information is carried in the linear sequence of nucleotides in DNA. Each molecule of DNA is a double helix formed from two complementary strands of nucleotides held together by hydrogen bonds between G-C and A-T base pairs. Duplication of the genetic information occurs by the use of one DNA strand as a template for the formation of a complementary strand. The genetic information stored in an organism's DNA contains the instructions for all the proteins the organism will ever synthesize and is said to comprise its genome. In eucaryotes, DNA is contained in the cell nucleus, a large membrane-bound compartment.

[Figure 4-9 labels: endoplasmic reticulum; peripheral heterochromatin; DNA and associated proteins (chromatin), plus many RNA and protein molecules; nucleolus; microtubule; nuclear lamina; nuclear pore; inner nuclear membrane]


Figure 4-9 A cross-sectional view of a typical cell nucleus. (A) Electron micrograph of a thin section through the nucleus of a human fibroblast. (B) Schematic drawing, showing that the nuclear envelope consists of two membranes, the outer one being continuous with the endoplasmic reticulum membrane (see also Figure 12-8). The space inside the endoplasmic reticulum (the ER lumen) is colored yellow; it is continuous with the space between the two nuclear membranes. The lipid bilayers of the inner and outer nuclear membranes are connected at each nuclear pore. A sheet-like network of intermediate filaments (brown) inside the nucleus provides mechanical support for the nuclear envelope, forming a special supporting structure called the nuclear lamina (for details, see Chapter 12). The heterochromatin near the lamina contains specially condensed regions of DNA that will be discussed later.



CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER

The most important function of DNA is to carry genes, the information that specifies all the proteins and RNA molecules that make up an organism, including information about when, in what types of cells, and in what quantity each protein is to be made. The genomes of eucaryotes are divided up into chromosomes, and in this section we see how genes are typically arranged on each chromosome. In addition, we describe the specialized DNA sequences that are required for a chromosome to be accurately duplicated and passed on from one generation to the next.

We also confront the serious challenge of DNA packaging. If the double helices comprising all 46 chromosomes in a human cell could be laid end-to-end, they would reach approximately 2 meters; yet the nucleus, which contains the DNA, is only about 6 µm in diameter. This is geometrically equivalent to packing 40 km (24 miles) of extremely fine thread into a tennis ball! The complex task of packaging DNA is accomplished by specialized proteins that bind to and fold the DNA, generating a series of coils and loops that provide increasingly higher levels of organization, preventing the DNA from becoming an unmanageable tangle. Amazingly, although the DNA is very tightly folded, it is compacted in a way that keeps it available to the many enzymes in the cell that replicate it, repair it, and use its genes to produce RNA molecules and proteins.
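The 2-meter figure quoted above can be checked with simple arithmetic. The Python sketch below uses values given elsewhere in this chapter (about 3.2 × 10^9 nucleotide pairs per haploid human genome, 0.34 nm between adjacent nucleotide pairs, and a nucleus about 6 µm across). It assumes, as a simplification, that a diploid cell carries exactly twice the haploid amount of DNA; the constant and function names are hypothetical.

```python
# Values taken from the chapter; the names are illustrative only.
NM_PER_NUCLEOTIDE_PAIR = 0.34   # rise per nucleotide pair along the helix
HAPLOID_NUCLEOTIDE_PAIRS = 3.2e9
NUCLEUS_DIAMETER_M = 6e-6       # about 6 micrometers


def duplex_length_m(nucleotide_pairs: float) -> float:
    """Length in meters of a fully extended double helix."""
    return nucleotide_pairs * NM_PER_NUCLEOTIDE_PAIR * 1e-9


# Simplifying assumption: diploid content is twice the haploid content.
diploid_length_m = duplex_length_m(2 * HAPLOID_NUCLEOTIDE_PAIRS)

# How many times longer the DNA is than the nucleus is wide.
length_to_diameter = diploid_length_m / NUCLEUS_DIAMETER_M

print(f"Total DNA length: {diploid_length_m:.2f} m")          # 2.18 m
print(f"Length / nuclear diameter: {length_to_diameter:.1e}")  # 3.6e+05
```

The result, a little over 2 meters of DNA in a compartment a few micrometers wide, is consistent with the figure stated in the text; the ratio considers length only and ignores the thickness of the DNA thread.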

Eucaryotic DNA Is Packaged into a Set of Chromosomes

In eucaryotes, the DNA in the nucleus is divided between a set of different chromosomes. For example, the human genome, approximately 3.2 × 10^9 nucleotides, is distributed over 24 different chromosomes. Each chromosome consists of a single, enormously long linear DNA molecule associated with proteins that fold and pack the fine DNA thread into a more compact structure. The complex of DNA and protein is called chromatin (from the Greek chroma, "color," because of its staining properties). In addition to the proteins involved in packaging the DNA, chromosomes are also associated with many proteins and RNA molecules required for the processes of gene expression, DNA replication, and DNA repair.

Bacteria carry their genes on a single DNA molecule, which is often circular (see Figure 1-29). This DNA is associated with proteins that package and condense the DNA, but they are different from the proteins that perform these functions in eucaryotes. Although often called the bacterial "chromosome," it does not have the same structure as eucaryotic chromosomes, and less is known about how the bacterial DNA is packaged. Therefore, our discussion of chromosome structure will focus almost entirely on eucaryotic chromosomes.

With the exception of the germ cells (discussed in Chapter 21) and a few highly specialized cell types that cannot multiply and lack DNA altogether (for example, red blood cells), each human cell contains two copies of each chromosome, one inherited from the mother and one from the father. The maternal and paternal chromosomes of a pair are called homologous chromosomes (homologs). The only nonhomologous chromosome pairs are the sex chromosomes in males, where a Y chromosome is inherited from the father and an X chromosome from the mother. Thus, each human cell contains a total of 46 chromosomes: 22 pairs common to both males and females, plus two so-called sex chromosomes (X and Y in males, two Xs in females).

DNA hybridization is a technique in which a labeled nucleic acid strand serves as a "probe" that localizes a complementary strand, as will be described in detail in Chapter 8. This technique can be used to distinguish these human chromosomes by "painting" each one a different color (Figure 4-10). Chromosome painting is typically done at the stage in the cell cycle called mitosis, when chromosomes are especially compacted and easy to visualize (see below). Another more traditional way to distinguish one chromosome from another is to stain them with dyes that produce a pattern of bands





along each mitotic chromosome (Figure 4-11). The structural bases for these banding patterns are not well understood. Nevertheless, the pattern of bands on each type of chromosome is unique, and it is these patterns that initially allowed each human chromosome to be identified and numbered.

The display of the 46 human chromosomes at mitosis is called the human karyotype. If parts of chromosomes are lost or are switched between chromosomes, these changes can be detected by changes in the banding patterns or by changes in the pattern of chromosome painting (Figure 4-12). Cytogeneticists use these alterations to detect chromosome abnormalities that are associated with inherited defects, as well as to characterize cancers that are associated with specific chromosome rearrangements in somatic cells (discussed in Chapter 20).




Figure 4-10 The complete set of human chromosomes. These chromosomes, from a male, were isolated from a cell undergoing nuclear division (mitosis) and are therefore highly compacted. Each chromosome has been "painted" a different color to permit its unambiguous identification under the light microscope. Chromosome painting is performed by exposing the chromosomes to a collection of human DNA molecules that have been coupled to a combination of fluorescent dyes. For example, DNA molecules derived from chromosome 1 are labeled with one specific dye combination, those from chromosome 2 with another, and so on. Because the labeled DNA can form base pairs, or hybridize, only to the chromosome from which it was derived (discussed in Chapter 8), each chromosome is differently labeled. For such experiments, the chromosomes are subjected to treatments that separate the double-helical DNA into individual strands, designed to permit base-pairing with the single-stranded labeled DNA while keeping the chromosome structure relatively intact. (A) The chromosomes visualized as they originally spilled from the lysed cell. (B) The same chromosomes artificially lined up in their numerical order. This arrangement of the full chromosome set is called a karyotype. (From E. Schrock et al., Science 273:494-497, 1996. With permission from AAAS.)

Figure 4-11 The banding patterns of human chromosomes. Chromosomes 1-22 are numbered in approximate order of size. A typical human somatic (non-germ-line) cell contains two of each of these chromosomes, plus two sex chromosomes: two X chromosomes in a female, one X and one Y chromosome in a male. The chromosomes used to make these maps were stained at an early stage in mitosis, when the chromosomes are incompletely compacted. The horizontal red line represents the position of the centromere (see Figure 4-21), which appears as a constriction on mitotic chromosomes. The red knobs on chromosomes 13, 14, 15, 21, and 22 indicate the positions of genes that code for the large ribosomal RNAs (discussed in Chapter 6). These patterns are obtained by staining chromosomes with Giemsa stain, and they can be observed under the light microscope. (For micrographs, see Figure 21-18; adapted from U. Franke, Cytogenet. Cell Genet. 31:24-32, 1981. With



Figure 4-12 [...] (A) [...] ataxia, a disease characterized by progressive deterioration of motor skills. The patient has a normal pair of chromosome 4s (left-hand pair), but one [...] aberrant chromosome 12 (red bracket) was deduced, from its pattern of bands, as a copy of part of chromosome 4 that had become attached to [...] pairs, "painted" red for chromosome 4 DNA and blue for chromosome 12 DNA. The two techniques give rise to the same conclusion regarding the nature of the aberrant chromosome 12, but chromosome painting provides better resolution, allowing the clear identification of even short pieces of chromosomes that have become translocated. However, Giemsa staining is easier to perform. (Adapted from E. Schrock et al., Science 273:494-497, 1996. With permission from AAAS.)

Chromosomes Contain Long Strings of Genes

Chromosomes carry genes, the functional units of heredity. A gene is usually defined as a segment of DNA that contains the instructions for making a particular protein (or a set of closely related proteins). Although this definition holds for the majority of genes, several percent of genes produce an RNA molecule, instead of a protein, as their final product. Like proteins, these RNA molecules perform a diverse set of structural and catalytic functions in the cell, and we discuss them in detail in subsequent chapters.

As might be expected, some correlation exists between the complexity of an organism and the number of genes in its genome (see Table 1-1, p. 18). For example, some simple bacteria have only 500 genes, compared to about 25,000 for humans. Bacteria and some single-celled eucaryotes, such as yeast, have especially concise genomes; the complete nucleotide sequence of their genomes reveals that the DNA molecules that make up their chromosomes are little more than strings of closely packed genes (Figure 4-13). However, chromosomes from many eucaryotes (including humans) contain, in addition to genes, a large excess of interspersed DNA that does not seem to carry critical information. Sometimes called "junk DNA" to signify that its usefulness to the cell has not been demonstrated, the particular nucleotide sequence of most of this DNA may not be important. However, some of this DNA is crucial for the proper expression of certain genes, as we discuss elsewhere.

Because of differences in the amount of DNA interspersed between genes, genome sizes can vary widely (see Figure 1-37). For example, the human genome is 200 times larger than that of the yeast S. cerevisiae, but 30 times smaller than that of some plants and amphibians and 200 times smaller than that of a species of amoeba. Moreover, because of differences in the amount of excess DNA, the genomes of similar organisms (bony fish, for example) can vary several hundredfold in their DNA content, even though they contain roughly the same number of genes. Whatever the excess DNA may do, it seems clear that it is not a great handicap for a eucaryotic cell to carry a large amount of it.

How the genome is divided into chromosomes also differs from one eucaryotic species to the next. For example, compared with 46 for humans, somatic cells from a species of small deer contain only 6 chromosomes, while those from a species of carp contain over 100. Even closely related species with similar genome sizes can have very different numbers and sizes of chromosomes (Figure 4-14). Thus, there is no simple relationship between chromosome number,

[Figure 4-13 labels: 0.5% of the DNA of the yeast genome; scale bar, 10,000 nucleotide pairs]

Figure 4-13 The arrangement of genes in the genome of S. cerevisiae. S. cerevisiae is a budding yeast widely used for brewing and baking. The genome of this yeast cell is distributed over 16 chromosomes. A small region of one chromosome has been arbitrarily selected to show the high density of genes characteristic of this species. As indicated by the light red shading, some genes are transcribed from the lower strand, while others are transcribed from the upper strand. There are about 6300 genes in the complete genome, which contains somewhat more than 12 million nucleotide pairs. (For the closely packed genes of a bacterium whose genome is 4.6 million nucleotides long, see Figure 1-29.)

species complexity, and total genome size. Rather, the genomes and chromosomes of modern-day species have each been shaped by a unique history of seemingly random genetic events, acted on by selection pressures over long evolutionary times.

Figure 4-14 Two closely related species of deer with very different chromosome numbers. In the evolution of the Indian muntjac, initially separate chromosomes fused, without having a major effect on the animal. These two species contain a similar number of genes. (Adapted from M.W. Strickberger, Evolution, 3rd ed. Sudbury, MA: Jones & Bartlett Publishers, 2000.)

[Figure 4-14 labels: Chinese muntjac; Indian muntjac]

The Nucleotide Sequence of the Human Genome Shows How Our Genes Are Arranged

In Chapter 1 we discussed, in general terms, how the information in DNA is read out and used, through RNA intermediates, to make proteins (see Figure 1-4). In 1999, it became possible for the first time to see exactly how genes are arranged along an entire vertebrate chromosome (Figure 4-15). Today, with the publication of the "first draft" of the entire human genome in 2001 and the "finished" DNA sequence in 2004, the genetic information in all human chromosomes is available. The sheer quantity of information provided by the Human Genome Project is staggering (Figure 4-16 and Table 4-1). At its peak, the Project generated raw nucleotide sequences at a rate of 1000 nucleotides per second around the clock. It will be many decades before this information is fully analyzed, but it has already stimulated new experiments that have had major effects on the content of every chapter in this book.

The first striking feature of the human genome is how little of it (only a few percent) codes for proteins (Figure 4-17). Much of the remaining chromosomal DNA is made up of short, mobile pieces of DNA that have gradually inserted themselves in the chromosome over evolutionary time. We discuss these transposable elements in detail in later chapters.

A second notable feature of the human genome is the large average gene size of 27,000 nucleotide pairs. As discussed above, a gene carries in its linear sequence of nucleotides the information for the linear sequence of the amino acids of a protein. Only about 1300 nucleotide pairs are required to encode a protein of average size (about 430 amino acids in humans). Most of the remaining DNA in a gene consists of long stretches of noncoding DNA that interrupt the relatively short segments of DNA that code for protein. As will be discussed in detail in Chapter 6, the coding sequences are called exons; the intervening (noncoding) sequences in genes are called introns (see Figure 4-15 and Table 4-1).

The majority of human genes thus consist of a long string of alternating exons and introns, with most of the gene consisting of introns. In contrast, the majority of genes from organisms with concise genomes lack introns. This accounts for the much smaller size of their genes (about one-twentieth that of human genes), as well as for the much higher fraction of coding DNA in their chromosomes.

In addition to introns and exons, each gene is associated with regulatory DNA sequences, which are responsible for ensuring that the gene is turned on or off at the proper time, expressed at the appropriate level, and only in the proper type of cell. In humans, the regulatory sequences for a typical gene are spread out over tens of thousands of nucleotide pairs. As would be expected, these regulatory sequences are more compressed in organisms with concise genomes. We discuss in Chapter 7 how regulatory DNA sequences work.

Finally, the nucleotide sequence of the human genome has revealed that the critical information needed to produce a human seems to be in an alarming state of disarray. As one commentator described our genome, "In some ways it may resemble your garage/bedroom/refrigerator/life: highly individualistic, but unkempt; little evidence of organization; much accumulated clutter (referred to by the uninitiated as 'junk'); virtually nothing ever discarded; and the few patently valuable items indiscriminately, apparently carelessly, scattered throughout."

Figure 4-15 The organization of genes on a human chromosome. (A) Chromosome 22, one of the smallest human chromosomes, contains 48 × 10^6 nucleotide pairs and makes up approximately 1.5% of the entire human genome. Most of the left arm of chromosome 22 consists of short repeated sequences of DNA that are packaged in a particularly compact form of chromatin (heterochromatin), which is discussed later in this chapter. (B) A tenfold expansion of a portion of chromosome 22, with about 40 genes indicated. Those in dark brown are known genes and those in red are predicted genes. (C) An expanded portion of (B) shows the entire length of several genes. (D) The intron-exon arrangement of a typical gene is shown after a further tenfold expansion. Each exon (red) codes for a portion of the protein, while the DNA sequence of the introns (gray) is relatively unimportant, as discussed in detail in Chapter 6.
The human genome (3.2 × 10^9 nucleotide pairs) is the totality of genetic information belonging to our species. Almost all of this genome is distributed over the 22 autosomes and 2 sex chromosomes (see Figures 4-10 and 4-11) found within the nucleus. A minute fraction of the human genome (16,569 nucleotide pairs, in multiple copies per cell) is found in the mitochondria (introduced in Chapter 1, and discussed in detail in Chapter 14). The term human genome sequence refers to the complete nucleotide sequence of DNA in the 24 nuclear chromosomes and the mitochondria. Being diploid, a human somatic cell nucleus contains roughly twice the haploid amount of DNA, or 6.4 × 10^9 nucleotide pairs, when not duplicating its chromosomes in preparation for division. (Adapted from International Human Genome Sequencing Consortium, Nature 409:860-921, 2001. With permission from Macmillan Publishers Ltd.)

Figure 4-16 Scale of the human genome. If drawn with a 1 mm space between each nucleotide, as in (A), the human genome would extend 3200 km (approximately 2000 miles), far enough to stretch across the center of Africa, the site of our human origins (red line). At this scale, there would be, on average, a protein-coding gene every 130 m. An average gene would extend for 30 m, but the coding sequences in this gene would add up to only just over a meter.

Table 4-1 Some Vital Statistics for the Human Genome

DNA length                                                        3.2 × 10^9 nucleotide pairs*
Number of genes                                                   approximately 25,000
Largest gene                                                      2.4 × 10^6 nucleotide pairs
Mean gene size                                                    27,000 nucleotide pairs
Smallest number of exons per gene                                 1
Largest number of exons per gene                                  178
Mean number of exons per gene                                     10.4
Largest exon size                                                 17,106 nucleotide pairs
Mean exon size                                                    145 nucleotide pairs
Number of pseudogenes**                                           more than 20,000
Percentage of DNA sequence in exons (protein coding sequences)    1.5%
Percentage of DNA in other highly conserved sequences***          3.5%
Percentage of DNA in high-copy repetitive elements                approximately 50%

* The sequence of 2.85 billion nucleotides is known precisely (error rate of only about one in 100,000 nucleotides). The remaining DNA consists primarily of short highly repeated sequences that are tandemly repeated, with repeat numbers differing from one individual to the next.
** A pseudogene is a nucleotide sequence of DNA closely resembling that of a functional gene, but containing numerous mutations that prevent its proper expression. Most pseudogenes arise from the duplication of a functional gene followed by the accumulation of damaging mutations in one copy.
*** These conserved functional regions include DNA encoding 5' and 3' UTRs (untranslated regions), structural and functional RNAs, and conserved protein-binding sites on the DNA.
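The figures in Table 4-1 and the scale model of Figure 4-16 can be checked against one another with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original analysis; the function name scale_summary and the dictionary keys are hypothetical names chosen for illustration, and the inputs are simply the rounded values quoted in the table.

```python
# Rounded values quoted in Table 4-1 (assumed inputs for this sketch).
GENOME_BP = 3.2e9            # haploid human genome, nucleotide pairs
N_GENES = 25_000             # approximate number of genes
MEAN_GENE_BP = 27_000        # mean gene size, nucleotide pairs
MEAN_EXONS_PER_GENE = 10.4   # mean number of exons per gene
MEAN_EXON_BP = 145           # mean exon size, nucleotide pairs


def scale_summary(mm_per_bp: float = 1.0) -> dict:
    """Re-express the genome statistics at a drawing scale of
    `mm_per_bp` millimetres per nucleotide pair (1 mm in Figure 4-16)."""
    m_per_bp = mm_per_bp / 1000.0                      # metres per nucleotide pair
    exon_bp_per_gene = MEAN_EXONS_PER_GENE * MEAN_EXON_BP
    return {
        # total drawn length of the genome, in kilometres
        "genome_km": GENOME_BP * m_per_bp / 1000.0,
        # average distance from one gene to the next, in metres
        "gene_spacing_m": GENOME_BP / N_GENES * m_per_bp,
        # drawn length of an average gene, in metres
        "mean_gene_m": MEAN_GENE_BP * m_per_bp,
        # drawn length of the exons of an average gene, in metres
        "exon_m_per_gene": exon_bp_per_gene * m_per_bp,
        # share of the whole genome occupied by exons
        "exon_fraction_of_genome": N_GENES * exon_bp_per_gene / GENOME_BP,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:.4g}")
```

With these inputs the sketch gives a genome 3200 km long, one gene roughly every 128 m, an average gene of 27 m, and about 1.5 m of exon sequence per gene, in line with the rounded values in the figure legend. The exon share of the genome comes out slightly above 1%, the same order as the 1.5% listed in the table; the difference reflects the rounding of the inputs.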







[Figure 4-17 labels: REPEATED SEQUENCES (LINEs; SINEs; retroviral-like elements; DNA-only transposon "fossils"; simple sequence repeats; segmental duplications); UNIQUE SEQUENCES (GENES, including protein-coding regions; non-repetitive DNA that is neither in introns nor codons)]

Genome Comparisons Reveal Evolutionarily Conserved DNA Sequences

A major obstacle in interpreting the nucleotide sequences of human chromosomes is the fact that much of the sequence is probably unimportant. Moreover, the coding regions of the genome (the exons) are typically found in short segments (average size about 145 nucleotide pairs) floating in a sea of DNA whose exact nucleotide sequence is of little consequence. This arrangement makes it very difficult to identify all the exons in a stretch of DNA sequence. Even harder is the determination of where a gene begins and ends and exactly how many exons it spans.

Accurate gene identification requires approaches that extract information from the inherently low signal-to-noise ratio of the human genome. We shall describe some of them in Chapter 8. Here we discuss only one general approach, which is based on the observation that sequences that have a function are relatively conserved during evolution, whereas those without a function are free to mutate randomly. The strategy is therefore to compare the human sequence with that of the corresponding regions of a related genome, such as that of the mouse. Humans and mice are thought to have diverged from a common mammalian ancestor about 80 × 10^6 years ago, which is long enough for the majority of nucleotides in their genomes to have been changed by random mutational events. Consequently, the only regions that will have remained closely similar in the two genomes are those in which mutations would have impaired function and put the animals carrying them at a disadvantage, resulting in their elimination from the population by natural selection. Such closely similar regions are known as conserved regions. The conserved regions include both functionally important exons and regulatory DNA sequences. In contrast, nonconserved regions represent DNA whose sequence is unlikely to be critical for function.

The power of this method can be increased by comparing our genome with the genomes of additional animals whose genomes have been completely sequenced, including the rat, chicken, chimpanzee, and dog. By revealing in this way the results of a very long natural "experiment," lasting for hundreds of millions of years, such comparative DNA sequencing studies have highlighted the most interesting regions in these genomes. The comparisons reveal that roughly 5% of the human genome consists of "multi-species conserved sequences," as discussed in detail near the end of this chapter. Unexpectedly, only about one-third of these sequences code for proteins. Some of the conserved noncoding sequences correspond to clusters of protein-binding sites that are involved in gene regulation, while others produce RNA molecules that are not translated into protein. But the function of the majority of these sequences remains unknown. This unexpected discovery has led scientists to conclude that we understand much less about the cell biology of vertebrates than we had previously imagined. Certainly, there are enormous opportunities for new discoveries, and we should expect many surprises ahead.

Comparative studies have revealed not only that humans and other mammals share most of the same genes, but also that large blocks of our genomes contain these genes in the same order, a feature called conserved synteny. As a result, large blocks of our chromosomes can be recognized in other species. This allows the chromosome painting technique to be used to reconstruct the recent evolutionary history of human chromosomes (Figure 4-18).
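The logic of the comparative approach can be illustrated with a toy calculation. The following Python sketch is an illustration under stated assumptions, not the method used in genome projects: it assumes the two sequences have already been aligned to equal length (with "-" marking gaps), and the function names window_identity and conserved_regions, the window size, and the identity threshold are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def window_identity(seq_a: str, seq_b: str, window: int) -> List[float]:
    """Fraction of identical positions in each sliding window of two
    pre-aligned sequences of equal length. Gap characters never match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if window <= 0 or window > len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    matches = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_regions(seq_a: str, seq_b: str,
                      window: int = 10,
                      threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Merge all windows whose identity is at least `threshold` into
    half-open (start, end) intervals along the alignment."""
    regions: List[Tuple[int, int]] = []
    for start, identity in enumerate(window_identity(seq_a, seq_b, window)):
        if identity < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # this window overlaps or touches the previous region: extend it
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Made-up sequences: two identical blocks flanking a fully diverged block.
    species_1 = "ATGGCCAAGCTG" + "AAAAAAAAAAAA" + "GGATCCTTGACA"
    species_2 = "ATGGCCAAGCTG" + "CCCCCCCCCCCC" + "GGATCCTTGACA"
    print(conserved_regions(species_1, species_2, window=10, threshold=1.0))
```

In this made-up example the two flanking blocks are reported as conserved and the central block is not. Real comparisons must first align the genomes and must allow for the fact that even unconstrained DNA retains some similarity between species, which is one reason why adding more species sharpens the signal.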

Figure 4-17 Representation of the nucleotide sequence content of the completely sequenced human genome. The LINEs, SINEs, retroviral-like elements, and DNA-only transposons are mobile genetic elements that have multiplied in our genome by replicating themselves and inserting the new copies in different positions. These mobile genetic elements are discussed in Chapter 5 (see Table 5-3, p. 318). Simple sequence repeats are short nucleotide sequences (less than 14 nucleotide pairs) that are repeated again and again for long stretches. Segmental duplications are large blocks of the genome (1000-200,000 nucleotide pairs) that are present at two or more locations in the genome. The most highly repeated blocks of DNA in heterochromatin have not yet been completely sequenced; therefore about 10% of human DNA sequences are not represented in this diagram. (Data courtesy of E. Margulies.)


Chapter 4: DNA, Chromosomes, and Genomes


[Figure 4-18 labels: ancestor DNA of human chromosome 3; ancestor DNA of human chromosome 21; 2 inversions; fusion; lemur; human; segments a, b, c, d (panel B)]

Chromosomes Exist in Different States Throughout the Life of a Cell

Figure 4-18 A proposed evolutionary history of human chromosome 3 and its relatives in other mammals. (A) The order of chromosome 3 segments hypothesized to be present on a chromosome of a mammalian ancestor is shown (yellow box). The minimum changes in this ancestral chromosome necessary to account for the appearance of each of the three modern chromosomes are indicated. (The present-day chromosomes of humans and African apes are identical at this resolution.) The small circles depicted in the modern chromosomes represent the positions of centromeres. A fission and inversion that leads to a change in chromosome organization is thought to occur once every 5-10 × 10^6 years in mammals. (B) Some of the chromosome painting experiments that led to the diagram in (A). Each image shows the chromosome most closely related to human chromosome 3, painted green by hybridization with different segments of DNA, lettered a, b, c, and d along the bottom of the figure. These letters correspond to the colored segments of the diagrams in (A), as indicated on the ancestral chromosome. (From S. Müller et al., Proc. Natl Acad. Sci. U.S.A. 97:206-211, 2000. With permission from National Academy of Sciences.)

We have seen how genes are arranged in chromosomes, but to form a functional chromosome, a DNA molecule must be able to do more than simply carry genes: it must be able to replicate, and the replicated copies must be separated and reliably partitioned into daughter cells at each cell division. This process occurs through an ordered series of stages, collectively known as the cell cycle, which provides for a temporal separation between the duplication of chromosomes and their segregation into two daughter cells. The cell cycle is briefly summarized in Figure 4-19, and it is discussed in detail in Chapter 17. Only certain

[Figure 4-19 labels: nuclear envelope; interphase chromosome; mitotic chromosome; INTERPHASE]

Figure 4-19 A simplified view of the eucaryotic cell cycle. During interphase, the cell is actively expressing its genes and is therefore synthesizing proteins. Also, during interphase and before cell division, the DNA is replicated and each chromosome is duplicated to produce two closely paired daughter chromosomes (a cell with only two chromosomes is illustrated here). Once DNA replication is complete, the cell can enter M phase, when mitosis occurs and the nucleus is divided into two daughter nuclei. During this stage, the chromosomes condense, the nuclear envelope breaks down, and the mitotic spindle forms from microtubules and other proteins. The condensed mitotic chromosomes are captured by the mitotic spindle, and one complete set of chromosomes is then pulled to each end of the cell by separating each daughter chromosome pair. A nuclear envelope re-forms around each chromosome set, and in the final step of M phase, the cell divides to produce two daughter cells. Most of the time in the cell cycle is spent in interphase; M phase is brief in comparison, occupying only about an hour in many mammalian cells.


Figure 4-20 A comparison of extended interphase chromatin with the chromatin in a mitotic chromosome. (A) A scanning electron micrograph of a mitotic chromosome: a condensed duplicated chromosome in which the two new chromosomes are still linked together (see Figure 4-21). The constricted region indicates the position of the centromere, described in Figure 4-21. (B) An electron micrograph showing an enormous tangle of chromatin spilling out of a lysed interphase nucleus. Note the difference in scales. (A, courtesy of Terry D. Allen; B, courtesy of Victoria Foe.)

[Figure 4-20 scale bars: 1 µm; 10 µm]

parts of the cycle concern us in this chapter. During interphase chromosomes are replicated, and during mitosis they become highly condensed and then are separated and distributed to the two daughter nuclei. The highly condensed chromosomes in a dividing cell are known as mitotic chromosomes (Figure 4-20A). This is the form in which chromosomes are most easily visualized; in fact, the images of chromosomes shown so far in the chapter are of chromosomes in mitosis. During cell division, this condensed state is important for the accurate separation of the duplicated chromosomes by the mitotic spindle, as discussed in Chapter 17.

During the portions of the cell cycle when the cell is not dividing, the chromosomes are extended and much of their chromatin exists as long, thin tangled threads in the nucleus so that individual chromosomes cannot be easily distinguished (Figure 4-20B). We shall refer to chromosomes in this extended state as interphase chromosomes. Since cells spend most of their time in interphase, and this is where their genetic information is being read out, chromosomes are of greatest interest to cell biologists when they are least visible.

Each DNA Molecule That Forms a Linear Chromosome Must Contain a Centromere, Two Telomeres, and Replication Origins

A chromosome operates as a distinct structural unit: for a copy to be passed on to each daughter cell at division, each chromosome must be able to replicate, and the newly replicated copies must subsequently be separated and partitioned correctly into the two daughter cells. These basic functions are controlled by three types of specialized nucleotide sequences in the DNA, each of which binds specific proteins that guide the machinery that replicates and segregates chromosomes (Figure 4-21). Experiments in yeasts, whose chromosomes are relatively small and easy to manipulate, have identified the minimal DNA sequence elements responsible for each of these functions.

One type of nucleotide sequence acts as a DNA replication origin, the location at which duplication of the DNA begins. Eucaryotic chromosomes contain many origins of replication to ensure that the entire chromosome can be replicated rapidly, as discussed in detail in Chapter 5. After replication, the two daughter chromosomes remain attached to one another and, as the cell cycle proceeds, are condensed further to produce mitotic chromosomes. The presence of a second specialized DNA sequence, called a centromere, allows one copy of each duplicated and condensed chromosome to be pulled into each daughter cell when a cell divides. A protein


[Figure 4-21 labels: INTERPHASE; replication origin; portion of mitotic spindle; duplicated chromosomes in separate cells]

complex called a kinetochore forms at the centromere and attaches the duplicated chromosomes to the mitotic spindle, allowing them to be pulled apart (discussed in Chapter 17). The third specialized DNA sequence forms telomeres, the ends of a chromosome. Telomeres contain repeated nucleotide sequences that enable the ends of chromosomes to be efficiently replicated. Telomeres also perform another function: the repeated telomere DNA sequences, together with the regions adjoining them, form structures that protect the end of the chromosome from being mistaken by the cell for a broken DNA molecule in need of repair. We discuss both this type of repair and the structure and function of telomeres in Chapter 5.

In yeast cells, the three types of sequences required to propagate a chromosome are relatively short (typically less than 1000 base pairs each) and therefore use only a tiny fraction of the information-carrying capacity of a chromosome. Although telomere sequences are fairly simple and short in all eucaryotes, the DNA sequences that form centromeres and replication origins in more complex organisms are much longer than their yeast counterparts. For example, experiments suggest that human centromeres contain up to 100,000 nucleotide pairs and may not require a stretch of DNA with a defined nucleotide sequence. Instead, as we shall discuss later in this chapter, they seem to consist of a large, regularly repeating protein-nucleic acid structure that can be inherited when a chromosome replicates.

DNA Molecules Are Highly Condensed in Chromosomes

All eucaryotic organisms have special ways of packaging DNA into chromosomes. For example, if the 48 million nucleotide pairs of DNA in human chromosome 22 could be laid out as one long perfect double helix, the molecule would extend for about 1.5 cm. But chromosome 22 measures only about 2 µm in length in mitosis (see Figures 4-10 and 4-11), representing an end-to-end compaction ratio of nearly 10,000-fold. This remarkable feat of compression is performed by proteins that successively coil and fold the DNA into higher and higher levels of organization. Although much less condensed than mitotic chromosomes, the DNA of human interphase chromosomes is still tightly packed, with an overall compaction ratio of approximately 500-fold (the length of a chromosome's DNA helix divided by the end-to-end length of that chromosome).

In reading these sections it is important to keep in mind that chromosome structure is dynamic. We have seen that each chromosome condenses to an unusual degree in the M phase of the cell cycle. Much less visible, but of enormous interest and importance, specific regions of interphase chromosomes

Figure 4-21 The three DNA sequences required to produce a eucaryotic chromosome that can be replicated and then segregated at mitosis. Each chromosome has multiple origins of replication, one centromere, and two telomeres. Shown here is the sequence of events that a typical chromosome follows during the cell cycle. The DNA replicates in interphase, beginning at the origins of replication and proceeding bidirectionally from the origins across the chromosome. In M phase, the centromere attaches the duplicated chromosomes to the mitotic spindle so that one copy is distributed to each daughter cell during mitosis. The centromere also helps to hold the duplicated chromosomes together until they are ready to be moved apart. The telomeres form special caps at each chromosome end.



decondense as the cells gain access to specific DNA sequences for gene expression, DNA repair, and replication, and then recondense when these processes are completed. The packaging of chromosomes is therefore accomplished in a way that allows rapid, localized, on-demand access to the DNA. In the next sections we discuss the specialized proteins that make this type of packaging possible.
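The compaction ratio quoted for chromosome 22 is simply the length of the extended double helix divided by the measured length of the packed chromosome. A minimal Python sketch of that arithmetic follows. The rise of 0.34 nm per nucleotide pair is an assumption taken from the standard geometry of the DNA double helix rather than from this section, and the function names are hypothetical.

```python
# Assumed rise per nucleotide pair of the DNA double helix, in nanometres.
NM_PER_BP = 0.34


def dna_length_m(n_bp: float) -> float:
    """Length in metres of a fully extended double helix of n_bp nucleotide pairs."""
    return n_bp * NM_PER_BP * 1e-9


def compaction_ratio(n_bp: float, packed_length_m: float) -> float:
    """Extended DNA length divided by the end-to-end length of the packed chromosome."""
    return dna_length_m(n_bp) / packed_length_m


if __name__ == "__main__":
    chr22_bp = 48e6        # nucleotide pairs in human chromosome 22 (from the text)
    mitotic_length = 2e-6  # about 2 micrometres in mitosis (from the text)
    print(f"extended length: {dna_length_m(chr22_bp) * 100:.2f} cm")
    print(f"mitotic compaction: {compaction_ratio(chr22_bp, mitotic_length):.0f}-fold")
```

Under this assumption the extended helix is about 1.6 cm long and the mitotic compaction is roughly 8000-fold, consistent with the rounded figures of "about 1.5 cm" and "nearly 10,000-fold" given above.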

Nucleosomes Are a Basic Unit of Eucaryotic Chromosome Structure

The proteins that bind to the DNA to form eucaryotic chromosomes are traditionally divided into two general classes: the histones and the nonhistone chromosomal proteins. The complex of both classes of protein with the nuclear DNA of eucaryotic cells is known as chromatin. Histones are present in such enormous quantities in the cell (about 60 million molecules of each type per human cell) that their total mass in chromatin is about equal to that of the DNA.

Histones are responsible for the first and most basic level of chromosome packing, the nucleosome, a protein-DNA complex discovered in 1974. When interphase nuclei are broken open very gently and their contents examined under the electron microscope, most of the chromatin is in the form of a fiber with a diameter of about 30 nm (Figure 4-22A). If this chromatin is subjected to treatments that cause it to unfold partially, it can be seen under the electron microscope as a series of "beads on a string" (Figure 4-22B). The string is DNA, and each bead is a "nucleosome core particle" that consists of DNA wound around a protein core formed from histones.

The structural organization of nucleosomes was determined after first isolating them from unfolded chromatin by digestion with particular enzymes (called nucleases) that break down DNA by cutting between the nucleosomes. After digestion for a short period, the exposed DNA between the nucleosome core particles, the linker DNA, is degraded. Each individual nucleosome core particle consists of a complex of eight histone proteins (two molecules each of histones H2A, H2B, H3, and H4) and double-stranded DNA that is 147 nucleotide pairs long. The histone octamer forms a protein core around which the double-stranded DNA is wound (Figure 4-23).

Each nucleosome core particle is separated from the next by a region of linker DNA, which can vary in length from a few nucleotide pairs up to about 80. (The term nucleosome technically refers to a nucleosome core particle plus one of its adjacent DNA linkers, but it is often used synonymously with nucleosome core particle.) On average, therefore, nucleosomes repeat at intervals of about 200 nucleotide pairs. For example, a diploid human cell with 6.4 × 10^9 nucleotide pairs contains approximately 30 million nucleosomes. The formation of nucleosomes converts a DNA molecule into a chromatin thread about one-third of its initial length.
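The nucleosome and histone counts quoted above follow directly from the repeat length. The short Python sketch below reproduces that arithmetic using only numbers given in the text; the function names are hypothetical and the repeat length of 200 nucleotide pairs is the stated average, not a fixed value.

```python
CORE_DNA_BP = 147   # DNA wrapped around one histone octamer (from the text)


def nucleosome_count(genome_bp: float, repeat_bp: float = 200) -> float:
    """Approximate number of nucleosomes if one forms every repeat_bp nucleotide pairs."""
    return genome_bp / repeat_bp


def histone_copies_per_type(n_nucleosomes: float, copies_per_octamer: int = 2) -> float:
    """Molecules of each core histone (H2A, H2B, H3 or H4); each octamer holds two of each."""
    return n_nucleosomes * copies_per_octamer


def mean_linker_bp(repeat_bp: float = 200) -> float:
    """Average linker length implied by the repeat length and the 147-pair core."""
    return repeat_bp - CORE_DNA_BP


if __name__ == "__main__":
    diploid_bp = 6.4e9  # nucleotide pairs in a diploid human cell (from the text)
    n = nucleosome_count(diploid_bp)
    print(f"nucleosomes: {n:.2e}")
    print(f"copies of each core histone: {histone_copies_per_type(n):.2e}")
    print(f"mean linker length: {mean_linker_bp():.0f} nucleotide pairs")
```

The result, about 32 million nucleosomes and about 64 million copies of each core histone, matches the rounded values of 30 million and 60 million given in the text, and the implied average linker of 53 nucleotide pairs lies within the stated range of a few to about 80.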


Figure 4-22 Nucleosomes as seen in the electron microscope. (A) Chromatin isolated directly from an interphase nucleus appears in the electron microscope as a thread 30 nm thick. (B) This electron micrograph shows a length of chromatin that has been experimentally unpacked, or decondensed, after isolation to show the nucleosomes. (A, courtesy of Barbara Hamkalo; B, courtesy of Victoria Foe.)


Figure 4-23 Structural organization of the nucleosome. A nucleosome contains a protein core made of eight histone molecules. In biochemical experiments, the nucleosome core particle can be released from isolated chromatin by digestion of the linker DNA with a nuclease, an enzyme that breaks down DNA. (The nuclease can degrade the exposed linker DNA but cannot attack the DNA wound tightly around the nucleosome core.) After dissociation of the isolated nucleosome into its protein core and DNA, the length of the DNA that was wound around the core can be determined. This length of 147 nucleotide pairs is sufficient to wrap 1.7 times around the histone core.

[Figure 4-23 labels: core histones of nucleosome; "beads-on-a-string" form of chromatin; nucleosome includes ~200 nucleotide pairs of DNA]

The Structure of the Nucleosome Core Particle Reveals How DNA Is Packaged

The high-resolution structure of a nucleosome core particle, solved in 1997, revealed a disc-shaped histone core around which the DNA was tightly wrapped 1.7 turns in a left-handed coil (Figure 4-24). All four of the histones that make up the core of the nucleosome are relatively small proteins (102-135 amino acids), and they share a structural motif, known as the histone fold, formed from three α helices connected by two loops (Figure 4-25). In assembling a nucleosome, the histone folds first bind to each other to form H3-H4 and H2A-H2B dimers, and the H3-H4 dimers combine to form tetramers. An H3-H4 tetramer then further combines with two H2A-H2B dimers to form the compact octamer core, around which the DNA is wound (Figure 4-26).

The interface between DNA and histone is extensive: 142 hydrogen bonds are formed between DNA and the histone core in each nucleosome. Nearly half of these bonds form between the amino acid backbone of the histones and the phosphodiester backbone of the DNA. Numerous hydrophobic interactions and salt linkages also hold DNA and protein together in the nucleosome. For example, more than one-fifth of the amino acids in each of the core histones are either lysine or arginine (two amino acids with basic side chains), and their positive


[Figure labels: octameric histone core; 147-nucleotide-pair DNA double helix; side view; bottom view; histone H2A; histone H2B; histone H3; histone H4]

Figure 4-24 The structure of a nucleosome core particle, as determined by x-ray diffraction analyses of crystals. Each histone is colored according to the scheme in Figure 4-23, with the DNA double helix in light gray. (From K. Luger et al., Nature 389:251-260, 1997. With permission from Macmillan Publishers Ltd.)



[Figure 4-25 labels: (A) H2A; N-terminal tail; histone fold]

Figure 4-25 The overall structural organization of the core histones. (A) Each of the core histones contains an N-terminal tail, which is subject to several forms of covalent modification, and a histone fold region, as indicated. (B) The structure of the histone fold, which is formed by all four of the core histones. (C) Histones 2A and 2B form a dimer through an interaction known as the "handshake." Histones H3 and H4 form a dimer through the same type of interaction.

charges can effectively neutralize the negatively charged DNA backbone. These numerous interactions explain in part why DNA of virtually any sequence can be bound on a histone octamer core. The path of the DNA around the histone core is not smooth; rather, several kinks are seen in the DNA, as expected from the nonuniform surface of the core. The bending requires a substantial compression of the minor groove of the DNA helix. Certain dinucleotides in the minor groove are especially easy to compress, and some nucleotide sequences bind the nucleosome more tightly than others (Figure 4-27). This probably explains some striking, but unusual, cases of very precise positioning of nucleosomes along a stretch of DNA. For most of the DNA sequences found in chromosomes, however, the sequence preference of nucleosomes must be small enough to allow other factors to dominate, inasmuch as nucleosomes can occupy any one of a number of positions relative to the DNA sequence in most chromosomal regions.

In addition to its histone fold, each of the core histones has an N-terminal amino acid "tail," which extends out from the DNA-histone core (see Figure 4-26). These histone tails are subject to several different types of covalent modifications that in turn control critical aspects of chromatin structure and function, as we shall discuss shortly.

As a reflection of their fundamental role in DNA function through controlling chromatin structure, the histones are among the most highly conserved eucaryotic proteins. For example, the amino acid sequences of histone H4 from a pea and from a cow differ at only 2 of the 102 positions. This strong evolutionary conservation suggests that the functions of histones involve nearly all of their amino acids, so that a change in any position is deleterious to the cell. This suggestion has been tested directly in yeast cells, in which it is possible to mutate a given histone gene in vitro and introduce it into the yeast genome in place of the normal gene. As might be expected, most changes in histone sequences are lethal; the few that are not lethal cause changes in the normal pattern of gene expression, as well as other abnormalities.

Despite the high conservation of the core histones, eucaryotic organisms also produce smaller amounts of specialized variant core histones that differ in amino acid sequence from the main ones. As we shall see, these variants, combined with a surprisingly large variety of covalent modifications that can be added to the histones in nucleosomes, make possible the many different chromatin structures that are required for DNA function in higher eucaryotes.
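The conservation figure quoted above (2 differences in 102 positions) corresponds to roughly 98% sequence identity. As a minimal sketch, the helpers below compute percent identity either from counts or from two pre-aligned sequences of equal length. The function names and the short toy sequences are hypothetical illustrations and are not real histone data.

```python
def identity_from_counts(differences: int, length: int) -> float:
    """Percent identity given the number of differing positions."""
    if length <= 0 or not 0 <= differences <= length:
        raise ValueError("need 0 <= differences <= length and length > 0")
    return 100.0 * (length - differences) / length


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Figures from the text: pea and cow histone H4 differ at 2 of 102 positions.
h4_identity = identity_from_counts(differences=2, length=102)
print(f"histone H4, pea vs. cow: {h4_identity:.1f}% identical")

# Toy (made-up) aligned sequences that differ at one of ten positions.
toy_identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(f"toy example: {toy_identity:.1f}% identical")
```

Real comparisons would first need a sequence alignment step, which this sketch deliberately leaves out.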



Chapter 4: DNA, Chromosomes, and Genomes

Figure 4-26 The assembly of a histone octamer on DNA. The histone H3-H4 dimer and the H2A-H2B dimer are formed from the handshake interaction. An H3-H4 tetramer forms and binds to the DNA. Two H2A-H2B dimers are then added, to complete the nucleosome. The histones are colored as in Figures 4-24 and 4-25. Note that all eight N-terminal tails of the histones protrude from the disc-shaped core structure. Their conformations are highly flexible. Inside the cell, the nucleosome assembly reactions shown here are mediated by histone chaperone proteins, some specific for H3-H4 and others specific for H2A-H2B. (Adapted from figures by J. Waterborg.)

[Figure 4-26 labels: H3-H4 dimer; H3-H4 tetramer; two dimers bind to H3-H4 tetramer]


[Figure 4-27 labels: G-C preferred here (minor groove outside); TT and TA dinucleotides preferred here (minor groove inside); histone core of nucleosome (histone octamer); DNA of nucleosome]

Figure 4-27 The bending of DNA in a nucleosome. The DNA helix makes 1.7 tight turns around the histone octamer. This diagram illustrates how the minor groove is compressed on the inside of the turn. Owing to certain structural features of the DNA molecule, the indicated dinucleotides are preferentially accommodated in such a narrow minor groove, which helps to explain why certain DNA sequences will bind more tightly than others to the nucleosome core.

CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER

[Figure 4-28 labels: wrapped nucleosome, exists for 250 milliseconds; unwrapped nucleosome, exists for 10-50 milliseconds; rewrapped nucleosome]

Figure 4-28 Dynamic nucleosomes. Kinetic measurements show that the DNA in an isolated nucleosome is surprisingly dynamic, rapidly uncoiling and then rewrapping around its nucleosome core. As indicated, this makes most of its bound DNA sequence accessible to other DNA-binding proteins. (Data from G. Li and J. Widom, Nat. Struct. Mol. Biol. 11:763-769, 2004. With permission from Macmillan Publishers Ltd.)
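The dwell times shown in Figure 4-28 can be turned into a rough estimate of how much of the time nucleosomal DNA is exposed. The sketch below assumes a simple two-state cycle (wrapped, then unwrapped) and treats the figure's values as average dwell times. The two-state simplification and the function names are assumptions made for illustration, not part of the original measurements.

```python
def fraction_unwrapped(wrapped_ms: float, unwrapped_ms: float) -> float:
    """Fraction of time spent unwrapped in a two-state wrap/unwrap cycle."""
    return unwrapped_ms / (wrapped_ms + unwrapped_ms)


def cycles_per_second(wrapped_ms: float, unwrapped_ms: float) -> float:
    """Number of complete wrap/unwrap cycles per second."""
    return 1000.0 / (wrapped_ms + unwrapped_ms)


WRAPPED_MS = 250.0                  # wrapped-state dwell time (from the figure)
UNWRAPPED_RANGE_MS = (10.0, 50.0)   # unwrapped-state dwell times (from the figure)

for unwrapped_ms in UNWRAPPED_RANGE_MS:
    frac = fraction_unwrapped(WRAPPED_MS, unwrapped_ms)
    rate = cycles_per_second(WRAPPED_MS, unwrapped_ms)
    print(f"unwrapped for {unwrapped_ms:.0f} ms: exposed {frac:.1%} of the time, "
          f"{rate:.1f} cycles per second")
```

Under these assumptions the DNA is exposed for roughly 4% to 17% of the time, at between 3 and 4 cycles per second.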

Nucleosomes Have a Dynamic Structure, and Are Frequently Subjected to Changes Catalyzed by ATP-Dependent Chromatin Remodeling Complexes

For many years biologists thought that, once formed in a particular position on DNA, a nucleosome remains fixed in place because of the very tight association between its core histones and DNA. If true, this would pose problems for genetic readout mechanisms, which in principle require rapid access to many specific DNA sequences, as well as for the rapid passage of the DNA transcription and replication machinery through chromatin. But kinetic experiments show that the DNA in an isolated nucleosome unwraps from each end at a rate of about 4 times per second, remaining exposed for 10 to 50 milliseconds before the partially unwrapped structure recloses. Thus, most of the DNA in an isolated nucleosome is in principle available for binding other proteins (Figure 4-28).

For the chromatin in a cell, a further loosening of DNA-histone contacts is clearly required, because eucaryotic cells contain a large variety of ATP-dependent chromatin remodeling complexes. The subunit in these complexes that hydrolyzes ATP is evolutionarily related to the DNA helicases (discussed in Chapter 5), and it binds both to the protein core of the nucleosome and to the double-stranded DNA that winds around it. By using the energy of ATP hydrolysis to move this DNA relative to the core, this subunit changes the structure of a nucleosome temporarily, making the DNA less tightly bound to the histone core. Through repeated cycles of ATP hydrolysis, the remodeling complexes can catalyze nucleosome sliding, and by pulling the nucleosome core along the DNA double helix in this way, they make the nucleosomal DNA available to other proteins in the cell (Figure 4-29). In addition, by cooperating with negatively





Figure 4-29 The nucleosome sliding catalyzed by ATP-dependent chromatin remodeling complexes. Using the energy of ATP hydrolysis, the remodeling complex is thought to push on the DNA of its bound nucleosome and loosen its attachment to the nucleosome core. Each cycle of ATP binding, ATP hydrolysis, and release of the ADP and Pi products thereby moves the DNA with respect to the histone octamer in the direction of the arrow in this diagram. It requires many such cycles to produce the nucleosome sliding shown. (See also Figure 4-46B.)



Figure 4-30 Nucleosome removal and histone exchange catalyzed by ATP-dependent chromatin remodeling complexes. By cooperating with specific histone chaperones, some chromatin remodeling complexes can remove the H2A-H2B dimers from a nucleosome (top series of reactions) and replace them with dimers that contain a variant histone, such as the H2AZ-H2B dimer (see Figure 4-41). Other remodeling complexes are attracted to specific sites on chromatin to remove the histone octamer completely and/or to replace it with a different nucleosome core (bottom series of reactions).

[Figure 4-30 labels: histone chaperone; ATP-dependent chromatin remodeling complex]



charged proteins that serve as histone chaperones, some remodeling complexes are able to remove either all or part of the nucleosome core from a nucleosome, catalyzing either an exchange of its H2A-H2B histones or the complete removal of the octameric core from the DNA (Figure 4-30).

Cells contain dozens of different ATP-dependent chromatin remodeling complexes that are specialized for different roles. Most are large protein complexes that can contain 10 or more subunits. The activity of these complexes is carefully controlled by the cell. As genes are turned on and off, chromatin remodeling complexes are brought to specific regions of DNA where they act locally to influence chromatin structure (discussed in Chapter 7; see also Figure 4-46, below).

As pointed out previously, for most of the DNA sequences found in chromosomes, experiments show that a nucleosome can occupy any one of a number of positions relative to the DNA sequence. The most important influence on nucleosome positioning appears to be the presence of other tightly bound proteins on the DNA. Some bound proteins favor the formation of a nucleosome adjacent to them. Others create obstacles that force the nucleosomes to move to positions between them. The exact positions of nucleosomes along a stretch of DNA therefore depend mainly on the presence and nature of other proteins bound to the DNA. Due to the presence of ATP-dependent remodeling complexes, the arrangement of nucleosomes on DNA can be highly dynamic, changing rapidly according to the needs of the cell.

Nucleosomes Are Usually Packed Together into a Compact Chromatin Fiber

Although enormously long strings of nucleosomes form on the chromosomal DNA, chromatin in a living cell probably rarely adopts the extended "beads on a string" form. Instead, the nucleosomes are packed on top of one another, generating regular arrays in which the DNA is even more highly condensed. Thus, when nuclei are very gently lysed onto an electron microscope grid, most of the chromatin is seen to be in the form of a fiber with a diameter of about 30 nm, which is considerably wider than chromatin in the "beads on a string" form (see Figure 4-22).



Figure 4-31 A zigzag model for the 30-nm chromatin fiber. (A) The conformation of two of the four nucleosomes in a tetranucleosome, from a structure determined by x-ray crystallography. (B) Schematic of the entire tetranucleosome; the fourth nucleosome is not visible, being stacked on the bottom nucleosome and behind it in this diagram. (C) Diagrammatic illustration of a possible zigzag structure that could account for the 30-nm chromatin fiber. (Adapted from C.L. Woodcock, Nat. Struct. Mol. Biol. 12:639-640, 2005. With permission from Macmillan Publishers Ltd.)

How are nucleosomes packed in the 30-nm chromatin fiber? This question has not yet been answered definitively, but important information concerning the structure has been obtained. In particular, high-resolution structural analyses have been performed on homogeneous short strings of nucleosomes, prepared from purified histones and purified DNA molecules. The structure of a tetranucleosome, obtained by x-ray crystallography, has been used to support a zigzag model for the stacking of nucleosomes in the 30-nm fiber (Figure 4-31). But cryoelectron microscopy of longer strings of nucleosomes supports a very different solenoidal structure with intercalated nucleosomes (Figure 4-32).

What causes the nucleosomes to stack so tightly on each other in a 30-nm fiber? The nucleosome-to-nucleosome linkages formed by histone tails, most notably the H4 tail (Figure 4-33), constitute one important factor. Another



[Scale bar: 10 nm]

Figure 4-32 An interdigitated solenoid model for the 30-nm chromatin fiber. (A) Drawings in which strings of color-coded nucleosomes are used to illustrate how the solenoid is generated. (B) Schematic diagram of final structure in (A). (C) Structural model. The model is derived from high-resolution cryoelectron microscopy images of nucleosome arrays reconstituted from purified histones and DNA molecules of specific length and sequence. Both nucleosome octamers and a linker histone (discussed below) were used to produce regularly repeating arrays containing up to 72 nucleosomes. (Adapted from P. Robinson, L. Fairall, V. Huynh and D. Rhodes, Proc. Natl Acad. Sci. U.S.A. 103:6506-6511, 2006. With permission from National Academy of Sciences.)



[Figure 4-33 labels: H4 tail; H2A tail; H2B tail; H3 tail]

important factor is an additional histone that is often present in a 1-to-1 ratio with nucleosome cores, known as histone H1. This so-called linker histone is larger than the individual core histones and it has been considerably less well conserved during evolution. A single histone H1 molecule binds to each nucleosome, contacting both DNA and protein, and changing the path of the DNA as it exits from the nucleosome. Although it is not understood in detail how H1 pulls nucleosomes together into the 30-nm fiber, a change in the exit path in DNA seems crucial for compacting nucleosomal DNA so that it interlocks to form the 30-nm fiber (Figure 4-34). Most eucaryotic organisms make several histone H1 proteins of related but quite distinct amino acid sequences.

It is possible that the 30-nm structure found in chromosomes is a fluid mosaic of several different variations. For example, a linker histone in the H1 family was present in the nucleosomal arrays studied in Figure 4-32 but was missing from the tetranucleosome in Figure 4-31. Moreover, we saw earlier that the linker DNA that connects adjacent nucleosomes can vary in length; these differences in linker length probably introduce local perturbations into the structure. And the presence of many other DNA-binding proteins, as well as proteins that bind directly to histones, will certainly add important additional features to any array of nucleosomes.

Figure 4-33 A speculative model for the role played by histone tails in the formation of the 30-nm fiber. (A) This schematic diagram shows the approximate exit points of the eight histone tails, one from each histone protein, that extend from each nucleosome. The actual structure is shown to its right. In the high-resolution structure of the nucleosome, the tails are largely unstructured, suggesting that they are highly flexible. (B) A speculative model showing how the histone tails may help to pack nucleosomes together into the 30-nm fiber. This model is based on (1) experimental evidence that histone tails aid in the formation of the 30-nm fiber, and (2) the x-ray crystal structure of the nucleosome, in which the tails of one nucleosome contact the histone core of an adjacent nucleosome in the crystal lattice.

Summary

A gene is a nucleotide sequence in a DNA molecule that acts as a functional unit for the production of a protein, a structural RNA, or a catalytic or regulatory RNA molecule. In eucaryotes, protein-coding genes are usually composed of a string of alternating introns and exons associated with regulatory regions of DNA. A chromosome is formed from a single, enormously long DNA molecule that contains a linear array of many genes. The human genome contains 3.2 × 10^9 DNA nucleotide pairs, divided between 22 different autosomes and 2 sex chromosomes. Only a small percentage of this DNA codes for proteins or functional RNA molecules. A chromosomal DNA molecule also contains three other types of functionally important nucleotide sequences: replication origins and telomeres allow the DNA molecule to be efficiently replicated, while a centromere attaches the daughter DNA molecules to the mitotic spindle, ensuring their accurate segregation to daughter cells during the M phase of the cell cycle.

Figure 4-34 How the linker histone binds to the nucleosome. The position and structure of the globular region of histone H1 are shown. As indicated, this region constrains an additional 20 nucleotide pairs of DNA where it exits from the nucleosome core. This type of binding by H1 is thought to be important for forming the 30-nm chromatin fiber. The long C-terminal tail of histone H1 is also required for the high-affinity binding of H1 to chromatin, but neither its position nor that of the N-terminal tail is known. (A) Schematic, (B) structure. (B, from D. Brown, T. Izard and T. Misteli, Nat. Struct. Mol. Biol. 13:250-255, 2006. With permission from Macmillan Publishers Ltd.)



The DNA in eucaryotes is tightly bound to an equal mass of histones, which form repeated arrays of DNA-protein particles called nucleosomes. The nucleosome is composed of an octameric core of histone proteins around which the DNA double helix is wrapped. Nucleosomes are spaced at intervals of about 200 nucleotide pairs, and they are usually packed together (with the aid of histone H1 molecules) into quasi-regular arrays to form a 30-nm chromatin fiber. Despite the high degree of compaction in chromatin, its structure must be highly dynamic to allow access to the DNA. There is some spontaneous DNA unwrapping and rewrapping in the nucleosome itself; however, the general strategy for reversibly changing local chromatin structure features ATP-driven chromatin remodeling complexes. Cells contain a large set of such complexes, which are targeted to specific regions of chromatin at appropriate times. The remodeling complexes collaborate with histone chaperones to allow nucleosome cores to be repositioned, reconstituted with different histones, or completely removed to expose the underlying DNA.

THE REGULATION OF CHROMATIN STRUCTURE

Having described how DNA is packaged into nucleosomes to create a chromatin fiber, we now turn to the mechanisms that create different chromatin structures in different regions of a cell's genome. We now know that mechanisms of this type are used to control many genes in eucaryotes. Most importantly, certain types of chromatin structure can be inherited; that is, the structure can be directly passed down from a cell to its descendants. Because the cell memory that results is based on an inherited protein structure rather than on a change in DNA sequence, this is a form of epigenetic inheritance. The prefix epi is Greek for "on"; this is appropriate, because epigenetics represents a form of inheritance that is superimposed on the genetic inheritance based on DNA (Figure 4-35).

In Chapter 7, we shall introduce the many different ways in which the expression of genes is regulated. There we discuss epigenetic inheritance in detail and present several distinct mechanisms that can produce it. Here, we are concerned with only one, that based on chromatin structure.

We begin this section with an introduction to inherited chromatin structures and then describe the basis for them: the covalent modification of histones in nucleosomes. We shall see that these modifications serve as recognition sites for protein modules that bring specific protein complexes to the appropriate regions of chromatin, thereby producing specific effects on gene expression or inducing other biological functions. Through such mechanisms, chromatin structure plays a central role in the development, growth, and maintenance of eucaryotic organisms, including ourselves.


[Figure 4-35 labels: EPIGENETIC INHERITANCE; gene X on; gene Y on; change in chromatin; gene X off; gene Y off; production of germ cells]

Figure 4-35 A comparison of genetic inheritance with an epigenetic inheritance based on chromatin structures. Genetic inheritance is based on the direct inheritance of DNA nucleotide sequences during DNA replication. DNA sequence changes are not only transmitted faithfully from a somatic cell to all of its descendants, but also through germ cells from one generation to the next. The field of genetics, reviewed in Chapter 8, is based on the inheritance of these changes between generations. The type of epigenetic inheritance shown here is based on other molecules bound to the DNA, and it is therefore less permanent than a change in DNA sequence; in particular, epigenetic information is usually (but not always) erased during the formation of eggs and sperm. Only one epigenetic mechanism, that based on an inheritance of chromatin structures, is discussed in this chapter. Other epigenetic mechanisms are presented in Chapter 7, which focuses on the control of gene expression (see Figure 7-86).



Some Early Mysteries Concerning Chromatin Structure

Thirty years ago, histones were viewed as relatively uninteresting proteins. Nucleosomes were known to cover all of the DNA in chromosomes, and they were thought to exist to allow the enormous amounts of DNA in many eucaryotic cells to be packaged into compact chromosomes. Extrapolating from what was known in bacteria, many scientists believed that gene regulation in eucaryotes would simply bypass nucleosomes, treating them as uninvolved bystanders.

But there were reasons to challenge this view. Thus, for example, biochemists had determined that mammalian chromatin consists of an approximately equal mass of histone and non-histone proteins. This would mean that, on average, every 200 nucleotide pairs of DNA in our cells is associated with more than 1000 amino acids of non-histone proteins (that is, a mass of protein equivalent to the total mass of the histone octamer plus histone H1). We now know that many of these proteins bind to nucleosomes, and their abundance might suggest that histones are more than just packaging proteins.

A second reason to challenge the view that histones were inconsequential to gene regulation was based on the amazingly slow rate of evolutionary change in the sequences of the four core histones. The previously mentioned fact that there are only two amino acid differences in the sequence of mammalian and pea histone H4 implies that a change in almost any one of the 102 amino acids in H4 must be deleterious to these organisms. What type of process could make the life of an organism so sensitive to the exact structure of the nucleosome core that only two amino acids had changed in more than 500 million years of random variation followed by natural selection?

Last but not least, a combination of genetics and cytology had revealed that a particular form of chromatin silences the genes that it packages without regard to nucleotide sequence, and does so in a manner that is directly inherited by both daughter cells when a cell divides. It is to this subject that we turn next.
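The estimate of "more than 1000 amino acids" per 200 nucleotide pairs can be checked with simple arithmetic: a histone octamer contains two copies of each core histone, plus one linker histone H1. In the sketch below, only the H4 length (102) comes from the text. The other chain lengths are approximate values assumed for illustration, and H1 in particular varies between variants.

```python
# Approximate chain lengths in amino acids. Only the H4 value (102) is taken
# from the text; the remaining numbers are rough, assumed values.
CORE_HISTONE_LENGTHS = {"H2A": 129, "H2B": 125, "H3": 135, "H4": 102}
LINKER_H1_LENGTH = 215  # assumed; H1 variants differ in length

COPIES_PER_OCTAMER = 2  # the octamer holds two copies of each core histone

octamer_residues = COPIES_PER_OCTAMER * sum(CORE_HISTONE_LENGTHS.values())
total_residues = octamer_residues + LINKER_H1_LENGTH

print(f"histone octamer: about {octamer_residues} amino acids")
print(f"octamer plus H1: about {total_residues} amino acids")
```

With these assumed lengths the total comes out somewhat above 1000 amino acids. Because the text states that histone and non-histone proteins are present in approximately equal mass, the same figure serves as a rough estimate of the non-histone protein associated with each 200 nucleotide pairs.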

Heterochromatin Is Highly Organized and Unusually Resistant to Gene Expression

Light-microscope studies in the 1930s distinguished two types of chromatin in the interphase nuclei of many higher eucaryotic cells: a highly condensed form, called heterochromatin, and all the rest, which is less condensed, called euchromatin. Heterochromatin represents an especially compact form of chromatin (see Figure 4-9), and we are finally beginning to understand important aspects of its molecular properties. Although present in many locations along chromosomes, it is also highly concentrated in specific regions, most notably at the centromeres and telomeres introduced previously (see Figure 4-21). In a typical mammalian cell, more than ten percent of the genome is packaged in this way.

The DNA in heterochromatin contains very few genes, and those euchromatic genes that become packaged into heterochromatin are turned off by this type of packaging. However, we know now that the term heterochromatin encompasses several distinct types of chromatin structures whose common feature is an especially high degree of compaction. Thus, heterochromatin should not be thought of as encapsulating "dead" DNA, but rather as creating different types of compact chromatin with distinct features that make it highly resistant to gene expression for the vast majority of genes.

When a gene that is normally expressed in euchromatin is experimentally relocated into a region of heterochromatin, it ceases to be expressed, and the gene is said to be silenced. These differences in gene expression are examples of position effects, in which the activity of a gene depends on its position relative to a nearby region of heterochromatin on a chromosome. First recognized in Drosophila, position effects have now been observed in many eucaryotes, including yeasts, plants, and humans.





[Figure 4-36 labels: genes 1 2 3 4 5; heterochromatin; euchromatin; early in the developing embryo, heterochromatin forms and spreads into neighboring euchromatin to different extents in different cells; clone of cells with gene 1 inactive; clone of cells with genes 1, 2, and 3 inactive; clone of cells with no genes inactivated]

Figure 4-36 The cause of position effect variegation in Drosophila. (A) Heterochromatin (green) is normally prevented from spreading into adjacent regions of euchromatin (red) by special barrier DNA sequences, which we shall discuss shortly. In flies that inherit certain chromosomal rearrangements, however, this barrier is no longer present. (B) During the early development of such flies, heterochromatin can spread into neighboring chromosomal DNA, proceeding for different distances in different cells. This spreading soon stops, but the established pattern of heterochromatin is inherited, so that large clones of progeny cells are produced that have the same neighboring genes condensed into heterochromatin and thereby inactivated (hence the "variegated" appearance of some of these flies; see Figure 4-37). Although "spreading" is used to describe the formation of new heterochromatin close to previously existing heterochromatin, the term may not be wholly accurate. There is evidence that during expansion, heterochromatin can "skip over" some regions of chromatin, sparing the genes that lie within them from repressive effects.
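The logic of Figure 4-36 (a spread distance chosen independently in each early cell, then inherited unchanged by the whole clone) can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions: the spread distance is drawn uniformly at random, spreading is contiguous (the "skipping" mentioned in the caption is ignored), and inheritance is perfect. All function names and parameters are hypothetical.

```python
import random

GENES = (1, 2, 3, 4, 5)  # gene 1 lies next to the heterochromatin


def silenced_after_spread(spread: int) -> frozenset:
    """Genes inactivated when heterochromatin covers the first `spread` genes."""
    return frozenset(GENES[:spread])


def simulate_clones(n_founders: int, divisions: int, seed: int = 0) -> list:
    """Give each early cell its own spread, then copy that state to its progeny."""
    rng = random.Random(seed)
    clones = []
    for _ in range(n_founders):
        spread = rng.randint(0, len(GENES))  # inclusive bounds; 0 = no spreading
        state = silenced_after_spread(spread)
        # Stable inheritance: every descendant carries the founder's pattern.
        clones.append([state] * (2 ** divisions))
    return clones


for index, clone in enumerate(simulate_clones(n_founders=4, divisions=3, seed=1)):
    silenced = sorted(clone[0]) or "none"
    print(f"clone {index}: {len(clone)} cells, silenced genes: {silenced}")
```

Different clones end up with different silenced genes, while all cells within one clone agree, which is the pattern that produces a variegated appearance.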

The position effects associated with heterochromatin exhibit a feature called position effect variegation, which in retrospect provided critical clues concerning chromatin function. In Drosophila, chromosome breakage events that directly connect a region of heterochromatin to a region of euchromatin tend to inactivate the nearby euchromatic genes. The zone of inactivation spreads a different distance in different early cells in the fly embryo, but once the heterochromatic condition is established on a gene, it tends to be stably inherited by all of the cell's progeny (Figure 4-36). This remarkable phenomenon was first recognized through a detailed genetic analysis of the mottled loss of red pigment in the fly eye (Figure 4-37), but it shares many features with the extensive spread of heterochromatin that inactivates one of the two X chromosomes in female mammals (see p. 473).

Extensive genetic screens have been carried out in Drosophila, as well as in fungi, in a search for gene products that either enhance or suppress the spread of heterochromatin and its stable inheritance; that is, for genes that when mutated serve as either enhancers or suppressors of position effect variegation. In this way, more than 50 genes have been identified that play a critical role in these processes. In recent years, the detailed characterization of the proteins produced by these genes has revealed that many are nonhistone chromosomal proteins that underlie a remarkable mechanism for eucaryotic gene control, one

[Figure 4-37 labels: White gene at normal location; barrier; heterochromatin; rare chromosome inversion]

Figure 4-37 The discovery of position effects on gene expression. The White gene in the fruit fly Drosophila controls eye pigment production and is named after the mutation that first identified it. Wild-type flies with a normal White gene (White+) have normal pigment production, which gives them red eyes, but if the White gene is mutated and inactivated, the mutant flies (White-) make no pigment and have white eyes. In flies in which a normal White+ gene has been moved near a region of heterochromatin, the eyes are mottled, with both red and white patches. The white patches represent cell lineages in which the White+ gene has been silenced by the effects of the heterochromatin. In contrast, the red patches represent cell lineages in which the White+ gene is expressed. Early in development, when the heterochromatin is first formed, it spreads into neighboring euchromatin to different extents in different embryonic cells (see Figure 4-36). The presence of large patches of red and white cells reveals that the state of transcriptional activity, as determined by the packaging of this gene into chromatin in those ancestor cells, is inherited by all daughter cells.









[Figure 4-38 labels: acetyl lysine; dimethyl lysine; trimethyl lysine; (B) SERINE PHOSPHORYLATION; serine]

Figure 4-38 Some prominent types of covalent amino acid side-chain modifications found on nucleosomal histones. (A) Three different levels of lysine methylation are shown; each can be recognized by a different binding protein and thus each can have a different significance for the cell. Note that acetylation removes the plus charge on lysine, and that, most importantly, an acetylated lysine cannot be methylated, and vice versa. (B) Serine phosphorylation adds a negative charge to a histone. Modifications not shown here are the mono- or di-methylation of an arginine, the phosphorylation of a threonine, the addition of ADP-ribose to a glutamic acid, and the addition of a ubiquityl, sumoyl, or biotin group to a lysine.

that requires the precise amino acid sequences of the core histones. This mechanism of gene control therefore helps to explain the remarkably slow change in the histones over time.

The Core Histones Are Covalently Modified at Many Different Sites

The amino acid side chains of the four histones in the nucleosome core are subjected to a remarkable variety of covalent modifications, including the acetylation of lysines, the mono-, di-, and tri-methylation of lysines, and the phosphorylation of serines (Figure 4-38). A large number of these side-chain modifications occur on the eight relatively unstructured N-terminal "histone tails" that protrude from the nucleosome (Figure 4-39). However, there are also specific side-chain modifications on the nucleosome's globular core (Figure 4-40).

All of the above types of modifications are reversible. The modification of a particular amino acid side chain in a nucleosome is created by a specific enzyme, with most of these enzymes acting only on one or a few sites. A different enzyme is responsible for removing each side-chain modification. Thus, for example, acetyl groups are added to specific lysines by a set of different histone acetyl transferases (HATs) and removed by a set of histone deacetylase complexes (HDACs). Likewise, methyl groups are added to lysine side chains by a set of different histone methyl transferases and removed by a set of histone demethylases.

Each enzyme is recruited to specific sites on the chromatin at defined times in each cell's life history. For the most part, the initial recruitment of these enzymes depends on gene regulatory proteins that bind to specific DNA sequences along chromosomes, and these are produced at different times in the life of an organism, as described in Chapter 7. But in at least some cases, the covalent modifications on nucleosomes can persist long after the gene regulatory proteins that first induced them have disappeared, thereby carrying a memory in the cell of its developmental history. Very different patterns of covalent modifications are therefore found on different groups of nucleosomes, according to their exact position on a chromosome and the status of the cell.
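The rules stated above for a single lysine side chain (reversible modifications, separate enzymes for adding and removing each mark, and the mutual exclusion of acetylation and methylation noted in Figure 4-38) can be summarized as a small state model. This is an illustrative sketch only: the class and function names are hypothetical, methyl groups are added or removed one at a time as a simplification, and the assumption that methylated lysines keep their positive charge is stated here rather than taken from the text.

```python
from enum import Enum


class LysineState(Enum):
    UNMODIFIED = 0
    MONOMETHYL = 1
    DIMETHYL = 2
    TRIMETHYL = 3
    ACETYL = 4


# Methylation levels in order; acetylation is a separate, exclusive state.
_METHYL_LADDER = [LysineState.UNMODIFIED, LysineState.MONOMETHYL,
                  LysineState.DIMETHYL, LysineState.TRIMETHYL]


def acetylate(state: LysineState) -> LysineState:
    """Role of a histone acetyl transferase (HAT)."""
    if state is not LysineState.UNMODIFIED:
        raise ValueError("only an unmodified lysine can be acetylated")
    return LysineState.ACETYL


def deacetylate(state: LysineState) -> LysineState:
    """Role of a histone deacetylase complex (HDAC)."""
    if state is not LysineState.ACETYL:
        raise ValueError("lysine is not acetylated")
    return LysineState.UNMODIFIED


def methylate(state: LysineState) -> LysineState:
    """Role of a histone methyl transferase; adds one methyl group."""
    if state is LysineState.ACETYL:
        raise ValueError("an acetylated lysine cannot be methylated")
    if state is LysineState.TRIMETHYL:
        raise ValueError("lysine already carries three methyl groups")
    return _METHYL_LADDER[_METHYL_LADDER.index(state) + 1]


def demethylate(state: LysineState) -> LysineState:
    """Role of a histone demethylase; removes one methyl group."""
    if state not in _METHYL_LADDER[1:]:
        raise ValueError("lysine is not methylated")
    return _METHYL_LADDER[_METHYL_LADDER.index(state) - 1]


def carries_positive_charge(state: LysineState) -> bool:
    """Acetylation removes the plus charge; other states are assumed to keep it."""
    return state is not LysineState.ACETYL


lysine = LysineState.UNMODIFIED
lysine = methylate(methylate(lysine))
print(lysine.name, "positive charge:", carries_positive_charge(lysine))
```

Modeling the two kinds of mark as states of one variable, rather than as independent flags, makes the mutual exclusion impossible to violate by construction.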






Control of Gene Expression

An organism's DNA encodes all of the RNA and protein molecules required to construct its cells. Yet a complete description of the DNA sequence of an organism (be it the few million nucleotides of a bacterium or the few billion nucleotides of a human) no more enables us to reconstruct the organism than a list of English words enables us to reconstruct a play by Shakespeare. In both cases, the problem is to know how the elements in the DNA sequence or the words on the list are used. Under what conditions is each gene product made, and, once made, what does it do? In this chapter we discuss the first half of this problem: the rules and mechanisms by which a subset of the genes is selectively expressed in each cell. The mechanisms that control the expression of genes operate at many levels, and we discuss the different levels in turn. We begin with an overview of some basic principles of gene control in multicellular organisms.

AN OVERVIEW OF GENE CONTROL

The different cell types in a multicellular organism differ dramatically in both structure and function. If we compare a mammalian neuron with a lymphocyte, for example, the differences are so extreme that it is difficult to imagine that the two cells contain the same genome (Figure 7-1). For this reason, and because cell differentiation is often irreversible, biologists originally suspected that genes might be selectively lost when a cell differentiates. We now know [...]


Chapter 7: Control of Gene Expression

Figure 7-1 A mammalian neuron and a lymphocyte. The long branches of this neuron from the retina enable it to receive electrical signals from many cells and carry those signals to many neighboring cells. The lymphocyte is a white blood cell involved in the immune response to infection and moves freely through the body. Both of these cells contain the same genome, but they express different RNAs and proteins. (From B.B. Boycott, Essays on the Nervous System [R. Bellairs and E.G. Gray, eds.]. Oxford, UK: Clarendon Press, 1974.)

Further evidence that large blocks of DNA are not lost or rearranged during vertebrate development comes from comparing the detailed banding patterns detectable in condensed chromosomes at mitosis (see Figure 4-11). By this criterion the chromosome sets of differentiated cells in the human body appear to be identical. Moreover, comparisons of the genomes of different cells based on recombinant DNA technology have confirmed, as a general rule, that the changes in gene expression that underlie the development of multicellular organisms do not rely on changes in the DNA sequences of the corresponding genes. There are, however, a few cases where DNA rearrangements of the genome take place during the development of an organism, most notably in generating the diversity of the immune system of mammals, which we discuss in Chapter 25.

Different Cell Types Synthesize Different Sets of Proteins

As a first step in understanding cell differentiation, we would like to know how many differences there are between any one cell type and another. Although we still do not have a detailed answer to this fundamental question, we can make certain general statements.

1. Many processes are common to all cells, and any two cells in a single organism therefore have many proteins in common. These include the structural proteins of chromosomes, RNA polymerases, DNA repair enzymes, ribosomal proteins, enzymes involved in the central reactions of metabolism, and many of the proteins that form the cytoskeleton.
2. Some proteins are abundant in the specialized cells in which they function and cannot be detected elsewhere, even by sensitive tests. Hemoglobin, for example, can be detected only in red blood cells.
3. Studies of the number of different mRNAs suggest that, at any one time, a typical human cell expresses 30-60% of its approximately 25,000 genes. When the patterns of mRNAs in a series of different human cell lines are compared, it is found that the level of expression of almost every active gene varies from one cell type to another. A few of these differences are striking, like that of hemoglobin noted above, but most are much more subtle. Even genes that are expressed in all cell types vary in their level of expression from one cell type to the next. The patterns of mRNA abundance (determined using DNA microarrays, discussed in Chapter 8) are so characteristic of cell type that they can be used to type human cancer cells of uncertain tissue origin (Figure 7-3).
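The idea of "typing" a cell by its mRNA pattern, mentioned in point 3 and in Figure 7-3, can be sketched as a simple profile-matching calculation. The minimal Python sketch below is only an illustration under stated assumptions: the five-gene profiles, the tissue labels, and the choice of Pearson correlation as the similarity measure are all hypothetical and are not taken from the study shown in the figure, which compared 1800 genes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def type_sample(unknown, references):
    """Return the tissue label whose reference profile correlates best."""
    return max(references, key=lambda tissue: pearson(unknown, references[tissue]))


# Hypothetical relative expression levels for five genes
# (positive = above the average across cell lines, negative = below it).
reference_profiles = {
    "lung": [2.0, -1.0, 0.5, -2.0, 1.5],
    "renal": [-1.5, 2.0, -0.5, 1.0, -2.0],
    "colon": [0.5, 0.5, -2.0, 2.0, -1.0],
}
unknown_sample = [1.8, -0.7, 0.2, -1.6, 1.1]

best_match = type_sample(unknown_sample, reference_profiles)
print("Unknown sample most resembles:", best_match)
```

With these invented numbers the unknown sample correlates positively only with the "lung" profile, which mirrors the way the unknown tumor in Figure 7-3 is assigned to a known class.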

in gene expression between cell types is through methods that directly display the levels of proteins and their post-translational modifications (Figure 7-4).


Figure 7-2 Evidence that a differentiated cell contains all the genetic instructions necessary to direct the formation of a complete organism. (A) The nucleus of a skin cell from an adult frog transplanted into an enucleated egg can give rise to an entire tadpole. The broken arrow indicates that, to give the transplanted genome time to adjust to an embryonic environment, a further transfer step is required in which one of the nuclei is taken from the early embryo that begins to develop and is put back into a second enucleated egg. (B) In many types of plants, differentiated cells retain the ability to "dedifferentiate," so that a single cell can form a clone of progeny cells that later give rise to an entire plant. (C) A differentiated cell nucleus from an adult cow introduced into an enucleated egg from a different cow can give rise to a calf. Different calves produced from the same differentiated cell donor are genetically identical and are therefore clones of one another. (A, modified from J.B. Gurdon, Sci. Am. 219:24-35, 1968. With permission from Scientific American.)

External Signals Can Cause a Cell to Change the Expression of Its Genes

Most of the specialized cells in a multicellular organism are capable of altering their patterns of gene expression in response to extracellular cues. If a liver cell is exposed to a glucocorticoid hormone, for example, the production of several specific proteins is dramatically increased. Glucocorticoids are released in the body during periods of starvation or intense exercise and signal the liver to increase the


Figure 7-4 Differences in the proteins expressed by two human tissues. (A) Human brain. (B) Human liver. In each panel, the proteins are displayed using two-dimensional polyacrylamide-gel electrophoresis [...] spectrometry (see pp. 519-521) provide much more detailed information and are therefore more commonly used. (Courtesy of Tim Myers and Leigh Anderson, Large Scale Biology Corporation.)

Figure 7-3 Differences in mRNA expression patterns among different types of human cancer cells. This figure summarizes a very large set of measurements in which the mRNA levels of 1800 selected genes (arranged top to bottom) were determined for 142 different human tumor cell lines (arranged left to right), each from a different patient. Each small red bar indicates that the given gene in the given tumor is transcribed at a level significantly higher than the average across all the cell lines. Each small green bar indicates a less-than-average expression level, and each black bar denotes an expression level that is close to average across the different tumors. The procedure used to generate these data, mRNA isolation followed by hybridization to DNA microarrays, is described in Chapter 8 (pp. 574-575). The figure shows that the relative expression levels of each of the 1800 genes analyzed vary among the different tumors (seen by following a given gene from left to right across the figure). This analysis also shows that each type of tumor has a characteristic gene expression pattern. This information can be used to "type" cancer cells of unknown tissue origin by matching the gene expression profiles to those of known tumors. For example, the unknown sample in the figure has been identified as a lung cancer. (Courtesy of Patrick O. Brown, David Botstein, and the Stanford Expression Collaboration.)



Figure 7-5 Six steps at which eucaryotic gene expression can be controlled. Controls that operate at steps 1 through 5 are discussed in this chapter. Step 6, the regulation of protein activity, occurs largely through covalent post-translational modifications including phosphorylation, acetylation, and ubiquitylation (see Table 3-3, p. 186) and is discussed in many chapters throughout the book.

production of glucose from amino acids and other small molecules; the set of proteins whose production is induced includes enzymes such as tyrosine aminotransferase, which helps to convert tyrosine to glucose. When the hormone is no longer present, the production of these proteins drops to its normal level. Other cell types respond to glucocorticoids differently. Fat cells, for example, reduce the production of tyrosine aminotransferase, while some other cell types do not respond to glucocorticoids at all. These examples illustrate a general feature of cell specialization: different cell types often respond differently to the same extracellular signal. Underlying such adjustments that occur in response to extracellular signals, there are features of the gene expression pattern that do not change and give each cell type its permanently distinctive character.

Gene Expression Can Be Regulated at Many of the Steps in the Pathway from DNA to RNA to Protein

If differences among the various cell types of an organism depend on the particular genes that the cells express, at what level is the control of gene expression exercised? As we saw in the previous chapter, there are many steps in the pathway leading from DNA to protein. We now know that all of them can in principle be regulated. Thus a cell can control the proteins it makes by (1) controlling when and how often a given gene is transcribed (transcriptional control), (2) controlling the splicing and processing of RNA transcripts (RNA processing control), (3) selecting which completed mRNAs are exported from the nucleus to the cytosol and determining where in the cytosol they are localized (RNA transport and localization control), (4) selecting which mRNAs in the cytoplasm are translated by ribosomes (translational control), (5) selectively destabilizing certain mRNA molecules in the cytoplasm (mRNA degradation control), or (6) selectively activating, inactivating, degrading, or locating specific protein molecules after they have been made (protein activity control) (Figure 7-5). For most genes transcriptional controls are paramount. This makes sense because, of all the possible control points illustrated in Figure 7-5, only transcriptional control ensures that the cell will not synthesize superfluous intermediates. In the following sections we discuss the DNA and protein components that perform this function by regulating the initiation of gene transcription. We shall return at the end of the chapter to the many additional ways of regulating gene expression.
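The argument that only transcriptional control avoids superfluous intermediates can be made concrete with a small bookkeeping sketch. The Python below is a deliberately simplified illustration, not a model of any real gene: the names given to the intermediates and the assumption that each step simply adds one molecule to the list are ours, introduced only to show which products have already been synthesized by the time each of the six control points can act.

```python
# The six control points named in the text, keyed by step number.
CONTROL_STEPS = {
    1: "transcriptional control",
    2: "RNA processing control",
    3: "RNA transport and localization control",
    4: "translational control",
    5: "mRNA degradation control",
    6: "protein activity control",
}

# Simplifying assumption: the molecules that must already exist
# before the control at each step can act on the pathway.
PRODUCTS_BEFORE_STEP = {
    1: [],
    2: ["primary RNA transcript"],
    3: ["primary RNA transcript", "processed mRNA"],
    4: ["primary RNA transcript", "processed mRNA", "cytosolic mRNA"],
    5: ["primary RNA transcript", "processed mRNA", "cytosolic mRNA"],
    6: ["primary RNA transcript", "processed mRNA", "cytosolic mRNA", "protein"],
}


def intermediates_already_made(step):
    """List what the cell has synthesized if expression is shut off at this step."""
    return list(PRODUCTS_BEFORE_STEP[step])


for step, name in CONTROL_STEPS.items():
    made = intermediates_already_made(step)
    print(f"Step {step} ({name}): {len(made)} intermediate(s) already made")
```

Only step 1 returns an empty list, which is the point made in the text: shutting a gene off at any later step means that at least one intermediate has already been produced.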

Summary

The genome of a cell contains in its DNA sequence the information to make many thousands of different protein and RNA molecules. A cell typically expresses only a fraction of its genes, and the different types of cells in multicellular organisms arise because different sets of genes are expressed. Moreover, cells can change the pattern of genes they express in response to changes in their environment, such as signals from other cells. Although all of the steps involved in expressing a gene can in principle be regulated, for most genes the initiation of RNA transcription is the most important point of control.



DNA-BINDING MOTIFS IN GENE REGULATORY PROTEINS

How does a cell determine which of its thousands of genes to transcribe? As outlined in Chapter 6, the transcription of each gene is controlled by a regulatory region of DNA relatively near the site where transcription begins. Some regulatory regions are simple and act as switches thrown by a single signal. Many others are complex and resemble tiny microprocessors, responding to a variety of signals that they interpret and integrate in order to switch their neighboring gene on or off. Whether complex or simple, these switching devices are found in all cells and are composed of two types of fundamental components: (1) short stretches of DNA of defined sequence and (2) gene regulatory proteins that recognize and bind to this DNA. We begin our discussion of gene regulatory proteins by describing how they were discovered.

Gene Regulatory Proteins Were Discovered Using Bacterial Genetics

Genetic analyses in bacteria carried out in the 1950s provided the first evidence for the existence of gene regulatory proteins (often loosely called "transcription factors") that turn specific sets of genes on or off. One of these regulators, the lambda repressor, is encoded by a bacterial virus, bacteriophage lambda. The repressor shuts off the viral genes that code for the protein components of new virus particles and thereby enables the viral genome to remain a silent passenger in the bacterial chromosome, multiplying with the bacterium when conditions are favorable for bacterial growth (see Figure 5-78). The lambda repressor was among the first gene regulatory proteins to be characterized, and it remains one of the best understood, as we discuss later. Other bacterial regulators respond to nutritional conditions by shutting off genes encoding specific sets of metabolic enzymes when they are not needed. The Lac repressor, the first of these bacterial proteins to be recognized, turns off the production of the proteins responsible for lactose metabolism when this sugar is absent from the medium.

The first step toward understanding gene regulation was the isolation of mutant strains of bacteria and bacteriophage lambda that were unable to shut off specific sets of genes. It was proposed at the time, and later proven, that most of these mutants were deficient in proteins acting as specific repressors for these sets of genes. Because these proteins, like most gene regulatory proteins, are present in small quantities, it was difficult and time-consuming to isolate them. They were eventually purified by fractionating cell extracts. Once isolated, the proteins were shown to bind to specific DNA sequences close to the genes that they regulate. The precise DNA sequences that they recognized were then determined by a combination of classical genetics and methods for studying protein-DNA interactions discussed later in this chapter.

The Outside of the DNA Helix Can Be Read by Proteins

Figure 7-6 Double-helical structure of DNA. A space-filling model of DNA showing the major and minor grooves on the outside of the double helix. The atoms are colored as follows: carbon, dark blue; nitrogen, light blue; hydrogen, white; oxygen, red; phosphorus, yellow.

Figure 7-7 How the different base pairs in DNA can be recognized from their edges without the need to open the double helix. The four possible configurations of base pairs are shown, with potential hydrogen bond donors indicated in blue, potential hydrogen bond acceptors in red, and hydrogen bonds of the base pairs themselves as a series of short parallel red lines. Methyl groups, which form hydrophobic protuberances, are shown in yellow, and hydrogen atoms that are attached to carbons, and are therefore unavailable for hydrogen bonding, are white. (From C. Branden and J. Tooze, Introduction to Protein Structure, 2nd ed. New York: Garland Publishing, 1999.)

sequence and another. It is now clear, however, that the outside of the double helix is studded with DNA sequence information that gene regulatory proteins can recognize without having to open the double helix. The edge of each base pair is exposed at the surface of the double helix, presenting a distinctive pattern of hydrogen bond donors, hydrogen bond acceptors, and hydrophobic patches for proteins to recognize in both the major and minor groove (Figure 7-7). But only in the major groove are the patterns markedly different for each of the four base-pair arrangements (Figure 7-8). For this reason, gene regulatory proteins generally make specific contacts with the major groove, as we shall see.

Figure 7-8 A DNA recognition code. The edge of each base pair, seen here looking directly at the major or minor groove, contains a distinctive pattern of hydrogen bond donors, hydrogen bond acceptors, and methyl groups. From the major groove, each of the four base-pair configurations projects a unique pattern of features. From the minor groove, however, the patterns are similar for G-C and C-G as well as for A-T and T-A. The color code is the same as that in Figure 7-7. (From C. Branden and J. Tooze, Introduction to Protein Structure, 2nd ed. New York: Garland Publishing, 1999.)
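The recognition code summarized in Figure 7-8 can be written down as a small lookup table and checked for uniqueness. In the Python sketch below, the letter strings are an assumption on our part: they follow the commonly cited summary of base-pair edge chemistry (A = hydrogen bond acceptor, D = hydrogen bond donor, H = nonpolar hydrogen, M = methyl group) rather than being transcribed from the figure itself, and the function name is hypothetical.

```python
# Features presented along the edge of each base pair, read across the groove.
# A = H-bond acceptor, D = H-bond donor, H = nonpolar hydrogen, M = methyl group.
# These strings are an assumed encoding of the standard description.
MAJOR_GROOVE = {"A-T": "ADAM", "T-A": "MADA", "G-C": "AADH", "C-G": "HDAA"}
MINOR_GROOVE = {"A-T": "AHA", "T-A": "AHA", "G-C": "ADA", "C-G": "ADA"}


def distinguishable(code):
    """Group base pairs by pattern; pairs in the same group look identical."""
    groups = {}
    for pair, pattern in code.items():
        groups.setdefault(pattern, []).append(pair)
    return sorted(sorted(group) for group in groups.values())


print("Major groove groups:", distinguishable(MAJOR_GROOVE))
print("Minor groove groups:", distinguishable(MINOR_GROOVE))
```

Under this encoding the major groove yields four groups of one base pair each, whereas the minor groove yields only two groups (A-T with T-A, and G-C with C-G), which is the distinction the caption draws.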



Short DNA Sequences Are Fundamental Components of Genetic Switches

A specific nucleotide sequence can be "read" as a pattern of molecular features on the surface of the DNA double helix. Particular nucleotide sequences, each typically less than 20 nucleotide pairs in length, function as fundamental components of genetic switches.

We now turn to the gene regulatory proteins themselves, the second fundamental component of genetic switches. We begin with the structural features that allow these proteins to recognize short, specific DNA sequences contained in a much longer double helix.
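Because the recognition sites described in this section are short, defined strings of nucleotides within a much longer double helix, locating them in a sequence is, to a first approximation, a string search carried out on both strands. The Python sketch below illustrates this under simplifying assumptions: it requires an exact match (real proteins tolerate variation around a consensus), the surrounding DNA is invented, and the site used is one of the sequences listed in Table 7-1. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(dna, site):
    """Return (position, strand) for every exact occurrence of the site."""
    hits = []
    rc_site = reverse_complement(site)
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


site = "AACGGGTTAA"  # a recognition sequence listed in Table 7-1
# Invented flanking DNA containing the site once on each strand.
dna = "GCGT" + site + "CATG" + reverse_complement(site) + "AC"

print(find_sites(dna, site))
```

Searching both strands matters because a protein approaching the double helix can read the same site from either orientation; a site that is its own reverse complement is reported once, on the "+" strand.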

Gene Regulatory Proteins Contain Structural Motifs That Can Read DNA Sequences

Molecular recognition in biology generally relies on an exact fit between the surfaces of two molecules, and the study of gene regulatory proteins has provided some of the clearest examples of this principle. A gene regulatory protein recognizes a specific DNA sequence because the surface of the protein is extensively

Table 7-1 Some Gene Regulatory Proteins and the DNA Sequences That They Recognize

[Table body not recoverable. Legible entries: the sequence AACGGGTTAA; Bicoid; and, under Mammals, Sp1, Oct1 Pou domain, MyoD, and p53.]

*For convenience, only one recognition sequence, rather than a consensus sequence (see Figure 6-12), is given for each protein.



complementary to the special surface features of the double helix in that region. In most cases the protein makes a series of contacts with the DNA, involving hydrogen bonds, ionic bonds, and hydrophobic interactions. Although each individual contact is weak, the 20 or so that are typically formed at the protein-DNA interface add together to ensure that the interaction is both highly specific and very strong (Figure 7-9). In fact, DNA-protein interactions include some of the tightest and most specific molecular interactions known in biology. Although each example of protein-DNA recognition is unique in detail, x-ray crystallographic and NMR spectroscopic studies of several hundred gene regulatory proteins have revealed that many of them contain one or another of a small set of DNA-binding structural motifs. These motifs generally use either α helices or β sheets to bind to the major groove of DNA; this groove, as we have seen, contains sufficient information to distinguish one DNA sequence from any other. The fit is so good that it has been suggested that the dimensions of the basic structural units of nucleic acids and proteins evolved together to permit these molecules to interlock.

The Helix-Turn-Helix Motif Is One of the Simplest and Most Common DNA-Binding Motifs

The first DNA-binding protein motif to be recognized was the helix-turn-helix. Originally identified in bacterial proteins, this motif has since been found in many hundreds of DNA-binding proteins from both eucaryotes and procaryotes. It is constructed from two α helices connected by a short extended chain of amino acids, which constitutes the "turn" (Figure 7-10). The two helices are held at a fixed angle, primarily through interactions between the two helices. The more C-terminal helix is called the recognition helix because it fits into the major groove of DNA; its amino acid side chains, which differ from protein to protein, play an important part in recognizing the specific DNA sequence to which the protein binds. Outside the helix-turn-helix region, the structure of the various proteins that contain this motif can vary enormously (Figure 7-11). Thus each protein "presents" its helix-turn-helix motif to the DNA in a unique way, a feature thought to enhance the versatility of the helix-turn-helix motif by increasing the number of DNA sequences that the motif can be used to recognize. Moreover, in most of these proteins, parts of the polypeptide chain outside the helix-turn-helix domain also make important contacts with the DNA, helping to fine-tune the interaction.

Figure 7-9 The binding of a gene regulatory protein to the major groove of DNA. Only a single contact is shown. Typically, the protein-DNA interface would consist of 10-20 such contacts, involving different amino acids, each contributing to the strength of the protein-DNA interaction. (The sugar-phosphate backbone lies on the outside of the double helix.)



Figure 7-10 The DNA-binding helix-turn-helix motif. The motif is shown in (A), where each white circle denotes the central carbon of an amino acid. The C-terminal α helix (red) is called the recognition helix because it participates in sequence-specific recognition of DNA. As shown in (B), this helix fits into the major groove of DNA, where it contacts the edges of the base pairs (see also Figure 7-7). The N-terminal α helix (blue) functions primarily as a structural component that helps to position the recognition helix.


The group of helix-turn-helix proteins shown in Figure 7-11 demonstrates a common feature of many sequence-specific DNA-binding proteins. They bind as symmetric dimers to DNA sequences that are composed of two very similar "half-sites," which are also arranged symmetrically (Figure 7-12). This arrangement allows each protein monomer to make a nearly identical set of contacts and enormously increases the binding affinity: as a first approximation, doubling the number of contacts doubles the free energy of the interaction and thereby squares the affinity constant.
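The statement that doubling the free energy squares the affinity constant follows from the standard thermodynamic relation between the two, ΔG = -RT ln K. The short Python sketch below works through the arithmetic. The free energy assigned to a single half-site and the temperature are hypothetical values chosen only for illustration, and the calculation makes the same first approximation as the text, namely that the contributions of the two monomers simply add.

```python
from math import exp, isclose

R = 1.987e-3  # gas constant in kcal/(mol*K)
T = 310.0     # assumed temperature in kelvin (37 degrees C)


def affinity_constant(delta_g):
    """Affinity (equilibrium) constant for a binding free energy in kcal/mol."""
    return exp(-delta_g / (R * T))


monomer_dg = -6.0          # hypothetical free energy for one monomer on one half-site
dimer_dg = 2 * monomer_dg  # first approximation: twice the contacts, twice the energy

k_monomer = affinity_constant(monomer_dg)
k_dimer = affinity_constant(dimer_dg)

print(f"K for one monomer: {k_monomer:.3g}")
print(f"K for the dimer:   {k_dimer:.3g}")
print("Dimer K equals monomer K squared:", isclose(k_dimer, k_monomer ** 2, rel_tol=1e-9))
```

Because K depends exponentially on ΔG, any additive gain in free energy becomes a multiplicative gain in affinity, which is why pairing two modest binding surfaces produces such a large increase.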

Homeodomain Proteins Constitute a Special Class of Helix-Turn-Helix Proteins

Not long after the first gene regulatory proteins were discovered in bacteria, genetic analyses in the fruit fly Drosophila led to the characterization of an important class of genes, the homeotic selector genes, that play a critical part in orchestrating fly development. As discussed in Chapter 22, they have since proved to have a fundamental role in the development of higher animals as well. Mutations in these genes can cause one body part in the fly to be converted into another, showing that the proteins they encode control critical developmental decisions. When the nucleotide sequences of several homeotic selector genes were determined in the early 1980s, each proved to code for an almost identical stretch of 60 amino acids that defines this class of proteins and is termed the homeodomain. When the three-dimensional structure of the homeodomain was determined, it was seen to contain a helix-turn-helix motif related to that of


Figure 7-11 Some helix-turn-helix DNA-binding proteins (panels include the tryptophan repressor, lambda Cro, and a lambda repressor fragment). All of the proteins bind DNA as dimers in which the two copies of the recognition helix (red cylinder) are separated by exactly one turn of the DNA helix (3.4 nm). The other helix of the helix-turn-helix motif is colored blue, as in Figure 7-10. The lambda repressor and Cro proteins control bacteriophage lambda gene expression, and the tryptophan repressor and the catabolite activator protein (CAP) control the expression of sets of E. coli genes.


Figure 7-12 A specific DNA sequence recognized by the bacteriophage lambda Cro protein. The nucleotides labeled in green in this sequence are arranged symmetrically, allowing each half of the DNA site to be recognized in the same way by each protein monomer, also shown in green. See Figure 7-11 for the actual structure of the protein.

the bacterial gene regulatory proteins, providing one of the first indications that the principles of gene regulation established in bacteria are relevant to higher organisms as well. More than 60 homeodomain proteins have now been discovered in Drosophila alone, and homeodomain proteins have been identified in virtually all eucaryotic organisms that have been studied, from yeasts to plants to humans. The structure of a homeodomain bound to its specific DNA sequence is shown in Figure 7-13. Whereas the helix-turn-helix motif of bacterial gene regulatory proteins is often embedded in different structural contexts, the helix-turn-helix motif of homeodomains is always surrounded by the same structure (which forms the rest of the homeodomain), suggesting that the motif is always presented to DNA in the same way. Indeed, structural studies have shown that a yeast homeodomain protein and a Drosophila homeodomain protein have very similar conformations and recognize DNA in almost exactly the same manner, although they are identical at only 17 of 60 amino acid positions (see Figure 3-13).

There Are Several Types of DNA-Binding Zinc Finger Motifs

The helix-turn-helix motif is composed solely of amino acids. A second important group of DNA-binding motifs includes one or more zinc atoms as structural components. Although all such zinc-coordinated DNA-binding motifs are called zinc fingers, this description refers only to their appearance in schematic drawings dating from their initial discovery (Figure 7-14A). Subsequent structural studies have shown that they fall into several distinct structural groups, two of which we consider here. The first type was initially discovered in the protein that activates the transcription of a eucaryotic ribosomal RNA gene. It has a simple structure, in which the zinc holds an α helix and a β sheet together (Figure 7-14B). This type of zinc finger is often found in tandem clusters so that the α helix of each can contact the major groove of the DNA, forming a nearly continuous stretch of α helices along the groove. In this way, a strong and specific DNA-protein interaction is built up through a repeating basic structural unit (Figure 7-15). Another type of zinc finger is found in the large family of intracellular receptor proteins (discussed in detail in Chapter 15). It forms a different type of



Figure 7-13 A homeodomain bound to its specific DNA sequence. Two different views of the same structure are shown. (A) The homeodomain is folded into three α helices, which are packed tightly together by hydrophobic interactions. The part containing helices 2 and 3 closely resembles the helix-turn-helix motif. (B) The recognition helix (helix 3, red) forms important contacts with the major groove of DNA. The asparagine (Asn) of helix 3, for example, contacts an adenine, as shown in Figure 7-9. A flexible arm attached to helix 1 forms contacts with nucleotide pairs in the minor groove. The homeodomain shown here is from a yeast gene regulatory protein, but it closely resembles homeodomains from many eucaryotic organisms. (Adapted from C. Wolberger et al., Cell 67:517-528, 1991. With permission from Elsevier.)



Figure 7-14 One type of zinc finger protein. This protein belongs to the Cys-Cys-His-His family of zinc finger proteins, named after the amino acids that grasp the zinc. (A) Schematic drawing of the amino acid sequence of a zinc finger from a frog protein of this class. (B) The three-dimensional structure of this same type of zinc finger is constructed from an antiparallel β sheet (amino acids 1 to 10) followed by an α helix (amino acids 12 to 24). The four amino acids that bind the zinc (Cys 3, Cys 6, His 19, and His 23) hold one end of the α helix firmly to one end of the β sheet. (Adapted from M.S. Lee et al., Science 245:635-637, 1989. With permission from AAAS.)

structure (similar in some respects to the helix-turn-helix motif) in which two α helices are packed together with zinc atoms (Figure 7-16). Like the helix-turn-helix proteins, these proteins usually form dimers that allow one of the two α helices of each subunit to interact with the major groove of the DNA. Although the two types of zinc finger structures discussed in this section are structurally distinct, they share two important features: both use zinc as a structural element, and both use an α helix to recognize the major groove of the DNA.
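The Cys-Cys-His-His arrangement described for the first type of zinc finger lends itself to a simple pattern search. The Python sketch below is an illustration under stated assumptions: the spacing in the pattern is taken directly from the residue positions given in the caption of Figure 7-14 (Cys 3, Cys 6, His 19, His 23), whereas real fingers of this family vary somewhat in spacing; the 25-residue sequence was chosen only so that Cys and His fall at those positions; and the five-residue linker joining the two copies is hypothetical.

```python
import re

# Cys, 2 residues, Cys, 12 residues, His, 3 residues, His:
# the spacing implied by Cys 3, Cys 6, His 19, His 23 in Figure 7-14.
CYS2_HIS2 = re.compile(r"C.{2}C.{12}H.{3}H")


def zinc_ligand_positions(protein):
    """Return 1-based positions of the four zinc-binding residues per finger."""
    results = []
    for match in CYS2_HIS2.finditer(protein):
        first_cys = match.start() + 1
        results.append((first_cys, first_cys + 3, first_cys + 16, first_cys + 20))
    return results


# Illustrative sequence with Cys and His at the positions named in the caption.
finger = "YKCGLCERSFVEKSALSRHQRVHKN"
# Two copies joined by a hypothetical linker, to mimic a tandem cluster.
tandem = finger + "TGEKP" + finger

print(zinc_ligand_positions(finger))
print(zinc_ligand_positions(tandem))
```

A search of this kind finds candidate fingers from sequence alone; whether a candidate actually folds around zinc and contacts DNA can only be established by structural or biochemical study.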

β Sheets Can Also Recognize DNA

sequence recognized depends on the sequence of amino acids that make up the β sheet.


Figure 7-15 DNA binding by a zinc finger protein. (A) The structure of a fragment of a mouse gene regulatory protein bound to a specific DNA site. This protein recognizes DNA by using three zinc fingers of the Cys-Cys-His-His type (see Figure 7-14) arranged as direct repeats. (B) The three fingers have similar amino acid sequences and contact the DNA in similar ways. In both (A) and (B) the zinc atom in each finger is represented by a small sphere. (Adapted from N. Pavletich and C. Pabo, Science 252:810-817, 1991. With permission from AAAS.)


Some Proteins Use Loops That Enter the Major and Minor Grooves to Recognize DNA


Figure 7-16 A dimer of the zinc finger domain of the intracellular receptor family bound to its specific DNA sequence. Each zinc finger domain contains two atoms of Zn (indicated by the small gray spheres); one stabilizes the DNA recognition helix (shown in brown in one subunit and red in the other), and one stabilizes a loop (shown in purple) involved in dimer formation. Each Zn atom is coordinated by four appropriately spaced cysteine residues. Like the helix-turn-helix proteins shown in Figure 7-11, the two recognition helices of the dimer are held apart by a distance corresponding to one turn of the DNA double helix. The specific example shown is a fragment of the glucocorticoid receptor. This is the protein through which cells detect and respond transcriptionally to the glucocorticoid hormones produced in the adrenal gland in response to stress. (Adapted from B.F. Luisi et al., Nature 352:497-505, 1991. With permission from Macmillan Publishers Ltd.)

many tumors, as we shall see in Chapter 20. Many of the p53 mutations observed in cancer cells destroy or alter its DNA-binding properties; indeed, Arg 248, which contacts the minor groove of DNA (see Figure 7-18), is the most frequently mutated p53 residue in human cancers.

The Leucine Zipper Motif Mediates Both DNA Binding and Protein Dimerization

Many gene regulatory proteins recognize DNA as homodimers, probably because, as we have seen, this is a simple way of achieving strong specific binding (see Figure 7-12). Usually, the portion of the protein responsible for dimerization is distinct from the portion that is responsible for DNA binding. One motif, however, combines these two functions elegantly and economically. It is called the leucine zipper motif, so named because of the way the two α helices, one from each monomer, are joined together to form a short coiled-coil (see Figure 3-9). The helices are held together by interactions between hydrophobic amino acid side chains (often on leucines) that extend from one side of each helix. Just beyond the dimerization interface the two α helices separate from each other to form a Y-shaped structure, which allows their side chains to contact the major groove of DNA. The dimer thus grips the double helix like a clothespin on a clothesline (Figure 7-19).

Figure 7-17 The bacterial Met repressor protein. The bacterial Met repressor regulates the genes encoding the enzymes that catalyze methionine synthesis. When this amino acid is abundant, it binds to the repressor, causing a change in the structure of the protein that enables it to bind to DNA tightly, shutting off the synthesis of the enzyme. (A) In order to bind to DNA tightly, the Met repressor must be complexed with S-adenosyl methionine, outlined in red. One subunit of the dimeric protein is shown in green, while the other is shown in blue. The two-stranded β sheet that binds to DNA is formed by one strand from each subunit and is shown in dark green and dark blue. (B) Simplified diagram of the Met repressor bound to DNA, showing how the two-stranded β sheet of the repressor binds to the major groove of DNA. For clarity, the other regions of the repressor have been omitted. (A, adapted from S. Phillips, Curr. Opin. Struct. Biol. 1:89-98, 1991, with permission from Elsevier; B, adapted from W. Somers and S. Phillips, Nature 359:387-393, 1992, with permission from Macmillan Publishers Ltd.)



Figure 7-18 DNA recognition by the p53 protein. The most important DNA contacts are made by arginine 248 and lysine 120, which extend from the protruding loops entering the minor and major grooves. The folding of the p53 protein requires a zinc atom (shown as a sphere), but the way in which the zinc is grasped by the protein is completely different from that of the zinc finger proteins, described previously.

Heterodimerization Expands the Repertoire of DNA Sequences That Gene Regulatory Proteins Can Recognize

structure shown is of the yeast Gcn4 protein, which regulates transcription in response to the availability of amino acids in the environment. (Adapted from T.E. Ellenberger et al., Cell 71:1223-1237, 1992. With permission from Elsevier.)

Figure 7-20 Heterodimerization of leucine zipper proteins can alter their DNA-binding specificity. Leucine zipper homodimers bind to symmetric DNA sequences, as shown in the left-hand and center drawings. These two proteins recognize different DNA sequences, as indicated by the red and blue regions in the DNA. The two different monomers can combine to form a heterodimer, which now recognizes a hybrid DNA sequence, composed from one red and one blue region.




There are, however, limits to this promiscuity: for example, if all the many types of leucine zipper proteins in a typical eucaryotic cell formed heterodimers, the amount of "cross-talk" between the gene regulatory circuits of a cell would presumably be so great as to cause chaos. Whether or not a particular heterodimer can form depends on how well the hydrophobic surfaces of the two leucine zipper α helices mesh with each other, which in turn depends on the exact amino acid sequences of the two zipper regions. Thus, each leucine zipper protein in the cell can form dimers with only a small set of other leucine zipper proteins. Heterodimerization is an example of combinatorial control, in which combinations of different proteins, rather than individual proteins, control a cell process. Heterodimerization as a mechanism for combinatorial control of gene expression occurs in many different types of gene regulatory proteins (Figure 7-21). Combinatorial control is a major theme that we shall encounter repeatedly in this chapter, and the formation of heterodimeric gene regulatory complexes is only one of many ways in which proteins work in combinations to control gene expression. Certain combinations of gene regulatory proteins have become "hardwired" in the cell; for example, two distinct DNA-binding domains can, through gene rearrangements occurring over evolutionary time scales, become joined into a single polypeptide chain that displays a novel DNA-binding specificity (Figure 7-22).
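The way heterodimerization enlarges the set of recognizable DNA sequences, and the way limited pairing keeps that set in check, can be illustrated by simple enumeration. In the Python sketch below, everything specific is hypothetical: the three monomer names, the half-site each is assumed to recognize, and the list of pairs assumed to be able to dimerize. The composite site for a dimer is built from one half-site followed by the reverse complement of the other, in the spirit of Figure 7-20.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical monomers and the DNA half-site each one recognizes.
HALF_SITES = {"A": "ATGAC", "B": "TGACG", "C": "CACGT"}

# Hypothetical compatibility: pairs whose zipper surfaces are assumed to mesh.
CAN_PAIR = {("A", "A"), ("B", "B"), ("C", "C"), ("A", "B")}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_symmetric(site):
    """A site is symmetric if it reads the same on both strands."""
    return site == reverse_complement(site)


def dimer_sites():
    """Composite recognition site for every dimer that is allowed to form."""
    sites = {}
    for m1, m2 in combinations_with_replacement(sorted(HALF_SITES), 2):
        if (m1, m2) in CAN_PAIR:
            sites[m1 + m2] = HALF_SITES[m1] + reverse_complement(HALF_SITES[m2])
    return sites


n = len(HALF_SITES)
print("Dimers possible in principle:", n * (n + 1) // 2)
for dimer, site in dimer_sites().items():
    kind = "symmetric" if is_symmetric(site) else "hybrid"
    print(dimer, site, kind)
```

With n monomers there are n(n + 1)/2 conceivable dimers, so the number grows quickly with n; the compatibility set plays the role of the meshing requirement described in the text, allowing only a subset of those dimers to form.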

Figure 7-21 A heterodimer composed of two homeodomain proteins bound to its DNA recognition site. The yellow helix 4 of the protein on the right (Matα2) is unstructured in the absence of the protein on the left (Mata1), forming a helix only upon heterodimerization. The DNA sequence is recognized jointly by both proteins; some of the protein-DNA contacts made by Matα2 were shown in Figure 7-13. These two proteins are from budding yeast, where the heterodimer specifies a particular cell type (see Figure 7-65). The helices are numbered in accordance with Figure 7-13. (Adapted from T. Li et al., Science 270:262-269, 1995. With permission from AAAS.)

The Helix-Loop-Helix Motif Also Mediates Dimerization and DNA Binding

Another important DNA-binding motif, related to the leucine zipper, is the helix-loop-helix (HLH) motif, which differs from the helix-turn-helix motif discussed earlier. An HLH motif consists of a short α helix connected by a loop to a second, longer α helix. The flexibility of the loop allows one helix to fold back

Figure 7-22 Two DNA-binding domains covalently joined by a flexible polypeptide. The structure shown (called a POU-domain) consists of both a homeodomain and a helix-turn-helix structure joined by a flexible polypeptide "leash," indicated by the broken lines. A single gene encodes the entire protein, which is synthesized as a continuous polypeptide chain. The covalent joining of two structures in this way results in a large increase in the affinity of the protein for its specific DNA sequence compared with the DNA affinity of either separate structure. The group of mammalian gene regulatory proteins exemplified by this structure regulate the production of growth factors, immunoglobulins, and other molecules involved in development. The particular example shown is from the Oct1 protein. (Adapted from J.D. Klemm et al., Cell 77:21-32, 1994. With permission from Elsevier.)


Chapter 7: Control of Gene Expression

Figure 7-23 A helix-loop-helix (HLH) dimer bound to DNA. The two monomers are held together in a four-helix bundle: each monomer contributes two α helices connected by a flexible loop of protein (red). A specific DNA sequence is bound by the two α helices that project from the four-helix bundle. (Adapted from A.R. Ferre-D'Amare et al., Nature 363:38-45, 1993. With permission from Macmillan Publishers Ltd.)

and pack against the other. As shown in Figure 7-23, this two-helix structure binds both to DNA and to the HLH motif of a second HLH protein. The second HLH protein can be the same (creating a homodimer) or different (creating a heterodimer). In either case, two α helices that extend from the dimerization interface make specific contacts with the DNA.

It Is Not Yet Possible to Predict the DNA Sequences Recognized by All Gene Regulatory Proteins

The various DNA-binding motifs that we have discussed provide structural frameworks from which specific amino acid side chains extend to contact specific base pairs in the DNA. It is reasonable to ask, therefore, whether there is a

Having outlined the general features of gene regulatory proteins, we turn to some of the methods that are now used to study them.

[Figure 7-24 panel labels: active HLH homodimer; inactive HLH heterodimer]

Figure 7-24 Inhibitory regulation by truncated HLH proteins. The HLH motif is responsible for both dimerization and DNA binding. On the left, an HLH homodimer recognizes a symmetric DNA sequence. On the right, the binding of a full-length HLH protein (blue) to a truncated HLH protein (green) that lacks the DNA-binding α helix generates a heterodimer that is unable to bind DNA tightly. If present in excess, the truncated protein molecule blocks the homodimerization of the full-length HLH protein and thereby prevents it from binding to DNA.


DNA-BINDING MOTIFS IN GENE REGULATORY PROTEINS

Figure 7-25 One of the most common protein-DNA interactions. Because of its specific geometry of hydrogen-bond acceptors (see Figure 7-7), the side chain of arginine unambiguously recognizes guanine. Figure 7-9 shows another common protein-DNA interaction.

A Gel-Mobility Shift Assay Readily Detects Sequence-Specific DNA-Binding Proteins

Genetic analyses, which provided a route to the gene regulatory proteins of bacteria, yeast, and Drosophila, are much more difficult in vertebrates. Therefore, the isolation of vertebrate gene regulatory proteins had to await the development of different approaches. Many of these approaches rely on the detection in a cell extract of a DNA-binding protein that specifically recognizes a DNA sequence known to control the expression of a particular gene. One of the most common ways to detect and study sequence-specific DNA-binding proteins is based on the effect of a bound protein on the migration of DNA molecules in an electric field.

[Figure labels: major groove; minor groove]

[Figure 7-26 panel labels: TTK finger 1; TTK finger 2; Zif finger 1; Zif finger 2; GLI finger 4; GLI finger 5]

Figure 7-26 Summary of sequence-specific interactions between six different zinc fingers and their DNA recognition sequences. Even though all six Zn fingers have the same overall structure (see Figure 7-14), each binds to a different DNA sequence. The numbered amino acids form the α helix that recognizes DNA (Figures 7-14 and 7-15), and those that make sequence-specific DNA contacts are green. Bases contacted by protein are orange. Although arginine-guanine contacts are common (see Figure 7-25), guanine can also be recognized by serine, histidine, and lysine, as shown. Moreover, the same amino acid (serine, in this example) can recognize more than one base. Two of the Zn fingers depicted are from the TTK protein (a Drosophila protein that functions in development); two are from the mouse protein (Zif268) that was shown in Figure 7-15; and two are from a human protein (GLI) whose aberrant forms can cause certain types of cancers. (Adapted from C. Branden and J. Tooze, Introduction to Protein Structure, 2nd ed. New York: Garland Publishing, 1999.)



A DNA molecule is highly negatively charged and will therefore move rapidly toward a positive electrode when it is subjected to an electric field. When analyzed by polyacrylamide-gel electrophoresis (see p. 534), DNA molecules are separated according to their size because smaller molecules are able to penetrate the fine gel meshwork more easily than large ones. Protein molecules bound to a DNA molecule will cause it to move more slowly through the gel; in general, the larger the bound protein, the greater the retardation of the DNA molecule. This phenomenon provides the basis for the gel-mobility shift assay, which allows even trace amounts of a sequence-specific DNA-binding protein to be readily detected. In this assay, a short DNA fragment of specific length and sequence (produced either by DNA cloning or by chemical synthesis, as discussed in Chapter 8) is radioactively labeled and mixed with a cell extract; the mixture is then loaded onto a polyacrylamide gel and subjected to electrophoresis. If the DNA fragment corresponds to a chromosomal region where, for example, several sequence-specific proteins bind, autoradiography (see pp. 602-603) will reveal a series of DNA bands, each retarded to a different extent and representing a distinct DNA-protein complex. The proteins responsible for each band on the gel can then be separated from one another by subsequent fractionations of the cell extract (Figure 7-27). Once a sequence-specific DNA-binding protein has been purified, the gel-mobility shift assay can be used to study the strength and specificity of its interactions with different DNA sequences, the lifetime of DNA-protein complexes, and other properties critical to the functioning of the protein in the cell.

DNA Affinity Chromatography Facilitates the Purification of Sequence-Specific DNA-Binding Proteins

A particularly powerful protein-purification method called DNA affinity chromatography can be used once the DNA sequence that a gene regulatory protein recognizes has been determined. A double-stranded oligonucleotide of the correct sequence is synthesized by chemical methods and linked to an insoluble porous matrix such as agarose; the matrix with the oligonucleotide attached is




[Figure 7-27 labels: (A) gel result, showing bands C1-C6 and free DNA; (B) fraction number from chromatography column eluted with increasing salt concentration]

Figure 7-27 A gel-mobility shift assay. The principle of the assay is shown schematically in (A). In this example an extract of an antibody-producing cell line is mixed with a radioactive DNA fragment containing about 160 nucleotides of a regulatory DNA sequence from a gene encoding the light chain of the antibody made by the cell line. The effect of the proteins in the extract on the mobility of the DNA fragment is analyzed by polyacrylamide-gel electrophoresis followed by autoradiography. The free DNA fragments migrate rapidly to the bottom of the gel, while those fragments bound to proteins are retarded; the finding of six retarded bands suggests that the extract contains six different sequence-specific DNA-binding proteins (indicated as C1-C6) that bind to this DNA sequence. (For simplicity, any DNA fragments with more than one protein bound have been omitted from the figure.) In (B) a standard chromatographic technique (see pp. 512-513) was used to fractionate the extract (top) and each fraction was mixed with the radioactive DNA fragment, applied to one lane of a polyacrylamide gel, and analyzed as in (A). (B, modified from C. Scheidereit, A. Heguy, and R.G. Roeder, Cell 51:783-793, 1987. With permission from Elsevier.)



then used to construct a column that selectively binds proteins that recognize the particular DNA sequence (Figure 7-28). Purifications as great as 10,000-fold can be achieved by this means with relatively little effort. Although most gene regulatory proteins are present at very low levels in the cell, enough pure protein can usually be isolated by affinity chromatography to obtain a partial amino acid sequence by mass spectrometry or other means (discussed in Chapter 8). If the complete genome sequence of the organism is known, the partial amino acid sequence can be used to identify the gene. The gene not only provides the complete amino acid sequence of the protein; it also provides the means to produce the protein in unlimited amounts through genetic engineering techniques, also discussed in Chapter 8.

The DNA Sequence Recognized by a Gene Regulatory Protein Can Be Determined Experimentally

Gene regulatory proteins can be discovered before the DNA sequence they recognize is known. For example, many of the Drosophila homeodomain proteins were discovered through the isolation of mutations that altered fly development. This allowed the genes encoding the proteins to be identified, and the proteins could then be overexpressed in cultured cells and easily purified. DNA footprinting is one method of determining the DNA sequences recognized by a gene regulatory protein once it has been purified. This strategy also requires a purified fragment of duplex DNA that contains somewhere within it a recognition site for the protein. Short recognition sequences can occur by chance on any long DNA fragment, although it is often necessary to use DNA corresponding to a regulatory region for a gene known to be controlled by the protein of interest. DNA footprinting is based on nucleases or chemicals that randomly cleave DNA at every phosphodiester bond. A bound gene regulatory protein blocks the phosphodiester bonds from attack, thereby revealing the protein's precise recognition site as a protected zone, or footprint (Figure 7-29). A second way of determining the DNA sequences recognized by a gene regulatory protein requires no prior knowledge of what genes the protein might

[Figure 7-28 labels: total cell proteins; DNA-binding proteins from step 1; STEP 2; column with matrix containing only GGGCCC/CCCGGG; medium-salt wash removes all proteins not specific for the sequence; high-salt wash elutes rare protein that specifically recognizes GGGCCC]

Figure 7-28 DNA affinity chromatography. In the first step, all the proteins that can bind DNA are separated from the remainder of the cell proteins on a column containing a huge number of different DNA sequences. Most sequence-specific DNA-binding proteins have a weak (nonspecific) affinity for bulk DNA and are therefore retained on the column. This affinity is due largely to ionic attractions, and the proteins can be washed off the DNA by a solution that contains a moderate concentration of salt. In the second step, the mixture of DNA-binding proteins is passed through a column that contains only DNA of a particular sequence. Typically, all the DNA-binding proteins will stick to the column, the great majority by nonspecific interactions. These are again eluted by solutions of moderate salt concentration, leaving on the column only those proteins (typically one or only a few) that bind specifically and therefore very tightly to the particular DNA sequence. These remaining proteins can be eluted from the column by solutions containing a very high concentration of salt.


[Figure 7-29 labels: region of DNA protected by DNA-binding protein; family of single-stranded DNA molecules labeled at the 5' end; separation by gel electrophoresis; "footprint," where no cleavage is observed; top of gel; panels (A) and (B)]

Figure 7-29 DNA footprinting. (A) Schematic of the method. A DNA fragment is labeled at one end with ³²P, a procedure described in Figure 8-34; next, the DNA is cleaved with a nuclease or chemical that makes random, single-stranded cuts. After the DNA molecule is denatured to separate its two strands, the resultant fragments from the labeled strand are separated on a gel and detected by autoradiography (see Figure 8-33). The pattern of bands from DNA cut in the presence of a DNA-binding protein is compared with that from DNA cut in its absence. When protein is present, it covers the nucleotides at its binding site and protects their phosphodiester bonds from cleavage. As a result, those labeled fragments that would otherwise terminate in the binding site are missing, leaving a gap in the gel pattern called a "footprint." In the example shown, the DNA-binding protein protects seven phosphodiester bonds from the DNA cleaving agent. (B) An actual footprint used to determine the binding site for a gene regulatory protein from humans. The cleaving agent was a small, iron-containing organic molecule that normally cuts at every phosphodiester bond with nearly equal frequency. (B, courtesy of Michele Sawadogo and Robert Roeder.)
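The logic of the footprinting experiment can be sketched as a small Python simulation. This is an idealized illustration under stated assumptions: the fragment length and the protected range are hypothetical, each molecule is assumed to be cut once, and every unprotected bond is assumed to be cut in at least some molecules.

```python
def cleavage_fragments(length, protected=None):
    """Lengths of end-labeled fragments produced by single cuts.

    Bond i (1-based) joins nucleotide i to nucleotide i + 1, so a cut at
    bond i leaves a labeled fragment i nucleotides long. `protected` is an
    inclusive (first_bond, last_bond) range shielded by bound protein.
    """
    lengths = set()
    for bond in range(1, length):
        if protected and protected[0] <= bond <= protected[1]:
            continue  # bound protein blocks the cleaving agent here
        lengths.add(bond)
    return lengths

def footprint(length, protected):
    """Fragment lengths seen without protein but missing with protein."""
    free = cleavage_fragments(length)
    bound = cleavage_fragments(length, protected)
    return sorted(free - bound)

# Hypothetical 30-nucleotide fragment with seven protected bonds.
missing = footprint(length=30, protected=(12, 18))
print(missing)  # [12, 13, 14, 15, 16, 17, 18]
```

The missing fragment lengths correspond to the gap in the gel pattern; reading their positions against a sequencing ladder is what locates the binding site on the DNA.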

[Figure 7-29 lane labels: without protein; with protein. Figure 7-30 label: gene regulatory protein of unknown DNA-binding specificity]

sequence recognized by a gene regulatory protein is known, computerized genome searches can identify candidate genes whose transcription the gene

Figure 7-30 A method for determining the DNA sequence recognized by a gene regulatory protein. A purified gene regulatory protein is mixed with millions of different short DNA fragments, each with a different sequence of nucleotides. A collection of such DNA fragments can be [...] separation is through gel-mobility shifts, as illustrated in Figure 7-27. After separation of the DNA-protein complexes from the free DNA, the DNA fragments are removed from the protein and typically used for several additional rounds of the same selection process (not shown). The nucleotide sequences of those DNA fragments that remain through multiple rounds of binding and release can be determined, and a consensus DNA recognition sequence can thus be generated.
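The final step, deriving a consensus from the selected fragments, can be sketched as follows. This is a minimal illustration, assuming the selected sites have already been aligned to the same length; the example sequences are made up, and the strict-majority rule is one simple choice among several.

```python
from collections import Counter

def consensus(aligned_sites):
    """Most common base at each position of equal-length, pre-aligned sites."""
    if len({len(site) for site in aligned_sites}) != 1:
        raise ValueError("sites must be aligned to the same length")
    result = []
    for column in zip(*aligned_sites):
        base, count = Counter(column).most_common(1)[0]
        # Report N where no single base is in the strict majority.
        result.append(base if count > len(column) / 2 else "N")
    return "".join(result)

# Hypothetical binding sites recovered after several rounds of selection.
selected = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA", "TGACTAA"]
print(consensus(selected))  # TGACTCA
```

A consensus string discards information about how strongly each position is preferred; a table of per-position base counts would retain it, at the cost of a less compact summary.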













Figure 7-31 Phylogenetic footprinting. This example compares DNA sequences upstream of the same gene from five closely related yeasts; identical nucleotides are highlighted in yellow. Phylogenetic footprinting reveals DNA recognition sites for regulatory proteins, as they are typically more conserved than surrounding sequences. Only the region upstream of a particular gene is shown in this example, but the approach is typically used to analyze entire genomes. The gene regulatory proteins that bind to the site outlined in red are shown in Figure 7-21. Some of the shorter phylogenetic footprints in this example represent binding sites for additional gene regulatory proteins, not all of which have been identified. (From M. Kellis et al., Nature 423:241-254, 2003, with permission from Macmillan Publishers Ltd., and D.J. Galgoczy et al., Proc. Natl Acad. Sci. U.S.A. 101:18069-18074, 2004, with permission from National Academy of Sciences.)

regulatory protein of interest might control. However, this strategy is not foolproof. For example, many organisms produce a set of closely related gene regulatory proteins that recognize very similar DNA sequences, and this approach cannot resolve them. In most cases, predictions of the sites of action of gene regulatory proteins obtained from searching genome sequences must, in the end, be tested experimentally.

Phylogenetic Footprinting Identifies DNA Regulatory Sequences Through Comparative Genomics

The widespread availability of complete genome sequences provides a surprisingly simple method for identifying important regulatory sites on DNA, even when the gene regulatory protein that binds them is unknown. In this approach, genomes from several closely related species are compared. If the species are chosen properly, the protein-coding portions of the genomes will be very similar, but the regions between sequences that encode protein or RNA molecules will have diverged considerably, as most of this sequence is functionally irrelevant and therefore not constrained in evolution. Among the exceptions are the regulatory sequences that control gene transcription. These stand out as conserved islands in a sea of nonconserved nucleotides (Figure 7-31). Although the identity of the gene regulatory proteins that recognize the conserved DNA sequences must be determined by other means, phylogenetic footprinting is a powerful method for identifying many of the DNA sequences that control gene expression.

Chromatin Immunoprecipitation Identifies Many of the Sites That Gene Regulatory Proteins Occupy in Living Cells

A gene regulatory protein will not occupy all of its potential DNA-binding sites in the genome at a particular time. Under some conditions, the protein may not be synthesized, and so will be absent from the cell; it may be present but lacking a heterodimer partner; or it may be excluded from the nucleus until an appropriate signal is received from the cell's environment. Even if the gene regulatory
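A computerized search of this kind can be sketched as a scan of both DNA strands for windows that match a recognition sequence. This is a minimal illustration: the toy genome string is hypothetical, the mismatch tolerance is an assumed parameter, and a real search would use a weighted description of the site rather than a single string.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N pairs with N)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq))

def mismatches(site, window):
    """Count differing positions; an N in the site matches any base."""
    return sum(1 for s, w in zip(site, window) if s != "N" and s != w)

def find_sites(genome, site, max_mismatches=0):
    """Return (position, strand) for each window matching on either strand."""
    hits = []
    k = len(site)
    rc_site = reverse_complement(site)
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if mismatches(site, window) <= max_mismatches:
            hits.append((i, "+"))
        elif mismatches(rc_site, window) <= max_mismatches:
            hits.append((i, "-"))
    return hits

genome = "CCGGGCCCTTAAGGGCACAA"  # hypothetical sequence
print(find_sites(genome, "GGGCCC"))     # [(2, '+')]
print(find_sites(genome, "GGGCCC", 1))  # [(2, '+'), (12, '+')]
```

Loosening the mismatch tolerance adds candidate sites quickly, which illustrates why such predictions cannot distinguish between closely related proteins and need experimental confirmation.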
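The search for conserved islands can be sketched as a scan for runs of alignment columns that are identical in every species. This is a minimal illustration under stated assumptions: the five sequences are invented and already aligned, and the requirement of perfect identity over at least `min_length` columns is a deliberately strict, hypothetical criterion.

```python
def conserved_islands(aligned, min_length=4):
    """Return half-open (start, end) ranges identical in every sequence."""
    identical = [len(set(col)) == 1 and "-" not in col for col in zip(*aligned)]
    islands, start = [], None
    # A trailing False acts as a sentinel that closes a run at the end.
    for i, same in enumerate(identical + [False]):
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_length:
                islands.append((start, i))
            start = None
    return islands

# Hypothetical upstream regions from five related species, already aligned.
species = [
    "ACGTTGACTCAGGATCCTAA",
    "TCGATGACTCAGCATACTTA",
    "AGGTTGACTCATGTTCGTAC",
    "ACCATGACTCAGGAACCAAA",
    "CCGTTGACTCAAGATGCTGA",
]
for start, end in conserved_islands(species):
    print(start, end, species[0][start:end])  # 4 11 TGACTCA
```

Real comparisons tolerate occasional substitutions and depend on choosing species at a suitable evolutionary distance, so that unconstrained sequence has had time to diverge.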


Figure 7-32 Chromatin immunoprecipitation. This method allows the identification of all the sites in a genome that a gene regulatory protein occupies in vivo. For the amplification of DNA by a polymerase chain reaction (PCR), see Figure 8-45. The identities of the precipitated, amplified DNA fragments can be determined by hybridizing the mixture of fragments to DNA microarrays, as described in Chapter 8.

[Figure 7-32 labels: regulatory protein A; protein B; gene 1; gene 2; living cell]

protein is present in the nucleus and is competent to bind DNA, components of chromatin or other gene regulatory proteins that can bind to the same or overlapping DNA sequences may occlude many of its potential binding sites on DNA. Chromatin immunoprecipitation provides one way of empirically determining the sites on DNA that a given gene regulatory protein occupies under a particular set of conditions (Figure 7-32). In this approach, proteins are covalently cross-linked to DNA in living cells, the cells are broken open, and the DNA is mechanically sheared into small fragments. Antibodies directed against a given gene regulatory protein are then used to purify DNA that became covalently cross-linked to that protein in the cell. If this DNA is hybridized to microarrays that contain the entire genome displayed as a series of discrete DNA fragments (see Figure 8-73), the precise genomic location of each precipitated DNA fragment can be determined. In this way, all the sites occupied by the gene regulatory protein in the original cells can be mapped on the cell's genome (Figure 7-33). Chromatin immunoprecipitation is also routinely used to identify the positions along a genome that are packaged by the various types of modified histones (discussed in Chapter 4). In this case, antibodies specific to the particular histone modification of interest are employed.
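The last step, deciding which microarray positions were occupied, can be sketched as a comparison of the immunoprecipitated signal with the signal from unselected (input) DNA. This is a minimal illustration only: the probe names, the signal values, and the fourfold enrichment threshold are all hypothetical, and real analyses apply normalization and statistical tests that are omitted here.

```python
def occupied_sites(ip_signal, input_signal, min_enrichment=4.0):
    """Return probes whose IP/input signal ratio meets the threshold."""
    hits = []
    for probe, ip in ip_signal.items():
        background = input_signal[probe]
        if background > 0 and ip / background >= min_enrichment:
            hits.append(probe)
    return sorted(hits)

# Hypothetical array readings keyed by genomic position of each probe.
ip = {"chr3:1200": 80.0, "chr3:5400": 11.0, "chr7:300": 45.0}
inp = {"chr3:1200": 10.0, "chr3:5400": 10.0, "chr7:300": 9.0}
print(occupied_sites(ip, inp))  # ['chr3:1200', 'chr7:300']
```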

Summary

Gene regulatory proteins recognize short stretches of double-helical DNA of defined sequence and thereby determine which of the thousands of genes in a cell will be transcribed. Thousands of gene regulatory proteins have been identified in a wide variety of organisms. Although each of these proteins has unique features, most bind to DNA as homodimers or heterodimers and recognize DNA through one of a small number of structural motifs. The common motifs include the helix-turn-helix, the homeodomain, the leucine zipper, the helix-loop-helix, and zinc fingers of several types. The precise amino acid sequence that is folded into a motif determines the particular DNA sequence that a gene regulatory protein recognizes. Heterodimerization increases the range of DNA sequences that can be recognized. Powerful techniques are now available for identifying and isolating these proteins, the genes that encode them, and the DNA sequences they recognize, and for mapping all of the genes that they regulate on a genome.

HOW GENETIC SWITCHES WORK

In the previous section, we described the basic components of genetic switches: gene regulatory proteins and the specific DNA sequences that these proteins recognize. We shall now discuss how these components operate to turn genes on and off in response to a variety of signals. In the mid-twentieth century, the idea that genes could be switched on and off was revolutionary. This concept was a major advance, and it came originally from the study of how E. coli bacteria adapt to changes in the composition of their growth medium. Parallel studies of the lambda bacteriophage led to many of the same conclusions and helped to establish the underlying mechanism. Many of the same principles apply to eucaryotic cells. However, the enormous complexity of gene regulation in higher organisms, combined with the packaging

[Figure 7-32 step labels, partly illegible: "cross-link proteins to ..."; "+ many other DNA fragments that comprise the rest of the genome"; "... DNA using ..."]





[Figure 7-33 labels: chromosome number (chromosomes 1-16); target genes legible in the figure include MATα1 & MATα2, ICS2 & AMN1, FAR1, SAG1, STE3, YLR040C, CCW12 & HOG1, and MFa1]


of their DNA into chromatin, creates special challenges and some novel opportunities for control, as we shall see. We begin with the simplest example: an on-off switch in bacteria that responds to a single signal.

The Tryptophan Repressor Is a Simple Switch That Turns Genes On and Off in Bacteria

The chromosome of the bacterium E. coli, a single-celled organism, is a single circular DNA molecule of about 4.6 × 10⁶ nucleotide pairs. This DNA encodes approximately 4300 proteins, although the cell makes only a fraction of these at any one time. The expression of many genes is regulated according to the available food in the environment. This is illustrated by the five E. coli genes that code for enzymes that manufacture the amino acid tryptophan. These genes are arranged as a single operon; that is, they are adjacent to one another on the chromosome and are transcribed from a single promoter as one long mRNA molecule (Figure 7-34). But when tryptophan is present in the growth medium and enters the cell (when the bacterium is in the gut of a mammal that has just eaten a meal of protein, for example), the cell no longer needs these enzymes and shuts off their production. The molecular basis for this switch is understood in considerable detail. As described in Chapter 6, a promoter is a specific DNA sequence that directs RNA polymerase to bind to DNA, to open the DNA double helix, and to begin



[Figure 7-34 labels: E. coli chromosome; operator; mRNA molecule; enzymes for tryptophan biosynthesis]

Figure 7-33 A gene regulatory circuit: the complete set of genes controlled by three key regulatory proteins in budding yeast, as deduced from the DNA sites where the regulatory proteins bind. The regulatory proteins, called Mata1, Matα1, and Matα2, specify the two different haploid mating types (analogous to male and female gametes) of this unicellular organism. The 16 chromosomes in the yeast genome are shown (gray), with colored bars indicating sites where various combinations of the three regulatory proteins bind. Above each binding site is the name of the protein product of the regulated target gene. Matα1, acting in a complex with another protein, Mcm1, activates expression of the genes marked in red; Matα2, acting in a complex with Mcm1, represses the genes marked in blue; and Mata1 in a complex with Matα2 represses the genes marked in green (see Figures 7-21 and 7-65). Double arrowheads represent divergently transcribed genes, which are controlled by the indicated gene regulatory proteins. This complete map of bound regulatory proteins was determined using a combination of genome-wide chromatin immunoprecipitation (see Figure 7-32) and phylogenetic footprinting (see Figure 7-31). Such determinations of complete transcriptional circuits show that transcriptional networks are not infinitely complex, although they may appear that way initially. This type of study also helps to reveal the overall logic of the transcriptional circuits used by modern cells. (From D.J. Galgoczy et al., Proc. Natl Acad. Sci. U.S.A. 101:18069-18074, 2004. With permission from National Academy of Sciences.)

Figure 7-34 The clustered genes in E. coli that code for enzymes that manufacture the amino acid tryptophan. These five genes of the Trp operon, denoted as TrpA, B, C, D, and E, are transcribed as a single mRNA molecule, which allows their expression to be controlled coordinately. Clusters of genes transcribed as a single mRNA molecule are common in bacteria. Each such cluster is called an operon.


[Figure 7-35 labels: promoter; start of transcription; -35; -10; operator; inactive repressor; RNA polymerase; active repressor]


synthesizing an RNA molecule. Within the promoter that directs transcription of the tryptophan biosynthetic genes lies a regulator element called an operator (see Figure 7-34). This is simply a short region of regulatory DNA of defined nucleotide sequence that is recognized by a repressor protein, in this case the tryptophan repressor, a member of the helix-turn-helix family (see Figure 7-11). The promoter and operator are arranged so that when the tryptophan repressor occupies the operator, it blocks access to the promoter by RNA polymerase, thereby preventing expression of the tryptophan-producing enzymes (Figure 7-35). The block to gene expression is regulated in an ingenious way: to bind to its operator DNA, the repressor protein has to have two molecules of the amino

Figure 7-35 Switching the tryptophan genes on and off. If the level of tryptophan inside the cell is low, RNA polymerase binds to the promoter and transcribes the five genes of the tryptophan (Trp) operon. If the level of tryptophan is high, however, the tryptophan repressor is activated to bind to the operator, where it blocks the binding of RNA polymerase to the promoter. Whenever the level of intracellular tryptophan drops, the repressor releases its tryptophan and becomes inactive, allowing the polymerase to begin transcribing these genes. The promoter includes two key blocks of DNA sequence information, the -35 and -10 regions highlighted in yellow (see Figure 6-12).




Figure 7-36 The binding of tryptophan to the tryptophan repressor protein changes its conformation. This structural change enables this gene regulatory protein to bind tightly to a specific DNA sequence (the operator), thereby blocking transcription of the genes encoding the enzymes required to produce tryptophan (the Trp operon). The three-dimensional structure of this bacterial helix-turn-helix protein, as determined by x-ray diffraction with and without tryptophan bound, is illustrated. Tryptophan binding increases the distance between the two recognition helices in the homodimer, allowing the repressor to fit snugly on the operator. (Adapted from R. Zhang et al., Nature 327:591-597, 1987. With permission from Macmillan Publishers Ltd.)


Because the active, DNA-binding form of the protein serves to turn genes off, this mode of gene regulation is called negative control, and the gene regulatory proteins that function in this way are called transcriptional repressors or gene repressor proteins.
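The logic of this negative-control switch can be written out as a short Python sketch. It is a deliberately simplified, all-or-none illustration: the function name and the treatment of tryptophan as simply present or absent are assumptions made here, whereas in the cell the response depends on concentrations and binding equilibria.

```python
def trp_operon_transcribed(tryptophan_present: bool) -> bool:
    """Idealized on-off model of the tryptophan operon."""
    # The repressor can bind the operator only when tryptophan is bound to it.
    repressor_active = tryptophan_present
    operator_occupied = repressor_active
    # RNA polymerase is excluded from the promoter while the operator is occupied.
    polymerase_can_bind = not operator_occupied
    return polymerase_can_bind

for level in (False, True):
    state = "on" if trp_operon_transcribed(level) else "off"
    print(f"tryptophan present: {level} -> genes {state}")
```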

Transcriptional ActivatorsTurnGenesOn We saw in Chapter 6 that purified E. coli RNA polymerase (including its o subunit) can bind to a promoter and initiate DNA transcription. Many bacterial promoters, however, are only marginally functional on their own, either because they are recognized poorly by RNA polymerase or because the polymerase has difficulty opening the DNA helix and beginning transcription. In either case these poorly functioning promoters can be rescued by gene regulatory proteins that bind to a nearby site on the DNA and contact the RNA polymerase in a way that dramatically increases the probability that a transcript will be initiated. Because the active, DNA-binding form of such a protein turns genes on, this mode of gene regulation is called positive control, and the gene regulatory proteins that function in this manner are known as transcriptional actiuators or geneactiuator proteins.In some cases,bacterial gene activator proteins aid RNA polymerase in binding to the promoter by providing an additional contact surface for the polymerase. In other cases,they contact RNA polymerase and facilitate its transition from the initial DNA-bound conformation of polymerase to the actively transcribing form by stabilizing a transition state of the enzyme. Like repressors,gene activator proteins must be bound to DNA to exert their effects. In this way, each regulatory protein acts selectively,controlling only those genes that bear a DNA sequence recognized by it. DNA-bound activator proteins can increase the rate of transcription initiation up to 1000-fold, a value consistent with a relatively weak and nonspecific interaction between the activator and RNA polymerase. For example, a 1000fold change in the affinity of RNA polymerase for its promoter corresponds to a change in AG of -4 kcal/mole, which could be accounted for by just a few weak, noncovalent bonds. 
Thus gene activator proteins can work simply by providing a few favorable interactions that help to attract RNA polymerase to the promoter. As in negative control by a transcriptional repressor,a transcriptional activator can operate as part of a simple on-off genetic switch. The bacterial activator protein CAP (catabolite actiuator protein), for example, activates genes that enable E. coli to use alternative carbon sourceswhen glucose, its preferred carbon source, is unavailable. Falling levels of glucose cause an increase in the intracellular signaling molecule cyclic AMII which binds to the CAP protein, enabling it to bind to its specific DNA sequence near target promoters and thereby turn on the appropriate genes. In this way the expression of a target gene is switched on or off, depending on whether cyclic AMP levels in the cell are high or low, respectively. Figure 7-37 summarizes the different ways that positive and negative control can be used to regulate genes. Transcriptional activators and transcriptional repressors are similar in design.The trlptophan repressorand the transcriptional activator CAB for example, both use a helix-turn-helix motif (see Figure 7-l l) and both require a small cofactor in order to bind DNA. In fact, some bacterial proteins (including CAP and the bacteriophage lambda repressor)can act as either activators or repressors, depending on the exact placement of the DNA sequencethey recognize in relation to the promoter: if the binding site for the protein overlaps the promoter, the poll.rnerase cannot bind and the protein acts as a repressor (Figure 7-38).
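The free-energy figure quoted for a 1000-fold change in affinity follows from the relation ΔG = RT ln(fold change). The short Python sketch below works through the arithmetic; the choice of 37°C (310 K) as the temperature is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g_for_fold_change(fold, temperature_k=310.0):
    """Magnitude of the free-energy change (kcal/mol) for a fold change in affinity."""
    return R_KCAL * temperature_k * math.log(fold)

print(round(delta_g_for_fold_change(1000), 1))  # 4.3
```

Since a single hydrogen bond or other weak noncovalent interaction in water typically contributes on the order of 1 kcal/mole, a total of about 4 kcal/mole is consistent with the statement that a few such contacts suffice.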

A Transcriptional Activator and a Transcriptional Repressor Control the Lac Operon

More complicated types of genetic switches combine positive and negative controls. The Lac operon in E. coli, for example, unlike the Trp operon, is under both



Chapter 7: Control of Gene Expression

[Figure 7-37 panel labels: NEGATIVE REGULATION (bound repressor protein prevents transcription); POSITIVE REGULATION (bound activator protein promotes transcription); RNA polymerase; bound activator protein; bound repressor protein.]


















Figure 8-4 Light micrographs of cells in culture. (A) Mouse fibroblasts. (B) Chick myoblasts fusing to form multinucleate muscle cells. (C) Purified rat retinal ganglion nerve cells. (D) Tobacco cells in liquid culture. (A, courtesy of Daniel Zicha; B, courtesy of Rosalind Zalin; C, from A. Meyer-Franke et al.) [Scale bars in the figure: 50 µm, 100 µm, 20 µm.]



in many species, regenerate a whole new plant. Similar to animal cells, callus cultures can be mechanically dissociated into single cells, which will grow and divide as a suspension culture (see Figure 8-4D).

Eucaryotic Cell Lines Are a Widely Used Source of Homogeneous Cells

The cell cultures obtained by disrupting tissues tend to suffer from a problem: eventually the cells die. Most vertebrate cells stop dividing after a finite number of cell divisions in culture, a process called replicative cell senescence (discussed in Chapter 17). Normal human fibroblasts, for example, typically divide only 25-40 times in culture before they stop. In these cells, the limited proliferation capacity reflects a progressive shortening and uncapping of the cell's telomeres, the repetitive DNA sequences and associated proteins that cap the ends of each chromosome (discussed in Chapter 5). Human somatic cells in the body have turned off production of the enzyme, called telomerase, that normally maintains the telomeres, which is why their telomeres shorten with each cell division. Human fibroblasts can often be coaxed to proliferate indefinitely by providing them with the gene that encodes the catalytic subunit of telomerase; in this case, they can be propagated as an "immortalized" cell line.

Some human cells, however, cannot be immortalized by this trick. Although their telomeres remain long, they still stop dividing after a limited number of divisions because the culture conditions eventually activate cell-cycle checkpoint mechanisms (discussed in Chapter 17) that arrest the cell cycle, a process sometimes called "culture shock." In order to immortalize these cells, one has to do more than introduce telomerase. One must also inactivate the checkpoint mechanisms. This can be done by introducing certain cancer-promoting oncogenes, such as those derived from tumor viruses (discussed in Chapter 20). Unlike human cells, most rodent cells do not turn off production of telomerase, and therefore their telomeres do not shorten with each cell division. If culture shock can be avoided, some rodent cell types will therefore divide indefinitely in culture. In addition, rodent cells often undergo genetic changes in culture that inactivate their checkpoint mechanisms, thereby spontaneously producing immortalized cell lines.

Cell lines can often be most easily generated from cancer cells, but these cultures differ from those prepared from normal cells in several ways, and are referred to as transformed cell lines. Transformed cell lines often grow without attaching to a surface, for example, and they can proliferate to a much higher density in a culture dish. Similar properties can be induced experimentally in normal cells by transforming them with a tumor-inducing virus or chemical. The resulting transformed cell lines can usually cause tumors if injected into a susceptible animal (although it is usually only a small subpopulation, called cancer stem cells, that can do so; discussed in Chapter 20).

Both transformed and nontransformed cell lines are extremely useful in cell research as sources of very large numbers of cells of a uniform type, especially since they can be stored in liquid nitrogen at -196°C for an indefinite period and retain their viability when thawed. It is important to keep in mind, however, that the cells in both types of cell lines nearly always differ in important ways from their normal progenitors in the tissues from which they were derived.

Some widely used cell lines are listed in Table 8-1. Different lines have different advantages; for example, the PtK epithelial cell lines derived from the rat kangaroo, unlike many other cell lines, which round up during mitosis, remain flat during mitosis, allowing the mitotic apparatus to be readily observed in action.

Embryonic Stem Cells Could Revolutionize Medicine

Among the most promising cell lines to be developed, from a medical point of view, are embryonic stem (ES) cells. These remarkable cells, first harvested from the inner cell mass of the early mouse embryo, can proliferate indefinitely



Chapter 8: Manipulating Proteins, DNA, and RNA

Table 8-1 Some Commonly Used Cell Lines

CELL LINE*    CELL TYPE AND ORIGIN
3T3           fibroblast (mouse)
BHK21         fibroblast (Syrian hamster)
MDCK          epithelial cell (dog)
HeLa          epithelial cell (human)
PtK1          epithelial cell (rat kangaroo)
L6            myoblast (rat)
PC12          chromaffin cell (rat)
SP2           plasma cell (mouse)
COS           kidney (monkey)
293           kidney (human); transformed with adenovirus
CHO           ovary (Chinese hamster)
DT40          lymphoma cell for efficient targeted recombination (chick)
R1            embryonic stem cell (mouse)
E14.1         embryonic stem cell (mouse)
H1, H9        embryonic stem cell (human)
S2            macrophage-like cell (Drosophila)
BY2           undifferentiated meristematic cell (tobacco)

* Many of these cell lines were derived from tumors. All of them are capable of indefinite replication in culture and express at least some of the special characteristics of their cells of origin.

in culture and yet retain an unrestricted developmental potential. If the cells from the culture dish are put back into an early embryonic environment, they can give rise to all the cell types in the body, including germ cells (Figure 8-5). Their descendants in the embryo are able to integrate perfectly into whatever site they come to occupy, adopting the character and behavior that normal cells would show at that site. Cells with properties similar to those of mouse ES cells can now be derived from early human embryos, creating a potentially inexhaustible supply of cells that might be used to replace and repair damaged mature human tissue. Experiments in mice suggest that it may be possible, in the future, to use ES cells to produce specialized cells for therapy: to replace the skeletal muscle fibers that degenerate in victims of muscular dystrophy, the nerve cells that die in patients with Parkinson's disease, the insulin-secreting cells that are destroyed in type I

Figure 8-5 Embryonic stem (ES) cells derived from an embryo. These cultured cells can give rise to all of the cell types of the body. ES cells are harvested from the inner cell mass of an early embryo and can be maintained indefinitely as stem cells (discussed in Chapter 23) in culture. If they are put back into an embryo, they will integrate perfectly and differentiate to suit whatever environment they find themselves in. The cells can also be kept in culture as an immortal cell line; they can then be supplied with different hormones or growth factors to encourage them to differentiate into specific cell types. (Based on E. Fuchs and J.A. Segré, Cell 100:143-155, 2000. With permission from Elsevier.)

[Figure 8-5 panel labels: early embryo (blastocyst); cells of inner cell mass; fat cell; neuron; macrophage; smooth muscle cell.]



diabetics, and the cardiac muscle cells that die during a heart attack. Perhaps one day it may even become possible to grow entire organs from ES cells by a recapitulation of embryonic development. It is important not to transplant ES cells by themselves into adults, as they can produce tumors called teratomas.

There is another major problem associated with the use of ES-cell-derived cells for tissue repair. If the transplanted cells differ genetically from the cells of the patient into whom they are grafted, the patient's immune system will reject and destroy those cells. This problem can be avoided, of course, if the cells used for repair are derived from the patient's own body. As discussed in Chapter 23, many adult tissues contain stem cells dedicated to continuous production of just one or a few specialized cell types, and a great deal of stem-cell research aims to manipulate the behavior of these adult stem cells for use in tissue repair. ES cell technology, however, in theory at least, also offers another way around the problem of immune rejection, using a strategy known as "therapeutic cloning," as we explain next.

Somatic Cell Nuclear Transplantation May Provide a Way to Generate Personalized Stem Cells

The term "cloning" has been used in confusing ways as a shorthand term for several quite distinct types of procedures. It is important to understand the distinctions, particularly in the context of public debates about the ethics of stem cell research.

As biologists define the term, a clone is simply a set of individuals that are genetically identical because they have descended from a single ancestor. The simplest type of cloning is the cloning of cells. Thus, one can take a single epidermal stem cell from the skin and let it grow and divide in culture to obtain a large clone of genetically identical epidermal cells, which can, for example, be used to help reconstruct the skin of a badly burned patient. This kind of cloning is no more than an extension by artificial means of the processes of cell proliferation and repair that occur in a normal human body.

The cloning of entire multicellular animals, called reproductive cloning, is a very different enterprise, involving a far more radical departure from the ordinary course of nature. Normally, each individual animal has both a mother and a father, and is not genetically identical to either of them. In reproductive cloning, the need for two parents and sexual union is bypassed. For mammals, this difficult feat has been achieved in sheep and some other domestic animals by somatic cell nuclear transplantation. The procedure begins with an unfertilized egg cell. The nucleus of this haploid cell is sucked out and replaced by a nucleus from a regular diploid somatic cell. The diploid donor cell is typically taken from a tissue of an adult individual. The hybrid cell, consisting of a diploid donor nucleus in a host egg cytoplasm, is allowed to develop for a short while in culture. In a small proportion of cases, this procedure can give rise to an early embryo, which is then put into the uterus of a foster mother (Figure 8-6).
If the experimenter is lucky, development continues like that of a normal embryo,

Figure 8-6 Reproductive and therapeutic cloning. Cells from adult tissue can be used for reproductive cloning or for generating personalized ES cells (so-called therapeutic cloning).

[Figure 8-6 panel labels: unfertilized egg from an adult female; removal of egg cell nucleus; embryo placed in foster mother; THERAPEUTIC CLONING; cells from early embryo transferred to culture dish; ES cells.]


giving rise, eventually, to a whole new animal. An individual produced in this way, by reproductive cloning, should be genetically identical to the adult individual that donated the diploid cell (except for the small amount of genetic information in mitochondria, which is inherited solely from the egg cytoplasm).

Therapeutic cloning, which is very different from reproductive cloning, employs the technique of somatic cell nuclear transplantation to produce personalized ES cells (see Figure 8-6). In this case, the very early embryo generated by nuclear transplantation, consisting of about 200 cells, is not transferred to the uterus of a foster mother. Instead, it is used as a source from which ES cells are derived in culture, with the aim of generating various cell types that can be used for tissue repair. Cells obtained by this route are genetically nearly identical to the donor of the original nucleus, so they can be grafted back into the donor without fear of immunological rejection.

Somatic cell nuclear transfer has an additional potential benefit: it could be used to study inherited human diseases. ES cells that have received a somatic nucleus from an individual with an inherited disorder can be used to directly study the way in which the disease develops as the ES cells are induced to differentiate into distinct cell types. "Disease-specific" ES cells and their differentiated progeny can also be used to study the progression of such diseases and to test and develop new drugs to treat the disorders. These strategies are still in their infancy, and some countries outlaw certain aspects of the research. It remains to be seen whether human ES cells can be produced by nuclear transfer and whether human ES cells will fulfill the great hopes that medical scientists have for them.

Hybridoma Cell Lines Are Factories That Produce Monoclonal Antibodies

As we see throughout this book, antibodies are particularly useful tools for cell biology. Their great specificity allows precise visualization of selected proteins among the many thousands that each cell typically produces. Antibodies are often produced by inoculating animals with the protein of interest and subsequently isolating the antibodies specific to that protein from the serum of the animal. However, only limited quantities of antibodies can be obtained from a single inoculated animal, and the antibodies produced will be a heterogeneous mixture of antibodies that recognize a variety of different antigenic sites on a macromolecule that differs from animal to animal. Moreover, antibodies specific for the antigen will constitute only a fraction of the antibodies found in the serum. An alternative technology, which allows the production of a virtually unlimited quantity of identical antibodies and greatly increases the specificity and convenience of antibody-based methods, is the production of monoclonal antibodies by hybridoma cell lines.

This technology, developed in 1975, has revolutionized the production of antibodies for use as tools in cell biology, as well as for the diagnosis and treatment of certain diseases, including rheumatoid arthritis and cancer. The procedure requires hybrid cell technology (Figure 8-7), and it involves propagating a clone of cells from a single antibody-secreting B lymphocyte to obtain a homogeneous preparation of antibodies in large quantities. B lymphocytes normally have a limited life-span in culture, but individual antibody-producing B lymphocytes from an immunized mouse or rat, when fused with cells derived from a transformed B lymphocyte cell line, can give rise to hybrids that have both the ability to make a particular antibody and the ability to multiply indefinitely in culture.
These hybridomas are propagated as individual clones, each of which provides a permanent and stable source of a single type of monoclonal antibody (Figure 8-8). Each type of monoclonal antibody recognizes a single type of antigenic site: for example, a particular cluster of five or six amino acid side chains on the surface of a protein. Their uniform specificity makes monoclonal antibodies much more useful than conventional antisera for most purposes.

An important advantage of the hybridoma technique is that monoclonal antibodies can be made against molecules that constitute only a minor component of a complex mixture. In an ordinary antiserum made against such a mixture, the proportion of antibody molecules that recognize the minor component would be too small to be useful. But if the B lymphocytes that produce the various components of this antiserum are made into hybridomas, it becomes possible to screen individual hybridoma clones from the large mixture to select one that produces the desired type of monoclonal antibody and to propagate the selected hybridoma indefinitely so as to produce that antibody in unlimited quantities. In principle, therefore, a monoclonal antibody can be made against any protein in a biological sample. Once an antibody has been made, it can be used to localize the protein in cells and tissues, to follow its movement, and to purify the protein to study its structure and function.

Figure 8-7 The production of hybrid cells. It is possible to fuse one cell with another to form a heterocaryon, a combined cell with two separate nuclei. Typically, a suspension of cells is treated with certain inactivated viruses or with polyethylene glycol, each of which alters the plasma membranes of cells in a way that induces them to fuse. Eventually, a heterocaryon proceeds to mitosis and produces a hybrid cell in which the two separate nuclear envelopes have been disassembled, allowing all the chromosomes to be brought together in a single large nucleus. Such hybrid cells can give rise to immortal hybrid cell lines. If one of the parent cells was from a tumor cell line, the hybrid cell is called a hybridoma.

[Figure 8-7 panel labels: differentiated normal mouse cell; tumor cell; SUSPENSION OF TWO CELL TYPES CENTRIFUGED WITH A FUSING AGENT ADDED; CELL FUSION AND FORMATION OF HETEROCARYONS, WHICH ARE THEN CULTURED; SELECTIVE MEDIUM ALLOWS ONLY HETEROCARYONS TO SURVIVE AND PROLIFERATE. THESE BECOME HYBRID CELLS, WHICH ARE THEN CLONED; three clones of hybrid cells.]

Figure 8-8 Preparation of hybridomas that secrete monoclonal antibodies against a particular antigen. Here, the antigen of interest is designated as "antigen X." The selective growth medium used after the cell fusion step contains an inhibitor (aminopterin) that blocks the normal biosynthetic pathways by which nucleotides are made. The cells must therefore use a bypass pathway to synthesize their nucleic acids. This pathway is defective in the mutant cell line derived from the B cell tumor, but it is intact in the normal cells obtained from the immunized mouse. Because neither cell type used for the initial fusion can survive and proliferate on its own, only the hybridoma cells do so.

[Figure 8-8 panel labels: mouse immunized with antigen X; cell making anti-X antibody; B lymphocytes (die after a few days in culture); mutant cell line derived from a tumor of B lymphocytes (cells grow indefinitely in normal medium but die in selective medium); FUSION; resulting hybridoma cells cultured; only hybridoma cells survive and proliferate in the selective medium; secreted anti-X antibody; supernatant tested for anti-X antibody, and cells from positive well cultured; cells allowed to multiply, and individual supernatants tested for anti-X antibodies; positive clones provide a continuing source of anti-X antibody.]

Summary

Tissues can be dissociated into their component cells, from which individual cell types can be purified and used for biochemical analysis or for the establishment of cell cultures. Many animal and plant cells survive and proliferate in a culture dish if they are provided with a suitable culture medium containing nutrients and appropriate signal molecules. Although many animal cells stop dividing after a finite number of cell divisions, cells that have been immortalized through spontaneous mutations or genetic manipulation can be maintained indefinitely as cell lines. Embryonic stem cells can proliferate indefinitely in a culture dish while retaining the ability to differentiate into all the different cell types of the body. They therefore hold great medical promise. Hybridoma cells are widely employed to produce unlimited quantities of uniform monoclonal antibodies, which are used to detect and purify cell proteins, as well as to diagnose and treat diseases.

PURIFYING PROTEINS

The challenge of isolating a single type of protein from the thousands of other proteins present in a cell is a formidable one, but must be overcome in order to study protein function in vitro. As we shall see later in this chapter, recombinant DNA technology can enormously simplify this task by "tricking" cells into producing large quantities of a given protein, thereby making its purification much easier. Whether the source of the protein is an engineered cell or a natural tissue, a purification procedure usually starts with subcellular fractionation to reduce the complexity of the material, and is then followed by purification steps of increasing specificity.

Cells Can Be Separated into Their Component Fractions

In order to purify a protein, it must first be extracted from inside the cell. Cells can be broken up in various ways: they can be subjected to osmotic shock or ultrasonic vibration, forced through a small orifice, or ground up in a blender. These procedures break many of the membranes of the cell (including the plasma membrane and endoplasmic reticulum) into fragments that immediately reseal to form small closed vesicles. If carefully carried out, however, the disruption procedures leave organelles such as nuclei, mitochondria, the Golgi apparatus, lysosomes, and peroxisomes largely intact. The suspension of cells is thereby reduced to a thick slurry (called a homogenate or extract) that contains a variety of membrane-enclosed organelles, each with a distinctive size, charge, and density. Provided that the homogenization medium has been carefully chosen (by trial and error for each organelle), the various components, including the vesicles derived from the endoplasmic reticulum, called microsomes, retain most of their original biochemical properties.

The different components of the homogenate must then be separated. Such cell fractionations became possible only after the commercial development in the early 1940s of an instrument known as the preparative ultracentrifuge, which rotates extracts of broken cells at high speeds (Figure 8-9). This treatment separates cell components by size and density: in general, the largest units experience the largest centrifugal force and move the most rapidly. At relatively low speed, large components such as nuclei sediment to form a pellet at the bottom of the centrifuge tube; at slightly higher speed, a pellet of mitochondria is deposited; and at even higher speeds and with longer periods of centrifugation, first the small closed vesicles and then the ribosomes can be collected (Figure 8-10). All of these fractions are impure, but many of the contaminants can be removed by resuspending the pellet and repeating the centrifugation procedure several times.

Figure 8-9 The preparative ultracentrifuge. The sample is contained in tubes that are inserted into a ring of cylindrical holes in a metal rotor. Rapid rotation of the rotor generates enormous centrifugal forces, which cause particles in the sample to sediment. The vacuum reduces friction, preventing heating of the rotor and allowing the refrigeration system to maintain the sample at 4°C.

[Figure 8-9 panel labels: armored chamber; sedimenting material; refrigeration; vacuum; motor.]

Figure 8-10 Cell fractionation by centrifugation. Repeated centrifugation at progressively higher speeds will fractionate homogenates of cells into their components. In general, the smaller the subcellular component, the greater is the centrifugal force required to sediment it. Typical values for the various centrifugation steps referred to in the figure are:
low speed: 1000 times gravity for 10 minutes
medium speed: 20,000 times gravity for 20 minutes
high speed: 80,000 times gravity for 1 hour
very high speed: 150,000 times gravity for 3 hours

Centrifugation is the first step in most fractionations, but it separates only components that differ greatly in size. A finer degree of separation can be achieved by layering the homogenate in a thin band on top of a dilute salt solution that fills

a centrifuge tube. When centrifuged, the various components in the mixture move as a series of distinct bands through the salt solution, each at a different rate, in a process called velocity sedimentation (Figure 8-11A). For the procedure to work effectively, the bands must be protected from convective mixing, which would normally occur whenever a denser solution (for example, one containing organelles) finds itself on top of a lighter one (the salt solution). This is achieved by augmenting the solution in the tube with a shallow gradient of sucrose prepared by a special mixing device. The resulting density gradient, with the dense end at the bottom of the tube, keeps each region of the salt solution denser than any solution above it, and it thereby prevents convective mixing from distorting the separation.

When sedimented through such dilute sucrose gradients, different cell components separate into distinct bands that can be collected individually. The relative rate at which each component sediments depends primarily on its size and shape, and is normally described in terms of its sedimentation coefficient, or S value. Present-day ultracentrifuges rotate at speeds of up to 80,000 rpm and produce forces as high as 500,000 times gravity. These enormous forces drive even small macromolecules, such as tRNA molecules and simple enzymes, to sediment at an appreciable rate and allow them to be separated from one another by size.

The ultracentrifuge is also used to separate cell components on the basis of their buoyant density, independently of their size and shape. In this case the sample is sedimented through a steep density gradient that contains a very high concentration of sucrose or cesium chloride. Each cell component begins to move down the gradient as in Figure 8-11A, but it eventually reaches a position where the density of the solution is equal to its own density. At this point the component floats and can move no farther. A series of distinct bands is thereby produced in the centrifuge tube, with the bands closest to the bottom of the tube containing the components of highest buoyant density (Figure 8-11B). This method, called equilibrium sedimentation, is so sensitive that it can separate macromolecules that have incorporated heavy isotopes, such as ¹³C or ¹⁵N, from the same macromolecules that contain the lighter, common isotopes (¹²C or ¹⁴N). In fact, the cesium-chloride method was developed in 1957 to separate the labeled from the unlabeled DNA produced after exposure of a growing population of bacteria to nucleotide precursors containing ¹⁵N; this classic experiment provided direct evidence for the semiconservative replication of DNA (see Figure 5-5).
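The rotor speeds and forces quoted above for present-day ultracentrifuges (80,000 rpm and about 500,000 times gravity) are linked by the ordinary expression for centripetal acceleration, ω²r. The sketch below is only an illustrative check: the radius of 7 cm is an assumed value chosen for the example, not one given in the text, and the function name is hypothetical.

```python
import math

G_STANDARD = 9.80665  # standard gravity, m/s^2

def relative_centrifugal_force(rpm, radius_m):
    """Centrifugal acceleration at radius_m, expressed as a multiple of gravity."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return omega ** 2 * radius_m / G_STANDARD

# Assumed radius: 7 cm from the rotation axis (illustrative, not from the text).
rcf = relative_centrifugal_force(rpm=80_000, radius_m=0.07)
print(f"{rcf:,.0f} x g")  # on the order of 5 x 10^5 times gravity
```

Because the force scales with the square of the rotation rate, halving the speed at the same radius cuts the force to one quarter, which is why the low-speed and very-high-speed steps of a fractionation differ so greatly in the force applied.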

Cell Extracts Provide Accessible Systems to Study Cell Functions

Studies of organelles and other large subcellular components isolated in the ultracentrifuge have contributed enormously to our understanding of the functions of different cell components. Experiments on mitochondria and chloroplasts purified by centrifugation, for example, demonstrated the central function of these organelles in converting energy into forms that the cell can use. Similarly, resealed vesicles formed from fragments of rough and smooth endoplasmic reticulum (microsomes) have been separated from each other and analyzed as functional models of these compartments of the intact cell.

Likewise, highly concentrated cell extracts, especially extracts of Xenopus laevis (African clawed frog) oocytes, have played a critical role in the study of such complex and highly organized processes as the cell-division cycle, the separation of chromosomes on the mitotic spindle, and the vesicular-transport steps involved in the movement of proteins from the endoplasmic reticulum through the Golgi apparatus to the plasma membrane.

Cell extracts also provide, in principle, the starting material for the complete separation of all of the individual macromolecular components of the cell. We now consider how this separation is achieved, focusing on proteins.

[Figure 8-10 panel labels: cell homogenate; pellet contains whole cells, nuclei, cytoskeletons; pellet contains mitochondria, lysosomes, peroxisomes; SUPERNATANT SUBJECTED TO HIGH-SPEED CENTRIFUGATION; pellet contains microsomes, small vesicles; pellet contains ribosomes, viruses, large macromolecules.]

[Figure 8-11 panel labels: sample; steep sucrose gradient (e.g., 20-70%); slow-sedimenting component; fast-sedimenting component; low-buoyant-density component; high-buoyant-density component.]

Figure 8-11 Comparison of velocity sedimentation and equilibrium sedimentation. (A) In velocity sedimentation, subcellular components sediment at different speeds according to their size and shape when layered over a dilute solution containing sucrose. To stabilize the sedimenting bands against convective mixing caused by small differences in temperature or solute concentration, the tube contains a continuous shallow gradient of sucrose, which increases in concentration toward the bottom of the tube (typically from 5% to 20% sucrose). After centrifugation, the different components can be collected individually, most simply by puncturing the plastic centrifuge tube and collecting drops from the bottom, as illustrated here. (B) In equilibrium sedimentation, subcellular components move up or down when centrifuged in a gradient until they reach a position where their density matches their surroundings. Although a sucrose gradient is shown here, denser gradients, which are especially useful for protein and nucleic acid separation, can be formed from cesium chloride. The final bands, at equilibrium, can be collected as in (A).

Proteins Can Be Separated by Chromatography

Proteins are most often fractionated by column chromatography, in which a mixture of proteins in solution is passed through a column containing a porous solid matrix. The different proteins are retarded to different extents by their interaction with the matrix, and they can be collected separately as they flow out of the bottom of the column (Figure 8-12). Depending on the choice of matrix, proteins can be separated according to their charge (ion-exchange chromatography), their hydrophobicity (hydrophobic chromatography), their size (gel-filtration chromatography), or their ability to bind to particular small molecules or to other macromolecules (affinity chromatography).

Many types of matrices are commercially available (Figure 8-13). Ion-exchange columns are packed with small beads that carry either a positive or a negative charge, so that proteins are fractionated according to the arrangement of charges on their surface. Hydrophobic columns are packed with beads from which hydrophobic side chains protrude, selectively retarding proteins with exposed hydrophobic regions. Gel-filtration columns, which separate proteins according to their size, are packed with tiny porous beads: molecules that are small enough to enter the pores linger inside successive beads as they pass down the column, while larger molecules remain in the solution flowing between the beads and therefore move more rapidly, emerging from the column first. Besides providing a means of separating molecules, gel-filtration chromatography is a convenient way to determine their size.

Inhomogeneities in the matrices (such as cellulose), which cause an uneven flow of solvent through the column, limit the resolution of conventional column chromatography. Special chromatography resins (usually silica-based) composed of tiny spheres (3-10 µm in diameter) can be packed with a special apparatus to form a uniform column bed. Such high-performance liquid chromatography (HPLC) columns attain a high degree of resolution. In HPLC, the solutes equilibrate very rapidly with the interior of the tiny spheres, and so solutes with different affinities for the matrix are efficiently separated from one another even at very fast flow rates. HPLC is therefore the method of choice for separating many proteins and small molecules.
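The remark above that gel-filtration chromatography is a convenient way to determine molecular size can be illustrated with a calibration sketch. It assumes, as a common working approximation that is not stated in the text, that within a column's fractionation range the elution volume falls roughly linearly with the logarithm of molecular mass. The calibration standards below are hypothetical numbers invented purely for illustration, and the function names are likewise not from any particular library.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical standards: (molecular mass in daltons, elution volume in mL).
# Larger molecules elute first, so volume decreases as mass increases.
STANDARDS = [(10_000, 16.0), (100_000, 12.0), (1_000_000, 8.0)]

def estimate_mass(elution_volume_ml, standards=STANDARDS):
    """Estimate molecular mass from elution volume using the calibration line."""
    log_masses = [math.log10(mass) for mass, _ in standards]
    volumes = [volume for _, volume in standards]
    slope, intercept = fit_line(log_masses, volumes)
    log_mass = (elution_volume_ml - intercept) / slope
    return 10 ** log_mass

unknown = estimate_mass(14.0)
print(f"estimated mass: {unknown:,.0f} Da")
```

In practice such an estimate is only as good as the calibration: it presumes that the unknown protein has a shape similar to that of the standards, since elution depends on molecular dimensions rather than on mass as such.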

Figure 8-12 The separation of molecules by column chromatography. The sample, a solution containing a mixture of different molecules, is applied to the top of a cylindrical glass or plastic column filled with a permeable solid matrix, such as cellulose. A large amount of solvent is then pumped slowly through the column and collected in separate tubes as it emerges from the bottom. Because various components of the sample travel at different rates through the column, they are fractionated into different tubes.

Figure 8-13 Three types of matrices used for chromatography. (A) In ion-exchange chromatography, the insoluble matrix carries ionic charges that retard the movement of molecules of opposite charge. Matrices used for separating proteins include diethylaminoethylcellulose (DEAE-cellulose), which is positively charged, and carboxymethylcellulose (CM-cellulose) and phosphocellulose, which are negatively charged. Analogous matrices based on agarose or other polymers are also frequently used. The strength of the association between the dissolved molecules and the ion-exchange matrix depends on both the ionic strength and the pH of the solution that is passing down the column, which may therefore be varied systematically (as in Figure 8-14) to achieve an effective separation. (B) In gel-filtration chromatography, the matrix is inert but porous. Molecules that are small enough to penetrate into the matrix are thereby delayed and travel more slowly through the column than larger molecules that cannot penetrate. Beads of cross-linked polysaccharide (dextran, agarose, or acrylamide) are available commercially in a wide range of pore sizes, making them suitable for the fractionation of molecules of various sizes. (C) Affinity chromatography uses an insoluble matrix that is covalently linked to a specific ligand, such as an antibody molecule or an enzyme substrate, that will bind a specific protein. Enzyme molecules that bind to immobilized substrates on such columns can be eluted with a concentrated solution of the free form of the substrate molecule, while molecules that bind to immobilized antibodies can be eluted by dissociating the antibody-antigen complex with concentrated salt solutions or solutions of high or low pH. High degrees of purification can be achieved in a single pass through an affinity column.

Affinity Chromatography Exploits Specific Binding Sites on Proteins

If one starts with a complex mixture of proteins, the types of column chromatography just discussed do not produce very highly purified fractions: a single passage through the column generally increases the proportion of a given protein in the mixture no more than twentyfold. Because most individual proteins represent less than 1/1000 of the total cell protein, it is usually necessary to use several different types of columns in succession to attain sufficient purity (Figure 8-14). A more efficient procedure, known as affinity chromatography, takes advantage of the biologically important binding interactions that occur on protein surfaces. If a substrate molecule is covalently coupled to an inert matrix such as a polysaccharide bead, the enzyme that operates on that substrate will often be specifically retained by the matrix and can then be eluted (washed out) in nearly pure form. Likewise, short DNA oligonucleotides of a specifically designed sequence can be immobilized in this way and used to purify DNA-binding proteins that normally recognize this sequence of nucleotides in chromosomes (see Figure 7-28). Alternatively, specific antibodies can be coupled to a matrix to purify protein molecules recognized by the antibodies. Because of the great specificity of all such affinity columns, 1000- to 10,000-fold purifications can sometimes be achieved in a single pass.
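The arithmetic behind these figures is easy to check. The following sketch assumes a deliberately idealized model in which every column pass multiplies the target protein's share of the mixture by a fixed factor (capped at 100%); real columns rarely reach the best-case factor, and the 95% purity target is an arbitrary choice for illustration.

```python
def columns_needed(start_fraction, fold_per_column, target_fraction):
    """Count column passes needed to reach target_fraction of total protein.

    Idealized model: each pass multiplies the protein's share of the mixture
    by fold_per_column, and the share can never exceed 1.0.
    """
    if not 0 < start_fraction <= 1 or not 0 < target_fraction <= 1:
        raise ValueError("fractions must lie in (0, 1]")
    if fold_per_column <= 1:
        raise ValueError("each pass must enrich the protein (fold > 1)")
    fraction, passes = start_fraction, 0
    while fraction < target_fraction:
        fraction = min(1.0, fraction * fold_per_column)
        passes += 1
    return passes


if __name__ == "__main__":
    # A protein that makes up 1/1000 of total cell protein.
    print(columns_needed(0.001, 20, 0.95))     # conventional columns, best case
    print(columns_needed(0.001, 1000, 0.95))   # one affinity column
```

Under these assumptions a protein present at 1/1000 of total protein needs three conventional passes even in the best case, whereas a single affinity step giving 1000-fold enrichment would suffice.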

Figure 8-14 Protein purification by chromatography. Typical results obtained when three different chromatographic steps are used in succession to purify a protein. In this example, a homogenate of cells was first fractionated by allowing it to percolate through an ion-exchange resin packed into a column (A). The column was washed to remove all unbound contaminants, and the bound proteins were then eluted by passing a solution containing a gradually increasing concentration of salt onto the top of the column. Proteins with the lowest affinity for the ion-exchange resin passed directly through the column and were collected in the earliest fractions eluted from the bottom of the column. The remaining proteins were eluted in sequence according to their affinity for the resin; those proteins binding most tightly to the resin required the highest concentration of salt to remove them. The protein of interest was eluted in several fractions and was detected by its enzymatic activity. The fractions with activity were pooled and then applied to a second, gel-filtration column (B). The elution position of the still-impure protein was again determined by its enzymatic activity, and the active fractions were pooled and purified to homogeneity on an affinity column (C) that contained an immobilized substrate of the enzyme. (D) Affinity purification of cyclin-binding proteins from S. cerevisiae, as analyzed by SDS polyacrylamide-gel electrophoresis, which is described below in Figure 8-18. Lane 1 is a total cell extract; lane 2 shows the proteins eluted from an affinity column containing cyclin B2; lane 3 shows one major protein eluted from a cyclin B3 affinity column. Proteins in lanes 2 and 3 were eluted from the affinity columns with salt, and the gel was stained with Coomassie blue. The scale at the left shows the molecular weights of marker proteins, in kilodaltons. (D, from D. Kellogg et al., J. Cell Biol. 130:675-685, 1995. With permission from The Rockefeller University Press.)

Genetically Engineered Tags Provide an Easy Way to Purify Proteins

Using the recombinant DNA methods discussed in subsequent sections, any gene can be modified to produce its protein with a special recognition tag attached to it, so as to make subsequent purification of the protein by affinity chromatography simple and rapid. Often the recognition tag is itself an antigenic determinant, or epitope, which can be recognized by a highly specific antibody. The antibody can then be used both to localize the protein in cells and to purify it (Figure 8-15). Other types of tags are specifically designed for protein purification. For example, the amino acid histidine binds to certain metal ions, including nickel and copper. If genetic engineering techniques are used to attach a short string of histidines to one end of a protein, the slightly modified protein can be retained selectively on an affinity column containing immobilized nickel ions. Metal affinity chromatography can thereby be used to purify the modified protein from a complex molecular mixture.

In other cases, an entire protein is used as the recognition tag. When cells are engineered to synthesize the small enzyme glutathione S-transferase (GST) attached to a protein of interest, the resulting fusion protein can be purified from the other contents of the cell with an affinity column containing glutathione, a substrate molecule that binds specifically and tightly to GST. If the purification is carried out under conditions that do not disrupt protein-protein interactions, the fusion protein can be isolated in association with the proteins it interacts with inside the cell (Figure 8-16).

As a further refinement of purification methods using recognition tags, an amino acid sequence that forms a cleavage site for a highly specific proteolytic enzyme can be engineered between the protein of choice and the recognition tag. Because the amino acid sequences at the cleavage site are very rarely found by chance in proteins, the tag can later be cleaved off without destroying the purified protein. This type of specific cleavage is used in an especially powerful purification methodology known as tandem affinity purification tagging (TAP-tagging). Here, one end of a protein is engineered to contain two recognition tags that are separated by a protease cleavage site. The tag on the very end of the construct is chosen to bind irreversibly to an affinity column, allowing the column to be washed extensively to remove all contaminating proteins. Protease cleavage then releases the protein, which is then further purified using the second tag.
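The layout of a tandem-tagged construct, and what protease cleavage releases from the column, can be sketched with plain strings. Everything specific here is an assumption made for illustration: the text does not name a protease, so the TEV protease recognition sequence (ENLYFQG, cut between Q and G) is used as an example; the inner tag is a six-histidine string; the outer, column-binding tag is a placeholder of unspecified residues; and the protein sequence is invented.

```python
TEV_SITE = "ENLYFQG"     # assumption: TEV protease site, cut between Q and G
INNER_TAG = "HHHHHH"     # short histidine string for the second purification step
OUTER_TAG = "XXXXXXXX"   # placeholder for the column-binding tag (X = unspecified)


def build_tap_construct(protein):
    """Return protein + inner tag + cleavage site + outer tag (C-terminal tagging)."""
    if TEV_SITE in protein:
        raise ValueError("protein already contains the protease site")
    return protein + INNER_TAG + TEV_SITE + OUTER_TAG


def protease_release(construct):
    """Split the construct at the cleavage site.

    Returns (released, retained): the released part still carries the inner tag,
    while the retained part stays bound to the first column via the outer tag.
    """
    start = construct.find(TEV_SITE)
    if start == -1:
        raise ValueError("no protease site found")
    cut = start + len(TEV_SITE) - 1  # index of the final G; the cut falls just before it
    return construct[:cut], construct[cut:]


if __name__ == "__main__":
    hypothetical_protein = "MKTAYIAKQR"  # invented sequence
    construct = build_tap_construct(hypothetical_protein)
    released, retained = protease_release(construct)
    print("released for second step:", released)
    print("left on first column:   ", retained)
```

The check that the site is absent from the protein itself mirrors the point made above: the cleavage sequence must be rare enough that the protease removes the tag without cutting the protein of interest.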

Because this two-step strategy provides an especially high degree of protein purification with relatively little effort, it is used extensively in cell biology. Thus, for example, a set of approximately 6000 yeast strains, each with a different gene fused to DNA that encodes a TAP-tag, has been constructed to allow any yeast protein to be rapidly purified.

Figure 8-15 Epitope tagging for the localization or purification of proteins. Using standard genetic engineering techniques, a short peptide tag can be added to a protein of interest. If the tag is itself an antigenic determinant, or epitope, it can be targeted by an appropriate commercially available antibody. The antibody, suitably labeled, can be used to determine the location of the protein in cells or to purify it by immunoprecipitation or affinity chromatography. In immunoprecipitation, antibodies directed against the epitope tag are added to a solution containing the tagged protein; the antibodies specifically cross-link the tagged protein molecules and precipitate them out of solution as antibody-protein complexes.

Figure 8-16 Purification of protein complexes by using a GST-tagged fusion protein. GST fusion proteins, produced in cells with recombinant DNA techniques, can be captured on an affinity column containing beads coated with glutathione. Proteins not bound to the beads are washed away. The fusion protein, along with other proteins in the cell that are bound tightly to it, can then be eluted with glutathione. The identities of these additional proteins can be determined by mass spectrometry (see Figure 8-21). Affinity columns can also be made to contain antibodies against GST or another convenient small protein or epitope tag (see Figure 8-15).

Purified Cell-Free Systems Are Required for the Precise Dissection of Molecular Functions

It is important to study biological processes free from all of the complex side reactions that occur in a living cell by using purified cell-free systems. To make this possible, cell homogenates are fractionated with the aim of purifying each of the individual macromolecules that are needed to catalyze a biological process of interest. For example, the experiments to decipher the mechanisms of protein synthesis began with a cell homogenate that could translate RNA molecules to produce proteins. Fractionation of this homogenate, step by step, produced in turn the ribosomes, tRNAs, and various enzymes that together constitute the protein-synthetic machinery. Once individual pure components were available, each could be added or withheld separately to define its exact role in the overall process.

A major goal for cell biologists is the reconstitution of every biological process in a purified cell-free system. Only in this way can one define all of the components needed for the process and control their concentrations, as required to work out their precise mechanism of action. Although much remains to be done, a great deal of what we know today about the molecular biology of the cell has been discovered by studies in such cell-free systems. They have been used, for example, to decipher the molecular details of DNA replication and DNA transcription, RNA splicing, protein translation, muscle contraction, and particle transport along microtubules, and many other processes that occur in cells.

Summary

Populations of cells can be analyzed biochemically by disrupting them and fractionating their contents, allowing functional cell-free systems to be developed. Highly purified cell-free systems are needed for determining the molecular details of complex cell processes, requiring extensive purification of all the proteins and other components involved. The proteins in soluble extracts can be purified by column chromatography; depending on the type of column matrix, biologically active proteins can be separated on the basis of their molecular weight, hydrophobicity, charge characteristics, or affinity for other molecules. In a typical purification, the sample is passed through several different columns in turn; the enriched fractions obtained from one column are applied to the next. Recombinant DNA techniques, to be described later, allow special recognition tags to be attached to proteins, thereby greatly simplifying their purification.

ANALYZING PROTEINS

Proteins perform most processes in cells: they catalyze metabolic reactions, use nucleotide hydrolysis to do mechanical work, and serve as the major structural elements of the cell. The great variety of protein structures and functions has stimulated the development of a multitude of techniques to study them.

Proteins Can Be Separated by SDS Polyacrylamide-Gel Electrophoresis

Proteins usually possess a net positive or negative charge, depending on the mixture of charged amino acids they contain. An electric field applied to a solution containing a protein molecule causes the protein to migrate at a rate that depends on its net charge and on its size and shape. The most popular application of this property is SDS polyacrylamide-gel electrophoresis (SDS-PAGE). It uses a highly cross-linked gel of polyacrylamide as the inert matrix through which the proteins migrate. The gel is prepared by polymerization of monomers; the pore size of the gel can be adjusted so that it is small enough to retard the migration of the protein molecules of interest. The proteins themselves are not in a simple aqueous solution but in one that includes a powerful negatively charged detergent, sodium dodecyl sulfate, or SDS (Figure 8-17). Because this detergent binds to hydrophobic regions of the protein molecules, causing them to unfold into extended polypeptide chains, the individual protein molecules are released from their associations with other proteins or lipid molecules and rendered freely soluble in the detergent solution. In addition, a reducing agent such as β-mercaptoethanol (see Figure 8-17) is usually added to break any S-S linkages in the proteins, so that all of the constituent polypeptides in multisubunit proteins can be analyzed separately.

Figure 8-17 The detergent sodium dodecyl sulfate (SDS) and the reducing agent β-mercaptoethanol. These two chemicals are used to solubilize proteins for SDS polyacrylamide-gel electrophoresis. The SDS is shown here in its ionized form.

What happens when a mixture of SDS-solubilized proteins is run through a slab of polyacrylamide gel? Each protein molecule binds large numbers of the negatively charged detergent molecules, which mask the protein's intrinsic charge and cause it to migrate toward the positive electrode when a voltage is applied. Proteins of the same size tend to move through the gel with similar speeds because (1) their native structure is completely unfolded by the SDS, so that their shapes are the same, and (2) they bind the same amount of SDS and therefore have the same amount of negative charge. Larger proteins, with more charge, are subjected to larger electrical forces and also to a larger drag. In free solution, the two effects would cancel out, but, in the mesh of the polyacrylamide gel, which acts as a molecular sieve, large proteins are retarded much more than small ones. As a result, a complex mixture of proteins is fractionated into a series of discrete protein bands arranged in order of molecular weight (Figure 8-18). The major proteins are readily detected by staining the proteins in the gel with a dye such as Coomassie blue. Even minor proteins are seen in gels treated with a silver or gold stain, so that as little as 10 ng of protein can be detected in a band.

SDS-PAGE is widely used because it can separate all types of proteins, including those that are normally insoluble in water, such as the many proteins in membranes. And because the method separates polypeptides by size, it provides information about the molecular weight and the subunit composition of proteins. Figure 8-19 presents a photograph of a gel that has been used to analyze each of the successive stages in the purification of a protein.
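In practice, the molecular weight of an unknown band on an SDS gel is usually estimated by comparison with marker proteins of known size run on the same gel. A minimal sketch of that comparison follows, assuming the common empirical approximation that the logarithm of molecular weight falls roughly linearly with relative migration distance (Rf) over the useful range of a gel. The marker values and Rf readings below are hypothetical, and the least-squares fit is written out by hand so that the example needs no external libraries.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf, markers):
    """Estimate molecular weight (daltons) from relative mobility rf.

    markers is a list of (rf, molecular_weight) pairs for proteins of known
    size; log10(MW) is assumed to vary linearly with rf.
    """
    slope, intercept = fit_line(
        [marker_rf for marker_rf, _ in markers],
        [math.log10(mw) for _, mw in markers],
    )
    return 10 ** (slope * rf + intercept)


# Hypothetical marker lane: (relative mobility, molecular weight in daltons).
MARKERS = [
    (0.10, 200_000), (0.25, 116_000), (0.40, 66_000), (0.55, 45_000),
    (0.70, 31_000), (0.85, 21_500), (0.95, 14_400),
]

if __name__ == "__main__":
    print(f"band at Rf 0.50: about {estimate_mw(0.50, MARKERS):,.0f} daltons")
```

An estimate of this kind is only approximate, and it inherits the limitations of the gel itself, for example for proteins that migrate anomalously.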



Figure 8-18 SDS polyacrylamide-gel electrophoresis (SDS-PAGE). (A) An electrophoresis apparatus. (B) Individual polypeptide chains form a complex with negatively charged molecules of sodium dodecyl sulfate (SDS) and therefore migrate as a negatively charged SDS-protein complex through a porous gel of polyacrylamide. Because the speed of migration under these conditions is greater the smaller the polypeptide, this technique can be used to determine the approximate molecular weight of a polypeptide chain as well as the subunit composition of a protein. If the protein contains a large amount of carbohydrate, however, it will move anomalously on the gel and its apparent molecular weight estimated by SDS-PAGE will be misleading.

Specific Proteins Can Be Detected by Blotting with Antibodies

of labeled antibody to reveal the protein of interest. This method of detecting proteins is called Western blotting, or immunoblotting (Figure 8-20).

Figure 8-19 Analysis of protein samples by SDS polyacrylamide-gel electrophoresis. The photograph shows a Coomassie-stained gel that has been used to detect the proteins present at successive stages in the purification of an enzyme. The leftmost lane (lane 1) contains the complex mixture of proteins in the starting cell extract, and each succeeding lane analyzes the proteins obtained after a chromatographic fractionation of the protein sample analyzed in the previous lane (see Figure 8-14). The same total amount of protein (10 µg) was loaded onto the gel at the top of each lane. Individual proteins normally appear as sharp, dye-stained bands; a band broadens, however, when it contains too much protein. (From T. Formosa and B.M. Alberts, J. Biol. Chem. 261:6107-6118, 1986.)



Figure 8-20 Western blotting. All the proteins from dividing tobacco cells in culture are first separated by two-dimensional polyacrylamide-gel electrophoresis (described in Figure 8-23). In (A), the positions of the proteins are revealed by a sensitive protein stain. In (B), the separated proteins on an identical gel were then transferred to a sheet of nitrocellulose and exposed to an antibody that recognizes only those proteins that are phosphorylated on threonine residues during mitosis. The positions of the dozen or so proteins that are recognized by this antibody are revealed by an enzyme-linked second antibody. This technique is also known as immunoblotting (or Western blotting). (From J.A. Traas et al., Plant J. 2:723-732, 1992. With permission from Blackwell Publishing.)

Mass Spectrometry Provides a Highly Sensitive Method for Identifying Unknown Proteins

A frequent problem in cell biology and biochemistry is the identification of a protein or collection of proteins that has been obtained by one of the purification procedures discussed in the preceding pages (see, for example, Figure 8-16). Because the genome sequences of most common experimental organisms are now known, catalogues of all the proteins produced in those organisms are available. The task of identifying an unknown protein (or collection of unknown proteins) thus reduces to matching some of the amino acid sequences present in the unknown sample with known catalogued genes. This task is now performed almost exclusively by using mass spectrometry in conjunction with

then dried onto a metal or ceramic slide. A laser then blasts the sample, ejecting the peptides from the slide in the form of an ionized gas, in which each molecule carries one or more positive charges. The ionized peptides are accelerated in an electric field and fly toward a detector. Their mass and charge determine the time it takes them to reach the detector: large peptides move more slowly, and more highly charged molecules move more quickly. By analyzing those ionized peptides that bear a single charge, the precise masses of peptides present in the original sample can be determined. MALDI-TOF can also be used to accurately measure the mass of intact proteins as large as 200,000 daltons. This information is then used to search genomic databases, in which the masses of all proteins and of all their predicted peptide fragments have been tabulated from the genomic sequences of the organism (Figure 8-21A). An unambiguous match to a particular open reading frame can often be made by knowing the mass of only a few peptides derived from a given protein.

MALDI-TOF provides accurate molecular weight measurements for proteins and peptides. Moreover, by employing two mass spectrometers in tandem (an arrangement known as MS/MS), it is possible to directly determine the amino acid sequences of individual peptides in a complex mixture. As described above, the protein sample is first broken into smaller peptides, which are separated from each other by mass spectrometry. Each peptide is then further fragmented through collisions with high-energy gas atoms. This method of fragmentation preferentially cleaves the peptide bonds, generating a ladder of fragments, each differing by a single amino acid. The second mass spectrometer then separates these fragments and displays their masses. The amino acid sequence of a peptide can then be deduced from these differences in mass (Figure 8-21B).

Figure 8-21 Use of mass spectrometry to identify proteins and to sequence peptides. An isolated protein is digested with trypsin and the peptide fragments are then loaded into the mass spectrometer. Two different approaches can then be used to identify the protein. (A) In the first method, peptide masses are measured precisely using MALDI-TOF mass spectrometry. Sequence databases are then searched to find the gene that encodes a protein whose calculated tryptic digest profile matches these values. (B) Mass spectrometry can also be used to determine directly the amino acid sequence of peptide fragments. In this example, tryptic peptides are first separated based on mass within a mass spectrometer. Each peptide is then further fragmented, primarily by cleaving its peptide bonds. This treatment generates a nested set of peptides, each differing in size by one amino acid. These fragments are fed into a second coupled mass spectrometer, and their masses are determined. The difference in masses between two closely related peptides can be used to deduce the "missing" amino acid. By repeated applications of this procedure, a partial amino acid sequence of the original protein can be determined. For simplicity, the analysis shown begins with a single species of purified protein. In reality, mass spectrometry is usually carried out on mixtures of proteins, such as those obtained for affinity chromatography experiments (see Figure 8-16), and can identify all the proteins present in the mixtures. As explained in the text, mass spectrometry can also detect post-translational modifications of proteins.

MS/MS is particularly useful for detecting and precisely mapping post-translational modifications of proteins, such as phosphorylations or acetylations. Because these modifications impart a characteristic mass increase to an amino acid, they are easily detected by mass spectrometry. As described in Chapter 3, proteomics, a general term that encompasses many different experimental techniques, is the characterization of all proteins in the cell, including all protein-protein interactions and all post-translational modifications. In combination with the rapid purification techniques discussed in the last section, mass spectrometry has emerged as the most powerful method for mapping both the post-translational modifications of a given protein and the proteins that remain associated with it during purification.
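The database-matching step used with MALDI-TOF data can be illustrated with a toy version of the search. The sketch below makes several simplifying assumptions that are not spelled out in the text: trypsin is modeled by the usual rule of cutting after lysine (K) or arginine (R) unless the next residue is proline (P); peptide masses are neutral monoisotopic masses computed from standard residue masses plus one water; and the "database" is two invented sequences rather than a real genome-derived catalogue.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cut after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.02):
    """Number of observed masses explained by the protein's predicted digest."""
    predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - mass) <= tolerance for mass in predicted)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.02):
    """Name of the database entry whose digest explains the most masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance),
    )


# Invented two-entry "database" for illustration only.
TOY_DATABASE = {"protA": "MAGKLLSRPEK", "protB": "MTTKWWDR"}

if __name__ == "__main__":
    observed = [peptide_mass(p) for p in tryptic_digest(TOY_DATABASE["protB"])]
    print(best_match(observed, TOY_DATABASE))
```

Real search tools score matches statistically and allow for missed cleavages, modifications, and measurement error; this sketch only counts masses that fall within a fixed tolerance.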

Two-Dimensional Separation Methods Are Especially Powerful

Because different proteins can have similar sizes, shapes, masses, and overall charges, most separation techniques such as SDS polyacrylamide-gel electrophoresis or ion-exchange chromatography cannot typically display all the proteins in a cell or even in an organelle. In contrast, two-dimensional gel electrophoresis, which combines two different separation procedures, can resolve up to 2000 proteins (the total number of different proteins in a simple bacterium) in the form of a two-dimensional protein map.

In the first step, the proteins are separated by their intrinsic charges. The sample is dissolved in a small volume of a solution containing a nonionic (uncharged) detergent, together with β-mercaptoethanol and the denaturing reagent urea. This solution solubilizes, denatures, and dissociates all the polypeptide chains but leaves their intrinsic charge unchanged. The polypeptide chains are then separated in a pH gradient by a procedure called isoelectric focusing, which takes advantage of the variation in the net charge on a protein molecule with the pH of its surrounding solution. Every protein has a characteristic isoelectric point, the pH at which the protein has no net charge and therefore does not migrate in an electric field. In isoelectric focusing, proteins are separated electrophoretically in a narrow tube of polyacrylamide gel in which a gradient of pH is established by a mixture of special buffers. Each protein moves to a position in the gradient that corresponds to its isoelectric point and remains there (Figure 8-22). This is the first dimension of two-dimensional polyacrylamide-gel electrophoresis.

In the second step, the narrow gel containing the separated proteins is again subjected to electrophoresis but in a direction that is at a right angle to the direction used in the first step. This time SDS is added, and the proteins separate according to their size, as in one-dimensional SDS-PAGE: the original narrow gel is soaked in SDS and then placed on one edge of an SDS polyacrylamide-gel slab, through which each polypeptide chain migrates to form a discrete spot. This is the second dimension of two-dimensional polyacrylamide-gel electrophoresis. The only proteins left unresolved are those that have both identical sizes and identical isoelectric points, a relatively rare situation. Even trace amounts of each polypeptide chain can be detected on the gel by various staining procedures, or by autoradiography if the protein sample was initially labeled with a radioisotope (Figure 8-23). The technique has such great resolving power that it can distinguish between two proteins that differ in only a single charged amino acid.

Figure 8-22 Separation of protein molecules by isoelectric focusing. At low pH (high H+ concentration), the carboxylic acid groups of proteins tend to be uncharged (-COOH) and their nitrogen-containing basic groups fully charged (for example, -NH3+), giving most proteins a net positive charge. At high pH, the carboxylic acid groups are negatively charged (-COO-) and the basic groups tend to be uncharged (for example, -NH2), giving most proteins a net negative charge. At its isoelectric pH, a protein has no net charge since the positive and negative charges balance, and it therefore no longer migrates in the electric field; for the protein shown, the isoelectric pH is 6.5. Thus, when a tube containing a fixed pH gradient is subjected to a strong electric field in the appropriate direction, each protein species present migrates until it forms a sharp band at its isoelectric pH, as shown.
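The charge balance described for isoelectric focusing can be turned into a rough calculation. The sketch below estimates net charge as a function of pH with the Henderson-Hasselbalch relation and finds the isoelectric point by bisection. The pKa values are approximate textbook-style numbers chosen for illustration (real values shift with the local environment of each group), and the sequences are invented, so the results should be read as qualitative only.

```python
# Approximate pKa values (assumptions for illustration; real values vary).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}             # charged when protonated
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}    # charged when deprotonated
N_TERM_PKA, C_TERM_PKA = 9.0, 3.1


def net_charge(sequence, ph):
    """Estimated net charge of a polypeptide at the given pH."""
    positive = 1 / (1 + 10 ** (ph - N_TERM_PKA))
    negative = 1 / (1 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            positive += 1 / (1 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            negative += 1 / (1 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return positive - negative


def isoelectric_point(sequence, low=0.0, high=14.0, iterations=60):
    """Find the pH of zero net charge by bisection.

    Net charge only decreases as pH rises, so the zero crossing is unique.
    """
    for _ in range(iterations):
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for name, seq in {"lysine-rich": "GKKKG", "aspartate-rich": "GDDDG"}.items():
        print(f"{name}: estimated pI = {isoelectric_point(seq):.2f}")
```

In a pH gradient, the lysine-rich peptide would come to rest nearer the high-pH end and the aspartate-rich peptide nearer the low-pH end, which is the basis of the first dimension of a two-dimensional gel.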


A different, even more powerful, "two-dimensional" technique is now available when the aim is to determine all of the proteins present in an organelle or another complex mixture of proteins. Because the technique relies on mass spectrometry, it requires that the proteins be from an organism with a completely sequenced genome. First, the mixture of proteins present is digested with trypsin to produce short peptides. Next, these peptides are separated by a series of automated liquid chromatography steps. As the second dimension, each separated peptide is fed directly into a tandem mass spectrometer (MS/MS) that allows its amino acid sequence, as well as any post-translational modifications, to be determined. This arrangement, in which a tandem mass spectrometer (MS/MS) is attached to the output of an automated liquid chromatography (LC) system, is referred to as LC-MS/MS. It is now becoming routine to subject an entire organelle preparation to LC-MS/MS analysis and to identify hundreds of proteins and their modifications. Of course, no organelle isolation procedure is perfect, and some of the proteins identified will be contaminating proteins. These can often be excluded by analyzing neighboring fractions from the organelle purification and "subtracting" them out from the peak organelle fractions.
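The sequencing step performed by the tandem mass spectrometer rests on a simple idea: successive fragments in a ladder differ by the mass of one residue. A minimal sketch of reading such a ladder follows. It assumes an idealized, complete ladder of fragment masses, uses standard monoisotopic residue masses, and recognizes only one modification, phosphorylation (a mass increase of 79.96633 daltons on serine, threonine, or tyrosine). Leucine and isoleucine have identical masses and are reported together.

```python
# Standard monoisotopic residue masses in daltons (L listed before I).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO = 79.96633  # mass added by one phosphate group (HPO3)


def identify_step(delta, tolerance=0.01):
    """Name the residue (or phosphorylated residue) matching a mass difference."""
    plain = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tolerance]
    if plain:
        return "/".join(plain)
    modified = [
        aa for aa in "STY"
        if abs(RESIDUE_MASS[aa] + PHOSPHO - delta) <= tolerance
    ]
    if modified:
        return "/".join(f"{aa}+phospho" for aa in modified)
    return "?"


def read_ladder(fragment_masses, tolerance=0.01):
    """Deduce residues from differences between consecutive fragment masses."""
    ordered = sorted(fragment_masses)
    return [
        identify_step(heavier - lighter, tolerance)
        for lighter, heavier in zip(ordered, ordered[1:])
    ]


def prefix_ladder(peptide):
    """Build an idealized ladder: cumulative residue masses of each prefix."""
    masses, total = [], 0.0
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


if __name__ == "__main__":
    print(read_ladder(prefix_ladder("GALK")))
```

Real spectra are noisier, contain several overlapping ion series, and often have gaps, so practical software scores candidate sequences against the whole spectrum rather than reading the differences one by one.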

HydrodynamicMeasurementsRevealthe Sizeand Shapeof a ProteinComplex Most proteins in a cell act as part of larger complexes, and knowledge of the size and shape of these complexes often leads to insights regarding their function. This information can be obtained in severalimportant ways. Sometimes, a complex can be directly visualized using electron microscopy, as described in chapter 9. A complementary approach relies on the hydrodynamic properties of a complex, that is, its behavior as it moves through a liquid medium. 0sually, two separatemeasurements are made. one measure is the velocity of a complex as it moves under the influence of a centrifugal field produced byan ultracentrifuge (seeFigure 8-llA). The sedimentation constant (or S-valuej obtained depends on both the size and the shape of the complex and does not, by itself, convev especially useful information. However, once a second hydrodvnamic measure_ ment is performed-by charting the migration of a compiex thiough a gel-filtration chromatography column (seeFigure g-r3B)-botrr tne upp.*imite shape of a complex and its molecular weight can be calculated. Molecular weight can also be determined more directly by using an analytical ultracentrifuge, a complex device that allows protein absorbince measurements

Figure 8-23 Two-dimensional polyacrylamide-gel electrophoresis. All the proteins in an E. coli bacterial cell are separated in this gel, in which each spot corresponds to a different polypeptide chain. The proteins were first separated on the basis of their isoelectric points by isoelectric focusing from left to right. They were then further fractionated according to their molecular weights by electrophoresis from top to bottom in the presence of SDS. Note that different proteins are present in very different amounts. The bacteria were fed with a mixture of radioisotope-labeled amino acids so that all of their proteins were radioactive and could be detected by autoradiography (see pp. 602-603). (Courtesy of Patrick O'Farrell.)


to be made on a sample while it is subjected to centrifugal forces. In this approach, the sample is centrifuged until it reaches equilibrium, where the centrifugal force on a protein complex exactly balances its tendency to diffuse away. Because this balancing point is dependent on a complex's molecular weight but not on its particular shape, the molecular weight can be directly calculated, as needed to determine the stoichiometry of each protein in a protein complex.
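The calculation that combines the two hydrodynamic measurements is not spelled out above. One standard way to do it, assumed here, is the Siegel-Monty relation M = 6πηN·Rs·s / (1 − v̄ρ), where s is the sedimentation coefficient, Rs is the Stokes radius from gel filtration, η and ρ are the viscosity and density of the solvent, and v̄ is the partial specific volume of the protein. The sketch below uses SI units internally. The default solvent values (water at about 20°C) and v̄ = 0.73 cm³/g are typical assumptions, and the input numbers are for a hypothetical complex.

```python
import math

AVOGADRO = 6.022e23   # molecules per mole
SVEDBERG = 1e-13      # seconds per Svedberg unit (S)


def molecular_weight(s_value, stokes_radius_nm,
                     viscosity=1.002e-3,              # Pa*s, water near 20 C (assumed)
                     density=998.2,                   # kg/m^3, water near 20 C (assumed)
                     partial_specific_volume=0.73e-3  # m^3/kg, typical protein (assumed)
                     ):
    """Estimate molar mass (g/mol) from an S-value and a Stokes radius."""
    s = s_value * SVEDBERG
    radius = stokes_radius_nm * 1e-9
    buoyancy = 1.0 - partial_specific_volume * density
    kg_per_mol = 6.0 * math.pi * viscosity * AVOGADRO * radius * s / buoyancy
    return kg_per_mol * 1000.0


def frictional_ratio(mass_g_per_mol, stokes_radius_nm,
                     partial_specific_volume=0.73e-3):
    """Ratio of the Stokes radius to the radius of a compact sphere of the same mass.

    Values near 1.2-1.3 suggest a roughly globular shape; larger values
    suggest an elongated or extended complex.
    """
    volume = partial_specific_volume * (mass_g_per_mol / 1000.0) / AVOGADRO  # m^3 per molecule
    sphere_radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return (stokes_radius_nm * 1e-9) / sphere_radius


if __name__ == "__main__":
    mass = molecular_weight(s_value=11.3, stokes_radius_nm=5.2)  # hypothetical complex
    print(f"estimated mass: {mass:,.0f} g/mol")
    print(f"frictional ratio: {frictional_ratio(mass, 5.2):.2f}")
```

The frictional ratio is what carries the approximate shape information: neither the S-value nor the gel-filtration behavior alone separates size from shape, but together they do.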

Sets of Interacting Proteins Can Be Identified by Biochemical Methods

Because most proteins in the cell function as part of complexes with other proteins, an important way to begin to characterize the biological role of an unknown protein is to identify all of the other proteins to which it specifically binds.

One method for identifying proteins that bind to one another tightly is coimmunoprecipitation. In this case, an antibody recognizes a specific target protein; reagents that bind to the antibody and are coupled to a solid matrix then drag the complex out of solution to the bottom of a test tube. If the original target protein is associated tightly enough with another protein when it is captured by the antibody, the partner precipitates as well. This method is useful for identifying proteins that are part of a complex inside cells, including those that interact only transiently, for example when extracellular signal molecules stimulate cells (discussed in Chapter 15).

Another method frequently used to identify a protein's binding partners is protein affinity chromatography (see Figure 8-13C). To employ this technique to capture interacting proteins, a target protein is attached to polymer beads that are packed into a column. When the proteins in a cell extract are washed through this column, those proteins that interact with the target protein are retained by the affinity matrix. These proteins can then be eluted and their identity determined by mass spectrometry.

In addition to capturing protein complexes on columns or in test tubes, researchers are developing high-density protein arrays to investigate protein interactions. These arrays, which contain thousands of different proteins or antibodies spotted onto glass slides or immobilized in tiny wells, allow one to examine the biochemical activities and binding profiles of a large number of proteins at once. For example, if one incubates a fluorescently labeled protein with arrays containing thousands of immobilized proteins, the spots that remain fluorescent after extensive washing each contain a protein to which the labeled protein specifically binds.

Protein-Protein Interactions Can Also Be Identified by a Two-Hybrid Technique in Yeast

Thus far, we have emphasized biochemical approaches to the study of protein-protein interactions. However, a particularly powerful strategy, called the two-hybrid system, relies on exploiting the cell's own mechanisms to reveal protein-protein interactions. The technique takes advantage of the modular nature of gene activator proteins (see Figure 7-45). These proteins both bind to specific DNA sequences and activate gene transcription, and these activities are often performed by two separate protein domains. Using recombinant DNA techniques, two such protein domains are used to create separate "bait" and "prey" fusion proteins. To create the "bait" fusion protein, the DNA sequence that codes for a target protein is fused with DNA that encodes the DNA-binding domain of a gene activator protein. When this construct is introduced into yeast, the cells produce the fusion protein, with the target protein attached to this DNA-binding domain (Figure 8-24). This fusion protein binds to the regulatory region of a reporter gene, where it serves as "bait" to fish for proteins that interact with the target protein. To search for potential binding partners (potential prey for the bait), the candidate proteins also have to be constructed as fusion proteins: DNA encoding the activation domain of a gene activator protein is fused to a large



[Figure 8-24 labels: target protein; DNA-binding domain; binding partner; transcriptional activation domain; transcriptional activator binding site]


number of different genes. Members of this collection of genes, which encode potential "prey," are introduced individually into yeast cells containing the bait. If the yeast cell receives a DNA clone that expresses a prey partner for the bait protein, the two halves of a transcriptional activator are united, switching on the reporter gene (see Figure 8-24). This ingenious technique sounds complex, but the two-hybrid system is relatively simple to use in the laboratory. Although the protein-protein interactions occur in the yeast cell nucleus, proteins from every part of the cell and from any organism can be studied in this way. The two-hybrid system has been scaled up to map the interactions that occur among all of the proteins an organism produces. In this case, a set of bait and prey fusions is produced for every cell protein, and every bait/prey combination can be monitored. In this way protein interaction maps have been generated for most of the proteins in yeast, C. elegans, and Drosophila.

Combining Data Derived from Different Techniques Produces Reliable Protein-Interaction Maps

As previously discussed in Chapter 3, extensive protein-interaction maps can be very useful for identifying the functions of proteins (see Figure 3-82). For this reason, both the two-hybrid method and the biochemical technique discussed earlier known as tap-tagging (see pp. 515-516) have been automated to determine the interactions between thousands of proteins. Unfortunately, different results are found in different experiments, and many of the interactions detected in one laboratory are not detected in another. Therefore, the most useful protein-interaction maps are those that combine data from many experiments, requiring that each interaction in the map be confirmed by more than one technique.
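The filtering rule in the last sentence is simple to express in code. The sketch below is only an illustration of that rule: the data layout (a dictionary mapping a technique name to a list of protein pairs), the function name, and the protein labels are all hypothetical.

```python
from collections import defaultdict


def confirmed_interactions(datasets, min_methods=2):
    """Return interactions reported by at least `min_methods` different techniques.

    `datasets` maps a technique name to an iterable of (protein_a, protein_b) pairs.
    Pairs are treated as unordered, so (A, B) and (B, A) count as the same interaction.
    """
    support = defaultdict(set)
    for method, pairs in datasets.items():
        for a, b in pairs:
            support[frozenset((a, b))].add(method)
    return {pair: methods for pair, methods in support.items()
            if len(methods) >= min_methods}


if __name__ == "__main__":
    # Hypothetical results from two screening techniques.
    data = {
        "two_hybrid": [("A", "B"), ("A", "C")],
        "tap_tag": [("B", "A"), ("C", "D")],
    }
    for pair, methods in confirmed_interactions(data).items():
        print(sorted(pair), sorted(methods))
```

Requiring agreement between techniques trades coverage for reliability: a genuine interaction that only one method can detect is discarded along with the false positives.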

Optical Methods Can Monitor Protein Interactions in Real Time

Once two proteins, or a protein and a small molecule, are known to associate, it becomes important to characterize their interaction in more detail. Proteins can associate with each other more or less permanently (like the subunits of RNA polymerase or the proteasome), or engage in transient encounters that may last only a few milliseconds (like a protein kinase and its substrate). To understand how a protein functions inside a cell, we need to determine how tightly it binds to other proteins, how rapidly it dissociates from them, and how covalent modifications, small molecules, or other proteins influence these interactions. Such studies of protein dynamics often employ optical methods.

Figure 8-24 The yeast two-hybrid system for detecting protein-protein interactions. The target protein is fused to a DNA-binding domain that directs the fusion protein to the regulatory region of a reporter gene as "bait." When this target protein binds to another specially designed protein in the cell nucleus ("prey"), their interaction brings together two halves of a transcriptional activator, which then switches on the expression of the reporter gene.



Certain amino acids (for example, tryptophan) exhibit weak fluorescence that can be detected with sensitive fluorimeters. In many cases, the fluorescence intensity, or the emission spectrum of fluorescent amino acids located in a protein-protein interface, will change when the proteins associate. When this change can be detected by fluorimetry, it provides a sensitive and quantitative measure of protein binding.

A particularly useful method for monitoring the dynamics of a protein's binding to other molecules is called surface plasmon resonance (SPR). The SPR method has been used to characterize a wide variety of molecular interactions, including antibody-antigen binding, ligand-receptor coupling, and the binding of proteins to DNA, carbohydrates, small molecules, and other proteins. SPR detects binding interactions by monitoring the reflection of a beam of light off the interface between an aqueous solution of potential binding molecules and a biosensor surface carrying an immobilized bait protein. The bait protein is attached to a very thin layer of metal that coats one side of a glass prism (Figure 8-25). A light beam is passed through the prism; at a certain angle, called the resonance angle, some of the energy from the light interacts with the cloud of electrons in the metal film, generating a plasmon: an oscillation of the electrons at right angles to the plane of the film, bouncing up and down between its upper and lower surfaces like a weight on a spring. The plasmon, in turn, generates an electrical field that extends a short distance (about the wavelength of the light) above and below the metal surface. Any change in the composition of

[Figure 8-25 labels. (A) prism or grating; reflected-light detector; surface plasmons excited in gold film by light at a specific resonance angle; gold film (~50 nm); bait molecule attached to gold film; solution of prey molecules. (B) The binding of prey molecules to bait molecules increases the refractive index of the surface layer. This alters the resonance angle for plasmon induction, which can be measured by a detector. Plot labels: prey added; buffer wash; dissociation.]

Figure 8-25 Surface plasmon resonance. (A) SPR can detect binding interactions by monitoring the reflection of a beam of light off the interface between an aqueous solution of potential binding molecules (green) and a biosensor surface coated with an immobilized bait protein (red). (B) A solution of prey proteins is allowed to flow past the immobilized bait protein. Binding of prey molecules to the bait protein produces a measurable change in the resonance angle, as does their dissociation when a buffer solution washes them off. These changes, monitored in real time, reflect the association and dissociation of the molecular complexes.



the environment within the range of the electrical field will cause a measurable change in the resonance angle.

To measure binding, a solution containing proteins (or other molecules) that might interact with the immobilized bait protein is allowed to flow past the biosensor surface. Proteins binding to the bait change the composition of the molecular complexes on the metal surface, causing a change in the resonance angle (see Figure 8-25). The changes in the resonance angle are monitored in real time and reflect the kinetics of the association, or dissociation, of molecules with the bait protein. The association rate (kon) is measured as the molecules interact, and the dissociation rate (koff) is determined as buffer washes the bound molecules from the sensor surface. A binding constant (K) is calculated by dividing koff by kon. In addition to determining the kinetics, SPR can be used to determine the number of molecules that are bound in each complex: the magnitude of the SPR signal change is proportional to the mass of the immobilized complex. The SPR method is particularly useful because it requires only small amounts of the protein, the protein does not have to be labeled in any way, and the interactions of the protein with other molecules can be monitored in real time.

A third optical method for probing protein interactions uses green fluorescent protein (discussed in detail below) and its derivatives of different colors. In this application, two proteins of interest are each labeled with a different fluorochrome, such that the emission spectrum of one fluorochrome overlaps the absorption spectrum of the second fluorochrome. If the two proteins, and their attached fluorochromes, come very close to each other (within about 1-10 nm), the energy of the absorbed light is transferred from one fluorochrome to the other. The energy transfer, called fluorescence resonance energy transfer (FRET), is determined by illuminating the first fluorochrome and measuring emission from the second (Figure 8-26). This technique is especially powerful because, when combined with fluorescence microscopy, it can be used to characterize protein-protein interactions at specific locations inside living cells.
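Two of the quantities discussed above reduce to short formulas, sketched here in Python. The first function follows the text directly: the binding constant is koff divided by kon. The other two go beyond the text and are labeled as assumptions: the equilibrium fraction of bait occupied assumes simple one-to-one binding, and the FRET efficiency uses the standard Förster relation E = 1 / (1 + (r/R0)^6), with a Förster radius R0 of 5 nm chosen only for illustration. All rate constants and distances below are made-up example values.

```python
def dissociation_constant(k_off, k_on):
    """Binding constant from SPR kinetics: k_off (1/s) divided by k_on (1/(M*s)), in M."""
    return k_off / k_on


def fraction_bound(concentration, kd):
    """Equilibrium fraction of bait occupied, assuming simple 1:1 binding (assumption)."""
    return concentration / (concentration + kd)


def fret_efficiency(distance_nm, forster_radius_nm=5.0):
    """Foerster relation for energy-transfer efficiency.

    forster_radius_nm is the distance at which transfer is 50% efficient;
    5 nm is an assumed, illustrative value, not one given in the text.
    """
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    kd = dissociation_constant(k_off=1e-3, k_on=1e5)  # example rate constants
    print(f"K = {kd:.1e} M")
    print(f"fraction bound at 10 nM prey: {fraction_bound(1e-8, kd):.2f}")
    for r in (1, 2.5, 5, 7.5, 10):
        print(f"r = {r:>4} nm  FRET efficiency = {fret_efficiency(r):.3f}")
```

The sixth-power dependence is why FRET behaves almost like a switch: under the assumed R0, transfer is nearly complete at 1 nm and nearly absent at 10 nm, which matches the working range quoted in the text.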

Some Techniques Can Monitor Single Molecules

The biochemical methods described so far in this chapter are used to study large populations of molecules, a limitation that reflects the small size of typical biological molecules relative to the sensitivity of the methods to detect them. However, the recent development of highly sensitive and precise measurement methods has created a new branch of biophysics: the study of single molecules. Single-molecule studies are particularly important in cell biology because many processes rely on the activities of only a few critical molecules in the cell.

[Figure 8-26 labels: blue fluorescent protein; green fluorescent protein; blue light excitation; green light emission; protein X; protein Y]



Figure 8-26 Fluorescence resonance energy transfer (FRET). To determine whether (and when) two proteins interact inside a cell, the proteins are first produced as fusion proteins attached to different color variants of green fluorescent protein (GFP). (A) In this example, protein X is coupled to a blue fluorescent protein, which is excited by violet light (370-440 nm) and emits blue light (440-480 nm); protein Y is coupled to a green fluorescent protein, which is excited by blue light and emits green light (510 nm). (B) If protein X and Y do not interact, illuminating the sample with violet light yields fluorescence from the blue fluorescent protein only. (C) When protein X and protein Y interact, FRET can now occur. Illuminating the sample with violet light excites the blue fluorescent protein, whose emission in turn excites the green fluorescent protein, resulting in an emission of green light. The fluorochromes must be quite close together, within about 1-10 nm of one another, for FRET to occur. Because not every molecule of protein X and protein Y is bound at all times, some blue light may still be detected. But as the two proteins begin to interact, emission from the donor GFP falls as the emission from the acceptor GFP rises.



The first example of a technique for studying the function of single protein molecules was the use of a patch electrode to measure current flow through single ion channels (see Figure 11-33). Another approach is to attach the protein to a larger structure, such as a polystyrene bead, which can then be observed by conventional microscopy. This strategy has been particularly useful in measuring the movements of motor proteins. For example, molecules of the motor protein kinesin (discussed in Chapter 16) can be attached to a bead, and by observing the kinesin-attached bead moving along a microtubule, the step size of the motor (that is, the distance moved for each ATP molecule hydrolyzed) can be measured. As we will see in Chapter 9, optical microscopes have a limited resolution due to the diffraction of light, but computational and optical methods can be used to determine the position of a bead to a much finer precision than the resolution limit of the microscope. Using such techniques, extremely small movements, on the order of nanometers, can easily be detected and quantified.

Another advantage of attaching molecules to large beads is that these beads can serve as "handles" by which the molecules can be manipulated. This allows forces to be applied to the molecules, and their response observed. For example, the speed or step size of a motor can be measured as a function of the force it is pulling against. As discussed in the next chapter, a focused laser beam can be used as "optical tweezers" to generate a mechanical force on a bead, allowing motor proteins to be studied under an applied force (see Figure 9-35). Beads can also be manipulated using magnetic fields, a technology known as "magnetic tweezers." If multiple beads are present in a magnetic field, they will all experience the same force, potentially allowing large numbers of beads to be manipulated in parallel in a single experiment.

While beads can be used as markers to track protein movements, it is clearly preferable to be able to visualize the proteins themselves. In the next chapter, we shall see that recent refinements in microscopy have now made this possible.
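The claim that a bead can be located more precisely than the resolution limit can be illustrated with a one-dimensional toy model. The sketch below assumes the blurred image of a bead is a Gaussian intensity profile sampled on camera pixels and estimates its center as the intensity-weighted centroid. Real tracking software typically fits a two-dimensional profile and must handle noise and background, none of which is modeled here, and all numbers are invented for the example.

```python
import math


def gaussian_spot(center, sigma, n_pixels):
    """Noise-free intensity profile of a blurred bead image, one value per pixel."""
    return [math.exp(-0.5 * ((x - center) / sigma) ** 2) for x in range(n_pixels)]


def centroid(intensities):
    """Intensity-weighted mean pixel position."""
    total = sum(intensities)
    return sum(i * v for i, v in enumerate(intensities)) / total


def step_sizes(positions):
    """Differences between successive bead positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # The spot is several pixels wide, yet its center is recovered to a small
    # fraction of a pixel in this idealized, noise-free case.
    image = gaussian_spot(center=10.3, sigma=2.0, n_pixels=21)
    print(f"estimated center: {centroid(image):.3f} pixels")
    # Hypothetical successive bead positions in nanometers.
    print(step_sizes([0.0, 8.0, 16.0, 24.0]))
```

The precision of such an estimate is limited by noise and the number of photons collected rather than by the width of the spot, which is why it can beat the diffraction limit for an isolated bead.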

[Figure 8-27, panel (A): chemical structure labeled "monastrol"]

Protein Function Can Be Selectively Disrupted With Small Molecules

Chemical inhibitors have contributed to the development of cell biology. For example, the microtubule inhibitor colchicine is routinely used to test whether microtubules are required for a given biological process; it also led to the first purification of tubulin several decades ago. In the past, these small molecules were usually natural products; that is, they were synthesized by living creatures. Although, as a whole, natural products have been extraordinarily useful in science and medicine (see, for example, Table 6-4, p. 385), they acted on a limited number of biological processes. However, the recent development of methods to synthesize hundreds of thousands of small molecules and to carry out large-scale automated screens holds the promise of identifying chemical inhibitors for virtually any biological process. In such approaches, large collections of small chemical compounds are simultaneously tested, either on living cells or in cell-free assays. Once an inhibitor is identified, it can be used as a probe to identify, through affinity chromatography (see Figure 8-13C) or other means, the protein to which the inhibitor binds. This general strategy, often called chemical biology, has successfully identified inhibitors of many proteins that carry out key processes in cell biology. An inhibitor of the kinesin protein that functions in mitosis, for example, was identified by this method (Figure 8-27). Chemical inhibitors give the cell biologist great control over the timing of inhibition, as drugs can be rapidly added to or removed from cells, allowing protein function to be switched on or off quickly.

Protein Structure Can Be Determined Using X-Ray Diffraction

The main technique that has been used to discover the three-dimensional structure of molecules, including proteins, at atomic resolution is x-ray crystallography. X-rays, like light, are a form of electromagnetic radiation, but they have a much shorter wavelength, typically around 0.1 nm (the diameter of a hydrogen




Figure 8-27 Small-molecule inhibitors for manipulating living cells. (A) Chemical structure of monastrol, a kinesin inhibitor identified in a large-scale screen for small molecules that disrupt mitosis. (B) Normal mitotic spindle seen in an untreated cell. The microtubules are stained green and chromosomes blue. (C) Monopolar spindle that forms in cells treated with monastrol. (B and C, from T.U. Mayer et al., Science 286:971-974, 1999. With permission from AAAS.)


[Figure 8-28 labels: x-ray source; beam of x-rays; diffracted beams; x-ray diffraction pattern obtained from the protein crystal]


Figure 8-28 X-ray crystallography. (A) A narrow parallel beam of x-rays is directed at a well-ordered crystal (B). Shown here is a protein crystal of ribulose bisphosphate carboxylase, an enzyme with a central role in CO2 fixation during photosynthesis. The atoms in the crystal scatter some of the beam, and the scattered waves reinforce one another at certain points and appear as a pattern of diffraction spots (C). This diffraction pattern, together with the amino acid sequence of the protein, can be used to produce an atomic model (D). The complete atomic model is hard to interpret, but this simplified version, derived from the x-ray diffraction data, shows the protein's structural features clearly (α helices, green; β strands, red). The components pictured in A to D are not shown to scale. (B, courtesy of C. Branden; C, courtesy of J. Hajdu and I. Andersson; D, adapted from original provided by B. Furugren.)


atom). If a narrow parallel beam of x-rays is directed at a sample of a pure protein, most of the x-rays pass straight through it. A small fraction, however, are scattered by the atoms in the sample. If the sample is a well-ordered crystal, the scattered waves reinforce one another at certain points and appear as diffraction spots when recorded by a suitable detector (Figure 8-28). The position and intensity of each spot in the x-ray diffraction pattern contain information about the locations of the atoms in the crystal that gave rise to it.

Deducing the three-dimensional structure of a large molecule from the diffraction pattern of its crystal is a complex task and was not achieved for a protein molecule until 1960. But in recent years x-ray diffraction analysis has become increasingly automated, and now the slowest step is likely to be the generation of suitable protein crystals. This step requires large amounts of very pure protein and often involves years of trial and error to discover the proper crystallization conditions; the pace has greatly accelerated with the use of recombinant DNA techniques to produce pure proteins and robotic techniques to test large numbers of crystallization conditions.

Analysis of the resulting diffraction pattern produces a complex three-dimensional electron-density map. Interpreting this map, that is, translating its contours into a three-dimensional structure, is a complicated procedure that requires knowledge of the amino acid sequence of the protein. Largely by trial and error, the sequence and the electron-density map are correlated by computer to give the best possible fit. The reliability of the final atomic model depends on the resolution of the original crystallographic data: 0.5 nm resolution might produce a low-resolution map of the polypeptide backbone, whereas a resolution of 0.15 nm allows all of the non-hydrogen atoms in the molecule to be reliably positioned.
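The link between resolution and where a diffraction spot falls can be made concrete with Bragg's law, nλ = 2d sin θ, which relates the spacing d between planes of atoms to the angle θ at which scattered waves reinforce. The law is not derived in the text above, so the sketch below is offered as a supplementary illustration. It uses the 0.1 nm wavelength quoted earlier and the two resolutions mentioned above; the function name is hypothetical.

```python
import math


def bragg_angle_deg(d_spacing_nm, wavelength_nm=0.1, order=1):
    """Angle theta (degrees) satisfying Bragg's law: order * wavelength = 2 * d * sin(theta)."""
    ratio = order * wavelength_nm / (2.0 * d_spacing_nm)
    if ratio > 1.0:
        raise ValueError("spacing too small to diffract at this wavelength")
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    for d in (0.5, 0.15):
        theta = bragg_angle_deg(d)
        # The diffracted beam is deflected by 2 * theta from the incident beam.
        print(f"d = {d} nm  theta = {theta:.2f} deg  deflection = {2 * theta:.2f} deg")
```

Finer detail diffracts to wider angles, so the spots that carry high-resolution information lie farthest from the center of the diffraction pattern.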
A complete atomic model is often too complex to appreciate directly, but simplified versions that show a protein's essential structural features can be readily derived from it (see Panel 3-2, pp. 132-133). The three-dimensional




structures of about 20,000 different proteins have now been determined by x-ray crystallography or by NMR spectroscopy (see below), enough to begin to see families of common structures emerging. These structures or protein folds often seem to be more conserved in evolution than are the amino acid sequences that form them (see Figure 3-13).

X-ray crystallographic techniques can also be applied to the study of macromolecular complexes. In a recent triumph, the method was used to determine the structure of the ribosome, a large and complex machine made of several RNAs and more than 50 proteins (see Figure 6-64). The determination required the use of a synchrotron, a radiation source that generates x-rays with the intensity needed to analyze the crystals of such large macromolecular complexes.

NMR Can Be Used to Determine Protein Structure in Solution

Nuclear magnetic resonance (NMR) spectroscopy has been widely used for many years to analyze the structure of small molecules. This technique is now also increasingly applied to the study of small proteins or protein domains. Unlike x-ray crystallography, NMR does not depend on having a crystalline sample. It simply requires a small volume of concentrated protein solution that is placed in a strong magnetic field; indeed, it is the main technique that yields detailed evidence about the three-dimensional structure of molecules in solution.

Certain atomic nuclei, particularly hydrogen nuclei, have a magnetic moment or spin: that is, they have an intrinsic magnetization, like a bar magnet. The spin aligns along the strong magnetic field, but it can be changed to a misaligned, excited state in response to applied radiofrequency (RF) pulses of electromagnetic radiation. When the excited hydrogen nuclei return to their aligned state, they emit RF radiation, which can be measured and displayed as a spectrum. The nature of the emitted radiation depends on the environment of each hydrogen nucleus, and if one nucleus is excited, it influences the absorption and emission of radiation by other nuclei that lie close to it. It is consequently possible, by an ingenious elaboration of the basic NMR technique known as two-dimensional NMR, to distinguish the signals from hydrogen nuclei in different amino acid residues, and to identify and measure the small shifts in these signals that occur when these hydrogen nuclei lie close enough together to interact. Because the size of such a shift reveals the distance between the interacting pair of hydrogen atoms, NMR can provide information about the distances between the parts of the protein molecule. By combining this information with a knowledge of the amino acid sequence, it is possible in principle to compute the three-dimensional structure of the protein (Figure 8-29).
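Computing a structure from distance information amounts to searching for atomic coordinates that satisfy a long list of distance constraints. The sketch below shows only the checking step, under simplifying assumptions: each constraint is an upper bound on the distance between two named hydrogen atoms, coordinates are in nanometers, and the atom names, coordinates, and 0.5 nm bound are invented for the example. Real structure-calculation programs also use lower bounds, bond geometry, and energy terms.

```python
import math


def violations(coords, restraints, tolerance=0.0):
    """List the distance restraints that a candidate structure fails to satisfy.

    coords:     dict mapping atom name -> (x, y, z) position in nm
    restraints: iterable of (atom_i, atom_j, max_distance_nm)
    Returns a list of (atom_i, atom_j, actual_distance_nm) for each violation.
    """
    failed = []
    for atom_i, atom_j, max_distance in restraints:
        distance = math.dist(coords[atom_i], coords[atom_j])
        if distance > max_distance + tolerance:
            failed.append((atom_i, atom_j, distance))
    return failed


if __name__ == "__main__":
    # Hypothetical candidate structure and restraints.
    candidate = {"H1": (0.0, 0.0, 0.0), "H2": (0.3, 0.0, 0.0), "H3": (1.0, 0.0, 0.0)}
    observed = [("H1", "H2", 0.5), ("H1", "H3", 0.5)]
    for atom_i, atom_j, distance in violations(candidate, observed):
        print(f"{atom_i}-{atom_j}: {distance:.2f} nm exceeds the allowed distance")
```

Because the constraints rarely pin down a single answer, several candidate structures usually pass this kind of check equally well, which is why NMR results are commonly presented as a family of superimposed models.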


Figure 8-29 NMR spectroscopy. (A) An example of the data from an NMR machine. This two-dimensional NMR spectrum is derived from the C-terminal domain of the enzyme cellulase. The spots represent interactions between hydrogen atoms that are near neighbors in the protein and hence reflect the distance that separates them. Complex computing methods, in conjunction with the known amino acid sequence, enable possible compatible structures to be derived. (B) Ten structures of the enzyme, which all satisfy the distance constraints equally well, are shown superimposed on one another, giving a good indication of the probable three-dimensional structure. (Courtesy of P. Kraulis.)



For technical reasons the structure of small proteins of about 20,000 daltons or less can be most readily determined by NMR spectroscopy. Resolution decreases as the size of a macromolecule increases. But recent technical advances have now pushed the limit to about 100,000 daltons, thereby making the majority of proteins accessible for structural analysis by NMR.

Because NMR studies are performed in solution, this method also offers a convenient means of monitoring changes in protein structure, for example during protein folding or when the protein binds to another molecule. NMR is also used widely to investigate molecules other than proteins and is valuable, for example, as a method to determine the three-dimensional structures of RNA molecules and the complex carbohydrate side chains of glycoproteins. Some landmarks in the development of x-ray crystallography and NMR are listed in Table 8-2.

Protein Sequence and Structure Provide Clues About Protein Function

Having discussed methods for purifying and analyzing proteins, we now turn to a common situation in cell and molecular biology: an investigator has identified a gene important for a biological process but has no direct knowledge of the biochemical properties of its protein product. Thanks to the proliferation of protein and nucleic acid sequences that are catalogued in genome databases, the function of a gene, and of its encoded protein, can often be predicted by simply comparing its sequence with those of previously characterized genes (see Figure 3-14). Because amino acid sequence

Table 8-2 Landmarks in the Development of X-ray Crystallography and NMR and Their Application to Biological Molecules

1864        Hoppe-Seyler crystallizes, and names, the protein hemoglobin.
1895        Röntgen observes that a new form of penetrating radiation, which he names x-rays, is produced when cathode rays (electrons) hit a metal target.
1912        Von Laue obtains the first x-ray diffraction patterns by passing x-rays through a crystal of zinc sulfide. W.L. Bragg proposes a simple relationship between an x-ray diffraction pattern and the arrangement of atoms in a crystal that produce the pattern.
1926        Sumner obtains crystals of the enzyme urease from extracts of jack beans and demonstrates that proteins possess catalytic activity.
1931        Pauling publishes his first essays on "The Nature of the Chemical Bond," detailing the rules of covalent bonding.
1934        Bernal and Crowfoot present the first detailed x-ray diffraction patterns of a protein obtained from crystals of the enzyme pepsin.
1935        Patterson develops an analytical method for determining interatomic spacings from x-ray data.
1941        Astbury obtains the first x-ray diffraction pattern of DNA.
1946        Bloch and Purcell describe NMR.
1951        Pauling and Corey propose the structure of a helical conformation of a chain of L-amino acids (the α helix) and the structure of the β sheet, both of which were later found in many proteins.
1953        Watson and Crick propose the double-helix model of DNA, based on x-ray diffraction patterns obtained by Franklin and Wilkins.
1954        Perutz and colleagues develop heavy-atom methods to solve the phase problem in protein crystallography.
1960        Kendrew describes the first detailed structure of a protein (sperm whale myoglobin) to a resolution of 0.2 nm, and Perutz presents a lower-resolution structure of the larger protein hemoglobin.
1966        Phillips describes the structure of lysozyme, the first enzyme to have its structure analyzed in detail.
1971        Jeener proposes the use of two-dimensional NMR, and Wüthrich and colleagues first use the method to solve a protein structure in the early 1980s.
1976        Kim and Rich and Klug and colleagues describe the detailed three-dimensional structure of tRNA determined by x-ray diffraction.
1977-1978   Holmes and Klug determine the structure of tobacco mosaic virus (TMV), and Harrison and Rossmann determine the structure of two small spherical viruses.
1985        Michel, Deisenhofer and colleagues determine the first structure of a transmembrane protein (a bacterial reaction center) by x-ray crystallography. Henderson and colleagues obtain the structure of bacteriorhodopsin, a transmembrane protein, by high-resolution electron-microscopy methods between 1975 and 1990.

determines protein structure, and structure dictates biochemical function, proteins that share a similar amino acid sequence usually have the same structure and usually perform similar biochemical functions, even when they are found in distantly related organisms. In modern cell biology, the study of a newly discovered protein usually begins with a search for previously characterized proteins that are similar in their amino acid sequences.

Searching a collection of known sequences for homologous genes or proteins is typically done over the World Wide Web, and it simply involves selecting a database and entering the desired sequence. A sequence alignment program (the most popular are BLAST and FASTA) scans the database for similar sequences by sliding the submitted sequence along the archived sequences until a cluster of residues falls into full or partial alignment (Figure 8-30). The results of even a complex search, which can be performed on either a nucleotide or an amino acid sequence, are returned within minutes. Such comparisons can predict the functions of individual proteins, families of proteins, or even most of the protein complement of a newly sequenced organism.

Score = 399 bits (1025), Expect = e-111, Identities = 198/290 (68%), Positives = 24?/290 (82%)
[Alignment rows of Query (human Cdc2) against Sbjct (maize Cdc2) are not recoverable from the extracted text.]

Figure 8-30 Results of a BLAST search. Sequence databases can be searched to find similar amino acid or nucleic acid sequences. Here, a search for proteins similar to the human cell-cycle regulatory protein Cdc2 (Query) locates maize Cdc2 (Sbjct), which is 68% identical (and 82% similar) to human Cdc2 in its amino acid sequence. The alignment begins at residue 57 of the Query protein, suggesting that the human protein has an N-terminal region that is absent from the maize protein. The green blocks indicate differences in sequence, and the yellow bar summarizes the similarities: when the two amino acid sequences are identical, the residue is shown; conservative amino acid substitutions are indicated by a plus sign (+). Only one small gap has been introduced, indicated by the red arrow at position 194 in the Query sequence, to align the two sequences maximally. The alignment score (Score), which is expressed in two different types of units, takes into account penalties for substitutions and gaps; the higher the alignment score, the better the match. The significance of the alignment is reflected in the Expectation (E) value, which specifies how often a match this good would be expected to occur by chance. The lower the E value, the more significant the match; the extremely low value here (e-111) indicates certain significance. E values much higher than 0.1 are unlikely to reflect true relatedness. For example, an E value of 0.1 means there is a 1 in 10 likelihood that such a match would arise solely by chance.
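The identity percentage reported in a BLAST result, and the idea of sliding a submitted sequence along an archived one, can be illustrated with a minimal Python sketch. This is not how BLAST itself works (BLAST relies on heuristics, substitution scores, and gap penalties); the function names and the short test sequences below are hypothetical and serve only as an illustration.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of aligned positions that hold the same residue.

    Both strings must already be aligned and of equal length;
    '-' marks a gap and never counts as an identity.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * identities / len(aligned_query)


def best_ungapped_offset(query: str, subject: str) -> tuple[int, int]:
    """Slide the query along the subject without gaps.

    Returns (offset, identities) for the first offset that gives
    the largest number of identical residues.
    """
    best_offset, best_matches = 0, 0
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        matches = sum(q == s for q, s in zip(query, window))
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


# Hypothetical short sequences, chosen only to exercise the functions.
print(percent_identity("MEDYTKIEKIG", "MEQYEKVEKIG"))
print(best_ungapped_offset("KIGEG", "MEQYEKVEKIGEGTY"))
# The figure's 198 identities over 290 aligned positions:
print(int(100 * 198 / 290))  # 68
```

A real search would also weigh conservative substitutions (the "Positives" count) and estimate how often a score that high arises by chance (the E value), neither of which this sketch attempts.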


As was explained in Chapter 3, many proteins that adopt the same conformation and have related functions are too distantly related to be identified as clearly homologous from a comparison of their amino acid sequences alone (see Figure 3-13). Thus, an ability to reliably predict the three-dimensional structure of a protein from its amino acid sequence would improve our ability to infer protein function from the sequence information in genomic databases. In recent years, major progress has been made in predicting the precise structure of a protein. These predictions are based, in part, on our knowledge of tens of thousands of protein structures that have already been determined by x-ray crystallography and NMR spectroscopy and, in part, on computations using our knowledge of the physical forces acting on the atoms. However, it remains a substantial and important challenge to predict the structures of proteins that are large or have multiple domains, or to predict structures at the very high levels of resolution needed to assist in computer-based drug discovery.

While finding homologous sequences and structures for a new protein will provide many clues about its function, it is usually necessary to test these insights through direct experimentation. However, the clues generated from sequence comparisons typically point the investigator in the correct experimental direction, and their use has therefore become one of the most important strategies in modern cell biology.

Summary

Most proteins function in concert with other proteins, and many methods exist for identifying and studying protein-protein interactions. Small-molecule inhibitors allow the functions of proteins they act upon to be studied in living cells. Because proteins with similar structures often have similar functions, the biochemical activity of a protein can often be predicted by searching databases for previously characterized proteins that are similar in their amino acid sequences.

Chapter 8: Manipulating Proteins, DNA, and RNA

ANALYZING AND MANIPULATING DNA

Until the early 1970s, DNA was the most difficult biological molecule for the biochemist to analyze. Enormously long and chemically monotonous, the string of nucleotides that forms the genetic material of an organism could be examined only indirectly, by protein or RNA sequencing or by genetic analysis. Today, the situation has changed entirely. From being the most difficult macromolecule of the cell to analyze, DNA has become the easiest. It is now possible to isolate a specific region of almost any genome, to produce a virtually unlimited number of copies of it, and to determine the sequence of its nucleotides in a few hours. At the height of the Human Genome Project, large facilities with automated machines were generating DNA sequences at the rate of 1000 nucleotides per second, around the clock. By related techniques, an isolated gene can be altered (engineered) at will and transferred back into the germ line of an animal or plant, so as to become a functional and heritable part of the organism's genome.

These technical breakthroughs in genetic engineering (the ability to manipulate DNA with precision in a test tube or an organism) have had a dramatic impact on all aspects of cell biology by facilitating the study of cells and their macromolecules in previously unimagined ways.

Recombinant DNA technology comprises a mixture of techniques, some newly developed and some borrowed from other fields such as microbial genetics (Table 8-3). Central to the technology are the following key techniques:

1. Cleavage of DNA at specific sites by restriction nucleases, which greatly facilitates the isolation and manipulation of individual genes.
2. DNA ligation, which makes it possible to design and construct DNA molecules that are not found in nature.
3. DNA cloning through the use of either cloning vectors or the polymerase chain reaction, in which a portion of DNA is repeatedly copied to generate many billions of identical molecules.
4. Nucleic acid hybridization, which makes it possible to find a specific sequence of DNA or RNA with great accuracy and sensitivity on the basis of its ability to selectively bind a complementary nucleic acid sequence.
5. Rapid determination of the sequence of nucleotides of any DNA (even entire genomes), making it possible to identify genes and to deduce the amino acid sequence of the proteins they encode.
6. Simultaneous monitoring of the level of mRNA produced by every gene in a cell, using nucleic acid microarrays, in which tens of thousands of hybridization reactions take place simultaneously.

In this section, we describe each of these basic techniques, which together have revolutionized the study of cell biology.

Restriction Nucleases Cut Large DNA Molecules into Fragments

Unlike a protein, a gene does not exist as a discrete entity in cells, but rather as a small region of a much longer DNA molecule. Although the DNA molecules in a cell can be randomly broken into small pieces by mechanical force, a fragment containing a single gene in a mammalian genome would still be only one among a hundred thousand or more DNA fragments, indistinguishable in their average size. How could such a gene be purified? Because all DNA molecules consist of an approximately equal mixture of the same four nucleotides, they cannot be readily separated, as proteins can, on the basis of their different charges and binding properties. The solution to all of these problems began to emerge with the discovery of restriction nucleases. These enzymes, which can be purified from bacteria, cut the DNA double helix at specific sites defined by the local nucleotide sequence, thereby cleaving a long double-stranded DNA molecule into fragments of



Table 8-3 Some Major Steps in the Development of Recombinant DNA and Transgenic Technology

1869 Miescher first isolates DNA from white blood cells harvested from pus-soaked bandages obtained from a nearby hospital.
1944 Avery provides evidence that DNA, rather than protein, carries the genetic information during bacterial transformation.
1953 Watson and Crick propose the double-helix model for DNA structure based on x-ray results of Franklin and Wilkins.
1955 Kornberg discovers DNA polymerase, the enzyme now used to produce labeled DNA probes.
1961 Marmur and Doty discover DNA renaturation, establishing the specificity and feasibility of nucleic acid hybridization reactions.
1962 Arber provides the first evidence for the existence of DNA restriction nucleases, leading to their purification and use in DNA sequence characterization by Nathans and H. Smith.
1966 Nirenberg, Ochoa, and Khorana elucidate the genetic code.
1967 Gellert discovers DNA ligase, the enzyme used to join DNA fragments together.
1972-1973 DNA cloning techniques are developed by the laboratories of Boyer, Cohen, Berg, and their colleagues at Stanford University and the University of California at San Francisco.
1975 Southern develops gel-transfer hybridization for the detection of specific DNA sequences.
1975-1977 Sanger and Barrell and Maxam and Gilbert develop rapid DNA-sequencing methods.
1981-1982 Palmiter and Brinster produce transgenic mice; Spradling and Rubin produce transgenic fruit flies.
1982 GenBank, NIH's public genetic sequence database, is established at Los Alamos National Laboratory.
1985 Mullis and co-workers invent the polymerase chain reaction (PCR).
1987 Capecchi and Smithies introduce methods for performing targeted gene replacement in mouse embryonic stem cells.
1989 Fields and Song develop the yeast two-hybrid system for identifying and studying protein interactions.
1989 Olson and colleagues describe sequence-tagged sites, unique stretches of DNA that are used to make physical maps of human chromosomes.
1990 Lipman and colleagues release BLAST, an algorithm used to search for homology between DNA and protein sequences.
1990 Simon and colleagues study how to efficiently use bacterial artificial chromosomes, BACs, to carry large pieces of cloned human DNA for sequencing.
1991 Hood and Hunkapillar introduce new automated DNA sequence technology.
1995 Venter and colleagues sequence the first complete genome, that of the bacterium Haemophilus influenzae.
1996 Goffeau and an international consortium of researchers announce the completion of the first genome sequence of a eucaryote, the yeast Saccharomyces cerevisiae.
1996-1997 Lockhart and colleagues and Brown and DeRisi produce DNA microarrays, which allow the simultaneous monitoring of thousands of genes.
1998 Sulston and Waterston and colleagues produce the first complete sequence of a multicellular organism, the nematode worm Caenorhabditis elegans.
2001 Consortia of researchers announce the completion of the draft human genome sequence.
2004 Publication of the "finished" human genome sequence.

strictly defined sizes. Different restriction nucleases have different sequence specificities, and it is relatively simple to find an enzyme that can create a DNA fragment that includes a particular gene. The size of the DNA fragment can then be used as a basis for partial purification of the gene from a mixture.

Different species of bacteria make different restriction nucleases, which protect them from viruses by degrading incoming viral DNA. Each bacterial nuclease recognizes a specific sequence of four to eight nucleotides in DNA. These sequences, where they occur in the genome of the bacterium itself, are protected from cleavage by methylation at an A or a C nucleotide; the sequences in foreign DNA are generally not methylated and so are cleaved by the restriction nucleases. Large numbers of restriction nucleases have been purified from various species of bacteria; several hundred, most of which recognize different nucleotide sequences, are now available commercially.

Some restriction nucleases produce staggered cuts, which leave short single-stranded tails at the two ends of each fragment (Figure 8-31). Ends of this type are known as cohesive ends, as each tail can form complementary base pairs with the tail at any other end produced by the same enzyme (Figure 8-32). The cohesive ends generated by restriction enzymes allow any two DNA fragments to be easily joined together, as long as the fragments were generated with the same restriction nuclease (or with another nuclease that produces the same cohesive ends). DNA molecules produced by splicing together two or more DNA fragments are called recombinant DNA molecules.
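The way a restriction nuclease converts a long DNA molecule into fragments of defined size can be sketched in a few lines of Python. This is a simplified illustration, not a laboratory tool: it treats the DNA as a linear molecule, looks only at one strand, and ignores methylation. The recognition sequences and cut positions in the table are the commonly cited ones for these four enzymes and are supplied here as an assumption; the test sequences are hypothetical.

```python
# Recognition sequence and cut offset on the strand shown (5' to 3').
# Offsets are an assumption based on the commonly cited cut sites:
# EcoRI G^AATTC, HindIII A^AGCTT, PstI CTGCA^G, HpaI GTT^AAC (blunt).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "PstI": ("CTGCAG", 5),
    "HpaI": ("GTTAAC", 3),
}


def cut_positions(dna: str, site: str, offset: int) -> list[int]:
    """Return the index of every cut made within occurrences of `site`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + offset)
        start = dna.find(site, start + 1)
    return positions


def digest(dna: str, enzyme: str) -> list[str]:
    """Cut a linear DNA strand at every site of the named enzyme."""
    site, offset = ENZYMES[enzyme]
    cuts = cut_positions(dna.upper(), site, offset)
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequence with two EcoRI sites.
fragments = digest("AAAGAATTCTTTTGAATTCCC", "EcoRI")
print(fragments)
print([len(f) for f in fragments])
```

Because every fragment produced by a staggered cutter in this sketch begins with the same few nucleotides after the cut, the model also hints at why fragments made by the same enzyme carry matching cohesive ends.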


Figure 8-31 The DNA nucleotide sequences recognized by four widely used restriction nucleases. As in the examples shown, such sequences are often six base pairs long and "palindromic" (that is, the nucleotide sequence is the same if the helix is turned by 180 degrees around the center of the short region of helix that is recognized). The enzymes cut the two strands of DNA at or near the recognition sequence. For some enzymes, such as HpaI, the cleavage leaves blunt ends; for others, such as EcoRI, HindIII, and PstI, the cleavage is staggered and creates cohesive ends. Restriction nucleases are obtained from various species of bacteria: HpaI is from Haemophilus parainfluenzae, EcoRI is from Escherichia coli, HindIII is from Haemophilus influenzae, and PstI is from Providencia stuartii.
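The "palindromic" property described in the caption means that a recognition sequence reads the same as its own reverse complement. The short Python sketch below checks this, and also estimates how often a site of a given length would be expected in a random sequence. The spacing estimate assumes all four nucleotides are equally frequent and independent, which real genomes only approximate; the function names are hypothetical, and the recognition sequences used are the commonly cited ones rather than values taken from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site is identical to its reverse complement."""
    return site.upper() == reverse_complement(site)


def expected_spacing(site_length: int) -> int:
    """Average distance between sites in a random sequence,
    assuming equal and independent base frequencies (an idealization)."""
    return 4 ** site_length


# Commonly cited recognition sequences (assumed, not quoted from the text).
for name, site in [("HpaI", "GTTAAC"), ("EcoRI", "GAATTC"),
                   ("HindIII", "AAGCTT"), ("PstI", "CTGCAG")]:
    print(name, site, is_palindromic(site))

print(expected_spacing(4), expected_spacing(6), expected_spacing(8))
```

Under this idealization a six-nucleotide site occurs about once every 4096 nucleotide pairs and an eight-nucleotide site about once every 65,536, which is why enzymes with longer recognition sequences cut rarely.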



Gel Electrophoresis Separates DNA Molecules of Different Sizes

The same types of gel electrophoresis methods that have proved so useful in the analysis of proteins can determine the length and purity of DNA molecules. The procedure is actually simpler than for proteins: because each nucleotide in a nucleic acid molecule already carries a single negative charge (on the phosphate group), there is no need to add the negatively charged detergent SDS that is required to make protein molecules move uniformly toward the positive electrode. For DNA fragments less than 500 nucleotides long, specially designed polyacrylamide gels allow the separation of molecules that differ in length by as little as a single nucleotide (Figure 8-33A). The pores in polyacrylamide gels, however, are too small to permit very large DNA molecules to pass; to separate these by size, the much more porous gels formed by dilute solutions of agarose (a polysaccharide isolated from seaweed) are used (Figure 8-33B). These DNA separation methods are widely used for both analytical and preparative purposes.

A variation of agarose-gel electrophoresis, called pulsed-field gel electrophoresis, makes it possible to separate even extremely long DNA molecules. Ordinary gel electrophoresis fails to separate such molecules because the steady electric field stretches them out so that they travel end-first through the gel in snakelike configurations at a rate that is independent of their length. In pulsed-field gel electrophoresis, by contrast, the direction of the electric field changes periodically, which forces the molecules to reorient before continuing to move snakelike through the gel. This reorientation takes much more time for larger molecules, so that longer molecules move more slowly than shorter ones. As a consequence, even entire bacterial or yeast chromosomes separate into discrete bands in pulsed-field gels and so can be sorted and identified on the basis of their size (Figure 8-33C). Although a typical mammalian chromosome of 10⁸ base pairs is too large to be sorted even in this way, large segments of these chromosomes are readily separated and identified if the chromosomal DNA is first cut with a restriction nuclease selected to recognize sequences that occur only rarely (once every 10,000 or more nucleotide pairs).

The DNA bands on agarose or polyacrylamide gels are invisible unless the DNA is labeled or stained in some way. One sensitive method of staining DNA is to expose it to the dye ethidium bromide, which fluoresces under ultraviolet light when it is bound to DNA (see Figure 8-33B,C). An even more sensitive detection method incorporates a radioisotope into the DNA molecules before electrophoresis; ³²P is often used as it can be incorporated into DNA phosphates and emits an energetic β particle that is easily detected by autoradiography, as in Figure 8-33. (For a discussion of radioisotopes, see p. 601.)
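In practice, the length of a DNA fragment is usually read off a gel by comparing its band with marker fragments of known size run alongside it. The Python sketch below shows one common way of doing this, under an assumption that is not stated in the text: that migration distance is roughly linear in the logarithm of fragment length over the useful range of the gel. The marker sizes and distances are hypothetical values chosen for illustration.

```python
import math


def estimate_size(distance: float, markers: list[tuple[float, float]]) -> float:
    """Estimate fragment length from migration distance.

    `markers` is a list of (size_in_nucleotide_pairs, distance_migrated)
    for fragments of known size. The estimate interpolates linearly in
    log10(size) between the two markers that bracket `distance`.
    """
    points = sorted(markers, key=lambda m: m[1])  # order by distance
    for (size1, d1), (size2, d2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            fraction = (distance - d1) / (d2 - d1)
            log_size = math.log10(size1) + fraction * (
                math.log10(size2) - math.log10(size1)
            )
            return 10 ** log_size
    raise ValueError("distance lies outside the range covered by the markers")


# Hypothetical ladder: (size in nucleotide pairs, distance in mm).
ladder = [(10_000, 10.0), (1_000, 30.0), (100, 50.0)]
print(round(estimate_size(20.0, ladder)))
```

Raising an error outside the marker range is a deliberate choice here: extrapolating beyond the ladder is exactly where the log-linear approximation tends to fail, as it does for the very long molecules that ordinary gels cannot resolve.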

Figure 8-32 The use of restriction nucleases to produce DNA fragments that can be easily joined together. Fragments with the same cohesive ends can readily join by complementary base-pairing between their cohesive ends, as illustrated. The two DNA fragments that join in this example were both produced by the EcoRI restriction nuclease, whereas the three other fragments were produced by different restriction nucleases that generated different cohesive ends (see Figure 8-31). Blunt-ended fragments, like those generated by HpaI (see Figure 8-31), can be spliced together with more difficulty.


Figure 8-33 Gel electrophoresis techniques for separating DNA molecules by size. In the three examples shown, electrophoresis is from top to bottom, so that the largest, slowest-moving DNA molecules are near the top of the gel. (A) A polyacrylamide gel with small pores was used to fractionate single-stranded DNA. In the size range 10 to 500 nucleotides, DNA molecules that differ in size by only a single nucleotide can be separated from each other. In the example, the four lanes (1 to 4) represent sets of DNA molecules synthesized in the course of a DNA-sequencing procedure. The DNA to be sequenced has been artificially replicated from a fixed start site up to a variable stopping point, producing a set of partial replicas of differing lengths. (Figure 8-50 explains how such sets of partial replicas are synthesized.) Lane 1 shows all the partial replicas that terminate in a G, lane 2 all those that terminate in an A, lane 3 all those that terminate in a T, and lane 4 all those that terminate in a C. Since the DNA molecules used in these reactions were radiolabeled, their positions can be determined by autoradiography, as shown. (B) An agarose gel with medium-sized pores was used to separate double-stranded DNA molecules. This method is most useful in the size range 300 to 10,000 nucleotide pairs. These DNA molecules are fragments produced by cleaving the genome of a bacterial virus with a restriction nuclease, and they have been detected by their fluorescence when stained with the dye ethidium bromide. (C) The technique of pulsed-field agarose gel electrophoresis was used to separate 16 different yeast (Saccharomyces cerevisiae) chromosomes, which range in size from 220,000 to 2.5 million nucleotide pairs. The DNA was stained as in (B). DNA molecules as large as 10⁷ nucleotide pairs can be separated in this way. (A, courtesy of Leander Lauffer and Peter Walter; B, courtesy of Ken Kreuzer; C, from D. Vollrath and R.W. Davis, Nucleic Acids Res. 15:7865-7876, 1987. With permission from Oxford University Press.)

Purified DNA Molecules Can Be Specifically Labeled with Radioisotopes or Chemical Markers in vitro


Two procedures are widely used to label isolated DNA molecules. In the first method, a DNA polymerase copies the DNA in the presence of nucleotides that are either radioactive (usually labeled with ³²P) or chemically tagged (Figure 8-34A). In this way, "DNA probes" containing many labeled nucleotides can be produced for nucleic acid hybridization reactions (discussed below). The second procedure uses the bacteriophage enzyme polynucleotide kinase to transfer a single ³²P-labeled phosphate from ATP to the 5' end of each DNA chain (Figure 8-34B). Because only one ³²P atom is incorporated by the kinase into each DNA strand, the DNA molecules labeled in this way are often not radioactive enough to be used as DNA probes; because they are labeled at only one end, however, they have been invaluable for other applications, including DNA footprinting, as discussed in Chapter 7.

Radioactive labeling methods are being replaced by labeling with molecules that can be detected chemically or through fluorescence. To produce such nonradioactive DNA molecules, specially modified nucleotide precursors are used (Figure 8-34C). A DNA molecule made in this way is allowed to bind to its complementary DNA sequence by hybridization, as discussed in the next section, and is then detected with an antibody (or other ligand) that specifically recognizes its modified side chain (Figure 8-35).

[Figure 8-34 panel labels: (A) denature and anneal with mixture of hexanucleotides; add DNA polymerase and labeled nucleotides; DNA polymerase incorporates labeled nucleotides, resulting in a population of DNA molecules that contain labeled examples of all sequences on both strands. (B) DNA labeled at 5' ends with polynucleotide kinase and ³²P-labeled ATP; restriction nuclease cuts DNA helix into two different-sized fragments; separation by gel electrophoresis; the desired DNA fragment with a single strand labeled at one end.]

Nucleic Acid Hybridization Reactions Provide a Sensitive Way to Detect Specific Nucleotide Sequences

When an aqueous solution of DNA is heated at 100°C or exposed to a very high pH (pH ≥ 13), the complementary base pairs that normally hold the two strands of the double helix together are disrupted and the double helix rapidly dissociates into two single strands. This process, called DNA denaturation, was for many years thought to be irreversible. In 1961, however, it was discovered that complementary single strands of DNA readily re-form double helices by a process called hybridization (also called DNA renaturation) if they are kept for a prolonged period at 65°C. Similar hybridization reactions can occur between any two single-stranded nucleic acid chains (DNA/DNA, RNA/RNA, or RNA/DNA), provided that they have complementary nucleotide sequences. These specific hybridization reactions are widely used to detect and characterize specific nucleotide sequences in both RNA and DNA molecules.

Single-stranded DNA molecules used to detect complementary sequences are known as probes; these molecules, which carry radioactive or chemical markers to facilitate their detection, can range from fifteen to thousands of nucleotides long. Hybridization reactions using DNA probes are so sensitive and selective that they can detect complementary sequences present at a concentration as low as one molecule per cell. It is thus possible to determine how many copies of any DNA sequence are present in a particular DNA sample. The same technique can be used to search for related but nonidentical genes. To find a gene of interest in an organism whose genome has not yet been sequenced, for example, a portion of a known gene can be used as a probe (Figure 8-36).
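The logic of probing for identical versus related sequences can be sketched in Python by treating hybridization as a search for stretches of a single-stranded target that are complementary to the probe. This is a deliberately crude model: it represents stringency as a maximum number of tolerated mismatches, whereas real hybridization depends on temperature, solvent, probe length, and base composition. The function names and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that base-pairs with `seq`, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridization_sites(probe: str, target: str,
                        max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Find where `probe` could pair with the single-stranded `target`.

    Returns (position, mismatches) for every window of the target whose
    sequence differs from the probe's reverse complement at no more than
    `max_mismatches` positions. Zero mismatches stands in for stringent
    conditions; a larger allowance stands in for reduced stringency.
    """
    wanted = reverse_complement(probe)
    target = target.upper()
    hits = []
    for i in range(len(target) - len(wanted) + 1):
        window = target[i:i + len(wanted)]
        mismatches = sum(a != b for a, b in zip(wanted, window))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Hypothetical target with one perfect site and one related site.
target = "TTTGCATTCAAAGCTTTCAAA"
print(hybridization_sites("GAATGC", target))                    # stringent
print(hybridization_sites("GAATGC", target, max_mismatches=1))  # relaxed
```

With no mismatches allowed only the perfectly complementary site is reported; allowing one mismatch also reports the related site, mirroring the use of a known gene as a probe for related but nonidentical genes.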

Figure 8-34 Methods for labeling DNA molecules in vitro. (A) A purified DNA polymerase enzyme labels all the nucleotides in a DNA molecule and can thereby produce highly radioactive DNA probes. (B) Polynucleotide kinase labels only the 5' ends of DNA strands; therefore, when labeling is followed by restriction nuclease cleavage, as shown, DNA molecules containing a single 5'-end-labeled strand can be readily obtained. (C) The method in (A) is also used to produce nonradioactive DNA molecules that carry a specific chemical marker that can be detected with an appropriate antibody. The modified nucleotide shown can be incorporated into DNA by DNA polymerase, allowing the DNA molecule to serve as a probe that can be readily detected. The base on the nucleoside triphosphate shown is an analog of thymine, in which the methyl group on T has been replaced by a spacer arm linked to the plant steroid digoxigenin. An anti-digoxigenin antibody coupled to a visible marker such as a fluorescent dye is used to visualize the probe. Other chemical labels such as biotin can be attached to nucleotides and used in essentially the same way.



Figure 8-35 In situ hybridization to locate specific genes on chromosomes. Here, six different DNA probes have been used to mark the locations of their respective nucleotide sequences on human chromosome 5 at metaphase. The probes have been chemically labeled and detected with fluorescent antibodies. Both copies of chromosome 5 are shown, aligned side by side. Each probe produces two dots on each chromosome, since a metaphase chromosome has replicated its DNA and therefore contains two identical DNA helices. (Courtesy of David C. Ward.)

Alternatively, DNA probes can be used in hybridization reactions with RNA rather than DNA to find out whether a cell is expressing a given gene. In this case a DNA probe that contains part of the gene's sequence is hybridized with RNA purified from the cell in question to see whether the RNA includes nucleotide sequences matching the probe DNA and, if so, in what quantities. In somewhat more elaborate procedures, the DNA probe is treated with specific nucleases after the hybridization is complete, to determine the exact regions of the DNA probe that have paired with the RNA molecules. One can thereby determine the start and stop sites for RNA transcription, as well as the precise boundaries of the intron and exon sequences in a gene (Figure 8-37).

Today, the positions of intron/exon boundaries are usually determined by sequencing the complementary DNA (cDNA) sequences that represent the mRNAs expressed in a cell and comparing them with the nucleotide sequence of the genome. We describe later how cDNAs are prepared from mRNAs.

The hybridization of DNA probes to RNAs allows one to determine whether or not a particular gene is being transcribed; moreover, when the expression of a gene changes, one can determine whether the change is due to transcriptional or post-transcriptional controls (see Figure 7-92). These tests of gene expression were initially performed with one DNA probe at a time. DNA microarrays now allow the simultaneous monitoring of hundreds or thousands of genes at a time, as we discuss later. Hybridization methods are in such wide use in cell biology today that it is difficult to imagine how we could study gene structure and expression without them.


[Figure 8-36 panel labels: mixture of single-stranded DNA molecules; hybridization in 50% formamide at 42°C; only A forms stable double helix; A, C, and E all form stable double helices.]

Figure 8-36 Stringent versus nonstringent hybridization conditions. To use a DNA probe to find an identical match, stringent hybridization conditions are used; the reaction temperature is kept just a few degrees below that at which a perfect DNA helix denatures in the solvent used (its melting temperature), so that all imperfect helices formed are unstable. When a DNA probe is being used to find DNAs with related, as well as identical, sequences, less stringent conditions are used; hybridization is performed at a lower temperature, which allows even imperfectly paired double helices to form. Only the lower-temperature hybridization conditions can be used to search for genes that are nonidentical but related to gene A (C and E in this example).


[Figure 8-37 panel labels: exon 1; exon 2; treatment with S1 nuclease degrades single-stranded nucleic acids; degraded DNA; untreated control; alkaline agarose gel.]

Northernand SouthernBlottingFacilitate Hybridization with Electrophoretically Separated NucleicAcidMolecules In a complex mixture of nucleic acids, DNA probes are often used to detect only those molecules with sequences that are complementary to all or part of the probe. Gel electrophoresiscan be used to fractionate the many different RNA or DNA molecules in a crude mixture according to their size before the hybridization reaction is performed; if the probe binds to molecules of only one or a few sizes, one can be certain that the hybridization was indeed specific. Moreover, the size information obtained can be invaluable in itself. An example illustrates this point. Suppose that one wishes to determine the nature of the defect in a mutant mouse that produces abnormally low amounts of albumin, a protein that liver cells normally secreteinto the blood in large amounts. First, one collects identical samples of liver tissue from mutant and normal mice (the latter serving as controls) and disrupts the cells in a strong detergent to inactivate nucleasesthat might otherwise degrade the nucleic acids. Next, one separates the RNA and DNA from all of the other cell components: the proteins present are completely denatured and removed by repeated extractions with phenol-a potent organic solvent that is partly miscible with water; the nucleic acids, which remain in the aqueous phase, are then precipitated with alcohol to separate them from the small molecules of the cell. Then, one separatesthe DNA from the RNA by their different solubilities in alcohols and degradesany contaminating nucleic acid of the unwanted type by treatment with a highly specific enzyme-either an RNase or a DNase.The mRNAs are typically separatedfrom bulk RNA by retention on a chromatography column that specifically binds the poly-A tails of mRNAs. To analyze the albumin-encoding mRNAs, a technique called Northern blotting is used. 
First, the intact mRNA molecules purified from mutant and control liver cells are fractionated on the basis oftheir sizesinto a seriesofbands by gel electrophoresis. Then, to make the RNA molecules accessible to DNA probes, a replica of the pattern of RNA bands on the gel is made by transferring ("blotting"l the fractionated RNA molecules onto a sheet of nitrocellulose or nylon paper. The paper is then incubated in a solution containing a labeled DNA probe, the sequence of which corresponds to part of the template strand that

Figure8-37 The useof nucleicacid hybridizationto determinethe region of a clonedDNAfragmentthat is presentin an mRNAmolecule.The methodshownreouiresa nuclease that cutsthe DNAchainonly whereit is not base-paired to a complementary RNA chain.The oositionsof the intronsin genesare mappedby the eucaryotic methodshown.Forthis type of analysis, the DNAis electrophoresed througha denaturingagarosegel,which causesit to migrateas single-stranded molecules. Thelocationof eachend of an RNA moleculecan be determinedusinq s i m i l am r ethods.


ANALYZING AND MANIPULATING DNA r e m o v en i t r o c e l l u l o s e p a p e rw i t h t i g h t l y b o u n d n u c l e i ca c i d s

stackof paper towels



n itrocelIulose paper

l a b e l e dR N Ao r D N Ao f k n o w n s i z e s s e r v i n ga s s i z em a r k e r s agarose gel spon9e a l k a l is o l u t i o n N U C L E IA CC I D SS E P A R A T E D A C C O R D I NTGO S I Z EB Y A G A R O S E G E LE L E C T R O P H O R E S I S


Figure 8-38 Detection of specific RNA or DNA molecules by gel-transfer hybridization. In this example, the DNA probe is detected by its radioactivity. DNA probes detected by chemical or fluorescence methods are also widely used (see Figure 8-34). (A) A mixture of either single-stranded RNA molecules (Northern blotting) or the double-stranded DNA fragments created by restriction nuclease treatment (Southern blotting) is separated according to length by electrophoresis. (B) A sheet of nitrocellulose or nylon paper is laid over the gel, and the separated RNA or DNA fragments are transferred to the sheet by blotting. (C) The nitrocellulose sheet is carefully peeled off the gel. (D) The sheet containing the bound nucleic acids is placed in a sealed plastic bag together with a buffered salt solution containing a radioactively labeled DNA probe. The sheet is exposed to a labeled DNA probe for a prolonged period under conditions favoring hybridization. (E) The sheet is removed from the bag and washed thoroughly, so that only probe molecules that have hybridized to the RNA or DNA immobilized on the paper remain attached. After autoradiography, the DNA that has hybridized to the labeled probe shows up as bands on the autoradiograph. For Southern blotting, the strands of the double-stranded DNA molecules on the paper must be separated before the hybridization process; this is done by exposing the DNA to alkaline denaturing conditions after the gel has been run (not shown).

produces albumin mRNA. The RNA molecules that hybridize to the labeled DNA probe on the paper (because they are complementary to part of the normal albumin gene sequence) are then located by detecting the bound probe by autoradiography or by chemical means (Figure 8-38). The sizes of the hybridized RNA molecules can be determined by reference to RNA standards of known sizes that are electrophoresed side by side with the experimental sample. In this way, one might discover that liver cells from the mutant mice make albumin mRNA in normal amounts and of normal size; alternatively, one might find that they make it in normal size but in greatly reduced amounts. Another possibility is that the mutant albumin mRNA molecules are abnormally short; in this case the gel blot could be retested with a series of shorter DNA probes, each corresponding to small portions of the gene, to reveal which part of the normal RNA is missing.

The original gel-transfer hybridization method, called Southern blotting, analyzes DNA rather than RNA. (It was named after its inventor, and the Northern and Western blotting techniques were named with reference to it.) Here, isolated DNA is first cut into readily separable fragments with restriction nucleases. The double-stranded fragments are then separated on the basis of size by gel electrophoresis, and those complementary to a DNA probe are identified by blotting and hybridization, as just described for RNA (see Figure 8-38). To characterize the structure of the albumin gene in the mutant mice, an albumin-specific DNA probe would be used to construct a detailed restriction map of the genome in the region of the albumin gene (such a map consists of the pattern of DNA fragments produced by various restriction nucleases). From this map one


[Figure 8-38 autoradiograph labels: positions of labeled markers; labeled bands]


Chapter 8: Manipulating Proteins, DNA, and RNA

could determine if the albumin gene has been rearranged in the defective animals, for example, by the deletion or the insertion of a short DNA sequence; most single-base changes, however, could not be detected in this way.
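The size determination by reference to standards described above can be sketched numerically. The following is a minimal illustration, not a laboratory protocol: it assumes that, over the range spanned by the markers, the logarithm of fragment length falls roughly linearly with migration distance, and it interpolates between the two flanking markers. The function name and the marker values are hypothetical.

```python
import math

def estimate_size(markers, distance):
    """Estimate fragment length (nucleotides) from its migration distance.

    markers: list of (distance_mm, size_nt) pairs for standards of known size.
    Assumption: log10(size) varies linearly with distance between markers.
    """
    pts = sorted(markers)
    for (d0, s0), (d1, s1) in zip(pts, pts[1:]):
        if d0 <= distance <= d1:
            frac = (distance - d0) / (d1 - d0)
            log_size = math.log10(s0) + frac * (math.log10(s1) - math.log10(s0))
            return 10 ** log_size
    raise ValueError("distance lies outside the range covered by the markers")

# Hypothetical size markers: (migration distance in mm, length in nucleotides)
markers = [(10.0, 10000), (20.0, 1000), (30.0, 100)]
print(round(estimate_size(markers, 15.0)))
```

A band that runs halfway between two markers is therefore assigned the geometric mean of their sizes rather than the arithmetic mean.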

Genes Can Be Cloned Using DNA Libraries

Any DNA fragment can be cloned. In molecular biology, the term DNA cloning is used in two senses. In one sense, it literally refers to the act of making many identical copies of a DNA molecule, that is, the amplification of a particular DNA sequence. However, the term also describes the isolation of a particular stretch of DNA (often a particular gene) from the rest of a cell's DNA, because this isolation is greatly facilitated by making many identical copies of the DNA of interest. As discussed earlier in this chapter, cloning, particularly when used in the context of developmental biology, can also refer to the generation of many genetically identical cells starting from a single cell or even to the generation of genetically identical organisms. In all cases, cloning refers to the act of making many genetically identical copies; in this section, we will use the term cloning (or DNA cloning or gene cloning) to refer to methods designed to generate many identical copies of a segment of nucleic acid.

DNA cloning in its most general sense can be accomplished in several ways. The simplest involves inserting a particular fragment of DNA into the purified DNA genome of a self-replicating genetic element, generally a virus or a plasmid. A DNA fragment containing a human gene, for example, can be joined in a test tube to the chromosome of a bacterial virus, and the new recombinant DNA molecule can then be introduced into a bacterial cell, where the inserted DNA fragment will be replicated along with the DNA of the virus. Starting with only one such recombinant DNA molecule that infects a single cell, the normal replication mechanisms of the virus can produce more than 10^12 identical virus DNA molecules in less than a day, thereby amplifying the amount of the inserted human DNA fragment by the same factor. A virus or plasmid used in this way is known as a cloning vector, and the DNA propagated by insertion into it is said to have been cloned.
To isolate a specific gene, one often begins by constructing a DNA library, a comprehensive collection of cloned DNA fragments from a cell, tissue, or organism. This library includes (one hopes) at least one fragment that contains the gene of interest. Libraries can be constructed with either a virus or a plasmid vector and are generally housed in a population of bacterial cells. The principles underlying the methods used for cloning genes are the same for either type of cloning vector, although the details may differ. Today, most cloning is performed with plasmid vectors.

The plasmid vectors most widely used for gene cloning are small circular molecules of double-stranded DNA derived from larger plasmids that occur naturally in bacterial cells. They generally account for only a minor fraction of the total host bacterial cell DNA, but owing to their small size they can easily be separated from chromosomal DNA molecules, which are large and precipitate as a pellet upon centrifugation. For use as cloning vectors, the purified plasmid DNA circles are first cut with a restriction nuclease to create linear DNA molecules. The genomic DNA to be used in constructing the library is cut with the same restriction nuclease, and the resulting restriction fragments (including those containing the gene to be cloned) are then added to the cut plasmids and annealed via their cohesive ends to form recombinant DNA circles. These recombinant molecules containing foreign DNA inserts are then covalently sealed with the enzyme DNA ligase (Figure 8-39).

In the next step in preparing the library, the recombinant DNA circles are introduced into bacterial cells that have been made transiently permeable to DNA. These bacterial cells are now said to be transfected with the plasmids. As the cells grow and divide, doubling in number every 30 minutes, the recombinant plasmids also replicate to produce an enormous number of copies of DNA circles containing the foreign DNA (Figure 8-40).
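The arithmetic behind this amplification is simple doubling. The sketch below takes the 30-minute doubling time from the text and assumes idealized, unlimited exponential growth from a single transfected cell; real cultures slow down as nutrients run out, and the function name is illustrative only.

```python
def cells_after(minutes, doubling_time_min=30, start_cells=1):
    """Number of cells after a given time, assuming uninterrupted doubling."""
    generations = minutes // doubling_time_min
    return start_cells * 2 ** generations

# One transfected cell grown for 14 hours (28 doublings under these assumptions)
overnight = cells_after(14 * 60)
print(f"{overnight:,} cells")
```

Under these assumptions a single transfected cell gives rise to hundreds of millions of descendants in about 14 hours, each carrying at least one copy of the recombinant plasmid.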
Many bacterial plasmids carry genes for antibiotic resistance (discussed in Chapter 24), a property that can be


ANALYZING AND MANIPULATING DNA

Figure 8-39 The insertion of a DNA fragment into a bacterial plasmid with the enzyme DNA ligase. The plasmid is cut open with a restriction nuclease (in this case one that produces cohesive ends) and is mixed with the DNA fragment to be cloned (which has been prepared with the same restriction nuclease). DNA ligase and ATP are added. The cohesive ends base-pair, and DNA ligase seals the nicks in the DNA backbone, producing a complete recombinant DNA molecule. (Micrographs courtesy of Huntington Potter and David Dressler.) [Panel labels: circular double-stranded plasmid DNA (cloning vector); DNA fragment to be cloned; recombinant DNA]



exploited to select those cells that have been successfully transfected; if the bacteria are grown in the presence of the antibiotic, only cells containing plasmids will survive. Each original bacterial cell that was initially transfected contains, in general, a different foreign DNA insert; this insert is inherited by all of the progeny cells of that bacterium, which together form a small colony in a culture dish.

For many years, plasmids were used to clone fragments of DNA of 1000 to 30,000 nucleotide pairs. Larger DNA fragments are more difficult to handle and were harder to clone. Then researchers began to use yeast artificial chromosomes (YACs), which could accommodate very large pieces of DNA (Figure 8-41). Today, new plasmid vectors based on the naturally occurring F plasmid of E. coli are used to clone DNA fragments of 300,000 to 1 million nucleotide pairs. Unlike smaller bacterial plasmids, the F plasmid (and its derivative, the bacterial artificial chromosome, or BAC) is present in only one or two copies per E. coli cell. The fact that BACs are kept in such low numbers in bacterial cells may contribute to their ability to maintain large cloned DNA sequences stably: with only a few BACs present, it is less likely that the cloned DNA fragments will become scrambled by recombination with sequences carried on other copies of the plasmid. Because of their stability, ability to accept large DNA inserts, and ease of handling, BACs are now the preferred vector for building DNA libraries of complex organisms, including those representing the human and mouse genomes.

Two Types of DNA Libraries Serve Different Purposes

Cleaving the entire genome of a cell with a specific restriction nuclease and cloning each fragment as just described produces a very large number of DNA fragments, on the order of a million for a mammalian genome. The fragments are distributed among millions of different colonies of transfected bacterial cells.

Figure 8-40 The amplification of the DNA fragments inserted into a plasmid. To produce large amounts of the DNA of interest, the recombinant plasmid DNA in Figure 8-39 is introduced into a bacterium by transfection, where it will replicate many millions of times as the bacterium multiplies. [Panel labels: double-stranded recombinant plasmid DNA introduced into bacterial cell; bacterial cell; cell culture produces hundreds of millions of new bacteria; many copies of purified recombinant plasmid isolated from lysed bacterial cells]









Figure 8-41 The making of a yeast artificial chromosome (YAC). A YAC vector allows the cloning of very large DNA molecules. TEL, CEN, and ORI are the telomere, centromere, and origin of replication sequences, respectively, for the yeast Saccharomyces cerevisiae; all of these are required to propagate the YAC. BamHI and EcoRI are sites where the corresponding restriction nucleases cut the DNA double helix. The sequences denoted A and B encode enzymes that serve as selectable markers to allow the easy isolation of yeast cells that have taken up the artificial chromosome. Because bacteria divide more rapidly than yeasts, most large-scale cloning projects now use E. coli as the means for amplifying DNA. (Adapted from D.T. Burke, G.F. Carle, and M.V. Olson, Science 236:806-812, 1987. With permission from AAAS.) [Panel labels: left arm; right arm; large chromosomal fragments; 5.5 x 10^3 nucleotide pairs; up to 10^6 nucleotide pairs]


When working with BACs rather than typical plasmids, larger fragments can be inserted, and so fewer transfected bacterial cells are required to cover the genome. In either case, each of the colonies is composed of a clone of cells derived from a single ancestor cell, and therefore harbors many copies of a particular stretch of the fragmented genome (Figure 8-42). Such a plasmid is said to contain a genomic DNA clone, and the entire collection of plasmids is called a genomic DNA library. But because the genomic DNA is cut into fragments at random, only some fragments contain genes. Many of the genomic DNA clones obtained from the DNA of a higher eucaryotic cell contain only noncoding DNA, which, as we discussed in Chapter 4, makes up most of the DNA in such genomes.

An alternative strategy is to begin the cloning process by selecting only those DNA sequences that are transcribed into mRNA and thus are presumed to correspond to protein-encoding genes. This is done by extracting the mRNA from cells and then making a DNA copy of each mRNA molecule present, a so-called complementary DNA, or cDNA. The copying reaction is catalyzed by the reverse transcriptase enzyme of retroviruses, which synthesizes a complementary DNA chain on an RNA template. The single-stranded cDNA molecules synthesized by the reverse transcriptase are converted into double-stranded cDNA molecules by DNA polymerase, and these molecules are inserted into a plasmid or virus vector and cloned (Figure 8-43). Each clone obtained in this way is called a cDNA clone, and the entire collection of clones derived from one mRNA preparation constitutes a cDNA library.

Figure 8-44 illustrates some important differences between genomic DNA clones and cDNA clones. Genomic clones represent a random sample of all of the DNA sequences in an organism and, with very rare exceptions, are the same regardless of the cell type used to prepare them. By contrast, cDNA clones contain only those regions of the genome that have been transcribed into mRNA. Because the cells of different tissues produce distinct sets of mRNA molecules, a distinct cDNA library is obtained for each type of cell used to prepare the library.

Figure 8-42 Construction of a human genomic DNA library. A genomic library is usually stored as a set of bacteria, each bacterium carrying a different fragment of human DNA. For simplicity, cloning of just a few representative fragments (colored) is shown. In reality, all of the gray DNA fragments would also be cloned. [Panel labels: human double-stranded DNA; cleave with restriction nuclease; millions of genomic DNA fragments; human genomic DNA library]
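The relation between insert size and the number of clones needed to cover a genome can be made concrete with a short calculation. The sketch below is illustrative only. It assumes a genome of about 3 x 10^9 nucleotide pairs (a round figure for a mammalian genome, not stated in this passage) and average insert sizes chosen from the ranges quoted above for plasmids and BACs. The second function uses the standard sampling estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one clone and P is the desired probability that a given sequence is present; it assumes fragments are sampled at random and independently.

```python
import math

GENOME_SIZE = 3_000_000_000  # assumed mammalian genome size, nucleotide pairs

def minimum_clones(genome_size, insert_size):
    """Clones needed if every fragment were distinct and non-overlapping."""
    return math.ceil(genome_size / insert_size)

def clones_for_probability(genome_size, insert_size, p=0.99):
    """Clones needed so a given sequence is present with probability p,
    assuming clones are an independent random sample of the genome."""
    f = insert_size / genome_size
    return math.ceil(math.log(1 - p) / math.log(1 - f))

for label, insert in [("plasmid, 3,000 nt inserts", 3_000),
                      ("BAC, 300,000 nt inserts", 300_000)]:
    print(label,
          minimum_clones(GENOME_SIZE, insert),
          clones_for_probability(GENOME_SIZE, insert))
```

With the smaller inserts the minimum comes out at about a million clones, in line with the figure quoted for a mammalian genome, whereas the hundredfold larger BAC inserts reduce the number of clones a hundredfold.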


Figure 8-43 The synthesis of cDNA. Total mRNA is extracted from a particular tissue, and the enzyme reverse transcriptase produces DNA copies (cDNA) of the mRNA molecules (see p. 320). For simplicity, the copying of just one of these mRNAs into cDNA is illustrated. A short oligonucleotide complementary to the poly-A tail at the 3' end of the mRNA (discussed in Chapter 6) is first hybridized to the RNA to act as a primer for the reverse transcriptase, which then copies the RNA into a complementary DNA chain, thereby forming a DNA/RNA hybrid helix. Treating the DNA/RNA hybrid with RNase H (see Figure 5-12) creates nicks and gaps in the RNA strand. The enzyme DNA polymerase then copies the remaining single-stranded cDNA into double-stranded cDNA. The fragment of the original mRNA is the primer for this synthesis reaction, as shown. Because the DNA polymerase used to synthesize the second DNA strand can synthesize through the bound RNA molecules, the RNA fragment that is base-paired to the 3' end of the first DNA strand usually acts as the primer for the final product of the second strand synthesis. This RNA is eventually degraded during subsequent cloning steps. As a result, the nucleotide sequences at the extreme 5' ends of the original mRNA molecules are often absent from cDNA libraries. [Panel labels: tissue (e.g., brain); double-stranded cDNA copy of original mRNA]
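At the level of sequence, the two copying steps in Figure 8-43 amount to taking complements of antiparallel strands. The following minimal sketch models only the base-pairing logic; it ignores priming, RNase H treatment, and the loss of the extreme 5' end noted above. The function names and the short example sequence are hypothetical.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """DNA strand complementary to an mRNA; both are written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

def second_strand_cdna(first_strand):
    """DNA strand complementary to the first cDNA strand, written 5' to 3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))

mrna = "AUGGCCAAAAAA"            # hypothetical mRNA ending in a short poly-A tail
first = first_strand_cdna(mrna)  # begins with T residues paired to the poly-A tail
second = second_strand_cdna(first)
print(first)
print(second)
```

The second strand has the same sequence as the mRNA with T in place of U, which is why a cDNA clone carries the uninterrupted coding sequence of the message.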

Figure 8-44 The differences between cDNA clones and genomic DNA clones derived from the same region of DNA. In this example, gene A is infrequently transcribed, whereas gene B is frequently transcribed, and both genes contain introns (green). In the genomic DNA library, both the introns and the nontranscribed DNA (pink) are included in the clones, and most clones contain, at most, only part of the coding sequence of a gene (red). In the cDNA clones, the intron sequences (yellow) have been removed by RNA splicing during the formation of the mRNA (blue), and a continuous coding sequence is therefore present in each clone. Because gene B is transcribed more frequently than gene A in the cells from which the cDNA library was made, it is represented much more frequently than A in the cDNA library. In contrast, A and B are in principle represented equally in the genomic DNA library. [Panel labels: chromosomal DNA; nontranscribed DNA; RNA transcripts; genomic DNA clones in genomic DNA library; cDNA clones in cDNA library]



cDNA Clones Contain Uninterrupted Coding Sequences

There are several advantages in using a cDNA library for gene cloning. First, specialized cells produce large quantities of some proteins. In this case, the mRNA encoding the protein is likely to be produced in such large quantities that a cDNA library prepared from the cells is highly enriched for the cDNA molecules encoding the protein, greatly reducing the problem of identifying the desired clone in the library (see Figure 8-44). Hemoglobin, for example, is made in large amounts by developing erythrocytes (red blood cells); for this reason the globin genes were among the first to be cloned.

By far the most important advantage of cDNA clones is that they contain the uninterrupted coding sequence of a gene. As we have seen, eucaryotic genes usually consist of short coding sequences of DNA (exons) separated by much longer noncoding sequences (introns); the production of mRNA entails the removal of the noncoding sequences from the initial RNA transcript and the splicing together of the coding sequences. Neither bacterial nor yeast cells will make these modifications to the RNA produced from a gene of a higher eucaryotic cell. Thus, when the aim of the cloning is either to deduce the amino acid sequence of the protein from the DNA sequence or to produce the protein in bulk by expressing the cloned gene in a bacterial or yeast cell, it is much preferable to start with cDNA.

cDNA libraries have an additional use: as described in Chapter 7, many mRNAs from humans and other complex organisms are alternatively spliced, and a cDNA library often represents many, if not all, of the alternatively spliced mRNAs produced from a given cell line or tissue.

Genomic and cDNA libraries are inexhaustible resources, which are widely shared among investigators. Today, many such libraries are also available from commercial sources.

Genes Can Be Selectively Amplified by PCR

Now that so many genome sequences are available, genes can be cloned directly without the need to first construct DNA libraries. A technique called the polymerase chain reaction (PCR) makes this rapid cloning possible. Starting with an entire genome, PCR allows the DNA from a selected region to be amplified several billionfold, effectively "purifying" this DNA away from the remainder of the genome.

To begin, a pair of DNA oligonucleotides, chosen to flank the desired nucleotide sequence of the gene, are synthesized by chemical methods. These oligonucleotides are then used to prime DNA synthesis on single strands generated by heating the DNA from the entire genome. The newly synthesized DNA is produced in a reaction catalyzed in vitro by a purified DNA polymerase, and the primers remain at the 5' ends of the final DNA fragments that are made (Figure 8-45A).

Nothing special is produced in the first cycle of DNA synthesis; the power of the PCR method is revealed only after repeated rounds of DNA synthesis. Every cycle doubles the amount of DNA synthesized in the previous cycle. Because each cycle requires a brief heat treatment to separate the two strands of the template DNA double helix, the technique requires the use of a special DNA polymerase, isolated from a thermophilic bacterium, that is stable at much higher temperatures than normal so that it is not denatured by the repeated heat treatments. With each round of DNA synthesis, the newly generated fragments serve as templates in their turn, and within a few cycles the predominant product is a single species of DNA fragment whose length corresponds to the distance between the two original primers (see Figure 8-45B).

In practice, effective DNA amplification requires 20-30 reaction cycles, with the products of each cycle serving as the DNA templates for the next, hence the term polymerase "chain reaction." A single cycle requires only about 5 minutes, and the entire procedure can be easily automated. PCR thereby makes possible the "cell-free molecular cloning" of a DNA fragment in a few hours, compared with the several days required for standard cloning procedures. This technique



is now used routinely to clone DNA from genes of interest directly, starting either from genomic DNA or from mRNA isolated from cells (Figure 8-46). The PCR method is extremely sensitive; it can detect a single DNA molecule in a sample. Trace amounts of RNA can be analyzed in the same way by first transcribing them into DNA with reverse transcriptase. The PCR cloning technique has largely replaced Southern blotting for the diagnosis of genetic diseases and for the detection of low levels of viral infection. It also has great promise in forensic medicine as a means of analyzing minute traces of blood or other tissues,


Figure 8-45 Amplification of DNA by the PCR technique. Knowledge of the DNA sequence to be amplified is used to design two synthetic, primer DNA oligonucleotides. One primer is complementary to the sequence on one strand of the DNA double helix, and one is complementary to the sequence on the other strand, but at the opposite end of the region to be amplified. These oligonucleotides serve as primers for in vitro DNA synthesis, which is performed by a DNA polymerase, and they determine the segment of the DNA to be amplified. (A) PCR starts with a double-stranded DNA, and each cycle of the reaction begins with a brief heat treatment to separate the two strands (step 1). After strand separation, cooling of the DNA in the presence of a large excess of the two primer DNA oligonucleotides allows these primers to hybridize to complementary sequences in the two DNA strands (step 2). This mixture is then incubated with DNA polymerase and the four deoxyribonucleoside triphosphates to synthesize DNA, starting from the two primers (step 3). The entire cycle is then begun again by a heat treatment to separate the newly synthesized DNA strands. (B) As the procedure is performed over and over again, the newly synthesized fragments serve as templates in their turn, and within a few cycles the predominant DNA is identical to the sequence bracketed by and including the two primers in the original template. Of the DNA put into the original reaction, only the sequence bracketed by the two primers is amplified because there are no primers attached anywhere else. In the example illustrated in (B), three cycles of reaction produce 16 DNA chains, 8 of which (boxed in yellow) are the same length as and correspond exactly to one or the other strand of the original bracketed sequence shown at the far left; the other strands contain extra DNA downstream of the original sequence, which is replicated in the first few cycles. After four more cycles, 240 of the 256 DNA chains correspond exactly to the original bracketed sequence, and after several more cycles, essentially all of the DNA strands have this unique length. [Panel labels: step 1, step 2, step 3; separate the DNA strands and anneal primers; DNA synthesis; first cycle (producing two double-stranded DNA molecules); second cycle (producing four double-stranded DNA molecules); third cycle (producing eight double-stranded DNA molecules)]
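The strand counts quoted in the legend to Figure 8-45 can be reproduced with a simple bookkeeping model. The sketch below is an idealization that assumes every strand is copied in every cycle. It tracks three classes of single strands: the two original template strands, "long" strands copied from an original (these begin at a primer but run past the other primer site), and strands of exactly the bracketed length, which arise whenever a long or exact-length strand is copied. The function and key names are illustrative only.

```python
def pcr_strand_counts(cycles):
    """Count single DNA strands by class after a number of ideal PCR cycles."""
    original, long_strands, exact = 2, 0, 0
    for _ in range(cycles):
        new_long = original              # each original templates one long strand
        new_exact = long_strands + exact  # long and exact strands template exact ones
        long_strands += new_long
        exact += new_exact
    return {"original": original, "long": long_strands, "exact": exact,
            "total": original + long_strands + exact}

for n in (3, 7, 20):
    counts = pcr_strand_counts(n)
    print(n, counts["exact"], counts["total"])
```

Because the long strands increase by only two per cycle while the total doubles, the exact-length product rapidly comes to dominate, as the legend describes.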



[Figure 8-46 panel labels: (A) isolate DNA, genomic clones; (B) isolate mRNA, cDNA clones]

even as little as a single cell, and identifying the person from whom the sample came by his or her genetic "fingerprint" (Figure 8-47).

Cells Can Be Used As Factories to Produce Specific Proteins

The vast majority of the thousands of different proteins in a cell, including many with crucially important functions, are present in very small amounts. In the past, for most of them, it has been extremely difficult, if not impossible, to obtain more than a few micrograms of pure material. One of the most important contributions of DNA cloning and genetic engineering to cell biology is that they have made it possible to produce any of the cell's proteins in nearly unlimited amounts.

Large amounts of a desired protein are produced in living cells by using expression vectors (Figure 8-48). These are generally plasmids that have been designed to produce a large amount of a stable mRNA that can be efficiently translated into protein in the transfected bacterial, yeast, insect, or mammalian cell. To prevent the high level of the foreign protein from interfering with the transfected cell's growth, the expression vector is often designed to delay the synthesis of the foreign mRNA and protein until shortly before the cells are harvested and lysed (Figure 8-49).

Because the desired protein made from an expression vector is produced inside a cell, it must be purified away from the host-cell proteins by chromatography after cell lysis; but because it is such a plentiful species in the cell lysate (often 1-10% of the total cell protein), the purification is usually easy to accomplish in only a few steps. As we saw above, many expression vectors have been

Figure 8-46 Use of PCR to obtain a genomic or cDNA clone. (A) To obtain a genomic clone using PCR, chromosomal DNA is first purified from cells. PCR primers that flank the stretch of DNA to be cloned are added, and many cycles of the reaction are completed (see Figure 8-45). Since only the DNA between (and including) the primers is amplified, PCR provides a way to obtain a short stretch of chromosomal DNA selectively in a virtually pure form. (B) To use PCR to obtain a cDNA clone of a gene, mRNA is first purified from cells. The first primer is then added to the population of mRNAs, and reverse transcriptase is used to make a complementary DNA strand. The second primer is then added, and the single-stranded cDNA molecule is amplified through many cycles of PCR, as shown in Figure 8-45. For both types of cloning, the nucleotide sequence of at least part of the region to be cloned must be known beforehand.





Figure 8-47 How PCR is used in forensic science. (A) The DNA sequences that create the variability used in this analysis contain runs of short, repeated sequences, such as CACACA . . . , which are found in various positions (loci) in the human genome. The number of repeats in each run can be highly variable in the population, ranging from 4 to 40 in different individuals. A run of repeated nucleotides of this type is commonly referred to as a hypervariable microsatellite sequence, also known as a VNTR (variable number of tandem repeat) sequence. Because of the variability in these sequences at each locus, individuals usually inherit a different variant from their mother and from their father; two unrelated individuals therefore do not usually contain the same pair of sequences. A PCR analysis using primers that bracket the locus produces a pair of bands of amplified DNA from each individual, one band representing the maternal variant and the other representing the paternal variant. The length of the amplified DNA, and thus the position of the band it produces after electrophoresis, depends on the exact number of repeats at the locus. (B) In the schematic example shown here, the same three VNTR loci are analyzed (requiring three different pairs of specially selected oligonucleotide primers) from three suspects (individuals A, B, and C), producing six DNA bands for each person after polyacrylamide-gel electrophoresis. Although some individuals have several bands in common, the overall pattern is quite distinctive for each. The band pattern can therefore serve as a "fingerprint" to identify an individual nearly uniquely. The fourth lane (F) contains the products of the same reactions carried out on a forensic sample. The starting material for such a PCR can be a single hair or a tiny sample of blood that was left at the crime scene. When examining the variability at 5-10 different VNTR loci, the odds that two random individuals would share the same genetic pattern by chance can be approximately 1 in 10 billion. In the case shown here, individuals A and C can be eliminated from further enquiries, whereas individual B remains a clear suspect for committing the crime. A similar approach is now routinely used for paternity testing. [Panel labels: primers for PCR amplification; repeated sequences of a VNTR locus]
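The very small chance of a coincidental match follows from multiplying per-locus probabilities. The sketch below is a rough illustration rather than a forensic method: it assumes that the loci are inherited independently and that, at each locus, two unrelated people share the same pair of variants with some known probability. The per-locus value of 0.1 is hypothetical and was chosen only to show how ten loci can yield odds on the order of 1 in 10 billion.

```python
def random_match_probability(per_locus_match_probs):
    """Probability that two unrelated people match at every locus,
    assuming the loci are statistically independent."""
    probability = 1.0
    for p in per_locus_match_probs:
        probability *= p
    return probability

# Hypothetical: a 1-in-10 chance of a coincidental match at each of 10 loci
p_match = random_match_probability([0.1] * 10)
print(f"about 1 in {1 / p_match:,.0f}")
```

Each additional independent locus multiplies the odds against a chance match, which is why examining several loci is far more discriminating than examining one.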


Figure 8-48 Production of large amounts of a protein from a protein-coding DNA sequence cloned into an expression vector and introduced into cells. A plasmid vector has been engineered to contain a highly active promoter, which causes unusually large amounts of mRNA to be produced from an adjacent protein-coding gene inserted into the plasmid vector. Depending on the characteristics of the cloning vector, the plasmid is introduced into bacterial, yeast, insect, or mammalian cells, where the inserted gene is efficiently transcribed and translated into protein. [Panel label: double-stranded plasmid DNA expression vector]



designed to add a molecular tag (a cluster of histidine residues or a small marker protein) to the expressed protein to allow easy purification by affinity chromatography (see Figure 8-16). A variety of expression vectors are available, each engineered to function in the type of cell in which the protein is to be made. In this way, cells can be induced to make vast quantities of medically useful proteins, such as human insulin and growth hormone, interferon, and viral antigens for vaccines. More generally, these methods make it possible to produce every protein, even those that may be present in only a few copies per cell, in large enough amounts to be used in the kinds of detailed structural and functional studies that we discussed earlier.

DNA technology also can produce large amounts of any RNA molecule whose gene has been isolated. Studies of RNA splicing, protein synthesis, and RNA-based enzymes, for example, are greatly facilitat