Theoretical Biochemistry - Processes and Properties of Biological Systems


THEORETICAL AND COMPUTATIONAL CHEMISTRY

SERIES EDITORS

Professor P. Politzer
Department of Chemistry, University of New Orleans, New Orleans, LA 70148, U.S.A.

Professor Z.B. Maksić
Rudjer Bošković Institute, P.O. Box 1016, 10001 Zagreb, Croatia

VOLUME 1 Quantitative Treatments of Solute/Solvent Interactions, P. Politzer and J.S. Murray (Editors)
VOLUME 2 Modern Density Functional Theory: A Tool for Chemistry, J.M. Seminario and P. Politzer (Editors)
VOLUME 3 Molecular Electrostatic Potentials: Concepts and Applications, J.S. Murray and K. Sen (Editors)
VOLUME 4 Recent Developments and Applications of Modern Density Functional Theory, J.M. Seminario (Editor)
VOLUME 5 Theoretical Organic Chemistry, C. Párkányi (Editor)
VOLUME 6 Pauling's Legacy: Modern Modelling of the Chemical Bond, Z.B. Maksić and W.J. Orville-Thomas (Editors)
VOLUME 7 Molecular Dynamics: From Classical to Quantum Methods, P.B. Balbuena and J.M. Seminario (Editors)
VOLUME 8 Computational Molecular Biology, J. Leszczynski (Editor)
VOLUME 9 Theoretical Biochemistry: Processes and Properties of Biological Systems, L.A. Eriksson (Editor)

Theoretical Biochemistry
Processes and Properties of Biological Systems

Edited by Leif A. Eriksson
Department of Quantum Chemistry, Uppsala University, 751 20 Uppsala, Sweden

ELSEVIER 2001
Amsterdam - London - New York - Oxford - Paris - Shannon - Tokyo

ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands

© 2001 Elsevier Science B.V. All rights reserved.

This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use.

Permissions may be sought directly from Elsevier Science Global Rights Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also contact Global Rights directly through Elsevier's home page (http://www.elsevier.nl), by selecting 'Obtaining Permissions'.

In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 207 631 5555; fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter.

Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Global Rights Department, at the mail, fax and e-mail addresses noted above.

Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

First edition 2001

Library of Congress Cataloging in Publication Data
A catalog record from the Library of Congress has been applied for.

ISBN: 0-444-50292-0
ISSN: 1380-7323 (Series)

The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.

FOREWORD

Theoretical chemistry has been an area of tremendous expansion and development over the past decade. From an approach in which we were able to treat only a few atoms quantum mechanically, or make fairly crude molecular dynamics simulations, it has grown into a discipline with an accuracy and predictive power that have rendered it an essential complementary tool to experiment in basically all areas of science. One of the areas where the success of computational chemistry has perhaps been most profound is that of biochemistry/biophysics. With the development of faster and cheaper computers, a growing number of algorithms that scale linearly (particularly for large systems), and completely new methods and hybrids between methods, we are now able to investigate systems of 50-100 atoms with quantum chemical methods, even larger aggregates by combining QM and MM methods or performing quantum-MD simulations, or systems with, say, 50000 atoms in large-scale classical MD simulations. As the systems become increasingly realistic, direct comparisons with experimental data become possible.

The intention of this volume is to give a flavour of the types of problems in biochemistry that theoretical calculations can solve at present, and to illustrate the tremendous predictive power these approaches possess. With these aspects in mind, I have tried to gather some of the leading scientists in the field of theoretical/computational biochemistry and let them present their work. You will hence find a wide range of computational approaches, from classical MD and Monte Carlo methods, via semi-empirical and DFT approaches on isolated model systems, to Car-Parrinello QM-MD and novel hybrid QM/MM studies. The systems investigated also cover a broad range, from membrane-bound proteins to various types of enzymatic reactions as well as inhibitor studies, cofactor properties, solvent effects, transcription and radiation damage to DNA.

It is my hope that the work presented herein will provide as much pleasure in reading as I have had in editing the volume, and that it will help to stimulate discussions and further development of a truly fascinating field of science.

Leif A. Eriksson
June, 2000.


TABLE OF CONTENTS

Chapter 1. The Structure and Function of Blue Copper Proteins, U. Ryde, M.H.M. Olsson and K. Pierloot
  1. Introduction
  2. Methods
  3. Geometry
    3.1 The optimal geometry of the blue-copper active site
    3.2 Trigonal and tetragonal Cu(II) structures
    3.3 The sensitivity of the geometries to the theoretical method
    3.4 Geometry optimisations in the protein
  4. Electronic spectra
    4.1 The electronic spectrum of plastocyanin
    4.2 Correlation between structure and spectroscopy of copper proteins
    4.3 The sensitivity of the calculated spectra on the theoretical method
  5. Reorganisation energies
  6. Reduction potentials
  7. Related proteins
    7.1 The binuclear CuA site
    7.2 Cytochromes
    7.3 Iron-sulphur clusters
  8. Protein strain
  9. Concluding remarks

Chapter 2. Myoglobin, D. Karancsi-Menyhárd, G. Keserű and G. Náray-Szabó
  1. Introduction
  2. Conformation and structural dynamics
  3. Complexes with various ligands
  4. Photodissociation
  5. Recombination
  6. Ligand migration

Chapter 3. Mechanisms for Enzymatic Reactions Involving Formation or Cleavage of O-O Bonds, P.E.M. Siegbahn and M.R.A. Blomberg
  1. Introduction
  2. Methods and models
  3. Formation of O2
  4. O-O bond cleavage
    4.1 O-O bond activation in cytochrome oxidase
    4.2 Heme peroxidases
    4.3 O-O bond activation in methane monooxygenase
    4.4 O-O bond activation in manganese catalase
    4.5 O-O bond activation in isopenicillin N synthase
  5. Conclusions

Chapter 4. Catalytic Reactions of Radical Enzymes, F. Himo and L.A. Eriksson
  1. Introduction
  2. Methodology
  3. Galactose oxidase
  4. Pyruvate formate-lyase
  5. Ribonucleotide reductase
  6. Concluding remarks

Chapter 5. Theoretical Studies of Coenzyme B12-Dependent Carbon-Skeleton Rearrangements, D.M. Smith, S.D. Wetmore and L. Radom
  1. Introduction
  2. Background
    2.1 Vitamin B12: What is it?
    2.2 Coenzyme B12: What does it do?
    2.3 The bound free-radical hypothesis: How does coenzyme B12 work?
    2.4 The radical rearrangement mechanism
  3. Evaluation of theoretical techniques
  4. 2-Methyleneglutarate mutase
    4.1 Fragmentation-recombination
    4.2 Addition-elimination
    4.3 Facilitation by protonation
  5. Methylmalonyl-CoA mutase
    5.1 Fragmentation-recombination
    5.2 Addition-elimination
    5.3 Facilitation by protonation
  6. Glutamate mutase
    6.1 Fragmentation-recombination pathway for the rearrangement of the aminopropyl radical
    6.2 Rearrangement of the iminopropyl radical
    6.3 Hydride ion removal from the aminopropyl radical
  7. Comparison of the models for B12-dependent carbon-skeleton mutases
  8. The partial-proton-transfer concept
  9. Conclusions

Chapter 6. Simulations of Enzymatic Systems: Perspectives from Car-Parrinello Molecular Dynamics Simulations, P. Carloni and U. Rothlisberger
  1. Introduction
  2. Principles of the Car-Parrinello method
  3. Car-Parrinello modeling of biological systems
  4. Applications to non-enzymatic systems
    4.1 Nucleic acids
    4.2 Heme-based proteins
    4.3 Cyclic peptides and ion channels
    4.4 Photosensitive proteins
  5. Applications to enzymes
    5.1 Introduction
    5.2 Test cases
      5.2.1 Human carbonic anhydrase II (HCAII)
      5.2.2 Serine proteases
    5.3 Enzymes as targets for pharmaceutical intervention
      5.3.1 HIV-1 protease (HIV-1 PR)
      5.3.2 HIV-1 reverse transcriptase
      5.3.3 Herpes simplex virus type 1 thymidine kinase: a target for gene-therapy based anticancer drugs
      5.3.4 Conclusion
    5.4 Rational design of biomimetic catalysts by hybrid QM/MM Car-Parrinello simulations of galactose oxidase
  6. Outlook

Chapter 7. Computational Enzymology: Protein Tyrosine Phosphatase Reactions, K. Kolmodin, V. Luzhkov and J. Aqvist
  1. Introduction
  2. Protein tyrosine phosphatase reactions
    2.1 Protein tyrosine phosphatases
    2.2 The PTPase reaction mechanism
  3. The empirical valence bond method
    3.1 EVB and PTPase reaction
    3.2 Calibration of the EVB potential
    3.3 Simulation details
  4. Reaction free energy profile of the LMPTP
    4.1 Step 1: Substrate dephosphorylation

    4.2 Binding free energy calculations
    4.3 Step 2: Phosphoenzyme hydrolysis
    4.4 Reaction mechanism for mutants lacking the general acid residue
    4.5 The pKa of the catalytic cysteine is different in LMPTP and PTP1B
    4.6 Summary
  5. Substrate trapping in cysteine to serine mutated PTPases
  6. Prediction of a ligand induced conformational change in the active site of CDC25A
  7. Kinetic isotope effects in phosphoryl transfer reactions
    7.1 Calculations of heavy atom kinetic isotope effect in phosphate monoester hydrolysis

Chapter 8. Monte Carlo Simulations of HIV-1 Protease Binding Dynamics and Thermodynamics with Ensembles of Protein Conformations: Incorporating Protein Flexibility in Deciphering Mechanisms of Molecular Recognition, G.M. Verkhivker, D. Bouzida, D.K. Gehlhaar, P.A. Rejto, L. Schaffer, S. Arthurs, A.B. Colson, S.T. Freer, V. Larson, B.A. Luty, T. Marrone and P.W. Rose
  1. Structural models for molecular recognition
  2. Structure-based analysis of HIV-1 protease-inhibitor binding
    2.1 Structure-based analysis of HIV-1 protease-SB203386 inhibitor binding
  3. Structure-based computational models of ligand-protein binding dynamics and molecular docking
  4. Computer simulations of ligand-protein binding
    4.1 Computer simulations of ligand-protein docking
    4.2 Monte Carlo equilibrium simulations of ligand-protein thermodynamics
    4.3 Monte Carlo data analysis with the weighted histogram method
  5. Computer simulations of HIV-1 protease-inhibitor binding dynamics and thermodynamics
  6. Conclusions

Chapter 9. Modelling G-Protein Coupled Receptors, C. Higgs and C.A. Reynolds
  1. Introduction
  2. Receptor structure and modelling
  3. Ligand binding
  4. Structural changes
  5. Receptor-G-protein interaction

  6. GPCR dimerisation
  7. Conclusions

Chapter 10. Protein-DNA Interactions in the Initiation of Transcription: The Role of Flexibility and Dynamics of the TATA Recognition Sequence and the TATA Box Binding Protein, N. Pastor and H. Weinstein
  1. TBP and transcription
    1.1 Structural biology of TBP
    1.2 Kinetics and thermodynamics of TATA box recognition and binding
  2. TATA box sequence specific recognition
    2.1 The role of direct readout
    2.2 The energy cost of DNA bending: an alternative sequence-dependence mechanism
      2.2.1 Stable bends
      2.2.2 Flexibility
      2.2.3 Free energy calculations
    2.3 The dehydration of the interface
    2.4 Integration of the various contributions into mechanistic criteria for the formation of TBP-DNA complexes
  3. Dynamic effects in complex stabilization
  4. Towards the preinitiation complex assembly
  5. Concluding remarks

Chapter 11. A Multi-Component Model for Radiation Damage to DNA from its Constituents, S.D. Wetmore, L.A. Eriksson and R.J. Boyd
  1. Introduction
  2. Characterization of DNA radiation products
    2.1 Pyrimidine and purine radiation products: close agreement between experiment and theory
    2.2 Pyrimidine and purine radiation products: problematic cases
    2.3 New mechanism for radiation damage in cytosine monohydrate
    2.4 Sugar radicals in irradiated DNA
  3. Full DNA studies
    3.1 The primary radicals
    3.2 The secondary radicals
    3.3 The effects of water on radical formation in DNA
    3.4 Major radical products formed in irradiated DNA
    3.5 DNA cations and secondary radicals
    3.6 DNA anions and secondary radicals
  4. A Multi-component model for DNA radiation damage
  5. Concluding remarks

Chapter 12. New Computational Strategies for the Quantum Mechanical Study of Biological Systems in Condensed Phases, C. Adamo, M. Cossi, N. Rega and V. Barone
  1. Introduction
  2. The density functional model
    2.1 Functionals of the electronic density
    2.2 The PBE functional
    2.3 Beyond the PBE functional
    2.4 A further improvement: the hybrid HF/DF methods
    2.5 Beyond the GGA functionals
    2.6 Some tests
      2.6.1 EPR hyperfine coupling constants
      2.6.2 NMR absolute shieldings
      2.6.3 General comments
  3. Vibrational averaging
  4. Solvent effects
    4.1 Outline of the PCM
    4.2 Extension to large solutes
    4.3 Some examples
  5. Applications
    5.1 Conformational analysis including solvent effects
    5.2 Characterization of organic free radicals. Structure and magnetic properties
      5.2.1 Glycine radical
      5.2.2 5,6-dihydro-6-thymyl and 5,6-dihydro-5-thymyl radicals
      5.2.3 Pyrrolidine-1-oxyl and imidazoline-1-oxyl radicals
  6. Concluding remarks

Chapter 13. Modelling Enzyme-Ligand Interactions, M.J. Ramos, A. Melo and E.S. Henriques
  1. Introduction
  2. Strategies in enzyme-ligand design
    2.1 Receptor homology-built models
    2.2 Mapping the binding region
    2.3 Assembling the ligand
    2.4 Docking the ligand
    2.5 Scoring the ligand
    2.6 Refining the enzyme-ligand structure
  3. The enzyme-ligand complex in motion

    3.1 Monte Carlo and molecular dynamics simulations
    3.2 Continuum electrostatic methods and Brownian dynamics
    3.3 Rigorous free energy simulations
    3.4 Approximate free energy simulations
  4. A quantum insight into the study of enzyme-ligand interactions
    4.1 Quantum mechanical methods
    4.2 Hybrid QM/CM methods
  5. Conclusions

Chapter 14. The QM/MM Approach to Enzymatic Reactions, A.J. Mulholland
  1. Introduction
    1.1 Simulation approaches
  2. Theory
    2.1 Background
    2.2 Basic theory
    2.3 QM/MM partitioning schemes
  3. QM/MM methods
    3.1 Method development and testing
  4. Techniques for reaction modelling
    4.1 Optimization of transition structures and reaction pathways
    4.2 Activation free energies, conformational behaviour and dynamics
  5. Practical aspects of modelling enzyme reactions
    5.1 Choice and preparation of the starting structure
    5.2 Choice of theoretical model
      5.2.1 Performance of semiempirical QM methods
    5.3 Definition of the QM region
    5.4 Mechanistic questions
  6. Some recent applications
    6.1 Para-hydroxybenzoate hydroxylase (PHBH)
    6.2 Citrate synthase
    6.3 Human immunodeficiency virus protease
    6.4 Enolase
    6.5 Malate dehydrogenase
    6.6 Lactate dehydrogenase
    6.7 Papain
    6.8 Influenza neuraminidase
    6.9 cAMP-dependent protein kinase
    6.10 Chorismate mutase
  7. Conclusions

Chapter 15. Quinones and Quinoidal Radicals in Photosynthesis, R.A. Wheeler
  1. Introduction
    1.1 Plant photosystem II
    1.2 Bacterial photosynthetic reaction centers
  2. Tests of computational methods for calculating properties of quinoidal radicals
    2.1 Tyrosyl radical and its phenoxyl and p-cresyl radical models
    2.2 Para-benzoquinone and its semiquinone radical anion
  3. Calculated properties of quinoidal radicals important in photosynthesis
    3.1 Plastoquinones and their radicals
    3.2 Menaquinones and their semiquinone radical anions
    3.3 Ubiquinones and their semiquinone radical anions
  4. Semiquinone radical anions in plant photosystem II
  5. Conclusions and future directions
    5.1 Retrospective
    5.2 Future promise

Author Index

Subject Index

L.A. Eriksson (Editor) Theoretical Biochemistry - Processes and Properties of Biological Systems Theoretical and Computational Chemistry, Vol. 9 © 2001 Elsevier Science B.V. All rights reserved

Chapter 1

The structure and function of blue copper proteins Ulf Ryde a, Mats H. M. Olsson a, and Kristine Pierloot b*

aDepartment of Theoretical Chemistry, Lund University, Chemical Centre, P. O. Box 124, S-221 00 Lund, Sweden, e-mail: [email protected] bDepartment of Chemistry, University of Leuven, Celestijnenlaan 200F, B-3001 Heverlee-Leuven, Belgium

Theoretical investigations of the structure and function of the blue copper proteins and the dimeric CuA site are described. We have studied the optimum vacuum geometry of oxidised and reduced copper sites, the relative stability of trigonal and tetragonal Cu(II) structures, the relation between the structure and electronic spectra, the reorganisation energy, and reduction potentials. We also compare their electron-transfer properties with those of cytochromes and iron-sulphur clusters. Our calculations give no support to the suggestion that strain plays a significant role in the function of these proteins; on the contrary, our results show that the structures encountered in the proteins are close to their optimal vacuum geometries (within 7 kJ/mole) and that the favourable properties are achieved by an appropriate choice of ligands. We use the density functional B3LYP method for the geometries; multiconfigurational second-order perturbation theory (CASPT2) for calculations of accurate energies and spectra; point-charge models, continuum approaches, and combined classical and quantum chemical methods for the environment; and classical force-field calculations for estimation of dynamic effects and free energies.

*This investigation has been supported by grants from the Swedish Natural Science Research Council, the Flemish Science Foundation, the Concerted Action of the Flemish Government, and by the European Commission through the TMR program (grant ERBFMRXCT960079). It has also been supported by computer resources of the Swedish Council for Planning and Coordination of Research, Parallelldatorcentrum at the Royal Institute of Technology, Stockholm, the National Supercomputer Centre at the University of Linköping, the High Performance Computing Center North at the University of Umeå, and Lunarc at Lund University.

1. INTRODUCTION

The blue copper proteins or cupredoxins are a group of proteins that exhibit a number of unusual properties, viz. a bright blue colour, a narrow hyperfine splitting in the electron spin resonance (ESR) spectra, and high reduction potentials [1-3]. Moreover, crystal structures of these proteins show an extraordinary cupric geometry: the copper ion is bound to the protein in an approximate trigonal plane formed by a cysteine (Cys) thiolate group and two histidine (His) nitrogen atoms. The coordination sphere in most blue-copper sites is completed by one or two axial ligands, typically a methionine (Met) thioether group, but sometimes also a backbone carbonyl oxygen atom (in the azurins) or instead an amide oxygen atom from the side chain of glutamine (in the stellacyanins) [1-4].

The blue copper proteins serve as electron-transfer agents. Their distorted trigonal geometry is intermediate between the tetrahedral coordination preferred by Cu(I) and the tetragonal geometry of most Cu(II) complexes. As a result, the change in geometry when Cu(II) is reduced to Cu(I) is small [2,3,5], which gives a small reorganisation energy and allows for a high rate of electron transfer [6].

These unusual properties, unprecedented in the chemistry of small inorganic complexes, led as early as the 1960s to the proposal that the protein forms a rigid structure, which forces the Cu(II) ion into a coordination geometry more similar to that preferred by Cu(I) [7,8]. These hypotheses were later extended into general theories for metalloproteins, suggesting that the protein forces the metal centre into a catalytically poised state: the entatic state theory [9,10] and the induced rack theory [11,12]. However, this suggestion has recently been challenged [13,14]. In particular, we have shown by quantum chemical calculations that the cupric geometry in the blue copper proteins is very close to the optimal vacuum structure of a Cu(II) ion with the same ligands [14]. Why then are the properties of the blue copper proteins so unusual, if not because of protein strain? During the last five years, we have tried to answer this question using theoretical methods.
In this review, we describe our results and discuss them in relation to the strain hypotheses. We also compare the blue copper proteins with related metal sites, such as the CuA dimer, cytochromes, and iron-sulphur clusters. Altogether, this gives an illustration of how theoretical methods, ranging from high-level quantum chemical calculations to pure classical simulations, can be used to study and solve biochemical problems.

2. METHODS

It is not yet possible to perform accurate quantum chemical calculations on a whole protein. Therefore models have to be constructed that are as realistic as possible but at the same time computationally tractable. We have used a number of techniques, ranging from high-level quantum chemical calculations on small models of the active site to classical simulations of the full protein. At the intermediate level, we have described the protein by quantum chemical methods and incorporated the effects of the surrounding protein and solvent by a variety of

methods, e.g. a point-charge model, a dielectric continuum, or a classical force field. Each method has its strengths and disadvantages, and the choice of method is largely determined by the questions of interest and the available computer resources.

For quantum chemical geometry optimisations we have used the density functional method B3LYP, as implemented in the Mulliken, Turbomole, or Gaussian software packages [15-18]. Hybrid density functional methods have been shown to give geometries as good as or better than those of correlated ab initio methods for first-row transition metal complexes [19-21], and the B3LYP method in particular seems to give the most reliable results among the widely available density functional methods [22]. In most calculations, we have used a basis set of double-ζ quality enhanced with p, d, and f functions for copper and iron [14,23,24] and the 6-31G* basis sets for the other atoms [25]. This basis set is denoted DZpdf/6-31G*.

For calculation of accurate energies, geometries, and electronic spectra, the CASSCF-CASPT2 approach was used (second-order perturbation theory with a multiconfigurational reference state) [27]. This method has been shown to give reliable results for organic molecules as well as transition-metal complexes, with an error consistently less than 2500 cm⁻¹ [28]. Generally contracted atomic natural orbital (ANO) type basis sets were used in these calculations [29]. They have the virtue of being compact but at the same time optimised to include as much correlation as possible at a given size. Owing to the size of the systems studied, basis sets of moderate size have been used, including up to f-type functions on Cu and a d-type function on S, but often no polarisation functions on C, N, and H. The choice of the active orbital space for the CASSCF calculations is a crucial step, and has turned out to be especially difficult in these proteins and other systems containing a Cu-thiolate bond.
From earlier studies it was known that in complexes of first-row transition metal ions with many 3d electrons, the active space should include one correlating orbital for each of the doubly occupied 3d orbitals [28]. Therefore the starting active space contains 10 orbitals (3d and 3d'). In addition, it is necessary to add the 3p orbitals on SCys to describe correctly the covalent character of the Cu-SCys bond, and also 2p and 3p orbitals on nitrogen and sulphur to describe charge-transfer states. The final active space therefore contains 11 or 12 active orbitals (12 active orbitals are at present the upper limit for the CASPT2 method). The CASSCF wavefunction is used as the reference function in a second-order estimate of the remaining dynamical correlation effects. All valence electrons were correlated in this step, as were the 3s and 3p shells on copper. Relativistic corrections (the Darwin and mass-velocity terms) were added to all CASPT2 energies. They were obtained at the CASSCF level using first-order perturbation theory. A level shift (typically 0.3 Hartree) was added to the zeroth-order Hamiltonian in order to remove intruder states [30]. Transition moments were computed with the CAS state-interaction method [31] at the CASSCF level. They were

combined with CASPT2 excitation energies to obtain oscillator strengths. The CASSCF-CASPT2 calculations were performed with the MOLCAS quantum chemistry software [32]. For further details, we refer to the original articles [14,33-39].

In the quantum chemical calculations, only the copper ion and its ligands were included. Several models were tested for the ligands: histidine was modelled by ammonia, imidazole (Im), or ImCH3; cysteine by SH-, SCH3-, or SC2H5-; methionine by SH2, S(CH3)2, or S(CH3)(CH2CH3); and amide ligands by formaldehyde, formamide, or acetamide. In the calculations on azurin, the main-chain linkage between the histidine ligand and the backbone amide group was also included in the calculations (ImCH2CH2NHCOCH3). We have shown that converged geometries and spectra are obtained with imidazole as a model for histidine and with methyl groups on the sulphurs modelling cysteine and methionine [14,26,36]. Smaller models contain polar hydrogen atoms, which form artificial hydrogen bonds that may strongly distort the structure and change the energies. Fortunately, the explosive increase in computer power has in the last two years made it unnecessary to compromise on the ligand models. Before that, however, we often had to use smaller models and enforce symmetry to make the calculations feasible.

Naturally, calculations in vacuum cannot reproduce all the properties of a metal site in a protein. The simplest way to include the surroundings in quantum chemical calculations is to assign a charge to each atom in the protein (and possibly also an equilibrated ensemble of solvent molecules) and include the field of these charges in the calculations. This method was used in most of the calculations of electronic spectra [33,34,36,39]. The best available crystal structures were used as starting coordinates.
Hydrogen atoms and solvent molecules (in the form of a spherical cap) were added and their positions were equilibrated with the Amber suite of programs [40]. The final coordinates were used in the spectra calculations together with charges from the Amber libraries [40].

Another method to include solvent effects in quantum chemical calculations is the continuum approach. In the polarised continuum method (PCM), a molecule is placed in a cavity formed by overlapping atom-centred spheres surrounded by a dielectric medium [41]. The induced polarisation of the surroundings is represented by point charges distributed on the surface of the cavity, and the field of these charges affects the wavefunction. Thus, solvation effects are included in the wavefunction in a self-consistent manner. In addition to this electrostatic term, the PCM method includes additional terms that affect only the solute energy, viz. cavitation, dispersion and exchange energy [42]. We have used the conductor or self-consistent isodensity PCM methods [43,44] as implemented in the Gaussian 98 software [17]. Further details are given in the original references [35,45].

Continuum methods can also be used in classical calculations including the full protein. In such calculations, each atom in the protein is assigned a point charge.

For the active site, the charge may be taken from a quantum chemical calculation [46]. Moreover, the protein is assigned a low dielectric constant (typically 4), whereas the surrounding solvent is assigned the value of water (~80). Then, the Poisson-Boltzmann equation is solved numerically on a grid covering the protein and parts of the solvent [47]. Naturally, this method is less accurate than the PCM, but it can be used for much larger systems. We have used this method as implemented in the MEAD software [48] for the calculation of reduction potentials [45,49]. A related method is the protein-dipole Langevin-dipole method, in which water molecules are represented by Langevin dipoles on a grid and polarisation effects are included [50]. It has successfully been used in several investigations of the reorganisation energy and reduction potentials of proteins [50-53], but we have only used it in some exploratory calculations.

We have run several types of classical simulations on blue copper proteins, e.g. energy minimisations, molecular dynamics simulations, free energy calculations, and potential of mean force computations [33,34,36,38,39,54], all with the Amber software [40]. In such calculations, the copper ion and its ligands pose a special problem, since they are not included in the force fields. For crude calculations (especially when the metal site is kept fixed), it may be sufficient to determine an appropriate set of charges for the copper ion and its ligands from a fit to the electrostatic potential calculated by quantum chemical methods [46] (there is a significant transfer of charge from the ligands to the copper ion). For more accurate calculations, a full force-field parameterisation of the copper ion and its ligands has to be performed, involving charges, force constants, and equilibrium parameters.
We have performed such a parameterisation for the copper sites in oxidised and reduced plastocyanin and oxidised nitrite reductase (both the type 1 and type 2 copper sites) [54]. The most satisfactory way to include the effect of the surroundings in quantum chemical calculations is to combine a quantum chemical and a classical program, the QC/MM approach. In this method, the interesting part of the system is treated by quantum chemical methods, whereas the rest is treated by classical methods. Classical forces on the quantum atoms are added to the quantum chemical forces before the atoms are moved (either in a geometry optimisation or in molecular dynamics simulations). If there are bonds between the quantum chemical and classical systems, special action is taken. This approach is very popular at present, and many different variants have been suggested [55-60]. We have recently updated our QC/MM program COMQUM [57] to incorporate the density functional methods of Turbomole [16] and the accurate force field methods in Amber [40]. This program has been used to optimise the geometry of the copper site in three blue copper proteins, to estimate strain energies, and to calculate reorganisation energies [26].
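The force-combination step of the QC/MM approach described above can be illustrated with a toy sketch. It is not the COMQUM implementation: the function names, the harmonic restraint standing in for the quantum chemical forces, and the single point charge standing in for the classical surroundings are all assumptions made purely for illustration.

```python
import math

# e^2 / (4*pi*eps0) in kJ mol^-1 Angstrom for charges in units of e (approximate).
COULOMB_CONSTANT = 1389.35


def toy_quantum_force(position, equilibrium=(0.0, 0.0, 0.0), force_constant=500.0):
    """Hypothetical stand-in for the quantum chemical forces.

    A real QC/MM program would take these from the quantum chemical code;
    here a harmonic restraint (kJ mol^-1 Angstrom^-2) is used instead.
    """
    return [-force_constant * (p - e) for p, e in zip(position, equilibrium)]


def classical_force(position, charge, environment):
    """Coulomb force on one quantum atom from classical point charges.

    environment is a list of ((x, y, z), charge) tuples (an assumed layout).
    """
    force = [0.0, 0.0, 0.0]
    for site, site_charge in environment:
        delta = [p - s for p, s in zip(position, site)]
        distance = math.sqrt(sum(d * d for d in delta))
        scale = COULOMB_CONSTANT * charge * site_charge / distance ** 3
        force = [f + scale * d for f, d in zip(force, delta)]
    return force


def combined_force(position, charge, environment):
    """Add the classical force to the (toy) quantum chemical force."""
    quantum = toy_quantum_force(position)
    classical = classical_force(position, charge, environment)
    return [q + c for q, c in zip(quantum, classical)]


def steepest_descent_step(position, force, step_size=1.0e-4):
    """Move the atom a short distance along the combined force."""
    return [p + step_size * f for p, f in zip(position, force)]


if __name__ == "__main__":
    atom = [0.0, 0.0, 0.0]
    surroundings = [((2.0, 0.0, 0.0), 1.0)]  # one positive charge 2 Angstrom away
    force = combined_force(atom, 1.0, surroundings)
    print("combined force:", force)
    print("new position:", steepest_descent_step(atom, force))
```

In a geometry optimisation this step would be repeated until the combined force vanishes; bonds that cross the quantum/classical boundary need the special treatment mentioned in the text and are not covered by this sketch.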

Figure 1. A comparison of the optimised structure of Cu(Im)2(SCH3)(S(CH3)2)+ [14] and the crystal structure of plastocyanin (shaded) [4].

3. GEOMETRY

3.1 The optimal geometry of the blue-copper active site

According to the induced-rack and the entatic state hypotheses [10,12], the Cu(II) coordination sphere in the blue copper proteins is strained into a Cu(I)-like structure. Such hypotheses are hard to test experimentally, but with theoretical methods the test is quite straightforward. The actual coordination preferences of the copper ion can be determined by optimising the geometry of the ion and its ligands in vacuum; if the optimised structure is almost the same as in the proteins, strain is probably of minor importance for the geometry. We have optimised the geometry of Cu(Im)2(SCH3)(S(CH3)2)+, a realistic model of the oxidised prototypical CuHis2CysMet blue-copper site (e.g. in plastocyanin), using the density functional B3LYP method [14]. The result is sensational but convincing. As can be seen in Figure 1 and Table 1, the optimised geometry is virtually identical to the one observed experimentally in the blue copper proteins. Almost all bond lengths and bond angles around the copper ion are within the range observed in crystal structures, and most of them are close to the average values for the proteins. Only two small, but significant, differences can be observed: a slightly too long Cu-SCys bond and a slightly too short Cu-SMet bond. These differences can be fully explained by the dynamics of the system, which increase the average Cu-SMet bond length by at least 10 pm [54], and by deficiencies in the theoretical method (the more accurate CASPT2 method gives a 7 pm shorter Cu-SCys bond and a 7 pm longer Cu-SMet bond [14]). Equally convincing results have been obtained for the optimal structure of Cu(Im)2(SCH3)(OCCH3NH2)+, a model of the ligand sphere of oxidised stellacyanin [33], as is also shown in Table 1. It should be noted that no information

Table 1. Comparison of the geometry of optimised models and crystal structures of blue copper proteins [14,34,54]. Ax is the axial ligand and φ the angle between the SCys-Cu-Ax and N-Cu-N planes.

                                 Distance to Cu (pm)          Angle subtended at Cu (°)
Model                            SCys     N        Ax         N-N     SCys-N   SCys-Ax  N-Ax     φ
Cu(Im)2(SCH3)(S(CH3)2)+ (a)      218      204      267        103     120-122  116      94-95    90
Plastocyanin oxidised            207-221  189-222  278-291    96-104  112-144  102-110  85-108   77-89
Cu(Im)2(SCH3)(S(CH3)2)+          232      214-215  237        109     105-108  115      107-113  89
Cu(Im)2(SCH3)(S(CH3)2)+ (b)      227      205-210  290        119     112-120  99       100-101  88
Plastocyanin reduced             211-217  203-239  287-291    91-118  110-141  99-114   83-110   74-80
Cu(Im)2(SH)(S(CH3)2)+ (c)        223      205-206  242        100     97-141   103      95-126   62
Nitrite reductase oxidised       208-223  193-222  246-270    96-102  98-140   103-109  84-138   56-65
Cu(Im)2(SCH3)(OCCH3NH2)+ (a)     217      202-206  224        103     122-125  113      92-95    88
Stellacyanin oxidised            211-218  191-206  221-227    97-105  116-141  101-107  87-102   82-86

(a) Trigonal structure. (b) The Cu-SMet bond length was constrained to 290 pm. (c) Tetragonal structure.

from the crystal structures has been used to obtain these structures; they are entirely an effect of the chemical preferences of the copper ion and its four ligands. Thus, the cupric structure in the oxidised blue copper proteins is clearly neither unnatural nor strained.

Geometry optimisations of the corresponding reduced models are more complicated, because the lower charge on the copper ion leads to weaker bonds, so that the geometry of the complex is very sensitive to electrostatic interactions with the surrounding protein (e.g. hydrogen bonds) [14]. The optimum vacuum structure of the Cu(Im)2(SCH3)(S(CH3)2) complex is more tetrahedral than the active site in reduced plastocyanin and it has a short Cu-SMet bond (237 pm, see Table 1) [14]. However, the potential surface of the Cu-SMet bond is extremely flat (cf. Figure 10). If this bond length is fixed at the crystal value (290 pm) and the complex is reoptimised, a structure is obtained that is virtually identical to the crystal structure of reduced plastocyanin. Interestingly, this structure is only 4 kJ/mole less stable than the optimal tetrahedral structure, which is within the error limits of the method [14]. Moreover, many effects not included in these calculations tend to elongate this bond (see below). Thus, we cannot decide whether the reduced structure is slightly distorted by the protein or not, but it is clear that the energy involved is extremely small.

We have also studied the structure of azurin [37,61]. In this protein there is another weak axial ligand of the copper ion, a backbone carbonyl group. This group is chemically similar to the amide oxygen ligand in stellacyanin. Therefore, it is not surprising that it prefers to bind to copper at a rather short distance, about 230 pm for the Cu(Im)(ImCH2CH2NHCOCH3)(SCH3)(S(CH3)2)+ model, similar to the Cu-O distance in stellacyanin.
However, it costs less than 6 kJ/mole to move it to the distance found in the crystal structure (around 310 pm, see Figure 2). The same applies to the methionine ligand. It prefers to bind in the second sphere of the complex, but it also has a shallow minimum around 290 pm, which is only 3 kJ/mole higher in energy. The reduced site behaves similarly [61].
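To put energy differences of a few kJ/mole in perspective, they can be compared with the thermal energy RT, which is about 2.5 kJ/mole at room temperature. The sketch below is only an illustration: the temperature of 298.15 K and the function names are assumptions, and the simple Boltzmann ratio ignores entropy and degeneracy, so it should be read as an order-of-magnitude guide rather than as a free-energy calculation.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def thermal_energy(temperature_k=298.15):
    """Return RT in kJ/mole at the given temperature (assumed 298.15 K by default)."""
    return R_KJ_PER_MOL_K * temperature_k


def boltzmann_ratio(delta_e_kj_mol, temperature_k=298.15):
    """Population of the higher-energy structure relative to the lower one."""
    return math.exp(-delta_e_kj_mol / thermal_energy(temperature_k))


if __name__ == "__main__":
    # Energy differences quoted in the text (kJ/mole).
    examples = [
        ("reduced plastocyanin model, Cu-SMet fixed at 290 pm", 4.0),
        ("azurin model, carbonyl moved to about 310 pm (upper bound)", 6.0),
        ("azurin model, methionine minimum near 290 pm", 3.0),
    ]
    print("RT = %.2f kJ/mole" % thermal_energy())
    for label, delta_e in examples:
        print("%s: relative population %.2f" % (label, boltzmann_ratio(delta_e)))
```

Energy differences of this size are comparable to RT, which is consistent with the statement that the energies involved are extremely small.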

[Figure 2 (plot not reproduced). Curve labels: "no S", "Cu-O", "Cu-S free", "no O"; horizontal axis: Bond length (pm).]

Figure 2. Potential surfaces for the Cu-O and Cu-SMet bonds in three models of oxidised azurin. "No S" and "no O" refer to the Cu(Im)(ImCH2CH2NHCOCH3)(SCH3)+ and Cu(Im)2(SCH3)(S(CH3)2)+ models, respectively, whereas the other two curves were obtained with the Cu(Im)(ImCH2CH2NHCOCH3)(SCH3)(S(CH3)2)+ model with the Cu-O or Cu-SMet bond at its equilibrium distance [37].

Interestingly, if the copper ion in azurin is replaced by Co(II), things change drastically, as can be seen in Figure 3. If one of the axial ligands is removed, the other binds strongly to the metal ion, with a force constant of the same size as for the equatorial ligands and about six times larger than those of the Cu(II) complex. The optimum Co-O and Co-SMet distances are 207 and 243 pm, respectively. The potential surface for the carbonyl ligand is not changed much if the methionine ligand is added; cobalt still prefers a short Co-O bond by about 30 kJ/mole over the distance found in the copper protein, which explains the short distance in the crystal structure (~220 pm) [62]. However, if the Co-O distance is fixed at the crystal value, the Co-SMet potential changes strongly: the methionine model prefers to bind in the second sphere, but the surface becomes extremely flat, so flat that the Co-SMet bond length can be varied between 280 and 380 pm at a cost of less than 1 kJ/mole. In the crystal structure it is 340-370 pm. Thus, the cobalt site is four-coordinate with a strong bond to the carbonyl group, whereas the copper site is effectively three-coordinate. The bonds to the other axial ligands are determined by interactions with the protein, rather than with the metal. The only way to study such weak bonds in vacuum models is by studying the potential surfaces, such as those in Figures 2 and 3.
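Force constants such as those compared above can be estimated from a scanned potential surface. The following is a minimal sketch that assumes a locally harmonic surface and three equally spaced scan points; the function name and all numerical values in the demonstration are hypothetical and are not taken from the calculations in this chapter.

```python
def harmonic_fit(r_mid, spacing, e_minus, e_mid, e_plus):
    """Fit E(r) = E0 + 0.5*k*(r - r_min)**2 through three equally spaced points.

    The points lie at r_mid - spacing, r_mid and r_mid + spacing (pm),
    with energies in kJ/mole. Returns (k, r_min), with k in kJ mole^-1 pm^-2.
    """
    # Central second difference gives the curvature, i.e. the force constant.
    k = (e_plus - 2.0 * e_mid + e_minus) / spacing ** 2
    if k <= 0.0:
        raise ValueError("scan points do not bracket a minimum")
    # Central first difference gives the slope at r_mid; the minimum of a
    # parabola lies where the slope vanishes.
    slope = (e_plus - e_minus) / (2.0 * spacing)
    return k, r_mid - slope / k


if __name__ == "__main__":
    # Hypothetical scans: a stiff bond and a soft bond (made-up numbers).
    def stiff(r):
        return 0.5 * 0.06 * (r - 207.0) ** 2

    def soft(r):
        return 0.5 * 0.01 * (r - 243.0) ** 2

    k_stiff, r_stiff = harmonic_fit(210.0, 10.0, stiff(200.0), stiff(210.0), stiff(220.0))
    k_soft, r_soft = harmonic_fit(240.0, 10.0, soft(230.0), soft(240.0), soft(250.0))
    print("stiff bond: k = %.3f, minimum at %.1f pm" % (k_stiff, r_stiff))
    print("soft bond:  k = %.3f, minimum at %.1f pm" % (k_soft, r_soft))
    print("ratio of force constants: %.1f" % (k_stiff / k_soft))
```

For very flat surfaces, such as the Co-SMet curve with the Co-O distance fixed, a harmonic fit is of limited value and the full scan is more informative, as the text notes.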

3.2 Trigonal and tetragonal Cu(II) structures

Why does a Cu(II) ion with the ligands in the blue copper proteins assume a trigonal structure, whereas most inorganic cupric complexes are tetragonal (square planar, square pyramidal, or distorted octahedral) [63,64]? We have addressed this question by optimising the geometry of a number of models of the type

50 2

Co-O

/

.~ 40-

co-

CoS .

s//

30-

s nx

20-

///

lo0

~ -r.-

200

|

I

i-,.

250

I

|

~ I

% ~ o ~ x ~ i

I

300 350 Bond length (pm)

|

i

|

|

400

Figure 3. Potential surfaces for the Co-O and Co-SMet bonds in three models of Co(II)substituted azurin [37]. "No S" and "no O" refer to the Co(Im)(ImCHzCH2NHCOCH3)(SCH3)+ and Co(Im)2(SCH3)(S(CH3)2)+ models, respectively, whereas the other two curves were obtained with the Co(Im)(ImCH2CH2NHCOCH3)(SCH3)(S(CH3)2)+ model with the Co-O or Co-SMet bonds either at their equilibrium distance or f'rxed at the crystal values, 223 or 356 pm, respectively. where X is OH-, SH-, Sell-, CI-, NH2-, and some other ligands related to the cysteine thiolate group [65]. The results show that all complexes may assume two types of structures, both reflecting the Jahn-Teller instability of the tetrahedral Cu(II) complex (a d 9 ion). This instability can be rifted either by a Ozd distortion, leading to a tetragonal structure, or by a Czv distortion, leading to a trigonal structure. The tetragonal structure is stabilised by four favourable t~ interactions between the singly occupied Cu 3d orbital and p~ orbitals on the four ligands, as is shown in Figure 4. This gives rise to the well-known square-planar Cu(II) complexes. If one of the ligands instead has the ability to form a strong rc bond with the copper ion, however, a trigonal structure can be stabilised. The fight-hand side of Figure 4 shows that in such a structure, two of the ligands still form t~ bonds to the copper ion, whereas a px orbital of the third ligand overlaps with two lobes of the singly occupied Cu 3d orbital. Thereby it occupies two positions in a square coordination plane, giving rise to a trigonal planar geometry. The fourth ligand cannot overlap with the singly occupied orbital and has to interact with a doubly occupied orbital. Since such an interaction is weaker, it becomes an axial ligand with an enlarged copper distance. Thus, the long axial bond is a result of the electronic structure. 
Moreover, the effective coordination number is decreased, so the three strong ligands in the trigonal complex bind at shorter distances to the copper ion than in the tetragonal complex. This explains the short Cu-Scys bond in the blue copper proteins, together with the fact that a rt bond to copper is inherently shorter than a corresponding ~ bond [34,39]. CuII(NH3)3 X,


Figure 4. The singly occupied orbitals of the tetragonal (left) and trigonal (right) Cu(NH3)3(SH)+ complex [38].

For small and hard X ligands, such as NH3 and OH-, the tetragonal Cu(II)(NH3)3X structure is most stable (by 30-70 kJ/mole) [65]. For large, soft, and polarisable ligands, such as SH- and SeH-, on the other hand, the two types of structures have approximately the same stability (within 15 kJ/mole). Interestingly, the tetragonal structure is most stable for Cu(NH3)3(SH)+, whereas the trigonal structure is more stable for Cu(NH3)2(SH)(SH2)+, showing that the methionine ligand is also important for the structure of the blue copper proteins. This explains why very few trigonal cupric structures have been observed for small inorganic models; there simply is no complex with the appropriate set of ligands, CuN2S⁻S⁰ [63,66,67]. Formally, the Cu(NH3)3X complexes consist of a Cu(II) ion with nine d electrons and a neutral or negatively charged X ligand. For three soft and negatively charged ligands, SeH-, NH2-, and PH2-, however, the charge on the ligand moves to the copper ion, yielding a Cu(I) ion and an uncharged ligand radical. Since the Cu(I) ion has a full d shell, it prefers a tetrahedral structure and the complexes with these three ligands are almost tetrahedral. The thiolate ligand is intermediate: the electron is delocalised between the copper and thiolate ions. Therefore, both the trigonal and the tetragonal structures are rather tetrahedral and are actually quite similar (c.f. Figure 4). Naturally, this facilitates electron transfer to and from the complex by reducing the reorganisation energy. A characteristic difference between the two geometries of Cu(NH3)2(SH)(SH2)+ is that the tetragonal structure has a longer Cu-SCys bond and a shorter Cu-SMet bond than the trigonal structure.
Moreover, the φ angle (the angle between the S-Cu-S and N-Cu-N planes) is smaller in the tetragonal structure and the two largest angles around the copper ion in the trigonal structure are between SCys and the two NHis atoms, whereas in the tetragonal structure the two largest angles are between two distinct pairs of atoms (SCys-Cu-N and SMet-Cu-N).
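The φ angle defined above can be computed from Cartesian coordinates as the angle between the normals of the S-Cu-S and N-Cu-N planes. The following Python sketch shows one way to do this; the function name `plane_angle`, the folding of the result into the range 0-90°, and the idealised test coordinates are assumptions made for illustration, not the authors' procedure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_angle(cu, s_cys, s_met, n1, n2):
    """Angle in degrees (0-90) between the S-Cu-S and N-Cu-N planes."""
    # Each plane is spanned by the two Cu-ligand vectors; its normal is
    # their cross product.
    normal_s = _cross(_sub(s_cys, cu), _sub(s_met, cu))
    normal_n = _cross(_sub(n1, cu), _sub(n2, cu))
    cos_phi = abs(_dot(normal_s, normal_n)) / math.sqrt(
        _dot(normal_s, normal_s) * _dot(normal_n, normal_n))
    # Guard against rounding slightly above 1 before taking acos.
    return math.degrees(math.acos(min(1.0, cos_phi)))


# Idealised geometries (hypothetical coordinates, arbitrary units).
cu = (0.0, 0.0, 0.0)
perpendicular = plane_angle(cu, (1, 0, 1), (-1, 0, 1), (0, 1, -1), (0, -1, -1))
square_planar = plane_angle(cu, (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0))
print(perpendicular, square_planar)
```

With this convention, perpendicular planes give φ = 90° and a square-planar arrangement gives φ = 0°, which matches the limits quoted for the trigonal and square-planar structures.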

These structural differences are reminiscent of the differences in the crystal structures of plastocyanin and nitrite reductase (c.f. Table 1). Detailed investigations have shown that this is not accidental: plastocyanin, and more generally, axial type 1 copper proteins have a trigonal structure with a π bond between copper and the thiolate ligand, whereas the so-called rhombic type 1 copper proteins (e.g. nitrite reductase) have a tetragonal structure with mainly σ bonds to the copper ion (c.f. Figures 6 and 7). This gives an attractive explanation for the structural and spectroscopic differences between these proteins, which share the same copper ligand sphere [34,65]. The two types of structures have almost the same energy (within 7 kJ/mole) and which structure is most stable depends on the models used for the ligands. At present it is not possible to decide if the native structure of the typical blue-copper ligand sphere is trigonal or tetragonal [26,34,54,65]. Thus, with the typical blue-copper ligands, the tetragonal Jahn-Teller distortion may at worst give rise to the structure found in nitrite reductase, i.e. a fully functional site with properties (reduction potentials and reorganisation energies) similar to those of the trigonal blue copper proteins [35,68]. This shows that with the blue-copper ligands there is no need for protein strain to avoid a tetragonal structure. By free energy perturbations, we have studied why some proteins stabilise the trigonal structure, whereas others stabilise the tetragonal structure, although the ligand sphere is the same [54]. The results indicate that plastocyanin prefers the bond lengths and electrostatics of the trigonal structure, whereas nitrite reductase favours the angles in the tetragonal structure, both by 10-20 kJ/mole. Interestingly, the length of the Cu-SMet bond has a very small influence on the relative stability of the two conformations.
The existence of trigonal and tetragonal structures seems to be general for copper-cysteine complexes. An illustrative example is the geometry of the catalytic metal ion in copper-substituted alcohol dehydrogenase (the native enzyme contains zinc). In this enzyme, the copper ion is coordinated by two cysteines, one histidine, and a ligand from the solution. A crystal structure with dimethylsulfoxide as the fourth ligand shows a trigonal structure with dimethylsulfoxide as an axial ligand at a large distance (319-345 pm) [69]. Our calculations [39] show that the electronic structure of such complexes is very similar to that of the traditional blue copper proteins. The NHis atom and one of the SCys atoms make σ bonds to the copper ion, whereas the other SCys atom forms a π bond with copper. Thus, the two cysteine ligands are not equivalent; the π-bonded ligand has a shorter Cu-SCys bond and larger angles to the other ligands compared to the σ-bonded ligand (217 and 225 pm, respectively). Tetragonal structures may also be obtained for models of Cu-alcohol dehydrogenase [39]. When the fourth ligand is uncharged, they are less stable than the corresponding trigonal complexes (in accordance with the crystal structure). However, with OH- (which is involved in the reaction mechanism of the enzyme) as the fourth ligand, the tetragonal structure is most stable. This may explain some of the spectral shifts that are observed experimentally when the coenzyme or the ligands are exchanged [69].

Table 2. The effect of the model size, basis set, and density functional method on the geometry of the trigonal models of the oxidised blue-copper site [26]. Distances to Cu are in pm; angles around Cu and φ are in degrees.

Method  Model(a)  Basis set(b)  Cu-SCys  Cu-N     Cu-SMet  N-N  SCys-N   S-S  SMet-N  φ
B3LYP   1         1             218      204-205  267      103  119-122  117  94-95   89.1
B3LYP   1         2             219      206      269      104  119-122  117  94-96   88.5
B3LYP   1         3             218      205      267      103  118-123  117  94-96   88.3
B3LYP   2         1             218      204-206  271      102  120-125  115  94-95   89.7
BP86    1         1             219      203-210  236      103  102-130  116  99-105  81.1

(a) The models are 1, Cu(Im)2(SCH3)(S(CH3)2)+ or 2, Cu(ImCH3)2(SC2H5)(S(CH3)(C2H5))+.
(b) The basis sets are 1, DZpdf/6-31G*; 2, TZVPP; or 3, DZs2pd2f/6-311(+)G(2d,2p) [26].

3.3 The sensitivity of the geometries to the theoretical method

When we did the geometry optimisations of the plastocyanin models six years ago, they were on the verge of the possible; each optimisation took three CPU months. Today, such calculations can be done routinely in less than a week on a standard workstation. Therefore, we now have the opportunity to test whether these calculations were converged with respect to the basis sets or model systems. In Table 2, we list the results of a series of such calculations for the oxidised trigonal plastocyanin model, Cu(Im)2(SCH3)(S(CH3)2)+ [26]. Clearly, the results are very stable. If the basis set is enhanced to triple-ζ, with diffuse functions on S and N, and double polarising functions on all atoms (DZs2pd2f/6-311(+)G(2d,2p) [25]) or is changed to the TZVPP basis set (with a d function on H and an f function on other atoms) [70], the bond lengths to copper change by less than 2 pm and the angles change by less than 1°. Similarly, if a methyl group is added to all ligands (Cu(ImCH3)2(SC2H5)(C2H5SCH3)+), only small changes are observed, up to 4 pm in the bond length and 3° in the angles. It would have been interesting to check if the results also are converged with respect to the method. Unfortunately, there is no method that is clearly better and can be used for models of this size. It is still not possible to do analytical geometry optimisations with the CASPT2 method and test calculations with the MP2 method indicate that these results cannot be converged with respect to the basis set [26]. Therefore, we have only made calculations with another common density functional method, Becke-Perdew86 (BP86) [71,72]. In general performance, it is slightly less accurate than B3LYP [22], but it has the advantage of lacking exact exchange, which may in combination with other techniques make the calculations about five times faster than B3LYP [73].
However, as can be seen in Table 2, the change in the density functional leads to quite appreciable changes in the geometry, up to 10° in the angles and as much as 35 pm for the Cu-SMet bond length. Even if these changes are not so large in energy terms, the structure is appreciably less similar to the experimental structures and therefore we cannot recommend this method for general use. Similar results apply for the reduced models, but due to the weaker interaction with copper, the changes are slightly larger, up to 6 pm and 7° [26]. Most importantly, however, the relative energy between the optimal geometry and the complex with the Cu-SMet bond length fixed at 290 pm is converged; it changes by less than 1 kJ/mole if the basis set or model system is increased. With the Becke-Perdew86 method, bond lengths change by up to 9 pm and angles by less than 4°, but the relative energy changes by 6 kJ/mole [26]. Our study of the reduction potential of the blue copper proteins indicated that the geometries may change quite appreciably when solvation effects are taken into account [35]. We have therefore performed a number of geometry optimisations of the reduced Cu(Im)2(SCH3)(S(CH3)2) complex in a solvent with varying values of the dielectric constant [26]. The results in Table 3 show that the bond lengths change by up to 11 pm and the angles by up to 11°.

Table 3. The effect of the dielectric constant (ε) on the geometry of the Cu(Im)2(SCH3)(S(CH3)2) complex [26]. The calculations were performed with the B3LYP method, the DZpdf/6-31G* basis set, and the CPCM solvation model with a water probe. Distances to Cu are in pm; angles around Cu and φ are in degrees.

ε   Cu-SCys  Cu-N     Cu-SMet  N-N  SCys-N   S-S  SMet-N   φ
1   232      214-215  237      109  105-108  115  107-113  89.4
2   233      209-211  240      116  106-108  106  103-115  87.0
4   234      209-211  241      117  107-109  105  103-114  86.8
8   233      209-210  244      118  108-112  106  100-112  86.7
16  235      208-208  246      120  108-110  104  103-110  87.6
80  232      208-211  248      112  108-121  104  98-110   87.4

Thus, the geometry changes in the solvent, but the effects are not very large, and the general geometry is not changed (the φ angle does not decrease below 86°). In particular, the effect of the solvent is smaller than the results for the reduction potential indicate. It is notable that the length of the Cu-SMet bond increases with the dielectric constant. Thus, solvent effects may explain parts of the difference between the optimised and crystal structures for the reduced complexes. Even if the Cu-SMet bond does not become longer than 248 pm, this will decrease the energy difference between the optimised and the crystal structure (i.e. below 4 kJ/mole). Furthermore, increasing the basis set [26], the model size [26], or improving the theoretical method [14] also elongates the Cu-SMet bond, as do dynamic effects [54]. If all these corrections are added together, the Cu-SMet bond length should become ~270 pm, but non-additive effects may make it even longer. Therefore, it is not clear if there is any discrepancy at all between the calculated and experimental length of this bond, but if there is any, it is very small in energy terms.
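The trend in the Cu-SMet bond length with the dielectric constant can be checked directly against the values listed in Table 3. The short Python sketch below does this; the variable and function names are illustrative choices, and only the numbers are transcribed from the table.

```python
# Dielectric constants and Cu-SMet distances (pm) transcribed from Table 3.
EPSILON = [1, 2, 4, 8, 16, 80]
CU_SMET_PM = [237, 240, 241, 244, 246, 248]


def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


monotonic = is_strictly_increasing(CU_SMET_PM)
elongation = CU_SMET_PM[-1] - CU_SMET_PM[0]
print(f"monotonic increase: {monotonic}; total elongation: {elongation} pm")
```

The total elongation between ε = 1 and ε = 80 is the 11 pm change in bond length referred to in the text.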


Figure 5. A comparison between the crystal structure of reduced plastocyanin (light grey and no hydrogen atoms) [74] and the structures of Cu(Im)2(SCH3)(S(CH3)2) optimised in vacuum (dark grey) [14] or with COMQUM in reduced plastocyanin [26].

3.4 Geometry optimisations in the protein

The best way to study the geometry of the blue copper proteins is to perform geometry optimisations in the protein using combined quantum chemical and molecular mechanical methods. We have recently initiated a series of such calculations using the program COMQUM [57], which uses the B3LYP method for the active site and the Amber force field [40] for the rest of the protein [26]. Some of the results of these calculations are shown in Table 4. It is clear that the COMQUM structures are appreciably more similar to crystal structures than structures optimised in vacuum. This is most obvious for the orientation of the histidine rings and the dihedrals of the other copper ligands, as can be seen in Figure 5. This improvement is quite natural since these low-energy modes are determined in vacuum by weak hydrogen bonds involving the methyl groups. In the protein, they are instead determined by interactions with the surrounding protein, e.g. steric effects, normal hydrogen bonds, and non-polar interactions. However, there is also a significant improvement of the Cu-ligand distances and the angles around the copper ion, as can be seen in Table 4. In all COMQUM structures, the Cu-SCys bond is appreciably shorter than in vacuum, which makes them more similar to what is found in crystal structures (they are still a few pm too long, which reflects the tendency of B3LYP to give too long Cu-SCys bonds [14]). This is probably an effect of the NH-SCys hydrogen bond in the protein. Similarly, the Cu-N distances are 3-10 pm shorter than in the vacuum structures, again improving the agreement with crystal structures. The S-Cu-SMet, SMet-Cu-N, and φ angles are also significantly improved, especially for the oxidised systems.

Table 4. The result of COMQUM calculations on plastocyanin, nitrite reductase, and cucumber basic protein, using the quantum system Cu(Im)2(SCH3)(S(CH3)2)+ [26]. Distances to Cu are in pm; angles around Cu and φ are in degrees.

Cu(a)   Protein(a)  Con(a)  Cu-SCys  Cu-N     Cu-SMet  N-N     SCys-N   S-S      SMet-N   φ
I       Vacuum      -       232      214-215  237      109     105-108  115      107-113  89
I       Pc red      Yes     221      203-208  339      103     120-134  104      78-101   76
I       Pc red      No      222      203-212  375      103     120-136  105      72-103   71
I       Crystal     -       211-217  203-239  287-291  91-118  110-141  99-114   83-110   74-80
II      Vacuum      -       218      204      267      103     120-122  116      94-95    90
II      Pc ox       Yes     214      197-198  290      103     123-125  105      86-106   78
II      Crystal     -       207-221  189-222  278-291  96-104  112-144  102-110  85-108   77-89
II(b)   Vacuum      -       223      205-206  242      100     97-141   103      95-126   62
II(b)   Nir         Yes     219      200-203  262      100     104-135  105      87-129   61
II(b)   Nir         No      224      203-205  241      97      91-146   105      89-141   48
II(b)   Crystal     -       208-223  193-222  246-270  96-102  98-140   103-109  84-138   56-65
II(c)   CBP         Yes     217      199-210  273      100     118-128  104      84-118   69
II(c)   Crystal     -       216      193-195  261      99      110-138  111      83-112   70

(a) The system is defined by the oxidation state of the copper ion (Cu), the protein (Pc red, reduced plastocyanin; Pc ox, oxidised plastocyanin; CBP, cucumber basic protein; Nir, nitrite reductase; Vacuum, quantum chemical optimisation in vacuum [14,34]; Crystal, range observed in the available crystal structures in the Brookhaven protein data bank), and whether there is a connection between the metal ligands and the protein backbone (Con).
(b) This is the tetragonal structure, which in vacuum is obtained with the Cu(Im)2(SH)(S(CH3)2)+ model [34,65].
(c) This is a structure intermediate between trigonal and tetragonal that has not been observed in vacuum.

For the Cu-SMet bond length, the results are less clear. In all cases, the bond is elongated, and for the oxidised structures, it is in excellent agreement with experimental structures. However, for reduced plastocyanin, the Cu-SMet bond becomes too long, 339 pm, compared to ~290 pm. This is most likely due to the flexibility of the bond, combined with problems in the classical force field. Apparently, the molecular mechanics part of the calculations is not accurate enough to describe the fine-tuned interplay between the methionine group, the copper ion, and the surrounding enzyme.

Nitrite reductase and cucumber basic protein were also included in the investigation to see if the protein could stabilise tetragonal and intermediate structures, although such structures cannot be found in vacuum with the quantum system used (Cu(Im)2(SCH3)(S(CH3)2)+). The results in Table 4 show that this is actually the case. The COMQUM structures give φ angles of 61° and 69°, respectively, which is close to the experimental values and clearly shows that the nitrite reductase structure is tetragonal, whereas the cucumber basic protein structure is intermediate. The large SMet-Cu-N and SCys-Cu-N angles also indicate that the structures are not trigonal.

At first, these improved structures could be taken as evidence for protein strain. However, the COMQUM calculations involve effects that are normally not considered as strain, e.g. the change in the dielectric surrounding of the copper site, electrostatic interactions, and hydrogen bonds. In order to distinguish between these effects, we performed two calculations in which the covalent bond between the backbone and the side chain of the metal ligands is removed (c.f. Table 4) [26]. This way, covalent strain effects from the protein are eliminated. Interestingly, this hardly changed the structure of the plastocyanin site at all, except for an elongated Cu-SMet bond. However, the nitrite reductase structure became more similar to the vacuum structure, except for a larger variation in the SMet-Cu-N angles and a smaller φ angle (even smaller than in crystal structures). Thus, the nitrite reductase structure seems to be tuned by covalent interactions, whereas the plastocyanin site is modified by electrostatic interactions. This is in excellent agreement with our free energy calculations of the two proteins [54], which indicated that plastocyanin preferred the trigonal structure by electrostatic interactions, whereas nitrite reductase favoured the angles in the tetragonal structure. It should be noted, however, that already the vacuum structures reproduce most of the features of the copper coordination. Protein interactions are used only to fine-tune the structures at a small cost in energy.

Actually, the COMQUM calculations give us an opportunity to directly estimate strain energies in the proteins. The strain energy is given by the difference in energy of the isolated quantum system at the COMQUM geometry and at the optimal vacuum geometry. This energy ranges from 33 to 51 kJ/mole. This is similar to what was found for the catalytic and structural zinc ions in alcohol dehydrogenase, 30-60 kJ/mole [57,75-77], which seems to be a normal strain energy for the incorporation of a metal site from vacuum into a protein. If the connection between the protein and the metal ligands is removed, the strain energy is approximately halved. The difference (21-23 kJ/mole) is close to the strain energy in the sense of Warshel [50] (see Section 8) and also in the common mechanical sense (a distortion of the structure caused by covalent interactions). This energy is actually appreciably lower than what was found for alcohol dehydrogenase (33 kJ/mole) [76]. Yet, even these energies involve some terms that are not commonly regarded as strain. In the vacuum structure there are hydrogen bonds between the methyl groups and the negatively charged SCys atom. These are removed in the COMQUM structure, but the more favourable interactions in the protein are not included when the strain energies are calculated. This gives a significant positive contribution to the strain energy. Therefore, it is not surprising that the strain energies are still not negligible, but it is clear that the COMQUM calculations give no evidence of any unusual strain energies for the blue copper proteins.
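The strain-energy definition used above can be written compactly as follows. The symbols are introduced here for illustration only and do not appear in the original text: E_QM is the energy of the isolated quantum system, R_COMQUM and R_vac are its geometries optimised in the protein and in vacuum, and the superscripts distinguish the calculations with and without the covalent link between the metal ligands and the backbone.

```latex
\begin{align}
  E_{\mathrm{strain}} &= E_{\mathrm{QM}}(\mathbf{R}_{\mathrm{COMQUM}})
                        - E_{\mathrm{QM}}(\mathbf{R}_{\mathrm{vac}}), \\
  \Delta E_{\mathrm{cov}} &= E_{\mathrm{strain}}^{\mathrm{connected}}
                           - E_{\mathrm{strain}}^{\mathrm{disconnected}}
                           \approx 21\text{--}23~\mathrm{kJ/mole}.
\end{align}
```

Because R_vac minimises E_QM, the first quantity is non-negative by construction; the second isolates the part of the strain that is removed together with the covalent connection.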

4. ELECTRONIC SPECTRA

The hallmark of cupredoxins, leading to their description as blue or type 1 copper proteins, is the presence in their electronic spectrum of an intense (ε = 3 000-6 000 M-1 cm-1) absorption band around 600 nm. This spectral feature distinguishes them from normal inorganic Cu(II) complexes, the spectrum of which only contains a number of weaker (ε = 100 M-1 cm-1) ligand-field transitions in the same region [78]. However, also within the type 1 proteins, variations exist. In addition to the prominent peak at 600 nm, a feature at 450 nm is observed in all spectra with a varying intensity [79,80]. The axial type 1 proteins, like plastocyanin and azurin, show only little absorption in the 450-nm region, whereas this band becomes much more prominent in rhombic type 1 proteins, like pseudoazurin, cucumber basic protein, and stellacyanin. The increasing intensity of the 450-nm band in the latter proteins goes together with a decrease in intensity of the 600-nm band, so the sum of ε460 and ε600 is approximately constant [79]. Nitrite reductase from Achromobacter cycloclastes is a limiting case for which the 460-nm line is actually more intense than the 600-nm peak, giving the enzyme a green colour. No natural proteins exist in which the blue band is even further reduced, but by site-directed mutagenesis a number of mutants have been constructed in which only the second band is present, blue-shifted towards 410 nm, giving them a yellow to orange colour [81]. Based on the analogy of their EPR spectra with the normal type 2 copper proteins, these mutants have been classified as type 2 [82]. The classification of mutants with intermediate spectroscopic characteristics as type 1.5 follows naturally. Apart from these two peaks, several weaker features have been discerned in the visible and near-infrared region of the spectra of type 1 copper proteins.
Based on different types of spectroscopic analyses and with the help of density functional Xα calculations, Solomon and coworkers [83,84] have reported and assigned a total of nine absorption bands in the spectrum of plastocyanin (c.f. Table 5). They assigned the 600-nm (16 700 cm-1) band to a charge-transfer excitation from a SCys p orbital with π overlap to a Cu orbital. The band at 460 nm (21 370 cm-1) was proposed to correspond to a His→Cu charge transfer, whereas an additional feature at 535 nm (18 700 cm-1) was assigned to a charge transfer from the so-called SCys pseudo-σ orbital. Similar studies have more recently been performed on nitrite reductase [85], cucumber basic protein, and stellacyanin [86]. Below, we describe our spectroscopic studies of the blue copper proteins with the more accurate CASPT2 method, leading to a unified theory for the spectra of copper-cysteinate proteins.
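The text switches between wavelengths (nm) and wavenumbers (cm-1) when naming the bands. The two are related by wavenumber = 10^7 / wavelength, which the minimal Python sketch below applies to the band positions mentioned above; the function name is an illustrative choice.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm


# Nominal band positions mentioned in the text.
for nm in (600, 535, 460, 410):
    print(f"{nm} nm = {nm_to_wavenumber(nm):.0f} cm-1")
```

The conversion reproduces the pairing of 600 nm with roughly 16 700 cm-1 and of 535 nm with roughly 18 700 cm-1. The wavelength labels appear to be rounded band names, so they need not coincide exactly with the tabulated band maxima.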


4.1 The electronic spectrum of plastocyanin

We have studied the electronic spectrum of plastocyanin with the CASSCF/CASPT2 approach [36]. The blue copper site in this protein is not symmetric. However, the N-Cu-N and S-Cu-S planes are approximately perpendicular, so the geometry can be changed to Cs symmetry with modest movements. Such a symmetrisation simplifies the labelling of the excited states and speeds up the calculations, so that larger models and more excited states can be studied. However, our most reliable results were obtained for an unsymmetrical Cu(Im)2(SH)(SH2)+ model (for which we can include a point-charge model of the surrounding protein and solvent), corrected for the truncated cysteine and methionine models [36]. A total of nine states have been studied, including the five ligand-field states and the four lowest ligand-to-metal charge-transfer states. The results are shown in Table 5 together with experimental excitation energies. The various excited states can best be characterised by analysing the singly occupied molecular orbital of each state. These orbitals are shown in Figure 6.

Figure 6. The singly occupied orbitals of the various excited states in the symmetric Cu(Im)2(SCH3)(S(CH3)2)+ model, calculated at the CASSCF level [36].

The singly occupied orbital for the X 2A'' ground state is strongly delocalised over the Cu-SCys bond. It involves a π antibonding interaction between the Cu 3dxy and SCys 3py orbitals, combined with a much weaker σ antibonding interaction with the two N ligands, whose positions in the equatorial plane are such that a perfect overlap with the two remaining lobes of the Cu 3dxy orbital is obtained (the coordinate system is selected so that the copper ion is in the origin, SMet is on the z axis, and SCys is in the xz plane). The singly occupied orbital of the first excited state (a 2A') is formed by a σ antibonding combination of the Cu 3dx2-y2 and SCys 3pσ orbitals. This interaction is also strongly covalent. The calculated excitation energy for this state is 4 119 cm-1, which explains the appearance of the band at 5 000 cm-1 in the plastocyanin spectrum.

Between 10 000 and 14 000 cm-1, three bands are found in the experimental spectrum, corresponding to the calculated states b 2A', b 2A'', and c 2A'. From the composition of the corresponding singly occupied orbitals it is clear that the states concerned can be labelled as genuine ligand-field states, with the electron hole localised in the Cu 3dz2, 3dyz, and 3dxz orbitals, respectively. The presence of a definite amount of SCys 3pπ character in the Cu 3dyz orbital of the b 2A'' state is notable. This mixing gives a significant intensity for the transition to this state, which is in fact responsible for the second most intense band in the plastocyanin spectrum. The dominant blue band, appearing at 16 700 cm-1 in the experimental spectrum, was calculated at 17 571 cm-1 and corresponds to the c 2A'' state. As can be seen from Figure 6, the corresponding singly occupied orbital is the bonding counterpart to the Cu-SCys π antibonding ground-state orbital. The extremely good overlap between the two orbitals immediately explains the large absorption intensity of the corresponding excitation. Even if this transition formally can be labelled as a SCys→Cu charge-transfer excitation, the actual amount of charge transferred is only about 0.2 e.

At higher energies, four additional charge-transfer bands were observed in the experimental spectrum, at 18 700, 21 390, 23 440, and 32 500 cm-1, respectively [84]. The latter two bands were assigned by Gewirth and Solomon [84] as charge-transfer excitations from methionine and histidine, respectively. We could only study these excitations in models with enforced symmetry. The results are therefore more approximate than for the lower excitations, but they are in line with Gewirth's assignments.

Table 5. The experimental [84,85] and calculated [34,36] spectrum of plastocyanin and nitrite reductase (excitation energies in cm-1, oscillator strengths in brackets) together with the assignment of the various excitations. The ground-state singly occupied orbital is Cu-SCys π* in plastocyanin but Cu-SCys σ* for nitrite reductase.

        Plastocyanin                                     Nitrite reductase
State   Assignment  Calculated      Experimental    Experimental    Calculated      Assignment
a 2A'   Cu-SCys σ*  4 119 (0.000)   5 000 (0.000)   5 600 (0.000)   4 408 (0.000)   Cu-SCys π*
b 2A'   3dz2        10 974 (0.000)  10 800 (0.003)  11 900 (0.003)  12 329 (0.000)  3dz2
b 2A''  3dyz        13 117 (0.001)  12 800 (0.011)  13 500 (0.009)  12 872 (0.000)  3dyz
c 2A'   3dxz        13 493 (0.000)  13 950 (0.004)  14 900 (0.010)  13 873 (0.003)  3dxz
c 2A''  Cu-SCys π   17 571 (0.103)  16 700 (0.050)  17 550 (0.020)  15 789 (0.032)  Cu-SCys π
d 2A'   Cu-SCys σ   20 599 (0.001)  21 390 (0.005)  21 900 (0.030)  22 461 (0.119)  Cu-SCys σ

However, for the bands at 18 700 and 21 390 cm-1, there is a discrepancy between the experimental spectrum and our calculations. Indeed, we predict only one excited state in this region of the spectrum, d 2A'. The singly occupied orbital in this state is the σ bonding combination of Cu 3dx2-y2 and S 3px, corresponding to the antibonding orbital of the first excited a 2A' state. This is Gewirth and Solomon's pseudo-σ orbital [84]. They assign the band at 18 700 cm-1 in the experimental spectrum as the excitation to the d 2A' state and the band at 21 390 cm-1 as another His→Cu charge-transfer excitation. Our calculated excitation energy for the d 2A' state is 20 599 cm-1, between the experimental bands at 18 700 and 21 390 cm-1, but closer to the latter. Still, our assignment of the 21 390-cm-1 band as the transition to d 2A' comes mainly from an analysis of the SCys pσ→Cu transition in other proteins and as a function of the φ angle (see the next section). According to this analysis, the transition energy should remain constant for the various proteins. Therefore, it seems unlikely that the d 2A' state would appear more than 3 000 cm-1 lower in energy in plastocyanin (18 700 cm-1) than in nitrite reductase (21 900 cm-1 [85]). Moreover, the intensity of the SCys pσ→Cu transition should increase significantly with a decreasing φ angle, which is in accordance with the increasing intensity of the 460-nm peak (21 700 cm-1) in the experimental spectra of the rhombic type 1 proteins. In addition, experimental evidence indicates that SCys, rather than imidazole, is involved in this band [79,87]. Therefore, it is more plausible to assign the d 2A' state to the band at 21 390 cm-1, although this means that we have to leave the 18 700-cm-1 band unassigned. It is notable that the latter band is not present in the experimental spectrum of nitrite reductase [85].
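The agreement between calculated and experimental excitation energies in Table 5 can be quantified with a few lines of Python. The sketch below transcribes the energies from the table and reports the largest absolute deviation for each protein; the variable and function names are illustrative choices.

```python
# (state, calculated, experimental) excitation energies in cm-1,
# transcribed from Table 5.
PLASTOCYANIN = [
    ("a 2A'", 4119, 5000),
    ("b 2A'", 10974, 10800),
    ("b 2A''", 13117, 12800),
    ("c 2A'", 13493, 13950),
    ("c 2A''", 17571, 16700),
    ("d 2A'", 20599, 21390),
]
NITRITE_REDUCTASE = [
    ("a 2A'", 4408, 5600),
    ("b 2A'", 12329, 11900),
    ("b 2A''", 12872, 13500),
    ("c 2A'", 13873, 14900),
    ("c 2A''", 15789, 17550),
    ("d 2A'", 22461, 21900),
]


def max_abs_error(rows):
    """Largest |calculated - experimental| over all states, in cm-1."""
    return max(abs(calc - exp) for _state, calc, exp in rows)


pc_error = max_abs_error(PLASTOCYANIN)
nir_error = max_abs_error(NITRITE_REDUCTASE)
print(f"plastocyanin: {pc_error} cm-1, nitrite reductase: {nir_error} cm-1")
```

With the d 2A' state assigned to the 21 390 cm-1 band as argued above, the largest deviations are below 1 800 cm-1 for both proteins.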

4.2 Correlation between structure and spectroscopy of copper proteins

On the basis of the electronic, resonance Raman, and EPR spectra, the cysteine-containing copper proteins have been divided into four groups: axial type 1 (e.g. plastocyanin), rhombic type 1 (e.g. nitrite reductase and stellacyanin), type 1.5, and type 2 (mutant) copper proteins [81]. We have studied the spectra of members of each group with the CASPT2 method [33,34,36,38].


Figure 7. The singly occupied ground-state orbitals for four models of rhombic type 1 copper oroteins, calculated at the CASSCF level [341. In Figure 7, the ground-state singly occupied orbitals of three rhombic type 1 proteins, viz. cucumber basic protein (plantacyanin), pseudoazurin, and nitrite reductase, are shown. If these are compared with the ground-state orbital of plastocyanin in Figure 6, a clear difference can be seen. In plastocyanin, there is an almost pure ~* interaction between Cu and Scys. However, in nitrite reductase, this interaction is instead mainly of o* character, and the other two proteins show a mixture of o* and ~* interactions. Thus, there has been a change in the ground state of the system; for plastocyanin the Cu-Scys o* interaction is found in the first excited state, whereas in the rhombic proteins, a significant contribution of o character is found in the ground-state singly occupied orbital. The singly occupied orbitals in the other excited states are not much changed. This directly explains the change in intensity pattern of the spectrum. The re* antibonding orbital has a strong overlap with the corresponding rc bonding orbital in the c 2A~' state,


giving rise to the blue line in the spectrum, whereas the σ* antibonding orbital instead overlaps strongly with the corresponding σ bonding orbital, found in the d ²A′ state. As expected, this transition gives rise to the yellow band around 460 nm in the spectrum, the line that increases in intensity for the rhombic type 1 copper proteins. In Table 5, the calculated and experimental excitation energies and oscillator strengths of plastocyanin and nitrite reductase are compared [34]. It can be seen that the error in the calculations is consistently below 1 800 cm⁻¹, i.e. within the error limits of the CASPT2 method [28]. Moreover, the calculations follow the experimental trend, i.e. that all excitations for nitrite reductase appear at a higher energy than the corresponding excitations for plastocyanin [85]. This reflects the stronger ligand field exerted in the tetragonal structure, with four instead of three strongly bound ligands. The intensity of the ligand-field states also reflects the change in ground state: the intensity of the d ²A″ state has dropped to zero, whereas the b,c ²A′ states gain intensity from the presence of a small amount of SCys 3pσ character in the corresponding singly occupied orbitals. The ground-state orbitals and the spectra of the other two proteins, cucumber basic protein and pseudoazurin, are intermediate between those of plastocyanin and nitrite reductase. Similarly, their structures, as described by the angle φ between the S-Cu-S and N-Cu-N planes, are also intermediate (φ is 82, 74, 70, and 61° for plastocyanin, pseudoazurin, cucumber basic protein, and nitrite reductase, respectively). Thus, there seems to be a correlation between the spectrum and the flattening of the copper geometry. This was investigated thoroughly for the Cu(NH3)2(SH)(SH2)+ model by calculating the spectrum at a number of φ angles, ranging from the ideal trigonal structure (φ = 90°) to a strictly square-planar structure (φ = 0°) [34].
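The φ angle used above is the angle between the S-Cu-S and N-Cu-N planes. A minimal sketch of how such an angle could be evaluated from Cartesian coordinates is given below; the helper names and the idealised test coordinates are assumptions made for illustration and are not taken from the original calculations.

```python
import math


def _sub(a, b):
    """Component-wise difference of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(sum(x * x for x in a))


def plane_angle(cu, s_cys, s_met, n_a, n_b):
    """Angle (degrees, 0-90) between the S-Cu-S and N-Cu-N planes."""
    normal_s = _cross(_sub(s_cys, cu), _sub(s_met, cu))
    normal_n = _cross(_sub(n_a, cu), _sub(n_b, cu))
    dot = sum(x * y for x, y in zip(normal_s, normal_n))
    # The absolute value folds the result into 0-90 degrees; the clamp
    # guards against rounding slightly above 1.
    cos_phi = min(1.0, abs(dot) / (_norm(normal_s) * _norm(normal_n)))
    return math.degrees(math.acos(cos_phi))


if __name__ == "__main__":
    cu = (0.0, 0.0, 0.0)
    # Idealised (hypothetical) geometries, arbitrary length units.
    print(plane_angle(cu, (1, 0, 1), (-1, 0, 1), (0, 1, -1), (0, -1, -1)))
    print(plane_angle(cu, (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)))
```

In this sketch, perpendicular planes give φ = 90° and a strictly planar arrangement gives φ = 0°, matching the two limiting structures described in the text.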
The results are summarised in Figure 8, which shows how the Cu-SMet bond is shortened and the Cu-SCys bond is elongated as the structure goes from trigonal to square planar. At the same time, the ratio of the calculated oscillator strengths for the excitations around 460 and 600 nm goes from zero to infinity. This reflects that in the trigonal structure, the c ²A″ state gives rise to the dominant blue band, whereas the d ²A′ state has little intensity. In the square-planar structure, the situation is reversed. The ground state is of Cu-SCys σ* character, and the Cu-SCys σ→σ* excitation has become by far the most intense, whereas the c ²A″ state has almost completely vanished. Even if the calculations were performed on a simple model, the results presented in Figure 8 nicely reflect the structure-electronic spectroscopy relationship between the various types of copper-cysteinate proteins. The copper coordination geometry of axial type 1 proteins is close to trigonal, and their spectroscopic characteristics are reflected by the results obtained for φ > 80°. Rhombic type 1 proteins like pseudoazurin and cucumber basic protein, on the other hand, have φ angles between 70° and 80°. As can be seen from Figure 8, even at such a small deviation from orthogonality, the 460-nm excitation has already gained significant intensity due to mixing of σ character into the ground-state singly occupied orbital. The largest deviation from orthogonality within the type 1 copper proteins is found for nitrite reductase from Achromobacter cycloclastes, which has φ = 56–65° [88]. At such angles, the second transition has become the most intense, which is in accordance with the green colour of nitrite reductase. The intensity of the blue band decreases further as the structure becomes more flattened, and the results obtained for the smallest φ angles in Figure 8 can to a first approximation be used to mimic the properties of type 2 copper-cysteinate (mutant) proteins, with their yellow colour. We have also performed calculations on more realistic complexes [34,38], which confirm these predictions. They show that the Cu-SCys σ→σ* excitation is blue-shifted in these models by more than 1 000 cm⁻¹, in agreement with the experimental shift of this band from 460 to 410 nm when going from type 1 to type 2 copper proteins [81].

[Figure 8: plot of the bond lengths (pm) and the intensity ratio against the twisting angle (°).]

Figure 8. The variation of the Cu-SCys and Cu-SMet bond lengths and the quotient of the oscillator strengths of the peaks around 460 and 600 nm as a function of the φ twisting angle [34].

The result in Figure 8 has led us to suggest that axial type 1 proteins have a trigonal structure with a π bond between Cu and SCys. The other three types of copper proteins have instead a tetragonal structure with mainly σ bonds to all four copper ligands. They differ in the flattening of the geometry, for example as described by the φ angle. Rhombic type 1 proteins, which are most distorted towards a tetrahedron, arise when one of the ligands forms a weak bond. If all ligands bind strongly, but still are rather soft (e.g. histidine), type 1.5 sites arise, whereas with harder ligands (e.g. water) and preferably with two axial ligands, the strongly flattened type 2 copper sites are found.
It is notable that all sites form naturally, following the preferences of the copper ion and its ligands, and not by protein strain.

The only protein that does not fit into the above description is stellacyanin. The structure of this protein is clearly trigonal, with a φ angle of 84°, similar to plastocyanin. However, the ε460/ε600 ratio for stellacyanin is significantly higher than for plastocyanin and its ESR characteristics are rhombic instead of axial. The structure and electronic characteristics of stellacyanin were recently discussed in two independent studies, by Solomon et al. and by us, and quite different interpretations were given [33,86]. In Solomon's view, the stronger axial ligand in stellacyanin (a glutamine side-chain amide group, which binds closer to the copper ion than methionine, ~220 pm compared to 265–330 pm) should induce a stronger Jahn-Teller driving force. The fact that the copper surrounding in stellacyanin is not more strongly tetragonally distorted than in plastocyanin can in this view only be explained by more protein strain. However, our calculations on the Cu(Im)2(SCH3)(OCCH3NH2)+ model show that its optimal geometry is trigonal and close to the crystal structure of cucumber stellacyanin (cf. Table 1) [33]. There is no need for strain, since the Jahn-Teller instability can also be lifted by a trigonal distortion. As concerns the spectral characteristics, Solomon makes a clear distinction between stellacyanin and the other rhombic type 1 proteins, in that he gives a different assignment to the intensity-gaining band around 460 nm: a His→Cu charge-transfer excitation in stellacyanin, as opposed to a SCys pseudo-σ→Cu excitation in the other rhombic proteins. As already noted, the His→Cu band was not reproduced by our calculations. However, our results indicate that the excitation out of the SCys σ orbital around 22 000 cm⁻¹ becomes significantly more intense in stellacyanin, at the expense of the blue band, in conformity with what was found for the other rhombic type 1 proteins.
The intensity-gaining mechanism in stellacyanin is not a decreasing φ angle, but the stronger axial interaction with the glutamine side-chain amide group, giving rise to a more pronounced mixing of σ character into the ground-state singly occupied orbital, even in an almost strictly trigonal structure (see Figure 7d) [33]. Therefore, there is no need to invoke a His→Cu excitation to explain the increased ε460/ε600 ratio in stellacyanin.

4.3 The sensitivity of the calculated spectra on the theoretical method

Our studies of the spectra of blue copper proteins have taught us a lot about spectra calculations on metal complexes and their sensitivity to various parameters. First, the size of the ligand models is crucial. Imidazole should be used as a model of histidine, SCH3⁻ for cysteine, S(CH3)2 for methionine, and CH3CONH2 for glutamine [33,36]. If imidazole is replaced by NH3, most excitation energies decrease by 800–1 900 cm⁻¹, and the ordering of the excitations may change. Likewise, if SCH3⁻ is replaced by SH⁻, the excitation energies decrease by up to 5 200 cm⁻¹. On the other hand, substituting SH2 for S(CH3)2 increases all excitations by up to 6 800 cm⁻¹. Consequently, the results obtained with Cu(Im)2(SH)(SH2)+ and Cu(Im)2(SCH3)(S(CH3)2)+ are quite similar, and can be improved by a set of correction factors [33,34,36]. Further replacing SCH3⁻ with SC2H5⁻ has a small effect on all excitation energies (less than 200 cm⁻¹). This is a bit surprising, since Zerner et al. report changes of up to 2 100 cm⁻¹ in the spectrum of rubredoxin when the chromophore is modelled by Fe(SC2H5)4 instead of Fe(SCH3)4 [89].

Second, the geometry strongly influences the spectrum. In particular, the Cu-S distances are crucial. If the Cu-SCys distance is decreased by 5 pm, all excitation energies increase by up to 2 000 cm⁻¹. Similarly, if the Cu-SMet bond length is increased by 10 pm, the excitation energies increase by up to 900 cm⁻¹, except for the excitation to the e ²A′ state (the charge transfer to methionine), which changes by 1 900 cm⁻¹ [36]. Therefore, to reproduce experimental excitation energies accurately, it is necessary to reoptimise the two Cu-S distances with the CASPT2 method [34].

Third, the effect of the surrounding protein and solvent molecules, which has been estimated using a point-charge model, is appreciable and cannot be neglected. The general trends are the same for all proteins studied, and can be related to the character of the transitions [33,34,36,39]. The excitation energies of the two SCys→Cu charge-transfer states increase by up to 2 800 cm⁻¹, whereas the ligand-field excitations, which involve an appreciable charge flow from Cu to SCys, decrease by almost 2 000 cm⁻¹. A considerably smaller effect is found for the lowest transition, which is essentially a transition within the Cu-SCys bond. However, if only details in the crystal structure are changed, e.g. the binding or exchange of the coenzyme (NADH) in Cu-substituted alcohol dehydrogenase, the variation in the spectra is limited to less than 300 cm⁻¹ [39]. It is also notable that the surroundings reduce all oscillator strengths by a factor of up to 1.75.
Finally, we have also investigated the influence of the basis sets, relativistic effects, and Cu 3s and 3p semicore correlation on the spectrum [36]. Somewhat unexpectedly, the spectrum is quite insensitive to the basis set. Increasing it with double polarising functions on Cu and S and single polarising functions on C, N, and H changes the spectrum by less than 250 and 500 cm⁻¹ for the ligand-field and charge-transfer states, respectively, except for the charge-transfer state from methionine, which is changed by 2 300 cm⁻¹ [36]. Likewise, relativistic effects and the Cu 3s and 3p correlation do not influence the spectrum very much, by less than 800 cm⁻¹ [36]. However, the two effects act in the same direction and change the ligand-field and the charge-transfer excitations in opposite directions (both effects favour states with a low Cu 3d population). Thus, their combined effect may significantly alter the relative energy of the excited states. Therefore, they are included in all reported excitation energies.


5. REORGANISATION ENERGIES

According to the semiclassical Marcus theory [6], the rate of electron transfer depends on the reduction potential (ΔG⁰), the electronic coupling matrix element (HDA), and the reorganisation energy (λ):

\[
k_{ET} = \frac{2\pi}{\hbar}\,\frac{H_{DA}^{2}}{\sqrt{4\pi\lambda RT}}\,\exp\left(-\frac{(\Delta G^{0}+\lambda)^{2}}{4\lambda RT}\right). \qquad (1)
\]
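To make the dependence of Eq. (1) on its three parameters concrete, the following is a minimal numerical sketch. It assumes that the reorganisation energy and the driving force are given in kJ/mole and the coupling element in cm⁻¹, and it works with per-molecule energies (k_B T in place of RT), which is equivalent; the function name, the units chosen, and the example values are illustrative assumptions, not data from this chapter.

```python
import math

HBAR = 1.054571817e-34     # reduced Planck constant, J s
H_PLANCK = 6.62607015e-34  # Planck constant, J s
K_B = 1.380649e-23         # Boltzmann constant, J/K
N_A = 6.02214076e23        # Avogadro constant, 1/mol
C_CM = 2.99792458e10       # speed of light, cm/s


def marcus_rate(lambda_kj_mol, dg0_kj_mol, h_da_cm1, temperature=298.15):
    """Electron-transfer rate (1/s) from the semiclassical Marcus expression.

    lambda_kj_mol : reorganisation energy (kJ/mole)
    dg0_kj_mol    : free energy of reaction (kJ/mole, negative if exergonic)
    h_da_cm1      : electronic coupling matrix element (cm^-1)
    """
    lam = lambda_kj_mol * 1000.0 / N_A   # J per molecule
    dg0 = dg0_kj_mol * 1000.0 / N_A      # J per molecule
    h_da = h_da_cm1 * H_PLANCK * C_CM    # J
    kt = K_B * temperature               # J
    prefactor = (2.0 * math.pi / HBAR) * h_da ** 2 / math.sqrt(4.0 * math.pi * lam * kt)
    return prefactor * math.exp(-((dg0 + lam) ** 2) / (4.0 * lam * kt))


if __name__ == "__main__":
    # Hypothetical example: zero driving force, coupling of 1 cm^-1.
    for lam in (60.0, 100.0, 120.0):
        print(f"lambda = {lam:5.1f} kJ/mole -> k = {marcus_rate(lam, 0.0, 1.0):.3e} 1/s")
```

For a fixed driving force close to zero, a smaller reorganisation energy gives a larger rate in this sketch.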

If the geometry of the active site and its surroundings does not change much during electron transfer, the reorganisation energy will be small, and the reaction will be fast. Therefore, it is of vital importance for an electron-transfer protein to reduce the reorganisation energy. For convenience, the reorganisation energy is usually divided into two parts: the inner-sphere reorganisation energy, which is associated with the structural change of the first coordination sphere, and the outer-sphere reorganisation energy, which involves structural changes of the remaining protein as well as the solvent. Several groups have tried to estimate the reorganisation energy for transition-metal complexes and proteins using theoretical methods of variable sophistication and with varying success [51,90-101]. However, we seem to be the only group that has systematically studied models with relevance to the blue copper proteins. We have estimated inner-sphere reorganisation energies by calculating the energy difference of a reduced model between the optimum geometries of the reduced and the oxidised complex, or vice versa for the oxidised model [68]. For our best model of plastocyanin, Cu(Im)2(SCH3)(S(CH3)2)+/0, we obtain an inner-sphere reorganisation energy of 62 kJ/mole, whereas models of the rhombic type 1 proteins nitrite reductase and stellacyanin have slightly larger values, 78 and 90 kJ/mole. It is far from trivial to compare these values with experimental data. First, we need an estimate of the outer-sphere reorganisation energy. However, it depends strongly on the geometry of the docking complex of the donor and acceptor proteins in the electron-transfer reaction of interest, and it is unlikely that it should be additive for different reactions. Therefore, it is highly questionable to use Marcus' combination rules [6] to obtain reorganisation energies for reactions that have not been studied experimentally [102-107].
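The procedure described above, i.e. evaluating each oxidation state at both optimum geometries, can be sketched as follows. The sketch assumes that the self-exchange inner-sphere reorganisation energy is taken as the sum of the contributions from the reduced and the oxidised state; the four input energies are hypothetical placeholders chosen only for illustration, and the function name is not from the original work.

```python
def inner_sphere_lambda(e_red_at_red, e_red_at_ox, e_ox_at_ox, e_ox_at_red):
    """Inner-sphere reorganisation energy from four single-point energies.

    e_red_at_red : energy of the reduced state at its own optimum geometry
    e_red_at_ox  : energy of the reduced state at the oxidised optimum geometry
    e_ox_at_ox   : energy of the oxidised state at its own optimum geometry
    e_ox_at_red  : energy of the oxidised state at the reduced optimum geometry
    All energies in the same unit (here kJ/mole).
    """
    lambda_red = e_red_at_ox - e_red_at_red
    lambda_ox = e_ox_at_red - e_ox_at_ox
    # Assumption: the self-exchange value is the sum of both contributions.
    return lambda_red, lambda_ox, lambda_red + lambda_ox


if __name__ == "__main__":
    # Hypothetical relative energies (kJ/mole), not computed values.
    lam_red, lam_ox, lam_inner = inner_sphere_lambda(0.0, 30.0, 0.0, 32.0)
    print(f"lambda_red = {lam_red}, lambda_ox = {lam_ox}, inner-sphere = {lam_inner}")
    # Combining an inner-sphere value of 62 kJ/mole with an outer-sphere
    # estimate of 42 kJ/mole gives a total of roughly 100 kJ/mole.
    print(f"total = {62.0 + 42.0} kJ/mole")
```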
It should also be noted that the other terms in the Marcus equation, the reduction potential and the coupling constant, also change when the docking complex is formed [108]. Therefore, reliable comparisons can only be made when calculations and experiments are performed on the same electron-transfer reaction. However, to get a crude feeling for the relation between the calculated and measured reorganisation energies, we can proceed in the following way. The outer-sphere reorganisation energy of three tentative configurations of the docking complex between plastocyanin and its natural electron donor, cytochrome f, has been estimated by force-field methods and numerical solution of the Poisson-Boltzmann equation [99]. The best estimate is 42 kJ/mole, and it can be combined with our calculated inner-sphere reorganisation energy for plastocyanin (inner-sphere reorganisation energies can to a good approximation be expected to be additive, since they do not depend on the conformation of the docking complex) to get an approximate total self-exchange reorganisation energy of 100 kJ/mole. This energy is slightly lower than the experimentally measured reorganisation energy for plastocyanin (120 kJ/mole) [102]. The reorganisation energy of azurin, which is the best studied blue copper protein [103-107], is slightly lower (about 80 kJ/mole), but it is likely that azurin, with its bipyramidal copper site, has a lower reorganisation energy than the pyramidal site in plastocyanin [68].

Recently, Loppnow and Fraga [109] have estimated the reorganisation energy for plastocyanin by analysing resonance Raman intensities. They obtain an inner-sphere reorganisation energy of 18 kJ/mole, which is significantly lower than ours [105]. However, it represents the reorganisation energy of charge transfer during the excitation to the intense blue line. As we saw above, only about 0.2 e is transferred during this excitation (and only from SCys to Cu) and it has therefore little to do with the reorganisation energy during electron transfer of plastocyanin [68].

We have also investigated how the blue copper proteins have achieved a low reorganisation energy. As can be seen in Figure 9, a six-coordinate Cu(H2O)6+/2+ complex has a rather small reorganisation energy, 112 kJ/mole. However, Cu(I) cannot stabilise such a high coordination number. If it is allowed to relax to its preferred coordination number, the reorganisation energy of Cu(H2O)6+ increases strongly, to 336 kJ/mole. If the number of ligands is lowered to four, we get a rather high reorganisation energy, 186–247 kJ/mole for Cu(H2O)4+/2+, depending on whether the reduced complex is allowed to relax to a lower coordination number or not. Thus, the low coordination number of the copper ion in the proteins is unfavourable for the reorganisation energy, but necessary since Cu(I) normally does not bind more than four ligands. Instead, the low reorganisation energy of the blue copper proteins is achieved by a proper choice of ligands. Nitrogen ligands give an appreciably (50 kJ/mole) lower reorganisation energy than water, owing to the lower Cu-N force constant. A methionine ligand gives an even lower reorganisation energy (by 14 kJ/mole), because of its weaker Cu-S bond. The cysteine ligand decreases the reorganisation energy even more, by 45 kJ/mole, although the Cu-SCys force constant is appreciably higher than that of Cu-N. This decrease is caused by the transfer of charge from the negatively charged thiolate group to Cu(II), which makes the oxidised and reduced structures quite similar. The effects of the methionine and cysteine ligands are approximately additive, so the Cu(NH3)2(SH)(SH2)+/0 complex has a reorganisation energy of 74 kJ/mole. Finally, for the trigonal Cu(NH3)2(SH)(SH2)+/0 complex (all the other complexes have been tetragonal), the oxidised structure is even closer to the reduced one, so the reorganisation energy is only 66 kJ/mole. If more realistic models are used, the reorganisation energy decreases by 4 kJ/mole and we arrive at the estimate discussed above.

[Figure 9: bar chart of the reorganisation energy for Cu(H2O)6, Cu(H2O)4, and the nitrogen- and sulphur-ligated model complexes.]

Figure 9. The inner-sphere self-exchange reorganisation energy of a number of complexes related to the blue copper proteins. The hatched bars indicate the reorganisation energy obtained when the reduced structure preferred a lower coordination number than the oxidised structure [68].

Thus, we can conclude that the inner-sphere reorganisation energy of our blue copper models is similar to the one in the proteins. This indicates that the proteins do not alter the reorganisation energy to any significant degree, i.e. that protein strain is not important for the low reorganisation energies of the blue copper proteins. On the contrary, an important mechanism used by the blue copper site to reduce the reorganisation energy is the flexible bond to the methionine ligand, which can change its geometry at virtually no cost [54,68]. This mechanism is actually the antithesis of the strain hypotheses, which suggest that a low reorganisation energy is obtained by the rigid protein obstructing any change in geometry.

6. REDUCTION POTENTIALS

The reduction potential is central for the function of electron-transfer proteins, since it determines the driving force of the reaction. In particular, it must be poised between the reduction potentials of the donor and acceptor species. Therefore, electron-transfer proteins normally have to modulate the reduction potential of the redox-active group. This is very evident for the blue copper proteins, which show reduction potentials ranging from 184 mV for stellacyanin to ~1000 mV for the type 1 copper site in domain 2 of ceruloplasmin [1,110,111].

These two copper sites are atypical in that stellacyanin has a glutamine amide oxygen atom as the axial ligand (instead of methionine), whereas the ceruloplasmin centre does not have any axial ligand at all (leucine replaces the methionine ligand). However, blue copper proteins with the typical CuHis2CysMet ligand sphere have reduction potentials between 260 and 680 mV (e.g. pseudoazurin and rusticyanin) [1,63], although they share the same active site. It is also clear that the reduction potentials of the blue copper proteins are high, higher than for most other electron-transfer proteins (-700 to +400 mV for iron-sulphur clusters [112] and -300 to +470 mV for cytochromes [113,114]), and also higher than for a copper ion in aqueous solution (+150 mV [115]). The reason for these high potentials and their great variation has been much discussed. Originally, the entatic state and the induced rack hypotheses suggested that the high potential was caused by protein strain. They proposed that the protein forces Cu(II) to bind in a geometry more similar to that preferred by Cu(I). Thus, Cu(II) should be destabilised, which would increase the reduction potential [10,12]. This effect has been observed for inorganic complexes with strained ligands [115]. However, Malmström and Gray recently showed that the reduction potential of denatured azurin is higher than that of the native protein [105,116,117]. This shows that the reduced copper site gains more from unfolding than the oxidised site, especially as unfolding would increase the solvent accessibility of the site, thereby favouring Cu(II) and lowering the reduction potential. Consequently, the overall effect of the folding of the protein is a lowering of the reduction potential [117], i.e. opposite to what the strain hypotheses originally suggested. This is in line with the suggestion by Solomon and co-workers that it is only the Cu(I)-SMet bond that is constrained by the protein [13].
A normal Cu(I)-SMet bond length is about 230 pm, whereas in the blue copper proteins, the observed length is around 290 pm. Such an elongation can be predicted to reduce significantly the donation of charge from the ligand to the copper ion, which would increase the reduction potential. In fact, density functional Xα calculations indicate that the reduction potential would increase by more than 1000 mV by this elongation [13]. Malmström et al. have extended this hypothesis to include also other axial ligands [117,118]. They point out that stellacyanin has the strongest axial ligation among the blue copper proteins (a glutamine amide group at a distance of ~220 pm) and also the lowest reduction potential. Azurin has two axial ligands at distances around 310 pm and a higher reduction potential (285–310 mV). In plastocyanin, the Cu-O distance has increased to about 390 pm, as has the reduction potential (to 380 mV). In rusticyanin, the Cu-O distance is even greater, 590 pm, and the carbonyl oxygen no longer points towards the copper site. This is correlated with a high reduction potential of 680 mV (however, they disregard the compensating shortening of the Cu-SMet bond in plastocyanin and rusticyanin).


[Figure 10: energy curves plotted against the Cu-S(Met) distance (230–310 pm) for Cu(I) and Cu(II), each in solution and in vacuum.]

Figure 10. The calculated potential energy surface of the Cu-SMet bond in the Cu(Im)2(SCH3)(S(CH3)2)+/0 complexes [35]. Two curves are given for each oxidation state, one in vacuum and one in water (calculated with the CPCM method). The actual potential in any protein can be expected to be found in between these two extreme cases. Reduction potentials can be found by forming the difference between the curves of the oxidised and reduced complex together with a hypothesis whether the Cu-SMet bond is constrained in the oxidised, reduced, or both states [68]. Note that 1 kJ/mole = 10.4 mV.

Finally, in fungal laccase and ceruloplasmin, which have the highest known reduction potentials (750–1000 mV), a leucine replaces the methionine ligand, yielding a three-coordinate copper site. Thus, they propose that the protein fold dictates the reduction potential of the copper site by varying the strength of the axial ligation [117,118]. We have examined these suggestions by several types of calculations. First, we have used free energy perturbations to estimate the maximum strain energy plastocyanin or nitrite reductase can mobilise to resist a certain copper geometry [54]. These calculations show that the proteins are quite indifferent to the Cu-SMet bond length. It costs less than 5 kJ/mole to change the length of this bond between the values observed in different crystal structures or in optimised vacuum models. This energy is at least a factor of two too low to explain the observed differences in the Cu-SMet bond length. Instead, the difference between the calculated and observed Cu-SMet bond seems to be caused by systematic errors in the theoretical method, dynamic effects, and solvation effects [26,35,54], as was discussed above.
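The conversion between energies in kJ/mole and potentials in mV quoted in the caption of Figure 10 follows from E = ΔG/(nF) for a one-electron process. A short sketch of this arithmetic is given below; the function names are illustrative.

```python
FARADAY = 96485.33212  # Faraday constant, C/mol


def kj_per_mole_to_mv(energy_kj_mol: float, n_electrons: int = 1) -> float:
    """Convert an energy in kJ/mole to a potential in mV via E = dG / (n F)."""
    return energy_kj_mol * 1000.0 / (n_electrons * FARADAY) * 1000.0


def mv_to_kj_per_mole(potential_mv: float, n_electrons: int = 1) -> float:
    """Convert a potential in mV to an energy in kJ/mole."""
    return potential_mv / 1000.0 * n_electrons * FARADAY / 1000.0


if __name__ == "__main__":
    print(f"1 kJ/mole  = {kj_per_mole_to_mv(1.0):.1f} mV")
    print(f"10 kJ/mole = {kj_per_mole_to_mv(10.0):.0f} mV")
    print(f"70 mV      = {mv_to_kj_per_mole(70.0):.1f} kJ/mole")
```

An energy of 10 kJ/mole thus corresponds to about 100 mV for a one-electron reduction.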
Second, quantum chemical calculations of the potential energy surface of the Cu-SMet bond show that it costs less than 10 kJ/mole to change the Cu-SMet bond length by 100 pm around its optimum value (see Figure 10), a range larger than the natural variation in this bond [14,54]. Thus, even if the proteins could constrain this bond, it would affect the electronic part of the reduction potential by less than 10 kJ/mole, or 100 mV, i.e. much less than the variation found among the blue copper proteins. Moreover, a constrained Cu(I)-SMet bond would destabilise the reduced state and therefore decrease the reduction potential, contrary to the suggestion of a raised potential [119] and the fact that the blue copper proteins are characterised by high reduction potentials. However, there are other contributions to the reduction potential than the electronic part, most prominently the solvation energy of the active site caused by the surrounding protein and solvent. We have therefore studied the reduction potential of the blue copper proteins using various methods to include the solvation effects. The results have shown that constraints in the Cu-SMet bond length can affect the reduction potential by less than 70 mV (cf. Figure 10) [35]. Furthermore, we have tested the suggestion [63,118] that the reduction potential is determined by the axial backbone carbonyl ligand or by replacements of the methionine ligand (by glutamine in stellacyanin or leucine in ceruloplasmin and laccase). Again, our results show that the potential energy surfaces of the axial ligands are too soft to account for the variation in reduction potential among the blue copper proteins, even if solvation effects are taken into account (the total effect is less than 140 mV) [35]. This is in accordance with mutation studies of the axial methionine ligand in azurin [120], showing that most substitutions give only modest changes (less than 60 mV). The largest effects are found for mutations to hydrophobic residues, which increase the reduction potential by up to 140 mV, and also mutations that change the structure of the copper site [121]. Therefore, there must be other reasons for the high potentials of the blue copper proteins. Examination of small inorganic models [63,115,122] has shown that anionic ligands lower the potential, whereas sulphur and nitrogen π acceptor ligands raise the potential. Our calculations of the reduction potentials of a number of blue-copper models confirm this [45].
The replacement of an ammonia ligand in Cu(NH3)4+/2+ by SH2 increases the potential by 0.7 V, whereas SH⁻ decreases the potential by 0.3–0.5 V. If both models are included in the complex, Cu(NH3)2(SH)(SH2)+/0, the potential hardly changes relative to the Cu(NH3)4+/2+ complex. The same is true if more realistic ligands are used (Cu(Im)2(SCH3)(S(CH3)2)+/0). A tetragonal model of the rhombic blue copper proteins has a slightly larger reduction potential than the trigonal model (0.07 V), but it is not clear if this difference is significant. Moreover, other effects are as important as the ligands. The dielectric properties of the protein matrix are very different from those of water. It has often been argued that it behaves as a medium with a low dielectric constant (around 4 compared to 80 in water) [47,123,124]. Figure 11 shows that this gives rise to a very prominent change in the reduction potential of a blue-copper site [45]. It increases by 0.8–1 V as the site is moved from water solution to the centre of a protein with a radius of 1.5 nm (like plastocyanin) or 3.0 nm (like an azurin tetramer). It can also be seen that it is not necessary to move the site to the centre of the protein to get a full effect. Already at the surface of the protein, 80% of the maximum effect is seen, and when the site is 0.5 nm from the surface (as is typical for the blue copper proteins), the change in the reduction potential is 90% of the maximum. Thus, the reduction potential of the blue-copper site in the protein will be 0.6–0.9 V higher than in water solution, in accordance with a 0.5-V variation in the cytochrome reduction potentials depending on the solvent exposure of the haem group [125]. This effect alone explains the high reduction potential of the blue copper proteins compared to copper in aqueous solution.

[Figure 11: plot against the distance (Å, 0–50); vertical axis 0–1000.]

Figure 11. The reduction potential of the Cu(Im)2(SCH3)(S(CH3)2)+/0 complex as a function of the size of the protein (1.5 or 3.0 nm radius) and the distance between the copper ion and the centre of the protein [45]. The protein was modelled by a sphere of a low dielectric constant (4) surrounded by water (ε = 78.39), and the copper site as a collection of point charges taken from quantum chemical calculations. The potentials were calculated with the MEAD program.

Naturally, details of the protein matrix, i.e. the presence and direction of protein dipoles and charged groups around the copper site, also have a strong influence on the reduction potential [53,126]. In fact, a single water molecule 0.45 nm from the copper ion may change the potential by 0.2 V, and backbone amide groups may have similar effects [53]. The water accessibility and the packing of hydrophobic residues have also been shown to significantly influence the reduction potential. In fact, it has been suggested that the protein may modify the reduction potential by more than 1 V without any changes in the redox-active group [52]. With these results in mind, the large variation of the reduction potentials of blue copper proteins is not surprising, even if the detailed mechanism remains to be revealed for most proteins [45,53,126].
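The direction and rough size of the dielectric effect discussed above can be illustrated with the much cruder Born model of a charged sphere in a uniform dielectric. This is not the Poisson-Boltzmann (MEAD) treatment used for Figure 11: the sketch below assumes a spherical site of hypothetical radius with charge +1 in the oxidised and 0 in the reduced state, fully immersed in a medium of either ε = 4 or ε = 80, and all names and parameter values are illustrative.

```python
# e^2 N_A / (4 pi eps0), expressed in kJ nm / mole
COULOMB_KJ_NM = 138.9354
FARADAY = 96485.33212  # C/mol


def born_energy(charge: float, radius_nm: float, epsilon: float) -> float:
    """Born solvation energy (kJ/mole) of a charged sphere in a dielectric."""
    return -COULOMB_KJ_NM * charge ** 2 / (2.0 * radius_nm) * (1.0 - 1.0 / epsilon)


def potential_shift_mv(q_ox, q_red, radius_nm, eps_low=4.0, eps_high=80.0):
    """Shift (mV) of a one-electron reduction potential when the site is moved
    from the high- to the low-dielectric medium, within the Born model."""
    def reduction_solvation(eps):
        # Solvation contribution to the free energy of reduction (ox -> red).
        return born_energy(q_red, radius_nm, eps) - born_energy(q_ox, radius_nm, eps)

    delta_g = reduction_solvation(eps_low) - reduction_solvation(eps_high)
    return -delta_g * 1000.0 / FARADAY * 1000.0


if __name__ == "__main__":
    # Hypothetical site radius of 0.3 nm; oxidised charge +1, reduced charge 0.
    print(f"shift = {potential_shift_mv(1.0, 0.0, 0.3):.0f} mV")
```

Under these assumptions the potential is raised by several hundred mV in the low-dielectric medium, the same direction as the effect shown in Figure 11, although the numbers of such a simple model should not be compared in detail with the reported results.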

7. RELATED PROTEINS

7.1 The binuclear CuA site

Cytochrome c oxidase is the terminal oxidase in both prokaryotic and eukaryotic cells and is responsible for the generation of cellular energy via oxidative phosphorylation [127]. It couples the catalytic four-electron reduction of O2 to water to transmembrane proton pumping, which can be used for ATP synthesis and long-range electron transfer. The active site is a haem a3-CuB binuclear site, whereas a second haem a and an additional copper site, CuA, serve as electron-transfer intermediates between cytochrome c and the active site. The CuA site shows many similarities with the blue copper proteins. Recently, the structure of cytochrome c oxidase was determined by crystallography [128-130]. This solved an old controversy regarding the geometry of the CuA site [131,132], showing that it is a binuclear site, bridged by two cysteine thiolate groups. Each copper ion is also bound to a histidine group and a weaker axial ligand, a methionine sulphur atom for one copper and a backbone carbonyl group for the other. The Cu-Cu distance is very short, ~245 pm [133], and it has been speculated that it represents a covalent bond [134-136]. A similar site is found in nitrous oxide reductase, a terminal oxidase that converts N2O to N2 in denitrifying bacteria [137]. During electron transfer, the CuA site alternates between the fully reduced and the mixed-valence (CuI+CuII) forms. Interestingly, the unpaired electron in the mixed-valence form seems to be delocalised between the two copper ions. Several theoretical investigations of the electronic structure and spectrum of the CuA dimer have been published [138-144]. As for the blue copper proteins, it has been suggested that the structure and the properties of the CuA site are determined by protein strain. More precisely, it has been proposed [136] that CuA in its natural state is similar to an inorganic model studied by Tolman and coworkers [145]. This complex has a long Cu-Cu bond (293 pm) and short axial interactions (~212 pm). The protein is said to enforce weaker axial interactions, which is compensated by shorter bonds to the other ligands and the formation of a Cu-Cu bond.
This should allow the protein to modulate the reduction potential of the site [136,146].

We have studied the structure, reorganisation energy, and reduction potential of the CuA site with the same theoretical methods as for the blue copper proteins [147]. The experimentally most studied state of CuA is the mixed-valence state. Our optimised structure of (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ is very similar to available experimental data [128-130,133,148-151] (cf. Figure 12 and Table 6). The Cu-Cu distance is 248 pm, 2-5 pm longer than what is obtained by extended X-ray absorption fine structure (EXAFS) measurements, and the Cu-Scys distances are 231-235 pm (~2 pm longer than the EXAFS results). Even the distances to the axial ligands are within the experimentally observed range: 245 pm for the methionine ligand and 220 pm for the backbone carbonyl group. The difference in the Cu-NHis distances seems to be slightly larger, 6-7 pm, which is probably due to hydrogen-bond interactions in the protein [147].

It has been noted that some inorganic models of the CuA site have an appreciably longer Cu-Cu distance (~290 pm) [145]. This is accompanied by a change in the electronic state: in the proteins there is a σ* antibonding interaction between the copper ions in the singly occupied orbital (an orbital of B3u symmetry for an idealised D2h Cu2S2 core), whereas in the model there is instead a Cu-Cu π bonding interaction (B2u symmetry) [136,138,141,143].

Figure 12. The optimised geometry of the σ*-bonded structure for the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ complex [147] compared to the crystal structure of the CuA site in cytochrome c oxidase (shaded and without any hydrogen atoms) [149].

We have also optimised the π-bonded electronic state. It is characterised by a Cu-Cu bond length of 310 pm and a slightly larger variation in the Cu-Scys distances (226-236 pm), whereas the other geometric parameters are similar to those of the σ*-bonded structure. In particular, there is no significant difference in the bond lengths to the axial ligands. Therefore, it is unlikely that variations in the axial interactions (caused by the protein) could change the electronic state of CuA.

Interestingly, the two structures are almost degenerate; they have the same energy within 2 kJ/mole. In fact, the full potential surface for the Cu-Cu interaction is extremely flat. As can be seen in Figure 13, the barrier between the two electronic states is less than 5 kJ/mole and the Cu-Cu distance can vary over 100 pm (240-340 pm) at a cost of less than 5 kJ/mole, both in vacuum and in water solution. Thus, there is no indication that the CuA site should be significantly strained. The difference between the protein structures and the mixed-valence model is caused by the near-degeneracy of the two electronic states (indicating that small differences in the surrounding protein may stabilise either structure) and the fact that the inorganic complex involves poor models of the histidine and axial ligands (four amine groups at almost the same distances, 211-212 pm). This illustrates the danger of relying on inorganic complexes with poor ligand models; if such models had been used in theoretical calculations, nobody would have believed the results.
The two electronic states differ in the localisation of the unpaired electron: in the σ* state, the electron is delocalised over the whole system, whereas in the π state, the electron is more localised to one copper. Our calculations reproduce this movement of the electron: in the system with a long Cu-Cu bond, the electron is mainly localised to the copper ion with the methionine ligand. However, the electronic structure is quite flexible, as experiments with engineered CuA sites have shown [146,153,154].

Figure 13. The calculated potential energy surface for the Cu-Cu interaction of the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ complex in vacuum and in water [147]. (Curves: fully reduced and mixed-valence states, each in vacuum and in solvent; horizontal axis: Cu-Cu distance in pm.)

The optimum structure of the fully reduced state of our CuA model is also shown in Table 6. It can be seen that most Cu-ligand bond lengths increase by 1-7 pm upon reduction, but the Cu-Cu distance increases by 9 pm and the Cu-O distance by as much as 30 pm. No crystal structure has been published for this oxidation state, but EXAFS data are available [133]. It can be seen in Table 6 that our optimised structure is quite close to these results, with the same trends as for the mixed-valence structure (i.e. slightly too long Cu-Cu and Cu-N bonds). Therefore, our calculations reproduce the changes upon reduction observed by EXAFS well, e.g. the change in the Cu-Cu distance. We also reproduce the larger variation in the Cu-S distances (233-247 pm). Consequently, the calculated reorganisation energies can be expected to be quite accurate.

For the reduction of the σ* state, we predict a self-exchange inner-sphere reorganisation energy of 43 kJ/mole [147]. This is 20 kJ/mole lower than for plastocyanin [68]. It has been speculated that the reorganisation energy of CuA should be half as large as for a blue-copper site owing to the delocalised electron [136,144,155,156], and older estimates of the reorganisation energy of CuA were in general quite low, 15-50 kJ/mole [157,158]. However, recent experiments have indicated that the reorganisation energy is of the same size as for the blue copper proteins, around 80 kJ/mole [159]. If the outer-sphere reorganisation energy of cytochrome c oxidase is of the same magnitude as for plastocyanin (~40 kJ/mole [99]), our calculated inner-sphere value gives a total of ~83 kJ/mole, in good agreement with the latter experiment.
The reorganisation energy for reduction of the π state is appreciably higher, 69 kJ/mole, which is due to the change in the Cu-Cu bond length and the angles in the CuS2Cu core [147].

Table 6. Bond distances in four electronic states of the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3) model [147] compared to experimental data for CuA and model compounds. All distances are in pm.

Oxidation  Electronic
states     state        Cu-Cu    Cu-Scys  Cu-NHis  Cu-SMet  Cu-O
I+I                     257      233-247  207-211  240      250
           EXAFS (a)    251-252  231-238  195-197
II+I       σ*           248      231-235  202-209  245      220
           π            310      227-236  203-210  242      219
           EXAFS (a)    243-246  229-233  195-203
           crystal (b)  220-258  207-244  177-211  239-302  219-300
           model (c)    293      225-229  211      212      212
II+II                   342      228-234  202-203  242      202
           model (d)    334      233      206      210-226  210-226

(a) EXAFS data [133,151].
(b) Protein crystal structures [128-130,148-150].
(c) A mixed-valence inorganic model synthesised by Tolman and coworkers [145] with a π ground state. Note that both the histidine and axial ligands are amine groups in the model.
(d) Another inorganic model, fully oxidised, synthesised by Tolman and coworkers [152]. Note that each copper ion is five-coordinate with three amine nitrogen ligands.

The potential energy surface for the Cu-Cu bond in the reduced CuA model is almost as flat as in the mixed-valence state (Figure 13). Therefore, the reduction potential of the CuA site cannot change by more than 100 mV by constraints in this bond. In particular, a change in the electronic structure of the mixed-valence state from π to σ* does not change the reduction potential by more than 13 mV. Solvation effects alter the results by less than 20 mV (Figure 13).

Similarly, the potential energy surfaces of the Cu-SMet and Cu-O bonds are also flat (Figure 14). The two bond lengths can vary over a range of almost 100 pm at an energy cost of less than 8 kJ/mole. As for the blue copper proteins, the optimum distance for the carbonyl group is shorter than for the methionine ligand. Thus, it is unlikely that the axial ligands determine the reduction potential of the CuA site [136,146,155].
Even if the protein could constrain these distances, the results in Figure 14 show that the reduction potential would vary by less than 80 mV for the experimentally observed range of these bond lengths. Inclusion of solvation effects does not change the situation significantly [147].

Finally, we have also studied the fully oxidised CuA model (Cu(II)+Cu(II)). This state has not been unambiguously observed in biology yet, but it has been suggested that it is responsible for the differing characteristics of the CuZ site in nitrous oxide reductase [146,160]. Our calculations indicate that the fully oxidised state should have a much longer Cu-Cu bond (342 pm) and a shorter Cu-O bond (202 pm) than the mixed-valence state. This is reasonably similar to a fully oxidised inorganic model complex with bridging thiolate groups and three amine nitrogen ligands on each copper, see Table 6 [152]. In particular, the angles in the CuS2Cu core are very similar, 85° compared to 83° for the S-Cu-S angle. This is quite different from the angles in the mixed-valence σ* state (115°). Therefore, the self-exchange reorganisation energy for the oxidation of this state is high, 133 kJ/mole [147]. Interestingly, the π-bonded structure is more similar to the fully oxidised state, and the corresponding reorganisation energy is also appreciably lower, 90 kJ/mole.

Figure 14. The calculated potential energy surfaces for the Cu-SMet and Cu-O interactions of the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ complex in vacuum [147]. (Curves: Cu-O and Cu-S, each for the reduced and oxidised states; horizontal axis: bond distance.)

It has been suggested that the CuA and CuZ sites in nitrous oxide reductase in fact are the same site, with altered properties as a result of a conformational change [146,160]. If this suggestion is correct, it follows from our results that the conformational change may stabilise the σ* structure in the CuA site, but the π structure in the CuZ site. This would hardly cost anything in energy terms (Figure 13), but it would strongly reduce the inner-sphere reorganisation energy for oxidation of the mixed-valence state [147].

In conclusion, the properties of the CuA dimer are very similar to those of the blue copper proteins. Each copper ion has a trigonal structure with a weakly bound axial ligand. There are two nearly degenerate electronic states, which together with the flat potential of the axial ligands give a very plastic site. The inner-sphere reorganisation energy is slightly lower than for the blue copper proteins and it is achieved by the same mechanisms: delocalisation of the charge between the copper and sulphur ions and flexible bonds to the axial ligands. As for the blue copper proteins, we have not seen any evidence for protein strain in the CuA site.
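The discussion above moves between energies in kJ/mole and reduction potentials in mV. The two scales are linked by ΔE = ΔG/(nF). The following sketch is only an illustration of that conversion for a one-electron process; it is not part of the calculations in [147], and the function names are hypothetical.

```python
# Convert between an energy change (kJ/mole) and the corresponding shift in
# reduction potential (mV) for an n-electron process: dE = dG / (n F).
# Helper names are illustrative and not taken from the original work.

FARADAY = 96485.332  # Faraday constant in C/mol


def kj_per_mole_to_mv(energy_kj: float, n_electrons: int = 1) -> float:
    """Magnitude of the potential shift (mV) for an energy change (kJ/mole)."""
    volts = energy_kj * 1000.0 / (n_electrons * FARADAY)
    return volts * 1000.0


def mv_to_kj_per_mole(potential_mv: float, n_electrons: int = 1) -> float:
    """Magnitude of the energy change (kJ/mole) for a potential shift (mV)."""
    joules_per_mole = potential_mv / 1000.0 * n_electrons * FARADAY
    return joules_per_mole / 1000.0


if __name__ == "__main__":
    for energy in (2.0, 5.0, 8.0):
        print(f"{energy:4.1f} kJ/mole ~ {kj_per_mole_to_mv(energy):5.1f} mV")
    for shift in (13.0, 80.0, 100.0):
        print(f"{shift:5.1f} mV ~ {mv_to_kj_per_mole(shift):4.1f} kJ/mole")
```

On this scale 1 kJ/mole corresponds to roughly 10 mV, so an energy cost of 8 kJ/mole is of the same magnitude as a shift of about 80 mV. This is only an order-of-magnitude check: the potentials quoted in the text come from differences between the potential surfaces of two oxidation states, not from a direct conversion of a single energy.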

7.2 Cytochromes

In nature there are only two major types of electron-carrier sites in addition to the blue copper proteins and the CuA site, viz. cytochromes and iron-sulphur clusters [161,162]. The cytochromes consist of an iron ion bound to a porphyrin ring. Two axial ligands complete the octahedral coordination sphere. During electron transfer, iron alternates between Fe(II) and Fe(III). Several types of cytochromes exist in biological systems, depending on the substituents on the porphyrin ring, the axial ligands, and the number and arrangement of the haem groups in the protein (cytochromes a, b, c, f, etc.) [163]. Their reduction potentials range between -300 and +470 mV [113,114].

Several groups have tried to predict the reduction potentials of various cytochromes using theoretical approaches [51,95,97,114,164-171]. A few groups have also studied the reorganisation energy of these proteins. For example, the outer-sphere reorganisation energy of cytochrome c has been calculated to be 28-100 kJ/mole with various theoretical methods [6,51,93,95,97,100]. The majority of this energy comes from the protein, 70-90 %. It is also clear that the protein reduces the reorganisation energy compared with a haem group in water solution, for which it has been estimated to be 100-160 kJ/mole [95,97,100].

The inner-sphere reorganisation energy of the cytochromes is considered to be low, ranging from negligible to 48 kJ/mole [6,51,93,95,97]. It is normally estimated from the difference between the measured total self-exchange reorganisation energy and the calculated outer-sphere reorganisation energy. Considering the large variation of the latter, and an almost equally large span of experimental estimates (e.g. 70-140 kJ/mole for the self-exchange reorganisation energy of cytochrome c [172-174]), such estimates must be considered very approximate. Alternatively, the inner-sphere reorganisation energy has been estimated from vibrational frequencies and the observed changes in the haem geometry in crystal structures. However, these estimates are also approximate, since the observed changes in the bond lengths to the iron ion are smaller than the uncertainty in the crystal structures.

Therefore, we have investigated the inner-sphere reorganisation energy of iron porphine (the porphyrin ring without any substituents) with different axial ligands [24].
The results presented in Table 7 show that if the axial ligands are uncharged, the reorganisation energy is small, 5-9 kJ/mole, appreciably smaller than for the blue copper proteins (62 kJ/mole). It varies somewhat with the axial ligands. Two methionine ligands (as in bacterioferritin [175]) give the lowest reorganisation energy, whereas the most common sets of ligands (His-His and His-Met, as in the b- and c-type cytochromes [163]) give slightly higher reorganisation energies.

We have also tested a number of charged axial ligands, which have been suggested to be present in haem proteins [113]. These models have appreciably higher reorganisation energies, ranging from 20 kJ/mole (His-Cys) to 47 kJ/mole (His-Tyr). Interestingly, the only combination that has been unambiguously observed in a cytochrome is His-Tyr (in the d1 domain of cytochrome cd1 nitrite reductase [176]). At first glance, the results may indicate that the Tyr ligand in this cytochrome should be protonated. Yet, the reorganisation energy is not larger than observed for blue copper proteins or iron-sulphur clusters [24,68,147], so it cannot be excluded that the Tyr ligand is deprotonated. All other characterised proteins with negatively charged axial ligands are enzymes with a catalytic function rather than electron carriers (often in combination with a five-coordinate iron ion).

Table 7. Geometries and inner-sphere reorganisation energies for a number of cytochrome models calculated by the B3LYP method [24]. The haem group was modelled by Fe(porphine), and Met, His, Amt (amino terminal), Cys, Tyr, and Glu were modelled by S(CH3)2, Im, CH3NH2, SCH3-, C6H5O-, and CH3COO-, respectively. All complexes were assumed to be in the low-spin state in accordance with experiments [162].

Axial ligands  Oxidation  Reorg. energy  Distance to Fe (pm)
1     2        state      (kJ/mole)      Ligand 1  Ligand 2  N
Met   Met      II          2.7           240       240       202
               III         2.1           240       240       202
His   Met      II          4.2           203       243       202
               III         4.1           200       244       201
His   His      II          3.7           205       205       202
               III         4.5           202       203       201
His   Amt      II          4.2           203       208       202-203
               III         4.4           200       205       201-202
His   Cys      II          9.7           211       238       202
               III        10.3           215       222       201-203
His   Tyr      II         21.2           206       199       202-203
               III        25.8           207       184       201-203
His   Glu      II         13.0           205       199       202-203
               III        13.4           207       187       201-202

Much experimental data are available for the structure of small inorganic haem models with various axial ligands [177,178]. From these, it can be concluded that our calculated Fe-Npor, Fe-NHis, Fe-SMet, and Fe-Scys distances are slightly too long, by 2-3, 4-5, 6, and 3 pm, respectively [24]. This probably reflects the accuracy of the B3LYP method. However, it is also clear that the discrepancy is the same (within 1 pm) for the two oxidation states. Therefore, the change in the Fe-ligand bond lengths upon reduction is accurately reproduced in our models, so the calculated reorganisation energies can be expected to be quite reliable. It is also notable that the accuracy of our optimised models is better than what can be expected for a metal site in protein crystallography [179]. Therefore, it is not meaningful to calibrate our results by comparing to a single protein structure.
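The self-exchange values quoted in the text (5-9 kJ/mole for uncharged ligands, 20-47 kJ/mole for charged ones) appear to be the sums of the two oxidation-state entries in Table 7. This is an inference from the numbers rather than something the table states explicitly, so the sketch below should be read as a consistency check. The dictionary layout and the function name are illustrative.

```python
# Inner-sphere reorganisation energies (kJ/mole) from Table 7, keyed by the
# pair of axial ligands: (entry listed for Fe(II), entry listed for Fe(III)).
# Assumption: the self-exchange energy is the sum of the two entries.
TABLE_7 = {
    ("Met", "Met"): (2.7, 2.1),
    ("His", "Met"): (4.2, 4.1),
    ("His", "His"): (3.7, 4.5),
    ("His", "Amt"): (4.2, 4.4),
    ("His", "Cys"): (9.7, 10.3),
    ("His", "Tyr"): (21.2, 25.8),
    ("His", "Glu"): (13.0, 13.4),
}

# Ligands modelled as anions in Table 7 (SCH3-, C6H5O-, CH3COO-).
CHARGED_LIGANDS = {"Cys", "Tyr", "Glu"}


def self_exchange_energy(ligands):
    """Sum of the two oxidation-state components for one ligand pair."""
    fe2, fe3 = TABLE_7[ligands]
    return round(fe2 + fe3, 1)


if __name__ == "__main__":
    for ligands in TABLE_7:
        kind = "charged" if CHARGED_LIGANDS & set(ligands) else "neutral"
        total = self_exchange_energy(ligands)
        print(f"{ligands[0]}-{ligands[1]:<3} ({kind}): {total:4.1f} kJ/mole")
```

Under this assumption the charged-ligand models give 20.0 (His-Cys) to 47.0 kJ/mole (His-Tyr), and the uncharged ones 4.8 to 8.6 kJ/mole, which matches the ranges in the text after rounding.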
We have also investigated how the cytochromes have achieved a low reorganisation energy [24], using methods similar to those used for the blue copper proteins [68]. First, an octahedral geometry is favourable for electron transfer, since there is no change in the angles upon reduction. Second, cytochromes employ nitrogen and sulphur ligands, which form weaker bonds with smaller force constants than oxygen ligands (the reorganisation energy of Fe(NH3)6 is a third of that of Fe(H2O)6). Third, covalent strain in the porphyrin ring decreases the changes in the Fe-Npor distances. For example, if the porphyrin ring is replaced by two molecules of diformamidate (NHCHNH-), a small ligand that has often been used as a reasonable minimal model for the porphyrin ring [180], the equatorial Fe-Npor distances change by 9-10 pm upon reduction, compared to ~1 pm with the full porphine model. This increases the reorganisation energy by 44 kJ/mole.

It is informative to compare the haem group and the blue copper proteins, since we have argued strongly against a reduction of the reorganisation energy by strain in the blue copper proteins [10,12,14,49,63]. The major difference is that the porphyrin ring is held together by strong covalent bonds and is constrained by the aromaticity of the ring, whereas in the protein, the ligands are oriented by weak torsional constraints and non-bonded interactions. Covalent bonds are stronger than metal-ligand bonds, whereas torsions and non-bonded interactions are weaker. Therefore, the iron ion is constrained in the haem group, whereas it is more likely that the protein will distort if the preferences of the metal and the protein differ. Moreover, it must be recognised that if significant strain were involved in the binding of a metal, it would simply not bind; a strain energy of 70 kJ/mole, as has been suggested for the blue copper proteins [12], corresponds to an equilibrium constant of 1.5 × 10^12.
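The link between a strain energy and an equilibrium constant is the standard relation K = exp(ΔG/RT). The sketch below evaluates it; the temperature of 300 K is an assumption, chosen because it gives a value close to the one quoted above, and the function name is illustrative.

```python
import math

GAS_CONSTANT = 8.314462618  # molar gas constant in J/(mol K)


def equilibrium_constant(energy_kj_per_mole: float,
                         temperature_k: float = 300.0) -> float:
    """Ratio K = exp(dG / RT) implied by a free-energy difference dG.

    The default temperature of 300 K is an assumption, not a value
    stated in the text.
    """
    exponent = energy_kj_per_mole * 1000.0 / (GAS_CONSTANT * temperature_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # 13 kJ/mole: upper bound quoted for ligands relative to solution [215];
    # 70 kJ/mole: strain energy suggested for the blue copper proteins [12].
    for energy in (13.0, 70.0):
        k = equilibrium_constant(energy)
        print(f"{energy:5.1f} kJ/mole -> K = {k:.2e}")
```

At 300 K a strain energy of 70 kJ/mole gives K of about 1.5 × 10^12; at 298 K the same energy gives a somewhat larger value, so the exact figure depends on the assumed temperature.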

7.3 Iron-sulphur clusters

Iron-sulphur clusters are the third type of widely available electron-transfer site in biology. They consist of iron ions surrounded by four sulphur ions, either thiolate groups from cysteine residues or inorganic sulphide ions. Regular clusters with one (rubredoxins), two, three, or four (ferredoxins) iron ions are known, as well as a number of more irregular clusters, also with other ligands than cysteine [112,181]. Their reduction potentials vary between -700 and +400 mV [112].

The electronic structure, spectroscopy, and reduction potential have been thoroughly studied for all common classes of iron-sulphur clusters [52,89,182-191]. In particular, Noodleman and coworkers have performed detailed quantum chemical calculations on iron-sulphur clusters in various spin states [192-198]. It is now settled that rubredoxin contains an iron ion in the high-spin state (quintet for Fe(II), sextet for Fe(III)), whereas in the [2Fe-2S] clusters, the two iron ions are both in the high-spin state, but antiferromagnetically coupled to form a singlet or doublet state for the oxidised (III+III) and reduced (mixed-valence II+III) forms, respectively [112,162]. In contrast to the CuA site, the unpaired spin is trapped at one of the iron ions in the mixed-valence state.

However, nobody seems to have studied the reorganisation energy of the iron-sulphur clusters systematically. Therefore, we have initiated an investigation of the inner-sphere reorganisation energy of Fe(SCH3)4, (SCH3)2FeS2Fe(SCH3)2, and (SCH3)2FeS2Fe(Im)2 [24]. The optimised structures and the calculated inner-sphere reorganisation energies are collected in Table 8.

Table 8. Geometries and inner-sphere reorganisation energies for iron-sulphur models calculated by the B3LYP method [24].

Model                  Oxidation  Reorg. energy  Distance to Fe (pm)
                       state      (kJ/mole)      Scys     Si       Fe       NHis
Fe(SCH3)4              II         21.4           242
                       III        18.3           232
  Models [199]         II                        232-238
                       III                       225-228
  Proteins [202,203]   II                        224-236
                       III                       223-233
(SCH3)2FeS2Fe(SCH3)2   II+III     34.3           245-249  225-241  299
                       III+III    41.1           235-237  226-227  285
  Models [204]         III+III                   230-231  219-223  270
  Proteins [205]       III+III                   222-237  211-228  260-278
(SCH3)2FeS2Fe(Im)2     II+III     18.3           233-239  225-229  271      216-220
                       III+III    21.8           227-232  219-230  275      210-212
  Proteins [206,207]   II+III                    222-231  223-235  271      213-223

The Fe-S distances in the Fe(SCH3)4 model increase from 232 to 242 pm when the site is reduced. This 10-pm increase is similar to what is observed in inorganic model complexes, but the average distances are shorter in the models, 227 and 236 pm, respectively [199]. Thus, the Fe-S distances are again 5-6 pm too long, but the change during oxidation is well reproduced. However, in the protein, the distances seem to be even shorter, 226 and 232 pm according to EXAFS experiments, and the change is smaller [112,200]. This is probably an effect of the protein environment, where several backbone amide groups form hydrogen bonds to the Scys atoms [76,201]. COMQUM calculations on rubredoxin show that the protein reduces the calculated average Fe-S distances to 230 and 236 pm for the oxidised and reduced site, respectively [24]. Thus, the hydrogen bonds reduce the bond length more in the reduced than in the oxidised complex, giving an excellent agreement with experiments for the change in the bond length upon reduction.

The calculated inner-sphere reorganisation energy of the Fe(SCH3)4 model in vacuum is 40 kJ/mole. In the proteins, the hydrogen bonding reduces the reorganisation energy by ~12 kJ/mole [24]. The inner-sphere reorganisation energy of rubredoxin has been estimated from the change in Fe-S bond lengths and the corresponding vibrational frequency [208]. The result is ~10 kJ/mole lower than our estimate, which illustrates that the reorganisation energy does not only arise from the changes in these bond lengths.

We have also studied the (SCH3)2FeS2Fe(SCH3)2 complex in its fully oxidised and mixed-valence form as a model of the [2Fe-2S] ferredoxins. The optimised Fe-S distances are 5-10 pm longer than in experiments, again reflecting the systematic error of the B3LYP method. The discrepancy for the Fe-Fe distance is slightly larger, but this is probably an effect of a flexible Fe-Fe interaction, as for the Cu-Cu bond in the CuA site [147]. Our calculated reorganisation energy is 75 kJ/mole, appreciably larger than for the rubredoxin site. This is in accordance with a lower rate of electron transfer for these sites in proteins as well as in model systems [162,204].
At first, the increase in reorganisation energy for the dimeric iron-sulphur clusters (compared to the monomeric rubredoxin site) may seem a bit strange, considering that for the dimeric CuA site, the reorganisation energy decreased compared to the blue-copper monomer. The reason for this behaviour is that the unpaired electron in the mixed-valence iron-sulphur site is localised to one of the iron ions, whereas it is delocalised in the CuA site. It has been suggested that a delocalised dimer should have approximately half the reorganisation energy of the monomer, because of a reduction in the change in the bond lengths upon reduction by a factor of two [144,155,156]. This was essentially what we observed for the CuA site [147]. In the iron-sulphur dimers, the change in the bond lengths upon reduction is not significantly altered. In fact, it is slightly increased around the iron ion that is reduced (13 pm compared to 9 pm for the rubredoxin model), but there are also appreciable changes around the other iron ion (6 pm on average). Even if the force constants are reduced around the reduced iron ion, the number of bonds is doubled. Therefore, the total reorganisation energy of the ferredoxin model increases.

Interestingly, our model of the Rieske iron-sulphur site, (SCH3)2FeS2Fe(Im)2, has an appreciably lower reorganisation energy, 40 kJ/mole. This is due to smaller changes around the iron ions, 2-8 pm (cf. Table 8), and lower force constants of the imidazole ligands.

As for the cytochromes and blue copper proteins, we have also investigated how the iron-sulphur clusters have achieved a low inner-sphere reorganisation energy. First, iron is a better ion than copper, since Fe(II) and Fe(III) have similar preferences for the geometry and coordination number. Moreover, even at a fixed geometry, copper gives a higher reorganisation energy than iron. For example, the reorganisation energy of an octahedral Cu(H2O)6 complex is twice as large as for Fe(H2O)6. This probably reflects the difference in the charge of the two ion pairs.
Second, four ligands give slightly lower reorganisation energies than six, provided that the geometry does not change, since there are fewer bonds. Finally, iron-sulphur sites employ soft and large thiolate ligands, which give smaller reorganisation energies than harder ligands such as water.
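The factor-of-two argument for a delocalised dimer, and the opposite trend in the localised iron-sulphur dimer, can be illustrated with a simple harmonic estimate in which every metal-ligand bond contributes k(Δr)²/2. This is a textbook-style approximation introduced here for illustration only; it is not the method of [24,147], which is based on B3LYP geometry optimisations. The single force constant used for all bonds is an arbitrary placeholder, and the bond-length changes are rounded from the values quoted in the text.

```python
AVOGADRO = 6.02214076e23  # Avogadro constant in 1/mol


def harmonic_reorganisation_energy(bonds):
    """Sum 0.5 * k * dr**2 over all bonds and return the result in kJ/mole.

    bonds: iterable of (force constant in N/m, bond-length change in pm).
    """
    joules = sum(0.5 * k * (dr * 1e-12) ** 2 for k, dr in bonds)
    return joules * AVOGADRO / 1000.0


# Placeholder force constant in N/m (assumption, same for every bond).
K = 150.0

# Monomer: four bonds, each changing by 10 pm upon reduction.
monomer = [(K, 10.0)] * 4
# Delocalised dimer: twice as many bonds, each changing half as much.
delocalised_dimer = [(K, 5.0)] * 8
# Localised dimer: 13 pm at the reduced metal, 6 pm at the other one.
localised_dimer = [(K, 13.0)] * 4 + [(K, 6.0)] * 4

if __name__ == "__main__":
    cases = (
        ("monomer", monomer),
        ("delocalised dimer", delocalised_dimer),
        ("localised dimer", localised_dimer),
    )
    for label, bonds in cases:
        energy = harmonic_reorganisation_energy(bonds)
        print(f"{label:18s}: {energy:5.1f} kJ/mole")
```

Because the energy is quadratic in the bond-length change, halving the change while doubling the number of bonds halves the total, whereas a localised dimer with undiminished changes at one metal and additional changes at the other ends up above the monomer. The absolute numbers depend entirely on the placeholder force constant and should not be compared with Table 8.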

8. PROTEIN STRAIN

The suggestion that proteins use mechanical strain for their function is an old but still viable hypothesis [12,10,209-211]. The classic example of a protein for which strain has been suggested to play a functional role is probably lysozyme [212]. It was originally suggested that this protein forces its substrate to bind in an unfavourable conformation, viz. a conformation similar to the transition state. However, theoretical calculations by Levitt and Warshel convincingly showed that strain has a negligible influence on the rate of this enzyme; instead, the catalytic power is gained by favourable electrostatic interactions in the transition state [50]. This and other cases have led several leading biophysical chemists to argue strongly against strain as an important factor in enzyme catalysis [50,213,214].

To make strain hypotheses testable, it is vital to define what is meant by strain. Warshel has defined strain as distortions caused by covalent interactions (bonds, angles, and dihedrals) and possibly also the repulsive part of the van der Waals interaction [50]. This is close to the intuitive conception of mechanical strain, but it is hard to estimate except in classical simulations of proteins. We have used a wider definition of strain [49]: a change in geometry of a ligand (e.g. a metal coordination sphere) when bound to a protein (it includes effects that normally are not considered as mechanical strain, most prominently electrostatic and solvation effects). This change must be relative to a reference state. We have used the vacuum geometry as the strainless state, but other reference states are also conceivable, e.g. the ligand in water solution. However, such a choice is less well defined. For example, how large changes should be allowed in the reference state: may the number of ligands change? May a water molecule come in as an axial ligand, or as an equatorial ligand, or may it even replace the protein ligands?

It must be recognised that any ligand necessarily acquires slightly different properties when bound to a protein. This is an effect of the trivial fact that a protein is different from vacuum or solution (it has another effective dielectric constant and presents specific electrostatic interactions). Such changes have been estimated for a number of protein-ligand complexes, and Liljefors et al. have argued that the energies involved are less than 13 kJ/mole if the reference state is the ligand in solution [215]. If the reference state instead is the ligand in vacuum, appreciably larger energies are observed.
We have, for example, calculated the energies associated with the change in geometry of the metal site when inserted from vacuum into a protein to be 30-60 kJ/mole for the catalytic and structural zinc ions in alcohol dehydrogenase [75-77], with similar values for the blue copper proteins and iron-sulphur clusters, 16-51 kJ/mole [24,26]. We suppose that the strain hypotheses are intended to deal with systems where the strain is larger than normal and has a functional role. Therefore, we consider distortions smaller than this insignificant, unless there is a clear function of the strain [49].

Originally, the entatic state and the induced rack theories for the blue copper proteins discussed only the rigid protein and the strained cupric conformation, i.e. mechanical strain. However, lately they have started to embrace virtually any modifying effect of the protein. For example, in a recent commentary [63], Gray, Malmström and Williams consider exclusion of water as a "constraining factor". This is a most unfortunate widening and blurring of the concept, making discussions harder. Moreover, seen in that way, all proteins are strained or entatic (i.e. they are adapted to functional advantage [63]), but at the same time such a hypothesis becomes void of any predictive value.

With Warshel's or our definition of strain, we have shown beyond doubt that the cupric structure of the blue copper proteins is not strained to any significant degree [14,33,34], especially if the dynamics at ambient temperatures and the dielectric surroundings of the copper site are considered [26,35,45,54]. The electronic structure explains why protein sites with a cysteine ligand have structures close to a tetrahedron, whereas inorganic complexes are tetragonal [65]. Furthermore, our and other groups have shown that the unusual spectroscopic properties and the high reduction potential of the blue copper proteins are a natural consequence of the covalent nature of the bond between copper and the cysteine thiolate group [33-36,65,83-86,96,119,216]. Similarly, we have shown that the low reorganisation energy is also intrinsic to the blue copper site [26,68]. Clearly, strain is not needed to explain any of the unusual properties of the blue copper proteins and there is no indication that mechanical strain has any functional value for the proteins.

The similarity in structure between the oxidised and reduced forms of the blue copper proteins has often been taken as an argument for the strain hypotheses [63]. However, our results show that this is a natural effect of the copper ligands, especially the cysteine ligand [14]. Likewise, the similarity between metal-substituted blue copper proteins and their native counterparts has been taken as an argument for strain [63,217]. Yet, there is an appreciable variation in the metal-ligand distances for the various proteins, viz. 43, 31, 102, and 103 pm for the bonds to N, Scys, O, and SMet, respectively [63]. This points to a plastic, rather than rigid, metal site. Moreover, the changes reflect the softness of the metal, showing that the metal, rather than the protein, determines the geometry of the site. Similarly, in trans mutations of the copper ligands in azurin have provided strong experimental evidence for a flexible copper site [218].
The fact that the structure of the copper-free form of the blue copper proteins is similar to that of the metal-loaded form has also frequently been taken as an argument for strain. However, this does not show that the copper site is strained. Instead, it may facilitate metal binding [14,77]; if the metal-chelating site were not present before the metal is bound, metal binding would clearly be harder [14,144]. Moreover, the copper-free structure is stabilised by several favourable hydrogen bonds [219-221], showing that the structure is not unnatural. In fact, there is another structure of apo-azurin [57], in which a water molecule occupies the metal site, leading to appreciable changes in the geometry of the site. Again, this points to a substantial flexibility of the metal site. A fourth argument for the strain hypotheses is the difficulty of synthesising small inorganic models that reproduce the geometry and macroscopic properties of the blue copper proteins [66,67]. The most successful models involve strained ligands [222] and the first trigonal model was reported only very recently [223]. However, inorganic modelling of blue-copper sites is full of practical problems [224]. Most prominently, Cu(II) and thiolate ligands tend to disproportionate to form Cu(I) and disulphide. In the proteins, this reaction is inhibited by the bulk of the protein. Second, our calculations show that the stability of trigonal and
tetragonal Cu(II) complexes depends strongly on the ligands. A thiolate ligand is not enough to stabilise a trigonal structure; another weak ligand, such as methionine, must also be present [65]. In fact, there is still no small inorganic model that has the same set of ligands (N2S⁻S⁰) as the blue copper proteins [66,67]. Another problem with small models is that molecules from the solution (e.g. water) may come in and stabilise tetragonal structures and higher coordination numbers [224]. It is illustrative that very few inorganic complexes reproduce the properties of the blue copper proteins [66,67], whereas typical blue-copper sites have been constructed in several proteins and peptides by metal substitution, e.g. insulin, alcohol dehydrogenase, and superoxide dismutase [66]. This shows that the problem is more related to protection from water and dimer formation than to strain. This does not mean that the protein is unimportant for the function of the blue copper proteins. On the contrary, the protein provides the proper ligands to the copper site and protects it from unwanted ligands. This may also involve a restriction of the number of ligands of the copper ion. Typically, Cu(II) binds 4-6 ligands, whereas Cu(I) prefers 2-4, but with the bulky, soft, and negatively charged sulphur ligands, the two oxidation states accept the same coordination number. Second, the protein modifies the dielectric properties of the surroundings of the copper site, thereby reducing the outer-sphere reorganisation energy and modulating the reduction potential of the copper site. Third, the protein offers a proper path or matrix for electron transfer and the docking sites for the donor and acceptor proteins [144]. Clearly, the blue copper proteins also modulate the geometry of the copper site. The rhombic type 1 proteins stabilise a tetragonal structure, whereas the axial type 1 proteins stabilise the trigonal structure of the same copper coordination sphere.
However, the energy needed for such a stabilisation, [...] Leu29 > Val29 (wild type) > Ala29 while on the nanosecond time scale rebinding in the Phe29 mutant was found to be slower than in the Ala29 mutant. Molecular dynamics could, however, be used to explore this phenomenon as well. Short-time simulations performed by Li et al. [130] revealed that Phe29 squeezes the ligand against the heme and therefore recombination should be fast. Long-time simulations, however, pointed out that the phenyl ring prevents the diffusion of the ligand to the nearest cavity (B/G contact) and forces it to another pocket (E/F corner or CD loop). Rebinding from these latter sites is associated with a barrier
and therefore recombination is hindered. Since alanine did not prevent diffusion to the nearest cavity, the recombination rate for this mutant is higher than that obtained for the Phe29 mutant. The effect of mutations was further investigated by Carlson et al., who analysed the NO recombination properties of a number of double mutant myoglobins [68]. Although their previous study suggested that the rate of recombination depends on the volume available to ligands in the heme pocket, experiments with double mutants gave unexpected results. Preparing the double mutant His64Gly/Val68Ala, Egeberg et al. [131] expected that the increased volume of the distal pocket would facilitate the movement of the ligand and that the rate of geminate recombination would therefore be lower than that of the native protein. In fact, however, they found that the rate of recombination was greater than in native myoglobin. Another double mutant, His64Gly/Val68Ile, resulting in a smaller extension of the heme pocket than in the case of His64Gly/Val68Ala, rebound much more slowly instead of recombining faster. The results obtained for these double mutants illustrated another fundamental issue, i.e. the role of the distal His64 in ligand binding, as previously suggested by Campbell et al. [97]. X-ray diffraction studies on native myoglobin revealed that His64 is located near the surface of the protein, and it has been proposed that this residue functions as a gate, controlling access and escape of the ligand [132,133]. The unexpected rebinding results obtained for double mutant myoglobins inspired Gibson et al. to perform molecular dynamics simulations [127]. The crystal structure of the His64Gly/Val68Ala mutant was determined and used as a starting structure for the simulations. Comparing the X-ray structure of the double mutant to that of the native protein, no major rearrangement was detected. Since a methyl group of the Val68 side chain was removed, the binding pocket collapsed slightly.
Mutation of the distal His64, however, showed that a well-ordered water molecule was positioned to interact with the ligand. The ligand-accessible volumes in the mutant and in the native protein were found to be similar, suggesting that the effect of mutations cannot be due to simple volume effects. Molecular dynamics simulations on the His64Gly/Val68Ala and His64Gly/Val68Ile mutants as well as on the native protein were performed in water using all of the crystallographic water molecules. Ten trajectories, each 50 ps long, were calculated for each protein. Analysis of the trajectories obtained for the mutants and a comparison to those of the native protein revealed that steric hindrance has a major role in determining the recombination rate. Recombination in the mutants may be explained in terms of fluctuating free volumes and the structure of the heme pocket. Although the distal His64 usually forms stabilising interactions with the ligand, the authors claimed that its kinetic effect is just the opposite. They found that steric effects on ligand rebinding depend mainly on the positions of side chains at the distal side. This finding is in
contrast to the proposal of Petrich et al. [67], which attributed non-exponential kinetics to proximal effects. The most important result of this mutant study is its emphasis on solvation in geminate rebinding. Simulations on solvated proteins demonstrated that the solvation shell around the protein creates a secondary barrier to the escape of the ligand. Since the mutated His64 is in direct contact with the external solvent, the significant effect of solvation is not unexpected in the mutants. Water molecules of the first solvation shell can replace mutated surface residues (cf. the X-ray structure of the His64Gly mutant) and block potential sites for geminate trapping. As a consequence, this blocking increases the total time necessary for the ligand to escape from the heme pocket. Molecular dynamics simulations performed on other distal side double mutants (mutated at Val68 and Ile107) showed that a pattern of cavities fluctuates and interconverts due to protein motions [134]. The authors suggested that these fluctuations influence the access to the iron atom and therefore affect recombination of the ligand. The positions of helices around the distal pocket were also monitored, and it was demonstrated that these helices accommodate the mobile diatomic ligand, which suggests a mechanism for communication between the heme pocket and the exterior of the protein. Although theoretical calculations reproduced the rebinding properties of mutant myoglobins qualitatively, the lack of a specific iron-ligand potential and the short simulation times did not allow the calculation of absolute rebinding rates. Problems associated with the lack of an iron-ligand potential function were overcome by Elber et al. [130], who constructed the ground and excited state potentials between NO and the heme on the basis of experimental data.
These two crossing potential energy functions were introduced into the conventional CHARMM force field, and classical ground and excited state trajectories were calculated on these surfaces. Crossing between the two states was modelled via the Landau-Zener formula [135]. Switching between binding and non-binding heme potentials involved a switching function defined for nuclear motion on the electronic ground state potential surface. After adjusting the excited state and crossing parameters by trial and error, short time scale simulations reproduced the order of recombination observed experimentally. Qualitative agreement with experimental data allowed the in-depth investigation of protein co-ordinates on a long time scale (0.5 fs). One of the most important conclusions of this study is that no relaxation time for the iron atom longer than a few picoseconds was observed. A similar result was obtained by Eaton et al. simulating NO recombination in myoglobin [100], but the statement is in contrast to that published by Petrich et al. [67], identifying long time relaxation for the iron atom, and also to the results of Kuczera et al., identifying a time-dependent shift in the iron position [125].
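
As an illustration of the surface-crossing step, the following minimal Python sketch evaluates the textbook Landau-Zener expression for a single passage through a crossing, P = exp(-2πH12²/(ħ v |ΔF|)), where P is the probability of remaining on the same diabatic surface. It is not the parametrisation of Ref. [130], which is not given here; the function name, the SI units and all numerical values are assumptions chosen only for illustration.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant in J s

def landau_zener_diabatic_probability(coupling, velocity, slope_difference):
    """Probability of staying on the same diabatic surface in one crossing.

    coupling         : electronic coupling H12 between the two states (J)
    velocity         : speed along the reaction co-ordinate at the crossing (m/s)
    slope_difference : |F1 - F2|, difference of the diabatic slopes (J/m)
    """
    if velocity <= 0.0 or slope_difference <= 0.0:
        raise ValueError("velocity and slope_difference must be positive")
    exponent = 2.0 * math.pi * coupling ** 2 / (HBAR * velocity * slope_difference)
    return math.exp(-exponent)

# Hypothetical values: a stronger coupling makes a hop between the
# diabatic surfaces (probability 1 - P) more likely.
for h12 in (0.0, 1.0e-22, 1.0e-21):
    p_stay = landau_zener_diabatic_probability(h12, 1000.0, 1.0e-9)
    print(h12, p_stay, 1.0 - p_stay)
```

In a trajectory simulation such a probability would typically be compared with a random number each time the crossing region is traversed; how the cited work handled this step is not stated in the text.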

Simulations performed by Henry et al. [136] also support this hypothesis. These authors carried out molecular dynamics simulations independently using a potential function similar to that applied by Li et al. [130]. Investigating the kinetics of NO rebinding to native myoglobin, they used a potential function that switches between non-binding and binding potentials as a function of the position of the ligand. To simulate dissociation and subsequent rebinding, three distinct potential functions were applied. The potential function of the hexacoordinated heme with bound NO (ligand binding potential) contains iron-ligand interaction terms including a Morse potential for the distance between the iron and the nitrogen atom of the ligand. The second potential was designed for the description of the ligand-free heme (ligand free potential). Bonding interactions between the ligand and heme were replaced by non-bonding interactions between NO and the pyrrole nitrogens. The most important, the rebinding potential, included bonding terms and several switching functions. One of these switching functions was applied for the attenuation of bending interactions as the iron-ligand distance increases. Another switching function was used to attenuate van der Waals interactions between the ligand and the pyrrole nitrogens of the heme. This function was constructed so that the rebinding potential switches between the ligand binding and ligand free potentials. The ligand free potential was applied when the ligand was located far from the binding site or its orientation was unfavourable for binding, and it was changed to the binding potential at optimal ligand positions. Based on the analysis of iron displacement along 12 trajectories, Henry and coworkers [136] could not identify time-dependent changes. In agreement with Li et al.
[130], their analysis suggested that there is no contribution to the non-exponential rebinding of NO from conformational substates that differ in iron position and do not interconvert rapidly on the time scale of NO rebinding. Since CO rebinding was found to be slower than that of NO, it is still possible that iron displacement plays some role in the non-exponential rebinding of this ligand. Transition state analysis for NO rebinding, however, clearly demonstrated that this motion is not responsible for the non-exponential rebinding of NO. The kinetic description of NO rebinding was carried out using five sets of 20 trajectories, each originating from a different 200 ps ligated trajectory. These sets are therefore considered as conformational substates for which the kinetic curve was calculated. Progress curves showed only very little difference, i.e. in 96 of 100 trajectories the ligand rebound within 15 ps. Analysing ligand positions along these trajectories, the authors concluded that NO remained in a pocket formed by Phe43, His64, Leu29, Val68 and Ile107, found to be very close to the iron. Four of 100 trajectories indicated that NO can also escape from the heme pocket and suggested a competition with the fast geminate rebinding. The experimental kinetic curve was well described by a double exponential function
with time constants of 28 ps and 280 ps. A comparison between this and the calculated curve revealed that the measured 28 ps relaxation corresponds to binding from the heme pocket. The multi-exponential nature of rebinding, however, suggests the coexistence of multiple pockets inside the protein. The agreement between escape rates calculated from simulations and experiments also supports this hypothesis. In addition to the electronic barrier associated with crossing potential surfaces introduced by Li et al. [130], Henry et al. [136] suggested two further contributions to the free energy of rebinding. One of these is the steric interaction between the ligand and distal side residues; the other is the requirement that the ligand should be localised in a correct orientation for binding. These contributions were approximated using Langevin dynamics, calculating the enthalpy and entropy effects of steric interactions and the entropy effect of optimal ligand positioning. Although this method was considered a rough estimate of the magnitude of these barriers, the results agreed well with early suggestions [119] that the bent orientation of NO allows the binding of this ligand without significant steric barriers.
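
The double exponential description of the experimental curve quoted above can be written down in a few lines. The sketch below is only an illustration: the time constants of 28 ps and 280 ps are taken from the text, whereas the relative amplitude of the two phases is not given there, so the value of `fast_amplitude` (and the function name itself) is an assumption.

```python
import math

def unbound_fraction(t_ps, tau_fast=28.0, tau_slow=280.0, fast_amplitude=0.5):
    """Fraction of photodissociated NO not yet rebound at time t_ps (ps).

    tau_fast, tau_slow : time constants quoted in the text (ps)
    fast_amplitude     : weight of the fast phase (ASSUMED, not from the text)
    """
    slow_amplitude = 1.0 - fast_amplitude
    return (fast_amplitude * math.exp(-t_ps / tau_fast)
            + slow_amplitude * math.exp(-t_ps / tau_slow))

# Tabulate the decay at a few hypothetical delay times.
for t in (0.0, 15.0, 28.0, 100.0, 280.0, 1000.0):
    print(t, unbound_fraction(t))
```

With equal amplitudes the fast phase dominates the first few tens of picoseconds, while the slow phase determines the tail of the curve.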

6. LIGAND MIGRATION

As was already noted when the first X-ray structure of myoglobin was solved by Perutz and Matthews, the protein environment of the heme, if rigid, would prevent the entrance and exit of even small ligands [1]. Thus, protein fluctuations of myoglobin are certainly a crucial part of its biological function. In the first theoretical quest for escape routes, the O2 molecule was treated as a high-energy sphere making its way through the rigid protein matrix of myoglobin. The majority of escaped molecules passed His64, Thr67 and Val68; a second part of the molecules first moved to a pocket bordered by Leu29, Leu61 and Phe33 and then left the protein between Leu61 and Phe33. However, these pathways had very high energy barriers of around 420 kJ/mol [132]. This number could be reduced to about 40 kJ/mol in a fluctuating protein, which would be a biologically acceptable value. Polarity, rather than size, has been identified as important in the contribution of amino acids to the barrier height between the heme pocket and the solvent in a series of photolysis experiments on O2 and CO complexes of several distal side mutants of Mb [128]. Another mutation study suggested that the distal histidine has a role in blocking ligand escape [137]; however, no compelling evidence supported the view that the major pathway in the wild type protein also passes the distal histidine. The results of fluorescence quenching measurements indicated that a rather large portion of the protein can be penetrated by the physiological ligands of the
globins [138,139,140]. This finding has gained theoretical support from xenon binding studies. A molecular dynamics simulation, where all protein atoms were allowed to move, revealed that Xe atoms can take rather complicated routes prior to exit, spending a considerable amount of time in a connecting network of channel-like pathways within the protein interior [141]. It has also been concluded that protein fluctuations cause changes of 3 to 4 % in the protein volume as compared to that determined crystallographically. This leads to changes in the shape, size and location of those cavities that are large enough to host a ligand molecule [142]. All molecular dynamics studies that set out to deal with the problem of migration had to face a decision. Either several ligand trajectories have to be examined in a rigid protein environment, or protein movements have to be allowed, which, however, imposes a time limitation on the number of possible trajectories. These limitations were overcome in the work of Elber and Karplus [126], who applied the time-dependent Hartree approximation to this phenomenon as implemented by Gerber et al. [143]. This made it possible to treat simultaneously an ensemble of photodissociated CO molecules moving through the protein matrix. The CO molecules move in the field of a single set of protein co-ordinates; the protein trajectory, however, is approximate since the protein atoms move in the average field of all CO molecules. The basic hypothesis of the method is that the ligands do not introduce a major perturbation in the protein fluctuations, which are expected to be governed by the interactions within the protein. The kinetic energy of the ligand was increased by heating at fixed time intervals for short periods and then gradually cooling it to room temperature, while the protein temperature was kept near 300 K by velocity scaling.
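
Velocity scaling, as mentioned above for keeping the protein near 300 K, can be sketched as follows. This is a minimal illustration of instantaneous rescaling to a target kinetic temperature, not the protocol of Ref. [126]; the function names, the use of 3N degrees of freedom and the example masses and velocities are all assumptions.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K

def kinetic_temperature(masses, velocities):
    """Instantaneous temperature (K) from masses (kg) and velocities (m/s)."""
    kinetic = 0.5 * sum(m * (vx * vx + vy * vy + vz * vz)
                        for m, (vx, vy, vz) in zip(masses, velocities))
    degrees_of_freedom = 3 * len(masses)  # assumption: no constraints removed
    return 2.0 * kinetic / (degrees_of_freedom * K_B)

def rescale_velocities(masses, velocities, target_temperature):
    """Scale all velocities by one factor so that the temperature hits the target."""
    current = kinetic_temperature(masses, velocities)
    if current <= 0.0:
        raise ValueError("cannot rescale a system with zero kinetic energy")
    factor = math.sqrt(target_temperature / current)
    return [(vx * factor, vy * factor, vz * factor) for vx, vy, vz in velocities]

# Hypothetical example: ten atoms of 2.0e-26 kg with random velocities.
random.seed(1)
masses = [2.0e-26] * 10
velocities = [tuple(random.gauss(0.0, 600.0) for _ in range(3)) for _ in range(10)]
scaled = rescale_velocities(masses, velocities, 300.0)
print(kinetic_temperature(masses, velocities), kinetic_temperature(masses, scaled))
```

The same routine applied to the ligand atoms with a higher, time-dependent target temperature would mimic the periodic heating and cooling of the ligand described in the text.
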
The kinetic energy of the ligand was increased because classical dynamics has been shown to be inefficient in sampling the rare fluctuations of the protein which assist ligand escape [132]. Solvent was not included in the calculations, for which the models were derived from the X-ray structure of carbon-monoxy myoglobin [43]. The energy was minimised using deoxy heme parameters so that only non-bonded interactions were defined between the heme and the CO ligand. Three sets of different simulations (100 ps each) were carried out, each using sixty CO molecules: one where all atoms were free to move, one constraining all protein and heme atoms, and one where the polypeptide chain and heme were rigid but side chains and the ligands were allowed to move. Four significant cavities and one semi-cavity, thus five major pathways within the protein matrix, were identified. These are the (1) EF loop and the N terminal loop (2), A/E helices (3), AB loop and the G helix (4) and the proximal histidine (5) CD loop. The diffusion motion of the ligand could be described by a few-site hopping scheme in which the ligand is trapped for a significant time in individual cavities and then hops to another cavity or finally to the solvent. Even
using the high temperature ligand, only a few molecules were able to escape the rigid or partly restricted protein matrix. Inspection of the trajectories showed that the barriers between the cavities are significantly reduced by the protein fluctuations. Czerminski and Elber studied the diffusion of CO in lupine leghemoglobin, a protein with a similar overall fold to myoglobin but with different, much faster, diffusion rates. Their goal was to compare theoretical estimates of diffusion for the two proteins [144]. The locally enhanced sampling (LES) protocol, quite similar to that used by Elber and Karplus [126], suggested that the mechanism of ligand penetration and escape is different for the two proteins. In myoglobin many alternate routes exist, while in leghemoglobin application of the same technique described the ligand escaping along a well-defined, practically unique path. The results of this LES study were later refined using the self penalty walk (SPW) algorithm [145] developed by Czerminski and Elber [146]. SPW provides a more detailed picture than LES methods since barrier heights for diffusion can be estimated and the quenching of a significant fraction of protein motions that are not coupled to the diffusion of the ligand is made possible. Thus the method helps to elucidate structural features of gate openings. Three distinct reaction coordinates were explored following three diffusion paths. Local properties of the co-ordinates in the vicinity of the CO ligand were found to be similar, supporting the original view of one escape channel in leghemoglobin. The diffusion process consisted of only two steps, in contrast to the few-site hopping model of CO diffusion in myoglobin established by Elber and Karplus [126]. In the first, the ligand jumps to a cavity in the protein matrix, assisted by the tilt of Phe29; then it hops to the exterior, a step in which global translations and rotations of helices C and G are involved.
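
The difference between a two-step escape path and a few-site hopping scheme can be made concrete with a simple first-order rate model. The sketch below is not taken from the cited studies: it propagates site populations with explicit Euler steps, and the site labels and all rate constants are hypothetical values chosen only to show the bookkeeping.

```python
def propagate_populations(rates, populations, dt, n_steps):
    """Propagate site populations under first-order hopping.

    rates[i][j] : rate constant for a hop from site i to site j (1/ps)
    populations : initial population of each site
    dt          : Euler time step (ps); must be small compared with 1/rate
    """
    n_sites = len(populations)
    current = list(populations)
    for _ in range(n_steps):
        change = [0.0] * n_sites
        for i in range(n_sites):
            for j in range(n_sites):
                if i != j:
                    transfer = rates[i][j] * current[i] * dt
                    change[i] -= transfer
                    change[j] += transfer
        current = [p + c for p, c in zip(current, change)]
    return current

# Hypothetical two-step path: heme pocket (0) -> cavity (1) -> solvent (2).
# The solvent is treated as absorbing (no hops back into the protein).
two_step_rates = [
    [0.0, 0.10, 0.0],
    [0.02, 0.0, 0.05],
    [0.0, 0.0, 0.0],
]
final = propagate_populations(two_step_rates, [1.0, 0.0, 0.0], dt=0.01, n_steps=5000)
print(final)
```

A few-site hopping scheme of the kind described for myoglobin corresponds to the same routine with a larger rate matrix in which several cavities are connected to each other and to the solvent.
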
In the first step of the process the barrier is local; in the second step, however, significant coupling to low frequency modes is observed, as shown in a further elucidation of the problem [147]. A number of experimental studies complemented with LES calculations have been carried out on different point mutants of myoglobin, with a diverse set of results that tend in a direction contrary to the many-route escape model. In an experimental and theoretical study of ligand migration in myoglobin (over 25 different oxymyoglobin point mutants were studied by laser flash photolysis and the LES method), Scott and Gibson [148] conclude that secondary docking sites of O2 are unlikely and refer, by exclusion, to the original hypothesis of escape through the histidine gate, as suggested by Case and Karplus [132]. In this picture, after the photolysis the ligand molecules move toward the interior of the protein within the first few picoseconds and then return to the proximity of the iron, either to be recaptured or to escape, giving rise to the so-called primary recombination with a relaxation half-time of some 20 ns. Other ligands move to the site surrounded by Gly25, Ile28, His93, Val68 and Ile107 or to the edge of
the heme below the heme plane, only to return and be recaptured on µs time scales in the so-called secondary recombination process. However, no actual escape route was mapped by the calculations, only several different recombination processes. A quite different escape model emerges from the work of Brunori et al. [149], proposing that the escape of the ligand is through the secondary site in the studied case, which disagrees with the hypothesis of Scott and Gibson [148]. The secondary docking site of small molecular ligands overlapping the Xe(4) site [141] was also identified in a distal side triple mutant myoglobin. The triple mutant of myoglobin was synthesised to rationalise the characteristic difference in O2 dissociation rates between Mb and Ascaris hemoglobin. The latter has an extremely low O2 dissociation constant. The distal pocket of Ascaris hemoglobin differs from that of Mb in only three amino acids; this difference was mimicked by Leu29Tyr, His64Gln and Thr67Arg point mutations. Although the H-bonding pattern stabilising the O2 ligand in Ascaris hemoglobin was reproduced by the mutations, as hoped, the dissociation rate for the triple mutant was still over 200-fold faster than that of Ascaris hemoglobin. To find a plausible explanation, the LES method of Elber and Karplus [126] was applied to study the migration paths of NO (a more reactive ligand with a similar diffusion constant) within the protein interior of both proteins. Simulations were started from the respective crystal structures with all crystallographic water molecules included as explicit TIP3P [73] molecules. Runs were repeated in the presence of xenon as well. Eight trajectories were collected, each over 50 ps. In five runs the ligand cloud stayed close to the binding position, within 4 Å of the iron.
In two runs, after 10 ps the δ-carbon of Ile107 swung around, opening a path communicating with the Xe(4) site, so the ligands could dock into the cavity formed by Gly25, Ile28, Val68, Leu69 and Ile107, approximately 9 Å from the iron. Docking within this secondary site and the partial return of the ligands toward the iron might generate, according to the model, the slow component of the geminate recombination reaction measured for the mutant. In Ascaris hemoglobin, however, a Phe residue is found in place of Ile107 of Mb, which, instead of opening the gate toward the secondary site and subsequently to the escape of the ligand, blocks this path. Therefore, it enhances the geminate recombination of the ligands that stay trapped in the iron-close primary site. The authors propose this effect to be the cause of the unusually low dissociation rates measured for Ascaris hemoglobin.

REFERENCES

1. M.F. Perutz and F.S. Matthews, J. Mol. Biol., 21 (1965) 199.
2. C.L. Nobbs, H.C. Watson and J.C. Kendrew, Nature, 209 (1966) 339.
3. R. Elber and M. Karplus, Science, 235 (1987) 318.
4. R.M. Levy and M. Karplus, Biopolymers, 18 (1979) 2465.
5. M. Karplus and J.N. Kushick, Macromolecules, 14 (1981) 325.
6. M. Levitt, C. Sander and P.S. Stern, J. Mol. Biol., 181 (1985) 423.
7. W. Bialek and R.F. Goldstein, Biophys. J., 48 (1985) 1027.
8. P.G. Debrunner and H. Frauenfelder, Annu. Rev. Phys. Chem., 33 (1982) 283.
9. A. Ansari, J. Berendzen, S.F. Bowne, H. Frauenfelder, I.E.T. Iben, T.B. Sauke, E. Shyamsunder and R.D. Young, Proc. Natl. Acad. Sci. U.S.A., 82 (1985) 5000.
10. S. Swaminathan, T. Ichiye, W. van Gunsteren and M. Karplus, Biochemistry, 21 (1982) 5230.
11. J. Kuriyan, G.A. Petsko, R.M. Levy and M. Karplus, J. Mol. Biol., 190 (1986) 227.
12. J.L. Smith, W.A. Hendrickson, R.B. Honzatko and S. Sheriff, Biochemistry, 25 (1986) 5018.
13. A.M. Lesk and C. Chothia, J. Mol. Biol., 136 (1980) 225.
14. S. Corbin, J.C. Smith and G.R. Kneller, Proteins: Struct. Funct. Genet., 16 (1993) 141.
15. D.L. Stein, Proc. Natl. Acad. Sci. U.S.A., 82 (1985) 3670.
16. K. Kuczera, J. Kuriyan and M. Karplus, J. Mol. Biol., 213 (1990) 351.
17. H. Frauenfelder, G.A. Petsko and B. Bianchi, Nature, 280 (1979) 558.
18. H. Frauenfelder, H. Hartmann, M. Karplus, I.D. Kuntz, J. Kuriyan, F. Parak, G.A. Petsko, D. Ringe, R.F. Tilton, M.L. Conolly and N. Max, Biochemistry, 26 (1987) 254.
19. H. Hartmann, F. Parak, W. Steigemann, G.A. Petsko, D.R. Ponzi and H. Frauenfelder, Proc. Natl. Acad. Sci. U.S.A., 79 (1982) 4967.
20. F. Parak, E.N. Frolov, R.L. Mössbauer and V.I. Goldanski, J. Mol. Biol., 145 (1981) 825.
21. J. Smith, K. Kuczera and M. Karplus, Proc. Natl. Acad. Sci. U.S.A., 87 (1990) 1701.
22. F. Parak, E.W. Knapp and D. Kucheida, J. Mol. Biol., 161 (1982) 177.
23. E.R. Henry, Biophys. J., 64 (1993) 869.
24. W. Nowak, J. Mol. Structure THEOCHEM, 398-399 (1997) 537.
25. J. Smith, K. Kuczera, B. Tidor, W. Doster, S. Cusack and M. Karplus, Physica B, 156-157 (1989) 437.
26. G.R. Kneller and J.C. Smith, J. Mol. Biol., 242 (1994) 181.
27. D.J. Danziger and P.M. Dean, Proc. Roy. Soc. Lond., B236 (1989) 101.
28. M. Schmidt, F. Parak and G. Corongiu, Int. J. Quant. Chem., 59 (1996) 263.
29. W. Gu and B.P. Schoenborn, Proteins: Struct. Funct. Genet., 22 (1995) 20.
30. W. Gu, A.E. Garcia and B.P. Schoenborn, Basic Life Sci., 64 (1996) 289.
31. V. Lounnas and M.B. Pettitt, Proteins: Struct. Funct. Genet., 18 (1994) 133.
32. H. Frauenfelder, F. Parak and R.D. Young, Annu. Rev. Biophys. Biophys. Chem., 17 (1988) 451.
33. Y.F. Krupyanskii, F. Parak, V.I. Goldanskii, R.L. Mössbauer, E.E. Gaubman, H. Engelmann and I.P. Suzdalev, Z. Naturforsch., C37 (1982) 57.
34. P.J. Steinbach and B.R. Brooks, Proc. Natl. Acad. Sci. U.S.A., 90 (1993) 9135.
35. P.J. Steinbach and B.R. Brooks, Proc. Natl. Acad. Sci. U.S.A., 93 (1996) 55.
36. P.J. Steinbach and B.R. Brooks, Chem. Phys. Lett., 226 (1994) 447.
37. B.K. Andrews, T. Romo, J.B. Clarage, M.B. Pettitt and N.G. Phillips, Structure, 6 (1998) 587.
38. C.L. Brooks III, J. Mol. Biol., 227 (1992) 375.
39. J.D. Hirst and C.L. Brooks III, Biochemistry, 34 (1995) 7614.
40. S.E.V. Phillips and B.P. Schoenborn, Nature, 292 (1981) 81.
41. L. Stryer, Biochemistry, W.H. Freeman & Co, New York, 1988.
42. B.A. Springer, S.G. Sligar, J.S. Olson and G.N. Phillips Jr., Chem. Rev., 94 (1994) 699.
43. J. Kuriyan, S. Wilz, M. Karplus and G.A. Petsko, J. Mol. Biol., 192 (1986) 133.
44. X. Cheng and B.P. Schoenborn, J. Mol. Biol., 220 (1991) 381.
45. M.L. Quillin, R.M. Arduini, J.S. Olson and G.N. Phillips Jr., J. Mol. Biol., 234 (1993) 140.
46. F. Young and G.N. Phillips Jr., J. Mol. Biol., 256 (1996) 762.
47. M. Lim, T.A. Jackson and P.A. Anfinrud, Science, 269 (1995) 962.
48. J.T. Sage and W. Jee, J. Mol. Biol., 274 (1997) 21.
49. J.P. Collman, J.I. Brauman, T.R. Halbert and K.S. Suslick, Proc. Natl. Acad. Sci. U.S.A., 73 (1976) 3333.
50. D.A. Case and M. Karplus, J. Mol. Biol., 123 (1978) 697.
51. T. Li, M.L. Quillin, G.N. Phillips Jr. and J.S. Olson, Biochemistry, 33 (1994) 1433.
52. M. Lim, T.A. Jackson and P.A. Anfinrud, J. Chem. Phys., 102 (1995) 4355.
53. J.O. Alben, D. Beece, S.F. Bowne, W. Doster, L. Eisenstein, H. Frauenfelder, D. Good, D. McDonald, M.C. Marden, P.P. Moh, L. Reinisch, A.H. Reynolds, E. Shyamsunder and K.T. Yue, Proc. Natl. Acad. Sci. U.S.A., 79 (1982) 3744.
54. E. Oldfield, K. Guo, J.D. Augspurger and C.E. Dykstra, J. Am. Chem. Soc., 113 (1991) 7537.
55. J. Vojtechovsky, K. Chu, J. Berendzen, R.M. Sweet and I. Schlichting, Biophys. J., 77 (1999) 2153.
56. E.E. Abola, J.L. Sussman, J. Prilusky and N.O. Manning, Methods Enzymol., 277 (1997) 556.
57. J.L. Sussman, L. Lin, J. Jiang and N.O. Manning, Acta Cryst., D54 (1998) 1078.
58. G.S. Kachlova, A.N. Popov and H.D. Bartunik, Science, 284 (1999) 473.
59. S. Bhattacharya and J.T. Lecomte, Biophys. J., 73 (1997) 3241.
60. R.F. Eich, T. Li, D.D. Lemon, D.H. Doherty, S.R. Curry, J.F. Aitken, A.J. Mathews, K.A. Johnson, R.D. Smith, G.N. Phillips Jr. and J.S. Olson, Biochemistry, 35 (1996) 6976.
61. M. Hoshino, K. Ozawa, H. Seki and P.C. Ford, J. Am. Chem. Soc., 115 (1993) 9568.
62. V.S. Sharma, R.A. Isaacson, M.E. John, M.R. Waterman and M. Chevien, Biochemistry, 32 (1993) 3897.
63. N.V. Gordunov, A.N. Osipov, B.W. Day, B. Zayas-Rivera, V. Kagan and N.M. Elsayed, Biochemistry, 34 (1995) 6689.
64. L.J. Ignarro, C.M. Buga, K.S. Wood, R.W. Byrns and G. Chaudhuri, Proc. Natl. Acad. Sci. U.S.A., 84 (1987) 9265.
65. R.M.J. Palmer, A.G. Ferrige and S. Moncada, Nature, 327 (1987) 524.
66. M. Hoshino, K. Ozawa, H. Seki and P.C. Ford, J. Am. Chem. Soc., 115 (1993) 9568.
67. J.W. Petrich, J.C. Lambry, K. Kuczera, M. Karplus, C. Poyart and J.L. Martin, Biochemistry, 30 (1991) 3975.
68. M.L. Carlson, R. Regan, R. Elber, H. Li, G.N. Phillips Jr. and Q.H. Gibson, Biochemistry, 33 (1994) 10497.
69. E.A. Brucker, J.S. Olson, M. Ikeda-Saito and G.N. Phillips Jr., Proteins: Struct. Funct. Genet., 30 (1998) 352). 70. M.A. Lopez and P.A. Kollman, Protein Sci., 2 (1993) 1975. 71. P. Jewsbury and T. Kitagawa, Biophys. J., 67 (1994) 2236. 72. X. Cheng, J.C. Norwell, A.C. Nunes and B.P. Schoenbom, Science, 190 (1975) 568. 73. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey and M.L. Klein, J. Chem. Phys., 79 (1983) 926. 74. P. Jewsbury and T. Kitagawa, Biophys. J., 68 (1995) 1283. 75. P. Jewsbury, S. Yamamoto, T. Minato, M. Saito and T. Kitagawa, J. Am. Chem. Soc., 226 (1994) 11586. 76. P. Jewsbury, S. Yamamoto, T. Minato, M. Saito and T. Kitagawa, J. Phys. Chem., 99 (1995) 12677. 77. D.K. Menyhfird and G.M. Keserfi, J. Am. Chem. Soc., 120 (1998) 79911. 78. M.L. Quillin, R.M. Arduini, J.S. Olson and G.N. Phillips Jr., J. Mol. Biol. 243 (1993) 140. 79. C.W. Rella, K. Rector, A. Kwok, J.R. Hill, H.A. Schwettmann, D.D. Dlott and M.D. Fayer, J. Phys. Chem., 100 (1996) 15620. 80. B. Kushkuley and S.S. Stavrov, Biophys. J., 70 (1996) 1214. 81. B. Kushkuley and S.S. Stavrov, Biophys. J., 72 (1997) 899. 82. G.N. Phillips Jr., M.L. Teodoro, T. Li, B. Smith and J.S. Olson, J. Phys. Chem., 103 (1999) 8817. 83. M. Davis, J. Madura, B. Luty and J.A. McCammon, Program. Comput. Phys. Commun. 62 (1990) 187. 84. A. Gosh and F.D. Bocian, J. Phys. Chem., 100 (1996) 6363. 85. U. von Barth and L.J. Hedin, Phys. Chem., 5 (1972) 1629. 86. X.Y. Li and T.G. Spiro, J. Am. Chem. Soc., 110 (1988) 6024. 87. M.T. McMahon, A.C. DeDios, N. Godbout, R. Salzmann, D.D. Laws, H. Le, R.H. Havlin and E. Oldfield, J. Am. Chem. Soc., 120 (1998) 4784. 88. V.G. Malkin, O.L. Malkina, M.E. Casida and D.R. Salahub, J. Am. Chem. Soc., 116 (1994) 5898. 89. W. Kutzelnigg, U. Fleischer and M. Schindler in, NMR- Basic Principles and Progress, Vol. 28, Springer, Heidelberg, 1990, p. 1965. 90. A.J.H. Wachters, J. Chem. Phys., 52 (1970) 1033. 91. H. Le, J.G. Pearson, A.C. DeDios and E. Oldfield, J. Am. 
Chem. Soc., 117 (1995) 3800. 92. T.G. Spiro and P.M. Kozlowski, J. Am. Chem. Soc., 120 (1998) 4524. 93. R.H. Austin, K.W. Beece, L. Eisenstein, H. Frauenfelder and I.C. Gunsalus, Biochemistry, 14 (1975) 5355. 94. A. Ansari, J. Berendzen, D. Braunstein, B.R. Cowen, H. Frauenfelder, M.K. Hong, I.E.T. Iben, J.B. Johnson, P. Ormos, T.S. Sauke, R. Scholl, A. Schulte, P.J. Steinbach, J. Vittitow and R.D. Young, Biophys. Chem., 26 (1987) 337. 95. A. Ansari, E.E. Dilorio, D.D. Dlott, H. Frauenfelder, P. Langer, H. Roder, T.B. Sauke and E. Shyamsunder, Biochemistry, 25 (1986) 3139. 96. M.D. Chatfield, K.N. Walda and D. Madge, J. Am. Chem. Soc., 112 (1990) 4680. 97. B.F. Campbell, M.R. Chance and J.M. Friedman, Science, 238 (1987) 373. 98. P.A. Anfinrud, C. Han and R.M. Hochstrasser, Proc. Natl. Acad. Sci. U.S.A., 86 (1989) 8387.


99. J.L. Martin, A. Migus, C. Poyart, Y. Lecarpentier, R. Astier and A. Antonetti, Proc. Natl. Acad. Sci. U.S.A. 80 (1983) 173. 100. E.R. Henry, M. Levitt and W.A. Eaton, Proc. Natl. Acad. Sci. U.S.A. 82 (1985) 2034. 101. I. Schlichting, J. Berendzen, G.N. Phillips Jr. and R.M. Sweet, Nature, 371 (1994), 808. 102. T.Y. Teng, V. Srajer and K. Moffat, Nature Struct. Biol., 1 (1994) 701. 103. H. Hartmann, S. Zinser, P. Komninos, R.T. Schneider, G.U. Nienhaus and F. Parak, Proc. Natl. Acad. Sci. U.S.A. 93 (1996) 7013. 104. M. Lim, T.A. Jackson and P.A. Anfinrud, Nature Struct. Biol., 4 (1997) 209. 105. J.O. Alben et al. Phys. Rev. Lett., 44 (1980) 1157. 106. J.E. Straub and M. Karplus, Chem. Phys., 158 (1991) 221. 107. D. Vitkup, G.A. Petsko and M. Karplus, Nature Struct. Biol., 4 (1997) 202. 108. J. Ma, S. Huo and J.E. Straub, J. Am. Chem. Soc., 119 (1997) 2541. 109. J.L. Martin, A. Migus, C. Poyart, Y. Lecarpentier, R. Astier and A. Antonetti, Proc. Natl. Acad. Sci. U.S.A. 80 (1983) 173. 110. J. Meller and R. Elber, Biophys. J., 74 (1998) 789. 111. J.S. Weiner, P.A. Kollman and D.T. Nguyen, J. Comput. Chem., 7 (1986) 230. 112. W.L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc., 110 (1988) 1657. 113. G.M. Keserfi and D. K. Menyhfird, Biochemistry, 38 (1999) 6614. 114. W.C. Still, A. Tempczyk, R.C. Hawley and T. Hendrickson, J. Am. Chem. Soc., 112 (1990) 6127. 115. D.Q. McDonald and W. C. Still, Tetrahedron Lett., 33 (1992) 7743. 116. E.R. Henry, M. Levitt and R.M. Hochstrasser, Proc. Natl. Acad. Sci. U.S.A., 83 (1986) 8982. 117. P.J. Steinbach, A. Ansari, H.J. Berendzen, D. Braunstein, K. Chu, B.R. Cowan, D. Ehrenstein, H. Frauenfelder, J.B. Johnson, D.C. Lamb, S. Luck, J.R. Mourant, G.U. Nienhaus, P. Ormos, A. Xie and R.C. Young, Biochemistry, 30 (1991) 3988. 118. P.A. Cornelius, R.M. Hochstrasser and A.W. Steele, J. Mol. Biol., 163 (1983) 119. 119. A. Szabo, Proc. Natl. Acad. Sci. U.S.A., 75 (1978) 2108. 120. H. Frauenfelder and P.G. 
Wolynes, Science, 229 (1985) 337. 121. J.W. Petrich, C. Poyart and J.L. Martin, Biochemistry, 27 (1988) 4049. 122. K.A. Jongeward, D. Magde, D.J. Taube, J.C. Marsters, T.G. Traylor and V.S. Sharma, J. Am. Chem. Soc., 110 (1988) 380. 123. B.F. Campbell, M.R. Chance and J.M. Friedman, J. Mol. Biol., 262 (1987) 14885. 124. G.N. LaMar, F. Dalichow, X. Zhao, Y. Don, M. Ikeda-Saito, M.L. Chin and S.G. Sligar, J. Biol. Chem., 269 (1994) 29629. 125. K. Kuczera, J.C. Lambry, J.L. Martin and M. Karplus, Proc. Natl. Acad. Sci. U.S.A., 90 (1993) 5805. 126. R. Elber and M. Karplus, J. Am. Chem. Soc., 112 (1990) 9161. 127. Q.H. Gibson, R. Regan, R. Elber, J.S. Olson and T.E. Carver, J. Biol. Chem., 267 (1992) 22022. 128. T.E. Carver, R.J. Rohlfs, J.S. Olson, Q.H. Gibson, R.S. Blackmore, B.A. Springer and S.G. Sligar, J. Biol. Chem., 265 (1990) 20007. 129. T.E. Carver, R.E. Brantley, E.W. Singleton, R.M. Arduini, M.L. Quillin, G.N. Phillips and J.S. Olson, J. Biol. Chem., 267 (1992) 14443. 130. H. Li, R. Elber and J.E. Straub, J. Biol. Chem., 268 (1993) 17908. 131. K.D. Egeberg, B.A. Springer, S.G. Sligar, T.E. Carver, R.J. Rohlfs and J.S. Olson, J. Biol. Chem., 265 (1990) 11788.


132. D.A. Case and M. Karplus, J. Mol. Biol., 132 (1979) 343. 133. K.A. Johnson, J.S. Olson and G.N. Phillips, J. Mol. Biol., 207 (1989) 459. 134. M.L. Carlson, R. Regan and Q.H. Gibson, Biochemistry, 35 (1996) 1125. 135. L. Landau, Z. Phys. Sov., 2 (1932) 46. 136. O. Schaad, H.X. Zhou, A. Szabo, W.A. Eaton and E.R. Henry, Proc. Natl. Acad. Sci. U.S.A., 90 (1993) 9547. 137. T.E. Carver, J.S. Olson, S.J. Swerdon, S. Krzywda, A.J. Wilkinson, Q.H. Gibson, R.S. Blackmore, J.D. Ropp and S. G. Sligar, Biochemistry, 30 (1991) 4697. 138. J.R. Lakowitz and G. weber, Biochemistry, 12 (1983)4171. 139. M.R. Eftink and C.A. Ghiron, Anal. Biochem., 114 (1987) 199. 140. S.E. Englander and N.R. Kallenbach, Quart. Rev. Biophys., 16 (1983) 521. 141. R.F. Tilton Jr., U.C. Singh, S.J. Weiner, M.L. Connolly, I.D. Kuntz Jr., P.A. Kollman, N. Max and D.A. Case, J. Mol. Biol., 192 (1986) 443. 142. R.F. Tilton Jr., U.C. Singh, I.D. Kuntz Jr. and P.A. Kollman, J. Mol. Biol., 199 (1988) 195. 143. R.B. Gerber, V. Buch and M.A. Ratner, J. Chem. Phys., 77 (1982) 3022. 144 R. Czerminski and R. Elber, Proteins: Struct. Funct. Gen., 10 (1991) 70. 145 W. Nowak, R. Czerminski and R. Elber, J. Am. Chem. Soc., 113 (1991) 5627. 146 R. Czerminski and R. Elber, Int. J. Quant. Chem., 24 (1990) 167. 147 G. Verkhiver, R. Elber and Q.H. Gibson, J. Am. Chem. Soc., 114 (1992) 7866. 148 E. E. Scott and Q. H. Gibson, Biochemistry, 36 (1997) 11909. 149 M Brunori, F. Cutruzzola, C. Savino, C. Travaglini-Allocatelli, B. Vallone and Q.H. Gibson, Biophys. J., 76 (1999) 1259.

L.A. Eriksson (Editor), Theoretical Biochemistry - Processes and Properties of Biological Systems. Theoretical and Computational Chemistry, Vol. 9. © 2001 Elsevier Science B.V. All rights reserved.


Chapter 3

Mechanisms for Enzymatic Reactions Involving Formation or Cleavage of O-O Bonds

Per E.M. Siegbahn and Margareta R.A. Blomberg

Department of Physics, Stockholm University, Box 6730, S-113 85 Stockholm, Sweden

Theoretical studies of the important class of enzyme reactions where an O-O bond is either formed or cleaved are described. Photosystem II is the only enzyme that can form O-O bonds from water, and suggested mechanisms for how this might occur are discussed. In contrast, several enzymes are able to cleave O-O bonds. The main examples discussed here are cytochrome oxidase and methane monooxygenase. Other examples described are heme peroxidases, manganese catalase and isopenicillin N synthase. General features are discussed for these reactions, which are shown to usually involve spin-state changes. The appearance of radicals and critical roles of protonations are emphasized.

1. INTRODUCTION

Formation and cleavage of O2 are two of the most fundamental processes in nature and are therefore central in biochemistry [1]. The first organisms that developed the ability to form O2 from water and sunlight were cyanobacteria, which appeared more than two billion years ago. They used water as an unlimited source of protons and electrons in their metabolism and released O2 as a waste product. This led to a rather fast increase of O2 in the atmosphere from small amounts to the present 21% level. Initially, the large amount of O2 was disastrously toxic for the existing organisms. However, organisms soon developed that could use O2 to significantly increase the efficiency of ATP production, as well as for various efficient oxidation processes. Anaerobic glycolysis leads to the overall reaction (1),

C6H12O6 + 2ADP + 2Pi → 2Lactate + 2H+ + 2H2O + 2ATP    (1)

while aerobic metabolism of glucose leads to (2),

C6H12O6 + 38ADP + 38Pi + 6O2 → 6CO2 + 44H2O + 38ATP    (2)
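The ATP stoichiometries of reactions (1) and (2) can be compared with a few lines of Python. This is only an illustrative sketch; the dictionary keys and the function name are hypothetical and simply restate the coefficients written in the two reactions.

```python
# ATP formed per glucose, taken from the stoichiometry of reactions (1) and (2).
ATP_PER_GLUCOSE = {
    "anaerobic glycolysis (1)": 2,
    "aerobic metabolism (2)": 38,
}


def fold_increase(with_o2: int, without_o2: int) -> float:
    """Return how many times more ATP is formed per glucose when O2 is used."""
    return with_o2 / without_o2


gain = fold_increase(
    ATP_PER_GLUCOSE["aerobic metabolism (2)"],
    ATP_PER_GLUCOSE["anaerobic glycolysis (1)"],
)
print(f"ATP yield ratio, aerobic/anaerobic: {gain:.0f}")
```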

There is thus a 19-fold increase in the efficiency of ATP production when O2 is used. The reason for this increase is that O2, unlike other abundant substances, is thermodynamically relatively unstable. The double bond in O2 has a strength of only 118 kcal/mol, as compared, for example, to the sum of the two O-H bond strengths in water of 219 kcal/mol. Therefore, the reaction between O2 and a large number of substances is strongly exothermic. Still, O2 is kinetically quite stable, partly because of its triplet ground state. By far the main biological use of O2 is in respiration, and only a minor fraction is used for oxidizing different substrates.

The area of high accuracy quantum chemical applications on biological systems is relatively new. One obvious reason is that models of biological processes in general have to contain a rather large number of atoms. In the case of reactions involving O2 there is also the additional reason that these processes usually need to be catalyzed by transition metal complexes, and transition metal systems have been regarded as quite difficult to treat by theoretical methods. The major problem in treating reactions involving transition metals accurately is that these are usually associated with large changes of both the dynamical and non-dynamical correlation energies [2]. A decade ago, the high accuracy required to treat transition metal reactions demanded the use of the most advanced ab initio methods, but the use of these was too time-consuming even for small models of biological systems. At the same time conventional Density Functional Theory (DFT) methods were not quite accurate enough. A major change of this situation occurred a few years ago when terms depending on the gradient of the density, in particular for the exchange interaction, were introduced into DFT [3, 4].
This improvement, together with the improvement obtained by introducing a few semi-empirical parameters and a fraction of the Hartree-Fock exchange, has led to an accuracy that is not far away from that obtained by the most accurate ab initio methods at a small fraction of the cost [5].

In the present review, reactions occurring in enzymes involving either formation of O2 or cleavage of O-O bonds will be discussed. Since the field is so new, only a few of these reactions have been studied until now, but these will be described in relatively high detail. Formation of O2 from water is performed in nature by only one system, Photosystem II, containing a tetramanganese cluster. Since there is not yet any X-ray structure available, the character of a theoretical study of PSII is quite different from the other examples discussed here. The main case of O-O bond cleavage discussed is the one that occurs in cytochrome oxidase, which is the terminal enzyme in the respiratory chain. In this case the roles of the protons and electrons involved are critical and will therefore be discussed in detail. Other examples will be the O-O bond cleavages in heme peroxidases, in methane monooxygenase (and ribonucleotide reductase), in manganese catalase and in isopenicillin N synthase. A general feature of these reactions that makes them different from most other reactions is that a change of potential surface, usually leading to a change of spin, is required. The general principles involved in these processes will be emphasized. For theoretical studies involving also other transition metal enzymes and other reactions, see recent reviews [6, 7, 8].

2. METHODS AND MODELS

All studies discussed in this review have used the B3LYP method [5, 9], which is termed a hybrid DFT method since it uses Hartree-Fock exchange in addition to the normal density functionals. The B3LYP functional can be written as

$$F^{\mathrm{B3LYP}} = (1-A)\,F_x^{\mathrm{Slater}} + A\,F_x^{\mathrm{HF}} + B\,F_x^{\mathrm{Becke}} + C\,F_c^{\mathrm{LYP}} + (1-C)\,F_c^{\mathrm{VWN}} \qquad (3)$$

where $F_x^{\mathrm{Slater}}$ is the Slater exchange, $F_x^{\mathrm{HF}}$ is the Hartree-Fock exchange, $F_x^{\mathrm{Becke}}$ is the gradient part of the exchange functional of Becke [3], $F_c^{\mathrm{LYP}}$ is the correlation functional of Lee, Yang and Parr [10] and $F_c^{\mathrm{VWN}}$ is the correlation functional of Vosko, Wilk and Nusair [11]. The A, B and C coefficients were determined [5] using a fit to experimental heats of formation, where the correlation functionals of Perdew and Wang [12] were used instead of $F_c^{\mathrm{VWN}}$ and $F_c^{\mathrm{LYP}}$ in the expression above.
The calculations described here were performed in two steps. For each structure considered, a full geometry optimization was performed using the B3LYP method and standard double zeta basis sets, which for the metals imply the use of non-relativistic effective core potentials (ECPs). The same basis set was used for the B3LYP Hessian calculations. In the second step, the B3LYP energy was evaluated for the optimized geometries using larger basis sets including diffuse functions and a single set of polarization functions on each atom. All calculations were performed with GAUSSIAN-94 [13] or GAUSSIAN-98 [14].

The accuracy of different DFT methods has been tested on the standard G2 benchmark test consisting of the enthalpies of formation of 148 small first and second row molecules [15]. These comparisons show that the B3LYP method is clearly superior to the other DFT methods, with an average deviation from experiment of only 3.11 kcal/mol [15]. This can be compared to the corresponding results of 1.58 and 0.94 kcal/mol, respectively, for the G2 [15] and G3 [16] methods, which are among the most accurate ab initio methods available. For the geometries of a 55 atom subset of the G2 benchmark test, all DFT methods give quite accurate results, perhaps slightly more accurate for the hybrid methods [17]. It is also worth noting that the geometry convergence with basis set is very fast.

Due to the lack of accurate experimental values, much less is known about the accuracy of DFT methods for transition metal complexes. The few systematic theoretical studies that have been performed were recently discussed in a review [7]. For small cationic systems, the average absolute error in calculated M-R bond energies, where M is a first row transition metal and R is H, CH3, CH2 or OH, was found to be in the range 3-5 kcal/mol using B3LYP. For the successive M-CO bond energies in first transition row metal carbonyls, the average error was only 3 kcal/mol, and the results were in most cases within the experimental error bars. A comparison of particular interest for the present review also exists for the case of the O-H bond strength in MnO3(O-H)- [18], where the B3LYP result was found to be in good agreement with experiment. This system is similar to the model systems discussed below for PSII.

When studying biochemical problems it may be important to consider also the modeling of the part of the enzyme that surrounds the part treated quantum mechanically. For the present type of transition metal complexes, it has generally been found that effects coming from outside the metal complex are quite small. These are therefore reasonably well treated by simple dielectric cavity methods.
In some examples discussed here, the polarized continuum model of Tomasi et al. [19] was used, but in most cases the self-consistent isodensity PCM (SCI-PCM) of Wiberg et al. [20], as implemented in the GAUSSIAN-94 [13] program, was used. In this method the solute cavity is determined self-consistently. The dielectric constant of the protein is the main empirical parameter of these models, and in the studies discussed below it was normally chosen to be equal to 4, in line with previous suggestions for proteins. This value corresponds to a dielectric constant of about 3 for the protein itself and of 80 for the water medium surrounding the protein.

A major question in the modeling discussed below for metal enzymes concerns the charge state to be used for the active site complex. This question was discussed in connection with several examples of different models in a recent review [6]. It was concluded that the use of neutral models is in general the preferred procedure for metal complexes in the low-dielectric environment of proteins. The view that these metal complexes are best considered neutral is also common based on experimental experience [21]. Iron dimer complexes, which will be discussed in detail below, are illustrative examples of common situations in enzymes. For methane monooxygenase (MMO) there are very good experimental indications that the iron dimer complexes involved are all neutral. For example, normal charge counting on the reduced Fe2(II,II) complex with four carboxylates, two imidazoles and one water ligand leads to a neutral complex. The same is true for the strongly related Fe2(III,III) oxidized complex of ribonucleotide reductase (RNR) [22], where apart from the above mentioned ligands for reduced MMO, there are also a μ-oxo bridge and another water ligand. Recently, X-ray structures of RNR without the iron centers show that even then the regions of the metal complexes for several different mutants are still neutral [23].
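The charge counting described for the two iron dimer complexes can be written out explicitly. The sketch below assumes the usual formal charges (carboxylate -1, neutral imidazole and water, μ-oxo -2); the dictionary and function names are hypothetical and only serve the illustration.

```python
# Formal ligand charges assumed for the counting (standard formal-charge conventions).
LIGAND_CHARGES = {
    "carboxylate": -1,
    "imidazole": 0,
    "water": 0,
    "mu-oxo": -2,
}


def complex_charge(metal_oxidation_states, ligands):
    """Sum the metal oxidation states and the formal charges of all ligands."""
    metal_part = sum(metal_oxidation_states)
    ligand_part = sum(LIGAND_CHARGES[name] * count for name, count in ligands.items())
    return metal_part + ligand_part


# Reduced MMO: Fe2(II,II) with four carboxylates, two imidazoles and one water.
reduced_mmo = complex_charge([2, 2], {"carboxylate": 4, "imidazole": 2, "water": 1})

# Oxidized RNR: Fe2(III,III) with the same ligands plus a mu-oxo bridge and another water.
oxidized_rnr = complex_charge(
    [3, 3], {"carboxylate": 4, "imidazole": 2, "water": 2, "mu-oxo": 1}
)

print("Reduced MMO model charge:", reduced_mmo)
print("Oxidized RNR model charge:", oxidized_rnr)
```

In both cases the positive charge of the two iron centers is exactly balanced by the anionic ligands, which is the sense in which the text calls these complexes neutral.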

3. FORMATION OF O2

Only one system in nature is capable of forming an O-O bond from water using visible light, and this is Photosystem II, found in green plants, algae and cyanobacteria. The overall reaction is given by (4),

2H2O + 4hν → 4H+ + 4e- + O2    (4)

where the photon wavelength is 680 nm. All attempts to reproduce the chemistry in reaction (4) by laboratory model compounds have so far been unsuccessful. In PSII, the formation of O2 is catalyzed by the water-oxidizing complex (WOC). No X-ray crystallographic study of the WOC exists yet, and essentially all structural information about the complex is therefore derived from EXAFS and EPR studies. It is known that each PSII contains four manganese atoms, one calcium atom and one chlorine atom. Manganese and calcium are essential for O2 formation. Calcium can only be replaced by strontium [24], while the chloride can be replaced by a variety of ions and can even be removed entirely without totally suppressing the activity [25]. Several ideas about the structure of the WOC exist. The leading suggestion based on EXAFS has been a complex with two loosely coupled bis-μ-oxo Mn-dimers [26], but a more tightly coupled complex is not ruled out. EPR seems to generally favor tight complexes [27, 28], but other interpretations also exist [29]. Strontium EXAFS has been interpreted to show two short distances for strontium to manganese, implying the same for calcium in the actual complex [30].

Apart from the direct structural information about the WOC, there are also several other pieces of critical information on which a model of O2 formation can be built. O2 is thus found to be formed in four steps, each one involving absorption of a photon leading to a charge separation in the reaction center [31]. These steps define the so-called S-states of the WOC, where the system starts in S0 and goes up to S4, where O2 is evolved before it returns to the next cycle. The resting state of the enzyme is S1, in which the WOC is EPR inactive and mostly interpreted to be in an Mn4(III,III,IV,IV) state [26]. One of the most important findings for the mechanism of O2 formation is that a neutral tyrosyl radical, TyrZ, is formed in the beginning of each S-state following reduction of P680+ in the reaction center [32]. In each S-state, the TyrZ radical then becomes rereduced, forming a neutral tyrosine, simultaneously with oxidation of the WOC. Independently of the mechanism of this process, which is currently under debate, this is one of the most important experimental findings for the mechanism, since it gives a direct energetic criterion for the oxidation chemistry. It means that the energy available to the water-oxidizing complex in each step is approximately equal to the bond strength of the TyrZO-H bond, which is equal to 86.5 kcal/mol. This energy amount can be modified, but only slightly, by changes of the charge of the cluster and changes in hydrogen bonding occurring during the S-state transitions. Two leading models for the regeneration of TyrZ exist.
In the first model, by Babcock et al. [32], termed the hydrogen abstraction mechanism (see Figure 1a), the tyrosyl radical obtains both the proton and the electron from the manganese complex in a concerted hydrogen atom transfer step. This requires that the O-H bond strengths of tyrosine and water coordinated to manganese are about the same. In the second model, termed the electron transfer model (see Figure 1b), the tyrosyl radical obtains the electron from the manganese complex and the proton from a nearby base. In that model, the water molecules which will eventually form O2 lose their protons to a different base. This model has recently been elaborated further by Junge et al. [33], and some new aspects and modifications of the proton translocation mechanism have been introduced.

Figure 1: Schematic picture of the hydrogen abstraction scheme (a) and the electron transfer scheme (b).

In the first B3LYP study of possible PSII mechanisms, the energetic feasibility of the hydrogen abstraction model was tested [34]. Both monomeric and dimeric manganese model systems were studied, but only 5-coordinated complexes. It was found that, by coordination to a manganese center, the first O-H bond strength of water is lowered to a value 0.2 kcal/mol lower than that in tyrosine. The second hydrogen abstraction energy was quite similar. Since thermoneutrality in the reaction (or a weak exothermicity) is a requirement for the hydrogen abstraction model, these calculations are in accord with this model. It should be added that the results are not inconsistent with the electron transfer model either. Later studies using 6-coordinated complexes have given a somewhat different picture. For all the systems tried, the energy to form terminal Mn-O oxo bonds has been found to be too high (by >10 kcal/mol) in comparison with the TyrO-H bond strength [35, 36, 37].

Several B3LYP studies have been performed to study the mechanism by which the WOC forms the O-O bond. In the initial study [38], already prepared terminal Mn(V)=O oxo bonds were brought towards each other in order to make an O-O bond. In the same study a terminal Mn(V)=O oxo bond was also moved towards a terminal Mn-OH hydroxo bond to form an Mn-OOH ligand. In other unpublished work, attempts to form an O2 molecule from two bridging μ-oxo oxygens were also made. Models with 5- and 6-coordinated manganese centers were investigated. For all these model reactions, very high barriers (above 25 kcal/mol) were obtained, in contrast to the barrier of about 10 kcal/mol found experimentally for PSII. There is one common reason for the high barriers in all cases tried, and this is the difficulty of reaching a point where an oxyl group (oxygen radical) is formed. It was always found that very early in the reactions the Mn=O oxo bond was promoted to an Mn-O• oxyl group, which for the model complexes tried cost too much energy. Later studies on O-O bond formation were therefore focused on the problem of creating oxyl radicals at a sufficiently low energy cost.

After extensive B3LYP investigations an oxyl radical mechanism for O-O bond formation in PSII was formulated [39]. The suggested mechanism includes several new ideas, which were proposed and tested. First, general spin state considerations were shown to lead to the conclusion that formation of O2 most probably will require preformation of an oxyl radical, in line with the experience obtained in the initial search for possible transition states described above. The reasoning was as follows. In a typical weak ligand field redox reaction, in which at least one metal atom changes oxidation state, this will lead to a change of ground state spin. The positions of the excited states before and after the reaction are therefore critical. For a low barrier reaction, either the excited state of the reactant corresponding to the product ground state (the high-spin state), or the excited state of the product corresponding to the reactant ground state (the low-spin state), has to be low lying. In the case of water oxidation the reactant excited state (before O-O bond formation) is expected to be an oxyl radical, since the ground state has a rather weak π-bond to the oxo ligand. This oxyl radical should be very reactive and ideal for formation of the O-O bond. The product excited state, on the other hand, is just a recoupling of the d-shell, which should not help O-O bond formation. This leads to the conclusion that it is the excited state of the reactant that has to be low lying. All model calculations also point in the same direction. In fact, for a sufficiently low barrier, the oxygen radical state of the reactant has to be prepared prior to the step where the O-O bond is formed, which is after the S3 step. No oxidation of manganese should therefore occur going from S2 to S3. In subsequent studies it has furthermore been shown that the oxyl radical appears also on the low-spin state of the reactant, which means that the creation of the oxyl radical cannot be avoided, whichever way the reaction occurs. It can also be added that these spin-state arguments do not change when a complex of several antiferromagnetically coupled manganese centers is involved, simply because the antiferromagnetic coupling is so weak for the relevant type of complexes. As an example of the size of the effects, it has been shown that the O-H bond strength of the hydroxyl ligand in the Mn2(IV,IV)-OH dimer is only 0.5 kcal/mol stronger for the antiferromagnetic than for the ferromagnetic coupling case [34, 37]. The total O-H bond strength was found to be 85.0 kcal/mol with an estimated error of 4 kcal/mol.
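The energetic criterion used in this section, that an O-H bond of the manganese complex should be about as strong as the TyrO-H bond, amounts to a simple difference of bond strengths. The following sketch uses only numbers quoted in the text (86.5 and 85.0 kcal/mol, and the 0.2 kcal/mol lowering found for the 5-coordinated models). The function names are hypothetical, and the 4 kcal/mol tolerance is an assumption borrowed from the quoted error estimate, not a threshold defined by the authors.

```python
TYR_OH_BOND_STRENGTH = 86.5  # kcal/mol, TyrO-H bond strength quoted in the text


def abstraction_energy(donor_oh_bond_strength, acceptor=TYR_OH_BOND_STRENGTH):
    """Energy of X-H + TyrO(radical) -> X(radical) + TyrO-H: bond broken minus bond formed."""
    return donor_oh_bond_strength - acceptor


def is_near_thermoneutral(reaction_energy, tolerance=4.0):
    """Assumed criterion: within the quoted 4 kcal/mol error estimate of zero."""
    return abs(reaction_energy) <= tolerance


cases = {
    # Water on a 5-coordinated Mn center: 0.2 kcal/mol weaker than TyrO-H.
    "5-coordinated Mn-OH2": TYR_OH_BOND_STRENGTH - 0.2,
    # Hydroxyl ligand of the Mn2(IV,IV)-OH dimer.
    "Mn2(IV,IV)-OH": 85.0,
    # 6-coordinated terminal oxo formation: more than 10 kcal/mol too high (lower bound used).
    "6-coordinated terminal oxo": TYR_OH_BOND_STRENGTH + 10.0,
}

for label, bond_strength in cases.items():
    delta = abstraction_energy(bond_strength)
    print(f"{label}: {delta:+.1f} kcal/mol, near thermoneutral: {is_near_thermoneutral(delta)}")
```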

Figure 2: Proposed sequence of the S-states from S0 to S3 for oxygen radical formation in PSII. Protons removed are marked with *.

Figure 3: Optimized Mn3-model structure for the S3 oxygen radical state.

Building on the previous study, where a complex with only one manganese and one calcium center was used [39], an oxyl-radical mechanism has recently been suggested based on more realistic model complexes [37]. These complexes were constructed based on available experimental information, mainly from EXAFS and EPR (see above), and contain three tightly bound manganese centers and a calcium center with two short distances to manganese. No position could as yet be suggested for the fourth, less tightly bound manganese center, and this center was therefore left out of the model. The type of model complex used can be seen in Figure 2, where a tentative position of the fourth manganese has also been indicated. This figure shows the suggested sequence of S-states that resulted from the model study. The model complex contains a central cube with an empty corner, and it is suggested that the essential chemistry occurs in this cube. In the S1 state, the corners are formed from two Mn, one Ca, two μ-oxo and two waters. Water oxidation is suggested to occur by removing protons from the two waters and from a μ-hydroxo group. All computed O-H bond strengths fulfill the requirement that they are close to that of tyrosine, which is the most demanding requirement for the water oxidation chemistry. Calcium has an important chelating role in these processes and makes the O-H bonds sufficiently weak for the abstraction chemistry by the tyrosyl radical. In the S3 state the oxyl radical, required for O-O bond formation, is located in a bridging position in the lower left corner of the cube in Figure 2. This assignment and the suggestion that manganese is not oxidized in the S2 to S3 transition are in line with previous suggestions based on EXAFS, XANES [26], EPR [40], and NMR [41] experiments, but differ from other suggestions based on other XANES experiments [42, 43]. The optimized S3-state structure for the Mn3 model complex is shown in Figure 3. Spins larger than 0.10 are marked in the figure, and it can be seen that these are strongly localized to four centers: the three manganese centers and the oxyl radical center, which has a spin population of 0.90.


Scheme 7. Possible mechanisms for the degenerate rearrangement of the but-3-enyl radical (3).

4.1. Fragmentation-Recombination

The fragmentation-recombination mechanism (step a, Scheme 7) for the rearrangement of the but-3-enyl radical (3) proceeds via a bond fission to give the vinyl radical plus ethylene (collectively referred to as 4), followed by an intermolecular radical addition to form the rearranged product (3'). The energy of TS:3→4 is found to be quite high, nearly 150 kJ mol⁻¹ above the but-3-enyl radical (3), demonstrating that the fragmentation step is energetically unfavorable. We find that the energy increases relatively steeply as the two fragments separate, rising to more than 50 kJ mol⁻¹ above that of 3 at a separation of just 1.8 Å. This finding may be relevant in considerations of the reaction within the confines of the cavity of the active site of the enzyme. The two fragments (4) lie in a relatively shallow energy well (10-20 kJ mol⁻¹ deep), indicating that if fragmentation were to be effected, then the recombination could occur relatively easily. Indeed, recent experiments [70] have given an approximate activation energy for this process of 30 kJ mol⁻¹, only slightly higher than our calculated values.

Table 2. Relative Energies (kJ mol⁻¹)^a of the Species Involved in the Degenerate Rearrangement of the But-3-enyl Radical (3, Scheme 7) at 0 K

Species            CBS-RAD^b    G3(MP2)-RAD(p)
3                      0.0           0.0
TS:3→4               147.5         150.9
4                    137.2         134.6
TS:3→5                42.4          46.3
5                     12.4          17.6
3-H+                   0.0           0.0
TS:3-H+→3'-H+          8.8           8.2

a Energies relative to either 3 or 3-H+. See text.
b Reference [67].

4.2. Addition-Elimination

The presence of a C=C double bond in the migrating group of the but-3-enyl radical introduces the possibility of the addition-elimination mechanism (path b, Scheme 7), where the appropriate intermediate is the previously discussed cyclopropylcarbinyl radical (5). We find a significant preference (ca. 100 kJ mol⁻¹) for the addition-elimination pathway compared with the fragmentation-recombination pathway. Thus it is more favorable, in the gas phase at least, for the migrating HC=CH2 group to stay bonded to the remaining framework rather than to become detached from it. The cyclopropylcarbinyl radical intermediate involved in the addition-elimination mechanism is predicted to lie in a well of depth 30 kJ mol⁻¹.
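The pathway preference and the well depths quoted in Sections 4.1 and 4.2 follow directly from differences between entries of Table 2. The sketch below recomputes them for both levels of theory; the dictionary layout, the ASCII labels such as "TS:3->4" and the function name are hypothetical choices made for this illustration.

```python
# Relative energies (kJ/mol, 0 K) transcribed from Table 2.
RELATIVE_ENERGIES = {
    "CBS-RAD": {"3": 0.0, "TS:3->4": 147.5, "4": 137.2, "TS:3->5": 42.4, "5": 12.4},
    "G3(MP2)-RAD(p)": {"3": 0.0, "TS:3->4": 150.9, "4": 134.6, "TS:3->5": 46.3, "5": 17.6},
}


def pathway_summary(energies):
    """Derive the quantities discussed in the text from one column of Table 2."""
    return {
        # Barrier difference between fragmentation and addition-elimination.
        "preference_for_addition_elimination": energies["TS:3->4"] - energies["TS:3->5"],
        # Depth of the well holding the separated fragments (4).
        "well_depth_fragments_4": energies["TS:3->4"] - energies["4"],
        # Depth of the well holding the cyclopropylcarbinyl radical (5).
        "well_depth_intermediate_5": energies["TS:3->5"] - energies["5"],
    }


for level, energies in RELATIVE_ENERGIES.items():
    print(level)
    for name, value in pathway_summary(energies).items():
        print(f"  {name}: {value:.1f} kJ/mol")
```

Both columns give a preference of slightly more than 100 kJ/mol for addition-elimination, a fragment well between 10 and 20 kJ/mol deep, and a well of roughly 30 kJ/mol for the cyclic intermediate, consistent with the rounded values in the text.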

4.3. Facilitation by Protonation

Guided by a previous study [23, 71], we were encouraged to investigate the facilitation of the concerted 1,2-shift in the but-3-enyl radical by protonation of the migrating group (step c, Scheme 7). Of the two possible protonation sites on the migrating group, we have chosen the terminal carbon for our current investigation (3-H+). The resulting reaction is equivalent to the degenerate rearrangement of a partially ring-opened methylcyclopropane radical cation. The unsubstituted cyclopropane radical cation has received considerable experimental [72] and theoretical [73] attention and is thought to exist as three equivalent ²A₁ partially ring-opened structures. These three equivalent structures are able to interconvert relatively easily via three equivalent ²B₂ structures. Although the symmetry is reduced in the methyl-substituted system, we are able to observe the appropriate 1,2-shift operating by a mechanism analogous to that of the unsubstituted case. The barrier to interconversion for the two methylcyclopropane radical cations (less than 10 kJ mol⁻¹, Table 2) is found to be significantly lower than the barrier for the unassisted addition-elimination.

The energetics for the three pathways discussed for the degenerate rearrangement of the but-3-enyl radical are summarized in Figure 1. The potential energy diagram illustrates the reduced energy requirement upon moving from the fragmentation-recombination pathway to the addition-elimination mechanism. The benefits of substrate protonation are also clearly evident [67].

Figure 1. Schematic G3(MP2)-RAD(p) energy profile for the degenerate rearrangement of the but-3-enyl radical (see Scheme 7). Relative energies (kJ mol⁻¹) in parentheses.
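To give a feeling for what the barrier lowering on protonation could mean kinetically, the sketch below converts a barrier difference into a ratio of rate constants with a simple exponential (Arrhenius/Eyring-type) factor. This is a rough illustration under assumptions that are not made in the text: identical pre-exponential factors for the two pathways, a temperature of 298.15 K, and the 0 K relative energies of Table 2 standing in for activation free energies. The function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def relative_rate(barrier_high, barrier_low, temperature=298.15):
    """Ratio k_low / k_high for two barriers (kJ/mol), assuming equal prefactors."""
    return math.exp((barrier_high - barrier_low) / (GAS_CONSTANT * temperature))


# G3(MP2)-RAD(p) barriers from Table 2: unassisted addition-elimination
# (TS:3->5, relative to 3) versus the protonated 1,2-shift (relative to 3-H+).
ratio = relative_rate(barrier_high=46.3, barrier_low=8.2)
print(f"Estimated rate enhancement on protonation: {ratio:.1e}")
```

Under these assumptions the lower barrier corresponds to a rate enhancement of several orders of magnitude; the number should be read as an order-of-magnitude indication only, since solvent, enzyme environment and entropic effects are ignored.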

5. METHYLMALONYL-CoA MUTASE

The second B12-dependent enzyme to be discussed is methylmalonyl-CoA mutase, which catalyzes the transformation of (R)-methylmalonyl-CoA to succinyl-CoA [69]:

[Reaction 6: structural diagram of (R)-methylmalonyl-CoA converting to succinyl-CoA]    (6)

This step is the culmination of a reaction sequence in which propionyl-CoA, a toxic metabolite derived from the degradation of fats, is removed from circulation. Carboxylation of propionyl-CoA gives (S)-methylmalonyl-CoA, which is epimerized to (R)-methylmalonyl-CoA. Conversion of the (R)-isomer to succinyl-CoA allows further metabolism via the Krebs cycle [74]. Methylmalonyl-CoA mutase is also the only adenosylcobalamin-dependent enzyme known to participate in human metabolism, and as such has received significant study [30, 37, 38].


Acceptance of the bound free-radical hypothesis in this instance results in the radical rearrangement shown in reaction 7 [47, 75]:

[Reaction 7: structural diagram of the rearrangement of the substrate-derived radical 6 to the product-related radical 7]    (7)

As with the 2-methyleneglutarate mutase system, the detailed computational investigation of the methylmalonyl-CoA mutase system is somewhat complex. We therefore continue to use the 'model system' approach, and replace the SCoA and carboxylate groups by hydrogen atoms. This simplification results in the degenerate rearrangement of the 3-propanal radical (8) [69, 76]:

[Reaction 8: structural diagram of the degenerate rearrangement of the 3-propanal radical 8 to 8']    (8)

We have investigated three distinct mechanistic possibilities (see Scheme 8) for the rearrangement shown in reaction 8 [69]. The relative energies are displayed in Table 3 and Figure 2.

[Scheme 8: structural diagram. Pathway a: 8 → TS:8→9 → 9 → TS:9→8' → 8'. Pathway b: 8 → TS:8→8' → 8'. Pathway c: 8 + H+ → 8-H+ → TS:8-H+→8'-H+ → 8'-H+ → 8' + H+.]

Scheme 8. Possible mechanisms for the degenerate rearrangement of the 3-propanal radical (8).


Table 3. Relative Energies (kJ mol-1)a of the Species Involved in the Degenerate Rearrangement of the 3-Propanal Radical (8, Scheme 8) at 0 K

Species            CBS-RAD(p)b    G2(MP2,SVP)-RAD(p)b    G3(MP2)-RAD(p)
8                       0.0               0.0                  0.0
TS:8→9                 93.2              96.1                 95.2
9                      66.9              63.6                 63.3
TS:8→8'                46.9              51.8                 53.0
8-H+                    0.0               0.0                  0.0
TS:8-H+→8'-H+          10.0              12.7                 13.9

a Energies relative to either 8 or 8-H+. See text.
b Reference [69].

5.1. Fragmentation-Recombination

The first pathway (a, Scheme 8) for the rearrangement of the 3-propanal radical involves a homolytic bond fission in 8 to give the formyl radical plus ethylene (collectively referred to as 9) followed by an intermolecular radical addition to form the rearranged product 8'. As was the case in the degenerate rearrangement of the but-3-enyl radical, we find the fragmentation-recombination pathway for reaction 8 to be associated with a relatively high barrier (95.2 kJ mol-1, see Table 3). The separated fragments (9) are found to lie 63.3 kJ mol-1 above the reactant (8).

5.2. Addition-Elimination

The second possible pathway (route b, Scheme 8) involves an intramolecular migration of the formyl group in what is commonly thought of as a two-step process. The first step involves an intramolecular radical addition to the carbonyl carbon to form an intermediate cyclopropyloxy radical (shown in Scheme 8 as TS:8→8'). The three-membered ring can then undergo a ring-opening elimination reaction to give the desired product [77] (the addition-elimination mechanism). We find that the cyclopropyloxy radical lies in a very shallow well (with a depth of 0.3 kJ mol-1) on the electronic potential energy surface, which disappears upon the inclusion of zero-point vibrational energy. We therefore conclude that the cyclopropyloxy radical does not correspond to a stable intermediate and that the addition-elimination pathway is essentially a single-step process (as shown in mechanism b, Scheme 4). The barrier for this intramolecular rearrangement (53.0 kJ mol-1, see Table 3) [78] is considerably lower than that calculated for the fragmentation-recombination pathway.

5.3. Facilitation By Protonation

Encouraged by the results of previous calculations which showed the beneficial effects of protonation in facilitating 1,2-shifts in free radicals ([20, 23, 67, 71] and Section 4.3), and following the specific suggestion of protonating the migrating carbonyl group [23, 71], we investigated the rearrangement of the protonated 3-propanal radical (8-H+). The resulting 1,2-shift of the CHOH group (Scheme 8,


pathway c) is found to proceed, via a single transition structure, with an extremely low barrier (13.9 kJ mol-1) [79]. We believe that this result is particularly important in understanding how methylmalonyl-CoA mutase catalyzes the interconversion of the substrate-derived and product-related radicals [69].

The energetics of the three pathways in Scheme 8 are compared in Figure 2 for the model of the methylmalonyl-CoA-mutase-catalyzed reaction. As for the model discussed for the rearrangement of 2-methyleneglutarate, the fragmentation-recombination mechanism is associated with the largest barrier. The barrier height is decreased on moving to the addition-elimination mechanism and further lowered upon protonation.

[Figure 2: energy-profile diagram; vertical axis: Relative Energy (kJ mol-1). Labelled points: 8, 8-H+ (0.0); TS:8→9 (95.2); 9 (63.3); TS:9→8' (95.2); TS:8→8' (53.0); TS:8-H+→8'-H+ (13.9); 8', 8'-H+ (0.0).]

Figure 2. Schematic G3(MP2)-RAD(p) energy profile for the degenerate rearrangement of the 3-propanal radical (see Scheme 8). Relative energies (kJ mol-1) in parentheses.
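The comparison of the three pathways can be reproduced directly from the tabulated energies. The following is a minimal sketch, assuming the Table 3 values are read as one column per level of theory; the dictionary layout and every identifier (TABLE3, PATHWAY_TS, barriers, ranked) are illustrative names introduced here and do not come from the original work.

```python
# Relative energies (kJ mol-1, 0 K) as read from Table 3 of this chapter.
# The data layout and all identifiers are illustrative assumptions.
TABLE3 = {
    "CBS-RAD(p)": {
        "TS:8->9": 93.2, "9": 66.9, "TS:8->8'": 46.9, "TS:8-H+->8'-H+": 10.0,
    },
    "G2(MP2,SVP)-RAD(p)": {
        "TS:8->9": 96.1, "9": 63.6, "TS:8->8'": 51.8, "TS:8-H+->8'-H+": 12.7,
    },
    "G3(MP2)-RAD(p)": {
        "TS:8->9": 95.2, "9": 63.3, "TS:8->8'": 53.0, "TS:8-H+->8'-H+": 13.9,
    },
}

# Rate-limiting transition structure of each pathway in Scheme 8.
PATHWAY_TS = {
    "fragmentation-recombination": "TS:8->9",
    "addition-elimination": "TS:8->8'",
    "protonation": "TS:8-H+->8'-H+",
}


def barriers(level):
    """Barrier of each pathway relative to its own reactant (8 or 8-H+, both 0.0)."""
    row = TABLE3[level]
    return {name: row[ts] for name, ts in PATHWAY_TS.items()}


def ranked(level):
    """Pathway names ordered from lowest to highest barrier."""
    b = barriers(level)
    return sorted(b, key=b.get)


if __name__ == "__main__":
    for level in TABLE3:
        b = barriers(level)
        order = " < ".join(f"{name} ({b[name]:.1f})" for name in ranked(level))
        print(f"{level}: {order}")
```

Under these assumptions the ordering of the pathways is the same at all three levels of theory, which suggests the qualitative conclusion does not hinge on the particular composite method.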

6. GLUTAMATE MUTASE

The final B12-dependent carbon-skeleton rearrangement to be discussed is catalyzed by glutamate mutase, which involves the interconversion of (S)-glutamate and (2S,3S)-3-methylaspartate:

[Reaction 9: structural diagram of the interconversion of (S)-glutamate and (2S,3S)-3-methylaspartate]    (9)

This reaction represents the first step in the fermentation of glutamate to acetate and butyrate in many clostridia [4, 80]. Once again, accepting the bound free-radical hypothesis leads to the following radical rearrangement:


[Reaction 10: structural diagram of the rearrangement of the substrate-derived radical 10 to the product-related radical 11]    (10)

The reaction catalyzed by glutamate mutase differs from those catalyzed by other carbon-skeleton mutases because of the saturation of the migrating group. Thus, the possibility of a bridged intermediate, which provides a lower energy pathway for the other carbon-skeleton rearrangements, is more difficult to conceptualize. However, since so many experimental similarities have been observed between the enzymes, such as the nature of the reactions catalyzed, the composition of the enzymes, the cofactors required and ESR data points [6, 16], it is desirable to look for a mechanistic link between the radical rearrangements catalyzed by B12-dependent carbon-skeleton mutases. Comparison of possible radical pathways for this reaction with those previously considered for other carbon-skeleton rearrangements will yield insight into whether it is likely that all the carbon-skeleton rearrangements occur through similar pathways, or whether nature has different, equally efficient, ways to deal with related reactions. As discussed for the previous two rearrangements, the carboxylate groups in 10 and 11 were replaced with hydrogen atoms and the computational problem reduces to investigating the rearrangement of the radical derived from propylamine [35]:

[Reaction 11: structural diagram of the degenerate rearrangement of the aminopropyl radical 12 to 12']    (11)

Once again a number of different pathways for the degenerate rearrangement of the aminopropyl radical can be considered (Scheme 9) [35], including pathways that are analogous to those examined as models for the reactions catalyzed by methyleneglutarate mutase (Scheme 7) and methylmalonyl-CoA mutase (Scheme 8).

6.1. Fragmentation-Recombination Pathway for the Rearrangement of the Aminopropyl Radical

Proposals for a "unified mechanism" for all B12-assisted carbon-skeleton rearrangements have focused on the fragmentation-recombination pathway [16, 47]. This proposal is attractive since the formation of separated products for all the reactions can be easily envisioned. The fragmentation-recombination pathway for the rearrangement of the aminopropyl radical (path a, Scheme 9) initially produces ethylene plus the aminomethyl radical (collectively referred to as 13). This step is associated with a high barrier (97.2 kJ mol-1, Table 4). We note that the separated fragments lie in a shallow energy well (36.2 kJ mol-1), indicating


that recombination of the separated products is a favorable process if fragmentation occurs. The prediction of a high barrier for this rearrangement pathway is consistent with our calculations regarding fragmentation-recombination for the model systems used to study the rearrangement of the 2-methyleneglutarate and (R)-methylmalonyl-CoA substrate-derived radicals.

[Scheme 9: structural diagram. Pathways recoverable from the extracted labels: a: 12 → TS:12→13 → 13 → TS:13→12' → 12'. b: 14 → TS:14→15 → 15 → TS:15→14' → 14'. c: 14 → TS:14→16 → 16 → TS:16→14' → 14'. d: 14-H+ → TS:14-H+→14'-H+ → 14'-H+.]

Scheme 9: Possible mechanisms for the degenerate rearrangements of the aminopropyl (12) and iminopropyl (14) radicals.

6.2. Rearrangement of the Iminopropyl Radical

Due to the high energy associated with the fragmentation-recombination mechanism in the (S)-glutamate model system, it is attractive to consider alternative rearrangement pathways. It has been proposed that interactions between a group within the enzyme and the amino group of (S)-glutamate may lead to the formation of an imine and thereby facilitate the rearrangement of the substrate by permitting a cyclic intermediate [81, 82]. There are precedents for the transformation of amines to imines in other enzyme systems [6, 83]. Despite the fact that experimental evidence for the presence of such groups in glutamate mutase remains to be found [39, 53, 81, 84], it is still of interest to investigate the energetics of this reaction pathway and to determine whether it provides a lower


energy route. Thus, three mechanistic pathways will be considered for the rearrangement of the iminopropyl radical (equation 12 and Scheme 9 b, c and d).

[Reaction 12: structural diagram of the degenerate rearrangement of the iminopropyl radical 14 to 14']    (12)

Relative energies for the species involved in this reaction pathway are included in Table 4 and Figure 3.

Table 4. Relative Energies (kJ mol-1)a for the Species Involved in the Rearrangement of the Aminopropyl (12) and Iminopropyl (14) Radicals (Scheme 9) at 0 K

Species               G3(MP2)-RAD(p)
12                          0.0
TS:12→13                   97.2
13                         61.1
14                          0.0
TS:14→15                  118.0
15                         90.2
TS:14→16                   52.4
16                         37.6
14-H+                       0.0
TS:14-H+→14'-H+            19.0

a Energies relative to 12, 14 or 14-H+. See text.
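Several quantities quoted in this section (the barriers for recombination and ring opening, and the barrier differences between pathways) are simple differences of Table 4 entries. The sketch below recomputes them; the key strings and the helper name `difference` are hypothetical labels chosen for illustration only.

```python
# Relative energies (kJ mol-1, 0 K) transcribed from Table 4, G3(MP2)-RAD(p).
# Key names are illustrative; each zero refers to its own reactant (12, 14 or 14-H+).
TABLE4 = {
    "12": 0.0, "TS:12->13": 97.2, "13": 61.1,
    "14": 0.0, "TS:14->15": 118.0, "15": 90.2,
    "TS:14->16": 52.4, "16": 37.6,
    "14-H+": 0.0, "TS:14-H+->14'-H+": 19.0,
}


def difference(upper, lower):
    """Energy gap between two Table 4 entries, rounded to the table's precision."""
    return round(TABLE4[upper] - TABLE4[lower], 1)


DERIVED = {
    # Barrier from separated fragments 15 to the product radical.
    "recombination barrier from 15": difference("TS:14->15", "15"),
    # Barrier from cyclic intermediate 16 to the product radical.
    "ring-opening barrier from 16": difference("TS:14->16", "16"),
    # Extra cost of fragmenting the imine relative to the amine.
    "imine vs amine fragmentation": difference("TS:14->15", "TS:12->13"),
    # Barrier lowering on protonating the iminopropyl radical.
    "lowering on protonation": difference("TS:14->16", "TS:14-H+->14'-H+"),
    # Well depth of fragments 13; the text quotes 36.2, presumably from unrounded data.
    "well depth of 13": difference("TS:12->13", "13"),
}

if __name__ == "__main__":
    for label, value in DERIVED.items():
        print(f"{label}: {value:.1f} kJ mol-1")
```

Because the table is given to one decimal place, differences recomputed this way can deviate by 0.1 kJ mol-1 from values quoted in the text.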

Once again, the fragmentation-recombination pathway (path b, Scheme 9) is associated with a very high energy transition structure (118.0 kJ mol-1) with respect to the reactant radical (14). Even the separated fragments (15) lie 90.2 kJ mol-1 above the reactant. A low barrier (27.8 kJ mol-1) between the separated fragments and product radical indicates that product formation from the separated fragments will occur readily provided fragmentation is achieved. Clearly, this pathway is less favorable than the fragmentation-recombination of the aminopropyl radical. More specifically, not only is the barrier height increased by 20.8 kJ mol-1, but the pathway is also more complicated since an imine must be formed in a preliminary step.

The second possible route for the rearrangement of the iminopropyl radical involves the formation of a cyclic intermediate (16), and the subsequent elimination of the amino carbon to yield the product radical (path c, Scheme 9). The barrier for this addition-elimination pathway (52.4 kJ mol-1) is significantly lower than the barrier for the fragmentation-recombination of the iminopropyl radical (118.0 kJ mol-1) or the aminopropyl radical (97.2 kJ mol-1). Additionally,


the cyclic intermediate is only 37.6 kJ mol-1 higher in energy than the reactant radical and, if formed, is separated by only a small barrier (14.8 kJ mol-1) from the product radical. Therefore, in the gas phase, an intramolecular rearrangement is favored over one involving bond fragmentation for the rearrangement of the iminopropyl radical.

The third possibility for radical rearrangement (path d, Scheme 9) involves protonation of the iminopropyl radical. In contrast to the addition-elimination mechanism for the neutral system, the protonated cyclic structure (TS:14-H+→14'-H+) is found to be a transition structure, rather than a stable intermediate. This transition structure lies 19.0 kJ mol-1 above the reactant radical (14-H+). Protonation of the reactant radical thus leads to a significant reduction in the barrier height (by 33.4 kJ mol-1).

[Figure 3: energy-profile diagram; vertical axis: Relative Energy (kJ mol-1). Labelled points: 12, 14, 14-H+ (0.0); TS:12→13 (97.2); 13 (61.1); TS:13→12' (97.2); TS:14→15 (118.0); 15 (90.2); TS:15→14' (118.0); TS:14→16 (52.4); 16 (37.6); TS:16→14' (52.4); TS:14-H+→14'-H+ (19.0); 12', 14', 14'-H+ (0.0).]

Figure 3. Schematic G3(MP2)-RAD(p) energy profile for the degenerate rearrangements of the aminopropyl (12) and iminopropyl (14) radicals (see Scheme 9). Relative energies (kJ mol-1) in parentheses.

6.3. Hydride Ion Removal from the Aminopropyl Radical

The pathway involving cyclization of a protonated migrating group provides a very appealing alternative to the fragmentation-recombination pathway. Given the lack of evidence for imine formation, it is interesting to note that the formation of a protonated imine (14-H+) in the model system can alternatively arise formally as the result of removal of a hydride ion from the parent (saturated) system (12), aminopropyl (path e, Scheme 9). To examine the feasibility of cation formation through hydride ion abstraction, we obtained an estimate for the barrier for hydride ion removal. Due to complications associated with the gas-phase reaction, we modeled the abstraction by considering the 1-aminoethyl cation abstracting a hydride ion from the neutral aminopropyl radical. Although hydride abstraction might have been expected to be a high-energy process, the calculated barrier for this model reaction is only 13


kJ mol-1, a value small enough to have only a minimal effect on the enzymatic turnover rate.

The pathways discussed for the rearrangements of the aminopropyl and iminopropyl radicals are compared in Figure 3. Once again, the fragmentation-recombination mechanism offers a high-energy route. The benefit of hydride ion removal from the parent system, leading to a protonated imine, is also clearly apparent. As discussed in the following section, trends between the model systems begin to emerge.

7. COMPARISON OF THE MODELS FOR B12-DEPENDENT CARBON-SKELETON MUTASES

To obtain an overview of the reactions catalyzed by B12-dependent carbon-skeleton mutases, we present a comparison of the G3(MP2)-RAD(p) barrier heights for the different pathways considered in the present work as a function of the migrating group (Figure 4).

[Figure 4: barrier-height comparison; vertical axis: Relative Energy (kJ mol-1); horizontal axis: migrating group. Fragmentation-recombination: CH=CH2 (150.9), CH=NH (118.0), CH=O (95.2), CH2-NH2 (97.2). Addition-elimination: CH=CH2 (46.3), CH=NH (52.4), CH=O (53.0). Protonation: CH=CH2 (8.2), CH=NH (19.0), CH=O (13.9).]

Figure 4. Comparison of the G3(MP2)-RAD(p) energy requirements for the fragmentation-recombination, addition-elimination and protonated pathways for the model systems of B12-dependent carbon-skeleton mutases with migrating groups CH=CH2, CH=NH, CH=O and CH2-NH2.
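The barrier heights collected in Figure 4 lend themselves to a small tabulation that makes the trends easy to check. The sketch below is only an illustration: the names FIG4, UPPER_ESTIMATE, fragmentation_order and within_estimate are assumptions introduced here, and the 75 kJ mol-1 threshold is the upper end of the 50 to 75 kJ mol-1 range that the chapter attributes to B12-assisted 1,2-shifts (Section 2.4).

```python
# G3(MP2)-RAD(p) barrier heights (kJ mol-1) read from Figure 4 and Tables 3 and 4.
# Identifiers and layout are illustrative assumptions.
FIG4 = {
    "CH=CH2": {"fragmentation-recombination": 150.9,
               "addition-elimination": 46.3, "protonated": 8.2},
    "CH=NH": {"fragmentation-recombination": 118.0,
              "addition-elimination": 52.4, "protonated": 19.0},
    "CH=O": {"fragmentation-recombination": 95.2,
             "addition-elimination": 53.0, "protonated": 13.9},
    # Only the fragmentation route was characterized for the saturated group.
    "CH2-NH2": {"fragmentation-recombination": 97.2},
}

# Upper end of the barrier range estimated from reaction rates (Section 2.4).
UPPER_ESTIMATE = 75.0


def fragmentation_order():
    """Migrating groups sorted by decreasing fragmentation-recombination barrier."""
    return sorted(FIG4, key=lambda g: FIG4[g]["fragmentation-recombination"],
                  reverse=True)


def within_estimate(limit=UPPER_ESTIMATE):
    """(group, pathway) pairs whose gas-phase model barrier does not exceed limit."""
    return [(g, p) for g, paths in FIG4.items()
            for p, barrier in paths.items() if barrier <= limit]


if __name__ == "__main__":
    print(" > ".join(fragmentation_order()))
    for group, pathway in within_estimate():
        print(f"{group}: {pathway} ({FIG4[group][pathway]:.1f})")
```

With these inputs, none of the fragmentation-recombination barriers falls at or below the threshold, whereas every characterized addition-elimination and protonated pathway does.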

Some important trends are apparent in Figure 4. In the first place, the fragmentation-recombination barrier heights for the model systems have consistently high values of between approximately 95 and 150 kJ mol-1. The fragmentation-recombination barrier height depends on the migrating group, with barriers decreasing in the order CH=CH2 > CH=NH > CH2NH2 > CH=O. This


trend presumably reflects differences in the stability of the radical fragment in the high-energy route. Since the barrier heights for B12-assisted 1,2-shifts are estimated from the reaction rates to fall within or below a range of 50 to 75 kJ mol-1 (see Section 2.4), for the fragmentation-recombination mechanism to be plausible, the enzyme would be required to substantially reduce the activation barrier for this route. How the enzyme could perform such a feat is not immediately apparent. Although the calculations on the model systems cannot be used to rule out the fragmentation-recombination pathway, the high barrier implies that an alternative mechanism may be important.

The intramolecular addition-elimination mechanism, which is possible when the migrating group is unsaturated, provides a lower energy pathway than fragmentation-recombination. Whether or not the three-membered cyclic structure associated with this pathway is a transition structure (3-propanal radical rearrangement) or a stable intermediate (but-3-enyl and iminopropyl radical rearrangements) does not affect this general conclusion. The barriers for the addition-elimination route lie between 46 and 53 kJ mol-1, significantly less than those calculated for the fragmentation-recombination pathway. A pathway involving a cyclic intermediate could not be characterized for the CH2-NH2 migrating group, possibly due to the high energy expected for such a structure.

The addition-elimination barriers fall within the range estimated for B12-dependent 1,2-shifts, but the pathway becomes energetically still more favorable when the migrating group is protonated. In fact, the barrier heights for the protonated pathways of all the model systems with unsaturated migrating groups fall below 20 kJ mol-1. This protonation alternative is very appealing and could provide a clue as to how these demanding reactions are catalyzed by the enzymes. The applicability of this finding to enzyme catalysis is considered in the following section.

8. THE PARTIAL-PROTON-TRANSFER CONCEPT

The results for the models of carbon-skeleton rearrangements suggest that protonation of the migrating group would facilitate the reactions. However, the concept of substrate protonation, while energetically attractive, carries with it the problem that it is difficult to achieve substantial protonation of a weak base with the weakly acidic groups available to enzymes [85]. For example, the pKa of the conjugate acid of a thioester carbonyl oxygen is estimated to be around -6 [86], so even the strongest conceivable acid in a protein cannot be expected to generate a substantial concentration of protonated substrate in the reaction catalyzed by methylmalonyl-CoA mutase (reaction 6). Owing to the problems associated with mechanisms involving full protonation, we have considered whether partial-proton-transfer would be sufficient to activate the migrating group [87]. To investigate such behavior, we examined the interaction of the 3-propanal radical with a set of representative acids (NH4+, HF and H3O+) with the CBS-RAD(p) method [69]. This choice encompasses a wide range of acid strengths, as measured by the proton affinities (PAs) of the conjugate bases (F- = 1556.0, NH3 = 848.6, and H2O = 680.1 kJ mol-1) [88]. The geometries of the substrate-acid complexes suggest partial-proton-transfer through changes in the C=O and acidic proton (X-H) bond lengths. The


distance between the acidic proton and the carbonyl oxygen of the 3-propanal radical in the relevant complexes is the most direct measure of the degree of proton transfer to oxygen. We find that this distance decreases across the acid series from infinity (no protonation), to 1.727 Å (HF), 1.503 Å (NH4+), 1.046 Å (H3O+), and 0.976 Å (full protonation). A similar trend (but in the opposite direction) is found for the C=O bond length, with HF causing a slight lengthening to 1.221 Å, NH4+ to 1.235 Å, while H3O+ imparts the largest effect with a calculated carbonyl bond length of 1.273 Å. This same monotonic trend can be found for several of the other geometric parameters. The most striking consequence of the transition from non-protonation to complete protonation of the carbonyl group in the 3-propanal radical is the monotonic lowering of the barrier to migration of the formyl group (see Figure 5) [69, 89].

[Figure 5: energy-profile diagram; vertical axis: Relative Energy (kJ mol-1); horizontal axis from Reactant to Product. Barriers: no protonation (46.9), HF (41.4), NH4+ (24.5), full protonation (10.0).]

Figure 5. Schematic CBS-RAD(p) energy profiles showing barriers (kJ mol-1) for the rearrangement of the 3-propanal radical (8) assisted by acids of varying strength.
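The geometric and energetic trends across the acid series can be collected in one place and checked for monotonic behavior. This is a minimal sketch using only values quoted in this section (the H3O+ barrier of 10.3 kJ mol-1 is the one reported in the text rather than in Figure 5); the tuple layout and the names ACID_SERIES, strictly_decreasing, strictly_increasing and barrier_lowering are assumptions made for illustration.

```python
import math

# CBS-RAD(p) data for the 3-propanal radical with acids of increasing strength.
# Fields: label, O...H distance (Å), C=O bond length (Å, None where not quoted),
# rearrangement barrier (kJ mol-1). Layout and names are illustrative.
ACID_SERIES = [
    ("no protonation", math.inf, None, 46.9),
    ("HF", 1.727, 1.221, 41.4),
    ("NH4+", 1.503, 1.235, 24.5),
    ("H3O+", 1.046, 1.273, 10.3),
    ("full protonation", 0.976, None, 10.0),
]


def strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


def barrier_lowering():
    """Barrier reduction relative to the unprotonated radical, per acid."""
    reference = ACID_SERIES[0][3]
    return {label: round(reference - barrier, 1)
            for label, _, _, barrier in ACID_SERIES[1:]}


if __name__ == "__main__":
    oh = [row[1] for row in ACID_SERIES]
    co = [row[2] for row in ACID_SERIES if row[2] is not None]
    ea = [row[3] for row in ACID_SERIES]
    print("O...H distance shrinks:", strictly_decreasing(oh))
    print("C=O bond lengthens:   ", strictly_increasing(co))
    print("barrier falls:        ", strictly_decreasing(ea))
    for label, drop in barrier_lowering().items():
        print(f"{label}: barrier lowered by {drop:.1f} kJ mol-1")
```

Differences formed from the one-decimal barriers may differ by 0.1 kJ mol-1 from figures quoted elsewhere in the chapter.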

As might have been expected from the barriers in the extreme cases (Figure 4 and Tables 2-4), a greater degree of proton transfer is associated with a lower barrier to rearrangement. The acidity of H3O+ is sufficient to result in a barrier (10.3 kJ mol-1) virtually identical to that calculated for full protonation, while the barrier with HF as the acid (41.4 kJ mol-1) shows that even a small amount of proton transfer can result in a significant decrease in the barrier for migration. With the ammonium ion, the moderately high proton affinity of ammonia maintains the relatively strong binding of the proton while allowing sufficient proton transfer to facilitate the rearrangement, to the extent that the barrier is reduced to 24.5 kJ mol-1. In the context of enzymatic catalysis, this situation might be regarded as ideal since significant barrier lowering can be achieved without deprotonation of the enzyme. It is possible to phrase the partial-proton-transfer concept in the same language of hydrogen bonding that has been employed in the current debate as to whether


or not "low-barrier" hydrogen bonds (LBHBs) or "short strong" hydrogen bonds (SSHBs) can be important in enzymatic catalysis [90]. The discussion has focussed on concepts such as the pKa matching of the H-bonding donor and acceptor [91], the positioning of the shared proton [92], the distance between the donor and acceptor atoms [93, 94], the strength of the hydrogen bond [94, 95] and the non-existence of "short-strong" H-bonds under certain solvation conditions [96, 97]. We believe that our results and their interpretation make an important contribution to this debate and that it is conceptually instructive to examine the "low-barrier" hydrogen-bonding hypothesis in terms of partial-proton-transfers. The lowering of a reaction barrier by protonation is equivalent to saying that the transition structure interacts more favorably with the proton than does the reactant. For example, at the CBS-RAD(p) level, the energy of the transition structure (TS:8→8') is lowered by 825.1 kJ mol-1 upon protonation while the 3-propanal radical (8) has a proton affinity of 788.2 kJ mol-1. The difference between these two energies of 36.9 kJ mol-1 is exactly the reduction in barrier associated with protonation. The same concept applies to partial protonation. We note in the first place that the gas-phase hydrogen bond between the 3-propanal radical and NH4+ is quite strong (96.9 kJ mol-1), despite the fact that the proton transfer between donor and acceptor is described by a single, asymmetric energy well. However, the 22.3 kJ mol-1 lowering of the rearrangement barrier (corresponding to a rate increase of ca. five orders of magnitude) by NH4+ comes not from the strength of this hydrogen bond but from the fact that the interaction between NH4+ and the transition structure (119.2 kJ mol-1) is 22.3 kJ mol-1 stronger than its interaction with the reactant, due to the higher 'proton affinity' of the former species.
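The bookkeeping behind this argument is a simple thermodynamic cycle: the change in barrier equals the stabilization of the transition structure minus the stabilization of the reactant. The sketch below applies that relation to the CBS-RAD(p) numbers quoted in this section; the function names and the constant BARRIER_UNASSISTED are hypothetical labels chosen here for illustration.

```python
# Unassisted CBS-RAD(p) barrier for TS:8->8' (kJ mol-1, Table 3).
BARRIER_UNASSISTED = 46.9


def barrier_reduction(stabilization_ts, stabilization_reactant):
    """Barrier lowering when an acid (or proton) binds the transition
    structure more strongly than it binds the reactant."""
    return stabilization_ts - stabilization_reactant


def assisted_barrier(stabilization_ts, stabilization_reactant,
                     unassisted=BARRIER_UNASSISTED):
    """Barrier remaining after the differential stabilization is removed."""
    return unassisted - barrier_reduction(stabilization_ts, stabilization_reactant)


if __name__ == "__main__":
    # Full protonation: 'proton affinity' of the TS versus that of radical 8.
    full = barrier_reduction(825.1, 788.2)
    # Partial proton transfer: NH4+ binding to the TS versus binding to radical 8.
    nh4 = barrier_reduction(119.2, 96.9)
    print(f"full protonation: lowering {full:.1f}, "
          f"barrier {assisted_barrier(825.1, 788.2):.1f} kJ mol-1")
    print(f"NH4+ complex:     lowering {nh4:.1f}, "
          f"barrier {assisted_barrier(119.2, 96.9):.1f} kJ mol-1")
```

The cycle makes explicit that the absolute strength of the hydrogen bond cancels out; only the difference between its interaction with the transition structure and with the reactant enters the barrier.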
This reasoning is supported by the geometric parameters, in that the degree of proton transfer to the transition structure is greater than it is to the reactant. In an enzymatic reaction facilitated by protonation, the proton-accepting site will generally carry some small amount of negative charge, making it a good candidate for binding to a proton donor in the protein via a hydrogen bond. If such a hydrogen bond exists and remains intact during the course of the reaction then, regardless of the strength of the H-bond donor, the barrier will be lowered simply because the transition structure interacts more strongly with the proton than does the substrate. The transition from a "weak" hydrogen bond to a "short-strong" hydrogen bond is continuous and, regardless of where a particular H-bonding interaction happens to lie on this scale, there will be a contribution to the lowering of the barrier made by partial-proton-transfer. Our thesis is simple: any reaction that is facilitated by protonation will also be facilitated (to a moderated extent) by the partial-proton-transfer that enzymatic hydrogen bonding can provide. It is unlikely that a given partial-proton-transfer would be overly efficient in aqueous solution. In much the same way as has been argued in the context of the LBHB hypothesis [97], the hydrogen bonding donor/acceptor properties of water and the entropic disorder associated with such a solution are likely to disrupt the hydrogen bond. However, the active sites of many enzymes are sequestered from bulk water, at least to some extent, and may therefore provide


an environment well suited to hydrogen bonding relatively undisrupted by solvent. In particular, the active site of methylmalonyl-CoA mutase has been shown to be deeply buried and largely inaccessible to solvent [37], seemingly providing such a tailored environment. Furthermore, the X-ray crystal structures [37, 38] indicate that an active site histidine residue (His244) is in a position to bind the carbonyl oxygen of the substrate by means of a hydrogen bond. We suggest that this hydrogen bond not only serves to bind the substrate but also provides partial-proton-transfer for catalysis. In this way, the enzyme can take advantage of the proton-induced barrier-lowering that is available for the intramolecular rearrangement, without resorting to the extreme of full protonation. Although we have only discussed the partial-proton-transfer model for one of the B12-dependent carbon-skeleton mutases [69], we can expect that there also exists a continuum between no protonation and full protonation of the substrate in the other reactions. Analogously, partial hydride removal from the substrate of glutamate mutase may serve to facilitate this rearrangement.

9. CONCLUSIONS

The main focus of the current chapter is to gain a greater understanding of the radical rearrangement step in reactions catalyzed by B12-dependent carbon-skeleton mutases. Through a 'model system' approach, estimates of the barrier heights associated with a variety of radical rearrangement pathways were obtained from high-level molecular orbital calculations. General trends through a series of model B12-assisted carbon-skeleton rearrangements are apparent. Foremost, a recently suggested mechanism involving complete detachment of the migrating group from the rest of the molecule (i.e., fragmentation-recombination) is found to require significantly more energy than an intramolecular pathway (i.e., addition-elimination).
In addition, protonation of the substrate reduces the addition-elimination barrier height, thus identifying a way for the enzyme to facilitate these otherwise energetically demanding reactions. Although full protonation may not be feasible within the enzymatic environment, our calculations show that partial-proton-transfer from the enzyme to the substrate can provide a significant reduction in the energy requirement for the rearrangement. Evidence that this mechanism is plausible in the case of the reaction catalyzed by methylmalonyl-CoA mutase is provided by the X-ray crystal structure. Although the effects of the carboxylate groups, and other substituents, were not discussed in this article, this represents an important area of ongoing research. Accounting for these groups can provide additional information about the enzyme-catalyzed reactions, such as the exothermicity and stereochemistry. Preliminary results indicate that, although there are differences in the details between models that neglect and account for the carboxylate groups, in most cases the small models provide an adequate description of the gas-phase rearrangements. In some instances, the magnitude of the barrier reduction between different pathways is diminished, but the reduction nevertheless remains. Despite the abundance of literature on coenzyme-B12 and the reactions it catalyzes, the field remains open. We hope that the present article provides useful


insights into the substrate chemistry in B12-dependent carbon-skeleton rearrangements and allows informed speculation about the function of the related enzymes.

ACKNOWLEDGEMENTS

We thank Professor Bernard Golding for his insightful contributions to our general program of theoretical studies of enzyme-catalyzed reactions, and thank the Australian National University Supercomputing Facility for generous allocations of computer resources.

REFERENCES

1. B. Kräutler, D. Arigoni and B. T. Golding, Vitamin B12 and B12-Proteins; Wiley-VCH: Weinheim, 1998.
2. (a) E. L. Rickes, N. G. Brink, F. R. Koniuszy, T. R. Wood and K. Folkers, Science, 107 (1948) 396. (b) E. L. Smith and L. F. Parker, Biochem. J., 43 (1948) 7.
3. R. West, Science, 107 (1948) 398.
4. H. A. Barker, H. Weissbach and R. D. Smyth, Proc. Natl. Acad. Sci. USA, 44 (1958) 1093.
5. B. T. Golding, Chem. Br., 26 (1990) 950.
6. B. T. Golding and W. Buckel, Comprehensive Biological Catalysis, M. L. Sinnott (ed.), Academic Press, London, 1997, Vol. III, pp 239.
7. (a) D. C. Hodgkin, J. Kamper, M. Mackay, J. Pickworth, K. N. Trueblood and J. G. White, Nature, 178 (1956) 64. (b) P. G. Lenhart and D. C. Hodgkin, Nature, 161 (1961) 937.
8. (a) K. Bernhauer, O. Müller and F. Wagner, Angew. Chem. Int. Ed. Engl., 3 (1964) 200. (b) A. Eschenmoser, Chem. Soc. Rev., 5 (1976) 377. (c) A. Eschenmoser, R. Scheffold, E. Bertele, M. Pesaro and H. Gschwend, Proc. Roy. Soc., 288 (1965) 306. (d) A. W. Johnson, Chem. Soc. Rev., 4 (1975) 1. (e) D. C. Black, V. M. Clark, B. G. Odell and L. Todd, J. Chem. Soc., Perkin Trans. I, (1976) 1944. (f) R. V. Stevens, Tetrahedron Lett., 32 (1976) 1599. (g) R. B. Woodward, Pure Appl. Chem., 33 (1973) 145. (h) A. Eschenmoser, Science, 196 (1977) 513.
9. (a) A. I. Scott, Tetrahedron Lett., 50 (1994) 13313. (b) F. Blanche, B. Cameron, J. Crouzet, L. Debussche, D. Thibaut, M. Vuilhorgne, F. J. Leeper and A. R. Battersby, Angew. Chem. Int. Ed. Engl., 34 (1995) 384. (c) A. R. Battersby and F. J. Leeper, Chem. Rev., 90 (1990) 1261. (d) P. M. Shoolingin-Jordan, J. Bioener. Biomembr., 27 (1995) 181. (e) P. Renz, B. Endres, B. Kurz and J. Marquat, Eur. J. Biochem., 217 (1993) 1117.
10. L. Ellenbogen and B. A. Cooper, Handbook of Vitamins, L. J. Machlin (ed.), Marcel Dekker, New York & Basel, 1991, pp 491.
11. R. H. Abeles, Proceedings Robert A Welch Foundation, Conf. Chem. Res., Vol XV, Bio-Organic Chemistry and Mechanisms, W. O. Milligan (ed.), 1972, pp 95.
12. (a) P. Dowd and R. Hershline, J. Chem. Soc. Perkin. Trans. 2, (1988) 61. (b) H. Kung and T. C. Stadtman, J. Biol. Chem., 246 (1971) 3378. (c) G. Hartrampf and W. Buckel, Eur. J. Biochem., 156 (1986) 301.
13. H. Eggerer, E. R. Stadtman, P. Overath and F. Lynen, Biochem. Z., 333 (1960) 1.
14. H. A. Barker, V. Roose, F. Sizuki and A. A. Iodice, J. Biol. Chem., 242 (1967) 878.
15. A. Munch-Peterson and H. A. Barker, J. Biol. Chem., 230 (1958) 649.
16. W. Buckel and B. T. Golding, Chem. Soc. Rev., 26 (1996) 329.
17. (a) M. Michenfelder, W. E. Hull and J. Rétey, Eur. J. Biochem., 168 (1987) 659. (b) G. C. Hall, Proc. Roy. Soc. (London), A205 (1951) 541.
18. The enzyme-coenzyme partnership involving diol dehydratase catalyzes the dehydration of both ethane-1,2-diol and propane-1,2-diol.
19. J. Rétey, A. Umani-Ronchi, J. Seibl and D. Arigoni, Experientia, 22 (1966) 502.
20. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 5700.



D. Smith, B. Golding and L. Radom, submitted for publication. J. Stubbe, Biol. Chem., 265 (1990) 5330. B.T. Golding and L. Radom, J. Am. Chem. Soc., 98 (1976) 6331. (a) B. M. Babior, Acc. Chem. Res., 2498 (1975) 376. (b) J. R6tey, Recent Adv. Phytochem., 13 (1979) 1. (c) B. T. Golding, B12, D. Dolphin (led.), J Wiley & Sons, New York, 1982, Vol. 1, pp 543. R. G. Finke, D. A. Schiraldi and B. J. Mayer, Coord. Chem. Rev., 54 (1984) 1. J. Halpern, Science, 227 (1985) 869. (a) J. R6tey, Angew. Chem. Int. Ed. Eng., 29 (1990) 355. (b) B. Zagalak and W. Friedrich, Vitamin B 12. Proceedings of the Third European Symposium on Vitamin B 12 and Intrinsic Factor; Walter de Gruyter: New York, 1979. (c) D. Dolphin, B12; Wiley-Interscience, New York, 1982, Vol. 1 and 2. (d) R. H. Abeles and D. Dolphin, Acc. Chem. Res., 9 (1976) 114. B. M. Babior, Biofactors, 1 (1988) 21. (e) B. Krautler, Cobalt, Blz-Enzymes and Coenzymes, Encyclopedia of Inorganic Chemistry, John Wiley & Sons, Chichester, England, 1994, Vol. 2. R. G. Finke, Molecular Mechanisms in B ioorganic Processes, C. B leasdale and B. T. Golding (eds.), The Royal Society of Chemistry, Cambridge, 1990, pp 281. P. Dowd, Selective Hydrocarbon Activation, J. A. Davies, P. L. Watson, J. F. Liebman and A. Greenberg (eds.), VCH, New York, 1990, pp 26:5. M. L. Ludwig and R. G. Matthews, Ann. Rev. Biochem., 66 (1997) 269. (a) W. H. Orme-Johnson, H. Beinert and R. L. Blakley, J. Biol. Chem., 249 (1974) 2338. (b) S. A. Cockle, H. A. O. Hill, R. J. P. Williams, S. P. Davies and M. A. Foster, J. Am. Chem. Soc., 94 (1972) 275. K.N. Joblin, A. W. Johnson, M. F. Lappert, M. R. Hollaway and H. A. White, FEBS Lett., 53 (1975) 193. (a) H. F. Kung and L. Tsai, J. Biol. Chem., 246 (1971) 6436. (b) R. H. Abeles and B. Zagalak, J. Biol. Chem., 241 (1966) 1245. (c) P. A.. Frey and R. H. Abeles, J. Biol. Chem., 241 (1966) 2732. (d) P. A. Frey, M. K. Essenberg and R. H. Abeles, J. Biol. Chem., 242 (1967) 5369. (e) J. R6tey and D. 
Arigoni, Experientia, 24 (1966) 783. (f) R. L. Switzer, B. G. Baltimore and H. A. Barker, J. Biol. Chem., 244 (1969) 5263. (g) B. M. Babior, Biochem. Biophys. Acta., 167 (1968) 456. (h) J. R6tey, F. Kunz, T. C. Stadtman and D. Arigoni, Experientia, 25 (1968) 802. T. W. Meier, N. H. Thoma and P. F. Leadlay, Bioc,hemistry, 35 (1996) 11791. S. D. Wetmore, D. M. Smith and L. Radom, work in progress. (a) C. Luschinsky-Drennan, S. Huang, J. T. Drummond, R. G. Matthews and M. L. Ludwig, Science, 266 (1994) 1669. (b) N. Shibata, J. Masuda, T. Tobimatsu, T. Toraya, K. Suto, Y. Morimoto and N. Yasuoka, Structure, 7 (1999) 997. F. Mancia, N. H. Keep, A. Nakagawa, P. F. Leadlay, S. McSweeney, B. Rasmussen, P. Bosecke, O. Diat and P. R. Evans, Structure, 4 (1996) 339. (a) F. Mancia and P. R. Evans, Structure, 6 (1998) 711. (b) N. H. Thoma, T. W. Meier, P. R. Evans and P. F. Leadlay, Biochemistry, 37 (1998) 14386. R. Reitzer, K. Gruber, G. Jogl, U. G. Wagner, H. Bothe, W. Buckel and C. Kratky, Structure, 7 (1999) 891. (a) Y. Zhao, P. Such and J. R6tey, Angew. Chem. Int. Ed. Engl., 31 (1992) 215. (b) M. G. N. Hartmanis and T. C. Stadtman, Proc. Natl. Acad. Sci., 84 (1987) 76. (c) O. Zelder, B. Beatrix, U. Leutbecher and W. Buckel, Eur. J. Biochem., 226 (1994) 577. (d) B. M. Babior, T. H. Moss, W. H. Orme-Johnson and H. Beinert, J. Biol. Chem., 249 (1974) 4537. (e) O. Zelder and W. Buckel, Biol. Chem. Hoppe-Seyler, 374 (1993) 84. (f) Y. Zhao, A. Abend, M. Kunz, P. Such and J. R6tey, Eur. J. Biochem., 225 (1994) 891. R. Padmakumar and R. Banerjee, J. Biol. Chem., 270 (1995) 9295. R. Padmakumar, S. Taoka, R. Padmakumar and R. Banerjee, J. Am. Chem. Soc., 117 (1995) 7033. H. Bothe, D. J. Darley, S. P. J. Albracht, G. J. Gerfen, B. T. Golding and W. Buckel, Biochemistry, 37 (1998) 4105.


44. (a) M. He and P. Dowd, J. Am. Chem. Soc., 120 (1998) 1133. (b) R. B. Silverman and D. Dolphin, J. Am. Chem. Soc., 98 (1976) 4626. (c) W. A. Mulac and D. Meyerstein, J. Al'n. Chem. Soc., 104 (1982) 4124. (d) R. G. Finke, W. P. McKenna, D. A. Schiraldi, B. L. Smith and C. Pierpont, J. Am. Chem. Soc., 105 (1983) 7592. 45. A. Greenberg and J. F. Liebman, Energetics of Organic Free Radicals, J. A. M. Simoes, A. Greenberg and J. F. Liebman (eds.), Blackie Academic and Professional, London, 1996, Vol. 4, pp 224. 46. J.J. Russell, H. S. Rzepa and D. A. Widdowson, J. Chem. Soc., Chem. Commun. (1983) 625. 47. B. Beatrix, O. Zelder, F. K. Kroll, G. Orlygsson, B. T. Golding and W. Buckel, Angew. Chem. Int. Ed. Engl., 34 (1995) 2398. 48. J. Pacansky, R. J. Waltman and L. A. Barnes, J. Phys. Chem., 97 (1993) 10694. 49. T. Hoz, M. Sprecher and H. Basch, J. Phys. Chem., 89 (1985) 1664. 50. D. A. Lindsay, J. Lusztyk and K. U. Ingold, J. Am. Chem. Soc., 106 (1984) 7087. 51. T. Hoz, M. Sprecher and H. Basch, J. Mol. Struct. Theochem, 150 (1987) 51. 52. P. George, J. P. Glusker and C. W. Bock, J. Am. Chem. Soc., 119 (1997) 7065. 53. D. E. Holloway and E. N. G. Marsh, J. Biol. Chem., 269 (1994) 20425. 54. (a) W. W. Bachovchin, R. G. E. Jr., K. W. Moore and J. H. Richards, Biochemistry, 16 (1977) 1082. (b) B. M. Babior, B 12, D. Dolphin (ed.), John Wiley & Sons, New York, Vol. 2, pp 263. 55. (a) M. W. Wong and L. Radom, J. Phys. Chem., 99 (1995) 8582. (b) P. M. Mayer, C. J. Parkinson, D. M. Smith and L. Radom, J. Chem. Phys., 108 (1998) 604. (c) P. J. Knowles, S. J. Andrews, R. D. Amos, N. C. Handy and J. A. Pople, Chem. Phys. Lett., 186 (1991) 130. (d) H. B. Schlegel, J. Chem. Phys., 84 (1986) 4530. (e) J. F. Stanton, J. Chem. Phys., 101 (1994) 371. 56. M. W. Wong and L. Radom, J. Phys. Chem., 102 (1998) 2237. 57. S. Wollowitz andJ. Halpern, J. Am. Chem. Soc., 110 (1988) 3112. 58. M. Newcombe, Tetrahedron, 49 (1993) 1151. 59. F. N. Martinez, H. B. Schlegel and M. Newcombe, J. Org. 
Chem., 61 (1996) 8547. 60. M. Newcombe and A. G. Glenn, J. Am. Chem. Soc., 111 (1989) 275. 61. (a) A. Effio, D. Griller, K. U. Ingold, A. L. J. Beckwith and A. K. Serelis, J. Am. Chem. Soc., 102 (1980) 1734. (b) B. Maillard, D. Forrest and K. U. Ingold, J. Am. Chem. Soc., 98 (1976) 7024. (c) D. F. McMillen, D. M. Golden and S. W. Benson, Int. J. Chem. Kinet., 3 (1971) 358. (d) J. D. Cox and G. Pilcher, Thermochemistry of Organic and Organometallic Compounds, Academic Press, New York, 1970. (e) W. Tsang, J. Am. Chem. Soc., 107 (1985) 2872. 62. D. M. Smith, A. Nicolaides, B. T. Golding and L. Radom, J. Am. Chem. Soc., 120 (1998) 10223. 63. J. A. Montgomery, Jr., M. J. Frisch, J. W. Ochterski and G. A. Petersson, J. Chem. Phys., 110 (1999) 2822, and references therein. 64. L. A. Curtiss, K. Raghavachari, P. C. Redfern, V. Rassolov and J. A. Pople, J. Chem. Phys., 109 (1998) 7764, and references therein. 65. A. M. Mebel, K. Morokuma and M. C. Lin, J. Chem. Phys., 103 (1995) 7414. 66. (a) C. J. Parkinson, P. M. Mayer and L. Radom, Theor. Chem. Acc., 102 (1999) 92. (b) C. J. Parkinson and L. Radom, work in progress. 67. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 1037. 68. H. F. Kung, S. Cederbaum, L. Tsai and T. C. Stadtman, Proc. Natl. Acad. Sci. USA, 65 (1970) 978. 69. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 9388. 70. J. M. Roscoe, I. S. Jayaweera, A. L. Mackenzie and P. D. Pacey, Int. J. Chem. Kinet., 28 (1996) 181. 71. B. T. Golding and L. Radom, J. Chem. Soc. Chem. Commun. (1973) 939. 72. (a) X. Z. Qin and F. Williams, Tetrahedron Lett., 42 (1986) 6301. (b) S. Lunell, I. Yin and M. B. Huang, Chem. Phys., 139 (1989) 293. (c) L. W. Sieck, R. Gordon and P. Ausloos, J. Am. Chem. Soc., 94 (1972) 7157.


73. (a) K. Krogh-Jespersen and H. D. Roth, J. Am. Chem. Soc., 114 (1992) 8388. (b) A. Skancke, J. Phys. Chem., 99 (1995) 13886. (c) P. Du, D. A. Hrovat and W. T. Borden, J. Am. Chem. Soc., 110(1988)3405. 74. J. R6tey, B 12, D. Dolphin (ed.), John Wiley & Sons, New York, 1982, Vol. 2, pp 357. 75. J. R6tey and J. A. Robinson, Stereospecificity in Organic Chemistry and Enzymology, H. F. Ebel (ed.), Verlag Chemie, Weinheim, 1982, pp 185. 76. For simplicity, we refer to structure 8 (~ as the 3-propanal radical. This species may be better named as the 3-oxoprop-1-yl radical. 77. B. Giese and H. Horler, Tetrahedron Lett., 24 (1983) 3221. 78. The quoted barrier corresponds to the energy of the symmetrical structure which, after inclusion of the zero-point energy, is higher than the two non-symmetrical transition structures. 79. This same shift has been investigated previously in the context of mass spectrometry experiments using lower-level molecular orbital calculations and mass spectrometry: G. Bouchoux, A. Luna and J. Tortajada, Int. J. Mass Spectrom. Ion Proc., 167 (1997) 353. 80. (a) H. A. Barker, R. D. Smyth and R. M. Wilson, Ref. Proc., 17 (1958) 185. (b) H. A. Barker, R. D. Smyth, E. J. Wawszkiewicz, M. N. Lee and R. M. Wilson, Arch. Biochem. Biophys., 78 (1958) 468. (c) H. A. Barker, R. D. Smyth, E. J. Wawszkiewicz, A. MunchPeterson, J. I. Toohey, J. N. Ladd, B. E. Volcani and R. M. Wilson, J. Biol. Chem., 235 (1960) 181. (d) W. Buckel and H. A. Barker, J. Bacteriol., 117 (1974) 1248. (e) W. Buckel, Arch. Microbiol., 127 (1980) 167. 81. M. Brecht, J. Kellermann and A. Pltichthun, FEBS Lett., 319 (1993) 84. 82. (a) P. Dowd, S. Choi, F. Duah and C. Kaufman, Tetrahedron Lett., 44 (1988) 2137. (b) S. Choi and P. Dowd, J. Am. Chem. Soc., 111 (1989) 2313. 83. J. Baker and T. Stadtman, B12, D. Dolphin (ed.), John Wiley and Sons, New York, 1982 Vol. 2, pp 203. 84. (a) U. Leutbecher, R. B6cher, D. Linder and W. Buckel, Eur. J. Biochem., 205 (1992) 759. (b) F. Suzuki and H. 
A. Barker, J. Biol. Chem., 241 (1965) 878. 85. A. Thibblin and W. P. Jencks, J. Am. Chem. Soc., 101 (1979) 4963. 86. J. T. Edward, S. C. Wong and G. Welch, Can. J. Chem., 59 (1978) 931. 87. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 1383. 88. These species, although not physiologically significant themselves, were chosen to demonstrate how the migration behavior depends on the strength of the interacting acid. On this basis, the amino acids His--H + and LysmH + could be expected to show behavior similar to NH4 +, while Asp and Glu should be closer to H3 O+, and Cys and Tyr closer to HF. 89. The rearrangement assisted by HF has the same electronic profile as the uncatalyzed pathway. That is, the symmetrical structure corresponds to a minimum on the vibrationless potential energy surface that disappears upon inclusion of zero-point energy. With either NH4 + or H3 O+ as the acid, the symmetrical species is a transition structure on the vibrationless surface. 90. (a) J. A. Gerlt and P. G. Gassman, J. Am. Chem. Soc., 115 (1993) 11552. (b) W. W. Cleland and M. M. Kreevoy, Science, 264 (1994) 1887. (c) P. A. Frey, S. A. Whitt and J. B. Tobin, Science, 264 (1994) 1927. 91. (a) S. Scheiner and T. Kar, J. Am. Chem. Soc., 117 (1995) 6970. (b) S. Shan, S. Lob and D. Herschlag, Science, 272 (1996) 97. (c) M. Garcia-Viloca, A. Gonzalez-Lafont and J. M. Lluch, J. Phys. Chem. A., 101 (1997) 3880. 92. (a) E. L. Ash, J. L. Sudmeier, E. C. DeFabo and W. W. Bachovchin, Science, 278 (1997) 1128. (b) M. E. Tuckerman, D. Marx, M. L. Klein and M. Parrinello, Science, 275 (1997) 817. (c) C. L. Perrin and J. B. Nielson, Annu. Rev. Phys. Chem., 48 (1997) 511. (d) C. L. Perrin, J. B. Nielson and Y. Kim, Ber. Bunsenges. Phys. Chem., 102 (1998) 403. 93. P. Gilli, V. Bertolasi, V. Ferretti and G. Gilli, J. Am. Chem. Soc., 116 (1994) 909. 94. J. P. Guthrie, Chem. Biol., 3 (1996) 163. 95. (a) Y. Pan and M. McAllister, J. Am. Chem. Soc., 120 (1998) 166. (b) Y. Pan and M. 
McAllister, J. Am. Chem. Soc., 119 (1997) 7561.


96. A. Warshel, A. Papazyan and P. A. Kollman, Science, 269 (1995) 102. 97. A. Warshel and A. Papazyan, Proc. Natl. Acad. Sci. USA, 93 (1996) 13665.

L.A. Eriksson (Editor) Theoretical Biochemistry- Processes and Properties of Biological Systems


Theoretical and Computational Chemistry, Vol. 9 © 2001 Elsevier Science B.V. All rights reserved

Chapter 6 SIMULATIONS OF ENZYMATIC SYSTEMS: PERSPECTIVES FROM CAR-PARRINELLO MOLECULAR DYNAMICS SIMULATIONS

Paolo Carloni 1,2 and Ursula Rothlisberger 3
1 International School of Advanced Studies and INFM-Istituto Nazionale di Fisica della Materia, I-34014 Trieste, Italy
2 International Centre for Genetic Engineering and Biotechnology, I-34012 Trieste, Italy
3 Laboratory of Inorganic Chemistry, ETH Zurich, CH-8092 Zurich, Switzerland

1 INTRODUCTION

In 1985, Car and Parrinello introduced a new method [1] that merged two major fields of computational chemistry that had so far been essentially orthogonal. They combined electronic structure calculations based on density functional theory (DFT) with a classical molecular dynamics (MD) scheme. This new simulation method makes it possible to perform parameter-free MD studies in which all the interactions are calculated on the fly via an electronic structure method. These first-principles or ab initio molecular dynamics (AIMD) simulations are especially valuable for systems for which it is difficult (or impossible) to construct empirical potential energy functions. They are also a necessary prerequisite for the study of processes that sample a wide range of chemical environments and thereby challenge the transferability of empirically derived potentials. Typical examples are the adequate description of (transition) metal centers and the forming and breaking of bonds in chemical reactions.

The introduction of the Car-Parrinello method has not only extended the range of classical MD simulations based on empirical potentials; it has also significantly increased the capabilities of conventional electronic structure calculations. Through the combination with an MD method, a generalization to finite temperature and condensed phase systems was achieved. Furthermore, a whole set of simulation tools based on statistical mechanics can be applied in this way in the context of an electronic structure method. Consequently, many dynamic as well as thermodynamic properties can be described within the accuracy of a first-principles method. AIMD was first applied to clusters [2, 3] and amorphous solids [4]; subsequently it also became a valuable and versatile tool for the study of materials [5] and of chemical reactions in the gas phase [6] and on surfaces [7].


A first step towards biological applications was taken in the early 1990s, when Parrinello, Car and co-workers demonstrated the power of the method in describing structural, electronic and dynamical properties of liquid water and other H-bonded systems [8]. Indeed, not only did this work extend the domain of applications to solute/solvent interactions [9] and chemical reactions in aqueous solution [10], but it also provided a first basis for biological modeling and therefore represented a key step towards ab initio biosimulations. The first application to a biological system was performed in the mid-1990s on a gas phase cluster model of the active site of superoxide dismutase [11]. Since then a rapidly increasing number of applications to biological systems have been reported. In this article, we give an overview of the current status by briefly outlining the studies that have appeared so far in the literature and by presenting selected examples from our own work on enzymatic systems.

This review is organized as follows. In Section 2, we describe the foundations of the method in its most widespread implementation, the one based on DFT, plane wave basis sets and pseudopotentials. In Section 3, we outline the different approaches to AIMD modeling of biological systems. This is followed by a summary of the applications that have appeared so far (Section 4), with particular emphasis on enzymes (Section 5). Finally, in Section 6, we give an outlook on possible future directions for the investigation of enzymes and other fundamental classes of biomolecules.

2 PRINCIPLES OF THE CAR-PARRINELLO METHOD

The central concept of AIMD as introduced by Car and Parrinello [1] is to treat the electronic degrees of freedom, as described by e.g. one-electron wavefunctions $\psi_i$, as dynamical classical variables. The mixed system of nuclei and electrons is then described in terms of the extended classical Lagrangian $\mathcal{L}_{ex}$:

$$\mathcal{L}_{ex} = \mathcal{K}_N + \mathcal{K}_e - E_{pot} \tag{1}$$

where $\mathcal{K}_N$ is the kinetic energy of the nuclei, $\mathcal{K}_e$ is the analogous term for the electronic degrees of freedom and $E_{pot}$ is the potential energy, which is a function of both the nuclear positions $\vec{R}_I$ and the electronic variables $\psi_i$. $\mathcal{L}_{ex}$ can thus be written as:

$$\mathcal{L}_{ex} = \sum_I \frac{1}{2} M_I \dot{\vec{R}}_I^{2} + \sum_i \mu \int \left| \dot{\psi}_i(\vec{r}) \right|^{2} d\vec{r} - E[\{\psi_i\}, \{\vec{R}_I\}] + \sum_{i,j} \Lambda_{ij} \left[ \int \psi_i^{*}(\vec{r}) \psi_j(\vec{r}) \, d\vec{r} - \delta_{ij} \right] \tag{2}$$

where $\Lambda_{ij}$ are Lagrange multipliers that ensure orthonormality of the one-electron wavefunctions $\psi_i$ and $\mu$ is a fictitious mass associated with the electronic degrees of freedom. The Lagrangian in Eq. 2 determines the time evolution of a fictitious classical system in which nuclear positions as well as electronic degrees of freedom are treated as dynamic variables. The classical equations of motion (EOM) of this system are given by the Euler-Lagrange equations:

$$\frac{d}{dt} \frac{\partial \mathcal{L}_{ex}}{\partial \dot{q}_i^{*}} = \frac{\partial \mathcal{L}_{ex}}{\partial q_i^{*}} \tag{3}$$

where $q_i$ corresponds to a set of generalized coordinates. With the Lagrangian of Eq. 2, the EOM for the nuclear degrees of freedom become:

$$M_I \ddot{\vec{R}}_I = -\frac{\partial E}{\partial \vec{R}_I} \tag{4}$$

and for the electronic ones

$$\mu \ddot{\psi}_i = -\frac{\delta E}{\delta \psi_i^{*}} + \sum_j \Lambda_{ij} \psi_j \tag{5}$$

where the term with the Lagrange multipliers $\Lambda_{ij}$ describes the constraint forces that are needed to keep the wavefunctions orthonormal during the dynamics. The parameter $\mu$ is a purely fictitious variable and can be assigned an arbitrary value. In full analogy to the nuclear degrees of freedom, $\mu$ determines the rate at which the electronic variables evolve in time. In particular, the ratio of $M_I$ to $\mu$ characterizes the relative speed with which the electronic variables propagate with respect to the nuclear positions. For $\mu \ll M_I$ the electronic degrees of freedom adjust instantaneously to changes in the nuclear coordinates and the resulting dynamics is adiabatic. Under these conditions $\mathcal{K}_e \ll \mathcal{K}_N$ and the extended Lagrangian in Eq. 1 becomes identical to the physical Lagrangian of the system:

$$\mathcal{L} = \mathcal{K}_N - E_{pot} \tag{6}$$

For finite values of $\mu$, the system moves within a layer above the Born-Oppenheimer surface whose thickness is given by the fictitious kinetic energy of the electrons. Adiabaticity is ensured if the highest frequency of the nuclear motion, $\omega_I$, is well separated from the lowest frequency associated with the fictitious motion of the electronic degrees of freedom, $\omega_e$.
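The coupled propagation of Eqs. 4 and 5 can be illustrated on a deliberately tiny toy problem. The sketch below is not taken from the chapter: it assumes a hypothetical two-level "electronic" system with real coefficients `c`, one nuclear coordinate `R` in a harmonic well, and invented parameters (`T_HOP`, `A`, `M`, `k`, `mu`, `dt`). Both subsystems are advanced with a plain Verlet step, and the normalization constraint (the single-orbital analogue of the $\Lambda_{ij}$ term) is imposed after each electronic step. The function also records how far the electrons rise above the exact Born-Oppenheimer energy.

```python
import math

T_HOP = 1.0   # assumed off-diagonal coupling of the toy two-level system
A = 0.5       # assumed strength of the electron-nuclear coupling

def h_times(c, R):
    """Apply the toy Hamiltonian H(R) = [[A*R, -T_HOP], [-T_HOP, -A*R]] to c."""
    return (A * R * c[0] - T_HOP * c[1], -T_HOP * c[0] - A * R * c[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def ground_state(R):
    """Exact Born-Oppenheimer ground-state energy and normalized coefficients."""
    s = math.hypot(A * R, T_HOP)
    c = (T_HOP, A * R + s)
    norm = math.sqrt(dot(c, c))
    return -s, (c[0] / norm, c[1] / norm)

def car_parrinello(R0=0.5, M=100.0, k=1.0, mu=0.01, dt=0.005, steps=4000):
    # Start from optimized electrons and zero velocities (previous = current).
    _, c = ground_state(R0)
    c_old, R, R_old = c, R0, R0
    max_excess = 0.0  # largest electronic energy above the Born-Oppenheimer value
    for _ in range(steps):
        Hc = h_times(c, R)
        # Electrons: mu * c'' = -H c + lambda * c; Verlet step without constraint.
        free = tuple(2.0 * c[i] - c_old[i] - dt**2 / mu * Hc[i] for i in range(2))
        # Pick the multiplier x so that the new coefficients are normalized.
        b = dot(free, c)
        x = -b + math.sqrt(b * b - (dot(free, free) - 1.0))
        c_new = (free[0] + x * c[0], free[1] + x * c[1])
        # Nuclei: M * R'' = -dE/dR with E = c.H(R).c + k * R**2 / 2.
        dE_dR = A * (c[0] ** 2 - c[1] ** 2) + k * R
        R_new = 2.0 * R - R_old - dt**2 / M * dE_dR
        c_old, c, R_old, R = c, c_new, R, R_new
        e_bo, _ = ground_state(R)
        max_excess = max(max_excess, dot(c, h_times(c, R)) - e_bo)
    return R, c, max_excess
```

With a fictitious mass that is small compared with the nuclear mass, `max_excess` is expected to stay small, which is the toy analogue of adiabatic propagation; increasing `mu` towards `M` in this sketch should visibly degrade that separation.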

For systems with a finite gap $E_g$ the parameter $\mu$ can be used to shift the electronic frequency spectrum so that $\omega_e \gg \omega_I$ and no energy transfer between nuclear and electronic subsystems takes place. For metallic systems special variations of the original method have to be adopted [12]. In practice, it is easy to check whether adiabatic conditions are fulfilled by monitoring the energy conservation of the physical Lagrangian in Eq. 6.

Eqs. 4 and 5 (or analogous first order equations) can be used for a simultaneous optimization of electronic and nuclear degrees of freedom. They can also be used to generate classical nuclear trajectories on a quantum mechanical potential energy surface: after an initial optimization of the electronic wavefunctions for a given starting configuration, ionic and electronic degrees of freedom can be propagated in parallel along the Born-Oppenheimer surface.

The Car-Parrinello method is similar in spirit to the extended Lagrangian methods for constant temperature [13] or constant pressure dynamics [14]. Extensions of the original


scheme to the canonical NVT-ensemble, the NPT-ensemble or to variable cell constant pressure dynamics [15] are therefore straightforward [16]. The treatment of quantum effects on the ionic motion is also easily included in the framework of a path-integral formalism [17].

Most of the current implementations use the original Car-Parrinello scheme based on DFT. The system is treated within periodic boundary conditions and the Kohn-Sham one-electron orbitals $\psi_i$ are expanded in a basis set of plane waves (with wave vectors $\vec{G}_m$)

$$\psi_i(\vec{r}) = \frac{1}{\sqrt{V_{cell}}} \sum_m c_{i,m} \, e^{i \vec{G}_m \cdot \vec{r}} \tag{9}$$

up to a given kinetic energy cutoff $E_{cut}$. In such a scheme, an adequate treatment of inner core electrons would require prohibitively large basis sets. Therefore, only valence electrons are treated explicitly and the effect of the ionic cores is integrated out using an ab initio pseudopotential formalism. Due to the use of periodic boundary conditions, the treatment of charged systems needs special care. Different methods are available for this purpose [18].

Apart from the traditional scheme, Car-Parrinello approaches using semiempirical [19], Hartree-Fock [19, 20] or GVB [21] electronic structure methods have been proposed, and extensions to augmented plane wave [22] and hybrid basis sets of atom-centered basis functions and plane waves [23] have been implemented. Recently, Car-Parrinello schemes have also been extended to a mixed quantum/classical QM/MM approach [24]. Unless mentioned otherwise, all the calculations presented in the next sections use the original Car-Parrinello scheme based on (gradient-corrected) density functional theory in the framework of a pseudopotential approach and a basis set of plane waves.
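To give a feeling for how the kinetic energy cutoff controls the size of the plane wave basis, the following sketch counts the wave vectors that satisfy $\frac{1}{2}|\vec{G}_m|^2 \le E_{cut}$. It rests on assumptions that are not part of the chapter: a simple cubic cell of edge `box_length`, atomic units (bohr and hartree), and the hypothetical helper names `count_plane_waves` and `estimate_plane_waves`. The second function gives the usual continuum estimate, namely the volume of a sphere of radius $\sqrt{2E_{cut}}$ divided by the reciprocal-space volume per wave vector.

```python
import math

def count_plane_waves(box_length, e_cut):
    """Count G-vectors of a cubic cell with kinetic energy |G|^2 / 2 <= e_cut.

    Atomic units are assumed (lengths in bohr, energies in hartree).
    """
    g_unit = 2.0 * math.pi / box_length            # reciprocal lattice spacing
    n_max = int(math.sqrt(2.0 * e_cut) / g_unit) + 1
    count = 0
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            for n3 in range(-n_max, n_max + 1):
                kinetic = 0.5 * g_unit**2 * (n1 * n1 + n2 * n2 + n3 * n3)
                if kinetic <= e_cut + 1e-12:       # small tolerance for rounding
                    count += 1
    return count

def estimate_plane_waves(box_length, e_cut):
    """Continuum estimate: V * (2 * e_cut)**1.5 / (6 * pi**2)."""
    volume = box_length**3
    return volume * (2.0 * e_cut) ** 1.5 / (6.0 * math.pi**2)
```

Because the count grows with the cell volume and with $E_{cut}^{3/2}$, the hard core states mentioned above, which would need a much higher cutoff, quickly become prohibitive. The continuum estimate is only reliable when many wave vectors fall inside the cutoff sphere.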

3 CAR-PARRINELLO MODELING OF BIOLOGICAL SYSTEMS

The exponential increase in computer power and the development of highly efficient algorithms have distinctly expanded the range of structures that can be treated at a first-principles level. Using parallel computers, AIMD simulations of systems with a few hundred atoms can nowadays be performed. This range already starts to approach the one relevant in biochemistry. Indeed, some simulations of entire biomolecules in laboratory-realizable conditions (such as crystals or aqueous solutions) have been performed recently [25-28]. For most applications, however, the systems are still too large to be treated fully at the AIMD level. By combining AIMD simulations with a classical MD force field in a mixed quantum mechanical/molecular mechanical fashion (Hybrid-AIMD), the effects of the protein environment can be explicitly taken into account and the system size can be extended. Even though it is possible to work with fairly sizeable quantum models, an intelligent choice of the crucial part of the system is still the basis of any successful modeling. The following approaches have been applied so far:

(1) AIMD simulation of the full system in laboratory-realizable conditions (e.g. in the crystal phase or in aqueous solution)
(2) AIMD calculations of carefully designed gas phase cluster models
(3) AIMD simulations of gas phase cluster models embedded in an external electrostatic field
(4) Hybrid QM/MM Car-Parrinello simulations in which the quantum part is treated at the AIMD level and the surroundings are described with a classical force field

4 APPLICATIONS TO NON-ENZYMATIC SYSTEMS

4.1 Nucleic Acids

RNA and DNA are in general very difficult to model with force-field based approaches. One major difficulty is to reproduce the backbone conformation (crucial for any modeling of nucleic acids), as the corresponding torsional energy barriers are very small [29]. First results from AIMD are encouraging: the calculated structure of a hydrated GpG RNA duplex in laboratory-realizable conditions (that is, in the crystal phase) [25] showed excellent agreement with experiment and provided the H-bond network postulated by the crystallographers. Investigations of platinum-based drugs [30, 31] and their adducts with DNA fragments in the solid state [32] and in aqueous solution [26] confirm the reliability of an AIMD scheme to describe these systems even in the presence of transition metal ions, which (as mentioned in Section 1) are notoriously difficult to treat with effective potentials [33].

4.2 Heme-Based Proteins

AIMD has been used extensively to elucidate structure/function relationships in myoglobin and cytochromes. Calculations on myoglobin mimics [34-39] provided a picture of the binding mode of O2 and of ligands such as CO and NO. These studies have also shed light on the intricate interplay between the structural and bonding properties of the complex and the effects of environment and temperature. Two cytochromes have been studied so far. In the case of the electron-transfer protein cytochrome c, electronic structure calculations helped clarify the intriguing nature of the Fe-S bond at the active site [40], whereas for cytochrome P450, steps of the enzymatic reaction were investigated [41-44]. The P450 family of enzymes is involved in the metabolism of endogenous and xenobiotic compounds, and this work can therefore be of potential use in toxicology research.

4.3 Cyclic Peptides and Ion Channels

To date, two AIMD studies have been performed in this field. The first dealt with self-assembled polypeptide nanotubes [28]. These systems have a variety of potential applications in biochemistry and materials science, from optoelectronics to the construction of drug delivery vehicles. Calculations carried out on Cyclo[D-Ala-Glu-D-Ala-Gln]2 in the crystal phase again showed excellent agreement with available structural data and provided novel information on intra- and intermolecular H-bond patterns.


In the second study, the proton diffusion through a polyglycine analog of the gramicidin channel was analyzed [27]. These simulations showed that the diffusion process is very rapid and, furthermore, is assisted by the polypeptide backbone. Thus, the latter emerges as a key factor for rapid proton transfer through the channel.

4.4 Photosensitive Proteins

A recent study has focused on structural and electronic aspects of a bacteriochlorophyll derivative (methyl bacteriophorbide) in the crystal phase [45]. The calculations are in good agreement with experimental data and provide evidence of a local structural change upon electronic excitation. AIMD simulations have also been carried out on the chromophore present in the rhodopsin photoreceptor (retinal). In the primary event of vision, retinal passes from the ground state (GS) to an excited state (ES) and isomerizes from 11-cis to all-trans within ~200 fs. A series of papers [46-50] have analyzed the GS isomerization process. More recently, calculations were extended to the first singlet ES [51] within a recently developed scheme for singlet state dynamics [52]. This work characterizes structural and energetic changes during the photoisomerization process and points to the crucial role of environment effects.

5 APPLICATIONS TO ENZYMES

5.1 Introduction

Understanding enzymatic function and mechanism at the molecular level is one of the most challenging and fascinating problems of biochemistry. Furthermore, it has direct implications for pharmacological intervention, as enzymes constitute the targets for a large variety of therapeutic agents. Modeling such phenomena, however, is a formidable task: an appropriate method should be able to treat fairly large systems of several thousand atoms and take into account dynamical effects at finite temperature. Furthermore, for a direct investigation of the enzymatic mechanism of action, the modeling should also provide an adequate description of chemical reactions.

AIMD simulations appear as a promising tool for a first-principles modeling of enzymes. Indeed, they enable in situ simulations of chemical reactions; furthermore, they are capable of taking crucial thermal effects [53] into account; finally, they automatically include many of the physical effects that are so difficult to model in force-field based simulations, such as polarization effects, many-body forces, resonance stabilization of aromatic rings and hydration phenomena.

In this section, the power and the current limitations of AIMD in studying enzyme function are illustrated by a survey of selected recent applications. First, we present calculations on two very well-known enzymes, which are meant as benchmark studies for subsequent applications. Then, we outline applications to pharmaceutical research and finally, we conclude this section by presenting state-of-the-art QM/MM Car-Parrinello hybrid simulations on an enzyme relevant for synthetic and biotechnological applications.

5.2 Test Cases

To probe the capabilities of ab initio molecular dynamics (AIMD) in describing enzymatic reactions, calculations have been carried out on two textbook examples, human carbonic anhydrase II and serine protease. As these are among the best characterized enzymes, both theoretically and experimentally, this work has provided a basis for subsequent applications in the field.

5.2.1 Human Carbonic Anhydrase II (HCAII)

HCAII is a zinc enzyme (260 amino acids, ~29 kD) which catalyzes the reversible hydration of CO2 to bicarbonate HCO3-. The active site is located at the bottom of a ~15 Å deep conical cavity that is open towards the solvent. With a turnover rate at room temperature of ~10^6 s^-1, HCAII is one of the fastest enzymes known. X-ray structures show that the zinc ion is coordinated to three histidine residues (His94, His96 and His119) and that a water molecule is bound to the zinc ion in an approximately tetrahedral arrangement. This water molecule has a pKa around 7-8 and can thus be easily deprotonated to OH- under physiological conditions. The zinc-bound H2O/OH- is connected via a hydrogen-bonded network (H2O/OH- → Thr199 → Glu106) to the rest of the protein. Another hydrogen-bonded network (H2O/OH- → HOH318 → HOH292 → His64) extends from the zinc-bound hydroxide/water via two solvent molecules to a histidine group located in the upper channel of the active site. The direct zinc ligands (His94, His96 and His119) and the two residues involved in the hydrogen-bonded network around the zinc-bound water (Thr199 and Glu106) are conserved in all animal carbonic anhydrases [54], and site-directed mutagenesis experiments have revealed the crucial importance of these residues for the activity of the enzyme by controlling a precise coordination geometry at the zinc center [55].

The catalytic reaction involves the binding of CO2 via a nucleophilic attack of the zinc-bound OH-, conversion to HCO3-, replacement of HCO3- by water and regeneration of the Zn-OH- through deprotonation of the zinc-bound water molecule. The latter step constitutes the rate determining step [56] and most probably involves the histidine residue (His64) as a proton shuttle. Experiments estimate a free energy barrier of ~10 kcal/mol for the overall proton transfer reaction, originating mainly from solvent reorganization or conformational changes, while the intrinsic barrier for proton transfer could be as low as 1.25 kcal/mol [56].

Our goals in this study [57] are to (i) probe the influence of the size of the quantum cluster; (ii) establish the effect of the environment, i.e. compare cluster models with QM/MM hybrid models which include the electrostatic effect of the protein environment; and (iii) study the dynamical properties. We consider several different models of the active site, starting with two ab initio cluster models of different size. MOD-A (~30 atoms) consists of a zinc-trisimidazole complex with a water or a hydroxide group as fourth ligand. MOD-B (~90 atoms) includes the tetrahedrally coordinated zinc center (Zn2+-H2O/OH-, His94, His96, His119) and the essential residues involved in the hydrogen bonding network (Thr199 and Glu106). The eight ordered water molecules resolved in the crystal structure that are within a distance of 7 Å from the zinc ion have also been included. The residues were fixed at a position close to the backbone and otherwise left free.
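As a rough plausibility check that connects the quoted turnover rate of ~10^6 s^-1 with the quoted free energy barrier of ~10 kcal/mol, one can invert the Eyring expression $k = (k_B T / h)\, e^{-\Delta G^{\ddagger}/RT}$. This is only a sketch under assumptions that are not made in the chapter: simple transition state theory with a transmission coefficient of one, a temperature of 298.15 K, and the hypothetical helper name `eyring_barrier`.

```python
import math

K_B = 1.380649e-23             # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34      # Planck constant, J s
R_GAS = 8.314462618 / 4184.0   # gas constant, kcal/(mol K)

def eyring_barrier(rate_per_s, temperature=298.15):
    """Free energy barrier in kcal/mol from k = (k_B T / h) exp(-dG / RT)."""
    prefactor = K_B * temperature / H_PLANCK   # attempt frequency, 1/s
    return R_GAS * temperature * math.log(prefactor / rate_per_s)

# Assumed input: the room-temperature turnover rate quoted in the text.
barrier = eyring_barrier(1.0e6)
```

Under these assumptions the resulting barrier is a little above 9 kcal/mol, which is of the same order as the experimental estimate cited above.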


Figure 1: Graphical representation of model B. Atoms that are kept fixed during the simulations are indicated with a circle. Dummy hydrogen atoms are indicated with white balls. (Reproduced with permission from ref. [57], Copyright 1998 Am. Chem. Soc.)

Dummy hydrogen atoms have been used to saturate the QM model where covalent bonds had to be broken to cut out the cluster model from the rest of the protein. Figure 1 shows a graphical representation of MOD-B. MOD-C is an extension of MOD-B that takes the external electrostatic field of the protein into account. The electrostatic background is represented by Gaussian-broadened point charges located within a distance of 7.5-9 Å from the centre of the simulation box. We have tested charge sets from the AMBER 4.0 [58] and GROMOS96 [59] force fields and have also probed the effect of different charge exclusion schemes.

Our calculations show that the smallest quantum model (MOD-A) does not provide an adequate description of structural, electronic or dynamical properties. In contrast, a cluster model of the size of MOD-B is able to reproduce the structural properties of the real system quite accurately and also provides a qualitative description of the electronic and dynamic features.

Structural Properties. As an example of a characteristic structural property, the zinc-oxygen bond distance for models A and B is compared in Table 1. In the case of the zinc-trisimidazole complex (MOD A), the zinc-oxygen distance changes distinctly upon deprotonation of the zinc-bound water (Δd = 0.32 Å). Such a drastic change of the zinc-oxygen distance is not observed when comparing the crystal structures at low and high pH [60]. Apparently, in the real enzyme the protein environment helps to stabilize the zinc-oxygen distance during protonation/deprotonation. This shows clearly that such a simplified model is not able to capture the main structural features of the real enzyme.


Table 1. Zinc-Oxygen Distances of Different Model Complexes

          MOD A (OH/HOH)    MOD B (OH/HOH)
BLYP      1.91/2.23         1.94/2.02
exp       2.05/2.05 (1)     2.05/2.05 (1)

exp: experimental values. All distances are given in Angstrom. (1) Values of the experimental crystal structures of the high and low pH forms of the enzyme [60].

In contrast, model B is able to retain the appropriate structure of the active site. In particular, the zinc-oxygen distances in the hydroxide and in the water form are now similar (Δd = 0.08 Å).
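The Δd values quoted for the two cluster models follow directly from the BLYP entries of Table 1. The short sketch below only reproduces that arithmetic; the dictionary layout and function name are illustrative choices, not part of the original study.

```python
# Zn-O distances in Angstrom from Table 1 (BLYP): (hydroxide form, water form).
ZN_O_BLYP = {
    "MOD A": (1.91, 2.23),
    "MOD B": (1.94, 2.02),
}


def delta_d(d_hydroxide, d_water):
    """Change of the Zn-O distance between the hydroxide and the water form."""
    return round(d_water - d_hydroxide, 2)


shifts = {model: delta_d(*pair) for model, pair in ZN_O_BLYP.items()}

for model, shift in shifts.items():
    print(f"{model}: delta d = {shift:.2f} A")
```

The experimental crystal structures give 2.05 Å for both forms, so a small Δd, as obtained for MOD B, is the behaviour closer to the real enzyme.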

Electronic Properties: Effects of the Surrounding. The proton affinity of the zinc-bound water molecule is a key property for the enzymatic mechanism. The acidity of the zinc-bound water is the result of a subtle fine tuning via hydrogen-bonded networks and electrostatic environment effects. This quantity can thus serve as a sensitive indicator of differences in the electronic structure that will have a critical influence on the enzymatic reaction. As a first attempt to quantify the effect of the electrostatic environment and the varying size of the cluster model, we have therefore calculated the proton affinities for the different models (Table 2). The small cluster model A has a distinctly smaller proton affinity than the larger cluster model B. The inclusion of the environment does not change this value significantly. Point charge sets from two different force fields result in almost identical values, even though the absolute magnitudes of specific charges differ appreciably in some cases. A large overpolarization effect is, however, induced if the 1-4 electrostatic interactions to the QM part are maintained. To the best of our knowledge, the proton affinity of HCAII is not known experimentally. The only experimental values available for a rough bracketing are the gas phase proton affinities of water (166.7 kcal/mol) [61] and OH- (390.8 kcal/mol) [62]. In view of these values, a proton affinity of 433 kcal/mol, calculated without applying Coulomb exclusion rules at the QM/MM interface, is clearly far too high. This indicates that point charges close to the QM part can induce a large overpolarization and have to be treated with care.

Table 2. Proton Affinity of the Zinc-bound Water

          MOD A    MOD B    GROMOS    AMBER    AMBERinc
BLYP      184      268      269       268      433

GROMOS: MOD B with point charge set of GROMOS96 (electrostatic 1-4 interactions to QM part are excluded); AMBER: MOD B with point charge set of AMBER 4.0 (electrostatic 1-4 interactions to QM part are excluded); AMBERinc: MOD B with point charge set of AMBER 4.0 (electrostatic 1-4 interactions to QM part are included). All energies are given in kcal/mol.
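The overpolarization seen for the AMBERinc entry illustrates why classical charges close to the quantum region need care, and why broadening them helps. The sketch below compares the potential of a bare point charge with that of the same charge smeared into a normalized Gaussian, for which the potential is q·erf(r/(√2 σ))/r and stays finite as r approaches zero. The functional form, the width `SIGMA` and the function names are assumptions made for illustration; the text does not specify how the broadening was implemented.

```python
import math

COULOMB_KCAL = 332.0637  # kcal*Angstrom/(mol*e^2); common force-field conversion factor


def point_charge_potential(q, r):
    """Coulomb energy (kcal/mol) of a unit probe charge at distance r (Angstrom)."""
    return COULOMB_KCAL * q / r


def gaussian_charge_potential(q, r, sigma):
    """Same quantity for a charge q smeared into a normalized Gaussian of width sigma."""
    if r == 0.0:
        # Analytic r -> 0 limit of q * erf(r / (sqrt(2) * sigma)) / r
        return COULOMB_KCAL * q * math.sqrt(2.0 / math.pi) / sigma
    return COULOMB_KCAL * q * math.erf(r / (math.sqrt(2.0) * sigma)) / r


SIGMA = 0.5  # Angstrom; hypothetical width, not taken from the text

for r in (0.1, 0.5, 1.0, 3.0, 6.0):
    bare = point_charge_potential(1.0, r)
    smeared = gaussian_charge_potential(1.0, r, SIGMA)
    print(f"r = {r:3.1f} A   point: {bare:8.1f}   Gaussian: {smeared:8.1f}")
```

At distances of several σ the two potentials coincide, so the long-range electrostatics of the environment is unchanged; only the short-range divergence that can overpolarize the electron density is removed.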


Dynamical Properties: The Proton Transfer Reaction. We have investigated the dynamical properties of the two gas phase cluster models within the local density approximation (LDA). As with the structural properties, the dynamical properties of the minimal cluster model A differ clearly from those of the real system. A short MD simulation (1 ps) at room temperature shows that the zinc-bound hydroxide can rotate around the Zn-O axis. This is in contrast to the real protein, where the zinc-bound nucleophile is kept by the hydrogen-bonded network of Thr199 and Glu106 in a defined orientation [63] appropriate for the binding of CO2. Furthermore, the mobility of the imidazole rings is much higher than that of the corresponding histidine residues, which are kept quite rigidly in place, as indicated by T-factors of 5-10 reported for the crystal structure [62]. To investigate the dynamical properties of the larger cluster model B, we have performed a 1 ps MD simulation at body temperature. Being aware of the known deficiency of the LDA in significantly underestimating proton transfer barriers, we used this simulation to make an efficient scan of the potential energy surface of the system 'with reduced barriers'. During these MD runs a spontaneous proton transfer reaction is observed. Starting from the hydroxide form of the enzyme, a proton from a neighboring water molecule (HOH 318) is transferred to the zinc-bound OH-, and the charged defect can be transferred to the next water molecule by a further switch of a proton. In these proton transfer reactions a simultaneous shortening of several hydrogen bonds (between the zinc-bound water, HOH 318 and HOH 292) occurs, and protons can be exchanged easily back and forth via these three solvent molecules, which form a kind of proton-exchange pathway. The water molecules involved in this process are indeed the ones connected in the real protein via a hydrogen-bonded network to the hypothetical proton shuttle group His64.
Figure 2 shows the temporal evolution of the four oxygen-hydrogen distances (ZnHO...H-O-H(318)...OH2(292)) that form the proton relay. Two of these OH distances correspond to covalent O-H bonds (as indicated in Figure 2 by OH distances of 1.0-1.2 Å) and two to hydrogen-bonded O...H distances in the range of ~1.6 Å (the hydrogen-bonded O...H distances are somewhat shorter than what can be expected, due to the overestimation of hydrogen bonding within the LDA). It is apparent in Figure 2 that the monitored pairs of oxygen and hydrogen atoms can change their mutual distance from covalent to hydrogen-bonded and vice versa, i.e. the protons can be exchanged between two neighboring oxygen atoms. Prior to such a proton transfer, the OH distances involved in the relay adjust simultaneously to a similar value around 1.2-1.3 Å (corresponding to the symmetric position of the hydrogen between two oxygen atoms). Such a concerted change can be seen around 200, 340 and 440 fs. During our simulation, only the zinc-bound water molecule exchanges its proton via this pathway, and no other proton transfers were observed along different hydrogen-bonded networks. The findings of our simulations are thus in very good agreement with the proposed role of His64 as proton shuttle group. The fact that it is possible to observe directly part of the enzymatic reaction cycle is very encouraging. Our approach is completely bias-free in the sense that no knowledge about likely reactions or reaction coordinates is necessary. Such an unbiased approach seems particularly promising for the study of systems where the enzymatic reaction is not yet known in detail.
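The analysis of the distance traces described above can be sketched in a few lines: a proton is assigned to whichever of its two neighboring oxygens is closer, and a transfer is recorded whenever that assignment changes. The function name and the five-point trajectory below are synthetic placeholders chosen for illustration; they are not data from the simulation.

```python
def transfer_events(times_fs, d_donor, d_acceptor):
    """Return the times at which the proton switches its nearest oxygen.

    d_donor and d_acceptor are the O-H distances (Angstrom) of one proton
    to the two oxygens it sits between, sampled at times_fs.
    """
    events = []
    on_donor = d_donor[0] < d_acceptor[0]
    for t, d1, d2 in zip(times_fs[1:], d_donor[1:], d_acceptor[1:]):
        if d1 == d2:
            continue  # symmetric position: keep the previous assignment
        now_on_donor = d1 < d2
        if now_on_donor != on_donor:
            events.append(t)
            on_donor = now_on_donor
    return events


# Synthetic example: the proton moves to the acceptor and later returns.
times = [0, 100, 200, 300, 400, 500]
donor = [1.00, 1.10, 1.24, 1.60, 1.60, 1.00]
acceptor = [1.60, 1.50, 1.27, 1.00, 1.00, 1.60]

events = transfer_events(times, donor, acceptor)
print(events)
```

Applied to all four distances of the relay, a criterion of this kind would flag concerted events as near-simultaneous switches on neighboring oxygen pairs.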


Figure 2: Temporal evolution of four characteristic oxygen-hydrogen distances involved in the proton relay. Note the simultaneous contraction of the O-H distances around 200, 340 and 440 fs prior to a proton transfer event. (Reproduced with permission from ref. [57], Copyright 1998 Am.Chem.Soc.)

5.2.2 Serine Proteases

The serine proteases (SPRs) are one of the most studied enzyme families [64-74]. SPRs use the catalytic triad (Ser195-His57-Asp102) to catalyze the hydrolysis of peptides (Fig. 3a). This occurs through nucleophilic addition of the β-hydroxyl group of Ser195 to the acyl carbonyl of the substrate, with formation of a negatively charged tetrahedral intermediate (Fig. 3b). Stabilization of the intermediate is achieved by formation of two H-bonds with the amide groups of Ser195 and Gly193 (mammalian isoenzymes [65]) or with the amide group of Ser195 and the sidechain of Asn155 (bacterial isoenzymes [75]). Theoretical [76, 77] and experimental [75, 78] studies on wild type and mutants of a bacterial SPR (subtilisin) have shown that Asn155 is a key residue for the biological function, in that it provides a stabilization of the transition state (TS) relative to the ground state (GS) by as much as ~5 kcal/mol. Curiously, no corresponding studies on the mammalian isoenzymes have appeared to clarify the crucial role of Gly193. A second important H-bond interaction involves two residues of the catalytic triad, His57 and Asp102. A series of NMR studies on mammalian [72-74,79] and bacterial [80] SPRs and their complexes with inhibitors have indicated the presence of a low-barrier hydrogen bond (LBHB) linking Nδ1 of protonated His57 with the β-carboxyl group of Asp102 (Fig. 3) [72-74,79]. Approach to the TS is suggested to facilitate the formation of the LBHB, which in turn may render Nε2 of His57 a stronger base for accepting a proton from Ser195 in the formation of the intermediate [72-74,79]. As a result of this process, the free energy barrier of the TS relative to the GS could decrease (but this point is the subject of some controversy [81, 82]).

Figure 3: Schematic views of the H-bond network in the active site of mammalian serine proteases (a) and of the adduct with the intermediate of the enzymatic reaction (b). In (b) the double arrow symbol refers to a putative low-barrier H-bond. (Reproduced with permission from ref.citepapersp, Copyright 1999 Wiley.)

To provide a picture of the chemical bonding in SPRs, and to relate it to the biological function, we have carried out ab initio molecular dynamics simulations on models of the SPR-intermediate (I-SPR) and the SPR-substrate (S-SPR) complexes (Fig. 4) [83, 84].

I-SPR dynamics. Consistent with NMR studies [79], proton hopping occurs between His57 and Asp102 on the subpicosecond time scale. Analysis of the chemical bonding indicates that the interaction is covalent in nature [83]. The second fundamental H-bond interaction investigated here involves Gly193 and the intermediate carbonyl oxygen. This H-bond is well maintained during the dynamics (average O···H distance of 1.7(0.1) Å). A rough estimate of the interaction energy based on an electrostatic model [83] indicates that Gly193 stabilizes the intermediate by more than 10 kcal/mol (Tab. 3). This value appears to be too large for a hydrogen bond [85, 86]. Inspection of the structure reveals that the very large dipole of the Q192G193 peptide unit (~4 D [67]) could also be an important factor for intermediate stabilization, as it points towards the negative charge of the intermediate. To extract the peptide dipole contribution from the total stabilization energy, we constructed a second model complex in which the Q192G193 peptide unit is substituted by dimethylamine (III in Fig. 4c). Tab. 3 shows that the resulting stabilization is much smaller, of the order of only a few kcal/mol. Thus, we conclude that a large contribution to the transition state stabilization is due to charge-dipole interactions.

Figure 4: Serine proteases: model complexes representing I-SPR ((a) and (c)) and S-SPR ((b) and (d)). In (c) and (d) the Q192G193 peptide unit is replaced by dimethylamine. H-bonds are depicted with dashed lines. Arrows indicate the scissile carbon atom C^s. The latter is labeled only in (b) for clarity. (Reproduced with permission from ref. [83], Copyright 1999 Wiley.)

S-SPR dynamics. The two key H-bond interactions are maintained but no proton transfer occurs. Interestingly, the substrate-protein interaction energy turns out to be much smaller in magnitude than that of the I-SPR complex (Tab. 3).

Table 3: Serine Proteases

                              Elec.
ΔE (I-SPR) (Complex I)        -12(4)
ΔE (I-SPR) (Complex III)      -2.6
ΔE (S-SPR) (Complex II)       -6(2)
ΔE (S-SPR) (Complex IV)       -2.6

B.E.: -4.2, -1.5

Tab. 3: Intermediate- and substrate-Q192G193 peptide unit interactions in terms of electrostatic (Elec.) and binding energies (B.E.) (in kcal/mol).

Replacing the Q192G193 peptide with dimethylamine (complex IV) causes a drastic


decrease of the interaction energy. The latter turns out to be practically identical to that of complex III (Tab. 3). We conclude that H-bond interactions are similar in the S-SPR and I-SPR complexes. In contrast, the electrostatic (charge-dipole) interactions are very different, the I-SPR being more stable by ~6 kcal/mol with respect to S-SPR (Tab. 3). For these complexes it has also been possible to calculate the binding energies. Tab. 3 shows a qualitative agreement between binding and electrostatic energies. This result validates the use of the electrostatic model for a qualitative analysis of intermolecular interactions, as has been done in this work.

Our calculations are completely consistent with, and confirm, the existence of an LBHB between His57 and Asp102, which has been observed experimentally in transition state analog inhibitor complexes [72-74,79]. Furthermore, they strongly support the proposal of an LBHB-facilitated mechanism [79], as the LBHB is essentially covalent in nature. Thus, the energy supplied by the covalent interaction may be crucial to overcome the energy loss due to the compression of the two residues, which is a prerequisite for the postulated LBHB-based reaction [79]. The second conclusion is that the rather large, Gly193-induced stabilization of the transition state with respect to the ground state is not caused by an H-bond with Gly193, as commonly proposed [65, 66]: indeed, the H-bond favors the binding of both substrate and intermediate by ~2.6 kcal/mol, a value typical of a strong H-bond in biological systems [86]. Instead, the negatively charged transition state turns out to be more stable relative to S-SPR by several kcal/mol as a result of the interaction of the negative charge with the large dipole of the Q192G193 peptide unit.

A simulation in which dimethylamine replaces the Q192G193 peptide unit confirms the crucial role of the dipole: the absence of the stabilizing charge-dipole interaction renders the intermediate species unstable. These considerations suggest that site-directed mutagenesis experiments on the 192 and/or 193 positions might significantly affect the activity of SPRs, as the Q192G193 dipole orientation may no longer be optimal for transition state stabilization.

5.3 Enzymes As Targets for Pharmaceutical Intervention

Molecular dynamics calculations based on force-fields are a fundamental tool for designing new and more powerful drugs for specific molecular targets [87]. Based on the large number of 3D biological structures available today, these calculations have led to major advances in our understanding of macromolecular structures and molecular similarity, and in the identification of pharmacophores. However, the force-field approach is not devoid of problems, which lie in the intricate physico-chemical nature of the intermolecular interactions. Indeed, it is becoming increasingly clear, from both experiment and theory, that electronic structure effects may play a crucial role in ligand-receptor interactions and enzyme-inhibitor binding. Examples in this respect include bond-forming/bond-breaking processes such as low-barrier hydrogen bonds, as well as charge transfer and polarization effects. All these phenomena are more reliably described by ab initio quantum-chemical methods. In this respect, AIMD presents itself as a promising new tool. In the next paragraph, we describe our work on the main targets for therapeutic intervention in AIDS, the enzymes protease and reverse transcriptase from human immunodeficiency virus type 1 (HIV-1 PR and HIV-1 RT). Subsequently, we focus on an enzyme of relevance for anticancer research.

5.3.1 HIV-1 Protease (HIV-1 PR)

HIV-1 PR cleaves the multidomain protein encoded by the virus genome to yield separated structural proteins. Structure-based drug-design studies have shown that in the substrate-cleavage site, formed by two Asp-Thr-Gly loops at the subunit-subunit interface (Fig. 5a), the almost coplanar conformation of the catalytic Asp dyad is crucial for enzymatic function and for the binding of both substrate and inhibitors [88-90].

Figure 5: (a) Structure of HIV-1 PR [103] and its cleavage site; (b) models used for the ab initio molecular dynamics of the mono-protonated form. (Reproduced with permission from ref. [100], Copyright 2000 Wiley.)

Based on these structures, force-field based molecular dynamics (MD) simulations have been used to probe the binding of novel ligands [91-99]. However, these approaches have encountered difficulties in adequately describing interactions of the catalytic aspartyl pair [91-99]. As a result, ad hoc assumptions have often been introduced in the calculations. Among these are (i) the choice of charge distribution [95]; (ii) the application of geometric constraints between the carboxylate moieties [96, 97]; and (iii) the positioning of the proton midway between the two adjacent Asp groups [99]. These a posteriori models, therefore, do not provide the physico-chemical origin of the stability in the active site. Quantum-mechanical approaches appear ideally suited to provide an understanding of the underlying molecular interactions of the Asp dyad. Here, we present results from our ab initio MD simulations [100]. This investigation, which focuses on the free enzyme, is divided into two steps. First, we attempt to determine the protonation state of the Asp dyad [101, 102]. Then, we study the conformational flexibility of the Asp dyad of HIV-1 protease on models of increasing complexity, including also the protein electrostatic potential. Our model complexes of the HIV-1 PR active site (Fig. 5b) were constructed starting from the structure of the free enzyme [103]. From the X-ray structure, it has been inferred that a water molecule bridges the two Asp groups (Wat_b hereafter), even though its exact location has not been provided. We positioned Wat_b so as to form the H-bond patterns already proposed for the eukaryotic isoenzyme [104] and added two other water molecules, putatively present in the active site channel, which interact with the Asp dyad.

Protonation State. At the optimal pH for enzymatic activity (~5-6) [101, 102, 105], the Asp dyad can in principle exist in three protonation states: a deprotonated, a mono-protonated or a doubly protonated form. Because hydrogen atoms are invisible in the X-ray structure, evidence for a specific protonation state must be inferred indirectly from spectroscopic or titration measurements. Up to now, the existence of the doubly protonated, neutral form has not been proposed for the free enzyme. The existence of the deprotonated, doubly negative form is supported by a recent NMR study [102] at pH 6. However, this study has been subjected to criticism [106] and is not conclusive. Our ab initio simulations of this form show that the Asp dyad is unstable even on the ps time scale because of the strong Asp-Asp repulsion, which turns out to be ~ +30 kcal/mol (as estimated with a simple electrostatic model [100]). Thus, our calculations do not support the existence of this form. The third possible state is the mono-protonated one, which has been strongly suggested by both experiment and theory [101, 106].
The ab initio energy minimizations performed on relatively large models of the two protomers C and B indicate that [100]: (i) C is lower in energy than B by 1.1 kcal/mol; (ii) the conformation of C is close to the X-ray structure but that of B is not; and (iii) the location of Wat_b is close to the observed electronic density in C but not in B. Inclusion of additional water molecules of the active site channel is expected to further stabilize protomer C relative to B, because Wat_b can form additional H-bonds in C but not in B. In conclusion, our calculations provide strong evidence in favor of protomer C in free HIV-1 PR, and all subsequent calculations have been done on complex C or its derivatives.

Simulations of the Cleavage Site with Models of Increasing Levels of Complexity. The ab initio MD simulation of the simple Asp dyad, Complex C(I) (Fig. 5b), demonstrates a hopping of the aspartyl proton between the oxygen atoms already on the subpicosecond time scale (Fig. 6b): the two Oδ1 atoms oscillate around a very short equilibrium distance. The presence of this low-barrier hydrogen bond (LBHB) confirms previous findings based on quantum dynamical studies [107]. During the dynamics, the LBHB compensates for the strong repulsion between the two Asp Oδ1 atoms (Oδ1-Oδ1 average distance 2.5(0.1) Å), which is consistent with the suggestion that this type of interaction can provide several kcal/mol of stabilization energy [74, 108]. While the LBHB is maintained, the coplanarity is completely disrupted (Fig. 6b). We conclude that the LBHB is able to keep the proton-sharing oxygen atoms close to each other, but the repulsion of the other oxygens of the carboxylates renders the system unstable.

Figure 6: HIV-1 protease: Ab initio molecular dynamics of complexes C(I), C(II) and C. (a) Location of the proton; (b)-(d): (left) Oδ1···H distances plotted as a function of time and (right) final (thick line) and starting (thin line) structures of complexes C(I) (b), C(II) (c) and C (d). In (d) (left) only the last 0.9 ps are shown for the sake of clarity. (Reproduced with permission from ref. [100], Copyright 2000 Wiley.)

Inclusion of hydration and of the hydrogen bond interaction with the glycine residues, Complex C(II) (Fig. 5b), is not sufficient to produce a stable conformation: besides the loss of the characteristic orientation of the Asp groups, the Asp-Asp hydrogen bond is also disrupted (Fig. 6c). Inspection of the X-ray structures of unbound and complexed HIV-1 PR [103,109-115] offers an explanation for the instability of the system: in all the structures investigated, the rather rigid Gly amide groups do not form an optimum hydrogen bond with the Asp groups, ∠(N-H···Oδ1) ranging from 125 to 153°. Thus, the carboxylate groups rearrange unphysically to maximize H-bond stabilization (maximum ∠(N-H···Oδ1) = 179°), thereby removing the aspartyl hydrogen bond. We therefore conclude that H-bonding to Gly27(27') is not an essential factor for the stability of the Asp dyad. What then are the key interactions stabilizing the conformation that is found in the experimental structures? A detailed inspection of the active site suggests the strong dipoles of the Thr26(26')-Gly27(27') peptide units as important factors, as they point towards the negative charge of the Asp dyad. Indeed, calculation of the quantum-mechanically derived electrostatic potential of the aspartyl dyad reveals a striking alignment of the peptide unit dipoles with the Asp charge (Fig. 7). The resulting electrostatic interaction turns out to be rather large (an estimate from a point charge model is -7.8 kcal/mol [100]). The simulation in which the peptide link is included, Complex C (Fig. 5b), confirms the fundamental role of the charge-dipole interactions. Indeed, the system turns out to be stable over the relatively long time range explored (over 4.5 ps) (Fig. 6d): the coplanarity of the Asp dyad and the dipole-charge interactions are well maintained, and proton hopping between the carboxylate groups is observed.

Figure 7: HIV-1 PR: Thr26-Gly27 peptide unit's dipole (calculated and experimental (Nelson RD, et al. Nat'l. Bur. Stands. 10, 1967) values 3.82 D and 3.84 D, respectively) superimposed on the electrostatic potential of the Asp dyad active site. The coloring varies continuously from red in negative areas to blue in more positive regions. (Reproduced with permission from ref. [100], Copyright 2000 Wiley.)

Our calculations provide no support for the existence of a deprotonated form at pH 5-6, while they show that the mono-protonated state, in which the Asp dyad shares one proton, is rather stable, in agreement with previous findings [101, 106]. In the most stable protomer C, a water molecule forms two H-bonds to the Asp carboxylates. The close proximity of the two carboxylates is achieved by forming an LBHB, which overcomes the repulsion of the two negative residues [74, 108]. The peculiar orientation of the two Asp residues is obtained through the interaction of the aspartyl negative charge with the dipoles of the rather rigid Thr26(26')-Gly27(27') peptide units. Recent site-directed mutagenesis experiments on the 27, 27' positions, which show the complete loss of catalytic power in the G27V, G27'V HIV-1 PR mutant [116], are consistent with the crucial role of this dipole. Indeed, replacing glycine with the bulky side chain of valine may cause a significant rearrangement of the backbone and thus of this dipole. This in turn may stabilize a conformation of the Asp dyad which is not optimal for the catalytic action of the enzyme.
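The size of a charge-dipole term of this kind can be gauged with the textbook point-dipole expression E = k·q·μ·cos θ / (ε·r²), where θ is the angle between the dipole vector and the line from the dipole to the charge. The sketch below is only a rough illustration: the 5 Å separation, the vacuum dielectric constant and the function name are assumptions, and it is not the point charge model of ref. [100], so it is not expected to reproduce the -7.8 kcal/mol quoted above.

```python
import math

COULOMB_KCAL = 332.0637          # kcal*Angstrom/(mol*e^2)
DEBYE_TO_E_ANGSTROM = 0.2081943  # 1 Debye expressed in e*Angstrom


def charge_dipole_energy(q, mu_debye, r, theta_deg, epsilon=1.0):
    """Point-dipole estimate of the charge-dipole energy in kcal/mol.

    q         charge in units of e
    mu_debye  dipole moment in Debye
    r         dipole-charge distance in Angstrom
    theta_deg angle between the dipole vector (pointing from its negative
              to its positive end) and the direction from dipole to charge
    """
    mu = mu_debye * DEBYE_TO_E_ANGSTROM
    return COULOMB_KCAL * q * mu * math.cos(math.radians(theta_deg)) / (epsilon * r * r)


# Peptide dipole of 3.82 D (value from Figure 7) pointing at a -1 charge;
# the distance of 5 Angstrom is a hypothetical choice for illustration.
for theta in (0.0, 45.0, 90.0, 180.0):
    energy = charge_dipole_energy(-1.0, 3.82, 5.0, theta)
    print(f"theta = {theta:5.1f} deg   E = {energy:7.2f} kcal/mol")
```

The angular dependence is the point of the exercise: the interaction is most stabilizing when the dipole points straight at the negative charge and changes sign when it is reversed, which is why a backbone rearrangement that reorients the peptide unit can matter so much.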


The ab initio MD simulations indicate that several ingredients, such as polarization forces, the treatment of bond-forming/breaking processes and temperature effects, play a crucial role in the HIV-1 PR active site. These key features are expected to play a critical role also in the adducts with the substrate and inhibitors. Based on these findings, specific force-fields could now be developed for this system, which in turn might allow for a more accurate modeling of HIV-1 PR-drug interactions.

5.3.2 HIV-1 Reverse Transcriptase

Drug effectiveness in anti-AIDS therapy is severely limited by the capability of the virus to develop mutations which ultimately lead to drug resistance [117, 118]. The spectrum of alterations is rather broad for both HIV-1 PR and reverse transcriptase (RT), as evidenced by genetic and biochemical studies performed in the laboratory or in clinical trials [119, 120]. Single mutations effective against drug action are usually followed sequentially by 3-4 additional mutations, so that several highly resistant mutation patterns are observed. Thus, understanding how mutations exert their effects on drug resistance at a molecular level can ultimately lead to the design of new drugs and therapeutic strategies more effective against AIDS.

Figure 8: HIV-1 RT: Nucleotide binding site (right) and proton transfer between γ-phosphate and Lys65, superimposed with the electron localization function (ELF) (Silvi B et al., Nature 1994; 371:683) (left). The ELF is represented in a best-fit plane containing the oxygen, the proton and the lysine nitrogen. Red areas indicate strong localization of the electronic density.


The recent determination of the crystal structure of a ternary catalytic complex of HIV-1 RT with a substrate (dTTP) and the DNA primer and template [121] (Fig. 8) has provided the structural basis of resistance: it has been found that most mutations causing resistance to nucleoside-analog drugs are located close to the nucleoside binding site. AIMD calculations were used to characterize the functional role of the residues involved in resistance against nucleoside-analog drugs [122]. Calculations were carried out for models of the nucleoside binding site in different protonation states of the substrate triphosphate (fully deprotonated and protonated in the γ-position). While the protonated form experiences large rearrangements already on the ps time scale, the fully deprotonated form exhibits a previously unrecognized low-barrier hydrogen bond (LBHB) between Lys65 and the γ-phosphate (Fig. 8). The probable loss of this interaction in K65R HIV-1 RT may be a key factor in the well-known resistance of this mutant to nucleoside analogs (such as ddI, ddC and 3TC) [123]. Water molecules (not detected in the X-ray structure) form a structured H-bond network at the active site. A well-ordered water molecule emerged as a key factor for substrate recognition by bridging Gln151 and Arg72 with the γ-phosphate. In the Q151M HIV-1 RT mutant, which exhibits cross resistance towards dideoxy-type drugs and AZT [124], loss of the Gln151-water H-bond is expected to destabilize the water position and could therefore affect substrate binding and drug resistance.

5.3.3 Herpes Simplex Virus Type 1 Thymidine Kinase: a Target for Gene-Therapy Based Anticancer Drugs

Viral herpes simplex type 1 thymidine kinase (HSV1 TK) is a key enzyme in the metabolism of the herpes simplex virus. Its physiological role is to salvage thymidine into the DNA metabolism by converting it to thymidine monophosphate:

ATP + d(T) → ADP + d(Tp)

Phosphorylation is achieved by transfer of the γ-phosphate group from ATP to the 5'-OH group of thymidine. Understanding the chemistry of this enzyme is important for applications in the treatment of virus infections and for cancer chemotherapy [125-132]. Recently, we have performed an ab initio MD study that has focused on the HSV1 TK-nucleoside interactions [133]. Our goal has been to gain a better understanding of the nature of HSV1 TK binding interactions and of its mechanism of action. Our complexes are based on the X-ray structure of the substrate-enzyme adduct [134]. They include residues fixing the thymine ring (Met128 and Tyr172); the guanidinium group of Arg163, represented by an ammonium ion, is also included because of its important electrostatic role (Fig. 9). Several HSV1 TK-thymine complexes have been considered by protonating the residues and the substrate differently. The ab initio MD simulations show that all the complexes investigated are stable on the ultrashort time scale investigated. We study the binding by calculating the density difference $\Delta\rho = \rho_{\mathrm{complex}} - \rho_{\mathrm{fragments}} - \rho_{\mathrm{substrate}}$, which describes how the electron density ρ changes during the formation of the complex. Inspection of Δρ for all complexes reveals that no charge transfer from or to the substrate is present (Fig. 10). The O and N atoms of thymine as well as the Arg163-Tyr172 H-bond are significantly polarized. Thus, the tyrosine ring appears to polarize the nucleobase, indicating that Tyr172-T electrostatic interactions play an important role in the binding. This result is consistent with biological data on the Y127F HSV1-TK mutant: indeed, the latter exhibits very small enzymatic activity [135]. In contrast, there is no evidence of polarization effects on the Met128 sulfur atom [136]. This indicates that sulfur plays only a minor role in binding. That the role of the Met128 sulfur in the binding process is purely hydrophobic and steric has been confirmed by very recent site-directed mutagenesis experiments, which have shown that the activity is preserved when the Met residue is replaced by another hydrophobic residue such as Ile [135]. Work is in progress to study the binding of sugar-like chains of fraudulent substrates. The calculations point to a critical role of electrostatic interactions, providing a rationale for enzyme kinetics measurements performed in the lab of Prof. Folkers at the ETH in Zurich.

Figure 9: HSV-1 TK nucleoside binding site (left) and (right) quantum-mechanical model used in the calculations. (Reproduced with permission from ref. [133], Copyright 1998 Wiley.)

5.3.4 Conclusion

In conclusion, this type of quantum chemical calculation reveals a variety of functionally important characteristics of drug/target interactions, which can neither be discerned by visual inspection of the molecular structure nor be described by standard force-fields. Several ingredients, such as polarization forces, treatment of bond-breaking processes in the LBHB and temperature effects, do play a critical role for drug binding and possibly for drug-resistance mechanisms. Implementation of such ingredients in standard force fields and in QSAR parameters is expected to result in a more efficient design of new drugs for these targets.


Figure 10: Electronic density difference in HSV-1 TK nucleoside binding site. Magenta: -0.054 e/Å^3; green: 0.054 e/Å^3. (Reproduced with permission from ref. [133], Copyright 1998 Wiley.)
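A density difference of the kind shown in Figure 10 is obtained by subtracting, grid point by grid point, the densities of the separated partners from that of the complex. The sketch below illustrates this on a toy one-dimensional grid and integrates the result over the grid points assigned to the substrate: a value near zero indicates polarization without net charge transfer. All numbers, the voxel volume, the region mask and the function names are synthetic assumptions made for illustration; real densities would come from the electronic structure calculation.

```python
def density_difference(rho_complex, rho_fragments, rho_substrate):
    """Pointwise rho_complex - rho_fragments - rho_substrate on a common grid."""
    return [c - f - s for c, f, s in zip(rho_complex, rho_fragments, rho_substrate)]


def integrated_charge(delta_rho, mask, voxel_volume):
    """Integrate delta rho over the grid points selected by mask."""
    return sum(d for d, inside in zip(delta_rho, mask) if inside) * voxel_volume


# Synthetic densities (e/A^3) on four grid points; the last two points are
# taken to belong to the substrate region.
rho_complex = [0.12, 0.18, 0.41, 0.29]
rho_fragments = [0.11, 0.19, 0.01, 0.02]
rho_substrate = [0.00, 0.00, 0.38, 0.29]
substrate_region = [False, False, True, True]
VOXEL_VOLUME = 0.001  # A^3; hypothetical grid spacing

delta_rho = density_difference(rho_complex, rho_fragments, rho_substrate)
net_transfer = integrated_charge(delta_rho, substrate_region, VOXEL_VOLUME)

print([round(d, 3) for d in delta_rho])
print(f"net charge change on substrate region: {net_transfer:.2e} e")
```

In this toy case the substrate region shows a positive and a negative lobe of equal weight, the signature of polarization, while its integrated charge change vanishes within rounding.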

5.4 Rational Design of Biomimetic Catalysts by Hybrid QM/MM Car-Parrinello Simulations of Galactose Oxidase

In millions of years of evolution, nature has developed a remarkably elegant and subtle in vivo chemistry. Reactions are generally performed under very mild conditions, with high efficiency and (stereo)selectivity. It is therefore not surprising that a lot of research effort is devoted to understanding the principles governing enzymatic catalysis and to developing small synthetic compounds that are able to mimic the natural chemistry [137]. However, the search for simple synthetic models is difficult, and only very few functional biomimetic compounds exist so far. One of the factors that hampers the successful design of synthetic analogs is the great complexity of biological systems, which makes it almost impossible to pinpoint all the important factors of the active site that have to be included in a biomimetic analog. An accurate and realistic computer modeling of the enzymatic process could in principle be used to map out these crucial factors. In computer experiments the influence of different residues in the active site can be probed easily, and environment and temperature effects can be assessed. To probe the capabilities of AIMD for this purpose we have chosen the mononuclear copper enzyme galactose oxidase (GOase, 68 kD, 639 amino acids), for which functional biomimetic models have recently been synthesized (see also the chapter on radical enzymes by F. Himo and L. Eriksson in this book).

Figure 11: (a) Schematic representation of the active site of Galactose Oxidase (GOase) in comparison with the biomimetic model compound [156] (b). (Reproduced with permission from ref. [159], Copyright 2000 Springer.)

GOase is an extracellular enzyme secreted by the fungus Dactylium dendroides that oxidizes primary alcohols to the corresponding aldehydes with simultaneous reduction of molecular oxygen to hydrogen peroxide [138]. This reaction is performed for a wide range of substrates with strict regio- and stereoselectivity, properties that render this system of considerable interest for bioanalytical [139] and synthetic applications. The X-ray structure was solved in 1991 [140, 141] and showed that the Cu²⁺ ion is coordinated by two nitrogen and two oxygen atoms from aromatic residues (His496, His581, Tyr272 and Tyr495) and by an external fifth ligand (water or acetate) (Figure 11a). No other cofactor is present in the structure that could provide the second redox equivalent for the catalyzed 2-electron oxidation. However, one of the tyrosine ligands (Tyr272) forms a very unusual covalent thioether linkage with Cys228, indicating that Tyr272 might provide the second redox center by forming a free radical stabilized via delocalization to Cys228. This ligand-based radical mechanism has been confirmed by EPR measurements characteristic of a Cu(II) site close to an organic radical and by the EPR spectrum of the apoenzyme [142-144]. Many biomimetic model compounds have been designed for GOase [145-157]. In spite of a high similarity of structural and/or magnetic properties, most of these synthetic analogs show no catalytic activity.
Very recently, however, two groups have succeeded in synthesizing functional models of GOase [156, 157]. These new biomimetic compounds enable a novel synthetic route for the conversion of primary alcohols to aldehydes, and they also constitute well-defined model systems for an investigation of the underlying reaction mechanism. However, the synthetic models exhibit a reactivity that is several orders of magnitude below that of GOase. This drastic difference calls for an approach that allows the identification of the essential factors governing the enzymatic reaction that are still missing in the mimetic system.

Figure 12: Schematic representation of the proposed catalytic cycle [138]. Labels refer to: A: resting state; B: protonated intermediate; C: transition state of the H-abstraction step; D: product of the abstraction step; semi: semi-reduced form; ox: oxidized form. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)

We have performed a parallel theoretical study of the enzyme and one of its synthetic analogs [156] (Figure 11) aimed at the characterization of the main catalytic differences [158, 159]. Several key structures of the catalytic cycle (Figure 12) have been investigated in direct comparison with the natural target.

To capture the enzymatic system in its full complexity, we have adopted a mixed quantum/classical QM/MM Car-Parrinello approach [160] in which the active site residues (Figure 11a) are treated quantum mechanically (within the framework of density functional theory) and the rest of the protein is described with an empirically derived force field. In contrast to pure gas phase models of the active site, such an approach allows one to assess the influence of the protein environment and to capture finite temperature and solvent effects. We have compared the two systems along the catalytic cycle (Figure 12) by characterizing the semi-reduced and the oxidized forms of the resting state, A^semi and A^ox, the protonated intermediate B, the transition state for the rate-determining hydrogen abstraction C, and the final product of the abstraction step D. We find that the overall features of the mimetic compound are qualitatively remarkably similar to those of its natural target. For both systems, the semi-reduced resting state A^semi is characterized by an unpaired electron localized in a d(x²-y²) orbital at the Cu(II) center (Figure 13), while the catalytically active species A^ox (Figure 14), the protonated intermediate B (Figure 15), and the transition state of the hydrogen abstraction step C (Figure 16) form antiferromagnetically coupled diradical states. In A^ox, B and C, one electron remains localized on the Cu(II) ion, whereas the localization of the second electron of opposite spin varies several times throughout the cycle. All variations of the β-spin distribution, from a localization on the axial tyrosine Tyr495 in A^ox to the equatorial tyrosine Tyr272 in B and to a localization on the alcohol substrate in C, are closely matched by the synthetic active site analog.
However, we have also found a number of intrinsic differences between the natural and the synthetic compound, which can be summarized as follows: (i) Throughout the catalytic cycle the active site of GOase undergoes only very small geometric changes. The RMS deviations among all the investigated structures A-D are smaller than 0.01 Å for all the ~70-80 atoms of the active site quantum region. In the biomimetic system, on the other hand, at least two significant structural rearrangements occur: one upon substrate binding and another in the product formation of the abstraction step. (ii) Substrate binding in the mimetic system seems to be hampered by alkyl residues of the thioether groups in the ortho position of the equatorial oxygen ligand. Considering that the alcohol substrate is only weakly bound prior to deprotonation, the energy needed to induce a conformational change additionally disfavors the formation of the substrate complex. (iii) Adjacent oxygen- and nitrogen-containing aromatic ligands of the biomimetic compound form an angle of 50° in the resting state. However, protonation of the axial ligand and the formation of the product D favor the formation of an extended conjugated system in which both ligand systems are essentially coplanar. This energetically favorable competing configuration leads to large structural changes and induces the formation of a linear N-Cu(I)-O product in which the aldehyde substrate is tightly bound and cannot be released as easily as in the corresponding weakly bound GOase analog (Figure 17). (iv) The activation barrier we calculate for the natural system is 16 kcal/mol, in close agreement with a value of 14 kcal/mol estimated from the experimental turnover rate of 800 s⁻¹ [161].

Figure 13: Contour plots of the unpaired electron density distribution in the semi-reduced form of the resting state of (a) GOase and (b) the biomimetic compound (contour at 0.02 e/au³). (Reproduced with permission from ref. [159], Copyright 2000 Springer.)

Figure 14: Contour plots of the unpaired electron density distribution in the oxidized form of the resting state of (a) GOase and (b) the biomimetic compound. Contours are drawn at 0.002 e/au³. Yellow and magenta refer to α- and β-spin densities, respectively. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)

Figure 15: Comparison of the unpaired electron density distribution of the protonated intermediate B of (a) GOase and (b) the biomimetic compound (contour at 0.008 e/au³). Yellow and magenta refer to α- and β-spin densities, respectively. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)

Figure 16: Comparison of the unpaired electron density distribution in the transition state for hydrogen abstraction (C) for (a) GOase and (b) the biomimetic compound. Contours are drawn at two different levels: 0.0015 e/au³ (upper half of Figure 16) and 0.001 e/au³ (lower half). Yellow and magenta refer to α- and β-spin densities, respectively. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)

Figure 17: Structure of the product D of the abstraction step for (a) GOase and (b) the biomimetic compound. The long coordination bonds to Tyr495 and the substrate are indicated by dashed lines. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)

The corresponding value for the synthetic system, 21 kcal/mol, is distinctly higher, consistent with its much lower catalytic activity (turnover numbers for aromatic substrates are ~0.02 s⁻¹). In both systems, the second unpaired electron, which is localized on the equatorial oxygen ligand in B, is here mainly located on the substrate itself (Figure 16, upper half). This finding offers a first explanation for the experimental fact that GOase is several orders of magnitude more efficient in the conversion of aromatic as compared to aliphatic substrates, and that the synthetic system converts only aromatic but not aliphatic alcohols [156]. The strong concentration of the unpaired spin density on the alcohol substrate in the transition state suggests that the experimentally observed differences in reactivity arise because aromatic substrates form more stable radical intermediates owing to the additional delocalization of the unpaired electron density. A closer inspection shows that for C the unpaired spin density on the substrate is smaller in the natural system (0.6 e) than in its mimic (0.7 e). In fact, in GOase the unpaired β-spin density is delocalized to some extent over the equatorial tyrosine and the covalent sulfur link, whereas at the same contour level no net spin population appears on the equatorial ligand of the mimetic system. For the synthetic analog, the integrated unpaired spin density is lower than 0.01 e for any atom of the equatorial ligand system, while the corresponding values in GOase typically range from 0.01 to 0.02 e/atom. The total unpaired electron density of the equatorial ligand is roughly twice as large in the natural compound, which provides a first rationale for the discrepancy in barrier height. The sulfur-containing ligand has almost no radical character in the biomimetic compound, in contrast to the natural system. This agrees with the experimental observation that the covalent sulfur link plays an important role in the catalytic function of GOase [162], whereas sulfur substituents have only a small or no effect in the synthetic compounds [150, 156]. The subtle electronic differences between the natural and the synthetic system are caused by a particular difference in their geometric properties. All the essential orbitals hosting the unpaired β-spin density are coplanar with the d(x²-y²) orbital at the copper in both the natural and the mimetic system. However, owing to the perpendicular orientation of Tyr272, the p_z orbitals of the aromatic system and of the covalently linked sulfur atom can easily overlap with these orbitals in the former, while owing to the different orientation of the equatorial ligand they are orthogonal in the latter.

We have performed a series of computer experiments to evaluate decisive factors involved in the enzymatic catalysis. The protein field outside the quantum region has only a relatively small effect. Most of the crucial properties seem to be determined by the geometric and electronic features of approximately 100 atoms of the active site. Thus, it should indeed be possible to construct small synthetic analogs that can mimic the enzymatic chemistry with high fidelity. Our study provides direct mechanistic information that can help in the future design of GOase mimics with increased efficiency or selectivity.
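The text above relates turnover rates (800 s⁻¹ for GOase, ~0.02 s⁻¹ for the mimic) to activation barriers, but it does not say how the 14 kcal/mol estimate was obtained. One common route, shown here only as a minimal sketch, is the Eyring equation of transition state theory, k = (k_B T / h) exp(-ΔG‡ / RT). The temperature of 298.15 K, the transmission coefficient of 1, and the function names are assumptions made for illustration and are not taken from the original study.

```python
import math

# Physical constants (CODATA values)
K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34 # Planck constant, J*s
R_KCAL = 1.987204e-3      # gas constant, kcal/(mol*K)


def eyring_barrier_kcal(rate_per_s, temperature=298.15):
    """Activation free energy (kcal/mol) implied by a rate constant.

    Assumes the Eyring equation with a transmission coefficient of 1;
    the default temperature is an assumption, not a value from the text.
    """
    if rate_per_s <= 0 or temperature <= 0:
        raise ValueError("rate and temperature must be positive")
    prefactor = K_B * temperature / H_PLANCK  # k_B*T/h in s^-1
    return R_KCAL * temperature * math.log(prefactor / rate_per_s)


def eyring_rate(barrier_kcal, temperature=298.15):
    """Rate constant (s^-1) implied by an activation free energy (kcal/mol)."""
    prefactor = K_B * temperature / H_PLANCK
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


if __name__ == "__main__":
    # Turnover rates quoted in the text for the enzyme and the mimic
    for label, k in (("GOase", 800.0), ("biomimetic compound", 0.02)):
        print(f"{label}: k = {k} s^-1 -> barrier = {eyring_barrier_kcal(k):.1f} kcal/mol")
```

Under these assumptions the enzyme's turnover rate corresponds to roughly 13.5 kcal/mol, in line with the quoted 14 kcal/mol, and the mimic's rate corresponds to a barrier a little under 20 kcal/mol, to be compared with the calculated 21 kcal/mol. Because the rate depends exponentially on the barrier, a difference of a few kcal/mol already accounts for several orders of magnitude in turnover.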

6 OUTLOOK

This review has shown the power of AIMD to describe biochemical problems. Complex enzymatic processes (such as catalytic reactions and binding of drugs) can be followed directly at the molecular level, and many valuable insights can be gained from such in situ studies. AIMD and Hybrid/AIMD simulations certainly constitute a promising novel tool for an ab initio modeling of biological processes. However, due to the great complexity of the systems, technical and fundamental reasons still limit the domain of applications. The system size problem necessitates mixed QM/MM approaches, which in the future might be accompanied by linear scaling approaches. However, the most severe of the remaining limitations is the time scale of a few tens of picoseconds during which the system can be sampled. Therefore, the combination of AIMD and Hybrid/AIMD simulations with enhanced sampling techniques [163] can be expected to multiply the power of this approach. The fast ongoing progress in the development of new algorithms and computer architectures makes us confident that AIMD and Hybrid/AIMD methods will be able to add a new dimension to the simulation of biological processes.

Acknowledgments. It is a pleasure to thank all the people who have contributed to this review, in particular Frank Alber, Karel Doclo, Stefano Piana, Lorenzo De Santis and Marialore Sulpizi. We also acknowledge fruitful collaborations with Wanda Andreoni and Gerd Folkers. We are indebted to Erio Tosatti and Michael Klein for many useful discussions. Finally, we would like to thank Michele Parrinello for his continuous support.

References

[1] Car R, Parrinello M, Phys Rev Lett 55:2471 1985
[2] Ballone P, Andreoni W, Car R, Parrinello M, Phys Rev Lett 60:271-274 1988
[3] Röthlisberger U, Andreoni W, J Chem Phys 94:8129 1991
[4] Car R, Parrinello M, Phys Rev Lett 60:204-207 1988
[5] See, e.g., (a) Nusterer E, Blöchl PE, Schwarz K, Angew Chem Intl 35:175 1996; (b) Charlier JC, De Vita A, Blase X, Car R, Science 275:646 1997
[6] See, e.g., (a) Röthlisberger U, Klein ML, J Am Chem Soc 117:42 1995; (b) Röthlisberger U, Sprik M, Klein ML, J Chem Soc Faraday Trans 94:501 1998; Doclo K, Röthlisberger U, Chem Phys Lett 297:205 1998
[7] See, e.g., (a) Boero M, Parrinello M, Terakura K, J Am Chem Soc 120:2746 1998; (b) Hass KC, Schneider WF, Curioni A, Andreoni W, Science 282:882 1998; (c) Boero M, Parrinello M, Hüffer S, Weiss H, J Am Chem Soc 122:501 2000
[8] Sprik M, Hutter J, Parrinello M, J Chem Phys 105:142 1996 and references therein
[9] See, e.g., (a) Molteni C, Parrinello M, J Am Chem Soc 120:2168 1998; (b) Brugé F, Bernasconi M, Parrinello M, J Am Chem Soc 121:10883 1999; (c) Alber F, Folkers G, Carloni P, J Mol Structure (Theochem) 489:237 1999
[10] See, e.g., (a) Curioni A, Sprik M, Andreoni W, Schiffer H, Hutter J, Parrinello M, J Am Chem Soc 119:7218 1997; (b) Meijer EJ, Sprik M, J Phys Chem A 102:2893 1998; (c) Meijer EJ, Sprik M, J Am Chem Soc 120:6345 1998
[11] Carloni P, Blöchl PE, Parrinello M, J Phys Chem 99:1338 1995
[12] Blöchl PE, Parrinello M, Phys Rev B 45:9413 1992; Kresse G, Hafner J, J Non Cryst Solids 156-158:956 1993; Alavi A, Kohanoff J, Parrinello M, Frenkel D, Phys Rev Lett 73:2599 1994; VandeVondele J, De Vita A, Phys Rev B 60:13241 1999
[13] Nosé S, Mol Phys 52:255 1984; Hoover WG, Phys Rev A 31:1695 1985
[14] Melchionna S, Ciccotti G, Holian BL, Mol Phys 78:533 1993
[15] Parrinello M, Rahman A, Phys Rev Lett 45:1196 1980
[16] Focher P, Chiarotti GL, Bernasconi M, Tosatti E, Parrinello M, Europhys Lett 26:345 1994; Bernasconi M, Chiarotti GL, Focher P, Scandolo S, Tosatti E, Parrinello M, J Phys Chem Solids 56:501 1995
[17] Marx D, Parrinello M, Z Phys B 95:143 1994; Marx D, Parrinello M, J Chem Phys 104:4077 1996; Tuckerman ME, Marx D, Klein ML, Parrinello M, J Chem Phys 104:5579 1996; Martyna GJ, Hughes A, Tuckerman ME, J Chem Phys 110:3275 1999
[18] Blöchl PE, J Chem Phys 103:7422 1995; Marx D, Fois E, Parrinello M, Intl J Quant Chem 57:655 1996; Martyna GJ, Tuckerman ME, J Chem Phys 110:2810 1999
[19] Hammes-Schiffer S, Andersen HC, J Chem Phys 99:523 1993
[20] Hartke B, Carter EA, Chem Phys Lett 189:358 1992
[21] Hartke B, Carter EA, J Chem Phys 97:6569 1992
[22] Blöchl PE, Phys Rev B 50:17953 1994
[23] Lippert G, Hutter J, Parrinello M, Mol Phys 92:477 1997
[24] Woo TK, Margl PM, Blöchl PE, Ziegler T, J Phys Chem B 101:7877 1997; Eichinger M, Tavan P, Hutter J, Parrinello M, J Chem Phys 110:10452 1999
[25] Hutter J, Carloni P, Parrinello M, J Am Chem Soc 118:8710 1996
[26] Carloni P, Sprik M, Andreoni W, J Phys Chem B 104:823 2000
[27] Sagnella DE, Laasonen K, Klein M, Biophys J 71:1172 1996
[28] Carloni P, Andreoni W, Parrinello M, Phys Rev Lett 79:761 1997
[29] Florián J, Baumruk V, Štrajbl M, Bednárová L, Štěpánek J, J Phys Chem 100:1559
[30] Carloni P, Andreoni W, Hutter J, Curioni A, Giannozzi P, Parrinello M, Chem Phys Lett 234:50 1995
[31] Tolari E, Carloni P, Andreoni W, Hutter J, Parrinello M, Chem Phys Lett 234:469
[32] Carloni P, Andreoni W, J Phys Chem 100:17797
[33] Comba P, Hambley T, "Molecular Modeling of Inorganic Compounds", VCH, Weinheim, 1995
[34] Rovira C, Kunc K, Hutter J, Ballone P, Parrinello M, J Phys Chem A 101:8914 1997
[35] Rovira C, Ballone P, Parrinello M, Chem Phys Lett 271:247 1997
[36] Rovira C, Kunc K, Hutter J, Ballone P, Parrinello M, Int J Quantum Chem 69:31 1998
[37] Rovira C, Parrinello M, Int J Quantum Chem 70:387 1998
[38] Rovira C, Parrinello M, Chem Eur J 5:250 1999

[39] Rovira C, Parrinello M, Biophys J 78:93 2000
[40] Rovira C, Carloni P, Parrinello M, J Phys Chem B 103:7031 1999
[41] Segall MD, Payne MC, Ellis SW, Tucker GT, Boyes RN, Xenobiotica 28:15 1998
[42] Segall MD, Payne MC, Ellis SW, Tucker GT, Boyes RN, Phys Rev E 57:4618 1998
[43] Segall MD, Payne MC, Ellis SW, Tucker GT, Boyes RN, Chem Res Toxicol 11:962 1998
[44] Segall MD, Payne MC, Ellis SW, Tucker GT, Eddershaw PJ, Xenobiotica 29:561 1999
[45] Marchi M, Hutter J, Parrinello M, J Am Chem Soc 118:7847 1996
[46] Bifone A, de Groot HJM, Buda F, Chem Phys Lett 248:165 1996
[47] Buda F, de Groot HJM, Bifone A, Phys Rev Lett 77:5405 1996
[48] Bifone A, de Groot HJM, Buda F, J Phys Chem B 101:2954 1997
[49] Bifone A, de Groot HJM, Buda F, Pure Appl Chem 69:2105 1997
[50] La Penna G, Buda F, Bifone A, de Groot HJM, Chem Phys Lett 294:447 1998
[51] Molteni C, Frank I, Parrinello M, J Am Chem Soc 121:12177 1999
[52] Frank I, Hutter J, Marx D, Parrinello M, J Chem Phys 108:4060 1998
[53] Karplus M, Petsko GA, Nature 347:631-639 1990
[54] Tashian RE, BioEssays 10:186 1989
[55] Xue Y, Liljas A, Jonsson BH, Lindskog S, Proteins: Str Func Gen 17:93 1993
[56] Silverman DN, Tu C, Chen X, Tanhauser SM, Kresge AJ, Laipis PJ, Biochemistry 32:10757 1993
[57] For computational details and additional information see Röthlisberger U, ACS Symp Ser, Am Chem Soc, Washington, DC 1998 712:264-274 1995
[58] Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA, J Am Chem Soc 117:5179 1995
[59] van Gunsteren WF, Billeter SR, Eising AA, Hünenberger PH, Krüger P, Mark A, Scott WRP, Tironi IG, GROMOS96, BIOMOS, Zürich and Groningen 1996
[60] Hakansson K, Carlsson M, Svensson LA, Liljas A, J Mol Biol 227:1192 1992
[61] Collyer SMR, McMahon TB, J Phys Chem 87:909 1983
[62] Liljas SG, Bartmess JE, Liebman JF, Holmes JL, Mallard WG, J Phys Chem Ref Data 17, Suppl 1 1988
[63] Merz KM Jr, J Am Chem Soc 113:406 1991
[64] Fersht A, "Enzyme structure and mechanism", 2nd ed, New York: Freeman WH; 1985, p 327
[65] Kraut J, Ann Rev Biochem 46:331 1977
[66] Stroud RM, Sci Am 231:74 1974
[67] Branden C, Tooze J, "Introduction to protein structure", 2nd ed, New York: Garland; 1999, p 410
[68] Matheson NR, van Halbeek H, Travis J, J Biol Chem 266:13489 1991
[69] Steitz TA, Shulman RG, Annu Rev Biochem Biophys 11:419 1982
[70] Blow DM, Birktoft JJ, Hartley BS, Nature 221:337 1969
[71] Matthews BW, Sigler PB, Henderson R, Blow DM, Nature 214:652 1967
[72] Lin J, Cassidy CS, Frey PA, Biochemistry 37:11940 1998
[73] Cassidy CS, Lin J, Frey PA, Biochemistry 36:4576 1997
[74] Frey PA, Whitt SA, Tobin JB, Science 264:1927 1994
[75] Bryan P, Pantoliano MW, Quill SG, Hsiao HY, Poulos T, Proc Nat'l Acad Sci USA 83:3743-3745 1986
[76] Hwang JK, Warshel A, Biochemistry 26:2669 1987
[77] Warshel A, Naray-Szabo G, Sussman F, Hwang JK, Biochemistry 28:3629 1989
[78] Wells JA, Cunningham BC, Craycar TP, Estell DA, Phil Trans R Soc Lond A 317:415 1986
[79] Lin J, Westler WM, Cleland WW, Markley JL, Frey PA, Proc Nat'l Acad Sci USA 95:14664 1998
[80] Halkides CJ, Wu YQ, Murray CJ, Biochemistry 35:15941 1996
[81] Warshel A, Papazyan A, Kollman PA, Science 269:102 1995
[82] Warshel A, J Biol Chem 273:27035 1998
[83] For additional information, see De Santis L, Carloni P, Proteins: Str Func Gen 37:611 1999
[84] Models based on the structure of pancreatic elastase complexed with Ace-Ala-Pro-Val-difluoro-N-phenylethylacetamide: Takahashi LH, Radhakrishnan R, Rosenfield RE, J Am Chem Soc 111:3368 1989
[85] Rao SN, Singh UC, Bash PA, Kollman PA, Nature 328:551 1987
[86] Jeffrey GA, Saenger W, "Hydrogen bonding in biological structures", Berlin: Springer-Verlag; 1991
[87] See, e.g., (a) "3D QSAR in drug design: ligand-protein interaction and molecular similarity", Kubinyi H, Folkers G, Martin YC, Kluwer Escom, Dordrecht-Boston-London, 1998; (b) "Structure-based drug design: computational advances", Marrone JM, Briggs JM, McCammon A, Annu Rev Pharmacol Toxicol 37:71 1997; (c) "Computer-Aided Molecular Design: Theory and Application", Doucet JP, Weber J, Academic Press, London, 1996

[88] Davies DR, Annu Rev Biophys Biophys Chem 19:189 1990
[89] Fitzgerald PMD, Springer JP, Annu Rev Biophys Biophys Chem 20:299 1991
[90] Todd MJ, Semo N, Freire E, J Mol Biol 283:475 1998
[91] Harte WE, Swaminathan S, Mansuri MM, Martin JC, Rosenberg IE, Beveridge DL, Proc Nat'l Acad Sci (USA) 87:8864 1990
[92] Harte WE, Swaminathan S, Beveridge DL, Proteins: Str Func Gen 12:175 1992
[93] York DM, Darden TA, Pedersen LG, Anderson MW, Biochemistry 32:1443 1993
[94] Wlodawer A, Vondrasek J, Annu Rev Biophys Biomol Struct 27:249 1998
[95] Chatfield DC, Brooks BR, J Am Chem Soc 117:5561 1995
[96] Straatsma TP et al, in "Computer Simulations of Biomolecular Systems", van Gunsteren WF, Weiner PK, Wilkinson AJ, Eds (ESCOM, Leiden), p 363, 1993
[97] Liu H, Müller-Plathe F, van Gunsteren WF, J Mol Biol 261:454 1996
[98] Geller M, Miller M, Swansom SM, Maizel J, Proteins: Str Func Gen 27:195 1997
[99] Harrison RW, Weber IT, Prot Eng 7:1353 1994
[100] For a description of computational details and additional information see Piana S, Carloni P, Proteins: Str Func Gen, in press (2000)
[101] Hyland LJ, Tomaszek TA Jr, Roberts GD, Carr SA, Magaard VW, Bryan HL, Fakhoury SA, Moore ML, Minnich MD, Culp JS, DesJarlais RL, Meek TD, Biochemistry 30:8454 1991
[102] Smith R, Brereton IM, Chai RY, Kent SBH, Nature Struct Biol 3:946 1996
[103] McKeever BM, Navia MA, Fitzgerald PM, Springer JP, Leu CT, Heimbach JC, Herbert WK, Sigal IS, Darke PL, J Biol Chem 264:1919 1989
[104] Beveridge AJ, Heywood GC, Biochemistry 32:3325 1993
[105] Polgar L, Szeltner Z, Boros I, Biochemistry 33:9351 1994
[106] Trylska J, Antosiewicz J, Geller M, Hodge CN, Klabe RM, Head MS, Gilson MK,
[107] Berendsen HJC, Mavri J, in "Theoretical Treatments of Hydrogen Bonding", Hadzi D, Ed, p 119, 1997
[108] Cleland WW, Kreevoy MM, Science 264:1887 1994
[109] Miller M, Schneider J, Sathyanarayana BK, Toth MV, Marshall GR, Clawson L, Selk L, Kent SBH, Wlodawer A, Science 246:1149 1989
[110] Erickson J, Neidhart DJ, VanDrie J, Kempf DJ, Wang XC, Norbeck DW, Plattner JJ, Rittenhouse JW, Turon M, Wideburg N, Kohlbrenner WE, Simmer R, Helfrich R, Paul DA, Knigge M, Science 249:527 1990
[111] Suguna K, Padlan EA, Smith CW, Carlson WD, Davies DR, Proc Nat'l Acad Sci (USA) 84:7009 1987
[112] Silva AM, Cachau RE, Sham HL, Erickson JW, J Mol Biol 255:321 1996
[113] Kempf DJ, Marsh KC, Denissen JF, McDonald E, Vasavanonda S, Flentge CA, Green BE, Fino L, Park CH, Kong XP, Wideburg NE, Saldivar A, Ruiz L, Kati WM, Sham HL, Robins T, Stewart KD, Hsu A, Plattner JJ, Leonard JM, Norbeck DW, Proc Nat'l Acad Sci (USA) 92:2484 1995
[114] Lapatto R, Blundell T, Hemmings A, Overington J, Wilderspin A, Wood S, Merson JR, Whittle PJ, Danley DE, Geoghegan KF, Hawrylik SJ, Lee EE, Scheld KG, Hobart PM, Nature 342:299 1989
[115] Wlodawer A, Miller M, Jaskolski M, Sathyanarayana BK, Baldwin E, Weber IT, Selk LM, Clawson L, Schneider J, Kent SB, Science 245:616 1989
[116] Bagossi P, Cheng YE, Oroszlan S, Tozser J, Prot Eng 9:997 1996
[117] de Clercq E, Ann N Y Acad Sci 724:438 1994
[118] Richman DD, Annu Rev Pharmacol Toxicol 33:149 1993
[119] Boyer PL, Ferris AL, Clark P, Whitmer J, Frank P, Tantillo C, Arnold E, Hughes SH, J Mol Biol 243:472 1994
[120] Tantillo C, Ding J, Jacobo-Molina A, Nanni RG, Boyer PL, Hughes SH, Pauwels R, Andries K, Janssen PA, Arnold E, J Mol Biol 243:369 1994
[121] Huang H, Chopra R, Verdine GL, Harrison SC, Science 282:1669 1998
[122] Alber F, Carloni P, submitted
[123] Gu Z, Gao Q, Fang H, Salomon H, Parniak M, Goldberg E, Cameron J, Wainberg MA, Antimicrob Agents Chemother 38:275 1994
[124] Iversen AK, Sharer RW, Wehrly K, Winters MA, Mullins JI, Chesebro B, Merigan TC, J Virol 70:1086 1996
[125] Elion GB, Furman PA, Fyfe JA, De Miranda P, Beauchamp C, Schaeffer HJ, Proc Nat'l Acad Sci 74:5716 1977
[126] Schaeffer HJ, Beauchamp C, De Miranda P, Elion GB, Bauer DI, Collins P, Nature 272:583 1978
[127] Culver KW, Ram Z, Wallbridge S, Ishii H, Oldfield EH, Blaese RM, Science 256:1550 1992
[128] Chen S-H, Shine HD, Goodman JC, Grossman RG, Woo SLC, Proc Nat'l Acad Sci 91:3054 1991
[129] O'Malley BW Jr, Chen SH, Schwartz MR, Woo SLC, Cancer Res 55:1080 1995
[130] Chambers R, Gillespie GY, Soroceanu L, Andreansky S, Chatterjee S, Chou J, Roizman B, Whitely RJ, Proc Nat'l Acad Sci 92:1411 1995

[131] Vile RG, Hart IR, Cancer Res 53:3860 1993
[132] Caruso M, Panis Y, Gagandeep S, Houssin D, Salzmann JL, Klatzmann D, Proc Nat'l Acad Sci 90:7024 1993
[133] Alber F, Kuonen O, Scapozza L, Folkers G, Carloni P, Proteins: Str Func Gen 31:453 1998
[134] Wild K, Bohner T, Aubry A, Folkers G, Schulz GE, FEBS Lett 369:289 1995
[135] Pilger B, Perozzo R, Alber F, Wurth C, Folkers G, Scapozza L, J Biol Chem 274:31967 1999
[136] The sulfur atom of Met 128 is 4.8 Å away from the thymine ring. Therefore, it should in principle be possible to find sizable polarization effects on the sulfur
[137] See, e.g., "Mechanistic Bioinorganic Chemistry" (Thorp HH, Pecoraro VL, Eds, American Chemical Society, Washington DC 1995); "Bioinorganic Catalysis" (Reedijk J, Ed, Marcel Dekker, New York 1993)
[138] For a review on GOase see, e.g., Whittaker JW, in Metal Ions in Biological Systems (Sigel H, Sigel A, Eds, Marcel Dekker, New York 1993), Vol 30, p 315
[139] Johnson JM, Halsall HB, Heineman WR, Anal Chem 54:1394 1982
[140] Ito N, Phillips SEV, Stevens C, Ogel ZB, McPherson MJ, Keen JN, Yadav KDS, Knowles PF, Nature 350:87 1991
[141] Ito N, Phillips SEV, Yadav KDS, Knowles PF, J Mol Biol 238:794 1994
[142] Whittaker MM, De Vito VL, Asher SA, Whittaker JW, J Biol Chem 264:7104 1989
[143] Whittaker MM, Whittaker JW, J Biol Chem 265:9610 1990
[144] Gerfen GJ, Bellew BI, Griffin RG, Singel DJ, Eckberg AC, Whittaker JW, J Phys Chem 100:16739 1996
[145] Branchaud BP, Montague-Smith MP, Kosman DJ, McLaren FR, J Am Chem Soc 115:798 1993
[146] Adams H, Bailey NA, Campell IK, Fenton DE, He QY, J Chem Soc, Dalton Trans 2233 1996
[147] Wang Y, Stack TDP, J Am Chem Soc 118:13097 1996
[148] Halfen JA, Young VG Jr, Tolman WB, Angew Chem Int Ed Engl 35:1687 1996
[149] Whittaker MM, Duncan WR, Whittaker JW, Inorg Chem 35:382-386 1996
[150] Halfen JA, Jazdzewski BA, Mahapatra S, Berreau LM, Wilkinson EC, Que L Jr, Tolman WB, J Am Chem Soc 119:8217 1997
[151] Sokolowski A, Leutbecher H, Weyermüller T, Schnepf R, Bothe E, Bill E, Hildebrandt P, Wieghardt K, J Biol Inorg Chem 2:244 1997
[152] Fontecave M, Pierre JL, Coord Chem Rev 170:125 1998
[153] Vaidyanathan M, Viswanathan R, Palaniandavar M, Balasubramanian T, Prabhaharan P, Muthiah TP, Inorg Chem 37:6418 1998
[154] Ito S, Nishino S, Itoh H, Ohba S, Nishida Y, Polyhedron 17:1637 1998
[155] Ruf M, Pierpont CG, Angew Chem Int Ed 37:1736 1998
[156] Wang Y, Dubois JL, Hedman B, Hodgson KO, Stack TDP, Science 278:537 1998
[157] (a) Chaudhuri P, Hess M, Flörke U, Wieghardt K, Angew Chem Intl Ed 37:2217 1998; (b) Chaudhuri P, Hess M, Weyermüller T, Wieghardt K, ibid 38:1095 1999
[158] Röthlisberger U, Carloni P, Intl J Quant Chem 73:209 1999
[159] Röthlisberger U, Carloni P, Doclo K, Parrinello M, J Biol Inorg Chem, in press (2000)
[160] Eichinger M, Tavan P, Hutter J, Parrinello M, J Chem Phys 110:10452 1999
[161] Wachter RM, Branchaud BP, Biochim Biophys Acta 1384:43 1998
[162] Baron AJ, Stevens C, Wilmot C, Seneviratne KD, Blakeley V, Dooley DM, Phillips SE, Knowles PF, McPherson MJ, J Biol Chem 269:25095 1994
[163] VandeVondele J, Röthlisberger U (to be published)

L.A. Eriksson (Editor), Theoretical Biochemistry - Processes and Properties of Biological Systems. Theoretical and Computational Chemistry, Vol. 9. © 2001 Elsevier Science B.V. All rights reserved.

Chapter 7

Computational enzymology: Protein tyrosine phosphatase reactions

K. Kolmodin, V. Luzhkov* and J. Åqvist

Department of Cell and Molecular Biology, Uppsala University, Biomedical Center, Box 596, SE-751 24 Uppsala, Sweden.

1. INTRODUCTION

Phosphoryl transfer to and from specific tyrosine residues in proteins is an important regulatory (signaling) mechanism involved in cellular processes such as cell growth, proliferation, differentiation and T-cell activation [1-3]. The cascades of phosphoryl transfer reactions by the phosphorylating protein tyrosine kinases and dephosphorylating protein tyrosine phosphatases (PTPases) form an extremely complex network of interacting proteins in the cell. The mutual actions of the kinases and phosphatases determine the level of phosphorylation of their target proteins and thereby guarantee correct timing of the cellular processes. Most of the proteins that are regulated by each specific type of PTPase are not yet identified. Nevertheless, the PTPases hydrolyze phosphotyrosyl-containing proteins and peptides as well as small arylphosphates in vitro, which makes it possible to characterize them biochemically. The interest in PTPases has exploded in recent years and considerable progress has been made towards elucidating their catalytic machinery. Several enzymological studies as well as crystal structures have been reported [4-11], but despite these advances a few fundamental questions regarding the catalytic reaction mechanism remain unanswered. Here we present a computational study of the reaction catalyzed by the PTPases. The calculations, based on crystal structures of three different phosphatases in total, were performed in order to answer detailed mechanistic questions not always accessible to classical enzymology. A quantum chemical investigation of kinetic isotope effects in phosphate monoester hydrolysis is also presented.

* Permanent address: Institute of Problems of Chemical Physics, Russian Academy of Sciences, Chernogolovka, Moscow Region, 142432, Russian Federation.

Figure 1. Schematic view of the active site in a typical PTPase.

2. PROTEIN TYROSINE PHOSPHATASE REACTIONS 2.1. Protein tyrosine phosphatases The PTPases can divided into three subfamilies based on their primary structure: I. The major family of the PTPases is formed by proteins containing at least one homologous catalytic domain. These proteins can either be membrane bound (e.g. the leukocyte-antigen related PTPase, LAR) or non-membrane bound (e.g. PTP1B). II. The dual specificity phosphatases (DSPases) can dephosphorylate both serine/threonine residues as well as tyrosine residues. This family includes for example the Vaccinia///-related phosphatase (VHR) and the cell cycle controlling phosphatases Cdc25. III. The cytosolic low molecular weight PTPases (LMPTP) form a distinct class, containing a catalytic domain of only 140-180 residues. There is also a large number of specific serine/threonine phosphatases that have a totally different strategy for catalysis. These enzymes utilize bound metal ions for catalyzing the reaction, which is not the case in the PTPases and DSPases. The PTPases all possess the active site signature motif H/V-C-(X)5-R-S/T comprising the characteristic phosphate binding loop, referred to as the P-loop. The backbone amide NH-groups of the P-loop residues are oriented towards the center of the substrate binding crevice forming an phosphate anion hole (Figure 1). As an extension of the P-loop backbone, the guanidinium group of the invariant arginine side chain is involved in binding the substrate and stabilizing the transition states, by forming a bidentate interaction with two of


Figure 2. The reaction catalyzed by the protein tyrosine phosphatases.

the non-bridging oxygens of the substrate phosphate group. The hydroxyl group of the serine/threonine residue immediately after the arginine (not in Cdc25) forms a hydrogen bond with the catalytic cysteine. In addition to the P-loop, all PTPases (except possibly Cdc25) also possess a conserved aspartic acid residue positioned on a more or less flexible loop close to the active site. In most PTPase crystal structures this side chain is within hydrogen bond distance of the bridging oxygen of the ligand. Therefore, this residue is believed to function as a general acid which donates its proton to the leaving group oxygen.

2.2. The PTPase reaction mechanism

PTPases catalyze the hydrolysis of phosphate monoesters, yielding inorganic phosphate and the dephosphorylated substrate as products. The fact that active site structures and kinetic properties, such as formation of a cysteinyl phosphate intermediate, pH-rate profiles etc., are similar for different types of PTPases [12-14] indicates that they all employ a common mechanism for catalysis. The catalytic reaction in PTPases has been shown to proceed via a double displacement mechanism involving a phosphoenzyme intermediate where the phosphate group is covalently bound to the cysteine residue in the active site motif [15]. The formation of this thiophosphate intermediate is accomplished by a substitution reaction where the catalytic cysteine attacks the phosphorus atom and the leaving group oxygen is protonated by the general acid as the P-O bond is cleaved [16,17]. This aspartate residue is thought to subsequently activate a water molecule which hydrolyzes the phosphorylated cysteine in the following step (Figure 2). It is most likely that the catalytic cysteine is in its ionized form when the first nucleophilic displacement takes place. However, it is still unclear


whether the catalytic cysteine is in its thiol or ionized form in the free enzyme, as well as in the enzyme-substrate complex. Here we will describe how the total reaction free energy profile of the reaction catalyzed by a low molecular weight PTPase (LMPTP) is calculated using the empirical valence bond method. Combining the results with binding free energy calculations, the protonation state of the reacting fragments is determined. The consistency of the calculated reaction free energy profile is further verified by studies of mutant enzymes.

3. THE EMPIRICAL VALENCE BOND METHOD

The empirical valence bond (EVB) method describes chemical reactions in terms of resonance structures or valence bond (VB) states that represent different bonding arrangements and charge distributions along a reaction pathway [18-20]. EVB can be used in combination with molecular dynamics simulations (MD) and free energy perturbation (FEP) techniques in order to obtain reaction free energy profiles of reactions in different environments, for example in water solution and in the active site of an enzyme. Here, MD is mainly used as a tool for thermal sampling of the system, while the FEP technique is used to drive the reaction from reactant to product and allow the free energy profile (potential of mean force) to be calculated. The diagonal elements of the EVB Hamiltonian correspond to the diabatic energies of the valence bond states and are given by a regular force field expression

$$\varepsilon_i = H_{ii} = V_{bond}^{(i)} + V_{angle}^{(i)} + V_{torsion}^{(i)} + V_{nb,rr}^{(i)} + V_{nb,rs}^{(i)} + V_{ss} + \alpha^{(i)} \qquad (1)$$

where the first four terms describe the bonded and non-bonded energies of the molecular fragments corresponding to the ith resonance structure, while $V_{nb,rs}^{(i)}$ denotes its non-bonded interaction with the surrounding system. The sixth term represents bonded and non-bonded interactions within the surrounding system, which are the same for all resonance structures. The parameter $\alpha^{(i)}$ determines the gas phase energy of the ith state with the fragments at infinite separation [18-20]. The actual ground-state energy of the system, $E_g$, at a given configuration is obtained by mixing the VB states using the off-diagonal elements (resonance integrals) $H_{ij}$ and solving the secular equation:

$$\mathbf{H}\mathbf{C} = E_g \mathbf{C} \qquad (2)$$
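For a reaction step described by only two VB states, the secular equation has a closed-form lowest root. The following minimal sketch illustrates this; the function name and the numerical values are illustrative assumptions (energies taken to be in kcal/mol) and are not parameters from this study.

```python
import math

def two_state_ground_energy(eps1, eps2, h12):
    """Lowest eigenvalue of the 2x2 EVB Hamiltonian [[eps1, h12], [h12, eps2]].

    eps1, eps2: diabatic (diagonal) energies of the two VB states.
    h12: off-diagonal resonance integral coupling the two states.
    """
    mean = 0.5 * (eps1 + eps2)
    half_gap = 0.5 * (eps1 - eps2)
    # Mixing always lowers the ground state below the lower diabatic energy
    # (or leaves it unchanged when h12 is zero).
    return mean - math.sqrt(half_gap ** 2 + h12 ** 2)

# Hypothetical values: two degenerate diabatic states coupled by 5 kcal/mol.
e_ground = two_state_ground_energy(10.0, 10.0, 5.0)
print(f"E_g = {e_ground:.2f} kcal/mol")
```

At the crossing point of the two diabatic surfaces the ground state lies exactly |H12| below them, which is why the off-diagonal element controls the barrier height during calibration.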


One advantage of the EVB approach is that $\alpha^{(i)}$ and $H_{ij}$ can be calibrated using experimental information on reaction free energies ($\Delta G^{\circ}$) and activation barriers ($\Delta G^{\ddagger}$) for relevant reference reactions in solution. The resulting parameters (typically $\Delta\alpha_{ij}$ and $H_{ij}$) are then used without change in simulations of the enzyme reaction. The obtained result is then the effect on the free energy profile when the reaction is transferred from one environment (water solution) to another (solvated enzyme). The free energy is evaluated by driving the system between different VB states using an FEP mapping potential of the form:

$$\varepsilon_m = \sum_i \lambda_i^m \varepsilon_i, \qquad \sum_i \lambda_i^m = 1 \qquad (3)$$

where the mapping vector $\vec{\lambda}_m$ with components $\lambda_i^m$ is changed in small incremental steps. For a two-state reaction the mapping potential is typically:

$$\varepsilon_m = \varepsilon_1 (1 - \lambda_2^m) + \varepsilon_2 \lambda_2^m, \qquad \lambda_2^m \in [0,1] \qquad (4)$$

The actual ground-state free energy is then obtained from the expression:

$$\Delta G(X') = \Delta G(\vec{\lambda}_m) - RT \ln \left\langle \delta(X - X') \exp\{-[E_g(X') - \varepsilon_m(X')]/RT\} \right\rangle_m \qquad (5)$$

where

$$\Delta G(\vec{\lambda}_m) = -RT \sum_{n=0}^{m-1} \ln \left\langle \exp\{-(\varepsilon_{n+1} - \varepsilon_n)/RT\} \right\rangle_n \qquad (6)$$

$\Delta G(\vec{\lambda}_m)$ in Equation 5 denotes the mapping free energy for a particular value of the mapping vector $\vec{\lambda}_m$ that contributes to the sampling of the reaction coordinate value $X'$. The generalized multidimensional reaction coordinate is as usual taken as the energy gap $\Delta\varepsilon$ between relevant diabatic VB states [18-20].

3.1. EVB and the PTPase reaction

The valence bond structures used in the present calculations are shown in Figure 3, and the reaction is thus modeled in terms of conversions between these different states. The first step of the reaction ($\Phi_1 \rightarrow \Phi_2$) represents activation of the nucleophile by proton transfer from the cysteine to the dianionic phosphate group of the substrate. The next step is the formation of the transient high energy penta-coordinated structure ($\Phi_3$), followed by release of the leaving group with concerted proton transfer from the general acid residue ($\Phi_3 \rightarrow \Phi_4$). In $\Phi_4$ the phosphate group is covalently bound to the enzyme via a thiophosphate linkage.


In the second part of the reaction the leaving group is replaced by a water molecule which hydrolyzes the phosphoenzyme via a second penta-coordinated structure ($\Phi_5 \rightarrow \Phi_6$). The nucleophilic addition is modeled to occur concertedly with the proton transfer from the water to the general base residue (the same residue that acted as a general acid in the first step). Inorganic phosphate is released as final product as the S-P bond is broken ($\Phi_6 \rightarrow \Phi_7$) and finally one of the protons is transferred back to the cysteine ($\Phi_7 \rightarrow \Phi_8$), yielding the initial state of the enzyme. Since the phosphate oxygens in the enzyme are not equivalent, due to restricted rotation of the phosphate group as can be seen schematically in Figure 1, it is necessary to consider three separate VB structures for each state with a singly protonated phosphate group. For the first reaction step we also examine the most plausible pathway for an unprotonated mechanism ($\Psi_2 \rightarrow \Psi_3 \rightarrow \Psi_4$, with total charge −3 on the reacting fragments). In this case there is no proton transfer between the nucleophile and the phosphate group, and the negatively charged cysteine reacts directly with the phosphorus atom of the dianionic substrate.

The EVB Hamiltonians for the different phosphoryl transfer reactions were calibrated against relevant solution reactions utilizing experimental energetics data as well as semi-empirical and ab initio geometry optimizations (see below). As described elsewhere [18-20], the EVB calibration involves determining gas-phase energy differences $\Delta\alpha_{ij} = \alpha^{(j)} - \alpha^{(i)}$ as well as off-diagonal matrix elements $H_{ij}$ between pairs of VB states so that the EVB potential surface reproduces experimental reaction free energies and barrier heights of relevant reference reactions in solution. This calibration procedure thus involves simulations of uncatalyzed reaction steps with the reacting fragments in water and fitting the above parameters so that calculated and observed free energies coincide.

3.2. Calibration of the EVB potential

Recent ab initio calculations [21,22] combined with the Langevin dipoles (LD) and polarizable continuum (PCM) models, on the hydrolysis of mono- and dianions of methylphosphate and various phenyl phosphate derivatives, as well as earlier quantum calculations [23,24], have shown that the reaction paths generally involve two transition states (TSs) separated by a high-energy minimum. Furthermore, it has been found that the associative and dissociative reaction mechanisms seem to have similar energetics in solution [21,25]. These results are also consistent with Guthrie's analysis of available thermodynamic data [26]. A recent thermodynamic analysis of experimental information by us [25] demonstrated that both a late associative and an early dissociative TS can reproduce experimentally observed linear free energy relationships (LFERs) of phosphate ester hydrolysis reactions in solution. These LFERs have also been


quantitatively reproduced by a recent ab initio+LD/PCM study of the associative reaction pathway [22]. The issue of associative versus dissociative mechanism turns out to be less important in this case (see below), and here our main objective in calibration of the EVB surface is to estimate the heights of the TSs in water. In the case of proton transfer steps, such as $\Phi_1 \rightarrow \Phi_2$ and $\Phi_7 \rightarrow \Phi_8$, the pKa difference between donor and acceptor together with available LFERs for proton transfer steps were used as described in [27,28] for calibration of the relevant EVB parameters. Calibration of nucleophilic displacement steps, such as $\Phi_2 \rightarrow \Phi_4$ and $\Phi_5 \rightarrow \Phi_7$, utilized data from Kirby and Varvoglis [29] on hydrolysis with phenol leaving groups, from Åkerfeldt [30] on hydrolysis of phosphorothioic acids, from Bourne and Williams [31] on equilibrium constant dependence on leaving group pKa and from Guthrie's thermodynamic data on phosphoric acid derivatives [26]. For the hydrolysis of phenylphosphate dianion the rate was too slow to be measured in [29], but using the monoanion rate and the ratio between mono- and dianion hydrolysis for the 2-nitro, 4-nitro and 3,5-dinitro derivatives one can estimate an overall barrier of 32.8 kcal/mol for the phenylphosphate dianion reaction in water. This value turns out to be entirely consistent with $\beta_{lg} = -1.2$ (the Brønsted coefficient for log k vs. leaving group pKa) [29] and the rate constant estimated by Guthrie [26] for methyl phosphate dianion. The free energy barrier for hydrolysis of ethylthiophosphate dianion is obtained from [31] as 26.9 kcal/mol using the same estimate of the ratio between mono- and dianion hydrolysis. Furthermore, the equilibrium free energies for hydrolysis of phenyl phosphate and $\mathrm{RSPO_3^{2-}}$ are obtained from [31] as −3.0 and −3.8 kcal/mol, respectively (after correcting for the 55 M concentration of water), using pKas of 10.0 for phenol and 8.3 for cysteine.
From these pKas and those of water (15.7), $\mathrm{PhOPO_3H^-}$ (5.7) and $\mathrm{CH_3CH_2SPO_3H^-}$ (5.9), the barriers for reaction of $\mathrm{OH^-}$ with $\mathrm{PhOPO_3H^-}$ and $\mathrm{RSPO_3H^-}$ can be estimated as 19.2 and 13.5 kcal/mol, respectively. Also the barriers for $\mathrm{PhO^-}$ and $\mathrm{RS^-}$ attack on phosphate monoanion (reverse reactions) are obtained as 32.0 and 29.2 kcal/mol, respectively. The barrier for $\mathrm{OH^-}$ reaction with methyl phosphate monoanion is estimated to be 31.0 kcal/mol from [26]. The effect of changing the leaving group in the monoanion reaction with $\mathrm{OH^-}$ from $\mathrm{-OCH_3}$ to $\mathrm{-OPh}$ thus becomes 19.2 − 31.0 = −11.8 kcal/mol. This value can be compared to that derived from $\beta_{lg} = -1.2$ together with the corresponding $\Delta\Delta G^{\circ}$ of proton transfer between water and the phosphate as the leaving group is changed, which is −11.0 kcal/mol. Hence, our estimate for the effect of changing the leaving group from methanolate to phenolate appears entirely consistent with available data and we will use $\Delta\Delta G^{\ddagger}_{lg}(\mathrm{OCH_3} \rightarrow \mathrm{OPh}) = -11.4$ kcal/mol (the average of these two


values). The same reasoning is employed to estimate the effect of changing the leaving group from $\mathrm{-OCH_3}$ to $\mathrm{-SR}$, where we obtain $\Delta\Delta G^{\ddagger}_{lg}(\mathrm{OCH_3} \rightarrow \mathrm{SR}) = -15.6$ kcal/mol. These results can now be combined to give two free energy profiles, each having two TSs separated by a high-energy minimum, for the uncatalyzed reactions in water:

$$\mathrm{RSH + PhOPO_3^{2-} \rightleftharpoons RSPO_3^{2-} + PhOH} \qquad (7)$$

$$\mathrm{RSPO_3^{2-} + H_2O \rightleftharpoons RSH + HPO_4^{2-}} \qquad (8)$$
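The arithmetic behind the calibration barriers quoted in this section can be retraced with a short sketch. It rests on an assumption that is one reading of the text rather than a stated procedure: the barrier for hydroxide attack on the monoanion is taken as the water/dianion barrier minus the free energy of moving a proton from water to the phosphate, 2.303RT per pKa unit at an assumed 298.15 K. The function name is hypothetical.

```python
import math

R = 1.987204e-3                       # gas constant, kcal/(mol K)
T = 298.15                            # assumed temperature, K
KCAL_PER_PKA = R * T * math.log(10)   # about 1.36 kcal/mol per pKa unit

def hydroxide_barrier(barrier_water_dianion, pka_water, pka_monoanion):
    """Barrier measured from the OH- + monoanion state (assumed reading).

    The OH- + ROPO3H- state lies above H2O + ROPO3(2-) by
    2.303*RT*(pKa(water) - pKa(monoanion)), so the barrier seen from it
    is lower by that amount.
    """
    return barrier_water_dianion - KCAL_PER_PKA * (pka_water - pka_monoanion)

# Numbers quoted in the text (kcal/mol and pKa units).
phenyl = hydroxide_barrier(32.8, 15.7, 5.7)   # phenyl phosphate
thio = hydroxide_barrier(26.9, 15.7, 5.9)     # ethyl thiophosphate

# Leaving-group effect, methanolate -> phenolate: two estimates and their mean.
direct = 19.2 - 31.0
from_lfer = -11.0
average = 0.5 * (direct + from_lfer)

print(f"{phenyl:.1f} {thio:.1f} {direct:.1f} {average:.1f}")
```

Under these assumptions the sketch returns values that round to the 19.2, 13.5, −11.8 and −11.4 kcal/mol given in the text.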

The two barriers of the first reaction step are 21.3 and 22.8 kcal/mol, where the former is mainly associated with the incoming nucleophile ($\mathrm{RS^-}$) and the latter with the leaving group ($\mathrm{PhO^-}$). The issue of associative vs. dissociative mechanism becomes less important here since it mainly pertains to which barrier comes first, and they are of similar height. The level of the high energy transient intermediate structure $\Phi_3$ is more difficult to estimate accurately since it is not directly accessible to experiment. However, this state mainly serves as a reference for calculation of the flanking barriers, so its energy is not critical for our simulations. That is, our conclusions here are not affected even if $\Phi_3$ were not a minimum, which would correspond to a reaction profile with a single 22.8 kcal/mol activation barrier. Our earlier estimate [32] of the free energy of the penta-coordinated transient state $\Phi_3$ at 12.7 kcal/mol above the $\mathrm{RS^-}$ + $\mathrm{PhOPO_3H^-}$ state $\Phi_2$ was based on AM1-SM2 calculations, since they were found to agree with Guthrie's estimate of $\Phi_3$ for the case with OH groups as axial ligands. In view of the recent MP2/6-31+G**//HF/6-31G* plus LD calculations [22] this appears to be an underestimate, and we will instead use 16.5 kcal/mol for this free energy difference here, which is in better agreement with [22]. That is, the free energy of $\Phi_3$ is then only 1-3 kcal/mol below the two activation barriers. Geometries of various penta-coordinated species with different combinations of axial groups were optimized with the AM1-SM2 and PM3-SM3 Hamiltonians [33]. For the case with $\mathrm{RS^-}$ and $\mathrm{PhO^-}$ as axial groups, both AM1-SM2 and PM3-SM3 locate a similar minimum. We used the geometry of $\Phi_3$ from the former calculations, which gives both axial ligand distances of ~2.4 Å.
One can also note here that AM1, PM3, HF/6-31G* and HF/6-31+G** optimizations of $\mathrm{CH_3SPO_3H^-}$ and $\mathrm{CH_3SPO_3^{2-}}$ all give consistent S-P bond lengths of about 2.1-2.2 and 2.4 Å for the mono- and dianion, respectively.


Figure 3. Valence bond states used in the EVB calculations of the reaction mechanism catalyzed by the LMPTP. The formal charges of the reacting fragments are indicated.


These results also appear to be consistent with the crystal structure of a small double zwitterionic phosphate compound [34] in which the double negative phosphate charge is partly neutralized by interaction with cations. The top curve in Figure 4a summarizes the energetics of the protonated (−2) reaction in water, where the effect on the equilibrium constant of protonation from an aspartate has also been included. For the alternative unprotonated (−3) reaction, where neither Cys12 nor the substrate phosphate group is protonated, the solution energetics is estimated directly from the observed values of $\beta_{lg} = -1.2$ and $\beta_{nuc} = 0.13$ [29], with Guthrie's value for the $\mathrm{OH^-}$ + $\mathrm{CH_3OPO_3^{2-}}$ barrier (42.6 kcal/mol) as starting point. The two TSs are then found to be 31.0 and 34.1 kcal/mol, indicating about 10 kcal/mol higher activation energies than the protonated mechanism. The top curve in Figure 4b shows this estimated reaction profile in water, where the effect of protonation from an aspartate is also included. Again, the order of the two barriers may be interchanged, or they may even merge into a single 34.1 kcal/mol barrier (as the reaction approaches true SN2 character where $\Psi_3$ is no longer a minimum). However, as emphasized above, this does not affect the conclusions from our simulations.

3.3. Simulation details

The force field parameters for the different VB states were taken as far as possible from the GROMOS87 potential [35], which was also used to model the rest of the system. However, bonds within the reacting fragments were represented by Morse potentials using standard bond lengths and dissociation energies. Charges for the non-standard moieties involving S-P bonding were also derived from AM1-SM2 calculations and merged with those of the standard GROMOS fragments to maintain compatibility with these. Charges and van der Waals parameters for the thiolate species were those developed by Hansson et al. [27].
The protein coordinates used in the MD simulations were those of bovine liver LMPTP in complex with sulfate ion [5] (PDB entry 1PHR). The phenyl phosphate substrate was modeled into the crystal structure using the graphics program InsightII [36]. The phosphorus atom was positioned approximately where the sulfur atom of the sulfate ion is found in the crystal structure, which lets the phenyl ring fit well in the narrow hydrophobic binding slot. In addition to Cys12, seven residues close to the reaction center were considered to be charged: Arg18, Arg53, Asp48, Asp56, Arg58, His72 and Asp92, whereas Asp129 (the general acid) was protonated. Other charged groups distant from the reaction center were replaced by neutral dipolar groups. All MD/FEP/EVB calculations were carried out using the program Q [37]. The reaction center was surrounded by a 16 Å sphere of SPC water in the solution (calibration) simulations and by a sphere of


the same size containing both protein and water in the enzyme simulations. Nine crystal waters close to the active site were kept at their original positions as the water sphere was generated. Water molecules generated closer than 2.3 Å to the protein or crystal waters were removed. Protein atoms outside this sphere were restrained to their crystallographic coordinates and interacted only via bonds, angles and torsions across the boundary during the simulations. A non-bonded cut-off radius of 10 Å was used together with the local reaction field (LRF) method [38] for longer range electrostatics. The water surface was subjected to radial and polarization surface restraints according to a new model described by Marelius et al. [37]. The protein systems were equilibrated by a 20 ps stepwise heating scheme followed by a 50 ps simulation at a constant temperature of 300 K. The water systems were equilibrated by directly simulating them for 50 ps at 300 K. The MD trajectories were run using a time step of 1 fs and energy data were collected every fifth step. The free energy perturbations were sampled using 47-83 λ-points and 5 ps of simulation for each value of λ. Data from the first 2 ps of each step were discarded for equilibration.
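The sampling protocol above can be connected to the free energy perturbation sum with a minimal sketch. This is not the implementation used in the program Q; the function name, the data layout and the synthetic input are assumptions for illustration. Each window holds sampled differences between consecutive mapping potentials in kcal/mol, and the first 40% of each window (2 ps out of 5 ps) is discarded as equilibration.

```python
import math

RT = 1.987204e-3 * 300.0   # kcal/mol at the 300 K simulation temperature

def fep_mapping_free_energies(gap_samples, discard_fraction=0.4):
    """Cumulative mapping free energies from exponential averaging.

    gap_samples[n] is a list of sampled values of (eps_{n+1} - eps_n),
    collected while the system is propagated on mapping potential n.
    Returns [dG(lambda_0), dG(lambda_1), ...] with dG(lambda_0) = 0.
    """
    cumulative = [0.0]
    for samples in gap_samples:
        # Drop the equilibration part of the window before averaging.
        start = int(len(samples) * discard_fraction)
        kept = samples[start:]
        boltzmann_avg = sum(math.exp(-d / RT) for d in kept) / len(kept)
        cumulative.append(cumulative[-1] - RT * math.log(boltzmann_avg))
    return cumulative

# Synthetic check: constant gaps of 1.0 and 2.0 kcal/mol in two windows.
profile = fep_mapping_free_energies([[1.0] * 10, [2.0] * 10])
print([round(g, 3) for g in profile])
```

With constant energy differences the exponential average reduces to the difference itself, which gives a quick sanity check before the function is applied to real trajectory data.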

4. REACTION FREE ENERGY PROFILE OF THE LMPTP

4.1. Step 1: Substrate dephosphorylation

The free energy profile of the first part of the reaction, where the phosphate group is transferred to the catalytic cysteine of the enzyme (Cys12 in LMPTP), was calculated for both the protonated and unprotonated reaction mechanism. The resulting free energy profiles from the EVB/FEP/MD calculations are summarized in Figure 4. The upper curves are those of the simulated uncatalyzed reference reactions in solution, after calibration against experimental data. The lower curves are obtained by simulating the corresponding reactions in the solvated enzyme. It can be seen that the enzyme exerts a substantial catalytic effect on both the monoanionic and the dianionic reaction and generally stabilizes the high-energy structures by about 10-20 kcal/mol compared to the uncatalyzed reactions. Surprisingly, the calculated activation barriers are 13.5-14 kcal/mol for both reactions, a value which is in excellent agreement with the reported rates of this step with p-nitrophenyl phosphate as substrate at pH 5: 540 s⁻¹ [39] and 789.5 s⁻¹ [17].
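The comparison between measured rate constants and calculated barriers can be sketched with transition state theory. The snippet assumes a transmission coefficient of one and a temperature of 300 K (the simulation temperature; the experimental temperatures may differ), and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J s
R = 1.987204e-3         # gas constant, kcal/(mol K)

def barrier_from_rate(rate_constant, temperature=300.0):
    """Activation free energy (kcal/mol) from a first-order rate constant (1/s).

    Uses the Eyring relation k = (k_B*T/h) * exp(-dG/(R*T)) with a
    transmission coefficient of one (an assumption).
    """
    prefactor = K_B * temperature / H_PLANCK
    return R * temperature * math.log(prefactor / rate_constant)

# Rate constants quoted in the text for the first step.
for k in (540.0, 789.5):
    print(f"k = {k} 1/s -> barrier = {barrier_from_rate(k):.1f} kcal/mol")
```

Under these assumptions both rate constants map onto barriers inside the calculated 13.5-14 kcal/mol window.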

[Figure 4 plots: ΔG (kcal/mol) versus reaction coordinate. Legend entries: water, LMPTP, H on O2, H on O3, H on O4.]

Figure 4. a) Free energy profiles of the protonated mechanism with a total charge of −2 on the reacting fragments ($\Phi_1 \rightarrow \Phi_4$). The upper curve is the uncatalyzed reference reaction in water solution. The three lower curves are the same reaction simulated in the protein with the proton positioned on the three different oxygens as indicated in Figure 1. b) Reaction free energy profiles of the unprotonated mechanism with a total charge of −3 on the reacting fragments ($\Psi_2 \rightarrow \Psi_4$).

We find that for the protonated reaction the protein environment facilitates proton transfer from Cys12 to the phosphate group of the substrate ($\Phi_1 \rightarrow \Phi_2$), thus ensuring availability of the nucleophilic anion for the substitution reaction. This proton transfer would then correspond to a substrate assisted reaction mechanism if the cysteine is in its thiol form in the free enzyme. The small difference in free energy between $\Phi_1$ and $\Phi_2$ ( 1 " 5 kcal/mol) indicates that the pKa of the cysteine is close to that of the substrate, i.e. it is lowered by the enzymatic environment. It has previously been shown that, among other interactions, a hydrogen bond from Ser19 is important for lowering the pKa of Cys12 [14,27]. The simulated reaction profiles of the three possible proton transfers show that there is no significant discrimination of the acceptor and that proton transfer is feasible from the cysteine to any of the three oxygens. The position of the proton does not have any major effect on the catalysis of the approach to the transition state ($\Phi_2 \rightarrow \Phi_3$). On the other hand, it appears that stabilization of the thiophosphate intermediate resulting from leaving group departure is more sensitive to the nature of phosphate protonation. When the proton is bound to the O3 oxygen, which accepts a hydrogen bond from a side-chain nitrogen of Arg18, it can be engaged in hydrogen bonding to the negatively charged Asp129. When the proton is bound to O2 the distance to Asp129 becomes too large to allow such hydrogen bonding. For the third case, with the proton bound to O4, we observe a stabilization of the phosphoenzyme intermediate that is somewhere in between the other two cases.

Simulations of the P-O bond cleavage and leaving group departure clearly indicate that bond cleavage at the bridging oxygen has to be concerted with protonation of the leaving group in order to suppress a charge separation in the active site. This is in agreement with interpretations of solvent isotope effects and proton inventory experiments, which suggest that the proton from the general acid is largely transferred to the bridging oxygen in the transition state [40]. The bond cleavage was first simulated along a stepwise pathway with consecutive bond breaking and proton transfer, via a phenolate species. This pathway was predicted to be energetically unfavorable in the enzyme, yielding barriers of ~22 and ~35 kcal/mol for the protonated and unprotonated reactions, respectively. A developing negative charge on the leaving group oxygen apparently cannot be stabilized by the relatively hydrophobic surroundings in this region and, since the binding cavity is very narrow, solvating water molecules are excluded from the active site. The concerted pathway, $\Phi_3 \rightarrow \Phi_4$, is strongly facilitated by the enzyme, and the resulting negative charge on Asp129 is, unlike the phenolate ion, accessible to solvent. An MD snapshot of the transition state region corresponding to P-O bond cleavage is shown in Figure 5.

Figure 5. Snapshot of the active site in the high-energy region of the reaction ($\Phi_3 \rightarrow \Phi_4$). The side chains of residues 13-17 are omitted for clarity. Stabilizing hydrogen bonds are indicated as broken lines and the partial axial bonds are dotted. Note the phosphate hydrogen positioned on oxygen O3 stabilized by Asp129 and the carboxylic proton which is being transferred to the leaving group as the P-O bond is cleaved.


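$$
\begin{array}{ccc}
\mathrm{PhOPO_3H^-}\ (\text{water}) & \xrightarrow{\ \Delta G_3\ } & \mathrm{PhOPO_3^{2-}}\ (\text{water}) \\
\Big\downarrow\ \Delta G_1 = \Delta G_{bind}(\text{monoanion}) & & \Big\downarrow\ \Delta G_2 = \Delta G_{bind}(\text{dianion}) \\
\mathrm{PhOPO_3H^-}\ (\text{protein}) & \xrightarrow{\ \Delta G_4\ } & \mathrm{PhOPO_3^{2-}}\ (\text{protein})
\end{array}
$$

$$\Delta\Delta G_{bind} = \Delta G_2 - \Delta G_1 = \Delta G_4 - \Delta G_3 \qquad (9)$$

Figure 6. Thermodynamic cycle for determination of the relative binding free energy between a monoanionic and a dianionic substrate.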

The calculated free energy profiles show that the activation energy of the unprotonated substitution reaction with concerted protonation by Asp129 is similar to that of the protonated reaction. However, the unprotonated reaction gives an exothermicity of 13 kcal/mol. The difference in free energy between $\Phi_4$ and $\Psi_4$ is given by 1.36·(pH − pKa) kcal/mol, where the relevant pKa is that of the thiophosphate group in the enzyme. This pKa value is close to the pH normally used in experiments and thus only a small free energy difference between $\Phi_4$ and $\Psi_4$ is expected. In Figure 4 the observed difference is 16 kcal/mol, which indicates that the free energy profiles of the two simulated reactions are shifted relative to each other and that the exothermicity results from destabilization of the reactants ($\Psi_2$) rather than a large stabilization of the phosphoenzyme intermediate ($\Psi_4$).

4.2. Binding free energy calculations

Substrate binding is a prerequisite for catalysis and an important step in the reaction which should be considered for a complete understanding of the energetics. In this case, the most straightforward way to examine this issue is to evaluate the difference in substrate affinity for the two different protonation states. We thus performed free energy perturbation (FEP) calculations where the substrate phenyl phosphate was transformed from monoanion (proton positioned on O3) to dianion in aqueous solution and in the solvated protein with Cys12 in its anionic form, according to the thermodynamic cycle shown in Figure 6.


Figure 7. Calculated thermodynamic cycle describing the relationship between the two possible mechanisms of catalysis in LMPTP.

The calculated difference in binding free energy was 15.9 ± 0.9 kcal/mol for the monoanion to dianion perturbation, indicating that there is much less affinity for a dianionic substrate than for a monoanionic substrate with Cys12 ionized. Affinity calculations using the linear interaction energy approach [41,42] also confirmed a large difference in binding free energy. It was also seen from the MD structures that the distance between the nucleophile and the phosphorus atom was significantly increased (from 3.6 to 4.6 Å) due to electrostatic repulsion, as the perturbation proceeded from monoanion to dianion. The average MD structures of the protonated and unprotonated states were superimposed on the crystal structure. The r.m.s. deviation for the heavy atoms of residues Cys12-Ser19, Asp129 and the phosphate group was 0.43 Å and 0.97 Å for the monoanionic and dianionic states, respectively, and it was clear that the overall structure of the P-loop was significantly distorted in the case where the proton was absent. In particular, the nucleophilic sulfur had moved 1.7 Å from its original coordinates, away from the substrate. On the other hand, with Cys12 ionized and a proton on oxygen O3 the average MD structures were in excellent agreement with the crystal structure [43]. The MD structures suggest that a dianionic substrate, although having favorable interactions with the P-loop amide nitrogens and the positively charged Arg18, is in an electrostatically disfavored position. It has been proposed that the positive arginine would


effectively neutralize one of the charges on the phosphate group [44]. However, Arg18 also forms an ion pair with Asp92, which makes the positive charge in the active site less pronounced. With a monoanionic substrate, the hydrogen bond between the nucleophile and the phosphate hydroxyl group keeps the reactants in close contact. The destabilization of the unprotonated ES complex obtained from the binding calculations agrees well with the exothermicity observed in the reaction simulations with no proton present on the reacting groups. This allows us to rather accurately close the thermodynamic cycle describing the states involved in the two possible reaction pathways. The difference in binding free energy shifts the unprotonated ES state ($\Psi_2$) by +16 kcal/mol relative to the corresponding protonated state ($\Phi_2$) and as a result, the levels of $\Psi_4$ and $\Phi_4$ closely coincide, as expected. The thermodynamic cycle which summarizes the energetics of the substrate dephosphorylation step in LMPTP is shown in Figure 7. Thus, the most probable protonation state of the reacting groups of the PTPase-substrate complex was determined using EVB and FEP techniques.
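The closure of the cycle can be checked with the numbers quoted in Sections 4.1 and 4.2. The sketch assumes that the 16 kcal/mol gap read from Figure 4 is the level of $\Phi_4$ minus that of $\Psi_4$ when each profile is referenced to its own enzyme-substrate state; the variable and function names are hypothetical.

```python
# Values quoted in the text, in kcal/mol.
binding_shift = 15.9      # ddG_bind for monoanion -> dianion, Cys12 ionized
binding_error = 0.9       # reported uncertainty of the binding calculation
apparent_gap = 16.0       # Phi4 minus Psi4 with unshifted profiles (assumed reading)
psi_exothermicity = 13.0  # Psi2 -> Psi4 free energy drop

def residual_gap(gap, shift):
    """Difference left between Phi4 and Psi4 after shifting the Psi profile up."""
    return gap - shift

# Level of Psi4 measured from the protonated ES state after the shift.
psi4_level = -psi_exothermicity + binding_shift
residual = residual_gap(apparent_gap, binding_shift)

print(f"Psi4 level: {psi4_level:+.1f}, residual gap: {residual:+.1f}")
```

The residual of about 0.1 kcal/mol lies well inside the ±0.9 kcal/mol uncertainty of the binding calculation, which is the sense in which the two profiles coincide at the phosphoenzyme intermediate.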

4.3. Step 2: Phosphoenzyme hydrolysis

The second step of the reaction, phosphoenzyme hydrolysis, was simulated analogously to the first reaction step, but here only the protonated mechanism was considered. As can be seen in Figure 8, where the complete reaction is summarized, all steps of the reaction are significantly catalyzed by the enzyme compared to the uncatalyzed reference reaction in water. In particular, the activation barrier of the rate limiting step, formation of the second penta-coordinated high-energy structure ($\Phi_5 \rightarrow \Phi_6$), is lowered by as much as 15 kcal/mol. The calculated rate limiting barrier is 16 kcal/mol, which is in excellent agreement with the reported kcat value of 27.5 s⁻¹ for phenyl phosphate [39]. Since concerted bond breaking and leaving group protonation was found to be considerably favored over a stepwise mechanism in the first part of the reaction, the analogous concerted pathway was also modeled here. Simulation of the first step showed that the protein environment cannot stabilize a negative ligand in the active site outside the phosphate binding loop, which would also be the case for a stepwise proton transfer to Asp129 and a subsequent in-line attack of a hydroxide ion. The complete reaction, in which the phosphate group is effectively transferred from phenol to a water molecule, is exothermic with a change in standard free energy of −3 kcal/mol [30]. In Figure 8 the corresponding free energy change on the enzyme becomes −5 kcal/mol, in favor of the products. Here the apparent shift of the equilibrium constant includes the difference in binding of the product versus the


Figure 8. Complete free energy profile of the reaction mechanism catalyzed by LMPTP. The reaction coordinate refers to the valence bond states shown in Figure 3.

substrate. Since inorganic phosphate is a competitive inhibitor of LMPTP [11], it is not unexpected that the product binds somewhat more strongly than the substrate and thus lowers the level of $\Phi_7$ and $\Phi_8$ relative to $\Phi_1$ and $\Phi_2$. The reaction could also involve a free energy change when the leaving phenol group is exchanged for an incoming water molecule ($\Phi_4 \rightarrow \Phi_5$). The difference in binding free energy between a phenol molecule and a water molecule was therefore estimated using the linear interaction energy (LIE) method [41,42]. The absolute binding affinities of phenol and water to the active site were calculated as in [42], giving the result that the binding free energies were very similar (−2 kcal/mol) for the two ligands. The energetics of ligand exchange therefore does not affect the free energy profiles in Figure 8. MD trajectories of the wild-type phosphoenzyme intermediate show that two water molecules interact directly with the phosphate group (Figure 9). One of these water molecules will be in the right position for the hydrolysis reaction to occur. In PTP1B, Gln262 has been found to be an important residue for coordinating the nucleophilic water molecule. Mutating this residue to alanine resulted in phosphoenzyme trapping, which made it possible to crystallize the reaction intermediate [45]. Although the active site structure is very similar, there is no corresponding glutamine present in LMPTP. However, our simulations of the water attack ($\Phi_5 \rightarrow \Phi_6$) reveal that Cys17 interacts with the nucleophilic water. It seems that this interaction is involved in coordinating the water molecule in favor of the reaction. The involvement of Cys17 in the phosphoenzyme hydrolysis step was proposed by Cirri et al. [46] already in 1993, before the


Figure 9. MD structure of the active site in the phosphoenzyme intermediate state (Φ5) viewed along the P-S bond. The two water molecules (W1 and W2) interacting with the phosphate group are shown, whereas the side chains of residues 13-16 are omitted for clarity. W1 is in the position for an attack on the phosphorus atom. The network of hydrogen bonds is shown as broken lines.

structure was solved. When Cys17 was mutated to a serine the enzyme displayed low activity, but significant amounts of phosphoenzyme intermediate were trapped. This suggests that the larger thiol group orients the water molecule better than the smaller hydroxyl group in position 17. We also calculated the free energy profile (not shown) for the water attack (Φ5→Φ6) in the C17S mutant enzyme and found that the free energy barrier increased by 1.6 kcal/mol. This is consistent with the 6% residual activity compared to the wild-type enzyme reported by Davis et al. [47]. The polar and steric interactions between Cys17 and the water molecule that are involved in its appropriate positioning can be seen directly in Figure 9. Superimposing the active site residues of PTP1B and LMPTP reveals that the proposed water coordinating residues (Gln262 in PTP1B and Cys17 in LMPTP) are in the same spatial position relative to the active site, although they are not sequentially related. Cys17 is a residue in the phosphate binding loop, whereas Gln262 is positioned in a flexible loop that can apparently move in and out of the active site [45].
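The link between the 1.6 kcal/mol barrier increase and the residual activity of the C17S mutant can be checked with a Boltzmann factor. The sketch below is only an illustration: it assumes simple transition state theory with an unchanged pre-exponential factor and a temperature of 298.15 K, neither of which is stated in the text, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def relative_rate(barrier_increase_kcal, temperature_k=298.15):
    """Mutant rate divided by wild-type rate for a given barrier increase.

    Assumes transition state theory with identical prefactors, so only the
    exponential term differs between the two enzymes.
    """
    return math.exp(-barrier_increase_kcal / (R_KCAL * temperature_k))


# Barrier increase calculated for the water attack in the C17S mutant.
residual_activity = relative_rate(1.6)
print(f"Predicted residual activity: {100 * residual_activity:.1f}% of wild type")
```

Under these assumptions the predicted residual activity is a few percent of the wild-type value, the same order as the 6% quoted from experiment.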


4.4. Reaction mechanism for mutants lacking the general acid residue

The D129A mutant of LMPTP has been extensively studied in enzymological experiments. This mutant lacks the catalytically important general acid/base residue. However, the mutant is not entirely inactive, but retains an activity around 3000 times lower than that of the wild-type enzyme [17]. We have found that protonation of the leaving group is essential for catalysis of phenyl phosphate hydrolysis, since release of the negatively charged phenolate species is energetically disfavored. If the leaving group departs as an anion, we predict an energy barrier that is not compatible with the experimentally observed activity. We therefore propose that the phosphate group itself may act as an acid in the first reaction step of this mutant and protonate the leaving group concertedly with its release. The alternative reaction mechanism for D129A was simulated in the same way as the wild-type reaction, but with a slightly different set of valence bond states [48]. This hypothesis yields an activation barrier for the first step that is 5 kcal/mol higher than that of the corresponding wild-type reaction step. This corresponds to a decrease in rate by a factor of 4000, which is consistent with experiments. It then seems reasonable that the -2 charged phosphocysteine could itself abstract a proton from the attacking water molecule in the second hydrolysis step of the D129A mutant enzyme. This mechanism would then be similar to the substrate assisted reaction mechanism proposed for the acylphosphatase [49]. The complete free energy profile for this reaction mechanism in the D129A mutant is shown in Figure 10. The free energy level of the phosphoenzyme intermediate lies somewhat below the initial enzyme-substrate level. From this lowest point of the profile the rate limiting barrier Φ5→Φ6 is predicted to be 20 kcal/mol, which is in accordance with the observed turnover rate of 0.012 s⁻¹ [17].
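The correspondence between activation barriers and rates quoted for the D129A mutant can be reproduced with the Eyring equation. This is a minimal sketch rather than the procedure used in the chapter: it assumes a transmission coefficient of one and T = 298.15 K (both assumptions), and the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, temperature_k=298.15):
    """Rate constant (1/s) from an activation free energy in kcal/mol."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature_k))


def barrier_from_rate(rate_per_s, temperature_k=298.15):
    """Activation free energy (kcal/mol) implied by a rate constant in 1/s."""
    prefactor = K_B * temperature_k / H
    return -R_KCAL * temperature_k * math.log(rate_per_s / prefactor)


# Predicted rate for the 20 kcal/mol rate limiting barrier of D129A.
k_d129a = eyring_rate(20.0)
# Barrier implied by the observed turnover rate of 0.012 1/s.
barrier_observed = barrier_from_rate(0.012)
# Slowdown caused by a 5 kcal/mol higher barrier in the first step.
slowdown = eyring_rate(0.0) / eyring_rate(5.0)

print(f"k(20 kcal/mol) = {k_d129a:.4f} 1/s")
print(f"barrier for 0.012 1/s = {barrier_observed:.1f} kcal/mol")
print(f"slowdown for +5 kcal/mol = {slowdown:.0f}-fold")
```

With these assumptions a 20 kcal/mol barrier gives a rate on the order of 0.01 s⁻¹, and a 5 kcal/mol barrier increase gives a slowdown of a few thousand-fold, in line with the numbers discussed above.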
For the wild-type enzyme, the phosphoenzyme intermediate is higher in energy than the initial enzyme-substrate complex, while in the mutant it is slightly lower. This would imply that more phosphoenzyme intermediate should accumulate in the D129A mutant than in the wild type, which has also been observed in phosphoenzyme trapping experiments [17]. The good compatibility of this proposed reaction mechanism with available experimental data suggests that this pathway may actually be utilized by mutant LMPTP lacking the general acid/base Asp129.

4.5. The pKa of the catalytic cysteine is different in LMPTP and PTP1B

Since a negatively charged thiolate group is a better nucleophile than the protonated thiol, the catalytic cysteine is believed to be deprotonated prior to nucleophilic substitution. Depending on its pKa, the cysteine could be deprotonated already in the free enzyme, or it could be ionized after proton transfer to the substrate phosphate group in the Michaelis complex. The pKa of

[Plot: ΔG (kcal/mol) along the reaction coordinate through the states Φ1 to Φ8, with separate curves for LMPTP and the D129A mutant.]

Figure 10. Calculated reaction profiles for wild type and D129A mutant LMPTP.

the cysteine is dependent on the surrounding environment, and it would therefore be interesting to study how the energetics of nucleophile activation differs among the PTPases. The active sites of all PTPases are very similar in sequence and structure. However, the LMPTPs differ from the other tyrosine specific PTPases (e.g. PTP1B) in that there is no histidine residue prior to the catalytic cysteine in the sequence. The composition of amino acids surrounding the substrate binding site is also different in the two enzymes. We have employed the EVB method to study the energetics of nucleophile activation by proton transfer to a dianionic substrate in both LMPTP and human PTP1B. The two valence bond states Φ1 and Φ2 used in the calculations are shown in Figure 3. These states represent the reactants and products for the reaction where a proton is transferred from the cysteine residue to the phenyl phosphate dianion. Starting coordinates for the protein simulations were the structure of bovine liver LMPTP in complex with sulfate and human PTP1B (C215S mutant) in complex with phosphotyrosine (PDB entries 1PHR [5] and 1PTV [9], respectively). The EVB potential was calibrated to reproduce experimental data for the uncatalyzed reference reaction in solution. In the case of proton transfer the difference in free energy between the two states, ΔG⁰, can be obtained from the difference in pKa between the donor and acceptor. Once ΔG⁰ is known, the activation energy ΔG‡ can be determined from a linear free energy relationship compiled by Eigen [50]. For the proton transfer described by Φ1→Φ2 the

[Plot: free energy against the reaction coordinate (axis from -150 to 150) for the proton transfer Φ1→Φ2; the reference curve is labeled "water".]

Figure 11. Calculated energetics of proton transfer between the catalytic cysteine and the dianionic phosphate group of the substrate in LMPTP and PTP1B.

resulting difference in free energy is 3.5 kcal/mol and the activation energy is 7.8 kcal/mol. This estimate of the barrier effectively includes zero-point energy and tunneling effects since it is obtained from experimental data. In typical EVB studies of enzymatic reactions it is usually assumed that these quantum mechanical effects do not differ significantly between the water and enzyme environments. This assumption has been verified by implementation of the path integral method [51] within the EVB framework [52,53]. The reaction free energy profile obtained from the water simulation, containing only the solvated reacting fragments, was calculated using the sampling approach described above. The EVB parameters Δα and H12 were adjusted until the calculated profile reproduced the experimental values. The resulting values were then used when evaluating the corresponding simulations of the reaction in the two enzymes. The simulated free energy profiles for proton transfer are shown in Figure 11. The upper curve is the calibrated profile of the reference reaction in solution. The other two curves show that both enzymes have a significant catalytic effect on the proton transfer, i.e. the activation energies and the free energy differences between the two states are lowered compared to the water simulation. Comparing the two enzymes, it appears that Φ2 is energetically more stable in PTP1B than in LMPTP, which indicates that the catalytic cysteine has a pKa


approximately two units lower in the former enzyme. This is consistent with experimental data as well as computed pKa values [54].

4.6. Summary

Arylphosphate hydrolysis is effectively catalyzed by the PTPases without the use of active site bound cations, which are utilized by many other proteins that handle phosphorylated substrates in order to stabilize the negative charges of the reacting groups. The catalytic power of the PTPases instead arises from the perfectly designed active site structure, which stabilizes each step of the reaction. The major properties that contribute to catalysis can be summarized as follows:

I. The essential nucleophilic thiolate species is stabilized by the interaction with a hydroxyl group and a number of backbone amide hydrogen bonds. This stabilization lowers the pKa of the cysteine, favoring its activation (Φ1→Φ2).

II. The P-loop backbone amides and the side chain of the arginine residue supply perfect stabilization of the equatorial oxygens of the penta-coordinated transition states through a network of hydrogen bonds.

III. An increased pKa of the general acid/base residue results in a larger catalytic effect on the second, rate limiting step, where the water molecule is activated by the general base (Φ5→Φ6), compared to the first step, where the same residue acts as an acid (Φ3→Φ4).

The effects of these three features are clearly demonstrated by the above calculations, where the obtained free energies of each step are in good agreement with experimental observations. The fact that the energetics of mutant LMPTP (D129A and C17S) are also consistent with experiments indicates that the present computational modeling approach can successfully describe the catalytic process in a PTPase. Importantly, the calculations show that the P-loop is designed to stabilize exactly two negative charges, which means that the reacting fragments (nucleophile and phosphate group) must be singly protonated. Establishing the protonation state of the groups involved is essential for fully understanding the energetics of the catalytic mechanism, and thus the results presented here could serve as a framework in which enzymological experiments may be interpreted.

5. SUBSTRATE TRAPPING IN CYSTEINE TO SERINE MUTATED PTPases

The cysteine residue in the catalytic loop (Cys12 in LMPTP, Cys215 in PTP1B) is the essential nucleophile for PTPase activity. Experiments show that cysteine to serine mutants are completely inactive but can still bind substrate molecules. The ability to bind substrates without hydrolyzing them is called substrate trapping and has been exploited when searching for native PTPase


(10)

Figure 12. Thermodynamic cycle for determination of the difference in binding free energies between wild type and Cys→Ser mutated PTPases.

substrates in cell extracts. It is expected that the Cys→Ser mutant has lower activity than the wild type, since the hydroxyl group of the serine residue is a worse nucleophile than the thiol group of the cysteine. However, it is not entirely clear why this mutant is completely inactive. The substrate binding properties of wild type and Cys→Ser mutated PTPases can be investigated by relative binding free energy calculations. Using free energy perturbation (FEP) according to the thermodynamic cycle shown in Figure 12, the difference in affinity between the wild type and mutant proteins can be obtained. Two sets of simulations are necessary for each protein: one with the empty solvated structure and one with phenyl phosphate dianion bound in the active site. The catalytic cysteine is then slowly transformed to serine, and vice versa, in both enzymes. The perturbations include changes of charges, van der Waals parameters, bond lengths and one bond angle for the three atoms Cβ-S/O-H. The simulations were performed using the PTPase crystal structures as above. The PTP1B structure was a serine mutant with a phosphotyrosine ligand, which was manually replaced by phenyl phosphate. Acidic and basic residues close to the active site were charged, whereas those outside the simulation sphere and distant from the active site were replaced by polar neutral groups, giving the system a total charge of zero [37]. Each simulation was prepared by slow heating of the system from 1 to 300 K followed by 100 ps equilibration at 300 K. The perturbations were performed at


Table 1. Results obtained from the FEP simulations.

Simulated system   Cys→Ser ΔG¹    Ser→Cys ΔG¹    ΔGave (Cys→Ser)²   ΔΔGbind³
                   (kcal/mol)     (kcal/mol)     (kcal/mol)         (kcal/mol)
free PTP1B         -7.6±0.1       6.5±0.1        -7.1±0.4
PTP1B+ligand       -15.2±0.1      13.5±0.1       -14.3±0.5          -7.3±0.8
free LMPTP         -9.5±0.1       9.0±0.1        -9.2±0.2
LMPTP+ligand       -17.2±0.1      17.3±0.1       -17.2±0.1          -8.0±0.3

this temperature using 51 λ-steps, 1 fs time steps and 5 ps sampling at each λ. Energies were collected every fifth fs. The energy data from the first 2 ps of each λ-step were discarded for equilibration. Forward and backward FEP simulations were performed, starting from mutant PTP1B and wild type LMPTP. The backward perturbations, preceded by a 25-50 ps equilibration, were started from the endpoints of the forward runs. The changes in free energy for each perturbation (ΔG1 and ΔG2) were calculated using Equation 6. The results in Table 1 show that the forward and backward simulations yield similar energies with small standard deviations. The negative ΔΔGbind values predict that the serine mutants of both LMPTP and PTP1B bind phenyl phosphate 7-8 kcal/mol more strongly than the native enzymes. The stronger stabilization of the enzyme-substrate complex relative to the transition state, in addition to the higher pKa of the hydroxyl group compared to the thiol group, is therefore likely to be the major reason for the complete lack of activity.

6. PREDICTION OF A LIGAND INDUCED CONFORMATIONAL CHANGE IN THE ACTIVE SITE OF CDC25A

The cell cycle control phosphatases Cdc25 are dual specificity phosphatases (DSPases) that dephosphorylate both phosphothreonine and phosphotyrosine

¹ The free energy difference obtained from the simulation. Ser→Cys indicates the forward mutation and Cys→Ser refers to the backward mutation. The error is the convergence error obtained from summation in the two directions on the same trajectory.
² The average free energy difference calculated from the two independent simulations of columns 2 and 3, with the standard error of the mean. The sign of this value corresponds to the Cys→Ser mutation.
³ The difference in binding energy between the wt-ligand complex and the mutant-ligand complex. A negative value indicates that the serine mutant binds the ligand more strongly than the wild type. The error is the sum of the errors of the terms.
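The way the relative binding free energies in Table 1 follow from the individual perturbations can be sketched in a few lines. The numbers are those of Table 1; the averaging rule (flip the sign of the Ser→Cys run and take the mean) is inferred from the table footnotes, the error estimates are deliberately not reproduced, and the function and variable names are hypothetical.

```python
# Forward (Cys->Ser) and backward (Ser->Cys) FEP results from Table 1, in kcal/mol.
FEP_RUNS = {
    "PTP1B": {"free": (-7.6, 6.5), "bound": (-15.2, 13.5)},
    "LMPTP": {"free": (-9.5, 9.0), "bound": (-17.2, 17.3)},
}


def mean_cys_to_ser(cys_to_ser, ser_to_cys):
    """Average the two runs after flipping the sign of the Ser->Cys result."""
    return 0.5 * (cys_to_ser - ser_to_cys)


def relative_binding(runs):
    """Close the cycle: mutation in the complex minus mutation in the free protein."""
    return mean_cys_to_ser(*runs["bound"]) - mean_cys_to_ser(*runs["free"])


ddg_bind = {name: relative_binding(runs) for name, runs in FEP_RUNS.items()}
for name, value in ddg_bind.items():
    print(f"{name}: ddG_bind(Cys->Ser) = {value:.1f} kcal/mol")
```

A negative value means that mutating Cys to Ser costs less free energy (or gains more) in the ligand complex than in the free protein, i.e. the serine mutant binds the ligand more strongly.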


Figure 13. a) Superposition of the backbone atoms of residues 430-436 in Cdc25A (white, PDB entry 1C25) and 12-18 in LMPTP (gray, PDB entry 1PHR). The sulfate ion is found in the crystal structure 1PHR. In addition to Ser434 and Glu435, only the totally conserved side chains Cys and Arg are shown for clarity. b) Average MD structure of the ligand complex with Cdc25A after the observed conformational change, superimposed on LMPTP as in Figure 13a.

residues of their substrate proteins. The determination of the apo-protein structure of Cdc25A revealed that this enzyme has a completely different fold compared to all other phosphatases crystallized to date [55]. Although different in fold, the crystal structure confirms the expected features of the characteristic active site containing the C-X5-R motif. Crystal structures of PTPases and DSPases in complex with various ligands show a common structure of the active site, with the backbone amides of the loop residues pointing into the center of the crevice in order to stabilize the equatorial oxygens of the phosphomimetic group. The conformation of the corresponding region in the unliganded structure of Cdc25A is different. Here one of the peptide bonds is pointing its amide group in the opposite direction. This conformation would be very disfavored in the enzyme-substrate complex with the negatively charged phosphate group positioned in the active site. It would also have a destabilizing effect on the transition state, which requires maximal stabilization of the equatorial oxygens for efficient catalysis. Some structural features of this protein were studied by performing MD simulations of the Cdc25A apo-structure (PDB entry 1C25 [55]) and a modeled Cdc25A-ligand complex. The MD trajectories of the apo-structure in water were run for 65 ps at 300 K and showed relatively stable energies and well


Figure 14. Ramachandran trajectory plot of Ser434 and Glu435 during the first 5 ps of the MD simulation at 300 K. The arrows indicate the direction of the trajectories. Shaded areas correspond to the generally allowed regions for φ and ψ.

retained structures. Simulation of the modeled complex of Cdc25A and SO₄²⁻ yielded stable trajectories after 2.5 ps simulation at 300 K. However, already at 2 ps a conformational change in the backbone peptide bond between residues Ser434 and Glu435 occurred. In the starting (crystal) structure (Figure 13a) the dipoles of this peptide bond are 'inverted' compared to the pattern seen in the other H-C-X5-R containing structures. 'Inverted' here refers to the fact that the carbonyl oxygen is pointing into the phosphate binding site and the amide nitrogen is pointing outwards. As expected, this conformation is electrostatically unfavorable when there is a negatively charged ligand bound in the phosphate binding loop, and thus the dipoles spontaneously flip over to the preferred conformation when the sulfate ion is present (Figure 13b). The conformational change was monitored by measuring the Ramachandran angles of residues Ser434 and Glu435 during the MD trajectory (Figure 14). The diagram shows that residue 435 has a strained conformation in the starting structure, with the Ramachandran angles φ and ψ being unstable and located outside the allowed regions. As the trajectory proceeds, the torsional angle φ changes and the Ramachandran trajectory ends up in the allowed region of the diagram. For Ser434 mainly the ψ angle changes its value, and the Ramachandran trajectory plot is displaced from the allowed region defined by β strands into the region typical for α-helical structures. This ligand induced conformational change yields a structure that is similar to the other PTPase-ligand complexes that have been crystallized. The results


obtained from the MD simulations emphasize the importance of the active site conformation of the PTPases and DSPases with respect to substrate binding and catalysis. We suggest that the type of conformational change that was observed upon ligand binding in Cdc25A is an important molecular switch in the catalytic process [56].

7. KINETIC ISOTOPE EFFECTS IN PHOSPHORYL TRANSFER REACTIONS

In the case of phosphate monoester hydrolysis by PTPases it has been argued that reported ¹⁸O isotope effects for the non-bridge phosphate oxygens, ¹⁸(Vmax/Km)non-bridge, show that the reacting groups are unprotonated [44]. However, the ¹⁸(V/K)non-bridge values were then corrected, by the ¹⁸O isotope effect of deprotonation, for the fraction of monoanion (that needs to be deprotonated if only the dianion is reactive) present under the experimental conditions. Since such a correction assumes that the reactive species is the phosphate dianion (together with thiolate), the result cannot be used to prove the assumption. One could equally well assume that the reactive species is the phosphate monoanion plus thiolate, or the dianion plus thiol (with proton transfer according to Φ1→Φ2), in which case the correction would go in the opposite direction. This would lead to a corrected value of ¹⁸(V/K)non-bridge = 1.017, which is in fact very similar to that observed for hydrolysis of p-nitrophenyl phosphate in solution [44]. The ¹⁸O isotope effects from the three non-bridging phosphate oxygens are often used as diagnostic tools for investigating the details of phosphoryl transfer reactions in enzymes and solutions. It would be interesting to investigate whether the experimentally observed kinetic isotope effects (KIEs) for phosphoryl transfer in solution can be reproduced by ab initio calculations. If so, the calculations might tell us something about the probability of possible pathways.

7.1. Calculations of heavy atom kinetic isotope effect in phosphate monoester hydrolysis

Hydrolysis of phosphate esters is one of the fundamental biochemical reactions and a vast amount of research has been devoted to the study of phosphoryl transfer reactions [57-60], both in solution and in enzymes.
Despite these efforts there are still ambiguities regarding the interpretation of experimental data (e.g., linear free energy relationships, kinetic isotope effects, crystal structures of enzyme-inhibitor complexes etc.) in terms of detailed reaction mechanisms [21,25,59,60]. Of particular interest has been to determine


whether these reactions follow associative or dissociative pathways (Figure 15). Here we report an attempt to address the issue of heavy atom kinetic isotope effects (KIE) in phosphate ester monoanion hydrolysis by quantum mechanical calculations. The use of ¹⁸O isotope effects from the three non-bridging phosphate oxygens as a diagnostic for investigating phosphoryl transfer mechanisms was pioneered by Cleland and coworkers [61,62]. Intuitively, the non-bridge ¹⁸O KIE would be expected to be normal for an associative mechanism and inverse for a dissociative one, judging from the formal equatorial bond order to P in pentacovalent and metaphosphate-like (transition) structures (Figure 16). For hydrolysis of the monoanion the situation is, however, complicated by a significant ¹⁸O equilibrium isotope effect (EIE) on deprotonation [61]. Thus, the interpretations of experimentally measured KIEs depend on assumptions regarding proton transfers during the reaction [62]. To investigate this problem we use ab initio methods to calculate the effect of isotopic substitutions on the gas-phase free energies of the alternative reaction pathways. The ¹⁸O kinetic isotope effect on methylphosphate monoanion hydrolysis is calculated using the general formula

$$ {}^{16}k / {}^{18}k = \exp\left( (\Delta G^{\ddagger}_{18} - \Delta G^{\ddagger}_{16}) / RT \right) \qquad (11) $$

where $\Delta G^{\ddagger}_{16}$ and $\Delta G^{\ddagger}_{18}$ are the reaction activation free energies for ¹⁶O and ¹⁸O isotopes in the non-bridging positions, respectively. The values of $\Delta G^{\ddagger}$ are obtained from ab initio calculations as described below. As a check of the computational procedures, we also calculate the equilibrium ¹⁸O isotope effect for deprotonation of methylphosphate and orthophosphoric acid monoanion [61] from the ratio

$$ {}^{16}K_{eq} / {}^{18}K_{eq} = \exp\left( (\Delta G_{18} - \Delta G_{16}) / RT \right), \qquad (12) $$

where $\Delta G_{16}$ and $\Delta G_{18}$ are the corresponding free energy differences between the mono- and dianions. All calculations of structures and energies for the reaction species are performed using the Gaussian-94 program [63]. Structures of the stationary points are fully optimized in redundant coordinates and characterized by subsequent frequency calculations. Geometry optimizations and thermodynamic calculations are performed using the 6-31G*, 6-31+G*, and 6-31++G** basis sets [63,64], containing polarization and (for the two latter) diffuse functions. The calculated vibrational frequencies are scaled by a conventional factor of 0.8929 [64]. Electron correlation is included at the second


[Reaction scheme: MePO₃H⁻ (1) and H₂O (2) react to orthophosphate and CH₃OH (8), either along the associative hydrolysis pathway (via 4TS, the penta-coordinated intermediate 5, and 6TS) or along the dissociative hydrolysis pathway (via 9TS, metaphosphate PO₃⁻ together with 2 and 8, and 12TS).]

Figure 15. Reacting species involved in the associative and dissociative reaction pathways.

order Møller-Plesset perturbation theory level (MP2) for the 6-31++G** calculated structures. The optimized transition state structures and reaction free energies are shown in Figures 16 and 17, respectively. The energies of reaction steps are calculated relative to the sum of energies for the separated reactants 1 and 2. The search for the first transition state in the associative mechanism shows the existence of a symmetric transition state, 4TS, that has not been described previously. In the 4TS structure, the carbon atom is oriented anti relative to the unprotonated oxygen atom (Figure 16). The hydrogen atom originating from the reactant water molecule is largely transferred to the participating oxygen of methylphosphate. As a result, both equatorial hydroxyl hydrogens of the phosphate group have a symmetric syn orientation relative to the oxygen atom of the attacking water. The observed symmetry of this transition state directly shows that it has a common structure with the transition structure resulting from attack by hydroxide ion on the neutral form of 1, MeH₂PO₄. A very similar spatial configuration of hydroxyl groups is found in the subsequent intermediate product 5, with the penta-coordinated phosphorus atom. The symmetric transition state 4TS is 11.7 kcal/mol (at the HF/6-31++G** level) more stable than the TS structure reported for this step in [21] and is, in fact, very close in energy to the intermediate product 5 (Figure 17). The penta-coordinated species undergoes internal rotations of hydroxyl and methoxy groups before yielding the next transition state, 6TS, on the way to expelling the methanol molecule, 8. In 6TS, the leaving hydrogen atom of the hydroxyl group remains to a large extent bonded to orthophosphate, rather than to the methoxide ion.


Figure 16. Optimized HF/6-31++G** geometries of the transition state structures in associative and dissociative reaction pathways.

Transition states along the dissociative pathway, 9TS and 12TS, correspond to abstraction of the methanol molecule in the first step and to subsequent attack of the water molecule on the metaphosphate. In the first dissociative transition state the participating proton is largely transferred to the leaving group, while the hydrogen in the second transition state essentially remains bonded to the nucleophile. Comparing the energies of the two alternative pathways, one finds that the free energy, including the MP2 correction, at the highest point along the associative route (6TS) is 1.2 kcal/mol higher than the energy of the highest point on the dissociative path (12TS). The gas phase energy for the water molecule attack on the phosphate moiety is considerably lower for the associative mechanism, while the activation energy for methanol abstraction is lower for the dissociative mechanism. The energetics of the corresponding solution reactions is likely to be modulated by several factors related to solute-solvent interactions. To address this issue in the best way one should apply several different approaches for including solvent effects on the reaction energetics [21,65], which might still not provide a conclusive answer regarding the exact mechanism. Nevertheless, it is worth noting that the calculated overall activation barriers are close to those observed experimentally for the solution reaction [26,66].


Our approach here is to evaluate the heavy atom KIEs, which can provide important insight into the reaction mechanism, since these have been experimentally measured. Quantitative predictions of heavy atom isotope effects, in general, present a difficult computational problem, which is associated with the assumed necessity of using the highest possible level of ab initio theory, and with possible important contributions from tunneling effects and coupling of solute and solvent vibrational modes [64,67-69]. In the present calculations, which deal with a medium size system, viz. methyl phosphate monoanion, we are forced to restrict the treatment to the HF level. In addition, the effects of the polar medium are taken into account using the SCRF model, where the solute molecule is treated as being immersed in a spherical cavity within a continuum dielectric. The corresponding results from EIE and KIE calculations using the split-valence 6-31G* basis set, as well as basis sets augmented with extra diffuse and polarization functions on heavy atoms and hydrogens, 6-31+G* and 6-31++G**, are given in Table 2. For the deprotonation of orthophosphate we obtain a substantial normal equilibrium isotope effect in agreement with solution experiments, although it is clearly overestimated by the calculations. Our results for the KIEs show that none of the transition states yields the normal isotope effect that is observed experimentally. Both reaction steps of the dissociative pathway show small inverse KIEs, while in the associative mechanism step 2 gives a small normal KIE of 1.0021. The inverse isotope effect predicted for the first associative transition state can be rationalized in terms of well advanced proton transfer to the phosphate group, so that the EIE of protonating the phosphate group manifests itself. The second transition state corresponds to the partial deprotonation of the equatorial oxygen atom, and this in fact gives the normal KIE for this step. However, the overall value of the associative reaction KIE is less than 1. Similar values for the KIEs are obtained in all cases by just considering the zero-point energy vibrational contributions. The actual values of the calculated KIE differences between the associative and dissociative mechanisms depend on the rather complicated pattern of contributions coming from bond-stretching and bending frequencies. As expected [64,69], expansion of the basis set has some effect on our calculated EIE and KIE values, but the qualitative picture remains the same. The fact that the EIE on phosphate deprotonation is substantially overestimated compared to experiment would presumably lead to an underestimation of the KIEs for the associative transition states. This is because both 4TS and 6TS have doubly protonated character on the equatorial oxygens, so that a too inverse effect on protonation of the phosphate monoanion would reduce the overall KIE for this reaction path. Conversely, one might expect the KIE for the dissociative transition states to be somewhat


[Plot: relative free energy, ΔG (kcal/mol), of the species along the associative and dissociative pathways.]

Figure 17. Relative gas phase energies of methylphosphate ester hydrolysis for the structures given in Figure 16. Free energies calculated at the MP2/6-31++G**//HF/6-31++G** level, T=298.15 K, frequencies scaled by 0.8929.

overestimated since the non-bridging oxygens in both 9TS and 12TS have unprotonated character. The inclusion of solvent effects using continuum SCRF models has no significant effect either on the geometries of the reactive species (the ionic P-O bond lengths change by around 0.01-0.02 Å) or on the calculated EIE and KIE values. However, the observed trend demonstrates a small increase in the value of the normal EIE related to deprotonation of phosphate. The corresponding small decrease in the KIE of the associative mechanism and increase in the value for the dissociative mechanism are also found. Another clue for improving the theoretical description of the considered isotope effects is obtained when we try to rationalize the relatively large discrepancy between the experimental and calculated EIE for deprotonation of the phosphate. The ratio ν(¹⁶O)/ν(¹⁸O) for the O-H bond frequencies can be estimated from Hooke's law as 1.00328. The change in the zero point energy for the breaking of the ¹⁶O-H bond in the reaction H₂PO₄⁻ → HPO₄²⁻ is calculated as 7.80 kcal/mol using the 6-31++G** basis set. Consequently, this provides an estimate for the EIE related to the hydrogen abstraction equal to 1.0440, based only on the isotope effect on the reactant O-H frequency. The EIE for the phosphate deprotonation based on the actual ZPEs for reactants and products equals 1.0501, which is close to the above estimate for the hydrogen abstraction.
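The Hooke's law estimate quoted above can be retraced numerically. The sketch below is a reading of that argument, not the authors' code: it assumes a diatomic O-H oscillator with integer mass numbers (16, 18 and 1), assumes that the whole 7.80 kcal/mol zero point energy change scales with the O-H frequency ratio, and takes T = 298.15 K. All function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def reduced_mass(m1, m2):
    """Reduced mass of a two-body oscillator."""
    return m1 * m2 / (m1 + m2)


def frequency_ratio(m_light, m_heavy, m_partner):
    """nu(light)/nu(heavy) for a harmonic bond with unchanged force constant."""
    return math.sqrt(reduced_mass(m_heavy, m_partner) / reduced_mass(m_light, m_partner))


def eie_from_zpe_loss(zpe_loss_light_kcal, ratio, temperature_k=298.15):
    """Equilibrium isotope effect from the ZPE lost on breaking the O-H bond.

    The heavy isotopomer loses less zero point energy (scaled by 1/ratio),
    so its deprotonation free energy is higher, giving a normal effect.
    """
    zpe_loss_heavy = zpe_loss_light_kcal / ratio
    ddg = zpe_loss_light_kcal - zpe_loss_heavy  # dG(18) - dG(16)
    return math.exp(ddg / (R_KCAL * temperature_k))


nu_ratio = frequency_ratio(16.0, 18.0, 1.0)
eie_estimate = eie_from_zpe_loss(7.80, nu_ratio)
print(f"nu(16O)/nu(18O) = {nu_ratio:.5f}")
print(f"estimated EIE   = {eie_estimate:.4f}")
```

With these inputs the frequency ratio and the estimated EIE come out close to the values 1.00328 and 1.0440 given in the text, which suggests that the estimate was obtained along these lines.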


Table 2. Calculated and experimental values for ¹⁸O equilibrium and kinetic isotope effects in methylphosphate monoanion reactions.

Reaction                      IE, calculated (a)                                              IE, experiment
                              Gas phase                                  Water
                              6-31G*    6-31+G*   6-31++G**              6-31++G**
H₂P¹⁸O₄⁻ ⇌ HP¹⁸O₄²⁻           1.0378    1.0455    1.0466 (1.0400)*       1.0466 (1.0444)*     1.019 (0.001) [61]
Me¹⁶OP¹⁸O₃H⁻ ⇌ Me¹⁶OP¹⁸O₃²⁻   1.0214    1.0279    1.0279                 1.0301               1.015 (0.002) [61]
Associative hydrolysis        0.9936    0.9874    0.9884                 0.9832               1.013 (0.002) [62]
Dissociative hydrolysis       0.9979    0.9969    0.9979                 1.0000               1.013 (0.002) [62]

(a) Frequencies scaled by 0.8929, T=298.15 K.
* One water molecule in complex with the phosphate.

The larger EIE in the latter case reflects the weakening of P-O bonds in the phosphate dianion. Thus, ab initio calculations seemingly provide a realistic estimate for the phosphate deprotonation in the gas phase, but disagree with the experimental data in solution. This raises the question of whether the water molecules play a more active role in phosphate hydrolysis than just providing a polar environment for the intramolecular processes. One promising but not completely conclusive attempt of this kind is given in [65], where the authors searched for associative and dissociative hydrolytic pathways with an extra proton transfer route via the water molecules. Here we investigate the issue of solute vibrational coupling to the solvent in the calculations of isotope effects for the phosphate deprotonation reaction. We model the microscopic effects of the water molecules in the simplest manner, by considering the doubly hydrogen-bonded complex of inorganic phosphate with one water molecule. A substantial drop of 0.0066 in the calculated gas phase value of the EIE is observed in such a model system (Table 2). The inclusion of continuum solvent effects using the Onsager model again works in the opposite manner and increases the EIE. These results indicate that calculations of isotope effects in phosphate hydrolysis are not reliable without a realistic treatment of the solute vibrational coupling to the solvent (H-bonds), but the inclusion of electron correlation might also substantially change the calculated KIE values. Extension of the computational models to treat these effects will necessarily lead to a better understanding of the heavy atom kinetic isotope effects in phosphate hydrolysis.


REFERENCES

[1] T. Hunter, Cell 58 (1989) 1013.
[2] E.H. Fischer, H. Charbonneau, N.K. Tonks, Science 253 (1991) 401.
[3] D. Barford, Z. Jia, N.K. Tonks, Nature Struct. Biol. 2 (1995) 1043.
[4] J.A. Stuckey, H.L. Schubert, E.B. Fauman, Z.-Y. Zhang, J.E. Dixon, M.A. Saper, Nature 370 (1994) 571.
[5] X.-D. Su, N. Taddei, M. Stefani, G. Ramponi, P. Nordlund, Nature 370 (1994) 575.
[6] T.M. Logan, M.M. Zhou, D.G. Nettesheim, R.L. Meadows, R.L. Van Etten, S.W. Fesik, Biochemistry 33 (1994) 11087.
[7] M. Zhang, R.L. Van Etten, C.V. Stauffacher, Biochemistry 33 (1994) 11097.
[8] D. Barford, A.J. Flint, N.K. Tonks, Science 263 (1994) 1297.
[9] Z. Jia, D. Barford, A.J. Flint, N.K. Tonks, Science 268 (1995) 1754.
[10] J. Yuvaniyama, J.M. Denu, J.E. Dixon, M.A. Saper, Science 272 (1995) 1328.
[11] M. Zhang, M. Zhou, R.L. Van Etten, C.V. Stauffacher, Biochemistry 36 (1997) 15.
[12] Z.-Y. Zhang, W.P. Malachowski, R.L. Van Etten, J.E. Dixon, J. Biol. Chem. 269 (1994) 8140.
[13] J.M. Denu, G. Zhou, Y. Guo, J.E. Dixon, Biochemistry 34 (1995) 3396.
[14] B. Evans, P.A. Tishmack, C. Pokalsky, M. Zhang, R.L. Van Etten, Biochemistry 35 (1996) 13609.
[15] M.S. Saini, S.L. Buchwald, R.L. Van Etten, J.R. Knowles, J. Biol. Chem. 256 (1981) 10453.
[16] Z. Zhang, E. Harms, R.L. Van Etten, J. Biol. Chem. 269 (1994) 25947.
[17] N. Taddei, P. Chiarugi, P. Cirri, T. Fiaschi, M. Stefani, M. Camici, G. Raugei, G. Ramponi, FEBS Lett. 350 (1994) 328.
[18] A. Warshel, Computer Modeling of Chemical Reactions in Enzymes and Solutions. New York: Wiley, 1991.
[19] A. Warshel, F. Sussman, J.-K. Hwang, J. Mol. Biol. 201 (1988) 139.
[20] J. Åqvist, A. Warshel, Chem. Rev. 93 (1993) 2523.
[21] J. Florian, A. Warshel, J. Phys. Chem. B 102 (1998) 719.
[22] J. Florian, J. Åqvist, A. Warshel, J. Am. Chem. Soc. 120 (1998) 11524.
[23] D.G. Gorenstein, B.A. Luxon, J.B. Findlay, J. Am. Chem. Soc. 101 (1979) 5869.
[24] A. Yliniemela, T. Uchimaru, K. Kanabe, K. Taira, J. Am. Chem. Soc. 115 (1993) 3032.
[25] J. Åqvist, K. Kolmodin, J. Florian, A. Warshel, Chemistry & Biology 6 (1999) R71.
[26] J.P. Guthrie, J. Am. Chem. Soc. 99 (1977) 3991.
[27] T. Hansson, P. Nordlund, J. Åqvist, J. Mol. Biol. 265 (1996) 118.
[28] J. Åqvist, M. Fothergill, J. Biol. Chem. 271 (1996) 10010.
[29] A.J. Kirby, A.G. Varvoglis, J. Am. Chem. Soc. 89 (1967) 415.
[30] S. Akerfeldt, Acta Chem. Scand. 17 (1963) 319.
[31] N. Bourne, A. Williams, J. Org. Chem. 49 (1984) 1200.
[32] K. Kolmodin, T. Hansson, J. Danielsson, J. Åqvist, ACS Symposium Series 721 (1998) 370.
[33] C.J. Cramer, D.G. Truhlar, Science 256 (1992) 213.
[34] J.M. Karle, I.I. Karle, Acta Cryst. C 44 (1988) 135.
[35] W.F. van Gunsteren, H.J.C. Berendsen, Groningen Molecular Simulation (GROMOS) Library Manual, Biomos BV, Nijenborgh 16, Netherlands: Groningen, 1997.
[36] Insight II, Biosym/MSI, San Diego, USA, 1995.
[37] J. Marelius, K. Kolmodin, I. Feierberg, J. Åqvist, J. Mol. Graph. Model. 16 (1998) 213.
[38] F.S. Lee, A. Warshel, J. Chem. Phys. 97 (1992) 3100.
[39] Z.-Y. Zhang, R.L. Van Etten, J. Biol. Chem. 266 (1991) 1516.
[40] Z.-Y. Zhang, R.L. Van Etten, Biochemistry 30 (1991) 8954.
[41] J. Åqvist, C. Medina, J.-E. Samuelsson, Protein Eng. 7 (1994) 385.
[42] J. Marelius, M. Graffner-Nordberg, T. Hansson, A. Hallberg, J. Åqvist, J. Comput.-Aided Mol. Design 12 (1998) 119.
[43] K. Kolmodin, P. Nordlund, J. Åqvist, Proteins 36 (1999) 370.
[44] A.C. Hengge, Z. Yu, L. Wu, Z.-Y. Zhang, Biochemistry 36 (1997) 7928.
[45] A.D.B. Pannifer, A.J. Flint, N.K. Tonks, D. Barford, J. Biol. Chem. 273 (1998) 10454.
[46] P. Cirri, P. Chiarugi, G. Camici, G. Manao, G. Raugei, G. Capugi, G. Ramponi, Eur. J. Biol. Chem. 214 (1993) 637.
[47] J.P. Davis, M.-M. Zhou, R.L. Van Etten, J. Biol. Chem. 269 (1994) 8734.
[48] K. Kolmodin, J. Åqvist, FEBS Lett. 456 (1999) 301.
[49] M.M.G.M. Thunnissen, N. Taddei, G. Liguri, G. Ramponi, P. Nordlund, Structure 5 (1997) 69.
[50] M. Eigen, Angew. Chem. (Intl. Ed. Engl.) 3 (1964) 1.
[51] J. Lobaugh, G.A. Voth, J. Chem. Phys. 100 (1994) 3039.
[52] J.K. Hwang, Z.T. Chu, A. Yadav, A. Warshel, J. Phys. Chem. 95 (1991) 8445.
[53] I. Feierberg, V. Luzhkov, J. Åqvist (submitted).
[54] G.H. Peters, T.M. Frimurer, O.H. Olsen, Biochemistry 37 (1998) 5383.
[55] E.B. Fauman, J.P. Cogswell, B. Lovejoy, W.J. Rocque, W. Holmes, V.G. Montana, H. Piwnica-Worms, M.J. Rink, M.A. Saper, Cell 93 (1998) 617.
[56] K. Kolmodin, J. Åqvist, FEBS Lett. 465 (2000) 8.
[57] S.J. Benkovic, K.J. Schray, In: P.D. Boyer, Ed., The Enzymes, 201-238, Academic Press, New York, 1973.
[58] G.R.J. Thatcher, R. Kluger, Adv. Phys. Org. Chem. 25 (1989) 99.
[59] S. Admiraal, D. Herschlag, Chemistry & Biology 2 (1995) 729.
[60] K. Scheffzek, M.R. Ahmadian, W. Kabsch, L. Wiesmuller, A. Lautwein, F. Schmitz, A. Wittinghofer, Science 277 (1997) 333.
[61] W.B. Knight, P.M. Weiss, W.W. Cleland, J. Am. Chem. Soc. 108 (1986) 2759.
[62] P.M. Weiss, W.B. Knight, W.W. Cleland, J. Am. Chem. Soc. 108 (1986) 2761.
[63] M.J. Frisch, et al., Gaussian 94, Revision B2, Gaussian Inc., Pittsburgh PA, 1995.
[64] J.B. Foresman, A. Frisch, Exploring Chemistry with Electronic Structure Methods, 2nd ed. Gaussian, Inc., Pittsburgh, PA, 1996.
[65] C.-H. Hu, T. Brinck, J. Phys. Chem. A 103 (1999) 5379.
[66] C.A. Bunton, D.R. Llewellyn, K.G. Oldham, C.A. Vernon, J. Chem. Soc. (1961) 2670.
[67] N.J. Harris, J. Phys. Chem. 99 (1995) 14689.
[68] W.-P. Hu, D.G. Truhlar, J. Am. Chem. Soc. 118 (1996) 860.
[69] S.S. Glad, F. Jensen, J. Phys. Chem. 100 (1996) 16892.


L.A. Eriksson (Editor), Theoretical Biochemistry - Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved

Chapter 8

Monte Carlo simulations of HIV-1 protease binding dynamics and thermodynamics with ensembles of protein conformations: incorporating protein flexibility in deciphering mechanisms of molecular recognition

Gennady M. Verkhivker*, Djamal Bouzida, Daniel K. Gehlhaar, Paul A. Rejto, Lana Schaffer, Sandra Arthurs, Anthony B. Colson, Stephan T. Freer, Veda Larson, Brock A. Luty, Tami Marrone, and Peter W. Rose

Agouron Pharmaceuticals, Inc., A Warner-Lambert Company, 10777 Science Center Drive, San Diego, CA 92121-1111 USA

* Corresponding author.

I. Structural models of molecular recognition

Understanding of molecular recognition mechanisms has been greatly advanced in the last decade by computer simulations of ligand-protein interactions at the atomic level [1-12] and by studies of the underlying energy landscape, which describes the free energy of the system as a function of its coordinates [13-23]. It has been recognized that proteins are not adequately described by a single conformational state, but are better represented by a manifold of low-energy protein conformations, or conformational substates, on a rugged energy landscape [24-30]. A typical folded protein has a well-defined overall fold, but upon closer examination it may be seen as a myriad of different nearly isoenergetic structures, populated in a thermal Boltzmann equilibrium. The current view of the protein energy landscape implies that the conformational substates, which represent local minima of the protein, are organized in hierarchical tiers separated by barriers that can be crossed by thermal activation [27-30]. Within this hierarchy, alternative conformational states are defined by significant differences in protein conformation and large energy barriers, while modest coordinate changes, with concomitantly smaller energy barriers, characterize alternative protein conformational substates. Recent optical experiments have studied conformational fluctuations of myoglobin in real time and have suggested that proteins may have a hierarchy of energy barriers on different length and energy scales [27,28]. According to the emerging wisdom, protein energy landscapes may be characterized by a number of discrete tiers of conformational substates, each tier within the hierarchy having approximately the same barrier, but with separate tiers having different distributions of barrier heights. Furthermore, it was suggested that protein energy landscapes are self-similar, i.e. the protein fluctuations associated with each tier in the hierarchy of conformational substates belong to the same class of global conformational arrangements [27,28]. Accessibility of alternative conformational states is important for protein function, including assembly, molecular recognition, regulation of biological activity, and enzymatic catalysis [27-30]. Protein structures determined in different environments (at high pressure, under various pH and solvent conditions, in different crystal forms, and bound to inhibitors) provide information about protein responses and protein conformational substates.
Protein mutants may also be regarded as local perturbations of the native structure, and comparison of mutant crystal structures typically reveals conformational protein substates [30]. Another type of perturbation of the protein conformation is usually seen during complex formation with peptides, substrates, ions or ligands. Analogous to a typical folded protein, ligand-protein complexes generally have a well-defined native structure, but on a microscopic level a ligand-protein system may exhibit structural disorder that is revealed on different length and time scales: by rotation of a local protein side-chain, by conformational change of the ligand in the active site, or by a collective conformational change associated with a movement of the protein backbone and side-chains and a change of the ligand binding mode. There are two regulatory mechanisms whereby binding can produce a significant conformational rearrangement of the ligand-protein complex structure. In kinetic regulation, a barrier separating two conformational substates is reduced or eliminated as a result of complex formation. In thermodynamic regulation, the free energy of an alternative conformational state is lowered and becomes the new free energy minimum. In both scenarios, the overall shape of the energy landscape is preserved and the initial and alternative conformations remain local energy minima. Hence, local perturbations, for instance a single mutation in the active site or ligand binding, need only induce minor adjustments in a barrier height or in the relative energetics of the local minima in order to rearrange the conformational substate of the system [30].

Conformational substates that represent local minima of the ligand-protein system can be organized in a hierarchy of ligand-protein binding modes and corresponding families of protein conformational fluctuations. Distinct, functionally important conformational substates of the HIV-1 protease have been observed by comparing the crystal structure of the protease in its unbound form with the crystal structures of the same protein in complexes with a diverse set of inhibitors [31-33]. With the aid of high sensitivity differential scanning calorimetry, the exact balance of the intra-subunit and inter-subunit energetic contributions was elucidated and the structural distribution of forces that stabilize the HIV-1 protease was determined [34]. The HIV-1 protease stabilization free energy is primarily determined by the dimerization interface, whereas the isolated subunits are not stable. Only after dimerization, where a moderate decrease in conformational entropy is offset by a much larger increase in solvation entropy, does the resulting entropy contribution become favorable for stabilization of the HIV-1 protease.
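The distinction between thermodynamic and kinetic regulation drawn earlier in this section can be illustrated with a minimal two-state sketch. It is not taken from the chapter: it assumes two substates A and B in Boltzmann equilibrium, an Arrhenius-type dependence of the A to B rate on the barrier height with an unchanged prefactor, and purely hypothetical energy values in kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def two_state_populations(delta_g, temperature=298.15):
    """Boltzmann populations (p_A, p_B) for delta_g = G_B - G_A in kcal/mol."""
    weight_b = math.exp(-delta_g / (R_KCAL * temperature))
    return 1.0 / (1.0 + weight_b), weight_b / (1.0 + weight_b)


def relative_rate(barrier_change, temperature=298.15):
    """Factor by which an activated rate changes when the barrier changes
    by barrier_change (kcal/mol), assuming the same prefactor."""
    return math.exp(-barrier_change / (R_KCAL * temperature))


# Thermodynamic regulation: binding lowers G_B from +1.0 to -1.0 (hypothetical values).
print("populations before binding:", two_state_populations(+1.0))
print("populations after binding: ", two_state_populations(-1.0))

# Kinetic regulation: binding lowers the A->B barrier by 2.0 kcal/mol (hypothetical value).
print("rate enhancement:", relative_rate(-2.0))
```

In this picture a shift of only a few kcal/mol in either the relative free energy or the barrier is enough to change which substate dominates or how quickly it is reached, which is the sense in which minor adjustments can rearrange the conformational substate of the system.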
A structure-based thermodynamic analysis approach has reproduced the balance of stabilizing contributions and the magnitude of the Gibbs free energy of HIV-1 protease stabilization in agreement with the experimental measurements of the free energy and its enthalpy and entropy components [34]. This approach is based on structural parameterization of folding and binding energetics of proteins, peptides and synthetic ligands [35-41], whereby experimental results on the types of stabilizing forces in folding and binding were used to establish the appropriate energy model. The resulting binding free energy model is based on the conjecture that the underlying physical forces that govern the process of ligand-protein binding are the same as in protein folding [42-45]. It is widely recognized that the major components of protein stabilization are hydrophobic interactions and hydrogen bonds, with the hydrophobic effect representing the dominant force in stabilizing the protein structure and defined as the combined effect of protein internal van der Waals interactions and hydration of non-polar groups [42-45]. Consequently, the structural parameterization of binding energetics is calculated separately for the enthalpy and entropy components of the Gibbs free energy, and includes the electrostatic and ionization effects as well as the contribution due to the change in translational degrees of freedom. The enthalpy contribution to the free energy results from the formation of van der Waals interactions, hydrogen bonding and concomitant desolvation of the interacting groups. This free energy component is parameterized in terms of changes in apolar and polar solvent-accessible surface areas. The entropy contribution is composed of a solvation component and changes in conformational degrees of freedom. The magnitude of conformational entropy contributions for each amino acid has been estimated by computing the probability profiles of different conformational states as a function of dihedral angles [35,39]. A detailed structural mapping of the ligand-protein binding energetics has been performed for a number of peptidic and synthetic HIV-1 protease inhibitors [40]. Thermodynamic analysis has shown that inhibitor binding to the HIV-1 protease is not enthalpy favored and that the major contribution to the Gibbs free energy is determined by the hydrophobic effect, resulting from the favorable entropy of water molecules released from ligand and protein groups. The enthalpy contributions are unfavorable at room temperature and are dominated by the positive enthalpy of desolvating hydrophobic groups.
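The bookkeeping behind this decomposition can be sketched as follows. The function only combines an enthalpy term with solvation, conformational and translational entropy terms through ΔG = ΔH - TΔS; it does not reproduce the surface-area parameterization of [35-41], and every numerical value below is a hypothetical placeholder chosen to mimic an entropy-driven binding event.

```python
def binding_free_energy(delta_h, ds_solvation, ds_conformational,
                        ds_translational, temperature=298.15):
    """Gibbs free energy of binding, dG = dH - T * (sum of entropy terms).

    delta_h is in kcal/mol, entropy terms are in kcal/(mol K).
    """
    total_entropy = ds_solvation + ds_conformational + ds_translational
    return delta_h - temperature * total_entropy


# Hypothetical values: unfavorable (positive) enthalpy, large favorable
# solvation entropy, unfavorable conformational and translational entropy.
dg = binding_free_energy(
    delta_h=5.0,
    ds_solvation=0.080,
    ds_conformational=-0.025,
    ds_translational=-0.008,
)
print(f"dG = {dg:.2f} kcal/mol")
```

With these placeholder numbers the net free energy is negative even though the enthalpy is positive, which is the qualitative pattern described for HIV-1 protease inhibitors.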
The driving force of binding, determined by the entropy gain, is opposed by the positive enthalpy change, the negative change in conformational entropy of the inhibitor and protease side-chains, and the negative change in translational entropy. These data are consistent with the calorimetric analysis of hydration enthalpy and entropy contributions to protein folding [46,47], supporting the notion that stabilizing forces in protein folding and ligand-protein binding are similar and that appropriately derived energetic models can adequately describe both folding and binding phenomena. The structure-based thermodynamic method combines the derived binding free energy model with a formalism which computes the probabilities of individual amino acids being folded in native-like conformations, and thereby allows the structural stability of different protein regions to be determined [48-53]. In the single site thermodynamic mutation approach, the cooperativity of interactions in the protein can be examined by computing the free energy of all available protein states given that a particular residue is held in its folded native conformation [54]. The HIV-1 protease stabilization free energy is not uniformly distributed along the dimerization interface, and the binding site has a dual character, with regions of high and low structural stability [55]. The flap region of the HIV-1 protease molecule has only marginal stability and a high propensity to undergo independent local unfolding, and is forced into a closed conformation by favorable interactions with the inhibitor, whereas the flap reorganization energy is unfavorable. The existence of multiple conformational substates for the HIV-1 protease, which is caused by the presence of several mobile regions undergoing local folding-unfolding transitions upon ligand binding, results in local cooperativity effects. This makes it possible to characterize the structural and energetic distributions of the protein response caused by energy perturbations originating at different locations of the protein, which can be induced by either mutations or ligand binding [54]. The effect of inhibitor binding on the stability and cooperativity of the HIV-1 protease was elucidated by identifying the protease regions with high and low structural stability and discovering that local cooperativity effects are not limited to the active site residues, but can propagate to a small subset of remote protease regions [55]. The HIV-1 protease residues that have low structural stability in the uncomplexed binding site are involved in selective transmission of the binding stimulus to distant protease regions [56]. Structural disorder of the HIV-1 protease, which is localized in several mobile regions, and the dual character of the active site, with regions of high and low structural stability, can serve important biological functions, in particular conferring inhibitor resistant mutations in the HIV-1 protease [57-60].

II. Structure-based analysis of HIV-1 protease-inhibitor binding

A number of HIV-1 protease inhibitors, used in the clinic as therapeutic agents, have produced resistant variants with point mutations in various regions of the protease. Forty weeks of treatment with indinavir [61] produces a 15-fold resistant variant with five mutations, L10R/M46I/L63P/V82T/I84V [62]. A 10-25 fold resistant variant with five mutations, M46I/L63P/A71V/V82F/I84V, emerges in the presence of ritonavir [63,64]; a 100-fold resistant virus with G48V/L90M mutations appears during therapy with the saquinavir inhibitor [65,66]. The reduction in binding affinity for saquinavir with the L90M, G48V and L90M/G48V mutants is primarily due to larger dissociation rate constants and a decrease in the internal equilibrium between the bound inhibitor with the protease flaps up and the bound inhibitor with the flaps down [67]. Further evidence was recently provided for the conjecture that the reduction in affinity between HIV-1 protease inhibitors and a particular mutant can be due to a reduction in protease dimer stability, in addition to, and independent of, the intrinsic inhibitor affinity for the mutant dimer [68]. Thermodynamic equilibrium studies, conducted for a number of inhibitors on several drug-resistant HIV-1 protease mutants (V82F, V82F/I84V, V82T/I84V, L90M), have shown that the reduction in binding affinity is due to a combined effect of both the dimer stability and the inhibitor binding. Mutations conferring resistance are located in HIV-1 protease regions of different structural stability: the active site, flaps, dimer interface, and the surface loops. Active site mutations typically play a leading role in modulating the affinity of the protease because mutations usually accumulate in a stepwise fashion, appearing first in the active site and then in compensatory regions. Mutations that are located in the active site reduce the number of favorable van der Waals contacts, increase steric hindrance or produce unfavorable electrostatic interactions. The loss of binding affinity going from the wild-type HIV-1 protease to mutants can be attributed in 40-65% of cases to amino acid mutations away from the active site and not in direct contact with the inhibitor [69].
Substitutions outside of the active site are thought to produce compensatory changes that affect the activity of the protease by either altering the stability of the protease dimer or indirectly influencing binding through long-range cooperative interactions [70]. According to the single site thermodynamic mutation approach, binding perturbations in the low stability flap region can trigger a large redistribution of the conformational protein ensemble, and the free energy required to bring the flap into the optimal binding conformation can be affected by distant mutations. This type of flexibility enhances the probability of generating resistant forms of the protease with mutations in the flap region. The growing body of structural and thermodynamic data has revealed similarities and differences in the molecular origins of inhibitor specificity against HIV-1 protease and its various mutant forms. The crystal structures of three mutant protease (I84V, V82F and V82F/I84V) complexes with the cyclic urea-based inhibitors DMP323 and DMP450 have been solved to explain the modulation in inhibitor binding [70]. These mutations represent key protease residues associated with HIV-1 protease resistance towards this class of inhibitors. The substitutions produce only local perturbations that alter the network of van der Waals ligand-protease contacts, but retain the hydrogen bonding pattern. It appears that the mutations are not additive and that compensatory shifts in the I84V and V82F/I84V complexes produce a small number of new contacts, which are insufficient to compensate for the initial loss of interactions caused by the mutations. In a subsequent study, inhibitors which included indinavir [61,62], ritonavir [63,64], saquinavir [61,62], nelfinavir [71] and 14 cyclic urea-based inhibitors were tested against the V82F, V82, I84V and V82F/I84V mutations [72]. The single mutations V82F and I84V cause moderate changes in binding affinity as compared to the wild-type complexes, while more significant changes have been observed for the double mutation V82F/I84V. It was suggested that the therapeutic effectiveness of the DMP323 and DMP450 inhibitors may be improved by increasing the size and flexibility of the inhibitor, so as to maintain a certain critical number of favorable contacts and to accommodate protein conformational changes by forming new interactions in place of those lost at the mutation sites. A series of novel cyclic urea inhibitors was developed [73] based on the premise that the number of hydrogen bonding interactions between the designed inhibitor and the HIV-1 protease backbone should remain constant, while a larger number of non-bonded contacts must be maintained throughout the entire binding site.
Crystal structures of HIV-1 protease complexes with the DMP323, XV368 and SD146 inhibitors rationalized the dramatic improvement in the resistance profile exhibited by XV368 and SD146, the larger and more flexible cyclic urea derivatives of the original DMP323 inhibitor [73]. Subsequently, the crystal structures of the three active-site mutant proteases V82F, I84V and V82F/I84V in complexes with XV368 and SD146 have identified interactions that are responsible for the high potency and broad specificity of these inhibitors [74]. These structural results have suggested that high potency against the wild-type HIV-1 protease and retained affinity to a broad spectrum of mutations conferring resistance can be achieved by increasing the total number of hydrogen bonds, while sustaining the hydrogen bonds formed to the protease backbone and preserving favorable ligand-protease contacts in all six enzyme subsites.


II.1. Structure-based analysis of HIV-1 protease-SB203386 inhibitor binding

Comparative structure analysis of HIV-1, HIV-2 and SIV proteases in complexes with the same inhibitor has shown only minor differences and nearly identical protease tertiary structures [75-78], although the complexes may exhibit different ligand binding modes. An unexpected binding mode, with two symmetry-related molecules each bound to half of the active site, has been found in the complex of the SB203386 inhibitor with SIV protease [78]. Recently, it was determined that mutating residues 31 to 37 alone in the 30's loop of the HIV-1 protease produces a resistance pattern against a broad range of inhibitors, including SB203386 [78-80]. In order to determine the individual contributions of the 30's loop residues to the binding affinity and specificity, a number of chimeric proteases were constructed [80]. The crystal structures of the SB203386 complexes with three chimeric HIV-1 proteases, denoted as HIV-1 (2: 31-37), HIV-1 (2: 31,33-37) and HIV-1 (2: 31-37,47,82), in which the HIV-1 protease residues were substituted by the corresponding amino acids of the HIV-2 protease, have recently been determined at high resolution [81]. These structures have provided significant additional insights into the molecular basis for the SB203386 selectivity pattern against HIV-1 protease mutants. There is a general trend of decreasing binding affinity of SB203386 for the protease as the number of HIV-2 protease residues increases, except for HIV-1 (2: 31-37,47,82), which reverts to a moderate affinity and the wild-type complex mode of binding. The HIV-1 protease triple mutant V32I/I47V/V82I, denoted as HIV-1 (2: 32,47,82), and the HIV-1 (2: 31,33-37) and HIV-1 (2: 31,33-37,47,82) mutants have a moderate and similar effect on the SB203386 inhibitor affinity [80].
While the binding affinities of the wild-type complex and the HIV-1 (2: 32,47,82) triple mutant correspond to Ki values of 18 nM and 110 nM, the Ki values of the HIV-1 (2: 31,33-37) and HIV-1 (2: 31,33-37,47,82) chimeras are 210 nM and 460 nM, respectively. These mutations, however, are not nearly as detrimental for the SB203386 binding affinity as the combined effect in the HIV-1 (2: 31-37) chimera, where the Ki value of 1410 nM is similar to the activity seen in SB203386 complexes with HIV-2 protease (Ki=1280 nM) and SIV protease (Ki=960 nM). The binding mode of the SB203386 inhibitor in the HIV-1 protease triple mutant HIV-1 (2: 32,47,82), where the HIV-1 protease residues were mutated to the corresponding amino acids of HIV-2 and SIV proteases, remains identical to the wild-type complex. Introducing the HIV-2 residues at positions 31 and 33-37 moderately increases the Ki value, by 12-fold, and maintains the HIV-1 protease-like mode of SB203386 binding. However, adding to this change the Ile residue at position 32, as in HIV-1 (2: 31-37), increases the Ki to a value comparable to that of the HIV-2 or SIV proteases, and changes the inhibitor mode of binding to two ligand molecules in the active site, as seen in the SIV protease complex [81]. The binding mode of SB203386 in the complex with HIV-2 protease has not been determined crystallographically, but it is expected to be similar to that in the SIV complex [80,81]. This has led to the conjecture that the SB203386 inhibitor binding affinity and specificity may be conferred by a combination of the active site residues 32, 47 and 82 along with a loop of residues 31-37, which mostly lie outside of the active site. In the crystal structure of the HIV-1 (2: 31-37) chimera complex with SB203386, structural changes in the vicinity of the active site residue Ile32 result in the extension of the 80's loop residues towards the active site and cause a decrease in the size of the active site cavity. These structural changes were also observed in the crystal structure of the SB203386 complex with SIV protease. It was suggested that not only may changes in the 30's loop affect the structural stability of the protease dimer, but the induced changes in the 80's loop may also have a detrimental effect on the interactions with the inhibitor and lead to a significant reduction of the SB203386 binding affinity [81]. In contrast, the crystal structure of the HIV-1 (2: 31,33-37) chimeric complex does not show the 80's loop motions, and the observed loss in the SB203386 inhibitor binding affinity was primarily attributed to the changes in dimer stability caused by mutations in the 30's loop sequence.
The structural flexibility in the 30's and 80's loops observed in the HIV-1 (2: 31-37,47,82) chimeric complex with SB203386, combined with the compensatory enlargement of the active site cavity relative to the HIV-1 (2: 31-37) complex, were suggested as the primary reasons for the restoration of the wild-type ligand binding mode and the less dramatic loss of affinity [81]. A widely accepted two-step mechanism of HIV-1 protease binding implies the creation of a loose complex with the open form of the enzyme, followed by a conformational change involving the closure of the flap region over the active site and formation of the final bound complex. Consequently, binding affinity differences between the HIV-1 protease and its mutants may also result from changes in the internal equilibrium between the bound form of the protease with the closed flaps conformation and the unbound open form of the enzyme. The reduction in the inhibitor binding affinity between the wild-type HIV-1 protease and a particular mutant can be due to changes in the protease dimer stability, independent of the differences in the inhibitor interaction energies. The flap shifts observed in the crystal structures of the chimeric HIV-1 (2: 31,33-37), HIV-1 (2: 31,33-37,47,82) and HIV-1 (2: 31-37) proteases suggested that, in addition to the changes in the enzyme-inhibitor interactions, a decreased stability of the closed form of the enzyme in solution may contribute to the reduction in binding affinity observed in complexes of SB203386 with these chimeric proteases. In this study, however, we focus only on the analysis of changes in the enzyme-inhibitor interactions and on the contribution of compensatory changes in the active site residues to the reduction in binding affinity of the SB203386 complexes with the HIV-1 protease mutants.
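The Ki values quoted in this section can be converted into fold changes and relative binding free energies with the standard relation ΔΔG = RT ln(Ki/Ki,ref). The short sketch below does this for the values given in the text; the temperature of 298.15 K is an assumption (the measurement temperature is not stated here), and the dictionary and function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
TEMPERATURE = 298.15   # K, assumed; not stated in the text

# Ki values for SB203386 in nM, as quoted in the text.
KI_NM = {
    "HIV-1 wild type": 18.0,
    "HIV-1 (2: 32,47,82)": 110.0,
    "HIV-1 (2: 31,33-37)": 210.0,
    "HIV-1 (2: 31,33-37,47,82)": 460.0,
    "HIV-1 (2: 31-37)": 1410.0,
    "HIV-2": 1280.0,
    "SIV": 960.0,
}


def fold_change(ki, ki_ref):
    """Ratio of inhibition constants relative to a reference protease."""
    return ki / ki_ref


def ddg_binding(ki, ki_ref, temperature=TEMPERATURE):
    """Relative binding free energy, RT ln(Ki/Ki_ref), in kcal/mol."""
    return R_KCAL * temperature * math.log(ki / ki_ref)


ref = KI_NM["HIV-1 wild type"]
for name, ki in KI_NM.items():
    print(f"{name:28s} {fold_change(ki, ref):6.1f}-fold  "
          f"ddG = {ddg_binding(ki, ref):5.2f} kcal/mol")
```

On this scale the roughly 12-fold loss for the HIV-1 (2: 31,33-37) chimera corresponds to about 1.5 kcal/mol, while the loss for HIV-1 (2: 31-37) is about 2.6 kcal/mol, comparable to the HIV-2 and SIV proteases.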

III. Structure-based computational models of ligand-protein binding dynamics and molecular docking Computational studies of molecular recognition usually require the consistent and rapid determination of the global energy minimum of a ligandprotein complex which must correspond to the experimentally solved X-ray structure [1-12]. Recent advances in computational structure prediction of ligand-protein complexes utilize a diverse range of energetic models, based on either surface complementarity [82-89] or atom-atom representations of the intermolecular interactions [90-95]. A variety of optimization docking techniques include Monte Carlo methods [96-98], molecular dynamics [99,100], genetic algorithms [101-103], tabu searching algorithm [104] and are focused primarily on molecular docking of flexible ligands into proteins which are held fixed in a bound conformation, while the internal degrees of freedom of the ligand and its rigid body variables are optimized. Combined flexible ligand docking and protein side-chain optimization techniques have been recently proposed in molecular recognition studies [105-107]. A variant of the dead-end elimination (DEE) algorithm has been used to avoid a combinatorial explosion by restricting both the ligand and the side-chains of the receptor residues to a limited number of discrete low-energy conformations [105]. The combinatorial problem in flexible peptide docking with major


histocompatibility complex receptors was also approached by utilizing the DEE algorithm to optimize protein side-chains that adapt to the docked peptide conformations [106]. A hierarchical computational approach was introduced for predicting structures of ligand-protein complexes and analyzing binding energy landscapes, which combines a Monte Carlo simulated annealing technique to determine the ligand bound conformation with the DEE algorithm for side-chain optimization of the protein active site residues [107]. Limited protein side-chain flexibility has been employed in the GOLD program [103]. These approaches incorporate protein flexibility by using rotamer libraries of side-chains [105-107], Monte Carlo simulations combined with minimization in flexible binding sites [98] or molecular dynamics docking simulations [100]. A combination of energetic models with stochastic optimization techniques has led to a number of powerful strategies for computational structure prediction of ligand-protein complexes, and docking of flexible ligands to a protein with a rigid backbone and flexible side-chains has now become more feasible [105-109]. The NP-hardness of the ligand-protein recognition problem, as in protein folding, implies that for a given protein there may be ligands that do not find the global free energy minimum on the binding energy landscape in a reasonable amount of computer time, given a high degree of complexity and frustration of the underlying binding energy landscape. Nevertheless, ligand-protein complexes with experimentally determined X-ray structures must recognize their global free energy minimum rapidly and consistently. The energy of the crystallographic structure of the ligand-protein complex must be the global minimum on the binding energy landscape, representing a thermodynamic requirement on the energy function in docking simulations, and this conformation must be accessible during the search, which is a kinetic condition of the docking problem.
A simplified energy function in combination with an evolutionary sampling technique was developed to satisfy both thermodynamic and kinetic requirements in docking by reducing frustration of the underlying binding energy landscape [91,92,110,111]. Robust structure prediction of bound ligands given a fixed conformation of the native protein can be achieved with the family of simplified knowledge-based energy functions by generating binding energy landscapes with co-existing correlated, funnel-like [15-23,112-114] and uncorrelated, rugged features. While adequate for non-polar and hydrogen bond patterns, this simplified energy function does not include a direct electrostatic component and therefore may


be expected to fail when extensive networks of electrostatic interactions are present in the crystal structures. By contrast, the GOLD algorithm employs a template of protein hydrogen bond donors and acceptors, and uses a genetic algorithm to sample intermolecular hydrogen bond networks and ligand conformations [103]. This approach lacks a desolvation component and was found to be less suitable for finding hydrophobic interactions. Docking methodologies implemented in such programs as Hammerhead [93], FLEXx [94], and GOLD [103] have been validated on a large number of ligand-protein complexes with known crystal structures to test the robustness of each method. There have also been studies that employed explicit protein flexibility [115,116]. However, the results of flexible ligand docking with a receptor in the absence of any experimentally known protein bound conformation are considerably less reliable [117]. Applications of flexible ligand docking techniques range from the analysis of the binding energy landscapes [118,119] to lead discovery [120], database mining [121], and structure-based combinatorial ligand design [122], and include simulations with ensembles of multiple ligands [123] and ensembles of multiple protein conformations [124,125]. A recently introduced molecular docking technique employs a set of related crystal structures as "snap shots" of a dominant protein conformation perturbed by different ligands, crystallization conditions and simple mutations [124]. The analysis of the effect of multiple protein conformational substates in response to ligand binding has led to some practical recipes to effectively account for the types of protein flexibility that may occur upon ligand binding [124,125]. Docking simulations usually determine a single structure of the complex with the lowest energy and postulate that the lowest energy conformation corresponds to the native structure.
The number of low-energy structures is usually very large, and the computationally demanding task of finding the lowest energy structure does not imply its thermodynamic stability. Nevertheless, the structure prediction problem implies determination of the ensemble of many similar conformations which describe the thermodynamically stable native basin of the global energy minimum rather than a single structure [126]. We have previously established that the results of kinetic docking simulations can be rationalized based on the thermodynamic properties of ligand-protein binding determined from equilibrium simulations and the analysis of the binding energy landscape [118,119,127,128]. The robust topology of the native structure is a decisive factor contributing to the thermodynamics and dynamics of well-optimized ligand-protein complexes, such as the MTX-DHFR system, that appear to be robust to structural perturbations, variations in the ligand composition and accuracy of the energetic model [127,128]. Topological features of the native complexes that are critical for robust structure prediction and thermodynamic stability are determined by early ordering of the recognition ligand motif in its native conformation. Structural stability of these motifs contributes decisively to the topology and thermodynamic stability of the native ligand-protein complex [127,128]. These molecular fragments, termed recognition anchors, exhibit a high structural consensus or accessibility of the dominant native binding mode in docking simulations [20,111]. In addition, these molecular fragments maintain structural stability of the bound conformation when embedded in larger molecules, a property that we termed structural harmony. For 'optimal' ligand-protein complexes, native interactions are stronger on average than non-native interactions, which results in a gradual energy decrease as the native interactions are progressively formed and in a dominant conformational funnel leading to the native structure [118,119,127,128]. Comparing the results of validation docking experiments performed on a large number of Protein Data Bank (PDB) ligand-protein complexes with the GOLD program [103] and with our docking strategy [91,92], we have detected a number of complexes where both methods fail to predict the crystal structures [129]. Misdocked predictions in ligand-protein docking can be categorized as soft and hard failures. A soft failure is defined as the case when the energy of the crystal structure, after minimization with the chosen force field, is lower than the energy of the lowest energy conformation found in docking simulations. A soft failure is due to a flaw in the search algorithm, which is unable to find the global energy minimum.
Hard failures are more difficult; they arise when the energy of a misdocked structure is lower than the energy of the minimized crystal structure. Hard failures result from an inability to accurately reproduce subtle differences in the relative energies of alternate binding modes, a problem that is compounded by competing electrostatic and van der Waals interactions which result in a frustrated binding energy landscape. A hierarchical approach, which involves a hierarchy of energy functions, has been proposed in the analysis of common failures in molecular docking [129,130]. This protocol identifies clusters of structurally similar low-energy conformations, generated in equilibrium simulations with the simplified energy function, followed by subsequent energy minimization


with the molecular mechanics force field. The successes and failures in docking simulations have been explained based on the thermodynamic properties determined from equilibrium simulations and the shape of the underlying binding energy landscape.
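The distinction between soft and hard failures drawn above can be sketched as a small classifier. This is only an illustration: the function name, the use of an RMSD test to decide whether a prediction is misdocked, and the 2.0 Å cutoff are assumptions that do not come from the text.

```python
from enum import Enum


class DockingOutcome(Enum):
    SUCCESS = "success"
    SOFT_FAILURE = "soft failure"   # search did not reach the global minimum
    HARD_FAILURE = "hard failure"   # energy function ranks a misdocked pose too low


def classify_docking(e_crystal_min, e_docked, rmsd_to_crystal, rmsd_cutoff=2.0):
    """Classify one docking run.

    e_crystal_min   -- energy of the crystal structure after minimization
    e_docked        -- energy of the lowest-energy conformation found by docking
    rmsd_to_crystal -- RMSD (angstroms) of that conformation from the crystal pose
    rmsd_cutoff     -- assumed threshold below which the pose counts as docked
    """
    if rmsd_to_crystal <= rmsd_cutoff:
        return DockingOutcome.SUCCESS
    if e_crystal_min < e_docked:
        # The crystal structure is lower in energy, so the search missed it.
        return DockingOutcome.SOFT_FAILURE
    # The misdocked pose is at least as low in energy as the crystal structure.
    return DockingOutcome.HARD_FAILURE
```

For example, `classify_docking(-50.0, -45.0, 5.0)` falls in the soft-failure branch, because the minimized crystal structure lies below the best docked pose.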

IV. Computer simulations of ligand-protein binding

In simulations of ligand-protein interactions, rigid body degrees of freedom and rotatable angles of the ligand are treated as independent variables. Ligand conformations and orientations are sampled in a parallelepiped that encompasses the binding site obtained from the crystallographic structure of the corresponding complex with a 5.0 Å cushion added to every side of this box. Bonds allowed to rotate include those linking sp3 hybridized atoms to either sp3 or sp2 hybridized atoms and single bonds linking two sp2 hybridized atoms. The ligand bond lengths, bond angles, and the torsional angles of the unrotated bonds were obtained from the crystal structures of the bound wild-type ligand-protein complexes. Crystallographic buried water molecules are included in the simulations as part of the protein structure. We have pursued a 'plug-and-play' strategy with two different energy functions, a molecular mechanics AMBER force field [131,132] and a simplified energy function, along with two different sampling techniques, evolutionary programming [91] and Monte Carlo simulations [118,119,127,128]. The knowledge-based simplified energetic model includes intramolecular energy terms for the ligand, given by torsional and nonbonded contributions of the DREIDING force field [133], and intermolecular ligand-protein steric and hydrogen bond interaction terms calculated from a piecewise linear potential summed over all protein and ligand heavy atoms [19-21,91,92]. The parameters of the pairwise potential depend on six different atom types: hydrogen-bond donor, hydrogen-bond acceptor, both donor and acceptor, carbon-sized nonpolar, fluorine-sized nonpolar and sulfur-sized nonpolar. Primary and secondary amines are defined to be donors while oxygen and nitrogen atoms with no bound hydrogens are defined to be acceptors.
Sulfur is modeled as being capable of making long-range, weak hydrogen bonds which allows for sulfur-donor closer contacts that are seen in some of


the crystal structures. Crystallographic water molecules and hydroxyl groups are defined to be both donor and acceptor, and carbon atoms are defined to be nonpolar. The steric and hydrogen bond-like potentials have the same functional form, with an additional three-body contribution to the hydrogen bond term. The parameters were refined to yield the experimental crystallographic structure of a set of ligand-protein complexes as the global energy minimum [91,92]. No assumptions regarding either favorable ligand conformations or any specific ligand-protein interactions were made, and all buried crystallographic water molecules are included in the simulations as part of the protein structure. The all atom-based energy function employed in this study contains an intramolecular term for the ligand, which consists of the van der Waals and torsional strain contributions of the DREIDING force field, and an intermolecular energy term which describes interactions between the ligand and the protein. The short-ranged repulsive interactions present in many molecular force fields such as AMBER lead to rough energy surfaces with high energy barriers separating local minima. In this force field, small changes in position can lead to significant energy changes. For molecular docking simulations, it has been shown that the energy surface must be smooth for robust structure prediction of ligand-protein complexes [92]; softening the potentials is a way to smooth the force field and enhance sampling of the conformational space while retaining an adequate description of the binding energy landscape [119,125,134]. We have shown that both the modified AMBER force field and the simplified piecewise linear (PL) energy function produce comparable results during docking simulations in predicting crystal structures of ligand-protein complexes [119,125].
Neither the modified AMBER energy function nor the PL energy function has singularities at interatomic distances; both effectively explore accessible ligand binding modes and sample a large fraction of conformational space, particularly at high temperature. Although the standard AMBER force field is less amenable to searching, in principle it should describe more adequately the energetics of ligand-protein interactions, which is critical for adequate ordering of the energetics of SB203386 complexes with HIV-1 protease and its mutants. In this study, we employ a hierarchical approach where the PL energy function is used in combination with a parallel Monte Carlo simulated tempering approach [135-140] to adequately sample the conformational space and describe the multitude of the inhibitor binding modes. The advantage of simulated tempering


approach is the ability not only to generate an accurate canonical distribution of the ligand-protein system over a wide temperature range, but also to search for the global energy minimum. The PL energy function is expected to characterize the density of low-energy states and describe the local basins surrounding binding modes. However, this function is less accurate in detecting the exact location and energetics of the native state because of the inaccuracy in quantifying the exact magnitude of ligand-protein interactions. The standard molecular mechanics AMBER force field in conjunction with a desolvation correction [141] is used to optimize the generated samples from the low-energy regions and thereby characterize more precisely the energetics of the inhibitor binding domains. A solvation term was added to the AMBER interaction potential to account for the free energy of interactions between the explicitly modeled atoms of the ligand-protein system and the implicitly modeled solvent.
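To make the idea of a singularity-free piecewise linear (PL) pair term concrete, the following is a minimal sketch. The section does not give the functional form, so the four-breakpoint shape used here (a finite repulsive wall, a flat well, and a linear decay to zero) is an assumption, and every numerical value in `STERIC` is a hypothetical placeholder rather than one of the refined parameters of [91,92]. The three-body hydrogen bond contribution is not modeled.

```python
import math

# Hypothetical breakpoints (angstroms) and energies, for illustration only.
STERIC = {"a": 3.4, "b": 3.6, "c": 4.5, "d": 5.5, "well": -0.4, "wall": 20.0}


def piecewise_linear(r, a, b, c, d, well, wall):
    """Assumed PL pair term: finite at r = 0, flat well on [b, c), zero beyond d."""
    if r < a:
        return wall * (a - r) / a          # linear repulsion, equal to `wall` at r = 0
    if r < b:
        return well * (r - a) / (b - a)    # descend linearly into the well
    if r < c:
        return well                        # flat bottom of the well
    if r < d:
        return well * (d - r) / (d - c)    # rise linearly back to zero
    return 0.0


def pair_energy_sum(ligand_xyz, protein_xyz, params=STERIC):
    """Sum the pair term over all ligand/protein heavy-atom pairs."""
    return sum(
        piecewise_linear(math.dist(p, q), **params)
        for p in ligand_xyz
        for q in protein_xyz
    )
```

Because the repulsive branch is linear and bounded, a small displacement of a ligand atom changes the energy by a bounded amount, which is the smoothness property the text contrasts with the steep short-range repulsion of standard force fields.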

IV.1. Computer simulations of ligand-protein docking

An evolutionary algorithm, a stochastic optimization technique based on the ideas of natural selection, was used in ligand-protein docking simulations [91]. During the search, a population of candidate ligand conformers competes for survival against a fixed number of opponents randomly selected from the remainder of the population. A win is assigned to the competitor with the lowest energy and the number of competitions that a member wins determines the survival probability to the next generation. All surviving members produce offspring, subject to a constant population size. In the population of ligand conformers, each member represents an encoded vector consisting of the rigid body coordinates and the torsional angles about the rotatable bonds. The initial ligand conformations are generated by randomizing the encoded vector, where the center of mass of the ligand is restricted to the rectangular parallelepiped that defines the active site. The three rigid-body rotational degrees of freedom, as well as the torsional angles for all rotatable bonds are uniformly initialized between 0 and 360 degrees. In simulations with multiple protein conformations, each member of the initial population represents a ligand conformation with a randomly assigned protein conformation from the given ensemble. During the search, the surviving members of the population with the lowest energy represent the ligand conformation with the corresponding protein conformation. The


protein conformation of the winner is preserved when offspring is produced, otherwise a new randomly selected protein conformation is assigned to a population member. For each docking simulation, the evolutionary search was performed for a total of 120 generations with a population size of 1200 members. To provide a necessary level of diversity, each member competes against three opponents at each generation. The size of the standard deviation for the Gaussian mutation in the process of generating offspring is varied adaptively using selection pressure. As a result, large mutations are encouraged early in the simulation to facilitate rapid search, while smaller mutations are made as the simulation progresses to refine solutions near the global energy minimum. The minimized best member of the final generation defines the predicted structure for the ligand-protein complex. Using the evolutionary searching algorithm, we have carried out multiple independent docking simulations of the SB203386 inhibitor with the ensemble of 6 protease bound conformations, generated from the crystallographically determined HIV-1 protease wild-type and mutant complexes with the SB203386 inhibitor [77,78,80,81]: 1) SB203386 wild-type (PDB entry 1sbg), 2) HIV-1 (2:31-37) chimera (1bdl), 3) HIV-1 (2:31-37,47,82) (1bdq), 4) HIV-1 (2:31,33-37) (1bdr), 5) HIV-1 protease triple mutant V32I/I47V/V82I (1tcx), and 6) SIV protease (1tcw). In addition, an extended set of 32 protease bound conformations was used in docking simulations, which included the protein conformations of the SB203386 complexes. The remainder of this set consisted of the following crystallographically determined HIV-1 protease complexes: 7) hydroxyethylene inhibitor (1aaq), 8) SB203238 (1hbv), 9) SKF 108738 (1hef), 10) SKF107457 (1heg), 11) CGP 53820 (1hih), 12) U75875 (1hiv), 13) SB204144 (1hosa), 14) SB204144 (1hosa), 15) SB206343 (1hpsa), 16) SB206343 (1hpsb), 17) VX-478 (1hpv), 18) GR126045 (1htfa), 19) GR126045 (1htfb), 20) Cm37615 (1htg), 21) CR1376 5 (1tgb), 22) A7692S, 23) A77003(R,S) (1hvi), 24) A78791(S,-) (1hvj), 25) A76928(S,S) (1hvk), 26) A76889(R,R) (1hvl), 27) XK263 (1hvr), 28) V82A mutant with inhibitor A77003 (1hvs), 29) MVT101 (4hvp), 30) JG365 (7hvp), 31) U85548e (8hvp), and 32) A-74704 (9hvp).
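The tournament-style search described in this section can be illustrated with a short sketch on a toy problem. Several details are assumptions rather than the protocol of [91]: the energy is a hypothetical stand-in (`toy_energy`), the step size is adapted with a log-normal self-adaptation rule (a common evolutionary programming choice, swapped in because the text does not specify the adaptation formula), parents and offspring compete together, and the defaults are far smaller than the 1200 members and 120 generations used in the study.

```python
import math
import random


def toy_energy(vector):
    """Hypothetical stand-in for a docking energy: zero when every angle is 90 degrees."""
    return sum(1.0 - math.cos(math.radians(x - 90.0)) for x in vector)


def evolve(energy, n_dims, pop_size=40, generations=60, n_opponents=3, seed=0):
    """Return (best vector, best energy per generation) from a tournament-based search."""
    rng = random.Random(seed)
    # Each member is (encoded vector, mutation step); angles start uniform in 0..360.
    population = [([rng.uniform(0.0, 360.0) for _ in range(n_dims)], 30.0)
                  for _ in range(pop_size)]
    history = [min(energy(vec) for vec, _ in population)]
    for _ in range(generations):
        offspring = []
        for vec, sigma in population:
            # Assumed log-normal self-adaptation of the Gaussian mutation step.
            new_sigma = max(1e-3, sigma * math.exp(0.3 * rng.gauss(0.0, 1.0)))
            child = [(x + rng.gauss(0.0, new_sigma)) % 360.0 for x in vec]
            offspring.append((child, new_sigma))
        candidates = population + offspring
        energies = [energy(vec) for vec, _ in candidates]
        indices = range(len(candidates))
        wins = []
        for i in indices:
            # Each member meets n_opponents rivals drawn from the rest of the pool.
            opponents = rng.sample([k for k in indices if k != i], n_opponents)
            wins.append(sum(1 for k in opponents if energies[i] <= energies[k]))
        # Members with the most wins survive; ties are broken by energy.
        ranked = sorted(indices, key=lambda i: (-wins[i], energies[i]))
        population = [candidates[i] for i in ranked[:pop_size]]
        history.append(min(energies[i] for i in ranked[:pop_size]))
    best_vector = min(population, key=lambda member: energy(member[0]))[0]
    return best_vector, history
```

Calling `evolve(toy_energy, n_dims=3)` returns the best encoded vector and the best energy after each generation. In this sketch the lowest-energy candidate always wins all of its competitions, so the recorded best energy never increases from one generation to the next.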


IV.2. Monte Carlo equilibrium simulations of ligand-protein thermodynamics

Parallel simulated tempering dynamics with multiple protein conformations can be considered as a modification of the λ-dynamics approach [142-147] and primarily its extension that rapidly evaluates the relative binding affinities of a set of ligands to a given protein [146,147]. This methodology is based on the idea of the "hybrid" Hamiltonian that allows efficient calculation of thermodynamic quantities with a coupling parameter, treated as a dynamic variable, rather than a parameter for continuous transformation from one state to another. The λ-dynamics approach was further developed for competitive binding calculations with ensembles of multiple ligands, where ligands compete for a given receptor on the basis of their relative binding free energies. Rapid screening of binding affinities with the λ-dynamics method is a compromise between conventional free energy methods and empirical free energy methods. This methodology was found to be more efficient in evaluating multiple ligands because of the simultaneous search component of the technique [147]. Analogous to the λ-dynamics approach, binding free energy calculations of a given ligand with an ensemble of multiple protein conformations must contain two components: the free energy calculation of the solvated protein, and the free energy of the complexed ligand-enzyme bound state. The first half of this binding affinity equation evaluates the solvation free energy of protein conformations and should take into account the free energy changes between the unbound and closed forms of the HIV-1 protease and its mutants. We focus only on the second half of the binding affinity equation and analyze the results of competitive binding experiments, in which multiple protein conformations compete for the SB203386 inhibitor on the basis of the interaction energetics.
In simulations with ensembles of multiple protein conformations, each ligand replica of the ligand-protein system is associated with a protein conformation from a given ensemble. The protein conformations are linearly assigned to each temperature level, which implies a consecutive assignment of protein conformations starting from the highest temperature level and allows each protein conformation from the ensemble to be assigned at least once to a certain temperature level. We have carried out equilibrium simulations with the ensembles of protease conformations using parallel simulated tempering dynamics with 50 replicas of the ligand-protein system attributed respectively to 50 different temperature levels that are uniformly distributed in the range between 5300K and 300K. Local Monte Carlo moves are performed independently for each replica at the corresponding temperature level, but after a simulation cycle is completed for all replicas, configuration exchanges for every pair of adjacent replicas are introduced. The m-th and n-th replicas, described by a common Hamiltonian $H(X)$, are associated with the inverse temperatures $\beta_m$ and $\beta_n$, and the corresponding conformations $X_m$ and $X_n$. The exchange of conformations between adjacent replicas m and n is accepted or rejected according to the Metropolis criterion with the probability $p = \min(1, \exp[-\delta])$, where $\delta = [\beta_n - \beta_m][H(X_m) - H(X_n)]$. Starting with the highest temperature, every pair of adjacent temperature configurations is tested for swapping until the final lowest value of temperature is reached. This process of swapping configurations is repeated 50 times after each simulation cycle for all replicas, whereby the exchange of conformations presents an improved global update which increases thermalization of the system and overcomes slow dynamics at low temperatures on rough energy landscapes, thereby permitting regions with a small density of states to be sampled accurately. During simulation, each replica has a non-negligible probability of moving through the entire temperature range and the detailed balance is never violated, which guarantees that each replica of the system is equilibrated in the canonical distribution with its own temperature [135-140]. Hence, we generate the canonical distribution of the ligand-protein system and the equilibrium distribution of protein conformations at each temperature.
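The temperature ladder, the consecutive assignment of protein conformations, and the exchange criterion described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, "uniformly distributed" is read as equal spacing in temperature, the consecutive assignment is read as a cyclic one, and energies are assumed to be in kcal/mol so that the Boltzmann constant `K_B` can be applied directly.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); energy units are an assumption


def temperature_ladder(t_high=5300.0, t_low=300.0, n_levels=50):
    """Equally spaced temperatures from t_high down to t_low."""
    step = (t_high - t_low) / (n_levels - 1)
    return [t_high - k * step for k in range(n_levels)]


def assign_conformations(n_levels, n_conformations):
    """Cyclic assignment of conformation indices, starting at the highest temperature."""
    return [k % n_conformations for k in range(n_levels)]


def swap_probability(beta_m, beta_n, e_m, e_n):
    """Metropolis probability min(1, exp(-delta)), delta = (beta_n - beta_m)(e_m - e_n)."""
    delta = (beta_n - beta_m) * (e_m - e_n)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def exchange_sweep(states, energies, temperatures, rng):
    """Test every adjacent pair once, from the highest temperature downward, in place."""
    for m in range(len(temperatures) - 1):
        n = m + 1
        beta_m = 1.0 / (K_B * temperatures[m])
        beta_n = 1.0 / (K_B * temperatures[n])
        if rng.random() < swap_probability(beta_m, beta_n, energies[m], energies[n]):
            states[m], states[n] = states[n], states[m]
            energies[m], energies[n] = energies[n], energies[m]
```

A swap that moves the lower-energy conformation to the colder level gives a non-positive $\delta$ and is always accepted; the reverse move is accepted with probability $\exp[-\delta]$, which is what preserves detailed balance between adjacent levels.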
At equilibrium, the ratio of the fraction of time that the ligand-protein system spends at a protein conformation $\lambda = i$ to the time spent at a protein conformation $\lambda = j$ is determined by the Boltzmann distribution, $P(\lambda_i = 1, \lambda_{m \neq i} = 0) / P(\lambda_j = 1, \lambda_{m \neq j} = 0)$, and provides a measure for ordering protein conformations according to their interaction free energies with the inhibitor. The protein conformations that deliver the lowest interaction energy for the inhibitor during equilibrium simulation would dominate the distribution with the highest probability. Monte Carlo simulations allow the step sizes to be optimized dynamically at each temperature by taking into account the inhomogeneity of the molecular


system [148]. We updated the maximum step sizes using the acceptance ratio method every cycle of 1000 sweeps, and stored both the energy and the coordinates of the system at the end of each cycle. For all these simulations, we equilibrated the system for 1000 cycles (or one million sweeps), and collected data during 10,000 cycles (or ten million sweeps), resulting in 10,000 samples at each temperature. A sweep is defined as a single trial move for each degree of freedom of the system. A key parameter is the acceptance ratio, which is the ratio of accepted conformations to the total number of trial conformations. At a given cycle of the simulation, each degree of freedom can change randomly throughout some prespecified range determined by the acceptance ratio obtained during the previous cycle. This range varies from one degree of freedom to another because of the complex nature of the energy landscape. At the end of each cycle, the maximum step size is updated and used during the next cycle. Simulations are arranged in cycles, and after a given cycle i, where the average acceptance ratio for each degree of freedom j is [...]

[...] GAAT = GTTT, which agrees well with the predictions made on the basis of the histograms in Figure 6. GAAT and GTTT are least adaptable to this conformation because of a steric clash between the thymine methyl groups in the major groove (see also [112]). Interestingly, these sequences are found at the kink positions in 1AIS, the ternary hyperthermophile complex. In this complex, the geometry of the kink is slightly different, with a higher rise and smaller slide, alleviating the methyl-methyl clash. This structural rearrangement is probably not free of energy cost, as this complex has a very different salt and temperature behavior compared to the mesophile complexes [113].
The results from the calculations identify GTAT and GATT as preferred energetically, because the inter-basepair hydrogen bonds they form in the major groove stabilize the TA-DNA conformation. These hydrogen bonds have a better geometry in GTAT than in GATT, accounting for the further energetic preference for GTAT [83]. The same kind of calculation was performed for DNA double stranded

tetramers TATA and TAAA, corresponding to the sequences at the dyad of structures 1YTB and PDT025, respectively. This position is characterized by positive roll and the most severe unwinding in the whole element. The calculated transition was from A-DNA to the conformation in 1YTB and PDT025, using the same protocol as before [83]. The calculations started from A-DNA because of the tendency of these sequences to adopt A-DNA like geometries in solution (see above). The conformational change is greater in 1YTB than in PDT025, and this is reflected in the free energy cost for the transition: 11.8 kcal/mol for TATA and 8.1 kcal/mol for TAAA. The conclusion from these calculations was that the hydrogen bonds formed between these bases and the asparagine and threonine residues located at the dyad of TBP are responsible for driving the conformational transition [82]. Most interesting is that if TAAA is forced into the conformation in 1YTB, the energetic cost climbs to 14.4 kcal/mol. This is again consistent with the behavior depicted in the histograms in Figure 6 in that AA steps have a narrower twist profile than AT steps, indicating that the population of unwound structures is smaller for AA than for AT steps. It is most noteworthy that the MD simulations leading to the results in Figure 6 were done with the CHARMM23 potential [93], while the PMF calculations were done with the AMBER 4.0 potential [94]. The coincidence in the conclusions derived from the simulations done with these different force fields lends further credence to the inferences from these complex calculations.

2.3. The dehydration of the interface

In section 2.1 the contact interface between TBP and the minor groove of DNA was characterized as anhydrous. This is a common characteristic in all the TBP-DNA complexes available to date. As TBP presents a primarily hydrophobic surface to DNA, most of the hydrogen bond donors and acceptors at this surface are not satisfied by the complexation. Hence, there is likely to be an enthalpic penalty associated with the dehydration of this surface. This penalty is compensated by the favorable increase in entropy associated with the liberation of the surface-bound water molecules into bulk solution. Following this reasoning, there are two aspects of hydration that could contribute to the determination of sequence specificity: the ideal sequence would be one which coordinates a large number of water molecules, but binds them least tightly. We carried out an extensive analysis of the hydration properties of DNA in the simulations MLP and I (see Table 3) [114], based on the proximity analysis developed in Beveridge's group [115,116] and implemented and improved by Mihaly Mezei [117]. The idea behind the proximity analysis is to partition the space surrounding DNA by placing bisectors along each bond in the molecule. In this manner, a collection of cells is generated (akin to Voronoi polyhedra) which can be ascribed to each atom of DNA for each snapshot of a simulation. Water molecules are assigned to each particular atom if they fall in the

appropriate volume elements. Properly normalized radial distribution functions can then be calculated from the number of water molecules in each cell and the corresponding volume, and with these functions, primary and secondary hydration shells can be detected. The detailed analysis is reported in [114]. In summary, the number of water molecules coordinated in the first shell by both dodecamers is practically the same, but there are important differences in the number of water molecules contained in the first and second hydration shells in the minor groove of the TATA (or TITI) elements. From the analysis, it appears that the grooves have very similar numbers of water molecules in the first shell, but the groove in MLP widens near the sugar-phosphate backbone and is capable of hosting more water molecules than I in that region. The interaction energies between the water molecules in the first shell and the atoms in the minor groove are practically the same for both dodecamers. This is expected because the chemical identity of this surface is the same in these two sequences. On the other hand, MLP has 19 more water molecules than I in the minor groove, and this represents a great entropic advantage for MLP. Furthermore, if both hydration shells are restricted in mobility compared to bulk water, there should be a measurable difference in the heat capacity change upon binding to MLP compared to binding to I. This prediction can be tested experimentally. Such a determination of the heat capacity change for these two systems would actually help to define the extent of water perturbation caused by the minor groove surface.
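The assignment step of the proximity analysis can be illustrated with a simplified sketch for a single snapshot. It is not the method of [115-117]: each water oxygen is simply assigned to its nearest solute atom (which yields Voronoi-like cells), and a fixed distance cutoff stands in for the shell boundaries that the real analysis reads off the normalized radial distribution functions. The function name, the atom labels and the 3.5 Å cutoff are all hypothetical.

```python
import math
from collections import Counter


def first_shell_counts(solute_atoms, water_oxygens, shell_cutoff=3.5):
    """Count waters per solute atom for one snapshot.

    solute_atoms  -- dict mapping an atom label to its (x, y, z) position in angstroms
    water_oxygens -- list of (x, y, z) positions of water oxygen atoms
    shell_cutoff  -- assumed first-shell radius; waters farther away are ignored
    """
    counts = Counter()
    for water in water_oxygens:
        # Nearest-atom assignment: the water belongs to the cell of the closest atom.
        label, distance = min(
            ((name, math.dist(position, water)) for name, position in solute_atoms.items()),
            key=lambda pair: pair[1],
        )
        if distance <= shell_cutoff:
            counts[label] += 1
    return counts
```

Summing such counts over the minor groove atoms of each snapshot and averaging over the trajectory would give the kind of per-groove water numbers that are compared for MLP and I in the text.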

2.4. Integration of the various contributions into mechanistic criteria for the formation of TBP-DNA complexes

The final energy balance that determines the relative affinities of TBP to different DNA sequences is composed of the various contributions to selectivity analyzed above in an individual fashion. They apply simultaneously, either constructively or in opposite directions. The experimental measures of binding constants [107] reflect this final energy balance. According to these experimental data, the best TBP-binding sequences are represented by AT, E4, CYC1 and 6T. A common feature of these TBP binding sites is that they have at least six bp of alternating YR sequence. The detailed considerations discussed above indicate that this family of sequences is special because of its combination of static wedges at the appropriate positions (RY steps tend to have positive roll and low twist on average) and the high flexibility for YR steps that exhibit a mild anisotropic bendability towards the major groove. The fact that these sequences have the highest affinity for TBP suggests that DNA flexibility is a dominant characteristic in determining the specificity of binding. Note, moreover, that because GC sequences have been found to have high flexibility of this kind, it may be possible to design a TBP mutant which can bind to GC, after judiciously eliminating all the steric clashes to the guanine amino group.

Following in the binding affinity ranking are the MLP, MLP2 and CYC1B sequences, which also contain an alternating YR sequence that is followed, however, by a purine tract. In this same family one could also fit R28 (which is an inverted version of MLP), as well as 2C, 7G, and I. As shown in Figure 6, the purine tract is more rigid than the alternating YR region, and this could well account for the decrease in affinity. On the other hand, the purine tracts tend to lock into an A-DNA conformation which is assumed to be on a productive pathway leading to TBP binding. Thus, energy considerations would indicate that part of the work necessary to achieve the conformation found in the complex is already done for these sequences, and the entropy involved in locking the sugar rings in the North conformation is already paid for. The sequences in this group tend to have the highest structural homology to A-DNA of the collection of sequences studied by us, and their affinity to TBP could be rationalized on this basis. (Note that a recent report analyzing the thermodynamics of TBP-DNA complex formation proposes an initial intermediate that has an A-DNA conformation that subsequently isomerizes by the insertion of the two pairs of phenylalanine residues [60]). A comparison of the flexibility properties of adenine tracts (found in MLP) and inosine tracts (found in I) indicated that the former are more rigid than the latter, making I a better substrate for TBP than MLP. Nevertheless, as discussed in the previous section, MLP coordinates more water molecules in the minor groove than I, more than compensating for the difference in rigidity of the purine tracts. In this particular comparison, hydration would appear to be a more important selectivity determinant than DNA flexibility.

3. DYNAMIC EFFECTS IN COMPLEX STABILIZATION

The available crystal structures of TBP-DNA complexes agree in presenting a very tight and rigid interface between TBP and the minor groove of DNA. The comparison of the conformation of TBP in its free and bound forms [22,24] showed that there was a very modest structural rearrangement in TBP (a 5° rotation of one subdomain with respect to the other and a reduction in stirrup-stirrup distance), compared to the drastic conformational change in DNA. These considerations based on the crystal structures have resulted in centering all the attention on the changes in DNA during binding, with the implication that TBP is a passive and rigid restraint. The mechanistic analysis is now complemented by molecular dynamics simulations of TBP-DNA complexes that have enabled the study of the role of TBP dynamics in DNA recognition. Miaskiewicz and Ornstein carried out a 400 ps simulation of PDT025, with and without DNA [118], with AMBER 4.0 and the Weiner et al. potential [119], where they identified a bending and twisting motion of the two subdomains of TBP. This motion has been invoked in subsequent articles as part of the mechanism employed by TBP to open the minor groove of the TATA box [60,86]. They also found that TBP makes contacts with bases immediately 3' of the TATA box. Three very important residues, N99, L54 and L145, do not stay at the original base where they were found in the X-ray structure, but wobble between two adjacent bases and their sugars. Our 2 ns CHARMM simulations of PDT025, 1YTB and 1AIS (without TFB), and the corresponding free TBPs, serve to reexamine the inferences on the nature and role of TBP dynamics with results from simulations of additional systems that are more completely described (e.g., Miaskiewicz and Ornstein did not include the internal water molecules in the TBP structure, while our new calculations do include them [120]). In these new simulations we do not detect evidence for the collapse of the two subdomains of TBP caused by extreme bending. There are indeed oscillations in the distance between the tips of the stirrups of TBP [101] and twisting motions, but these oscillations never reach an amplitude large enough to cause the complete closing of the underside of TBP. This is true for the three free TBP crystal structures simulated as monomers (NP-unpublished results). The striking result of the simulations of these complexes is the wealth of dynamics they have revealed, ranging from global motions to the spectrum of populations of side chain rotamers. For example, the hydrogen bonds located at the center of the TATA box, which had been studied with quantum mechanical methods (see section 2.1, above) and found to be indistinguishable in strength between the two possible A·T bp, are shown from the dynamics data to be very labile as a result of a competition for the amino hydrogens of N9 and N99, between the rim of the bases and neighboring H-bond TBP acceptors (the alcohol oxygens of T64 and T155).
The extent of competition depends on the actual DNA sequence at that step, and is therefore different in the three simulated complexes. This finding makes TBP an active partner in the formation of the complex, and adds another layer of complexity to the analysis of specificity determinants. Experimental validation for the importance we are beginning to place on TBP dynamics comes from the prediction of the OH radical footprint pattern of SCE bound to MLP (Table 1). The predictions were based on solvent accessibility, and compared the X-ray structure PDT025 to results from trajectories of a molecular dynamics simulation of the same complex. The comparison is valid although there is no crystal structure for the particular combination of TBP and DNA sequence, because the residues involved in binding to DNA are 100% conserved between ATH and SCE. The key findings from this comparison are that both the crystal structures and a single average structure from the simulation failed to predict the correct reactivity at four nucleotides of the TATA box. Rather, the correct reactivity pattern is produced only when the fluctuations in the structure of TBP are taken into account through the analysis of the solvent accessibility in the MD trajectory (Pastor et al., in preparation).
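The contrast between scoring a single structure and scoring a whole trajectory can be illustrated with a minimal sketch. It assumes that a per-frame accessibility value for each site has already been computed by some external tool. The array values, the threshold and the function name `exposure_summary` are hypothetical and are not taken from the analysis cited above, whose details are not given here.

```python
import numpy as np

def exposure_summary(sasa, threshold):
    """Summarize per-site solvent accessibility over a trajectory.

    sasa: array-like of shape (n_frames, n_sites), one value per frame and site
    threshold: accessibility above which a site counts as exposed in a frame
    Returns (mean accessibility per site, fraction of frames exposed per site).
    """
    sasa = np.asarray(sasa, dtype=float)
    mean_sasa = sasa.mean(axis=0)
    exposed_fraction = (sasa > threshold).mean(axis=0)
    return mean_sasa, exposed_fraction

# Hypothetical accessibilities (arbitrary units) for 2 sites over 4 frames.
trajectory_sasa = [[0.0, 8.0], [0.0, 9.0], [12.0, 7.0], [0.0, 8.0]]
mean_sasa, exposed_fraction = exposure_summary(trajectory_sasa, threshold=5.0)
print("mean accessibility:", mean_sasa)
print("fraction of frames exposed:", exposed_fraction)
```

In this toy data the first site is buried in most frames, so any single snapshot (or a thresholded mean) would be likely to score it as protected, although it is transiently exposed in a quarter of the frames. This is the kind of information that is lost when fluctuations are not taken into account.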

The extensive pattern of dynamics that emerges from our simulations of TBP as an important contributor to complex formation is evident as well from the first structure of TBP obtained by NMR. The collection of 25 structures of a complex between TBP and an N-terminal domain of the largest TAFII in D. melanogaster, listed in Table 1 as 1TBA, provides evidence for the dynamic properties of both TBP and the TAF fragment from the analysis of the different rotamers populated in the ensemble of structures. Such local variability in structure also obtains for the side chains at the recognition interface, lending further credibility to the results obtained in the molecular dynamics simulations described above.
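Rotamer populations of the kind referred to above, whether taken from an NMR ensemble or from trajectory frames, are usually obtained by computing a side chain torsion angle in each structure and binning the values. The sketch below shows one common way to do this. The function names, the 120°-wide bins and the example angles are illustrative assumptions, not the procedure used in the work described here.

```python
from collections import Counter
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle p0-p1-p2-p3 in degrees (range -180 to 180)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def rotamer_bin(chi):
    """Assign a torsion angle (degrees) to one of three staggered wells."""
    if 0.0 <= chi < 120.0:
        return "near +60"
    if -120.0 <= chi < 0.0:
        return "near -60"
    return "near 180"

def rotamer_populations(chi_angles):
    """Return the fraction of structures found in each well."""
    counts = Counter(rotamer_bin(c) for c in chi_angles)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Hypothetical chi1 values (degrees) from four members of an ensemble.
print(rotamer_populations([62.0, 58.0, 175.0, -65.0]))
```

The wells are labeled by angle rather than by the g+/g-/t names because the naming conventions differ between communities. A side chain that populates more than one well in the ensemble is a candidate for the kind of switching described in the text.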

4. TOWARDS THE PREINITIATION COMPLEX ASSEMBLY

The formation of a functional preinitiation complex is heavily regulated [6,15], in keeping with the fact that most of the regulation of gene expression occurs at the level of transcription initiation. As mentioned in the introduction, TBP exists in the cell in association with TAFs, which are absolutely necessary to respond to most of the activators characterized so far. TBP on its own can promote transcription, but only at a basal level. The footprint generated at a core promoter in vitro by TFIID depends on the kind of TATA box present: if it is a very efficient TATA box (such as MLP), then the footprint will cover the TATA box and a few nearby bp [121]. On the other hand, for a poor TATA box the footprint may extend many bp upstream and downstream [122,123]. This has been interpreted based on the assumption that TAFs have some DNA binding activity towards the initiator and a downstream promoter element, and that these interactions help to stabilize the association to DNA when the TATA box sequence deviates significantly from the consensus [11]. In this respect, TFIIA is also known to stabilize TBP-DNA complexes, and also to compete away proteins which inhibit the ability of TBP to bind to DNA [52,124,125]. The reported TBP-DNA structures are almost invariant in the geometry of the complex; nevertheless, there is evidence from gel retardation assays that the geometry of the complex in solution is strongly dependent on the TATA box sequence [47,126]. This is also relevant to understanding sequence specificity, because the difference in geometry is related to the stability of the complex and its ability to be recognized by TFIIB and the rest of the transcription machinery [126].
A preliminary analysis of the simulations of three different TBP-DNA complexes suggests that, indeed, TBP can make adjustments in its structure to respond to the different dynamic properties of the bound DNA sequence [120], resulting in different angles between the incoming and outgoing segments of DNA. It is not immediately clear how to translate these alterations in complex structure to the disposition of TAFs and TFIIs around TBP. To address this issue, ongoing work in our laboratories aims to characterize the differences in relative positions of side chains that have been identified by mutagenesis to be involved in contacts with the TAFs and TFIIs, as a response to the interaction with different DNA sequences.
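The angle between the incoming and outgoing DNA segments mentioned above can be estimated in several ways. A minimal sketch of one simple approach follows: fit a straight axis to points along each flanking segment and take the angle between the two fitted directions. The function names and the choice of input points (for example, base pair centers) are assumptions made for illustration, and this is not the helicoidal analysis used in the work described here.

```python
import numpy as np

def fit_axis(points):
    """Fit a unit direction vector to ordered 3D points by SVD (principal axis)."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    axis = vt[0]
    # Orient the axis from the first point towards the last point.
    if np.dot(axis, pts[-1] - pts[0]) < 0:
        axis = -axis
    return axis

def bend_angle(upstream_points, downstream_points):
    """Angle in degrees between the axes fitted to the two flanking segments."""
    u = fit_axis(upstream_points)
    d = fit_axis(downstream_points)
    cos_angle = np.clip(np.dot(u, d), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

# Hypothetical coordinates: one segment running along x, the next along y.
upstream = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
downstream = [(3.0, 1.0, 0.0), (3.0, 2.0, 0.0), (3.0, 3.0, 0.0)]
print(f"bend angle = {bend_angle(upstream, downstream):.1f} degrees")
```

Applied frame by frame to a trajectory, a measure of this kind gives a distribution of bend angles for each complex, which is the quantity one would compare across DNA sequences.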

5. CONCLUDING REMARKS

The analysis of the different factors that contribute to sequence specific recognition of TATA boxes by TBP leads to a complex picture of their interplay in determining the final binding affinity. Steric repulsion remains the strongest selectivity filter, effectively biasing TATA box composition to exclude G·C bp. For the remaining A·T-rich sequences, DNA flexibility appears to be the next most important factor, as the best TBP binding sequences are those which are the most flexible (i.e., sequences that include many pyrimidine-purine bps). For sequences with adenine tracts, the penalty in loss of flexibility is apparently balanced by their propensity for adopting the A-DNA like structure that has been proposed to be an intermediate in the process of TBP binding. Further distinction among the TBP-binding sequences is achieved by differential hydration of the minor groove surface, which must be completely dehydrated to form a stable complex. Thus, we found that differences in the number of bound water molecules can offset differences in flexibility. A key contribution of molecular dynamics simulations to the understanding of mechanisms of selectivity and affinity in TBP-DNA complexes is the discovery of the active role of TBP in the formation of the complex. The view derived from crystal structures was that of a passive role for TBP, which only imposed a steric constraint on DNA shape. It appears now from the simulations that TBP can respond to the dynamics of the bound DNA sequence by adjusting its interdomain geometry, and this might be relevant for the construction of the final preinitiation complex. Furthermore, many of the contacts characterized in the crystal structures were found in the simulations to have an important dynamic component, as side chains switch rotamers rather frequently.
This conformational freedom makes it possible for TBP to achieve suitable binding contacts with a variety of DNA moieties in a dynamic mode which contributes to enthalpic stabilization. However, the extent of preservation of side chain dynamics in the complex is dependent on the local structure. As it reduces the entropy loss upon complex formation, it provides an additional source of sequence-dependent gain in affinity that is revealed for the first time from the results of the molecular dynamics simulations.

6. ACKNOWLEDGEMENTS

We thank Dr. Leonardo Pardo for sustained discussions and collaborations on this topic. Some of the simulations reported here were performed at the Dirección General de Servicios de Cómputo Académico (UNAM).

REFERENCES

1. T. Rowlands, P. Baumann and S.P. Jackson. The TATA-binding protein: a general transcription factor in eukaryotes and archaebacteria. Science 264 (1994) 1326-9.
2. S.K. Burley and R.G. Roeder. Biochemistry and structural biology of transcription factor IID (TFIID). Annu. Rev. Biochem. 65 (1996) 769-99.
3. B.P. Cormack and K. Struhl. The TATA-binding protein is required for transcription by all three nuclear RNA polymerases in yeast cells. Cell 69 (1992) 685-96.
4. J.A. Goodrich and R. Tjian. TBP-TAF complexes: selectivity factors for eukaryotic transcription. Curr. Opin. Cell. Biol. 6 (1994) 403-9.
5. K. Struhl. Duality of TBP, the universal transcription factor. Science 263 (1994) 1103.
6. R.G. Roeder. The role of general initiation factors in transcription by RNA polymerase II. Trends Biochem. Sci. 21 (1996) 327-35.
7. R. Breathnach and P. Chambon. Organization and expression of eucaryotic split genes coding for proteins. Annu. Rev. Biochem. 50 (1981) 349-83.
8. B.C. Hoopes, J.F. LeBlanc and D.K. Hawley. Contributions of the TATA box sequence to rate-limiting steps in transcription initiation by RNA polymerase II. J. Mol. Biol. 277 (1998) 1015-31.
9. X.Y. Li, A. Virbasius, X. Zhu and M.R. Green. Enhancement of TBP binding by activators and general transcription factors. Nature 399 (1999) 605-9.
10. L. Kuras and K. Struhl. Binding of TBP to promoters in vivo is stimulated by activators and requires Pol II holoenzyme. Nature 399 (1999) 609-13.
11. T.W. Burke, P.J. Willy, A.K. Kutach, J.E. Butler and J.T. Kadonaga. The DPE, a conserved downstream core promoter element that is functionally analogous to the TATA box. Cold Spring Harb. Symp. Quant. Biol. 63 (1998) 75-82.
12. S. Buratowski, S. Hahn, L. Guarente and P.A. Sharp. Five Intermediate Complexes in Transcription Initiation by RNA Polymerase II. Cell 56 (1989) 549-561.
13. A.J. Koleske and R.A. Young. The RNA polymerase II holoenzyme and its implications for gene regulation. Trends Biochem. Sci. 20 (1995) 113-6.
14. J.A. Ranish, N. Yudkovsky and S. Hahn. Intermediates in formation and activity of the RNA polymerase II preinitiation complex: holoenzyme recruitment and a postrecruitment role for the TATA box and TFIIB. Genes Dev. 13 (1999) 49-63.
15. G. Orphanides, T. Lagrange and D. Reinberg. The general transcription factors of RNA polymerase II. Genes Dev. 10 (1996) 2657-83.
16. J.C. Dantonel, J.M. Wurtz, O. Poch, D. Moras and L. Tora. The TBP-like factor: an alternative transcription factor in metazoa? Trends Biochem. Sci. 24 (1999) 335-9.
17. P.A. Moore, et al. A human TATA binding protein-related protein with altered DNA binding specificity inhibits transcription from multiple promoters and activators. Mol. Cell. Biol. 19 (1999) 7610-20.
18. B.S. DeDecker, et al. The crystal structure of a hyperthermophilic archaeal TATA-box binding protein. J. Mol. Biol. 264 (1996) 1072-84.
19. D.B. Nikolov, et al. Crystal structure of TFIID TATA-box binding protein. Nature 360 (1992) 40-6.
20. D.B. Nikolov and S.K. Burley. 2.1 Å resolution refined structure of a TATA box-binding protein (TBP). Nature Struct. Biol. 1 (1994) 621-37.
21. D.I. Chasman, K.M. Flaherty, P.A. Sharp and R.D. Kornberg. Crystal structure of yeast TATA-binding protein and model for interaction with DNA. Proc. Natl. Acad. Sci. USA 90 (1993) 8174-8.
22. J.L. Kim, D.B. Nikolov and S.K. Burley. Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature 365 (1993) 520-7.
23. Y. Kim, J.H. Geiger, S. Hahn and P.B. Sigler. Crystal structure of a yeast TBP/TATA-box complex. Nature 365 (1993) 512-20.
24. J.L. Kim and S.K. Burley. 1.9 Å resolution refined structure of TBP recognizing the minor groove of TATAAAAG. Nature Struct. Biol. 1 (1994) 638-53.
25. D.B. Nikolov, et al. Crystal structure of a human TATA box-binding protein/TATA element complex. Proc. Natl. Acad. Sci. USA 93 (1996) 4862-7.
26. Z.S. Juo, et al. How proteins recognize the TATA box. J. Mol. Biol. 261 (1996) 239.
27. D.B. Nikolov, et al. Crystal structure of a TFIIB-TBP-TATA-element ternary complex. Nature 377 (1995) 119-28.
28. S. Tan, Y. Hunziker, D.F. Sargent and T.J. Richmond. Crystal structure of a yeast TFIIA/TBP/DNA complex. Nature 381 (1996) 127-51.
29. J.H. Geiger, S. Hahn, S. Lee and P.B. Sigler. Crystal structure of the yeast TFIIA/TBP/DNA complex. Science 272 (1996) 830-6.
30. P.F. Kosa, G. Ghosh, B.S. DeDecker and P.B. Sigler. The 2.1-Å crystal structure of an archaeal preinitiation complex: TATA-box-binding protein/transcription factor (II)B core/TATA-box. Proc. Natl. Acad. Sci. USA 94 (1997) 6042-7.
31. O. Littlefield, Y. Korkhin and P.B. Sigler. The structural basis for the oriented assembly of a TBP/TFB/promoter complex. Proc. Natl. Acad. Sci. USA 96 (1999) 13668-73.
32. F.T. Tsai and P.B. Sigler. Structural basis of preinitiation complex assembly on human Pol II promoters. EMBO J. 19 (2000) 25-36.
33. C.A. Bewley, A.M. Gronenborn and G.M. Clore. Minor groove-binding architectural proteins: structure, function, and DNA recognition. Annu. Rev. Biophys. Biomol. Struct. 27 (1998) 105-31.
34. D. Liu, et al. Solution structure of a TBP-TAF(II)230 complex: protein mimicry of the minor groove surface of the TATA box unwound by TBP. Cell 94 (1998) 573-83.
35. F.C. Bernstein, et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112 (1977) 535-42.
36. H.M. Berman, et al. The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63 (1992) 751-9.
37. B. Coulombe, J. Li and J. Greenblatt. Topological localization of the human transcription factors IIA, IIB, TATA box-binding protein, and RNA polymerase II-associated protein 30 on a class II promoter. J. Biol. Chem. 269 (1994) 19962-7.
38. J.D. Griffith, S. Lee and Y.-H. Wang. Visualizing nucleic acids and their complexes using electron microscopy. Curr. Opin. Struct. Biol. 7 (1997) 362-6.
39. T.K. Kim, et al. Trajectory of DNA in the RNA polymerase II transcription preinitiation complex. Proc. Natl. Acad. Sci. USA 94 (1997) 12268-73.
40. T. Lagrange, et al. High-resolution mapping of nucleoprotein complexes by site-specific protein-DNA photocrosslinking: organization of the human TBP-TFIIA-TFIIB-DNA quaternary complex. Proc. Natl. Acad. Sci. USA 93 (1996) 10620-5.
41. G.J. Jensen, G. Meredith, D.A. Bushnell and R.D. Kornberg. Structure of wild-type yeast RNA polymerase II and location of Rpb4 and Rpb7. EMBO J. 17 (1998) 2353-8.
42. X. Xie, et al. Structural similarity between TAFs and the heterotetrameric core of the histone octamer. Nature 380 (1996) 316-22.
43. W. Zhu, et al. The N-terminal domain of TFIIB from Pyrococcus furiosus forms a zinc ribbon. Nature Struct. Biol. 3 (1996) 122-4.
44. S. Bagby, et al. Solution structure of the C-terminal core domain of human TFIIB: similarity to cyclin A and interaction with TATA-binding protein. Cell 82 (1995) 857-67.
45. M.E. Noble, J.A. Endicott, N.R. Brown and L.N. Johnson. The cyclin box fold: protein recognition in cell-cycle and transcription control. Trends Biochem. Sci. 22 (1997) 482-7.
46. B.C. Hoopes, J.F. LeBlanc and D.K. Hawley. Kinetic analysis of yeast TFIID-TATA box complex formation suggests a multi-step pathway. J. Biol. Chem. 267 (1992) 11539.
47. D.B. Starr, B.C. Hoopes and D.K. Hawley. DNA bending is an important component of site-specific recognition by the TATA binding protein. J. Mol. Biol. 250 (1995) 434.
48. J.M. Cox, A.R. Kays, J.F. Sanchez and A. Schepartz. Preinitiation complex assembly: potentially a bumpy path. Curr. Opin. Chem. Biol. 2 (1998) 11-7.
49. R.A. Coleman, A.K. Taggart, L.R. Benjamin and B.F. Pugh. Dimerization of the TATA binding protein. J. Biol. Chem. 270 (1995) 13842-9.
50. A.K. Taggart and B.F. Pugh. Dimerization of TFIID when not bound to DNA. Science 272 (1996) 1331-3.
51. R.A. Coleman and B.F. Pugh. Slow dimer dissociation of the TATA binding protein dictates the kinetics of DNA binding. Proc. Natl. Acad. Sci. USA 94 (1997) 7221-6.
52. R.A. Coleman, A.K. Taggart, S. Burma, J.J. Chicca II and B.F. Pugh. TFIIA regulates TBP and TFIID dimers. Mol. Cell 4 (1999) 451-7.
53. A.J. Jackson-Fisher, C. Chitikila, M. Mitra and B.F. Pugh. A role for TBP dimerization in preventing unregulated gene expression. Mol. Cell 3 (1999) 717-27.
54. A.J. Jackson-Fisher, et al. Dimer dissociation and thermosensitivity kinetics of the Saccharomyces cerevisiae and human TATA binding proteins. Biochemistry 38 (1999) 11340-8.
55. R.A. Coleman and B.F. Pugh. Evidence for functional binding and stable sliding of the TATA binding protein on nonspecific DNA. J. Biol. Chem. 270 (1995) 13850-9.
56. V. Petri, M. Hsieh and M. Brenowitz. Thermodynamic and kinetic characterization of the binding of the TATA binding protein to the adenovirus E4 promoter. Biochemistry 34 (1995) 9977-84.
57. G.M. Perez-Howard, P.A. Weil and J.M. Beechem. Yeast TATA binding protein interaction with DNA: fluorescence determination of oligomeric state, equilibrium binding, on-rate, and dissociation kinetics. Biochemistry 34 (1995) 8005-17.
58. K.M. Parkhurst, M. Brenowitz and L.J. Parkhurst. Simultaneous binding and bending of promoter DNA by the TATA binding protein: real time kinetic measurements. Biochemistry 35 (1996) 7459-65.
59. V. Petri, M. Hsieh, E. Jamison and M. Brenowitz. DNA sequence-specific recognition by the Saccharomyces cerevisiae "TATA" binding protein: promoter-dependent differences in the thermodynamics and kinetics of binding. Biochemistry 37 (1998) 15842-9.
60. K.M. Parkhurst, R.M. Richards, M. Brenowitz and L.J. Parkhurst. Intermediate species possessing bent DNA are present along the pathway to formation of a final TBP-TATA complex. J. Mol. Biol. 289 (1999) 1327-41.
61. J.D. Parvin, R.J. McCormick, P.A. Sharp and D.E. Fisher. Pre-bending of a promoter sequence enhances affinity for the TATA-binding factor. Nature 373 (1995) 724-7.
62. A. Grove, A. Galeone, E. Yu, L. Mayol and E.P. Geiduschek. Affinity, Stability and Polarity of Binding of the TATA Binding Protein Governed by Flexure at the TATA Box. J. Mol. Biol. 282 (1998) 731-739.
63. M.T. Record Jr, T.M. Lohman and P. de Haseth. Ion effects on ligand-nucleic acid interactions. J. Mol. Biol. 107 (1976) 145-158.
64. G.S. Manning. The molecular theory of polyelectrolyte solutions with applications to the electrostatic properties of polynucleotides. Quart. Rev. Biophys. 11 (1978) 179-246.
65. R.S. Spolar and M.T. Record Jr. Coupling of local folding to site-specific binding of proteins to DNA. Science 263 (1994) 777-84.
66. J.M. Sturtevant. Heat capacity and entropy changes in processes involving proteins. Proc. Natl. Acad. Sci. USA 74 (1977) 2236-40.
67. N.C. Seeman, J.M. Rosenberg and A. Rich. Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl. Acad. Sci. USA 73 (1976) 804-8.
68. C.L. Kielkopf, et al. A structural basis for recognition of A·T and T·A base pairs in the minor groove of B-DNA. Science 282 (1998) 111-5.
69. K.M. Arndt, S.L. Ricupero, D.M. Eisenmann and F. Winston. Biochemical and genetic characterization of a yeast TFIID mutant that alters transcription in vivo and DNA binding in vitro. Mol. Cell. Biol. 12 (1992) 2372-82.
70. K.M. Arndt, C.R. Wobbe, H.S. Ricupero, K. Struhl and F. Winston. Equivalent mutations in the two repeats of yeast TATA-binding protein confer distinct TATA recognition specificities. Mol. Cell. Biol. 14 (1994) 3719-28.
71. M. Strubin and K. Struhl. Yeast and human TFIID with altered DNA-binding specificity for TATA elements. Cell 68 (1992) 721-30.
72. M.C. Schultz, R.H. Reeder and S. Hahn. Variants of the TATA-binding protein can distinguish subsets of RNA polymerase I, II, and III promoters. Cell 69 (1992) 697.
73. D. Poon, et al. Genetic and biochemical analyses of yeast TATA-binding protein mutants. J. Biol. Chem. 268 (1993) 5005-13.
74. H. Tang, X. Sun, D. Reinberg and R.H. Ebright. Protein-protein interactions in eukaryotic transcription initiation: structure of the preinitiation complex. Proc. Natl. Acad. Sci. USA 93 (1996) 1119-24.
75. G.O. Bryant, L.S. Martel, S.K. Burley and A.J. Berk. Radical mutations reveal TATA-box binding protein surfaces required for activated transcription in vivo. Genes Dev. 10 (1996) 2491-504.
76. P. Reddy and S. Hahn. Dominant negative mutations in yeast TFIID define a bipartite DNA-binding region. Cell 65 (1991) 349-57.
77. T. Yamamoto, et al. A bipartite DNA binding domain composed of direct repeats in the TATA box binding factor TFIID. Proc. Natl. Acad. Sci. USA 89 (1992) 2844-8.
78. W.P. Tansey, S. Ruppert, R. Tjian and W. Herr. Multiple regions of TBP participate in the response to transcriptional activators in vivo. Genes Dev. 8 (1994) 2756-69.
79. S.K. Mahanta, T. Scholl, F.C. Yang and J.L. Strominger. Transactivation by CIITA, the type II bare lymphocyte syndrome-associated factor, requires participation of multiple regions of the TATA box binding protein. Proc. Natl. Acad. Sci. USA 94 (1997) 6324-9.
80. Y. Cang, D.T. Auble and G. Prelich. A new regulatory domain on the TATA-binding protein. EMBO J. 18 (1999) 6662-6671.
81. N. Pastor and H. Weinstein. Electrostatic analysis of DNA binding properties in lysine to leucine mutants of TATA-box binding proteins. Protein Eng. 8 (1995) 543-9.
82. L. Pardo, M. Campillo, D. Bosch, N. Pastor and H. Weinstein. Binding mechanisms of TATA box-binding proteins: DNA kinking is stabilized by specific hydrogen bonds. Biophys. J. in press (2000).
83. L. Pardo, N. Pastor and H. Weinstein. Selective binding of the TATA box-binding protein to the TATA box-containing promoter: analysis of structural and energetic factors. Biophys. J. 75 (1998) 2411-21.
84. G. Guzikevich-Guerstein and Z. Shakked. A novel form of the DNA double helix imposed on the TATA-box by the TATA-binding protein. Nature Struct. Biol. 3 (1996) 32-7.
85. L. Pardo, N. Pastor and H. Weinstein. Progressive DNA bending is made possible by gradual changes in the torsion angle of the glycosyl bond. Biophys. J. 74 (1998) 2191.
86. A. Lebrun, Z. Shakked and R. Lavery. Local DNA stretching mimics the distortion caused by the TATA box-binding protein. Proc. Natl. Acad. Sci. USA 94 (1997) 2993.
87. A. Lebrun and R. Lavery. Modeling DNA deformations induced by minor groove binding proteins. Biopolymers 49 (1999) 341-53.
88. A.H. Elcock and J.A. McCammon. The low dielectric interior of proteins is sufficient to cause major structural changes in DNA on association. J. Am. Chem. Soc. 118 (1996) 3787-3788.
89. N. Pastor, L. Pardo and H. Weinstein. Does TATA matter? A structural exploration of the selectivity determinants in its complexes with TATA box-binding protein. Biophys. J. 73 (1997) 640-52.
90. D. Flatters, M. Young, D.L. Beveridge and R. Lavery. Conformational properties of the TATA-box binding sequence of DNA. J. Biomol. Struct. Dyn. 14 (1997) 757-65.
91. D. Flatters and R. Lavery. Sequence-dependent dynamics of TATA-Box binding sites. Biophys. J. 75 (1998) 372-81.
92. O.N. de Souza and R.L. Ornstein. Inherent DNA curvature and flexibility correlate with TATA box functionality. Biopolymers 46 (1998) 403-15.
93. A.D. MacKerell Jr, J. Wiorkiewicz-Kuczera and M. Karplus. An all-atom empirical energy function for the simulation of nucleic acids. J. Am. Chem. Soc. 117 (1995) 11946.
94. W.D. Cornell, et al. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117 (1995) 5179-5197.
95. M. Feig and B.M. Pettitt. Structural equilibrium of DNA represented with different force fields. Biophys. J. 75 (1998) 134-49.
96. M. Feig and B.M. Pettitt. Experiment vs force fields: DNA conformation from molecular dynamics simulations. J. Phys. Chem. B 101 (1997) 7361-3.
97. N.A. Davis, S.S. Majee and J.D. Kahn. TATA Box DNA Deformation with and without the TATA Box-binding Protein. J. Mol. Biol. 291 (1999) 249-265.
98. D.S. Goodsell and R.E. Dickerson. Bending and curvature calculations in B-DNA. Nucleic Acids Res. 22 (1994) 5497-503.
99. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey and M.L. Klein. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79 (1983) 926-35.
100. N. Pastor, L. Pardo and H. Weinstein. in Molecular Modeling of Nucleic Acids (eds. Leontis, N.B. & SantaLucia Jr, J.) 329-45 (American Chemical Society, San Francisco, CA, 1997).
101. N. Pastor. Ph.D. Thesis in Biomedical Sciences (CUNY, New York, 1997).
102. D.K. Lee, K.C. Wang and R.G. Roeder. Functional significance of the TATA element major groove in transcription initiation by RNA polymerase II. Nucleic Acids Res. 25 (1997) 4338-45.
103. S. Arnott, et al. Wrinkled DNA. Nucleic Acids Res. 11 (1983) 1457-1474.
104. N.B. Ulyanov and T.L. James. Statistical analysis of DNA duplex structural features. Methods Enzymol. 261 (1995) 90-120.
105. M. Tonelli, E. Ragg, A.M. Bianucci, K. Lesiak and T.L. James. Nuclear magnetic resonance structure of d(GCATATGATAG)·d(CTATCATATGC): a consensus sequence for promoters recognized by sigma K RNA polymerase. Biochemistry 37 (1998) 11745-61.
106. C.R. Wobbe and K. Struhl. Yeast and human TATA-binding proteins have nearly identical DNA sequence requirements for transcription in vitro. Mol. Cell. Biol. 10 (1990) 3859-67.
107. J.M. Wong and E. Bateman. TBP-DNA interactions in the minor groove discriminate between A:T and T:A base pairs. Nucleic Acids Res. 22 (1994) 1890-6.
108. H. Sklenar, C. Etchebest and R. Lavery. Describing protein structure: a general algorithm yielding complete helicoidal parameters and a unique overall axis. Proteins 6 (1989) 46-60.
109. M. Suzuki, M.D. Allen, N. Yagi and J.T. Finch. Analysis of co-crystal structures to identify the stereochemical determinants of the orientation of TBP on the TATA box. Nucleic Acids Res. 24 (1996) 2767-73.
110. W.K. Olson. Simulating DNA at low resolution. Curr. Opin. Struct. Biol. 6 (1996) 242-56.
111. J.M. Cox, et al. Bidirectional binding of the TATA box binding protein to the TATA box. Proc. Natl. Acad. Sci. USA 94 (1997) 13475-80.
112. M. Suzuki, N. Yagi and J.T. Finch. Role of base-backbone and base-base interactions in alternating DNA conformations. FEBS Lett. 379 (1996) 148-52.
113. R. O'Brien, B. DeDecker, K.G. Fleming, P.B. Sigler and J.E. Ladbury. The effects of salt on the TATA binding protein-DNA interaction from a hyperthermophilic archaeon. J. Mol. Biol. 279 (1998) 117-25.
114. N. Pastor, A.D. MacKerell Jr and H. Weinstein. TIT for TAT: the properties of inosine and adenosine in TATA box DNA. J. Biomol. Struct. Dyn. 16 (1999) 787-810.
115. P.K. Mehrotra, F.T. Marchese and D.L. Beveridge. Statistical state solvation sites. J. Am. Chem. Soc. 103 (1981) 672-3.
116. P.K. Mehrotra and D.L. Beveridge. Structural analysis of molecular solutions based on quasi-component distribution functions. Application to [H2CO]aq at 25 °C. J. Am. Chem. Soc. 102 (1980) 4287-94.
117. M. Mezei and D.L. Beveridge. Structural chemistry of biomolecular hydration via computer simulation: the proximity criterion. Methods Enzymol. 127 (1986) 21-47.
118. K. Miaskiewicz and R.L. Ornstein. DNA binding by TATA-box binding protein (TBP): a molecular dynamics computational study. J. Biomol. Struct. Dyn. 13 (1996) 593-600.
119. S.J. Weiner, P.A. Kollman, D.T. Nguyen and D.A. Case. An all atom force field for simulations of proteins and nucleic acids. J. Comp. Chem. 7 (1986) 230-252.
120. N. Pastor and H. Weinstein. Sidechain dynamics and sequence specific TBP binding to TATA box DNA. Biophys. J. 76 (1999) A387.
121. M. Horikoshi, et al. Transcription factor TFIID induces DNA bending upon binding to the TATA element. Proc. Natl. Acad. Sci. USA 89 (1992) 1060-4.
122. Y. Nakatani, et al. A downstream initiation element required for efficient TATA box binding and in vitro function of TFIID. Nature 348 (1990) 86-8.
123. P.A. Emanuel and D.S. Gilmour. Transcription factor TFIID recognizes DNA sequences downstream of the TATA element in the Hsp70 heat shock gene. Proc. Natl. Acad. Sci. USA 90 (1993) 8449-53.
124. K.H. Emami, A. Jain and S.T. Smale. Mechanism of synergy between TATA and initiator: synergistic binding of TFIID following a putative TFIIA-induced isomerization. Genes Dev. 11 (1997) 3007-19.
125. Q. Liu, S.E. Gabriel, K.L. Roinick, R.D. Ward and K.M. Arndt. Analysis of TFIIA Function In Vivo: Evidence for a Role in TATA-Binding Protein Recruitment and Gene-Specific Activation. Mol. Cell. Biol. 19 (1999) 8673-8685.
126. J. Bernues, P. Carrera and F. Azorin. TBP binds the transcriptionally inactive TA5 sequence but the resulting complex is not efficiently recognised by TFIIB and TFIIA. Nucleic Acids Res. 24 (1996) 2950-8.


L.A. Eriksson (Editor)
Theoretical Biochemistry - Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved


Chapter 11

A Multi-Component Model For Radiation Damage To DNA From Its Constituents

Stacey D. Wetmore,^a Leif A. Eriksson^b and Russell J. Boyd^a

^a Department of Chemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3
^b Department of Quantum Chemistry, Uppsala University, Box 518, 751 20 Uppsala, Sweden

1. I N T R O D U C T I O N While the significance of radicals in biological systems has been appreciated for decades, there is relatively little defimtive experimental information on the identity of the radicals and even less on the mechanisms by which they affect the physiology of living systems. The paucity of detailed information is a direct consequence of the fact that most radicals are highly reactive and, therefore, short-lived transient species. Despite the tremendous advances in spectroscopic and laser photolysis techniques, much less is known about radicals than about closed-shen species. The treatment of radicals by theoretical methods is, however, only marginally more difficult than that of closed-shell molecules. It is for these reasons that the numerous applications of quantum chemical techniques to radicals have proven to be complementary to experimental studies. The large number of biologically important radicals and the even greater number of reactions that they undergo in vivo provide a limitless list of interesting problems. Many biological radicals are formed by exposure of living matter to ionizing radiation. More specifically, radiation causes damage to DNA, the primary products being base or sugar radicals that subsequently lead to strand breaks and DNA-DNA or DNA-protein cross-links. Interest in the effects of radiation on DNA has grown for several reasons. For example, the beneficial effects of radiation therapy are achieved through alterations to DNA, and there is


increasing concern about the exposure of the human population to higher levels of ultraviolet radiation due to the depletion of stratospheric ozone. It is extremely difficult to study the effects of radiation on DNA by direct experimental methods. In many cases there is much uncertainty about which radicals are the main radiation products. Due to complications associated with electron spin resonance (ESR) and electron nuclear double resonance (ENDOR) experiments on full DNA, the most accurate studies are available for single crystals of the four DNA bases and related derivatives. However, even these low-temperature spectra are complicated by the presence of significant hydrogen bonding in the crystal structures. Furthermore, due to structural similarities in the generated radicals, the spectra involve many overlapping peaks. Consequently, the experimental identification of specific radicals is difficult and often involves many assumptions. This is an ideal problem for which computational chemistry is a valuable complementary partner to experiment. Due to advances in quantum chemical methods (density-functional theory) and computer hardware, it is possible to accurately predict the hyperfine coupling constants (HFCC) of many biological radicals, the property used to identify radicals experimentally. The theoretical HFCCs can be used to assist with the interpretation of the results obtained from ESR and ENDOR experiments. This chapter has two objectives. First, we review recent progress in the computation of the HFCCs of radicals that may be formed from radiation damage to the four DNA bases, thymine [1], cytosine [2], guanine [3] and adenine [4], as well as the sugar moiety [5]. The theoretical values are compared with the most accurate data available from ENDOR and related experiments on single crystals. The good agreement between the computed and experimental values in many cases is used to validate the level of theory used for the HFCC computations. 
For a few cases where there are discrepancies between the two data sets, consideration of conformational changes due to hydrogen bonding and packing effects in the crystalline state, which are not accounted for in gas-phase calculations, leads to better agreement. For the small number of cases where the discrepancies between the computed and experimental results are not resolved by modifying the gas-phase structures, it is suggested that alternate assignments of the spectra, or new experiments, may be warranted. In one case, the discrepancy between the computed and experimental HFCCs has led to the proposal of a new mechanism for radiation damage. The second objective is to review the results of experimental studies on full DNA in the context of the computed and experimental results for single crystals. We conclude our chapter with a multi-component model for radiation damage to


DNA that includes damage to the bases, the sugar moiety, the phosphate group, and the surrounding water molecules. The model incorporates the results of many sophisticated experimental studies on full DNA and accounts for all known direct and indirect consequences of radiation damage to DNA. The model is expected to be useful for the design of new experiments and the characterization of the ESR and ENDOR spectra of DNA. A full knowledge of the radicals generated upon irradiation of DNA is essential for determining the type of damage at a molecular level, which in turn governs the biological consequences (strand-breaks, tandem lesions, DNA-protein cross-links, unaltered base release, etc.).

2. CHARACTERIZATION OF DNA RADIATION PRODUCTS

We have recently reported extensive calculations on all possible radicals formed by net hydrogen atom addition (hydrogenated), net hydrogen atom removal (dehydrogenated), or net hydroxyl radical addition (hydroxylated) to the four DNA bases, cytosine (C), thymine (T), guanine (G), and adenine (A) [1-4]. We have also studied all radicals formed by net hydrogen atom and net hydroxyl radical abstraction from a model of the sugar group present in DNA (deoxyribose (dR)), as well as sugar radicals formed through more extensive damage pathways [5]. The important information gained from the calculations includes the relative energetics of the products generated from each base through a similar mechanism, the spin density distributions and the HFCCs. The potential energy surfaces for possible radiation products were explored using Becke's three-parameter exchange functional (B3) [6] in combination with Lee, Yang and Parr's correlation expression (LYP) [7] and Pople's 6-31G(d,p) basis set [8]. Two sets of single-point calculations were performed on the global minima. First, the B3LYP hybrid functional and Pople's 6-311G(2df,p) basis set [8] were used to obtain relative energies and spin densities. Secondly, HFCCs were obtained using Perdew and Wang's nonlocal exchange (PW) [9], Perdew's nonlocal correlation functional (P86) [10], Pople's 6-311G(2d,p) basis set [8], and the (5,4;5,4) family of auxiliary basis sets for the fitting of the charge density and the exchange correlation potential. These calculations were carried out with the GAUSSIAN 94 [11] and deMon program packages [12]. The present combination of methods has been successfully employed in studies of model π-radicals [13]. Details of the calculation of HFCCs have been reviewed on several occasions and will not be discussed in detail herein [14]. However, it is important to understand that the HFCC has two contributions: the isotropic component (Aiso) and the anisotropic component (Txx, Tyy, Tzz).
The addition of Aiso to each


component of the anisotropic tensor results in the principal components (Axx, Ayy, Azz). The calculation of accurate isotropic HFCCs requires both a good description of electron correlation and a well-defined basis set. However, even if these computational demands are satisfied, theoretical results may deviate more than 20% from the experimental value. On the other hand, anisotropic HFCCs can be calculated accurately even with lower levels of theory. More importantly, the calculated anisotropic components of hydrogen HFCCs are often within 5-10% of the experimental value and the most abundant data available for biological systems are hydrogen couplings. Thus, comparison of anisotropic hyperfine tensors can be used as an accurate guide to identify radical sites even when less satisfactory agreement is obtained for the isotropic component. The unit used for the HFCCs throughout is gauss, which is related to megahertz through a simple conversion factor (1 G = 2.8025 MHz). The atomic numbering in the nucleobases used throughout is shown in Figure 1 and a few examples of the notation used for DNA radicals will now be given. The cytosine anion and guanine cation are denoted as C•- and G•+. A radical formed by net addition of OH to C5 in cytosine or thymine is denoted by C(C5OH) and T(C5OH), respectively. Similarly, radicals formed by net hydrogen atom addition to N3 or C6 in adenine and thymine, respectively, are referred to as A(N3H) and T(C6H). The radicals formed via net hydrogen atom

Figure 1: Structure and chemical numbering in the four DNA bases (R = H, thymine, cytosine, adenine and guanine) and the sugar group (deoxyribose).


removal from the methyl group in thymine or the amino group in guanine are denoted by T(CH2) and G(N2H), respectively. Some guanine and adenine crystals examined experimentally lead to protonated radicals, such as the protonated N6-dehydrogenated adenine radical [A(N6H+)] or the protonated guanine C8-hydrogenated radical [G(C8H+)]. The radicals formed via net hydrogen abstraction from the C5' or O3' position in the sugar group are referred to as C5'• or O3'•, respectively. The north and south puckering modes (Section 2.3) for the C3'• radical will be distinguished as C3'•(N) and C3'•(S), respectively. The notation of more complex sugar radicals will be discussed as required. Some of the results obtained for the numerous radicals investigated will be presented in the remainder of this section. The discussion will be separated into base and sugar radicals. The former will be further divided into radicals where theory and experiment are in good agreement, those where external influences must be considered in order to obtain agreement between theory and experiment, and, finally, those where consideration of external influences does not improve the poor agreement between theory and experiment. The examples given within were chosen to illustrate the level of agreement with experiment and the type of complementary information that can be obtained from the calculations. For full details on the computations the interested reader is referred to the original series of theoretical papers [1-5]. Only limited experimental data are presented herein, as most experiments gave similar results. For a complete list of experimental papers, please refer to the original theoretical work and/or an excellent review covering experimental work until 1993 [15].
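As a minimal sketch of the relations introduced earlier in this section, the following Python snippet adds the isotropic coupling to each anisotropic component to obtain the principal components, and converts a coupling from gauss to megahertz with the factor quoted in the text. The function and constant names are illustrative assumptions; the input values are the calculated C6H couplings of T(C5H) listed in Table 1.

```python
# Conversion factor quoted in the text (1 G = 2.8025 MHz).
# GAUSS_TO_MHZ and the function names below are hypothetical, chosen for this sketch.
GAUSS_TO_MHZ = 2.8025


def principal_components(a_iso, t_aniso):
    """Add the isotropic coupling to each anisotropic component (all in gauss)."""
    return tuple(round(a_iso + t, 1) for t in t_aniso)


def gauss_to_mhz(value_in_gauss):
    """Convert a hyperfine coupling from gauss to megahertz."""
    return value_in_gauss * GAUSS_TO_MHZ


# Calculated C6H coupling in T(C5H): Aiso and (Txx, Tyy, Tzz) from Table 1.
a_iso = -15.9
t_aniso = (-11.2, -0.2, 11.4)

a_principal = principal_components(a_iso, t_aniso)
a_iso_mhz = gauss_to_mhz(a_iso)

print("Principal components (G):", a_principal)
print("Aiso (MHz):", round(a_iso_mhz, 2))
```

Because the anisotropic tensor is traceless (to within rounding), the mean of the three principal components recovers the isotropic coupling, which is a quick consistency check on tabulated data.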

2.1 Pyrimidine and purine radiation products: close agreement between experiment and theory

Typical theoretical results obtained for a wide variety of base radicals will be represented through a discussion of the results for five radicals (Table 1). The first data block in Table 1 represents the experimental and theoretical HFCCs obtained for the radical formed through net hydrogen atom addition to C5 in

Figure 2: Pseudo-axial (I) and pseudo-equatorial (II) T(C5H) conformations.


Table 1: Theoretical and experimental HFCCs (G) in radicals for which good agreement between the two data sets is observed.

                    Theory                         Experiment (a)
Radical   Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
T(C5H)    C6H      -15.9  -11.2   -0.2   11.4     -19.2  -11.2    1.0   10.2
          C5H       41.9   -1.5   -1.1    2.5      48.6   -1.7   -0.7    2.4
T(CH2)    N1H       -2.5   -1.8   -1.1    2.9
          N3H        0.1   -0.4    0.1    0.4      -1.0   -0.9   -0.8    1.7
          C6H      -11.4   -5.4   -0.7    6.1     -10.7   -4.8   -0.4    5.3
          C5-CH    -15.1   -8.9   -0.1    9.0     -16.4   -9.0    0.6    8.4
          C5-CH    -14.1   -8.1   -0.5    8.6     -15.7   -8.1    0.3    7.9
A(N6H)    N6H      -11.8   -9.7   -2.0   11.8     -11.5   -8.3   -1.2    9.4
          C8H       -4.0   -2.3   -0.3    2.6      -4.6   -2.4   -0.2    2.6
G(C5H)    C5H       49.5   -0.7   -0.5    1.2      54.0   -1.0   -0.5    1.7
G(N2H)    N2H       -7.6   -6.6   -1.2    7.7      -9.6   -6.9   -0.9    7.8
          C8H       -6.0   -3.4   -0.3    3.7      -4.9   -2.6   -0.2    2.9

(a) References for experimental data: T(C5H) and T(CH2) reference 17; A(N6H) reference 20; G(C5H) and G(N2H) reference 21.

thymine [T(C5H)], which displays interesting geometrical effects. This radical is distorted at C5 while the rest of the ring remains planar, leading to two possible orientations for the additional hydrogen atom: pseudo-axial and pseudo-equatorial (Figure 2). The radical with hydrogen in the pseudo-axial position, almost perpendicular to the molecular plane, is slightly lower in energy (0.7 kcal/mol). This agrees with the most stable conformation observed for 5,6-dihydrothymine [16]. A spin density of 0.79 at C6 leads to a large isotropic coupling for the out-of-plane C5H in the pseudo-axial position (41.9 G) and a smaller coupling for C6H (-15.9 G). These calculated couplings match well with the experimental predictions where a large coupling was assigned to a β-hydrogen orientated in a position perpendicular to the thymine base [17]. A second coupling was experimentally assigned to the hydrogen at the C6 position, and the spin density at C6 was predicted to be 0.75, in good agreement with the calculated value (0.79). The isotropic couplings in T(C5H) with the added hydrogen in the pseudo-equatorial position (C5H = 16.0 G; C6H = -16.4 G) confirm that the hydrogen at C5 in the observed radical is in a pseudo-axial position. This example illustrates the additional geometrical information that can be obtained from the calculations. The C5-methyl dehydrogenated thymine radical [T(CH2)] has been observed in almost every ESR study on thymine derivatives to date [15] and was calculated


to be the lowest energy radical formed by net hydrogen atom removal from thymine. Experimentally [17], this radical is characterized in thymine crystals by two methyl hydrogen isotropic HFCCs (-15.7 and -16.4 G) and a small C6H isotropic coupling (-10.7 G). The corresponding theoretical isotropic couplings are -14.1, -15.1 and -11.4 G, respectively. In addition, the anisotropic HFCCs agree closely for all three allylic protons. An additional weak coupling (Aiso = -1.0 G) was assigned to N3H and an estimated spin density of 0.04 was assigned to N3. The calculated spin densities indicate a smaller amount of spin on N3 (-0.01) and a greater amount on N1 (0.08). Although the calculated isotropic couplings are small in both cases, it can be suggested that the experimental coupling is due to the hydrogen at N1 (-2.5 G) rather than at N3 (0.1 G). The experimentally derived anisotropic HFCCs fall in-between those calculated for N1H and N3H, and thus assignment to either of these atoms is not facilitated through examination of the anisotropic HFCCs.
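The level of agreement described above for T(CH2) can be quantified with a short calculation. The sketch below, which assumes the pairing of theoretical and experimental values given in the text and uses illustrative labels and function names, computes the percentage deviation of each calculated isotropic coupling from its experimental counterpart.

```python
# Isotropic HFCCs (G) for T(CH2) as quoted in the text: (theory, experiment).
# The dictionary keys are illustrative labels, not notation from the chapter.
T_CH2_ISOTROPIC = {
    "C5-CH (first methyl H)": (-14.1, -15.7),
    "C5-CH (second methyl H)": (-15.1, -16.4),
    "C6H": (-11.4, -10.7),
}


def percent_deviation(theory, experiment):
    """Absolute deviation of theory from experiment, in percent of experiment."""
    return abs(theory - experiment) / abs(experiment) * 100.0


deviations = {
    atom: percent_deviation(theory, experiment)
    for atom, (theory, experiment) in T_CH2_ISOTROPIC.items()
}

for atom, deviation in deviations.items():
    print(f"{atom}: {deviation:.1f}%")
```

All three deviations fall below the roughly 20% error that the chapter regards as typical for computed isotropic couplings.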

The large anisotropic couplings (-11.7, 4.7, 6.9 G) assigned experimentally to the N6-dehydrogenated adenine radical [A(N6H)] in co-crystals of adenosine and 5-bromouracil (rA:5BrU) were speculated to arise due to hydrogen-bonding interactions in the crystal where the remaining N6 hydrogen is hydrogen bonded to oxygen in uracil [18]. The calculated geometry of the N6-dehydrogenated radical is planar with the remaining amino hydrogen also located in the molecular plane. The calculations indicate that the N6-dehydrogenated radical indeed possesses a large isotropic coupling (-11.8 G) with significant anisotropy (-9.7, -2.0, 11.8 G). Differences from the experimental anisotropic results isolated in rA:5BrU may arise due to hydrogen bonding in the crystal structures. Crystal effects, such as hydrogen bonding and crystal packing, must play an important role since even the experimentally determined anisotropic HFCCs obtained from rA:5BrU (-11.7, 4.7, 6.9 G) and either adenosine (rA) [19] (-9.1, 1.2, 7.9 G) or anhydrous deoxyadenosine (dA) [20] (-8.3, -1.2, 9.4 G) differ substantially. It is clear from the present calculations (Table 1) that the magnitude of the N6H coupling tensor is significant without hydrogen-bonding effects. Overall, it can be concluded that the calculated results support the experimental assignment of this radical due to the magnitude of the calculated N6H anisotropic HFCCs and C8H data. The radical formed through net hydrogen addition to C5 in guanine [G(C5H)] was identified in detailed work on 2'-deoxyguanosine 5'-monophosphate (5'dGMP) [21]. The experimental study indicated that C5H has a very large isotropic coupling (54.0 G) and a very small anisotropic coupling tensor (-1.0, -0.7, 1.7 G). The C5-hydrogenated radical was calculated to be in a "butterfly" conformation (Figure 3) where the pyrimidine and imidazole rings remain planar


Figure 3: "Butterfly" conformation of G(C5H).

but are tilted about the C4C5 double bond towards each other [22]. A higher energy conformer (not examined in the present work) involves the rings tilted to opposite sides of the C4C5 bond [22]. The experimental anisotropic coupling tensor is in good agreement with the calculated tensor (-0.7, -0.5, 1.2 G). The calculated isotropic C5H coupling (49.5 G) also supports the experimental assignment of the observed spectrum to G(C5H) and verifies that C5H is located perpendicular to the C4C5 bond in the "butterfly" conformation. The only radical identified in nonprotonated guanine crystals formed through net hydrogen atom abstraction is the N2-dehydrogenated radical [G(N2H)]. This radical has been observed in 5'dGMP [21,23] and guanosine 3'5'-cyclic monophosphate (3'5'cGMP) [24], and all experimental couplings are in excellent agreement with one another. The C8 and N2 spin density distributions in all samples were determined to be approximately 0.17 and 0.33 (calculated values: 0.19 and 0.37, respectively). The N3 spin density (0.31) was determined in the study of 3'5'cGMP crystals (calculated value: 0.35) [24]. The experimental couplings for the N2-dehydrogenated radical obtained in the various studies are remarkably similar. The C8H coupling tensor consists of an average isotropic component of -4.9 G and an average anisotropic component of (-2.5, -0.2, 2.7 G), which are only in fair agreement with the calculated values (Aiso = -6.0 G; Tii = -3.4, -0.3, 3.7 G). The remaining amino hydrogen was also observed in the experimental studies, where an isotropic HFCC, averaged between the three studies, of -9.6 G was obtained. The magnitude of this coupling is again larger than the N2H coupling obtained from DFT (-7.6 G). However, comparison of experimental (-6.9, -1.0, 7.8 G) and calculated (-6.6, -1.2, 7.7 G) anisotropic N2H coupling tensors supports the experimental assignment of the spectrum to

G(N2H). These five examples illustrate the agreement between theory and experiment that is considered to be more than sufficient for the calculations to verify the experimental radical assignment. It is important to stress once again that the anisotropic coupling can be calculated to a greater degree of accuracy and that some error is expected in the isotropic component. This trend is nicely


portrayed in Table 1. It is also interesting to note that for larger (adenine and guanine) radicals the agreement with experiment is not as good as observed for smaller (thymine) radicals. This can be attributed to the fact that it is more difficult to describe the larger systems theoretically. However, all examples discussed above show that the level of theory chosen is suitable to evaluate HFCCs in DNA radicals. In the next section, a selection of radicals will be discussed where the agreement between theory and experiment is initially poor until the differences between the two data sources (gas-phase versus crystalline environment with extensive hydrogen bonding and possible crystal packing effects) are taken into account.

2.2 Pyrimidine and purine radiation products: problematic cases

For most of the thymine radicals considered, excellent agreement between theory and experiment was observed as shown for T(C5H) and T(CH2) (Section 2.1). Thus, it was surprising that poor results were observed for the O4-hydrogenated product [T(O4H)]. However, comparison of the experimental and calculated HFCCs in T(O4H) indicates that good agreement between the two data sets is obtained for all of the HFCCs except for the isotropic O4H coupling (Table 2). The spin density in this molecule was concluded from experimental data to exist predominantly on C6 (0.50) and C4 (0.40), with a small amount on C5 (0.08). This is in good agreement with calculated results

Table 2: Theoretical and experimental HFCCs (G) in radicals which exhibit poor agreement between the two data sets.

                    Theory                         Experiment (a)
Radical   Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
T(O4H)    N3H       -3.4   -2.9   -1.0    3.9      -2.1   -2.5   -0.7    3.1
          O4H       -1.6   -1.7   -1.6    3.3      12.3   -2.6   -2.5    5.1
          C6H      -15.1   -8.5   -0.4    8.1     -14.2   -8.2    1.0    7.2
          C5-CH     -4.0   -0.7   -0.4    1.3      -2.6   -0.6   -0.4    1.1
A(N3H)    C2H      -12.9   -7.3    0.4    6.9     -10.6   -5.9    0.2    5.7
          N3H       15.2   -2.9   -1.5    4.4      -3.9   -3.1   -0.9    4.0
          C8H       -3.0   -1.9   -0.2    2.1      -4.4   -2.4    0.3    2.1
A(N3H+)   C2H        2.2   -9.4   -0.9   10.3     -14.2  -10.0    1.0    9.0
G(O6H+)   N1H                                      -3.2   -2.4   -0.5    2.9
          O6H       22.0   -1.5   -1.2    2.7
          N7H       -2.4   -1.5   -1.2    2.7      -2.8   -1.6   -0.9    2.4
          C8H      -11.0   -6.2    0.3    5.9      -8.1   -4.0   -0.6    4.5

(a) References for experimental data: T(O4H) reference 17; A(N3H) reference 18; A(N3H+) reference 26; G(O6H+) reference 30.


obtained from a Mulliken population analysis (0.56, 0.36 and -0.12 on C6, C4 and C5, respectively), indicating that an accurate description of the spin distribution in this radical is obtained with the level of theory implemented. The question remains as to why the isotropic O4H couplings do not correspond. Experimentally, the relatively large coupling (12.3 G) assigned to O4H was speculated to be due to an out-of-plane position for this atom. Semi-empirical calculations performed by Sagstuen et al. [17] support the initial predictions of an out-of-plane hydrogen configuration. B3LYP predicts O4H to be in the molecular plane (structure I, Figure 4), a configuration which results in a very small HFCC (-1.6 G). Effects of an out-of-plane position on the O4H HFCCs were investigated through single-point calculations performed by fixing the ring geometry, as this is expected not to change considerably, and by varying the HO4C4C5 dihedral angle (θ) in steps of ten degrees out of the molecular plane. These single-point calculations (Table 3, left columns) indicate that the isotropic O4H HFCC is very dependent on the dihedral angle and a maximum HFCC (≈ 22 G) is obtained at an angle of 90° out of the molecular plane. The rotational barrier is very small, approximately a 2 kcal/mol difference between the in-plane position and the position 90° out of the plane, and a 5 kcal/mol difference when the hydrogen is cis relative to the C4N3 bond. The rotation does not modify the spin distribution in the radical. A calculated O4H coupling close to the experimental value is observed at an angle of approximately 50° out of the molecular plane.

Figure 4: Conformations of T(O4H).

The difference between theory and experiment for the geometry of T(O4H) arises due to the rapid rotation of the methyl group in the experimental environment, which is characterized by the presence of three equivalent methyl group protons in the ENDOR spectra. Allowing for the rotation of the methyl group, the O4-hydrogen and the in-plane methyl hydrogen positions are only


Table 3: HFCCs (G) and relative energies (kcal/mol) obtained in T(O4H) through examination of methyl group rotation.

Dihedral    Methyl group optimized         Methyl group rotated
angle       Aiso(O4H)   Rel. energy        Aiso(O4H)   Rel. energy
  0           -1.6         0.0               -1.7         1.6
 20            0.7         0.2                0.6         1.3
 40            7.2         0.6                7.2         1.0
 60           15.1         1.2               15.6         0.6
 80           21.6         1.7               21.7         0.3
100           23.4         2.0               22.8         0.1
120           19.2         2.5               18.3         0.0
140           11.0         3.3               10.1         0.4
160            2.3         4.6                1.9         1.1
180           -1.7         5.2               -1.7         1.5

separated by 1.62 Å in the calculated geometry (structure II, Figure 4). The effects of this unfavorable interaction [17] and the unfavorable interaction with the N3-hydrogen are expected to result in an out-of-plane position for O4H in the crystals. This hypothesis is readily confirmed by additional single-point calculations performed by rotating the HO4C4C5 dihedral angle as before, but with the methyl group fixed in a staggered orientation with respect to the C5C6 double bond (Table 3, right columns). In this case, the lowest energy orientation for the O4H is at an angle of approximately 50-60° out of the molecular plane (θ = 120-130°), the same position that yields the experimentally determined O4H HFCC. The results obtained for the thymine O4-hydrogenated radical can be extended to 1-methylthymine and deoxythymidine since geometrical and electronic changes are expected to be small upon substitution at the N1 position. Comparison of calculated and experimental HFCCs indicates that the O4-hydrogen remains in the molecular plane and at an angle of approximately 60° out of the molecular plane in 1-methylthymine [25] and deoxythymidine [15,25] crystals, respectively. The differences in these systems relative to unsubstituted thymine arise due to the characteristic hydrogen bonding patterns in the crystals.
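The reasoning applied to the dihedral scan can be illustrated with a short script. The sketch below takes the values of Table 3, locates the lowest-energy dihedral angle for the rotated methyl group, and linearly interpolates the angles at which the calculated Aiso(O4H) passes through the experimental value of 12.3 G. The linear interpolation on a 20° grid and all function names are assumptions made for illustration; the original work does not describe such a fit.

```python
# Table 3 data: HO4C4C5 dihedral angle (degrees), Aiso(O4H) in gauss and
# relative energy in kcal/mol for the optimized and the rotated methyl group.
ANGLES = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
A_ISO_OPT = [-1.6, 0.7, 7.2, 15.1, 21.6, 23.4, 19.2, 11.0, 2.3, -1.7]
E_REL_OPT = [0.0, 0.2, 0.6, 1.2, 1.7, 2.0, 2.5, 3.3, 4.6, 5.2]
A_ISO_ROT = [-1.7, 0.6, 7.2, 15.6, 21.7, 22.8, 18.3, 10.1, 1.9, -1.7]
E_REL_ROT = [1.6, 1.3, 1.0, 0.6, 0.3, 0.1, 0.0, 0.4, 1.1, 1.5]

EXPERIMENTAL_A_ISO = 12.3  # experimental O4H isotropic coupling (G)


def crossing_angles(angles, values, target):
    """Linearly interpolate the angles at which the scan passes through target."""
    points = list(zip(angles, values))
    found = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # A sign change of (value - target) means the target lies in this interval.
        if (y0 - target) * (y1 - target) < 0:
            found.append(x0 + (x1 - x0) * (target - y0) / (y1 - y0))
    return found


def lowest_energy_angle(angles, energies):
    """Return the scanned angle with the lowest relative energy."""
    return angles[energies.index(min(energies))]


best_rotated = lowest_energy_angle(ANGLES, E_REL_ROT)
best_optimized = lowest_energy_angle(ANGLES, E_REL_OPT)
matches_rotated = crossing_angles(ANGLES, A_ISO_ROT, EXPERIMENTAL_A_ISO)

print("Lowest-energy angle, methyl optimized:", best_optimized)
print("Lowest-energy angle, methyl rotated:", best_rotated)
print("Angles reproducing 12.3 G (methyl rotated):",
      [round(a, 1) for a in matches_rotated])
```

With the methyl group rotated, the energy minimum moves from the in-plane position to θ = 120°, and the coupling passes through the experimental value near θ ≈ 135° on this coarse grid, consistent with the out-of-plane O4H position discussed above.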

Figure 5: Calculated distortion in the gas-phase A(N3H) radical.


The radical formed by net hydrogen addition to N3 in adenine [A(N3H)] undergoes geometrical alterations upon formation. The N3 hydrogen is located out of the molecular plane and the amino group is puckered with both hydrogens displaced out of the plane (Figure 5). The experimental C2H and C8H isotropic HFCCs, as well as the anisotropic tensors, are in good agreement with the calculated results for this radical (Table 2). However, a small N3H coupling has been experimentally observed for this radical (-3.9 G) while a large HFCC (15.2 G) was calculated due to distortions at N3. It is possible that hydrogen bonding or packing effects in the crystal force the N3 hydrogen to remain in the molecular plane, thus leading to a small isotropic HFCC and explaining why the N3H coupling is not observed in all experimental studies. This hypothesis can be tested through examination of a fully optimized Cs structure, which lies only 1.7 kcal/mol above the non-planar arrangement and possesses two imaginary frequencies. The spin distribution and the C2H and C8H HFCCs (Table 4) in the planar radical are very similar to those calculated for its puckered form (Table 2). The main difference in the computed couplings is in the magnitude of the N3H isotropic HFCC. In the Cs N3-hydrogenated radical, the N3H isotropic component was calculated to be -3.6 G, which is in much better agreement with experiment (-3.9 G) than that calculated for the puckered form (15.2 G). Hence, it can be concluded that in crystals where the N3H coupling was detected, A(N3H) is likely to remain in a planar form. The N1-protonated form of A(N3H) [A(N3H+)] has been observed in crystals of adenine hydrochloride hemihydrate (A:HCl:½H2O) [26]. The HFCCs in this protonated radical follow a similar pattern to those discussed for A(N3H). The gas-phase geometry of A(N3H+) is distorted at C2, which leads to N3H HFCCs in poor agreement with experiment (Table 2). However, upon consideration of a planar structure, the HFCCs are in good agreement with the experimental results (Table 4). These examples illustrate how hydrogen bonding and/or crystal packing can affect the radical geometry and therefore, indirectly, the HFCCs.

Table 4: Comparison between experimental HFCCs (G) and those calculated for planar radicals.

                    Theory                         Experiment (a)
Radical   Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
A(N3H)    C2H      -14.0   -7.9    0.2    7.7     -10.6   -5.9    0.2    5.7
          N3H       -3.6   -3.3   -1.1    4.5      -3.9   -3.1   -0.9    4.0
          C8H       -5.1   -2.9    0.2    2.7      -4.4   -2.4    0.3    2.1
A(N3H+)   C2H      -18.5  -10.7    0.4   10.4     -14.2  -10.0    1.0    9.0
G(O6H+)   N1H       -2.6   -2.2   -0.9    3.1      -3.2   -2.4   -0.5    2.9
          O6H       -1.5   -1.5   -1.5    2.9
          N7H       -3.2   -2.3   -1.3    3.6      -2.8   -1.6   -0.9    2.4
          C8H      -12.8   -7.2    0.4    6.8      -8.1   -4.0   -0.6    4.5

(a) References for experimental data: A(N3H) reference 18; A(N3H+) reference 26; G(O6H+) reference 30.

The N7-protonated O6-hydrogenated guanine radical [G(O6H+)] has been observed in studies on crystals of guanine hydrochloride monohydrate (G:HCl:H2O) [27], the free acid of guanosine 5'-monophosphate (5'GMP(FA)) [28], guanine hydrochloride dihydrate (G:HCl:2H2O) [29] and guanine hydrobromide monohydrate (G:HBr:H2O) [30]. The geometry was calculated to exhibit distortions at C6 (Figure 6), where O6H is located out of the molecular plane and results in a large isotropic O6H coupling which was not recorded experimentally. The calculated coupling for the hydrogen at C8 is also large, while the corresponding experimental coupling is small (Table 2). Not even the anisotropic couplings for this radical are in agreement. Thus, it seems unlikely that the N7-protonated O6 hydrogen addition radical is responsible for the spectra observed in these studies. Since hydrogen-bonding interactions or crystal packing effects may result in a planar geometry, as discussed for A(N3H) and A(N3H+), a Cs radical was obtained through a full optimization, which possesses one imaginary frequency and lies 1.7 kcal/mol higher in energy than the nonplanar radical. Calculations on the planar species (Table 4) yield a small O6H coupling which is expected experimentally and, thus, the agreement between the calculated couplings and experiment could be considered to be improved over that observed for the nonplanar radical. Additionally, a N1H coupling was calculated in the planar radical that was not obtained for the nonplanar form.
However, the experimental and calculated couplings disagree in the magnitude of the C8H coupling, where the HFCCs obtained from the calculations are too large relative to those obtained in the experimental study. The possibility that the observed radical is not protonated can be eliminated. In particular, the C8H HFCC for the planar O6-hydrogenated radical (Aiso = -3.9 G;

Figure 6: Calculated distortion in the gas-phase G(O6H+) radical.


Tii = -2.3, -0.1, 2.3 G) is different from that assigned to the N7-protonated O6-hydrogenated radical. Furthermore, clear couplings were observed experimentally for N7H. To ensure that differences in the calculated and experimental C8H couplings for the N7-protonated O6-hydrogenated radical do not arise due to differences in the hydrogen bonding environment at N7, a series of calculations were performed where the N7H bond was lengthened [3]. The N1H, O6H and C8H couplings did not change over the N7H bond lengths investigated (0.908 - 1.308 Å). In contrast, the N7H anisotropic couplings show a decrease in magnitude with an increase in bond length. Despite the great difference between the C8H couplings in the planar protonated and nonprotonated radicals, neither of these couplings matches those assigned to the protonated O6-hydrogenated radical. However, the average of these couplings (Aiso = -8.4 G; Tii = -4.8, 0.3, 4.6 G) is in good agreement with the experimental results (Aiso = -8.1 G; Tii = -4.0, -0.6, 4.5 G). Moreover, the average calculated N1H coupling (Aiso = -2.8 G; Tii = -2.4, -1.1, 3.5 G) is also in agreement with experiment (Aiso = -3.2 G; Tii = -2.4, -0.5, 2.9 G). Any discrepancies between experimental and calculated N7H couplings can also be explained in terms of differences in the N7H bond length. The experimental N7H HFCCs are in better agreement with the calculations performed at longer bond lengths than those performed at the optimized geometry [3]. Thus, a possible explanation for the observed spectra is either a recorded averaging through temperature effects, which cause N7H to vibrate or rotate, or an extreme example of the effects of hydrogen bonding on the HFCCs. In either case, this example illustrates how hydrogen bonding in the crystals can directly affect HFCCs. Additionally, this discussion demonstrates how calculations can be used to view experimental spectra in a new light.

Another interesting problem arising in the calculations under discussion is the inaccurate description of the puckering at a carbon center upon hydrogen atom addition. Table 5 compares theoretical and experimental isotropic HFCCs for the hydrogens located at the carbon to which the additional hydrogen has been added in all radicals generated through net hydrogen atom addition to a double bond in the four DNA bases. Most of the calculated couplings for the two hydrogens are nearly identical. Indistinguishable couplings arise since the radicals have been calculated to be planar with the two hydrogens under discussion lying on either side of the molecular plane. Accurate experimental results obtained with ENDOR, however, recover unique couplings for each hydrogen, which presumably indicates differences in the atomic environment due to puckering at the addition site. Only the results for T(C5H) are in good agreement with experiment (recall that it is difficult to reduce the error in computed isotropic HFCCs to less than about 20%). This illustrates that when a


Table 5: Comparison of theoretical and experimental isotropic HFCCs (G) in radicals formed by net hydrogen atom addition to a double bond in the DNA bases. (a)

Radical     Experiment (b)     Theory
T(C5H)      48.6               41.9
T(C6H)      45.3               33.9
            32.0               33.9
C(C5H)      47.1               44.6
            31.0               14.0
C(C6H)      51.3               45.1
            47.7               42.0
G(C8H)      36.9               37.2
            39.3               37.2
A(C2H)      32.8/38.9          43.3
            54.3/47.5          45.5
A(C8H)      36.3/36.7/38.4     38.9
            41.6/40.9/41.0     39.1
A(C2H+)     39.1               36.2
            40.5               36.7
A(C8H+)     40.9               40.5
            43.0               40.6
G(C8H+)     33.1/35.3          29.1
            36.5/38.2          29.1

(a) All data presented are for the hydrogens at the addition site.
(b) A full list of references to the experimental data can be found in references 1-4.

bulky methyl group is attached to the hydrogen addition site, the puckering is much easier to describe theoretically (the geometry of this radical is displayed in Figure 2). Another radical for which adequate calculated HFCCs were obtained is C(C61-I). Although the theoretical results for this radical are smaller than the experimental values, the difference between the two hydrogen couplings (3 G) is identical in both data sets. For all other radicals in Table 5, there exists a significant disagreement with experiment, which is mainly due to inadequacies in describing ring puckering. The fact that B3LYP predicts relatively planar geometries compared to other theoretical methods has been documented in the literature for T(C5H) and T(C6H) [31]. Although the main shortfall for the calculated geometries of the radicals in Table 5 is the predicted planar geometry versus the apparently puckered structures experimentally, the puckering in C(C5H) was overestimated. The difference between the experimental C5H couplings is 16 G whereas the difference between the calculated couplings is much larger (30 G). If a planar radical is assumed, then equivalent couplings are obtained theoretically (35.4 and 35.3 G). Thus, it can be concluded that the geometrical distortion in the crystalline environment must lead to a nonplanar radical with less puckering than initially calculated. For all radicals formed through net hydrogen atom addition to a double bond in one of the DNA bases (Table 5), the calculated anisotropic

424

couplings for the hydrogens at the addition site and/or the couplings for other hydrogens confirm the experimental assignment of each radical. Thus, from these examples it is evident that even though the experimental HFCCs were extractedat low temperatures, often gas phase calculations are not capable of reproducing the experimental results. Thus, alternate effects must be taken into account. The most common arguments implemented to understand why theoretical and experimental HFCCs differ include molecular vibration (the rotation of the methyl group in T(O4H)) and the hydrogen bonding scheme and packing effects in the crystal, which can either induce geometrical effects (planar radicals versus gas phase puckered geometries as considered for A(N3H) and A(N3H+)) or affect the HFCCs more directly (through hydrogen bonding to neighboring sites as discussed for G(O6H+)).
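The puckering argument made for Table 5 rests on the splitting between the two hydrogen couplings at the addition site. A minimal Python sketch of that arithmetic is given below for three of the radicals discussed. The values are those read from Table 5, while the dictionary layout and function names are illustrative assumptions.

```python
# Isotropic couplings (G) of the two hydrogens at the addition site,
# as read from Table 5: radical -> (experiment, theory).
table5 = {
    "T(C6H)": ((45.3, 32.0), (33.9, 33.9)),
    "C(C5H)": ((47.1, 31.0), (44.6, 14.0)),
    "C(C6H)": ((51.3, 47.7), (45.1, 42.0)),
}


def splitting(pair):
    """Absolute difference between the two hydrogen couplings (G)."""
    return abs(pair[0] - pair[1])


def compare_splittings(data):
    """Return {radical: (experimental splitting, calculated splitting)}."""
    return {
        name: (round(splitting(exp), 1), round(splitting(calc), 1))
        for name, (exp, calc) in data.items()
    }


if __name__ == "__main__":
    for name, (d_exp, d_calc) in compare_splittings(table5).items():
        print(f"{name}: experiment {d_exp:.1f} G, theory {d_calc:.1f} G")
```

A calculated splitting of zero corresponds to a planar calculated structure (as for T(C6H)), whereas a calculated splitting larger than the experimental one corresponds to overestimated puckering (as for C(C5H)).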

2.3 New mechanism for radiation damage in cytosine monohydrate

The radicals discussed in Sections 2.1 and 2.2 display good agreement between theory and experiment, either initially or after alternate arguments were employed to understand or verify the experimental results in relation to the calculated HFCCs. The results clearly indicate that the level of theory chosen to calculate the HFCCs is adequate and can be applied to a wide range of DNA radicals (both protonated and nonprotonated). In the present section, the radicals generated in cytosine derivatives will be discussed. The most complete experimental study has been performed on cytosine monohydrate (Cm) crystals by Sagstuen et al. [32]. The suggested mechanism for radical formation in cytosine monohydrate involves net hydrogen atom removal from the N1 position of one cytosine [C(N1)] and hydrogen atom addition to the N3 position of a neighboring cytosine [C(N3H)]. The experimental and theoretical HFCCs in the two radicals believed to be the main products of radiation damage to Cm crystals will now be presented in detail.

Table 6: Comparison of theoretical and experimental HFCCs (G) for the first major radical product assigned experimentally in irradiated cytosine monohydrate crystals [C(N3H)].

        Theory: Nonplanar            Theory: Planar               Experiment(a)
Atom    Aiso    Txx   Tyy   Tzz      Aiso    Txx   Tyy   Tzz      Aiso    Txx   Tyy   Tzz
N3H      0.6   -2.5  -1.0   3.5      -2.9   -2.6  -0.9   3.5      -2.0   -2.1  -0.5   2.6
N4H     19.6   -1.0  -1.0   2.0      -2.7   -2.3  -0.7   3.0      -1.6   -1.7  -0.8   2.4
N4H     -1.1   -1.5  -0.5   2.0      -2.4   -2.0  -1.0   3.0
C6H    -13.7   -8.3   0.2   8.2     -14.8   -8.9   0.5   8.5     -13.5   -8.8   0.8   8.0

(a) Reference 32.


C(N3H) is the lowest energy radical formed by net hydrogen atom addition to cytosine. The calculated C6H and one of the N4H HFCCs in this radical are in very good agreement with those obtained experimentally in cytosine monohydrate (Table 6). In contrast, the N3H coupling was calculated to be smaller than that determined experimentally, while a large coupling (19.6 G) was obtained from the calculations for the second amino hydrogen. Differences between the experimental and calculated couplings of C(N3H) could arise from a rotation about the C4N4 bond in the optimized gas-phase geometry relative to that present experimentally, where hydrogen-bonding effects may be important, as discussed for radicals generated in the other nucleobases (such as A(N3H) and G(O6H+)). More specifically, due to crystal interactions a planar radical may predominate over one with a distorted amino group. This is confirmed through the optimization of a radical constrained to Cs symmetry, which is only 3.6 kcal/mol higher in energy than the nonplanar form. The two small isotropic N4H, the anisotropic C6H and the isotropic N3H couplings obtained for the planar radical are in much better agreement with experiment than those discussed for the nonplanar form. Thus, the calculations confirm the experimental assignment of one of the major radical products in Cm crystals once hydrogen bonding or crystal packing effects are taken into account.

C(N1) is the lowest lying radical formed through net hydrogen atom removal. The calculated spin density displays an alternating pattern with the main components situated on C5 (0.49), O2 (0.35) and N1 (0.29). This distribution is quite different from that obtained experimentally (0.57 and 0.17 at C5 and N4, respectively). In addition, the calculated and experimental HFCCs deviate substantially (Table 7); even the calculated anisotropic couplings for the amino hydrogens are extremely small compared to the experimental values.
Since it is known that the anisotropic component can be calculated with a great degree of accuracy using many theoretical techniques, the deviations observed for this radical are too large to be ascribed to the method employed. One possible explanation for the deviations from the experimental couplings is that a rotation occurs about the C4N4 bond in the experimental environment, which could lead to significant N4H couplings compared to those calculated for the nearly planar structure. Variation in the HFCCs with rotation about the C4N4 bond was examined, but the agreement between the experimental and theoretical HFCCs was not improved [2]. Calculations of the couplings of the N1-dehydrogenated radical surrounded by up to four water molecules or additional neighboring cytosine fragments, to simulate the experimental hydrogen-bonding scheme, also could not reproduce the experimental couplings [2]. Even a cytosine dimer was studied to model the N1-dehydrogenated, N3-hydrogenated diradical pair. None of these investigations led to a clear theoretical description of the experimental results [2].

Table 7: Comparison of theoretical and experimental HFCCs (G) for the second major radical product assigned experimentally in irradiated cytosine monohydrate crystals [C(N1)].

        Theory                       Experiment(a)
Atom    Aiso    Txx   Tyy   Tzz      Aiso    Txx   Tyy   Tzz
N4H     -0.7   -0.5  -0.4   0.9      -5.1   -3.3  -0.6   4.0
N4H     -0.5   -0.7  -0.4   1.1      -4.6   -2.2  -1.3   3.5
C5H    -11.2   -6.9  -0.4   7.2     -14.8   -7.5  -0.3   7.8

(a) Reference 32.

Thus, since good results were observed for so many other related DNA radicals, alternate radicals must be considered as possible origins of the observed HFCCs. Among all cytosine radicals considered, the only radical which gave couplings similar to those assigned experimentally to C(N1) is the radical formed via net hydroxyl radical addition to C5. Two conformers were optimized for this radical [C(C5OH-1) and C(C5OH-2)] and the couplings vary slightly between them (Table 8). Among the entire set of computed couplings for any cytosine radical, the N1H couplings obtained in each conformation of the C5-hydroxylated product (Table 8) are in best agreement with the experimental couplings assigned to the amino hydrogens in C(N1) (Table 7). One large, negative isotropic coupling, obtained for C6H in these radicals, is not unlike that assigned to C5H in C(N1), although the anisotropic results deviate more substantially. In addition, a C6H coupling left unassigned to a specific radical in cytosine monohydrate (Aiso = -18.2 G; Tii = -9.6, 0.9, 8.6 G) resembles those calculated for C6H in the C5-hydroxylated radicals.
The large isotropic coupling (33.0 or 37.4 G) calculated for C5H in the C(C5OH) radicals could be used as a fingerprint for the identification of this radical in future studies. Alternatively, this coupling may have gone undetected in the experiments due to its similarity to the coupling assigned to the C5-hydrogenated radical.

Table 8: Theoretical HFCCs (G) calculated for the newly assigned major radical product generated in irradiated cytosine monohydrate crystals [C(C5OH)].

        C(C5OH-1)                    C(C5OH-2)
Atom    Aiso    Txx   Tyy   Tzz      Aiso    Txx   Tyy   Tzz
N1H     -4.2   -3.2  -1.7   4.9      -3.8   -3.5  -1.7   5.3
C5H     33.0   -1.6  -0.5   2.1      37.4   -1.5  -0.8   2.3
C6H    -10.6   -9.6  -0.3   9.9     -13.3  -10.2  -0.3  10.6
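The resemblance noted above between the unassigned experimental C6H coupling and the C6H couplings calculated for the C5-hydroxylated radicals can be expressed as simple deviations. The Python sketch below uses the values quoted in the text and in Table 8. The choice of metric (absolute isotropic deviation and largest anisotropic component deviation) is an assumption made for illustration only.

```python
# Unassigned experimental C6H coupling in cytosine monohydrate (G):
# (Aiso, (Txx, Tyy, Tzz)), as quoted in the text.
experiment = (-18.2, (-9.6, 0.9, 8.6))

# Calculated C6H couplings (G) for the two C5-hydroxylated conformers (Table 8).
calculated = {
    "C(C5OH-1)": (-10.6, (-9.6, -0.3, 9.9)),
    "C(C5OH-2)": (-13.3, (-10.2, -0.3, 10.6)),
}


def deviations(calc, exp):
    """Return (isotropic deviation, largest anisotropic component deviation) in G."""
    iso_dev = abs(calc[0] - exp[0])
    aniso_dev = max(abs(c - e) for c, e in zip(calc[1], exp[1]))
    return round(iso_dev, 1), round(aniso_dev, 1)


if __name__ == "__main__":
    for name, coupling in calculated.items():
        iso_dev, aniso_dev = deviations(coupling, experiment)
        print(f"{name}: isotropic {iso_dev} G, anisotropic (max) {aniso_dev} G")
```

With this metric the anisotropic components differ by at most about 2 G, while the isotropic values differ by several gauss, in line with the earlier remark that isotropic HFCCs are the harder quantity to compute.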


Thus, although theory does not unequivocally favor one mechanism or the other, comparison of experimental and theoretical HFCCs suggests that the experimentally proposed mechanism is less likely. Furthermore, at least two different mechanisms can be considered which yield the N3-hydrogenated and C5-hydroxylated products, and both involve water molecules. In the first postulated mechanism, ionization and electron uptake are initially assumed to occur on cytosine to form C•+ and C•-, and water subsequently adds to the former. The second postulated mechanism involves ionization of a water molecule followed by electron uptake at cytosine, resulting in a water cation and a cytosine anion, where the former dissociates to hydroxyl radicals and protons. Both of these reactions have a net energy cost of 58 kcal/mol, but the second postulated mechanism has a greater energy cost for the first step in the reaction. Ionization of cytosine, which forms C•+ and C•-, followed by deprotonation of the cation and protonation of the anion (as suggested in the experimental study of Cm crystals) costs 68 kcal/mol. Of the mechanisms discussed, the path involving cytosine ionization and water addition is the most likely to occur. One reason is that approximately 85% of all ionization processes will occur on cytosine, since it possesses a greater number of electrons than water. In addition, this reaction has lower energy costs for the initial step (relative to the mechanism involving ionization of water) and for the overall process (relative to the proposed mechanism involving hydrogen addition and abstraction products). However, radiolysis of water to produce hydroxyl radicals and hydroxyl radical adducts is a commonly used technique in ESR studies [33,34]. In addition, Sevilla and coworkers have investigated the presence of hydroxyl radicals in the DNA hydration layer [34]. 
Hydroxyl radicals were found in the intermediate hydration shell, but not in the closest hydration layer. This was speculated to occur due to reactions of the hydroxyl radicals with DNA. The present work indicates that this option should be examined more closely. In addition, Wala et al. [35] have reported that strand-breaks in DNA occur due to hydroxyl radical addition to the DNA bases. Reactions of DNA and hydroxyl radicals have also been reported to lead to 5-hydroxycytosine [36].

Experimental investigations of adenine and guanine monohydrate crystals indicate that products formed through net hydroxyl radical addition may also be formed in these crystals. For example, early ESR studies on frozen aqueous solutions of deoxyadenosine 5'-monophosphate [37] revealed one isotropic coupling (29 G), which was believed to be due to a radical formed through addition of a hydroxyl radical to C8 in adenine [A(C8OH)]. The calculated results for the C8-hydroxylated radical (28.8 G) indicate that this coupling is indeed due to the C8H in A(C8OH). Furthermore, the calculations show that a better resolved spectrum would yield experimental couplings for C2H, N9H and both of the amino hydrogens. The spectrum of the C4-hydroxylated guanine radical [G(C4OH)] was recorded in crystals of 3'5'cGMP [23]. The observed radical was determined to possess a C8 spin density of approximately 0.25 (calculated value: 0.26). The only coupling extracted from the experiments was for C8H, whose principal tensor is (-10.1, -6.9, -3.1 G). These couplings agree reasonably well with the values obtained from the calculated Aiso and Tii for the proposed radical (-12.3, -8.5, -2.7 G). If the individual components of the coupling tensor are considered, however, then only fair agreement with experiment is obtained. The protonated radical formed by net hydroxyl radical addition to C8 in guanine [G(C8OH+)] has been observed in single crystals of G:HCl:2H2O [29]. The observed spectrum consists of a large C8H isotropic coupling (20.2 G) and a very small anisotropic tensor (-1.1, -0.6, 1.6 G), which is in excellent agreement with the calculated values (Aiso = 17.5 G; Tii = -0.9, -0.5, 1.4 G). Experimentally, another isotropic coupling was observed for N7H (-8.5 G), which also possesses great anisotropy (-6.5, -1.5, 8.0 G). The calculated N7H couplings (Aiso = -7.0 G; Tii = -5.9, -1.7, 7.6 G) are also in agreement with experiment. The agreement observed for the N7H couplings is impressive, since the local environment (hydrogen bonding) has been shown to affect the couplings of the hydrogen at N7 in other radicals (for example, G(O6H+), Section 2.2). A small N9H isotropic coupling was also obtained experimentally (-2.2 G) and theoretically (-2.1 G). Experimentally, it was speculated that the observed spectrum could be due to G(C8H+), where the additional hydrogen is added to an in-plane position and, thus, only one large C8H coupling is observed. 
This alternative seems very unlikely given the excellent agreement between the experimental and calculated HFCCs for G(C8OH+). The examples of hydroxyl radical addition products identified in single crystals of adenine and guanine base derivatives confirm that water can play an important role in the radiation damage mechanism. It should be noted that the new mechanism has attracted some criticism [38]. The main objection is that nonplanar geometries were calculated for the gas phase radicals, whereas planar structures are expected experimentally due to the hydrogen-bonding scheme [38]. As shown here, accounting for crystal interactions can lead to improved results in most cases (for example, C(N3H)), but not for C(N1). Thus, the newly proposed mechanism should not be discarded based solely on these arguments [39].
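The estimate quoted earlier in this section, that roughly 85% of ionization events occur on cytosine, can be reproduced with a short electron count. The Python sketch below assumes that the ionization probability scales with the number of electrons and that the crystal contains one water molecule per cytosine. Both are assumptions made here to illustrate the arithmetic, and the names are hypothetical.

```python
# Atomic numbers give the electron count of each neutral atom.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}

CYTOSINE = {"C": 4, "H": 5, "N": 3, "O": 1}  # C4H5N3O
WATER = {"H": 2, "O": 1}                     # H2O

# Net energy costs (kcal/mol) quoted in the text for the three mechanisms.
NET_COST = {
    "cytosine ionization followed by water addition": 58,
    "water ionization": 58,
    "deprotonation/protonation (experimental proposal)": 68,
}


def electron_count(composition):
    """Number of electrons in a neutral molecule given as {element: atom count}."""
    return sum(ATOMIC_NUMBER[element] * n for element, n in composition.items())


def ionization_share(target, partner):
    """Fraction of ionization events on `target`, assuming the probability
    scales with electron count and a 1:1 target:partner ratio (assumption)."""
    n_target = electron_count(target)
    return n_target / (n_target + electron_count(partner))


if __name__ == "__main__":
    print(f"cytosine share: {ionization_share(CYTOSINE, WATER):.1%}")
    print("lowest net cost:", min(NET_COST.values()), "kcal/mol")
```

Under these assumptions cytosine carries 58 of the 68 electrons in a cytosine-water pair, which is consistent with the figure of approximately 85% given in the text.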


Assigning the N3-hydrogenated and C5-hydroxylated radicals as the major radiation products in cytosine monohydrate crystals would also explain the absence of the Cm couplings assigned to the N1-dehydrogenated radical in the larger cytosine systems. Previously it was assumed that these couplings were not observed because a methyl or sugar group replaces the hydrogen at N1, preventing the N1-dehydrogenated radical from forming. A new explanation uses the fact that water was not present in previous crystal studies and, thus, the C5-hydroxylated product was not possible. Monohydrate crystals of deoxycytidine 5'-monophosphate (5'dCMP) [40] were studied, however, and the similarity of the couplings observed in these crystals (assigned to the cation) to those experimentally assigned to the N1-dehydrogenated radical in Cm should be noted. All of the evidence presented above, in addition to the good agreement observed for many other related DNA radicals, creates a clear picture that water plays an important role in monohydrate crystals and therefore will most likely play an important role in the radiation damage mechanism in biological systems.

2.4 Sugar radicals in irradiated DNA

As mentioned, sugar radicals have been investigated at the same level of theory as that discussed for the DNA bases. Sugar radicals can be formed through direct mechanisms, in which alkoxyl or base radicals are generated and radical character is transferred to the sugar group, and through indirect mechanisms, in which hydrogen or hydroxyl radicals generated from water radiolysis attack the sugar group. In an important study, Schuchmann and von Sonntag [41] concluded that hydroxyl radicals attack the six carbon atoms in D-glucose to an equal extent. However, ESR techniques have been unable to detect sugar radicals in irradiated DNA [42]. Hole et al. [21] were the first to observe a large variety of sugar radicals in their ENDOR study of 2'-deoxyguanosine 5'-monophosphate, where nine sugar radicals were characterized. This provides a good example of the power of the ENDOR technique, since ESR did not easily detect these radicals. A subsequent ENDOR study of single crystals of deoxyadenosine [43] supported the hypothesis that many sugar radicals are generated upon irradiation. Theoretical investigations of carbon-centered sugar radicals have appeared in the literature [44,45]. In these studies, geometries, relative energies, spin density distributions and hyperfine coupling constants were calculated at the Hartree-Fock level. Both studies were very complete and carefully performed at the level of theory chosen. However, Hartree-Fock overestimates the hyperfine coupling constants considerably, and methods accounting for electron correlation are essential to calculate this property accurately.


The model sugar group chosen (Figure 7) represents the phosphate groups with hydroxyl groups and the DNA base with an amino group. The sugar radicals investigated include hydrogen abstraction radicals formed by removal of hydrogen from all carbon and oxygen atoms, radicals formed via removal of either of the hydroxyl groups in the model system, as well as a variety of radicals which lead to significant sugar ring alterations. Some of the results obtained for these systems will be presented in the present section. Two different puckering modes were examined for each possible radical, corresponding to north (N) and south (S) radicals, which are defined according to where the radical is located on the pseudorotation cycle [46]. The nonradical sugars present in A and B-DNA are in north and south conformations, respectively. B3LYP predicts the C4'• and C2'• radicals to be the lowest and highest energy radicals, respectively, among those formed by hydrogen abstraction from a carbon. This information is useful to determine which sugar radicals are most easily generated and, thus, which radicals are involved in strand-breaks, as mechanisms have been proposed involving almost all carbon centers. The C2' radicals were determined to be relatively flat at the radical center since oxygen is not present at a neighboring site, which removes unfavorable interactions with lone pairs. This lack of stabilization helps to explain why these were calculated to be the highest energy radicals in this class and why C2' radicals have only appeared as minor products in experimental studies [21]. For all hydrogen and hydroxyl abstraction radicals, the major geometrical alterations that occur upon radical formation affect only the bonds and angles involving the radical center. The bonds between the radical center and surrounding atoms are generally contracted by between 0.04 and 0.07 Å. The bond angle with the radical center as the central atom changes by between 2 and 8°. The remainder of the sugar ring geometry in all radicals is relatively unaffected.

Figure 7: Structure and numbering of the sugar group present in DNA (I) and the model system used for the calculations presented within (II).

The couplings present in the spectra of a number of irradiated DNA molecules have been assigned to the radical formed via net hydrogen atom removal from C1' (C1'•) [21,26,43,47]. Hole et al. [21] determined that the π-spin density at C1' in this radical is 0.64, which is smaller than the calculated value (0.75). Comparison of experimental and theoretical HFCCs indicates that the calculations support the experimental assignment of the C1' radical (Table 9). In particular, the experimental results agree more closely with those calculated for the N-type radical. One of the C2'H couplings calculated for C1'•(S) is significantly smaller (9.1 G) than the experimental result (approximately 18 G). This is a good example of the effects of the sugar ring puckering on the HFCCs. It should be noted that although the C2'H isotropic components differ between the N and S-type radicals, the anisotropic values are almost identical. Another example of the predictive power of theoretical calculations can be found in C3'•. From a comparison of the two calculated C2'H couplings in the N and S-type conformers with the experimental results obtained from 2'-deoxyguanosine 5'-monophosphate (Table 9) [21], the nature of the observed radical is difficult to predict. However, the calculated HFCCs for the N and S-type C3' radicals differ through the absence of a C4'H coupling in the latter conformation. Since a large C4'H coupling was recorded experimentally, the calculations predict this radical to be present in the north conformation. The calculated values indicate that O3'H has a small isotropic coupling and a relatively large anisotropic contribution, which were not reported in the experimental study. However, experimentally there was another coupling observed for which only the principal components (16, -22, 29 G) were resolved and assignment to a particular atom was not made. The unassigned couplings are not unlike those of a C2' hydrogen and could possibly be due to a C2'H in a ring with another conformation. Any difference between the experimental and the calculated isotropic hyperfine coupling constants in this radical could be due to the presence of a phosphate group at the C5' position in the experimental study, since it has been previously determined that the phosphate groups affect the HFCCs in the C3' radical [45].

Table 9: Comparison of theoretical and experimental HFCCs (G) for select sugar radicals formed via net hydrogen atom removal from one of the ring carbons.

                  Theory: North               Theory: South               Experiment(a)
Radical  Atom     Aiso    Txx   Tyy   Tzz     Aiso    Txx   Tyy   Tzz     Aiso    Txx   Tyy   Tzz
C1'•     C2'H     18.5   -1.4  -1.0   2.4      9.1   -1.9  -1.4   3.3     17.2   -1.9  -1.7   3.6
         C2'H     22.7   -1.9  -1.6   3.4     29.3   -1.4  -1.1   2.5     25.4   -1.7  -0.7   2.3
C3'•     C2'H     18.9   -1.7  -1.5   3.2     12.4   -1.6  -1.1   2.7     16.7   -1.6  -0.9   2.5
         C2'H     34.0   -1.5  -1.1   2.6     31.2   -1.9  -1.5   3.5     38.1   -2.1  -1.4   3.4
         O3'H     -2.8   -4.3  -3.1   7.4     -2.3   -4.5  -3.3   7.7
         C4'H     22.5   -1.6  -1.3   2.8                                 27.5   -1.7  -0.9   2.7
C4'•     C5'H     27.9   -1.8  -1.1   2.8
         C5'H      2.8   -2.1  -1.4   3.5      6.2   -2.4  -1.1   3.5
         C3'H     22.1   -1.8  -1.1   2.8     31.4   -1.9  -1.2   3.2
         O5'H                                  5.6   -1.6  -0.8   2.5

(a) References for experimental data: C1'•, reference 43; C3'•, reference 21.

Not all experimental and theoretical HFCCs for the sugar radicals are in such good agreement with one another. A typical example is the C4'• radical, which has been observed in three different crystals: uridine 5'-monophosphate (5'rUMP) [48], inosine (rI, which can be derived from adenosine by replacing the amino group at C6 with a hydroxyl group) [49] and adenosine:5-bromouracil (rA:5BrU) [18]. The calculated C4'•(N) radical exhibits two C5'H couplings, one of substantial magnitude (27.9 G), and no O5'H coupling, while C4'•(S) has a significant O5'H coupling (5.6 G) and only one small C5'H coupling (6.2 G). Experimentally, three substantial couplings of 36, 25 and 24 G were recovered in crystals of 5'rUMP [48], and two small couplings were observed at certain orientations for which accurate HFCCs could not be evaluated. In rI [49], large C3'H (34.7 G) and C5'H (33.4 G) couplings, as well as a small C5'H coupling (3.4 G), were obtained. In rA:5BrU, two couplings were resolved corresponding to the C3' and C5' hydrogens (21.0 and 10.0 G, respectively). Overall, poor agreement between the theoretical (Table 9) and experimental HFCCs (mentioned within) was observed. 
Additionally, no anisotropic components, which are important for comparison to theoretical work, were isolated in the experiments. Differences between theoretical and experimental isotropic couplings could arise due to alterations experienced when phosphate groups replace the hydroxyl model group [45]. However, more accurate experimental and theoretical data is required to verify this radical assignment. Another example of disagreement between theory and experiment is the C5'" radical, which has been assigned in studies of various DNA constituents [21,43,47,50]. The HFCCs calculated for both the north and south conformers are in close agreement with one another as the radical center is outside the sugar ring, the part of the molecule involved in the puckering. Theoretically, relatively small isotropic couplings were obtained for C5'H and O5'H, while a large coupling was calculated for C4'H (Table 10). Experimentally, large

433

Table 10" Comparison of experimental and theoretical HFCCs (G) for C5 '~ Source Atom Also Txx TrY Tzz

5'dGMP

dAb dam c 3'CMPd

5C1 and 5BrdUd

a

Experiment C5'H -22.2 -8.7 0.8 7.9 C5'H -20.9 -8.6 0.6 8.0 C5'H -20.8 -8.8 0.8 8.0 C5'H - 19.6 -8.7 0.5 8.2 C4'H 2.5 O5'H* (16.3) (20.2) (28.1) C5'H -14.7 -7.9 -1.7 9.7 C4'H 7.0+_1 C5'H -17.5 -11.8 0.8 11.0 C5'H -22.7 -9.3 0.7 8.5 C4'H 4.5 -3.0 0.1 3.0 O5H 20.8 -4.3 -1.5 5.8 C5'H -20.7 -12.5 2.9 9.6 O5'H 8.6 -3.1 -1.0 4.2 C4'H 18.9 -1.6 -0.2 1.9

Theory C5'H -9.4 -11.2 -0.8 12.0 C4'H 33.6 -1.7 -0.8 2.5 O5'H -4.3 -5.1 -3.4 8.5 C5'~ C5'H -10.4 -10.9 -0.7 11.7 C4'H 35.3 -1.6 -0.8 2.4 O5'H -3.9 -5.0 -3.3 8.3 aReference 21. bReference 50. CReference 43. dReference 42. CS'~

couplings were elucidated for C5'H and small values for C4'H and O5'H (Table 10). Despite these differences, the anisotropic couplings are in much better agreement. However, the experimental trend stated for the isotropic couplings is not true for all of the experimental results, as even the experimental results differ greatly between crystalline environments. Due to discrepancies between the results, an in-depth investigation of the couplings assigned to C5 '~ is required. Since significant effects on the HFCCs can be observed with changes in geometry (as discussed for T(O4I-I)), an investigation of the dependence of the HFCCs on rotation about the C5'C4' bond was undertaken. The XC5'C4'C3', X = 0 5 ' or H5', dihedral angles in the north conformer were varied by increments of 15 ~ starting from the optimized geometry (289.3 ~ and 144.4 ~ for X = H5' and O5', respectively) and single-point calculations performed at each step. The results for the variation in C4'H, C5'H and O5'H HFCCs as a function of rotation angle are displayed in Figure 8. It is interesting to note that upon rigid rotation,

434

'~

r

50 /

C'4

-o-c'5

I i

40

~,

30

20

0 -10~

4~

-20 Rotation

Angle

Figure 8: The C4', C5' and 05' hydrogens' HFCCs (G) versus the rotation angle (deg.) about the C5'C4' bond for the C5'(N) radical. the isotropic component of the HFCCs changes considerably, while the anisotropic components (not shown) do not differ more than twenty percent from the values displayed in Table 10. On average, the rotation barrier about the C4'C5' bond is 8.6 kcal/mol, with maximum and minimum values occurring at 90 ~ (14.4 kcal/mol) and 15 ~ (1.4 kcal/mol), respectively. The results from the rotational study (Figure 8) shed some fight on the dependence of the HFCCs on rotation about the C5'C4' bond. The calculated C5'H isotropic HFCC does not reach the experimental value (-22 G) obtained in 5'dGMP, but comes close to the value obtained in dAm (-17 G) upon a 300 ~ rotation (-16.7 G). The variation between the O5'H and C4'H results obtained for 3'CMP, 5CldU and 5BrdU can also be understood from these results. For 3'CMP, the calculated values which satisfy both the C4' and 05' experimental couplings occur at a 130 ~ rotation, where Aiso(O5'H) = 22.6 G and Ai~o(C4'H) = 8.1 G (experimental values are 20.8 and 4.5 G, respectively). Similarly, results in agreement with 5CldU and 5BrdU experimental HFCCs occur upon a 150 ~ rotation, where Aiso(C4'H) = 17.7 G and Aiso(O5'H) = 10.3 G (experimental values: 18.9 and 8.6 G, respectively). Hence, once geometrical effects are accounted for, the calculated and experimental HFCCs agree very well. The poor agreement between theory and experiment cannot always be improved upon by the sole investigation of rotational effects. For example, a rotational

435

em

"

:OH H

"

"

m

:.OH H

ee

I

II

Figure 9: The structure of model C4' (I) and C1' (II) centered radicals formed through opening the sugar ring. study similar to that discussed for C5'" was carded out in attempts to improve theoretical results for the 0 5 ' alkoxyl radical (the results are not shown explicitly within but the reader is referred to reference 5). However, not all of the experimental results could be understood. The main explanation given for the poor agreement between theory and experiment for the 0 5 ' (and 03') centered radicals is that the hydrogen bonding and crystal environment greatly affect the HFCCs in these ~adicals and therefore calculations accounting for these effects must be performed before improved agreement can be obtained. One radical which involves damage more extensive than the sole removal of a hydrogen atom or breakage of a phosphoester bond is a C4' radical generated through breaking the C4'O1' (I, Figure 9) [21,51,52]. It is also possible that the C2'O1' bond breaks; however, the resulting radical (II, Figure 9) has not been Table 11" Comparison of theoretical and experimental HFCCs (G) for radicals formed through extensive damage to the sugar ring as displayed in Figures 9 and 10. Theory Experiment a'~ Radical Atom Aiso Txx Try Tzz Aiso Txx Try Tzz Figure 9, I

C4'H C5'H C5'H C3'H

-21.3 -13.0 0.0 13.0 -18.8 (32.2) -9.8 -0.2 10.0 3 2 . 8 -2.3 -0.8 3.1 48 (37) 3.8 - 2 . 1 -1.5 3.5 13 (27) 32.4 -1.9 -1.0 3.0

Figure 10, II/III/IV

C5'H C4'H O5'H

-14.1 -2.8 -4.2

Figure 10, IV/V

-7.6

-1.1

-1.8 -1.3 -4.4 -3.0

8.8

3.0 7.4

-8.0 0.2 3.8 -1.6 -0.2

-16.9

C5'H 0.0 -0.8 -0.7 1.5 O5'P - 2 1 . 1 -2.2 1.0 1.2 C2'H -12.7 -7.8 0.0 7.8 CI'H 27.3 - 1.0 -0.4 1.5 C5'H 11.2 -1.3 -0.7 2.0 aExperimental HFCCs for structure I, Figure 9, are from reference 52 (and Aii from 51). bExperimental HFCCs for structures in Figure 10 are from reference 53.

7.8 1.9

436

observed experimentally. The experimental HFCCs in the C4' centered ring opened radical exhibit great differences (Table 11). However, the sum of the C5'H couplings is very similar (61 and 64 G) indicating that alternative conformers may be responsible for the differences. Calculations reveal a large isotropic C4'H coupling (-21.3 G) possessing significant anisotropy (-13.0, 0.0, 13.0 G), not unlike that assigned in uridine 5'-monophosphate (5'rUMP) [52]. One small and one large C5'H couplings were also obtained from the calculations (3.8 and 32.8 G, respectively). The large C5'H coupling is not unlike those assigned experimentally (48 and 37 G). However a larger experimental coupling was obtained for the remaining C5'H coupling than calculated and a substantial C3'H coupling was also calculated but not observed experimentally. The good agreement observed for the C4'H and one of the C5'H couplings is promising and more accurate theoretical and experimental studies could possibly unveil any discrepancies and confidently identify this radical. The second series of ring breaking radicals is formed through removal of a portion of the sugar ring. The radical depicted in structure I, Figu/e 10, has been proposed to be formed in nucleotides by abstraction of a hydrogen atom from the C5' position by a base radical, followed by breakage of the sugar ring and reorientation about the C4'O1' bond [21]. A very similar radical appears in structure 1I, where this radical was observed only after irradiation at room temperature [53]. The coupling constants in these radicals were calculated using a model system (structure III) that represents either the phosphate [21] or carbon oo

Figure 10: Model systems used for various ring-breaking sugar radicals: radicals observed experimentally (I and II), model ring-breaking radical (III), C5' centered radical proposed experimentally (IV) and the model ring-breaking radical with a phosphate group (V).


[53] group with a hydroxyl group. The experimental results (Table 11) include a large C5'H isotropic coupling (approximately -17 G) and a small C4'H isotropic coupling (4 G). The major difference between the two data sets is the magnitude of the largest component of the anisotropic tensor. The C5'H couplings calculated using the model system are in good agreement with the experimental results. However, the anisotropic results agree more closely with those shown in Table 11 [53] than with results obtained in alternative crystals (-11.0, 1.1, 9.8 G) [21]. Hole et al. [53] proposed that an alternative explanation for the couplings observed in 5'GMP is the radical displayed as structure IV (Figure 10), where a large experimental coupling (-17 G) was suggested to arise from the phosphate group. The model system displayed in structure V was used to test this hypothesis. The calculated results indicate that the phosphorus yields a coupling (-21 G) similar to that observed experimentally. However, the calculated phosphorus anisotropic couplings and the experimental C5'H couplings do not concur. Thus, due to the better agreement obtained for the ring-breaking radical modeled by structure III, it can be concluded that the most likely structure for the observed radical is that displayed as structure I.

The calculations presented within this section support experimental proposals that many different base and sugar radicals are formed upon irradiation of DNA base derivatives. In fact, the calculations even support the possible formation of sugar radicals whose importance as products has been disputed. Furthermore, the formation of ring-altering sugar radicals in single crystals has been confirmed. This is very important information since sugar radicals have not been assigned in the spectra of full DNA. The confident identification of base and sugar radicals in single crystals will aid in the discovery of these radicals (if formed) in irradiated DNA.
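The isotropic and anisotropic couplings quoted in this section can be combined into the principal values of a hyperfine tensor. The following is a minimal sketch, assuming the usual convention that the anisotropic (dipolar) components are traceless and simply add to the isotropic coupling. The function names and the rounding tolerance are illustrative choices, not part of the original work; the numbers are the calculated C4'H values and the alternative-crystal anisotropic components quoted in the text.

```python
def principal_values(a_iso, anisotropic):
    """Principal hyperfine values (G): isotropic coupling plus each anisotropic component."""
    return tuple(round(a_iso + t, 1) for t in anisotropic)


def is_traceless(anisotropic, tol=0.2):
    """Check that the anisotropic components sum to roughly zero.

    tol (in G) is an assumed allowance for rounding in tabulated values.
    """
    return abs(sum(anisotropic)) <= tol


# Calculated C4'H coupling in the C4' centered ring-opened radical (values from the text)
c4h_iso = -21.3
c4h_aniso = (-13.0, 0.0, 13.0)
c4h_principal = principal_values(c4h_iso, c4h_aniso)

# Anisotropic components reported for alternative crystals (values from the text)
alt_crystal_aniso = (-11.0, 1.1, 9.8)

print("C4'H principal values (G):", c4h_principal)
print("C4'H anisotropic part traceless:", is_traceless(c4h_aniso))
print("Alternative-crystal tensor traceless:", is_traceless(alt_crystal_aniso))
```

Comparing full principal values rather than isotropic couplings alone is one way to see why two radicals with similar isotropic couplings can still be distinguished by their anisotropy.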
Understanding whether these radicals are generated in full DNA or whether they react to form other radicals is important for the field of DNA radiation damage.

3. FULL DNA STUDIES

The previous sections have discussed the effects of radiation on individual DNA components in relation to experimental results obtained from single crystals of base derivatives at low temperatures. The relevance of these studies to the identification of the radiation products in full DNA can now be addressed. Early ESR work on DNA revealed that the classification of radiation products is a difficult task since many of the DNA radicals are extremely similar and, therefore, the hyperfine couplings and g-factors are not sufficient to separate their spectra. The implementation of a variety of experimental conditions allows for the determination of the dependence of


radical formation on the environment (for example, strand conformation, hydration level, O2 content). Annealing experiments are also useful to determine which radicals are formed via decay of another product or to simplify the spectra.

3.1 The primary radicals

Studies have been performed on DNA both in the dry state and in aqueous solutions [47]. Frozen aqueous solutions and low temperature glasses have been employed on occasion to investigate full DNA. The former are advantageous since they allow for the easy addition of additives, such as electron scavengers (FeCl3 or K3[Fe(CN)6]) used to obtain information about electron loss centers. The latter are also useful since different reactive radicals can be stabilized and the specificity of a reaction can be studied by carefully selecting the glass-forming agent. For example, hydroxyl radicals are known to be abundant in BeF2 glasses, electrons in LiCl glasses or in the presence of strong bases (NaOH) and hydrogen atoms in strong acids (H2SO4). Lyophilized (freeze-dried) powders prepared completely dry or with varying degrees of hydration (typically 2.5 to 11 water molecules per nucleotide) are also often used. These experimental techniques yield random orientations of the DNA molecules and therefore the spectra are very broad and lack distinguishing features. Single-crystal studies would be beneficial, but it is not possible to prepare these samples for an entire DNA strand. An attractive alternative is to use oriented fibers.

Despite great efforts put forth by experimentalists, the exact identity of most radical products generated in irradiated DNA is still unknown. The first ESR studies on full DNA only provided evidence for the formation of a thymine centered radical [54,55]. Work performed on DNA irradiated by ultraviolet light [56] and on oriented fibers [57] confirmed this radical to be formed through net hydrogen atom addition to C6 in thymine [T(C6H)]. The definitive

Figure 11: The primary radical products generated according to the two-component model for DNA radiation damage.


identification of a thymine radical product led to the suggestion and subsequent proof [58] that T•− must be initially formed in irradiated DNA. Following these studies, little progress was made for years in classifying additional radiation products in full DNA, although work continued on single crystals of base derivatives and other DNA subunits (as discussed in Section 2).

The model of radiation damage to DNA was greatly enhanced through work performed on oriented fibers by Gräslund and coworkers [59,60,61]. The radical mixture generated in DNA was suggested to be composed of thymine (and/or cytosine) anions and guanine (and/or cytosine) cations [59]. The initial assumption that cytosine may also be damaged was discarded [61], and the picture of radiation damage in DNA resulting in T•− and G•+ became known as the "two-component" model (Figure 11).

The two-component model for radiation damage in DNA was often criticized [18,62]. The main criticism was that the formation of T•− was favored over C•− only because the anionic radical converts to T(C6H) and products generated from cytosine anions were not observed. Additionally, major criticism of the two-component model arose since the spectrum assigned to T•− in nondeuterated DNA samples is in poor agreement with that obtained from single-crystal studies [62], and the spectrum does not change appropriately upon deuteration. This information indicates that some other species must be responsible for the spectrum [63], possibly C•−, which yields a doublet with couplings approximately equal to those assigned to T•−. Cullis and coworkers [64] noted that the two-component model for damage to DNA seems surprising since ionizing radiation damages indiscriminately and therefore initial electron gain and loss centers should include water, the phosphate group, the sugar moiety and all four bases.
This was verified by examining DNA strand-breaks, which were determined to be formed at all centers rather than exclusively at thymine and guanine as predicted by the two-component model [64].

More evidence supporting C•− as the major anion formed in irradiated DNA also appeared in the literature. Bernhard and coworkers determined that C•− is the predominant electron gain radiation product in low temperature glasses of oligonucleotides [65] and that it may also be the major anion generated in DNA [66]. Through the use of computer simulations, Sevilla et al. [67] determined that 77% of all anions are C•− at 100 K. However, since the spectra of C•− and T•− are so similar, slight changes in the simulation input can yield very different percentages [64]. Furthermore, the one-electron reduction potentials of the bases in aqueous solutions indicated that C•− has a greater tendency to be protonated by its base pair guanine than T•− by adenine and that cytosine should therefore be the most easily reduced base in DNA [68]. Studies on frozen DNA


samples predicted that T•− slightly prevails in single-stranded DNA whereas C•− predominates in double-stranded DNA, where differences arise due to interstrand base-pairing and base-stacking effects that allow electrons to travel throughout the strand [69].

The debate over the site of electron loss in DNA is much less pronounced since it has been estimated that over 90% of the cations generated in DNA are centered on guanine [67] and guanine end products account for 90% of the electron loss products in DNA [70]. However, the spectra of G•+ recorded in solid-state studies of nucleotides and nucleosides do not correspond to the spectrum recorded in full DNA [71], and investigations of the strand-break specificity determined that some adenine cations could be generated [64]. Thus, it is also possible that other cations are formed, primarily A•+.

More information about the specificity of electron gain and loss in DNA can be obtained by calculating the ionization potentials (IPs) and electron affinities (EAs) of the bases. Table 12 compares the IPs for the nucleobases obtained experimentally with those obtained from Møller-Plesset (MP2) single-point calculations on HF geometries [72] and from DFT (the B3LYP functional). The theoretical data are in good agreement with the experimental results, where all three data sets predict the magnitude of the ionization potential to follow the trend T > C > A > G. Thus, an electron is most easily removed from guanine, which supports the experimental predictions that the guanine cation is the major oxidation product in irradiated DNA. Limited experimental data are available for the EAs of the DNA bases. The trend in the "estimated EAs" (obtained by correcting the HF Koopmans EA by the calculated nuclear relaxation energy) is T > C > A > G, which is in agreement with early studies on DNA predicting that the thymine anion is the major reduction product upon irradiation [72].
Alternatively, the trend predicted through examination of the adiabatic EAs calculated with DFT (C > T > G > A) supports experimental data predicting cytosine to be the major reduction site in

Table 12: The adiabatic IPs and EAs (kcal/mol) of the DNA bases obtained at various levels of theory and experimentally.

        IP                          EA
Base    DFT      MP2      Exp.      DFT      DFT(+)   "Estimated"
T       196.0    204.2    204.6     -14.8      3.3       7.2
C       194.2    201.5    200.1     -13.8     -1.4       4.8
A       182.3    188.6    190.5     -17.7     -9.1      -7.2
G       171.8    176.6    179.3     -15.8     -6.4     -16.7


irradiated DNA. The interesting feature of the EAs is that the "estimated" values for A and G, as well as all adiabatic DFT results, are negative. The EA is defined as the energy released when an electron is added to a neutral molecule and is calculated as the energy of the neutral molecule minus the energy of the anion. Therefore, a negative value for the EA indicates that the anion is higher in energy than the corresponding neutral molecule. A negative EA cannot be measured experimentally due to the dissociation of the anion into an electron and the neutral molecule before nuclear relaxation.

One predominant flaw in the DFT results is that diffuse functions were not included in the calculations, and these are known to be essential for the accurate calculation of EAs. Diffuse functions were included on the heavy atoms by using the 6-31+G(d,p) and 6-311+G(2df,p) basis sets for the geometry optimizations and single-point calculations, respectively (Table 12, DFT(+)). Inclusion of diffuse functions leads to a positive EA for thymine only. Additionally, the order of T and C is reversed when diffuse functions are included in the calculations, as the results now indicate that thymine is the most favorable anion formed on a base center. The IPs also improve through the use of diffuse functions; for example, the IP of thymine changes from 196.0 to 201.6 kcal/mol when diffuse functions are used (experimental value: 204.6 kcal/mol).

To better understand the trend in EAs for the DNA bases, a more systematic study must be performed. A good starting point would be to apply techniques known to yield highly accurate thermochemistry, such as the Gaussian-n techniques, to the smaller bases thymine and cytosine. In particular, the introduction of G3 methods using Møller-Plesset (MP) theory has reduced the computational cost of these methods (allowing calculations to be extended to at least 10 heavy atoms) and at the same time increased their accuracy.
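The trends discussed for Table 12 can be checked mechanically. The sketch below is illustrative only: it copies the tabulated values, orders the bases by each quantity, lists the bases whose anion is predicted to be bound (positive EA), and converts kcal/mol to eV. The dictionary layout and function names are assumptions made for this example, not part of the original study.

```python
KCAL_PER_MOL_PER_EV = 23.0605  # 1 eV expressed in kcal/mol

# Values copied from Table 12 (kcal/mol)
TABLE_12 = {
    "IP": {
        "DFT": {"T": 196.0, "C": 194.2, "A": 182.3, "G": 171.8},
        "MP2": {"T": 204.2, "C": 201.5, "A": 188.6, "G": 176.6},
        "Exp.": {"T": 204.6, "C": 200.1, "A": 190.5, "G": 179.3},
    },
    "EA": {
        "DFT": {"T": -14.8, "C": -13.8, "A": -17.7, "G": -15.8},
        "DFT(+)": {"T": 3.3, "C": -1.4, "A": -9.1, "G": -6.4},
        "Estimated": {"T": 7.2, "C": 4.8, "A": -7.2, "G": -16.7},
    },
}


def trend(values):
    """Order the bases from the largest to the smallest value."""
    return " > ".join(sorted(values, key=values.get, reverse=True))


def bound_anions(ea_values):
    """Bases with a positive EA, i.e. an anion lower in energy than the neutral molecule."""
    return [base for base, ea in ea_values.items() if ea > 0]


def to_ev(kcal_per_mol):
    """Convert an energy from kcal/mol to eV."""
    return kcal_per_mol / KCAL_PER_MOL_PER_EV


for quantity, methods in TABLE_12.items():
    for method, values in methods.items():
        print(f"{quantity} {method:10s} {trend(values)}")

print("Bound anions, DFT(+):", bound_anions(TABLE_12["EA"]["DFT(+)"]))
print(f"Experimental IP of guanine: {to_ev(TABLE_12['IP']['Exp.']['G']):.2f} eV")
```

Keeping the data in one nested mapping makes it straightforward to add further levels of theory, such as the Gaussian-n results suggested in the text, and to re-run the same ordering checks.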

3.2 The Secondary Radicals

At higher temperatures, ionic radicals are not expected to be stable, but rather these species protonate or deprotonate to form neutral, secondary radical products. As mentioned, the first of these products, T(C6H), was identified as evolving from the thymine anion in early ESR studies. Later, the decay of the guanine cation was predicted to be related to the growth of G(N1) [73,74]. T(CH2) has also been observed in highly hydrated DNA samples [73]. Evidence exists that the cytosine anion is stabilized by protonation at N3 at 77 K [75]. Additionally, in thymine deuterated DNA samples, a deuteron has been determined to add to the C6 position of the cytosine anion [73]. Despite the fact that the types of products observed are diverse, these products were each observed in different samples.


Figure 12: Radicals predicted to be formed in oriented (O) and randomly oriented (R) samples of DNA. Radicals shown: T•− (O,R), G•+ (O,R), C•− (O,R), T(O4H) (O), C(N3H) (O,R), T(CH2) (O,R), G(N2H) (O), C1'• (O,R), A(N3H) (O), A•+ (O), G•− (O), T(C6H) (R), C(N4H) (R), G(N1) (R), C3'• (R), C4'• (R), C8• (R).

Advances have been made in the past few years to identify more than two or three products in one DNA sample. The most promising results were obtained by Hüttermann and coworkers, in both oriented fibers [76] and in randomly


oriented DNA [77,78]. Through the use of the field-swept electron spin-echo technique, nine clear patterns were identified, and seven radicals proposed (Figure 12), in the ESR spectrum of oriented DNA fibers at 77 K [76]. Species identified which were previously discussed in the literature as possible damage products in oriented fibers include T•−, or T(O4H), C•−, or C(N3H), and G•+. Newly proposed radicals for oriented fibers include T(CH2) and the C1'• sugar radical. Assignment was also made to A(N3H), although the spectrum of this radical was not clear in full DNA and differs from that obtained in the copolymer poly(A:U). Another spectrum, for which little direct information could be obtained, was previously assigned to G•+. However, since G•+ was already assigned in the study under discussion, suggested assignments include G•− or A•+, as the adenine anion was already related to A(N3H). The two remaining components could not be assigned due to insufficient information.

The first study performed on randomly oriented DNA that detected more than two or three ionic species was performed on DNA equilibrated at various levels of hydration, as well as on frozen aqueous solutions [77]. In lyophilized powders, G•+, C•− and T•− were identified without any uncertainty for the first time. The spectra obtained for frozen aqueous solutions were very different from those equilibrated at 76% relative humidity, since the amount of G•+ is considerably reduced. T(C6H) and T(CH2) were also assigned. A continuation of this study directly analyzed the spectra obtained from lyophilized DNA powders (in dry environments and equilibrated at 76% relative humidity) using electron scavengers rather than results obtained from model systems. Many new radicals were identified besides T(C6H) (Figure 12). The spectrum previously assigned to G•+ was reassigned to the cytosine radical formed via net hydrogen atom addition to the amino group [C(N4H)].
This is the first time C(N4H) has been proposed for DNA, although it has been identified in aqueous solutions of cytosine derivatives [79]. Spectra were also assigned to T•−, C•− or C(N3H), T(CH2), C1'• and G•+. Two additional patterns were acknowledged for the first time: one was speculated to be due to radical addition to the C8 position in one of the purines (C8•) and the other to G(N1). An additional spectrum was speculated to be due to the C4'• or C5'• sugar radical, but a definitive assignment could not be made. At high doses of radiation, a spectrum appeared which gave strong indications of being due to C3'• or C4'•. These studies on oriented fibers and randomly oriented DNA are very important since they are the first to demonstrate the great variety of radicals that can be identified in irradiated DNA.

The role of sugar radicals in DNA radiation damage is uncertain since no sugar radicals were identified in preliminary studies on full DNA samples [42].


However, at least nine different sugar radicals were observed in irradiated single crystals of 2'-deoxyguanosine 5'-monophosphate [21]. It was originally suggested that damage quickly shifts from the sugar (where alkoxyl radicals are often observed in nucleotides but cannot be formed without a strand break in DNA) to the bases, especially after annealing [15,80]. Alternative explanations include a small abundance of radicals, multiple conformations, the similarity of the radicals' spectra and the limitations imposed by the sole use of ESR (rather than more involved techniques) [14,42]. Despite the problems associated with the identification of sugar radicals, Hüttermann and coworkers [77,78] provided the first direct evidence that these radicals are formed in full DNA samples and proposed the formation of the C1', C3', C4' and C5' centered radicals. In addition, studies performed with heavy ion beam irradiation of DNA noted the resemblance between the simulated spectra of the C4'• and C3'• radicals and the spectrum of DNA [81].

Figure 13: The first phosphate derived radicals observed in DNA.


Similarly, little evidence has appeared for the formation of phosphate centered radicals. Studies on model systems show that electron capture at the phosphate group would result in cleavage of the phosphoester bond [82,83]. Additionally, sugar radicals of the form displayed in Figure 13 (C5'(H2)) have been observed and the most likely mechanism for their formation is through capture of an electron at a phosphate group [21]. Electron transfer to the DNA bases from the phosphates is also likely [83], and supporting evidence has been obtained which indicates rapid elimination of the phosphate-ester group through a C4' centered sugar radical (S, Figure 13). The only direct detection of phosphate centered radicals was obtained through heavy ion beam irradiation of DNA [81], where large couplings were assigned to the phosphorus atoms in radicals displayed in Figure 13 (P1 and P2, or possibly P3).

From these studies it is clear that damage to DNA is broader than initially expected from the two-component model since products on all four bases and the sugar moiety have been proposed. These proposals include sugar and phosphate radicals despite early failures to detect radicals in the backbone of the DNA double helix. More work is required in order to determine the exact identity of the radical products since structural information is difficult to obtain through the methods implemented thus far.

3.3 Effects of water on radical formation in DNA

Besides direct damage of the DNA strand, it is also possible for the surrounding water molecules to be involved in radiation damage mechanisms. The hydration layer of DNA consists of a primary layer (approximately 20 or 21 water molecules per nucleotide), which possesses properties different from crystalline ice upon freezing, and a secondary layer, which cannot be distinguished from bulk water upon crystallization. Upon irradiation of water, many different products can be formed:

•OH + e−(aq) + •H + H2O•+ + H+ + H2O2 + H2

The first 14 water molecules per nucleotide in the hydration layer surrounding DNA have approximately the same mass as DNA [84] and, therefore, the same number of ionizations are expected to occur in the primary hydration layer as in the DNA strand. However, it is unknown how the water molecules in the primary hydration layer are affected by radiation. One possibility is that water cations and electrons are formed, which transfer their ionic character to the DNA strand (quasi-direct effects). Water cations can also transfer protons to neighboring water molecules resulting in hydroxyl radicals. The products formed in the hydration layer (hydroxyl radicals, hydrogen atoms or aqueous electrons) can subsequently react with DNA (indirect effects). Quasi-direct and indirect effects are expected to yield very different radicals.
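The mass argument above can be made concrete with a short calculation. The sketch below assumes that the number of initial ionizations in a component is proportional to its mass and uses an assumed average nucleotide mass of 330 g/mol; that value is a hypothetical round number chosen for illustration and is not taken from reference 84. With it, the water and DNA masses come out comparable in magnitude, and the exact ratio shifts with the nucleotide mass chosen (for example, whether counterions are included).

```python
WATER_MASS = 18.015  # g/mol, molar mass of H2O
ASSUMED_NUCLEOTIDE_MASS = 330.0  # g/mol, assumed average value (hypothetical, not from the source)


def water_to_dna_mass_ratio(n_waters, nucleotide_mass=ASSUMED_NUCLEOTIDE_MASS):
    """Mass of n_waters water molecules relative to the mass of one nucleotide."""
    return n_waters * WATER_MASS / nucleotide_mass


def water_ionization_fraction(n_waters, nucleotide_mass=ASSUMED_NUCLEOTIDE_MASS):
    """Fraction of initial ionizations expected in the water, assuming
    ionization events are distributed in proportion to mass."""
    water_mass = n_waters * WATER_MASS
    return water_mass / (water_mass + nucleotide_mass)


for n in (0, 9, 14, 21):
    ratio = water_to_dna_mass_ratio(n)
    fraction = water_ionization_fraction(n)
    print(f"{n:2d} waters/nucleotide: mass ratio {ratio:.2f}, "
          f"fraction of ionizations in water {fraction:.2f}")
```

The hydration levels 9 and 21 in the loop are included only for comparison with the partitions of the hydration layer discussed later in this section.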


Perhaps the first indication of the dependence of DNA damage on hydration was reported for frozen aqueous solutions [85], where the radical yield in wet DNA was reported to be twice the yield obtained in dry DNA. Additionally, the yield of radical ions at 77 K was found to increase by a factor of four upon inclusion of the primary DNA hydration layer [86]. In lyophilized DNA, it was instead noted that radical yield increases with hydration to a certain extent, but then a plateau is reached that cannot be surmounted by increasing the level of hydration [73]. The absolute yields of the individual ion radicals have also been determined to vary with hydration, where for example T•− predominates in dry DNA and C•− predominates when the hydration layer is included [75,77].

Alternatively, evidence exists which indicates that DNA damage does not increase with consideration of the primary hydration layer, but increases upon inclusion of the secondary layer. These studies include investigations of the release of unaltered bases [87], the production of base damage products (14 detected in total) [70], and the efficiency of strand breaks [88]. The investigations discussed thus far used the fact that hydroxyl radicals, hydrogen atoms and free electrons were not observed in the primary hydration layer of DNA to speculate that damage due to the hydration layer must occur via quasi-direct effects. However, it is possible that hydroxyl radicals are formed but are not detected, either because their signals are weak, because they react rapidly with the DNA strand or because the generated radicals quickly recombine [89]. Conversely, it is accepted that hydroxyl radicals can be formed in the secondary hydration layer, where water molecules are more loosely bound.
A major revelation in this area was obtained in a study of γ-irradiated DNA where hydroxyl radicals were observed in low yields in the primary hydration layer and it was therefore concluded that most of the oxidative damage in the hydration layer is transferred to DNA [34a]. Reinvestigation of this problem revealed that the hydration layer can be separated into three partitions: (1) the first 9 water molecules, which do not form significant amounts of hydroxyl radicals but transfer their charge upon irradiation to DNA; (2) an additional 12 water molecules completing the primary hydration layer, which predominantly form hydroxyl radicals, although a small amount of charge transfer may also occur; and (3) bulk water, which forms hydroxyl radicals [34b]. It is still possible that hydroxyl radicals were not detected in the first 9 water molecules since they react quickly with DNA or they could simply not be detected with ESR. In aqueous BeF2 glasses of base derivatives, hydroxyl radicals were found to add to the C5C6 double bond in cytosine and uracil, abstract a hydrogen atom from the methyl group in thymine and add to C2 in adenine [90]. In aqueous


solutions, hydroxyl radicals have been determined to add to the C5C6 double bond in all pyrimidines [91] and to C4, C5 and C8 in purines [92]. Differences are also thought to exist between low temperature glasses and frozen aqueous solutions, where indirect and quasi-direct pathways are thought to predominate in the former and latter, respectively. Hüttermann et al. proposed a new mechanism for radiation damage in frozen aqueous solutions, which involved oxidation at water followed by net hydroxyl radical or hydrogen atom addition to C6 in thymine and net hydrogen atom abstraction from the methyl group [93]. This is the first indication that in frozen aqueous solutions hydroxyl radicals can take part in the radiation damage to DNA components [37,94], although the spectra assigned to T(C6OH) could arise due to attack at C6 in thymine by a neighboring allylic radical (dimer radical) or by its own sugar group (cyclic radical) [95]. More recent work indicates that the allylic radical could be formed via a base cation without the formation of hydroxyl radicals and therefore contradicts this proposal [96].

Work on single crystals of DNA components has also suggested that water can be involved in the initial ionization process. Studies on single crystals of guanine derivatives determined that it is necessary to consider ionization of the surrounding water molecules in order to account for the formation of the identified radicals [27,29,30,97]. Comparison of calculated HFCCs and those elucidated in cytosine monohydrate crystals also supports water as a site for oxidative damage.

3.4 Major radical products formed in irradiated DNA

Ionizing radiation damages indiscriminately and the number of initial damage products formed on a particular center is proportional to the mass of the center under consideration. Therefore, upon irradiation of a DNA strand, the primary radicals formed should include cationic and anionic radicals of each base, the sugar moiety and the phosphate group. As discussed, the yield of damage to the DNA strand has been determined to increase upon consideration of the hydration layer [70,87,88], which indicates that the water surrounding DNA plays an important role in the radiation damage mechanism. More specifically, since living entities are largely composed of water, a model of the radiation damage to DNA must also encompass the ionization of water molecules, which generates water cations and electrons. However, the abundance of other organic molecules with which these species can react and the amount of room available for radical migration must also be considered. Any of the primary radiation products can transform into secondary radicals by protonation or deprotonation. Early evidence for radical transfer to secondary products was obtained by recognizing the relationship between T•− and T(C6H) [60].


Due to the nature of the DNA double helix, it is possible for the initial damage to be transferred through the DNA strand to produce more stable intermediate radical products. Electron transfer has been reported to occur over distances ranging from as few as three base pairs to as many as one hundred [98]. The consensus in the literature regarding radicals initially formed upon irradiation of DNA is that the primary electron loss center is guanine and the primary electron gain centers are cytosine and thymine. The formation of these primary products is also supported by ab initio [72] and DFT calculations. Thus, if an adenine anion is formed initially, the electron can be transferred throughout the DNA strand to produce either a thymine or cytosine anion. Interbase electron transfer is possible in DNA due to the small distance between base pairs, which results in an overlap of the π-systems, and hydrogen bonding of the bases [92]. Evidence for charge transfer through the DNA strand can be obtained from a study that predicted thymine anions to be present in slightly larger yields in single-stranded DNA, while the cytosine anion clearly predominates in double-stranded DNA [69].

More evidence for transfer of anionic character has been obtained in single-crystal studies. For example, despite the fact that the primary radicals identified in cocrystals of 1-methylcytosine and 5-fluorouracil were the cytosine anion and the uracil centered cation, the only net hydrogen atom addition products observed evolved from the uracil anion [99]. Furthermore, in cocrystals of adenine and various uracil (or thymine) derivatives, net hydrogen atom addition adenine radicals were observed despite the fact that uracil (or thymine) anions are expected to be the primary anions formed.
However, although the adenine cation and its amino-deprotonated counterpart were observed in cocrystals of 1-methyluracil and 9-ethyladenine [100], uracil and adenine acted as if they were isolated from one another, which indicates that transfer of radical character does not occur in these crystals.

Negative charge can also be transferred to the DNA hydration layer. For example, Steenken suggested that upon formation of the adenine anion, proton transfer from T(N3) to A(N1) occurs, forming the thymine anion, which is subsequently protonated by a nearby water molecule to form hydroxyl anions [101]. Thus, initial reduction of adenine could lead to an abundance of negative charge in the hydration layer. Alternatively, the adenine cation could transfer non-hydrogen bonded amino-protons to a neighboring water molecule. Thus, these experimental results indicate that the charge can be transferred from bases in the DNA strand to the hydration layer, where it can be stabilized or additional water radicals can be formed to attack the bases and the sugar moiety.

Alternatively, long-range hole transfer in DNA is considered to be more difficult. However, evidence supporting hole transfer in some crystals


(co-crystallized with thio derivatives) does exist, which suggests that hole transfer may also occur in DNA [98]. For example, positive holes formed on thymine, cytosine or adenine can be transferred to guanine.

The radiation products generated in DNA will be discussed in the next two sections in terms of how the primary cation and anion radicals decay to form secondary radical products. This discussion will encompass results from single crystals [15], the aqueous state [92,101], the calculations presented in Section 2, as well as those obtained from ab initio studies [72], and studies on oriented and randomly oriented DNA [76,78].

3.5 DNA cations and secondary radicals

Cations can be formed via direct ionization of the DNA strand or through transfer of the positive charge from irradiated water molecules in the hydration layer. Sugar radical cations can be formed via transfer of the radical character from the base cations. Once formed, cations can recapture an electron, generated from ionization of either water or the DNA strand, to heal the damage, or positive hole transfer can occur, where the favorable electron deficient center in DNA is guanine. At higher temperatures, or more specifically those of biological systems, neutral radicals are more probable and cations are expected to deprotonate. However, in experimental studies on DNA, it is difficult to determine the deprotonation state of the primary radical products. This is clearly seen from theoretical calculations performed on model systems [1-4], which illustrate that there exists very little difference in, for example, the spin densities of cations and their deprotonated counterparts.

The thymine cation has not been identified in experiments on single crystals [15] and ab initio calculations predict that this base has the largest IP [72]. However, T(CH2) has been identified in all thymine derivatives [15], an assignment which was supported by HFCCs calculated with DFT. Thus, assuming that the thymine cation is stabilized for a sufficient period of time in DNA to allow for deprotonation, the most abundant secondary thymine radical would be formed via loss of a methyl proton. This hypothesis is supported by the fact that T(CH2) has been identified in the most complete studies on both oriented fibers [76] and randomly oriented DNA [77,78]. Studies of the redox properties of base pairs indicate that one-electron oxidized thymine in DNA should be characterized by both T•+ and T(N3), implying that proton transfer can possibly occur [101].
The T(N3) radical has, however, not been identified in single crystals through comparison of calculated and experimental HFCCs, even in studies on single crystals of base pairs. Moreover, this radical has not been suggested to be

formed in full DNA. This indicates that proton transfer cannot compete with deprotonation at the methyl group. Little experimental evidence has been obtained for the formation of the cytosine cation. Early ESR studies predicted that the cytosine cation is formed in cytosine monohydrate crystals; however, through the use of the ENDOR technique this assignment was determined to be unlikely [15]. In single crystals of deoxycytidine 5'-monophosphate, the cytosine cation was also postulated, but the HFCCs did not match those calculated with DFT. The only direct successor of this cation discussed in the literature is that formed via net hydrogen loss at N1. In cytosine monohydrate crystals, this radical product was postulated, but through comparison with calculated HFCCs, a new mechanism was proposed involving oxidation at water rather than at cytosine. The N1-deprotonated cytosine radical is irrelevant when DNA is considered since the hydrogen at N1 is replaced with deoxyribose. Alternatively, sugar radicals have been observed in some cytosine derivatives [15]. These radicals could be formed from the cytosine cation, where the cationic nature is transferred to deoxyribose and deprotonation subsequently occurs at the sugar moiety. The instability of the cytosine cation in single crystals indicates that upon irradiation of DNA, the formation of the cytosine cation, or its secondary radical products, is unlikely. This is in agreement with results obtained from the redox properties of the base pairs, which determined that the cytosine cation will not deprotonate since guanine is such a weak base [101]. In addition, since cytosine is base paired with guanine, which is well accepted to be the ultimate cationic site in irradiated DNA, transfer of the positive charge from cytosine to guanine (or to the sugar moiety) is more likely than the formation of a cytosine radical by deprotonation.
The adenine cation has not been confidently assigned through comparison of calculated HFCCs and those obtained from single crystals of nonprotonated adenine derivatives unless co-crystallized with another base derivative. However, a study performed on the co-crystals of 1-methyluracil and 9-ethyladenine detected the adenine cation at 10 K [100] and the HFCCs agree well with the calculated values. Furthermore, the cation can be observed in protonated crystals [15]. The extreme conditions at which the adenine cation was observed in these studies are not evident in full DNA. Deprotonation of the adenine cation is expected to occur primarily at the amino group. In single crystals it has been determined that this radical is formed if one of the amino hydrogens is involved in a hydrogen bond to a site which can transfer the damage further away from the initial adenine molecule [20]. In DNA, the proton could be transferred through the hydrogen bond formed with

the base-pair thymine, although further transfer through a hydrogen bond network is not possible. In co-crystals of 1-methylthymine and 9-methyladenine, no products formed via deprotonation of the adenine cation were detected, which was believed to indicate that proton transfer between adenine and thymine is unlikely [102]. These results indicate that stacking and hydrogen bonding effects are not sufficient for radical stabilization. In solution, it has been determined that although the adenine cation is a strong acid, thymine is a poor base and therefore will not abstract a proton from adenine [101]. Ab initio calculations also predict that proton transfer is not favorable in adenine and thymine ion pairs [72]. These results indicate that the effects of base pairing on the formation of the adenine cation or its secondary radicals in DNA are unknown and hydrogen transfer between base pairs cannot be used to justify the most abundant adenine deprotonated radical. An alternative possibility for the formation of A(N6H) in DNA is that the hydrogen not involved in the base-pair hydrogen bonding could be removed. In some adenine crystals, the C1' sugar radical (C1'•) was detected and postulated to be formed from the adenine cation [15]. Thus, if an adenine cation is stabilized for a time longer than that required to transfer its cationic character to guanine, either deprotonation at the amino group or transfer of the cationic character to the sugar moiety is expected. As discussed, it is agreed upon in the literature that guanine is the major oxidation site in DNA. Ab initio calculations on base pairs indicate that the IP of the guanine-cytosine base pair is lowered to a greater extent than the IP of the adenine-thymine base pair relative to guanine and adenine, respectively [103]. This lends even more support to guanine being the major positive center in DNA.
Despite this fact, the HFCCs calculated with DFT do not support the experimental assignment to the guanine cation in single crystals. Deprotonation of the guanine cation is also expected in solution; however, the equilibrium constant was determined to be small. The primary product formed via deprotonation of this cation in single crystals is G(N2H). Alternatively, in solution, deprotonation primarily occurs at N1 [92]. In DNA, deprotonation at N1 or the amino group are both possible due to transfer through a hydrogen bond with cytosine. However, since N3 has been determined to be the most likely site for protonation in cytosine (to be discussed in Section 3.6), transfer from N1 may be favored in DNA. Ab initio calculations have determined that the guanine-cytosine base pair cation can readily undergo proton transfer along the C(N3)-G(N1H) bond, where the activation barrier was calculated to be only 0.9 kcal/mol after correction for zero-point vibrational effects, and the products are only 1.6 kcal/mol higher in energy than the reactants [103]. Alternatively, if transfer does not occur through the hydrogen bonds, but rather protons are released into the surrounding environment as proposed for adenine, then the

amino hydrogen not involved in a hydrogen bond can be removed. Only the G(N1) deprotonated product has been identified thus far in studies of randomly oriented DNA [78]. It has been suggested that since the predicted total yield of anions is larger than the total yield of cations in a 1.4:1 ratio in DNA, some cations may have been left undetected. This provides evidence that oxidation may also occur on the DNA sugar moiety. Deoxyribose has an IP larger than the bases, but smaller than the phosphate group [72], indicating that cation formation could occur on this center. It should also be noted, however, that calculations accounting for the phosphate hydration layer indicate that the IPs of the sugar and the phosphate groups are more similar to one another [72]. In single crystals, direct oxidation of the sugar moiety is expected to result in alkoxyl radicals, which are commonly observed in various base derivatives [15]. Other sugar radicals can be formed directly from alkoxyl radicals or hydrogen atoms can be abstracted by neighboring molecules in the single crystals. Oxidation of a base followed by transfer of the radical character to the sugar moiety can also result in deoxyribose radicals. However, transfer of radical character from the sugar to the base was observed at 200 K in single crystals of 2'-deoxyguanosine 5'-monophosphate and, thus, this pathway may not be relevant to radiation effects on living systems. Any of the mechanisms discussed for the formation of sugar radicals can be expected to lead to deprotonation at any of the carbons (C1' to C5'). In studies on single crystals of base derivatives [15,21], the C1' position appears to be the favored site for deprotonation. It is speculated that thymine and guanine derivatives are more likely to deprotonate at the base rather than transfer character to the sugar group due to the abundant formation of alternative deprotonated radicals.
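The yield argument above can be made concrete with a small calculation. The sketch below assumes that every ionization event creates exactly one electron-loss center and one electron-gain center, and that all anions are detected; this charge-balance assumption is ours and is not a result taken from [78]. The function name is hypothetical.

```python
def undetected_cation_fraction(anion_to_cation_ratio: float) -> float:
    """Fraction of cations missing if true anion and cation yields are equal.

    Assumption (not from the source): each ionization gives one hole and one
    electron, and every anion is observed, so
    observed_cations / true_cations = 1 / anion_to_cation_ratio.
    """
    if anion_to_cation_ratio < 1.0:
        raise ValueError("a ratio below 1 would mean anions, not cations, are missing")
    return 1.0 - 1.0 / anion_to_cation_ratio


# The 1.4:1 anion-to-cation ratio quoted in the text.
fraction = undetected_cation_fraction(1.4)
print(f"{fraction:.1%} of cations unaccounted for")
```

Under these assumptions, a 1.4:1 ratio corresponds to roughly 29% of the cationic centers being unassigned, which gives a sense of how much oxidation could be located on undetected sites such as the sugar moiety.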
The C1'-centered radical has been suggested as a product in oriented fibers [76] and randomly oriented DNA [77,78]. The formation of the C3'-, C4'- and C5'-centered radicals was also postulated in DNA samples [78]. In contrast, the C2' radical has not been suggested to be formed in DNA. This is supported by both ab initio [72] and DFT calculations [5], since both predicted the C2' radical to be much higher in energy than the other carbon-centered radicals, which are all very close in energy. Additional sugar radicals have been observed in single-crystal studies, which involve considerably more damage to the sugar ring than breakage of one bond. The relevance of these structures to DNA is unknown at this time since none of these products have been observed in irradiated samples.
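To give a feel for the proton-transfer energetics quoted earlier in this section for the guanine-cytosine base pair cation (products 1.6 kcal/mol above the reactants [103]), the following sketch evaluates a simple Boltzmann population ratio. It treats the calculated energy difference as if it were a free energy difference and ignores entropy and environmental effects; both simplifications are assumptions made here for illustration. The temperatures are illustrative choices (10 K is typical of the single-crystal experiments cited in the text), and the function name is hypothetical.

```python
import math

R_KCAL = 1.98720e-3  # molar gas constant in kcal/(mol*K)


def boltzmann_ratio(delta_e_kcal: float, temperature_k: float) -> float:
    """Equilibrium population ratio product/reactant for an energy gap.

    Assumes the gap can be used in place of a free energy difference.
    """
    return math.exp(-delta_e_kcal / (R_KCAL * temperature_k))


# Energy of the proton-transferred product relative to the reactant [103].
DELTA_E = 1.6  # kcal/mol

for temperature in (10.0, 298.15):
    ratio = boltzmann_ratio(DELTA_E, temperature)
    print(f"T = {temperature:6.2f} K: product/reactant = {ratio:.3g}")
```

Within this crude picture, a few percent of the base pair cations would be in the proton-transferred form at room temperature, whereas the population is negligible at 10 K.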

Products formed by loss of an electron from the phosphate group have not been identified in single-crystal studies of base derivatives or studies on full DNA. Experiments and calculations indicate that the IP of the phosphate group in DNA or outside the helix is low [72]. However, if an environment which is more relevant to biological systems is considered (for example, inclusion of solvation or counterion effects), then the IP increases by a factor of 2 to 2.5 [72]. Thus, products generated by loss of an electron from the phosphate groups are unexpected in DNA. It is postulated that these radicals are quickly repaired by capture of an electron. The role the water encompassing the DNA strand plays in radiation damage appears to be unsettled. However, it is agreed that water is primarily involved in the radiation process through an oxidation-type mechanism. Oxidation of water leads to free electrons and H2O•+, which can dissociate to form protons and hydroxyl radicals. The hydroxyl radicals can subsequently react with any of the undamaged bases or the sugar group. Aqueous [91,92] and solid-state [34b] results predict that the primary sites for hydroxyl radical addition are across the C5C6 double bond in the pyrimidines and at C8 in the purines, as well as C2 in adenine. In a study of randomly oriented DNA [78], a secondary product was identified to be generated through radical addition to C8 in one of the purines. This species could be attributed to hydroxyl radical addition to C8 in guanine or adenine. Alternatively, hydroxyl radicals can abstract a hydrogen atom to form, for example, T(CH2) or carbon-centered radicals in deoxyribose. Whether hydroxyl radicals prefer to abstract hydrogen from the sugar moiety or add to the bases remains to be determined. It should be noted that although the secondary radicals mentioned in the present section were discussed in terms of formation from the primary cationic centers, other pathways may lead to the equivalent species.
For example, upon irradiation of DNA it is possible to generate excited species. The excess energy on these centers can be relieved by dissociation of an X-H bond that would result in radical products equivalent to those discussed above. Excitation could occur at the bases to yield, for example, T(CH2) or at the sugar group to yield any of the net hydrogen atom removal radicals (C1'• to C5'•).

3.6 DNA anions and secondary radicals

The generation of cations through irradiation of DNA and its surrounding water molecules yields a supply of electrons that can add to the DNA strand to generate anionic centers. Similar to the cations, these anions may be stable under extreme conditions, but they can be expected to rapidly protonate at elevated temperatures. The protons can be obtained from deprotonation of the

base, sugar or water cations. The protonation state of the anions in DNA is difficult to determine. In particular, if the added proton lies in the molecular plane, which is often the case, the resulting HFCCs are very small and extremely difficult to detect even with the sophisticated ENDOR technique. Through comparison of data from single crystals [15] and DFT calculations [14], it can be determined that at 10 K the thymine and cytosine anions are protonated in many different crystals. Since radicals formed through net hydrogen atom addition have been observed with ENDOR spectroscopy even at low temperatures in single crystals, it seems likely that thymine and cytosine radicals should also exist as neutral species in irradiated DNA. The most probable sites for protonation are O4 and N3 in thymine and cytosine, respectively. These protonation sites are even more likely in full DNA samples due to the hydrogen bonding interactions between the base pairs. In particular, the ease of proton transfer along the C(N3)-G(N1H) bond in the guanine-cytosine base pair cation has already been discussed and proton transfer has been determined through ab initio calculations to be favorable in guanine-cytosine ion pairs [72]. Furthermore, if the cytosine anion is formed, which is a strong base, it is base paired with guanine, which is a strong acid, and proton transfer is very favorable [101]. T(O4H) and C(N3H) have been speculated to be formed in full DNA [76,78]. It is also possible to protonate along the C5C6 double bond in both pyrimidines. The thymine C6-hydrogenated radical was observed in the first ESR studies on irradiated DNA [56] and has been identified with more advanced methods [76,78]. It is expected that this radical is predominant since adenine is a weak acid and therefore cannot donate a proton to its thymine base pair at the O4 position.
Ab initio calculations have shown that proton transfer ability across the T(N3H)-A(N1) bond in the adenine-thymine base pair cation is poor [103]. Although transfer between T(O4H) and the adenine amino group was not investigated, other calculations have shown that proton transfer is not favorable in adenine-thymine ion pairs [72]. Furthermore, single-crystal studies indicate that transfer across a hydrogen bond where the acceptor is a ketyl oxygen (=O) represents less favorable conditions for a successful proton transfer [20]. Thus, evidence exists suggesting that proton transfer across the T(O4)-A(N6H) hydrogen bond may be slow. Therefore, other proton-donating agents (such as water or free protons generated from deprotonation of base cations) have an opportunity to react with the thymine anion. In particular, protonation is expected to occur at C6 (or C5) in thymine [T(C6H) or T(C5H)].

In addition to the C(N3H) product, the cytosine N4 protonated radical [C(N4H)] has been proposed experimentally for full DNA samples [78]. This radical has been observed in single crystals of cytosine hydrochloride [104] and couplings calculated with DFT for this radical are in good agreement with experiment even though the chlorine counterions were not included in the model system [39]. If protonation from a neighboring guanine molecule is slow, then there exists the possibility of the formation of the N4-hydrogenated radical. Moreover, the radicals formed by protonation across the C5C6 double bond [C(C5H) or C(C6H)] could be generated, both of which have been observed in single crystals and the assignment is supported by DFT calculations cited in Section 2. The C(C6H) product has also been observed in deuterated DNA samples, where a deuteron adds to C6. However, as indicated by ab initio calculations, proton transfer is favorable in the guanine-cytosine base pair ions and C(N3H) is probably the most predominant cytosine net hydrogen addition radical product [72]. It is interesting to note that cytosine has one more probable protonation product than thymine, which could offer an explanation for the experimentally observed higher yield of the cytosine anion, since it is difficult to detect the differences between the cytosine anion and its protonated analogs by ESR. The adenine anion has also been determined to be protonated in single crystals at very low temperatures. The main protonation site in single crystals is N3, which is supported by DFT calculations [4]. Furthermore, protonation can occur at both C2 and C8, where these sites are favorable under conditions where N3 is not involved in a hydrogen bond in single crystals [15]. In the aqueous state, the adenine anion has been shown to accept a proton from N3 in thymine at the N1 position [101]. This can be followed by a 1,2-shift to form the A(C2H) product [92].
Only the A(N3H) product has been assigned in oriented DNA [76]. However, a product has been identified in randomly oriented DNA and assigned to a net radical addition product at C8 in one of the purines [78], which could be associated with A(C8H). The guanine anion has been suggested as a product in some single crystals. However, since the other three bases were determined to be protonated even at low temperatures and the anion and its protonated form possess similar characteristics, it is unlikely that the guanine anion will be observed directly in irradiated DNA samples. Comparison of single-crystal and calculated results indicates that the primary protonation site for the guanine anion is O6. In full DNA, this position is hydrogen bonded to the amino group of its base-pair cytosine. However, the amino-dehydrogenated cytosine radical has not been observed in either single crystals or irradiated DNA. Furthermore, from studies in aqueous solutions it is known that cytosine is a weak acid [101]. Thus, a simple proton

transfer mechanism seems unlikely. Comparison of single-crystal results and calculations indicates that alternative sites for protonation include C8 and C5. Electron capture at the sugar group is not expected to occur. This is primarily due to the fact that the electron affinities of the bases are much larger than that of the sugar group and therefore the bases are expected to shield deoxyribose. However, a radical formed by a rupture of the phosphoester bond at C5' was determined to be formed at 10 K in 2'-deoxyguanosine 5'-monophosphate (C5'(H2), Figure 13) [21]. Since this radical was formed at such low temperatures, it can be speculated to be generated through a reductive pathway at the sugar group rather than through transfer of character from the base. Thus, although products generated from electron capture at the sugar were not expected as forms of DNA damage in the past, a reductive mechanism involving deoxyribose cannot be ruled out for radical formation. In addition, a similar radical could be formed at the C3' position (C3'(H)). If these radicals are generated in irradiated DNA, then a prompt strand break will occur. It should be noted that hydrogen abstraction radicals have been shown to be products of reduction pathways in related sugars [105]. The phosphate group is also a possible site for electron capture. Two phosphate-centered radicals were discussed in a previous section and speculated to be due to electron gain on the phosphates at either C3' or C5' (P1 or P2, Figure 13) [81]. Radical character could also be transferred to the sugar moiety. Alternatively, as discussed in Section 3.5, electron capture at the phosphate group could lead to elimination of this group, or strand breaks in DNA, by the formation of the C5'(H2) or C3'(H) sugar products. This is thought to occur mainly through abstraction of hydrogen from C4' which forms a radical at this center [21,106].
It should be noted that the products discussed in this section could also be formed via hydrogen atom addition. These hydrogen atoms can be generated via recombination of an electron and a proton or as products following excitation of the bases or sugar moiety. For example, in randomly oriented DNA a radical product was identified as being formed by radical addition to C8 in one of the purines (adenine or guanine) [78].

4. A MULTI-COMPONENT MODEL FOR DNA RADIATION DAMAGE

Figure 14 summarizes the explanations provided in the previous sections for the effects of radiation on the entire DNA strand and the surrounding water molecules. The diagram depicts the formation of the primary radicals (cation

[Figure 14 diagram not reproduced. Product labels recoverable from the figure: T(O4H), T(C6H), T(C5H), T(CH2); C(N3H), C(N4H), C(C5H), C(C6H); A(N3H), A(C2H), A(C8H), A(N6H); G(O6H), G(C8H), G(C5H), G(N2H), G(N1); C/T(C5OH), C/T(C6OH), A/G(C8OH), A(C2OH); C1', C3', C4', C5', C5'(H2), C3'(H); P1, P2.]

Figure 14: A model for radiation damage to DNA which includes damage to the bases, the sugar moiety, the phosphate group and the surrounding water molecules.

and anion radicals) on all bases (T, C, A, G), the phosphate group (P), the sugar moiety (S) and the surrounding water molecules (W). The transformation of each primary radical to secondary radicals is also displayed. The protonation of anions and deprotonation of cations are in strict competition with electron transfer throughout the DNA strand. The electron-transfer mechanisms are not shown in the diagram for simplification. Thus, the formation of secondary radical products is dependent on whether or not the anion is stabilized for a sufficient period of time to allow for protonation (or equivalently deprotonation of cations). Alternatively, hydrogen atoms or hydroxyl radicals can attack the undamaged bases to form the radical products included in the model. The model presented in Figure 14 indicates that a primary product could directly result in the formation of a secondary radical. For example, the thymine cation can deprotonate to form the methyl-dehydrogenated product. An alternative pathway could be that the primary radicals react to form radical products on another center. For example, the cytosine cation was concluded not to deprotonate, but rather to form a sugar radical (indicated by a horizontal line in the figure), which subsequently forms a sugar deprotonated radical. Another example is that water cations form hydroxyl radicals that can abstract a hydrogen atom from the thymine methyl group or from deoxyribose. The protons formed from the water cations, in addition to the hydroxyl radicals, can add to any of the base anions to form protonated products (these processes are also indicated by horizontal lines in the figure). From the model developed in the present chapter and displayed in Figure 14, it can be seen that the possibilities of radical formation in irradiated DNA are extremely abundant. Since these are the most probable radical products in irradiated DNA, this model may be useful when attempting to characterize the ESR spectra of DNA.
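The primary-to-secondary pathways of the model can also be written down as a small lookup table, which may be convenient when listing candidate radicals for a spectral assignment. The sketch below is only one possible encoding, assembled from the pathways named in Sections 3.5 and 3.6 and the labels of Figure 14; the dictionary layout, the key names (for example "S+" for the sugar cation and "W+" for the water cation) and the helper functions are hypothetical. Electron-transfer steps are omitted, as in the figure.

```python
# Primary radical -> secondary products discussed in the text (illustrative encoding).
SECONDARY_PRODUCTS = {
    # electron-loss centers
    "T+": ["T(CH2)"],
    "C+": ["sugar radical"],
    "A+": ["A(N6H)", "sugar radical"],
    "G+": ["G(N1)", "G(N2H)"],
    "S+": ["C1'", "C3'", "C4'", "C5'"],
    "W+": ["OH radical", "H+"],
    # electron-gain centers
    "T-": ["T(O4H)", "T(C6H)", "T(C5H)"],
    "C-": ["C(N3H)", "C(N4H)", "C(C5H)", "C(C6H)"],
    "A-": ["A(N3H)", "A(C2H)", "A(C8H)"],
    "G-": ["G(O6H)", "G(C8H)", "G(C5H)"],
    "S-": ["C5'(H2)", "C3'(H)"],
    "P-": ["P1", "P2"],
}


def candidates(primary: str) -> list[str]:
    """Secondary products listed for a primary radical (empty if unknown)."""
    return list(SECONDARY_PRODUCTS.get(primary, []))


def primaries_for(product: str) -> list[str]:
    """Primary radicals that list the given secondary product, sorted by name."""
    return sorted(p for p, prods in SECONDARY_PRODUCTS.items() if product in prods)


print(candidates("C-"))
print(primaries_for("sugar radical"))
```

A table of this kind makes the many-to-one character of the model explicit: the same secondary product can be reached from more than one primary center, which is one reason the assignments discussed above are difficult.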
In order to narrow the formation of radical products further, more experimental work must be performed to rule out each product. For example, many experimental studies have shown that the formation of a specific radical cannot be eliminated solely due to the fact that its signal is not observed with ESR, since often a strong ENDOR signal will be obtained with the same sample. It is postulated that as experimental techniques become more advanced and are able to characterize more products, evidence will be obtained to support the current working model for radiation damage to DNA.

5. CONCLUDING REMARKS

The discussion presented in the present chapter illustrates the diversity of radical products generated in irradiated DNA samples. The knowledge of which

radicals are formed has important consequences for determining the type of damage exhibited (for example, strand-breaks, tandem lesions, DNA-protein cross-links, unaltered base release). The model outlined herein is much more complex than the original two-component model which speculated that initial radiation damage centers on the formation of only two ionic radicals. Moreover, early researchers have claimed on occasion that the "complexity of the DNA radical population" can be explained by the formation of four radicals [85]. From the multi-component model presented herein, it can be determined that this is clearly not true. The determination of the radicals generated upon irradiation of DNA leads to a broader area of research which can investigate how these radicals are formed or, more importantly, how they subsequently react to result in permanent damage to the DNA strand.

6. ACKNOWLEDGEMENTS

We thank the Natural Sciences and Engineering Research Council of Canada (NSERC), the Swedish Natural Science Research Council (NFR), and the Killam Trusts for financial support.

REFERENCES

1. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 5369.
2. S. D. Wetmore, F. Himo, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 7484.
3. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 9332.
4. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 10602.
5. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 7674.
6. A. D. Becke, J. Chem. Phys., 98 (1993) 1372.
7. C. Lee, W. Yang and R. G. Parr, Phys. Rev. B, 37 (1988) 785.

8. R. Ditchfield, W. J. Hehre and J. A. Pople, J. Chem. Phys., 54 (1971) 724; W. J. Hehre, R. Ditchfield and J. A. Pople, J. Chem. Phys., 56 (1972) 2257; P. C. Hariharan and J. A. Pople, Mol. Phys., 27 (1974) 209; M. S. Gordon, Chem. Phys. Lett., 76 (1980) 163; P. C. Hariharan and J. A. Pople, Theor. Chim. Acta, 28 (1973) 213; A. D. McLean and G. S. Chandler, J. Chem. Phys., 72 (1980) 5639; R. Krishnan, J. S. Binkley, R. Seeger and J. A. Pople, J. Chem. Phys., 72 (1980) 650; T. Clark, J. Chandrasekhar, G. W. Spitznagel and P. v. R. Schleyer, J. Comput. Chem., 4 (1983) 294; M. J. Frisch, J. A. Pople and J. S. Binkley, J. Chem. Phys., 80 (1984) 3265.
9. J. P. Perdew and Y. Wang, Phys. Rev. B, 33 (1986) 8800.
10. (a) J. P. Perdew, Phys. Rev. B, 33 (1986) 8822; (b) J. P. Perdew, Phys. Rev. B, 34 (1986) 7406.
11. Gaussian 94 (Revision B.2), M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, T. A. Keith, G. A. Petersson, J. A. Montgomery, K. Raghavachari, M. A. Al-Laham, V. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M. Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomperts, R. L. Martin, D. J. Fox, J. S. Binkley, D. J. Defrees, J. Baker, J. P. Stewart, M. Head-Gordon, C. Gonzalez, and J. A. Pople, Gaussian, Inc., Pittsburgh PA, 1995.
12. A. St-Amant and D. R. Salahub, Chem. Phys. Lett., 169 (1990) 387; A. St-Amant, Ph.D. thesis, Université de Montréal, 1991; D. R. Salahub, R. Fournier, P. Mlynarski, I. Papai, A. St-Amant and J. Ushio, In Density Functional Methods in Chemistry; J. Labanowski and J. Andzelm, Eds.; Springer: New York, 1991.
13. L. A. Eriksson, Mol. Phys., 91 (1997) 827.
14. (a) V. G. Malkin, O. L. Malkina, L. A. Eriksson and D. R. Salahub, In Modern Density Functional Theory: A Tool for Chemistry; J. M. Seminario and P. Politzer, Eds.; Elsevier: New York, 1995; (b) B. Engels, L. A. Eriksson and S. Lunell, Adv. Quantum Chem., 27 (1997) 297; (c) L. A. Eriksson, In Encyclopedia of Computational Chemistry; P. v. R. Schleyer, Ed.; Wiley and Sons: New York, 1998; (d) L. A. Eriksson and F. Himo, Trends in Physical Chemistry, 6 (1997) 153.

15. D. M. Close, Radiat. Res., 135 (1993) 1.
16. K. Miaskiewicz, J. Miller and R. Osman, Int. J. Radiat. Biol., 63 (1993) 677.
17. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, J. Phys. Chem., 96 (1992) 1121.
18. L. Kar and W. A. Bernhard, Radiat. Res., 93 (1983) 232.
19. D. M. Close and W. H. Nelson, Radiat. Res., 117 (1989) 367.
20. W. H. Nelson, E. Sagstuen, E. O. Hole and D. M. Close, Radiat. Res., 149 (1998) 75.
21. E. O. Hole, W. H. Nelson, E. Sagstuen and D. M. Close, Radiat. Res., 129 (1992) 119.
22. A.-O. Colson and M. D. Sevilla, J. Phys. Chem., 100 (1996) 4420.
23. E. O. Hole, W. H. Nelson, D. M. Close and E. Sagstuen, J. Chem. Phys., 86 (1987) 5218.
24. E. O. Hole, E. Sagstuen, W. H. Nelson and D. M. Close, Radiat. Res., 129 (1992) 1.
25. E. O. Hole, E. Sagstuen, W. H. Nelson and D. M. Close, J. Phys. Chem., 95 (1991) 1494.
26. W. H. Nelson, E. Sagstuen, E. O. Hole and D. M. Close, Radiat. Res., 131 (1992) 272.
27. D. M. Close, W. H. Nelson and E. Sagstuen, Radiat. Res., 112 (1987) 283.
28. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, Radiat. Res., 116 (1988) 196.
29. W. H. Nelson, E. O. Hole, E. Sagstuen and D. M. Close, Int. J. Radiat. Biol., 54 (1988) 963.

30. E. O. Hole, E. Sagstuen, W. H. Nelson and D. M. Close, Radiat. Res., 125 (1991) 119.
31. F. Jolibois, J. Cadet, A. Grand, R. Subra, N. Rega and V. Barone, J. Am. Chem. Soc., 120 (1998) 1864.
32. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, J. Phys. Chem., 96 (1992) 8269.
33. W. Hiraoka, M. Kuwabara, F. Sato, A. Matsuda and T. Ueda, Nucl. Acids Res., 18 (1990) 1217.
34. (a) D. Becker, T. La Vere and M. D. Sevilla, Radiat. Res., 140 (1994) 123; (b) D. Becker, T. La Vere and M. D. Sevilla, Radiat. Res., 145 (1996) 673.
35. M. Wala, E. Bothe, H. Görner and D. Schulte-Frohlinde, J. Photochem. Photobiol. A, Chemistry, 53 (1990) 87.
36. (a) D. Chapman and C. Gillespie, J. Adv. Radiat. Biol., 9 (1981) 143; (b) R. Téoule, Int. J. Radiat. Biol., 51 (1987) 573.
37. S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 60 (1974) 388.
38. D. M. Close, E. Sagstuen, E. O. Hole and W. H. Nelson, J. Phys. Chem. B, 103 (1999) 3049.
39. S. D. Wetmore, R. J. Boyd, F. Himo and L. A. Eriksson, J. Phys. Chem. B, 103 (1999) 3051.
40. D. M. Close and W. A. Bernhard, J. Chem. Phys., 70 (1979) 210.
41. M. N. Schuchmann and C. von Sonntag, J. Chem. Soc., Perkin Trans., 2 (1977) 1958.
42. D. M. Close, Radiat. Res., 147 (1997) 663.
43. D. M. Close, W. H. Nelson, E. Sagstuen and E. O. Hole, Radiat. Res., 137 (1994) 300.
44. K. Miaskiewicz and R. Osman, J. Am. Chem. Soc., 116 (1994) 232.

45. A.-O. Colson and M. D. Sevilla, J. Phys. Chem., 99 (1995) 3867.
46. W. Saenger, In Principles of Nucleic Acid Structure; Springer-Verlag: New York, 1984.
47. Effects of Ionizing Radiation on DNA; J. Hüttermann, W. Köhnlein, R. Téoule and A. J. Bertinchamps, Eds.; Springer: Heidelberg, 1978.
48. E. Sagstuen, J. Mag. Res., 44 (1981) 518.
49. E. O. Hole, W. H. Nelson, E. Sagstuen and D. M. Close, Radiat. Res., 130 (1992) 148.
50. C. Alexander, Jr. and C. E. Franklin, J. Chem. Phys., 54 (1971) 1909.
51. B. Rakvin and J. N. Herak, Radiat. Res., 88 (1981) 240.
52. E. Sagstuen, Radiat. Res., 84 (1980) 164.
53. E. O. Hole and E. Sagstuen, Radiat. Res., 109 (1987) 190.
54. A. Ehrenberg, L. Ehrenberg and G. Löfroth, Nature, 200 (1963) 376.
55. R. Salovey, R. G. Shulman and W. M. Walsh, Jr., J. Chem. Phys., 39 (1963) 839.
56. P. S. Pershan, R. G. Shulman, B. J. Wyluda and J. Eisinger, Science, 148 (1964) 378.
57. A. Ehrenberg, A. Rupprecht and G. Ström, Science, 157 (1967) 1317.
58. M. G. Ormerod, Int. J. Radiat. Biol., 9 (1965) 291.
59. A. Gräslund, A. Ehrenberg, A. Rupprecht and G. Ström, Biochim. Biophys. Acta, 254 (1971) 172.
60. A. Gräslund, A. Ehrenberg, A. Rupprecht, B. Tjälldin and G. Ström, Radiat. Res., 61 (1975) 488.

61. A. Gräslund, A. Ehrenberg, A. Rupprecht, G. Ström and H. Crespi, Int. J. Radiat. Biol., 28 (1975) 313.
62. W. A. Bernhard, Adv. Radiat. Biol., 9 (1981) 199.
63. I. Zell, J. Hüttermann, A. Gräslund, A. Rupprecht and W. Köhnlein, Free Radical Res. Commun., 6 (1989) 105.
64. P. M. Cullis, J. D. McClymont, M. E. Malone, A. N. Mather, I. D. Podmore, M. C. Sweeney and M. C. R. Symons, J. Chem. Soc., Perkin Trans., 2 (1992) 1695.
65. W. A. Bernhard, J. Phys. Chem., 93 (1989) 2187.
66. J. Barnes, W. A. Bernhard and K. R. Mercer, Radiat. Res., 126 (1991) 104.
67. M. D. Sevilla, D. Becker, M. Yan and S. R. Summerfield, J. Phys. Chem., 95 (1991) 3409.
68. S. Steenken, J. P. Telo, H. M. Novais and L. P. Candeias, J. Am. Chem. Soc., 114 (1992) 4701.
69. M. Yan, D. Becker, S. Summerfield, P. Renke and M. D. Sevilla, J. Phys. Chem., 96 (1992) 1938.
70. S. G. Swarts, D. Becker, M. D. Sevilla and K. T. Wheeler, Radiat. Res., 145 (1996) 304.
71. (a) D. M. Close, E. Sagstuen and W. H. Nelson, J. Chem. Phys., 82 (1985) 4386; (b) E. O. Hole, W. H. Nelson, D. M. Close and E. Sagstuen, J. Chem. Phys., 86 (1987) 5218.
72. A.-O. Colson and M. D. Sevilla, Int. J. Radiat. Biol., 67 (1995) 627.
73. J. Hüttermann, M. Röhrig and W. Köhnlein, Int. J. Radiat. Biol., 61 (1992) 299.
74. J. Hüttermann, K. Voit, H. Oloff, W. Köhnlein, A. Gräslund and A. Rupprecht, Faraday Discuss. Chem. Soc., 78 (1984) 135.

465

75. W. Wang, M. Yan, D. Becker and M. D. SeviUa, Radiat. Res., 135 (1994) 2. 76. W. Gatzweiler, J. Htittermann and A. Rupprecht, Radiat. Res., 138 (1994) 151. 77. B. Weiland, J. Htittermann and J. van Tol, Acta Chem. Scan., 51 (1997) 585. 78. B. Weiland and J. Htittermann, Int. J. Radiat. Biol., 74 (1998) 341. 79. I. D. Podmore, M. E. Malone, M. C. R. Symons, P. M. Cullis and B. G. Dalgarno, J. Chem. Soc. Faraday Trans., 2 (1991) 3647. 80. J. Htittermann, Ultramicroscopy, 10 (1982) 25. 81. D. Becker, Y. Razskazovskii, M. U. Callaghan and M. D. Sevilla, Radiat. Res., 146 (1996) 361. 82. A. Sanderud and E. Sagstuen, J. Chem. Soc. Faraday Trans., 91 (1996) 995. 83. D. J. Nelson, M. C. R. Symons and J. L. Wyatt, J. Chem. Soc. Faraday Trans., 89 (1993) 1955. 84. W. Saenger, Principles of Nucleic Acid Structure, C. R. Cantor, Ed.; Springer-Veflag: New York, 1984. 85. S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 89 (1982) 238. 86. W. Wang, D. Becker and M. D. Sevilla, Radiat. Res., 135 (1993) 146. 87. S. G. Swarts, M. D. Sevilla, D. Becker, C. J. Tokar and K. T. Wheeler, K. T. Radiat. Res., 129 (1992) 333. 88. T. Ito, S. C. Baker, C. D. Stickley, J. G. Peak and M. J. Peak, Int. J. Radiat. Biol., 63 (1993) 289. 89. N. Mroczka and W. A. Bernhrad, Radiat. Res., 135 (1993) 155. 90. J. Ohlmann and J. Htittermann, Int. J. Radiat. Biol., 63 (1993) 427. 91. C. von Sonntag and H.-P. Schuchmann, Int. J. Radiat. Biol., 49 (1986) 1.

466

92. S. Steenken, Chem. Rev., 89 (1989) 503. 93. J. Htittermann, M. Lange and J. Ohlmann, Radiat. Res., 131 (1992) 18. 94. (a) S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 65 (1976) 202; (b) S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 72 (1977) 201. 95. M. Malone, M. C. R. Symons and A. W. Parker, J. Chem. Soc. Perkin Trans., 2 (1993) 2067. 96. M. Lange, B. Wetland and J. Htittermann, Int. J. Radiat. Biol., 68 (1995) 475. 97. D. M. Close, E. Sagstuen and W. H. Nelson, Radiat. Res., 116 (1988) 379. 98. M. D. Sevilla and D. Becker, In A Specialists Periodical Report Electron Spin Resonance, Vol. 14, N. M. Atherton, M. J. Davis and B. C. Gilbert, Eds.; Royal Society of Chemistry: Cambridge, 1994, p. 130. 99. D. M. Close and W. A. Bernhard, Bull. Am. Phys. Soc., 25 (1980) 416. 100. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, Radiat. Res., 149 (1998) 120. 101. S. Steenken, Free Radical Res. Commun., 16 (1992) 349. 102. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, Radiat. Res., 146 (1996) 425. 103. M. Hutter and T. Clark, J. Am. Chem. Soc., 118 (1996) 7574. 104. E. O. Hole, W. H. Nelson, E. Sagstuen and D. M. Close, Radiat. Res., 149 (1998) 109. 105. E. Sagstuen, M. Lindgren and A. Lund, Radiat. Res., 128 (1991) 235. 106. S. Steenken and L. Goldbergerova, J. Am. Chem. Soc., 120 (1998) 3928.

L.A. Eriksson (Editor)

Theoretical Biochemistry - Processes and Properties of Biological Systems. Theoretical and Computational Chemistry, Vol. 9. © 2001 Elsevier Science B.V. All rights reserved.

Chapter 12

New Computational Strategies for the Quantum Mechanical Study of Biological Systems in Condensed Phases

Carlo Adamo, Maurizio Cossi, Nadia Rega and Vincenzo Barone

Laboratory for the Structure and Dynamics of Molecules (LSDM), Dipartimento di Chimica, Università 'Federico II', via Mezzocannone 4, I-80134 Napoli, Italia

ABSTRACT
This chapter examines some of the methodological and computational aspects involved in the modeling of biomolecular systems at a quantum-mechanical level. In the first part we analyze in some detail a general strategy allowing an effective study of physico-chemical processes involving large molecules in condensed phases. The main building block of our approach is a modular electronic structure tool rooted in density functional theory, coupled to an effective description of environmental effects by a mixed discrete-continuum model. The potential energy surfaces obtained in this way provide the input for a numerical treatment of a small number of large amplitude motions (possibly involving light particles) coupled to a harmonic bath. In the second part of this contribution we discuss a number of prototypical applications, with the aim of giving a flavor of the potential and of the upcoming developments of this integrated approach. The 'fil rouge' of our report is provided by open-shell systems, which are at the same time key intermediates in a number of biochemical processes and particularly challenging systems for both experimental investigations and quantum mechanical computations.

1. INTRODUCTION

The theoretical treatment of biomolecular systems is becoming increasingly important in modern science for at least two reasons. On the one hand, theoretical studies make it possible to obtain information that cannot easily be accessed by experimental methods, and to dissect an overall effect into different contributions simply by switching different interactions on and off in a selective way. On the other hand, working hypotheses can be formulated that stimulate further experimental work and reduce the number of tests to be performed. Of course, these tasks can be fulfilled only if the accuracy and the reliability of theoretical results match the experimental standards. While conventional approaches have reached a remarkable accuracy for small and medium size systems, biologically interesting molecules are invariably large,
flexible, and do not act in vacuo, but in aqueous solution. Even if effective numerical simulations can be routinely performed by empirical energy calculations for chemically significant models, a number of problems (e.g. reactivity, proton and electron transfer, spectroscopic and photochemical processes) require a quantum mechanical approach. Thus theoretical and computational chemistry are presently facing the very demanding challenge of extending the applicability of quantum mechanical approaches to large molecules. Both hardware and software developments are contributing to this task, leading to the first applications of reliable electronic structure methods to macromolecular systems. A leading role in this progress is played by fast multipole moment (FMM) methods, sparse matrix algorithms, and conjugate gradient density matrix search (CG-DMS) techniques for solving the self-consistent field (SCF) problem. At the same time, faster algorithms are being developed for geometry optimizations of large molecules, and effective composite approaches further reduce computer times.

In this context, the situation is particularly favourable for Kohn-Sham (KS) methods, although promising progress is being made for post-HF models too. It seems, therefore, particularly important to examine the limits of current density functionals (DF) for the description of specific features of biological systems, like non-covalent interactions, proton and electron transfer, or spectroscopic parameters. This should hopefully allow the development of new functionals with improved reliability in these fields. As a consequence, the first section of this contribution is devoted to the work being performed in our laboratory in the framework of density functional theory.

From another point of view, biological molecules are often very flexible, so that a realistic computation of their properties cannot neglect vibrational averaging effects from large amplitude motions. This aspect is examined in the second part of this work.

Finally, biological processes occur in solution, so that the modeling of physico-chemical processes at a microscopic level must be extended from isolated molecules to condensed phases. While explicit inclusion of solvent molecules in numerical simulations is providing interesting results, only short-time local fluctuations are presently amenable to routine computations, thus leaving aside fundamental phenomena like conformational transitions or protein folding. At the same time, continuum approaches are becoming more and more effective and reliable, thanks to the increasing accuracy of the underlying model coupled to their remarkable flexibility and efficiency. Here we will concentrate on the so-called polarizable continuum model (PCM), which, thanks to a number of recent improvements, is rapidly approaching the target of 'chemically accurate' computations for systems in condensed phases. As a matter of fact, nearly all the quantum mechanical procedures (including analytical gradients
and hessians) developed for isolated systems are now available (with comparable computational efficiency) also for systems in solution. A brief sketch of the status of the PCM is given in the third part of the paper, together with some illustrative results. Finally, the usefulness of the computational tools sketched in the first three sections is analyzed in the last part of the report by means of some case processes involving unstable intermediates (radicals and zwitterionic species) of biological significance.

2. THE DENSITY FUNCTIONAL MODEL

In the last few years Density Functional Theory (DFT) has become one of the most powerful tools in computational chemistry [1-3]. An increasing number of studies deal with DFT, both in the field of basic theoretical developments and in the wide framework of chemical applications. There are several reasons for this success. First of all, methods rooted in DFT take into account a significant amount of electron correlation, providing accurate numerical results. As a matter of fact, the latest DFT implementations show an accuracy comparable to that of many-body perturbative methods [4]. Another major advantage of DFT is its favourable scaling with the size of the system under investigation. The Kohn-Sham (KS) approach [5], the most common route to DFT, rests on equations which are close to those developed for the Hartree-Fock (HF) theory [1]. It was therefore quite easy to implement this model in several commercial quantum-mechanical packages, introducing only slight modifications into already existing software. As a consequence, standard DFT approaches have reached nearly the same basis-set dependence as the HF method [6].

Furthermore, they can take advantage of the most recent implementations in the field of Self Consistent Field methods. For instance, algorithms like Fast-Multipole Methods (FMM) [7] or fast assembly of the Hamiltonian matrix [8] have been successfully applied to DFT methods, essentially without any modification. In this way asymptotic linear scaling has been obtained, and sizeable systems (up to several hundreds of atoms) can be handled by this quantum mechanical tool. At the same time, the formally independent-particle nature of DFT allows the application of standard interpretative tools developed for the HF approach. This is true not only for the standard Mulliken population analysis, but also for more sophisticated schemes, like the Natural Bond Orbital (NBO) analysis [9], the Atomic Polar Tensor population [10], or the Atoms in Molecules (AIM) approach [11]. These tools allow the use of familiar and well known models to analyze the molecular wave function and to rationalize it in terms of classical chemical concepts. In short, DFT is providing very effective quantum
mechanical tools, which take into account most of the electron correlation, at a fraction of the computational cost required by conventional post-HF methods. However, the weakness of the DFT approach is represented by the non-classical part of the Hamiltonian, the so-called exchange-correlation contribution, which is an unknown functional of the electron density. A huge number of exchange and correlation functionals have been proposed, characterized by different physical soundness and numerical performance. In this context, the so-called hybrid HF/DFT models, which mix some HF exchange with DFT contributions, are nowadays considered as standards for their good performance. In particular, the popular B3LYP approach provides results close to the so-called chemical accuracy for the properties of systems involving covalent bonds (e.g. the thermochemistry of molecules belonging to the so-called G2 data set) and also for some non-covalent interactions, like hydrogen bonds. However, hybrid methods, as well as conventional DFT approaches, are not yet sufficiently accurate for a number of chemical problems, like van der Waals complexes, proton transfer, or SN2 reactions. These limits provide a strong driving force for the quest for new functionals. In our opinion, a major requirement for a successful exchange-correlation functional is its generality: a "good" functional should treat different chemical interactions and properties with the same accuracy, avoiding any excessive 'specialization' for a specific subset of interactions or properties. In the next paragraph we will discuss this last point in some detail. Of course we cannot be exhaustive, and we refer the reader to published reviews and textbooks for a more complete analysis [1-3].

2.1 Functionals of the electronic density

In the Kohn-Sham (KS) approach to DFT [1,5], the total energy can be expressed as:

$$E[\rho] = T_s[\rho] + V_{ext}[\rho] + J[\rho] + E_{xc}[\rho] \tag{1}$$

Here, $V_{ext}[\rho]$ is the potential energy in the field of the nuclei plus any external perturbation, $T_s[\rho]$ is the kinetic energy of a set of n independent electrons, moving in an effective one-electron potential which leads to the density $\rho(\mathbf{r})$, and $J[\rho]$ is the total Coulomb interaction [1]. $E_{xc}[\rho]$ is the remainder, usually described as the exchange-correlation energy. This term represents the key problem in DFT, since the exact $E_{xc}$ is unknown and approximations must be used. The simplest approach is the local spin density approximation (LSD), in which the functional for the uniform electron gas of density $\rho$ is integrated over the whole space:

$$E^{LSD} = \sum_{\sigma} \int \varepsilon_{xc}^{unif}(\rho_\sigma)\,\rho_\sigma(\mathbf{r})^{4/3}\,d\mathbf{r} \tag{2}$$

where $\varepsilon_{xc}^{unif}(\rho_\sigma)$ is the exchange-correlation energy per particle of a uniform electron gas and $\sigma$ represents the spin ($\alpha$ or $\beta$). While this approximation is responsible for the early success of DFT, it often provides unsatisfactory results in chemical applications [3]. Starting from equation (2), several corrections for the non-uniformity of atomic and molecular densities have been proposed. In particular, those based on the gradient of the electron density ($\nabla\rho$) have received considerable attention in recent years due to their simplicity. These corrections, collectively referred to as the generalized gradient approximation (GGA), are usually expressed in terms of an enhancement factor over the exchange energy of the uniform electron gas, so that the total exchange energy takes the form:

$$E^{GGA} = E^{LSD} - \sum_{\sigma} \int F^{GGA}[\rho_\sigma, \nabla\rho_\sigma]\,\rho_\sigma(\mathbf{r})^{4/3}\,d\mathbf{r} \tag{3}$$

While $\varepsilon_{xc}^{unif}(\rho_\sigma)$ in equation (2) is uniquely defined, there is no unique function $F^{GGA}$, and a number of GGA exchange-correlation functionals have been proposed (see for instance refs. 12-20). Roughly speaking, we can recognize two main classes: the first collects the functionals containing parameters fitted to some sets of experimental data, while the second includes functionals which fulfill a number of theoretical physical constraints. Although most existing functionals combine both approaches, attention has recently been shifting to the first aspect, even at the expense of introducing a huge number of parameters and of overemphasizing the thermochemistry of organic molecules [21]. In contrast with this tendency, functionals belonging to the second class are particularly attractive to theoretical chemists, due to their strong theoretical background and to the absence of any "specialization". Furthermore, a number of recent studies are showing that some parameter-free functionals are no less accurate than the most successful heavily parametrized models [22-27]. Despite the theoretical difficulties involved in the development of such functionals, their number is increasing and there is a demand for even more stringent theoretical constraints [28-32]. It is thus natural to focus our attention on this class of exchange-correlation functionals and on their performance in the field of biological applications.
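As a minimal numerical sketch of the kind of integral appearing in equation (2), the exchange-only part of the LSD energy can be evaluated for the ground-state density of the hydrogen atom. Everything specific in the example is an assumption introduced here for illustration and is not taken from the chapter: the standard Dirac exchange coefficient `C_X`, the spin-scaling relation used for a fully spin-polarized density, the 1s density, and the function names.

```python
import math

# Dirac exchange coefficient of the spin-unpolarized uniform electron gas
# (hartree atomic units). Assumption: exchange-only illustration.
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)


def rho_1s(r):
    """Hydrogen-atom ground-state density, rho(r) = exp(-2r)/pi."""
    return math.exp(-2.0 * r) / math.pi


def radial_integral(f, r_max=40.0, n=40000):
    """Trapezoidal estimate of the 3D integral of a spherically symmetric f(r)."""
    h = r_max / n
    total = 0.0
    # End points are skipped: r^2 is zero at the origin and f is negligible at r_max.
    for i in range(1, n):
        r = i * h
        total += r * r * f(r)
    return 4.0 * math.pi * h * total


def lsd_exchange_one_electron(rho):
    """LSD exchange of a fully spin-polarized density.

    Uses the spin-scaling relation E_x[rho_up, 0] = 0.5 * E_x[2 * rho_up],
    where the right-hand side is the spin-unpolarized functional.
    """
    integral = radial_integral(lambda r: (2.0 * rho(r)) ** (4.0 / 3.0))
    return -0.5 * C_X * integral


e_x_numeric = lsd_exchange_one_electron(rho_1s)

# Closed form for the same density: the integral of rho^(4/3) is (27/64) * pi^(-1/3).
e_x_analytic = -C_X * 2.0 ** (1.0 / 3.0) * (27.0 / 64.0) * math.pi ** (-1.0 / 3.0)

print(f"numerical  E_x(LSD) = {e_x_numeric:.6f} hartree")
print(f"analytical E_x(LSD) = {e_x_analytic:.6f} hartree")
```

The numerical and closed-form values should agree closely; both lie well above the exact exchange energy of the hydrogen atom (-0.3125 hartree), which is the kind of deficiency that gradient corrections are meant to reduce.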

2.2 The PBE functional

The non-empirical GGA functional of Perdew, Burke and Ernzerhof (PBE) [28] can be considered the most promising non-empirical functional. In particular, it was constructed to respect a number of physical constraints both in the correlation and in the exchange parts. A detailed discussion of the physical background of the PBE functional is given in references 33 and 34. Here we just recall that it obeys the following six conditions:

1) the correct uniform gas limit
2) the correct spin and uniform density scaling of $E_x$
3) the correct upper bound $E_x < 0$
4) the correct upper bound $E_c < 0$
5) the correct Lieb-Oxford lower bound [35]
6) the LSD linear response.

In this functional the correlation part is similar to the Perdew-Wang (PW) correlation functional [36], while the exchange contribution is:

$$F^{PBE} = 1 + \kappa - \frac{\kappa}{1 + \dfrac{\mu s^2}{\kappa}} \tag{4}$$

with $\kappa = 0.804$, $\mu = 0.21951$ and $s = |\nabla\rho| / (2 k_F \rho)$. This form is not completely new, because it is the same used by Becke in his 1986 paper [37], with the $\kappa$ and $\mu$ parameters (0.967 and 0.235, respectively) determined by a fitting procedure. What is new, and makes the strength of the PBE approach, is that $\kappa$ and $\mu$ in equation 4, as well as the other parameters in the GGA correlation functional, have been obtained by imposing the above mentioned constraints [28]. These conditions determine the behavior of the functional and its numerical performance for different chemical "situations". For instance, we have recently shown that the correct asymptotic conditions are less important in high-density regions (which correspond to covalent bonds) than in low-density regions. This point is of particular importance, since the latter regions are responsible for non-covalent interactions, such as H-bonds, van der Waals (vdW) and charge transfer (CT) interactions, and for spin polarization effects in EPR observables [38]. It must be remarked that some of the conditions are fulfilled also by other current exchange-correlation functionals. For instance, the Becke 88 exchange functional [14] respects only three of the above conditions, as does the functional developed by Gill [16]. Table 1 collects error statistics for several density functionals, concerning the atomization energies of 55 molecules belonging to the so-called G2 set [39],
which is nowadays considered a standard for the validation of new quantum chemical approaches [40].

Table 1. Mean absolute errors (mae's, kJ/mol) and maximum errors (kJ/mol) for atomization energies of the original G2 set (55 molecules). The values have been computed using the MP2/6-31G(d) geometries of reference 39 and the 6-311+G(2df,2pd) basis set.

Method                mae      max error
GGA functionals
  BLYP                40.2     107.9 (CO2)
  PBE                 36.0     110.4 (CO2)
  BPBE                27.2      98.7 (CO2)
  RevPBE              20.1     -82.8 (Si2H6)
  RPBE                18.8     -86.6 (Si2H6)
Hybrid functionals
  B3LYP               10.0     -34.3 (BeH)
  PBE0                14.6     -42.7 (Si2H6)
τ-functionals
  mGGA                14.2      56.1 (O2)

These results give a flavor of the performance of the different functionals for this set of covalently bonded molecules, and can be considered as a starting point for a deeper discussion about chemical applications. From these data, it is quite apparent that the PBE functional performs as well as more empirical DFT approaches, like the BLYP model (Becke 88 exchange [14] and Lee-Yang-Parr correlation [19]). In table 2 we report the deviations for the geometrical parameters and harmonic vibrational frequencies of 32 molecules belonging to the G2 set. Here, the deviations of PBE are close to those provided by the BLYP functional, thus giving further support to the reliability of this model. It is clear, however, that these results are still far from the accuracy required for chemical applications (e.g. about 5 kJ/mol for atomization energies). Furthermore, the PBE functional suffers from other problems. For instance, the energy barriers for proton transfer reactions [22], as well as some chemisorption energies [31], are still significantly underestimated.
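The error statistics quoted in the tables (mean absolute error and signed maximum error) can be sketched in a few lines. The sign convention (computed minus reference), the function name and, above all, the numerical data are assumptions: the three "molecules" and their energies are made-up placeholders, not values from the G2 set or from this chapter.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def error_statistics(computed, reference):
    """Return (mae, label of the worst case, signed error of the worst case).

    Errors are defined here as computed minus reference (an assumed convention).
    """
    errors = {name: computed[name] - reference[name] for name in reference}
    mae = sum(abs(e) for e in errors.values()) / len(errors)
    worst = max(errors, key=lambda name: abs(errors[name]))
    return mae, worst, errors[worst]


# Made-up atomization energies in kJ/mol for three placeholder molecules.
reference = {"mol_A": 1000.0, "mol_B": 1500.0, "mol_C": 800.0}
computed = {"mol_A": 1010.0, "mol_B": 1470.0, "mol_C": 805.0}

mae, worst, worst_error = error_statistics(computed, reference)
print(f"mae = {mae:.1f} kJ/mol ({mae / KJ_PER_KCAL:.2f} kcal/mol)")
print(f"max error = {worst_error:+.1f} kJ/mol ({worst})")
```

Reporting the signed maximum error together with the molecule label, as the tables do, shows whether the worst case is an overestimate or an underestimate, which the mean absolute error alone hides.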

Table 2. Mean absolute errors (mae's) and maximum errors for the bond lengths (d, Å) and harmonic vibrational frequencies (ν, cm-1) of the molecules belonging to the reduced G2 data set. All values have been computed using the 6-311G(d,p) basis set.

                      d (Å)                      ν (cm-1)
Method                mae      max err.          mae    max err.
GGA functionals
  BLYP                0.013    0.075 (Li2)       77     212 (CH)
  PBE                 0.011    0.064 (Li2)       59     194 (H2CO, 2b2)
  revPBE              0.012    0.072 (Li2)       65     215 (NH)
  RPBE                0.013    0.075 (Li2)       72     195 (H2CO, 2b2)
Hybrid functionals
  B3LYP               0.004    0.057 (Li2)       32     135 (NO)
  PBE0                0.007    0.062 (Li2)       40     144 (NO)
τ-functionals
  mGGA                0.019    0.111 (Li2)       72     196 (CH, NH)

2.3 Beyond the PBE functional

In order to simplify the following discussion, we separate correlation and exchange contributions, i.e.

$$E_{xc} = E_x + E_c \tag{5}$$

Even if this is the most common representation of $E_{xc}$, the distinction between different contributions is somewhat artificial in the context of DFT. Furthermore, it can be misleading, because there is some error compensation between the two terms, and combining exchange and correlation functionals issuing from different sources could be dangerous. Nevertheless, the separation between these two terms considerably simplifies the discussion, and in the following we adopt this distinction. Some efforts have been made to improve the numerical performance of the PBE exchange functional without modifying its theoretical background. Among all the PBE conditions, the fulfillment of the Lieb-Oxford (LO) bound [35],

$$E_x \geq -1.679\,\rho(\mathbf{r})^{4/3} \tag{6}$$
raises some questions [29,31]. Figure 1 shows the behavior of some exchange functionals with respect to this bound. In particular, the Becke 86 (B86, ref. 12), PBE, revPBE and RPBE exchange functionals have been considered. The first three functionals correspond to different values of the $\mu$ and $\kappa$ constants in equation 4, whereas the last functional has a different dependence on $\rho$ and $\nabla\rho$. In the construction of the revPBE functional [29], Zhang and Yang pointed out that, for a given electron density, fulfillment of equation (6), which may be considered a local LO limit, is a sufficient, but not a necessary requirement for the fulfillment of the true, integrated LO bound:

$$E_x \geq E_{xc} \geq -1.679 \int \rho(\mathbf{r})^{4/3}\,d\mathbf{r} \tag{7}$$

In particular, the choice $\kappa = 1.245$ leads both to the fulfillment of the LO bound in all the chemically significant situations and to more accurate energies for atoms and covalent molecules [29]. However, employing the local bound in the GGA construction ensures that the integrated bound will be satisfied for any possible electron density. Furthermore, optimization of the parameters for a specific property may worsen the behaviour of the functional in other situations. As a consequence, the revPBE functional, while being based on interesting considerations, loses its generality (see the discussion in reference 41, and below). Recently, Hammer, Hansen and Nørskov observed that the revPBE functional differs from the original PBE exchange in the region s < 2.5, where it still fulfills the local LO bound [31] (see figure 1). This suggests that it should be possible to construct from PBE a new GGA functional which follows the revPBE exchange only for s values up to 2.5. The resulting functional form is:

$$F_x^{RPBE} = 1 + \kappa \left( 1 - e^{-\mu s^2 / \kappa} \right) \tag{8}$$

They called this functional RPBE. So, while the revPBE functional deviates from the PBE functional in the value of one parameter ($\kappa$) in the exchange enhancement factor $F_x(s)$, the RPBE functional deviates from the PBE functional in the form of the functional itself. It must be pointed out that RPBE preserves all the correct features of the parent PBE model. This functional provides very good chemisorption energies, but has not yet been tested on molecular systems. The behavior of the revPBE and RPBE functionals with respect to the LO limit is shown in figure 1.
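The difference between the local and the integrated LO bound can be made concrete with a short sketch that evaluates the enhancement factors of equations (4) and (8) and then integrates the resulting exchange energies for the hydrogen 1s density. Several ingredients are assumptions introduced for illustration rather than statements from the chapter: the local limit is expressed as F_x(s) <= 1.804 (that is, 1 + κ for PBE), the Dirac coefficient and spin-scaling relation are the standard ones, and the test density and all function names are hypothetical choices.

```python
import math

MU = 0.21951           # gradient coefficient quoted with equation (4)
KAPPA_PBE = 0.804      # PBE value quoted with equation (4)
KAPPA_REVPBE = 1.245   # revPBE value of Zhang and Yang
F_LOCAL_MAX = 1.804    # assumed form of the local LO limit on F_x(s)
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)  # Dirac exchange coefficient (a.u.)


def f_pbe(s, kappa=KAPPA_PBE, mu=MU):
    """Enhancement factor of equation (4)."""
    return 1.0 + kappa - kappa / (1.0 + mu * s * s / kappa)


def f_revpbe(s):
    """Same functional form as PBE, with the larger kappa of revPBE."""
    return f_pbe(s, kappa=KAPPA_REVPBE)


def f_rpbe(s, kappa=KAPPA_PBE, mu=MU):
    """Enhancement factor of equation (8)."""
    return 1.0 + kappa * (1.0 - math.exp(-mu * s * s / kappa))


def revpbe_crossing():
    """Value of s at which the revPBE factor reaches F_LOCAL_MAX (closed form)."""
    k = KAPPA_REVPBE
    return math.sqrt((k / MU) * (k / (1.0 + k - F_LOCAL_MAX) - 1.0))


def exchange_energy_h_atom(f_x, r_max=40.0, n=40000):
    """GGA exchange energy (hartree) of the hydrogen 1s density exp(-2r)/pi.

    The single electron is fully spin polarized, so the spin-scaling relation
    E_x[rho_up, 0] = 0.5 * E_x[2 * rho_up] is applied to the unpolarized formula.
    """
    h = r_max / n
    total = 0.0
    for i in range(1, n):  # the integrand is zero or negligible at both end points
        r = i * h
        dens = 2.0 * math.exp(-2.0 * r) / math.pi        # spin-scaled density
        grad = 2.0 * dens                                 # |d(dens)/dr|
        k_f = (3.0 * math.pi ** 2 * dens) ** (1.0 / 3.0)  # local Fermi wave vector
        s = grad / (2.0 * k_f * dens)                     # reduced gradient
        total += r * r * dens ** (4.0 / 3.0) * f_x(s)
    return -0.5 * C_X * 4.0 * math.pi * h * total


# Right-hand side of equation (7) for the 1s density (closed-form integral).
lo_bound = -1.679 * (27.0 / 64.0) * math.pi ** (-1.0 / 3.0)

print(f"revPBE exceeds the local limit for s > {revpbe_crossing():.2f}")
for name, f_x in (("PBE", f_pbe), ("revPBE", f_revpbe), ("RPBE", f_rpbe)):
    print(f"{name:7s} E_x = {exchange_energy_h_atom(f_x):.4f}  (bound {lo_bound:.4f})")
```

For this density the revPBE factor violates the assumed local limit only in the far tail, where the density weight is negligible, so the integrated bound is still comfortably satisfied. This is the situation Zhang and Yang rely on; the PBE and RPBE forms instead respect the limit at every s by construction.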

[Figure 1: plot with the labels "revPBE", "RPBE" and "Lieb-Oxford limit"; horizontal axis ticks from 0 to 10, vertical axis ticks from 1.0 to 2.5.]

Figure 1. Asymptotic behavior of some exchange functionals belonging to the PBE family

It is noteworthy that the considered functionals (B86, PBE, revPBE and RPBE) have different behaviors in the region l