The Industrial Communication Technology Handbook (Industrial Information Technology)


THE INDUSTRIAL COMMUNICATION TECHNOLOGY HANDBOOK

© 2005 by CRC Press

INDUSTRIAL INFORMATION TECHNOLOGY SERIES

Series Editor
RICHARD ZURAWSKI

Forthcoming Books

Embedded Systems Handbook
Edited by Richard Zurawski

Electronic Design Automation for Integrated Circuits Handbook
Luciano Lavagno, Grant Martin, and Lou Scheffer


THE INDUSTRIAL COMMUNICATION TECHNOLOGY HANDBOOK

Edited by

RICHARD ZURAWSKI

Library of Congress Cataloging-in-Publication Data

The industrial communication technology handbook / Richard Zurawski, editor.
p. cm. -- (The industrial information technology series ; 1)
Includes bibliographical references and index.
ISBN 0-8493-3077-7 (alk. paper)
1. Computer networks. 2. Data transmission systems. 3. Wireless communication systems. I. Zurawski, Richard. II. Series.
TK5105.5.I48 2005
670'.285'46--dc22    2004057922

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher. All rights reserved. Authorization to photocopy items for internal or personal use, or the personal or internal use of specific clients, may be granted by CRC Press, provided that $1.50 per page photocopied is paid directly to Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA. The fee code for users of the Transactional Reporting Service is ISBN 0-8493-3077-7/05/$0.00+$1.50. The fee is subject to change without notice. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. The consent of CRC Press does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying. Direct all inquiries to CRC Press, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2005 by CRC Press
No claim to original U.S. Government works
International Standard Book Number 0-8493-3077-7
Library of Congress Card Number 2004057922
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper

Foreword

A handbook on industrial communication technology! What a challenge, given the complexity of industrial applications, the number of possible solutions, the number of standards, and the variety of applications, contexts, and products! The challenge can be expressed in just a few words: applications diversity, need for networking, integration of functions, and technologies.

Applications diversity: The applications concerned with industrial communications are known under the following terms: process control, manufacturing and flexible systems, building automation, transport management, utilities, and embedded systems in trains, aircraft, cars, etc. All these applications need similar services, but in very different environments and with very different qualities of service.

Need for networking: The need for networking is not new. Since the MAP and TOP projects in the field of automation, it has been clear that the future of automation lies in distributed systems supported by distributed (heterogeneous) communication systems. The sharing of information, the necessity of interoperability, and the necessity of abstraction levels are just some of the reasons why industrial communication has always been considered a major challenge.

Integration: In all domains, integration is a key word, meaning that all the functions in an enterprise need to be interconnected, in real time, as much as possible. This is only feasible through the use of robust communication systems, real-time features, and coherent design of the applications. With the development of ubiquitous computing and ambient intelligence, industrial communication applications will become the next challenge.

Technologies: Numerous technologies are available for use at different levels of control and command and in all the services provided by a company; in addition, they exist for maintenance, supervision and monitoring, diagnosis, spare parts management, and so on. Specific solutions are frequently dictated by specific problems. The importance of standards cannot be overemphasized.

Wireless systems, fieldbuses and cell or plant networks, building automation, device buses and applications, embedded systems, Internet technologies and related applications, security and safety, MAC protocols, and representative application domains are just some of the topics treated in this handbook. Methodology considerations for choosing and developing systems are also presented.

This handbook will become the major reference source for this domain. Setting aside some technological details, the methods and principles presented will be relevant for years to come. Putting together such a book would not have been possible without the cooperation of a great number of authors, all specialists in their fields and involved in the development of communication systems and applications, as well as the members of the International Advisory Board. The Industrial Communication Technology Handbook is a must for industrial communication professionals.

Jean-Pierre Thomesse
Institut National Polytechnique de Lorraine
Nancy, France

International Advisory Board

Jean-Pierre Thomesse, LORIA-INPL, France, Chair
Salvatore Cavalieri, University of Catania, Italy
Dietmar Dietrich, Vienna University of Technology, Austria
Jean-Dominique Decotignie, CSEM, Switzerland
Josep M. Fuertes, Universitat Politècnica de Catalunya, Spain
Jürgen Jasperneite, Phoenix Contact, Germany
Chris Jenkins, Proces-Data, U.K.
Ed Koch, Akua Control, U.S.
Thilo Sauter, Austrian Academy of Sciences, Austria
Viktor Schiffer, Rockwell Automation, Germany
Wolfgang Stripf, Siemens AG, Germany

Preface

Introduction

Aim

The purpose of The Industrial Communication Technology Handbook is to provide a reference useful for a broad range of professionals and researchers from industry and academia interested in or involved in the use of industrial communication technology and systems. This is the first publication to cover this field in a cohesive and comprehensive way. The focus of this book is on existing technologies used by the industry, and on newly emerging technologies and trends, the evolution of which is driven by actual needs and by industry-led consortia and organizations.

The book offers a mix of basics and advanced material, as well as overviews of recent significant research and implementation/technology developments. The book is aimed at novices as well as experienced professionals from industry and academia. It is also suitable for graduate students. The book covers extensively the areas of fieldbus technology, industrial Ethernet and real-time extensions, wireless and mobile technologies in industrial applications, linking the factory floor with the Internet and wireless fieldbuses, industrial networks’ security and safety, automotive applications, industrial automation applications, building automation applications, energy systems applications, and others.

It is an indispensable companion for those who seek to learn more about industrial communication technology and systems and for those who want to stay up to date with recent technical developments in the field. It is also a rich source of material for any university or professional development course on industrial networks and related technologies.

Contributors

The book contains 42 contributions, written by leading experts from industry and academia directly involved in the creation and evolution of the ideas and technologies treated in the book.

Over half of the contributions are from industry and industrial research establishments at the forefront of the developments shaping the field of industrial communication technology, for example, ABB, Bosch Rexroth Corporation, CSEM, Decomsys, Frequentis, Phoenix Contact, PROCES-DATA, PSA Peugeot-Citroen, PROFIBUS International, Rockwell Automation, SERCOS North America, Siemens, and Volcano. Most of the mentioned contributors play a leading role in the formulation of long-term policies for technology development and are key members of the industry–academe consortia implementing those policies.

The contributions from academia and governmental research organizations are represented by some of the most renowned institutions, such as Cornell University, Fraunhofer, LORIA-INPL, National Institute of Standards and Technology (U.S.), Politecnico di Torino (Italy), Singapore Institute of Manufacturing Technology, Technical University of Berlin, and Vienna University of Technology.

Format

The presented material is in the form of tutorials, surveys, and technology overviews, combining fundamentals with advanced issues, making this publication relevant to beginners as well as seasoned professionals from industry and academia. Particular emphasis is on the industrial perspective, illustrated by actual implementations and technology deployments. The contributions are grouped in sections for cohesive and comprehensive presentation of the treated areas. The reports on recent technology developments, deployments, and trends frequently cover material released to the profession for the first time.

Audience

The handbook is designed to cover a wide range of topics that comprise the field of industrial communication technology and systems. The material covered in this volume will be of interest to a wide spectrum of professionals and researchers from industry and academia, as well as graduate students, from the fields of electrical and computer engineering, industrial and mechatronic engineering, mechanical engineering, computer science, and information technology.

Organization

The book is organized into two parts. Part 1, Basics of Data Communication and IP Networks, presents in a nutshell the basics of data communication and IP networks. This material is intended as a handy reference for those who may not be familiar with, or wish to refresh their knowledge of, some of the concepts used extensively in Part 2. Part 2, Industrial Communication Technology and Systems, is the main focus of the book and presents a comprehensive overview of the field of industrial communication technologies and systems. Some of the topics presented in this part have received limited coverage in other publications due to the fast evolution of the technologies involved, material confidentiality, or limited circulation in the case of industry-driven developments.

Part 1 includes six chapters that present in a concise way the vast area of IP networks. As mentioned, it is intended as supplementary reading for those who would like to refresh and update their knowledge without resorting to voluminous publications. This background is essential to understand the material presented in the chapters in Part 2. This part includes the following chapters: “Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks,” “IP Internetworking,” “A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues,” “Fundamentals in Quality of Service and Real-Time Transmission,” “Survey of Network Management Frameworks,” and “Internet Security.”

Part 2 includes five major sections: Field Area and Control Networks, Ethernet and Wireless Network Technologies, Linking Factory Floor with the Internet and Wireless Fieldbuses, Security and Safety Technologies in Industrial Networks, and Applications of Networks and Other Technologies.

Field Area and Control Networks

The section on fieldbus technology provides a comprehensive overview of selected fieldbuses.
The focus is on the fieldbuses most widely used in industry and most widely known. The presentation is not exhaustive, however. One of the limiting factors was the availability of qualified authors to write authoritatively on the topics.

This section begins with “Fieldbus Systems: History and Evolution,” presenting an extensive introduction to fieldbus technology, a comparison and critical evaluation of the existing technologies, and the evolution and emerging trends. This chapter is a must for anyone with an interest in the origins of the current fieldbus technology landscape. It is also compulsory reading for novices to understand the concepts behind fieldbuses.

The chapter “The WorldFIP Fieldbus” was written by Jean-Pierre Thomesse, one of the pioneers of fieldbus technology. WorldFIP is one of the first fieldbuses, developed in France at the beginning of the 1980s and widely used nowadays, particularly in applications that require hard real-time constraints and high dependability. This is almost a “personal” record by someone involved in the development of WorldFIP.

A brief record of the origins and evolution of the FOUNDATION Fieldbus (H1, H2, and HSE) and its technical principles is presented in the chapter “FOUNDATION Fieldbus: History and Features.”

The description of PROFIBUS (PROFIBUS DP) is presented in “PROFIBUS: Open Solutions for the World of Automation.” This is a comprehensive overview of PROFIBUS DP, one of the leading players in the fieldbus field, and it includes material on HART on PROFIBUS DP, application and master and system profiles, and integration technologies such as GSD (general station description), EDD (electronic device description), and DTM (device type manager).

The chapter “Principles and Features of PROFInet” presents a new automation concept, and the technology behind it, that has emerged as a result of the trend in automation technology toward modular, reusable machines and plants with distributed intelligence. PROFInet is an open standard for industrial automation based on industrial Ethernet. The material is presented by researchers from the Automation and Drives Division of Siemens AG, the leading provider of automation solutions within Siemens AG.

Dependable time-triggered communication and architecture are presented in “Dependable Time-Triggered Communication,” written by Hermann Kopetz et al. Hermann Kopetz is the inventor of the concept and the driving force for the technology development. The TTP (Time-Triggered Protocol) and TTA (Time-Triggered Architecture) had a profound impact on the development of safety-critical systems, particularly in the automotive industry. This is one of the most authoritative presentations on this topic.

The time-triggered CAN (TTCAN) protocol was introduced by Bosch in 1999 with the aim of making CAN suitable for the new needs of the automotive industry. This technology is introduced in “Controller Area Network: A Survey.” This chapter describes the main features of the Controller Area Network (CAN) protocol, including TTCAN.

The chapter “The CIP Family of Fieldbus Protocols” introduces the following CIP (Common Industrial Protocol)-based networks: DeviceNet, a CIP implementation employing a CAN data link layer; ControlNet, implementing the same basic protocol on new data link layers that allow for much higher speed (5 Mbps), strict determinism, and repeatability while extending the range of the bus (several kilometers with repeaters); and EtherNet/IP, in which CIP runs over TCP/IP. The chapter also introduces CIP Sync, a CIP-based communication principle that enables synchronous low-jitter system reactions without the need for low-jitter data transmission. This is important in applications that require much tighter control of a number of real-time parameters characterizing hard real-time control systems. The chapter also gives an overview of CIP Safety, a safety protocol that adds services to transport data with high integrity.

The P-NET fieldbus is presented in the chapter “The Anatomy of the P-NET Fieldbus.” The chapter was written by the chairman of the International P-NET User Organization and the technical director of PROCES-DATA (U.K.) Ltd., which provides the real-time PC operating system for P-NET.

The chapter “INTERBUS Means Speed, Connectivity, Safety” introduces INTERBUS, a fieldbus with over 6 million nodes installed and a broad base of device manufacturers. The chapter also briefly introduces IP over INTERBUS and looks at data throughput for IP tunneling.

IEEE 1394 FireWire, a high-performance serial bus, the principles of its operation, and its applications in the industrial environment are presented in “Data Transmission in Industrial Environments Using IEEE 1394 FireWire.”

The issues involved in the configuration (setting up a fieldbus system) and management (diagnosis and monitoring, and adding new devices to the network, to mention some activities) of fieldbus systems are presented in “Configuration and Management of Fieldbus Systems.” This chapter also discusses the plug-and-participate concept and its implementations in the industrial environment.

The section on fieldbus technology concludes with an excellent chapter discussing the pros and cons of selecting control networks for specific applications and application domains. The material in this chapter is authored by Jean-Dominique Decotignie. It includes many practical recommendations that can be useful for practicing professionals. It is the kind of material that cannot be easily found in the professional literature.

Ethernet and Wireless Network Technologies

This section on Ethernet and wireless/mobile network technologies contains four chapters discussing the use of Ethernet and its variants in industrial automation, as well as selected issues related to wireless technologies. Ethernet is fast becoming a de facto industry standard for communication in factories and plants at the fieldbus level. The random and native CSMA/CD (carrier-sense multiple access with collision detection) arbitration mechanism is being replaced by other solutions allowing for the deterministic behavior required in real-time communication to support soft and hard real-time deadlines.

The idea of using wireless technology on the factory floor is appealing, since fieldbus stations and automation components can be mobile, and furthermore, the need for (breakable) cabling is reduced. However, wireless transmission characteristics are fundamentally different from those of other media types, leading to comparably high and time-varying error rates. This poses a significant challenge for fulfilling the hard real-time and reliability requirements of industrial applications.

This section begins with the chapter “Approaches to Enforce Real-Time Behavior in Ethernet,” which discusses various approaches to ensure real-time communication capabilities, including those that support probabilistic as well as deterministic analysis of the network access delay. This chapter also presents a brief description of the Ethernet protocol.

The practical solutions to ensure real-time communication capabilities using switched Ethernet are presented in “Switched Ethernet in Automation Networking.” This chapter provides an evaluation of the suitability of switched Ethernet in the context of industrial automation and presents practical solutions obtained through R&D to address actual needs.

The issues involving the use of wireless and mobile communication in the industrial environment (factory floor) are discussed in “Wireless LAN Technology for the Factory Floor: Challenges and Approaches.” This is a very comprehensive chapter dealing with topics such as error characteristics of wireless links and lower-layer wireless protocols for industrial applications. It also briefly discusses hybrid systems that extend selected fieldbus technologies (such as PROFIBUS and CAN) with wireless stations.

The chapter “Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment” concludes this section. This chapter discusses, from the radio network perspective, the potentials and limits of technologies such as Bluetooth, IEEE 802.11, and ZigBee for deployment in industrial environments.

Linking Factory Floor with the Internet and Wireless Fieldbuses

The demand for process data availability at different levels of the factory organizational hierarchy, from production to the business level, has caused an upsurge in activities to link the “factory floor” with the intranet/Internet. The issues, solutions, and technologies for linking industrial environments with the Internet and wireless fieldbuses are extensively discussed in this section.

The issues and the actual and potential solutions behind linking factory floor/industrial environments with the Internet/intranet are discussed in “Linking Factory Floor and the Internet.” This chapter also discusses new trends involving industrial Ethernet.

The chapter “Extending EIA-709 Control Networks across IP Channels” presents a comprehensive overview of the use of the ANSI/EIA-852 standard to encapsulate the ANSI/EIA-709 control network protocol. This contribution comes from authors from industry involved directly in the relevant technology development.

The means for interconnecting wireline fieldbuses to wireless ones in the industrial environment, various design alternatives, and their evaluation are presented in “Interconnection of Wireline and Wireless Fieldbuses.” This is one of the most comprehensive and authoritative discussions of this challenge, presented by one of the leading authorities on fieldbus technology.

Security and Safety Technologies in Industrial Networks

Security in the field area networks employed in the industrial environment is a major challenge. The requirement for process data availability via intranet/Internet access opens possibilities for intrusion and potentially hostile actions that may result in engineering system failures, including catastrophic ones if they involve chemical plants, for instance. These and safety issues are the focus of this section.

This section begins with the chapter “Security Topics and Solutions for Automation Networks,” which provides a comprehensive discussion of the issues involved, the challenges, and existing solutions amenable to adaptation to industrial environments, and outlines a need for new approaches and solutions.

The second chapter in this section is “PROFIsafe: Safety Technology with PROFIBUS,” which focuses on the existing solutions and supporting technology in the context of PROFIBUS, one of the most widely used fieldbuses in industrial applications.
The material is presented by some of the creators of PROFIsafe. CIP Safety, a safety protocol for CIP, is presented in the Field Area and Control Networks section in “The CIP Family of Fieldbus Protocols.”

Applications of Networks and Other Technologies

This is the last major section in the book. It has eight subsections dealing with specialized field area networks (synonymous with fieldbuses) and their applications, covering automotive communication technology, building automation, manufacturing message specification in industrial communication systems, motion control, train communication, smart transducers, energy systems, and SEMI (Semiconductor Equipment and Materials International). This section tries to present some of the most representative applications of field area networks outside the industrial controls and automation presented in the Field Area and Control Networks section.

The “Automotive Communication Technologies” subsection has four chapters discussing different approaches, solutions, and technologies. The automotive industry is a very fast growing consumer of field area networks, aggressively adopting mechatronic solutions to replace or duplicate existing mechanical/hydraulic systems.

This subsection begins with the chapter “Design of Automotive X-by-Wire Systems,” which gives an overview of the X-by-wire approach and introduces safety-critical communication protocols (TTP/C, FlexRay, and TTCAN) and operating systems and middleware services (OSEKTime and FTCom) used in automotive applications. The chapter also presents a comprehensive case study illustrating the design of a Steer-by-Wire system.

The newly emerging standard and technology for automotive safety-critical communication, FlexRay, is presented in the chapter “FlexRay Communication Technology.” The material is among the most comprehensive and authoritative available at the time of this book’s publication, and it is written by industry people directly involved in the standard and technology development.

The LIN (Local Interconnect Network) communication standard, enabling fast and cost-efficient implementation of low-cost multiplex systems for local interconnect networks in vehicles, is presented in “The LIN Standard.”

The Volcano concept and technology for the design and implementation of in-vehicle networks using the standardized CAN and LIN communication protocols are presented in “Volcano: Enabling Correctness by Design.” The material comes from the source: Volcano Communications Technologies AG. This chapter provides insight into the design and development process of an automotive communication network.

Another fast-growing consumer of field area networks is building automation. At this stage, particularly for office, commercial, and industrial complexes, the use of automation solutions offers substantial savings on the costs of lighting and HVAC and can considerably improve the quality of the environment. There are other benefits as well. Relevant communication solutions for this application domain are presented in the subsection “Networks in Building Automation.” This subsection is composed of three contributions, outlining the issues involved and the specific technologies currently in use.

An excellent introduction to issues, architectures, and available solutions is presented in “The Use of Network Hierarchies in Building Telemetry and Control Applications.” The material was written by one of the pioneers of the concept of building automation and a technology developer.

The details of the European Installation Bus (EIB), a field area network designed specifically for building automation purposes, are presented in “EIB: European Installation Bus.” This chapter was contributed by one of the most active proponents of using field area networks in building automation and a co-founder of one of the largest research groups in this field, at the Vienna University of Technology.

The chapter “Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk)” introduces the technical aspects of LonWorks networks, one of the main contenders for building automation. It covers the protocol, development environments, and tools.

The subsection “Manufacturing Message Specification in Industrial Automation” focuses on the highly successful international standard MMS (manufacturing message specification), which is an Open Systems Interconnection (OSI) application layer messaging protocol designed for the remote control and monitoring of devices such as remote terminal units (RTUs), programmable logic controllers (PLCs), numerical controllers (NCs), robot controllers (RCs), etc. This subsection features two chapters: “The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS),” which gives a fairly comprehensive introduction to the standard and illustrates its use; and “Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine,” which shows the use of MOTIP (MMS on top of TCP/IP) in the development and operation of the virtual factory environment. The chapter also discusses an MMS-based Internet monitoring system.

The chapter “The SERCOS interface™” describes the international standard (IEC/EN 61491) for communication between digital motion controls, drives, input/output (I/O), and sensors.
It includes definitions, a brief history, a description of SERCOS interface communication methodology, an introduction to SERCOS interface hardware, a discussion of speed considerations, information on conformance testing, and information on available development tools. A number of real-world applications are presented and a list of sources for additional information is provided. The “IEC/IEEE Train Communication Network” chapter presents details of the international standard IEC 61375, adopted in 1999. It also discusses other European and U.S. initiatives in this field.

“A Smart Transducer Interface Standard for Sensors and Actuators” presents material on the IEEE 1451 standards for connecting sensors and actuators to microprocessors, control and field area networks, and instrumentation systems. The standards also define the Transducer Electronic Data Sheet (TEDS), which allows for the self-identification of sensors. The IEEE 1451 standards facilitate sensor networking, a new trend in industrial automation, which, among other benefits, offers strong economic incentives.

The use of IEC 61375 (Train Communication Network) in substation automation is presented in “Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations.” This is an interesting case study illustrating the suitability of some of the field area networks for various application domains.

The last subsection and chapter in the Applications of Networks and Other Technologies section is “SEMI Interface and Communication Standards: An Overview and Case Study.” This is an excellent introduction to SEMI, providing an overview of the fundamentals of the SEMI Equipment Communication Standard, commonly referred to as SECS, its interpretation, the available software tools, and case study applications. The material was written by experts from the Singapore Institute of Manufacturing Technology who were involved in a number of SEMI technology developments and deployments.

Locating Topics

To assist readers with locating material, a complete table of contents is presented at the front of the book. Additionally, each chapter begins with its own table of contents. For further assistance, two indexes are provided at the end of the book: an index of authors who contributed to the book, together with the titles of their contributions, and a detailed subject index.

Acknowledgments

I thank all members of the International Advisory Board for their help with structuring the book, selection of authors, and material evaluation. I have received tremendous cooperation from all contributing authors. I thank all of them for that. I also express gratitude to my publisher, Nora Konopka, and other CRC Press staff involved in the book’s production, particularly Jessica Vakili, Elizabeth Spangenberger, and Gail Renard. My gratitude goes also to my wife, who tolerated the countless hours I spent preparing this book. Richard Zurawski ISA Corp Santa Clara, CA


The Editor

Dr. Richard Zurawski is president and CEO of ISA Corp., South San Francisco and Santa Clara, CA, a company involved in providing solutions for industrial and societal automation. He is also chief scientist with and a partner in a Silicon Valley-based start-up involved in the development of wireless solutions and technology. Dr. Zurawski is a co-founder of the Institute for Societal Automation, Santa Clara, a research and consulting organization. Dr. Zurawski has over 25 years of academic and industrial experience, including a regular appointment at the Institute of Industrial Sciences, University of Tokyo, and full-time R&D advisor with Kawasaki Electric, Tokyo. He has provided consulting services to Telecom Research Laboratories, Melbourne, Australia, and Kawasaki, Ricoh, and Toshiba Corporations, Japan. He has participated in an IMS package: Formal Methods in Distributed Autonomous Manufacturing Systems and Distributed Logic Controllers, Task 8: Distributed Intelligence in Manufacturing Systems; Globeman 21 Group I: Global Product Management. He has also participated in a number of Japanese Intelligent Manufacturing Systems programs. Dr. Zurawski’s involvement in R&D projects and activities in the past few years includes remote monitoring and control, network-based solutions for factory floor control, network-based demand side management, MEMS (automatic microassembly), Java technology, SEMI (Semiconductor Equipment and Materials International) implementations, development of DSL telco equipment, and wireless applications. Dr. Zurawski currently serves as an associate editor of the IEEE Transactions on Industrial Electronics and Real-Time Systems: The International Journal of Time-Critical Computing Systems, Kluwer Academic Publishers. He was a guest editor of three special sections in IEEE Transactions on Industrial Electronics: two sections on factory automation and one on factory communication systems. 
He has also been a guest editor of a special issue of the Proceedings of the IEEE dedicated to industrial communication systems. In addition, Dr. Zurawski was invited by IEEE Spectrum to contribute material on Java technology to “Technology 1999: Analysis and Forecast Issue.” Dr. Zurawski is the series editor for The Industrial Information Technology Series, CRC Press, Boca Raton, FL, and has served as a vice president of the Institute of Electrical and Electronics Engineers (IEEE) Industrial Electronics Society (IES), chairman of the Factory Automation Council, and chairman of the IEEE IES Ad Hoc Committee on IEEE Transactions on Factory Automation. He was an IES representative to the IEEE Neural Network Council and IEEE Intelligent Transportation Systems Council. He was also on a steering committee of the ASME/IEEE Journal of Micromechanical Systems. In 1996, he received the Anthony J. Hornfeck Service Award from the IEEE Industrial Electronics Society. Dr. Zurawski has established two IEEE events: the IEEE Workshop on Factory Communication Systems, the only IEEE event dedicated to industrial communication networks; and the IEEE International Conference on Emerging Technologies and Factory Automation, the largest IEEE conference on factory automation. He has served as a general, program, and track chair for a number of IEEE conferences and workshops.

Dr. Zurawski has published extensively on various aspects of control systems, industrial and factory automation, industrial communication systems, robotics, formal methods in the design of embedded and industrial systems, and parallel and distributed programming and systems. Currently, he is preparing The Embedded Systems Handbook, soon to be published by CRC Press.


Contributors

Luís Almeida, Universidade de Aveiro, Aveiro, Portugal
Herbert Barthel, Siemens AG, Nürnberg-Moorenbrunn, Germany
Günther Bauer, Vienna University of Technology, Vienna, Austria
Ralph Büsgen, Siemens AG, Nürnberg, Germany
Salvatore Cavalieri, University of Catania, Catania, Italy
Gianluca Cena, IEIIT-CNR, Torino, Italy
Jean-Dominique Decotignie, Centre Suisse d’Electronique et de Microtechnique, Neuchatel, Switzerland
Wilfried Elmenreich, Vienna University of Technology, Vienna, Austria
Joachim Feld, Siemens AG, Nürnberg, Germany
A.M. Fong, Singapore Institute of Manufacturing Technology, Singapore
Klaus Frommhagen, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
K.M. Goh, Singapore Institute of Manufacturing Technology, Singapore
Zygmunt J. Haas, Cornell University, Ithaca, New York
Scott C. Hibbard, Bosch Rexroth Corporation, Hoffman Estates, Illinois
Helmut Hlavacs, University of Vienna, Vienna, Austria
Mai Hoang, University of Potsdam, Potsdam, Germany
Øyvind Holmeide, OnTime Networks, Billingstad, Norway
Jürgen Jasperneite, Phoenix Contact GmbH & Co. KG, Bad Pyrmont, Germany
Ulrich Jecht, UJ Process Analytics, Baden-Baden, Germany
Christopher G. Jenkins, PROCES-DATA (U.K.) Ltd., Wallingford, Oxon, United Kingdom
Svein Johannessen, ABB Corporate Research, Billingstad, Norway
Wolfgang Kampichler, Frequentis GmbH, Vienna, Austria
Wolfgang Kastner, Vienna University of Technology, Vienna, Austria
Dong-Sung Kim, Kumoh National Institute of Technology, Gumi-Si, South Korea
Hubert Kirrmann, ABB Corporate Research, Baden, Switzerland
Edward Koch, Akua Control, San Rafael, California
Hermann Kopetz, Vienna University of Technology, Vienna, Austria
Christopher Kruegel, Vienna University of Technology, Vienna, Austria
Christian Kurz, University of Vienna, Vienna, Austria
Ronald M. Larsen, SERCOS North America, Lake in the Hills, Illinois
Kang Lee, National Institute of Standards and Technology, Gaithersburg, Maryland
Y.G. Lim, Singapore Institute of Manufacturing Technology, Singapore
Lucia Lo Bello, University of Catania, Catania, Italy
Dietmar Loy, LOYTEC Electronics GmbH, Vienna, Austria
Peter Lutz, Interests Group SERCOS interface e.V., Stuttgart, Germany
Kirsten Matheus, Carmeq GmbH, Berlin, Germany
Dietmar Millinger, DECOMSYS - Dependable Computer Systems, Vienna, Austria
Petra Nauber, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Nicolas Navet, LORIA, Vandoeuvre-lès-Nancy, France
Georg Neugschwandtner, Vienna University of Technology, Vienna, Austria
Roman Nossal, DECOMSYS - Dependable Computer Systems, Vienna, Austria
Paulo Pedreiras, Universidade de Aveiro, Aveiro, Portugal
Stefan Pitzek, Vienna University of Technology, Vienna, Austria
Manfred Popp, Siemens AG, Fürth, Germany
Antal Rajnák, Volcano AG, Tägerwilen, Switzerland
Thilo Sauter, Austrian Academy of Sciences, Wiener Neustadt, Austria
Uwe Schelinski, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Viktor Schiffer, Rockwell Automation, Haan, Germany
Michael Scholles, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Christian Schwaiger, Austria Card GmbH, Vienna, Austria
Karlheinz Schwarz, Schwarz Consulting Company (SCC), Karlsruhe, Germany
Françoise Simonot-Lion, LORIA, Vandoeuvre-lès-Nancy, France
Tor Skeie, ABB Corporate Research, Billingstad, Norway
Ye Qiong Song, LORIA, Vandoeuvre-lès-Nancy, France
Stefan Soucek, LOYTEC Electronics GmbH, Vienna, Austria
Wilfried Steiner, Vienna University of Technology, Vienna, Austria
Wolfgang Stripf, Siemens AG, Karlsruhe, Germany
Jean-Pierre Thomesse, Institut National Polytechnique de Lorraine, Vandoeuvre-lès-Nancy, France
O. Tin, Singapore Institute of Manufacturing Technology, Singapore
Albert Treytl, Vienna University of Technology, Vienna, Austria
Adriano Valenzano, IEIIT-CNR, Torino, Italy
Peter Wenzel, PROFIBUS International, Karlsruhe, Germany
Andreas Willig, University of Potsdam, Potsdam, Germany
Cédric Wilwert, PSA Peugeot–Citroen, La Garenne Colombe, France
Hagen Woesner, Technical University of Berlin, Berlin, Germany
K. Yi, Singapore Institute of Manufacturing Technology, Singapore
Pierre A. Zuber, Bombardier Transportation Total Transit Systems, Pittsburgh, Pennsylvania


Contents

Part 1

Basics of Data Communication and IP Networks

1 Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks
    Andreas Willig and Hagen Woesner
2 IP Internetworking
    Helmut Hlavacs and Christian Kurz
3 A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues
    Lucia Lo Bello
4 Fundamentals in Quality of Service and Real-Time Transmission
    Wolfgang Kampichler
5 Survey of Network Management Frameworks
    Mai Hoang
6 Internet Security
    Christopher Kruegel

Part 2

Industrial Communication Technology and Systems

Section I

Field Area and Control Networks

7 Fieldbus Systems: History and Evolution
    Thilo Sauter
8 The WorldFIP Fieldbus
    Jean-Pierre Thomesse
9 FOUNDATION Fieldbus: History and Features
    Salvatore Cavalieri
10 PROFIBUS: Open Solutions for the World of Automation
    Ulrich Jecht, Wolfgang Stripf, and Peter Wenzel
11 Principles and Features of PROFInet
    Manfred Popp, Joachim Feld, and Ralph Büsgen
12 Dependable Time-Triggered Communication
    Hermann Kopetz, Günther Bauer, and Wilfried Steiner
13 Controller Area Network: A Survey
    Gianluca Cena and Adriano Valenzano
14 The CIP Family of Fieldbus Protocols
    Viktor Schiffer
15 The Anatomy of the P-NET Fieldbus
    Christopher G. Jenkins
16 INTERBUS Means Speed, Connectivity, Safety
    Jürgen Jasperneite
17 Data Transmission in Industrial Environments Using IEEE 1394 FireWire
    Michael Scholles, Uwe Schelinski, Petra Nauber, and Klaus Frommhagen
18 Configuration and Management of Fieldbus Systems
    Stefan Pitzek and Wilfried Elmenreich
19 Which Network for Which Application
    Jean-Dominique Decotignie

Section II

Ethernet and Wireless Network Technologies

20 Approaches to Enforce Real-Time Behavior in Ethernet
    Paulo Pedreiras and Luís Almeida
21 Switched Ethernet in Automation Networking
    Tor Skeie, Svein Johannessen, and Øyvind Holmeide
22 Wireless LAN Technology for the Factory Floor: Challenges and Approaches
    Andreas Willig
23 Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment
    Kirsten Matheus

Section III

Linking Factory Floor with the Internet and Wireless Fieldbuses

24 Linking Factory Floor and the Internet
    Thilo Sauter
25 Extending EIA-709 Control Networks across IP Channels
    Dietmar Loy and Stefan Soucek
26 Interconnection of Wireline and Wireless Fieldbuses
    Jean-Dominique Decotignie

Section IV

Security and Safety Technologies in Industrial Networks

27 Security Topics and Solutions for Automation Networks
    Christian Schwaiger and Albert Treytl
28 PROFIsafe: Safety Technology with PROFIBUS
    Wolfgang Stripf and Herbert Barthel

Section V

Applications of Networks and Other Technologies

Automotive Communication Technologies

29 Design of Automotive X-by-Wire Systems
    Cédric Wilwert, Nicolas Navet, Ye Qiong Song, and Françoise Simonot-Lion
30 FlexRay Communication Technology
    Dietmar Millinger and Roman Nossal
31 The LIN Standard
    Antal Rajnák
32 Volcano: Enabling Correctness by Design
    Antal Rajnák

Networks in Building Automation

33 The Use of Network Hierarchies in Building Telemetry and Control Applications
    Edward Koch
34 EIB: European Installation Bus
    Wolfgang Kastner and Georg Neugschwandtner
35 Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk)
    Dietmar Loy

Manufacturing Message Specification in Industrial Automation

36 The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS)
    Karlheinz Schwarz
37 Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine
    Dong-Sung Kim and Zygmunt J. Haas

Motion Control

38 The SERCOS interface™
    Scott C. Hibbard, Peter Lutz, and Ronald M. Larsen

Train Communication Network

39 The IEC/IEEE Train Communication Network
    Hubert Kirrmann and Pierre A. Zuber

Smart Transducer Interface

40 A Smart Transducer Interface Standard for Sensors and Actuators
    Kang Lee

Energy Systems

41 Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations
    Hubert Kirrmann

SEMI

42 SEMI Interface and Communication Standards: An Overview and Case Study
    A.M. Fong, K.M. Goh, Y.G. Lim, K. Yi, and O. Tin


Part 1

Basics of Data Communication and IP Networks

1 Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks
    Andreas Willig and Hagen Woesner
2 IP Internetworking
    Helmut Hlavacs and Christian Kurz
3 A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues
    Lucia Lo Bello
4 Fundamentals in Quality of Service and Real-Time Transmission
    Wolfgang Kampichler
5 Survey of Network Management Frameworks
    Mai Hoang
6 Internet Security
    Christopher Kruegel

1 Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks

Andreas Willig, University of Potsdam
Hagen Woesner, Technical University of Berlin

1.1 Introduction
1.2 Framing and Synchronization
    Bit Synchronization • Frame Synchronization • Example: Bit and Frame Synchronization in the PROFIBUS
1.3 Medium Access Control Protocols
    Requirements and Quality-of-Service Measures • Design Factors • Random Access Protocols • Fixed-Assignment Protocols • Demand-Assignment Protocols • Meta-MAC Protocols
1.4 Error Control Techniques
    Open-Loop Approaches • Closed-Loop Approaches • Hybrid Approaches • Further Countermeasures
1.5 Flow Control Mechanisms
    XON/XOFF and Similar Methods • Sliding-Window Flow Control • Further Mechanisms
1.6 Packet Scheduling Algorithms
    Priority Scheduling • Fair Scheduling
1.7 Link Layer Protocols
    The HDLC Protocol Family • The IEEE 802.2 LLC Protocol
References
    Bit and Frame Synchronization • Medium Access Control Protocols • Error Control • Flow Control • Link Layer Protocols • Packet Scheduling

1.1 Introduction

In packet-switched networks the lower layers (data link layer, medium access control layer, physical layer) have to solve some fundamental tasks to facilitate successful communication. The lower layers are concerned with communication between neighboring stations, in contrast to the layers above (network layer, transport layer), which are concerned with end-to-end communications over multiple intermediate stations.


The lower layers communicate over physical channels, and consequently, their design is strongly influenced by the properties of the physical channel (bandwidth, channel errors). The importance of the lower layers for industrial communication systems is related to the requirement for hard real-time and reliability guarantees: if the lower layers are not able to guarantee successful delivery of a packet/frame* within a prescribed amount of time, no action of the upper layers can compensate for this. Therefore, a wide variety of mechanisms have been developed to implement these guarantees and to deal with harmful physical channel properties like transmission errors.

In virtually all data communication networks used in industrial applications the transmission is packet based; i.e., the user data are segmented into a number of distinct packets, and these packets are transmitted over the channel. Therefore, the following fundamental problems have to be solved:

• What constitutes a frame and how are the bounds of a frame specified? How does the receiver detect frames and the data contained? To this end, framing and synchronization schemes are needed, discussed in Section 1.2.
• When should a frame be transmitted? If multiple stations want to transmit their frames over a common channel, appropriate rules are needed to share the channel and to let each station determine when it may send its frames. This problem is tackled by medium access control (MAC) protocols, discussed in Section 1.3.
• How should channel errors be coped with? The topic of error control is briefly touched on in Section 1.4.
• How should the receiver be protected against too much data sent by the transmitter? This is the problem of flow control, discussed in Section 1.5.
• Which packet should be transmitted next? This is the problem of packet scheduling, sketched in Section 1.6.
• Finally, in link layer protocols all these mechanisms are combined into a working protocol. We discuss two important protocols in Section 1.7.

The chapter is necessarily short on many topics. The interested reader will find further references in the text.

1.2 Framing and Synchronization

The problem of synchronization is related to the transmission of information units (packets, frames) between a sending and a receiving entity. In computer systems, information is usually stored and processed in a binary digital form (bits). A packet is formed from a group of bits and shall be transmitted to the receiver. The receiver must be able to uniquely determine the start and end of a packet as well as the bits within the packet.

The transmission of information over short distances, for instance, inside the computer, can be done with parallel transmission. Here, a number (say 64) of parallel copper wires transport all bits of a 64-bit data word at the same time. In most cases, one additional wire transmits the common reference clock. Whenever the transmitter has applied the correct voltage (representing a 0 or 1 bit) on all wires, it signals this by sending a sampling pulse on the clock wire toward the receiver. Conversely, on receiving a pulse on the clock wire, the receiver samples the voltage levels on all data wires and converts them back to bits by comparing them with a threshold. This kind of transmission is fast and simple, but cannot span large distances, because the cabling cost becomes prohibitive. Therefore, the data words have to be serialized and transmitted bit by bit on a single wire.†

*We will use both terms interchangeably.
†The term wire is actually used here as a synonym for a transmission channel. It therefore could also be a wireless or ISDN channel.


FIGURE 1.1 NRZ, Manchester, and differential Manchester codes, shown for the bit sequence 1 0 1 1 0 0 1 0. In the differential Manchester code, 1 means no level change and 0 means level change.

1.2.1 Bit Synchronization

The spacing of bits generated by the transmitter depends on its local clock. The receiver needs this clock information to sample the incoming signal at appropriate points in time. Unfortunately, the transmitter's and receiver's clocks are not synchronized, and the synchronization information has to be recovered from the data signal; the receiver has to synchronize with the transmitter. This process is called bit synchronization. The aim is to let the receiver sample the received signal in the middle of the bit period in order to be robust against the impairments of the physical layer, like bandwidth limitation and signal distortions.

Bit synchronization is called asynchronous if the clocks are synchronized only for one data word and have to be resynchronized for the next word. A common mechanism used for this employs one start bit preceding the data word and one or more stop bits concluding it. The Universal Asynchronous Receiver/Transmitter (UART) specification defines one additional parity bit, which is appended to the 8 data bits, leading to the transmission of 11 bits total for every 8 data bits [3]. The upper row in Figure 1.2 illustrates this.

For longer streams of information bits, the receiver clock must be synchronized continuously. The digital phase-locked loop (DPLL) is an electrical circuit that controls a local clock and adjusts it to the received clock being extracted from the incoming signal [23]. To recover the clock from the signal, sufficiently frequent changes of signal levels are needed. Otherwise, if the wire shows the same signal level for a long time (as may happen for the non-return to zero (NRZ) coding method, where bits are directly mapped to voltage levels), the receiver clock could drift away from the transmitter clock. The Manchester encoding (shown in the second row of Figure 1.1) ensures that there is at least one signal change per bit. Every logical 1 is represented by a signal change from one to zero, whereas a logical 0 shows the opposite signal change. The internal clock of the DPLL samples the incoming signal with a much higher frequency, for instance, 16 times per bit. For a logical 0 bit that is arriving exactly in time, the DPLL receives a sample pattern of 0000000011111111. If the transition between the 0 and 1 samples is not exactly in the middle of the bit but rather left or right of it, the local clock has to be readjusted to run faster or slower, respectively.

In the classical IEEE 802.3 Ethernet, the bits are Manchester encoded [2]. To allow the DPLL of the receiver to synchronize to the received bit stream, a 64-bit-long preamble is transmitted ahead of each frame. This preamble consists of alternating 0 and 1 bits that result in a square wave of 5 MHz. A start-of-frame delimiter of two consecutive 1 bits marks the end of the preamble and the beginning of the data frame.
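The three line codes of Figure 1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each bit is modeled as a pair of half-bit signal levels, the Manchester polarity follows the text above (1 is a change from one to zero), the initial line level of the differential code is assumed to be 0, and all function names are hypothetical.

```python
def nrz(bits):
    """NRZ: the bit value is the signal level for the whole bit period."""
    return [(b, b) for b in bits]


def manchester(bits):
    """Manchester as in the text: 1 = change from one to zero, 0 = the opposite."""
    return [(1, 0) if b else (0, 1) for b in bits]


def manchester_decode(symbols):
    return [1 if s == (1, 0) else 0 for s in symbols]


def diff_manchester(bits, level=0):
    """Differential Manchester: always a mid-bit transition;
    0 adds a level change at the start of the bit, 1 does not.
    `level` is the (assumed) line level before the first bit."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1                  # level change at the bit boundary
        out.append((level, level ^ 1))  # mid-bit transition
        level ^= 1                      # level at the end of the bit
    return out


def diff_manchester_decode(symbols, level=0):
    bits = []
    for first, second in symbols:
        bits.append(1 if first == level else 0)  # no change at boundary means 1
        level = second
    return bits


def transitions(symbols):
    """Number of level changes in the half-bit sequence."""
    flat = [half for s in symbols for half in s]
    return sum(a != b for a, b in zip(flat, flat[1:]))


def oversample(symbol, n=16):
    """Samples a DPLL would see for one perfectly timed bit."""
    first, second = symbol
    return [first] * (n // 2) + [second] * (n // 2)


bits = [1, 0, 1, 1, 0, 0, 1, 0]
assert manchester_decode(manchester(bits)) == bits
assert diff_manchester_decode(diff_manchester(bits)) == bits
print(transitions(nrz([0] * 8)), transitions(manchester([0] * 8)))  # 0 15
print("".join(map(str, oversample(manchester([0])[0]))))  # 0000000011111111
```

The first printed line shows why NRZ alone is a poor basis for clock recovery: eight identical bits produce no level change at all, whereas the Manchester code produces one in every bit.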


[Figure 1.2 shows the UART character (11 bits: start bit, data bits D7 to D0, parity bit, stop bit) and three frame formats. Control frame (no data): SD1, DA, SA, FC, FCS, ED. Frame with fixed data length (8 characters): SD2, DA, SA, FC, Data, FCS, ED. Variable-length data frame (0 to 249 characters): SD3, LE, LEr, SD3, SA, DA, FC, Data, FCS, ED. Legend: SD1 to SD3, start delimiter; DA, SA, destination and source address; FC, frame control byte; FCS, frame check sequence (CRC); LE, length field; LEr, length field repeated; ED, end delimiter.]

FIGURE 1.2 EN 50170 PROFIBUS: character and selected frame formats.

1.2.2 Frame Synchronization

It is of interest for the receiver to know whether the received information is (1) complete and (2) correct. Section 1.4 treats the latter problem in some more detail. To decide about the first problem, the receiver needs to know where a packet starts and ends. The question that arises immediately is that of marking the start and end of a frame. There are several ways to accomplish this; in real-world protocols one often finds combinations of them. In the following, the most important will be discussed briefly.

1.2.2.1 Time Gaps

The most straightforward way to distinguish between frames is to leave certain gaps of silence between them. However, when many stations share the same medium, all of them have to obey these time gaps. As will be seen in Section 1.3.3.2, several MAC protocols rely on minimum time gaps to determine if the medium is accessible. While time gaps are a simple way to detect the start of a frame, it should be possible to detect the end of it, too. Using time gaps, the end of the previous packet can be detected only after a period of silence. Even if the receiver detects a silent medium, it cannot be sure if this is the result of a successful transmission or a link or node failure. Therefore, additional mechanisms are needed.

1.2.2.2 Code Violations

A bit is usually encoded by a certain signal pattern (e.g., a change in voltage or current levels) that is, of course, different for a 0 and 1 bit. A signal pattern that represents neither of the allowed values can be taken as a marker for the start of a frame. An example of this is the IEEE 802.5 Token Ring protocol [28], which uses differential Manchester encoding (see Figure 1.1 and [28]). Here, two special symbols appear: J for a so-called positive code violation and K for a negative one. In contrast to the bit definitions in the encoding, these special symbols do not show a transition in the middle of the bit. Special 8-bit-long characters that mark the beginning and end of the frame are constructed from these symbols.

1.2.2.3 Start/End Flags

Some protocols use special flags to indicate the frame boundaries. A sequence of 01111110, that is, six 1 bits in a sequence surrounded by two 0 bits, marks the beginning and the end of each frame. Of course, since the payload that is being transmitted can be an arbitrary sequence of bits, it is possible that the flag is contained in the payload. To avoid misinterpretation of a piece of payload data as being the end of a frame, the sender has to make sure that it only transmits the flag pattern if it is meant as a flag. Any flag-like data have to be altered in a consistent way to allow the receiver to recover the original payload. This can be done using bit- or byte-/character-stuffing techniques.

Bit stuffing, as exercised in high-level data link control (HDLC) [23] protocols, requires the sender to insert a zero bit after each sequence of five consecutive 1 bits. The receiver checks if the sixth bit that follows five 1s is a zero or one bit. If it detects a zero, it is removed from the sequence of bits. If it detects a 1, it can be sure that this is a frame boundary. Unfortunately, it might happen that a transmission error modifies the data sequence 01111100 into 01111110, and thus creates a flag prematurely. Therefore, additional mechanisms like time gaps are needed to remove the following bits and detect the actual end of the frame.

Byte stuffing, as employed in Point-to-Point Protocol (PPP), uses the same flag pattern, but relies on a byte-oriented transmission medium [5, 6]. The flag can be written as a hexadecimal value 0x7E. Every unintentional appearance of the flag pattern is replaced by two characters, 0x7D 0x5E. This way, the flag character disappears, but the 0x7D (also called escape character) has to be replaced, if it is found in the user data. To this end, 0x7D is replaced by 0x7D 0x5D. The receiver, after detecting the escape character in the byte stream, discards this byte and performs an exclusive-or (XOR) operation of 0x20 with the following byte to recover the original payload.

In both cases, more data are being transmitted than would be necessary without bit- or byte-stuffing techniques. To make things worse, the amount of overhead depends on the contents of the payload. A malicious user might effectively double the data rate (with byte stuffing) by transmitting a continuous stream of flags, or increase it by up to 20% (with bit stuffing) by transmitting a continuous stream of 1 bits, since one zero is stuffed after every five 1 bits. To avoid this, several measures can be taken. One is to scramble the user data before they are put into data frames [4]. Another possibility is the so-called consistent overhead byte stuffing (COBS), proposed in [1]. Here, the stream of data bytes is scanned in advance for appearing flags. The sequence of data bytes is then cut into chunks of at most 254 bytes not containing the flag. Thus, every flag that appears in the flow is replaced by one byte representing the number of nonflag data bytes following it. This way, no additional data are to be transmitted as long as there is at least one flag every 255 data bytes. Otherwise, one byte is inserted every 254 bytes, indicating a full-length chunk.

1.2.2.4 Length Field

To avoid the processing overhead that comes with bit or character stuffing, it is possible to reserve a field in the frame header that indicates the length of the whole frame. Having read this field, the receiver knows in advance how many characters or bytes will arrive. No end delimiter is needed anymore. Either a continuous transmission of packets followed by idle symbols, or the usual combination of preamble and start delimiter is needed to correctly determine which of the header fields carries the length information.

Being potentially the best solution concerning transmission overhead, the length field mechanism suffers from erroneous transmission media. If the packet length information is lost or corrupted, then it is difficult to find again. Therefore, it has to be protected separately using error-correcting codes or redundant transmission. Additional mechanisms (for example, time gaps) should be employed to find the end of a frame even when the length field is erroneous.
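The stuffing techniques of Section 1.2.2.3 can be illustrated with a minimal Python sketch. It is not taken from any of the cited standards: the function names are hypothetical, bits are modeled as lists of 0 and 1, the byte-stuffing functions handle only the flag and escape characters named in the text, and the COBS functions assume the common convention that the byte to be eliminated is 0x00.

```python
FLAG, ESC = 0x7E, 0x7D  # flag and escape characters from the text


def bit_stuff(bits):
    """Insert a 0 after every run of five consecutive 1 bits."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 5:
            out.append(0)  # stuffed bit
            ones = 0
    return out


def bit_unstuff(bits):
    """Remove stuffed zeros; a sixth 1 bit means a flag or an error."""
    out, ones, i = [], 0, 0
    while i < len(bits):
        out.append(bits[i])
        ones = ones + 1 if bits[i] == 1 else 0
        if ones == 5:
            i += 1  # index of the bit that follows five 1s
            if i < len(bits) and bits[i] == 1:
                raise ValueError("flag or transmission error inside payload")
            ones = 0
        i += 1
    return out


def byte_stuff(payload):
    """Replace FLAG and ESC by ESC followed by the byte XOR 0x20."""
    out = bytearray()
    for byte in payload:
        if byte in (FLAG, ESC):
            out += bytes([ESC, byte ^ 0x20])
        else:
            out.append(byte)
    return bytes(out)


def byte_unstuff(data):
    """Discard each ESC and XOR the following byte with 0x20."""
    out, it = bytearray(), iter(data)
    for byte in it:
        if byte == ESC:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("escape character at end of data")
            byte = nxt ^ 0x20
        out.append(byte)
    return bytes(out)


def cobs_encode(payload):
    """COBS sketch; the eliminated byte is assumed to be 0x00."""
    out, block = bytearray(), bytearray()
    for byte in payload:
        if byte == 0:
            out.append(len(block) + 1)  # code byte takes the place of the zero
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:       # full-length chunk, no zero implied
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(data):
    out, i = bytearray(), 0
    while i < len(data):
        code = data[i]
        out += data[i + 1:i + code]
        i += code
        if code < 0xFF and i < len(data):
            out.append(0)  # restore the zero the code byte stood for
    return bytes(out)


payload = bytes([0x7E, 0x41, 0x7D])
assert byte_unstuff(byte_stuff(payload)) == payload
print(byte_stuff(payload).hex())         # 7d5e417d5d
print(len(bit_stuff([1] * 10)))          # 12
print(cobs_encode(b"\x11\x00\x22").hex())  # 02110222
```

In this sketch, ten consecutive 1 bits grow to twelve transmitted bits, which is the 20% worst case of bit stuffing, while a payload consisting only of flag bytes doubles in size under byte stuffing. The COBS output contains no zero byte, so a zero can delimit frames unambiguously.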

1.2.3 Example: Bit and Frame Synchronization in the PROFIBUS As an example to illustrate the mechanisms introduced above, let us look at the lower layers of the EN 50170/DIN 19245 process fieldbus (PROFIBUS) [52]. This standard defines a fieldbus system for industrial applications. The lowest layer, the physical layer, is based on the RS-485 electrical interface. Shielded twisted-pair or fiber cable may be used as the transmission medium. The UART converts every byte into 11-bit transmission characters by adding start, parity, and stop bits. Thus, asynchronous bit synchronization is used on the lowest layer of the PROFIBUS. The second layer is called the fieldbus data link (FDL). It defines the frame format, as shown in Figure 1.2. Start and end delimiters are used in every frame, but different start delimiters SDx (SD1 to SD4; the

latter is not shown in the figure) define different frame types. Thus, a receiver knows after reading an SD1 that a control frame of fixed length will arrive. In addition, time gaps of 33 bit times are required between the frames. After receiving an SD3, the receiver interprets the next byte as LE (length field) and checks this against the redundant transmission of LE in the third byte, thereby decreasing the probability of undetected errors in the length field. Using the combination of time gaps and the redundant transmission of the length field, a character stuffing to replace all possible start and end delimiters in the payload becomes unnecessary.
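The 11-bit transmission character mentioned above can be sketched as follows. The bit order (least significant bit first) and the use of even parity are assumptions made here for illustration; the text only states that start, parity, and stop bits are added to each byte. The function name is hypothetical.

```python
def uart_character(byte):
    """Build an 11-bit character: start bit, 8 data bits, parity bit, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data = [(byte >> i) & 1 for i in range(8)]   # assumed: LSB is sent first
    parity = sum(data) % 2                       # assumed: even parity
    return [0] + data + [parity] + [1]           # start bit = 0, stop bit = 1
```

Since 11 bits are sent for every 8 payload bits, the character framing alone limits the efficiency of the physical layer to 8/11, or about 73%.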

1.3 Medium Access Control Protocols All medium access control or multiple-access control (MAC) protocols try to solve the same problem: to let a number of stations share a common resource (namely, the transmission medium) in an efficient manner and such that some desired performance objectives are met. They are a vital part of local area network (LAN) and metropolitan area network (MAN) technologies, which typically connect a small to moderate number of users in a small geographical area, such that a user can communicate with other users. With respect to the Open Systems Interconnection (OSI) reference model, the MAC layer does not form a protocol layer on its own, but is considered a sublayer of either the physical layer or the data link layer [45]. However, due to its distinguished task, the MAC sublayer deserves separate treatment. The importance of the MAC layer is reflected by the fact that many MAC protocol standards exist, for example, the IEEE 802.x standards. Its most fundamental task is to determine for each station attached to a common broadcast medium the points in time where it is allowed to access the medium, i.e., to send data or control frames. To this end, each station executes a separate instance of a MAC protocol. The design and behavior of a MAC protocol depend on the design goals and the properties of the underlying physical medium. Specifically for hard real-time communications, the MAC layer is a key component: if the delays on the MAC layer are not strictly bounded, the upper layers cannot compensate this. A large number of MAC protocols have been developed during the last three decades. The following references are landmark papers or survey articles covering the most important protocols: [7], [8], [17], [20], [21], [32], [33], [34], [36], [42], [43], [48], [49], [50], [54]. Furthermore, MAC protocols are covered in many textbooks on computer networking, for example, [12], [23], [45]. 
In this survey, we stick to those protocols that are important for industrial applications and that have found some deployment in factory plants, either as stand-alone solutions or as building blocks of more complex protocols.

1.3.1 Requirements and Quality-of-Service Measures There are a number of (sometimes conflicting) requirements for MAC protocols; some of them are specific for industrial applications with hard real-time and reliability constraints. There are two main delay-oriented measures: the medium access delay and the transmission delay. The medium access delay is the time between arrival of a frame and the time where its transmission starts. This delay is affected by the operational overhead of the MAC itself, which may include collisions, MAC control frames, backoff and waiting times, and so on. The transmission delay denotes the time between frame arrival and its successful reception at the intended receiver. Clearly, the medium access delay is a fraction of the transmission delay. For industrial applications with hard real-time requirements, both delays must be upper bounded. In addition, a desirable property is to have low medium access delays in case of low network loads. A key requirement for industrial applications is the support for priorities: important frames (for example, alarms, periodic process data) should be transmitted before unimportant ones. This requirement can be posed locally or globally: in the local case, each station decides independently which of its waiting frames is transmitted next. There is no guarantee that station A’s important frames are not blocked by station B’s unimportant frames. In the global case, the protocol identifies the most important frame of all stations to be transmitted next.

The need to share bandwidth between stations constitutes another important class of desired MAC properties. A frequently posed requirement is fairness: stations should get their fair share of the bandwidth, even if other stations demand much more. It is also often required that a station receives a minimum bandwidth, like for the transmission of periodic process data of fixed size. With respect to throughput, it is clearly important to keep the MAC overhead small. This concerns the frame formats, the number and frequency of MAC control frames, and efficiency losses due to the operation of the MAC protocol. An example for efficiency loss is collisions: the bandwidth spent for collided packets is lost, since typically the collided frames are useless and must be retransmitted. A MAC protocol is said to be stable if an increase in the overall load does not lead to a decrease in throughput. Depending on the application area, other constraints can be important as well. For simple field devices, the MAC implementation should have a low complexity and be simple enough to be implementable in hardware. For mobile stations using wireless media, the energy consumption is a major concern; therefore, power-saving mechanisms are needed. For wireless transmission media, the MAC should contain additional mechanisms to adapt to the instantaneous error behavior of the wireless channel; possible control knobs are the transmit power, error-correcting codes, the bit rate, and several more.

1.3.2 Design Factors The most important factors influencing the design of MAC protocols are the medium properties/medium topology and the available feedback from the medium. We can broadly distinguish between guided media and unguided media. In guided media the signals originating from frame transmissions propagate within well-specified geographical bounds, typically within copper or fiber cables. If the medium is properly shielded, then beyond these bounds the communications are invisible and two cables can be placed close to each other without mutual interference. In contrast, in unguided media (with radio frequency or infrared wireless media being the prime example) the wave propagation is visible in the whole geographical vicinity of the transmitter, and ongoing transmissions can be received at any point close enough to the transmitter. Therefore, two different networks overlaid within the same geographical region can influence each other. This coexistence problem appears, for example, with IEEE 802.11b [37] and Bluetooth [13, 22]. Both systems utilize the 2.4-GHz industrial, scientific, and medical (ISM) band [16, 25, 35]. Guided media networks can have a number of topologies. We discuss a few examples. In a ring topology (see Figure 1.3), each station has a point-to-point link to its two neighbors, such that the stations form a ring. In a bus topology like the one shown in Figure 1.4, the stations are connected to a common bus

FIGURE 1.3 Ring topology.

FIGURE 1.4 Bus topology (the black boxes are line terminations).

FIGURE 1.5 Star topology.

FIGURE 1.6 Partial mesh topology.

and all stations see the same signals. Hence, the bus is a broadcast medium. In the star topology illustrated in Figure 1.5, all stations only have a physical connection to a central device, the star coupler, which repeats and optionally amplifies the signals coming from one line to all the other lines. A network with a star topology also provides a broadcast medium, where each station can hear all transmissions. When using wireless transmission media, the distance between stations might be too large to allow all stations to receive all transmissions. Therefore, the network is often only partially connected or has a partial mesh structure, shown in Figure 1.6. Additional routing mechanisms have to be employed to implement multihop transmission, for example, from station 4 to station 8. An important property of a physical channel is the available feedback. Specifically, some kinds of media allow a station to read back data from the channel while transmitting. This can be done to detect faulty transceivers (like in the PROFIBUS protocol [52]), collisions (like in the Ethernet protocol), or parallel ongoing transmissions of higher priority (like in the Controller Area Network (CAN) protocol). This feature is typically not available when using wireless technologies: it is not possible to send and receive simultaneously on the same channel.

1.3.3 Random Access Protocols In random access (RA) protocols the stations are uncoordinated and the protocols work in a fully distributed manner. RA protocols typically incorporate a random element, for example, by exploiting random packet arrival times, setting timers to random values, and so on. The lack of central coordination and of fixed resource assignment allows the sharing of a channel between a potentially infinite number of stations, whereas fixed assignment and polling protocols support only a finite number of stations. However, the randomness can make it impossible to give deterministic guarantees on medium access delays and transmission delays. There are many RA protocols that are used not only on their own, but also as building blocks of more complex protocols. One example is the GSM system, where speech data are transmitted in exclusively allocated time slots on a certain frequency, but the call setup messages have to contend for a shared channel using an ALOHA-like protocol.

1.3.3.1 ALOHA and Slotted ALOHA A classical protocol is ALOHA [7], for which we present two variants here. In both variants a number of stations want to transmit packets to a central station. In pure ALOHA a station sends a newly arriving data frame immediately without inquiring the status of the transmission medium. Hence, frames from multiple stations can overlap at the central station (collision) and become unrecognizable. In slotted ALOHA all stations are synchronized to a common time reference and the time is divided into fixed-size time slots. Newly arriving frames are transmitted at the beginning of the next time slot. In both ALOHA variants the transmitter starts a timer after frame transmission. The receiver has to send an immediate acknowledgment frame upon successful reception of the data frame. When the transmitter receives the acknowledgment, it stops the timer and considers the frame successfully transmitted. If the timer expires, the transmitter selects a random backoff time and waits for this time before the frame is retransmitted. The backoff time is chosen randomly to avoid synchronization of colliding stations. This protocol has two advantages: it is extremely simple and it offers short delays in case of a low network load. However, the protocol does not support priorities, and with increasing network load, the collision rate increases and the transmission delays grow as well. In addition, ALOHA is not stable: above a certain threshold load an increase in the overall load leads to a decrease in overall throughput. The maximum normalized throughput of pure ALOHA is 1/(2e) ≈ 18% under Poisson arrivals and an infinite number of stations. The maximum throughput can be doubled with slotted ALOHA. A critical parameter in ALOHA is the backoff time, which is typically chosen from a certain time interval (backoff window). A collision can be interpreted as a sign of congestion.
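The throughput figures quoted above follow from the classical Poisson analysis, which is not derived in this chapter: with a normalized offered load G (frames per frame time, including retransmissions), the throughput is S = G·exp(−2G) for pure ALOHA and S = G·exp(−G) for slotted ALOHA. The short sketch below evaluates these formulas; the function names are illustrative.

```python
import math

def pure_aloha_throughput(load):
    """Normalized throughput S = G * exp(-2G) of pure ALOHA."""
    return load * math.exp(-2.0 * load)

def slotted_aloha_throughput(load):
    """Normalized throughput S = G * exp(-G) of slotted ALOHA."""
    return load * math.exp(-load)

if __name__ == "__main__":
    for load in (0.25, 0.5, 1.0, 2.0):
        print(load,
              round(pure_aloha_throughput(load), 3),
              round(slotted_aloha_throughput(load), 3))
```

The maxima lie at G = 0.5 for pure ALOHA and at G = 1 for slotted ALOHA. Beyond these loads the throughput falls again, which is the instability described in the text.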
If another collision occurs after the backoff time, the next backoff time should be chosen from a larger backoff window to reduce the pressure on the channel. A popular rule for the evolution of the backoff window is the truncated binary exponential backoff scheme, where the backoff window size is doubled upon every collision. Above a certain number of failed trials, the window remains constant. After successful transmission the backoff window is restored to its original value.

1.3.3.2 CSMA Protocols In carrier-sense multiple-access (CSMA) the stations act more carefully than in ALOHA: before transmitting a frame they listen on the medium (carrier sensing) to see whether it is busy or free [32, 46]. If the medium is free (many protocols require it to be contiguously free for some minimum amount of time), the station transmits its frame. If the medium is busy, the station defers transmission. The various CSMA protocols differ in the following steps. In nonpersistent CSMA the station simply defers for a random time (backoff time) without listening to the medium during this time. After this waiting time the station listens again. All other protocols discussed next wait until the end of the ongoing transmission before starting further activities. In p-persistent CSMA (0 < p < 1) the time after the preceding transmission ends is divided into time slots. A station listens to the medium at the beginning of a slot. If the medium is free, the station starts transmitting its frame with a probability p and with probability 1 – p it waits until the next slot comes. In 1-persistent CSMA the station transmits immediately without further actions. Both approaches still have the risk of collisions, since two or more stations can decide to transmit (1-persistent CSMA) or can choose the same slot (p-persistent CSMA).
The problem is the following: if station A senses the medium as idle and starts transmission at time t0, station B would notice this earliest at some later time t0 + t, due to the propagation delay. If B performs carrier sensing at a time between t0 and t0 + t, it senses the medium to be idle and starts transmission too, resulting in a collision. Therefore, the collision probability depends on the propagation delay, and thus on the maximum geographical distance between stations. Similar to ALOHA, pure CSMA protocols rely on acknowledgments to recognize collisions. Although the throughput of CSMA-based protocols is much better than that of ALOHA (ongoing transmissions can be completed without disturbance), the number of collisions and their duration limit the throughput. Collision detection and collision avoidance techniques can be used to mitigate these problems. These are discussed in the following sections. Specifically for wireless media the task of carrier sensing is not without problems. After all, the transmitter senses the medium ultimately because it wants to know the state of the medium at the

FIGURE 1.7 Hidden-terminal scenario.

intended receiver, since collisions are only important at the receiver. However, due to path loss [40, Chapter 4], any signal experiences attenuation with increasing distance. If a minimum signal strength is required, the hidden-terminal problem occurs (refer to Figure 1.7): consider three stations, A, B, and C, with transmission radii as indicated by the circles. Stations A and C are in range of B, but A is not in the range of C and vice versa. If C starts to transmit to B, A cannot detect this by its carrier-sensing mechanism and considers the medium to be free. Hence, A also starts frame transmission and a collision occurs at B. For wireless media there is a second scenario where carrier sensing leads to false predictions about the channel state at the receiver: the so-called exposed-terminal scenario, depicted in Figure 1.8. The four stations A, B, C, and D are placed such that the pairs A/B, B/C, and C/D can hear each other; all remaining combinations cannot. Consider the situation where B transmits to A, and one short moment later C wants to transmit to D. Station C performs carrier sensing and senses the medium is busy, due to B’s transmission. As a result, C postpones its transmission. However, C could safely transmit its frame to D without disturbing B’s transmission to A. This leads to a loss of efficiency. Two approaches to solve these problems are busy-tone solutions [50] and the request-to-send (RTS)/clear-to-send (CTS) protocol, as applied in the IEEE 802.11 wireless LAN (WLAN) medium access control protocol [47]. In the busy-tone approach the receiver transmits a busy-tone signal on a second channel during frame reception. Carrier sensing is performed on this second channel. This solves the exposed-terminal problem. The hidden-terminal scenario is also solved, except in the rare cases where A and C start their transmissions simultaneously. The RTS/CTS protocol attacks the hidden-terminal problem using only a single channel.
Consider the case that A has a data frame for B. After A has obtained channel access, it sends a short RTS frame to B, indicating the time duration needed for the whole frame exchange sequence (the sequence consists of the RTS frame, the CTS frame, a data frame, and a final acknowledgment frame). If B receives the RTS frame properly, it answers with a CTS frame, indicating the time needed for the remaining frame exchange sequence. Station A starts transmission after receiving the CTS frame. Station C, hearing the RTS and CTS frames, defers its transmissions for the indicated time, thus not disturbing the ongoing frame exchange. It is a conservative choice to defer on any of these frames, but the exposed-terminal problem still exists. If station C defers only on receiving both frames, the exposed-terminal problem is solved. However, there is the risk of bit errors in the CTS frame, which may lead C to start transmissions falsely. The RTS/CTS protocol of IEEE 802.11 does not resolve collisions of RTS frames at the receiver, nor does

FIGURE 1.8 Exposed-terminal scenario.

it entirely solve the hidden-terminal problem [39]. Furthermore, this four-way handshake imposes serious overhead, which only pays off for large frames.

1.3.3.3 CSMA Protocols with Collision Detection If two or more stations collide without recognizing this, they would uselessly transmit their entire frames. If the stations could quickly detect a collision and abort transmission, less bandwidth would be wasted. The class of carrier-sense multiple access with collision detection (CSMA/CD) protocols enhances the basic CSMA method with a collision detection facility. The collision detection is performed by reading back the signal from the cable during transmission, and by comparing the measured signal with the transmitted one. If the signals differ, a collision has been detected [23, Section 6.1.3]. When a station experiences a collision, it executes a backoff algorithm. In the IEEE 802.3 Ethernet this algorithm works with slotted time. A time slot is large enough to accommodate the maximum round-trip time, in order to make sure that all stations have the chance to reliably recognize an ongoing transmission. As an example, in the CSMA/CD method of IEEE 802.3 a truncated binary exponential backoff scheme is used: after the first collision, a station randomly chooses to wait either 0 or 1 slot. If another station starts transmission during the waiting time, the station defers. After the second collision, a station chooses to wait between 0 and 3 slots, and for all subsequent collisions, the backoff window is doubled. After 10 collisions the backoff window is kept fixed to 1024 slots, and after 16 collisions the station gives up and discards the frame. In wireless LANs (for example, in the IEEE 802.11 wireless LAN) acknowledgment frames are used to give the transmitter feedback, since wireless transceivers cannot transmit and receive simultaneously on the same channel. The lack of an acknowledgment frame indicates either a collision or a transmission error.
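The truncated binary exponential backoff rule of IEEE 802.3 described above can be written down in a few lines. The sketch below is illustrative only: the function name and the convention of returning None when the frame is discarded are assumptions, while the numerical limits (window doubling, a cap of 1024 slots after 10 collisions, giving up after 16 collisions) are taken from the text.

```python
import random

MAX_EXPONENT = 10   # window stops growing after 10 collisions (1024 slots)
MAX_ATTEMPTS = 16   # frame is discarded after 16 collisions

def backoff_slots(collisions, rng=random):
    """Return the number of slots to wait after the given number of
    collisions, or None if the frame has to be discarded."""
    if collisions < 1:
        raise ValueError("backoff is only defined after a collision")
    if collisions >= MAX_ATTEMPTS:
        return None                              # give up
    exponent = min(collisions, MAX_EXPONENT)
    return rng.randrange(2 ** exponent)          # uniform in 0 .. 2^exponent - 1
```

After the first collision the result is 0 or 1, after the second it lies between 0 and 3, and from the tenth collision on it is drawn from 0 to 1023.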
Furthermore, two colliding frames do not necessarily result in total loss of information: when the signal strength of one frame is much stronger than the other one, the receiver may be able to successfully decode the frame (near–far effect).

1.3.3.4 CSMA Protocols with Collision Resolution This class of CSMA protocols reacts to collisions not by going into a backoff mode and deferring transmissions, but by trying to resolve them. One approach to resolving a collision is to determine one station among the contenders, which is ultimately allowed to send its frame. Examples are protocols with bit-wise priority arbitration like the MAC protocol of Controller Area Network (CAN) [30] and the protocol used for the D-channel of Integrated Services Digital Network (ISDN) [41]. Another approach is to give all contenders a chance to transmit, as is done in the adaptive tree walking protocol [14], which works as follows: The time is slotted, just as in the Ethernet CSMA/CD protocol. Furthermore, all stations are arranged in a balanced binary tree T and know their respective positions in this tree. All stations wishing to transmit a frame (called backlogged stations) wait until the end of the ongoing transmission and start to transmit their frame in the first slot (slot 0). If there is only one backlogged station, then it can transmit its frame without further disturbance. If two or more stations collide, then in slot 1 only the members of the left subtree TL are allowed to try transmission again. If another collision happens, only stations of the left subtree TL,L of TL are allowed to transmit in slot 2, and so forth. On the other hand, if only one station from TL transmits its frame, then for fairness reasons the next frame transmission is reserved for a station from the right subtree TR, and so on. The bit-wise arbitration protocols do not try to be fair to stations. As an example, we present the MAC protocol of CAN.
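Before turning to CAN, the tree walking idea described above can be made concrete with a small simulation. This is a simplified sketch: stations are numbered 0 to N − 1 and taken as the leaves of a balanced binary tree, every visited subtree costs one slot, and the adaptive choice of the starting level as well as the skipping of slots whose outcome is already known are omitted. All names are illustrative.

```python
def tree_walk(backlogged, lo, hi, log=None):
    """Resolve contention among the backlogged stations in [lo, hi).

    Returns a list with one (outcome, lo, hi) entry per slot, where outcome
    is "idle", "collision", or the number of the transmitting station.
    """
    if log is None:
        log = []
    contenders = [s for s in sorted(backlogged) if lo <= s < hi]
    if not contenders:
        log.append(("idle", lo, hi))
    elif len(contenders) == 1:
        log.append((contenders[0], lo, hi))       # successful transmission
    else:
        log.append(("collision", lo, hi))
        mid = (lo + hi) // 2
        tree_walk(backlogged, lo, mid, log)       # left subtree first
        tree_walk(backlogged, mid, hi, log)       # then the right subtree
    return log

if __name__ == "__main__":
    for slot, entry in enumerate(tree_walk({2, 3, 6}, 0, 8)):
        print(slot, entry)
```

In this sketch every backlogged station eventually transmits exactly once, and the number of slots needed is bounded by the number of nodes in the tree, which is what makes such schemes attractive when bounded access delays are required.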
CAN requires a transmission medium that guarantees that overlapping signals do not destroy each other, but lead to a valid signal. If two stations transmit the same bit, the medium adopts the common bit value. If one station transmits a zero bit and the other a one bit, the medium adopts a well-defined state, for example, a zero bit. The CAN protocol uses a priority field of a certain length at the beginning of a MAC frame. Backlogged stations wait until the end of an ongoing frame and then transmit the first bit of their priority field. In parallel, they read back the state of the medium and compare it with their transmitted bits. If both agree, the station continues with the second bit of the priority field. If the bits differ, the station has lost contention and defers until the end of the next frame. This process

is continued until the end of the priority field is reached. If it can be guaranteed that all priority values are distinct, only one station survives contention. This protocol supports global frame priorities in a natural way, and the medium access time for the highest-priority frame is tightly bounded. However, the assignment of priorities to stations or frames is nontrivial when fairness is a goal. If the priorities are assigned on a per-station basis, the protocol is inherently unfair. One solution is to rotate station priorities over time. In CAN applications the priorities are not assigned to stations but to data. Another drawback of the protocol is that all stations have to be synchronized with a precision of a bit time, and the need for all stations to agree on the state of the medium limits either the bit rate or the geographical extension of a CAN network.

1.3.3.5 CSMA Protocols with Collision Avoidance If it is technically not feasible to immediately detect collisions, one might try to avoid them. Protocols belonging to this class are called carrier-sense multiple access with collision avoidance (CSMA/CA) protocols. An important application area is wireless LANs, where (1) stations cannot transmit and receive simultaneously on the same channel, and (2) the transmitter cannot directly detect collisions at the receiver due to path loss and the need for a minimum signal strength (see the discussion of the hidden-terminal scenario in Section 1.3.3.2). The IEEE 802.11 WLAN protocol combines two mechanisms to avoid collisions. The first one is the RTS/CTS handshake protocol described in Section 1.3.3.2. The second mechanism, the carrier-sensing mechanism of IEEE 802.11, not only requires a minimum idle time on the channel, but each station also chooses a random backoff time, during which the carrier-sense operation is continued. If another station starts to transmit in the meantime, the station defers and resumes after the other frame is finished.
This approach with random backoff times also enables the introduction of stochastic priorities into IEEE 802.11: frames with different priorities can choose their backoff times from different distributions, with more important frames likely having shorter backoffs than unimportant ones. Such an approach is proposed in [11] and also used for the IEEE 802.11e extension of the IEEE 802.11 standard. Another example is the EY/NPMA protocol of HIPERLAN [18]. Here the collision avoidance part consists of three phases; all stations wishing to transmit a frame wait for the end of the ongoing transmission. In the first phase (priority phase), the stations wait for a number of slots corresponding to the frame's priority (there are five distinct priorities). If station A decides to transmit in slot n and station B starts in slot m < n, then A defers, since B has a higher-priority frame. In the second phase (elimination phase), the surviving stations transmit a burst of random length before switching to receive mode. If a station then still senses energy on the channel, it gives up, since another station is sending a longer burst. In the third phase (yield phase), the surviving stations keep idle for a random amount of time. If another station starts to transmit in the meantime, the station defers. Otherwise, the station starts to transmit its data frame.
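Returning to the bit-wise priority arbitration of Section 1.3.3.4, the contention process can be sketched as below. The sketch assumes that a zero bit overrides a one bit on the medium, so that the numerically smallest priority field wins, and it assumes a field width of 11 bits; both choices and all names are illustrative and not prescribed by the text.

```python
def arbitrate(priorities, width=11):
    """Return the set of priority values that survive bit-wise arbitration.

    priorities: priority-field values of all backlogged stations.
    width:      length of the priority field in bits (assumed value).
    """
    contenders = set(priorities)
    for bit in range(width - 1, -1, -1):          # most significant bit first
        sent = {p: (p >> bit) & 1 for p in contenders}
        medium = min(sent.values())               # a zero bit overrides a one bit
        # Stations that read back a value different from their own bit drop out.
        contenders = {p for p in contenders if sent[p] == medium}
    return contenders
```

If all priority values are distinct, exactly one station remains, and the winning frame is transmitted without any bandwidth being lost to the contention itself.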

1.3.4 Fixed-Assignment Protocols In fixed-assignment (FA) protocols a station is assigned a channel resource (frequency, time, code, space) exclusively; i.e., it does not need to contend with other stations when using its share, and it is intrinsically guaranteed that medium access can be achieved within a bounded time. In frequency-division multiple-access (FDMA) systems the available spectrum is subdivided into N subchannels, with some guard band between them. A channel is assigned exclusively to a station. When a frame arrives, the station can transmit immediately on the assigned channel; the intended receiver has to know the channel in advance. Idle subchannels cannot be used by highly loaded stations. When a station wants to use multiple subchannels in parallel, it needs multiple transceivers. In code-division multiple-access (CDMA) systems the stations spread their frames over a much larger bandwidth than needed while using different codes to separate their transmissions. The receiver has to know the code used by the transmitter; all parallel transmissions using other codes appear as noise. Similar to FDMA, stations can transmit newly arriving frames immediately.

In time-division multiple-access (TDMA) systems the time is divided into fixed-length superframes, which in turn are divided into time slots. Each station is assigned a set of slots in every superframe.* During its slots, a station can use the full channel bandwidth. The stations need to be synchronized on slot boundaries; however, the slots contain some guard times to compensate for inaccurate synchronization. In a centralized setting where all stations transmit to a central station, such inaccuracies can be introduced by different propagation delays resulting from different distances between stations and the central controller. In the GSM network a timing-advance mechanism is used to compensate for different propagation delays [55]. In space-division multiple-access (SDMA) systems spatial resources are divided among stations. Consider, for example, a cellular system, where the base station is equipped with a smart antenna array. Using this array, the base station can form a number of spatially directed spot beams and focus them to the stations. If a beam covers two or more stations, they have to share the channel by some other protocol, but stations in different beams can transmit in parallel. Another example is the use of sectored antennas in cellular systems. In all these schemes the allocation of channel resources to stations can be static or dynamic. In the static case the allocation may be preconfigured. In a dynamic scheme a station requests the resource once from some resource management facility, which may be part of a central station/access point. The Time-Triggered Protocol is an example of such a scheme [51]. Another example is the cyclic window in WorldFIP, which offers a preconfigured allocation of time slots. Some light dynamics can be introduced by changing the allocation tables at appropriate times.
The fixed assignment of channel resources is advantageous, specifically for industrial applications, for the following reasons:
• It allows the guarantee of a minimum bandwidth to a station.
• It allows the guarantee of a strictly bounded medium access time as well as strictly isochronous service.
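The bounded medium access time of a TDMA assignment can be made explicit with a small calculation. The sketch below assumes slots of equal duration that are numbered from 0 within a superframe, and it assumes that a frame arriving just after the start of one of the station's own slots has to wait for the station's next slot; the function and parameter names are hypothetical.

```python
def worst_case_access_delay(owned_slots, slots_per_superframe, slot_duration):
    """Upper bound on the waiting time until the start of the next owned slot."""
    slots = sorted(set(owned_slots))
    if not slots:
        raise ValueError("station owns no slot")
    worst_gap = 0
    for i, slot in enumerate(slots):
        nxt = slots[(i + 1) % len(slots)]            # next owned slot, cyclically
        gap = (nxt - slot) % slots_per_superframe
        if gap == 0:                                 # single slot: a full superframe
            gap = slots_per_superframe
        worst_gap = max(worst_gap, gap)
    return worst_gap * slot_duration
```

For a station that owns slots 0 and 4 in a superframe of 10 slots, the bound is 6 slot durations; spreading the owned slots evenly over the superframe minimizes it.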

1.3.5 Demand-Assignment Protocols In demand-assignment (DA) protocols channel resources are also assigned exclusively to a station, but on a much shorter timescale than for fixed-assignment protocols. In the latter case, assignment happens once and lasts for the lifetime of a session, while in demand-assignment protocols resources are assigned only for the duration of a data burst. Consequently, for each new data burst a station must obtain new channel resources. Clearly, this involves appropriate signaling mechanisms. We can broadly distinguish two classes of DA protocols: In distributed protocols there is no central authority for resource allocation; instead, token-passing schemes are often used. On the other hand, in centralized protocols the stations have to signal their demands to a central station, which assigns resources and schedules transmissions. The signaling channel can be either in-band (requests can be piggybacked to transmissions of data or control frames) or a separate logical signaling channel with its own medium access procedure, for example, ALOHA/slotted ALOHA. For industrial applications demand-assignment protocols have two major advantages: they can guarantee a bounded medium access time and they allow the use of idle resources. However, for the distributed schemes there is inevitably some jitter in the medium access times, which hinders strictly isochronous services. The centralized schemes introduce a single point of failure, namely, the resource manager. 1.3.5.1 Centralized Schemes: Hub-Polling Protocols and Reservation Protocols As a very general description [44], a hub-polling system consists of a central station (called hub) and a number of stations, with each station conceptually having a queue of frames. The hub carries out two different tasks: (1) it queries the queue states from the stations, and (2) it assigns bandwidth to the stations according to the query results and some polling policy. 
*It is perfectly possible to assign slots every k-th superframe as well.

Typically it is assumed that a query is less costly than serving a frame; otherwise, the query overhead would not be justified. To be queried, a station must register itself with the hub.

Polling schemes differ in the sequence by which stations are polled:
• In round-robin, the stations are visited one after another.
• In table-driven schemes, the next station to be visited is determined from a prespecified table.
• In random polling, the next station to poll is determined randomly.

Furthermore, they differ in the type of service a polled station is granted:
• k-limited service: Up to k frames are served per station before proceeding to the next station.
• Time-limited service: The station may transmit frames, including retransmissions, for no longer than a specified time.
• Exhaustive service: A queue is serviced until it is empty.
• Gated service: The server serves only those frames of station i that were already present when starting service for i.

As an example, the master/slave protocol of PROFIBUS can be classified as a table-driven and time-limited service (however, with varying masters). In the BITBUS protocol [29], the role of the master does not change over time.

A variation of hub-polling protocols is probing protocols [24, 42]. These are based on the observation that polling each station separately is wasteful if the load is low. Instead, it is more effective to poll a group of stations as a whole. For example, the hub may announce that a random access slot follows, which can be used by stations belonging to a certain group to signal their transmission needs. If no station answers, the next group can be polled. If a single station answers, it is granted access to the medium. If two or more stations answer, their requests will collide in the random access slot. Different methods can now be applied to resolve this collision, for example, the tree walking approach discussed in Section 1.3.3.4, or all stations in the group can be polled separately.
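The combination of a polling sequence and a service discipline can be illustrated with a minimal sketch of round-robin polling with k-limited service. The function name, the data layout, and the simplification that no new frames arrive while the hub is polling are assumptions made for illustration only; they are not taken from any of the cited protocols.

```python
from collections import deque

def poll_round_robin(queues, k):
    """Serve stations in round-robin order, at most k frames per visit.

    queues: dict mapping a station name to a deque of pending frames.
    Returns the frames in the order in which they are served.
    Assumption: no new frames arrive while polling is in progress.
    """
    served = []
    stations = list(queues)              # fixed polling sequence
    while any(queues[s] for s in stations):
        for s in stations:               # the hub visits one station after another
            for _ in range(k):           # k-limited service
                if not queues[s]:
                    break                # queue empty: proceed to the next station
                served.append(queues[s].popleft())
    return served

queues = {"A": deque(["a1", "a2", "a3"]), "B": deque(["b1"]), "C": deque()}
order = poll_round_robin(queues, k=2)
print(order)
```

Choosing k at least as large as the longest queue turns this sketch into exhaustive service, since every queue is then emptied during its first visit.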
The approach of polling all stations of a group separately is introduced in [56], along with a scheme that adapts the group sizes to the current load.

In reservation protocols the stations have to send a reservation message to the resource manager. The reservation message may specify the length of the desired data transmission and its timing constraints. The resource manager can perform an admission control test to decide whether the request can be satisfied without harming the guarantees given to already admitted requests. After successful admission control, the resource manager sends some feedback describing the allocated resources (for example, the time slots to use). There are three common methods to transmit reservation messages: (1) in piggybacking schemes the reservation requests are sent along with already admitted data or control frames; (2) the stations send request frames on a separate signaling channel using a contention-based MAC protocol (ALOHA or CSMA protocols); and (3) the resource manager may poll all stations that are currently idle and thus cannot use piggybacking. Many protocols developed in the context of wireless ATM [17] belong to this class, for example, the MASCARA protocol [38]. The FTT-CAN [10] protocol is another example of this class, where stations send reservation requests for periodic transmissions to a central master station.

1.3.5.2 Distributed Schemes: Token-Passing Protocols

In distributed schemes there is no central facility controlling resource allocation or medium access. Instead, a special frame, called the token frame, circulates between stations. Only the station that currently holds the token (token owner) is allowed to initiate transmissions. After some time, the token owner must pass the token to another station by sending a token frame.
Token-passing schemes can be applied in networks with a ring topology (examples: IEEE 802.5 Token Ring [28] or Fiber Distributed Data Interface (FDDI) [15, 31]) or with a bus/tree topology (examples: IEEE 802.4 Token Bus [27] or PROFIBUS with the FMS profile [52]). To guarantee an upper bound on medium access delay, the IEEE Token Bus, FDDI, and PROFIBUS protocols use variants of the timed-token protocol [9]. In this protocol all stations agree on a common parameter, the target token rotation time $T_{TTRT}$. Furthermore, each station is required to measure the time $T_{RT}$ that passed between the last time it received the token and the actual token reception time. This time is called the token rotation time. If the difference $T_{TTRT} - T_{RT}$ is positive, the arriving token is called an early token; otherwise, it is called a late token. Some protocols forbid a station to transmit when receiving a late token; however, the PROFIBUS protocol allows transmission of a single high-priority frame in case of a late token. When a station receives an early token, it may transmit for a time corresponding to $T_{TTRT} - T_{RT}$. This way, the timed-token protocol guarantees an upper bound on the medium access delay.

Token-passing protocols over broadcast media (bus, tree) construct a logical token-passing ring. The token frame is passed among all stations in this ring; each station gets the token once per cycle. The ring members have the additional burden of executing ring maintenance algorithms, which, among other tasks, include new stations, exclude leaving or crashed stations, and detect and repair lost tokens. These mechanisms rely on control frames and are designed in a way that they do not harm the timing guarantees given by the timed-token protocol.
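The early/late token rule can be written down in a few lines. The following is a minimal sketch of the strict variant, in which a late token grants no transmission time at all; the function and parameter names are hypothetical, and the PROFIBUS exception for a single high-priority frame is deliberately not modeled.

```python
def token_holding_time(t_ttrt, t_rt):
    """Classify an arriving token and return the allowed transmission time.

    t_ttrt: target token rotation time agreed on by all stations.
    t_rt:   token rotation time measured by this station.
    Returns (is_early, allowed_time). A late token yields no transmission
    time in this strict variant (assumption; PROFIBUS would still permit
    one high-priority frame).
    """
    slack = t_ttrt - t_rt
    if slack > 0:
        return True, slack      # early token: transmit for at most the slack
    return False, 0.0           # late token: pass the token on immediately

print(token_holding_time(10.0, 7.5))    # early token
print(token_holding_time(10.0, 12.0))   # late token
```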

1.3.6 Meta-MAC Protocols

In reference [19] meta-MAC protocols are introduced. The basic idea is simple and elegant: a station contains not only a single MAC instance, but several of them, running in parallel. These can be entirely different protocols or the same protocol, but with different parameters. However, only one protocol is really active at a given time in the sense that its decisions (transmit/not transmit) are executed; in the other instances decisions are only recorded. From time to time a new active protocol is selected. This selection is based on history information about transmission outcomes (success, failure). For each candidate protocol it is evaluated how successful the protocol would have been given the outcomes in the history. For example, a protocol that produced a lot of transmit decisions in successful time slots would get a high ranking, while a protocol whose transmit decisions would have resulted in collisions gets a bad ranking. Based on this ranking, a new protocol is chosen.
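The ranking step can be sketched as follows. The scoring rule used here (plus one for a transmit decision in a slot that would have succeeded, minus one for a transmit decision that would have collided) is an assumption chosen for illustration; it is not the rule of reference [19], and all names are hypothetical.

```python
def rank_protocols(decisions, slot_was_free):
    """Score each candidate MAC instance against the recorded history.

    decisions:     dict mapping a protocol name to a list of booleans,
                   one recorded transmit/not-transmit decision per slot.
    slot_was_free: list of booleans, True if a transmission in that slot
                   would have succeeded.
    Returns a dict mapping protocol name to score (higher is better).
    """
    scores = {}
    for name, recorded in decisions.items():
        score = 0
        for transmit, free in zip(recorded, slot_was_free):
            if transmit:
                score += 1 if free else -1   # reward success, punish collision
        scores[name] = score
    return scores

slot_was_free = [True, False, True, True, False]
decisions = {
    "aggressive": [True, True, True, True, True],
    "cautious": [True, False, False, True, False],
}
scores = rank_protocols(decisions, slot_was_free)
best = max(scores, key=scores.get)   # candidate to activate next
print(scores, best)
```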

1.4 Error Control Techniques

When a packet is transmitted over a physical channel, it might be subject to distortions and channel errors. Potential error sources are noise, interference, loss of signal power, etc. As a result, a packet may be either completely or partially lost (for example, when the receiver fails to acquire bit synchronization or loses it somewhere), or a number of the bits within a packet are modified. In some types of channels, errors occur quite frequently, with wireless channels being the prime example [79].

One option to deal with errors is to tolerate them. For example, in Voice-over-IP systems a loss rate of speech packets of approximately 1% still gives an acceptable speech quality at the receiver, depending on the codec and the influence of error concealment techniques [65, Chapter 7]. Hence, as long as the loss rate is below this level, no action needs to be taken. However, in safety-critical industrial applications, errors are often not tolerable; they must be detected and subsequently corrected.

There are the following fundamental approaches to error control [61, 69, 70]:
• In open-loop approaches the transmitter receives no feedback from the receiver about the transmission outcomes. Redundancy is introduced to protect the transmitted data against a certain amount of errors.
• In closed-loop schemes the transmitter gets feedback about erroneously received packets. The receiver requests retransmission of these packets by the transmitter.
• In hybrid schemes these two approaches are combined.

The detection of errors is based on checksums, which are appended to a packet. Well-known kinds of checksums are cyclic redundancy checks (CRCs) or parity bits [69]. However, no checksum algorithm is perfect; there are always bit error patterns that cannot be detected by a checksum algorithm. Hence, the residual error probability is nonzero, but fortunately very small for many practical channels.
A study of the performance of checksum algorithms over real data is given in [76]. There is a rich literature on error control. Some standard references are [61], [69], [70], and [71].

1.4.1 Open-Loop Approaches

In general, open-loop approaches involve redundant data transmission. Several kinds of redundancy can be used:
• Send multiple copies of a packet.
• Add redundancy bits to the packet data.
• Diversity techniques.

In the multiple-copies approach, the transmitter sends K identical copies of the same packet [57], each one equipped with a checksum. If the receiver receives at least one copy without checksum errors, this is accepted as the correct packet. If the receiver receives all copies with checksum errors, it might apply a bit-by-bit majority voting scheme [73, Chapter 4] on all received copies and check the result again. A variation of the multiple-copies scheme is to not send multiple copies of the same packet, but to send each bit of the user data multiple times: instead of sending 00110101, the transmitter sends, for example, 000.000.111.111.000.111.000.111. Hence, each user bit is transmitted three times and the receiver applies majority voting to each group of three bits.

In error-correcting codes, also called forward error correction (FEC) codes, a number n – k of redundancy bits is appended to k bits of user data and the block of n bits is transmitted (the fraction k/n is called the code rate), such that bit errors can be detected and a limited number of bit errors can be corrected [69–71]. In block coding schemes, the user data are divided into blocks of k bits and each block is coded independently. Some well-known block FEC schemes are Reed–Solomon codes, Hamming codes, and Bose–Chaudhuri–Hocquenghem (BCH) codes. In convolutional coding schemes, the encoder has some memory, such that the coding of the current bit affects coding of future bits. Therefore, there are no clear block boundaries. Recently, the class of turbo codes has attracted considerable attention [75, 60]. In this class of codes two convolutional codes are concatenated and combined with an interleaver [58].

Diversity techniques are often applied on wireless channels.
In general, in diversity schemes multiple copies of the same signal are created and the receiver tries to combine these copies in a sensible way. These copies can be created either explicitly (by sending the same packet multiple times on the same channel, on different channels, in different directions, etc.) or implicitly (by letting the channel create the multiple signal copies — reflections). In the case of receiver diversity, the receiver is equipped with two or more antennas. If these are appropriately spaced [72], the antennas receive two copies of the transmitted waveform, which in the best case are uncorrelated. Hence, it might happen that one antenna receives only a weak signal while the other one experiences good signal quality. The two antenna signals may then be combined in different ways. Clearly, there are many more diversity schemes.
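Returning to the bit-repetition variant of the multiple-copies scheme described earlier in this section, the following minimal sketch encodes the example bit string 00110101 with three repetitions per bit and decodes it by majority voting. The function names are hypothetical, and bits are represented as characters purely for readability.

```python
def encode_repetition(bits, n=3):
    """Repeat every user bit n times, e.g. '01' -> '000111' for n = 3."""
    return "".join(b * n for b in bits)

def decode_repetition(coded, n=3):
    """Apply majority voting to each group of n received bits."""
    groups = [coded[i:i + n] for i in range(0, len(coded), n)]
    return "".join("1" if g.count("1") > n // 2 else "0" for g in groups)

coded = encode_repetition("00110101")
# Flip the first bit of the third group (111 becomes 011).
corrupted = coded[:6] + "0" + coded[7:]
print(coded)
print(decode_repetition(corrupted))
```

A single bit error per group is corrected, whereas two errors within the same group of three flip the majority and are decoded wrongly. The price is a code rate of only 1/3.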

1.4.2 Closed-Loop Approaches

In closed-loop approaches receiving station B checks the arriving packets sent by station A by means of checksums and sequence numbers. In addition, station B provides A with feedback information indicating the transmission outcome (success or failure). Usually, B sends acknowledgment frames to provide this feedback to A, but the feedback information may as well be piggybacked onto data frames sent from B to A. Automatic repeat request (ARQ) protocols implement this approach [23, 63]. Some basic ARQ protocols are the send-and-wait/alternating-bit protocol, the Go-back-N protocol, and the Selective-Repeat protocol.

In the send-and-wait/alternating-bit protocol, the transmitter sends a packet and starts a timer. The receiver sends an acknowledgment if the packet is received correctly; otherwise, it keeps quiet. If the transmitter receives the acknowledgment, the timer is canceled and the next packet is transmitted. If the transmitter’s timer expires without acknowledgment, the transmitter retransmits the packet. A 1-bit sequence number is used to prevent duplicates at the receiver, which can occur if not the data frame but the acknowledgment is lost and the same data frame is transmitted again. If the receiver receives a duplicate packet, the packet is acknowledged, but the data are not delivered to the user. This protocol is simple and works reliably as long as the delay for data packets or acknowledgments can be upper bounded.
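The role of the 1-bit sequence number can be seen in a small simulation. This is a minimal sketch under simplifying assumptions: the channel is driven by a scripted list of loss events instead of random errors, a timer expiry is modeled simply as the next loop iteration, and all names are hypothetical.

```python
def send_and_wait(packets, losses):
    """Simulate the alternating-bit protocol over a scripted lossy channel.

    packets: list of payloads to transfer.
    losses:  one entry per transmission attempt: "data" (data frame lost),
             "ack" (acknowledgment lost) or None (no loss). Attempts beyond
             the end of the list are loss-free.
    Returns (payloads delivered to the user, number of transmission attempts).
    """
    delivered = []
    expected = 0                    # receiver: next expected sequence bit
    attempts = 0
    loss_iter = iter(losses)
    for i, payload in enumerate(packets):
        seq = i % 2                 # 1-bit sequence number
        while True:
            attempts += 1
            loss = next(loss_iter, None)
            if loss == "data":
                continue            # frame lost: timer expires, retransmit
            if seq == expected:     # new frame: deliver it to the user
                delivered.append(payload)
                expected ^= 1
            # A duplicate is acknowledged as well, but not delivered again.
            if loss == "ack":
                continue            # ack lost: timer expires, retransmit
            break                   # ack received: proceed to the next packet
    return delivered, attempts

delivered, attempts = send_and_wait(
    ["p0", "p1", "p2"], ["data", None, "ack", None, None]
)
print(delivered, attempts)
```

In the scripted run the acknowledgment for p1 is lost, so p1 is transmitted twice, but the receiver recognizes the second copy by its sequence bit and delivers the payload only once.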

A drawback of this protocol is its inability to fill “long fat pipes” (links with a high bandwidth-delay product), since there can be at most one outstanding (not yet acknowledged) frame at any time. Both the Go-back-N and the Selective-Repeat protocols are not restricted to a single unacknowledged or outstanding frame, but allow for multiple outstanding frames; these protocols are also called sliding-window protocols. The frames are identified by sequence numbers.

In the Go-back-N protocol there may be up to N outstanding frames. The transmitter sets a timer for each transmitted frame and resets it as soon as an acknowledgment for this frame is received. If the receiver receives an in-sequence frame, it delivers the frame to its local user and sends a positive acknowledgment; otherwise, the frame is dropped (even if it is received correctly) and the receiver sends a negative acknowledgment or keeps quiet. When the transmitter receives a negative acknowledgment for an outstanding frame, or if the timer for this frame expires, it retransmits this frame and all subsequent outstanding frames. Therefore, it might happen that correctly received frames are retransmitted, which is inefficient. This drawback is addressed by the Selective-Repeat protocol, which works similarly to Go-back-N, but allows the receiver to buffer and acknowledge frames that are not received in sequence. As soon as the missing frames arrive, the buffered frames are delivered to the user in their correct sequence and the buffer is freed.
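The difference between the two sliding-window protocols in the amount of retransmitted data can be shown with a very small sketch. It only captures the retransmission rule on the transmitter side; sequence number wraparound, timers, and receiver buffering are ignored, and the function name is hypothetical.

```python
def frames_to_retransmit(outstanding, lost, selective):
    """Return the sequence numbers the transmitter sends again.

    outstanding: sequence numbers of all currently outstanding frames,
                 in increasing order (no wraparound, by assumption).
    lost:        sequence number whose timer expired or that was
                 negatively acknowledged.
    selective:   True for Selective-Repeat, False for Go-back-N.
    """
    if selective:
        return [lost]                               # only the missing frame
    return [f for f in outstanding if f >= lost]    # the frame and all later ones

window = [4, 5, 6, 7]
print(frames_to_retransmit(window, lost=5, selective=False))
print(frames_to_retransmit(window, lost=5, selective=True))
```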

1.4.3 Hybrid Approaches

Open-loop and closed-loop approaches can be combined, forming so-called hybrid ARQ protocols. Some simple schemes are:
• To each packet some light FEC is applied; the remaining errors are corrected through the ARQ mechanism.
• Normal packets are not FEC coded; only retransmissions are.
• Increase the amount of redundancy for subsequent retransmissions, for example, by adapting the number of copies in a multicopy approach [57].

Another line of attack would be to make the receiver more clever and to take advantage of the information contained in already received erroneous packets by using packet-combining methods [64, 66, 78], for example, equal-gain combining or bit-by-bit majority voting. Such an approach is also referred to as type II hybrid ARQ [70]. In [59] and [77] this approach has been made deadline aware, by adopting the strategy to increase the coding strength (decreasing the code rate) more and more as the packet deadline comes closer (deadline-dependent coding). In reference [62] a scheme is described that takes both the estimated channel state and the packet deadline into account to select one coding scheme from a set of available schemes. For a bursty wireless channel, this scheme reduces the bandwidth need for a prescribed maximum failure probability, compared to a static scheme solely taking the channel state into account.

A scheme that utilizes already received and partially erroneous packets but does not require redundancy is the intermediate checksum method (see, for example, [68]), where a packet is not equipped with a single checksum covering its whole contents, but is subdivided into several chunks, such that each chunk is equipped with a separate checksum. The receiver requests only the erroneous chunks for retransmission.
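The intermediate checksum method can be sketched with the CRC-32 function from Python's standard `zlib` module. The chunk size, the payload, and the function names are assumptions made for this illustration; the scheme in [68] may differ in its checksum and framing details.

```python
import zlib

def split_with_checksums(data, chunk_size):
    """Split a packet into chunks and attach a CRC-32 to every chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(chunk, zlib.crc32(chunk)) for chunk in chunks]

def erroneous_chunks(received):
    """Return the indices of all chunks whose checksum does not match."""
    return [i for i, (chunk, crc) in enumerate(received)
            if zlib.crc32(chunk) != crc]

sent = split_with_checksums(b"industrial fieldbus payload", 8)

# Simulate a single bit error in the second chunk; its checksum is unchanged.
received = list(sent)
chunk, crc = sent[1]
received[1] = (bytes([chunk[0] ^ 0x01]) + chunk[1:], crc)

print(erroneous_chunks(received))   # only these chunks need retransmission
```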

1.4.4 Further Countermeasures

The transmitter has some further control knobs to reduce the probability of packet errors at the receiver. One control knob is the packet length: in a scenario where a packet is equipped with a checksum but not with redundant FEC data, longer packets have a higher probability of being received in error. On the other hand, short packets have a higher probability of being received successfully, but the overhead of the fixed-length packet header becomes prohibitive. For a given channel quality there is an optimum packet length, and adaptive schemes exist to find this [67, 68].

It is a fundamental communications law that the bit error rate at the receiver depends on the ratio of the energy expended per bit to the channel noise level [74]. There are two possibilities to use this relationship to increase transmission reliability:

• If the transmit power is increased, the energy per bit is increased and the bit error rate is reduced. However, often the transmit power is technically or legally restricted.
• If the bits are transmitted at lower speed, the energy per bit is increased, too.

Hence, a transmitter might apply transmit power control or modulation rate control.
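The packet-length trade-off discussed at the beginning of this section can be made concrete under a simple channel model. The sketch below assumes independent bit errors with a fixed bit error rate p, a fixed overhead of h bits (header plus checksum), and no FEC; the efficiency measure, the exhaustive search, and all names are illustrative assumptions and not the adaptive schemes of [67, 68].

```python
def efficiency(n, h, p):
    """Fraction of channel capacity that carries correctly received user bits.

    n: payload length in bits, h: overhead in bits, p: bit error rate.
    A packet is useful only if all n + h bits arrive without error.
    """
    return n / (n + h) * (1.0 - p) ** (n + h)

def best_payload_length(h, p, n_max=20000):
    """Exhaustively search for the payload length with maximum efficiency."""
    return max(range(1, n_max + 1), key=lambda n: efficiency(n, h, p))

for p in (1e-3, 1e-4):
    n_opt = best_payload_length(h=64, p=p)
    print(p, n_opt, round(efficiency(n_opt, 64, p), 3))
```

Under this model the optimum shrinks as the channel gets worse: very short packets waste capacity on the overhead, while very long packets are rarely received without error.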

1.5 Flow Control Mechanisms

Flow control compensates for different processing speeds of transmitters and receivers [12, Chapter 6], [23], [80]. Specifically, if the receiver does not have enough resources (buffers, processing speed) to process packets as fast as the transmitter sends them, mechanisms to slow down the transmitter are useful. Otherwise, the receiver would have to drop data packets, causing the transmitter to retransmit them and to waste further network resources. It is therefore necessary for the transmitter to receive feedback from the receiver.

The function of flow control is to be distinguished from congestion control, although many authors consider the former to be a special case of the latter. Congestion control is relevant in multihop networks, where two end nodes are connected by a series of intermediate nodes, for example, routers. In congestion control, it is not the ultimate receiver but the intermediate nodes that need to be protected against resource exhaustion. However, here we do not discuss congestion control any further.

Fieldbus systems offer different communication models. One important model is where all communications are performed between individually addressable stations and all packets are delivered from the link layer to the upper layers. Representatives of this class are PROFIBUS [52] and Foundation/IEC Fieldbus [26]. These systems can benefit from flow control mechanisms. Another important model is that of a real-time database, where not stations but data are addressed. The owner of a data item (the producer) broadcasts it, and all nodes interested in the data (the consumers) copy the data item into a preallocated local buffer. Each time the consumer receives an updated version of the same data item, it copies the data silently into the local buffer without notifying the applications running on the consumer node. The latter read the buffer contents when they need the value of the data item. Reading from and writing to this buffer are decoupled, and reading the buffer does not trigger any communications to fetch the data from the producer. Representatives of this class are CAN [30] and Factory Instrumentation Protocol (FIP)/WorldFIP [53]. Flow control is not an issue here, since the buffers are preallocated.

Conceptually, flow control mechanisms need two key ingredients:
• A signaling mechanism provides the transmitter with information about the available resources at the receiver. Different signaling mechanisms can vary in their accuracy (number of distinguishable states of receiver resource utilization), update frequency, signaling path (in-band or out-of-band), and relationship to other mechanisms, for example, error control.
• A signaling answer determines the transmitter’s reaction to flow control signals.

Flow control is not restricted to the data link layer, but is used in higher-layer protocols like the Transmission Control Protocol (TCP) as well. In the following we describe some of the most important mechanisms frequently found on the link layer. A more general discussion can be found in textbooks like [12], [23], and [45].

1.5.1 XON/XOFF and Similar Methods

This family of flow control methods is simple. The receiver distinguishes only two different states: ready or not ready to accept frames. The transmitter, upon acquiring a ready signal, transmits frames at an arbitrary rate until it acquires a not-ready signal. After this, the transmitter does not transmit any data packets until a ready signal is again acquired. This basic scheme is implemented in different protocols.

One application of this scheme can be found in the ITU V.24 recommendation (equivalent to EIA RS-232-C), providing an interface between a DTE (data terminal equipment, for example, a computer) and a DCE (data communications equipment, for example, a modem). This interface targets asynchronous and serial communications. It provides either 9 or 25 lines, 2 of which are the request-to-send (RTS) and clear-to-send (CTS) lines. When the DTE wants to send data to the DCE, it raises the RTS line. If the DCE is willing to accept data, it answers by raising the CTS line. As soon as the DCE lowers the CTS signal, the DTE must stop transmission. This is an out-of-band signaling mechanism since data and flow control signals do not share the same line.

The XON/XOFF mechanism is employed, for example, in the North American Digital Data System (DDS) [81, Section 24-11]. This mechanism rests on two characters of the underlying charset. For example, in the ASCII charset the DC1 character is used for XON and the DC3 character is used for XOFF. When the transmitter receives an XOFF character, it must stop its transmission, and it may resume as soon as XON is received. This is an in-band mechanism; the occurrence of these characters in the payload must be prevented by using proper escaping mechanisms (see also Section 1.2).

The HDLC family of link layer protocols [81, Section 26-2] uses special supervisory frames for executing flow control: the RR (receive ready) and RNR (receive not-ready) frames. Since user data are transmitted in another type of frame, the RR/RNR mechanism uses out-of-band signaling. The receiver issues an RNR frame when all its buffers are full. As soon as it can accept new data, the receiver sends an RR frame. This frame also contains the sequence number of the next expected data frame (see Section 1.7).
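The transmitter side of the XON/XOFF mechanism can be sketched as a small time-stepped loop. DC1 and DC3 have the ASCII codes 0x11 and 0x13; everything else (the tick-based timing, the feedback dictionary, the function name, and the omission of payload escaping) is an assumption made for illustration.

```python
XON, XOFF = 0x11, 0x13   # ASCII DC1 and DC3

def transmit(payload, feedback, max_ticks=1000):
    """Send one payload byte per tick while the receiver is ready.

    payload:  bytes to send.
    feedback: dict mapping a tick number to the control character that
              arrives from the receiver at the start of that tick.
    Returns a list of (tick, byte) pairs for all transmitted bytes.
    """
    sent = []
    ready = True                     # assume the receiver starts out ready
    i = 0
    for tick in range(max_ticks):    # bounded loop: XON might never arrive
        if i == len(payload):
            break
        ctrl = feedback.get(tick)
        if ctrl == XOFF:
            ready = False            # must stop transmitting
        elif ctrl == XON:
            ready = True             # may resume
        if ready:
            sent.append((tick, payload[i]))
            i += 1
    return sent

sent = transmit(b"abcd", {1: XOFF, 4: XON})
print([tick for tick, _ in sent])
```

In the example the receiver blocks the transmitter during ticks 1 to 3, so the remaining bytes are sent only after XON arrives at tick 4.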

1.5.2 Sliding-Window Flow Control

In this class of schemes flow control is integrated with a sliding-window ARQ protocol like Go-back-N or Selective-Repeat (see Section 1.4.2). The transmitter has a buffer for a number W of packets, called its window. The window size W specifies the number of allowed outstanding packets, i.e., packets for which the transmitter has not received an acknowledgment (yet). The receiver can use this for flow control by delaying its acknowledgments. This approach is tightly integrated with the ARQ protocol and does not need any extra control frames or escape mechanisms. However, there are two important drawbacks:
• Delaying acknowledgments is not a good idea for time-critical transmissions with desired response times in the millisecond range.
• Even without real-time requirements, the link layer protocol does not wait arbitrarily long for acknowledgments. Instead, with each frame a timeout is associated. In link layer protocols these timeouts are typically chosen such that propagation delays, packet generation delays, and processing speeds are included, but nothing more. This is in sharp contrast to multihop networks, where queueing delays are a significant fraction of the overall delay. In multihop networks, timeouts are therefore chosen much larger than necessary for a single link. Either way, if a timeout occurs, the transmitter retransmits the packet. If the receiver is still busy, the retransmission is wasted.

1.5.3 Further Mechanisms

One straightforward approach to flow control can be applied in connection-oriented link layer protocols. Upon connection setup, the receiving station specifies a rate by which the transmitter can send packets [80], and which the receiver guarantees to always accept. Instead of a rate specification, the receiver can also specify the parameters (s, r) of a leaky bucket. The leaky-bucket scheme works as follows: The transmitter generates permits at rate r, i.e., every 1/r seconds. The transmitter is allowed to keep a maximum number of s permits; any permit in excess of this number is dropped. When a packet is to be transmitted, it is checked whether there is a permit. If so, the number of stored permits is decremented and the packet is transmitted. Otherwise, the transmitter has to wait for the arrival of the next permit.

Another approach is used by TCP. TCP contains a flow control mechanism where the receiving end of a TCP connection tells the sender explicitly about its available buffer space (the advertised window). The advertised window is part of the TCP header and carried in each acknowledgment or data packet going back from the receiver to the transmitter. This mechanism is independent of any underlying link layer flow control mechanism.
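The leaky-bucket scheme with parameters (s, r) described above can be sketched as a discrete simulation in which one tick corresponds to 1/r seconds, so that exactly one permit is generated per tick. The empty initial bucket, the tick granularity, and the names are assumptions made for this illustration.

```python
def leaky_bucket(arrival_ticks, s, n_ticks):
    """Simulate a permit-based leaky bucket; one tick equals 1/r seconds.

    arrival_ticks: tick numbers at which packets become ready to send.
    s:             maximum number of stored permits.
    n_ticks:       number of ticks to simulate.
    Returns the tick at which each packet is actually transmitted.
    """
    permits = 0                  # assumption: the bucket starts empty
    waiting = 0                  # packets waiting for a permit
    send_ticks = []
    for tick in range(n_ticks):
        permits = min(s, permits + 1)          # excess permits are dropped
        waiting += arrival_ticks.count(tick)
        while waiting and permits:             # one permit per packet
            permits -= 1
            waiting -= 1
            send_ticks.append(tick)
    return send_ticks

# A burst of four packets arrives at tick 3 with s = 2.
send_ticks = leaky_bucket([3, 3, 3, 3], s=2, n_ticks=8)
print(send_ticks)
```

The parameter s bounds the burst that can be sent back to back (two packets at tick 3), while the remaining packets leave at the permit rate r, one per tick.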

1.6 Packet Scheduling Algorithms

At an abstract level, a packet scheduling algorithm selects the packet to be transmitted next, after service of the current packet has been finished. The packet is selected from a set of waiting packets. The packet waiting room can be located within a single station, but it can also be distributed over several stations. In the latter case, a MAC protocol can be considered part of a packet scheduling algorithm, since it actually decides about the station transmitting a packet, and the winning station has to make a local decision regarding which of its waiting packets to transmit. In this section we consequently restrict the perspective to a single station and its packet scheduler. As opposed to processor scheduling algorithms, packet scheduling algorithms are nonpreemptive; i.e., an ongoing transmission is not interrupted upon arrival of a more important packet.

A packet scheduler bases its decision upon some performance objectives to be optimized. Typical objectives are delay, avoiding deadline misses, jitter avoidance, fairness, throughput, and priority. In the absence of any specific criterion, packets are often served on a first-come, first-served (FCFS) basis. In this section we discuss some popular scheduling schemes. More detailed introductions to packet scheduling and to more general scheduling problems can be found in [87, Chapter 9] and [86].

1.6.1 Priority Scheduling

In priority scheduling each packet is tagged with an explicit priority value, or the packet priority is derived from other packet attributes like addresses, packet types, and so on. The scheduler always selects the packet that currently has the highest priority. Multiple packets of the same highest priority are served in random order or FCFS order.

Some algorithms map time-dependent information onto priorities. One example is the rate-monotonic scheduling algorithm [88] and its nonpreemptive extensions. Here it is assumed that the packets are generated from different periodic streams or flows, and each packet is associated with a deadline corresponding to its flow period. The priorities are then assigned in inverse order of the periods; therefore, the stream with the smallest period receives highest priority. Another example is the earliest-deadline-first (EDF) algorithm, where the packet with the tightest deadline has highest priority.
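Both ways of mapping timing information onto priorities fit into a few lines. The sketch below uses hypothetical function names and data layouts; priority 0 denotes the highest priority, which is a convention chosen here and not prescribed by the text.

```python
def rate_monotonic_priorities(periods):
    """Assign priorities in inverse order of the flow periods.

    periods: dict mapping a flow name to its period.
    Returns a dict mapping flow name to priority (0 is the highest).
    """
    ordered = sorted(periods, key=periods.get)      # smallest period first
    return {flow: rank for rank, flow in enumerate(ordered)}

def edf_pick(waiting):
    """Pick the waiting packet with the earliest (tightest) deadline.

    waiting: list of (deadline, packet) pairs.
    """
    return min(waiting, key=lambda item: item[0])[1]

print(rate_monotonic_priorities({"a": 10, "b": 2, "c": 5}))
print(edf_pick([(30, "x"), (12, "y"), (20, "z")]))
```

Since packet scheduling is nonpreemptive, `edf_pick` would be invoked only when the current transmission has finished, not on every packet arrival.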

1.6.2 Fair Scheduling

In recent years many algorithms for fair queueing have been developed. The packets are grouped into distinct flows, and to each flow a separate queue is associated. Within a flow, packets are served in FCFS order. A nonempty queue is said to be backlogged. The goal of a fair scheduling algorithm is twofold:
• Each backlogged flow should get a minimum share of the available bandwidth independent of the behavior of the other flows (firewall property).
• The bandwidth of a currently inactive flow should be fairly distributed to the other flows, thus making efficient use of bandwidth.

One of the simplest fair queueing algorithms is the round-robin algorithm, where all backlogged queues are served in round-robin order. A modification of this scheme is weighted round-robin, where each flow $i$ is associated with a specific weight $f_i$ such that $\sum_{i \in F} f_i = 1$ ($F$ is the set of all flows). The time is divided into epochs of fixed upper length $t$. At the start of an epoch the scheduler determines the set of backlogged queues. The first nonempty queue $j$ receives service until it is empty or its transmission time approaches its share of $f_j \cdot t$ of the overall epoch. Following this, the second nonempty queue is served, and so on. The next epoch starts when all nonempty queues of the previous epoch are served.
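One epoch of weighted round-robin can be sketched as follows. The text leaves open what exactly happens when the transmission time "approaches" the share; the sketch assumes that a queue is served as long as its next packet still fits into its share $f_j \cdot t$. Packets are represented only by their transmission times, and all names are hypothetical.

```python
def serve_epoch(queues, weights, t):
    """Serve one weighted round-robin epoch of maximum length t.

    queues:  dict mapping flow name to a list of packet transmission times.
    weights: dict mapping flow name to its weight f_i (weights sum to 1).
    Returns the flow names in the order in which their packets are sent.
    Served packets are removed from the queues.
    """
    served = []
    for flow, queue in queues.items():       # backlogged queues in fixed order
        budget = weights[flow] * t           # share f_j * t of the epoch
        used = 0.0
        # Assumption: send a packet only if it still fits into the share.
        while queue and budget >= used + queue[0]:
            used += queue.pop(0)
            served.append(flow)
    return served

queues = {"a": [1.0, 1.0, 1.0, 1.0], "b": [1.0, 1.0, 1.0], "c": []}
weights = {"a": 0.5, "b": 0.25, "c": 0.25}
served = serve_epoch(queues, weights, t=4.0)
print(served)
```

Because flow c is not backlogged, the epoch ends after three time units instead of four, and the next epoch can start early. A packet that is longer than the share of its flow would never be served under this interpretation, so a practical scheduler needs an additional rule for that case.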

Other fair queueing algorithms have been derived from the generalized processor sharing (GPS) approach [90–92]. In its pure form, GPS assumes a set $F$ of flows with associated weights $\phi_i$, normalized such that

$$\sum_{i \in F} \phi_i = 1.$$

All backlogged queues are served in parallel. Serving a queue means transmitting the packet at the head of the queue; as soon as this packet is finished, the transmission of the next packet is started. It can be shown that GPS has the following desirable property: if queue $i$ is always nonempty during the time interval $(t_1, t_2)$ and if $W_i(t_1, t_2)$ is the amount of service queue $i$ receives during $(t_1, t_2)$, then for a GPS server

$$\frac{W_i(t_1, t_2)}{W_j(t_1, t_2)} \ge \frac{\phi_i}{\phi_j}$$

holds for all sessions $j$ (except those with $W_j(t_1, t_2) = 0$). If both sessions $i$ and $j$ are backlogged during $(t_1, t_2)$, then we have

$$\frac{W_i(t_1, t_2)}{W_j(t_1, t_2)} = \frac{\phi_i}{\phi_j}.$$

This scheme, however, is not directly usable for packet-based communications, since at most one packet can be transmitted at a time. Therefore, packet-based approximations to GPS have been developed. A good approximation strategy tries to pick the packets in the same order as they would finish under GPS. This decision has to be made each time a packet has finished transmission and the next packet is to be picked. However, at this time the packet that would finish next under GPS may not have arrived yet. The scheduler can only take the currently backlogged queues into account when making its decision.

In weighted fair queueing (WFQ) the scheduler simulates the GPS operation. More specifically, for each flow a virtual time is maintained. For the $k$-th packet of flow $i$, the virtual start and finish times $S_i^k$ and $F_i^k$ are defined as [84]

$$S_i^k = \max\{F_i^{k-1}, V(a_i^k)\}, \qquad F_i^k = S_i^k + \frac{L_i^k}{r_i},$$

where $F_i^0 = 0$, $r_i = r \cdot \phi_i$ ($r$ is the overall link capacity), $a_i^k$ is the arrival time of the $k$-th packet in flow $i$, $L_i^k$ is its length, and $V(\cdot)$ is a so-called virtual-time function. This basic algorithm can be performed with different virtual-time functions (vtf). For the vtf of WFQ, the following holds:

$$V_{\mathrm{WFQ}}(t_1) = 0, \qquad \frac{\partial V_{\mathrm{WFQ}}(t)}{\partial t} = \frac{1}{\sum_{i \in B_{\mathrm{WFQ}}(t)} \phi_i},$$

where $t_1$ denotes the beginning of a system busy period and $B_{\mathrm{WFQ}}(t)$ denotes the set of backlogged queues at time $t$. By this definition the slope of the vtf may change on every packet arrival or departure, and the vtf needs to be tracked by the scheduler. Tracking the vtf therefore has a computational complexity proportional to the number of flows. A number of schemes have been developed with similar properties but lower complexity. One example is frame-based fair queueing [93].

Both GPS and WFQ can be shown to guarantee a minimum service rate of $r_i = r \cdot \phi_i$ to a flow $i$. If flow $i$ is leaky bucket constrained (i.e., the packets have a minimum interarrival time and the packet length is appropriately bounded), then it can also be shown that for each packet an upper bound on its finishing time can be guaranteed.

In the context of wireless transmission media, the situation changes slightly: to avoid wasting resources, the scheduler should pick a packet only from those backlogged flows where the head-of-queue packet is destined for a station for which the wireless channel is currently in a good state. Therefore, a backlogged flow does not receive service if its packet is likely to fail [85]. A number of wireless fair queueing schemes have been developed (see, for example, [89]); they differ, for example, in the amount of compensation granted to flows that have suffered from bad channels for some time.
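The computation of the virtual start and finish times used by WFQ can be illustrated with a short Python sketch. It is an illustration under simplifying assumptions, not a complete scheduler: the virtual-time function is passed in as a callable, and the default assumes that all packets arrive at the beginning of a busy period, so that V(a) = 0 for every packet. The names `Packet`, `wfq_tags`, and `virtual_time` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Packet:
    flow: str       # flow identifier i
    arrival: float  # arrival time of the packet
    length: float   # packet length (e.g., in bits)


def wfq_tags(
    packets: List[Packet],
    weights: Dict[str, float],
    link_rate: float,
    virtual_time: Callable[[float], float] = lambda t: 0.0,
) -> List[Tuple[float, float, Packet]]:
    """Return (start, finish, packet) tuples sorted by virtual finish time.

    Packets of one flow must be listed in arrival order. The default
    virtual_time is an assumption: it is only valid if every packet
    arrives at the beginning of the busy period.
    """
    last_finish: Dict[str, float] = {}  # finish tag of the previous packet per flow
    tagged = []
    for p in packets:
        rate_i = link_rate * weights[p.flow]          # guaranteed rate of the flow
        start = max(last_finish.get(p.flow, 0.0), virtual_time(p.arrival))
        finish = start + p.length / rate_i            # finish = start + length / rate
        last_finish[p.flow] = finish
        tagged.append((start, finish, p))
    # Packets are transmitted in order of increasing virtual finish time.
    return sorted(tagged, key=lambda entry: entry[1])


weights = {"A": 0.75, "B": 0.25}
packets = [Packet("A", 0.0, 3.0), Packet("A", 0.0, 3.0),
           Packet("A", 0.0, 3.0), Packet("B", 0.0, 1.5)]
order = [(p.flow, finish) for _, finish, p in wfq_tags(packets, weights, 1.0)]
print(order)
```

In this example the single packet of the low-weight flow B is served after the first packet of flow A and before the remaining two, although flow A has three times the weight of flow B.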

1.7 Link Layer Protocols

In this section we present two standard link layer protocols. In general, a link layer protocol combines several of the mechanisms discussed in the previous sections.

1.7.1 The HDLC Protocol Family

The high-level data link control (HDLC) protocol [81, Section 26.2; 82] can be considered the “mother” of many link layer protocols, including LAPB (used in X.25), LAPD (used in ISDN), LAPDm (used in GSM), and the IEEE 802.2 Logical Link Control (LLC) protocol discussed in Section 1.7.2. An HDLC variant is also used in the IEEE BITBUS standard [29]. HDLC is designed for point-to-point links; however, it can also be used over multiple-access channels, provided the stations have unique addresses.

The HDLC protocol distinguishes the following station types:
• A primary station controls the link; it is responsible for error control, flow control, and setup and teardown of the logical link. All frames generated by a primary station are called commands.
• A secondary station is controlled by a primary station. Specifically, it may not initiate data transfers on its own. Frames generated by a secondary station are called responses.
• A combined station combines these two roles.

These station types can be used in two different configurations:
• In the unbalanced configuration there is a single primary station and a number of secondary stations with distinct addresses.
• In the balanced configuration two combined stations are connected. Since either station is both a primary and a secondary station, both can initiate data transfers.

The HDLC protocol offers three modes of operation:
• In the normal response mode (NRM) there is a central coordinator (a primary station) and a number of secondary stations. The secondary stations only send frames upon being polled by the primary station.
• In the asynchronous response mode (ARM) the same configuration is used as in the normal response mode, but a secondary station may send frames on its own, without having to wait to be polled.
• The asynchronous balanced mode (ABM) is used on point-to-point links. There is a combined station at either end of the link. This mode is used, for example, in the X.25 link layer and in the IEEE 802.2 Logical Link Control (see next section).

The protocol is built upon three different frame types, illustrated in Figure 1.9. The general frame format has an opening flag field, used for bit and frame synchronization (see Section 1.2); an address field to identify a specific secondary on a multipoint link; a control field (explained below); an optional information field carrying the user data; a frame check sequence (FCS) field containing a 16- or 32-bit CRC checksum; and a closing flag. The three frame types are distinguished by their purpose and the different layouts of the control field:


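FIGURE 1.9 HDLC frame structure. The frame consists of the fields Flag (8 bits), Address (x * 8 bits), Control (8/16 bits), Information (N bits), FCS (16/32 bits), and Flag (8 bits). The 8-bit control field is laid out as follows:
I-frame: 0 | N(S) N(S) N(S) | P/F | N(R) N(R) N(R)
S-frame: 1 0 | S S | P/F | N(R) N(R) N(R)
U-frame: 1 1 | M M | P/F | M M M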

• Supervisory frames (S-frames) are used for error control and flow control purposes:
  – The two S-bits in the supervisory frame correspond to four different receiver answers: RR and RNR are used for flow control (see Section 1.5), whereas the two answers REJ and SREJ belong to the Goback-N or Selective-Repeat ARQ protocol (the HDLC frame format is usable for both protocols).
  – The P/F (poll/final) bit is a poll bit if it is sent in a command frame; otherwise, it is called the final bit. If the poll bit is set, the primary requires an acknowledgment for the corresponding command frame. If the secondary answers with the final bit set to one, this indicates that the command frame has been received and the corresponding command has been successfully executed. If the final bit is zero, the secondary indicates successful reception of the command frame, but execution of the requested command has not (yet) been finished.
  – The receiver sequence number N(R) is discussed below.
• Unnumbered frames (U-frames) are used for link management purposes (link setup, teardown). The five M-bits (mode bits) encode commands and responses. When used as a command, the primary can set the secondary’s mode of operation (ABM, ARM, NRM), reset the secondary, disconnect from the secondary, or reject a frame. When used as a response, the M-bits either acknowledge or reject a command.
• Information frames (I-frames) carry user data. The control field contains transmit and receive sequence numbers. A station transmitting data packets equips each I-frame with a sequence number N(S). The receiver checks whether N(S) is the same as the expected sequence number N(R). If so, the receiver increments N(R) and sends an acknowledgment carrying the new value. This acknowledgment is either piggybacked onto an I-frame going in the opposite direction or sent as a separate S-frame when there is no outgoing I-frame ready for transfer. With the help of these sequence numbers, the receiver can detect lost and duplicate frames.

HDLC supports procedures for setup and teardown of a logical connection and allows specification of the operation mode (ABM, ARM, NRM) between two stations. With the available frame types, different ARQ protocols can be implemented, most notably Goback-N and Selective-Repeat (called Selective-Reject in the context of HDLC). The several HDLC variants differ in their ARQ protocols and in the supported modes. In the next section we briefly discuss one of HDLC’s descendants, the IEEE 802.2 Logical Link Control protocol.
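The layout of the 8-bit control field can be made concrete with a small Python sketch that packs and unpacks I-frame and S-frame control octets. It rests on assumptions that are not stated in the text: the first bit of the control field is mapped to the least significant bit of the octet, and the two S-bits are numbered 0 to 3 for RR, RNR, REJ, and SREJ. The function names are hypothetical, and the two groups of M-bits of a U-frame are simply returned as raw values.

```python
from typing import Dict

# Assumed numbering of the two S-bits (first transmitted bit = low-order bit).
S_TYPES = {0: "RR", 1: "RNR", 2: "REJ", 3: "SREJ"}


def encode_i(ns: int, nr: int, pf: int = 0) -> int:
    """I-frame control octet: 0 | N(S) | P/F | N(R)."""
    return ((nr & 7) << 5) | ((pf & 1) << 4) | ((ns & 7) << 1)


def encode_s(s_type: int, nr: int, pf: int = 0) -> int:
    """S-frame control octet: 1 0 | S S | P/F | N(R)."""
    return ((nr & 7) << 5) | ((pf & 1) << 4) | ((s_type & 3) << 2) | 0b01


def decode(control: int) -> Dict[str, object]:
    """Classify a control octet and extract its fields."""
    pf = (control >> 4) & 1
    high = (control >> 5) & 7
    if control & 0b1 == 0:                     # first bit 0: I-frame
        return {"type": "I", "ns": (control >> 1) & 7, "nr": high, "pf": pf}
    if control & 0b11 == 0b01:                 # first bits 1 0: S-frame
        return {"type": "S", "s": S_TYPES[(control >> 2) & 3], "nr": high, "pf": pf}
    # first bits 1 1: U-frame; the five M-bits are split into 2 + 3 bits
    return {"type": "U", "m": ((control >> 2) & 3, high), "pf": pf}


print(decode(encode_i(ns=3, nr=5, pf=1)))
print(decode(encode_s(2, nr=2)))
```

Because the first one or two bits identify the frame type, a receiver can decide how to interpret the remaining bits of the control field before looking at any other part of the frame.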

1.7.2 The IEEE 802.2 LLC Protocol

The IEEE 802.2/ISO/IEC 8802-2 Logical Link Control (LLC) protocol [83] is a member of the IEEE 802.x family of MAC and link layer protocols. Specifically, it operates on top of the different IEEE 802.x MAC protocols, like Ethernet (IEEE 802.3), Token Bus (IEEE 802.4), Token Ring (IEEE 802.5), or wireless LAN (IEEE 802.11). The LLC protocol offers three services to upper layers: an unacknowledged connectionless datagram service (best effort), an acknowledged connectionless datagram service, and a reliable connection-oriented service. The LLC can run on top of several MAC protocols because it makes rather weak assumptions about the MAC services: nothing more than a connectionless best-effort service is assumed.

All these services use addressing information consisting of four attributes: source and destination MAC addresses as well as source and destination service access points (SAPs). Consequently, all packets carry SAP addresses in addition to MAC addresses.

The connection-oriented service requires explicit setup and teardown of a link layer connection. A link layer connection is characterized by source and destination MAC and SAP addresses. For each link layer connection there is a separate connection context, which includes, among others, the sequence numbers. A connection provides reliable and in-sequence data delivery, and it is additionally possible to request flow control operations. Specifically, the upper layers can specify an amount of data they are willing to accept. The sequence number fields are larger than shown in Figure 1.9: they are 7 bits wide, so that sequence numbers are counted modulo 128 and up to 127 frames can be outstanding. The ARQ protocol is essentially Goback-N; the Selective-Reject feature of HDLC is not used. The LLC uses the asynchronous balanced mode.
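The receiver side of a Goback-N scheme with an extended sequence number space can be sketched as follows. This is a minimal illustration and not the LLC state machine: it assumes 7-bit sequence numbers counted modulo 128, it ignores connection setup, timers, and flow control, and the class name `GoBackNReceiver` is hypothetical.

```python
from typing import List

MODULUS = 128  # assumption: 7-bit sequence numbers


class GoBackNReceiver:
    """Accepts only the in-sequence frame; all other frames are discarded."""

    def __init__(self) -> None:
        self.expected = 0               # next N(S) to accept, reported back as N(R)
        self.delivered: List[str] = []  # payloads handed to the upper layer

    def on_i_frame(self, ns: int, payload: str) -> int:
        """Process one I-frame and return the N(R) to acknowledge with."""
        if ns == self.expected:
            self.delivered.append(payload)
            self.expected = (self.expected + 1) % MODULUS
        # Duplicates and out-of-sequence frames are dropped; the returned
        # N(R) tells the sender from which frame it has to resume.
        return self.expected


rx = GoBackNReceiver()
rx.expected = 126  # start close to the wrap-around point for illustration
acks = [rx.on_i_frame(ns, data) for ns, data in
        [(126, "a"), (127, "b"), (127, "b"), (1, "d"), (0, "c"), (1, "d")]]
print(acks, rx.delivered)
```

The duplicate of frame 127 and the premature frame 1 are both discarded, and the upper layer receives the payloads exactly once and in order, also across the wrap-around from 127 to 0.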

Abbreviations

ATM – Asynchronous Transfer Mode
EIA – Electronic Industries Association
FMS – Fieldbus Message Specification
FTT – Flexible Time-Triggered
GSM – Global System for Mobile communications
ITU – International Telecommunication Union
LAP – Link Access Procedure

References

Bit and Frame Synchronization
[1] Stuart Cheshire and Mary Baker. Consistent overhead byte stuffing. ACM SIGCOMM Computer Communication Review, 27:209–220, 1997.
[2] IEEE. Carrier Sense Multiple Access with Collision Detection (CSMA/CD): ETHERNET, 1985.
[3] International Organization for Standardization (ISO). IS 1177-1985, Character Structure for Start/Stop and Synchronous Character Oriented Transmission, 1985.
[4] J. Manchester, J. Anderson, B. Doshi, and S. Dravida. IP over SONET. IEEE Communications Magazine, 36(5):136–142, May 1998.
[5] W. Simpson. RFC 1661, The Point-to-Point Protocol (PPP), July 1994.
[6] W. Simpson. RFC 1662, PPP in HDLC-Like Framing, July 1994. Obsoletes RFC 1549. Status: STANDARD.

Medium Access Control Protocols
[7] Norman Abramson. Development of the ALOHANET. IEEE Transactions on Information Theory, 31:119–123, 1985.
[8] Norman Abramson, Editor. Multiple Access Communications: Foundations for Emerging Technologies. IEEE Press, New York, 1993.
[9] G. Agrawal, B. Chen, W. Zhao, and S. Davari. Guaranteeing synchronous message deadlines with the timed-token medium access control protocol. IEEE Transactions on Computers, 43:327–339, 1994.
[10] Luis Almeida, Paulo Pedreiras, and Jose Alberto G. Fonseca. The FTT-CAN protocol: why and how. IEEE Transactions on Industrial Electronics, 49:1189–1201, 2002.


[11] Michael Berry, Andrew T. Campbell, and Andras Veres. Distributed control algorithms for service differentiation in wireless packet networks. In Proc. INFOCOM 2001, Anchorage, AK, April 2001. IEEE.
[12] D. Bertsekas and R. Gallager. Data Networks. Prentice Hall, Englewood Cliffs, NJ, 1987.
[13] Bluetooth Consortium. Specification of the Bluetooth System. http://www.bluetooth.org, 1999.
[14] J.I. Capetanakis. Tree algorithm for packet broadcast channels. IEEE Transactions on Information Theory, 25:505–515, 1979.
[15] Biao Chen, Nicholas Malcolm, and Wei Zhao. Fiber distributed data interface and its use for time-critical applications. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 597–610. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[16] Carla-Fabiana Chiasserini and Ramesh R. Rao. Coexistence mechanisms for interference mitigation in the 2.4-GHz ISM band. IEEE Transactions on Wireless Communications, 2:964–975, 2003.
[17] Lou Dellaverson and Wendy Dellaverson. Distributed channel access on wireless ATM links. IEEE Communications Magazine, 35:110–113, 1997.
[18] ETSI. High Performance Radio Local Area Network (HIPERLAN): Draft Standard, March 1996.
[19] Andras Farago, Andrew D. Myers, Violet R. Syrotiuk, and Gergely V. Zaruba. Meta-MAC protocols: automatic combination of MAC protocols to optimize performance for unknown conditions. IEEE Journal on Selected Areas in Communications, 18:1670–1681, 2000.
[20] Robert G. Gallager. A perspective on multiaccess channels. IEEE Transactions on Information Theory, 31:124–142, 1985.
[21] Ajay Chandra V. Gummalla and John O. Limb. Wireless medium access control protocols. IEEE Communications Surveys and Tutorials, 3:2–15, 2000. http://www.comsoc.org/pubs/surveys.
[22] Jaap C. Haartsen. The Bluetooth radio system. IEEE Personal Communications, 7:28–36, 2000.
[23] Fred Halsall. Data Communications, Computer Networks and Open Systems. Addison-Wesley, Reading, MA, 1996.
[24] J.F. Hayes. Modeling and Analysis of Computer Communications Networks. Plenum Press, New York, 1984.
[25] Ivan Howitt. Bluetooth performance in the presence of 802.11b WLAN. IEEE Transactions on Vehicular Technology, 51:1640–1651, 2002.
[26] IEC. IEC 1158-1, FieldBus Specification: Part 1: FieldBus Standard for Use in Industrial Control: Functional Requirements.
[27] IEEE. IEEE 802.4, Token-Passing Bus Access Method, 1985.
[28] IEEE. IEEE 802.5, Token Ring Access Method and Physical Layer Specifications, 1985.
[29] IEEE. IEEE 1118, Standard Microcontroller System Serial Control Bus, August 1991.
[30] ISO. ISO 11898, Road Vehicle: Interchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication, 1993.
[31] Raj Jain. FDDI Handbook: High-Speed Networking Using Fiber and Other Media. Addison-Wesley, Reading, MA, 1994.
[32] Leonard Kleinrock and Fouad A. Tobagi. Packet switching in radio channels. Part I. Carrier sense multiple access models and their throughput-/delay-characteristic. IEEE Transactions on Communications, 23:1400–1416, 1975.
[33] J.F. Kurose, M. Schwartz, and Y. Yemini. Multiple-access protocols and time-constrained communication. ACM Computing Surveys, 16:43–70, 1984.
[34] S.S. Lam. Multiaccess protocols in computer communications. In W. Chon, Editor, Principles of Communication and Network Protocols, Volume I, Principles, pp. 114–155. Prentice Hall, Englewood Cliffs, NJ, 1983.
[35] Jim Lansford, Adrian Stephens, and Ron Nevo. Wi-Fi (802.11b) and Bluetooth: enabling coexistence. IEEE Network Magazine, 15:20–27, 2001.
[36] Andrew D. Myers and Stefano Basagni. Wireless media access control. In Ivan Stojmenovic, Editor, Handbook of Wireless Networks and Mobile Computing, pp. 119–143. John Wiley & Sons, New York, 2002.


[37] IEEE. IEEE 802.11, Standard for Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Networks: Specific Requirements: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher Speed Physical Layer (PHY) Extension in the 2.4 GHz Band, 1999.
[38] Nikos Passas, Sarantis Paskalis, Dimitri Vali, and Lazaros Merakos. Quality-of-service-oriented medium access control for wireless ATM networks. IEEE Communications Magazine, 35:42–50, 1997.
[39] C.S. Raghavendra and Suresh Singh. PAMAS: power aware multi-access protocol with signalling for ad hoc networks. ACM Computer Communication Review, 27:5–26, 1998.
[40] Theodore S. Rappaport. Wireless Communications: Principles and Practice. Prentice Hall, Upper Saddle River, NJ, 2002.
[41] Erwin P. Rathgeb. Integrated services digital network (ISDN) and broadband (B-ISDN). In Jerry D. Gibson, Editor, The Communications Handbook, pp. 577–590. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[42] Izhak Rubin. Multiple access methods for communications networks. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 622–649. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[43] S.R. Sachs. Alternative local area network access protocols. IEEE Communications Magazine, 26:25–45, 1988.
[44] Hideaki Takagi. Analysis of Polling Systems. MIT Press, Cambridge, MA, 1986.
[45] Andrew S. Tanenbaum. Computer Networks, 3rd edition. Prentice Hall, Englewood Cliffs, NJ, 1997.
[46] Andrew S. Tanenbaum. Computernetzwerke, 3rd edition. Prentice Hall, Muenchen, 1997.
[47] IEEE. IEEE 802.11, Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, November 1997.
[48] Fouad A. Tobagi. Multiaccess protocols in packet communications systems. IEEE Transactions on Communications, 28:468–488, 1980.
[49] Fouad A. Tobagi. Multiaccess link control. In P.E. Green, Editor, Computer Network Architectures and Protocols. Plenum Press, New York, 1982.
[50] Fouad A. Tobagi and Leonard Kleinrock. Packet switching in radio channels. Part II. The hidden terminal problem in CSMA and busy-tone solutions. IEEE Transactions on Communications, 23:1417–1433, 1975.
[51] TTTech Computertechnik GmbH, Vienna. TTP/C Protocol, Version 0.5, 1999.
[52] Union Technique de l’Électricité. General Purpose Field Communication System, EN 50170, Volume 2, PROFIBUS, 1996.
[53] Union Technique de l’Électricité. General Purpose Field Communication System, EN 50170, Volume 3, WorldFIP, 1996.
[54] Harmen R. van As. Media access techniques: the evolution towards terabit/s LANs and MANs. Computer Networks and ISDN Systems, 26:603–656, 1994.
[55] Bernhard Walke. Mobile Radio Networks: Networking, Protocols and Traffic Performance. John Wiley & Sons, Chichester, 2002.
[56] Andreas Willig and Andreas Köpke. The adaptive-intervals MAC protocol for a wireless PROFIBUS. In Proc. 2002 IEEE International Symposium on Industrial Electronics, L’Aquila, Italy, July 2002.

Error Control
[57] A. Annamalai and Vijay K. Bhargava. Analysis and optimization of adaptive multicopy transmission ARQ protocols for time-varying channels. IEEE Transactions on Communications, 46:1356–1368, 1998.
[58] Sergio Benedetto, Guido Montorsi, and Dariush Divsalar. Concatenated convolutional codes with interleavers. IEEE Communications Magazine, 41:102–109, 2003.
[59] Henrik Bengtsson, Elisabeth Uhlemann, and Per-Arne Wiberg. Protocol for wireless real-time systems. In Proc. 11th Euromicro Conference on Real-Time Systems, York, England, 1999.
[60] Claude Berrou. The ten-year-old turbo codes are entering into service. IEEE Communications Magazine, 41:110–116, 2003.


[61] Daniel J. Costello, Joachim Hagenauer, Hideki Imai, and Stephen B. Wicker. Applications of error-control coding. IEEE Transactions on Information Theory, 44:2531–2560, 1998.
[62] Moncef Elaoud and Parameswaran Ramanathan. Adaptive use of error-correcting codes for real-time communication in wireless networks. In Proc. INFOCOM 1998, San Francisco, March 1998. IEEE.
[63] David Haccoun and Samuel Pierre. Automatic repeat request. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 181–198. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[64] Bruce A. Harvey and Stephen B. Wicker. Packet combining systems based on the Viterbi decoder. IEEE Transactions on Communications, 42:1544–1557, 1994.
[65] Olivier Hersent, David Gurle, and Jean-Pierre Petit. IP Telephony: Packet-Based Multimedia Communications Systems. Addison-Wesley, Harlow/England, London, 2000.
[66] Samir Kallel. Analysis of a type-II hybrid ARQ scheme with code combining. IEEE Transactions on Communications, 38:1133–1137, 1990.
[67] Paul Lettieri, Curt Schurgers, and Mani B. Srivastava. Adaptive link layer strategies for energy-efficient wireless networking. Wireless Networks, 5:339–355, 1999.
[68] Paul Lettieri and Mani Srivastava. Adaptive frame length control for improving wireless link throughput, range and energy efficiency. In Proc. INFOCOM 1998, pp. 564–571, San Francisco, 1998. IEEE.
[69] Shu Lin and Daniel J. Costello. Error Control Coding: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1983.
[70] Hang Liu, Hairuo Ma, Magda El Zarki, and Sanjay Gupta. Error control schemes for networks: an overview. MONET: Mobile Networks and Applications, 2:167–182, 1997.
[71] Arnold M. Michelson and Allen H. Levesque. Error-Control Techniques for Digital Communication. John Wiley & Sons, New York, 1985.
[72] Arogyaswami Paulraj. Diversity techniques. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 213–223. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[73] Martin L. Shooman. Reliability of Computer Systems and Networks. John Wiley & Sons, New York, 2002.
[74] Bernard Sklar. Digital Communications: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1988.
[75] Bernard Sklar. A primer on turbo code concepts. IEEE Communications Magazine, 35:94–102, 1997.
[76] Jonathan Stone, Michael Greenwald, Craig Partridge, and James Hughes. Performance of checksums and CRC’s over real data. IEEE/ACM Transactions on Networking, 6:529–543, 1998.
[77] Elisabeth Uhlemann, Per-Arne Wiberg, Tor M. Aulin, and Lars K. Rasmussen. Deadline-dependent coding: a framework for wireless real-time communication. In Proc. International Conference on Real-Time Computing Systems and Applications, pp. 135–142, Cheju Island, South Korea, December 2000.
[78] Xin Wang and Michael T. Orchard. On reducing the rate of retransmission in time-varying channels. IEEE Transactions on Communications, 51:900–910, 2003.
[79] Andreas Willig, Martin Kubisch, Christian Hoene, and Adam Wolisz. Measurements of a wireless link in an industrial environment using an IEEE 802.11-compliant physical layer. IEEE Transactions on Industrial Electronics, 49:1265–1282, 2002.

Flow Control
[80] Rene L. Cruz. Routing and flow control. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 650–660. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[81] Roger L. Freeman. Reference Manual for Telecommunications Engineering, 3rd edition, Volume 2. John Wiley & Sons, New York, 2002.

Link Layer Protocols
[82] D.E. Carlson. Bit-oriented data link control procedures. IEEE Transactions on Communications, 28:455–467, 1980.


[83] LAN/MAN Standards Committee of the IEEE Computer Society. International Standard ISO/IEC 8802-2, Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Specific Requirements: Part 2: Logical Link Control, 1998.

Packet Scheduling
[84] Jon C.R. Bennett and Hui Zhang. Hierarchical packet fair queueing algorithms. In Proc. ACM SIGCOMM, 1996. Association for Computing Machinery.
[85] Pravin Bhagwat, Partha Bhattacharya, Arvind Krishna, and Satish K. Tripathi. Using channel state dependent packet scheduling to improve TCP throughput over wireless LANs. Wireless Networks, 3:91–102, 1997.
[86] E.G. Coffman, Jr. Computer and Job-Shop Scheduling Theory. John Wiley & Sons, New York, 1982.
[87] Srinivasan Keshav. An Engineering Approach to Computer Networking: ATM Networks, the Internet and the Telephone Network. Addison-Wesley, Reading, MA, 1997.
[88] C.L. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20:46–61, 1973.
[89] Songwu Lu, Vaduvur Bharghavan, and Rayadurgam Srikant. Fair queueing in wireless packet networks. In Proc. of ACM SIGCOMM ’97 Conference, pp. 63–74, Cannes, France, September 1997.
[90] A.K. Parekh and R.G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: the single node case. In Proc. IEEE INFOCOM, Volume 2, pp. 915–924, 1992. IEEE.
[91] A.K. Parekh and R.G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: the multiple node case. In Proc. IEEE INFOCOM, Volume 2, pp. 521–530, 1993. IEEE.
[92] Abhay Kumar J. Parekh. A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks. Ph.D. dissertation, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, February 1992.
[93] Anujan Varma and Dimitrios Stiliadis. Hardware implementation of fair queueing algorithms for ATM networks. IEEE Communications Magazine, 35:54–68, 1997.


2 IP Internetworking

Helmut Hlavacs, University of Vienna
Christian Kurz, University of Vienna

2.1 ISO/OSI Reference Model
The Physical Layer • The Data Link Layer • The Network Layer • The Transport Layer • The Session Layer • The Presentation Layer • The Application Layer
2.2 The TCP/IP Reference Model
The Host-to-Network Layer • The Internet Layer • The Transport Layer • The Application Layer
2.3 Reference Model Comparison
2.4 Data Link Layer Protocols and Services
Frame Creation • Error Detection and Correction • Media Access Control
2.5 Network Layer Protocols and Services
IPv4 • IPv4 Multicasting • IPv6 • Address Resolution Protocol • Internet Control Message Protocol • Internet Group Management Protocol
2.6 Transport Layer Protocols and Services
Transmission Control Protocol • User Datagram Protocol • Resource Reservation Protocol
2.7 Presentation Layer Protocols and Services
2.8 Application Layer Protocols and Services
TELNET • File Transfer Protocol • Hypertext Transfer Protocol • Simple Mail Transfer Protocol • Resource Location Protocol • Real-Time Protocol
2.9 Summary
References

2.1 ISO/OSI Reference Model

The ISO/OSI reference model [ISO7498] was developed by ISO (International Organization for Standardization) and finished in 1982. The OSI (Open Systems Interconnection) reference model allows the connection of open systems. This objective is reached by applying a layered approach. The communication system is divided into seven layers (see Figure 2.1) [PET2000]. The lowest three layers are network dependent. They provide support for data communication between two systems and for linking them. The upper three layers are application oriented. They allow the end-user application processes to interact with each other. The intermediate layer (transport layer) isolates the application-oriented layers from the communication details at the lower layers [HAL1996].

FIGURE 2.1 ISO/OSI reference model. Host A and Host B each contain the application, presentation, and session layers (application-oriented layers), the transport layer (intermediate layer), and the network, data link, and physical layers (network-dependent layers); adjacent layers are connected through interfaces, and the two hosts are connected by a physical link (e.g., a cable). (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)

Each layer performs a well-defined function, which is specified by a protocol; this reduces the level of complexity at each layer. The information flow between the layers is directed through interfaces and should be minimized [TAN1996]. Each layer exchanges messages using the services of the layer below. It communicates with the related peer at the same level in a remote system and provides services to the layer above [COL2001]. At each layer the source host adds a header to the packet, which is read and removed again by the receiver. It is important to note that the implementation of one layer is therefore independent of the implementation of the other layers. In the following subsections, each of the layers is discussed separately, starting at the lowest one.
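The handling of headers described above can be sketched in a few lines of Python. The sketch is a toy model under stated assumptions: every layer uses its own name in brackets as a header, which is purely illustrative, and the physical layer is left out because it only transfers raw bits. The function names `send` and `receive` are hypothetical.

```python
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link"]  # top to bottom; physical layer omitted


def send(payload: bytes) -> bytes:
    """Pass data down the stack; each layer prepends its own header."""
    for layer in LAYERS:
        header = f"[{layer}]".encode()
        payload = header + payload
    return payload


def receive(frame: bytes) -> bytes:
    """Pass data up the stack; each layer removes the header of its peer."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


wire = send(b"hello")
print(wire)
print(receive(wire))
```

Each layer inspects only its own header and treats everything behind it as opaque payload, which is why the implementation of one layer can be exchanged without touching the others.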

2.1.1 The Physical Layer

The lowest layer is concerned with the transmission of raw bits from the electrical interface of the user equipment to the communication channel. The channel can be an electrical, optical, or wireless medium, and it transfers a serial stream of data. It has to be ensured that a bit sent as 1 is seen by the receiver as a 1, not as a 0. Design issues at this layer are, for example, how long one bit lasts and which wavelength of light or which voltage level represents a 1 and a 0 bit. Additionally, the establishment and the closure of the physical connection are handled at this layer. Also, mechanical properties of the network equipment, such as size and shape of connectors and cables, have to be specified. Furthermore, electrical (or optical) parameters must be determined. These are the voltage levels, the electrical resistance of the cable, the duration of signaling elements and voltage changes, and the coding method. The last issue handled by the physical layer is the functional specification. It concerns the meaning of switched connections, the distinction between data and control wires, and the specification of clock rate and ground.

2.1.2 The Data Link Layer

As the physical layer is only concerned with the transmission of raw data, the main function of the data link layer is to recognize and correct transmission errors. For this reason, the sender divides the data stream into frames that are transmitted sequentially. When a frame is received, an acknowledgment may be sent back to the sender. If a frame is destroyed by a noise burst on the line and therefore is not acknowledged, it is retransmitted by the sender. As the acknowledgment frame could also be lost, care has to be taken that no duplicate frames are inserted into the data stream. The data link layer therefore solves problems arising from lost, damaged, or duplicate frames. This layer may also offer different service classes, e.g., for protected or unprotected services.

If the receiver is slower than the sender, frames can be lost because of the different processing speeds. To prevent this scenario, a mechanism is implemented to regulate the traffic between sender and receiver. For this purpose, the sender should know how much buffer space is left at the receiver.

Another task of the data link layer is media access control within broadcast networks. In these networks, all connected computers share a common link and perceive all data transferred. Therefore, it has to be ensured that there is only one sender at a time to avoid data collisions. If a collision occurs, it has to be detected and retransmission of all affected data has to be initiated.

When data can be transmitted in both directions simultaneously, the acknowledgment frames for sender A sending to receiver B compete with the data frames that B is sending to A. A solution for this problem is piggybacking, where the acknowledgment information is added to outgoing data frames instead of being sent in additional frames.
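How retransmissions and duplicate suppression interact can be shown with a small simulation of a stop-and-wait scheme with a 1-bit sequence number. This is a sketch under assumptions that go beyond the text: the sender transmits one frame at a time, losses are given as fixed sets of attempt indices instead of being random, and a timeout is modeled simply as the next loop iteration. The function name `stop_and_wait` is hypothetical.

```python
from typing import List, Set, Tuple


def stop_and_wait(frames: List[str], lost_frames: Set[int],
                  lost_acks: Set[int]) -> Tuple[List[str], int]:
    """Simulate sender and receiver using a 1-bit sequence number.

    lost_frames and lost_acks contain the indices of the transmission
    attempts (counted from 0) whose data frame or acknowledgment is lost.
    Returns the frames delivered to the upper layer and the number of
    transmission attempts.
    """
    delivered: List[str] = []
    expected_bit = 0                      # receiver state
    attempt = 0
    for index, data in enumerate(frames):
        send_bit = index % 2              # sender alternates between 0 and 1
        while True:
            this_attempt = attempt
            attempt += 1
            if this_attempt in lost_frames:
                continue                  # frame lost: timeout, retransmit
            if send_bit == expected_bit:  # new frame: deliver it
                delivered.append(data)
                expected_bit ^= 1
            # otherwise the frame is a duplicate caused by a lost
            # acknowledgment; it is discarded but acknowledged again
            if this_attempt in lost_acks:
                continue                  # ack lost: timeout, retransmit
            break                         # ack received: next frame
    return delivered, attempt


result = stop_and_wait(["f0", "f1", "f2"], lost_frames={1}, lost_acks={3})
print(result)
```

In the example the second frame is lost once and the acknowledgment of the third frame is lost once, so five transmissions are needed, yet every frame is delivered exactly once.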

2.1.3 The Network Layer

The network layer is responsible for the setup, handling, and termination of network-wide connections; it controls the operation of the subnet. There are two possible types of network connections: virtual connections and datagram connections. Virtual connections are set up at the start of a transmission to fix the route for the following data packets; packets are always sent using the same route. Using a datagram connection, the route is chosen separately for each packet. Sometimes it has to be ensured that packets arrive in the same order as they were sent: in datagram communication, a packet B sent after a packet A may take a different route and arrive before A.

Additionally, the network layer is concerned with packet routing from source to destination. Routing information can either be stored in a static table or be determined dynamically at the start of each transmission. The chosen route can also depend on the current network load. If too many packets are sent in one subnet, a capacity bottleneck forms. To avoid this situation, the network layer may implement congestion control.

To be able to analyze network traffic, an accounting mechanism is incorporated at this layer. This mechanism counts how many packets are sent and also stores information about packet source and destination. The gathered information can be used to produce billing information.

Also, there may be problems when a packet is traveling through heterogeneous networks. The network layer handles the issues of different packet sizes, varying addressing schemes, and different protocols.
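Routing with a static table and the accounting of forwarded packets can be sketched as follows. The topology, the node names, and the layout of the table are assumptions made only for this example; a real network layer would also have to deal with missing routes, congestion, and differing packet sizes.

```python
from collections import Counter
from typing import Dict, List

# Static next-hop tables (hypothetical topology):
# ROUTES[node][destination] gives the next node on the route.
ROUTES: Dict[str, Dict[str, str]] = {
    "A": {"D": "B"},
    "B": {"D": "C"},
    "C": {"D": "D"},
}

# Accounting: (source, destination) -> number of packets forwarded.
accounting: Counter = Counter()


def forward(source: str, destination: str) -> List[str]:
    """Return the path a packet takes from source to destination."""
    path = [source]
    node = source
    while node != destination:
        node = ROUTES[node][destination]  # table lookup at every hop
        path.append(node)
    accounting[(source, destination)] += 1
    return path


print(forward("A", "D"))
print(accounting)
```

Because the table is static, every packet from A to D follows the same path; with dynamically determined routes the table entries would change over time, and consecutive packets could take different paths.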

2.1.4 The Transport Layer

This layer is the interface between the higher application-oriented layers and the underlying network-dependent layers. Seen from the layers above, messages can be transferred transparently, without knowledge of the underlying network structure. The transport layer basically cuts messages into smaller packets if needed and passes them to the network layer. At the receiver, the messages are assembled again and passed to the session layer. An important task of the transport layer is the handling of transport connections. Normally, one network connection is created for each transport connection required by the session layer. If the session layer requires higher throughput than can be handled by one network connection, the transport layer might create additional network connections. On the other hand, if one wants to save costs, a number of transport connections can be multiplexed onto one network connection. As there is the possibility to set up multiple connections, a transport header is added to distinguish between them. The transport layer provides different classes of quality of service (QoS). The lowest service class provides only basic functionality for connection establishment; the highest class allows full error control and flow control. To avoid the situation of a fast sender overrunning a slower receiver with messages, an


algorithm for flow control is provided. The most popular type of connection is an error-free point-to-point connection where messages are delivered in the same order they were sent. Additionally, messages with no guaranteed order can be sent. It is also possible to send messages not only to one, but to multiple destinations, or to send broadcast messages. The transport layer establishes and terminates connections across the network. Therefore, the need for a naming mechanism arises, allowing processes to choose with whom they converse.

2.1.5 The Session Layer

Layer 5 organizes and synchronizes the data exchange between two application layer processes. It sets up and clears the communication channel for the whole duration of the network transaction between them, and therefore sets up sessions between users on different machines. A session might be used to log into another machine in a remote time-sharing environment or to transfer a file. The session layer provides interaction management (also called dialogue control). Data can be exchanged using duplex or half-duplex connections. A duplex connection transfers data both ways simultaneously. A half-duplex connection can transfer data only one way at a time, and the session layer decides which party is allowed to use the link. Another task of the session layer is token management. It is useful when the two sides are not allowed to perform the same operation at the same time. To schedule these operations, a token is issued to only one process at any given time, and only the process that holds the token may perform the critical task. For large data transmissions, synchronization points can be set periodically. If the network connection fails, the transmission is restarted at the last synchronization point set. Thus, retransmission of the whole data can be avoided. Nonrecoverable exceptions during transmissions are reported to the application layer.

2.1.6 The Presentation Layer

The main task of the presentation layer is the representation of data, e.g., integers, floating point numbers, or character strings. Therefore, the syntax for these data containers is defined. As different computers may use varying internal data representations (for example, for characters or numbers), a conversion has to be done: the data sent are converted to an appropriate transfer syntax and are transformed back to the receiver's internal data format upon receipt. The converters for the syntax of data do not necessarily have to understand the semantics. This layer may also provide services for data encryption and data compression.
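The conversion to and from a transfer syntax can be illustrated with a small sketch. The layout used here (a big-endian 32-bit signed integer followed by a 64-bit floating point number) and the function names are assumptions made for this example only; they are not taken from any OSI presentation protocol.

```python
import struct
from typing import Tuple

# Assumed transfer syntax for this sketch: big-endian byte order,
# one 32-bit signed integer followed by one 64-bit IEEE 754 float.
TRANSFER_FORMAT = ">id"


def to_transfer_syntax(count: int, value: float) -> bytes:
    """Convert the local representation into the agreed byte layout."""
    return struct.pack(TRANSFER_FORMAT, count, value)


def from_transfer_syntax(data: bytes) -> Tuple[int, float]:
    """Convert received bytes back into local Python values."""
    count, value = struct.unpack(TRANSFER_FORMAT, data)
    return count, value


encoded = to_transfer_syntax(1025, 2.5)
print(encoded.hex())
print(from_transfer_syntax(encoded))
```

Because the byte order is fixed by the format string, sender and receiver obtain the same values regardless of the byte order their processors use internally.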

2.1.7 The Application Layer

This layer provides services not to other layers, but directly to application programs. Thus, there is no specific service in this layer, but there is a distinct combination of services offered to each application process. As the connected hosts may use different file systems, the application layer handles the differences and avoids incompatibilities. Therefore, this layer provides the means for network-wide distributed information services. This allows the application processes to transfer files, send e-mails, or perform directory lookups. Furthermore, the application layer provides services to identify intended communication partners, check their availability, verify communication authority, provide privacy, authenticate communication partners, select the dialogue discipline, agree on the responsibility for error recovery, and identify the constraints on data syntax [HAL1996].

2.2 The TCP/IP Reference Model

TCP/IP (Transmission Control Protocol/Internet Protocol) was first used when the ARPANET was emerging. This network was developed for use by the U.S. Armed Forces. Therefore, it was required that even when some parts of the network were destroyed during battle, it should still provide communication


[Figure 2.2 diagram: Host A and Host B each contain, from top to bottom, the application layer, transport layer, Internet layer, and host-to-network layer, separated by interfaces; the hosts are joined by a physical link (e.g., cable).]

FIGURE 2.2 TCP/IP reference model. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)

services. As long as two hosts were still functioning and any path was available between them, communication had to remain possible. Another important requirement was the ability to connect multiple different networks regardless of the underlying protocols, physical transport medium, or provided bandwidth. The TCP/IP reference model is structured similarly to the ISO/OSI model introduced in Section 2.1, but it consists of only four layers. A comparison of both models is given later in this chapter. The lowest layer is the host-to-network layer, on top of which the Internet layer is placed. Next comes the transport layer, and above it is the highest layer, the application layer (Figure 2.2) [SCO1991]. The next sections give an overview of the services provided by the TCP/IP model layers, starting with the lowest one.

2.2.1 The Host-to-Network Layer

TCP/IP does not specify services or operations at the host-to-network layer. It is only required that the host can somehow connect to the network to enable the Internet layer to send packets. As this layer is not defined, the implementation can vary on each system. The network service may be provided by Ethernet, Token Ring, asynchronous transfer mode (ATM), wide area network (WAN) technologies, wireless technologies, or any other means of transferring network packets.

2.2.2 The Internet Layer

The main features of the Internet layer are addressing, packet routing, and error reporting. Additionally, services for fragmentation and reassembly of packets are provided [HAL1996]. The core protocols at the Internet layer are the Internet Protocol (IP) [RFC791], the Address Resolution Protocol (ARP) [RFC826], the Internet Control Message Protocol (ICMP) [RFC792], and the Internet Group Management Protocol (IGMP) [RFC3376]. The Internet Protocol is concerned with packet routing, IP addressing, and the fragmentation and reassembly of packets. It is a packet-switching protocol based on a best-effort connectionless architecture. Packets travel independently of each other from source to destination host. Each packet may be routed differently through the network; thus, packets may be delivered in a different order than they were sent. Packets may also be lost, because delivery is not guaranteed. To be able to route packets across the network, each host has to know the location of a gateway or a router. The gateway decides which path a packet has to travel. For this reason, a routing table is maintained at the Internet layer. To send packets across networks that only support small packet sizes, the packets are broken down into smaller pieces at the source host or at an intermediate router and are assembled again at the destination host.
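The fragmentation and reassembly step can be sketched as follows. This is a simplified model, not an IP implementation: the Fragment class and both function names are hypothetical, a fixed 20-byte header is assumed, and reassembly assumes that all fragments of one datagram have arrived. As in IPv4, offsets are counted in units of 8 bytes, so every fragment except the last carries a multiple of 8 payload bytes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    ident: int            # identical for all fragments of one datagram
    offset_units: int     # position of the payload in units of 8 bytes
    more_fragments: bool  # False only for the last fragment
    payload: bytes


def fragment(ident: int, payload: bytes, mtu: int, header_len: int = 20) -> List[Fragment]:
    """Split a payload so that header plus data fits into the given MTU."""
    max_data = (mtu - header_len) // 8 * 8  # round down to a multiple of 8
    if max_data <= 0:
        raise ValueError("MTU too small for header and data")
    fragments = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + max_data < len(payload)
        fragments.append(Fragment(ident, start // 8, more, chunk))
    return fragments


def reassemble(fragments: List[Fragment]) -> bytes:
    """Rebuild the payload; fragments may arrive in any order."""
    ordered = sorted(fragments, key=lambda f: f.offset_units)
    return b"".join(f.payload for f in ordered)


data = bytes(range(256)) * 16              # 4096 bytes of payload
parts = fragment(ident=7, payload=data, mtu=1500)
print([(p.offset_units, p.more_fragments, len(p.payload)) for p in parts])
print(reassemble(list(reversed(parts))) == data)
```

Sorting by offset is what makes the scheme tolerant of reordering, which matters because the fragments travel as independent packets.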


ARP [RFC826], the Address Resolution Protocol, translates network layer addresses to link layer (hardware) addresses. Thus, an IP address is translated to, for example, an Ethernet address. The Internet Control Message Protocol (ICMP) [RFC792] is concerned with datagram error reporting and is able to provide certain information about the Internet layer. The Internet Group Management Protocol (IGMP) [RFC3376] is used to manage IP multicast groups [RFC1122].

2.2.3 The Transport Layer

The transport layer provides stream and datagram communication services. Protocols specified at this layer are the Transmission Control Protocol (TCP) [RFC793] and the User Datagram Protocol (UDP) [RFC768]. Both protocols deliver end-to-end communication services (i.e., message transfer). The Transmission Control Protocol is a connection-oriented and reliable point-to-point communication service [RFC793]. It delivers a data stream to any other host in the Internet without errors. This data stream is broken down into messages and handed down to the Internet layer. TCP sets up and terminates the connection, and it sequences and acknowledges the packets it sends. It is also responsible for retransmitting packets lost during transmission. In addition, a flow control service is implemented, which prevents a receiver from being flooded by a faster sender. The User Datagram Protocol is a connectionless, unreliable communication protocol [RFC768]. Thus, neither sequencing nor flow control is provided. It is used when prompt delivery of packets is more important than error-free transmission, as is required for the transmission of video or audio content. Compared to TCP, there is no connection establishment, no connection state, a smaller packet overhead, and an unregulated send rate [KUR2001].

2.2.4 The Application Layer

This layer provides services to application processes. It accesses services of the transport layer and allows processes at different hosts to communicate with each other using a variety of protocols. These include the Hypertext Transfer Protocol (HTTP) [RFC2616] to send and receive files that make up Web pages. Also, protocols for sending electronic mail, the Simple Mail Transfer Protocol (SMTP) [RFC821], and for interactive file transfer, the File Transfer Protocol (FTP) [RFC959], are implemented at this layer. Another frequently used service is Telnet, which is a terminal emulation protocol [RFC854]. It enables the user to log on to remote hosts. To access news articles at virtual blackboards, the Network News Transfer Protocol (NNTP) [RFC977] is provided. Additionally, protocols for the management of TCP/IP networks are available at this layer. The Domain Name System (DNS) [RFC1034, RFC1035] resolves a host name to an IP address. Network management, including the collection and exchange of management information, is facilitated by the Simple Network Management Protocol (SNMP) [RFC1157]. Besides these basic protocols, a wide variety of other protocols are implemented for use at the TCP/IP application layer. An overview of the assignment of the protocols mentioned in this section to the respective layer is given in Figure 2.3.

2.3 Reference Model Comparison

Both reference models described above are based on a layered approach. Also, the layers provide quite similar services. The application layer of the TCP/IP model corresponds to the application layer of the ISO/OSI reference model. Presentation and session layers are not present in the TCP/IP reference model. Thus, in the TCP/IP model, services provided by these two layers have to be performed by the application process itself. The two transport layers perform similar services. The next layer in the TCP/IP model, the Internet layer, is equivalent to the network layer in ISO/OSI. The data link layer and the physical layer of the OSI reference model are represented by the host-to-network layer in the TCP/IP model (Figure 2.4). In both models, layers above the transport layer are application dependent [TAN1996].


[Figure 2.3 diagram: protocols grouped by layer.
Application Layer: HTTP, SMTP, FTP, NNTP, SNMP, DNS, RTP
Transport Layer: TCP, UDP
Internet Layer: IP, ICMP, IGMP, ARP
Host-to-Network Layer: Ethernet, Token Ring, WLAN, ATM]

FIGURE 2.3 TCP/IP architecture. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)

ISO/OSI                      TCP/IP
7  Application Layer         4  Application Layer
6  Presentation Layer           (not present)
5  Session Layer                (not present)
4  Transport Layer           3  Transport Layer
3  Network Layer             2  Internet Layer
2  Data Link Layer           1  Host-to-Network Layer
1  Physical Layer               (covered by the Host-to-Network Layer)

FIGURE 2.4 ISO/OSI vs. TCP/IP. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)

The ISO/OSI model is mainly a conceptual model; it is an example of a universally applicable structured network model. It introduced three main concepts, which were also obeyed when the TCP/IP reference model was developed. Each layer provides exactly defined services to the layer above and uses services of the layer below. These services are accessed using interfaces that specify which parameters are expected and which results are returned. Protocols defined in each layer communicate with their peer at the remote host, independent of the underlying network structure. These ideas in both models are similar to object-oriented software development [TAN1996]. In the beginning, the TCP/IP model did not strictly separate services, interfaces, and protocols; these concepts were introduced later. Thus, protocols in the ISO/OSI model are better encapsulated than those in the TCP/IP model, and it is easier to alter services in the ISO/OSI reference model [TAN1996]. As the ISO/OSI model was developed before the respective protocols and their implementation, it was possible to easily distinguish between services, interfaces, and protocols for each layer. The developers were able to choose the appropriate number of layers such that each one could perform only a distinct set of matching services. For TCP/IP, the protocols were developed first, and afterwards, the


abstract model was created. The problem with this approach was that the model did not fit any other protocol stack [TAN1996]. Finally, there are differences in the area of connectionless vs. connection-oriented communication. The ISO/OSI reference model provides services for both kinds of communication at the network layer, but only connection-oriented services at the transport layer. The TCP/IP reference model supports both connectionless and connection-oriented communication at the transport layer, but only connectionless services at the Internet layer [TAN1996]. In the following, the most important TCP/IP protocols and services, their functionality, and their position in the OSI stack will be described.

2.4 Data Link Layer Protocols and Services

In the OSI model, the data link layer is situated at layer 2, and in the TCP/IP reference model, it is part of the host-to-network layer. Its purpose is to offer services to OSI layer 3 so that protocols at layer 3 can send data reliably to neighboring computers (i.e., computers directly connected via a network link or via layer 1 or 2 repeaters, bridges, hubs, or switches). The data link layer may offer one of the following services to layer 3:

• In an unacknowledged connectionless service, neither the sender nor the receiver takes measures to detect lost packets.
• In an acknowledged connectionless service, the receiver must acknowledge the data it received by sending back an acknowledgment to the sender. If the sender does not receive the acknowledgment after a certain amount of time, it assumes that the data were lost and retransmits them.
• In a connection-oriented service, the data link layer must first create a (possibly virtual) path between sender and receiver before data can be sent. Furthermore, the data link layer adds sequence numbers to the sent data units in order to detect lost or erroneous data units.

The bit-error rate of modern wire-line (electrical or optical) local area network (LAN) interconnections is too low to justify the additional effort for virtual path creation at this level. In LANs, therefore, acknowledged (Token Ring) or unacknowledged (Ethernet) connectionless services are usually used at layer 2. Lost packets or packets delivered out of order then often have to be detected at layer 4 or even higher. Wireless networks, however, may suffer severely from lost packets or high bit-error rates. Under these conditions, sophisticated data link layer protocols like IBM's Synchronous Data Link Control (SDLC), or the closely related ISO norm High-Level Data Link Control (HDLC) and the CCITT recommendation Link Access Procedure (LAP), or the IEEE 802.2 norm Logical Link Control (LLC), are often used.

2.4.1 Frame Creation

One major task of layer 2 is to pack the data it receives from a higher layer into so-called frames, i.e., data packets, which are then modulated onto the physical network medium. This is done in a way that the desired receivers are able to (1) detect that a frame has been sent, (2) decode the frame reliably and retrieve the sender and receiver addresses, and (3) identify those frames that are meant for themselves. Frames are nothing more than a sequence of bits modulated onto a carrier. In order to be able to decode information stored in a frame, a receiver first has to be able to identify the first bit of a frame. This start bit is usually followed by a specific sequence of bits containing frame information like the frame length, the content type, checksums, etc. A simple method for marking the frame start is bit stuffing. In protocols like X.25, the start of a frame is signaled by a flag containing six consecutive 1s (01111110). To keep this pattern from appearing inside the frame, the sending data link layer inserts a 0 after every five consecutive 1s in the transported data. The receiving data link layer knows that a 0 following five consecutive 1s must have been inserted by the sender, and therefore removes it. Another method for synchronizing senders and receivers at the bit level is sending sync bytes, as, for instance, is done in Digital Video Broadcast (DVB) [REI2001]. Here, data frames are 204 bytes long


and contain a certain value (0x47) always at the same position (sync byte). The task of a sync-byte detector is to detect the regular occurrence of this value every 204 bytes. If this value is detected five times, then sender and receiver are synchronized and the receiver may easily compute the frame start from it. Other methods include octet counting or octet stuffing and will be described in the context of application protocols. Once the start bit of a frame is identified, a network card may determine whether it is the receiver of the sensed frame. In the IEEE 802 standard, each network interface card is assigned a unique 6-byte-long media access control (MAC) address, and each sent frame starts with the MAC address of the destination network card. Thus, each network card receiving a frame just compares the first 6 bytes of the frame with its own MAC address, and if they are equal, it passes the frame on to the next higher layer for further processing.
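The bit-stuffing rule described in this section can be sketched in a few lines. Bits are modeled as strings of "0" and "1" purely for readability, and the function names are hypothetical; a real data link layer performs this in hardware on the serial bit stream.

```python
FLAG = "01111110"  # frame delimiter: six consecutive 1s between two 0s


def stuff(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1s."""
    out = []
    run = 0
    for bit in bits:
        out.append(bit)
        if bit == "1":
            run += 1
            if run == 5:
                out.append("0")  # stuffed bit
                run = 0
        else:
            run = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Remove the 0 that follows every run of five consecutive 1s."""
    out = []
    run = 0
    skip_next = False
    for bit in bits:
        if skip_next:
            # This bit is the stuffed 0 inserted by the sender; drop it.
            skip_next = False
            run = 0
            continue
        out.append(bit)
        if bit == "1":
            run += 1
            if run == 5:
                skip_next = True
        else:
            run = 0
    return "".join(out)


def build_frame(payload_bits: str) -> str:
    """Delimit the stuffed payload with a flag on both sides."""
    return FLAG + stuff(payload_bits) + FLAG


payload = "0111111011111"
frame = build_frame(payload)
print(frame)
print(unstuff(frame[len(FLAG):-len(FLAG)]) == payload)
```

Since the stuffed payload never contains six 1s in a row, the flag pattern can only occur at the frame boundaries.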

2.4.2 Error Detection and Correction

Sending data over certain media types is often unreliable and may be severely disturbed by external disruptions, causing data to be lost or wrongly received. Thus, another important task of the data link layer is either to correct corrupted frames or at least to detect the occurrence of bit errors. In order to detect or correct bit errors, the sender must add checksum information in addition to the transported headers and user data. The more information is added, the more wrong bits may be detected or even corrected. A popular method for error detection and correction is given by Hamming codes. Here, only certain code words are sent, and any two of them differ in at least a specific number of bits, the Hamming distance. For instance, a code containing the words 000111, 111000, 000000, and 111111 has the Hamming distance 3; i.e., code words differ in at least 3 bits from each other. In order to detect that d bits have been changed during transmission, a Hamming distance of d + 1 is required. If the receiver has to be able to correct d bits, then a Hamming distance of 2d + 1 is required. The code above is thus able to detect two wrong bits and to correct one wrong bit. A simpler way of detecting wrong bits is given by the parity bit. Here, only one bit is added to each code word, reflecting the number of 1s in the word. If this number is even, the parity bit is set to 0; otherwise, it is set to 1 (or vice versa). Codes with parity bits may detect only one wrong bit per code word. A more sophisticated error detection code is given by the cyclic redundancy check (CRC) code. Here, each sequence of bits is treated as a polynomial over the field of binary numbers (modulo 2). The number 101, for instance, is treated as the polynomial x^2 + 1. Modulo 2 means that each addition of single bits is treated as an exclusive-or (XOR) operation, i.e., 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, and 1 + 1 = 0. For CRC codes, a fixed polynomial is chosen, called the generator polynomial G(x). If a code word W(x) is to be sent, it is replaced by another polynomial R(x), which can be divided by G(x) with remainder 0, and from which the original code word W(x) can be reconstructed. The polynomial R(x) is then transmitted and received. If the received R(x) can be divided by G(x) without remainder, then the transmission has been error-free with high probability. Otherwise, bit errors are detected and the transmitted code word is dropped. Other error correction techniques include Reed–Solomon codes and convolutional codes, but these will not be treated here.
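The three techniques of this section can be tried out with the following sketch. The code words are the ones used in the text; the message 1101011011 and the generator 10011 (i.e., x^4 + x + 1) are illustrative assumptions, as are all function names. R(x) is built in the common way of appending the remainder of the shifted message, which is one possible construction and is not prescribed by the text.

```python
from itertools import combinations
from typing import Iterable


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equally long words differ."""
    return sum(x != y for x, y in zip(a, b))


def code_distance(words: Iterable[str]) -> int:
    """Minimum Hamming distance over all pairs of code words."""
    return min(hamming_distance(a, b) for a, b in combinations(list(words), 2))


def even_parity_bit(word: str) -> str:
    """0 if the word contains an even number of 1s, otherwise 1."""
    return str(word.count("1") % 2)


def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of bits(x) divided by generator(x), using XOR arithmetic.

    Assumes the generator starts with 1 and has at least two bits.
    """
    work = list(bits)
    n = len(generator)
    for i in range(len(bits) - n + 1):
        if work[i] == "1":
            for j in range(n):
                work[i + j] = "0" if work[i + j] == generator[j] else "1"
    return "".join(work[-(n - 1):])


def crc_encode(word: str, generator: str) -> str:
    """Append the remainder so that the result is divisible by the generator."""
    degree = len(generator) - 1
    return word + mod2_remainder(word + "0" * degree, generator)


def crc_check(received: str, generator: str) -> bool:
    """True if the received bits leave remainder 0."""
    return "1" not in mod2_remainder(received, generator)


print(code_distance(["000111", "111000", "000000", "111111"]))
print(even_parity_bit("1011"))
sent = crc_encode("1101011011", "10011")
print(sent, crc_check(sent, "10011"))
```

With a Hamming distance of 3, the first result confirms that the example code can detect two wrong bits (d + 1 = 3) and correct one (2d + 1 = 3).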

2.4.3 Media Access Control

An important part of the data link layer is the media access control (MAC) sublayer. This sublayer controls access to the physical medium, which may be shared by several senders concurrently. Depending on the media type, one or several senders may transmit data at the same time. In case of conflicts, several techniques exist for granting the right to use the medium.

2.4.3.1 ALOHA

The ALOHA technique, developed at the University of Hawaii, allows all senders to send their data to a commonly shared broadcast medium whenever they wish. In a broadcast medium, the data sent by one


host are received by all others listening to the same medium. In case of collisions due to the concurrent sending of two or more senders, the colliding frames are discarded and must be sent anew.

2.4.3.2 CSMA/CD

For carrier-sense multiple access/collision detection (CSMA/CD), as, for example, implemented in Ethernet, several network cards share the same broadcast medium (e.g., an electrical wire). Each network card listens to the medium (carrier sense), and if no signal is detected, then a new sender may use the medium immediately. Due to the limited speed of signals, two or more senders may send simultaneously without noticing each other in time, resulting in collisions. In such a case, all colliding frames are discarded and each sender waits for a random amount of time before it tries to send again.

2.4.3.3 TDMA

In time-division multiple access (TDMA), time is divided into time slices, and each sender is granted one slice in which it may send its data into the medium. Here, bandwidth may be wasted, as senders own their time slice whether they have something to send or not.

2.4.3.4 FDMA

In frequency-division multiple access (FDMA), several sending frequencies exist, and on each frequency, one sender may transmit without fearing interference from other frequencies. For example, GSM (global system for mobile communication) uses a mixture of TDMA and FDMA for its calls. Additionally, GSM terminals change their frequency according to a fixed scheme (frequency hopping).

2.4.3.5 CDMA

The concept of code-division multiple access (CDMA) is fundamentally different from the previous concepts. Here, each sender is assigned a unique bit sequence of length N called a chip. Each sent bit is added (modulo 2) to all chip bits, yielding the chip if a 0 is to be sent, or the inverse chip if a 1 is to be sent. If a terminal wants to transmit R bits per second (bps), then R chips have to be transferred per second, which requires a much higher bandwidth of R × N bps in total. Thus, the necessary frequency band is broadened significantly. In essence, the signal is spread over a broad spectrum, and the chip is therefore often called a spreading sequence. In CDMA, senders with different chips can send concurrently and do not disturb the reception of other signals. This works because different chips are mathematically orthogonal to each other with respect to the inner products of chips (which can also be interpreted as bit vectors) and their inverses. Also, due to the use of a broader spectrum, the reconstruction of the signal is more robust with respect to other noise sources.
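The role of orthogonal chips can be illustrated with a small sketch. The two chips of length N = 4, the mapping of bit 0 to +1 and bit 1 to -1 on the medium, and all function names are assumptions chosen for this example; the sketch also assumes perfectly synchronized senders and a noise-free medium on which signals simply add up.

```python
from typing import List, Optional, Sequence

# Two example chips of length N = 4 that are orthogonal in bipolar form.
CHIP_A = [0, 0, 1, 1]
CHIP_B = [0, 1, 1, 0]


def to_bipolar(bits: Sequence[int]) -> List[int]:
    """Map bit 0 to +1 and bit 1 to -1 (assumed signal levels)."""
    return [1 if b == 0 else -1 for b in bits]


def spread(bit: int, chip: Sequence[int]) -> List[int]:
    """Add the data bit modulo 2 to every chip bit: 0 -> chip, 1 -> inverse chip."""
    return to_bipolar([bit ^ c for c in chip])


def superpose(*signals: Sequence[int]) -> List[int]:
    """Concurrent transmissions add up on the shared medium."""
    return [sum(levels) for levels in zip(*signals)]


def despread(signal: Sequence[int], chip: Sequence[int]) -> Optional[int]:
    """Recover one sender's bit via the inner product with its chip."""
    correlation = sum(s * r for s, r in zip(signal, to_bipolar(chip)))
    if correlation == 0:
        return None  # this sender did not transmit
    return 0 if correlation > 0 else 1


# Inner product of the two chips is 0, i.e., they are orthogonal.
print(sum(a * b for a, b in zip(to_bipolar(CHIP_A), to_bipolar(CHIP_B))))

medium = superpose(spread(1, CHIP_A), spread(0, CHIP_B))
print(medium, despread(medium, CHIP_A), despread(medium, CHIP_B))
```

Because the inner product of the two chips is zero, the contribution of the other sender cancels out during despreading, and each receiver recovers only the bit of the sender whose chip it uses.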

2.5 Network Layer Protocols and Services

2.5.1 IPv4

The term Internet Protocol (IP) usually denotes IP version 4 (IPv4), which has been specified in [RFC791] and [RFC1122] and is the established standard protocol for the Internet at layer 3 of the ISO/OSI reference model and at the Internet layer of the TCP/IP reference model. The task of IP is to transport a packet from a source computer to a destination computer, where both computers are interconnected by an internet. Here, internet denotes any (possibly privately managed) heterogeneous network that is interconnected using IP and IP-based routers. In contrast, the Internet denotes the well-known worldwide IP-based network interconnecting millions of computers and being managed by network information centers (NICs) and Internet service providers (ISPs). When traveling through an internet, a packet may pass through several intermediary networks with different network technologies, for instance, Ethernet, Token Ring, ATM, etc., used at layers 1 and 2. At the border between two different networks, the packet's destination network address is examined by a router, i.e., a computer that is connected to both networks and that is able to select other routers in the path between sender and receiver, or to find the receiver in its own network. Routing decisions are usually made using


TABLE 2.1 IP Network Classes

Class   Most Significant Bits   Network Address   Host Address
A       0                       7 bits            24 bits
B       10                      14 bits           16 bits
C       110                     21 bits           8 bits
D       1110                    28 bits           0 bits
E       11110                   Reserved          Reserved

Source: From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.

predefined and regularly updated routing tables. However, the next chosen router is by no means fixed and may depend on runtime situations like congestion or link failures, or it may simply be chosen at random. As a consequence, packets may travel along different paths from sender to receiver, and neither the delivery itself nor the original order can be guaranteed. IP packets are called datagrams and may have a total length of up to 65,535 bytes. A datagram may be cut into a sequence of smaller datagrams if its size is larger than the network's maximum transmission unit (MTU), i.e., the largest packet that can be carried in one OSI layer 2 frame of the network. For Ethernet, for instance, the MTU is 1500 bytes. This process is called fragmentation, and the IP header contains several fields for reassembling such fragments into the original datagram again. Each datagram or fragment begins with a header of at least 20 bytes containing the following information:

• The version number of the IP (4).
• The IP header length (IHL); the header may be longer than 20 bytes.
• The total length of the datagram, including header.
• An identification number for reassembling fragmented datagrams. All fragments with the same ID belong to the same datagram.
• Flags, including the don't fragment (DF) flag, signaling that the datagram should not be fragmented, and the more fragments (MF) flag, signaling that more fragments are still to come.
• A fragment offset identifying the offset of the received fragment in the whole datagram.
• The time-to-live (TTL) counter, which is decreased by one by each router. A datagram with TTL equal to zero is discarded. This prevents faulty datagrams from circling through the Internet forever.
• A number identifying the used transport protocol (6 for TCP, 17 for UDP, …).
• A header checksum.
• The IP source and destination addresses.
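The header fields listed above can be extracted from the first 20 bytes of a datagram as sketched below. The field layout follows [RFC791]; the function names and the sample header bytes are illustrative assumptions. The header also contains a type-of-service byte that is not part of the list above, the IHL is stored in units of 32-bit words, and the fragment offset is stored in units of 8 bytes.

```python
import ipaddress
import struct


def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, _tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_bytes": (ver_ihl & 0x0F) * 4,       # IHL counts 32-bit words
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": protocol,                    # 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


def header_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of all 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample header (assumed for illustration): UDP datagram, DF flag set.
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
fields = parse_ipv4_header(sample)
print(fields)
print(header_checksum(sample) == 0)  # a valid header sums to zero
```

A receiver can verify a header by running the checksum over all 20 bytes, including the checksum field itself; the result is zero if no bit error occurred in the header.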

An important aspect of IPv4 is given by the 32-bit-long IP addresses. The written form follows the dotted decimal notation scheme X1.X2.X3.X4, where the Xi are decimal numbers between 0 and 255. Each address starts with an address class identifier, followed by the network address and finally by the host address. There are different network classes, as shown in Table 2.1. Each network card attached to the Internet must have a unique IP address. The address assignment scheme is a two-step strategy. First, each site managing a network connected to the Internet is assigned a unique network address by a central authority called the network information center (NIC). Then each site may assign the unique host addresses belonging to this network address, which may include 2^24 - 2 = 16,777,214 (class A), 2^16 - 2 = 65,534 (class B), or 2^8 - 2 = 254 (class C) unique host addresses. IP defines a set of private addresses that may be used freely, but whose traffic should not be routed over the Internet without modification [RFC1918]. The three address blocks are:

• 10.0.0.0 to 10.255.255.255 (one class A network)
• 172.16.0.0 to 172.31.255.255 (16 contiguous class B networks)
• 192.168.0.0 to 192.168.255.255 (256 contiguous class C networks)
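The class boundaries and the private address blocks can be checked with a short sketch based on Python's standard ipaddress module. The function names are hypothetical, and for simplicity every address whose first byte is 240 or larger is reported as class E here.

```python
import ipaddress

# Private address blocks from [RFC1918], written in prefix notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 to 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 to 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 to 192.168.255.255
]


def address_class(address: str) -> str:
    """Derive the class from the most significant bits of the first byte."""
    first_byte = int(ipaddress.IPv4Address(address)) >> 24
    if first_byte < 128:   # leading bit 0
        return "A"
    if first_byte < 192:   # leading bits 10
        return "B"
    if first_byte < 224:   # leading bits 110
        return "C"
    if first_byte < 240:   # leading bits 1110
        return "D"
    return "E"             # simplification, see lead-in


def usable_hosts(host_bits: int) -> int:
    """All host addresses minus the network and broadcast addresses."""
    return 2 ** host_bits - 2


def is_private(address: str) -> bool:
    """True if the address lies in one of the three private blocks."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)


print(address_class("172.16.5.4"), usable_hosts(16), is_private("172.16.5.4"))
```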


[Figure 2.5 diagram: Source, multicast routers MRS and MR1, and receivers Recv. 1 and Recv. 2.]

FIGURE 2.5 Unicast.

Multicast addresses are special addresses reserved for groups of hosts receiving the same multimedia program via multicast from a single source [RFC1112]. Multicast addresses may range from 224.0.0.0 to 239.255.255.255; details about multicasting are described in Section 2.5.2. Two host addresses are reserved in each (sub)network. The host address 0 denotes the network itself; the highest possible host address denotes a broadcast address that is received by all hosts of a given network.
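The two reserved host addresses and the multicast range can be looked up with Python's standard ipaddress module, as the following sketch shows. The network 192.168.1.0/24 (a class C sized network from the private range) is an arbitrary example, and the function name is hypothetical.

```python
import ipaddress
from typing import Tuple


def reserved_addresses(network: str) -> Tuple[str, str]:
    """Return the network address (host part 0) and the broadcast address."""
    net = ipaddress.ip_network(network)
    return str(net.network_address), str(net.broadcast_address)


print(reserved_addresses("192.168.1.0/24"))

# Addresses from 224.0.0.0 to 239.255.255.255 are multicast addresses.
for candidate in ("223.255.255.255", "224.0.0.0", "239.255.255.255", "240.0.0.0"):
    print(candidate, ipaddress.IPv4Address(candidate).is_multicast)
```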

2.5.2 IPv4 Multicasting

The normal mode of communicating via IPv4 is unicast; i.e., one sender sends data to one receiver. Another possible transfer mode inside a subnet is broadcast. In this case, one sender sends data to each node of the subnet, regardless of whether the node is interested in the data or not. IP broadcasting, however, does not work beyond the respective subnet boundaries. The third mode of communication is called multicast. Here, one sender sends data to a well-defined group of nodes, which may be attached to the same subnet or to some other subnet that can be reached via the Internet. Nodes that do not belong to this group either do not receive the sent data or ignore them. The main advantage of multicasting can be seen in Figure 2.5 and Figure 2.6. A host sending a data packet to a group of N receivers in unicast mode (Figure 2.5) must send the data N times, once for each receiver, thus causing significant traffic and CPU overhead at the source. When using multicast (Figure 2.6), the host sends the data only once, and somewhere in between the source and the receivers, multicast routers duplicate the data packets (as done by MR1 in Figure 2.6) and pass them on to the interested receivers. This way, the source sends each packet only once, reducing traffic for the source itself and for the links between sender and receivers. It must be noted that in a multicasting network, all routers must be able to route multicast traffic. If pure unicast routers are present, then multicast traffic must be embedded into unicast traffic, resulting in tunnels, as is necessary, for instance, for the Internet MBone example described below. Pure unicast traffic, however, can be routed by unicast and multicast routers.

FIGURE 2.6 Multicast.

FIGURE 2.7 MBone multicast islands.

FIGURE 2.8 Tunnel between two multicast routers MR1 and MR2. Logically, MR1 sends multicast traffic directly to MR2. Physically, the data are transported in the payload section of unicast packets.

2.5.2.1 MBone

Most of the existing Internet routers either are not able to route multicast traffic or do not have this ability activated. If a multicast data packet is received by such a pure unicast router, the packet cannot be routed and therefore is discarded. In contrast, the Internet multicast backbone (MBone) is a set of Internet routers that are able to route multicast data and that collaborate with each other. Each of these routers is also attached to a multicasting-enabled subnet; thus, the MBone makes up a set of interconnected multicast islands (Figure 2.7).

MBone routers act at two levels. At the usual unicast level, they are standard Internet routers, able to communicate with all other Internet routers via unicast. At the multicast level, they logically send multicast traffic or multicast routing information only to other members of the MBone. As there may physically be an arbitrary number of pure unicast routers between two MBone routers, multicast data and usually also routing information are sent inside unicast tunnels (Figure 2.8). This means that if a multicast router sends a multicast packet PM toward its receivers, it creates a new unicast packet PU and puts the whole multicast packet PM (including its IP/UDP headers) into the data section of PU. The packet PU is then sent to the next multicast router via unicast. For tunneling, usually IP in IP [RFC1853] is used, but the more general Generic Routing Encapsulation (GRE) [RFC2784] may also be used.

The MBone is a so-called overlay network on top of the Internet, because the MBone routers together with the tunnels form a second, smaller logical network above the Internet, which at the multicast level is not necessarily aware of the lower-level Internet structure and all its unicast routers. At the multicast level, only the MBone multicast routers and tunnels (connections between the MBone routers) are visible.
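The tunneling step can be sketched as follows for the IP in IP case: the complete multicast packet becomes the payload of a new unicast IPv4 packet whose protocol field is 4 (IPv4 encapsulation). The function names, the fixed TTL, and the example addresses are assumptions for illustration; a real tunnel endpoint would also handle fragmentation, identification values, and options.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's complement sum of 16-bit words, as used in the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def encapsulate(inner_packet: bytes, tunnel_src: str, tunnel_dst: str,
                ttl: int = 64) -> bytes:
    """Wrap a complete IP packet into a new unicast IPv4 packet (IP in IP)."""
    version_ihl = (4 << 4) | 5                # IPv4, header of 5 32-bit words
    total_length = 20 + len(inner_packet)
    header = struct.pack("!BBHHHBBH4s4s",
                         version_ihl, 0, total_length,
                         0, 0,                # identification, flags/offset
                         ttl, 4, 0,           # protocol 4 = IP in IP, checksum 0
                         socket.inet_aton(tunnel_src),
                         socket.inet_aton(tunnel_dst))
    checksum = internet_checksum(header)
    header = header[:10] + struct.pack("!H", checksum) + header[12:]
    return header + inner_packet

# Hypothetical tunnel endpoints MR1 = 192.0.2.1 and MR2 = 198.51.100.2.
outer = encapsulate(b"\x45" + bytes(27), "192.0.2.1", "198.51.100.2")
print(len(outer), "bytes, outer protocol field =", outer[9])
```

The receiving multicast router would strip the first 20 bytes and continue routing the inner packet at the multicast level.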
Nowadays, the MBone consists of thousands of multicast islands interconnected via tunnels, and users attached to a multicast island may multicast audio and video transmissions to all other users connected to the MBone worldwide.

2.5.2.2 IPv4 Multicast Addressing

As in unicast, when sending a multicast UDP packet, the destination address field of the IP header identifies the nodes that receive the packet. However, this destination address must be a class D IP multicast address, also called a group address. Thus, a multicast packet is always sent to a group of hosts rather than to a specific host. Table 2.2 shows parts of the Internet multicast addressing scheme [IANAM, ALB2004]. It can be seen that some parts of the addressing range are reserved, for instance, for routing protocols; some are reserved for static multicast groups, which are defined permanently; and some are reserved for different multicast address assignment schemes.

TABLE 2.2 IPv4 Multicast Addressing Scheme

Start        End                Description
224.0.0.0    224.0.0.255        Routing protocols (e.g., DVMRP, topology discovery, etc.)
224.0.1.0    238.255.255.255    Either permanently assigned or free for dynamic use
232.0.0.0    232.255.255.255    Source-specific multicast (SSM)
233.0.0.0    233.255.255.255    GLOP
239.0.0.0    239.255.255.255    Administratively scoped IP multicast

For sending a multicast to a transient group (one that is created and later destroyed again), the sender must obtain an unused multicast address. Unfortunately, there is no central authority for assigning such an address. Thus, users must either arbitrarily take an address from one of the free address ranges and hope that no one else uses it, or use tools like sd or sdr (see Section 2.5.2.5), which are able to suggest unused addresses. Alternatively, senders may use global scope multicast addresses (GLOP) [RFC2770] or multicast address-set claim (MASC) [RFC2909] for obtaining such an address. Finally, there is a range of multicast addresses whose scope is limited by a hierarchical scheme rather than by the somewhat crude TTL mechanism (explained in Section 2.5.2.4). These are called administratively scoped [RFC2365] addresses; i.e., a large company or institution may limit the set of multicast routers that may receive the sent traffic to their own subnets, but not beyond.

2.5.2.3 Local Multicast

Hosts wanting to receive multicast data must first join the respective group that will receive the data. If the multicast is restricted to a specific LAN, then a receiver at least needs to implement the Internet Group Management Protocol (see Section 2.5.6). It must provide the functions JoinHostGroup(group-address, interface) and LeaveHostGroup(group-address, interface) for its IP service interfaces [RFC1112].
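On hosts with a BSD-style socket interface, the two service functions are commonly reached through the socket options `IP_ADD_MEMBERSHIP` and `IP_DROP_MEMBERSHIP`. The following is a minimal sketch under that assumption; the Python function names mirror the [RFC1112] names but are otherwise illustrative, and `open_receiver` is a hypothetical helper that is defined here but not called.

```python
import socket
import struct

def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte request: group address followed by interface address."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))

def join_host_group(sock: socket.socket, group: str,
                    interface: str = "0.0.0.0") -> None:
    """Ask the IP layer to join the group on the given interface."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface))

def leave_host_group(sock: socket.socket, group: str,
                     interface: str = "0.0.0.0") -> None:
    """Ask the IP layer to leave the group again."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    membership_request(group, interface))

def open_receiver(group: str, port: int) -> socket.socket:
    """Hypothetical helper: UDP socket bound to a port and joined to a group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    join_host_group(sock, group)
    return sock
```

The interface address 0.0.0.0 leaves the choice of the network interface to the operating system; passing a local IP address selects a specific interface.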
With IGMP, the host joins a group at the IP level and informs its local multicast router that it wishes to receive data sent to this group. The two interface functions instruct each network interface card that it should either join or leave a multicast group at the data link layer (ISO/OSI layer 2).

When sending a multicast packet in a LAN, it is advisable to use the existing multicasting capabilities of the LAN's data link layer technology, which are often available in addition to unicast and broadcast. This means that inside a LAN, multicast data should be handled by layer 2 only, rather than layer 3. For instance, a multicast IP address (4 bytes) can be mapped to an IEEE 802 (e.g., Ethernet, FDDI, etc.) MAC layer multicast address (6 bytes). For this purpose, the IANA [IANA] has been assigned the IEEE 802 MAC address block from 01-00-5E-00-00-00 to 01-00-5E-7F-FF-FF for the sole use of IP multicasting. For mapping the IP multicast address to the corresponding MAC multicast address, the least significant 23 bits of the IP multicast address are placed into the low-order 23 bits of the IANA MAC multicasting base address 01-00-5E-00-00-00. As an IP class D address (32 bits) starts with 4 fixed bits (see Table 2.1), leaving 28 bits free to choose, 5 bits of an IP multicast address are ignored in this mapping. Consequently, 2^5 = 32 IP multicast addresses are always mapped to the same MAC multicast address.

The procedure for the transmission of multicast traffic within the same LAN is simple. The sender sends the data to a specific IP multicast address AI, which is mapped to the corresponding MAC multicast address AM, and the destination MAC address of each sent frame is set to AM. If a network interface card is instructed to receive multicast traffic sent to the IP multicast address AI (via a call to JoinHostGroup), the IP multicast address is again mapped to the same MAC multicast address AM. Once the network interface card detects a frame having this multicast MAC address AM as the destination address, it accepts the frame and passes it on to layer 3.
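The mapping from an IP multicast address to its MAC multicast address is plain bit arithmetic and can be sketched as follows. The function name `multicast_mac` is an assumption; the example also shows two different group addresses that collide on the same MAC address because their upper 5 free bits are ignored.

```python
import ipaddress

MAC_BASE = 0x01005E000000          # IANA base address 01-00-5E-00-00-00

def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast address to its IEEE 802 multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF   # keep only the least significant 23 bits
    mac = MAC_BASE | low23
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))

print(multicast_mac("224.0.0.1"))      # 01-00-5E-00-00-01
print(multicast_mac("224.128.0.1"))    # same MAC address as 224.0.0.1
```

Because of such collisions, a network interface card may pass on frames for a group the host has not joined; the IP layer then has to discard them by comparing the full 32-bit group address.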

A call to LeaveHostGroup deletes this association at the receiver. From then on, received multicast frames sent to the group will be ignored.

2.5.2.4 Multicast Routing

If multicast packets should be received outside their own LAN, things become more complicated. Whether packets should be sent beyond their own LAN via the local multicast router is in principle determined by the TTL field of the sent packet. As in unicast, this field is decremented by one by each router the packet passes. Once the field reaches zero, the packet is dropped. This automatically prevents packets from circulating through the Net forever due to incorrect routing tables and also provides scoping, i.e., a way of defining how far the sent packets may travel. For instance, if a packet should be received only by hosts attached to the same LAN (and nowhere else), the TTL must be set to 1; if packets should be received only by hosts situated on the same continent as the sender, the TTL must be set to 48. Other values for the TTL limit the scope to certain areas centered around the sender (Table 2.3).

TABLE 2.3 Connection between TTL and Scope

TTL      Scope
128      Low-speed tunnels
64       Intercontinental
48       International (within the continent)
16–32    National (depending on the links involved)
1–16     Within institution

If the TTL is greater than one, then the local multicast router must forward the packet to each multicast router it is connected to. On the MBone this means the packet is sent over each tunnel going out of the local multicast router. As the sender does not know who the other members of the multicast group (i.e., the receivers) are, each multicast packet would have to be sent to all multicast routers of the MBone (i.e., flooding the whole network) in order to make sure that all group members get the sent data.
However, this would lead to a drastic overload of the multicast network. Therefore, routing protocols are used that minimize the traffic and yet ensure that each member of the multicast group receives each packet that is sent to the group, for instance, the distance-vector multicast routing protocol (DVMRP) [RFC1075, PUS2003], multicast extensions to open shortest path first (MOSPF) [RFC1584], or protocol-independent multicast (PIM) [ADA2003, RFC2362].

2.5.2.5 Multicast Applications

Several tools have been developed for creating, managing, and receiving multicast traffic over the MBone. For initializing and joining multicast sessions, the tools Session Directory (sd or sdr) or Multikit can be used. Sdr shows multicast programs currently being sent or scheduled for the future. It can also be used for obtaining an unused multicast address and announcing a multicast session to be scheduled for the future. When sessions are joined, sdr will launch the appropriate tools for presenting the program. These can be video tools like vic (video conferencing) or nv (network video), or audio tools like vat (visual audio tool) or rat (robust audio tool). Telephony is done via Free Phone (fphone), and a whiteboard application is given by wb. Other examples of multicast tools include text tools like the Network Text Editor (nt) and a polling tool (mpoll).

2.5.3 IPv6

The Internet Protocol version 6 (IPv6) has been designed to replace the old IPv4 in the next-generation Internet [RFC1883, RFC1887]. It represents a totally new approach and is incompatible with version 4. As most Internet hosts and routers still only support IPv4, IP packets following IPv6 often cannot be transported from sender to receiver without further modification. Usually, when leaving the IPv6 subnetwork of the sender, IPv6 packets are tunneled over IPv4, i.e., transported in IPv4 packets, where the whole IPv6 packet is treated as pure IPv4 data.


The header has been simplified and contains only 7 fixed fields (the IPv4 header includes 13):

• A version field containing the value 6.
• A priority field distinguishing between data and real-time traffic.
• A flow label for supporting pseudo end-to-end connections with guaranteed QoS.
• The payload length specifies the size of the data contained in the packet.
• The next header points at the next optional header or holds an ID for the used transport protocol (TCP or UDP).
• The hop limit is decreased by each passed-by router; a packet with zero hop limit is discarded. This prevents faulty packets from circling through the network forever.
• Finally, the 16-byte source and destination addresses are contained.

IPv6 offers the following enhancements with respect to IPv4:

• Addresses are 16 bytes long, written in groups of four hexadecimal digits separated by colons (e.g., 8000:0000:1111:2222:3333:4444:ABCD:EFFF). This solves the shortage of IPv4 addresses caused by the exponential growth of the Internet. Even when wasting a lot of such addresses due to the inefficient use of network addresses, thousands of IP addresses could be assigned to each square meter of the Earth's surface.
• New address classes exist, including addresses for Internet service providers and geographical regions.
• Due to the simpler header, routing is made more efficient. Additionally, IPv6 supports an arbitrary list of options that may be skipped by routers that do not support them.
• IPv6 supports authentication and encryption.
• IPv6 supports QoS for real-time applications.

Of course, multicasting is also an intrinsic capability of IPv6 but will not be treated here. For more information, see [RFC2373] and [RFC2460]. Even though IPv6 offers substantial advantages, its implementation is costly and requires buying new routers and reconfiguring existing hosts. For these reasons, IPv4 still is the Internet Protocol today, and IPv6 will not dominate the Internet until the year 2010 or even later.
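The address notation and the size of the address space can be explored with the standard `ipaddress` module. In this sketch the Earth's surface area of about 5.1e14 square meters is an assumed round figure, and the result is the theoretical upper bound without any waste.

```python
import ipaddress

addr = ipaddress.IPv6Address("8000:0000:1111:2222:3333:4444:ABCD:EFFF")

size_bytes = len(addr.packed)   # binary form of the address: 16 bytes
full = addr.exploded            # eight groups of four hexadecimal digits
short = addr.compressed         # leading zeros of each group removed

EARTH_SURFACE_M2 = 5.1e14       # assumption: approximate surface area in m^2
per_square_meter = 2 ** 128 / EARTH_SURFACE_M2

print(size_bytes, full, short)
print(f"about {per_square_meter:.1e} addresses per square meter")
```

Even if only a tiny fraction of the space is usable in practice, the theoretical density is many orders of magnitude above the "thousands per square meter" quoted for the pessimistic case.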

2.5.4 Address Resolution Protocol

The Address Resolution Protocol (ARP) defined in [RFC826] and its complement, the Reverse Address Resolution Protocol (RARP) defined in [RFC903], are a means for connecting OSI layer 2 addresses to their corresponding layer 3 IP addresses. Basically, computers communicate with each other by sending messages on the data link layer (and subsequently the physical layer), for instance, by sending an Ethernet frame over an Ethernet variant. On this level all network cards following IEEE 802 are identified by globally unique 6-byte-long identifiers called MAC addresses. In order to successfully send an Ethernet frame, each sending network card must put both its own MAC address and the MAC address of the receiving card into the Ethernet frame. If many computers are connected by a single layer 2 network (possibly via hubs, bridges, or switches), senders often know only the IP address of a receiver. However, for Ethernet cards, IP addresses are meaningless. In such situations, ARP can be used to find out the MAC address of the network card that at a higher layer is bound to a given IP address.

If computer A wants to find out the MAC address of a network card on computer B, which according to its IP address belongs to the same layer 2 subnet, then ARP is automatically activated on computer A. At first, computer A looks into a small ARP cache to find out if the desired binding is already stored there. If not, computer A generates an ARP request message (who is B.B.B.B tell A.A.A.A, where B.B.B.B is the IP address of computer B and A.A.A.A the IP address of computer A), which is no more than a special Ethernet frame containing the following information:

• Ethernet protocol type is set to 0x0806.
• Sender MAC address.
• Sender IP address.
• Receiver MAC address is set to the Ethernet broadcast address FF:FF:FF:FF:FF:FF.
• Receiver IP address.

As the receiver in this Ethernet frame is the broadcast address, all network cards connected to the same subnet will receive this request, including computer B. Upon the reception of the ARP request message, computer B will then activate its own ARP, which will immediately send an ARP response message (B.B.B.B is HH:HH:HH:HH:HH:HH, where B.B.B.B is the IP address and HH:HH:HH:HH:HH:HH is the MAC address of the network card of computer B). Once the ARP response message has been received by computer A, computer A will store this IP-MAC address binding for computer B in its ARP cache and may start sending Ethernet frames to computer B. In order to avoid outdated ARP caches, these caches are periodically emptied.

The purpose of RARP is to let computers find out their IP addresses upon start-up, in case they only know their MAC addresses. This can be the case, for example, for diskless workstations, which automatically attach to a server, or for workstations with identical disk images (which do not require manual setup). RARP works in a manner similar to that of ARP, except that the protocol type value is set to 0x8035. Also, a RARP server (for instance, a router) is required that holds a table with the MAC-IP bindings. Alternatives to RARP are given by the Bootstrap Protocol (BOOTP) or the Dynamic Host Configuration Protocol (DHCP), which allow the resolution of IP addresses in a more flexible way.
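The ARP request frame described in this section can be assembled byte by byte. The sketch below follows the field layout of [RFC826] for Ethernet and IPv4; the function names and example addresses are assumptions, and the target hardware address inside the ARP payload is left as zeros because it is still unknown to the sender.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")

def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying 'who is target_ip tell sender_ip'."""
    ethernet_header = (BROADCAST_MAC               # destination: all stations
                       + mac_bytes(sender_mac)     # source MAC address
                       + struct.pack("!H", 0x0806))  # protocol type: ARP
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                      # hardware type: Ethernet
        0x0800,                 # protocol type: IPv4
        6, 4,                   # MAC and IP address lengths in bytes
        1,                      # operation: 1 = request (2 = response)
        mac_bytes(sender_mac), socket.inet_aton(sender_ip),
        bytes(6),               # target MAC unknown, assumed zero-filled
        socket.inet_aton(target_ip))
    return ethernet_header + arp_payload

frame = arp_request("02:00:00:00:00:0a", "10.0.0.1", "10.0.0.2")
print(len(frame), frame.hex())
```

Sending such a frame requires raw access to the network interface, which is operating system specific and therefore not shown here.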

2.5.5 Internet Control Message Protocol

The Internet Control Message Protocol (ICMP), defined in [RFC792] and [RFC1122], is used for automatically sending control signals and commands between computers attached to an IP network. Also, ICMP messages can be used for testing connections and measuring interconnection performance. ICMP messages are sent as special IP packets and thus can be handled by routers. As a consequence, ICMP messages can be sent to or received from arbitrary computers connected with each other over an IP network. An ICMP message contains the following data:

• The type defines the purpose of the ICMP packet. There are over 30 different ICMP types.
• The code further defines the packet's purpose.
• A header checksum.
• The rest of the packet may then contain further data depending on the ICMP type.

The most important ICMP packet types are:

Echo Request: When receiving such an ICMP packet, the receiver should answer with an ICMP Echo Reply packet.
Echo Reply: Answer to an ICMP Echo Request packet.
Time Stamp Request: The same as Echo Request, except that the receiver answers with a Time Stamp Reply packet, which holds additional time stamps.
Time Stamp Reply: The answer to an ICMP Time Stamp Request, which holds the time points at which the Time Stamp Request was received and the Time Stamp Reply was sent back.
Destination Unreachable: This message is returned by a router to the source host to inform it that the destination of a previously sent packet cannot be reached.
Time Exceeded: Sent from a router to a source host to inform it that the lifetime of a previously sent packet has reached zero.
Parameter Problem: Sent to a source host to inform it that a previously sent packet contains invalid header data.
Source Quench: Sent to a source host to inform it that due to insufficient bandwidth, it should lower its sending bit rate.
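As an illustration of the message layout, the following sketch builds an ICMP Echo Request (type 8, code 0) including its checksum. The identifier and sequence number fields are the type-specific data of echo messages as defined in [RFC792]; the function names and example values are assumptions.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad to an even number of bytes
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo Request: type 8, code 0, checksum, id, sequence."""
    unchecked = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence) + payload
    checksum = internet_checksum(unchecked)   # computed with checksum field 0
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

packet = echo_request(identifier=0x1234, sequence=1, payload=b"ping")
# A receiver validates the message by checking that the checksum over the
# complete message evaluates to zero.
print(packet.hex(), internet_checksum(packet) == 0)
```

An Echo Reply uses the same layout with type 0 and returns identifier, sequence number, and payload unchanged, which lets the sender match replies to requests and measure round-trip times.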


2.5.6 Internet Group Management Protocol

The Internet Group Management Protocol (IGMP) is defined in [RFC2236] and is used by IP hosts to inform multicast routers in their LAN about their multicast group memberships (see Section 2.5.2.4). IGMP messages are encapsulated into IP datagrams with protocol number 2. The goal is to ensure that the multicast router knows whenever a host in its multicast island joins or leaves a multicast group. As a necessary prerequisite, all hosts wishing to receive multicast traffic must join the local LAN all-hosts group with multicast IP address 224.0.0.1.

Periodically, multicast routers send a Host Membership Query message to the all-hosts group of their attached LANs. Upon receiving this message, each host answers by reporting those host groups it is a member of by sending appropriate Host Membership Report messages (one for each group). In principle, the multicast router is interested only in whether, for a specific group A, there are members in the LAN. Thus, even if several hosts are members of the same group A, it is sufficient that only one membership report for A reaches the router. In order to minimize the number of membership reports sent, each member of A first waits a random amount of time before sending a membership report for A. Then, if no membership report of some other member of A has been received, the host sends its own membership report to the group address of A, thus reaching the multicast router (which receives all multicast traffic) and all other members of A (which in turn suppress their own membership reports for A).

In addition to the above scheme, if a host newly joins a multicast group, it sends a Host Membership Report to the multicast router without waiting for a query, thus being able to receive the respective traffic immediately in case it is the first member of this group in the LAN. To cover the possibility of lost reports, this is done at least twice.
An IGMP version 2 packet has the following format:

• The type field defines the type of the message.
• The maximum response time is meaningful only on Membership Query messages and defines the maximum allowed time before sending a report message (unit is 1/10 second).
• The checksum is computed over the whole IP payload.
• The group address field contains the respective multicast address.

There are various types of IGMP messages:

• Host Membership Query
• Group Specific Query
• Version 1 and Version 2 Membership Report
• Leave Group

Whenever a host leaves a group, it may send a Leave Group message to the all-routers multicast group (224.0.0.2) to inform all routers of the LAN that there are possibly no more members of this group present. If the leaving host was the one that actually answered the last Membership Query for this group, then it should send this message. Upon receiving a Leave Group message, a router sends one or more Group Specific Query messages to the group that the host has left, to verify whether any members of this particular group remain.
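The report suppression scheme used when answering a Membership Query can be illustrated with a small simulation. This is a simplified sketch: it assumes that a report is heard by all other group members instantly, it expresses the maximum response time in seconds rather than in units of 1/10 second, and all names are hypothetical.

```python
import random

def answer_query(members, max_response_time=10.0, rng=None):
    """Simulate one Membership Query round.

    members maps each host name to the set of groups it has joined.
    Returns the reports actually sent as (delay, host, group) tuples.
    """
    rng = rng or random.Random()
    # Every host starts one random timer per group it belongs to.
    timers = [(rng.uniform(0.0, max_response_time), host, group)
              for host, groups in members.items()
              for group in groups]
    reported = set()
    sent = []
    for delay, host, group in sorted(timers):  # timers expire in time order
        if group in reported:
            continue        # another member already reported: suppress
        reported.add(group)
        sent.append((delay, host, group))
    return sent

lan = {"h1": {"224.1.1.1", "224.2.2.2"},
       "h2": {"224.1.1.1"},
       "h3": {"224.1.1.1", "224.2.2.2"}}
for delay, host, group in answer_query(lan, rng=random.Random(1)):
    print(f"{delay:5.2f} s  {host} reports {group}")
```

Although five memberships exist in this example LAN, only one report per group reaches the router, which is all the router needs to keep forwarding traffic for that group.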

2.6 Transport Layer Protocols and Services

2.6.1 Transmission Control Protocol

The Transmission Control Protocol (TCP) operates at OSI layer 4 (in the TCP/IP reference model at the transport layer) on top of IP and is assigned the IP protocol number 6. It constitutes the most important Internet protocol and is defined in [RFC793], [RFC1122], and [RFC1323]. The purpose of TCP is twofold:

• To guarantee the correct delivery of packets sent over an intrinsically unreliable packet-oriented IP network.
• To control the output bit rate of each sender in order to minimize packet losses due to congested routers or receivers.

TCP operates in a connection-oriented, full-duplex manner. Applications using TCP may assume that a TCP connection opened from a source host to a receiver host is like a reliable pipeline or byte stream. Data (arbitrary bytes) put into this pipeline are guaranteed to come out at the receiver without losses and in correct order (Figure 2.9).

FIGURE 2.9 Full-duplex TCP connection. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)

In order to guarantee this correctness, TCP divides the data to send into so-called segments, which are themselves sent in IP packets. In principle, IP packets can hold up to 65,535 bytes. However, in order to avoid fragmentation, the size of TCP segments is in practice limited by the network's MTU. Each segment starts with a TCP header, which is at least 20 bytes long, but may hold additional options. The rest of the segment may hold user data, but may also be empty. The TCP header contains the following information:

• Source port and destination port.
• A sequence number identifying each sent byte. It wraps back to zero once the highest number has been used.
• An acknowledgment number denoting the number of the next expected byte. This field only contains valid data if the ACK bit is set.
• The data offset holding the size of the TCP header.
• Explicit congestion notification (ECN) and control bits, including URG, ACK, PSH, RST, SYN, and FIN.
• Sender receive window size.
• A header checksum.
• A pointer to urgent data (valid only if the URG flag is set) and optional TCP header options.

In order to create a TCP connection between two applications X and Y running on computers A and B, both applications first must get a port number, an identifier between 0 and 65,535, which can be assigned only once on each computer. The application X initiating the connection then must provide its own port number, the IP address of computer B, and the port number of the partner application Y to TCP.
TCP then sends a segment to the given IP address and port number, where the SYN flag is set to 1 and ACK is set to 0, and a random sequence number x is chosen. If application Y is waiting at the given port, the TCP on computer B answers with a segment in which the SYN and ACK bits are set, the sequence number of side B is set to a random number y, and the acknowledgment number is set to x + 1. Upon reception of this second segment, the TCP on computer A sends a third segment, in which the ACK flag is set, the sequence number is set to x + 1, and the acknowledgment number is set to y + 1. As three segments must be sent for establishing a TCP connection, this process is called three-way handshake (see Figure 2.10).

FIGURE 2.10 TCP three-way handshake. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)

After the establishment of the connection, each side may send arbitrary bytes to the other side. If one side wants to terminate the connection, a segment with the FIN flag set must be sent. Otherwise, if, for instance, application X sends data to Y, then the data are put into one or more TCP segments, which are then sent via IP to computer B. Due to the sequence numbers of each segment, the TCP layer at B is able to detect missing segments or the out-of-order delivery of segments. For each correctly received segment, B must send an acknowledgment segment back to A, where the acknowledgment number identifies the number of the next expected byte. The TCP on computer A, on the other hand, starts a so-called retransmission timer for each sent segment. If no acknowledgment has been received within a certain amount of time, computer A assumes that the segment was lost and has to be sent again.

TCP also maintains two so-called sliding windows in order to control the transmission bit rate of each sender. One window simply tells each sender how many bytes the receiver may currently receive without risking a buffer overflow (flow control). This information is transmitted in each ACK segment in the receive window size field. The second window is called the congestion window (CWND). Here, each sender additionally restricts the number of bytes it may send without acknowledgment to the congestion window size. Initially, the window size is set to one packet (i.e., the maximum allowed segment size), a strategy that is called slow start. For each acknowledged byte, TCP increases the size of its congestion window, at first with exponential speed, but after reaching a certain threshold h, only with linear speed. If a timeout of the retransmission timer occurs, h is set to half of the current congestion window size and the congestion window is reset to one packet.

Instead of waiting for the retransmission timer to time out, a strategy called fast retransmit enables receivers to send duplicate ACKs to the sender in case out-of-order segments are received. A sender receiving a few (typically three) such duplicate ACKs may deduce that an intermediate segment has been lost rather than that the segments have just been reordered on the way, and may retransmit the missing segment earlier [RFC1122].
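The growth of the congestion window can be traced with a strongly simplified model. The sketch below counts the window in whole segments and advances once per round-trip time, whereas real TCP implementations work on bytes and on individual acknowledgments. Following common TCP practice, it sets the threshold to half of the window size at the moment of a timeout. All names are assumptions.

```python
def congestion_window_trace(rounds, threshold, timeouts=()):
    """Return the congestion window (in segments) for each round trip.

    rounds    -- number of round-trip times to simulate
    threshold -- initial slow-start threshold h (in segments)
    timeouts  -- round indices in which the retransmission timer expires
    """
    cwnd = 1                                  # slow start begins with 1 segment
    trace = []
    for rnd in range(rounds):
        trace.append(cwnd)
        if rnd in timeouts:
            threshold = max(cwnd // 2, 1)     # halve relative to current window
            cwnd = 1                          # restart with slow start
        elif cwnd < threshold:
            cwnd = min(cwnd * 2, threshold)   # exponential growth phase
        else:
            cwnd += 1                         # linear growth phase
    return trace

print(congestion_window_trace(rounds=10, threshold=8, timeouts={6}))
# [1, 2, 4, 8, 9, 10, 11, 1, 2, 4]
```

The trace shows the exponential phase up to the threshold, the linear phase beyond it, and the restart after the timeout in round 6 with a new threshold of 5 segments.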

2.6.2 User Datagram Protocol

The User Datagram Protocol (UDP) is the second important Internet protocol at OSI layer 4 (in the TCP/IP reference model at the transport layer) [RFC768, RFC1122] and is assigned the IP protocol number 17. It is meant for transporting application data in a message-oriented, unreliable manner from one application to another. As most functionality is already provided by IP, the UDP header only contains the port numbers of the source and receiver applications, the length of the UDP packet, and a checksum.

As UDP does not provide any functionality for detecting lost packets or out-of-order delivery, it is mostly used either in local networks with large bandwidths and reliable layer 2 transport, or for transporting multimedia data like live broadcasts, where a few lost packets will not seriously decrease the perceived quality of the presentation. In any case, detection of lost packets or out-of-order delivery must be carried out by the receiving applications, usually by including sequence numbers in the UDP application data. The interpretation of these numbers is left solely to the applications.
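How an application might embed and interpret such sequence numbers is sketched below. The 4-byte header layout, the class name `SequenceChecker`, and the handling of late packets are assumptions of this example; counter wraparound is ignored for brevity, and the datagrams are passed in directly instead of being read from a socket.

```python
import struct

HEADER = struct.Struct("!I")        # assumed layout: 32-bit sequence number

def make_datagram(sequence: int, payload: bytes) -> bytes:
    """Prefix the application payload with its sequence number."""
    return HEADER.pack(sequence) + payload

class SequenceChecker:
    """Receiver-side bookkeeping of lost and late datagrams."""

    def __init__(self):
        self.expected = 0           # next sequence number in order
        self.lost = []              # numbers skipped so far
        self.late = []              # numbers that arrived out of order

    def receive(self, datagram: bytes) -> bytes:
        (sequence,) = HEADER.unpack_from(datagram)
        if sequence >= self.expected:
            # Everything between expected and sequence is missing for now.
            self.lost.extend(range(self.expected, sequence))
            self.expected = sequence + 1
        else:
            # An older datagram arrived after all: out-of-order delivery.
            self.late.append(sequence)
            if sequence in self.lost:
                self.lost.remove(sequence)
        return datagram[HEADER.size:]

checker = SequenceChecker()
for seq in (0, 1, 3, 2, 5):         # 2 arrives late, 4 never arrives
    checker.receive(make_datagram(seq, b"sample"))
print("lost:", checker.lost, "late:", checker.late)
```

A multimedia receiver would typically conceal the gap left by a lost datagram, while an application that needs every message would request a retransmission at this point.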

FIGURE 2.11 RESV and PATH messages in a multicast tree. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)

2.6.3 Resource Reservation Protocol

IPv4 does not contain mechanisms for guaranteeing a minimum quality of service (QoS) for its traffic, for instance, a minimum sustainable end-to-end bit rate or a maximum end-to-end delay or jitter (delay variation). This may severely affect the presentation quality of real-time transmissions, using, for example, the Real-Time Transport Protocol (RTP) (see application layer protocols). The Resource Reservation Protocol (RSVP) tries to fill this gap by providing means for guaranteeing certain quality of service parameters [RFC2205, RFC2750]. It is an optional add-on for Internet routers and clients using IP (IPv4 and IPv6), and is currently available on a small subset of Internet hosts only. Being at the same level as TCP or UDP, it has its own IP protocol number (46). RSVP is not a routing protocol itself, but rather a signaling protocol. It cooperates with routing protocols for controlling efficient unicast and multicast over IP.

RSVP allows two different QoS modes. In the controlled load service, RSVP simulates a lightly loaded network for its clients, although the network itself may be overloaded [RFC2211]. Although no hard QoS parameters are met, a lightly loaded network is likely to be sufficient for many load-tolerant and adaptive applications like audio/video streaming. In contrast, the guaranteed service guarantees that the RSVP path will meet the agreed QoS level at all times [RFC2212].

A client application wishing to receive a multicast multimedia stream passes this request to its local RSVP daemon. This daemon then sends a reservation (RESV) request to adjacent RSVP routers toward the multimedia source along the reverse multicast tree path. The RESV request contains a description of the desired quality of service in a so-called flow descriptor. Coming from the other side, the multicast source periodically sends PATH messages down the multicast tree. PATH messages create and acknowledge valid and active multicasting paths (Figure 2.11).
Also, they carry information about the quality of service of the path from source to receiver. RSVP routers may merge different QoS requests into one single reservation, here choosing the maximum of each request as the prereserved QoS level. During runtime, reservations may be changed to other QoS levels. Also, RSVP paths must be acknowledged periodically by PATH and RESV messages, but RSVP is fault tolerant with respect to a few missing messages. Only if none have been received for a certain time is the whole path cancelled. On each RSVP router, an RSVP daemon manages and controls the IP routing process. It consists of the following modules: An incoming QoS reservation request is approved or denied by the admission control, depending on whether the QoS request can be satisfied. The rights for making reservations are checked in the policy control module. Incoming data packets are sorted by the packet classifier, which puts them into different queues. Finally, the packet scheduler is responsible for granting the agreed QoS to the packets in the routing queues; packets belonging to the same queue are treated identically.
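The merging of reservations and the admission decision can be sketched in a few lines. The `FlowSpec` fields below (a bandwidth and a delay bound) are a deliberately reduced, hypothetical stand-in for a real RSVP flow descriptor. Merging takes the most demanding value of each parameter, which is the largest bandwidth and, as an assumption of this sketch, the smallest delay bound.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowSpec:
    bandwidth_kbps: int     # requested minimum bit rate
    max_delay_ms: int       # requested upper bound on delay

def merge(requests):
    """Merge several downstream requests into one upstream reservation."""
    return FlowSpec(
        bandwidth_kbps=max(r.bandwidth_kbps for r in requests),
        max_delay_ms=min(r.max_delay_ms for r in requests))

def admit(request: FlowSpec, free_kbps: int) -> bool:
    """Admission control: accept only if the link still has enough capacity."""
    return request.bandwidth_kbps <= free_kbps

merged = merge([FlowSpec(500, 100), FlowSpec(800, 50), FlowSpec(300, 200)])
print(merged, admit(merged, free_kbps=1000))
```

Because the merged reservation satisfies the most demanding receiver, all other receivers below the same router are served by it as well, and only one reservation has to be forwarded toward the source.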

2.7 Presentation Layer Protocols and Services

Applications may send arbitrary data to others, often embedding complex data structures into their messages. In this process, the data structures have to be transformed (flattened, marshalled) into a sequence of bytes, containing the data as well as information about the data representation used. The receiver must be able to understand the structure of the byte sequence and how to interpret the individual bytes in order to reconstruct the sent data structures. This is achieved by the presentation layer (layer 6 of the ISO/OSI model).

The presentation layer ensures that two computer systems may successfully communicate even if they use different data representations. Due to different data representation schemes, the presentation layer often is forced to translate sent or received messages. This, however, should be done in a manner totally transparent to the OSI application layer above.

Problems may arise, for instance, because of the CPU byte order. In 32-bit architectures, CPUs store values and addresses using 32 bits, stored in four consecutive bytes. In Intel processors, for example, the least significant byte is stored first and the most significant byte last. This is called little endian. Motorola processors, on the other hand, store a 4-byte value in the reverse order, called big endian. If an Intel-based computer sends a 32-bit value to a Motorola-based computer without further corrective measures, the receiver totally misinterprets the received value. This may be prevented, for instance, by forcing the sender to convert the data to the receiver’s format before sending, or alternatively forcing the receiver to convert the data from the sender’s format after receiving. A third approach is to agree on a common format and to convert to this format before sending or from this format after receiving. TCP/IP, for instance, defines a common network byte order (big endian). In the C programming language, for example, 32-bit values may be converted to and from this format by the functions htonl() and ntohl(). Another system using an external data format is the external data representation (XDR), as specified by Sun [SUN1990].

An additional problem arising between different computers is the code interpretation. For instance, characters may be stored using one of the following codes: ASCII (common in Intel compatibles, 7 bits/character, usually stored in one 8-bit byte), EBCDIC (used on IBM mainframes, 8 bits/character), or Unicode (code points of up to 21 bits, stored in encodings such as UTF-8, UTF-16, or UTF-32). Here, the presentation layer is responsible for automatically translating between the various code schemes.

At the next-higher decoding level, received complex data structures should be reconstructed (unmarshalled) from their flattened byte sequence representation. For inhomogeneous data, the data structures must be described by metadata, for instance, defining the data types belonging to each structure, followed by the data values themselves. This can be achieved, for instance, by using the standardized Abstract Syntax Notation 1 (ASN.1) [X680].

Other tasks of the presentation layer include the encryption of messages and supporting authentication. Finally, the presentation layer may also be responsible for the compression of data.
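
The byte-order problem can be made concrete with a short sketch using Python's standard struct and socket modules. The value 0x0A0B0C0D is an arbitrary example chosen here; the sketch only illustrates the idea of a common network byte order and is not taken from the original text.

```python
import socket
import struct

value = 0x0A0B0C0D  # arbitrary 32-bit example value

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
network = struct.pack("!I", value)  # network byte order

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d

# Network byte order is big endian.
assert network == big

# A big-endian receiver reading little-endian bytes without conversion
# reconstructs a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0xd0c0b0a

# Converting to network order and back restores the value on any host.
assert socket.ntohl(socket.htonl(value)) == value
```

Agreeing on one external format means that each host needs only two conversions (to and from the common format), regardless of how many different architectures take part in the communication.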

2.8 Application Layer Protocols and Services

In both the ISO/OSI scheme and the TCP/IP reference model, the application layer defines protocols directly to be used by applications for exchanging data with each other. These include, for instance, authentication, distributed databases and file systems, file transport, data syntax restrictions, coordination and agreement procedures, quality of service issues, e-mail, and terminal emulation.

Many standard protocols are already specified by the Internet Engineering Task Force (IETF). They define standard data structures that are to be exchanged between applications. Applications following these protocols are guaranteed to be able to successfully interact with other applications over the Internet, even if these applications have been created by different sources. For instance, Web browsers following HTTP may download Web pages from any Web server connected to the Internet.

The IETF-specified protocols usually use TCP for reliable transport and UDP for the transport of real-time multimedia data (although real-time multimedia data may also be sent over TCP). Usually, both control commands and pure data can be transmitted over the same TCP connection. For signaling the end of a data transmission, one of three approaches is used. In octet stuffing, the end of a data transmission is signaled by a certain byte sequence (similar to the bit stuffing used at the data link layer). If the transported data also contain this very sequence, the sequence is changed (escaped) into another sequence. The receiver must detect such a change and undo it. An example for octet stuffing is SMTP. In octet counting, transported messages contain special headers that specify the number of data bytes to be transferred. This concept is used, for instance, in HTTP. Finally, in connection blasting, the end of a transmission is signaled by closing the TCP connection. This is used, for instance, in FTP.

2.8.1 TELNET

The TELNET protocol is meant for providing a general 8-bit interface for the communication between users, hosts, and processes [RFC854]. Generally, a TELNET client running on computer A opens a TCP connection to port 23 of a TELNET server on computer B. Both sides then emulate a certain simple type of terminal called network virtual terminal (NVT), but may negotiate additional services after the connection has been established. An NVT is a bidirectional character device consisting of a printer that shows the information received from the other side, and a keyboard where keystrokes are produced and sent to the other side.

TELNET defines a set of commands that may be sent in-band with the stream of data. The mechanism used here is octet stuffing. Byte 255 is called interpret as command (IAC) and signals that the following byte specifies a TELNET command, for example, for sending an interrupt to the running process or for erasing the last character. If a data byte with value 255 is to be sent, then two bytes with value 255 are sent. Upon receiving two consecutive bytes with value 255, the receiver must automatically remove one of them.
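
The IAC escaping rule can be sketched as follows. This is a simplified illustration: it treats every command as exactly one byte following IAC, whereas option negotiation commands in real TELNET carry an additional option byte. The function names are made up for this example.

```python
IAC = 255  # "interpret as command" byte
IP = 244   # TELNET "interrupt process" command code


def escape(data: bytes) -> bytes:
    """Sender side: double every data byte with value 255."""
    return data.replace(b"\xff", b"\xff\xff")


def unescape(stream: bytes):
    """Receiver side: separate data bytes from one-byte commands."""
    data = bytearray()
    commands = []
    i = 0
    while i < len(stream):
        if stream[i] != IAC:
            data.append(stream[i])
            i += 1
        elif i + 1 >= len(stream):
            break  # lone IAC at the end: wait for more input
        elif stream[i + 1] == IAC:
            data.append(IAC)  # doubled 255 is a single data byte
            i += 2
        else:
            commands.append(stream[i + 1])
            i += 2
    return bytes(data), commands


wire = escape(b"a\xffb") + bytes([IAC, IP])
print(wire)
print(unescape(wire))
```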

2.8.2 File Transfer Protocol

The File Transfer Protocol (FTP) is used for transporting arbitrary binary data from one Internet host to another [RFC959]. On computer A, an FTP client is started with the IP or DNS address of the Internet computer B with which communication is desired. The FTP client then opens a TCP connection to port 21 of computer B, representing the control connection. The control connection uses the TELNET protocol underneath, and users may send control commands to the FTP server on computer B, including the request for showing the contents of the current directory at computer B (LIST), as well as changing this current directory to another one (CWD), creating new directories (MKD), etc. Additionally, the user may start uploads (STOR) or downloads (RETR) of files to and from the current directory. Upon the reception of a control command over the control connection, the server answers with a reply, sending status or error information to the client. One has to distinguish between the FTP control commands that are actually sent over the control channel and the commands that users type into a command line application, which may be different.

Once data are to be sent, a TCP data connection is opened by the server on computer B from port 20 to a port on computer A on which the client is listening and which it has announced over the control connection (PORT command). Then, depending on the specified direction, the data are sent either from A to B or vice versa. After transmitting the last byte, the sender must close the data connection, indicating to the other side that the transmission has ended.

It is worth noting that FTP knows different transmission modes. In the binary mode, the data are sent without modification. In the ASCII mode, FTP automatically converts between different character representation codes, for instance, when sending a pure text file from an IBM mainframe (using EBCDIC) to a PC (using ASCII), or when exchanging data between different operating systems like Microsoft Windows and Unix or Unix-like operating systems (having different end-of-line representations in text files).
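
The conversions performed in ASCII mode can be sketched as below. The sketch assumes that text travels over the wire as ASCII with CR LF line endings, as RFC 959 specifies for the ASCII type; the function names are hypothetical, and Python's cp037 codec is used only as one example of an EBCDIC code page.

```python
def local_to_wire(text: str, local_eol: str = "\n") -> bytes:
    """Sender: replace the local end-of-line marker by CR LF."""
    return text.replace(local_eol, "\r\n").encode("ascii")


def wire_to_local(data: bytes, local_eol: str = "\n") -> str:
    """Receiver: replace CR LF by the local end-of-line marker."""
    return data.decode("ascii").replace("\r\n", local_eol)


unix_text = "first line\nsecond line\n"
wire = local_to_wire(unix_text)                   # sent by a Unix host
windows_text = wire_to_local(wire, local_eol="\r\n")
print(repr(windows_text))

# Character code translation: the same letter has different byte values.
print("A".encode("ascii"), "A".encode("cp037"))
```

In binary mode none of these conversions takes place, which is why transferring an executable or an image in ASCII mode can corrupt it.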

2.8.3 Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is available as version 1.0 [RFC1945] and version 1.1 [RFC2616]. Its purpose is to manage the download of documents that are part of the World Wide Web (WWW), usually following the Hypertext Markup Language (HTML) [RFC1866]. Most Web browsers and servers nowadays understand HTTP/1.0, although [RFC1945] is not a standard but rather an informational guideline. Newer Web clients and servers also support the standardized HTTP/1.1.

HTTP is a client/server-based protocol following the octet-counting approach. A client wishing to download a specific document from a Web server opens a TCP connection to the server port 80 (sometimes 8080). The client then sends a request, containing a request line, various headers, an empty line, and an optional body. The request line specifies what the client wants the server to do. For example, a request line “GET /dir1/dir2/the_document.html HTTP/1.1” informs the server that the client wants to download the document “the_document.html,” which is situated in the directory “/dir1/dir2,” by using HTTP/1.1. Clients may also send data to the server, for example, a form that has been filled out by a user. This can be done using the POST method.

The server then answers by sending a status line containing a code for success or an error description, various headers describing the downloaded document (e.g., its size or the time stamp of its last change), followed by an empty line. Finally, in the message body, the HTML document itself is transported to the client.

HTTP/1.1 overcomes several limitations of HTTP/1.0. For example, an HTML document may contain several other subdocuments, like photos, background images, frames, etc. In HTTP/1.0, a new TCP connection has to be created for each subdocument. In HTTP/1.1, all subdocuments can be transported over the same persistent TCP connection.
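
The message layout and the octet-counting approach can be sketched without any network access. The helper names and the host name www.example.com are made up for this illustration, and the parser assumes that the response carries a Content-Length header; it ignores other ways of delimiting a body.

```python
def build_request(path: str, host: str) -> bytes:
    """Request line, headers, and the empty line that ends the header block."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n").encode("ascii")


def parse_response(raw: bytes):
    """Split a response into status line, headers, and counted body."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers["content-length"])  # octet counting
    return lines[0], headers, rest[:length]


request = build_request("/dir1/dir2/the_document.html", "www.example.com")
print(request.decode("ascii"))

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<html></html>bytes of the next response")
status, headers, body = parse_response(raw)
print(status, body)
```

Because the body length is announced in advance, the receiver knows where one response ends and the next begins, which is what allows several documents to share one persistent connection.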

2.8.4 Simple Mail Transfer Protocol

The Simple Mail Transfer Protocol (SMTP) defines the exchange and relay of text mails over TCP/IP [RFC821]. If a mail client running on computer A wants to send mail to a receiver on computer B, it opens a TCP connection to port 25 of either computer B or an intermediate mail server that is able to pass on the mail to the receiver on computer B. Then the sender client sends SMTP commands to the receiver, which replies by sending SMTP responses.

Once the sender wants to send an electronic mail, it sends the command MAIL with an identifier for the sender. If the receiver is willing to accept mail from the sender, it answers with an OK reply. Now, the sender client sends a sequence of recipient (RCPT) commands, which identify the receivers of the mail. Each recipient is acknowledged individually by an OK reply. Once all receivers have been specified, the client sends a DATA command followed by the mail data itself. In order to indicate the end of the mail, the client sends a line containing only a period. If a line of the message itself begins with a period, the sender inserts an additional period at the beginning of that line, which is removed by the receiver automatically (octet stuffing).

In SMTP, (text) mail must be composed of 7-bit ASCII characters only (byte values 0 to 127), a limitation that was not severe in 1982 when SMTP was designed. Nowadays, electronic mail often contains multimedia attachments like audio or video files, where each byte may contain any value between 0 and 255. In order to be able to transport binary data over SMTP, these data are usually transformed into a sequence of 7-bit ASCII characters by using a byte-to-character mapping like Base64 or uuencode. Upon receiving such a transformed character sequence, the receiver must apply the inverse of the transform in order to retrieve the original binary data.
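
The end-of-mail marker and the period stuffing can be sketched as follows, using the transparency rule of RFC 821, which applies to every line that begins with a period. The function names are hypothetical, and the Base64 part only shows that arbitrary bytes can be mapped to 7-bit characters and back.

```python
import base64


def dot_stuff(lines):
    """Sender: prefix lines starting with '.', then append the end marker."""
    out = ["." + line if line.startswith(".") else line for line in lines]
    out.append(".")  # a line containing only a period ends the mail
    return "\r\n".join(out) + "\r\n"


def dot_unstuff(wire):
    """Receiver: stop at the end marker and remove the inserted periods."""
    lines = []
    for line in wire.split("\r\n"):
        if line == ".":
            break
        lines.append(line[1:] if line.startswith(".") else line)
    return lines


body = ["Hello", ".", "..hidden", "Bye"]
wire = dot_stuff(body)
print(repr(wire))
print(dot_unstuff(wire))

# Binary attachment data mapped to 7-bit ASCII characters and back.
binary = bytes([0, 255, 128, 10])
encoded = base64.b64encode(binary)
print(encoded)
```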

2.8.5 Resource Location Protocol

Computers connected over an IP network may offer a variety of services to others, including services standardized by the IETF like DNS, SMTP, FTP, etc., as well as self-created services, for instance, for managing personal information. The Resource Location Protocol (RLP) has been designed to enable arbitrary computers to automatically find other computers that provide specific services [RFC887]. For this purpose, RLP defines a set of request messages that may be sent by the searching computer.

RLP uses UDP as a transport protocol. A request message is sent to the UDP port 39 of another host and contains a question and a description of one or more services that are looked for. Depending on the question, hosts that provide the service or know of others that do answer by sending a reply message. RLP defines the following request messages:

• Who Provides? is usually broadcast into a LAN. Hosts providing one of the described services may answer; hosts that do not provide any of the specified services do not answer.
• Do You Provide? is directly sent to some specific host. It may not be broadcast. A host receiving this message must answer, regardless of whether it provides any of the specified services.
• Who Anywhere Provides? also is usually broadcast into a LAN. Hosts either providing any service or knowing other hosts that do so may answer.
• Does Anyone Provide? is sent to a specific host, which must send back an answer, regardless of whether it knows of any host providing any of the services.

There are two possible answers. The I Provide reply contains a (possibly empty) list of services that are supported by the answering host. The They Provide reply contains a (possibly empty) list of supported services, qualified by a list of host IP addresses supporting them.

An RLP message contains the following fields:

• The type field defines the question or reply type.
• The flag local only specifies whether only hosts with the same IP network address should answer or be included in the answer list.
• A message ID enables the mapping of received answers to previously sent requests.
• Finally, the resource list contains a description of the looked-for or provided services and supporting hosts.

Resources and services may be described by several fields. The first description byte specifies the IP number of the IP transport protocol the service uses, for instance, 6 for TCP or 17 for UDP. The next byte defines the port that is usually used by the service, for instance, 23 for TELNET or 25 for SMTP. Additional bytes may then define arbitrary self-created services.

2.8.6 Real-Time Transport Protocol

The Real-Time Transport Protocol (RTP) has been designed for carrying real-time multimedia data like audio or video information [RFC3550]. Multimedia data usually are produced as a continuous stream of bits. For this stream to be carried over the network, it must be packetized and sent as a sequence of packets to one (unicast) or several (multicast) receivers. For real-time traffic, UDP is preferred over TCP, as the retransmission of lost packets (which is mandatory for TCP) delays all subsequent data and may cause the presentation to stall, which is undesirable, for example, for video conferences. Instead, in case of (a few) lost packets, small artifacts may be visible or audible, which are less annoying than a complete connection breakdown or stall. At the receiver, the original sequence of RTP packets and its content are restored, and lost packets are identified.

Pure RTP does not know anything about the payload content. Instead, RTP headers may be altered to fit the needs of specific applications like audio and video conferences. Such changes are then defined in so-called profile specifications. Additionally, different RTP payload formats may be defined in payload format specifications, as, for instance, is given by [RFC2190] for H.263. The RTP specification defines the following header fields:

• The RTP version (1 or 2)
• Padding and header extension flags
• Contributing sources (CSRC) count, i.e., length of the CSRC list
• A marker flag M to be used freely by profiles
• Payload type (PT), must be interpreted by the application
• Sequence number, increased by one for each new RTP packet
• Time stamp of the sampling of the first RTP payload byte
• RTP synchronization source identifier, which must be unique for concurrent RTP sessions
• An optional contributing sources list (CSRC list)
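
The header fields listed above can be packed into and recovered from bytes as sketched below. The bit layout follows the fixed 12-byte header of RFC 3550 (2-bit version, padding and extension flags, 4-bit CSRC count, marker bit, 7-bit payload type, 16-bit sequence number, 32-bit time stamp, 32-bit synchronization source identifier). The function names and all field values are arbitrary illustrations, and header extensions are not handled.

```python
import struct


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False, csrc=()):
    """Build an RTP header (version 2, no padding, no header extension)."""
    version, padding, extension = 2, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | len(csrc)
    byte1 = (int(marker) << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrc)


def unpack_rtp_header(data: bytes) -> dict:
    """Recover the header fields from the first bytes of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    csrc_count = byte0 & 0x0F
    csrc = [struct.unpack_from("!I", data, 12 + 4 * i)[0]
            for i in range(csrc_count)]
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
    }


header = pack_rtp_header(payload_type=34, seq=1000, timestamp=90000,
                         ssrc=0x11223344, marker=True, csrc=(1, 2))
fields = unpack_rtp_header(header)
print(len(header), fields)
```

A receiver can use the sequence number to restore the packet order and to detect losses, and the time stamp to schedule playback.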

As RTP is transported over the best-effort protocols TCP/UDP/IP, no guarantee can be made that a required bit rate is available for the real-time transport. Instead, RTP provides a means for measuring and controlling the output bit rate and perceived quality of service of a real-time stream. This procedure is provided by the RTP Control Protocol (RTCP). RTCP can carry the following information:

• In a sender report, statistics for each active sender are sent to the receivers.
• In a receiver report, receivers (which are not senders) send reception statistics to the active senders.
• Sender attributes like e-mail addresses, etc. (source description).
• The request for leaving the presentation.
• Application-specific control information.

For each real-time session (transporting exactly one medium like audio or video), each participant needs two ports — one for RTP and one for RTCP. RTP is able to multiplex several sessions into one. This is done by a so-called mixer. For example, the audio data of several participants of an audio conference may be mixed into one single audio stream and sent over a connection with low bandwidth. Here, the mixer would act as a new synchronization source; the IDs of the original sources, however, may then be stored additionally after the RTP header in the list of contributing sources. Another RTP entity is a translator, which is able to change payload content or tunnel packets through a firewall.

2.9 Summary

The TCP/IP suite consists of numerous protocols covering several layers of the ISO/OSI stack or, alternatively, the TCP/IP reference model. Starting at OSI layer 2, protocols are defined for link-level services for frame transport. At OSI layer 3, IP provides the unreliable delivery of datagrams from one host connected to the Internet to another. At OSI layer 4, transport protocols provide either reliable and controlled or unreliable transport of data from process to process. Further services may alter the data due to different presentation schemes or offer direct support to applications.

References

[ADA2003] A. Adams, J. Nicholas, W. Siadak, Protocol Independent Multicast–Dense Mode (PIM-DM): Protocol Specification (Revised), IETF Internet Draft, 2003, http://www.ietf.org/Internet-drafts/draft-ietf-pim-dm-new-v2-04.txt.
[ALB2004] Z. Alb, et al., IANA Guidelines for IPv4 Multicast Address Assignments, IETF Internet Draft, 2004, http://www.ietf.org/Internet-drafts/draft-ietf-mboned-rfc3171bis-01.txt.
[COL2001] G. Coulouris, J. Dollimore, T. Kindberg, Distributed Systems, 3rd edition, Addison-Wesley, Boston, 2001.
[HAL1996] F. Halsall, Data Communications, Computer Networks and Open Systems, 4th edition, Addison-Wesley, Reading, MA, 1996.
[IANA] Internet Assigned Numbers Authority, http://www.iana.org/.
[IANAM] Internet Assigned Numbers Authority, Internet Multicast Addresses, http://www.iana.org/assignments/multicast-addresses.
[ISO7498] ISO/IEC 7498-1, Information Technology–Open Systems Interconnection: Basic Model, ISO, 1994.
[KUR2001] J.F. Kurose, K.W. Ross, Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Reading, MA, 2001.
[PET2000] L.L. Peterson, B.S. Davie, Computer Networks: A Systems Approach, 2nd edition, Morgan Kaufmann, San Francisco, 2000.
[PUS2003] T. Pusateri, Distance Vector Multicast Routing Protocol, Version 3, IETF Internet Draft, 2003, http://www.ietf.org/Internet-drafts/draft-ietf-idmr-dvmrp-v3-11.txt.
[REI2001] U. Reimers, Digital Video Broadcasting, Springer, Berlin, 2001.
[RFC768] RFC 768, User Datagram Protocol, IETF, 1980, http://www.ietf.org/rfc/rfc0768.txt.
[RFC791] RFC 791, Internet Protocol: DARPA Internet Program Protocol Specification, DARPA, 1981, http://www.ietf.org/rfc/rfc791.txt.


[RFC792] RFC 792, Internet Control Message Protocol, DARPA, 1981, http://www.ietf.org/rfc/rfc792.txt.
[RFC793] RFC 793, Transmission Control Protocol, DARPA, 1981, http://www.ietf.org/rfc/rfc793.txt.
[RFC821] RFC 821, Simple Mail Transfer Protocol, IETF, 1982, http://www.ietf.org/rfc/rfc821.txt.
[RFC826] RFC 826, An Ethernet Address Resolution Protocol, IETF, 1982, http://www.ietf.org/rfc/rfc826.txt.
[RFC854] RFC 854, Telnet Protocol Specification, IETF, 1983, http://www.ietf.org/rfc/rfc854.txt.
[RFC887] RFC 887, Resource Location Protocol, IETF, 1983, http://www.ietf.org/rfc/rfc887.txt.
[RFC903] RFC 903, A Reverse Address Resolution Protocol, IETF, 1984, http://www.ietf.org/rfc/rfc903.txt.
[RFC959] RFC 959, File Transfer Protocol (FTP), IETF, 1985, http://www.ietf.org/rfc/rfc959.txt.
[RFC977] RFC 977, Network News Transfer Protocol: A Proposed Standard for the Stream-Based Transmission of News, IETF, 1986, http://www.ietf.org/rfc/rfc977.txt.
[RFC1034] RFC 1034, Domain Names: Concepts and Facilities, IETF, 1987, http://www.ietf.org/rfc/rfc1034.txt.
[RFC1035] RFC 1035, Domain Names: Implementation and Specification, IETF, 1987, http://www.ietf.org/rfc/rfc1035.txt.
[RFC1075] RFC 1075, Distance Vector Multicast Routing Protocol, IETF, 1988, http://www.ietf.org/rfc/rfc1075.txt.
[RFC1112] RFC 1112, Host Extensions for IP Multicasting, IETF, 1989, http://www.ietf.org/rfc/rfc1112.txt.
[RFC1122] RFC 1122, Requirements for Internet Hosts: Communication Layers, IETF, 1989, http://www.ietf.org/rfc/rfc1122.txt.
[RFC1157] RFC 1157, A Simple Network Management Protocol (SNMP), IETF, 1990, http://www.ietf.org/rfc/rfc1157.txt.
[RFC1323] RFC 1323, TCP Extensions for High Performance, IETF, 1992, http://www.ietf.org/rfc/rfc1323.txt.
[RFC1584] RFC 1584, Multicast Extensions to OSPF, IETF, 1994, http://www.ietf.org/rfc/rfc1584.txt.
[RFC1853] RFC 1853, IP in IP Tunneling, IETF, 1995, http://www.ietf.org/rfc/rfc1853.txt.
[RFC1866] RFC 1866, Hypertext Markup Language: 2.0, IETF, 1995, http://www.ietf.org/rfc/rfc1866.txt.
[RFC1883] RFC 1883, Internet Protocol, Version 6 (IPv6) Specification, IETF, 1995, http://www.ietf.org/rfc/rfc1883.txt.
[RFC1887] RFC 1887, An Architecture for IPv6 Unicast Address Allocation, IETF, 1995, http://www.ietf.org/rfc/rfc1887.txt.
[RFC1918] RFC 1918, Address Allocation for Private Internets, IETF, 1996, http://www.ietf.org/rfc/rfc1918.txt.
[RFC1945] RFC 1945, Hypertext Transfer Protocol: HTTP/1.0, IETF, 1996, http://www.ietf.org/rfc/rfc1945.txt.
[RFC2190] RFC 2190, RTP Payload Format for H.263 Video Streams, IETF, 1997, http://www.ietf.org/rfc/rfc2190.txt.
[RFC2205] RFC 2205, Resource ReSerVation Protocol (RSVP), IETF, 1997, http://www.ietf.org/rfc/rfc2205.txt.
[RFC2211] RFC 2211, Specification of the Controlled-Load Network Element Service, IETF, 1997, http://www.ietf.org/rfc/rfc2211.txt.
[RFC2212] RFC 2212, Specification of Guaranteed Quality of Service, IETF, 1997, http://www.ietf.org/rfc/rfc2212.txt.
[RFC2236] RFC 2236, Internet Group Management Protocol, Version 2, IETF, 1997, http://www.ietf.org/rfc/rfc2236.txt.
[RFC2362] RFC 2362, Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification, IETF, 1998, http://www.ietf.org/rfc/rfc2362.txt.
[RFC2365] RFC 2365, Administratively Scoped IP Multicast, IETF, 1998, http://www.ietf.org/rfc/rfc2365.txt.


[RFC2373] RFC 2373, IP Version 6 Addressing Architecture, IETF, 1998, http://www.ietf.org/rfc/rfc2373.txt.
[RFC2460] RFC 2460, Internet Protocol, Version 6 (IPv6) Specification, IETF, 1998, http://www.ietf.org/rfc/rfc2460.txt.
[RFC2616] RFC 2616, Hypertext Transfer Protocol: HTTP/1.1, IETF, 1999, http://www.ietf.org/rfc/rfc2616.txt.
[RFC2750] RFC 2750, RSVP Extensions for Policy Control, IETF, 2000, http://www.ietf.org/rfc/rfc2750.txt.
[RFC2770] RFC 2770, GLOP Addressing in 233/8, IETF, 2000, http://www.ietf.org/rfc/rfc2770.txt.
[RFC2784] RFC 2784, Generic Routing Encapsulation (GRE), IETF, 2000, http://www.ietf.org/rfc/rfc2784.txt.
[RFC2909] RFC 2909, The Multicast Address-Set Claim (MASC) Protocol, IETF, 2000, http://www.ietf.org/rfc/rfc2909.txt.
[RFC3550] RFC 3550, RTP: A Transport Protocol for Real-Time Applications, IETF, 2003, http://www.ietf.org/rfc/rfc3550.txt.
[RFC3376] RFC 3376, Internet Group Management Protocol Version 3, IETF, 2002, http://www.ietf.org/rfc/rfc3376.txt.
[SCO1991] T. Socolofsky, C. Kale, A TCP/IP Tutorial, IETF, Network Working Group, RFC 1180, January 1991, http://www.ietf.org/rfc/rfc1180.txt.
[SUN1990] Network Programming, Sun Microsystems, Inc., Mountain View, CA, March 1990.
[TAN1996] A.S. Tanenbaum, Computer Networks, 3rd edition, Prentice Hall, 1996.
[X680] Information Technology: Abstract Syntax Notation One (ASN.1): Specification of Basic Notation, ITU-T Recommendation X.680 (1997), ISO/IEC 8824-1, 1998.


3 A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues

Lucia Lo Bello
University of Catania

3.1 Introduction
3.2 Routing and Routers
3.3 Routing Algorithm Design Issues
Optimality • Convergence • Scalability • Robustness • Flexibility and Stability • Simplicity
3.4 Classification of Routing Protocols
Static or Dynamic • Global or Decentralized • Link State or Distance Vector • Single Path or Multipath • Flat and Hierarchical • Intra-AS and Inter-AS • Unicast and Multicast
3.5 IP Unicast Routing: Interior and Exterior Gateway Protocols
Interior Gateway Protocols for IP Networks • Exterior Gateway Protocols for IP Internetworks
3.6 IP Multicast Routing
Distance-Vector Multicast Routing Protocol • Multicast OSPF • Protocol-Independent Multicast • Core-Based Tree • Interdomain IP Multicast Routing
3.7 IP Addressing and Routing Issues
Classful IP Addressing • Impact of IP Addressing on Routing Tables and Internet Scalability • Subnetting • Variable-Length Subnet Masks • Classless Interdomain Routing
3.8 IPv6 Overview
IPv6 Addressing, Subnetting, and Routing • IPv6 Deployment in the Current Internet: State of the Art and Migration Issues
3.9 Conclusions
References

3.1 Introduction

This chapter addresses routing from a broad perspective. After an introduction on routing algorithm principles, an overview of the routing protocols currently used in the Internet domain is presented. Both unicast and multicast routing are dealt with. The chapter then focuses on the strict correlation between Internet Protocol (IP) routing and addressing. The impact of the traditional classful addressing scheme on the size of routing tables for Internet routers and its poor scalability in today’s Internet are discussed. Then, classless interdomain routing (CIDR), which, for the time being, has solved the problems previously mentioned, is presented and discussed. Finally, the next-generation IP, which represents the long-term solution to the problems of the current Internet, is introduced. IP version 6 (IPv6) is outlined, and issues on the IPv4-to-IPv6 transition are addressed.

3.2 Routing and Routers

Two or more networks joined together form an internetwork, where network layer routing protocols implement path determination and packet switching. Path determination consists of choosing which path (or route) the packets are to follow from a source to a destination node, while packet switching refers to transporting them. Path determination is accomplished by routing algorithms that, given a set of routers and links connecting them, determine the best (i.e., least-cost) path from source to destination, according to a given cost metric.

A router is a specialized network computing device, similar to a computer, but optimized for packet switching. It typically contains memory (ROM, RAM, Flash) and some kind of bus and is equipped with an operating system (OS), a configuration, and a user interface. As happens inside a computer, in a router a boot process loads bootstrap code from the ROM, thus enabling the device to load its operating system and configuration into the memory. A significant difference between a router and a computer lies in the user interface and memory configuration. While DOS or UNIX systems typically have one physical bank of memory chips that will be allocated by the software to different functions, routers feature several distinct banks of memory, each dedicated to a different function. In many routers, OSs are stripped-down schedulers derived from early versions of FreeBSD (Berkeley Software Distribution) UNIX. A growing interest in the Linux OS has recently appeared. Some vendors run proprietary OSs on their routers (for example, Cisco routers run the Internetwork Operating System (IOS), which embeds a broad set of functions).

A router’s task is to switch IP packets between interconnected networks. In order to allow the calculation of the best path for individual packets, routing protocols enable routers to communicate with each other, exchanging both topology information (e.g., about neighbors and routes) and state information (e.g., costs), which are fed into routing tables. A routing table consists of a list of routing entries indicating which outgoing link should be used to forward packets to a given destination. Figure 3.1 shows a simplified routing table. When a router receives an incoming packet, it checks the routing table to find a destination/next-hop association for the destination address specified in the packet. The routing table data structure contains all the information necessary to forward an IP data packet toward its destination. When forwarding an IP data packet, the routing table entry providing the best match for the packet’s IP destination is chosen.

Destination Network   Next Router       # of Hops to Destination   Interface
205.219.0.0           205.219.5.2                                  Ethernet 0
151.5.0.0             160.4.2.5         5                          Ethernet 0
Default               193.55.114.128    2                          Ethernet 1

FIGURE 3.1 A simplified routing table.
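
The best-match lookup can be sketched with the entries of Figure 3.1. The prefix lengths (/16) are an assumption made for this illustration, since the figure lists only network addresses, and the default route is modeled as 0.0.0.0/0, which matches every destination.

```python
import ipaddress

# (destination network, next router, interface); prefix lengths are assumed.
ROUTES = [
    (ipaddress.ip_network("205.219.0.0/16"), "205.219.5.2", "Ethernet 0"),
    (ipaddress.ip_network("151.5.0.0/16"), "160.4.2.5", "Ethernet 0"),
    (ipaddress.ip_network("0.0.0.0/0"), "193.55.114.128", "Ethernet 1"),
]


def lookup(destination: str):
    """Return the matching entry with the longest prefix (best match)."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES if address in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen)


for destination in ("151.5.7.9", "205.219.7.1", "8.8.8.8"):
    network, next_router, interface = lookup(destination)
    print(destination, "->", next_router, "via", interface)
```

A linear scan is enough for three entries; routers with large tables use specialized data structures for the same longest-prefix decision.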


3.3 Routing Algorithm Design Issues

While designing routing algorithms, several issues have to be addressed. The main ones are listed and discussed below.

3.3.1 Optimality

Optimality is the ability to find the optimal (i.e., least-cost) path from source to destination according to a given metric. A single metric may be used, or several metrics may be combined into a single hybrid one. The most common routing metric is path length, usually expressed in terms of hop count, i.e., the number of hops that a packet must make on its path from a source to a destination. Alternatively, a network administrator may assign arbitrary costs (expressed as integer values) to each network link and calculate the path length as the sum of the costs associated with each link traversed. These costs account for several link features, such as:

• Bandwidth.
• Routing delay: The length of time required to move a packet from source to destination through the internetwork.
• Load: The degree of utilization of a router, obtained by monitoring variables such as CPU utilization or packets routed per second.
• Reliability: Accounts for the link’s fault probability or recovery time in the event of failure.
• Monetary cost: Companies may prefer to send packets over their own lines, even though slower, rather than through faster, but expensive external lines that charge money for usage time.

Different routing protocols generally adopt different metrics and algorithms that are not compatible with each other. As a result, in a network where multiple routing protocols are present, a way to determine the best path across the multiple protocols has to be found. Each routing protocol is therefore labeled with an integer value that defines the trustworthiness of the protocol, called the administrative distance. When two different routing protocols supply different routes to the same destination, routers select the route supplied by the protocol with the lowest administrative distance.
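
The notion of a least-cost path as the sum of link costs can be illustrated with Dijkstra's algorithm, used here as one well-known way of computing such paths; the text above does not prescribe a particular algorithm. The four-router topology, the link costs, and the administrative distance values are made-up examples.

```python
import heapq


def least_cost_paths(graph, source):
    """Dijkstra's algorithm: cost and predecessor for every reachable node."""
    cost = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        current_cost, node = heapq.heappop(heap)
        if current_cost > cost.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path is already known
        for neighbor, link_cost in graph[node].items():
            new_cost = current_cost + link_cost
            if new_cost < cost.get(neighbor, float("inf")):
                cost[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return cost, previous


def path_to(previous, source, destination):
    """Walk the predecessor links back from destination to source."""
    hops = [destination]
    while hops[-1] != source:
        hops.append(previous[hops[-1]])
    return hops[::-1]


# Administrator-assigned link costs (illustrative values).
graph = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 2},
    "C": {"A": 5, "B": 2, "D": 1},
    "D": {"C": 1},
}
cost, previous = least_cost_paths(graph, "A")
print(cost["D"], path_to(previous, "A", "D"))

# Two protocols offer a route to the same destination; the route from the
# protocol with the lowest administrative distance (illustrative values) wins.
candidates = [("RIP", 120, "10.0.0.1"), ("OSPF", 110, "10.0.0.2")]
best = min(candidates, key=lambda candidate: candidate[1])
print(best)
```

With hop count as the metric, the direct link from A to C would be preferred; with the assigned costs, the two-hop path through B is cheaper, which shows how the choice of metric changes the result.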

3.3.2 Convergence When a router detects a topology change (e.g., new routes being added, existing routes changing state, etc.), this information must be propagated through the network and a new routing topology calculated. Routers achieve this by distributing routing update messages to the other routers, thus stimulating recalculation of optimal routes and eventually causing all routers to agree on these routes. The time taken to detect changes in the network topology, reconfigure the topology correctly, and agree, called convergence time, is a very important characteristic of routing algorithms. Slow convergence should be avoided, as it may entail network interruption or routing loops. These occur when, due to slow convergence, a packet arriving at a router A or B and destined for a router C bounces back and forth between these two routers until either convergence is reached or the packet has been switched the maximum number of times allowed. Convergence time may depend either on the network topology and size (e.g., number of routers, link speeds, routing delays) or on the routing protocol used and the setting of the relevant timing parameters.
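The routing loop described above can be reproduced with a small sketch. The function `forward` and the two next-hop tables are hypothetical; `max_hops` stands in for the maximum number of times a packet may be switched.

```python
def forward(next_hop, start, destination, max_hops=16):
    """Follow per-router next-hop entries and return the routers visited.

    Forwarding stops at the destination or after max_hops forwarding steps.
    """
    visited = [start]
    current = start
    while current != destination and len(visited) <= max_hops:
        current = next_hop[current][destination]
        visited.append(current)
    return visited


# Before convergence: A believes B leads to C, while B believes A does.
stale = {"A": {"C": "B"}, "B": {"C": "A"}}
# After convergence: both routers agree that A reaches C directly.
converged = {"A": {"C": "C"}, "B": {"C": "A"}}

print(forward(stale, "A", "C", max_hops=4))
print(forward(converged, "B", "C"))
```

With the stale tables the packet alternates between A and B until the hop limit is exhausted; with the converged tables it reaches C in two hops.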

3.3.3 Scalability Routing algorithms that behave well in small systems should also scale well in larger internetworks. Unfortunately, some routing algorithms (such as those based on heavy flooding techniques), while performing well in small networks, are not suitable for use in large-size ones.


3.3.4 Robustness Routing algorithms should perform correctly even in the presence of unusual or unforeseen events (e.g., router failures, misbehavior, sabotage).

3.3.5 Flexibility and Stability When responding to network changes (e.g., in bandwidth, router queue size, network delay), routing algorithms should exhibit flexible, but stable behavior.

3.3.6 Simplicity In order to reduce the overhead on routers (in terms of both processing and storage), routing algorithms should be as simple as possible. Moreover, they have to exploit system resources efficiently (especially when executing on resource-constrained hosts).

3.4 Classification of Routing Protocols Routing protocols may be classified according to several different characteristics [Cisco03][Kenyon02][Kurose01]. Here we will address the most relevant. As will be seen, routing protocols may be static or dynamic, global or decentralized, link state or distance vector, single path or multipath, flat or hierarchical, intra-AS or inter-AS, unicast or multicast.

3.4.1 Static or Dynamic Static algorithms are based on fixed tables. Static routes seldom change, and when changes do occur, it is usually a result of human intervention (i.e., editing a router’s forwarding table). Static routing algorithms are simple, introduce a low overhead, and are suitable for environments where network traffic is stable and predictable. Static routing is commonly adopted where there is no need for an alternative path (for example, in permanent point-to-point wide area network (WAN) links to remote sites or dial-up Integrated Services Digital Network (ISDN) lines). Dynamic routing algorithms automatically generate the routing paths in response to changes in network traffic or topology. When a change occurs, the routing algorithm running on a router recalculates routes, reflects changes in the routing table, and then propagates updates throughout the network, thus stimulating recalculation in the other routers as well. Dynamic algorithms sometimes have static routes inserted in their routing tables. This is the case, for instance, of default routers, to which all traffic should be forwarded when the destination address is unknown (i.e., not explicitly listed in the routing table). The last entry in Figure 3.1 is an example of such a default router.
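A minimal sketch of a static table with a default entry follows. The table contents, the addresses, and the function `lookup` are hypothetical, and the most-specific-prefix rule used to choose among matching entries is an assumption of this sketch, not something stated in this section.

```python
import ipaddress

# Hypothetical static forwarding table: (prefix, next hop). The last entry,
# 0.0.0.0/0, matches every address and so plays the role of the default route.
TABLE = [
    (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.1"),
    (ipaddress.ip_network("10.1.2.0/24"), "192.0.2.2"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.254"),
]


def lookup(address, table=TABLE):
    """Return the next hop of the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    matches = [(net, hop) for net, hop in table if addr in net]
    _, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop


print(lookup("10.1.2.7"))      # covered by an explicit entry
print(lookup("198.51.100.5"))  # not listed, so the default entry applies
```

Because the default entry matches everything, a destination that is not explicitly listed still yields a next hop.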

3.4.2 Global or Decentralized A global algorithm makes the routing decision on the basis of complete information about the network, in terms of connectivity and link costs. The calculation of the best path can be performed at a single site or replicated over multiple ones. In a decentralized routing algorithm no site has complete knowledge of the network, and route calculation is iterative and distributed. Each node only knows the status of the links directly connected to it. This information is then distributed to its neighbors, i.e., nodes directly connected to it, and this iterative process of route calculations and exchanges enables a node to determine the least-cost path to a destination.


3.4.3 Link State or Distance Vector Link-state algorithms (also called shortest-path-first algorithms) compute the least-cost path using complete, global knowledge of the network in terms of connectivity and link costs. Each router maintains a complete copy of the topology database in its routing table and floods routing information to all the nodes in the internetwork. At the beginning, each router will only know about its neighbors, but it will increase its knowledge through link-state broadcasts received from all the other routers. The router does not send the entire routing table, but only the portion of the table that describes the state of its own links. Distance-vector algorithms (also known as minimum-hop or Bellman–Ford algorithms) require each router to keep track of its distance (hop count) from all other possible destinations. Each node receives information from its directly connected neighbors, calculates the routing table, and then distributes the results back to the neighbors. When a change is detected in the link cost from a node to a neighbor, the router first updates its distance table and then, if the change also affects the cost of the least-cost path, notifies its neighbors. Distance-vector algorithms are distributed and iterative, as the routing table distribution process goes on until no more exchanges with neighbors occur. Routers can identify new destinations as they come into the network, learn of failures in the network, and calculate distances to all known destinations. Each router advertises on a regular basis all the destinations it is aware of with the relevant distances and sends update messages containing all the information maintained in the routing table to neighboring routers on directly connected segments. Each router can therefore build a detailed picture of the network topology by analyzing routing updates from all other routers. 
The best route for each destination is determined according to a minimum-distance (minimum-hop) rule. Table 3.1 compares link-state and distance-vector routing algorithms. As all the routers share the same knowledge of the network in link-state algorithms, they have a consistent view of the best path to a given destination. This would entail a sudden change in the load on the least-cost link, and even congestion if all the routers decided to send their packets through that link at the same time.

TABLE 3.1 Link-State vs. Distance-Vector Routing Algorithms

| Feature | Link State | Distance Vector |
|---|---|---|
| Type | Global | Decentralized |
| Route advertising | “Tell the world about the neighbors”; i.e., a router does not send the entire routing table, but only the portion of the table that describes the state of its own links | “Tell all neighbors about the world”; i.e., routers tend to distribute the entire routing table (or large portions of it) to their directly attached neighbors only |
| Robustness | More robust, as each router autonomously calculates its routing table | An incorrect node calculation can be spread over the entire network |
| Convergence | Fast | Slow (routing loops may occur) |
| Scalability | Good | Poor |
| Responsiveness to network changes | High | “Good news propagates fast, bad news propagates slowly”; i.e., a decrease in the cost of the best path propagates fast, while an increase goes slowly |
| Message overhead | High: any change in a link cost entails the need to send all nodes the new cost | Low: when link costs change, the results of the change will be propagated only if the latter entails a change in the least-cost path for one of the nodes attached to that link |
| Implementation complexity | High | Low |
| Processor and memory requirements | High | Low |
| Stability | Problematic, due to oscillation in routes | Good |
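The iterative, distributed nature of the distance-vector computation compared in Table 3.1 can be sketched as follows. This is a simplified synchronous simulation under stated assumptions: all nodes exchange vectors in rounds, link costs are positive and symmetric, and the function `distance_vector` and the example topology are hypothetical.

```python
INF = float("inf")


def distance_vector(links):
    """Repeat neighbor exchanges until no distance table changes.

    links maps each node to {neighbor: link cost}. The result maps each
    node to {destination: (cost, next hop)}.
    """
    # Each node starts by knowing only itself and its directly attached links.
    table = {n: {n: (0, n)} for n in links}
    for n, nbrs in links.items():
        for m, cost in nbrs.items():
            table[n][m] = (cost, m)
    changed = True
    while changed:
        changed = False
        # Snapshot of the vectors every node advertises in this round.
        adverts = {n: {d: c for d, (c, _) in t.items()} for n, t in table.items()}
        for n, nbrs in links.items():
            for m, link_cost in nbrs.items():
                for dest, cost in adverts[m].items():
                    new_cost = link_cost + cost
                    if new_cost < table[n].get(dest, (INF, None))[0]:
                        table[n][dest] = (new_cost, m)
                        changed = True
    return table


links = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 2},
    "C": {"A": 5, "B": 2},
}
tables = distance_vector(links)
print(tables["A"]["C"])
```

In this topology node A ends up reaching C through B at cost 3 rather than over the direct link of cost 5, and the loop stops once a full round produces no change.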


A way to avoid such oscillations would be to ensure that the routers do not all run the algorithm at the same time. However, it has been noted that routers on the Internet can self-synchronize. Even if they initially execute the routing algorithm with the same period but at different times, the execution instants will eventually become synchronized across the routers [Floyd97]. To deal with this problem, randomization is introduced into the period between the execution instants of the algorithm at each router.
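The randomization mentioned above can be sketched as a jittered timer. The function name, the 30-second base period, and the jitter fraction are illustrative assumptions, not values prescribed by the text.

```python
import random


def next_update_delay(period=30.0, jitter=0.5, rng=random):
    """Return the delay until the next run of the routing algorithm.

    The delay is drawn uniformly from period * (1 - jitter) to
    period * (1 + jitter), so routers drift apart instead of locking step.
    """
    return period + rng.uniform(-jitter * period, jitter * period)


rng = random.Random(1)  # seeded only to make the sketch repeatable
delays = [next_update_delay(30.0, 0.5, rng) for _ in range(5)]
print([round(d, 1) for d in delays])
```

Each router would draw a fresh delay after every execution, so two routers that happen to run at the same instant are unlikely to do so again on the next cycle.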

3.4.4 Single Path or Multipath Routing protocols may be single path or multipath. The difference lies in the fact that multipath algorithms support multiple entries for the same destination in the routing table, while single path ones do not. The presence of alternative routes in multipath routing protocols allows traffic to be multiplexed over several circuits, whether local area network (LAN) or WAN, thus providing not only greater throughput and topological robustness, but also support for load balancing (i.e., splitting traffic between paths that have equal costs). In multipath algorithms, multiplexing may be packet based or session based. In the former case, a round-robin technique is typically used. In the latter case, load sharing is performed on a session basis, typically using a source–destination or destination hash function.
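The two multiplexing styles can be contrasted in a short sketch. The path names and both functions are hypothetical, and CRC-32 is used here only as a convenient deterministic hash; the text does not prescribe a particular hash function.

```python
import itertools
import zlib

PATHS = ["path-1", "path-2", "path-3"]  # hypothetical equal-cost paths


def round_robin(paths):
    """Packet based: consecutive packets rotate over the paths."""
    return itertools.cycle(paths)


def pick_per_session(src, dst, paths=PATHS):
    """Session based: hash the source-destination pair to a fixed path."""
    key = f"{src}->{dst}".encode()
    return paths[zlib.crc32(key) % len(paths)]


selector = round_robin(PATHS)
print([next(selector) for _ in range(4)])
print(pick_per_session("10.0.0.1", "10.0.0.2"))
```

Per-packet rotation spreads load evenly but may reorder packets of one flow, whereas hashing keeps every packet of a session on the same path.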

3.4.5 Flat and Hierarchical Another distinction can be made between flat and hierarchical routing algorithms. In a flat routing algorithm, all routers are peers. Each router is indistinguishable from another as they all execute the same algorithm to compute routing paths through the entire network. This flat model suffers from two main problems: lack of scalability and poor administrative autonomy. The first derives from the growing computational, storing, and communication overhead the algorithm introduces when the number of routers becomes large. The second arises from the need to hide some features of a company’s internal network from the outside, a crucial requirement for enterprise networks. In hierarchical routing, logical groups of nodes, called domains or autonomous systems (ASs), are defined. A routing domain is a collection of routers that coordinate their routing knowledge using a single routing protocol. An AS is a routing domain that is administered by one authority (person or group). Each AS requires a registered AS number (ASN) to connect to the Internet. According to this hierarchy, some routers in an AS can communicate with routers in other ASs, while others can communicate only with routers within their own AS. In very large networks, additional hierarchical levels may exist, with routers at the highest hierarchical level forming a routing backbone (as shown in Figure 3.2). Packets from nonbackbone routers are conveyed along the backbone by backbone routers until a backbone router connected to the destination AS is found. Then packets are sent through one or more nonbackbone routers within the AS until the ultimate destination is reached. Compared to flat routing, a drawback of hierarchical routing is that suboptimal paths may sometimes be found. Nevertheless, hierarchical routing offers several advantages over flat routing. 
First, the amount of information maintained and exchanged by routers is reduced, and this increases the speed of route calculation, thus allowing faster convergence. Second, unlike flat routing, where a single router problem can affect all routers in the network, in a hierarchical algorithm the scope of router misbehavior is limited. This increases overall network availability. In addition, the existence of boundary interfaces between different levels in the hierarchy can be exploited to enforce security policies (e.g., access control lists or firewalls) on border routers. Hierarchical routing also improves scalability and eases protocol upgrades, thus making the task of the network manager easier. Thanks to the above-mentioned advantages, large companies generally adopt hierarchical routing.

FIGURE 3.2 A routing architecture with three ASs connected through backbone routers. [Figure labels: AS-1, AS-2, AS-3, Backbone, ASBR, ABR, EGP, IGP, Areas 1 to 6.]

3.4.6 Intra-AS and Inter-AS A very large internetwork in the IP domain is typically organized as a collection of ASs. An AS can be composed of one or more areas, made up of contiguous nodes and networks that may be further split into subnetworks. Within an area, network devices equipped with the capability to forward packets between subnetworks are called intermediate systems (ISs). ISs may be further classified into those that can communicate within routing areas only (intra-area ISs) and those that can communicate both within and between routing areas (inter-area ISs). Autonomous system border routers (ASBRs) on the backbone are entrusted with routing traffic between different ASs, while area border routers (ABRs) deal with traffic between different areas within the same AS. The routing protocol used within an AS (an intra-AS routing protocol) is commonly referred to as an Interior Gateway Protocol (IGP). A separate protocol, called an Exterior Gateway Protocol (EGP), is used to interface among the ASs. EGPs are usually referred to as inter-AS routing protocols. All routers within each AS will run one or more IGPs. Routing information between ASs is exchanged through the routing backbone via an EGP. The use of an EGP limits the amount of routing information exchanged among the three ASs and allows them to be managed differently. The main differences between intra-AS routing protocols (IGPs) and inter-AS routing protocols (EGPs) can be summarized in terms of policy, scalability, and performance. When dealing with inter-AS routing, enforcing policy is crucial. For example, traffic originating from a given AS might be required not to pass through another specific AS. On the other hand, policy is less critical within an AS, where everything is under the control of a single administrative entity. Scalability represents a more critical requirement in inter-AS routing, where a large number of internetworks may be involved, than in intra-AS routing. Conversely, performance is more important within an AS than in inter-AS routing, where it is of secondary importance compared to policy.
For instance, an EGP may prefer a more costly path to another one if the former complies with certain policy criteria that the second does not fulfill.
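The precedence of policy over cost in inter-AS routing can be sketched as a filter followed by a minimum. The function `select_inter_as_path`, the AS names, and the costs are hypothetical.

```python
def select_inter_as_path(paths, forbidden_as):
    """Choose the cheapest path that avoids every AS in forbidden_as.

    paths is a list of (as_sequence, cost) tuples; None is returned when
    no path complies with the policy.
    """
    allowed = [p for p in paths if not set(p[0]) & set(forbidden_as)]
    return min(allowed, key=lambda p: p[1]) if allowed else None


paths = [
    (["AS1", "AS2", "AS4"], 10),  # cheapest, but crosses AS2
    (["AS1", "AS3", "AS4"], 25),  # more costly, yet policy compliant
]
print(select_inter_as_path(paths, {"AS2"}))
```

With AS2 excluded by policy, the more costly path through AS3 is selected; without the restriction the cheaper path would win.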

3.4.7 Unicast and Multicast Routing protocols involving just one sender and one receiver are called unicast protocols. However, several applications require addressing packets from one source to multiple destinations. This is the case, for instance, of applications that distribute identical information to multiple users. In this case, multicast routing is required. Multicast routing enables sending a packet from one source to multiple recipients with a single operation. Multicast addresses are present in IP. A multicast address is a single identifier for a group of receivers, which are thus members of a multicast group. To deploy multicast routing, two approaches are possible. In the first one, there is no explicit multicast support at the network layer, but an emulation using multiple point-to-point unicast connections. This means that each application-level data unit is passed to the transport layer and here duplicated and transmitted over individual unicast network layer connections. In the second option, explicit multicast support is provided: a single packet is transmitted by the source and then replicated at a network router, i.e., forwarded over multiple outgoing links, to reach the destinations. The advantage of the second approach is that there is a more efficient use of bandwidth, as only one copy of a packet will cross a link. However, this approach does have a cost. In the Internet, for instance, multicast is not connectionless, as routers on a multicast connection have to maintain state information. This entails a combination of routing and signaling in order to establish, maintain, and tear down connection state in the routers. Compared to unicast routing, where the focus is on the destination of a packet, multicast routing is backward oriented: multicast routing packets are transmitted from a source to multiple destinations through a spanning tree. Multicast IP routing and relevant protocols will be addressed in Section 3.6. Unicast IP routing is addressed in Section 3.5.
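The bandwidth advantage of explicit multicast support over unicast emulation can be made concrete by counting packet copies per link. The function `link_copies` and the example topology are hypothetical, and the sketch assumes that the unicast paths to the receivers together form a tree rooted at the source.

```python
from collections import Counter


def link_copies(paths):
    """Count packet copies per link for unicast emulation and for multicast.

    paths maps each receiver to the list of nodes from source to receiver.
    """
    unicast = Counter()
    for route in paths.values():
        for link in zip(route, route[1:]):
            unicast[link] += 1  # one copy per receiver on every link used
    multicast = {link: 1 for link in unicast}  # a single copy per tree link
    return unicast, multicast


paths = {
    "R1": ["S", "A", "B", "R1"],
    "R2": ["S", "A", "B", "R2"],
    "R3": ["S", "A", "R3"],
}
unicast, multicast = link_copies(paths)
print(sum(unicast.values()), sum(multicast.values()))
```

In this example the link leaving the source carries three copies under unicast emulation but only one under multicast, since replication is deferred to the routers where the tree branches.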

3.5 IP Unicast Routing: Interior and Exterior Gateway Protocols With reference to unicast IP routing, among the IGPs that support IP there are [Lewis99]:
• The Routing Information Protocol (RIPv1 and RIPv2)
• The Cisco Interior Gateway Routing Protocol (IGRP)
• The Cisco Enhanced Interior Gateway Routing Protocol (EIGRP)
• The Open Shortest-Path-First Protocol (OSPF)
• The Intermediate System-to-Intermediate System Protocol (IS-IS)
EGPs in the IP domain include:
• The Exterior Gateway Protocol (EGP)
• The Border Gateway Protocol (BGP)
In the following, the different protocols in each class will be described and compared.

3.5.1 Interior Gateway Protocols for IP Networks 3.5.1.1 Distance-Vector IGPs This section describes and compares two popular distance-vector protocols supporting IP: RIP and IGRP. 3.5.1.1.1 Routing Information Protocol The Routing Information Protocol (RIP) was one of the first IGPs and has been used for routing computations in computer networks since the early days of the ARPANET. Formally defined in the XNS (Xerox Network Systems) Internet Transport Protocols publications (1981), its widespread use was favored by its inclusion, as the routed process, in the Berkeley Software Distribution (BSD) version of UNIX supporting Transmission Control Protocol (TCP)/IP (1982). Two RIP versions exist: version 1 [Hedri88] and version 2 [Malkin97]. In RIP, each router sends a complete copy of its entire routing table to all its neighbors on a regular basis (typical RIP update timer = 30 seconds). A single RIP routing update contains up to 25 route entries within the AS. Each entry contains the destination address of a host or network, the IP address of the next-hop, the distance to the destination (in hops), and the interface. To obtain the cost to a given destination, a router can also send RIP request messages. After receiving an update, a router compares the new information with the information it already possesses. If the routing update includes a new destination network, it is added to the routing table. If the router receives a route to an existing destination with a lower metric, it replaces the current entry with the new one. If an entry in the update message has the same next-hop as the current route entry, but a different metric, the new metric will be used to update the routing table.
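The three update rules just described can be sketched as follows. The function `process_update` and the table layout are hypothetical; adding one hop for the link to the advertising neighbor and capping the metric at 16 are assumptions of this sketch.

```python
INFINITY = 16  # assumed cap: a metric of 16 is treated as unreachable


def process_update(table, neighbor, entries):
    """Apply one RIP-style update received from neighbor.

    table maps destination -> (metric, next_hop); entries maps
    destination -> metric as advertised by the neighbor.
    """
    for dest, advertised in entries.items():
        metric = min(advertised + 1, INFINITY)  # one more hop via the neighbor
        current = table.get(dest)
        if current is None:
            if metric < INFINITY:
                table[dest] = (metric, neighbor)  # new destination
        elif metric < current[0]:
            table[dest] = (metric, neighbor)      # lower metric: replace
        elif current[1] == neighbor and metric != current[0]:
            table[dest] = (metric, neighbor)      # same next hop, new metric
    return table


table = {"net1": (2, "B")}
process_update(table, "C", {"net1": 0, "net2": 3})
print(table)
```

The third rule matters when a route gets worse: the router must accept a higher metric from the neighbor it is already using, since that neighbor is the authority for the current route.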


FIGURE 3.3 A simplified internetwork used to explain split horizon with poison reverse.

If a router does not hear from a neighbor for a given time interval, called a dead interval (an invalid timer is used for this purpose), it assumes that the neighbor is no longer available (either down or unreachable). As a result, the router modifies its local routing table and then notifies its neighbors of the unavailable route. After another predefined time interval (a flush timer is set for this purpose), if nothing is heard from the route, the information is flushed from the router’s routing table. Routers allow for configuration of an active or passive RIP mode on specific interfaces. The active mode means full routing capability, while passive means a listen-only mode; that is, no RIP updates are sent out. To speed up convergence, RIP uses triggered updates. That is, whenever a RIP router learns of a change, such as a link becoming unavailable, it sends out a triggered update immediately, rather than waiting for the next announcement interval. As it takes time for triggered updates to get to all the other routers in large networks, a gateway that has not yet received the triggered update may issue a regular update at the wrong time, thus causing a bad route to be reinserted in a neighbor that has already received the triggered update. In order to prevent new routes from reinstating an old link, a hold-down period is enforced in the protocol: when a route is removed, no update for that route will be accepted for a given period, until the topology becomes stable. Hold-downs have the drawback of slowing convergence. Besides hold-down, RIP implements other techniques to avoid routing loops between adjacent routers, called split horizon with poison reverse. The general split-horizon algorithm prevents routes from being propagated back to the source, i.e., down the interface from which they were learned. As an example, consider the case in Figure 3.3. During normal operations, router A will notify router B that it has a route to network 1. 
According to the split-horizon algorithm, when B sends updates to A, it will not mention network 1. Now let us assume that the router A interface to network 1 goes down. Without split horizon, router B would inform router A that it can get to network 1. Since it no longer has a valid route, router A might select that route. In this case, A and B would both have routes to 1. But this would result in a circular route, where A points to B and B points to A. Using split horizon with poison reverse, instead of not advertising routes to the source, routes are advertised back to the source with a cost of infinity (i.e., 16), which will make the source router ignore the route.

On the whole, RIP is quite robust and very easy to set up. It is the only routing protocol that UNIX nodes universally understand and is therefore commonly used in UNIX environments. RIP is also commonly used in end-system routing as a dynamic router discovery protocol. Dynamic router discovery is an alternative to static configurations, which allows hosts to dynamically locate routers when they have to access devices external to the local network. RIP was designed to work with moderate-size networks using reasonably homogeneous technology. This makes it suitable for local networks featuring a small number of routers (about a dozen) and links with equal characteristics. RIP has a poor degree of scalability, so it is not recommended for use in more complex environments. RIP cannot be used to build a backbone larger than 15 hops in diameter,* as it specifies a maximum hop count of 15. A number of hops equal to 16 corresponds to infinity. This is useful to prevent packets that get stuck in routing loops from being constantly switched back and forth between routers. As it uses a simple, fixed metric to compare alternative routes, RIP can generate suboptimal routing tables, resulting in packets sent over slow (or costly) links even in the presence of better choices. RIP is therefore not suitable for environments where routes need to be chosen based on dynamically varying parameters such as measured delay, reliability, or load (RIP does not support load balancing).

*RIP is usually configured in such a way that a cost of 1 is used for the outbound link. If a network administrator chooses to use larger costs, the upper bound of 15 can be quickly reached and will become a problem.

RIP version 1 (RIPv1) lacks support for variable-length subnet masks (VLSM) [Brade87], so RIPv1 is classful. As will be explained in Section 3.7, this may seriously deplete the available address space to the detriment of scalability. On the other hand, RIP version 2 (RIPv2) is classless. VLSM, classful, and classless concepts will be addressed in detail in Section 3.7. RIPv1 uses a broadcast mode to advertise and request routes, while RIPv2 has the ability to send routing updates via multicast addresses. RIPv2 also supports route aggregation techniques, i.e., the use of a single network prefix to advertise multiple networks [Chen99]. Moreover, RIPv2 provides some support for security, implementing authentication on a per-message basis.* RIPv2 also offers some support for ASs and IGP/EGP interaction by means of external route tags. External routes are learned from neighbors situated outside the AS, while internal routes lie completely within a given AS. In RIPv2 a route tag field is used to separate internal RIP routes from external routes imported from an EGP. Finally, RIPv2 scales better than RIPv1, but, compared with link-state protocols, it suffers from slow convergence and scalability limitations.
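Split horizon and its poison reverse variant, as described for RIP above, differ only in what a router says about routes it learned from the neighbor it is addressing. A minimal sketch, in which the function `build_update`, the table layout, and the router and network names are hypothetical:

```python
INFINITY = 16  # RIP's representation of an unreachable destination


def build_update(table, out_neighbor, poison_reverse=True):
    """Build the vector advertised to out_neighbor.

    table maps destination -> (metric, next_hop), where next_hop is the
    neighbor from which the route was learned.
    """
    update = {}
    for dest, (metric, next_hop) in table.items():
        if next_hop == out_neighbor:
            if poison_reverse:
                update[dest] = INFINITY  # advertise back as unreachable
            # plain split horizon: say nothing about this route
        else:
            update[dest] = metric
    return update


# Router B learned net1 from A and net2 from C.
table_b = {"net1": (2, "A"), "net2": (1, "C")}
print(build_update(table_b, "A"))
print(build_update(table_b, "A", poison_reverse=False))
```

In both modes router A never receives a usable route to net1 from B, so it cannot be misled into pointing back at B when its own interface to net1 fails.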
3.5.1.1.2 Interior Gateway Routing Protocol The Interior Gateway Routing Protocol (IGRP) [Hedri91] was created by Cisco in the 1980s. Not an open standard, it only runs on Cisco routers. IGRP shares several features with RIP. IGRP sends out updates on a regular basis (every 90 seconds) and uses update, invalid, and flush timers as well (with different values from RIP, depending on the implementation). Like RIP, IGRP uses triggered updates to speed up convergence and hold-down timers to enforce stability. However, while RIP only allows a network diameter of 15 hops, IGRP can support a network diameter of up to 255 hops, so it can be used in larger networks. IGRP differs from RIP regarding metrics, parallel route support, the reverse poisoning algorithm, and the use of a default gateway. Like RIPv1, IGRP does not support VLSM. Unlike RIP, IGRP does not use a single metric, but a vector of metrics that takes into account the topological delay time, the bandwidth of the narrowest bandwidth segment of the path, channel occupancy, and the reliability of the path. A single composite metric can be computed from this vector [Hedri91], which encompasses the effect of the various components into a single number representing how good a path is. The path featuring the smallest value for the composite metric will be the best path. The hop count and MTU (maximum transmission unit, i.e., the maximum packet size that can be sent along the entire path without fragmentation) of each network are also considered in the best path calculation. With IGRP, network administrators are supplied with a large range of metrics and they are also allowed to customize them. By giving higher or lower weight to specific metrics, network administrators can therefore influence IGRP’s automatic route selection. 
For example, suitable constants can be set to several different values to provide different types of service (interactive traffic, for instance, would typically give a higher weight to delay, whereas file transfer would assign a higher weight to bandwidth). IGRP is more accurate than RIP in calculating the best path, as a vector of metrics instead of a single metric improves the description of the status of a network. If, for instance, a single metric is used, several consecutive fast links will appear to be equivalent to a single slow link. While this would be appropriate for delay-sensitive traffic, it would not be so for bulk data transfer, which is more sensitive to bandwidth. This problem may arise with RIP, but not with IGRP, as it considers delay and bandwidth separately.

*A whole RIP entry, denoted by a special address family identifier (AFI) value of 0xFFFF, is used for authentication purposes, thus reducing the total number of routing entries per advertisement to 24.

An interesting feature of IGRP is that it provides multipath routing and can perform load balancing, by splitting traffic equally between paths that have equal values for the composite metric. To deal with the case of multiple paths featuring unequal values for the composite metric, a variance parameter is defined as a multiplier of the best metric. Only routes with metrics within a given variance of the best route can be used as multiple paths. In this case, traffic is distributed among multiple paths in inverse proportion to the composite metrics. The variance can be fixed by the network administrator. The default value is 1, which means that when the metric values of two paths are not equal, only the best route will be used. A higher variance value means that any path with a metric up to that multiple of the best metric will be considered for routing purposes. However, variance values other than 1 should be carefully managed, as they may cause routing loops.*

Unlike RIP, which can only prevent routing loops between adjacent routers, the route poisoning algorithm used in IGRP prevents large routing loops between nonadjacent routers. This is achieved by poisoning routes for which the metric increases by 10% or more after an update. The rationale behind this is that routing loops generate continually increasing metrics. It should be noted that if this poisoning rule is used, valid routes may be erroneously deleted from routing tables. However, if the routes are valid, they will be reinstalled by the next update. The advantage of the IGRP poisoning algorithm is that it safely allows a zero hold-down value, which significantly improves network convergence time.

IGRP handles default routes differently from RIP. Instead of a single dummy entry for the default route, IGRP allows real networks to be flagged as default candidates. IGRP periodically analyzes all the candidate default routes to choose the one with the lowest metric, which will be the actual default route. This approach is more flexible than the one adopted by typical RIP implementations. The default route can change in response to changes within a network.

3.5.1.2 A Hybrid Protocol: The Enhanced Interior Gateway Routing Protocol EIGRP was introduced by Cisco in the early 1990s as an evolution of IGRP [Cisco02a]. As it offers support for multiple network layer protocols (e.g., IP, AppleTalk, and Novell NetWare), EIGRP is commonly used in mixed networking environments. EIGRP is called a hybrid protocol as it combines a distance-vector routing protocol with the use of a diffusing update algorithm (DUAL) [Garcia93], which has some of the features of link-state routing algorithms. The advantages of this combination are fast convergence and lower bandwidth consumption. DUAL, based on distance information provided by route advertisements from neighbors, finds all loop-free paths to any given destination. Among all the loop-free paths to the destination, the neighbor with the best path is selected as the successor, while the others are selected as feasible successors, i.e., eligible routes to be used if the primary route becomes unavailable. If, after a failure, one feasible successor is found, DUAL promotes it to primary route without performing recalculation, thus reducing the overhead on routers and transmission facilities. If, on the other hand, no feasible successors exist, a recomputation (called a diffusing computation) is performed to select a new successor. Unnecessary recomputations affect convergence, so they should be avoided. EIGRP uses the same composite metrics as IGRP.
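The successor and feasible-successor selection performed by DUAL can be sketched as below. The text does not state how loop-free paths are recognized; this sketch assumes the commonly documented feasibility condition, namely that a neighbor qualifies when the distance it reports is strictly lower than the best total distance. The function names and the example values are hypothetical.

```python
def dual_select(neighbors):
    """Split neighbors into a successor and an ordered list of feasible successors.

    neighbors maps name -> (reported_distance, link_cost), where
    reported_distance is the neighbor's own distance to the destination.
    """
    total = {n: rd + cost for n, (rd, cost) in neighbors.items()}
    successor = min(total, key=total.get)
    feasible_distance = total[successor]
    # Assumed feasibility condition: a neighbor closer to the destination
    # than this router's best distance cannot be routing through this router.
    feasible = sorted(
        (n for n, (rd, _) in neighbors.items()
         if n != successor and rd < feasible_distance),
        key=total.get,
    )
    return successor, feasible


def on_failure(neighbors, failed):
    """Promote a feasible successor if one exists; otherwise signal recomputation."""
    successor, feasible = dual_select(neighbors)
    if failed != successor:
        return successor, "unchanged"
    if feasible:
        return feasible[0], "promoted"
    return None, "diffusing computation required"


neighbors = {"N1": (10, 5), "N2": (12, 6), "N3": (20, 2)}
print(dual_select(neighbors))
print(on_failure(neighbors, "N1"))
```

Here N1 is the successor with a total distance of 15, N2 qualifies as a feasible successor because it reports 12, and N3 does not because it reports 20; if N1 fails, N2 is promoted without any recomputation.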
However, it does not send periodic updates, but instead implements neighbor discovery/recovery, based on a hello mechanism, to assess neighbor reachability. Routing updates are only sent in the event of topology changes or a failure in a router or link. Moreover, when the metric for a route changes, only partial updates are sent, and they only reach the routers that really need the update. As a result, EIGRP requires less bandwidth than IGRP. EIGRP relies on the Reliable Transport Protocol (RTP) [Schulz03] to achieve guaranteed and ordered delivery of EIGRP packets to all neighbors. It supports VLSM, thus providing more flexibility in internetwork design than RIPv1 or IGRP. This can be particularly useful when dealing with a limited address space (as will be explained in Section 3.7). EIGRP has a modular architecture that makes it possible to add support for new protocols to an existing network. On the whole, EIGRP is robust and easy to configure and use. EIGRP supports both internal and external routes, allowing tags to identify the source of external routes. This feature can be exploited by network administrators to develop their own interdomain routing policies [Pepel00][Cisco97].

*It should also be pointed out that the load balancing performed by IGRP may produce out-of-sequence packets. This should be taken into account when neither the data link nor transport layer protocols have the capability of handling out-of-sequence packets.

3.5.1.3 Link-State Protocols

3.5.1.3.1 OSPF: The Open Shortest-Path-First Protocol

OSPF is a link-state IGP designed by the Internet Engineering Task Force (IETF) in the late 1980s to overcome the limitations of RIP for large networks. The name of this protocol derives from the fact that it is an open standard and that it uses the shortest-path-first (also called Dijkstra [Dijks59]) algorithm [Perlm92][Moy89]. It was specifically designed for the TCP/IP environment and runs directly over IP. OSPF is commonly used in medium to large IP networks and is implemented by all major router manufacturers. Several specifications have appeared since the first one [Moy89]. The RFC for OSPF version 2 (OSPFv2) is in [Moy88], while OSPF specifications to support IPv6 are given in [Coltun99].

One of the most appealing features of OSPF is its support for hierarchical routing design within ASs (although it is purely an IGP). An AS running OSPF can therefore be configured into areas that are interconnected by one backbone area (as shown in Figure 3.4). There are two types of routing within OSPF: intra-area and inter-area routing.

In OSPF, each router monitors the state of its attached interfaces. If a topology change occurs, the router that has detected it distributes information about the change to all the other routers within its area, through broadcast messages called link-state advertisements (LSAs).
LSAs include metrics, interface addresses, and other data. A router uses this information to build a topological database in the form of a directed graph. The costs associated with the various edges (i.e., links) are expressed by a single dimensionless metric configured by the network administrator. A topological database is therefore present in each router. When two routers have identical topological databases, they become adjacent. This is important, as routing information can only be exchanged between adjacent routers. Locally, the router applies a shortest-path tree algorithm to all the networks, taking itself as the root node.

As said above, each router only sends LSAs to all the routers in the same area. This prevents intra-area routing from spreading outside the area. Each router within an area will know routes to any destination within the area and to the backbone.

[Figure 3.4 labels: AS 1 comprising Area 0 (Backbone) with BR 10; Areas 1, 2, and 3; ABR 2, ABR 3, and ABR 9; IR 1 and IR 4 through IR 8; hosts H1, H2, and H3; ASBR 11 connecting to another AS.]

FIGURE 3.4 OSPF routing hierarchy.
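
The shortest-path tree computation that each OSPF router performs locally can be illustrated with a short Python sketch of Dijkstra's algorithm. The four-router topology, the router names R1 to R4, and the link costs are assumptions chosen for illustration; they are not taken from Figure 3.4, and a real OSPF implementation works on its link-state database rather than on a plain dictionary.

```python
import heapq


def shortest_path_tree(graph, root):
    """Dijkstra's algorithm: return (cost, parent) maps for the tree rooted at `root`.

    graph maps each router to a dict of {neighbor: link cost}; costs must be
    non-negative, as with administrator-configured OSPF link costs.
    """
    cost = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry, a cheaper path was already settled
        done.add(node)
        for neighbor, link_cost in graph[node].items():
            candidate = dist + link_cost
            if neighbor not in cost or candidate < cost[neighbor]:
                cost[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return cost, parent


# Hypothetical topology: link costs are illustrative only.
TOPOLOGY = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R4": 5},
    "R4": {"R2": 1, "R3": 5},
}

cost, parent = shortest_path_tree(TOPOLOGY, "R1")
print(cost)    # {'R1': 0, 'R2': 7, 'R3': 1, 'R4': 6}
print(parent)  # {'R1': None, 'R2': 'R4', 'R3': 'R1', 'R4': 'R3'}
```

With R1 as the root, the direct link to R2 (cost 10) loses to the path through R3 and R4 (cost 7), which is why the tree, and not the list of attached links, determines the next hop.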


The internal topology of any area is hidden from every other area. Each router within a given area will know how to reach both every other router within its area and the backbone, but it does not have any idea of the number of routers existing or the way they are interconnected for any other area. Inter-area routing is performed through the backbone (which must be configured as area 0). It should be pointed out that an OSPF backbone is a routing area and not a physical backbone network, as the name might suggest.

Four different types of routers can be configured in OSPF: internal routers (IRs), area border routers (ABRs), backbone routers (BRs), and autonomous system border routers (ASBRs). All of them are depicted in Figure 3.4. IRs are only entrusted with intra-area routing. All their interfaces are connected within an area. ABRs route packets between different areas within the AS. They interface to multiple areas (including the backbone), so they belong to both an area and the backbone and maintain topological information about both. BRs belong to the backbone (area 0), but are not ABRs; that is, their interfaces are connected only to the backbone (e.g., BR-10). As the backbone operates as an area itself, BRs maintain area routing information. ASBRs, also called boundary routers, are responsible for exchanging routing information with routers belonging to other ASs (e.g., ASBR-11 in Figure 3.4). They have an external interface to another AS and learn external routes from dynamic EGPs, such as BGP-4 (presented in Section 3.5.2.3), or static routes. External routes are passed transparently throughout the AS and kept separate from the OSPF link-state data. External routes can also be tagged by the advertising router.

A packet to be sent to another area in the AS (inter-area routing) is first routed to an ABR in the source area (intra-area routing) and then routed via the backbone to the ABR that belongs to the destination area.
Finally, it will be routed to the ultimate destination.

Route calculation introduces high complexity, and as an OSPF router stores all link states for all the areas, the overhead is high in a large internetwork. On the other hand, OSPF scales well and converges quickly. It also provides several appealing features, such as:

• Security: All the exchanges between OSPF routers (e.g., LSAs) are authenticated, thus preventing intruders from manipulating routing information.
• Type-of-service (TOS) support: Each link may feature different costs according to the traffic TOS requirements.
• Load balancing between multiple equal-cost paths.
• Triggered updates.
• Designated routers: On a LAN, where several routers may be connected, one will be elected as the designated router and another as its backup. The designated router is responsible primarily for generating LSAs for the LAN to all other networks in the OSPF area.
• Explicit support for VLSM and the ability to have discontinuous subnets (i.e., made up of single networks or sets of networks featuring noncontiguous addressing). This feature is very useful on the Internet (as will be discussed in Section 3.7).
• Tagged external routes: OSPF is capable of receiving routes from and sending routes to different ASs.
• Route summarization: An area border router can aggregate two subnets belonging to the same area so that only one entry in the routing table of another router will be used to reach both subnets. This minimizes the size of topological databases in the routers, thus significantly reducing protocol traffic.

3.5.1.3.2 Integrated IS-IS

The Intermediate System-to-Intermediate System Protocol (IS-IS) [Callon90] is a link-state protocol. It is an open standard and its name indicates that this protocol is used by routers to talk to each other. Developed for the OSI world, IS-IS was made integrated so that it can route both OSI and IP simultaneously.
Similar to OSPF, integrated IS-IS uses the SPF algorithm and is based on LSAs sent to all routers within a given area and hello packets to check the current state of a router. As it is a link-state protocol, it converges fast, but at the expense of high complexity. Integrated IS-IS supports VLSM, load sharing, and triggered updates. Still not widely deployed at present (it is confined to telco and government networks), integrated IS-IS has good potential for medium to large IP networks.
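
Route summarization, listed earlier among the OSPF features and relevant to any protocol that supports VLSM, can be illustrated with Python's standard ipaddress module. The prefixes below are hypothetical, and the sketch shows only the address arithmetic behind aggregation, not how an area border router is configured to advertise the summary.

```python
import ipaddress


def summarize(prefixes):
    """Collapse a list of prefixes into the smallest equivalent set of prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Two hypothetical subnets of the same area that share a common 23-bit prefix.
adjacent = ["10.1.0.0/24", "10.1.1.0/24"]
print(summarize(adjacent))  # ['10.1.0.0/23']

# Subnets that do not form a contiguous block cannot be merged into one entry.
scattered = ["10.1.0.0/24", "10.1.2.0/24"]
print(summarize(scattered))  # ['10.1.0.0/24', '10.1.2.0/24']
```

In the first case a remote router needs a single routing table entry to reach both subnets, which is the saving that summarization is meant to provide.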

3.5.2 Exterior Gateway Protocols for IP Internetworks

EGPs are responsible for routing between ASs. Possible alternatives in the IP world are static routes, the Exterior Gateway Protocol (EGP) [Rosen82], and the Border Gateway Protocol (BGP) [Rekhr95].

3.5.2.1 Static Routes

Static routes can be applied to both intra-AS and inter-AS routing. However, as they offer particularly appealing features for inter-AS routing, they are included in this subsection. Configuration of static routes is simple, and it is very easy to enforce policy (as no routes equals no access). With static routes, no routing protocol messages travel over the links between ASs. On the other hand, maintenance of static routes in large internetworks may be complex, as they do not scale. For this reason, many network designers adopt the Dynamic Host Configuration Protocol (DHCP) [Droms97], which dynamically allocates a default gateway from a set of candidate gateways. Static routes also lack flexibility (as there is no way to choose a better path that could have been selected if dynamic routing protocols were used), so they are not suitable for changing environments. They do not respond to topological changes. To enforce fault tolerance in static routes, a secondary gateway is usually maintained, which could take the role of the primary one if the latter becomes unreachable or goes down.

3.5.2.2 Exterior Gateway Protocol

The Exterior Gateway Protocol [Rosen82] was the first EGP to be developed. It runs directly over IP and is a best-effort service. The routing information of EGP is similar to that of distance-vector protocols, but it does not use metrics. EGP suffers from some design limitations, as it supports neither routing loop detection nor multiple paths. If more than one path to a destination exists, packets can easily get stuck in routing loops. It has since been declared obsolete and replaced by the Border Gateway Protocol (BGP).
3.5.2.3 Border Gateway Protocol

Border Gateway Protocol version 4 (BGP-4), originally specified in RFC 1771 [Rekhr95], is a very robust and scalable routing protocol that is becoming a de facto standard for inter-AS routing on the current Internet. BGP-4 is an open standard. It somewhat resembles distance-vector protocols, as it is a distributed protocol where information exchange occurs only between directly connected routers. However, it is more appropriate to define BGP-4 as a path-vector protocol. This is because, as stated in [Rekhr95], “the primary function of a BGP speaking system is to exchange network reachability information with other BGP systems.” Network reachability information includes only the list of autonomous systems (ASs) that need to be traversed in order to reach other networks. Path information, instead of cost information, is exchanged between neighboring BGP routers, and BGP-4 does not specify the rule for choosing a path from those advertised. The routing mechanism and routing policy are therefore separated. A policy is manually configured to allow a BGP router to rate possible routes to other ASs and choose the best path.

Two BGP routers first have to establish a TCP connection. After a negotiation phase, in which the two systems exchange some parameters (such as BGP version number, AS number, etc.), they become BGP peers and can start exchanging information. Initially, they exchange the full routing tables. Thereafter, only incremental updates are sent when some change in the routing tables occurs. No periodic refresh of the entire routing table is needed, so a BGP speaker maintains the current version of the entire BGP routing tables of all of its peers for the duration of the connection. In order to maintain the connection, peers regularly exchange keep-alive messages. Notification messages are sent in response to errors or special conditions. In the event of error, a notification message is sent and the connection is closed.
Network reachability information is used to construct a graph of AS connectivity, from which routing loops may be pruned and policy decisions at the AS level may be enforced. Each BGP router maintains a routing table with all feasible paths to a given network.
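
One common way in which routing loops are pruned from path information is for a router to discard any advertisement whose AS list already contains its own AS number. The Python sketch below illustrates that check, together with the prepending of the local AS number when a route is passed on. The AS numbers and function names are assumptions made for this example, and real BGP speakers apply many further checks and policies.

```python
def accept_route(local_as, as_path):
    """Reject an advertisement that has already traversed the local AS (a loop)."""
    return local_as not in as_path


def advertise(local_as, as_path):
    """Prepend the local AS number before passing the route to a neighboring AS."""
    return [local_as] + list(as_path)


# Hypothetical AS numbers. AS 65001 learns a route whose path is 65002, 65003.
learned = [65002, 65003]
print(accept_route(65001, learned))   # True: 65001 is not yet on the path

# AS 65001 passes the route on; the path now records that it was traversed.
passed_on = advertise(65001, learned)
print(passed_on)                      # [65001, 65002, 65003]

# If the advertisement finds its way back to AS 65003, it is discarded.
print(accept_route(65003, passed_on))  # False
```

Because the complete list of traversed ASs travels with each route, the check needs no metric and no global view of the topology.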


As said above, BGP does not propagate cost information but path information, and does not specify which path should be selected from those that have been advertised, as this is a policy-dependent decision left up to the network administrator. This is because, as stated previously (in Section 3.4), when dealing with inter-AS routing, policy is more important than performance. However, when a BGP speaker receives several updates describing the best paths to the same destination, it has to choose one of them. The selection criteria are based on rules that take path attributes into account. Once the decision has been made, the speaker puts the best path in its routing table and then propagates the information to its neighbors. The path attributes generally used in the route selection process are [Lewis99][Halabi00][Huitem95]:

• Weight: A Cisco-defined attribute that is local to a router and is not advertised to neighboring routers. If the router learns about more than one route to the same destination, the route with the highest weight will be preferred.
• Local preference: Used to choose an exit point from the local autonomous system (AS). It is propagated throughout the local AS. If there are multiple exit points from the AS, the local preference attribute is used to select the exit point for a specific route.
• Multiexit discriminator, or metric attribute: Used as a suggestion to an external AS regarding the preferred route into the AS that is advertising the metric.
• Origin: Indicates how BGP learned about a particular route. The origin attribute can have one of three possible values: IGP (the route is interior to the originating AS), EGP (the route is learned via the Exterior Gateway Protocol (EGP)), or incomplete (the origin of the route is unknown or learned in some other way; this occurs when a route is redistributed in BGP).
• AS_path: When a route advertisement passes through an autonomous system, the AS number is added to the ordered list of AS numbers the advertisement traversed.
• Next-hop: The IP address that is used to reach the advertising router.
• Community: This provides a way of grouping destinations into communities, to which routing decisions can be applied. Predefined community attributes are no-export (this route must not be advertised to EBGP peers), no-advertise (this route must not be advertised to any peer), and Internet (this route is advertised to the Internet community, to which all routers in the network belong).

Attributes are crucial to achieve scalability, define routing policies, and maintain a stable routing environment. Other BGP features include route filtering, i.e., the ability of a BGP speaker to specify which routes to send to and receive from any of its peers. Filtering may refer to inbound or outbound links and may be applied as permit or deny. BGP-4 supports supernetting or classless interdomain routing (CIDR) [Fuller93] (dealt with in Section 3.7), which enables route aggregation, that is, the combination of several routes within a single route advertisement. This minimizes the size of routing tables and protocol overhead.

BGP was originally designed to perform inter-AS routing, but it can also be used for intra-AS routing. BGP connections between ASs are called external BGPs (e-BGPs), while those within an AS are called internal BGPs (i-BGPs). Figure 3.5 shows a typical scenario for e-BGP, i.e., a multihomed AS connected to the Internet. All the networks X, Y, Z, 1, 2, and 3 are ASs. More specifically, 1 and 3 are stub networks, 2 is a multihomed stub network, and X, Y, and Z are backbone provider networks. A stub network is such that all traffic entering it should be destined to that network. A multihomed AS is connected to multiple ASs (for example, via two different service providers*), but does not allow transit traffic.
Stub network 2 will be prevented from forwarding traffic between Y and Z by a selective route advertisement mechanism. BGP routes will be advertised in such a way that network 2 will not advertise to its neighbors (Y and Z) any path to other destinations except itself. For instance, 2 will not advertise path 2 Z 3 to network Y, so the latter will not forward traffic to 3 via 2.

*In this situation, multiple service providers are used to increase availability and to allow load sharing.

FIGURE 3.5 A scenario for BGP.

i-BGP can be used to distribute routing information to routers within the AS regarding destinations (networks) outside the AS. Running i-BGP offers several advantages, such as a consistent view of the AS to external neighbors, more control over information exchange within the AS, flexibility, and a slightly shorter convergence time than e-BGP (which is slow).

Table 3.2(A) and Table 3.2(B) list the various IGPs and EGPs discussed and compare them according to multiple design criteria. The tables can be helpful to guide the designer’s choice when dealing with routing protocols in the IP domain.

We have seen that various IGPs and EGPs exist in the IP domain, including both proprietary protocols and open standards. In a composite, complex internetwork they may coexist either for historical reasons or due to the presence of multiple vendor solutions. This coexistence may cause incompatibility problems due to both the different metrics adopted and peculiarities of the various protocols. It is therefore necessary to find a way to overcome incompatibilities and form a holistic view of the network topology, thus enabling interoperability. This is the purpose of route redistribution [Awdu02]. Route redistribution is the process that enables routing information from one protocol to be translated and used by a different one. A router can therefore be configured to run more than one routing protocol and redistribute route information between the two protocols.

TABLE 3.2(A) IGPs vs. EGPs in the IP Domain

Protocol       Technology                Type       Metrics                                   Scalability  VLSM Support
IGP  IGRP      Distance vector           Classful   Bandwidth, delay, load, reliability, MTU  Medium       No
     RIPv1     Distance vector           Classful   Hop count                                 Small        No
     RIPv2     Distance vector           Classless  Hop count                                 Small        Yes
     EIGRP     Advanced distance vector  Classless  Bandwidth, delay, load, reliability, MTU  Large        Yes
     OSPF      Link state                Classless  Cost                                      Large        Yes
     IS-IS     Link state                Classless  Cost                                      Very large   Yes
EGP  BGP-4     Path vector               Classless  Cost, hop, policy                         Large        Yes

TABLE 3.2(B) IGPs vs. EGPs in the IP Domain

Protocol       Hop Count Limit   Load Balance (Equal Paths)  Load Balance (Unequal Paths)  Standard  Convergence Time  Routing Algorithm
IGP  IGRP      100 (up to 255)   Yes                         Yes                           No        Slow              Bellman–Ford
     RIPv1     15                Yes                         No                            Yes       Slow              Bellman–Ford
     RIPv2     15                Yes                         No                            Yes       Slow              Bellman–Ford
     EIGRP     100 (up to 255)   Yes                         Yes                           No        Fast              DUAL
     OSPF      200               Yes                         No                            Yes       Fast/slow         Dijkstra
     IS-IS     1024              Yes                         No                            Yes       Fast/slow         IS-IS
EGP  BGP-4                       No                          No                            Yes       Slow


Redistribution is needed in two cases:

• At the AS boundary, as IGPs and EGPs do not match
• Within an AS, when multiple IGPs are used

Redistribution has direction. That is, routing information can be redistributed symmetrically (mutual redistribution) or asymmetrically (hierarchical redistribution). Handling redistribution is not an easy task, and it is difficult to define a general approach. Solutions are typically vendor dependent, as the problems that can arise are strictly related to the details of the various protocols.

3.6 IP Multicast Routing

The aim of multicast routing is to build a tree of links connecting all the routers that are attached to hosts belonging to the multicast group. Two different approaches are possible. The first one, called group-shared tree, uses a single tree for all sources in the multicast group, while the second approach, called source-based trees, provides an individual routing tree for each source in the multicast group.

The group-shared tree approach entails solving the Steiner tree problem [Hakimi71]. Although several heuristics were devised to solve this problem [Wall80][Waxm88][Wei93], they were not adopted by existing Internet multicast routing algorithms, due to the complexity and poor scalability of these methods, which require multicast-aware routers to maintain state information about all the links in the network. Moreover, the tree built using these techniques should be recalculated at any change in link costs. A more effective way to find the group-shared tree is the center-based (or core-based) approach, which is adopted by several Internet multicast routing algorithms. Here a router of the multicast group (also called a rendezvous point) is elected to be the core of the multicast tree. The core node will be the recipient of join messages sent by all the routers attached to the hosts belonging to the multicast group. A join message is forwarded toward the core via unicast routing, and it stops when it reaches the core or when it reaches a router that is already part of the multicast tree. This way, the path followed by the join message is the branch to the core from the originating node, and it will become part of the multicast tree (the branch is said to be grafted on to the existing tree). The advantages of this approach are that unicast tables are exploited to forward the join messages and that multicast-aware routers do not need to maintain link state.
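
The grafting of a branch by a join message can be sketched in a few lines of Python. The router names, the next-hop table, and the function name are assumptions made for this example; the table stands in for each router's unicast next hop toward the core and is assumed to be loop-free.

```python
def graft_branch(next_hop_to_core, on_tree, joining_router):
    """Forward a join hop by hop toward the core and graft the path it follows.

    next_hop_to_core maps each router to its unicast next hop toward the core.
    on_tree is the set of routers already on the shared tree (at least the core).
    Returns the list of routers newly added to the tree.
    """
    branch = []
    router = joining_router
    # The join stops at the core or at the first router already on the tree.
    while router not in on_tree:
        branch.append(router)
        router = next_hop_to_core[router]
    on_tree.update(branch)
    return branch


# Hypothetical unicast next hops toward the core router "CORE".
NEXT_HOP_TO_CORE = {"A": "B", "B": "C", "C": "CORE", "D": "B"}
tree = {"CORE"}

print(graft_branch(NEXT_HOP_TO_CORE, tree, "A"))  # ['A', 'B', 'C']
print(graft_branch(NEXT_HOP_TO_CORE, tree, "D"))  # ['D']
```

The second join travels only one hop, because router B is already part of the tree; no router needed any information beyond its own unicast next hop.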
The source-based trees approach entails solving a least-cost path multicast routing problem. This could be expensive, as each sender needs to know all links’ costs to derive the least-cost spanning tree. For this reason, the reverse path-forwarding (RPF) algorithm [Dalal78] is used, which is also useful as it prevents loops. The RPF algorithm allows a router to accept a multicast packet only on the interface from which the router would send a unicast packet to the source of the incoming multicast packet. RPF is effective, as each router only has to know the next-hop along its least-cost path to the source, but it has the drawback that even routers that are not attached to hosts belonging to the multicast group would receive multicast packets. This can be solved by pruning techniques, i.e., a prune message can be sent back upstream by a multicast router receiving a multicast message for which it has no recipients. In a dynamic scenario, a receiver may later join through a multicast-aware router that has already sent a prune message. To handle this case, unprune messages sent back upstream, or suitable timeouts (time to live (TTL)) that remove stale prunes, can be introduced.

In the Internet, multicast is achieved by the combination of the Internet Group Management Protocol version 2 (IGMPv2) [Fenne97] and multicast routing protocols. IGMP is an end system to intermediate system (ES-IS) protocol for multicasts. End systems (ESs) are network devices without the capability to forward packets between subnetworks. IGMP is used by a host to inform its directly attached router that an application running on it is interested in joining a given multicast group. IGMP allows multicast group members to join or leave a group dynamically and maintains state information on router interfaces that can be exploited by multicast routing protocols to build the delivery tree.
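
The reverse path-forwarding check and the effect of pruning, both described earlier in this section, can be sketched as follows in Python. The interface names, the unicast table, and the function names are assumptions made for illustration; a real router would derive the table from its unicast routing protocol and keep prune state per source and group.

```python
def rpf_accept(unicast_iface_to, source, arrival_iface):
    """Accept a multicast packet only if it arrived on the interface that this
    router would itself use to send a unicast packet to the source."""
    return unicast_iface_to.get(source) == arrival_iface


def forward(unicast_iface_to, source, arrival_iface, interfaces, pruned):
    """Return the interfaces on which to replicate the packet ([] means drop)."""
    if not rpf_accept(unicast_iface_to, source, arrival_iface):
        return []
    return [i for i in interfaces if i != arrival_iface and i not in pruned]


# Hypothetical router with three interfaces; unicast traffic to S leaves via eth0.
UNICAST_IFACE_TO = {"S": "eth0"}
INTERFACES = ["eth0", "eth1", "eth2"]

print(forward(UNICAST_IFACE_TO, "S", "eth0", INTERFACES, set()))     # ['eth1', 'eth2']
print(forward(UNICAST_IFACE_TO, "S", "eth1", INTERFACES, set()))     # []
print(forward(UNICAST_IFACE_TO, "S", "eth0", INTERFACES, {"eth2"}))  # ['eth1']
```

The second call shows the loop-prevention effect: a copy arriving on the wrong interface is dropped. The third shows that a prune received on eth2 stops replication toward a subtree with no recipients.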


Multicast routing algorithms, such as the protocol-independent multicast (PIM), the Distance-Vector Multicast Routing Protocol (DVMRP), the multicast open shortest path first (MOSPF), and the core-based tree (CBT), are in charge of coordinating the routers so that multicast packets are forwarded to their ultimate destinations. They are outlined below.

3.6.1 Distance-Vector Multicast Routing Protocol

The Distance-Vector Multicast Routing Protocol (DVMRP) [Waitz88] is an Interior Gateway Protocol derived from RIP. DVMRP combines many of the features of RIP with the truncated reverse-path-broadcasting (TRPB) algorithm described by Deering [Deeri88]. DVMRP is based on flood and prune, according to the dense-mode multicast routing model, which assumes that the multicast group members are densely distributed over the network and bandwidth is plentiful. Such a model fits high-density enterprise networks. The multicast forwarding algorithm requires the building of per-source multicast trees based on routing information, and then the dynamic creation of per-source group multicast delivery trees, by pruning the multicast tree for each source in a selective way. To unprune a previously pruned link, DVMRP provides both an explicit graft message and a TTL on prune messages (2 hours by default).

DVMRP includes support for tunneling IP multicast packets. This is very useful, as not all IP routers have multicast capabilities. Tunneling is performed by encapsulating multicast packets in IP unicast packets that are addressed and forwarded to the next multicast router on the destination path. Thanks to its tunneling capabilities, DVMRP has been used on the Internet for several years to support the multicast overlay network MBone [Kumar96], although its flooding nature makes it ill-suited to large internetworks.

The tree building performed by DVMRP needs more state information than RIP, so DVMRP is more complicated than RIP. There is also a very important difference from RIP. While the target of RIP is to route and forward packets to a particular destination, the goal of DVMRP is to keep track of the return paths to the source of multicast packets.

3.6.2 Multicast OSPF

Multicast OSPF (MOSPF) [Moy94] is an extension of the OSPFv2 unicast protocol enabling the routing of IP multicast packets. MOSPF is not a separate routing protocol; the multicast extensions are built on top of OSPFv2 and have been implemented so that a multicast routing capability can be gradually introduced into an OSPFv2 routing domain. A new OSPF link-state advertisement (LSA) describing the location of multicast destinations is added. Each router builds least-cost multicast trees for each (sender, group) pair. The path for a multicast packet is obtained by building a source-rooted pruned shortest-path multicast tree. The state of the tree is cached: it has to be recalculated following link-state changes or when the cache times out.

Unlike unicast packets, in MOSPF an IP multicast packet is routed based on both the packet’s source and its multicast destination. During packet forwarding, any commonality of paths is exploited. When multiple hosts belong to a single multicast group, a multicast packet will be replicated only when the paths to the separate hosts diverge [Moy94].

MOSPF is an example of a sparse-mode routing protocol, which assumes that multicast members are widely distributed over the network and bandwidth is possibly restricted. The sparse mode is suitable for internetworking applications. However, MOSPF does not support tunneling and, due to the inherent scaling problems of the shortest-path algorithm, is not suitable for large internetworks.

3.6.3 Protocol-Independent Multicast

Protocol-independent multicast (PIM) is a recent IP multicast protocol. The name indicates that PIM is not dependent on any particular unicast routing protocol. PIM works with IGMP and existing unicast routing protocols, such as RIP, IGRP, OSPF, IS-IS, and BGP. There are two operating modes for PIM, the sparse mode and the dense mode, which are described below.


3.6.3.1 Sparse-Mode PIM

The sparse-mode protocol-independent multicast (PIM-SM) [Estrin98] is a protocol for efficiently routing to multicast groups that may span wide-area (and interdomain) internetworks and is designed to support the sparse distribution model. PIM-SM works using a rendezvous point (RP) and requires explicit join/prune messages. A sender willing to send multicast packets has first to announce its presence by sending data to the RP. Analogously, a receiver willing to receive multicast data has first to register with the RP. Once the data flow from sender to RP to receiver starts, the routers on the path automatically optimize the path, removing unnecessary hops. This way, traffic only flows where it is required and router state is maintained only along the path. A drawback of PIM-SM is that the RP could become a bottleneck, so multiple RPs may be introduced to avoid congestion through load sharing.

3.6.3.2 Dense-Mode PIM

The dense-mode protocol-independent multicast (PIM-DM) [Nicho03] forwards multicast packets out to all connected interfaces except the receiving one. Thus, it floods the network first and prunes specific branches later. PIM-DM efficiently supports routing in dense multicast networks, where it is reasonable to assume that every downstream system is potentially a member of the multicast group. Unlike DVMRP, PIM-DM does not use routing tables. PIM-DM is easy to configure, but less scalable and less efficient than PIM-SM for most applications.

3.6.4 Core-Based Tree

The core-based tree (CBT) multicast routing protocol builds a shared multicast distribution tree per multicast group. CBT follows the core-based approach described at the beginning of Section 3.6 for group-shared tree building. Despite its simplicity, the way the tree is built in CBT tends to concentrate traffic around the core routers, and this may lead to congestion. For this reason, some implementations feature multiple core routers and perform load sharing between them. CBT is very suitable for supporting multicast applications, such as distributed interactive simulations or distributed video gaming, as they are characterized by many senders within a single multicast group. The deployment of CBT until now has been limited. More details on CBT can be found in [CBT][CBTv2].

3.6.5 Interdomain IP Multicast Routing

The current approaches to interdomain IP multicast routing are based on an extension of BGP-4, called the Multicast Border Gateway Protocol (MBGP) [Bates00]. MBGP carries two sets of routes, one for unicast routing and one for multicast routing. The routes associated with multicast routing are used by PIM-SM to build data distribution trees.

3.7 IP Addressing and Routing Issues

This section focuses on the strict coupling between IP addressing and routing. First, the two commonly used addressing models, classful and classless, are described, together with their impact on routing. Then subnetting, variable-length subnet masks (VLSM), and classless interdomain routing (CIDR) are presented. Finally, IPv6 and its deployment in the current Internet, together with IPv4/IPv6 migration issues, are discussed.

3.7.1 Classful IP Addressing According to the first IP specification [Poste81], each system attached to an IP-based Internet has a globally unique 32-bit Internet address. IP addresses are administered by the Internet Assigned Numbers Authority [IANA]. Systems that have interfaces to more than one network must have a unique IP address for each network interface. The Internet address consists of two parts: (1) the network number (or


TABLE 3.3(A) Address Formats and Network Sizes for Each Class

Class    Class ID                          Network Prefix  Network Number  Host Number    Maximum No.        Maximum No. of
                                           Size (bits)     Size (bits)     Length (bits)  of Networks        Hosts per Network
Class A  Highest-order bit = 0             8               7               24             2^7 - 2 = 126 (a)  2^24 - 2 (b)
Class B  Two highest-order bits = 10       16              14              16             2^14               2^16 - 2
Class C  Three highest-order bits = 110    24              21              8              2^21               2^8 - 2
Class D  Four highest-order bits = 1110    na              na              na             na                 na
Class E  Five highest-order bits = 11110   tbd             tbd             tbd            tbd                tbd

(a) The maximum number of networks that can be defined is 126, and not 128, as there are 2 reserved networks, i.e., network 0.0.0.0 (reserved for default routes) and network 127.0.0.0 (reserved for the loopback function).
(b) The all-zeros host number (“this network”) and the all-ones host number (“broadcast”) are reserved.

TABLE 3.3(B) Address Type and Ranges for Each Class

Class    Type                                Dotted Decimal Notation Range
Class A  Unicast                             From 1.0.0.0 to 127.255.255.255
Class B  Unicast                             From 128.0.0.0 to 191.255.255.255
Class C  Unicast                             From 192.0.0.0 to 223.255.255.255
Class D  Multicast (a)                       From 224.0.0.0 to 239.255.255.255
Class E  Reserved for experimental use only  From 240.0.0.0 to 247.255.255.255

(a) Class D is used for multicast applications and routing protocols such as OSPF and RIPv2.

network identifier, netID), which identifies the network to which the host belongs, and (2) the host number (or host identifier, hostID), which specifies the particular host on the given network. In classful addressing, the IP address space is divided into three main address classes, Class A, Class B, and Class C, which differ in the position of the boundary between the network number and the host number within the 32-bit address. Two additional classes are also defined: Class D (used for multicast addresses) and Class E (reserved for future use).

IP addresses are commonly expressed in dotted decimal notation, which divides the 32-bit Internet address into four 8-bit fields. Each field is written as the decimal value of the corresponding byte of the address and is separated from the other fields by a period (dot). For example, consider the IP address 192.32.215.8. The first number, 192, is the decimal equivalent of the first eight bits of the address, i.e., 11000000; the second, 32, is the equivalent of the second eight bits of the address; and so on. In binary notation, the address is therefore 11000000 00100000 11010111 00001000. Table 3.3(A) summarizes the address formats for each class and the maximum number of networks and hosts that can be defined within each class, while Table 3.3(B) shows the address type and address ranges for the different classes in dotted decimal notation.

Each unicast address class (i.e., A, B, and C) has an associated default mask, which is a bit mask used by hosts and routers to determine how much of the address is significant for forwarding decisions. The bit mask therefore indicates how much of the address is allocated to the netID and how much is left to the hostID. Table 3.4 shows the default masks associated with each unicast IP address class. For example, the default mask 255.0.0.0 for Class A indicates that only the first eight bits are used by the netID. The mask is crucial for routers: by ANDing the mask with a destination address, they can easily determine whether an incoming packet should be delivered directly to the local network or forwarded to another one.
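The dotted decimal conversion, the class boundaries, and the mask AND operation described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating every address above the Class D range as Class E is a simplification made for this sketch.

```python
import ipaddress

# Default masks from Table 3.4, keyed by class letter.
DEFAULT_MASKS = {"A": "255.0.0.0", "B": "255.255.0.0", "C": "255.255.255.0"}


def to_binary(dotted: str) -> str:
    """Write each byte of a dotted decimal address as an 8-bit binary field."""
    return " ".join(f"{int(octet):08b}" for octet in dotted.split("."))


def address_class(dotted: str) -> str:
    """Classify an address by the leading bits of its first byte."""
    first = int(dotted.split(".")[0])
    if first < 128:      # leading bit 0
        return "A"
    if first < 192:      # leading bits 10
        return "B"
    if first < 224:      # leading bits 110
        return "C"
    if first < 240:      # leading bits 1110
        return "D"
    return "E"           # simplification: everything above Class D


def classful_network(dotted: str) -> str:
    """AND a unicast address with the default mask of its class."""
    mask = int(ipaddress.IPv4Address(DEFAULT_MASKS[address_class(dotted)]))
    return str(ipaddress.IPv4Address(int(ipaddress.IPv4Address(dotted)) & mask))


def on_same_network(local: str, destination: str) -> bool:
    """Compare both addresses after masking with the local default mask."""
    mask = int(ipaddress.IPv4Address(DEFAULT_MASKS[address_class(local)]))
    local_bits = int(ipaddress.IPv4Address(local))
    destination_bits = int(ipaddress.IPv4Address(destination))
    return (local_bits & mask) == (destination_bits & mask)
```

For the example address 192.32.215.8, `address_class` returns Class C and `classful_network` returns 192.32.215.0, so a destination such as 192.32.215.200 would be delivered locally, while 192.32.216.1 would be forwarded.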


TABLE 3.4 Prefix-Default Masks Association for Each Unicast Class

Class    Default Mask   Default Prefix
Class A  255.0.0.0      /8
Class B  255.255.0.0    /16
Class C  255.255.255.0  /24

Masks are now commonly indicated using the so-called prefix, which gives the number of contiguous 1 bits in the mask. Prefix-default mask associations for each unicast class are also shown in Table 3.4.

A host can be configured with multiple IP addresses of different classes on the same physical interface. Direct communication is only possible between nodes within the same class and prefix, while nodes with a different address class, or with the same address class and a different prefix, need an intermediate device, such as a layer 3 switch, router, proxy, or network address translator (NAT).

Network address translation allows a router to act as an agent between the Internet and a local private network. This means that only a single, unique IP address is required to represent an entire group of computers. NAT implements short-term address reuse and relies on the fact that only a very small percentage of hosts in a stub domain (i.e., a domain, such as a corporate network, that only handles traffic originating from or destined for hosts in the domain) communicate outside the domain at any given time. Because many hosts never communicate outside their stub domain, only a subset of the IP addresses inside the domain needs to be translated into globally unique IP addresses when outside communication is required. Each NAT device has a table consisting of pairs of local IP addresses and globally unique addresses. The IP addresses inside the stub domain are not globally unique and are reused in other domains, which eases the address depletion problem. The globally unique IP addresses are assigned according to the CIDR address allocation schemes, which address the scaling problem (as will be discussed below). The main advantage of NAT is that it can be installed without changes to routers or hosts. More details on NAT can be found in [Egeva94].
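The table of (local address, global address) pairs kept by a NAT device can be pictured with the toy model below. It is a minimal sketch under stated assumptions: the class name `BasicNat`, its methods, and the first-free allocation policy are all hypothetical, and the addresses in the usage note are illustrative only.

```python
from typing import Dict, List


class BasicNat:
    """Toy NAT table mapping local addresses to a small pool of global ones."""

    def __init__(self, global_pool: List[str]) -> None:
        self.free: List[str] = list(global_pool)   # global addresses not in use
        self.table: Dict[str, str] = {}            # local address -> global address

    def translate_outbound(self, local: str) -> str:
        """Return the global address for a local host, binding one if needed."""
        if local not in self.table:
            if not self.free:
                raise RuntimeError("global address pool exhausted")
            self.table[local] = self.free.pop(0)
        return self.table[local]

    def translate_inbound(self, global_addr: str) -> str:
        """Map a global destination address back to the local host."""
        for local, bound in self.table.items():
            if bound == global_addr:
                return local
        raise KeyError(global_addr)

    def release(self, local: str) -> None:
        """End a binding so that the global address can be reused."""
        self.free.append(self.table.pop(local))
```

With a pool of two global addresses, a host such as 10.0.0.5 is bound to the first free global address on its first outbound packet; once the binding is released, that global address returns to the pool and can serve another local host, which is the short-term address reuse described above.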

3.7.2 Impact of IP Addressing on Routing Tables and Internet Scalability

The two most compelling problems facing today’s Internet are IP address depletion and poor scaling due to the uncontrolled growth of Internet routing tables. The first problem is the result of the IP version 4 (IPv4) addressing scheme, whose 32-bit address limits the total number of IPv4 addresses available. The situation is further complicated by the traditional model of classful addressing, which led to inefficient allocation of some portions of the IP address space in the early days of the Internet. The second problem derives from the exponential growth in the number of organizations connected to the Internet, combined with the fact that Internet backbone routers have to maintain complete routing information for the Internet. This problem cannot be solved by hardware enhancements alone, such as expanding router memory or improving router processing power. To deal with large routing tables, route flapping (i.e., rapid, repeated changes in route availability), and large volumes of routing information to be exchanged, without jeopardizing routing efficiency or the reachability of portions of the Internet, a more comprehensive and effective approach is needed. The following section describes subnetting, a technique that limits the uncontrolled growth of Internet routing tables.

3.7.3 Subnetting

Introduced in [Mogul85], subnetting allows a single Class A, B, or C network number to be divided into smaller parts. This is achieved by splitting the standard classful hostID field into two parts: the subnetID


FIGURE 3.6 Extended network prefix for subnetting.

and the hostID on that subnet (as shown in Figure 3.6). The (netID, subnetID) pair forms the extended network prefix. In this way, if the internal network of a large organization is split into several subnetworks, the division is not visible outside the organization’s private network. This allows Internet routers to use a single routing table entry for all the subnets of a large organization, thus reducing the size of their routing tables. However, subnetworks are visible to internal routers, which have to differentiate between the internal routes. Internet routers therefore use only the netID of the destination address to route traffic, while internal routers use the extended network prefix.

Subnetting has two main advantages. First, it hides the complexity of the private network organization within the private network boundary, preventing it from spreading outside and affecting the size of Internet router routing tables. Second, thanks to subnetting, local administrators do not have to obtain a new network number from the Internet when deploying new subnets.

Each bit in the subnet mask corresponds one-to-one to a bit in the Internet address. If a bit in the subnet mask is set to 1, the corresponding bit in the IP address is considered by the router as part of the extended network prefix. Otherwise, the corresponding bit in the IP address is considered as part of the host number. Modern routing protocols still carry the complete four-octet subnet mask.

The use of a single subnet mask, however, limits organizations to a fixed number of fixed-size subnets. A further improvement that greatly enhances flexibility is the use of more than one subnet mask for an IP network split into several subnetworks. This solution, called variable-length subnet masks, is described next.
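The way a subnet mask carves an address into netID, subnetID, and hostID can be sketched as follows. The function name `split_address`, its parameters, and the Class B network 130.5.0.0 used in the usage note are assumptions chosen for illustration; the sketch also assumes the 1 bits of the mask are contiguous.

```python
import ipaddress
from typing import Tuple


def split_address(address: str, class_prefix: int, subnet_mask: str) -> Tuple[int, int, int]:
    """Split an address into (netID, subnetID, hostID) bit fields.

    class_prefix is the classful prefix length (8, 16, or 24); subnet_mask is
    the dotted decimal mask that defines the extended network prefix.
    """
    addr = int(ipaddress.IPv4Address(address))
    mask = int(ipaddress.IPv4Address(subnet_mask))
    extended_prefix = bin(mask).count("1")     # contiguous 1 bits assumed
    host_bits = 32 - extended_prefix
    subnet_bits = extended_prefix - class_prefix
    net_id = addr >> (32 - class_prefix)
    subnet_id = (addr >> host_bits) & ((1 << subnet_bits) - 1)
    host_id = addr & ((1 << host_bits) - 1)
    return net_id, subnet_id, host_id


# A single mask yields a fixed number of fixed-size subnets:
# a Class B network with mask 255.255.255.0 gives 256 subnets.
subnets = list(ipaddress.ip_network("130.5.0.0/16").subnets(new_prefix=24))
```

For the address 130.5.32.17 with the mask 255.255.255.0, the netID is the 16-bit value of 130.5, the subnetID is 32, and the hostID is 17. An Internet router would look only at the netID, while an internal router would use the netID and subnetID together.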

3.7.4 Variable-Length Subnet Masks

Variable-length subnet masks (VLSM), introduced in [Brade87], make it possible to use more than one subnet mask for a subnetted IP network. Because the extended network prefix may then have different lengths, such a subnetted IP network is called a network with variable-length subnet masks. VLSM is a powerful improvement in flexibility. VLSM also offers another significant benefit: the possibility of introducing route aggregation (summarization). Route aggregation is defined as the ability of a router to collapse several forwarding information base entries into a single entry [Trotter01]. This aggregation process works in combination with VLSM to build hierarchically structured networked environments.

VLSM allows the address space of a large organization to be divided recursively. This is accomplished by splitting a large network into subnets, some of which are then further divided into subnets, some of which are, in turn, split into subnets. This division makes routing information relevant to one group of subnets invisible to routers belonging to another subnet group. A single router can therefore summarize multiple subnets behind it into a single advertisement, thus reducing the routing information to be maintained at the top level. From the routing perspective, route summarization offers the following advantages:

• It reduces the amount of information stored in routing tables.


• It simplifies the routing process, thus reducing the load on router resources (e.g., processor and memory).
• It improves network convergence time and isolates topology changes.

Without summarization, every router would need to have a route to every subnet in the network environment. The larger the network, the more crucial route summarization becomes.

3.7.4.1 Routing Protocol Requirements for Deploying VLSM

Now that the advantages of VLSM are clear, let us analyze what features routing protocols have to offer in order to support VLSM deployment.

The first requirement is that routing protocols must carry extended network prefix information along with each route advertisement. This allows each subnetwork to be advertised with its corresponding prefix length or mask. As noted in Section 3.5, some protocols, such as OSPF, Integrated IS-IS, and RIPv2, provide this feature. RIPv1, on the other hand, allows only a single subnet mask to be used within each network number, because it does not provide subnet mask information as part of its routing table update messages. In this case, a router would have to either assume that the locally configured prefix length should be used (which cannot guarantee that the correct prefix will be applied) or perform a lookup in a statically configured prefix table containing all the masking information (but static tables raise severe scalability issues, require nonnegligible maintenance effort, and are error-prone). As a result, to successfully deploy VLSM in a large, complex network, the designer must choose an IGP such as OSPF, IS-IS, or RIPv2, while RIPv1 should be avoided.

The second requirement for deploying VLSM is that all routers must adopt a consistent forwarding algorithm based on the longest match. When VLSM is implemented, a destination address may match multiple routes in a router’s routing table. As a route with a longer extended network prefix is more specific than a route with a shorter one, routers must always choose the route with the longest matching extended network prefix when forwarding traffic.

The third requirement to support VLSM is related to route aggregation and consists of assigning addresses so that they have topological significance. This means that addresses have to reflect the hierarchical network topology. In general, network topology follows continental and national boundaries, so IP addresses should be assigned on this basis. If the organizational topology does not match the network topology, route aggregation should not be applied. While it is reasonable to aggregate a pool of addresses assigned to a particular region of the network into a single routing advertisement, it is not meaningful to group together addresses that are not topologically significant. Wherever route aggregation cannot be applied, the size of the routing tables cannot be reduced.

The solution that allows today’s Internet to operate normally despite the problems related to the depletion of the IPv4 address space and to the growing size of Internet routing tables is classless interdomain routing (CIDR), which is described below.
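The longest-match rule of the second requirement can be illustrated with a linear scan over a small routing table. This is a sketch only: the route entries, next-hop names, and the function `longest_match` are hypothetical, and real routers use specialized lookup structures rather than a list scan.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]   # (destination prefix, next hop)


def longest_match(routes: List[Route], destination: str) -> Optional[str]:
    """Return the next hop of the matching route with the longest prefix."""
    dest = ipaddress.IPv4Address(destination)
    best: Optional[Route] = None
    for network, next_hop in routes:
        # A route matches if the destination falls inside its prefix;
        # among matching routes, the longer prefix is more specific.
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best[1] if best else None


# Hypothetical table in which one destination can match several routes.
ROUTES: List[Route] = [
    (ipaddress.IPv4Network("11.0.0.0/8"), "router-a"),
    (ipaddress.IPv4Network("11.1.0.0/16"), "router-b"),
    (ipaddress.IPv4Network("11.1.2.0/24"), "router-c"),
]
```

The destination 11.1.2.5 matches all three entries, and the /24 route is selected because it has the longest extended network prefix; 11.1.9.9 matches only the /8 and /16 routes, so the /16 route is used.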

3.7.5 Classless Interdomain Routing

CIDR, also called supernetting, relaxes the traditional rules of classful IP addressing: with CIDR, the netID is no longer constrained to 8, 16, or 24 bits, but may be any number of bits long. This realizes the so-called classless addressing [Hinden93][Rekht93a][Fuller93][Rekht93b]. In the CIDR model, a prefix length specifies the number of leading bits in the 32-bit address that represent the network portion of the address. For example, the network address in the dotted decimal form a.b.c.d/21 indicates that the first 21 bits specify the netID, while the remaining 11 bits identify the specific hosts in the organization. As a result, CIDR supports the deployment of arbitrarily sized networks rather than the standard 8-bit, 16-bit, or 24-bit network numbers associated with classful addressing. Moreover, the rightmost 11 bits could be further divided through subnetting [Mogul85], so that new internal networks within the a.b.c.d/21 network can be created. For route advertising, prefix length is used instead of the traditional high-order bits scheme. Routers supporting CIDR therefore rely on the prefix length information provided with the route.
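The arithmetic behind the /21 example can be checked with Python's standard `ipaddress` module. The block 10.4.8.0/21 is a hypothetical stand-in for a.b.c.d/21, chosen only so that the example is concrete.

```python
import ipaddress

# Hypothetical /21 block standing in for a.b.c.d/21.
block = ipaddress.ip_network("10.4.8.0/21")

host_bits = 32 - block.prefixlen        # bits left to identify hosts
address_count = block.num_addresses     # 2 ** host_bits addresses in the block
netmask = str(block.netmask)            # the same prefix written as a mask

# The rightmost 11 bits can be subnetted further, here into /24 networks.
internal = list(block.subnets(new_prefix=24))
```

With a 21-bit prefix, 11 bits remain for hosts, giving 2048 addresses and the mask 255.255.248.0. Subnetting the block with a 24-bit extended prefix produces eight internal networks, from 10.4.8.0/24 to 10.4.15.0/24.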


FIGURE 3.7 CIDR and route summarization. (Figure labels: 192.0.0.0/8; 192.169.0.0/16; 192.168.1.0/24, 192.168.2.0/24, 192.168.3.0/24; 192.169.1.0/24.)

CIDR also supports route aggregation. As a result, with CIDR a single routing table entry can represent the address space of thousands of traditional classful routes. That is, in route advertisement, networks can be combined into supernets, as long as they have a common network prefix (Figure 3.7). This is crucial to reduce the size of Internet backbone router routing tables and to simplify routing management.

The implementation of CIDR in the Internet is mainly based on the BGP-4 protocol. The Internet is divided into addressing domains in a hierarchical way. Within a domain, detailed information about all the networks belonging to the domain is available, while outside the domain only the common network prefix is advertised. This allows a single routing table entry to specify a route to many individual network addresses.

CIDR and VLSM are similar, since both allow a portion of the IP address space to be recursively divided into smaller pieces. Both approaches require that the extended network prefix information be provided with each route advertisement, and both use longest-match forwarding. The key difference between VLSM and CIDR is where the recursion is performed. In VLSM the subdivision of addresses is done after the address range is assigned to the user (e.g., a private enterprise’s network). In CIDR the subdivision of addresses is done by the Internet authorities and ISPs before the user receives the addresses. CIDR deployment also imposes the same routing protocol requirements as VLSM.

Although CIDR, in combination with network address translation (NAT), represents an acceptable short-term solution to the deficiencies of today’s Internet, the long-term solution is to redesign the address format to allow for more possible addresses and more efficient routing on the Internet. This is the reason for a new version of IP, called IP version 6 (IPv6) [Deeri98], which was devised to overcome the limitations of IPv4.
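Route aggregation into supernets can be demonstrated with `ipaddress.collapse_addresses` from the Python standard library. The first list of four contiguous /24 networks is an illustrative assumption; the second list reuses the three 192.168.x.0/24 labels that appear in Figure 3.7.

```python
import ipaddress

# Four contiguous /24 networks sharing a common 22-bit prefix (illustrative).
specific = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
summary = list(ipaddress.collapse_addresses(specific))

# Three networks that do not fill an aligned block cannot become one prefix.
partial = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in (1, 2, 3)]
partial_summary = list(ipaddress.collapse_addresses(partial))
```

The four networks collapse into the single advertisement 192.168.0.0/22. The three networks from the figure collapse only into 192.168.1.0/24 and 192.168.2.0/23, which shows why addresses have to be assigned in aligned, contiguous blocks for aggregation to pay off.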

3.8 IPv6 Overview

IPv6 [Deeri98], the new version of IP, solves the problem of the limited number of available IPv4 addresses and also adds many improvements in areas such as routing, network self-configuration, and QoS support. The IPv6 protocol has been streamlined to expedite packet handling by network nodes and provides support for congestion control, reducing the need for reliable but untimely higher-level protocols (e.g., TCP). Moreover, IPv6 characteristics can be exploited inside routers to entrust them with the task of providing diversified scheduling for real-time and non-real-time flows.

The IPv6 header format is different from the IPv4 header format, and this makes the two protocols incompatible. Although IPv6 addresses are four times longer than IPv4 addresses, the IPv6 header is only twice the size of the IPv4 header, as several functions present in the IPv4 header have been relocated to extension headers or dropped [Deeri98].


For example, in IPv6 there is no checksum field. This speeds up routing, as routers are relieved from recalculating the checksum for each incoming packet. IPv6 also provides the hop limit field, which indicates the number of hops for which the packet remains valid. Used to limit the impact of routing loops, the hop limit field is decremented by 1 by each node that forwards the packet. The packet is discarded if the hop limit reaches zero. In IPv4 there was a time-to-live field, but it was expressed in seconds. The change to a hop count reduces the processing time within routers.

Another difference is that the basic IPv6 header is 40 bytes long and can be supplemented by a number of extension headers of variable length. The length in bytes of the payload that follows the header is specified in the payload length field. Note that IPv6 options can be of arbitrary length (and are not limited to 40 bytes, as in IPv4). Another interesting field in the IPv6 header is the flow label [Partri95], which enables the source to label packets for special handling by intermediate systems.

The important features of IPv6 can therefore be summarized as follows [Deeri98]:

• A new addressing scheme.
• No fragmentation/reassembly at the intermediate routers: These time-consuming operations are left to the source and destination, thus improving IP forwarding within the network.
• Simplified header and fixed-length options in the header: Compared to IPv4, these features reduce the processing overhead on routers.
• Support for extension headers and options: IPv6 options are placed in separate headers located in the packet between the IPv6 header and the transport layer header. Most IPv6 option headers are not processed by any router along a path before the packet arrives at its final destination, and this improves router performance for packets containing options.
• Quality-of-service capabilities: These are crucial for the future evolution of the Internet. A new capability enables labeling of packets belonging to specific traffic flows for which the sender has requested special handling, such as nondefault quality of service or real-time service.
• Support for authentication and privacy: An extension provides support for authentication and data integrity.
• Support for source routes: IPv6 includes an extended source routing header designed to support source-initiated selection of routes (used to complement the route selection provided by existing routing protocols for both interdomain and intradomain routes).

Some extension headers are exploited for routing purposes. The routing header is used by a source to list one or more intermediate nodes to be visited on the way to a packet’s destination (more details will be given below). This particular form of the routing header is designed to support source demand routing (SDR) [Estrin96]. The hop-by-hop options header is used to carry optional information that must be examined by every node along a packet’s delivery path. Finally, the end-to-end options header is used to carry optional information that needs to be examined only by a packet’s destination node (or nodes).
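The 40-byte header and the hop limit handling described above can be sketched with Python's `struct` module. The field layout follows the IPv6 specification cited in the text (version, traffic class, and a 20-bit flow label in the first 32 bits, then payload length, next header, hop limit, and the two 128-bit addresses); the function names `build_header` and `forward` are hypothetical, and the traffic class is simply left at zero in this sketch.

```python
import ipaddress
import struct
from typing import Optional

HEADER_FORMAT = "!IHBB16s16s"   # 4 + 2 + 1 + 1 + 16 + 16 = 40 bytes


def build_header(payload_length: int, next_header: int, hop_limit: int,
                 source: str, destination: str, flow_label: int = 0) -> bytes:
    """Pack a basic IPv6 header (traffic class left at zero)."""
    first_word = (6 << 28) | (flow_label & 0xFFFFF)   # version 6 + flow label
    return struct.pack(
        HEADER_FORMAT, first_word, payload_length, next_header, hop_limit,
        ipaddress.IPv6Address(source).packed,
        ipaddress.IPv6Address(destination).packed,
    )


def forward(header: bytes) -> Optional[bytes]:
    """Decrement the hop limit; return None if the packet must be discarded."""
    hop_limit = header[7]        # hop limit is the eighth byte of the header
    if hop_limit <= 1:
        return None              # the hop limit would reach zero: discard
    # No checksum has to be recomputed, unlike in IPv4.
    return header[:7] + bytes([hop_limit - 1]) + header[8:]
```

A header built with a hop limit of 2 can be forwarded once; the second forwarding attempt returns `None`, which models the packet being discarded when the hop limit reaches zero.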

3.8.1 IPv6 Addressing, Subnetting, and Routing

The address field in IPv6 is 128 bits long. This not only provides far more possible IP addresses than the 32-bit IPv4 address, but also allows more levels in the addressing hierarchy and simpler autoconfiguration. IPv6 addresses are assigned to interfaces, not nodes. A node can have several interfaces, and therefore it can be identified by any of the unicast addresses assigned to any of its interfaces. IPv6 supports unicast, anycast, and multicast addresses. With unicast addressing, the packet is delivered to the interface identified by the specific address. With anycast addressing, the packet is delivered to one of the interfaces identified by that address, i.e., the nearest one, according to the routing metric adopted. Anycast can be considered


a refinement of unicast devised for simplifying and streamlining the routing process. When used as part of a route sequence, anycast addresses permit a node to select which of several ISPs it wants to carry its traffic. This capability is referred to as source-selected policies. Finally, with multicast addressing, the packet is delivered to all the interfaces identified by that address. With multicast and anycast addressing, addressed interfaces typically belong to different nodes.

The address can be expressed in three different forms: preferred, compressed, and mixed. The preferred form is the full IPv6 address in hexadecimal values, which is H:H:H:H:H:H:H:H, where each H refers to a hexadecimal integer (16 bits). The compressed form replaces a string of zeros of arbitrary length with a shorthand indicator, a double colon (::). This is useful because IPv6 addresses containing long strings of zeros are quite common. The mixed form is represented as H:H:H:H:H:H:D.D.D.D, where the Hs represent the hexadecimal values of the six high-order 16-bit parts of the address, while the Ds stand for the standard IPv4 decimal representation of the four low-order 8-bit parts of the address. This mixed form is useful in hybrid IPv4/IPv6 environments.

There are two special addresses, the unspecified address and the loopback address. The first is 0:0:0:0:0:0:0:0, which indicates the absence of an address and must never be assigned to any node or used as a destination address. The second is the special unicast address 0:0:0:0:0:0:0:1, which may be used by a node to send a packet to itself.

There are six types of unicast IPv6 addresses. Here we only mention aggregatable global unicast addresses, which can be routed globally on the IPv6 backbone, i.e., the 6Bone, and are equivalent to public IPv4 addresses. Aggregatable global unicast addresses can be aggregated or summarized to produce an efficient routing infrastructure.
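The three textual forms and the two special addresses can be explored with Python's standard `ipaddress` module. The sample addresses below are illustrative assumptions; note that `exploded` pads every 16-bit group to four hexadecimal digits, which is one way of writing the preferred form.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:db8:0:0:8:800:200c:417a")

preferred = addr.exploded       # eight 16-bit groups, zero-padded
compressed = addr.compressed    # longest run of zero groups replaced by "::"

# Mixed form: the four low-order bytes written as an IPv4 dotted decimal address.
mixed = ipaddress.IPv6Address("::13.1.68.3")

# The two special addresses.
unspecified = ipaddress.IPv6Address("0:0:0:0:0:0:0:0")
loopback = ipaddress.IPv6Address("0:0:0:0:0:0:0:1")
```

Here `compressed` is `2001:db8::8:800:200c:417a`, the low-order 32 bits of `mixed` equal the IPv4 address 13.1.68.3, and the unspecified and loopback addresses compress to `::` and `::1`, respectively.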
IPv6 subnetting can be compared to classless addressing. As in the CIDR notation, the prefix length indicates the leading bits that constitute the netID. IPv6 routing is hierarchical and reflects the classless concept. However, with IPv6, small and regional network service providers and end users are no longer able to obtain address space directly from numbering authorities such as IANA [IANA]. Only top-level aggregators (TLAs) (i.e., large ISPs) will be assigned address space from the Internet Registry. TLAs will be assigned address blocks, which they will in turn handle and delegate to their downstream connections, i.e., next-level aggregators (NLAs) (medium-size ISPs or specific customer sites) and site-level aggregators (SLAs) (individual organizations) [3COM]. With this new hierarchical architecture, the number of entries to be maintained in the routing tables of Internet core routers is reduced, thus limiting the routing complexity of the future Internet.

IPv6 embeds simple routing extensions that support powerful new routing functionalities, such as:

• Provider selection (according to criteria such as policy, performance, cost, etc.)
• Host mobility (route to current location)
• Auto-readdressing (route to new address)

The new routing functionality is obtained by creating sequences of IPv6 addresses using the IPv6 routing option. The routing option is used by an IPv6 source to list one or more intermediate nodes to be visited on the way to a packet’s destination. (This function is similar to IPv4’s loose source and record route options.) To enable address sequences, IPv6 hosts are required in most cases to reverse the route in a packet they receive containing address sequences (if the packet was successfully authenticated using the IPv6 authentication header) in order to return the packet to its originator. The address sequence facility of IPv6 is simple but powerful.
As an example, if host H1 were to decide to enforce a policy that all packets to/from host H2 should only go through a given provider, ISPx, it would construct a packet containing the following address sequence: H1, ISPx, H2. This ensures that when H2 replies to H1, it will reverse the route and the reply will go through ISPx. The addresses in H2’s reply would be: H2, ISPx, H1.
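The route reversal in this example can be written as a one-line list operation. The function name `reply_sequence` is hypothetical, and the fallback used when the packet was not authenticated (replying directly, without the intermediate nodes) is an assumption made only for this sketch, since the text does not specify that case.

```python
from typing import List


def reply_sequence(received: List[str], authenticated: bool) -> List[str]:
    """Build the address sequence of the reply to a received packet."""
    if authenticated:
        # Reverse the full sequence so the reply retraces the same path.
        return list(reversed(received))
    # Assumption for this sketch: without authentication, reply directly.
    return [received[-1], received[0]]


outbound = ["H1", "ISPx", "H2"]
reply = reply_sequence(outbound, authenticated=True)
```

With the sequence H1, ISPx, H2 from the example, the authenticated reply carries H2, ISPx, H1, so the provider selected by H1 is also used on the return path.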


FIGURE 3.8 Dual stack.

3.8.2 IPv6 Deployment in the Current Internet: State of the Art and Migration Issues

3.8.2.1 Transition from IPv4 to IPv6

The two versions of IP are not compatible, so they cannot coexist in the same subnet. This makes the spread of the protocol difficult. No company, organization, or university would switch to IPv6 if it meant turning the network off until all the nodes and routers were updated. To cope with these problems, the IETF has standardized an almost painless transition mechanism called SIT (simple Internet transition) [Gillig96]. The new protocol has been provided with properties that allow a simple, fast transition and mechanisms that allow the two protocols to coexist. IPv6 thus has the following properties:

• Incremental update: IPv4 nodes can be updated to IPv6 one at a time.
• Minimal dependence in update operations: The only requirement is that before a transition operation is performed on a host, it has to be performed on the Domain Name System (DNS) server; there are no requirements for transition on routers.
• Easy addressing: At the moment of transition, the same addresses can be used simply by transforming them. A class of addresses that exploits IPv4 addresses (IPv4-compatible addresses) has been provided.
• Low initialization costs: Little effort is required to update IPv4 to IPv6 and initialize new systems supporting IPv6.

In addition, two cooperation mechanisms have been provided: dual stack and tunneling.

3.8.2.1.1 Dual Stack

Dual-stack gateways support both IPv4 and IPv6, implementing the two protocols completely, so as to allow IPv6 nodes to correctly receive traffic from nodes using IPv4 (as shown in Figure 3.8). These gateways receive IPv4 packets, replace the header with an IPv6 one, and relaunch them to the IPv6 subnet. Dual-stack nodes have at least two addresses, an IPv4 one and an IPv6 one, that can be related. For example, the IPv6 address may be IPv4 compatible, but not necessarily.
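The relation between an IPv4 address and an IPv4-compatible IPv6 address can be sketched as follows, assuming the usual layout in which the 32-bit IPv4 address occupies the low-order bits of an otherwise all-zero 128-bit address. The function names are hypothetical.

```python
import ipaddress


def ipv4_compatible(ipv4: str) -> ipaddress.IPv6Address:
    """Place the 32-bit IPv4 address in the low-order bits of a zero IPv6 address."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(ipv4)))


def embedded_ipv4(ipv6: ipaddress.IPv6Address) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from the low-order 32 bits."""
    return ipaddress.IPv4Address(int(ipv6) & 0xFFFFFFFF)
```

For instance, the IPv4 address 192.0.2.1 becomes the IPv6 address written in mixed form as ::192.0.2.1, and `embedded_ipv4` recovers the original IPv4 address from it.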
3.8.2.1.2 Tunneling

This mechanism actually predates IPv6 and was devised to solve certain communication problems in networks using different protocols. It consists of creating a virtual tunnel between two network nodes. In practice, a whole packet arriving at the first node (from the network header onward) is inserted into the payload of another packet. In IPv6 the routers providing tunneling are similar to the dual-stack gateways discussed above. They encapsulate IPv6 packets in IPv4 packets, as can be seen in Figure 3.9. This allows IPv6 packets to pass through networks that do not support this protocol. At the destination there has to be a router that performs the inverse operation, i.e., opens the IPv4 packets, extracts the IPv6 packets, and sends them to the destination subnet. This mechanism is shown in Figure 3.10, where routers RA and RB handle the tunnel.

This mechanism is fundamental for the worldwide spread of IPv6: it allows communication between IPv6 islands through the IPv4 network. IPv6 islands are the subnets using IPv6 instead of IPv4. To communicate with each other through the IPv4 network, they have to use tunneling. There are two kinds of tunneling:


FIGURE 3.9 Tunneling.

FIGURE 3.10 Tunneling.

• Configured tunneling: When the packet’s destination is not at the end of the tunnel. There are two modes:
  • Router to Router: The tunnel interconnects two IPv6 routers that are neither the source nor the destination of the IPv6 packet. In this case, the tunnel is part of the path the packet has to cover.
  • Host to Router: The tunnel interconnects an IPv6 host, which is the source of the packet, to an IPv6 router, which is not the packet’s destination. In this case, the tunnel is the first part of the path the packet has to cover.
• Automatic tunneling: When the packet’s destination is at the end of the tunnel. Again, there are two modes:
  • Host to Host: The tunnel connects two IPv6 hosts; the first is the source of the packet and the second its destination. In this case, the tunnel is the whole path the packet has to cover.
  • Router to Host: The tunnel connects an IPv6 router and an IPv6 host; the former is not the source of the packet, but the latter is its destination. In this case, the tunnel is the final part of the path the packet has to cover.

3.8.2.2 An Experimental Network: The 6Bone

The first lab experiments on networking solutions based on IPv6 soon led to worldwide experimentation and the introduction in 1996 of the 6Bone network [6Bone]. 6Bone (IPv6 backbone) is an experimental IPv6 network, parallel to the network using IPv4 and realized by interconnecting IPv6 labs via tunneling. It became a reality in March 1996 with the setting up of a first tunnel between the IPv6 labs of G6 [G6] (France), UNI-C [UNI-C] (Denmark), and WIDE [WIDE] (Japan). 6Bone has seen continuous growth in the number of interconnected labs and is the environment in which the most interesting IPv6 protocol experiments are being carried out: verification of the maturity of implementations, handling of the addressing spaces assigned to experimental providers, IPv6 routing, etc. The network is organized as a three-layer hierarchy.
At the highest level are the sites making up the 6Bone backbone, i.e., the portion of the network on which most of the geographical connectivity enjoyed by the other connected sites is based. At the next level down are the so-called 6Bone transit sites, i.e., sites that are connected to at least one of the backbone sites but which, in turn, operate as network access points for nodes that do not have a direct tunnel toward the backbone. The latter are the lowest level in the current 6Bone hierarchy and are called leaves. Connectivity between the backbone sites is ensured by a large number of tunnels on the Internet and some direct links forming an arbitrary mesh topology within which the routing of IPv6 packets is based on the BGPv4+ dynamic routing protocol [Bates00] (a version of BGP4 that is capable of supporting both IPv4 and IPv6).

3.9 Conclusions

The adoption of IPv6 has been limited up to now, as it requires modification of the entire infrastructure of the Internet. More effort on the transition is therefore necessary to make it as simple as possible and open the way for the potential of IPv6. IPv6 is expected to gradually replace IPv4, with the two coexisting for a number of years during a transition period, thanks to tunneling and dual-stack techniques. Currently, IPv6 is implemented on the 6Bone network [6Bone], a collaborative project involving Europe, North America, and Japan. Studies about how to exploit novel IPv6 features have already appeared, and companies are also interested in IPv6 technology to overcome IPv4 limitations [Ficher03][LoBello03][VaSto03]. Migration from IPv4 to IPv6 is expected to be gradual, for reasons such as the following:

• Increased memory requirements for network addresses in intermediate devices, such as routers and switches
• The extra load on domain name systems (DNSs), which need to maintain and provide both of the addresses that each IPv6 host will have during the transition, i.e., a 32-bit IPv4 address and a 128-bit IPv6 one
• The need to redesign the user interfaces of current TCP/IPv4 applications and services, which are based on traditional 32-bit addresses and therefore have to be adapted to work with the larger IPv6 addresses

Nevertheless, all the major router vendors have already started to enable IPv6 implementation on their systems. Among the routing protocols supporting IPv6 are RIPng [Malkin97], OSPFv3 [Coltun99], Integrated IS-ISv6 [Hopps03], and MP-BGPv6 [Marque99]. More details can be found in [Cisco02b].

References

[3COM] Understanding IP Addressing, White Paper, 3COM Corporation, www.3com.com.
[6Bone] http://www.6bone.net.
[Awdu02] D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, X. Xiao, RFC 3272, Overview and Principles of Internet Traffic Engineering, May 2002.
[Bates00] T. Bates, Y. Rekhter, R. Chandra, D. Katz, RFC 2858, Multiprotocol Extensions for BGP-4, June 2000.
[Brade87] R. Braden, J. Postel, RFC 1009, Requirements for Internet Gateways, June 1987.
[CBT] RFC 2189, Core-Based Tree (CBT Version 2) Multicast Routing: Protocol Specification, September 1997.
[CBTv2] RFC 2201, Core-Based Tree (CBT) Multicast Routing Architecture, September 1997.
[Chen99] E. Chen, J. Stewart, RFC 2519, A Framework for Inter-Domain Route Aggregation, February 1999, ftp://ftp.rfc-editor.org/in-notes/rfc2519.txt.
[Cisco97] Cisco Systems, Integrating Enhanced IGRP into Existing Networks, 1997, http://www.cisco.com (search for the document title).
[Cisco02a] Cisco Systems, Inc., Enhanced IGRP, http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/en_igrp.htm.
[Cisco02b] Cisco IOS Learning Services, The ABCs of IP Version 6, 2002, www.cisco.com/go/abc.
[Cisco03] Cisco Systems, Inc., Internetworking Technologies Handbook, Cisco Press, Indianapolis, 2003.
[Callon90] R.W. Callon, RFC 1195, Use of OSI IS-IS for Routing in TCP/IP and Dual Environments, December 1990.
[Coltun99] R. Coltun, D. Ferguson, J. Moy, RFC 2740, OSPF for IPv6, December 1999.
[Dalal78] Y.K. Dalal and R.M. Metcalfe, Reverse path forwarding of broadcast packets, Communications of the ACM, 21(12), 1040–1048, December 1978.
[Deeri88] S. Deering, Multicast routing in internetworks and extended LANs, ACM Computer Communication Review, 18(4), Proceedings of ACM SIGCOMM’88, pp. 55–64, Stanford, Aug. 16–19, 1988.
[Deeri98] S. Deering, R. Hinden, RFC 2460, Internet Protocol, Version 6 (IPv6) Specification, December 1998.
[Dijks59] E.W. Dijkstra, A note on two problems in connection with graphs, Numer. Math., 1, 269–271, 1959.
[Droms97] R. Droms, RFC 2131, Dynamic Host Configuration Protocol, March 1997.
[Egeva94] K. Egevang, P. Francis, RFC 1631, The IP Network Address Translator (NAT), May 1994.
[Estrin96] D. Estrin, T. Li, Y. Rekhter, K. Varadhan, D. Zappala, RFC 1940, Source Demand Routing: Packet Format and Forwarding Specification (Version 1), May 1996.
[Estrin98] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei, RFC 2362, Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification, June 1998.
[Fenne97] W. Fenner, RFC 2236, Internet Group Management Protocol, Version 2, November 1997.
[Ficher03] S. Fichera, S. Visalli, O. Mirabella, QoS Support for Real-Time Flows in Internet Routers, paper presented at RTLIA ’03, 2nd International Workshop on Real-Time LANs in the Internet Age, Satellite Workshop of the 15th Euromicro Conference on Real-Time Systems (ECRTS03), Porto, Portugal, June 2003.
[Floyd97] S. Floyd, V. Jacobson, Synchronization of periodic routing messages, IEEE/ACM Transactions on Networking, 2, 122–136, 1997.
[Fuller93] V. Fuller, T. Li, J. Yu, K. Varadhan, RFC 1519, Classless Inter-Domain Routing (CIDR): An Address Assignment and Aggregation Strategy, September 1993.
[G6] http://www.g6.asso.fr.
[Garcia93] J.J. Garcia-Luna-Aceves, Loop-free routing using diffusing computations, IEEE/ACM Transactions on Networking, 1, 130–141, 1993.
[Gillig96] R. Gilligan, E. Nordmark, RFC 1933, Transition Mechanisms for IPv6 Hosts and Routers, April 1996.
[Hakimi71] S.L. Hakimi, Steiner’s problem in graphs and its implications, Networks, 1, 113–133, 1971.
[Halabi00] B. Halabi, D. McPherson, Internet Routing Architectures, Cisco Press, Indianapolis, 2000.
[Hedri88] C.L. Hedrick, RFC 1058, Routing Information Protocol, June 1988.
[Hedri91] C.L. Hedrick, An Introduction to IGRP, August 1991, http://www.cisco.com/warp/public/103/5.html.
[Hinden93] R. Hinden, Editor, RFC 1517, Applicability Statement for the Implementation of Classless Inter-Domain Routing (CIDR), September 1993.
[Hopps03] C.E. Hopps, Routing IPv6 with IS-IS, January 2003, draft-ietf-isis-ipv6-05.txt.
[Huitem95] C. Huitema, Routing in the Internet, Prentice Hall, 1995.
[IANA] Internet Assigned Number Authority homepage, http://www.iana.org/.
[Kenyon02] T. Kenyon, Data Networks, Digital Press, Elsevier Science, 2002.
[Kumar96] V. Kumar, Mbone: Interactive Media on the Internet, New Riders Publishing, Indianapolis, 1996.
[Kurose01] J.F. Kurose, K. Ross, Computer Networking, Addison-Wesley, Reading, MA, 2001.
[Lewis99] C. Lewis, Cisco TCP/IP Routing Professional Reference, McGraw-Hill Companies, New York, 1999.
[LoBello03] L. Lo Bello, S. Fichera, S. Visalli, O. Mirabella, Congestion Control Mechanisms for Multi-Hop Network Routers, paper presented at IEEE International Conference on Emerging Technologies and Factory Automation ETFA2003, Lisbon, Portugal, October 2003.
[Malkin97] G. Malkin, R. Minnear, RFC 2080, RIPng for IPv6, January 1997.
[Malkin98] G. Malkin, RFC 2453/STD 0056, RIP Version 2, November 1998.
[Marque99] P. Marques, F. Dupont, RFC 2545, Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing, March 1999.
[Mogul85] J. Mogul, J. Postel, RFC 950, Internet Standard Subnetting Procedure, August 1985.
[Moy89] J. Moy, RFC 1131, OSPF Specification, October 1989.
[Moy88] J. Moy, RFC 2328, OSPF Version 2, April 1998.
[Moy94] J. Moy, RFC 1584, Multicast Extensions to OSPF, March 1994.
[Nicho03] J. Nicholas, W. Siadak, Protocol Independent Multicast - Dense Mode (PIM-DM): Protocol Specification (Revised), September 2003, draft-ietf-pim-dm-new-v2-04.txt.
[Partri95] C. Partridge, RFC 1809, Using the Flow Label Field in IPv6, June 1995.
[Pepel00] I. Pepelnjak, EIGRP Network Design Solutions, Cisco Press, Indianapolis, 2000.
[Perlm92] R. Perlman, Interconnections: Bridges and Routers, Addison-Wesley, Reading, MA, 1992.
[Poste81] J. Postel, RFC 791, Internet Protocol, September 1981.
[Rekht93a] Y. Rekhter, T. Li, RFC 1518, An Architecture for IP Address Allocation with CIDR, September 1993.
[Rekht93b] Y. Rekhter, C. Topolcic, RFC 1520, Exchanging Routing Information across Provider Boundaries in the CIDR Environment, September 1993.
[Rekhr95] Y. Rekhter, T. Li, RFC 1771, A Border Gateway Protocol 4 (BGP-4), March 1995.
[Rosen82] E.C. Rosen, RFC 0827, Exterior Gateway Protocol, October 1982.
[Schulz03] H. Schulzrinne, S. Casner, R. Frederick, RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003.
[Trotter01] G. Trotter, RFC 3222, Terminology for Forwarding Information Base (FIB) Based Router Performance, December 2001.
[UNI-C] http://www.uni-c.dk.
[VaSto03] P. Van der Stok, M. van Hartskamp, Robust Real-Time IP-Based Multimedia Communication, paper presented at RTLIA ’03, 2nd International Workshop on Real-Time LANs in the Internet Age, Satellite Workshop of the 15th Euromicro Conference on Real-Time Systems (ECRTS03), Porto, Portugal, June 2003.
[Waitz88] D. Waitzman, C. Partridge, RFC 1075, Distance Vector Multicast Routing Protocol, November 1988.
[Wall80] D. Wall, Mechanisms for Broadcast and Selective Broadcast, Ph.D. Dissertation, Stanford University, June 1980.
[Waxm88] B.M. Waxman, Routing of multipoint connections, IEEE Journal of Selected Areas in Communications, 6, 1617–1622, 1988.
[Wei93] L. Wei, D. Estrin, TR USC-CD-93-560, A Comparison of Multicast Trees and Algorithms, Department of Computer Science, University of Southern California, Los Angeles, September 1993.
[WIDE] http://6bone.v6.wide.ad.jp.


4 Fundamentals in Quality of Service and Real-Time Transmission

Wolfgang Kampichler
Frequentis GmbH

4.1 What Is Quality of Service?
4.2 Factors Affecting the Network Quality
    Bandwidth • Throughput • Latency • Queuing Delay • Transmission Delay • Propagation Delay • Processing Delay • Jitter • Packet Loss
4.3 QoS Delivery
    FIFO Queuing • Priority Queuing • Class-Based Queuing • Weighted Fair Queuing
4.4 Protocols to Improve QoS
    Integrated Services • Differentiated Services • Multi-Protocol Label Switching • Combining QoS Solutions
4.5 Protocols Supporting Real-Time Traffic
    Real-Time Transport Protocol • Real-Time Transport Control Protocol • Real-Time Streaming Protocol
References

4.1 What Is Quality of Service?

It is difficult to find an adequate definition of what quality of service (QoS) actually is. There is a danger that because we wish to use quantitative methods, we might limit the definition of QoS to only those aspects of QoS that can be measured and compared. In fact, there are many subjective and perceptual elements to QoS, and there has been a lot of work done trying to map the perceptual to the quantifiable (particularly in the telephony industry). However, as yet there does not appear to be a standard definition of what QoS actually is in measurable terms.

When considering the definition of QoS, it might be helpful to look at the old story of the three blind men who happen to meet an elephant on their way. The first man touches the elephant’s trunk and determines that he has stumbled upon a huge serpent. The second man touches one of the elephant’s massive legs and determines that the object is a large tree. The third man touches one of the elephant’s ears and determines that he has stumbled upon a huge bird. All three of the men envision different things, because each man examines only a small part of the elephant. In this case, think of the elephant as the concept of QoS. Different people see QoS as different concepts, because various and ambiguous QoS problems exist. Hence, there is more than one way to characterize QoS. Briefly described, QoS is the ability of a network element (e.g., an application, host, or router) to provide some level of assurance for consistent and timely network data delivery [3].

By nature, the basic Internet Protocol (IP) service available in most of the network is best effort. For instance, from a router’s point of view (upon receiving a packet), this service could be described as follows:

• It determines first where to send the incoming packet (the next-hop of the packet). This is usually done by looking up the destination address in the forwarding table.
• Once it is aware of the next-hop, it will send the packet to the interface associated with this next-hop. If the interface is not able to immediately send the packet, it is stored on the interface in an output queue.
• If the queue is full, the arriving packet is dropped. If the queue already contains packets, the newcomer is subjected to extra delay due to the time needed to emit the older packets in the queue.

Best effort allows the complexity to stay in the end hosts, so the network can remain relatively simple. This scales well, as evidenced by the ability of the Internet to support its growth. As more hosts are connected, the network degrades gracefully. Nevertheless, the resulting variability in delivery delay and packet loss does not adversely affect typical Internet applications (e.g., e-mail or file transfer). For applications with real-time requirements, however, delay, delay variation, and packet loss will cause problems. Generally, applications are of two main types:

• Applications that generate elastic traffic: The application would rather wait for reception of traffic in the correct order, without loss, than display incoming information at a constant rate (such as an e-mail).
• Applications that generate inelastic traffic: Timeliness of information is more important to the application than zero loss, and traffic that arrives after a certain delay is essentially useless (such as voice communication).
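The best-effort forwarding steps listed earlier in this section (next-hop lookup, output queuing, and dropping when the queue is full) can be sketched as follows. This is only an illustration under stated assumptions: the class and function names, the dictionary-based packet, and the exact-match forwarding table are hypothetical simplifications (a real router performs a longest-prefix match), and the table here maps a destination directly to an output interface.

```python
from collections import deque


class OutputInterface:
    """Hypothetical output interface with a bounded FIFO output queue."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # maximum number of queued packets
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet):
        # If the queue is full, the arriving packet is dropped (tail drop).
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        # Otherwise it waits behind the older packets already queued.
        self.queue.append(packet)
        return True


def forward(packet, forwarding_table, interfaces):
    """Look up the next hop for the destination and queue the packet there."""
    next_hop = forwarding_table.get(packet["dst"])  # exact match, for brevity
    if next_hop is None:
        return False  # no route known: the packet cannot be forwarded
    return interfaces[next_hop].enqueue(packet)


interfaces = {"eth1": OutputInterface("eth1", capacity=2)}
forwarding_table = {"10.0.0.7": "eth1"}
results = [
    forward({"dst": "10.0.0.7", "seq": i}, forwarding_table, interfaces)
    for i in range(3)
]
print(results, interfaces["eth1"].dropped)
```

With a queue capacity of two and nothing draining the queue, the third packet is the one that is discarded, which is the loss behavior that best effort accepts by design.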
In an IP-based network, applications run across User Datagram Protocol (UDP) or Transmission Control Protocol (TCP) connections. TCP guarantees delivery, doing so at the cost of some overhead and per-connection sequencing of traffic. It also throttles back transmission rates to behave gracefully in the face of network congestion. By contrast, UDP is connectionless; thus, no guarantee of delivery is made, and sequencing of information is left to the application itself. Most elastic applications use TCP for transmission, whereas many inelastic applications use UDP as a real-time transport. Inelastic applications are often those that demand a preferential class of service or some form of reservation to behave properly. However, many of the mechanisms that network devices use (such as traffic discard or TCP session control) are less effective on UDP-based traffic, since it does not offer some of TCP’s self-regulation.

In a best-effort network, all packets are treated equally. There are no guarantees, no differentiation, and no attempt at enforcing fairness. However, the network should try to forward as much traffic as possible with reasonable quality. One way to provide a guarantee to some traffic is to treat those packets differently from packets of other types of traffic. Increasing bandwidth is seen as a necessary first step for accommodating real-time applications, but it is still not enough. Even on a relatively unloaded network, delivery delays can vary enough to continue to affect time-sensitive applications adversely. To provide an appropriate service, some level of quantitative or qualitative determinism must be added to network services. This requires adding some “intelligence” to the net, to distinguish traffic with strict timing requirements from other traffic.
Yet there remains a further challenge: in the real world, the end-to-end communication path consists of different elements utilizing several network layers and traversing domains managed by different service providers. Therefore, it is unlikely that QoS protocols will be used independently, and in fact, they are designed for use with other QoS technologies to provide top-to-bottom and end-to-end QoS between senders and receivers. What does matter is that each element has to provide QoS control services and the ability to map other QoS technologies in the correct manner. The following gives a brief overview of end-to-end network behavior and some key QoS protocols and architectures. For a detailed description, refer to other chapters in this book.

FIGURE 4.1 Network end-to-end communication path. (Two 100-Mbps Ethernet LANs connected through routers to a 155-Mbps STM-1 WAN; delay components indicated: processing delay, queuing delay, transmission delay, propagation delay.)

4.2 Factors Affecting the Network Quality

A typical end-to-end communication path might look like that illustrated in Figure 4.1 and consist of two machines, each connected through a local area network (LAN) to an enterprise network. Further, these networks might be connected through a wide area network (WAN). The data exchange can be anything from a short e-mail message to a large file transfer, an application download from a server, or communication data from a time-sensitive application. While networks, especially local area networks, have been getting faster, perceived throughput at the application has not always increased accordingly. An application is generally running on a host CPU, and its performance is a function of the processing speed, memory availability, and overall operating system load. In many situations, it is the processing that is the real limiting factor on throughput, rather than the infrastructure that is moving data [14].

Network interface hardware transfers incoming packets from the network to the computer’s memory and informs the operating system that a packet has arrived. Usually, the network interface uses the interrupt mechanism to do so. The interrupt causes the CPU to suspend normal processing temporarily and to jump to code called a device driver. The device driver informs the protocol software that a packet has arrived and must be processed. Similar operations occur in each intermediate network node. Routing devices pass packets along a chain of hops until the final address is reached. These hops are routing machines of various kinds that generally maintain a queue (or multiple queues) of outgoing packets on each outgoing physical port [2]. If these queues of outgoing data packets become full, a routing machine simply starts to discard packets randomly to ease the buildup of congestion. It is evident that such nodes are customized for forwarding operations, which are mostly processed in hardware.
In recent years, however, the Internet has seen increasing use of applications that rely on the timely, regular delivery of packets, and that cannot tolerate the loss of packets or the delay caused by waiting in queues. In general, the one-way delay is equivalent to the sum of single-hop delays suffered between each pair of consecutive pieces of equipment encountered on the path. Measurable factors [7], [8] that are used to describe network QoS are as follows.

4.2.1 Bandwidth

Bandwidth (better described as data rate in this context) is the transmission capacity of a communications line, which is usually stated in bits per second. In reality, as data exchange approaches the maximum limit (in a shared environment), delays and collisions might mean a drop in quality. Basically, the bandwidths of all networks utilized in an end-to-end path need to be considered, as the narrowest section determines the maximum speed of data transfer for the entire path. A routing device needs to be capable of transmitting data at a rate commensurate with the potential bandwidth of the network segments that it is servicing. The cost of bandwidth has fallen in recent years, but demand has obviously gone up.
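As a small illustration of the "narrowest section" rule, the sketch below takes the link rates of the path in Figure 4.1 (two 100-Mbps LANs and a 155-Mbps STM-1 WAN) and returns the bottleneck rate. The function name is hypothetical, and the result is an upper bound on the achievable data rate, not a prediction of actual throughput.

```python
def path_bandwidth_bps(link_rates_bps):
    """The slowest link bounds the data rate of the whole end-to-end path."""
    return min(link_rates_bps)


# LAN, WAN, LAN as drawn in Figure 4.1.
path = [100e6, 155e6, 100e6]
bottleneck = path_bandwidth_bps(path)
print(f"bottleneck: {bottleneck / 1e6:.0f} Mbps")
```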


TABLE 4.1 Queuing Delays

Number of Queued       STM-1         STM-4         Gigabit Ethernet
1000-Bit Packets       (155 Mbps)    (622 Mbps)    (1 Gbps)
40 (80% load)          256 µs        64 µs         40 µs
80 (85% load)          512 µs        128 µs        80 µs
200 (93% load)         1280 µs       320 µs        200 µs
500 (97% load)         3200 µs       800 µs        500 µs

4.2.2 Throughput

Throughput is the average amount of traffic actually transferred over a given link in a given time span, expressed in bits per second. It can be seen, for congestion-aware transport protocols such as TCP, as transport capacity = (data sent)/(elapsed time), where data sent represents the unique data bits transferred (i.e., not including header bits or emulated header bits). It should also be noted that the amount of data sent should only include the unique number of bits transmitted (i.e., if a particular packet is retransmitted, the data it contains should be counted only once). Hence, in such a case, the throughput is limited not only by the transmission window, but also by the value of the round-trip time.
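The sketch below illustrates the two statements above. The first function applies the definition transport capacity = (data sent)/(elapsed time), counting unique payload bits only. The second uses the commonly cited bound that a window-based protocol can deliver at most one window per round-trip time; that formula is not spelled out in the text and is added here as an assumption, as are the function names and the sample numbers.

```python
def throughput_bps(unique_payload_bits, elapsed_s):
    """Transport capacity: unique data bits (no headers, no retransmissions) per second."""
    return unique_payload_bits / elapsed_s


def window_limited_bps(window_bytes, rtt_s):
    """Upper bound when at most one window can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


# 1 MB of unique payload delivered in 2 s.
measured = throughput_bps(unique_payload_bits=8_000_000, elapsed_s=2.0)

# A 65,535-byte window over a 100-ms round trip.
bound = window_limited_bps(window_bytes=65_535, rtt_s=0.1)

print(f"measured: {measured / 1e6:.2f} Mbps, window/RTT bound: {bound / 1e6:.2f} Mbps")
```

Doubling the round-trip time halves the bound even though the link rate is unchanged, which is why the round-trip time appears alongside the window as a limiting factor.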

4.2.3 Latency

In general, latency is the time taken to transmit a packet from a sending to a receiving node. This encompasses delay in a transmission path or in a device within the transmission path. The nodes might be end stations or intermediate routers. Within a single router, latency is the amount of time between the receipt of a data packet and its transmission, which includes processing and queuing delay, as described next, among other sources of delay.

4.2.4 Queuing Delay

The major random component of delay (that is, the only source of jitter) for a given end-to-end path consists of queuing delay in the network. Queuing delay depends on the number of hops in the path and the queuing mechanisms used, and it also increases with the offered load, leading to packet loss if the queues fill up. The last packet in the queue has to wait (N*8)/X seconds before being emitted by the interface, where N is the number of bytes that have to be sent before the last queued packet and X is the sending rate (bit/s). Typical queuing delay values of state-of-the-art routers are summarized in Table 4.1. Values are about 0.5 to 1 ms; thus, it can be said that queuing delay in a well-dimensioned backbone network (using priority scheduling mechanisms, as described later) would not dramatically increase latency, even if there are five to eight hops within the path. At this point, it should be mentioned that queuing delay may be considerably worse at edge routers connecting high- and low-bandwidth links, where it could easily reach tens of milliseconds, thus increasing latency more noticeably.
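The waiting time (N*8)/X can be checked with a few lines of code. The sketch below evaluates it for the packet counts and link rates used in Table 4.1, assuming 1000-bit (125-byte) packets; the function name is hypothetical, and the nominal rates of 155 Mbps, 622 Mbps, and 1 Gbps are used, so the results are approximate.

```python
def queuing_delay_s(bytes_ahead, rate_bps):
    """Waiting time (N*8)/X for the last packet, with N bytes queued ahead of it."""
    return bytes_ahead * 8 / rate_bps


PACKET_BYTES = 125  # one 1000-bit packet
RATES_BPS = {"STM-1": 155e6, "STM-4": 622e6, "Gigabit Ethernet": 1e9}

for packets in (40, 80, 200, 500):
    delays_us = {
        name: queuing_delay_s(packets * PACKET_BYTES, rate) * 1e6
        for name, rate in RATES_BPS.items()
    }
    row = ", ".join(f"{name}: {us:.0f} us" for name, us in delays_us.items())
    print(f"{packets} packets -> {row}")
```

For 40 queued packets on a 1-Gbps link the formula gives 40 microseconds, and even 500 queued packets on an STM-1 link stay in the low-millisecond range.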

4.2.5 Transmission Delay

Transmission, or serialization, delay is the time taken to transmit all the bits of the frame containing the packet, i.e., the time between emission of the first bit of the frame and emission of the last bit (see also [4]). It is inversely proportional to the line speed; in other words, it is the ratio of packet size (bits) to transmission rate (bps). For example, transmission of a 1500-byte packet over a 10-Mbps link takes 1.2 ms, whereas for a 64-kbps link it takes 187.5 ms (the protocol overhead is not considered in either case). In general, a small packet size and a high transmission rate lower the transmission time.

4.2.6 Propagation Delay

Propagation delay is the time between emission (by the emitting equipment) of the first bit (or the last bit) and the reception of this bit by the receiving equipment. It is mainly a function of the speed of light and the distance traveled. For local area networks, propagation delay is almost negligible. For wide area connections, it typically adds 2 ms per 250 miles to the total end-to-end delay. One can assume that a well-designed homogeneous high-speed backbone network (e.g., STM-4*) would have a network delay (only propagation and queuing taken into account) of 10 ms when considering 10 hops using priority queuing mechanisms and a network extension of about 625 miles.
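The 10-ms backbone estimate can be reproduced by adding up the delay components. The sketch below uses the rule of thumb of 2 ms per 250 miles from the text; the per-hop queuing delay of 0.5 ms is an assumption taken from the lower end of the 0.5 to 1 ms range quoted for queuing delay, and the function names are hypothetical. A serialization-delay helper is included to check the 1500-byte example from the transmission delay discussion.

```python
MS_PER_MILE = 2.0 / 250.0  # rule of thumb from the text: 2 ms per 250 miles


def propagation_delay_ms(distance_miles):
    return distance_miles * MS_PER_MILE


def transmission_delay_ms(packet_bytes, rate_bps):
    """Serialization delay: packet size in bits divided by the line rate."""
    return packet_bytes * 8 / rate_bps * 1e3


def backbone_delay_ms(distance_miles, hops, queuing_ms_per_hop):
    # Only propagation and queuing are counted, as in the text's estimate.
    return propagation_delay_ms(distance_miles) + hops * queuing_ms_per_hop


estimate = backbone_delay_ms(625, hops=10, queuing_ms_per_hop=0.5)
print(f"backbone delay: {estimate:.1f} ms")
print(f"1500 bytes at 10 Mbps: {transmission_delay_ms(1500, 10e6):.1f} ms")
print(f"1500 bytes at 64 kbps: {transmission_delay_ms(1500, 64e3):.1f} ms")
```

Half of the estimate comes from propagation (5 ms for 625 miles) and half from queuing (10 hops at 0.5 ms each), so on long paths the distance term cannot be engineered away by faster routers.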

4.2.7 Processing Delay

Most networks use a protocol suite that provides connectionless data transfer end to end, in our case IP. Link layer communication is usually implemented in hardware, but IP will usually be implemented in software, executing on the CPU in a communicating end station. Normally, IP performs very few functions. On input of a packet, it checks the header for correct form, extracts the protocol number, and calls the upper-layer protocol function. The executed path is almost always the same. On output, the operation is very similar, as shown in the following IP instruction counts:

• Packet receipt: 57 instructions
• Packet sending: 61 instructions

Since input occurs at interrupt time, arbitrary procedures cannot be called to process each packet. Instead, the system uses a queue along with message-passing primitives to synchronize communication. When an IP datagram arrives, the interrupt software must enqueue the packet and invoke a send primitive to notify the IP process that a datagram has arrived. When the IP process has no packets to handle, it calls the receive primitive to wait for the arrival of another datagram. Once the IP process accepts an incoming datagram, it must decide where to send it for further processing. If the datagram carries a TCP segment, it must go to the TCP module; if it carries a UDP datagram, it is forwarded to the UDP module.

Because TCP is complex, most designs use a separate process to handle incoming segments. A consequence of having separate IP and TCP processes is that they must use an interprocess communication mechanism when they interact. Once TCP receives a segment, it uses the protocol port numbers to find the connection to which the segment belongs. If the segment contains data, TCP will add the data to a buffer associated with the connection and return an acknowledgment to the sender. If the incoming segment carries an acknowledgment for outbound data, the input process must also communicate with the TCP timer process to cancel the pending retransmission.

The process structure used to handle an incoming UDP datagram is quite different from that used for TCP. As UDP is much simpler than TCP, the UDP software module does not execute as a separate process. Instead, it consists of conventional procedures executed by the IP process to handle an incoming UDP datagram. These procedures examine the destination UDP port number and use it to select an operating system queue for the incoming datagram. The IP process deposits the UDP datagram on the appropriate port, where an application program can extract it [15].
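The demultiplexing path described above can be sketched as a single dispatch function. This is a simplified model, not the implementation discussed in [15]: the dictionary-based datagram, the queue names, and the use of a plain queue in place of interprocess message passing are all assumptions made for illustration. The protocol numbers 6 (TCP) and 17 (UDP) are the values carried in the IP header.

```python
from collections import defaultdict, deque

IPPROTO_TCP, IPPROTO_UDP = 6, 17  # protocol numbers carried in the IP header

tcp_input_queue = deque()             # stands in for the message queue of a separate TCP process
udp_port_queues = defaultdict(deque)  # one queue per destination UDP port


def ip_input(datagram):
    """Dispatch an incoming datagram according to its protocol number."""
    if datagram["proto"] == IPPROTO_TCP:
        # TCP runs as its own process, so the segment is handed over via a queue.
        tcp_input_queue.append(datagram)
        return "tcp"
    if datagram["proto"] == IPPROTO_UDP:
        # UDP is handled by ordinary procedures inside the IP process:
        # the destination port selects the queue read by the application.
        udp_port_queues[datagram["dst_port"]].append(datagram["payload"])
        return "udp"
    return "unsupported"


handled = [
    ip_input({"proto": IPPROTO_UDP, "dst_port": 5004, "payload": b"voice frame"}),
    ip_input({"proto": IPPROTO_TCP, "dst_port": 80, "payload": b"GET /"}),
]
print(handled, list(udp_port_queues[5004]), len(tcp_input_queue))
```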

4.2.8 Jitter

Jitter is best described as the variation in end-to-end delay, and it has its main source in the random component of the queuing delay. Jitter can be expressed as the distortion of interpacket arrival times when compared to the interpacket departure times from the original sending station. For instance, packets are sent out at regular intervals, but may arrive at varying irregular intervals. Jitter is the variation in interval times. When packets are taking multiple paths to reach their destination, extreme jitter can lead to packets arriving out of order. Jitter is generally measured in milliseconds, or as a percentage of variation from the average latency of a particular connection.
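One way to make the comparison of interpacket arrival and departure times concrete is sketched below. For each pair of consecutive packets it subtracts the sending interval from the receiving interval, then summarizes the result as a mean absolute value. The function name, the summary statistic, and the sample timestamps are assumptions for illustration; other jitter definitions exist.

```python
def interval_jitter_ms(send_times_ms, recv_times_ms):
    """Difference between inter-arrival and inter-departure intervals, per packet pair."""
    send_gaps = [b - a for a, b in zip(send_times_ms, send_times_ms[1:])]
    recv_gaps = [b - a for a, b in zip(recv_times_ms, recv_times_ms[1:])]
    return [r - s for s, r in zip(send_gaps, recv_gaps)]


# Packets leave every 20 ms but arrive at irregular intervals.
send = [0, 20, 40, 60]
recv = [50, 72, 89, 110]

jitter = interval_jitter_ms(send, recv)
mean_abs = sum(abs(j) for j in jitter) / len(jitter)
print(jitter, mean_abs)
```

Note that the constant 50-ms offset between sender and receiver cancels out: only the variation of the delay contributes, not the delay itself.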

*Synchronous digital hierarchy (SDH) defines n transport levels (hierarchy) called a synchronous transport module-n (STM-n).


FIGURE 4.2 Classification, queuing, and scheduling.

4.2.9 Packet Loss

Packets that fail to arrive, or arrive so late that they are useless, contribute to packet loss. Lost (or dropped) packets are a product of insufficient bandwidth on at least one routing device on the network path. Some packets may arrive, but have been corrupted in transit and are therefore unusable. Note that loss is relative to the volume of data that is sent and is usually expressed as a percentage of data sent. In some contexts, a high loss percentage can mean that the application is trying to send too much information and is overwhelming the available bandwidth. Packet loss starts to be a real problem when the percentage of loss exceeds a specific threshold or when loss occurs in bursts. Thus, it is important to know both the percentage of lost packets and their distribution [5].
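Since both the loss percentage and the distribution of losses matter, a receiver-side check might report the two together. The sketch below assumes that packets carry consecutive sequence numbers starting at zero and uses the longest run of consecutive losses as a simple stand-in for the loss distribution; the function name and the sample data are hypothetical.

```python
def loss_stats(sent_count, received_seqs):
    """Return (loss percentage, longest run of consecutively lost packets)."""
    received = set(received_seqs)
    lost = [seq for seq in range(sent_count) if seq not in received]

    longest = current = 0
    previous = None
    for seq in lost:
        # Extend the current burst if this loss directly follows the previous one.
        current = current + 1 if previous is not None and seq == previous + 1 else 1
        longest = max(longest, current)
        previous = seq

    return 100.0 * len(lost) / sent_count, longest


# Ten packets sent; sequence numbers 3, 4, 5, and 8 never arrive.
loss_pct, longest_burst = loss_stats(10, [0, 1, 2, 6, 7, 9])
print(loss_pct, longest_burst)
```

Two connections with the same loss percentage can behave very differently for a voice application if one of them loses packets in long bursts, which is why the second value is reported alongside the first.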

4.3 QoS Delivery

As packet-switched networks are operated in a store-and-forward paradigm, a solution for service differentiation in the forwarding process is to give priority to packets requiring, for instance, an upper-bounded delay over other packets. Considering that queuing is the central component in the internal architecture of a forwarding device, it is not difficult to imagine that managing such queuing mechanisms appropriately is crucial for providing the underlying QoS. Hence, queuing can be seen as one of the fundamental parts for differentiating service levels. The queuing delay can be minimized and kept under a certain value, even in the case of interface congestion. To achieve this, the forwarding device has to support classification, queuing, and scheduling (CQS) techniques: to classify packets according to a traffic type and its requirements, to place packets on different queues according to this type, and finally to schedule outgoing packets by selecting them from the queues in an appropriate manner (see Figure 4.2). The following descriptions of queuing disciplines focus on output queuing strategies, this being the predominant strategic location for store-and-forward traffic management and QoS-related queuing [3], common to all QoS policies. Queuing should never happen permanently and continuously; instead, it is used to deal with occasional traffic peaks.

4.3.1 FIFO Queuing

First-in, first-out (FIFO) queuing is considered to be the standard method for store-and-forward handling of traffic from an incoming interface to an outgoing interface. Many router vendors have highly optimized forwarding performance that makes this standard behavior as fast as possible. When a network operates in a mode with a sufficient level of transmission capacity and adequate levels of switching capability, FIFO queuing is highly efficient. This is because as long as the queue depth remains sufficiently short, the average packet-queuing delay is an insignificant fraction of the end-to-end packet transmission time. Otherwise, when the load on the network increases, transient bursts cause significant queuing delay, and if the queue is full, all subsequent packets are discarded.

4.3.2 Priority Queuing

One of the first queuing variations to be widely implemented was priority queuing. This is based on the concept that certain types of traffic can be identified and shuffled to the front of the output queue so that some traffic is always transmitted ahead of other types of traffic. Priority queuing may have an adverse effect on forwarding performance because of packet reordering (non-FIFO queuing) in the output queue. This method offers several levels of priority, and the granularity in identifying traffic to be classified into each queue is very flexible. Although the level of granularity is fairly robust, the more differentiation attempted, the more impact on computational overhead and packet-forwarding performance. Another possible vulnerability in this queuing approach is that if the volume of high-priority traffic is unusually high, normal traffic to be queued may be dropped because of buffer starvation. This usually occurs because too many packets are waiting to be queued and there is not enough room in the queue to accommodate them.
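A strict-priority scheduler can be sketched with one FIFO queue per priority level, always serving the highest non-empty level first. The class below is a minimal illustration under assumed names and sizes; in particular, it gives every level its own fixed-size buffer, which is only one of several possible buffer arrangements, and classification is reduced to a priority number supplied by the caller.

```python
from collections import deque


class PriorityScheduler:
    """Strict priority: level 0 is always served before level 1, and so on."""

    def __init__(self, levels, capacity_per_queue):
        self.queues = [deque() for _ in range(levels)]
        self.capacity = capacity_per_queue

    def enqueue(self, packet, priority):
        queue = self.queues[priority]
        if len(queue) >= self.capacity:
            return False  # no room left at this level: the packet is dropped
        queue.append(packet)
        return True

    def dequeue(self):
        # Scan from the highest priority down; lower levels wait
        # for as long as any higher level still holds packets.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


sched = PriorityScheduler(levels=2, capacity_per_queue=4)
for packet, priority in [("bulk-1", 1), ("voice-1", 0), ("bulk-2", 1), ("voice-2", 0)]:
    sched.enqueue(packet, priority)

order = [sched.dequeue() for _ in range(4)]
print(order)
```

The voice packets leave first even though a bulk packet arrived before them. If high-priority traffic kept arriving, the bulk queue would never be served and would eventually overflow, which is the starvation risk noted above.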

4.3.3 Class-Based Queuing Another queuing mechanism introduced several years ago is called class-based queuing (CBQ) or custom queuing (CQ). This, too, is a well-known mechanism used within operating system design, intended to prevent complete resource denial to any particular class of service. CBQ is a variation of priority queuing in which several output queues can be defined. CBQ provides a mechanism to configure how much traffic can be drained off each queue in a servicing rotation. This servicing algorithm is an attempt to provide some semblance of fairness by prioritizing queuing services for certain types of traffic, while not allowing any one class of traffic to monopolize system resources. CBQ can be considered a primitive method of differentiating traffic into various classes of service, and for several years, it has been considered an efficient method for queue resource management. However, CBQ does not scale to provide the desired performance in some circumstances, primarily because of the computational overhead of packet reordering and intensive queue management in networks with very high speed links.
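One way to picture a servicing rotation is a round-robin pass in which each class may send up to a configured number of bytes. The sketch below assumes exactly that; the function name service_rotation, the class names, and the quota values are hypothetical, and real implementations differ in how they account for the last packet of a turn.

```python
from collections import deque


def service_rotation(queues, byte_quota):
    """Run one servicing rotation over all class queues.

    queues:     dict mapping class name -> deque of packet sizes in bytes
    byte_quota: dict mapping class name -> bytes the class may send per rotation
    Returns the list of (class name, packet size) pairs sent in this rotation.
    """
    sent = []
    for name, queue in queues.items():
        budget = byte_quota[name]
        # Drain the queue until its quota for this rotation is used up.
        while queue and budget > 0:
            size = queue.popleft()
            budget -= size            # assumption: the last packet may overshoot the quota
            sent.append((name, size))
    return sent


queues = {
    "control": deque([100, 100]),
    "video": deque([1000, 1000, 1000]),
    "bulk": deque([1500, 1500, 1500]),
}
quota = {"control": 500, "video": 2000, "bulk": 1500}

first = service_rotation(queues, quota)
second = service_rotation(queues, quota)
```

Every class with queued packets is served in each rotation, so no class is denied service completely, while the quotas decide how the link capacity is divided.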

4.3.4 Weighted Fair Queuing Weighted fair queuing (WFQ) is another popular method of queuing that algorithmically attempts to deliver predictable behavior and to ensure that traffic flows do not encounter buffer starvation. It gives low-volume traffic flows preferential treatment and allows higher-volume traffic flows to obtain equity in the remaining amount of queuing capacity. WFQ uses a servicing algorithm that attempts to provide predictable response times and negate inconsistent packet transmission timing, which is done by sorting and interleaving individual packets by flow, and queuing each flow based on the volume of traffic in each flow [6]. Typically, low-bandwidth streams, such as Voice-over-IP (VoIP), are given priority over larger-bandwidth consumers such as file transfer. The weighted aspect of WFQ depends on the way in which the servicing algorithm is affected by other extraneous criteria. This aspect is usually vendor specific, and at least one implementation uses the IP precedence bits in the type-of-service (ToS) field (or DiffServ Code Point (DSCP), as described later) to weight the method of handling individual traffic flows. WFQ shares a weakness with priority and class-based queuing: it does not scale to provide the desired performance in some circumstances, primarily because of computational overhead. However, if these methods of queuing (priority, CBQ, and WFQ) are moved completely into hardware instead of being done in software, the impact on forwarding performance can be reduced greatly.
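The sorting and interleaving step can be sketched with per-packet finish tags. The code below is a simplified, self-clocked approximation and not the algorithm of any specific vendor: each packet receives a tag equal to its start time plus its size divided by the weight of its flow, and the packet with the smallest tag is sent first. The class name, flow names, and weights are assumptions for illustration.

```python
import heapq


class WeightedFairQueue:
    """Simplified fair queuing based on per-packet finish tags."""

    def __init__(self, weights):
        self.weights = weights                        # flow id -> relative weight
        self.last_finish = {flow: 0.0 for flow in weights}
        self.virtual_time = 0.0
        self.heap = []                                # (finish tag, arrival no., flow, size)
        self.arrivals = 0

    def enqueue(self, flow, size):
        # A packet starts after the previous packet of its flow, or now if the flow is idle.
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + size / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.arrivals, flow, size))
        self.arrivals += 1

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, flow, size = heapq.heappop(self.heap)
        self.virtual_time = finish                    # self-clocking: advance to the served tag
        return flow, size


wfq = WeightedFairQueue({"voip": 1.0, "ftp": 1.0})
for _ in range(3):
    wfq.enqueue("ftp", 1500)      # large file-transfer packets arrive first
for _ in range(3):
    wfq.enqueue("voip", 100)      # small voice packets arrive afterward
order = [wfq.dequeue()[0] for _ in range(6)]
```

Even with equal weights, the small voice packets obtain smaller finish tags and overtake the file-transfer packets that arrived earlier, which is the preferential treatment of low-volume flows described above.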

4.4 Protocols to Improve QoS Delivering network QoS for a particular application implies minimizing the effects of sharing network resources (bandwidth, routers, etc.) with other applications. This means effective QoS aims to minimize delay, optimize throughput, and minimize jitter and loss. The reality is that network resources are shared with other competing applications. Some of the competing applications could also be time-dependent services (inelastic traffic); others might be the source of traditional, best-effort traffic. For this reason, QoS has the further goal of minimizing the parameters mentioned for a particular set of applications or users, but without adversely affecting other network users. In order to regulate network capacity, the network must classify traffic and then handle it in some way. The classification and handling may happen on a single device consisting of both classifiers and queues or routes. In a larger network, however, it is likely that classification will happen at the periphery, where devices can recognize application needs, while handling is performed at the core, where congestion occurs. The signaling between classifying devices and handling devices can take a number of forms, such as the ToS field of the IP header or other protocol extensions. Classification can be based on a variety of information sources, such as protocol content, media identifier, the application that generated the traffic, or extrinsic factors such as time of day or congestion levels. 
Similarly, handling can be performed in a number of ways:
• Through traffic shaping (traffic arrives and is placed in a queue, where its forwarding is regulated; excess traffic will be discarded)
• Through various queuing mechanisms (first-in, first-out, priority weighting, and class-based queuing)
• Through throttling using various flow control algorithms such as used in TCP
• Through the selective discard of traffic to notify transmitters of congestion
• Through packet marking for sending instructions to downstream devices that will shape the traffic
QoS protocols are designed to act that way, but they never create additional bandwidth; rather, they manage it to be used more effectively. Briefly summarized, QoS is the ability of a network element (e.g., an application, host, or router) to provide some level of assurance for consistent and timely network data delivery. The following sections give a brief overview of some of the key QoS protocols and architectures.
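As an illustration of the first handling method in the list above, the sketch below regulates traffic with a token bucket. The token bucket is a commonly used regulator that the text does not name, so it is an assumption here; strictly speaking, this variant polices (it discards excess packets rather than delaying them). The rate, burst size, and arrival times are made-up values.

```python
class TokenBucket:
    """Token-bucket regulator: conforming packets pass, excess packets are discarded."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes      # the bucket starts full
        self.last_time = 0.0

    def allow(self, now, packet_bytes):
        # Refill tokens for the elapsed time, capped at the burst size.
        elapsed = now - self.last_time
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_time = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False                   # excess traffic is discarded


bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
arrivals = [(0.0, 1000), (0.1, 1000), (1.0, 1000)]    # (time in s, size in bytes)
decisions = [bucket.allow(t, size) for t, size in arrivals]
```

The second packet arrives before enough tokens have accumulated and is rejected; the third arrives after the bucket has refilled and passes. No bandwidth is created, only the timing of admitted traffic is controlled.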

4.4.1 Integrated Services The Integrated Services (IntServ) architecture provides a framework for applications to choose between multiple controlled levels of delivery of services for their traffic flows. Two basic requirements exist to support this framework. The first is for the nodes in the traffic path to support the QoS control mechanisms and guaranteed services. The second is for a mechanism by which the applications can communicate their QoS requirements to the nodes along the transit path, as well as for the network nodes to communicate with each other about the requirements that must be provided for the particular traffic flow. All this is provided by a Resource Reservation Setup Protocol called RSVP [9], which is best described as the QoS signaling protocol. The information presented here is intended to be a qualitative description of the protocol, as in [3]. There is a logical separation between the Integrated Services QoS control services and RSVP. RSVP is designed to be used with a variety of QoS control services, and the QoS control services are designed to be used with a variety of setup mechanisms [11]. RSVP does not define the internal format of the protocol objects related to characterizing QoS control services; rather, it can be seen as the signaling mechanism transporting the QoS control information. RSVP is analogous to other IP control protocols, such as Internet Control Message Protocol (ICMP) or one of the many IP routing protocols. RSVP itself is not a routing protocol, but it uses the local routing table in routers to determine routes to the appropriate destinations.

FIGURE 4.3 Traffic flow of the RSVP Path and Resv messages.

In general terms, RSVP is used to provide QoS requests to all router nodes along the transit path of the traffic flows and to maintain the state necessary in the routers required to actually provide the requested services. RSVP requests generally result in resources being reserved in each router in the transit path for each flow. RSVP requires the receiver to be responsible for requesting specific QoS services, instead of the sender. This is an intentional design in RSVP that attempts to provide for efficient accommodation of large groups (e.g., multicast traffic), dynamic group membership (also for multicast), and diverse receiver requirements. There are two fundamental RSVP message types: the Resv message and the Path message, which provide for the basic RSVP operation, illustrated in Figure 4.3. An RSVP sender transmits Path messages downstream along the traffic path provided by a discrete routing protocol (e.g., Open Shortest-Path First (OSPF)). The Resv message is generated by the receiver and is transported back upstream toward the sender, creating and maintaining a reservation state in each node along the traffic path. RSVP can still function across intermediate nodes that are not RSVP capable. However, end-to-end resource reservations cannot be made, because non-RSVP-capable devices in the traffic path cannot maintain reservation or Path state in response to appropriate RSVP messages. Although intermediate nodes that do not run RSVP cannot provide these functions, they may have sufficient capacity to be useful in accommodating tolerant real-time applications. Since RSVP relies on a discrete routing infrastructure to forward RSVP messages between nodes, the forwarding of Path messages by non-RSVP-capable intermediate nodes is unaffected, since the Path message carries the IP address of the previous RSVP-capable node as it travels toward the receiver. RSVP is not a routing protocol by itself. 
RSVP is designed to operate with current and future unicast and multicast routing protocols. An RSVP process consults the local routing database(s) to obtain routes. In the multicast case, for example, a host sends IGMP messages to join a multicast group and then sends RSVP messages to reserve resources along the delivery path(s) of that group. Routing protocols determine where packets get forwarded; RSVP is only concerned with the QoS of those packets that are forwarded in accordance with routing (Figure 4.4). Summing up, Integrated Services is capable of bringing enhancements to the IP network model to support real-time transmissions and guaranteed bandwidth for specific flows. In this case, a flow is defined as a distinguishable stream of related datagrams from a unique sender to a unique receiver that results from a single user activity and requires the same QoS. The Integrated Services architecture promises precise per-flow service provisioning, but never really made it as a commercial end-user product, which was mainly attributed to its lack of scalability [16].
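The RSVP common header of Figure 4.4 can be decoded as sketched below. The field widths used here (4 bits each for version and flags, 8 bits for message type, send TTL, and the reserved field, 16 bits for checksum and length) are taken from RFC 2205 rather than from the text; the function name and the sample values are illustrative only, and the checksum is left at zero instead of being computed.

```python
import struct

RSVP_MESSAGE_TYPES = {1: "Path", 2: "Resv", 3: "PathErr", 4: "ResvErr",
                      5: "PathTear", 6: "ResvTear", 7: "ResvConf"}


def parse_rsvp_common_header(data):
    """Decode the 8-byte RSVP common header from raw bytes (network byte order)."""
    vers_flags, msg_type, checksum, send_ttl, _reserved, length = struct.unpack(
        "!BBHBBH", data[:8])
    return {
        "version": vers_flags >> 4,           # upper four bits of the first byte
        "flags": vers_flags & 0x0F,           # lower four bits of the first byte
        "message_type": RSVP_MESSAGE_TYPES.get(msg_type, "unknown"),
        "checksum": checksum,
        "send_ttl": send_ttl,
        "length": length,                     # total message length in bytes
    }


# Sample header: version 1, no flags, Path message, checksum not filled in,
# sent with TTL 64, length 8 (common header only, no objects).
header = struct.pack("!BBHBBH", (1 << 4) | 0, 1, 0, 64, 0, 8)
parsed = parse_rsvp_common_header(header)
```

A real message would carry variable-length objects after these eight bytes; the length field tells the receiver where the message ends.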

Bits 0 to 32, row 1: version | flags | message type | RSVP checksum
Bits 0 to 32, row 2: send TTL | (reserved) | RSVP length
• Version: The protocol version number; the current version is 1.
• Flags: No flag bits are defined yet.
• Message type: Possible values are 1 Path, 2 Resv, 3 PathErr, 4 ResvErr, 5 PathTear, 6 ResvTear, and 7 ResvConf.
• RSVP checksum: The checksum.
• Send TTL: The IP TTL value with which the message was sent.
• RSVP length: The total length of the RSVP message in bytes, including the common header and the variable-length objects that follow.
FIGURE 4.4 Resource Reservation Protocol (RSVP).

Bits 0 to 8: CP | CU
FIGURE 4.5 DiffServ Code Point.

4.4.2 Differentiated Services Differentiated Services (DiffServ) defines an architecture (RFC 2474 and 2475) for implementing scalable service differentiation in the Internet. Here, a service defines some significant characteristics of packet transmission in one direction across a set of one or more paths within a network. These characteristics may be specified in quantitative or statistical terms of throughput, delay, jitter, and loss, or may otherwise be specified in terms of some relative priority of access to network resources. Service differentiation is desired to accommodate heterogeneous application requirements and user expectations, and to permit differentiated pricing of Internet service. Differentiated Services mechanisms do not use per-flow signaling and, as a result, do not consume per-flow state within the routing infrastructure. Different service levels can be allocated to different groups of users, which means that all traffic is distributed into groups or classes with different QoS parameters. This reduces the maintenance overhead in comparison to Integrated Services. Network traffic is classified and apportioned to network resources according to bandwidth management criteria. To enable QoS, network elements give preferential treatment to classifications identified as having more demanding requirements. DiffServ provides a simple and coarse method of classifying services of applications. The main goal of DiffServ is a more scalable and manageable architecture for service differentiation in IP networks [13]. The initial premise was that this goal could be achieved by focusing not on individual packet flows, but on traffic aggregates, that is, large sets of flows with similar service requirements. By carefully aggregating a multitude of QoS-enabled flows into a small number of aggregates, giving a small number of differentiated treatments within the network, DiffServ eliminates the need to recognize and store information about each individual flow in core routers. This basic approach to scalability succeeds by combining a small number of simple packet treatments with a larger number of per-flow policies to provide a broad and flexible range of services. 
A description of the externally observable forwarding treatment applied at a Differentiated Services-compliant node to a behavior aggregate is defined as per-hop behavior (PHB). Each DiffServ flow is policed and marked at the first QoS-enabled downstream router according to a contracted service profile, or service-level agreement (SLA). Downstream from this router, a DiffServ flow is mingled with similar DiffServ traffic into an aggregate. Then, all further forwarding and policing activities are performed on these aggregates. Current proposals [12] are using a few bits of the IP version 4 (IPv4) ToS byte or the IPv6 traffic class byte, now called the DiffServ Code Point (DSCP) (Figure 4.5), for marking packets. The format of the header is as follows:
• CP: (Six-bit) Differentiated Services Code Point to select the PHB a packet experiences at each node
• CU: Currently unused
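The split of the byte into the six-bit CP and the two-bit CU field can be expressed with two shifts and a mask. The function names below are illustrative; the sample code point 101110 is the value recommended for expedited forwarding in RFC 3246.

```python
def split_ds_field(ds_byte):
    """Split the IPv4 ToS / IPv6 traffic class byte into (CP, CU)."""
    code_point = ds_byte >> 2      # upper six bits: select the per-hop behavior
    unused = ds_byte & 0b11        # lower two bits: currently unused (CU)
    return code_point, unused


def make_ds_field(code_point, unused=0):
    """Build the byte from a six-bit code point and the two CU bits."""
    if not 0 <= code_point <= 63:
        raise ValueError("the code point must fit in six bits")
    return (code_point << 2) | (unused & 0b11)


EF_CODE_POINT = 0b101110                 # recommended DSCP for expedited forwarding
ef_byte = make_ds_field(EF_CODE_POINT)   # value as it appears in the IP header
```

Marking a packet at the edge amounts to writing such a byte into the header; core routers only need the upper six bits to choose the forwarding treatment.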

(Diagram blocks: classifier, meter, marker, conditioner.)
FIGURE 4.6 Edge router: DiffServ classification and conditioning.

There are currently two standard per-hop behaviors defined that effectively represent two service levels (traffic classes):
• Expedited forwarding (EF): The objective of the EF PHB (RFC 3246) is to provide a service that is low loss, low delay, and low jitter, such that the service approximates a virtual leased line. The basic approach is to minimize the loss and delay experienced in the network by minimizing queuing delays. This can be done by ensuring that, at each node, the rate of departure of packets from the node is a well-defined minimum (shaping on egress points), and that the arrival rate at the node is always less than the defined departure rate (policing on ingress points). For example, to ensure that the incoming rate is always below the configured outgoing rate, any traffic that exceeds the traffic profile, which is defined by local policy, is discarded. Generally, expedited forwarding could be implemented in network nodes by a priority queue. The recommended DSCP value for the EF PHB is 101110; see [20].
• Assured forwarding (AF): The AF PHB is defined in RFC 2597. Its objective is to provide a service that ensures that high-priority packets are forwarded with a greater degree of reliability than lower-priority packets. AF defines four priorities (classes) of traffic, receiving different bandwidth levels (sometimes described as the Olympic services: gold, silver, bronze, and best effort). There are three drop precedences within each priority class, resulting in 12 different DSCP values. The higher the drop precedence, the greater the chance of being dropped during congestion. Hence, the AF PHB enables packets to be marked with different AF classes and, within each class, with different drop precedence values. Within a router, resources are allocated according to the different AF classes. If the resources allocated to a class become congested, then packets must be dropped. The packets to be dropped are those with higher drop precedence, as in [20].
Normally, the traffic into a DiffServ network from a particular source should conform to a particular traffic profile; thus, the rate of traffic should not exceed some preagreed maximum. In the event that it does, excess traffic is not delivered with as high a probability as the traffic within the profile, which means it may be demoted but not necessarily dropped. The PHBs are expected to be simple and define forwarding behaviors that may suggest, but do not require, a particular implementation or queuing discipline. In general, a classifier selects packets based on one or more predefined sets of header fields. The mapping of the network traffic to the specific behaviors is indicated by the DSCP. The traffic conditioners enforce the rules of each service at the network ingress point. Finally, PHBs are applied to the traffic by the conditioner at a network ingress point according to predetermined policy criteria. The traffic may be marked at this point and routed according to the marking, and then unmarked at the network egress. Each DiffServ-enabled edge router implements traffic conditioning functions, which perform metering, shaping, policing, and marking of packets to ensure that the traffic entering a DiffServ network conforms to the SLA, as illustrated in Figure 4.6. The simplicity of DiffServ to prioritize traffic belies its extensibility and power. Using RSVP parameters (as described in the previous section) or specific application types to identify and classify constant-bit-rate (CBR) traffic might help to establish well-defined aggregate flows that may be directed to fixed-bandwidth pipes. DiffServ is more scalable at the cost of coarser service granularity, which may be the reason why it is not yet commercially available to the end users; see also [16].
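The 12 AF code points follow a regular bit pattern in RFC 2597: the three high-order bits carry the class and the next two bits carry the drop precedence. The sketch below builds the table from that pattern; the function names and the helper that orders packets for discard are illustrative assumptions, not part of any standard API.

```python
def af_dscp(af_class, drop_precedence):
    """Return the recommended DSCP for AF class 1..4 and drop precedence 1..3."""
    if af_class not in (1, 2, 3, 4) or drop_precedence not in (1, 2, 3):
        raise ValueError("AF defines classes 1-4 and drop precedences 1-3")
    return (af_class << 3) | (drop_precedence << 1)


# Table of all twelve code points, keyed by the usual names AF11 ... AF43.
AF_TABLE = {f"AF{c}{d}": af_dscp(c, d) for c in range(1, 5) for d in range(1, 4)}


def discard_order(code_points):
    """Order the code points of one AF class by discard preference under congestion:
    the highest drop precedence comes first."""
    return sorted(code_points, key=lambda cp: (cp >> 1) & 0b11, reverse=True)


af22_bits = format(AF_TABLE["AF22"], "06b")
victims = discard_order([AF_TABLE["AF11"], AF_TABLE["AF13"], AF_TABLE["AF12"]])
```

Within class 1, a congested router following this ordering would discard AF13 packets before AF12 packets, and AF12 before AF11.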

Bits 0 to 32: label | exp | S | TTL
• Label: The label field carries the actual value of the label. When a labeled packet is received, the label value at the top of the stack is looked up to learn:
  • The next hop to which the packet is to be forwarded.
  • The operation to be performed on the label stack before forwarding; this operation may be to replace the top label stack entry with another, to pop an entry off the label stack, or to replace the top label stack entry and then push one or more additional entries onto the label stack.
• Exp: Reserved for experimental use.
• S: Bottom of stack. This bit is set to one for the last entry in the label stack, and zero for all other label stack entries.
• TTL: The time-to-live field is used to encode a time-to-live value.
FIGURE 4.7 MPLS label structure.

4.4.3 Multi-Protocol Label Switching As we have seen, IntServ and DiffServ take different approaches to solving the QoS challenge. Another approach exists that is slightly different but already in use: Multi-Protocol Label Switching (MPLS). In contrast to IntServ and DiffServ, it is not primarily a QoS solution, although it can be used to support QoS requirements. More specifically, MPLS has mechanisms to manage traffic flows of various granularities and is independent of the layer 2 and layer 3 protocols such as asynchronous transfer mode (ATM) and IP. MPLS provides a means to map IP addresses to simple, fixed-length labels used by different packet-forwarding and packet-switching technologies. Additionally, MPLS interfaces to existing routing and switching protocols, such as IP, ATM, Frame Relay, Resource Reservation Protocol (RSVP), Open Shortest-Path First (OSPF), and others. In MPLS, data transmission occurs on label-switched paths (LSPs). An LSP is a sequence of labels, one at each node along the path from the source to the destination. Several label distribution protocols are in use today, such as Label Distribution Protocol (LDP) or RSVP; labels can also be piggybacked on routing protocols like Border Gateway Protocol (BGP) and OSPF. High-speed switching of data is possible because the fixed-length labels are inserted at the very beginning of the packet or cell and can be used by hardware to switch packets quickly between links. MPLS is best viewed as a new switching architecture and is basically a forwarding protocol that simplifies routing in IP-based networks. It specifies a simple and scalable forwarding mechanism, since it uses labels instead of a destination address to make the routing decision. The label value that is placed in an incoming packet header is used as an index to the forwarding table in the router (Figure 4.7). 
This lookup requires only one access to the table, in contrast to the traditional routing table access that might require many lookups [1]. One of the most important uses of MPLS is in the area of traffic engineering, which can be summarized as the modeling, characterization, and control of traffic to meet specified performance objectives. Such performance objectives might be traffic oriented or resource oriented. The former deals with QoS and includes aspects such as minimizing delay, jitter, and packet loss. The latter deals with optimum usage of network resources, particularly network bandwidth. The current situation with IP routing and resource allocation is that the routing protocols are not well equipped to deal with traffic engineering issues. For example, a protocol such as OSPF can actually promote congestion because it tends to force traffic down the shortest route, although other acceptable routes might be less loaded. With MPLS, a set of flows that share specific attributes can be routed over a given path. This capability has the immediate advantage of steering certain traffic away from the shortest path, which is likely to become congested before other paths. In conclusion, we may say that label switching offers scalability to networks by allowing a large number of IP addresses to be associated with one or a few labels. This approach further reduces the size of address (actually label) tables and allows a router to support more users or to set up fixed paths for different types of traffic. Since the main attributes of label switching are fast relay of the traffic, scalability, simplicity, and route control, label switching can be a valuable tool to reduce latency and jitter for data transmission on packet-switched networks.

(Diagram labels: two seven-layer protocol stacks, each with Application, Presentation, Session, Transport, Network, Data link, and Physical layers; top-to-bottom QoS with QoS API, RSVP, DiffServ, and 802.1p; end-to-end QoS with QoS-enabled application, RSVP, DiffServ, MPLS, and 802.1p.)
FIGURE 4.8 QoS architecture.
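The label structure of Figure 4.7 and the single indexed lookup described for MPLS can be sketched as follows. The field widths (20-bit label, 3-bit Exp, 1-bit S, 8-bit TTL) come from RFC 3032, not from the text; the forwarding table contents, label values, and router names are purely hypothetical.

```python
import struct


def parse_label_entry(data):
    """Decode one 4-byte MPLS label stack entry (network byte order)."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,             # 20-bit label value
        "exp": (word >> 9) & 0x7,        # 3 experimental bits
        "bottom": (word >> 8) & 0x1,     # S bit: 1 on the last stack entry
        "ttl": word & 0xFF,              # 8-bit time to live
    }


def build_label_entry(label, exp, bottom, ttl):
    return struct.pack("!I", (label << 12) | (exp << 9) | (bottom << 8) | ttl)


# Hypothetical forwarding table: incoming label -> (operation, new label, next hop)
FORWARDING_TABLE = {
    17: ("swap", 42, "router-b"),
    18: ("pop", None, "router-c"),
}


def forward(stack):
    """Apply the table entry for the top label; stack[0] is the top of the stack."""
    operation, new_label, next_hop = FORWARDING_TABLE[stack[0]]   # one indexed lookup
    if operation == "swap":
        stack = [new_label] + stack[1:]
    elif operation == "pop":
        stack = stack[1:]
    return stack, next_hop


entry = parse_label_entry(build_label_entry(label=17, exp=5, bottom=1, ttl=64))
swapped = forward([17])
popped = forward([18, 99])
```

The forwarding decision needs only the top label as a key, which is what makes the operation easy to implement in hardware.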

4.4.4 Combining QoS Solutions The QoS solutions previously described take different approaches, and each has its advantages and disadvantages. The Integrated Services approach is based on a sophisticated background of research in QoS mechanisms and protocols for packet networks. However, the acceptance of IntServ by network providers and router providers has been quite limited, at least so far, mainly due to scalability and manageability problems [10]. The scalability problems arise because IntServ requires routers to maintain control and forwarding state for all flows passing through them. Maintaining and processing per-flow state for gigabit or terabit links, with many simultaneously active flows, is difficult from an implementation point of view. Hence, the IntServ architecture makes the management and accounting of IP networks significantly more complicated. Additionally, it requires new application-network interfaces and can only provide service guarantees when all elements in the flow’s path support IntServ. MPLS may be used as an alternative intradomain implementation technology. These architectures in combination can enable end-to-end QoS. End hosts may use RSVP requests with high granularity (e.g., bandwidth, jitter, threshold, etc.). Border routers at backbone ingress points can then map those RSVP reservations to a class of service indicated by a DSCP or to a dedicated MPLS path. At the backbone egress point, the RSVP provisioning may be honored again, to the final destination; see Figure 4.8. Such combinations clearly represent a trade-off between service granularity and scalability: as soon as flows are aggregated, they are not as isolated from each other as they possibly were in the IntServ part of the network. This means that, for instance, unresponsive flows can degrade the quality of responsive flows. 
The strength of a combination is the fact that it gives network operators another opportunity to customize their network and fine-tune it based on QoS and scalability demands, as stated in [16]. Until now, IP has provided a best-effort service in which network resources are shared equitably. Adding quality-of-service support to the Internet raises significant concerns, since it enables Differentiated Services that represent a significant departure from the fundamental and simple design principles that made the Internet a success. Nonetheless, there is a significant need for IP QoS, and protocols have evolved to address this need. The most viable solution today is a trade-off between protocol complexity and bandwidth scarcity with the following results:
• Different QoS levels are used in the core network (e.g., four MPLS levels).
• Applications at the user side are distinguished by DiffServ mechanisms.
• The marked user traffic is mapped to the appropriate core layers.
Finally, we should always bear in mind that an application-to-application guarantee not only depends on network conditions but also on the overall performance of each end system and the way of supporting real-time traffic, as discussed next.

4.5 Protocols Supporting Real-Time Traffic This section is intended to give a brief overview of protocols supporting end-to-end transport of real-time data. Note, however, that these protocols do not themselves provide any of the QoS guarantees previously described.

4.5.1 Real-Time Transport Protocol The Real-Time Transport Protocol (RTP) provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video or simulation data, over multicast or unicast network services. Real-time traffic examples are audio conversations between two people and playing individual video frames at the receivers as they are received from the transmitter. RTP itself, however, does not provide all of the functionality required for the transport of data, and therefore applications typically run RTP on top of UDP to make use of its multiplexing and checksum services. RTP is best described as an encapsulation protocol. The data field of the RTP packet carries the real-time traffic, and the RTP header contains information about the type of traffic that is transported [17]. RTP supports data transfer to multiple destinations using multicast distribution if provided by the underlying network, and may also be used with other suitable underlying network or transport protocols. RTP is described in the IETF’s RFC 3550 [18] specification as being a protocol providing end-to-end delivery services, such as payload type identification, time stamping, and sequence numbering, for data with real-time characteristics. RTP itself does not provide any mechanism to ensure timely delivery or provide other quality-of-service guarantees, but relies on lower-layer services to do so. It does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the underlying network is reliable and delivers packets in sequence. The sequence numbers included in RTP allow the receiver to reconstruct the sender’s packet sequence (Figure 4.9). RTP consists of two closely linked parts:
• The Real-Time Transport Protocol (RTP), to carry data that has real-time properties
• The Real-Time Transport Control Protocol (RTCP), to monitor the quality of service and to convey information about the participants in an ongoing session
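The fixed RTP header of Figure 4.9 can be decoded as in the following sketch. The bit widths (2-bit version, single-bit P, X, and M fields, 4-bit CSRC count, 7-bit payload type, 16-bit sequence number, 32-bit time stamp and SSRC) follow RFC 3550; the function name and the sample packet contents are assumptions for illustration.

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header and the CSRC list from raw bytes."""
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    # Each CSRC identifier is a 32-bit word following the fixed header.
    csrcs = struct.unpack(f"!{csrc_count}I", packet[12:12 + 4 * csrc_count])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(csrcs),
    }


# Sample packet: version 2, marker set, payload type 0, sequence number 7,
# time stamp 160, an arbitrary SSRC, followed by four payload bytes.
packet = struct.pack("!BBHII", 0x80, 0x80, 7, 160, 0x12345678) + b"\x00\x01\x02\x03"
rtp = parse_rtp_header(packet)
```

A receiver would compare successive sequence numbers to detect loss or reordering and use the time stamps to schedule playout.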

Bits 0 to 32, row 1: V | P | X | CC | M | PT | sequence number
Row 2: timestamp
Row 3: synchronization source (SSRC) identifier
Row 4 onward: contributing source (CSRC) identifiers ...
• V: Version. Identifies the RTP version (V = 2).
• P: Padding. When set, the packet contains one or more additional padding octets at the end that are not part of the payload.
• X: Extension bit. When set, the fixed header is followed by exactly one header extension, with a defined format.
• CSRC count (CC): Contains the number of CSRC identifiers that follow the fixed header (0 to 15 items, 32 bits each).
• M: Marker. The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream.
• Payload type (PT): Identifies the format of the RTP payload and determines its interpretation by the application. A profile specifies a default static mapping of payload type codes to payload formats. Additional payload type codes may be defined dynamically through non-RTP means.
• Sequence number: Increments by 1 for each RTP data packet sent and may be used by the receiver to detect packet loss and restore packet sequence.
• Time stamp: Reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations.
• SSRC: Synchronization source. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier.
• CSRC: Contributing sources identifier list. Identifies the contributing sources for the payload contained in this packet.
FIGURE 4.9 RTP header.

4.5.2 Real-Time Transport Control Protocol RTP usually works in conjunction with another protocol called the Real-Time Transport Control Protocol (RTCP), which provides minimal control over the delivery and quality of the data. It is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets, for example, using separate port numbers with UDP. RTCP performs four main functions:
• Feedback information: This is used to check the quality of the data distribution. During an RTP session, RTCP control packets are periodically sent by each participant to all the other participants. These packets contain information such as the number of RTP packets sent, the number of packets lost, etc., which the receiving application or any other third-party program can use to monitor network problems. The application might then change the transmission rate of the RTP packets to help reduce any problems.
• Transport-level identification: This is used to keep track of each of the participants in a session. RTCP carries a persistent transport-level identifier for an RTP source called the canonical name or CNAME. Since the SSRC identifier may change if a conflict is discovered or a program is restarted, receivers require the CNAME to keep track of each participant. It is also used to associate multiple data streams from a given participant in a set of related RTP sessions, e.g., the synchronization of audio and video.
• Transmission interval control: The first two functions require that all participants send RTCP packets; therefore, the rate must be controlled in order for RTP to scale up to a large number of participants. By having each participant send its control packets to all the others, each can independently observe the number of participants. This number is used to calculate the rate at which the packets are sent, which ensures that the control traffic will not overwhelm network resources. Control traffic is limited to at most 5% of the overall session traffic.
• Minimal session control: This optional function is to convey minimal session control information, e.g., to display the name of a new user joining an informal session. This is most likely useful in loosely controlled sessions where participants enter and leave without membership control or parameter negotiation.
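The transmission interval control described in the list above can be approximated with a short calculation. This is a deliberately simplified sketch: the full RFC 3550 algorithm also divides the control bandwidth between senders and receivers and randomizes the interval, both of which are omitted here. The 5-second minimum is the value recommended in RFC 3550, not stated in the text, and the function name and sample figures are illustrative.

```python
def rtcp_interval(participants, avg_rtcp_packet_bytes, session_bandwidth_bps,
                  rtcp_fraction=0.05, minimum_s=5.0):
    """Approximate seconds between RTCP packets sent by one participant."""
    # Share of the session bandwidth reserved for control traffic, in bytes per second.
    rtcp_bytes_per_s = session_bandwidth_bps * rtcp_fraction / 8
    # All participants together must stay within that share.
    interval = participants * avg_rtcp_packet_bytes / rtcp_bytes_per_s
    return max(interval, minimum_s)


small_session = rtcp_interval(participants=4, avg_rtcp_packet_bytes=100,
                              session_bandwidth_bps=64_000)
large_session = rtcp_interval(participants=400, avg_rtcp_packet_bytes=100,
                              session_bandwidth_bps=64_000)
```

As the number of observed participants grows, each one reports less often, so the total control traffic stays near the configured fraction of the session bandwidth.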


When an RTP session is initiated, an application defines one network address and two ports for RTP and RTCP. If there are several media formats such as video and audio, a separate RTP session with its own RTCP packets is required for each one. Other participants can then decide which particular session and hence medium they want to receive. Overall, RTP provides a way in which real-time information can be transmitted over existing transport and underlying network protocols. It is important to realize that RTP is an application layer protocol and does not provide any QoS guarantees. However, it does allow for various types of impairments such as packet loss or jitter to be detected. With the use of a control protocol, RTCP, it provides a minimal amount of control over the delivery of the data. However, to ensure that the real-time data will be delivered on time, if at all, RTP must be used in conjunction with other mechanisms or protocols that will provide reliable service.

4.5.3 Real-Time Streaming Protocol

The Real-Time Streaming Protocol (RTSP) [19] establishes and controls either a single or several time-synchronized streams of continuous media such as audio and video. RTSP does not typically deliver the continuous streams itself, although interleaving of the continuous media stream with the control stream is possible. RFC2326 [21] describes RTSP as being an application-level protocol that controls the delivery of streaming media with real-time properties. This media can be streamed over unicast or multicast networks. The media data itself is handled by a separate protocol, and therefore RTSP can be described as a kind of network remote control to the server that is streaming the media. Sources of data can include both live data feeds and stored clips. RTSP is intended to control multiple data delivery sessions, provide a means for choosing delivery channels such as UDP, multicast UDP, and TCP, and provide a means for choosing delivery mechanisms based upon RTP.

The transport protocol that carries the RTSP commands is determined by the scheme used in the RTSP Uniform Resource Locator (URL). The schemes that are supported on the Internet are “rtsp:,” which requires that the commands are delivered using a reliable protocol, e.g., TCP; “rtspu:,” which identifies an unreliable protocol such as UDP; and “rtsps:,” which requires a TCP connection secured by the Transport Layer Security (TLS) protocol. Therefore, a valid RTSP URL could be “rtspu://foo.bar.com:5150,” which requests that the commands be delivered by an unreliable protocol to the server “foo.bar.com,” on port 5150.

There is no notion of an RTSP connection; instead, a server maintains a session labeled by an identifier. During an RTSP session, an RTSP client may open and close many reliable transport connections to the server to issue RTSP requests. Alternatively, it may use a connectionless transport protocol such as UDP.

RTSP is intentionally similar in syntax and operation to the Hypertext Transfer Protocol (HTTP) so that extension mechanisms to HTTP can in most cases also be added to RTSP. The protocol supports the following operations:
• Retrieval of media from media server: The client can request a presentation description via HTTP or some other method.
• Invitation of a media server to a conference: A media server can be invited to join an existing conference, either to play back media into the presentation or to record all or a subset of the media in a presentation.
• Addition of media to an existing presentation: Particularly for live presentations, it is useful if the server can tell the client about additional media becoming available.

Since most servers are designed to handle more than one user at a time, the server needs to be able to maintain a session state, i.e., whether it is setting up a session (the SETUP state), playing a stream (the PLAY state), etc. This will allow it to correlate RTSP requests with the relevant stream. HTTP, however, is a stateless protocol since typically there is no need to save the state of each client. Another area in which HTTP and RTSP differ is in the way the client and server interact. With HTTP the interaction is one way: the client issues a request for a document and the server responds. With


RTSP both the client and server can issue requests. To summarize, RTSP is more of a protocol framework than a protocol itself.
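The way the URL scheme selects the transport for RTSP commands can be illustrated with a short Python sketch. It uses only the standard library; the function name `describe_rtsp_url` and the wording of the transport descriptions are assumptions made for this example, while the three schemes and the sample URL are the ones given in the text.

```python
from urllib.parse import urlsplit

# Mapping taken from the three URL schemes described in the text.
SCHEME_TRANSPORT = {
    "rtsp": "reliable transport (e.g., TCP)",
    "rtspu": "unreliable transport (e.g., UDP)",
    "rtsps": "TCP secured by TLS",
}


def describe_rtsp_url(url: str) -> dict:
    """Split an RTSP URL into host, port, and the transport its scheme implies."""
    parts = urlsplit(url)
    if parts.scheme not in SCHEME_TRANSPORT:
        raise ValueError(f"not an RTSP URL scheme: {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,  # None when the URL carries no explicit port
        "transport": SCHEME_TRANSPORT[parts.scheme],
    }


info = describe_rtsp_url("rtspu://foo.bar.com:5150")
```

For the sample URL, `info` identifies the host foo.bar.com, port 5150, and an unreliable transport. A real client would go on to open the corresponding socket and issue requests such as SETUP and PLAY, which this sketch does not attempt.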

References
[1] Uyless D. Black, MPLS and Label Switching Networks, Prentice Hall, Englewood Cliffs, NJ, 2001.
[2] Douglas E. Comer, Computer Networks and Internets, 2nd edition, Prentice Hall, Englewood Cliffs, NJ, 1999.
[3] P. Ferguson and G. Huston, Quality of Service: Delivering QoS on the Internet and in Corporate Networks, John Wiley & Sons, New York, 1998.
[4] ITU-T Recommendation G.114, One-Way Transmission Time, International Telecommunication Union, 1996.
[5] S. Kalinindi, OWDP: A Protocol to Measure One-Way Delay and Packet Loss, Technical Report STR-001, Advanced Network and Services, September 1998.
[6] S. Keshav, An Engineering Approach to Computer Networking, Addison-Wesley, Reading, MA, 1997.
[7] T. Kushida, The traffic and the empirical studies for the Internet, in Proc. IEEE Globecom 98, Sydney, 1998, pp. 1142–1147.
[8] V. Paxson, Towards a Framework for Defining Internet Performance Metrics, Technical Report LBNL-38952, Network Research Group, Lawrence Berkeley National Laboratory, June 1996.
[9] RFC2205, Resource ReSerVation Protocol (RSVP) Version 1 Functional Specification, September 1997.
[10] RFC2208, Resource ReSerVation Protocol (RSVP) Version 1 Applicability Statement: Some Guidelines on Deployment, September 1997.
[11] RFC2210, The Use of RSVP with IETF Integrated Services, September 1997.
[12] RFC2474, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, December 1998.
[13] RFC2475, An Architecture for Differentiated Services, December 1998.
[14] R. Seifert, Gigabit Ethernet: Technology and Applications for High-Speed LANs, Addison-Wesley, Reading, MA, 1998.
[15] W.R. Stevens, TCP/IP Illustrated: The Protocols, Volume 1, Addison-Wesley, New York, 1994.
[16] M. Welzl and M. Mühlhäuser, Scalability and quality of service: a trade-off? IEEE Communications Magazine, 41, 32–36, 2003.
[17] Uyless D. Black, Voice over IP, Prentice Hall, Englewood Cliffs, NJ, 2000.
[18] RFC3550, RTP: A Transport Protocol for Real-Time Applications, July 2003.
[19] H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol, Internet Draft, 1998.
[20] D. Collins, Carrier Grade Voice Over IP, 2nd edition, McGraw-Hill, New York, 2003.
[21] RFC2326, Real Time Streaming Protocol (RTSP), April 1998.


5 Survey of Network Management Frameworks

Mai Hoang
University of Potsdam

5.1 Introduction
5.2 Network Management Architecture
5.3 ISO Systems Management Framework
    Functional Aspects • Information Aspects • Organization Aspects • Communication Aspects
5.4 Internet Management Framework
    SNMPv1 • SNMPv2 • SNMPv3
5.5 ISO and Internet Management Standards: Analysis and Comparison
    SNMP and CMIP • MIBs and SMI • Network Management Functions
5.6 DHCP: IP Address Management Framework for IPv4
    IP Address Allocation Mechanisms • The IP Address Management of DHCP • Advantages and Disadvantages of DHCP for IPv4
5.7 Conclusions
References

5.1 Introduction

Computer networks and distributed processing systems continue to grow in scale and diversity in business, government, and other organizations. Three facts become evident. First, new networks are added and existing ones are expanded almost as rapidly as new network technologies and products are introduced. The problems associated with network expansion affect day-to-day network operation management. Second, the network and its resources and distributed services become indispensable to organizations. Third, more things can go wrong, which can disable the network or degrade its performance to an unacceptable level. Large heterogeneous networks cannot be put together and managed by human effort alone. Instead, their complexity dictates the use of a rich set of automated network management tools and applications.

In response, the International Organization for Standardization (ISO) began work in 1978 to establish a standard for network management, the Open Systems Interconnection (OSI) network management, including the management model, functional areas, Common Management Information Services (CMIS), Common Management Information Protocol (CMIP), and management information base (MIB) [ROS90]. The network management model describes the main components of a network management tool for a managed network. For a given managed network, it is necessary to know which problem areas have to be considered. These problem areas are specified as the ISO management functions, which were already contained in the first ISO working draft of the management framework and gradually evolved into what is presently known as the five functional areas of the ISO management framework (performance, faults, configuration, accounting, security). The important pieces of ISO management are CMIP and CMIS, which managing and managed devices use for their communication. CMIP supports a set of services, the so-called CMIS, that define the types of requests and responses and the actions they should invoke. In addition to being able to pass information back and forth, the managing and managed devices need to agree on a set of variables and means to initiate actions. The collection of this information is referred to as the management information base (MIB).

Because of the slowness of the ISO standardization process, the complexity of the proposed new standard, and the urgent need for management tools, the Internet Engineering Task Force (IETF) devised the Simple Network Management Protocol (SNMP) [RFC1157], which was originally regarded as a provisional means for network management until the OSI management standards were complete, but subsequently became a de facto standard because of its dissemination and simplicity. The SNMP framework consists of three parts: the protocol, the structure of management information (SMI), and the management information base (MIB). The protocol covers the SNMP operations, the format of messages, and how messages are exchanged between a manager and agent. The SMI is a set of rules allowing a user to specify the desired management information, e.g., by providing a means of naming and declaring the types of variables. Finally, the MIB is a structured collection of all managed objects maintained by a device.
The managed objects are structured as a hierarchical tree. In order to address several weaknesses within SNMP, SNMP version 2 (SNMPv2) was initiated around 1994. SNMPv2 provides more functionality and greater efficiency than the original version of SNMP, but for various reasons SNMPv2 did not succeed. Finally, SNMP version 3 (SNMPv3) was issued in 1998. SNMPv3 describes an overall framework for present and future versions of SNMP and adds security features to SNMP.

Both ISO and IETF frameworks are used for developing the network management systems and applications for monitoring and controlling hardware as well as software components. In addition to these complex frameworks, other simple management frameworks have been developed. Each of these frameworks focuses only on a particular management task. One sort of framework is for Internet Protocol (IP) address management, a task that has been around since the advent of networks: each component within a network must have a set of definite, unique parameters so the rest of the network can recognize it. Traditionally, most network administrators used a pen and paper or a spreadsheet to keep track of their networks’ parameters. While this was sufficient for small networks with a few hosts, increased management expenses naturally followed as the networks grew and changed. Thus, the process of IP address management needed to be done through automated management applications. In response to this need, the IETF created the Dynamic Host Configuration Protocol (DHCP). DHCP was developed from an earlier protocol called the Bootstrap Protocol (BOOTP) [RFC951, RFC1542], which was used to pass information during initial booting to client systems. BOOTP was designed to store and update static information for clients, including IP addresses. The BOOTP server always issued the same IP address to the same client.
As a result, while BOOTP addressed the need for central management, it did not address the problem of managing IP addresses as a dynamic resource. To address the need to manage dynamic configuration information in general, and dynamic IP addresses specifically, the IETF standardized DHCP as a framework for automatic IP version 4 (IPv4) address management. To standardize the DHCP environment, the IETF issued a series of RFCs [RFC1542, RFC2131, RFC2132] focused on DHCP extensions to BOOTP. The most recent of these standards is RFC2131, which was issued in March 1997. DHCP is built on a client–server model. It includes two parts: the mechanisms for IP address allocation and the protocol for communication between DHCP servers and DHCP clients. The most important features of DHCP are as follows. First, DHCP permits a server to allocate IP addresses automatically. Automatic address allocation is needed for environments such as wireless networks, where a computer can attach and detach quickly. Second, DHCP allows a client to acquire all the configuration information it needs in a single message.
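The difference between a static address table and addresses managed as a dynamic resource can be made concrete with a small Python sketch. It models only the allocation idea, not the BOOTP or DHCP message exchange; the class names, client identifiers, and the documentation address range 192.0.2.0/30 are assumptions chosen for illustration.

```python
import ipaddress
from typing import Dict, List


class StaticTable:
    """BOOTP-style behaviour as described above: a fixed client-to-address table."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = dict(table)

    def address_for(self, client_id: str) -> str:
        # The same client always receives the same, pre-assigned address.
        return self._table[client_id]


class DynamicPool:
    """DHCP-style behaviour: addresses are handed out from, and returned to, a pool."""

    def __init__(self, network: str) -> None:
        self._free: List[str] = [str(h) for h in ipaddress.ip_network(network).hosts()]
        self._bound: Dict[str, str] = {}

    def acquire(self, client_id: str) -> str:
        if client_id not in self._bound:
            if not self._free:
                raise RuntimeError("address pool exhausted")
            self._bound[client_id] = self._free.pop(0)
        return self._bound[client_id]

    def release(self, client_id: str) -> None:
        # A detached client frees its address for the next one to attach.
        self._free.append(self._bound.pop(client_id))


pool = DynamicPool("192.0.2.0/30")      # two usable host addresses
a = pool.acquire("laptop-a")
b = pool.acquire("laptop-b")
pool.release("laptop-a")                # laptop-a leaves the network
c = pool.acquire("laptop-c")            # and its address is reused
```

In this run three clients are served from two addresses because one of them detached in between, which is exactly the situation a static table cannot handle without manual reconfiguration.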


FIGURE 5.1 Manager–agent architecture.

This chapter focuses on the network management frameworks. First, it provides a comprehensive survey of conceptual models, protocols, services, and management information bases of the ISO and IETF management framework. Following that, the DHCP for IPv4 is discussed in detail. The chapter is organized as follows. Section 5.2 describes the network management model. The ISO network management framework is discussed briefly in Section 5.3, while Section 5.4 presents the IETF management framework. A comparison of these management standards is given in Section 5.5. Section 5.6 provides an overview of DHCP. Section 5.7 concludes the chapter with an overview of the open problems in network management.

5.2 Network Management Architecture

The network management architecture used in ISO and IETF frameworks is called manager–agent architecture and includes the following key components:
• Managed devices
• Management stations
• Management protocols
• Management information

These pieces are shown in Figure 5.1 and described below.

Network management is done from management stations, which are computers running special management software. These management stations contain a set of management application processes called managers for data analysis, fault recovery, and so on. The manager is the locus of activity for network management: it monitors information and presents it to users; it issues requests to managed devices in order to ask them to take some action; it receives responses to the requests; and it receives unsolicited reports from managed devices concerning the status of the devices. These reports are referred to as notifications and are frequently used to report problems, anomalies, or changes in the agent environment.

A managed device is a piece of network equipment that resides in a managed network. The managed devices might be hosts, routers, switches, bridges, or printers. To be managed from a management station, a device must be capable of running a management process, called a (management) agent. These agents communicate with managers running on the management station and take local actions on the managed device under the command and control of the managers. An agent can act upon and respond to requests from a manager; furthermore, it can provide unsolicited notifications to a manager.

Each managed device maintains one or more variables that describe its state (representing, for example, a network interface card or a set of configuration parameters for a piece of hardware or software). In the ISO and IETF management frameworks these variables are called managed objects. The collection of these managed objects is referred to as a management information base (MIB). These variables can be viewed and optionally modified by the managers.

The network management protocol is needed for communication between managers and agents. This protocol allows the manager to query the status of managed devices and to initiate actions at these devices


by triggering the agents. Furthermore, agents can use the network management protocol to report exceptional events to the management stations.

When describing any framework for network management, the following aspects must be addressed:
• Functional aspect: Specifies management functional areas supported by managers and agents. This aspect relates to specific management functions that are carried out by the manager or agent.
• Information aspect: Defines the kind of information that will be exchanged between manager and agent. The information aspect deals with MIBs and SMI.
• Communication aspect: Addresses the communication protocol between manager and agent for exchanging this information.
• Organization aspect: Deals with the definition of the principal structural components and the management architecture for a managed network.

The OSI and IETF management frameworks are discussed in the following sections from the view of these aspects.
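The division of labor in the manager–agent architecture can be sketched in a few lines of Python. This is a toy in-process model, not an implementation of any management protocol: the class and method names and the variable names `link_status` and `contact` are hypothetical, and the "protocol" is reduced to direct method calls.

```python
from typing import Callable, Dict, List, Tuple


class Agent:
    """Runs on a managed device and guards that device's managed objects."""

    def __init__(self, mib: Dict[str, object]) -> None:
        self.mib = dict(mib)
        self._listeners: List[Callable[[str, object], None]] = []

    def subscribe(self, callback: Callable[[str, object], None]) -> None:
        self._listeners.append(callback)

    def handle_get(self, name: str) -> object:
        return self.mib[name]           # response to a manager's query

    def handle_set(self, name: str, value: object) -> object:
        self.mib[name] = value          # local action on behalf of the manager
        return value

    def local_event(self, name: str, value: object) -> None:
        """The device changed by itself: update the MIB and notify unsolicited."""
        self.mib[name] = value
        for callback in self._listeners:
            callback(name, value)


class Manager:
    """Runs on the management station; issues requests and collects notifications."""

    def __init__(self) -> None:
        self.notifications: List[Tuple[str, object]] = []

    def attach(self, agent: Agent) -> None:
        agent.subscribe(lambda name, value: self.notifications.append((name, value)))

    def query(self, agent: Agent, name: str) -> object:
        return agent.handle_get(name)

    def configure(self, agent: Agent, name: str, value: object) -> object:
        return agent.handle_set(name, value)


agent = Agent({"link_status": "up", "contact": "unset"})
manager = Manager()
manager.attach(agent)
status = manager.query(agent, "link_status")     # request/response
manager.configure(agent, "contact", "ops-team")  # manager-initiated change
agent.local_event("link_status", "down")         # unsolicited notification
```

The three calls at the end correspond to the three interaction patterns described above: querying status, initiating an action, and receiving a report of an exceptional event.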

5.3 ISO Systems Management Framework

The first standard for network management was ISO 7498-4, which specifies the network management framework for the OSI model [ISO7498-4]. Although the production of this framework took considerable time, it was not generally accepted as an adequate starting point. It was therefore decided to issue an additional standard, which was called the Systems Management Overview [ISO10040]. Subsequently, ISO has issued a set of other standards for network management. Together these standards provide the basis for the OSI management framework.

5.3.1 Functional Aspects

OSI Systems Management standardization followed a top-down approach, with a number of systems management functional areas (SMFAs) identified first. The intention was not to describe exhaustively all relevant types of management activity, but rather to investigate the key requirements and address these through a generic management model. The identified areas were fault, configuration, accounting, performance, and security management, collectively referred to as FCAPS from their initials [Sta93a].

5.3.1.1 Fault Management

Fault management deals with the mechanisms for the detection, isolation, and correction of abnormal operations. Fault management includes functions to:
• Maintain and examine error logs
• Trace and identify faults
• Accept and act upon error notifications
• Carry out diagnostic tests and correct faults

5.3.1.2 Configuration Management

Configuration management is the set of facilities that allow network managers to exercise control over the configuration of the network components and OSI layer entities. Configuration management includes functions to:
• Record the current configuration
• Record changes in the configuration
• Initialize and close down managed objects
• Identify the network components
• Change the configuration of managed objects (e.g., routing table)


5.3.1.3 Accounting Management

Accounting management deals with the collection and processing of accounting information for charging and billing purposes. It should enable accounting limits to be set and costs to be combined when multiple resources are used in the context of a service. Accounting management includes functions to:
• Inform users of the cost thus far
• Inform users of the expected cost in the future
• Set cost limits

5.3.1.4 Performance Management

Performance management is the set of facilities that enable the network managers to monitor and evaluate the performance of the system and layer entities. Performance management involves three main steps: (1) performance data are gathered on variables of interest to network administrators, (2) the data are analyzed to determine normal (baseline) levels, and (3) appropriate performance thresholds are determined for each important variable so that exceeding these thresholds indicates a network problem worth attention. Management entities continually monitor performance variables. When a performance threshold is exceeded, an alert is generated and sent to the network management system. Performance management provides functions to:
• Collect and disseminate data concerning the current level of performance of resources
• Maintain and examine performance logs for planning and analysis purposes

5.3.1.5 Security Management

Security management addresses the control of the access to network resources according to local guidelines so that the network cannot be damaged and persons without appropriate authorization cannot access sensitive information. A security management subsystem, for example, can monitor users logging on to a network resource and can refuse access to those who enter inappropriate access codes. Security management provides support for management of:
• Authorization facilities
• Access control
• Encryption and key management
• Authentication
• Security logs

Soon after the first working drafts of the management framework appeared, ISO started to define protocol standards for each of the five SMFAs. After some time, an interesting observation was made that most of the functional area protocols used a similar set of elementary management functions. ISO therefore decided to stop further progression of the five functional area protocols and concentrate on the definition of elementary management functions. Following this, a set of standards, e.g., object management, state management, relationships management, alarm reporting, event report management, and log control, have been issued as the general category systems management functions (SMFs). Each SMF standard defines the functionality to support specific management functional area (SMFA) requirements. Moreover, these standards provide a mapping between the CMIS (discussed below) and SMFs.
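As a concrete illustration of the three performance management steps described in Section 5.3.1.4 (gather data, determine a baseline, set a threshold), the following Python sketch derives a threshold from historical samples and flags live samples that exceed it. The rule used here, mean plus k standard deviations, is only one possible choice and is an assumption of this example, as are the function names and the sample values.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def baseline_threshold(samples: Sequence[float], k: float = 3.0) -> float:
    """Derive a threshold from gathered data: baseline mean plus k deviations."""
    return mean(samples) + k * pstdev(samples)


def alerts(samples: Sequence[float], threshold: float) -> List[int]:
    """Return the indexes of samples that exceed the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Hypothetical link utilization figures in percent.
history = [40, 42, 38, 41, 39]          # step 1: gathered performance data
threshold = baseline_threshold(history) # steps 2 and 3: baseline and threshold
live = [41, 43, 57, 40]
exceeded = alerts(live, threshold)      # positions that would raise an alert
```

With these figures the baseline is 40% and the threshold lies a little above 44%, so only the third live sample would generate an alert to the network management system.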

5.3.2 Information Aspects

The information aspects of OSI systems management deal with the resources that are being managed by agents. OSI systems management relies on object-oriented concepts. Therefore, each resource being managed is represented by a managed object. A managed object may represent either a logical resource, such as a user account, or a real resource, like an ATM switch. Managed objects that refer to resources specific to an individual layer are called (N)-layer managed objects. Managed objects that refer to resources that encompass more than one layer are called systems managed objects. According to the OSI


Management Information Model [ISO10165-1], a managed object is defined in terms of attributes it possesses, operations that may be performed on it, notifications that it may issue, and its interactions with other managed objects. The managed objects are defined using two standards: Abstract Syntax Notation 1 (ASN.1), to define data types, and Guidelines for Definition of Managed Objects (GDMO), to define managed objects [ASN90, ISO10165-1]. Under systems management, all the managed objects are represented in the so-called management information base (MIB). The managed object concept is refined in a number of additional standards that are called the structure of management information (SMI) standards [ISO10165-1, ISO10165-2, ISO10165-4, ISO10165-5, ISO10165-7]. The SMI identifies the data types that can be used in the MIB and how the resources within the MIB are represented and named [Sta93b].
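The four ingredients of a managed object named above (attributes, operations, notifications, and relationships to other objects) can be mirrored in a small Python data structure. This is only an analogy in a general-purpose language, not GDMO or ASN.1 notation; the class, the `lock` operation, the `stateChange` event name, and the example objects are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ManagedObject:
    name: str
    attributes: Dict[str, object]
    # Operations that may be performed on the object, keyed by name.
    operations: Dict[str, Callable[..., None]] = field(default_factory=dict)
    # Notifications the object has issued so far.
    emitted: List[str] = field(default_factory=list)
    # Names of other managed objects this one is related to.
    related: List[str] = field(default_factory=list)

    def perform(self, operation: str) -> None:
        self.operations[operation](self)

    def notify(self, event: str) -> None:
        self.emitted.append(event)


def lock(obj: ManagedObject) -> None:
    """Example operation: change an attribute and issue a notification."""
    obj.attributes["state"] = "locked"
    obj.notify("stateChange")


# A logical resource (a user account) related to a real resource (a switch).
account = ManagedObject(
    name="user-account-1",
    attributes={"state": "unlocked"},
    operations={"lock": lock},
    related=["atm-switch-7"],
)
account.perform("lock")
```

After the operation the object's `state` attribute is `locked` and one `stateChange` notification has been recorded, showing how an operation, an attribute change, and a notification belong to the same object definition.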

5.3.3 Organization Aspects

The key elements of the OSI architectural model include the systems management application process (SMAP), systems management application entity (SMAE), layer management entity, and management information base. SMAP is the process within a managing device that is responsible for executing the network management functions; it has access to all parameters of managed devices and can therefore manage all aspects of a managed network. SMAP works in cooperation with SMAPs on other managed networks. A SMAE is responsible for communication with other devices, especially with devices exercising control functions. CMIP is used as a standardized application-level protocol by SMAE. The layer management entity is the logic embedded into each layer of the OSI architecture to provide network management functions that are specific to this layer. To provide management of a distributed system, the elements in this architectural model must be implemented in a distributed fashion across all of the devices in a managed network.

The OSI systems management is organized in a central manner. According to this scheme, a single manager may control several agents. Each agent contains a number of objects. Each object is a data structure that corresponds to an actual device, or part of a device, to be managed. The SMAP is allowed to take on either a manager role or an agent role. The manager role for a SMAP occurs in a device that acts as a network control center. The agent role for a SMAP occurs in managed devices. The manager performs operations upon the agents, and the agents forward notifications to the managers. Because of the expansion of the open system, the OSI management environment may be partitioned into a number of management domains. The partitioning can be based not only on the required management functions (security, accounting, performance, etc.), but also on other requirements (e.g., geographical).

5.3.4 Communication Aspects

The communication aspect deals with the exchange of systems management information between manager and agents within a managed network. Relating to this aspect, ISO has issued two standards, the Common Management Information Services (CMIS) and the Common Management Information Protocol (CMIP) [ISO9595, ISO9596]. CMIS provides OSI management services to management applications. It defines a set of management services, specifies types of requests and responses, and defines what each request and response can do. The management processes initiate these services in order to communicate remotely. Seven services used to handle management information have been standardized. Table 5.1 lists the CMIS with their type and function.

The CMIP provides the information exchange capability to support CMIS; it defines a set of protocol data units that implement the CMIS [ISO9596]. In particular, CMIP defines how the requests, responses, and notifications are encoded into messages and specifies which bearer service is used to transport those encoded messages between managers and agents. A CMIP request typically specifies one or more managed objects to which the request is to be sent. The correspondence between CMIS primitives and CMIP data units is described in [Sta99].


TABLE 5.1 CMIS Services

CMIS              Type                       Functions

Notification Services
M-EVENT-REPORT    Confirmed/not confirmed    Gives notification of an event occurring on a managed object

Operation Services
M-GET             Confirmed                  Request for management data
M-SET             Confirmed/not confirmed    Modification of management data
M-ACTION          Confirmed/not confirmed    Execution of action on a managed object
M-CREATE          Confirmed                  Creation of a managed object
M-DELETE          Confirmed                  Deletion of a managed object
M-CANCEL-GET      Confirmed                  Request to cancel any new responses to a previous request for M-GET services

5.4 Internet Management Framework

An interesting difference between the IETF and ISO is that the IETF takes a more pragmatic and result-driven approach than ISO. In the IETF, it is, for instance, unusual to spend much time on architectural discussions; people prefer to use their time for the development of protocols and implementations. This difference explains why no separate standards for a management architecture or for functional areas were defined in the first two versions of SNMP; only the communication aspect (as SNMP), the information aspect (as SMI and MIB), and the security aspect have been standardized.

SNMP is an application layer protocol that facilitates the exchange of management information between network devices. It is a part of the Transmission Control Protocol (TCP)/IP suite and operates over User Datagram Protocol (UDP). As described in Section 5.2, the IETF network management is based on the manager–agent architecture. Figure 5.2 shows the architecture of the Internet management. In this architecture, a manager process controls access to a central MIB at the management station and provides an interface to the management application. Furthermore, a manager may control many agents, whereby each agent interprets the SNMP messages and controls the agent’s MIBs.

Section 5.2 provided an overview of the basic components of a management architecture used by ISO and IETF. The IETF network management framework consists of:
• SNMP. SNMP is a management protocol for conveying information and commands between a manager and an agent running in a managed network device [KR01].
• MIB. Resources in networks may be managed by representing them as objects. Each object is a data variable that represents one aspect of a managed device. In the IETF network management framework, the representation of a collection of these objects is called the management information base (MIB) [RFC1066, RFC1157, RFC1212]. A MIB object might be a counter such as the number of IP datagrams discarded at a router due to errors, descriptive information such as generic information about the physical interfaces of the entity, or protocol-specific information such as the number of UDP datagrams delivered to UDP users.

FIGURE 5.2 Internet management architecture. (Management station: management application, manager process, and central MIB over SNMP, UDP, IP, and network-dependent protocols. Managed device: agent process and agent MIB over the same protocol stack. SNMP messages travel across the network or internet.)

• SMI. SMI [RFC1155] allows the formal specification of the data types that are used in a MIB and specifies how resources within a MIB are named. The SMI is based on the ASN.1 (Abstract Syntax Notation 1) [ASN90] object definition language. However, since many SMI-specific data types have been added, SMI should be considered a data definition language in its own right.
• Security and administration are concerned with monitoring and controlling access to managed networks and access to all or part of management information obtained from network nodes.

In the following sections, an overview of several SNMP versions (SNMPv1, SNMPv2, SNMPv3) with respect to protocol operations, MIB, SMI, and security is given.

5.4.1 SNMPv1

The original network management framework is defined in the following documents:

• RFC 1155 and RFC 1212 define SMI, the mechanisms used for specifying and naming managed objects. RFC 1215 defines a concise description mechanism for defining event notifications, which are called traps in SNMPv1.
• RFC 1157 defines SNMPv1, the protocol used for network access to managed objects and event notification.
• RFC 1213 contains definitions for a specific MIB (MIB-II) covering TCP, UDP, IP, routers, and other inhabitants of the IP world.

5.4.1.1 SMI

RFCs 1155, 1212, and 1215 describe the SNMPv1 structure of management information and are often referred to as SMIv1. Note that the first two SMI documents do not provide definitions of event notifications (traps). For this reason, the last document specifies a straightforward approach to defining the event notifications used with the SNMPv1 protocol.

5.4.1.2 Protocol Operations

In SNMPv1, communication between manager and agent is performed in a confirmed way. The manager at the network management station takes the initiative by sending one of the following SNMP protocol data units (PDUs): GetRequest, GetNextRequest, or SetRequest. GetRequest and GetNextRequest are used to get management information from the agent; SetRequest is used to change management information at the agent. After receiving one of these PDUs, the agent responds with a response PDU, which carries the requested information or indicates failure of the previous request (Figure 5.3).

It is also possible for the SNMP agent to take the initiative. This happens when the agent detects some extraordinary event, such as a status change at one of its links. In reaction, the agent sends a trap PDU to the manager [RFC1215]. The reception of the trap is not confirmed (Figure 5.3(d)).
FIGURE 5.3 Initiative from manager (a, b, c) and agent (d).

5.4.1.3 MIB

As noted above, the MIB can be thought of as a virtual information store, holding managed objects whose values collectively reflect the current state of the network. These values may be queried or set by a manager by sending SNMP messages to the agent. Managed objects are specified using the SMI discussed above.

The IETF has been standardizing the MIB modules associated with routers, hosts, and other network equipment. This includes basic identification data about a particular piece of hardware and management information about the device's network interfaces and protocols. With the different SNMP standards, the IETF needed a way to identify and name the standardized MIB modules, as well as the specific managed objects within a MIB module. To do that, the IETF adopted ASN.1 as a standardized object identification (naming) framework. In ASN.1, object identifiers have a hierarchical structure, as shown in Figure 5.4.

The global naming tree illustrated in Figure 5.4 allows for unique identification of objects, which correspond to leaf nodes. An object identifier is described by traversing the tree, starting at the root, until the intended object is reached. Several formats can be used to describe an object identifier, with integer values separated by dots being the most common approach.

As shown in Figure 5.4, ISO and the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) are at the top of the hierarchy. Under the Internet branch of the tree (1.3.6.1), there are seven categories. Under the management (1.3.6.1.2) and MIB-2 (1.3.6.1.2.1) branches of the object identifier tree, we find the definitions of the standardized MIB modules. The lowest level of the tree shows some of the important hardware-oriented MIB modules (system and interface) as well as modules associated with some of the most important Internet protocols. RFC 2400 lists all standardized MIB modules.

ITU-T (0)
ISO (1)
  standard (0)
  ISO member body (2)
  ISO identified organization (3)
    US DoD (6)
      Internet (1)
        directory (1)
        management (2)
          MIB-2 (1)
            system (1), interface (2), address translation (3), ip (4), icmp (5), tcp (6), udp (7), egp (8), cmot (9), transmission (10), snmp (11), RMON (16)
        experimental (3)
        private (4)
        security (5)
        SNMPv2 (6)
        mail (7)
Joint ISO/ITU-T (2)

FIGURE 5.4 ASN.1 object identifier tree.

5.4.1.4 Security

The security capabilities deal with mechanisms to control access to network resources according to local guidelines, so that the network cannot be damaged (intentionally or unintentionally) and persons without appropriate authorization have no access to sensitive information.

SNMPv1 has no security features. For example, it is relatively easy to use the SetRequest command to corrupt the configuration parameters of a managed device, which in turn could seriously impair network operations. The SNMPv1 framework only allows the assignment of different access rights to variables (READ-ONLY, READ-WRITE), but performs no authentication. This means that anybody can modify READ-WRITE variables. This is a fundamental weakness in the SNMPv1 framework.

Several proposals have been presented to improve SNMP. In 1992, the IETF issued a new standard, SNMPv2.
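The naming scheme of Section 5.4.1.3 and the access model of Section 5.4.1.4 can be illustrated with a small sketch. It is only a toy model under stated assumptions: the class `ToyAgentMib` and its methods are hypothetical and do not belong to any SNMP library, the stored values are invented, and no SNMP encoding is performed. The three object identifiers are taken from the system and udp groups of MIB-II. The sketch shows that dotted identifiers are compared integer by integer, which is the lexicographic order GetNextRequest relies on, and that a set operation checks only the access right of the variable, never the identity of the caller.

```python
from bisect import bisect_right


def parse_oid(dotted):
    """Turn '1.3.6.1.2.1' into the tuple (1, 3, 6, 1, 2, 1)."""
    return tuple(int(part) for part in dotted.split("."))


def format_oid(oid):
    return ".".join(str(number) for number in oid)


class ToyAgentMib:
    """Hypothetical in-memory MIB; not a real SNMP agent."""

    def __init__(self):
        self._objects = {}  # OID tuple -> [value, access right]

    def register(self, dotted, value, access="READ-ONLY"):
        self._objects[parse_oid(dotted)] = [value, access]

    def get(self, dotted):
        return self._objects[parse_oid(dotted)][0]

    def get_next(self, dotted):
        """Return the first object after `dotted` in lexicographic order."""
        ordered = sorted(self._objects)  # tuples compare integer by integer
        index = bisect_right(ordered, parse_oid(dotted))
        if index == len(ordered):
            return None  # walked off the end of the MIB
        oid = ordered[index]
        return format_oid(oid), self._objects[oid][0]

    def set(self, dotted, value):
        """Only the access right is checked; the caller is never authenticated."""
        entry = self._objects[parse_oid(dotted)]
        if entry[1] != "READ-WRITE":
            raise PermissionError(dotted + " is READ-ONLY")
        entry[0] = value


mib = ToyAgentMib()
mib.register("1.3.6.1.2.1.1.1.0", "example router")           # system.sysDescr
mib.register("1.3.6.1.2.1.1.5.0", "r1", access="READ-WRITE")  # system.sysName
mib.register("1.3.6.1.2.1.7.1.0", 1024)                       # udp.udpInDatagrams

print(mib.get_next("1.3.6.1.2.1.1"))     # first object under the system group
mib.set("1.3.6.1.2.1.1.5.0", "renamed")  # succeeds for any caller
print(mib.get("1.3.6.1.2.1.1.5.0"))
```

Storing identifiers as integer tuples rather than strings matters: as strings, "1.3.6.1.2.1.10" would sort before "1.3.6.1.2.1.7", which is not the order of the object identifier tree.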

5.4.2 SNMPv2

Like SNMPv1, the SNMPv2 network management framework [RFC1213, RFC1441, RFC1445, RFC1448, RFC1902] consists of four major components:

• RFC 1441 and RFC 1902 define the SMI, the mechanisms used for describing and naming objects for the purpose of management.
• RFC 1213 defines MIB-2, the core set of managed objects for the Internet suite of protocols.
• RFC 1445 defines the administrative and other architectural aspects of the framework.
• RFC 1448 defines the protocol used for network access to managed objects.

The main achievements of SNMPv2 are improved performance, better security, and the ability to build a hierarchy of managers.

5.4.2.1 Performance

SNMPv1 includes a rule stating that if the response to a GetRequest or GetNextRequest (each of which can ask for multiple variables) would exceed the maximum size of a packet, no information is returned at all. Because managers cannot determine the size of response packets in advance, they usually take a conservative guess and request just a small amount of data per PDU. To obtain all information, managers are required to issue a large number of consecutive requests. To improve performance, SNMPv2 introduced the GetBulk PDU. In contrast to Get and GetNext, the response to GetBulk always returns as much information as possible, in lexicographic order.

5.4.2.2 Security

The original SNMP had no security features. To remedy this deficiency, SNMPv2 introduced a security mechanism that is based on the concepts of parties and contexts. The SNMPv2 party is a conceptual, virtual execution environment. When an agent or manager performs an action, it does so as a defined party, using the party's environment as described in the configuration files. By using the party concept, an agent can permit one manager to perform a certain set of operations (e.g., read, modify) and another manager to perform a different set of operations. Each communication session with a different manager can have its own environment.
The context concept is used to control access to the various parts of a MIB; each context refers to a specific part of a MIB. Contexts may overlap and are dynamically configurable, which means that contexts may be created, deleted, or modified during the network's operational phase.

5.4.2.3 Hierarchy of Managers

Practical experience with SNMPv1 showed that in several cases managers are unable to manage more than a few hundred agent systems. The main cause of this restriction is the polling nature of SNMPv1: the manager must periodically poll every system under its control, which takes time. To solve this problem, SNMPv2 introduced the concept of intermediate-level managers, which allows polling to be performed by a number of intermediate-level managers under the control of top-level managers (TLMs) via the InformRequest command provided by SNMPv2.

Figure 5.5 shows an example of hierarchical managers. Before the intermediate-level managers start polling, the top-level manager tells them which variables must be polled from which agents. Furthermore, the top-level manager tells the intermediate-level managers which events it wants to be informed about. After the intermediate-level managers are configured, they start polling. If an intermediate-level manager detects an event of interest to the top-level manager, a special Inform PDU is generated and sent to the TLM. After receiving this PDU, the TLM operates directly upon the agent that caused the event.

FIGURE 5.5 Hierarchy of managers.

SNMPv2 dates back to 1992, when the IETF formed two working groups to define enhancements to SNMPv1. One of these groups focused on defining security functions, while the other concentrated on defining enhancements to the protocol. Unfortunately, the group tasked with developing the security enhancements broke into separate camps with diverging views on how security should be implemented. Two proposals (SNMPv2u and SNMPv2*) for the implementation of encryption and authentication were issued. Thus, the goal of the SNMPv3 working group was to continue the effort of the disbanded SNMPv2 working group to define a standard for SNMP security and administration.

5.4.3 SNMPv3

The third version of the Simple Network Management Protocol (SNMPv3) was published as proposed standards in RFCs 2271 to 2275 [RFC2271, RFC2272, RFC2273, RFC2274, RFC2275], which describe an overall architecture plus specific message structure and security features, but do not define a new SNMP PDU format. This version is built upon the first two versions of SNMP, and so it reuses the SNMPv2 standards documents (RFCs 1902 to 1908). SNMPv3 can be thought of as SNMPv2 with additional security and administration capabilities [RFC2570]. This section focuses on the management architecture and security capabilities of SNMPv3.

5.4.3.1 The Management Architecture

The SNMPv3 management architecture is also based on the manager–agent principle. The architecture described in RFC 2271 consists of a distributed, interacting collection of SNMP entities. Each entity implements a part of the SNMP capabilities and may act as an agent, a manager, or a combination of both. The SNMPv3 working group defines five generic applications (Figure 5.6) for generating and receiving SNMP PDUs: command generator, command responder, notification originator, notification receiver, and proxy forwarder.

• A command generator application generates the GetRequest, GetNextRequest, GetBulkRequest, and SetRequest PDUs and handles Response PDUs.
• A command responder application executes in an agent and receives, processes, and replies to the received GetRequest, GetNextRequest, GetBulkRequest, and SetRequest PDUs.
• A notification originator application also executes within an agent and generates Trap PDUs.
• A notification receiver accepts and reacts to incoming notifications.
• A proxy forwarder application forwards request, notification, and response PDUs.

FIGURE 5.6 SNMPv3 entity.

The architecture shown in Figure 5.6 also defines an SNMP engine that consists of four components: dispatcher, message processing subsystem, security subsystem, and access control subsystem. This SNMP engine is responsible for preparing PDU messages for transmission, extracting PDUs from incoming messages for delivery to the applications, and doing security-related processing of outgoing and incoming messages.

5.4.3.2 Security

The security capabilities of SNMPv3 are defined in RFC 2272, RFC 2274, RFC 2275, and RFC 3415 [RFC3415]. These specifications include message processing, a user-based security model, and a view-based access control model.

The message processing can be used with any security model as follows. For outgoing messages, the message processor is responsible for constructing the message header attached to the outgoing PDUs and for passing the appropriate parameters to the security model so that it can perform authentication and privacy functions, if required. For incoming messages, the message processor passes the appropriate parameters to the security model for authentication and privacy processing, and processes and removes the message headers of the incoming PDUs.

The user-based security model (USM) specified in RFC 2274 uses the Data Encryption Standard (DES) for encryption and hashed message authentication codes (HMACs) for authentication [Sch95]. USM includes means for defining procedures by which one SNMP engine obtains information about another SNMP engine, and a key management protocol for defining procedures for key generation, update, and use.
The view-based access control model implements the services required for an access control subsystem [RFC2275]. It makes an access control decision that is based on the requested resource, the security model and security level used for communicating the request, the context to which access is requested, the type of access requested, and the actual object for which access is requested.
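The HMAC-based authentication used by USM can be sketched as follows. This is a simplified illustration, not the RFC 2274 procedure itself: the function names `sign` and `verify` are hypothetical, the key is used as given rather than derived from a password and localized to an engine, and the tag is computed over an opaque byte string instead of a real SNMPv3 message. What it does reflect is that the authentication protocols in RFC 2274 (HMAC-MD5-96 and HMAC-SHA-96) truncate the HMAC output to 96 bits; SHA-1 is used here.

```python
import hashlib
import hmac


def sign(auth_key, message):
    """Return a 96-bit (12-octet) truncated HMAC-SHA-1 tag for `message`."""
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:12]


def verify(auth_key, message, tag):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(auth_key, message), tag)


key = b"k" * 20                      # illustrative shared secret (assumption)
message = b"GetRequest sysDescr.0"   # stand-in for a serialized SNMP message
tag = sign(key, message)

print(verify(key, message, tag))            # sender and receiver share the key
print(verify(key, message + b"!", tag))     # modified message is rejected
```

A receiver that does not hold the same key cannot produce a matching tag, which is what gives the manager and the agent assurance about the origin and integrity of a message.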

5.5 ISO and Internet Management Standards: Analysis and Comparison

The purpose of this section is to compare the two network management frameworks described in the previous sections. This comparison focuses on the four management aspects described above (functional, information, communication, organization). In particular, the network management protocols (SNMP and CMIP), the management information base (MIB), and the management functions, management architectures, and security capabilities of these two frameworks are discussed. Possible solutions to some disadvantages are also presented.


5.5.1 SNMP and CMIP

The biggest advantage of SNMP over CMIP is its simple design, which makes it easy to implement and as easy to use on a small network as on a large one. Users can specify the variables to be monitored in a straightforward manner. From a low-level perspective, each variable consists of the following information:

• Variable name
• Its data type
• Its access attributes (READ-ONLY or READ-WRITE)
• Its value

Another advantage of SNMP is that it is in wide use around the world today. It has become so popular that no other network management protocol appears likely to replace it. As a result, almost all major vendors of network hardware, such as bridges and routers, design their products to support SNMP, making it very easy to deploy.

SNMP also has several disadvantages. The first is that it has some large security holes that can give network intruders access to managed devices. Intruders could also potentially shut down some terminals. To solve this problem, SNMPv2 and SNMPv3 have added security mechanisms, as described above.

In comparison with SNMP, CMIP has many advantages. The biggest is that a CMIP agent does not merely relay information to and from a terminal, as in SNMP; it can perform management functions on its own instead of being restricted to gathering information for remote processing by a manager. Another advantage of the CMIP approach is that it addresses many of the shortcomings of SNMP. For instance, it has built-in security management facilities that support authorization, access control, and security logs. The result is a safer system from the beginning; no security upgrades are necessary.

Despite these advantages, CMIP has not yet been widely implemented. One problem of CMIP is that it needs more system resources than SNMP. Furthermore, a full implementation of CMIP requires adding more processes to network elements. One possible work-around is to decrease the size of the protocol by changing its specifications. Another problem with CMIP is that it is very difficult to program.

5.5.2 MIBs and SMI

MIB and SMI represent the information aspects of a network management framework. In the ISO management framework, managed objects within the MIB are complex and have sophisticated data structures with three attributes: variable attributes that represent the variable characteristics, variable behaviors that define what actions can be triggered on that variable, and notifications that generate an event report whenever a specific event occurs [Sta99]. In contrast, in the SNMP framework, variables are only used to relay information to and from managers.

The SNMP MIB concept has two important disadvantages. The first is that the user has to know the names and meanings of (thousands of) different variables, which can be a daunting task. The second is the lack of variable aggregation: when the user wants to inquire about the values contained in an array, he has to ask separately for each element instead of naming the array at once. The latter problem has been fixed in the newer releases of SNMP: SNMPv2 and SNMPv3. These versions provide means for aggregating variables, e.g., by the new GetBulkRequest service. In fact, so many new features have been added that the formal specifications for SNMP MIBs have expanded considerably.
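The difference between element-by-element retrieval and aggregated retrieval can be made concrete with a small simulation. It is a sketch under assumptions, not an SNMP implementation: the agent is a Python dictionary, the helper names are hypothetical, and the five interface names are invented. The column identifier is that of ifDescr in the MIB-II interfaces table. The `max_repetitions` parameter mirrors the max-repetitions field of the GetBulkRequest PDU; the non-repeaters field and the message size limit are left out.

```python
from bisect import bisect_right

COLUMN = (1, 3, 6, 1, 2, 1, 2, 2, 1, 2)  # ifDescr column of the interfaces table
TABLE = {COLUMN + (i,): "eth%d" % i for i in range(1, 6)}  # five invented rows
ORDERED = sorted(TABLE)


def get_next(oid):
    """One GetNextRequest: the successor of `oid` in lexicographic order."""
    index = bisect_right(ORDERED, oid)
    if index == len(ORDERED):
        return None
    return ORDERED[index], TABLE[ORDERED[index]]


def get_bulk(oid, max_repetitions):
    """One GetBulkRequest: up to `max_repetitions` successors of `oid`."""
    bindings = []
    for _ in range(max_repetitions):
        binding = get_next(oid)
        if binding is None:
            break
        bindings.append(binding)
        oid = binding[0]
    return bindings


def walk_with_get_next(column):
    """Read a whole column, counting one request per element."""
    requests, values, oid = 0, [], column
    while True:
        requests += 1
        binding = get_next(oid)
        if binding is None or binding[0][:len(column)] != column:
            return values, requests
        values.append(binding[1])
        oid = binding[0]


def walk_with_get_bulk(column, max_repetitions):
    """Read a whole column, counting one request per batch."""
    requests, values, oid = 0, [], column
    while True:
        requests += 1
        bindings = get_bulk(oid, max_repetitions)
        for bound_oid, value in bindings:
            if bound_oid[:len(column)] != column:
                return values, requests
            values.append(value)
        if len(bindings) < max_repetitions:
            return values, requests
        oid = bindings[-1][0]


print(walk_with_get_next(COLUMN))      # five values, six requests
print(walk_with_get_bulk(COLUMN, 10))  # the same five values, one request
```

In this toy setting the GetNext walk needs one request per row plus one to detect the end of the column, whereas the bulk walk needs a single request as long as the whole column fits into one response.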

5.5.3 Network Management Functions

One advantage of the ISO management framework over the IETF framework is that ISO has issued the five specific management functional areas, which are useful for users developing management applications. In contrast, the IETF management framework has not defined any specific network management functions. These have to be provided entirely by the user. In fact, the IETF management standards explain how individual management operations should be performed, but they do not specify the sequence in which these operations should be carried out to solve particular management problems.

5.6 DHCP: IP Address Management Framework for IPv4

In the previous sections, two network management frameworks have been discussed. These standards are used for developing automated network management tools and applications for monitoring and maintaining the network. In this section, the Dynamic Host Configuration Protocol (DHCP), an IP address management framework developed by the IETF, is discussed. In contrast to the standards described before, DHCP is based on neither the ISO nor the IETF management standards.

Each computer that can connect to the Internet needs a unique IP address. When an organization sets up its computers with a connection to the Internet, an IP address must be assigned to each machine. In the early phase of the Internet, the administrator had to assign an IP address to each computer manually, and if computers were moved to another location in another part of the network, a new IP address had to be entered. With the daily changes and additions of new IP addresses, it has become extremely difficult to keep track of the IP address records across the multitude of IP nodes and subnets. Problems involving duplicate IP addresses, missing devices, and overflows of allocated IP address pools can bring down part or all of a network until the problems are remedied manually.

To overcome these problems, the IETF has developed the Dynamic Host Configuration Protocol [RFC2131, RFC2132, RFC3046], which allows for the automatic assignment of IP addresses to devices as they connect to the network. DHCP allows a computer to acquire all the configuration information it needs in a single message. Furthermore, the protocol permits IP addresses to be allocated automatically. To use DHCP's dynamic address allocation mechanism, the network administrator must configure a DHCP server by supplying a set of IP addresses. Whenever a new computer connects to the network, it contacts the DHCP server and requests an IP address. The server chooses one of the addresses the administrator specified and allocates that address to the computer.

In the next subsections, IP address allocation and IP address management within DHCP are discussed in detail.

5.6.1 IP Address Allocation Mechanisms

DHCP supports three mechanisms for IP address allocation:

• Automatic allocation: The DHCP server assigns a permanent IP address to a computer when it first attaches to the network.
• Dynamic allocation: The DHCP server assigns an IP address to a computer for a limited period of time (or until the client explicitly relinquishes the address). This mechanism is useful for assigning an address to a computer that will be connected to the network only temporarily or for sharing a limited IP address pool among a group of clients that do not need permanent IP addresses.
• Manual allocation: The network administrator can configure a specific address for a specific computer.

A particular network will use one or more of these mechanisms, depending on the policies of the network administrator.
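The three allocation mechanisms differ mainly in where the address comes from and whether the binding expires. The following sketch models that difference; it is an illustration only. The class `ToyAddressPool`, its method names, the client identifiers, and the addresses (taken from the 192.0.2.0/24 documentation range) are all assumptions, and time is passed in explicitly as a number of seconds so that the behavior is deterministic.

```python
class ToyAddressPool:
    """Hypothetical address pool covering manual, automatic, and dynamic allocation."""

    def __init__(self, addresses, manual=None):
        self.free = list(addresses)       # addresses supplied by the administrator
        self.manual = dict(manual or {})  # client id -> fixed address
        self.bindings = {}                # client id -> (address, expiry or None)

    def allocate(self, client_id, now, lease_seconds=None):
        """Return (address, expiry); expiry None means a permanent binding."""
        if client_id in self.manual:      # manual allocation
            return self.manual[client_id], None
        self.reclaim_expired(now)
        if client_id in self.bindings:    # renew an existing binding
            address = self.bindings[client_id][0]
        else:
            if not self.free:
                raise RuntimeError("address pool exhausted")
            address = self.free.pop(0)
        # lease_seconds=None models automatic allocation, a number models dynamic
        expiry = None if lease_seconds is None else now + lease_seconds
        self.bindings[client_id] = (address, expiry)
        return address, expiry

    def release(self, client_id):
        """The client relinquishes its address, or its lease was reclaimed."""
        address, _ = self.bindings.pop(client_id)
        self.free.append(address)

    def reclaim_expired(self, now):
        for client_id, (_, expiry) in list(self.bindings.items()):
            if expiry is not None and expiry <= now:
                self.release(client_id)


pool = ToyAddressPool(["192.0.2.10", "192.0.2.11"], manual={"printer": "192.0.2.50"})
printer = pool.allocate("printer", now=0)                     # manual
server = pool.allocate("server", now=0)                       # automatic
laptop = pool.allocate("laptop", now=0, lease_seconds=3600)   # dynamic
guest = pool.allocate("guest", now=4000, lease_seconds=3600)  # reuses the expired lease
print(printer, server, laptop, guest)
```

The last call shows why dynamic allocation suits a small pool shared among temporary clients: once the laptop's lease has expired, its address becomes available to the guest.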

5.6.2 The IP Address Management of DHCP

In this subsection, DHCP is discussed from the viewpoint of the four aspects (organization, information, function, and communication) that were presented in the previous sections while describing the ISO and IETF management frameworks.

5.6.2.1 Organization Aspect

DHCP is built on a client–server model. The DHCP system consists of three types of devices: clients, relays, and servers. DHCP servers provide configuration information for one or several subnets. A DHCP client is a host configured using information obtained from DHCP servers. If a client and a server reside on different networks, then a relay agent on the client's network is needed to relay broadcast messages between the server and the client. The organization architecture is shown in Figure 5.7. A DHCP server in a network receives DHCP requests from a client and, if a dynamic address allocation policy is selected, allocates an IP address to the requesting client.

FIGURE 5.7 Communication between DHCP server and DHCP client.

5.6.2.2 Information Aspect

The information aspects of DHCP deal with the network parameters (configuration parameters and IP addresses) exchanged between DHCP servers and DHCP clients, and with the persistent storage of these parameters. The DHCP server stores a key–value entry for each client, where the key is some unique identifier and the value contains the configuration parameters for the client. A client can query the DHCP server to retrieve its configuration parameters. The client's interface to the configuration parameters repository consists of protocol messages to request configuration parameters, and responses from the server carrying the configuration parameters.

5.6.2.3 Functional Aspect

The functions of DHCP are defined through the DHCP PDU. The format of a DHCP PDU is shown in Figure 5.8. Table 5.2 describes the fields in a DHCP message. There are eight message types for DHCP: five of them are used for messages sent from the client to the server, and the other three are used for messages sent from the server to the client. These message types are described in Table 5.3.

FIGURE 5.8 The DHCP PDU format.

5.6.2.4 Communication Aspect

The communication aspect deals with rules for communication between a DHCP client and a DHCP server for exchanging the DHCP PDUs. The client–server interaction can be classified in two cases: (1) client–server interaction for allocating an IP address, and (2) client–server interaction for reusing a previously allocated IP address. In both cases, the communication between clients and servers is performed in a confirmed way and is initiated by clients.


TABLE 5.2 DHCP Message Field Description

Field     Description
op        Message type (BOOTREQUEST, BOOTREPLY); identifies whether a message is sent from a client to a server (BOOTREQUEST) or from a server to a client (BOOTREPLY)
xid       Transaction ID, a random number chosen by the client, used by the client and server to associate messages and responses between a client and a server
ciaddr    Client IP address; only filled in if the client can respond to ARP requests
yiaddr    "Your" (client) IP address
giaddr    Relay agent IP address, used in booting via a relay agent
sname     Optional server host name
file      Boot file name
options   Optional parameters field
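The fields in Table 5.2 map onto a fixed binary layout, which can be sketched with Python's `struct` module. The table lists only a subset of the header; the remaining fields used below (htype, hlen, hops, secs, flags, siaddr, chaddr), the four-octet magic cookie, and the option codes 53 (message type) and 255 (end) are taken from RFC 2131 and RFC 2132 rather than from this chapter. The function names, the transaction ID, and the hardware address are illustrative assumptions, and nothing is sent on the network.

```python
import socket
import struct

BOOTREQUEST = 1
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the options
FIELD_NAMES = ("op", "htype", "hlen", "hops", "xid", "secs", "flags",
               "ciaddr", "yiaddr", "siaddr", "giaddr", "chaddr", "sname", "file")
# Network byte order; one format code per entry in FIELD_NAMES (236 octets in total).
FIXED_HEADER = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")


def build_discover(xid, mac):
    """Build a minimal DHCPDISCOVER for an Ethernet client."""
    zero = socket.inet_aton("0.0.0.0")
    header = FIXED_HEADER.pack(
        BOOTREQUEST,  # op: client to server
        1, 6, 0,      # htype (Ethernet), hlen (6 octets), hops
        xid,          # transaction ID chosen by the client
        0, 0x8000,    # secs, flags (broadcast bit set)
        zero, zero, zero, zero,  # ciaddr, yiaddr, siaddr, giaddr
        mac, b"", b"",           # chaddr, sname, file (zero padded by struct)
    )
    options = bytes([53, 1, 1, 255])  # message type = DHCPDISCOVER, then end
    return header + DHCP_MAGIC_COOKIE + options


def parse_fixed_header(packet):
    """Split the fixed part of a DHCP message into named fields."""
    values = FIXED_HEADER.unpack(packet[:FIXED_HEADER.size])
    return dict(zip(FIELD_NAMES, values))


packet = build_discover(0x12345678, bytes.fromhex("020000000001"))
fields = parse_fixed_header(packet)
print(len(packet), fields["op"], hex(fields["xid"]))
```

In an actual exchange, a client would send such a message as a UDP datagram to the server port, and the server would return a BOOTREPLY with the offered address in yiaddr.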

TABLE 5.3 DHCP Message Types

PDU (Message)   Description

Sent from Client to Server
DHCPDISCOVER    Client broadcast to locate available servers
DHCPREQUEST     Requesting parameters from one server
DHCPDECLINE     Indicating an IP address is already in use
DHCPINFORM      Asking for local configuration parameters
DHCPRELEASE     Relinquishing an IP address and canceling the remaining lease

Sent from Server to Client
DHCPOFFER       Response to DHCPDISCOVER with an offer of configuration parameters
DHCPACK         Acknowledgment with configuration parameters, including the committed IP address
DHCPNAK         Refusing a request for configuration parameters (e.g., the requested IP address is already in use)

• The client–server interaction for allocating an IP address is performed as follows [RFC2131]:
1. A client that attaches to the network for the first time takes the initiative by sending a DHCPDISCOVER broadcast message to locate available servers. The DHCPDISCOVER message may include options that suggest values for the network address and lease duration.
2. DHCP servers receiving the DHCPDISCOVER message may or may not return a DHCPOFFER (many servers may receive the same DHCPDISCOVER message). If a server decides to respond, it puts an available address into the "yiaddr" field (and other configuration parameters in DHCP options) and broadcasts a DHCPOFFER message. At this point, there is no agreement on an assignment between the server and the client.
3. The client receives one or more DHCPOFFER messages from one or more servers and chooses one server from among them. The client puts the IP address of the selected server into the "server identifier" option of a DHCPREQUEST and broadcasts it to indicate which server it has selected. This DHCPREQUEST is broadcast and relayed through DHCP relay agents.
4. Servers receive the DHCPREQUEST broadcast from the client and check the "server identifier" option. If it does not match a server's own address, that server interprets the message as a notification that the client has declined its offer. The selected server sends a DHCPACK (if its address is available) or a DHCPNAK (for example, if the address is already assigned to another client).
5. A client that receives the DHCPACK starts using the IP address. From that point on, the client is configured. If it receives a DHCPNAK, it restarts from step 1.
6. If the client finds a problem with the address assigned in the DHCPACK, it sends a DHCPDECLINE to the server and restarts from step 1.
7. The client may choose to relinquish its lease of a network address by sending a DHCPRELEASE message to the server.
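Steps 1 to 5 of the allocation exchange can be traced with a small in-memory simulation. It is a sketch under assumptions rather than a protocol implementation: messages are Python dictionaries instead of DHCP PDUs, a broadcast is modeled by handing the same message to every server, the client simply takes the first offer, and the class and function names are hypothetical. The `requested_ip` key stands for the requested IP address option that RFC 2131 places in the DHCPREQUEST alongside the server identifier.

```python
class ToyServer:
    """Hypothetical DHCP server handling DHCPDISCOVER and DHCPREQUEST."""

    def __init__(self, server_id, addresses):
        self.server_id = server_id
        self.free = list(addresses)
        self.offered = {}  # xid -> address held back for a pending offer
        self.leases = {}   # client id -> committed address

    def on_discover(self, msg):
        if not self.free:
            return None  # step 2: a server may choose not to answer
        address = self.free.pop(0)
        self.offered[msg["xid"]] = address
        return {"type": "DHCPOFFER", "xid": msg["xid"],
                "yiaddr": address, "server_id": self.server_id}

    def on_request(self, msg):
        address = self.offered.pop(msg["xid"], None)
        if msg["server_id"] != self.server_id:
            if address is not None:
                self.free.append(address)  # step 4: our offer was declined
            return None
        if address is None or address != msg["requested_ip"]:
            if address is not None:
                self.free.append(address)
            return {"type": "DHCPNAK", "xid": msg["xid"]}
        self.leases[msg["client_id"]] = address
        return {"type": "DHCPACK", "xid": msg["xid"], "yiaddr": address}


def acquire_address(client_id, xid, servers):
    """Run DISCOVER, OFFER, REQUEST, ACK; return the address or None."""
    discover = {"type": "DHCPDISCOVER", "xid": xid, "client_id": client_id}
    offers = [o for o in (s.on_discover(discover) for s in servers) if o]
    if not offers:
        return None
    chosen = offers[0]  # step 3: selection policy is an assumption
    request = {"type": "DHCPREQUEST", "xid": xid, "client_id": client_id,
               "server_id": chosen["server_id"], "requested_ip": chosen["yiaddr"]}
    replies = [r for r in (s.on_request(request) for s in servers) if r]
    if replies and replies[0]["type"] == "DHCPACK":
        return replies[0]["yiaddr"]  # step 5: the client is configured
    return None  # on DHCPNAK the client would restart from step 1


server_a = ToyServer("192.0.2.1", ["192.0.2.100"])
server_b = ToyServer("192.0.2.2", ["192.0.2.200"])
first = acquire_address("client-1", 1, [server_a, server_b])
second = acquire_address("client-2", 2, [server_a, server_b])
print(first, second)
```

Because the DHCPREQUEST reaches both servers, the server that was not selected returns its offered address to the free pool, which is why the second client can still obtain an address from it.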


• Client–server interaction for reusing a previously allocated IP address: If a client remembers and wishes to reuse a previously allocated IP address, it may choose to omit some of the steps described above. The interaction is performed as follows [RFC2131]:
1. The client broadcasts a DHCPREQUEST message with the "requested IP address" option, which indicates the previously assigned address.
2. A DHCP server that has a binding for the address returns a DHCPACK or a DHCPNAK to the client. The DHCPACK message indicates that the client can use the previously assigned address. The DHCPNAK message indicates that it cannot (for example, because the IP address is already in use).
3. If the client receives the DHCPACK message, it performs a final check of the parameters, notes the duration of the lease specified in the DHCPACK message, and starts using the IP address. If the client detects that the IP address in the DHCPACK message is already in use, it must send a DHCPDECLINE message to the server and restart the configuration process by requesting a new network address. If the client receives a DHCPNAK message, it cannot reuse the remembered address and must likewise restart the configuration process. If the client receives neither a DHCPACK nor a DHCPNAK message, it times out and retransmits the DHCPREQUEST message.
4. The client may choose to relinquish its lease of an IP address by sending a DHCPRELEASE message to the server.

DHCP uses UDP as its transport protocol. DHCP messages from a client to a server are sent to the DHCP server's port (67), and DHCP messages from a server to a client are sent to the DHCP client's port (68).

5.6.3 Advantages and Disadvantages of DHCP for IPv4

Nowadays, DHCP is used in many installations to pass configuration information to workstations. One of the main advantages of DHCP is that a workstation is not required to have any kind of permanent storage space. All network configuration parameters can be passed using DHCP without any human interaction. Another advantage is that DHCP can play an important role in reducing the cost of ownership for large organizations by moving the administration of client systems to centralized management servers.

DHCP helps reduce the impact of the increasing scarcity of available IP addresses in two ways. First, DHCP can be used to manage the limited standard IP addresses that are available to an organization. It does this by issuing the addresses to clients on an as-needed basis and reclaiming them when the addresses are no longer required. Second, DHCP can be used in conjunction with network address translation (NAT) to issue private network addresses to connect clients to the Internet.

However, DHCP for IPv4 has a few inherent problems. First of all, the current DHCP implementation provides no authentication or security mechanisms, although authentication for DHCP messages is proposed in RFC 3118 [RFC3118]. One example of this security problem is that a message broadcast by a rogue server can lead to a situation in which all traffic is routed through a malicious host that eavesdrops on it. An even worse situation would arise if a computer obtained a tampered boot file that logs all login–password pairs to a remote host. Second, data about configuration parameters and IP addresses are held locally in the DHCP servers. There exists no standard for controlling and monitoring this configuration data from the DHCP servers.

Another disadvantage of DHCP relates to the leasing mechanism: the client is expected to stop using any dynamically allocated IP address after the lease time expires. Additionally, a client requesting a new lease is not guaranteed to receive the same IP address as it had previously.

5.7 Conclusions

We have surveyed the management architecture, protocols, services, and management information base of the network management frameworks standardized by ISO and IETF. We also introduced DHCP, an IP address management framework standardized by IETF. Within each of those frameworks, standards relating to four fundamental aspects of network management (functional, information, communication, and organization aspects) were addressed.


Both the ISO and IETF network management frameworks have their advantages and disadvantages. However, the key decision factor in choosing between the two frameworks lies in their implementation. Until now, it has been almost impossible to find a system with the necessary resources to support the ISO framework, although it is conceptually superior to SNMP (v1, v2, and v3) in both design and operation.

In comparison with both network management frameworks, DHCP is much simpler in both design and implementation. This is largely due to the fact that it focuses on only one particular task: IP address management. DHCP can play an important role in making systems management simpler and less expensive by moving the management of IP addresses away from the client systems and onto centralized servers.

References

[ASN90] ISO/IEC 8824. Specification of Abstract Syntax Notation One (ASN.1), April 1990.
[ISO7498-4] ITU-T-ISO/IEC. ITU-T X.700-ISO/IEC 7498-4. Information Processing Systems: Open Systems Interconnection: Management Framework for Open System Interconnection, 1992.
[ISO9595] ISO 9595. Information Processing Systems: Open Systems Interconnection: Common Management Information Service Definition, Geneva, 1990.
[ISO9596] ISO 9596. Information Processing Systems: Open Systems Interconnection: Common Management Information Protocol, Geneva, 1991.
[ISO10040] ITU-T-ISO/IEC. ITU-T X.701-ISO/IEC 10040. Information Processing Systems: Open System Interconnection: System Management Overview, 1992.
[ISO10165-1] ITU-T-ISO/IEC. ITU-T X.720-ISO/IEC 10165-1. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Management Information Model, Geneva, 1993.
[ISO10165-2] ISO 10165-2. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Definition of Management Information, Geneva, 1993.
[ISO10165-4] ISO 10165-4. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Part 4: Guidelines for the Definition of Managed Objects, Geneva, 1993.
[ISO10165-5] ISO 10165-5. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Generic Management Information, Geneva, 1993.
[ISO10165-7] ISO 10165-7. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: General Relationship Model, Geneva, 1993.
[KR01] James F. Kurose, Keith W. Ross. Computer Networking: A Top-Down Approach Featuring the Internet, Addison Wesley, Reading, MA, 2001.
[RFC951] Bill Croft, John Gilmore. Bootstrap Protocol (BOOTP), RFC 951, September 1985.
[RFC1066] K. McCloghrie, M. Rose. Management Information Base for Network Management of TCP/IP-Based Internets, RFC 1066, 1988.
[RFC1155] K. McCloghrie, M. Rose. Structure and Identification of Management Information for TCP/IP-Based Internets, RFC 1155, 1990.
[RFC1157] J. Case, M. Fedor, M. Schoffstall, C. Davin. The Simple Network Management Protocol, RFC 1157, May 1990.
[RFC1212] K. McCloghrie, M. Rose. Concise MIB Definitions, RFC 1212, 1991.
[RFC1213] K. McCloghrie, M. Rose. Management Information Base for Network Management of TCP/IP-Based Internets: MIB-II, RFC 1213, 1991.
[RFC1215] M. Rose. A Convention for Defining Traps for use with the SNMP, RFC 1215, 1991.
[RFC1441] K. McCloghrie, M. Rose, J. Case, S. Waldbusser. Introduction to Version 2 of the Internet-Standard Network Management Framework, RFC 1441, 1993.
[RFC1445] J. Galvin, K. McCloghrie. Administrative Model for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1445, 1993.

[RFC1448] K. McCloghrie, M. Rose, J. Case, S. Waldbusser. Protocol Operations for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1448, 1993.
[RFC1542] W. Wimer. Clarifications and Extensions for Bootstrap Protocol, RFC 1542, October 1993.
[RFC1902] J. Case, K. McCloghrie, M. Rose, S. Waldbusser. Structure of Management Information for Version 2 of Simple Network Management Protocol (SNMPv2), RFC 1902, January 1996.
[RFC2131] R. Droms. Dynamic Host Configuration Protocol, RFC 2131, March 1997.
[RFC2132] S. Alexander, R. Droms. DHCP Options and BOOTP Vendor Extensions, RFC 2132, March 1997.
[RFC2271] D. Harrington, R. Presuhn, B. Wijnen. An Architecture for Describing SNMP Management Frameworks, RFC 2271, 1998.
[RFC2272] J. Case, D. Harrington, R. Presuhn, B. Wijnen. Message Processing and Dispatching for the Simple Network Management Protocol (SNMP), RFC 2272, 1998.
[RFC2273] D. Levi, P. Meyer, B. Stewart. SNMPv3 Applications, RFC 2273, 1998.
[RFC2274] U. Blumenthal, B. Wijnen. User-Based Security Model (USM) for Version 3 of the Simple Network Management Protocol (SNMPv3), RFC 2274, 1998.
[RFC2275] B. Wijnen, R. Presuhn, K. McCloghrie. View-Based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP), RFC 2275, 1998.
[RFC2570] J. Case, R. Mundy, D. Partain, B. Stewart. Introduction to Version 3 of the Internet Standard Network Management Framework, RFC 2570, 1999.
[RFC3046] M. Patrick. DHCP Relay Agent Information Option, RFC 3046, January 2001.
[RFC3118] R. Droms, W. Arbaugh. Authentication for DHCP Messages, RFC 3118, June 2001.
[RFC3411] D. Harrington, R. Presuhn, B. Wijnen. An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks, RFC 3411, 2002.
[RFC3415] B. Wijnen, R. Presuhn, K. McCloghrie. View-Based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP), RFC 3415, 2002.
[Ros90] Marshall T. Rose. The Open Book: A Practical Perspective on OSI, Prentice Hall, Englewood Cliffs, NJ, 1990.
[Sch95] Bruce Schneier. Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley, New York, 1995.
[Sta93a] William Stallings. Networking Standards: A Guide to OSI, ISDN, LAN, and MAN Standards, Addison-Wesley, Reading, MA, 1993.
[Sta93b] William Stallings. SNMP, SNMPv2 and CMIP: The Practical Guide to Network Management Standards, Addison-Wesley, Reading, MA, 1993.
[Sta99] William Stallings. SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, Addison-Wesley, Reading, MA, 1999.

6 Internet Security

Christopher Kruegel
Vienna University of Technology

6.1 Security Attacks and Security Properties
6.2 Security Mechanisms
Attack Prevention • Attack Avoidance • Attack and Intrusion Detection
6.3 Secure Network Protocols
6.4 Secure Applications
6.5 Summary
References

In order to provide useful services or to allow people to perform tasks more conveniently, computer systems are attached to networks and interconnected. This results in the worldwide collection of local and wide area networks known as the Internet. Unfortunately, the extended access possibilities also entail increased security risks, as they open additional avenues for an attacker. For a closed, local system, the attacker was required to be physically present at the network in order to perform unauthorized actions. In the networked case, each host that can send packets to the victim can potentially be utilized. As certain services (such as Web or name servers) need to be publicly available, each machine on the Internet might be the originator of malicious activity. This fact makes attacks very likely to happen on a regular basis.

The following attempts to give a systematic overview of security requirements of Internet-based systems and potential means to satisfy them. We define properties of a secure system and provide a classification of potential threats to it. We also introduce mechanisms to defend against attacks that attempt to violate desired properties. The most widely used means to secure application data against tampering and eavesdropping, the Secure Sockets Layer (SSL), and its successor, the Transport Layer Security (TLS) protocol, are discussed. Finally, we briefly describe popular application programs that can act as building blocks for securing custom applications.

Before one can evaluate attacks against a system and decide on appropriate mechanisms against them, it is necessary to specify a security policy [23]. A security policy defines the desired properties for each part of a secure computer system. It is a decision that has to take into account the value of the assets that should be protected, the expected threats, and the cost of proper protection mechanisms. A security policy that is sufficient for the data of a normal user at home may not be sufficient for bank applications, as these systems are obviously a more likely target and have to protect more valuable resources. Although often neglected, the formulation of an adequate security policy is a prerequisite before one can identify threats and appropriate mechanisms to face them.

6.1 Security Attacks and Security Properties

For the following discussion, we assume that the function of a system that is the target of an attack is to provide information. In general, there is a flow of data from a source (e.g., host, file, memory) to a destination (e.g., remote host, other file, user) over a communication channel (e.g., wire, data bus). The task of the security system is to restrict access to this information to only those parties (persons or processes) that are authorized to have access according to the security policy in use. In the case of an automation system that is remotely connected to the Internet, the information flow is from or to a control application that manages sensors and actuators via communication lines of the public Internet and the network of the automation system (e.g., a fieldbus).

FIGURE 6.1 Security attacks.

The normal information flow and several categories of attacks that target it are shown in Figure 6.1 and explained below (according to [22]):

1. Interruption: An asset of the system gets destroyed or becomes unavailable. This attack targets the source or the communication channel and prevents information from reaching its intended target (e.g., cut the wire, overload the link so that the information gets dropped because of congestion). Attacks in this category attempt to perform a kind of denial of service (DoS).
2. Interception: An unauthorized party gets access to the information by eavesdropping into the communication channel (e.g., wiretapping).
3. Modification: The information is not only intercepted, but modified by an unauthorized party while in transit from the source to the destination. By tampering with the information, it is actively altered (e.g., modifying message content).
4. Fabrication: An attacker inserts counterfeit objects into the system without having the sender do anything. When a previously intercepted object is inserted, this process is called replaying. When the attacker pretends to be the legitimate source and inserts his desired information, the attack is called masquerading (e.g., replay an authentication message, add records to a file).

The four classes of attacks listed above violate different security properties of the computer system.
A security property describes a desired feature of a system with regard to a certain type of attack. A common classification following [5, 13] is listed below:

• Confidentiality: This property covers the protection of transmitted data against its release to nonauthorized parties. In addition to the protection of the content itself, the information flow should also be resistant against traffic analysis. Traffic analysis is used to gather information other than the transmitted values themselves from the data flow (e.g., timing data, frequency of messages).
• Authentication: Authentication is concerned with making sure that the information is authentic. A system implementing the authentication property assures the recipient that the data originate from the source they claim to come from. The system must make sure that no third party can masquerade successfully as another source.
• Nonrepudiation: This property describes the feature that prevents either sender or receiver from denying a transmitted message. When a message has been transferred, the sender can prove that it has been received. Similarly, the receiver can prove that the message has actually been sent.
• Availability: Availability characterizes a system whose resources are always ready to be used. Whenever information needs to be transmitted, the communication channel is available and the receiver can cope with the incoming data. This property makes sure that attacks cannot prevent resources from being used for their intended purpose.
• Integrity: Integrity protects transmitted information against modifications. This property ensures that a single message reaches the receiver as it has left the sender, but integrity also extends to a stream of messages. It means that no messages are lost, duplicated, or reordered, and it makes sure that messages cannot be replayed. As destruction is also covered under this property, all data must arrive at the receiver. Integrity is not only important as a security property, but also as a property for network protocols. Message integrity must also be ensured in case of random faults, not only in case of malicious modifications.
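To make the integrity property for a stream of messages more concrete, the following sketch shows one possible way a receiver could detect modified, replayed, duplicated, or reordered messages. The construction (a keyed hash over a sequence number and the payload, using HMAC-SHA-256 and a key shared in advance) is an assumption chosen for illustration and is not prescribed by the text; the function and class names are hypothetical.

```python
import hashlib
import hmac


def protect(key, seq, payload):
    """Return (seq, payload, tag); the tag covers the sequence number and payload."""
    tag = hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()
    return seq, payload, tag


class Receiver:
    """Accepts a message only if its tag verifies and it has the next expected number."""

    def __init__(self, key):
        self.key = key
        self.expected = 0

    def accept(self, seq, payload, tag):
        good = hmac.new(self.key, seq.to_bytes(8, "big") + payload,
                        hashlib.sha256).digest()
        if not hmac.compare_digest(good, tag):
            return False      # modified in transit, or fabricated without the key
        if seq != self.expected:
            return False      # replayed, duplicated, reordered, or a message is missing
        self.expected += 1
        return True


key = b"shared-secret-for-illustration"
m0 = protect(key, 0, b"open valve 3")
m1 = protect(key, 1, b"close valve 3")

rx = Receiver(key)
results = [
    rx.accept(*m0),                             # genuine, in order
    rx.accept(*m0),                             # replay of an earlier message
    rx.accept(m1[0], b"open valve 9", m1[2]),   # payload tampered with
    rx.accept(*m1),                             # genuine, in order
]
```

Only the first and last calls are accepted. Note that such a check detects violations but cannot restore a lost message, so availability remains a separate concern.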

6.2 Security Mechanisms Different security mechanisms can be used to enforce the security properties defined in a given security policy. Depending on the anticipated attacks, different means have to be applied to satisfy the desired properties. We divide these measures against attacks into three different classes: attack prevention, attack avoidance, and attack detection.

6.2.1 Attack Prevention

Attack prevention is a class of security mechanisms that contains ways of preventing or defending against certain attacks before they can actually reach and affect the target. An important element in this category is access control, a mechanism that can be applied at different levels, such as the operating system, the network, or the application layer. Access control [23] limits and regulates the access to critical resources. This is done by identifying or authenticating the party that requests a resource and checking its permissions against the rights specified for the demanded object. It is assumed that an attacker is not legitimately permitted to use the target object and is therefore denied access to the resource. As access is a prerequisite for an attack, any possible interference is prevented.

The most common form of access control used in multiuser computer systems is access control lists for resources that are based on the user identity of the process that attempts to use them. The identity of a user is determined by an initial authentication process that usually requires a name and a password. The login process retrieves the stored copy of the password corresponding to the user name and compares it with the presented one. When both match, the system grants the user the appropriate user credentials. When a resource should be accessed, the system looks up the user and group in the access control list and grants or denies access as appropriate.

An example of this kind of access control is a secure Web server. A secure Web server delivers certain resources only to clients that have authenticated themselves and possess sufficient credentials for the desired resource. The authentication process is usually handled by the Web client, such as Microsoft Internet Explorer or Mozilla, by prompting the user to enter his name and password.

The most important access control system at the network layer is a firewall [4].
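Before the discussion turns to firewalls, the login check and access control list lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class `AccessControl`, the user names, and the resource path are hypothetical, and the sketch stores a salted PBKDF2 hash instead of the "stored copy of the password" mentioned in the text, which is a deliberate substitution.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    # Assumption: keep a salted PBKDF2 hash rather than the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AccessControl:
    def __init__(self):
        self.users = {}   # user name -> (salt, password hash, set of groups)
        self.acl = {}     # resource -> set of user or group names allowed to use it

    def add_user(self, name, password, groups=()):
        salt = os.urandom(16)
        self.users[name] = (salt, hash_password(password, salt), set(groups))

    def login(self, name, password):
        """Return the user's credentials (name plus groups), or None on failure."""
        if name not in self.users:
            return None
        salt, stored, groups = self.users[name]
        if hmac.compare_digest(stored, hash_password(password, salt)):
            return {name} | groups
        return None

    def may_access(self, credentials, resource):
        """Grant access only if a credential appears in the resource's list."""
        return bool(credentials) and bool(credentials & self.acl.get(resource, set()))


ac = AccessControl()
ac.add_user("alice", "correct horse", groups=["operators"])
ac.add_user("bob", "hunter2")
ac.acl["/plant/setpoints"] = {"operators"}

alice = ac.login("alice", "correct horse")
bob = ac.login("bob", "hunter2")
```

Here `alice` is granted access to the resource through her group membership, while `bob` authenticates successfully but is not listed, and a failed login yields no credentials at all.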
The idea of a firewall is based on the separation of a trusted inside network of computers under single administrative control from a potentially hostile outside network. The firewall is a central choke point that allows enforcement of access control for services that may run at the inside or outside. The firewall prevents attacks from the outside against the machines in the inside network by denying connection attempts from unauthorized parties located outside. In addition, a firewall may also be utilized to prevent users behind the firewall from using certain services that are outside (e.g., surfing Web sites containing pornographic material).

For certain installations, a single firewall is not suitable. Networks that consist of several server machines that need to be publicly accessible and workstations that should be completely protected against connections from the outside would benefit from a separation between these two groups. When an attacker compromises a server machine behind a single firewall, all other machines can be attacked from this new base without restrictions. To prevent this, one can use two firewalls and the concept of a demilitarized zone (DMZ) [4] in between, as shown in Figure 6.2. In this setup, one firewall separates the outside network from a segment (DMZ) with the server machines, while a second one separates this area from the rest of the network. The second firewall can be configured in a way that denies all incoming connection attempts. Whenever an intruder compromises a server, he is now unable to immediately attack a workstation located in the inside network.

FIGURE 6.2 Demilitarized zone. (Elements shown: Internet, firewall, demilitarized zone (DMZ), firewall, inside network.)

The following design goals for firewalls are identified in [4]:

1. All traffic from inside to outside, and vice versa, must pass through the firewall. This is achieved by physically blocking all access to the internal network except via the firewall.
2. Only authorized traffic, as defined by the local security policy, will be allowed to pass.
3. The firewall itself should be immune to penetration. This implies the use of a trusted system with a secure operating system.
A trusted, secure operating system is often purpose built, has heightened security features, and provides only the minimal functionality necessary to run the desired applications.

These goals can be reached by using a number of general techniques for controlling access. The most common is called service control and determines the Internet services that can be accessed. Traffic on the Internet is currently filtered on the basis of Internet Protocol (IP) addresses and Transmission Control Protocol (TCP)/User Datagram Protocol (UDP) port numbers. In addition, there may be proxy software that receives and interprets each service request before passing it on. Direction control is a simple mechanism to control the direction in which particular service requests may be initiated and permitted to flow through. User control grants access to a service based on user credentials similar to the technique used in a multiuser operating system. Controlling external users requires secure authentication over the network (e.g., as provided in IPSec [10]). A more declarative approach, in contrast to the operational variants mentioned above, is behavior control. This technique determines how particular services are used. It may be utilized to filter e-mail to eliminate spam or to allow external access to only part of the local Web pages.

A summary of capabilities and limitations of firewalls is given in [22]. The following benefits can be expected:

• A firewall defines a single choke point that keeps unauthorized users out of the protected network. The use of such a point also simplifies security management.
• It provides a location for monitoring security-related events. Audits, logs, and alarms can be implemented on the firewall directly. In addition, it forms a convenient platform for some nonsecurity-related functions, such as address translation and network management.
• A firewall may serve as a platform to implement a virtual private network (e.g., by using IPSec).
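Service control and direction control, as described above, amount to matching header fields against an ordered rule set. The sketch below is a simplified model of such a rule check and not the behavior of any particular firewall product; the `Rule` fields, the example policy, and the first-match, default-deny evaluation order are assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    direction: str     # "in" (outside -> inside) or "out" (inside -> outside)
    protocol: str      # "tcp" or "udp"
    remote_net: str    # network of the outside party, in CIDR notation
    port: int          # destination port, i.e., the requested service
    allow: bool


# Hypothetical policy: anyone may reach the Web server, only one partner
# network may use SSH, and inside hosts may query outside name servers.
RULES = [
    Rule("in", "tcp", "0.0.0.0/0", 80, True),
    Rule("in", "tcp", "198.51.100.0/24", 22, True),
    Rule("out", "udp", "0.0.0.0/0", 53, True),
]


def permitted(direction, protocol, remote_addr, port, rules=RULES):
    """Return the verdict of the first matching rule; drop if no rule matches."""
    for rule in rules:
        if (rule.direction == direction
                and rule.protocol == protocol
                and rule.port == port
                and ip_address(remote_addr) in ip_network(rule.remote_net)):
            return rule.allow
    return False   # default deny
```

For example, `permitted("in", "tcp", "203.0.113.7", 22)` is rejected because the source lies outside the partner network, whereas the same request from `198.51.100.9` passes. Denying everything that is not explicitly listed follows the design goal that only authorized traffic may pass.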

The list below enumerates the limits of the firewall access control mechanism:

• A firewall cannot protect against attacks that bypass it, for example, via a direct dial-up link from the protected network to an Internet service provider (ISP). It also does not protect against internal threats from an inside hacker or an insider cooperating with an outside attacker.
• A firewall does not help when attacks are against targets whose access has to be permitted.
• It cannot protect against the transfer of virus-infected programs or files. It would be impossible, in practice, for the firewall to scan all incoming files and e-mails for viruses.

Firewalls can be divided into two main categories. A packet-filtering router, or packet filter for short, is an extended router that applies certain rules to the packets that are forwarded. Usually, traffic in each direction (in- and outgoing) is checked against a rule set that determines whether a packet is permitted to continue or should be dropped. The packet filter rules operate on the header fields used by the underlying communication protocols, for the Internet almost always IP, TCP, and UDP. Packet filters have the advantage that they are cheap, as they can often be built on existing hardware. In addition, they offer good performance for high-traffic loads. An example of a packet filter is the iptables package, which is implemented as part of the Linux 2.4 routing software.

A different approach is followed by an application-level gateway, also called a proxy server. This type of firewall does not forward packets on the network layer but acts as a relay on the application level. The user contacts the gateway, which in turn opens a connection to the intended target (on behalf of the user). A gateway completely separates the inside and outside networks at the network level and only provides a certain set of application services.
This allows authentication of the user who requests a connection and session-oriented scanning of the exchanged traffic up to the application-level data. This feature makes application gateways more secure than packet filters and offers a broader range of log facilities. On the downside, the overhead of such a setup may cause performance problems under heavy loads. Another important element in the set of attack prevention mechanisms is system hardening. System hardening is used to describe all steps that are taken to make a computer system more secure. It usually refers to changing the default configuration to a more secure one, possibly at the expense of ease of use. Vendors usually preinstall a large set of development tools and utilities, which, although beneficial to the new user, might also contain vulnerabilities. The initial configuration changes that are part of system hardening include the removal of services, applications, and accounts that are not needed and the enabling of operating system auditing mechanisms (e.g., event log in Windows). Hardening also involves a vulnerability assessment of the system. Numerous open-source tools such as network (e.g., nmap [8]) and vulnerability scanners (e.g., Nessus [12]) can help to check a system for open ports and known vulnerabilities. This knowledge then helps to remedy these vulnerabilities and close unnecessary ports. An important and ongoing effort in system hardening is patching. Patching describes a method of updating a file that replaces only the parts being changed, rather than the entire file. It is used to replace parts of a (source or binary) file that contain a vulnerability that is exploitable by an attacker. To be able to patch, it is necessary that the system administrators keep up to date with security advisories that are issued by vendors to inform about security-related problems in their products.
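The notion of patching described above, replacing only the changed parts of a file rather than the whole file, can be illustrated with a toy patch format. The format used here (a list of byte offsets with replacement bytes, restricted to files of equal length) and the function names are assumptions for this sketch; real patch tools use richer formats.

```python
def make_patch(old, new):
    """Return a list of (offset, replacement bytes) runs where new differs from old.

    Simplification: both versions must have the same length.
    """
    if len(old) != len(new):
        raise ValueError("this toy format only handles equal-length files")
    patch, start = [], None
    for i in range(len(old) + 1):
        differs = i < len(old) and old[i] != new[i]
        if differs and start is None:
            start = i                             # a changed run begins here
        elif not differs and start is not None:
            patch.append((start, new[start:i]))   # the run ended just before i
            start = None
    return patch


def apply_patch(data, patch):
    """Overwrite only the patched byte ranges and return the updated content."""
    buf = bytearray(data)
    for offset, replacement in patch:
        buf[offset:offset + len(replacement)] = replacement
    return bytes(buf)


# Hypothetical vulnerable and corrected versions of the same file content.
old = b"if (len <= 256) copy(buf, src, len);"
new = b"if (len <  255) copy(buf, src, len);"
patch = make_patch(old, new)
patched = apply_patch(old, patch)
```

The resulting patch touches two single bytes of the 36-byte content, which is why distributing a patch is usually much cheaper than distributing the complete file.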

6.2.2 Attack Avoidance

Security mechanisms in this category assume that an intruder may access the desired resource but the information is modified in a way that makes it unusable for the attacker. The information is preprocessed at the sender before it is transmitted over the communication channel and postprocessed at the receiver. While the information is transported over the communication channel, it resists attacks by being nearly useless for an intruder. One notable exception is attacks against the availability of the information, as an attacker could still interrupt the message. During the processing step at the receiver, modifications or errors that might have previously occurred can be detected (usually because the information cannot be correctly reconstructed). When no modification has taken place, the information at the receiver is identical to that at the sender before the preprocessing step.

FIGURE 6.3 Encryption and decryption.

The most important member in this category is cryptography, which is defined as the science of keeping messages secure [18]. It allows the sender to transform information into a random data stream from the point of view of an attacker, but to have it recovered by an authorized receiver (Figure 6.3). The original message is called plain text (sometimes clear text). The process of converting it through the application of some transformation rules into a format that hides its substance is called encryption. The corresponding disguised message is denoted cipher text, and the operation of turning it back into clear text is called decryption. It is important to notice that the conversion from plain to cipher text has to be lossless in order to be able to recover the original message at the receiver under all circumstances. The transformation rules are described by a cryptographic algorithm. The function of this algorithm is based on two main principles: substitution and transposition. In the case of substitution, each element of the plain text (e.g., bit, block) is mapped into another element of the used alphabet. Transposition describes the process where elements of the plain text are rearranged. Most systems involve multiple steps (called rounds) of transposition and substitution to be more resistant against cryptanalysis. Cryptanalysis is the science of breaking the cipher, i.e., discovering the substance of the message behind its disguise. When the transformation rules process the input elements one at a time, the mechanism is called a stream cipher; in case of operating on fixed-size input blocks, it is called a block cipher. If the security of an algorithm is based on keeping the way the algorithm works (i.e., the transformation rules) secret, it is called a restricted algorithm. Those algorithms are no longer of any interest today because they do not allow standardization or public quality control. 
In addition, when a large group of users is involved, such an approach cannot be used. A single person leaving the group makes it necessary for everyone else to change the algorithm. Modern cryptosystems solve this problem by basing the ability of the receiver to recover encrypted information on the fact that he possesses a secret piece of information (usually called the key). Both encryption and decryption functions have to use a key, and they are heavily dependent on it. When the security of the cryptosystem is completely based on the security of the key, the algorithm itself may be revealed. Although the security does not rely on the fact that the algorithm is unknown, the cryptographic function itself and the used key, together with its length, must be chosen with care. A common assumption is that the attacker has the fastest commercially available hardware at his disposal in his attempt to break the cipher text. The most common attack, called known plain text attack, is executed by obtaining cipher text together with its corresponding plain text. The encryption algorithm must be so complex that even if the code breaker is equipped with plenty of such pairs and powerful machines, it is infeasible for him to retrieve the key. An attack is infeasible when the cost of breaking the cipher exceeds the value of the information or the time it takes to break it exceeds the life span of the information. Given pairs of corresponding cipher and plain text, it is obvious that a simple key guessing algorithm will succeed after some time. The approach of successively trying different key values until the correct one is found is called brute force attack because no information about the algorithm is utilized whatsoever. In order to be useful, it is a necessary condition for an encryption algorithm that brute force attacks are infeasible. Depending on the keys that are used, one can distinguish two major cryptographic approaches: secret and public key cryptosystems.
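The terms introduced above (substitution, key, known plain text attack, and brute force attack) can be made concrete with a deliberately weak toy cipher. The cipher below, a byte-wise shift with an 8-bit key, is an assumption chosen only because its key space of 256 values can be searched instantly; it is not a cipher discussed in the text and must not be used for real protection.

```python
def encrypt(plain, key):
    """Toy substitution cipher: shift every byte by the key (mod 256)."""
    return bytes((b + key) % 256 for b in plain)


def decrypt(cipher, key):
    """Inverse substitution with the same key."""
    return bytes((b - key) % 256 for b in cipher)


def brute_force(known_plain, known_cipher):
    """Try every possible key until one maps the known plain text to the cipher text."""
    for candidate in range(256):
        if encrypt(known_plain, candidate) == known_cipher:
            return candidate
    return None


secret_key = 173
intercepted = encrypt(b"PUMP 2 ON", secret_key)

# The attacker holds one plain/cipher pair and recovers the key from it ...
recovered = brute_force(b"PUMP 2 ON", intercepted)
# ... and can then read any other message protected with the same key.
other = decrypt(encrypt(b"VALVE 7 CLOSED", secret_key), recovered)
```

The search uses no knowledge of the algorithm beyond the ability to run it. Each additional key bit doubles the number of candidates, which is why the key length decides whether a brute force attack is feasible.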

6.2.2.1 Secret Key Cryptography

This is the kind of cryptography that has been used for the transmission of secret information for centuries, long before the advent of computers. These algorithms require that the sender and receiver agree on a key before communication is started. It is characteristic of this variant (which is also called single key or symmetric encryption) that a single secret key is shared between the sender and receiver. It needs to be communicated in a secure way before the actual encrypted communication can start, and has to remain secret as long as the information is to remain secret. Encryption is achieved by applying an agreed function to the plain text using the secret key. Decryption is performed by applying the inverse function using the same key.

The classic example of a secret key block cipher, which is widely deployed today, is the data encryption standard (DES) [6]. DES was developed in 1977 by IBM and adopted as a standard by the U.S. government for administrative and business use. Recently, it has been replaced by the advanced encryption standard (AES, Rijndael) [1]. DES is a block cipher that operates on 64-bit plain text blocks and utilizes a key 56 bits in length. The algorithm uses 16 rounds that are key dependent. During each round 48 key bits are selected and combined with the block that is encrypted. Then, the resulting block is piped through a substitution and a permutation phase (which use known values and are independent of the key) to make cryptanalysis harder.

Although there is no known weakness of the DES algorithm itself, its security has been much debated. The small key length makes brute force attacks possible, and several cases have occurred where DES-protected information has been cracked. A suggested improvement called 3DES applies the simple DES three times with three different keys. This extends the key length to 168 bits while still resting on the very secure DES base.
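The effect of the key length on brute force attacks can be estimated with simple arithmetic. The sketch below computes the worst-case time to try every key for the 56-bit and 168-bit key lengths mentioned above. The trial rate of 10^12 keys per second is purely an assumption for illustration and does not describe any particular hardware.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_search(key_bits, keys_per_second):
    """Worst-case time, in years, to try every key of the given length."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR


RATE = 10 ** 12   # assumed trial rate in keys per second (illustrative only)

des_keys = 2 ** 56                             # number of possible 56-bit keys
des_years = years_to_search(56, RATE)          # well below one year
triple_des_years = years_to_search(168, RATE)  # astronomically large
```

At the assumed rate, the complete 56-bit key space is exhausted in roughly 20 hours, whereas the figure for a 168-bit key exceeds 10^30 years. The comparison counts key candidates only and ignores any attack that is more efficient than exhaustive search.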
A well-known stream cipher that has been debated recently is RC4 [16], which was developed by RSA. It is used to secure the transmission in wireless networks that follow the IEEE 802.11 standard and forms the core of the Wired Equivalent Privacy (WEP) mechanism. Although the cipher itself has not been broken, current implementations are flawed and reduce the security of RC4 to a level where the key in use can be recovered by statistical analysis within a few hours.

6.2.2.2 Public Key Cryptography

Before the advent of public key cryptography, knowledge of the key that is used to encrypt a plain text also allowed the inverse process, the decryption of the cipher text. In 1976, this paradigm of cryptography was changed by Diffie and Hellman [7] when they described their public key approach. Public key cryptography utilizes two different keys, one called the public key, the other the private key. The public key is used to encrypt a message while the corresponding private key is used to do the opposite. Their innovation was based on the fact that it is infeasible to retrieve the private key given the public key. This makes it possible to remove the weakness of secure key transmission from the sender to the receiver. The receiver can simply generate his public–private key pair and announce the public key without fear. Anyone can obtain this key and use it to encrypt messages that only the receiver with his private key is able to decrypt.

Mathematically, the process is based on the trap door of one-way functions. A one-way function is a function that is easy to compute but very hard to invert. That means that given x, it is easy to determine f(x), but given f(x), it is hard to get x. Hard is defined as computationally infeasible in the context of cryptographically strong one-way functions.
Although it is obvious that some functions are easier to compute than their inverse (e.g., the square of a value in contrast to its square root), there is no mathematical proof that one-way functions exist. A number of problems (e.g., factorization of large numbers) are considered difficult enough to act as one-way functions, but this is more an agreement among cryptanalysts than a rigorously defined set. A one-way function is not directly usable for cryptography, but it becomes so when a trap door exists. A trap door is a mechanism that allows one to easily calculate x from f(x) when additional information y is provided. A common misunderstanding about public key cryptography is the belief that it makes secret key systems obsolete, either because it is more secure or because it does not have the problem of secretly exchanging keys. As the security of a cryptosystem depends on the length of the used key and the utilized

transformation rules, there is no automatic advantage of one approach over the other. Although the key exchange problem is elegantly solved with a public key, the process itself is very slow and has its own problems. Secret key systems are usually a factor of 1000 (see [18] for exact numbers) faster than their public key counterparts. Therefore, most communication is still secured using secret key systems, and public key systems are only utilized for exchanging the secret key for later communication. This hybrid approach is the common design to benefit from the high speed of conventional cryptography (which is often implemented directly in hardware) and from a secure key exchange. A problem in public key systems is the authenticity of the public key. An attacker may offer the sender his own public key and pretend that it originates from the legitimate receiver. The sender then uses the fake public key to perform his encryption and the attacker can simply decrypt the message using his private key. In order to thwart an attacker that attempts to substitute his public key for the victim’s, certificates are used. A certificate combines user information with the user’s public key and the digital signature of a trusted third party that guarantees that the key belongs to the mentioned person. The trusted third party is usually called a certification authority (CA). The certificate of a CA itself is usually verified by a higher-level CA that confirms that the CA’s certificate is genuine and contains its public key. The chain of third parties that verify their respective lower-level CAs has to end at a certain point, which is called the root CA. A user that wants to verify the authenticity of a public key and all involved CAs needs to obtain the self-signed certificate of the root CA via an external channel. Web browsers (e.g., Netscape Navigator, Internet Explorer) usually ship with a number of certificates of globally known root CAs. 
A framework that implements the distribution of certificates is called a public key infrastructure (PKI). An important standard for certificates and key management is X.509 [25]. Another important issue is revocation, the invalidation of a certificate when the key has been compromised.
The best-known public key algorithm and textbook classic is RSA [17], named after its inventors at MIT: Rivest, Shamir, and Adleman. It is a block cipher that is still used in the majority of current systems, although the key length has been increased over recent years. This has put a heavier processing load on applications, a burden that has ramifications especially for sites doing electronic commerce. A competing approach that promises security similar to that of RSA with far smaller key lengths is elliptic curve cryptography. However, as these systems are new and have not been subject to sustained cryptanalysis, the confidence level in them is not yet as high as in RSA.
6.2.2.3 Authentication and Digital Signatures
An interesting and important feature of public key cryptography is its possible use for authentication. In addition to making the information unusable for attackers, a sender may use cryptography to prove his identity to the receiver. This feature is realized by digital signatures. A digital signature must have properties similar to those of a normal handwritten signature. It must be hard to forge and it has to be bound to a certain document. In addition, one has to make sure that a valid signature cannot be used by an attacker to replay the same (or a different) message at a later time. One way to realize such a digital signature is to use the sender’s private key to encrypt a message. When the receiver is able to decrypt the cipher text successfully with the sender’s public key, he can be sure that the message is authentic. This approach obviously requires a cryptosystem that allows encryption with the private key, but many (such as RSA) offer this option.
It is easy for a receiver to verify that a message has been successfully decrypted when the plain text is in a human readable format. For binary data, a checksum or similar integrity checking footer can be added to verify a successful decryption. Replay attacks are prevented by adding a time stamp to the message (e.g., Kerberos [11] uses time stamps to prevent messages to the ticket-granting service from being replayed). Usually, the storage and processing overhead for encrypting a whole document is too high to be practical. This is solved by one-way hash functions. These are functions that map the content of a message onto a short value (called message digest). Similar to one-way functions, it is difficult to create a message when given only the hash value itself. Instead of encrypting the whole message, it is enough to simply encrypt the message digest and send it together with the original message. The receiver can then apply

the known hash function (e.g., MD5 [15]) to the document and compare it to the decrypted digest. When both values match, the message is authentic.
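The hash-then-sign procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the key pair is toy-sized textbook RSA built from two small Mersenne primes, SHA-256 from Python's `hashlib` stands in for the MD5 function mentioned in the text, and the digest is reduced modulo `n` only so that the toy key can process it. Real signature schemes use large keys and standardized padding.

```python
import hashlib

# Toy RSA key pair (illustrative values, far too small for real use).
p, q, e = 2**61 - 1, 2**31 - 1, 65537
n = p * q                               # public modulus
d = pow(e, -1, (p - 1) * (q - 1))       # private exponent (Python 3.8+)

def digest(message: bytes) -> int:
    """Message digest, reduced modulo n so the toy key can handle it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Sender: transform only the short digest with the private key."""
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the digest with the public key and compare."""
    return pow(signature, e, n) == digest(message)

document = b"transfer 100 units to account 42"
signature = sign(document)
assert verify(document, signature)
assert not verify(document, signature + 1)   # a forged signature is rejected
```

Only the digest passes through the slow public key operation; the document itself is sent unchanged alongside the signature, which is the storage and processing saving the text refers to.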

6.2.3 Attack and Intrusion Detection
Attack detection assumes that an attacker can obtain access to his desired targets and is successful in violating a given security policy. Mechanisms in this class are based on the optimistic assumption that most of the time the information is transferred without interference. When undesired actions occur, attack detection has the task of reporting that something went wrong and then reacting in an appropriate way. In addition, it is often desirable to identify the exact type of attack. An important facet of attack detection is recovery. Often it is enough to just report that malicious activity has been found, but some systems require that the effect of the attack be reverted or that an ongoing and discovered attack be stopped. On the one hand, attack detection has the advantage that it operates under the worst-case assumption that the attacker gains access to the communication channel and is able to use or modify the resource. On the other hand, detection is not effective in providing confidentiality of information. When the security policy specifies that interception of information has a serious security impact, then attack detection is not an applicable mechanism.
The most important members of the attack detection class, which have received an increasing amount of attention in the last few years, are intrusion detection systems (IDSs). Intrusion detection [2, 3] is the process of identifying and responding to malicious activities targeted at computing and network resources. This definition introduces the notion of intrusion detection as a process, which involves technology, people, and tools. An intrusion detection system basically monitors and collects data from a target system that should be protected, processes and correlates the gathered information, and initiates responses when evidence for an intrusion is detected. IDSs are traditionally classified as anomaly or signature based.
Signature-based systems act similarly to virus scanners and look for known, suspicious patterns in their input data. Anomaly-based systems watch for deviations of actual from expected behavior and classify all abnormal activities as malicious. The advantage of signature-based designs is that they can identify attacks with acceptable accuracy and tend to produce fewer false alarms (i.e., classifying an action as malicious when in fact it is not) than their anomaly-based cousins. The systems are more intuitive to build and easier to install and configure, especially in large production networks. Because of this, nearly all commercial systems and most deployed installations use signature-based detection. Although anomaly-based variants offer the advantage of being able to find previously unknown intrusions, the cost of having to deal with an order of magnitude more false alarms is often prohibitive.
Depending on their source of input data, IDSs can be classified as either network or host based. Network-based systems collect data from network traffic (e.g., packets captured by network interfaces in promiscuous mode), while host-based systems monitor events at the operating system level, such as system calls, or receive input from applications (e.g., via log files). Host-based designs can collect high-quality data directly from the affected system and are not influenced by encrypted network traffic. Nevertheless, they often seriously impact the performance of the machines they are running on. Network-based IDSs, on the other hand, can be set up in a nonintrusive manner, often as an appliance box, without interfering with the existing infrastructure. In many cases, this makes them the preferred choice.
As many vendors and research centers have developed their own intrusion detection system versions, the Internet Engineering Task Force (IETF) has created the intrusion detection working group [9] to coordinate international standardization efforts.
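The two detection styles contrasted above, signature based and anomaly based, can be sketched in a few lines. The patterns in `SIGNATURES`, the request strings, and the three-standard-deviation threshold are hypothetical choices made for this illustration; production systems use far richer rule languages and behavior models.

```python
import statistics

# Hypothetical patterns, for illustration only.
SIGNATURES = {
    "cmd.exe": "suspicious shell invocation in URL",
    "/etc/passwd": "password file access attempt",
}

def signature_alerts(request: str) -> list[str]:
    """Signature based: flag input that contains a known bad pattern."""
    return [name for pattern, name in SIGNATURES.items() if pattern in request]

def is_anomalous(history: list[float], value: float, k: float = 3.0) -> bool:
    """Anomaly based: flag values more than k standard deviations from the mean."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    return abs(value - mean) > k * spread

# A known pattern is found; an unknown attack would pass unnoticed.
print(signature_alerts("GET /scripts/root.exe?/c+cmd.exe"))

# Requests per minute observed in the past versus a sudden burst.
baseline = [100, 102, 98, 101, 99]
print(is_anomalous(baseline, 103), is_anomalous(baseline, 500))
```

The sketch also hints at the trade-off discussed above: the signature check only fires on what it already knows, whereas the anomaly check reacts to anything unusual, including harmless load peaks that would surface as false alarms.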
The aim is to allow intrusion detection systems to share information and to communicate via well-defined interfaces by proposing a generic architectural description and a message specification and exchange format, the Intrusion Detection Message Exchange Format (IDMEF).
A major issue when deploying intrusion detection systems in large network installations is the huge number of alerts that are produced. These alerts have to be analyzed by system administrators, who have to decide on the appropriate countermeasures. Given the current state of the art of intrusion detection, however, many of the reported incidents are in fact false alerts. This makes the analysis process for the

system administrator cumbersome and frustrating, resulting in the problem that IDSs are often disabled or ignored. To address this issue, two new techniques have been proposed: alert correlation and alert verification. Alert correlation is an analysis process that takes as input the alerts produced by intrusion detection systems and produces compact reports on the security status of the network under surveillance. By reducing the total number of individual alerts and aggregating related incidents into a single report, it is easier for a system administrator to distinguish actual and bogus alarms. In addition, alert correlation offers the benefit of recognizing higher-level patterns in an alert stream, helping the administrator to obtain a better overview of the activities on the network. Alert verification is a technique that is directly aimed at the problem that intrusion detection systems often have to analyze data without sufficient contextual information. The classic example is the scenario of a Code Red worm that attacks a Linux Web server. It is a valid attack that is seen on the network; however, the alert that an IDS raises is of no use because the Linux server is not vulnerable (as Code Red can only exploit vulnerabilities in Microsoft’s IIS Web server). The intrusion detection system would require more information to determine that this attack cannot possibly succeed than what is available from only looking at network packets. Alert verification is a term that is used for all mechanisms that use additional information or means to determine whether an attack was successful. In the example above, the alert verification mechanism could supply the IDS with the knowledge that the attacked Linux server is not vulnerable to a Code Red attack. As a consequence, the IDS can react accordingly and suppress the alert or reduce its priority and thus reduce the workload of the administrator.
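Alert verification and a very simple form of alert correlation, as described above, might look like the following sketch. The host inventory, the vulnerability mapping, and the alert dictionaries are hypothetical data structures invented for this example; the Code Red scenario is the one from the text.

```python
from collections import Counter

# Hypothetical inventory: which server software runs on each host.
INVENTORY = {"10.0.0.5": "apache-linux", "10.0.0.9": "iis-windows"}
# Hypothetical mapping from attack name to the software it can exploit.
AFFECTS = {"code-red": {"iis-windows"}}

def verify(alert: dict) -> bool:
    """Alert verification: keep the alert only if the target could be vulnerable."""
    software = INVENTORY.get(alert["target"])
    vulnerable = AFFECTS.get(alert["attack"])
    if software is None or vulnerable is None:
        return True                  # no context available, so do not suppress
    return software in vulnerable

def correlate(alerts: list[dict]) -> Counter:
    """Alert correlation (simplest form): merge alerts with the same attack and target."""
    return Counter((a["attack"], a["target"]) for a in alerts)

raw_alerts = [
    {"attack": "code-red", "target": "10.0.0.5"},   # Linux server, cannot succeed
    {"attack": "code-red", "target": "10.0.0.9"},
    {"attack": "code-red", "target": "10.0.0.9"},
]
report = correlate([a for a in raw_alerts if verify(a)])
print(report)   # one aggregated entry instead of three individual alerts
```

Whether an unverifiable alert should be kept or dropped is a policy decision; the sketch keeps it, on the assumption that a missed attack costs more than an extra entry in the report.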

6.3 Secure Network Protocols
Now that the general concepts and mechanisms of network security have been introduced, the following section concentrates on two actual instances of secure network protocols: the secure sockets layer (SSL [20]) and the transport layer security (TLS [24]) protocol. The idea of secure network protocols is to create an additional layer between the application and transport/network layers to provide services for a secure end-to-end communication channel. TCP and IP are almost always used as transport and network layer protocols on the Internet, and their task is to provide a reliable end-to-end connection between remote tasks on different machines that intend to communicate. The services on that level are usually used directly by application protocols to exchange data, for example, by the Hypertext Transfer Protocol (HTTP) for Web services. Unfortunately, these layers transmit the data unencrypted, leaving it vulnerable to eavesdropping or tampering attacks. In addition, the authentication mechanisms of TCP/IP are only minimal, thereby allowing a malicious user to hijack connections and redirect traffic to his machine as well as to impersonate legitimate services. These threats are mitigated by secure network protocols that provide privacy and data integrity between two communicating applications by creating an encrypted and authenticated channel.
SSL has emerged as the de facto standard for secure network protocols. Originally developed by Netscape, its latest version, SSL 3.0, is also the base for the standard proposed by the IETF under the name TLS. Both protocols are quite similar and share common ideas, but they unfortunately cannot interoperate. The following discussion will mainly concentrate on SSL and only briefly explain the extensions implemented in TLS. The SSL protocol [21] usually runs above TCP/IP (although it could use any transport protocol) and below higher-level protocols such as HTTP.
It uses TCP/IP on behalf of the higher-level protocols, and in the process allows an SSL-enabled server to authenticate itself to an SSL-enabled client, allows the client to authenticate itself to the server, and allows both machines to establish an encrypted connection. These capabilities address fundamental concerns about communication over the Internet and other TCP/IP networks and give protection against message tampering, eavesdropping, and spoofing.
• SSL server authentication allows a user to confirm a server’s identity. SSL-enabled client software can use standard techniques of public key cryptography to check that a server’s certificate and

public key are valid and have been issued by a certification authority (CA) listed in the client’s list of trusted CAs. This confirmation might be important if the user, for example, is sending a credit card number over the network and wants to check the receiving server’s identity.
• SSL client authentication allows a server to confirm a user’s identity. Using the same techniques as those used for server authentication, SSL-enabled server software can check that a client’s certificate and public key are valid and have been issued by a certification authority (CA) listed in the server’s list of trusted CAs. This confirmation might be important if the server, for example, is a bank sending confidential financial information to a customer and wants to check the recipient’s identity.
• An encrypted SSL connection requires all information sent between a client and a server to be encrypted by the sending software and decrypted by the receiving software, thus providing a high degree of confidentiality. Confidentiality is important for both parties to any private transaction. In addition, all data sent over an encrypted SSL connection are protected with a mechanism for detecting tampering, that is, for automatically determining whether the data have been altered in transit.
SSL uses X.509 certificates for authentication, RSA as its public key cipher, and one of RC4-128, RC2-128, DES, 3DES, or IDEA (international data encryption algorithm) as its bulk symmetric cipher. The SSL protocol includes two subprotocols: the SSL Record Protocol and the SSL Handshake Protocol. The SSL Record Protocol simply defines the format used to transmit data. The SSL Handshake Protocol (using the SSL Record Protocol) is used to exchange a series of messages between an SSL-enabled server and an SSL-enabled client when they first establish an SSL connection.
This exchange of messages is designed to facilitate the following actions:
• Authenticate the server to the client
• Allow the client and server to select the cryptographic algorithms, or ciphers, that they both support
• Optionally authenticate the client to the server
• Use public key encryption techniques to generate shared secrets
• Establish an encrypted SSL connection based on the previously exchanged shared secret
The SSL Handshake Protocol is composed of two phases. Phase 1 deals with the selection of a cipher, the exchange of a secret key, and the authentication of the server. Phase 2 handles client authentication, if requested, and finishes the handshaking. After the handshake stage is complete, the data transfer between client and server begins. All messages during handshaking and after are sent over the SSL Record Protocol layer. Optionally, session identifiers can be used to reestablish a secure connection that has been previously set up.
Figure 6.4 lists in a slightly simplified form the messages that are exchanged between the client C and the server S during a handshake when neither client authentication nor session identifiers are involved. In this figure, {data}key means that data has been encrypted with a key. The message exchange shows that the client first sends a challenge to the server, which responds with an X.509 certificate containing its public key. The client then creates a secret key and uses RSA with the server’s public key to encrypt it, sending the result back to the server. Only the server is capable of decrypting that message with its private key and can retrieve the shared, secret key. In order to prove to the client that

FIGURE 6.4 SSL handshake message exchange.

the secret key has been successfully decrypted, the server encrypts the client’s challenge with the secret key and returns it. When the client is able to decrypt this message and successfully retrieve the original challenge by using the secret key, it can be certain that the server has access to the private key corresponding to its certificate. From this point on, all communication is encrypted using the chosen cipher and the shared secret key.
TLS uses the same two subprotocols described above and a similar handshake mechanism. Nevertheless, the algorithms for calculating message authentication codes (MACs) and secret keys have been modified to make them cryptographically more secure. In addition, the constraints on padding a message up to the next block size have been relaxed for TLS. This leads to an incompatibility between the two protocols.
SSL/TLS is widely used to secure Web and mail traffic. HTTP and the current mail protocols IMAP (Internet Message Access Protocol) and POP3 (Post Office Protocol version 3) transmit user credential information as well as application data unencrypted. By building them on top of a secure network protocol such as SSL/TLS, they can benefit from secured channels without modifications. The secure variants simply use different well-known destination ports (443 for HTTPS, 993 for IMAPS, and 995 for POP3S) than their insecure cousins.
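The simplified handshake described above (challenge, certificate, encrypted secret key, encrypted challenge) can be simulated in a few lines. This is a sketch under strong assumptions and not the SSL wire protocol: the server's "certificate" is reduced to a bare toy RSA public key with no CA validation, `keystream_xor` is an invented toy cipher standing in for the negotiated bulk cipher, and the classes and function names are hypothetical.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream (not RC4 or DES)."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class Server:
    def __init__(self):
        p, q = 2**61 - 1, 2**31 - 1                     # toy primes, far too small
        self.n, self.e = p * q, 65537                   # public key
        self._d = pow(self.e, -1, (p - 1) * (q - 1))    # private key

    def certificate(self):
        """Stands in for the X.509 certificate: only the public key is returned."""
        return self.n, self.e

    def respond(self, challenge: bytes, encrypted_secret: int) -> bytes:
        secret = pow(encrypted_secret, self._d, self.n)     # needs the private key
        self.session_key = secret.to_bytes(12, "big")
        return keystream_xor(self.session_key, challenge)   # {challenge}secret

def client_handshake(server: Server) -> bytes:
    challenge = secrets.token_bytes(16)                     # C -> S: challenge
    n, e = server.certificate()                             # S -> C: certificate
    secret = secrets.randbelow(2**88)                       # client picks the secret key
    reply = server.respond(challenge, pow(secret, e, n))    # C -> S: {secret}public key
    key = secret.to_bytes(12, "big")
    if keystream_xor(key, reply) != challenge:              # S -> C: {challenge}secret
        raise ValueError("server could not prove possession of the private key")
    return key

server = Server()
session_key = client_handshake(server)
assert session_key == server.session_key
```

After this exchange both sides hold the same secret key, and subsequent traffic would be protected with the symmetric cipher, which is the hybrid design that combines a secure key exchange with fast bulk encryption.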

6.4 Secure Applications
A variety of popular tools that allow access to remote hosts (such as telnet, rsh, and rlogin) or that provide means for file transfer (such as rcp or ftp) exchange user credentials and data in plain text. This makes them vulnerable to eavesdropping, tampering, and spoofing attacks. Although the tools mentioned above could also have been built upon SSL/TLS, a different protocol suite called Secure Shell (SSH) [19] has been developed that follows partially overlapping goals. The SSH transport and user authentication protocols have features similar to those of SSL/TLS. However, they are different in the following ways:
• TLS server authentication is optional and the protocol supports fully anonymous operation, in which neither side is authenticated. As such connections are inherently vulnerable to man-in-the-middle attacks, SSH requires server authentication.
• TLS does not provide the range of client authentication options that SSH does; public key via RSA is the only option.
• Most importantly, TLS does not have the extra features provided by the SSH connection protocol.
The SSH connection protocol uses the underlying connection (also known as a secure tunnel), which has been established by the SSH transport and user authentication protocols between two hosts. It provides interactive login sessions, remote execution of commands, and forwarded TCP/IP as well as X11 connections. All these terminal sessions and forwarded connections are realized as different logical channels that may be opened by either side on top of the secure tunnel. Channels are flow controlled, which means that no data may be sent to a channel until a message is received to indicate that window space is available.
The current version of the SSH protocol is SSH 2. It represents a complete rewrite of SSH 1 and improves some of its structural weaknesses.
Because it encrypts packets in a different way and has abandoned the notion of server and host keys in favor of host keys only, the protocols are incompatible. For applications built from scratch, SSH 2 should always be the preferred choice. Using the means of logical channels for interactive login sessions and remote execution, a complete replacement for telnet, rsh, and rlogin could be easily implemented. A popular site that lists open-source implementations that are freely available for many different platforms can be found under [14]. Recently, a secure file transfer protocol (sftp) application has been developed that makes the use of regular File Transfer Protocol (FTP)-based programs obsolete. Notice that it is possible to tunnel arbitrary application traffic over a connection that has been previously set up by the SSH protocols. Similar to SSL/TLS, Web and mail traffic could be securely transmitted over a SSH connection before reaching the server port at the destination host. The difference is that SSH requires that a secure tunnel is created in advance that is bound to a certain port at the destination host. The setup

of this secure channel, however, requires that the client that is initiating the connection has to log in to the server. Usually, this makes it necessary that the user has an account at the destination host. After the tunnel has been established, all traffic sent by the client gets forwarded to the desired port at the target machine. Obviously, the connection is encrypted. In contrast, SSL/TLS connects directly to a certain point without prior logging in to the destination host. The encryption is set up directly between the client and the service listening at the destination port without a prior redirection via the SSH server. The technique of tunneling application traffic is often utilized for mail transactions when the mail server does not support SSL/TLS directly (as users have accounts at the mail server anyway), but it is less common for Web traffic.
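The tunneling setup described above can be expressed as a local port forwarding with the OpenSSH client. The helper below only builds the command line; the user name, host names, and port numbers in the usage example are hypothetical. It relies on two OpenSSH client options: `-L` for forwarding a local port and `-N` for not running a remote command.

```python
def ssh_tunnel_command(user: str, gateway: str, local_port: int,
                       target_host: str, target_port: int) -> list[str]:
    """Build an OpenSSH command line for local port forwarding (-L).

    Traffic sent to local_port on the client is carried through the
    encrypted SSH connection to gateway and forwarded from there to
    target_host:target_port.
    """
    forward = f"{local_port}:{target_host}:{target_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{gateway}"]

# Hypothetical example: reach an IMAP server (port 143) that does not
# support SSL/TLS itself via an account on the mail host.
command = ssh_tunnel_command("alice", "mail.example.org", 1143, "localhost", 143)
print(" ".join(command))
# To actually open the tunnel one could run, for instance:
#   import subprocess; subprocess.Popen(command)
```

A mail client would then be pointed at port 1143 on the local machine instead of at the mail server directly. As noted above, this presupposes that the user can log in to the destination host.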

6.5 Summary
This chapter discusses security threats that systems face when they are connected to the Internet. In order to achieve the security properties that are required by the security policy in use, three different classes of mechanisms can be adopted. The first is attack prevention, which attempts to stop the attacker before he can reach his desired goals. Such techniques fall into the category of access control and firewalls. The second approach aims to make the data unusable for unauthorized persons by applying cryptographic means. Secret key as well as public key mechanisms can be used. The third class of mechanisms contains attack detection approaches. They attempt to detect malicious behavior and recover after undesired activity has been identified.
The text also covers secure network protocols and applications. SSL/TLS as well as SSH are introduced, and their most common fields of operation are highlighted. These protocols form the base for securing traffic that is sent over the Internet on behalf of a variety of different applications.

References
[1] Advanced Encryption Standard (AES). National Institute of Standards and Technology, U.S. Department of Commerce, FIPS 197, 2001.
[2] Edward Amoroso. Intrusion Detection: An Introduction to Internet Surveillance, Correlation, Trace Back, and Response. Intrusion.Net Books, Andover, NJ, 1999.
[3] Rebecca Bace. Intrusion Detection. Macmillan Technical Publishing, Indianapolis, 2000.
[4] William R. Cheswick and Steven M. Bellovin. Firewalls and Internet Security. Addison-Wesley, Reading, MA, 1994.
[5] George Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems: Concepts and Design, 2nd edition. Addison-Wesley, Harlow, England, 1996.
[6] Data Encryption Standard (DES). National Bureau of Standards, U.S. Department of Commerce, FIPS 46-3, 1977.
[7] W. Diffie and M. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22:644-654, 1976.
[8] Fyodor. Nmap: The Network Mapper. http://www.insecure.org/nmap/.
[9] Intrusion Detection Working Group. http://www.ietf.org/ids.by.wg/idwg.html.
[10] IP Security Protocol. http://www.ietf.org/html.charters/ipsec-charter.html, 2002.
[11] J. Kohl, B. Neuman, and T. Ts’o. The evolution of the Kerberos authentication system. Distributed Open Systems, 78-94, IEEE Computer Society Press, 1994.
[12] Nessus Vulnerability Scanner. http://www.nessus.org/.
[13] Steven Northcutt. Network Intrusion Detection: An Analyst’s Handbook. New Riders, Indianapolis, 1999.
[14] OpenSSH: Free SSH Tool Suite. http://www.openssh.org.
[15] R.L. Rivest. The MD5 Message-Digest Algorithm. Technical report, Internet Request for Comments (RFC) 1321, 1992.
[16] R.L. Rivest. The RC4 Encryption Algorithm. Technical report, RSA Data Security, 1992.
[17] R.L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21:120-126, 1978.
[18] Bruce Schneier. Applied Cryptography, 2nd edition. John Wiley & Sons, New York, 1996.
[19] Secure Shell (secsh). http://www.ietf.org/html.charters/secsh-charter.html, 2002.
[20] Secure Socket Layer. http://wp.netscape.com/eng/ssl3/, 1996.
[21] Introduction to Secure Socket Layer. http://developer.netscape.com/docs/manuals/security/sslin/contents.htm, 1996.
[22] William Stallings. Network Security Essentials: Applications and Standards. Prentice Hall, Englewood Cliffs, NJ, 2000.
[23] Andrew S. Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms. Prentice Hall, Englewood Cliffs, NJ, 2002.
[24] Transport Layer Security. http://www.ietf.org/html.charters/tsl-charter.html, 2002.
[25] Public Key Infrastructure X.509. http://www.ietf.org/html.charters/pkix-charter.html, 2002.

2 Industrial Communication Technology and Systems

I Field Area and Control Networks
7 Fieldbus Systems: History and Evolution
   Thilo Sauter
8 The WorldFIP Fieldbus
   Jean-Pierre Thomesse
9 FOUNDATION Fieldbus: History and Features
   Salvatore Cavalieri
10 PROFIBUS: Open Solutions for the World of Automation
   Ulrich Jecht, Wolfgang Stripf, and Peter Wenzel
11 Principles and Features of PROFInet
   Manfred Popp, Joachim Feld, and Ralph Büsgen
12 Dependable Time-Triggered Communication
   Hermann Kopetz, Günther Bauer, and Wilfried Steiner
13 Controller Area Network: A Survey
   Gianluca Cena and Adriano Valenzano
14 The CIP Family of Fieldbus Protocols
   Viktor Schiffer
15 The Anatomy of the P-NET Fieldbus
   Christopher G. Jenkins
16 INTERBUS Means Speed, Connectivity, Safety
   Jürgen Jasperneite
17 Data Transmission in Industrial Environments Using IEEE 1394 FireWire
   Michael Scholles, Uwe Schelinski, Petra Nauber, and Klaus Frommhagen
18 Configuration and Management of Fieldbus Systems
   Stefan Pitzek and Wilfried Elmenreich
19 Which Network for Which Application
   Jean-Dominique Decotignie

7 Fieldbus Systems: History and Evolution
Thilo Sauter, Austrian Academy of Sciences

7.1 What Is a Fieldbus?
7.2 Notions of a Fieldbus
   The Origin of the Word • Fieldbuses as Part of a Networking Concept
7.3 History
   The Roots of Industrial Networks • The Evolution of Fieldbuses
7.4 Fieldbus Standardization
   The German–French Fieldbus War • The International Fieldbus War • The Compromise
7.5 Fieldbus Characteristics
   Communication Concepts • Communication Paradigms • Above the OSI Layers: Interoperability and Profiles • Management
7.6 New Challenges: Industrial Ethernet
   Ethernet in IEC 61158 • Real-Time Industrial Ethernet
7.7 Aspects for Future Evolution
   Driving Forces • System Complexity • Software Tools and Management • Network Interconnection and Security
7.8 Conclusion and Outlook
Acknowledgments
References
Appendix

7.1 What Is a Fieldbus? Throughout the history of automation, many inventions and developments have influenced the face of manufacturing and information processing. But few novelties have had such a radical effect as the introduction of fieldbus systems, and no single achievement was so heavily disputed as these industrial networks. And yet, they have made automation what it is today. But even after some 20 years of fieldbus development, there exists no clear-cut definition for the term. The “definition” given in the IEC 61158 fieldbus standard is more a programmatic declaration, or a least common multiple compromise, than a concise formulation [1]: “A fieldbus is a digital, serial, multidrop, data bus for communication with industrial control and instrumentation devices such as — but not limited to — transducers, actuators and local controllers.” It comprises some important characteristics, but is far from being complete. On the other hand, it is a bit too restrictive. A more elaborate explanation is given by the Fieldbus Foundation, the user organization supporting one of the major fieldbus systems [2]: “A Fieldbus is a digital, two-way, multi-drop communication link among intelligent measurement and control devices. It serves as a Local Area Network (LAN) for advanced process control, remote input/output and high speed factory automation applications.” Again, this is a


bit restrictive, for it limits the application to process and factory automation, the primary areas where the Foundation Fieldbus is used.

The lack of a clear definition is mostly due to the complex evolutionary history of fieldbuses. A look at today’s situation reveals that fieldbus systems are employed in all automation domains, ranging from the aforementioned process and factory areas to building and home automation, machine building, automotive and railway applications, and avionics. In all those fields, bus systems emerged primarily to break up the conventional star-type point-to-point wiring schemes connecting simple digital and analog input and output devices to central controllers, thereby laying the groundwork for the implementation of truly distributed systems with more intelligent devices. As was declared in the original mission statement of the International Electrotechnical Commission (IEC) work, “the Field Bus will be a serial digital communication standard which can replace present signalling techniques such as 4-20 mA … so that more information can flow in both directions between intelligent field devices and the higher level control systems over a shared communication medium …” [3, 4].

But even though the replacement of the traditional 4–20 mA current loop by a digital interface is still presented as the sole impetus of fieldbus development, even in contemporary publications [5], there is much more to the idea of the fieldbus:

• Flexibility and modularity: A fieldbus installation, like any other network, can be extended much more easily than a centralized system, provided the limitations of addressing space, cable length, etc., are not exceeded.
• Configurability: A network, unlike an analog interface, permits the parameterization and configuration of complex field devices, which facilitates system setup and commissioning and is the primary requirement for the usability of intelligent devices.
• Maintainability: Monitoring of devices, applying updates, and other maintenance tasks are easier via a network, if they are possible at all without one.
• Distribution: A network is the prerequisite of distributed systems; many data processing tasks can be removed from a central controller and placed directly in the field devices if the interface can handle reasonably complex ways of communication.

These aspects are not just theoretical contemplations but actual user demands that influenced the development from the beginning [4]. However, as the application requirements in the various automation domains were quite different, so were the solutions, and that makes it difficult to find a comprehensive definition.

The purpose of this contribution is not to find the one and only precise definition for what constitutes a fieldbus. The vast literature on this topic shows that this is a futile attempt. Furthermore, such a definition would be mostly of academic interest and is not really necessary either. Instead, the following sections will treat the fieldbus as a given phenomenon in automation and look at it from different sides. Typical characteristics will be discussed as well as the role of fieldbus systems in a globally networked automation world. The major part of this chapter will be devoted to the historical evolution and the standardization processes, which will shed light on the current situation. Finally, future aspects and evolutionary potential are briefly discussed.

7.2 Notions of a Fieldbus

Fieldbus systems have to be seen as an integral part of a comprehensive automation concept and not as stand-alone solutions. The name is therefore programmatic and evocative. It seems to give an indication of the intentions the developers had in mind and thus deserves special attention.

7.2.1 The Origin of the Word

Interestingly enough, not even the etymology of the term itself is fully clear. The English word fieldbus is definitely not the original one. It appeared around 1985 when the fieldbus standardization project


within IEC TC65 was launched [4] and seems to be a straightforward literal translation of the German term Feldbus, which can be traced back to about 1980 [6]. Indeed, the overwhelming majority of early publications in the area are available only in German. The word itself was coined in the process industry and primarily refers to the process field, designating the area in a plant where many distributed field devices, mostly sensors and actuators, are in direct contact with the process to be controlled. Slightly after the German expression and sharing its etymological root, the French term réseau de terrain (or réseau d’instrumentation, instrumentation network) emerged. This term was not specifically targeted at the process industry, but also refers to large areas with scattered devices.

The connection of such devices to the central control room was traditionally made via point-to-point links, which resulted in a significant and expensive cabling need. The logical idea, powered by the advances of microelectronics in the late 1970s, was to replace this star-like cabling in the field by a party-line, bus-like installation connecting all devices via a shared medium: the fieldbus [7, 8]. Given the large dimensions of process automation plants, the benefits of a bus are particularly evident.

However, the concept was not undisputed when it was introduced. The fieldbus approach was an ambitious concept: a step toward decentralization, including the preprocessing of data in the field devices, which both increases the quality of process control and reduces the computing burden for the centralized controllers [9]. Along with it came the possibility to configure and parameterize the field devices remotely via the bus. This advanced concept, on the other hand, demanded increased communication between the devices that goes far beyond a simple data exchange. This seemed infeasible to many developers, and as late as the mid-1980s, one could read statements like the following [10]: “The idea of the fieldbus concept seems promising. However, with reasonable effort it is not realizable at present.”

The alternative and somewhat more conservative approach was the development of so-called field multiplexers, devices that collect process signals in the field, serialize them, and transfer them via one single cable to a remote location where a corresponding device de-multiplexes them again [11]. For quite some time, the two concepts competed and coexisted [12], but ultimately the field multiplexers mostly disappeared, except for niches in process automation, where many users still prefer such remote input/output (I/O) systems despite the advantages of fieldbus solutions [13]. The central field multiplexer concept of sampling I/O points and transferring their values in simple data frames also survived in some fieldbus protocols, especially those designed for low-level applications.

The desire to cope with the wiring problem getting out of hand in large installations was certainly the main impetus for the development of fieldbus systems. Other obvious and appealing advantages of the concept are modularity, the possibility to easily extend installations, and the possibility to have much more intelligent field devices that can communicate not just for the sake of process data transfer, but also for maintenance and configuration purposes [14, 15]. A somewhat different viewpoint that led to different design approaches was to regard bus systems in process control as the spine of distributed real-time systems [16]. While the wiring optimization concepts were in many cases rather simple bottom-up approaches, these distributed real-time ideas resulted in sophisticated and usually well-investigated top-down designs.
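The field multiplexer principle described above (sample the I/O points, serialize them into one simple frame, and restore them at the remote end) can be illustrated with a minimal Python sketch. The frame layout used here (two count bytes, bit-packed digital inputs, 16-bit analog values) and the function names are assumptions made purely for illustration; they do not reproduce any historical multiplexer or fieldbus frame format.

```python
import struct


def multiplex(digital_inputs, analog_inputs):
    """Serialize sampled I/O points into one frame (hypothetical layout).

    Layout: [number of digital points][number of analog points]
            [digital points, bit-packed][analog points, 16-bit unsigned]
    Each count must fit into one byte, so at most 255 points per kind.
    """
    n_dig = len(digital_inputs)
    packed = bytearray((n_dig + 7) // 8)
    for i, state in enumerate(digital_inputs):
        if state:
            packed[i // 8] |= 1 << (i % 8)  # one bit per digital point
    header = struct.pack(">BB", n_dig, len(analog_inputs))
    analog = struct.pack(f">{len(analog_inputs)}H", *analog_inputs)
    return header + bytes(packed) + analog


def demultiplex(frame):
    """Restore the individual I/O points from a frame built by multiplex()."""
    n_dig, n_ana = struct.unpack_from(">BB", frame, 0)
    n_bytes = (n_dig + 7) // 8
    packed = frame[2:2 + n_bytes]
    digital = [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(n_dig)]
    analog = list(struct.unpack_from(f">{n_ana}H", frame, 2 + n_bytes))
    return digital, analog


# Three digital points and two raw analog readings share a single frame.
frame = multiplex([True, False, True], [4000, 20000])
digital, analog = demultiplex(frame)
```

In this sketch the seven-byte frame replaces five separate signal lines; the remote device only needs to know the agreed layout, which is the kind of simple data frame the text refers to.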

7.2.2 Fieldbuses as Part of a Networking Concept

An important role in the fieldbus evolution has been played by the so-called automation pyramid. This hierarchical model was defined to structure the information flow required for factory and process automation. The idea was to create a transparent, multilevel network, the basis for computer-integrated manufacturing (CIM). The numbers vary, but typically this model is composed of up to five levels [7, 8]. While the networks for the upper levels already existed by the time the pyramid was defined, the field level was still governed by point-to-point connections. Fieldbus systems were therefore developed also with the aim of finally bridging this gap. The actual integration of field-level networks into the rest of the hierarchy was in fact considered in early standardization [4]; for most of the proprietary developments, however, it was never the primary intention.

In the automation pyramid, fieldbuses actually populate two levels: the field level and the cell/process level. For this reason, they are sometimes further differentiated into two classes:


• Sensor–actuator buses or device buses have very limited capabilities and serve to connect very simple devices with, e.g., programmable logic controllers (PLCs). They can be found exclusively on the field level.
• Fieldbuses connect control equipment like PLCs and PCs as well as more intelligent devices. They are found on the cell level and are closer to computer networks.

Depending on the point of view, there may even be a third sublevel [17]. This distinction may seem reasonable but is in fact problematic. There are only a few fieldbus systems that can immediately be allocated to one of the groups; most of them are used on both levels. Therefore, it is preferable to abandon this arbitrary differentiation.

How do fieldbus systems compare to computer networks? The classical distinction of the different network types used in the automation pyramid hinges on the distances the networks span. From the top down, the hierarchy starts with global area networks (GANs), which cover long, preferably intercontinental distances and nowadays mostly use satellite links. On the second level are wide area networks (WANs). They are commonly associated with telephone networks (no matter if analog or digital). Next come the well-known local area networks (LANs), with Ethernet as the most widely used specimen today. They are the classical networks for office automation and cover only short distances. The highest level of the model shown in Figure 7.1 is beyond the scope of the original definition, but is gaining importance with the availability of the Internet. In fact, Internet technology is penetrating all levels of this pyramid all the way down to the process level.

From GANs to LANs, the classification according to the spatial extension is evident. One step below, on the field level, this criterion fails, because fieldbus systems or field area networks (FANs) can cover even larger distances than LANs. Yet, as LANs and FANs evolved nearly in parallel, some clear distinction between the two network types seemed necessary. As length is inappropriate, the classical borderline drawn between LANs and FANs relies mostly on the characteristics of the data transported over these networks. Local area networks have high data rates and carry large amounts of data in large packets. Timeliness is not a primary concern, and real-time behavior is not required. Fieldbus systems, by contrast, have low data rates. Since they transport mainly process data, the size of the data packets is small, and real-time capabilities are important.

For some time, these distinction criteria between LANs and FANs were sufficient and fairly described the actual situation. Recently, however, drawing the line according to data rates and packet sizes has become untenable. In fact, the boundaries between LANs and fieldbus systems have faded. Today, there are fieldbus systems with data rates well above 10 Mbit/s, which is still standard in older LAN installations. In addition, more and more applications require the transmission of video or voice data, which results in large data packets.

[Figure 7.1: automation pyramid with the levels company, factory, shop floor, cell, process, and field (sensor) level; equipment labels cell controller, PLC, CNC, and sensors/actuators; network types global area, wide area, local area, field area, and sensor-actuator networks; protocol hierarchy TOP, MAP, Mini-MAP, Fieldbus.]

FIGURE 7.1 Hierarchical network levels in automation and protocols originally devised for them.


On the other hand, Ethernet, the dominant LAN technology, is becoming more and more popular in automation and is bound to replace some of today’s widely used midlevel fieldbus systems. The real-time extensions under development tackle its most important drawback and will ultimately permit the use of Ethernet in low-level control applications. At least for the next 5 years, however, it seems that Industrial Ethernet will not make the lower-level fieldbuses fully obsolete. They are much better optimized for their specific automation tasks than the general-purpose network Ethernet. But the growing use of Ethernet results in a reduction of the levels in the automation hierarchy. Hence the pyramid gradually turns into a flat structure with at most three, maybe even only two, levels.

Consequently, a more appropriate distinction between LANs and FANs should be based on the functionality and the application area of these networks. According to this argumentation, a fieldbus is simply a network used in automation, irrespective of topology, data rates, protocols, or real-time requirements. It therefore need not be confined to the classical field level; it can be found on higher levels (provided they still exist) as well. A LAN, on the other hand, belongs to the office area. This definition is loose, but mirrors the actual situation. Only one thing seems strange at first: following this definition, Industrial Ethernet becomes a fieldbus, even though many people are inclined to associate it with LANs. However, this is just further evidence that the boundaries between LANs and FANs are fading.

7.3 History

The question of what constitutes a fieldbus is closely linked to the evolution of these industrial networks. The best approach to understanding the essence of the concepts is to review the history and intentions of the developers. This review will also refute one of the common misconceptions frequently propagated by marketing divisions of automation vendors: that fieldbus systems were a revolutionary invention. They may have revolutionized automation; there is hardly any doubt about that. However, they were only a straightforward evolution that built on preexisting ideas and concepts.

7.3.1 The Roots of Industrial Networks

Although the term fieldbus appeared only about 20 years ago, the basic idea of field-level networks is much older. Still, the roots of modern fieldbus technology are mixed. Both classical electrical engineering and computer science have contributed their share to the evolution, and we can identify three major sources of influence:

• Communication engineering with large-scale telephone networks
• Instrumentation and measurement systems with parallel buses and real-time requirements
• Computer science with the introduction of high-level protocol design

This early stage is depicted in Figure 7.2. One foundation of automation data transfer has to be seen in the classic telex networks and also in standards for data transmission over telephone lines. Large distances called for serial data transmission, and many of these comparatively early standards still exist, like V.21 (data transmission over telephone lines) and X.21 (data transmission over special data lines). Various protocols have been defined, mostly described in state machine diagrams and rather simple because of the limited computing power of the devices available at that time. Of course, these communication systems have a point-to-point nature and therefore lack the multidrop characteristic of modern fieldbus systems, but nevertheless, they were the origin of serial data transmission. Talking about serial data communication, one should notice that the engineers who defined the first protocols often had a different understanding of the terms serial and parallel than we have today. For example, the serial interface V.24 transmits the application data serially, but the control data in a parallel way over separate control lines.

In parallel to the development of data transmission in the telecommunication sector, hardware engineers defined interfaces for stand-alone computer systems to connect peripheral devices such as printers.
The basic idea of having standardized interfaces for external devices was soon extended to process control and instrumentation equipment. The particular problems to be solved were the synchronization of spatially distributed measurement devices and the collection of measurement data from multiple devices in large-scale experimental setups. This led to the development of standards like CAMAC (computer-automated measurement and control, mostly used in nuclear science) and GPIB (general purpose interface bus, later also known as IEEE 488). To account for the limited data processing speed and real-time requirements for synchronization, these bus systems had parallel data and control lines, which is also not characteristic of fieldbus systems. However, they were using the typical multidrop structure.

[Figure 7.2: diagram tracing fieldbus systems back to their roots. Labels: Centronics and parallel printer interfaces; CAMAC, GPIB, and industrial parallel interfaces; RS 485 and serial interfaces; Telex DT, Teletex DT, V.21, X.21, SS7, and DT in telecommunications; computer WAN and X.25.]

FIGURE 7.2 Roots of fieldbus systems.

Later on, with higher integration density of integrated circuits and thus increased functionality and processing capability of microcontrollers, devices became smaller and portable. The connectors of parallel bus systems were now too big and clumsy, and alternatives were sought [18]. The underlying idea of developments like I²C [19] was to extend the already existing serial point-to-point connections of computer peripherals (based on RS 232) to support longer distances and finally also multidrop arrangements. The capability of having a bus structure with more than just two connections, together with an increased noise immunity due to differential signal coding, eventually made RS 485 a cornerstone of fieldbus technology up to the present day.

Historically the youngest root of fieldbus systems, but certainly the one that left the deepest mark, was the influence of computer science. Its actual contribution was a structured approach to the design of high-level communication systems, contrary to the mostly monolithic design approaches that had been sufficient until then. This change in methodology had been necessitated by the growing number of computers used worldwide and the resulting complexity of communication networks. Conventional telephone networks were no longer sufficient to satisfy the interconnection requirements of modern computer systems. As a consequence, the big communication backbones of the national telephone companies gradually changed from analog to digital systems. This opened the possibility to transfer large amounts of data from one point to another. Together with an improved physical layer, the first really powerful data transmission protocols for wide area networks were defined, such as X.25 (packet switching) or SS7 (common channel signaling). In parallel to this evolution in the telecommunications sector, local area networks were devised for the local interconnection of computers, which soon led to a multitude of solutions. It took nearly a decade until Ethernet and the Internet Protocol (IP) suite finally gained the dominating position they have today.

7.3.2 The Evolution of Fieldbuses

The preceding section gave only a very superficial overview of the roots of networking, which laid the foundations not only of modern computer networks, but also of those on the field level. But let us now look more closely at the actual evolution of the fieldbus systems. Here again, we have to consider the


different influences of computer science and electrical engineering. First and foremost, the key contribution undoubtedly came from the networking of computer systems, when the International Organization for Standardization (ISO) introduced the Open Systems Interconnection (OSI) model [20, 21]. This seven-layer reference model was (and still is) the starting point for the development of many complex communication protocols.

The first application of the OSI model to the domain of automation was the definition of the Manufacturing Automation Protocol (MAP) in the wake of the CIM idea [22]. MAP was intended to be a framework for the comprehensive control of industrial processes covering all automation levels, and the result of the definition was a powerful and flexible protocol [23]. Its complexity, however, made implementations extremely costly and hardly justifiable for general-purpose use. As a consequence, a slimmed-down version called MiniMAP, using a reduced model based on OSI layers 1, 2, and 7, was proposed to better address the problems of the lower automation layers [24]. Unfortunately, it did not have the anticipated success either. What did have success was the Manufacturing Message Specification (MMS). It defined the cooperation of various automation components by means of abstract objects and services and was later used as a starting point for many other fieldbus definitions [25]. The lack of acceptance of MiniMAP and the inapplicability of the original MAP/MMS standard to time-critical systems [26] were finally the reason for the IEC to launch the development of a fieldbus based on the MiniMAP model, but tailored to the needs of the field level. According to the original objectives, the higher levels of the automation hierarchy should be covered by MAP or PROWAY (process data highway) [22].

Independent of this development in computer science, the progress in microelectronics brought forward many different integrated circuits (ICs), and new interfaces were needed to interconnect the ICs in an efficient and cheap way. The driving force was the reduction of both the interconnect wires on the printed circuit boards and the number of package pins on the ICs. Consequently, electrical engineers, without knowledge of the ISO/OSI model or similar architectures, defined simple buses like the I²C. Being interfaces rather than fully fledged bus systems, they have very simple protocols, but they were and still are widely used in various electronic devices.

Long before the invention of board-level buses, the demand for a reduction of cabling weight in avionics and space technology had led to the development of the MIL-STD-1553 bus, which can be regarded as the first real fieldbus. Introduced in 1970, it showed many characteristic properties of modern fieldbus systems: serial transmission of control and data information over the same line, master–slave structure, the possibility to cover longer distances, and integrated controllers. It is still used today. Later on, similar thoughts (reduction of cabling weight and costs) resulted in the development of several bus systems in the automotive industry, but also in the automation area. A characteristic property of these fieldbuses is that they were defined in the spirit of classical interfaces, with a focus on the lower two protocol layers, and no or nearly no application layer definitions. With time, these definitions were added to make the systems applicable to other areas as well.

Controller Area Network (CAN) is a good example of this evolution: for the originally targeted automotive market, the definition of the lowest two OSI layers was sufficient. Even today, automotive applications of CAN typically use only these low-level communication features because they are easy to use and the in-vehicle networks are usually closed. For applications in industrial automation, however, where extensibility and interoperability are important issues, higher-level functions are important. So, when CAN was found to be interesting also for other application domains, a special application layer was added. The lack of such a layer in the original definition is the reason why there are many different fieldbus systems (like CANopen, Smart Distributed System (SDS), DeviceNet) using CAN as a low-level interface.

From today’s point of view, it can be stated that all fieldbuses that still have some relevance were developed using the top-down or computer science-driven approach, i.e., a proper protocol design with abstract high-level programming interfaces to facilitate usage and integration in complex systems. The fieldbuses that followed the bottom-up or electrical engineering-driven approach, i.e., that were understood as low-level computer interfaces, did not survive due to their inflexibility and incompatibility with modern software engineering, unless some application layer functions were included in the course of the evolution.
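Why a low-level interface alone is not enough for open systems can be made concrete with a small Python sketch. It assumes a CAN-like raw frame (classical CAN uses an 11-bit identifier and at most 8 data bytes) and adds a hypothetical application-layer agreement on top. The identifiers, object names, scaling factors, and function names below are invented for this illustration and are not taken from CANopen, SDS, or DeviceNet.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class RawFrame:
    """What the two lowest layers deliver: an identifier and a few bytes."""
    identifier: int
    data: bytes

    def __post_init__(self):
        if not 0 <= self.identifier < 2 ** 11:
            raise ValueError("identifier must fit into 11 bits")
        if len(self.data) > 8:
            raise ValueError("a frame carries at most 8 data bytes")


# Hypothetical application-layer agreement shared by all nodes:
# identifier -> (object name, raw counts per engineering unit)
OBJECTS = {
    0x181: ("temperature_c", 10),
    0x182: ("pressure_kpa", 1),
}


def encode_value(name, value):
    """Application layer, sending side: named value -> raw frame."""
    for identifier, (obj_name, counts_per_unit) in OBJECTS.items():
        if obj_name == name:
            raw = round(value * counts_per_unit)
            return RawFrame(identifier, struct.pack(">h", raw))
    raise KeyError(name)


def decode_frame(frame):
    """Application layer, receiving side: raw frame -> named value.

    Returns None when the identifier is not covered by the agreement;
    such a frame is just uninterpreted bytes to this node.
    """
    entry = OBJECTS.get(frame.identifier)
    if entry is None:
        return None
    name, counts_per_unit = entry
    (raw,) = struct.unpack(">h", frame.data)
    return name, raw / counts_per_unit


frame = encode_value("temperature_c", 23.5)
reading = decode_frame(frame)
```

In a closed in-vehicle network, the meaning of each identifier can be fixed privately by one manufacturer. Devices from different vendors interoperate only if they share a table like `OBJECTS`, which is the role the added application layers play, and two systems with different tables remain incompatible even on identical low-level hardware.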


[Figure 7.3: timeline from 1970 to 2000, divided into the phases Predecessors, Proprietary and Open Systems, and International Standards. Rows: computer science; microprocessors; building and home automation; industrial and process; automotive and avionics; interfaces, instrumentation, and PCB buses. Entries shown include ARPANET, Ethernet, ISO/OSI, MAP, MMS, Internet, WWW; C4004, C8080, C8086, 80386, 80486, Pentium; X10, Batibus, CEbus, BacNet, EIB, LON; Modbus, ARCNET, PDV-Bus, PROWAY, ISA SP50, FIP, Profibus, P-NET, Bitbus, Interbus, ASi, Sercos, Hart, SDS, DeviceNet, ControlNet, FF, EN50170, EN50254, EN50325, IEC61158, IEC61784; ARINC, MIL 1553, CAN, SwiftNet; CAMAC, IEEE488 GPIB, I²C, HP-IL, RS485, M-Bus, Meas. Bus.]

FIGURE 7.3 Milestones of fieldbus evolution and related fields.

From the early 1980s on, when automation made a great leap forward with PLCs and more intelligent sensors and actuators, something like a gold rush set in. The increasing number of devices used in many application areas called for reduced cabling, and microelectronics had grown mature enough to support the development of elaborate communication protocols. This was also the birth date of the fieldbus as an individual term. Different application requirements generated different solutions, and from today’s point of view, it seems that creating new fieldbus systems was a trendy and fashionable occupation for many companies in the automation business. Those mostly proprietary concepts never had a real future, because the number of produced nodes could never justify the development and maintenance costs. Figure 7.3 depicts the evolution timeline of fieldbus systems and their environments. The list of examples is of course not comprehensive; only systems that still have some significance have been selected. Details about the individual solutions are summarized in the tables in the appendix.

As the development of fieldbus systems was a typical technology push activity driven by the device vendors, the users first had to be convinced of the new concepts. Even though the benefits were quite obvious, the overwhelming number of different systems deterred rather than attracted the customers, who were used to perfectly compatible current-loop or simple digital inputs and outputs as interfaces between field devices and controllers and were reluctant to use new concepts that would bind them to one vendor. What followed was a fierce selection process in which it was not always the fittest that survived, but often those with the highest marketing power behind them. Consequently, most of the newly developed systems vanished or remained restricted to small niches.

After a few years of struggle and confusion on the user’s side, it became apparent that proprietary fieldbus systems would always have only limited success and that more benefit lies in creating open specifications, so that different vendors may produce compatible devices, which gives the customers back their freedom of choice [8, 27]. As a consequence, user organizations were founded to carry on the definition and promotion of the fieldbus systems independent of individual companies [28]. It was this idea of open systems that finally paved the way for the breakthrough of the fieldbus concept.

7.4 Fieldbus Standardization

From creating an open specification to the standardization of a fieldbus system it is only a small step. The basic idea behind it is that a standard establishes a specification in a very rigid and formal way, ruling out the possibility of quick changes. This attaches a notion of reliability and stability to the specification,


which in turn secures the trust of the customers and, consequently, also the market position. Furthermore, a standard is vendor independent, which guarantees openness. Finally, in many countries standards have a legally binding position, which means that when a standard can be applied (e.g., in connection with a public tender), it has to be applied. Hence, a standardized system gains a competitive edge over its nonstandardized rivals. This situation is typical of Europe, for example (see [29] for an interesting U.S.-centric comment).

It is therefore no wonder that after the race for fieldbus developments, a race for standardization was launched. This was quite easy on a national level, and most of today’s relevant fieldbus systems soon became national standards. Troubles started when international solutions were sought. One problem of fieldbus standardization is that the activities are scattered among a multitude of committees and working groups according to the application fields. This reflects the historical evolution and underpins the previous statement that the fieldbus is not a unique and revolutionary technology, but emerged independently in many different areas. Interestingly enough, the standardization activities are not even confined to the electrotechnical standardization bodies.

Inside the IEC, the committees concerned are:

• IEC TC65/SC65C: Industrial-Process Measurement and Control/Digital Communications
• IEC TC17/SC17B: Switchgear and Controlgear/Low-Voltage Switchgear and Controlgear

In the ISO, work is being done in:

• ISO TC22/SC3: Road Vehicles/Electrical and Electronic Equipment
• ISO TC184/SC5: Industrial Automation Systems and Integration/Architecture, Communications and Integration Frameworks
• ISO TC205/WG3: Building Environment Design/Building Control Systems Design

The second group of players in the international standardization arena consists of the European standardization bodies CENELEC and CEN.* They are not mirrors of the IEC and ISO; the committees work independently, even though much work is being done in parallel. In recent years, cooperation agreements were established with the aim of facilitating the harmonization of international standardization. The cooperation of ISO and CEN is governed by the Vienna Agreement [30] (1990), and that of IEC and CENELEC by the Dresden Agreement [31] (1996). Roughly, these documents define procedures to carry out parallel votings and to simultaneously adopt standards on both the international and European levels. In practice, this comes down to international standards always superseding European ones, even though there is the theoretical possibility of European work being adopted on an international level. Hence, European committees are today much more closely connected to their worldwide counterparts than they were at the beginning of the fieldbus era.

Within CENELEC, the relevant committees are:

• CLC TC65CX: Fieldbus
• CLC TC17B: Low-voltage Switchgear and Controlgear Including Dimensional Standardization
• CLC TC205: Home and Building Electronic Systems (HBES)

In CEN, fieldbuses are defined in:

• CEN TC247: Building Automation, Controls and Building Management

The committee with the longest track record in fieldbus standardization is IEC SC65C, which in May 1985 started the ambitious endeavor of defining an international and uniform fieldbus standard for process and industrial automation. This initiative came relatively early, soon after the trend toward field-level networking and the inability of MAP to fully cover it became apparent. With the background of several industry-driven solutions emerging all around, however, this project caused heavy turbulence and opened a battlefield for politics that gradually left the ground of technical discussion. Table 7.1 shows the overall timeline of these fieldbus wars, which form an essential and obscure chapter in the fieldbus history and thus deserve special attention.

*CENELEC, Comité Européen de Normalisation Electrotechnique (European Committee for Electrotechnical Standardization); CEN, Comité Européen de Normalisation (European Committee for Standardization).


TABLE 7.1 Fieldbus Standardization Timeline from the Viewpoint of IEC 61158

| Period | Status of Standards | Major Activities |
|---|---|---|
| 1985–1990 | The claims are staked | Start of the IEC fieldbus project; selection of various national standards (German Profibus and French FIP are the main candidates); first attempts to combine the two approaches |
| 1990–1994 | German–French fieldbus war | Attempt of a general specification based on WorldFIP and the Interoperable System Project (ISP) |
| 1995–1998 | Standardization locked in stalemate | Development of the American Foundation Fieldbus (FF) in response to the European approach and formation of the CENELEC standards comprising several fieldbus systems in one standard; deadlock of the international standard through obstructive minorities |
| 1999–2000 | The compromise | The eight-type specification becomes a standard |
| 2000–2002 | Amendments to reach maturity for the market | The standard is enhanced by more types and the necessary profiles are specified in IEC 61784 |

7.4.1 The German–French Fieldbus War

The actual starting point of international fieldbus standardization in IEC SC65C was a new work item proposed by the German national mirror committee [32]. The task was allocated to the already existing working group 6 dealing with the definition of PROWAY, another fieldbus predecessor. At that time, the development of fieldbus systems was mainly a European endeavor, driven forward by research projects that still had a strong academic background as well as by many proprietary developments. The European activities, at least those on the nonproprietary level, also have to be seen as a response to MAP, where the U.S. had a dominating position. Hence, the two big European fieldbus projects at that time, Factory Instrumentation Protocol (FIP) and Profibus, were intended to be counterweights for the international automation world. The IEC work started with a definition of requirements a fieldbus must meet [4]. In parallel, the ISA SP50 committee started its own fieldbus project on the U.S. level and defined a slightly different set of requirements [24, 33]. Work was coordinated between the two committees, with ISA taking the more active part. It launched a call for proposals to evaluate existing solutions. In response to this call, the following systems were identified as possible candidates [34]:
• FIP (Flux Information Processus, Factory Instrumentation Protocol), a French development started around 1982
• Profibus (derived from process field), a German project started around 1984
• A proposal from Rosemount based on the ISO 8802.4 token-passing bus
• A proposal from Foxboro based on the high-level data link control (HDLC) protocol
• The IEEE 1118 project, in fact an extension of Bitbus
• An extension of MIL-STD-1553B defined by a U.K. consortium

All these proposals were evaluated, and finally the two most promising projects retained for further consideration were the French FIP and the German Profibus.
Unfortunately, the approaches of the two systems were completely different. Profibus was based on a distributed control idea and in its original form supported an object-oriented vertical communication according to the client–server model in the spirit of the MAP/MMS specification, with the lower two layers taken from the existing PROWAY project. FIP, on the other hand, was designed with a central, but strictly real-time-capable control scheme and with the newly developed producer–consumer (producer–distributor–consumer) model for horizontal communication. In fact, the idea behind FIP was to have a distributed operating system; a communication protocol was just one building block. Different as they were, the two systems were well suited for complementary application areas [35]. Evidently, a universal fieldbus had to combine the benefits of both, and so the following years saw strong efforts to find a viable compromise and a convergence between the two approaches. The most problematic part was the data link layer, where Profibus supported a token-passing scheme, while FIP relied on a
central scheduling approach. The suggestion to standardize both in parallel was not supported, and so it came that two different proposals were put to vote: a token-passing approach and a new proposal defined by an expert group with the aim of reconciling the two worlds [32]. The latter was more FIP oriented and finally prevailed [36], but it was very complex and left many Profibus supporters skeptical about its practical usability. In the meantime, the leading role in the standardization efforts on the IEC level had been taken not by the Europeans, but by the work of the SP50 committee of the Instrumentation, Systems and Automation Society (ISA, at that time still standing for Instrument Society of America). Owing to its mandatory composition involving manufacturers and users, it had taken a more pragmatic view and had been much more efficient during the late 1980s [37]. Actually, the committee had defined and issued (as a U.S. standard in 1993) a solution on its own. The results of this work exerted an important influence on the layer structure of the standard as we have it today [8, 38]. Finally, ISA and IEC decided to have joint meetings [35], and from that point onward the actual technical work was done within ISA SP50, while IEC restricted its activities to organizing the voting process. By the mid-1990s, the IEC committee was still struggling to overcome the differences between Profibus and FIP in what was sarcastically called the two-headed monster. With respect to its goal of defining a uniform fieldbus solution, it had not produced any substantial outcome for more than 8 years. The only exception was the definition of the physical layer, which was adopted as IEC 61158-2 in 1993. This part is the one that has since been used very successfully, mainly in the process automation area. 
On top of the physical layer, however, the standardization drafts became more and more comprehensive and overloaded with all kinds of communication and control principles imported from the different systems. In the data link layer specification, for example, three different types of tokens were introduced: the scheduler token, which determines which station controls the timing on the bus; the delegated token, with which another station can temporarily gain control over the bus; and the circulated token, which is passed from station to station for bus access. The problem with these all-inclusive approaches was that a full implementation of the standard was too expensive, whereas a partial implementation would have resulted in incompatible, non-interoperable devices (a problem also encountered in the early implementations of, e.g., Profibus-FMS (fieldbus message specification), where significant parts of the standard are optional rather than mandatory). Outside the international standardization framework, but alerted by the inability of the committees to reach a resolution, the big vendors of automation systems launched two additional initiatives to find a compromise. The foundation of the international WorldFIP project in 1993 had the goal of adding the functionality of the client–server model to FIP [39]. On the other side, the Interoperable System Project (ISP) attempted to demonstrate from 1992 onward how Profibus could be enhanced with the publisher–subscriber communication model, which is about the same as the producer–consumer model of FIP. Strangely enough, the ISP was abandoned in 1994 for strategic reasons, before reaching a mature state [40].

7.4.2 The International Fieldbus War

In 1994, after long years of struggles between German and French experts to combine the FIP and Profibus approaches, several, mainly American, companies decided to no longer watch the endless discussions. With the end of the ISP, several former project members joined forces with the WorldFIP North America organization and formed the Fieldbus Foundation. This new association began the definition of a new fieldbus optimized for the process industry: the Foundation Fieldbus (FF). The work was done outside the IEC committees within the ISA, and for some time, the IEC work seemed to doze off. Meanwhile in Europe, disillusion had run rampant [3]. Following the failure to find an acceptable IEC draft for a universal fieldbus, several players deemed it necessary to make a new start at least on a European level. Therefore, the CENELEC committee TC65CX was established in 1993 with the aim of finding an intermediate solution until an agreement was reached within IEC. By that time, the standardization issue had ceased to be a merely technical question. Fieldbus systems had already made their way into the market, much effort and enormous amounts of money had been invested in the development of protocols and devices, and there were already many installations. Nobody could afford to abandon a successful fieldbus; hence it was, from an economic point of view, impossible to start from scratch and create a unified but new standard that was incompatible with the established and widely used national ones. The emerging market pressure was also a reason that within CENELEC no uniform fieldbus solution could be agreed upon. However, the national committees found after lengthy and controversial discussions [3] a remarkable and unprecedented compromise: all national standards under consideration were simply compiled “as is” into European standards [41]. Every part of such a multipart standard is a copy of the respective national standard, which means that every part is a fully functioning system. Although this approach is very pragmatic and seems easy to carry out once adopted, it took a long time to reach it. After all, with the strict European regulations about the mandatory application of standards, being part of a standard would ensure competitiveness for the respective system suppliers. As there were mostly representatives of the big players present in the committees, they naturally tried to optimize their own positions. Consequently, the contents of the individual CENELEC standards that were adopted step by step still reflect the strategic alliances that had to be formed by the national committees to get “their” standard into the European ones. To make the CENELEC collection easier to handle, the various fieldbus systems were bundled according to their primary application areas. EN 50170 contains general-purpose field communication systems, EN 50254 has high-efficiency communication subsystems for small data packages, and EN 50325 is composed of different solutions based on the CAN technology.

TABLE 7.2 Contents of the CENELEC Fieldbus Standards and Their Relation to IEC IS 61158

| CENELEC Standards Part | Contained in IEC Standard | Brand Name |
|---|---|---|
| EN 50170-1 (July 1996) | IS 61158 type 4 | P-Net |
| EN 50170-2 (July 1996) | IS 61158 type 1/3/10 | Profibus |
| EN 50170-3 (July 1996) | IS 61158 type 1/7 | WorldFIP |
| EN 50170-A1 (Apr. 2000) | IS 61158 type 1/9 | Foundation Fieldbus |
| EN 50170-A2 (Apr. 2000) | IS 61158 type 1/3 | Profibus-PA |
| EN 50170-A3 (Aug. 2000) | IS 61158 type 2 | ControlNet |
| EN 50254-2 (Oct. 1998) | IS 61158 type 8 | Interbus |
| EN 50254-3 (Oct. 1998) | (IS 61158 type 3) | Profibus-DP (Monomaster) |
| EN 50254-4 (Oct. 1998) | (IS 61158 type 7) | WorldFIP (FIPIO) |
| EN 50325-2 (Jan. 2000) | IS 62026-3 (2000) | DeviceNet |
| EN 50325-3 (Apr. 2000) | IS 62026-5 (2000) | SDS |
| EN 50325-4 (July 2002) | - | CANopen |
| EN 50295-2 (Dec. 1998) | IS 62026-2 (2000) | AS-Interface |

Note: The dates given in parentheses are the dates of ratification by the CENELEC Technical Board. The parenthetical IEC types denote that the respective fieldbus is contained in a superset definition.
In the later phases of the European standardization process, the British national committee played the part of an advocate of the American companies and submitted also FF, DeviceNet, and ControlNet for inclusion in the European standards. Table 7.2 shows a compilation of all these standards, as well as their relation to the new IEC standard. For the sake of completeness, it should be noted that a comparable, though much less disputed, standardization process also took place for bus systems used in machine construction (dealt with by ISO), as well as building automation (in CEN and more recently in ISO). While the Europeans were busy standardizing their national fieldbus systems and simply disregarded what happened in IEC, the Fieldbus Foundation prepared its own specification. This definition was modeled after the bus access scheme of FIP and the application layer protocol of the ISP work (which was in turn based on Profibus-FMS). The FF specification naturally influenced the work in the IEC committee, and consequently, the new draft evolved into a mixture of FF and WorldFIP. By several members of IEC TC65, this was seen as a reasonable compromise able to put an end to the lengthy debate.

© 2005 by CRC Press

7-13

Fieldbus Systems: History and Evolution

However, when the draft was put to a vote in 1996, it was rejected by a very narrow margin, and the actual fieldbus war started. What had happened? The casus belli was that Profibus (specifically the variant PA, which was named after the target application area, process automation, and which had been developed by the Profibus User Organization based on the ideas of the abandoned ISP project) was no longer properly represented in the IEC draft. When the majority of ISP members had teamed up with WorldFIP North America to form the Fieldbus Foundation, the main Profibus supporters had been left out in the rain. The fact that Profibus was already part of a CENELEC standard was no consolation. Given the strict European standardization rules and the Dresden Agreement, according to which international (i.e., IEC) standards supersede opposing CENELEC standards, the Profibus proponents feared that FF might gain a competitive advantage and “their” fieldbus might lose ground. Consequently, the countries where Profibus had a dominant position managed to organize an obstructive minority that prohibited the adoption of the standard. The fact that the IEC voting rules make it easier to cast positive votes (negative votes have to be justified technically) was no particular hindrance, as there were still many inconsistencies and flaws in the draft that could serve as a fig leaf. The FF empire (as it was seen by the Profibus supporters) could not take this and struck back to save “their” standard. They launched an appeal to cancel negative votes that had, in their opinion, no sufficient technical justification. The minority of votes against the draft was very small, so the cancellation of a few votes would have been enough to turn the voting result upside down. 
Because the idea of using sophisticated legal arguments to achieve the desired goal was rather delicate, they proposed that the members of the IEC committee (i.e., the respective national mirror committees) should decide on the (non-)acceptance of the incriminated votes, a procedure that does not conform to the IEC rules and caused substantial exasperation. The discredited countries filed a complaint to the Committee of Action (CoA) of the IEC and asked it to resolve the situation. Owing to the infrequent meetings and rather formal procedures, the controversy sketched here carried on for several months. In the meantime, a new draft had been prepared with most of the editorial errors removed. The main discussion point was again the data link layer draft. But now the question was whether the draft in its present form could really be implemented to yield a functioning fieldbus. The Profibus supporters claimed it was not possible, and they envisioned, especially in Europe, a dreary scenario of a nonfunctional IEC fieldbus standard replacing the market-proven European counterparts. The FF proponents maintained it was possible. Their argument was that the Foundation Fieldbus was implemented according to the draft and that products were already being sold. The debate went back and forth, and Figure 7.4 tries to depict why it was so difficult to judge what was right. Over the years of development, several different versions of the data link layer specification had been submitted to the various standardization committees or implemented as products. Hence, both sides could find ample evidence for their claims.

[Figure: flow diagram. Recoverable labels: 1995 IEC WG6 DLL; 1996 ISA SP 50.02 DLL; 1996 IEC WG6 DLL CDV 160/161; editorial changes; rejected in vote; 1996 FF Prelim. Spec. (corrections/amendments to ISA subset); 1997 IEC WG6 DLL CDV 178/179; FDIS vote; 1997 DD 238: FF Prelim. Spec. ref. IEC 160/161 instead of ISA; to CENELEC, amendment EN 50170 prA1; addition of 80 extra pages; 1997 FF Final Spec.; to market.]

FIGURE 7.4 Evolution of the IEC 61158 data link layer and the Foundation Fieldbus (FF) demonstrating the various inconsistent flavors of the document.


In the course of subsequent voting processes, the battle raged and things grew worse. There were countries voting, both in favor and against, that had never cast a vote before or that according to their status in the IEC were not even allowed to vote. There were votes not being counted because they were received on a fax machine different from that designated at the IEC and thus considered late (because the error was allegedly discovered only after the submission deadline and it took several days to carry the vote to the room next door). Finally, there were rumors about presidents of national committees who high-handedly changed the conclusions of their committee experts. Throughout this entire hot phase of voting, the meetings of the national committees were crowded with representatives of leading companies trying to convince the committees of one or the other position. Never before or afterwards was the interest in fieldbus standardization so high, and never were the lobbying efforts so immense, including mobilization of the media, who had difficulties getting an objective overview of the situation [42]. The spiral kept turning faster and faster, but by and large, the obstruction of the standard draft remained unchanged, and the standardization process had degenerated into a playground for company tactics, an economic and political battle that was apt to severely damage the reputation of standardization as a whole.

7.4.3 The Compromise

On June 15, 1999, the Committee of Action of the IEC decided to go a completely new way to break the stalemate. One month later, on July 16, the representatives of the main contenders in the debate (Fieldbus Foundation, Fisher Rosemount, ControlNet International, Rockwell Automation, Profibus User Organization, and Siemens) signed a “Memorandum of Understanding,” which was intended to put an end to the fieldbus war. The Solomonic resolution was to create a large and comprehensive IEC 61158 standard accommodating all fieldbus systems, a move that displeased many of those who had been part of the IEC fieldbus project from the beginning [36, 43]. However, unlike CENELEC, where complete specifications had been copied into the standard, the IEC decided to retain the original layer structure of the draft with physical, data-link, and application layers, each separated into services and protocols parts (Table 7.3). The individual fieldbus system specifications had to be adapted to so-called types to fit into this modular structure. In a great effort and under substantial time pressure, the draft was compiled and submitted for vote. The demand of the CoA was clear-cut: either this new draft would finally be accepted, or the old draft would be adopted without further discussion. Hence it was no wonder that the new document passed the vote, and the international fieldbus was released as a standard on the carefully chosen date of December 31, 2000. It was evident that the collection of fieldbus specification modules in the IEC 61158 standard was useless for any practical implementation. What was needed was a guide for practical use showing which parts can be compiled into a functioning system and how this can be accomplished. This guideline was compiled later on as IEC 61784-1 as a definition of so-called communication profiles [44]. At the same time, the specifications of IEC 61158 were corrected and amended.
TABLE 7.3 Structure of the IEC 61158 Fieldbus for Industrial Control Systems

| Standards Part | Contents | Contents and Meaning |
|---|---|---|
| IEC 61158-1 | Introduction | Only technical report |
| IEC 61158-2 | PhL: Physical Layer | 8 types of data transmission |
| IEC 61158-3 | DLL: Data Link Layer Services | 8 types |
| IEC 61158-4 | DLL: Data Link Layer Protocols | 8 types |
| IEC 61158-5 | AL: Application Layer Services | 10 types |
| IEC 61158-6 | AL: Application Layer Protocols | 10 types |
| IEC 61158-7 | Network Management | Must be completely revised |
| IEC 61158-8 | Conformance Testing | Work has been canceled |

TABLE 7.4 Profiles and Protocols according to IEC 61784-1 and IEC 61158

| IEC 61784 Profile | IEC 61158 Phy | IEC 61158 DLL | IEC 61158 AL | CENELEC Standard | Brand Names |
|---|---|---|---|---|---|
| CPF-1/1 | Type 1 | Type 1 | Type 9 | EN 50170-A1 (Apr. 2000) | Foundation Fieldbus (H1) |
| CPF-1/2 | Ethernet | TCP/UDP/IP | Type 5 | - | Foundation Fieldbus (HSE) |
| CPF-1/3 | Type 1 | Type 1 | Type 9 | EN 50170-A1 (Apr. 2000) | Foundation Fieldbus (H2) |
| CPF-2/1 | Type 2 | Type 2 | Type 2 | EN 50170-A3 (Aug. 2000) | ControlNet |
| CPF-2/2 | Ethernet | TCP/UDP/IP | Type 2 | - | EtherNet/IP |
| CPF-3/1 | Type 3 | Type 3 | Type 3 | EN 50254-3 (Oct. 1998) | Profibus-DP |
| CPF-3/2 | Type 1 | Type 3 | Type 3 | EN 50170-A2 (Oct. 1998) | Profibus-PA |
| CPF-3/3 | Ethernet | TCP/UDP/IP | Type 10 | - | PROFInet |
| CPF-4/1 | Type 4 | Type 4 | Type 4 | EN 50170-1 (July 1996) | P-Net RS-485 |
| CPF-4/2 | Type 4 | Type 4 | Type 4 | EN 50170-1 (July 1996) | P-Net RS-232 |
| CPF-5/1 | Type 1 | Type 7 | Type 7 | EN 50170-3 (July 1996) | WorldFIP (MPS, MCS) |
| CPF-5/2 | Type 1 | Type 7 | Type 7 | EN 50170-3 (July 1996) | WorldFIP (MPS, MCS, SubMMS) |
| CPF-5/3 | Type 1 | Type 7 | Type 7 | EN 50170-3 (July 1996) | WorldFIP (MPS) |
| CPF-6/1 | Type 8 | Type 8 | Type 8 | EN 50254-2 (Oct. 1998) | Interbus |
| CPF-6/2 | Type 8 | Type 8 | Type 8 | EN 50254-2 (Oct. 1998) | Interbus TCP/IP |
| CPF-6/3 | Type 8 | Type 8 | Type 8 | EN 50254-2 (Oct. 1998) | Interbus subset |
| CPF-7/1 | Type 6 | Type 6 | - | - | Swiftnet transport |
| CPF-7/2 | Type 6 | Type 6 | Type 6 | - | Swiftnet full stack |

The collection of profiles shows
that the international fieldbus today consists of seven different main systems (communication profile families) that in turn can be subdivided (see Table 7.4). All important fieldbuses from industrial and process automation are listed here, and the world’s biggest automation companies are represented with their developments. Foundation Fieldbus consists of three profiles. The H1 bus is used in process automation, whereas high-speed Ethernet (HSE) is planned as an Ethernet backbone and for industrial automation. H2 is a remainder of the old draft. It allows for a migration of the WorldFIP solution toward FF, but in the profile description it is explicitly noted that there are no products available. From the Profibus side, the two profiles DP (decentralized periphery) and PA are present (even the new PROFInet has been included). Interestingly, the experts did not consider it worthwhile to list the original version of Profibus, the FMS, which is a strong sign of the diminishing importance, if not abandonment, of this hard-to-engineer fieldbus, which is currently only contained in EN 50170-2. The Danish fieldbus P-Net was taken over, as were all definitions and variants of WorldFIP and Interbus. In the latter case, the extensions for the tunneling of TCP/IP traffic have also been provided for in the standard. A newcomer in the fieldbus arena is Swiftnet, which is widely used in airplane construction. The correct designation of an IEC fieldbus profile is shown for the example of Profibus-DP: compliance to IEC 61784 Ed.1:2002 CPF 3/1. Table 7.5 shows some technical characteristics and the main fields of application for the different systems. Low-level fieldbus systems for simple inputs/outputs (I/Os) such as the ones based on CAN or the AS-Interface are not part of IEC 61158; it is planned to combine them in IEC 62026.
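As a rough illustration of how a profile ties together one type per layer, the following sketch stores a few rows of Table 7.4 and builds the designation string in the form quoted above for Profibus-DP. The dictionary layout and the helper names `designation` and `describe` are hypothetical and are not defined by IEC 61784; only the table values and the wording of the designation come from the text.

```python
# Rows transcribed from Table 7.4 (a subset, for illustration only).
# The data layout and the helper names are hypothetical, not part of IEC 61784.
PROFILES = {
    "1/1": ("Foundation Fieldbus (H1)", "Type 1", "Type 1", "Type 9"),
    "1/2": ("Foundation Fieldbus (HSE)", "Ethernet", "TCP/UDP/IP", "Type 5"),
    "3/1": ("Profibus-DP", "Type 3", "Type 3", "Type 3"),
    "3/2": ("Profibus-PA", "Type 1", "Type 3", "Type 3"),
}


def designation(cpf, edition="Ed.1:2002"):
    """Build the designation string in the form quoted in the text."""
    if cpf not in PROFILES:
        raise KeyError(f"profile CPF {cpf} is not in this illustrative subset")
    return f"compliance to IEC 61784 {edition} CPF {cpf}"


def describe(cpf):
    """List the IEC 61158 types that a profile combines, per Table 7.4."""
    name, phy, dll, al = PROFILES[cpf]
    return f"{name}: PhL {phy}, DLL {dll}, AL {al}"


if __name__ == "__main__":
    print(designation("3/1"))
    print(describe("3/2"))
```

The lookup makes the point of the profile document visible: Profibus-PA, for instance, reuses the type 3 data link and application layers of Profibus-DP but sits on the type 1 physical layer.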

7.5 Fieldbus Characteristics

The application areas of fieldbus systems are manifold; hence, many different solutions have been developed in the past. Nevertheless, there is one characteristic and common starting point for all those efforts. Fieldbus systems were always designed for efficiency, with two main aspects:
• Efficiency concerning data transfer, meaning that messages are rather short according to the limited size of process data that must be transmitted at a time
• Efficiency concerning protocol design and implementation, in the sense that typical field devices do not provide ample computing resources

TABLE 7.5 Technical Characteristics and Application Domains of the Different Profiles

Columns: Profile, Name, Industry, Special Features, Nodes per Segment, Processing, Bus Access. The row alignment of the columns other than Profile and Name is not recoverable here; their values are listed per column in their original order.

Profile and Name: CPF-1/1 FF (H1); CPF-1/2 FF (HSE); CPF-1/3 FF (H2); CPF-2/1 ControlNet; CPF-2/2 EtherNet/IP; CPF-3/1 Profibus-DP; CPF-3/2 Profibus-PA; CPF-3/3 PROFInet; CPF-4/1 P-Net RS-485; CPF-4/2 P-Net RS-232; CPF-5/1, CPF-5/2, CPF-5/3 WorldFIP; CPF-6/1 Interbus; CPF-6/2 Interbus TCP/IP; CPF-6/3 Interbus Subset; CPF-7/1 Swiftnet transport; CPF-7/2 Swiftnet full stack
Industry: Process; Factory; Process; Factory; Factory; Factory; Factory; Process; Factory; Factory; Shipbuilding; Factory; Factory; Aircraft
Special Features: Function blocks for decentralized control; Optimized for factory applications; Optimized for remote I/O; Optimized for process control; Distributed automation objects; Multinet capability; Distributed real-time database; Optimized for remote I/O; Optimized for aircraft
Nodes per Segment: Max. 99; Max. 30; Max. 126; Max. 32; Max. 32; Max. 30; Max. 32; Max. 30; Max. 32; Max. 256; Max. 256; Max. 1024
Processing: Centralized; Decentralized; Decentralized; Centralized; Decentralized; Centralized; Centralized; Decentralized; Centralized; Centralized; Decentralized; Centralized; Decentralized; Centralized
Bus Access: Producer–consumer with distributor; CSMA/CD; Producer–consumer with distributor; Producer–consumer; Master–slave with token passing; Producer–consumer; Master–slave with token passing; Producer–consumer with distributor; Single master with synchronized shift register; Producer–consumer with distributor

These two aspects, together with characteristic application requirements in the individual areas with respect to real-time behavior, topology, and economic constraints, have led to the development of concepts that are still peculiar to fieldbus systems and present fundamental differences from LANs.

7.5.1 Communication Concepts

One difference from LANs concerns the protocol stack. Like all modern communication systems, fieldbus protocols are modeled according to the ISO/OSI model. However, normally only layers 1, 2, and 7 are actually used [14]. This is in fact a tribute to the lessons learned from the MAP failure, where it was found that a full seven-layer stack requires far too many resources and does not permit an efficient implementation. For this reason, the MiniMAP approach and, based on it, the IEC fieldbus standard explicitly prescribe a three-layer structure consisting of physical, data link, and application layers. In most cases, this reduced protocol stack reflects the actual situation found in many automation applications anyway. Fieldbuses typically are single-segment networks, and extensions are realized via repeaters or, at most, bridges. Therefore, network and transport layers, which contain routing functionality and end-to-end control, are simply not necessary. If functions of these layers, as well as layers 5 and 6, are still needed, they are frequently included in layer 2 or 7. For the IEC 61158 fieldbus standard, the rule is that layer 3 and 4 functions can be placed in either layer 2 or layer 7, whereas layer 5 and 6 functionalities are always covered in layer 7 (Figure 7.5) [45]. In the building automation domain (LonWorks, EIB/KNX [European installation bus and its successor, Konnex], BACnet), the situation is different. Owing to the possibly high number of nodes, these fieldbus systems must offer the capability of hierarchically structured network topologies, and a reduction to three layers is not sensible. For typical process control applications, determinism of data transfer is a key issue, and cycle time is a critical parameter. This fact has been the optimization criterion for many different fieldbus protocols and the reason that they are different from conventional LANs.
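The folding rule stated above for IEC 61158 can be written down as a small lookup, sketched here under the assumption that layers are identified by their OSI numbers. The table name `ALLOWED_HOSTS` and the helper `placement_errors` are hypothetical and serve only to make the rule checkable; they are not part of any standard.

```python
# Which layer of the reduced stack may host the functions of each OSI layer,
# following the rule the text gives for IEC 61158:
# layers 3 and 4 go into layer 2 or 7, layers 5 and 6 always into layer 7.
# The names used here are hypothetical.
ALLOWED_HOSTS = {
    1: {1},
    2: {2},
    3: {2, 7},
    4: {2, 7},
    5: {7},
    6: {7},
    7: {7},
}


def placement_errors(placement):
    """Check a mapping {OSI layer: hosting layer} and return rule violations."""
    errors = []
    for osi_layer, host in sorted(placement.items()):
        if host not in ALLOWED_HOSTS[osi_layer]:
            errors.append(
                f"layer {osi_layer} functions cannot be hosted in layer {host}"
            )
    return errors


if __name__ == "__main__":
    # Transport functions folded into the data link layer: permitted.
    print(placement_errors({4: 2}))
    # Presentation functions folded into the data link layer: not permitted.
    print(placement_errors({6: 2}))
```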
[Figure: the full OSI stack (Application, Presentation, Session, Transport, Network, Data Link, Physical) shown beside the reduced fieldbus stack (Application, Data Link, Physical) and the IEC 61158 coverage (Application, Data Link, Physical).]

FIGURE 7.5 Layer structure of a typical fieldbus protocol stack as defined by IEC 61158.

Particularly the physical layer has to meet substantially more demanding requirements like robustness, immunity to electromagnetic disturbances,
intrinsic safety for hazardous areas, and costs. The significance of the physical layer is underpinned by the fact that this area was the first that reached (notably undisputed) consensus in standardization. On the data link layer, all medium access strategies also known from LANs are used, plus many different subtypes and refinements. Simple master–slave polling (ASi, Profibus-DP) is used as well as token-based mechanisms in either explicit (Profibus, WorldFIP) or implicit (P-Net) form. Carrier-sense multiple access (CSMA) is mostly used in a variant that tries to avoid collisions by either the dynamic adaptation of retry waiting times (LonWorks) or the use of asymmetric signaling strategies (CAN, EIB). Especially for real-time applications, time-division multiple-access (TDMA)-based strategies are employed (TTP [time-triggered protocol], but also Interbus). In many cases, the lower two layers are implemented with application-specific integrated circuits (ASICs) for performance and cost reasons. As a side benefit, the preference for dedicated controllers over software implementations also improves interoperability of devices from different manufacturers. Comprehensive application layers are an essential part of fieldbus protocol stacks. They are indispensable for open systems and form the basis for interoperability. Powerful application layers offering abstract functionalities to the actual applications, however, require a substantial software implementation effort, which can negatively impact the protocol processing time and also the costs for a fieldbus interface. This is why in many cases (like Interbus or CAN) an application layer was originally omitted. While the application areas were often regarded as limited in the beginning, market pressure and the desire for flexibility eventually forced the addition of higher-layer protocols, and the growing performance of controller hardware facilitated their implementation.
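To make the asymmetric signaling idea mentioned for CAN more concrete, the following is a deliberately simplified model of bitwise arbitration. It assumes that a 0 bit is dominant, a 1 bit is recessive, and that the bus behaves like a wired AND of all transmitted bits. It ignores frame format, bit stuffing, and timing; the function name `arbitrate` is hypothetical.

```python
def arbitrate(identifiers, width=11):
    """Return the identifier that wins a simplified CAN-style arbitration.

    All contenders send their identifier bit by bit, most significant bit
    first. A node that sends a recessive bit (1) but observes a dominant
    bit (0) on the bus withdraws, so no frame is destroyed by the collision.
    """
    contenders = set(identifiers)
    if not contenders:
        raise ValueError("at least one identifier is required")
    if any(ident < 0 or ident >= (1 << width) for ident in contenders):
        raise ValueError(f"identifiers must fit into {width} bits")

    for bit in range(width - 1, -1, -1):
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        bus_level = min(sent.values())  # wired AND: any dominant 0 wins
        contenders = {ident for ident in contenders if sent[ident] == bus_level}

    # Distinct identifiers differ in at least one bit, so exactly one remains.
    (winner,) = contenders
    return winner


if __name__ == "__main__":
    print(hex(arbitrate([0x65, 0x64, 0x123])))
```

In this model the numerically lowest identifier always wins, which is why such schemes can give urgent messages priority without losing bus time to destroyed frames.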
Network management inside fieldbus protocols is traditionally not very highly developed. This stems from the fact that a fieldbus normally is not designed for the setup of large, complex networks. There are exceptions, especially in building automation, which consequently needs to provide more elaborate functions for the setup and maintenance of the network. In most cases, however, the flexibility and functionality of network management is adapted to the functionality and application area of the individual fieldbus. There are systems with comparatively simple (ASi, Interbus, P-Net, J1939) and rather complex management functions (Profibus-FMS, WorldFIP, CANopen, LonWorks, EIB). The latter are typically more flexible in their application range but require more effort for configuration and commissioning. In any case, network management functions are normally not explicitly present (in addition to the protocol stack, as suggested by the OSI model), but rather included in the protocol layers (mostly the application layer).

7.5.2 Communication Paradigms

The characteristic properties of the various data types inside a fieldbus system differ strongly according to the processes that must be automated. Application areas like manufacturing, processing, and building automation pose different timing and consistency requirements that are not even invariant and consistent within the application areas [46]. Typical examples for different timing parameters are continuous measurement data that are sampled and transmitted in discrete-time fashion and form the basis for continuous process control and monitoring (like temperature, pressure, etc.). Other data are typically event based; i.e., they need transmission only in case of status changes (like switches, limit violations, etc.). As far as consistency is concerned, there are on the one hand process data that are continuously updated and on the other hand parameterization data that are transferred only upon demand. In case of error, the former can easily be reconstructed from historical data via interpolation (or simply be updated by new measurements). The systemwide consistency of configuration data, on the other hand, is an important requirement that cannot be met by mechanisms suitable for process data.

These fundamental differences led to the evolution of several communication paradigms that are used either individually or in combination. The applicability in different fieldbus systems is quite different because they require various communication services and media access strategies. The three basic paradigms are:

• Client–server model
• Producer–consumer model
• Publisher–subscriber model

The most relevant properties of these three are summed up in Table 7.6.

TABLE 7.6 Properties of Communication Paradigms

| | Client–Server Model | Producer–Consumer Model | Publisher–Subscriber Model |
| Communication relation | Peer to peer | Broadcast | Multicast |
| Communication type | Connection oriented | Connectionless | Connectionless |
| Master–slave relation | Monomaster, multimaster | Multimaster | Multimaster |
| Communication service | Confirmed, unconfirmed, acknowledged | Unconfirmed, acknowledged | Unconfirmed, acknowledged |
| Application classes | Parameter transfer, cyclic communication | Event notification, alarms, error, synchronization | State changes, event-oriented signal sources (e.g., switches) |

The overview shows that processes with mostly event-based communication can get along very well with producer–consumer-type communication systems, especially if the requirements concerning dynamics are not too stringent. The obvious advantage is that all connected devices have direct access to the entire set of information since the broadcasting is based on identification of messages rather than nodes. Reaction times to events can be very short due to the absence of slow polling or token cycles. Generally, producer–consumer-type systems (or subsystems) are necessarily multimaster systems because every information source (producer) must have the possibility to access the bus. The selection of relevant communication relationships is solely based on message filtering at the consumer’s side. Such filter tables are typically defined during the planning phase of an installation. The publisher–subscriber paradigm uses very similar mechanisms; the only difference is that multicast communication services are employed.
The subscribers are typically groups of nodes that listen to information sources (publishers). Relating publishers and subscribers can be done online. As both paradigms are message based and therefore connectionless on the application layer, they are not suited for the transmission of sensitive, nonrepetitive data such as parameter and configuration values or commands. Connectionless mechanisms can inform the respective nodes about communication errors on layer 2, but not about errors on the application layer. The client–server paradigm avoids this problem by using connection-oriented information transfer between two nodes with all necessary control and recovery mechanisms. The communication transfer itself is based on confirmed services with appropriate service primitives (request, indication, response, confirm) as defined in the OSI model. Basically, a client–server-type communication can be implemented in both mono- and multimaster systems. In the latter cases (CSMA- and token-based systems) every master can take on the role of a client, whereas in monomaster systems (polling based) this position is reserved for the bus master. Consequently, the client–server paradigm is used mainly for monomaster systems as well as generally for discrete-time (cyclic) information transfer and for reliable data transfer on the application level (e.g., for parameterization data).
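The consumer-side filtering described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mechanism of any particular fieldbus: the class `Consumer`, the function `broadcast`, and the message identifiers are all hypothetical names. Messages are identified by content rather than by destination node, every node sees every message, and each node keeps only what its filter table admits.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    name: str
    filter_table: frozenset  # message identifiers this node accepts
    received: list = field(default_factory=list)

    def on_broadcast(self, msg_id, payload):
        # Selection happens at the consumer: other identifiers are dropped.
        if msg_id in self.filter_table:
            self.received.append((msg_id, payload))


def broadcast(consumers, msg_id, payload):
    """Deliver one message to every node; no connection, no confirmation."""
    for consumer in consumers:
        consumer.on_broadcast(msg_id, payload)


# Filter tables fixed in advance, as during the planning phase.
display = Consumer("display", frozenset({"temperature", "alarm"}))
valve = Consumer("valve", frozenset({"alarm"}))
broadcast([display, valve], "temperature", 21.5)
broadcast([display, valve], "alarm", "limit exceeded")
```

Note that `broadcast` returns nothing to the producer. That is the point made in the text: without a confirmed service, the sender cannot learn whether the application on the receiving side actually processed the data, which is why parameter and configuration transfers rely on the client–server paradigm instead.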


It is a characteristic feature of fieldbus systems that they do not adhere to single communication paradigms, but support a mix of strategies on different levels of sophistication. Examples for typical client–server systems are Interbus, Profibus, P-Net, and ASi. Broadcast services are here only used for special cases like synchronization purposes. Likewise, there are special ways of receiving messages (e.g., direct slave-to-slave communication) that require temporary delegation of certain bus master aspects. The other two paradigms are widely used in systems like CAN, CANopen, DeviceNet, ControlNet, EIB, and LonWorks. Yet, these systems also employ the client–server paradigm for special functions such as node configuration, file transfer, or the like.

7.5.3 Above the OSI Layers: Interoperability and Profiles

A key point for the acceptance of open fieldbus systems was the possibility to interconnect devices of different vendors. Multivendor systems and interoperability are still important arguments in fieldbus marketing. The standardization of fieldbuses was originally thought to be sufficient for interoperable systems, but reality quickly showed that it was not. Standards often leave room for interpretation, and implementations may vary, even if they conform to the standard. Certification of the devices is a suitable way to reduce the problems, but by no means a guarantee. Another reason for troubles is that the semantics of data objects are not precisely defined. This problem has been disregarded in many cases until recently. In fact, it is not a problem of the fieldbus itself, but of the application. Consequently, it must be tackled beyond the ISO/OSI model. The definition of appropriate profiles (or companion standards in MMS) addresses this problem.

The creation of profiles originated from the recognition that the definition of the protocol layers alone is not sufficient to allow for the implementation of interoperable products, because there are simply too many degrees of freedom. Therefore, profiles limit the top-level functionality and define specialized subsets for particular application areas [47]. Likewise, they specify communication objects, data types, and their encoding. So they can be seen as an additional layer on top of the ISO/OSI model, which is why they have also been called layer 8 or user layer. One thing to be kept in mind is that nodes using them literally form islands on a fieldbus, which contradicts the philosophy of an integrated, decentralized system. Different profiles may coexist on one fieldbus, but communication between the device groups is normally very limited or impossible.
From a systematic viewpoint, profiles can be distinguished into communication, device, and branch profiles. A bus-specific communication profile defines the mapping of communication objects onto the services offered by the fieldbus. A branch profile specifies common definitions within an application area concerning terms, data types, and their coding and physical meaning. Device profiles build on communication and branch profiles and describe functionality, interfaces, and in general the behavior of entire device classes such as electric drives, hydraulic valves, and simple sensors and actuators. The work of defining profiles is scattered among different groups. Communication profiles are usually in the hands of fieldbus user groups. They can provide the in-depth know-how of the manufacturers, which is indispensable for bus-specific definitions. Device and branch profiles are increasingly a topic for independent user groups. For them, the fieldbus is just a means to an end — the efficient communication between devices. What counts more in this respect is the finding and modeling of uniform device structures and parameters for a specific application. This forms the basis for a mapping to a communication system that is generic within a given application context. The ultimate goal is the definition of fieldbus-independent device profiles [47]. This is an attempt to overcome on a high level the still overwhelming variety of systems. Finally, such profiles are also expected to facilitate the employment of fieldbus systems by the end user, who normally is only concerned about the overall functionality of a particular plant — and not about the question of which fieldbus to use. The methods used to define data types, indices, default values, coding and meanings, identification data, and device behavior are based on functional abstractions (most promising are currently function blocks [43, 48]) and universal modeling techniques [49]. 
A first step in the direction of fieldbus harmonization has been taken by the European research project NOAH (Network-Oriented Application Harmonization [48, 50]), the results of which are currently under standardization by IEC SC65C in project IEC 61804.

7.5.4 Management

Owing to the different capabilities and application areas of fieldbus systems, fieldbus management shows varying complexity and its solutions are more or less convenient for the user. It has already been stated above that the various fieldbuses offer a wide range of management services with grossly varying levels of sophistication. Apart from the functional boundary conditions given by the protocols, fieldbus management always strongly relies on the tool support provided by the manufacturers. This significantly adds to inhomogeneity of the fieldbus world in that entirely different control concepts, user interfaces, and implementation platforms are used. Furthermore, a strict division between communication and application aspects of fieldbus management is usually not drawn.

Typical communication-related management functions are bus parameter settings like address information, data rate, or timing parameters. These functions are rather low level and implicitly part of all fieldbus protocols. The user can access them via software tools mostly supplied by the device vendor. Application-related management functions concern the definition of communication relations, systemwide timing parameters (such as cycle times), priorities, or synchronization. The mechanisms and services offered by the fieldbus systems to support these functions are very diverse and should be integrated in the management framework for the application itself (e.g., the control system using the fieldbus).

As a matter of fact, a common management approach for various automation networks is still not available today, and vendor-specific solutions are preferred. From the users’ point of view (which includes not only the end users, but also system integrators), this entails significantly increased costs for the buildup and maintenance of know-how because they must become acquainted with an unmanageable variety of solutions and tools.
This situation actually revives one of the big acceptance problems that fieldbus systems originally had among the community of users: the missing interoperability. Communication interoperability (as ensured by the fieldbus standards) is a necessary but not sufficient precondition. For the user, handling interoperability of devices from different vendors is equally important. What is needed are harmonized concepts for configuration and management tools. As long as such concepts do not exist, fieldbus installations will typically be single-vendor systems, which is naturally a preferable situation for the manufacturers to secure their market position.

With the increasing importance of LAN and Internet technologies in automation, new approaches for fieldbus management appeared that may be apt to introduce at least a common view of various fieldbuses. All these concepts aim at integrating fieldbus management into existing management applications of the higher-level network, which is nowadays typically IP based. One commonly employed high-level network management protocol is the Simple Network Management Protocol (SNMP), which can also be used to access fieldbus data points [51, 52]. Another approach involves the use of Directory Services [53]. These two solutions permit the inclusion of a large number of devices in specialized network management frameworks. An alternative that has become very popular is the use of Web technology, specifically HTTP tunneled over the fieldbus, to control device parameters. This trend is supported by the increasing availability of embedded Web servers and the use of Extensible Markup Language (XML) as a device description language [54]. The appealing feature of this solution is that no special tools are required and a standard Web browser is sufficient. However, Web pages are less suitable for the management of complete networks and rather limited to single-device management. Nevertheless, this approach is now pursued by many manufacturers.

7.6 New Challenges: Industrial Ethernet

As stated before, Ethernet has become increasingly popular in automation. And like in the early days of fieldbus systems, this boom is driven mainly by the industry; on an academic level, the use of Ethernet had been discussed decades ago. Hence, the initial situation is comparable to that of 15 years ago, and there is enough conflict potential in the various approaches to use Ethernet in automation. After all, a key argument for the introduction of Ethernet was its dominating role in the office world and the resulting status of a uniform network solution. It was exactly this picture of uniqueness that marketing campaigns tried to project also onto the automation world: Ethernet as the single, consistent network for all aspects. A quick look at reality, however, shows that things are different. Ethernet per se is but a solution for the two lower OSI layers, and as fieldbus history already showed, this is not sufficient. Even if the commonly used Internet protocol suite with TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) is taken into account, only the lower four layers are covered. Consequently, there are several possibilities to get Ethernet or Internet technologies into the fieldbus domain, all of which are actually used in practice (Figure 7.6):

• Tunneling of a fieldbus protocol over UDP/TCP/IP
• Definition of new real-time-enabled protocols
• Reduction of the free medium access in standard Ethernet
• Tunneling of TCP/IP over an existing fieldbus

FIGURE 7.6 Structures of Ethernet and fieldbus combinations. [Diagram labels: HTTP, FTP, SMTP; SNMP, TFTP; TCP; UDP; IP; Ethernet; fieldbus; fieldbus application protocol; real-time extensions. Variants shown: standard Internet, fieldbus over Internet, fieldbus over Ethernet, standard fieldbus, Internet over fieldbus.]

The future role of Ethernet in the automation area is not clear. Initially, Ethernet was considered inappropriate because of its lack of real-time capabilities. With the introduction of switched Ethernet and certain modifications of the protocol, however, these problems have been alleviated. And even if there are still doubts about the predictability of Ethernet [55], its penetration into the real-time domain will influence the use of fieldbus-based devices and most likely restrict the future use of fieldbus concepts [56]. Today, Ethernet already takes the place of midlevel fieldbus systems, e.g., for the connection of PLCs. There exist first applications in manufacturing and building automation where no other fieldbuses are installed but Ethernet. To replace the existing lower-level fieldbuses by Ethernet and TCP/UDP/IP, more efforts are needed. One critical issue is (hard) real time, and there exist already different solutions to make Ethernet and TCP/IP meet the requirements of industrial applications [57]. One step below, on the sensor–actuator level, cost and implementation complexity are the most important factors. At the moment, fieldbus connection circuits for simple devices, often only one ASIC, are still cheaper than Ethernet connections. However, with modifications and simplifications of the controller hardware and the protocol implementations, Ethernet could finally catch up and become an interesting option.

7.6.1 Ethernet in IEC 61158

Only recently has standardization begun to deal with the question of Industrial Ethernet. Still, in the wake of the fieldbus wars, several solutions based on Ethernet and TCP/UDP/IP have made their way into the IEC 61158 standard without much fighting (see also Table 7.4):

• High-speed Ethernet (HSE) of the Foundation Fieldbus
• EtherNet/IP of ControlNet and DeviceNet
• PROFInet defined by Profibus International
• TCP/IP over Interbus

HSE and EtherNet/IP (note that here IP stands for Industrial Protocol) are two solutions with a fieldbus protocol being tunneled over TCP/IP. To be specific, this is not real tunneling, where data packets of a lower fieldbus OSI layer are wrapped in a higher-layer protocol of the transport medium. Instead, the same application layer protocol, which is already defined for the fieldbus, is also used over the TCP/IP or UDP/IP stack. In the case of ControlNet and DeviceNet, this is the Control and Information Protocol [58]. This solution allows the device manufacturers to base their developments on existing and well-known protocols. The implementation is without any risk and can be done fast.

The idea behind PROFInet is more in the direction of implementing a new protocol. For the actual communication, however, it was decided to use the component object model (COM)/distributed component object model (DCOM) mechanism known from the Windows world. This solution opens a wide possibility of interactions with the office IT software available on the market. The possibility to use fieldbus devices like objects in office applications will increase the vertical connectivity. On the other hand, this also includes the risk of other applications overloading the network, which has to be avoided. Basically, the COM/DCOM model defines an interface to use modules as black boxes within other applications. PROFInet offers a collection of automation objects with COM interfaces independent of the internal structure of the device. So the devices can be virtual, and the so-called proxy servers can represent the interfaces of any underlying fieldbus. This encapsulation enables the user to apply different implementations from different vendors. The only thing the user has to know is the structure of the interface. Provided the interfaces of two devices are equal, the devices are at least theoretically interchangeable. Although this proxy mechanism allows the connection of the Ethernet to all types of fieldbus systems, it will not be a simple and real-time-capable solution. A second problem is that in order to achieve portability, the COM/DCOM mechanism has to be reprogrammed for different operating systems. DCOM is tightly connected to the security mechanisms of Windows NT, but there is also the possibility of using Windows 95/98 systems or, with restrictions, some UNIX systems. To simplify this, the PROFInet runtime system includes the COM/DCOM functionality, and the standard COM/DCOM functions inside the operating system have to be switched off if PROFInet is used.

The solution of tunneling TCP/IP over a fieldbus requires some minimum performance in terms of throughput from the fieldbus to be acceptable. Normally, throughput of acyclic data (the transport mechanism preferably used in this case) is not the strongest point of fieldbus systems. Nevertheless, Interbus defines the tunneling of TCP/IP over its acyclic communication channel [59]. The benefit of this solution is the parameterization of devices connected to the fieldbus with standard Internet services and well-known tools, e.g., a Web browser. This approach opens the possibility of achieving a new quality of user interaction, as well as a simpler integration of fieldbus management into existing high-level systems. On the downside, however, it forces the manufacturer of the field device to also implement the complete TCP/IP stack, maybe together with a Web server, on the device and the installation personnel to handle the configuration of the IP addressing parameters.

7.6.2 Real-Time Industrial Ethernet

The Industrial Ethernet solutions discussed so far build on Ethernet in its original form; i.e., they use the physical and data link layers of ISO/IEC 8802-3 without any modifications. Furthermore, they assume that Ethernet is lightly loaded or that Fast Ethernet switching technology is used, in order to get a predictable performance. Switching technology does eliminate collisions, but delays inside the switches and lost packets under heavy load conditions are unavoidable with switches [60]. This gets worse if switches are used in a multilevel hierarchy and may result in grossly varying communication delays. The real-time capabilities of native Ethernet are therefore limited and must rely on application-level mechanisms controlling the data throughput. For advanced requirements, like drive controls, this is not sufficient. These known limitations of conventional Ethernet stimulated the development of several alternative solutions that were more than just adaptations of ordinary fieldbus systems. These entirely new approaches were originally outside the IEC standardization process, but are now candidates for inclusion in the real-time Ethernet (RTE) standard, i.e., the second volume of IEC 61784.

The initial and boundary conditions for the standardization work, which started in 2003, are targeted at backward compatibility with existing standards. First of all, RTE is seen as an extension to the Industrial Ethernet solutions already defined in the communication profile families in IEC 61784-1. Furthermore, coexistence with conventional Ethernet is intended. The scope of the working document [61] states that “the RTE shall not change the overall behavior of an ISO/IEC 8802-3 communication network and their related network components or IEEE 1588, but amend those widely used standards for RTE behaviors. Regular ISO/IEC 8802-3 based applications shall be able to run in parallel to RTE in the same network.” Reference to the time distribution standard IEEE 1588 [62] is made because it will be the basis for the synchronization of field devices.

The work program of the RTE working group essentially consists of the definition of a classification scheme with RTE performance classes based on actual application requirements [63]. This is a response to market needs that demand scalable solutions for different application domains. One possible classification structure could be based on the reaction time of typical applications in automation:

• A first low-speed class with reaction times around 100 ms. This timing requirement is typical for the case of humans involved in the system observation (10 pictures per second can already be seen as a low-quality movie), for engineering, and for process monitoring. Most processes in process automation and building control fall into this class. This requirement may be fulfilled with a standard system with a TCP/IP communication channel without many problems.
• In a second class the requirement is a reaction time below 10 ms. This is the requirement for most machine tool control systems like PLCs or PC-based control. To reach this timing behavior, special care has to be taken in the RTE equipment: sufficient computing resources are needed to handle TCP/IP in real time, or the protocol stack must be simplified and reduced to get these reaction times on simple, cheap resources.
• The third and most demanding class is defined by the requirements of motion control: to synchronize several axes over a network, a time precision well below 1 ms is needed. Current approaches to reach this goal rely on modifications of both protocol medium access and hardware structure of the controllers.

These classes will then be the building blocks for additional communication profiles. The intended structural resemblance to the fieldbus profiles is manifested by the fact that the originally attributed document number IEC 62391 was changed to IEC 61784-2. The technological basis for the development will mostly be switched Ethernet. At the moment there are several systems that have the potential to fulfill at least parts of such an RTE specification and that are already introduced on the market or will be shortly.
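The three performance classes listed above can be expressed as a simple lookup. The sketch below is only an illustration: the text gives orders of magnitude (around 100 ms, below 10 ms, well below 1 ms), so the hard cut-offs at 10 ms and 1 ms and the function name `rte_class` are assumptions made here for the example.

```python
def rte_class(reaction_time_ms):
    """Map a required reaction time in milliseconds to an RTE class (1-3).

    Assumed cut-offs: under 1 ms -> class 3, under 10 ms -> class 2,
    everything slower -> class 1.
    """
    if reaction_time_ms <= 0:
        raise ValueError("reaction time must be positive")
    if reaction_time_ms < 1:
        return 3  # motion control, synchronized axes
    if reaction_time_ms < 10:
        return 2  # PLC and PC-based machine control
    return 1      # monitoring, engineering, human observation


for requirement in (100, 5, 0.25):
    print(requirement, "ms -> class", rte_class(requirement))
```

A real selection would also have to weigh jitter and synchronization precision, which the third class in particular depends on, so a single reaction-time figure is a deliberate simplification.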
From these systems, three are extensions to fieldbuses already contained in IEC 61784:

EtherNet/IP: Defined by Rockwell and supported by Open DeviceNet Vendor Association (ODVA) and ControlNet International, EtherNet/IP makes use of the Common Industrial Protocol (CIP), which is common to the networks EtherNet/IP, ControlNet, and DeviceNet. CIP defines objects and their relations in different profiles and fulfills the requirements of class 1 on EtherNet/IP. As such, it is part of IEC 61784-1. With the CIP Sync extensions it is possible to get isochronous communication that satisfies class 2 applications. These extensions use 100 Mbit/s networks with the help of IEEE 1588 time synchronization.

PROFInet: Defined mainly by Siemens and supported by Profibus International. Only the first version is currently included in the international fieldbus standard. A second step was the definition of a soft real-time (SRT) solution for PROFInet IO. In this version class 2 performance is also reached for small and cheap systems by eliminating the TCP/IP stack for process data. I/O data are directly packed into the Ethernet frame with a specialized protocol. Class 3 communication is reached with a special switch ASIC with a short and stable cut-through time and special priority mechanism for real-time data [64]. Synchronization is based on an extension of IEEE 1588 using on-the-fly time stamping, an idea that has been introduced in a different context [65]. The first application planned for PROFInet isochronous real time (IRT) is the PROFIdrive profile for motion control applications.

Interbus: Will also have an RTE extension, which will be identical to PROFInet. Still, it will be listed as a separate profile.

TABLE 7.7 Industrial Ethernet Profiles Defined in IEC 61784

| IEC 61784 Profile | Volume | Brand Names |
| CPF-1 | 1 | Foundation Fieldbus |
| CPF-2 | 1, 2 | EtherNet/IP |
| CPF-3 | 1, 2 | PROFInet |
| CPF-6 | 1, 2 | Interbus |
| CPF-10 | 2 | VNET/IP |
| CPF-11 | 2 | TCnet |
| CPF-12 | 2 | EtherCAT |
| CPF-13 | 2 | EPL (Ethernet Powerlink) |
| CPF-14 | 2 | EPA |
| CPF-15 | 2 | Modbus |

Apart from these approaches that merely extend well-known fieldbus systems, there is a multitude of new concepts collected in IEC 61784-2 (Table 7.7), not all of which were known in detail at the time of this writing:

VNET/IP: Developed by Yokogawa. The real-time extension of this protocol is called RTP (Real-Time and Reliable Datagram Protocol). Like many others, it uses UDP as a transport layer. Characteristic for the approach are an optimized IP stack (with respect to processing times) and a concept for redundant network connections.

TCnet: A proposal from Toshiba. Here, the real-time extension is positioned in the medium access control (MAC) layer. Also, a dual redundant network connection is proposed, based on shared Ethernet.

EtherCAT: Defined by Beckhoff and supported by the EtherCAT Technology Group (ETG), EtherCAT uses the Ethernet frames and sends them in a special ring topology [66]. Every station in the net removes and adds its information. This information may be special input/output data or standard TCP/IP frames. To realize such a device, a special ASIC is needed for medium access that basically integrates a two-port switch into the actual device. The performance of this system is very good: it may reach cycle times of 30 µs.

Powerlink: Defined by B&R and now supported by the Ethernet Powerlink Standardization Group (EPSG). It is based on the principle of using a master–slave scheduling system on top of a regular shared Ethernet segment [67]. The master ensures the real-time access to the cyclic data and lets standard TCP/IP frames pass through only in specific time slots. To connect several segments, a synchronization based on IEEE 1588 is used. This solution is the only product available on the market that already fulfills the class 3 requirements today. In the future, the CANopen drive profiles will be supported.

EPA (Ethernet for Process Automation) protocol: A Chinese proposal. It is a distributed approach to realize deterministic communication based on a time-slicing mechanism.


Modbus/TCP: Defined by Schneider Electric and supported by Modbus-IDA,* Modbus/TCP uses the well-known Modbus over a TCP/IP network. This is probably the most widely used Ethernet solution in industrial applications today and fulfills the class 1 requirements without problems. Modbus/TCP was, contrary to all other fieldbus protocols, submitted to the Internet Engineering Task Force (IETF) for standardization as an RFC (request for comments) [68]. The real-time extensions use the Real-Time Publisher–Subscriber (RTPS) protocol, which runs on top of UDP.

Originally outside the IEC SC65C was SERCOS, well known for its optical ring interface used in drive control applications. SERCOS III, also an Ethernet-based solution, is under development [69]. The ring structure is kept and the framing replaced by Ethernet frames to allow easy mixture of real-time data with TCP/IP frames. In every device, special software or, for higher performance, an application-specific integrated circuit will be needed that separates the real-time time slot from the TCP/IP time slot with a switch function. Recently, cooperation between the committee working on SERCOS and SC65C has been established to integrate SERCOS in the RTE standard.

The recent activities of IEC SC65C show that there is a substantial interest, especially from industry, in the standardization of real-time Ethernet. This situation closely resembles fieldbus standardization at the beginning of the 1990s, which ultimately led to the fieldbus wars. Given the comparable initial situation, will history repeat itself? Most likely not, because the structure of the intended standard documents already anticipates a multipart solution. So, the compromise that in former days needed so long to be found is already foreseen this time. Furthermore, the big automation vendors have learned their lessons to allow them to avoid time- and resource-consuming struggles that eventually end up in compromises anyway. Finally, the IEC itself cannot afford a new standardization war that would damage its image. Hence, all parties involved should have sufficient interest for the standardization process to be smooth and fast without too much noise inside the committees. Further evidence for this attitude is that the CENELEC committee TC65CX explicitly decided not to carry out standardization on the European level, but to wait for the outcome of the IEC work. The final standard is expected in 2007.
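To make the Modbus/TCP entry in the list above more concrete, the following sketch builds one request as it would travel over a TCP connection. It reflects the publicly documented framing, in which a seven-byte MBAP header (transaction identifier, protocol identifier 0, length, unit identifier) precedes the ordinary Modbus function code and data. The function name and the example values are illustrative assumptions, and opening the socket is left out.

```python
import struct


def read_holding_registers_request(transaction_id, unit_id, start, count):
    """Build a Modbus/TCP request for function code 0x03.

    All fields are big-endian. The length field counts the bytes that
    follow it: the unit identifier plus the PDU.
    """
    pdu = struct.pack(">BHH", 0x03, start, count)
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


frame = read_holding_registers_request(
    transaction_id=1, unit_id=1, start=0, count=2
)
print(frame.hex())
```

The application-level part (function code, start address, register count) is unchanged from serial Modbus; only the addressing and framing around it are adapted to TCP, which matches the pattern of reusing an existing fieldbus application protocol over the Internet protocol suite.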

7.7 Aspects for Future Evolution

Even though fieldbus systems have reached a mature state, applications have become more demanding, which in turn creates new problems. Much work is still being done to improve the fieldbus itself, in particular concerning transmission speed and the large area of real-time capabilities [46, 70]. Another subject receiving considerable attention is the extension of fieldbuses to wireless physical layers [71, 72]. Apart from such low-level aspects, other problems are lurking on the system and application levels.

*IDA, Interface for Distributed Automation, a consortium that originally worked on an independent solution, but finally merged with Modbus.

7.7.1 Driving Forces

Historically, the most important driving forces behind the development of fieldbus systems were the reduction of cabling and the desire to integrate more intelligence into the field devices. At least in Europe, the general need for automation, of which fieldbus systems are an integral part, also had a socioeconomic reason. Rising production costs due to comparatively high wages required a higher degree of automation to stay competitive in an increasingly globalized market. The enabling technology for automation was, of course, microelectronics. Without the availability of highly integrated controllers, the development of fieldbus systems would never have been possible.

Today’s driving forces for further evolution mainly come from the application fields that will be reviewed. Nevertheless, there are also technology push factors that promote the application of new
technologies, mainly at the lower layers of communication (e.g., Ethernet). It must not be overlooked, however, that these factors are to a certain extent marketing driven and aim at the development of new market segments or the redistribution of already existing ones.

One important factor is what has recently become known as vertical integration. It concerns the interconnection, as seamless as possible, between the traditional fieldbus islands and higher-level networks. The driving force behind this development is that people have become used to the possibility of accessing any information at any time over the Internet. Computer networks in the office area have reached a high level of maturity. Moreover, they are (quasi) standards that permit worldwide interconnectivity and, even more important, easy access and use for nonspecialists. Hence, it is not astonishing that the anytime–anywhere concept is also extended to fieldbuses and automation systems in general. A common solution today is to let real-time fieldbus traffic coexist on the same communication medium with non-time-critical tasks like configuration and parameterization, based on, e.g., user-friendly Web-based services. This is made possible by embedded Web servers in the field devices and the tunneling of TCP/IP over the fieldbus. Other approaches employ gateways to translate between the two worlds. In the near future, the increased use of Ethernet on the field level is expected to further ease network integration, even though it will not be able to solve all problems.

Another driving force for the development of new concepts comes from the area of building automation. Although networks in this field emerged relatively late compared with industrial automation, the benefits are evident: the operating costs of a building can be reduced dramatically if information about the status of the building is available for control purposes.
This concerns primarily the energy consumption, but also service and maintenance costs. Energy control is a particularly interesting topic. Provided electrical appliances are interconnected via a fieldbus, they can adjust their energy consumption so as to balance the overall load [73, 74]. This demand-side management avoids peak loads, which in turn is rewarded by the utility companies with lower energy prices.

Even more important will be the combination of fieldbuses in buildings (and also private homes) with Internet connections. This is a particular aspect of vertical integration and opens a window for entirely new services [75]. External companies could offer monitoring and surveillance services for private houses while the owners are on vacation. Currently, such services already exist, but are limited to company customers (mostly within the context of facility management).

A very important topic for utility companies in many countries is remote access to energy meters [76]. With an appropriate communication link, they can monitor the actual energy consumption of their customers more precisely and with finer granularity, detect possible losses in the network, and better adapt their own production and distribution. As a side benefit, billing can be automated and tariffs can be made more flexible when load profiles can be recorded. Eventually, if the energy meters support prepayment, even billing is no longer necessary.

An application field that is becoming increasingly relevant for networks is safety-relevant systems. As this domain is subject to very stringent normative regulations, and thus very conservative, it was dominated for a long time (and still is) by point-to-point connections between devices. The first bus system to penetrate this field was the CAN-based safety bus [77]. It took a long time and much effort for this system to pass the costly certification procedures.
Nevertheless, it was finally accepted by the users, which was by no means obvious in an area concerned with the protection of human life, given that computer networks usually have the psychological disadvantage of being considered unreliable. After this pioneering work, other approaches like the ProfiSafe profile [78], Interbus safety [79], ASi safety [80], and recently EtherNet/IP safety [81] and WorldFIP [82] readily followed. The next big step is just ahead in car manufacturing, where in-vehicle networks in general and x-by-wire technology in particular will become determining factors [83]. Here, safety is of even more obvious relevance, and the latest developments of fieldbus systems for automotive use clearly address this issue. In the current Industrial Ethernet standardization process, safety considerations also play an important role.

Microelectronics will continue to be the primary enabling technology for automation networks. Increasing miniaturization and the possibility to integrate more and more computing power while at the same time reducing energy consumption will be the prerequisite for further evolution. Today, system-on-a-chip (SoC) integration of a complete industrial PC with Ethernet controller, on-chip memory, and a complete IP stack as firmware is available. Of course, the computing resources of such integrated solutions cannot be compared with high-end PCs, but they are sufficient for smart and low-cost sensors and actuators. This evolution is, on the one hand, the foundation of the current boom of Ethernet in automation. On the other hand, it will stimulate more research in the emerging field of sensor networks [84]. Currently most of the effort in this area is being put into wireless networking approaches, but it can be expected that work on other aspects will gain importance in the future. From an application point of view, other emerging fields like ubiquitous computing [85] or concepts inspired by bionics [86] will also rely on low-level networking as an essential technological cornerstone.

FIGURE 7.7 With increasing complexity of fieldbus installations, the important topics in research and practice change. (Timeline: ~1985, 3–6 nodes: physical layer, error control; ~1990, 6–12 nodes: networks, software tools; ~2000, up to 20,000 nodes: profiles, plug and play, Internet; ~2015, up to 1,000,000 nodes: complex systems, agent-based approaches.)
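The demand-side management idea mentioned earlier in this section (appliances on a fieldbus adjusting their consumption to avoid peak loads) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Appliance` fields, the `balance_load` function, and the simple greedy rule of reducing the least important loads first are not taken from the cited work [73, 74].

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    """Hypothetical model of a networked appliance (illustrative only)."""
    name: str
    demand_kw: float  # power the appliance would like to draw
    min_kw: float     # lowest power it can run at (0.0 means it may be switched off)
    priority: int     # lower number = reduced first


def balance_load(appliances, limit_kw):
    """Return {name: granted_kw}, reducing low-priority loads until the
    total is at or below limit_kw (or no further reduction is possible)."""
    granted = {a.name: a.demand_kw for a in appliances}
    excess = sum(granted.values()) - limit_kw
    for a in sorted(appliances, key=lambda a: a.priority):
        if excess <= 0:
            break
        # Cut only as much as this appliance can give up and as is still needed.
        cut = min(a.demand_kw - a.min_kw, excess)
        granted[a.name] -= cut
        excess -= cut
    return granted


appliances = [
    Appliance("heater", demand_kw=3.0, min_kw=1.0, priority=1),
    Appliance("washer", demand_kw=2.0, min_kw=0.0, priority=2),
    Appliance("freezer", demand_kw=0.5, min_kw=0.5, priority=3),
]
granted = balance_load(appliances, limit_kw=4.0)
```

In a real installation the limit would come from the utility or a building controller, and the appliances would negotiate over the network rather than being computed centrally; the sketch only shows the balancing rule itself.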

7.7.2 System Complexity

If we consider the evolution of fieldbus systems, we observe a very interesting aspect. Until the mid-1990s, the developers of fieldbus systems concentrated on the definition of efficient protocols. Since the computing resources in the field devices were limited and the developers did not expect fieldbuses to have a complex network structure, most protocols only use the lower two or three layers and the top layer of the OSI/ISO model. In those days, typical applications in industrial automation had only about six nodes on average, so the assumption of not-so-complex structures was justified.

With the availability of more fieldbus devices and a growing acceptance of the technology, the number of nodes in a typical installation has also increased. A decade ago, the average application in industrial automation had 6 to 12 nodes. With time, however, it turned out that the main costs of fieldbus systems were determined not so much by the development of the nodes, but rather by the maintenance of the node software, as well as the software tools necessary to integrate and configure the network. Actually, the development of a fieldbus system means much more than just designing a clever protocol and implementing a few nodes, an aspect that was often underrated in the past. More important for the success of a fieldbus is the availability of a user-friendly configuration and operating environment. This was, by the way, a strong argument in favor of open systems, where the development of field devices and software tools can be accomplished by different companies. For proprietary systems, by contrast, the inventor must supply both devices and software, which is likely to overstrain a single company.

Today, the number of nodes per installation is increasing dramatically.
The enormous numbers shown in Figure 7.7 are of course not found in industrial automation, but in the area of building automation, where installations with 20,000 or more nodes are nowadays feasible. This evolution goes hand in hand with the advances of sensor networks in general. If we extrapolate the experience from other fields of computer technology, we can try to sketch the future evolution: the prices of the nodes will fall, and at the same time the performance will increase, allowing for the integration of more and more intelligence into the individual node. This way, we can have complex networks with up to 1 million nodes working together. Such complex systems will be the challenge for the next decades.

It is evident that applications in such systems must be structured differently from today’s approaches. What is required is a true distribution of the application. A promising concept is holonic systems that have been thoroughly investigated in manufacturing systems [87, 88]. A holonic system consists of
distributed, autonomous units (holons) that cooperate to reach a global goal. In artificial intelligence, the same concept is better known as a multiagent system. Such agents could be an interesting way to cope with complex systems [89, 90]. The main problem, however, will be to provide tools that can support the user in creating the distributed application.

A problem directly connected with system complexity is installation and configuration support through some plug-and-play capability. The ultimate meaning here is that new nodes can be attached to an existing network and integrate themselves without further input from the user. Realistically, this will remain only an appealing vision, as the user will always have to define at least the semantics of the information flow (i.e., in the trivial case of building automation, which switch is associated with which lamp), but nodes will have to be much more supportive than they are today. To date, the concepts for plug and play or at least plug and participate are at a very early stage. There are exemplary solutions for the automatic configuration of Profibus-DP devices [91] based on a manager–agent model inspired by management protocols like SNMP or the management framework of the ISO/OSI model. Here, a manager controls the status of the fieldbus and initiates the start-up and commissioning of the system in cooperation with the agents on the individual devices. The necessary data are kept in a (distributed) management information base (MIB).

Service broker approaches, such as Jini [92], could also be suitable for tackling the problem of plug and play. The goal of Jini is to make distributed resources in a client–server network accessible. The term resource has a very abstract meaning and is composed of both hardware and software. To locate the resources in the network, services offered, as well as service requests, are published by the nodes and matched by the service broker [93].
A drawback of Jini is that it builds on the relatively complex programming language Java. Hence, all Jini-enabled devices need to have a Java Virtual Machine as an interpreter, which is rather computationally intensive. Jini is well developed today; however, hardware support still does not exist, and the breakthrough in smart devices as originally intended is not in sight. Competing approaches like Universal Plug and Play (UPnP) are catching up, but it is also questionable whether they will be suitable for complex systems.
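The matching step of a service broker, as described above (nodes publish offers and requests, and the broker pairs them), can be sketched in a few lines. This is not the Jini API: the class `ServiceBroker`, its methods, and the node and service names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ServiceBroker:
    """Toy broker that matches published service offers with requests."""

    def __init__(self):
        self._offers = defaultdict(list)   # service type -> provider node ids
        self._pending = defaultdict(list)  # service type -> waiting requester ids

    def publish_offer(self, node_id, service_type):
        """Register an offer and return (requester, provider) pairs for
        every request that was waiting for this service type."""
        self._offers[service_type].append(node_id)
        waiting = self._pending.pop(service_type, [])
        return [(requester, node_id) for requester in waiting]

    def publish_request(self, node_id, service_type):
        """Return a (requester, provider) pair if a provider is known;
        otherwise remember the request and return None."""
        providers = self._offers.get(service_type)
        if providers:
            return (node_id, providers[0])
        self._pending[service_type].append(node_id)
        return None


broker = ServiceBroker()
first = broker.publish_request("switch-1", "lighting")   # no provider yet
matched = broker.publish_offer("lamp-7", "lighting")      # resolves the waiting request
later = broker.publish_request("switch-2", "lighting")    # provider already known
```

Note that the broker only locates a provider; which switch should actually control which lamp is the semantic decision that, as the text points out, still has to come from the user.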

7.7.3 Software Tools and Management

The fieldbus as a simple means of communication is only one part of an automation system. Today, it is the part that is best understood and developed. What increasingly becomes a problem, especially with growing complexity, is the support through software tools. Historically, such tools are provided by the fieldbus vendors or system integrators and are as diverse as the fieldbuses themselves. Moreover, there are different (and often inconsistent) tool sets for different aspects of the life cycle of a plant, like planning, configuration, commissioning, testing, and diagnosis or maintenance. Such tools typically only support a topological view on the installation, whereas modern complex systems would require rather functionality-oriented, abstract views. A major disadvantage of the tool variety is that the tools operate in many cases on incompatible databases, which hampers system integration and is likely to produce consistency problems. More advanced concepts build on unified data sets that present consistent views to the individual tools with well-defined interfaces [94, 95]. The data structures are nevertheless still specific for each fieldbus. Unification of the data representations is one of the goals of NOAH [50].

For fieldbus-independent access to the field devices and their data (not necessarily covering the entire life cycle), several solutions have been proposed. They mostly rely on a sort of middleware abstraction layer using object-oriented models. Examples are OPC (OLE for Process Control) [96], Java, and other concepts [97]. Such platforms can ultimately be extended through definition of suitable application frameworks that permit the embedding of generic or proprietary software components in a unified environment spanning all phases of the life cycle. Relevant approaches are, e.g., Open Control [95], Field Device Tool [98], and a universal framework of the ISO [99].
Beyond pure communication management, in the application domain, essential aspects of engineering and management are also not yet universally solved. The ample computing resources of modern field devices, however, allow the introduction of new and largely fieldbus-independent concepts for the modeling of applications. One promising development is function blocks, standardized in IEC 61499
[100]. Historically evolved as an extension to the PLC programming standard IEC 61131, they can be used to create a functional view (rather than a topological one) on distributed applications. The function block concept integrates the models known from PLCs in factory automation, as well as typical functions from process automation that are in many fieldbuses available as proprietary implementations. With its universal approach, it is also a good option for the implementation of fieldbus profiles.

In the context of management and operation frameworks, the unified description of device and system properties becomes eminently important. To this end, device description languages were introduced. The descriptions of the fieldbus components are mostly developed by the device manufacturer and are integral parts of the products. Alternatively, they are contained in libraries where they can be downloaded and parsed for further use. Over the years, several mutually incompatible languages and dialects were developed [101, 102]. This is not surprising, as device descriptions are the basis for effective installation and configuration support. Thus, they are a necessary condition for the already discussed plug-and-play concepts. In recent years, the diversity of description languages has been addressed by the increased usage of universal languages like XML [103, 104], which is also the basis for the electronic device description language (EDDL) standardized in IEC 61804 [105, 106].
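To show what "downloaded and parsed for further use" can mean for an XML-based device description, the following minimal sketch reads a description and turns it into a structure a configuration tool could work with. The element and attribute names (`device`, `parameter`, `access`, and so on) are invented for this example; they do not follow EDDL or any other real description language.

```python
import xml.etree.ElementTree as ET

# Hypothetical device description; the schema is an assumption for illustration.
DEVICE_XML = """
<device vendor="ExampleCorp" model="TempSensor-1">
  <parameter name="temperature" type="float" unit="degC" access="read"/>
  <parameter name="sample_interval" type="int" unit="ms" access="readwrite"/>
</device>
"""


def load_description(xml_text):
    """Parse a device description into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    parameters = {
        p.get("name"): {
            "type": p.get("type"),
            "unit": p.get("unit"),
            "writable": p.get("access") == "readwrite",
        }
        for p in root.findall("parameter")
    }
    return {
        "vendor": root.get("vendor"),
        "model": root.get("model"),
        "parameters": parameters,
    }


description = load_description(DEVICE_XML)
# A configuration tool could now offer only the writable parameters for editing.
writable = [n for n, p in description["parameters"].items() if p["writable"]]
```

Because the description is machine-readable, a tool needs no device-specific code to learn which parameters exist and which may be changed, which is the property that makes such descriptions a precondition for plug-and-play support.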

7.7.4 Network Interconnection and Security

Security has never been a real issue in conventional fieldbus systems. This is understandable insofar as fieldbuses were originally conceived as closed, isolated systems, which raised no need for security concepts. In building automation, where networks are naturally larger and more complex, the situation is different, and at least rudimentary security mechanisms are supported [107]. In factory and process automation, things changed with the introduction of vertical integration and the interconnection of fieldbuses and office-type networks. In such an environment, security is an essential topic on all network levels. Given the lack of appropriate features on the fieldbus level, the development and application of security concepts is typically confined to the actual network interconnection [108, 109].

One important aspect is that popular firewalls are not sufficient to guarantee security. Likewise, encryption is no cure-all, albeit an important element of secure systems. To reach a meaningful security level, a thorough risk analysis is the first step. On this basis, a security strategy needs to be developed detailing all required measures, most of which are organizational in nature. In practice, one will face two major problems: (1) the additional computational effort for security functions on the field devices (e.g., for cryptographic functions), which may contradict real-time demands; and (2) the logistical problem of distributing and managing the keys whose secrecy forms the basis of every security policy. Both problems can, to a certain extent, be tackled with the introduction of security tokens such as smart cards [107].

With the introduction of Ethernet in automation, a reconsideration of field-level security is also possible.
This is facilitated by the fact that many Industrial Ethernet solutions use IP and the Internet transport protocols UDP and TCP on top of Ethernet, which means that standard security protocols like Transport Layer Security (TLS) [110] can be used. One should recognize, however, that there are other approaches that use proprietary protocols above Ethernet, and that Ethernet per se is not the layer where security features can be reasonably implemented.

The fact that automation networks do not have security features up to now is also reflected in the recent standardization work of IEC SC65C WG13. Unlike other working groups, where the aim of the members is to get concrete proposals of established systems into the standards, no ready-to-use proposals exist. Apart from general considerations, the work has to be started largely from scratch. There is, however, related work in other fields that is being considered:

• IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, maintained by IEC/SC 65A. Functional safety is in principle covered by the work of WG 12, but the common understanding is that safety-related systems necessarily have security aspects.
• Work being done in IEC TC57/WG15: Power systems management and associated information exchange/data and communication security.
• ISO/IEC 17799: Code of Practice for Information Security Management.
• ISO/IEC 15408: Common Criteria for IT Security Evaluation.
• ISA SP99: Manufacturing and Control Systems Security. It can be expected that this U.S. activity will have significant influence on the WG 13 work.
• AGA/GTI 12: Cryptographic Protection of SCADA Communications.
• NIST PCSRF: Process Control Security Requirements Forum.
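The two practical problems named in this section, the computational effort of cryptographic functions on field devices and the dependence on a secret key, can be made concrete with a small sketch of message authentication. The choice of HMAC-SHA256, the 4-byte counter, and the frame layout are assumptions for illustration only; none of the standards listed above is implied to prescribe them.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def protect(key: bytes, counter: int, payload: bytes) -> bytes:
    """Build a frame: 4-byte counter + payload + authentication tag."""
    header = counter.to_bytes(4, "big")
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def verify(key: bytes, frame: bytes, last_counter: int):
    """Return (counter, payload) if the frame is authentic and newer than
    last_counter; return None otherwise."""
    if len(frame) < 4 + TAG_LEN:
        return None
    body, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # wrong key or modified frame
    counter = int.from_bytes(body[:4], "big")
    if counter <= last_counter:
        return None  # replayed or stale frame
    return counter, body[4:]


shared_key = b"k" * 32  # placeholder; real keys must be distributed and kept secret
frame = protect(shared_key, counter=5, payload=b"\x01\x02")
accepted = verify(shared_key, frame, last_counter=4)
```

Every frame costs one hash computation on each side and 32 extra bytes on the wire, which is the kind of overhead that has to be weighed against real-time demands; and the scheme is only as good as the procedure used to get `shared_key` onto the devices.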

7.8 Conclusion and Outlook

Fieldbus systems have come a long way from the very first attempts of industrial networking to contemporary highly specialized automation networks. What is currently at hand, even after the selection process during the last decade, covers nearly the complete spectrum of possible applications. Nevertheless, there is enough evolution potential left [70, 86].

On the technological side, the communication medium itself allows further innovations. Up to now, the focus has been on wired links, twisted pair being the dominant solution. Optical media were adopted comparatively early for large distances and electromagnetically disturbed environments. Recently, plastic optical fibers have reached a status of maturity that allows longer cable lengths and lower prices. Another option, especially for building automation, is the use of electrical power distribution lines. This possibility, although tempting in principle, is still impaired by bad communication characteristics of the medium. Substantial research effort will be needed to overcome these limitations, which in fact comes down to a massive use of digital signal processing.

The most promising research field for technological evolution is the wireless domain. The benefits are obvious: no failure-prone and costly cabling and high flexibility, even mobility. The problems, on the other hand, are also obvious: very peculiar properties of the wireless communication channel must be dealt with, such as attenuation, fading, multipath reception, temporarily hidden nodes, and the simple access for intruders [71]. Wireless communication options do exist today for several fieldbuses [72]. Up to now, they have been used just to replace the conventional data cable. A really efficient use of wireless communication, however, would necessitate an entire redefinition of at least the lower fieldbus protocol layers.
Evaluation of currently available wireless technologies from the computer world with respect to their applicability in automation is a first step in this direction. Ultimately we can expect completely new automation networks optimized for wireless communication, where maybe only the application layer protocol remains compatible with traditional wired solutions to achieve integration.

Apart from mere technological issues, the strongest current trend is the integration of fieldbus systems in higher-level, heterogeneous networks and process control systems. Internet technologies play a particularly prominent role here, and the penetration of the field level by optimized Ethernet solutions creates additional momentum. The ultimate goal is a simplification and possibly harmonization of fieldbus operation. For the fieldbus itself, this entails increasing complexity in the higher protocol levels. At the same time, more and more field-level applications employ standard PC-based environments and operating systems like Windows or Linux [111]. These two trends together result in a completely new structure of the automation hierarchy. The old multilevel pyramid finally turns into a rather flat structure with two, maybe three levels, as shown in Figure 7.8. Here, functions of the traditional middle layers (like process and cell levels) are transferred into the intelligent field devices (and thus distributed) or into the management level. The traditional levels may persist in the organizational structure of the company, but not in the technical infrastructure.

Does all this mean we have reached the end of the fieldbus era? The old CIM pyramid, which was a starting point for the goal-oriented development of fieldbus systems, ceases to exist, and Ethernet is determined to reach down into the field level. This may indeed be the end of the road for the traditional fieldbus as we know it, but certainly not for networking in automation.
What we are likely to see in the future are Ethernet- and Internet-based concepts at all levels, probably optimized to meet special performance requirements on the field level but still compatible with the standards in the management area. Below, very close to the technical process, there will be room for highly specialized sensor–actuator networks: new fieldbus systems tailored to meet the demands of high flexibility, energy optimization, small-footprint implementation, or wireless communication. The next evolution step in fieldbus history is just ahead.

FIGURE 7.8 Flattened, two-level automation hierarchy. (Diagram labels: process information level with management, marketing, planning, data server, business data, and statistics on the Ethernet company network (backbone); process control level with quality control, PC, control parameters, process data management, PLC, and visualization on Ethernet (fieldbus); measurement technology, sensors, actuators, and controllers on fieldbus (Ethernet).)

Acknowledgments

The author thanks Dietmar Dietrich, Kurt Milian, Eckehardt Klemm, Peter Neumann, and Jean-Pierre Thomesse for the extensive discussions, especially about the historical aspects of fieldbus systems.

References

[1] International Electrotechnical Commission, IEC 61158, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems, 2003.
[2] Fieldbus Foundation, What Is Fieldbus? http://www.fieldbus.org/About/FoundationTech/.
[3] G.G. Wood, Fieldbus status 1995, IEE Computing and Control Engineering Journal, 6, 251–253, 1995.
[4] G.G. Wood, Survey of LANs and standards, Computer Standards and Interfaces, 6, 27–36, 1987.
[5] N.P. Mahalik (Ed.), Fieldbus Technology: Industrial Network Standards for Real-Time Distributed Control, Springer, Heidelberg, 2003.
[6] H. Töpfer, W. Kriesel, Zur funktionellen und strukturellen Weiterentwicklung der Automatisierungsanlagentechnik, Messen Steuern Regeln, 24, 183–188, 1981.
[7] T. Pfeifer, K.-U. Heiler, Ziele und Anwendungen von Feldbussystemen, Automatisierungstechnische Praxis, 29, 549–557, 1987.
[8] H. Steusloff, Zielsetzungen und Lösungsansätze für eine offene Kommunikation in der Feldebene, Automatisierungstechnik, 855, 337–357, 1990.
[9] L. Capetta, A. Mella, F. Russo, Intelligent field devices: user expectations, IEE Coll. on Fieldbus Devices: A Changing Future, 6/1–6/4, 1994.
[10] K. Wanser, Entwicklungen der Feldinstallation und ihre Beurteilung, Automatisierungstechnische Praxis, 27, 237–240, 1985.
[11] J.A.H. Pfleger, Anforderungen an Feldmultiplexer, Automatisierungstechnische Praxis, 29, 205–209, 1987.
[12] H. Junginger, H. Wehlan, Der Feldmultiplexer aus Anwendersicht, Automatisierungstechnische Praxis, 31, 557–564, 1989.
[13] W. Schmieder, T. Tauchnitz, FuRIOS: fieldbus and remote I/O: a system comparison, Automatisierungstechnische Praxis, 44, 61–70, 2002.
[14] P. Pleinevaux, J.-D. Decotignie, Time critical communication networks: field buses, IEEE Network, 2, 55–63, 1988.
[15] E.H. Higham, Casting a crystal ball on the future of process instrumentation and process measurements, in IEEE Instrumentation and Measurement Technology Conference (IMTC ’92), New York, May 1992, pp. 687–691.
[16] J.P. Thomesse, Fieldbuses and interoperability, Control Engineering Practice, 7, 81–94, 1999.
[17] J.-C. Orsini, Field Bus: A User Approach, Cahier Technique Schneider Electric 197, 2000, http://www.schneider-electric.com.tr/ftp/literature/publications/ECT197.pdf.
[18] R.D. Quick, S.L. Harper, HP-IL: A Low-Cost Digital Interface for Portable Applications, Hewlett-Packard Journal, January 1983, pp. 3–10.
[19] Philips Semiconductor, The I2C-Bus Specification, 2000, http://www.semiconductors.philips.com/buses/i2c/.
[20] H. Zimmermann, OSI reference model: the ISO model of architecture for open system interconnection, IEEE Transactions on Communications, 28, 425–432, 1980.
[21] J. Day, H. Zimmermann, The OSI reference model, Proceedings of the IEEE, 71, 1334–1340, 1983.
[22] D.J. Damsker, Assessment of industrial data network standards, IEEE Trans. Energy Conversion, 3, 199–204, 1988.
[23] H.A. Schutz, The role of MAP in factory integration, IEEE Transactions on Industrial Electronics, 35, 6–12, 1988.
[24] B. Armitage, G. Dunlop, D. Hutchison, S. Yu, Fieldbus: an emerging communications standard, Microprocessors and Microsystems, 12, 555–562, 1988.
[25] S.G. Shanmugham, T.G. Beaumariage, C.A. Roberts, D.A. Rollier, Manufacturing communication: the MMS approach, Computers and Industrial Engineering, 28, 1–21, 1995.
[26] T. Phinney, P. Brett, D. McGovan, Y. Kumeda, FieldBus: real-time comes to OSI, in International Phoenix Conference on Computers and Communications, March 1991, pp. 594–599.
[27] K. Bender, Offene Kommunikation: Nutzen, Chancen, Perspektiven für die industrielle Kommunikation, in iNet ’92, 1992, pp. 15–37.
[28] T. Sauter and M. Felser, The importance of being competent: the role of competence centres in the fieldbus world, in FeT ’99 Fieldbus Technology, Magdeburg, Germany, September 1999, pp. 299–306.
[29] Gesmer Updegrove LLP, Government Issues and Policy, http://www.consortiuminfo.org/government/.
[30] M.A. Smith, Vienna Agreement on Technical Cooperation between ISO and CEN, paper presented at ISO/IEC Directives Seminar, Geneva, June 1995, isotc.iso.ch/livelink/livelink/fetch/2000/2123/SDS_WEB/sds_dms/vienna.pdf.
[31] International Electrotechnical Commission, IEC-CENELEC Agreement, http://www.iec.ch/about/partners/agreements/cenelec-e.htm.
[32] E. Klemm, Der Weg durch die Gremien zur internationalen Feldbusnorm, paper presented at VDE Seminar Die neue, internationale Feldbusnorm: Vorteile, Erfahrungen, Beispiele, Zukunft, November 2002, Mannheim.
[33] Instrument Society of America Standards and Practices 50, Draft Functional Guidelines, March 10, 1987, document ISA-SP50-1986-17-D.
[34] G.G. Wood, Current fieldbus activities, Computer Communications, 11, 118–123, 1988.
[35] C. Gilson, Digital Data Communications for Industrial Control Systems or How IEC 61158 (Just) Caught the Bus, paper presented at IEC E-TECH, March 2004, http://www.iec.ch/online_news/etech/arch_2004/etech_0304/focus.htm#fieldbus.
[36] P. Leviti, IEC 61158: an offence to technicians? in IFAC International Conference on Fieldbus Systems and Their Applications, FeT 2001, Nancy, France, November 15–16, 2001, p. 36.
[37] T. Phinney, Mopping up from bus wars, World Bus Journal, 22–23, December 2001.
[38] H. Engel, Feldbus-Normung 1990, Automatisierungstechnische Praxis, 32, 271–277, 1990.
[39] H. Wölfel, Die Entwicklung der digitalen Prozeßleittechnik: Ein Rückblick (Teil 4), Automatisierungstechnische Praxis, 40, S25–S28, 1998.
[40] J. Rathje, The fieldbus between dream and reality, Automatisierungstechnische Praxis, 39, 52–57, 1997.
[41] G.H. Gürtler, Fieldbus standardization, the European approach and experiences, in Feldbustechnik in Forschung, Entwicklung und Anwendung, Springer, Heidelberg, 1997, pp. 2–11.
[42] S. Bury, Are you on the right bus? Advanced Manufacturing, 1, 26–30, 1999, http://www.advancedmanufacturing.com/October99/fieldbus.htm.
[43] G.G. Wood, State of play, IEE Review, 46, 26–28, 2000.
[44] International Electrotechnical Commission, IEC 61784-1, Digital Data Communications for Measurement and Control: Part 1: Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems, 2003.
[45] International Electrotechnical Commission, IEC 61158-1, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems: Part 1: Introduction, 2003.
[46] J.-P. Thomesse, M. Leon Chavez, Main paradigms as a basis for current fieldbus concepts, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 2–15.
[47] C. Diedrich, Profiles for Fieldbuses: Scope and Description Technologies, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 90–97.
[48] U. Döbrich, P. Noury, ESPRIT Project NOAH: Introduction, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 414–422.
[49] R. Simon, P. Neumann, C. Diedrich, M. Riedl, Field devices-models and their realisations, in IEEE International Conference on Industrial Technology (ICIT ’02), Bangkok, December 2002, pp. 307–312.
[50] A. di Stefano, L. Lo Bello, T. Bangemann, Harmonized and consistent data management in distributed automation systems: the NOAH approach, in IEEE International Symposium on Industrial Electronics, ISIE 2000, Cholula, Mexico, December 2000, pp. 766–771.
[51] M. Knizak, M. Kunes, M. Manninger, T. Sauter, Applying Internet management standards to fieldbus systems, in WFCS ’97, Barcelona, October 1997, pp. 309–315.
[52] M. Kunes, T. Sauter, Fieldbus-Internet connectivity: the SNMP approach, IEEE Transactions on Industrial Electronics, 48, 1248–1256, 2001.
[53] M. Wollschlaeger, Integration of VIGO into Directory Services, paper presented at 6th International P-NET Conference, Vienna, May 1999.
[54] M. Wollschlaeger, Framework for Web integration of factory communication systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juan-Les-Pins, France, October 2001, pp. 261–265.
[55] J.D. Decotignie, A perspective on Ethernet-TCP/IP as a fieldbus, in IFAC International Conference on Fieldbus Systems and Their Applications, FeT 2001, Nancy, France, November 15–16, 2001, pp. 138–143.
[56] E. Byres, Ethernet to Link Automation Hierarchy, InTech Magazine, June 1999, pp. 44–47.
[57] M. Felser, Ethernet TCP/IP in automation, a short introduction to real-time requirements, in Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 501–504.


[58] V. Schiffer, The CIP family of fieldbus protocols and its newest member: EtherNet/IP, in Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 377–384.
[59] M. Volz, Quo Vadis Layer 7? The Industrial Ethernet Book, no. 5, Spring 2001.
[60] K.C. Lee, S. Lee, Performance evaluation of switched Ethernet for real-time industrial communications, Computer Standards and Interfaces, 24, 411–423, 2002.
[61] TC65/SC65C, New work item proposal, 65C/306/NP, 2003.
[62] IEEE 1588, Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, 2002.
[63] TC65/SC65C, Meeting minutes, 65C/318/INF, 2003.
[64] A. Boller, Profinet V3: bringing hard real-time and the IT world together, Control Engineering Europe, September 2003, http://www.manufacturing.net/ctl/article/CA318939.
[65] R. Höller, G. Gridling, M. Horauer, N. Kerö, U. Schmid, K. Schossmaier, SynUTC: high precision time synchronization over Ethernet networks, in 8th Workshop on Electronics for LHC Experiments (LECC), Colmar, France, September 9–13, 2002, pp. 428–432.
[66] http://www.ethercat.org/.
[67] http://www.ethernet-powerlink.com/.
[68] Schneider Automation, Modbus Messaging on TCP/IP Implementation Guide, May 2002, http://www.modbus.org/.
[69] E. Schemm, SERCOS to link with ethernet for its third generation, IEE Computing and Control Engineering Journal, 15, 30–33, 2004.
[70] J.-D. Decotignie, Some future directions in fieldbus research and development, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 308–312.
[71] L. Rauchhaupt, J. Hähniche, Opportunities and problems of wireless fieldbus extensions, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 308–312.
[72] L. Rauchhaupt, System and device architecture of a radio based fieldbus: the RFieldbus system, in IEEE Workshop on Factory Communication Systems, Västerås, Sweden, 2002, pp. 185–192.
[73] P. Palensky, Distributed Reactive Energy Management, Ph.D. thesis, Vienna University of Technology, Austria, 2001.
[74] G. Gaderer, T. Sauter, Ch. Eckel, What it takes to make a refrigerator smart: a case study, in IFAC International Conference on Fieldbus Systems and Their Applications (FeT), Aveiro, Portugal, July 2003, pp. 85–92.
[75] L. Haddon, Home Automation: Research Issues, paper presented at EMTEL Workshop: The European Telecom User, Amsterdam, November 10–11, 1995.
[76] M. Lobashov, G. Pratl, T. Sauter, Implications of power-line communication on distributed data acquisition and control systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Lisboa, Portugal, September 2003, pp. 607–613.
[77] R. Piggin, An introduction to safety-related networking, IEE Computing and Control Engineering Journal, 15, 34–39, 2004.
[78] PROFIBUS International, Profile for Failsafe with PROFIBUS, DP-Profile for Safety Applications, Version 1.2, October 2002, http://www.profibus.com.
[79] INTERBUS Club, INTERBUS Safety, White Paper, 2003.
[80] http://as-i-safety.net.
[81] ODVA, Safety Networks: Increase Productivity, Reduce Work-Related Accidents and Save Money, Open DeviceNet Vendor Assoc., White Paper, 2003, http://www.odva.org.
[82] J.-P. Froidevaux, O. Nick, M. Suzan, Use of fieldbus in safety related systems, an evaluation of WorldFIP according to proven-in-use concept of IEC 61508, WorldFIP News, http://www.worldfip.org.
[83] G. Leen, D. Heffernan, Expanding automotive electronic systems, IEEE Computer, 35, 88–93, 2002.
[84] H. Gharavi, S.P. Kumar (Eds.), Special issue on sensor networks and applications, Proceedings of the IEEE, 91, 2003.


[85] G. Borriello, Key challenges in communication for ubiquitous computing, IEEE Communications Magazine, 40, 16–18, 2002.
[86] D. Dietrich, T. Sauter, Evolution potentials for fieldbus systems, in Proceedings of the 3rd IEEE International Workshop on Factory Communication Systems, Porto, 2000, pp. 343–350.
[87] A. Koestler, The Ghost in the Machine, Arkana Books, London, 1967.
[88] F. Pichler, On the construction of A. Koestler’s holarchical networks, in Cybernetics and Systems 2000, Austrian Society for Cybernetic Systems, Vienna, 2000.
[89] P. Palensky, The convergence of intelligent software agents and field area networks, in 1999 IEEE Conference on Emerging Technologies and Factory Automation, Barcelona, 1999, pp. 917–922.
[90] T. Wagner, An agent-oriented approach to industrial automation systems, in Agent Technologies, Infrastructures, R. Kowalczyk et al. (Eds.), Springer-Verlag, Berlin, 2003, pp. 314–328.
[91] A. Pöschmann, P. Krogel, Autoconfiguration Management für Feldbusse: PROFIBUS Plug & Play, Elektrotechnik und Informationstechnik, 117, 5, 2000.
[92] W. Kastner, M. Leupold, How dynamic networks work: a short tutorial on spontaneous networks, in IEEE Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 295–303.
[93] S. Deter, Plug and participate for limited devices in the field of industrial automation, in IEEE Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 263–268.
[94] O. Cramer Nielsen, A real time, object oriented fieldbus management system, in 3rd IEEE International Workshop on Factory Communication Systems, Porto, 2000, pp. 335–340.
[95] A. Baginski, G. Covarrubias, Open control: the standard for PC-based automation technology, in IEEE International Workshop on Factory Communication Systems, October 1997, pp. 329–333.
[96] OPC Data Access Automation Specification, Version 2.0, OPC Foundation, October 14, 1998.
[97] R. Bachmann, M.S. Hoang, P. Rieger, Component-based architecture for integrating fieldbus systems into distributed control applications, in Fieldbus Technology, Springer-Verlag, Heidelberg, 1999, pp. 276–283.
[98] R. Simon, M. Riedl, C. Diedrich, Integration of field devices using field device tool (fdt) on the basis of electronic device descriptions (EDD), in IEEE International Symposium on Industrial Electronics, ISIE ’03, June 9–11, Rio de Janeiro, 2003, pp. 189–194.
[99] W.H. Moss, Report on ISO TC184/SC5/WG5 open systems application frameworks based on ISO 11898, in 5th International CAN Conference (iCC ’98), San Jose, CA, 1998, pp. 07-02–07-04.
[100] Function Blocks for Industrial-Process Measurement and Control Systems: Committee Draft, IEC TC65/WG6, ftp://ftp.cle.ab.com/stds/iec/sc65bwg7tf3/html/news.htm.
[101] GSD Specification for PROFIBUS-FMS (version 1.0), PNO Karlsruhe.
[102] Device Description Language specification, HART Communication Foundation, Austin, TX, 1995.
[103] T. Bray, J. Paoli, C. M. Sperberg-McQueen, Extensible Markup Language (XML) 1.0, 1998, http://www.w3.org/TR/REC-xml.
[104] M. Wollschlaeger, Descriptions of fieldbus components using XML, Elektrotechnik und Informationstechnik, 117, 5, 2000.
[105] International Electrotechnical Commission, IEC 61804-2, Function Blocks (FB) for Process Control: Part 2: Specification of FB Concept and Electronic Device Description Language (EDDL), 2003.
[106] P. Neumann, C. Diedrich, R. Simon, Engineering of field devices using device descriptions, paper presented at IFAC World Congress 2002, Barcelona, 2002.
[107] C. Schwaiger, A. Treytl, Smart card based security for fieldbus systems, in 2003 IEEE Conference on Emerging Technologies and Factory Automation, Lisbon, September 2003, pp. 398–406.
[108] T. Sauter, Ch. Schwaiger, Achievement of secure Internet access to fieldbus systems, Microprocessors and Microsystems, 26, 331–339, 2002.
[109] P. Palensky, T. Sauter, Security considerations for FAN-Internet connections, in IEEE International Workshop on Factory Communication Systems, Porto, September 2000, pp. 27–35.


[110] E. Rescorla, SSL and TLS, Addison-Wesley, Reading, MA, 2000.
[111] W. Kastner, C. Csebits, M. Mayer, Linux in factory automation? Internet controlling of fieldbus systems! in 1999 IEEE Conference on Emerging Technologies and Factory Automation, Barcelona, 1999, pp. 27–31.
[112] CAMAC, A Modular Instrumentation System for Data Handling, EUR4100e, March 1969.
[113] http://www.hit.bme.hu/people/papay/edu/GPIB/tutor.htm.
[114] National Instruments, GPIB Tutorial, www.raunvis.hi.is/~rol/Vefur/%E9r%20Instrupedia/CGPTUTO.PDF.
[115] W. Büsing, Datenkommunikation in der Leittechnik, Automatisierungstechnische Praxis, 28, 228–237, 1986.
[116] G. Färber, Bussysteme, 2nd ed., Oldenbourg-Verlag, Munich, 1987.
[117] M-Bus Usergroup, The M-Bus: A Documentation, Version 4.8, November 11, 1997, http://www.mbus.com/mbusdoc/default.html.
[118] G. Leen, D. Heffernan, A. Dunne, Digital networks in the automotive vehicle, IEE Computer and Control Engineering Journal, 10, 257–266, 1999.
[119] CAN-in-Automation, CAN history, http://www.can-cia.de/can/protocol/history/.
[120] Condor Engineering, MIL-STD-1553 tutorial, http://www.condoreng.com/support/downloads/tutorials/MIL-STD-1553Tutorial.PDF.
[121] Grid Connect, The Fieldbus Comparison Chart, http://www.synergetic.com/compare.htm.
[122] Interbus Club, Interbus Basics, 2001, http://www.interbusclub.com/en/doku/pdf/interbus_basics_en.pdf.
[123] H. Kirrmann, Industrial Automation, lecture notes, EPFL, 2004, http://lamspeople.epfl.ch/kirrmann/IA_slides.htm.
[124] H. Wölfel, Die Entwicklung der digitalen Prozeßleittechnik: Ein Rückblick (Teil 3), Automatisierungstechnische Praxis, 40, S17–S24, 1998.
[125] T. Sauter, D. Dietrich, W. Kastner (Eds.), EIB Installation Bus System, Publicis MCD, Erlangen, Germany, 2001.
[126] E.B. Driscoll, The History of X10, http://home.planet.nl/~lhendrix/x10_history.htm.


Appendix

The tables presented here give an overview of selected fieldbus systems, categorized by application domain. The list is necessarily incomplete, although care has been taken to include all approaches that either exerted a substantial influence on the evolution of the entire field or are still significant today. The year of introduction refers to the public availability of the specification or of the first products. This year is also the one used in the timeline in Figure 7.3. Note that despite careful research, the information obtained from various sources was frequently inconsistent, so there may be some uncertainty in the figures. Where the respective data could be obtained, the start of the project has been listed as well, because in several cases much time elapsed between the start of development of the fieldbus and its first release.

TABLE 7.8 Instrumentation and PCB-Level Buses

Fieldbus | Developer (Country) | Introduced in | Standard
CAMAC | ESONE (Europe) | 1969 (start of development 1966) | IEEE 583 (1970, 1982, 1994); IEEE 595 (1974, 1982); IEEE 596 (1972, 1982); IEEE 758 (1979)
GPIB (HP-IB) | Hewlett-Packard (U.S.) | 1974 (start of development 1965) | ANSI IEEE-488 (1975, 1978); ANSI IEEE-488.2 (1987, 1992); IEC 60625 (1979, 1993)
HP-IL | Hewlett-Packard (U.S.) | 1980 (start of development 1976) | -
I²C | Philips (Netherlands) | 1981 | -
M-Bus | University of Paderborn, TI, Techem (Germany) | 1992 | EN 1434-3 (1997)
Measurement Bus | Industry consortium (Germany) | 1988 | DIN 66348-2 (1989); DIN 66348-3 (1996)

References: [18], [112], [113, 114, 115], [116], [117]

TABLE 7.9 Automotive and Aircraft Fieldbuses

Fieldbus | Developer (Country) | Introduced in | Standard
ABUS | Volkswagen (Germany) | 1987 | -
ARINC | Aeronautical Radio, Inc. (U.S.) | 1978 | AEEC ARINC 429 (1978, 1995)
CAN | Bosch (Germany) | 1986 (start of development 1983); CAL 1992 | ISO 11898 (1993, 1995); ISO 11519 (1994)
Flexray | DaimlerChrysler, BMW (Germany) | 2002 | -
J1850 | Ford, GM, Chrysler (U.S.) | 1987 | SAE J1850 (1994, 2001); ISO 11519-4
J1939 | SAE (U.S.) | 1994 | SAE J1939 (1998)
LIN | Industry consortium | 1999 | - (open spec)
MIL-1553 | SAE (military and industry consortium, U.S.) | 1970 (start of development 1968) | MIL-STD-1553 (1973); MIL-STD-1553A (1975); MIL-STD-1553B (1978)
VAN | Renault, PSA Peugeot-Citroen (France), ISO TC22 | 1988 | ISO 11519-3 (1994)
SwiftNet | Ship Star Assoc., Boeing (U.S.) | 1997 | IEC 61158 (2000)
TTP | Vienna University of Technology (Austria) | 1996 | -

References: [118], [119], [120]


TABLE 7.10 Fieldbuses for Industrial and Process Automation and Their Foundations

Fieldbus | Developer (Country) | Introduced in | Standard
ARCNET | Datapoint (U.S.) | 1977 | ANSI ATA 878 (1999)
ASi | Industry and university consortium (Germany) | 1991 | EN 50295-2 (1998, 2002); IEC 62026-2 (2000)
Bitbus | Intel (U.S.) | 1983 | ANSI IEEE 1118 (1990)
CC-Link | Mitsubishi (Japan) | 1996 | - (open spec)
CANopen | CAN in Automation (user group, Germany) | 1995 (start of development 1993) | EN 50325-4 (2002)
ControlNet | Allen-Bradley (U.S.) | 1996 | EN 50170-A3 (2000)
DeviceNet | Allen-Bradley (U.S.) | 1994 | EN 50325-2 (2000)
FF | Fieldbus Foundation (industry consortium, U.S.) | 1995 (start of development 1994) | BSI DD 238 (1996); EN 50170-A1 (2000)
Hart | Rosemount (U.S.) | 1986 | - (open spec)
Interbus-S | Phoenix Contact (Germany) | 1987 (start of development 1983) | DIN 19258 (1993); EN 50254-2 (1998)
MAP | General Motors (U.S.) | 1982 (start of development 1980) | MAP 1.0 (1982); MAP 2.0 (1985); MAP 3.0 (1988)
MMS | ISO TC 184 | 1986 | ISO/IEC 9506 (1988, 2000)
Modbus | Gould, Modicon (U.S.) | 1979 | - (open spec)
PDV-Bus | Industry and university consortium (Germany) | 1979 (start of development 1972) | DIN 19241 (1982)
P-NET | PROCES-DATA (Denmark) | 1983 | DS 21906 (1990); EN 50170-1 (1996)
PROWAY C | IEC TC 65 | 1986 (start of development 1975) | ISA S72.01 (1985); IEC 60955 (1989)
Profibus | Industry and university consortium (Germany) | 1989 (start of development 1984) | FMS: DIN 19245-1 and -2 (1991); DP: DIN 19245-3 (1993); PA: DIN 19245-4 (1995); FMS/DP: EN 50170-2 (1996); DP: EN 50254-3 (1998); PA: EN 50170-A2 (2000)
SDS | Honeywell (U.S.) | 1994 | EN 50325-3 (2000)
Sercos | Industry consortium (Germany) | 1989 (start of development 1986) | IEC 61491 (1995); EN 61491 (1998)
Seriplex | APC, Inc. (U.S.) | 1990 | IEC 62026-6 (2000)
SINEC L2 | Siemens (Germany) | 1992 | -
SP50 Fieldbus | ISA SP 50 (U.S.) | 1993 | ISA SP 50 (1993)
(World)FIP | Industry and university consortium (France) | 1987 (start of development 1982) | AFNOR NF C46601-7 (1989–1992); EN 50170-3 (1996); DWF: AFNOR NF C46638 (1996); DWF: EN 50254-4 (1998)

References: [14], [16], [115], [119], [121], [122], [123], [124]


TABLE 7.11 Fieldbuses for Building and Home Automation

Fieldbus | Developer (Country) | Introduced in | Standard
BACnet | ASHRAE SPC135P (industry consortium, U.S.) | 1991 | ANSI/ASHRAE 135 (1995); ENV 1805-1 (1998); ENV 13321-1 (1999); ISO 16484-5 (2003)
Batibus | Industry consortium (France) | 1987 | AFNOR NF 46621-3 and -9 (1991); ENV 13154-2 (1998)
CEBus | Industry consortium (U.S.) | 1984 | ANSI EIA 600 (1992)
EHS | Industry consortium (Europe) | 1987 | ENV 13154-2 (1998)
EIB | Industry consortium (Germany) | 1990 | AFNOR NFC 46624-8 (1991); DIN V VDE 0829 (1992); ENV 13154-2 (1998)
HBS | Industry consortium (Japan) | 1986 (start of development 1981) | EIAJ/REEA ET2101
LonWorks | Echelon (U.S.) | 1991 | ANSI EIA 709 (1999); ENV 13154-2 (1998)
Sigma I | ABB (Germany) | 1983 | -
X10 | Pico Electronics (U.K.) | 1978 (start of development 1975) | -

References: [121], [125], [126]

8 The WorldFIP Fieldbus

Jean-Pierre Thomesse
Institut National Polytechnique de Lorraine

8.1 Introduction
8.2 WorldFIP Origin
8.3 Requirements
8.4 Choices of WorldFIP
Identified Data vs. Classical Messages • Periodic and Aperiodic Traffic • Timeliness Attributes and Mechanisms for Time-Critical Systems
8.5 WorldFIP Architecture
Architecture and Standardization
8.6 Physical Layer
Figures • Topology • Coding
8.7 Data Link and Medium Access Control Layers
Introduction • Basic Mechanism • The Aperiodic Server • Variable Transfer Services • Message Transfer • Synthesis on the Data Link Layer
8.8 Application Layer
Services Associated with the Variables • Temporal Validity of Variables • Synchronous and Asynchronous • Synchronization Services • Services Associated with Variables Lists
8.9 WorldFIP State and Technology
Technology • Fieldbus Internet Protocol • New Development
8.10 Conclusion
References

8.1 Introduction

This chapter is dedicated to the study of the WorldFIP* fieldbus. It is one of the first fieldbuses, born at the beginning of the 1980s, and it is at the origin of several key concepts that are now implemented in other fieldbuses. The producer–consumer model, the timeliness attributes that qualify the validity of data, and the time coherence and consistency attributes are among the most important WorldFIP contributions. Many of them came from research activities (academic and industrial) and from the analysis of the requirements of truly distributed, real-time systems. That is why the first sections of this chapter briefly relate the origin of this fieldbus (Section 8.2), the requirements (Section 8.3), and the choices behind the WorldFIP specifications (Section 8.4). The technical aspects are then studied in the four following sections: the architecture in Section 8.5, the physical layer in Section 8.6, the data link layer in Section 8.7, and the application layer in Section 8.8. The current state of this fieldbus is given in the last section (Section 8.9) before the conclusion and bibliography. A large body of theoretical work has been carried out over more than 15 years to prove the protocols, to evaluate the performances, to guarantee the time constraints (Pleinevaux et al., 1988; Song et al., 1991; Simonot et al., 1995), or to estimate the performances of distributed applications (Bergé et al., 1995).

*WorldFIP is the current name of the previous FIP network. FIP stands for Factory Instrumentation Protocol, but in the French language, the acronym means Flux d’Information (de et vers le) Processus.

8.2 WorldFIP Origin

Work on the WorldFIP specification started in September 1982, in a working group under the aegis of the French Ministry of Research and Technology. This working group was composed of representatives of end users, engineering companies, and laboratories. Providers and manufacturers of networks were deliberately not included at the beginning, so that a genuine analysis of end users’ needs could be organized without the possible influence of existing products or projects. The first objective of this work was to analyze the needs for communication in automatic control systems, but it was necessary to take into account the following points:

• Local area networks were only beginning to be developed.
• The Manufacturing Automation Protocol (MAP) project was beginning in the U.S. (MAP, 1988).
• New ideas were appearing on application architectures, especially the idea of truly distributed systems.
• Intelligent devices were starting to be developed thanks to the progress of microelectronics.

The development of WorldFIP started in this context, with essentially two main types of contributions, coming from research and from end users’ experience. The functional analysis of the communication needs in automatic control systems led to the distinction between two main flows:

• A flow of information associated with the control rooms in continuous processes or with the plant in discrete part manufacturing applications
• A flow associated with the field devices, called “flow of information of the process,” which will be analyzed later and which led to the WorldFIP fieldbus profiles

To satisfy the former type, different local area networks already existed, while nothing yet existed for the latter. It was then decided to specify a so-called instrumentation network.* The first specification of the FIP fieldbus was published in May 1984 (Galara and Thomesse, 1984). It was only at the beginning of the 1990s that the name was changed to WorldFIP. More information on the origins may be found in Thomesse (1993, 1998). The first results were presented to support a standardization process at the International Electrotechnical Commission (IEC) (Gault and Lobert, 1985).

*At this time the word Fieldbus was not yet in use.

8.3 Requirements

The first (and abstract) requirement was to define a communication system to replace the usual connection standards (4 to 20 mA) between the devices and controllers in an automation system. Another formulation was more complete but equally abstract: the objective was the design of an operating system for instrumentation. This reflected a real need to build not only a communication system but also truly distributed systems. It was therefore important to provide services well suited to the distribution of applications: facilities for the management of coherence and consistency, and for dealing with the impossibility of a common global state and with clock synchronization. The requirements could be stated at different abstraction levels. Starting from the most general (see above), they led to the following:

• The connection between the field devices and the control functions is expensive enough to justify specifying another communication technique.
• The access to the data by the network should be standardized.
• The location of data should be transparent for the user.
• The system should be built to meet different dependability requirements by using the same basic components.
• The competitiveness of companies should be improved by such technologies.
• The development should go through international standardization.
• The protocols should be implemented in silicon.

The data flows between the functions and the set of field and control equipment were then identified and analyzed, leading to the identification of special needs for a so-called instrumentation network. These led to the identification of the traffic and then to the more technical requirements:

• The exchanged data come from sensors or are sent to actuators. Most of them are known and identified (temperature, pressure, speed, position, and so on), but other transmitted data are not identified in the same sense, and usual messages must also be transmitted.
• The exchanges are periodic or not. They are time constrained in terms of period, jitter, deadline, lifetime, promptness, and refreshment.
• The most critical traffic must be managed periodically, but sporadic traffic must also be supported.
• Timeliness is important for the quality of service and the dependability of the applications.
• The distributed decisions must be consistent; i.e., the data and the physical process must be seen in a coherent manner by all application processes. The impossible global state must be approximated by a reliable broadcasting of states and events.

8.4 Choices of WorldFIP

According to the previous requirements, the WorldFIP solution is based on a few basic ideas, which give this fieldbus the right quality of service:

• The distinction between two types of exchange: identified data vs. classical messages, associated with their respective cooperation models, producer–consumer vs. client–server
• The predefined scheduling of periodic traffic, with periods suited to the physical needs, especially sampling theory
• The online scheduling of sporadic traffic, with priority given to critical traffic
• The cyclic updating of real-time data at the consumers’ sites
• The timeliness attributes and mechanisms for time-critical systems

These choices are presented and analyzed below.
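The idea of a predefined schedule for periodic traffic can be illustrated with a small sketch. It assumes, purely for illustration, that each identified variable is polled at a fixed period expressed in milliseconds, that the schedule repeats after the least common multiple of all periods, and that it is divided into cycles whose length is the greatest common divisor of the periods. The variable names, the periods, and the function `build_schedule` are hypothetical and are not taken from the WorldFIP specification.

```python
from functools import reduce
from math import gcd


def build_schedule(periods_ms):
    """Return (cycle_ms, table); table[k] lists the variables due in cycle k.

    periods_ms maps a variable name to its polling period in milliseconds.
    The cycle length and the table layout are illustrative assumptions.
    """
    periods = list(periods_ms.values())
    cycle_ms = reduce(gcd, periods)  # length of one cycle
    macro_ms = reduce(lambda a, b: a * b // gcd(a, b), periods)  # full repeat
    table = []
    for k in range(macro_ms // cycle_ms):
        t = k * cycle_ms
        # A variable is due whenever the elapsed time is a multiple of its period.
        table.append(sorted(v for v, p in periods_ms.items() if t % p == 0))
    return cycle_ms, table


if __name__ == "__main__":
    cycle_ms, table = build_schedule(
        {"pressure": 10, "temperature": 20, "level": 40}
    )
    for k, due in enumerate(table):
        print(f"t={k * cycle_ms:3d} ms: {due}")
```

Because the table is computed offline, the load of every cycle is known in advance, which is what makes the periodic traffic predictable; sporadic traffic would have to fit in whatever time remains in each cycle.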

8.4.1 Identified Data vs. Classical Messages

Data provided by sensors, data sent to the actuators, and more generally input/output (I/O) and control data are all identified in a given process. They are known within the application. These data are also called identified variables or identified objects. They are often simple objects (temperature, pressure, speed, etc.) with a fixed syntax (integer, real, Boolean, record, list, or other structured data). For instance, a temperature sensor can produce a temperature value coded as an integer or as a real, and the manufacturer identification as a character string.

Each identified datum receives a name, which is global to the whole application. This name is also used for managing access to the medium. Each variable value has a single producer and one or more consumers. Since transferred values correspond to variables in the process, an identifier is attached to each variable whose value is to be transmitted on the network. This identifier is used as the source address to control the medium access. The destination is not indicated. On reception, consumers are responsible for deciding whether to update their copies of the data by recognizing the corresponding identifier. This is the so-called source addressing.

This addressing technique offers several advantages. It allows one-to-many communication by broadcast. Not only is the communication channel used efficiently when the same information has to be transmitted to more than one consumer, but coherence may also be obtained with reliable broadcast. A new receiver may be added without address modification. Identifying the variables instead of the sources of the information on the variables offers an additional advantage: the variable is no longer bound to a node of the network. For example, if the node providing the variable value fails, a new source may become active and replace the failed node without any modification of the receivers. For a given identified object, a single active producer is defined, and all other stations may be defined as consumers.
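The source-addressing principle described above can be sketched as follows. The sketch assumes a hypothetical in-memory `Bus` class that broadcasts (identifier, value) pairs to every attached station; each consumer keeps a local copy only for the identifiers it is interested in. The class and method names are illustrative and are not part of any WorldFIP interface.

```python
class Consumer:
    """Station that keeps local copies of the variables it consumes."""

    def __init__(self, name, wanted):
        self.name = name
        self.copies = {ident: None for ident in wanted}

    def on_frame(self, ident, value):
        # No destination address exists: the consumer decides by identifier.
        if ident in self.copies:
            self.copies[ident] = value


class Bus:
    """Hypothetical broadcast medium: every frame reaches every station."""

    def __init__(self):
        self.stations = []

    def attach(self, station):
        self.stations.append(station)

    def broadcast(self, ident, value):
        for station in self.stations:
            station.on_frame(ident, value)


if __name__ == "__main__":
    bus = Bus()
    controller = Consumer("controller", ["temperature", "pressure"])
    display = Consumer("display", ["temperature"])
    bus.attach(controller)
    bus.attach(display)
    # Whichever node is the active producer of "temperature" sends the value.
    bus.broadcast("temperature", 21.5)
    bus.broadcast("speed", 1200)  # ignored: no attached station consumes it
    print(controller.copies, display.copies)
```

Adding a further consumer only requires attaching it to the bus; neither the producer nor the existing consumers change, which mirrors the property that a new receiver may be added without address modification.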

8.4.2 Periodic and Aperiodic Traffic

Control systems are usually based on sampling theory, so input and output data should be transferred periodically. WorldFIP has chosen to favor the periodic traffic of identified objects between producer and consumers. Variable values are stored in erasable buffers rather than in queues. There is neither acknowledgment nor retransmission for variable transfers. From this point of view, WorldFIP is a time-triggered system (Kopetz, 1990).

WorldFIP may also be seen as a system for updating and managing a distributed database. The producer of an identified object periodically updates its own buffer, WorldFIP periodically updates the buffers at the consumer locations, and these consumers may then periodically use their copy of the producer value. If a failure occurs during the transmission, the last value remains available to the consumer until it receives a new one. In WorldFIP, a one-place erasable buffer is associated with each variable at its production and consumption locations. The usual acknowledgments are not necessary, and retransmissions are avoided in case of error.

The question is how to handle critical data such as alarms or rarely occurring events. In WorldFIP, there are two possible ways, depending on the criticality. If no real-time reaction is required, it is best to use the usual message transfer. Otherwise, the only good solution is to transform the alarm into a variable whose value reflects the presence of an alarm and to transfer this value periodically. One may think that this wastes bandwidth. This is true, but it is the price to pay to ensure a deterministic response time. Moreover, multicast transfers are complex when acknowledgments from each receiver are required. In FIP, the choice to suppress acknowledgments drastically simplifies the solution.
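The behavior of a one-place erasable buffer, and the treatment of an alarm as a periodically transferred variable, can be sketched as follows. `OnePlaceBuffer` is a hypothetical class written for this illustration, and the lost cycle is an assumed transmission failure; neither is taken from a WorldFIP interface.

```python
class OnePlaceBuffer:
    """Holds only the most recent value; a new write erases the previous one."""

    def __init__(self, initial=None):
        self._value = initial

    def write(self, value):
        self._value = value  # overwrite: no queueing, no acknowledgment

    def read(self):
        return self._value  # non-destructive: the last value stays available


if __name__ == "__main__":
    produced = OnePlaceBuffer()  # buffer at the producer
    consumed = OnePlaceBuffer()  # copy at one consumer

    # The alarm is modeled as a variable whose value reflects its presence.
    alarm_states = [False, False, True, True]
    lost_cycles = {2}  # assumed transmission failure in cycle 2

    for cycle, state in enumerate(alarm_states):
        produced.write(state)
        if cycle not in lost_cycles:
            consumed.write(produced.read())  # periodic bus transfer
        print(cycle, consumed.read())
```

In the cycle where the transfer is lost, the consumer still reads the previous value; the alarm becomes visible at the next periodic transfer, without any retransmission. The worst-case delay is therefore bounded by the transfer period multiplied by the number of consecutive losses plus one.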

8.4.3 Timeliness Attributes and Mechanisms for Time-Critical Systems

Because identified data are transferred periodically between a producer and its consumers, no acknowledgment has been provided, and retransmission is basically not allowed. Consider the three following elements: a producer, the consumers, and the bus. The producer is a process producing a datum named X at a given period. Several processes consume X at different periods, and the bus updates, at a given period, the copy of X at each consumer site from the original of X. The question at each consumption site is: Is the value of X fresh, too old, or obsolete?

Therefore, timeliness attributes have been defined to indicate to the consumers whether the data are correct and, if not, the cause of the error. These attributes are called refreshment and promptness. The former indicates whether the production is timely; the latter indicates whether the reception is correct. Based on these elementary attributes, it is then possible to define the time coherence of actions, i.e., the fact that different distributed actions take place within a given time interval. That is also the definition of the simultaneity of actions.

© 2005 by CRC Press

8-5

The WorldFIP Fieldbus

FIGURE 8.1 Simplified architecture of WorldFIP. (Layers shown: MPS over identified traffic management and MMS over messaging management, both above the physical layer.)

FIGURE 8.2 Architecture of WorldFIP. (Same layers as Figure 8.1, with the (ident, value) transfer shown above the physical layer.)

Other mechanisms have been introduced as synchronization mechanisms between the local operations and the behavior on the network. All these mechanisms will be detailed in Section 8.8.

8.5 WorldFIP Architecture

The WorldFIP architecture is shown in Figure 8.1 and Figure 8.2, according to the Open Systems Interconnection (OSI) architecture model (Zimmermann, 1980). All elements were standardized in France in 1992 (AFNOR, 1989). This architecture shows that two main profiles may be used. One is defined to handle the traffic of identified objects; the other is defined for the usual messaging exchanges. This architecture derives directly from the needs analysis. It is important to note that the messaging services in the data link layer are related to point-to-point exchanges of frames, with storage in queues, with or without acknowledgment, and with duplicate detection. The identified traffic services are related to exchanges of data in a broadcast manner, with storage in erasable buffers, without acknowledgment, except through the space consistency mechanism at the application layer.

Messaging periodic service (MPS) is the service element for the periodic and aperiodic exchanges of identified data. It uses the identified traffic services of the data link layer. MMS is a subset of the well-known MMS standard (ISO, 1990) and uses the messaging services of the data link layer. We may say that the first profile (left of the figure) is a profile for real-time traffic management, with guaranteed quality of service and timeliness properties. The second profile is used more for noncritical exchanges, e.g., during commissioning, for maintenance and configuration, or more generally for management. Notice that the messaging services are based on the same medium access control.

8.5.1 Architecture and Standardization

The European standard EN 50170 [CENELEC, 1996a] contains three national standards in Europe.* Volume 3 outlines all WorldFIP specifications according to the organization shown in Table 8.1 and Figure 8.3.

*The other volumes are concerned with P-Net and Profibus.


TABLE 8.1 Parts of the European 50170-3 Standard

EN 50170 volume 3    Part 1-3            General Purpose Field Communication System
EN 50170 volume 3    Part 2-3            Physical Layer
                     Sub-part 2-3-1      IEC Twisted Pair (IEC 61158-2)
                     Sub-part 2-3-2      IEC Twisted Pair Amendment
                     Sub-part 2-3-3      IEC Fiber optic
EN 50170 volume 3    Part 3-3            Data Link Layer
                     Sub-part 3-3-1      Data Link Layer Definitions
                     Sub-part 3-3-2      FCS Definition
                     Sub-part 3-3-3      Bridge Specification
EN 50170 volume 3    Part 5-3            Application Layer Specification
                     Sub-part 5-3-1      MPS Definition
                     Sub-part 5-3-2      SubMMS Definition
EN 50170 volume 3    Part 6-3            Application Protocol Specification
EN 50170 volume 3    Part 7-3            Network Management

TABLE 8.2 Data Rate and Maximum Possible Lengths

Data Rate      Length without Repeater    Length with 4 Repeaters
31.25 kbps     10 km                      50 km
1 Mbps         1 km                       5 km
2.5 Mbps       700 m                      3.5 km

Several profiles of WorldFIP have been defined. One of them, the simpler one providing only periodic traffic of identified data, is called Device WorldFIP (DWF) and is standardized (AFNOR, 1996; CENELEC, 1996b).

8.6 Physical Layer

The physical layer of WorldFIP was naturally the first to conform to IEC 1158-2,* because this standard was defined starting from the FIP French standard C46-604. The medium may be a shielded twisted pair or an optical fiber.

8.6.1 Figures

8.6.1.1 Data Rates

The standard defines three data rates for the shielded twisted pair: 31.25 kbps, 1 Mbps, and 2.5 Mbps. For the optical fiber, a fourth data rate, 5 Mbps, is defined. However, some experiments have been carried out with other data rates, for example, 25 Mbps, with transfer of speech and video.

8.6.1.2 Maximum Length

The maximum number of stations is 256 and the maximum number of repeaters is 4. Table 8.2 gives the maximum possible lengths according to the data rate and the number of repeaters.

8.6.2 Topology

The topology for a shielded twisted pair may be like that shown in Figure 8.4.

*This number was the previous number of the current 61158 standard.


FIGURE 8.3 Architecture and European standard. (Elements shown: Network Management, EN 50170 volume 3, part 7-3; SubMMS, part 5-3-2; MPS, part 5-3-1; MCS, part 6-3; data link layer, part 3-3; physical layer, part 2-3.)

FIGURE 8.4 Example of topology. (Legend as printed: JB: Junction box; TAP: Connector; DS: Diffusion box; DS: Device locally disconnectable; NDS: Not disconnectable device; RP: Repeater, labeled REP in the drawing; PC: Principal cable.)

8.6.3 Coding

The coding is based on a Manchester code. A physical data frame is composed of three parts: a frame start sequence composed of a preamble (PRE) and a frame start delimiter (FSD), the data link information, and the frame end delimiter (FED). The physical layer thus adds 24 bits to each data frame.
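As an illustration of the coding step, the following minimal sketch Manchester-encodes a bit sequence and accounts for the 24 bits of physical overhead. The polarity convention (1 coded as high then low) and all function names are assumptions made for this sketch; the actual PRE, FSD, and FED patterns are defined by the standard and are not reproduced here.

```python
PHYSICAL_OVERHEAD_BITS = 24  # PRE + FSD + FED, as stated in the text


def manchester_encode(bits):
    """Encode each bit as two half-bit levels with a mid-bit transition.

    Assumed convention for this sketch: 1 -> (1, 0), 0 -> (0, 1).
    """
    halves = []
    for bit in bits:
        halves.extend((1, 0) if bit else (0, 1))
    return halves


def manchester_decode(halves):
    """Inverse of manchester_encode; rejects pairs without a transition."""
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (1, 0) else 0)
    return bits


def physical_frame_bits(data_link_bits):
    """Total number of bits on the medium for a data link frame."""
    return data_link_bits + PHYSICAL_OVERHEAD_BITS
```

Because every bit carries a transition, the receiver can recover the clock from the signal, at the cost of two signal elements per bit.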

8.7 Data Link and Medium Access Control Layers

8.7.1 Introduction

The WorldFIP medium access control is centralized and managed by a so-called bus arbitrator (BA). All exchanges are under the control of this bus arbitrator. They are usually scheduled according to the timing requirements and time constraints (Cardeira and Mammeri, 1995), but the scheduling policy is not part of the standard.

The data link layer provides two types of services: for the identified objects and for the messages. Both may take place periodically. Thanks to the medium access control (MAC) protocol, it is easy to manage the periodic traffic, which may be scheduled before runtime. However, it is then necessary to provide suitable services, and the associated protocol mechanisms, for sporadic or random traffic. As is well known, random traffic can be managed by a periodic server. When a station is polled, it may express a request for extra polling. Such requests, which correspond to the aperiodic traffic, are managed dynamically by the bus arbitrator.
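Since the scheduling policy is left open, the following sketch shows only one possible way to precompute a static polling table before runtime. The notions of elementary cycle and macrocycle, the function name, and the millisecond unit are assumptions of this example, not elements taken from the standard.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def build_scan_table(periods_ms):
    """Return (elementary_cycle_ms, table).

    periods_ms maps an identifier to its polling period in milliseconds.
    table[k] lists the identifiers polled in the k-th elementary cycle.
    """
    elementary = reduce(gcd, periods_ms.values())
    macrocycle = reduce(lcm, periods_ms.values())
    table = []
    for k in range(macrocycle // elementary):
        start = k * elementary
        table.append([ident for ident, period in sorted(periods_ms.items())
                      if start % period == 0])
    return elementary, table
```

With periods of 10, 20, and 40 ms, this naive policy polls every identifier in the first elementary cycle; a real scheduler would typically offset the variables to balance the load and to leave free time in each cycle for the aperiodic traffic.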

8.7.2 Basic Mechanism

The medium access control is based on the following principle: each exchange is composed of two frames, a request and a response. All of the exchanges are based on the couple (name of the information, value of the information), implemented by two frames: an identification frame and a value frame. So, to exchange the value of an object, the bus arbitrator sends a frame that contains the identifier of this object. This frame is denoted ID_DAT (identification of data) (Figure 8.5a). It is received by all currently active stations and recognized by the so-called producer station of the identified object, and also by the consumer stations that subscribe to this object* (Figure 8.5b). The station that recognizes itself as the producer sends the current value of the identified object. This value is transferred in a so-called RP_DAT frame (response) (Figure 8.5c). All interested stations, including all subscribers and the bus arbitrator, receive this RP_DAT frame (cf. Figure 8.5d).

The ID and RP frames have the following formats:

ID frame:
• A control field of 8 bits, whose roles are developed later
• The identifier of the object (16 bits)
• A cyclic redundancy check (CRC) (16 bits)

RP frame:
• A control field of 8 bits, whose roles are developed later
• The value of the object identified in the previous frame (maximum of 256 bytes)
• A CRC (16 bits)

The idea is now to extend this simple mechanism of polling by designation of the data to be sent, in order to handle the transfer of messages and of aperiodic data.
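The frame formats above, together with the 24 bits added by the physical layer (Section 8.6.3), allow a rough estimate of the bus time taken by one exchange. The sketch below is only such an estimate: the function names are hypothetical, and the turnaround time between the two frames is a placeholder parameter set to zero by default, because its value is not given here.

```python
CONTROL_BITS = 8
IDENTIFIER_BITS = 16
CRC_BITS = 16
PHYSICAL_OVERHEAD_BITS = 24  # added to every frame (Section 8.6.3)
MAX_VALUE_BYTES = 256


def id_frame_bits():
    """Bits on the medium for one ID frame."""
    return CONTROL_BITS + IDENTIFIER_BITS + CRC_BITS + PHYSICAL_OVERHEAD_BITS


def rp_frame_bits(value_bytes):
    """Bits on the medium for one RP frame carrying value_bytes of data."""
    if not 0 <= value_bytes <= MAX_VALUE_BYTES:
        raise ValueError("value length out of range")
    return CONTROL_BITS + 8 * value_bytes + CRC_BITS + PHYSICAL_OVERHEAD_BITS


def transaction_time_us(value_bytes, rate_bps, turnaround_us=0.0):
    """Duration of one ID frame plus one RP frame, in microseconds."""
    bits = id_frame_bits() + rp_frame_bits(value_bytes)
    return bits * 1e6 / rate_bps + turnaround_us
```

For a 2-byte value, both frames are 64 bits long, so the exchange occupies 128 bit times, that is, 128 µs at 1 Mbps when the turnaround time is ignored.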

8.7.3 The Aperiodic Server

The needs for aperiodic transfer have been identified in Section 8.4.2. This aperiodic transfer takes place in the free time slots left by the periodic one. The aperiodic traffic is dynamically managed by the bus arbitrator.

8.7.3.1 First Stage

The first stage is the expression of the request to the BA by a station. Any producer may request a new exchange through an indication in the control field of the RP frame, when it is polled by an ID_DAT frame. This indication specifies the type of RP frame. We may then observe the following three RP frame types as answers to an ID_DAT frame:

*Notice that the producer–consumer model is also called publisher–subscriber, following the concept of subscribing to the data by the consumers.


FIGURE 8.5 Exchange and updating of a variable: (a) identification of VARK sent by the bus arbitrator; (b) recognition of VARK by its producer and its consumers; (c) the producer sends the new value of VARK while the consumers still hold the old value; (d) the consumers hold the new value of VARK.


• RP_DAT: Response to ID_DAT without any request
• RP_DAT_RQ: Response to ID_DAT with a request for a random exchange of other identified objects
• RP_DAT_MSG: Response to ID_DAT with a request for a random exchange of a message

8.7.3.2 Second Stage

The second stage is to satisfy the request. The BA then has to place the right ID frames in a free time slot of the scanning table, according to its own scheduling policy. The ID frames corresponding to the possible requests are ID_RQ and ID_MSG. The former satisfies the RP_DAT_RQ and the latter the RP_DAT_MSG.

Following the reception of ID_RQ, the station at the origin of the request sends an RP_RQ frame whose data field contains a list of identifiers, which the BA then has to send as ID frames. Following the reception of ID_MSG, the station at the origin of the request sends an RP_MSG frame with a message in the data field. This message is specified with or without acknowledgment, and the corresponding frames are called RP_MSG_ACK and RP_MSG_NOACK. In the former case, the receiver sends an RP_ACK after reception of the message. A special frame, RP_FIN, allows the BA to continue its polling.
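The two stages can be traced with a small simulation. This is a sketch under simplifying assumptions: each station produces a single identifier, only requests for identified objects (RP_DAT_RQ) are modeled, and all pending requests are served right after the periodic polling. The class and function names are hypothetical.

```python
from collections import deque


class Station:
    """Producer of one identifier that may piggyback an aperiodic request."""

    def __init__(self, ident, value):
        self.ident = ident
        self.value = value
        self.pending = []  # identifiers this station wants polled

    def on_id_dat(self):
        # First stage: the RP frame type signals whether a request is pending.
        kind = "RP_DAT_RQ" if self.pending else "RP_DAT"
        return kind, self.value

    def on_id_rq(self):
        # Second stage: hand the list of requested identifiers to the BA.
        wanted, self.pending = self.pending, []
        return "RP_RQ", wanted


def run_elementary_cycle(periodic_ids, producers):
    """Return the frames seen on the bus, periodic traffic first."""
    trace = []
    requests = deque()
    for ident in periodic_ids:
        trace.append(("ID_DAT", ident))
        kind, value = producers[ident].on_id_dat()
        trace.append((kind, value))
        if kind == "RP_DAT_RQ":
            requests.append(ident)
    while requests:  # free time after the periodic polling
        ident = requests.popleft()
        trace.append(("ID_RQ", ident))
        kind, wanted = producers[ident].on_id_rq()
        trace.append((kind, wanted))
        for extra in wanted:  # the BA polls each requested identifier
            trace.append(("ID_DAT", extra))
            trace.append(("RP_DAT", producers[extra].value))
    return trace
```

If the producer of Y has a pending request for X, one cycle that polls Y yields the frame sequence ID_DAT(Y), RP_DAT_RQ, ID_RQ(Y), RP_RQ(X), ID_DAT(X), RP_DAT; the next cycle contains only the periodic exchange.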

8.7.4 Variable Transfer Services

To each identified object, the data link layer associates a buffer B_DAT_prod at the producer station and a buffer B_DAT_cons at each consumer station. Two main services are defined for writing and reading a buffer: L_PUT and L_GET, respectively. The write service (L_PUT) places the new value in the producer buffer. The previous buffer content is overwritten. The read service (L_GET) gets the value from the consumer buffer. These services do not cause any traffic on the bus. The content of each consumer buffer is updated with the value stored in the producer buffer under the control of the bus arbitrator (Figure 8.5d), as seen in Section 8.7.2. An L_SENT indication informs the producer when the transmission takes place. The consumers are informed by an L_RECEIVED indication when the update occurs.
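The buffer semantics of these services can be sketched as follows. The class is hypothetical and groups the producer and consumer buffers of one identified object in a single place purely for illustration; in a real system they reside in different stations.

```python
class IdentifiedObject:
    """Buffers associated with one identified object (illustrative only)."""

    def __init__(self, n_consumers):
        self.b_dat_prod = None                   # producer buffer
        self.b_dat_cons = [None] * n_consumers   # one buffer per consumer
        self.indications = []                    # log of raised indications

    def l_put(self, value):
        """Local write: overwrites the producer buffer, no bus traffic."""
        self.b_dat_prod = value

    def l_get(self, consumer):
        """Local read of one consumer buffer, no bus traffic."""
        return self.b_dat_cons[consumer]

    def bus_transfer(self):
        """Update of the consumer buffers under bus arbitrator control."""
        self.indications.append("L_SENT")
        for i in range(len(self.b_dat_cons)):
            self.b_dat_cons[i] = self.b_dat_prod
            self.indications.append("L_RECEIVED")
```

Two successive calls to l_put before a transfer leave only the second value in the producer buffer, which is the erasable-buffer behavior described in Section 8.4.2.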

8.7.5 Message Transfer

For messages, as for identified objects, WorldFIP defines periodic and aperiodic (or on-request) transfers. An identifier and a queue are assigned to periodic messages. According to the application needs, more than one identifier and one queue may be used when different polling periods are wanted. Messages are deposited at the source side, and the transfer of the content of the queue is periodically triggered by the bus arbitrator (ID_MSG frame). If the queue is not empty, the source data link layer sends the first message in the queue in an RP_MSG_xx frame. The destination data link layer stores the message in the receive queue and, if requested, immediately acknowledges the transfer using an RP_ACK frame. The end of the transaction is signaled by the source to the bus arbitrator using an RP_FIN frame. If the queue is empty, no transfer takes place and only the RP_FIN frame is sent. The polling period is a configuration parameter.

For aperiodic message transfer, on the source side, the data link layer defines a single queue, F_MSG_aper, that holds the pending messages. On the destination side, a receiving queue, F_MSG_rec, is defined. As for aperiodic variables, transfer requests are signaled to the bus arbitrator as piggybacks on RP_DAT frames sent in response to ID_DAT frames.
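The reaction of a source to an ID_MSG poll can be summarized by a short sketch. The function name and the representation of frames as (type, payload) tuples are assumptions of this example; addressing and the content of the acknowledgment are not modeled.

```python
from collections import deque


def on_id_msg(queue, ack_requested=False):
    """Frames that follow an ID_MSG poll of the given message queue."""
    if not queue:
        return [("RP_FIN", None)]        # empty queue: only RP_FIN is sent
    message = queue.popleft()            # first message in the queue
    kind = "RP_MSG_ACK" if ack_requested else "RP_MSG_NOACK"
    frames = [(kind, message)]
    if ack_requested:
        frames.append(("RP_ACK", None))  # sent by the destination
    frames.append(("RP_FIN", None))      # sent by the source to the BA
    return frames
```

Each poll therefore transfers at most one message, so the polling period bounds the rate at which a periodic message queue is drained.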

8.7.6 Synthesis on the Data Link Layer

Three types of objects are exchanged according to the basic principle of exchanging couples (name, value) (Thomesse and Rodriguez, 1986). These objects are identified objects, lists of identifiers of identified objects, or messages with or without acknowledgment (Figure 8.6). The data link layer protocol may be considered as connection oriented, with connections established at the configuration stage. These connections are multicast, and the corresponding service access points (SAPs) are represented by the identifiers. Associated with each SAP, different connection end points (CEPs) are represented by the associated objects and necessary resources:

• A CEP for data exchange, represented by the buffer storing the successive values
• A CEP for the exchange of the list of identifiers, represented by another buffer
• A CEP for the sending of messages, represented by a queue

Each of these CEPs is addressed first by the identifier addressing the SAP and second by the indication in the control field of the ID frame, specifying the corresponding resource.

FIGURE 8.6 Service access point and connection end point. (Elements shown: the station queue; the service access point IDENT4 with its buffer for the value of IDENT4 (CEP RP-DAT), its buffer for the list of identifiers (CEP RP-RQ), and a pointer to the general queue for messages (CEP RP-MSG).)
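The two-step addressing described above can be pictured as a nested lookup. The data structure below is purely illustrative: the dictionary keys and function names are assumptions, and the ID frame type stands for the indication carried in the control field.

```python
from collections import deque

# ID frame type (control field indication) -> resource it selects.
CEP_FOR_ID_FRAME = {
    "ID_DAT": "value_buffer",
    "ID_RQ": "identifier_list_buffer",
    "ID_MSG": "message_queue",
}


def make_sap():
    """Resources attached to one identifier (one SAP)."""
    return {
        "value_buffer": None,
        "identifier_list_buffer": [],
        "message_queue": deque(),
    }


def resolve_cep(saps, identifier, id_frame_type):
    """The identifier selects the SAP; the ID frame type selects the CEP."""
    sap = saps[identifier]
    return sap[CEP_FOR_ID_FRAME[id_frame_type]]
```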

8.8 Application Layer

Two application service elements comprise the WorldFIP application layer: MPS for real-time data exchange (Thomesse and Delcuvellerie, 1987; Thomesse and Lainé, 1989) and sub-MMS for the usual messaging service and compatibility with other networks. For real-time data exchange, FIP behaves like a distributed database refreshed by the network periodically, or aperiodically on demand. All application services related to the periodic and aperiodic data exchange are called MPS.

MPS provides local read-write services (periodic) as well as remote read-write services (aperiodic) for the values of variables or lists of variables. The read services return indications on the age of the object value. Considering a producer of a datum named X at a given period, the consumers of X consuming X at different periods, and the bus itself updating at a given period the copies of X at the consumers from the original of X, the question at a consumption site is: Is the value of X fresh, too old, or obsolete? In WorldFIP, this information is based on local mechanisms and provided as two types of status: the refreshment status elaborated by the producer and the promptness status elaborated by the consumer (Thomesse et al., 1986; Decotignie and Raja, 1993; Lorenz et al., 1994). These statuses are returned by the read services with the value itself. This information may also be used to check whether a set of variables is time coherent (Figure 8.9).

As a variable may have several consumers, there is a need to know whether the different copies of the variable value available to the various consumers are identical. This information, called spatial coherency status (or spatial consistency) (Saba et al., 1993), is provided by the MPS read services related to lists of variables. The same services also offer a temporal coherence status.

8.8.1 Services Associated with the Variables

A variable can be of a simple type, such as integer, floating point, Boolean, or character, or of a composite type, such as array or record. It may have different semantics: variable, synchronization variable, consistency variable, or variable descriptor. Synchronization variables are used to synchronize application processes and also in the elaboration of the temporal and spatial statuses associated with ordinary variables. Consistency variables are used to elaborate the spatial coherence status. Variable descriptors hold all information concerning variables (type, semantics, periodicity, etc.) for configuration, commissioning, maintenance, and management.

Three request services, A_READ, A_WRITE, and A_UPDATE, and two indication services, A_SENT and A_RECEIVED, are available. A_UPDATE is used to request an aperiodic transfer of a variable. A_SENT and A_RECEIVED are optional services. When used, A_RECEIVED informs the consumer, at its local communication entity, of the reception of a new variable value. Similarly, A_SENT notifies the producer of the transmission of the produced value. These indication services can be used by application processes (APs) to verify the proper operation of the communication entity and also to synchronize with each other by receiving synchronization variables (see Section 8.8.4).

A_READ and A_WRITE exist in two forms, local and remote. A local read of a variable (denoted A_READLOC) provides the requesting AP with the current value of the local variable image. A local write (denoted A_WRITELOC) invoked by an AP updates, with the value specified in the request, the local value of the variable, which will be sent at a subsequent scan. As already said, these operations do not invoke any communication. The transfer of the value from the producer to its copies in all consumers is the concern of the distribution function of the application layer. The variable value given in an A_WRITELOC will be available to the distributor, which is in charge of broadcasting it to all consumers of this value. The variable value returned by an A_READLOC at a consumer site will be the last value updated by the distribution function.

A remote write, A_WRITEFAR, is used by a producer to update the content of the local buffer assigned to the variable specified in the request and to ask for the transfer of this value to the consumers.
It may be seen as the combination of an invocation of the A_WRITELOC service followed by an A_UPDATE. However, as for A_WRITELOC, this service may only be invoked by the producer of the variable.

In a similar way, with a remote read, A_READFAR, a consumer requires the transfer of the variable value from the producer to all consumers. This value is returned as a result of this request. This service is hence a combination of an A_UPDATE service invocation and an A_READLOC service invocation. However, read services may only be invoked on a consumer. Figure 8.7 shows this service from the producer–consumer model point of view, considering an AP that is a consumer of variable X. The operations are as follows:

1. Remote READ request
2. Ask the distributor for an update
3. Transfer order from the distributor
4. Transfer of the producer’s variable value to the users
5. Confirmation of remote READ

In order to request a transfer explicitly, this AP should also be the producer of a variable (Y in the example). Figure 8.8 depicts in detail all the primitive calls and frame exchanges necessary to achieve the remote READ of X. We can see that these remote operations are not symmetrical: the remote write is a service without confirmation, while the remote read is a service with confirmation.

FIGURE 8.7 Remote READ using PDC model. (Elements shown: two users, the producer, and the distributor, linked by the operations numbered 1 to 5.)


FIGURE 8.8 Primitives and frame exchanges to achieve a remote READ service. (AP: consumer of X and producer of Y. Primitives between the application layer and the data link layer: A-READFAR-Rq(X), L-UPDATE-Rq(X), L-SENT-Ind(Y), L-UPDATE-Cnf(X), L-RECEIVED-Ind(X), A-READFAR-Cnf(X, val(X)). Exchanged frames: ID-DAT(Y), RP-DAT-RQ(val(Y)), ID-RQ(Y), RP-RQ(X), ID-DAT(X), RP-DAT(val(X)).)

8.8.2 Temporal Validity of Variables

Normally, according to the producer–consumer model, operations should proceed in the following order: production, transfer, and consumption. However, the transfer is triggered by the receipt of an identifier, whereas a production or a consumption is triggered by a production or consumption order. This may lead to abnormal situations. For example, several productions may occur successively without any transfer. Conversely, a number of transfers may take place between two successive productions of a variable. The same problems may arise on the consumer side. A consumer may read the same value of a variable several times or may not have enough time (or may not be interested) to consume all of the received values. It is thus important to detect these deviations from normal behavior. This means that any consumer should be able to know whether the producer has produced on time, the transfer has been handled on time, and the consumer itself has consumed on time. Finally, the consumer should be able to check whether the value is still temporally valid (Lorenz et al., 1994a; Lorenz et al., 1994b).

8.8.2.1 Refreshment

The refreshment status of a variable is a Boolean that indicates whether a production occurred in the right time window. It is elaborated by the application layer of the producer. The time window is defined by a start event and a duration. In a simplified view, the refreshment status is correct (true) if the production occurs in the time window, and it remains true for a given delay, which is the normal production period. It becomes false once this period has expired. The current refreshment status is sent with the value and indicates to the consumer whether, from the point of view of the producer and of the transmission, the value is valid (Figure 8.9).

FIGURE 8.9 Refreshment and promptness status elaboration in the synchronous case. (Timeline showing the synchronization variable, the variable production, and the variable transmission; timers Tprod, Ttx, and Tcons; refreshment status and promptness status, each true or false.)


In summary, the refreshment status indicates to a consumer that the producer has produced its value while respecting a production delay called the production period.

8.8.2.2 Promptness

The promptness status of a variable value indicates whether the transmission of the data has taken place in the right time window. It is elaborated by the communication entities of its consumers. It is returned with the variable value and the refreshment status as a result of an invocation of A_READ.
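A simplified way to picture how the two statuses are elaborated is given below. It is only a sketch: each time window is assumed to start at the production or reception event itself, time is passed explicitly as a number, and the class, method, and parameter names (including promptness_period) are hypothetical.

```python
class ProducedVariable:
    """Producer side: elaborates the refreshment status."""

    def __init__(self, production_period):
        self.production_period = production_period
        self.value = None
        self.produced_at = None

    def write(self, value, now):
        self.value, self.produced_at = value, now

    def on_transfer(self, now):
        """Value and refreshment status are sent together."""
        fresh = (self.produced_at is not None
                 and now - self.produced_at <= self.production_period)
        return self.value, fresh


class ConsumedVariable:
    """Consumer side: elaborates the promptness status."""

    def __init__(self, promptness_period):
        self.promptness_period = promptness_period
        self.value = None
        self.refreshment = False
        self.received_at = None

    def on_receive(self, value, refreshment, now):
        self.value, self.refreshment, self.received_at = value, refreshment, now

    def read(self, now):
        """Return (value, refreshment status, promptness status)."""
        prompt = (self.received_at is not None
                  and now - self.received_at <= self.promptness_period)
        return self.value, self.refreshment, prompt
```

A consumer that reads long after the last reception sees a false promptness status even if the producer was on time, while a value transferred after the production period has expired carries a false refreshment status.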

8.8.3 Synchronous and Asynchronous

Refreshment and promptness are the timeliness attributes associated with a given variable. They indicate whether an event occurs in a time window defined by a starting event and a duration. The WorldFIP fieldbus has defined two types of timeliness attributes, the synchronous type and the asynchronous type.*

An attribute is said to be synchronous when the starting event of the time window is the indication of reception of a dedicated variable. This event is normally periodic, and the bus arbitrator is, as for other variables, in charge of respecting the period. The duration is the period of production and is managed by the network, through the bus arbitrator behavior. An attribute is said to be asynchronous when the starting event is the previous occurrence of this event. The duration is the period of production and is here managed by the device itself.

In fact, in WorldFIP, two attributes belong to the synchronous type: the so-called synchronous attribute, as defined previously, and the so-called punctual attribute, for which the starting event is the same as for the synchronous attribute, but the duration is no longer the period of production; it is a shorter delay. Further details and the finite-state machines of these mechanisms can be found in the C46-602 standard and in Lorenz et al. (1994a).

8.8.4 Synchronization Services

The processes of an application can be synchronized or asynchronous. An asynchronous application process is one whose execution is independent of the network behavior. An application process is said to be synchronized when its execution is related to the reception of some indication from the network. In many cases, the various distributed processes of an application are synchronized. This synchronization may be ensured through the indication of reception of variables. However, some application processes may not be able to handle synchronization. For such asynchronous APs that need to participate in a synchronized distributed application, FIP provides a resynchronization mechanism.

In addition to the existing buffer, called the public buffer, the resynchronization mechanism associates with each variable a second buffer, the private buffer. The private buffer is only accessible to the corresponding AP. It is accessed by the local A_READ if the variable is consumed or by A_WRITE if the variable is produced. The public buffer is only accessible to the network (Figure 8.10). The resynchronization mechanism consists of copying the content from one buffer to the other according to a synchronization order received via the network.

Both variable production and variable consumption can be resynchronized. When variable consumption has to be resynchronized, the value in the private buffer is kept unchanged until the resynchronization order. If a new value is transferred on the network, it is kept in the public buffer. Only at the reception of a resynchronization order is the value in the public buffer copied into the private buffer. The process is similar for variable production. In both cases, the resynchronization order is given by a synchronization variable specified with each variable, produced or consumed, that needs to be resynchronized.
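For a consumed variable, the resynchronization mechanism amounts to a double buffer whose copy is triggered by the synchronization variable. The sketch below uses hypothetical class and method names and omits the timeliness statuses.

```python
class ResynchronizedConsumedVariable:
    """Public/private buffer pair of a resynchronized consumed variable."""

    def __init__(self):
        self.public = None   # written by the network only
        self.private = None  # read by the application process only

    def on_network_update(self, value):
        """A new value transferred on the network stays in the public buffer."""
        self.public = value

    def on_sync_variable_received(self):
        """Resynchronization order: copy the public buffer to the private one."""
        self.private = self.public

    def a_read(self):
        """Local read by the asynchronous application process."""
        return self.private
```

Between two resynchronization orders the application process always reads the same value, whatever the number of network updates. For a produced variable, the copy would go in the opposite direction, from the private buffer to the public one.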
*The terms synchronous and asynchronous could be replaced by “synchronized by the network” and “locally synchronized,” respectively.


FIGURE 8.10 Resynchronization mechanism. (Elements shown: two occurrences of the same variable, the private buffer with asynchronous access by the producer or consumer, and the public buffer with synchronous access by the network.)

8.8.5 Services Associated with Variable Lists

A variable list is an unordered set of variables that must verify the time coherence, i.e., the fact that they are all produced in a given time window. All variables of a list are consumed at the same time by at least one consumption process. When more than one process consumes a list, another property must be verified: the space consistency, i.e., the fact that all copies of all variables of the list are the same at all consumption sites. The variables that comprise the list may be produced on different sites. They need not all be produced by the same producer, as is required in the usual application layers, such as MMS or MMS-like layers (FMS or some sub-MMS, for example). Usually the productions are synchronous operations.

The only service defined on lists is A_READLIST, which allows the reading of all variables of a list in a single invocation. This service returns the last received values for the variables in the list and three optional statuses: a production coherence status, a transmission coherence status, and a spatial consistency status. These statuses are provided to account for two important needs in systems using fieldbuses. First, a consumer of several variables is interested in knowing whether the corresponding values have been sampled at nearly the same time, which is called time coherence (Kopetz, 1990). Second, when the value of a variable has been distributed to several consumers, it may be useful to know whether all of the copies are identical. This is referred to as spatial consistency. The idea in FIP is not to ensure temporal coherence and spatial consistency, but rather to indicate whether these properties are present.

In FIP, the temporal coherence indication is given through two statuses, the production coherence status and the transmission coherence status. The production coherence status is a Boolean elaborated by the consumer application layer. This status corresponds to a logical AND of all corresponding refreshment statuses. Similarly, the transmission coherence status is calculated as the logical AND of all promptness statuses of the variables in the list. Together, the production coherence status and the transmission coherence status indicate the temporal coherence of the variables in the list.

The space consistency status of a list is elaborated by the application layer of each consumer of the list. The elaboration mechanism relies on the broadcast of a consistency variable by each of the consumers. This variable indicates, for each copy of the variable list, whether it has been received correctly and in a given window. After the reception of all consistency variables by all consumers, each of them knows the validity of the variables in the list at all consumers. A logical AND of all consistency variables gives the spatial consistency status.

To make the variable list transfer more reliable, FIP defines an error recovery mechanism. If needed, a consumer can trigger a retransmission of its consumed variables when any error is detected. This mechanism issues a retransmission request (using the aperiodic variable transfer service of the data link layer) a limited number of times, up to a maximum defined for each instance of the list. The duration of the whole transaction of the list (including retransmissions) must be bounded by a time window T smaller than the delay between two consecutive synchronization orders. It has been shown (Song et al., 1991) that this recovery mechanism (a kind of grouped acknowledgment technique) is very efficient and can be recommended for use in other multicast protocols.
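The three list statuses reduce to logical AND operations, as the following sketch shows. The function signature is hypothetical: each variable of the list is represented by a (value, refreshment, promptness) tuple, and the consistency variables broadcast by the consumers are represented by a list of Booleans.

```python
def a_readlist(variables, consistency_variables):
    """Return (values, production coherence, transmission coherence,
    spatial consistency) for a variable list.

    variables: list of (value, refreshment, promptness) tuples.
    consistency_variables: one Boolean per consumer of the list.
    """
    values = [value for value, _, _ in variables]
    production_coherence = all(refresh for _, refresh, _ in variables)
    transmission_coherence = all(prompt for _, _, prompt in variables)
    spatial_consistency = all(consistency_variables)
    return (values, production_coherence, transmission_coherence,
            spatial_consistency)
```

A single late transmission or a single consumer reporting an incorrect reception is therefore enough to invalidate the corresponding status for the whole list.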

8.9 WorldFIP State and Technology

8.9.1 Technology

Many integrated circuits and software libraries are available for building devices compatible with the standard. They conform to the European standard EN 50170. The circuits cover all physical layer protocols and all profiles of the data link and application layers. The communication components are referenced as FIPIU2, FULLFIP2, and MICROFIP. The circuits for the physical layer are essentially FIELDRIVE and CREOL for copper wire and OPERA-FIPOPTIC for fiber optic. It is important to notice that redundant channels may be used with the FIELDUAL component. TransoFIP and FIELDTR are possible transformers for connection to copper wire. The libraries are used to create the link between the user application and the communications controller. Each library is dedicated to a communication component.

8.9.2 Fieldbus Internet Protocol

FIP may now be interpreted as FieldBus Internet Protocol. Indeed, the messaging services are used to transfer Hypertext Transfer Protocol (HTTP) protocol data units (PDUs), so each site on a WorldFIP fieldbus may host a Web server. For remote maintenance and remote configuration applications, it is then possible to access the stations through a browser. Two solutions are currently used:

1. A single station may be seen as the image of all stations of the fieldbus.
2. Each station is directly accessed by tunneling HTTP in the WorldFIP PDUs.

The interest of WorldFIP is that the data flow generated by Internet connections is managed in complete compatibility with the time constraints of the process, and hence with its dependability. The time-critical traffic is always served first.

8.9.3 New Development

In 2001, WorldFIP was certified safe according to the “proven in use” procedure defined by IEC 61508. According to this standard, a fieldbus is considered a subsystem. WorldFIP was certified by Bureau Veritas on the basis of detailed records from field users of very reliable applications, with a sufficient number of applications and a high level of confidence in the operational figures. It is the only fieldbus in the world with such a certification.

8.10 Conclusion

The WorldFIP fieldbus is now 20 years old if we count from the beginning of its specification. It is only a few years old if we count from the very recent dates of the international standards (Leviti, 2001), bearing in mind that the definition of the profiles standard (CENELEC, 2003b) is not yet finished. The current integrated circuits (ICs) (the third generation for the first IC) are based on the latest developments in microelectronics. WorldFIP is the only fieldbus in the world certified to SIL3 (safety integrity level 3) (Froidevaux et al., 2001).

WorldFIP occupies a special place in the international market. It is used in all types of industry (iron and steel, car manufacturing, bottling, power plants, etc.), and also in many time-critical applications, such as embedded systems in trains, buses, ships, and subways, and in one very important application: the Large Hadron Collider at CERN (Conseil Européen pour la Recherche Nucléaire) in Switzerland. The main reasons lie in the technical specifications, in the services provided, and in the quality of service, essentially from a timeliness point of view. WorldFIP guarantees that time constraints will be met, that periods are free of jitter, and that distributed actions (productions, consumptions, data acquisitions, controls) are synchronized. The validation of data is also an important element of the quality of service and of the dependability of WorldFIP-based systems. The quality of the physical layer (IEC, 1993) and the redundancy capabilities are further reasons for choosing WorldFIP for critical applications. Many of these concepts and mechanisms have been taken up in the TCCA (time-critical communication architecture) report (ISO, 1991; Grant, 1992) and in IEC TS 61158 (CENELEC, 2003a).

Regarding other and newer requirements, WorldFIP is able to transport voice and video without disturbing or influencing the real-time traffic. Voice transport is already in industrial use in trains. WorldFIP is also able to transport HTTP data units, allowing remote access to WorldFIP-based systems for monitoring, maintenance, configuration, etc.

References

AFNOR (1989). French Standards NF C46601 to C46607. FIP bus for exchange of information between transmitters, actuators and programmable controllers. Published between 1989 and 1992 (in French).
AFNOR (1996). French Standard C46-638, Système de Communication à haute performance pour petits modules de données (WorldFIP Profil 1, DWF).
Bergé, N., G. Juanole, and M. Samaan (1995). Using Stochastic Petri Nets for Modeling and Analysing an Industrial Application Based on FIP Fieldbus. Paper presented at International Conference on Emerging Technologies and Factory Automation, INRIA, Paris.
Cardeira, C. and Z. Mammeri (1995). A schedulability analysis of tasks and network traffic in distributed real-time systems. Measurement, 15, 71–83.
CENELEC (1996a). European standard EN 50170. Fieldbus. Volume 1: P-Net, Volume 2: PROFIBUS, Volume 3: WorldFIP.
CENELEC (1996b). High Efficiency Communications Subsystems for Small Data Packages, CLC TC/65CX, EN 50254.
CENELEC (2003a). prEN61158-2: Digital Data Communication for Measurement and Control: Fieldbus for Use in Industrial Control Systems. Part 2: Physical layer specification, Part 3: Data link layer service definition, Part 4: Data link layer protocol specification, Part 5: Application layer service definition, Part 6: Application layer protocol specification.
CENELEC (2003b). prEN61784-1 (65C/294/FDIS): Digital Data Communications for Measurement and Control: Part 1: Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems.
Decotignie, J.D. and P. Raja (1993). Fulfilling temporal constraints in fieldbus. In Proc. IECON ’93, Maui, HI, pp. 519–524.
Froidevaux, J.-P. (2001). Use of Fieldbus In Safety Related Systems, an Evaluation of WorldFIP according to Proven-in-Use Concept of IEC 61508. Paper presented at 4th FET, IFAC Conference, Nancy, France.
Galara, D. and J.P. Thomesse (1984). Groupe de réflexion FIP, Proposition d’un système de transmission série multiplexée pour les échanges d’informations entre des capteurs, des actionneurs et des automates réflexes. Ministère de l’Industrie et de la Recherche.
Gault, M. and J.P. Lobert (1985). Contribution for the fieldbus standard. Presentation to IEC/TC65/SC65C/WG6.
Grant, K. (1992). Users Requirements on Time Critical Communications Architectures, Technical Report. ISO TC184/SC5/WG2/TCCA.
IEC (1993). IEC Standard 1158-2, Fieldbus Standard for Use in Industrial Control Systems: Part 2: Physical Layer Specification and Service Definition + AMD1, 1995.
ISO (1990). International Standard ISO 9506, Manufacturing Message Specification (MMS): Part 1: Service Definition, Part 2: Protocol Specification, 1991.


ISO (1991). ISO/TC 184/SC 5/WG 2-TCCA-N56, Draft Technical Report of the TCCA Rapporteurs’ Group of ISO/TC 184/SC 5/WG 2 Identifying User Requirements for systems Supporting Time-Critical Communications, August 1991.
Kopetz, H. (1990). Event triggered vs. time triggered real time systems. LNCS, 563, 87–101.
Leviti, P. (2001). IEC 61158, An Offence to Technicians. Paper presented at 4th FET, IFAC Conference, Nancy, France.
Lorenz, P. and Z. Mammeri (1994a). Temporal Mechanisms in Communication Models Applied to Companion Standards. Paper presented at SICICA 94, Budapest.
Lorenz, P., J.-P. Thomesse, and Z. Mammeri (1994b). A State-Machine for Temporal Qualification of Time-Critical Communication. Paper presented at 26th IEEE Southeastern Symposium on System Theory, Athens, Ohio, March 20–22.
MAP (1988). General Motors, Manufacturing Automation Protocol, version 3.0.
Pleinevaux, P. and J.-D. Decotignie (1988). Time critical communications networks: field buses. IEEE Network Magazine, 2, 55–63.
Saba, G., J.P. Thomesse, and Y.Q. Song (1993). Space and time consistency qualification in a distributed communication system. In Proceedings of IMACS/IFAC International Symposium on Mathematical and Intelligent Models in System Simulation, Vol. 1, Brussels, Belgium, April 12–16, pp. 383–391.
Simonot, F., Y.Q. Song, and J.P. Thomesse (1995). On message sojourn time in TDM schemes with any buffer capacity. IEEE Transactions on Communication, 43, 2/3/4, 1013–1021.
Song, Y.Q., P. Lorenz, F. Simonot, and J.P. Thomesse (1991). Multipeer/Multicast Protocols for Time-Critical Communication. Paper presented at Multipeer/Multicast Workshop, Orlando, FL.
Thomesse, J.-P. (1993). Le réseau de terrain FIP. Revue Réseaux et Informatique Répartie, Ed. Hermès, 3, 3, 287–321.
Thomesse, J.-P. (1998). A review of the fieldbuses. Annual Reviews in Control, Pergamon, 22, 35–45.
Thomesse, J.-P. and J.-L. Delcuvellerie (1987). FIP: A Standard Proposal for Fieldbuses. Paper presented at IEEE-NBS Workshop on Factory Communications, Gaithersburg, MD, March 17–19.
Thomesse, J.-P., J.-Y. Dumaine, and J. Brach (1986). An industrial instrumentation local area network. Proceedings of IECON, 1, 73–78.
Thomesse, J.-P. and T. Lainé (1989). The field bus application services. In Proceedings of IECON ’89, 15th Conference IEEE-IES Factory Automation, Philadelphia, pp. 526–530.
Thomesse, J.-P. and M. Rodriguez (1986). FIP, A Bus for Instrumentation. Paper presented at Advanced Seminar on Real Time Local Area Networks, Colloque INRIA Bandol, France.
Zimmermann, H. (1980). OSI reference model. The ISO model of architecture for open system interconnection. IEEE Transactions on Communication, 28, 425–432.


9 FOUNDATION Fieldbus: History and Features

Salvatore Cavalieri
University of Catania

9.1 Principles of FOUNDATION Fieldbus
9.2 Technical Description of FOUNDATION Fieldbus
    H1 and HSE FOUNDATION Fieldbus User Application Layer • H1 FOUNDATION Fieldbus • HSE FOUNDATION Fieldbus • Open Systems Implementation
9.3 Conclusions
References

FOUNDATION Fieldbus is an all-digital, serial, two-way communication system. Its specification has been developed by the nonprofit Fieldbus Foundation [1]. From its very beginning, FOUNDATION Fieldbus has shown two fundamental and (at that time, at least) unique features: an emphasis on standardizing the description of the devices to be connected to the fieldbus, and the adoption of both main link access mechanisms (i.e., token-based and centralized), which the International Electrotechnical Commission (IEC)/Instrument Society of America (ISA) fieldbus committee (IEC 61158 and ISA SP50) was trying to derive from the existing proposals and combine into a new and complete solution [2][3][4][5]. One aim of this chapter is to emphasize the value of the choices made by the Fieldbus Foundation and their impact on the current features of the FOUNDATION Fieldbus communication system. Those features are then described in detail so that the reader can clearly understand the key points of the system. This chapter is organized into two parts: Section 9.1 gives an overview of the principles of FOUNDATION Fieldbus, and Section 9.2 discusses the main features of this communication system.

9.1 Principles of FOUNDATION Fieldbus

As soon as the first fieldbus communication systems [6][7] appeared on the market, the need for a single fieldbus standard was felt. Over 15 years ago, the International Electrotechnical Commission (IEC) and the Instrument Society of America (ISA) embarked on a joint standardization effort identified by two codes: 61158 on the IEC side and SP50 on the ISA side. The main aim of the standards committee was to define a single communication system merging the main features of the fieldbuses available on the market: FIP (Factory Instrumentation Protocol) [8] and Profibus [9]. The Fieldbus Foundation, established in 1994 as a result of a merger between ISP (Interoperable System Project) and WorldFIP North America, defined a small set of basic principles. (Note that Fieldbus Foundation is the name of the association, while FOUNDATION Fieldbus is the name of the relevant communication system [1].) Those basic principles included two main cornerstones:


1. The adoption of both main medium access control (MAC) mechanisms that the IEC/ISA fieldbus committee was trying to derive from the existing proposals and combine into a truly complete solution
2. Emphasis on a standard description of the devices to be connected to the fieldbus

Cornerstone 1 freed the Fieldbus Foundation from the persistent dispute over scheduled access vs. circulated token. The IEC 61158 type 1 data link layer (DLL) stated: “Both paradigms, circulated token and scheduled access were good, but insufficient at the same time; they were complementary, not alternative, and a complete fieldbus solution needs the two together” [3]. Since its establishment, FOUNDATION Fieldbus has fully adopted an approach that provides both the predefined scheduling philosophy of FIP and the token rotation philosophy of Profibus. Section 9.2 provides more details about these two fundamental mechanisms.

Cornerstone 2 allowed the Fieldbus Foundation to avoid a situation (which affected most of the previous fieldbus proposals) in which, after the communication stack had been defined, much more still needed to be done to make devices operational once connected to a fieldbus. The previous fieldbus proposals started their development by focusing on the communication aspects (physical media, access mechanisms, addressing, connections, quality of service, etc.). That was mostly because, when switching from a dedicated low-frequency 4- to 20-mA signal to a multidata high-frequency serial link, the most evident change concerned the communication mechanism itself: Was that cable still good? What is the best encoding method? How can we guarantee noise recovery? Will too many data on the same medium affect their timeliness? And so on. But once the communication aspects were defined and proven, the data that such communication could transfer between two or more devices were not sufficient to make those devices interoperate. FOUNDATION Fieldbus had this concept clearly in sight from the beginning and included the definition of data semantics, plus their configuration and use, within the first set of specifications.

The only aspect that the Fieldbus Foundation intentionally left out at first was the higher-speed version of the fieldbus, H2. The market addressed by the Fieldbus Foundation was the replacement of existing 4- to 20-mA devices with devices compliant with FOUNDATION Fieldbus, at the lowest possible cost. This market strategy, chosen to ease the adoption of the new fieldbus technology, could be realized only if the already installed twisted-pair cables (connecting the old 4- to 20-mA devices), which support only the slow-speed version, H1, were retained. This explains why so much effort was put into the development of the H1 technology as opposed to the higher-speed H2 version of FOUNDATION Fieldbus. For H2, the Fieldbus Foundation initially planned to adopt the IEC/ISA high-speed standard [2], but ultimately decided to use High-Speed Ethernet (HSE) instead, mainly because of the wide availability of components and the existence of Ethernet networks in the plants (at least at the backbone level).

9.2 Technical Description of FOUNDATION Fieldbus

FOUNDATION Fieldbus is an all-digital, serial, two-way communication system. FOUNDATION Fieldbus specifications include two different configurations, H1 and HSE [1][10]. H1 (running at 31.25 kbit/s) interconnects field equipment such as sensors, actuators, and inputs/outputs (I/Os). HSE (running at 100 Mbit/s) provides integration of controllers (such as distributed control systems [DCSs] and programmable logic controllers [PLCs]), H1 subsystems (via a linking device), data servers, and workstations. HSE is based on standard Ethernet technology. In detail:

• The H1 FOUNDATION Fieldbus communication system is mainly devoted to distributed continuous process control and to the replacement of existing 4- to 20-mA devices. Its communication functionalities, specifically intended for time-critical applications, are supported by services grouped within levels, as in all other OSI-RM open-system architectures. The number of levels is minimal, in order to guarantee maximum speed in data handling. Below the fieldbus application layer, H1 FOUNDATION Fieldbus directly presents the data link layer, which manages access to the communication channel. A physical layer deals with the problem of interfacing with the physical medium. A network and system management layer is also present. Figure 9.1 compares the H1 FOUNDATION Fieldbus architecture against the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) reference model.
• The HSE FOUNDATION Fieldbus defines an application layer and associated management functions, designed to operate over a standard Transmission Control Protocol (TCP)/User Datagram Protocol (UDP)/Internet Protocol (IP) stack, over twisted-pair or fiber-optic switched Ethernet. It is mainly intended for discrete manufacturing applications, but it can also be used to interconnect H1 segments, as well as foreign protocols through TCP/IP gateways, in order to build complete plant networks. Figure 9.2 compares the HSE FOUNDATION Fieldbus architecture against the ISO/OSI reference model.

FIGURE 9.1 H1 FOUNDATION Fieldbus vs. ISO/OSI architecture.

FIGURE 9.2 HSE FOUNDATION Fieldbus vs. ISO/OSI architecture.

Figure 9.1 and Figure 9.2 show that the Fieldbus Foundation has specified a user application layer, which significantly differentiates the solution from the ISO/OSI model, where no such layer is defined. The H1 and HSE FOUNDATION Fieldbus user application layer is mainly based on function blocks, which provide a consistent definition of inputs and outputs and so allow seamless distribution and integration of functionality from various vendors [11].
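To put the two data rates quoted above in perspective, the following minimal Python sketch computes the raw serialization time of a payload at each rate. The function name and the example payload size are illustrative assumptions. The figures ignore preambles, delimiters, protocol headers, and medium access delays, so they are lower bounds rather than real transfer times.

```python
H1_BIT_RATE = 31_250          # bit/s, H1 rate quoted in the text
HSE_BIT_RATE = 100_000_000    # bit/s, HSE rate quoted in the text


def serialization_time(payload_bytes: int, bit_rate: int) -> float:
    """Seconds needed to clock payload_bytes onto the medium at bit_rate.

    Only the raw bit time is counted; framing and protocol overhead
    are deliberately left out of this sketch.
    """
    if payload_bytes < 0 or bit_rate <= 0:
        raise ValueError("payload must be >= 0 and bit rate must be > 0")
    return payload_bytes * 8 / bit_rate


if __name__ == "__main__":
    payload = 125  # bytes, an arbitrary example size
    h1 = serialization_time(payload, H1_BIT_RATE)
    hse = serialization_time(payload, HSE_BIT_RATE)
    print(f"H1 : {h1 * 1e3:.3f} ms")
    print(f"HSE: {hse * 1e6:.3f} us")
    print(f"ratio H1/HSE: {h1 / hse:.0f}")
```

The ratio of the two results is simply the ratio of the bit rates (3200), which is one way to see why H1 is reserved for field-level data while HSE carries the integration traffic.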

9.2.1 H1 and HSE FOUNDATION Fieldbus User Application Layer

As mentioned above, the Fieldbus Foundation has defined a standard user application layer based on blocks. Blocks are representations of different types of application functions. The types of blocks used in a user application are resource, transducer, and function. Devices are configured by using resource blocks and transducer blocks. The control strategy, by contrast, is built from function blocks [11].


TABLE 9.1 Basic Function Blocks

Function Block Name                 Symbol Name
Analog Input                        AI
Analog Output                       AO
Bias/Gain                           BG
Control Selector                    CS
Discrete Input                      DI
Discrete Output                     DO
Manual Loader                       ML
Proportional/Derivative             PD
Proportional/Integral/Derivative    PID
Ratio                               RA

FIGURE 9.3 Example of a complete control loop using function blocks in FOUNDATION Fieldbus devices. (Blocks shown on the H1 fieldbus: AI 110, PID 110, AO 110.)

9.2.1.1 Resource Block

The resource block describes characteristics of the fieldbus device such as the device’s name, manufacturer, and serial number. There is only one resource block in a device.

9.2.1.2 Function Block

Function blocks (FBs) provide the control system behavior. The input and output parameters of function blocks running in different devices can be linked over the fieldbus. The execution of each function block is precisely scheduled. There can be many function blocks in a single user application. The Fieldbus Foundation has defined sets of standard function blocks. Ten standard function blocks for basic control are defined in [12]. These blocks are summarized in Table 9.1. Other, more complex standard function blocks are defined in References [13] and [14]. The flexible function block (FFB), defined in [15], is a user-defined block. It allows a manufacturer or user to define block parameters and algorithms to suit an application that interoperates with standard function blocks and host systems.

Function blocks can be built into fieldbus devices as required in order to achieve the desired device functionality. For example, a simple temperature transmitter may contain an analog input (AI) function block. A control valve might contain a proportional/integral/derivative (PID) function block as well as the expected analog output (AO) block. Thus, a complete control loop can be built using only a simple transmitter and a control valve (Figure 9.3).

9.2.1.3 Transducer Blocks

Like the resource blocks, the transducer blocks are used to configure devices. Transducer blocks decouple function blocks from the local input/output functionalities required to read sensors or to command actuator outputs. They contain information such as calibration date and sensor type [16][17].
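The control loop of Figure 9.3 (an AI block in a transmitter linked to PID and AO blocks in a valve) can be sketched in a few lines of Python. This is only an illustration of the idea of linked, scheduled blocks: the class names, the parameter names (`in_`, `out`), the textbook PID formula, and the `run_loop_once` helper are all assumptions made for this sketch and are not taken from the Fieldbus Foundation specifications.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AnalogInput:
    """AI block: obtains the process value through a transducer-like callable."""
    read_sensor: Callable[[], float]
    out: float = 0.0

    def execute(self) -> None:
        self.out = self.read_sensor()


@dataclass
class PID:
    """PID block using a plain textbook algorithm (illustrative only)."""
    kp: float
    ki: float
    kd: float
    setpoint: float
    dt: float                      # execution period, in seconds
    in_: float = 0.0
    out: float = 0.0
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: float = field(default=0.0, init=False, repr=False)

    def execute(self) -> None:
        error = self.setpoint - self.in_
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        self.out = (self.kp * error
                    + self.ki * self._integral
                    + self.kd * derivative)


@dataclass
class AnalogOutput:
    """AO block: hands the received value to an actuator-like callable."""
    write_actuator: Callable[[float], None]
    in_: float = 0.0

    def execute(self) -> None:
        self.write_actuator(self.in_)


def run_loop_once(ai: AnalogInput, pid: PID, ao: AnalogOutput) -> None:
    """Execute the blocks in a fixed order, copying values along the links."""
    ai.execute()
    pid.in_ = ai.out      # link crossing the fieldbus (transmitter -> valve)
    pid.execute()
    ao.in_ = pid.out      # link local to the valve
    ao.execute()


if __name__ == "__main__":
    ai = AnalogInput(read_sensor=lambda: 40.0)
    pid = PID(kp=2.0, ki=0.0, kd=0.0, setpoint=50.0, dt=1.0)
    ao = AnalogOutput(write_actuator=lambda v: print(f"valve command: {v}"))
    run_loop_once(ai, pid, ao)
```

In this sketch the fixed call order in `run_loop_once` stands in for the precise block execution schedule mentioned in the text, and the two assignments stand in for the links between block parameters.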


9.2.2 H1 FOUNDATION Fieldbus

The H1 FOUNDATION Fieldbus is made up of several layers (as shown in Figure 9.1), whose functionalities are described below.

9.2.2.1 H1 FOUNDATION Fieldbus Physical Layer

The H1 FOUNDATION Fieldbus physical layer receives messages from the communication stack and converts them into physical signals on the fieldbus transmission medium, and vice versa [18]. Conversion tasks include adding and removing preambles, start delimiters, and end delimiters. The preamble is used by the receiver to synchronize its internal clock with the incoming fieldbus signal. The receiver uses the start delimiter to find the beginning of a fieldbus message. After finding the start delimiter, the receiver accepts data until the end delimiter is received.

The physical layer is defined by approved standards issued by the IEC and ISA. In particular, the FOUNDATION Fieldbus H1 physical layer is the 31.25 kbit/s version of the IEC (type 1)/ISA fieldbus [2][19]. Signals (±10 mA on a 50-ohm load) are encoded using the synchronous Manchester biphase-L technique and can be conveyed on low-cost twisted-pair cables. The signal is called synchronous serial because the clock information is embedded in the serial data stream: data are combined with the clock signal when the fieldbus signal is created. The receiver of the fieldbus signal interprets a positive transition in the middle of a bit time as a logical 0 and a negative transition as a logical 1. Special codes are defined for the preamble, start delimiter, and end delimiter. Special N+ and N– characters are used in the start delimiter and end delimiter. Note that the N+ and N– signals do not have a transition in the middle of a bit time.

9.2.2.1.1 Fieldbus Signaling

The transmitting device delivers ±10 mA at 31.25 kbit/s into a 50-ohm equivalent load to create a 1.0-volt peak-to-peak voltage modulated on top of the direct current (DC) supply voltage.
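The Manchester convention and the signal level described above can be illustrated with a short Python sketch. It assumes a simplified model in which each bit time is represented by two half-bit levels (`HIGH`, `LOW`); the function names are hypothetical, and the actual preamble and delimiter bit patterns are deliberately not reproduced here. The last function checks the quoted amplitude: a ±10 mA swing is 20 mA peak to peak, which across 50 ohms gives 1.0 V.

```python
HIGH, LOW = +1, -1   # half-bit signal levels relative to the DC supply level

# (first half, second half) of a bit time -> decoded symbol
_SYMBOLS = {
    (LOW, HIGH): 0,      # positive mid-bit transition -> logical 0
    (HIGH, LOW): 1,      # negative mid-bit transition -> logical 1
    (HIGH, HIGH): "N+",  # no mid-bit transition (non-data symbol)
    (LOW, LOW): "N-",    # no mid-bit transition (non-data symbol)
}


def manchester_encode(bits):
    """Encode data bits as a flat list of half-bit levels."""
    levels = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError(f"not a data bit: {bit!r}")
        levels.extend((HIGH, LOW) if bit == 1 else (LOW, HIGH))
    return levels


def manchester_decode(levels):
    """Decode half-bit levels into data bits and N+/N- symbols."""
    if len(levels) % 2:
        raise ValueError("expected an even number of half-bit levels")
    pairs = zip(levels[0::2], levels[1::2])
    return [_SYMBOLS[pair] for pair in pairs]


def peak_to_peak_voltage(current_amplitude_a=0.010, load_ohm=50.0):
    """Peak-to-peak voltage produced by a +/- current swing into a load."""
    return 2 * current_amplitude_a * load_ohm


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0]
    line = manchester_encode(data)
    print(line)
    print(manchester_decode(line))
    print(manchester_decode([HIGH, HIGH, LOW, LOW]))
    print(peak_to_peak_voltage(), "V peak to peak")
```

Because every data bit contains a mid-bit transition, a receiver can recover the clock from the data stream itself, while the transition-free N+ and N– symbols can never be mistaken for data.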
The DC supply voltage can range from 9 to 32 volts. The 31.25 kbit/s fieldbus also supports intrinsically safe (I.S.) fieldbuses for bus-powered devices. To accomplish this, an I.S. barrier is placed between the power supply in the safe area and the I.S. device in the hazardous area. In I.S. applications, the allowed power supply voltage depends on the barrier rating.

9.2.2.1.2 Fieldbus Wiring

H1 FOUNDATION Fieldbus wiring is based on trunk cables with terminators installed at each end. The H1 FOUNDATION Fieldbus allows stubs or spurs located anywhere along the trunk and connected to the trunk by a junction box, as shown in Figure 9.4. A single device can be connected to each spur. The use of spurs allows 31.25 kbit/s devices to operate on wiring previously used for 4- to 20-mA devices [20][21]. Several trunks can be connected in a fieldbus link by means of repeaters. Up to five trunks (by means of four repeaters) can be interconnected. Spur length varies from 1 up to 120 m, depending on the number of devices connected to the fieldbus link. The maximum number of devices on a fieldbus link is 32; the actual number depends on factors such as the power consumption of each device, the type of cable, the use of repeaters, etc. In particular, the maximum number of devices is usually 6 for intrinsically safe applications powered through the bus, 12 for nonintrinsically safe applications powered through the bus, and 32 for nonintrinsically safe applications not powered through the bus [22]. The maximum total trunk length (including spurs) is 1900 m, and the number of network addresses available for each link is 240.

FIGURE 9.4 Trunk, junction box, and spurs in H1 FOUNDATION Fieldbus.

9.2.2.2 H1 FOUNDATION Fieldbus Data Link Layer

The H1 FOUNDATION Fieldbus data link layer (DLL) controls transmission of messages onto the fieldbus. As mentioned already, FOUNDATION Fieldbus fully adopted the IEC (type 1)/ISA DLL statement: “Both paradigms, circulated token and scheduled access were good, but insufficient at the same time; they were complementary, not alternative, and a complete fieldbus solution needs the two together” [3][23][24]. Thus, a fieldbus needs a good mix of circulated token and scheduled access, balanced so that no bandwidth is lost to scheduled access when it is not really needed, but always giving priority to scheduled access over circulated token when a conflict arises. That is what the IEC (type 1)/ISA DLL documents propose and what FOUNDATION Fieldbus made real: an overall schedule able to guarantee the needed data at the needed time, but also allowing gaps within which a circulated token mechanism can operate while complying with a defined maximum rotation time. Such a philosophy clearly needs an arbitrator that unambiguously imposes the transmission of defined data at a defined time by a defined entity, when so required, but also guarantees a defined minimum amount of free time to each entity. This arbitrator is called the link active scheduler (LAS) within the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL [3][23][24]. Essentially, the LAS provides:

• Access to the physical medium on a scheduled basis.
• Circulation of the token only when no scheduled traffic is needed. The token is passed for a limited amount of time that is always shorter than the interval left before the next scheduled traffic.
• A policy for the management of the token, according to which the token is returned to the LAS instead of being passed on to a new node, so that the LAS, depending on the time left, can decide whether to pass the token once more or to resume link control to manage the scheduled traffic.

Specific needs of the token mechanism include:

• Giving enough contiguous token time to each node
• Guaranteeing as regular a token rotation time as possible among all the nodes

These needs are met by the token method of the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL.
Other needs include:

• Keeping the token cycle short enough
• Serving high-priority events as they occur

These are instead met by the scheduled access method of the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL.

9.2.2.2.1 Device Types

Two types of devices are defined in the DLL specification: basic device and link master. Link master devices are capable of becoming the link active scheduler (LAS). Basic devices do not have the capability to become the LAS.

9.2.2.2.2 Scheduled Communication

The LAS exercises centralized control through the following mechanism. The LAS has a list of transmission times for all data buffers in all devices that need to be cyclically transmitted. When it is time for a device to send the contents of a buffer, the LAS issues a compel data (CD) message to the device. Upon receipt of the CD, the device broadcasts, or publishes, the data item (DT) in the buffer to all devices on the fieldbus. Any device configured to receive the data is called a subscriber. Figure 9.5 shows this access mechanism. Scheduled data transfers are typically used for the regular, cyclic transfer of control loop data between devices on the fieldbus.

FIGURE 9.5 Centralized access mechanism. (Labels: LAS, CD, DT.)

9.2.2.2.3 Unscheduled Communication

The autonomy of each node is provided by a bandwidth distribution mechanism based on a circulating token. In unused portions of the bandwidth (i.e., those not occupied by the transmission of CDs), the LAS sends a pass token (PT) to each node included in a particular list called the live list (described below). Each token is associated with a maximum utilization interval, during which the receiving node can use the available bandwidth to transmit what it needs. On the expiration of the time interval or when the node completes its transmissions, the token is returned to the LAS by using another frame called return token (RT). A target token rotation time (TTRT) defines the desired duration of each token rotation. The value assigned to this parameter is linked to the maximum admissible delay in the transmission of the asynchronous flow. Figure 9.6 shows the token circulation managed by the LAS.

FIGURE 9.6 Token passing mechanism. (Labels: LAS, PT, RT.)

9.2.2.2.4 Live List Maintenance

The list of all devices that are properly responding to the pass token (PT) is called the live list. New devices may be added to the fieldbus at any time. The LAS periodically sends probe node (PN) messages to all the addresses not yet present in the live list. If a new device appears at an address and receives the PN, it immediately returns a probe response (PR) message. When a device returns a PR, the LAS adds the device to the live list and confirms its addition by sending the device a node activation message. The LAS is required to probe at least one address after it has completed a cycle of sending PTs to all the devices in the live list. The device will remain in the live list as long as it responds properly to the PTs sent from the LAS. The LAS will remove a device from the live list if the device does not reply to the relevant PT for three successive tries. Whenever a device is added to or removed from the live list, the LAS broadcasts the change to all devices; this allows each link master device to maintain a current copy of the live list in order to be ready to become the LAS if needed.

9.2.2.2.5 Data Link Time Synchronization

A DLL time synchronization mechanism is provided so that any node can ask the LAS for a scheduled action to be executed at a defined time that represents the same absolute instant for all the nodes.


FIGURE 9.7 Link active scheduler algorithm. (Flowchart: Is there time to do something before the next scheduled CD? If yes, issue a PN, TD, or PT. If no, send idle messages while waiting until it is time to issue the CD, then issue the CD. Legend: CD = compel data, PN = probe node, TD = time distribution, PT = pass token.)
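The decision loop of Figure 9.7 can be sketched as a small Python simulation. This is a deliberately simplified model and not the algorithm of the specification: time is counted in arbitrary integer units, every unscheduled action (PT or PN) is assumed to occupy a fixed `slot`, TD messages and RT frames are omitted, and names such as `ScheduledCD` and `run_las` are hypothetical. One PN is issued after each full round of PTs, following the live list rule described in Section 9.2.2.2.4.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class ScheduledCD:
    time: int        # instant at which the compel data (CD) is due
    publisher: str   # device whose buffer must be published


def unscheduled_actions(live_list: List[str]) -> Iterator[str]:
    """Endless sequence: one PT per live node, then one PN, repeated."""
    while True:
        for node in live_list:
            yield f"PT -> {node}"
        yield "PN"


def run_las(schedule: List[ScheduledCD], live_list: List[str],
            slot: int, cd_duration: int, end_time: int) -> List[Tuple[int, str]]:
    """Return the (time, action) pairs taken by the LAS before end_time."""
    if slot <= 0 or cd_duration <= 0:
        raise ValueError("slot and cd_duration must be positive")
    log: List[Tuple[int, str]] = []
    pending = sorted(schedule, key=lambda cd: cd.time)
    actions = unscheduled_actions(live_list)
    now = 0
    while now < end_time:
        next_cd_time = pending[0].time if pending else end_time
        if now + slot <= next_cd_time:
            # Enough time before the next CD: issue an unscheduled action.
            log.append((now, next(actions)))
            now += slot
            continue
        if now < next_cd_time:
            # Not enough time: idle until the CD is due.
            log.append((now, "idle"))
            now = next_cd_time
        if pending and now < end_time:
            cd = pending.pop(0)
            log.append((now, f"CD -> {cd.publisher}"))
            now += cd_duration
    return log


if __name__ == "__main__":
    schedule = [ScheduledCD(10, "AI-110"), ScheduledCD(30, "AI-110")]
    for entry in run_las(schedule, ["A", "B"], slot=4, cd_duration=2, end_time=40):
        print(entry)
```

The point of the sketch is the priority rule: an unscheduled action is started only if it fits entirely before the next scheduled CD, so the CDs in the example are issued exactly at their due times.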

The LAS periodically broadcasts a time distribution (TD) message on the fieldbus so that all devices have exactly the same data link time. This is important because scheduled communications on the fieldbus and scheduled function block executions in the user application layer are based on timing derived from these messages.

9.2.2.2.6 Link Active Scheduler Operation

The algorithm used by the LAS is shown in Figure 9.7.

9.2.2.2.7 LAS Redundancy

The IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL allows more than one potential LAS on each link and provides the backup procedures that are essential for fieldbus availability. In particular, because a fieldbus may have multiple link masters, if the current LAS fails, one of the other link masters will become the LAS and the operation of the fieldbus will continue.

9.2.2.3 H1 FOUNDATION Fieldbus Application Layer

The H1 FOUNDATION Fieldbus application layer includes two sublayers: FAS and FMS [25][26].

9.2.2.3.1 Fieldbus Access Sublayer

The fieldbus access sublayer (FAS) uses both the scheduled and unscheduled features of the data link layer to provide services for the fieldbus message specification (FMS) [25]. The type of each FAS service is described by virtual communication relationships (VCRs). A VCR defines the kind of information (messages) exchanged between two applications. The features of a VCR include the number of receivers (one or many) for each transmitter, the memory organization (queue or buffer) used to store the messages to be sent/received, and the DLL mechanism used to send the message (PT or CD). The types of VCR defined by the Fieldbus Foundation are:

• Client–server VCR type. The client–server VCR type is used for queued, unscheduled, user-initiated, one-to-one communication between devices on the fieldbus. Queued means that messages are sent and received in the order submitted for transmission, according to their priority, without overwriting previous messages. When a device receives a pass token (PT) from the LAS, it may send a request message to another device on the fieldbus. The requester is called the client and the device that receives the request is called the server. The server sends the response when it receives a PT from the LAS. The client–server VCR type is used for operator-initiated requests such as setpoint changes, access to and change of a tuning parameter, alarm acknowledgment, and device upload and download.


• Report distribution VCR type. The report distribution VCR type is used for queued, unscheduled, user-initiated, one-to-many communications. When a device, holding an event or a trend report to send, receives a PT from the LAS, it sends its message to a group address defined by its VCR. Devices that are configured to listen for that VCR will receive the report. The report distribution VCR type is normally used by fieldbus devices to send alarm notifications to the operator consoles.
• Publisher–subscriber VCR type. The publisher–subscriber VCR type is used for buffered, one-to-many communications. Buffered means that only the latest version of the data is maintained within the network. New data completely overwrite previous data. When a device receives compel data (CD), the device will publish (broadcast) its message to all devices on the fieldbus. Devices that wish to receive the published message are called subscribers. CDs may be scheduled in the LAS, or they may be sent by subscribers on an unscheduled basis. An attribute of the VCR indicates which method is used. The publisher–subscriber VCR type is normally used by the field devices for cyclic, scheduled publishing of user application function block inputs and outputs.

9.2.2.3.2 Fieldbus Message Specification

Fieldbus message specification (FMS) services allow user applications to send messages to each other across the fieldbus by using a standard set of message formats. FMS describes the communication services, message formats, and protocol behavior needed to build messages for the user application [26]. Data that are communicated over the fieldbus are described by an object description. Object descriptions are collected together in a structure called an object dictionary (OD). Each object description is identified by its index in the OD. Index 0, called the object dictionary header, provides a description of the dictionary itself and defines the first index for the object descriptions of the user application.
The user application object descriptions can start at any index above 255. Index 255 and below define standard data types such as Boolean, integer, float, bit string, and data structures that are used to build all other object descriptions. A virtual field device (VFD) is used to remotely view local device data described in the object dictionary. A typical device will have at least two VFDs: the network and system management VFD and the user application VFD. The network and system management VFD provides access to the network management information base (NMIB) and to the system management information base (SMIB). NMIB data include VCRs, dynamic variables, statistics, and LAS schedules (if the device is a link master). SMIB data include device tag and address information and schedules for function block execution. The user application virtual field device is used to make the device functions (the function of a fieldbus device is defined by the selection and interconnection of blocks) visible to the fieldbus communication system. The header of the user application object dictionary points to a directory that is always the first entry in the function block application. The directory provides the starting indices of all of the other entries used in the function block application. The VFD object descriptions and their associated data are accessed remotely over the fieldbus network using virtual communication relationships. FMS communication services provide a standard way for user applications, such as function blocks, to communicate over the fieldbus. Specific FMS communication services are defined for each object type. Table 9.2 summarizes the communication services available. Detailed descriptions for each service are provided in [26]. All of the FMS services use the client–server VCR type except as noted (see notes a and b in Table 9.2). 
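The difference between the queued and buffered VCR types of Section 9.2.2.3.1 can be illustrated with a minimal Python sketch. It is not taken from the FOUNDATION Fieldbus specifications: the class and method names are hypothetical, message priorities are ignored, and a device is assumed to send everything it has queued when it receives a PT.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class VcrType(Enum):
    CLIENT_SERVER = "client-server"                # queued, one-to-one, sent on PT
    REPORT_DISTRIBUTION = "report distribution"    # queued, one-to-many, sent on PT
    PUBLISHER_SUBSCRIBER = "publisher-subscriber"  # buffered, one-to-many, sent on CD


@dataclass
class VcrEndpoint:
    """Hypothetical sender-side endpoint of one VCR (illustrative names only)."""
    vcr_type: VcrType
    queue: deque = field(default_factory=deque)
    buffer: object = None

    @property
    def buffered(self) -> bool:
        return self.vcr_type is VcrType.PUBLISHER_SUBSCRIBER

    def submit(self, message) -> None:
        if self.buffered:
            self.buffer = message       # new data overwrite previous data
        else:
            self.queue.append(message)  # kept in submission order

    def on_token(self, token: str) -> list:
        """Return the messages to transmit when the LAS sends 'PT' or 'CD'."""
        if token == "CD" and self.buffered:
            return [] if self.buffer is None else [self.buffer]
        if token == "PT" and not self.buffered:
            sent = list(self.queue)
            self.queue.clear()
            return sent
        return []


if __name__ == "__main__":
    alarms = VcrEndpoint(VcrType.REPORT_DISTRIBUTION)
    alarms.submit("alarm 1")
    alarms.submit("alarm 2")
    pv = VcrEndpoint(VcrType.PUBLISHER_SUBSCRIBER)
    pv.submit(20.1)
    pv.submit(20.4)
    print(alarms.on_token("PT"))  # ['alarm 1', 'alarm 2']
    print(pv.on_token("CD"))      # [20.4]
```

The queued endpoint delivers every submitted message in order, whereas the buffered endpoint delivers only the most recent value, which is why the latter suits cyclic publishing of function block inputs and outputs.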
TABLE 9.2 Set of Services in FMS

| Group | Service | Meaning |
|---|---|---|
| Management and environment services | Initiate | Establish a communication |
| | Abort | Abort a communication |
| | Reject | Reject a nonvalid service |
| | Status | Give the status of a service |
| | Unsolicited status | Send a nonrequested status |
| | Identify | Read a device specification (vendor, type, and version) |
| Object dictionary (OD) services | Get OD | Read an object dictionary (OD) |
| | Initiate put OD | Start loading an OD |
| | Put OD | Load an OD in a device |
| | Terminate put OD | Stop loading an OD |
| Variable access services | Read | Read a variable |
| | Write | Update the value of a variable |
| | Information report (a) | Send data |
| | Define variable list | Define a variable list |
| | Delete variable list | Delete a variable list |
| Event services | Event notification (b) | Notify an event |
| | Acknowledge event notification | Acknowledge an event |
| | Alter event condition monitoring | Enable or disable an event |
| Downloading/uploading services | Request domain upload | Request a domain upload |
| | Initiate upload sequence | Initiate upload |
| | Upload segment | Upload data |
| | Terminate upload sequence | End upload |
| | Request domain download | Request a domain download |
| | Initiate download sequence | Initiate download |
| | Download segment | Download data |
| | Terminate download sequence | End download |
| | Generic initiate download sequence | Open download |
| | Generic download segment | Send data to device |
| | Generic terminate download sequence | Stop download |
| Program handling services | Create program invocation | Create a program object |
| | Delete program invocation | Delete a program object |
| | Start | Start a program object |
| | Stop | Stop a program object |
| | Resume | Resume execution of a program |
| | Reset | Reset a program |
| | Kill | Kill a program |

(a) Service can only use the publisher–subscriber or report distribution VCR type.
(b) Service can only use the report distribution VCR type.

9.2.2.4 H1 FOUNDATION Fieldbus System Management

Inside the H1 FOUNDATION Fieldbus specification, system management handles important system features [27] such as:

• Function block scheduling. Function blocks must often be executed at precisely defined intervals and in the proper sequence for correct control system operation. System management synchronizes execution of the function blocks to a common time clock shared by all devices. A macrocycle is a single iteration of a schedule within a device. Depending on the type of device, there is a LAS macrocycle and a device macrocycle. Using the LAS macrocycle, system management can synchronize execution of the function blocks across the entire fieldbus link. Using the device macrocycle, it can synchronize execution of function blocks inside each device.
• Application clock distribution. This function allows publication of the time of day to all devices, including automatic switchover to a redundant time publisher. The application clock is usually set equal to the local time of day or to Coordinated Universal Time (UTC). System management has a time publisher that periodically sends an application clock synchronization message to all fieldbus devices. The data link scheduling time is sampled and sent with the application clock message so that the receiving devices can adjust their local application times. During the intervals between synchronization messages, application clock time is maintained independently within each device, relying on its own internal clock.
• Device address assignment. Fieldbus devices do not use jumpers or switches to configure addresses. Instead, device addresses are set by configuration tools using system management services. Every fieldbus device must have a unique network address and physical device tag for the fieldbus to operate properly. To avoid the need for address switches on the devices, assignment


of network addresses can be performed by configuration tools using system management services. The sequence for assigning a network address to a new device is as follows:
  • An unconfigured device will join the network at one of four special temporary default addresses.
  • A configuration tool will assign a physical device tag to the new device using system management services.
  • A configuration tool will choose an unused permanent address and assign it to the device using system management services.
  • The sequence is repeated for all devices that enter the network at a default address.
  • Devices store the physical device tag and node address in nonvolatile memory, so they will also retain these settings after a power failure.
• Find tag service. For the convenience of host systems and portable maintenance devices, system management supports a service for finding devices or variables by a tag search. The “find tag query” message is broadcast to all fieldbus devices. Upon receipt of the message, each device searches its virtual field devices for the requested tag and returns complete path information (if the tag is found), including the network address, VFD number, VCR index, and OD index. Once the path is known, the host or maintenance device can access the data by its tag.

All of the configuration information needed by system management, such as the function block schedule, is described by object descriptions in the network and system management VFD. This VFD provides access to the system management information base and also to the network management information base.

9.2.2.5 H1 FOUNDATION Fieldbus Network Management

H1 FOUNDATION Fieldbus network management mainly provides for the configuration of the communication stack [28].
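The address assignment sequence described under system management (Section 9.2.2.4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the numeric values chosen for the four temporary default addresses and for the permanent address range are assumptions made for the example, and `Device` and `commission` are hypothetical names rather than system management services.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for illustration; the text states only that there are
# four special temporary default addresses.
DEFAULT_ADDRESSES = (248, 249, 250, 251)
PERMANENT_RANGE = range(16, 248)


@dataclass
class Device:
    device_id: str
    address: int
    tag: Optional[str] = None  # physical device tag, kept in nonvolatile memory


def commission(devices: list, tags: dict) -> None:
    """Give every device found at a default address a tag and a permanent address."""
    used = {d.address for d in devices if d.address not in DEFAULT_ADDRESSES}
    for dev in devices:
        if dev.address not in DEFAULT_ADDRESSES:
            continue                       # already configured, leave untouched
        dev.tag = tags[dev.device_id]      # assign the physical device tag
        free = next(a for a in PERMANENT_RANGE if a not in used)
        dev.address = free                 # assign an unused permanent address
        used.add(free)


if __name__ == "__main__":
    link = [Device("A", 16, "FT-101"), Device("B", 248), Device("C", 249)]
    commission(link, {"B": "TT-102", "C": "PT-103"})
    for d in link:
        print(d.device_id, d.address, d.tag)
```

The loop mirrors the statement that the sequence is repeated for all devices that enter the network at a default address, and the `used` set enforces the requirement that every device have a unique network address.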

9.2.3 HSE FOUNDATION Fieldbus

The HSE FOUNDATION Fieldbus foresees the architecture depicted in Figure 9.2. As shown, its main feature is the use of Internet architecture (full TCP/UDP/IP and IEEE 802.3u stack [29][30][31]) for high-speed discrete control and, more generally, for interconnecting several H1 segments in order to achieve a plantwide fieldbus network [32]. Before describing the HSE FOUNDATION Fieldbus specifications, a brief overview of its general features will be given, with particular emphasis on the capability to interconnect different H1 FOUNDATION Fieldbus segments.

There are four basic HSE device categories (but several of them are typically combined into a single real device): linking device, Ethernet device, host device, and gateway device. A linking device (LD) connects H1 networks to the HSE network. An Ethernet device (ED) may execute function blocks and may have some conventional I/Os. A gateway device (GD) interfaces other network protocols such as Modbus [33], DeviceNet [34], or Profibus [9]. A host device (HD) is a non-HSE device capable of communicating with HSE devices. Examples include configurators, operator workstations, and OPC servers.

The network in Figure 9.8 shows a host system operating on an HSE bus segment labeled Segment A. Communications to H1 segments (B and C, as shown in the figure) are achieved by means of an Ethernet switch. The same switch is used to connect a second HSE segment (D) and a segment running a foreign protocol (E). Any of the devices connected to the switch may attempt communication to any other device, and it is the function of the switch to provide the correct routing and to negotiate transmission without collisions. The connection between HSE and H1 segments is made by a linking device (LD). A typical LD will serve multiple H1 segments, though for simplicity, only one segment per LD is shown in Figure 9.8.
The connection between HSE and a foreign protocol is made through a gateway device (GD). The capabilities of the interconnections shown in Figure 9.8 are as follows:


FIGURE 9.8 H1 and HSE FOUNDATION Fieldbus interconnection. (Diagram labels: host system, switch, LD, GD, HSE Segments A and D, H1 Segments B and C, foreign protocol Segment E.)

• HSE host/H1 segment. The HSE host interacts with a standard H1 device through an LD. In this situation, the HSE host is able to configure, diagnose, and publish and subscribe data to and from the H1 device.
• HSE host/HSE segment. The HSE host interacts with an HSE device and is able to configure, diagnose, and publish and subscribe data to and from the HSE device.
• H1/H1 segment. In this situation, the interaction is between two H1 devices on two distinct H1 bus segments. The segments are connected to the Ethernet by LDs. Communications between devices on two H1 segments are functionally equivalent to communications between two H1 devices on the same bus segment. However, real-time communication between devices belonging to different H1 segments cannot be guaranteed, because no single scheduler coordinates communication among the different H1 segments.
• HSE host/foreign protocol. This connection defines the relationship between a foreign device and the FOUNDATION Fieldbus application environment. The connection is made by a GD. The foreign device is seen as a publisher to an HSE resident subscriber; the HSE host can handle the data stream from the I/O gateway in the same manner as it treats the data streams from devices on H1/HSE segments.

9.2.3.1 HSE FOUNDATION Fieldbus Physical, Data Link, Network, and Transport Layers

As explained before, a higher-speed physical layer specification was always intended for selected process applications and for factory (discrete parts) automation. The original high-speed solution, called H2, was based on the H1 protocol and function block application running on different media at either 1 or 2.5 Mbit/s. In March 1998, the Foundation board of directors reconsidered the high-speed solution options and terminated further work on H2. The new approach was based on Ethernet and was intended to make use, as much as possible, of commercial off-the-shelf (COTS) components and software.
The new solution, high-speed Ethernet, is designed to integrate multiple protocols, including multiple H1 FOUNDATION Fieldbus segments as well as foreign protocols, as described above. For its high-speed physical layer version, the Fieldbus Foundation has selected high-speed Ethernet at 100 Mbit/s. The specifications for the physical layer, as well as for the Ethernet data link layer, are maintained by the Institute of Electrical and Electronics Engineers (IEEE) [30][31].


HSE also makes use of well-established Internet protocols that are maintained by the Internet Architecture Board. These include TCP (Transmission Control Protocol), UDP (User Datagram Protocol), and IP (Internet Protocol) [29]. Standard HSE stack components are the Dynamic Host Configuration Protocol (DHCP, which assigns addresses), Simple Network Time Protocol (SNTP), and Simple Network Management Protocol (SNMP), which rely on TCP and UDP over IP and the IEEE 802.3 MAC and physical layers. This has resulted in a practically unlimited number of nodes (IP addressing) over star topology networks made of as many links as required, the length of which can be up to 100 m for twisted pair and 2000 m on fiber.

Messages sent on the Ethernet are bounded by a series of data fields called frames. The combination of a message and frame is called an Ethernet packet. Typically, a packet encoded according to TCP/IP will be inserted in the message field of the Ethernet packet. FOUNDATION Fieldbus uses a similar data structure where messages are bounded by addressing and other data items. What corresponds to a packet in Ethernet is called a protocol data unit (PDU) in FOUNDATION Fieldbus.

Let us consider a communication between two H1 devices over an interposed HSE segment, as illustrated in Figure 9.8. The easiest method for an LD might be, upon receiving a communication from an H1 device, to simply insert the entire H1 PDU into the message part of the TCP/IP packet. Then the LD on the destination H1 segment, upon receiving the Ethernet packet, would merely strip away the Ethernet frame and send the H1 PDU on to the receiving H1 bus segment. This technique is called tunneling and is commonly used in mixed-protocol networks. The solution developed by HSE FOUNDATION Fieldbus is somewhat more complex, but more efficient than tunneling. The HSE FOUNDATION Fieldbus PDU is inserted into the data field of a TCP/IP message field.
However, the fieldbus address is encoded as a unique TCP/IP address, so the fieldbus PDU address is used to fill the address field of the TCP/IP packet. The entire TCP/IP packet is then inserted into the message field of the Ethernet packet. Because of the HSE encoding scheme, networks having multiple LDs can locate and transfer messages to the correct destination much more quickly, with far less extraneous bus traffic, as opposed to tunneling. Perhaps even more important, every H1 device (and every HSE device for that matter) has a unique TCP/IP address and can be directly accessed over standard IT and Internet networks.

9.2.3.2 HSE FOUNDATION Fieldbus Application Layer

Existing Fieldbus specifications, which have been widely tested in H1 applications and had already been maintained by the Fieldbus Foundation, have been used in the HSE standard too, where applicable. These include fieldbus message specification (FMS) and system management (SM). New specifications were developed and tested to provide complete high-speed communications and control solutions. The new technology is based on the field device access agent (FDA agent) [35]. The FDA agent allows SM and FMS services used by the H1 devices to be conveyed over the Ethernet using standard UDP and TCP. This allows HSE devices to communicate with H1 devices that are connected via a linking device. The FDA agent is also used by the local function blocks in an HSE device. Thus, the FDA agent enables remote applications to access HSE devices and H1 devices through a common interface.

9.2.3.3 HSE FOUNDATION Fieldbus System Management

The following aspects of management are supported in the HSE system management layer [36]:

• Each device has a unique and permanent identity and a system-specific configured name.
• Devices maintain version control information.
• Devices respond to requests aiming to locate objects, including the device itself.
• Time is distributed to all devices on the network.


• Function block schedules are used to execute function blocks.
• Devices are added and removed from the network without affecting other devices on the network.

9.2.3.4 HSE FOUNDATION Fieldbus Network Management

HSE network management allows HSE host systems to perform management operations over the HSE network [37]. The following capabilities are provided by the network management:

• Configuring the H1 bridge, which performs data forwarding and republishing between H1 interfaces.
• Loading the HSE session list or single entries in this list. An HSE session endpoint represents a logical communication channel between two or more HSE devices.
• Loading the HSE VCR list or single entries in this list. An HSE VCR is a communication relationship used for accessing VFDs across HSE.
• Performance monitoring for session endpoints, HSE VCRs, and the H1 bridge.
• Fault detection monitoring.

9.2.3.5 HSE FOUNDATION Fieldbus Redundancy

The HSE FOUNDATION Fieldbus specification provides for management of redundant network interfaces. This capability protects against single and multiple faults in the network. Each device monitors the network and selects the best route to the destination for each message it has to send. HSE provides for various levels of redundancy up to and including complete device and media redundancy. HSE fault tolerance is achieved by operational transparency; i.e., the redundancy operations are not visible to the HSE applications. This is necessary because HSE applications are required to coexist with standard information technology applications.

The HSE local area network (LAN) redundancy entity (LRE) coordinates the redundancy function. Each HSE device periodically transmits on both of its Ethernet interfaces a diagnostic message (representing its view of the network) to the other HSE devices. Each device uses the diagnostic messages to maintain a network status table (NST), which is used for fault detection and transmission port selection. There is no central redundancy manager. Instead, each device determines its behavior in response to the faults it detects [38].
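The decentralized port selection described above can be illustrated with a minimal Python sketch. It does not reproduce the HSE redundancy specification: the table layout, the staleness timeout, the port names "A" and "B", and the rule of preferring the first healthy port are all assumptions made for the example, and a real diagnostic message carries the sender's own view of the network rather than serving only as a heartbeat.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkStatusTable:
    """Hypothetical NST: when each peer was last heard on each local interface."""
    timeout_s: float = 1.0                         # assumed staleness threshold
    last_seen: dict = field(default_factory=dict)  # (peer, port) -> timestamp

    def on_diagnostic(self, peer: str, port: str, now: float) -> None:
        """Record a diagnostic message received from `peer` on local `port`."""
        self.last_seen[(peer, port)] = now

    def reachable(self, peer: str, port: str, now: float) -> bool:
        seen = self.last_seen.get((peer, port))
        return seen is not None and now - seen <= self.timeout_s

    def select_port(self, peer: str, now: float, ports=("A", "B")):
        """Return the first interface on which the peer was heard recently."""
        for port in ports:
            if self.reachable(peer, port, now):
                return port
        return None  # a fault is detected on every path to this peer


if __name__ == "__main__":
    nst = NetworkStatusTable(timeout_s=1.0)
    nst.on_diagnostic("dev2", "A", now=0.0)
    nst.on_diagnostic("dev2", "B", now=0.0)
    print(nst.select_port("dev2", now=0.5))  # A
    nst.on_diagnostic("dev2", "B", now=1.2)  # interface A has gone silent
    print(nst.select_port("dev2", now=1.5))  # B
```

Because every device keeps its own table and decides locally, no central redundancy manager is needed, which matches the behavior described in the text.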

9.2.4 Open Systems Implementation

One of the main features of FOUNDATION Fieldbus (in both H1 and HSE configurations) is the ability to build open communication systems. This is a key issue in a communication system, because a perfect communication stack between two devices is useless if those two devices are not able to understand the meaning of each other’s data or behavior. Implementation of open systems is achieved through the use of function blocks and the adoption of a standard way to represent them (the device description language (DDL)).

FOUNDATION Fieldbus has defined a set of standard function blocks that can be combined and parameterized to build up a device [11]. Due to the standard format, behavior, and connection of such function blocks, their access and use through the bus is then immediate, allowing the achievement of interoperability and interchangeability. Further, a manufacturer can improve and innovate an existing function block, creating a new standard function block.

The solution adopted inside FOUNDATION Fieldbus to realize this has been the DDL, able to provide a formal device description (DD) that can then be interpreted by the DD services library available through FOUNDATION Fieldbus [39][40][41]. Such a DD acts as a driver for each specific device and is supplied together with the device itself. Within each DD, and for each function block included in the device, a hierarchy of definitions is followed: (1) the universal parameters of the device itself, (2) the common parameters of each function block, (3) the common parameters of the transducer blocks, and (4) the parameters specific to the manufacturer. A DD may also include small programs able to interoperate with the device (e.g., for its calibration), as well as download capability for managing manufacturers’ upgrades.


9.3 Conclusions

The Fieldbus Foundation so far appears to be the only multisupplier organization able to achieve concrete results in proposing a large-scale fieldbus solution merging the initial FIP/Profibus proposals. That is mainly due to its features providing true device interoperability and a combination of guaranteed scheduling and token rotation.

References

[1] www.fieldbus.org.
[2] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Part 2: Physical Layer Specification, 2001.
[3] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Parts 3 and 4: Data Link Layer Service and Protocol Definition, 2001.
[4] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Parts 5 and 6: Application Layer Service and Protocol Definition, 2001.
[5] IEC 61784, Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems, 2001.
[6] J.D. Decotignie, P. Pleinevaux, Time critical communication networks: field buses, IEEE Network, 2.
[7] J.D. Decotignie, P. Pleinevaux, A survey on industrial communication networks, Annales des Telecommunications, 48, 9–10.
[8] www.worldfip.org.
[9] www.profibus.org.
[10] CENELEC EN50170/A1, General Purpose Field Communication System, Addendum A1, Foundation Fieldbus, 2000.
[11] Fieldbus Foundation FF-890, Function Block Application Process: Part 1.
[12] Fieldbus Foundation FF-891, Function Block Application Process: Part 2.
[13] Fieldbus Foundation FF-892, Function Block Application Process: Part 3.
[14] Fieldbus Foundation FF-893, Function Block Application Process: Part 4.
[15] Fieldbus Foundation FF-894, Function Block Application Process: Part 5.
[16] Fieldbus Foundation FF-902, Transducer Block Application Process: Part 1.
[17] Fieldbus Foundation FF-903, Transducer Block Application Process: Part 2.
[18] Fieldbus Foundation FF-816, 31.25 kbit/s Physical Layer Profile Specification.
[19] ISA S50.02, Physical Layer Standard, 1992.
[20] Fieldbus Foundation AG-140, 31.25 kbit/s Wiring and Installation Guide.
[21] Fieldbus Foundation AG-165, Fieldbus Installation and Planning Guide.
[22] Fieldbus Foundation AG-163, 31.25 kbit/s Intrinsically Safe Systems Application Guide.
[23] Fieldbus Foundation FF-821, Data Link Services Subset.
[24] Fieldbus Foundation FF-822, Data Link Protocol Subset.
[25] Fieldbus Foundation FF-875, Fieldbus Access Sublayer.
[26] Fieldbus Foundation FF-870, Fieldbus Message Specification.
[27] Fieldbus Foundation FF-800, System Management Specification.
[28] Fieldbus Foundation FF-801, Network Management.
[29] Douglas E. Comer, Internetworking with TCP/IP, Vol. I, Principles, Protocols, and Architecture, Prentice Hall International, Englewood Cliffs, NJ, 1999.
[30] ANSI/IEEE 802.3, IEEE Standards for Local Area Networks: CSMA/CD Access Method and Physical Layer Specifications, 1985.
[31] ANSI/IEEE 802.3u, IEEE Standards for Local Area Networks: Supplement to CSMA/CD Access Method and Physical Layer Specifications: MAC Parameters, Physical Layer, MAUs, and Repeater for 100 Mb/s Operation, Type 100BASE-T, 1995.
[32] Fieldbus Foundation FF-581, System Architecture.
[33] www.modbus.org.
[34] CENELEC EN 50325-2, DeviceNet.
[35] Fieldbus Foundation FF-588, HSE Field Device Access Agent.
[36] Fieldbus Foundation FF-589, HSE System Management.
[37] Fieldbus Foundation FF-803, HSE Network Management.
[38] Fieldbus Foundation FF-593, HSE Redundancy.
[39] Fieldbus Foundation FD-900, Device Description Language Specification.
[40] Fieldbus Foundation FD-110, DDS User’s Guide.
[41] Fieldbus Foundation FD-100, DDL Tokenizer User’s Manual.


10 PROFIBUS: Open Solutions for the World of Automation

Ulrich Jecht, UJ Process Analytics
Wolfgang Stripf, Siemens AG
Peter Wenzel, PROFIBUS International

10.1 Basics
10.2 Transmission Technologies
10.3 Communication Protocol
     PROFIBUS DP • System Configuration and Device Types • Cyclic and Acyclic Data Communication Protocols
10.4 Application Profiles
     General Application Profiles • Specific Application Profiles • Master and System Profiles
10.5 Integration Technologies
10.6 Quality Assurance
10.7 Implementation
10.8 Prospects
     PROFINET CBA • PROFINET IO • The PROFINET Migration Model
Abbreviations
References

10.1 Basics

Fieldbuses are industrial communication systems with bit-serial transmission that use a range of media such as copper cable, fiber optics, or radio transmission to connect distributed field devices (sensors, actuators, drives, transducers, analyzers, etc.) to a central control or management system. Fieldbus technology was developed in the 1980s with the aim of saving cabling costs by replacing the commonly used central parallel wiring and the then-dominant analog signal transmission (4- to 20-mA or ±10-V interface) with digital technology. Owing to the different industry-specific demands placed on sponsored research and development projects, or to the preferred proprietary solutions of large system manufacturers, several bus systems with varying principles and properties became established in the market. The key technologies are now included in the recently adopted standards IEC 61158 and 61784 [1]. PROFIBUS is an integral part of these standards.

Fieldbuses create the basic prerequisite for distributed automation systems. Over the years they have evolved into instruments for automated processes that offer higher productivity and flexibility than conventional technology. PROFIBUS is an open, digital communication system with a wide range of applications, particularly in the fields of factory and process automation, transportation, and power distribution. PROFIBUS is suitable for both fast, time-critical applications and complex communication tasks (Figure 10.1).


FIGURE 10.1 PROFIBUS suitable for all decentralized applications. (Diagram labels: upstream, inbound logistics; mainstream, production; downstream, outbound logistics; automation technology.)

The application and engineering aspects are specified in the generally available guidelines of PROFIBUS International [2]. This fulfills user demand for standardization, manufacturer independence, and openness and ensures communication between devices of various manufacturers. Based on a very efficient and extensible communication protocol, combined with the development of numerous application profiles (communication models for device type families) and a fast-growing number of devices and systems, PROFIBUS began its record of success, initially in factory automation and, since 1995, in process automation. Today, PROFIBUS is the world market leader for fieldbuses with more than a 20% share of the market, approximately 500,000 equipped plants, and more than 12 million nodes. More than 2000 PROFIBUS products are available from a wide range of manufacturers.

The success of PROFIBUS stems in equal measure from its progressive technology and the strength of its noncommercial PROFIBUS User Organization e.V. (PNO), the trade body of manufacturers and users founded in 1989. Together with the 25 other regional PROFIBUS associations in countries all around the world and the international umbrella organization PROFIBUS International (PI), founded in 1995, this organization now totals more than 1200 members worldwide. Its objectives are the continued development of PROFIBUS technology and increasing worldwide acceptance.

PROFIBUS has a modular structure (PROFIBUS toolbox) and offers a range of transmission and communication technologies, numerous application and system profiles, and device management and integration tools [8]. Thus, PROFIBUS covers the various application-specific demands from factory to process automation, from simple to complex applications, by selecting the appropriate set of components out of the toolbox (Figure 10.2).

10.2 Transmission Technologies

PROFIBUS features four different transmission technologies, all of which are based on international standards. All of them are assigned to PROFIBUS in both IEC 61158 and IEC 61784: RS485, RS485-IS, MBP-IS (IS stands for intrinsic safety protection), and fiber optics.

RS485 transmission technology is simple and cost-effective and primarily used for tasks that require high transmission rates. Shielded, twisted-pair copper cable with one conductor pair is used. No expert knowledge is required for installation of the cable. The bus structure allows addition or removal of stations or the step-by-step commissioning of the system without interfering with other stations. Subsequent expansions (within defined limits) have no effect on stations already in operation.


FIGURE 10.2 Structure of PROFIBUS system technology. [Layered diagram. Transmission technologies: RS485 (NRZ), RS485-IS (intrinsic safety); fiber optics (glass multimode, glass single mode, PCF/plastic fiber); MBP (Manchester bus powered), MBP-LP (low power), MBP-IS (intrinsic safety). Communication: IEC 61158/61784 protocol, PROFIBUS DP, DP-V0...V2. Common application profiles (optional): I&M functions, PROFIsafe, time stamp, redundancy, etc. Application profiles I and II: Encoder, Ident, PROFIdrive, SEMI, RIO for PA, PA Devices, Weighing & Dosage. System profiles 1…x: master conformance classes, interfaces (Comm-FB, FDT, etc.), constraints. Integration technologies: descriptions (GSD, EDD), tools (DTM, configurators).]

Transmission rates from 9.6 Kbit/s up to 12 Mbit/s can be selected. One uniform speed is selected for all devices on the bus when the system is commissioned. Up to 32 stations (masters or slaves) can be connected in a single segment; repeaters are used to connect more than 32 stations. The maximum permissible line length depends on the transmission rate. Different cable types (type designations A to D) for different applications are available on the market for connecting devices either to each other or to network elements (segment couplers, links, and repeaters). When using RS485 transmission technology, PI recommends cable type A.

RS485-IS transmission technology responds to an increasing market demand to support the use of RS485, with its fast transmission rates, within intrinsically safe areas. A PROFIBUS guideline is available for the configuration of intrinsically safe RS485 solutions with simple device interchangeability. The interface specification details the levels for current and voltage that all stations must adhere to in order to ensure safe operation during interconnection. An electric circuit limits currents at a specified voltage level. When active sources are connected, the sum of the currents of all stations must not exceed the maximum permissible current. In contrast to the FISCO model (see below), all stations represent active sources. Up to 32 stations may be connected to the intrinsically safe bus circuit.

MBP-type transmission technology (Manchester coding and bus powered) is a new term that replaces the previously common terms for intrinsically safe transmission, such as physics in accordance with IEC 61158-2, 1158-2, etc. The current version of IEC 61158-2 (physical layer) now describes several different transmission technologies, MBP technology being just one of them, so differentiation in naming was necessary. MBP is a synchronous, Manchester-coded transmission with a defined transmission rate of 31.25 Kbit/s. In the MBP-IS version, it is frequently used in process automation, as it satisfies the key demands of the chemical and petrochemical industries for intrinsic safety and bus powering using two-wire technology.

MBP transmission technology is usually limited to a specific segment (field devices in hazardous areas) of a plant, which is then linked to an RS485 segment via a segment coupler or link (Figure 10.3). Segment couplers are signal converters that adapt the RS485 signals to the MBP signal level and vice versa. They are transparent from the bus protocol’s point of view. Links, in contrast, provide more computing power. They map all field devices connected to the MBP segment into the RS485 segment as a single slave. PROFIBUS with MBP transmission supports tree and line topologies (and any combination of the two), with up to 32 stations per segment and a maximum of 126 per network.
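The station limits quoted above (32 per segment, 126 per network) lend themselves to a quick planning check. The following is a minimal sketch only; the function name is hypothetical, and it makes the simplifying assumption that repeaters and couplers do not themselves count as stations, which real planning has to account for.

```python
import math

MAX_STATIONS_PER_SEGMENT = 32    # limit stated in the text
MAX_STATIONS_PER_NETWORK = 126   # limit stated in the text


def segments_needed(stations: int) -> int:
    """Return the minimum number of segments for the given station count.

    Simplifying assumption: repeaters and couplers are not counted as
    stations here, although real planning must account for them.
    """
    if stations < 1:
        raise ValueError("at least one station is required")
    if stations > MAX_STATIONS_PER_NETWORK:
        raise ValueError("more stations than one network allows")
    return math.ceil(stations / MAX_STATIONS_PER_SEGMENT)
```

For example, 33 stations already require a second segment, and a fully populated network of 126 stations requires at least four.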


FIGURE 10.3 Intrinsic safety and powering of field devices using MBP-IS. [Diagram: a control system (PLC) and an engineering or HMI tool on a PROFIBUS DP/RS485 segment (≤ 12 Mbit/s) with an actuator; a segment coupler/link connects this segment to a PROFIBUS DP/MBP-IS segment (31.25 Kbit/s) with Ex-marked transducers.]

Fiber-optic transmission technology is used for fieldbus applications that are exposed to very high electromagnetic interference or that are spread over a large area or distance. The PROFIBUS guideline for fiber-optic transmission [3] specifies the technology available for this purpose, including multimode and single-mode glass fiber, plastic fiber, and hard-clad silica (HCS) fiber. While developing these specifications, great care was taken to allow problem-free integration of existing PROFIBUS devices in a fiber-optic network without the need to change the protocol behavior of PROFIBUS. This ensures backward compatibility with existing PROFIBUS installations.

The internationally recognized FISCO model considerably simplifies the planning, installation, and expansion of PROFIBUS networks in potentially explosive areas. FISCO stands for fieldbus intrinsically safe concept. It was developed by the German PTB [4]. The model is based on the specification that a network is intrinsically safe and requires no individual intrinsic safety calculations when the four relevant bus components (field devices, cables, segment couplers, and bus terminators) fall within predefined limits with regard to voltage, current, power, inductance, and capacitance. The corresponding proof can be provided by certification of the components through authorized accreditation agencies, such as PTB (Germany), UL and FM (U.S.), and others. If FISCO-approved devices are used, not only is it possible to operate more devices on a single line, but the devices can be replaced during runtime by devices of other manufacturers, or the line can be expanded, all without the need for time-consuming calculations or system certification. So you can simply plug and play, even in hazardous areas.

10.3 Communication Protocol

10.3.1 PROFIBUS DP

At the protocol level, PROFIBUS with decentralized peripherals (DP) and its versions DP-V0 to DP-V2 offer a broad spectrum of optional services, which enable optimum communication between different applications. DP has been designed for fast data exchange at the field level. Data exchange with the distributed devices is primarily cyclic. The communication functions required for this are specified through the DP basic functions (version DP-V0). Geared toward the special demands of the various areas of application, these basic DP functions have been expanded step by step with special functions, so that DP is now available


in three versions (DP-V0, DP-V1, and DP-V2), each with its own key features. All versions of DP are specified in detail in IEC 61158 and 61784, respectively.

Version DP-V0 provides the basic functionality of DP, including cyclic data exchange as well as station diagnosis, module diagnosis, and channel-specific diagnosis.

Version DP-V1 contains enhancements geared toward process automation, in particular acyclic data communication for parameter assignment, operation, visualization, and alarm handling of intelligent field devices, in coexistence with cyclic user data communication. This permits online access to stations using engineering tools. In addition, DP-V1 defines alarms. Examples of alarm types are the status alarm, the update alarm, and manufacturer-specific alarms.

Version DP-V2 contains further enhancements and is geared primarily toward the demands of drive technology. Due to additional functionalities, such as isochronous slave mode and slave-to-slave(s) communication (data exchange broadcast (DXB)), DP-V2 can also be implemented as a drive bus for controlling fast movement sequences in drive axes.

10.3.2 System Configuration and Device Types

DP supports implementation of both monomaster and multimaster systems. This affords a high degree of flexibility during system configuration. A maximum of 126 devices (masters or slaves) can be connected to a bus network.

In monomaster systems, only one master is active on the bus during operation of the bus system. Figure 10.4 shows the system configuration of a monomaster system. In this case, the master is hosted by a programmable logic controller (PLC). The PLC is the central control component. The slaves are connected to the PLC via the transmission medium. This system configuration enables the shortest bus cycle times.

In multimaster systems, several masters share the same bus. They represent both independent subsystems, comprising masters and their assigned slaves, and additional configuration and diagnostic master devices. The masters coordinate themselves by passing a token from one to the next. Only the master that holds the token can communicate.

PROFIBUS DP differentiates three groups of device types on the bus. DP master class 1 (DPM1) is a central controller that cyclically exchanges information with the distributed stations (slaves) at a specified message cycle. Typical DPM1 devices are PLCs or PCs. A DPM1 has active bus access with which it can read measurement data (inputs) of the field devices and write the set-point values (outputs) of the actuators at fixed times. This continuously repeating cycle is the basis of the automation function (Figure 10.4).

FIGURE 10.4 PROFIBUS DP monomaster system (DP-V0). [Diagram: a PLC with master class 1 and its slaves on one bus; sequences 1 and 2 are marked within the bus cycle.]
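The access rules described above (slaves answer only when queried, and in a multimaster system only the token holder may talk) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, process values are plain integers, and token hold times, retries, and telegram formats are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    address: int
    inputs: int = 0    # measured value offered to the master
    outputs: int = 0   # set point last written by the master

    def exchange(self, new_outputs: int) -> int:
        """Passive station: acts only when queried, then answers."""
        self.outputs = new_outputs
        return self.inputs


@dataclass
class Master:
    address: int
    slaves: list
    log: list = field(default_factory=list)

    def run_cycle(self, set_point: int) -> None:
        """Poll every assigned slave once while holding the token."""
        for slave in self.slaves:
            value = slave.exchange(set_point)
            self.log.append((self.address, slave.address, value))


def token_rotation(masters, rounds, set_point=0):
    """Pass the token from master to master; only the holder communicates."""
    order = []
    for _ in range(rounds):
        for master in masters:  # the token moves on to the next master
            master.run_cycle(set_point)
            order.append(master.address)
    return order
```

With a single master in the list, the same loop degenerates to the monomaster bus cycle of Figure 10.4.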


FIGURE 10.5 State machine for slaves. [State diagram. States: Power_On, Wait on Parameterization, Wait on Configuration, Data Exchange. Transition labels: parameterization, configuration (ok / not ok), slave fault or timeout. Optional services while waiting on parameterization: set slave address, get slave diagnosis. Optional services while waiting on configuration: get configuration, get slave diagnosis. In Data Exchange: diagnosis telegram instead of process data.]

DP master class 2 (DPM2) devices are engineering, configuration, or operating devices. They are put into operation during commissioning and for maintenance and diagnostics in order to configure connected devices, evaluate measured values and parameters, and request the device status. A DPM2 does not have to be permanently connected to the bus system. The DPM2 also has active bus access.

DP slaves are peripherals (input/output (IO) devices, drives, human machine interfaces (HMIs), valves, transducers, analyzers) that read in process information or use output information to intervene in the process. There are also devices that solely provide input information or output information. As far as communication is concerned, slaves are passive devices: they only respond to direct queries (see Figure 10.4, sequences ① and ②). This behavior is simple and cost-effective to implement. In the case of DP-V0, it is already completely included in the bus ASIC.

10.3.3 Cyclic and Acyclic Data Communication Protocols

Cyclic data communication between the DPM1 and its assigned slaves is automatically handled by the DPM1 in a defined, recurring sequence (Figure 10.4). The corresponding services are called MS0. The user defines the assignment of the slave(s) to the DPM1 when configuring the bus system. The user also defines which slaves are to be included in or excluded from the cyclic user data communication.

The DPM1 and the slaves pass through three phases during start-up: parameterization, configuration, and cyclic data exchange (Figure 10.5). Before entering the cyclic data exchange state, the master first sends information about the transmission rate, the data structures within a PDU, and other slave-relevant parameters. In a second step, it checks whether the user-defined configuration matches the actual device configuration. In any state, the master can request slave diagnosis in order to indicate faults to the user.

An example of the telegram structure for the transmission of information between master and slave is shown in Figure 10.6. The telegram starts with some synchronization bits, the type (SD) and length (LE) of the telegram, the source and destination addresses, and a function code (FC). The function code indicates the type of message or content of the load (processing data unit, PDU) and serves as a guard to control the state machine of the master. The PDU, which may carry up to 244 bytes, is followed by a safeguard mechanism, the frame-checking sequence (FCS), and a delimiter.

One example of the usage of the function code is the indication of a fault situation on the slave side. In this case, the master sends a special diagnosis request instead of the normal process data exchange, and the slave replies with a diagnosis message. It comprises 6 bytes of fixed information and user-definable device-, module-, or channel-related diagnosis information [1], [7].
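The start-up phases described above can be written down as a small transition table. The following is a minimal sketch, not the normative state machine: the state and event names are hypothetical, and the return paths (back to waiting for parameterization after a rejected configuration, a slave fault, or a timeout) are an assumed reading of Figure 10.5.

```python
from enum import Enum, auto


class State(Enum):
    POWER_ON = auto()
    WAIT_PRM = auto()        # wait on parameterization
    WAIT_CFG = auto()        # wait on configuration
    DATA_EXCHANGE = auto()


class Event(Enum):
    INIT_DONE = auto()
    PRM_OK = auto()
    PRM_NOT_OK = auto()
    CFG_OK = auto()
    CFG_NOT_OK = auto()
    FAULT_OR_TIMEOUT = auto()


# Assumed transitions; anything not listed leaves the state unchanged.
TRANSITIONS = {
    (State.POWER_ON, Event.INIT_DONE): State.WAIT_PRM,
    (State.WAIT_PRM, Event.PRM_OK): State.WAIT_CFG,
    (State.WAIT_PRM, Event.PRM_NOT_OK): State.WAIT_PRM,
    (State.WAIT_CFG, Event.CFG_OK): State.DATA_EXCHANGE,
    (State.WAIT_CFG, Event.CFG_NOT_OK): State.WAIT_PRM,
    (State.DATA_EXCHANGE, Event.FAULT_OR_TIMEOUT): State.WAIT_PRM,
}


def step(state: State, event: Event) -> State:
    """Return the next state for one event."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.POWER_ON) -> State:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

A slave that is parameterized and configured successfully ends in `DATA_EXCHANGE`; a later timeout sends it back to the start of the sequence, so the master has to parameterize it again.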


[Diagram: a stream of standard PROFIBUS telegrams (S). Each telegram follows a sync time of 33 TBit. Each byte is sent as one cell of 11 bits: SB, ZB0...ZB7, PB, EB.]

Field order: SD (68H), LE, LEr, SD (68H), DA, SA, FC, PDU (1...244 bytes), FCS, ED (16H)

TBit = Clock-Bit = 1/Baudrate
SD = Start Delimiter (here SD2, var. data length)
LE = Length of Process Data
LEr = Repetition of Length; no check in FCS
DA = Destination Address
SA = Source Address
FC = Function Code (Message type)
PDU = Processing Data Unit, 244 Bytes maximum
FCS = Frame Checking Sequence (across data within LE)
ED = End Delimiter
SB = Start-Bit
ZB0...7 = Character-Bit
PB = (even) Parity Bit
EB = Stop-Bit

FIGURE 10.6 PROFIBUS DP telegram structure (example).
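The field order of Figure 10.6 can be made concrete with a small encoder and decoder. This is a sketch under stated assumptions rather than a reference implementation: it assumes that LE counts the DA, SA, FC, and PDU bytes and that the FCS is the modulo-256 sum of those same bytes; the function names are hypothetical, and the per-byte framing (start, parity, and stop bits) and the sync time are left out.

```python
SD2 = 0x68      # start delimiter for variable data length (Figure 10.6)
ED = 0x16       # end delimiter (Figure 10.6)
MAX_PDU = 244   # maximum PDU size stated in the text


def build_sd2(da: int, sa: int, fc: int, pdu: bytes) -> bytes:
    """Assemble SD, LE, LEr, SD, DA, SA, FC, PDU, FCS, ED."""
    if not 1 <= len(pdu) <= MAX_PDU:
        raise ValueError("PDU must carry 1 to 244 bytes")
    body = bytes([da, sa, fc]) + pdu
    le = len(body)           # assumption: LE covers DA, SA, FC and PDU
    fcs = sum(body) % 256    # assumption: FCS is a modulo-256 byte sum
    return bytes([SD2, le, le, SD2]) + body + bytes([fcs, ED])


def parse_sd2(frame: bytes):
    """Check delimiters, length, and FCS; return (da, sa, fc, pdu)."""
    if len(frame) < 10 or frame[0] != SD2 or frame[3] != SD2 or frame[-1] != ED:
        raise ValueError("bad delimiters")
    le, ler = frame[1], frame[2]
    if le != ler or le != len(frame) - 6:
        raise ValueError("bad length fields")
    body = frame[4:4 + le]
    if sum(body) % 256 != frame[-2]:
        raise ValueError("bad frame checking sequence")
    return body[0], body[1], body[2], bytes(body[3:])
```

Repeating the length in LEr gives the receiver a cheap consistency check before it trusts LE to locate the FCS and the end delimiter.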

In addition to the single station-related user data communication, which is automatically handled by the DPM1, the master can also send control commands to all slaves or a group of slaves simultaneously. These control commands are transmitted as multicast messages and enable sync and freeze modes for event-controlled synchronization of the slaves [1], [7].

For safety reasons, it is necessary to ensure that DP has effective protective functions against incorrect parameterization or failure of transmission equipment. For this purpose, the DP master and the slaves are fitted with monitoring mechanisms in the form of time monitors. The monitoring interval is defined during configuration.

Acyclic data communication is the key feature of version DP-V1. It is the prerequisite for parameterization and calibration of the field devices over the bus during runtime and for the introduction of confirmed alarm messages. Transmission of acyclic data is executed parallel to cyclic data communication, but with lower priority. Figure 10.7 shows some sample communication sequences for a master class 2, which uses MS2 services. Using MS1 services, a master class 1 is also able to execute acyclic communications.

Slave-to-slave communications (DP-V2) enable direct and timesaving communication between slaves using broadcast communication without the detour over a master. In this case, the slaves act as publishers; i.e., the slave response does not go through the coordinating master, but directly to other slaves embedded in the sequence, the so-called subscribers (Figure 10.8). This enables slaves to directly read data from other slaves and use them as their own input. This opens up the possibility of completely new applications; it also reduces response times on the bus by up to 90%.

The isochronous mode (DP-V2) enables clock synchronous control in masters and slaves, irrespective of the busload. The function enables highly precise positioning processes with clock deviations of

Example: Event observation: “The position of control valve A changed by 5° at 10:42 A.M.”

Event observations require exactly-once semantics when transmitted to a consumer. At the sender, event information is consumed on sending, and at the receiver, event information must be queued and consumed on reading. Event information is transmitted in event messages.

Periodic state observations and sporadic event observations are two alternative approaches for the observation of a dynamic environment in order to reconstruct the states and events of the environment at the observer. Periodic state observations produce a sequence of equidistant “snapshots” of the environment that can be used by the observer to reconstruct those events that occur within a minimum temporal distance that is longer than the duration of the sampling period. Starting from an initial state, a complete sequence of (sporadic) event observations can be used by the observer to reconstruct the complete sequence of states of the RT entity that occurred in the environment. However, if no minimum duration between events is assumed, the observer and the communication system must be infinitely fast.
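The difference between the two kinds of observation shows up directly in the receiver's data structures. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes the usual convention that a state message overwrites the previous value and is not consumed on reading, in contrast to the queued, consume-on-read event messages described above.

```python
from collections import deque


class StatePort:
    """State message: a newer value overwrites the older one;
    reading does not consume it (assumed convention)."""

    def __init__(self):
        self._value = None

    def write(self, value):
        self._value = value

    def read(self):
        return self._value


class EventPort:
    """Event message: queued at the receiver, consumed exactly once."""

    def __init__(self):
        self._queue = deque()

    def write(self, event):
        self._queue.append(event)

    def read(self):
        return self._queue.popleft() if self._queue else None


def reconstruct_states(initial, changes):
    """Rebuild the state sequence from an initial state and a complete
    sequence of event observations (here: signed changes of a value)."""
    states, current = [], initial
    for change in changes:
        current += change
        states.append(current)
    return states
```

Losing or duplicating a single entry in `changes` corrupts every later reconstructed state, which is why event observations need exactly-once delivery while a lost state message is simply repaired by the next snapshot.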

12.2.3 Temporal Firewalls

An extensible architecture must be based on a small number of orthogonal concepts that are reused in many different situations in order to reduce the mental load required for understanding large systems. In a large distributed system the characteristics of the interfaces between subsystems determine to a large extent the comprehensibility of the architecture. In TTA, the communication network interface (CNI; Figure 12.2) between a host computer and the communication network is the most important interface. The CNI appears in every node of the architecture and separates the local processing within a node from the global interactions among the nodes. The CNI consists of two unidirectional data flow interfaces, one from the host computer to the communication system and the other in the opposite direction.


FIGURE 12.2 Node of TTA. [Block diagram: input-output subsystem; CNI; host processor with memory, operating system, and application software; CNI; TT communication controller; to/from replicated communication channels.]

We call a unidirectional data flow interface elementary if there is only a unidirectional control flow [7] across this interface. An interface that supports periodic state messages with error detection at the receiver is an example of such an elementary interface. We call a unidirectional data flow interface composite if even a unidirectional data flow requires a bidirectional control flow. An event message interface with error detection is an example of a composite interface. Composite interfaces are inherently more complex than elementary interfaces, since the correct operation of the sender depends on the control signals from all receivers. This can be a problem in multicast communication where many control messages are generated for every unidirectional data transfer, and each one of the receivers can affect the operation of the sender. The basic CNI of TTA as depicted in Figure 12.3 is an elementary interface. The time-triggered transport protocol carries autonomously — driven by its time-triggered schedule — state messages from the sender’s CNI to the receiver’s CNI. The sender can deposit the information into its local CNI memory according to the information push paradigm, while the receiver will pull the information out of its local CNI memory. From the point of view of temporal predictability, information push into a local memory at the sender and information pull from a local memory at the receiver are optimal, since no unpredictable task delays that extend the worst-case execution occur during reception of messages. A receiver that is working on a time-critical task is never interrupted by a control signal from the communication system. Since no control signals cross the CNI in TTA (the communication system derives control signals for the fetch-and-delivery instants from the progress of global time and its local schedule exclusively), propagation of control errors is prohibited by design. 
We call an interface that prevents propagation of control errors by design a temporal firewall [4]. The integrity of the data in the temporal firewall is assured by the nonblocking write (NBW) concurrency control protocol [5, p. 217].
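The idea behind the nonblocking write protocol mentioned above can be sketched with a version counter. This is an illustration under stated assumptions, not the published protocol text: it assumes a single writer that increments a concurrency control field before and after each update, and readers that retry until they see the same even value before and after copying the data. The class name and the retry limit are hypothetical, and the sketch is single-threaded; a real implementation needs atomic accesses and memory ordering guarantees.

```python
class NBWBuffer:
    """Single-writer buffer in the style of a nonblocking write scheme."""

    def __init__(self, value=None):
        self.ccf = 0        # concurrency control field; odd = write in progress
        self.value = value

    def begin_write(self):
        self.ccf += 1       # mark the data as being modified

    def end_write(self, value):
        self.value = value
        self.ccf += 1       # mark the data as stable again

    def write(self, value):
        """The writer is never delayed by readers."""
        self.begin_write()
        self.end_write(value)

    def read(self, max_retries=10):
        """Retry until a copy was taken while no write was in progress."""
        for _ in range(max_retries):
            before = self.ccf
            snapshot = self.value
            after = self.ccf
            if before == after and before % 2 == 0:
                return snapshot
        raise RuntimeError("no consistent copy obtained")
```

The design choice matches the temporal firewall: the writer (the communication controller or the host) never waits for the other side, and the cost of a conflict is borne by the reader as a bounded retry.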

12.2.4 Communication Interface

From the point of view of complexity management and composability, it is useful to distinguish between three different types of interfaces of a node: the real-time service (RS) interface, the diagnostic and management (DM) interface, and the configuration and planning (CP) interface [8]. These interface types serve different functions and have different characteristics. For temporal composability, the most important interface is the RS interface.

12.2.4.1 The Real-Time Service Interface

The RS interface provides the timely real-time services to the node environment during the operation of the system. In real-time systems it is a time-critical interface that must meet the temporal specification of the application in all specified load and fault scenarios. The composability of an architecture depends


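FIGURE 12.3 Data flow and control flow at a TTA interface. [Diagram: the sender pushes data into its CNI memory; the cluster communication system, driven by global time, transports the data to the receiver’s CNI memory, from which the receiver pulls it; control flow and data flow are drawn as separate arrows.]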

on the proper support of the specified RS interface properties (in the value and temporal domains) during operation. From the user’s point of view, the internals of the node are not visible at the CNI, since they are hidden behind the RS interface.

12.2.4.2 The Diagnostic and Management Interface

The DM interface opens a communication channel to the internals of a node. It is used for setting node parameters and for retrieving information about the internals of the node, e.g., for the purpose of internal fault diagnosis. The maintenance engineer who accesses the internals of a node via the DM interface must have detailed knowledge about the internal objects and behavior of the node. The DM interface does not affect temporal composability. Usually, the DM interface is not time critical.

12.2.4.3 The Configuration and Planning Interface

The CP interface is used to connect a node to other nodes of a system. It is used during the integration phase to generate the “glue” between the nearly autonomous nodes. The use of the CP interface does not require detailed knowledge about the internal operation of a node. The CP interface is not time critical.

The CNI of TTA can be directly used as the real-time service interface. On input, the precise interface specifications (in the temporal and value domains) are the preconditions for the correct operation of the host software. On output, the precise interface specifications are the postconditions that must be satisfied by the host, provided the preconditions have been satisfied by the host environment. Since the bandwidth is allocated statically to the host, no starvation of any host can occur due to high-priority message transmission from other hosts.

TTA implements an event-triggered communication service on top of the basic TT service to realize the DM and CP interfaces. Since the event-triggered communication is based on (but not executed in parallel to) the time-triggered communication, it is possible to maintain and to use all predictability properties of the basic TT communication service in event-triggered communication.

12.3 The Time-Triggered Architecture

The range of TTA’s services is understood best if put into a broader context: the integrated project Dependable Computer Systems (DECOS) aims to develop technologies to move from federated distributed architectures to integrated distributed architectures [1]. While federated basically means that each application subsystem is placed on an independent node, integrated architectures try to unite several application subsystems on a single node. A schematic overview of the DECOS approach for integrated distributed architectures is depicted in Figure 12.4.

An application is divided into different distributed application subsystems (DASs); such subsystems could be, for example, the power train, a braking system, a steering system, etc. In a federated architecture each DAS would be implemented on a single node; an integrated architecture provides services that allow more DASs to be implemented on a single node. These services form the platform interface layer (PIL). Examples of PIL services are:


FIGURE 12.4 Structure of DECOS integrated distributed architecture. [Layered diagram, top to bottom: DAS A, DAS B, DAS C, DAS D; platform interface layer (PIL); basic services; different implementation platforms and choices.]

• Encapsulation services
• Event-triggered communication
• Virtual networks
• Hidden gateways
• Provision of legacy interfaces
• Application diagnosis support

12.3.1 Basic Services

The PIL services rely on a set of validated basic services. TTA is a target architecture that provides the basic services.

12.3.1.1 Predictable Time-Triggered Transmission

The very basic principle of time-triggered communication is that transmission of messages is triggered by the clock rather than the availability of new information; the so-called time-division multiple-access (TDMA) strategy is used. In an architecture using TDMA, time is split up into (nonoverlapping) pieces of not necessarily equal durations, which are called slots. These slots are grouped into sequences called TDMA rounds, in which every node occupies exactly one slot. The knowledge of which node occupies which slot in a TDMA round is static, available to all components a priori, and equal for all TDMA rounds.

When the time of a node’s slot is reached, the node is provided exclusive access to the communications medium for the duration of the slot, $t^{i}_{slot}$, where $0 \le i < n$ (assuming there are $n$ nodes in the system). The sending slot, $t^{i}_{slot}$, of a respective node $i$ is split up into three phases: presend, transmit, postreceive, where in the first phase preparations for the transmission are done, and the actual sending process is done in the second phase. During the postreceive phase the state of the nodes is updated according to the received messages. Durations between two consecutive transmit phases of succeeding nodes are called interframe gaps. The interframe gaps have to be chosen with respect to the postreceive phase and the different propagation delays of the messages on the channels.

After the end of one TDMA round, the next TDMA round starts; that is, after the sending of the node in the last slot of a TDMA round, the node that is allowed to send in the first slot sends again. Consequently, each node sends predictably every $t_{round}$ time units, where $t_{round} = \sum_{i=0}^{n-1} t^{i}_{slot}$.
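The static slot assignment described above makes it possible to compute, for any instant, which node owns the medium. The following is a minimal sketch: the function names are hypothetical, slot durations are given as a list indexed by node, time starts at zero at the beginning of a round, and the internal slot phases and interframe gaps are not modeled.

```python
from bisect import bisect_right
from itertools import accumulate


def round_duration(slots):
    """Length of one TDMA round: the sum of all slot durations."""
    return sum(slots)


def sender_at(t, slots):
    """Index of the node whose slot contains time t (t >= 0)."""
    ends = list(accumulate(slots))   # end time of each slot within a round
    t_in_round = t % ends[-1]
    return bisect_right(ends, t_in_round)


def send_instants(node, slots, rounds):
    """Start times of the given node's slot in the first `rounds` rounds."""
    start = sum(slots[:node])
    period = round_duration(slots)
    return [start + k * period for k in range(rounds)]
```

With slot durations `[2, 3, 5]`, for instance, the round lasts 10 time units and node 1 starts sending at times 2, 12, 22, and so on, independent of whether it has new information to send.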
12.3.1.2 Fault-Tolerant Clock Synchronization

It is widely understood that a common agreement on physical time throughout the complete system is necessary for distributed control applications. Since safety-critical systems shall not rely on a single point of failure, each fault-tolerant solution for clock synchronization requires a distributed solution. Typically,


we can distinguish three phases in a fault-tolerant clock synchronization algorithm [5]: In the first phase, each node that participates in clock synchronization acquires information on the local views on the global time in all other nodes. The required message exchange can be implemented either by the exchange of dedicated synchronization messages or by a priori knowledge of the transmission pattern of regular message flow (implicit synchronization). In the second phase each node executes a convergence function based on the received deviation values from the different nodes. In the third phase a node adjusts its local timer that represents the local view on the global time by the output of the convergence function. The adjustment procedure can be implemented either as state correction, where the local timer is corrected at an instant, or as rate correction, where the local timer is corrected over an interval by accelerating or decelerating the speed of the local clock. More sophisticated clock synchronization algorithms take the stability of the drift of the node’s local clock into account and correct the rate of the clock in advance. Case studies show that a combination of a regular clock synchronization algorithm with a rate correction algorithm yields an impressive quality of the precision in the system.

A crucial phase of clock synchronization is the initial synchronization after power-on, when the nodes within a system are unsynchronized (since the power-on times of different nodes may vary, and thus the local clocks start to run at different points in time). Start-up algorithms have to be used to achieve a sufficient degree of initial synchronization. One possible solution for a start-up algorithm is a variation of a clock synchronization algorithm: after power-on, the local clocks of different nodes may be far apart, but successive rounds of message exchange and convergence should achieve a sufficient precision.
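The second and third phases of the synchronization algorithm can be illustrated with a short sketch. The text does not name a particular convergence function; the example assumes the fault-tolerant average, a well-known choice that discards the k largest and k smallest measured deviations before averaging. All function names are hypothetical, and deviations are taken to mean "remote clock minus local clock".

```python
def fault_tolerant_average(deviations, k):
    """Convergence function (assumed): drop the k largest and k smallest
    deviations, then average the remaining ones."""
    if len(deviations) <= 2 * k:
        raise ValueError("need more than 2k deviation values")
    ordered = sorted(deviations)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def state_correction(local_time, deviations, k=1):
    """Correct the local timer at one instant by the convergence output."""
    return local_time + fault_tolerant_average(deviations, k)


def rate_correction_steps(correction, ticks):
    """Spread the same correction evenly over an interval of `ticks` steps."""
    if ticks < 1:
        raise ValueError("interval must contain at least one tick")
    return [correction / ticks] * ticks
```

With deviations of -2, 1, 3, and 100 and k = 1, the outlier 100 (for example from a faulty clock) is discarded together with -2, and the node corrects its clock by 2.0.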
However, if the exchange of messages, in particular synchronization messages, itself requires synchronization between the nodes, as is the case in time-triggered protocols, this solution cannot be implemented and a dedicated start-up algorithm has to be constructed.

12.3.1.3 Determinism

A definition of a timely and deterministic multicast channel is given in [9] by the following three properties:

1. Timeliness: Given that a message is sent at the send instant $t_{send}$, then the receive instants $t_{receive}$ at all receivers of the (multicast) message will be in the interval $[t_{send} + d_{min}, t_{send} + d_{max}]$, where $d_{min}$ is called the minimum delay and $d_{max}$ is called the maximum delay. The difference $d_{max} - d_{min}$ is called the jitter of the communication channel. $d_{max}$ and $d_{min}$ are a priori known characteristic parameters of the given communication channel.
2. Constant order: The receive order of the messages is the same as the send order. The send order among all messages is established by the temporal order of the send instants of the messages as observed by an omniscient observer.
3. Agreed order: If the send instants of $n$ ($n > 1$) messages are the same, then an order of the $n$ messages will be established in an a priori known manner.

We call a communication channel that fulfills properties 2 and 3 ordinal deterministic. If a communication channel fulfills all properties stated above, this communication channel is temporal deterministic; thus, temporal determinism is a stronger form of determinism than ordinal determinism. We call a communication channel path deterministic if there exists an a priori known route from a sending to a receiving node. Path determinism and temporal determinism are therefore orthogonal properties.

12.3.1.4 Fault Isolation

In the field of fault-tolerant computing the notion of a fault containment region (FCR) is introduced in order to delimit the impact of a single fault.
A fault containment region is defined as the set of subsystems that share one or more common resources. A fault in any one of these shared resources can thus impact all subsystems of the FCR; i.e., the subsystems of an FCR cannot be considered to fail independently of each other. In the context of this chapter we consider the following resources that can be impacted by a fault:
• Computing hardware
• Power supply


• Timing source
• Clock synchronization service
• Physical space

For example, if two subsystems depend on a single timing source, e.g., a single oscillator or a single clock synchronization algorithm, then these two subsystems are not considered to be independent and therefore belong to the same FCR. Since this definition of independence allows two FCRs to share the same design, i.e., the same software, software faults are not part of this fault model. In TTA a node is considered to form a single FCR. An architecture for safety-critical systems has to ensure that a fault that affects one FCR is isolated so that it will not cause other FCRs to fail.

12.3.1.5 FCR Diagnosis (Membership)

The failure of an FCR must be reported to all other FCRs in a consistent manner within a short latency [5]. The membership service is a form of concurrent diagnosis that realizes such a detection service.

The time-triggered protocols TTP/C and TTP/A are concrete implementations of TTA services. TTP/C is designed for ultrahigh dependable systems and thus tolerates either an arbitrary failure of any one of its nodes or a passive arbitrary failure of one of its channels (that is, even a faulty channel will not be allowed to create a correct TTP/C message itself). Furthermore, TTP/C is equipped with fault tolerance mechanisms that ensure that if the fault assumptions are temporarily violated, the system will be able to recover within a bounded duration after the fault assumptions hold again. To ensure this robustness TTP/C implements all listed basic services. The low-cost TTP/A protocol is intended for usage as a fieldbus protocol and tolerates only fail-silent components. It implements only the predictable time-triggered transmission service. We discuss the time-triggered protocols TTP/C and TTP/A next.

12.3.2 The System Protocol TTP/C

The time-triggered protocol for Society of Automotive Engineers (SAE) Class C applications (TTP/C) currently supports bus (Figure 12.5a) and star (Figure 12.5b) topologies as well as hybrid compositions of those. The communication medium is replicated to compensate for transmission failures of messages. The communication links are half duplex; that is, a node is able to either transmit or receive via an attached link. Full-duplex links would bring no advantage, since the TDMA strategy excludes the possibility of more than one good node transmitting concurrently.*

FIGURE 12.5 Different TTP/C topologies: (a) bus; (b) star.

*Full-duplex links may bring an advantage during the start-up phase of the protocol; the current start-up algorithm, however, is designed for half-duplex links.

TTP/C realizes the predictable time-triggered transmission service by adhering to an a priori defined communication schedule that organizes communication into TDMA rounds. Several successive TDMA rounds form a cluster cycle. The messages a node may send may differ with respect to the TDMA round in the cluster cycle. When a cluster cycle is finished, it is restarted, so that the cluster cycle is executed cyclically. The communication schedule, the so-called message description list (MEDL), is stored within the communication controller of each node.

In addition to the time-triggered transmission concept described in Section 12.3.1.1, TTP/C also supports multiplexed nodes and shadow nodes. A set of nodes is said to be multiplexed if the nodes share the same slot in a TDMA round. Which single node is allowed to send in the multiplexed slot depends on the TDMA round in the cluster cycle (this information is stored in the MEDL as well). A shadow node has a dedicated slot in the TDMA round but will only transmit in this slot if it detects that its primary node fails to send. After recovery, the former primary node will act as shadow node.

A message may carry up to 240 bytes of data. The data are protected by a 24-bit cyclic redundancy check (CRC) checksum. In order to achieve high data efficiency, the sender name and message name are derived from the send instant.

We distinguish between three different types of messages in TTP/C: I-frames, N-frames, and X-frames. I-frames carry the current controller state (C-state) and can be used by nodes that are out of synchronization to reintegrate into a running system. N-frames are used for regular application data and do not carry C-state information explicitly. However, the sending node calculates the CRC checksum using the N-frame and its internal C-state. A receiving node will calculate the CRC checksum using the received N-frame and its own C-state. Thus, the CRC check will only be successful if both sender and receiver agree on the C-state. This form of CRC calculation makes it impossible for a receiver to distinguish a transmission failure from a disagreement on the C-state.
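The implicit C-state check described for N-frames can be sketched as follows. This is only an illustration under stated assumptions: TTP/C uses a 24-bit CRC whose polynomial is not given here, so the sketch substitutes CRC-32 from Python's `zlib` module; the byte strings standing in for the C-state and the helper names `send_n_frame` and `receive_n_frame` are hypothetical.

```python
import zlib


def send_n_frame(payload: bytes, sender_c_state: bytes) -> bytes:
    """Build an N-frame: payload plus a CRC over payload AND C-state.

    The C-state itself is not transmitted. CRC-32 is a stand-in for the
    24-bit CRC of TTP/C (assumption for illustration).
    """
    crc = zlib.crc32(payload + sender_c_state)
    return payload + crc.to_bytes(4, "big")


def receive_n_frame(frame: bytes, receiver_c_state: bytes) -> bool:
    """Return True if the CRC matches under the receiver's own C-state."""
    payload, received_crc = frame[:-4], frame[-4:]
    expected = zlib.crc32(payload + receiver_c_state).to_bytes(4, "big")
    return received_crc == expected


c_state = b"time=42;members=1111"          # hypothetical C-state encoding
frame = send_n_frame(b"wheel_speed=17", c_state)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # one bit flipped in transit

print(receive_n_frame(frame, c_state))                   # True: states agree
print(receive_n_frame(frame, b"time=42;members=1101"))   # False: C-state differs
print(receive_n_frame(corrupted, c_state))               # False: bit error
```

The receiver obtains the same negative result in the last two cases, which is the ambiguity between a transmission failure and a C-state disagreement.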
X-frames (that is, N-frames that carry the C-state information explicitly) overcome this limitation.

The fault-tolerant clock synchronization of TTP/C exploits the common knowledge of the send schedule: every node measures the difference between the a priori known expected arrival time and the actually observed arrival time of a correct message to learn about the difference between the clock of the sender and the clock of the receiver. This information is used by a fault-tolerant average algorithm to calculate periodically a correction term for the local clock in order to keep the clock in synchrony with all other clocks of the cluster. The clock synchronization algorithm has been formally verified in [16].

TTP/C uses a fault-tolerant start-up algorithm that ensures that the system will become synchronized within an upper bound in time, provided that a minimum number of components are awake. The start-up algorithm used in TTP/C is a waiting room algorithm that is based on unique time-outs. Each node $i$ has two unique time-outs, a listen time-out ($t_i^{\mathrm{listen}}$) and a cold-start time-out ($t_i^{\mathrm{CS}}$). For each pair of nodes $i$, $j$ the following relation holds:

$$t_i^{\mathrm{listen}} > t_j^{\mathrm{CS}} \qquad (12.1)$$

After power-up, say at $t_0$, node $k$ starts to listen on the communication channels for $t_k^{\mathrm{listen}}$ time units. If synchronous operation is already established, node $k$ will receive a frame during this period. If the node was not able to synchronize by $t_0 + t_k^{\mathrm{listen}}$, it will initiate cold-start by itself by sending a cold-start frame. After transmission of the cold-start frame, node $k$ listens to the communication channel until $t_0 + t_k^{\mathrm{listen}} + t_k^{\mathrm{CS}}$. If node $k$ was not able to integrate until this point in time, and no collision occurred, node $k$ will send another cold-start frame. Node $k$ will transmit cold-start frames with a period of $t_k^{\mathrm{CS}}$ until it successfully synchronizes to a received frame.

Extensive model-checking studies of the start-up concept, including exhaustive failure simulation, were performed in [18]. As a key lemma of these studies, it was verified that a minimum configuration of three nodes and intelligent central guardians is necessary and sufficient to tolerate one arbitrarily faulty node or one passive arbitrarily faulty central guardian during the start-up sequence.

The membership service employs a distributed agreement algorithm to determine whether the outgoing link of the sender or the incoming link of the receiver has failed. Nodes that have suffered a transmission fault are excluded from the membership until they restart with a correct protocol state. Before each send operation of a node, the clique avoidance algorithm checks if the node is a member of the majority clique. Certain aspects of the TTA group membership service have been formally verified in [15].
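The time-out relation (12.1) and the first step of the start-up sequence can be sketched as follows. The sketch assumes that all nodes power up at the same instant and that no frames are on the channels, which is a simplification made here and not by the protocol; the node names, time-out values, and function names are hypothetical.

```python
def satisfies_relation_12_1(listen, cold_start):
    """Check t_listen[i] > t_cs[j] for every pair of nodes i, j.

    Because the relation must hold for all pairs, it is enough to
    compare the smallest listen time-out with the largest cold-start
    time-out.
    """
    return min(listen.values()) > max(cold_start.values())


def timeouts_unique(timeouts):
    """Each node needs its own time-out value (no duplicates)."""
    return len(set(timeouts.values())) == len(timeouts)


def first_cold_starter(listen, t0=0):
    """With simultaneous power-up at t0 and silent channels, the node
    whose listen time-out expires first sends the first cold-start
    frame. Returns (node, send instant)."""
    node = min(listen, key=listen.get)
    return node, t0 + listen[node]


# Hypothetical time-outs in arbitrary time units.
t_listen = {"n1": 70, "n2": 80, "n3": 90}
t_cs = {"n1": 30, "n2": 40, "n3": 50}

print(satisfies_relation_12_1(t_listen, t_cs))               # True: 70 > 50
print(timeouts_unique(t_listen) and timeouts_unique(t_cs))   # True
print(first_cold_starter(t_listen))                          # ('n1', 70)
```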


The fault tolerance concepts of TTA that are used in TTP/C are discussed in detail in Section 12.4.

As in any distributed computing system, the performance of TTA depends primarily on the available communication bandwidth and computational power. Because of physical effects of time distribution and limits in the implementation of the guardians [19], a minimum interframe gap of about 5 µs must be maintained between frames to guarantee the correct operation of the guardians. If a bandwidth utilization of about 80% is intended, then the message send phase must be on the order of 20 µs, implying that about 40,000 messages can be sent per second within such a cluster. With these parameters, a sampling period of about 250 µs can be supported in a cluster composed of 10 nodes. The precision of the clock synchronization in current prototype systems is below 1 µs. If the interframe gap and bandwidth limits are stretched, it might be possible to implement in such a system a 100-µs TDMA round (corresponding to a 10-kHz control loop frequency), but not much smaller if the system is physically distributed (to tolerate spatial proximity faults). The amount of data that can be transported in the 20-µs window depends on the bandwidth: in a 5 Mbit/s system it is about 12 bytes; in a 1 Gbit/s system it is about 2400 bytes.

A prototype implementation of TTP/C using Gigabit Ethernet [17] was developed within the Next TTA project. This prototype implementation uses commercial off-the-shelf (COTS) hardware and was therefore not expected to achieve the limiting performance. The objective of this project was rather to determine the performance that can be achieved without special hardware and to pinpoint the performance bottlenecks faced when using COTS components.

TTP/C is commercially available in the form of the automotive-qualified TTP/C-C2 chip [22]. A Federal Aviation Administration (FAA) certification process (DO-178B) is currently being finalized that shall also prove the appropriateness of the hardware for avionics applications. The detailed specification of the TTP/C protocol can be found in [20]. There are several ongoing projects that use TTP/C; examples are a railway signaling system and the cabin pressure control in the Airbus A380. See [21] for a list of projects that employ TTP/C as a commercial product.
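The performance figures quoted in this section follow from simple arithmetic, which the sketch below reproduces. The interframe gap and the send phase are taken in microseconds, the only unit for which the quoted 40,000 messages per second results. The variable and function names are illustrative, and one slot per node per round is an assumption.

```python
GAP_US = 5          # minimum interframe gap, microseconds
SEND_US = 20        # message send phase, microseconds
NODES = 10          # nodes in the cluster, one slot each per round (assumed)

slot_us = GAP_US + SEND_US                   # time per message slot
utilization = SEND_US / slot_us              # share of time carrying data
messages_per_second = 1_000_000 // slot_us
round_us = NODES * slot_us                   # one TDMA round = sampling period


def payload_bytes(bit_rate_bps, window_us=SEND_US):
    """Raw bytes that fit into the send window at the given bit rate."""
    bits = bit_rate_bps * window_us // 1_000_000
    return bits / 8


print(f"utilization        : {utilization:.0%}")           # 80%
print(f"messages per second: {messages_per_second}")       # 40000
print(f"sampling period    : {round_us} us")               # 250 us
print(f"5 Mbit/s window    : {payload_bytes(5_000_000)} bytes")       # 12.5
print(f"1 Gbit/s window    : {payload_bytes(1_000_000_000)} bytes")   # 2500.0
```

The raw result for 1 Gbit/s is 2500 bytes, slightly above the "about 2400 bytes" quoted in the text. The text does not explain the difference; frame overhead that is not modelled here is one possible reason.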

12.3.3 The Fieldbus Protocol TTP/A

The TTP/A protocol is the time-triggered fieldbus protocol of TTA. It is used to connect low-cost smart transducers to a node of TTA, which acts as the master of a transducer cluster. In TTP/A the CNI memory element of Figure 12.3 has been expanded at the transducer side to hold a simple interface file system (IFS). Each interface file contains up to 256 records of four bytes each. The IFS forms the uniform name space for the exchange of data between a sensor and its environment (Figure 12.6). The IFS holds the real-time data, calibration data, diagnostic data, and configuration data.

FIGURE 12.6 Interface file system in a smart transducer.

The information between the IFS of the smart transducer and the CNI of the TTA node is exchanged by the time-triggered TTP/A protocol, which distinguishes between two types of rounds, the master–slave (MS) round and the multipartner (MP) round. The MS rounds are used to read records from and write records to the IFS of a particular transducer to implement the DM and CP interfaces. The MP rounds are periodic and transport data from selected IFS records of several transducers across the TTP/A cluster to implement the RS. MP rounds and MS rounds are interleaved, such that the time-critical RS implemented by means of MP rounds and the event-based MS service can coexist. It is thus possible to diagnose a smart transducer or to reconfigure or install a new smart transducer online, without disturbing the time-critical RS of the other nodes.

The TTP/A protocol also supports a plug-and-play mode in which new sensors are detected, configured, and integrated into a running system online and dynamically. The detailed specification of the TTP/A protocol can be found in [14].
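The IFS layout described in this section (files of up to 256 records, four bytes per record) can be modelled with a small data structure. This is a sketch only: the class and method names, the integer file names, and the zero-initialized records are assumptions for illustration and are not taken from the TTP/A specification [14].

```python
RECORDS_PER_FILE = 256   # upper bound stated in the text
RECORD_SIZE = 4          # bytes per record


class InterfaceFileSystem:
    """Toy model of a transducer's IFS: file name -> list of records."""

    def __init__(self):
        self._files = {}

    def create_file(self, name, n_records):
        if not 1 <= n_records <= RECORDS_PER_FILE:
            raise ValueError("a file holds between 1 and 256 records")
        self._files[name] = [bytes(RECORD_SIZE) for _ in range(n_records)]

    def write(self, name, record, data):
        """Write one record, as a master might do in an MS round."""
        if len(data) != RECORD_SIZE:
            raise ValueError("a record is exactly four bytes")
        self._files[name][record] = bytes(data)

    def read(self, name, record):
        """Read one record."""
        return self._files[name][record]


ifs = InterfaceFileSystem()
ifs.create_file(1, n_records=8)               # hypothetical real-time data file
ifs.write(1, 0, (1234).to_bytes(4, "big"))    # store a sensor reading
print(int.from_bytes(ifs.read(1, 0), "big"))  # 1234
```

Addressing a value by file name and record index gives every datum in the transducer a uniform name, which is the role the text assigns to the IFS.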

12.4 Fault Tolerance

In any fault-tolerant architecture it is important to distinguish clearly between fault containment and error containment. Fault containment is concerned with limiting the immediate impact of a single fault to a defined region, while error containment tries to avoid the propagation of the consequences of a fault, the error. An error in one fault containment region must not propagate into another fault containment region that has not been directly affected by the original fault.

12.4.1 Fault Containment

In TTA, nodes communicate by the exchange of messages across replicated communication channels. Each of the two channels independently transports its own copy of the message at about the same time from the sending CNI to the receiving CNI. The start of sending a message by the sender is called the message send instant. The termination of receiving a message by the receiver is called the message receive instant. In TTA, the intended message send instants and the intended message receive instants are a priori known to all communicating partners.

A message contains an atomic data structure that is protected by a CRC. We make the assumption that a CRC cannot be forged by a fault. A message is called a valid message if it contains a data structure with a correct CRC. A message with a length that differs from its specification or with an incorrect CRC is called an invalid message. A message is called a timely message if it is a valid message and conforms to the temporal specification. A message that does not conform to the temporal specification is an untimely message. A timely message is a correct message if its data structure is in agreement, at both the syntactic and semantic levels, with the specification.
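The message categories defined in this section can be captured in a small classifier. The sketch below is illustrative only: the `Message` fields, the boolean inputs, and the tolerance window are hypothetical stand-ins for checks that a real receiver would perform on the frame itself, and the function returns only the first category that applies.

```python
from dataclasses import dataclass


@dataclass
class Message:
    crc_ok: bool          # CRC matches the data structure
    length_ok: bool       # length equals the specified length
    receive_instant: int  # observed receive instant
    content_ok: bool      # data agree with the specification,
                          # syntactically and semantically


def classify(msg, expected_instant, tolerance):
    """Return the category of a message using the chapter's terms."""
    if not (msg.crc_ok and msg.length_ok):
        return "invalid"
    if abs(msg.receive_instant - expected_instant) > tolerance:
        return "untimely"
    # From here on the message is valid and timely.
    return "correct" if msg.content_ok else "timely"


print(classify(Message(True, True, 1000, True), 1000, 2))    # correct
print(classify(Message(True, True, 1010, True), 1000, 2))    # untimely
print(classify(Message(False, True, 1000, True), 1000, 2))   # invalid
print(classify(Message(True, True, 1001, False), 1000, 2))   # timely
```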

12.4.2 Error Containment in the Temporal Domain

An error that is caused by a fault in the sending FCR can propagate to another FCR via a message failure; i.e., the FCR sends a message that deviates from the specification. A message failure can be a message value failure or a message timing failure. A message value failure implies either that a message is invalid or that the data structure contained in a valid message is incorrect. A message timing failure implies that the message send instant or the message receive instant is not in agreement with the specification.

In order to avoid error propagation by a sent message, we need error detection mechanisms that are in different FCRs than the message sender. Otherwise, the error detection mechanism may be impacted by the same fault that caused the message failure. In TTA we distinguish between timing failure detection and value failure detection. Timing failure detection is performed by a guardian (Figure 12.7), which is part of TTA. Value failure detection is the responsibility of the host computer.

The guardian is an autonomous unit that has a priori knowledge of all intended message send and receive instants. Each one of the two replicated communication channels has its own independent guardian. A receiving node within TTA judges a sending node to be operational if it has received at least one timely message from the sender around the specified receive instant. It is assumed that a guardian cannot forge a CRC and cannot store messages; i.e., it can only output a valid message at one of its output ports if it has received a valid message on one of its input ports within the last d time units.

A guardian transforms a message that it judges to be untimely into an invalid message by cutting off its tail. Such a truncated message will be recognized as invalid by all correct receivers and will then be discarded. The guardian may truncate a message either because it detected a message timing failure or because the guardian itself is faulty. In the latter case it is assumed that the sender of the message is correct, and thus the correct message will proceed to the receivers via the replicated channel of TTA.
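The guardian behaviour described in this section can be sketched as follows. The sketch is an illustration under stated assumptions: CRC-32 from Python's `zlib` module stands in for the protocol's CRC, the permitted send window is a simple closed interval, and the function names and the number of bytes cut off are hypothetical.

```python
import zlib

CRC_LEN = 4  # bytes; CRC-32 used as a stand-in (assumption)


def make_frame(data: bytes) -> bytes:
    """Append a CRC so that receivers can check validity."""
    return data + zlib.crc32(data).to_bytes(CRC_LEN, "big")


def is_valid(frame: bytes, specified_len: int) -> bool:
    """Valid means: the specified length AND a matching CRC."""
    if len(frame) != specified_len:
        return False
    data, crc = frame[:-CRC_LEN], frame[-CRC_LEN:]
    return zlib.crc32(data).to_bytes(CRC_LEN, "big") == crc


def guardian(frame: bytes, send_instant: int, window: tuple) -> bytes:
    """Forward a timely frame unchanged; cut the tail off an untimely one.

    The guardian knows the permitted send window a priori. It never
    computes a CRC itself, so it cannot turn bad data into a valid frame.
    """
    start, end = window
    if start <= send_instant <= end:
        return frame
    return frame[:-2]  # truncated frame: wrong length, incomplete CRC


frame = make_frame(b"brake=0.3")
spec_len = len(frame)
window = (100, 110)                    # hypothetical send window

on_time = guardian(frame, 104, window)
late = guardian(frame, 130, window)
print(is_valid(on_time, spec_len))     # True
print(is_valid(late, spec_len))        # False
```

Because the truncated frame no longer has the specified length, every receiver that applies the same validity check discards it, without needing any knowledge of why the guardian cut it.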


FIGURE 12.7 TTA star topology with central guardian.

12.4.3 Error Handling in the Value Domain

Detection of value failures is not the responsibility of TTA, but of the host computers. For example, detection and correction of value failures can be performed in a single step by triple modular redundancy (TMR). In this case three replicated senders, placed in three different FCRs, perform the same operations in their host computers. They produce, in the fault-free case, correct messages with the same content that are sent to three replicated receivers that perform a majority vote on these three messages (actually, at the communication level six messages will be transported, one from each sender on each of its two channels).

Detection of value failures and detection of timing failures are not independent in TTA. In order to implement a TMR structure at the application level, the integrity of the timing of the architecture must be assumed. An intact sparse global time base is a prerequisite for the systemwide definition of the distributed state, which in turn is a prerequisite for masking value failures by voting.

The separation of handling timing failures from handling value failures has beneficial implications for resource requirements. In general, it is necessary to implement interactive consistency to solve the Byzantine Generals Problem: a set of nodes has to agree upon a correct value in the presence of faulty nodes that may be asymmetrically faulty. A Byzantine-tolerant algorithm that establishes interactive consistency in the presence of k arbitrarily failing nodes requires 3k + 1 nodes and several rounds of message exchange [12]. For clock synchronization, and thus for the maintenance of the sparse global time base, an interactive convergence algorithm [11] can be used instead of an interactive consistency algorithm; it needs only a single round of message exchange.

FIGURE 12.8 Virtual CAN on top of TTP/C.

TTA claims to tolerate one arbitrarily faulty component (that is, k = 1). Since all nodes of a cluster, independent of their involvement in a particular application system, can contribute to handling timing failures at the architectural level, the lower bound on the number of nodes in a system is 4, which is a relatively small number for real systems. Once a proper global time has been established, TMR for masking of value failures can be implemented using only 2k + 1 synchronized nodes in a particular application subsystem. Two concepts contribute to this fine property: the self-confidence principle and replica determinism. According to the self-confidence principle, a node will consider itself correct until it is accused by a sufficient set of nodes. A set of nodes that operates replica deterministically will produce the same outputs at most an a priori specifiable interval d apart [5]. That means that the tolerance of a Byzantine-faulty component does not necessarily require a solution to the Byzantine Generals Problem. The Byzantine Generals Problem has to be solved only if values from the environment are received, and the nodes have to come to a consistent view on these values. This separation of timing failures and value failures thus reduces the number of components needed for fault tolerance of an application from 3k + 1 to 2k + 1.
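The majority vote performed by a TMR receiver, and the node counts quoted in this section, can be sketched as follows. The function names are hypothetical, and the voter assumes exact-match comparison of replica outputs, which presupposes replica determinism.

```python
from collections import Counter


def majority_vote(values):
    """Return the value delivered by a strict majority of replicas,
    or None if no value reaches a majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None


def nodes_for_interactive_consistency(k):
    """Nodes needed to agree despite k arbitrarily faulty nodes [12]."""
    return 3 * k + 1


def nodes_for_tmr_masking(k):
    """Synchronized replicas needed to mask k value failures by voting."""
    return 2 * k + 1


# Three replicas, one of which delivers a wrong value.
print(majority_vote([42, 42, 17]))            # 42
print(majority_vote([1, 2, 3]))               # None: no majority
print(nodes_for_interactive_consistency(1))   # 4
print(nodes_for_tmr_masking(1))               # 3
```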

12.4.4 Virtual Networks

As it is most likely that in a real system the nodes are of mixed criticality, it is economically sensible to provide a communication infrastructure that is of mixed dependability. Nodes that execute a highly dependable task are of high criticality and communicate via a highly dependable network protocol, while nodes of minor criticality operate on a network protocol of lower dependability. TTA provides a mixed-dependability communication infrastructure by means of virtual networks. Virtual networks provide a logical network structure on top of a physical network structure by emulation.

Example: Recent research was concerned with a prototype study of CAN over TTP/C [13]. In this work two physical CAN networks were connected to a TTP/C cluster via two gateway nodes (Figure 12.8). The CAN messages are tunneled through the TTP/C system. Thus, the physically separated CAN buses form one logical CAN bus in a fashion that is transparent to the CAN controllers. With the virtual network approach, it is possible to have low-criticality nodes communicate via a dynamic protocol while highly critical nodes communicate via the highly dependable TTP/C. Furthermore, the logical CAN bus consists of three independent fault containment regions, and thus a babbling CAN controller will only affect the physical part of the CAN bus where it is located. This approach is also scalable with respect to the number of logical CAN buses.

To summarize, fault containment and error detection are achieved in TTA in three distinct steps. First, fault containment is achieved by proper architectural decisions concerning resource sharing in order to provide independent fault containment regions. Second, propagation of timing errors is avoided at the architecture level by the guardians. Third, handling of value failures is performed at the application level by voting.

12.5 The Design of TTA Applications

Composability and the associated reuse of nodes and software can only be realized if the architecture supports a two-level design methodology. TTA supports such a methodology: it distinguishes between the architecture design (cluster design) and the component design (node design).


FIGURE 12.9 Decomposition of a drive-by-wire application. (Node labels in the figure: Driver Interface, Vehicle Dynamics, Brake Manager, Engine Control, Gateway Body, Steering Manager, and Suspension, connected by replicated broadcast channels.)

12.5.1 Architecture Design

In the cluster design phase, an application is decomposed into clusters and nodes. This decomposition will be guided by engineering insight and the structure inherent in the application, in accordance with the proven architecture principle of form follows function. For example, in an automotive environment, a drive-by-wire system may be decomposed into functional units, as depicted in Figure 12.9.

If a system is developed from scratch (a greenfield design), then a top-down decomposition will be pursued. After the decomposition has been completed, the CNIs of the nodes must be specified in the temporal and value domains. The data elements that are to be exchanged across the CNIs are identified, and the precise fetch instants and delivery instants of the data at the CNI must be determined. Given these data, the schedules of the TTP/C communication system can be calculated and verified. At the end of the architecture design phase, the precise interface specifications of the nodes are available. These interface specifications are the inputs and constraints for the node design.

Given a set of available nodes with their temporal specifications (nodes that are available for reuse), a bottom-up design approach must be followed. Given the constraints of the nodes at hand (how much time they need to calculate an output from an input), a TTP/C schedule must be found that meets the application requirements and satisfies the node constraints.
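The node constraint mentioned for the bottom-up approach, namely that a schedule must leave each node enough time to compute its output from its input, can be checked mechanically. The sketch below is a simplified illustration: it assumes that each node receives its input at a delivery instant, needs a known computation time, and must have its output ready at a fetch instant within the same round. The field names and numbers are hypothetical and do not reflect any TTP/C tool.

```python
from dataclasses import dataclass


@dataclass
class NodeTiming:
    name: str
    delivery_instant: int   # input becomes available at the node's CNI
    compute_time: int       # time the node needs from input to output
    fetch_instant: int      # output is fetched from the CNI for sending


def violations(nodes):
    """Return the names of nodes whose output would not be ready in time."""
    return [
        n.name
        for n in nodes
        if n.delivery_instant + n.compute_time > n.fetch_instant
    ]


# Hypothetical schedule, all instants in microseconds within one round.
schedule = [
    NodeTiming("brake_manager", delivery_instant=0,
               compute_time=150, fetch_instant=200),
    NodeTiming("steering_manager", delivery_instant=50,
               compute_time=180, fetch_instant=200),
]
print(violations(schedule))   # ['steering_manager']: 50 + 180 > 200
```

A schedule with an empty violation list satisfies the node constraints in this simplified model; whether it also meets the application requirements has to be checked separately.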

12.5.2 Component Design

During the node design phase, the application software of the host computer is developed. The delivery and fetch instants established during the architecture design phase are the preconditions and postconditions for the temporal validation of the application software. The host operating system can employ any reasonable scheduling strategy, as long as the given deadlines are satisfied and the replica determinism of the host system is maintained.

Node testing proceeds bottom up. A new node must be tested with respect to the given CNI specifications under all anticipated load and fault conditions. The composability properties of TTA (stability of prior service achieved by the strict adherence to information pull interfaces) ensure that a property that has been validated at the node level will also hold at the system level. At the system level, testing will focus on validating the emerging services that are a result of the integration.

12.5.3 Validation

Today, the integration and validation phases are probably the most expensive phases in the implementation of a large distributed real-time system. TTA has been designed to reduce this integration and validation effort by providing the following mechanisms:

• The architecture provides a consistent distributed computing base to the application and informs the application in case a loss of consistency is caused by a violation of the fault hypothesis. The basic algorithms that provide this consistent distributed computing base (clock synchronization and membership) have been analyzed by formal methods and are implemented once and for all in silicon. The application need not be concerned with the implementation and validation of the complex distributed agreement protocols that are needed to establish consistency in a distributed system.
• The architecture is replica deterministic, which means that any observed deficiency can be reproduced in order to diagnose the cause of the observed problem.
• The interaction pattern between the nodes and the contents of the exchanged messages can be observed by an independent observer without the probe effect. It is thus possible to determine whether a node complies with its preconditions and postconditions without interfering with the operation of the observed node.
• The internal state of a node can be observed and controlled by the DM interface.
• In TTA it is straightforward to provide a real-time simulation test bench that reproduces the environment to any node in real time. Deterministic automatic regression testing can thus be implemented.

12.6 Conclusions

The Time-Triggered Architecture is the result of more than 20 years of research in the field of dependable distributed real-time systems. During this period, many ideas have been developed, implemented, evaluated, and finally discarded. What survived is a small set of orthogonal concepts that center around the availability of a dependable global time base. The guiding principle during the development of TTA has always been to take maximum advantage of the availability of this global time, which is part of the world, even if we do not use it. TTA spans the whole spectrum of dependable distributed real-time systems, from low-cost deeply embedded sensor nodes to high-performance nodes that communicate at gigabit-per-second speeds, consistently assuming that a global time of appropriate precision is available in every node of TTA.

At present, TTA occupies a niche position, since in the experimental as well as in the theoretical realm of mainline computing, time is considered a nuisance that makes life difficult and should be dismissed at the earliest moment [10]. However, as more and more application designers start to realize that real time is an integral part of the real world that cannot be abstracted away, the future prospects for TTA look encouraging.

Acknowledgments

This work was supported by the European IST (Information Society Technologies) project “Next TTA” under project number IST-2001-32111. This document is a revised version of [2].

References

[1] Consortium DECOS. DECOS Annex 1: Description of Work, 2003. Contract FP6-511764.
[2] H. Kopetz and G. Bauer. Time-triggered communication networks. In Industrial Information Technology Handbook. CRC Press, Boca Raton, FL, 2004.
[3] H. Kopetz and G. Bauer. The Time-Triggered Architecture. Proceedings of the IEEE, 91:112–126, 2003.
[4] H. Kopetz and R. Nossal. Temporal firewalls in large distributed real-time systems. In Proceedings of the IEEE Workshop on Future Trends in Distributed Computing, 1997, pp. 310–315.
[5] H. Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997.
[6] H. Kopetz. The time-triggered (TT) model of computation. In Proceedings of the 19th IEEE Real-Time Systems Symposium, 1998, pp. 168–177.
[7] H. Kopetz. Elementary versus composite interfaces in distributed real-time systems. In Proceedings of the 4th International Symposium on Autonomous Decentralized Systems, 1999, pp. 26–33.
[8] H. Kopetz. Software engineering for real-time: a roadmap. In Proceedings of the 22nd International Conference on Software Engineering, 2000, pp. 201–211.
[9] Hermann Kopetz. On the Determinism of Communication Systems. Research Report 48/2003, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2003.
[10] E. Lee. What’s ahead for embedded software? IEEE Computer, 33:18–26, 2000.
[11] L. Lamport and P.M. Melliar-Smith. Synchronizing clocks in the presence of faults. Journal of the ACM, 32:52–78, 1985.
[12] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems, 4:382–401, 1982.
[13] Roman Obermaisser. An Integrated Architecture for Event-Triggered and Time-Triggered Control Paradigms. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002.
[14] OMG. Smart Transducers Interface. Final adopted specification ptc/2002-10-02, Object Management Group, 2002. Available at http://www.omg.org.
[15] H. Pfeifer. Formal verification of the TTP group membership algorithm. In Tommaso Bolognesi and Diego Latella, editors, Formal Methods for Distributed System Development Proceedings of FORTE XIII/PSTV XX 2000, Pisa, Italy, October 2000, pp. 3–18. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[16] Holger Pfeifer, Detlef Schwier, and Friedrich W. von Henke. Formal verification for time-triggered clock synchronization. In Charles B. Weinstock and John Rushby, editors, Dependable Computing and Fault Tolerant Systems, Vol. 12, Dependable Computing for Critical Applications 7, IEEE Computer Society, San Jose, CA, 1999, pp. 207–226.
[17] Martin Schwarz. Implementation of a TTP/C Cluster Based on Commercial Gigabit Ethernet Components. Master’s thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002.
[18] Wilfried Steiner, John Rushby, Maria Sorea, and Holger Pfeifer. Model Checking a Fault-Tolerant Startup Algorithm: From Design Exploration to Exhaustive Fault Simulation. Paper presented at the International Conference on Dependable Systems and Networks (DSN2004), June 2004.
[19] Christopher Temple. Enforcing Error Containment in Distributed Time-Triggered Systems: The Bus Guardian Approach. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 1999.
[20] TTTech Computertechnik AG. Specification of the TTP/C Protocol. Available at http://www.tttech.com.
[21] TTTech Computertechnik AG. TTP in Commercial Production. Available at http://tttech.com/customers/.
[22] TTTech Computertechnik AG. TTP/C-C2 Data Sheet. Available at http://www.ttchip.com.


13 Controller Area Network: A Survey

Gianluca Cena, IEIIT-CNR
Adriano Valenzano, IEIIT-CNR

13.1 Introduction
13.2 CAN Protocol Basics
Physical Layer • Frame Format • Access Technique • Error Management • Fault Confinement • Communication Services • Implementation
13.3 Main Features of CAN
Advantages • Drawbacks • Solutions
13.4 Time-Triggered CAN
Main Features • Protocol Specification • Implementation
13.5 CAN-Based Application Protocols
CANopen • DeviceNet
References

13.1 Introduction The history of Controller Area Network (CAN) starts more than 20 years ago. At the beginning of the 1980s a group of engineers at Bosch GmbH were looking for a serial bus system suitable for use in passenger cars. The most popular solutions adopted at that time were considered inadequate for the needs of most automotive applications. The bus system, in fact, had to provide a number of new features that could hardly be found in the already existing fieldbus architectures. The design of the new proposal also involved several academic partners and had the support of Intel, as the potential main semiconductor producer. The new communication protocol was presented officially in 1986 with the name of Automotive Serial Controller Area Network at the Society of Automotive Engineers (SAE) congress held in Detroit. It was based on a multimaster access scheme to the shared medium that resembled the well-known carriersense multiple-access (CSMA) approach. The peculiar aspect, however, was that CAN adopted a new distributed nondestructive arbitration mechanism to solve contentions on the bus by means of priorities implicitly assigned to the colliding messages. Moreover, the protocol specifications also included a number of error detection and management mechanisms to enhance the fault tolerance of the whole system. In the following years, both Intel and Philips started to produce controller chips for CAN following two different philosophies. The Intel solution (often referred to as FullCAN in the literature) required less host CPU power, since most of the communication and network management functions were carried out directly by the network controller. Instead, the Philips solution (BasicCAN) was simpler but imposed a higher load on the processor used to interface the CAN controller. 
Since the mid-1990s more than 15 semiconductor vendors, including Siemens, Motorola, and NEC, have been producing and shipping millions of CAN chips, mainly to car manufacturers such as Mercedes-Benz, Volvo, Saab, Volkswagen, BMW, Renault, and Fiat.

The Bosch specification (CAN version 2.0) was submitted for international standardization at the beginning of the 1990s. The proposal was approved and published as ISO 11898 at the end of 1993 and contained the description of the network access protocol and the physical layer architecture. In 1995 an addendum to ISO 11898 was approved to describe the extended format for message identifiers. The CAN specification is currently in the process of being revised and reorganized and has been split into four separate parts: [ISO1], [ISO2], and [ISO4] have already been approved as international standards, whereas [ISO3] has reached a stable status and is being finalized.

Even though it was conceived for vehicle applications, at the beginning of the 1990s CAN began to be adopted in different scenarios. The standard documents provided satisfactory specifications for the lower communication layers but did not offer guidelines or recommendations for the upper part of the Open Systems Interconnection (OSI) protocol stack, in general, and for the application layer, in particular. This is why the earlier applications of CAN outside the automotive scenario (e.g., textile machines and medical systems) adopted ad hoc monolithic solutions.

The CAN in Automation (CiA) users' group, founded in 1992, was originally concerned with the specification of a standard CAN application layer. This effort led to the development of the general-purpose CAN application layer (CAL) specification. CAL was intended to fill the gap between the distributed application processes and the underlying communication support, but in practice it was not successful. The main reason is that, because CAL is truly application independent, each user had to develop a suitable CAL-based profile for his or her specific application field.

In the same years, Allen-Bradley and Honeywell started a joint distributed control project based on CAN. Although the project was abandoned a few years later, Allen-Bradley and Honeywell continued their work separately and focused on the higher protocol layers.
The results of these activities were the Allen-Bradley DeviceNet solution and the Honeywell Smart Distributed System (SDS). For a number of reasons, SDS remained, in practice, an internal solution of Honeywell Microswitch, while DeviceNet was soon handed over to the Open DeviceNet Vendor Association and was widely adopted in a number of U.S. factory automation areas, becoming a serious competitor to widespread solutions such as PROFIBUS-DP and INTERBUS.

Besides DeviceNet and SDS, other significant initiatives were focused on CAN and its application scenarios. CANopen was conceived in the framework of the European Esprit project ASPIC* by a consortium led once again by Bosch GmbH. The purpose of CANopen was to define a profile based on CAL that could support communications inside production cells. The original CANopen specifications were further refined by CiA and released in 1995. Later, both CANopen and DeviceNet became European standards, and they are now widely used, especially in two different areas: factory automation and distributed machine control.

*ASPIC, Automation and control Systems for Production units using Installation bus-Concept.

13.2 CAN Protocol Basics

The CAN protocol architecture is structured according to the layered approach of the International Organization for Standardization (ISO)/OSI model. However, as in most of the existing networks conceived for use at the field level in automated manufacturing environments, only a few layers have been considered in its protocol stack. This makes implementations more efficient and inexpensive: fewer protocol layers imply reduced processing delays when receiving and transmitting messages, as well as simpler communication software. The CAN specifications [ISO1] and [ISO2], in particular, include only the physical and the data link layers, as depicted in Figure 13.1.

FIGURE 13.1 CAN protocol stack.

The physical layer manages the actual transmission of data over the communication medium and covers the mechanical, electrical, and functional aspects. Bit timing and synchronization, in particular, belong to this layer.

The data link layer is split into two separate sublayers: medium access control (MAC) and logical link control (LLC). The purpose of the MAC entity is basically to manage access to the shared transmission medium by providing a mechanism that coordinates the use of the bus, so as to avoid unmanageable collisions. The functions of the MAC sublayer include frame encoding and decoding, arbitration, error checking and signaling, and also fault confinement. The LLC sublayer offers the user (i.e., the application programs running in the upper layers) a proper interface, which is characterized by a well-defined set of communication services, in addition to the ability to decide whether an incoming message is relevant to the node.

It is worth noting that the CAN specification is very flexible with regard to both the implementation of the LLC services and the choice of the physical medium, whereas the behavior of the MAC sublayer cannot be modified. As said before, unlike most fieldbus networks, the CAN specification does not include any native application layer. However, a number of such protocols exist that rely on CAN and ease the design and implementation of complex CAN systems.

13.2.1 Physical Layer

The features of the physical layer of CAN that are valid for any system, such as those related to the physical signaling, are described in ISO 11898-1 [ISO1]. The medium access units (i.e., the transceivers) are defined in two separate documents: ISO 11898-2 [ISO2] and ISO 11898-3 [ISO3] for high-speed and low-speed communications, respectively. The definition of the medium interface (i.e., the connectors) is usually covered in other documents.

13.2.1.1 Network Topology

CAN networks are based on a shared-bus topology. Buses have to be terminated at each end with resistors (the recommended nominal impedance is 120 Ω), so as to suppress signal reflections. For the same reason, the standard documents state that the topology of a CAN network should be as close as possible to a single line. Stubs are permitted for connecting devices to the bus, but their length should be as short as possible. For example, at 1 Mbit/s the length of a stub must be shorter than 30 cm.

Several kinds of transmission media can be used:

• Two-wire bus, which enables differential signal transmissions and ensures reliable communications. In this case, shielded twisted pair can be used to further enhance the immunity to electromagnetic interference.
• Single-wire bus, a simpler and cheaper solution that features lower immunity to interference and is mainly suitable for use in automotive applications.
• Optical transmission medium, which ensures complete immunity to electromagnetic noise and can be used in hazardous environments. Fiber optics is often adopted to interconnect (through repeaters) different CAN subnetworks. This is done to cover plants that are spread over a large area.

Several bit rates are available for the network, the most widely adopted being in the range of 50 kbit/s to 1 Mbit/s (the latter value represents the maximum allowable bit rate according to the CAN specifications). The maximum extension of a CAN network depends directly on the bit rate. The exact relation between these two quantities involves parameters such as the delays introduced by transceivers and opto-couplers. Generally speaking, the product of the bus length and the bit rate has to be approximately constant. For example, the maximum extension allowed for a 500 kbit/s network is about 100 m, and it increases to about 500 m when a bit rate of 125 kbit/s is considered.

Signal repeaters can be used to increase the network extension, especially when large plants have to be covered and the bit rate is low or medium. However, they introduce additional delays on the communication paths; hence the maximum distance between any two nodes is effectively shortened at high bit rates. Repeaters also make it possible to build topologies other than the bus (trees or combs, for example). In this case, good design could increase the effective area that is covered by the network.

It is worth noting that, unlike other field networks such as PROFIBUS-PA, there is in general no cost-effective way in CAN to use the same wire for carrying both the signal and the power supply. However, an additional pair of wires can be provided inside the bus cable for the power supply.

Curiously enough, connectors are not standardized by the CAN specifications. Instead, several companion or higher-level application standards exist that define their own connectors and pin assignment.
CiA DS102 [DS102], for example, foresees the use of a SUB-D9 connector, while DeviceNet and CANopen suggest the use of either five-pin ministyle, microstyle, or open-style connectors. In addition, these documents include recommendations for bus lines, cables, and standardized bit rates, which were not included in the original CAN specifications.

13.2.1.2 Bit Encoding and Synchronization

In CAN the electrical interface of a node to the bus is based on an open-collector-like scheme. As a consequence, the level on the bus can assume two complementary values, which are denoted symbolically as dominant and recessive. Usually, the dominant level corresponds to the logical value 0 while the recessive level coincides with the logical value 1.

CAN relies on the non-return-to-zero (NRZ) bit encoding, which features very high efficiency in that synchronization information is not encoded separately from data. Bit synchronization in each node is achieved by means of a digital phase-locked loop (DPLL), which extracts the timing information directly from the bit stream received from the bus. In particular, the edges of the signal are used for synchronizing the local clocks, so as to compensate for tolerances and drifts of the oscillators.

To provide a satisfactory degree of synchronization among the nodes, the transmitted bit stream should include a sufficient number of edges. To do this, CAN relies on the so-called bit stuffing technique. In practice, whenever five consecutive bits at the same value (either dominant or recessive) appear in the transmitted bit stream, the transmitting node inserts one additional stuff bit at the complementary value, as depicted in Figure 13.2. These stuff bits can be easily and safely removed by the receiving nodes, to obtain the original stream of bits back.

FIGURE 13.2 Bit stuffing technique.


From a theoretical point of view, the maximum number of stuff bits that may be added is one every four bits in the original frame, so the encoding efficiency can be as low as 80% (see, for example, the rightmost part of Figure 13.2, where the original bit stream alternates sequences of four consecutive bits at the dominant level followed by four bits at the recessive level). However, the influence of bit stuffing in real operating conditions is noticeably lower than the theoretical value computed above. Simulations show that, on average, only two to four stuff bits are effectively added to each frame, depending on the size of the identifier and data fields.

Although it is quite efficient, the bit stuffing technique has a drawback: the time taken to send a message over the bus is not fixed; instead, it depends on the content of the message itself. This may introduce undesirable jitter. Not all fields in a CAN frame are encoded according to the bit stuffing mechanism: it applies only to the initial part of the frames, from the start-of-frame (SOF) bit up to the cyclic redundancy check (CRC) sequence. The remaining fields are of fixed form and are not stuffed.
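The stuffing rule and its worst case can be illustrated with a minimal Python sketch. It assumes the usual convention stated above (0 for dominant, 1 for recessive) and represents a bit stream as a plain list of integers; the function names `stuff` and `destuff` are hypothetical and chosen only for this example.

```python
def stuff(bits):
    """Insert a complementary stuff bit after every run of five equal bits."""
    out = []
    run_value, run_length = None, 0
    for bit in bits:
        out.append(bit)
        if bit == run_value:
            run_length += 1
        else:
            run_value, run_length = bit, 1
        if run_length == 5:
            stuffed = 1 - bit
            out.append(stuffed)
            # The stuff bit counts as the first bit of the next run.
            run_value, run_length = stuffed, 1
    return out


def destuff(bits):
    """Remove the stuff bits inserted by stuff() and return the original stream."""
    out = []
    run_value, run_length = None, 0
    drop_next = False
    for bit in bits:
        if drop_next:
            # This is a stuff bit: discard it, but let it start a new run.
            run_value, run_length = bit, 1
            drop_next = False
            continue
        out.append(bit)
        if bit == run_value:
            run_length += 1
        else:
            run_value, run_length = bit, 1
        if run_length == 5:
            drop_next = True
    return out


# Worst-case pattern: five equal bits, then alternating groups of four.
worst_case = [0] * 5 + [1] * 4 + [0] * 4 + [1] * 4
stuffed = stuff(worst_case)
print(len(worst_case), len(stuffed))
```

Because each stuff bit starts a new run, a stream that continues with groups of four equal bits triggers one stuff bit for every four original bits, which is the 80% efficiency bound mentioned above.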

13.2.2 Frame Format

The CAN specification [ISO1] defines both a standard and an extended frame format. These formats differ mainly in the size of the identifier field and in some other bits in the arbitration field. In particular, the standard frame format (also known as CAN 2.0A format) defines an 11-bit identifier field, which means that up to 2048 different identifiers are available to the applications executing in the same network (many older CAN controllers, however, only support identifiers in the range of 0 to 2031). The extended frame format (identified as CAN 2.0B) instead assigns 29 bits to the identifier, so that up to half a billion different objects could exist (in theory) in the same network. This is a fairly high value, which is virtually sufficient for any kind of application.

Using extended identifiers in a network to which 2.0A-compliant CAN controllers are also connected usually leads to unmanageable transmission errors, which effectively make the network unstable. Thus, a third category of CAN controllers was developed, known as 2.0B passive: they correctly handle the transmission and reception of CAN 2.0A frames, while CAN 2.0B frames are simply ignored so that they do not hang the network.

It is worth noting that, in most practical cases, the number of different objects allowed by the standard frame format is more than adequate. Since standard CAN frames are shorter than the extended ones (because of the shorter arbitration field), they permit higher communication efficiency (unless part of the payload is moved into the arbitration field). As a consequence, they are adopted in most of the existing CAN systems, and most of the CAN-based higher-layer protocols, such as CANopen and DeviceNet, basically rely on this format.

The CAN protocol foresees only four kinds of frames: data, remote, error, and overload. Their formats are described in detail below.

13.2.2.1 Data Frame

Data frames are used to send information over the network. Each data frame in CAN begins with a start-of-frame (SOF) bit at the dominant level, as shown in Figure 13.3. Its role is to mark the beginning of the frame, as in serial transmissions carried out by means of conventional Universal Asynchronous Receiver/Transmitters (UARTs). The SOF bit is also used to synchronize the receiving nodes.

Immediately after the SOF bit there is the arbitration field, which includes both the identifier and the remote transmission request (RTR) bit. As the name suggests, the identifier field uniquely identifies, across the whole network, the content of the frame that is being exchanged. The identifier is also used by the MAC sublayer to detect and manage the priority of the frame, which is used whenever a collision occurs (the lower the numerical value of the identifier, the higher the priority of the frame). The identifier is sent starting from the most significant bit up to the least significant one. The size of the identifier is different for the standard and the extended frames. In the latter case, the identifier has been split into an 11-bit base identifier and an 18-bit extended identifier, to provide compatibility with the standard frame format.
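The identifier sizes and the base/extension split can be checked with a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, and it assumes that the base identifier occupies the 11 most significant bits of the 29-bit value, which is how the extended format is normally laid out.

```python
BASE_ID_BITS = 11      # size of the standard (and base) identifier
EXT_ID_BITS = 18       # size of the identifier extension
FULL_ID_BITS = BASE_ID_BITS + EXT_ID_BITS  # 29 bits in extended frames


def split_extended_id(identifier):
    """Split a 29-bit identifier into (base identifier, identifier extension)."""
    if not 0 <= identifier < (1 << FULL_ID_BITS):
        raise ValueError("identifier does not fit in 29 bits")
    base = identifier >> EXT_ID_BITS
    extension = identifier & ((1 << EXT_ID_BITS) - 1)
    return base, extension


def bits_msb_first(value, width):
    """Return the bits of value in transmission order (most significant first)."""
    return [(value >> position) & 1 for position in range(width - 1, -1, -1)]


print(1 << BASE_ID_BITS)   # number of standard identifiers
print(1 << FULL_ID_BITS)   # number of extended identifiers
print(split_extended_id(0x12345678))
```

The two counts printed first correspond to the "2048" and "half a billion" figures quoted in the frame format description.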


FIGURE 13.3 Format of data frames.

The RTR bit is used to discriminate between data and remote frames. Since a dominant value of RTR denotes a data frame while a recessive value stands for a remote frame, a data frame has a higher priority than a remote frame having the same identifier. Next to the arbitration field comes the control field. In the case of standard frames, it includes the identifier extension (IDE) bit, which discriminates between standard and extended frames, followed by the reserved bit r0. In the extended frames, the IDE bit effectively belongs to the arbitration field, as well as the substitute remote request (SRR) bit — a placeholder that is sent at recessive value to preserve the structure of the frames. In this case, the IDE bit is followed by the identifier extension and then by the control field, which begins with the two reserved bits r1 and r0. After the reserved bits there is the data length code (DLC), which specifies — encoded on 4 bits — the length (in bytes) of the data field. Since the IDE bit is dominant in the standard frames, while it is recessive in the extended ones, when the same base identifier is considered, standard frames have precedence over extended frames. Reserved bits r0 and r1 must be sent by the transmitting node at the dominant value. Receivers, however, will ignore the value of these bits. For the DLC field, values ranging from 0 to 8 are allowed. According to the last specification, higher values (from 9 to 15) can be used for application-specific purposes. In this case, however, the length of the data field is meant to be 8. The data field is used to store the effective payload of the frame. In order to ensure a high degree of responsiveness and minimize the priority inversion phenomenon, the size of the data field is limited to 8 bytes at most. After the data field there are the CRC and acknowledgment fields. 
The former field is made up of a cyclic redundancy check sequence encoded on 15 bits, which is followed by a CRC delimiter at the recessive value. The kind of CRC adopted in CAN is particularly suitable to cover short frames (i.e., counting fewer than 127 bits). The acknowledgment field is made up of two bits: the ACK slot followed by the ACK delimiter. Both of them are sent at the recessive level by the transmitter. The ACK slot, however, is overwritten with a dominant value by each node that has received the frame correctly (i.e., no error was detected up to the ACK field). It is worth noting that, in this way, the ACK slot is actually surrounded by two bits at the recessive level: the CRC and ACK delimiters. By means of the ACK bit, the transmitting node is able to discover whether at least one node in the network has received its frame correctly.

At the end of the frame there is the end-of-frame (EOF) field, made up of seven recessive bits, which notifies all the nodes of the end of an error-free transmission. In particular, the transmitting node assumes that the frame has been exchanged correctly if no error is detected until the last bit of the EOF field, while in the case of receivers, the frame is valid if there are no errors until the sixth bit of EOF. Consecutive frames are separated on the bus by the intermission (IMS), which consists of three recessive bits.

13.2.2.2 Remote Frames

Remote frames are very similar to data frames. The only difference is that they carry no data (i.e., the data field is not present in this case). They are used to request that a given message be sent on the network by a remote node. It is worth noting that the requesting node does not know who the producer of the related information is. It is up to the receivers to discover the one that has to reply.

The DLC field in remote frames is not effectively used by the CAN protocol. However, it should be set to the same value as the corresponding data frame, so as to cope with the situations where several nodes send remote requests with the same identifier at the same time (this is legal in a CAN network). In this case, it is necessary for the different requests to be perfectly identical, so that they will overlap in the case of a collision. It should be noted that because of the way the RTR bit is encoded, if a request is made for an object at the same time the transmission of that object is started by the related producer, the contention is resolved in favor of the data frame.

13.2.2.3 Error Frames

Error frames are used to notify the nodes in the network that an error has occurred. They consist of two fields: error flag and error delimiter. There are two kinds of error flag: the active error flag is made up of six dominant bits, while the passive error flag consists of six recessive bits. An active error flag violates the bit stuffing rules or the fixed-format parts of the frame that is currently being exchanged; hence, it enforces an error condition that is detected by all other stations connected to the network. Each node that detects an error condition transmits an error flag on its own. In this way, as a consequence of the transmission of an error flag, there can be from 6 to 12 dominant bits on the bus.

The error delimiter is made up of eight recessive bits. After the transmission of an error flag, each node starts sending recessive bits, and at the same time, it monitors the bus level until a recessive bit is detected. At this point the node sends seven more recessive bits, hence completing the error delimiter.

13.2.2.4 Overload Frames

Overload frames can be used by slow receivers to slow down operations on the network. This is done by adding an extra delay between consecutive data and remote frames. Their format is very similar to that of error frames. In particular, it is made up of an overload flag followed by an overload delimiter. Today's CAN controllers are very fast, which makes the overload frame almost useless.

13.2.3 Access Technique

The medium access control mechanism on which CAN relies is basically carrier-sense multiple access (CSMA). When no frame is being exchanged, the network is idle and the level on the bus is recessive. Before transmitting a frame, the nodes have to observe the state of the network. If the network is idle, the frame transmission begins immediately; otherwise, the node must wait for the current frame transmission to end. Each frame starts with the SOF bit at the dominant level, which informs all the other nodes that the network has switched to the busy state.

Even though very unlikely, it may happen that two or more nodes start sending their frames exactly at the same time. This is actually possible because the propagation delays on the bus, even though very small, are greater than zero. Thus, one node might start its transmission while the SOF bit of another frame is already traveling on the bus. In this case, a collision will occur. In CSMA networks that are based on collision detection, such as nonswitched Ethernet, this unavoidably leads to the corruption of all frames involved, which means that they have to be retransmitted. The consequence is a waste of time and a net decrease of the available bandwidth. In high-load conditions, this may lead to congestion: when the number of collisions is so high that the net throughput on the Ethernet network falls below the arrival rate, the network becomes stalled.

Unlike Ethernet, CAN is able to resolve the contentions in a deterministic way, so that neither time nor bandwidth is wasted. Therefore, congestion conditions can no longer occur and all the theoretical system bandwidth is effectively available for communications.

In fairness, it should be said that contentions in CAN occur more often than one may think. In fact, when a node that has a frame to transmit finds the bus busy or loses the contention, it waits for the end of the current frame exchange, and immediately after the intermission has elapsed, it starts transmitting. Here, the node may compete with other nodes for which a transmission request has been issued in the meantime. In this case, the different nodes synchronize on the falling edge of the first SOF bit that is sensed on the network. This implies that the behavior of a CAN network is effectively that of a network-wide distributed transmission queue where messages are selected for transmission according to a priority order.

13.2.3.1 Bus Arbitration

The most distinctive feature of the medium access technique of CAN is the ability to resolve in a deterministic way any collision that should occur on the bus. In turn, this is made possible by the arbitration mechanism, which effectively finds the most urgent frame each time there is a contention for the bus.

The CAN arbitration scheme allows the collisions to be resolved by stopping the transmissions of all frames involved except the one that is characterized by the highest priority (i.e., the lowest identifier). The arbitration technique exploits the peculiarities of the physical layer of CAN, which conceptually provides a wired-AND connection scheme among all the nodes. In particular, the level on the bus is dominant if at least one node is sending a dominant bit; likewise, the level on the bus is recessive if all the nodes are transmitting recessive bits.

By means of the so-called binary countdown technique, each node, immediately following the SOF bit, transmits the message identifier serially on the bus, starting from the most significant bit. When transmitting, each node checks the level observed on the bus against the value of the bit that is being written out. If the node is transmitting a recessive value and the level on the bus is dominant, the node understands it has lost the contention and withdraws immediately.
In particular, it ceases transmitting and sets its output port to the recessive level so as not to interfere with the other contending nodes. At the same time, it switches to the receiving state to read the incoming (winning) frame. The binary countdown technique ensures that in the case of a collision, all the nodes that are sending lower-priority frames will abort their transmissions by the end of the arbitration field, except for the one that is sending the frame characterized by the highest priority (the winning node does not even realize that a collision has occurred). This implies that no two nodes in a CAN network can be transmitting messages related to the same object (that is to say, characterized by the same identifier) at the same time. If this is not the case, in fact, unmanageable collisions could take place that, in turn, cause transmission errors. Because of the automatic retransmission feature of the CAN controllers, this will lead almost certainly to a burst of errors on the bus, until the stations involved are disconnected by the fault confinement mechanism. This implies that, in general, only one node can be the producer of each object. One exception to this rule is given by the frames without a data field, such as, for example the remote frames. In this case, should a collision occur among frames with the same identifier, they overlap perfectly and hence no collision effectively occurs. The same is also true for data frames that have a nonempty data field, provided that the content of this field is the same for all the frames sharing the same identifier. However, it makes no sense in general to send frames with a fixed data field. All nodes that lose the contention have to retry the transmission as soon as the exchange of the current (winning) frame ends. They will all try to send their frames again immediately after the intermission is read on the bus. 
Here, a new collision could take place that also involves the frames sent by the nodes for which a transmission request was issued while the bus was busy. An example that shows the detailed behavior of the arbitration phase in CAN is outlined in Figure 13.4. Here, three nodes (that have been indicated symbolically as A, B, and C) start transmitting a frame at the same time (maybe at the end of the intermission following the previous frame exchange over the bus). As soon as a node understands it has lost the contention, it switches its output level to the recessive value, so that it no longer interferes with the other transmitting nodes. This event takes place when bit ID5 is being sent for node A, while for node B this happens at bit ID2. Node C manages to send the entire identifier field, and then it can keep on transmitting the remaining part of the frame.
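The binary countdown can be reproduced with a small simulation. The sketch below is a simplified model, not a description of any real controller: it considers only the identifier bits of standard frames, models the bus as the wired-AND (minimum) of the bits being sent, and assumes that all contending identifiers are distinct. The function name `arbitrate` and the three identifier values are hypothetical, picked so that the drop-out points match those of nodes A and B in the example above.

```python
DOMINANT, RECESSIVE = 0, 1


def arbitrate(identifiers, width=11):
    """Simulate bitwise arbitration among distinct identifiers.

    Returns the winning identifier and a dict mapping each losing
    identifier to the bit position (ID10 ... ID0) where it withdrew.
    """
    contenders = set(identifiers)
    lost_at = {}
    for position in range(width - 1, -1, -1):      # most significant bit first
        sent = {ident: (ident >> position) & 1 for ident in contenders}
        bus_level = min(sent.values())             # wired-AND: dominant (0) wins
        for ident, bit in sent.items():
            if bit == RECESSIVE and bus_level == DOMINANT:
                lost_at[ident] = position          # node reads back a mismatch
        contenders -= set(lost_at)                 # losers stop transmitting
    (winner,) = contenders
    return winner, lost_at


node_a = 0b01100100000   # withdraws at ID5
node_b = 0b01100001100   # withdraws at ID2
node_c = 0b01100001011   # lowest value, highest priority

winner, lost_at = arbitrate([node_a, node_b, node_c])
print(bin(winner), lost_at)
```

In this model the winner is always the numerically lowest identifier, which is why arbitration acts as a priority-ordered queue shared by the whole network.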


FIGURE 13.4 Arbitration phase in CAN.

13.2.4 Error Management

One of the fundamental requirements in the definition of the CAN protocol was the need for a communication system characterized by high robustness, i.e., a system that is able to detect most of the transmission errors. Hence, particular care has been taken in defining error management. The CAN specification foresees five different mechanisms to detect transmission errors:

1. Cyclic redundancy check: When transmitting a frame, the originating node adds a 15-bit-wide CRC to the end of the frame itself. Receiving nodes reevaluate the CRC to check if it matches the transmitted one. Generally speaking, the CRC used in CAN is able to discover up to 5 erroneous bits distributed arbitrarily in the frame or error bursts including up to 15 bits.
2. Frame check: The fixed-format fields in the received frames can be easily tested against their expected values. For example, the CRC and ACK delimiters as well as the EOF field have to be at the recessive level. If one or more illegal bits are detected, a form error is generated.
3. Acknowledgment check: The transmitting node checks whether the ACK bit has been set to the dominant value in the received frame. If it has not, an acknowledgment error is issued.
4. Bit monitoring: Each transmitting node compares the level on the bus against the value of the bit that is being written. Should a mismatch occur, an error is generated. This does not hold for the arbitration field or the acknowledgment slot. Such an error check is very effective in detecting local errors that may occur in the transmitting nodes.
5. Bit stuffing: Each node verifies whether the bit stuffing rules have been violated in the portion of the frames from the SOF bit up to the CRC sequence. Whenever six bits of identical value are read from the bus, an error is generated.

The residual probability that a corrupted message is not detected in a CAN network, under realistic operating conditions, has been evaluated and found to be about 4.7 × 10^-11 times the frame error rate or less.
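The first of the mechanisms listed above can be sketched as a bit-serial shift register. The text does not give the generator polynomial; the value 0x4599 used below is the one normally quoted for the CAN 15-bit CRC and should be read as an assumption of this example, as should the function name `crc15`. The input is the unstuffed bit sequence to be protected, given as a list of 0/1 values.

```python
CAN_CRC15_POLY = 0x4599   # assumed generator polynomial, not stated in the text


def crc15(bits):
    """Compute a 15-bit CRC over a sequence of bits, one bit at a time."""
    crc = 0
    for bit in bits:
        feedback = bit ^ ((crc >> 14) & 1)   # incoming bit XOR register MSB
        crc = (crc << 1) & 0x7FFF            # shift left, keep 15 bits
        if feedback:
            crc ^= CAN_CRC15_POLY
    return crc


def crc_bits(value):
    """Return the 15 CRC bits in transmission order (most significant first)."""
    return [(value >> position) & 1 for position in range(14, -1, -1)]


message = [1, 0, 1, 1, 0, 0, 1]
sequence = crc_bits(crc15(message))

# A receiver that runs the same register over message + CRC ends with zero.
print(crc15(message + sequence))

# Flipping any single bit leaves a nonzero remainder, so the error is detected.
corrupted = message.copy()
corrupted[3] ^= 1
print(crc15(corrupted + sequence) != 0)
```

The receiver-side check shown here (recomputing over the received bits and expecting a zero remainder) is equivalent to recomputing the CRC over the message and comparing it with the transmitted sequence.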

13.2.5 Fault Confinement

To prevent a node that is not operating properly from repeatedly sending corrupted frames, hence blocking the entire network, a fault confinement mechanism has been included in the CAN specification. The fault confinement unit supervises the correct operation of the related MAC sublayer, and should the node become defective, it disconnects that node from the bus. The fault confinement mechanism has been conceived to discriminate, as far as possible, between permanent failures and short disturbances that may cause bursts of errors on the bus. According to this mechanism, each node can be in one of the three following states:

• Error active
• Error passive
• Bus off

Error-active and error-passive nodes take part in the communication in the same way. However, they react to the error conditions differently. They send active error flags in the former case and passive error flags in the latter. This is because an error-passive node has already experienced several errors, and hence it should avoid interfering with the network operations (a passive error flag, in fact, does not corrupt the ongoing frame exchange).

The fault confinement unit uses two counters to track the behavior of the node with respect to the transmission errors: transmission error count (TEC) and receive error count (REC). The rules by which TEC and REC are managed are actually quite complex. However, they can be summarized as follows: each time an error is detected, the counters are increased by a given amount, whereas successful exchanges decrease them by one. Furthermore, the amount of the increase for the nodes that first detected the error is higher than for the nodes that simply replied to the error flag. In this way, it is very likely that the counters of the faulty nodes increase more quickly than those of the nodes that are operating properly, even when sporadic errors due to electromagnetic noise are considered.

When the counters exceed the first threshold (127), the node is switched to the error-passive state, to try not to affect the network. When a second threshold (255) is exceeded, the node is switched to the bus-off state. At this point, it can no longer transmit any frame on the network, and it can be switched back to the error-active state only after it has been reset and reconfigured.
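The summary above can be turned into a deliberately simplified model. The thresholds (127 and 255) come from the text; everything else is an assumption made for illustration: the class and method names are hypothetical, the increments of 8 for a transmit error and 1 for a receive error are the values commonly quoted for CAN but are not given above, and the sketch lets only the transmit counter drive the bus-off transition. The full rules are more detailed than this.

```python
ERROR_PASSIVE_THRESHOLD = 127   # from the text
BUS_OFF_THRESHOLD = 255         # from the text


class FaultConfinementUnit:
    """Simplified model of the TEC/REC counters and the resulting node state."""

    def __init__(self, tx_error_step=8, rx_error_step=1):
        # Step sizes are assumptions; the text only says "a given amount".
        self.tec = 0
        self.rec = 0
        self.tx_error_step = tx_error_step
        self.rx_error_step = rx_error_step

    def transmit_error(self):
        self.tec += self.tx_error_step

    def receive_error(self):
        self.rec += self.rx_error_step

    def transmit_ok(self):
        self.tec = max(0, self.tec - 1)   # successful exchanges decrease by one

    def receive_ok(self):
        self.rec = max(0, self.rec - 1)

    @property
    def state(self):
        if self.tec > BUS_OFF_THRESHOLD:
            return "bus off"
        if self.tec > ERROR_PASSIVE_THRESHOLD or self.rec > ERROR_PASSIVE_THRESHOLD:
            return "error passive"
        return "error active"


node = FaultConfinementUnit()
for _ in range(16):
    node.transmit_error()
print(node.tec, node.state)
```

With these assumed step sizes, a node whose every transmission fails leaves the error-active state after 16 consecutive errors, while a node that sees only occasional errors between many successful exchanges keeps its counters near zero.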

13.2.6 Communication Services

According to the ISO specification [ISO1], the LLC sublayer of CAN provides only two communication services: L_DATA, which is used to broadcast the value of a specific object over the network, and L_REMOTE, which is used to ask for the value of a specific object to be broadcast by the related remote producer. From a practical point of view, these primitives are implemented directly in hardware by all currently available CAN controllers.

13.2.6.1 Model for Information Exchanges

Unlike most network protocols conceived for use in automated manufacturing environments (which rely on node addressing), CAN adopts object addressing. In other words, messages are not tagged with the address of the destination or originating node. Instead, each piece of information that is exchanged over the network (often referred to as an object) is assigned a unique identifier, which denotes unambiguously the meaning of the object itself in the whole system. This fact has important consequences on the way communications are carried out in CAN. In fact, identifying the objects that are exchanged over the network according to their meaning, rather than to the node they are intended for, implicitly allows multicasting and makes it very easy for the control applications to manage interactions among devices according to the producer–consumer paradigm.

The exchange of information in CAN takes place according to the three phases shown in Figure 13.5:

1. The producer of a given piece of information encodes and transmits the related frame on the bus (the arbitration technique will transparently resolve any contention that should occur).
2. Because of the intrinsically broadcast nature of the bus, the frame is propagated all over the network, and every node reads its content into a local receive buffer.
3. The frame acceptance filtering (FAF) function in each node determines whether the information is relevant to the node itself. If it is, the frame is passed to the upper communication layers (from a practical point of view, this means that the CAN controller raises an interrupt to the local device logic, which will then read the value of the object); if it is not, the frame is simply ignored and discarded.
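The three phases above can be mimicked in a few lines of Python. This is only a sketch: the text does not say how the FAF function decides relevance, so the code/mask comparison used here (a scheme found in many CAN controllers) is an assumption, and the function names and identifier values are hypothetical.

```python
def accepts(identifier, code, mask):
    """True when identifier matches code on every bit that is set in mask."""
    return (identifier & mask) == (code & mask)


def broadcast(identifier, nodes):
    """Deliver one frame to every node and list the nodes that consume it.

    nodes maps a node name to a list of (code, mask) acceptance filters.
    Every node sees the frame; only matching nodes pass it to upper layers.
    """
    consumers = []
    for name, filters in nodes.items():
        if any(accepts(identifier, code, mask) for code, mask in filters):
            consumers.append(name)
    return consumers


# Hypothetical setup: A and D are interested in object 0x120, C is not.
NODES = {
    "A": [(0x120, 0x7FF)],  # exact match on all 11 identifier bits
    "C": [(0x300, 0x700)],  # only identifiers in the range 0x300..0x3FF
    "D": [(0x120, 0x7F0)],  # any identifier in the range 0x120..0x12F
}
```

Calling `broadcast(0x120, NODES)` returns the names of nodes A and D. No destination address appears anywhere: the set of consumers is determined entirely by the filters configured in the receivers.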

FIGURE 13.5 Producer–consumer model.

In the sample data exchange depicted in Figure 13.5, node B is the producer of some kind of information that is relevant to (i.e., consumed by) nodes A and D. Node C is not interested in such data, so the frame is rejected by its filtering function (this is the default behavior of the FAF function).

13.2.6.2 Model for Device Interaction

The access technique of CAN makes this kind of network particularly suitable for distributed systems that communicate according to the producer–consumer model. In this case, data frames are used by the producer nodes to broadcast new values over the network, each of which is identified unambiguously by means of its identifier. Unlike the networks based on the producer–consumer–arbiter model, such as the Factory Instrumentation Protocol (FIP), information is sent in CAN as soon as it becomes available from either the control applications or the controlled physical system (by means of sensors), without the need for the intervention of a centralized arbiter. This noticeably improves the responsiveness of the whole system.

CAN networks also work equally well when they are used to interconnect devices in systems based on a more conventional master–slave communication model. In this case, the master can use remote frames to ask for some specific information to be sent on the network. The producer of that information, as a consequence of this frame, will reply with a data frame carrying the related object. It is worth noting that this kind of interaction is implemented in CAN in a rather more flexible way than in conventional master–slave networks, such as, for example, PROFIBUS-DP. In CAN, in fact, it is not necessary for the reply (data frame) to follow the request (remote frame) immediately. In other words, the network is not kept busy while the device is preparing the reply. This allows, in theory, the entire bandwidth to remain available to the applications. Furthermore, the reply containing the requested value is broadcast on the whole network, and hence it can be read by all the interested nodes, in addition to the one that transmitted the remote request.

13.2.7 Implementation

According to their internal architecture, CAN controllers can be classified into two categories: BasicCAN and FullCAN. Conceptually, BasicCAN controllers are provided with one transmit and one receive buffer, as in conventional UARTs. The frame-filtering function, in this case, is generally left to the application programs (i.e., it is under control of the host controller), even though some kind of filtering can be done by the controller. To avoid overrun conditions, a double-buffering scheme based on shadow receive buffers is usually available, which permits a new frame to be received from the bus while the previous one is being read by the host controller. An example of a controller based on the BasicCAN scheme is Philips' PCA82C200.

FullCAN implementations provide a number of internal buffers that can be configured to either receive or transmit some particular messages. In this case, the filtering function is implemented directly in the CAN controller. When a new frame that is of interest for the node is received from the network, it is stored in the related buffer, where it can then be read by the host controller. In general, new values simply overwrite the previous ones, and this does not lead to an overrun condition (the old value of a variable is superseded by a newer one). The Intel 82526 and 82527 CAN controllers are based on the FullCAN architecture.

FullCAN controllers, in general, free the host controller of a number of activities, so they are considered to be more powerful than BasicCAN controllers. However, the most recent CAN controllers embed the operating principles of both of the above architectures, so the above classification is actually in the process of being superseded.

13.3 Main Features of CAN

The medium access technique on which CAN relies basically implements a nonpreemptive distributed priority-based communication system, where each node is enabled to compete directly for the bus ownership, so that it can send messages on its own (this means that CAN is a true multimaster system). This can be advantageous for use in event-driven systems.

13.3.1 Advantages

CAN is by far simpler and more robust than the token-based access schemes (such as, for example, PROFIBUS when used in multimaster configurations). In fact, there is no need to build or maintain the logical ring, nor to manage the circulation of the token around the master stations. In the same way, it is noticeably more flexible than the solutions based on the time-division multiple-access (TDMA) or combined-message approaches, two techniques adopted by SERCOS and INTERBUS, respectively. This is because message exchanges do not have to be known in advance. When compared to schemes based on centralized polling, such as FIP, it is not necessary to have a node in the network that acts as the bus arbiter, which can become a point of failure for the whole system.

Since in CAN all the nodes are masters (at least from the point of view of the MAC mechanism), it is very simple for them to notify asynchronous events, such as, for example, alarms or critical error conditions. In all cases where this aspect is important, CAN is clearly better than the above-cited solutions.

Thanks to the arbitration scheme, it is certain that no message will be delayed by lower-priority messages contending for the bus at the same time (such a delay is known as priority inversion). However, since the CAN protocol is not preemptive (as is the case for almost all existing protocols), a message can still be delayed by a lower-priority one whose transmission has already started. This is unavoidable in any nonpreemptive system. However, as the frame size in CAN is very small (standard frames are 135 bits long at most, including stuff bits), the blocking time experienced by the very urgent messages is in general quite low. This makes CAN a very responsive network, which explains why it is used in many real-time control applications despite its relatively low bandwidth.

The above characteristics have to be considered carefully when assigning identifiers to the different objects that have to be exchanged in distributed real-time control applications. From an intuitive point of view, the most urgent messages (i.e., the messages characterized by the tightest deadlines) should be assigned the lowest identifiers (for example, identifier 0 labels the message that has the highest priority in any CAN network). If the period of cyclic data exchanges (and the minimum interarrival time of the acyclic ones) is known in advance, a number of techniques based on either the rate monotonic or deadline monotonic approaches have appeared in the literature [TIN] that can be used to find (if it exists) a suitable assignment of identifiers to the objects, so that the resulting schedule is feasible (i.e., the deadlines of all the objects are always respected).
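Two of the points above lend themselves to a short numerical illustration: the bound on the blocking time caused by nonpreemption, and the intuitive rule that tighter deadlines get lower identifiers. The Python sketch below shows only the ordering step of a deadline monotonic assignment; it does not perform the feasibility analysis discussed in [TIN]. The function names and the sample message set are hypothetical.

```python
def worst_case_blocking(bit_rate, max_frame_bits=135):
    """Longest time (s) an urgent message can wait for a frame already on the bus.

    max_frame_bits defaults to the 135-bit upper bound quoted in the text
    for standard frames, stuff bits included.
    """
    return max_frame_bits / bit_rate


def assign_identifiers(deadlines, first_id=0):
    """Deadline monotonic ordering: the tighter the deadline, the lower the identifier.

    deadlines maps an object name to its relative deadline in seconds.
    Returns a dict mapping each object name to a CAN identifier.
    """
    ordered = sorted(deadlines, key=lambda name: deadlines[name])
    return {name: first_id + rank for rank, name in enumerate(ordered)}


# Hypothetical message set (deadlines in seconds).
EXAMPLE = {"wheel_speed": 0.010, "brake_request": 0.002, "cabin_temp": 0.100}
```

For this sample set, `assign_identifiers(EXAMPLE)` gives `brake_request` identifier 0, `wheel_speed` identifier 1, and `cabin_temp` identifier 2. At 1 Mbit/s, `worst_case_blocking(1e6)` evaluates to 135 µs, which is the longest a pending top-priority frame has to wait for a transmission that has already started.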

13.3.2 Drawbacks

There are a number of drawbacks that affect CAN, the most important being related to performance, determinism, and dependability. Though they were initially considered mostly irrelevant, as time elapses they are becoming quite limiting in a number of application fields.

13.3.2.1 Performances

Even though inherently elegant, the arbitration technique of CAN poses serious limitations on the performance that can be obtained by the network. In fact, in order for the arbitration mechanism to operate correctly, it is necessary for the signal to be able to propagate from a node located at one end of the bus up to the farthest node (at the other end) and come back before the originating node samples the level on the bus. Since the sampling point is located roughly after the middle of each bit (the exact position can be programmed by means of suitable registers), the end-to-end propagation delay, including the hardware delay of transceivers, must be shorter than about one quarter of the bit time (the exact value depending on the bit timing configuration in the CAN controller). As the propagation speed of signals is fixed (about 200 m/µs on copper wires), this implies that the maximum length allowed for the bus is necessarily limited and depends directly on the bit rate chosen for the network. For example, a 250 kbit/s CAN network can span at most 200 m. Similarly, the maximum bus length allowed when the bit rate is equal to 1 Mbit/s is only 40 m. This, to some degree, explains why the maximum bit rate allowed by CAN specifications [ISO1] has been limited to 1 Mbit/s.

It is worth noting that this limitation depends on physical factors, and hence it cannot be overcome in any way by advances in the technology of transceivers (to make a comparison, at present, several inexpensive communication technologies are available on the market that allow bit rates in the order of tens or hundreds of Mbit/s). Even though this can appear to be a very limiting factor, it will probably not have any relevant impact in the near future for several application areas (including automotive and process control applications) for which cheap and well-assessed technology is more important than performance.
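The relation between bit rate and bus length follows directly from the figures given above: a quarter of the bit time as the delay budget and a propagation speed of about 200 m/µs. The Python sketch below turns that rule of thumb into a function. The function name is hypothetical, and the 50 ns transceiver delay used in the last example is an assumed value, picked only to show how hardware delay eats into the budget; real limits depend on the bit timing configuration.

```python
PROPAGATION_SPEED = 200e6  # metres per second, i.e. about 200 m/µs on copper


def max_bus_length(bit_rate, hardware_delay=0.0, fraction=0.25):
    """Rough upper bound (m) on the bus length for a given bit rate (bit/s).

    fraction is the share of the bit time available for end-to-end
    propagation (about one quarter according to the text);
    hardware_delay is the transceiver delay (s) included in that budget.
    """
    bit_time = 1.0 / bit_rate
    budget = fraction * bit_time - hardware_delay
    return max(0.0, budget * PROPAGATION_SPEED)
```

Under these assumptions, `max_bus_length(250e3)` gives about 200 m, matching the figure quoted for 250 kbit/s. At 1 Mbit/s the bound is about 50 m with ideal transceivers and drops to about 40 m with the assumed 50 ns of hardware delay, `max_bus_length(1e6, hardware_delay=50e-9)`.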
However, there is no doubt that CAN will suffer in a couple of years from the higher bit rates of its competitors, i.e., PROFIBUS-DP (up to 12 Mbit/s), SERCOS (up to 16 Mbit/s), INTERBUS (up to 2 Mbit/s), FlexRay (up to 10 Mbit/s), or the networks based on Industrial Ethernet (up to 100 Mbit/s). Such solutions, in fact, are able to provide a noticeably higher data rate, which is necessary for the systems that have a lot of devices and very short cycle times (1 ms or less).

13.3.2.2 Determinism

Because of its nondestructive bitwise arbitration scheme, CAN is able to resolve in a deterministic way any collision that might occur on the bus. However, if nodes are allowed to produce asynchronous messages on their own (this is the way event-driven systems usually operate), there is no way to know in advance the exact time a given message will be sent. This is because it is not possible to foresee the actual number of collisions a node will experience with higher-priority messages. This behavior leads to potentially dangerous jitters, which in some kinds of applications, such as, for example, those in the automotive field, might affect the control algorithms in a negative way and worsen their precision. In particular, it might happen that some messages miss their intended deadlines.

Related to determinism is the problem that composability is not ensured in CAN networks. This means that when several subsystems are connected to the same network, the overall system may fail to satisfy some timing requirement, even though each subsystem was tested separately and proved to behave correctly. This severely limits the possibility of integrating subsystems from different vendors, and hence makes the design tasks more difficult.

13.3.2.3 Dependability

The last drawback of CAN concerns dependability.
Whenever safety-critical applications are considered, where a communication error may lead to damage to the equipment or even injuries to human beings, such as, for example, in automotive x-by-wire systems, a highly dependable network has to be adopted. Reliable error detection should be achieved both in the value and in the time domain. In the former case, conventional techniques such as, for example, the use of a suitable CRC are adequate. In the latter case, a time-triggered approach is certainly more appropriate than the event-driven communication scheme provided by CAN. In time-triggered systems all actions (including message exchanges, sampling of sensors, actuation of commanded values, and task activations) are known in advance and must take place at precise points in time. In this context even the presence (or absence) of a message at a given instant provides significant information (i.e., it enables the discovery of faults).

Also related to dependability issues is the so-called babbling idiot problem, from which a CAN system might suffer. In fact, a faulty node that repeatedly transmits a very high priority message on the bus can block the whole network. Such a failure cannot be detected by the fault confinement unit embedded in CAN chips, as it does not depend on physical faults, but is due to logical errors.

13.3.3 Solutions

Among the possible solutions conceived to enhance the behavior of CAN is the so-called time-triggered CAN (TTCAN) protocol [ISO4], for which the first chips are already available. By adopting a common clock and a time-triggered approach it is possible to reduce jitters and provide a fully deterministic behavior. If asynchronous transmissions are not allowed in the system (which means that the arbitration technique is not actually used), TTCAN effectively behaves like a TDMA system, and thus there is no particular limitation on the bit rate (which could be increased above the theoretical limit of CAN). However, such a solution is generally not advisable, in that the behavior of the resulting network becomes noticeably different from CAN.

Other solutions have appeared in the literature for improving CAN performance, such as, for example, WideCAN [WCAN], which provide higher bit rates and still rely on the conventional CAN arbitration technique. However, at present their interest is mainly theoretical.

13.4 Time-Triggered CAN

The time-triggered CAN protocol was introduced by Bosch in 1999 with the aim of making CAN suitable for the new needs of the automotive industry. However, it can be profitably used in those applications characterized by tight timing requirements that demand strictly deterministic behavior. In TTCAN, in fact, it is possible to decide exactly the point in time when safety-critical messages will be exchanged, irrespective of the network load. Moreover, composability is much improved with respect to CAN, so that it is possible to split a system into several subsystems that can be developed and tested separately.

The TTCAN specification is now stable and is being standardized by ISO [ISO4]. The main reason that led to the definition of TTCAN was the need to provide improved communication determinism while maintaining the highest degree of compatibility with the existing CAN devices and development tools. In this way, noticeable savings in the investments for the communication technology can be achieved.

13.4.1 Main Features

One of the most appealing features of TTCAN is that it allows event-driven and time-triggered operations to coexist in the same network. To ease migration from CAN, TTCAN foresees two levels of implementation, known as levels 1 and 2, respectively. Level 1 implements basic time-triggered communications over CAN. Level 2, which is a proper extension of level 1, also offers a means for maintaining a global system time across the whole network, irrespective of tolerances and drifts of the local oscillators. This enables high-end synchronization, and hence true time-triggered operations can take place in the system.

The TTCAN protocol is placed above the (unchanged) CAN protocol. It allows time-triggered exchanges to take place in a quasi-conventional CAN network. Because TTCAN relies on CAN directly (they adopt the same frame format and the same transmission protocol), it suffers from the same performance limitations as the underlying technology. In particular, it is not practically feasible to increase the transmission speed above 1 Mbit/s. However, because of the time-triggered paradigm it relies on, TTCAN is able to ensure strictly deterministic communications, which means that, for example, it is suitable for the first generation of drive-by-wire automotive systems, which are provided with hydraulic/mechanical backups. However, it will likely be unsuitable for the next generation of steer-by-wire applications. In these cases, in fact, the required bandwidth is noticeably higher.

13.4.2 Protocol Specification

TTCAN is based on a centralized approach, where a special node called the time master (TM) keeps the whole network synchronized by regularly broadcasting a reference message (RM), usually implemented as a high-priority CAN message. Redundant time masters can be envisaged to provide increased reliability.

Whenever it receives RM, each node restarts its cycle timer, so that a common view of the elapsing time is ensured across the whole network. In practice, every time a SOF bit is read on the bus, a synchronization event is generated in every network controller that causes the local time to be copied into a sync mark register. If the SOF bit is related to a valid reference message, the sync mark register is then loaded into the reference mark register. At this point, the cycle time is evaluated as the difference between the current local time and the reference mark.

Two kinds of RM are foreseen: in level 1 implementations RM is 1 byte long, whereas level 2 relies on a 4-byte RM that is backward compatible with level 1 (from a practical point of view, 3 bytes are added for distributing the global time as seen by the time master).

Protocol execution is driven by the progression of the cycle time. In particular, a number of time marks are defined in each network controller as either transmission or receive triggers, which are used for sending messages and validating message receptions, respectively. In TTCAN each node does not have to know all the messages in the network. Instead, only details of the messages the node sends or reads are needed.

Transmission of data is organized as a sequence of basic cycles (BCs). Each basic cycle begins with the reference message, which is followed by a fixed number of time windows that are configured offline and can be of the following three types:

• Exclusive windows: Each exclusive window is statically reserved to a predefined message, so that collisions cannot occur. They are used for safety-critical data that have to be sent deterministically and without jitters.
• Arbitration windows: Such windows are not preallocated to any given message; thus, different competing messages will rely on the nondestructive CAN arbitration scheme to resolve any possible collision that might occur.
• Free windows: They are reserved for future expansions of TTCAN systems.

So that time windows are not exceeded, in TTCAN controllers it should be possible to disable the automatic retransmission feature of CAN when either the contention is lost or transmission errors are detected. The only exception occurs when several adjacent arbitration windows exist. In this case, they can be merged to provide a single larger window, which can accommodate asynchronously generated messages in a more flexible way.

Although it seamlessly mixes both synchronous (exclusive) and asynchronous (arbitration) messages, TTCAN is very dependable: in fact, should there be a temporary lack of synchronization and more than one node tries to transmit in the same exclusive window, the arbitration scheme of CAN is used to solve the collision.

For increased flexibility, it is possible to have more than one basic cycle. A system matrix can be defined that consists of up to 64 different BCs, which are repeated periodically (see Figure 13.6). Thus, the effective periodicity in TTCAN is given by the so-called matrix cycle. A cycle counter (included in the first byte of RM) is used by every node to determine the current basic cycle. It is increased each cycle up to a maximum value (which is selected on a network-wide basis before operation is started), after which it is restarted.

It should be noted that the system matrix is highly column oriented. In particular, each BC is made up of the same sequence of time windows; i.e., corresponding windows in different BCs have the same duration. However, they can be used to convey different messages, depending on the cycle counter.
FIGURE 13.6 System matrix in TTCAN.

In this way, it is possible to have messages in exclusive time windows that are repeated once every given number of BCs. In this case, each message is assigned a repeat factor and a cycle offset, which characterize its transmission schedule. In the same way, it is possible to have more than one exclusive window in the BC allocated to the same message. This is useful either to replicate critical data or to obtain a refresh rate for some variables that is faster than the basic cycle.
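The timing rules of this section can be condensed into a few helper functions. The Python sketch below is illustrative only: the function names are hypothetical, local times are plain integers with no counter wraparound, and the rule "a message is sent when the cycle counter modulo the repeat factor equals the cycle offset" is an assumption about how the two parameters combine, since the text only states that they characterize the schedule.

```python
def cycle_time(local_time, reference_mark):
    """Cycle time as described in the text: local time minus reference mark."""
    return local_time - reference_mark


def sent_in_cycle(cycle_count, repeat_factor, cycle_offset):
    """True if a message with the given schedule is due in this basic cycle.

    Assumed rule: the message is due when
    cycle_count modulo repeat_factor equals cycle_offset.
    """
    if repeat_factor < 1 or not 0 <= cycle_offset < repeat_factor:
        raise ValueError("need repeat_factor >= 1 and 0 <= cycle_offset < repeat_factor")
    return cycle_count % repeat_factor == cycle_offset


def schedule(matrix_cycles, repeat_factor, cycle_offset):
    """List the basic cycles of one matrix cycle in which the message is sent."""
    return [
        count
        for count in range(matrix_cycles)
        if sent_in_cycle(count, repeat_factor, cycle_offset)
    ]
```

For a system matrix of eight basic cycles, `schedule(8, 4, 1)` yields basic cycles 1 and 5, so the corresponding exclusive window is free to carry other messages in the remaining cycles. A repeat factor of 1 with offset 0 describes a message sent in every basic cycle.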

13.4.3 Implementation

TTCAN requires slight and inexpensive changes to the current CAN chips. In particular, transmit and receive triggers and a counter for managing the cycle time are needed for ensuring time-triggered operations. Even though level 1 could be implemented in software, specialized hardware support can noticeably reduce the burden on the processor for managing time-triggered operations. As level 2-compliant controllers should allow drift correction and calibration of the local time, they need modified hardware.

The structure of TTCAN modules is very similar to that of conventional CAN modules. In particular, two additional blocks are needed: the trigger memory and the frame synchronization entity. The former is used for storing the time marks of the system matrix. They are linked to the message buffers held in the controller's memory. The latter is used to control the time-triggered communications. At present, there are several controllers available off-the-shelf that comply with TTCAN specifications, so that this protocol can be readily embedded in new projects.

13.5 CAN-Based Application Protocols

To reduce the costs of designing and implementing automated systems, a number of higher-level application protocols have been defined in the past few years that rely on the CAN data link layer to exchange messages among the nodes (all the functions of the data link layer of CAN are implemented directly in hardware in the current CAN controllers, which increases the efficiency and reliability of the data exchanges). The aim of such protocols is to provide a usable and well-defined set of service primitives that can be used to interact with the field devices in a standardized way.

At present, two of the most widely available solutions for the process control and automated manufacturing environments are CANopen [COP] and DeviceNet [DNET]. Both of them define an object model that describes the behavior of devices. This permits interoperability and interchangeability among devices coming from different manufacturers. In fact, as long as a device conforms to a given profile, it can be used in place of any other device (of a different brand) that adheres to the same profile.

13.5.1 CANopen

CANopen was originally conceived to rely on the communication services provided by the CAN application layer (CAL). However, the latest specifications [DS301] no longer refer explicitly to CAL. Instead, the relevant communication services have been embedded directly in the CANopen documents.

In CANopen, information is exchanged by means of communication objects (COBs). A number of different COBs are foreseen, which are aimed at different functions:

• Process data objects (PDOs), used for real-time exchanges such as, for example, measurements read from sensors and commanded values sent to the actuators for controlling the physical system
• Service data objects (SDOs), used for non-real-time communications, i.e., parameterization of devices and diagnostics
• Emergency objects (EMCY), used by devices to notify the control application that some error condition has occurred
• Synchronization object (SYNC), used to achieve synchronized and coordinated operations in the system

Even though in principle every CAN node is a master, at least from the point of view of the MAC mechanism, CANopen systems often rely on a master–slave approach, so as to simplify system configuration and network management. In most cases, in a CANopen network there is only one application master (which is responsible for actually controlling the operations of the automated system) and up to 127 slave devices (sensors and actuators). Each device is identified by means of a unique 7-bit address, called the node identifier, which lies in the range of 1 to 127. The node identifier 0 is used in general for broadcast communications.

To ease network configuration, a predefined master–slave connection set must be provided by every CANopen device. It is a standard allocation scheme of identifiers to COBs that is available directly after initialization, when a node is switched on or reset, provided that no modifications have been stored in a nonvolatile memory of the device. COB identifiers in the predefined connection set are made up of a function code, which takes the four most significant bits of the CAN identifier, followed by the node address. The function code, on which the priority of the COB mainly depends, is used to discriminate among the different kinds of COBs, that is, PDOs, SDOs, EMCYs, network management (NMT) functions, and so on.

13.5.1.1 Object Dictionary

The behavior of any CANopen device is described completely by means of a number of objects, each one tackling a particular aspect related to either the communications on the CAN bus or the functions available to interact with the physical controlled system (for example, there are objects that define the device type, the manufacturer's name, the hardware and software versions, and so on). All the objects relevant to a given node are stored in the object dictionary (OD) of that node. Entries in the OD are addressed by means of a 16-bit index. Each entry, in turn, can either be represented by a single value or consist of several components that are accessible through an 8-bit subindex (such as the arrays and records).

The object dictionary is split into four separate parts, according to the index of entries. Entries below 1000H are used to specify data types. Entries from 1000H to 1FFFH are used to describe communication-specific parameters (i.e., the interface of the device as seen by the CAN network). Entries from 2000H to 5FFFH can be used by manufacturers to extend the basic set of functions of their devices. Their use has to be considered carefully, in that they could make devices no longer interoperable. Finally, entries from 6000H to 9FFFH are used to describe in a standardized way all aspects related to a specific category of devices (as defined in a device profile).

13.5.1.2 Process Data Objects

All the real-time process data involved in controlling a physical system are exchanged in CANopen by means of PDOs.
Each PDO is mapped on exactly one CAN frame, so that it can be exchanged quickly and reliably. As a direct consequence, the amount of data that can be exchanged with one PDO is limited to 8 bytes at most. In most cases, this is more than sufficient to encode an item of process data.

According to the predefined connection set, each node in CANopen can define up to four receive PDOs (from the application master to the device) and four transmit PDOs (from the device to the application master). In case more PDOs are needed, the PDO communication parameter entries in the OD of the device can be used to define additional messages, or to change the existing ones. By using the PDO mapping parameter (if supported by the device) it is even possible to define in the configuration phase which application objects (i.e., process variables) will be included in each PDO.

The transmission of PDOs from the slave devices can be triggered by some local event taking place on the node, including the expiration of some time-out, or it can be remotely requested from the master. This gives system designers a very high degree of flexibility in choosing how devices interact in the automated system, and enables the features offered by intelligent devices to be exploited better.

No additional control information is added to PDOs by CANopen, so that communication efficiency is as high as in CAN. This means that the meaning of each PDO is determined directly by the related identifier. As multicasting is allowed on PDOs, their transmission is unconfirmed; i.e., the producer has no way to determine whether the PDO has been read by all the intended consumers.

One noticeable feature of CANopen is that it can provide synchronous operations. In particular, it is possible to configure the transmission type of each single PDO so that its exchanges will be driven by the occurrence of the SYNC message, which is sent regularly by a node known as the sync master (which usually is the same node as the application master). Synchronous data exchanges, in this case, take place in periodic communication cycles.

When synchronous operations are selected, commanded values are not actuated by devices as soon as they are received, nor are sampled values transmitted immediately. Instead, as depicted in Figure 13.7, each time a SYNC message is read from the network, the PDOs received in the previous communication cycle are actuated by every output device. At the same time, all sensors will sample their input ports and the measured values will be sent as soon as possible in the next cycle. A synchronous window length parameter can be defined that specifies the latest time when it is certain that all commanded values have been made available to devices. After that time the processing of output values can be started.

Synchronous operations noticeably reduce the effect of jitters: in this case, in fact, system operations and timings are decoupled from the actual times PDOs are exchanged over the network. As the SYNC message is mapped on a high-priority frame, jitters are, at worst, the same as the duration of the longest CAN message.
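The identifiers of the four transmit and four receive PDOs come from the predefined connection set, in which a 4-bit function code is followed by the 7-bit node address. The Python sketch below shows how such an 11-bit COB identifier can be composed and decomposed. The helper names are hypothetical, and the function code values in the table are quoted from memory of the CANopen communication profile; they should be treated as assumptions and checked against [DS301] before use.

```python
# Assumed function codes of the predefined connection set (verify against [DS301]).
FUNCTION_CODES = {
    "NMT": 0b0000,
    "SYNC": 0b0001,
    "TPDO1": 0b0011,   # first transmit PDO (device to master)
    "RPDO1": 0b0100,   # first receive PDO (master to device)
    "SDO_TX": 0b1011,  # SDO, server to client
    "SDO_RX": 0b1100,  # SDO, client to server
}


def cob_id(function_code, node_id):
    """Build an 11-bit COB identifier: 4-bit function code, then 7-bit node address."""
    if not 0 <= function_code <= 0xF:
        raise ValueError("function code must fit in 4 bits")
    if not 0 <= node_id <= 127:
        raise ValueError("node identifier must fit in 7 bits")
    return (function_code << 7) | node_id


def split_cob_id(identifier):
    """Return the (function_code, node_id) pair encoded in an 11-bit identifier."""
    return (identifier >> 7) & 0xF, identifier & 0x7F
```

With these assumed codes, the first transmit PDO of node 5 gets identifier `cob_id(FUNCTION_CODES["TPDO1"], 5)`, i.e., 185H. Because the function code occupies the most significant bits, it dominates the outcome of arbitration, which is why the priority of a COB depends mainly on its kind and only marginally on the node address.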

FIGURE 13.7 Synchronous operation.
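The synchronous behavior described above can be illustrated with a minimal sketch. It is not taken from the CANopen specification: the class and method names (`SyncOutput`, `SyncInput`, `on_pdo`, `on_sync`, `next_pdo`) are hypothetical, and the network itself is not modeled. The sketch only shows that a commanded value is buffered until the next SYNC and that an input is latched at SYNC and transmitted afterward.

```python
class SyncOutput:
    """Hypothetical output device: buffers commanded values until SYNC."""

    def __init__(self):
        self.pending = None  # value received during the current cycle
        self.actual = None   # value currently driving the output

    def on_pdo(self, value):
        # A received PDO is only stored; it is not actuated yet.
        self.pending = value

    def on_sync(self):
        # At SYNC, the value received in the previous cycle is actuated.
        if self.pending is not None:
            self.actual = self.pending
            self.pending = None


class SyncInput:
    """Hypothetical sensor: latches its input port at SYNC."""

    def __init__(self, read_port):
        self.read_port = read_port  # callable returning the current reading
        self.sample = None

    def on_sync(self):
        # All sensors sample at the same instant, marked by SYNC.
        self.sample = self.read_port()

    def next_pdo(self):
        # The latched sample is what gets sent in the following cycle.
        return self.sample


out = SyncOutput()
out.on_pdo(42)
value_before_sync = out.actual  # still None: nothing actuated yet
out.on_sync()
value_after_sync = out.actual   # now 42
```

Because actuation and sampling are tied to SYNC rather than to PDO arrival, the exact time at which each PDO crosses the bus does not affect the timing seen by the process.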


13.5.1.3 Service Data Objects

SDOs are used in CANopen for parameterization and configuration, which usually take place at a lower priority than process data (hence, they are effectively considered non-real-time exchanges). In this case a confirmed transmission service has to be provided, which ensures a reliable exchange of information. Furthermore, SDOs are only available on a peer-to-peer communication basis (multicasting is not allowed). A fragmentation protocol has been adopted for SDOs (derived from the domain transfer services of CAL) so that information of any size can be exchanged. This means that the SDO sender has to split the information into smaller chunks, which are then reassembled at the receiving side. This affects the communication efficiency in a negative way. However, as SDOs are not used for the real-time control of the system, this is not a problem.

SDOs are used to access the entries of the object dictionary directly, so that they can be read or modified by the configuration tools. From a practical point of view, two services are provided, which are used to upload and download the content of one subentry of the OD, respectively. According to the predefined connection set, each node must provide SDO server functionalities and has to define a pair of COB IDs for dealing with the OD access, one for each direction of transfer. In a CANopen network only one SDO client at a time is usually allowed (in reality, what is needed is that all SDO connections between clients and servers be defined statically). It is optionally possible to provide dynamic establishment of additional SDO connections by means of a network entity called the SDO manager.

13.5.1.4 Network Management

There are two kinds of functions related to network management (NMT): node control and error control. Node control services are used to control the operation of either a single node or the whole network. For example, they can be used to start or stop nodes, to reset their state, or to put a node in configuration (preoperational) mode. Such commands are definitely time critical, and hence they use the highest-priority communication object available in CAN.

Error control services are used to monitor the correct operation of the network. Two mechanisms are available: node guarding and heartbeat. In both cases, low-priority messages are exchanged periodically in the background over the network by the different nodes and suitable watchdogs are defined, both in the NMT master and in slave nodes. Should one device cease sending these messages, after a given time-out the network management layer is made aware of the problem and can take the appropriate actions.

13.5.1.5 Device Profiles

In order to provide interoperability, a number of device profiles have been standardized in CANopen. Each profile describes the common behavior of a particular class of devices and is usually described in a separate document. Among the available profiles are the following:

• I/O devices [DS401], which include both digital and analog input/output devices
• Drives and motion control, which are used to describe digital motion products, such as stepper motors and servo-drives
• Human machine interfaces, which describe the use of displays and operator interfaces
• Measuring devices and closed-loop controllers, which measure and control physical quantities
• IEC 61131-3 programmable device, which describes the behavior of programmable logic controllers (PLCs) and intelligent devices
• Encoders, which define incremental/absolute linear and rotary encoders to measure both position and velocity

The I/O device profile, for instance, permits the definition of the polarity of each digital input/output port, or the application of a filtering mask for disabling selected bits. Device ports can be accessed in groups of 1, 8, 16, or 32 bits. For analog devices, it is possible to use the raw value or a converted one (after a scaling factor and an offset have been applied), or to define triggering conditions when specific thresholds are exceeded.
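The kind of processing that the I/O device profile makes configurable can be sketched as follows. This is an illustration only, not the DS 401 definition: the function names are hypothetical, a set bit in the filter mask is assumed to keep the corresponding input enabled, and the analog conversion is assumed to be raw value times scaling factor plus offset.

```python
def process_digital_inputs(raw: int, polarity_mask: int,
                           filter_mask: int, width: int = 8) -> int:
    """Apply polarity inversion, then disable the bits cleared in filter_mask.

    Assumption: a 1 in polarity_mask inverts that bit, and a 1 in
    filter_mask keeps that bit enabled.
    """
    full = (1 << width) - 1
    inverted = (raw ^ polarity_mask) & full
    return inverted & filter_mask


def scale_analog(raw: int, factor: float, offset: float) -> float:
    """Convert a raw analog value (assumed order: scale first, then offset)."""
    return raw * factor + offset


def exceeds_upper_limit(value: float, upper_limit: float) -> bool:
    """Triggering condition: true when the value is above the threshold."""
    return value > upper_limit


digital = process_digital_inputs(0b10101010, polarity_mask=0b00001111,
                                 filter_mask=0b11110011)
converted = scale_analog(1000, factor=0.5, offset=-20.0)
trigger = exceeds_upper_limit(converted, upper_limit=400.0)
```

In a real device these parameters would be entries in the object dictionary, written by a configuration tool through SDOs rather than passed as function arguments.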


13.5.2 DeviceNet

DeviceNet [DNET] is a very flexible protocol for use at the field level in automated environments. The implementation of devices that comply with DeviceNet is, in general, slightly more complex than that for CANopen devices. However, DeviceNet offers a number of additional features with respect to CANopen, which can be used, for example, in complex multimaster networks. One appealing feature of DeviceNet is that it is based on the same Control and Information Protocol (CIP) adopted by ControlNet and EtherNet/IP. This means that a good level of interoperability is ensured among these networks, making it possible to interconnect them to provide seamless communications from devices at the plant floor up to the Internet.

In addition to the services at the application level, the DeviceNet specification also defines the physical layer in detail, including aspects such as connectors and cables (thin, thick, and flat cables are foreseen). It should be noted that the cable in DeviceNet can be used for both the signal and power supply (by using 4 wires plus ground). Each DeviceNet network can include up to 64 different devices, which means that each node is identified by means of a 6-bit MAC ID. The allowable bit rates are limited to 125, 250, and 500 Kbit/s, which means that the permitted maximum bus extensions lie in the range of 100 to 500 m.

13.5.2.1 Object Model

The behavior and functions of each device are described in detail in DeviceNet by means of objects. In particular, three kinds of objects are foreseen: communication, system, and application-specific objects. Two very important objects are the connection object, which defines all aspects related to a connection (including the CAN identifier and the triggering mode), and the application object, which defines the standardized behavior of a class of devices.

Data and services made available by each device are addressed by means of a hierarchical addressing scheme that is based on the following components: MAC ID (i.e., the device’s address), class ID, instance ID, attribute ID, and service code. The class, instance, and attribute identifiers are usually specified on 8 bits, while the service code is made up of a 7-bit integer.

13.5.2.2 Communication Model

Communication among nodes (either point-to-point or multicast) takes place according to a connection-oriented scheme. By using the standard 11-bit CAN identifier, it is possible to provide an addressing scheme based on four message groups, in decreasing order of priority:

• Message group 1 includes the highest-priority identifiers and permits up to 16 different messages per node.
• Message group 2 essentially refers to the predefined master–slave connection set.
• Message group 3 is similar to message group 1, but it is made up of low-priority frames.
• Message group 4 is primarily used for network management.

Basically, two kinds of communication are possible: explicit messages and I/O messages. Explicit messages are used for general data exchanges among devices, such as configuration, management, and diagnostics. These kinds of exchanges take place on the network at a low priority. I/O messages are used to exchange high-priority real-time messages according to the producer–consumer model. Because the underlying communication system is based on a CAN network, each frame can include 8 bytes at most. Should one item of data exceed this size, a fragmentation protocol has been defined in DeviceNet that manages message splitting and the successive reassembly.
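The splitting and reassembly that any fragmentation protocol over 8-byte CAN frames must perform can be sketched generically. The one-byte header used here (a last-fragment flag in bit 7 and a 6-bit fragment counter) is an illustrative assumption and is not the DeviceNet wire format; the function names are likewise hypothetical.

```python
def fragment(payload: bytes, frame_size: int = 8) -> list:
    """Split payload into frames of at most frame_size bytes.

    Each frame starts with a hypothetical one-byte header:
    bit 7 = last-fragment flag, bits 0-5 = fragment counter.
    """
    chunk = frame_size - 1  # one byte of each frame is used by the header
    pieces = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    if not pieces:
        pieces = [b""]
    frames = []
    for count, piece in enumerate(pieces):
        last = count == len(pieces) - 1
        header = (0x80 if last else 0x00) | (count & 0x3F)
        frames.append(bytes([header]) + piece)
    return frames


def reassemble(frames: list) -> bytes:
    """Rebuild the payload, checking the fragment counter on the way."""
    data = bytearray()
    for expected, frame in enumerate(frames):
        header, piece = frame[0], frame[1:]
        if (header & 0x3F) != (expected & 0x3F):
            raise ValueError("fragment out of sequence")
        data += piece
        if header & 0x80:  # last fragment reached
            return bytes(data)
    raise ValueError("final fragment missing")


message = bytes(range(20))
frames = fragment(message)
restored = reassemble(frames)
```

The sketch also makes the efficiency cost visible: with one header byte per frame, only 7 of the 8 data bytes carry payload, which is acceptable for configuration traffic but is one reason process data are kept within a single frame.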

References

[COP] European Committee for Electrotechnical Standardization, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 4: CANopen, EN 50325-4, 2001.
[DNET] European Committee for Electrotechnical Standardization, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 2: DeviceNet, EN 50325-2, 2000.
[DS102] CAN in Automation International Users and Manufacturers Group e.V., CAN Physical Layer for Industrial Applications: Two-Wire Differential Transmission, CiA DS 102, version 2.0, 1994.
[DS301] CAN in Automation International Users and Manufacturers Group e.V., CANopen: Application Layer and Communication Profile, CiA DS 301, version 4.02, 2002.
[DS401] CAN in Automation International Users and Manufacturers Group e.V., CANopen: Device Profile for Generic I/O Modules, CiA DS 401, version 2.1, 2002.
[ISO1] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 1: Data Link Layer and Physical Signalling, ISO 11898-1, 2003.
[ISO2] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 2: High-Speed Medium Access Unit, ISO 11898-2, 2003.
[ISO3] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 3: Low-Speed, Fault-Tolerant, Medium Dependent Interface, TC 22/SC 3/WG 1, ISO/PRF 11898-3, 2003.
[ISO4] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 4: Time-Triggered Communication, ISO 11898-4, 2004.
[TIN] Tindell K.W., Burns A., and Wellings A.J., Calculating Controller Area Network (CAN) messages response times, Control Engineering Practice, 3, 1163–1169, 1995.
[WCAN] Cena G. and Valenzano A., A multistage hierarchical distributed arbitration technique for priority-based real-time communication systems, IEEE Transactions on Industrial Electronics, 49, 1227–1239, 2002.


14
The CIP Family of Fieldbus Protocols

Viktor Schiffer
Rockwell Automation

14.1 Introduction
14.2 Description of CIP
Object Modeling • Services • Messaging Protocol • Communication Objects • Object Library • Device Profiles • Configuration and Electronic Data Sheets • Bridging and Routing • Data Management
14.3 Network Adaptations of CIP
DeviceNet • ControlNet • EtherNet/IP
14.4 Benefits of the CIP Family
Benefits for the Manufacturer of Devices • Benefits for the Users of Devices and Systems
14.5 Protocol Extensions under Development
CIP Sync • CIP Safety
14.6 Conclusion
References

14.1 Introduction

In the past, typical fieldbus protocols (e.g., Profibus, Interbus-S, FIP (Factory Instrumentation Protocol), P-Net, AS-i (Actuator/Sensor Interface)) have been isolated implementations of certain ideas and functionalities that the inventors thought were best suited to solve a certain problem or do a certain job. This has led to quite effective fieldbuses that do their particular job quite well, but they are optimized for certain layers within the automation pyramid or are limited in their functionality (e.g., strict single master systems running a Master/Slave protocol). This typically results in barriers within the automation architecture that are difficult to penetrate and that require complex gateway devices without being able to fully bridge the gap between the various systems that can be quite different in nature.

In contrast, the CIP™* family of protocols (CIP = Common Industrial Protocol) offers a scalable solution that allows a uniform protocol to be employed from the top level of an automation architecture down to the device level without burdening the individual devices. DeviceNet™*, introduced in 1994, is the first member of this protocol family. DeviceNet is a CIP implementation using the very popular Controller Area Network (CAN) data link layer. CAN in its typical form (ISO 11898 [11]) defines layers 1 and 2 of the OSI seven-layer model [14] only, while DeviceNet covers the rest. The low cost of implementation and the ease of use of the DeviceNet protocol have led to a large number of manufacturers, with many of them organized in the Open DeviceNet Vendor Association (ODVA; see http://www.odva.org).

*CIP™ and DeviceNet™ are trademarks of ODVA.


FIGURE 14.1 Relationship between CIP, its implementations, and the ISO/OSI layer model. [Layer diagram. User layer: device profiles (semiconductor, pneumatic valves, position controller, AC drives, other profiles). Application, presentation, and session layers: CIP application layer with the application object library; CIP data management services; CIP message routing and connection management (Explicit Messages, I/O Messages). Transport and network layers: ControlNet transport, DeviceNet transport, and encapsulation over TCP/UDP and IP. Data link layer: ControlNet CTDMA, CAN CSMA/NBA, Ethernet CSMA/CD. Physical layer: ControlNet, DeviceNet, and Ethernet physical layers. Possible future alternatives: ATM, USB, FireWire.]

ControlNet™,* introduced a few years later (1997), implemented the same basic protocol on new data link layers that allow for much higher speed (5 Mbps), strict determinism, and repeatability while extending the range of the bus (several kilometers with repeaters) for more demanding applications. Vendors and users of ControlNet products are organized within ControlNet International (CI; see http://www.controlnet.org) to promote the use of these products.

In 2000, ODVA and ControlNet International introduced the newest member of the CIP family, EtherNet/IP™,† where IP stands for Industrial Protocol. In this network adaptation, CIP runs over TCP/IP and therefore can be deployed over any Transmission Control Protocol (TCP)/Internet Protocol (IP)-supported data link and physical layers, the most popular of which is IEEE 802.3 [12], commonly known as Ethernet. The universal principles of CIP easily lend themselves to possible future implementations on new physical/data link layers, e.g., ATM, USB, or FireWire. The overall relationship between the three implementations of CIP and the ISO/OSI layer model is shown in Figure 14.1.

Two significant additions to CIP are currently being worked on: CIP Sync™ and CIP Safety™.‡ CIP Sync allows synchronization of applications in distributed systems through precision real-time clocks in all devices. These real-time clocks are kept in tight synchronization by background messages between clock masters and clock slaves using the new IEEE 1588:2002 standard [24]. A more detailed description of this CIP extension is given in Section 14.5.1. CIP Safety is a protocol extension that allows the transmission of safety-relevant messages. Such messages are governed by additional timing and integrity mechanisms that are guaranteed to detect system flaws to a very high degree, as required by international standards such as IEC 61508 [15]. If anything goes wrong, the system will be brought to a safe state, typically taking the machine to a standstill. A more detailed description of this CIP extension is given in Section 14.5.2. In both cases, ordinary devices can operate with CIP Sync or CIP Safety devices side by side in the same system. There is no need for strict segmentation into Standard, Sync, and Safety networks. It is even possible to have any combination of all three functions in one device.

*ControlNet™ is a trademark of ControlNet International.
†EtherNet/IP™ is a trademark of ControlNet International under license by ODVA.
‡CIP Sync™ and CIP Safety™ are trademarks of ODVA.

14.2 Description of CIP

CIP is a very versatile protocol that has been designed with the automation industry in mind. However, due to its very open nature, it can be applied to many more areas. The overall CIP Specification is divided into several volumes:

• Volume 1 is the CIP Specification. It contains all general parts of the specification that apply to all the network variants.
• Volume 2 is the EtherNet/IP Specification. It contains the adaptation of CIP to the Ethernet TCP/IP and User Datagram Protocol (UDP)/IP transportation layers and all details that apply specifically to EtherNet/IP, including extensions and any modifications of the CIP Specification.
• Volume 3 is the DeviceNet Specification. It contains the adaptation of CIP to the CAN data link layer and all details that apply specifically to DeviceNet, including extensions and any modifications of the CIP Specification.
• Volume 4 is the ControlNet Specification. It contains the adaptation of CIP to the ControlNet data link layer and all details that apply specifically to ControlNet, including extensions and any modifications of the CIP Specification.
• Volume 5 will contain CIP Safety; it is planned to be published in early 2005.

The CIP Specification [4] is available from ODVA. It is beyond the scope of this handbook to fully describe each and every detail of this specification, but the key features will be presented. The specification is subdivided into several chapters and appendices that describe the following features:

• Object modeling
• Messaging protocol
• Communication objects
• General object library
• Device profiles
• Electronic Data Sheets
• Services
• Bridging and routing
• Data management

There are a few more chapters containing descriptions of further CIP elements, but they are not of significance in the context of this book. A few terms used throughout this section should be described here to ensure they are well understood:

• Client: Within a Client/Server architecture, the client is the device that sends a request to a server. The client expects a response from the server.
• Server: Within a Client/Server architecture, the server is the device that receives a request from a client. The server is expected to give a response to the client.
• Producer: Within a Producer/Consumer architecture, the producing device places a message on the network for consumption by one or several consumers. The produced message is in general not directed to a specific consumer.
• Consumer: Within a Producer/Consumer architecture, the consumer is one of potentially several consuming devices that pick up a message placed on the network by a producing device.
• Producer/Consumer model: CIP makes use of the Producer/Consumer model as opposed to the traditional Source/Destination message addressing scheme (Figure 14.2). It is inherently multicast. Nodes on the network determine if they should consume the data in a message based on the Connection ID in the packet.
• Explicit Message: Explicit Messages contain addressing and service information that directs the receiving device to perform a certain service (action) on a specific part (e.g., an attribute) of a device.
• Implicit (Input/Output (I/O)) Message: Implicit Messages do not carry address or service information; the consuming node(s) already know what to do with the data based on the Connection ID that was assigned when the connection was established. They are called Implicit Messages because the meaning of the data is implied by the Connection ID.

FIGURE 14.2 Source/Destination vs. Producer/Consumer model. [Frame layouts. Source/Destination: src, dst, data, crc. Producer/Consumer: identifier, data, crc.]

Let us now have a look at the individual elements of CIP.
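The Producer/Consumer terms defined above can be illustrated with a minimal in-memory sketch. The `Network` class and its methods are hypothetical and model no real CIP stack; the point is only that a produced message carries a Connection ID rather than a destination address, and that each node decides by itself whether to consume it.

```python
from collections import defaultdict


class Network:
    """Hypothetical broadcast medium keyed by Connection ID."""

    def __init__(self):
        # Connection ID -> list of consumer callbacks
        self.consumers = defaultdict(list)

    def subscribe(self, connection_id, callback):
        # A consumer registers interest in one Connection ID.
        self.consumers[connection_id].append(callback)

    def produce(self, connection_id, data):
        # The producer names no destination; every consumer that knows
        # this Connection ID picks up the same message (multicast).
        receivers = self.consumers.get(connection_id, [])
        for callback in receivers:
            callback(data)
        return len(receivers)


net = Network()
drive_inbox, hmi_inbox = [], []
net.subscribe(0x3C1, drive_inbox.append)
net.subscribe(0x3C1, hmi_inbox.append)

delivered = net.produce(0x3C1, b"\x01\x02")  # consumed by both nodes
ignored = net.produce(0x200, b"\xff")        # no consumer for this ID
```

In a Source/Destination scheme the same data would have to be sent twice, once to each destination address; here a single transmission serves both consumers.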

14.2.1 Object Modeling

CIP makes use of abstract object modeling to describe:

• The suite of available communication services
• The externally visible behavior of a CIP node
• A common means by which information within CIP products is accessed and exchanged

Every CIP node is modeled as a collection of objects. An object provides an abstract representation of a particular component within a product. Anything not described in object form is not visible through CIP. CIP objects are structured into classes, instances, and attributes. A class is a set of objects that represent the same kind of system component. An object instance is the actual representation of a particular object within a class. Each instance of a class has the same attributes, but it has its own particular set of attribute values. As Figure 14.3 illustrates, multiple object instances within a particular class can reside within a CIP node.

FIGURE 14.3 A class of objects. [Diagram: a CIP node containing a class of objects with several object instances.]

In addition to the instance attributes, an object class may also have class attributes. These are attributes that describe properties of the whole object class, e.g., how many instances of this particular object exist. Furthermore, both object instances and the class itself exhibit a certain behavior and allow certain services to be applied to the attributes, instances, or whole class. All publicly defined objects that are implemented in a device must follow at least the mandatory requirements of the CIP specification. Vendor-specific objects may also be defined with a set of instances, attributes, and services according to the requirements of the vendor. However, they need to follow certain rules described in Chapter 4 of the CIP Specification [4].

The objects and their components are addressed by a uniform addressing scheme consisting of:

• Node Identifier: An integer identification value assigned to each node on a CIP network. On DeviceNet and ControlNet, this is also called MAC ID (Media Access Control Identifier) and is nothing more than the node number of the device. On EtherNet/IP the Node ID is the IP address.
• Class Identifier (Class ID): An integer identification value assigned to each object class accessible from the network.
• Instance Identifier (Instance ID): An integer identification value assigned to an object instance that identifies it among all instances of the same class.
• Attribute Identifier (Attribute ID): An integer identification value assigned to a class or instance attribute.
• Service Code: An integer identification value that denotes an action request that can be directed at a particular object instance or object class (see Section 14.2.2).

Object Class Identifiers are divided into open objects, defined in the CIP Specifications (ranging from 0x00 to 0x63 and 0x00F0 to 0x02FF), and vendor-specific objects (ranging from 0x64 to 0xC7 and 0x0300 to 0x04FF); all other Class Identifiers are reserved for future use. In some cases, e.g., within the Assembly Object class, Instance Identifiers are divided into open instances, defined in the CIP Specifications (ranging from 0x00 to 0x63 and 0x0100 to 0x02FF), and vendor-specific instances (ranging from 0x64 to 0xC7 and 0x0300 to 0x04FF); all other instance identifiers are reserved for future use. Attribute Identifiers are divided into open attributes, defined in the CIP Specifications (ranging from 0x00 to 0x63), and vendor-specific attributes (ranging from 0x64 to 0xC7); the other Attribute Identifiers are reserved for future use.

Vendor-specific objects can be created with a lot of freedom, but they still have to adhere to certain rules specified for CIP; e.g., they can use whatever Instance and Attribute IDs they wish, but their class attributes must follow the CIP Specification. Figure 14.4 shows an example of this object addressing scheme. More details on object modeling can be found in Chapters 1 and 4 of the CIP Specification [4].

FIGURE 14.4 Object addressing example. [Diagram: four nodes (Node ID #1 to #4) on a CIP link; the address "Node ID #4: Object Class #5: Instance #2: Attribute #2" selects one attribute of one instance of Object Class #5 on node 4, which also hosts Object Class #7.]
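The addressing scheme and the identifier ranges given above can be sketched in a few lines. The Class ID ranges are those quoted in the text; the nested-dictionary model of a network, the function names, and the stored attribute value are hypothetical and serve only to resolve the address shown in Figure 14.4.

```python
# Class ID ranges as quoted in the text (inclusive bounds).
OPEN_CLASS_RANGES = ((0x00, 0x63), (0x00F0, 0x02FF))
VENDOR_CLASS_RANGES = ((0x64, 0xC7), (0x0300, 0x04FF))


def classify_class_id(class_id: int) -> str:
    """Return 'open', 'vendor-specific', or 'reserved' for a Class ID."""
    if any(lo <= class_id <= hi for lo, hi in OPEN_CLASS_RANGES):
        return "open"
    if any(lo <= class_id <= hi for lo, hi in VENDOR_CLASS_RANGES):
        return "vendor-specific"
    return "reserved"


def read_attribute(network: dict, node_id: int, class_id: int,
                   instance_id: int, attribute_id: int):
    """Resolve Node ID : Class ID : Instance ID : Attribute ID."""
    return network[node_id][class_id][instance_id][attribute_id]


# Hypothetical model: node -> class -> instance -> attribute -> value.
network = {
    4: {
        5: {1: {2: "attribute 2 of instance 1"},
            2: {2: "attribute 2 of instance 2"}},
        7: {1: {}},
    },
}

# The address from Figure 14.4: Node ID #4, Class #5, Instance #2, Attribute #2.
value = read_attribute(network, 4, 5, 2, 2)
kind = classify_class_id(0x64)
```

A real device would reach the attribute through a service request (Section 14.2.2) rather than a direct lookup, but the four-part path is the same.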


14.2.2 Services

Service Codes are used to define the action that is requested to take place when an object or parts of an object are addressed through Explicit Messages using the addressing scheme described in Section 14.2.1. Apart from the simple read and write functions, a set of CIP Common Services (totaling 22, currently described in [4]) have been defined. These CIP Common Services are common in nature, which means that they can be used in all CIP networks and that they are useful for a large variety of objects. Furthermore, there are object-specific Service Codes that may have a different meaning for the same code, depending on the class of object. Finally, there is a possibility to define vendor-specific services according to the requirements of the developer. While this gives a lot of flexibility, the disadvantage of vendor-specific services is that they may not be understood universally. Complete details of the CIP Service Codes can be found in Appendix A of the CIP Common Specification [4].

14.2.3 Messaging Protocol

CIP is a connection-based protocol. A CIP Connection provides a path between multiple application objects. When a connection is established, the transmissions associated with that connection are assigned a Connection ID (CID) (Figure 14.5). If the connection involves a bidirectional exchange, then two Connection ID values are assigned.

FIGURE 14.5 Connections and Connection IDs.

The definition and format of the Connection ID is network dependent. For example, the Connection ID for CIP Connections over DeviceNet is based on the CAN Identifier field. Since most messaging on a CIP network is done through connections, a process has been defined to establish such connections between devices that are not connected yet. This is done through the Unconnected Message Manager (UCMM) function, which is responsible for the processing of Unconnected Explicit Requests and Responses.

The general method to establish a CIP Connection is by sending a UCMM Forward_Open Service Request Message. While this is the method used on ControlNet and EtherNet/IP (all devices that allow Connected Messaging support it), it is rarely used on DeviceNet so far. For DeviceNet, the simplified methods described in Sections 14.3.1.11 and 14.3.1.12 are typically used. DeviceNet Safety™* (see Section 14.5.2), on the other hand, fully utilizes this service.

*DeviceNet Safety™ is a trademark of ODVA.

A Forward_Open request contains all information required to create a connection between the originator and the target device and, if requested, a second connection between the target and the originator. In particular, the Forward_Open request contains information on the following:

• Time-out information for this connection
• Network Connection ID for the connection from the originator to the target
• Network Connection ID for the connection from the target to the originator
• Information on the identity of the originator (Vendor ID and Serial Number)
• (Maximum) data sizes of the messages on this connection
• Trigger mechanisms, e.g., Cyclic, Change of State (COS)
• Connection Path for the application object data in the node

The Connection Path may also contain a Routing Segment that allows connections to exist across multiple CIP networks. The Forward_Open request may also contain an electronic key of the target device (Vendor ID, Device Type, Product Code, Revision), as well as configuration information that will be forwarded to the Configuration Assembly of the target device. Some networks, like ControlNet and EtherNet/IP, may also make extensive use of Unconnected Explicit Messaging, while DeviceNet uses Unconnected Messaging only to establish connections.

All connections in a CIP network can be divided into I/O Connections and Explicit Messaging Connections:

• I/O Connections provide dedicated, special-purpose communication paths between a producing application and one or more consuming applications. Application-specific I/O data move through these ports and are often referred to as Implicit Messaging. These messages are typically multicast.
• Explicit Messaging Connections provide generic, multipurpose communication paths between two devices. These connections are often referred to as just Messaging Connections. Explicit Messages provide the typical Request/Response-oriented network communications. These messages are typically point-to-point.

The actual data transmitted in CIP I/O Messages are the I/O data in an appropriate format; they may be prepended by a Sequence Count value. This Sequence Count value can be used to distinguish old data from new, e.g., if a message has been re-sent as a heartbeat in a COS Connection. The two states Run and Idle can be indicated with an I/O Message either by prepending a Run/Idle header, used for ControlNet and EtherNet/IP, or by sending I/O data (Run) or no I/O data (Idle), mainly used for DeviceNet. Run is the normal operative state of a device; the reaction to receiving an Idle event is vendor-specific and application-specific. Typically, this means bringing all outputs of the device to an Idle state, and that typically means “off,” i.e., de-energized.

Explicit Messaging requests, on the other hand, contain a Service Code with path information to the desired object (attribute) within the target device followed by data (if any). The associated responses repeat the Service Code followed by status fields followed by data (if any). DeviceNet uses a condensed format for Explicit Messages, while ControlNet and EtherNet/IP use the full format. More details of the messaging protocol can be found in Chapter 2 of the CIP Specification [4].
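The role of the Sequence Count in a Change of State connection, as described above, can be sketched from the consumer's point of view. The class below is hypothetical and ignores the actual message encoding; it only shows how a re-sent heartbeat message with an unchanged Sequence Count can be told apart from genuinely new data.

```python
class CosConsumer:
    """Hypothetical consumer that filters repeats by Sequence Count."""

    def __init__(self):
        self.last_count = None  # Sequence Count of the last new data seen
        self.updates = []       # data items that were actually new

    def on_message(self, sequence_count: int, data: bytes) -> bool:
        """Return True if the message carried new data."""
        if sequence_count == self.last_count:
            # Same count as before: a heartbeat re-send of old data.
            return False
        self.last_count = sequence_count
        self.updates.append(data)
        return True


consumer = CosConsumer()
first = consumer.on_message(1, b"\x00")      # new data
heartbeat = consumer.on_message(1, b"\x00")  # re-sent as heartbeat
changed = consumer.on_message(2, b"\x01")    # state changed, count advanced
```

Even when a repeat carries no new data, its arrival still tells the consumer that the producer is alive, which is the purpose of the heartbeat in a COS connection.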

14.2.4 Communication Objects The CIP communication objects manage and provide the runtime exchange of messages. While these objects follow the overall principles and guidelines for CIP objects, the communication objects are unique in a way since they are the focal point for all CIP communication. It therefore makes sense to have a look at them in more detail. Every instance of a communication object contains a link producer part or a link consumer part, or both. I/O Connections may be either producing or consuming or producing and consuming, while Explicit Messaging Connections are always producing and consuming. Figure 14.6 and Figure 14.7 show the typical connection arrangement for CIP I/O Messaging and CIP Explicit Messaging. The attribute values in the Connection Objects define a set of attributes that describe vital parameters of this connection. Note that Explicit Messages are always directed to the Message Router Object. First of all, they state what kind of connection this is. They specify whether this is an I/O Connection or an Explicit Messaging Connection, but also the maximum size of the data to be exchanged across this connection, and the source and sink of this data. Note that Explicit Messages are always directed to the Message Router Object. Further attributes define the state of this connection and what kind of behavior this connection is to show. Of particular importance is how messages are triggered (from the application, through Change of State or Change of Data, through Cyclic events or network events) and the timing of the connections

© 2005 by CRC Press

14-8

The Industrial Communication Technology Handbook

I/O Connection

I/O Producing Application Object

Producing I/O Connection

I/O Consuming Application Object

Consuming I/O Connection

Device #2

Device #1 I/O Message

I/O Consuming Application Object

Consuming I/O Connection

Device #3

FIGURE 14.6

CIP I/O Multicast Connection.

FIGURE 14.7 CIP Explicit Messaging Connection. [Diagram: the application object in Device #1 exchanges request and response Explicit Messages through its Explicit Messaging Connection with the Explicit Messaging Connection in Device #2, which is linked to the Message Router and, through it, to the objects in Device #2.]

(time-out associated with this connection and predefined action if a time-out occurs). CIP allows multiple connections to coexist in a device, although simple devices, e.g., simple DeviceNet slaves, will typically only have one or two connections alive at any given point in time. Complete details of the communication objects can be found in Chapter 3 of the CIP Specification [4].
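The connection parameters described above can be pictured as a small record. The following is only a sketch under stated assumptions: the class and field names (`ConnectionSketch`, `timeout_ms`, `timeout_action`, and so on) are hypothetical and do not reproduce the attribute layout defined in the CIP Specification; the trigger list is limited to three of the trigger types named in the text. It merely encodes two rules stated in this section, namely that every connection has a producer part, a consumer part, or both, and that Explicit Messaging Connections are always producing and consuming.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ConnectionKind(Enum):
    IO = "I/O"
    EXPLICIT = "Explicit Messaging"


class Trigger(Enum):
    APPLICATION = "application"
    CHANGE_OF_STATE = "change of state"
    CYCLIC = "cyclic"


@dataclass
class ConnectionSketch:
    """Hypothetical summary of one connection (not the real attribute set)."""
    kind: ConnectionKind
    produces: bool        # link producer part present
    consumes: bool        # link consumer part present
    max_size: int         # maximum data size in bytes
    trigger: Trigger      # how messages are triggered
    timeout_ms: int       # time-out; the unit is an assumption of this sketch
    timeout_action: str   # predefined action if a time-out occurs

    def problems(self) -> List[str]:
        """Return the rule violations found, as plain-text messages."""
        found = []
        if not (self.produces or self.consumes):
            found.append("connection neither produces nor consumes")
        if self.kind is ConnectionKind.EXPLICIT and not (self.produces and self.consumes):
            found.append("Explicit Messaging Connections always produce and consume")
        return found


# A consuming-only I/O connection is allowed.
io_in = ConnectionSketch(ConnectionKind.IO, False, True, 8,
                         Trigger.CYCLIC, 100, "go to fault state")

# A producing-only Explicit Messaging Connection violates the second rule.
explicit_bad = ConnectionSketch(ConnectionKind.EXPLICIT, True, False, 32,
                                Trigger.APPLICATION, 1000, "close connection")
```

Calling `io_in.problems()` returns an empty list, while `explicit_bad.problems()` reports the missing consumer part.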

14.2.5 Object Library

The CIP family of protocols contains a very large collection of commonly defined objects (currently 48 object classes). The overall set of object classes can be subdivided into three types:

• General-use objects
• Application-specific objects
• Network-specific objects

Except for the network-specific objects, all objects are used in all three CIP network types. Figure 14.8 shows the general-use objects, Figure 14.9 shows a group of application-specific objects, and Figure 14.10 shows a group of network-specific objects. New objects are added on an ongoing basis. The general-use objects can be found in many different devices, while the application-specific objects are typically only found in devices hosting such applications.


• Identity Object, see Section 14.2.5.1
• Message Router Object
• Assembly Object, see Section 14.2.5.3
• Connection Object, see Section 14.2.4
• Connection Manager Object, see Section 14.2.4
• Register Object
• Parameter Object, see Section 14.2.5.2
• Parameter Group Object
• Acknowledge Handler Object
• Connection Configuration Object
• Port Object
• Selection Object
• File Object

FIGURE 14.8 General-use objects.

• Discrete Input Point Object
• Discrete Output Point Object
• Analog Input Point Object
• Analog Output Point Object
• Presence Sensing Object
• Group Object
• Discrete Input Group Object
• Discrete Output Group Object
• Discrete Group Object
• Analog Input Group Object
• Analog Output Group Object
• Analog Group Object
• Position Sensor Object
• Position Controller Supervisor Object
• Position Controller Object
• Sequencer Object
• Command Block Object
• Motor Data Object
• Control Supervisor Object
• AC/DC Drive Object
• Overload Object
• Softstart Object
• S-Device Supervisor Object
• S-Analog Sensor Object
• S-Analog Actuator Object
• S-Single Stage Controller Object
• S-Gas Calibration Object
• Trip Point Object
• S-Partial Pressure Object

FIGURE 14.9 Application-specific objects.

• DeviceNet Object, see Section 14.3.1.4.1
• ControlNet Object, see Section 14.3.2.4.1
• ControlNet Keeper Object, see Section 14.3.2.4.2
• ControlNet Scheduling Object, see Section 14.3.2.4.3
• TCP/IP Interface Object, see Section 14.3.3.5.1
• Ethernet Link Object, see Section 14.3.3.5.2

FIGURE 14.10 Network-specific objects.


FIGURE 14.11 Typical device object model. [Diagram: the required objects (Identity, Message Router, Connection(s) for I/O and explicit messages, and the network link object for DeviceNet, ControlNet, or Ethernet) and the optional objects (Parameter, Assembly, application object(s)) of a device attached to the CIP network.]

This looks like a large number of object types, but typical devices only implement a subset of these objects. Figure 14.11 shows the object model of such a typical device. The objects required in a typical device are:

• Either a Connection Object or a Connection Manager Object
• An Identity Object
• One or several network link-related objects (depends on network)
• A Message Router Object (at least its function)
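The list of required objects lends itself to a simple completeness check. The sketch below is illustrative only: the function name `missing_required`, the use of plain object names instead of class codes, and the mapping of networks to link objects (taken from the grouping in Figure 14.10) are assumptions made for this example. In particular, the sketch treats the Message Router as a named object, although the text only requires its function to be present.

```python
from typing import List, Set

# Objects every typical device needs, per the list above.
REQUIRED_ALWAYS = {"Identity", "Message Router"}

# At least one of these two must be present.
CONNECTION_OBJECTS = {"Connection", "Connection Manager"}

# Assumed mapping of each network to its link-related objects (Figure 14.10).
NETWORK_LINK_OBJECTS = {
    "DeviceNet": {"DeviceNet"},
    "ControlNet": {"ControlNet", "ControlNet Keeper", "ControlNet Scheduling"},
    "EtherNet/IP": {"TCP/IP Interface", "Ethernet Link"},
}


def missing_required(objects: Set[str], network: str) -> List[str]:
    """List what a device's object set lacks for the given network."""
    missing = sorted(REQUIRED_ALWAYS - objects)
    if not (objects & CONNECTION_OBJECTS):
        missing.append("Connection or Connection Manager")
    if not (objects & NETWORK_LINK_OBJECTS[network]):
        missing.append(f"network link object for {network}")
    return missing


# A small DeviceNet proximity sensor: required objects plus one application object.
sensor = {"Identity", "Message Router", "Connection", "DeviceNet", "Presence Sensing"}
sensor_gaps = missing_required(sensor, "DeviceNet")

# An incomplete object set for an EtherNet/IP device.
incomplete_gaps = missing_required({"Identity"}, "EtherNet/IP")
```

For the sensor, `sensor_gaps` is empty; for the incomplete set, `incomplete_gaps` names the Message Router, the connection object, and the network link object.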

Further objects are added according to the functionality of the device. This allows very good scalability of implementations so that small devices such as a proximity sensor on DeviceNet are not burdened with unnecessary overhead. Developers typically use publicly defined objects (see above list), but can also create their own objects in the vendor-specific areas, e.g., Class IDs 100 to 199. However, developers are strongly encouraged to work with the Special Interest Groups (SIGs) of ODVA and ControlNet International to create common definitions for further objects instead of inventing private ones. Out of the general-use objects, several will be described in more detail below.

14.2.5.1 Identity Object (Class Code 0x01)

This object is described in more detail for two reasons: (1) being a relatively simple object, it can easily be used to show the general principles, and (2) every device must have an Identity Object. Therefore, it is of general interest in this context. The vast majority of devices only support one instance of the Identity Object. Thus, there are typically no requirements for any class attributes that would describe further class details, e.g., how many instances exist in the device; only instance attributes are required in most cases. There are mandatory attributes (Figure 14.12) and optional attributes (Figure 14.13).

• The Vendor ID attribute identifies the vendor of every device. This UINT (Unsigned Integer) value (for Data Type descriptions, see Section 14.2.9) is assigned to a specific vendor by ODVA or ControlNet International. A vendor that intends to build products for more than one CIP network receives the same Vendor ID for all networks.
• The Device Type specifies which profile has been used for this device. It must be one of the Device Types described in Chapter 6 of the CIP Specification [4] or a vendor-specific type (see Section 14.2.6).
• The Product Code is a UINT number defined by the vendor of the device. This is used to distinguish multiple products of the same Device Type from the same vendor.


• Vendor ID
• Device Type
• Product Code
• Revision
• Status
• Serial Number
• Product Name

FIGURE 14.12 Mandatory attributes.

• State
• Configuration Consistency Value
• Heartbeat Interval
• Languages Supported

FIGURE 14.13 Optional attributes.

• The Revision is split into two USINT (Unsigned Short Integer) values specifying a Major Revision and a Minor Revision. Any change of the device that results in a modified behavior of the device on the network must be reflected in a change of at least the Minor Revision. Any change in the device that needs a revised Electronic Data Sheet (EDS; see Section 14.2.7) must be reflected in a change of the Major Revision. Vendor ID, Device Type, Product Code, and Major Revision allow an unambiguous identification of an EDS for this device.
• The Status attribute provides information on the status of the device, e.g., whether it is owned (controlled by another device), whether it is configured (to something different from the out-of-the-box default), and whether any major or minor faults have occurred.
• The Serial Number is used to uniquely identify individual devices in conjunction with the Vendor ID; i.e., no two CIP devices of a vendor may carry the same Serial Number. The 32 bits of the Serial Number allow ample space for a subdivision into number ranges that could be used by different divisions of larger companies.
• The Product Name attribute allows the vendor to give a meaningful ASCII name string (up to 32 characters) to the device.
• The State attribute describes the state of a device in a single UINT value; it is thus less detailed than the Status attribute.
• The Configuration Consistency Value allows a distinction between a configured and an unconfigured device or between different configurations in a device. This helps avoid unnecessary configuration downloads.
• The Heartbeat Interval allows enabling of the Device Heartbeat Message and setting the maximum time between two heartbeats to 1 to 255 s.

The services supported by the class and instance attributes are either Get_Attribute_Single (typically implemented in DeviceNet devices) or Get_Attributes_All (typically implemented in ControlNet and EtherNet/IP devices).
None of the attributes is settable, except for the Heartbeat Interval (if implemented). The only other service that is typically supported by the Identity Object is the reset service. The behavior of the Identity Object is described through a state transition diagram. This and further details of the Identity Object can be found in Chapter 5 of the CIP Specification [4].

14.2.5.2 Parameter Object (Class Code 0x0F)

This object is described in some detail since its concept is referred to in Section 14.2.7, “Configuration and Electronic Data Sheets.” This object, when used, comes in two “flavors”: a complete object and an abbreviated version (Parameter Object Stub). This abbreviated version is mainly used by DeviceNet


Parameter Value: This is the actual parameter.
Link Path Size, Link Path: These two attributes contain information on what application object/instance/attribute the parameter value is retrieved from.
Descriptor: This describes parameter properties, e.g., read-only, monitor parameter, etc.
Data Type: This must be one of the Data Types described in Chapter C-6.1 of the CIP Specification; see Section 14.2.9.
Data Size: Data size in bytes.

FIGURE 14.14 Parameter Object Stub attributes.

devices that only have small amounts of memory available. The Object Stub in conjunction with the Electronic Data Sheet has more or less the same functionality as the full object (see Section 14.2.7). The purpose of this object is to provide a general means to allow access to many attributes of the various objects in the device without a simple tool (such as a handheld terminal) having to know anything about the specific objects in the device. The class attributes of the Parameter Object contain information on how many instances exist in this device and a Class Descriptor indicating, among other properties, whether a full or stub version is supported. Furthermore, they tell whether a Configuration Assembly is used and what language is used in the Parameter Object. Of the instance attributes, the first six are those required for the Object Stub. These are listed in Figure 14.14. These six attributes already allow access, interpretation, and modification of the parameter value, but the remaining attributes make the parameter much easier to work with:

• The next three attributes provide ASCII strings with the name of the parameter, its engineering units, and an associated help text.
• Another three attributes contain the minimum, maximum, and default values of the parameter.
• The next four attributes allow scaling of the parameter value so that the parameter can be displayed in a more meaningful way, e.g., raw value in multiples of 10 mA, scaled value displayed in amps.
• Another four attributes follow that can link the scaling values to other parameters. This feature allows variable scaling of parameters, e.g., percentage scaling to a full range value that is set by another parameter.
• Attribute 21 defines how many decimal places are to be displayed if the parameter value is scal