THE INDUSTRIAL COMMUNICATION TECHNOLOGY HANDBOOK

© 2005 by CRC Press
INDUSTRIAL INFORMATION TECHNOLOGY SERIES
Series Editor: RICHARD ZURAWSKI

Forthcoming Books:
Embedded Systems Handbook, edited by Richard Zurawski
Electronic Design Automation for Integrated Circuits Handbook, Luciano Lavagno, Grant Martin, and Lou Scheffer
THE INDUSTRIAL COMMUNICATION TECHNOLOGY HANDBOOK

Edited by RICHARD ZURAWSKI
Library of Congress Cataloging-in-Publication Data

The industrial communication technology handbook / Richard Zurawski, editor.
    p. cm. — (The industrial information technology series ; 1)
Includes bibliographical references and index.
ISBN 0-8493-3077-7 (alk. paper)
1. Computer networks. 2. Data transmission systems. 3. Wireless communication systems.
I. Zurawski, Richard. II. Series.
TK5105.5.I48 2005
670'.285'46—dc22        2004057922
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

All rights reserved. Authorization to photocopy items for internal or personal use, or the personal or internal use of specific clients, may be granted by CRC Press, provided that $1.50 per page photocopied is paid directly to Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA. The fee code for users of the Transactional Reporting Service is ISBN 0-8493-3077-7/05/$0.00+$1.50. The fee is subject to change without notice. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

The consent of CRC Press does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying. Direct all inquiries to CRC Press, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2005 by CRC Press
No claim to original U.S. Government works
International Standard Book Number 0-8493-3077-7
Library of Congress Card Number 2004057922
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper
Foreword
A handbook on industrial communication technology! What a challenge, given the complexity of industrial applications, the number of possible solutions and standards, and the variety of applications, contexts, and products! The challenge can be expressed in just a few words: applications diversity, need for networking, integration of functions, and technologies.

Applications diversity: The applications concerned with industrial communications are known under the following terms: process control, manufacturing and flexible systems, building automation, transport management, utilities, and embedded systems in trains, aircraft, cars, etc. All these applications need similar services, but in very different environments and with very different qualities of service.

Need for networking: The need for networking is not new. Since the MAP and TOP projects in the field of automation, it has been clear that the future of automation lies in distributed systems supported by distributed (heterogeneous) communication systems. The sharing of information, the necessity of interoperability, and the necessity of abstraction levels are just some of the reasons why industrial communication has always been considered a major challenge.

Integration: In all domains, integration is a key word, meaning that all the functions in an enterprise need to be interconnected, in real time, as much as possible. This is feasible only through the use of robust communication systems, real-time features, and coherent design of the applications. With the development of ubiquitous computing and ambient intelligence, industrial communication applications will become the next challenge.

Technologies: Numerous technologies are available for use at different levels of control and command and in all the services provided by a company; in addition, they exist for maintenance, supervision and monitoring, diagnosis, spare parts management, and so on.

Specific solutions are frequently dictated by specific problems, and the importance of standards cannot be overemphasized. Wireless systems, fieldbuses and cell or plant networks, building automation, device buses and applications, embedded systems, Internet technologies and related applications, security and safety, MAC protocols, and representative application domains are just some of the topics treated in this handbook. Methodology considerations for choosing and developing systems are also presented.

This handbook will become the major reference source for this domain. Setting aside some technological details, the methods and principles presented will remain relevant for years to come. Putting together such a book would not have been possible without the cooperation of a great number of authors, all specialists in their fields and involved in the development of communication systems and applications, as well as the members of the International Advisory Board. The Industrial Communication Technology Handbook is a must for industrial communication professionals.

Jean-Pierre Thomesse
Institut National Polytechnique de Lorraine
Nancy, France
International Advisory Board
Jean-Pierre Thomesse, LORIA-INPL, France, Chair
Salvatore Cavalieri, University of Catania, Italy
Dietmar Dietrich, Vienna University of Technology, Austria
Jean-Dominique Decotignie, CSEM, Switzerland
Josep M. Fuertes, Universitat Politècnica de Catalunya, Spain
Jürgen Jasperneite, Phoenix Contact, Germany
Chris Jenkins, Proces-Data, U.K.
Ed Koch, Akua Control, U.S.
Thilo Sauter, Austrian Academy of Sciences, Austria
Viktor Schiffer, Rockwell Automation, Germany
Wolfgang Stripf, Siemens AG, Germany
Preface
Introduction

Aim

The purpose of The Industrial Communication Technology Handbook is to provide a reference useful for a broad range of professionals and researchers from industry and academia interested in or involved in the use of industrial communication technology and systems. This is the first publication to cover this field in a cohesive and comprehensive way. The focus of this book is on existing technologies used by industry, as well as newly emerging technologies and trends whose evolution is driven by actual industry needs and by industry-led consortia and organizations. The book offers a mix of basics and advanced material, as well as overviews of recent significant research and implementation/technology developments. It is aimed at novices as well as experienced professionals from industry and academia, and is also suitable for graduate students.

The book covers extensively the areas of fieldbus technology, industrial Ethernet and real-time extensions, wireless and mobile technologies in industrial applications, linking the factory floor with the Internet and wireless fieldbuses, industrial networks' security and safety, automotive applications, industrial automation applications, building automation applications, energy systems applications, and others. It is an indispensable companion for those who seek to learn more about industrial communication technology and systems and for those who want to stay up to date with recent technical developments in the field. It is also a rich source of material for any university or professional development course on industrial networks and related technologies.

Contributors

The book contains 42 contributions, written by leading experts from industry and academia directly involved in the creation and evolution of the ideas and technologies treated in the book. Over half of the contributions are from industry and industrial research establishments at the forefront of the developments shaping the field of industrial communication technology, for example, ABB, Bosch Rexroth Corporation, CSEM, Decomsys, Frequentis, Phoenix Contact, PROCES-DATA, PSA Peugeot-Citroen, PROFIBUS International, Rockwell Automation, SERCOS North America, Siemens, and Volcano. Most of these contributors play a leading role in the formulation of long-term policies for technology development and are key members of the industry–academia consortia implementing those policies. The contributions from academia and governmental research organizations come from some of the most renowned institutions, such as Cornell University, Fraunhofer, LORIA-INPL, the National Institute of Standards and Technology (U.S.), Politecnico di Torino (Italy), the Singapore Institute of Manufacturing Technology, the Technical University of Berlin, and the Vienna University of Technology.

Format

The presented material is in the form of tutorials, surveys, and technology overviews, combining fundamentals with advanced issues, making this publication relevant to beginners as well as seasoned professionals from industry and academia. Particular emphasis is on the industrial perspective, illustrated by actual implementations and technology deployments. The contributions are grouped in sections for cohesive and comprehensive presentation of the treated areas. The reports on recent technology developments, deployments, and trends frequently cover material released to the profession for the first time.

Audience

The handbook is designed to cover a wide range of topics that comprise the field of industrial communication technology and systems. The material covered in this volume will be of interest to a wide spectrum of professionals and researchers from industry and academia, as well as graduate students, from the fields of electrical and computer engineering, industrial and mechatronic engineering, mechanical engineering, computer science, and information technology.
Organization

The book is organized into two parts. Part 1, Basics of Data Communication and IP Networks, presents in a nutshell the basics of data communication and IP networks. This material is intended as a handy reference for those who may not be familiar with, or wish to refresh their knowledge of, some of the concepts used extensively in Part 2. Part 2, Industrial Communication Technology and Systems, is the main focus of the book and presents a comprehensive overview of the field of industrial communication technologies and systems. Some of the topics presented in this part have received limited coverage in other publications, due either to the fast evolution of the technologies involved, material confidentiality, or limited circulation in the case of industry-driven developments.

Part 1 includes six chapters that present in a concise way the vast area of IP networks. As mentioned, it is intended as supplementary reading for those who would like to refresh and update their knowledge without resorting to voluminous publications. This background is essential for understanding the material presented in Part 2. Part 1 includes the following chapters: "Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks," "IP Internetworking," "A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues," "Fundamentals in Quality of Service and Real-Time Transmission," "Survey of Network Management Frameworks," and "Internet Security."

Part 2 includes five major sections: Field Area and Control Networks; Ethernet and Wireless Network Technologies; Linking Factory Floor with the Internet and Wireless Fieldbuses; Security and Safety Technologies in Industrial Networks; and Applications of Networks and Other Technologies.

Field Area and Control Networks

The section on fieldbus technology provides a comprehensive overview of selected fieldbuses. The focus is on those most widely used in industry and most widely known. The presentation is not exhaustive, however; one of the limiting factors was the availability of qualified authors to write authoritatively on the topics.

This section begins with "Fieldbus Systems: History and Evolution," an extensive introduction to fieldbus technology that offers a comparison and critical evaluation of the existing technologies and discusses their evolution and emerging trends. This chapter is a must for anyone with an interest in the origins of the current fieldbus technology landscape. It is also compulsory reading for novices seeking to understand the concepts behind fieldbuses.

The chapter "The WorldFIP Fieldbus" was written by Jean-Pierre Thomesse, one of the pioneers of fieldbus technology. WorldFIP is one of the first fieldbuses, developed in France at the beginning of the 1980s and still widely used, particularly in applications that require hard real-time constraints and high dependability. This is almost a personal record by someone directly involved in the development of WorldFIP.

A brief record of the origins and evolution of the FOUNDATION Fieldbus (H1, H2, and HSE) and its technical principles is presented in the chapter "FOUNDATION Fieldbus: History and Features." The description of PROFIBUS (PROFIBUS DP) is presented in "PROFIBUS: Open Solutions for the World of Automation." This is a comprehensive overview of PROFIBUS DP, one of the leading players in the fieldbus field, and it includes material on HART on PROFIBUS DP, application and master and system profiles, and integration technologies such as GSD (general station description), EDD (electronic device description), and DTM (device type manager).

The chapter "Principles and Features of PROFInet" presents a new automation concept, and the technology behind it, that has emerged as a result of the trend in automation technology toward modular, reusable machines and plants with distributed intelligence. PROFInet is an open standard for industrial automation based on industrial Ethernet. The material is presented by researchers from the Automation and Drives Division, the leading provider of automation solutions within Siemens AG.

Dependable time-triggered communication and architecture are presented in "Dependable Time-Triggered Communication," written by Hermann Kopetz et al. Hermann Kopetz is the inventor of the concept and the driving force behind the technology's development. The TTP (Time-Triggered Protocol) and TTA (Time-Triggered Architecture) have had a profound impact on the development of safety-critical systems, particularly in the automotive industry. This is one of the most authoritative presentations on the topic.
The time-triggered CAN (TTCAN) protocol was introduced by Bosch in 1999 with the aim of making CAN suitable for the new needs of the automotive industry. This technology is introduced in "Controller Area Network: A Survey," which describes the main features of the Controller Area Network (CAN) protocol, including TTCAN.

The chapter "The CIP Family of Fieldbus Protocols" introduces the following CIP (Common Industrial Protocol)-based networks: DeviceNet, a CIP implementation employing a CAN data link layer; ControlNet, implementing the same basic protocol on new data link layers that allow for much higher speed (5 Mbps), strict determinism, and repeatability while extending the range of the bus (several kilometers with repeaters); and EtherNet/IP, in which CIP runs over TCP/IP. The chapter also introduces CIP Sync, a CIP-based communication principle that enables synchronous low-jitter system reactions without the need for low-jitter data transmission. This is important in applications that require much tighter control of the real-time parameters characterizing hard real-time control systems. The chapter also overviews CIP Safety, a safety protocol that adds services to transport data with high integrity.

The P-NET fieldbus is presented in the chapter "The Anatomy of the P-NET Fieldbus," written by the chairman of the International P-NET User Organization and the technical director of PROCES-DATA (U.K.) Ltd., which provides the real-time PC operating system for P-NET.

The chapter "INTERBUS Means Speed, Connectivity, Safety" introduces INTERBUS, a fieldbus with over 6 million installed nodes and a broad base of device manufacturers. The chapter also briefly introduces IP over INTERBUS and looks at data throughput for IP tunneling.

The IEEE 1394 FireWire, a high-performance serial bus, the principles of its operation, and its applications in the industrial environment are presented in "Data Transmission in Industrial Environments Using IEEE 1394 FireWire."

The issues involved in the configuration (setting up a fieldbus system) and management (diagnosis and monitoring, and adding new devices to the network, to mention a few activities) of fieldbus systems are presented in "Configuration and Management of Fieldbus Systems." This chapter also discusses the plug-and-participate concept and its implementations in the industrial environment.

The section on fieldbus technology is concluded by an excellent chapter discussing the pros and cons of selecting control networks for specific applications and application domains, authored by Jean-Dominique Decotignie. It includes a great deal of practical advice useful to practicing professionals, the kind of material that cannot easily be found in the professional literature.

Ethernet and Wireless Network Technologies

This section on Ethernet and wireless/mobile network technologies contains four chapters discussing the use of Ethernet and its variants in industrial automation, as well as selected issues related to wireless technologies. Ethernet is fast becoming a de facto industry standard for communication in factories and plants at the fieldbus level. Its random, native CSMA/CD (carrier-sense multiple access with collision detection) arbitration mechanism is being replaced by other solutions that allow for the deterministic behavior required in real-time communication to support soft and hard real-time deadlines.

The idea of using wireless technology on the factory floor is appealing, since fieldbus stations and automation components can be mobile and the need for (breakable) cabling is reduced. However, wireless transmission characteristics are fundamentally different from those of other media types, leading to comparably high and time-varying error rates. This poses a significant challenge for fulfilling the hard real-time and reliability requirements of industrial applications.
This section begins with the chapter "Approaches to Enforce Real-Time Behavior in Ethernet," which discusses various approaches to ensuring real-time communication capabilities, including those that support probabilistic as well as deterministic analysis of the network access delay. This chapter also presents a brief description of the Ethernet protocol.

Practical solutions for ensuring real-time communication capabilities using switched Ethernet are presented in "Switched Ethernet in Automation Networking." This chapter evaluates the suitability of switched Ethernet in the context of industrial automation and presents practical solutions obtained through R&D to address actual needs.

The issues involved in the use of wireless and mobile communication in the industrial environment (factory floor) are discussed in "Wireless LAN Technology for the Factory Floor: Challenges and Approaches." This is a very comprehensive chapter dealing with topics such as the error characteristics of wireless links and lower-layer wireless protocols for industrial applications. It also briefly discusses hybrid systems that extend selected fieldbus technologies (such as PROFIBUS and CAN) with wireless stations.

The chapter "Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment" concludes this section. It discusses, from the radio network perspective, the potential and limits of technologies such as Bluetooth, IEEE 802.11, and ZigBee for deployment in industrial environments.

Linking Factory Floor with the Internet and Wireless Fieldbuses

The demand for process data availability at different levels of the factory organizational hierarchy, from production to the business level, has caused an upsurge in activities to link the "factory floor" with the intranet/Internet. The issues, solutions, and technologies for linking industrial environments with the Internet and wireless fieldbuses are extensively discussed in this section.
The issues and the actual and potential solutions behind linking the factory floor/industrial environments with the Internet/intranet are discussed in "Linking Factory Floor and the Internet." This chapter also discusses new trends involving industrial Ethernet.

The chapter "Extending EIA-709 Control Networks across IP Channels" presents a comprehensive overview of the use of the ANSI/EIA-852 standard to encapsulate the ANSI/EIA-709 control network protocol. This contribution comes from authors from industry directly involved in the relevant technology development.

The means for interconnecting wireline fieldbuses with wireless ones in the industrial environment, various design alternatives, and their evaluation are presented in "Interconnection of Wireline and Wireless Fieldbuses." This is one of the most comprehensive and authoritative discussions of this challenge, presented by one of the leading authorities on fieldbus technology.

Security and Safety Technologies in Industrial Networks

Security in the field area networks employed in the industrial environment is a major challenge. The requirement for process data availability via intranet/Internet access opens possibilities for intrusion and potential hostile actions that could result in engineering system failures, including catastrophic ones if they involve, for instance, chemical plants. These and safety issues are the focus of this section.

This section begins with the chapter "Security Topics and Solutions for Automation Networks," which provides a comprehensive discussion of the issues involved, the challenges, and the existing solutions amenable to adaptation to industrial environments, and outlines the need for new approaches and solutions. The second chapter in this section is "PROFIsafe: Safety Technology with PROFIBUS," which focuses on the existing solutions and supporting technology in the context of PROFIBUS, one of the most widely used fieldbuses in industrial applications. The material is presented by some of the creators of PROFIsafe. CIP Safety, a safety protocol for CIP, is presented in the Field Area and Control Networks section in "The CIP Family of Fieldbus Protocols."

Applications of Networks and Other Technologies

This is the last major section of the book. It has eight subsections dealing with specialized field area networks (synonymous with fieldbuses) and their applications, covering automotive communication technology, building automation, the manufacturing message specification in industrial communication systems, motion control, train communication, smart transducers, energy systems, and SEMI (Semiconductor Equipment and Materials International). This section presents some of the most representative applications of field area networks outside the industrial controls and automation covered in the Field Area and Control Networks section.

The "Automotive Communication Technologies" subsection has four chapters discussing different approaches, solutions, and technologies. The automotive industry is a very fast growing consumer of field area networks, aggressively adopting mechatronic solutions to replace or duplicate existing mechanical/hydraulic systems. This subsection begins with the chapter "Design of Automotive X-by-Wire Systems," which gives an overview of the X-by-wire approach and introduces safety-critical communication protocols (TTP/C, FlexRay, and TTCAN) and the operating systems and middleware services (OSEKtime and FTCom) used in automotive applications. The chapter also presents a comprehensive case study illustrating the design of a steer-by-wire system.

The newly emerging standard and technology for automotive safety-critical communication, FlexRay, is presented in the chapter "FlexRay Communication Technology." The material is among the most
comprehensive and authoritative available at the time of this book's publication, and it is written by industry people directly involved in the standard and technology development.

The LIN (Local Interconnect Network) communication standard, enabling fast and cost-efficient implementation of low-cost multiplex systems for local interconnect networks in vehicles, is presented in "The LIN Standard." The Volcano concept and technology for the design and implementation of in-vehicle networks using the standardized CAN and LIN communication protocols are presented in "Volcano: Enabling Correctness by Design." The material comes directly from the source, Volcano Communications Technologies AG, and provides insight into the design and development process of an automotive communication network.

Another fast-growing consumer of field area networks is building automation. At this stage, particularly in office, commercial, and industrial complexes, the use of automation solutions offers substantial financial savings on lighting and HVAC costs and can considerably improve the quality of the environment, among other benefits. Relevant communication solutions for this application domain are presented in the subsection "Networks in Building Automation," which comprises three contributions outlining the issues involved and the specific technologies currently in use.

An excellent introduction to the issues, architectures, and available solutions is presented in "The Use of Network Hierarchies in Building Telemetry and Control Applications." The material was written by one of the pioneers of the concept of building automation and a technology developer. The details of the European Installation Bus (EIB), a field area network designed specifically for building automation purposes, are presented in "EIB: European Installation Bus." This chapter was contributed by one of the most active proponents of the use of field area networks in building automation and a co-founder of one of the largest research groups in this field, at the Vienna University of Technology. The chapter "Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk)" introduces the technical aspects of LonWorks networks, one of the main contenders for building automation, covering the protocol, development environments, and tools.

The subsection "Manufacturing Message Specification in Industrial Automation" focuses on the highly successful international standard MMS (manufacturing message specification), an Open Systems Interconnection (OSI) application layer messaging protocol designed for the remote control and monitoring of devices such as remote terminal units (RTUs), programmable logic controllers (PLCs), numerical controllers (NCs), robot controllers (RCs), etc. This subsection features two chapters: "The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS)," which gives a fairly comprehensive introduction to the standard and illustrates its use; and "Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine," which shows the use of MOTIP (MMS on top of TCP/IP) in the development and operation of a virtual factory environment. The latter chapter also discusses an MMS-based Internet monitoring system.

The chapter "The SERCOS interface™" describes the international standard (IEC/EN 61491) for communication among digital motion controls, drives, input/output (I/O), and sensors.
It includes definitions, a brief history, a description of the SERCOS interface communication methodology, an introduction to SERCOS interface hardware, a discussion of speed considerations, information on conformance testing, and information on available development tools. A number of real-world applications are presented, and a list of sources for additional information is provided.

The chapter "IEC/IEEE Train Communication Network" presents details of the international standard IEC 61375, adopted in 1999. It also discusses other European and U.S. initiatives in this field.
"A Smart Transducer Interface Standard for Sensors and Actuators" presents material on the IEEE 1451 standards for connecting sensors and actuators to microprocessors, control and field area networks, and instrumentation systems. The standards also define the Transducer Electronic Data Sheet (TEDS), which allows for the self-identification of sensors. The IEEE 1451 standards facilitate sensor networking, a new trend in industrial automation that, among other benefits, offers strong economic incentives.

The use of IEC 61375 (Train Communication Network) in substation automation is presented in "Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations." This is an interesting case study illustrating the suitability of some field area networks for various application domains.

The last subsection and chapter in the Applications of Networks and Other Technologies section is "SEMI Interface and Communication Standards: An Overview and Case Study." This is an excellent introduction to SEMI, providing an overview of the fundamentals of the SEMI Equipment Communication Standard, commonly referred to as SECS, its interpretation, the available software tools, and case study applications. The material was written by experts from the Singapore Institute of Manufacturing Technology who were involved in a number of SEMI technology developments and deployments.
Locating Topics

To assist readers in locating material, a complete table of contents is presented at the front of the book, and each chapter begins with its own table of contents. For further assistance, two indexes are provided at the end of the book: an index of the authors who contributed to the book, together with the titles of their contributions, and a detailed subject index.
Acknowledgments
I thank all members of the International Advisory Board for their help with structuring the book, the selection of authors, and the evaluation of material. I have received tremendous cooperation from all contributing authors and thank them all. I also express gratitude to my publisher, Nora Konopka, and the other CRC Press staff involved in the book's production, particularly Jessica Vakili, Elizabeth Spangenberger, and Gail Renard. My gratitude goes also to my wife, who tolerated the countless hours I spent preparing this book.

Richard Zurawski
ISA Corp.
Santa Clara, CA
The Editor
Dr. Richard Zurawski is president and CEO of ISA Corp., South San Francisco and Santa Clara, CA, a company involved in providing solutions for industrial and societal automation. He is also chief scientist with and a partner in a Silicon Valley-based start-up involved in the development of wireless solutions and technology. Dr. Zurawski is a co-founder of the Institute for Societal Automation, Santa Clara, a research and consulting organization. Dr. Zurawski has over 25 years of academic and industrial experience, including a regular appointment at the Institute of Industrial Sciences, University of Tokyo, and full-time R&D advisor with Kawasaki Electric, Tokyo. He has provided consulting services to Telecom Research Laboratories, Melbourne, Australia, and Kawasaki, Ricoh, and Toshiba Corporations, Japan. He has participated in an IMS package: Formal Methods in Distributed Autonomous Manufacturing Systems and Distributed Logic Controllers, Task 8: Distributed Intelligence in Manufacturing Systems; Globeman 21 Group I: Global Product Management. He has also participated in a number of Japanese Intelligent Manufacturing Systems programs. Dr. Zurawski’s involvement in R&D projects and activities in the past few years includes remote monitoring and control, network-based solutions for factory floor control, network-based demand side management, MEMS (automatic microassembly), Java technology, SEMI (Semiconductor Equipment and Materials International) implementations, development of DSL telco equipment, and wireless applications. Dr. Zurawski currently serves as an associate editor of the IEEE Transactions on Industrial Electronics and Real-Time Systems: The International Journal of Time-Critical Computing Systems, Kluwer Academic Publishers. He was a guest editor of three special sections in IEEE Transactions on Industrial Electronics: two sections on factory automation and one on factory communication systems. 
He has also been a guest editor of a special issue of the Proceedings of the IEEE dedicated to industrial communication systems. In addition, Dr. Zurawski was invited by IEEE Spectrum to contribute material on Java technology to “Technology 1999: Analysis and Forecast Issue.” Dr. Zurawski is the series editor for The Industrial Information Technology Series, CRC Press, Boca Raton, FL, and has served as a vice president of the Institute of Electrical and Electronics Engineers (IEEE) Industrial Electronics Society (IES), chairman of the Factory Automation Council, and chairman of the IEEE IES Ad Hoc Committee on IEEE Transactions on Factory Automation. He was an IES representative to the IEEE Neural Network Council and IEEE Intelligent Transportation Systems Council. He was also on a steering committee of the ASME/IEEE Journal of Micromechanical Systems. In 1996, he received the Anthony J. Hornfeck Service Award from the IEEE Industrial Electronics Society. Dr. Zurawski has established two IEEE events: the IEEE Workshop on Factory Communication Systems, the only IEEE event dedicated to industrial communication networks; and the IEEE International Conference on Emerging Technologies and Factory Automation, the largest IEEE conference on factory automation. He has served as a general, program, and track chair for a number of IEEE conferences and workshops.
Dr. Zurawski has published extensively on various aspects of control systems, industrial and factory automation, industrial communication systems, robotics, formal methods in the design of embedded and industrial systems, and parallel and distributed programming and systems. Currently, he is preparing The Embedded Systems Handbook, soon to be published by CRC Press.
Contributors
Luís Almeida, Universidade de Aveiro, Aveiro, Portugal
Herbert Barthel, Siemens AG, Nürnberg-Moorenbrunn, Germany
Günther Bauer, Vienna University of Technology, Vienna, Austria
Ralph Büsgen, Siemens AG, Nürnberg, Germany
Salvatore Cavalieri, University of Catania, Catania, Italy
Gianluca Cena, IEIIT-CNR, Torino, Italy
Jean-Dominique Decotignie, Centre Suisse d’Electronique et de Microtechnique, Neuchatel, Switzerland
Wilfried Elmenreich, Vienna University of Technology, Vienna, Austria
Joachim Feld, Siemens AG, Nürnberg, Germany
A.M. Fong, Singapore Institute of Manufacturing Technology, Singapore
Klaus Frommhagen, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
K.M. Goh, Singapore Institute of Manufacturing Technology, Singapore
Zygmunt J. Haas, Cornell University, Ithaca, New York
Scott C. Hibbard, Bosch Rexroth Corporation, Hoffman Estates, Illinois
Helmut Hlavacs, University of Vienna, Vienna, Austria
Mai Hoang, University of Potsdam, Potsdam, Germany
Øyvind Holmeide, OnTime Networks, Billingstad, Norway
Jürgen Jasperneite, Phoenix Contact GmbH & Co. KG, Bad Pyrmont, Germany
Ulrich Jecht, UJ Process Analytics, Baden-Baden, Germany
Christopher G. Jenkins, PROCES-DATA (U.K.) Ltd., Wallingford, Oxon, United Kingdom
Svein Johannessen, ABB Corporate Research, Billingstad, Norway
Wolfgang Kampichler, Frequentis GmbH, Vienna, Austria
Wolfgang Kastner, Vienna University of Technology, Vienna, Austria
Dong-Sung Kim, Kumoh National Institute of Technology, Gumi-Si, South Korea
Hubert Kirrmann, ABB Corporate Research, Baden, Switzerland
Edward Koch, Akua Control, San Rafael, California
Hermann Kopetz, Vienna University of Technology, Vienna, Austria
Christopher Kruegel, Vienna University of Technology, Vienna, Austria
Christian Kurz, University of Vienna, Vienna, Austria
Ronald M. Larsen, SERCOS North America, Lake in the Hills, Illinois
Kang Lee, National Institute of Standards and Technology, Gaithersburg, Maryland
Y.G. Lim, Singapore Institute of Manufacturing Technology, Singapore
Lucia Lo Bello, University of Catania, Catania, Italy
Dietmar Loy, LOYTEC Electronics GmbH, Vienna, Austria
Peter Lutz, Interests Group SERCOS interface e.V., Stuttgart, Germany
Kirsten Matheus, Carmeq GmbH, Berlin, Germany
Dietmar Millinger, DECOMSYS — Dependable Computer Systems, Vienna, Austria
Petra Nauber, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Nicolas Navet, LORIA, Vandoeuvre-lès-Nancy, France
Georg Neugschwandtner, Vienna University of Technology, Vienna, Austria
Roman Nossal, DECOMSYS — Dependable Computer Systems, Vienna, Austria
Paulo Pedreiras, Universidade de Aveiro, Aveiro, Portugal
Stefan Pitzek, Vienna University of Technology, Vienna, Austria
Manfred Popp, Siemens AG, Fürth, Germany
Antal Rajnák, Volcano AG, Tägerwilen, Switzerland
Thilo Sauter, Austrian Academy of Sciences, Wiener Neustadt, Austria
Uwe Schelinski, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Viktor Schiffer, Rockwell Automation, Haan, Germany
Michael Scholles, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Christian Schwaiger, Austria Card GmbH, Vienna, Austria
Karlheinz Schwarz, Schwarz Consulting Company (SCC), Karlsruhe, Germany
Françoise Simonot-Lion, LORIA, Vandoeuvre-lès-Nancy, France
Tor Skeie, ABB Corporate Research, Billingstad, Norway
Stefan Soucek, LOYTEC Electronics GmbH, Vienna, Austria
Ye Qiong Song, LORIA, Vandoeuvre-lès-Nancy, France
Wilfried Steiner, Vienna University of Technology, Vienna, Austria
Wolfgang Stripf, Siemens AG, Karlsruhe, Germany
Jean-Pierre Thomesse, Institut National Polytechnique de Lorraine, Vandoeuvre-lès-Nancy, France
O. Tin, Singapore Institute of Manufacturing Technology, Singapore
Albert Treytl, Vienna University of Technology, Vienna, Austria
Adriano Valenzano, IEIIT-CNR, Torino, Italy
Peter Wenzel, PROFIBUS International, Karlsruhe, Germany
Andreas Willig, University of Potsdam, Potsdam, Germany
Cédric Wilwert, PSA Peugeot–Citroen, La Garenne Colombe, France
Hagen Woesner, Technical University of Berlin, Berlin, Germany
K. Yi, Singapore Institute of Manufacturing Technology, Singapore
Pierre A. Zuber, Bombardier Transportation, Total Transit Systems, Pittsburgh, Pennsylvania
Contents
Part 1
Basics of Data Communication and IP Networks
1 Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks ..............................................................................................1-1
Andreas Willig and Hagen Woesner
2 IP Internetworking ............................................................................................................2-1
Helmut Hlavacs and Christian Kurz
3 A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues .......3-1
Lucia Lo Bello
4 Fundamentals in Quality of Service and Real-Time Transmission ...............................4-1
Wolfgang Kampichler
5 Survey of Network Management Frameworks ................................................................5-1
Mai Hoang
6 Internet Security ................................................................................................................6-1
Christopher Kruegel
Part 2
Industrial Communication Technology and Systems
Section I
Field Area and Control Networks
7 Fieldbus Systems: History and Evolution ........................................................................7-1
Thilo Sauter
8 The WorldFIP Fieldbus .....................................................................................................8-1
Jean-Pierre Thomesse
9 FOUNDATION Fieldbus: History and Features ....................................................................9-1
Salvatore Cavalieri
10 PROFIBUS: Open Solutions for the World of Automation .........................................10-1
Ulrich Jecht, Wolfgang Stripf, and Peter Wenzel
11 Principles and Features of PROFInet ............................................................................11-1
Manfred Popp, Joachim Feld, and Ralph Büsgen
12 Dependable Time-Triggered Communication ..............................................................12-1
Hermann Kopetz, Günther Bauer, and Wilfried Steiner
13 Controller Area Network: A Survey ...............................................................................13-1
Gianluca Cena and Adriano Valenzano
14 The CIP Family of Fieldbus Protocols ...........................................................................14-1
Viktor Schiffer
15 The Anatomy of the P-NET Fieldbus .............................................................................15-1
Christopher G. Jenkins
16 INTERBUS Means Speed, Connectivity, Safety .............................................................16-1
Jürgen Jasperneite
17 Data Transmission in Industrial Environments Using IEEE 1394 FireWire ..............17-1
Michael Scholles, Uwe Schelinski, Petra Nauber, and Klaus Frommhagen
18 Configuration and Management of Fieldbus Systems ..................................................18-1
Stefan Pitzek and Wilfried Elmenreich
19 Which Network for Which Application .........................................................................19-1
Jean-Dominique Decotignie
Section II
Ethernet and Wireless Network Technologies
20 Approaches to Enforce Real-Time Behavior in Ethernet .............................................20-1
Paulo Pedreiras and Luís Almeida
21 Switched Ethernet in Automation Networking .............................................................21-1
Tor Skeie, Svein Johannessen, and Øyvind Holmeide
22 Wireless LAN Technology for the Factory Floor: Challenges and Approaches ..........22-1
Andreas Willig
23 Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment ......................................................................................................................23-1
Kirsten Matheus
Section III
Linking Factory Floor with the Internet and Wireless Fieldbuses
24 Linking Factory Floor and the Internet .........................................................................24-1
Thilo Sauter
25 Extending EIA-709 Control Networks across IP Channels ..........................................25-1
Dietmar Loy and Stefan Soucek
26 Interconnection of Wireline and Wireless Fieldbuses ..................................................26-1
Jean-Dominique Decotignie
Section IV
Security and Safety Technologies in Industrial Networks
27 Security Topics and Solutions for Automation Networks ............................................27-1
Christian Schwaiger and Albert Treytl
28 PROFIsafe: Safety Technology with PROFIBUS ...........................................................28-1
Wolfgang Stripf and Herbert Barthel
Section V
Applications of Networks and Other Technologies
Automotive Communication Technologies
29 Design of Automotive X-by-Wire Systems ....................................................................29-1
Cédric Wilwert, Nicolas Navet, Ye Qiong Song, and Françoise Simonot-Lion
30 FlexRay Communication Technology ............................................................................30-1
Dietmar Millinger and Roman Nossal
31 The LIN Standard ............................................................................................................31-1
Antal Rajnák
32 Volcano: Enabling Correctness by Design .....................................................................32-1
Antal Rajnák
Networks in Building Automation
33 The Use of Network Hierarchies in Building Telemetry and Control Applications ......................................................................................................................33-1
Edward Koch
34 EIB: European Installation Bus ......................................................................................34-1
Wolfgang Kastner and Georg Neugschwandtner
35 Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk) ..........................................................................................................35-1
Dietmar Loy
Manufacturing Message Specification in Industrial Automation
36 The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS) ..............................................................................................................36-1
Karlheinz Schwarz
37 Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine ...........................................................................................37-1
Dong-Sung Kim and Zygmunt J. Haas
Motion Control
38 The SERCOS interface™ .................................................................................................38-1
Scott C. Hibbard, Peter Lutz, and Ronald M. Larsen
Train Communication Network
39 The IEC/IEEE Train Communication Network ............................................................39-1
Hubert Kirrmann and Pierre A. Zuber
Smart Transducer Interface
40 A Smart Transducer Interface Standard for Sensors and Actuators ...........................40-1
Kang Lee
Energy Systems
41 Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations ..................................................................................................41-1
Hubert Kirrmann
SEMI
42 SEMI Interface and Communication Standards: An Overview and Case Study ........42-1
A.M. Fong, K.M. Goh, Y.G. Lim, K. Yi, and O. Tin
Part 1
Basics of Data Communication and IP Networks
1
Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks

Andreas Willig
University of Potsdam

Hagen Woesner
Technical University of Berlin

1.1 Introduction ........................................................................1-1
1.2 Framing and Synchronization ............................................1-2
Bit Synchronization • Frame Synchronization • Example: Bit and Frame Synchronization in the PROFIBUS
1.3 Medium Access Control Protocols .....................................1-6
Requirements and Quality-of-Service Measures • Design Factors • Random Access Protocols • Fixed-Assignment Protocols • Demand-Assignment Protocols • Meta-MAC Protocols
1.4 Error Control Techniques .................................................1-15
Open-Loop Approaches • Closed-Loop Approaches • Hybrid Approaches • Further Countermeasures
1.5 Flow Control Mechanisms ................................................1-18
XON/XOFF and Similar Methods • Sliding-Window Flow Control • Further Mechanisms
1.6 Packet Scheduling Algorithms ..........................................1-20
Priority Scheduling • Fair Scheduling
1.7 Link Layer Protocols .........................................................1-22
The HDLC Protocol Family • The IEEE 802.2 LLC Protocol
References .....................................................................................1-24
Bit and Frame Synchronization • Medium Access Control Protocols • Error Control • Flow Control • Link Layer Protocols • Packet Scheduling
1.1 Introduction

In packet-switched networks the lower layers (data link layer, medium access control layer, physical layer) have to solve some fundamental tasks to facilitate successful communication. The lower layers are concerned with communication between neighboring stations, in contrast to the layers above (network layer, transport layer), which are concerned with end-to-end communication over multiple intermediate stations.
The lower layers communicate over physical channels, and consequently, their design is strongly influenced by the properties of the physical channel (bandwidth, channel errors). The importance of the lower layers for industrial communication systems is related to the requirement for hard real-time and reliability guarantees: if the lower layers are not able to guarantee successful delivery of a packet/frame* within a prescribed amount of time, this cannot be compensated by any actions of the upper layers. Therefore, a wide variety of mechanisms have been developed to implement these guarantees and to deal with harmful physical channel properties like transmission errors.

In virtually all data communication networks used in industrial applications the transmission is packet based; i.e., the user data are segmented into a number of distinct packets, and these packets are transmitted over the channel. Therefore, the following fundamental problems have to be solved:

• What constitutes a frame, and how are the bounds of a frame specified? How does the receiver detect frames and the data contained? To this end, framing and synchronization schemes are needed, discussed in Section 1.2.
• When should a frame be transmitted? If multiple stations want to transmit their frames over a common channel, appropriate rules are needed to share the channel and to let each station determine when it may send its frames. This problem is tackled by medium access control (MAC) protocols, discussed in Section 1.3.
• How should channel errors be coped with? The topic of error control is briefly touched on in Section 1.4.
• How should the receiver be protected against too much data sent by the transmitter? This is the problem of flow control, discussed in Section 1.5.
• Which packet should be transmitted next? This is the problem of packet scheduling, sketched in Section 1.6.
• Finally, in link layer protocols all these mechanisms are combined into a working protocol.
We discuss two important protocols in Section 1.7. The chapter is necessarily short on many topics. The interested reader will find further references in the text.
1.2 Framing and Synchronization

The problem of synchronization is related to the transmission of information units (packets, frames) between a sending and a receiving entity. In computer systems, information is usually stored and processed in a binary digital form (bits). A packet is formed from a group of bits and shall be transmitted to the receiver. The receiver must be able to uniquely determine the start and end of a packet as well as the bits within the packet.

The transmission of information over short distances, for instance, inside the computer, can be done with parallel transmission. Here, a number (say 64) of parallel copper wires transport all bits of a 64-bit data word at the same time. In most cases, one additional wire transmits the common reference clock. Whenever the transmitter has applied the correct voltage (representing a 0 or 1 bit) on all wires, it signals this by sending a sampling pulse on the clock wire toward the receiver. Conversely, on receiving a pulse on the clock wire, the receiver samples the voltage levels on all data wires and converts them back to bits by comparing them with a threshold. This kind of transmission is fast and simple, but cannot span large distances, because the cabling cost becomes prohibitive. Therefore, the data words have to be serialized and transmitted bit by bit on a single wire.†

*We will use both terms interchangeably.
†The term wire is actually used here as a synonym for a transmission channel. It could therefore also be a wireless or ISDN channel.
[Figure 1.1 (not reproduced here) shows the example bit sequence 1 0 1 1 0 0 1 0 encoded with the NRZ, Manchester, and differential Manchester codes. In the differential Manchester code, a 1 means no level change and a 0 means a level change at the start of the bit period.]
FIGURE 1.1 NRZ, Manchester, and differential Manchester codes.
1.2.1 Bit Synchronization

The spacing of bits generated by the transmitter depends on its local clock. The receiver needs this clock information to sample the incoming signal at appropriate points in time. Unfortunately, the transmitters’ and receivers’ clocks are not synchronized, and the synchronization information has to be recovered from the data signal; the receiver has to synchronize with the transmitter. This process is called bit synchronization. The aim is to let the receiver sample the received signal in the middle of the bit period in order to be robust against the impairments of the physical layer, like bandwidth limitation and signal distortions.

Bit synchronization is called asynchronous if the clocks are synchronized only for one data word and have to be resynchronized for the next word. A common mechanism used for this employs one start bit preceding the data word and one or more stop bits concluding it. The Universal Asynchronous Receiver/Transmitter (UART) specification defines one additional parity bit, which is appended to the 8 data bits, leading to the transmission of 11 bits total for every 8 data bits [3]. The upper row in Figure 1.2 illustrates this.

For longer streams of information bits, the receiver clock must be synchronized continuously. The digital phase-locked loop (DPLL) is an electrical circuit that controls a local clock and adjusts it to the received clock being extracted from the incoming signal [23]. To recover the clock from the signal, sufficiently frequent changes of signal levels are needed. Otherwise, if the wire shows the same signal level for a long time (as may happen for the non-return to zero (NRZ) coding method, where bits are directly mapped to voltage levels), the receiver clock could drift away from the transmitter clock. The Manchester encoding (shown in the second row of Figure 1.1) ensures that there is at least one signal change per bit.
Every logical 1 is represented by a signal change from one to zero, whereas a logical 0 shows the opposite signal change. The internal clock of the DPLL samples the incoming signal with a much higher frequency, for instance, 16 times per bit. For a logical 0 bit that is arriving exactly in time, the DPLL receives a sample pattern of 0000000011111111. If the transition between the 0 and 1 samples is not exactly in the middle of the bit but rather left or right of it, the local clock has to be readjusted to run faster or slower, respectively. In the classical IEEE 802.3 Ethernet, the bits are Manchester encoded [2]. To allow the DPLL of the receiver to synchronize to the received bit stream, a 64-bit-long preamble is transmitted ahead of each frame. This preamble consists of alternating 0 and 1 bits that result in a square wave of 5 MHz. A start-of-frame delimiter of two consecutive 1 bits marks the end of the preamble and the beginning of the data frame.
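The Manchester mapping just described can be sketched in a few lines. This is an illustrative sketch (not taken from the chapter), following the text’s convention that a logical 1 is a high-to-low transition and a logical 0 a low-to-high transition; a bit period without a mid-bit transition is reported as a code violation:

```python
def manchester_encode(bits):
    """Encode a list of bits as half-bit level pairs with a guaranteed
    mid-bit transition: 1 -> (high, low), 0 -> (low, high)."""
    levels = []
    for b in bits:
        levels.extend((1, 0) if b else (0, 1))
    return levels

def manchester_decode(levels):
    """Recover the bits; a missing mid-bit transition (e.g. 0,0 or 1,1)
    is a code violation, of the kind exploited by frame delimiters."""
    bits = []
    for i in range(0, len(levels), 2):
        pair = (levels[i], levels[i + 1])
        if pair == (1, 0):
            bits.append(1)
        elif pair == (0, 1):
            bits.append(0)
        else:
            raise ValueError(f"code violation in bit period {i // 2}")
    return bits
```

Because every bit period contains a transition, a DPLL that oversamples the line (say, 16 times per bit) always finds edges to lock onto, as described above.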
[Figure 1.2 (not reproduced here) shows the PROFIBUS character and selected frame formats:
UART character (11 bits): Start | D7…D0 (data bits) | Parity | Stop
Control frame (no data): SD1 | DA | SA | FC | FCS | ED
Fixed-data-length frame (8 characters): SD2 | DA | SA | FC | Data | FCS | ED
Variable-length data frame (0–249 characters): SD3 | LE | LEr | SD3 | DA | SA | FC | Data | FCS | ED
Legend: SD1–SD3 = start delimiters; DA, SA = destination and source address; FC = frame control byte; FCS = frame check sequence (CRC); LE = length field; LEr = length field repeated; ED = end delimiter.]
FIGURE 1.2 EN 50170 PROFIBUS: character and selected frame formats.
1.2.2 Frame Synchronization

It is of interest for the receiver to know whether the received information is (1) complete and (2) correct. Section 1.4 treats the latter problem in some more detail. To decide the first question, the receiver needs to know where a packet starts and ends, which immediately raises the question of how to mark the start and end of a frame. There are several ways to accomplish this; in real-world protocols one often finds combinations of them. In the following, the most important are discussed briefly.

1.2.2.1 Time Gaps

The most straightforward way to distinguish between frames is to leave certain gaps of silence between them. However, when many stations share the same medium, all of them have to obey these time gaps. As will be seen in Section 1.3.3.2, several MAC protocols rely on minimum time gaps to determine whether the medium is accessible. While time gaps are a simple way to detect the start of a frame, it should be possible to detect the end of a frame, too. Using time gaps, the end of the previous packet can be detected only after a period of silence. Even if the receiver detects a silent medium, it cannot be sure whether this is the result of a successful transmission or of a link or node failure. Therefore, additional mechanisms are needed.

1.2.2.2 Code Violations

A bit is usually encoded by a certain signal pattern (e.g., a change in voltage or current levels) that is, of course, different for a 0 and a 1 bit. A signal pattern that represents neither of the allowed values can be taken as a marker for the start of a frame. An example of this is the IEEE 802.5 Token Ring protocol [28], which uses differential Manchester encoding (see Figure 1.1 and [28]). Here, two special symbols appear: J for a so-called positive code violation and K for a negative one. In contrast to the bits defined by the encoding, these special symbols do not show a transition in the middle of the bit period.
Special 8-bit-long characters that mark the beginning and end of the frame are constructed from these symbols.

1.2.2.3 Start/End Flags

Some protocols use special flags to indicate the frame boundaries. A sequence of 01111110, that is, six 1 bits surrounded by two 0 bits, marks the beginning and the end of each frame. Of course, since the payload that is being transmitted can be an arbitrary sequence of bits, it is possible that the flag is contained in the payload. To avoid misinterpretation of a piece of payload data as being the end of a frame, the sender has to make sure that it only transmits the flag pattern if it is meant as a flag. Any
flag-like data have to be altered in a consistent way to allow the receiver to recover the original payload. This can be done using bit- or byte-/character-stuffing techniques.

Bit stuffing, as used in high-level data link control (HDLC) [23] protocols, requires the sender to insert a zero bit after each sequence of five consecutive 1 bits. The receiver checks whether the sixth bit that follows five 1s is a zero or a one. If it detects a zero, the zero is removed from the sequence of bits. If it detects a 1, it can be sure that this is a frame boundary. Unfortunately, it might happen that a transmission error turns the data sequence 01111100 into 01111110 and thus creates a flag prematurely. Therefore, additional mechanisms like time gaps are needed to discard the following bits and detect the actual end of the frame.

Byte stuffing, as employed in the Point-to-Point Protocol (PPP), uses the same flag pattern, but relies on a byte-oriented transmission medium [5, 6]. The flag can be written as the hexadecimal value 0x7E. Every unintentional appearance of the flag pattern is replaced by the two characters 0x7D 0x5E. This way, the flag character disappears, but 0x7D (also called the escape character) has to be replaced as well if it is found in the user data. To this end, 0x7D is replaced by 0x7D 0x5D. The receiver, after detecting the escape character in the byte stream, discards this byte and performs an exclusive-or (XOR) operation of 0x20 with the following byte to recover the original payload.

In both cases, more data are transmitted than would be necessary without bit- or byte-stuffing techniques. To make things worse, the amount of overhead depends on the contents of the payload. A malicious user might effectively double the data rate (with byte stuffing) or increase it by around 20% (with bit stuffing) by transmitting a continuous stream of flags. To avoid this, several measures can be taken. One is to scramble the user data before they are put into data frames [4].
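As an illustration, the two stuffing schemes described above can be sketched as follows. This is an explanatory sketch rather than production code; bit strings are modeled as Python strings of '0'/'1' for readability, and the unstuffing routine assumes its input is a stuffed payload (flag detection is left out):

```python
HDLC_FLAG = "01111110"

def bit_stuff(payload: str) -> str:
    """HDLC-style bit stuffing: insert a 0 after every run of five 1s,
    so the flag 01111110 can never appear inside the stuffed payload."""
    out, run = [], 0
    for bit in payload:
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:
            out.append("0")  # stuffed bit
            run = 0
    return "".join(out)

def bit_unstuff(stuffed: str) -> str:
    """Drop the 0 that follows every run of five consecutive 1s."""
    out, run, skip = [], 0, False
    for bit in stuffed:
        if skip:            # this is the stuffed 0; discard it
            skip = False
            continue
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:
            skip, run = True, 0
    return "".join(out)

def byte_stuff(payload: bytes) -> bytes:
    """PPP-style byte stuffing: 0x7E -> 0x7D 0x5E and 0x7D -> 0x7D 0x5D
    (the escaped byte is the original XORed with 0x20)."""
    out = bytearray()
    for b in payload:
        if b in (0x7E, 0x7D):
            out += bytes([0x7D, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)

def byte_unstuff(stuffed: bytes) -> bytes:
    """Discard each escape byte and XOR 0x20 into the byte that follows."""
    out, escape = bytearray(), False
    for b in stuffed:
        if escape:
            out.append(b ^ 0x20)
            escape = False
        elif b == 0x7D:
            escape = True
        else:
            out.append(b)
    return bytes(out)
```

Note how `bit_stuff` guarantees that no run of six 1 bits, and hence no flag, survives in the stuffed stream.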
Another possibility is the so-called consistent overhead byte stuffing (COBS), proposed in [1]. Here, the stream of data bytes is scanned in advance for occurrences of the flag. The sequence of data bytes is cut into chunks of at most 254 bytes not containing the flag. Every flag that appears in the stream is then replaced by one byte giving the number of nonflag data bytes that follow it. This way, no additional data have to be transmitted as long as there is at least one flag every 255 data bytes. Otherwise, one byte is inserted every 254 bytes, indicating a full-length chunk.

1.2.2.4 Length Field

To avoid the processing overhead that comes with bit or character stuffing, it is possible to reserve a field in the frame header that indicates the length of the whole frame. Having read this field, the receiver knows in advance how many characters or bytes will arrive. No end delimiter is needed anymore. Either a continuous transmission of packets followed by idle symbols or the usual combination of preamble and start delimiter is needed to correctly determine which of the header fields carries the length information. While potentially the best solution concerning transmission overhead, the length field mechanism suffers on erroneous transmission media. If the packet length information is lost or corrupted, the frame boundary is difficult to find again. The length field therefore has to be protected separately, using error-correcting codes or redundant transmission. Additional mechanisms (for example, time gaps) should be employed to find the end of a frame even when the length field is erroneous.
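Returning to COBS, the chunking idea can be sketched as below. This is a hypothetical illustration assuming the common convention of a zero byte (0x00) as the reserved flag value: each flag is replaced by a count byte giving the distance to the next flag, and a count of 0xFF marks a full 254-byte chunk with no flag in it:

```python
def cobs_encode(data: bytes) -> bytes:
    """Replace every zero byte by a count byte, so that the encoded
    stream contains no zero and the overhead is at most 1 byte per 254."""
    out, block = bytearray(), bytearray()
    for byte in data:
        if byte == 0:
            out.append(len(block) + 1)  # count byte stands in for the flag
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:       # full-length chunk, no flag in it
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)          # final (possibly empty) chunk
    out += block
    return bytes(out)

def cobs_decode(encoded: bytes) -> bytes:
    """Invert the encoding: each count byte tells how many literal bytes
    follow; a count below 0xFF implies a flag (zero) after the chunk."""
    out, i = bytearray(), 0
    while i < len(encoded):
        code = encoded[i]
        out += encoded[i + 1:i + code]
        i += code
        if code < 0xFF and i < len(encoded):
            out.append(0)
    return bytes(out)
```

In the worst case (no flag at all in the payload) the overhead is one extra byte per 254 data bytes, which is the bounded "consistent overhead" that gives the scheme its name.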
1.2.3 Example: Bit and Frame Synchronization in the PROFIBUS

As an example to illustrate the mechanisms introduced above, let us look at the lower layers of the EN 50170/DIN 19245 process fieldbus (PROFIBUS) [52]. This standard defines a fieldbus system for industrial applications. The lowest layer, the physical layer, is based on the RS-485 electrical interface. Shielded twisted-pair or fiber cable may be used as the transmission medium. The UART converts every byte into an 11-bit transmission character by adding start, parity, and stop bits. Thus, asynchronous bit synchronization is used on the lowest layer of the PROFIBUS.

The second layer is called the fieldbus data link (FDL). It defines the frame format, as shown in Figure 1.2. Start and end delimiters are used in every frame, but different start delimiters SDx (SD1 to SD4; the
latter is not shown in the figure) define different frame types. Thus, a receiver knows after reading an SD1 that a control frame of fixed length will arrive. In addition, time gaps of 33 bit times are required between frames. After receiving an SD3, the receiver interprets the next byte as LE (length field) and checks it against the redundant transmission of LE in the third byte, thereby decreasing the probability of undetected errors in the length field. Using the combination of time gaps and the redundant transmission of the length field, character stuffing to replace all possible start and end delimiters in the payload becomes unnecessary.
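The 11-bit UART transmission character used on the PROFIBUS physical layer can be illustrated as below. The start/data/parity/stop structure follows the text; the LSB-first bit order and even parity are common UART conventions assumed here for the sketch rather than taken from the chapter:

```python
def uart_character(data_byte: int) -> list:
    """Build the 11-bit transmission character: one start bit (0),
    8 data bits, one parity bit, and one stop bit (1)."""
    assert 0 <= data_byte <= 0xFF
    data = [(data_byte >> i) & 1 for i in range(8)]  # LSB first (assumed)
    parity = sum(data) % 2   # even parity: 1s in data + parity are even
    return [0] + data + [parity, 1]
```

Eleven bits are sent for every 8 data bits, an overhead of 37.5%, which is the price of asynchronous (per-character) bit synchronization.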
1.3 Medium Access Control Protocols

All medium access control or multiple-access control (MAC) protocols try to solve the same problem: to let a number of stations share a common resource (namely, the transmission medium) in an efficient manner and such that some desired performance objectives are met. They are a vital part of local area network (LAN) and metropolitan area network (MAN) technologies, which typically connect a small to moderate number of users in a small geographical area, such that a user can communicate with other users. With respect to the Open Systems Interconnection (OSI) reference model, the MAC layer does not form a protocol layer on its own, but is considered a sublayer of either the physical layer or the data link layer [45]. However, due to its distinguished task, the MAC sublayer deserves separate treatment. The importance of the MAC layer is reflected by the fact that many MAC protocol standards exist, for example, the IEEE 802.x standards.

Its most fundamental task is to determine for each station attached to a common broadcast medium the points in time where it is allowed to access the medium, i.e., to send data or control frames. To this end, each station executes a separate instance of a MAC protocol. The design and behavior of a MAC protocol depend on the design goals and the properties of the underlying physical medium. Specifically for hard real-time communications, the MAC layer is a key component: if the delays on the MAC layer are not strictly bounded, the upper layers cannot compensate for this.

A large number of MAC protocols have been developed during the last three decades. The following references are landmark papers or survey articles covering the most important protocols: [7], [8], [17], [20], [21], [32], [33], [34], [36], [42], [43], [48], [49], [50], [54]. Furthermore, MAC protocols are covered in many textbooks on computer networking, for example, [12], [23], [45].
In this survey, we restrict ourselves to those protocols that are important for industrial applications and that have found some deployment in factory plants, either as stand-alone solutions or as building blocks of more complex protocols.
1.3.1 Requirements and Quality-of-Service Measures
There are a number of (sometimes conflicting) requirements for MAC protocols; some of them are specific to industrial applications with hard real-time and reliability constraints. There are two main delay-oriented measures: the medium access delay and the transmission delay. The medium access delay is the time between the arrival of a frame and the time its transmission starts. This delay is affected by the operational overhead of the MAC itself, which may include collisions, MAC control frames, backoff and waiting times, and so on. The transmission delay denotes the time between frame arrival and its successful reception at the intended receiver. Clearly, the medium access delay is a part of the transmission delay. For industrial applications with hard real-time requirements, both delays must be upper bounded. In addition, a desirable property is to have low medium access delays under low network loads. A key requirement for industrial applications is support for priorities: important frames (for example, alarms or periodic process data) should be transmitted before unimportant ones. This requirement can be posed locally or globally: in the local case, each station decides independently which of its waiting frames is transmitted next; there is no guarantee that station A's important frames are not blocked by station B's unimportant frames. In the global case, the protocol identifies the most important frame of all stations, to be transmitted next.
© 2005 by CRC Press
Principles of Lower-Layer Protocols for Data Communications
The need to share bandwidth between stations constitutes another important class of desired MAC properties. A frequently posed requirement is fairness: stations should get their fair share of the bandwidth, even if other stations demand much more. It is also often required that a station receives a minimum bandwidth, for example, for the transmission of periodic process data of fixed size. With respect to throughput, it is clearly important to keep the MAC overhead small. This concerns the frame formats, the number and frequency of MAC control frames, and efficiency losses due to the operation of the MAC protocol. An example of efficiency loss is collisions: the bandwidth spent on collided packets is lost, since typically the collided frames are useless and must be retransmitted. A MAC protocol is said to be stable if an increase in the overall load does not lead to a decrease in throughput. Depending on the application area, other constraints can be important as well. For simple field devices, the MAC implementation should have low complexity and be simple enough to be implementable in hardware. For mobile stations using wireless media, energy consumption is a major concern; therefore, power-saving mechanisms are needed. For wireless transmission media, the MAC should contain additional mechanisms to adapt to the instantaneous error behavior of the wireless channel; possible control knobs are the transmit power, error-correcting codes, the bit rate, and several more.
1.3.2 Design Factors
The most important factors influencing the design of MAC protocols are the medium properties/medium topology and the available feedback from the medium. We can broadly distinguish between guided media and unguided media. In guided media the signals originating from frame transmissions propagate within well-specified geographical bounds, typically within copper or fiber cables. If the medium is properly shielded, the communications are invisible beyond these bounds, and two cables can be placed close to each other without mutual interference. In contrast, in unguided media (with radio frequency or infrared wireless media being the prime examples) the wave propagation is visible in the whole geographical vicinity of the transmitter, and ongoing transmissions can be received at any point close enough to the transmitter. Therefore, two different networks overlaid within the same geographical region can influence each other. This coexistence problem appears, for example, with IEEE 802.11b [37] and Bluetooth [13, 22]. Both systems utilize the 2.4-GHz industrial, scientific, and medical (ISM) band [16, 25, 35]. Guided media networks can have a number of topologies. We discuss a few examples. In a ring topology (see Figure 1.3), each station has a point-to-point link to its two neighbors, such that the stations form a ring. In a bus topology like the one shown in Figure 1.4, the stations are connected to a common bus
FIGURE 1.3 Ring topology.
FIGURE 1.4 Bus topology (the black boxes are line terminations).
FIGURE 1.5 Star topology.
FIGURE 1.6 Partial mesh topology.
and all stations see the same signals. Hence, the bus is a broadcast medium. In the star topology illustrated in Figure 1.5, all stations only have a physical connection to a central device, the star coupler, which repeats and optionally amplifies the signals coming from one line to all the other lines. A network with a star topology also provides a broadcast medium, where each station can hear all transmissions. When using wireless transmission media, the distance between stations might be too large to allow all stations to receive all transmissions. Therefore, the network is often only partially connected or has a partial mesh structure, shown in Figure 1.6. Additional routing mechanisms have to be employed to implement multihop transmission, for example, from station 4 to station 8. An important property of a physical channel is the available feedback. Specifically, some kinds of media allow a station to read back data from the channel while transmitting. This can be done to detect faulty transceivers (like in the PROFIBUS protocol [52]), collisions (like in the Ethernet protocol), or parallel ongoing transmissions of higher priority (like in the Controller Area Network (CAN) protocol). This feature is typically not available when using wireless technologies: it is not possible to send and receive simultaneously on the same channel.
1.3.3 Random Access Protocols
In random access (RA) protocols the stations are uncoordinated and the protocols work in a fully distributed manner. RA protocols typically incorporate a random element, for example, by exploiting random packet arrival times, setting timers to random values, and so on. The lack of central coordination and of fixed resource assignment allows the sharing of a channel among a potentially infinite number of stations, whereas fixed-assignment and polling protocols support only a finite number of stations. However, the randomness can make it impossible to give deterministic guarantees on medium access delays and transmission delays. Many RA protocols are used not only on their own, but also as building blocks of more complex protocols. One example is the GSM system, where speech data are transmitted in exclusively allocated time slots on a certain frequency, but the call setup messages have to contend for a shared channel using an ALOHA-like protocol.
1.3.3.1 ALOHA and Slotted ALOHA
A classical protocol is ALOHA [7], for which we present two variants here. In both variants a number of stations want to transmit packets to a central station. In pure ALOHA a station sends a newly arriving data frame immediately, without checking the state of the transmission medium. Hence, frames from multiple stations can overlap at the central station (a collision) and become unrecognizable. In slotted ALOHA all stations are synchronized to a common time reference and the time is divided into fixed-size time slots. Newly arriving frames are transmitted at the beginning of the next time slot. In both ALOHA variants the transmitter starts a timer after frame transmission. The receiver has to send an immediate acknowledgment frame upon successful reception of the data frame. When the transmitter receives the acknowledgment, it stops the timer and considers the frame successfully transmitted. If the timer expires, the transmitter selects a random backoff time and waits for this time before the frame is retransmitted. The backoff time is chosen randomly to avoid synchronization of colliding stations. This protocol has two advantages: it is extremely simple and it offers short delays under low network load. However, the protocol does not support priorities, and with increasing network load, the collision rate increases and the transmission delays grow as well. In addition, ALOHA is not stable: above a certain threshold load, an increase in the overall load leads to a decrease in overall throughput. The maximum normalized throughput of pure ALOHA is 1/(2e) ≈ 18% under Poisson arrivals and an infinite number of stations. The maximum throughput can be doubled with slotted ALOHA. A critical parameter in ALOHA is the backoff time, which is typically chosen from a certain time interval (the backoff window). A collision can be interpreted as a sign of congestion.
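The throughput figures quoted above can be checked with a short numerical sketch. It evaluates the classical formulas S = G·e^(−2G) (pure ALOHA, vulnerable period of two frame times) and S = G·e^(−G) (slotted ALOHA), where G is the normalized offered load; the function names are illustrative.

```python
import math

def throughput_pure_aloha(G):
    # Pure ALOHA: vulnerable period of two frame times -> S = G * e^(-2G)
    return G * math.exp(-2 * G)

def throughput_slotted_aloha(G):
    # Slotted ALOHA: vulnerable period of one slot -> S = G * e^(-G)
    return G * math.exp(-G)

# The maxima occur at G = 0.5 (pure) and G = 1.0 (slotted).
print(round(throughput_pure_aloha(0.5), 3))    # 0.184, i.e., 1/(2e), about 18%
print(round(throughput_slotted_aloha(1.0), 3)) # 0.368, i.e., 1/e, about 37%
```

Increasing G beyond these points decreases the throughput again, which is exactly the instability described above.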
If another collision occurs after the backoff time, the next backoff time should be chosen from a larger backoff window to reduce the pressure on the channel. A popular rule for the evolution of the backoff window is the truncated binary exponential backoff scheme, where the backoff window size is doubled upon every collision. Above a certain number of failed trials, the window remains constant. After a successful transmission the backoff window is restored to its original value.
1.3.3.2 CSMA Protocols
In carrier-sense multiple-access (CSMA) the stations act more carefully than in ALOHA: before transmitting a frame they listen on the medium (carrier sensing) to see whether it is busy or free [32, 46]. If the medium is free (many protocols require it to be contiguously free for some minimum amount of time), the station transmits its frame. If the medium is busy, the station defers transmission. The various CSMA protocols differ in the subsequent steps. In nonpersistent CSMA the station simply defers for a random time (backoff time) without listening to the medium during this time. After this waiting time the station listens again. All other protocols discussed next wait until the end of the ongoing transmission before starting further activities. In p-persistent CSMA (0 < p < 1) the time after the preceding transmission ends is divided into time slots. A station listens to the medium at the beginning of a slot. If the medium is free, the station starts transmitting its frame with probability p; with probability 1 – p it waits for the next slot. In 1-persistent CSMA the station transmits immediately without further actions. Both approaches still carry the risk of collisions, since two or more stations can decide to transmit (1-persistent CSMA) or can choose the same slot (p-persistent CSMA).
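The per-slot decision of p-persistent CSMA can be sketched as follows; the function name and the parameter choice are illustrative.

```python
import random

def p_persistent_slot(p, rng):
    """One idle slot of p-persistent CSMA: a backlogged station that
    senses the medium free transmits with probability p; otherwise
    it waits for the next slot."""
    return rng.random() < p

# With p = 0.1, roughly one decision in ten is "transmit", trading extra
# delay for a lower collision probability when many stations are backlogged.
rng = random.Random(42)
fraction = sum(p_persistent_slot(0.1, rng) for _ in range(10_000)) / 10_000
print(fraction)
```

Choosing a small p reduces the chance that two backlogged stations pick the same slot, at the cost of longer medium access delays.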
The problem is the following: if station A senses the medium as idle and starts transmission at time t0, station B would notice this at the earliest at some later time t0 + t, due to the propagation delay. If B performs carrier sensing at a time between t0 and t0 + t, it senses the medium to be idle and starts transmitting too, resulting in a collision. Therefore, the collision probability depends on the propagation delay, and thus on the maximum geographical distance between stations. Similar to ALOHA, pure CSMA protocols rely on acknowledgments to recognize collisions. Although the throughput of CSMA-based protocols is much better than that of ALOHA (ongoing transmissions can be completed without disturbance), the number of collisions and their duration limit the throughput. Collision detection and collision avoidance techniques can be used to mitigate these problems. These are discussed in the following sections. Specifically for wireless media, the task of carrier sensing is not without problems. After all, the transmitter senses the medium ultimately because it wants to know the state of the medium at the
FIGURE 1.7 Hidden-terminal scenario.
intended receiver, since collisions are only important at the receiver. However, due to path loss [40, Chapter 4], any signal experiences attenuation with increasing distance. If a minimum signal strength is required, the hidden-terminal problem occurs (refer to Figure 1.7): consider three stations, A, B, and C, with transmission radii as indicated by the circles. Stations A and C are in range of B, but A is not in the range of C and vice versa. If C starts to transmit to B, A cannot detect this by its carrier-sensing mechanism and considers the medium to be free. Hence, A also starts frame transmission and a collision occurs at B. For wireless media there is a second scenario where carrier sensing leads to false predictions about the channel state at the receiver: the so-called exposed-terminal scenario, depicted in Figure 1.8. The four stations A, B, C, and D are placed such that the pairs A/B, B/C, and C/D can hear each other; all remaining combinations cannot. Consider the situation where B transmits to A, and a short moment later C wants to transmit to D. Station C performs carrier sensing and senses that the medium is busy, due to B's transmission. As a result, C postpones its transmission. However, C could safely transmit its frame to D without disturbing B's transmission to A. This leads to a loss of efficiency. Two approaches to solving these problems are busy-tone solutions [50] and the request-to-send (RTS)/clear-to-send (CTS) protocol, as applied in the IEEE 802.11 wireless LAN (WLAN) medium access control protocol [47]. In the busy-tone approach the receiver transmits a busy-tone signal on a second channel during frame reception. Carrier sensing is performed on this second channel. This solves the exposed-terminal problem. The hidden-terminal scenario is also solved, except in the rare cases where A and C start their transmissions simultaneously. The RTS/CTS protocol attacks the hidden-terminal problem using only a single channel.
Consider the case that A has a data frame for B. After A has obtained channel access, it sends a short RTS frame to B, indicating the time duration needed for the whole frame exchange sequence (the sequence consists of the RTS frame, the CTS frame, a data frame, and a final acknowledgment frame). If B receives the RTS frame properly, it answers with a CTS frame, indicating the time needed for the remaining frame exchange sequence. Station A starts transmission after receiving the CTS frame. Station C, hearing the RTS and CTS frames, defers its transmissions for the indicated time, thus not disturbing the ongoing frame exchange. It is a conservative choice to defer on any of these frames, but the exposed-terminal problem still exists. If station C defers only on receiving both frames, the exposed-terminal problem is solved. However, there is the risk of bit errors in the CTS frame, which may lead C to start transmissions falsely. The RTS/CTS protocol of IEEE 802.11 does not resolve collisions of RTS frames at the receiver, nor does
FIGURE 1.8 Exposed-terminal scenario.
it entirely solve the hidden-terminal problem [39]. Furthermore, this four-way handshake imposes serious overhead, which only pays off for large frames.
1.3.3.3 CSMA Protocols with Collision Detection
If two or more stations collide without recognizing this, they uselessly transmit their entire frames. If the stations could quickly detect a collision and abort transmission, less bandwidth would be wasted. The class of carrier-sense multiple access with collision detection (CSMA/CD) protocols enhances the basic CSMA method with a collision detection facility. Collision detection is performed by reading back the signal from the cable during transmission and comparing the measured signal with the transmitted one. If the signals differ, a collision has been detected [23, Section 6.1.3]. When a station experiences a collision, it executes a backoff algorithm. In the IEEE 802.3 Ethernet this algorithm works with slotted time. A time slot is large enough to accommodate the maximum round-trip time, in order to make sure that all stations have the chance to reliably recognize an ongoing transmission. As an example, in the CSMA/CD method of IEEE 802.3 a truncated binary exponential backoff scheme is used: after the first collision, a station randomly chooses to wait either 0 or 1 slot. If another station starts transmission during the waiting time, the station defers. After the second collision, a station chooses to wait between 0 and 3 slots, and for each subsequent collision, the backoff window is doubled. After 10 collisions the backoff window is kept fixed at 1024 slots, and after 16 collisions the station gives up and discards the frame. In wireless LANs (for example, in the IEEE 802.11 WLAN) acknowledgment frames are used to give the transmitter feedback, since wireless transceivers cannot transmit and receive simultaneously on the same channel. The lack of an acknowledgment frame indicates either a collision or a transmission error.
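The truncated binary exponential backoff of IEEE 802.3 described above can be sketched as follows; the function name is illustrative.

```python
import random

def ethernet_backoff_slots(n_collisions, rng=random):
    """IEEE 802.3 truncated binary exponential backoff: after the n-th
    collision, wait a random number of slots from [0, 2^min(n, 10) - 1];
    after 16 collisions the frame is discarded."""
    if n_collisions > 16:
        raise RuntimeError("frame discarded after 16 collisions")
    window = 2 ** min(n_collisions, 10)  # window capped at 1024 slots
    return rng.randrange(window)

for n in (1, 2, 3, 10, 16):
    print(n, 2 ** min(n, 10))  # window sizes: 2, 4, 8, 1024, 1024
```

Doubling the window spreads retransmissions over ever larger intervals, which lowers the collision probability under congestion but also makes the access delay unpredictable.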
Furthermore, two colliding frames do not necessarily result in a total loss of information: when the signal strength of one frame is much stronger than that of the other, the receiver may be able to successfully decode the stronger frame (near–far effect).
1.3.3.4 CSMA Protocols with Collision Resolution
This class of CSMA protocols reacts to collisions not by going into a backoff mode and deferring transmissions, but by trying to resolve them. One approach to resolving a collision is to determine one station among the contenders that is ultimately allowed to send its frame. Examples are protocols with bit-wise priority arbitration, like the MAC protocol of the Controller Area Network (CAN) [30] and the protocol used for the D-channel of the Integrated Services Digital Network (ISDN) [41]. Another approach is to give all contenders a chance to transmit, as is done in the adaptive tree walking protocol [14], which works as follows: The time is slotted, just as in the Ethernet CSMA/CD protocol. Furthermore, all stations are arranged in a balanced binary tree T and know their respective positions in this tree. All stations wishing to transmit a frame (called backlogged stations) wait until the end of the ongoing transmission and start to transmit their frame in the first slot (slot 0). If there is only one backlogged station, it can transmit its frame without further disturbance. If two or more stations collide, then in slot 1 only the members of the left subtree TL are allowed to try transmission again. If another collision happens, only stations of the left subtree TL,L of TL are allowed to transmit in slot 2, and so forth. On the other hand, if only one station from TL transmits its frame, then for fairness reasons the next frame transmission is reserved for a station from the right subtree TR, and so on. The bit-wise arbitration protocols do not try to be fair to stations. As an example, we present the MAC protocol of CAN.
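Anticipating the description that follows, CAN's bit-wise arbitration can be sketched as a wired-AND over the identifier bits: a dominant 0 overrides a recessive 1, so the numerically lowest identifier wins. An 11-bit identifier width is assumed here for illustration.

```python
def can_arbitration(ids, width=11):
    """Resolve contention among stations transmitting the given (distinct)
    identifiers: per bit, the bus carries the wired-AND (dominant 0 wins);
    a station that reads back a bit it did not send withdraws."""
    contenders = set(ids)
    for bit in range(width - 1, -1, -1):
        bus = min((i >> bit) & 1 for i in contenders)  # dominant 0 overrides 1
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    assert len(contenders) == 1, "identifiers must be distinct"
    return contenders.pop()

print(can_arbitration([0x65A, 0x100, 0x3FF]))  # 256: 0x100 is lowest and wins
```

Note that the winner's frame is transmitted undisturbed; the losers simply retry at the end of the winning frame, so no bandwidth is lost to the collision.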
CAN requires a transmission medium that guarantees that overlapping signals do not destroy each other, but lead to a valid signal. If two stations transmit the same bit, the medium adopts the common bit value. If one station transmits a zero bit and the other a one bit, the medium adopts a well-defined state, for example, a zero bit. The CAN protocol uses a priority field of a certain length at the beginning of a MAC frame. Backlogged stations wait until the end of an ongoing frame and then transmit the first bit of their priority field. In parallel, they read back the state of the medium and compare it with their transmitted bits. If both agree, the station continues with the second bit of the priority field. If the bits differ, the station has lost contention and defers until the end of the next frame. This process
is continued until the end of the priority field is reached. If it can be guaranteed that all priority values are distinct, only one station survives contention. This protocol supports global frame priorities in a natural way, and the medium access time for the highest-priority frame is tightly bounded. However, the assignment of priorities to stations or frames is nontrivial when fairness is a goal. If the priorities are assigned on a per-station basis, the protocol is inherently unfair. One solution is to rotate station priorities over time. In CAN applications the priorities are assigned not to stations but to data. Another drawback of the protocol is that all stations have to be synchronized with a precision of a bit time, and the need for all stations to agree on the state of the medium limits either the bit rate or the geographical extension of a CAN network.
1.3.3.5 CSMA Protocols with Collision Avoidance
If it is technically not feasible to detect collisions immediately, one might try to avoid them. Protocols belonging to this class are called carrier-sense multiple access with collision avoidance (CSMA/CA) protocols. An important application area is wireless LANs, where (1) stations cannot transmit and receive simultaneously on the same channel, and (2) the transmitter cannot directly detect collisions at the receiver due to path loss and the need for a minimum signal strength (see the discussion of the hidden-terminal scenario in Section 1.3.3.2). The IEEE 802.11 WLAN protocol combines two mechanisms to avoid collisions. The first is the RTS/CTS handshake protocol described in Section 1.3.3.2. The second mechanism, the carrier-sensing mechanism of IEEE 802.11, not only requires a minimum idle time on the channel, but each station also chooses a random backoff time, during which the carrier-sense operation is continued. If another station starts to transmit in the meantime, the station defers and resumes after the other frame is finished.
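The random-backoff collision avoidance just described can be sketched as a per-station countdown; the contention window size is illustrative, and 802.11 details such as interframe spaces are ignored.

```python
import random

def contend(stations, cw, rng):
    """One CSMA/CA contention round: each backlogged station draws a
    random backoff from [0, cw - 1] and counts it down while the medium
    is idle; the smallest backoff transmits first. Equal smallest draws
    still collide."""
    backoffs = {s: rng.randrange(cw) for s in stations}
    first = min(backoffs.values())
    winners = [s for s, b in backoffs.items() if b == first]
    return winners  # one winner, or a residual collision if more than one

rng = random.Random(3)
print(contend(["A", "B", "C"], cw=16, rng=rng))
```

A larger contention window makes a tie (and hence a collision) less likely, at the price of longer average waiting times.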
This approach with random backoff times also enables the introduction of stochastic priorities into IEEE 802.11: frames with different priorities can choose their backoff times from different distributions, with more important frames likely having shorter backoffs than unimportant ones. Such an approach is proposed in [11] and is also used for the IEEE 802.11e extension of the IEEE 802.11 standard. Another example is the EY-NPMA protocol of HIPERLAN [18]. Here the collision avoidance part consists of three phases; all stations wishing to transmit a frame wait for the end of the ongoing transmission. In the first phase (priority phase), the stations wait for a number of slots corresponding to the frame's priority (there are five distinct priorities). If station A decides to transmit in slot n and station B starts in slot m < n, then A defers, since B has a higher-priority frame. In the second phase (elimination phase), the surviving stations transmit a burst of random length before switching to receive mode. If a station then receives energy, it gives up, since another station sent a longer burst. In the third phase (yield phase), the surviving stations remain idle for a random amount of time. If another station starts to transmit in the meantime, the station defers. Otherwise, the station starts to transmit its data frame.
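The three EY-NPMA phases can be sketched roughly as follows; the slot counts and burst lengths below are invented for illustration, not the values from the HIPERLAN standard.

```python
import random

def ey_npma_winner(stations, rng):
    """stations: dict name -> priority (0 = highest, 4 = lowest)."""
    # Priority phase: only stations starting in the earliest slot survive.
    best = min(stations.values())
    survivors = [s for s, p in stations.items() if p == best]
    # Elimination phase: only the longest random burst survives.
    bursts = {s: rng.randint(1, 12) for s in survivors}
    longest = max(bursts.values())
    survivors = [s for s in survivors if bursts[s] == longest]
    # Yield phase: the shortest random idle wait transmits first.
    waits = {s: rng.randint(0, 9) for s in survivors}
    shortest = min(waits.values())
    survivors = [s for s in survivors if waits[s] == shortest]
    return survivors  # usually one station; a tie means a residual collision

rng = random.Random(7)
print(ey_npma_winner({"A": 2, "B": 0, "C": 0}, rng))
```

Station A, carrying the lower-priority frame, is always eliminated in the first phase; B and C then contend in the elimination and yield phases.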
1.3.4 Fixed-Assignment Protocols
In fixed-assignment (FA) protocols a station is assigned a channel resource (frequency, time, code, space) exclusively; i.e., it does not need to contend with other stations when using its share, and it is intrinsically guaranteed that medium access can be achieved within a bounded time. In frequency-division multiple-access (FDMA) systems the available spectrum is subdivided into N subchannels, with some guard band between them. A subchannel is assigned exclusively to a station. When a frame arrives, the station can transmit immediately on the assigned subchannel; the intended receiver has to know the subchannel in advance. Idle subchannels cannot be used by highly loaded stations. When a station wants to use multiple subchannels in parallel, it needs multiple transceivers. In code-division multiple-access (CDMA) systems the stations spread their frames over a much larger bandwidth than needed, while using different codes to separate their transmissions. The receiver has to know the code used by the transmitter; all parallel transmissions using other codes appear as noise. Similar to FDMA, stations can transmit newly arriving frames immediately.
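The code separation of CDMA can be illustrated with two short orthogonal (Walsh-style) spreading codes, a simplified baseband sketch: despreading with the matching code recovers the data, while the other station's parallel transmission cancels out in the correlation.

```python
# Two orthogonal spreading codes of length 4
c1 = [+1, +1, +1, +1]
c2 = [+1, -1, +1, -1]

def spread(bits, code):
    """Map each data bit (0/1 -> -1/+1) onto the spreading code chips."""
    return [(1 if b else -1) * chip for b in bits for chip in code]

def despread(signal, code):
    """Correlate the received signal with a code, one bit period at a time."""
    n = len(code)
    out = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], code))
        out.append(1 if corr > 0 else 0)
    return out

# Both stations transmit in parallel; the channel adds the two signals.
tx = [a + b for a, b in zip(spread([1, 0], c1), spread([0, 1], c2))]
print(despread(tx, c1))  # [1, 0] -- the other transmission appears as noise
print(despread(tx, c2))  # [0, 1]
```

Because the codes are orthogonal, each receiver sees only "its" transmitter; this is what allows CDMA stations to transmit newly arriving frames immediately and in parallel.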
In time-division multiple-access (TDMA) systems the time is divided into fixed-length superframes, which in turn are divided into time slots. Each station is assigned a set of slots in every superframe.* During its slots, a station can use the full channel bandwidth. The stations need to be synchronized on slot boundaries; however, the slots contain some guard times to compensate for inaccurate synchronization. In a centralized setting where all stations transmit to a central station, such inaccuracies can be introduced by the different propagation delays resulting from different distances between the stations and the central controller. In the GSM network a timing-advance mechanism is used to compensate for different propagation delays [55]. In space-division multiple-access (SDMA) systems spatial resources are divided among stations. Consider, for example, a cellular system where the base station is equipped with a smart antenna array. Using this array, the base station can form a number of spatially directed spot beams and focus them on the stations. If a beam covers two or more stations, they have to share the channel by some other protocol, but stations in different beams can transmit in parallel. Another example is the use of sectored antennas in cellular systems. In all these schemes the allocation of channel resources to stations can be static or dynamic. In the static case the allocation may be preconfigured. In a dynamic scheme a station requests the resource once from some resource management facility, which may be part of a central station/access point. The Time-Triggered Protocol is an example of such a scheme [51]. Another example is the cyclic window in WorldFIP, which offers a preconfigured allocation of time slots. Some limited dynamics can be introduced by changing the allocation tables at appropriate times.
The fixed assignment of channel resources is advantageous, specifically for industrial applications, for the following reasons:
• It allows the guarantee of a minimum bandwidth to a station.
• It allows the guarantee of a strictly bounded medium access time as well as a strictly isochronous service.
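The bounded-access-time guarantee of TDMA can be illustrated with a small, hypothetical superframe allocation (the slot counts and station names below are invented):

```python
def tdma_schedule(assignment, n_slots):
    """assignment: station -> set of slot indices owned in each superframe."""
    slots = [None] * n_slots
    for station, owned in assignment.items():
        for s in owned:
            assert slots[s] is None, "slots must be assigned exclusively"
            slots[s] = station
    return slots

def worst_case_access_slots(assignment, n_slots, station):
    """Largest distance (in slots) between consecutive slots of a station:
    an upper bound on how long a newly arrived frame waits for a slot."""
    owned = sorted(assignment[station])
    gaps = [((owned[(i + 1) % len(owned)] - s) % n_slots) or n_slots
            for i, s in enumerate(owned)]
    return max(gaps)

# Hypothetical 8-slot superframe
alloc = {"A": {0, 4}, "B": {1}, "C": {2, 3, 5, 6, 7}}
print(tdma_schedule(alloc, 8))                 # ['A', 'B', 'C', 'C', 'A', 'C', 'C', 'C']
print(worst_case_access_slots(alloc, 8, "B"))  # 8: one full superframe
print(worst_case_access_slots(alloc, 8, "A"))  # 4: slots are evenly spaced
```

Evenly spacing a station's slots within the superframe (as for station A) halves its worst-case waiting time compared to owning a single slot, which is the basis for isochronous service.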
1.3.5 Demand-Assignment Protocols
In demand-assignment (DA) protocols channel resources are also assigned exclusively to a station, but on a much shorter timescale than in fixed-assignment protocols. In the latter case, assignment happens once and lasts for the lifetime of a session, while in demand-assignment protocols resources are assigned only for the duration of a data burst. Consequently, for each new data burst a station must obtain new channel resources. Clearly, this involves appropriate signaling mechanisms. We can broadly distinguish two classes of DA protocols: In distributed protocols there is no central authority for resource allocation; instead, token-passing schemes are often used. In centralized protocols, on the other hand, the stations have to signal their demands to a central station, which assigns resources and schedules transmissions. The signaling channel can be either in-band (requests can be piggybacked onto transmissions of data or control frames) or a separate logical signaling channel with its own medium access procedure, for example, ALOHA/slotted ALOHA. For industrial applications demand-assignment protocols have two major advantages: they can guarantee a bounded medium access time and they allow the use of idle resources. However, for the distributed schemes there is inevitably some jitter in the medium access times, which hinders strictly isochronous services. The centralized schemes introduce a single point of failure, namely, the resource manager.
1.3.5.1 Centralized Schemes: Hub-Polling Protocols and Reservation Protocols
As a very general description [44], a hub-polling system consists of a central station (called the hub) and a number of stations, with each station conceptually having a queue of frames. The hub carries out two different tasks: (1) it queries the queue states of the stations, and (2) it assigns bandwidth to the stations according to the query results and some polling policy.
Typically it is assumed that a query is less costly than to serve a frame; otherwise, the query overhead would not be justified. To be queried, a station must register itself with the hub. Polling schemes differ in the sequence by which stations are polled:
• In round-robin, the stations are visited one after another.
• In table-driven schemes, the next station to be visited is determined from a prespecified table.
• In random polling, the next station to poll is determined randomly.
Furthermore, they differ in the type of service a polled station is granted:
• k-limited service: Up to k frames are served per station before proceeding to the next station.
• Time-limited service: The station may transmit frames, including retransmissions, for no longer than a specified time.
• Exhaustive service: A queue is serviced until it is empty.
• Gated service: The server serves only those frames of station i that were already present when starting service for i.
As an example, the master/slave protocol of PROFIBUS can be classified as a table-driven and time-limited service (however, with varying masters). In the BITBUS protocol [29], the role of the master does not change over time. A variation of hub-polling protocols is probing protocols [24, 42]. These are based on the observation that polling each station separately is wasteful if the load is low. Instead, it is more effective to poll a group of stations as a whole. For example, the hub may announce that a random access slot follows, which can be used by stations belonging to a certain group to signal their transmission needs. If no station answers, the next group can be polled. If a single station answers, it is granted access to the medium. If two or more stations answer, their requests will collide in the random access slot. Different methods can now be applied to resolve this collision, for example, the tree walking approach discussed in Section 1.3.3.4, or all stations in the group can be polled separately.
*It is perfectly possible to assign slots every k-th superframe as well.
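Two of the service disciplines above can be sketched over a simple frame queue; gated service is modeled here by snapshotting the queue length at the start of the visit (the frame names are invented).

```python
from collections import deque

def serve_k_limited(queue, k):
    """k-limited service: serve at most k frames per visit."""
    served = []
    while queue and len(served) < k:
        served.append(queue.popleft())
    return served

def serve_gated(queue):
    """Gated service: serve exactly the frames present when the visit
    starts; frames arriving during service wait for the next visit."""
    return [queue.popleft() for _ in range(len(queue))]

q = deque(["f1", "f2", "f3"])
print(serve_k_limited(q, 2))  # ['f1', 'f2']
print(serve_gated(q))         # ['f3']
```

Exhaustive service would instead keep draining the queue until it is empty, including frames that arrive while the station is being served, so a single busy station can hold the server longer.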
In [56] the latter approach is introduced, along with a scheme that adapts the group sizes to the current load. In reservation protocols the stations have to send a reservation message to the resource manager. The reservation message may specify the length of the desired data transmission and its timing constraints. The resource manager can perform an admission control test to decide whether the request can be satisfied without harming the guarantees given to already admitted requests. After successful admission control, the resource manager sends some feedback describing the allocated resources (for example, the time slots to use). There are three common methods to transmit reservation messages: (1) in piggybacking schemes the reservation requests are sent along with already admitted data or control frames; (2) the stations send request frames on a separate signaling channel using a contention-based MAC protocol (ALOHA or CSMA protocols); and (3) the resource manager may poll all stations that are currently idle and thus cannot use piggybacking. Many protocols developed in the context of wireless ATM [17] belong to this class, for example, the MASCARA protocol [38]. The FTT-CAN protocol [10] is another example of this class, where stations send reservation requests for periodic transmissions to a central master station.
1.3.5.2 Distributed Schemes: Token-Passing Protocols
In distributed schemes there is no central facility controlling resource allocation or medium access. Instead, a special frame, called the token frame, circulates between the stations. Only the station that currently holds the token (the token owner) is allowed to initiate transmissions. After some time, the token owner must pass the token to another station by sending a token frame.
Token-passing schemes can be applied in networks with a ring topology (examples: IEEE 802.5 Token Ring [28] or Fiber Distributed Data Interface (FDDI) [15, 31]) or with a bus/tree topology (examples: IEEE 802.4 Token Bus [27] or PROFIBUS with the FMS profile [52]). To guarantee an upper bound on medium access delay, the IEEE Token Bus, FDDI, and PROFIBUS protocols use variants of the timed-token protocol [9]. In this protocol all stations agree on a common parameter, the target token rotation time TTRT. Furthermore, each station is required to measure the time TRT that passed between the last time it received the token and the actual token reception time. This time
Principles of Lower-Layer Protocols for Data Communications
is called the token rotation time. If the difference TTRT – TRT is positive, the arriving token is called an early token; otherwise, it is called a late token. Some protocols forbid a station to transmit when receiving a late token; however, the PROFIBUS protocol allows transmission of a single high-priority frame in case of a late token. When a station receives an early token, it may transmit for a time corresponding to TTRT – TRT. This way, the timed-token protocol guarantees an upper bound on the medium access delay. Token-passing protocols over broadcast media (bus, tree) construct a logical token-passing ring. The token frame is passed among all stations in this ring; each station gets the token once per cycle. The ring members have the additional burden of executing ring maintenance algorithms, which include, among others, including new stations, excluding leaving or crashed stations, and detecting and repairing lost tokens. These mechanisms rely on control frames and are designed in a way that they do not harm the timing guarantees given by the timed-token protocol.
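The timed-token rule above can be sketched in a few lines; the function name and the numeric values are illustrative only.

```python
# Minimal sketch of the timed-token allowance: on token arrival a station
# may transmit ordinary traffic for up to TTRT - TRT if the token is early,
# and not at all if it is late (PROFIBUS-style protocols additionally allow
# one high-priority frame on a late token, which is omitted here).

def token_allowance(ttrt, trt):
    """Transmission time granted for ordinary traffic on token arrival."""
    return max(0.0, ttrt - trt)   # early token: up to TTRT - TRT; late: none

print(token_allowance(10.0, 7.5))    # early token: allowance 2.5
print(token_allowance(10.0, 12.0))   # late token: allowance 0.0
```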
1.3.6 Meta-MAC Protocols
In reference [19] meta-MAC protocols are introduced. The basic idea is simple and elegant: a station contains not only a single MAC instance, but several of them, running in parallel. These can be entirely different protocols or the same protocol, but with different parameters. However, only one protocol is really active at a given time in the sense that its decisions (transmit/not transmit) are executed; in the other instances decisions are only recorded. From time to time a new active protocol is selected. This selection is based on history information about transmission outcomes (success, failure). For each candidate protocol it is evaluated how successful the protocol would have been given the outcomes in the history. For example, a protocol that produced a lot of transmit decisions in successful time slots would receive a high ranking, while a protocol whose transmit decisions would have resulted in collisions receives a low ranking. Based on this ranking, a new protocol is chosen.
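The history-based selection can be sketched as follows. The scoring rule (reward matched successes, penalize would-be collisions) is an illustrative assumption, not the exact metric of [19].

```python
# Sketch of meta-MAC protocol selection: rank each candidate MAC instance
# by how well its recorded per-slot decisions match the observed outcomes,
# then activate the best-ranked candidate.

def rank(decisions, outcomes):
    """decisions: per-slot transmit choices (True/False) of one candidate;
    outcomes: per-slot channel feedback ('idle', 'success', 'collision')."""
    score = 0
    for d, o in zip(decisions, outcomes):
        if d and o == 'success':
            score += 1               # would have transmitted in a usable slot
        elif d and o == 'collision':
            score -= 1               # would have collided
    return score

history = ['success', 'collision', 'idle', 'success']
candidates = {'aloha': [True, True, False, True],
              'csma':  [True, False, False, True]}
best = max(candidates, key=lambda p: rank(candidates[p], history))
print(best)
```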
1.4 Error Control Techniques
When a packet is transmitted over a physical channel, it might be subject to distortions and channel errors. Potential error sources are noise, interference, loss of signal power, etc. As a result, a packet may be either completely or partially lost (for example, when the receiver fails to acquire bit synchronization or loses it somewhere), or a number of the bits within a packet are modified. In some types of channels, errors occur quite frequently, with wireless channels being the prime example [79]. One option to deal with errors is to tolerate them. For example, in Voice-over-IP systems a loss rate of speech packets of approximately 1% still gives an acceptable speech quality at the receiver, depending on the codec and the influence of error concealment techniques [65, Chapter 7]. Hence, as long as the loss rate is below this level, no action needs to be taken. However, in safety-critical industrial applications, errors are often not tolerable; they must be detected and subsequently corrected. There are the following fundamental approaches to error control [61, 69, 70]:
• In open-loop approaches the transmitter receives no feedback from the receiver about the transmission outcomes. Redundancy is introduced to protect the transmitted data against a certain amount of errors.
• In closed-loop schemes the transmitter gets feedback about erroneously received packets. The receiver requests retransmission of these packets by the transmitter.
• In hybrid schemes these two approaches are combined.
The detection of errors is based on checksums, which are appended to a packet. Well-known kinds of checksums are cyclic redundancy checks (CRCs) or parity bits [69]. However, no checksum algorithm is perfect; there are always bit error patterns that cannot be detected by a checksum algorithm. Hence, the residual error probability is nonzero, but fortunately very small for many practical channels.
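CRC-based detection can be sketched as follows; the bit-by-bit implementation below uses the CRC-CCITT polynomial 0x1021 as an example, whereas a real link layer would typically use a table-driven variant.

```python
# Sketch of checksum-based error detection with a 16-bit CRC
# (polynomial 0x1021, initial value 0xFFFF, MSB-first).

def crc16(data: bytes, poly=0x1021, crc=0xFFFF):
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

frame = b"sensor reading 42"
fcs = crc16(frame)                  # FCS appended to the frame by the sender
corrupted = b"sensor reading 43"    # one bit differs ('2' vs. '3')
print(crc16(corrupted) != fcs)      # the receiver detects the error
```

Since the generator polynomial has more than one term, every single-bit error changes the CRC; the residual error probability stems from the rarer multi-bit patterns that map to the same remainder.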
A study of the performance of checksum algorithms over real data can be found in [76]. There is a rich literature on error control; some standard references are [61], [69], [70], and [71].
1.4.1 Open-Loop Approaches
In general, open-loop approaches involve redundant data transmission. Several kinds of redundancy can be used:
• Send multiple copies of a packet.
• Add redundancy bits to the packet data.
• Diversity techniques.
In the multiple-copies approach, the transmitter sends K identical copies of the same packet [57], each one equipped with a checksum. If the receiver receives at least one copy without checksum errors, this is accepted as the correct packet. If the receiver receives all copies with checksum errors, it might apply a bit-by-bit majority voting scheme [73, Chapter 4] on all received copies and check the result again. A variation of the multiple-copies scheme is to not send multiple copies of the same packet, but to send each bit of the user data multiple times: instead of sending 00110101, the transmitter sends, for example, 000.000.111.111.000.111.000.111. Hence, each user bit is transmitted three times and the receiver applies majority voting to each group of three bits.
In error-correcting codes or forward error correction (FEC) codes, a number n – k of redundancy bits are appended to k bits of user data and the block of n bits is transmitted (the fraction k/n is called the code rate), such that bit errors can be detected and a limited number of bit errors can be corrected [69–71]. In block coding schemes, the user data are divided into blocks of k bits and each block is coded independently. Some well-known block FEC schemes are Reed–Solomon codes, Hamming codes, and Bose–Chaudhuri–Hocquenghem (BCH) codes. In convolutional coding schemes, the encoder has some memory, such that the coding of the current bit affects the coding of future bits. Therefore, there are no clear block boundaries. Recently, the class of turbo codes has attracted considerable attention [75, 60]. In this class of codes two convolutional codes are concatenated and combined with an interleaver [58].
Diversity techniques are often applied on wireless channels.
In general, in diversity schemes multiple copies of the same signal are created and the receiver tries to combine these copies in a sensible way. These copies can be created either explicitly (by sending the same packet multiple times on the same channel, on different channels, in different directions, etc.) or implicitly (by letting the channel create the multiple signal copies — reflections). In the case of receiver diversity, the receiver is equipped with two or more antennas. If these are appropriately spaced [72], the antennas receive two copies of the transmitted waveform, which in the best case are uncorrelated. Hence, it might happen that one antenna receives only a weak signal while the other one experiences good signal quality. The two antenna signals may then be combined in different ways. Clearly, there are many more diversity schemes.
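The bit-repetition variant of the multiple-copies scheme from Section 1.4.1 can be sketched as follows; the function names are illustrative.

```python
# Sketch of the bit-repetition scheme: each user bit is sent three times
# and the receiver majority-votes each group of three received bits.

def encode(bits, k=3):
    return [b for b in bits for _ in range(k)]

def decode(bits, k=3):
    groups = [bits[i:i + k] for i in range(0, len(bits), k)]
    return [1 if sum(g) > k // 2 else 0 for g in groups]

user = [0, 0, 1, 1, 0, 1, 0, 1]   # the 00110101 example from the text
tx = encode(user)
rx = tx[:]
rx[4] ^= 1                        # flip one channel bit
print(decode(rx) == user)         # one bit error per group is corrected
```

The code rate is 1/3, which makes the scheme expensive; structured FEC codes achieve the same correction capability with far less redundancy.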
1.4.2 Closed-Loop Approaches
In closed-loop approaches the receiving station B checks the arriving packets sent by station A by means of checksums and sequence numbers. In addition, station B provides A with feedback information indicating the transmission outcome (success or failure). Usually, B sends acknowledgment frames to provide this feedback to A, but the feedback information may as well be piggybacked onto data frames sent from B to A. Automatic repeat request (ARQ) protocols implement this approach [23, 63]. Some basic ARQ protocols are the send-and-wait/alternating-bit protocol, the Go-back-N protocol, and the Selective-Repeat protocol.
In the send-and-wait/alternating-bit protocol, the transmitter sends a packet and starts a timer. The receiver sends an acknowledgment if the packet is received correctly; otherwise, it keeps quiet. If the transmitter receives the acknowledgment, the timer is canceled and the next packet is transmitted. If the transmitter's timer expires without an acknowledgment, the transmitter retransmits the packet. A 1-bit sequence number is used to prevent duplicates at the receiver, which can occur if not the data frame but the acknowledgment is lost and the same data frame is transmitted again. If the receiver receives a duplicate packet, the packet is acknowledged, but the data are not delivered to the user. This protocol is simple and works reliably as long as the delay for data packets or acknowledgments can be upper bounded.
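The receiver side of the alternating-bit protocol can be sketched as follows; class and method names are illustrative.

```python
# Sketch of the alternating-bit receiver: a duplicate (caused by a lost
# acknowledgment) is re-acknowledged but not delivered to the user again.

class AltBitReceiver:
    def __init__(self):
        self.expected = 0         # 1-bit expected sequence number
        self.delivered = []

    def receive(self, seq, data):
        if seq == self.expected:
            self.delivered.append(data)   # new frame: deliver to the user
            self.expected ^= 1            # toggle the 1-bit sequence number
        return ('ACK', seq)               # duplicates are re-acknowledged

rx = AltBitReceiver()
rx.receive(0, 'a')
rx.receive(0, 'a')            # retransmission after a lost ACK
rx.receive(1, 'b')
print(rx.delivered)           # the duplicate is suppressed
```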
A drawback of this protocol is its inability to fill “long fat pipes” (links with a high bandwidth-delay product), since there can be at most one outstanding (not yet acknowledged) frame at any time. Both the Go-back-N and the Selective-Repeat protocols are not restricted to a single unacknowledged or outstanding frame, but allow for multiple outstanding frames; these protocols are also called sliding-window protocols. The frames are identified by sequence numbers. In the Go-back-N protocol there may be up to N outstanding frames. The transmitter sets a timer for each transmitted frame and resets it as soon as an acknowledgment for this frame is received. If the receiver receives an in-sequence frame, it delivers the frame to its local user and sends a positive acknowledgment; otherwise, the frame is dropped (even if it is received correctly) and the receiver sends a negative acknowledgment or keeps quiet. When the transmitter receives a negative acknowledgment for an outstanding frame, or if the timer for this frame expires, it retransmits this frame and all subsequent outstanding frames. Therefore, it might happen that correctly received frames are retransmitted, which is inefficient. This drawback is addressed by the Selective-Repeat protocol, which works similarly to Go-back-N, but allows the receiver to buffer and acknowledge frames that are not received in sequence. As soon as the missing frames arrive, the buffered frames are delivered to the user in their correct sequence and the buffer is freed.
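The Go-back-N retransmission rule can be sketched in one function; the name is illustrative.

```python
# Sketch of the Go-back-N rule: on a timeout or negative acknowledgment
# for frame `seq`, the transmitter resends that frame and all subsequent
# outstanding frames (Selective-Repeat would resend only `seq`).

def goback_retransmissions(outstanding, seq):
    """outstanding: sequence numbers in flight, in sending order."""
    i = outstanding.index(seq)
    return outstanding[i:]        # the failed frame and everything after it

print(goback_retransmissions([4, 5, 6, 7], 5))   # frames 5, 6, 7 are resent
```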
1.4.3 Hybrid Approaches
Open-loop and closed-loop approaches can be combined, forming so-called hybrid ARQ protocols. Some simple schemes are:
• To each packet some light FEC is applied; the remaining errors are corrected through the ARQ mechanism.
• Normal packets are not FEC coded; only retransmissions are.
• Increase the amount of redundancy for subsequent retransmissions, for example, by adapting the number of copies in a multicopy approach [57].
Another line of attack is to make the receiver more clever and to take advantage of the information contained in already received erroneous packets by using packet-combining methods [64, 66, 78], for example, equal-gain combining or bit-by-bit majority voting. Such an approach is also referred to as type II hybrid ARQ [70]. In [59] and [77] this approach has been made deadline aware, by adopting the strategy of increasing the coding strength (decreasing the code rate) more and more as the packet deadline comes closer (deadline-dependent coding). In reference [62] a scheme is described that takes both the estimated channel state and the packet deadline into account to select one coding scheme from a set of available schemes. For a bursty wireless channel, this scheme reduces the bandwidth need for a prescribed maximum failure probability, compared to a static scheme solely taking the channel state into account. A scheme that utilizes already received and partially erroneous packets but does not require redundancy is the intermediate checksum method (see, for example, [68]), where a packet is not equipped with a single checksum covering its whole contents, but is subdivided into several chunks, such that each chunk is equipped with a separate checksum. The receiver requests only the erroneous chunks for retransmission.
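The intermediate checksum method can be sketched as follows; `zlib.crc32` merely stands in for a link layer checksum, and the function names are illustrative.

```python
# Sketch of the intermediate checksum method: the packet is split into
# chunks with per-chunk checksums, and the receiver requests only the
# erroneous chunks for retransmission.

import zlib

def chunk(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]

def erroneous_chunks(received, checksums, size):
    return [i for i, c in enumerate(chunk(received, size))
            if zlib.crc32(c) != checksums[i]]

packet = b"abcdefghijkl"
sums = [zlib.crc32(c) for c in chunk(packet, 4)]   # sent with the packet
received = b"abcdeXghijkl"        # one corrupted byte in the second chunk
print(erroneous_chunks(received, sums, 4))   # only chunk 1 is requested
```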
1.4.4 Further Countermeasures
The transmitter has some further control knobs to reduce the probability of packet errors at the receiver. One control knob is the packet length: in a scenario where a packet is equipped with a checksum but not with redundant FEC data, longer packets have a higher probability of being received in error. On the other hand, short packets have a higher probability of being received successfully, but the overhead of the fixed-length packet header becomes prohibitive. For a given channel quality there is an optimum packet length, and adaptive schemes exist to find it [67, 68]. It is a fundamental communications law that the bit error rate at the receiver depends on the ratio of the energy expended per bit to the channel noise level [74]. There are two possibilities to use this relationship to increase transmission reliability:
• If the transmit power is increased, the energy per bit is increased and the bit error rate is reduced. However, often the transmit power is technically or legally restricted.
• If the bits are transmitted at lower speed, the energy per bit is increased, too.
Hence, a transmitter might apply transmit power control or modulation rate control.
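The packet-length trade-off from Section 1.4.4 can be made concrete under a simple independent-bit-error model; the header size, bit error rate, and efficiency metric below are illustrative assumptions.

```python
# Sketch of the packet-length trade-off: with header size H and payload
# size L (bytes) and an independent bit error rate p, the throughput
# efficiency is L/(L+H) * (1-p)^(8*(L+H)). Long packets fail too often,
# short packets waste bandwidth on headers; an optimum lies in between.

def efficiency(L, H=8, ber=1e-4):
    return (L / (L + H)) * (1 - ber) ** (8 * (L + H))

best = max(range(1, 2000), key=efficiency)
print(best)                      # optimum payload size for this channel
```

Adaptive schemes such as [67, 68] effectively track this optimum as the channel quality (and hence the effective bit error rate) changes.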
1.5 Flow Control Mechanisms
Flow control compensates for the different processing speeds of transmitters and receivers [12, Chapter 6], [23], [80]. Specifically, if the receiver does not have enough resources (buffers, processing speed) to process packets as fast as the transmitter sends them, mechanisms to slow down the transmitter are useful. Otherwise, the receiver would have to drop data packets, causing the transmitter to retransmit them and to waste further network resources. It is therefore necessary for the transmitter to receive feedback from the receiver. The function of flow control is to be distinguished from congestion control, although many authors consider the former to be a special case of the latter. Congestion control is relevant in multihop networks, where two end nodes are connected by a series of intermediate nodes, for example, routers. In congestion control, it is not the ultimate receiver but the intermediate nodes that need to be protected against resource exhaustion. However, here we do not discuss congestion control any further.
Fieldbus systems offer different communication models. One important model is one where all communications are performed between individually addressable stations and all packets are delivered from the link layer to the upper layers. Representatives of this class are PROFIBUS [52] and Foundation/IEC Fieldbus [26]. These systems can benefit from flow control mechanisms. Another important model is that of a real-time database, where not stations but data are addressed. The owner of a data item (the producer) broadcasts it, and all nodes interested in the data (the consumers) copy the data item into a preallocated local buffer. Each time a consumer receives an updated version of the same data item, it copies the data silently into the local buffer without notifying the applications running on the consumer node. The latter read the buffer contents when they need the value of the data item.
Reading from and writing to this buffer are decoupled, and reading the buffer does not trigger any communications to fetch the data from the producer. Representatives of this class are CAN [30] and Factory Instrumentation Protocol (FIP)/WorldFIP [53]. Flow control is not an issue here, since the buffers are preallocated.
Conceptually, flow control mechanisms need two key ingredients:
• A signaling mechanism provides the transmitter with information about the available resources at the receiver. Different signaling mechanisms can vary in their accuracy (number of distinguishable states of receiver resource utilization), update frequency, signaling path (in-band or out-of-band), and relationship to other mechanisms, for example, error control.
• A signaling answer determines the transmitter’s reaction to flow control signals.
Flow control is not restricted to the data link layer, but is used in higher-layer protocols like the Transmission Control Protocol (TCP) as well. In the following we describe some of the most important mechanisms frequently found on the link layer. A more general discussion can be found in textbooks like [12], [23], and [45].
1.5.1 XON/XOFF and Similar Methods
This family of flow control methods is simple. The receiver distinguishes only two different states: ready or not ready to accept frames. The transmitter, upon acquiring a ready signal, transmits frames at an arbitrary rate until it acquires a not-ready signal. After this, the transmitter does not transmit any data packets until a ready signal is again acquired. This basic scheme is implemented in different protocols.
One application of this scheme can be found in the ITU V.24 recommendation (equivalent to EIA RS-232-C), providing an interface between a DTE (data terminal equipment, for example, a computer) and
a DCE (data communications equipment, for example, a modem). This interface targets asynchronous and serial communications. It provides either 9 or 25 lines, 2 of which are the request-to-send and clear-to-send lines. When the DTE wants to send data to the DCE, it raises the RTS line. If the DCE is willing to accept data, it answers by raising the CTS line. As soon as the DCE lowers the CTS signal, the DTE must stop transmission. This is an out-of-band signaling mechanism since data and flow control signals do not share the same line.
The XON/XOFF mechanism is employed, for example, in the North American Digital Data System (DDS) [81, Section 24-11]. This mechanism rests on two characters of the underlying charset. For example, in the ASCII charset the DC1 character is used for XON and the DC3 character is used for XOFF. When the transmitter receives an XOFF character, it must stop its transmission, and it may resume as soon as XON is received. This is an in-band mechanism; the occurrence of these characters in the payload must be prevented by using proper escaping mechanisms (see also Section 1.2).
The HDLC family of link layer protocols [81, Section 26-2] uses special supervisory frames for executing flow control: the RR (receive ready) and RNR (receive not ready) frames. Since user data are transmitted in another type of frame, the RR/RNR mechanism uses out-of-band signaling. The receiver issues an RNR frame when all its buffers are full. As soon as it can accept new data, the receiver sends an RR frame. This frame also contains the sequence number of the next expected data frame (see Section 1.7).
1.5.2 Sliding-Window Flow Control
In this class of schemes flow control is integrated with a sliding-window ARQ protocol like Go-back-N or Selective-Repeat (see Section 1.4.2). The transmitter has a buffer for a number W of packets, called its window. The window size W specifies the number of allowed outstanding packets, i.e., packets for which the transmitter has not (yet) received an acknowledgment. The receiver can use this for flow control by delaying its acknowledgments. This approach is tightly integrated with the ARQ protocol and does not need any extra control frames or escape mechanisms. However, there are two important drawbacks:
• Delaying acknowledgments is not a good idea for time-critical transmissions with desired response times in the millisecond range.
• Even without real-time requirements, the link layer protocol does not wait arbitrarily long for acknowledgments. Instead, with each frame a timeout is associated. In link layer protocols these timeouts are typically chosen such that propagation delays, packet generation delays, and processing speeds are included, but nothing more. This is in sharp contrast to multihop networks, where queueing delays are a significant fraction of the overall delay. In multihop networks, timeouts are therefore chosen much larger than necessary for a single link.
Either way, if a timeout occurs, the transmitter retransmits the packet. If the receiver is still busy, the retransmission is wasted.
1.5.3 Further Mechanisms
One straightforward approach to flow control can be applied in connection-oriented link layer protocols. Upon connection setup, the receiving station specifies a rate at which the transmitter may send packets [80], and which the receiver guarantees to always accept. Instead of a rate specification, the receiver can also specify the parameters (s, r) of a leaky bucket. The leaky-bucket scheme works as follows: The transmitter generates permits at rate r, i.e., every 1/r seconds. The transmitter is allowed to keep a maximum number of s permits; any permit in excess of this number is dropped. When a packet is to be transmitted, it is checked whether there is a permit. If so, the number of stored permits is decremented and the packet is transmitted. Otherwise, the transmitter has to wait for the arrival of the next permit.
Another approach is used by TCP. TCP contains a flow control mechanism where the receiving end of a TCP connection tells the sender explicitly about its available buffer space (the advertised window). The advertised window is part of the TCP header and carried in each acknowledgment or data packet
going back from the receiver to the transmitter. This mechanism is independent of any underlying link layer flow control mechanism.
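The (s, r) leaky-bucket permit scheme from Section 1.5.3 can be sketched as follows; the class and parameter names are illustrative.

```python
# Sketch of the (s, r) leaky bucket: permits accrue at rate r (one every
# 1/r seconds), at most s permits are stored, and a packet may only be
# sent when a permit is available.

class LeakyBucket:
    def __init__(self, s, r):
        self.s, self.r = s, r
        self.permits = s          # bucket starts full
        self.last = 0.0

    def allow(self, now):
        # accrue permits since the last check, capped at s
        self.permits = min(self.s, self.permits + (now - self.last) * self.r)
        self.last = now
        if self.permits >= 1:
            self.permits -= 1
            return True           # permit consumed, packet may be sent
        return False              # transmitter waits for the next permit

lb = LeakyBucket(s=2, r=1.0)      # burst of 2 packets, 1 permit per second
print([lb.allow(t) for t in (0.0, 0.1, 0.2, 1.2)])
```

The parameter s bounds the burst size while r bounds the long-term rate, which is why the receiver can dimension its buffers for exactly this traffic envelope.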
1.6 Packet Scheduling Algorithms
At an abstract level, a packet scheduling algorithm selects the packet to be transmitted next, after service of the current packet has finished. The packet is selected from a set of waiting packets. The packet waiting room can be located within a single station, but it can also be distributed over several stations. In the latter case, a MAC protocol can be considered part of a packet scheduling algorithm, since it actually decides about the station transmitting a packet, and the winning station has to make a local decision regarding which of its waiting packets to transmit. In this section we consequently restrict the perspective to a single station and its packet scheduler. As opposed to processor scheduling algorithms, packet scheduling algorithms are nonpreemptive; i.e., an ongoing transmission is not interrupted upon arrival of a more important packet.
A packet scheduler bases its decision upon some performance objectives to be optimized. Typical objectives are delay, avoiding deadline misses, jitter avoidance, fairness, throughput, and priority. In the absence of any specific criterion, packets are often served on a first-come, first-served (FCFS) basis. In this section we discuss some popular scheduling schemes. A more detailed introduction to packet scheduling and more general scheduling problems can be found in [87, Chapter 9] and [86].
1.6.1 Priority Scheduling
In priority scheduling each packet is tagged with an explicit priority value, or the packet priority is derived from other packet attributes like addresses, packet types, and so on. The scheduler always selects the packet that currently has the highest priority. Multiple packets of the same highest priority are served in random order or FCFS order. Some algorithms map time-dependent information onto priorities. One example is the rate-monotonic scheduling algorithm [88] and its nonpreemptive extensions. Here it is assumed that the packets are generated from different periodic streams or flows, and each packet is associated with a deadline corresponding to its flow period. The priorities are then assigned in inverse order of the periods; therefore, the stream with the smallest period receives the highest priority. Another example is the earliest-deadline-first (EDF) algorithm, where the packet with the tightest deadline has the highest priority.
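A nonpreemptive EDF packet selection can be sketched with a heap keyed by deadline; the class name and the FCFS tie-breaking via an arrival counter are illustrative choices.

```python
# Sketch of nonpreemptive earliest-deadline-first packet selection: the
# waiting packet with the tightest deadline is transmitted next; packets
# with equal deadlines are served in FCFS order via the arrival counter.

import heapq

class EDFScheduler:
    def __init__(self):
        self.heap = []
        self.count = 0            # arrival order, used as FCFS tie-breaker

    def enqueue(self, deadline, packet):
        heapq.heappush(self.heap, (deadline, self.count, packet))
        self.count += 1

    def next_packet(self):
        return heapq.heappop(self.heap)[2]

sched = EDFScheduler()
sched.enqueue(30, 'sensor')
sched.enqueue(10, 'alarm')
sched.enqueue(20, 'log')
print([sched.next_packet() for _ in range(3)])   # tightest deadline first
```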
1.6.2 Fair Scheduling
In recent years many algorithms for fair queueing have been developed. The packets are grouped into distinct flows, and with each flow a separate queue is associated. Within a flow, packets are served in FCFS order. A nonempty queue is said to be backlogged. The goal of a fair scheduling algorithm is twofold:
• Each backlogged flow should get a minimum share of the available bandwidth independent of the behavior of the other flows (firewall property).
• The bandwidth of a currently inactive flow should be fairly distributed to the other flows, thus making efficient use of bandwidth.
One of the simplest fair queueing algorithms is the round-robin algorithm, where all backlogged queues are served in round-robin order. A modification of this scheme is weighted round-robin, where each flow i is associated with a specific weight f_i such that Σ_{i∈F} f_i = 1 (F is the set of all flows). The time is divided into epochs of fixed upper length t. At the start of an epoch the scheduler determines the set of backlogged queues. The first nonempty queue j receives service until it is empty or its transmission time approaches its share f_j · t of the overall epoch. Following this, the second nonempty queue is served, and so on. The next epoch starts when all nonempty queues of the previous epoch are served.
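One weighted round-robin epoch can be sketched as follows; the function name and the representation of frames by their transmission times are illustrative.

```python
# Sketch of one weighted round-robin epoch: each backlogged flow j may use
# up to f_j * t of the epoch of length t; frames are represented by their
# transmission times (arbitrary time unit) and served in FCFS order.

def wrr_epoch(queues, weights, t):
    """queues: per-flow lists of frame transmission times (FCFS order)."""
    sent = {f: [] for f in queues}
    for f, q in queues.items():
        budget = weights[f] * t
        while q and q[0] <= budget:   # serve while the next frame still fits
            frame = q.pop(0)
            budget -= frame
            sent[f].append(frame)
    return sent

queues = {'a': [2, 2, 2], 'b': [3, 3]}
print(wrr_epoch(queues, {'a': 0.5, 'b': 0.5}, t=10))
```

With equal weights and t = 10, each flow gets a budget of 5: flow a fits two frames of length 2, flow b fits one frame of length 3; the remainder waits for the next epoch.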
Other fair queueing algorithms have been derived from the generalized processor sharing (GPS) approach [90–92]. In its pure form, GPS assumes a number of flows with associated weights f_i (with Σ_{i∈F} f_i = 1). All backlogged queues are served in parallel. Serving a queue means transmission of the packet at the head of the queue. As soon as this packet is finished, the next packet’s transmission is started. It can be shown that GPS has the following desirable property: if queue i is always nonempty during the time interval (t1, t2) and if W_i(t1, t2) is the amount of service queue i receives during (t1, t2), then for a GPS server the following holds:

W_i(t1, t2) / f_i ≥ W_j(t1, t2) / f_j

for all sessions j (except those with W_j(t1, t2) = 0). If both sessions i and j are backlogged during (t1, t2), then we have

W_i(t1, t2) / f_i = W_j(t1, t2) / f_j

This scheme, however, is not directly usable for packet-based communications, since at most one packet can be transmitted at a time. Therefore, packet-based approximations to GPS have been developed. A good approximation strategy tries to pick the packets in the same order as they would finish under GPS. This decision has to be made each time a packet has finished transmission and the next packet is to be picked. However, at this time the packet that would finish next under GPS may not have arrived yet. The scheduler can only take the currently backlogged queues into account when making its decision.
In weighted fair queueing (WFQ) the scheduler simulates the GPS operation. More specifically, for each flow a virtual time is maintained. For the k-th packet of flow i, the virtual start and finish times S_i^k and F_i^k are defined as [84]

S_i^k = max{F_i^(k-1), V(a_i^k)}
F_i^k = S_i^k + L_i^k / r_i

where F_i^0 = 0, r_i = r · f_i (r is the overall link capacity), a_i^k is the arrival time of the k-th packet in flow i, L_i^k is its length, and V(·) is a so-called virtual-time function. This basic algorithm can be performed with different virtual-time functions (vtf). For the vtf of WFQ, the following holds:

V_WFQ(t1) = 0
∂V_WFQ(t)/∂t = 1 / Σ_{i∈B_WFQ(t)} f_i

where t1 denotes the beginning of a system busy period and B_WFQ(t) denotes the set of backlogged queues at time t. By this definition the vtf may change on every packet arrival or departure, and the vtf needs to be tracked by the scheduler. Therefore, the vtf has a computational complexity proportional to the number of flows. A number of schemes have been developed with similar properties but lower complexity. One example is frame-based fair queueing [93]. Both GPS and WFQ can be shown to guarantee a minimum service rate of r_i = r · f_i to flow i. If flow i is leaky bucket constrained (i.e., the packets have a minimum
interarrival time and the packet length is appropriately bounded), then it can also be shown that for each packet an upper bound on its finishing time can be guaranteed.
In the context of wireless transmission media, the situation changes slightly: to avoid waste of resources, the scheduler should pick a packet only from those backlogged flows where the head-of-queue packet is destined to a station for which the wireless channel is currently in a good state. Therefore, a backlogged flow does not receive service if its packet is likely to fail [85]. A number of wireless fair queueing schemes have been developed (see, for example, [89]), differing, for example, in the amount of compensation granted to flows suffering from bad channels for some time.
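The WFQ virtual-time bookkeeping from Section 1.6.2 can be sketched for a single flow as follows. The function name is illustrative, and the virtual-time function V is supplied by the caller; the example uses the simplifying assumption that virtual time advances at real-time speed, which holds only when the sums of backlogged weights equal one.

```python
# Sketch of the per-flow WFQ bookkeeping: the virtual start time of the
# k-th packet is max(F_i^(k-1), V(a_i^k)) and its virtual finish time
# adds L_i^k / r_i, with F_i^0 = 0.

def finish_times(packets, rate, V):
    """packets: list of (arrival_time, length) for one flow, in order."""
    finishes, last_finish = [], 0.0
    for a, L in packets:
        start = max(last_finish, V(a))   # S_i^k
        last_finish = start + L / rate   # F_i^k
        finishes.append(last_finish)
    return finishes

# Illustrative assumption: V(t) = t (virtual time at real-time speed).
print(finish_times([(0.0, 100), (0.5, 100)], rate=1000, V=lambda t: t))
```

A packet-based WFQ scheduler would compute these finish times for every backlogged flow and always transmit the packet with the smallest virtual finish time next.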
1.7 Link Layer Protocols
In this section we present two standard link layer protocols. In general, a link layer protocol combines several of the mechanisms discussed in the previous sections.
1.7.1 The HDLC Protocol Family
The HDLC protocol (high-level data link control) [81, Section 26.2; 82] can be considered the “mother” of many link layer protocols, including LAPB (used in X.25), LAPD (used in ISDN), LAPM (used in GSM), and the IEEE 802.2 Logical Link Control (LLC) protocol discussed in Section 1.7.2. An HDLC variant is also used in the IEEE BITBUS standard [29]. It is designed for point-to-point links; however, it can also be used over multiple-access channels with unique station addresses.
The HDLC protocol distinguishes the following station types:
• A primary station controls the link; it is responsible for error control, flow control, and setup and teardown of the logical link. All frames generated by a primary station are called commands.
• A secondary station is controlled by a primary station. Specifically, it may not initiate data transfers on its own. Frames generated by a secondary station are called responses.
• A combined station combines these two roles.
These station types can be used in two different configurations:
• In the unbalanced configuration there is a single primary station and a number of secondary stations with distinct addresses.
• In the balanced configuration two combined stations are connected. Since either station is both a primary and a secondary station, both can initiate data transfers.
The HDLC protocol offers three modes of operation:
• In the normal response mode (NRM) there is a central coordinator (a primary station) and a number of secondary stations. The secondary stations only send frames upon being polled by the primary station.
• In the asynchronous response mode (ARM) the same configuration is used as in the normal response mode, but a secondary station may send frames on its own, without having to wait to be polled.
• The asynchronous balanced mode (ABM) is used on point-to-point links. There is a combined station at either end of the link.
This mode is used, for example, in the X.25 link layer and in the IEEE 802.2 Logical Link Control (see next section). The protocol is built upon three different frame types, illustrated in Figure 1.9. The general frame format has a flag field, used for bit and frame synchronization (see Section 1.2); an address field to identify a specific secondary on a multipoint link; a control field (explained below); an optional information field carrying the user data; a frame check sequence (FCS) field containing a 16- or 32-bit CRC checksum; and a closing flag. The three frame types are distinguished by their purpose and the different layouts of the control field:
© 2005 by CRC Press
1-23
Principles of Lower-Layer Protocols for Data Communications
[Figure: HDLC frame layout — Flag (8 bits), Address, Control, Information, FCS (16/32 bits), closing Flag — together with the control-field formats of I-, U-, and S-frames.]
FIGURE 1.9 HDLC frame structure.
• Supervisory frames (S-frames) are used for error control and flow control purposes:
— The two S-bits in the supervisory frame correspond to four different receiver answers: RR and RNR are used for flow control (see Section 1.5), whereas the two answers REJ and SREJ belong to the Go-Back-N or Selective-Repeat ARQ protocol (the HDLC frame format is usable for both protocols).
— The P/F (poll/final) bit is a poll bit if it is sent in a command frame; otherwise, it is called the final bit. If the poll bit is set, the primary requires an acknowledgment for the corresponding command frame. If the secondary answers with the final bit set to one, this indicates that the command frame has been received and the corresponding command has been successfully executed. If the final bit is zero, the secondary indicates successful reception of the command frame, but execution of the requested command has not (yet) finished.
— The receiver sequence number N(R) is discussed below.
• Unnumbered frames (U-frames) are used for link management purposes (link setup, teardown). The five M-bits (mode bits) encode commands and responses. When used as a command, the primary can set the secondary's mode of operation (ABM, ARM, NRM), reset the secondary, disconnect from the secondary, or reject a frame. When used as a response, the M-bits either acknowledge or reject a command.
• Information frames (I-frames) carry user data. The control field contains transmit and receive sequence numbers. A station transmitting data packets equips each I-frame with a sequence number N(S). The receiver checks whether N(S) is the same as the expected sequence number N(R). If so, the receiver increments N(R) and sends an acknowledgment carrying the new value. This acknowledgment is either piggybacked onto an I-frame going in the opposite direction or sent as a separate S-frame when there is no outgoing I-frame ready for transfer.
With the help of these sequence numbers, the receiver can detect lost and duplicate frames. HDLC supports procedures for setup and teardown of a logical connection and allows specification of the operation mode (ABM, ARM, NRM) between two stations. With the available frame types, different ARQ protocols can be implemented, most notably Go-Back-N and Selective-Repeat (called Selective-Reject in the context of HDLC). The various HDLC variants differ in their ARQ protocols and in the supported modes. In the next section we briefly discuss one of HDLC's descendants, the IEEE 802.2 Logical Link Control protocol.
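The control-field layouts described above can be illustrated with a short Python sketch. This is not part of any HDLC implementation; it merely decodes the basic (non-extended, 8-bit) control byte, assuming the leftmost bit in Figure 1.9 (the first-transmitted bit) maps to the least significant bit of the stored byte:

```python
# Supervisory answers encoded by the two S-bits (bits 2-3 of the byte).
S_CODES = {0b00: "RR", 0b01: "RNR", 0b10: "REJ", 0b11: "SREJ"}

def parse_control(ctrl: int) -> dict:
    """Classify an 8-bit HDLC control field as an I-, S-, or U-frame
    and extract its subfields."""
    pf = (ctrl >> 4) & 0x01                    # poll/final bit
    if ctrl & 0x01 == 0:                       # first bit 0 -> I-frame
        return {"type": "I", "ns": (ctrl >> 1) & 0x07,
                "pf": pf, "nr": (ctrl >> 5) & 0x07}
    if ctrl & 0x03 == 0b01:                    # first bits 1,0 -> S-frame
        return {"type": "S", "s": S_CODES[(ctrl >> 2) & 0x03],
                "pf": pf, "nr": (ctrl >> 5) & 0x07}
    # first bits 1,1 -> U-frame: the five M-bits are split around P/F
    mode = ((ctrl >> 2) & 0x03) | (((ctrl >> 5) & 0x07) << 2)
    return {"type": "U", "m": mode, "pf": pf}
```

With this mapping, the familiar encodings come out as expected: every even byte is an I-frame, and 0x01 decodes as an RR supervisory frame.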
1.7.2 The IEEE 802.2 LLC Protocol

The IEEE 802.2/ISO/IEC 8802-2 Logical Link Control (LLC) protocol [83] is a member of the IEEE 802.x family of MAC and link layer protocols. Specifically, it operates on top of the different IEEE 802.x MAC protocols, like Ethernet (IEEE 802.3), Token Bus (IEEE 802.4), Token Ring (IEEE 802.5), or wireless LAN
(IEEE 802.11). The LLC protocol offers three services to upper layers: an unacknowledged connectionless datagram service (best effort), an acknowledged connectionless datagram service, and a reliable connection-oriented service. The LLC can run with several MAC protocols because it makes rather weak assumptions about the MAC services: nothing more than a connectionless best-effort service is assumed. All these services use addressing information consisting of four attributes: source and destination MAC addresses as well as source and destination service access points (SAPs). Consequently, all packets carry SAP addresses in addition to MAC addresses. The connection-oriented service requires explicit setup and teardown of a link layer connection. A link layer connection is characterized by source and destination MAC and SAP addresses. For each link layer connection there is a separate connection context, which includes, among other things, the sequence numbers. A connection provides reliable and in-sequence data delivery, and it is additionally possible to request flow control operations. Specifically, the upper layers can specify an amount of data they are willing to accept. The sequence number fields are larger than shown in Figure 1.9; up to 127 sequence numbers can be distinguished. The ARQ protocol is essentially Go-Back-N; the Selective-Reject feature of HDLC is not used. The LLC uses the asynchronous balanced mode.
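For illustration, the shape of an LLC PDU for the unacknowledged connectionless service can be sketched as follows. The Type 1 UI control value 0x03 is standard; the SAP values in the usage below are placeholders. The connection-oriented service instead uses a two-byte control field, which is where the 7-bit sequence numbers mentioned above reside:

```python
import struct

UI_CONTROL = 0x03   # control value of an unnumbered information (UI) frame

def build_llc_ui(dsap: int, ssap: int, payload: bytes) -> bytes:
    """Build a Type 1 (unacknowledged connectionless) LLC PDU:
    DSAP and SSAP address bytes, the one-byte UI control field, then data."""
    return struct.pack("!BBB", dsap, ssap, UI_CONTROL) + payload

def parse_llc(pdu: bytes) -> dict:
    """Split a received LLC PDU back into its addressing and data parts."""
    dsap, ssap, control = struct.unpack("!BBB", pdu[:3])
    return {"dsap": dsap, "ssap": ssap, "control": control, "info": pdu[3:]}
```

The MAC addresses of the four-attribute addressing scheme are carried by the underlying IEEE 802.x MAC frame, not by the LLC header itself.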
Abbreviations
ATM — Asynchronous Transfer Mode
EIA — Electronic Industries Association
FMS — Fieldbus Message Specification
FTT — Flexible Time-Triggered
GSM — Global System for Mobile communications
ITU — International Telecommunication Union
LAP — Link Access Procedure
References

Bit and Frame Synchronization
[1] Stuart Cheshire and Mary Baker. Consistent overhead byte stuffing. ACM SIGCOMM Computer Communication Review, 27:209–220, 1997. [2] IEEE. Carrier Sense Multiple Access with Collision Detection (CSMA/CD): ETHERNET, 1985. [3] International Organization for Standardization (ISO). IS 1177-1985, Character Structure for Start/Stop and Synchronous Character Oriented Transmission, 1985. [4] J. Manchester, J. Anderson, B. Doshi, and S. Dravida. IP over SONET. IEEE Communications Magazine, 36(5):136–142, May 1998. [5] W. Simpson. RFC 1661, The Point-to-Point Protocol (PPP), July 1994. [6] W. Simpson. RFC 1662, PPP in HDLC-Like Framing, July 1994. Obsoletes RFC 1549. Status: STANDARD.
Medium Access Control Protocols
[7] Norman Abramson. Development of the ALOHANET. IEEE Transactions on Information Theory, 31:119–123, 1985. [8] Norman Abramson, Editor. Multiple Access Communications: Foundations for Emerging Technologies. IEEE Press, New York, 1993. [9] G. Agrawal, B. Chen, W. Zhao, and S. Davari. Guaranteeing synchronous message deadlines with the timed-token medium access control protocol. IEEE Transactions on Computers, 43:327–339, 1994. [10] Luis Almeida, Paulo Pedreiras, and Jose Alberto G. Fonseca. The FTT-CAN protocol: why and how. IEEE Transactions on Industrial Electronics, 49:1189–1201, 2002.
[11] Michael Berry, Andrew T. Campbell, and Andras Veres. Distributed control algorithms for service differentiation in wireless packet networks. In Proc. INFOCOM 2001, Anchorage, AK, April 2001. IEEE. [12] D. Bertsekas and R. Gallager. Data Networks. Prentice Hall, Englewood Cliffs, NJ, 1987. [13] Bluetooth Consortium. Specification of the Bluetooth System. http://www.bluetooth.org, 1999. [14] J.I. Capetanakis. Tree algorithm for packet broadcast channels. IEEE Transactions on Information Theory, 25:505–515, 1979. [15] Biao Chen, Nicholas Malcolm, and Wei Zhao. Fiber distributed data interface and its use for time-critical applications. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 597–610. CRC Press/IEEE Press, Boca Raton, FL, 1996. [16] Carla-Fabiana Chiasserini and Ramesh R. Rao. Coexistence mechanisms for interference mitigation in the 2.4-GHz ISM band. IEEE Transactions on Wireless Communications, 2:964–975, 2003. [17] Lou Dellaverson and Wendy Dellaverson. Distributed channel access on wireless ATM links. IEEE Communications Magazine, 35:110–113, 1997. [18] ETSI. High Performance Radio Local Area Network (HIPERLAN): Draft Standard, March 1996. [19] Andras Farago, Andrew D. Myers, Violet R. Syrotiuk, and Gergely V. Zaruba. Meta-MAC protocols: automatic combination of MAC protocols to optimize performance for unknown conditions. IEEE Journal on Selected Areas in Communications, 18:1670–1681, 2000. [20] Robert G. Gallager. A perspective on multiaccess channels. IEEE Transactions on Information Theory, 31:124–142, 1985. [21] Ajay Chandra V. Gummalla and John O. Limb. Wireless medium access control protocols. IEEE Communications Surveys and Tutorials, 3:2–15, 2000. http://www.comsoc.org/pubs/surveys. [22] Jaap C. Haartsen. The Bluetooth radio system. IEEE Personal Communications, 7:28–36, 2000. [23] Fred Halsall. Data Communications, Computer Networks and Open Systems. Addison-Wesley, Reading, MA, 1996. [24] J.F. Hayes.
Modeling and Analysis of Computer Communications Networks. Plenum Press, New York, 1984. [25] Ivan Howitt. Bluetooth performance in the presence of 802.11b WLAN. IEEE Transactions on Vehicular Technology, 51:1640–1651, 2002. [26] IEC. IEC 1158-1, FieldBus Specification: Part 1: FieldBus Standard for Use in Industrial Control: Functional Requirements. [27] IEEE. IEEE 802.4, Token-Passing Bus Access Method, 1985. [28] IEEE. IEEE 802.5, Token Ring Access Method and Physical Layer Specifications, 1985. [29] IEEE. IEEE 1118, Standard Microcontroller System Serial Control Bus, August 1991. [30] ISO. ISO 11898, Road Vehicle: Interchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication, 1993. [31] Raj Jain. FDDI Handbook: High-Speed Networking Using Fiber and Other Media. Addison-Wesley, Reading, MA, 1994. [32] Leonard Kleinrock and Fouad A. Tobagi. Packet switching in radio channels. Part I. Carrier sense multiple access models and their throughput-/delay-characteristic. IEEE Transactions on Communications, 23:1400–1416, 1975. [33] J.F. Kurose, M. Schwartz, and Y. Yemini. Multiple-access protocols and time-constrained communication. ACM Computing Surveys, 16:43–70, 1984. [34] S.S. Lam. Multiaccess protocols in computer communications. In W. Chon, Editor, Principles of Communication and Network Protocols, Volume I, Principles, pp. 114–155. Prentice Hall, Englewood Cliffs, NJ, 1983. [35] Jim Lansford, Adrian Stephens, and Ron Nevo. Wi-Fi (802.11b) and Bluetooth: enabling coexistence. IEEE Network Magazine, 15:20–27, 2001. [36] Andrew D. Myers and Stefano Basagni. Wireless media access control. In Ivan Stojmenovic, Editor, Handbook of Wireless Networks and Mobile Computing, pp. 119–143. John Wiley & Sons, New York, 2002.
[37] IEEE. IEEE 802.11, Standard for Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Networks: Specific Requirements: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher Speed Physical Layer (PHY) Extension in the 2.4 GHz Band, 1999. [38] Nikos Passas, Sarantis Paskalis, Dimitri Vali, and Lazaros Merakos. Quality-of-service-oriented medium access control for wireless ATM networks. IEEE Communications Magazine, 35:42–50, 1997. [39] C.S. Raghavendra and Suresh Singh. Pamas: power aware multi-access protocol with signalling for ad hoc networks. ACM Computer Communication Review, 27, 5–26, 1998. [40] Theodore S. Rappaport. Wireless Communications: Principles and Practice. Prentice Hall, Upper Saddle River, NJ, 2002. [41] Erwin P. Rathgeb. Integrated services digital network (ISDN) and broadband (B-ISDN). In Jerry D. Gibson, Editor, The Communications Handbook, pp. 577–590. CRC Press/IEEE Press, Boca Raton, FL, 1996. [42] Izhak Rubin. Multiple access methods for communications networks. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 622–649. CRC Press/IEEE Press, Boca Raton, FL, 1996. [43] S.R. Sachs. Alternative local area network access protocols. IEEE Communications Magazine, 26:25–45, 1988. [44] Hideaki Takagi. Analysis of Polling Systems. MIT Press, Cambridge, MA, 1986. [45] Andrew S. Tanenbaum. Computer Networks, 3rd edition. Prentice Hall, Englewood Cliffs, NJ, 1997. [46] Andrew S. Tanenbaum. Computernetzwerke, 3rd edition. Prentice Hall, Muenchen, 1997. [47] IEEE. IEEE 802.11, Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, November 1997. [48] Fouad A. Tobagi. Multiaccess protocols in packet communications systems. IEEE Transactions on Communications, 28:468–488, 1980. [49] Fouad A. Tobagi. Multiaccess link control. In P.E. Green, Editor, Computer Network Architectures and Protocols. 
Plenum Press, New York, 1982. [50] Fouad A. Tobagi and Leonard Kleinrock. Packet switching in radio channels. Part II. The hidden terminal problem in CSMA and busy-tone solutions. IEEE Transactions on Communications, 23:1417–1433, 1975. [51] TTTech Computertechnik GmbH, Vienna. TTP/C Protocol, Version 0.5, 1999. [52] Union Technique de l'Électricité. General Purpose Field Communication System, EN 50170, Volume 2, PROFIBUS, 1996. [53] Union Technique de l'Électricité. General Purpose Field Communication System, EN 50170, Volume 3, WorldFIP, 1996. [54] Harmen R. van As. Media access techniques: the evolution towards terabit/s LANs and MANs. Computer Networks and ISDN Systems, 26:603–656, 1994. [55] Bernhard Walke. Mobile Radio Networks: Networking, Protocols and Traffic Performance. John Wiley & Sons, Chichester, 2002. [56] Andreas Willig and Andreas Köpke. The adaptive-intervals MAC protocol for a wireless PROFIBUS. In Proc. 2002 IEEE International Symposium on Industrial Electronics, L'Aquila, Italy, July 2002.
Error Control
[57] A. Annamalai and Vijay K. Bhargava. Analysis and optimization of adaptive multicopy transmission ARQ protocols for time-varying channels. IEEE Transactions on Communications, 46:1356–1368, 1998. [58] Sergio Benedetto, Guido Montorsi, and Dariush Divsalar. Concatenated convolutional codes with interleavers. IEEE Communications Magazine, 41:102–109, 2003. [59] Henrik Bengtsson, Elisabeth Uhlemann, and Per-Arne Wiberg. Protocol for wireless real-time systems. In Proc. 11th Euromicro Conference on Real-Time Systems, York, England, 1999. [60] Claude Berrou. The ten-year-old turbo codes are entering into service. IEEE Communications Magazine, 41:110–116, 2003.
[61] Daniel J. Costello, Joachim Hagenauer, Hideki Imai, and Stephen B. Wicker. Applications of error-control coding. IEEE Transactions on Information Theory, 44:2531–2560, 1998. [62] Moncef Elaoud and Parameswaran Ramanathan. Adaptive use of error-correcting codes for real-time communication in wireless networks. In Proc. INFOCOM 1998, San Francisco, March 1998. IEEE. [63] David Haccoun and Samuel Pierre. Automatic repeat request. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 181–198. CRC Press/IEEE Press, Boca Raton, FL, 1996. [64] Bruce A. Harvey and Stephen B. Wicker. Packet combining systems based on the Viterbi decoder. IEEE Transactions on Communications, 42:1544–1557, 1994. [65] Olivier Hersent, David Gurle, and Jean-Pierre Petit. IP Telephony: Packet-Based Multimedia Communications Systems. Addison-Wesley, Harlow, England, 2000. [66] Samir Kallel. Analysis of a type-II hybrid ARQ scheme with code combining. IEEE Transactions on Communications, 38:1133–1137, 1990. [67] Paul Lettieri, Curt Schurgers, and Mani B. Srivastava. Adaptive link layer strategies for energy-efficient wireless networking. Wireless Networks, 5:339–355, 1999. [68] Paul Lettieri and Mani Srivastava. Adaptive frame length control for improving wireless link throughput, range and energy efficiency. In Proc. INFOCOM 1998, pp. 564–571, San Francisco, 1998. IEEE. [69] Shu Lin and Daniel J. Costello. Error Control Coding: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1983. [70] Hang Liu, Hairuo Ma, Magda El Zarki, and Sanjay Gupta. Error control schemes for networks: an overview. MONET: Mobile Networks and Applications, 2:167–182, 1997. [71] Arnold M. Michelson and Allen H. Levesque. Error-Control Techniques for Digital Communication. John Wiley & Sons, New York, 1985. [72] Arogyaswami Paulraj. Diversity techniques. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 213–223. CRC Press/IEEE Press, Boca Raton, FL, 1996. [73] Martin L. Shooman.
Reliability of Computer Systems and Networks. John Wiley & Sons, New York, 2002. [74] Bernard Sklar. Digital Communications: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1988. [75] Bernard Sklar. A primer on turbo code concepts. IEEE Communications Magazine, 35:94–102, 1997. [76] Jonathan Stone, Michael Greenwald, Craig Partridge, and James Hughes. Performance of checksums and CRCs over real data. IEEE/ACM Transactions on Networking, 6:529–543, 1998. [77] Elisabeth Uhlemann, Per-Arne Wiberg, Tor M. Aulin, and Lars K. Rasmussen. Deadline-dependent coding: a framework for wireless real-time communication. In Proc. International Conference on Real-Time Computing Systems and Applications, pp. 135–142, Cheju Island, South Korea, December 2000. [78] Xin Wang and Michael T. Orchard. On reducing the rate of retransmission in time-varying channels. IEEE Transactions on Communications, 51:900–910, 2003. [79] Andreas Willig, Martin Kubisch, Christian Hoene, and Adam Wolisz. Measurements of a wireless link in an industrial environment using an IEEE 802.11-compliant physical layer. IEEE Transactions on Industrial Electronics, 49:1265–1282, 2002.
Flow Control
[80] Rene L. Cruz. Routing and flow control. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 650–660. CRC Press/IEEE Press, Boca Raton, FL, 1996. [81] Roger L. Freeman. Reference Manual for Telecommunications Engineering, 3rd edition, Volume 2. John Wiley & Sons, New York, 2002.
Link Layer Protocols
[82] D.E. Carlson. Bit-oriented data link control procedures. IEEE Transactions on Communications, 28:455–467, 1980.
[83] LAN/MAN Standards Committee of the IEEE Computer Society. International Standard ISO/IEC 8802-2, Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Specific Requirements: Part 2: Logical Link Control, 1998.
Packet Scheduling
[84] Jon C.R. Bennett and Hui Zhang. Hierarchical packet fair queueing algorithms. In Proc. ACM SIGCOMM, 1996. Association for Computing Machinery. [85] Pravin Bhagwat, Partha Bhattacharya, Arvind Krishna, and Satish K. Tripathi. Using channel state dependent packet scheduling to improve TCP throughput over wireless LANs. Wireless Networks, 3:91–102, 1997. [86] E.G. Coffman, Jr. Computer and Job-Shop Scheduling Theory. John Wiley & Sons, New York, 1982. [87] Srinivasan Keshav. An Engineering Approach to Computer Networking: ATM Networks, the Internet and the Telephone Network. Addison-Wesley, Reading, MA, 1997. [88] C.L. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20:46–61, 1973. [89] Songwu Lu, Vaduvur Bharghavan, and Rayadurgam Srikant. Fair queueing in wireless packet networks. In Proc. ACM SIGCOMM '97 Conference, pp. 63–74, Cannes, France, September 1997. [90] A.K. Parekh and R.G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: the single node case. In Proc. IEEE INFOCOM, Volume 2, pp. 915–924, 1992. IEEE. [91] A.K. Parekh and R.G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: the multiple node case. In Proc. IEEE INFOCOM, Volume 2, pp. 521–530, 1993. IEEE. [92] Abhay Kumar J. Parekh. A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks. Ph.D. dissertation, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, February 1992. [93] Anujan Varma and Dimitrios Stiliadis. Hardware implementation of fair queueing algorithms for ATM networks. IEEE Communications Magazine, 35:54–68, 1997.
2 IP Internetworking

Helmut Hlavacs, University of Vienna
Christian Kurz, University of Vienna

2.1 ISO/OSI Reference Model ..................................................2-1
    The Physical Layer • The Data Link Layer • The Network Layer • The Transport Layer • The Session Layer • The Presentation Layer • The Application Layer
2.2 The TCP/IP Reference Model.............................................2-4
    The Host-to-Network Layer • The Internet Layer • The Transport Layer • The Application Layer
2.3 Reference Model Comparison............................................2-6
2.4 Data Link Layer Protocols and Services ............................2-8
    Frame Creation • Error Detection and Correction • Media Access Control
2.5 Network Layer Protocols and Services ............................2-10
    IPv4 • IPv4 Multicasting • IPv6 • Address Resolution Protocol • Internet Control Message Protocol • Internet Group Management Protocol
2.6 Transport Layer Protocols and Services ..........................2-18
    Transmission Control Protocol • User Datagram Protocol • Resource Reservation Protocol
2.7 Presentation Layer Protocols and Services ......................2-21
2.8 Application Layer Protocols and Services .......................2-22
    TELNET • File Transfer Protocol • Hypertext Transfer Protocol • Simple Mail Transfer Protocol • Resource Location Protocol • Real-Time Protocol
2.9 Summary............................................................................2-26
References .....................................................................................2-26
2.1 ISO/OSI Reference Model

The ISO/OSI reference model [ISO7498] was developed by ISO (International Organization for Standardization) and finished in 1982. The OSI (Open Systems Interconnection) reference model allows the interconnection of open systems. This objective is reached by applying a layered approach: the communication system is divided into seven layers (see Figure 2.1) [PET2000]. The lowest three layers are network dependent. They provide support for data communication between, and linking of, two systems. The upper three layers are application oriented. They allow the end-user application processes to interact with each other. The intermediate layer (the transport layer) isolates the application-oriented layers from the communication details of the lower layers [HAL1996]. Each layer performs a well-defined function, which keeps the complexity of each individual layer low; the behavior of each layer is defined by a protocol. The information flow between the layers is directed through interfaces and should be minimized [TAN1996]. Each layer exchanges messages using services of the layer below. It communicates with the related peer at the same level in a remote system and provides services to the layer above [COL2001]. At each layer the source host adds a header to the packet, which
[Figure: the seven OSI layers of hosts A and B — application, presentation, and session (application-oriented layers); transport (intermediate layer); network, data link, and physical (network-dependent layers) — connected by interfaces and, at the bottom, a physical link (e.g., cable).]
FIGURE 2.1 ISO/OSI reference model. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
is read and removed again by the receiver. It is important to note that the implementation of one layer is therefore independent of the implementation of the other layers. In the following sections, each of the layers is discussed separately, starting with the lowest one.
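The header-per-layer mechanism can be sketched in a few lines of Python. The "headers" here are purely symbolic labels, not real protocol headers; the point is that the receiving stack removes them in the reverse order in which the sending stack added them:

```python
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data-link"]  # the physical layer moves raw bits

def encapsulate(data: bytes) -> bytes:
    """Sending host: every layer prepends its own (symbolic) header."""
    pdu = data
    for layer in LAYERS:                  # top-down through the stack
        pdu = f"[{layer}]".encode() + pdu
    return pdu

def decapsulate(pdu: bytes) -> bytes:
    """Receiving host: each layer reads and removes its peer's header."""
    for layer in reversed(LAYERS):        # bottom-up through the stack
        header = f"[{layer}]".encode()
        assert pdu.startswith(header), f"malformed {layer} header"
        pdu = pdu[len(header):]
    return pdu
```

On the wire, the data-link header ends up outermost, which is exactly why the receiving data link layer is the first to act on an incoming frame.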
2.1.1 The Physical Layer

The lowest layer is concerned with the transmission of raw bits from the electrical interface of the user equipment to the communication channel. This can be an electrical, optical, or wireless medium, and it transfers a serial stream of data. It has to be ensured that a sent 1 bit is seen by the receiver as a 1 bit, not as a 0 bit. Design issues at this layer are, for example, how long one bit lasts and which wavelength of light or which voltage level represents a 1 and a 0 bit. Additionally, the handling of the initial connection and the closure of the connection is carried out at the physical layer. Also, mechanical properties of the network equipment, such as size and shape of connectors and cables, have to be specified. Furthermore, electrical (or optical) parameters must be determined: the voltage levels, the electrical resistance of the cable, the duration of signaling elements and voltage changes, and the coding method. The next issue handled by the physical layer is the functional specification. It concerns the meaning of switched connections, distinguishing between data and control wires, and specifying clock rate and ground.
2.1.2 The Data Link Layer

As the physical layer is only concerned with the transmission of raw data, the main function of the data link layer is to recognize and correct transmission errors. For this reason, the sender divides the data stream into frames that are transmitted sequentially. When a frame is received, an acknowledgment may be sent back to the sender. If a frame is destroyed by a noise burst on the line and therefore is not acknowledged, it is retransmitted by the sender. As the acknowledgment frame could also be lost, care
has to be taken that no duplicate frames are inserted into the data stream. The data link layer therefore solves problems arising from lost, damaged, or duplicate frames. This layer may also offer different service classes, e.g., for protected or unprotected services. If the receiver is slower than the sender, frames can be lost because of different processing speeds. To prevent this scenario, a mechanism is implemented to regulate network traffic. Therefore, the sender should know how much buffer space is left at the receiver. Another task of the data link layer is the media access control within broadcasting networks. In these networks, all connected computers perceive all data transferred; they share a common link. Therefore, it has to be made sure that there is only one sender at a time to avoid data collision. If a collision occurs, it has to be detected and retransmission of all affected data has to be initiated. When data can be transmitted in both directions simultaneously, the acknowledgment frame for sender A sending to receiver B competes with the data frames that B is sending to A. A solution for this problem is piggybacking, where the acknowledgment information is added to data frames sent, instead of sending additional frames.
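The duplicate-frame problem and its standard remedy — a sequence number toggled per frame — can be sketched as follows (an illustrative Python model of the receiver side, not a real protocol implementation):

```python
class AlternatingBitReceiver:
    """Receiver side of a stop-and-wait link: a 1-bit sequence number
    distinguishes a retransmission (caused by a lost acknowledgment)
    from a genuinely new frame."""

    def __init__(self):
        self.expected = 0      # sequence bit of the next new frame
        self.delivered = []    # frames passed up to the next layer

    def on_frame(self, seq: int, data: bytes) -> int:
        if seq == self.expected:        # new frame: deliver and flip the bit
            self.delivered.append(data)
            self.expected ^= 1
        # a duplicate is acknowledged again but not delivered twice
        return seq                      # the acknowledgment echoes the number
```

If the sender's first acknowledgment is lost, the retransmitted frame carries the old sequence bit and is silently discarded while the acknowledgment is repeated.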
2.1.3 The Network Layer

The network layer is responsible for the setup, handling, and termination of network-wide connections; it controls the operation of the subnet. There are two possible types of network connections: virtual connections and datagram connections. Virtual connections are set up at the start of a transmission to fix the route for the following data packets; packets are always sent using the same route. Using a datagram connection, the route is chosen separately for each packet. Sometimes it has to be ensured that packets arrive in the same order as they were sent. A packet B sent after a packet A may arrive in front of A using a different route in datagram communication. Additionally, the network layer is concerned with packet routing from source to destination. Routing information can either be stored in a static table or be determined dynamically at the start of each transmission. The chosen route can also depend on the current network load. If too many packets are sent in one subnet, a capacity bottleneck forms. To avoid this situation, the network layer may implement congestion control. To be able to analyze network traffic, an accounting mechanism is incorporated at this layer. This mechanism counts how many packets are sent, also storing information about packet source and destination. The gathered information can be used to produce billing information. Also, there may be problems when a packet travels through heterogeneous networks. The network layer handles the issues of different packet sizes, varying addressing schemes, and different protocols.
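Static-table routing, as mentioned above, can be sketched in a few lines; the network and router names here are, of course, hypothetical:

```python
# Hypothetical static routing table: destination network -> next hop.
ROUTES = {
    "net-A": "router-1",
    "net-B": "router-2",
}
DEFAULT_HOP = "router-0"   # used when no explicit entry exists

def next_hop(destination: str) -> str:
    """Forward a packet by looking up its destination network;
    unknown destinations fall back to the default route."""
    return ROUTES.get(destination, DEFAULT_HOP)
```

Dynamic routing replaces this fixed dictionary with entries that are recomputed, for example per transmission or in response to network load.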
2.1.4 The Transport Layer

This layer is the interface between the higher application-oriented layers and the underlying network-dependent layers. Thus, the session layer can transfer messages independently of the network structure. Seen from the layers above, messages can be transferred transparently without knowledge of the underlying network structure. The transport layer basically cuts messages into smaller packets if needed and passes them to the network layer. At the receiver, the messages are reassembled and passed to the session layer. An important task of the transport layer is the handling of transport connections. Normally, one network connection is created for each transport connection required by the session layer. If the session layer requires higher throughput than can be handled by one connection, the transport layer might create additional network connections. On the other hand, if one wants to save costs, a number of transport connections can be multiplexed onto one network connection. As there is the possibility to set up multiple connections, a transport header is added to distinguish between them. The transport layer provides different classes of quality of service (QoS). The lowest service class provides only basic functionality for connection establishment; the highest class allows full error control and flow control. To avoid the situation of a fast sender overrunning a slower receiver with messages, an
algorithm for flow control is provided. The most popular type of connection is an error-free point-to-point connection where messages are delivered in the same order they were sent. Additionally, messages with no guaranteed order can be sent. It is also possible to send messages not only to one, but to multiple destinations, or to send broadcast messages. The transport layer establishes and terminates connections across the network. Therefore, the need for a naming mechanism arises, allowing processes to choose with whom they converse.
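Cutting messages into packets and reassembling them at the receiver can be sketched as follows (illustrative Python; the segment size and tuple representation are arbitrary choices, not part of any real transport protocol):

```python
SEGMENT_SIZE = 4   # deliberately tiny, for illustration only

def segment(message: bytes) -> list:
    """Cut a session-layer message into numbered transport-layer packets."""
    return [(seq, message[off:off + SEGMENT_SIZE])
            for seq, off in enumerate(range(0, len(message), SEGMENT_SIZE))]

def reassemble(packets: list) -> bytes:
    """Sort by sequence number and rebuild the original message."""
    return b"".join(data for _, data in sorted(packets))
```

Because each packet carries its sequence number, reassembly succeeds even when a datagram network delivers the packets out of order.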
2.1.5 The Session Layer

Layer 5 organizes and synchronizes the data exchange for two application layer processes. It sets up and clears the communication channel for the whole duration of the network transaction between them, and therefore sets up sessions between users on different machines. A session might be used to log into another machine in a remote time-sharing environment or to transfer a file. The session layer provides interaction management (also called dialogue control). Data can be exchanged using duplex or half-duplex connections. A duplex connection transfers data both ways simultaneously. A half-duplex connection can transfer either one way or the other, where the session layer decides which party is allowed to use the link. Another task of the session layer is token management. It is useful when both sides are not allowed to perform the same operation at the same time. To schedule these operations, a token is issued to only one process at any given time, allowing only the process that holds the token to perform the critical task. For large data transmissions, synchronization points can be set periodically. If the network connection fails, the transmission is restarted at the last synchronization point set. Thus retransmission of the whole data can be avoided. Nonrecoverable exceptions during transmissions are reported to the application layer.
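The effect of synchronization points — resuming after a failure from the last point set rather than from the beginning — can be modeled with a small illustrative sketch (the class and its interval parameter are inventions for this example):

```python
class SyncPointTransfer:
    """A large transfer with periodic synchronization points: after a
    connection failure, transmission resumes at the last point set,
    not at offset zero."""

    def __init__(self, data: bytes, interval: int):
        self.data, self.interval = data, interval
        self.sync_point = 0   # last confirmed position in the data

    def send_next(self) -> bytes:
        """Return the next chunk, starting at the last synchronization point."""
        return self.data[self.sync_point:self.sync_point + self.interval]

    def confirm(self):
        """The receiver confirmed the chunk: advance the synchronization point."""
        self.sync_point = min(self.sync_point + self.interval, len(self.data))
```

If the connection drops between `send_next` and `confirm`, the unconfirmed chunk is simply sent again after reconnecting; everything before the synchronization point is never retransmitted.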
2.1.6 The Presentation Layer

The main task of the presentation layer is the representation of data, e.g., integers, floating-point numbers, or character strings; the syntax for these data containers is defined here. As different computers may use varying internal data representations (for example, for characters or numbers), a conversion has to be done: the data sent are converted to an appropriate transfer syntax and are transformed back to the receiver's internal data format upon receipt. The converters for the syntax of the data do not necessarily have to understand its semantics. This layer may also provide services for data encryption and data compression.
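The conversion to a common transfer syntax can be illustrated with Python's struct module: a 32-bit integer is encoded in big-endian network byte order for transmission, independent of the host's native representation:

```python
import struct

def to_transfer_syntax(value: int) -> bytes:
    """Encode a 32-bit signed integer in a machine-independent transfer
    syntax (big-endian network byte order)."""
    return struct.pack("!i", value)

def from_transfer_syntax(buf: bytes) -> int:
    """Convert received data back into the local internal representation."""
    return struct.unpack("!i", buf)[0]
```

A little-endian and a big-endian host exchanging such values each convert only between their own internal format and the agreed transfer syntax, never directly between each other's formats.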
2.1.7 The Application Layer This layer provides services not to other layers, but directly to application programs. Thus, there is no single specific service in this layer; rather, a distinct combination of services is offered to each application process. As the connected hosts may use different file systems, the application layer handles the differences and avoids incompatibilities. This layer thereby provides the means for network-wide distributed information services, allowing application processes to transfer files, send e-mails, or perform directory lookups. Furthermore, the application layer provides services for identifying intended communication partners, checking their availability, verifying communication authority, providing privacy, authenticating communication partners, selecting the dialogue discipline, reaching an agreement on the responsibility for error recovery, and identifying the constraints on data syntax [HAL1996].
2.2 The TCP/IP Reference Model TCP/IP (Transmission Control Protocol/Internet Protocol) was first used when the ARPANET was emerging. This network was developed for use by the U.S. Armed Forces. Therefore, it was required that even when some parts of the network were destroyed during battle, it should still provide communication
IP Internetworking
[Figure 2.2 shows the four TCP/IP layers (application, transport, Internet, and host-to-network) on two hosts A and B, with interfaces between adjacent layers and a physical link (e.g., a cable) connecting the hosts.]
FIGURE 2.2 TCP/IP reference model. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
services. As long as two hosts were still functioning and any path between them was available, communication had to remain possible. Another important issue was the ability to connect multiple different networks, no matter the underlying protocols, physical transport medium, or provided bandwidth. The TCP/IP reference model is structured similarly to the ISO/OSI model introduced in Section 2.1, but it consists of only four layers; a comparison of both models is given later in this chapter. The lowest layer is the host-to-network layer, on top of which sits the Internet layer, followed by the transport layer; the application layer is the highest layer (Figure 2.2) [SCO1991]. The next sections give an overview of the services provided by the TCP/IP model layers, starting with the lowest one.
2.2.1 The Host-to-Network Layer TCP/IP does not specify services or operations at the host-to-network layer. It is only required that the host can somehow connect to the network to enable the Internet layer to send packets. As this layer is not defined, the implementation can vary on each system. The network service may be provided by Ethernet, Token Ring, asynchronous transfer mode (ATM), wide area network (WAN) technologies, wireless technologies, or any other means of transferring network packets.
2.2.2 The Internet Layer The main features of the Internet layer are addressing, packet routing, and error reporting. Additionally, services for fragmentation and reassembly of packets are provided [HAL1996]. The core protocols at the Internet layer are the Internet Protocol (IP) [RFC791], the Address Resolution Protocol (ARP) [RFC826], the Internet Control Message Protocol (ICMP) [RFC792], and the Internet Group Management Protocol (IGMP) [RFC3376]. The Internet Protocol is concerned with packet routing, IP addressing, and the fragmentation and reassembly of packets. It is a packet-switching protocol based on a best-effort connectionless architecture. Packets travel independently of each other from source to destination host. Each packet may be routed differently through the network; thus packets may be delivered in a different order than they were sent. Packets may also be lost, because delivery is not guaranteed. To be able to route packets across the network, each host has to know the location of a gateway or a router. The gateway decides which path a packet has to travel. For this reason, a routing table is maintained at the Internet layer. To send packets across networks that only support small packet sizes, the packets are broken down into smaller fragments at the source host and are reassembled at the destination host.
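The fragmentation and reassembly service just described can be sketched in Python. The helper name fragment_sizes and the byte-level bookkeeping are illustrative assumptions, not part of any standard API; real IP stores offsets in 8-byte units, so every non-final fragment payload must be a multiple of 8 bytes:

```python
def fragment_sizes(total_payload, mtu, header_len=20):
    """Split an IP payload into (offset, length) fragments.

    Offsets are kept in plain bytes here for readability; real IP
    headers store them in 8-byte units, which is why every non-final
    fragment payload is rounded down to a multiple of 8 bytes.
    """
    max_payload = (mtu - header_len) // 8 * 8   # largest 8-byte-aligned payload
    fragments = []
    offset = 0
    while offset < total_payload:
        length = min(max_payload, total_payload - offset)
        fragments.append((offset, length))
        offset += length
    return fragments

# A 4000-byte payload over an Ethernet-like 1500-byte MTU yields
# fragments of 1480, 1480, and 1040 payload bytes.
print(fragment_sizes(4000, 1500))
```

With a 20-byte header, a 1500-byte MTU leaves 1480 bytes of payload per fragment, which is already a multiple of 8.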
ARP [RFC826], the Address Resolution Protocol, translates network layer addresses to link layer (hardware) addresses. Thus, an IP address is translated to, for example, an Ethernet address. The Internet Control Message Protocol (ICMP) [RFC792] is concerned with datagram error reporting and is able to provide certain information about the Internet layer. The Internet Group Management Protocol (IGMP) [RFC3376] is used to manage IP multicast groups [RFC1122].
2.2.3 The Transport Layer The transport layer provides stream and datagram communication services. Protocols specified at this layer are the Transmission Control Protocol (TCP) [RFC793] and the User Datagram Protocol (UDP) [RFC768]. Both protocols deliver end-to-end communication services (i.e., message transfer). The Transmission Control Protocol is a connection-oriented and reliable point-to-point communication service [RFC793]. It delivers a data stream to any other host on the Internet without errors. This data stream is broken down into messages and handed down to the Internet layer. TCP sets up and terminates the connection, and it sequences and acknowledges the packets it sends. It is also responsible for retransmitting packets lost during transmission. A flow control service is implemented as well, preventing a receiver from being flooded by a faster sender. The User Datagram Protocol is a connectionless, unreliable communication protocol [RFC768]; sequencing and flow control are not provided. It is used when prompt delivery of packets is more important than error-free transmission, as is the case for the transmission of video or audio content. Compared to TCP, there is no connection establishment, no connection state, a smaller packet overhead, and an unregulated send rate [KUR2001].
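The character of the UDP service can be felt directly in the sockets API. The following minimal Python sketch sends one datagram over the loopback interface; no handshake, acknowledgment, sequencing, or flow control takes place (the addresses and payload are arbitrary example values):

```python
import socket

# A connectionless datagram exchange over the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", ("127.0.0.1", port))   # fire and forget

data, addr = receiver.recvfrom(1024)  # blocks until the datagram arrives
print(data)                           # b'hello'
sender.close()
receiver.close()
```

A TCP version of the same exchange would additionally require listen/accept and connect calls, reflecting the connection setup that UDP omits.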
2.2.4 The Application Layer This layer provides services to application processes. It accesses services of the transport layer and allows processes at different hosts to communicate with each other using a variety of protocols. These include the Hypertext Transfer Protocol (HTTP) [RFC2616] to send and receive the files that make up Web pages. Also, protocols for sending electronic mail, the Simple Mail Transfer Protocol (SMTP) [RFC821], and for interactive file transfer, the File Transfer Protocol (FTP) [RFC959], are implemented at this layer. Another frequently used service is Telnet, a terminal emulation protocol [RFC854] that enables the user to log on to remote hosts. To access news articles on bulletin board systems, the Network News Transfer Protocol (NNTP) [RFC977] is provided. Additionally, protocols for the management of TCP/IP networks are available at this layer. The Domain Name Service (DNS) [RFC1034, RFC1035] resolves a host name to an IP address. Network management, including the collection and exchange of management information, is facilitated by the Simple Network Management Protocol (SNMP) [RFC1157]. Besides these basic protocols, a wide variety of other protocols are implemented for use at the TCP/IP application layer. An overview of the assignment of the protocols mentioned in this section to their respective layers is given in Figure 2.3.
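As a small illustration, the name resolution that DNS provides is reachable through the standard sockets API; getaddrinfo maps a host name to its IP addresses. The name localhost is used here so the sketch works without external network access:

```python
import socket

# getaddrinfo performs host-name-to-address resolution; for public
# names this consults DNS, while "localhost" resolves locally.
infos = socket.getaddrinfo("localhost", None, family=socket.AF_INET,
                           type=socket.SOCK_STREAM)
addresses = sorted({info[4][0] for info in infos})
print(addresses)    # typically ['127.0.0.1']
```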
2.3 Reference Model Comparison Both reference models described above are based on a layered approach. Also, the layers provide quite similar services. The application layer of the TCP/IP model corresponds to the application layer of the ISO/OSI reference model. Presentation and session layers are not present in the TCP/IP reference model. Thus, in the TCP/IP model, services provided by these two layers have to be performed by the application process itself. The two transport layers perform similar services. The next layer in the TCP/IP model, the Internet layer, is equivalent to the network layer in ISO/OSI. The data link layer and the physical layer of the OSI reference model are represented by the host-to-network layer in the TCP/IP model (Figure 2.4). In both models, layers above the transport layer are application dependent [TAN1996].
[Figure 2.3 shows the assignment of protocols to the TCP/IP layers: application layer protocols such as HTTP, SMTP, FTP, NNTP, DNS, SNMP, and RTP; TCP and UDP at the transport layer; IP, ICMP, IGMP, and ARP at the Internet layer; and Ethernet, Token Ring, WLAN, and ATM at the host-to-network layer.]
FIGURE 2.3 TCP/IP architecture. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
[Figure 2.4 aligns the seven ISO/OSI layers with the four TCP/IP layers: the application layer (7) maps to the TCP/IP application layer (4); the presentation (6) and session (5) layers have no TCP/IP counterpart; the transport layers (4 and 3) correspond; the network layer (3) maps to the Internet layer (2); and the data link (2) and physical (1) layers map to the host-to-network layer (1).]
FIGURE 2.4 ISO/OSI vs. TCP/IP. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
The ISO/OSI model is mainly a conceptual model; it is an example of a universally applicable structured network model. It introduced three main concepts, which were also followed when the TCP/IP reference model was developed. Each layer provides exactly defined services to the layer above and uses services of the layer below. These services are accessed through interfaces specifying which parameters are expected and which results are returned. Protocols defined in each layer communicate with their peers at the remote host, independent of the underlying network structure. These ideas are similar to those of object-oriented software development [TAN1996]. In the beginning, the TCP/IP model did not strictly separate services, interfaces, and protocols; these concepts were introduced later. Thus, protocols in the ISO/OSI model are better encapsulated than those in the TCP/IP model, and it is easier to alter services in the ISO/OSI reference model [TAN1996]. As the ISO/OSI model was developed before the respective protocols and their implementations, it was possible to clearly distinguish between services, interfaces, and protocols for each layer. The developers were able to choose the appropriate number of layers such that each one performs only a distinct set of matching services. For TCP/IP, the protocols were developed first, and afterwards, the
abstract model was created. The problem with this approach was that the model did not fit any other existing protocol stack [TAN1996]. Finally, there are differences in the area of connectionless vs. connection-oriented communication. The ISO/OSI reference model provides services for both kinds of communication at the network layer, but only connection-oriented services at the transport layer. The TCP/IP reference model supports both connectionless and connection-oriented communication at the transport layer, but only connectionless services at the network layer [TAN1996]. In the following, the most important TCP/IP protocols and services, their functionality, and their position in the OSI stack are described.
2.4 Data Link Layer Protocols and Services In the OSI model, the data link layer is situated at layer 2; in the TCP/IP reference model, it is part of the host-to-network layer. Its purpose is to offer services to OSI layer 3 so that protocols at layer 3 can send data to neighboring computers (i.e., computers directly connected via a network link or via layer 1 or 2 repeaters, bridges, hubs, or switches) in a reliable way. The data link layer may offer one of the following services to layer 3:
• Using an unacknowledged connectionless service, no measures are taken by the sender or the receiver to detect lost packets.
• In acknowledged connectionless services, the receiver must acknowledge the data it received by sending back an acknowledgment to the sender. If the sender does not receive the acknowledgment after a certain amount of time, it assumes that the data were lost and retransmits them.
• In connection-oriented services, the data link layer must first create a (possibly virtual) path between sender and receiver before data can be sent. Furthermore, the data link layer adds sequence numbers to the sent data units in order to detect lost or erroneous data units.
The bit-error rate of modern wire-line (electrical or optical) local area network (LAN) interconnections is too low to justify the additional effort for virtual path creation at this level. In LANs, therefore, usually acknowledged (Token Ring) or unacknowledged (Ethernet) connectionless services are used at layer 2. Lost packets or packets delivered out of order must then often be detected at layer 4 or even higher. Wireless networks, however, may severely suffer from lost packets or high bit-error rates.
Under these conditions, sophisticated data link layer protocols like IBM’s Synchronous Data Link Control (SDLC), or the closely related ISO norm High-Level Data Link Control (HDLC) and the CCITT recommendation Link Access Procedure (LAP), or the IEEE 802.2 norm Logical Link Control (LLC), are often used.
2.4.1 Frame Creation One major task of layer 2 is to pack the data it receives from a higher layer into so-called frames, i.e., data packets, which are then modulated onto the physical network medium. This is done in a way that the desired receivers are able to (1) detect that a frame has been sent, (2) decode the frame reliably and retrieve the sender and receiver addresses, and (3) identify those frames that are meant for them. Frames are nothing more than a sequence of bits modulated onto a carrier. In order to be able to decode information stored in a frame, a receiver first has to identify the first bit of the frame. This start bit is usually followed by a specific sequence of bits containing frame information like the frame length, the content type, checksums, etc. A simple method for finding the frame start is given by bit stuffing. In protocols like X.25, the start of a frame is signaled by six 1s. If the data transported in the frame also contain six 1s in a row, then after the fifth 1, a 0 has to be inserted by the data link layer. The receiving data link layer then knows that if it receives five consecutive 1s followed by a 0, the sender must have inserted the 0, and therefore removes it. Another method for synchronizing senders and receivers at the bit level is given by sending sync bytes, as, for instance, is done in Digital Video Broadcast (DVB) [REI2001]. Here, data frames are 204 bytes long
and contain a certain value (0x47) always at the same position (sync byte). The task of a sync-byte detector is to detect the regular occurrence of this value every 204 bytes. If this value is detected five times, then sender and receiver are synchronized and the receiver may easily compute the frame start from it. Other methods include octet counting or octet stuffing and will be described in the context of application protocols. Once the start bit of a frame is identified, a network card may determine whether it is the receiver of the sensed frame. In the IEEE 802 standard, each network interface card is assigned a unique 6-byte-long media access control (MAC) address, and each sent frame starts with the MAC address of the destination network card. Thus, each network card receiving a frame just compares the first 6 bytes of the frame with its own MAC address, and if they are equal, it passes the frame on to the next higher layer for further processing.
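The bit-stuffing scheme described above (a 0 inserted after every run of five consecutive 1s) can be sketched in a few lines of Python; the function names are our own illustration:

```python
def stuff(bits):
    """Sender side: insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            out.append(0)   # stuffed bit
            run = 0
    return out

def unstuff(bits):
    """Receiver side: remove the 0 that follows every run of five 1s."""
    out, run, skip = [], 0, False
    for b in bits:
        if skip:            # this bit is the stuffed 0; drop it
            skip = False
            run = 0
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            skip = True
    return out

data = [0, 1, 1, 1, 1, 1, 1, 0]              # six 1s in a row
assert stuff(data) == [0, 1, 1, 1, 1, 1, 0, 1, 0]
assert unstuff(stuff(data)) == data           # round trip restores the data
```

Because stuffed data can never contain six consecutive 1s, the six-1s frame delimiter remains unambiguous.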
2.4.2 Error Detection and Correction Sending data over certain media types is often unreliable and may be severely disturbed by external disruptions, causing data to be lost or wrongly received. Thus, another important task of the data link layer is either to correct corrupted frames or at least to detect the occurrence of bit errors. In order to detect or correct bit errors, the sender must add checksum information in addition to the transported headers and user data. The more information is added, the more wrong bits may be detected or even corrected. A popular method for error detection and correction is given by Hamming codes. Here, certain code words are sent, which differ in a specific number of bits, the Hamming distance. For instance, a code containing the words 000111, 111000, 000000, and 111111 has the Hamming distance 3; i.e., code words differ in at least 3 bits from each other. In order to identify that d bits have been changed during transmission, a Hamming distance of d + 1 is required. If the receiver has to be able to correct d bits, then a Hamming distance of 2d + 1 must be kept. The code above thus is able to detect two wrong bits and correct one wrong bit. A simpler way of detecting wrong bits is given by the parity bit. Here, only one bit is added to each code word, counting the number of 1s in the word. If this number is even, the parity bit is set to 0; otherwise, it is set to 1 (or vice versa). Codes with parity bits may detect only one wrong bit per code word. A more sophisticated error detection code is given by the cyclic redundancy check (CRC) code. Here, each sequence of bits is treated as a polynomial over the field of binary numbers (modulo 2). The number 101, for instance, is treated as the polynomial x^2 + 1. Modulo 2 means that each addition of single bits is treated as an exclusive-or (XOR) operation, i.e., 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, and 1 + 1 = 0.
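The Hamming distance and parity computations just described can be checked with a few lines of Python (the function names are illustrative):

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length code words differ."""
    return sum(x != y for x, y in zip(a, b))

def parity_bit(word):
    """Even parity: 0 if the number of 1s is even, 1 otherwise."""
    return sum(word) % 2

# The example code from the text: minimum pairwise distance 3, so it
# can detect 2-bit errors (d + 1 = 3) and correct 1-bit errors (2d + 1 = 3).
code = ["000111", "111000", "000000", "111111"]
dmin = min(hamming_distance(a, b)
           for i, a in enumerate(code) for b in code[i + 1:])
print(dmin)                       # 3
print(parity_bit([1, 0, 1, 1]))   # odd number of 1s -> parity bit 1
```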
For CRC codes, a fixed polynomial is chosen, called the generator polynomial G(x). If a code word W(x) is to be sent, it is replaced by another polynomial R(x), which can be divided by G(x) with remainder 0, and from which the original code word W(x) can be reconstructed. The polynomial R(x) is then transmitted and received. If the received R(x) can be divided by G(x) without remainder, then the transmission has been error-free with high probability. Otherwise, bit errors are detected and the transmitted code word is dropped. Other error correction techniques include Reed–Solomon codes and convolutional codes, but these will not be treated here.
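The CRC division can be sketched as modulo-2 long division in Python. The generator x^3 + x + 1 and the 4-bit message below are arbitrary example values, not taken from any particular standard:

```python
def crc_remainder(data_bits, generator_bits):
    """Long division of data_bits (MSB first) by generator_bits over GF(2).

    The sender appends len(generator) - 1 zero bits, computes this
    remainder, and substitutes it for the zeros; the receiver divides
    the received word by the same generator and expects remainder 0.
    """
    bits = list(data_bits)
    g = list(generator_bits)
    for i in range(len(bits) - len(g) + 1):
        if bits[i] == 1:
            for j in range(len(g)):
                bits[i + j] ^= g[j]     # modulo-2 subtraction = XOR
    return bits[-(len(g) - 1):]

# Generator x^3 + x + 1 -> 1011; example message 1101.
message = [1, 1, 0, 1]
gen = [1, 0, 1, 1]
padded = message + [0] * (len(gen) - 1)
rem = crc_remainder(padded, gen)        # [0, 0, 1]
codeword = message + rem
# An error-free transmission divides evenly: the remainder is all zeros.
assert crc_remainder(codeword, gen) == [0, 0, 0]
```

Flipping any single bit of the codeword yields a nonzero remainder, which is how the receiver detects the error.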
2.4.3 Media Access Control An important part of the data link layer is the media access control (MAC) sublayer. At this sublayer, access to the physical medium, which may be shared by several senders concurrently, is controlled. Depending on the media type, one or several senders may transmit data at the same time. In case of conflicts, several techniques exist to grant the right to use the medium. 2.4.3.1 ALOHA The ALOHA technique, developed at the University of Hawaii, allows all senders to send their data to a commonly shared broadcast medium whenever they wish. In a broadcast medium, the data sent by one
host are received by all others listening to the same medium. In case of collisions due to the concurrent sending of two or more senders, the colliding frames are discarded and must be sent anew. 2.4.3.2 CSMA/CD For carrier-sense multiple access/collision detection (CSMA/CD), as, for example, implemented in Ethernet, several network cards share the same broadcast medium (e.g., an electrical wire). Each network card listens to the medium (carrier sense), and if no signal is detected, a new sender may use the medium immediately. Due to the limited speed of signals, two or more senders may send simultaneously without noticing each other in time, resulting in collisions. In such a case, all colliding frames are discarded and each sender waits for a random amount of time before it tries to send again. 2.4.3.3 TDMA In time-division multiple access (TDMA), time is divided into time slices, and each sender is granted one slice in which it may send its data onto the medium. Here, bandwidth may be wasted, as senders own their time slice whether they have something to send or not. 2.4.3.4 FDMA In frequency-division multiple access (FDMA), several sending frequencies exist, and on each frequency, one sender may transmit without fearing interference from the other frequencies. For example, GSM (global system for mobile communication) uses a mixture of TDMA and FDMA for its calls. Additionally, GSM terminals change their frequency according to a fixed scheme (frequency hopping). 2.4.3.5 CDMA The concept of code-division multiple access (CDMA) is fundamentally different from the previous concepts. Here, each sender is assigned a unique bit sequence of length N called a chip. Each bit to be sent is then added (modulo 2) to all chip bits, yielding the chip if a 0 is to be sent, or the inverse chip if a 1 is to be sent. If a terminal wants to transmit R bits per second (bps), then R chips have to be transferred per second, requiring a much higher bandwidth of R × N bps in total.
Thus, the necessary frequency band is broadened significantly. In essence, the signal is spread over a broad spectrum and the chip is thus often called spreading sequence. In CDMA, senders with different chips can send concurrently and do not disturb the reception of other signals. This works because different chips are mathematically orthogonal to each other with respect to the inner products of chips (which can also be interpreted as bit vectors) and their inverse. Also, due to the use of a broader spectrum, the reconstruction of the signal is more robust with respect to other noise sources.
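A toy CDMA example, using the common ±1 textbook formulation (equivalent to the modulo-2 description above, with 0 mapped to +1 and 1 mapped to -1), shows how two senders with orthogonal chips can transmit simultaneously; the chip values are arbitrary illustrative choices:

```python
# Two orthogonal chips of length N = 4 (their inner product is 0).
chip_a = [+1, +1, -1, -1]
chip_b = [+1, -1, +1, -1]

def encode(bit, chip):
    """Spread one bit over the whole chip: +chip for 1, -chip for 0."""
    sign = +1 if bit == 1 else -1
    return [sign * c for c in chip]

def decode(signal, chip):
    """Inner product with the own chip, normalized by the chip length N."""
    s = sum(x * c for x, c in zip(signal, chip)) / len(chip)
    return 1 if s > 0 else 0

# A sends 1, B sends 0; the channel simply adds the two signals.
channel = [a + b for a, b in zip(encode(1, chip_a), encode(0, chip_b))]
assert decode(channel, chip_a) == 1   # A's receiver recovers A's bit
assert decode(channel, chip_b) == 0   # B's receiver recovers B's bit
```

Because the chips are orthogonal, the inner product cancels the other sender's contribution exactly, which is the mathematical core of the claim that concurrent CDMA senders do not disturb each other.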
2.5 Network Layer Protocols and Services 2.5.1 IPv4 The term Internet Protocol (IP) usually denotes IP version 4 (IPv4), which has been specified in [RFC791] and [RFC1122] and is the established standard protocol for the Internet at layer 3 of the ISO/OSI reference model and at the Internet layer of the TCP/IP reference model. The task of IP is to transport a packet from one source computer to a destination computer, where both computers are interconnected by an internet. Here, internet denotes any (possibly privately managed) heterogeneous network that is interconnected using IP and IP-based routers. In contrast, the Internet denotes the well-known worldwide IP-based network interconnecting millions of computers and being managed by network information centers (NICs) and Internet service providers (ISPs). When traveling through an internet, a packet may pass through several intermediary networks with different network technologies, for instance, Ethernet, Token Ring, ATM, etc., used at layers 1 and 2. At the border between two different networks, the packet's destination network address is examined by a router, i.e., a computer that is connected to both networks and that is able to select other routers in the path between sender and receiver, or to find the receiver in its own network. Routing decisions are usually made using predefined and regularly updated routing tables. However, the next chosen router is by no means fixed and may depend on runtime situations like congestion or link failures, or it may simply be chosen at random. As a consequence, packets may travel along different paths from sender to receiver, and neither the delivery itself nor the original order can be guaranteed.

TABLE 2.1  IP Network Classes

Class   Most Significant Bits   Network Address   Host Address
A       0                       7 bits            24 bits
B       10                      14 bits           16 bits
C       110                     21 bits           8 bits
D       1110                    28 bits           0 bits
E       11110                   Reserved          Reserved

Source: From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.

IP packets are called datagrams, which may have a total length of up to 65,535 bytes. Datagrams may be cut into a sequence of smaller datagrams if the datagram size is larger than the network's maximum transfer unit (MTU), i.e., the largest OSI layer 2 frame that may be transmitted by the network. For Ethernet, for instance, the MTU is 1500 bytes. This process is called fragmentation, and the IP header contains several fields for reassembling such fragments into the original datagram again. Each datagram or fragment is led by a 20-byte header containing the following information:
• The version number of the IP (4).
• The IP header length (IHL), which may be larger than 20.
• The total length of the datagram, including the header.
• An identification number for reassembling fragmented datagrams. All fragments with the same ID belong to the same datagram.
• Flags, including the don't fragment (DF) flag, signaling that the datagram should not be fragmented, and the more fragments (MF) flag, signaling that more fragments are still to come.
• A fragment offset identifying the offset of the received fragment in the whole datagram.
• The time-to-live (TTL) counter, which is decreased by one by each router. A datagram with TTL equal to zero is discarded. This prevents faulty datagrams from circling through the Internet forever.
• A number identifying the used transport protocol (6 for TCP, 17 for UDP, …).
• A header checksum.
• The IP source and destination addresses.
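The fixed 20-byte header just listed can be built and decoded with Python's struct module; the field values below are hand-made sample data for illustration, not a captured packet:

```python
import struct

# Pack a minimal IPv4 header: version/IHL, TOS, total length, ID,
# flags+fragment offset, TTL, protocol, checksum, source, destination.
header = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5,              # version 4, IHL 5 (5 * 4 = 20 bytes)
    0,                         # type of service
    40,                        # total length: 20-byte header + 20-byte payload
    0x1234,                    # identification
    0,                         # flags + fragment offset
    64,                        # time to live
    6,                         # protocol (6 = TCP)
    0,                         # header checksum (left 0 in this sketch)
    bytes([192, 168, 0, 1]),   # source address
    bytes([10, 0, 0, 1]),      # destination address
)

(ver_ihl, tos, total_len, ident, flags_frag,
 ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header)
version, ihl = ver_ihl >> 4, (ver_ihl & 0x0F) * 4
print(version, ihl, ttl, proto)     # 4 20 64 6
print(".".join(map(str, src)))      # 192.168.0.1
```

The "!" prefix selects network (big-endian) byte order, which is what actually travels on the wire.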
An important aspect of IPv4 is given by the 32-bit-long IP addresses. The written form follows the dotted decimal notation scheme X1.X2.X3.X4, where the Xi are decimals between 0 and 255. Each address starts with an address class identifier, followed by the network address, and finally the host address. There are different network classes, as shown in Table 2.1. Each network card attached to the Internet must have a unique IP address. The address assignment scheme is a two-step strategy. First, each site managing a network connected to the Internet is assigned a unique network address by a central authority called the network information center (NIC). Then each site may assign the unique host addresses belonging to this network address, which may include 2^24 - 2 = 16,777,214 (class A), 2^16 - 2 = 65,534 (class B), or 2^8 - 2 = 254 (class C) unique host addresses. IP defines a set of private addresses that may be used freely, but whose traffic should not be routed over the Internet without modification [RFC1918]. The three address blocks are:
• 10.0.0.0 to 10.255.255.255 (one class A network)
• 172.16.0.0 to 172.31.255.255 (16 contiguous class B networks)
• 192.168.0.0 to 192.168.255.255 (256 contiguous class C networks)
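Both the class rules of Table 2.1 and the RFC 1918 private ranges can be checked in a few lines of Python; the helper address_class is our own illustration, while is_private comes from the standard ipaddress module:

```python
import ipaddress

def address_class(addr):
    """Derive the class from the most significant bits (Table 2.1)."""
    first = int(addr.split(".")[0])
    if first < 128:
        return "A"    # leading bit 0
    if first < 192:
        return "B"    # leading bits 10
    if first < 224:
        return "C"    # leading bits 110
    if first < 240:
        return "D"    # leading bits 1110 (multicast)
    return "E"        # leading bits 11110 (reserved)

print(address_class("10.1.2.3"))       # A
print(address_class("172.16.0.1"))     # B
print(address_class("192.168.1.1"))    # C
print(ipaddress.ip_address("192.168.1.1").is_private)   # True
print(ipaddress.ip_address("8.8.8.8").is_private)       # False
```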
[Figure 2.5 shows a source sending separate unicast copies through routers MRS and MR1 to receivers Recv. 1 and Recv. 2.]
FIGURE 2.5 Unicast.
Multicast addresses are special addresses reserved for groups of hosts receiving the same multimedia program via multicast from a single source [RFC1112]. Multicast addresses may range from 224.0.0.0 to 239.255.255.255; details about multicasting are described in Section 2.5.2. Two host addresses are reserved in each (sub)network. The host address 0 denotes the network itself; the highest possible host address denotes a broadcast address that is received by all hosts of a given network.
2.5.2 IPv4 Multicasting The normal mode of communicating via IPv4 is unicast; i.e., one sender sends data to one receiver. Another possible transfer mode inside a subnet is broadcast. In this case one sender sends data to every node of the subnet, regardless of whether the node is interested in the data. IP broadcasting, however, does not work beyond the respective subnet boundaries. The third mode of communication is called multicast. Here one sender sends data to a well-defined group of nodes, which may be attached to the same subnet or to some other subnet that can be reached via the Internet. Nodes that do not belong to this group either do not receive the data or ignore it. The main advantage of multicasting can be seen in Figure 2.5 and Figure 2.6. A host sending a data packet to a group of N receivers in unicast mode (Figure 2.5) must send the data N times, once for each receiver, thus causing significant traffic and CPU overhead at the source. When using multicast (Figure 2.6), the host sends the data only once, and somewhere between the source and the receivers, multicast routers duplicate the data packets (as done by MR1 in Figure 2.6) and pass them on to the interested receivers. This way, the source sends each packet only once, reducing traffic for the source itself and for the links between sender and receivers. It must be noted that in a multicasting network all routers must be able to route multicast traffic. If pure unicast routers are present, then multicast traffic must be embedded into unicast traffic, resulting in tunnels, as is necessary, for instance, for the Internet MBone described below. Pure unicast traffic, however, can be routed by both unicast and multicast routers.
[Figure 2.6 shows the source sending a single copy through router MRS; multicast router MR1 duplicates the packets and forwards them to receivers Recv. 1 and Recv. 2.]
FIGURE 2.6 Multicast.
FIGURE 2.7 MBone multicast islands.
FIGURE 2.8 Tunnel between two multicast routers MR1 and MR2 across pure unicast routers. Logically, MR1 sends multicast traffic directly to MR2. Physically, the data are transported in the payload section of unicast packets.
2.5.2.1 MBone Most of the existing Internet routers either are not able to route multicast traffic, or this ability has not been activated. If a multicast data packet is received by such a pure unicast router, the packet cannot be routed and is therefore discarded. In contrast, the Internet multicast backbone (MBone) is a set of Internet routers that are able to route multicast data and that collaborate with each other. Each of these routers is also attached to a multicasting-enabled subnet; thus, the MBone makes up a set of interconnected multicast islands (Figure 2.7). MBone routers act at two levels. At the usual unicast level, they are standard Internet routers, able to communicate with all other Internet routers via unicast. At the multicast level, they logically send multicast traffic or multicast routing information only to other members of the MBone. As there may be an arbitrary number of pure unicast routers physically located between two MBone routers, multicast data and usually also routing information are sent inside unicast tunnels (Figure 2.8). This means that if a multicast router sends a multicast packet PM toward its receivers, it creates a new UDP unicast packet PU and puts the whole multicast packet PM (including its IP/UDP headers) into the data section of the UDP packet PU. The UDP packet PU is then sent to the next multicast router via unicast. For tunneling, usually IP in IP [RFC1853] is used, but the more general Generic Routing Encapsulation (GRE) [RFC2784] may also be used. The MBone is a so-called overlay network on top of the Internet, because the MBone routers together with the tunnels form a second, smaller logical network above the Internet, which at the multicast level is not necessarily aware of the lower-level Internet structure and all its unicast routers. At the multicast level, only the MBone multicast routers and tunnels (connections between the MBone routers) are visible.
Nowadays, the MBone consists of thousands of multicast islands being interconnected via tunnels, and users attached to a multicast island may multicast audio and video transmissions to all other users connected to the MBone worldwide. 2.5.2.2 IPv4 Multicast Addressing Like in unicast, when sending a multicast UDP packet, the destination address field of the IP header represents the nodes that receive the packet. However, this destination address must be a class D IP multicast
© 2005 by CRC Press
2-14
The Industrial Communication Technology Handbook
TABLE 2.2    IPv4 Multicast Addressing Scheme

Start         End                 Description
224.0.0.0     224.0.0.255         Routing protocols (e.g., DVMRP, topology discovery, etc.)
224.0.1.0     238.255.255.255     Either permanently assigned or free for dynamic use
232.0.0.0     232.255.255.255     Source-specific multicast (SSM)
233.0.0.0     233.255.255.255     GLOP
239.0.0.0     239.255.255.255     Administratively scoped IP multicast
address, also called group address. Thus, a multicast packet is always sent to a group of hosts rather than to a specific host. Table 2.2 shows parts of the Internet multicast addressing scheme [IANAM, ALB2004]. Some parts of the addressing range are reserved, for instance, for routing protocols; some are reserved for static multicast groups, which are defined permanently; and some are reserved for different multicast address assignment schemes.

For sending a multicast to a transient group (one that is created and later destroyed again), the sender must obtain an unused multicast address. Unfortunately, there is no central authority for assigning such an address. Thus, users must either arbitrarily take an address from one of the free address ranges and hope that no one else uses it, or use tools like sd or sdr (see Section 2.5.2.5), which are able to suggest unused addresses. Alternatively, senders may use global scope multicast addresses (GLOP) [RFC2770] or the multicast address-set claim (MASC) [RFC2909] for obtaining such an address. Finally, there is a range of multicast addresses devoted to limiting their scope within a hierarchically set scheme rather than with the somewhat crude TTL mechanism (explained in Section 2.5.2.4). These are called administratively scoped [RFC2365] addresses; e.g., a large company or institution may limit the set of multicast routers that may receive the sent traffic to its own subnets, but not beyond.

2.5.2.3 Local Multicast

Hosts wanting to receive multicast data must first join the respective group that will receive the data. If the casting is restricted to a specific LAN, then a receiver at least needs to implement the Internet Group Management Protocol (see Section 2.5.6). It must provide the functions JoinHostGroup(group-address, interface) and LeaveHostGroup(group-address, interface) for its IP service interfaces [RFC1112].
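To make the scheme of Table 2.2 concrete, the following sketch classifies an IPv4 address against those ranges. The range descriptions are shortened paraphrases of the table, and the helper name is illustrative, not part of any standard API:

```python
import ipaddress

# Ranges from Table 2.2, most specific first, so that SSM, GLOP, and the
# administratively scoped block win over the broad 224.0.1.0 block.
RANGES = [
    ("224.0.0.0", "224.0.0.255", "routing protocols / local control"),
    ("232.0.0.0", "232.255.255.255", "source-specific multicast (SSM)"),
    ("233.0.0.0", "233.255.255.255", "GLOP"),
    ("239.0.0.0", "239.255.255.255", "administratively scoped"),
    ("224.0.1.0", "238.255.255.255", "permanently assigned or free for dynamic use"),
]

def classify(addr: str) -> str:
    a = int(ipaddress.IPv4Address(addr))
    if a >> 28 != 0b1110:  # class D addresses start with the fixed bits 1110
        return "not a multicast (class D) address"
    for start, end, desc in RANGES:
        if int(ipaddress.IPv4Address(start)) <= a <= int(ipaddress.IPv4Address(end)):
            return desc
    return "unclassified"

print(classify("224.0.0.1"))    # routing protocols / local control
print(classify("232.10.0.7"))   # source-specific multicast (SSM)
print(classify("239.255.0.1"))  # administratively scoped
print(classify("10.0.0.1"))     # not a multicast (class D) address
```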
With IGMP, the host joins a group at the IP level and informs its local multicast router that it wishes to receive data sent to this group. The two interface functions instruct each network interface card that it should either join or leave a multicast group at the data link layer (ISO/OSI layer 2).

When sending a multicast packet in a LAN, it is advisable to use the existing multicasting capabilities of the LAN data link layer technology, which are often available in addition to unicast and broadcast. This means that inside a LAN, multicast data should be handled by layer 2 only, rather than layer 3. For instance, a multicast IP address (4 bytes) can be mapped to a unique IEEE 802 (e.g., Ethernet, FDDI, etc.) MAC layer multicast address (6 bytes). For this purpose, the IANA [IANA] has been assigned the IEEE 802 MAC address block from 01-00-5E-00-00-00 to 01-00-5E-FF-FF-FF for the sole use of IP multicasting. For mapping an IP multicast address to the corresponding MAC multicast address, the least significant 23 bits of the IP multicast address are placed into the lower 23 bits of the IANA MAC multicast base address 01-00-5E-00-00-00. As an IP class D address (32 bits) starts with 4 fixed bits (see Table 2.1), leaving 28 bits free to choose, 5 bits of an IP multicast address are ignored in this mapping, leading to the fact that 2^5 = 32 IP multicast addresses are always mapped to the same MAC multicast address.

The procedure for the transmission of multicast traffic sent in the same LAN is simple. The sender sends the data to a specific IP multicast address AI, which is mapped to the corresponding MAC multicast address AM, and the destination MAC address of each sent frame is set to AM. If a network interface card is instructed to receive multicast sent to the IP multicast address AI (via a call to JoinHostGroup), the IP multicast address is again mapped to the same MAC multicast address AM.
Once the network interface card detects a frame having the very multicast MAC address AM as the destination address, it accepts the frame and passes it on to layer 3.
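The IP-to-MAC mapping described above is easy to compute. A small sketch (the function name is illustrative); note how two IP addresses differing only in the 5 ignored bits collide on the same MAC address:

```python
def multicast_ip_to_mac(ip: str) -> str:
    """Map an IPv4 class D address to its IEEE 802 multicast MAC address.

    The low-order 23 bits of the IP address are placed into the low-order
    23 bits of the IANA base address 01-00-5E-00-00-00; the remaining
    5 variable bits of the IP address are ignored.
    """
    octets = [int(x) for x in ip.split(".")]
    addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    assert addr >> 28 == 0b1110, "not a class D address"
    low23 = addr & 0x7FFFFF                 # keep the least significant 23 bits
    mac = 0x01005E000000 | low23            # place them into the IANA base block
    return "-".join(f"{(mac >> s) & 0xFF:02X}" for s in range(40, -1, -8))

print(multicast_ip_to_mac("224.0.0.1"))    # 01-00-5E-00-00-01
print(multicast_ip_to_mac("239.128.0.1"))  # 01-00-5E-00-00-01 (same MAC!)
```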
IP Internetworking
TABLE 2.3    Connection between TTL and Scope

TTL       Scope
128       Low-speed tunnels
64        Intercontinental
48        International (within the continent)
16–32     National (depending on the links involved)
1–16      Within institution
A call to LeaveHostGroup deletes this association at the receiver. From then on, received multicast frames sent to the group will be ignored.

2.5.2.4 Multicast Routing

If multicast packets should be received outside their own LAN, things become more complicated. Whether packets are sent beyond their own LAN via the local multicast router is in principle determined by the TTL field of the sent packet. As in unicast, this field is decremented by one by each router the packet passes. Once the TTL reaches zero, the packet is dropped. This automatically prevents packets from circulating through the Net forever due to incorrect routing tables and also provides scoping, i.e., a way of defining how far the sent packets may travel. For instance, if a packet should be received by hosts attached to the same LAN only (and nowhere else), the TTL must be set to 1; if packets should be received only by hosts situated on the same continent as the sender, the TTL must be set to 48. Other values of the TTL limit the scope to certain areas centered around the sender (Table 2.3).

If the TTL is greater than one, the local multicast router must forward the packet to each multicast router it is connected to. On the MBone this means the packet is sent over each tunnel going out of the local multicast router. As the sender does not know who the other members of the multicast group (i.e., the receivers) are, each multicast packet would have to be sent to all multicast routers of the MBone (i.e., flooding the whole network) in order to make sure that all group members get the sent data.
However, this would lead to a drastic overload of the multicast network, so routing protocols exist that minimize the traffic and yet guarantee that each member of the multicast group will receive each packet that is sent to the group, for instance, the distance-vector multicast routing protocol (DVMRP) [RFC1075, PUS2003], multicast extensions to open shortest path first (MOSPF) [RFC1584], or protocol-independent multicast (PIM) [ADA2003, RFC2362].

2.5.2.5 Multicast Applications

Several tools have been created for creating, managing, and receiving multicast traffic over the MBone. For initializing and joining multicast sessions, the tools Session Directory (sd or sdr) or Multikit can be used. Sdr shows multicast programs currently being sent or scheduled for the future. It can also be used for obtaining an unused multicast address and for announcing a multicast session scheduled for the future. When sessions are joined, sdr will launch the appropriate tools for presenting the program. These can be video tools like vic (video conferencing) or nv (network video), or audio tools like vat (visual audio tool) or rat (robust audio tool). Telephony is done via Free Phone (fphone), and a whiteboard application is given by wb. Other examples of multicast tools include text tools like the Network Text Editor (nt) and a polling tool (mpoll).
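The TTL-based scoping described in Section 2.5.2.4 is set by the sending application on its socket. A minimal sketch using the POSIX socket option IP_MULTICAST_TTL via Python; the scope labels are an illustrative reading of Table 2.3:

```python
import socket

# Scope labels per Table 2.3 (illustrative mapping of a few TTL values).
SCOPES = {1: "same LAN only", 16: "within institution", 32: "national",
          48: "within the continent", 64: "intercontinental"}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Limit sent multicast traffic to the sender's institution: each router
# decrements the TTL, so the packets die after at most 16 hops.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 16)

ttl = sock.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL)
print(ttl, "->", SCOPES[ttl])   # 16 -> within institution
sock.close()
```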
2.5.3 IPv6

The Internet Protocol version 6 (IPv6) has been designed to replace the old IPv4 in the next-generation Internet [RFC1883, RFC1887]. It represents a totally new approach and is incompatible with version 4. As most Internet hosts and routers still only support IPv4, IPv6 packets often cannot be transported from sender to receiver without further measures. Usually, when leaving the IPv6 subnetwork of the sender, IPv6 packets are tunneled over IPv4, i.e., transported in IPv4 packets, where the whole IPv6 packet is treated as pure IPv4 data.
The header has been simplified and contains only 8 fixed fields (the IPv4 header includes 13):

• A version field containing the value 6.
• A priority field distinguishing between data and real-time traffic.
• A flow label for supporting pseudo end-to-end connections with guaranteed QoS.
• The payload length, specifying the size of the data contained in the packet.
• The next header, pointing at the next optional header or holding an ID for the used transport protocol (TCP or UDP).
• The hop limit, which is decreased by each passed router; a packet with zero hop limit is discarded. This prevents faulty packets from circling through the network forever.
• Finally, the 16-byte source and destination addresses.

IPv6 offers the following enhancements with respect to IPv4:

• Addresses are 16 bytes long, written in groups of four hexadecimal digits separated by colons (e.g., 8000:0000:1111:2222:3333:4444:ABCD:EFFF). This solves the shortage of IPv4 addresses caused by the exponential growth of the Internet. Even when wasting a lot of such addresses due to the inefficient use of network addresses, thousands of IP addresses could be assigned to each square meter of the Earth's surface.
• New address classes exist, including addresses for Internet service providers and geographical regions.
• Due to the simpler header, routing is made more efficient. Additionally, IPv6 supports an arbitrary list of options that may be skipped by routers that do not support them.
• IPv6 supports authentication and encryption.
• IPv6 supports QoS for real-time applications.

Of course, multicasting is also an intrinsic capability of IPv6 but will not be treated here. For more information, see [RFC2373] and [RFC2460]. Even though IPv6 offers substantial advantages, its implementation is costly and requires buying new routers and reconfiguring existing hosts. For these reasons, IPv4 is still the dominant Internet Protocol today, and IPv6 will not dominate the Internet until the year 2010 or even later.
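As a sketch of the simplified header layout, the 40-byte IPv6 fixed header can be packed as follows. The function name is illustrative, and the priority field is packed here under its later standard name, traffic class:

```python
import struct
import socket

def ipv6_header(payload_len, next_header, hop_limit, src, dst,
                traffic_class=0, flow_label=0):
    """Pack the 40-byte IPv6 fixed header (version 6)."""
    # First 32 bits: version (4) | traffic class (8) | flow label (20).
    first = (6 << 28) | (traffic_class << 20) | flow_label
    return struct.pack("!IHBB16s16s",
                       first,
                       payload_len,       # payload length (16 bits)
                       next_header,       # e.g., 6 = TCP, 17 = UDP
                       hop_limit,         # decremented by each router
                       socket.inet_pton(socket.AF_INET6, src),
                       socket.inet_pton(socket.AF_INET6, dst))

hdr = ipv6_header(1024, 17, 64,
                  "8000:0000:1111:2222:3333:4444:ABCD:EFFF", "::1")
print(len(hdr))        # 40
print(hdr[0] >> 4)     # 6 (the version field)
```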
2.5.4 Address Resolution Protocol

The Address Resolution Protocol (ARP) defined in [RFC826] and its complement, the Reverse Address Resolution Protocol (RARP) defined in [RFC903], are a means for connecting OSI layer 2 addresses to their corresponding layer 3 IP addresses. Basically, computers communicate with each other by sending messages on the data link layer (and subsequently the physical layer), for instance, by sending an Ethernet frame over an Ethernet variant. On this level, all network cards following IEEE 802 are identified by globally unique 6-byte identifiers called MAC addresses. In order to successfully send an Ethernet frame, each sending network card must put both its own MAC address and the MAC address of the receiving card into the Ethernet frame. If many computers are connected by a single layer 2 network (possibly via hubs, bridges, or switches), senders often know only the IP address of a receiver. However, for Ethernet cards, IP addresses are meaningless. In such situations, ARP can be used to find out the MAC address of the network card that, at a higher layer, is bound to a given IP address.

If computer A wants to find out the MAC address of a network card on computer B, which according to its IP address belongs to the same layer 2 subnet, then ARP is automatically activated on computer A. At first, computer A looks into a small ARP cache to find out if the desired binding is already stored there. If not, computer A generates an ARP request message (who is B.B.B.B tell A.A.A.A, where B.B.B.B is the IP address of computer B and A.A.A.A the IP address of computer A), which is no more than a special Ethernet frame containing the following information:

• Ethernet protocol type is set to 0x0806.
• Sender MAC address.
• Sender IP address.
• Receiver MAC address is set to the Ethernet broadcast address FF:FF:FF:FF:FF:FF.
• Receiver IP address.
As the receiver in this Ethernet frame is the broadcast address, all network cards connected to the same subnet will receive this request, including computer B. Upon the reception of the ARP request message, computer B will activate its own ARP, which will immediately send an ARP response message (B.B.B.B is HH:HH:HH:HH:HH:HH, where B.B.B.B is the IP address and HH:HH:HH:HH:HH:HH is the MAC address of the network card of computer B). Once the ARP response message has been received by computer A, computer A will store this IP-MAC address binding for computer B in its ARP cache and may start sending Ethernet frames to computer B. In order to avoid outdated entries, ARP caches are periodically emptied.

The purpose of RARP is to let computers find out their IP addresses upon start-up, in case they only know their MAC addresses. This can be the case, for example, for diskless workstations, which automatically attach to a server, or for workstations with identical disk images (which do not require manual setup). RARP works in a manner similar to that of ARP, except that the protocol type value is set to 0x8035. Also, a server is required that contains a table with the MAC-IP bindings. Alternatives to RARP are given by the Bootstrap Protocol (BOOTP) and the Dynamic Host Configuration Protocol (DHCP), which allow the assignment of IP addresses in a more flexible way.
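The ARP request described above can be sketched as a 28-byte packet, i.e., the payload that follows the Ethernet header with protocol type 0x0806. Names and example addresses here are illustrative:

```python
import struct

def arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Build the payload of an ARP request ("who is target_ip, tell sender_ip").

    The enclosing Ethernet frame would carry EtherType 0x0806 and the
    broadcast destination FF:FF:FF:FF:FF:FF.
    """
    return struct.pack("!HHBBH6s4s6s4s",
                       1,                 # hardware type: Ethernet
                       0x0800,            # protocol type: IPv4
                       6, 4,              # MAC length, IP length
                       1,                 # opcode 1 = request (2 = reply)
                       sender_mac, sender_ip,
                       b"\x00" * 6,       # target MAC: unknown as yet
                       target_ip)

pkt = arp_request(bytes.fromhex("aabbccddeeff"),
                  bytes([192, 168, 0, 10]),   # A.A.A.A
                  bytes([192, 168, 0, 20]))   # B.B.B.B
print(len(pkt))         # 28
print(pkt[6:8].hex())   # 0001 -> opcode: request
```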
2.5.5 Internet Control Message Protocol

The Internet Control Message Protocol (ICMP), defined in [RFC792] and [RFC1122], is used for automatically sending control signals and commands between computers attached to an IP network. Also, ICMP messages can be used for testing connections and measuring interconnection performance. ICMP messages are sent as special IP packets and thus can be handled by routers. As a consequence, ICMP messages can be sent to or received from arbitrary computers connected with each other over an IP network. An ICMP message contains the following data:

• The type defines the purpose of the ICMP packet. There are over 30 different ICMP types.
• The code further defines the packet's purpose.
• A header checksum.
• The rest of the packet may then contain further data depending on the ICMP type.
The most important ICMP packet types are:

Echo Request: When receiving such an ICMP packet, the receiver should answer with an ICMP Echo Reply packet.
Echo Reply: The answer to an ICMP Echo Request packet.
Time Stamp Request: The same as Echo Request, except that the receiver answers with a Time Stamp Reply packet, which holds additional time stamps.
Time Stamp Reply: The answer to an ICMP Time Stamp Request, which holds the time points at which the Time Stamp Request was received and the Time Stamp Reply was sent back.
Destination Unreachable: Returned by a router to the source host to inform it that the destination of a previously sent packet cannot be reached.
Time Exceeded: Sent from a router to a source host to inform it that the lifetime of a previously sent packet has reached zero.
Parameter Problem: Sent to a source host to inform it that a previously sent packet contains invalid header data.
Source Quench: Sent to a source host to inform it that due to insufficient bandwidth, it should lower its sending bit rate.
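As an illustration, an ICMP Echo Request (type 8, code 0) with the RFC 792 one's-complement checksum can be built as follows; the function names are illustrative:

```python
import struct

def icmp_checksum(data: bytes) -> int:
    """RFC 792 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    s = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    s = (s & 0xFFFF) + (s >> 16)
    s = (s & 0xFFFF) + (s >> 16)   # fold a possible second carry
    return ~s & 0xFFFF

def echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP Echo Request: type 8, code 0 (the reply uses type 0)."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, identifier, sequence) + payload

pkt = echo_request(0x1234, 1, b"ping")
# Verify: the checksum over the whole packet must now be zero.
print(icmp_checksum(pkt) == 0)   # True
```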
2.5.6 Internet Group Management Protocol

The Internet Group Management Protocol (IGMP) is defined in [RFC2236] and is used by IP hosts to inform the multicast routers in their LAN about their multicast group memberships (see Section 2.5.2.4). IGMP messages are encapsulated into IP datagrams with protocol number 2. The goal is to ensure that the multicast router knows whenever a host in its multicast island joins or leaves a multicast group. As a necessary prerequisite, all hosts wishing to receive multicast traffic must join the local LAN all-hosts group with multicast IP address 224.0.0.1.

Periodically, multicast routers send a Host Membership Query message to the all-hosts group of their attached LANs. Upon receiving this message, each host answers by reporting those host groups it is a member of, by sending appropriate Host Membership Report messages (one for each group). In principle, the multicast router is interested only in whether, for a specific group A, there are members in the LAN. Thus, even if several hosts are members of the same group A, it is sufficient that only one membership report for A reaches the router. In order to minimize the number of membership reports sent, each member of A first waits a random amount of time before sending its report. Then, if no membership report from some other member of A has been received in the meantime, the host sends its own membership report to the group address of A, thus reaching the multicast router (which receives all multicast traffic) and all other members of A (which in turn suppress their own membership reports for A).

In addition to the above scheme, if a host newly joins a multicast group, it sends a Host Membership Report to the multicast router without waiting for a query, thus being able to receive the respective traffic immediately in case it is the first member of this group in the LAN. To allow for the possibility of lost reports, this is done at least twice.
An IGMP version 2 packet has the following format:

• The type field defines the type of the message.
• The maximum response time is meaningful only on Membership Query messages and defines the maximum allowed time before sending a report message (unit is 1/10 second).
• The checksum is computed over the whole IP payload.
• The group address field contains the respective multicast address.

There are various types of IGMP messages:

• Host Membership Query
• Group Specific Query
• Version 1 and Version 2 Membership Report
• Leave Group
Whenever a host leaves a group, it may send a Leave Group message to the all-routers multicast group (224.0.0.2) to inform all routers of the LAN that there are possibly no more members of this group present. If the leaving host was the one host that actually answered the last Membership Query for this group, then it should send this message. Upon receiving a Leave Group message, a router sends one or more Group Specific Query messages to the group that the host has left, to verify whether members of this particular group are left.
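The 8-byte IGMPv2 message format described above can be sketched as follows. The type values are taken from RFC 2236 (0x16 = Version 2 Membership Report, 0x17 = Leave Group); the function names are illustrative:

```python
import struct
import socket

def igmp_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (even-length input)."""
    s = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def igmpv2_message(msg_type: int, max_resp_time: int, group: str) -> bytes:
    """8-byte IGMPv2 message: type, max response time, checksum, group address."""
    body = struct.pack("!BBH4s", msg_type, max_resp_time, 0,
                       socket.inet_aton(group))
    csum = igmp_checksum(body)
    return struct.pack("!BBH4s", msg_type, max_resp_time, csum,
                       socket.inet_aton(group))

# Version 2 Membership Report (type 0x16) for group 239.1.2.3:
report = igmpv2_message(0x16, 0, "239.1.2.3")
print(len(report))                  # 8
print(igmp_checksum(report) == 0)   # True

# Leave Group (type 0x17) would be sent to the all-routers group 224.0.0.2:
leave = igmpv2_message(0x17, 0, "239.1.2.3")
```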
2.6 Transport Layer Protocols and Services

2.6.1 Transmission Control Protocol

The Transmission Control Protocol (TCP) operates at OSI layer 4 (in the TCP/IP reference model, at the transport layer) on top of IP and is assigned the IP protocol number 6. It constitutes the most important Internet protocol and is defined in [RFC793], [RFC1122], and [RFC1323]. The purpose of TCP is twofold:

• To guarantee the correct delivery of packets sent over an intrinsically unreliable packet-oriented IP network.
FIGURE 2.9 Full-duplex TCP connection. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
• To control the output bit rate of each sender in order to minimize packet losses due to congested routers or receivers.

TCP operates connection oriented in full duplex. Applications using TCP may assume that a TCP connection opened from a source host to a receiver host is like a reliable pipeline or byte stream. Data (arbitrary bytes) put into this pipeline are guaranteed to drop out at the receiver without losses and in correct order (Figure 2.9). In order to guarantee this correctness, TCP divides the data to send into so-called segments, which are themselves sent in IP packets. In principle, IP packets can hold up to 65,535 bytes. However, in order to avoid fragmentation, the size of TCP segments is in practice limited by the network's MTU. Each segment starts with a TCP header, which is at least 20 bytes long but may hold additional options. The rest of the segment may hold user data, but may also be empty. The TCP header contains the following information:

• Source port and destination port.
• A sequence number identifying each sent byte. This is wrapped back to zero in case the highest number has been used.
• An acknowledgment number denoting the number of the next expected byte. This field only contains valid data if the ACK bit is set.
• The data offset holding the size of the TCP header.
• Explicit congestion notification (ECN) and control bits, including URG, ACK, PSH, RST, SYN, and FIN.
• The sender's receive window size.
• A header checksum.
• An optional pointer to urgent data (URG flag set) and optional TCP headers.

In order to create a TCP connection between two applications X and Y running on computers A and B, both applications first must get a port number, an identifier between 0 and 65,535, which can be assigned only once on each computer. The application X initiating the connection then must provide its own port number, the IP address of computer B, and the port number of the partner application Y to TCP.
TCP then sends a segment to the given IP address and port number, where the SYN flag is set to 1 and ACK is set to 0, and a random sequence number x is chosen. If application Y correctly waits at the given port, the TCP on computer B answers with a segment where the SYN and ACK bits are set, the sequence number of side B is set to a random number y, and the acknowledgment number is set to x + 1. Upon reception of this second segment, the TCP on computer A sends a third segment, where only the ACK flag is set, the sequence number is set to x + 1, and the acknowledgment number is set to y + 1. As three segments must be sent for establishing a TCP connection, this process is called the three-way handshake (see Figure 2.10).

After the establishment of the connection, each side may send arbitrary bytes to the other side. If one side wants to terminate the connection, a segment with the FIN flag set must be sent. Otherwise, if, for instance, application X sends data to Y, then the data are put into one or more TCP segments, which are then sent via IP to computer B. Due to the sequence numbers of each segment, the TCP layer at B is able to detect missing segments or the out-of-order delivery of segments. For each correctly received segment, B must send an acknowledgment segment back to A, where the acknowledgment number identifies the number of the next expected byte. The TCP on computer A, on the other hand, starts a
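The sequence and acknowledgment numbers exchanged during the three-way handshake can be illustrated with a toy model; no real networking is involved, and the tuple layout is purely illustrative:

```python
import random

def three_way_handshake():
    """Toy model: each segment is (flags, sequence number, acknowledgment number)."""
    x = random.randrange(2**32)                        # initial sequence number of A
    seg1 = ("SYN", x, None)                            # A -> B
    y = random.randrange(2**32)                        # initial sequence number of B
    seg2 = ("SYN+ACK", y, (x + 1) % 2**32)             # B -> A: ACK = x + 1
    seg3 = ("ACK", (x + 1) % 2**32, (y + 1) % 2**32)   # A -> B: ACK = y + 1
    return seg1, seg2, seg3

for flags, seq, ack in three_way_handshake():
    print(f"{flags:8} SEQ={seq}" + ("" if ack is None else f" ACK={ack}"))
```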
FIGURE 2.10 TCP three-way handshake. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
so-called retransmission timer for each sent segment. If no acknowledgment has been received within a certain amount of time, computer A assumes that the segment was lost and has to be sent again.

TCP also maintains two so-called sliding windows in order to control the transmission bit rate of each sender (flow control). One window simply tells each sender how many bytes the receiver may currently receive without risking a buffer overflow. This information is transmitted in each ACK segment in the receive window size field. The second window is called the congestion window (CWND). Here, each sender additionally restricts the number of unacknowledged bytes it may send to the congestion window size. Initially, the window size is set to 1 packet (i.e., the maximum allowed segment size), a strategy that is called slow start. For each acknowledged byte, TCP increases the size of its congestion window, at first with exponential speed, but after reaching a certain threshold h, only with linear speed. If a timeout of the retransmission timer occurs, h is set to h/2 and the congestion window is reset to one packet. Instead of waiting for the retransmission timer to time out, a strategy called fast retransmit enables receivers to send duplicate ACKs to the sender in case out-of-order segments are received. A sender receiving three such duplicate ACKs may deduce that an intermediate segment has been lost, rather than that the segments have merely been reordered on the way, and may retransmit the missing segment earlier [RFC1122].
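The interplay of slow start, the threshold h, and timeouts can be illustrated with a toy simulation. The window here is measured in whole packets per round trip; this is a didactic sketch, not real TCP:

```python
MSS = 1   # window measured in packets (segments)

def simulate_cwnd(rounds, loss_at=None, threshold=8):
    """Toy model of TCP slow start and congestion avoidance.

    The window grows exponentially per round trip below the threshold h
    and linearly above it; on a retransmission timeout, h is halved and
    the window falls back to one packet.
    """
    cwnd, h, history = MSS, threshold, []
    for r in range(rounds):
        if loss_at is not None and r == loss_at:
            h = max(h // 2, 1)
            cwnd = MSS        # timeout: back to slow start
        elif cwnd < h:
            cwnd *= 2         # slow start: exponential growth
        else:
            cwnd += 1         # congestion avoidance: linear growth
        history.append(cwnd)
    return history

print(simulate_cwnd(10, loss_at=6))
# [2, 4, 8, 9, 10, 11, 1, 2, 4, 5]
```

Note how the window climbs quickly to the threshold, then linearly, and collapses to one packet after the simulated timeout in round 6.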
2.6.2 User Datagram Protocol

The User Datagram Protocol (UDP) is the second important transport protocol at OSI layer 4 (in the TCP/IP reference model, at the transport layer) [RFC768, RFC1122] and is assigned the IP protocol number 17. It is meant for transporting application data in a message-oriented, unreliable manner from one application to another. As most functionality is already provided by IP, the UDP header only contains the port numbers of the source and receiver applications, the length of the UDP packet, and a checksum. As UDP does not provide any functionality for detecting lost packets or out-of-order delivery, it is mostly used either in local networks with large bandwidths and reliable layer 2 transport, or for transporting multimedia data like live broadcasts, where a few lost packets will not seriously decrease the perceived quality of the presentation. In any case, detection of lost packets or out-of-order delivery must be carried out by the receiving applications, usually by including sequence numbers in the UDP application data. The interpretation of these numbers is left solely to the applications.
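The application-level sequence numbering mentioned above might look as follows. The 32-bit sequence number prefix and the helper names are an illustrative convention, not part of UDP itself:

```python
import struct

def make_datagram(seq: int, payload: bytes) -> bytes:
    # The application prepends its own 32-bit sequence number to the data.
    return struct.pack("!I", seq) + payload

def detect_losses(datagrams):
    """Return the sequence numbers missing from a received batch."""
    seen = sorted(struct.unpack("!I", d[:4])[0] for d in datagrams)
    return [s for s in range(seen[0], seen[-1] + 1) if s not in set(seen)]

# Datagrams 0..5 were sent, but 2 and 4 got lost on the way:
received = [make_datagram(s, b"data") for s in (0, 1, 3, 5)]
print(detect_losses(received))   # [2, 4]
```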
FIGURE 2.11 RESV and PATH messages in a multicast tree. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
2.6.3 Resource Reservation Protocol

IPv4 does not contain mechanisms for guaranteeing a minimum quality of service (QoS) for its traffic, for instance, a minimum sustainable end-to-end bit rate or a maximum end-to-end delay or jitter (delay variation). This may severely affect the presentation quality of real-time transmissions using, for example, the Real-Time Protocol (RTP) (see application layer protocols). The Resource Reservation Protocol (RSVP) tries to fill this gap by providing means for guaranteeing certain quality of service parameters [RFC2205, RFC2750]. It is an optional add-on for Internet routers and clients using IP (IPv4 and IPv6), and is currently available on a small subset of Internet hosts only. Being at the same level as TCP or UDP, it has its own IP protocol number (46). RSVP is not a routing protocol itself, but rather a signaling protocol. It cooperates with routing protocols for controlling efficient unicast and multicast over IP.

RSVP allows two different QoS modes. In the controlled load service, RSVP simulates a lightly loaded network for its clients, although the network itself may be overloaded [RFC2211]. Although no hard QoS parameters are met, a lightly loaded network is likely to be sufficient for many load-tolerant and adaptive applications like audio/video streaming. In contrast, the guaranteed service guarantees that the RSVP path will meet the agreed QoS level at all times [RFC2212].

A client application wishing to receive a multicast multimedia stream passes this request to its local RSVP daemon. This daemon then sends a reservation (RESV) request to adjacent RSVP routers toward the multimedia source along the reverse multicast tree path. The RESV request contains a description of the desired quality of service in a so-called flow descriptor. Coming from the other side, the multicast source periodically sends PATH messages down the multicast tree. PATH messages create and acknowledge valid and active multicasting paths (Figure 2.11).
Also, they carry information about the quality of service of the path from source to receiver. RSVP routers may merge different QoS requests into a single reservation, choosing the maximum of the requests as the prereserved QoS level. During runtime, reservations may be changed to other QoS levels. Also, RSVP paths must be acknowledged periodically by PATH and RESV messages, but RSVP is fault tolerant with respect to a few missing messages. Only if none have been received for a certain time is the whole path cancelled.

On each RSVP router, an RSVP daemon manages and controls the IP routing process. It consists of the following modules:

• Admission control: approves or denies an incoming QoS reservation request, depending on whether the QoS request can be satisfied.
• Policy control: checks the rights for making reservations.
• Packet classifier: sorts incoming data packets into different queues.
• Packet scheduler: responsible for granting the agreed QoS to the packets in the routing queues; packets belonging to the same queue are treated identically.
2.7 Presentation Layer Protocols and Services

Applications may send arbitrary data to others, often embedding complex data structures into their messages. In this process, the data structures have to be transformed (flattened, marshalled) into a sequence of bytes, containing the data as well as information about the used data representation. The
receiver must be able to understand the structure of the byte sequence and how to interpret the single bytes in order to reconstruct the sent data structures. This is achieved by the presentation layer (layer 6 of the ISO/OSI model). The presentation layer ensures that two computer systems may successfully communicate even if they use different data representations. Due to different data representation schemes, the presentation layer often is forced to translate sent or received messages. This, however, should be done in a manner totally transparent to the OSI application layer above.

Problems may arise, for instance, because of the CPU byte order. In modern 32-bit architectures, CPUs store values and addresses using 32 bits, stored in four consecutive bytes. In Intel processors, for example, the least significant byte is stored first and the most significant byte last. This is called little endian. Motorola processors, on the other hand, store a 4-byte value in the reverse order, called big endian. If an Intel-based computer sends a 32-bit value to a Motorola-based computer without further corrective measures, the receiver totally misinterprets the received value. This may be prevented, for instance, by forcing the sender to convert the data to the receiver's format before sending, or alternatively forcing the receiver to convert the data from the sender's format after receiving. A third approach is to agree on a commonly used format and to convert to this format before sending and from this format after receiving. TCP/IP, for instance, defines a common network byte order. Using, for example, the C programming language, 32-bit values may be converted to and from this format by the functions htonl() and ntohl(). Another system using an external data format is given by the external data representation (XDR), as specified by Sun [SUN1990].

An additional problem arising between different computers is the code interpretation.
For instance, characters may be stored using one of the following codes: ASCII (common on Intel compatibles, 8 bits/character), EBCDIC (used on IBM mainframes, 8 bits/character), or Unicode (16 or more bits/character). Here, the presentation layer is responsible for automatically translating between the various code schemes. At the next-higher decoding level, received complex data structures must be reconstructed (unmarshalled) from their flattened byte sequence representation. For inhomogeneous data, the data structures must be described by metadata, for instance, defining the data types belonging to each structure, followed by the data values themselves. This, for instance, can be achieved by using the standardized Abstract Syntax Notation 1 (ASN.1) [X680]. Other tasks of the presentation layer include the encryption of messages and supporting authentication. Finally, the presentation layer may also be responsible for the compression of data.
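The byte-order problem can be made concrete with Python's struct module, which plays the role of the presentation layer here; network byte order is defined as big endian:

```python
import struct
import socket

value = 0x11223344

# Explicit byte orders with struct: "<" = little endian (e.g., Intel),
# ">" = big endian (e.g., Motorola), "!" = network byte order (big endian).
print(struct.pack("<I", value).hex())   # 44332211
print(struct.pack(">I", value).hex())   # 11223344
print(struct.pack("!I", value).hex())   # 11223344

# The C functions htonl()/ntohl() perform the same conversion for 32-bit
# values; Python mirrors them in the socket module.
assert socket.ntohl(socket.htonl(value)) == value
```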
2.8 Application Layer Protocols and Services In both the ISO/OSI scheme and the TCP/IP reference model, the application layer defines protocols to be used directly by applications for exchanging data with each other. These cover, for instance, authentication, distributed databases and file systems, file transport, data syntax restrictions, coordination and agreement procedures, quality-of-service issues, e-mail, and terminal emulation. Many standard protocols have already been specified by the Internet Engineering Task Force (IETF). They define standard data structures that are to be exchanged between applications. Applications following these protocols are guaranteed to be able to interact successfully with other applications over the Internet, even if these applications have been created by different parties. For instance, Web browsers following HTTP may download Web pages from any Web server connected to the Internet. The IETF-specified protocols usually use TCP for reliable transport and UDP for the transport of real-time multimedia data (although real-time multimedia data may also be sent over TCP). Usually, both control commands and pure data can be transmitted over the same TCP connection. For signaling the end of a data transmission, one of three approaches is used. In octet stuffing, the end of a data transmission is signaled by a certain byte sequence (similar to the bit stuffing used at the data link layer). If the transported data also contain this very sequence, the sequence is changed (escaped) into another sequence. The receiver must detect such a change and undo it. An example of octet stuffing is SMTP. In octet
© 2005 by CRC Press
IP Internetworking
counting, transported messages contain special headers that specify the number of data bytes to be transferred. This concept is used, for instance, in HTTP. Finally, in connection blasting, the end of a transmission is signaled by closing the TCP connection. This is used, for instance, in FTP.
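The octet-counting approach can be sketched as a length-prefixed framing scheme. HTTP does this with a textual Content-Length header; the 4-byte binary length prefix below is a simplification chosen for brevity:

```python
import struct

# Octet counting: prefix each message with its length so the receiver knows
# exactly where it ends, with no escaping of the payload required.
def frame(message: bytes) -> bytes:
    return struct.pack("!I", len(message)) + message

def deframe(stream: bytes):
    """Split a byte stream back into the framed messages it carries."""
    messages, i = [], 0
    while i < len(stream):
        (n,) = struct.unpack_from("!I", stream, i)
        i += 4
        messages.append(stream[i:i + n])
        i += n
    return messages
```

Because the length is announced up front, several messages can follow one another on the same connection, which is exactly what makes HTTP/1.1's persistent connections possible.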
2.8.1 TELNET The TELNET protocol provides a general 8-bit interface for the communication between users, hosts, and processes [RFC854]. Generally, a TELNET client running on computer A opens a TCP connection to port 23 of a TELNET server on computer B. Both sides then emulate a certain simple type of terminal called the network virtual terminal (NVT), but may negotiate additional services after the connection has been established. An NVT is a bidirectional character device consisting of a printer that shows the information received from the other side and a keyboard where keystrokes are produced and sent to the other side. TELNET defines a set of commands that may be sent in-band with the stream of data. The mechanism used here is octet stuffing. Byte 255 is called interpret as command (IAC) and signals that the following byte specifies a TELNET command, for example, for sending an interrupt to the running process or for erasing the last character. If a data byte with value 255 is to be sent, then two bytes with value 255 are sent. Upon receiving two consecutive bytes with value 255, the receiving side must automatically remove one of them.
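TELNET's IAC escaping is a compact example of octet stuffing. The sketch below covers only the data-byte doubling described above; a real implementation would also have to parse the command bytes that follow a lone IAC:

```python
IAC = 0xFF  # "interpret as command", byte 255

def iac_escape(data: bytes) -> bytes:
    # A data byte 255 is sent as two 255 bytes so it cannot be mistaken
    # for the start of a TELNET command.
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))

def iac_unescape(data: bytes) -> bytes:
    # The receiver collapses each 255,255 pair back into a single data byte.
    return data.replace(bytes([IAC, IAC]), bytes([IAC]))
```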
2.8.2 File Transfer Protocol The File Transfer Protocol (FTP) is used for transporting arbitrary binary data from one Internet host to another [RFC959]. On computer A, an FTP client is started with the IP or DNS address of the Internet computer B with which communication is desired. The FTP client then opens a TCP connection to port 21 of computer B, representing the control connection. The control connection uses the TELNET protocol underneath, and users may send control commands to the FTP server on computer B, including requests for showing the contents of the current directory at computer B (LIST), changing this current directory to another one (CWD), creating new directories (MKD), etc. Additionally, the user may start uploads (STOR) or downloads (RETR) of files to and from the current directory. Upon the reception of a control command over the control connection, the server answers with a reply, sending status or error information to the client. One has to distinguish between the FTP control commands that are actually sent over the control channel and the commands that are typed in by users into a command line application, which may be different. Once data are to be sent, a TCP data connection is opened by the server on computer B from port 20 to a data port on computer A that the client has previously announced over the control connection (using the PORT command). Then, depending on the specified direction, the data are sent either from A to B or vice versa. After transmitting the last byte, the sender must close the data connection, indicating to the other side that the transmission has ended. It is worth noting that FTP supports different transmission modes. In the binary mode, the data are sent without modification.
In the ASCII mode, FTP automatically converts between different character representation codes, for instance, when sending a pure text file from an IBM mainframe (using EBCDIC) to a PC (using ASCII), or when exchanging data between different operating systems like Microsoft Windows and Unix or Unix-like operating systems (which use different end-of-line representations in text files).
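The kind of conversion the ASCII mode performs can be sketched as follows: the canonical on-the-wire form uses CRLF line endings, and each side maps to or from its local convention (the helper names are invented for this illustration):

```python
# ASCII-mode text transfer: normalize the local end-of-line convention
# (Unix "\n", old Mac "\r", Windows "\r\n") to CRLF on the wire, and
# convert back to the local convention on reception.
def to_wire(text: str) -> bytes:
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "\r\n").encode("ascii")

def from_wire(data: bytes, local_eol: str = "\n") -> str:
    return data.decode("ascii").replace("\r\n", local_eol)
```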
2.8.3 Hypertext Transfer Protocol The Hypertext Transfer Protocol (HTTP) is available as version 1.0 [RFC1945] and version 1.1 [RFC2616]. Its purpose is to manage the download of documents that are part of the World Wide Web (WWW), usually written in the Hypertext Markup Language (HTML) [RFC1866]. Most Web browsers and servers nowadays understand HTTP/1.0, although [RFC1945] is not a standard but rather an informational guideline. Newer Web clients and servers also support the standardized HTTP/1.1.
HTTP is a client/server-based protocol following the octet-counting approach. A client wishing to download a specific document from a Web server opens a TCP connection to the server port 80 (sometimes 8080). The client then sends a request, containing a request line, various headers, an empty line, and an optional body. The request line specifies what the client wants the server to do. For example, a request line “GET /dir1/dir2/the_document.html HTTP/1.1” informs the server that the client wants to download the document “the_document.html,” which is situated in the directory “/dir1/dir2,” by using HTTP/1.1. Clients may also send data to the server, for example, a form that has been filled out by a user. This can be done using the POST or PUT commands. The server then answers by sending a status line containing a code for success or an error description, various headers describing the downloaded document (e.g., its size or the time stamp of its last change), followed by an empty line. Finally, in the message body, the HTML document itself is transported to the client. HTTP/1.1 overcomes several limitations of HTTP/1.0. For example, an HTML document may contain several other subdocuments, like photos, background images, frames, etc. In HTTP/1.0, a new TCP connection has to be created for each subdocument. In HTTP/1.1, all subdocuments can be transported over the same persistent TCP connection.
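A rough sketch of HTTP's octet counting: the Content-Length header announces exactly how many body bytes follow the empty line, so the receiver knows where the message ends without closing the connection. The minimal builder/parser below ignores many details of [RFC2616] (status codes other than 200, chunked transfer, header folding):

```python
# Build a minimal HTTP/1.1 response; Content-Length implements octet counting.
def build_response(body: bytes, content_type: str = "text/html") -> bytes:
    headers = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + body

def parse_response(raw: bytes):
    # The empty line separates the header section from the body.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    status_line, headers = lines[0], {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    n = int(headers["content-length"])  # octet counting: read exactly n body bytes
    return status_line, headers, rest[:n]
```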
2.8.4 Simple Mail Transfer Protocol The Simple Mail Transfer Protocol (SMTP) defines the exchange and relay of text mails over TCP/IP [RFC821]. If a mail client running on computer A wants to send mail to a receiver on computer B, it opens a TCP connection to port 25 of either computer B or an intermediate mail server that is able to pass on the mail to the receiver on computer B. Then the sender client sends SMTP commands to the receiver, which replies by sending SMTP responses. Once the sender wants to send an electronic mail, it sends the command MAIL with an identifier for the sender. If the receiver is willing to accept mail from the sender, it answers with an OK reply. Now the sender client sends a sequence of recipient (RCPT) commands, which identify the receivers of the mail. Each recipient is acknowledged individually by an OK reply. Once all receivers have been specified, the client sends a DATA command followed by the mail data itself. In order to indicate the end of the mail, the client sends a line containing only a period. If a line of the message itself begins with a period, the sender introduces an additional period, which is removed by the receiver automatically (octet stuffing). In SMTP, (text) mail must be composed of 7-bit ASCII characters only (byte values 0 to 127), a limitation that was not severe in 1982 when SMTP was designed. Nowadays, electronic mail often contains multimedia attachments like audio or video files, where each byte may contain any value between 0 and 255. In order to be able to transport binary data over SMTP, these data are usually transformed into a sequence of 7-bit ASCII characters by using a byte-to-character mapping like Base64 or uuencode. Upon receiving such a transformed character sequence, the receiver must apply the inverse transform in order to retrieve the original binary data.
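Two of the mechanisms above can be sketched in a few lines: the period-based transparency (dot stuffing) applied to message lines, and the Base64 mapping used to squeeze binary attachments into 7-bit ASCII:

```python
import base64

# SMTP transparency: the end of mail data is a line containing only ".",
# so a leading period in the message itself is doubled by the sender
# and stripped again by the receiver.
def dot_stuff(lines):
    return [("." + ln) if ln.startswith(".") else ln for ln in lines]

def dot_unstuff(lines):
    return [ln[1:] if ln.startswith(".") else ln for ln in lines]

# Binary attachments must become 7-bit ASCII, e.g., via Base64:
def encode_attachment(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")
```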
2.8.5 Resource Location Protocol Computers connected over an IP network may offer a variety of services to others, including services standardized by the IETF like DNS, SMTP, FTP, etc., as well as self-created services, for instance, for managing personal information. The Resource Location Protocol (RLP) has been designed to enable arbitrary computers to automatically find other computers that provide specific services [RFC887]. For this purpose, RLP defines a set of request messages that may be sent by the searching computer. RLP uses UDP as its transport protocol. A request message is sent to UDP port 39 of another host and contains a question and a description of one or more services that are looked for. Depending on the question, hosts that provide the service, or that know of others that do, answer by sending a reply message. RLP defines the following request messages:
• Who Provides? is usually broadcast into a LAN. Hosts providing one of the described services may answer; hosts that do not provide any of the specified services must not.
• Do You Provide? is sent directly to a specific host. It may not be broadcast. A host receiving this message must answer, regardless of whether it provides any of the specified services.
• Who Anywhere Provides? is also usually broadcast into a LAN. Hosts either providing any of the services or knowing other hosts that do so may answer.
• Does Anyone Provide? is sent to a specific host, which must send back an answer, regardless of whether it knows of any host providing any of the services.

There are two possible answers. The I Provide reply contains a (possibly empty) list of services that are supported by the answering host. The They Provide reply contains a (possibly empty) list of supported services, qualified by a list of IP addresses of hosts supporting them. An RLP message contains the following fields:

• The type field defines the question or reply type.
• The local-only flag specifies whether only hosts with the same IP network address should answer or be included in the answer list.
• A message ID enables the mapping of received answers to previously sent requests.
• Finally, the resource list contains a description of the looked-for or provided services and supporting hosts.

Resources and services may be described by several fields. The first description byte specifies the protocol number of the IP transport protocol the service uses, for instance, 6 for TCP or 17 for UDP. The next byte defines the port that is usually used by the service, for instance, 23 for TELNET or 25 for SMTP. Additional bytes may then define arbitrary self-created services.
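A hedged sketch of a resource description along the lines above. Since a single byte cannot actually hold port numbers above 255, two bytes are spent on the port here; treat the exact layout as illustrative rather than as the [RFC887] wire format:

```python
import struct

# Illustrative resource description: one byte for the protocol number of the
# IP transport protocol, followed by the service's well-known port. The
# two-byte port field is an assumption of this sketch, not the exact RFC 887
# encoding.
def describe_service(ip_proto: int, port: int) -> bytes:
    return struct.pack("!BH", ip_proto, port)

TELNET_SERVICE = describe_service(6, 23)  # TCP (protocol 6), port 23
SMTP_SERVICE = describe_service(6, 25)    # TCP (protocol 6), port 25
```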
2.8.6 Real-Time Transport Protocol The Real-Time Transport Protocol (RTP) has been designed for carrying real-time multimedia data like audio or video information [RFC3550]. Multimedia data usually are produced as a continuous stream of bits. For this stream to be carried over the network, it must be packetized and sent as a sequence of packets to one (unicast) or several (multicast) receivers. For real-time traffic, UDP is preferred over TCP, as the retransmission of late or lost packets (which is mandatory for TCP) may cause the presentation to stall, which is undesirable, for example, for video conferences. Instead, in case of (a few) lost packets, small artifacts may be visible or audible, which are less annoying than a complete connection breakdown or stall. At the receiver, the original sequence of RTP packets and its content are restored, and lost packets are identified. Pure RTP does not know anything about the payload content. Instead, RTP headers may be altered to fit the needs of specific applications like audio and video conferences. Such changes are then defined in so-called profile specifications. Additionally, different RTP payload formats may be defined in payload format specifications, as is given, for instance, by [RFC2190] for H.263. The RTP specification defines the following header fields:

• The RTP version (2 in [RFC3550])
• Padding and header extension flags
• The contributing sources (CSRC) count, i.e., the length of the CSRC list
• A marker flag M to be used freely by profiles
• The payload type (PT), which must be interpreted by the application
• The sequence number, increased by one for each new RTP packet
• The time stamp of the sampling of the first RTP payload byte
• The RTP synchronization source (SSRC) identifier, which must be unique for concurrent RTP sessions
• An optional contributing sources list (CSRC list)
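The fixed header fields listed above occupy 12 bytes (plus 4 bytes per CSRC entry). A sketch of packing and unpacking them, assuming version 2 and ignoring the padding and extension flags:

```python
import struct

# Fixed RTP header (RFC 3550): byte 0 holds V(2) P(1) X(1) CC(4), byte 1
# holds M(1) PT(7), followed by sequence number, time stamp, and SSRC.
def pack_rtp_header(pt, seq, timestamp, ssrc, marker=False, csrcs=()):
    v, p, x, cc = 2, 0, 0, len(csrcs)
    byte0 = (v << 6) | (p << 5) | (x << 4) | cc
    byte1 = (int(marker) << 7) | pt
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)

def unpack_rtp_header(data):
    byte0, byte1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", data)
    cc = byte0 & 0x0F
    csrcs = [struct.unpack_from("!I", data, 12 + 4 * i)[0] for i in range(cc)]
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }
```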
As RTP is transported over the best-effort protocols TCP/UDP/IP, no guarantee can be made that a required bit rate is available for the real-time transport. Instead, RTP provides a means for measuring
and controlling the output bit rate and perceived quality of service of a real-time stream. This procedure is provided by the Real-Time Control Protocol (RTCP). RTCP can carry the following information:

• In a sender report, statistics for each active sender are sent to the receivers.
• In a receiver report, receivers (which are not senders) send reception statistics to the active senders.
• Sender attributes like e-mail addresses, etc. (source description).
• The request for leaving the presentation.
• Application-specific control information.
For each real-time session (transporting exactly one medium like audio or video), each participant needs two ports: one for RTP and one for RTCP. RTP is able to combine the streams of several sources into one session. This is done by a so-called mixer. For example, the audio data of several participants in an audio conference may be mixed into one single audio stream and sent over a connection with low bandwidth. Here, the mixer acts as a new synchronization source; the IDs of the original sources, however, may then be stored after the RTP header in the list of contributing sources. Another RTP entity is the translator, which is able to change the payload content or tunnel packets through a firewall.
2.9 Summary The TCP/IP suite consists of numerous protocols covering several layers of the ISO/OSI stack or, alternatively, the TCP/IP reference model. Starting at OSI layer 2, protocols are defined for link-level services such as secure frame transport; at OSI layer 3, IP provides the unreliable delivery of datagrams from one host connected to the Internet to another. At OSI layer 4, transport protocols regulate either the reliable and controlled or the unreliable transport of data from process to process. Further services may translate the data between different presentation schemes or offer direct support to applications.
References
[ADA2003] A. Adams, J. Nicholas, W. Siadak, Protocol Independent Multicast–Dense Mode (PIM-DM): Protocol Specification (Revised), IETF Internet Draft, 2003, http://www.ietf.org/Internet-drafts/draft-ietf-pim-dm-new-v2-04.txt.
[ALB2004] Z. Albanna et al., IANA Guidelines for IPv4 Multicast Address Assignments, IETF Internet Draft, 2004, http://www.ietf.org/Internet-drafts/draft-ietf-mboned-rfc3171bis-01.txt.
[COL2001] G. Coulouris, J. Dollimore, T. Kindberg, Distributed Systems, 3rd edition, Addison-Wesley, Boston, 2001.
[HAL1996] F. Halsall, Data Communications, Computer Networks and Open Systems, 4th edition, Addison-Wesley, Reading, MA, 1996.
[IANA] Internet Assigned Numbers Authority, http://www.iana.org/.
[IANAM] Internet Assigned Numbers Authority, Internet Multicast Addresses, http://www.iana.org/assignments/multicast-addresses.
[ISO7498] ISO/IEC 7498-1, Information Technology–Open Systems Interconnection: Basic Model, ISO, 1994.
[KUR2001] J.F. Kurose, K.W. Ross, Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Reading, MA, 2001.
[PET2000] L.L. Peterson, B.S. Davie, Computer Networks: A Systems Approach, 2nd edition, Morgan Kaufmann, San Francisco, 2000.
[PUS2003] T. Pusateri, Distance Vector Multicast Routing Protocol, Version 3, IETF Internet Draft, 2003, http://www.ietf.org/Internet-drafts/draft-ietf-idmr-dvmrp-v3-11.txt.
[REI2001] U. Reimers, Digital Video Broadcasting, Springer, Berlin, 2001.
[RFC768] RFC 768, User Datagram Protocol, IETF, 1980, http://www.ietf.org/rfc/rfc0768.txt.
[RFC791] RFC 791, Internet Protocol: DARPA Internet Program Protocol Specification, DARPA, 1981, http://www.ietf.org/rfc/rfc791.txt.
[RFC792] RFC 792, Internet Control Message Protocol, DARPA, 1981, http://www.ietf.org/rfc/rfc792.txt.
[RFC793] RFC 793, Transmission Control Protocol, DARPA, 1981, http://www.ietf.org/rfc/rfc793.txt.
[RFC821] RFC 821, Simple Mail Transfer Protocol, IETF, 1982, http://www.ietf.org/rfc/rfc821.txt.
[RFC826] RFC 826, An Ethernet Address Resolution Protocol, IETF, 1982, http://www.ietf.org/rfc/rfc826.txt.
[RFC854] RFC 854, Telnet Protocol Specification, IETF, 1983, http://www.ietf.org/rfc/rfc854.txt.
[RFC887] RFC 887, Resource Location Protocol, IETF, 1983, http://www.ietf.org/rfc/rfc887.txt.
[RFC903] RFC 903, A Reverse Address Resolution Protocol, IETF, 1984, http://www.ietf.org/rfc/rfc903.txt.
[RFC959] RFC 959, File Transfer Protocol (FTP), IETF, 1985, http://www.ietf.org/rfc/rfc959.txt.
[RFC977] RFC 977, Network News Transfer Protocol: A Proposed Standard for the Stream-Based Transmission of News, IETF, 1986, http://www.ietf.org/rfc/rfc977.txt.
[RFC1034] RFC 1034, Domain Names: Concepts and Facilities, IETF, 1987, http://www.ietf.org/rfc/rfc1034.txt.
[RFC1035] RFC 1035, Domain Names: Implementation and Specification, IETF, 1987, http://www.ietf.org/rfc/rfc1035.txt.
[RFC1075] RFC 1075, Distance Vector Multicast Routing Protocol, IETF, 1988, http://www.ietf.org/rfc/rfc1075.txt.
[RFC1112] RFC 1112, Host Extensions for IP Multicasting, IETF, 1989, http://www.ietf.org/rfc/rfc1112.txt.
[RFC1122] RFC 1122, Requirements for Internet Hosts: Communication Layers, IETF, 1989, http://www.ietf.org/rfc/rfc1122.txt.
[RFC1157] RFC 1157, A Simple Network Management Protocol (SNMP), IETF, 1990, http://www.ietf.org/rfc/rfc1157.txt.
[RFC1323] RFC 1323, TCP Extensions for High Performance, IETF, 1992, http://www.ietf.org/rfc/rfc1323.txt.
[RFC1584] RFC 1584, Multicast Extensions to OSPF, IETF, 1994, http://www.ietf.org/rfc/rfc1584.txt.
[RFC1853] RFC 1853, IP in IP Tunneling, IETF, 1995, http://www.ietf.org/rfc/rfc1853.txt.
[RFC1866] RFC 1866, Hypertext Markup Language: 2.0, IETF, 1995, http://www.ietf.org/rfc/rfc1866.txt.
[RFC1883] RFC 1883, Internet Protocol, Version 6 (IPv6) Specification, IETF, 1995, http://www.ietf.org/rfc/rfc1883.txt.
[RFC1887] RFC 1887, An Architecture for IPv6 Unicast Address Allocation, IETF, 1995, http://www.ietf.org/rfc/rfc1887.txt.
[RFC1918] RFC 1918, Address Allocation for Private Internets, IETF, 1996, http://www.ietf.org/rfc/rfc1918.txt.
[RFC1945] RFC 1945, Hypertext Transfer Protocol: HTTP/1.0, IETF, 1996, http://www.ietf.org/rfc/rfc1945.txt.
[RFC2190] RFC 2190, RTP Payload Format for H.263 Video Streams, IETF, 1997, http://www.ietf.org/rfc/rfc2190.txt.
[RFC2205] RFC 2205, Resource ReSerVation Protocol (RSVP), IETF, 1997, http://www.ietf.org/rfc/rfc2205.txt.
[RFC2211] RFC 2211, Specification of the Controlled-Load Network Element Service, IETF, 1997, http://www.ietf.org/rfc/rfc2211.txt.
[RFC2212] RFC 2212, Specification of Guaranteed Quality of Service, IETF, 1997, http://www.ietf.org/rfc/rfc2212.txt.
[RFC2236] RFC 2236, Internet Group Management Protocol, Version 2, IETF, 1997, http://www.ietf.org/rfc/rfc2236.txt.
[RFC2362] RFC 2362, Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification, IETF, 1998, http://www.ietf.org/rfc/rfc2362.txt.
[RFC2365] RFC 2365, Administratively Scoped IP Multicast, IETF, 1998, http://www.ietf.org/rfc/rfc2365.txt.
[RFC2373] RFC 2373, IP Version 6 Addressing Architecture, IETF, 1998, http://www.ietf.org/rfc/rfc2373.txt.
[RFC2460] RFC 2460, Internet Protocol, Version 6 (IPv6) Specification, IETF, 1998, http://www.ietf.org/rfc/rfc2460.txt.
[RFC2616] RFC 2616, Hypertext Transfer Protocol: HTTP/1.1, IETF, 1999, http://www.ietf.org/rfc/rfc2616.txt.
[RFC2750] RFC 2750, RSVP Extensions for Policy Control, IETF, 2000, http://www.ietf.org/rfc/rfc2750.txt.
[RFC2770] RFC 2770, GLOP Addressing in 233/8, IETF, 2000, http://www.ietf.org/rfc/rfc2770.txt.
[RFC2784] RFC 2784, Generic Routing Encapsulation (GRE), IETF, 2000, http://www.ietf.org/rfc/rfc2784.txt.
[RFC2909] RFC 2909, The Multicast Address-Set Claim (MASC) Protocol, IETF, 2000, http://www.ietf.org/rfc/rfc2909.txt.
[RFC3376] RFC 3376, Internet Group Management Protocol, Version 3, IETF, 2002, http://www.ietf.org/rfc/rfc3376.txt.
[RFC3550] RFC 3550, RTP: A Transport Protocol for Real-Time Applications, IETF, 2003, http://www.ietf.org/rfc/rfc3550.txt.
[SCO1991] T. Socolofsky, C. Kale, A TCP/IP Tutorial, IETF, Network Working Group, RFC 1180, January 1991, http://www.ietf.org/rfc/rfc1180.txt.
[SUN1990] Network Programming, Sun Microsystems, Inc., Mountain View, CA, March 1990.
[TAN1996] A.S. Tanenbaum, Computer Networks, 3rd edition, Prentice Hall, 1996.
[X680] Information Technology: Abstract Syntax Notation One (ASN.1): Specification of Basic Notation, ITU-T Recommendation X.680 (1997), ISO/IEC 8824-1, 1998.
3 A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues

Lucia Lo Bello
University of Catania

3.1 Introduction
3.2 Routing and Routers
3.3 Routing Algorithm Design Issues
    Optimality • Convergence • Scalability • Robustness • Flexibility and Stability • Simplicity
3.4 Classification of Routing Protocols
    Static or Dynamic • Global or Decentralized • Link State or Distance Vector • Single Path or Multipath • Flat and Hierarchical • Intra-AS and Inter-AS • Unicast and Multicast
3.5 IP Unicast Routing: Interior and Exterior Gateway Protocols
    Interior Gateway Protocols for IP Networks • Exterior Gateway Protocols for IP Internetworks
3.6 IP Multicast Routing
    Distance-Vector Multicast Routing Protocol • Multicast OSPF • Protocol-Independent Multicast • Core-Based Tree • Interdomain IP Multicast Routing
3.7 IP Addressing and Routing Issues
    Classful IP Addressing • Impact of IP Addressing on Routing Tables and Internet Scalability • Subnetting • Variable-Length Subnet Masks • Classless Interdomain Routing
3.8 IPv6 Overview
    IPv6 Addressing, Subnetting, and Routing • IPv6 Deployment in the Current Internet: State of the Art and Migration Issues
3.9 Conclusions
References
3.1 Introduction This chapter addresses routing from a broad perspective. After an introduction to routing algorithm principles, an overview of the routing protocols currently used in the Internet domain is presented. Both unicast and multicast routing are dealt with. The chapter then focuses on the close correlation between Internet Protocol (IP) routing and addressing.
The impact of the traditional classful addressing scheme on the size of routing tables for Internet routers and its poor scalability in today’s Internet is discussed. Then classless interdomain routing (CIDR), which, for the time being, has solved the problems previously mentioned, is presented and discussed. Finally, the next-generation IP, which represents the long-term solution to the problems of the current Internet, is introduced: IP version 6 (IPv6) is outlined, and issues in the IPv4-to-IPv6 transition are addressed.
3.2 Routing and Routers Two or more networks joined together form an internetwork, where network layer routing protocols implement path determination and packet switching. Path determination consists of choosing which path (or route) the packets are to follow from a source to a destination node, while packet switching refers to transporting them. Path determination is accomplished by routing algorithms that, given a set of routers and links connecting them, determine the best (i.e., least-cost) path from source to destination, according to a given cost metric. A router is a specialized network computing device, similar to a computer, but optimized for packet switching. It typically contains memory (ROM, RAM, Flash) and some kind of bus and is equipped with an operating system (OS), a configuration, and a user interface. As happens inside a computer, in a router a boot process loads bootstrap code from the ROM, thus enabling the device to load its operating system and configuration into memory. A significant difference between a router and a computer lies in the user interface and memory configuration. While DOS or UNIX systems typically have one physical bank of memory chips that will be allocated by the software to different functions, routers feature several distinct banks of memory, each dedicated to a different function. In many routers, the OSs are stripped-down schedulers derived from early versions of FreeBSD (Berkeley Software Distribution) UNIX. A growing interest in the Linux OS has recently appeared. Some vendors run proprietary OSs on their routers (for example, Cisco routers run the Internetwork Operating System (IOS), which embeds a broad set of functions). A router’s task is to switch IP packets between interconnected networks.
In order to allow the calculation of the best path for individual packets, routing protocols enable routers to communicate with each other, exchanging both topology information (e.g., about neighbors and routes) and state information (e.g., costs), which are fed into routing tables. A routing table consists of a list of routing entries indicating which outgoing link should be used to forward packets to a given destination. Figure 3.1 shows a simplified routing table. When a router receives an incoming packet, it checks the routing table to find a destination/next-hop association for the destination address specified in the packet. The routing table data structure contains all the information necessary to forward an IP data packet toward its destination. When forwarding an IP data packet, the routing table entry providing the best match for the packet’s IP destination is chosen.
Destination Network    Next Router       # of Hops to Destination    Interface
205.219.0.0            205.219.5.2       —                           Ethernet 0
151.5.0.0              160.4.2.5         5                           Ethernet 0
Default                193.55.114.128    2                           Ethernet 1

FIGURE 3.1 A simplified routing table.
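The best-match lookup described above can be sketched with longest-prefix matching. The prefix lengths for the entries of Figure 3.1 are assumed here for illustration, since the figure shows no masks:

```python
import ipaddress

# Routing table modeled on Figure 3.1; /16 masks are an assumption for
# illustration, and 0.0.0.0/0 represents the default route.
table = [
    (ipaddress.ip_network("205.219.0.0/16"), "205.219.5.2", "Ethernet 0"),
    (ipaddress.ip_network("151.5.0.0/16"), "160.4.2.5", "Ethernet 0"),
    (ipaddress.ip_network("0.0.0.0/0"), "193.55.114.128", "Ethernet 1"),
]

def lookup(destination: str):
    dest = ipaddress.ip_address(destination)
    candidates = [entry for entry in table if dest in entry[0]]
    # The most specific (longest-prefix) matching entry wins; the default
    # route (/0) therefore applies only when nothing else matches.
    return max(candidates, key=lambda e: e[0].prefixlen)
```

For example, a packet for 205.219.7.1 is forwarded via 205.219.5.2, while a packet for an unknown destination falls through to the default router on Ethernet 1.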
3.3 Routing Algorithm Design Issues While designing routing algorithms, several issues have to be addressed. The main ones are listed and discussed below.
3.3.1 Optimality Optimality is the ability to find the optimal (i.e., least-cost) path from source to destination according to a given metric. A single metric may be used, or multiple metrics may be combined into a single hybrid one. The most common routing metric is path length, usually expressed in terms of hop count, i.e., the number of hops that a packet must make on its path from a source to a destination. Alternatively, a network administrator may assign arbitrary costs (expressed as integer values) to each network link and calculate the path length as the sum of the costs associated with each link traversed. These costs account for several link features, such as:

• Bandwidth.
• Routing delay: The length of time required to move a packet from source to destination through the internetwork.
• Load: The degree of utilization of a router, obtained by monitoring variables such as CPU utilization or packets routed per second.
• Reliability: Accounts for the link’s fault probability or recovery time in the event of failure.
• Monetary cost: Companies may prefer to send packets over their own lines, even if slower, rather than through faster but more expensive external lines that charge for usage time.

Different routing protocols generally adopt different metrics and algorithms that are not compatible with each other. As a result, in a network where multiple routing protocols are present, a way to determine the best path across the multiple protocols has to be found. Each routing protocol is therefore labeled with an integer value that defines the trustworthiness of the protocol, called the administrative distance. When there are multiple routes to the same destination from two different routing protocols, routers will select the route supplied by the protocol with the lowest administrative distance.
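Route selection by administrative distance can be sketched as follows; the distance values are common vendor defaults (e.g., in Cisco IOS) and are assumptions for this illustration:

```python
# Common default administrative distances (assumed vendor defaults, shown
# for illustration): lower means more trustworthy.
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ospf": 110, "rip": 120}

def select_route(candidates):
    """candidates: list of (protocol, next_hop) tuples for one destination.

    The route supplied by the protocol with the lowest administrative
    distance is installed in the routing table.
    """
    return min(candidates, key=lambda c: ADMIN_DISTANCE[c[0]])
```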
3.3.2 Convergence When a router detects a topology change (e.g., new routes being added, existing routes changing state, etc.), this information must be propagated through the network and a new routing topology calculated. Routers achieve this by distributing routing update messages to the other routers, thus stimulating recalculation of optimal routes and eventually causing all routers to agree on these routes. The time taken to detect changes in the network topology, reconfigure the topology correctly, and reach agreement, called the convergence time, is a very important characteristic of routing algorithms. Slow convergence should be avoided, as it may entail network interruptions or routing loops. The latter occur when, due to slow convergence, a packet arriving at a router A or B and destined for a router C bounces back and forth between these two routers until either convergence is reached or the packet has been switched the maximum number of times allowed. Convergence time may depend either on the network topology and size (e.g., number of routers, link speeds, routing delays) or on the routing protocol used and the setting of the relevant timing parameters.
3.3.3 Scalability Routing algorithms that behave well in small systems should also scale well in larger internetworks. Unfortunately, some routing algorithms (such as those based on heavy flooding techniques), while performing well in small networks, are not suitable for use in large-size ones.
3.3.4 Robustness Routing algorithms should perform correctly even in the presence of unusual or unforeseen events (e.g., router failures, misbehavior, sabotage).
3.3.5 Flexibility and Stability When responding to network changes (e.g., in bandwidth, router queue size, network delay), routing algorithms should exhibit flexible, but stable behavior.
3.3.6 Simplicity
In order to reduce the overhead on routers (in terms of both processing and storage), routing algorithms should be as simple as possible. Moreover, they have to exploit system resources efficiently (especially when executing on resource-constrained hosts).
3.4 Classification of Routing Protocols
Routing protocols may be classified according to several different characteristics [Cisco03][Kenyon02][Kurose01]. Here we will address the most relevant. As will be seen, routing protocols may be static or dynamic, global or decentralized, link state or distance vector, single path or multipath, flat or hierarchical, intra-AS or inter-AS, unicast or multicast.
3.4.1 Static or Dynamic
Static algorithms are based on fixed tables. Static routes seldom change, and when changes do occur, it is usually as a result of human intervention (i.e., editing a router’s forwarding table). Static routing algorithms are simple, introduce a low overhead, and are suitable for environments where network traffic is stable and predictable. Static routing is commonly adopted where there is no need for an alternative path (for example, in permanent point-to-point wide area network (WAN) links to remote sites or dial-up Integrated Services Digital Network (ISDN) lines). Dynamic routing algorithms automatically generate the routing paths in response to network traffic or topology changes. When a change occurs, the routing algorithm running on a router recalculates routes, reflects the changes in the routing table, and then propagates updates throughout the network, thus stimulating recalculation in the other routers as well. Dynamic algorithms sometimes have static routes inserted in their routing tables. This is the case, for instance, with default routes, which identify the router to which all traffic should be forwarded when the destination address is unknown (i.e., not explicitly listed in the routing table). For example, the last entry in Figure 3.1 indicates the default router.
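A forwarding table with a static default entry, as in Figure 3.1, can be looked up roughly as follows. Python's `ipaddress` module stands in for a router's longest-prefix-match logic; the addresses are made up.

```python
import ipaddress

# Static forwarding table: (prefix, next_hop). 0.0.0.0/0 is the default
# route, used when no more specific entry matches.
table = [
    (ipaddress.ip_network("192.168.1.0/24"), "10.0.0.1"),
    (ipaddress.ip_network("192.168.0.0/16"), "10.0.0.2"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.0.254"),  # default router
]

def next_hop(dst):
    addr = ipaddress.ip_address(dst)
    # Longest-prefix match: among matching entries, pick the most specific.
    matches = [(net, nh) for net, nh in table if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("192.168.1.7"))  # matched by the most specific /24 entry
print(next_hop("8.8.8.8"))      # falls through to the default route
```

The /0 default prefix matches every address, so the lookup always succeeds; it is chosen only when nothing longer matches.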
3.4.2 Global or Decentralized
A global algorithm makes the routing decision on the basis of complete information about the network, in terms of connectivity and link costs. The calculation of the best path can be entrusted to a single site or replicated over multiple ones. In a decentralized routing algorithm, no site has complete knowledge of the network, and route calculation is iterative and distributed. Each node only knows the status of the links directly connected to it. This information is then distributed to its neighbors, i.e., the nodes directly connected to it, and this iterative process of route calculations and exchanges enables a node to determine the least-cost path to a destination.
A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues
3-5
3.4.3 Link State or Distance Vector
Link-state algorithms (also called shortest-path-first algorithms) compute the least-cost path using complete, global knowledge of the network in terms of connectivity and link costs. Each router maintains a complete copy of the topology database in its routing table and floods routing information to all the nodes in the internetwork. At the beginning, each router will only know about its neighbors, but it will increase its knowledge through link-state broadcasts received from all the other routers. The router does not send the entire routing table, but only the portion of the table that describes the state of its own links. Distance-vector algorithms (also known as minimum-hop or Bellman–Ford algorithms) require each router to keep track of its distance (hop count) from all other possible destinations. Each node receives information from its directly connected neighbors, calculates the routing table, and then distributes the results back to the neighbors. When a change is detected in the link cost from a node to a neighbor, the router first updates its distance table and then, if the change also affects the cost of the least-cost path, notifies its neighbors. Distance-vector algorithms are distributed and iterative, as the routing table distribution process goes on until no more exchanges with neighbors occur. Routers can identify new destinations as they come into the network, learn of failures in the network, and calculate distances to all known destinations. Each router advertises on a regular basis all the destinations it is aware of with the relevant distances and sends update messages containing all the information maintained in the routing table to neighboring routers on directly connected segments. Each router can therefore build a complete picture of the distances to all known destinations by analyzing the routing updates received from its neighbors.
The best route for each destination is determined according to a minimum-distance (minimum-hop) rule. Table 3.1 compares link-state and distance-vector routing algorithms. As all the routers share the same knowledge of the network in link-state algorithms, they have a consistent view of the best path to a given destination. This can entail a sudden change in the load on the least-cost link, and even congestion, if all the routers decide to send their packets through that link at the same time.

TABLE 3.1  Link-State vs. Distance-Vector Routing Algorithms

Type
  Link state: Global.
  Distance vector: Decentralized.
Route advertising
  Link state: “Tell the world about the neighbors”; i.e., a router does not send the entire routing table, but only the portion of the table that describes the state of its own links.
  Distance vector: “Tell all neighbors about the world”; i.e., routers tend to distribute the entire routing table (or large portions of it) to their directly attached neighbors only.
Robustness
  Link state: More robust, as each router autonomously calculates its routing table.
  Distance vector: An incorrect node calculation can be spread over the entire network.
Convergence
  Link state: Fast.
  Distance vector: Slow (routing loops may occur).
Scalability
  Link state: Good.
  Distance vector: Poor.
Responsiveness to network changes
  Link state: High.
  Distance vector: “Good news propagates fast, bad news propagates slowly”; i.e., a decrease in the cost of the best path propagates fast, while an increase propagates slowly.
Message overhead
  Link state: High: any change in a link cost entails the need to send all nodes the new cost.
  Distance vector: Low: when link costs change, the results of the change will be propagated only if the change entails a new least-cost path for one of the nodes attached to that link.
Implementation complexity
  Link state: High.
  Distance vector: Low.
Processor and memory requirements
  Link state: High.
  Distance vector: Low.
Stability
  Link state: Problematic, due to oscillation in routes.
  Distance vector: Good.
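The distance-vector exchange summarized in Table 3.1 amounts to a distributed Bellman–Ford relaxation. A centralized sketch of the idea, with a hypothetical three-node chain topology and a hop-count metric:

```python
INF = 16  # RIP-style "infinity"

# Directly connected neighbors of each node (hypothetical chain A-B-C).
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

# dist[x][d] = x's current estimate of its distance to destination d.
dist = {x: {d: (0 if d == x else INF) for d in neighbors} for x in neighbors}

def relax_round():
    # Each node learns its neighbors' vectors; it adopts a route when
    # going via the advertising neighbor is cheaper (cost 1 per hop).
    changed = False
    for x in neighbors:
        for n in neighbors[x]:
            for d, c in dist[n].items():
                if c + 1 < dist[x][d]:
                    dist[x][d] = c + 1
                    changed = True
    return changed

while relax_round():  # iterate until no more estimates change
    pass
print(dist["A"]["C"])  # 2: A reaches C via B
```

In the real protocol each node runs only its own row of this computation and learns the neighbors' rows from their periodic updates, which is why convergence takes several exchange rounds.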
A way to avoid such oscillations would be to ensure that the routers do not all run the algorithm at the same time. However, it has been noted that routers on the Internet can self-synchronize: even if they initially execute the routing algorithm at the same rate but at different times, the execution instants will eventually become synchronized across the routers [Floyd97]. To deal with this problem, randomization is introduced into the period between the execution instants of the algorithm at each router.
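The randomization mentioned above is typically just jitter added to the update period. A sketch follows; the 30-second base period echoes RIP, while the ±15% jitter range is illustrative:

```python
import random

BASE_PERIOD = 30.0  # seconds between routing updates (RIP-like)

def next_update_delay():
    # Random jitter around the base period: routers that drift into
    # lock-step de-synchronize instead of all flooding updates at once.
    return BASE_PERIOD * random.uniform(0.85, 1.15)

delays = [next_update_delay() for _ in range(5)]
print(all(25.5 <= d <= 34.5 for d in delays))  # True
```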
3.4.4 Single Path or Multipath
Routing protocols may be single path or multipath. The difference lies in the fact that multipath algorithms support multiple entries for the same destination in the routing table, while single-path ones do not. The presence of alternative routes in multipath routing protocols allows traffic to be multiplexed over several circuits (LAN (local area network) or WAN), thus providing not only greater throughput and topological robustness, but also support for load balancing (i.e., splitting traffic between paths that have equal costs). In multipath algorithms, multiplexing may be packet based or session based. In the former case, a round-robin technique is typically used. In the latter case, load sharing is performed on a session basis, typically using a source–destination or destination hash function.
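Session-based load sharing over equal-cost paths can be sketched with a source–destination hash; the hash choice (CRC32) and addresses are illustrative:

```python
import zlib

paths = ["link-1", "link-2"]  # two equal-cost routes to the destination

def pick_path(src, dst):
    # Hash the source-destination pair so that all packets of one session
    # follow the same path (preserving packet order), while different
    # sessions spread across the available links.
    key = f"{src}->{dst}".encode()
    return paths[zlib.crc32(key) % len(paths)]

# The same pair always maps to the same link:
print(pick_path("10.0.0.1", "10.0.9.9") == pick_path("10.0.0.1", "10.0.9.9"))
```

Packet-based round-robin would achieve a more even split but may reorder packets within a session, which is why session-based hashing is often preferred.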
3.4.5 Flat and Hierarchical
Another distinction can be made between flat and hierarchical routing algorithms. In a flat routing algorithm, all routers are peers. Each router is indistinguishable from another, as they all execute the same algorithm to compute routing paths through the entire network. This flat model suffers from two main problems: lack of scalability and poor administrative autonomy. The first derives from the growing computational, storage, and communication overhead the algorithm introduces when the number of routers becomes large. The second arises from the need to hide some features of a company’s internal network from the outside, a crucial requirement for enterprise networks. In hierarchical routing, logical groups of nodes, called domains or autonomous systems (ASs), are defined. A routing domain is a collection of routers that coordinate their routing knowledge using a single routing protocol. An AS is a routing domain that is administered by one authority (person or group). Each AS requires a registered AS number (ASN) to connect to the Internet. According to this hierarchy, some routers in an AS can communicate with routers in other ASs, while others can communicate only with routers within their own AS. In very large networks, additional hierarchical levels may exist, with routers at the highest hierarchical level forming a routing backbone (as shown in Figure 3.2). Packets from nonbackbone routers are conveyed along the backbone by backbone routers until a backbone router connected to the destination AS is found. Then packets are sent through one or more nonbackbone routers within the AS until the ultimate destination is reached. Compared to flat routing, a drawback of hierarchical routing is that suboptimal paths may sometimes be found. Nevertheless, hierarchical routing offers several advantages over flat routing.
First, the amount of information maintained and exchanged by routers is reduced, and this increases the speed of route calculation, thus allowing faster convergence. Second, unlike flat routing, where a single router problem can affect all routers in the network, in a hierarchical algorithm the scope of router misbehavior is limited. This increases overall network availability. In addition, the existence of boundary interfaces between different levels in the hierarchy can be exploited to enforce security policies (e.g., access control lists or firewalls) on border routers. Hierarchical routing also improves scalability and protocol upgrades, thus making the task of the network manager easier. Thanks to the above-mentioned advantages, large companies generally adopt hierarchical routing.
3.4.6 Intra-AS and Inter-AS
A very large internetwork in the IP domain is typically organized as a collection of ASs. An AS can be composed of one or more areas, made up of contiguous nodes and networks that may be further split
FIGURE 3.2 A routing architecture with three ASs connected through backbone routers.
into subnetworks. Within an area, network devices equipped with the capability to forward packets between subnetworks are called intermediate systems (ISs). ISs may be further classified into those that can communicate within routing areas only (intra-area ISs) and those that can communicate both within and between routing areas (inter-area ISs). Autonomous system border routers (ASBRs) on the backbone are entrusted with routing traffic between different ASs, while area border routers (ABRs) deal with traffic between different areas within the same AS. The routing protocol used within an AS (an intra-AS routing protocol) is commonly referred to as an Interior Gateway Protocol (IGP). A separate protocol, called an Exterior Gateway Protocol (EGP), is used to interface among the ASs. EGPs are usually referred to as inter-AS routing protocols. All routers within each AS will run one or more IGPs. Routing information between ASs is exchanged through the routing backbone via an EGP. The use of an EGP limits the amount of routing information exchanged among the three ASs and allows them to be managed differently. The main differences between intra-AS routing protocols (IGPs) and inter-AS routing protocols (EGPs) can be summarized in terms of policy, scalability, and performance. When dealing with inter-AS routing, enforcing policy is crucial. For example, traffic originating from a given AS might be required not to pass through another specific AS. On the other hand, policy is less critical within an AS, where everything is under the control of a single administrative entity. Scalability represents a more critical requirement in inter-AS routing, where a large number of internetworks may be involved, than in intra-AS routing. Conversely, performance is more important within an AS than in inter-AS routing, where it is of secondary importance compared to policy.
For instance, an EGP may prefer a more costly path to another one if the former complies with certain policy criteria that the second does not fulfill.
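This policy-over-cost selection can be sketched as a filter applied before the metric comparison; the AS numbers, path format, and costs below are hypothetical:

```python
# Sketch of inter-AS policy overriding cost. AS numbers are hypothetical.
BANNED_TRANSIT_AS = {65013}  # policy: never route through this AS

routes = [
    {"as_path": [65001, 65013, 65020], "cost": 5},        # cheaper, violates policy
    {"as_path": [65001, 65007, 65008, 65020], "cost": 9}, # compliant
]

def select(routes):
    # Policy first: discard paths crossing a banned AS, then pick the
    # cheapest among what remains.
    allowed = [r for r in routes if not BANNED_TRANSIT_AS & set(r["as_path"])]
    return min(allowed, key=lambda r: r["cost"])

print(select(routes)["cost"])  # 9: the costlier but policy-compliant path
```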
3.4.7 Unicast and Multicast
Routing protocols involving just one sender and one receiver are called unicast protocols. However, several applications require addressing packets from one source to multiple destinations. This is the case, for instance, with applications that distribute identical information to multiple users. In this case, multicast routing is required. Multicast routing enables sending a packet from one source to multiple recipients with a single operation. Multicast addresses are present in IP. A multicast address is a single identifier for a group of receivers, which are thus members of a multicast group. To deploy multicast routing, two approaches are possible. In the first one, there is no explicit multicast support at the network layer, but an emulation using multiple
point-to-point unicast connections. This means that each application-level data unit is passed to the transport layer and here duplicated and transmitted over individual unicast network layer connections. In the second option, explicit multicast support is provided: a single packet is transmitted by the source and then replicated at a network router, i.e., forwarded over multiple outgoing links, to reach the destinations. The advantage of the second approach is that there is a more efficient use of bandwidth, as only one copy of a packet will cross a link. However, this approach does have a cost. In the Internet, for instance, multicast is not connectionless, as routers on a multicast connection have to maintain state information. This entails a combination of routing and signaling in order to establish, maintain, and tear down connection state in the routers. Compared to unicast routing, where the focus is on the destination of a packet, multicast routing is backward oriented: multicast routing packets are transmitted from a source to multiple destinations through a spanning tree. Multicast IP routing and relevant protocols will be addressed in Section 3.6. Unicast IP routing is addressed in Section 3.5.
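The bandwidth argument above can be made concrete by counting per-link copies; the delivery tree and receiver count are hypothetical:

```python
# Hypothetical delivery tree: the source's access link is shared by all
# traffic toward a router that fans out to the receivers.
receivers = 3

# Unicast emulation: the transport layer sends one copy per receiver, so
# the shared source-to-router link carries one copy per receiver.
unicast_copies_on_first_link = receivers

# Network-layer multicast: the source transmits once and the router
# replicates the packet onto its outgoing links, so the shared link
# carries a single copy regardless of the number of receivers.
multicast_copies_on_first_link = 1

print(unicast_copies_on_first_link, multicast_copies_on_first_link)  # 3 1
```

The saving grows with group size, at the price of the per-connection state that routers must keep, as noted above.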
3.5 IP Unicast Routing: Interior and Exterior Gateway Protocols
With reference to unicast IP routing, the IGPs that support IP include [Lewis99]:
• The Routing Information Protocol (RIPv1 and RIPv2)
• The Cisco Interior Gateway Routing Protocol (IGRP)
• The Cisco Enhanced Interior Gateway Routing Protocol (EIGRP)
• The Open Shortest-Path-First Protocol (OSPF)
• The Intermediate System-to-Intermediate System Protocol (IS-IS)
EGPs in the IP domain include:
• The Exterior Gateway Protocol (EGP)
• The Border Gateway Protocol (BGP)
In the following, the different protocols in each class will be described and compared.
3.5.1 Interior Gateway Protocols for IP Networks
3.5.1.1 Distance-Vector IGPs
This section describes and compares two popular distance-vector protocols supporting IP: RIP and IGRP.
3.5.1.1.1 Routing Information Protocol
The Routing Information Protocol (RIP) was one of the first IGPs and has been used for routing computations in computer networks since the early days of the ARPANET. Formally defined in the XNS (Xerox Network Systems) Internet Transport Protocols publications (1981), its widespread use was favored by its inclusion, as the routed process, in the Berkeley Software Distribution (BSD) version of UNIX supporting Transmission Control Protocol (TCP)/IP (1982). Two RIP versions exist: version 1 [Hedri88] and version 2 [Malkin97]. In RIP, each router sends a complete copy of its entire routing table to all its neighbors on a regular basis (the typical RIP update timer is 30 seconds). A single RIP routing update contains up to 25 route entries within the AS. Each entry contains the destination address of a host or network, the IP address of the next hop, the distance to the destination (in hops), and the interface. To obtain the cost to a given destination, a router can also send RIP request messages. After receiving an update, a router compares the new information with the information it already possesses. If the routing update includes a new destination network, it is added to the routing table. If the router receives a route to an existing destination with a lower metric, it replaces the current entry with the new one. If an entry in the update message has the same next hop as the current route entry, but a different metric, the new metric will be used to update the routing table.
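The update-processing rules just listed can be sketched as follows; the table and update formats are simplified illustrations, not RIP's wire format:

```python
# Simplified routing table: destination -> {"next_hop", "metric"}.
table = {"10.1.0.0": {"next_hop": "R2", "metric": 3}}

def process_update(advertiser, entries):
    # entries: (destination, metric as advertised by the neighbor)
    for dest, metric in entries:
        new_metric = min(metric + 1, 16)  # one extra hop; 16 means unreachable
        cur = table.get(dest)
        if cur is None:
            # Rule 1: a new destination network is simply added.
            table[dest] = {"next_hop": advertiser, "metric": new_metric}
        elif new_metric < cur["metric"]:
            # Rule 2: a route with a lower metric replaces the current entry.
            table[dest] = {"next_hop": advertiser, "metric": new_metric}
        elif cur["next_hop"] == advertiser:
            # Rule 3: the current next hop re-advertises with a different
            # metric: believe it, even if the metric got worse.
            cur["metric"] = new_metric

process_update("R3", [("10.2.0.0", 1), ("10.1.0.0", 1)])
print(table["10.2.0.0"])  # {'next_hop': 'R3', 'metric': 2}  (added)
print(table["10.1.0.0"])  # {'next_hop': 'R3', 'metric': 2}  (replaced: 2 < 3)
```

Rule 3 is what lets a worsening route propagate at all: without it, a router would cling to the stale, better-looking metric its current next hop once advertised.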
FIGURE 3.3 A simplified internetwork used to explain split horizon with poison reverse.
If a router does not hear from a neighbor for a given time interval, called a dead interval (an invalid timer is used for this purpose), it assumes that the neighbor is no longer available (either down or unreachable). As a result, the router modifies its local routing table and then notifies its neighbors of the unavailable route. After another predefined time interval (a flush timer is set for this purpose), if nothing is heard from the route, the information is flushed from the router’s routing table. Routers allow for configuration of an active or passive RIP mode on specific interfaces. The active mode means full routing capability, while passive means a listen-only mode; that is, no RIP updates are sent out. To speed up convergence, RIP uses triggered updates. That is, whenever a RIP router learns of a change, such as a link becoming unavailable, it sends out a triggered update immediately, rather than waiting for the next announcement interval. As it takes time for triggered updates to get to all the other routers in large networks, a gateway that has not yet received the triggered update may issue a regular update at the wrong time, thus causing a bad route to be reinserted in a neighbor that has already received the triggered update. In order to prevent new routes from reinstating an old link, a hold-down period is enforced in the protocol: when a route is removed, no update for that route will be accepted for a given period, until the topology becomes stable. Hold-downs have the drawback of slowing convergence. Besides hold-down, RIP implements other techniques to avoid routing loops between adjacent routers, called split horizon with poison reverse. The general split-horizon algorithm prevents routes from being propagated back to the source, i.e., down the interface from which they were learned. As an example, consider the case in Figure 3.3. During normal operations, router A will notify router B that it has a route to network 1. 
According to the split-horizon algorithm, when B sends updates to A, it will not mention network 1. Now let us assume that the router A interface to network 1 goes down. Without split horizon, router B would inform router A that it can get to network 1. Since it no longer has a valid route, router A might select that route. In this case, A and B would both have routes to 1. But this would result in a circular route, where A points to B and B points to A. Using split horizon with poison reverse, instead of not advertising routes to the source, routes are advertised back to the source with a cost of infinity (i.e., 16), which will make the source router ignore the route. On the whole, RIP is quite robust and very easy to set up. It is the only routing protocol that UNIX nodes universally understand and is therefore commonly used in UNIX environments. RIP is also commonly used in end-system routing as a dynamic router discovery protocol. Dynamic router discovery is an alternative to static configurations, which allows hosts to dynamically locate routers when they have to access devices external to the local network. RIP was designed to work with moderate-size networks using reasonably homogeneous technology. This makes it suitable for local networks featuring a small number of routers (about a dozen) and links with equal characteristics. RIP has a poor degree of scalability, so it is not recommended for use in more complex environments. RIP cannot be used to build a backbone larger than 15 hops in diameter,* as it specifies a maximum hop count of 15. A number of hops equal to 16 corresponds to infinity. This is useful to prevent packets that get stuck in routing loops from being constantly switched back and forth between routers. As it uses a simple, fixed metric to compare alternative routes, RIP can generate suboptimal routing tables, resulting in packets sent over slow (or costly) links even in the presence of better choices. RIP is therefore not suitable for environments where routes need to be chosen based on dynamically varying parameters such as measured delay, reliability, or load (RIP does not support load balancing). RIP version 1 (RIPv1) lacks variable-length subnet mask (VLSM) [Brade87] support, so RIPv1 is classful. As will be explained in Section 3.7, this may seriously deplete the available address space, to the detriment of scalability. On the other hand, RIP version 2 (RIPv2) is classless. VLSM, classful, and classless concepts will be addressed in detail in Section 3.7. RIPv1 uses a broadcast mode to advertise and request routes, while RIPv2 has the ability to send routing updates via multicast addresses. RIPv2 also supports route aggregation techniques, i.e., the use of a single network prefix to advertise multiple networks [Chen99]. Moreover, RIPv2 provides some security support, implementing authentication on a per-message basis.* RIPv2 also offers some support for ASs and IGP/EGP interaction by means of external route tags. External routes are learned from neighbors situated outside the AS, while internal routes lie completely within a given AS. In RIPv2 a route tag field is used to separate internal RIP routes from external routes imported from an EGP. Finally, RIPv2 scales better than RIPv1, but, compared with link-state protocols, it suffers from slow convergence and scalability limitations.

*RIP is usually configured in such a way that a cost of 1 is used for the outbound link. If a network administrator chooses to use larger costs, the upper bound of 15 can be quickly reached and will become a problem.
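The route aggregation supported by RIPv2 — one prefix advertising several networks — can be illustrated with Python's `ipaddress` module; the networks are made up:

```python
import ipaddress

# Four contiguous /24 networks learned individually...
nets = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]

# ...can be advertised as a single aggregate prefix, shrinking the
# routing update from four entries to one.
aggregate = list(ipaddress.collapse_addresses(nets))
print(aggregate)  # [IPv4Network('192.168.0.0/22')]
```

Aggregation only works when the networks are contiguous and align on a power-of-two boundary, which is one reason address assignment plans matter (see Section 3.7).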
3.5.1.1.2 Interior Gateway Routing Protocol
The Interior Gateway Routing Protocol (IGRP) [Hedri91] was created by Cisco in the 1980s. Not an open standard, it only runs on Cisco routers. IGRP shares several features with RIP. IGRP sends out updates on a regular basis (every 90 seconds) and uses update, invalid, and flush timers as well (with different values from RIP, depending on the implementation). Like RIP, IGRP uses triggered updates to speed up convergence and hold-down timers to enforce stability. However, while RIP only allows a network diameter of 15 hops, IGRP can support a network diameter of up to 255 hops, so it can be used in larger networks. IGRP differs from RIP regarding metrics, parallel route support, the reverse poisoning algorithm, and the use of a default gateway. Like RIPv1, IGRP does not support VLSM. Unlike RIP, IGRP does not use a single metric, but a vector of metrics that takes into account the topological delay time, the bandwidth of the narrowest bandwidth segment of the path, channel occupancy, and the reliability of the path. A single composite metric can be computed from this vector [Hedri91], which encompasses the effect of the various components into a single number representing how good a path is. The path featuring the smallest value for the composite metric will be the best path. The hop count and MTU (maximum transmission unit, i.e., the maximum packet size that can be sent along the entire path without fragmentation) of each network are also considered in the best path calculation. With IGRP, network administrators are supplied with a large range of metrics and they are also allowed to customize them. By giving higher or lower weight to specific metrics, network administrators can therefore influence IGRP’s automatic route selection.
For example, suitable constants can be set to several different values to provide different types of service (interactive traffic, for instance, would typically give a higher weight to delay, whereas file transfer would assign a higher weight to bandwidth). IGRP is more accurate than RIP in calculating the best path, as a vector of metrics instead of a single metric improves the description of the status of a network. If, for instance, a single metric is used, several consecutive fast links will appear to be equivalent to a single slow link. While this would be appropriate for delay-sensitive traffic, it would not be so for bulk data transfer, which is more sensitive to bandwidth. This problem may arise with RIP, but not with IGRP, as it considers delay and bandwidth separately. An interesting feature of IGRP is that it provides multipath routing and can perform load balancing, by splitting traffic equally between paths that have equal values for the composite metrics. To deal with the case of multiple paths featuring unequal values for the composite metrics, a variance parameter is defined as a multiplier of the best metrics. Only routes with metrics within a given variance of the best route can be used as multiple paths. In this case, traffic is distributed among multiple paths in inverse proportion to the composite metrics. The variance can be fixed by the network administrator. The default value is 1, which means that when the metric values of two paths are not equal, only the best route will be used. A higher variance value means that any path with a metric up to that value will be considered for routing purposes. However, variance values other than 1 should be carefully managed, as they may cause routing loops.* Unlike RIP, which can only prevent routing loops between adjacent routers, the route poisoning algorithm used in IGRP prevents large routing loops between nonadjacent routers. This is achieved by poisoning routes for which the metrics increase by a factor of 10% or more after an update. The rationale behind this is that routing loops generate continually increasing metrics. It should be noted that if this poisoning rule is used, valid routes may be erroneously deleted from routing tables. However, if the routes are valid, they will be reinstalled by the next update. The advantage of the IGRP poisoning algorithm is that it safely allows a zero hold-down value, which significantly improves network convergence time. IGRP handles default routes differently from RIP.

*A whole RIP entry, denoted by a special address family identifier (AFI) value of 0xFFFF, is used for authentication purposes, thus reducing the total number of routing entries per advertisement to 24.
Instead of a single dummy entry for the default route, IGRP allows real networks to be flagged as default candidates. IGRP periodically analyzes all the candidate default routes to choose the one with the lowest metric, which will be the actual default route. This approach is more flexible than the one adopted by typical RIP implementations. The default route can change in response to changes within a network.
3.5.1.2 A Hybrid Protocol: The Enhanced Interior Gateway Routing Protocol
EIGRP was introduced by Cisco in the early 1990s as an evolution of IGRP [Cisco02a]. As it offers support for multiple network layer protocols (e.g., IP, AppleTalk, and Novell NetWare), EIGRP is commonly used in mixed networking environments. EIGRP is called a hybrid protocol as it combines a distance-vector routing protocol with the use of a diffusing update algorithm (DUAL) [Garcia93], which has some of the features of link-state routing algorithms. The advantages of this combination are fast convergence and lower bandwidth consumption. DUAL, based on distance information provided by route advertisements from neighbors, finds all loop-free paths to any given destination. Among all the loop-free paths to the destination, the neighbor with the best path is selected as the successor, while the others are selected as feasible successors, i.e., eligible routes to be used if the primary route becomes unavailable. If, after a failure, one feasible successor is found, DUAL promotes it to primary route without performing recalculation, thus reducing the overhead on routers and transmission facilities. If, on the other hand, no feasible successors exist, a recomputation (called a diffusing computation) is performed to select a new successor. Unnecessary recomputations affect convergence, so they should be avoided. EIGRP uses the same composite metrics as IGRP.
However, it does not send periodic updates, but instead implements neighbor discovery/recovery, based on a hello mechanism, to assess neighbor reachability. Routing updates are only sent in the event of topology changes or a failure in a router or link. Moreover, when the metric for a route changes, only partial updates are sent, and they only reach the routers that really need the update. As a result, EIGRP requires less bandwidth than IGRP. EIGRP relies on the Reliable Transport Protocol (RTP) [Schulz03] to achieve guaranteed and ordered delivery of EIGRP packets to all neighbors. It supports VLSM, thus providing more flexibility in internetwork design than RIPv1 or IGRP. This can be particularly useful when dealing with a limited address space (as will be explained in Section 3.7). EIGRP has a modular architecture that makes it possible to add support for new protocols to an existing network. On the whole, EIGRP is robust and easy to configure and use. EIGRP supports both internal and external routes, allowing tags to identify the source of external routes. This feature can be exploited by network administrators to develop their own interdomain routing policies [Pepel00][Cisco97].

*It should also be pointed out that the load balancing performed by IGRP may produce out-of-sequence packets. This should be taken into account when neither the data link nor transport layer protocols have the capability of handling out-of-sequence packets.

3.5.1.3 Link-State Protocols
3.5.1.3.1 OSPF: The Open Shortest-Path-First Protocol
OSPF is a link-state IGP designed by the Internet Engineering Task Force (IETF) in the late 1980s to overcome the limitations of RIP for large networks. The name of this protocol derives from the fact that it is an open standard and that it uses the shortest-path-first (also called Dijkstra [Dijks59]) algorithm [Perlm92][Moy89]. It was specifically designed for the TCP/IP environment and runs directly over IP. OSPF is commonly used in medium to large IP networks and is implemented by all major router manufacturers. Several specifications have appeared since the first one [Moy89]. The RFC for OSPF version 2 (OSPFv2) is in [Moy88], while OSPF specifications to support IPv6 are given in [Coltun99]. One of the most appealing features of OSPF is its support for hierarchical routing design within ASs (although it is purely an IGP). An AS running OSPF can therefore be configured into areas that are interconnected by one backbone area (as shown in Figure 3.4). There are two types of routing within OSPF: intra-area and inter-area routing. In OSPF, each router monitors the state of its attached interfaces. If a topology change occurs, the router that has detected it distributes information about the change to all the other routers within its area, through broadcast messages called link-state advertisements (LSAs).
LSAs include metrics, interface addresses, and other data. A router uses this information to build a topological database in the form of a directed graph. The costs associated with the various edges (i.e., links) are expressed by a single dimensionless metric configured by the network administrator. A topological database is therefore present in each router. When two routers have identical topological databases, they become adjacent. This is important, as routing information can only be exchanged between adjacent routers. Locally, the router applies a shortest-path tree algorithm to all the networks, taking itself as the root node. As said above, each router only sends LSAs to the routers in the same area. This prevents intra-area routing from spreading outside the area. Each router within an area will know routes to any destination within the area and to the backbone.

FIGURE 3.4 OSPF routing hierarchy.
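The shortest-path-first computation that each OSPF router runs over its link-state database can be sketched as follows. This is a minimal illustration of Dijkstra's algorithm; the router names and link costs are hypothetical, loosely echoing the labels in Figure 3.4, and are not taken from any real configuration.

```python
import heapq

def shortest_path_tree(graph, root):
    """Dijkstra's shortest-path-first over a link-state database.

    graph: {router: {neighbor: cost}}.
    Returns (dist, prev): least cost and predecessor for each router.
    """
    dist = {root: 0}
    prev = {}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

# Hypothetical area topology with administrator-assigned link costs
lsdb = {
    "IR1": {"ABR2": 1, "IR4": 4},
    "ABR2": {"IR1": 1, "IR4": 2, "ABR3": 5},
    "IR4": {"IR1": 4, "ABR2": 2, "ABR3": 1},
    "ABR3": {"ABR2": 5, "IR4": 1},
}
dist, prev = shortest_path_tree(lsdb, "IR1")
# From IR1, the least-cost path to ABR3 goes IR1 -> ABR2 -> IR4 -> ABR3 (cost 4)
```

Following the `prev` pointers from any destination back to the root reconstructs the branch of the shortest-path tree, which is exactly the tree the router installs for forwarding.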
© 2005 by CRC Press
A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues
The internal topology of any area is hidden to every other area. Each router within a given area will know how to reach both every other router within its area and the backbone, but it does not have any idea of the number of routers existing in any other area or the way they are interconnected. Inter-area routing is performed through the backbone (which must be configured as area 0). It should be pointed out that an OSPF backbone is a routing area and not a physical backbone network, as the name might suggest.
Four different types of routers can be configured in OSPF: internal routers (IRs), area border routers (ABRs), backbone routers (BRs), and autonomous system border routers (ASBRs). All of them are depicted in Figure 3.4. IRs are only entrusted with intra-area routing. All their interfaces are connected within an area. ABRs route packets between different areas within the AS. They interface to multiple areas (including the backbone), so they belong to both an area and the backbone and maintain topological information about both. BRs belong to the backbone (area 0), but are not ABRs; that is, their interfaces are connected only to the backbone (e.g., BR-10). As the backbone operates as an area itself, BRs maintain area routing information. ASBRs, also called boundary routers, are responsible for exchanging routing information with routers belonging to other ASs (e.g., ASBR-11 in Figure 3.4). They have an external interface to another AS and learn external routes from dynamic EGPs, such as BGP-4 (presented in Section 3.5), or static routes. External routes are passed transparently throughout the AS and kept separate from the OSPF link-state data. External routes can also be tagged by the advertising routers. A packet to be sent to another area in the AS (inter-area routing) is first routed to an ABR in the source area (intra-area routing) and then routed via the backbone to the ABR that belongs to the destination area.
Finally, it will be routed to its ultimate destination. Route calculation introduces high complexity, and as an OSPF router stores all link states for all the areas, the overhead is high in a large internetwork. On the other hand, OSPF scales well and converges quickly. It also provides several appealing features, such as:
• Security: All the exchanges between OSPF routers (e.g., LSAs) are authenticated, thus preventing intruders from manipulating routing information.
• Type-of-service (TOS) support: Each link may feature different costs according to the traffic TOS requirements.
• Load balancing between multiple equal-cost paths.
• Triggered updates.
• Designated routers: On a LAN to which several routers are connected, one is elected as the designated router and another as its backup. The designated router is primarily responsible for generating LSAs for the LAN to all other networks in the OSPF area.
• Explicit support for VLSM and the ability to have discontinuous subnets (i.e., made up of single networks or sets of networks featuring noncontiguous addressing). This feature is very useful on the Internet (as will be discussed in Section 3.7).
• Tagged external routes: OSPF is capable of receiving routes from and sending routes to different ASs.
• Route summarization: An area border router can aggregate two subnets belonging to the same area so that only one entry in the routing table of another router is needed to reach both subnets. This minimizes the size of topological databases in the routers, thus significantly reducing protocol traffic.

3.5.1.3.2 Integrated IS-IS
The Intermediate System-to-Intermediate System Protocol (IS-IS) [Callon90] is a link-state protocol. It is an open standard, and its name indicates that it is used by routers (intermediate systems) to talk to each other. Developed for the OSI world, IS-IS was later made integrated so that it can route both OSI and IP traffic simultaneously.
Similar to OSPF, integrated IS-IS uses the SPF algorithm and is based on LSAs sent to all routers within a given area and hello packets to check the current state of a router. As it is a link-state protocol, it converges fast, but at the expense of high complexity. Integrated IS-IS supports VLSM, load sharing,
and triggered updates. Still not widely deployed at present (it is confined to telco and government networks), integrated IS-IS has good potential for medium to large IP networks.
3.5.2 Exterior Gateway Protocols for IP Internetworks
EGPs are responsible for routing between ASs. Possible alternatives in the IP world are static routes, the Exterior Gateway Protocol (EGP) [Rosen82], and the Border Gateway Protocol (BGP) [Rekhr95].

3.5.2.1 Static Routes
Static routes can be applied to both intra-AS and inter-AS routing. However, as they offer particularly appealing features for inter-AS routing, they are included in this subsection. Configuration of static routes is simple, and it is very easy to enforce policy (as no route means no access). With static routes, no routing protocol messages travel over the links between ASs. On the other hand, maintenance of static routes in large internetworks may be complex, as they do not scale. For this reason, many network designers adopt the Dynamic Host Configuration Protocol (DHCP) [Droms97], which dynamically allocates a default gateway from a set of candidate gateways. Static routes also lack flexibility (there is no way to choose a better path that a dynamic routing protocol could have selected), so they are not suitable for changing environments: they do not respond to topological changes. To provide fault tolerance with static routes, a secondary gateway is usually maintained, which can take over from the primary one if the latter becomes unreachable or goes down.

3.5.2.2 Exterior Gateway Protocol
The Exterior Gateway Protocol [Rosen82] was the first EGP to be developed. It runs directly over IP as a best-effort service. The routing information of EGP is similar to that of distance-vector protocols, but it does not use metrics. EGP suffers from some design limitations, as it supports neither routing loop detection nor multiple paths: if more than one path to a destination exists, packets can easily get stuck in routing loops. EGP has since been declared obsolete and replaced by the Border Gateway Protocol (BGP).
3.5.2.3 Border Gateway Protocol
Border Gateway Protocol version 4 (BGP-4), originally specified in RFC 1771 [Rekhr95], is a very robust and scalable routing protocol that is becoming a de facto standard for inter-AS routing on the current Internet. BGP-4 is an open standard. It somewhat resembles distance-vector protocols, as it is a distributed protocol where information exchange occurs only between directly connected routers. However, it is more appropriate to define BGP-4 as a path-vector protocol. This is because, as stated in [Rekhr95], "the primary function of a BGP speaking system is to exchange network reachability information with other BGP systems." Network reachability information includes only the list of autonomous systems (ASs) that need to be traversed in order to reach other networks. Path information, instead of cost information, is exchanged between neighboring BGP routers, and BGP-4 does not specify the rule for choosing a path from those advertised. The routing mechanism and routing policy are therefore separated. A policy is manually configured to allow a BGP router to rate possible routes to other ASs and choose the best path.
Two BGP routers first have to establish a TCP connection. After a negotiation phase, in which the two systems exchange some parameters (such as BGP version number, AS number, etc.), they become BGP peers and can start exchanging information. Initially, they exchange their full routing tables. Thereafter, only incremental updates are sent when some change in the routing tables occurs. No periodic refresh of the entire routing table is needed, so a BGP speaker maintains the current version of the entire BGP routing tables of all of its peers for the duration of the connection. In order to maintain the connection, peers regularly exchange keep-alive messages. Notification messages are sent in response to errors or special conditions. In the event of an error, a notification message is sent and the connection is closed.
Network reachability information is used to construct a graph of AS connectivity, from which routing loops may be pruned and policy decisions at the AS level may be enforced. Each BGP router maintains a routing table with all feasible paths to a given network.
As said above, BGP does not propagate cost information but path information, and it does not specify which path should be selected from those that have been advertised, as this is a policy-dependent decision left to the network administrator. This is because, as stated previously (in Section 3.4), when dealing with inter-AS routing, policy is more important than performance. However, when a BGP speaker receives several updates describing the best paths to the same destination, it has to choose one of them. The selection criteria are based on rules that take path attributes into account. Once the decision has been made, the speaker puts the best path in its routing table and then propagates the information to its neighbors. The path attributes generally used in the route selection process are [Lewis99][Halabi00][Huitem95]:
• Weight: A Cisco-defined attribute that is local to a router and is not advertised to neighboring routers. If the router learns about more than one route to the same destination, the route with the highest weight is preferred.
• Local preference: Used to choose an exit point from the local autonomous system (AS). It is propagated throughout the local AS. If there are multiple exit points from the AS, the local preference attribute is used to select the exit point for a specific route.
• Multiexit discriminator, or metric attribute: Used as a suggestion to an external AS regarding the preferred route into the AS that is advertising the metric.
• Origin: Indicates how BGP learned about a particular route. The origin attribute can have one of three values: IGP (the route is interior to the originating AS), EGP (the route was learned via the Exterior Border Gateway Protocol (EBGP)), or incomplete (the origin of the route is unknown or learned in some other way; this occurs when a route is redistributed into BGP).
• AS_path: When a route advertisement passes through an autonomous system, that AS number is appended to the ordered list of AS numbers the advertisement has traversed.
• Next-hop: The IP address that is used to reach the advertising router.
• Community: Provides a way of grouping destinations into communities to which routing decisions can be applied. Predefined community attributes are no-export (the route must not be advertised to EBGP peers), no-advertise (the route must not be advertised to any peer), and Internet (the route may be advertised to the Internet community, to which all routers in the network belong).
Attributes are crucial to achieving scalability, defining routing policies, and maintaining a stable routing environment. Other BGP features include route filtering, i.e., the ability of a BGP speaker to specify which routes to send to and receive from any of its peers. Filtering may refer to inbound or outbound links and may be applied as permit or deny. BGP-4 supports supernetting, or classless interdomain routing (CIDR) [Fuller93] (dealt with in Section 3.7), which enables route aggregation, that is, the combination of several routes within a single route advertisement. This minimizes the size of routing tables and protocol overhead.
BGP was originally designed to perform inter-AS routing, but it can also be used for intra-AS routing. BGP connections between ASs are called external BGP (e-BGP) connections, while those within an AS are called internal BGP (i-BGP) connections. Figure 3.5 shows a typical scenario for e-BGP, i.e., a multihomed AS connected to the Internet. All the networks X, Y, Z, 1, 2, and 3 are ASs. More specifically, 1 and 3 are stub networks, 2 is a multihomed stub network, and X, Y, and Z are backbone provider networks. A stub network is one for which all entering traffic is destined to that network. A multihomed AS is connected to multiple ASs (for example, via two different service providers*), but does not allow transit traffic.
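The attribute-based selection described above can be sketched as a ranking function. The sketch below applies only the first few steps of a BGP-style decision process (highest weight, then highest local preference, then shortest AS_path, then lowest MED); real implementations apply many more tie-breakers, and the candidate routes and attribute values here are entirely hypothetical.

```python
def best_path(routes):
    """Pick one route using a simplified BGP-style decision process:
    highest weight, then highest local preference,
    then shortest AS_path, then lowest MED."""
    return min(
        routes,
        key=lambda r: (-r["weight"], -r["local_pref"],
                       len(r["as_path"]), r["med"]),
    )

# Two advertisements for the same prefix, received from peers Y and Z
candidates = [
    {"peer": "Y", "weight": 0, "local_pref": 100,
     "as_path": ["Y", "X", "1"], "med": 0},
    {"peer": "Z", "weight": 0, "local_pref": 200,
     "as_path": ["Z", "3"], "med": 10},
]
# Weights tie, so the higher local preference (via Z) wins
chosen = best_path(candidates)
```

Note how policy, not cost, drives the outcome: the route via Z wins on local preference before AS_path length or MED are even considered, which is exactly the separation of routing mechanism and routing policy the text describes.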
Stub network 2 will be prevented from forwarding traffic between Y and Z by a selective route advertisement mechanism. BGP routes will be advertised in such a way that network 2 will not advertise
*In this situation, multiple service providers are used to increase availability and to allow load sharing.
FIGURE 3.5 A scenario for BGP.
to its neighbors (Y and Z) any path to destinations other than itself. For instance, 2 will not advertise the path 2-Z-3 to network Y, so the latter will not forward traffic to 3 via 2.
i-BGP can be used to distribute routing information to routers within the AS regarding destinations (networks) outside the AS. Running i-BGP offers several advantages, such as a consistent view of the AS to external neighbors, more control over information exchange within the AS, flexibility, and a slightly shorter convergence time than e-BGP (which is slow).
Table 3.2(A) and Table 3.2(B) list the various IGPs and EGPs discussed and compare them according to multiple design criteria. The tables can help guide the designer's choice when dealing with routing protocols in the IP domain.
We have seen that various IGPs and EGPs exist in the IP domain, including both proprietary protocols and open standards. In a composite, complex internetwork they may coexist, either for historical reasons or due to the presence of multiple vendor solutions. This coexistence may cause incompatibility problems due to both the different metrics adopted and the peculiarities of the various protocols. It is therefore necessary to find a way to overcome incompatibilities and form a holistic view of the network topology, thus enabling interoperability. This is the purpose of route redistribution [Awdu02]. Route redistribution is the process that enables routing information from one protocol to be translated and used by a different one. A router can therefore be configured to run more than one routing protocol and redistribute route information between the two protocols.

TABLE 3.2(A) IGPs vs. EGPs in the IP Domain

      Protocol | Technology               | Type      | Metrics                                  | Scalability | VLSM Support
IGP:  IGRP     | Distance vector          | Classful  | Bandwidth, delay, load, reliability, MTU | Medium      | No
      RIPv1    | Distance vector          | Classful  | Hop count                                | Small       | No
      RIPv2    | Distance vector          | Classless | Hop count                                | Small       | Yes
      EIGRP    | Advanced distance vector | Classless | Bandwidth, delay, load, reliability, MTU | Large       | Yes
      OSPF     | Link state               | Classless | Cost                                     | Large       | Yes
      IS-IS    | Link state               | Classless | Cost                                     | Very large  | Yes
EGP:  BGP-4    | Path vector              | Classless | Cost, hop, policy                        | Large       | Yes
TABLE 3.2(B) IGPs vs. EGPs in the IP Domain

      Protocol | Hop Count Limit | Load Balance (Equal Paths) | Load Balance (Unequal Paths) | Standard | Convergence Time | Routing Algorithm
IGP:  IGRP     | 100 (up to 255) | Yes                        | Yes                          | No       | Slow             | Bellman–Ford
      RIPv1    | 15              | Yes                        | No                           | Yes      | Slow             | Bellman–Ford
      RIPv2    | 15              | Yes                        | No                           | Yes      | Slow             | Bellman–Ford
      EIGRP    | 100 (up to 255) | Yes                        | Yes                          | No       | Fast             | DUAL
      OSPF     | 200             | Yes                        | No                           | Yes      | Fast/slow        | Dijkstra
      IS-IS    | 1024            | Yes                        | No                           | Yes      | Fast/slow        | Dijkstra
EGP:  BGP-4    | n/a             | No                         | No                           | Yes      | Slow             | Path vector
Redistribution is needed in two cases:
• At the AS boundary, as IGPs and EGPs do not match
• Within an AS, when multiple IGPs are used
Redistribution has direction: routing information can be redistributed symmetrically (mutual redistribution) or asymmetrically (hierarchical redistribution). Handling redistribution is not an easy task, and it is difficult to define a general approach. Solutions are typically vendor dependent, as the problems that can arise are strictly related to the details of the various protocols.
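The metric-translation problem at the heart of redistribution can be sketched as follows. Since a RIP hop count and an OSPF-style cost are not comparable, a common vendor practice is to assign redistributed routes a fixed, administrator-configured seed metric and tag them as external; the prefixes, the seed cost, and the tag name below are illustrative only.

```python
def redistribute_rip_into_ospf(rip_table, seed_cost=20):
    """Translate RIP routes (hop-count metric) into OSPF-style external
    routes carrying a fixed seed cost, since the two metrics are not
    directly comparable."""
    external = {}
    for prefix, hops in rip_table.items():
        if hops >= 16:  # RIP "infinity": unreachable, do not redistribute
            continue
        external[prefix] = {"cost": seed_cost, "tag": "rip-external"}
    return external

# One reachable route and one at RIP infinity
rip = {"10.1.0.0/16": 3, "10.2.0.0/16": 16}
external_routes = redistribute_rip_into_ospf(rip)
```

Note the asymmetry: the hop count is discarded rather than converted, which is one reason redistributed routes can cause suboptimal paths or loops unless filtering and tagging are configured carefully.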
3.6 IP Multicast Routing
The aim of multicast routing is to build a tree of links connecting all the routers that are attached to hosts belonging to the multicast group. Two different approaches are possible. The first one, called group-shared tree, uses a single tree for all sources in the multicast group, while the second approach, called source-based trees, provides an individual routing tree for each source in the multicast group.
The group-shared tree approach entails solving the Steiner tree problem [Hakimi71]. Although several heuristics were devised to solve this problem [Wall80][Waxm88][Wei93], they were not adopted by existing Internet multicast routing algorithms because of their complexity and poor scalability: they require multicast-aware routers to maintain state information about all the links in the network. Moreover, the tree built using these techniques must be recalculated whenever a link cost changes.
A more effective way to find the group-shared tree is the center-based (or core-based) approach, which is adopted by several Internet multicast routing algorithms. Here a router of the multicast group (also called a rendezvous point) is elected to be the core of the multicast tree. The core node is the recipient of join messages sent by all the routers attached to the hosts belonging to the multicast group. A join message is forwarded toward the core via unicast routing, and it stops when it reaches the core or a router that is already part of the multicast tree. This way, the path followed by the join message becomes the branch connecting the originating node to the core, and it is added to the multicast tree (the branch is said to be grafted onto the existing tree). The advantages of this approach are that unicast tables are exploited to forward the join messages and that multicast-aware routers do not need to maintain link state.
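The join-and-graft procedure can be sketched as follows. The router names, the unicast next-hop table, and the initial tree membership are all hypothetical; the point is only that the join follows existing unicast routing state and stops at the core or at the first on-tree router.

```python
def graft_branch(origin, core, next_hop, on_tree):
    """Forward a join message hop by hop toward the core using the
    unicast next-hop table; stop at the core or at the first router
    already on the multicast tree. Returns the grafted branch."""
    branch = [origin]
    node = origin
    while node != core and node not in on_tree:
        node = next_hop[node][core]  # unicast routing decides each hop
        branch.append(node)
    return branch

# Hypothetical unicast next hops toward core router "C"
next_hop = {"A": {"C": "B"}, "B": {"C": "C"}}
on_tree = {"C"}  # initially only the core is on the tree

branch = graft_branch("A", "C", next_hop, on_tree)
on_tree.update(branch)  # the grafted branch now belongs to the tree
```

A later join from another router would stop as soon as it hits any node now in `on_tree`, so branches are shared rather than rebuilt, which is what keeps per-router state small in the core-based approach.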
The source-based trees approach entails solving a least-cost path multicast routing problem. This could be expensive, as each sender would need to know all link costs to derive the least-cost spanning tree. For this reason, the reverse-path-forwarding (RPF) algorithm [Dalal78] is used, which is also useful because it prevents loops. The RPF algorithm allows a router to accept a multicast packet only on the interface from which the router would send a unicast packet to the source of the incoming multicast packet. RPF is effective, as each router only has to know the next hop along its least-cost path to the source, but it has the drawback that even routers not attached to hosts belonging to the multicast group receive multicast packets. This can be solved by pruning: a prune message can be sent back upstream by a multicast router receiving a multicast message for which it has no recipients. Since the scenario is dynamic, a receiver may later join through a multicast-aware router that has already sent a prune message; to handle this, unprune messages sent back upstream, or suitable timeouts (time to live (TTL)) that remove stale prunes, can be introduced.
In the Internet, multicast is achieved by the combination of the Internet Group Management Protocol version 2 (IGMPv2) [Fenne97] and multicast routing protocols. IGMP is an end system-to-intermediate system (ES-IS) protocol for multicast. End systems (ESs) are network devices without the capability to forward packets between subnetworks. IGMP is used by a host to inform its directly attached router that an application running on it is interested in joining a given multicast group. IGMP allows multicast group members to join or leave a group dynamically and maintains state information on router interfaces that can be exploited by multicast routing protocols to build the delivery tree.
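The RPF test itself is a one-line comparison against the unicast routing state, as the following sketch shows; the source name and interface labels are hypothetical.

```python
def rpf_check(packet_source, arrival_iface, unicast_iface_to):
    """Reverse-path-forwarding test: accept a multicast packet only if
    it arrived on the interface this router would itself use to send
    a unicast packet back toward the source; otherwise drop it."""
    return arrival_iface == unicast_iface_to[packet_source]

# Hypothetical unicast table: best outgoing interface toward each source
unicast_iface_to = {"S": "eth0"}

accept = rpf_check("S", "eth0", unicast_iface_to)   # on the reverse path
drop = not rpf_check("S", "eth1", unicast_iface_to)  # off-path duplicate
```

Because a packet arriving off the reverse path is discarded rather than forwarded, any copy that has looped back to a router necessarily fails the check, which is how RPF prevents forwarding loops without any per-group link-state information.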
Multicast routing algorithms, such as protocol-independent multicast (PIM), the Distance-Vector Multicast Routing Protocol (DVMRP), multicast open shortest path first (MOSPF), and the core-based tree (CBT), are in charge of coordinating the routers so that multicast packets are forwarded to their ultimate destinations. They are outlined below.
3.6.1 Distance-Vector Multicast Routing Protocol
The Distance-Vector Multicast Routing Protocol [Waitz88] is an Interior Gateway Protocol derived from RIP. DVMRP combines many of the features of RIP with the truncated reverse-path-broadcasting (TRPB) algorithm described by Deering [Deeri88]. DVMRP is based on flood and prune, according to the dense-mode multicast routing model, which assumes that the multicast group members are densely distributed over the network and bandwidth is plentiful. Such a model fits high-density enterprise networks. The multicast forwarding algorithm requires building per-source multicast trees based on routing information, and then dynamically creating per-source group multicast delivery trees by selectively pruning the multicast tree for each source. To unprune a previously pruned link, DVMRP provides both an explicit graft message and a TTL on prune messages (2 hours by default).
DVMRP includes support for tunneling IP multicast packets. This is very useful, as not all IP routers have multicast capabilities. Tunneling is performed by encapsulating multicast packets in IP unicast packets that are addressed and forwarded to the next multicast router on the destination path. Thanks to its tunneling capabilities, DVMRP has been used on the Internet for several years to support the multicast overlay network MBone [Kumar96], although its flooding nature argues against its adoption on large internetworks. The tree building performed by DVMRP needs more state information than RIP, so DVMRP is more complicated than RIP. There is also a very important difference from RIP: while the target of RIP is to route and forward packets to a particular destination, the goal of DVMRP is to keep track of the return paths to the source of multicast packets.
3.6.2 Multicast OSPF
Multicast OSPF (MOSPF) [Moy94] is an extension of the OSPFv2 unicast protocol enabling the routing of IP multicast packets. MOSPF is not a separate routing protocol; the multicast extensions are built on top of OSPFv2 and have been implemented so that a multicast routing capability can be gradually introduced into an OSPFv2 routing domain. A new OSPF link-state advertisement (LSA) describing the location of multicast destinations is added. Each router builds least-cost multicast trees for each (sender, group) pair. The path for a multicast packet is obtained by building a source-rooted pruned shortest-path multicast tree. The state of the tree is cached: it has to be recalculated following link-state changes or when the cache times out. Unlike unicast packets, an IP multicast packet in MOSPF is routed based on both the packet's source and its multicast destination. During packet forwarding, any commonality of paths is exploited: when multiple hosts belong to a single multicast group, a multicast packet is replicated only when the paths to the separate hosts diverge [Moy94]. MOSPF is an example of a sparse-mode routing protocol, which assumes that multicast members are widely distributed over the network and bandwidth is possibly restricted. The sparse mode is suitable for internetworking applications. However, MOSPF does not support tunneling and, due to the inherent scaling problems of the shortest-path algorithm, is not suitable for large internetworks.
3.6.3 Protocol-Independent Multicast
Protocol-independent multicast (PIM) is a recent IP multicast protocol. The name indicates that PIM does not depend on any particular unicast routing protocol. PIM works with IGMP and existing unicast routing protocols, such as RIP, IGRP, OSPF, IS-IS, and BGP. There are two operating modes for PIM, sparse mode and dense mode, which are described below.
3.6.3.1 Sparse-Mode PIM
Sparse-mode protocol-independent multicast (PIM-SM) [Estrin98] is a protocol for efficiently routing to multicast groups that may span wide-area (and interdomain) internetworks and is designed to support the sparse distribution model. PIM-SM works using a rendezvous point (RP) and requires explicit join/prune messages. A sender willing to send multicast packets first has to announce its presence by sending data to the RP. Analogously, a receiver willing to receive multicast data first has to register with the RP. Once the data flow from sender to RP to receiver starts, the routers on the path automatically optimize it, removing unneeded hops. This way, traffic only flows where it is required, and router state is maintained only along the path. A drawback of PIM-SM is that the RP could become a bottleneck, so multiple RPs may be introduced to avoid congestion through load sharing.

3.6.3.2 Dense-Mode PIM
Dense-mode protocol-independent multicast (PIM-DM) [Nicho03] forwards multicast packets out of all connected interfaces except the receiving one. Thus, it floods the network first and prunes specific branches later. PIM-DM efficiently supports routing in dense multicast networks, where it is reasonable to assume that every downstream system is potentially a member of the multicast group. Unlike DVMRP, PIM-DM does not use routing tables. PIM-DM is easy to configure, but less scalable and less efficient than PIM-SM for most applications.
3.6.4 Core-Based Tree
The core-based tree (CBT) multicast routing protocol builds one shared multicast distribution tree per multicast group. CBT follows the core-based approach described at the beginning of Section 3.6 for group-shared tree building. Despite its simplicity, the way the tree is built in CBT tends to concentrate traffic around the core routers, and this may lead to congestion. For this reason, some implementations feature multiple core routers and perform load sharing between them. CBT is very suitable for supporting multicast applications characterized by many senders within a single multicast group, such as distributed interactive simulations or distributed video gaming. The deployment of CBT has so far been limited. More details on CBT can be found in [CBT][CBTv2].
3.6.5 Interdomain IP Multicast Routing
The current approaches to interdomain IP multicast routing are based on an extension of BGP-4, called the Multicast Border Gateway Protocol (MBGP) [Bates00]. MBGP carries two sets of routes, one for unicast routing and one for multicast routing. The routes associated with multicast routing are used by PIM-SM to build data distribution trees.
3.7 IP Addressing and Routing Issues
This section focuses on the strict coupling between IP addressing and routing. First, the two commonly used addressing models, classful and classless, are described, together with their impact on routing. Then subnetting, variable-length subnet masks (VLSM), and classless interdomain routing (CIDR) are presented. Finally, IPv6 and its deployment in the current Internet, together with IPv4/IPv6 migration issues, are discussed.
3.7.1 Classful IP Addressing
According to the first IP specification [Poste81], each system attached to an IP-based Internet has a globally unique 32-bit Internet address. IP addresses are administered by the Internet Assigned Numbers Authority [IANA]. Systems that have interfaces to more than one network must have a unique IP address for each network interface. The Internet address consists of two parts: (1) the network number (or
TABLE 3.3(A) Address Formats and Network Sizes for Each Class

Class   | Class ID                        | Network Prefix Size (bits) | Network Number Size (bits) | Host Number Length (bits) | Maximum No. of Networks | Maximum No. of Hosts per Network
Class A | Highest-order bit = 0           | 8                          | 7                          | 24                        | 2^7 - 2 = 126 (a)       | 2^24 - 2 (b)
Class B | Two highest-order bits = 10     | 16                         | 14                         | 16                        | 2^14                    | 2^16 - 2
Class C | Three highest-order bits = 110  | 24                         | 21                         | 8                          | 2^21                    | 2^8 - 2
Class D | Four highest-order bits = 1110  | na                         | na                         | na                         | na                      | na
Class E | Five highest-order bits = 11110 | tbd                        | tbd                        | tbd                        | tbd                     | tbd

(a) The maximum number of Class A networks is 126, not 128, as there are two reserved networks: network 0.0.0.0 (reserved for default routes) and network 127.0.0.0 (reserved for the loopback function).
(b) The all-0s host number ("this network") and the all-1s host number ("broadcast") are reserved.
TABLE 3.3(B) Address Type and Ranges for Each Class

Class   | Type                               | Dotted Decimal Notation Range
Class A | Unicast                            | From 1.0.0.0 to 127.255.255.255
Class B | Unicast                            | From 128.0.0.0 to 191.255.255.255
Class C | Unicast                            | From 192.0.0.0 to 223.255.255.255
Class D | Multicast (a)                      | From 224.0.0.0 to 239.255.255.255
Class E | Reserved for experimental use only | From 240.0.0.0 to 247.255.255.255

(a) Class D is used for multicast applications and routing protocols such as OSPF and RIPv2.
network identifier, netID), which identifies the network to which the host belongs, and (2) the host number (or host identifier, hostID), which specifies the particular host on the given network. In classful addressing, the IP address space is divided into three main address classes, Class A, Class B, and Class C, which differ for the position of the boundary between the network number and the host number within the 32-bit address. Two additional classes are also defined: Class D (used for multicast addresses) and Class E (reserved for future use). IP addresses are commonly expressed in what is called dotted decimal notation, which divides the 32bit Internet address into four 8-bit fields. Each field value corresponds to each byte of the address, written in its decimal form and separated by a period (dot) from the other bytes in the address. For example, let us consider the IP address 192.32.215.8. The first number, 192, is the decimal equivalent of the first eight bits of the address, i.e., 11000000; the second, 32, is the equivalent of the second eight bits of the address; and so on. In binary notation, the address is therefore 11000000 00100000 11010111 00001000. Table 3.3(A) summarizes the address formats for each class and the maximum number of networks and hosts that can be defined within each class, while Table 3.3(B) shows the address type and address ranges for the different classes in the dotted decimal notation. Each unicast address class (i.e., A, B, and C) has an associated default mask, which is a bit mask used by host and routers to assess how much of the netID is significant for forwarding decisions. The bit mask therefore indicates how much of the address is allocated to the netID and how much is left to the hostID. Table 3.4 shows the default masks associated with each unicast IP address class. For example, the default mask 255.0.0.0 for Class A indicates that only the first eight bits are used by the netID. 
The mask's role is crucial for routers: by ANDing the mask with a destination address, they can easily determine whether an incoming packet should be delivered directly on the local network or forwarded to another one.
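This ANDing decision can be sketched as follows (a minimal illustration using the standard `ipaddress` module; the addresses are hypothetical):

```python
import ipaddress

def same_network(addr_a: str, addr_b: str, mask: str) -> bool:
    """True if both addresses share the same netID under the given mask."""
    m = int(ipaddress.IPv4Address(mask))
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    # AND each address with the mask and compare the resulting netIDs.
    return (a & m) == (b & m)

# A Class B destination checked against its default mask 255.255.0.0:
print(same_network("172.16.4.1", "172.16.200.9", "255.255.0.0"))  # True: deliver locally
print(same_network("172.16.4.1", "172.31.200.9", "255.255.0.0"))  # False: forward to a router
```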
© 2005 by CRC Press
A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues
3-21
TABLE 3.4 Prefix-Default Mask Association for Each Unicast Class

Class     Default Mask     Default Prefix
Class A   255.0.0.0        /8
Class B   255.255.0.0      /16
Class C   255.255.255.0    /24
Recently, masks have been indicated using the so-called prefix, which gives the number of contiguous bits used by the mask. Prefix-default mask associations for each unicast class are also shown in Table 3.4. A host can be configured with multiple IP addresses of different classes on the same physical interface. Direct communication is only possible between nodes within the same class and prefix, while nodes with a different address class, or the same address class and a different prefix, need an intermediate device, such as a layer 3 switch, router, proxy, or network address translator (NAT).

Network address translation allows a router to act as an agent between the Internet and a local private network. This means that only a single, unique IP address is required to represent an entire group of computers. NAT implements short-term address reuse and is based on the fact that a very small percentage of hosts in a stub domain (i.e., a domain such as a corporate network that only handles traffic originated by or destined to hosts in the domain) are communicating outside of the domain at any given time. Because many hosts never communicate outside of their stub domain, only a subset of the IP addresses inside the domain needs to be translated into globally unique IP addresses when outside communications are required. Each NAT device has a table consisting of pairs of local IP addresses and globally unique addresses. The IP addresses inside the stub domain are not globally unique; they are reused in other domains, thus solving the address depletion problem. The globally unique IP addresses are assigned according to the CIDR address allocation schemes, which solve the scaling problem (as will be discussed below). The main advantage of NAT is that it can be installed without changes to routers or hosts. More details on NAT can be found in [Egeva94].
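The NAT table just described, pairing local (reused) addresses with globally unique ones, can be sketched minimally as follows. All addresses and names here are hypothetical, chosen only to illustrate the two translation directions:

```python
# Hypothetical NAT table: local address -> globally unique address.
nat_table = {
    "10.0.0.5": "203.0.113.10",
    "10.0.0.6": "203.0.113.11",
}

def translate_outbound(src: str) -> str:
    """Rewrite a local source address as the packet leaves the stub domain."""
    return nat_table[src]

def translate_inbound(dst: str) -> str:
    """Map a globally unique destination back to its local address."""
    reverse = {glob: local for local, glob in nat_table.items()}
    return reverse[dst]

print(translate_outbound("10.0.0.5"))    # 203.0.113.10
print(translate_inbound("203.0.113.10"))  # 10.0.0.5
```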
3.7.2 Impact of IP Addressing on Routing Tables and Internet Scalability

The two most compelling problems facing today's Internet are IP address depletion and poor scaling due to the uncontrolled increase in the size of Internet routing tables. The first problem is the result of the IP version 4 (IPv4) addressing scheme, whose 32-bit address limits the total number of IPv4 addresses available. The situation is further complicated by the traditional model of classful addressing, which led to inefficient allocation of some portions of the IP address space in the early days of the Internet. The second problem derives from the exponential growth in the number of organizations connected to the Internet, combined with the fact that Internet backbone routers have to maintain complete routing information for the Internet. This problem cannot be solved by hardware enhancements, such as expanding router memory or improving router processing power; to deal with large routing table processing, route flapping (i.e., rapid, repeated changes in route availability), and large volumes of information to be exchanged without jeopardizing routing efficiency or the reachability of Internet portions, a more comprehensive and effective approach is needed. In the following, subnetting, a technique to reduce the uncontrolled growth of Internet routing tables, is described.
3.7.3 Subnetting

Introduced in [Mogul85], subnetting allows a single Class A, B, or C network number to be divided into smaller parts. This is achieved by splitting the standard classful hostID field into two parts: the subnetID
The Industrial Communication Technology Handbook
FIGURE 3.6 Extended network prefix for subnetting.
and the hostID on that subnet (as shown in Figure 3.6). The (netID, subnetID) pair forms the extended network prefix. In this way, if the internal network of a large organization is split into several subnetworks, the division is not visible outside the organization's private network. This allows Internet routers to use a single routing table entry for all the subnets of a large organization, thus reducing the size of their routing tables. Subnetworks are, however, visible to internal routers, which have to differentiate between the internal routes. Internet routers therefore use only the netID of the destination address to route traffic, while internal routers use the extended network prefix.

Subnetting has two main advantages. First, it hides the complexity of the private network organization within the private network boundary, preventing it from spreading outside and affecting the size of Internet routers' routing tables. Second, thanks to subnetting, local administrators do not have to obtain a new network number from the Internet when deploying new subnets.

Each bit in the subnet mask has a one-to-one correspondence with a bit of the Internet address. If a bit in the subnet mask is set to 1, the corresponding bit in the IP address is considered by the router as part of the extended network prefix; otherwise, the corresponding bit in the IP address is considered as part of the host number. Modern routing protocols still carry the complete four-octet subnet mask. The use of a single subnet mask, however, limits an organization to a fixed number of fixed-size subnets. A further improvement that greatly enhances flexibility is using more than one subnet mask for an IP network split into several subnetworks. This solution, called variable-length subnet masks, is described next.
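As an illustrative sketch of single-mask subnetting, the standard `ipaddress` module can split a Class B network (netID /16) with an 8-bit subnetID, giving an extended network prefix of /24. The network below is hypothetical; Internet routers would see only the /16, internal routers the /24 subnets:

```python
import ipaddress

# A hypothetical Class B network subnetted with an 8-bit subnetID.
net = ipaddress.ip_network("172.16.0.0/16")
subnets = list(net.subnets(prefixlen_diff=8))  # 2**8 = 256 subnets

print(len(subnets))        # 256
print(subnets[0])          # 172.16.0.0/24
print(subnets[1].netmask)  # 255.255.255.0 (mask of the extended network prefix)
```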
3.7.4 Variable-Length Subnet Masks

Variable-length subnet masks (VLSM), introduced in [Brade87], make it possible to use more than one subnet mask for a subnetted IP network. Since in this case the extended network prefix may have different lengths, the subnetted IP network is called a network with variable-length subnet masks. VLSM is a powerful improvement in flexibility.

VLSM also offers another significant benefit: the possibility of introducing route aggregation (summarization). Route aggregation is defined as the ability of a router to collapse several forwarding information base entries into a single entry [Trotter01]. This aggregation process works in combination with VLSM to build hierarchically structured networked environments. VLSM allows us to recursively divide the address space of a large organization. This is accomplished by splitting a large network into subnets, some of which are then further divided into subnets, some of which will, in turn, be split into subnets. This division makes routing information relevant to one group of subnets invisible to routers belonging to another subnet group. A single router can therefore summarize multiple subnets behind it into a single advertisement, thus allowing a reduction in the routing information to be maintained at the top level. From the routing perspective, route summarization offers the following advantages:

• It reduces the amount of information stored in routing tables.
• It simplifies the routing process, thus reducing the load on router resources (e.g., processor and memory).
• It improves network convergence time and isolates topology changes.

Without summarization, every router would need to have a route to every subnet in the network environment. The larger the network gets, the more crucial route summarization becomes.

3.7.4.1 Routing Protocol Requirements for Deploying VLSM

Now that the advantages of VLSM are clear, let us analyze what features routing protocols have to offer in order to support VLSM deployment. The first requirement is that routing protocols must carry extended network prefix information along with each route advertisement. This allows each subnetwork to be advertised with its corresponding prefix length or mask. As said before (in Section 3.5), some protocols, such as OSPF, Integrated IS-IS, and RIPv2, provide this feature. RIPv1, on the other hand, allows only a single subnet mask to be used within each network number, because it does not provide subnet mask information as part of its routing table update messages. In this case, a router would have to either guess that the locally configured prefix length should be used (but this cannot guarantee that the correct prefix will be applied) or perform a lookup in a statically configured prefix table containing all the masking information (but static tables raise severe scalability issues, require nonnegligible maintenance effort, and are error-prone). As a result, to successfully deploy VLSM in a large complex network, the designer must choose an IGP such as OSPF, IS-IS, or RIPv2, while RIPv1 should be avoided.

The second requirement for deploying VLSM is that all routers must adopt a consistent forwarding algorithm based on the longest match. When VLSM is implemented, it may happen that a destination address matches multiple routes in a router's routing table.
As a route with a longer extended network prefix is more specific than one with a shorter prefix, when forwarding traffic routers must always choose the route with the longest matching extended network prefix.

The third requirement to support VLSM is related to route aggregation and consists of assigning addresses so that they have topological significance; that is, addresses have to reflect the hierarchical network topology. In general, network topology follows continental and national boundaries, so IP addresses should be assigned on this basis. If the organizational topology does not match the network topology, route aggregation should not be applied: while it is reasonable to aggregate a pool of addresses assigned to a particular region of the network into a single routing advertisement, it is not meaningful to group together addresses that are not topologically significant. Wherever route aggregation cannot be applied, the size of the routing tables cannot be reduced.

The solution that allows today's Internet to operate normally despite the depletion of the IPv4 addressing space and the growing size of Internet routing tables is classless interdomain routing (CIDR), which is described below.
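The longest-match rule can be sketched as follows (a minimal illustration; the routing table entries and next-hop names are hypothetical):

```python
import ipaddress

# Hypothetical routing table with overlapping prefixes of different lengths.
routes = {
    ipaddress.ip_network("10.0.0.0/8"):  "next-hop-A",
    ipaddress.ip_network("10.1.0.0/16"): "next-hop-B",
    ipaddress.ip_network("10.1.3.0/24"): "next-hop-C",
}

def lookup(dst: str) -> str:
    """Forward along the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(lookup("10.1.3.7"))  # next-hop-C: matches all three, /24 is longest
print(lookup("10.1.9.1"))  # next-hop-B
print(lookup("10.7.0.1"))  # next-hop-A
```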
3.7.5 Classless Interdomain Routing

CIDR, also called supernetting, relaxes the traditional rules of classful IP addressing: with CIDR, the netID is no longer constrained to 8, 16, or 24 bits, but may be any number of bits long. This realizes so-called classless addressing [Hinden93][Rekht93a][Fuller93][Rekht93b]. In the CIDR model, a prefix length specifies the number of leading bits in the 32-bit address that represent the network portion of the address. For example, the network address in the dotted decimal form a.b.c.d/21 indicates that the first 21 bits specify the netID, while the remaining 11 bits identify the specific hosts in the organization. As a result, CIDR supports the deployment of arbitrarily sized networks rather than the standard 8-bit, 16-bit, or 24-bit network numbers associated with classful addressing. Moreover, the rightmost 11 bits could be further divided through subnetting [Mogul85], so that new internal networks can be created within the a.b.c.d/21 network. For route advertising, the prefix length is used instead of the traditional high-order-bits scheme; routers supporting CIDR therefore rely on the prefix length information provided with the route.
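The arithmetic behind a /21 prefix can be checked with the standard `ipaddress` module (the network below is hypothetical, standing in for the a.b.c.d/21 example; `strict=False` lets the module normalize the network address):

```python
import ipaddress

# A hypothetical /21 block: 21 bits of netID, 32 - 21 = 11 bits of hostID.
net = ipaddress.ip_network("198.51.100.0/21", strict=False)
host_bits = 32 - net.prefixlen

print(host_bits)          # 11
print(net.num_addresses)  # 2048 == 2**11
print(net.netmask)        # 255.255.248.0
```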
FIGURE 3.7 CIDR and route summarization (a prefix hierarchy: 192.0.0.0/8 above 192.169.0.0/16; 192.168.1.0/24, 192.168.2.0/24, 192.168.3.0/24; and 192.169.1.0/24).
CIDR also supports route aggregation. As a result, with CIDR a single routing table entry can represent the address space of thousands of traditional classful routes: in route advertisements, networks can be combined into supernets, as long as they have a common network prefix (Figure 3.7). This is crucial to reduce the size of Internet backbone router routing tables and to simplify routing management. The implementation of CIDR in the Internet is mainly based on the BGP-4 protocol. The Internet is divided hierarchically into addressing domains. Within a domain, detailed information about all the networks belonging to the domain is available, while outside the domain only the common network prefix is advertised. This allows a single routing table entry to specify a route to many individual network addresses.

CIDR and VLSM are similar, since both allow a portion of the IP address space to be recursively divided into smaller pieces. Both approaches require that the extended network prefix information be provided with each route advertisement, and both use longest matching on addresses. The key difference between VLSM and CIDR is where the recursion is performed: in VLSM the subdivision of addresses is done after the address range is assigned to the user (e.g., a private enterprise's network), whereas in CIDR the subdivision is done by the Internet authorities and ISPs before the user receives the addresses. CIDR deployment also imposes the same routing protocol requirements as VLSM.

Although CIDR, in combination with network address translation (NAT), represents an acceptable short-term solution to today's Internet deficiencies, the long-term solution is to redesign the address format to allow for more possible addresses and more efficient routing on the Internet. This is the reason for a new version of the IP, called IP version 6 (IPv6) [Deeri98], which was devised to overcome the limitations of IPv4.
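Supernetting can be sketched with the standard `ipaddress` module. The prefixes below are hypothetical: four contiguous, aligned /24 networks that share a common prefix collapse into a single /22 advertisement (aggregation only succeeds when the blocks are contiguous and aligned):

```python
import ipaddress

# Four contiguous /24 more-specific routes behind one router.
more_specific = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]

# Collapse them into the single supernet a CIDR router would advertise.
aggregate = list(ipaddress.collapse_addresses(more_specific))

print(aggregate)  # [IPv4Network('192.168.0.0/22')]
```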
3.8 IPv6 Overview

IPv6 [Deeri98], the new version of the IP, solves the problem of the limited number of available IPv4 addresses and also adds many improvements in areas such as routing, network self-configuration, and QoS support. The IPv6 protocol has been streamlined to expedite packet handling by network nodes and provides support for congestion control, reducing the need for reliable but untimely higher-level protocols (e.g., TCP). Moreover, IPv6 characteristics can be exploited inside routers to entrust them with the task of providing diversified scheduling for real-time and non-real-time flows. The IPv6 header differs from the IPv4 header format, and this makes the two protocols incompatible. Although IPv6 addresses are four times longer than IPv4 addresses, the IPv6 header is only twice the size of the IPv4 header, as several functions present in the IPv4 header have been relocated to extension headers or dropped [Deeri98].
For example, in IPv6 there is no checksum field. This speeds up routing, as routers are relieved from recalculating the checksum for each incoming packet.

IPv6 also provides the hop-limit field, which indicates the number of hops the packet may still traverse. Used to limit the impact of routing loops, the hop-limit field is decremented by 1 by each node that forwards the packet; the packet is discarded if the hop limit reaches zero. IPv4 had a time-to-live field with the same purpose, but it was expressed in seconds. The change to a number of hops reduces the processing time within routers.

Another difference is that in IPv6 the base header has a fixed length of 40 bytes and can be supplemented by a number of extension headers of variable length. The payload length in bytes is specified in the payload length field. Note that IPv6 options can be of arbitrary length (and are not limited to 40 bytes, as in IPv4). Another interesting field in the IPv6 header is the flow label [Partri95], which enables the source to label packets for special handling by intermediate systems.

The important features of IPv6 can therefore be summarized as follows [Deeri98]:
• A new addressing scheme.
• No fragmentation/reassembly at intermediate routers: These time-consuming operations are left to the source and destination, thus improving IP forwarding within the network.
• Simplified header and fixed-length options in the header: Compared to IPv4, these features reduce the processing overhead on routers.
• Support for extension headers and options: IPv6 options are placed in separate headers located in the packet between the IPv6 header and the transport layer header. Most IPv6 option headers are not processed by any router along a path before the packet arrives at its final destination, and this improves router performance for packets containing options.
• Quality-of-service capabilities: These are crucial for the future evolution of the Internet.
A new capability enables labeling of packets belonging to specific traffic flows for which the sender has requested special handling, such as nondefault quality of service or real-time service.
• Support for authentication and privacy: Through an extension that provides support for authentication and data integrity.
• Support for source routes: IPv6 includes an extended source routing header designed to support source-initiated selection of routes (used to complement the route selection provided by existing routing protocols for both interdomain and intradomain routes).

Some extension headers are exploited for routing purposes. The routing header is used by a source to list one or more intermediate nodes to be visited on the way to a packet's destination (more details will be given below). This particular form of the routing header is designed to support source demand routing (SDR) [Estrin96]. The hop-by-hop options header is used to carry optional information that must be examined by every node along a packet's delivery path. Finally, the end-to-end options header is used to carry optional information that needs to be examined only by a packet's destination node (or nodes).
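The 40-byte fixed header described above (version, traffic class, flow label, payload length, next header, hop limit, and the two 128-bit addresses) can be sketched with the standard `struct` module. All field values below are hypothetical placeholders, chosen only to show the layout and the hop-limit decrement:

```python
import struct

# Hypothetical field values for an IPv6 fixed header.
version, traffic_class, flow_label = 6, 0, 0x12345
payload_length, next_header, hop_limit = 0, 59, 64  # 59 = "no next header"
src = bytes(16)  # placeholder 128-bit source address
dst = bytes(16)  # placeholder 128-bit destination address

# First 32-bit word: 4-bit version, 8-bit traffic class, 20-bit flow label.
first_word = (version << 28) | (traffic_class << 20) | flow_label
header = struct.pack("!IHBB", first_word, payload_length,
                     next_header, hop_limit) + src + dst

print(len(header))  # 40: fixed length, unlike IPv4's variable-length header

# A forwarding node decrements the hop limit and drops the packet at zero.
hop_limit -= 1
assert hop_limit > 0, "hop limit exhausted: discard packet"
```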
3.8.1 IPv6 Addressing, Subnetting, and Routing

The address field in IPv6 is 128 bits long. This not only allows a far higher number of IP addresses than is achievable with the 32-bit IPv4 address, but also allows more levels of addressing hierarchy and simpler autoconfiguration. IPv6 addresses are assigned to interfaces, not nodes. A node can have several interfaces, and therefore it can be identified by any of the unicast addresses assigned to any of its interfaces.

IPv6 supports unicast, anycast, and multicast addresses. With unicast addressing, the packet is delivered to the interface identified by the specific address. With anycast addressing, the packet is delivered to one of the interfaces identified by that address, i.e., the nearest one according to the routing metric adopted. Anycast can be considered
a refinement of unicast devised to simplify and streamline the routing process. When used as part of a route sequence, anycast addresses permit a node to select which of several ISPs it wants to carry its traffic. This capability is referred to as source-selected policies. Finally, with multicast addressing, the packet is delivered to all the interfaces identified by that address. With multicast and anycast addressing, the addressed interfaces typically belong to different nodes.

An IPv6 address can be expressed in three different forms: preferred, compressed, and mixed. The preferred form is the full IPv6 address in hexadecimal values, H:H:H:H:H:H:H:H, where each H refers to a hexadecimal integer (16 bits). The compressed form substitutes zero strings with a shorthand indicator, double colons (::), to compress arbitrary-length strings of zeros. This is useful, as IPv6 addresses containing long strings of zeros are quite common. The mixed form is represented as H:H:H:H:H:H:D.D.D.D, where the Hs represent the hexadecimal values of the six high-order 16-bit parts of the address, while the Ds stand for the standard IPv4 decimal value representation of the four low-order 8-bit parts of the address. This mixed form is useful in hybrid IPv4/IPv6 environments.

There are two special addresses, the unspecified address 0:0:0:0:0:0:0:0 and the loopback address. The first one indicates the absence of an address; it must never be assigned to any node or used as a destination address. The second is the special unicast address 0:0:0:0:0:0:0:1, which may be used by a node to send a packet to itself.

There are six types of unicast IPv6 addresses. Here we only mention aggregatable global unicast addresses, which can be routed globally on the IPv6 backbone, i.e., the 6Bone, and are equivalent to public IPv4 addresses. Aggregatable global unicast addresses can be aggregated or summarized to produce an efficient routing infrastructure.
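The preferred and compressed forms can be reproduced with the standard `ipaddress` module (a small sketch; note that the module's "exploded" rendering also pads each 16-bit group with leading zeros):

```python
import ipaddress

# A link-local address with a long run of zero groups.
addr = ipaddress.IPv6Address("fe80:0:0:0:0:0:0:1")

print(addr.exploded)    # fe80:0000:0000:0000:0000:0000:0000:0001 (full form)
print(addr.compressed)  # fe80::1 (:: compresses the run of zeros)

# The loopback address in compressed form:
print(ipaddress.IPv6Address("0:0:0:0:0:0:0:1").compressed)  # ::1
```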
IPv6 subnetting can be compared to classless addressing: as in the CIDR notation, the prefix length indicates the leading bits that constitute the netID. IPv6 routing is hierarchical and reflects the classless concept. However, with IPv6, small to regional network service providers and end users are no longer able to obtain address space directly from numbering authorities such as IANA [IANA]. Only top-level aggregators (TLAs) (i.e., large ISPs) will be assigned address space from the Internet Registry. TLAs will be assigned address blocks, which they will in turn handle and delegate to their downstream connections, i.e., next-level aggregators (NLAs) (medium-size ISPs or specific customer sites) and site-level aggregators (SLAs) (individual organizations) [3COM]. With this new hierarchical architecture, the number of entries to be maintained in the routing tables of Internet core routers is reduced, thus limiting the routing complexity of the future Internet.

IPv6 embeds simple routing extensions that support powerful new routing functionalities, such as:
• Provider selection (according to criteria such as policy, performance, cost, etc.)
• Host mobility (route to current location)
• Auto-readdressing (route to new address)

The new routing functionality is obtained by creating sequences of IPv6 addresses using the IPv6 routing option. The routing option is used by an IPv6 source to list one or more intermediate nodes to be visited on the way to a packet's destination. (This function is analogous to IPv4's loose source and record route option.) To enable address sequences, IPv6 hosts are in most cases required to reverse the route contained in a packet they receive (if the packet was successfully authenticated using the IPv6 authentication header) in order to return a reply along the same address sequence to the originator. The address sequence facility of IPv6 is simple but powerful.
As an example, if host H1 were to decide to enforce a policy that all packets to/from host H2 should only go through given provider ISPx, it would construct a packet containing the following address sequence: H1, ISPx, H2. This ensures that when H2 replies to H1, it will reverse the route and the reply will go through ISPx. The addresses in H2’s reply would be: H2, ISPx, H1.
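The reversal in this example is literally a reversal of the address list; a trivial sketch, using the hypothetical names from the text:

```python
# H1's outbound address sequence, forcing traffic through provider ISPx.
forward = ["H1", "ISPx", "H2"]

# H2 builds its reply by reversing the received sequence.
reply = list(reversed(forward))

print(reply)  # ['H2', 'ISPx', 'H1']
```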
FIGURE 3.8 Dual stack.
3.8.2 IPv6 Deployment in the Current Internet: State of the Art and Migration Issues

3.8.2.1 Transition from IPv4 to IPv6

The two versions of IP are not compatible, so they cannot coexist in the same subnet. This makes the spread of the protocol difficult: no company, organization, or university would switch to IPv6 if it meant turning the network off until all the nodes and routers are updated. To cope with these problems, the IETF has standardized an almost painless transition mechanism called SIT (simple Internet transition) [Gillig96]. The new protocol has been provided with properties that allow a simple, fast transition and mechanisms that allow the two protocols to coexist. IPv6 thus has the following properties:
• Incremental update: IPv4 nodes can be updated to IPv6 one at a time.
• Minimal dependence in update operations: The only requirement is that before performing a transition operation on a host, it has to be performed on the Domain Name Service (DNS) server; there are no requirements for transition on routers.
• Easy addressing: At the moment of transition, the same addresses can be used simply by transforming them. A class of addresses that embeds IPv4 addresses (IPv4-compatible addresses) has been provided.
• Low initialization costs: Little effort is required to update IPv4 to IPv6 and initialize new systems supporting IPv6.

In addition, two cooperation mechanisms have been provided: dual stack and tunneling.

3.8.2.1.1 Dual Stack
Dual-stack gateways support both IPv4 and IPv6, implementing the two protocols completely, so as to allow IPv6 nodes to correctly receive traffic from nodes using IPv4 (as shown in Figure 3.8). These gateways receive IPv4 packets, replace the header with an IPv6 one, and forward them to the IPv6 subnet. Dual-stack nodes have at least two addresses, an IPv4 one and an IPv6 one, which can be related; for example, the IPv6 address may be IPv4-compatible, but not necessarily.
3.8.2.1.2 Tunneling
This mechanism actually predates IPv6; it was devised to solve communication problems in networks using different protocols. It consists of creating a virtual tunnel between two network nodes: a whole packet arriving at the first node (from the network header onward) is inserted into the payload of another packet. In IPv6, the routers providing tunneling are similar to the dual-stack gateways discussed above. They encapsulate IPv6 packets in IPv4 packets, as can be seen in Figure 3.9. This allows IPv6 packets to pass through networks that do not support the protocol. At the destination there obviously has to be a router performing the inverse operation, i.e., opening the IPv4 packets, extracting the IPv6 packets, and sending them to the destination subnet. This mechanism is shown in Figure 3.10, where routers RA and RB handle the tunnel. Tunneling is fundamental for the worldwide spread of IPv6: it allows communications between IPv6 islands, i.e., subnets using IPv6 instead of IPv4, through the IPv4 network. There are two kinds of tunneling:
FIGURE 3.9 Tunneling.
FIGURE 3.10 Tunneling.
• Configured tunneling: When the packet's destination is not at the end of the tunnel. There are two modes:
  • Router to router: The tunnel interconnects two IPv6 routers that are neither the source nor the destination of the IPv6 packet. In this case, the tunnel is part of the path the packet has to cover.
  • Host to router: The tunnel interconnects an IPv6 host, which is the source of the packet, to an IPv6 router, which is not the packet's destination. In this case, the tunnel is the first part of the path the packet has to cover.
• Automatic tunneling: When the packet's destination is at the end of the tunnel. Again, there are two modes:
  • Host to host: The tunnel connects two IPv6 hosts; the first is the source of the packet and the second its destination. In this case, the tunnel is the whole path the packet has to cover.
  • Router to host: The tunnel connects an IPv6 router and an IPv6 host; the former is not the source of the packet, but the latter is its destination. In this case, the tunnel is the final part of the path the packet has to cover.

3.8.2.2 An Experimental Network: The 6Bone

The first lab experiments on networking solutions based on IPv6 soon led to worldwide geographical-area experimentation and the introduction in 1996 of the 6Bone network [6Bone]. 6Bone (IPv6 backbone) is an experimental IPv6 network, parallel to the IPv4 network and realized by interconnecting IPv6 labs via tunneling. It became a reality in March 1996 with the setting up of a first tunnel between the IPv6 labs of G6 [G6] (France), UNI-C [UNI-C] (Denmark), and WIDE [WIDE] (Japan). 6Bone has seen continuous growth in the number of interconnected labs and is the environment in which the most interesting IPv6 protocol experiments are being carried out: verification of the maturity of implementations, handling of the addressing spaces assigned to experimental providers, IPv6 routing, etc. The network is organized as a three-layer hierarchy.
At the highest level are the sites making up the 6Bone backbone, i.e., the portion of the network on which most of the geographical connectivity enjoyed by the other sites connected is based. At the next level down there are the so-called 6Bone transit sites, i.e., sites that are connected to at least one of the backbone sites but which, in turn, operate as network access points for nodes that do not have a direct tunnel toward the backbone. The latter are the lowest level in the current 6Bone hierarchy and are called leaves. Connectivity between the backbone sites is ensured by a large number of tunnels on the Internet and some direct links forming an arbitrary mesh topology within which the routing of IPv6 packets is based
on the BGPv4+ dynamic routing protocol [Bates00] (a version of BGP4 that is capable of supporting both IPv4 and IPv6).
3.9 Conclusions

The adoption of IPv6 has been limited up to now, as it requires modification of the entire infrastructure of the Internet. More effort on the transition is therefore necessary to make it as simple as possible and open the way for the potential of IPv6. IPv6 is expected to gradually replace IPv4, with the two coexisting for a number of years during a transition period, thanks to tunneling and dual-stack techniques. Currently, IPv6 is implemented on the 6Bone network [6Bone], a collaborative project involving Europe, North America, and Japan. Studies about how to exploit novel IPv6 features have already appeared, and companies are also interested in IPv6 technology as a way to overcome IPv4 limitations [Ficher03][LoBello03][VaSto03].

Migration from IPv4 to IPv6 is expected to be gradual, due to several factors that slow the process, for example:
• Increased memory requirements in intermediate devices, such as routers and switches, for network addresses.
• The extra load on domain name systems (DNSs), which during the transition need to maintain and provide both addresses of each IPv6 host, i.e., a 32-bit IPv4 address and a 128-bit IPv6 one.
• The need to redesign the user interfaces of current TCP/IPv4 applications and services, which are based on traditional 32-bit addresses and therefore have to be adapted to work with the larger IPv6 addresses.

Nevertheless, all the major router vendors have already started to enable IPv6 implementation on their systems. Among the routing protocols supporting IPv6 are RIPng [Malkin98], OSPFv3 [Coltun99], Integrated IS-ISv6 [Hopps03], and MP-BGPv6 [Marque99]. More details can be found in [Cisco02b].
References

[3COM] Understanding IP Addressing, White Paper, 3COM Corporation, www.3com.com.
[6Bone] http://www.6bone.net.
[Awdu02] D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, X. Xiao, RFC 3272, Overview and Principles of Internet Traffic Engineering, May 2002.
[Bates00] T. Bates, Y. Rekhter, R. Chandra, D. Katz, RFC 2858, Multiprotocol Extensions for BGP-4, June 2000.
[Brade87] R. Braden, J. Postel, RFC 1009, Requirements for Internet Gateways, June 1987.
[CBT] RFC 2189, Core-Based Tree (CBT Version 2) Multicast Routing: Protocol Specification, September 1997.
[CBTv2] RFC 2201, Core-Based Tree (CBT) Multicast Routing Architecture, September 1997.
[Chen99] E. Chen, J. Stewart, RFC 2519, A Framework for Inter-Domain Route Aggregation, February 1999, ftp://ftp.rfc-editor.org/in-notes/rfc2519.txt.
[Cisco97] Cisco Systems, Integrating Enhanced IGRP into Existing Networks, 1997, http://www.cisco.com (search for the document title).
[Cisco02a] Cisco Systems, Inc., Enhanced IGRP, http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/en_igrp.htm.
[Cisco02b] Cisco IOS Learning Services, The ABCs of IP Version 6, 2002, www.cisco.com/go/abc.
[Cisco03] Cisco Systems, Inc., Internetworking Technologies Handbook, Cisco Press, Indianapolis, 2003.
[Callon90] R.W. Callon, RFC 1195, Use of OSI IS-IS for Routing in TCP/IP and Dual Environments, December 1990.
[Coltun99] R. Coltun, D. Ferguson, J. Moy, RFC 2740, OSPF for IPv6, December 1999.
© 2005 by CRC Press
The Industrial Communication Technology Handbook
[Dalal78] Y.K. Dalal and R.M. Metcalfe, Reverse path forwarding of broadcast packets, Communications of the ACM, 21(12), 1040–1048, December 1978.
[Deeri88] S. Deering, Multicast routing in internetworks and extended LANs, ACM Computer Communication Review, 18(4), Proceedings of ACM SIGCOMM’88, pp. 55–64, Stanford, Aug. 16–19, 1988.
[Deeri98] S. Deering, R. Hinden, RFC 2460, Internet Protocol, Version 6 (IPv6) Specification, December 1998.
[Dijks59] E.W. Dijkstra, A note on two problems in connection with graphs, Numer. Math., 1, 269–271, 1959.
[Droms97] R. Droms, RFC 2131, Dynamic Host Configuration Protocol, March 1997.
[Egeva94] K. Egevang, P. Francis, RFC 1631, The IP Network Address Translator (NAT), May 1994.
[Estrin96] D. Estrin, T. Li, Y. Rekhter, K. Varadhan, D. Zappala, RFC 1940, Source Demand Routing: Packet Format and Forwarding Specification (Version 1), May 1996.
[Estrin98] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei, RFC 2362, Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification, June 1998.
[Fenne97] W. Fenner, RFC 2236, Internet Group Management Protocol, Version 2, November 1997.
[Ficher03] S. Fichera, S. Visalli, O. Mirabella, QoS Support for Real-Time Flows in Internet Routers, paper presented at RTLIA ’03, 2nd International Workshop on Real-Time LANs in the Internet Age, Satellite Workshop of the 15th Euromicro Conference on Real-Time Systems (ECRTS03), Porto, Portugal, June 2003.
[Floyd97] S. Floyd, V. Jacobson, Synchronization of periodic routing messages, IEEE/ACM Transactions on Networking, 2, 122–136, 1997.
[Fuller93] V. Fuller, T. Li, J. Yu, K. Varadhan, RFC 1519, Classless Inter-Domain Routing (CIDR): An Address Assignment and Aggregation Strategy, September 1993.
[G6] http://www.g6.asso.fr.
[Garcia93] J.J. Garcia-Luna-Aceves, Loop-free routing using diffusing computations, IEEE/ACM Transactions on Networking, 1, 130–141, 1993.
[Gillig96] R. Gilligan, E. Nordmark, RFC 1933, Transition Mechanisms for IPv6 Hosts and Routers, April 1996.
[Hakimi71] S.L. Hakimi, Steiner’s problem in graphs and its implications, Networks, 1, 113–133, 1971.
[Halabi00] B. Halabi, D. McPherson, Internet Routing Architectures, Cisco Press, Indianapolis, 2000.
[Hedri88] C.L. Hedrick, RFC 1058, Routing Information Protocol, June 1988.
[Hedri91] C.L. Hedrick, An Introduction to IGRP, August 1991, http://www.cisco.com/warp/public/103/5.html.
[Hinden93] R. Hinden, Editor, RFC 1517, Applicability Statement for the Implementation of Classless Inter-Domain Routing (CIDR), September 1993.
[Hopps03] C.E. Hopps, Routing IPv6 with IS-IS, January 2003, draft-ietf-isis-ipv6-05.txt.
[Huitem95] C. Huitema, Routing in the Internet, Prentice Hall, 1995.
[IANA] Internet Assigned Number Authority homepage, http://www.iana.org/.
[Kenyon02] T. Kenyon, Data Networks, Digital Press, Elsevier Science, 2002.
[Kumar96] V. Kumar, Mbone: Interactive Media on the Internet, New Riders Publishing, Indianapolis, 1996.
[Kurose01] J.F. Kurose, K. Ross, Computer Networking, Addison-Wesley, Reading, MA, 2001.
[Lewis99] C. Lewis, Cisco TCP/IP Routing Professional Reference, McGraw-Hill, New York, 1999.
[LoBello03] L. Lo Bello, S. Fichera, S. Visalli, O. Mirabella, Congestion Control Mechanisms for Multi-Hop Network Routers, paper presented at IEEE International Conference on Emerging Technologies and Factory Automation ETFA2003, Lisbon, Portugal, October 2003.
[Malkin97] G. Malkin, R. Minnear, RFC 2080, RIPng for IPv6, January 1997.
[Malkin98] G. Malkin, RFC 2453/STD 0056, RIP Version 2, November 1998.
[Marque99] P. Marques, F. Dupont, RFC 2545, Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing, March 1999.
A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues
[Mogul85] J. Mogul, J. Postel, RFC 950, Internet Standard Subnetting Procedure, August 1985.
[Moy89] J. Moy, RFC 1131, OSPF Specification, October 1989.
[Moy88] J. Moy, RFC 2328, OSPF Version 2, April 1998.
[Moy94] J. Moy, RFC 1584, Multicast Extensions to OSPF, March 1994.
[Nicho03] J. Nicholas, W. Siadak, Protocol Independent Multicast - Dense Mode (PIM-DM): Protocol Specification (Revised), September 2003, draft-ietf-pim-dm-new-v2-04.txt.
[Partri95] C. Partridge, RFC 1809, Using the Flow Label Field in IPv6, June 1995.
[Pepel00] I. Pepelnjak, EIGRP Network Design Solutions, Cisco Press, Indianapolis, 2000.
[Perlm92] R. Perlman, Interconnections: Bridges and Routers, Addison-Wesley, Reading, MA, 1992.
[Poste81] J. Postel, RFC 791, Internet Protocol, September 1981.
[Rekht93a] Y. Rekhter, T. Li, RFC 1518, An Architecture for IP Address Allocation with CIDR, September 1993.
[Rekht93b] Y. Rekhter, C. Topolcic, RFC 1520, Exchanging Routing Information across Provider Boundaries in the CIDR Environment, September 1993.
[Rekhr95] Y. Rekhter, T. Li, RFC 1771, A Border Gateway Protocol 4 (BGP-4), March 1995.
[Rosen82] E.C. Rosen, RFC 827, Exterior Gateway Protocol, October 1982.
[Schulz03] H. Schulzrinne, S. Casner, R. Frederick, RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003.
[Trotter01] G. Trotter, RFC 3222, Terminology for Forwarding Information Base (FIB) Based Router Performance, December 2001.
[UNI-C] http://www.uni-c.dk.
[VaSto03] P. van der Stok, M. van Hartskamp, Robust Real-Time IP-Based Multimedia Communication, paper presented at RTLIA ’03, 2nd International Workshop on Real-Time LANs in the Internet Age, Satellite Workshop of the 15th Euromicro Conference on Real-Time Systems (ECRTS03), Porto, Portugal, June 2003.
[Waitz88] D. Waitzman, C. Partridge, RFC 1075, Distance Vector Multicast Routing Protocol, November 1988.
[Wall80] D. Wall, Mechanisms for Broadcast and Selective Broadcast, Ph.D. Dissertation, Stanford University, June 1980.
[Waxm88] B.M. Waxman, Routing of multipoint connections, IEEE Journal on Selected Areas in Communications, 6, 1617–1622, 1988.
[Wei93] L. Wei, D. Estrin, TR USC-CD-93-560, A Comparison of Multicast Trees and Algorithms, Department of Computer Science, University of Southern California, Los Angeles, September 1993.
[WIDE] http://6bone.v6.wide.ad.jp.
4 Fundamentals in Quality of Service and Real-Time Transmission

Wolfgang Kampichler
Frequentis GmbH

4.1 What Is Quality of Service? ................................................4-1
4.2 Factors Affecting the Network Quality..............................4-3
    Bandwidth • Throughput • Latency • Queuing Delay • Transmission Delay • Propagation Delay • Processing Delay • Jitter • Packet Loss
4.3 QoS Delivery........................................................................4-6
    FIFO Queuing • Priority Queuing • Class-Based Queuing • Weighted Fair Queuing
4.4 Protocols to Improve QoS ..................................................4-8
    Integrated Services • Differentiated Services • Multi-Protocol Label Switching • Combining QoS Solutions
4.5 Protocols Supporting Real-Time Traffic..........................4-14
    Real-Time Transport Protocol • Real-Time Transport Control Protocol • Real-Time Streaming Protocol
References .....................................................................................4-17
4.1 What Is Quality of Service?

It is difficult to find an adequate definition of what quality of service (QoS) actually is. There is a danger that, because we wish to use quantitative methods, we might limit the definition of QoS to only those aspects that can be measured and compared. In fact, there are many subjective and perceptual elements to QoS, and much work has been done on mapping the perceptual to the quantifiable (particularly in the telephony industry). As yet, however, there does not appear to be a standard definition of QoS in measurable terms. When considering the definition of QoS, it might be helpful to recall the old story of the three blind men who happen upon an elephant. The first man touches the elephant’s trunk and determines that he has stumbled upon a huge serpent. The second man touches one of the elephant’s massive legs and determines that the object is a large tree. The third man touches one of the elephant’s ears and determines that he has stumbled upon a huge bird. All three men envision different things, because each examines only a small part of the elephant. Think of the elephant as the concept of QoS: different people see QoS as different concepts, because QoS problems are varied and often ambiguous. Hence, there is more than one way to characterize QoS. Briefly described, QoS is the
ability of a network element (e.g., an application, host, or router) to provide some level of assurance for consistent and timely network data delivery [3]. By nature, the basic Internet Protocol (IP) service available in most of the network is best effort. For instance, from a router's point of view (upon receiving a packet), this service could be described as follows:

• The router first determines where to send the incoming packet (its next hop). This is usually done by looking up the destination address in the forwarding table.
• Once it knows the next hop, it sends the packet to the interface associated with this next hop. If the interface is not able to send the packet immediately, the packet is stored in an output queue on that interface.
• If the queue is full, the arriving packet is dropped. If the queue already contains packets, the newcomer is subjected to extra delay due to the time needed to emit the older packets in the queue.

Best effort allows the complexity to stay in the end hosts, so the network can remain relatively simple. This scales well, as evidenced by the ability of the Internet to support its growth. As more hosts are connected, the network degrades gracefully. Furthermore, the resulting variability in delivery delay and packet loss does not adversely affect typical Internet applications (e.g., e-mail or file transfer). For applications with real-time requirements, however, delay, delay variation, and packet loss will cause problems. Generally, applications are of two main types:

• Applications that generate elastic traffic: the application would rather wait for reception of traffic in the correct order, without loss, than display incoming information at a constant rate (such as e-mail).
• Applications that generate inelastic traffic: timeliness of information is more important to the application than zero loss, and traffic that arrives after a certain delay is essentially useless (such as voice communication).
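The best-effort forwarding steps above can be sketched as a drop-tail FIFO per output interface. The forwarding table, interface names, and queue capacity below are hypothetical, and longest-prefix matching is reduced to a plain dictionary lookup:

```python
from collections import deque

class Interface:
    """Output interface with a finite drop-tail FIFO queue (best effort)."""
    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.queue) >= self.capacity:
            self.dropped += 1          # queue full: arriving packet is discarded
        else:
            self.queue.append(packet)  # older packets ahead add queuing delay

def forward(packet, forwarding_table, interfaces):
    """Look up the next hop, then hand the packet to that interface."""
    next_hop = forwarding_table[packet["dst"]]
    interfaces[next_hop].enqueue(packet)

# Hypothetical two-interface router receiving a small burst:
table = {"10.0.0.1": "if0", "10.0.1.1": "if1"}
ifaces = {"if0": Interface(capacity=2), "if1": Interface(capacity=2)}
for _ in range(4):
    forward({"dst": "10.0.0.1"}, table, ifaces)
# Two packets wait in if0's queue; the other two were dropped.
```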
In an IP-based network, applications run across User Datagram Protocol (UDP) or Transmission Control Protocol (TCP) connections. TCP guarantees delivery, at the cost of some overhead and session-layer sequencing of traffic. It also throttles back transmission rates to behave gracefully in the face of network congestion. By contrast, UDP is connectionless; no guarantee of delivery is made, and sequencing of information is left to the application itself. Most elastic applications use TCP for transmission, while many inelastic applications use UDP as a real-time transport. Inelastic applications are often those that demand a preferential class of service or some form of reservation to behave properly. However, many of the mechanisms that network devices use (such as traffic discard or TCP session control) are less effective on UDP-based traffic, since UDP does not offer TCP's self-regulation. What all packets have in common is that they are treated equally: there are no guarantees, no differentiation, and no attempt at enforcing fairness. Nonetheless, the network should try to forward as much traffic as possible with reasonable quality. One way to provide a guarantee to some traffic is to treat those packets differently from packets of other types of traffic. Increasing bandwidth is seen as a necessary first step for accommodating real-time applications, but it is still not enough. Even on a relatively unloaded network, delivery delays can vary enough to continue to affect time-sensitive applications adversely. To provide an appropriate service, some level of quantitative or qualitative determinism must be added to network services. This requires adding some “intelligence” to the net, to distinguish traffic with strict timing requirements from the rest.
Yet there remains a further challenge: in the real world, the end-to-end communication path consists of different elements utilizing several network layers and traversing domains managed by different service providers. Therefore, it is unlikely that QoS protocols will be used independently, and in fact, they are designed for use with other QoS technologies to provide top-to-bottom and end-to-end QoS between senders and receivers. What does matter is that each element has to provide QoS control services and the ability to map other QoS technologies in the correct manner. The following gives a brief overview
FIGURE 4.1 Network end-to-end communication path. (Two hosts on 100-Mbps Ethernet LANs are connected by routers across an STM-1 155-Mbps WAN; the delay components along the path are processing delay, queuing delay, transmission delay, and propagation delay.)
of end-to-end network behavior and some key QoS protocols and architectures. For a detailed description, refer to other chapters in this book.
4.2 Factors Affecting the Network Quality

A typical end-to-end communication path might look like that illustrated in Figure 4.1 and consist of two machines, each connected through a local area network (LAN) to an enterprise network. Further, these networks might be connected through a wide area network (WAN). The data exchange can be anything from a short e-mail message to a large file transfer, an application download from a server, or communication data from a time-sensitive application. While networks, especially local area networks, have been getting faster, perceived throughput at the application has not always increased accordingly. An application generally runs on a host CPU, and its performance is a function of the processing speed, memory availability, and overall operating system load. In many situations, it is this processing that is the real limiting factor on throughput, rather than the infrastructure that is moving the data [14]. Network interface hardware transfers incoming packets from the network to the computer’s memory and informs the operating system that a packet has arrived. Usually, the network interface uses the interrupt mechanism to do so. The interrupt causes the CPU to suspend normal processing temporarily and to jump to code called a device driver. The device driver informs the protocol software that a packet has arrived and must be processed. Similar operations occur in each intermediate network node. Routing devices pass packets along a chain of hops until the final address is reached. These hops are routing machines of various kinds that generally maintain a queue (or multiple queues) of outgoing packets on each outgoing physical port [2]. If these queues of outgoing data packets become full, a routing machine simply starts to discard packets randomly to ease the buildup of congestion. It is evident that such nodes are customized for forwarding operations, which are mostly processed in hardware.
In recent years, however, the Internet has seen increasing use of applications that rely on the timely, regular delivery of packets, and that cannot tolerate the loss of packets or the delay caused by waiting in queues. In general, the one-way delay is equivalent to the sum of single-hop delays suffered between each pair of consecutive pieces of equipment encountered on the path. Measurable factors [7], [8] that are used to describe network QoS are as follows.
4.2.1 Bandwidth Bandwidth (better described as data rate in this context) is the transmission capacity of a communications line, which is usually stated in bit/second. In reality, as data exchange approaches the maximum limit (in a shared environment), delays and collisions might mean a drop in quality. Basically, the bandwidths of all networks utilized in an end-to-end path need to be considered, as the narrowest section provides the maximum speed of data transfer for the entire path. A routing device needs to be capable of transmitting data at a rate commensurate with the potential bandwidth of the network segments that it is servicing. The cost of bandwidth has fallen in recent years, but demand has obviously gone up.
TABLE 4.1 Queuing Delays

Number of Queued      STM-1        STM-4        Gigabit Ethernet
1000-Bit Packets      (155 Mbps)   (622 Mbps)   (1 Gbps)
40 (80% load)         256 µs       64 µs        40 µs
80 (85% load)         512 µs       128 µs       80 µs
200 (93% load)        1280 µs      320 µs       200 µs
500 (97% load)        3200 µs      800 µs       500 µs
4.2.2 Throughput Throughput is the average of actual traffic transferred over a given link, in a given time span expressed in bit/second. It can be seen, for congestion-aware transport protocols such as TCP, as transport capacity = (data sent)/(elapsed time), where data sent represents the unique data bits transferred (i.e., not including header bits or emulated header bits). It should also be noted that the amount of data sent should only include the unique number of bits transmitted (i.e., if a particular packet is retransmitted, the data it contains should be counted only once). Hence, in such a case, the throughput is not only limited by the transmission window, but also limited by the value of the round-trip time.
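Both limits mentioned above, the transport-capacity definition and the window/round-trip bound, are simple ratios; the figures used below are hypothetical:

```python
def throughput_bps(unique_bits_delivered, elapsed_seconds):
    """Transport capacity = (data sent) / (elapsed time); retransmitted
    bits count once, and header bits are excluded by the caller."""
    return unique_bits_delivered / elapsed_seconds

def window_limit_bps(window_bytes, rtt_seconds):
    """Ceiling imposed on TCP by the transmission window and round-trip time."""
    return window_bytes * 8 / rtt_seconds

# Hypothetical transfer: 1 MB of unique payload delivered in 2 s.
rate = throughput_bps(1_000_000 * 8, 2.0)   # 4 Mbit/s
# A classic 64 KB window over a 100 ms round trip caps throughput near
# 5.24 Mbit/s, no matter how fast the individual links are.
cap = window_limit_bps(65535, 0.1)
```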
4.2.3 Latency In general, latency is the time taken to transmit a packet from a sending to a receiving node. This encompasses delay in a transmission path or in a device within the transmission path. The nodes might be end stations or intermediate routers. Within a single router, latency is the amount of time between the receipt of a data packet and its transmission, which includes processing and queuing delay, as described next, among other sources of delay.
4.2.4 Queuing Delay The major random component of delay (that is, the only source of jitter) for a given end-to-end path consists of queuing delay in the network. Queuing delay depends on the number of hops in the path and the queuing mechanisms used, and it also increases with the offered load leading to packet loss if the queues are filled up. The last packet in the queue has to wait (N*8)/X seconds before being emitted by the interface, where N is the number of bytes that have to be sent before the last queued packet and X is the sending rate (bit/s). Typical queuing delay values of state-of-the-art routers are summarized in Table 4.1. Values are about 0.5 to 1 ms; thus, it can be said that queuing delay in a well-dimensioned backbone network (using priority scheduling mechanisms, as described later) would not dramatically increase latency, even if there are five to eight hops within the path. At this point, it should be mentioned that queuing delay may be impaired by edge routers connecting high- and low-bandwidth links and could easily reach tens of milliseconds, thus increasing latency more noticeably.
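The (N*8)/X relation can be evaluated directly; for instance, for the 40-packet case over STM-1 and Gigabit Ethernet (values come out in microseconds):

```python
def queuing_delay_us(queued_bytes, rate_bps):
    """Wait of the last queued packet, (N * 8) / X, in microseconds."""
    return queued_bytes * 8 / rate_bps * 1e6

# 40 queued 1000-bit (125-byte) packets, as in the first row of Table 4.1:
d_stm1 = queuing_delay_us(40 * 125, 155e6)   # roughly 258 us on STM-1
d_gige = queuing_delay_us(40 * 125, 1e9)     # 40 us on Gigabit Ethernet
```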
4.2.5 Transmission Delay Transmission, or serialization delay, is the time taken to transmit all the bits of the frame containing the packet, i.e., the time between emission of the first bit of the frame and emission of the last bit (see also [4]). It is inversely proportional to the line speed or, in other words, the ratio between packet size (bit) and transmission rate (bps). For example, transmission of a 1500-byte packet over a 10-Mbps link takes 1.2 ms, whereas for a 64-kbps link it takes 187.5 ms (the protocol overhead is not considered in either case). In general, a small packet size and a high transmission rate lower the transmission time.
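The two serialization examples above can be verified directly:

```python
def transmission_delay_ms(packet_bytes, rate_bps):
    """Serialization time: packet size in bits divided by the line rate."""
    return packet_bytes * 8 / rate_bps * 1000

# The examples from the text (protocol overhead ignored in both cases):
fast = transmission_delay_ms(1500, 10e6)   # 1.2 ms over a 10-Mbps link
slow = transmission_delay_ms(1500, 64e3)   # 187.5 ms over a 64-kbps link
```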
4.2.6 Propagation Delay Propagation delay is the time between emission (by the emitting equipment) of the first bit (or the last bit) and the reception of this bit by the receiving equipment. It is mainly a function of the speed of light
and the distance traveled. For local area networks, propagation delay is almost negligible. For wide area connections, it typically adds 2 ms per 250 miles to the total end-to-end delay. One can assume that a well-designed homogeneous high-speed backbone network (e.g., STM-4*) would have a network delay (only propagation and queuing taken into account) of 10 ms when considering 10 hops using priority queuing mechanisms and a network extension of about 625 miles.
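A back-of-the-envelope check of the backbone example, using the 2 ms per 250 miles rule of thumb together with the per-hop queuing figure quoted in Section 4.2.4:

```python
def propagation_delay_ms(miles):
    """Wide-area rule of thumb from the text: about 2 ms per 250 miles."""
    return miles / 250 * 2

# Backbone example: 625 miles end to end plus 10 hops of
# priority-scheduled queuing at roughly 0.5 ms per hop.
prop = propagation_delay_ms(625)   # 5.0 ms of propagation
total = prop + 10 * 0.5            # about 10 ms of network delay
```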
4.2.7 Processing Delay

Most networks use a protocol suite that provides connectionless data transfer end to end, in our case IP. Link layer communication is usually implemented in hardware, but IP will usually be implemented in software, executing on the CPU in a communicating end station. Normally, IP performs very few functions. Upon input of a packet, it checks the header for correct form, extracts the protocol number, and calls the upper-layer protocol function. The executed path is almost always the same. Upon output, the operation is very similar, as shown in the following IP instruction counts:

• Packet receipt: 57 instructions
• Packet sending: 61 instructions

Since input occurs at interrupt time, arbitrary procedures cannot be called to process each packet. Instead, the system uses a queue along with message-passing primitives to synchronize communication. When an IP datagram arrives, the interrupt software must enqueue the packet and invoke a send primitive to notify the IP process that a datagram has arrived. When the IP process has no packets to handle, it calls the receive primitive to wait for the arrival of another datagram. Once the IP process accepts an incoming datagram, it must decide where to send it for further processing. If the datagram carries a TCP segment, it must go to the TCP module; if it carries a UDP datagram, it is forwarded to the UDP module. Because TCP is complex, most designs use a separate process to handle incoming segments. A consequence of having separate IP and TCP processes is that they must use an interprocess communication mechanism when they interact. Once TCP receives a segment, it uses the protocol port numbers to find the connection to which the segment belongs. If the segment contains data, TCP will add the data to a buffer associated with the connection and return an acknowledgment to the sender.
If the incoming segment carries an acknowledgment for outbound data, the input process must also communicate with the TCP timer process to cancel the pending retransmission. The process structure used to handle an incoming UDP datagram is quite different from that used for TCP. As UDP is much simpler than TCP, the UDP software module does not execute as a separate process. Instead, it consists of conventional procedures executed by the IP process to handle an incoming UDP datagram. These procedures examine the destination UDP port number and use it to select an operating system queue for the incoming datagram. The IP process deposits the UDP datagram on the appropriate port, where an application program can extract it [15].
4.2.8 Jitter Jitter is best described as the variation in end-to-end delay, and it has its main source in the random component of the queuing delay. Jitter can be expressed as the distortion of interpacket arrival times when compared to the interpacket departure times from the original sending station. For instance, packets are sent out at regular intervals, but may arrive at varying irregular intervals. Jitter is the variation in interval times. When packets are taking multiple paths to reach their destination, extreme jitter can lead to packets arriving out of order. Jitter is generally measured in milliseconds, or as a percentage of variation from the average latency of a particular connection.
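The distortion of interpacket arrival times relative to departure times can be quantified, for example, with the smoothed estimator used by RTP (RFC 3550, cited later in this chapter); the timestamps below are hypothetical milliseconds:

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RTP (RFC 3550):
    a smoothed mean deviation of consecutive transit-time differences."""
    j = 0.0
    for i in range(1, len(send_times)):
        transit_prev = recv_times[i - 1] - send_times[i - 1]
        transit_cur = recv_times[i] - send_times[i]
        d = abs(transit_cur - transit_prev)
        j += (d - j) / 16   # gain of 1/16 smooths out noise
    return j

# Packets sent every 20 ms but received at irregular intervals:
sent = [0, 20, 40, 60]
rcvd = [5, 27, 45, 70]
j = interarrival_jitter(sent, rcvd)   # > 0: the path added jitter
```

A perfectly regular path (constant transit time) yields an estimate of zero.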
*Synchronous digital hierarchy (SDH) defines n transport levels (hierarchy) called a synchronous transport module-n (STM-n).
FIGURE 4.2 Classification, queuing, and scheduling.
4.2.9 Packet Loss Packets that fail to arrive, or arrive so late that they are useless, contribute to packet loss. Lost (or dropped) packets are a product of insufficient bandwidth on at least one routing device on the network path. Some packets may arrive, but have been corrupted in transit and are therefore unusable. Note that loss is relative to the volume of data that is sent and is usually expressed as a percentage of data sent. In some contexts, a high loss percentage can mean that the application is trying to send too much information and is overwhelming the available bandwidth. Packet loss starts to be a real problem when the percentage of loss exceeds a specific threshold or when loss occurs in bursts. Thus, it is important to know both the percentages of lost packets and their distribution [5].
4.3 QoS Delivery

As packet-switched networks operate in a store-and-forward paradigm, one solution for service differentiation in the forwarding process is to give priority to packets requiring, for instance, an upper-bounded delay over other packets. Considering that queuing is the central component in the internal architecture of a forwarding device, it is not difficult to imagine that managing such queuing mechanisms appropriately is crucial for providing the underlying QoS. Hence, queuing can be seen as one of the fundamental means of differentiating service levels. The queuing delay can be minimized and kept under a certain value, even in the case of interface congestion. To achieve this, the forwarding device has to support classification, queuing, and scheduling (CQS) techniques: it classifies packets according to traffic type and its requirements, places packets on different queues according to this type, and finally schedules outgoing packets by selecting them from the queues in an appropriate manner (see Figure 4.2). The following descriptions of queuing disciplines focus on output queuing strategies, the predominant strategic location for store-and-forward traffic management and QoS-related queuing [3], common to all QoS policies. Queuing should never happen permanently and continuously; instead, it is used to deal with occasional traffic peaks.
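The classify-queue-schedule pipeline can be sketched as follows; the class names and the strict-priority scheduler are illustrative choices, not the only options a CQS device could use:

```python
from collections import deque

# Two traffic classes, one FIFO each; the scheduler always serves
# the expedited queue first (strict priority, one of several options).
queues = {"expedited": deque(), "best_effort": deque()}

def classify(packet):
    """Toy classifier: anything marked delay-sensitive is expedited."""
    return "expedited" if packet.get("delay_sensitive") else "best_effort"

def enqueue(packet):
    queues[classify(packet)].append(packet)

def schedule():
    """Select the next outgoing packet across the queues."""
    for name in ("expedited", "best_effort"):
        if queues[name]:
            return queues[name].popleft()
    return None

enqueue({"id": 1})
enqueue({"id": 2, "delay_sensitive": True})
first = schedule()   # the delay-sensitive packet leaves first
```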
4.3.1 FIFO Queuing First-in, first-out (FIFO) queuing is considered to be the standard method for store-and-forward handling of traffic from an incoming interface to an outgoing interface. Many router vendors have highly optimized forwarding performances that make this standard behavior as fast as possible. When a network operates in a mode with sufficient level of transmission capacity and adequate levels of switching capability, FIFO
queuing is highly efficient. This is because as long as the queue depth remains sufficiently short, the average packet-queuing delay is an insignificant fraction of the end-to-end packet transmission time. Otherwise, when the load on the network increases, the transient bursts raise significant queuing delay, and if the queue is full, all subsequent packets are discarded.
4.3.2 Priority Queuing One of the first queuing variations to be widely implemented was priority queuing. This is based on the concept that certain types of traffic can be identified and shuffled to the front of the output queue so that some traffic is always transmitted ahead of other types of traffic. Priority queuing may have an adverse effect on forwarding performance because of packet reordering (non-FIFO queuing) in the output queue. This method offers several levels of priority, and the granularity in identifying traffic to be classified into each queue is very flexible. Although the level of granularity is fairly robust, the more differentiation attempted, the more impact on computational overhead and packet-forwarding performance. Another possible vulnerability in this queuing approach is that if the volume of high-priority traffic is unusually high, normal traffic to be queued may be dropped because of buffer starvation. This usually occurs because of overflow caused by too many packets waiting to be queued and there is not enough room in the queue to accommodate them.
4.3.3 Class-Based Queuing Another queuing mechanism introduced several years ago is called class-based queuing (CBQ) or custom queuing (CQ). Again, this is a well-known mechanism used within operating system design intended to prevent complete resource denial to any particular class of service. CBQ is a variation of priority queuing, where several output queues can be defined. CBQ provides a mechanism to configure how much traffic can be drained off each queue in a servicing rotation. This servicing algorithm is an attempt to provide some semblance of fairness by prioritizing queuing services for certain types of traffic, while not allowing any one class of traffic to monopolize system resources. CBQ can be considered a primitive method of differentiating traffic into various classes of service, and for several years, it has been considered an efficient method for queue resource management. However, CBQ simply does not scale to provide the desired performance in some circumstances, primarily because of the computational overhead concerning packet reordering and intensive queue management in networks with very high speed links.
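The servicing rotation described above can be sketched with per-class byte counts; the class names and quotas are hypothetical, and, as in typical custom-queuing implementations, a class finishes the packet that crosses its quota:

```python
from collections import deque

class ClassBasedQueues:
    """CBQ/CQ sketch: each class is drained up to its configured byte
    count per rotation, so no class monopolizes the output link."""
    def __init__(self, byte_counts):
        self.byte_counts = byte_counts               # class -> bytes per round
        self.queues = {c: deque() for c in byte_counts}

    def enqueue(self, cls, size):
        self.queues[cls].append(size)

    def service_round(self):
        """One rotation over all classes; returns (class, size) pairs sent."""
        sent = []
        for cls, quota in self.byte_counts.items():
            drained = 0
            while self.queues[cls] and drained < quota:
                size = self.queues[cls].popleft()
                drained += size                      # may overshoot by one packet
                sent.append((cls, size))
        return sent

cbq = ClassBasedQueues({"voice": 1000, "bulk": 500})
for _ in range(3):
    cbq.enqueue("voice", 400)
    cbq.enqueue("bulk", 400)
round1 = cbq.service_round()   # voice drains fully; one bulk packet waits
```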
4.3.4 Weighted Fair Queuing Weighted fair queuing (WFQ) is another popular method of queuing that algorithmically attempts to deliver predictable behavior and to ensure that traffic flows do not encounter buffer starvation. It gives low-volume traffic flows preferential treatment and allows higher-volume traffic flows to obtain equity in the remaining amount of queuing capacity. WFQ uses a servicing algorithm that attempts to provide predictable response times and negate inconsistent packet transmission timing, which is done by sorting and interleaving individual packets by flow and queuing each flow based on the volume of traffic in the flow [6]. Typically, low-bandwidth streams, such as Voice-over-IP (VoIP), are given priority over larger-bandwidth consumers such as file transfers. The weighted aspect of WFQ depends on the way in which the servicing algorithm is affected by other extraneous criteria. This aspect is usually vendor specific, and at least one implementation uses the IP precedence bits in the type-of-service (ToS, or DiffServ Code Point (DSCP), as described later) field to weight the method of handling individual traffic flows. WFQ possesses some of the same characteristics as priority and class-based queuing: it simply does not scale to provide the desired performance in some circumstances, primarily because of computational overhead. However, if these methods of queuing (priority, CBQ, and WFQ) are moved completely into hardware instead of being done in software, the impact on forwarding performance can be reduced greatly.
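A minimal sketch of the flow-sorting idea: each packet receives a virtual finish time advanced by size/weight, and the smallest finish time is transmitted first. The flow names and weights are hypothetical, and a real implementation additionally tracks a system-wide virtual time, which is omitted here by assuming an initially idle link:

```python
import heapq

class WeightedFairQueue:
    """WFQ sketch: packets are interleaved per flow by virtual finish
    time, so low-volume flows are not starved by high-volume ones."""
    def __init__(self):
        self.finish = {}    # flow -> last virtual finish time
        self.heap = []
        self.count = 0      # insertion counter breaks finish-time ties

    def enqueue(self, flow, size, weight):
        f = self.finish.get(flow, 0.0) + size / weight
        self.finish[flow] = f
        self.count += 1
        heapq.heappush(self.heap, (f, self.count, flow, size))

    def dequeue(self):
        f, _, flow, size = heapq.heappop(self.heap)
        return flow, size

wfq = WeightedFairQueue()
wfq.enqueue("ftp", 1500, weight=1)    # bulk transfer, large packets
wfq.enqueue("ftp", 1500, weight=1)
wfq.enqueue("voip", 160, weight=4)    # small, highly weighted packets
flow, _ = wfq.dequeue()               # the VoIP packet goes out first
```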
4.4 Protocols to Improve QoS Delivering network QoS for a particular application implies minimizing the effects of sharing network resources (bandwidth, routers, etc.) with other applications. This means effective QoS aims to minimize delay, optimize throughput, and minimize jitter and loss. The reality is that network resources are shared with other competing applications. Some of the competing applications could also be time-dependent services (inelastic traffic); others might be the source of traditional, best-effort traffic. For this reason, QoS has the further goal of minimizing the parameters mentioned for a particular set of applications or users, but without adversely affecting other network users. In order to regulate network capacity, the network must classify traffic and then handle it in some way. The classification and handling may happen on a single device consisting of both classifiers and queues or routes. In a larger network, however, it is likely that classification will happen at the periphery, where devices can recognize application needs, while handling is performed at the core, where congestion occurs. The signaling between classifying devices and handling devices can come in a number of ways, like the ToS of an IP header or other protocol extensions. Classification can occur based on a variety of information sources, such as protocol content, media identifier, the application that generated the traffic, or extrinsic factors such as time of the day or congestion levels. 
Similarly, handling can be performed in a number of ways:
• Through traffic shaping (traffic arrives and is placed in a queue, where its forwarding is regulated; excess traffic will be discarded)
• Through various queuing mechanisms (first-in, first-out, priority weighting, and class-based queuing)
• Through throttling using flow control algorithms such as those used in TCP
• Through the selective discard of traffic to notify transmitters of congestion
• Through packet marking, which sends instructions to downstream devices that will shape the traffic
QoS protocols act in these ways, but they never create additional bandwidth; rather, they manage the existing bandwidth so that it is used more effectively. Briefly summarized, QoS is the ability of a network element (e.g., an application, host, or router) to provide some level of assurance for consistent and timely network data delivery. The following sections give a brief overview of some of the key QoS protocols and architectures.
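The traffic-shaping and selective-discard bullets above can be sketched with a token bucket, a common (though not the only) policing mechanism; the rate and burst parameters here are purely illustrative.

```python
def police(packets, rate, burst):
    """Token-bucket policer: tokens accrue at `rate` bytes/s up to
    `burst`; a packet is forwarded only if enough tokens remain,
    otherwise it is discarded (excess traffic dropped)."""
    tokens, last = burst, 0.0
    out = []
    for t, size in packets:               # (arrival time in s, size in bytes)
        tokens = min(burst, tokens + (t - last) * rate)
        last = t
        if size <= tokens:
            tokens -= size
            out.append((t, size))         # conforming: forward
        # else: non-conforming: drop
    return out

pkts = [(0.0, 500), (0.0, 500), (0.0, 500), (1.0, 500)]
print(police(pkts, rate=1000, burst=1000))
# → [(0.0, 500), (0.0, 500), (1.0, 500)]  (third back-to-back packet dropped)
```

The burst parameter bounds how much traffic may pass instantaneously; the rate bounds the long-term average, which is exactly the profile-conformance notion used by the DiffServ meters described later.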
4.4.1 Integrated Services

The Integrated Services (IntServ) architecture provides a framework for applications to choose among multiple controlled levels of delivery service for their traffic flows. Two basic requirements exist to support this framework. The first is for the nodes in the traffic path to support the QoS control mechanisms and guaranteed services. The second is for a mechanism by which the applications can communicate their QoS requirements to the nodes along the transit path, and by which the network nodes can communicate with each other about the requirements that must be provided for the particular traffic flow. All this is provided by the Resource Reservation Setup Protocol (RSVP) [9], which is best described as the QoS signaling protocol. The information presented here is intended to be a qualitative description of the protocol, as in [3]. There is a logical separation between the Integrated Services QoS control services and RSVP. RSVP is designed to be used with a variety of QoS control services, and the QoS control services are designed to be used with a variety of setup mechanisms [11]. RSVP does not define the internal format of the protocol objects related to characterizing QoS control services; rather, it can be seen as the signaling mechanism that transports the QoS control information. RSVP is analogous to other IP control protocols, such as the Internet Control Message Protocol (ICMP) or one of the many IP routing protocols. RSVP itself is not a routing protocol, but it uses the local routing table in routers to determine routes to the appropriate destinations.
Fundamentals in Quality of Service and Real-Time Transmission
FIGURE 4.3 Traffic flow of the RSVP Path and Resv messages.
In general terms, RSVP is used to convey QoS requests to all router nodes along the transit path of the traffic flows and to maintain in those routers the state necessary to actually provide the requested services. RSVP requests generally result in resources being reserved in each router in the transit path for each flow. RSVP makes the receiver, not the sender, responsible for requesting specific QoS services. This is an intentional design decision that attempts to accommodate efficiently large groups (e.g., multicast traffic), dynamic group membership (also for multicast), and diverse receiver requirements. There are two fundamental RSVP message types: the Resv message and the Path message, which provide the basic RSVP operation illustrated in Figure 4.3. An RSVP sender transmits Path messages downstream along the traffic path provided by a discrete routing protocol (e.g., Open Shortest-Path First (OSPF)). The Resv message is generated by the receiver and is transported back upstream toward the sender, creating and maintaining a reservation state in each node along the traffic path. RSVP can still function across intermediate nodes that are not RSVP capable. However, end-to-end resource reservations cannot then be made, because non-RSVP-capable devices in the traffic path cannot maintain reservation or Path state in response to RSVP messages. Although intermediate nodes that do not run RSVP cannot provide these functions, they may have sufficient capacity to be useful in accommodating tolerant real-time applications. Since RSVP relies on a discrete routing infrastructure to forward RSVP messages between nodes, the forwarding of Path messages by non-RSVP-capable intermediate nodes is unaffected: the Path message carries the IP address of the previous RSVP-capable node as it travels toward the receiver. Again, RSVP is not a routing protocol by itself.
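The Path/Resv exchange of Figure 4.3 can be caricatured in a few lines. This is a toy model, not the RFC 2205 state machine: the Path message records the previous RSVP-capable hop as it travels downstream, and the Resv message then follows that chain back upstream, installing reservation state only in RSVP-capable nodes while non-RSVP hops are skipped.

```python
def rsvp_reserve(path, receiver_request):
    """Toy RSVP exchange over a list of (node, rsvp_capable) hops,
    ordered sender -> receiver."""
    # Path message downstream: each RSVP hop remembers the previous one
    prev_hop = None
    path_state = {}
    for node, rsvp_capable in path:
        if rsvp_capable:
            path_state[node] = prev_hop
            prev_hop = node
    # Resv message upstream: install reservation state per RSVP hop
    reservations = {}
    hop = prev_hop                       # last RSVP-capable hop
    while hop is not None:
        reservations[hop] = receiver_request
        hop = path_state[hop]
    return reservations

# R2 is not RSVP capable: Path/Resv pass through it, but it holds no state
path = [("R1", True), ("R2", False), ("R3", True)]
print(rsvp_reserve(path, {"bandwidth_kbps": 64}))
# → {'R3': {'bandwidth_kbps': 64}, 'R1': {'bandwidth_kbps': 64}}
```

The model shows why end-to-end guarantees fail across non-RSVP hops: R2 forwards the messages but never appears in the reservation state.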
RSVP is designed to operate with current and future unicast and multicast routing protocols. An RSVP process consults the local routing database(s) to obtain routes. In the multicast case, for example, a host sends IGMP messages to join a multicast group and then sends RSVP messages to reserve resources along the delivery path(s) of that group. Routing protocols determine where packets get forwarded; RSVP is only concerned with the QoS of those packets that are forwarded in accordance with routing (Figure 4.4). Summing up, Integrated Services is capable of bringing enhancements to the IP network model to support real-time transmissions and guaranteed bandwidth for specific flows. In this context, a flow is defined as a distinguishable stream of related datagrams from a unique sender to a unique receiver that results from a single user activity and requires the same QoS. The Integrated Services architecture promises precise per-flow service provisioning, but it never succeeded as a commercial end-user product, which has mainly been attributed to its lack of scalability [16].
4.4.2 Differentiated Services

Differentiated Services (DiffServ) defines an architecture (RFC 2474 and RFC 2475) for implementing scalable service differentiation in the Internet. Here, a service defines some significant characteristics of packet transmission in one direction across a set of one or more paths within a network. These characteristics may
The common RSVP message header (Figure 4.4) is laid out as follows (32 bits per row):

    version (4) | flags (4) | message type (8) | RSVP checksum (16)
    send TTL (8) | (reserved) (8) | RSVP length (16)

• Version — The protocol version number; the current version is 1.
• Flags — No flag bits are defined yet.
• Message type — Possible values are: 1 Path, 2 Resv, 3 PathErr, 4 ResvErr, 5 PathTear, 6 ResvTear, and 7 ResvConf.
• RSVP checksum — The checksum.
• Send TTL — The IP TTL value with which the message was sent.
• RSVP length — The total length of the RSVP message in bytes, including the common header and the variable-length objects that follow.

FIGURE 4.4 Resource Reservation Protocol (RSVP).

The redefined ToS octet (Figure 4.5) contains:

    CP (6 bits) | CU (2 bits)

FIGURE 4.5 DiffServ Code Point.
be specified in quantitative or statistical terms of throughput, delay, jitter, and loss, or may otherwise be specified in terms of some relative priority of access to network resources. Service differentiation is desired to accommodate heterogeneous application requirements and user expectations, and to permit differentiated pricing of Internet service. Differentiated Services mechanisms do not use per-flow signaling and, as a result, do not consume per-flow state within the routing infrastructure. Different service levels can be allocated to different groups of users, which means that all traffic is distributed into groups or classes with different QoS parameters. This reduces the maintenance overhead in comparison to Integrated Services. Network traffic is classified and apportioned to network resources according to bandwidth management criteria. To enable QoS, network elements give preferential treatment to classifications identified as having more demanding requirements. DiffServ provides a simple and coarse method of classifying services of applications. The main goal of DiffServ is a more scalable and manageable architecture for service differentiation in IP networks [13]. The initial premise was that this goal could be achieved by focusing not on individual packet flows, but on traffic aggregates, large sets of flows with similar service requirements. By carefully aggregating a multitude of QoS-enabled flows into a small number of aggregates, giving a small number of differentiated treatments within the network, DiffServ eliminates the need to recognize and store information about each individual flow in core routers. This basic approach to scalability succeeds by combining a small number of simple packet treatments with a larger number of per-flow policies to provide a broad and flexible range of services. 
The externally observable forwarding treatment applied at a DiffServ-compliant node to a behavior aggregate is defined as the per-hop behavior (PHB). Each DiffServ flow is policed and marked at the first QoS-enabled downstream router according to a contracted service profile, or service-level agreement (SLA). Downstream from this router, a DiffServ flow is mingled with similar DiffServ traffic into an aggregate. All further forwarding and policing activities are then performed on these aggregates. Current proposals [12] use six bits of the IP version 4 (IPv4) ToS byte or the IPv6 traffic class byte, now called the DiffServ Code Point (DSCP) (Figure 4.5), for marking packets. The format of the redefined octet is as follows:
• CP — Six-bit Differentiated Services Code Point, which selects the PHB a packet experiences at each node
• CU — Currently unused (two bits)
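Because the code point occupies the upper six bits of the old ToS octet, marking and reading it is simple bit arithmetic, as this small sketch shows (the EF code point 101110 from the next section is used as the example value):

```python
def dscp_of(tos_byte):
    """The six-bit code point occupies the high-order bits of the
    former ToS byte; the two low-order bits are currently unused."""
    return tos_byte >> 2

def tos_for(dscp):
    """Place a six-bit DSCP into the upper bits of the octet."""
    return (dscp & 0x3F) << 2

assert dscp_of(tos_for(0b101110)) == 0b101110   # round-trips cleanly
print(hex(tos_for(0b101110)))                   # EF marks the octet as 0xb8
```

This is why EF-marked packets are often described as carrying ToS byte 0xB8 on the wire: the code point 101110 shifted left by two.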
FIGURE 4.6 Edge router: DiffServ classification and conditioning (classifier, meter, marker/conditioner).
There are currently two standard per-hop behaviors defined that effectively represent two service levels (traffic classes):
• Expedited forwarding (EF) — The objective of the EF PHB (RFC 3246) is to provide a low-loss, low-delay, low-jitter service that approximates a virtual leased line. The basic approach is to minimize the loss and delay experienced in the network by minimizing queuing delays. This can be done by ensuring that, at each node, the rate of departure of packets from the node is a well-defined minimum (shaping on egress points) and that the arrival rate at the node is always less than the defined departure rate (policing on ingress points). For example, to ensure that the incoming rate is always below the configured outgoing rate, any traffic that exceeds the traffic profile, which is defined by local policy, is discarded. Generally, expedited forwarding can be implemented in network nodes by a priority queue. The recommended DSCP value for the EF PHB is 101110; see [20].
• Assured forwarding (AF) — The AF PHB is defined in RFC 2597. Its objective is to provide a service that ensures that high-priority packets are forwarded with a greater degree of reliability than lower-priority packets. AF defines four priorities (classes) of traffic, receiving different bandwidth levels (sometimes described as the Olympic services: gold, silver, bronze, and best effort). There are three drop preferences within each priority class, resulting in 12 different DSCP values. The worse the drop preference, the greater the chance of being dropped during congestion. Hence, the AF PHB enables packets to be marked with different AF classes and, within each class, with different drop precedence values. Within a router, resources are allocated according to the different AF classes. If the resources allocated to a class become congested, then packets must be dropped.
The packets to be dropped are those with the higher drop precedence, as in [20]. Normally, the traffic entering a DiffServ network from a particular source should conform to a particular traffic profile; thus, the rate of traffic should not exceed some preagreed maximum. In the event that it does, excess traffic is not delivered with as high a probability as the traffic within the profile, which means it may be demoted but not necessarily dropped. The PHBs are expected to be simple and define forwarding behaviors that may suggest, but do not require, a particular implementation or queuing discipline. In general, a classifier selects packets based on one or more predefined sets of header fields. The mapping of the network traffic to the specific behaviors is indicated by the DSCP. The traffic conditioners enforce the rules of each service at the network ingress point. Finally, PHBs are applied to the traffic by the conditioner at a network ingress point according to predetermined policy criteria. The traffic may be marked at this point and routed according to the marking, and then unmarked at the network egress. Each DiffServ-enabled edge router implements traffic conditioning functions, which perform metering, shaping, policing, and marking of packets to ensure that the traffic entering a DiffServ network conforms to the SLA, as illustrated in Figure 4.6. The simplicity with which DiffServ prioritizes traffic belies its extensibility and power. Using RSVP parameters (as described in the next section) or specific application types to identify and classify constant-bit-rate (CBR) traffic can help to establish well-defined aggregate flows that may be directed to fixed-bandwidth pipes. DiffServ is more scalable at the cost of coarser service granularity, which may be the reason why it is not yet commercially available to end users; see also [16].
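The 12 AF code points mentioned above follow a regular bit pattern defined in RFC 2597: for class x and drop precedence y, DSCP = 8x + 2y. A few lines of code can enumerate the full table:

```python
def af_dscp(af_class, drop_prec):
    """AFxy code points (RFC 2597): DSCP = 8x + 2y, where x is the
    class (1..4) and y the drop precedence (1..3)."""
    assert 1 <= af_class <= 4 and 1 <= drop_prec <= 3
    return 8 * af_class + 2 * drop_prec

# All 12 AF code points, one row per class, in six-bit binary:
for c in range(1, 5):
    print([format(af_dscp(c, d), "06b") for d in range(1, 4)])
```

For example, AF11 (gold class, lowest drop precedence) is 001010, i.e., decimal 10, while AF43 is 100110, i.e., decimal 38.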
An MPLS label stack entry (Figure 4.7) occupies 32 bits:

    label (20 bits) | exp (3) | S (1) | TTL (8)

• Label — Carries the actual label value. When a labeled packet is received, the label value at the top of the stack is inspected to learn:
  • The next hop to which the packet is to be forwarded.
  • The operation to be performed on the label stack before forwarding; this operation may be to replace the top label stack entry with another, to pop an entry off the label stack, or to replace the top label stack entry and then push one or more additional entries onto the label stack.
• Exp — Reserved for experimental use.
• S — Bottom of stack: this bit is set to one for the last entry in the label stack and zero for all other label stack entries.
• TTL — Encodes a time-to-live value.

FIGURE 4.7 MPLS label structure.
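Since a label stack entry is a fixed 32-bit word, it can be packed and unpacked with shifts and masks. This sketch assumes the RFC 3032 field widths shown above (label 20 bits, exp 3, S 1, TTL 8):

```python
def pack_label_entry(label, exp, s, ttl):
    """One 32-bit MPLS label stack entry:
    label (20 bits) | exp (3) | S (1) | TTL (8)."""
    return (label << 12) | (exp << 9) | (s << 8) | ttl

def unpack_label_entry(entry):
    """Split a 32-bit entry back into its four fields."""
    return (entry >> 12, (entry >> 9) & 0x7, (entry >> 8) & 0x1, entry & 0xFF)

entry = pack_label_entry(label=100, exp=0, s=1, ttl=64)
print(unpack_label_entry(entry))   # → (100, 0, 1, 64)
```

Fixed offsets like these are exactly what lets hardware extract the label with a single mask-and-shift rather than parsing a variable-length header.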
4.4.3 Multi-Protocol Label Switching

As we have seen, IntServ and DiffServ take different approaches to solving the QoS challenge. A third approach, already in use, is Multi-Protocol Label Switching (MPLS). In contrast to the other two, it is not primarily a QoS solution, although it can be used to support QoS requirements. More specifically, MPLS has mechanisms to manage traffic flows of various granularities and is independent of the layer 2 and layer 3 protocols, such as asynchronous transfer mode (ATM) and IP. MPLS provides a means to map IP addresses to simple, fixed-length labels used by different packet-forwarding and packet-switching technologies. Additionally, MPLS interfaces to existing routing and switching protocols, such as IP, ATM, Frame Relay, RSVP, Open Shortest-Path First (OSPF), and others. In MPLS, data transmission occurs on label-switched paths (LSPs). An LSP is a sequence of labels at each and every node along the path from the source to the destination. Several label distribution protocols are used today, such as the Label Distribution Protocol (LDP) or RSVP, or label distribution piggybacked on routing protocols like the Border Gateway Protocol (BGP) and OSPF. High-speed switching of data is possible because the fixed-length labels are inserted at the very beginning of the packet or cell and can be used by hardware to switch packets quickly between links. MPLS is best viewed as a new switching architecture; it is basically a forwarding protocol that simplifies routing in IP-based networks. It specifies a simple and scalable forwarding mechanism, since it uses labels instead of a destination address to make the routing decision. The label value placed in an incoming packet header is used as an index into the forwarding table of the router (Figure 4.7).
This lookup requires only one access to the table, in contrast to a traditional routing table access, which might require many lookups [1]. One of the most important uses of MPLS is in the area of traffic engineering, which can be summarized as the modeling, characterization, and control of traffic to meet specified performance objectives. Such performance objectives might be traffic oriented or resource oriented. The former deals with QoS and includes aspects such as minimizing delay, jitter, and packet loss. The latter deals with optimum usage of network resources, particularly network bandwidth. The current situation with IP routing and resource allocation is that the routing protocols are not well equipped to deal with traffic engineering issues. For example, a protocol such as OSPF can actually promote congestion because it tends to force traffic down the shortest route, although other acceptable routes might be less loaded. With MPLS, a set of flows that share specific attributes can be routed over a given path. This capability has the immediate advantage of steering certain traffic away from the shortest path, which is likely to become congested before other paths. In conclusion, we may say that label switching offers scalability to networks by allowing a large number of IP addresses to be associated with one or a few labels. This approach further reduces the size of address
[Figure 4.8 shows two end-system protocol stacks (application through physical layer) connected end to end. A QoS-enabled application uses a QoS API; RSVP signaling runs top to bottom through the stack and end to end across the network, DiffServ and MPLS operate at the network layer, and 802.1p operates at the data link layer, together providing end-to-end QoS.]

FIGURE 4.8 QoS architecture.
(actually label) tables and allows a router to support more users or to set up fixed paths for different types of traffic. Since the main attributes of label switching are fast relay of the traffic, scalability, simplicity, and route control, label switching can be a valuable tool to reduce latency and jitter for data transmission on packet-switched networks.
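The single-lookup forwarding described above can be sketched as a dictionary lookup that drives one stack operation per hop. This is a toy label-switching router with a hypothetical label table, not a real LSR implementation:

```python
def forward(label_stack, table):
    """One MPLS forwarding step: a single lookup keyed on the top
    label yields the next hop and a stack operation (swap/pop/push)."""
    top = label_stack[-1]
    op, arg, next_hop = table[top]
    if op == "swap":
        label_stack[-1] = arg
    elif op == "pop":
        label_stack.pop()
    elif op == "push":
        label_stack.append(arg)
    return next_hop, label_stack

# Hypothetical LSR table: incoming label -> (operation, new label, next hop)
table = {17: ("swap", 42, "R2"), 42: ("pop", None, "R3")}
print(forward([17], table))   # → ('R2', [42])
```

Contrast this with longest-prefix matching on a destination address: here the entire forwarding decision is one exact-match lookup, which is what makes hardware label switching fast and simple.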
4.4.4 Combining QoS Solutions

The QoS solutions previously described take different approaches, and each has its advantages and disadvantages. The Integrated Services approach is based on a sophisticated background of research in QoS mechanisms and protocols for packet networks. However, the acceptance of IntServ by network providers and router vendors has been quite limited, at least so far, mainly due to scalability and manageability problems [10]. The scalability problems arise because IntServ requires routers to maintain control and forwarding state for every flow passing through them. Maintaining and processing per-flow state for gigabit or terabit links, with a large number of simultaneously active flows, is very difficult from an implementation point of view. Hence, the IntServ architecture makes the management and accounting of IP networks significantly more complicated. Additionally, it requires new application-network interfaces and can only provide service guarantees when all elements in the flow's path support IntServ. MPLS may be used as an alternative intradomain implementation technology. These architectures in combination can enable end-to-end QoS. End hosts may use RSVP requests with high granularity (e.g., bandwidth, jitter, threshold, etc.). Border routers at backbone ingress points can then map those RSVP reservations to a class of service indicated by a DSCP or to a dedicated MPLS path. At the backbone egress point, the RSVP provisioning may be honored again, to the final destination; see Figure 4.8. Such combinations clearly represent a trade-off between service granularity and scalability: as soon as flows are aggregated, they are no longer as isolated from each other as they possibly were in the IntServ part of the network. This means that, for instance, unresponsive flows can degrade the quality of responsive flows.
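The border-router mapping just described might look like the following sketch. The thresholds and class choices are purely illustrative assumptions, not a standardized policy: each operator defines its own rules for collapsing fine-grained RSVP parameters into a coarse DiffServ class at the backbone ingress.

```python
def map_reservation_to_dscp(resv):
    """Hypothetical border-router policy: fine-grained RSVP
    parameters collapse into a coarse DiffServ class at the
    backbone ingress (thresholds are illustrative only)."""
    if resv.get("max_jitter_ms", float("inf")) <= 10:
        return 0b101110                  # EF: leased-line-like service
    if resv.get("bandwidth_kbps", 0) >= 1000:
        return 0b010010                  # AF21: assured bandwidth
    return 0b000000                      # best effort

print(map_reservation_to_dscp({"bandwidth_kbps": 64, "max_jitter_ms": 5}))
# → 46 (the EF code point)
```

The essential point is the loss of granularity: many distinct reservations map onto one of a handful of code points once they cross into the aggregated core.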
The strength of a combination is that it gives network operators another opportunity to customize their network and fine-tune it based on QoS and scalability demands, as stated in [16]. Until now, IP has provided a best-effort service in which network resources are shared equitably. Adding quality-of-service support to the Internet raises significant concerns, since it enables differentiated services that represent a significant departure from the fundamental and simple design principles that made the Internet a success. Nonetheless, there is a significant need for IP QoS, and protocols have
evolved to address this need. The most viable solution today is a trade-off between protocol complexity and bandwidth scarcity, with the following results:
• Different QoS levels are used in the core network (e.g., four MPLS levels).
• Applications at the user side are distinguished by DiffServ mechanisms.
• The marked user traffic is mapped to the appropriate core layers.
Finally, we should always bear in mind that an application-to-application guarantee depends not only on network conditions but also on the overall performance of each end system and the way it supports real-time traffic, as discussed next.
4.5 Protocols Supporting Real-Time Traffic

This section gives a brief overview of protocols supporting end-to-end transport of real-time data. Note, however, that these protocols do not provide any QoS guarantees as previously described.
4.5.1 Real-Time Transport Protocol

The Real-Time Transport Protocol (RTP) provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video or simulation data, over multicast or unicast network services. Examples of real-time traffic are audio conversations between two people and the playing of individual video frames at the receiver as they arrive from the transmitter. RTP itself, however, does not provide all of the functionality required for the transport of data, and therefore applications typically run RTP on top of UDP to make use of its multiplexing and checksum services. RTP is best described as an encapsulation protocol. The data field of the RTP packet carries the real-time traffic, and the RTP header contains information about the type of traffic being transported [17]. RTP supports data transfer to multiple destinations using multicast distribution if provided by the underlying network, and it may also be used with other suitable underlying network or transport protocols. RTP is described in the IETF's RFC 3550 [18] as a protocol providing end-to-end delivery services, such as payload type identification, time stamping, and sequence numbering, for data with real-time characteristics. RTP itself does not provide any mechanism to ensure timely delivery or provide other quality-of-service guarantees, but relies on lower-layer services to do so. It does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the underlying network is reliable and delivers packets in sequence. The sequence numbers included in RTP allow the receiver to reconstruct the sender's packet sequence (Figure 4.9). RTP consists of two closely linked parts:
• The Real-Time Transport Protocol (RTP), to carry data that has real-time properties
• The Real-Time Transport Control Protocol (RTCP), to monitor the quality of service and to convey information about the participants in an ongoing session
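As an encapsulation protocol, RTP prepends a fixed 12-byte header (detailed in Figure 4.9). A minimal sketch of building one, assuming no padding, no extension, and no CSRC entries:

```python
import struct

def pack_rtp_header(pt, seq, timestamp, ssrc, marker=0):
    """Fixed 12-byte RTP header: V=2, P=0, X=0, CC=0, then
    marker/payload type, sequence number, timestamp, and SSRC,
    all in network byte order."""
    b0 = (2 << 6)                        # V=2 in the top two bits
    b1 = (marker << 7) | (pt & 0x7F)
    return struct.pack("!BBHII", b0, b1, seq, timestamp, ssrc)

hdr = pack_rtp_header(pt=0, seq=1, timestamp=160, ssrc=0x1234ABCD)
print(len(hdr), hdr[0] >> 6)   # → 12 2  (12-byte header, version 2)
```

The payload simply follows these 12 bytes; the receiver uses the sequence number and timestamp fields to reorder packets and compute playout timing and jitter.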
4.5.2 Real-Time Transport Control Protocol

RTP usually works in conjunction with another protocol, the Real-Time Transport Control Protocol (RTCP), which provides minimal control over the delivery and quality of the data. It is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets, for example, using separate port numbers with UDP. RTCP performs four main functions:
• Feedback information — This is used to check the quality of the data distribution. During an RTP session, RTCP control packets are periodically sent by each participant to all the other participants. These packets contain information such as the number of RTP packets sent, the number of packets
The fixed RTP header (Figure 4.9) is laid out as follows (32 bits per row):

    V (2) | P (1) | X (1) | CC (4) | M (1) | PT (7) | sequence number (16)
    timestamp (32)
    synchronization source (SSRC) identifier (32)
    contributing source (CSRC) identifiers (0 to 15 items, 32 bits each)

• V — Version: identifies the RTP version (V = 2).
• P — Padding: when set, the packet contains one or more additional padding octets at the end that are not part of the payload.
• X — Extension bit: when set, the fixed header is followed by exactly one header extension, with a defined format.
• CC — CSRC count: contains the number of CSRC identifiers that follow the fixed header (0 to 15 items, 32 bits each).
• M — Marker: the interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream.
• PT — Payload type: identifies the format of the RTP payload and determines its interpretation by the application. A profile specifies a default static mapping of payload type codes to payload formats. Additional payload type codes may be defined dynamically through non-RTP means.
• Sequence number — Increments by 1 for each RTP data packet sent and may be used by the receiver to detect packet loss and restore packet sequence.
• Timestamp — Reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations.
• SSRC — Synchronization source: this identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier.
• CSRC — Contributing source identifier list: identifies the contributing sources for the payload contained in this packet.

FIGURE 4.9 RTP header.
lost, etc., which the receiving application or any other third-party program can use to monitor network problems. The application might then change the transmission rate of the RTP packets to help reduce any problems.
• Transport-level identification — This is used to keep track of each of the participants in a session. RTCP carries a persistent transport-level identifier for an RTP source called the canonical name, or CNAME. Since the SSRC identifier may change if a conflict is discovered or a program is restarted, receivers require the CNAME to keep track of each participant. It is also used to associate multiple data streams from a given participant in a set of related RTP sessions, e.g., for the synchronization of audio and video.
• Transmission interval control — The first two functions require that all participants send RTCP packets; therefore, the rate must be controlled in order for RTP to scale up to a large number of participants. By having each participant send its control packets to all the others, each can independently observe the number of participants. This number is used to calculate the rate at which the packets are sent, which ensures that the control traffic will not overwhelm network resources. Control traffic is limited to at most 5% of the overall session traffic.
• Minimal session control — This optional function conveys minimal session control information, e.g., to display the name of a new user joining an informal session. It is most likely useful in loosely controlled sessions where participants enter and leave without membership control or parameter negotiation.
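The 5% rule in the transmission interval control bullet leads to a simple back-of-the-envelope calculation. This is a simplified version of the RFC 3550 rules, ignoring the sender/receiver bandwidth split and timer randomization:

```python
def rtcp_interval(members, session_bw_bytes_per_s, avg_rtcp_size=100):
    """Simplified RFC 3550 interval: RTCP bandwidth is capped at 5%
    of the session bandwidth and shared among all participants;
    the recommended minimum interval is 5 seconds."""
    rtcp_bw = 0.05 * session_bw_bytes_per_s
    return max(5.0, members * avg_rtcp_size / rtcp_bw)

print(rtcp_interval(members=4, session_bw_bytes_per_s=8000))     # → 5.0
print(rtcp_interval(members=1000, session_bw_bytes_per_s=8000))  # → 250.0
```

Note how the interval grows linearly with the number of participants: in a 1000-member session, each member reports only every few minutes, which is exactly what keeps control traffic bounded as the session scales.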
When an RTP session is initiated, an application defines one network address and two ports for RTP and RTCP. If there are several media formats, such as video and audio, a separate RTP session with its own RTCP packets is required for each one. Other participants can then decide which particular session, and hence medium, they want to receive. Overall, RTP provides a way in which real-time information can be transmitted over existing transport and underlying network protocols. It is important to realize that RTP is an application layer protocol and does not provide any QoS guarantees. However, it does allow various types of impairments, such as packet loss or jitter, to be detected. With the use of a control protocol, RTCP, it provides a minimal amount of control over the delivery of the data. To ensure that real-time data will be delivered on time, or indeed at all, however, RTP must be used in conjunction with other mechanisms or protocols that provide a reliable service.
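One impairment RTP makes measurable is interarrival jitter. RFC 3550 specifies the estimator sketched here: D is the difference in relative transit times of two packets (arrival spacing minus timestamp spacing), smoothed with a gain of 1/16.

```python
def update_jitter(jitter, arrival_prev, arrival, ts_prev, ts):
    """RFC 3550 interarrival-jitter estimator, in timestamp units:
    J += (|D| - J) / 16, where D compares arrival spacing with
    RTP timestamp spacing."""
    d = (arrival - arrival_prev) - (ts - ts_prev)
    return jitter + (abs(d) - jitter) / 16.0

j = 0.0
# (arrival time, RTP timestamp), both expressed in timestamp units
packets = [(0, 0), (165, 160), (330, 320), (480, 480)]
for (a0, t0), (a1, t1) in zip(packets, packets[1:]):
    j = update_jitter(j, a0, a1, t0, t1)
print(round(j, 3))   # → 1.193
```

A perfectly paced stream (arrival spacing equal to timestamp spacing) would keep D at zero; the small nonzero values here pull the smoothed estimate up gradually, which is the intended low-pass behavior.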
4.5.3 Real-Time Streaming Protocol

The Real-Time Streaming Protocol (RTSP) [19] establishes and controls either one or several time-synchronized streams of continuous media such as audio and video. RTSP does not typically deliver the continuous streams itself, although interleaving of the continuous media stream with the control stream is possible. RFC 2326 [21] describes RTSP as an application-level protocol that controls the delivery of streaming media with real-time properties. The media can be streamed over unicast or multicast networks. RTSP itself does not actually deliver the media data; this is handled by a separate protocol, and therefore RTSP can be described as a kind of network remote control for the server that is streaming the media. Sources of data can include both live data feeds and stored clips. RTSP is intended to control multiple data delivery sessions, provide a means for choosing delivery channels such as UDP, multicast UDP, and TCP, and provide a means for choosing delivery mechanisms based upon RTP. The underlying protocol used to control the delivery of the media is determined by the scheme used in the RTSP Uniform Resource Locator (URL). The schemes supported on the Internet are "rtsp:", which requires that the commands be delivered using a reliable protocol, e.g., TCP; "rtspu:", which identifies an unreliable protocol such as UDP; and "rtsps:", which requires a TCP connection secured by the Transport Layer Security (TLS) protocol. Therefore, a valid RTSP URL could be "rtspu://foo.bar.com:5150", which requests that the commands be delivered by an unreliable protocol to the server "foo.bar.com" on port 5150. There is no notion of an RTSP connection; instead, a server maintains a session labeled by an identifier. During an RTSP session, an RTSP client may open and close many reliable transport connections to the server to issue RTSP requests.
Alternatively, it may use a connectionless transport protocol such as UDP. RTSP is intentionally similar in syntax and operation to the Hypertext Transfer Protocol (HTTP), so that extension mechanisms to HTTP can in most cases also be added to RTSP. The protocol supports the following operations:
• Retrieval of media from a media server: The client can request a presentation description via HTTP or some other method.
• Invitation of a media server to a conference: A media server can be invited to join an existing conference, either to play back media into the presentation or to record all or a subset of the media in a presentation.
• Addition of media to an existing presentation: Particularly for live presentations, it is useful if the server can tell the client about additional media becoming available.
Since most servers are designed to handle more than one user at a time, the server needs to be able to maintain session state, i.e., whether it is setting up a session (the SETUP state), playing a stream (the PLAY state), etc. This allows it to correlate RTSP requests with the relevant stream. HTTP, however, is a stateless protocol, since typically there is no need to save the state of each client. Another area in which HTTP and RTSP differ is in the way the client and server interact. With HTTP the interaction is one way: the client issues a request for a document and the server responds. With
RTSP both the client and server can issue requests. To summarize, RTSP is more of a protocol framework than a protocol itself.
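The two ideas above (the URL scheme selects the command transport, and the server keeps per-session state keyed by an identifier rather than by a connection) can be sketched as follows. The scheme-to-transport mapping and the default RTSP port 554 come from RFC 2326; all class and function names, and the exact set of state transitions, are illustrative assumptions, not part of the chapter.

```python
# Sketch: choosing a command transport from an RTSP URL scheme, and the
# per-session state an RTSP server must keep (RTSP is stateful, unlike HTTP).
from urllib.parse import urlsplit

SCHEME_TRANSPORT = {
    "rtsp": "TCP",       # commands over a reliable transport
    "rtspu": "UDP",      # commands over an unreliable transport
    "rtsps": "TCP+TLS",  # reliable transport secured by TLS
}

def transport_for(url):
    """Return (transport, host, port) for an RTSP URL."""
    parts = urlsplit(url)
    if parts.scheme not in SCHEME_TRANSPORT:
        raise ValueError("not an RTSP URL: " + url)
    # RFC 2326 assigns RTSP the default port 554
    return SCHEME_TRANSPORT[parts.scheme], parts.hostname, parts.port or 554

class RtspSession:
    """Minimal session-state tracker: INIT -> READY (SETUP) -> PLAYING (PLAY)."""
    TRANSITIONS = {
        ("INIT", "SETUP"): "READY",
        ("READY", "PLAY"): "PLAYING",
        ("PLAYING", "PAUSE"): "READY",
        ("READY", "TEARDOWN"): "INIT",
        ("PLAYING", "TEARDOWN"): "INIT",
    }

    def __init__(self, session_id):
        # the server labels the session, not a transport connection
        self.session_id = session_id
        self.state = "INIT"

    def handle(self, method):
        nxt = self.TRANSITIONS.get((self.state, method))
        if nxt is None:
            raise ValueError(f"{method} not allowed in state {self.state}")
        self.state = nxt
        return self.state
```

The session identifier, not the TCP connection, correlates requests with a stream, which is why a client can open and close transport connections freely during one session.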
References

[1] Uyless D. Black, MPLS and Label Switching Networks, Prentice Hall, Englewood Cliffs, NJ, 2001.
[2] Douglas E. Comer, Computer Networks and Internets, 2nd edition, Prentice Hall, Englewood Cliffs, NJ, 1999.
[3] P. Ferguson and G. Huston, Quality of Service: Delivering QoS on the Internet and in Corporate Networks, John Wiley & Sons, New York, 1998.
[4] ITU-T Recommendation G.114, One-Way Transmission Time, International Telecommunication Union, 1996.
[5] S. Kalidindi, OWDP: A Protocol to Measure One-Way Delay and Packet Loss, Technical Report STR-001, Advanced Network and Services, September 1998.
[6] S. Keshav, An Engineering Approach to Computer Networking, Addison-Wesley, Reading, MA, 1997.
[7] T. Kushida, The traffic and the empirical studies for the Internet, in Proc. IEEE Globecom '98, Sydney, 1998, pp. 1142–1147.
[8] V. Paxson, Towards a Framework for Defining Internet Performance Metrics, Technical Report LBNL-38952, Network Research Group, Lawrence Berkeley National Laboratory, June 1996.
[9] RFC 2205, Resource ReSerVation Protocol (RSVP) Version 1 Functional Specification, September 1997.
[10] RFC 2208, Resource ReSerVation Protocol (RSVP) Version 1 Applicability Statement: Some Guidelines on Deployment, September 1997.
[11] RFC 2210, The Use of RSVP with IETF Integrated Services, September 1997.
[12] RFC 2474, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, December 1998.
[13] RFC 2475, An Architecture for Differentiated Services, December 1998.
[14] R. Seifert, Gigabit Ethernet: Technology and Applications for High-Speed LANs, Addison-Wesley, Reading, MA, 1998.
[15] W.R. Stevens, TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, Reading, MA, 1994.
[16] M. Welzl and M. Mühlhäuser, Scalability and quality of service: a trade-off?, IEEE Communications Magazine, 41, 32–36, 2003.
[17] Uyless D. Black, Voice over IP, Prentice Hall, Englewood Cliffs, NJ, 2000.
[18] RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003.
[19] H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol, Internet Draft, 1998.
[20] D. Collins, Carrier Grade Voice over IP, 2nd edition, McGraw-Hill, New York, 2003.
[21] RFC 2326, Real Time Streaming Protocol (RTSP), April 1998.
5
Survey of Network Management Frameworks

Mai Hoang
University of Potsdam

5.1 Introduction
5.2 Network Management Architecture
5.3 ISO Systems Management Framework
    Functional Aspects • Information Aspects • Organization Aspects • Communication Aspects
5.4 Internet Management Framework
    SNMPv1 • SNMPv2 • SNMPv3
5.5 ISO and Internet Management Standards: Analysis and Comparison
    SNMP and CMIP • MIBs and SMI • Network Management Functions
5.6 DHCP: IP Address Management Framework for IPv4
    IP Address Allocation Mechanisms • The IP Address Management of DHCP • Advantages and Disadvantages of DHCP for IPv4
5.7 Conclusions
References
5.1 Introduction

Computer networks and distributed processing systems continue to grow in scale and diversity in business, government, and other organizations. Three facts have become evident. First, new networks are added and existing ones are expanded almost as rapidly as new network technologies and products are introduced. The problems associated with network expansion affect day-to-day network operation management. Second, the network and its resources and distributed services have become indispensable to organizations. Third, more things can go wrong, disabling the network or degrading its performance to an unacceptable level. Large, heterogeneous networks cannot be put together and managed by human effort alone. Instead, their complexity dictates the use of a rich set of automated network management tools and applications. In response, the International Organization for Standardization (ISO) began work in 1978 to establish a standard for network management, the Open Systems Interconnection (OSI) network management, including the management model, functional areas, Common Management Information Services (CMIS), Common Management Information Protocol (CMIP), and management information base (MIB) [ROS90]. The network management model describes the main components of a network management tool for a managed network. For a given managed network, it is necessary to know which problem areas have to be considered. These problem areas are specified as the ISO management functions, which were already contained in the first ISO working draft of the management framework and gradually evolved into what is presently known as the five functional areas of the ISO management framework (fault, configuration, accounting, performance, and security management). The central pieces of ISO management are CMIS and CMIP, which managing and managed devices use for their communication. CMIS defines the set of services, i.e., the types of requests and responses and the actions they should invoke; CMIP is the protocol that carries these services. In addition to being able to pass information back and forth, the managing and managed devices need to agree on a set of variables and on means to initiate actions. The collection of this information is referred to as the management information base (MIB). Because of the slowness of the ISO standardization process, the complexity of the proposed new standard, and the urgent need for management tools, the Internet Engineering Task Force (IETF) devised the Simple Network Management Protocol (SNMP) [RFC1157], which was originally regarded as a provisional means for network management until the OSI management standards were complete, but subsequently became a de facto standard because of its dissemination and simplicity. SNMP consists of three parts: the protocol, the structure of management information (SMI), and the management information base (MIB). The protocol specifies the SNMP operations, the format of messages, and how messages are exchanged between a manager and an agent. The SMI is a set of rules allowing a user to specify the desired management information, e.g., by providing a means of naming and declaring the types of variables. Finally, the MIB is a structured collection of all managed objects maintained by a device.
The managed objects are structured as a hierarchical tree. In order to address several weaknesses of SNMP, SNMP version 2 (SNMPv2) was introduced in 1993. SNMPv2 provides more functionality and greater efficiency than the original version of SNMP, but for various reasons SNMPv2 did not succeed. Finally, SNMP version 3 (SNMPv3) was issued in 1998. SNMPv3 describes an overall framework for present and future versions of SNMP and adds security features to SNMP. Both the ISO and IETF frameworks are used for developing network management systems and applications for monitoring and controlling hardware as well as software components. In addition to these complex frameworks, other, simpler management frameworks have been developed, each focusing on a particular management task. One such framework addresses Internet Protocol (IP) address management, a task that has existed since the advent of networks: each component within a network must have a set of definite, unique parameters so the rest of the network can recognize it. Traditionally, most network administrators used pen and paper or a spreadsheet to keep track of their networks' parameters. While this was sufficient for small networks with a few hosts, increased management expenses naturally followed as the networks grew and changed. Thus, IP address management needed to be handled by automated management applications. In response to this need, the IETF created the Dynamic Host Configuration Protocol (DHCP). DHCP was developed from an earlier protocol called the Bootstrap Protocol (BOOTP) [RFC951, RFC1542], which was used to pass information to client systems during initial booting. BOOTP was designed to store and update static information for clients, including IP addresses: the BOOTP server always issued the same IP address to the same client.
As a result, while BOOTP addressed the need for central management, it did not address the problem of managing IP addresses as a dynamic resource. To address the need to manage dynamic configuration information in general, and dynamic IP addresses in particular, the IETF standardized DHCP as a framework for automatic IP version 4 (IPv4) address management. To standardize the DHCP environment, the IETF issued a series of RFCs [RFC1542, RFC2131, RFC2132] focused on DHCP extensions to BOOTP. The most recent of these standards is RFC 2131, which was issued in March 1997. DHCP is built on a client–server model. It includes two parts: the mechanisms for IP address allocation and the protocol for communication between DHCP servers and DHCP clients. The most important features of DHCP are as follows. First, DHCP permits a server to allocate IP addresses automatically. Automatic address allocation is needed for environments such as wireless networks, where a computer can attach and detach quickly. Second, DHCP allows a client to acquire all the configuration information it needs in a single message.
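The contrast between BOOTP's static client-to-address table and DHCP's treatment of addresses as a dynamic, reclaimable resource can be sketched with a minimal lease pool. All names and the lease policy are illustrative assumptions; the actual DHCP message exchange (DISCOVER/OFFER/REQUEST/ACK) is not modeled.

```python
# Sketch of DHCP-style dynamic address allocation: addresses are leased for a
# limited time, renewed on request, and reclaimed on expiry (BOOTP never
# reclaimed addresses, since its client->address mapping was static).
import ipaddress

class LeasePool:
    def __init__(self, network, lease_time=3600):
        net = ipaddress.ip_network(network)
        self.free = [str(h) for h in net.hosts()]  # unassigned addresses
        self.leases = {}                           # client_id -> (address, expiry)
        self.lease_time = lease_time

    def allocate(self, client_id, now):
        # A client that returns before its lease expires keeps its address;
        # otherwise it gets the next free one from the pool.
        if client_id in self.leases:
            addr, _ = self.leases[client_id]
        else:
            addr = self.free.pop(0)
        self.leases[client_id] = (addr, now + self.lease_time)
        return addr

    def expire(self, now):
        # Reclaim addresses whose leases have lapsed, making them
        # available to other clients.
        for cid, (addr, exp) in list(self.leases.items()):
            if exp <= now:
                del self.leases[cid]
                self.free.append(addr)
```

The expiry step is what makes the address pool a managed dynamic resource: a transient wireless client ties up an address only for the lease duration, not forever.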
FIGURE 5.1 Manager–agent architecture.
This chapter focuses on network management frameworks. First, it provides a comprehensive survey of the conceptual models, protocols, services, and management information bases of the ISO and IETF management frameworks. Following that, the DHCP for IPv4 is discussed in detail. The chapter is organized as follows. Section 5.2 describes the network management model. The ISO network management framework is discussed briefly in Section 5.3, while Section 5.4 presents the IETF management framework. A comparison of these management standards is given in Section 5.5. Section 5.6 provides an overview of DHCP. Section 5.7 concludes the chapter with an overview of the open problems in network management.
5.2 Network Management Architecture

The network management architecture used in the ISO and IETF frameworks is called the manager–agent architecture and includes the following key components:

• Managed devices
• Management stations
• Management protocols
• Management information
These pieces are shown in Figure 5.1 and described below. Network management is done from management stations, which are computers running special management software. These management stations contain a set of management application processes called managers for data analysis, fault recovery, and so on. The manager is the locus of activity for network management: it presents monitored information to users; it issues requests to managed devices in order to ask them to take some action; it receives responses to the requests; and it receives unsolicited reports from managed devices concerning the status of the devices. These reports are referred to as notifications and are frequently used to report problems, anomalies, or changes in the agent environment. A managed device is a piece of network equipment that resides in a managed network. Managed devices might be hosts, routers, switches, bridges, or printers. To be managed from a management station, a device must be capable of running a management process, called a (management) agent. These agents communicate with managers running on the management station and take local actions on the managed device under the command and control of the managers. An agent can act upon and respond to requests from a manager; furthermore, it can provide unsolicited notifications to a manager. Each managed device maintains one or more variables that describe its state (for example, the status of a network interface card or a set of configuration parameters for a piece of hardware or software). In the ISO and IETF management frameworks these variables are called managed objects. The collection of these managed objects is referred to as a management information base (MIB). These variables can be viewed and optionally modified by the managers. The network management protocol is needed for communication between managers and agents. This protocol allows the manager to query the status of managed devices and to initiate actions at these devices
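The manager–agent pattern just described can be sketched in a few lines: the agent holds the managed objects (a tiny MIB), answers get/set requests, and emits an unsolicited notification when it detects an exceptional local event. All class, method, and variable names here are illustrative, not taken from any standard.

```python
# Sketch of the manager-agent architecture of Figure 5.1.
class Agent:
    def __init__(self, mib, notify):
        self.mib = mib        # managed objects: variable name -> value
        self.notify = notify  # callback carrying notifications to the manager

    def get(self, var):
        # answer a manager's status query
        return self.mib[var]

    def set(self, var, value):
        # take a local action under the manager's control
        self.mib[var] = value

    def link_down(self, ifname):
        # exceptional local event -> unsolicited report to the manager
        self.notify({"event": "linkDown", "interface": ifname})

class Manager:
    def __init__(self):
        self.events = []

    def on_notification(self, report):
        # collect unsolicited reports for analysis / fault recovery
        self.events.append(report)

# wiring: one manager controlling one agent on a managed device
mgr = Manager()
agent = Agent({"sysName": "router-1", "ifOperStatus.1": "up"}, mgr.on_notification)
agent.set("ifOperStatus.1", "down")
agent.link_down("eth0")
```

A real deployment differs mainly in transport: requests, responses, and notifications travel over a management protocol such as SNMP or CMIP rather than direct function calls.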
by triggering the agents. Furthermore, agents can use the network management protocol to report exceptional events to the management stations. When describing any framework for network management, the following aspects must be addressed:

• Functional aspect: Specifies the management functional areas supported by managers and agents. This aspect relates to specific management functions that are carried out by the manager or agent.
• Information aspect: Defines the kind of information that will be exchanged between manager and agent. The information aspect deals with MIBs and the SMI.
• Communication aspect: Addresses the communication protocol between manager and agent for exchanging this information.
• Organization aspect: Deals with the definition of the principal structural components and the management architecture for a managed network.

The OSI and IETF management frameworks are discussed in the following subsections in terms of these aspects.
5.3 ISO Systems Management Framework

The first standard for network management was ISO 7498-4, which specifies the network management framework for the OSI model [ISO7498-4]. Although the production of this framework took considerable time, it was not generally accepted as an adequate starting point. It was therefore decided to issue an additional standard, which was called the Systems Management Overview [ISO10040]. Subsequently, ISO has issued a set of other standards for network management. Together these standards provide the basis for the OSI management framework.
5.3.1 Functional Aspects

OSI Systems Management standardization followed a top-down approach, with a number of systems management functional areas (SMFAs) identified first. The intention was not to describe exhaustively all relevant types of management activity, but rather to investigate the key requirements and address these through a generic management model. The identified areas were fault, configuration, accounting, performance, and security management, collectively referred to as FCAPS from their initials [Sta93a].

5.3.1.1 Fault Management

Fault management deals with the mechanisms for the detection, isolation, and correction of abnormal operations. Fault management includes functions to:

• Maintain and examine error logs
• Trace and identify faults
• Accept and act upon error notifications
• Carry out diagnostic tests and correct faults
5.3.1.2 Configuration Management

Configuration management is the set of facilities that allow network managers to exercise control over the configuration of the network components and OSI layer entities. Configuration management includes functions to:

• Record the current configuration
• Record changes in the configuration
• Initialize and close down managed objects
• Identify the network components
• Change the configuration of managed objects (e.g., a routing table)
5.3.1.3 Accounting Management

Accounting management deals with the collection and processing of accounting information for charging and billing purposes. It should enable accounting limits to be set and costs to be combined when multiple resources are used in the context of a service. Accounting management includes functions to:

• Inform users of the cost incurred thus far
• Inform users of the expected cost in the future
• Set cost limits

5.3.1.4 Performance Management

Performance management is the set of facilities that enable network managers to monitor and evaluate the performance of the system and layer entities. Performance management involves three main steps: (1) performance data are gathered on variables of interest to network administrators, (2) the data are analyzed to determine normal (baseline) levels, and (3) appropriate performance thresholds are determined for each important variable so that exceeding these thresholds indicates a network problem worth attention. Management entities continually monitor performance variables. When a performance threshold is exceeded, an alert is generated and sent to the network management system. Performance management provides functions to:

• Collect and disseminate data concerning the current level of performance of resources
• Maintain and examine performance logs for planning and analysis purposes

5.3.1.5 Security Management

Security management addresses the control of access to network resources according to local guidelines so that the network cannot be damaged and persons without appropriate authorization cannot access sensitive information. A security management subsystem, for example, can monitor users logging on to a network resource and can refuse access to those who enter inappropriate access codes. Security management provides support for management of:

• Authorization facilities
• Access control
• Encryption and key management
• Authentication
• Security logs
Soon after the first working drafts of the management framework appeared, ISO started to define protocol standards for each of the five SMFAs. After some time, an interesting observation was made: most of the functional area protocols used a similar set of elementary management functions. ISO therefore decided to stop further progression of the five functional area protocols and to concentrate on the definition of elementary management functions. Following this, a set of standards, e.g., object management, state management, relationship management, alarm reporting, event report management, and log control, has been issued as the general category of systems management functions (SMFs). Each SMF standard defines the functionality to support specific management functional area (SMFA) requirements. Moreover, these standards provide a mapping between the CMIS (discussed below) and the SMFs.
5.3.2 Information Aspects

The information aspects of OSI systems management deal with the resources that are being managed by agents. OSI systems management relies on object-oriented concepts. Therefore, each resource being managed is represented by a managed object. A managed object may represent either a logical resource, such as a user account, or a real resource, like an ATM switch. Managed objects that refer to resources specific to an individual layer are called (N)-layer managed objects. Managed objects that refer to resources that encompass more than one layer are called systems managed objects. According to the OSI Management Information Model [ISO10165-1], a managed object is defined in terms of the attributes it possesses, the operations that may be performed on it, the notifications that it may issue, and its interactions with other managed objects. The managed objects are defined using two standards: Abstract Syntax Notation One (ASN.1), to define data types, and the Guidelines for the Definition of Managed Objects (GDMO), to define managed objects [ASN90, ISO10165-1]. Under systems management, all the managed objects are represented in the so-called management information base (MIB). The managed object concept is refined in a number of additional standards that are called the structure of management information (SMI) standards [ISO10165-1, ISO10165-2, ISO10165-4, ISO10165-5, ISO10165-7]. The SMI identifies the data types that can be used in the MIB and how the resources within the MIB are represented and named [Sta93b].
5.3.3 Organization Aspects

The key elements of the OSI architectural model are the systems management application process (SMAP), the systems management application entity (SMAE), the layer management entity, and the management information base. The SMAP is the process within a managing device that is responsible for executing the network management functions; it has access to all parameters of managed devices and can therefore manage all aspects of a managed network. A SMAP works in cooperation with SMAPs on other managed networks. A SMAE is responsible for communication with other devices, especially with devices exercising control functions. CMIP is used by the SMAE as a standardized application-level protocol. The layer management entity is the logic embedded into each layer of the OSI architecture to provide the network management functions specific to that layer. To provide management of a distributed system, the elements in this architectural model must be implemented in a distributed fashion across all of the devices in a managed network. OSI systems management is organized in a centralized manner: a single manager may control several agents. Each agent contains a number of objects. Each object is a data structure that corresponds to an actual resource of the device to be managed. The SMAP is allowed to take on either a manager role or an agent role. A SMAP takes the manager role in a device that acts as a network control center, and the agent role in managed devices. The manager performs operations upon the agents, and the agents forward notifications to the managers. As open systems expand, the OSI management environment may be partitioned into a number of management domains. The partitioning can be based not only on the required management functions (security, accounting, performance, etc.) but also on other criteria (e.g., geographical).
5.3.4 Communication Aspects

The communication aspect deals with the exchange of systems management information between managers and agents within a managed network. Relating to this aspect, ISO has issued two standards, the Common Management Information Services (CMIS) and the Common Management Information Protocol (CMIP) [ISO9595, ISO9596]. CMIS provides OSI management services to management applications. It defines a set of management services, specifies types of requests and responses, and defines what each request and response can do. The management processes initiate these services in order to communicate remotely. Seven services used to handle management information have been standardized. Table 5.1 lists the CMIS services with their type and function. The CMIP provides the information exchange capability to support CMIS; it defines a set of protocol data units that implement the CMIS [ISO9596]. In particular, CMIP defines how the requests, responses, and notifications are encoded into messages and specifies which bearer service is used to transport those encoded messages between managers and agents. A CMIP request typically specifies one or more managed objects to which the request is to be sent. The correspondence between CMIS primitives and CMIP data units is described in [Sta99].
TABLE 5.1 CMIS Services

Service             Type                      Function

Notification Services
M-EVENT-REPORT      Confirmed/not confirmed   Gives notification of an event occurring on a managed object

Operation Services
M-GET               Confirmed                 Request for management data
M-SET               Confirmed/not confirmed   Modification of management data
M-ACTION            Confirmed/not confirmed   Execution of an action on a managed object
M-CREATE            Confirmed                 Creation of a managed object
M-DELETE            Confirmed                 Deletion of a managed object
M-CANCEL-GET        Confirmed                 Request to cancel any new responses to a previous M-GET request
5.4 Internet Management Framework

An interesting difference between the IETF and ISO is that the IETF takes a more pragmatic and result-driven approach than ISO. In the IETF it is, for instance, unusual to spend much time on architectural discussions; people prefer to use their time for the development of protocols and implementations. This difference explains why no separate management architecture and functional areas were defined in the first two versions of SNMP; only the communication aspect (as SNMP), the information aspect (as the SMI and MIB), and the security aspect have been standardized. SNMP is an application layer protocol that facilitates the exchange of management information between network devices. It is part of the Transmission Control Protocol (TCP)/IP suite and operates over the User Datagram Protocol (UDP). As described in Section 5.2, IETF network management is based on the manager–agent architecture. Figure 5.2 shows the architecture of Internet management. In this architecture, a manager process controls access to a central MIB at the management station and provides an interface to the management application. Furthermore, a manager may control many agents, whereby each agent interprets the SNMP messages and controls the agent's MIB. Section 5.2 provided an overview of the basic components of a management architecture used by ISO and the IETF. The IETF network management framework consists of:

• SNMP. SNMP is a management protocol for conveying information and commands between a manager and an agent running in a managed network device [KR01].
• MIB. Resources in networks may be managed by representing them as objects. Each object is a data variable that represents one aspect of a managed device. In the IETF network management framework, the representation of a collection of these objects is called the management information base (MIB) [RFC1066, RFC1157, RFC1212].
A MIB object might be a counter such as the number of IP datagrams discarded at a router due to errors, descriptive information such as generic information about the physical interfaces of the entity, or protocol-specific information such as the number of UDP datagrams delivered to UDP users.
FIGURE 5.2 Internet management architecture.
• SMI. The SMI [RFC1155] allows the formal specification of the data types that are used in a MIB and specifies how resources within a MIB are named. The SMI is based on the ASN.1 (Abstract Syntax Notation One) [ASN90] object definition language. However, since many SMI-specific data types have been added, the SMI should be considered a data definition language in its own right.
• Security and administration. These are concerned with monitoring and controlling access to managed networks and access to all or part of the management information obtained from network nodes.

In the following sections, an overview of the SNMP versions (SNMPv1, SNMPv2, SNMPv3) with respect to protocol operations, MIB, SMI, and security is given.
5.4.1 SNMPv1

The original network management framework is defined in the following documents:

• RFC 1155 and RFC 1212 define the SMI, the mechanisms used for specifying and naming managed objects. RFC 1215 defines a concise description mechanism for defining the event notifications that are called traps in SNMPv1.
• RFC 1157 defines SNMPv1, the protocol used for network access to managed objects and event notification.
• RFC 1213 contains definitions for a specific MIB (MIB-II) covering TCP, UDP, IP, routers, and other inhabitants of the IP world.

5.4.1.1 SMI

RFCs 1155, 1212, and 1215 describe the SNMPv1 structure of management information and are often referred to as SMIv1. Note that the first two SMI documents do not provide definitions of event notifications (traps). Because of this, the last document specifies a straightforward approach to defining the event notifications used with the SNMPv1 protocol.

5.4.1.2 Protocol Operations

In SNMPv1, communication between manager and agent is performed in a confirmed way. The manager at the network management station takes the initiative by sending one of the following SNMP protocol data units (PDUs): GetRequest, GetNextRequest, or SetRequest. GetRequest and GetNextRequest are used to get management information from the agent; SetRequest is used to change management information at the agent. After reception of one of these PDUs, the agent responds with a Response PDU, which carries the requested information or indicates failure of the previous request (Figure 5.3). It is also possible for the SNMP agent to take the initiative. This happens when the agent detects an extraordinary event such as a status change at one of its links. As a reaction, the agent sends a Trap PDU to the manager [RFC1215]. The reception of the trap is not confirmed (Figure 5.3(d)).
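The SNMPv1 request/response pattern can be sketched by simulating an agent that answers each request PDU with a Response PDU. The model is deliberately simplified: PDUs are plain dictionaries rather than BER-encoded messages, OIDs are dotted strings, and only the error status `noSuchName` is modeled; all names beyond the PDU types themselves are illustrative.

```python
# Sketch of SNMPv1 protocol operations: GetRequest, GetNextRequest (which
# walks the MIB in lexicographic OID order), and SetRequest, each answered
# by a Response PDU.
class Snmpv1Agent:
    def __init__(self, mib):
        # keep the MIB sorted by numeric OID so GetNextRequest can walk it
        self.mib = dict(sorted(mib.items(),
                               key=lambda kv: [int(x) for x in kv[0].split(".")]))

    def handle(self, pdu):
        kind, oid = pdu["type"], pdu.get("oid")
        if kind == "GetRequest":
            if oid not in self.mib:
                return {"type": "Response", "error": "noSuchName"}
            return {"type": "Response", "oid": oid, "value": self.mib[oid]}
        if kind == "GetNextRequest":
            key = [int(x) for x in oid.split(".")]
            for cand in self.mib:  # first OID lexicographically after 'oid'
                if [int(x) for x in cand.split(".")] > key:
                    return {"type": "Response", "oid": cand, "value": self.mib[cand]}
            return {"type": "Response", "error": "noSuchName"}  # end of MIB
        if kind == "SetRequest":
            self.mib[oid] = pdu["value"]
            return {"type": "Response", "oid": oid, "value": pdu["value"]}
        raise ValueError("unknown PDU type: " + kind)

# a tiny MIB: sysDescr.0 and sysUpTime.0 (standard OIDs, example values)
agent = Snmpv1Agent({"1.3.6.1.2.1.1.1.0": "router-1",
                     "1.3.6.1.2.1.1.3.0": 42})
```

Traps would travel in the opposite direction (agent to manager) and, as the text notes, receive no Response PDU at all.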
5.4.1.3 MIB

As noted above, the MIB can be thought of as a virtual information store, holding managed objects whose values collectively reflect the current state of the network. These values may be queried or set by a manager by sending SNMP messages to the agent. Managed objects are specified using the SMI discussed above. The IETF has been standardizing the MIB modules associated with routers, hosts, and other network equipment. This includes basic identification data about a particular piece of hardware and management information about the device's network interfaces and protocols. With the different SNMP standards, the IETF needed a way to identify and name the standardized MIB modules, as well as the specific managed objects within a MIB module. To do that, the IETF adopted ASN.1 as a standardized object identification (naming) framework. In ASN.1, object identifiers have a hierarchical structure, as shown in Figure 5.4. The global naming tree illustrated in Figure 5.4 allows for unique identification of objects, which correspond to leaf nodes. Describing an object identifier is accomplished by traversing the tree, starting
FIGURE 5.3 Initiative from manager (a, b, c) and agent (d).
at the root, until the intended object is reached. Several formats can be used to describe an object identifier, with integer values separated by dots being the most common approach. As shown in Figure 5.4, ISO and the Telecommunications Standardization Sector of the International Telecommunication Union (ITU-T) are at the top of the hierarchy. Under the Internet branch of the tree (1.3.6.1), there are seven categories. Under the management (1.3.6.1.2) and MIB-2 (1.3.6.1.2.1) branches of the object identifier tree, we find the definitions of the standardized MIB modules. The lowest level of the tree shows some of the important hardware-oriented MIB modules (system and interface) as well as modules associated with some of the most important Internet protocols. RFC 2400 lists all standardized MIB modules.

FIGURE 5.4 ASN.1 object identifier tree.

5.4.1.4 Security

The security capabilities deal with mechanisms to control access to network resources according to local guidelines so that the network cannot be damaged (intentionally or unintentionally) and persons without appropriate authorization have no access to sensitive information. SNMPv1 has no security features. For example, it is relatively easy to use the SetRequest command to corrupt the configuration parameters of a managed device, which in turn could seriously impair network operations. The SNMPv1 framework only allows the assignment of different access rights to variables (READ-ONLY, READ-WRITE), but performs no authentication. This means that anybody can modify READ-WRITE variables. This is a fundamental weakness in the SNMPv1 framework. Several proposals have been presented to improve SNMP, and in 1993 the IETF issued a new standard, SNMPv2.
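The hierarchical naming scheme of Figure 5.4 can be modeled with a small table mapping node names to dotted object identifiers. The OIDs for the branches shown in the figure are standard; the function names and the choice of which nodes to include are illustrative.

```python
# Sketch: an object identifier is the path of integer labels from the root
# of the ASN.1 naming tree, written with dots; a subtree relationship is a
# label-prefix relationship.
NODES = {
    "iso": "1",
    "identified-organization": "1.3",
    "dod": "1.3.6",
    "internet": "1.3.6.1",
    "mgmt": "1.3.6.1.2",
    "mib-2": "1.3.6.1.2.1",
    "system": "1.3.6.1.2.1.1",
    "interfaces": "1.3.6.1.2.1.2",
}

def oid_of(name):
    """Dotted OID for a named node of the tree."""
    return NODES[name]

def is_under(oid, branch):
    # append "." so that "1.3.6.1.2" does not falsely match "1.3.6.1.22"
    return oid == branch or oid.startswith(branch + ".")
```

The prefix test is what a manager uses implicitly when it walks "everything under mib-2": every standardized managed object lies in the subtree rooted at 1.3.6.1.2.1.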
5.4.2 SNMPv2
Like SNMPv1, the SNMPv2 network management framework [RFC1213, RFC1441, RFC1445, RFC1448, RFC1902] consists of four major components:
• RFC1441 and RFC1902 define the SMI, the mechanisms used for describing and naming objects for the purpose of management.
• RFC1213 defines MIB-2, the core set of managed objects for the Internet suite of protocols.
• RFC1445 defines the administrative and other architectural aspects of the framework.
• RFC1448 defines the protocol used for network access to managed objects.
The main achievements of SNMPv2 are improved performance, better security, and the possibility of building a hierarchy of managers.

5.4.2.1 Performance
SNMPv1 includes a rule that states that if the response to a GetRequest or GetNextRequest (each of which can ask for multiple variables) would exceed the maximum size of a packet, no information is returned at all. Because managers cannot determine the size of response packets in advance, they usually take a conservative guess and request just a small amount of data per PDU. To obtain all information, managers are required to issue a large number of consecutive requests. To improve performance, SNMPv2 introduced the GetBulk PDU. In contrast to Get and GetNext, the response to GetBulk returns as much information as possible, in lexicographic order.

5.4.2.2 Security
The original SNMP had no security features. To address this deficiency, SNMPv2 introduced a security mechanism based on the concepts of parties and contexts. An SNMPv2 party is a conceptual, virtual execution environment. When an agent or manager performs an action, it does so as a defined party, using the party’s environment as described in the configuration files. By using the party concept, an agent can permit one manager to perform a certain set of operations (e.g., read, modify) and another manager to perform a different set of operations. Each communication session with a different manager can have its own environment.
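The party concept can be sketched as a simple rights table that an agent consults before executing an operation. The party names and operation labels below are hypothetical examples, not identifiers from the SNMPv2 specifications.

```python
# Didactic sketch of SNMPv2 parties: each manager acts as a party whose
# environment defines the operations it may perform on the agent.
PARTY_RIGHTS = {
    "monitoring-party": {"get"},          # may only read
    "operations-party": {"get", "set"},   # may read and modify
}

def perform(party, operation, variable):
    """Execute `operation` only if the party's environment allows it."""
    if operation not in PARTY_RIGHTS.get(party, set()):
        return f"{operation} denied for {party}"
    return f"{operation} {variable} accepted"

print(perform("monitoring-party", "set", "sysContact"))
print(perform("operations-party", "set", "sysContact"))
```

The same request succeeds or fails depending solely on which party issues it, which is exactly the per-manager differentiation the text describes.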
The context concept is used to control access to the various parts of a MIB; each context refers to a specific part of a MIB. Contexts may overlap and are dynamically configurable, which means that contexts may be created, deleted, or modified during the network’s operational phase.

5.4.2.3 Hierarchy of Managers
Practical experience with SNMPv1 showed that in several cases managers are unable to manage more than a few hundred agent systems. The main cause of this restriction is the polling nature of
Survey of Network Management Frameworks
5-11
FIGURE 5.5 Hierarchy of managers.
SNMPv1. This means that the manager must periodically poll every system under its control, which takes time. To solve this problem, SNMPv2 introduced the so-called intermediate-level managers concept, which allows polling to be performed by a number of intermediate-level managers under the control of top-level managers (TLMs) via the InformRequest command provided by SNMPv2. Figure 5.5 shows an example of hierarchical managers: before the intermediate-level managers start polling, the top-level manager tells the intermediate-level managers which variables must be polled from which agents. Furthermore, the top-level manager tells the intermediate-level managers which events it wants to be informed about. After the intermediate-level managers are configured, they start polling. If an intermediate-level manager detects an event of interest to the top-level manager, a special Inform PDU is generated and sent to the TLM. After reception of this PDU, the TLM directly operates upon the agent that caused the event.
SNMPv2 dates back to 1992, when the IETF formed two working groups to define enhancements to SNMPv1. One of these groups focused on defining security functions, while the other concentrated on defining enhancements to the protocol. Unfortunately, the group tasked with developing the security enhancements broke into separate camps with diverging views on how security should be implemented. Two proposals (SNMPv2u and SNMPv2*) for the implementation of encryption and authentication were issued. Thus, the goal of the SNMPv3 working group was to continue the effort of the disbanded SNMPv2 working group to define a standard for SNMP security and administration.
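The division of labor between top-level and intermediate-level managers can be sketched as follows. Agent names, the watched variable, and the threshold are hypothetical; a real intermediate manager would poll via SNMP requests and report events via InformRequest PDUs.

```python
# Didactic sketch of the manager hierarchy in Figure 5.5: intermediate
# managers poll agents and forward only interesting events to the TLM.
class Agent:
    def __init__(self, name, values):
        self.name, self.values = name, values

    def get(self, variable):
        return self.values[variable]

class IntermediateManager:
    def __init__(self, agents, watched_variable, threshold):
        # The TLM configured this manager with the agents to poll,
        # the variable to watch, and the events (threshold) to report.
        self.agents = agents
        self.variable = watched_variable
        self.threshold = threshold

    def poll(self):
        """Poll every agent; collect Inform events for values over threshold."""
        informs = []
        for agent in self.agents:
            value = agent.get(self.variable)
            if value > self.threshold:
                informs.append((agent.name, self.variable, value))
        return informs

agents = [Agent("a1", {"ifErrors": 2}), Agent("a2", {"ifErrors": 40})]
ilm = IntermediateManager(agents, "ifErrors", threshold=10)
print(ilm.poll())  # only a2's value crosses the threshold
```

The top-level manager never polls the agents directly; it only receives the filtered Inform events, which is what relieves it of the polling load.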
5.4.3 SNMPv3
The third version of the Simple Network Management Protocol (SNMPv3) was published as a set of proposed standards in RFCs 2271 to 2275 [RFC2271, RFC2272, RFC2273, RFC2274, RFC2275], which describe an overall architecture plus a specific message structure and security features, but do not define a new SNMP PDU format. This version is built upon the first two versions of SNMP, and so it reuses the SNMPv2 standards documents (RFCs 1902 to 1908). SNMPv3 can be thought of as SNMPv2 with additional security and administration capabilities [RFC2570]. This section focuses on the management architecture and security capabilities of SNMPv3.

5.4.3.1 The Management Architecture
The SNMPv3 management architecture is also based on the manager–agent principle. The architecture described in RFC 2271 consists of a distributed, interacting collection of SNMP entities. Each entity implements a part of the SNMP capabilities and may act as an agent, a manager, or a combination of both. The SNMPv3 working group defines five generic applications (Figure 5.6) for generating and receiving SNMP PDUs: command generator, command responder, notification originator, notification receiver, and proxy forwarder. A command generator application generates the GetRequest, GetNextRequest, GetBulkRequest, and SetRequest PDUs and handles Response PDUs. A command responder application executes in an agent and receives, processes, and replies to the received GetRequest, GetNextRequest,
FIGURE 5.6 SNMPv3 entity.
GetBulkRequest, and SetRequest PDUs. A notification originator application also executes within an agent and generates Trap PDUs. A notification receiver accepts and reacts to incoming notifications. And a proxy forwarder application forwards request, notification, and response PDUs. The architecture shown in Figure 5.6 also defines an SNMP engine that consists of four components: dispatcher, message processing subsystem, security subsystem, and access control subsystem. This SNMP engine is responsible for preparing PDU messages for transmission, extracting PDUs from incoming messages for delivery to the applications, and performing security-related processing of outgoing and incoming messages.

5.4.3.2 Security
The security capabilities of SNMPv3 are defined in RFC 2272, RFC 2274, RFC 2275, and RFC 3415 [RFC3415]. These specifications include message processing, a user-based security model, and a view-based access control model. The message processing can be used with any security model as follows. For outgoing messages, the message processor is responsible for constructing the message header attached to the outgoing PDUs and for passing the appropriate parameters to the security entity so that it can perform authentication and privacy functions, if required. For incoming messages, the message processor is used for passing the appropriate parameters to the security model for authentication and privacy processing and for processing and removing the message headers of the incoming PDUs. The user-based security model (USM) specified in RFC 2274 uses the data encryption standard (DES) for encryption and hashed message authentication codes (HMACs) for authentication [Sch95]. USM includes means for defining procedures by which one SNMP engine obtains information about another SNMP engine, and a key management protocol that defines procedures for key generation, update, and use.
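The HMAC-based authentication used by USM can be sketched with the standard library. The key and PDU contents below are placeholders; real USM derives localized keys from a passphrase and truncates the digest (e.g., to the first 12 bytes), and the DES privacy step is omitted entirely.

```python
import hashlib
import hmac

# Sketch of USM-style message authentication: the sender appends an HMAC
# computed over the message; the receiver recomputes and compares it.
def authenticate(key: bytes, message: bytes) -> bytes:
    # Truncating the digest to 12 bytes mirrors the 96-bit tags USM uses.
    return hmac.new(key, message, hashlib.sha1).digest()[:12]

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking where the tags differ.
    return hmac.compare_digest(authenticate(key, message), tag)

key = b"localized-auth-key"           # hypothetical localized key
pdu = b"GetRequest sysUpTime"         # placeholder for an SNMP PDU
tag = authenticate(key, pdu)
print(verify(key, pdu, tag))          # authentic message verifies
print(verify(key, b"tampered PDU", tag))  # tampering is detected
```

Any change to the message or to the key invalidates the tag, which is what protects SNMPv3 messages against modification and spoofed origins.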
The view-based access control model implements the services required for an access control subsystem [RFC2275]. It makes an access control decision that is based on the requested resource, the security model and security level used for communicating the request, the context to which access is requested, the type of access requested, and the actual object for which access is requested.
5.5 ISO and Internet Management Standards: Analysis and Comparison The purpose of this section is to compare the two different network management frameworks described in the previous sections. This comparison focuses on the four management aspects described above (functional, information, communication, organization). In particular, the network management protocols (SNMP and CMIP), the management information base (MIB), and the management functions, management architectures, and security capabilities of these two frameworks are discussed. Possible solutions to some disadvantages are also presented.
5.5.1 SNMP and CMIP
The biggest advantage of SNMP over CMIP is its simple design, which makes it easy to implement and as easy to use on a small network as on a large one. Users can specify the variables to be monitored in a straightforward manner. From a low-level perspective, each variable consists of the following information:
• Variable name
• Its data type
• Its access attributes (READ-ONLY or READ-WRITE)
• Its value
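The four components listed above map naturally onto a small record type. The sketch below is hypothetical; it only mirrors the access check an agent performs when a SetRequest arrives.

```python
from dataclasses import dataclass

# Minimal model of the four components of an SNMP variable listed above.
@dataclass
class SnmpVariable:
    name: str          # object identifier, e.g. "1.3.6.1.2.1.1.3"
    data_type: str     # e.g. "TimeTicks", "OCTET STRING"
    access: str        # "READ-ONLY" or "READ-WRITE"
    value: object

def set_request(var: SnmpVariable, new_value) -> bool:
    """Reject writes to READ-ONLY variables, mirroring the access check."""
    if var.access != "READ-WRITE":
        return False
    var.value = new_value
    return True

sys_uptime = SnmpVariable("1.3.6.1.2.1.1.3", "TimeTicks", "READ-ONLY", 0)
print(set_request(sys_uptime, 42))  # False: the variable is read-only
```

Note that without authentication (the SNMPv1 weakness discussed earlier), this access attribute is the only barrier against modification.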
Another advantage of SNMP is that it is in wide use today around the world. It has become so popular that no other network management protocol appears likely to replace it. As a result, almost all major vendors of network hardware, such as bridges and routers, design their products to support SNMP, making it very easy to deploy. SNMP also has several disadvantages. The first deficiency of SNMP is that it has some large security holes that can give network intruders access to managed devices. Intruders could also potentially shut down some terminals. To solve this problem, SNMPv2 and SNMPv3 have added security mechanisms, as described above, that help combat these security problems. In comparison with SNMP, CMIP has a number of advantages. The biggest is that a CMIP agent does not merely relay information to and from a terminal, as in SNMP: it can perform management functions on its own instead of being restricted to gathering information for remote processing by a manager. Another advantage of the CMIP approach is that it addresses many of the shortcomings of SNMP. For instance, it has built-in security management facilities that support authorization, access control, and security logs. The result is a safer system from the beginning; no security upgrades are necessary. CMIP has many advantages, but it has seen little implementation. One problem of CMIP is that it needs more system resources than SNMP. Furthermore, a full implementation of CMIP requires adding more processes to network elements. One possible work-around is to decrease the size of the protocol by changing its specifications. Another problem with CMIP is that it is very difficult to program.
5.5.2 MIBs and SMI
MIB and SMI represent the information aspects of a network management framework. In the ISO management framework, managed objects within the MIB are complex and have sophisticated data structures with three attributes: variable attributes that represent the variable characteristics, variable behaviors that define what actions can be triggered on that variable, and notifications that generate an event report whenever a specific event occurs [Sta99]. In contrast, in the SNMP framework, variables are only used to relay information to and from managers. The SNMP MIB concept has two important disadvantages. The first is that the user has to know the names and meanings of (thousands of) different variables, which can be a daunting task. The second is the lack of variable aggregation: when the user wants to inquire about the values contained in an array, he has to ask separately for each element instead of naming the array once. The latter problem has been fixed in the newer releases of SNMP: SNMPv2 and SNMPv3 provide means for aggregating variables, e.g., through the new GetBulkRequest service. In fact, so many new features have been added that the formal specifications for SNMP MIBs have expanded considerably.
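The difference between element-by-element retrieval (GetNext) and aggregated retrieval (GetBulk) can be simulated over an in-memory, lexicographically ordered MIB. The OIDs and values below are illustrative; no real SNMP messages are exchanged.

```python
# Compare GetNext (one variable per request) with GetBulk (many variables
# per response) over a toy MIB kept in lexicographic OID order.
MIB = {  # OID tuple -> value
    (1, 3, 6, 1, 2, 1, 1, 1): "sysDescr",
    (1, 3, 6, 1, 2, 1, 1, 3): "sysUpTime",
    (1, 3, 6, 1, 2, 1, 2, 1): "ifNumber",
    (1, 3, 6, 1, 2, 1, 2, 2): "ifTable",
}
OIDS = sorted(MIB)

def get_next(oid):
    """Return the first (oid, value) pair lexicographically after `oid`."""
    for candidate in OIDS:
        if candidate > oid:
            return candidate, MIB[candidate]
    return None  # end of MIB view

def get_bulk(oid, max_repetitions):
    """Return up to `max_repetitions` pairs after `oid` in one response."""
    result, cur = [], oid
    for _ in range(max_repetitions):
        nxt = get_next(cur)
        if nxt is None:
            break
        result.append(nxt)
        cur = nxt[0]
    return result

# Walking this subtree costs 4 GetNext round trips but 1 GetBulk exchange.
print(len(get_bulk((1, 3, 6, 1, 2, 1), 10)))
```

The round-trip saving is exactly the aggregation benefit the text attributes to GetBulkRequest.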
5.5.3 Network Management Functions
One advantage of the ISO management framework over the IETF framework is that ISO has defined five specific management functional areas, which help users develop management applications. In contrast, the IETF management framework has not defined any specific network management functions; these have to be provided entirely by the user. In fact, the IETF management standards explain
how individual management operations should be performed, but they do not specify the sequence in which these operations should be carried out to solve particular management problems.
5.6 DHCP: IP Address Management Framework for IPv4
In the previous sections, two network management frameworks have been discussed. These standards are used for developing automated network management tools and applications for monitoring and maintaining the network. In this section, the Dynamic Host Configuration Protocol (DHCP), an IP address management framework developed by the IETF, will be discussed. Unlike the standards described before, DHCP is based on neither the ISO nor the IETF management frameworks. Each computer that can connect to the Internet needs a unique IP address. When an organization sets up its computers with a connection to the Internet, an IP address must be assigned to each machine. In the early phase of the Internet, the administrator had to manually assign an IP address to each computer, and if computers were moved to another part of the network, a new IP address had to be entered. With the daily changes and additions of new IP addresses, it has become extremely difficult to keep track of the IP address records across the multitude of IP nodes and subnets. Problems involving duplicate IP addresses, missing devices, and overflows of allocated IP address pools can bring down parts or the whole of a network until the problems are manually remedied. To overcome these problems, the IETF has developed the Dynamic Host Configuration Protocol [RFC2131, RFC2132, RFC3046], which allows for the automatic assignment of IP addresses to devices as they connect to the network. DHCP allows a computer to acquire all the configuration information it needs in a single message. Furthermore, this protocol permits the allocation of IP addresses automatically. To use DHCP’s dynamic address allocation mechanism, the network administrator must configure a DHCP server by supplying a set of IP addresses. Whenever a new computer connects to the network, this computer contacts the DHCP server and requests an IP address.
The server chooses one of the addresses the administrator specified and allocates that address to the computer. In the next subsections, IP address allocation and IP address management within DHCP are discussed in detail.
5.6.1 IP Address Allocation Mechanisms
DHCP supports three mechanisms for IP address allocation:
• Automatic allocation: The DHCP server assigns a permanent IP address to a computer when it first attaches to the network.
• Dynamic allocation: The DHCP server assigns an IP address to a computer for a limited period of time (or until the client explicitly relinquishes the address). This mechanism is useful for assigning an address to a computer that will be connected to the network only temporarily or for sharing a limited IP address pool among a group of clients that do not need permanent IP addresses.
• Manual allocation: The network administrator configures a specific address for a specific computer.
A particular network will use one or more of these mechanisms, depending on the policies of the network administrator.
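Dynamic allocation with lease expiry can be sketched with a small pool class. The addresses, lease length, and client identifiers are hypothetical examples; a real DHCP server would also persist its bindings and handle explicit renewals and releases.

```python
import time

# Minimal sketch of DHCP-style dynamic allocation: addresses are leased
# for a limited time and reclaimed when the lease expires.
class LeasePool:
    def __init__(self, addresses, lease_seconds):
        self.free = list(addresses)
        self.lease_seconds = lease_seconds
        self.leases = {}  # client_id -> (address, expiry time)

    def allocate(self, client_id, now=None):
        now = time.time() if now is None else now
        self._reclaim_expired(now)
        if client_id in self.leases:   # renew the existing lease
            addr, _ = self.leases[client_id]
        elif self.free:
            addr = self.free.pop(0)
        else:
            return None                # pool exhausted
        self.leases[client_id] = (addr, now + self.lease_seconds)
        return addr

    def _reclaim_expired(self, now):
        for cid, (addr, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[cid]
                self.free.append(addr)

pool = LeasePool(["192.168.1.10", "192.168.1.11"], lease_seconds=3600)
print(pool.allocate("client-a", now=0))     # first address leased
print(pool.allocate("client-b", now=0))     # second address leased
print(pool.allocate("client-c", now=0))     # None: pool exhausted
print(pool.allocate("client-c", now=4000))  # expired lease reclaimed
```

This is the sharing behavior the dynamic-allocation bullet describes: a small pool serves more clients than it has addresses, as long as leases expire.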
5.6.2 The IP Address Management of DHCP
In this subsection, DHCP is discussed from the viewpoint of four aspects: organization, information, function, and communication, which were presented in the previous sections while describing the ISO and IETF management frameworks.

5.6.2.1 Organization Aspect
DHCP is built on a client–server model. The DHCP system consists of three types of devices: clients, relays, and servers. DHCP servers provide configuration information for one or several subnets. A DHCP
FIGURE 5.7 Communication between DHCP server and DHCP client.
FIGURE 5.8 The DHCP PDU format.
client is a host configured using information obtained from DHCP servers. If a client and a server reside on different networks, then a relay agent on the client’s network is needed to relay broadcast messages between the server and the client. The organization architecture is shown in Figure 5.7. A DHCP server in a network receives DHCP requests from a client and, if a dynamic address allocation policy is selected, allocates an IP address to the requesting client.

5.6.2.2 Information Aspect
The information aspects of DHCP deal with the network parameters (configuration parameters and IP addresses) exchanged between DHCP servers and DHCP clients, and with the persistent storage of these parameters. The DHCP server stores a key–value entry for each client, where the key is some unique identifier and the value contains the configuration parameters for the client. A client can query the DHCP server to retrieve its configuration parameters. The client’s interface to the configuration parameter repository consists of protocol messages to request configuration parameters and responses from the server carrying those parameters.

5.6.2.3 Functional Aspect
The functions of DHCP are defined through DHCP PDUs. The format of a DHCP PDU is shown in Figure 5.8, and Table 5.2 describes the fields in a DHCP message. There are eight message types for DHCP: five of them are sent from the client to the server, and the other three are sent from the server to the client. These message types are described in Table 5.3.

5.6.2.4 Communication Aspect
The communication aspect deals with the rules governing the exchange of DHCP PDUs between a DHCP client and a DHCP server. The client–server interaction can be classified into two cases: (1) client–server interaction for allocating an IP address, and (2) client–server interaction for reusing a previously allocated IP address.
In both cases, the communication between clients and servers is performed in a confirmed way and initiated by clients.
TABLE 5.2 DHCP Message Field Description

Field     Description
op        Message type (BOOTREQUEST, BOOTREPLY); identifies whether a message is sent from a client to a server (BOOTREQUEST) or from a server to a client (BOOTREPLY)
xid       Transaction ID, a random number chosen by the client, used by the client and server to associate messages and responses between a client and a server
ciaddr    Client IP address; only filled in if the client can respond to ARP requests
yiaddr    “Your” (client) IP address
giaddr    Relay agent IP address, used in booting via a relay agent
sname     Optional server host name
file      Boot file name
options   Optional parameters field
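The fixed-length header whose fields Table 5.2 describes can be packed with Python’s struct module; the layout below follows RFC 2131 (236 bytes precede the options field). All field values shown are placeholders.

```python
import socket
import struct

# Sketch of packing the fixed-length DHCP header (per RFC 2131).
# The variable-length options field is omitted.
BOOTREQUEST = 1  # op value for client-to-server messages

def pack_dhcp_header(xid, ciaddr="0.0.0.0", yiaddr="0.0.0.0",
                     giaddr="0.0.0.0"):
    return struct.pack(
        "!BBBB I HH 4s 4s 4s 4s 16s 64s 128s",
        BOOTREQUEST,  # op: message direction
        1,            # htype: 10 Mb Ethernet
        6,            # hlen: MAC address length
        0,            # hops: incremented by relay agents
        xid,          # transaction ID chosen by the client
        0, 0,         # secs, flags
        socket.inet_aton(ciaddr),     # client IP address
        socket.inet_aton(yiaddr),     # "your" (offered) IP address
        socket.inet_aton("0.0.0.0"),  # siaddr: next server
        socket.inet_aton(giaddr),     # relay agent IP address
        b"\x00" * 16,  # chaddr: client hardware address
        b"",           # sname: optional server host name (zero-padded)
        b"",           # file: boot file name (zero-padded)
    )

header = pack_dhcp_header(xid=0x12345678)
print(len(header))  # 236 bytes: the fixed DHCP header size
```

The `!` prefix selects network byte order, matching how the fields appear on the wire.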
TABLE 5.3 DHCP Message Types

PDU (Message)   Description

Sent from client to server:
DHCPDISCOVER    Client broadcast to locate available servers
DHCPREQUEST     Requesting parameters from one server
DHCPDECLINE     Indicating an IP address is already in use
DHCPINFORM      Asking for a local configuration parameter
DHCPRELEASE     Relinquishing an IP address and canceling the remaining lease

Sent from server to client:
DHCPOFFER       Response to DHCPDISCOVER with an offer of configuration parameters
DHCPACK         Acknowledgment with configuration parameters, including a committed IP address
DHCPNAK         Refusing a request for configuration parameters (e.g., requested IP address already in use)
• The client–server interaction for allocating an IP address is performed as follows [RFC2131]:
1. A client that attaches to the network for the first time takes the initiative by sending a DHCPDISCOVER broadcast message to locate available servers. The DHCPDISCOVER message may include options that suggest values for the network address and lease duration.
2. DHCP servers receiving the DHCPDISCOVER message may or may not return a DHCPOFFER (many servers may receive the same DHCPDISCOVER message). If a server decides to respond, it puts an available address into the “yiaddr” field (and other configuration parameters into DHCP options) and broadcasts a DHCPOFFER message. At this point, there is no agreement on an assignment between the server and the client.
3. The client receives one or more DHCPOFFER messages from one or more servers and chooses one server among them. The client puts the IP address of the selected server into the “server identifier” option of a DHCPREQUEST and broadcasts it to indicate which server it has selected. This DHCPREQUEST is broadcast and relayed through DHCP relay agents.
4. Servers receive the DHCPREQUEST broadcast from the client and check the “server identifier” option. If it does not match a server’s own address, that server interprets the message as a notification that the client has declined its offer. The selected server sends a DHCPACK (if the address is still available) or a DHCPNAK (if, for example, the address has already been assigned to another client).
5. The client that gets the DHCPACK starts using the IP address. From that point on, the client is configured. If it gets a DHCPNAK, it restarts from step 1.
6. If the client finds a problem with the address assigned in the DHCPACK, it sends a DHCPDECLINE to the server and restarts from step 1.
7. The client may choose to relinquish its lease of a network address by sending a DHCPRELEASE message to the server.
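The exchange above can be compressed into a toy simulation. Message passing is replaced by direct method calls, server names and addresses are hypothetical, and no UDP traffic is generated.

```python
# Didactic simulation of the DISCOVER/OFFER/REQUEST/ACK exchange.
class Server:
    def __init__(self, ident, addresses):
        self.ident = ident
        self.addresses = list(addresses)

    def on_discover(self):
        # Step 2: offer an address if one is available.
        if not self.addresses:
            return None
        return {"server": self.ident, "yiaddr": self.addresses[0]}

    def on_request(self, selected_server):
        # Step 4: only the selected server commits the address (DHCPACK);
        # other servers treat the broadcast as declining their offers.
        if selected_server != self.ident or not self.addresses:
            return "DHCPNAK"
        self.addresses.pop(0)
        return "DHCPACK"

def configure(servers):
    # Steps 1 and 3: broadcast DHCPDISCOVER, collect offers, select one
    # server, then broadcast DHCPREQUEST naming that server.
    offers = []
    for server in servers:
        offer = server.on_discover()
        if offer is not None:
            offers.append(offer)
    if not offers:
        return None
    chosen = offers[0]
    replies = [server.on_request(chosen["server"]) for server in servers]
    # Step 5: the client starts using the address on DHCPACK.
    return chosen["yiaddr"] if "DHCPACK" in replies else None

servers = [Server("s1", ["10.0.0.5"]), Server("s2", ["10.0.0.9"])]
print(configure(servers))  # address committed by the selected server
```

Note how the single broadcast DHCPREQUEST simultaneously accepts one offer and declines all others, which is the key subtlety of step 4.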
• Client–server interaction for reusing a previously allocated IP address: If a client remembers and wishes to reuse a previously allocated IP address, it may choose to omit some of the steps described in the previous section. The interaction is performed as follows [RFC2131]:
1. The client broadcasts a DHCPREQUEST message with the “requested IP address” option, which indicates the previously assigned address.
2. A DHCP server that has a binding for the address returns a DHCPACK or DHCPNAK to the client. The DHCPACK message indicates that the client can use the previously assigned address; the DHCPNAK means that the IP address is already in use.
3. If the client receives the DHCPACK message, it performs a final check of the parameters, notes the duration of the lease specified in the DHCPACK message, and starts using the IP address. If the client receives the DHCPNAK message, it cannot reuse the remembered address and must restart the configuration process by requesting a new network address. If the client receives neither a DHCPACK nor a DHCPNAK message, it times out and retransmits the DHCPREQUEST message.
4. The client may choose to relinquish its lease of an IP address by sending a DHCPRELEASE message to the server.
DHCP uses UDP as its transport protocol. DHCP messages from a client to a server are sent to the DHCP server’s port (67), and DHCP messages from a server to a client are sent to the DHCP client’s port (68).
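For the time-out-and-retransmit behavior in step 3, RFC 2131 suggests a randomized exponential backoff: roughly 4 s before the first retransmission, doubling up to 64 s, with about a second of jitter. The sketch below is an illustrative reading of that recommendation, alongside the well-known port numbers mentioned above.

```python
import random

# Sketch of the randomized exponential backoff RFC 2131 suggests for
# retransmitting a DHCPREQUEST (or DHCPDISCOVER) when no reply arrives.
def backoff_delays(attempts, base=4.0, cap=64.0, jitter=1.0):
    delays = []
    delay = base
    for _ in range(attempts):
        # Add +/- jitter seconds so clients do not retransmit in lockstep.
        delays.append(delay + random.uniform(-jitter, jitter))
        delay = min(delay * 2, cap)
    return delays

# DHCP's well-known UDP ports: clients send to the server port,
# servers reply to the client port.
DHCP_SERVER_PORT = 67
DHCP_CLIENT_PORT = 68

print([round(d) for d in backoff_delays(5, jitter=0.0)])  # [4, 8, 16, 32, 64]
```

The jitter matters on large networks: without it, many clients that lost power together would retransmit at exactly the same instants.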
5.6.3 Advantages and Disadvantages of DHCP for IPv4
Nowadays, DHCP is used in many installations to pass configuration information to workstations. One of the main advantages of DHCP is that a workstation is not required to have any kind of permanent storage space: all network configuration parameters can be passed using DHCP without any human interaction. Another advantage is that DHCP can play an important role in reducing the cost of ownership for large organizations by moving the administration of client systems to centralized management servers. DHCP helps reduce the impact of the increasing scarcity of available IP addresses in two ways. First, DHCP can be used to manage the limited standard IP addresses that are available to an organization. It does this by issuing the addresses to clients on an as-needed basis and reclaiming them when the addresses are no longer required. Second, DHCP can be used in conjunction with network address translation (NAT) to issue private network addresses to connect clients to the Internet. However, DHCP for IPv4 has a few inherent problems. First of all, the current DHCP implementation provides no authentication or security mechanisms, although authentication for DHCP has been proposed in RFC 3118 [RFC3118]. One example of this security problem is that a message broadcast by a rogue server can lead to a situation in which all traffic is routed through a malicious host that eavesdrops on it. An even more dreadful situation would arise if a computer obtained a tampered boot file that logs all login–password pairs and forwards them to a remote host. Second, data about configuration parameters and IP addresses are held locally in the DHCP servers, and there exists no standard for controlling and monitoring this configuration data from the DHCP servers.
Another disadvantage of DHCP relates to the leasing mechanism: the client is expected to stop using any dynamically allocated IP address after the lease time expires. Additionally, a client requesting a new lease is not guaranteed to receive the same IP address as it had previously.
5.7 Conclusions We have surveyed the management architecture, protocols, services, and management information base of the network management frameworks standardized by ISO and IETF. We also introduced DHCP, an IP address management framework standardized by IETF. Within each of those frameworks, standards relating to four fundamental aspects of network management (functional, information, communication, and organization aspects) were addressed.
Both the ISO and IETF network management frameworks have their advantages and disadvantages. However, the key factor in choosing between the two frameworks lies in their implementation: until now, it has been almost impossible to find a system with the necessary resources to support the ISO framework, although it is conceptually superior to SNMP (v1, v2, and v3) in both design and operation. In comparison with both network management frameworks, DHCP is much simpler in both design and implementation. This is largely because it focuses on only one particular task: IP address management. DHCP can play an important role in making systems management simpler and less expensive by moving the management of IP addresses away from the client systems and onto centralized servers.
References

[ASN90] ISO/IEC 8824. Specification of Abstract Syntax Notation One (ASN.1), April 1990.
[ISO7498-4] ITU-T X.700 - ISO/IEC 7498-4. Information Processing Systems: Open Systems Interconnection: Management Framework for Open System Interconnection, 1992.
[ISO9595] ISO 9595. Information Processing Systems: Open Systems Interconnection: Common Management Information Service Definition, Geneva, 1990.
[ISO9596] ISO 9596. Information Processing Systems: Open Systems Interconnection: Common Management Information Protocol, Geneva, 1991.
[ISO10040] ITU-T X.701 - ISO/IEC 10040. Information Processing Systems: Open System Interconnection: System Management Overview, 1992.
[ISO10165-1] ITU-T X.720 - ISO/IEC 10165-1. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Management Information Model, Geneva, 1993.
[ISO10165-2] ISO 10165-2. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Definition of Management Information, Geneva, 1993.
[ISO10165-4] ISO 10165-4. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Part 4: Guidelines for the Definition of Managed Objects, Geneva, 1993.
[ISO10165-5] ISO 10165-5. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Generic Management Information, Geneva, 1993.
[ISO10165-7] ISO 10165-7. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: General Relationship Model, Geneva, 1993.
[KR01] James F. Kurose, Keith W. Ross. Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Reading, MA, 2001.
[RFC951] Bill Croft, John Gilmore. Bootstrap Protocol (BOOTP), RFC 951, September 1985.
[RFC1066] K. McCloghrie, M. Rose. Management Information Base for Network Management of TCP/IP-Based Internets, RFC 1066, 1988.
[RFC1155] K. McCloghrie, M. Rose. Structure and Identification of Management Information for TCP/IP-Based Internets, RFC 1155, 1990.
[RFC1157] J. Case, M. Fedor, M. Schoffstall, C. Davin. The Simple Network Management Protocol, RFC 1157, May 1990.
[RFC1212] K. McCloghrie, M. Rose. Concise MIB Definitions, RFC 1212, 1991.
[RFC1213] K. McCloghrie, M. Rose. Management Information Base for Network Management of TCP/IP-Based Internets: MIB-II, RFC 1213, 1991.
[RFC1215] M. Rose. A Convention for Defining Traps for Use with the SNMP, RFC 1215, 1991.
[RFC1441] K. McCloghrie, M. Rose, J. Case, S. Waldbusser. Introduction to Version 2 of the Internet-Standard Network Management Framework, RFC 1441, 1993.
[RFC1445] J. Galvin, K. McCloghrie. Administrative Model for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1445, 1993.
[RFC1448] K. McCloghrie, M. Rose, J. Case, S. Waldbusser. Protocol Operations for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1448, 1993.
[RFC1542] W. Wimer. Clarifications and Extensions for the Bootstrap Protocol, RFC 1542, October 1993.
[RFC1902] J. Case, K. McCloghrie, M. Rose, S. Waldbusser. Structure of Management Information for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1902, January 1996.
[RFC2131] R. Droms. Dynamic Host Configuration Protocol, RFC 2131, March 1997.
[RFC2132] S. Alexander, R. Droms. DHCP Options and BOOTP Vendor Extensions, RFC 2132, March 1997.
[RFC2271] D. Harrington, R. Presuhn, B. Wijnen. An Architecture for Describing SNMP Management Frameworks, RFC 2271, 1998.
[RFC2272] J. Case, D. Harrington, R. Presuhn, B. Wijnen. Message Processing and Dispatching for the Simple Network Management Protocol (SNMP), RFC 2272, 1998.
[RFC2273] D. Levi, P. Meyer, B. Stewart. SNMPv3 Applications, RFC 2273, 1998.
[RFC2274] U. Blumenthal, B. Wijnen. User-Based Security Model (USM) for Version 3 of the Simple Network Management Protocol (SNMPv3), RFC 2274, 1998.
[RFC2275] B. Wijnen, R. Presuhn, K. McCloghrie. View-Based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP), RFC 2275, 1998.
[RFC2570] J. Case, R. Mundy, D. Partain, B. Stewart. Introduction to Version 3 of the Internet-Standard Network Management Framework, RFC 2570, 1999.
[RFC3046] M. Patrick. DHCP Relay Agent Information Option, RFC 3046, January 2001.
[RFC3118] R. Droms, W. Arbaugh. Authentication for DHCP Messages, RFC 3118, June 2001.
[RFC3411] D. Harrington, R. Presuhn, B. Wijnen. An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks, RFC 3411, 2002.
[RFC3415] B. Wijnen, R. Presuhn, K. McCloghrie. View-Based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP), RFC 3415, 2002.
[Ros90] Marshall T. Rose. The Open Book: A Practical Perspective on OSI, Prentice Hall, Englewood Cliffs, NJ, 1990.
[Sch95] Bruce Schneier. Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley, New York, 1995.
[Sta93a] William Stallings. Networking Standards: A Guide to OSI, ISDN, LAN, and MAN Standards, Addison-Wesley, Reading, MA, 1993.
[Sta93b] William Stallings. SNMP, SNMPv2 and CMIP: The Practical Guide to Network Management Standards, Addison-Wesley, Reading, MA, 1993.
[Sta99] William Stallings. SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, Addison-Wesley, Reading, MA, 1999.
© 2005 by CRC Press
6 Internet Security

Christopher Kruegel
Vienna University of Technology

6.1 Security Attacks and Security Properties...........................6-1
6.2 Security Mechanisms ..........................................................6-3
    Attack Prevention • Attack Avoidance • Attack and Intrusion Detection
6.3 Secure Network Protocols ................................................6-10
6.4 Secure Applications ...........................................................6-12
6.5 Summary............................................................................6-13
References .....................................................................................6-13
In order to provide useful services and to allow people to perform tasks more conveniently, computer systems are attached to networks and interconnected. The result is the worldwide collection of local and wide area networks known as the Internet. Unfortunately, the extended access possibilities also entail increased security risks, as they open additional avenues for an attacker. To attack a closed, local system, an attacker had to be physically present at the network in order to perform unauthorized actions. In the networked case, every host that can send packets to the victim can potentially be used as a base for an attack. Because certain services (such as Web or name servers) need to be publicly available, any machine on the Internet might be the originator of malicious activity. This fact makes attacks likely to happen on a regular basis.

This chapter attempts to give a systematic overview of the security requirements of Internet-based systems and potential means to satisfy them. We define the properties of a secure system and provide a classification of potential threats to it. We also introduce mechanisms to defend against attacks that attempt to violate desired properties. The most widely used means of securing application data against tampering and eavesdropping, the Secure Sockets Layer (SSL) and its successor, the Transport Layer Security (TLS) protocol, are discussed. Finally, we briefly describe popular application programs that can act as building blocks for securing custom applications.

Before one can evaluate attacks against a system and decide on appropriate mechanisms against them, it is necessary to specify a security policy [23]. A security policy defines the desired properties for each part of a secure computer system. It is a decision that has to take into account the value of the assets that should be protected, the expected threats, and the cost of proper protection mechanisms.
A security policy that is sufficient for the data of a normal user at home may not be sufficient for bank applications, as these systems are obviously a more likely target and have to protect more valuable resources. Although often neglected, the formulation of an adequate security policy is a prerequisite before one can identify threats and appropriate mechanisms to face them.
6.1 Security Attacks and Security Properties

For the following discussion, we assume that the function of a system that is the target of an attack is to provide information. In general, there is a flow of data from a source (e.g., host, file, memory) to a destination (e.g., remote host, other file, user) over a communication channel (e.g., wire, data bus). The task of the security system is to restrict access to this information to only those parties (persons or processes) that are authorized to have access according to the security policy in use. In the case of an automation system that is remotely connected to the Internet, the information flows from or to a control application that manages sensors and actuators via communication lines of the public Internet and the network of the automation system (e.g., a fieldbus). The normal information flow and several categories of attacks that target it are shown in Figure 6.1 and explained below (according to [22]):

FIGURE 6.1 Security attacks.

1. Interruption: An asset of the system is destroyed or becomes unavailable. This attack targets the source or the communication channel and prevents information from reaching its intended destination (e.g., cutting the wire, or overloading the link so that the information is dropped because of congestion). Attacks in this category attempt a kind of denial of service (DoS).
2. Interception: An unauthorized party gains access to the information by eavesdropping on the communication channel (e.g., wiretapping).
3. Modification: The information is not only intercepted, but actively altered by an unauthorized party while in transit from the source to the destination (e.g., modifying message content).
4. Fabrication: An attacker inserts counterfeit objects into the system without the sender's involvement. When a previously intercepted object is inserted, this process is called replaying. When the attacker pretends to be the legitimate source and inserts his desired information, the attack is called masquerading (e.g., replaying an authentication message, adding records to a file).

The four classes of attacks listed above violate different security properties of the computer system.
A security property describes a desired feature of a system with regard to a certain type of attack. A common classification following [5, 13] is listed below:

• Confidentiality: This property covers the protection of transmitted data against its release to nonauthorized parties. In addition to the protection of the content itself, the information flow should also be resistant to traffic analysis. Traffic analysis is used to gather information other than the transmitted values themselves from the data flow (e.g., timing data, frequency of messages).
• Authentication: Authentication is concerned with making sure that the information is authentic. A system implementing the authentication property assures the recipient that the data are from the source from which they claim to be. The system must make sure that no third party can successfully masquerade as another source.

• Nonrepudiation: This property prevents either sender or receiver from denying a transmitted message. When a message has been transferred, the sender can prove that it has been received. Similarly, the receiver can prove that the message has actually been sent.

• Availability: Availability characterizes a system whose resources are always ready to be used. Whenever information needs to be transmitted, the communication channel is available and the receiver can cope with the incoming data. This property makes sure that attacks cannot prevent resources from being used for their intended purpose.

• Integrity: Integrity protects transmitted information against modifications. This property ensures that a single message reaches the receiver as it left the sender, but integrity also extends to a stream of messages: no messages are lost, duplicated, or reordered, and messages cannot be replayed. As destruction is also covered under this property, all data must arrive at the receiver. Integrity is important not only as a security property but also as a property of network protocols: message integrity must be ensured in the case of random faults, not only in the case of malicious modifications.
6.2 Security Mechanisms

Different security mechanisms can be used to enforce the security properties defined in a given security policy. Depending on the anticipated attacks, different means have to be applied to satisfy the desired properties. We divide these measures against attacks into three classes: attack prevention, attack avoidance, and attack detection.
6.2.1 Attack Prevention

Attack prevention is a class of security mechanisms that contains ways of preventing or defending against certain attacks before they can actually reach and affect the target. An important element in this category is access control, a mechanism that can be applied at different levels, such as the operating system, the network, or the application layer.

Access control [23] limits and regulates access to critical resources. This is done by identifying or authenticating the party that requests a resource and checking its permissions against the rights specified for the demanded object. It is assumed that an attacker is not legitimately permitted to use the target object and is therefore denied access to the resource. As access is a prerequisite for an attack, any possible interference is prevented.

The most common form of access control in multiuser computer systems is access control lists for resources, based on the user identity of the process that attempts to use them. The identity of a user is determined by an initial authentication process that usually requires a name and a password. The login process retrieves the stored copy of the password corresponding to the user name and compares it with the presented one. When both match, the system grants the user the appropriate user credentials. When a resource is to be accessed, the system looks up the user and group in the access control list and grants or denies access as appropriate. An example of this kind of access control is a secure Web server, which delivers certain resources only to clients that have authenticated themselves and possess sufficient credentials for the desired resource. The authentication process is usually handled by the Web client, such as Microsoft Internet Explorer or Mozilla, by prompting the user to enter his name and password.

The most important access control system at the network layer is a firewall [4].
The idea of a firewall is based on the separation of a trusted inside network of computers under single administrative control from a potentially hostile outside network. The firewall is a central choke point that allows enforcement of access control for services that may run at the inside or outside. The firewall prevents attacks from the outside against the machines in the inside network by denying connection attempts from unauthorized parties located outside. In addition, a firewall may also be utilized to prevent users behind the firewall from using certain services that are outside (e.g., surfing Web sites containing pornographic material).

For certain installations, a single firewall is not suitable. Networks that consist of several server machines that need to be publicly accessible and workstations that should be completely protected against connections from the outside benefit from a separation between these two groups. When an attacker compromises a server machine behind a single firewall, all other machines can be attacked from this new base without restrictions. To prevent this, one can use two firewalls with a demilitarized zone (DMZ) [4] in between, as shown in Figure 6.2. In this setup, one firewall separates the outside network from a segment (the DMZ) containing the server machines, while a second one separates this area from the rest of the network. The second firewall can be configured in a way that denies all incoming connection attempts. Whenever an intruder compromises a server, he is then unable to immediately attack a workstation located in the inside network.

FIGURE 6.2 Demilitarized zone.

The following design goals for firewalls are identified in [4]:

1. All traffic from inside to outside, and vice versa, must pass through the firewall. This is achieved by physically blocking all access to the internal network except via the firewall.
2. Only authorized traffic, as defined by the local security policy, will be allowed to pass.
3. The firewall itself should be immune to penetration. This implies the use of a trusted system with a secure operating system.
A trusted, secure operating system is often purpose built, has heightened security features, and provides only the minimal functionality necessary to run the desired applications.

These goals can be reached by using a number of general techniques for controlling access. The most common, called service control, determines which Internet services can be accessed. Traffic on the Internet is currently filtered on the basis of Internet Protocol (IP) addresses and Transmission Control Protocol (TCP)/User Datagram Protocol (UDP) port numbers. In addition, there may be proxy software that receives and interprets each service request before passing it on. Direction control is a simple mechanism to control the direction in which particular service requests may be initiated and permitted to flow through. User control grants access to a service based on user credentials, similar to the technique used in a multiuser operating system. Controlling external users requires secure authentication over the network (such as that provided by IPSec [10]). A more declarative approach, in contrast to the operational variants mentioned above, is behavior control. This technique determines how particular services are used. It may be utilized to filter e-mail to eliminate spam or to allow external access to only part of the local Web pages.

A summary of the capabilities and limitations of firewalls is given in [22]. The following benefits can be expected:

• A firewall defines a single choke point that keeps unauthorized users out of the protected network. The use of such a point also simplifies security management.
• It provides a location for monitoring security-related events. Audits, logs, and alarms can be implemented on the firewall directly. In addition, it forms a convenient platform for some non-security-related functions, such as address translation and network management.
• A firewall may serve as a platform to implement a virtual private network (e.g., by using IPSec).
The list below enumerates the limits of the firewall access control mechanism:

• A firewall cannot protect against attacks that bypass it, for example, via a direct dial-up link from the protected network to an Internet service provider (ISP). It also does not protect against internal threats from an inside hacker or an insider cooperating with an outside attacker.
• A firewall does not help when attacks are directed against targets whose access has to be permitted.
• It cannot protect against the transfer of virus-infected programs or files. It would be impractical for the firewall to scan all incoming files and e-mails for viruses.

Firewalls can be divided into two main categories. A packet-filtering router, or packet filter for short, is an extended router that applies certain rules to the packets it forwards. Usually, traffic in each direction (incoming and outgoing) is checked against a rule set that determines whether a packet is permitted to continue or should be dropped. The packet filter rules operate on the header fields used by the underlying communication protocols, which for the Internet are almost always IP, TCP, and UDP. Packet filters have the advantage that they are cheap, as they can often be built on existing hardware, and they offer good performance under high traffic loads. An example of a packet filter is the iptables package, which is implemented as part of the Linux 2.4 routing software.

A different approach is followed by an application-level gateway, also called a proxy server. This type of firewall does not forward packets at the network layer but acts as a relay at the application level. The user contacts the gateway, which in turn opens a connection to the intended target (on behalf of the user). A gateway completely separates the inside and outside networks at the network level and only provides a certain set of application services.
This allows authentication of the user who requests a connection and session-oriented scanning of the exchanged traffic up to the application-level data. This feature makes application gateways more secure than packet filters and offers a broader range of logging facilities. On the downside, the overhead of such a setup may cause performance problems under heavy loads.

Another important element in the set of attack prevention mechanisms is system hardening, a term that describes all steps taken to make a computer system more secure. It usually refers to changing the default configuration to a more secure one, possibly at the expense of ease of use. Vendors usually preinstall a large set of development tools and utilities, which, although beneficial to the new user, might also contain vulnerabilities. The initial configuration changes that are part of system hardening include the removal of services, applications, and accounts that are not needed and the enabling of operating system auditing mechanisms (e.g., the event log in Windows).

Hardening also involves a vulnerability assessment of the system. Numerous open-source tools such as network scanners (e.g., nmap [8]) and vulnerability scanners (e.g., Nessus [12]) can help to check a system for open ports and known vulnerabilities. This knowledge then helps to remedy these vulnerabilities and close unnecessary ports. An important and ongoing effort in system hardening is patching. Patching is a method of updating a file that replaces only the parts being changed, rather than the entire file. It is used to replace parts of a (source or binary) file that contain a vulnerability exploitable by an attacker. To be able to patch, system administrators must keep up to date with the security advisories that vendors issue to inform about security-related problems in their products.
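The first-match rule evaluation that a packet filter performs on header fields can be sketched in a few lines. This is a minimal illustration of the principle only, not the iptables rule syntax; the rule format, field names, and addresses are made up for the example:

```python
# Minimal first-match packet filter, in the spirit of the rules a
# packet-filtering router applies to IP/TCP/UDP header fields.
# The rule format here is illustrative, not the iptables syntax.

def match(rule, packet):
    """A rule field of None acts as a wildcard."""
    return all(rule.get(k) in (None, packet[k])
               for k in ("src", "dst", "proto", "dport"))

def filter_packet(rules, packet, default="DROP"):
    """Return the action of the first matching rule (first match wins)."""
    for rule in rules:
        if match(rule, packet):
            return rule["action"]
    return default  # default-deny policy

rules = [
    # Allow anyone to reach the (hypothetical) DMZ web server on TCP port 80.
    {"src": None, "dst": "10.0.0.80", "proto": "tcp", "dport": 80,
     "action": "ACCEPT"},
]

web = {"src": "1.2.3.4", "dst": "10.0.0.80", "proto": "tcp", "dport": 80}
ssh = {"src": "1.2.3.4", "dst": "10.0.0.80", "proto": "tcp", "dport": 22}
print(filter_packet(rules, web))  # ACCEPT
print(filter_packet(rules, ssh))  # DROP
```

The default-deny fallback mirrors design goal 2 above: anything the policy does not explicitly authorize is dropped.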
6.2.2 Attack Avoidance

Security mechanisms in this category assume that an intruder may access the desired resource, but the information is modified in a way that makes it unusable for the attacker. The information is preprocessed at the sender before it is transmitted over the communication channel and postprocessed at the receiver. While the information is transported over the communication channel, it resists attacks by being nearly useless for an intruder. One notable exception is attacks against the availability of the information, as an attacker could still interrupt the message. During the processing step at the receiver, modifications or errors that might have previously occurred can be detected (usually because the information cannot be correctly reconstructed). When no modification has taken place, the information at the receiver is identical to that at the sender before the preprocessing step.
FIGURE 6.3 Encryption and decryption.
The most important member of this category is cryptography, which is defined as the science of keeping messages secure [18]. It allows the sender to transform information into what, from the point of view of an attacker, is a random data stream, but to have it recovered by an authorized receiver (Figure 6.3).

The original message is called plain text (sometimes clear text). The process of converting it, through the application of some transformation rules, into a format that hides its substance is called encryption. The corresponding disguised message is denoted cipher text, and the operation of turning it back into clear text is called decryption. It is important to note that the conversion from plain to cipher text has to be lossless in order to be able to recover the original message at the receiver under all circumstances.

The transformation rules are described by a cryptographic algorithm. The function of this algorithm is based on two main principles: substitution and transposition. In the case of substitution, each element of the plain text (e.g., bit, block) is mapped onto another element of the used alphabet. Transposition describes the process where elements of the plain text are rearranged. Most systems involve multiple steps (called rounds) of transposition and substitution to be more resistant against cryptanalysis, the science of breaking the cipher, i.e., discovering the substance of the message behind its disguise. When the transformation rules process the input elements one at a time, the mechanism is called a stream cipher; when they operate on fixed-size input blocks, it is called a block cipher.

If the security of an algorithm is based on keeping the way the algorithm works (i.e., the transformation rules) secret, it is called a restricted algorithm. Such algorithms are no longer of any interest today because they do not allow standardization or public quality control.
In addition, when a large group of users is involved, such an approach cannot be used: a single person leaving the group makes it necessary for everyone else to change the algorithm. Modern cryptosystems solve this problem by basing the ability of the receiver to recover encrypted information on his possession of a secret piece of information (usually called the key). Both the encryption and decryption functions have to use a key, and they are heavily dependent on it. When the security of the cryptosystem is completely based on the security of the key, the algorithm itself may be revealed.

Although the security does not rely on the algorithm being unknown, the cryptographic function itself and the used key, together with its length, must be chosen with care. A common assumption is that the attacker has the fastest commercially available hardware at his disposal in his attempt to break the cipher text. The most common attack, the known-plain-text attack, is executed by obtaining cipher text together with its corresponding plain text. The encryption algorithm must be so complex that even if the code breaker is equipped with plenty of such pairs and powerful machines, it is infeasible for him to retrieve the key. An attack is infeasible when the cost of breaking the cipher exceeds the value of the information, or when the time it takes to break it exceeds the life span of the information.

Given pairs of corresponding cipher and plain text, it is obvious that a simple key-guessing algorithm will succeed after some time. The approach of successively trying different key values until the correct one is found is called a brute force attack, because no information about the algorithm is utilized whatsoever. In order to be useful, it is a necessary condition for an encryption algorithm that brute force attacks are infeasible. Depending on the keys that are used, one can distinguish two major cryptographic approaches: secret and public key cryptosystems.
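The interplay of keys, known plain text, and brute force can be illustrated with a deliberately weak toy cipher. The one-byte key is chosen purely so that exhaustive search terminates instantly; real ciphers use keys of 128 bits and more, far beyond any exhaustive search:

```python
# Toy repeating-key XOR stream cipher with an 8-bit key, purely to
# illustrate why a small key space makes brute force attacks feasible.

def xor_cipher(data: bytes, key: int) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse) with a one-byte key."""
    return bytes(b ^ key for b in data)

plain = b"attack at dawn"
cipher = xor_cipher(plain, key=0x5A)
assert xor_cipher(cipher, 0x5A) == plain   # conversion is lossless, as required

# Known-plain-text brute force: try every possible key (only 256 here).
recovered = [k for k in range(256) if xor_cipher(cipher, k) == plain]
print(recovered)  # [90]  (i.e., 0x5A, the key)
```

With a realistic key length the same loop would have to enumerate 2^128 or more candidates, which is exactly what "brute force attacks are infeasible" demands.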
6.2.2.1 Secret Key Cryptography

This is the kind of cryptography that has been used for the transmission of secret information for centuries, long before the advent of computers. These algorithms require that the sender and receiver agree on a key before communication is started. It is common to this variant (which is also called single-key or symmetric encryption) that a single secret key is shared between the sender and receiver. It needs to be communicated in a secure way before the actual encrypted communication can start, and it has to remain secret as long as the information is to remain secret. Encryption is achieved by applying an agreed function to the plain text using the secret key. Decryption is performed by applying the inverse function using the same key.

The classic example of a secret key block cipher, which is still widely deployed today, is the data encryption standard (DES) [6]. DES was developed by IBM and adopted as a standard by the U.S. government in 1977 for administrative and business use. It is a block cipher that operates on 64-bit plain text blocks and utilizes a key 56 bits in length. The algorithm uses 16 rounds that are key dependent: during each round, 48 key bits are selected and combined with the block being encrypted, and the resulting block is then piped through a substitution and a permutation phase (which use known values and are independent of the key) to make cryptanalysis harder. Although there is no known weakness in the DES algorithm itself, its security has been much debated. The small key length makes brute force attacks possible, and several cases have occurred where DES-protected information has been cracked. A suggested improvement called 3DES uses three rounds of simple DES with three different keys. This extends the key length to 168 bits while still resting on the well-studied DES base. Recently, DES has been replaced as a standard by the advanced encryption standard (AES — Rijndael) [1].
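The brute force arithmetic behind the DES debate can be made concrete with back-of-the-envelope computation. The assumed trial rate of 10^12 decryptions per second is an illustrative figure for dedicated cracking hardware, not a measured one:

```python
# Back-of-the-envelope key-space arithmetic behind the DES key-length debate.
des_keys  = 2**56    # 56-bit DES key
tdes_keys = 2**168   # 3DES with three independent 56-bit keys

print(des_keys)  # 72057594037927936 candidate keys
# Every additional key bit doubles the search effort, so 3DES multiplies
# the DES work factor by 2**112:
print(tdes_keys // des_keys == 2**112)  # True

# At an assumed 10**12 trial decryptions per second, exhausting the DES
# key space takes 2**56 / 1e12 seconds, i.e., on the order of a day:
hours = 2**56 / 1e12 / 3600
print(round(hours))  # ~20 hours
```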
A well-known stream cipher that has been debated recently is RC4 [16], developed by RSA. It is used to secure transmission in wireless networks that follow the IEEE 802.11 standard and forms the core of the wired equivalent privacy (WEP) mechanism. Although the cipher itself has not been broken, current implementations are flawed and reduce the security of RC4 to a level where the used key can be recovered by statistical analysis within a few hours.

6.2.2.2 Public Key Cryptography

Traditionally, knowledge of the key that is used to encrypt a plain text also allowed the inverse process, the decryption of the cipher text. In 1976, this paradigm of cryptography was changed by Diffie and Hellman [7] when they described their public key approach. Public key cryptography utilizes two different keys, one called the public key, the other the private key. The public key is used to encrypt a message, while the corresponding private key is used to do the opposite. Their innovation was based on the fact that it is infeasible to retrieve the private key given the public key. This makes it possible to remove the weakness of secure key transmission from the sender to the receiver: the receiver can simply generate his public–private key pair and announce the public key without fear. Anyone can obtain this key and use it to encrypt messages that only the receiver, with his private key, is able to decrypt.

Mathematically, the process is based on one-way functions with a trap door. A one-way function is a function that is easy to compute but very hard to invert. That means that given x, it is easy to determine f(x), but given f(x), it is hard to recover x. Hard is defined as computationally infeasible in the context of cryptographically strong one-way functions.
Although it is obvious that some functions are easier to compute than their inverse (e.g., squaring a value in contrast to taking its square root), there is no mathematical proof or definition of one-way functions. There are a number of problems that are considered difficult enough to act as one-way functions (e.g., the factorization of large numbers), but this is more an agreement among cryptanalysts than a rigorously defined set. A one-way function is not directly usable for cryptography, but it becomes so when a trap door exists. A trap door is a mechanism that allows one to easily calculate x from f(x) when additional information y is provided.

A common misunderstanding about public key cryptography is thinking that it makes secret key systems obsolete, either because it is more secure or because it does not have the problem of secretly exchanging keys. As the security of a cryptosystem depends on the length of the used key and the utilized
transformation rules, there is no automatic advantage of one approach over the other. Although the key exchange problem is elegantly solved with a public key, the process itself is very slow and has its own problems. Secret key systems are usually a factor of 1000 faster than their public key counterparts (see [18] for exact numbers). Therefore, most communication is still secured using secret key systems, and public key systems are only utilized for exchanging the secret key for the subsequent communication. This hybrid approach is the common design to benefit from the high speed of conventional cryptography (which is often implemented directly in hardware) and from a secure key exchange.

A problem in public key systems is the authenticity of the public key. An attacker may offer the sender his own public key and pretend that it originates from the legitimate receiver. The sender then uses the fake public key to perform his encryption, and the attacker can simply decrypt the message using his private key. In order to thwart an attacker who attempts to substitute his public key for the victim's, certificates are used. A certificate combines user information with the user's public key and the digital signature of a trusted third party that guarantees that the key belongs to the mentioned person. The trusted third party is usually called a certification authority (CA). The certificate of a CA itself is usually verified by a higher-level CA that confirms that the CA's certificate is genuine and contains its public key. The chain of third parties that verify their respective lower-level CAs has to end at a certain point, which is called the root CA. A user who wants to verify the authenticity of a public key and all involved CAs needs to obtain the self-signed certificate of the root CA via an external channel. Web browsers (e.g., Netscape Navigator, Internet Explorer) usually ship with a number of certificates of globally known root CAs.
A framework that implements the distribution of certificates is called a public key infrastructure (PKI). An important standard for certificate management is X.509 [25]. Another important issue is revocation, the invalidation of a certificate when the key has been compromised.

The best-known public key algorithm and textbook classic is RSA [17], named after its inventors at MIT, Rivest, Shamir, and Adleman. It is a block cipher that is still utilized in the majority of current systems, although the key length has been increased over recent years. This has put a heavier processing load on applications, a burden that has ramifications especially for sites doing electronic commerce. A competing approach that promises security similar to RSA's using far smaller key lengths is elliptic curve cryptography. However, as these systems are newer and have not been subject to sustained cryptanalysis, the confidence level in them is not yet as high as in RSA.

6.2.2.3 Authentication and Digital Signatures

An interesting and important feature of public key cryptography is its possible use for authentication. In addition to making the information unusable for attackers, a sender may utilize cryptography to prove his identity to the receiver. This feature is realized by digital signatures. A digital signature must have properties similar to those of a handwritten signature: it must be hard to forge, and it has to be bound to a certain document. In addition, one has to make sure that a valid signature cannot be used by an attacker to replay the same (or different) messages at a later time.

A way to realize such a digital signature is to use the sender's private key to encrypt a message. When the receiver is capable of successfully decrypting the cipher text with the sender's public key, he can be sure that the message is authentic. This approach obviously requires a cryptosystem that allows encryption with the private key, but many (such as RSA) offer this option.
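The public/private key relationship, including the use of the private key for signing, can be illustrated with textbook RSA on deliberately tiny primes. This is a sketch of the mathematics only; real deployments use keys of thousands of bits together with padding schemes:

```python
# Textbook RSA with deliberately tiny primes, to show the public/private
# key relationship. For illustration only: real keys are 2048+ bits.

p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent: e*d = 1 (mod phi)

def encrypt(m): return pow(m, e, n)   # with the public key (e, n)
def decrypt(c): return pow(c, d, n)   # with the private key (d, n)

m = 42
print(decrypt(encrypt(m)))   # 42: only the private key recovers the message

# "Signing" = encrypting with the private key; anyone holding the
# public key can verify the result.
sig = pow(m, d, n)
print(pow(sig, e, n))        # 42: signature verifies
```

The infeasibility claim of the text corresponds to the difficulty of recovering d from (e, n) without knowing the factorization of n; with 61 and 53 that is trivial, with 1024-bit primes it is not.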
It is easy for a receiver to verify that a message has been successfully decrypted when the plain text is in a human-readable format. For binary data, a checksum or a similar integrity-checking footer can be added to verify a successful decryption. Replay attacks are prevented by adding a time stamp to the message (e.g., Kerberos [11] uses time stamps to prevent messages to the ticket-granting service from being replayed).

Usually, the storage and processing overhead of encrypting a whole document is too high to be practical. This is solved by one-way hash functions, which map the content of a message onto a short value (called a message digest). As with one-way functions, it is difficult to create a message when given only the hash value itself. Instead of encrypting the whole message, it is enough to encrypt the message digest and send it together with the original message. The receiver can then apply
© 2005 by CRC Press
Internet Security
6-9
the known hash function (e.g., MD5 [15]) to the document and compare it to the decrypted digest. When both values match, the message is authentic.
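The hash-then-sign scheme described above can be sketched in a few lines of Python. The RSA parameters below are textbook toy values (n = 61 · 53) chosen only to make the arithmetic visible; real systems use keys of 2048 bits or more together with a proper padding scheme.

```python
import hashlib

# Toy RSA key pair (insecure, illustration only):
# n = 61 * 53, and e * d = 1 (mod lcm(60, 52))
n, e, d = 3233, 17, 2753

def sign(message: bytes) -> int:
    # Hash the message, then "encrypt" the digest with the private key d.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    # "Decrypt" the signature with the public key e and recompute the digest.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

msg = b"transfer 100 EUR to account 42"
assert verify(msg, sign(msg))   # an unmodified message passes verification
```

Note that only the short digest is exponentiated; the signature travels alongside the (possibly long) plain-text message, exactly as described for the message digest scheme above.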
6.2.3 Attack and Intrusion Detection

Attack detection assumes that an attacker can obtain access to his desired targets and succeed in violating a given security policy. Mechanisms in this class are based on the optimistic assumption that, most of the time, information is transferred without interference. When undesired actions occur, attack detection has the task of reporting that something went wrong and then reacting in an appropriate way. In addition, it is often desirable to identify the exact type of attack. An important facet of attack detection is recovery. Often it is enough to simply report that malicious activity has been found, but some systems require that the effects of an attack be reverted or that an ongoing, discovered attack be stopped.

On the one hand, attack detection has the advantage that it operates under the worst-case assumption that the attacker gains access to the communication channel and is able to use or modify the resource. On the other hand, detection is not effective in providing confidentiality of information. When the security policy specifies that interception of information has a serious security impact, attack detection is not an applicable mechanism.

The most important members of the attack detection class, which have received an increasing amount of attention in the last few years, are intrusion detection systems (IDSs). Intrusion detection [2, 3] is the process of identifying and responding to malicious activities targeted at computing and network resources. This definition introduces the notion of intrusion detection as a process, which involves technology, people, and tools. An intrusion detection system monitors and collects data from a target system that should be protected, processes and correlates the gathered information, and initiates responses when evidence of an intrusion is detected. IDSs are traditionally classified as anomaly based or signature based.
Signature-based systems act similarly to virus scanners and look for known, suspicious patterns in their input data. Anomaly-based systems watch for deviations of actual from expected behavior and classify all abnormal activities as malicious. The advantage of signature-based designs is that they identify attacks with acceptable accuracy and tend to produce fewer false alarms (i.e., classifying an action as malicious when in fact it is not) than their anomaly-based cousins. Such systems are also more intuitive to build and easier to install and configure, especially in large production networks. Because of this, nearly all commercial systems and most deployed installations use signature-based detection. Although anomaly-based variants offer the advantage of being able to find previously unknown intrusions, the cost of having to deal with an order of magnitude more false alarms is often prohibitive.

Depending on their source of input data, IDSs can be classified as either network based or host based. Network-based systems collect data from network traffic (e.g., packets captured by network interfaces in promiscuous mode), while host-based systems monitor events at the operating system level, such as system calls, or receive input from applications (e.g., via log files). Host-based designs can collect high-quality data directly from the affected system and are not affected by encrypted network traffic. However, they often seriously impact the performance of the machines they run on. Network-based IDSs, on the other hand, can be set up nonintrusively, often as an appliance box, without interfering with the existing infrastructure. In many cases, this makes them the preferred choice.

As many vendors and research centers have developed their own intrusion detection systems, the Internet Engineering Task Force (IETF) has created the Intrusion Detection Working Group [9] to coordinate international standardization efforts.
The aim is to allow intrusion detection systems to share information and communicate via well-defined interfaces, by proposing a generic architectural description and the Intrusion Detection Message Exchange Format (IDMEF).

A major issue when deploying intrusion detection systems in large network installations is the huge number of alerts that are produced. These alerts have to be analyzed by system administrators, who must decide on appropriate countermeasures. Given the current state of the art of intrusion detection, however, many of the reported incidents are in fact false alerts. This makes the analysis process for the
system administrator cumbersome and frustrating, with the result that IDSs are often disabled or ignored. To address this issue, two techniques have been proposed: alert correlation and alert verification.

Alert correlation is an analysis process that takes as input the alerts produced by intrusion detection systems and produces compact reports on the security status of the network under surveillance. By reducing the total number of individual alerts and aggregating related incidents into a single report, it becomes easier for a system administrator to distinguish actual from bogus alarms. In addition, alert correlation offers the benefit of recognizing higher-level patterns in an alert stream, helping the administrator obtain a better overview of the activities on the network.

Alert verification is directly aimed at the problem that intrusion detection systems often have to analyze data without sufficient contextual information. The classic example is the scenario of a Code Red worm attacking a Linux Web server. A valid attack is indeed seen on the network; however, the alert that an IDS raises is of no use, because the Linux server is not vulnerable (Code Red can only exploit vulnerabilities in Microsoft's IIS Web server). The intrusion detection system would require more information than is available from network packets alone to determine that this attack cannot possibly succeed. Alert verification is the term used for all mechanisms that draw on additional information or means to determine whether an attack was successful. In the example above, the alert verification mechanism could supply the IDS with the knowledge that the attacked Linux server is not vulnerable to Code Red. The IDS can then react accordingly and suppress the alert or reduce its priority, thus reducing the workload of the administrator.
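As a small sketch of the alert verification idea, the following Python fragment downgrades alerts whose exploit targets software the attacked host does not run. The host inventory, addresses, and alert fields are all invented for illustration.

```python
# Hypothetical inventory of software running on each monitored host.
HOST_SOFTWARE = {
    "10.0.0.5": {"linux", "apache"},   # Linux Web server
    "10.0.0.9": {"windows", "iis"},    # Windows IIS server
}

def verify_alert(alert: dict) -> str:
    """Return the priority of an IDS alert after a context check:
    'high' if the target actually runs the exploited software,
    'low' (likely a false alarm) otherwise."""
    installed = HOST_SOFTWARE.get(alert["target"], set())
    return "high" if alert["exploits"] in installed else "low"

# Code Red exploits IIS; against a Linux/Apache host the alert is downgraded.
alert = {"signature": "Code Red", "exploits": "iis", "target": "10.0.0.5"}
print(verify_alert(alert))   # low
```

A production system would of course need an up-to-date inventory, typically gathered by scanning or agent software, which is precisely the "additional information" the text refers to.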
6.3 Secure Network Protocols

Now that the general concepts and mechanisms of network security have been introduced, this section concentrates on two actual instances of secure network protocols: the secure sockets layer (SSL [20]) and the transport layer security (TLS [24]) protocols. The idea of secure network protocols is to create an additional layer between the application and transport/network layers that provides services for a secure end-to-end communication channel.

TCP/IP is almost always used at the transport/network layers on the Internet; its task is to provide a reliable end-to-end connection between remote tasks on different machines that intend to communicate. The services at that level are usually utilized directly by application protocols to exchange data, for example, by the Hypertext Transfer Protocol (HTTP) for Web services. Unfortunately, the network layer transmits this data unencrypted, leaving it vulnerable to eavesdropping and tampering attacks. In addition, the authentication mechanisms of TCP/IP are minimal, allowing a malicious user to hijack connections and redirect traffic to his machine, as well as to impersonate legitimate services. These threats are mitigated by secure network protocols, which provide privacy and data integrity between two communicating applications by creating an encrypted and authenticated channel.

SSL has emerged as the de facto standard for secure network protocols. Originally developed by Netscape, its latest version, SSL 3.0, is also the basis for the standard proposed by the IETF under the name TLS. Both protocols are quite similar and share common ideas, but they unfortunately cannot interoperate. The following discussion will mainly concentrate on SSL and only briefly explain the extensions implemented in TLS. The SSL protocol [21] usually runs above TCP/IP (although it could use any reliable transport protocol) and below higher-level protocols such as HTTP.
It uses TCP/IP on behalf of the higher-level protocols, and in the process allows an SSL-enabled server to authenticate itself to an SSL-enabled client, allows the client to authenticate itself to the server, and allows both machines to establish an encrypted connection. These capabilities address fundamental concerns about communication over the Internet and other TCP/IP networks and give protection against message tampering, eavesdropping, and spoofing. • SSL server authentication allows a user to confirm a server’s identity. SSL-enabled client software can use standard techniques of public key cryptography to check that a server’s certificate and
public key are valid and have been issued by a certification authority (CA) listed in the client's list of trusted CAs. This confirmation might be important if, for example, the user is sending a credit card number over the network and wants to check the receiving server's identity.
• SSL client authentication allows a server to confirm a user's identity. Using the same techniques as those used for server authentication, SSL-enabled server software can check that a client's certificate and public key are valid and have been issued by a certification authority (CA) listed in the server's list of trusted CAs. This confirmation might be important if, for example, the server is a bank sending confidential financial information to a customer and wants to check the recipient's identity.
• An encrypted SSL connection requires all information sent between a client and a server to be encrypted by the sending software and decrypted by the receiving software, thus providing a high degree of confidentiality. Confidentiality is important for both parties to any private transaction. In addition, all data sent over an encrypted SSL connection are protected with a mechanism for detecting tampering, that is, for automatically determining whether the data have been altered in transit.

SSL uses X.509 certificates for authentication, RSA as its public key cipher, and one of RC4-128, RC2-128, DES, 3DES, or IDEA (international data encryption algorithm) as its bulk symmetric cipher. The SSL protocol includes two subprotocols: the SSL Record Protocol and the SSL Handshake Protocol. The SSL Record Protocol simply defines the format used to transmit data. The SSL Handshake Protocol (which itself uses the SSL Record Protocol) is utilized to exchange a series of messages between an SSL-enabled server and an SSL-enabled client when they first establish an SSL connection.
This exchange of messages is designed to facilitate the following actions:
• Authenticate the server to the client
• Allow the client and server to select the cryptographic algorithms, or ciphers, that they both support
• Optionally authenticate the client to the server
• Use public key encryption techniques to generate shared secrets
• Establish an encrypted SSL connection based on the previously exchanged shared secret

The SSL Handshake Protocol is composed of two phases. Phase 1 deals with the selection of a cipher, the exchange of a secret key, and the authentication of the server. Phase 2 handles client authentication, if requested, and finishes the handshake. After the handshake stage is complete, the data transfer between client and server begins. All messages during and after handshaking are sent over the SSL Record Protocol layer. Optionally, session identifiers can be used to reestablish a secure connection that has been set up previously.

Figure 6.4 lists, in slightly simplified form, the messages exchanged between the client C and the server S during a handshake in which neither client authentication nor session identifiers are involved. In this figure, {data}key means that data has been encrypted with key. The message exchange shows that the client first sends a challenge to the server, which responds with an X.509 certificate containing its public key. The client then creates a secret key and uses RSA with the server's public key to encrypt it, sending the result back to the server. Only the server is capable of decrypting that message with its private key and retrieving the shared, secret key. In order to prove to the client that
FIGURE 6.4 SSL handshake message exchange.
the secret key has been successfully decrypted, the server encrypts the client's challenge with the secret key and returns it. When the client is able to decrypt this message and successfully retrieve the original challenge using the secret key, it can be certain that the server has access to the private key corresponding to its certificate. From this point on, all communication is encrypted using the chosen cipher and the shared secret key.

TLS uses the same two subprotocols and a similar handshake mechanism. However, the algorithms for calculating message authentication codes (MACs) and secret keys have been modified to make them cryptographically stronger. In addition, the constraints on padding a message up to the next block size have been relaxed for TLS. This leads to an incompatibility between the two protocols.

SSL/TLS is widely used to secure Web and mail traffic. HTTP and the current mail protocols IMAP (Internet Message Access Protocol) and POP3 (Post Office Protocol version 3) transmit user credentials as well as application data unencrypted. By building them on top of a secure network protocol such as SSL/TLS, they benefit from secured channels without modifications. The secure variants simply use different well-known destination ports (443 for HTTPS, 993 for IMAPS, and 995 for POP3S) than their insecure cousins.
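In practice, an application rarely implements the handshake itself but delegates it to a TLS library. As a minimal sketch, Python's standard ssl module produces a client-side context that performs the server-authentication and encryption steps described above; the host name and port in the commented usage are illustrative.

```python
import ssl

# A default client context validates the server's certificate against the
# system's list of trusted CAs and checks that it matches the host name,
# i.e., it performs the SSL server-authentication step described above.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED   # server must present a valid cert
assert ctx.check_hostname                     # cert must match the server name

# Wrapping a TCP socket runs the handshake and yields an encrypted channel
# over which the application protocol (here HTTP) is spoken unchanged:
#
#   with socket.create_connection(("www.example.org", 443)) as raw:
#       with ctx.wrap_socket(raw, server_hostname="www.example.org") as tls:
#           tls.sendall(b"GET / HTTP/1.0\r\nHost: www.example.org\r\n\r\n")
```

The wrapped socket exposes the same send/receive interface as a plain one, which is why HTTP, IMAP, and POP3 gain TLS protection "without modifications," as noted above.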
6.4 Secure Applications

A variety of popular tools that allow access to remote hosts (such as telnet, rsh, and rlogin) or that provide means for file transfer (such as rcp or ftp) exchange user credentials and data in plain text. This makes them vulnerable to eavesdropping, tampering, and spoofing attacks. Although the tools mentioned above could also have been built upon SSL/TLS, a different protocol suite called Secure Shell (SSH) [19] has been developed that pursues partially overlapping goals. The SSH transport and user authentication protocols have features similar to those of SSL/TLS. However, they differ in the following ways:
• TLS server authentication is optional, and the protocol supports fully anonymous operation, in which neither side is authenticated. As such connections are inherently vulnerable to man-in-the-middle attacks, SSH requires server authentication.
• TLS does not provide the range of client authentication options that SSH does; public key via RSA is the only option.
• Most importantly, TLS does not have the extra features provided by the SSH connection protocol.

The SSH connection protocol uses the underlying connection (also known as a secure tunnel) that has been established by the SSH transport and user authentication protocols between two hosts. It provides interactive login sessions, remote execution of commands, and forwarded TCP/IP as well as X11 connections. All these terminal sessions and forwarded connections are realized as different logical channels that may be opened by either side on top of the secure tunnel. Channels are flow controlled: no data may be sent over a channel until a message is received indicating that window space is available.

The current version of the SSH protocol is SSH 2. It represents a complete rewrite of SSH 1 and improves some of its structural weaknesses.
Because it encrypts packets in a different way and has abandoned the notion of separate server and host keys in favor of host keys only, the two protocol versions are incompatible. For applications built from scratch, SSH 2 should always be the preferred choice. Using logical channels for interactive login sessions and remote execution, a complete replacement for telnet, rsh, and rlogin can easily be implemented. A popular site listing open-source implementations that are freely available for many different platforms can be found at [14]. In addition, a secure file transfer protocol (sftp) application has been developed that makes the use of regular File Transfer Protocol (FTP)-based programs obsolete.

Notice that it is possible to tunnel arbitrary application traffic over a connection that has been previously set up by the SSH protocols. Similar to SSL/TLS, Web and mail traffic can be securely transmitted over an SSH connection before reaching the server port at the destination host. The difference is that SSH requires a secure tunnel, bound to a certain port at the destination host, to be created in advance. The setup
of this secure channel, however, requires the client initiating the connection to log in to the server; usually, this means the user must have an account at the destination host. After the tunnel has been established, all traffic sent by the client is forwarded to the desired port at the target machine, and the connection is, of course, encrypted. In contrast, SSL/TLS connects directly to a certain port without first logging in to the destination host. The encryption is set up directly between the client and the service listening at the destination port, without a prior redirection via an SSH server. The technique of tunneling application traffic is often utilized for mail transactions when the mail server does not support SSL/TLS directly (as users have accounts at the mail server anyway), but it is less common for Web traffic.
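The mail-tunneling scenario just described can be set up with a single command on a typical SSH client. The host name, account, and local port below are illustrative; the example assumes the user alice has an account on mailhost.example.com.

```shell
# Open a tunnel: connections to local port 1143 are forwarded through the
# encrypted SSH channel to port 143 (IMAP) on the mail server itself.
# -N requests no remote command, only the port forwarding.
ssh -N -L 1143:localhost:143 alice@mailhost.example.com

# The mail client is then configured to use localhost:1143 instead of
# mailhost.example.com:143, so all IMAP traffic travels inside the tunnel.
```

This is exactly the difference from SSL/TLS pointed out above: the tunnel is bound to a fixed destination port and requires a prior login, whereas a TLS-capable client would connect to the secure port directly.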
6.5 Summary

This chapter discusses the security threats that systems face when they are connected to the Internet. In order to achieve the security properties required by the security policy in use, three different classes of mechanisms can be adopted. The first is attack prevention, which attempts to stop the attacker before he can reach his desired goals; access control and firewalls fall into this category. The second approach aims to make data unusable for unauthorized persons by applying cryptographic means; secret key as well as public key mechanisms can be utilized. The third class of mechanisms comprises attack detection approaches, which attempt to detect malicious behavior and to recover after undesired activity has been identified.

The text also covers secure network protocols and applications. SSL/TLS as well as SSH are introduced, and their most common fields of operation are highlighted. These protocols form the basis for securing traffic that is sent over the Internet on behalf of a wide variety of applications.
References

[1] Advanced Encryption Standard (AES). National Institute of Standards and Technology, U.S. Department of Commerce, FIPS 197, 2001.
[2] Edward Amoroso. Intrusion Detection: An Introduction to Internet Surveillance, Correlation, Trace Back, and Response. Intrusion.Net Books, Andover, NJ, 1999.
[3] Rebecca Bace. Intrusion Detection. Macmillan Technical Publishing, Indianapolis, 2000.
[4] William R. Cheswick and Steven M. Bellovin. Firewalls and Internet Security. Addison-Wesley, Reading, MA, 1994.
[5] George Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems: Concepts and Design, 2nd edition. Addison-Wesley, Harlow, England, 1996.
[6] Data Encryption Standard (DES). National Bureau of Standards, U.S. Department of Commerce, FIPS 46-3, 1977.
[7] W. Diffie and M. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22:644-654, 1976.
[8] Fyodor. Nmap: The Network Mapper. http://www.insecure.org/nmap/.
[9] Intrusion Detection Working Group. http://www.ietf.org/ids.by.wg/idwg.html.
[10] IP Security Protocol. http://www.ietf.org/html.charters/ipsec-charter.html, 2002.
[11] J. Kohl, B. Neuman, and T. T'so. The evolution of the Kerberos authentication system. In Distributed Open Systems, pp. 78-94, IEEE Computer Society Press, 1994.
[12] Nessus Vulnerability Scanner. http://www.nessus.org/.
[13] Steven Northcutt. Network Intrusion Detection: An Analyst's Handbook. New Riders, Indianapolis, 1999.
[14] OpenSSH: Free SSH Tool Suite. http://www.openssh.org.
[15] R.L. Rivest. The MD5 Message-Digest Algorithm. Technical report, Internet Request for Comments (RFC) 1321, 1992.
[16] R.L. Rivest. The RC4 Encryption Algorithm. Technical report, RSA Data Security, 1992.
[17] R.L. Rivest, A. Shamir, and L.A. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21:120-126, 1978.
[18] Bruce Schneier. Applied Cryptography, 2nd edition. John Wiley & Sons, New York, 1996.
[19] Secure Shell (secsh). http://www.ietf.org/html.charters/secsh-charter.html, 2002.
[20] Secure Socket Layer. http://wp.netscape.com/eng/ssl3/, 1996.
[21] Introduction to Secure Socket Layer. http://developer.netscape.com/docs/manuals/security/sslin/contents.htm, 1996.
[22] William Stallings. Network Security Essentials: Applications and Standards. Prentice Hall, Englewood Cliffs, NJ, 2000.
[23] Andrew S. Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms. Prentice Hall, Englewood Cliffs, NJ, 2002.
[24] Transport Layer Security. http://www.ietf.org/html.charters/tsl-charter.html, 2002.
[25] Public Key Infrastructure X.509. http://www.ietf.org/html.charters/pkix-charter.html, 2002.
2 Industrial Communication Technology and Systems
I Field Area and Control Networks

7 Fieldbus Systems: History and Evolution ........ 7-1
   Thilo Sauter
8 The WorldFIP Fieldbus ........ 8-1
   Jean-Pierre Thomesse
9 FOUNDATION Fieldbus: History and Features ........ 9-1
   Salvatore Cavalieri
10 PROFIBUS: Open Solutions for the World of Automation ........ 10-1
   Ulrich Jecht, Wolfgang Stripf, and Peter Wenzel
11 Principles and Features of PROFInet ........ 11-1
   Manfred Popp, Joachim Feld, and Ralph Büsgen
12 Dependable Time-Triggered Communication ........ 12-1
   Hermann Kopetz, Günther Bauer, and Wilfried Steiner
13 Controller Area Network: A Survey ........ 13-1
   Gianluca Cena and Adriano Valenzano
14 The CIP Family of Fieldbus Protocols ........ 14-1
   Viktor Schiffer
15 The Anatomy of the P-NET Fieldbus ........ 15-1
   Christopher G. Jenkins
16 INTERBUS Means Speed, Connectivity, Safety ........ 16-1
   Jürgen Jasperneite
17 Data Transmission in Industrial Environments Using IEEE 1394 FireWire ........ 17-1
   Michael Scholles, Uwe Schelinski, Petra Nauber, and Klaus Frommhagen
18 Configuration and Management of Fieldbus Systems ........ 18-1
   Stefan Pitzek and Wilfried Elmenreich
19 Which Network for Which Application ........ 19-1
   Jean-Dominique Decotignie
7 Fieldbus Systems: History and Evolution

Thilo Sauter
Austrian Academy of Sciences

7.1 What Is a Fieldbus? ........ 7-1
7.2 Notions of a Fieldbus ........ 7-2
      The Origin of the Word • Fieldbuses as Part of a Networking Concept
7.3 History ........ 7-5
      The Roots of Industrial Networks • The Evolution of Fieldbuses
7.4 Fieldbus Standardization ........ 7-8
      The German–French Fieldbus War • The International Fieldbus War • The Compromise
7.5 Fieldbus Characteristics ........ 7-15
      Communication Concepts • Communication Paradigms • Above the OSI Layers: Interoperability and Profiles • Management
7.6 New Challenges: Industrial Ethernet ........ 7-20
      Ethernet in IEC 61158 • Real-Time Industrial Ethernet
7.7 Aspects for Future Evolution ........ 7-25
      Driving Forces • System Complexity • Software Tools and Management • Network Interconnection and Security
7.8 Conclusion and Outlook ........ 7-30
Acknowledgments ........ 7-31
References ........ 7-31
Appendix ........ 7-37
7.1 What Is a Fieldbus? Throughout the history of automation, many inventions and developments have influenced the face of manufacturing and information processing. But few novelties have had such a radical effect as the introduction of fieldbus systems, and no single achievement was so heavily disputed as these industrial networks. And yet, they have made automation what it is today. But even after some 20 years of fieldbus development, there exists no clear-cut definition for the term. The “definition” given in the IEC 61158 fieldbus standard is more a programmatic declaration, or a least common multiple compromise, than a concise formulation [1]: “A fieldbus is a digital, serial, multidrop, data bus for communication with industrial control and instrumentation devices such as — but not limited to — transducers, actuators and local controllers.” It comprises some important characteristics, but is far from being complete. On the other hand, it is a bit too restrictive. A more elaborate explanation is given by the Fieldbus Foundation, the user organization supporting one of the major fieldbus systems [2]: “A Fieldbus is a digital, two-way, multi-drop communication link among intelligent measurement and control devices. It serves as a Local Area Network (LAN) for advanced process control, remote input/output and high speed factory automation applications.” Again, this is a
bit restrictive, for it limits the application to process and factory automation, the primary areas where the Foundation Fieldbus is used. The lack of a clear definition is mostly due to the complex evolutionary history of fieldbuses. A look at today's situation reveals that fieldbus systems are employed in all automation domains, ranging from the aforementioned process and factory areas to building and home automation, machine building, automotive and railway applications, and avionics. In all those fields, bus systems emerged primarily to break up the conventional star-type point-to-point wiring schemes connecting simple digital and analog input and output devices to central controllers, thereby laying the groundwork for the implementation of truly distributed systems with more intelligent devices. As was declared in the original mission statement of the International Electrotechnical Commission (IEC) work, "the Field Bus will be a serial digital communication standard which can replace present signalling techniques such as 4-20 mA … so that more information can flow in both directions between intelligent field devices and the higher level control systems over a shared communication medium …" [3, 4]. But even though the replacement of especially the traditional 4–20 mA current loop by a digital interface is still handed down in contemporary publications [5] as the sole impetus of fieldbus development, there is much more to the idea of the fieldbus:

• Flexibility and modularity: A fieldbus installation, like any other network, can be extended much more easily than a centralized system, provided the limitations of addressing space, cable length, etc., are not exceeded.
• Configurability: A network, unlike an analog interface, permits the parameterization and configuration of complex field devices, which facilitates system setup and commissioning and is the primary requirement for the usability of intelligent devices.
• Maintainability: Monitoring of devices, applying updates, and other maintenance tasks are easier, if at all possible, via a network.
• Distribution: A network is the prerequisite for distributed systems; many data processing tasks can be moved from a central controller directly into the field devices if the interface can handle reasonably complex forms of communication.

These aspects are not just theoretical contemplations but actual user demands that influenced the development from the beginning [4]. However, as the application requirements in the various automation domains were quite different, so were the solutions, and that makes it difficult to find a comprehensive definition. The purpose of this contribution is not to find the one and only precise definition of what constitutes a fieldbus. The vast literature on the topic shows that this is a futile attempt. Furthermore, such a definition would be mostly of academic nature and is not really necessary either. Instead, the following sections treat the fieldbus as a given phenomenon in automation and look at it from different sides. Typical characteristics are discussed, as well as the role of fieldbus systems in a globally networked automation world. The major part of this chapter is devoted to the historical evolution and the standardization processes, which explain the current situation. Finally, future aspects and evolutionary potential are briefly discussed.
7.2 Notions of a Fieldbus Fieldbus systems have to be seen as an integrative part of a comprehensive automation concept and not as stand-alone solutions. The name is therefore programmatic and evocative. It seems to give an indication of the intentions the developers had in mind and thus deserves special attention.
7.2.1 The Origin of the Word Interestingly enough, not even the etymology of the term itself is fully clear. The English word fieldbus is definitely not the original one. It appeared around 1985 when the fieldbus standardization project
within IEC TC65 was launched [4]; it seems to be a straightforward literal translation of the German term Feldbus, which can be traced back to about 1980 [6]. Indeed, the overwhelming majority of early publications in the area are available only in German. The word itself was coined in the process industry and primarily refers to the process field, designating the area in a plant where lots of distributed field devices, mostly sensors and actuators, are in direct contact with the process to be controlled. Slightly after the German expression, and sharing its etymological root, the French term réseau de terrain (or réseau d'instrumentation, instrumentation network) emerged. This term was not specifically targeted at the process industry, but refers also to large areas with scattered devices.

The connection of such devices to the central control room was traditionally made via point-to-point links, which resulted in significant and expensive cabling. The logical idea, powered by the advances of microelectronics in the late 1970s, was to replace this star-like cabling in the field by a party-line, bus-like installation connecting all devices via a shared medium — the fieldbus [7, 8]. Given the large dimensions of process automation plants, the benefits of a bus are particularly evident. However, the concept was not undisputed when it was introduced.

The fieldbus approach was an ambitious concept: a step toward decentralization, including the preprocessing of data in the field devices, which both increases the quality of process control and reduces the computing burden on the centralized controllers [9]. Along with it came the possibility to configure and parameterize the field devices remotely via the bus. This advanced concept, on the other hand, demanded increased communication between the devices, going far beyond a simple data exchange.
This seemed infeasible to many developers, and still in the mid-1980s one could read statements like the following [10]: “The idea of the fieldbus concept seems promising. However, with reasonable effort it is not realizable at present.” The alternative and somewhat more conservative approach was the development of so-called field multiplexers, devices that collect process signals in the field, serialize them, and transfer them via one single cable to a remote location, where a corresponding device demultiplexes them again [11]. For quite some time the two concepts competed and coexisted [12], but ultimately the field multiplexers mostly disappeared, except for niches in process automation, where many users still prefer such remote input/output (I/O) systems despite the advantages of fieldbus solutions [13]. The central field multiplexer concept of sampling I/O points and transferring their values in simple data frames also survived in some fieldbus protocols designed especially for low-level applications.

The desire to cope with a wiring problem that was getting out of hand in large installations was certainly the main impetus for the development of fieldbus systems. Other obvious and appealing advantages of the concept are modularity, the possibility to easily extend installations, and the possibility to have much more intelligent field devices that can communicate not just for the sake of process data transfer, but also for maintenance and configuration purposes [14, 15]. A somewhat different viewpoint that led to different design approaches was to regard bus systems in process control as the spine of distributed real-time systems [16]. While the wiring optimization concepts were in many cases rather simple bottom-up approaches, these distributed real-time ideas resulted in sophisticated and usually well-investigated top-down designs.
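The multiplexer principle of sampling I/O points and shipping their raw values in simple data frames can be made concrete with a small sketch. The frame layout used here (start byte, station address, sample count, sample values, additive checksum) is purely hypothetical and invented for illustration; it does not reproduce any particular multiplexer product or fieldbus protocol.

```python
def mux_frame(station: int, samples: list[int]) -> bytes:
    """Pack sampled I/O values (one byte each) into a simple frame:
    start byte | station address | sample count | samples | checksum."""
    if any(not 0 <= s <= 255 for s in samples):
        raise ValueError("each sample must fit in one byte")
    body = bytes([station, len(samples)]) + bytes(samples)
    checksum = sum(body) % 256            # trivial additive checksum
    return bytes([0x7E]) + body + bytes([checksum])

def demux_frame(frame: bytes) -> tuple[int, list[int]]:
    """Reverse operation at the remote end: verify and unpack the frame."""
    if frame[0] != 0x7E or sum(frame[1:-1]) % 256 != frame[-1]:
        raise ValueError("corrupted frame")
    station, count = frame[1], frame[2]
    return station, list(frame[3:3 + count])

# A remote I/O station reporting four sampled values over the single cable:
station, values = demux_frame(mux_frame(5, [17, 0, 255, 42]))
```

The point of the sketch is the simplicity: the frame carries no object model and no service semantics, only raw point values — exactly the property that let this scheme survive in low-level fieldbus protocols.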
7.2.2 Fieldbuses as Part of a Networking Concept

An important role in the fieldbus evolution has been played by the so-called automation pyramid. This hierarchical model was defined to structure the information flow required for factory and process automation. The idea was to create a transparent, multilevel network — the basis for computer-integrated manufacturing (CIM). The numbers vary, but typically this model is composed of up to five levels [7, 8]. While the networks for the upper levels already existed by the time the pyramid was defined, the field level was still governed by point-to-point connections. Fieldbus systems were therefore developed also with the aim of finally bridging this gap. The actual integration of field-level networks into the rest of the hierarchy was in fact considered in early standardization [4]; for most of the proprietary developments, however, it was never the primary intention. In the automation pyramid, fieldbuses actually populate two levels: the field level and the cell/process level. For this reason, they are sometimes further differentiated into two classes:
• Sensor–actuator buses or device buses have very limited capabilities and serve to connect very simple devices with, e.g., programmable logic controllers (PLCs). They can be found exclusively on the field level.
• Fieldbuses connect control equipment like PLCs and PCs as well as more intelligent devices. They are found on the cell level and are closer to computer networks. Depending on the point of view, there may even be a third sublevel [17].

This distinction may seem reasonable but is in fact problematic. There are only a few fieldbus systems that can immediately be allocated to one of the groups; most of them are used on both levels. It would therefore be preferable to abandon this arbitrary differentiation.

How do fieldbus systems compare to computer networks? The classical distinction of the different network types used in the automation pyramid hinges on the distances the networks span. From the top down, the hierarchy starts with global area networks (GANs), which cover long, preferably intercontinental distances and nowadays mostly use satellite links. On the second level are wide area networks (WANs), commonly associated with telephone networks (no matter whether analog or digital). Next come the well-known local area networks (LANs), with Ethernet as the most widely used specimen today. They are the classical networks for office automation and cover only short distances. The highest level of the model shown in Figure 7.1 is beyond the scope of the original definition, but is gaining importance with the availability of the Internet. In fact, Internet technology is penetrating all levels of this pyramid, all the way down to the process level.

From GANs to LANs, the classification according to spatial extension is evident. One step below, on the field level, this criterion fails, because fieldbus systems or field area networks (FANs) can cover even larger distances than LANs.
Yet, as LANs and FANs evolved nearly in parallel, some clear distinction between the two network types seemed necessary. As length is inappropriate, the classical border line drawn between LANs and FANs relies mostly on the characteristics of the data transported over these networks. Local area networks have high data rates and carry large amounts of data in large packets. Timeliness is not a primary concern, and real-time behavior is not required. Fieldbus systems, by contrast, have low data rates. Since they transport mainly process data, the size of the data packets is small, and real-time capabilities are important.

For some time, these distinction criteria between LANs and FANs were sufficient and described the actual situation fairly well. Recently, however, drawing the line according to data rates and packet sizes is no longer applicable. In fact, the boundaries between LANs and fieldbus systems have faded. Today, there are fieldbus systems with data rates well above 10 Mbit/s, which is still standard in older LAN installations. In addition, more and more applications require the transmission of video or voice data, which results in large data packets.
FIGURE 7.1 Hierarchical network levels in automation and protocols originally devised for them.
On the other hand, Ethernet as the LAN technology is becoming more and more popular in automation and is bound to replace some of today’s widely used midlevel fieldbus systems. The real-time extensions under development tackle its most important drawback and will ultimately permit the use of Ethernet in low-level control applications. At least for the next 5 years, however, it seems that Industrial Ethernet will not make the lower-level fieldbuses fully obsolete; they are much better optimized for their specific automation tasks than general-purpose Ethernet. But the growing use of Ethernet results in a reduction of the levels in the automation hierarchy. Hence the pyramid gradually turns into a flat structure with at most three, maybe even only two, levels.

Consequently, a more appropriate distinction between LANs and FANs should be based on the functionality and the application area of these networks. According to this argumentation, a fieldbus is simply a network used in automation, irrespective of topology, data rates, protocols, or real-time requirements. It therefore need not be confined to the classical field level; it can be found on higher levels (provided they still exist) as well. A LAN, on the other hand, belongs to the office area. This definition is loose, but mirrors the actual situation. Only one thing seems strange at first: following this definition, Industrial Ethernet turns into a fieldbus, even though many people are inclined to associate it with LANs. However, this is just further evidence that the boundaries between LANs and FANs are fading.
7.3 History

The question of what constitutes a fieldbus is closely linked to the evolution of these industrial networks. The best approach to understanding the essence of the concepts is to review the history and intentions of the developers. This review will also refute one of the common errors frequently propagated by the marketing divisions of automation vendors: that fieldbus systems were a revolutionary invention. They may have revolutionized automation — there is hardly any doubt about it. However, they were only a straightforward evolution that built on preexisting ideas and concepts.
7.3.1 The Roots of Industrial Networks

Although the term fieldbus appeared only about 20 years ago, the basic idea of field-level networks is much older. Still, the roots of modern fieldbus technology are mixed. Both classical electrical engineering and computer science have contributed their share to the evolution, and we can identify three major sources of influence:

• Communication engineering, with large-scale telephone networks
• Instrumentation and measurement systems, with parallel buses and real-time requirements
• Computer science, with the introduction of high-level protocol design

This early stage is depicted in Figure 7.2. One foundation of automation data transfer has to be seen in the classic telex networks and also in standards for data transmission over telephone lines. Large distances called for serial data transmission, and many of these comparatively early standards still exist, like V.21 (data transmission over telephone lines) and X.21 (data transmission over special data lines). Various protocols were defined, mostly described in state machine diagrams and rather simple because of the limited computing power of the devices available at that time. Of course, these communication systems have a point-to-point nature and therefore lack the multidrop characteristic of modern fieldbus systems; nevertheless, they were the origin of serial data transmission. Talking about serial data communication, one should note that the engineers who defined the first protocols often had a different understanding of the terms serial and parallel than we have today. For example, the serial interface V.24 transmits the application data serially, but the control data in a parallel way over separate control lines.

In parallel to the development of data transmission in the telecommunication sector, hardware engineers defined interfaces for stand-alone computer systems to connect peripheral devices such as printers.
The basic idea of having standardized interfaces for external devices was soon extended to process control and instrumentation equipment. The particular problems to be solved were the synchronization of
FIGURE 7.2 Roots of fieldbus systems.
spatially distributed measurement devices and the collection of measurement data from multiple devices in large-scale experimental setups. This led to the development of standards like CAMAC (computer-automated measurement and control, mostly used in nuclear science) and GPIB (general purpose interface bus, later also known as IEEE 488). To account for the limited data processing speed and the real-time requirements of synchronization, these bus systems had parallel data and control lines, which is also not characteristic of fieldbus systems. However, they already used the typical multidrop structure.

Later on, with the higher integration density of integrated circuits and thus increased functionality and processing capability of microcontrollers, devices became smaller and portable. The connectors of parallel bus systems were now too big and clumsy, and alternatives were sought [18]. The underlying idea of developments like I2C [19] was to extend the already existing serial point-to-point connections of computer peripherals (based on RS 232) to support longer distances and finally also multidrop arrangements. The capability of having a bus structure with more than just two connections, together with increased noise immunity due to differential signal coding, eventually made RS 485 a cornerstone of fieldbus technology up to the present day.

Historically the youngest root of fieldbus systems, but certainly the one that left the deepest mark, was the influence of computer science. Its actual contribution was a structured approach to the design of high-level communication systems, in contrast to the mostly monolithic design approaches that had been sufficient until then. This change in methodology had been necessitated by the growing number of computers used worldwide and the resulting complexity of communication networks. Conventional telephone networks were no longer sufficient to satisfy the interconnection requirements of modern computer systems.
As a consequence, the big communication backbones of the national telephone companies gradually changed from analog to digital systems. This opened up the possibility of transferring large amounts of data from one point to another. Together with an improved physical layer, the first really powerful data transmission protocols for wide area networks were defined, such as X.25 (packet switching) and SS7 (common channel signaling). In parallel to this evolution in the telecommunications sector, local area networks were devised for the local interconnection of computers, which soon led to a multitude of solutions. It took nearly a decade until Ethernet and the Internet Protocol (IP) suite finally gained the dominant position they hold today.
7.3.2 The Evolution of Fieldbuses

The preceding section gave only a very superficial overview of the roots of networking, which laid the foundations not only of modern computer networks, but also of those on the field level. But let us now look more closely at the actual evolution of the fieldbus systems. Here again, we have to consider the
different influences of computer science and electrical engineering. First and foremost, the key contribution undoubtedly came from the networking of computer systems, when the International Organization for Standardization (ISO) introduced the Open Systems Interconnection (OSI) model [20, 21]. This seven-layer reference model was (and still is) the starting point for the development of many complex communication protocols.

The first application of the OSI model to the domain of automation was the definition of the Manufacturing Automation Protocol (MAP) in the wake of the CIM idea [22]. MAP was intended to be a framework for the comprehensive control of industrial processes covering all automation levels, and the result of the definition was a powerful and flexible protocol [23]. Its complexity, however, made implementations extremely costly and hardly justifiable for general-purpose use. As a consequence, a tightened version called MiniMAP, using a reduced model based on OSI layers 1, 2, and 7, was proposed to better address the problems of the lower automation layers [24]. Unfortunately, it did not have the anticipated success either. What did have success was the Manufacturing Message Specification (MMS). It defined the cooperation of various automation components by means of abstract objects and services and was later used as a starting point for many other fieldbus definitions [25].

The lack of acceptance of MiniMAP and the inapplicability of the original MAP/MMS standard to time-critical systems [26] were finally the reason for the IEC to launch the development of a fieldbus based on the MiniMAP model, but tailored to the needs of the field level. According to the original objectives, the higher levels of the automation hierarchy were to be covered by MAP or PROWAY (process data highway) [22].
Independent of this development in computer science, the progress in microelectronics brought forward many different integrated circuits (ICs), and new interfaces were needed to interconnect these ICs in an efficient and cheap way. The driving force was the reduction of both the interconnect wires on the printed circuit boards and the number of package pins on the ICs. Consequently, electrical engineers — without knowledge of the ISO/OSI model or similar architectures — defined simple buses like I2C. Being interfaces rather than fully fledged bus systems, they have very simple protocols, but they were and still are widely used in various electronic devices.

Long before the invention of board-level buses, the demand for a reduction of cabling weight in avionics and space technology had led to the development of the MIL-STD-1553 bus, which can be regarded as the first real fieldbus. Introduced in 1970, it showed many characteristic properties of modern fieldbus systems: serial transmission of control and data information over the same line, a master–slave structure, the possibility to cover longer distances, and integrated controllers. It is still used today. Later on, similar considerations (reduction of cabling weight and costs) resulted in the development of several bus systems in the automotive industry, but also in the automation area. A characteristic property of these fieldbuses is that they were defined in the spirit of classical interfaces, with a focus on the lower two protocol layers and few or no application layer definitions. With time, such definitions were added to make the systems applicable to other areas as well. Controller Area Network (CAN) is a good example of this evolution: for the originally targeted automotive market, the definition of the lowest two OSI layers was sufficient.
Even today, automotive applications of CAN typically use only these low-level communication features, because they are easy to use and the in-vehicle networks are usually closed. For applications in industrial automation, however, where extensibility and interoperability are important issues, higher-level functions are needed. So, when CAN was found to be interesting for other application domains as well, a special application layer was added. The lack of such a layer in the original definition is the reason why there are many different fieldbus systems (like CANopen, Smart Distributed System (SDS), and DeviceNet) using CAN as a low-level interface.

From today’s point of view, it can be stated that all fieldbuses that still have some relevance were developed using the top-down or computer science-driven approach, i.e., a proper protocol design with abstract high-level programming interfaces to facilitate usage and integration in complex systems. The fieldbuses that followed the bottom-up or electrical engineering-driven approach, i.e., that were understood as low-level computer interfaces, did not survive, due to their inflexibility and incompatibility with modern software engineering, unless some application layer functions were included in the course of the evolution.
FIGURE 7.3 Milestones of fieldbus evolution and related fields.
From the early 1980s on, when automation made a great leap forward with PLCs and more intelligent sensors and actuators, something like a gold rush set in. The increasing number of devices used in many application areas called for reduced cabling, and microelectronics had grown mature enough to support the development of elaborate communication protocols. This was also the birth date of the fieldbus as an individual term. Different application requirements generated different solutions, and from today’s point of view, it seems that creating new fieldbus systems was a trendy and fashionable occupation for many companies in the automation business. Those mostly proprietary concepts never had a real future, because the number of produced nodes could never justify the development and maintenance costs. Figure 7.3 depicts the evolution timeline of fieldbus systems and their environments. The list of examples is of course not comprehensive; only systems that still have some significance have been selected. Details about the individual solutions are summarized in the tables in the appendix.

As the development of fieldbus systems was a typical technology push activity driven by the device vendors, the users first had to be convinced of the new concepts. Even though the benefits were quite obvious, the overwhelming number of different systems appalled rather than attracted the customers, who were used to perfectly compatible current-loop or simple digital inputs and outputs as interfaces between field devices and controllers and were reluctant to adopt new concepts that would bind them to one vendor. What followed was a fierce selection process in which not always the fittest survived, but often those with the highest marketing power behind them. Consequently, most of the newly developed systems vanished or remained restricted to small niches.
After a few years of struggle and confusion on the users’ side, it became apparent that proprietary fieldbus systems would always have only limited success and that more benefit lay in creating open specifications, so that different vendors could produce compatible devices, giving the customers back their freedom of choice [8, 27]. As a consequence, user organizations were founded to carry on the definition and promotion of the fieldbus systems independent of individual companies [28]. It was this idea of open systems that finally paved the way for the breakthrough of the fieldbus concept.
7.4 Fieldbus Standardization

From creating an open specification to the standardization of a fieldbus system, it is only a small step. The basic idea behind it is that a standard establishes a specification in a very rigid and formal way, ruling out the possibility of quick changes. This attaches a notion of reliability and stability to the specification,
which in turn secures the trust of the customers and, consequently, also the market position. Furthermore, a standard is vendor independent, which guarantees openness. Finally, in many countries standards have a legally binding position, which means that when a standard can be applied (e.g., in connection with a public tender), it has to be applied. Hence, a standardized system gains a competitive edge over its nonstandardized rivals. This position is typical of, e.g., Europe (see [29] for an interesting U.S.-centric comment). It is therefore no wonder that after the race for fieldbus developments, a race for standardization was launched. This was quite easy on a national level, and most of today’s relevant fieldbus systems soon became national standards. Troubles started when international solutions were sought.

One problem of fieldbus standardization is that the activities are scattered among a multitude of committees and working groups according to the application fields. This reflects the historical evolution and underpins the previous statement that the fieldbus is not a unique and revolutionary technology, but emerged independently in many different areas. Interestingly enough, the standardization activities are not even confined to the electrotechnical standardization bodies.
Inside the IEC, the committees concerned are:

• IEC TC65/SC65C: Industrial-Process Measurement and Control/Digital Communications
• IEC TC17/SC17B: Switchgear and Controlgear/Low-Voltage Switchgear and Controlgear

In the ISO, work is being done in:

• ISO TC22/SC3: Road Vehicles/Electrical and Electronic Equipment
• ISO TC184/SC5: Industrial Automation Systems and Integration/Architecture, Communications and Integration Frameworks
• ISO TC205/WG3: Building Environment Design/Building Control Systems Design

The second players in the international standardization arena are the European standardization bodies CENELEC and CEN.* They are not mirrors of the IEC and ISO; the committees work independently, even though much work is being done in parallel. In recent years, cooperation agreements were established with the aim of facilitating the harmonization of international standardization. The cooperation of ISO and CEN is governed by the Vienna Agreement [30] (1990), and that of IEC and CENELEC by the Dresden Agreement [31] (1996). Roughly, these documents define procedures to carry out parallel votes and to adopt standards simultaneously on both the international and European levels. In practice, this comes down to international standards always superseding European ones, even though there is the theoretical possibility of European work being adopted on an international level. Hence, European committees are today much more closely connected to their worldwide counterparts than they were at the beginning of the fieldbus era.
Within CENELEC, the relevant committees are:

• CLC TC65CX: Fieldbus
• CLC TC17B: Low-Voltage Switchgear and Controlgear Including Dimensional Standardization
• CLC TC205: Home and Building Electronic Systems (HBES)

In CEN, fieldbuses are defined in:

• CEN TC247: Building Automation, Controls and Building Management

The committee with the longest track record in fieldbus standardization is IEC SC65C, which in May 1985 started the ambitious endeavor of defining an international and uniform fieldbus standard for process and industrial automation. This initiative came relatively early, soon after the trend toward field-level networking and the inability of MAP to fully cover it became apparent. With the background of several industry-driven solutions emerging all around, however, this project caused heavy turbulence and opened a battlefield for politics that gradually left the ground of technical discussion. Table 7.1 shows the overall timeline of these fieldbus wars, which form an essential and obscure chapter in fieldbus history and thus deserve special attention.

*CENELEC, Comité Européen de Normalisation Electrotechnique (European Committee for Electrotechnical Standardization); CEN, Comité Européen de Normalisation (European Committee for Standardization).
TABLE 7.1 Fieldbus Standardization Timeline from the Viewpoint of IEC 61158

Period | Status of Standards | Major Activities
1985–1990 | The claims are staked | Start of the IEC fieldbus project; selection among various national standards, with the German Profibus and the French FIP as the main candidates; first attempts to combine the two approaches
1990–1994 | German–French fieldbus war | Attempt at a general specification based on WorldFIP and the Interoperable System Project (ISP)
1995–1998 | Standardization locked in stalemate | Development of the American Foundation Fieldbus (FF) in response to the European approach; formation of the CENELEC standards comprising several fieldbus systems in one standard; deadlock of the international standard through obstructive minorities
1999–2000 | The compromise | The eight-type specification becomes a standard
2000–2002 | Amendments to reach maturity for the market | The standard is enhanced by more types, and the necessary profiles are specified in IEC 61784
7.4.1 The German–French Fieldbus War

The actual starting point of international fieldbus standardization in IEC SC65C was a new work item proposed by the German national mirror committee [32]. The task was allocated to the already existing working group 6, dealing with the definition of PROWAY, another fieldbus predecessor. At that time, the development of fieldbus systems was mainly a European endeavor, thrust forward by research projects that still had a strong academic background as well as by many proprietary developments. The European activities — at least those on the nonproprietary level — also have to be seen as a response to MAP, where the U.S. had a dominating position. Hence, the two big European fieldbus projects at that time, Factory Instrumentation Protocol (FIP) and Profibus, were intended as counterweights in the international automation world.

The IEC work started with a definition of requirements a fieldbus must meet [4]. In parallel, the ISA SP50 committee started its own fieldbus project on the U.S. level and defined a slightly different set of requirements [24, 33]. Work was coordinated between the two committees, with ISA taking the more active part. It launched a call for proposals to evaluate existing solutions. In response to this call, the following systems were identified as possible candidates [34]:

• FIP (Flux Information Processus, Factory Instrumentation Protocol), a French development started around 1982
• Profibus (derived from process field), a German project started around 1984
• A proposal from Rosemount based on the ISO 8802.4 token-passing bus
• A proposal from Foxboro based on the high-level data link control (HDLC) protocol
• The IEEE 1118 project, in fact an extension of Bitbus
• An extension of MIL-STD-1553B defined by a U.K. consortium

All these proposals were evaluated, and finally the two most promising projects retained for further consideration were the French FIP and the German Profibus.
Unfortunately, the approaches of the two systems were completely different. Profibus was based on a distributed control idea and in its original form supported an object-oriented vertical communication according to the client–server model in the spirit of the MAP/MMS specification, with the lower two layers taken from the existing PROWAY project. FIP, on the other hand, was designed with a central, but strictly real-time-capable control scheme and with the newly developed producer–consumer (producer–distributor–consumer) model for horizontal communication. In fact, the idea behind FIP was to have a distributed operating system; a communication protocol was just one building block. Different as they were, the two systems were well suited for complementary application areas [35]. Evidently, a universal fieldbus had to combine the benefits of both, and so the following years saw strong efforts to find a viable compromise and a convergence between the two approaches. The most problematic part was the data link layer, where Profibus supported a token-passing scheme, while FIP relied on a
central scheduling approach. The suggestion to standardize both in parallel was not supported, and so two different proposals were put to a vote: a token-passing approach and a new proposal defined by an expert group with the aim of reconciling the two worlds [32]. The latter was more FIP oriented and finally prevailed [36], but it was very complex and left many Profibus supporters skeptical about its practical usability.

In the meantime, the leading role in the standardization efforts on the IEC level had been taken over not by the Europeans, but by the SP50 committee of the Instrumentation, Systems and Automation Society (ISA, at that time still standing for Instrument Society of America). Owing to its mandatory composition involving manufacturers and users, it had taken a more pragmatic view and had been much more efficient during the late 1980s [37]. Actually, the committee had defined and issued (as a U.S. standard in 1993) a solution of its own. The results of this work exerted an important influence on the layer structure of the standard as we have it today [8, 38]. Finally, ISA and IEC decided to hold joint meetings [35], and from that point onward the actual technical work was done within ISA SP50, while IEC restricted its activities to organizing the voting process.

By the mid-1990s, the IEC committee was still struggling to overcome the differences between Profibus and FIP in what was sarcastically called the two-headed monster. With respect to its goal of defining a uniform fieldbus solution, it had not produced any substantial outcome in more than 8 years. The only exception was the definition of the physical layer, which was adopted as IEC 61158-2 in 1993. This part has since been used very successfully, mainly in the process automation area.
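The contrast between Profibus’s client–server model and FIP’s producer–distributor–consumer model, described above, can be made concrete with a deliberately simplified sketch. All class and variable names here are invented for illustration and do not correspond to either protocol’s actual services: in the client–server style, one client addresses one server and receives a private reply; in the producer–distributor–consumer style, a central distributor asks the producer of an identified variable to publish its value, and every consumer subscribed to that identifier receives it at once.

```python
# Client-server: one requester addresses one responder (vertical communication).
class Server:
    def __init__(self, objects):
        self.objects = objects            # named objects offered as services
    def read(self, name):
        return self.objects[name]         # the reply goes to the one client only

# Producer-distributor-consumer: broadcast of identified variables (horizontal).
class Distributor:
    def __init__(self):
        self.producers = {}               # variable id -> producing callable
        self.consumers = {}               # variable id -> list of callbacks
    def attach_producer(self, var_id, fn):
        self.producers[var_id] = fn
    def subscribe(self, var_id, callback):
        self.consumers.setdefault(var_id, []).append(callback)
    def circulate(self, var_id):
        """Central scheduler: ask the producer, broadcast to all consumers."""
        value = self.producers[var_id]()
        for cb in self.consumers.get(var_id, []):
            cb(value)

# Usage: two consumers receive the same temperature sample in one cycle.
bus = Distributor()
bus.attach_producer("temp", lambda: 21.5)
received = []
bus.subscribe("temp", received.append)
bus.subscribe("temp", received.append)
bus.circulate("temp")                     # received == [21.5, 21.5]
```

The sketch shows why the models suit complementary application areas: the distributor gives deterministic, schedulable updates to many stations, while the client–server exchange supports on-demand access to rich device objects.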
On top of the physical layer, however, the standardization drafts became more and more comprehensive, overloaded with all kinds of communication and control principles imported from the different systems. In the data link layer specification, for example, three different types of tokens were introduced: the scheduler token, which determines which station controls the timing on the bus; the delegated token, with which another station can temporarily gain control over the bus; and the circulated token, which is passed from station to station for bus access. The problem with these all-inclusive approaches was that a full implementation of the standard was too expensive, whereas a partial implementation would have resulted in incompatible, non-interoperable devices (a problem also encountered in early implementations of, e.g., Profibus-FMS (Fieldbus Message Specification), where significant parts of the standard are optional rather than mandatory). Outside the international standardization framework, but alerted by the inability of the committees to reach a resolution, the big vendors of automation systems launched two additional initiatives to find a compromise. The international WorldFIP project, founded in 1993, had the goal of adding the functionality of the client–server model to FIP [39]. On the other side, the Interoperable Systems Project (ISP) attempted to demonstrate from 1992 onward how Profibus could be enhanced with the publisher–subscriber communication model, which is essentially the same as the producer–consumer model of FIP. Strangely enough, the ISP was abandoned in 1994, before reaching a mature state, for strategic reasons [40].
7.4.2 The International Fieldbus War

In 1994, after long years of struggle between German and French experts to combine the FIP and Profibus approaches, several, mainly American, companies decided not to watch the endless discussions any longer. With the end of the ISP project, several former project members joined forces with the WorldFIP North America organization and formed the Fieldbus Foundation. This new association began the definition of a new fieldbus optimized for the process industry: the Foundation Fieldbus (FF). The work was done outside the IEC committees within the ISA, and for some time, the IEC work seemed to doze off. Meanwhile, in Europe, disillusionment had run rampant [3]. Following the failure to agree on an acceptable IEC draft for a universal fieldbus, several players deemed it necessary to make a new start at least on a European level. Therefore, the CENELEC committee TC65CX was established in 1993 with the aim of finding an intermediate solution until an agreement was reached within IEC. By that time, the standardization issue had ceased to be a merely technical question. Fieldbus systems had already made their way into the
market, much effort and enormous amounts of money had been invested in the development of protocols and devices, and there were already many installations. Nobody could afford to abandon a successful fieldbus; hence it was, from an economic point of view, impossible to start from scratch and create a unified but new standard incompatible with the established and widely used national ones. This market pressure was also one reason why no uniform fieldbus solution could be agreed upon within CENELEC. However, after lengthy and controversial discussions [3], the national committees found a remarkable and unprecedented compromise: all national standards under consideration were simply compiled "as is" into European standards [41]. Every part of such a multipart standard is a copy of the respective national standard, which means that every part is a fully functioning system. Although this approach is very pragmatic and, once adopted, seemed easy to carry out, it took a long time to arrive at. After all, given the strict European regulations on the mandatory application of standards, inclusion in the standard would ensure competitiveness for the respective system suppliers. As the committees were populated mostly by representatives of the big players, they naturally tried to optimize their own positions. Consequently, the contents of the individual CENELEC standards that were adopted step by step still reflect the strategic alliances that the national committees had to form to get "their" standard into the European ones. To make the CENELEC collection easier to handle, the various fieldbus systems were bundled according to their primary application areas: EN 50170 contains general-purpose field communication systems, EN 50254 covers high-efficiency communication subsystems for small data packages, and EN 50325 comprises different solutions based on CAN technology.

TABLE 7.2 Contents of the CENELEC Fieldbus Standards and Their Relation to IEC IS 61158

CENELEC Standard Part     Contained in IEC Standard   Brand Name
EN 50170-1 (July 1996)    IS 61158 type 4             P-Net
EN 50170-2 (July 1996)    IS 61158 type 1/3/10        Profibus
EN 50170-3 (July 1996)    IS 61158 type 1/7           WorldFIP
EN 50170-A1 (Apr. 2000)   IS 61158 type 1/9           Foundation Fieldbus
EN 50170-A2 (Apr. 2000)   IS 61158 type 1/3           Profibus-PA
EN 50170-A3 (Aug. 2000)   IS 61158 type 2             ControlNet
EN 50254-2 (Oct. 1998)    IS 61158 type 8             Interbus
EN 50254-3 (Oct. 1998)    (IS 61158 type 3)           Profibus-DP (Monomaster)
EN 50254-4 (Oct. 1998)    (IS 61158 type 7)           WorldFIP (FIPIO)
EN 50325-2 (Jan. 2000)    IS 62026-3 (2000)           DeviceNet
EN 50325-3 (Apr. 2000)    IS 62026-5 (2000)           SDS
EN 50325-4 (July 2002)    —                           CANopen
EN 50295-2 (Dec. 1998)    IS 62026-2 (2000)           AS-Interface

Note: The dates given in parentheses are the dates of ratification by the CENELEC Technical Board. The parenthetical IEC types denote that the respective fieldbus is contained in a superset definition.
In the later phases of the European standardization process, the British national committee acted as an advocate for the American companies and also submitted FF, DeviceNet, and ControlNet for inclusion in the European standards. Table 7.2 shows a compilation of all these standards, as well as their relation to the new IEC standard. For the sake of completeness, it should be noted that a comparable, though much less disputed, standardization process also took place for bus systems used in machine construction (dealt with by ISO), as well as in building automation (in CEN and more recently in ISO). While the Europeans were busy standardizing their national fieldbus systems and simply disregarded what happened in IEC, the Fieldbus Foundation prepared its own specification. This definition was modeled after the bus access scheme of FIP and the application layer protocol of the ISP work (which was in turn based on Profibus-FMS). The FF specification naturally influenced the work in the IEC committee, and consequently, the new draft evolved into a mixture of FF and WorldFIP. Several members of IEC TC65 saw this as a reasonable compromise that could put an end to the lengthy debate.
However, when the draft was put to a vote in 1996, it was rejected by a very narrow margin, and the actual fieldbus war started. What had happened? The casus belli was that Profibus (specifically the PA variant, named after its target application area, process automation, and developed by the Profibus User Organization based on the ideas of the abandoned ISP project) was no longer properly represented in the IEC draft. When the majority of ISP members had teamed up with WorldFIP North America to form the Fieldbus Foundation, the main Profibus supporters had been left out in the cold. The fact that Profibus was already part of a CENELEC standard was no consolation. Given the strict European standardization rules and the Dresden Agreement, according to which international (i.e., IEC) standards supersede opposing CENELEC standards, the Profibus proponents feared that FF might gain a competitive advantage and "their" fieldbus might lose ground. Consequently, the countries where Profibus had a dominant position managed to organize a blocking minority that prevented the adoption of the standard. The fact that the IEC voting rules make it easier to cast positive votes (negative votes have to be justified technically) was no particular hindrance, as there were still many inconsistencies and flaws in the draft that could serve as a fig leaf. The FF empire (as it was seen by the Profibus supporters) could not accept this and struck back to save "their" standard. They launched an appeal to cancel negative votes that had, in their opinion, no sufficient technical justification. The minority of votes against the draft was very small, so the cancellation of only a few votes would have been enough to overturn the voting result.
Because this idea of using sophisticated legal arguments to achieve the desired goal was rather delicate, they proposed that the members of the IEC committee (i.e., the respective national mirror committees) should decide about the (non)acceptance of the contested votes, a procedure that does not conform to the IEC rules and that caused substantial exasperation. The accused countries filed a complaint with the Committee of Action (CoA) of the IEC and asked it to resolve the situation. Owing to the infrequent meetings and rather formal procedures, the controversy sketched here carried on for several months. In the meantime, a new draft had been prepared with most of the editorial errors removed. The main discussion point was again the data link layer draft, but now the question was whether the draft in its present form could really be implemented to yield a functioning fieldbus. The Profibus supporters claimed it was not possible, and they envisioned, especially for Europe, a dreary scenario of a nonfunctional IEC fieldbus standard replacing the market-proven European counterparts. The FF proponents maintained that it was possible; their argument was that the Foundation Fieldbus had been implemented according to the draft and that products were already being sold. The debate went back and forth, and Figure 7.4 tries to depict why it was so difficult to judge who was right. Over the years of development, several different versions of the data link layer specification had been submitted to the various standardization committees or implemented in products. Hence, both sides could find ample evidence for their claims.
FIGURE 7.4 Evolution of the IEC 61158 data link layer and the Foundation Fieldbus (FF), demonstrating the various inconsistent flavors of the document. [Figure: a 1995–1997 timeline in which the IEC WG6 DLL draft and the ISA SP50.02 DLL diverge. The IEC draft (CDV 160/161) is rejected in the vote, receives editorial changes and some 80 extra pages, and proceeds as CDV 178/179 to the FDIS vote and to CENELEC as amendment EN 50170 prA1. The FF preliminary specification follows the ISA subset with corrections and amendments (DD 238 referencing IEC 160/161 instead of ISA), and the 1997 FF final specification goes to market.]
In the course of subsequent voting processes, the battle raged on and things grew worse. There were countries voting, both for and against, that had never cast a vote before or that, according to their status in the IEC, were not even allowed to vote. There were votes that were not counted because they were received on a fax machine different from the one designated by the IEC and were thus considered late (because the error was allegedly discovered only after the submission deadline and it took several days to carry the vote to the room next door). Finally, there were rumors about presidents of national committees who high-handedly changed the conclusions of their committee experts. Throughout this entire hot phase of voting, the meetings of the national committees were bursting with representatives of leading companies trying to convince the committees of one or the other position. Never before or afterwards was the interest in fieldbus standardization so high, and never were the lobbying efforts so immense, including the mobilization of the media, who had difficulty getting an objective overview of the situation [42]. The spiral kept turning faster and faster, but by and large, the obstruction of the standard draft remained unchanged, and the standardization process had degenerated into a playground for company tactics, into an economic and political battle apt to severely damage the reputation of standardization as a whole.
7.4.3 The Compromise

On June 15, 1999, the Committee of Action of the IEC decided to go a completely new way to break the stalemate. One month later, on July 16, the representatives of the main contenders in the debate (Fieldbus Foundation, Fisher-Rosemount, ControlNet International, Rockwell Automation, Profibus User Organization, and Siemens) signed a "Memorandum of Understanding" intended to put an end to the fieldbus war. The Solomonic resolution was to create a large and comprehensive IEC 61158 standard accommodating all fieldbus systems, a move that left many of those who had been part of the IEC fieldbus project from the beginning unhappy [36, 43]. However, unlike CENELEC, where complete specifications had been copied into the standard, the IEC decided to retain the original layer structure of the draft with physical, data link, and application layers, each separated into a services part and a protocols part (Table 7.3). The individual fieldbus specifications had to be adapted to so-called types to fit into this modular structure. In a great effort and under substantial time pressure, the draft was compiled and submitted for a vote. The demand of the CoA was clear-cut: either this new draft would finally be accepted, or the old draft would be adopted without further discussion. Hence it was no wonder that the new document passed the vote, and the international fieldbus was released as a standard on the carefully chosen date of December 31, 2000. It was evident that the collection of fieldbus specification modules in the IEC 61158 standard was useless for any practicable implementation. What was needed was a manual for practical use showing which parts can be combined into a functioning system and how this can be accomplished. This guideline was compiled later as IEC 61784-1, a definition of so-called communication profiles [44]. At the same time, the specifications of IEC 61158 were corrected and amended.
TABLE 7.3 Structure of the IEC 61158 Fieldbus for Industrial Control Systems

Standard Part   Contents                          Contents and Meaning
IEC 61158-1     Introduction                      Only a technical report
IEC 61158-2     PhL: Physical Layer               8 types of data transmission
IEC 61158-3     DLL: Data Link Layer Services     8 types
IEC 61158-4     DLL: Data Link Layer Protocols    8 types
IEC 61158-5     AL: Application Layer Services    10 types
IEC 61158-6     AL: Application Layer Protocols   10 types
IEC 61158-7     Network Management                Must be completely revised
IEC 61158-8     Conformance Testing               Work has been canceled

TABLE 7.4 Profiles and Protocols according to IEC 61784-1 and IEC 61158

IEC 61784 Profile   Phy          DLL          AL        CENELEC Standard          Brand Name
CPF-1/1             Type 1       Type 1       Type 9    EN 50170-A1 (Apr. 2000)   Foundation Fieldbus (H1)
CPF-1/2             Ethernet     TCP/UDP/IP   Type 5    —                         Foundation Fieldbus (HSE)
CPF-1/3             Type 1       Type 1       Type 9    EN 50170-A1 (Apr. 2000)   Foundation Fieldbus (H2)
CPF-2/1             Type 2       Type 2       Type 2    EN 50170-A3 (Aug. 2000)   ControlNet
CPF-2/2             Ethernet     TCP/UDP/IP   Type 2    —                         EtherNet/IP
CPF-3/1             Type 3       Type 3       Type 3    EN 50254-3 (Oct. 1998)    Profibus-DP
CPF-3/2             Type 1       Type 3       Type 3    EN 50170-A2 (Oct. 1998)   Profibus-PA
CPF-3/3             Ethernet     TCP/UDP/IP   Type 10   —                         PROFInet
CPF-4/1             Type 4       Type 4       Type 4    EN 50170-1 (July 1996)    P-Net RS-485
CPF-4/2             Type 4       Type 4       Type 4    EN 50170-1 (July 1996)    P-Net RS-232
CPF-5/1             Type 1       Type 7       Type 7    EN 50170-3 (July 1996)    WorldFIP (MPS, MCS)
CPF-5/2             Type 1       Type 7       Type 7    EN 50170-3 (July 1996)    WorldFIP (MPS, MCS, SubMMS)
CPF-5/3             Type 1       Type 7       Type 7    EN 50170-3 (July 1996)    WorldFIP (MPS)
CPF-6/1             Type 8       Type 8       Type 8    EN 50254-2 (Oct. 1998)    Interbus
CPF-6/2             Type 8       Type 8       Type 8    EN 50254-2 (Oct. 1998)    Interbus TCP/IP
CPF-6/3             Type 8       Type 8       Type 8    EN 50254-2 (Oct. 1998)    Interbus subset
CPF-7/1             Type 6       Type 6       —         —                         Swiftnet transport
CPF-7/2             Type 6       Type 6       Type 6    —                         Swiftnet full stack

The collection of profiles shows that the international fieldbus today consists of seven different main systems (communication profile families), which in turn can be subdivided (see Table 7.4). All important fieldbuses from industrial and process automation are listed here, and the world's biggest automation companies are represented with their developments. Foundation Fieldbus consists of three profiles. The H1 bus is used in process automation, whereas High-Speed Ethernet (HSE) is planned as an Ethernet backbone and for industrial automation. H2 is a remnant of the old draft; it allows for a migration of the WorldFIP solution toward FF, but the profile description explicitly notes that no products are available. From the Profibus side, the two profiles DP (Decentralized Periphery) and PA are present (even the new PROFInet has been included). Interestingly, the experts did not consider it worthwhile to list the original version of Profibus, FMS, a strong sign of the diminishing importance, if not abandonment, of this hard-to-engineer fieldbus, which is now contained only in EN 50170-2. The Danish fieldbus P-Net was taken over, as were all definitions and variants of WorldFIP and Interbus; in the latter case, the extensions for tunneling TCP/IP traffic have also been provided for in the standard. A newcomer to the fieldbus arena is Swiftnet, which is widely used in aircraft construction. The correct designation of an IEC fieldbus profile is shown here for the example of Profibus-DP: compliance to IEC 61784 Ed.1:2002 CPF 3/1. Table 7.5 shows some technical characteristics and the main fields of application for the different systems. Low-level fieldbus systems for simple inputs/outputs (I/Os), such as those based on CAN or the AS-Interface, are not part of IEC 61158; it is planned to combine them in IEC 62026.
7.5 Fieldbus Characteristics

The application areas of fieldbus systems are manifold; hence, many different solutions have been developed in the past. Nevertheless, there is one characteristic and common starting point for all these efforts: fieldbus systems were always designed for efficiency, in two main respects:
• Efficiency concerning data transfer, meaning that messages are rather short, in keeping with the limited size of the process data that must be transmitted at a time
• Efficiency concerning protocol design and implementation, in the sense that typical field devices do not provide ample computing resources
TABLE 7.5 Technical Characteristics and Application Domains of the Different Profiles

[Table contents garbled in extraction. Its columns are: Profile (CPF-1/1 through CPF-7/2), Name, Industry (process, factory, shipbuilding, aircraft), Special Features (e.g., function blocks for decentralized control, optimization for remote I/O, distributed real-time database, multinet capability), Nodes per Segment (between 30 and 1024, depending on the profile), Processing (centralized or decentralized), and Bus Access (e.g., master–slave with token passing, producer–consumer with distributor, CSMA/CD, single master with synchronized shift register).]
These two aspects, together with the characteristic application requirements of the individual areas with respect to real-time behavior, topology, and economic constraints, have led to the development of concepts that are still quite peculiar to fieldbus systems and present fundamental differences from LANs.
7.5.1 Communication Concepts

One difference from LANs concerns the protocol stack. Like all modern communication systems, fieldbus protocols are modeled according to the ISO/OSI model; however, normally only layers 1, 2, and 7 are actually used [14]. This is in fact a tribute to the lessons learned from the MAP failure, where it was found that a full seven-layer stack requires far too many resources and does not permit an efficient implementation. For this reason, the MiniMAP approach and, based on it, the IEC fieldbus standard explicitly prescribe a three-layer structure consisting of the physical, data link, and application layers. In most cases, this reduced protocol stack reflects the actual situation found in automation applications anyway. Fieldbuses are typically single-segment networks, and extensions are realized via repeaters or, at most, bridges. Therefore, the network and transport layers, which contain routing functionality and end-to-end control, are simply not necessary. If functions of these layers, as well as of layers 5 and 6, are still needed, they are frequently included in layer 2 or 7. For the IEC 61158 fieldbus standard, the rule is that layer 3 and 4 functions can be placed in either layer 2 or layer 7, whereas layer 5 and 6 functionalities are always covered in layer 7 (Figure 7.5) [45]. In the building automation domain (LonWorks, EIB/KNX [European Installation Bus and its successor, Konnex], BACnet), the situation is different: owing to the possibly high number of nodes, these fieldbus systems must offer hierarchically structured network topologies, and a reduction to three layers is not sensible. For typical process control applications, determinism of data transfer is a key issue, and cycle time is a critical parameter. This fact has been the optimization criterion for many fieldbus protocols and the reason that they differ from conventional LANs.
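The effect of the reduced stack can be illustrated with a small sketch. The frame layout below is entirely hypothetical (it does not follow any particular fieldbus); it merely shows how, in a three-layer stack, the application-layer payload is wrapped directly in a data-link frame, with no network or transport headers in between:

```python
import struct

def build_dll_frame(src_addr: int, dst_addr: int, al_payload: bytes) -> bytes:
    """Wrap an application-layer payload directly in a data-link frame.

    Hypothetical layout: 1-byte destination, 1-byte source, 1-byte length,
    payload, 1-byte XOR checksum. There are no network/transport headers,
    mirroring the reduced fieldbus stack (layers 1, 2, and 7 only).
    """
    if len(al_payload) > 255:
        raise ValueError("fieldbus frames carry short process data only")
    body = struct.pack("BBB", dst_addr, src_addr, len(al_payload)) + al_payload
    checksum = 0
    for b in body:
        checksum ^= b
    return body + bytes([checksum])

def parse_dll_frame(frame: bytes):
    """Inverse of build_dll_frame; returns (dst, src, payload)."""
    body, checksum = frame[:-1], frame[-1]
    acc = 0
    for b in body:
        acc ^= b
    if acc != checksum:
        raise ValueError("checksum mismatch")
    dst, src, length = struct.unpack("BBB", body[:3])
    return dst, src, body[3:3 + length]

frame = build_dll_frame(src_addr=0x01, dst_addr=0x10, al_payload=b"\x02\x1a")
print(parse_dll_frame(frame))  # (16, 1, b'\x02\x1a')
```

The entire protocol overhead here is four bytes, which is the kind of efficiency short process-data messages demand; a full seven-layer stack would add per-layer headers dwarfing the payload.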
FIGURE 7.5 Layer structure of a typical fieldbus protocol stack as defined by IEC 61158. [Figure: the full OSI stack (application, presentation, session, transport, network, data link, physical) contrasted with the reduced fieldbus stack and the IEC 61158 coverage, both comprising only the application, data link, and physical layers.]

In particular, the physical layer has to meet substantially more demanding requirements, such as robustness, immunity to electromagnetic disturbances, intrinsic safety for hazardous areas, and low cost. The significance of the physical layer is underlined by the fact that this area was the first to reach (notably undisputed) consensus in standardization. On the data link layer, all medium access strategies known from LANs are used, plus many different subtypes and refinements. Simple master–slave polling (ASi, Profibus-DP) is used, as are token-based mechanisms in either explicit (Profibus, WorldFIP) or implicit (P-Net) form. Carrier-sense multiple access (CSMA) is mostly used in variants that try to avoid collisions, either through dynamic adaptation of retry waiting times (LonWorks) or through asymmetric signaling strategies (CAN, EIB). Especially for real-time applications, strategies based on time-division multiple access (TDMA) are employed (TTP [time-triggered protocol], but also Interbus). In many cases, the lower two layers are implemented with application-specific integrated circuits (ASICs) for performance and cost reasons. As a side benefit, the preference for dedicated controllers over software implementations also improves the interoperability of devices from different manufacturers. Comprehensive application layers are an essential part of fieldbus protocol stacks. They are indispensable for open systems and form the basis for interoperability. Powerful application layers offering abstract functionalities to the actual applications, however, require a substantial software implementation effort, which can negatively impact protocol processing time as well as the cost of a fieldbus interface. This is why in many cases (as with Interbus or CAN) an application layer was originally omitted. While the application areas were often regarded as limited in the beginning, market pressure and the desire for flexibility finally forced the addition of higher-layer protocols, and the growing performance of controller hardware facilitated their implementation.
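As an illustration of the simplest of these medium access strategies, the sketch below simulates a master–slave polling cycle (hypothetical classes, not the API of any real fieldbus stack). Because the master queries every slave exactly once per cycle and a slave transmits only when addressed, collisions are impossible and the worst-case cycle time is known in advance:

```python
# Minimal simulation of master-slave polling medium access
# (hypothetical model, not a real fieldbus implementation).

class Slave:
    def __init__(self, address: int, process_value: int):
        self.address = address
        self.process_value = process_value

    def respond(self, polled_addr: int):
        """A slave only ever transmits when polled with its own address."""
        if polled_addr == self.address:
            return (self.address, self.process_value)
        return None  # stays silent, so no collision can occur

class Master:
    def __init__(self, slaves):
        self.slaves = slaves  # the configured poll list

    def poll_cycle(self):
        """One deterministic bus cycle: each configured address is queried
        exactly once, giving a fixed, predictable worst-case cycle time."""
        results = {}
        for polled in self.slaves:
            for station in self.slaves:  # the request reaches all stations
                reply = station.respond(polled.address)
                if reply is not None:
                    results[reply[0]] = reply[1]
        return results

bus = Master([Slave(1, 230), Slave(2, 42), Slave(3, 7)])
print(bus.poll_cycle())  # {1: 230, 2: 42, 3: 7}
```

Token passing and TDMA achieve the same determinism by different means: the transmission right circulates or is assigned to fixed time slots instead of being granted by a central master.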
Network management inside fieldbus protocols is traditionally not highly developed. This stems from the fact that a fieldbus is normally not designed for the setup of large, complex networks. There are exceptions, especially in building automation, which consequently provides more elaborate functions for the setup and maintenance of the network. In most cases, however, the flexibility and functionality of network management are adapted to the functionality and application area of the individual fieldbus. There are systems with comparatively simple management functions (ASi, Interbus, P-Net, J1939) and systems with rather complex ones (Profibus-FMS, WorldFIP, CANopen, LonWorks, EIB). The latter are typically more flexible in their application range but require more effort for configuration and commissioning. In any case, network management functions are normally not explicitly present (in addition to the protocol stack, as suggested by the OSI model), but rather included in the protocol layers (mostly the application layer).
7.5.2 Communication Paradigms

The characteristic properties of the various data types inside a fieldbus system differ strongly according to the processes that must be automated. Application areas like manufacturing, processing, and building automation pose different timing and consistency requirements, which are not even invariant and consistent within each application area [46]. Typical examples of differing timing properties are continuous measurement data (like temperature or pressure) that are sampled and transmitted in discrete-time fashion and form the basis for continuous process control and monitoring. Other data are typically event based; i.e., they need transmission only in case of status changes (like switches, limit violations, etc.). As far as consistency is concerned, there are on the one hand process data that are continuously updated, and on the other hand parameterization data that are transferred only on demand. In case of error, the former can easily be reconstructed from historical data via interpolation (or simply be updated by new measurements). The systemwide consistency of configuration data, on the other hand, is an important requirement that cannot be met by mechanisms suitable for process data. These fundamental differences led to the evolution of several communication paradigms that are used either individually or in combination. Their applicability in different fieldbus systems varies considerably because they require different communication services and medium access strategies. The three basic paradigms are:
• Client–server model
• Producer–consumer model
• Publisher–subscriber model
The most relevant properties of these three are summed up in Table 7.6.

TABLE 7.6 Properties of Communication Paradigms

Client–server model: peer-to-peer communication relation; connection oriented; monomaster or multimaster; confirmed, unconfirmed, or acknowledged services; application classes: parameter transfer, cyclic communication.
Producer–consumer model: broadcast communication relation; connectionless; multimaster; unconfirmed or acknowledged services; application classes: event notification, alarms, errors, synchronization.
Publisher–subscriber model: multicast communication relation; connectionless; multimaster; unconfirmed or acknowledged services; application classes: state changes, event-oriented signal sources (e.g., switches).

The overview shows that processes with mostly event-based communication can get along very well with producer–consumer-type communication systems, especially if the requirements concerning dynamics are not too stringent. The obvious advantage is that all connected devices have direct access to the entire set of information, since broadcasting is based on the identification of messages rather than nodes. Reaction times to events can be very short owing to the absence of slow polling or token cycles. Generally, producer–consumer-type systems (or subsystems) are necessarily multimaster systems, because every information source (producer) must be able to access the bus. The selection of relevant communication relationships is based solely on message filtering at the consumer's side; such filter tables are typically defined during the planning phase of an installation. The publisher–subscriber paradigm uses very similar mechanisms; the only difference is that multicast communication services are employed.
The subscribers are typically groups of nodes that listen to information sources (publishers), and the association between publishers and subscribers can be established online. As both paradigms are message based and therefore connectionless on the application layer, they are not suited for the transmission of sensitive, nonrepetitive data such as parameter and configuration values or commands. Connectionless mechanisms can inform the respective nodes about communication errors on layer 2, but not about errors on the application layer. The client–server paradigm avoids this problem by using connection-oriented information transfer between two nodes, with all necessary control and recovery mechanisms. The transfer itself is based on confirmed services with the appropriate service primitives (request, indication, response, confirm) as defined in the OSI model. Basically, client–server communication can be implemented in both monomaster and multimaster systems. In the latter case (CSMA- and token-based systems), every master can take on the role of a client, whereas in monomaster systems (polling based) this position is reserved for the bus master. Consequently, the client–server paradigm is used mainly in monomaster systems, as well as generally for discrete-time (cyclic) information transfer and for reliable data transfer at the application level (e.g., for parameterization data).
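The producer–consumer mechanism described above, broadcast by message identifier with acceptance filtering at the consumer side, can be sketched as follows (hypothetical identifiers and classes, loosely inspired by CAN-style identifier filtering):

```python
class Bus:
    """Broadcast medium: every frame reaches every attached node."""
    def __init__(self):
        self.consumers = []

    def attach(self, consumer):
        self.consumers.append(consumer)

    def produce(self, msg_id: int, value):
        # Frames are addressed by message identifier, not by node address;
        # any attached node may produce (multimaster).
        for consumer in self.consumers:
            consumer.on_frame(msg_id, value)

class Consumer:
    def __init__(self, name: str, filter_table):
        self.name = name
        self.filter_table = filter_table  # defined in the planning phase
        self.received = {}

    def on_frame(self, msg_id: int, value):
        if msg_id in self.filter_table:  # acceptance filtering
            self.received[msg_id] = value

bus = Bus()
drive = Consumer("drive", filter_table={0x100, 0x101})
logger = Consumer("logger", filter_table={0x100, 0x200})
bus.attach(drive)
bus.attach(logger)

bus.produce(0x100, 21.5)    # e.g., a temperature reading
bus.produce(0x200, "open")  # e.g., a valve state

print(drive.received)   # {256: 21.5}
print(logger.received)  # {256: 21.5, 512: 'open'}
```

The publisher–subscriber variant would replace the broadcast with multicast delivery to registered groups; a client–server exchange would instead run as a confirmed request/indication/response/confirm sequence between exactly two nodes.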
© 2005 by CRC Press
Fieldbus Systems: History and Evolution
7-19
It is a characteristic feature of fieldbus systems that they do not adhere to a single communication paradigm, but support a mix of strategies at different levels of sophistication. Examples of typical client–server systems are Interbus, Profibus, P-Net, and ASi; broadcast services are used here only for special cases like synchronization. Likewise, there are special ways of exchanging messages (e.g., direct slave-to-slave communication) that require the temporary delegation of certain bus master functions. The other two paradigms are widely used in systems like CAN, CANopen, DeviceNet, ControlNet, EIB, and LonWorks. Yet these systems also employ the client–server paradigm for special functions such as node configuration, file transfer, and the like.
7.5.3 Above the OSI Layers: Interoperability and Profiles

A key point for the acceptance of open fieldbus systems was the possibility of interconnecting devices from different vendors. Multivendor systems and interoperability are still important arguments in fieldbus marketing. The standardization of fieldbuses was originally thought to be sufficient for interoperable systems, but reality quickly showed that it was not: standards often leave room for interpretation, and implementations may vary even if they conform to the standard. Certification of devices is a suitable way to reduce these problems, but by no means a guarantee. Another source of trouble is that the semantics of data objects are not precisely defined, a problem that was disregarded in many cases until recently. In fact, it is not a problem of the fieldbus itself, but of the application; consequently, it must be tackled beyond the ISO/OSI model. The definition of appropriate profiles (called companion standards in MMS) addresses this problem. The creation of profiles originated from the recognition that the definition of the protocol layers alone is not sufficient to allow the implementation of interoperable products, because there are simply too many degrees of freedom. Therefore, profiles limit the top-level functionality and define specialized subsets for particular application areas [47]. Likewise, they specify communication objects, data types, and their encoding. They can thus be seen as an additional layer on top of the ISO/OSI model, which is why they have also been called layer 8 or the user layer. One thing to keep in mind is that nodes using different profiles literally form islands on a fieldbus, which contradicts the philosophy of an integrated, decentralized system: different profiles may coexist on one fieldbus, but communication between the device groups is normally very limited or impossible.
From a systematic viewpoint, profiles can be divided into communication, device, and branch profiles. A bus-specific communication profile defines the mapping of communication objects onto the services offered by the fieldbus. A branch profile specifies common definitions within an application area concerning terms, data types, and their coding and physical meaning. Device profiles build on communication and branch profiles and describe the functionality, interfaces, and general behavior of entire device classes such as electric drives, hydraulic valves, or simple sensors and actuators. The work of defining profiles is scattered among different groups. Communication profiles are usually in the hands of fieldbus user groups, which can provide the in-depth know-how of the manufacturers that is indispensable for bus-specific definitions. Device and branch profiles are increasingly a topic for independent user groups. For them, the fieldbus is just a means to an end: efficient communication between devices. What counts more in this respect is identifying and modeling uniform device structures and parameters for a specific application. This forms the basis for a mapping to a communication system that is generic within a given application context. The ultimate goal is the definition of fieldbus-independent device profiles [47], an attempt to overcome on a high level the still overwhelming variety of systems. Finally, such profiles are also expected to facilitate the use of fieldbus systems by end users, who are normally concerned only with the overall functionality of a particular plant, not with the question of which fieldbus to use. The methods used to define data types, indices, default values, coding and meaning, identification data, and device behavior are based on functional abstractions (currently, function blocks are the most promising [43, 48]) and universal modeling techniques [49].
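The layering described above can be illustrated with a small sketch that separates a fieldbus-independent device profile from the bus-specific communication profiles mapping its parameters onto concrete communication objects. All names, data types, and index numbers here are hypothetical, invented only to show the separation of concerns:

```python
# Illustrative sketch: a fieldbus-independent device profile for a drive,
# plus two bus-specific communication profiles that map each profile
# parameter onto a concrete communication object (here, an object index).
# All names and index numbers are hypothetical.

DRIVE_PROFILE = {
    # parameter name: (data type, unit)
    "target_velocity": ("int32", "rpm"),
    "actual_velocity": ("int32", "rpm"),
    "enable":          ("bool",  None),
}

# Two hypothetical fieldbuses assign different object indices to the
# same profile-level parameters.
COMM_PROFILE_BUS_A = {"target_velocity": 0x2001, "actual_velocity": 0x2002, "enable": 0x2003}
COMM_PROFILE_BUS_B = {"target_velocity": 0x6042, "actual_velocity": 0x6044, "enable": 0x6040}

def object_index(parameter, comm_profile):
    """Resolve a profile-level parameter name to the bus-specific object index."""
    if parameter not in DRIVE_PROFILE:
        raise KeyError(f"{parameter!r} is not part of the device profile")
    return comm_profile[parameter]
```

An application written against `DRIVE_PROFILE` names only profile-level parameters; which bus carries them is decided entirely by the communication profile handed to `object_index`, which is the essence of the fieldbus-independent device profiles mentioned in the text.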
A first step in the direction of fieldbus harmonization has been taken by the European research project NOAH (Network-Oriented Application Harmonization [48, 50]), the results of which are currently under standardization by IEC SC65C in project IEC 61804.
7.5.4 Management

Owing to the different capabilities and application areas of fieldbus systems, fieldbus management varies in complexity, and its solutions are more or less convenient for the user. As stated above, the various fieldbuses offer a wide range of management services with grossly varying levels of sophistication. Apart from the functional boundary conditions given by the protocols, fieldbus management always relies strongly on the tool support provided by the manufacturers. This adds significantly to the inhomogeneity of the fieldbus world in that entirely different control concepts, user interfaces, and implementation platforms are used. Furthermore, a strict division between communication and application aspects of fieldbus management is usually not drawn. Typical communication-related management functions are bus parameter settings like address information, data rate, or timing parameters. These functions are rather low level and implicitly part of all fieldbus protocols. The user can access them via software tools, mostly supplied by the device vendor. Application-related management functions concern the definition of communication relations, systemwide timing parameters (such as cycle times), priorities, or synchronization. The mechanisms and services offered by the fieldbus systems to support these functions are very diverse and should be integrated into the management framework for the application itself (e.g., the control system using the fieldbus). As a matter of fact, a common management approach for the various automation networks is still not available today, and vendor-specific solutions prevail. From the users' point of view (which includes not only end users, but also system integrators), this entails significantly increased costs for the build-up and maintenance of know-how, because they must become acquainted with an unmanageable variety of solutions and tools.
This situation actually revives one of the big acceptance problems that fieldbus systems originally had among the community of users: missing interoperability. Communication interoperability (as ensured by the fieldbus standards) is a necessary but not sufficient precondition. For the user, the interoperability of handling devices from different vendors is equally important. What is needed are harmonized concepts for configuration and management tools. As long as such concepts do not exist, fieldbus installations will typically be single-vendor systems, which is naturally a preferable situation for the manufacturers to secure their market position. With the increasing importance of LAN and Internet technologies in automation, new approaches for fieldbus management have appeared that may be apt to introduce at least a common view of the various fieldbuses. All these concepts aim at integrating fieldbus management into existing management applications of the higher-level network, which is nowadays typically IP based. One commonly employed high-level network management protocol is the Simple Network Management Protocol (SNMP), which can also be used to access fieldbus data points [51, 52]. Another approach involves the use of directory services [53]. These two solutions permit the inclusion of a large number of devices in specialized network management frameworks. An alternative that has become very popular is the use of Web technology, specifically HTTP tunneled over the fieldbus, to control device parameters. This trend is supported by the increasing availability of embedded Web servers and the use of the Extensible Markup Language (XML) as a device description language [54]. The appealing feature of this solution is that no special tools are required; a standard Web browser is sufficient. However, Web pages are less suitable for the management of complete networks and are rather limited to single-device management. Nevertheless, this approach is now pursued by many manufacturers.
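A minimal sketch of the Web-based management approach described above: device parameters rendered as an XML description that an embedded HTTP server could serve to a standard browser. The tag and parameter names are invented for illustration; real devices follow vendor- or profile-specific description formats.

```python
# Sketch: rendering device parameters as an XML device description, as an
# embedded web server on a field device might serve them. Tag names,
# parameter names, and values are invented for illustration.
from xml.etree.ElementTree import Element, SubElement, tostring

def device_description_xml(name, parameters):
    """Build an XML description for a device and its parameters.

    parameters: dict mapping parameter name -> (value, unit)
    """
    device = Element("device", {"name": name})
    for pname, (value, unit) in parameters.items():
        p = SubElement(device, "parameter", {"name": pname, "unit": unit})
        p.text = str(value)
    return tostring(device, encoding="unicode")

xml = device_description_xml("valve-17", {"position": (42, "percent"),
                                          "cycle_time": (10, "ms")})
# To actually serve this, the string would be handed to an HTTP handler
# (e.g., based on http.server in Python) running on the embedded device.
```

This is precisely why no special tools are needed on the client side: any browser, or any XML-aware management application, can read such a description.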
Fieldbus Systems: History and Evolution

7.6 New Challenges: Industrial Ethernet

As stated before, Ethernet has become increasingly popular in automation. And as in the early days of fieldbus systems, this boom is driven mainly by industry; on an academic level, the use of Ethernet had been discussed decades ago. Hence, the initial situation is comparable to that of 15 years ago, and there is enough conflict potential in the various approaches to using Ethernet in automation. After all, a key argument for the introduction of Ethernet was its dominant role in the office world and the resulting status of a uniform network solution. It was exactly this image of uniqueness that marketing campaigns tried to project onto the automation world as well: Ethernet as the single, consistent network for all purposes. A quick look at reality, however, shows that things are different. Ethernet per se is but a solution for the two lower OSI layers, and as fieldbus history has already shown, this is not sufficient. Even if the commonly used Internet protocol suite with TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) is taken into account, only the lower four layers are covered. Consequently, there are several possibilities for bringing Ethernet or Internet technologies into the fieldbus domain, all of which are actually used in practice (Figure 7.6):

• Tunneling of a fieldbus protocol over UDP/TCP/IP
• Definition of new real-time-enabled protocols
• Reduction of the free medium access in standard Ethernet
• Tunneling of TCP/IP over an existing fieldbus

FIGURE 7.6 Structures of Ethernet and fieldbus combinations. (Diagram: protocol stacks combining standard Internet applications such as HTTP, FTP, SMTP, SNMP, and TFTP over TCP/UDP/IP; a fieldbus application protocol on top of the Internet stack ("fieldbus over Internet"); fieldbus over Ethernet with real-time extensions; a standard fieldbus; and Internet over fieldbus.)
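The first of these variants, tunneling a fieldbus application protocol over UDP/IP, essentially means prefixing each application-layer PDU with a small encapsulation header and handing it to the ordinary socket API. The following sketch uses an invented 4-byte header (version, service code, payload length); real systems such as HSE or EtherNet/IP define their own, more elaborate encapsulations.

```python
# Sketch of variant 1: wrapping a fieldbus application-layer PDU in a UDP
# datagram. The 4-byte header (version, service code, payload length) is
# invented for illustration; real encapsulations differ.
import socket
import struct

def encapsulate(service, payload):
    """Prefix the application PDU with a tiny tunnel header."""
    return struct.pack(">BBH", 1, service, len(payload)) + payload

def decapsulate(datagram):
    """Strip the tunnel header and return (service code, application PDU)."""
    version, service, length = struct.unpack(">BBH", datagram[:4])
    return service, datagram[4 : 4 + length]

# Demonstrate with a local UDP socket pair instead of a real network.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))                       # ephemeral port on loopback
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(encapsulate(0x10, b"\x00\x2a"), rx.getsockname())
service, pdu = decapsulate(rx.recv(1500))
tx.close(); rx.close()
```

Because the fieldbus PDU is carried unchanged, existing application-layer implementations can be reused; what is lost, of course, are the timing guarantees of the original lower layers, which is exactly the limitation discussed below.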
The future role of Ethernet in the automation area is not yet clear. Initially, Ethernet was considered inappropriate because of its lack of real-time capabilities. With the introduction of switched Ethernet and certain modifications of the protocol, however, these problems have been alleviated. And even if there are still doubts about the predictability of Ethernet [55], its penetration into the real-time domain will influence the use of fieldbus-based devices and most likely restrict the future use of fieldbus concepts [56]. Today, Ethernet already takes the place of midlevel fieldbus systems, e.g., for the connection of PLCs. The first applications in manufacturing and building automation already exist in which no fieldbus other than Ethernet is installed. To replace the existing lower-level fieldbuses with Ethernet and TCP/UDP/IP, more effort is needed. One critical issue is (hard) real time, and different solutions already exist to make Ethernet and TCP/IP meet the requirements of industrial applications [57]. One step below, on the sensor–actuator level, cost and implementation complexity are the most important factors. At the moment, fieldbus connection circuits for simple devices, often only one ASIC, are still cheaper than Ethernet connections. However, with modifications and simplifications of the controller hardware and the protocol implementations, Ethernet could finally catch up and become an interesting option.
7.6.1 Ethernet in IEC 61158

Only recently has standardization begun to deal with the question of Industrial Ethernet. Still, in the wake of the fieldbus wars, several solutions based on Ethernet and TCP/UDP/IP have made their way into the IEC 61158 standard without much fighting (see also Table 7.4):
• High-speed Ethernet (HSE) of the Foundation Fieldbus
• EtherNet/IP of ControlNet and DeviceNet
• PROFInet, defined by Profibus International
• TCP/IP over Interbus
HSE and EtherNet/IP (note that here IP stands for Industrial Protocol) are two solutions in which a fieldbus protocol is tunneled over TCP/IP. Strictly speaking, this is no real tunneling, where data packets of a lower fieldbus OSI layer are wrapped in a higher-layer protocol of the transport medium. Instead, the application layer protocol already defined for the fieldbus is also used over the TCP/IP or UDP/IP stack. In the case of ControlNet and DeviceNet, this is the Control and Information Protocol [58]. This solution allows device manufacturers to base their developments on existing and well-known protocols. The implementation carries little risk and can be done quickly. The idea behind PROFInet goes more in the direction of implementing a new protocol. For the actual communication, however, it was decided to use the component object model (COM)/distributed component object model (DCOM) mechanism known from the Windows world. This solution opens up a wide range of interactions with the office IT software available on the market. The possibility of using fieldbus devices like objects in office applications will increase vertical connectivity. On the other hand, this also entails the risk of other applications overloading the network, which has to be avoided. Basically, the COM/DCOM model defines an interface for using modules as black boxes within other applications. PROFInet offers a collection of automation objects with COM interfaces independent of the internal structure of the device. The devices can thus be virtual, and so-called proxy servers can represent the interfaces of any underlying fieldbus. This encapsulation enables the user to combine implementations from different vendors. The only thing the user has to know is the structure of the interface. Provided the interfaces of two devices are identical, the devices are, at least theoretically, interchangeable.
Although this proxy mechanism allows the connection of Ethernet to all types of fieldbus systems, it will not be a simple and real-time-capable solution. A second problem is that in order to achieve portability, the COM/DCOM mechanism has to be reimplemented for different operating systems. DCOM is tightly connected to the security mechanisms of Windows NT, but there is also the possibility of using Windows 95/98 systems or, with restrictions, some UNIX systems. To simplify this, the PROFInet runtime system includes the COM/DCOM functionality, and the standard COM/DCOM functions inside the operating system have to be switched off if PROFInet is used. The solution of tunneling TCP/IP over a fieldbus requires some minimum throughput from the fieldbus to be acceptable. Normally, throughput of acyclic data (the transport mechanism preferably used in this case) is not the strongest point of fieldbus systems. Nevertheless, Interbus defines the tunneling of TCP/IP over its acyclic communication channel [59]. The benefit of this solution is the parameterization of devices connected to the fieldbus with standard Internet services and well-known tools, e.g., a Web browser. This approach opens up the possibility of achieving a new quality of user interaction, as well as a simpler integration of fieldbus management into existing high-level systems. On the downside, however, it forces the manufacturer of the field device to also implement the complete TCP/IP stack, possibly together with a Web server, on the device, and the installation personnel to handle the configuration of the IP addressing parameters.
7.6.2 Real-Time Industrial Ethernet

The Industrial Ethernet solutions discussed so far build on Ethernet in its original form; i.e., they use the physical and data link layers of ISO/IEC 8802-3 without any modifications. Furthermore, they assume that the Ethernet is lightly loaded or that Fast Ethernet switching technology is used, in order to obtain predictable performance. Switching technology does eliminate collisions, but delays inside the switches and, under heavy load conditions, lost packets are unavoidable [60]. This gets worse if switches are used in a multilevel hierarchy and may result in grossly varying communication delays. The real-time
capabilities of native Ethernet are therefore limited and must rely on application-level mechanisms controlling the data throughput. For advanced requirements, like drive control, this is not sufficient. These known limitations of conventional Ethernet stimulated the development of several alternative solutions that are more than just adaptations of ordinary fieldbus systems. These entirely new approaches were originally outside the IEC standardization process, but are now candidates for inclusion in the real-time Ethernet (RTE) standard, i.e., the second volume of IEC 61784. The initial and boundary conditions for the standardization work, which started in 2003, are targeted at backward compatibility with existing standards. First of all, RTE is seen as an extension to the Industrial Ethernet solutions already defined in the communication profile families in IEC 61784-1. Furthermore, coexistence with conventional Ethernet is intended. The scope of the working document [61] states that "the RTE shall not change the overall behavior of an ISO/IEC 8802-3 communication network and their related network components or IEEE 1588, but amend those widely used standards for RTE behaviors. Regular ISO/IEC 8802-3 based applications shall be able to run in parallel to RTE in the same network." Reference to the time distribution standard IEEE 1588 [62] is made because it will be the basis for the synchronization of field devices. The work program of the RTE working group essentially consists of the definition of a classification scheme with RTE performance classes based on actual application requirements [63]. This is a response to market needs that demand scalable solutions for different application domains. One possible classification structure could be based on the reaction time of typical applications in automation:

• A first, low-speed class with reaction times around 100 ms. This timing requirement is typical for cases where humans are involved in system observation (10 pictures per second can already be perceived as a low-quality movie), for engineering, and for process monitoring. Most processes in process automation and building control fall into this class. This requirement can be fulfilled without many problems by a standard system with a TCP/IP communication channel.
• A second class with a required reaction time below 10 ms. This is the requirement for most tooling machine control systems like PLCs or PC-based control. To reach this timing behavior, special care has to be taken in the RTE equipment: sufficient computing resources are needed to handle TCP/IP in real time, or the protocol stack must be simplified and reduced to achieve these reaction times on simple, cheap resources.
• The third and most demanding class is defined by the requirements of motion control: to synchronize several axes over a network, a time precision well below 1 ms is needed. Current approaches to reaching this goal rely on modifications of both the medium access protocol and the hardware structure of the controllers.

These classes will then be the building blocks for additional communication profiles. The intended structural resemblance to the fieldbus profiles is manifested by the fact that the originally attributed document number IEC 62391 was changed to IEC 61784-2. The technological basis for the development will mostly be switched Ethernet. At the moment there are several systems that have the potential to fulfill at least parts of such an RTE specification and that are already on the market or will be shortly.
Three of these systems are extensions to fieldbuses already contained in IEC 61784:

EtherNet/IP: Defined by Rockwell and supported by the Open DeviceNet Vendor Association (ODVA) and ControlNet International, EtherNet/IP makes use of the Common Industrial Protocol (CIP), which is common to the networks EtherNet/IP, ControlNet, and DeviceNet. CIP defines objects and their relations in different profiles and fulfills the requirements of class 1 on EtherNet/IP. As such, it is part of IEC 61784-1. With the CIP Sync extensions, it is possible to get isochronous communication that satisfies class 2 applications. These extensions use 100 Mbit/s networks with the help of IEEE 1588 time synchronization.

PROFInet: Defined mainly by Siemens and supported by Profibus International. Only the first version is currently included in the international fieldbus standard. A second step was the definition of
TABLE 7.7 Industrial Ethernet Profiles Defined in IEC 61784

IEC 61784 Profile   Volume   Brand Names
CPF-1               1        Foundation Fieldbus
CPF-2               1, 2     EtherNet/IP
CPF-3               1, 2     PROFInet
CPF-6               1, 2     Interbus
CPF-10              2        VNET/IP
CPF-11              2        TCnet
CPF-12              2        EtherCAT
CPF-13              2        EPL (Ethernet Powerlink)
CPF-14              2        EPA
CPF-15              2        Modbus
a soft real-time (SRT) solution for PROFInet IO. In this version, class 2 performance is also reached for small and cheap systems by eliminating the TCP/IP stack for process data. I/O data are packed directly into the Ethernet frame with a specialized protocol. Class 3 communication is reached with a special switch ASIC with a short and stable cut-through time and a special priority mechanism for real-time data [64]. Synchronization is based on an extension of IEEE 1588 using on-the-fly time stamping, an idea that had been introduced in a different context [65]. The first application planned for PROFInet isochronous real time (IRT) is the PROFIdrive profile for motion control applications.

Interbus: Will also have an RTE extension, which will be identical to PROFInet. Still, it will be listed as a separate profile.

Apart from these approaches that merely extend well-known fieldbus systems, there is a multitude of new concepts collected in IEC 61784-2 (Table 7.7), not all of which were known in detail at the time of this writing:

VNET/IP: Developed by Yokogawa. The real-time extension of this protocol is called RTP (Real-Time and Reliable Datagram Protocol). Like many others, it uses UDP as a transport layer. Characteristic of the approach are an IP stack optimized with respect to processing times and a concept for redundant network connections.

TCnet: A proposal from Toshiba. Here, the real-time extension is positioned in the medium access control (MAC) layer. A dual redundant network connection, based on shared Ethernet, is also proposed.

EtherCAT: Defined by Beckhoff and supported by the EtherCAT Technology Group (ETG), EtherCAT uses Ethernet frames and sends them in a special ring topology [66]. Every station in the network removes and adds its information. This information may be special input/output data or standard TCP/IP frames.
To realize such a device, a special ASIC is needed for medium access that basically integrates a two-port switch into the actual device. The performance of this system is very good: it may reach cycle times of 30 µs.

Powerlink: Defined by B&R and now supported by the Ethernet Powerlink Standardization Group (EPSG). It is based on the principle of a master–slave scheduling system on top of a regular shared Ethernet segment [67]. The master ensures real-time access to the cyclic data and lets standard TCP/IP frames pass through only in specific time slots. To connect several segments, synchronization based on IEEE 1588 is used. This solution is currently the only product on the market that already fulfills the class 3 requirements. In the future, the CANopen drive profiles will be supported.

EPA (Ethernet for Process Automation): A Chinese proposal. It is a distributed approach to realizing deterministic communication based on a time-slicing mechanism.
Modbus/TCP: Defined by Schneider Electric and supported by Modbus-IDA,* Modbus/TCP uses the well-known Modbus protocol over a TCP/IP network. This is probably the most widely used Ethernet solution in industrial applications today and fulfills the class 1 requirements without problems. Modbus/TCP was, contrary to all other fieldbus protocols, submitted to the Internet Engineering Task Force (IETF) for standardization as an RFC (request for comments) [68]. The real-time extensions use the Real-Time Publisher–Subscriber (RTPS) protocol, which runs on top of UDP.

Originally outside IEC SC65C was SERCOS, well known for its optical ring interface used in drive control applications. SERCOS III, also an Ethernet-based solution, is under development [69]. The ring structure is kept, and the framing is replaced by Ethernet frames to allow easy mixing of real-time data with TCP/IP frames. Every device will need special software or, for higher performance, an application-specific integrated circuit that separates the real-time time slot from the TCP/IP time slot with a switch function. Recently, cooperation between the committee working on SERCOS and SC65C has been established to integrate SERCOS into the RTE standard.

The recent activities of IEC SC65C show that there is substantial interest, especially from industry, in the standardization of real-time Ethernet. This situation closely resembles fieldbus standardization at the beginning of the 1990s, which ultimately led to the fieldbus wars. Given the comparable initial situation, will history repeat itself? Most likely not, because the structure of the intended standard documents already anticipates a multipart solution. So the compromise that formerly took so long to find is already foreseen this time. Furthermore, the big automation vendors have learned their lessons, allowing them to avoid time- and resource-consuming struggles that eventually end up in compromises anyway.
Finally, the IEC itself cannot afford a new standardization war that would damage its image. Hence, all parties involved should have sufficient interest in keeping the standardization process smooth and fast, without too much noise inside the committees. Further evidence of this attitude is that the CENELEC committee TC65CX explicitly decided not to carry out standardization on the European level, but to wait for the outcome of the IEC work. The final standard is expected in 2007.
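The Modbus/TCP framing mentioned above is simple enough to show directly: a 7-byte MBAP header (transaction identifier, protocol identifier 0, remaining byte count, unit identifier) followed by the classic Modbus PDU, unchanged from serial Modbus. The frame layout follows the public Modbus/TCP specification; the helper function itself is only an illustrative sketch.

```python
# Building a Modbus/TCP "read holding registers" (function 0x03) request.
# Frame = MBAP header (transaction id, protocol id = 0, length, unit id)
# followed by the classic Modbus PDU, unchanged from serial Modbus.
import struct

def read_holding_registers_request(transaction_id, unit_id, start_addr, count):
    """Return the complete Modbus/TCP request frame as bytes."""
    pdu = struct.pack(">BHH", 0x03, start_addr, count)    # function, address, quantity
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

frame = read_holding_registers_request(1, 0x11, 0x006B, 3)
# In a real client, this frame would be written to a TCP connection on port 502,
# and the 12-byte response header/PDU parsed analogously.
```

The minimal header overhead and the reuse of the decades-old PDU format go a long way toward explaining why Modbus/TCP meets the class 1 requirements so easily and is so widely deployed.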
7.7 Aspects for Future Evolution

Even though fieldbus systems have reached a mature state, applications have become more demanding, which in turn creates new problems. Much work is still being done to improve the fieldbus itself, in particular concerning transmission speed and the large area of real-time capabilities [46, 70]. Another subject receiving considerable attention is the extension of fieldbuses to wireless physical layers [71, 72]. Apart from such low-level aspects, other problems are lurking on the system and application levels.
7.7.1 Driving Forces

Historically, the most important driving forces behind the development of fieldbus systems were the reduction of cabling and the desire to integrate more intelligence into the field devices. At least in Europe, the general need for automation, of which fieldbus systems are an integral part, also had a socioeconomic reason. Rising production costs due to comparatively high wages required a higher degree of automation to stay competitive in an increasingly globalized market. The enabling technology for automation was, of course, microelectronics. Without the availability of highly integrated controllers, the development of fieldbus systems would never have been possible. Today's driving forces for further evolution mainly come from the application fields that will be reviewed below. Nevertheless, there are also technology push factors that promote the application of new

*IDA, Interface for Distributed Automation, a consortium that originally worked on an independent solution, but finally merged with Modbus.
technologies, mainly at the lower layers of communication (e.g., Ethernet). It must not be overlooked, however, that these factors are to a certain extent marketing driven and aim at the development of new market segments or the redistribution of existing ones. One important factor is what has recently become known as vertical integration. It concerns the seamless interconnection, as far as possible, between the traditional fieldbus islands and higher-level networks. The driving force behind this development is that people have become used to being able to access any information at any time over the Internet. Computer networks in the office area have reached a high level of maturity. Moreover, they are (quasi) standards that permit worldwide interconnectivity and, even more important, easy access and use for nonspecialists. Hence, it is not astonishing that the anytime–anywhere concept is also extended to fieldbuses and automation systems in general. A common solution today is the coexistence of real-time fieldbus traffic and non-time-critical tasks like configuration and parameterization, based on, e.g., user-friendly Web-based services, on the same communication medium. This becomes possible through the use of embedded Web servers in the field devices and the tunneling of TCP/IP over the fieldbus. Other approaches employ gateways to translate between the two worlds. In the near future, the increased use of Ethernet on the field level is expected to further ease network integration, even though it will not be able to solve all problems. Another driving force for the development of new concepts comes from the area of building automation. Although networks in this field emerged relatively late compared with industrial automation, the benefits are evident: the operating costs of a building can be reduced dramatically if information about the status of the building is available for control purposes.
This concerns primarily energy consumption, but also service and maintenance costs. Energy control is a particularly interesting topic. Provided electrical appliances are interconnected via a fieldbus, they can adjust their energy consumption so as to balance the overall load [73, 74]. This demand-side management avoids peak loads, which in turn is honored by the utility companies with lower energy prices. Even more important will be the combination of fieldbuses in buildings (and also private homes) with Internet connections. This is a particular aspect of vertical integration and opens a window for entirely new services [75]. External companies could offer monitoring and surveillance services for private houses while the owners are on vacation. Such services already exist, but are limited to company customers (mostly within the context of facility management). A very important topic for utility companies in many countries is remote access to energy meters [76]. With an appropriate communication link, they can monitor the actual energy consumption of their customers more precisely and with finer granularity, detect possible losses in the network, and better adapt their own production and distribution. As a side benefit, billing can be automated, and tariffs can be made more flexible when load profiles can be recorded. Eventually, if the energy meters support prepayment, billing is no longer necessary at all. An application field that is becoming increasingly relevant for networks is safety-relevant systems. As this domain is subject to very stringent normative regulations, and thus very conservative, it was dominated for a long time (and still is) by point-to-point connections between devices. The first bus system to penetrate this field was the CAN-based safety bus [77]. It took a long time and much effort for this system to pass the costly certification procedures.
Nevertheless, it was finally accepted by the users, which was by no means obvious in an area concerned with the protection of human life, given that computer networks usually have the psychological disadvantage of being considered unreliable. After this pioneering work, other approaches like the ProfiSafe profile [78], Interbus safety [79], ASi safety [80], and recently EtherNet/IP safety [81] and WorldFIP [82] readily followed. The next big step is just ahead in car manufacturing, where in-vehicle networks in general and x-by-wire technology in particular will become determining factors [83]. Here, safety is of even more obvious relevance, and the latest developments of fieldbus systems for automotive use clearly address this issue. In the current Industrial Ethernet standardization process, safety considerations also play an important role.

Microelectronics will continue to be the primary enabling technology for automation networks. Increasing miniaturization and the possibility of integrating more and more computing power while at the same time reducing energy consumption will be the prerequisite for further evolution. Today, system-on-a-chip (SoC) integration of a complete industrial PC with Ethernet controller, on-chip memory, and a complete IP stack as firmware is available. Of course, the computing resources of such integrated solutions cannot be compared with high-end PCs, but they are sufficient for smart and low-cost sensors and actuators. This evolution is, on the one hand, the foundation of the current boom of Ethernet in automation. On the other hand, it will stimulate more research in the emerging field of sensor networks [84]. Currently, most of the effort in this area is being put into wireless networking approaches, but it can be expected that work on other aspects will gain importance in the future. From an application point of view, other emerging fields like ubiquitous computing [85] or concepts inspired by bionics [86] will also rely on low-level networking as an essential technological cornerstone.

FIGURE 7.7 With increasing complexity of fieldbus installations, the important topics in research and practice change. (Timeline: ~1985, 3–6 nodes, physical layer, error control; ~1990, 6–12 nodes, networks, software tools; ~2000, up to 20,000 nodes, profiles, plug and play, Internet; ~2015, up to 1,000,000 nodes, complex systems, agent-based approaches.)
7.7.2 System Complexity

If we consider the evolution of fieldbus systems, we observe an interesting aspect. Until the mid-1990s, the developers of fieldbus systems concentrated on the definition of efficient protocols. Since the computing resources in the field devices were limited and the developers did not expect fieldbuses to have a complex network structure, most protocols use only the lower two or three layers and the top layer of the ISO/OSI model. In those days, typical applications in industrial automation had only about six nodes on average, so the assumption of not-so-complex structures was justified. With the availability of more fieldbus devices and a growing acceptance of the technology, the number of nodes in a typical installation has also increased. A decade ago, the average application in industrial automation had 6 to 12 nodes.

With time, however, it turned out that the main costs of fieldbus systems were determined not so much by the development of the nodes, but rather by the maintenance of the node software, as well as by the software tools necessary to integrate and configure the network. Actually, the development of a fieldbus system means much more than just designing a clever protocol and implementing a few nodes, an aspect that was often underrated in the past. More important for the success of a fieldbus is that a user-friendly configuration and operating environment is available. This was, by the way, a strong argument in favor of open systems, where the development of field devices and software tools can be accomplished by different companies. For proprietary systems, by contrast, the inventor must supply both devices and software, which is likely to overstrain a single company. Today, the number of nodes per installation is increasing dramatically.
The enormous numbers shown in Figure 7.7 are of course not found in industrial automation, but in building automation, where installations with 20,000 or more nodes are nowadays feasible. This evolution goes hand in hand with the advances of sensor networks in general. If we extrapolate the experience from other fields of computer technology, we can sketch the future evolution: the prices of the nodes will fall, and at the same time their performance will increase, allowing for the integration of more and more intelligence into the individual node. This way, we can have complex networks with up to 1 million nodes working together. Such complex systems will be the challenge of the coming decades. It is evident that applications in such systems must be structured differently from today’s approaches. What is required is a true distribution of the application. A promising concept is that of holonic systems, which have been thoroughly investigated in manufacturing [87, 88]. A holonic system consists of
distributed, autonomous units (holons) that cooperate to reach a global goal. In artificial intelligence, the same concept is better known as a multiagent system. Such agents could be an interesting way to cope with complex systems [89, 90]. The main problem, however, will be to provide tools that support the user in creating the distributed application.

A problem directly connected with system complexity is installation and configuration support through some plug-and-play capability. Ideally, new nodes can be attached to an existing network and integrate themselves without further input from the user. Realistically, this will remain only an appealing vision, as the user will always have to define at least the semantics of the information flow (in the trivial case of building automation, which switch is associated with which lamp), but nodes will have to be much more supportive than they are today. To date, the concepts for plug and play, or at least plug and participate, are at a very early stage. There are exemplary solutions for the automatic configuration of Profibus-DP devices [91] based on a manager–agent model inspired by management protocols like SNMP or the management framework of the ISO/OSI model. Here, a manager monitors the status of the fieldbus and initiates the start-up and commissioning of the system in cooperation with the agents on the individual devices. The necessary data are kept in a (distributed) management information base (MIB).

Service broker approaches, such as Jini [92], could also be suitable for tackling the plug-and-play problem. The goal of Jini is to make distributed resources in a client–server network accessible. The term resource has a very abstract meaning and covers both hardware and software. To locate the resources in the network, services offered, as well as service requests, are published by the nodes and matched by the service broker [93].
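The publish/match idea behind such service brokers can be illustrated with a few lines of Python. The class, service types, and node names below are invented for this sketch and do not correspond to any real Jini or UPnP API:

```python
# Minimal sketch of a service broker: nodes publish the services they
# offer, clients look up providers by service type. All names here are
# hypothetical and only illustrate the matching principle.

class ServiceBroker:
    def __init__(self):
        self._offers = {}  # service type -> list of provider node IDs

    def publish(self, service_type, provider):
        """A node announces a service it offers."""
        self._offers.setdefault(service_type, []).append(provider)

    def lookup(self, service_type):
        """A client asks the broker for providers of a service type."""
        return list(self._offers.get(service_type, []))

broker = ServiceBroker()
broker.publish("temperature-sensor", "node-17")
broker.publish("temperature-sensor", "node-23")
broker.publish("lamp-actuator", "node-42")

print(broker.lookup("temperature-sensor"))  # both sensor nodes
```

A real broker would add lease management and event notification on top of this matching step, which is what makes spontaneous networking considerably harder than the sketch suggests.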
A drawback of Jini is that it builds on the relatively complex programming language Java. Hence, all Jini-enabled devices need a Java Virtual Machine as an interpreter, which is rather computing-intensive. Jini itself is well developed today; hardware support, however, still does not exist, and the breakthrough in smart devices as originally intended is not in sight. Competing approaches like Universal Plug and Play (UPnP) are catching up, but it is also questionable whether they will be suitable for complex systems.
7.7.3 Software Tools and Management

The fieldbus as a simple means of communication is only one part of an automation system. Today, it is the part that is best understood and developed. What increasingly becomes a problem, especially with growing complexity, is the support through software tools. Historically, such tools are provided by the fieldbus vendors or system integrators and are as diverse as the fieldbuses themselves. Moreover, there are different (and often inconsistent) tool sets for different aspects of the life cycle of a plant, such as planning, configuration, commissioning, testing, and diagnosis or maintenance. Such tools typically support only a topological view of the installation, whereas modern complex systems would rather require functionality-oriented, abstract views. A major disadvantage of this tool variety is that the tools in many cases operate on incompatible databases, which hampers system integration and is likely to produce consistency problems. More advanced concepts build on unified data sets that present consistent views to the individual tools through well-defined interfaces [94, 95]. The data structures are nevertheless still specific to each fieldbus. Unification of the data representations is one of the goals of NOAH [50].

For fieldbus-independent access to the field devices and their data (not necessarily covering the entire life cycle), several solutions have been proposed. They mostly rely on a middleware abstraction layer using object-oriented models. Examples are OPC (OLE for Process Control) [96], Java, and other concepts [97]. Such platforms can ultimately be extended through the definition of suitable application frameworks that permit the embedding of generic or proprietary software components in a unified environment spanning all phases of the life cycle. Relevant approaches are, e.g., Open Control [95], Field Device Tool [98], and a universal framework of the ISO [99].
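The middleware idea behind such OPC-like platforms, a uniform object-oriented view over fieldbus-specific access layers, can be sketched as follows. All class and method names are hypothetical and greatly simplified compared to the real OPC specification:

```python
from abc import ABC, abstractmethod

class FieldbusDriver(ABC):
    """Fieldbus-specific access layer (one implementation per bus type)."""
    @abstractmethod
    def read(self, address): ...
    @abstractmethod
    def write(self, address, value): ...

class DemoDriver(FieldbusDriver):
    """Stand-in for a real bus driver; stores values in a dict."""
    def __init__(self):
        self._mem = {}
    def read(self, address):
        return self._mem.get(address, 0)
    def write(self, address, value):
        self._mem[address] = value

class ProcessVariableServer:
    """Middleware layer: tools see named items, not bus addresses."""
    def __init__(self):
        self._items = {}  # item name -> (driver, bus address)
    def register(self, name, driver, address):
        self._items[name] = (driver, address)
    def read(self, name):
        driver, address = self._items[name]
        return driver.read(address)
    def write(self, name, value):
        driver, address = self._items[name]
        driver.write(address, value)

server = ProcessVariableServer()
server.register("Boiler1.Temperature", DemoDriver(), 0x2A)
server.write("Boiler1.Temperature", 74.5)
print(server.read("Boiler1.Temperature"))  # 74.5
```

The point of the design is that configuration and visualization tools only ever touch `ProcessVariableServer`, so swapping the underlying fieldbus means swapping one driver class, not the tool chain.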
Beyond pure communication management, essential aspects of engineering and management in the application domain are also not yet universally solved. The ample computing resources of modern field devices, however, allow the introduction of new and largely fieldbus-independent concepts for the modeling of applications. A promising development is function blocks, standardized in IEC 61499
[100]. Historically evolved as an extension of the PLC programming standard IEC 61131, they can be used to create a functional (rather than topological) view of distributed applications. The function block concept integrates the models known from PLCs in factory automation, as well as typical functions from process automation that are available in many fieldbuses as proprietary implementations. With its universal approach, it is also a good option for the implementation of fieldbus profiles.

In the context of management and operation frameworks, the unified description of device and system properties becomes critically important. To this end, device description languages were introduced. The descriptions of the fieldbus components are mostly developed by the device manufacturers and are integral parts of the products. Alternatively, they are contained in libraries from which they can be downloaded and parsed for further use. Over the years, several mutually incompatible languages and dialects were developed [101, 102]. This is not surprising, as device descriptions are the basis for effective installation and configuration support. Thus, they are a necessary condition for the plug-and-play concepts discussed above. In recent years, the diversity of description languages has been addressed by the increased use of universal languages like XML [103, 104], which is also the basis for the electronic device description language (EDDL) standardized in IEC 61804 [105, 106].
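How a configuration tool might consume an XML-based device description can be sketched as follows. The schema, element names, and parameters below are invented for illustration and are far simpler than real EDDL or GSD files:

```python
import xml.etree.ElementTree as ET

# Hypothetical device description: real EDDL/GSD files use their own
# schemas; these element and attribute names are invented for the sketch.
DESCRIPTION = """
<device vendor="ExampleCorp" model="TX-100">
  <parameter name="upper_limit" type="float" unit="degC" default="120.0"/>
  <parameter name="sample_rate" type="int" unit="ms" default="500"/>
</device>
"""

def load_parameters(xml_text):
    """Parse a device description into a dict a configuration tool could use."""
    root = ET.fromstring(xml_text)
    casts = {"float": float, "int": int}
    return {
        p.get("name"): {
            "unit": p.get("unit"),
            "default": casts[p.get("type")](p.get("default")),
        }
        for p in root.findall("parameter")
    }

params = load_parameters(DESCRIPTION)
print(params["upper_limit"])  # {'unit': 'degC', 'default': 120.0}
```

Because the description ships with the device, a generic tool can offer typed, unit-aware parameter dialogs for hardware it has never seen before, which is exactly the basis for the plug-and-play support discussed above.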
7.7.4 Network Interconnection and Security

Security has never been a real issue in conventional fieldbus systems. This is understandable insofar as fieldbuses were originally conceived as closed, isolated systems, which raised no need for security concepts. In building automation, where networks are naturally larger and more complex, the situation is different, and at least rudimentary security mechanisms are supported [107]. In factory and process automation, things changed with the introduction of vertical integration and the interconnection of fieldbuses with office-type networks. In such an environment, security is an essential topic on all network levels. Given the lack of appropriate features on the fieldbus level, the development and application of security concepts is typically confined to the actual network interconnection [108, 109].

One important aspect is that popular firewalls are not sufficient to guarantee security. Likewise, encryption is no cure-all, albeit an important element of secure systems. To reach a meaningful security level, a thorough risk analysis is the first step. On this basis, a security strategy needs to be developed detailing all required measures, most of which are organizational in nature. In practice, one will face two major problems: (1) the additional computational effort for security functions on the field devices (e.g., for cryptographic functions), which may conflict with real-time demands; and (2) the logistical problem of distributing and managing the keys whose secrecy forms the basis of every security policy. Both problems can, to a certain extent, be tackled with the introduction of security tokens such as smart cards [107]. With the introduction of Ethernet in automation, a reconsideration of field-level security is also possible.
This is facilitated by the fact that many Industrial Ethernet solutions use IP and the Internet transport protocols UDP and TCP on top of Ethernet, which means that standard security protocols like Transport Layer Security (TLS) [110] can be used. One should recognize, however, that there are other approaches that use proprietary protocols above Ethernet, and that Ethernet per se is not the layer where security features can reasonably be implemented. The fact that automation networks have so far lacked security features is also reflected in the recent standardization work of IEC SC65C WG13. Unlike in other working groups, where the aim of the members is to get concrete proposals of established systems into the standards, no ready-to-use proposals exist. Apart from general considerations, the work has to be started largely from scratch. There is, however, related work in other fields that is being considered:
• IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, maintained by IEC/SC 65A. Functional safety is in principle covered by the work of WG 12, but the common understanding is that safety-related systems necessarily have security aspects.
• Work being done in IEC TC57/WG15: Power systems management and associated information exchange/data and communication security.
• ISO/IEC 17799: Code of Practice for Information Security Management.
• ISO/IEC 15408: Common Criteria for IT Security Evaluation.
• ISA SP99: Manufacturing and Control Systems Security. It can be expected that this U.S. activity will have significant influence on the WG 13 work.
• AGA/GTI 12: Cryptographic Protection of SCADA Communications.
• NIST PCSRF: Process Control Security Requirements Forum.
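For the IP-based Industrial Ethernet case mentioned above, reusing standard security protocols could look like the following minimal sketch, which prepares a TLS client context with certificate verification enforced. The gateway host name is hypothetical, and no connection is actually opened:

```python
import ssl

# Sketch: securing the fieldbus/Internet interconnection point with TLS,
# as the text suggests for Industrial Ethernet solutions that run UDP/TCP
# over IP. The host name below is invented for illustration.
GATEWAY_HOST = "gateway.plant.example"

context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
context.check_hostname = True                     # bind certificate to host name
context.verify_mode = ssl.CERT_REQUIRED           # reject unauthenticated peers

# A real client would then wrap a TCP socket, e.g.:
#   with socket.create_connection((GATEWAY_HOST, 443)) as sock:
#       with context.wrap_socket(sock, server_hostname=GATEWAY_HOST) as tls:
#           tls.sendall(b"...")
print(context.verify_mode == ssl.CERT_REQUIRED)
```

Note that this only secures the interconnection; the key distribution and the computational cost on the field devices, the two problems identified above, are not addressed by the protocol choice alone.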
7.8 Conclusion and Outlook

Fieldbus systems have come a long way from the very first attempts at industrial networking to contemporary, highly specialized automation networks. What is currently at hand, even after the selection process of the last decade, covers nearly the complete spectrum of possible applications. Nevertheless, there is enough evolution potential left [70, 86].

On the technological side, the communication medium itself allows further innovations. Up to now, the focus has been on wired links, with twisted pair being the dominant solution. Optical media were adopted comparatively early for large distances and electromagnetically disturbed environments. Recently, plastic optical fibers have reached a level of maturity that allows longer cable lengths at lower prices. Another option, especially for building automation, is the use of electrical power distribution lines. This possibility, although tempting in principle, is still impaired by the poor communication characteristics of the medium. Substantial research effort, which in practice comes down to a massive use of digital signal processing, will be needed to overcome these limitations.

The most promising research field for technological evolution is the wireless domain. The benefits are obvious: no failure-prone and costly cabling, and high flexibility, even mobility. The problems are equally obvious: the very peculiar properties of the wireless communication channel must be dealt with, such as attenuation, fading, multipath reception, temporarily hidden nodes, and the easy access for intruders [71]. Wireless communication options do exist today for several fieldbuses [72]. Up to now, they have been used merely to replace the conventional data cable. A really efficient use of wireless communication, however, would necessitate an entire redefinition of at least the lower fieldbus protocol layers.
Evaluation of currently available wireless technologies from the computer world with respect to their applicability in automation is a first step in this direction. Ultimately, we can expect completely new automation networks optimized for wireless communication, where perhaps only the application layer protocol remains compatible with traditional wired solutions to achieve integration.

Apart from mere technological issues, the currently biggest trend is the integration of fieldbus systems into higher-level, heterogeneous networks and process control systems. Internet technologies play a particularly prominent role here, and the penetration of the field level by optimized Ethernet solutions creates additional momentum. The ultimate goal is a simplification and possibly a harmonization of fieldbus operation. For the fieldbus itself, this entails increasing complexity in the higher protocol levels. At the same time, more and more field-level applications employ standard PC-based environments and operating systems like Windows or Linux [111]. These two trends together result in a completely new structure of the automation hierarchy. The old multilevel pyramid finally turns into a rather flat structure with two, maybe three, levels, as shown in Figure 7.8. Here, functions of the traditional middle layers (like the process and cell levels) are transferred into the intelligent field devices (and thus distributed) or into the management level. The traditional levels may persist in the organizational structure of the company, but not in the technical infrastructure.

Does all this mean we have reached the end of the fieldbus era? The old CIM pyramid, which was a starting point for the goal-oriented development of fieldbus systems, ceases to exist, and Ethernet is set to reach down into the field level. This may indeed be the end of the road for the traditional fieldbus as we know it, but certainly not for networking in automation.
What we are likely to see in the future are Ethernet- and Internet-based concepts at all levels, probably optimized to meet special performance requirements on the field level but still compatible with the standards in the management area. Below, very close to the technical process, there will be room for highly specialized sensor–actuator networks — new fieldbus systems tailored to meet the demands of high flexibility, energy optimization,
[Figure 7.8 sketches the flattened hierarchy: a process information level (management with marketing and planning, data servers with business data, statistics) on the company Ethernet backbone; a process control level (quality control PCs with control parameters, process data management, PLCs, visualization) connected via Ethernet or fieldbus; and, at the bottom, measurement technology, sensors, actuators, and controllers attached via fieldbus or Ethernet.]
FIGURE 7.8 Flattened, two-level automation hierarchy.
small-footprint implementation, or wireless communication. The next evolution step in fieldbus history is just ahead.
Acknowledgments

The author thanks Dietmar Dietrich, Kurt Milian, Eckehardt Klemm, Peter Neumann, and Jean-Pierre Thomesse for the extensive discussions, especially about the historical aspects of fieldbus systems.
References

[1] International Electrotechnical Commission, IEC 61158, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems, 2003.
[2] Fieldbus Foundation, What Is Fieldbus? http://www.fieldbus.org/About/FoundationTech/.
[3] G.G. Wood, Fieldbus status 1995, IEE Computing and Control Engineering Journal, 6, 251–253, 1995.
[4] G.G. Wood, Survey of LANs and standards, Computer Standards and Interfaces, 6, 27–36, 1987.
[5] N.P. Mahalik (Ed.), Fieldbus Technology: Industrial Network Standards for Real-Time Distributed Control, Springer, Heidelberg, 2003.
[6] H. Töpfer, W. Kriesel, Zur funktionellen und strukturellen Weiterentwicklung der Automatisierungsanlagentechnik, Messen Steuern Regeln, 24, 183–188, 1981.
[7] T. Pfeifer, K.-U. Heiler, Ziele und Anwendungen von Feldbussystemen, Automatisierungstechnische Praxis, 29, 549–557, 1987.
[8] H. Steusloff, Zielsetzungen und Lösungsansätze für eine offene Kommunikation in der Feldebene, Automatisierungstechnik, 855, 337–357, 1990.
[9] L. Capetta, A. Mella, F. Russo, Intelligent field devices: user expectations, IEE Coll. on Fieldbus Devices: A Changing Future, 6/1–6/4, 1994.
[10] K. Wanser, Entwicklungen der Feldinstallation und ihre Beurteilung, Automatisierungstechnische Praxis, 27, 237–240, 1985.
[11] J.A.H. Pfleger, Anforderungen an Feldmultiplexer, Automatisierungstechnische Praxis, 29, 205–209, 1987.
[12] H. Junginger, H. Wehlan, Der Feldmultiplexer aus Anwendersicht, Automatisierungstechnische Praxis, 31, 557–564, 1989.
[13] W. Schmieder, T. Tauchnitz, FuRIOS: fieldbus and remote I/O: a system comparison, Automatisierungstechnische Praxis, 44, 61–70, 2002.
[14] P. Pleinevaux, J.-D. Decotignie, Time critical communication networks: field buses, IEEE Network, 2, 55–63, 1988.
[15] E.H. Higham, Casting a crystal ball on the future of process instrumentation and process measurements, in IEEE Instrumentation and Measurement Technology Conference (IMTC ’92), New York, May 1992, pp. 687–691.
[16] J.P. Thomesse, Fieldbuses and interoperability, Control Engineering Practice, 7, 81–94, 1999.
[17] J.-C. Orsini, Field Bus: A User Approach, Cahier Technique Schneider Electric 197, 2000, http://www.schneider-electric.com.tr/ftp/literature/publications/ECT197.pdf.
[18] R.D. Quick, S.L. Harper, HP-IL: A Low-Cost Digital Interface for Portable Applications, Hewlett-Packard Journal, January 1983, pp. 3–10.
[19] Philips Semiconductor, The I2C-Bus Specification, 2000, http://www.semiconductors.philips.com/buses/i2c/.
[20] H. Zimmermann, OSI reference model: the ISO model of architecture for open system interconnection, IEEE Transactions on Communications, 28, 425–432, 1980.
[21] J. Day, H. Zimmermann, The OSI reference model, Proceedings of the IEEE, 71, 1334–1340, 1983.
[22] D.J. Damsker, Assessment of industrial data network standards, IEEE Transactions on Energy Conversion, 3, 199–204, 1988.
[23] H.A. Schutz, The role of MAP in factory integration, IEEE Transactions on Industrial Electronics, 35, 6–12, 1988.
[24] B. Armitage, G. Dunlop, D. Hutchison, S. Yu, Fieldbus: an emerging communications standard, Microprocessors and Microsystems, 12, 555–562, 1988.
[25] S.G. Shanmugham, T.G. Beaumariage, C.A. Roberts, D.A. Rollier, Manufacturing communication: the MMS approach, Computers and Industrial Engineering, 28, 1–21, 1995.
[26] T. Phinney, P. Brett, D. McGovan, Y. Kumeda, FieldBus: real-time comes to OSI, in International Phoenix Conference on Computers and Communications, March 1991, pp. 594–599.
[27] K. Bender, Offene Kommunikation: Nutzen, Chancen, Perspektiven für die industrielle Kommunikation, in iNet ’92, 1992, pp. 15–37.
[28] T. Sauter, M. Felser, The importance of being competent: the role of competence centres in the fieldbus world, in FeT ’99 Fieldbus Technology, Magdeburg, Germany, September 1999, pp. 299–306.
[29] Gesmer Updegrove LLP, Government Issues and Policy, http://www.consortiuminfo.org/government/.
[30] M.A. Smith, Vienna Agreement on Technical Cooperation between ISO and CEN, paper presented at ISO/IEC Directives Seminar, Geneva, June 1995, isotc.iso.ch/livelink/livelink/fetch/2000/2123/SDS_WEB/sds_dms/vienna.pdf.
[31] International Electrotechnical Commission, IEC-CENELEC Agreement, http://www.iec.ch/about/partners/agreements/cenelec-e.htm.
[32] E. Klemm, Der Weg durch die Gremien zur internationalen Feldbusnorm, paper presented at VDE Seminar Die neue, internationale Feldbusnorm: Vorteile, Erfahrungen, Beispiele, Zukunft, November 2002, Mannheim.
[33] Instrument Society of America Standards and Practices 50, Draft Functional Guidelines, March 10, 1987, document ISA-SP50-1986-17-D.
[34] G.G. Wood, Current fieldbus activities, Computer Communications, 11, 118–123, 1988.
[35] C. Gilson, Digital Data Communications for Industrial Control Systems or How IEC 61158 (Just) Caught the Bus, paper presented at IEC E-TECH, March 2004, http://www.iec.ch/online_news/etech/arch_2004/etech_0304/focus.htm#fieldbus.
[36] P. Leviti, IEC 61158: an offence to technicians? in IFAC International Conference on Fieldbus Systems and Their Applications, FeT 2001, Nancy, France, November 15–16, 2001, p. 36.
[37] T. Phinney, Mopping up from bus wars, World Bus Journal, 22–23, December 2001.
[38] H. Engel, Feldbus-Normung 1990, Automatisierungstechnische Praxis, 32, 271–277, 1990.
[39] H. Wölfel, Die Entwicklung der digitalen Prozeßleittechnik: Ein Rückblick (Teil 4), Automatisierungstechnische Praxis, 40, S25–S28, 1998.
[40] J. Rathje, The fieldbus between dream and reality, Automatisierungstechnische Praxis, 39, 52–57, 1997.
[41] G.H. Gürtler, Fieldbus standardization, the European approach and experiences, in Feldbustechnik in Forschung, Entwicklung und Anwendung, Springer, Heidelberg, 1997, pp. 2–11.
[42] S. Bury, Are you on the right bus? Advanced Manufacturing, 1, 26–30, 1999, http://www.advancedmanufacturing.com/October99/fieldbus.htm.
[43] G.G. Wood, State of play, IEE Review, 46, 26–28, 2000.
[44] International Electrotechnical Commission, IEC 61784-1, Digital Data Communications for Measurement and Control: Part 1: Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems, 2003.
[45] International Electrotechnical Commission, IEC 61158-1, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems: Part 1: Introduction, 2003.
[46] J.-P. Thomesse, M. Leon Chavez, Main paradigms as a basis for current fieldbus concepts, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 2–15.
[47] C. Diedrich, Profiles for Fieldbuses: Scope and Description Technologies, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 90–97.
[48] U. Döbrich, P. Noury, ESPRIT Project NOAH: Introduction, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 414–422.
[49] R. Simon, P. Neumann, C. Diedrich, M. Riedl, Field devices: models and their realisations, in IEEE International Conference on Industrial Technology (ICIT ’02), Bangkok, December 2002, pp. 307–312.
[50] A. di Stefano, L. Lo Bello, T. Bangemann, Harmonized and consistent data management in distributed automation systems: the NOAH approach, in IEEE International Symposium on Industrial Electronics, ISIE 2000, Cholula, Mexico, December 2000, pp. 766–771.
[51] M. Knizak, M. Kunes, M. Manninger, T. Sauter, Applying Internet management standards to fieldbus systems, in WFCS ’97, Barcelona, October 1997, pp. 309–315.
[52] M. Kunes, T. Sauter, Fieldbus-Internet connectivity: the SNMP approach, IEEE Transactions on Industrial Electronics, 48, 1248–1256, 2001.
[53] M. Wollschlaeger, Integration of VIGO into Directory Services, paper presented at 6th International P-NET Conference, Vienna, May 1999.
[54] M. Wollschlaeger, Framework for Web integration of factory communication systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juan-Les-Pins, France, October 2001, pp. 261–265.
[55] J.D. Decotignie, A perspective on Ethernet-TCP/IP as a fieldbus, in IFAC International Conference on Fieldbus Systems and Their Applications, FeT 2001, Nancy, France, November 15–16, 2001, pp. 138–143.
[56] E. Byres, Ethernet to Link Automation Hierarchy, InTech Magazine, June 1999, pp. 44–47.
[57] M. Felser, Ethernet TCP/IP in automation, a short introduction to real-time requirements, in Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 501–504.
[58] V. Schiffer, The CIP family of fieldbus protocols and its newest member: EtherNet/IP, in Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 377–384.
[59] M. Volz, Quo Vadis Layer 7? The Industrial Ethernet Book, no. 5, Spring 2001.
[60] K.C. Lee, S. Lee, Performance evaluation of switched Ethernet for real-time industrial communications, Computer Standards and Interfaces, 24, 411–423, 2002.
[61] TC65/SC65C, New work item proposal, 65C/306/NP, 2003.
[62] IEEE 1588, Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, 2002.
[63] TC65/SC65C, Meeting minutes, 65C/318/INF, 2003.
[64] A. Boller, Profinet V3: bringing hard real-time and the IT world together, Control Engineering Europe, September 2003, http://www.manufacturing.net/ctl/article/CA318939.
[65] R. Höller, G. Gridling, M. Horauer, N. Kerö, U. Schmid, K. Schossmaier, SynUTC: high precision time synchronization over Ethernet networks, in 8th Workshop on Electronics for LHC Experiments (LECC), Colmar, France, September 9–13, 2002, pp. 428–432.
[66] http://www.ethercat.org/.
[67] http://www.ethernet-powerlink.com/.
[68] Schneider Automation, Modbus Messaging on TCP/IP Implementation Guide, May 2002, http://www.modbus.org/.
[69] E. Schemm, SERCOS to link with Ethernet for its third generation, IEE Computing and Control Engineering Journal, 15, 30–33, 2004.
[70] J.-D. Decotignie, Some future directions in fieldbus research and development, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 308–312.
[71] L. Rauchhaupt, J. Hähniche, Opportunities and problems of wireless fieldbus extensions, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 308–312.
[72] L. Rauchhaupt, System and device architecture of a radio based fieldbus: the RFieldbus system, in IEEE Workshop on Factory Communication Systems, Västerås, Sweden, 2002, pp. 185–192.
[73] P. Palensky, Distributed Reactive Energy Management, Ph.D. thesis, Vienna University of Technology, Austria, 2001.
[74] G. Gaderer, T. Sauter, Ch. Eckel, What it takes to make a refrigerator smart: a case study, in IFAC International Conference on Fieldbus Systems and Their Applications (FeT), Aveiro, Portugal, July 2003, pp. 85–92.
[75] L. Haddon, Home Automation: Research Issues, paper presented at EMTEL Workshop: The European Telecom User, Amsterdam, November 10–11, 1995.
[76] M. Lobashov, G. Pratl, T. Sauter, Implications of power-line communication on distributed data acquisition and control systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Lisboa, Portugal, September 2003, pp. 607–613.
[77] R. Piggin, An introduction to safety-related networking, IEE Computing and Control Engineering Journal, 15, 34–39, 2004.
[78] PROFIBUS International, Profile for Failsafe with PROFIBUS, DP-Profile for Safety Applications, Version 1.2, October 2002, http://www.profibus.com.
[79] INTERBUS Club, INTERBUS Safety, White Paper, 2003.
[80] http://as-i-safety.net.
[81] ODVA, Safety Networks: Increase Productivity, Reduce Work-Related Accidents and Save Money, Open DeviceNet Vendor Assoc., White Paper, 2003, http://www.odva.org.
[82] J.-P. Froidevaux, O. Nick, M. Suzan, Use of fieldbus in safety related systems, an evaluation of WorldFIP according to the proven-in-use concept of IEC 61508, WorldFIP News, http://www.worldfip.org.
[83] G. Leen, D. Heffernan, Expanding automotive electronic systems, IEEE Computer, 35, 88–93, 2002.
[84] H. Gharavi, S.P. Kumar (Eds.), Special issue on sensor networks and applications, Proceedings of the IEEE, 91, 2003.
[85] G. Borriello, Key challenges in communication for ubiquitous computing, IEEE Communications Magazine, 40, 16–18, 2002. [86] D. Dietrich, T. Sauter, Evolution potentials for fieldbus systems, in Proceedings of the 3rd IEEE International Workshop on Factory Communication Systems, Porto, 2000, pp. 343–350. [87] A. Koestler, The Ghost in the Machine, Arkana Books, London, 1967. [88] F. Pichler, On the construction of A. Koestler’s holarchical networks, in Cybernetics and Systems 2000, Austrian Society for Cybernetic Systems, Vienna, 2000. [89] P. Palensky, The convergence of intelligent software agents and field area networks, in 1999 IEEE Conference on Emerging Technologies and Factory Automation, Barcelona, 1999, pp. 917–922. [90] T. Wagner, An agent-oriented approach to industrial automation systems, in Agent Technologies, Infrastructures, R. Kowalczyk et al. (Eds.), Springer-Verlag, Berlin, 2003, pp. 314–328. [91] A. Pöschmann, P. Krogel, Autoconfiguration Management für Feldbusse: PROFIBUS Plug & Play, Elektrotechnik und Informationstechnik, 117, 5, 2000. [92] W. Kastner, M. Leupold, How dynamic networks work: a short tutorial on spontaneous networks, in IEEE Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes JuanLes-Pins, France, October 15–18, 2001, pp. 295–303. [93] S. Deter, Plug and participate for limited devices in the field of industrial automation, in IEEE Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 263–268. [94] O. Cramer Nielsen, A real time, object oriented fieldbus management system, in 3rd IEEE International Workshop on Factory Communication Systems, Porto, 2000, pp. 335–340. [95] A. Baginski, G. Covarrubias, Open control: the standard for PC-based automation technology, in IEEE International Workshop on Factory Communication Systems, October 1997, pp. 329–333. 
[96] OPC Data Access Automation Specification, Version 2.0, OPC Foundation, October 14, 1998.
[97] R. Bachmann, M.S. Hoang, P. Rieger, Component-based architecture for integrating fieldbus systems into distributed control applications, in Fieldbus Technology, Springer-Verlag, Heidelberg, 1999, pp. 276–283.
[98] R. Simon, M. Riedl, C. Diedrich, Integration of field devices using field device tool (FDT) on the basis of electronic device descriptions (EDD), in IEEE International Symposium on Industrial Electronics, ISIE '03, June 9–11, Rio de Janeiro, 2003, pp. 189–194.
[99] W.H. Moss, Report on ISO TC184/SC5/WG5 open systems application frameworks based on ISO 11898, in 5th International CAN Conference (iCC '98), San Jose, CA, 1998, pp. 07-02–07-04.
[100] Function Blocks for Industrial-Process Measurement and Control Systems: Committee Draft, IEC TC65/WG6, ftp://ftp.cle.ab.com/stds/iec/sc65bwg7tf3/html/news.htm.
[101] GSD Specification for PROFIBUS-FMS (version 1.0), PNO Karlsruhe.
[102] Device Description Language specification, HART Communication Foundation, Austin, TX, 1995.
[103] T. Bray, J. Paoli, C. M. Sperberg-McQueen, Extensible Markup Language (XML) 1.0, 1998, http://www.w3.org/TR/REC-xml.
[104] M. Wollschlaeger, Descriptions of fieldbus components using XML, Elektrotechnik und Informationstechnik, 117, 5, 2000.
[105] International Electrotechnical Commission, IEC 61804-2, Function Blocks (FB) for Process Control: Part 2: Specification of FB Concept and Electronic Device Description Language (EDDL), 2003.
[106] P. Neumann, C. Diedrich, R. Simon, Engineering of field devices using device descriptions, paper presented at IFAC World Congress 2002, Barcelona, 2002.
[107] C. Schwaiger, A. Treytl, Smart card based security for fieldbus systems, in 2003 IEEE Conference on Emerging Technologies and Factory Automation, Lisbon, September 2003, pp. 398–406.
[108] T. Sauter, Ch. Schwaiger, Achievement of secure Internet access to fieldbus systems, Microprocessors and Microsystems, 26, 331–339, 2002.
[109] P. Palensky, T. Sauter, Security considerations for FAN-Internet connections, in IEEE International Workshop on Factory Communication Systems, Porto, September 2000, pp. 27–35.
The Industrial Communication Technology Handbook
[110] E. Rescorla, SSL and TLS, Addison-Wesley, Reading, MA, 2000.
[111] W. Kastner, C. Csebits, M. Mayer, Linux in factory automation? Internet controlling of fieldbus systems! in 1999 IEEE Conference on Emerging Technologies and Factory Automation, Barcelona, 1999, pp. 27–31.
[112] CAMAC, A Modular Instrumentation System for Data Handling, EUR4100e, March 1969.
[113] http://www.hit.bme.hu/people/papay/edu/GPIB/tutor.htm.
[114] National Instruments, GPIB Tutorial, www.raunvis.hi.is/~rol/Vefur/%E9r%20Instrupedia/CGPTUTO.PDF.
[115] W. Büsing, Datenkommunikation in der Leittechnik, Automatisierungstechnische Praxis, 28, 228–237, 1986.
[116] G. Färber, Bussysteme, 2nd ed., Oldenbourg-Verlag, Munich, 1987.
[117] M-Bus Usergroup, The M-Bus: A Documentation, Version 4.8, November 11, 1997, http://www.mbus.com/mbusdoc/default.html.
[118] G. Leen, D. Heffernan, A. Dunne, Digital networks in the automotive vehicle, IEE Computer and Control Engineering Journal, 10, 257–266, 1999.
[119] CAN-in-Automation, CAN history, http://www.can-cia.de/can/protocol/history/.
[120] Condor Engineering, MIL-STD-1553 tutorial, http://www.condoreng.com/support/downloads/tutorials/MIL-STD-1553Tutorial.PDF.
[121] Grid Connect, The Fieldbus Comparison Chart, http://www.synergetic.com/compare.htm.
[122] Interbus Club, Interbus Basics, 2001, http://www.interbusclub.com/en/doku/pdf/interbus_basics_en.pdf.
[123] H. Kirrmann, Industrial Automation, lecture notes, EPFL, 2004, http://lamspeople.epfl.ch/kirrmann/IA_slides.htm.
[124] H. Wölfel, Die Entwicklung der digitalen Prozeßleittechnik: Ein Rückblick (Teil 3), Automatisierungstechnische Praxis, 40, S17–S24, 1998.
[125] T. Sauter, D. Dietrich, W. Kastner (Eds.), EIB Installation Bus System, Publicis MCD, Erlangen, Germany, 2001.
[126] E.B. Driscoll, The History of X10, http://home.planet.nl/~lhendrix/x10_history.htm.
Appendix

The tables presented here give an overview of selected fieldbus systems, categorized by application domain. The list is necessarily incomplete, although care has been taken to include all approaches that either exerted a substantial influence on the evolution of the entire field or are still significant today. The year of introduction refers to the public availability of the specification or first products. This year is also the one used in the timeline in Figure 7.3. Note that despite careful research, the information obtained from various sources was frequently inconsistent, so there may be some uncertainty in the figures. Where the respective data could be obtained, the start of the project has been listed as well, because in several cases much time elapsed between the start of development of a fieldbus and its first release.

TABLE 7.8 Instrumentation and PCB-Level Buses

Fieldbus | Developer (Country) | Introduced in | Standard | References
CAMAC | ESONE (Europe) | 1969 (start of development 1966) | IEEE 583 (1970, 1982, 1994); IEEE 595 (1974, 1982); IEEE 596 (1972, 1982); IEEE 758 (1979) | [112]
GPIB (HP-IB) | Hewlett-Packard (U.S.) | 1974 (start of development 1965) | ANSI IEEE-488 (1975, 1978); ANSI IEEE-488.2 (1987, 1992); IEC 60625 (1979, 1993) | [113, 114, 115]
HP-IL | Hewlett-Packard (U.S.) | 1980 (start of development 1976) | — | [18]
I²C | Philips (Netherlands) | 1981 | — | [116]
M-Bus | University of Paderborn, TI, Techem (Germany) | 1992 | EN 1434-3 (1997) | [117]
Measurement Bus | Industry consortium (Germany) | 1988 | DIN 66348-2 (1989); DIN 66348-3 (1996) | —
TABLE 7.9 Automotive and Aircraft Fieldbuses

Fieldbus | Developer (Country) | Introduced in | Standard | References
ABUS | Volkswagen (Germany) | 1987 | — | [118]
ARINC | Aeronautical Radio, Inc. (U.S.) | 1978 | AEEC ARINC 429 (1978, 1995) | —
CAN | Bosch (Germany) | 1986 (start of development 1983), CAL 1992 | ISO 11898 (1993, 1995); ISO 11519 (1994) | [119]
Flexray | DaimlerChrysler, BMW (Germany) | 2002 | — | —
J1850 | Ford, GM, Chrysler (U.S.) | 1987 | SAE J1850 (1994, 2001); ISO 11519-4 | [118]
J1939 | SAE (U.S.) | 1994 | SAE J1939 (1998) | [118]
LIN | Industry consortium | 1999 | — (open spec) | —
MIL-1553 | SAE (military and industry consortium, U.S.) | 1970 (start of development 1968) | MIL-STD-1553 (1973); MIL-STD-1553A (1975); MIL-STD-1553B (1978) | [120]
VAN | Renault, PSA Peugeot-Citroen (France), ISO TC22 | 1988 | ISO 11519-3 (1994) | [118]
SwiftNet | Ship Star Assoc., Boeing (U.S.) | 1997 | IEC 61158 (2000) | —
TTP | Vienna University of Technology (Austria) | 1996 | — | [118]
TABLE 7.10 Fieldbuses for Industrial and Process Automation and Their Foundations

Fieldbus | Developer (Country) | Introduced in | Standard | References
ARCNET | Datapoint (U.S.) | 1977 | ANSI ATA 878 (1999) | [121]
ASi | Industry and university consortium (Germany) | 1991 | EN 50295-2 (1998, 2002); IEC 62026-2 (2000) | —
Bitbus | Intel (U.S.) | 1983 | ANSI IEEE 1118 (1990) | —
CC-Link | Mitsubishi (Japan) | 1996 | — (open spec) | [121]
CANopen | CAN in Automation (user group, Germany) | 1995 (start of development 1993) | EN 50325-4 (2002) | [119]
ControlNet | Allen-Bradley (U.S.) | 1996 | EN 50170-A3 (2000) | —
DeviceNet | Allen-Bradley (U.S.) | 1994 | EN 50325-2 (2000) | [121]
FF | Fieldbus Foundation (industry consortium, U.S.) | 1995 (start of development 1994) | — (open spec); EN 50170-A1 (2000) | —
Hart | Rosemount (U.S.) | 1986 | BSI DD 238 (1996) | —
Interbus-S | Phoenix Contact (Germany) | 1987 (start of development 1983) | DIN 19258 (1993); EN 50254-2 (1998) | [122]
MAP | General Motors (U.S.) | 1982 (start of development 1980) | MAP 1.0 (1982); MAP 2.0 (1985); MAP 3.0 (1988) | [123]
MMS | ISO TC 184 | 1986 | ISO/IEC 9506 (1988, 2000) | —
Modbus | Gould, Modicon (U.S.) | 1979 | — (open spec) | —
PDV-Bus | Industry and university consortium (Germany) | 1979 (start of development 1972) | DIN 19241 (1982) | [124, 115]
P-NET | PROCES-DATA (Denmark) | 1983 | DS 21906 (1990); EN 50170-1 (1996) | [14]
PROWAY C | IEC TC 65 | 1986 (start of development 1975) | ISA S72.01 (1985); IEC 60955 (1989) | —
Profibus | Industry and university consortium (Germany) | 1989 (start of development 1984) | FMS: DIN 19245-1 and -2 (1991); DP: DIN 19245-3 (1993); PA: DIN 19245-4 (1995); FMS/DP: EN 50170-2 (1996); DP: EN 50254-3 (1998); PA: EN 50170-A2 (2000) | —
SDS | Honeywell (U.S.) | 1994 | EN 50325-3 (2000) | [119]
Sercos | Industry consortium (Germany) | 1989 (start of development 1986) | IEC 61491 (1995); EN 61491 (1998) | —
Seriplex | APC, Inc. (U.S.) | 1990 | IEC 62026-6 (2000) | [121]
SINEC L2 | Siemens (Germany) | 1992 | — | —
SP50 Fieldbus | ISA SP 50 (U.S.) | 1993 | ISA SP 50 (1993) | —
(World)FIP | Industry and university consortium (France) | 1987 (start of development 1982) | AFNOR NF C46601-7 (1989–1992); EN 50170-3 (1996); DWF: AFNOR NF C46638 (1996); DWF: EN 50254-4 (1998) | [16]
TABLE 7.11 Fieldbuses for Building and Home Automation

Fieldbus | Developer (Country) | Introduced in | Standard | References
BACnet | ASHRAE SPC135P (industry consortium, U.S.) | 1991 | ANSI/ASHRAE 135 (1995); ENV 1805-1 (1998); ENV 13321-1 (1999); ISO 16484-5 (2003) | —
Batibus | Industry consortium (France) | 1987 | AFNOR NF 46621-3 and -9 (1991); ENV 13154-2 (1998) | —
CEBus | Industry consortium (U.S.) | 1984 | ANSI EIA 600 (1992) | —
EHS | Industry consortium (Europe) | 1987 | AFNOR NFC 46624-8 (1991); ENV 13154-2 (1998) | —
EIB | Industry consortium (Germany) | 1990 | DIN V VDE 0829 (1992); ENV 13154-2 (1998) | [125]
HBS | Industry consortium (Japan) | 1986 (start of development 1981) | EIAJ/REEA ET2101 | —
LonWorks | Echelon (U.S.) | 1991 | ANSI EIA 709 (1999); ENV 13154-2 (1998) | [121, 126]
Sigma I | ABB (Germany) | 1983 | — | —
X10 | Pico Electronics (U.K.) | 1978 (start of development 1975) | — | [126]
8
The WorldFIP Fieldbus

Jean-Pierre Thomesse
Institut National Polytechnique de Lorraine

8.1 Introduction
8.2 WorldFIP Origin
8.3 Requirements
8.4 Choices of WorldFIP
    Identified Data vs. Classical Messages • Periodic and Aperiodic Traffic • Timeliness Attributes and Mechanisms for Time-Critical Systems
8.5 WorldFIP Architecture
    Architecture and Standardization
8.6 Physical Layer
    Figures • Topology • Coding
8.7 Data Link and Medium Access Control Layers
    Introduction • Basic Mechanism • The Aperiodic Server • Variable Transfer Services • Message Transfer • Synthesis on the Data Link Layer
8.8 Application Layer
    Services Associated with the Variables • Temporal Validity of Variables • Synchronous and Asynchronous • Synchronization Services • Services Associated with Variables Lists
8.9 WorldFIP State and Technology
    Technology • Fieldbus Internet Protocol • New Development
8.10 Conclusion
References
8.1 Introduction

This chapter is dedicated to the study of the WorldFIP* fieldbus. It is one of the first fieldbuses, born at the beginning of the 1980s, and it is also at the origin of several key concepts that are now implemented in various other fieldbuses. For example, the producer–consumer model, the timeliness attributes that qualify the validity of data, and the time coherence and consistency attributes are among the most important WorldFIP contributions. Many of them come from research activities (academic and industrial) and from the analysis of strongly distributed and real-time requirements. That is why the first sections of this chapter briefly relate the origin of this fieldbus (Section 8.2), the requirements (Section 8.3), and the choices behind the WorldFIP specifications (Section 8.4). The technical aspects are then studied in the four following sections: the architecture in Section 8.5, the physical layer in Section 8.6, the data link layer in Section 8.7, and the application layer in Section 8.8. The current state of this fieldbus is given in the last section (Section 8.9) before the conclusion and bibliography. Much theoretical work has been carried out over more than 15 years to prove the protocols, to evaluate their performance, to guarantee the time constraints (Pleinevaux et al., 1988; Song et al., 1991; Simonot et al., 1995), and to estimate the performance of distributed applications (Bergé et al., 1995).

*WorldFIP is the current name of the previous FIP network. FIP stands for Factory Instrumentation Protocol, but in the French language, the acronym means Flux d'Information (de et vers le) Processus.
8.2 WorldFIP Origin

The first work on the WorldFIP specification started in September 1982, in a working group under the aegis of the French Ministry of Research and Technology. This working group was composed of representatives of end users, engineering companies, and laboratories. It was important not to include providers and manufacturers of networks at the beginning, in order to organize a genuine analysis of end users' needs without the possible influence of existing products or projects. The first objective of this work was to analyze the needs for communication in automatic control systems, taking into account the following points:
• Local area networks were just starting to be developed.
• The Manufacturing Automation Protocol (MAP) project was beginning in the U.S. (MAP, 1988).
• New ideas appeared on application architectures, especially the idea of truly distributed systems.
• Intelligent devices started to develop thanks to the progress of microelectronics.
The development of WorldFIP started in this context, with essentially two main types of contributions, coming from research and from end users' experiences. The functional analysis of the communication needs in automatic control systems led to the distinction between two main flows:
• A flow of information associated with the control rooms in continuous processes or with the plant in discrete part manufacturing applications
• A flow associated with the field devices, called the "flow of information of the process," which will be analyzed later and which led to the WorldFIP fieldbus profiles
To satisfy the former, different local area networks already existed, while nothing yet existed for the latter. It was then decided to specify a so-called instrumentation network.* The first specification of the FIP fieldbus was then published in May 1984 (Galara and Thomesse, 1984).
It was only at the beginning of the 1990s that the name was changed to WorldFIP. More information on the origins may be found in Thomesse (1993, 1998). The first results were presented to support a standardization process at the International Electrotechnical Commission (IEC) (Gault and Lobert, 1985).
8.3 Requirements

The first (and abstract) requirement was to define a communication system to take the place of the usual connection standards (4 to 20 mA) between the devices and controllers in an automation system. Another expression was more complete but also abstract: the objective was the design of an operating system for instrumentation. There was in fact a real need to build not only a communication system but truly distributed systems. It was then important to provide the right, well-suited services for the distribution of applications (facilities for the management of coherence and consistency, and for coping with the impossibility of a common global state and with clock synchronization). The requirements could be stated at different abstraction levels. Starting from the most general (see above), they led to the following:
• The connection between the field devices and the control functions was expensive enough to justify the specification of another communication technique.

*At this time the word Fieldbus was not yet in use.
• The access to data over the network should be standardized.
• The location of data should be transparent to the user.
• The system should be built to meet different dependability requirements by using the same basic components.
• The competitiveness of companies should be improved by such technologies.
• The development should go through international standardization.
• The protocols should be implemented in silicon.
The data flows between the functions and the set of field and control equipment were then identified and analyzed, leading to the identification of special needs for a so-called instrumentation network. These led to the identification of the traffic and then to the more technical requirements:
• The exchanged data come from sensors or are sent to actuators. Most of them are known and identified (temperature, pressure, speed, position, and so on), but other transmitted data are not identified in the same sense, and usual messages must also be transmitted.
• The exchanges may be periodic or not. They are time constrained in terms of period, jitter, deadline, lifetime, promptness, and refreshment.
• The most critical traffic must be managed periodically, but sporadic traffic must also take place.
• Timeliness is important for the quality of service and the dependability of the applications.
• Distributed decisions must be consistent; i.e., the data and the physical process must be seen in a coherent manner by all application processes. The impossible global state must be approximated by a reliable broadcasting of states and events.
8.4 Choices of WorldFIP

According to the previous requirements, the WorldFIP solution is based on a few basic ideas, which give this fieldbus the right quality of service:
• The distinction of two types of messages: the notion of identified data vs. the concept of classical messages, associated with the respective cooperation models, producer–consumer vs. client–server
• The predefined scheduling of periodic traffic, with periods suited to the physical needs, especially the sampling theory
• The online scheduling of sporadic traffic, with priority given to the critical traffic
• The cyclic updating of real-time data at the consumers' sites
• The timeliness attributes and mechanisms for time-critical systems
These choices are presented and analyzed below.
8.4.1 Identified Data vs. Classical Messages

Data provided by sensors, data sent to the actuators, and more generally input/output (I/O) and control data are all identified in a given process. They are known within the application. These data are also called identified variables or identified objects. They are often simple objects (temperature, pressure, speed, etc.) of fixed syntax (integer, real, Boolean, record, list, or other structured data). For instance, a temperature sensor can produce a temperature value coded as an integer or as a real, and the manufacturer identification as a character string. The identified data receive a name, which is a global name for the whole application. This name is also used for managing access to the medium. Each variable value has a single producer and one or more consumers. Since transferred values correspond to variables in the process, an identifier is attached to each variable whose value is to be transmitted on the network. This identifier is used as a source address to control the medium access. The destination
is not indicated. Consumers are responsible for deciding whether to update their copies of the data on reception, by recognizing the corresponding identifier. This is the so-called source addressing. This addressing technique offers several advantages. It allows communication in a one-to-many manner with broadcast. Not only is the communication channel used efficiently when the same information has to be transmitted to more than one consumer, but coherence may also be obtained with reliable broadcast. A new receiver may be added without address modification. Identifying the variables instead of the sources of the information on the variables offers an additional advantage: the variable is no longer bound to a node of the network. For example, in case of failure of the node providing the variable value, a new source may become active and replace the failed node without any modification of the receivers. For a given identified object, a single active producer is defined, and all other stations may be defined as consumers.
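The source-addressing scheme can be sketched in a few lines of Python. This is an illustrative model only, not WorldFIP code: the class names, variable names, and the string identifier are invented for the example. Frames carry only an identifier and a value, every station sees every frame, and each station filters on the identifiers it subscribes to:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    identifier: str   # global variable name; note: no destination field
    value: object

class Station:
    def __init__(self, name, produces=(), consumes=()):
        self.name = name
        self.produces = set(produces)
        self.consumes = set(consumes)
        self.copies = {}              # local copies of consumed variables

    def on_frame(self, frame):
        # Source addressing: the station updates its local copy only if
        # it recognizes the identifier, regardless of who sent the frame.
        if frame.identifier in self.consumes:
            self.copies[frame.identifier] = frame.value

class Bus:
    def __init__(self, stations):
        self.stations = stations

    def broadcast(self, frame):
        for station in self.stations:  # every frame reaches every station
            station.on_frame(frame)

sensor = Station("sensor", produces=["PRESSURE"])
ctrl = Station("controller", consumes=["PRESSURE"])
logger = Station("logger", consumes=["PRESSURE"])
bus = Bus([sensor, ctrl, logger])

bus.broadcast(Frame("PRESSURE", 2.5))
# Both consumers now hold the value; a further consumer could be attached
# without modifying any address on the producer side.
```

Replacing a failed producer, in this model, only means that another station starts answering for "PRESSURE"; the consumers are unaffected, which mirrors the advantage described above.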
8.4.2 Periodic and Aperiodic Traffic

Control systems are usually based on sampling theory, so the input and output data should be transferred periodically. WorldFIP has chosen to privilege the periodic traffic of identified objects between producer and consumers. Variable values are stored in erasable buffers rather than in queues. There is neither acknowledgment nor retransmission for variable transfers. WorldFIP is from this point of view a time-triggered system (Kopetz, 1990). WorldFIP may also be seen as a distributed database updating and management system. The producer of an identified object periodically updates its own buffer, WorldFIP periodically updates the buffers at the consumer locations, and these consumers may then periodically use the copy of the produced value. If a failure occurs during the transmission, the last value remains available to the consumer until a new one is received. In WorldFIP, a one-place erasable buffer is associated with each variable at its production and consumption locations. The usual acknowledgments are not necessary, and retransmissions are avoided in case of error. The question is how to handle critical data like alarms or rarely occurring events. In WorldFIP, there are two possible ways, depending on the criticality. If no real-time reaction is required, the best is to use the usual message transfer. Otherwise, the only good solution is to transform the alarm into a variable whose value reflects the presence of an alarm and transfer this value periodically. One may think that this would result in a waste of bandwidth. This is true, but it is the price to pay to ensure a deterministic response time. Moreover, multicast transfers are complex when acknowledgments from each receiver are required. In FIP, the choice to suppress acknowledgments drastically simplifies the solution.
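The one-place erasable buffer is the crux of this design. A minimal sketch of its semantics (illustrative code with invented names, not the WorldFIP implementation) shows how it differs from a queue:

```python
import threading

class ErasableBuffer:
    """One-place buffer: a write erases the previous value; a read does
    not consume the value, so it stays available until overwritten."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def write(self, value):
        with self._lock:
            self._value = value       # the old value is simply erased

    def read(self):
        with self._lock:
            return self._value        # non-destructive read

buf = ErasableBuffer()
for v in (20.1, 20.4, 20.9):          # producer runs faster than the consumer
    buf.write(v)

# The consumer always sees the freshest value and may re-read it at its own
# period; a queue would instead deliver the stale backlog 20.1, 20.4, ...
latest = buf.read()
```

This is exactly why no acknowledgment or retransmission is needed: a lost update is simply superseded by the next periodic write, and the consumer keeps the last good value in the meantime.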
8.4.3 Timeliness Attributes and Mechanisms for Time-Critical Systems

Because identified data are transferred periodically between a producer and its consumers, no acknowledgment has been proposed, and no retransmission is basically allowed. Consider the three following elements: a producer, the consumers, and the bus. The producer is a process producing a data item named X at a given period. Several processes consume X at different periods, and the bus updates, at a given period, the copy of X at each consumer site from the original of X. The question at each consumption site is: Is the value of X fresh, too old, or obsolete? Therefore, some timeliness attributes have been defined in order to indicate to the consumers whether the data are correct and, if not, the cause of the error. These attributes are called refreshment and promptness. The former indicates whether the production is timely; the latter indicates whether the reception is correct. Based on these elementary attributes, it is then possible to define the time coherence of actions, i.e., the fact that different distributed actions take place in a given time interval. That is also the definition of simultaneity of actions.
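The refreshment and promptness checks can be expressed as simple timestamp tests. The sketch below conveys the idea only; the function names and windows are illustrative, not the normative WorldFIP definitions:

```python
def refreshment_ok(produced_at, read_at, refreshment_period):
    # Refreshment: was the producer's buffer written recently enough?
    return read_at - produced_at <= refreshment_period

def promptness_ok(updated_at, read_at, promptness_period):
    # Promptness: was the consumer's copy updated by the bus recently enough?
    return read_at - updated_at <= promptness_period

def time_coherent(timestamps, window):
    # Time coherence: did all distributed actions fall in one time interval?
    return max(timestamps) - min(timestamps) <= window

# A value produced at t=100, delivered to the consumer's buffer at t=103,
# and inspected at t=105, with a 10-unit refreshment window and a 4-unit
# promptness window, passes both checks:
ok = refreshment_ok(100, 105, 10) and promptness_ok(103, 105, 4)
```

The split between the two attributes matters for diagnosis: a refreshment failure points at the producer (or its process), whereas a promptness failure points at the network update, so the consumer learns not only that the data are stale but also why.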
FIGURE 8.1 Simplified architecture of WorldFIP: the MPS and MMS service elements on top of the identified-traffic management and messaging management sublayers, over the physical layer.

FIGURE 8.2 Architecture of WorldFIP: the same stack as Figure 8.1, with the (identifier, value) transfer service between the data link sublayers and the physical layer.
Other mechanisms have been introduced as synchronization mechanisms between the local operations and the behavior on the network. All these mechanisms will be detailed in Section 8.8.
8.5 WorldFIP Architecture

The WorldFIP architecture is shown in Figure 8.1 and Figure 8.2, according to the Open Systems Interconnection (OSI) architecture model (Zimmermann, 1980). All elements were standardized in France in 1992 (AFNOR, 1989). This architecture shows that two main profiles may be used. One is defined to handle the traffic of identified objects; the other is defined for the usual messaging exchanges. This architecture follows directly from the needs analysis. It is important to note that the messaging services in the data link layer are related to the point-to-point exchanges of frames, with storage in queues, with or without acknowledgment, and replication detection. The identified traffic services are related to the exchange of data in a broadcast manner, with storage in erasable buffers, without acknowledgment, except by the space consistency mechanism at the application layer. Messaging periodic service (MPS) is the service element for the periodic and aperiodic exchanges of identified data. It uses the services of identified traffic at the data link layer. MMS is a subset of the well-known MMS standard (ISO, 1990) and uses the messaging services at the data link layer. We may say that the first profile (left of the figure) is a profile for real-time traffic management, with guaranteed quality-of-service and timeliness properties. The second profile is used more for noncritical exchanges, e.g., during commissioning, for maintenance and configuration, or more generally for management. Notice that the messaging services are based on the same medium access control.
8.5.1 Architecture and Standardization The European standard EN 50170 [CENELEC, 1996a] contains three national standards in Europe.* Volume 3 outlines all WorldFIP specifications according to the organization shown in Table 8.1 and Figure 8.3. *The other volumes are concerned with P-Net and Profibus.
TABLE 8.1 Parts of the European EN 50170-3 Standard

EN 50170 volume 3, Part 1-3: General Purpose Field Communication System
EN 50170 volume 3, Part 2-3: Physical Layer
    Sub-part 2-3-1: IEC Twisted Pair (IEC 61158-2)
    Sub-part 2-3-2: IEC Twisted Pair Amendment
    Sub-part 2-3-3: IEC Fiber Optic
EN 50170 volume 3, Part 3-3: Data Link Layer
    Sub-part 3-3-1: Data Link Layer Definitions
    Sub-part 3-3-2: FCS Definition
    Sub-part 3-3-3: Bridge Specification
EN 50170 volume 3, Part 5-3: Application Layer Specification
    Sub-part 5-3-1: MPS Definition
    Sub-part 5-3-2: SubMMS Definition
EN 50170 volume 3, Part 6-3: Application Protocol Specification
EN 50170 volume 3, Part 7-3: Network Management
TABLE 8.2 Data Rate and Maximum Possible Lengths

Data Rate | Length without Repeater | Length with 4 Repeaters
31.25 kbps | 10 km | 50 km
1 Mbps | 1 km | 5 km
2.5 Mbps | 700 m | 3.5 km
Several profiles of WorldFIP have been defined. One of them, the simplest one, providing only the periodic traffic of identified data, is called Device WorldFIP (DWF) and is standardized (AFNOR, 1996; CENELEC, 1996b).
8.6 Physical Layer

The physical layer of WorldFIP was obviously the first to conform to IEC 1158-2* because this standard was defined starting from the FIP French standard C46 604. The medium may be a shielded twisted pair or an optical fiber.
8.6.1 Figures

8.6.1.1 Data Rates
The standard defines three data rates for the shielded twisted pair: 31.25 kbps, 1 Mbps, and 2.5 Mbps. For the optical fiber, a fourth data rate, 5 Mbps, is defined. However, some experiments have been carried out with other data rates, for example, 25 Mbps, including high-speed data and video transfer.

8.6.1.2 Maximum Length
The maximum number of stations is 256 and the maximum number of repeaters is 4. According to the data rate and number of repeaters, Table 8.2 gives the possible maximum lengths.
8.6.2 Topology

The topology for a shielded twisted pair may be like that shown in Figure 8.4.
*This number was the previous number of the current 61158 standard.
FIGURE 8.3 Architecture and European standard: MPS (EN 50170 volume 3, part 5-3-1), SubMMS (part 5-3-2), MCS (part 6-3), and network management (part 7-3) on top of the data link layer (part 3-3) and the physical layer (part 2-3).
FIGURE 8.4 Example of topology, showing the principal cable (PC), junction boxes (JB), taps, a repeater (REP), locally disconnectable devices (DS), and nondisconnectable devices (NDS).
8.6.3 Coding

The coding is based on a Manchester code. A physical data frame is composed of three parts: a start sequence composed of a preamble and a frame start delimiter (PRE and FSD), the data link information, and the frame end delimiter (FED). Twenty-four bits are thus added to each data link frame.
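These figures allow a rough estimate of how long a frame occupies the bus. The sketch below is a back-of-the-envelope calculation, assuming (for illustration) that the quoted bit rates apply directly to the framed bits; inter-frame gaps and arbitrator turnaround times are ignored:

```python
FRAMING_BITS = 24  # preamble + frame start delimiter + frame end delimiter

def bits_on_wire(dll_bytes):
    """Total bits transmitted for a data link frame of dll_bytes bytes."""
    return FRAMING_BITS + 8 * dll_bytes

def frame_time_us(dll_bytes, bit_rate):
    """Transmission time in microseconds at the given bit rate."""
    return bits_on_wire(dll_bytes) * 1e6 / bit_rate

# A short frame carrying 5 bytes of data link information (e.g., one
# control byte, a 16-bit identifier, and a 16-bit CRC) occupies
# 24 + 40 = 64 bit times: 64 µs at 1 Mbps, about 2 ms at 31.25 kbps.
t_fast = frame_time_us(5, 1_000_000)
t_slow = frame_time_us(5, 31_250)
```

Such estimates are what a designer would feed into the periodic scan table to check that all configured periods fit on the bus.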
8.7 Data Link and Medium Access Control Layers

8.7.1 Introduction

The WorldFIP medium access control is centralized and managed by a so-called bus arbitrator (BA). All exchanges are under the control of this bus arbitrator. They are currently scheduled according to the
timing requirements and time constraints (Cardeira and Mammeri, 1995), but the scheduling policy is not fixed by the standard. The data link layer provides two types of services: for the identified objects and for the messages. Both may take place periodically. Thanks to the medium access control (MAC) protocol, it is easy to manage the periodic traffic, which may be scheduled before runtime. It is then necessary to provide well-suited services for the requirements of sporadic or random traffic, together with the associated protocol mechanisms. As usual, the random traffic is managed by a periodic server: when a station is polled, it may express a request for extra polling. Such requests, corresponding to the aperiodic traffic, are managed dynamically by the bus arbitrator.
8.7.2 Basic Mechanism

The medium access control is based on the following principle: each exchange is composed of two frames, a request and a response. All exchanges are based on the couple (name of information, value of information), implemented by two frames: an identification frame and a value frame. So, to exchange the value of an object, the bus arbitrator sends a frame that contains the identifier of this object. This frame is denoted ID_DAT (for identification of data) (Figure 8.5a). It is received by all currently active stations and recognized by the so-called producer station of the identified object, and also by the consumer stations that subscribe to this object* (Figure 8.5b). The station that recognizes itself as the producer sends the current value of the identified object. This value is transferred in a so-called RP_DAT frame (for response) (Figure 8.5c). All interested stations, including all subscribers and the bus arbitrator, receive this RP_DAT frame (cf. Figure 8.5d). The ID and RP frames have the following formats:
ID frame:
• A control field (CF) of 8 bits, whose role will be described later
• The identifier of the object (16 bits)
• A cyclic redundancy check (CRC) (16 bits)
RP frame:
• A control field (CF) of 8 bits, whose role will be described later
• The value of the object identified in the previous frame (maximum of 256 bytes)
• A CRC (16 bits)
The idea is now to extend this simple mechanism of polling by designation of the data to be sent, in order to handle the transfer of messages and the transfer of aperiodic data or messages.
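The two-frame transaction can be sketched as follows. This is an illustrative simulation only: the station and buffer structures are invented for the example, and the control fields and CRCs are not modeled:

```python
class Node:
    def __init__(self, produces=None, consumes=()):
        self.produces = produces or {}   # identifier -> production buffer
        self.consumes = set(consumes)
        self.copies = {}                 # consumer-side erasable buffers

class BusArbitrator:
    """Every exchange is an (ID_DAT, RP_DAT) pair driven by the arbitrator."""
    def __init__(self, nodes):
        self.nodes = nodes

    def poll(self, identifier):
        # 1. The BA broadcasts ID_DAT carrying only the identifier.
        producer = next(n for n in self.nodes if identifier in n.produces)
        # 2. The producer recognizes itself and broadcasts RP_DAT with the
        #    current content of its production buffer.
        value = producer.produces[identifier]
        # 3. Every subscribed consumer overwrites its local copy.
        for n in self.nodes:
            if identifier in n.consumes:
                n.copies[identifier] = value
        return value

sensor = Node(produces={"TEMP": 21.5})
display = Node(consumes=["TEMP"])
ba = BusArbitrator([sensor, display])
ba.poll("TEMP")
```

Note that the arbitrator never needs to know which station produces "TEMP"; it only names the variable, which is what makes producer replacement and consumer addition transparent.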
8.7.3 The Aperiodic Server
The needs for aperiodic transfer were identified in Section 8.4.2. Aperiodic transfer takes place in the free time slots left by the periodic traffic and is dynamically managed by the bus arbitrator.
8.7.3.1 First Stage
The first stage is the expression of a request to the BA by a station. Any producer may request a new exchange through an indication in the control field of the RP frame it sends when polled by an ID_DAT frame. This indication specifies the type of the RP frame; three RP frame types may thus be observed as answers to an ID_DAT frame:
*Note that the producer–consumer model is also called publisher–subscriber, since the consumers subscribe to the data.
© 2005 by CRC Press
8-9
The WorldFIP Fieldbus
FIGURE 8.5 Exchange and updating of a variable. [Panels (a)–(d) show the ID_DAT frame identifying VARK being broadcast, its recognition by the producer and the consumers of VARK, the producer's RP_DAT response carrying the new value, and the update of all consumer buffers with the new value.]
8-10
The Industrial Communication Technology Handbook
• RP_DAT: response to an ID_DAT without any request
• RP_DAT_RQ: response to an ID_DAT with a request for an aperiodic exchange of other identified objects
• RP_DAT_MSG: response to an ID_DAT with a request for an aperiodic exchange of a message
8.7.3.2 Second Stage
The second stage is to satisfy the request. The BA then has to place the right ID frames in a free time slot of the scanning table, according to its own scheduling policy. The ID frames corresponding to the possible requests are ID_RQ and ID_MSG; the former satisfies RP_DAT_RQ and the latter RP_DAT_MSG. Upon reception of ID_RQ, the station at the origin of the request sends an RP_RQ frame carrying, in its data field, the list of identifiers that the BA has to send as ID frames. Upon reception of ID_MSG, the station at the origin of the request sends an RP_MSG frame with a message in the data field. This message is specified with or without acknowledgment; the corresponding RP_MSG frames are called RP_MSG_NOACK and RP_MSG_ACK. In the latter case, the receiver sends an RP_ACK after reception of the message. A special frame, RP_FIN, allows the BA to continue its polling.
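A minimal sketch of these two stages follows, assuming a simple FIFO of pending requests; the real bus arbitrator schedules requests into free slots of its scanning table according to its own policy, and the names below are invented for the example.

```python
# Two-stage aperiodic mechanism: a producer piggybacks a request on its
# RP frame (stage 1), and the bus arbitrator later serves it with an
# ID_RQ or ID_MSG frame in a free slot (stage 2).
from collections import deque

class Arbitrator:
    def __init__(self):
        self.pending = deque()   # requests awaiting a free slot

    def on_response(self, frame_type, station):
        """Stage 1: record a request piggybacked in an RP frame's control field."""
        if frame_type in ("RP_DAT_RQ", "RP_DAT_MSG"):
            self.pending.append((frame_type, station))

    def free_slot(self):
        """Stage 2: in a free slot, emit the ID frame serving the oldest request."""
        if not self.pending:
            return None
        frame_type, station = self.pending.popleft()
        return ("ID_RQ" if frame_type == "RP_DAT_RQ" else "ID_MSG", station)
```

A plain RP_DAT leaves the queue untouched; only RP_DAT_RQ and RP_DAT_MSG responses generate work for the arbitrator's free slots.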
8.7.4 Variable Transfer Services
To each identified object, the data link layer associates a buffer B_DAT_prod at the producer station and a buffer B_DAT_cons at each consumer station. Two main services are defined for writing and reading a buffer: L_PUT and L_GET, respectively. The write service (L_PUT) places the new value in the producer buffer, overwriting the previous buffer content. The read service (L_GET) gets the value from the consumer buffer. These services do not cause any traffic on the bus. The content of each consumer buffer is updated with the value stored in the producer buffer under the control of the bus arbitrator (Figure 8.5d), as seen in Section 8.7.2. An L_SENT indication informs the producer when the transmission takes place; the consumers are informed by an L_RECEIVED indication when the update occurs.
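The buffer semantics can be sketched as follows (an illustration only: the class and attribute names mirror the B_DAT_prod/B_DAT_cons terminology but are invented, and the indications L_SENT/L_RECEIVED are omitted):

```python
# L_PUT / L_GET are purely local: writing overwrites the producer buffer,
# reading returns the consumer buffer's current content, and neither
# causes bus traffic. Only the arbitrator-driven transfer copies the
# producer buffer into the consumer buffers.

class VariableBuffers:
    def __init__(self):
        self.b_dat_prod = None       # producer-side buffer
        self.b_dat_cons = {}         # one local buffer per consumer station

    def l_put(self, value):
        """L_PUT: overwrite the producer buffer; no bus traffic."""
        self.b_dat_prod = value

    def l_get(self, consumer):
        """L_GET: read the consumer's local buffer; no bus traffic."""
        return self.b_dat_cons.get(consumer)

    def bus_transfer(self, consumers):
        """Arbitrator-triggered update of every consumer buffer."""
        for c in consumers:
            self.b_dat_cons[c] = self.b_dat_prod
```

Note that a consumer reading before any transfer simply sees the old (here, empty) buffer content: freshness is a separate concern, handled by the status mechanisms of Section 8.8.2.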
8.7.5 Message Transfer
For messages as for identified objects, WorldFIP defines periodic and aperiodic (or on-request) message transfers. Periodic messages are assigned an identifier and a queue; according to the application needs, more than one identifier and queue may be used when different polling periods are wanted. Messages are deposited at the source side, and the transfer of the content of the queue is periodically triggered by the bus arbitrator (ID_MSG frame). If the queue is not empty, the source DLL sends the first message in the queue in an RP_MSG_xx frame. The destination data link layer stores the message in the receive queue and, if requested, immediately acknowledges the transfer using an RP_ACK frame. The end of the transaction is signaled by the source to the bus arbitrator using an RP_FIN frame. If the queue is empty, no transfer takes place and only the RP_FIN frame is sent. The polling period is a configuration parameter. For aperiodic message transfer, on the source side, the data link layer defines a single queue, F_MSG_aper, that holds the pending messages. On the destination side, a receiving queue, F_MSG_rec, is defined. As for aperiodic variables, transfer requests are signaled to the bus arbitrator as piggybacks on RP_DAT frames sent in response to ID_DAT frames.
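The periodic message transaction can be sketched as a single function; this is illustrative only (the real RP_ACK is emitted by the destination, not computed by the source, and the frame names are used loosely here):

```python
# One ID_MSG poll: the source sends the first queued message as
# RP_MSG_ACK or RP_MSG_NOACK, an acknowledged transfer is answered by
# RP_ACK, and the source closes the transaction with RP_FIN. An empty
# queue yields only RP_FIN.
from collections import deque

def serve_id_msg(queue, with_ack):
    """Return the frame sequence triggered by one ID_MSG poll."""
    frames = []
    if queue:
        msg = queue.popleft()
        frames.append(("RP_MSG_ACK" if with_ack else "RP_MSG_NOACK", msg))
        if with_ack:
            frames.append(("RP_ACK", None))   # destination acknowledges
    frames.append(("RP_FIN", None))           # source lets the BA resume polling
    return frames
```

The RP_FIN at the end of every transaction, full or empty, is what lets the bus arbitrator keep its scanning table on schedule.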
8.7.6 Synthesis on the Data Link Layer
Three types of objects are exchanged according to the basic principle based on the exchange of couples (name, value) (Thomesse and Rodriguez, 1986). These objects are identified objects, lists of identifiers of identified objects, or messages with or without acknowledgment (Figure 8.6). The data link layer protocol may be considered connection oriented, with connections established at the configuration stage. These connections are multicast, and the corresponding service access points (SAPs) are represented
FIGURE 8.6 Service access point and connection end point. [The figure shows a station's SAP, addressed by identifier IDENT4, with three CEPs: a buffer holding the value of IDENT4 (RP-DAT), a buffer holding a list of identifiers (RP-RQ), and a pointer to the general message queue (RP-MSG).]
by the identifiers. Associated with each SAP, different connection end points (CEPs) are represented by the associated objects and the necessary resources:
• A CEP for data exchange, represented by the buffer storing the successive values
• A CEP for the exchange of lists of identifiers, represented by another buffer
• A CEP for the sending of messages, represented by a queue
Each of these CEPs is addressed first by the identifier addressing the SAP and second by the indication in the control field of the ID frame, which specifies the corresponding resource.
8.8 Application Layer
Two application service elements comprise the WorldFIP application layer: MPS for real-time data exchange (Thomesse and Delcuvellerie, 1987; Thomesse and Lainé, 1989) and sub-MMS for the usual messaging services and compatibility with other networks. For real-time data exchange, FIP behaves like a distributed database refreshed by the network, periodically or aperiodically on demand. All application services related to periodic and aperiodic data exchange are called MPS. MPS provides local read-write services (periodic) as well as remote read-write services (aperiodic) on the values of variables or lists of variables. The read services return indications of the age of the object value. Considering a producer of data named X producing at a given period, consumers of X consuming X at different periods, and the bus itself updating at a given period the consumers' copies of X from the original, the question at a consumption site is: Is the value of X fresh, too old, or obsolete? In WorldFIP, this information is based on local mechanisms and provided as two types of status: the refreshment status, elaborated by the producer, and the promptness status, elaborated by the consumer (Thomesse et al., 1986; Decotignie and Raja, 1993; Lorenz et al., 1994). These statuses are returned by the read services along with the value itself. This information may also be used to check whether a set of variables is time coherent (Figure 8.9). As a variable may have several consumers, there is a need to know whether the different copies of the variable value available to the various consumers are identical. This information, called the spatial coherence status (or spatial consistency) (Saba et al., 1993), is provided by the MPS read services related to lists of variables. The same services also offer a temporal coherence status.
8.8.1 Services Associated with the Variables
A variable can be of a simple type, such as integer, floating point, Boolean, or character, or of a composite type, such as arrays and records. It may have different semantics: ordinary variable, synchronization variable, consistency variable, or variable descriptor. Synchronization variables are used to synchronize application processes and also in the elaboration of the temporal and spatial statuses associated with ordinary variables. Consistency
variables are used to elaborate the spatial coherence status. Variable descriptors hold all information concerning variables (type, semantics, periodicity, etc.) for configuration, commissioning, maintenance, and management. Three request services, A_READ, A_WRITE, and A_UPDATE, and two indication services, A_SENT and A_RECEIVED, are available. A_UPDATE is used to request an aperiodic transfer of a variable. A_SENT and A_RECEIVED are optional services. When used, A_RECEIVED informs the consumer, at its local communication entity, of the reception of a new variable value. Similarly, A_SENT notifies the producer of the transmission of the produced value. These indication services can be used by application processes (APs) to verify the proper operation of the communication entity and also to synchronize with each other by receiving synchronization variables (see Section 8.8.4). A_READ and A_WRITE exist in two forms, local and remote. A local read of a variable (denoted A_READLOC) provides the requesting AP with the current value of the local variable image. A local write (denoted A_WRITELOC) invoked by an AP updates the local value of the variable, to be sent at a next scanning specified in the request. As already said, these operations do not involve any communication. The transfer of the value from the producer to its copies in all consumers is handled by the distribution function of the application layer. The variable value given in an A_WRITELOC will be available to the distributor, which is in charge of broadcasting it to all consumers of this value. The variable value returned by an A_READLOC at a consumer site will be the last value updated by the distribution function. A remote write, A_WRITEFAR, is used by a producer to update the content of the local buffer assigned to the variable specified in the request and to ask for the transfer of this value to the consumers.
It may be seen as the combination of an A_WRITELOC service invocation followed by an A_UPDATE. However, as for A_WRITELOC, this service may only be invoked by the producer of the variable. Operations are as follows:
1. Remote READ request
2. Ask the distributor for an update
3. Transfer order from the distributor
4. Transfer of the producer's variable value to the users
5. Confirmation of the remote READ
In a similar way, with a remote read, A_READFAR, a consumer requests the transfer of the variable value from the producer to all consumers. This value is returned as the result of the request. This service is hence a combination of an A_UPDATE service invocation and an A_READLOC service invocation. However, read services may only be invoked by a consumer. Figure 8.7 shows this service from the producer–consumer model point of view, considering an AP that is a consumer of variable X. In order to request a transfer explicitly, this AP must also be the producer of a variable (Y in the example). Figure 8.8 depicts in detail all the primitive calls and frame exchanges needed to achieve the remote READ of X. Note that these remote operations are not symmetrical: the remote write is a service without confirmation, while the remote read is confirmed.
FIGURE 8.7 Remote READ using PDC model.
FIGURE 8.8 Primitives and frame exchanges to achieve a remote READ service. [For an AP that is a consumer of X and a producer of Y: A-READFAR-Rq(X) triggers L-UPDATE-Rq(X); ID-DAT(Y) is answered by RP-DAT-RQ(val(Y)) with an L-SENT-Ind(Y); ID-RQ(Y) is answered by RP-RQ(X); ID-DAT(X) is answered by RP-DAT(val(X)), producing L-RECEIVED-Ind(X), L-UPDATE-Cnf(X), and finally A-READFAR-Cnf(X, val(X)).]
8.8.2 Temporal Validity of Variables
Normally, according to the producer–consumer model, operations should proceed in the following order: production, transfer, and consumption. While the receipt of an identifier triggers the transfer, a production or a consumption is triggered by a local production or consumption order. These behaviors may lead to abnormal situations. For example, several productions may occur successively without any transfer. Conversely, a number of transfers may take place between two successive productions of a variable. The same problems may arise on the consumer side: a consumer may read the same value of a variable several times, or may not have enough time (or may not be interested) to consume all of the received values. It is thus important to detect these deviations from normal behavior. This means that any consumer should be able to know whether the producer has produced on time, the transfer has been handled on time, and the consumer itself has consumed on time. Finally, the consumer should be able to check whether the value is still temporally valid (Lorenz et al., 1994a; Lorenz et al., 1994b).
8.8.2.1 Refreshment
The refreshment status of a variable is a Boolean that indicates whether a production occurred in the right time window. It is elaborated by the application layer of the producer. The time window is defined by a start event and a duration. In a simplified view, the refreshment is correct (true) if the production occurs in the time window, and it remains true during a given delay, which is the normal production period; it becomes false after this deadline. The value of the current refreshment status is sent along with the value and indicates to the consumer whether, from the point of view of the producer and the transmission, the value is valid (Figure 8.9).
FIGURE 8.9 Refreshment and promptness status elaboration in the synchronous case.
In summary, the refreshment status indicates to a consumer that the producer has produced its value while respecting a production delay called the production period.
8.8.2.2 Promptness
The promptness status of a variable value indicates whether the transmission of the data has been done in the right time window. It is elaborated by the communication entities of its consumers. It is returned with the variable value and the refreshment status as a result of an invocation of A_READ.
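As an illustration, the two statuses can be approximated with plain timestamps. The function names and the float-based clock are invented; the standard's actual mechanisms are finite-state machines (see the C46-602 standard) and are more elaborate than this.

```python
# Simplified view of the two timeliness statuses: each is true while the
# relevant event (production, or reception at the consumer) still falls
# inside its time window, and false once the window's duration expires.

def refreshment_status(last_production_time, now, production_period):
    """Producer side: true while the produced value is within its period."""
    return (now - last_production_time) <= production_period

def promptness_status(last_reception_time, now, consumption_period):
    """Consumer side: true while the last received value is recent enough."""
    return (now - last_reception_time) <= consumption_period
```

Both checks are purely local, which is the point of the WorldFIP design: validity information travels with the value instead of requiring extra bus traffic.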
8.8.3 Synchronous and Asynchronous
Refreshment and promptness are the timeliness attributes associated with a given variable. They indicate whether an event occurs in a time window defined by a starting event and a duration. The WorldFIP fieldbus has defined two types of timeliness attributes, the synchronous type and the asynchronous type.* An attribute is said to be synchronous when the starting event of the time window is the received indication of a dedicated variable. This is normally periodic, and the bus arbitrator is, as for other variables, in charge of enforcing the period. The duration is the period of production and is managed by the network, through the bus arbitrator behavior. An attribute is said to be asynchronous when the starting event is the previous occurrence of this same event. The duration is the period of production and is here managed by the device itself. In fact, in WorldFIP, two attributes are relevant to the synchronous type: the so-called synchronous attribute, as defined previously, and the so-called punctual attribute, for which the starting event is the same as for the synchronous attribute but the duration is shorter than the production period. Further details and the finite-state machines of these mechanisms can be found in the C46-602 standard and in Lorenz et al. (1994a).
8.8.4 Synchronization Services
The processes of an application can be synchronized or asynchronous. An asynchronous application process is one whose execution is independent of the network behavior. An application process is said to be synchronized when its execution is related to the reception of some indication from the network. In many cases, the various distributed processes of an application are synchronized. This synchronization may be ensured through the indication of reception of variables. However, some application processes may not be able to handle synchronization. For such asynchronous APs that need to participate in a synchronized distributed application, FIP provides a resynchronization mechanism. In addition to the existing buffer, called the public buffer, the resynchronization mechanism associates with each variable a second buffer, the private buffer. The private buffer is only accessible to the corresponding AP; access is performed by the local A_READ if the variable is consumed or A_WRITE if the variable is produced. The public buffer is only accessible to the network (Figure 8.10). The resynchronization mechanism consists of copying the content from one buffer to the other according to a synchronization order received via the network. Both variable production and variable consumption can be resynchronized. When variable consumption has to be resynchronized, its value in the private buffer is kept unchanged until the resynchronization order. If a new value is transferred on the network, it is kept in the public buffer. Only upon reception of a resynchronization order is the value in the public buffer copied into the private buffer. The process is similar for variable production. In both cases, the resynchronization order is given by a synchronization variable specified with each variable, produced or consumed, that needs to be resynchronized.
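The double-buffer scheme can be sketched as below, here for a consumed variable (the produced case copies in the other direction). The class and method names are invented for the example.

```python
# Public/private double-buffer resynchronization: the network writes the
# public buffer, the AP reads only the private buffer, and the
# resynchronization order copies public -> private.

class ResyncVariable:
    def __init__(self):
        self.public = None    # written by the network on each transfer
        self.private = None   # the only buffer the AP may read

    def network_update(self, value):
        """A newly transferred value waits in the public buffer."""
        self.public = value

    def ap_read(self):
        """The AP's view is unchanged until a resynchronization order."""
        return self.private

    def resync_order(self):
        """Triggered by the associated synchronization variable."""
        self.private = self.public
```

The effect is that an asynchronous AP always sees a value frozen at the last synchronization point, regardless of how many network transfers occurred in between.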
*The terms synchronous and asynchronous could be replaced by “synchronized by the network” and “locally synchronized,” respectively.
FIGURE 8.10 Resynchronization mechanism. [The private buffer is accessed asynchronously by the producer or consumer AP; the public buffer is accessed synchronously by the network; both hold occurrences of the same variable.]
8.8.5 Services Associated with Variables Lists
A variable list is an unordered set of variables that must verify time coherence, i.e., the fact that they are all produced in a given time window. All variables of a list are consumed at the same time by at least one consumption process. When more than one process is concerned with the consumption of a list, another property must be verified: spatial consistency, i.e., the fact that all copies of all variables of the list are the same on all consumption sites. The variables that comprise the list may be produced on different sites; they need not all be produced by the same producer, as in usual application layers such as MMS or MMS-like ones (FMS or some sub-MMS, for example). Usually the productions are synchronous operations. The only service defined on lists is A_READLIST, which allows the reading of all variables of a list in a single invocation. This service returns the last received values of the variables in the list and three optional statuses: a production coherence status, a transmission coherence status, and a spatial consistency status. These statuses are provided to account for two important needs in systems using fieldbuses. First, a consumer of several variables is interested in knowing whether the corresponding values have been sampled at nearly the same time, which is called time coherence (Kopetz, 1990). Second, when the value of a variable has been distributed to several consumers, it may be useful to know whether all of the values are identical; this is referred to as spatial consistency. The idea in FIP is not to ensure temporal coherence and spatial consistency, but rather to indicate whether these properties hold. In FIP, the temporal coherence indication is given through two statuses, the production coherence status and the transmission coherence status. The production coherence status is Boolean information elaborated by the consumer application layer.
This status corresponds to a logical AND of all the corresponding refreshment statuses. Similarly, the transmission coherence status is calculated as the logical AND of all promptness statuses of the variables in the list. The production coherence status and the transmission coherence status together are an indication of the temporal coherence of the variables in the list. The spatial consistency status of a list is elaborated by the application layer of each consumer of the list. The elaboration mechanism relies on the broadcast of a consistency variable by each of the consumers. This variable indicates, for each copy of the variable list, whether it has been received correctly and in a given window. After the reception of all consistency variables by all consumers, each of them will have knowledge of the validity of the variables in the list at all consumers. A logical AND of all consistency variables gives the spatial consistency status. To make the variable list transfer more reliable, FIP defines an error recovery mechanism: if needed, a consumer can trigger a retransmission of its consumed variables when an error is detected. This mechanism performs a retransmission request (using the aperiodic variable transfer service of the data link layer) a number of times, limited to a maximum defined for each instance of the list. The duration of the whole transaction on the list (including retransmissions) must be bounded by a time window T smaller than the delay between two consecutive synchronization orders. It has been shown in (Song et
al., 1991) that this recovery mechanism (a kind of grouped acknowledgment technique) is very efficient and can be recommended for use with other multicast protocols.
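The three AND-based list statuses described above can be sketched as follows. The function names are invented for illustration; in FIP these computations are performed within the consumer's application layer.

```python
# The production coherence status is the AND of the refreshment statuses
# of all variables in the list; the transmission coherence status is the
# AND of their promptness statuses; the spatial consistency status is the
# AND of the consistency variables broadcast by all consumers.

def production_coherence(refreshment_statuses):
    """True if every variable in the list was produced on time."""
    return all(refreshment_statuses)

def transmission_coherence(promptness_statuses):
    """True if every variable in the list was transmitted on time."""
    return all(promptness_statuses)

def spatial_consistency(consistency_variables):
    """True if every consumer received every copy of the list correctly."""
    return all(consistency_variables)
```

A single false status in any of the three AND operations is enough to flag the whole list, which matches FIP's philosophy of indicating, rather than enforcing, coherence.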
8.9 WorldFIP State and Technology
8.9.1 Technology
Many integrated circuits and software libraries are available for building devices compatible with the standard; they conform to the European standard EN 50170. The circuits cover all physical layer protocols and all profiles of the data link and application layers. The communication components are referenced as FIPIU2, FULLFIP2, and MICROFIP. The circuits for the physical layer are essentially FIELDRIVE and CREOL for copper wire and OPERA-FIPOPTIC for fiber optic. Redundant channels may be used with the FIELDUAL component. TransoFIP and FIELDTR are possible transformers for connection on copper wire. The libraries are used to create the link between the user application and the communications controller; each library is dedicated to a communication component.
8.9.2 Fieldbus Internet Protocol
FIP may now also be interpreted as Fieldbus Internet Protocol. Indeed, the messaging services can be used to transfer Hypertext Transfer Protocol (HTTP) protocol data units (PDUs), so that each site on a WorldFIP fieldbus may host a Web server. For remote maintenance and remote configuration applications, it is then possible to access the stations through a browser. Two solutions are currently used:
1. A single station may be seen as the image of all stations of the fieldbus.
2. Each station is directly accessed by tunneling HTTP in the WorldFIP PDUs.
The advantage of WorldFIP is that the data flow generated by Internet connections is managed in complete compatibility with the time constraints of the process, and thus with its dependability. Time-critical traffic is always served with priority.
8.9.3 New Development
In 2001, WorldFIP was certified safe according to the "proven in use" procedure defined by IEC 61508. According to this standard, a fieldbus is considered a subsystem. The certification was granted by Bureau Veritas on the basis of detailed records from field users covering a sufficient number of highly reliable applications, providing a high level of confidence in the operational figures. It is the only fieldbus in the world with such a certification.
8.10 Conclusion
The WorldFIP fieldbus is now 20 years old if we consider the beginning of its specification, yet only a few years old if we consider the very recent dates of the international standards (Leviti, 2001), knowing that the definition of the profiles standard (CENELEC, 2003b) is not yet finished. The current integrated circuits (ICs) (the third generation, counting from the first IC) are based on the latest developments of microelectronics. The WorldFIP fieldbus is the only one in the world certified SIL3 (safety integrity level 3) (Froidevaux et al., 2001). WorldFIP occupies a special place in the international market. It is used in all types of industry (iron and steel, car manufacturing, bottling, power plants, etc.), but also in many time-critical applications and embedded systems in trains, buses, ships, and subways, as well as in a very important application: the Large Hadron Collider at CERN (European Organization for Nuclear Research) in Switzerland. The main reason is to be found in the technical specifications, in the services provided, and in the quality of
services, essentially from a timeliness point of view. WorldFIP guarantees that time constraints will be met, provides periods without jitter, and supports the synchronization of distributed actions (productions, consumptions, data acquisitions, controls). The validation of data is also an important element of the quality of service and of the dependability of WorldFIP-based systems. The quality of the physical layer (IEC, 1993) and the redundancy capabilities are also important points in choosing WorldFIP for critical applications. Many of these concepts and mechanisms have been taken up in the TCCA (time-critical communication architecture) report (ISO, 1991; Grant, 1992) and in IEC TS 61158 (CENELEC, 2003a). Regarding other and newer requirements, WorldFIP is able to transport voice and video without disturbing or influencing the real-time traffic; some examples of voice transport are already industrial, in trains. WorldFIP is also able to transport HTTP data units, allowing remote access to WorldFIP-based systems for monitoring, maintenance, configuration, etc.
References
AFNOR (1989). French Standards NF C46601 to C46607. FIP bus for exchange of information between transmitters, actuators and programmable controllers. Published between 1989 and 1992 (in French).
AFNOR (1996). French Standard C46-638, Système de Communication à haute performance pour petits modules de données (WorldFIP Profil 1, DWF).
Bergé, N., G. Juanole, and M. Samaan (1995). Using Stochastic Petri Nets for Modeling and Analysing an Industrial Application Based on FIP Fieldbus. Paper presented at International Conference on Emerging Technologies and Factory Automation, INRIA, Paris.
Cardeira, C. and Z. Mammeri (1995). A schedulability analysis of tasks and network traffic in distributed real-time systems. Measurement, 15, 71–83.
CENELEC (1996a). European Standard EN 50170. Fieldbus. Volume 1: P-Net, Volume 2: PROFIBUS, Volume 3: WorldFIP.
CENELEC (1996b). High Efficiency Communications Subsystems for Small Data Packages, CLC TC/65CX, EN 50254.
CENELEC (2003a). prEN61158-2: Digital Data Communication for Measurement and Control: Fieldbus for Use in Industrial Control Systems. Part 2: Physical layer specification, Part 3: Data link layer service definition, Part 4: Data link layer protocol specification, Part 5: Application layer service definition, Part 6: Application layer protocol specification.
CENELEC (2003b). prEN61784-1 (65C/294/FDIS): Digital Data Communications for Measurement and Control: Part 1: Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems.
Decotignie, J.-D. and P. Raja (1993). Fulfilling temporal constraints in fieldbus. In Proc. IECON '93, Maui, HI, pp. 519–524.
Froidevaux, J.-P. (2001). Use of Fieldbus in Safety Related Systems, an Evaluation of WorldFIP according to Proven-in-Use Concept of IEC 61508. Paper presented at 4th FET, IFAC Conference, Nancy, France.
Galara, D. and J.P. Thomesse (1984). Groupe de réflexion FIP, Proposition d'un système de transmission série multiplexée pour les échanges d'informations entre des capteurs, des actionneurs et des automates réflexes. Ministère de l'Industrie et de la Recherche.
Gault, M. and J.P. Lobert (1985). Contribution for the fieldbus standard. Presentation to IEC/TC65/SC65C/WG6.
Grant, K. (1992). Users Requirements on Time Critical Communications Architectures, Technical Report. ISO TC184/SC5/WG2/TCCA.
IEC (1993). IEC Standard 1158-2, Fieldbus Standard for Use in Industrial Control Systems: Part 2: Physical Layer Specification and Service Definition + AMD1, 1995.
ISO (1990). International Standard ISO 9506, Manufacturing Message Specification (MMS): Part 1: Service Definition, Part 2: Protocol Specification, 1991.
ISO (1991). ISO/TC 184/SC 5/WG 2-TCCA-N56, Draft Technical Report of the TCCA Rapporteurs' Group of ISO/TC 184/SC 5/WG 2 Identifying User Requirements for Systems Supporting Time-Critical Communications, August 1991.
Kopetz, H. (1990). Event triggered vs. time triggered real time systems. LNCS, 563, 87–101.
Leviti, P. (2001). IEC 61158, An Offence to Technicians. Paper presented at 4th FET, IFAC Conference, Nancy, France.
Lorenz, P. and Z. Mammeri (1994a). Temporal Mechanisms in Communication Models Applied to Companion Standards. Paper presented at SICICA 94, Budapest.
Lorenz, P., J.-P. Thomesse, and Z. Mammeri (1994b). A State-Machine for Temporal Qualification of Time-Critical Communication. Paper presented at 26th IEEE Southeastern Symposium on System Theory, Athens, Ohio, March 20–22.
MAP (1988). General Motors, Manufacturing Automation Protocol, version 3.0.
Pleinevaux, P. and J.-D. Decotignie (1988). Time critical communications networks: field buses. IEEE Network Magazine, 2, 55–63.
Saba, G., J.P. Thomesse, and Y.Q. Song (1993). Space and time consistency qualification in a distributed communication system. In Proceedings of IMACS/IFAC International Symposium on Mathematical and Intelligent Models in System Simulation, Vol. 1, Brussels, Belgium, April 12–16, pp. 383–391.
Simonot, F., Y.Q. Song, and J.P. Thomesse (1995). On message sojourn time in TDM schemes with any buffer capacity. IEEE Transactions on Communication, 43, 2/3/4, 1013–1021.
Song, Y.Q., P. Lorenz, F. Simonot, and J.P. Thomesse (1991). Multipeer/Multicast Protocols for Time-Critical Communication. Paper presented at Multipeer/Multicast Workshop, Orlando, FL.
Thomesse, J.-P. (1993). Le réseau de terrain FIP. Revue Réseaux et Informatique Répartie, Ed. Hermès, 3, 3, 287–321.
Thomesse, J.-P. (1998). A review of the fieldbuses. Annual Reviews in Control, Pergamon, 22, 35–45.
Thomesse, J.-P. and J.-L. Delcuvellerie (1987). FIP: A Standard Proposal for Fieldbuses. Paper presented at IEEE-NBS Workshop on Factory Communications, Gaithersburg, MD, March 17–19.
Thomesse, J.-P., J.-Y. Dumaine, and J. Brach (1986). An industrial instrumentation local area network. Proceedings of IECON, 1, 73–78.
Thomesse, J.-P. and T. Lainé (1989). The field bus application services. In Proceedings of IECON '89, 15th Conference IEEE-IES Factory Automation, Philadelphia, pp. 526–530.
Thomesse, J.-P. and M. Rodriguez (1986). FIP, A Bus for Instrumentation. Paper presented at Advanced Seminar on Real Time Local Area Networks, Colloque INRIA Bandol, France.
Zimmermann, H. (1980). OSI reference model. The ISO model of architecture for open system interconnection. IEEE Transactions on Communication, 28, 425–432.
9 FOUNDATION Fieldbus: History and Features
Salvatore Cavalieri
University of Catania
9.1 Principles of FOUNDATION Fieldbus .....................................9-1
9.2 Technical Description of FOUNDATION Fieldbus ................9-2
H1 and HSE FOUNDATION Fieldbus User Application Layer • H1 FOUNDATION Fieldbus • HSE FOUNDATION Fieldbus • Open Systems Implementation
9.3 Conclusions .......................................................................9-16
References .....................................................................................9-16
FOUNDATION Fieldbus is an all-digital, serial, two-way communication system. Its specification has been developed by the nonprofit Fieldbus Foundation [1]. Since its very beginning, FOUNDATION Fieldbus has shown two fundamental and, at that time at least, unique features: an emphasis on the standardization of the description of the devices to be connected to the fieldbus, and the adoption of both main link access mechanisms (i.e., the token-based and the centralized one), which the International Electrotechnical Commission (IEC)/Instrument Society of America (ISA) fieldbus committee (IEC 61158 and ISA SP50) was trying to derive from the existing proposals within a new and complete solution [2][3][4][5]. One of the aims of this chapter is to emphasize the value of the choices made by the Fieldbus Foundation as well as their impact on the current features of the FOUNDATION Fieldbus communication system. Furthermore, those features will be described in detail, allowing the reader to clearly understand the key points of the system. This chapter is organized into two parts: Section 9.1 gives an overview of the principles of FOUNDATION Fieldbus, and Section 9.2 discusses the main features of this communication system.
9.1 Principles of FOUNDATION Fieldbus

As soon as the first fieldbus communication systems [6][7] appeared on the market, the need for a single fieldbus standard was immediately felt. Over 15 years ago, the International Electrotechnical Commission (IEC) and the Instrument Society of America (ISA) embarked on a joint standardization effort identified by two codes: 61158 on the IEC side and SP50 on the ISA one. The main aim of the standards committee was the definition of a unique communication system able to merge the main features of the fieldbuses available on the market: FIP (Factory Instrumentation Protocol) [8] and Profibus [9]. The Fieldbus Foundation (note that Fieldbus Foundation refers to the name of the association, while FOUNDATION Fieldbus is used for the relevant communication system [1]), established in 1994 as a result of a merger between ISP (Interoperable System Project) and WorldFIP North America, has defined a small set of basic principles. Those basic principles included two main cornerstones:
The Industrial Communication Technology Handbook
1. The adoption of both main medium access control (MAC) mechanisms that the IEC/ISA fieldbus committee was trying to derive from the existing proposals within a truly complete solution
2. Emphasis on a standard description of the devices to be connected to the fieldbus

Cornerstone 1 freed the Fieldbus Foundation from the persistent solution issue: scheduled access vs. circulated token. The IEC 61158 type 1 data link layer (DLL) stated: "Both paradigms, circulated token and scheduled access were good, but insufficient at the same time; they were complementary, not alternative, and a complete fieldbus solution needs the two together" [3]. From the outset, FOUNDATION Fieldbus fully adopted this approach, providing both the predefined scheduling philosophy of FIP and the token rotation philosophy of Profibus. Section 9.2 provides more details about these two fundamental mechanisms. Cornerstone 2 allowed the Fieldbus Foundation to avoid a situation (which affected most of the previous fieldbus proposals) in which, after defining the communication stack, much more still needed to be done in order to make devices operational once connected to a fieldbus. In fact, the previous fieldbus proposals started their developments by focusing on the communication aspects (physical media, access mechanisms, addressing, connections, quality of service, etc.). That was mostly motivated by the fact that, when switching from a dedicated low-frequency 4- to 20-mA signal to a multidata high-frequency serial link, the most evident change concerned the communication mechanism itself: Was that cable still good? What is the best encoding method? How can we guarantee noise recovery? Will too many data on the same medium affect their timeliness? And so on. But once the communication aspects were defined and proven, the mere ability to transfer data between two or more devices did not make those devices able to interoperate.
FOUNDATION Fieldbus had this concept clearly in sight from the beginning and included the definition of data semantics, plus their configuration and use, within the first set of specifications. The only aspect that the Fieldbus Foundation initially and intentionally left out was the higher-speed version of the fieldbus, H2. That was mostly because the market addressed by the Fieldbus Foundation concerned the replacement of existing 4- to 20-mA devices with ones compliant with FOUNDATION Fieldbus, trying to reduce the relevant costs as much as possible. This market strategy, chosen in order to ease the adoption of the new fieldbus technology, could be realized only if the already-laid-down twisted-pair cables (connecting the old 4- to 20-mA devices), which support only the slow-speed version, H1, were maintained. This explains why such a strong effort was put into the development of the H1 technology as opposed to the higher-speed H2 version of FOUNDATION Fieldbus. For H2, the Fieldbus Foundation initially planned to adopt the IEC/ISA high-speed standard [2], but ultimately decided to use High-Speed Ethernet (HSE) instead, mainly due to the wide availability of components and the existence of Ethernet networks in plants (at least at the backbone level).
9.2 Technical Description of FOUNDATION Fieldbus

FOUNDATION Fieldbus is an all-digital, serial, two-way communication system. Its specifications include two different configurations, H1 and HSE [1][10]. H1 (running at 31.25 kbit/s) interconnects field equipment such as sensors, actuators, and inputs/outputs (I/Os). HSE (running at 100 Mbit/s) provides integration of controllers (such as distributed control systems [DCSs] and programmable logic controllers [PLCs]), H1 subsystems (via a linking device), data servers, and workstations. HSE is based on standard Ethernet technology. In detail:
• The H1 FOUNDATION Fieldbus communication system is mainly devoted to distributed continuous process control and to the replacement of existing 4- to 20-mA devices. Its communication functionalities, specifically designed for time-critical applications, are supported by services grouped within layers, as in all other OSI-RM open-system architectures. The number of layers is kept to a minimum in order to guarantee maximum speed in data handling. Below the fieldbus application layer, H1 FOUNDATION Fieldbus directly presents the data link layer, managing access to the communication
FIGURE 9.1 H1 FOUNDATION Fieldbus vs. ISO/OSI architecture.
FIGURE 9.2 HSE FOUNDATION Fieldbus vs. ISO/OSI architecture.
channel. A physical layer deals with the problem of interfacing with the physical medium. A network and system management layer is also present. Figure 9.1 compares the H1 FOUNDATION Fieldbus architecture against the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) reference model.
• The HSE FOUNDATION Fieldbus defines an application layer and associated management functions, designed to operate over a standard Transmission Control Protocol (TCP)/User Datagram Protocol (UDP)/Internet Protocol (IP) stack, over twisted-pair or fiber-optic switched Ethernet. It is mainly intended for discrete manufacturing applications, but of course, it can also be used to interconnect H1 segments, as well as foreign protocols through TCP/IP gateways, in order to build complete plant networks. Figure 9.2 compares the HSE FOUNDATION Fieldbus architecture against the ISO/OSI reference model.
Looking at Figure 9.1 and Figure 9.2, it becomes evident that the Fieldbus Foundation has specified a user application layer, significantly differentiating its solution from the ISO/OSI model, which does not define such a layer. The H1 and HSE FOUNDATION Fieldbus user application layer is mainly based on function blocks, providing a consistent definition of inputs and outputs that allows seamless distribution and integration of functionality from various vendors [11].
9.2.1 H1 and HSE FOUNDATION Fieldbus User Application Layer

As mentioned above, the Fieldbus Foundation has defined a standard user application layer based on blocks. Blocks are representations of different types of application functions. The types of blocks used in a user application are resource, transducer, and function. Devices are configured by using resource blocks and transducer blocks. The control strategy, instead, is built by using function blocks [11].
TABLE 9.1 Basic Function Blocks

Function Block Name                 Symbol
Analog Input                        AI
Analog Output                       AO
Bias/Gain                           BG
Control Selector                    CS
Discrete Input                      DI
Discrete Output                     DO
Manual Loader                       ML
Proportional/Derivative             PD
Proportional/Integral/Derivative    PID
Ratio                               RA
FIGURE 9.3 Example of a complete control loop using function blocks in FOUNDATION Fieldbus devices (AI, PID, and AO blocks linked over the H1 fieldbus).
9.2.1.1 Resource Block
The resource block describes characteristics of the fieldbus device such as the device's name, manufacturer, and serial number. There is only one resource block in a device.

9.2.1.2 Function Block
Function blocks (FBs) provide the control system behavior. The input and output parameters of function blocks running in different devices can be linked over the fieldbus. The execution of each function block is precisely scheduled. There can be many function blocks in a single user application. The Fieldbus Foundation has defined sets of standard function blocks. Ten standard function blocks for basic control are defined in [12]; these blocks are summarized in Table 9.1. Other, more complex standard function blocks are defined in [13] and [14]. The flexible function block (FFB) is defined in [15]; it is a user-defined block that allows a manufacturer or user to define block parameters and algorithms to suit an application, while interoperating with standard function blocks and host systems. Function blocks can be built into fieldbus devices as required in order to achieve the desired device functionality. For example, a simple temperature transmitter may contain an analog input (AI) function block, while a control valve might contain a proportional/integral/derivative (PID) function block as well as the expected analog output (AO) block. Thus, a complete control loop can be built using only a simple transmitter and a control valve (Figure 9.3).

9.2.1.3 Transducer Blocks
Like the resource blocks, the transducer blocks are used to configure devices. Transducer blocks decouple function blocks from the local input/output functionalities required in order to read sensors or to command an actuator's output. They contain information such as calibration date and sensor type [16][17].
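As a rough illustration of how linked function blocks form a control loop like the one in Figure 9.3, the following Python sketch wires an AI, a PID, and an AO block together and executes them in sequence, as a macrocycle schedule would. The class names mirror the standard block symbols, but the PI-only control law, the parameter names, and the scheduling are illustrative assumptions, not the standardized FOUNDATION Fieldbus block definitions.

```python
# Hypothetical sketch of an AI -> PID -> AO loop built from linked function
# blocks (illustrative only; real blocks have standardized parameters and
# are scheduled by system management).

class AI:
    """Analog input block: reads a sensor and publishes the value."""
    def __init__(self, read_sensor):
        self.read_sensor = read_sensor
        self.out = 0.0
    def execute(self):
        self.out = self.read_sensor()

class PID:
    """Control block: computes an output from setpoint sp and process value."""
    def __init__(self, sp, kp, ki):
        self.sp, self.kp, self.ki = sp, kp, ki
        self.integral = 0.0
        self.out = 0.0
    def execute(self, pv):
        err = self.sp - pv
        self.integral += err
        self.out = self.kp * err + self.ki * self.integral

class AO:
    """Analog output block: drives the actuator with its input value."""
    def __init__(self, write_actuator):
        self.write_actuator = write_actuator
    def execute(self, value):
        self.write_actuator(value)

def run_macrocycle(ai, pid, ao):
    """One macrocycle iteration: block outputs are linked to block inputs."""
    ai.execute()
    pid.execute(ai.out)
    ao.execute(pid.out)
```

In a real system the links between blocks would cross the bus as publisher–subscriber exchanges, with system management scheduling each block's execution instant.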
9.2.2 H1 FOUNDATION Fieldbus

The H1 FOUNDATION Fieldbus is made up of different layers (as shown in Figure 9.1), whose functionalities are described in the following.

9.2.2.1 H1 FOUNDATION Fieldbus Physical Layer
The H1 FOUNDATION Fieldbus physical layer receives messages from the communication stack and converts them into physical signals on the fieldbus transmission medium, and vice versa [18]. Conversion tasks include adding and removing preambles, start delimiters, and end delimiters. The preamble is used by the receiver to synchronize its internal clock with the incoming fieldbus signal. The receiver uses the start delimiter to find the beginning of a fieldbus message; after finding the start delimiter, the receiver accepts data until the end delimiter is received. The physical layer is defined by approved standards issued by the IEC and ISA. In particular, the FOUNDATION Fieldbus H1 physical layer is the 31.25-kbaud version of the IEC (type 1)/ISA fieldbus [2][19]. Signals (±10 mA on a 50-ohm load) are encoded using the synchronous Manchester biphase-L technique and can be conveyed on low-cost twisted-pair cables. The signal is called synchronous serial because the clock information is embedded in the serial data stream: data are combined with the clock signal while creating the fieldbus signal. The receiver of the fieldbus signal interprets a positive transition in the middle of a bit time as a logical 0 and a negative transition as a logical 1. Special codes are defined for the preamble, start delimiter, and end delimiter. Special N+ and N– characters are used in the start delimiter and end delimiter; note that the N+ and N– signals do not have a transition in the middle of a bit time.

9.2.2.1.1 Fieldbus Signaling
The transmitting device delivers ±10 mA at 31.25 kbit/s into a 50-ohm equivalent load to create a 1.0-volt peak-to-peak voltage modulated on top of the direct current (DC) supply voltage.
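The Manchester biphase-L bit coding described above can be sketched in a few lines of Python. This is only an idealized model under stated assumptions: each bit is represented as two half-bit signal levels (-1 for low, +1 for high), ignoring the actual current modulation, the preamble, and the special N+/N– delimiter symbols.

```python
# Idealized sketch of Manchester biphase-L as described for H1: a positive
# (low-to-high) mid-bit transition encodes a logical 0, a negative
# (high-to-low) mid-bit transition encodes a logical 1.

def encode(bits):
    """Map each data bit to its two half-bit levels."""
    out = []
    for b in bits:
        out += [-1, +1] if b == 0 else [+1, -1]  # mid-bit rising = 0, falling = 1
    return out

def decode(levels):
    """Recover data bits from consecutive half-bit level pairs."""
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if (first, second) == (-1, +1):
            bits.append(0)
        elif (first, second) == (+1, -1):
            bits.append(1)
        else:
            # no mid-bit transition: would be a special symbol such as N+/N-
            raise ValueError("not a data bit")
    return bits

assert decode(encode([0, 1, 1, 0])) == [0, 1, 1, 0]
```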
The DC supply voltage can range from 9 to 32 volts. The 31.25 kbit/s fieldbus also supports intrinsically safe (I.S.) operation for bus-powered devices. To accomplish this, an I.S. barrier is placed between the power supply in the safe area and the I.S. device in the hazardous area. In I.S. applications, the allowed power supply voltage depends on the barrier rating.

9.2.2.1.2 Fieldbus Wiring
H1 FOUNDATION Fieldbus wiring is based on trunk cables with a terminator installed at each end. Stubs or spurs may be located anywhere along the trunk, connected to it by a junction box, as shown in Figure 9.4. A single device can be connected to each spur. Spurs allow 31.25 kbit/s devices to operate on wiring previously used for 4- to 20-mA devices [20][21]. Several trunks can be connected in a fieldbus link by means of repeaters; up to five trunks (using four repeaters) can be interconnected. Spur length varies from 1 m up to 120 m, depending on the number of devices connected to the fieldbus link. The maximum number of devices on a fieldbus link is 32; the actual number depends on factors such as the power consumption of each device, the type of cable, and the use of repeaters. In particular, the maximum number of devices is usually 6 for intrinsically safe applications with devices powered through the bus, 12 for non-intrinsically-safe applications with devices powered through the bus, and 32 for non-intrinsically-safe applications in which devices are not powered through the bus [22]. The maximum total length (trunk plus spurs) is 1900 m, and the number of network addresses available for each link is 240.

FIGURE 9.4 Trunk, junction box, and spurs in H1 FOUNDATION Fieldbus.

9.2.2.2 H1 FOUNDATION Fieldbus Data Link Layer
The H1 FOUNDATION Fieldbus data link layer (DLL) controls the transmission of messages onto the fieldbus. As mentioned already, FOUNDATION Fieldbus fully adopted the IEC (type 1)/ISA DLL statement: "Both paradigms, circulated token and scheduled access were good, but insufficient at the same time; they were complementary, not alternative, and a complete fieldbus solution needs the two together" [3][23][24]. Thus, a fieldbus needs a well-balanced mix of circulated token and scheduled access, avoiding the loss of bandwidth to scheduled access when it is not really needed, but always giving priority to scheduled access over the circulated token when a conflict arises. That is what the IEC (type 1)/ISA DLL documents propose and FOUNDATION Fieldbus made real: an overall schedule able to guarantee the needed data at the needed time, but also allowing gaps within which a circulated token mechanism can take place while complying with a defined maximum rotation time. Such a philosophy clearly needs an arbitrator that univocally imposes the transmission of defined data at a defined time by a defined entity, when so required, but also guarantees a defined minimum amount of free time to each entity. This arbitrator is called the link active scheduler (LAS) within the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL [3][23][24]. Essentially, the LAS performs:
• Access to the physical medium on a scheduled basis.
• Circulation of the token only when no scheduled traffic is needed. The token is passed for a limited amount of time that is always shorter than the interval left before the next scheduled traffic.
• A token management policy according to which the token is returned to the LAS instead of being passed on to a new node, so that the LAS, depending on the time left, can decide whether to pass the token once more or to resume link control to manage the scheduled traffic.
Specific needs of the token mechanism include:
• Giving enough contiguous token time to each node
• Guaranteeing a token rotation time that is as regular as possible among all the nodes
These needs are met by the token method of the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL.
Other needs include:
• Keeping the token cycle short enough
• Satisfying the occurrence of high-priority events
FIGURE 9.5 Centralized access mechanism (the LAS issues a CD; the addressed device publishes its DT).
These needs are satisfied by the scheduled access method of the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL.

9.2.2.2.1 Device Types
Two types of devices are defined in the DLL specification: basic device and link master. Link master devices are capable of becoming the link active scheduler (LAS); basic devices are not.

9.2.2.2.2 Scheduled Communication
The LAS manages centralized, scheduled access as follows. The LAS holds a list of transmission times for all data buffers in all devices that need to be transmitted cyclically. When it is time for a device to send the contents of a buffer, the LAS issues a compel data (CD) message to that device. Upon receipt of the CD, the device broadcasts, or publishes, the data item (DT) in the buffer to all devices on the fieldbus. Any device configured to receive the data is called a subscriber. Figure 9.5 shows this access mechanism. Scheduled data transfers are typically used for the regular, cyclic transfer of control loop data between devices on the fieldbus.

9.2.2.2.3 Unscheduled Communication
Each node is also granted autonomous use of the bus through a bandwidth distribution mechanism based on a circulating token. In the unused portions of the bandwidth (i.e., those not occupied by the transmission of CDs), the LAS sends a pass token (PT) to each node included in a particular list called the live list (described below). Each token is associated with a maximum utilization interval, during which the receiving node can use the available bandwidth to transmit what it needs. On the expiration of the time interval
FIGURE 9.6 Token passing mechanism (PT and RT frames exchanged between the LAS and the nodes).
or when the node completes its transmissions, the token is returned to the LAS by using another frame, called return token (RT). A target token rotation time (TTRT) defines the desired duration of each token rotation. The value assigned to this parameter is linked to the maximum admissible delay in the transmission of the asynchronous flow. Figure 9.6 shows the token circulation managed by the LAS.

9.2.2.2.4 Live List Maintenance
The list of all devices that are properly responding to the pass token (PT) is called the live list. New devices may be added to the fieldbus at any time. The LAS periodically sends probe node (PN) messages to all the addresses not yet present in the live list. If a new device appears at an address and receives the PN, it immediately returns a probe response (PR) message. When a device returns a PR, the LAS adds the device to the live list and confirms its addition by sending the device a node activation message. The LAS is required to probe at least one address after it has completed a cycle of sending PTs to all the devices in the live list. A device will remain in the live list as long as it responds properly to the PTs sent by the LAS; the LAS will remove a device from the live list if the device does not reply to its PT for three successive tries. Whenever a device is added to or removed from the live list, the LAS broadcasts the change to all devices; this allows each link master device to maintain a current copy of the live list, so as to be ready to become LAS if needed.

9.2.2.2.5 Data Link Time Synchronization
A DLL time synchronization mechanism is provided so that any node can ask the LAS for a scheduled action to be executed at a defined time, representing the same absolute instant for all the nodes.
FIGURE 9.7 Link active scheduler algorithm. If there is time to do something before the next scheduled CD, the LAS issues a PN, TD, or PT; otherwise, it sends idle messages while waiting until it is time to issue the CD. (CD = compel data; PN = probe node; TD = time distribution; PT = pass token.)
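The decision loop of Figure 9.7 can be mimicked with a short Python sketch. The data structures, time units, and method names here are illustrative assumptions; a real LAS works from the schedule configured by system management and also interleaves PN probing and TD broadcasts with the pass tokens.

```python
# Hypothetical sketch of the LAS loop: issue the scheduled CD when its time
# arrives; in the gaps, pass the token (PT) round-robin over the live list
# (a real LAS would also issue PN or TD frames in these gaps).

import heapq

class LAS:
    def __init__(self, cd_schedule, live_list):
        self.schedule = list(cd_schedule)      # [(due_time, device), ...]
        heapq.heapify(self.schedule)
        self.live_list = list(live_list)
        self.sent = []                         # trace of issued frames

    def step(self, now, gap_needed=1):
        """One decision of Figure 9.7 at time `now` (units are arbitrary)."""
        if self.schedule and self.schedule[0][0] - now < gap_needed:
            due, device = heapq.heappop(self.schedule)
            self.sent.append(("CD", device))   # scheduled traffic has priority
        else:
            target = self.live_list[0]
            self.live_list = self.live_list[1:] + [target]  # round robin
            self.sent.append(("PT", target))   # pass token in the gap
```

For example, with one CD due at time 5 and two live nodes, a step at time 0 passes the token, and a step at time 5 compels the scheduled data.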
The LAS periodically broadcasts a time distribution (TD) message on the fieldbus so that all devices have exactly the same data link time. This is important because scheduled communications on the fieldbus and scheduled function block executions in the user application layer are based on timing derived from these messages.

9.2.2.2.6 Link Active Scheduler Operation
The algorithm used by the LAS is shown in Figure 9.7.

9.2.2.2.7 LAS Redundancy
The IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL provides the possibility of having more than one potential LAS on each link, as well as backup procedures that are essential for fieldbus availability. In particular, as a fieldbus may have multiple link masters, if the current LAS fails, one of the link masters will become the LAS and the operation of the fieldbus will continue.

9.2.2.3 H1 FOUNDATION Fieldbus Application Layer
The H1 FOUNDATION Fieldbus application layer includes two sublayers: FAS and FMS [25][26].

9.2.2.3.1 Fieldbus Access Sublayer
The fieldbus access sublayer (FAS) uses both the scheduled and unscheduled features of the data link layer to provide services for the fieldbus message specification (FMS) [25]. The type of each FAS service is described by virtual communication relationships (VCRs). A VCR defines the kind of information (messages) exchanged between two applications. Features of a VCR may include the number of receivers (one or many) for each transmitter, the memory organization (queue or buffer) used to store the messages to be sent/received, and the DLL mechanism used to send the message (PT or CD). The types of VCR defined by the Fieldbus Foundation are:
• Client–server VCR type. The client–server VCR type is used for queued, unscheduled, user-initiated, one-to-one communication between devices on the fieldbus. Queued means that messages are sent and received in the order submitted for transmission, according to their priority, without overwriting previous messages.
When a device receives a pass token (PT) from the LAS, it may send a request message to another device on the fieldbus. The requester is called the client and the device that received the request is called the server. The server sends the response when it receives a PT from the LAS. The client–server VCR type is used for operator-initiated requests such as setpoint changes, access to and change of a tuning parameter, alarm acknowledgment, and device upload and download.
• Report distribution VCR type. The report distribution VCR type is used for queued, unscheduled, user-initiated, one-to-many communications. When a device holding an event or a trend report to send receives a PT from the LAS, it sends its message to a group address defined by its VCR. Devices that are configured to listen on that VCR will receive the report. The report distribution VCR type is normally used by fieldbus devices to send alarm notifications to the operator consoles.
• Publisher–subscriber VCR type. The publisher–subscriber VCR type is used for buffered, one-to-many communications. Buffered means that only the latest version of the data is maintained within the network: new data completely overwrite previous data. When a device receives a compel data (CD) message, it publishes (broadcasts) its message to all devices on the fieldbus. Devices that wish to receive the published message are called subscribers. CDs may be scheduled in the LAS, or they may be sent by subscribers on an unscheduled basis; an attribute of the VCR indicates which method is used. The publisher–subscriber VCR type is normally used by field devices for the cyclic, scheduled publishing of user application function block inputs and outputs.

9.2.2.3.2 Fieldbus Message Specification
Fieldbus message specification (FMS) services allow user applications to send messages to each other across the fieldbus by using a standard set of message formats. FMS describes the communication services, message formats, and protocol behavior needed to build messages for the user application [26]. Data communicated over the fieldbus are described by object descriptions. Object descriptions are collected together in a structure called an object dictionary (OD). Each object description is identified by its index in the OD. Index 0, called the object dictionary header, provides a description of the dictionary itself and defines the first index for the object descriptions of the user application.
The user application object descriptions can start at any index above 255. Index 255 and below define standard data types such as Boolean, integer, float, bit string, and data structures that are used to build all other object descriptions. A virtual field device (VFD) is used to remotely view local device data described in the object dictionary. A typical device will have at least two VFDs: the network and system management VFD and the user application VFD. The network and system management VFD provides access to the network management information base (NMIB) and to the system management information base (SMIB). NMIB data include VCRs, dynamic variables, statistics, and LAS schedules (if the device is a link master). SMIB data include device tag and address information and schedules for function block execution. The user application virtual field device is used to make the device functions (the function of a fieldbus device is defined by the selection and interconnection of blocks) visible to the fieldbus communication system. The header of the user application object dictionary points to a directory that is always the first entry in the function block application. The directory provides the starting indices of all of the other entries used in the function block application. The VFD object descriptions and their associated data are accessed remotely over the fieldbus network using virtual communication relationships. FMS communication services provide a standard way for user applications, such as function blocks, to communicate over the fieldbus. Specific FMS communication services are defined for each object type. Table 9.2 summarizes the communication services available. Detailed descriptions for each service are provided in [26]. All of the FMS services use the client–server VCR type except as noted (see notes a and b in Table 9.2). 
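As a hedged sketch of these ideas, the following Python fragment models a tiny object dictionary and index-based Read/Write services. The dictionary entries, field names, type indices, and function names are invented for illustration; real OD entries follow the FMS object description formats defined in [26].

```python
# Illustrative object dictionary (OD): index 0 is the OD header, indices at
# or below 255 describe standard data types, and user application object
# descriptions start above 255. All entries here are hypothetical.

od = {
    0:   {"kind": "header", "first_user_index": 256},  # describes the OD itself
    3:   {"kind": "type", "name": "Boolean"},
    10:  {"kind": "type", "name": "Float"},
    256: {"kind": "variable", "name": "PV", "type_index": 10, "value": 21.5},
}

def fms_read(od, index):
    """Sketch of an FMS Read service: fetch a variable's value by OD index."""
    obj = od[index]
    if obj["kind"] != "variable":
        raise ValueError("Read applies to variable objects")
    return obj["value"]

def fms_write(od, index, value):
    """Sketch of an FMS Write service: update a variable's value by OD index."""
    if od[index]["kind"] != "variable":
        raise ValueError("Write applies to variable objects")
    od[index]["value"] = value

fms_write(od, 256, 22.0)
assert fms_read(od, 256) == 22.0
```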
9.2.2.4 H1 FOUNDATION Fieldbus System Management
Within the H1 FOUNDATION Fieldbus specification, system management handles important system features [27] such as:
• Function block scheduling. Function blocks must often be executed at precisely defined intervals and in the proper sequence for correct control system operation. System management synchronizes execution of the function blocks to a common time clock shared by all devices. A macrocycle is a single iteration of a schedule within a device. Depending on the type of device, there is a LAS macrocycle and a device macrocycle. On the basis of the LAS macrocycle, the system management
TABLE 9.2 Set of Services in FMS

Management and environment services
  Initiate: Establish a communication
  Abort: Abort a communication
  Reject: Reject a nonvalid service
  Status: Give the status of a service
  Unsolicited status: Send a nonrequested status
  Identify: Read a device specification (vendor, type, and version)
Object dictionary (OD) services
  Get OD: Read an object dictionary (OD)
  Initiate put OD: Start loading an OD
  Put OD: Load an OD in a device
  Terminate put OD: Stop loading an OD
Variable access services
  Read: Read a variable
  Write: Update the value of a variable
  Information report(a): Send data
  Define variable list: Define a variable list
  Delete variable list: Delete a variable list
Event services
  Event notification(b): Notify an event
  Acknowledge event notification: Acknowledge an event
  Alter event condition monitoring: Enable or disable an event
Downloading/uploading services
  Request domain upload: Request a domain upload
  Initiate upload sequence: Initiate an upload
  Upload segment: Upload data
  Terminate upload sequence: End an upload
  Request domain download: Request a domain download
  Initiate download sequence: Initiate a download
  Download segment: Download data
  Terminate download sequence: End a download
  Generic initiate download sequence: Open a download
  Generic download segment: Send data to a device
  Generic terminate download sequence: Stop a download
Program handling services
  Create program invocation: Create a program object
  Delete program invocation: Delete a program object
  Start: Start a program object
  Stop: Stop a program object
  Resume: Resume execution of a program
  Reset: Reset a program
  Kill: Kill a program

(a) Service can only use the publisher–subscriber or report distribution VCR type.
(b) Service can only use the report distribution VCR type.
can synchronize execution of the function blocks across the entire fieldbus link. On the basis of the device macrocycle, the system management can instead synchronize the execution of function blocks inside each device.
• Application clock distribution. This function allows publication of the time of day to all devices, including automatic switchover to a redundant time publisher. The application clock is usually set equal to the local time of day or to universal coordinated time. System management has a time publisher that periodically sends an application clock synchronization message to all fieldbus devices. The data link scheduling time is sampled and sent with the application clock message so that the receiving devices can adjust their local application times. During the intervals between synchronization messages, application clock time is independently maintained within each device, relying on its own internal clock.
• Device address assignment. Fieldbus devices do not use jumpers or switches to configure addresses. Every fieldbus device must have a unique network address and physical device tag for the fieldbus to operate properly. To avoid the need for address switches on the devices, assignment
of network addresses can be performed by configuration tools using system management services. The sequence for assigning a network address to a new device is as follows:
• An unconfigured device joins the network at one of four special temporary default addresses.
• A configuration tool assigns a physical device tag to the new device using system management services.
• A configuration tool chooses an unused permanent address and assigns it to the device using system management services.
• The sequence is repeated for all devices that enter the network at a default address.
• Devices store the physical device tag and node address in nonvolatile memory, so they retain these settings after a power failure.
• Find tag service. For the convenience of host systems and portable maintenance devices, system management supports a service for finding devices or variables by a tag search. The "find tag query" message is broadcast to all fieldbus devices. Upon receipt of the message, each device searches its virtual field devices for the requested tag and, if the tag is found, returns complete path information, including the network address, VFD number, VCR index, and OD index. Once the path is known, the host or maintenance device can access the data by its tag.
All of the configuration information needed by system management, such as the function block schedule, is described by object descriptions in the network and system management VFD. This VFD provides access to the system management information base and also to the network management information base.

9.2.2.5 H1 FOUNDATION Fieldbus Network Management
H1 FOUNDATION Fieldbus network management mainly provides for the configuration of the communication stack [28].
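The address assignment sequence described above can be sketched as follows. The default address values, the range of permanent addresses, and the helper names are illustrative assumptions, not values taken from the specification.

```python
# Hypothetical sketch of system management address assignment: a new device
# joins at a temporary default address, then a configuration tool assigns
# its physical device tag and an unused permanent address (both retained
# in nonvolatile memory). Address values below are invented for illustration.

DEFAULT_ADDRESSES = [248, 249, 250, 251]     # four temporary default addresses

class Device:
    def __init__(self):
        self.address = DEFAULT_ADDRESSES[0]  # unconfigured device joins here
        self.tag = None

def assign(device, tag, used_addresses):
    # 1. the configuration tool assigns the physical device tag
    device.tag = tag
    # 2. the tool picks an unused permanent address (sketch: lowest free one)
    permanent = next(a for a in range(20, 248) if a not in used_addresses)
    device.address = permanent
    used_addresses.add(permanent)
    return device

d = assign(Device(), "FT-101", used_addresses={20, 21})
assert d.tag == "FT-101" and d.address == 22
```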
9.2.3 HSE FOUNDATION Fieldbus
HSE FOUNDATION Fieldbus adopts the architecture depicted in Figure 9.2. As shown, its main feature is the use of Internet architecture (full TCP/UDP/IP and IEEE 802.3u stack [29][30][31]) for high-speed discrete control and, more generally, for interconnecting several H1 segments in order to achieve a plantwide fieldbus network [32]. Before describing the HSE FOUNDATION Fieldbus specifications, a brief overview of its general features will be given, with particular emphasis on the capability to interconnect different H1 FOUNDATION Fieldbus segments. There are four basic HSE device categories (though several of them are typically combined into a single real device): linking device, Ethernet device, host device, and gateway device. A linking device (LD) connects H1 networks to the HSE network. An Ethernet device (ED) may execute function blocks and may have some conventional I/Os. A gateway device (GD) interfaces other network protocols such as Modbus [33], DeviceNet [34], or Profibus [9]. A host device (HD) is a non-HSE device capable of communicating with HSE devices. Examples include configurators, operator workstations, and OPC servers. The network in Figure 9.8 shows a host system operating on an HSE bus segment labeled Segment A. Communications to H1 segments (B and C, as shown in the figure) are achieved by means of an Ethernet switch. The same switch is used to connect a second HSE segment (D) and a segment running a foreign protocol (E). Any of the devices connected to the switch may attempt communication with any other device, and it is the function of the switch to provide the correct routing and to negotiate transmission without collisions. The connection between HSE and H1 segments is made by a linking device (LD). A typical LD will serve multiple H1 segments, though for simplicity, only one segment per LD is shown in Figure 9.8.
The connection between HSE and a foreign protocol is made through a gateway device (GD). The capabilities of the interconnections shown in Figure 9.8 are as follows:
FOUNDATION Fieldbus: History and Features
FIGURE 9.8 H1 and HSE FOUNDATION Fieldbus interconnection: a host system on HSE Segment A, H1 Segments B and C attached through LDs, a second HSE Segment D, and Foreign Protocol Segment E attached through a GD, all connected via an Ethernet switch.
• HSE host/H1 segment. The HSE host interacts with a standard H1 device through an LD. In this situation, the HSE host is able to configure, diagnose, and publish and subscribe data to and from the H1 device.
• HSE host/HSE segment. The HSE host interacts with an HSE device and is able to configure, diagnose, and publish and subscribe data to and from the HSE device.
• H1/H1 segment. In this situation, the interaction is between two H1 devices on two distinct H1 bus segments. The segments are connected to the Ethernet by LDs. Communications between devices on two H1 segments are functionally equivalent to communications between two H1 devices on the same bus segment. Real-time communication between devices belonging to different H1 segments cannot be guaranteed, however, because there is no single scheduler coordinating communication across the H1 segments.
• HSE host/foreign protocol. This connection defines the relationship between a foreign device and the FOUNDATION Fieldbus application environment. The connection is made by a GD. The foreign device is seen as a publisher to an HSE resident subscriber; the HSE host can handle the data stream from the I/O gateway in the same manner as it treats the data streams from devices on H1/HSE segments.
9.2.3.1 HSE FOUNDATION Fieldbus Physical, Data Link, Network, and Transport Layers
As explained before, a higher-speed physical layer specification was always intended for selected process applications and for factory (discrete parts) automation. The original high-speed solution, called H2, was based on the H1 protocol and function block application running on different media at either 1 or 2.5 Mbit/s. In March 1998, the Foundation board of directors reconsidered the high-speed solution options and terminated further work on H2. The new approach was based on Ethernet and was intended to make use, as much as possible, of commercial off-the-shelf (COTS) components and software.
The new solution, high-speed Ethernet, is designed to integrate multiple protocols, including multiple H1 FOUNDATION Fieldbus segments as well as foreign protocols, as described above. For its high-speed physical layer version, the Fieldbus Foundation has selected high-speed Ethernet at 100 Mbit/s. The specifications for the physical layer, as well as for the Ethernet data link layer, are maintained by the Institute of Electrical and Electronics Engineers (IEEE) [30][31].
HSE also makes use of well-established Internet protocols that are maintained by the Internet Architecture Board. These include TCP (Transmission Control Protocol), UDP (User Datagram Protocol), and IP (Internet Protocol) [29]. Standard HSE stack components are the Dynamic Host Configuration Protocol (DHCP, which assigns addresses), Simple Network Time Protocol (SNTP), and Simple Network Management Protocol (SNMP), which rely on TCP and UDP over IP and the IEEE 802.3 MAC and physical layers. This has resulted in a practically unlimited number of nodes (IP addressing) over star topology networks made of as many links as required, the length of which can be up to 100 m for twisted pair and 2000 m on fiber. Messages sent on the Ethernet are enclosed by a series of data fields called a frame. The combination of a message and frame is called an Ethernet packet. Typically, a packet encoded according to TCP/IP will be inserted in the message field of the Ethernet packet. FOUNDATION Fieldbus uses a similar data structure, where messages are enclosed by addressing and other data items. What corresponds to a packet in Ethernet is called a protocol data unit (PDU) in FOUNDATION Fieldbus. Let us consider a communication between two H1 devices over an interposed HSE segment, as illustrated in Figure 9.8. The easiest method for an LD, upon receiving a communication from an H1 device, would be to simply insert the entire H1 PDU into the message part of the TCP/IP packet. The LD on the destination H1 segment, upon receiving the Ethernet packet, would merely strip away the Ethernet frame and send the H1 PDU on to the receiving H1 bus segment. This technique is called tunneling and is commonly used in mixed-protocol networks. The solution developed by HSE FOUNDATION Fieldbus is somewhat more complex, but more efficient than tunneling. The HSE FOUNDATION Fieldbus PDU is inserted into the data field of a TCP/IP message.
However, the fieldbus address is encoded as a unique TCP/IP address, so the fieldbus PDU address is used to fill the address field of the TCP/IP packet. The entire TCP/IP packet is then inserted into the message field of the Ethernet packet. Because of the HSE encoding scheme, networks having multiple LDs can locate and transfer messages to the correct destination much more quickly, and with far less extraneous bus traffic, than with tunneling. Perhaps even more important, every H1 device (and every HSE device, for that matter) has a unique TCP/IP address and can be directly accessed over standard IT and Internet networks.
9.2.3.2 HSE FOUNDATION Fieldbus Application Layer
Existing fieldbus specifications that had been widely tested in H1 applications and were already maintained by the Fieldbus Foundation have been reused in the HSE standard, where applicable. These include the fieldbus message specification (FMS) and system management (SM). New specifications were developed and tested to provide complete high-speed communications and control solutions. The new technology is based on the field device access agent (FDA agent) [35]. The FDA agent allows SM and FMS services used by the H1 devices to be conveyed over the Ethernet using standard UDP and TCP. This allows HSE devices to communicate with H1 devices that are connected via a linking device. The FDA agent is also used by the local function blocks in an HSE device. Thus, the FDA agent enables remote applications to access HSE devices and H1 devices through a common interface.
9.2.3.3 HSE FOUNDATION Fieldbus System Management
The following aspects of management are supported in the HSE system management layer [36]:
• Each device has a unique and permanent identity and a system-specific configured name.
• Devices maintain version control information.
• Devices respond to requests aiming to locate objects, including the device itself.
• Time is distributed to all devices on the network.
• Function block schedules are used to execute function blocks.
• Devices can be added to and removed from the network without affecting other devices on the network.
9.2.3.4 HSE FOUNDATION Fieldbus Network Management
HSE network management allows HSE host systems to perform management operations over the HSE network [37]. The following capabilities are provided by the network management:
• Configuring the H1 bridge, which performs data forwarding and republishing between H1 interfaces.
• Loading the HSE session list or single entries in this list. An HSE session endpoint represents a logical communication channel between two or more HSE devices.
• Loading the HSE VCR list or single entries in this list. An HSE VCR is a communication relationship used for accessing VFDs across HSE.
• Performance monitoring for session endpoints, HSE VCRs, and the H1 bridge.
• Fault detection monitoring.
9.2.3.5 HSE FOUNDATION Fieldbus Redundancy
The HSE FOUNDATION Fieldbus specification provides for management of redundant network interfaces. This capability protects against single and multiple faults in the network. Each device monitors the network and selects the best route to the destination for each message it has to send. HSE provides for various levels of redundancy, up to and including complete device and media redundancy. HSE fault tolerance is achieved by operational transparency; i.e., the redundancy operations are not visible to the HSE applications. This is necessary because HSE applications are required to coexist with standard information technology applications. The HSE local area network (LAN) redundancy entity (LRE) coordinates the redundancy function. Each HSE device periodically transmits on both of its Ethernet interfaces a diagnostic message (representing its view of the network) to the other HSE devices. Each device uses the diagnostic messages to maintain a network status table (NST), which is used for fault detection and transmission port selection.
There is no central redundancy manager. Instead, each device determines its behavior in response to the faults it detects [38].
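The decentralized scheme described above, in which each device builds its own network status table from received diagnostic messages and picks a transmission port per destination, can be illustrated as follows. The class and field names are assumptions for the sketch, not the FF-593 data structures:

```python
# Sketch of the decentralized redundancy scheme: each device keeps a network
# status table (NST) built from the diagnostic messages it receives and picks
# a port per destination. Names and table layout are assumptions, not the
# FF-593 data structures.

class HseDevice:
    PORTS = ("A", "B")  # the device's two Ethernet interfaces

    def __init__(self, name):
        self.name = name
        self.nst = {}   # peer name -> {port: reachable?}

    def diagnostic_message(self, reachable_on):
        # Periodically broadcast on both ports: this device's view of the network.
        return {"sender": self.name, "ports": dict(reachable_on)}

    def receive_diagnosis(self, msg):
        # Fold a peer's diagnostic message into the local NST.
        self.nst[msg["sender"]] = msg["ports"]

    def select_port(self, destination):
        # Pick the first port the NST reports healthy (unknown peers default to A).
        status = self.nst.get(destination, {})
        for port in self.PORTS:
            if status.get(port, True):
                return port
        raise ConnectionError(f"no healthy path to {destination}")
```

Note that no central manager appears anywhere in the sketch: each device decides locally from the messages it has seen.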
9.2.4 Open Systems Implementation
One of the main features of FOUNDATION Fieldbus (in both H1 and HSE configurations) is the ability to build open communication systems. This is a key issue in a communication system, as even a perfect communication stack between two devices is of little use if the two devices cannot understand the meaning of each other's data or behavior. Implementation of open systems is achieved through the use of function blocks and the adoption of a standard way to represent them: the device description language (DDL). FOUNDATION Fieldbus has defined a set of standard function blocks that can be combined and parameterized to build up a device [11]. Thanks to the standard format, behavior, and connection of such function blocks, they can be accessed and used through the bus immediately, achieving interoperability and interchangeability. Further, a manufacturer can improve and innovate on an existing function block, creating a new standard function block. The mechanism adopted by FOUNDATION Fieldbus to realize this is the DDL, which provides a formal device description (DD) that can then be interpreted by the DD services library available through FOUNDATION Fieldbus [39][40][41]. Such a DD acts as a driver for each specific device and is supplied together with the device itself. Within each DD, and for each function block included in the device, a hierarchy of definitions is followed: (1) the universal parameters of the device itself, (2) the common parameters of each function block, (3) the common parameters of the transducer blocks, and (4) the parameters specific to the manufacturer. A DD may also include small programs for interacting with the device (e.g., for its calibration), as well as download capability for managing manufacturer upgrades.
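The four-level hierarchy of definitions can be pictured as layered parameter tables searched from the most specific to the most general level. The layer and parameter names below are invented for the illustration:

```python
# Toy illustration of the four-level DD definition hierarchy, modeled as
# layered parameter tables searched from most specific to most general.
# All parameter names and values are invented for the example.

DD_LAYERS = [
    ("manufacturer_specific", {"SENSOR_LINEARIZATION": "table_7"}),
    ("transducer_block_common", {"CAL_POINT_HI": 100.0, "CAL_POINT_LO": 0.0}),
    ("function_block_common", {"MODE_BLK": "AUTO"}),
    ("universal", {"TAG_DESC": "", "STRATEGY": 0}),
]

def resolve(parameter):
    """Return (layer, value) for a parameter, trying specific layers first."""
    for layer, params in DD_LAYERS:
        if parameter in params:
            return layer, params[parameter]
    raise KeyError(parameter)
```

A host reading a DD would resolve each parameter against this layering, with manufacturer-specific definitions refining the common ones.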
9.3 Conclusions
The Fieldbus Foundation so far appears to be the only multisupplier organization to have achieved concrete results in proposing a large-scale fieldbus solution merging the initial FIP/Profibus proposals. That is mainly due to features providing true device interoperability and to its combination of guaranteed scheduling and token rotation.
References
[1] www.fieldbus.org.
[2] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Part 2: Physical Layer Specification, 2001.
[3] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Parts 3 and 4: Data Link Layer Service and Protocol Definition, 2001.
[4] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Parts 5 and 6: Application Layer Service and Protocol Definition, 2001.
[5] IEC 61784, Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems, 2001.
[6] J.D. Decotignie, P. Pleinevaux, Time critical communication networks: field buses, IEEE Network, 2.
[7] J.D. Decotignie, P. Pleinevaux, A survey on industrial communication networks, Annales des Telecommunications, 48, 9–10.
[8] www.worldfip.org.
[9] www.profibus.org.
[10] CENELEC EN50170/A1, General Purpose Field Communication System, Addendum A1, Foundation Fieldbus, 2000.
[11] Fieldbus Foundation FF-890, Function Block Application Process: Part 1.
[12] Fieldbus Foundation FF-891, Function Block Application Process: Part 2.
[13] Fieldbus Foundation FF-892, Function Block Application Process: Part 3.
[14] Fieldbus Foundation FF-893, Function Block Application Process: Part 4.
[15] Fieldbus Foundation FF-894, Function Block Application Process: Part 5.
[16] Fieldbus Foundation FF-902, Transducer Block Application Process: Part 1.
[17] Fieldbus Foundation FF-903, Transducer Block Application Process: Part 2.
[18] Fieldbus Foundation FF-816, 31.25 kbit/s Physical Layer Profile Specification.
[19] ISA S50.02, Physical Layer Standard, 1992.
[20] Fieldbus Foundation AG-140, 31.25 kbit/s Wiring and Installation Guide.
[21] Fieldbus Foundation AG-165, Fieldbus Installation and Planning Guide.
[22] Fieldbus Foundation AG-163, 31.25 kbit/s Intrinsically Safe Systems Application Guide.
[23] Fieldbus Foundation FF-821, Data Link Services Subset.
[24] Fieldbus Foundation FF-822, Data Link Protocol Subset.
[25] Fieldbus Foundation FF-875, Fieldbus Access Sublayer.
[26] Fieldbus Foundation FF-870, Fieldbus Message Specification.
[27] Fieldbus Foundation FF-800, System Management Specification.
[28] Fieldbus Foundation FF-801, Network Management.
[29] Douglas E. Comer, Internetworking with TCP/IP, Vol. I, Principles, Protocols, and Architecture, Prentice Hall International, Englewood Cliffs, NJ, 1999.
[30] ANSI/IEEE 802.3, IEEE Standards for Local Area Networks: CSMA/CD Access Method and Physical Layer Specifications, 1985.
[31] ANSI/IEEE 802.3u, IEEE Standards for Local Area Networks: Supplement to CSMA/CD Access Method and Physical Layer Specifications: MAC Parameters, Physical Layer, MAUs, and Repeater for 100 Mb/s Operation, Type 100BASE-T, 1995.
[32] Fieldbus Foundation FF-581, System Architecture.
[33] www.modbus.org.
[34] CENELEC EN 50325-2, DeviceNet.
[35] Fieldbus Foundation FF-588, HSE Field Device Access Agent.
[36] Fieldbus Foundation FF-589, HSE System Management.
[37] Fieldbus Foundation FF-803, HSE Network Management.
[38] Fieldbus Foundation FF-593, HSE Redundancy.
[39] Fieldbus Foundation FD-900, Device Description Language Specification.
[40] Fieldbus Foundation FD-110, DDS User's Guide.
[41] Fieldbus Foundation FD-100, DDL Tokenizer User's Manual.
10 PROFIBUS: Open Solutions for the World of Automation
Ulrich Jecht, UJ Process Analytics
Wolfgang Stripf, Siemens AG
Peter Wenzel, PROFIBUS International
10.1 Basics
10.2 Transmission Technologies
10.3 Communication Protocol
PROFIBUS DP • System Configuration and Device Types • Cyclic and Acyclic Data Communication Protocols
10.4 Application Profiles
General Application Profiles • Specific Application Profiles • Master and System Profiles
10.5 Integration Technologies
10.6 Quality Assurance
10.7 Implementation
10.8 Prospects
PROFINET CBA • PROFINET IO • The PROFINET Migration Model
Abbreviations
References
10.1 Basics
Fieldbuses are industrial communication systems with bit-serial transmission that use a range of media such as copper cable, fiber optics, or radio transmission to connect distributed field devices (sensors, actuators, drives, transducers, analyzers, etc.) to a central control or management system. Fieldbus technology was developed in the 1980s with the aim of saving cabling costs by replacing the commonly used central parallel wiring and dominant analog signal transmission (4- to 20-mA or ±10-V interface) with digital technology. Due to differing industry-specific demands, to sponsored research and development projects, and to the preferred proprietary solutions of large system manufacturers, several bus systems with varying principles and properties became established in the market. The key technologies are now included in the recently adopted standards IEC 61158 and IEC 61784 [1]. PROFIBUS is an integral part of these standards. Fieldbuses create the basic prerequisite for distributed automation systems. Over the years they have evolved into instruments for automated processes offering high productivity and flexibility compared to conventional technology. PROFIBUS is an open, digital communication system with a wide range of applications, particularly in the fields of factory and process automation, transportation, and power distribution. PROFIBUS is suitable for both fast, time-critical applications and complex communication tasks (Figure 10.1).
FIGURE 10.1 PROFIBUS suitable for all decentralized applications: the upstream (inbound logistics), mainstream (production), and downstream (outbound logistics) areas of automation technology are all served by PROFIBUS.
The application and engineering aspects are specified in the generally available guidelines of PROFIBUS International [2]. This fulfills user demand for standardization, manufacturer independence, and openness and ensures communication between devices of various manufacturers. Based on a very efficient and extensible communication protocol, combined with the development of numerous application profiles (communication models for device type families) and a fast-growing number of devices and systems, PROFIBUS began its record of success, initially in factory automation and, since 1995, in process automation. Today, PROFIBUS is the world market leader among fieldbuses, with more than a 20% share of the market, approximately 500,000 equipped plants, and more than 12 million nodes. More than 2000 PROFIBUS products are available from a wide range of manufacturers. The success of PROFIBUS stems in equal measure from its progressive technology and from the strength of its noncommercial PROFIBUS User Organization e.V. (PNO), the trade body of manufacturers and users founded in 1989. Together with 25 other regional PROFIBUS associations in countries all around the world and the international umbrella organization PROFIBUS International (PI), founded in 1995, this organization now totals more than 1200 members worldwide. Its objectives are the continued development of PROFIBUS technology and increased worldwide acceptance. PROFIBUS has a modular structure (the PROFIBUS toolbox) and offers a range of transmission and communication technologies, numerous application and system profiles, and device management and integration tools [8]. Thus, PROFIBUS covers the various application-specific demands from the field of factory to process automation, from simple to complex applications, by selecting the adequate set of components out of the toolbox (Figure 10.2).
10.2 Transmission Technologies
PROFIBUS features four different transmission technologies, all of which are based on international standards and all of which are assigned to PROFIBUS in both IEC 61158 and IEC 61784: RS485, RS485-IS, MBP-IS (IS stands for intrinsic safety protection), and fiber optics. RS485 transmission technology is simple and cost-effective and is primarily used for tasks that require high transmission rates. Shielded, twisted-pair copper cable with one conductor pair is used. No expert knowledge is required for installation of the cable. The bus structure allows addition or removal of stations or the step-by-step commissioning of the system without interfering with other stations. Subsequent expansions (within defined limits) have no effect on stations already in operation.
FIGURE 10.2 Structure of PROFIBUS system technology: transmission technologies (RS485 NRZ; RS485-IS intrinsic safety; fiber optics in glass multimode, glass single mode, and PCF/plastic fiber; MBP Manchester bus powered, MBP-LP low power, MBP-IS intrinsic safety); the communication protocol PROFIBUS DP (DP-V0 to DP-V2) per IEC 61158/61784; common application profiles (optional: I&M functions, PROFIsafe, time stamp, redundancy, etc.); specific application profiles (PROFIdrive, PA Devices, RIO for PA, Encoder, Ident, SEMI, Weighing & Dosage, etc.); integration technologies (descriptions: GSD, EDD; tools: DTMs, configurators); and system profiles 1…x (master conformance classes, interfaces such as Comm-FB and FDT, constraints).
Various transmission rates can be selected, from 9.6 Kbit/s up to 12 Mbit/s. One uniform speed is selected for all devices on the bus when commissioning the system. Up to 32 stations (masters or slaves) can be connected in a single segment. For connecting more than 32 stations, repeaters can be used. The maximum permissible line length depends on the transmission rate. Different cable types (type designations A to D) for different applications are available on the market for connecting devices either to each other or to network elements (segment couplers, links, and repeaters). When using RS485 transmission technology, PI recommends the use of cable type A. RS485-IS transmission technology responds to an increasing market demand to support the use of RS485 with its fast transmission rates within intrinsically safe areas. A PROFIBUS guideline is available for the configuration of intrinsically safe RS485 solutions with simple device interchangeability. The interface specification details the levels for current and voltage that must be adhered to by all stations in order to ensure safe operation during interconnection. An electric circuit limits currents at a specified voltage level. When connecting active sources, the sum of the currents of all stations must not exceed the maximum permissible current. In contrast to the FISCO model (see below), all stations represent active sources. Up to 32 stations may be connected to the intrinsically safe bus circuit. MBP transmission technology (Manchester coding and bus powered) is a new term that replaces the previously common designations for intrinsically safe transmission, such as "physics in accordance with IEC 61158-2," "1158-2," etc. In the meantime, the current version of IEC 61158-2 (physical layer) describes several different transmission technologies, MBP being just one of them; thus, differentiation in naming was necessary.
MBP is a synchronous, Manchester-coded transmission with a defined transmission rate of 31.25 Kbit/s. In the MBP-IS version, it is frequently used in process automation, as it satisfies the key demands of the chemical and petrochemical industries for intrinsic safety and bus powering using two-wire technology. MBP transmission technology is usually limited to a specific segment (field devices in hazardous areas) of a plant, which is then linked to an RS485 segment via a segment coupler or links (Figure 10.3). Segment couplers are signal converters that modulate the RS485 signals to the MBP signal level and vice versa; they are transparent from the bus protocol's point of view. In contrast, links provide more computing power: they map all the field devices connected to the MBP segment into the RS485 segment as a single slave. Tree and line structures (and any combination of the two) are the network topologies supported by PROFIBUS with MBP transmission, with up to 32 stations per segment and a maximum of 126 per network.
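As an illustration of the RS485 segment rules above (rate-dependent line length, at most 32 stations per segment), the following checker uses length limits commonly published for type A cable; treat the figures as examples and verify them against the PROFIBUS installation guideline:

```python
# Simple RS485 segment checker. The length limits are commonly published
# figures for type A cable; verify them against the PROFIBUS installation
# guideline before relying on them.

MAX_SEGMENT_LENGTH_M = {   # transmission rate (Kbit/s) -> max segment length (m)
    9.6: 1200, 19.2: 1200, 93.75: 1200,
    187.5: 1000, 500: 400, 1500: 200, 12000: 100,
}
MAX_STATIONS_PER_SEGMENT = 32

def check_segment(rate_kbits, length_m, stations):
    """Return a list of rule violations for one RS485 segment (empty if OK)."""
    problems = []
    limit = MAX_SEGMENT_LENGTH_M.get(rate_kbits)
    if limit is None:
        problems.append(f"{rate_kbits} Kbit/s is not a rate in this table")
    elif length_m > limit:
        problems.append(f"{length_m} m exceeds {limit} m allowed at {rate_kbits} Kbit/s")
    if stations > MAX_STATIONS_PER_SEGMENT:
        problems.append("more than 32 stations on one segment: insert a repeater")
    return problems
```

For example, a 300 m segment at 1.5 Mbit/s with 40 stations violates both the length and the station-count rule, so a planner would split it with repeaters or lower the rate.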
FIGURE 10.3 Intrinsic safety and powering of field devices using MBP-IS: a control system (PLC) and an engineering or HMI tool on a PROFIBUS DP/RS485 segment (up to 12 Mbit/s) are linked through a segment coupler/link to a 31.25 Kbit/s PROFIBUS DP/MBP-IS segment serving an actuator and a transducer in the hazardous area.
Fiber-optic transmission technology is used for fieldbus applications in environments with very high electromagnetic interference or that are spread over a large area or distance. The PROFIBUS guideline for fiber-optic transmission [3] specifies the technology available for this purpose, including multimode and single-mode glass fiber, plastic fiber, and hard-clad silica (HCS) fiber. While developing these specifications, great care was taken to allow problem-free integration of existing PROFIBUS devices in a fiber-optic network without the need to change the protocol behavior of PROFIBUS. This ensures backward compatibility with existing PROFIBUS installations. The internationally recognized FISCO model considerably simplifies the planning, installation, and expansion of PROFIBUS networks in potentially explosive areas. FISCO stands for fieldbus intrinsically safe concept. It was developed by the German PTB [4]. The model is based on the specification that a network is intrinsically safe and requires no individual intrinsic safety calculations when the relevant four bus components (field devices, cables, segment couplers, and bus terminators) fall within predefined limits with regard to voltage, current, power, inductance, and capacitance. The corresponding proof can be provided by certification of the components through authorized accreditation agencies, such as PTB (Germany), UL and FM (U.S.), and others. If FISCO-approved devices are used, not only is it possible to operate more devices on a single line, but the devices can also be replaced during runtime by devices of other manufacturers, or the line can be expanded, all without the need for time-consuming calculations or system certification. So you can simply plug and play, even in hazardous areas.
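The FISCO idea, per-component certification against fixed limits instead of network-wide calculations, can be sketched as below. The limit values are placeholders for the illustration, not the figures from the FISCO specification:

```python
# Sketch of the FISCO principle: a component is acceptable when each of its
# certified parameters stays within predefined limits, so no network-level
# intrinsic-safety calculation is needed. The limit values below are
# placeholders, not the figures from the FISCO specification.

FISCO_LIMITS = {        # parameter -> maximum permitted value (placeholder numbers)
    "voltage_V": 17.5,
    "current_mA": 380.0,
    "power_W": 5.32,
}

def fisco_compliant(component):
    """True if every limited parameter of the component is within bounds."""
    return all(component.get(name, 0.0) <= limit
               for name, limit in FISCO_LIMITS.items())
```

With every component certified this way, devices can be swapped or added at runtime without recalculating the network, which is exactly the plug-and-play property described above.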
10.3 Communication Protocol
10.3.1 PROFIBUS DP
At the protocol level, PROFIBUS with decentralized peripherals (DP) and its versions DP-V0 to DP-V2 offer a broad spectrum of optional services, which enable optimum communication between different applications. DP has been designed for fast data exchange at the field level. Data exchange with the distributed devices is primarily cyclic. The communication functions required for this are specified through the DP basic functions (version DP-V0). Geared toward the special demands of the various areas of application, these basic DP functions have been expanded step by step with special functions, so that DP is now available
in three versions, DP-V0, DP-V1, and DP-V2, each with its own key features. All versions of DP are specified in detail in IEC 61158 and IEC 61784, respectively. Version DP-V0 provides the basic functionality of DP, including cyclic data exchange as well as station diagnosis, module diagnosis, and channel-specific diagnosis. Version DP-V1 contains enhancements geared toward process automation, in particular acyclic data communication for parameter assignment, operation, visualization, and alarm handling of intelligent field devices, in coexistence with cyclic user data communication. This permits online access to stations using engineering tools. In addition, DP-V1 defines alarms; examples of alarm types are the status alarm, the update alarm, and manufacturer-specific alarms. Version DP-V2 contains further enhancements and is geared primarily toward the demands of drive technology. Thanks to additional functionalities such as isochronous slave mode and slave-to-slave communication (data exchange broadcast (DXB)), DP-V2 can also be implemented as a drive bus for controlling fast movement sequences in drive axes.
10.3.2 System Configuration and Device Types
DP supports implementation of both monomaster and multimaster systems. This affords a high degree of flexibility during system configuration. A maximum of 126 devices (masters or slaves) can be connected to one bus network. In monomaster systems, only one master is active on the bus during operation of the bus system. Figure 10.4 shows the system configuration of a monomaster system. In this case, the master is hosted by a programmable logic controller (PLC). The PLC is the central control component. The slaves are connected to the PLC via the transmission medium. This system configuration enables the shortest bus cycle times. In multimaster systems, several masters share the same bus. They form independent subsystems, each comprising a master and its assigned slaves, possibly complemented by additional configuration and diagnostic master devices. The masters coordinate themselves by passing a token from one to the next; only the master that holds the token can communicate. PROFIBUS DP differentiates three groups of device types on the bus. DP master class 1 (DPM1) is a central controller that cyclically exchanges information with the distributed stations (slaves) at a specified message cycle. Typical DPM1 devices are PLCs or PCs. A DPM1 has active bus access with which it can read measurement data (inputs) of the field devices and write the set-point values (outputs) of the actuators at fixed times. This continuously repeating cycle is the basis of the automation function (Figure 10.4).
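The token-passing coordination among masters can be sketched as follows; this illustrates the principle only, not the actual FDL token-passing protocol:

```python
# Schematic multimaster coordination: the token circulates in a logical ring
# and only the holder may initiate communication. This illustrates the
# principle only, not the actual FDL token-passing protocol.

from collections import deque

class TokenRing:
    def __init__(self, masters):
        self.ring = deque(masters)  # logical ring of master addresses

    @property
    def holder(self):
        return self.ring[0]  # the master currently holding the token

    def may_transmit(self, master):
        # Only the token holder is allowed to communicate.
        return master == self.holder

    def pass_token(self):
        # Hand the token to the next master in the ring.
        self.ring.rotate(-1)
        return self.holder
```

Each master runs its cyclic exchanges with its own slaves while it holds the token, then passes the token on, so the subsystems share the medium without collisions.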
FIGURE 10.4 PROFIBUS DP monomaster system (DP-V0): a PLC hosting a class 1 master polls its slaves in a recurring bus cycle.
FIGURE 10.5 State machine for slaves: after Power_On, a slave waits for parameterization and then for configuration; if the configuration check fails, it falls back to waiting for parameterization, otherwise it enters data exchange. A slave fault or timeout returns it to the parameterization state, and a diagnosis telegram is sent instead of process data. Optional services (set slave address, get slave diagnosis, get configuration) are available along the way.
DP master class 2 (DPM2) comprises engineering, configuration, or operating devices. They are put into operation during commissioning and for maintenance and diagnostics in order to configure connected devices, evaluate measured values and parameters, and request the device status. A DPM2 does not have to be permanently connected to the bus system; it also has active bus access. DP slaves are peripherals (input/output (I/O) devices, drives, human machine interfaces (HMIs), valves, transducers, analyzers) that read in process information or use output information to intervene in the process. There are also devices that solely provide input information or solely provide output information. As far as communication is concerned, slaves are passive devices: they only respond to direct queries (see Figure 10.4, sequences ① and ②). This behavior is simple and cost-effective to implement. In the case of DP-V0, it is already completely included in the bus ASIC.
10.3.3 Cyclic and Acyclic Data Communication Protocols

Cyclic data communication between the DPM1 and its assigned slaves is automatically handled by the DPM1 in a defined, recurring sequence (Figure 10.4). The corresponding services are called MS0. The user defines the assignment of the slave(s) to the DPM1 when configuring the bus system. The user also defines which slaves are to be included in or excluded from the cyclic user data communication.

The DPM1 and the slaves pass through three phases during start-up: parameterization, configuration, and cyclic data exchange (Figure 10.5). Before entering the cyclic data exchange state, the master first sends information about the transmission rate, the data structures within a PDU, and other slave-relevant parameters. In a second step, it checks whether the user-defined configuration matches the actual device configuration. In any state the master can request slave diagnosis in order to indicate faults to the user.

An example of the telegram structure for the transmission of information between master and slave is shown in Figure 10.6. The telegram starts with some synchronization bits, the type (SD) and length (LE) of the telegram, the source and destination addresses, and a function code (FC). The function code indicates the type of message or the content of the load (processing data unit) and serves as a guard to control the state machine of the master. The PDU, which may carry up to 244 bytes, is followed by a safeguard mechanism, the frame checking sequence (FCS), and a delimiter. One example of the usage of the function code is the indication of a fault situation on the slave side. In this case, the master sends a special diagnosis request instead of the normal process data exchange, which the slave replies to with a diagnosis message. It comprises 6 bytes of fixed information and user-definable device-, module-, or channel-related diagnosis information [1], [7].
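The start-up phases just described (cf. Figure 10.5) can be sketched as a small state machine. This is an illustrative sketch, not vendor code; the state and telegram names are assumptions chosen to mirror the figure.

```python
# Illustrative sketch of the DP-V0 slave start-up state machine of
# Figure 10.5. State and telegram names are assumed for illustration.

class DPSlave:
    def __init__(self):
        self.state = "WAIT_PRM"          # Wait on Parameterization

    def receive(self, telegram):
        if self.state == "WAIT_PRM" and telegram == "SET_PRM":
            self.state = "WAIT_CFG"      # parameters accepted
        elif self.state == "WAIT_CFG" and telegram == "CHK_CFG_OK":
            self.state = "DATA_EXCH"     # configuration matches
        elif self.state == "WAIT_CFG" and telegram == "CHK_CFG_BAD":
            self.state = "WAIT_PRM"      # mismatch: restart start-up
        elif self.state == "DATA_EXCH" and telegram == "FAULT":
            self.state = "WAIT_PRM"      # slave fault or watchdog timeout
        return self.state

slave = DPSlave()
assert slave.receive("SET_PRM") == "WAIT_CFG"
assert slave.receive("CHK_CFG_OK") == "DATA_EXCH"
```

In a real slave this logic is implemented in the bus ASIC; the point here is only the ordering of the phases and the fallback transitions.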
PROFIBUS: Open Solutions for the World of Automation
FIGURE 10.6 PROFIBUS DP telegram structure (example). A stream of standard PROFIBUS telegrams is separated by a synchronization time of 33 bit times (TBit = 1/baud rate). An SD2 telegram (variable data length) consists of the start delimiter (SD = 68H, repeated after the length fields), the length of the process data (LE) and its repetition (LEr, not checked in the FCS), the destination address (DA), the source address (SA), the function code (FC, the message type), the processing data unit (PDU, 1 to 244 bytes), the frame checking sequence (FCS, computed across the data within LE), and the end delimiter (ED = 16H). Each character is transmitted as an 11-bit cell: a start bit (SB), eight character bits (ZB0 to ZB7), an even parity bit (PB), and a stop bit (EB).
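The SD2 telegram layout above can be sketched as a small frame builder. This is an illustration, not production code; the field values are assumed, and the FCS is computed here as the byte-wise arithmetic sum modulo 256 over DA, SA, FC, and the PDU, which is how the PROFIBUS FDL layer defines it.

```python
# Hedged sketch of an SD2 telegram (variable data length) from Figure 10.6.
# Field values are illustrative; the FCS is the byte-wise sum modulo 256
# over DA, SA, FC and the PDU, per the PROFIBUS FDL layer.

def build_sd2(da, sa, fc, pdu):
    assert 1 <= len(pdu) <= 244, "PDU carries 1..244 bytes"
    le = 3 + len(pdu)                  # LE covers DA + SA + FC + PDU
    body = bytes([da, sa, fc]) + pdu
    fcs = sum(body) % 256              # frame checking sequence
    return bytes([0x68, le, le, 0x68]) + body + bytes([fcs, 0x16])

frame = build_sd2(da=5, sa=2, fc=0x7D, pdu=b"\x01\x02")
assert frame[0] == 0x68 and frame[-1] == 0x16
assert frame[1] == frame[2] == 5       # LE and its repetition LEr
```

The 11-bit UART cells (start bit, parity, stop bit) are added by the transceiver hardware and are therefore not modeled here.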
In addition to the station-related user data communication, which is automatically handled by the DPM1, the master can also send control commands to all slaves or to a group of slaves simultaneously. These control commands are transmitted as multicast messages and enable the sync and freeze modes for event-controlled synchronization of the slaves [1], [7].

For safety reasons, it is necessary to ensure that DP has effective protective functions against incorrect parameterization or failure of the transmission equipment. For this purpose, the DP master and the slaves are fitted with monitoring mechanisms in the form of time monitors. The monitoring interval is defined during configuration.

Acyclic data communication is the key feature of version DP-V1. It fulfills the requirements for parameterization and calibration of the field devices over the bus during runtime and for the introduction of confirmed alarm messages. Transmission of acyclic data is executed in parallel with cyclic data communication, but with lower priority. Figure 10.7 shows some sample communication sequences for a class 2 master, which uses MS2 services. Using MS1 services, a class 1 master is also able to execute acyclic communication.

Slave-to-slave communication (DP-V2) enables direct and time-saving communication between slaves using broadcast communication, without the detour over a master. In this case, the slaves act as publishers; i.e., the slave response does not go through the coordinating master but directly to other slaves embedded in the sequence, the so-called subscribers (Figure 10.8). This enables slaves to read data directly from other slaves and use them as their own input, which opens up the possibility of completely new applications; it also reduces response times on the bus by up to 90%. The isochronous mode (DP-V2) enables clock-synchronous control in masters and slaves, irrespective of the bus load.
The function enables highly precise positioning processes with clock deviations of

Example: Event observation: “The position of control valve A changed by 5° at 10:42 A.M.”

Event observations require exactly-once semantics when transmitted to a consumer: at the sender, event information is consumed on sending, and at the receiver, event information must be queued and consumed on reading. Event information is transmitted in event messages.

Periodic state observations and sporadic event observations are two alternative approaches for observing a dynamic environment in order to reconstruct the states and events of the environment at the observer. Periodic state observations produce a sequence of equidistant “snapshots” of the environment that allow the observer to reconstruct those events that occur with a minimum temporal distance longer than the duration of the sampling period. Starting from an initial state, a complete sequence of (sporadic) event observations can be used by the observer to reconstruct the complete sequence of states of the RT entity in the environment. However, if no minimum duration between events is assumed, the observer and the communication system must be infinitely fast.
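The two observation semantics described above can be contrasted in a few lines of code: a state observation is overwritten in place and may be read any number of times, while an event observation must be queued and is consumed exactly once on reading. The class names are illustrative.

```python
# Sketch of the two observation semantics: state observations are
# overwritten and read non-destructively; event observations are queued
# and consumed exactly once on reading.

from collections import deque

class StateVariable:                       # periodic state observations
    def __init__(self): self.value = None
    def write(self, v): self.value = v     # new snapshot overwrites old
    def read(self): return self.value      # non-consuming, idempotent

class EventQueue:                          # sporadic event observations
    def __init__(self): self.q = deque()
    def write(self, e): self.q.append(e)   # every event must be kept
    def read(self): return self.q.popleft()  # consumed on reading

s = StateVariable(); s.write(10); s.write(12)
assert s.read() == 12 and s.read() == 12   # latest state, read twice

e = EventQueue(); e.write("+5 deg"); e.write("-2 deg")
assert e.read() == "+5 deg"                # exactly-once, in order
assert len(e.q) == 1
```

The difference explains why a lost or duplicated event message corrupts the reconstructed state sequence, whereas a lost state message is repaired by the next periodic snapshot.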
12.2.3 Temporal Firewalls

An extensible architecture must be based on a small number of orthogonal concepts that are reused in many different situations in order to reduce the mental load required for understanding large systems. In a large distributed system, the characteristics of the interfaces between subsystems determine to a large extent the comprehensibility of the architecture. In TTA, the communication network interface (CNI; Figure 12.2) between a host computer and the communication network is the most important interface. The CNI appears in every node of the architecture and separates the local processing within a node from the global interactions among the nodes. The CNI consists of two unidirectional data flow interfaces, one from the host computer to the communication system and the other in the opposite direction.
FIGURE 12.2 Node of TTA: an input-output subsystem and a host processor (with memory, operating system, and application software) are connected via CNIs to the TT communication controller, which attaches to the replicated communication channels.
We call a unidirectional data flow interface elementary if there is only a unidirectional control flow [7] across this interface. An interface that supports periodic state messages with error detection at the receiver is an example of such an elementary interface. We call a unidirectional data flow interface composite if even a unidirectional data flow requires a bidirectional control flow. An event message interface with error detection is an example of a composite interface. Composite interfaces are inherently more complex than elementary interfaces, since the correct operation of the sender depends on the control signals from all receivers. This can be a problem in multicast communication, where many control messages are generated for every unidirectional data transfer, and each one of the receivers can affect the operation of the sender.

The basic CNI of TTA as depicted in Figure 12.3 is an elementary interface. The time-triggered transport protocol, driven autonomously by its time-triggered schedule, carries state messages from the sender's CNI to the receiver's CNI. The sender deposits the information into its local CNI memory according to the information push paradigm, while the receiver pulls the information out of its local CNI memory. From the point of view of temporal predictability, information push into a local memory at the sender and information pull from a local memory at the receiver are optimal, since no unpredictable task delays that extend the worst-case execution time occur during the reception of messages. A receiver that is working on a time-critical task is never interrupted by a control signal from the communication system. Since no control signals cross the CNI in TTA (the communication system derives the control signals for the fetch and delivery instants exclusively from the progress of the global time and its local schedule), the propagation of control errors is prohibited by design.
We call an interface that prevents propagation of control errors by design a temporal firewall [4]. The integrity of the data in the temporal firewall is assured by the nonblocking write (NBW) concurrency control protocol [5, p. 217].
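The nonblocking write protocol mentioned above can be sketched as a single-writer cell with a version counter: the writer is never blocked by readers, and a reader that observes a concurrent write simply retries. This is a minimal single-threaded illustration of the idea, not the verified NBW implementation of [5].

```python
# Minimal sketch of the nonblocking write (NBW) idea: the counter is
# odd while a write is in progress; a reader retries until it obtains
# a snapshot during which the counter did not change.

class NBWCell:
    def __init__(self):
        self.counter = 0          # even: stable, odd: write in progress
        self.data = None

    def write(self, value):       # called only by the single writer
        self.counter += 1         # counter becomes odd: write started
        self.data = value
        self.counter += 1         # counter becomes even: write complete

    def read(self):
        while True:
            before = self.counter
            if before % 2 == 1:   # write in progress, retry
                continue
            value = self.data
            if self.counter == before:   # counter unchanged: consistent
                return value

cell = NBWCell()
cell.write({"speed": 120})
assert cell.read() == {"speed": 120}
```

Because the writer never waits, the worst-case execution time of the time-triggered communication system remains unaffected by the host's read activity, which is exactly the temporal-firewall property.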
12.2.4 Communication Interface

From the point of view of complexity management and composability, it is useful to distinguish between three different types of interfaces of a node: the real-time service (RS) interface, the diagnostic and management (DM) interface, and the configuration and planning (CP) interface [8]. These interface types serve different functions and have different characteristics. For temporal composability, the most important interface is the RS interface.

12.2.4.1 The Real-Time Service Interface

The RS interface provides the timely real-time services to the node environment during the operation of the system. In real-time systems it is a time-critical interface that must meet the temporal specification of the application in all specified load and fault scenarios. The composability of an architecture depends
Dependable Time-Triggered Communication
FIGURE 12.3 Data flow and control flow at a TTA interface: the sender pushes state messages into its CNI memory, the cluster communication system transports them under control of the global time, and the receiver pulls them from its CNI memory.
on the proper support of the specified RS interface properties (in the value and temporal domains) during operation. From the user's point of view, the internals of the node are not visible at the CNI, since they are hidden behind the RS interface.

12.2.4.2 The Diagnostic and Management Interface

The DM interface opens a communication channel to the internals of a node. It is used for setting node parameters and for retrieving information about the internals of the node, e.g., for the purpose of internal fault diagnosis. A maintenance engineer who accesses the internals of a node via the DM interface must have detailed knowledge about the internal objects and behavior of the node. The DM interface does not affect temporal composability and is usually not time critical.

12.2.4.3 The Configuration and Planning Interface

The CP interface is used to connect a node to other nodes of a system. It is used during the integration phase to generate the "glue" between the nearly autonomous nodes. The use of the CP interface does not require detailed knowledge about the internal operation of a node. The CP interface is not time critical.

The CNI of TTA can be used directly as the real-time service interface. On input, the precise interface specifications (in the temporal and value domains) are the preconditions for the correct operation of the host software. On output, the precise interface specifications are the postconditions that must be satisfied by the host, provided the preconditions have been satisfied by the host environment. Since the bandwidth is allocated statically to the hosts, no starvation of any host can occur due to high-priority message transmission from other hosts. TTA implements an event-triggered communication service on top of the basic TT service to realize the DM and CP interfaces.
Since the event-triggered communication is based on (but not executed in parallel to) the time-triggered communication, it is possible to maintain and to use all predictability properties of the basic TT communication service in event-triggered communication.
12.3 The Time-Triggered Architecture

The range of TTA's services is best understood when put into a broader context: the integrated project Dependable Embedded Components and Systems (DECOS) aims to develop technologies to move from federated distributed architectures to integrated distributed architectures [1]. Federated basically means that each application subsystem is placed on an independent node, whereas integrated architectures try to unite several application subsystems on a single node. A schematic overview of the DECOS approach for integrated distributed architectures is depicted in Figure 12.4. An application is divided into different distributed application subsystems (DASs); such subsystems could be, for example, the power train, a braking system, a steering system, etc. In a federated architecture each DAS would be implemented on a single node; an integrated architecture provides services that allow several DASs to be implemented on a single node. These services form the platform interface layer (PIL). Examples of PIL services are:
FIGURE 12.4 Structure of the DECOS integrated distributed architecture: distributed application subsystems (DAS A to DAS D) run on top of the platform interface layer (PIL), which relies on basic services provided by different implementation platforms and choices.
• Encapsulation services
• Event-triggered communication
• Virtual networks
• Hidden gateways
• Provision of legacy interfaces
• Application diagnosis support
12.3.1 Basic Services

The PIL services rely on a set of validated basic services. TTA is a target architecture that provides these basic services.

12.3.1.1 Predictable Time-Triggered Transmission

The basic principle of time-triggered communication is that the transmission of messages is triggered by the clock rather than by the availability of new information; the so-called time-division multiple-access (TDMA) strategy is used. In an architecture using TDMA, time is split up into (nonoverlapping) pieces of not necessarily equal duration, which are called slots. These slots are grouped into sequences called TDMA rounds, in which every node occupies exactly one slot. The knowledge of which node occupies which slot in a TDMA round is static, available to all components a priori, and equal for all TDMA rounds. When the time of a node's slot is reached, the node is granted exclusive access to the communication medium for the duration of its slot, t_slot^i, where 0 ≤ i < n (assuming there are n nodes in the system). The sending slot t_slot^i of a node i is split up into three phases: presend, transmit, and postreceive. In the first phase, preparations for the transmission are done; the actual sending takes place in the second phase; and during the postreceive phase the state of the nodes is updated according to the received messages. The durations between two consecutive transmit phases of succeeding nodes are called interframe gaps. The interframe gaps have to be chosen with respect to the postreceive phase and the different propagation delays of the messages on the channels. After the end of one TDMA round, the next TDMA round starts; that is, after the node in the last slot of a TDMA round has sent, the node that is allowed to send in the first slot sends again. Consequently, each node sends predictably every t_round time units, where t_round = Σ_{i=0}^{n−1} t_slot^i.
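The TDMA timing above can be expressed as simple arithmetic: with static slot durations, node i's k-th send instant is k·t_round plus the offset of its slot within the round. The slot durations below are assumed values for illustration.

```python
# Sketch of TDMA timing: node i's k-th send instant is
# k * t_round + offset[i], and every node sends with period t_round.

t_slot = [200, 150, 250, 200]            # slot durations (microseconds, assumed)
t_round = sum(t_slot)                    # 800 us per TDMA round

def send_instant(node, k):
    offset = sum(t_slot[:node])          # start of the node's slot in a round
    return k * t_round + offset

assert t_round == 800
assert send_instant(2, 0) == 350         # node 2 sends 350 us into round 0
assert send_instant(2, 1) - send_instant(2, 0) == t_round   # fixed period
```

Because the schedule is static and known a priori, every receiver can compute these instants itself; this is the basis of the implicit synchronization and the membership service discussed below.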
12.3.1.2 Fault-Tolerant Clock Synchronization

It is widely understood that a common agreement on physical time throughout the complete system is necessary for distributed control applications. Since safety-critical systems must not rely on a single point of failure, any fault-tolerant clock synchronization requires a distributed solution. Typically,
we can distinguish three phases in a fault-tolerant clock synchronization algorithm [5]. In the first phase, each node that participates in clock synchronization acquires information on the local views of the global time in all other nodes. The required message exchange can be implemented either by the exchange of dedicated synchronization messages or by a priori knowledge of the transmission pattern of the regular message flow (implicit synchronization). In the second phase, each node executes a convergence function based on the deviation values received from the different nodes. In the third phase, a node adjusts the local timer that represents its local view of the global time by the output of the convergence function. The adjustment procedure can be implemented either as state correction, where the local timer is corrected at an instant, or as rate correction, where the local timer is corrected over an interval by accelerating or decelerating the speed of the local clock. More sophisticated clock synchronization algorithms take the stability of the drift of the node's local clock into account and correct the rate of the clock in advance. Case studies show that a combination of a regular clock synchronization algorithm with a rate correction algorithm yields an impressive precision in the system.

A crucial phase of clock synchronization is the initial synchronization after power-on, when the nodes within a system are unsynchronized (since the power-on times of different nodes may vary, and thus the local clocks start to run at different points in time). Start-up algorithms have to be used to achieve a sufficient degree of initial synchronization. One possible solution for a start-up algorithm is a variation of a clock synchronization algorithm: after power-on, the local clocks of different nodes may be far apart, but successive rounds of message exchange and convergence should achieve a sufficient precision.
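The convergence function of the second phase can be sketched as a fault-tolerant average: the k largest and k smallest measured deviations are discarded before averaging, so up to k arbitrarily faulty clocks cannot drag the correction term outside the range of the correct ones. The deviation values below are illustrative assumptions.

```python
# Sketch of a fault-tolerant average convergence function: discard the
# k extreme deviation values on each side, then average the rest.

def fault_tolerant_average(deviations, k=1):
    trimmed = sorted(deviations)[k:-k]   # drop k extremes on each side
    return sum(trimmed) / len(trimmed)

# Deviations (in microseconds) of the local clock against four peers;
# the +500 value stems from a faulty clock and is discarded.
correction = fault_tolerant_average([-3, 2, 4, 500], k=1)
assert correction == 3.0                 # average of {2, 4}
```

The correction term then feeds either a state correction (applied at an instant) or a rate correction (spread over the next interval), as described above.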
However, if the exchange of messages, in particular synchronization messages, itself requires synchronization between the nodes, as is the case in time-triggered protocols, this solution cannot be implemented and a dedicated start-up algorithm has to be constructed.

12.3.1.3 Determinism

A definition of a timely and deterministic multicast channel is given in [9] by the following three properties:
1. Timeliness: Given that a message is sent at the send instant t_send, the receive instants t_receive at all receivers of the (multicast) message will be in the interval [t_send + d_min, t_send + d_max], where d_min is called the minimum delay and d_max the maximum delay. The difference d_max − d_min is called the jitter of the communication channel. d_max and d_min are a priori known characteristic parameters of the given communication channel.
2. Constant order: The receive order of the messages is the same as the send order. The send order among all messages is established by the temporal order of the send instants of the messages as observed by an omniscient observer.
3. Agreed order: If the send instants of n (n > 1) messages are the same, then an order of the n messages will be established in an a priori known manner.

We call a communication channel that fulfills properties 2 and 3 ordinal deterministic. If a communication channel fulfills all properties stated above, it is temporal deterministic; thus, temporal determinism is a stronger form of determinism than ordinal determinism. We call a communication channel path deterministic if there exists an a priori known route from a sending to a receiving node. Path determinism and temporal determinism are therefore orthogonal properties.

12.3.1.4 Fault Isolation

In the field of fault-tolerant computing the notion of a fault containment region (FCR) is introduced in order to delimit the impact of a single fault.
A fault containment region is defined as the set of subsystems that share one or more common resources. A fault in any one of these shared resources can thus impact all subsystems of the FCR; i.e., the subsystems of an FCR cannot be considered to fail independently of each other. In the context of this chapter we consider the following resources that can be impacted by a fault:
• Computing hardware
• Power supply
• Timing source
• Clock synchronization service
• Physical space

For example, if two subsystems depend on a single timing source, e.g., a single oscillator or a single clock synchronization algorithm, then these two subsystems are not considered to be independent and therefore belong to the same FCR. Since this definition of independence allows two FCRs to share the same design, i.e., the same software, software faults are not part of this fault model. In TTA a node is considered to form a single FCR. An architecture for safety-critical systems has to ensure that a fault affecting one FCR is isolated so that it does not cause other FCRs to fail.

12.3.1.5 FCR Diagnosis (Membership)

The failure of an FCR must be reported to all other FCRs in a consistent manner within a short latency [5]. The membership service is a form of concurrent diagnosis that realizes such a detection service. The time-triggered protocols TTP/C and TTP/A are concrete implementations of TTA services. TTP/C is designed for ultrahigh-dependability systems and thus tolerates either an arbitrary failure of any one of its nodes or a passive arbitrary failure of one of its channels (meaning that even a faulty channel cannot create a correct TTP/C message itself). Furthermore, TTP/C is equipped with fault tolerance mechanisms that ensure that if the fault assumptions are temporarily violated, the system will recover within a bounded duration after the fault assumptions hold again. To ensure this robustness, TTP/C implements all listed basic services. The low-cost TTP/A protocol is intended for use as a fieldbus protocol and tolerates only fail-silent components. It implements only the predictable time-triggered transmission service. We discuss the time-triggered protocols TTP/C and TTP/A next.
12.3.2 The System Protocol TTP/C
The time-triggered protocol for Society of Automotive Engineers (SAE) class C applications (TTP/C) currently supports bus (Figure 12.5a) and star (Figure 12.5b) topologies as well as hybrid compositions of the two. The communication medium is replicated to compensate for transmission failures of messages. The communication links are half duplex; that is, a node can either transmit or receive via an attached link. Full-duplex links would not bring advancements, since the TDMA strategy excludes the possibility of more than one good node transmitting concurrently.* TTP/C realizes the predictable time-triggered transmission service by adhering to an a priori defined communication schedule that organizes communication into TDMA rounds. Several successive TDMA rounds form a cluster cycle. The messages a node may send may differ with respect to the TDMA
FIGURE 12.5 Different TTP/C topologies: (a) bus topology; (b) star topology with the nodes connected to central hubs.
*Full-duplex links may bring advancements during the start-up phase of the protocol; the current start-up algorithm, however, is designed for half-duplex links.
round in the cluster cycle. When a cluster cycle is finished, it is restarted, such that the cluster cycle is executed cyclically. The communication schedule, the so-called message description list (MEDL), is stored within the communication controller of each node. In addition to the time-triggered transmission concept described in Section 12.3.1.1, TTP/C also supports multiplexed nodes and shadow nodes. A set of nodes is said to be multiplexed if they share the same slot in a TDMA round. Depending on the TDMA round in the cluster cycle, the single node that is allowed to send in the multiplexed slot is identified (this information is stored in the MEDL as well). A shadow node has a dedicated slot in the TDMA round but will only transmit in this slot if it detects that its primary node fails to send. After recovery of the primary node, the former primary acts as the shadow node.

A particular message may carry up to 240 bytes of data. The data are protected by a 24-bit cyclic redundancy check (CRC) checksum. In order to achieve high data efficiency, the sender name and message name are derived from the send instant. We distinguish between three different types of messages in TTP/C: I-frames, N-frames, and X-frames. I-frames carry the current controller state (C-state) and can be used by nodes that are out of synchronization to reintegrate into a running system. N-frames are used for regular application data and do not carry C-state information explicitly. However, the sending node calculates the CRC checksum using the N-frame and its internal C-state. A receiving node will calculate the CRC checksum using the received N-frame and its own C-state. Thus, the CRC check will only be successful if both sender and receiver agree on the C-state. Using this form of CRC checksum calculation makes it impossible for a receiver to distinguish a transmission failure from a disagreement on the C-state.
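The implicit C-state check can be sketched by seeding the frame checksum with the sender's C-state. TTP/C uses a 24-bit CRC; for illustration the sketch below uses Python's CRC-32 from zlib, and the C-state contents are assumptions.

```python
# Sketch of the implicit C-state check of N-frames. TTP/C uses a 24-bit
# CRC; CRC-32 (zlib) is substituted here for illustration. The CRC is
# seeded with the C-state, so a receiver whose C-state has diverged
# rejects the frame exactly as if it were corrupted on the wire.

import zlib

def n_frame_crc(c_state: bytes, payload: bytes) -> int:
    return zlib.crc32(payload, zlib.crc32(c_state))  # CRC over C-state || payload

sender_cstate = b"time=42;round=3;members=0b1101"    # assumed C-state encoding
frame = b"application data"
crc = n_frame_crc(sender_cstate, frame)

# A receiver that agrees on the C-state accepts the frame ...
assert n_frame_crc(sender_cstate, frame) == crc
# ... while a receiver with a diverged C-state sees a CRC mismatch.
assert n_frame_crc(b"time=42;round=4;members=0b1101", frame) != crc
```

This saves the bytes needed to transmit the C-state explicitly, at the cost described in the text: a receiver cannot tell a transmission failure from a C-state disagreement.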
X-frames (that is, N-frames that carry the C-state information explicitly) overcome this limitation.

The fault-tolerant clock synchronization of TTP/C exploits the common knowledge of the send schedule: every node measures the difference between the a priori known expected arrival time and the actually observed arrival time of a correct message to learn about the difference between the clock of the sender and the clock of the receiver. This information is used by a fault-tolerant average algorithm to calculate periodically a correction term for the local clock in order to keep the clock in synchrony with all other clocks of the cluster. The clock synchronization algorithm has been formally verified in [16].

TTP/C uses a fault-tolerant start-up algorithm that ensures that the system will become synchronized within an upper bound in time, provided that a minimum number of components is awake. The start-up algorithm used in TTP/C is a waiting room algorithm that is based on unique time-outs. Each node i has two unique time-outs, the listen time-out t_listen^i and the cold-start time-out t_CS^i. For each pair of nodes i, j the following relation holds:

t_listen^i > t_CS^j    (12.1)

After power-up, say at t_0, node k starts to listen on the communication channels for t_listen^k time units. If synchronous operation is already established, node k will receive a frame during this period. If the node was not able to synchronize by t_0 + t_listen^k, it will initiate cold-start itself by sending a cold-start frame. After transmission of the cold-start frame, node k listens to the communication channel until t_0 + t_listen^k + t_CS^k. If node k was not able to integrate until this point in time, and no collision occurred, node k will send another cold-start frame. Node k will transmit cold-start frames with a period of t_CS^k until it successfully synchronizes to a received frame. Extensive model-checking studies of the start-up concept, including exhaustive failure simulation, were performed in [18]. As a key lemma of these studies, it was verified that a minimum configuration of three nodes and intelligent central guardians is necessary and sufficient to tolerate one arbitrarily faulty node or one passive arbitrarily faulty central guardian during the start-up sequence.

The membership service employs a distributed agreement algorithm to determine whether the outgoing link of the sender or the incoming link of the receiver has failed. Nodes that have suffered a transmission fault are excluded from the membership until they restart with a correct protocol state. Before each send operation of a node, the clique avoidance algorithm checks whether the node is a member of the majority clique. Certain aspects of the TTA group membership service have been formally verified in [15].
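The unique time-out relation (12.1) can be checked mechanically. The sketch below uses assumed time-out values (in slot units) and a listen time-out constructed so that every listen time-out exceeds every cold-start time-out; the exact construction in TTP/C differs, so treat this purely as an illustration of the relation.

```python
# Sketch of the unique time-out relation (12.1): every listen time-out
# is longer than every cold-start time-out, and the unique cold-start
# time-outs break the symmetry after a cold-start collision. All
# numeric values are illustrative assumptions.

tau_cs = {0: 40, 1: 50, 2: 60}    # unique cold-start time-outs (slot units)
tau_listen = {i: 2 * max(tau_cs.values()) + tau_cs[i] for i in tau_cs}

# Relation (12.1): tau_listen[i] > tau_cs[j] for every pair of nodes i, j
assert all(tau_listen[i] > tau_cs[j] for i in tau_cs for j in tau_cs)

# After a collision at t = 0, each node retries at t = tau_cs[i]; the
# node with the smallest (unique) time-out sends first, uncontended.
first_sender = min(tau_cs, key=tau_cs.get)
assert first_sender == 0
```

Because the time-outs are unique, two nodes that collided once cannot collide again on their immediate retries, which is what lets the model-checking studies bound the start-up time.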
The fault tolerance concepts of TTA that are used in TTP/C are discussed in detail in Section 12.4. As in any distributed computing system, the performance of TTA depends primarily on the available communication bandwidth and computational power. Because of physical effects of time distribution and limits in the implementation of the guardians [19], a minimum interframe gap of about 5 µs must be maintained between frames to guarantee the correct operation of the guardians. If a bandwidth utilization of about 80% is intended, then the message send phase must be on the order of about 20 µs, implying that about 40,000 messages can be sent per second within such a cluster. With these parameters, a sampling period of about 250 µs can be supported in a cluster composed of 10 nodes. The precision of the clock synchronization in current prototype systems is below 1 µs. If the interframe gap and bandwidth limits are stretched, it might be possible to implement in such a system a 100-µs TDMA round (corresponding to a 10-kHz control loop frequency), but not much smaller if the system is physically distributed (to tolerate spatial proximity faults). The amount of data that can be transported in the 20-µs window depends on the bandwidth: in a 5-MBit/s system it is about 12 bytes; in a 1-GBit/s system it is about 2400 bytes. A prototype implementation of TTP/C using Gigabit Ethernet [17] was developed within the NEXT TTA project. This prototype implementation uses COTS (commercial off-the-shelf) hardware and was therefore not expected to achieve the limiting performance. The objective of this project was rather to determine the performance that can be achieved without special hardware and to pinpoint the performance bottlenecks faced when using COTS components. TTP/C is commercially available in the form of the automotive-qualified TTP/C-C2 chip [22].
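These back-of-the-envelope numbers (note that the time units are microseconds, as the figure of 40,000 messages per second implies) can be reproduced as arithmetic:

```python
# The performance estimates above as arithmetic: a 20 us transmit phase
# plus a 5 us interframe gap yields 25 us per message, i.e. 40,000
# messages per second; data per window scales with the bandwidth.

send_phase_us, gap_us = 20, 5
messages_per_s = 1_000_000 // (send_phase_us + gap_us)
assert messages_per_s == 40_000

def bytes_per_window(bandwidth_bit_s, window_us=20):
    # bits transported in the window, converted to bytes
    return bandwidth_bit_s * window_us / 1e6 / 8

assert bytes_per_window(5e6) == 12.5         # "about 12 bytes" at 5 MBit/s
assert bytes_per_window(1e9) == 2500.0       # "about 2400 bytes" at 1 GBit/s
```

With 10 nodes at 25 µs each, one TDMA round takes 250 µs, matching the stated sampling period.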
A Federal Aviation Administration (FAA) certification process (DO-178B) is currently under finalization that shall also prove the appropriateness of the hardware for avionics applications. The detailed specification of the TTP/C protocol can be found at [20]. There are several ongoing projects that use TTP/C; examples are a railway signaling system and the cabin pressure control in the Airbus A380. See [21] for a list of projects that employ TTP/C as a commercial product.
12.3.3 The Fieldbus Protocol TTP/A

The TTP/A protocol is the time-triggered fieldbus protocol of TTA. It is used to connect low-cost smart transducers to a node of TTA, which acts as the master of a transducer cluster. In TTP/A the CNI memory element of Figure 12.3 has been expanded at the transducer side to hold a simple interface file system (IFS). Each interface file contains up to 256 records of four bytes each. The IFS forms the uniform name space for the exchange of data between a sensor and its environment (Figure 12.6). The IFS holds the real-time data, calibration data, diagnostic data, and configuration data. The information between the IFS of the smart transducer and the CNI of the TTA node is exchanged by the time-triggered TTP/A protocol, which distinguishes between two types of rounds, the master–slave (MS) round and the multipartner (MP) round. The MS rounds are used to read and write records from the IFS of a particular transducer to implement the DM and CP interfaces. The MP rounds are periodic and transport data from selected IFS records of several transducers across the TTP/A cluster to implement the RS. MP rounds and MS rounds are interleaved, such that the time-critical RS implemented by means of MP rounds and the event-based MS service can coexist. It is thus possible to diagnose a smart transducer
FIGURE 12.6 Interface file system in a smart transducer. (The figure shows the sensor, the internal logic of the transducer, and the IFS.)
© 2005 by CRC Press
Dependable Time-Triggered Communication
or to reconfigure or install a new smart transducer online, without disturbing the time-critical RS of the other nodes. The TTP/A protocol also supports a plug-and-play mode in which new sensors are detected, configured, and integrated into a running system dynamically. The detailed specification of the TTP/A protocol can be found at [14].
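As a rough illustration of the IFS structure (files of up to 256 four-byte records, grouped here into hypothetical real-time, calibration, diagnostic, and configuration files), a minimal sketch might look as follows. The class and method names are our own and not part of the TTP/A specification:

```python
# Minimal sketch of a TTP/A-style interface file system (IFS).

class InterfaceFile:
    """A file of up to 256 records, each exactly four bytes."""
    def __init__(self):
        self.records = {}  # record number (0..255) -> 4-byte record

    def write(self, record_no, data):
        if not 0 <= record_no < 256:
            raise ValueError("record number out of range")
        if len(data) != 4:
            raise ValueError("a record holds exactly four bytes")
        self.records[record_no] = bytes(data)

    def read(self, record_no):
        # Unwritten records read back as zeros in this sketch.
        return self.records.get(record_no, bytes(4))

class SmartTransducerIFS:
    """Uniform name space: real-time, calibration, diagnostic, config data."""
    def __init__(self):
        self.files = {name: InterfaceFile()
                      for name in ("realtime", "calibration",
                                   "diagnostic", "configuration")}

# An MS round would read or write single records (DM and CP interfaces),
# while periodic MP rounds copy selected real-time records cluster-wide.
ifs = SmartTransducerIFS()
ifs.files["realtime"].write(0, b"\x00\x12\x34\x56")
print(ifs.files["realtime"].read(0))
```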
12.4 Fault Tolerance

In any fault-tolerant architecture it is important to distinguish clearly between fault containment and error containment. Fault containment is concerned with limiting the immediate impact of a single fault to a defined region, while error containment tries to avoid the propagation of the consequence of a fault, i.e., the error. An error in one fault containment region must be prevented from propagating into another fault containment region that has not been directly affected by the original fault.
12.4.1 Fault Containment

In TTA, nodes communicate by the exchange of messages across replicated communication channels. Each of the two channels independently transports its own copy of the message at about the same time from the sending CNI to the receiving CNI. The start of sending a message by the sender is called the message send instant. The termination of receiving a message by the receiver is called the message receive instant. In TTA, the intended message send instants and the intended message receive instants are a priori known to all communicating partners. A message contains an atomic data structure that is protected by a CRC. We make the assumption that a CRC cannot be forged by a fault. A message is called a valid message if it contains a data structure with a correct CRC. A message is called a timely message if it is a valid message and conforms to the temporal specification. A message that does not conform to the temporal specification is an untimely message. A timely message is a correct message if its data structure is in agreement, at both the syntactic and semantic levels, with the specification. We call a message whose length differs from its specification or whose CRC is incorrect an invalid message.
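This message taxonomy (valid, timely, untimely, invalid) can be captured in a small sketch. The frame layout and the CRC function below are illustrative assumptions; TTP/C defines its own frame format and CRC polynomial:

```python
# Sketch of the message classification: a message is valid if length and
# CRC match the specification, and timely if it is valid AND its receive
# instant lies within the a priori known temporal window.
import zlib
from dataclasses import dataclass

@dataclass
class Frame:
    payload: bytes
    crc: int
    receive_instant: float  # time at which reception terminated

def is_valid(frame, expected_len):
    return (len(frame.payload) == expected_len
            and zlib.crc32(frame.payload) == frame.crc)

def is_timely(frame, expected_len, intended_receive, precision):
    return (is_valid(frame, expected_len)
            and abs(frame.receive_instant - intended_receive) <= precision)

f = Frame(b"\x01\x02\x03\x04", zlib.crc32(b"\x01\x02\x03\x04"), 100.002)
print(is_valid(f, 4))                  # True: valid message
print(is_timely(f, 4, 100.0, 0.005))   # True: timely message
print(is_timely(f, 4, 100.0, 0.001))   # False: untimely message
```

Correctness (semantic agreement of the data with the specification) cannot be decided at this level and is left to the host, as the text explains below.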
12.4.2 Error Containment in the Temporal Domain

An error that is caused by a fault in the sending FCR can propagate to another FCR via a message failure; i.e., the FCR sends a message that deviates from the specification. A message failure can be a message value failure or a message timing failure. A message value failure implies either that a message is invalid or that the data structure contained in a valid message is incorrect. A message timing failure implies that the message send instant or the message receive instant is not in agreement with the specification. In order to avoid error propagation of a sent message, we need error detection mechanisms located in FCRs different from that of the message sender; otherwise, the error detection mechanism may be impacted by the same fault that caused the message failure. In TTA we distinguish between timing failure detection and value failure detection. Timing failure detection is performed by a guardian (Figure 12.7), which is part of TTA. Value failure detection is the responsibility of the host computer. The guardian is an autonomous unit that has a priori knowledge of all intended message send and receive instants. Each of the two replicated communication channels has its own independent guardian. A receiving node within TTA judges a sending node as operational if it has received at least one timely message from the sender around the specified receive instant. It is assumed that a guardian cannot forge a CRC and cannot store messages; i.e., it can only output a valid message at one of its output ports if it has received a valid message on one of its input ports within the last d time units. A guardian transforms a message that it judges to be untimely into an invalid message by cutting off its tail. Such a truncated message will be recognized as invalid by all correct receivers and will then be discarded.
The guardian may truncate a message either because it detected a message timing failure or because the guardian itself is faulty. In the latter case it is assumed that the sender of the message is correct, and thus the correct message will proceed to the receivers via the replicated channel of TTA.
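The guardian's temporal filtering can be sketched as follows; the function signature and the window representation are our own simplifications of the behavior described above:

```python
# Illustrative model of a bus guardian: it knows the intended send window
# a priori and truncates anything outside it, turning an untimely frame
# into an invalid (CRC-broken) one that all correct receivers discard.

def guardian_relay(frame_bits, send_instant, window_start, window_end):
    """Return the bits forwarded onto the channel."""
    if window_start <= send_instant <= window_end:
        return frame_bits                  # timely: relay unchanged
    # Untimely: cut off the tail so the CRC at the frame end cannot match.
    return frame_bits[: len(frame_bits) // 2]

frame = [1, 0, 1, 1, 0, 0, 1, 0]
print(guardian_relay(frame, send_instant=10.0,
                     window_start=9.9, window_end=10.1))   # full frame relayed
print(guardian_relay(frame, send_instant=12.0,
                     window_start=9.9, window_end=10.1))   # truncated frame
```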
The Industrial Communication Technology Handbook
FIGURE 12.7 TTA star topology with central guardian. (The figure shows TTP/C nodes, each with a communication controller and CNI, connected to a star coupler that includes the central guardian and a TTP/C communication controller.)
12.4.3 Error Handling in the Value Domain

Detection of value failures is not the responsibility of TTA, but of the host computers. For example, detection and correction of value failures can be performed in a single step by triple modular redundancy (TMR). In this case three replicated senders, placed in three different FCRs, perform the same operations in their host computers. In the fault-free case they produce correct messages with the same content, which are sent to three replicated receivers that perform a majority vote on these three messages (actually, at the communication level six messages will be transported, one from each sender on each of its two channels). Detection of value failures and detection of timing failures are not independent in TTA. In order to implement a TMR structure at the application level, the integrity of the timing of the architecture must be assumed. An intact sparse global time base is a prerequisite for the systemwide definition of the distributed state, which in turn is a prerequisite for masking value failures by voting. The separation of handling timing failures from handling value failures has beneficial implications for resource requirements. In general, it is necessary to implement interactive consistency to solve the Byzantine Generals Problem: a set of nodes has to agree upon a correct value in the presence of faulty nodes that may be asymmetrically faulty. A Byzantine-tolerant algorithm that establishes interactive consistency in the presence of k arbitrarily failing nodes requires 3k + 1 nodes and several rounds of message exchange [12]. For clock synchronization, and thus for the maintenance of the sparse global time base, an interactive convergence algorithm [11] that needs only a single round of message exchange can be used instead of an interactive consistency algorithm. TTA claims to tolerate one arbitrary faulty component (that is, k = 1).
Since all nodes of a cluster, independent of their involvement in a particular application system, can contribute to handling timing failures at the architectural level, the lower bound on the number of nodes in a system is 4, which is a relatively small number for real systems. Once a proper global time has been established, TMR for masking value failures can be implemented using only 2k + 1 synchronized nodes in a particular application subsystem. Two concepts contribute to this property: the self-confidence principle and replica determinism. According to the self-confidence principle, a node will consider itself correct until it is accused by a sufficiently large set of nodes. A set of nodes that operates replica deterministically will produce the same outputs at instants that are at most an a priori specifiable interval d apart [5]. This means that the tolerance of a Byzantine-faulty component does not necessarily
FIGURE 12.8 Virtual CAN on top of TTP/C. (The figure shows physical CAN buses attached through CAN controllers and highly dependable TTP/C communication controllers, forming one logical CAN bus.)
require a solution to the Byzantine Generals Problem. The Byzantine Generals Problem has to be solved only if values from the environment are received, and the nodes have to come to a consistent view on these values. This separation of timing failures and value failures thus reduces the number of components needed for fault tolerance of an application from 3k + 1 to 2k + 1.
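Once a proper global time base is in place, the 2k + 1 masking step reduces to a simple majority vote over the replica messages, as the following sketch illustrates for k = 1 (the helper names are ours):

```python
# TMR value-failure masking on top of a synchronized time base:
# with k = 1, three (2k + 1) replicas suffice, and the receiver
# simply takes the majority of the replica messages.
from collections import Counter

def majority_vote(values):
    value, count = Counter(values).most_common(1)[0]
    if count * 2 <= len(values):
        raise RuntimeError("no majority: fault hypothesis violated")
    return value

# Fault-free case: all three replicated senders agree.
print(majority_vote([42, 42, 42]))   # 42
# One arbitrarily faulty replica (k = 1) is masked.
print(majority_vote([42, 17, 42]))   # 42
```

The vote is only meaningful because replica determinism guarantees that correct replicas produce the same value for the same distributed state.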
12.4.4 Virtual Networks

Since in a real system the nodes are most likely of mixed criticality, it is economically attractive to provide a communication infrastructure of mixed dependability. Nodes that execute a highly dependable task are of high criticality and communicate via a highly dependable network protocol, while nodes of minor criticality operate on a low-dependability network protocol. TTA provides a mixed-dependability communication infrastructure by means of virtual networks. Virtual networks provide a logical network structure on top of a physical network structure by emulation. Example: recent research was concerned with a prototype study of CAN over TTP/C [13]. In this work two physical CAN networks were connected to a TTP/C cluster via two gateway nodes (Figure 12.8). The CAN messages are tunneled through the TTP/C system; thus, the physically separated CAN buses form one logical CAN bus, transparently to the CAN controllers. With the virtual network approach, it is possible to have nodes of low criticality communicate via a dynamic protocol while highly critical nodes communicate via the highly dependable TTP/C. Furthermore, the logical CAN bus consists of three independent fault containment regions, and thus a babbling CAN controller will only affect the physical part of the CAN bus where it is located. This approach is also scalable with respect to the number of logical CAN buses. To summarize, fault containment and error detection are achieved in TTA in three distinct steps. First, fault containment is achieved by proper architectural decisions concerning resource sharing, in order to provide independent fault containment regions. Second, propagation of timing errors is avoided at the architecture level by the guardians. Third, handling of value failures is performed at the application level by voting.
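A toy model of such a tunnel might look as follows; the classes and slot mechanics are illustrative assumptions and do not reflect the actual implementation of the prototype in [13]:

```python
# Toy model of the CAN-over-TTP/C tunnel: each gateway node queues CAN
# frames from its physical bus segment and ships them inside its own
# TTP/C slot, so the segments behave as one logical bus.
from collections import deque

class GatewayNode:
    def __init__(self, name):
        self.name = name
        self.outbound = deque()   # CAN frames waiting for our TTP/C slot
        self.local_bus = []       # frames delivered onto our CAN segment

    def can_frame_received(self, frame):
        self.outbound.append(frame)

    def ttpc_slot(self, cluster):
        # In our a priori assigned slot, broadcast the queued frames.
        while self.outbound:
            frame = self.outbound.popleft()
            for node in cluster:
                if node is not self:
                    node.local_bus.append(frame)

gw_a, gw_b = GatewayNode("A"), GatewayNode("B")
cluster = [gw_a, gw_b]
gw_a.can_frame_received(("id=0x120", b"\x01"))
gw_a.ttpc_slot(cluster)
print(gw_b.local_bus)   # the frame from segment A appears on segment B
```

Because each gateway transmits only in its own slot, a babbling controller on one CAN segment cannot monopolize the TTP/C channel or the other segments.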
12.5 The Design of TTA Applications

Composability and the associated reuse of nodes and software can only be realized if the architecture supports a two-level design methodology. In TTA such a methodology is supported: TTA distinguishes between the architecture design (cluster design) and the component design (node design).
FIGURE 12.9 Decomposition of a drive-by-wire application. (The figure shows functional units for vehicle dynamics, brake manager, engine control, steering manager, suspension, and a gateway to the body electronics; each node has I/O, in some cases via a TTP/A network driver interface, and a communication controller attached through its communication network interface to the replicated broadcast channels.)
12.5.1 Architecture Design

In the cluster design phase, an application is decomposed into clusters and nodes. This decomposition is guided by engineering insight and the structure inherent in the application, in accordance with the proven architectural principle of form follows function. For example, in an automotive environment, a drive-by-wire system may be decomposed into functional units, as depicted in Figure 12.9. If a system is developed from scratch (on a "green field"), then a top-down decomposition will be pursued. After the decomposition has been completed, the CNIs of the nodes must be specified in the temporal and value domains. The data elements that are to be exchanged across the CNIs are identified, and the precise fetch instants and delivery instants of the data at the CNI must be determined. Given these data, the schedules of the TTP/C communication system can be calculated and verified. At the end of the architecture design phase, the precise interface specifications of the nodes are available. These interface specifications are the inputs and constraints for the node design. Given a set of available nodes with their temporal specifications (nodes that are available for reuse), a bottom-up design approach must be followed: given the constraints of the nodes at hand (how much time they need to calculate an output from an input), a TTP/C schedule must be found that meets the application requirements and satisfies the node constraints.
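The bottom-up feasibility question (do the nodes' computation times fit the TDMA round?) can be sketched as a simple check. The node names echo Figure 12.9, but the timing figures and the simplistic feasibility criterion are invented for illustration; real schedule synthesis is considerably more involved:

```python
# Naive feasibility check for a TTP/C schedule: every node needs one slot
# per TDMA round, and each node's worst-case computation time between the
# fetch instant of its inputs and the delivery instant of its output must
# fit within one round.

def schedule_feasible(nodes, slot_us, round_us):
    """nodes: dict mapping node name -> worst-case compute time (us)."""
    if len(nodes) * slot_us > round_us:
        return False                      # the slots do not fit in the round
    return all(wcet <= round_us for wcet in nodes.values())

nodes = {"vehicle_dynamics": 180, "brake_manager": 120,
         "engine_control": 200, "steering_manager": 90, "suspension": 60}
print(schedule_feasible(nodes, slot_us=25, round_us=250))  # True
print(schedule_feasible(nodes, slot_us=60, round_us=250))  # False: slots overflow
```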
12.5.2 Component Design

During the node design phase, the application software of the host computer is developed. The delivery and fetch instants established during the architecture design phase are the preconditions and postconditions for the temporal validation of the application software. The host operating system can employ any reasonable scheduling strategy, as long as the given deadlines are satisfied and the replica determinism of the host system is maintained. Node testing proceeds bottom up. A new node must be tested with respect to the given CNI specifications in all anticipated load and fault conditions. The composability properties of TTA (stability of prior service, achieved by the strict adherence to information pull interfaces) ensure that a property that has been validated at the node level will also hold at the system level. At the system level, testing will focus on validating the emerging services that are a result of the integration.
12.5.3 Validation

Today, the integration and validation phases are probably the most expensive phases in the implementation of a large distributed real-time system. TTA has been designed to reduce this integration and validation effort by providing the following mechanisms:
• The architecture provides a consistent distributed computing base to the application and informs the application in case a loss of consistency is caused by a violation of the fault hypothesis. The basic algorithms that provide this consistent distributed computing base (clock synchronization and membership) have been analyzed by formal methods and are implemented once and for all in silicon. The application need not be concerned with the implementation and validation of the complex distributed agreement protocols that are needed to establish consistency in a distributed system. • The architecture is replica deterministic, which means that any observed deficiency can be reproduced in order to diagnose the cause of the observed problem. • The interaction pattern between the nodes and the contents of the exchanged messages can be observed by an independent observer without the probe effect. It is thus possible to determine whether a node complies with its preconditions and postconditions without interfering with the operation of the observed node. • The internal state of a node can be observed and controlled by the DM interface. • In TTA it is straightforward to provide a real-time simulation test bench that reproduces the environment to any node in real time. Deterministic automatic regression testing can thus be implemented.
12.6 Conclusions

The Time-Triggered Architecture is the result of more than 20 years of research in the field of dependable distributed real-time systems. During this period, many ideas have been developed, implemented, evaluated, and finally discarded. What survived is a small set of orthogonal concepts that center around the availability of a dependable global time base. The guiding principle during the development of TTA has always been to take maximum advantage of the availability of this global time, which is part of the world even if we do not use it. TTA spans the whole spectrum of dependable distributed real-time systems, from low-cost deeply embedded sensor nodes to high-performance nodes that communicate at gigabit-per-second speeds, persistently assuming that a global time of appropriate precision is available in every node of TTA. At present, TTA occupies a niche position, since in the experimental as well as the theoretical realm of mainline computing, time is considered a nuisance that makes life difficult and should be dismissed at the earliest moment [10]. However, as more and more application designers start to realize that real time is an integral part of the real world that cannot be abstracted away, the future prospects for TTA look encouraging.
Acknowledgments

This work was supported by the European IST (Information Society Technologies) project "Next TTA" under project number IST-2001-32111. This document is a revised version of [2].
References

[1] Consortium DECOS. DECOS Annex 1: Description of Work, 2003. Contract FP6-511764.
[2] H. Kopetz and G. Bauer. Time-triggered communication networks. In Industrial Information Technology Handbook. CRC Press, Boca Raton, FL, 2004.
[3] H. Kopetz and G. Bauer. The Time-Triggered Architecture. Proceedings of the IEEE, 91:112–126, 2003.
[4] H. Kopetz and R. Nossal. Temporal firewalls in large distributed real-time systems. In Proceedings of the IEEE Workshop on Future Trends in Distributed Computing, 1997, pp. 310–315.
[5] H. Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997.
[6] H. Kopetz. The time-triggered (TT) model of computation. In Proceedings of the 19th IEEE Real-Time Systems Symposium, 1998, pp. 168–177.
[7] H. Kopetz. Elementary versus composite interfaces in distributed real-time systems. In Proceedings of the 4th International Symposium on Autonomous Decentralized Systems, 1999, pp. 26–33.
[8] H. Kopetz. Software engineering for real-time: a roadmap. In Proceedings of the 22nd International Conference on Software Engineering, 2000, pp. 201–211.
[9] H. Kopetz. On the Determinism of Communication Systems. Research Report 48/2003, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2003.
[10] E. Lee. What's ahead for embedded software? IEEE Computer, 33:18–26, 2000.
[11] L. Lamport and P.M. Melliar-Smith. Synchronizing clocks in the presence of faults. Journal of the ACM, 32:52–78, 1985.
[12] L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems, 4:382–401, 1982.
[13] R. Obermaisser. An Integrated Architecture for Event-Triggered and Time-Triggered Control Paradigms. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002.
[14] OMG. Smart Transducers Interface. Final adopted specification ptc/2002-10-02, Object Management Group, 2002. Available at http://www.omg.org.
[15] H. Pfeifer. Formal verification of the TTP group membership algorithm. In T. Bolognesi and D. Latella, editors, Formal Methods for Distributed System Development: Proceedings of FORTE XIII/PSTV XX 2000, Pisa, Italy, October 2000, pp. 3–18. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[16] H. Pfeifer, D. Schwier, and F.W. von Henke. Formal verification for time-triggered clock synchronization. In C.B. Weinstock and J. Rushby, editors, Dependable Computing and Fault Tolerant Systems, Vol. 12, Dependable Computing for Critical Applications 7, IEEE Computer Society, San Jose, CA, 1999, pp. 207–226.
[17] M. Schwarz. Implementation of a TTP/C Cluster Based on Commercial Gigabit Ethernet Components. Master's thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002.
[18] W. Steiner, J. Rushby, M. Sorea, and H. Pfeifer. Model Checking a Fault-Tolerant Startup Algorithm: From Design Exploration to Exhaustive Fault Simulation. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2004), June 2004.
[19] C. Temple. Enforcing Error Containment in Distributed Time-Triggered Systems: The Bus Guardian Approach. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 1999.
[20] TTTech Computertechnik AG. Specification of the TTP/C Protocol. Available at http://www.tttech.com.
[21] TTTech Computertechnik AG. TTP in Commercial Production. Available at http://tttech.com/customers/.
[22] TTTech Computertechnik AG. TTP/C-C2 Data Sheet. Available at http://www.ttchip.com.
13 Controller Area Network: A Survey

Gianluca Cena, IEIIT-CNR
Adriano Valenzano, IEIIT-CNR

13.1 Introduction ......................................................................13-1
13.2 CAN Protocol Basics.........................................................13-2
Physical Layer • Frame Format • Access Technique • Error Management • Fault Confinement • Communication Services • Implementation
13.3 Main Features of CAN....................................................13-12
Advantages • Drawbacks • Solutions
13.4 Time-Triggered CAN ......................................................13-14
Main Features • Protocol Specification • Implementation
13.5 CAN-Based Application Protocols.................................13-16
CANopen • DeviceNet
References ...................................................................................13-20
13.1 Introduction

The history of Controller Area Network (CAN) began more than 20 years ago. At the beginning of the 1980s, a group of engineers at Bosch GmbH was looking for a serial bus system suitable for use in passenger cars. The most popular solutions adopted at that time were considered inadequate for the needs of most automotive applications; the bus system, in fact, had to provide a number of new features that could hardly be found in the already existing fieldbus architectures. The design of the new proposal also involved several academic partners and had the support of Intel as the potential main semiconductor producer. The new communication protocol was presented officially in 1986 under the name Automotive Serial Controller Area Network at the Society of Automotive Engineers (SAE) congress held in Detroit. It was based on a multimaster access scheme to the shared medium that resembled the well-known carrier-sense multiple-access (CSMA) approach. The peculiar aspect, however, was that CAN adopted a new distributed nondestructive arbitration mechanism that solves contentions on the bus by means of priorities implicitly assigned to the colliding messages. Moreover, the protocol specification also included a number of error detection and management mechanisms to enhance the fault tolerance of the whole system. In the following years, both Intel and Philips started to produce controller chips for CAN following two different philosophies. The Intel solution (often referred to as FullCAN in the literature) required less host CPU power, since most of the communication and network management functions were carried out directly by the network controller. The Philips solution (BasicCAN), instead, was simpler but imposed a higher load on the processor used to interface the CAN controller.
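The nondestructive arbitration scheme can be illustrated with a short simulation of the wired-AND bus; the identifier values here are arbitrary examples:

```python
# Sketch of CAN's bitwise arbitration: identifiers are sent MSB first;
# the wired-AND bus makes a dominant bit (0) win over a recessive bit (1),
# so the contender with the lowest identifier value keeps transmitting
# while the others back off without destroying the winning frame.

def arbitrate(identifiers, id_bits=11):
    contenders = set(identifiers)
    for bit in range(id_bits - 1, -1, -1):
        bus = min((i >> bit) & 1 for i in contenders)  # wired-AND bus level
        # Nodes that sent recessive (1) but read back dominant (0) back off.
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    assert len(contenders) == 1
    return contenders.pop()

print(hex(arbitrate([0x120, 0x0A5, 0x7FF])))  # lowest identifier wins
```

Because the winner's frame is transmitted undisturbed, no bandwidth is lost to the collision; the losers simply retry in the next arbitration phase.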
Since the mid-1990s more than 15 semiconductor vendors, including Siemens, Motorola, and NEC, have been producing and shipping millions of CAN chips mainly to car manufacturers such as Mercedes-Benz, Volvo, Saab, Volkswagen, BMW, Renault, and Fiat. The Bosch specification (CAN version 2.0) was submitted for international standardization at the beginning of the 1990s. The proposal was approved and published as ISO 11898 at the end of 1993 and
contained the description of the network access protocol and the physical layer architecture. In 1995 an addendum to ISO 11898 was approved that describes the extended format for message identifiers. The CAN specification is currently being revised and reorganized and has been split into four separate parts: [ISO1], [ISO2], and [ISO4] have already been approved as international standards, whereas [ISO3] has reached a stable status and is being finalized. Even though it was conceived for vehicle applications, at the beginning of the 1990s CAN began to be adopted in different scenarios. The standard documents provided satisfactory specifications for the lower communication layers but did not offer guidelines or recommendations for the upper part of the Open Systems Interconnection (OSI) protocol stack in general, and for the application layer in particular. This is why the earliest applications of CAN outside the automotive scenario (e.g., textile machines, medical systems, and so on) adopted ad hoc monolithic solutions. The CAN in Automation (CiA) users' group, founded in 1992, was originally concerned with the specification of a standard CAN application layer. This effort led to the development of the general-purpose CAN application layer (CAL) specification. CAL was intended to fill the gap between the distributed application processes and the underlying communication support, but in practice it was not successful, the main reason being that, because CAL is truly application independent, each user had to develop a suitable profile based on CAL for his or her specific application field. In the same years, Allen-Bradley and Honeywell started a joint distributed control project based on CAN. Although the project was abandoned a few years later, Allen-Bradley and Honeywell continued their work separately, focusing on the higher protocol layers.
The results of these activities were the Allen-Bradley DeviceNet solution and the Honeywell Smart Distributed System (SDS). For a number of reasons, SDS remained in practice an internal solution of Honeywell Microswitch, while DeviceNet was soon transferred to the Open DeviceNet Vendor Association and was widely adopted in a number of U.S. factory automation areas, becoming a serious competitor to widespread solutions such as PROFIBUS-DP and INTERBUS. Besides DeviceNet and SDS, other significant initiatives were focused on CAN and its application scenarios. CANopen was conceived in the framework of the European Esprit project ASPIC* by a consortium led once again by Bosch GmbH. The purpose of CANopen was to define a profile based on CAL that could support communications inside production cells. The original CANopen specifications were further refined by CiA and released in 1995. Later, both CANopen and DeviceNet became European standards, and they are now widely used, especially in two different areas: factory automation and distributed machine control.
13.2 CAN Protocol Basics

The CAN protocol architecture is structured according to the layered approach of the International Organization for Standardization (ISO)/OSI model. However, as in most of the currently existing networks conceived for use at the field level in automated manufacturing environments, only a few layers are included in its protocol stack. This makes implementations more efficient and inexpensive: fewer protocol layers imply reduced processing delays when receiving and transmitting messages, as well as simpler communication software. The CAN specifications [ISO1] and [ISO2], in particular, include only the physical and data link layers, as depicted in Figure 13.1. The physical layer is aimed at managing the effective transmission of data over the communication support and tackles the mechanical, electrical, and functional aspects. Bit timing and synchronization, in particular, belong to this layer. The data link layer is split into two separate sublayers: medium access control (MAC) and logical link control (LLC). The purpose of the MAC entity is basically to manage access to the shared transmission support by providing a mechanism aimed at coordinating the use of the bus, so as to avoid unmanageable collisions. The functions of the MAC sublayer include frame encoding and decoding, arbitration, error
*ASPIC, Automation and control Systems for Production units using Installation bus-Concept.
FIGURE 13.1 CAN protocol stack.
checking and signaling, and also fault confinement. The LLC sublayer offers the user (i.e., the application programs running in the upper layers) a proper interface, characterized by a well-defined set of communication services, in addition to the ability to decide whether an incoming message is relevant to the node. It is worth noting that the CAN specification is very flexible as regards both the implementation of the LLC services and the choice of the physical support, whereas no modifications to the behavior of the MAC sublayer are allowed. As said before, unlike most fieldbus networks, the CAN specification does not include any native application layer. However, a number of such protocols exist that rely on CAN and ease the design and implementation of complex CAN systems.
13.2.1 Physical Layer

The features of the physical layer of CAN that are valid for any system, such as those related to the physical signaling, are described in ISO 11898-1 [ISO1]. The medium access units (i.e., the transceivers) are defined in two separate documents: ISO 11898-2 [ISO2] and ISO 11898-3 [ISO3] for high-speed and low-speed communications, respectively. The definition of the medium interface (i.e., the connectors) is usually covered in other documents.

13.2.1.1 Network Topology

CAN networks are based on a shared-bus topology. Buses have to be terminated at each end with resistors (the recommended nominal impedance is 120 Ω), so as to suppress signal reflections. For the same reason, the standard documents state that the topology of a CAN network should be as close as possible to a single line. Stubs are permitted for connecting devices to the bus, but their length should be as short as possible; for example, at 1 Mbit/s the length of a stub must be shorter than 30 cm. Several kinds of transmission media can be used:
• Two-wire bus, which enables differential signal transmission and ensures reliable communications. In this case, shielded twisted pair can be used to further enhance the immunity to electromagnetic interference.
• Single-wire bus, a simpler and cheaper solution that features lower immunity to interference and is mainly suitable for use in automotive applications.
• Optical transmission medium, which ensures complete immunity to electromagnetic noise and can be used in hazardous environments. Fiber optics is often adopted to interconnect (through repeaters) different CAN subnetworks, in order to cover plants that are spread over a large area.

Several bit rates are available for the network, the most commonly adopted being in the range of 50 Kbit/s to 1 Mbit/s (the latter value is the maximum allowed by the CAN specifications). The maximum extension of a CAN network depends directly on the bit rate. The exact relation between these two quantities involves parameters such as the delays introduced by transceivers and optocouplers. Generally speaking, the product of the bus length and the bit rate has to be approximately constant. For example, the maximum extension allowed for a 500 Kbit/s network is about 100 m, and increases up to about 500 m when a bit rate of 125 Kbit/s is used. Signal repeaters can be used to increase the network extension, especially when large plants have to be covered and the bit rate is low or medium. However, they introduce additional delays on the communication paths; hence, the maximum distance between any two nodes is effectively shortened at high bit rates. Repeaters can also be used to obtain topologies other than a plain bus (trees or combs, for example); in this case, good design can increase the effective area covered by the network. It is worth noting that, unlike other field networks such as PROFIBUS-PA, there is in general no cost-effective way in CAN to use the same wire to carry both the signal and the power supply. However, an additional pair of wires can be provided inside the bus cable for the power supply. Curiously enough, connectors are not standardized by the CAN specifications. Instead, several companion or higher-level application standards exist that define their own connectors and pin assignments.
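Treating the length-bit-rate product as roughly constant, calibrated from the 500 Kbit/s example above, gives a quick estimator. This is only a rule of thumb: the actual limits depend on transceiver and optocoupler delays (note that the text itself quotes about 500 m at 125 Kbit/s, somewhat more than the constant-product estimate):

```python
# Rough estimator for the maximum CAN bus length at a given bit rate,
# based on the "length x bit rate ~ constant" rule of thumb.

LENGTH_BITRATE_PRODUCT = 100 * 500e3   # m * bit/s, from the 500 Kbit/s example

def max_bus_length_m(bit_rate_bps):
    return LENGTH_BITRATE_PRODUCT / bit_rate_bps

print(max_bus_length_m(500e3))   # 100.0 m
print(max_bus_length_m(125e3))   # 400.0 m (the text quotes ~500 m)
print(max_bus_length_m(1e6))     # 50.0 m
```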
CiA DS102 [DS102], for example, foresees the use of a SUB-D9 connector, while DeviceNet and CANopen suggest the use of either five-pin mini-style, micro-style, or open-style connectors. In addition, these documents include recommendations for bus lines, cables, and standardized bit rates that were not included in the original CAN specifications.

13.2.1.2 Bit Encoding and Synchronization

In CAN, the electrical interface of a node to the bus is based on an open-collector-like scheme. As a consequence, the level on the bus can assume two complementary values, which are denoted symbolically as dominant and recessive. Usually, the dominant level corresponds to the logical value 0, while the recessive level coincides with the logical value 1. CAN relies on non-return-to-zero (NRZ) bit encoding, which is very efficient in that synchronization information is not encoded separately from data. Bit synchronization in each node is achieved by means of a digital phase-locked loop (DPLL), which extracts the timing information directly from the bit stream received from the bus. In particular, the edges of the signal are used for synchronizing the local clocks, so as to compensate for the tolerances and drifts of the oscillators. To provide a satisfactory degree of synchronization among the nodes, the transmitted bit stream must include a sufficient number of edges. To ensure this, CAN relies on the so-called bit stuffing technique: whenever five consecutive bits of the same value (either dominant or recessive) appear in the transmitted bit stream, the transmitting node inserts one additional stuff bit of the complementary value, as depicted in Figure 13.2. These stuff bits can be easily and safely removed by the receiving nodes to obtain the original stream of bits back.
FIGURE 13.2 Bit stuffing technique.
© 2005 by CRC Press
Controller Area Network: A Survey
13-5
From a theoretical point of view, the maximum number of stuff bits that may be added is one every four bits of the original frame, so the encoding efficiency can be as low as 80% (see, for example, the rightmost part of Figure 13.2, where the original bit stream alternates sequences of four consecutive bits at the dominant level and four bits at the recessive level). However, the influence of bit stuffing in real operating conditions is noticeably lower than this theoretical bound: simulations show that, on average, only two to four stuff bits are effectively added to each frame, depending on the size of the identifier and data fields. Although quite efficient, the bit stuffing technique has a drawback: the time taken to send a message over the bus is not fixed but depends on the content of the message itself, which may cause annoying jitter. Not all fields in a CAN frame are encoded according to the bit stuffing mechanism: it applies only to the initial part of the frame, from the start-of-frame (SOF) bit up to the cyclic redundancy check (CRC) sequence. The remaining fields are of fixed form and are not stuffed.
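The stuffing and destuffing rules can be sketched in a few lines of Python (an illustration of the mechanism, not controller code; bits are modeled as 0/1 integers and the function names are mine):

```python
def stuff(bits):
    """Insert a complementary stuff bit after every run of five equal bits."""
    out, prev, run = [], None, 0
    for b in bits:
        run = run + 1 if b == prev else 1
        prev = b
        out.append(b)
        if run == 5:
            s = 1 - b            # stuff bit at the complementary value
            out.append(s)
            prev, run = s, 1     # the stuff bit starts a new run
    return out

def destuff(bits):
    """Remove stuff bits: discard the bit following five identical ones."""
    out, prev, run, skip = [], None, 0, False
    for b in bits:
        if skip:                 # this is a stuff bit: drop it
            prev, run, skip = b, 1, False
            continue
        run = run + 1 if b == prev else 1
        prev = b
        out.append(b)
        if run == 5:
            skip = True          # the next bit on the bus is a stuff bit
    return out
```

Running `stuff` over the worst-case pattern of Figure 13.2 (five equal bits followed by alternating runs of four) adds one stuff bit every four original bits, which corresponds to the 80% efficiency figure above.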
13.2.2 Frame Format

The CAN specification [ISO1] defines both a standard and an extended frame format. These formats differ mainly in the size of the identifier field and in some other bits of the arbitration field. In particular, the standard frame format (also known as the CAN 2.0A format) defines an 11-bit identifier field, which means that up to 2048 different identifiers are available to the applications executing in the same network (many older CAN controllers, however, only support identifiers in the range of 0 to 2031). The extended frame format (identified as CAN 2.0B) instead assigns 29 bits to the identifier, so that (in theory) up to half a billion different objects could exist in the same network. This is a fairly high value, which is virtually sufficient for any kind of application. Using extended identifiers in a network to which 2.0A-compliant CAN controllers are also connected usually leads to unmanageable transmission errors, which effectively make the network unstable. Thus, a third category of CAN controllers was developed, known as 2.0B passive: they correctly manage the transmission and reception of CAN 2.0A frames, while CAN 2.0B frames are simply ignored so that they do not hang the network. It is worth noting that, in most practical cases, the number of different objects allowed by the standard frame format is more than adequate. Since standard CAN frames are shorter than extended ones (because of the shorter arbitration field), they permit higher communication efficiency (unless part of the payload is moved into the arbitration field). As a consequence, they are adopted in most existing CAN systems, and most CAN-based higher-layer protocols, such as CANopen and DeviceNet, basically rely on this format. The CAN protocol foresees only four kinds of frames: data, remote, error, and overload. Their formats are described in detail below.

13.2.2.1 Data Frame

Data frames are used to send information over the network.
Each data frame in CAN begins with a start-of-frame (SOF) bit at the dominant level, as shown in Figure 13.3. Its role is to mark the beginning of the frame, as in serial transmissions carried out by means of conventional universal asynchronous receiver/transmitters (UARTs). The SOF bit is also used to synchronize the receiving nodes. Immediately after the SOF bit comes the arbitration field, which includes both the identifier and the remote transmission request (RTR) bit. As its name suggests, the identifier field uniquely identifies, on the whole network, the content of the frame being exchanged. The identifier is also used by the MAC sublayer to detect and manage the priority of the frame whenever a collision occurs (the lower the numerical value of the identifier, the higher the priority of the frame). The identifier is sent starting from the most significant bit down to the least significant one. The size of the identifier differs between standard and extended frames. In the latter case, the identifier is split into an 11-bit base identifier and an 18-bit extended identifier, to provide compatibility with the standard frame format.
FIGURE 13.3 Format of data frames.
The RTR bit is used to discriminate between data and remote frames. Since a dominant value of RTR denotes a data frame while a recessive value stands for a remote frame, a data frame has a higher priority than a remote frame with the same identifier. Next to the arbitration field comes the control field. In the case of standard frames, it includes the identifier extension (IDE) bit, which discriminates between standard and extended frames, followed by the reserved bit r0. In extended frames, the IDE bit effectively belongs to the arbitration field, as does the substitute remote request (SRR) bit, a placeholder that is sent at the recessive value to preserve the structure of the frames. In this case, the IDE bit is followed by the identifier extension and then by the control field, which begins with the two reserved bits r1 and r0. After the reserved bits there is the data length code (DLC), which specifies, encoded on 4 bits, the length (in bytes) of the data field. Since the IDE bit is dominant in standard frames and recessive in extended ones, when the same base identifier is considered, standard frames take precedence over extended frames. Reserved bits r0 and r1 must be sent by the transmitting node at the dominant value; receivers, however, ignore the value of these bits. For the DLC field, values ranging from 0 to 8 are allowed. According to the latest specification, higher values (from 9 to 15) can be used for application-specific purposes; in this case, however, the length of the data field is meant to be 8. The data field stores the effective payload of the frame. In order to ensure a high degree of responsiveness and minimize the priority inversion phenomenon, the size of the data field is limited to at most 8 bytes. After the data field there are the CRC and acknowledgment fields.
The former is made up of a cyclic redundancy check sequence encoded on 15 bits, followed by a CRC delimiter at the recessive value. The kind of CRC adopted in CAN is particularly suitable for covering short frames (i.e., frames counting less than 127 bits). The acknowledgment field is made up of two bits: the ACK slot followed by the ACK delimiter. Both are sent at the recessive level by the transmitter. The ACK slot, however, is overwritten with a dominant value by each node that has received the frame correctly (i.e., no error was detected up to the ACK field). It is worth noting that, in this way, the ACK slot is surrounded by two bits at the recessive level: the CRC and ACK delimiters. By means of the ACK bit, the transmitting node can discover whether at least one node in the network has received its frame correctly. At the end of the frame there is the end-of-frame (EOF) field, made up of seven recessive bits, which notifies all the nodes of the end of an error-free transmission. In particular, the transmitting node assumes that the frame has been exchanged correctly if no error is detected up to the last bit of the EOF field, while for receivers the frame is valid if there is no error up to the sixth bit of EOF. Consecutive frames on the bus are separated by the intermission (IMS), which consists of three recessive bits.

13.2.2.2 Remote Frames

Remote frames are very similar to data frames. The only difference is that they carry no data (i.e., the data field is absent). They are used to request that a given message be sent on the network
by a remote node. It is worth noting that the requesting node does not know who the producer of the related information is; it is up to the receivers to discover which one has to reply. The DLC field in remote frames is not effectively used by the CAN protocol. However, it should be set to the same value as in the corresponding data frame, so as to cope with situations where several nodes send remote requests with the same identifier at the same time (this is legal in a CAN network). In this case, the different requests must be perfectly identical, so that they overlap in the case of a collision. It should be noted that, because of the way the RTR bit is encoded, if a request is made for an object at the same time the transmission of that object is started by the related producer, the contention is resolved in favor of the data frame.

13.2.2.3 Error Frames

Error frames are used to notify the nodes in the network that an error has occurred. They consist of two fields: the error flag and the error delimiter. There are two kinds of error flag: the active error flag is made up of six dominant bits, while the passive error flag consists of six recessive bits. An active error flag violates the bit stuffing rules or the fixed-format parts of the frame that is currently being exchanged; hence, it enforces an error condition that is detected by all the other stations connected to the network. Each node that detects an error condition transmits an error flag on its own; as a consequence, there can be from 6 to 12 consecutive dominant bits on the bus. The error delimiter is made up of eight recessive bits: after the transmission of an error flag, each node starts sending recessive bits and, at the same time, monitors the bus level until a recessive bit is detected. At this point the node sends seven more recessive bits, completing the error delimiter.
13.2.2.4 Overload Frames

Overload frames can be used by slow receivers to slow down operations on the network. This is done by adding an extra delay between consecutive data and remote frames. Their format is very similar to that of error frames; in particular, an overload frame is made up of an overload flag followed by an overload delimiter. Today's CAN controllers are very fast, which makes the overload frame almost useless.
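The fixed fields of a standard data frame, and the DLC rule described earlier, can be summarized in a short Python sketch (the field table is a simplification for illustration; the bit counts follow the description above):

```python
# Field widths of a standard (CAN 2.0A) data frame, in bits, excluding data.
STANDARD_FIELDS = {
    "SOF": 1, "identifier": 11, "RTR": 1, "IDE": 1, "r0": 1, "DLC": 4,
    "CRC": 15, "CRC delimiter": 1, "ACK slot": 1, "ACK delimiter": 1, "EOF": 7,
}

def payload_bytes(dlc):
    """DLC values 0..8 give the length directly; 9..15 still mean 8 bytes."""
    if not 0 <= dlc <= 15:
        raise ValueError("DLC is a 4-bit field")
    return min(dlc, 8)

def data_frame_bits(n_bytes):
    """Nominal length of a standard data frame, excluding stuff bits."""
    return sum(STANDARD_FIELDS.values()) + 8 * n_bytes
```

With these widths, an empty data frame counts 44 bits and a full 8-byte frame 108 bits, before stuff bits and intermission are accounted for.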
13.2.3 Access Technique

The medium access control mechanism on which CAN relies is basically carrier-sense multiple access (CSMA). When no frame is being exchanged, the network is idle and the level on the bus is recessive. Before transmitting a frame, a node has to observe the state of the network: if the network is idle, the frame transmission begins immediately; otherwise, the node must wait for the current frame transmission to end. Each frame starts with the SOF bit at the dominant level, which informs all the other nodes that the network has switched to the busy state. Although very unlikely, it may happen that two or more nodes start sending their frames exactly at the same time. This is actually possible because the propagation delays on the bus, even though very small, are greater than zero; thus, one node might start its transmission while the SOF bit of another frame is still traveling on the bus. In this case, a collision occurs. In CSMA networks based on collision detection, such as nonswitched Ethernet, this unavoidably leads to the corruption of all the frames involved, which then have to be retransmitted. The consequence is a waste of time and a net decrease of the available bandwidth. In high-load conditions, this may lead to congestion: when the number of collisions is so high that the net throughput on the Ethernet network falls below the arrival rate, the network stalls. Unlike Ethernet, CAN is able to resolve contentions in a deterministic way, so that neither time nor bandwidth is wasted. Congestion conditions therefore cannot occur, and all the theoretical system bandwidth is effectively available for communication. In truth, it should be said that contentions in CAN occur more often than one may think. In fact, when a node that has a frame to transmit finds the bus busy or loses the contention, it waits for
the end of the current frame exchange and, immediately after the intermission has elapsed, starts transmitting. Here, the node may compete with other nodes for which, in the meantime, a transmission request has been issued. In this case, the different nodes synchronize on the falling edge of the first SOF bit that is sensed on the network. This implies that a CAN network effectively behaves as a network-wide distributed transmission queue, where messages are selected for transmission according to a priority order.

13.2.3.1 Bus Arbitration

The most distinctive feature of the medium access technique of CAN is its ability to resolve deterministically any collision that may occur on the bus. In turn, this is made possible by the arbitration mechanism, which effectively singles out the most urgent frame each time there is a contention for the bus. The CAN arbitration scheme resolves collisions by stopping the transmission of all the frames involved except the one characterized by the highest priority (i.e., the lowest identifier). The arbitration technique exploits the peculiarities of the physical layer of CAN, which conceptually provides a wired-AND connection scheme among all the nodes: the level on the bus is dominant if at least one node is sending a dominant bit, while it is recessive only if all the nodes are transmitting recessive bits. By means of the so-called binary countdown technique, each node, immediately after the SOF bit, transmits the message identifier serially on the bus, starting from the most significant bit. While transmitting, each node checks the level observed on the bus against the value of the bit being written out. If the node is transmitting a recessive value and the level on the bus is dominant, the node understands it has lost the contention and withdraws immediately.
In particular, it ceases transmitting and sets its output port to the recessive level so as not to interfere with the other contending nodes. At the same time, it switches to the receiving state to read the incoming (winning) frame. The binary countdown technique ensures that, in the case of a collision, all the nodes sending lower-priority frames abort their transmissions by the end of the arbitration field, except the one sending the frame characterized by the highest priority (the winning node does not even realize that a collision has occurred). This implies that no two nodes in a CAN network may transmit messages related to the same object (that is to say, characterized by the same identifier) at the same time. If this rule is violated, unmanageable collisions can take place that, in turn, cause transmission errors. Because of the automatic retransmission feature of the CAN controllers, this almost certainly leads to a burst of errors on the bus, until the stations involved are disconnected by the fault confinement mechanism. Hence, in general, only one node can be the producer of each object. One exception to this rule is given by frames without a data field, such as remote frames: should a collision occur among frames with the same identifier, they overlap perfectly, and hence no collision effectively occurs. The same is also true for data frames with a nonempty data field, provided that the content of this field is the same for all the frames sharing the same identifier; however, it makes little sense in general to send frames with a fixed data field. All the nodes that lose the contention have to retry the transmission as soon as the exchange of the current (winning) frame ends. They will all try to send their frames again immediately after the intermission is read on the bus.
Here, a new collision may take place, which also involves the frames sent by the nodes for which a transmission request was issued while the bus was busy. An example showing the detailed behavior of the arbitration phase in CAN is outlined in Figure 13.4. Here, three nodes (indicated symbolically as A, B, and C) start transmitting a frame at the same time (perhaps at the end of the intermission following the previous frame exchange on the bus). As soon as a node understands it has lost the contention, it switches its output level to the recessive value, so that it no longer interferes with the other transmitting nodes. This event takes place at bit ID5 for node A and at bit ID2 for node B. Node C manages to send the entire identifier field, and hence it can keep on transmitting the remaining part of the frame.
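The behavior just described can be reproduced with a toy simulation of the wired-AND bus (a sketch of the binary countdown idea, not controller code). Dominant is 0 and recessive is 1, so the bus level at each step is the minimum of all transmitted bits, and the winner is always the lowest identifier:

```python
def arbitrate(identifiers, width=11):
    """Simulate bitwise CAN arbitration; returns the winning identifier.

    Dominant = 0, recessive = 1; the bus carries the AND (minimum) of all
    outputs. A node writing recessive while the bus is dominant drops out.
    """
    contenders = set(identifiers)
    for bit in range(width - 1, -1, -1):                  # MSB first
        bus = min((ident >> bit) & 1 for ident in contenders)  # wired-AND
        # only nodes whose output matched the bus level stay in contention
        contenders = {i for i in contenders if ((i >> bit) & 1) == bus}
    # with distinct identifiers exactly one contender survives
    return contenders.pop()
```

For example, `arbitrate([0x123, 0x122, 0x7FF])` returns `0x122`, the numerically lowest (highest-priority) identifier; the node sending `0x7FF` withdraws on the first dominant bit it observes.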
FIGURE 13.4 Arbitration phase in CAN.
13.2.4 Error Management

One of the requirements that was fundamental in the definition of the CAN protocol was the need for a communication system characterized by high robustness, i.e., a system able to detect most transmission errors. Hence, particular care has been taken in defining error management. The CAN specification foresees five different mechanisms to detect transmission errors:

1. Cyclic redundancy check: When transmitting a frame, the originating node appends a 15-bit-wide CRC to the frame. Receiving nodes reevaluate the CRC to check whether it matches the transmitted one. Generally speaking, the CRC used in CAN is able to discover up to 5 erroneous bits distributed arbitrarily in the frame, or error bursts including up to 15 bits.
2. Frame check: The fixed-format fields in the received frames can be easily tested against their expected values. For example, the CRC and ACK delimiters as well as the EOF field have to be at the recessive level. If one or more illegal bits are detected, a form error is generated.
3. Acknowledgment check: The transmitting node checks whether the ACK bit has been set to the dominant value in the received frame. Otherwise, an acknowledgment error is issued.
4. Bit monitoring: Each transmitting node compares the level on the bus against the value of the bit being written. Should a mismatch occur, an error is generated. This does not hold for the arbitration field or the acknowledgment slot. Such a check is very effective in detecting local errors that may occur in the transmitting nodes.
5. Bit stuffing: Each node verifies whether the bit stuffing rules have been violated in the portion of the frame from the SOF bit up to the CRC sequence. If six bits of identical value are read from the bus, an error is generated.
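The CRC sequence of point 1 is computed over the generator polynomial x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1. A bit-serial Python sketch of the computation (illustrative only; it operates on already destuffed 0/1 bits):

```python
CAN_POLY = 0x4599  # x^15+x^14+x^10+x^8+x^7+x^4+x^3+1, leading x^15 implicit

def can_crc15(bits):
    """Bit-serial CRC as used by CAN: init 0, no reflection, no final XOR."""
    crc = 0
    for b in bits:
        feedback = b ^ ((crc >> 14) & 1)   # incoming bit vs. register MSB
        crc = (crc << 1) & 0x7FFF          # shift, keep 15 bits
        if feedback:
            crc ^= CAN_POLY
    return crc
```

A useful property for checking: feeding the frame bits followed by the 15 received CRC bits (most significant first) through the same function yields zero when no error has occurred.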
The residual probability that a corrupted message goes undetected in a CAN network, under realistic operating conditions, has been evaluated to be about 4.7 · 10^-11 times the frame error rate or less.
13.2.5 Fault Confinement

To prevent a node that is not operating properly from repeatedly sending corrupted frames, hence blocking the entire network, a fault confinement mechanism has been included in the CAN specification. The fault confinement unit supervises the correct operation of the related MAC sublayer and, should the node become defective, disconnects it from the bus. The fault confinement mechanism has been conceived to discriminate, as far as possible, between permanent failures and short disturbances that may cause bursts of errors on the bus. According to this mechanism, each node can be in one of the three following states:
• Error active
• Error passive
• Bus off

Error-active and error-passive nodes take part in the communication in the same way; however, they react differently to error conditions, sending active error flags in the former case and passive error flags in the latter. This is because an error-passive node has already experienced several errors, and hence it should avoid interfering with network operations (a passive error flag, in fact, does not corrupt the ongoing frame exchange). The fault confinement unit uses two counters to track the behavior of the node with respect to transmission errors: the transmission error count (TEC) and the receive error count (REC). The rules by which TEC and REC are managed are actually quite complex, but they can be summarized as follows: each time an error is detected, the counters are increased by a given amount, whereas each successful exchange decreases them by one. Furthermore, the increase for the node that first detected the error is larger than for the nodes that simply replied to the error flag. In this way, it is very likely that the counters of a faulty node increase more quickly than those of the nodes that are operating properly, even when sporadic errors due to electromagnetic noise are considered. When either counter exceeds the first threshold (127), the node is switched to the error-passive state, so as to try not to affect the network. When a second threshold (255) is exceeded, the node is switched to the bus-off state. At this point, it can no longer transmit any frame on the network, and it can be switched back to the error-active state only after it has been reset and reconfigured.
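The counter rules can be condensed into a small state machine. The increments below (+8 on a detected transmit error, +1 on a receive error, -1 on a successful exchange) are commonly used simplified values; the full rule set in the specification distinguishes several more cases:

```python
class FaultConfinement:
    """Simplified TEC/REC bookkeeping with the 127/255 thresholds."""

    def __init__(self):
        self.tec = 0   # transmission error count
        self.rec = 0   # receive error count

    @property
    def state(self):
        if self.tec > 255:
            return "bus-off"        # node must be reset and reconfigured
        if self.tec > 127 or self.rec > 127:
            return "error-passive"  # sends passive error flags only
        return "error-active"

    def on_tx_error(self):
        self.tec += 8

    def on_rx_error(self):
        self.rec += 1

    def on_tx_success(self):
        self.tec = max(0, self.tec - 1)

    def on_rx_success(self):
        self.rec = max(0, self.rec - 1)
```

With these values, sixteen consecutive transmit errors (TEC = 128) push a node into the error-passive state, and sixteen more drive it to bus-off.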
13.2.6 Communication Services

According to the ISO specification [ISO1], the LLC sublayer of CAN provides only two communication services: L_DATA, which is used to broadcast the value of a specific object over the network, and L_REMOTE, which is used to ask for the value of a specific object to be broadcast by the related remote producer. From a practical point of view, these primitives are implemented directly in hardware by all currently available CAN controllers.

13.2.6.1 Model for Information Exchanges

Unlike most network protocols conceived for use in automated manufacturing environments (which rely on node addressing), CAN adopts object addressing. In other words, messages are not tagged with the address of the destination or originating node. Instead, each piece of information exchanged over the network (often referred to as an object) is assigned a unique identifier, which denotes unambiguously the meaning of the object itself in the whole system. This fact has important consequences on the way communications are carried out in CAN: identifying the objects exchanged over the network according to their meaning, rather than the node they are intended for, implicitly allows multicasting and makes it very easy for control applications to manage interactions among devices according to the producer–consumer paradigm. The exchange of information in CAN takes place according to the three phases shown in Figure 13.5:

1. The producer of a given piece of information encodes and transmits the related frame on the bus (the arbitration technique transparently resolves any contention that may occur).
2. Because of the intrinsically broadcast nature of the bus, the frame is propagated all over the network, and every node reads its content into a local receive buffer.
3. The frame acceptance filtering (FAF) function in each node determines whether the information is relevant to the node itself.
If it is, the frame is passed to the upper communication layers (from a practical point of view, this means that the CAN controller raises an interrupt to the local device logic, which will then read the value of the object); if it is not, the frame is simply ignored and discarded.
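In hardware, the FAF step is typically a mask-and-compare on the identifier. A minimal sketch (the mask convention below, where a 1 bit means "this bit must match," is a common one but is illustrative rather than taken from a specific controller):

```python
def accepts(identifier, filter_code, mask):
    """Frame acceptance filtering: keep the frame if the masked bits match."""
    return (identifier & mask) == (filter_code & mask)
```

For example, a node consuming only identifiers 0x100 to 0x107 can fix the top 8 of the 11 identifier bits: `accepts(0x103, filter_code=0x100, mask=0x7F8)` is true, while `accepts(0x200, filter_code=0x100, mask=0x7F8)` is false and the frame is discarded.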
FIGURE 13.5 Producer–consumer model.
In the sample data exchange depicted in Figure 13.5, node B is the producer of some information that is relevant to (i.e., consumed by) nodes A and D. Node C is not interested in such data, which is therefore rejected by the filtering function (this is the default behavior of the FAF function).

13.2.6.2 Model for Device Interaction

The access technique of CAN makes this kind of network particularly suitable for distributed systems that communicate according to the producer–consumer model. In this case, data frames are used by the producer nodes to broadcast new values over the network, each of which is identified unambiguously by means of its identifier. Unlike networks based on the producer–consumer–arbiter model, such as the Factory Instrumentation Protocol (FIP), in CAN information is sent as soon as it becomes available from either the control applications or the controlled physical system (by means of sensors), without the intervention of a centralized arbiter. This noticeably improves the responsiveness of the whole system. CAN networks also work equally well when used to interconnect devices in systems based on a more conventional master–slave communication model. In this case, the master can use remote frames to ask for some specific information to be sent on the network. The producer of that information, as a consequence of this frame, replies with a data frame carrying the related object. It is worth noting that this kind of interaction is implemented in CAN in a considerably more flexible way than in conventional master–slave networks, such as PROFIBUS-DP. In CAN, in fact, it is not necessary for the reply (data frame) to follow the request (remote frame) immediately; in other words, the network is not kept busy while the device is preparing the reply. This theoretically makes the entire bandwidth available to the applications.
Furthermore, the reply containing the requested value is broadcast on the whole network, and hence it can be read by all the interested nodes, in addition to the one that transmitted the remote request.
13.2.7 Implementation

According to their internal architecture, CAN controllers can be classified into two categories: BasicCAN and FullCAN. Conceptually, BasicCAN controllers are provided with one transmit and one receive buffer, as in conventional UARTs. The frame-filtering function, in this case, is generally left to the application programs (i.e., it is under the control of the host controller), even though some filtering can be done by the controller. To avoid overrun conditions, a double-buffering scheme based on shadow receive buffers is usually available, which permits a new frame to be received from the bus while the previous one is being read by the host controller. An example of a controller based on the BasicCAN scheme is Philips' PCA82C200. FullCAN implementations foresee a number of internal buffers that can be configured to either receive or transmit particular messages. In this case, the filtering function is implemented directly in the
CAN controller. When a new frame that is of interest for the node is received from the network, it is stored in the related buffer, where it can then be read by the host controller. In general, new values simply overwrite the previous ones, and this does not lead to an overrun condition (the old value of a variable is superseded by a newer one). The Intel 82526 and 82527 CAN controllers are based on the FullCAN architecture. FullCAN controllers, in general, relieve the host controller of a number of activities, so they are considered more powerful than BasicCAN controllers. However, the most recent CAN controllers embed the operating principles of both architectures, so this classification is effectively being superseded.
13.3 Main Features of CAN

The medium access technique on which CAN relies basically implements a nonpreemptive, distributed, priority-based communication system, where each node is enabled to compete directly for bus ownership, so that it can send messages on its own (this means that CAN is a true multimaster system). This can be advantageous for use in event-driven systems.
13.3.1 Advantages

CAN is far simpler and more robust than token-based access schemes (such as PROFIBUS when used in multimaster configurations): there is no need to build or maintain the logical ring, nor to manage the circulation of the token among the master stations. In the same way, it is noticeably more flexible than solutions based on time-division multiple access (TDMA) or combined messages, two techniques adopted by SERCOS and INTERBUS, respectively, because message exchanges do not have to be known in advance. Compared to schemes based on centralized polling, such as FIP, it is not necessary to have a node in the network acting as the bus arbiter, which can become a single point of failure for the whole system. Since in CAN all the nodes are masters (at least from the point of view of the MAC mechanism), it is very simple for them to notify asynchronous events, such as alarms or critical error conditions. In all cases where this aspect is important, CAN is clearly better than the above-cited solutions. Thanks to the arbitration scheme, no message waiting for the bus can be delayed by lower-priority exchanges (a delay of this kind is known as priority inversion). Since the CAN protocol is nonpreemptive (as are almost all existing protocols), a message can still be delayed by a lower-priority one whose transmission has already started; this is unavoidable in any nonpreemptive system. However, as the frame size in CAN is very small (standard frames are at most 135 bits long, including stuff bits), the blocking time experienced by the most urgent messages is in general quite low. This makes CAN a very responsive network, which explains why it is used in many real-time control applications despite its relatively low bandwidth.
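The 135-bit bound quoted above, and the blocking it implies, can be checked numerically. The formula counts 47 framing and intermission bits, 8s data bits, and the worst-case number of stuff bits over the 34 + 8s stuffable bits; this is the bound commonly used in CAN schedulability analysis:

```python
def worst_case_frame_bits(data_bytes):
    """Worst-case length of a standard CAN data frame, stuff bits and the
    3-bit intermission included."""
    s = data_bytes
    return 8 * s + 47 + (34 + 8 * s - 1) // 4

def max_blocking_time_us(bit_rate_bps):
    """Longest time a top-priority message can wait behind a frame whose
    transmission has already started (nonpreemptive blocking)."""
    return worst_case_frame_bits(8) / bit_rate_bps * 1e6
```

At 1 Mbit/s the maximum blocking is therefore 135 microseconds, which is what makes CAN responsive despite being nonpreemptive.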
The above characteristics have to be considered carefully when assigning identifiers to the different objects exchanged in distributed real-time control applications. Intuitively, the most urgent messages (those with the tightest deadlines) should be assigned the lowest identifiers (identifier 0, for example, labels the message with the highest priority in any CAN network). If the period of the cyclic data exchanges (and the minimum interarrival time of the acyclic ones) is known in advance, a number of techniques based on the rate monotonic or deadline monotonic approaches have appeared in the literature [TIN] that can be used to find, if one exists, an assignment of identifiers to objects such that the resulting schedule is feasible (i.e., the deadlines of all objects are always met).
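As a rough illustration of such analyses, the sketch below assigns identifiers deadline-monotonically and then runs the classical fixed-point response-time iteration. It is a simplified version of the kind of analysis in [TIN], not a faithful implementation; the message set (names, periods, deadlines, frame lengths) is invented, and blocking is conservatively taken as one maximum-length frame for every message.

```python
# Deadline monotonic identifier assignment plus a fixed-point worst-case
# response-time iteration (simplified; message set is invented).
import math

BIT_TIME = 1e-6        # 1 Mbit/s
MAX_FRAME_BITS = 135   # longest standard frame, stuff bits included

msgs = [
    {"name": "alarm",  "period": 0.010, "deadline": 0.002, "bits": 75},
    {"name": "speed",  "period": 0.005, "deadline": 0.005, "bits": 95},
    {"name": "status", "period": 0.100, "deadline": 0.050, "bits": 135},
]

# Deadline monotonic: the tighter the deadline, the lower the identifier
# (and hence the higher the priority during arbitration).
msgs.sort(key=lambda m: m["deadline"])
for ident, m in enumerate(msgs):
    m["id"] = ident

def response_time(i):
    """Worst-case response time of msgs[i]; lower index = higher priority."""
    blocking = MAX_FRAME_BITS * BIT_TIME  # a lower-priority frame in progress
    w = blocking
    while True:
        w_next = blocking + sum(
            math.ceil((w + BIT_TIME) / msgs[j]["period"]) * msgs[j]["bits"] * BIT_TIME
            for j in range(i))
        if w_next == w:
            return w + msgs[i]["bits"] * BIT_TIME
        w = w_next

feasible = all(response_time(i) <= m["deadline"] for i, m in enumerate(msgs))
print("schedulable:", feasible)
```

For this small message set the iteration converges after one or two steps and all deadlines are met; a real analysis would also model jitter and per-message blocking terms.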
13.3.2 Drawbacks
A number of drawbacks affect CAN, the most important being related to performance, determinism, and dependability. Though they were initially considered mostly irrelevant, over time they have become quite limiting in a number of application fields.
© 2005 by CRC Press
Controller Area Network: A Survey
13-13
13.3.2.1 Performance
Even though inherently elegant, the arbitration technique of CAN poses serious limitations on the performance that can be obtained by the network. In order for the arbitration mechanism to operate correctly, the signal must be able to propagate from a node located at one end of the bus to the farthest node (at the other end) and come back before the originating node samples the level on the bus. Since the sampling point is located roughly past the middle of each bit (the exact position can be programmed by means of suitable registers), the end-to-end propagation delay, including the hardware delay of the transceivers, must be shorter than about one quarter of the bit time (the exact value depends on the bit timing configuration in the CAN controller). As the propagation speed of signals is fixed (about 200 m/µs on copper wires), the maximum length allowed for the bus is necessarily limited and depends directly on the bit rate chosen for the network. For example, a 250 Kbit/s CAN network can span at most 200 m, while the maximum bus length allowed at 1 Mbit/s is only 40 m. This, to some degree, explains why the maximum bit rate allowed by the CAN specifications [ISO1] has been limited to 1 Mbit/s. It is worth noting that this limitation depends on physical factors, and hence it cannot be overcome by advances in transceiver technology (by comparison, several inexpensive communication technologies are available today that allow bit rates on the order of tens or hundreds of Mbit/s). Even though this can appear to be a very limiting factor, it will probably not have any relevant impact in the near future in several application areas, including automotive and process control, where cheap and well-assessed technology is more important than performance.
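The bit-rate/bus-length trade-off described above can be checked with a back-of-the-envelope calculation. The quarter-bit budget and the 200 m/µs propagation speed come from the text; the transceiver delay below is an assumed, typical value, so the results only approximately reproduce the 200 m and 40 m figures quoted above.

```python
# Rough estimate of maximum CAN bus length versus bit rate, assuming the
# one-way propagation delay (cable plus transceivers) must stay below a
# quarter of the bit time.
PROP_SPEED_M_PER_US = 200.0    # signal propagation on copper
TRANSCEIVER_DELAY_US = 0.05    # assumed one-way transceiver contribution

def max_bus_length_m(bit_rate):
    bit_time_us = 1e6 / bit_rate
    budget_us = bit_time_us / 4 - TRANSCEIVER_DELAY_US  # cable propagation budget
    return max(0.0, budget_us * PROP_SPEED_M_PER_US)

for rate in (125_000, 250_000, 500_000, 1_000_000):
    print(f"{rate // 1000:>5} Kbit/s -> about {max_bus_length_m(rate):.0f} m")
```

The inverse relation is the key point: doubling the bit rate roughly halves the admissible bus length, regardless of transceiver quality.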
However, there is no doubt that in a few years CAN will suffer from the higher bit rates of its competitors, i.e., PROFIBUS-DP (up to 12 Mbit/s), SERCOS (up to 16 Mbit/s), INTERBUS (up to 2 Mbit/s), FlexRay (up to 10 Mbit/s), and the networks based on Industrial Ethernet (up to 100 Mbit/s). Such solutions are able to provide a noticeably higher data rate, which is necessary for systems with many devices and very short cycle times (1 ms or less).
13.3.2.2 Determinism
Because of its nondestructive bitwise arbitration scheme, CAN is able to resolve in a deterministic way any collision that might occur on the bus. However, if nodes are allowed to produce asynchronous messages on their own (which is the way event-driven systems usually operate), there is no way to know in advance the exact time a given message will be sent, because it is not possible to foresee the actual number of collisions a node will experience with higher-priority messages. This behavior leads to potentially dangerous jitters, which in some kinds of applications, such as those in the automotive field, might affect the control algorithms negatively and worsen their precision. In particular, some messages might miss their intended deadlines. Related to determinism is the problem that composability is not ensured in CAN networks: when several subsystems are connected to the same network, the overall system may fail to satisfy some timing requirement, even though each subsystem was tested separately and proved to behave correctly. This is a severe limitation on the ability to integrate subsystems from different vendors, and hence makes design tasks more difficult.
13.3.2.3 Dependability
The last drawback of CAN concerns dependability.
Whenever safety-critical applications are considered, where a communication error may lead to damage to the equipment or even injuries to human beings (as, for example, in automotive x-by-wire systems), a highly dependable network has to be adopted. Reliable error detection should be achieved both in the value and in the time domain. In the former case, conventional techniques, such as the use of a suitable CRC, are adequate. In the latter case, a time-triggered approach is certainly more appropriate than the event-driven communication scheme provided by CAN. In time-triggered systems all actions (including message exchanges, sampling of sensors, actuation of commanded values, and task activations) are known in advance and must take place at precise points in time. In this context even the presence (or absence) of a message at a given instant provides significant information (i.e., it enables the discovery of faults). Also related to dependability is the so-called babbling idiot problem, from which a CAN system might suffer: a faulty node that repeatedly transmits a very high priority message can block the whole network. Such a failure cannot be detected by the fault confinement unit embedded in CAN chips, as it does not depend on physical faults, but is due to logical errors.
13.3.3 Solutions
Among the solutions conceived to enhance the behavior of CAN is the time-triggered CAN (TTCAN) protocol [ISO4], for which the first chips are already available. By adopting a common clock and a time-triggered approach, it is possible to reduce jitters and provide fully deterministic behavior. If asynchronous transmissions are not allowed in the system (which means that the arbitration technique is not actually used), TTCAN effectively behaves like a TDMA system, and there is no particular limitation on the bit rate (which could be increased above the theoretical limit of CAN). However, such a solution is generally not advisable, in that the behavior of the resulting network becomes noticeably different from CAN. Other solutions for improving CAN performance have appeared in the literature, such as WideCAN [WCAN], which provide higher bit rates while still relying on the conventional CAN arbitration technique. However, at present their interest is mainly theoretical.
13.4 Time-Triggered CAN
The time-triggered CAN protocol was introduced by Bosch in 1999 with the aim of making CAN suitable for the new needs of the automotive industry. However, it can be profitably used in any application characterized by tight timing requirements that demand strictly deterministic behavior. In TTCAN, in fact, it is possible to decide exactly the point in time when safety-critical messages will be exchanged, irrespective of the network load. Moreover, composability is much improved with respect to CAN, so that it is possible to split a system into several subsystems that can be developed and tested separately. The TTCAN specification is now stable and is being standardized by ISO [ISO4]. The main reason that led to the definition of TTCAN was the need to provide improved communication determinism while maintaining the highest degree of compatibility with existing CAN devices and development tools. In this way, noticeable savings in the investments made in the communication technology can be achieved.
13.4.1 Main Features
One of the most appealing features of TTCAN is that it allows event-driven and time-triggered operations to coexist in the same network. To ease migration from CAN, TTCAN foresees two levels of implementation, known as level 1 and level 2. Level 1 implements basic time-triggered communications over CAN. Level 2, a proper extension of level 1, also offers a means for maintaining a global system time across the whole network, irrespective of the tolerances and drifts of the local oscillators. This enables high-end synchronization, and hence true time-triggered operations can take place in the system. The TTCAN protocol is placed above the (unchanged) CAN protocol. It allows time-triggered exchanges to take place in a quasi-conventional CAN network. Because TTCAN relies directly on CAN (they adopt the same frame format and the same transmission protocol), it suffers from the same performance limitations as the underlying technology. In particular, it is not practically feasible to increase the transmission speed above 1 Mbit/s. However, because of the time-triggered paradigm it relies on, TTCAN is able to ensure strictly deterministic communications, which means, for example, that it is suitable for the first generation of drive-by-wire automotive systems, which are provided with hydraulic/mechanical backups. However, it will likely be unsuitable for the next generation of steer-by-wire applications, where the required bandwidth is noticeably higher.
13.4.2 Protocol Specification
TTCAN is based on a centralized approach, in which a special node called the time master (TM) keeps the whole network synchronized by regularly broadcasting a reference message (RM), usually implemented as a high-priority CAN message. Redundant time masters can be envisaged to provide increased reliability. Whenever it receives an RM, each node restarts its cycle timer, so that a common view of the elapsing time is ensured across the whole network. In practice, every time an SOF bit is read on the bus, a synchronization event is generated in every network controller that causes the local time to be copied into a sync mark register. If the SOF bit belongs to a valid reference message, the sync mark register is then loaded into the reference mark register. At this point, the cycle time is evaluated as the difference between the current local time and the reference mark. Two kinds of RM are foreseen: in level 1 implementations the RM is 1 byte long, whereas level 2 relies on a 4-byte RM that is backward compatible with level 1 (in practice, 3 bytes are added for distributing the global time as seen by the time master). Protocol execution is driven by the progression of the cycle time. In particular, a number of time marks are defined in each network controller as either transmission or receive triggers, which are used for sending messages and validating message receptions, respectively. In TTCAN each node does not have to know all the messages in the network; only details of the messages the node sends or reads are needed. Transmission of data is organized as a sequence of basic cycles (BCs). Each basic cycle begins with the reference message, which is followed by a fixed number of time windows that are configured offline and can be of the following three types:
• Exclusive windows: Each exclusive window is statically reserved for a predefined message, so that collisions cannot occur.
They are used for safety-critical data that have to be sent deterministically and without jitters.
• Arbitration windows: Such windows are not preallocated to any given message; different competing messages rely on the nondestructive CAN arbitration scheme to resolve any collision that might occur.
• Free windows: They are reserved for future expansions of TTCAN systems.
So that time windows are not exceeded, in TTCAN controllers it should be possible to disable the automatic retransmission feature of CAN when either the contention is lost or transmission errors are detected. The only exception occurs when several adjacent arbitration windows exist; in this case, they can be merged into a single larger window, which can accommodate asynchronously generated messages in a more flexible way. Although it seamlessly mixes synchronous (exclusive) and asynchronous (arbitration) messages, TTCAN is very dependable: should a temporary lack of synchronization cause more than one node to transmit in the same exclusive window, the arbitration scheme of CAN is used to resolve the collision. For increased flexibility, it is possible to have more than one basic cycle. A system matrix can be defined that consists of up to 64 different BCs, which are repeated periodically (see Figure 13.6). The effective periodicity in TTCAN is thus given by the so-called matrix cycle. A cycle counter, included in the first byte of the RM, is used by every node to determine the current basic cycle. It is incremented each cycle up to a maximum value (selected on a network-wide basis before operation is started), after which it restarts. It should be noted that the system matrix is highly column oriented: each BC is made up of the same sequence of time windows, i.e., corresponding windows in different BCs have the same duration. However, they can be used to convey different messages, depending on the cycle counter.
FIGURE 13.6 System matrix in TTCAN.
In this way, it is possible to have messages in exclusive time windows that are repeated once every given number of BCs. In this case, each message is assigned a repeat factor and a cycle offset, which characterize its transmission schedule. In the same way, it is possible to have more than one exclusive window in the BC allocated to the same message. This is useful either to replicate critical data or to obtain a refresh rate for some variables that is faster than the basic cycle.
13.4.3 Implementation
TTCAN requires slight and inexpensive changes to current CAN chips. In particular, transmit and receive triggers and a counter for managing the cycle time are needed to ensure time-triggered operations. Even though level 1 could be implemented in software, specialized hardware support can noticeably reduce the burden placed on the processor by the management of time-triggered operations. As level 2-compliant controllers should allow drift correction and calibration of the local time, they need modified hardware. The structure of TTCAN modules is very similar to that of conventional CAN modules, with two additional blocks: the trigger memory and the frame synchronization entity. The former stores the time marks of the system matrix, which are linked to the message buffers held in the controller's memory. The latter controls the time-triggered communications. At present, several controllers that comply with the TTCAN specifications are available off the shelf, so that this protocol can be readily embedded in new projects.
13.5 CAN-Based Application Protocols
To reduce the costs of designing and implementing automated systems, a number of higher-level application protocols have been defined in the past few years that rely on the CAN data link layer to exchange messages among the nodes (all the functions of the CAN data link layer are implemented directly in hardware in current CAN controllers, which increases the efficiency and reliability of the data exchanges). The aim of such protocols is to provide a usable and well-defined set of service primitives that can be used to interact with field devices in a standardized way. At present, two of the most widely available solutions for the process control and automated manufacturing environments are CANopen [COP] and DeviceNet [DNET]. Both define an object model that describes the behavior of devices. This permits interoperability and interchangeability among devices from different manufacturers: as long as a device conforms to a given profile, it can be used in place of any other device (of a different brand) that adheres to the same profile.
13.5.1 CANopen
CANopen was originally conceived to rely on the communication services provided by the CAN application layer (CAL). However, the latest specifications [DS301] no longer refer explicitly to CAL; the relevant communication services have been embedded directly in the CANopen documents. In CANopen, information is exchanged by means of communication objects (COBs). A number of different COBs are foreseen, aimed at different functions:
• Process data objects (PDOs), used for real-time exchanges, such as measurements read from sensors and commanded values sent to the actuators for controlling the physical system
• Service data objects (SDOs), used for non-real-time communications, i.e., parameterization of devices and diagnostics
• Emergency objects (EMCY), used by devices to notify the control application that some error condition has occurred
• Synchronization object (SYNC), used to achieve synchronized and coordinated operations in the system
Even though in principle every CAN node is a master, at least from the point of view of the MAC mechanism, CANopen systems often rely on a master–slave approach, so as to simplify system configuration and network management. In most cases, a CANopen network contains only one application master (which is responsible for actually controlling the operations of the automated system) and up to 127 slave devices (sensors and actuators). Each device is identified by means of a unique 7-bit address, called the node identifier, which lies in the range 1 to 127. The node identifier 0 is generally used for broadcast communications. To ease network configuration, a predefined master–slave connection set has to be provided mandatorily by every CANopen device. It is a standard allocation scheme of identifiers to COBs that is available directly after initialization, when a node is switched on or reset, provided that no modifications have been stored in the nonvolatile memory of the device.
COB identifiers in the predefined connection set are made up of a function code, which takes the four most significant bits of the CAN identifier, followed by the node address. The function code, which mainly determines the priority of the COB, is used to discriminate among the different kinds of COBs, that is, PDOs, SDOs, EMCYs, network management (NMT) functions, and so on.
13.5.1.1 Object Dictionary
The behavior of any CANopen device is described completely by means of a number of objects, each one tackling a particular aspect related either to the communications on the CAN bus or to the functions available for interacting with the physical controlled system (for example, there are objects that define the device type, the manufacturer's name, the hardware and software versions, and so on). All the objects relevant to a given node are stored in the object dictionary (OD) of that node. Entries in the OD are addressed by means of a 16-bit index. Each entry, in turn, can either be represented by a single value or consist of several components (such as arrays and records) that are accessible through an 8-bit subindex. The object dictionary is split into four separate parts, according to the index of the entries. Entries below 1000H are used to specify data types. Entries from 1000H to 1FFFH describe communication-specific parameters (i.e., the interface of the device as seen by the CAN network). Entries from 2000H to 5FFFH can be used by manufacturers to extend the basic set of functions of their devices; their use has to be considered carefully, in that they could make devices no longer interoperable. Finally, entries from 6000H to 9FFFH describe in a standardized way all the aspects related to a specific category of devices (as defined in a device profile).
13.5.1.2 Process Data Objects
All the real-time process data involved in controlling a physical system are exchanged in CANopen by means of PDOs.
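The COB-ID layout of the predefined connection set described above (4-bit function code followed by the 7-bit node identifier) can be sketched as follows. The function codes listed are the well-known CANopen defaults; only a few representative entries are shown.

```python
# Sketch of the 11-bit COB-ID construction in the CANopen predefined
# connection set: 4-bit function code in the most significant bits,
# 7-bit node identifier in the least significant ones.
FUNCTION_CODES = {
    "EMCY":   0x1,   # emergency object
    "TPDO1":  0x3,   # first transmit PDO
    "RPDO1":  0x4,   # first receive PDO
    "SDO_TX": 0xB,   # SDO, server (device) to client
    "SDO_RX": 0xC,   # SDO, client to server (device)
}

def cob_id(function, node_id):
    assert 1 <= node_id <= 127, "node identifiers lie in the range 1 to 127"
    return (FUNCTION_CODES[function] << 7) | node_id

print(hex(cob_id("TPDO1", 5)))   # 0x185: first transmit PDO of node 5
print(hex(cob_id("SDO_RX", 5)))  # 0x605: SDO request to node 5
```

Because the function code occupies the most significant bits, lower function codes automatically win arbitration: an EMCY of any node beats every PDO, and every PDO beats every SDO.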
Each PDO is mapped onto exactly one CAN frame, so that it can be exchanged quickly and reliably. As a direct consequence, the amount of data that can be exchanged with one PDO is limited to at most 8 bytes. In most cases, this is more than sufficient to encode an item of process data. According to the predefined connection set, each node in CANopen can define up to four receive PDOs (from the application master to the device) and four transmit PDOs (from the device to the application master). In case more PDOs are needed, the PDO communication parameter entries in the OD of the device can be used to define additional messages, or to change the existing ones. By using the PDO mapping parameter (if supported by the device), it is even possible to define in the configuration phase which application objects (i.e., process variables) will be included in each PDO. The transmission of PDOs from the slave devices can be triggered by some local event taking place on the node, including the expiration of a time-out, or it can be remotely requested by the master. This gives system designers a very high degree of flexibility in choosing how devices interact in the automated system, and enables the features offered by intelligent devices to be better exploited. No additional control information is added to PDOs by CANopen, so communication efficiency is as high as in CAN. This means that the meaning of each PDO is determined directly by the related identifier. As multicasting is allowed on PDOs, their transmission is unconfirmed; i.e., the producer has no way to determine whether the PDO has been read by all the intended consumers. One noticeable feature of CANopen is that it can provide synchronous operations. In particular, it is possible to configure the transmission type of each single PDO so that its exchanges are driven by the occurrence of the SYNC message, which is sent regularly by a node known as the sync master (usually the same node as the application master).
Synchronous data exchanges, in this case, take place in periodic communication cycles. When synchronous operations are selected, commanded values are not actuated by the devices as soon as they are received, nor are sampled values transmitted immediately. Instead, as depicted in Figure 13.7, each time a SYNC message is read from the network, the PDOs received in the previous communication cycle are actuated by every output device. At the same time, all sensors sample their input ports, and the measured values are sent as soon as possible in the next cycle. A synchronous window length parameter can be defined that specifies the latest time by which it is certain that all commanded values have been made available to the devices; after that time, the processing of output values can be started. Synchronous operations provide a noticeable improvement with respect to jitters: in this case, system operations and timings are decoupled from the actual times PDOs are exchanged over the network. As the SYNC message is mapped on a high-priority frame, the jitter is, at worst, equal to the duration of the longest CAN message.
FIGURE 13.7 Synchronous operation.
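The worst-case SYNC jitter bound mentioned above is easy to quantify: it equals the duration of the longest standard CAN frame (135 bits, stuff bits included, per the figure given earlier in this chapter) at the chosen bit rate.

```python
# Worst-case SYNC jitter: the high-priority SYNC frame can be delayed at
# most by one maximum-length frame whose transmission has already started.
MAX_FRAME_BITS = 135  # longest standard data frame, stuff bits included

def worst_case_sync_jitter_us(bit_rate):
    return MAX_FRAME_BITS / bit_rate * 1e6

for rate in (125_000, 250_000, 500_000, 1_000_000):
    print(f"{rate // 1000:>5} Kbit/s: SYNC jitter <= {worst_case_sync_jitter_us(rate):.0f} us")
```

At 1 Mbit/s the bound is 135 µs, which is usually negligible with respect to the communication cycle period.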
13.5.1.3 Service Data Objects
SDOs are used in CANopen for parameterization and configuration, which usually take place at a lower priority than process data (hence, they are effectively considered non-real-time exchanges). In this case a confirmed transmission service has to be provided, which ensures a reliable exchange of information. Furthermore, SDOs are only available on a peer-to-peer communication basis (multicasting is not allowed). A fragmentation protocol, derived from the domain transfer services of CAL, has been adopted for SDOs so that information of any size can be exchanged. This means that the SDO sender has to split the information into smaller chunks, which are then reassembled at the receiving side. This affects the communication efficiency negatively; however, as SDOs are not used for the real-time control of the system, this is not a problem. SDOs are used to access the entries of the object dictionary directly, so that they can be read or modified by the configuration tools. From a practical point of view, two services are provided, which are used to upload and download the content of one subentry of the OD, respectively. According to the predefined connection set, each node must provide SDO server functionality and has to define a pair of COB IDs for dealing with OD access, one for each direction of transfer. In a CANopen network only one SDO client at a time is usually allowed (in reality, what is needed is that all SDO connections between clients and servers be defined statically). It is optionally possible to provide dynamic establishment of additional SDO connections by means of a network entity called the SDO manager.
13.5.1.4 Network Management
There are two kinds of functions related to network management (NMT): node control and error control. Node control services are used to control the operation of either a single node or the whole network.
For example, they can be used to start or stop nodes, to reset their state, or to put a node into configuration (preoperational) mode. Such commands are definitely time critical, and hence they use the highest-priority communication object available in CAN. Error control services are used to monitor the correct operation of the network. Two mechanisms are available: node guarding and heartbeat. In both cases, low-priority messages are exchanged periodically in the background by the different nodes, and suitable watchdogs are defined both in the NMT master and in the slave nodes. Should a device cease sending these messages, after a given timeout the network management layer is made aware of the problem and can take the appropriate actions.
13.5.1.5 Device Profiles
In order to provide interoperability, a number of device profiles have been standardized in CANopen. Each profile describes the common behavior of a particular class of devices and is usually described in a separate document. Among the available profiles are the following:
• I/O devices [DS401], which include both digital and analog input/output devices
• Drives and motion control, which describe digital motion products, such as stepper motors and servo drives
• Human–machine interfaces, which describe the use of displays and operator interfaces
• Measuring devices and closed-loop controllers, which measure and control physical quantities
• IEC 61131-3 programmable devices, which describe the behavior of programmable logic controllers (PLCs) and intelligent devices
• Encoders, which define incremental/absolute linear and rotary encoders measuring both position and velocity
The I/O device profile, for instance, permits the definition of the polarity of each digital input/output port, or the application of a filtering mask for disabling selected bits. Device ports can be accessed in groups of 1, 8, 16, or 32 bits.
For analog devices, it is possible to use the raw value or a converted one (obtained after a scaling factor and an offset have been applied), or to define triggering conditions that fire when specific thresholds are exceeded.
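The analog-device options just mentioned (scaling factor, offset, threshold triggering) amount to a linear conversion plus a comparison. The parameter values in the sketch below are invented.

```python
# Illustration of the analog I/O profile options: raw-to-engineering-unit
# conversion via scaling factor and offset, plus an upper-threshold trigger.
SCALE = 0.125       # engineering units per raw count (assumed)
OFFSET = -40.0      # e.g., a temperature sensor whose range starts at -40
UPPER_THRESHOLD = 85.0

def convert(raw):
    return raw * SCALE + OFFSET

def check(raw):
    """Return the converted value and whether the threshold trigger fires."""
    value = convert(raw)
    return value, value > UPPER_THRESHOLD

print(check(1000))  # (85.0, False)
print(check(1001))  # (85.125, True)
```

In a real device the scaling parameters and threshold would themselves live in OD entries, so a configuration tool can adjust them over SDO.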
13.5.2 DeviceNet
DeviceNet [DNET] is a very flexible protocol for use at the field level in automated environments. The implementation of devices that comply with DeviceNet is, in general, slightly more complex than that of CANopen devices. However, DeviceNet offers a number of additional features with respect to CANopen, which can be used, for example, in complex multimaster networks. One appealing feature of DeviceNet is that it is based on the same Control and Information Protocol (CIP) adopted by ControlNet and EtherNet/IP. This means that a good level of interoperability is ensured among these networks, making it possible to interconnect them and provide seamless communications from the devices on the plant floor up to the Internet. In addition to the services at the application level, the DeviceNet specification also defines the physical layer in detail, including aspects such as connectors and cables (thin, thick, and flat cables are foreseen). It should be noted that the DeviceNet cable can carry both the signal and the power supply (by using four wires plus ground). Each DeviceNet network can include up to 64 different devices, which means that each node is identified by means of a 6-bit MAC ID. The allowed bit rates are limited to 125, 250, and 500 Kbit/s, for which the permitted maximum bus extensions lie in the range of 100 to 500 m.
13.5.2.1 Object Model
The behavior and functions of each device are described in detail in DeviceNet by means of objects. In particular, three kinds of objects are foreseen: communication, system, and application-specific objects. Two very important objects are the connection object, which defines all the aspects related to a connection (including the CAN identifier and the triggering mode), and the application object, which defines the standardized behavior of a class of devices.
Data and services made available by each device are addressed by means of a hierarchical addressing scheme based on the following components: MAC ID (i.e., the device's address), class ID, instance ID, attribute ID, and service code. The class, instance, and attribute identifiers are usually specified on 8 bits, while the service code is a 7-bit integer.
13.5.2.2 Communication Model
Communication among nodes (either point-to-point or multicast) takes place according to a connection-oriented scheme. By using the standard 11-bit CAN identifier, it is possible to provide an addressing scheme based on four message groups, in decreasing order of priority:
• Message group 1 includes the highest-priority identifiers and permits up to 16 different messages per node.
• Message group 2 essentially refers to the predefined master–slave connection set.
• Message group 3 is similar to message group 1, but it is made up of low-priority frames.
• Message group 4 is primarily used for network management.
Basically, two kinds of communication are possible: explicit messages and I/O messages. Explicit messages are used for general data exchanges among devices, such as configuration, management, and diagnostics; these kinds of exchanges take place on the network at a low priority. I/O messages are used to exchange high-priority real-time messages according to the producer–consumer model. Because the underlying communication system is based on a CAN network, each frame can include at most 8 bytes. Should an item of data exceed this size, a fragmentation protocol defined in DeviceNet manages the message splitting and the subsequent reassembly.
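The fragmentation idea can be sketched as below: payloads longer than the 8-byte CAN limit are split into numbered fragments, each carrying a 1-byte header ahead of up to 7 data bytes, and reassembled in order at the receiver. The header layout used here (a fragment type and a rolling count) is only illustrative; consult the DeviceNet specification for the exact encoding.

```python
# Simplified sketch of splitting an oversized payload into CAN-sized
# fragments and reassembling them (illustrative header encoding).
FIRST, MIDDLE, LAST = 0, 1, 2  # fragment types (assumed encoding)

def fragment(payload: bytes):
    chunks = [payload[i:i + 7] for i in range(0, len(payload), 7)]
    frames = []
    for count, chunk in enumerate(chunks):
        ftype = FIRST if count == 0 else (LAST if count == len(chunks) - 1 else MIDDLE)
        header = (ftype << 6) | (count & 0x3F)
        frames.append(bytes([header]) + chunk)  # at most 8 bytes per frame
    return frames

def reassemble(frames):
    # Strip the 1-byte header from each fragment and concatenate the rest.
    return b"".join(f[1:] for f in frames)

data = bytes(range(20))
frames = fragment(data)
assert all(len(f) <= 8 for f in frames)
assert reassemble(frames) == data
```

Note the efficiency cost mentioned in the text: each fragment sacrifices one of the 8 available bytes to the header, so a 20-byte item needs three frames instead of the theoretical two and a half.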
References
[COP] European Committee for Electrotechnical Standardization, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 4: CANopen, EN 50325-4, 2001.
© 2005 by CRC Press
Controller Area Network: A Survey
[DNET] European Committee for Electrotechnical Standardization, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 2: DeviceNet, EN 50325-2, 2000.
[DS102] CAN in Automation International Users and Manufacturers Group e.V., CAN Physical Layer for Industrial Applications: Two-Wire Differential Transmission, CiA DS 102, version 2.0, 1994.
[DS301] CAN in Automation International Users and Manufacturers Group e.V., CANopen: Application Layer and Communication Profile, CiA DS 301, version 4.02, 2002.
[DS401] CAN in Automation International Users and Manufacturers Group e.V., CANopen: Device Profile for Generic I/O Modules, CiA DS 401, version 2.1, 2002.
[ISO1] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 1: Data Link Layer and Physical Signalling, ISO 11898-1, 2003.
[ISO2] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 2: High-Speed Medium Access Unit, ISO 11898-2, 2003.
[ISO3] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 3: Low-Speed, Fault-Tolerant, Medium Dependent Interface, TC 22/SC 3/WG 1, ISO/PRF 11898-3, 2003.
[ISO4] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 4: Time-Triggered Communication, ISO 11898-4, 2004.
[TIN] Tindell, K.W., Burns, A., and Wellings, A.J., Calculating Controller Area Network (CAN) message response times, Control Engineering Practice, 3, 1163–1169, 1995.
[WCAN] Cena, G. and Valenzano, A., A multistage hierarchical distributed arbitration technique for priority-based real-time communication systems, IEEE Transactions on Industrial Electronics, 49, 1227–1239, 2002.
14
The CIP Family of Fieldbus Protocols

Viktor Schiffer
Rockwell Automation

14.1 Introduction
14.2 Description of CIP
    Object Modeling • Services • Messaging Protocol • Communication Objects • Object Library • Device Profiles • Configuration and Electronic Data Sheets • Bridging and Routing • Data Management
14.3 Network Adaptations of CIP
    DeviceNet • ControlNet • EtherNet/IP
14.4 Benefits of the CIP Family
    Benefits for the Manufacturer of Devices • Benefits for the Users of Devices and Systems
14.5 Protocol Extensions under Development
    CIP Sync • CIP Safety
14.6 Conclusion
References
14.1 Introduction

In the past, typical fieldbus protocols (e.g., Profibus, Interbus-S, FIP (Factory Instrumentation Protocol), P-Net, AS-i (Actuator/Sensor Interface)) have been isolated implementations of certain ideas and functionalities that the inventors thought were best suited to solve a certain problem or do a certain job. This has led to quite effective fieldbuses that do their particular job well, but they are optimized for certain layers within the automation pyramid or are limited in their functionality (e.g., strict single-master systems running a Master/Slave protocol). This typically results in barriers within the automation architecture that are difficult to penetrate and that require complex gateway devices without being able to fully bridge the gap between the various systems, which can be quite different in nature. In contrast, the CIP™* family of protocols (CIP = Common Industrial Protocol) offers a scalable solution that allows a uniform protocol to be employed from the top level of an automation architecture down to the device level without burdening the individual devices. DeviceNet™*, introduced in 1994, is the first member of this protocol family. DeviceNet is a CIP implementation using the very popular Controller Area Network (CAN) data link layer. CAN in its typical form (ISO 11898 [11]) defines only layers 1 and 2 of the OSI seven-layer model [14], while DeviceNet covers the rest. The low cost of implementation and the ease of use of the DeviceNet protocol have led to adoption by a large number of manufacturers, many of them organized in the Open DeviceNet Vendor Association (ODVA; see http://www.odva.org).

*CIP™ and DeviceNet™ are trademarks of ODVA.
The Industrial Communication Technology Handbook
FIGURE 14.1 Relationship between CIP, its implementations, and the ISO/OSI layer model. [The figure maps the common CIP application layer (user device profiles such as semiconductor, pneumatic valves, position controller, and AC drives; the application object library; data management services; message routing and connection management for Explicit and I/O Messages) onto the network-specific transport, data link, and physical layers: DeviceNet over CAN CSMA/NBA, ControlNet over CTDMA, and EtherNet/IP over TCP/UDP/IP and Ethernet CSMA/CD, with possible future alternatives such as ATM, USB, and FireWire.]
ControlNet™,* introduced a few years later (1997), implemented the same basic protocol on a new data link layer that allows for much higher speed (5 Mbps), strict determinism, and repeatability, while extending the range of the bus (several kilometers with repeaters) for more demanding applications. Vendors and users of ControlNet products are organized within ControlNet International (CI; see http://www.controlnet.org) to promote the use of these products. In 2000, ODVA and ControlNet International introduced the newest member of the CIP family — EtherNet/IP™,† where IP stands for Industrial Protocol. In this network adaptation, CIP runs over TCP/IP and therefore can be deployed over any data link and physical layers supported by the Transmission Control Protocol (TCP)/Internet Protocol (IP), the most popular of which is IEEE 802.3 [12], commonly known as Ethernet. The universal principles of CIP easily lend themselves to possible future implementations on new physical/data link layers, e.g., ATM, USB, or FireWire. The overall relationship between the three implementations of CIP and the ISO/OSI layer model is shown in Figure 14.1. Two significant additions to CIP are currently being worked on: CIP Sync™ and CIP Safety™.‡ CIP Sync allows synchronization of applications in distributed systems through precision real-time clocks in all devices. These real-time clocks are kept in tight synchronization by background messages between clock masters and clock slaves using the new IEEE 1588:2002 standard [24]. A more detailed description of this CIP extension is given in Section 14.5.1. CIP Safety is a protocol extension that allows the transmission of safety-relevant messages. Such messages are governed by additional timing and integrity mechanisms that are guaranteed to detect system flaws to a very high degree, as required by international standards such as IEC 61508 [15].
If anything goes wrong, the system will be brought to a safe state, typically taking the machine to a standstill. A more detailed description of this CIP extension is given in Section 14.5.2. In both cases, ordinary devices can operate with CIP Sync or CIP Safety devices side by side in the same system. There is no need for strict segmentation into Standard, Sync, and Safety networks. It is even possible to have any combination of all three functions in one device.

*ControlNet™ is a trademark of ControlNet International.
†EtherNet/IP™ is a trademark of ControlNet International under license by ODVA.
‡CIP Sync™ and CIP Safety™ are trademarks of ODVA.
14.2 Description of CIP

CIP is a very versatile protocol that has been designed with the automation industry in mind. However, due to its very open nature, it can be applied to many more areas. The overall CIP Specification is divided into several volumes:

• Volume 1 is the CIP Specification. It contains all general parts of the specification that apply to all the network variants.
• Volume 2 is the EtherNet/IP Specification. It contains the adaptation of CIP to the Ethernet TCP/IP and User Datagram Protocol (UDP)/IP transportation layers and all details that apply specifically to EtherNet/IP, including extensions and any modifications of the CIP Specification.
• Volume 3 is the DeviceNet Specification. It contains the adaptation of CIP to the CAN data link layer and all details that apply specifically to DeviceNet, including extensions and any modifications of the CIP Specification.
• Volume 4 is the ControlNet Specification. It contains the adaptation of CIP to the ControlNet data link layer and all details that apply specifically to ControlNet, including extensions and any modifications of the CIP Specification.
• Volume 5 will contain CIP Safety; it is planned to be published in early 2005.

The CIP Specification [4] is available from ODVA. It is beyond the scope of this handbook to fully describe every detail of this specification, but the key features will be presented. The specification is subdivided into several chapters and appendices that describe the following features:

• Object modeling
• Messaging protocol
• Communication objects
• General object library
• Device profiles
• Electronic Data Sheets
• Services
• Bridging and routing
• Data management
There are a few more chapters containing descriptions of further CIP elements, but they are not of significance in the context of this book. A few terms used throughout this section should be described here to ensure they are well understood:

• Client: Within a Client/Server architecture, the client is the device that sends a request to a server. The client expects a response from the server.
• Server: Within a Client/Server architecture, the server is the device that receives a request from a client. The server is expected to give a response to the client.
• Producer: Within a Producer/Consumer architecture, the producing device places a message on the network for consumption by one or several consumers. The produced message is in general not directed to a specific consumer.
• Consumer: Within a Producer/Consumer architecture, the consumer is one of potentially several consuming devices that pick up a message placed on the network by a producing device.
• Producer/Consumer model: CIP makes use of the Producer/Consumer model as opposed to the traditional Source/Destination message addressing scheme (Figure 14.2). It is inherently multicast. Nodes on the network determine whether they should consume the data in a message based on the Connection ID in the packet.
FIGURE 14.2 Source/Destination vs. Producer/Consumer model. [A Source/Destination frame carries src | dst | data | crc fields; a Producer/Consumer frame carries only identifier | data | crc.]
• Explicit Message: Explicit Messages contain addressing and service information that directs the receiving device to perform a certain service (action) on a specific part (e.g., an attribute) of a device.
• Implicit (Input/Output (I/O)) Message: Implicit Messages do not carry address or service information; the consuming node(s) already know what to do with the data based on the Connection ID that was assigned when the connection was established. They are called Implicit Messages because the meaning of the data is implied by the Connection ID.

Let us now have a look at the individual elements of CIP.
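The Producer/Consumer behavior described above can be sketched in a few lines: a produced packet carries only a Connection ID, and each node decides locally whether to consume it. The class names below are hypothetical and the "network" is simply a Python list:

```python
class Consumer:
    """A node that consumes a produced packet only if it recognizes the Connection ID."""
    def __init__(self, subscribed_ids):
        self.subscribed = set(subscribed_ids)
        self.received = {}

    def on_packet(self, connection_id: int, data: bytes) -> bool:
        if connection_id not in self.subscribed:
            return False          # not for us: no destination address is needed
        self.received[connection_id] = data
        return True

def produce(consumers, connection_id: int, data: bytes) -> int:
    """Place one packet 'on the wire'; every interested node picks it up (multicast).
    Returns the number of nodes that consumed it."""
    return sum(c.on_packet(connection_id, data) for c in consumers)
```
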
14.2.1 Object Modeling

CIP makes use of abstract object modeling to describe:

• The suite of available communication services
• The externally visible behavior of a CIP node
• A common means by which information within CIP products is accessed and exchanged

Every CIP node is modeled as a collection of objects. An object provides an abstract representation of a particular component within a product. Anything not described in object form is not visible through CIP. CIP objects are structured into classes, instances, and attributes. A class is a set of objects that represent the same kind of system component. An object instance is the actual representation of a particular object within a class. Each instance of a class has the same attributes, but it has its own particular set of attribute values. As Figure 14.3 illustrates, multiple object instances within a particular class can reside within a CIP node. In addition to the instance attributes, an object class may also have class attributes. These are attributes that describe properties of the whole object class, e.g., how many instances of this particular object exist. Furthermore, both object instances and the class itself exhibit a certain behavior and allow certain services to be applied to the attributes, instances, or whole class. All publicly defined objects that are implemented in a device must follow at least the mandatory requirements of the CIP Specification. Vendor-specific objects may also be defined with a set of instances, attributes, and services according to the requirements of the vendor. However, they need to follow certain rules described in Chapter 4 of the CIP Specification [4]. The objects and their components are addressed by a uniform addressing scheme consisting of:

• Node Identifier (Node ID): An integer identification value assigned to each node on a CIP network. On DeviceNet and ControlNet, this is also called the MAC ID (Media Access Control Identifier) and is nothing more than the node number of the device. On EtherNet/IP, the Node ID is the IP address.
• Class Identifier (Class ID): An integer identification value assigned to each object class accessible from the network.
• Instance Identifier (Instance ID): An integer identification value assigned to an object instance that identifies it among all instances of the same class.
FIGURE 14.3 A class of objects. [A CIP node contains a class of objects with multiple object instances.]
• Attribute Identifier (Attribute ID): An integer identification value assigned to a class or instance attribute.
• Service Code: An integer identification value that denotes an action request that can be directed at a particular object instance or object class (see Section 14.2.2).

Object Class Identifiers are divided into open objects, defined in the CIP Specifications (ranging from 0x00 to 0x63 and 0x00F0 to 0x02FF), and vendor-specific objects (ranging from 0x64 to 0xC7 and 0x0300 to 0x04FF); all other Class Identifiers are reserved for future use. In some cases, e.g., within the Assembly Object class, Instance Identifiers are divided into open instances, defined in the CIP Specifications (ranging from 0x00 to 0x63 and 0x0100 to 0x02FF), and vendor-specific instances (ranging from 0x64 to 0xC7 and 0x0300 to 0x04FF); all other Instance Identifiers are reserved for future use. Attribute Identifiers are divided into open attributes, defined in the CIP Specifications (ranging from 0x00 to 0x63), and vendor-specific attributes (ranging from 0x64 to 0xC7); the other Attribute Identifiers are reserved for future use. Vendor-specific objects can be created with a lot of freedom, but they still have to adhere to certain rules specified for CIP; e.g., they can use whatever Instance and Attribute IDs they wish, but their class attributes must follow the CIP Specification. Figure 14.4 shows an example of this object addressing scheme. More details on object modeling can be found in Chapters 1 and 4 of the CIP Specification [4].

FIGURE 14.4 Object addressing example. [Several nodes on a CIP link, each hosting object classes with instances and attributes; the fully qualified address Node ID #4: Object Class #5: Instance #2: Attribute #2 picks out one attribute on one node.]
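The Class ID ranges quoted above translate directly into a small classifier. This is a sketch; the same pattern applies to Instance and Attribute Identifiers with their own (different) ranges:

```python
def classify_class_id(class_id: int) -> str:
    """Classify a CIP Class ID as open, vendor-specific, or reserved,
    using the Class ID ranges given in the text."""
    if 0x00 <= class_id <= 0x63 or 0x00F0 <= class_id <= 0x02FF:
        return "open"
    if 0x64 <= class_id <= 0xC7 or 0x0300 <= class_id <= 0x04FF:
        return "vendor-specific"
    return "reserved"
```
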
14.2.2 Services

Service Codes are used to define the action that is requested to take place when an object or parts of an object are addressed through Explicit Messages using the addressing scheme described in Section 14.2.1. Apart from the simple read and write functions, a set of CIP Common Services (currently 22, described in [4]) has been defined. These CIP Common Services are common in nature, which means that they can be used on all CIP networks and that they are useful for a large variety of objects. Furthermore, there are object-specific Service Codes that may have a different meaning for the same code, depending on the class of object. Finally, there is the possibility to define vendor-specific services according to the requirements of the developer. While this allows a lot of flexibility, the disadvantage of vendor-specific services is that they may not be understood universally. Complete details of the CIP Service Codes can be found in Appendix A of the CIP Common Specification [4].
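Service handling in a device can be pictured as a dispatch on the Service Code. The sketch below models an object as a plain dictionary of attributes; the four service code values are the commonly published ones and are an assumption here, to be checked against Appendix A of the specification:

```python
# Commonly published CIP Common Service codes (assumed values, for illustration).
GET_ATTRIBUTES_ALL   = 0x01
RESET                = 0x05
GET_ATTRIBUTE_SINGLE = 0x0E
SET_ATTRIBUTE_SINGLE = 0x10

def dispatch(service_code: int, obj: dict, attribute_id: int = None, value=None):
    """Minimal router from a service code to an action on a dict-backed object."""
    if service_code == GET_ATTRIBUTE_SINGLE:
        return obj[attribute_id]
    if service_code == GET_ATTRIBUTES_ALL:
        return dict(obj)
    if service_code == SET_ATTRIBUTE_SINGLE:
        obj[attribute_id] = value
        return None
    if service_code == RESET:
        obj.clear()                  # crude stand-in for a device reset
        return None
    raise NotImplementedError(f"service 0x{service_code:02X} not supported")
```
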
14.2.3 Messaging Protocol

CIP is a connection-based protocol. A CIP Connection provides a path between multiple application objects. When a connection is established, the transmissions associated with that connection are assigned a Connection ID (CID) (Figure 14.5). If the connection involves a bidirectional exchange, then two Connection ID values are assigned. The definition and format of the Connection ID are network dependent. For example, the Connection ID for CIP Connections over DeviceNet is based on the CAN Identifier field. Since most messaging on a CIP network is done through connections, a process has been defined to establish such connections between devices that are not connected yet. This is done through the Unconnected Message Manager (UCMM) function, which is responsible for the processing of Unconnected Explicit Requests and Responses. The general method to establish a CIP Connection is by sending a UCMM Forward_Open Service Request Message. While this is the method used on ControlNet and EtherNet/IP (all devices that allow Connected Messaging support it), it is rarely used on DeviceNet so far. For DeviceNet, the simplified methods described in Sections 14.3.1.11 and 14.3.1.12 are typically used. DeviceNet Safety™* (see Section 14.5.2), on the other hand, fully utilizes this service. A Forward_Open request contains all information required to create a connection between the originator and the target device and, if requested, a second connection between the target and the originator. In particular, the Forward_Open request contains information on the following:

• Time-out information for this connection
• Network Connection ID for the connection from the originator to the target
• Network Connection ID for the connection from the target to the originator
• Information on the identity of the originator (Vendor ID and Serial Number)
• (Maximum) data sizes of the messages on this connection
FIGURE 14.5 Connections and Connection IDs.
*DeviceNet Safety™ is a trademark of ODVA.
• Trigger mechanisms, e.g., Cyclic, Change of State (COS)
• Connection Path for the application object data in the node

The Connection Path may also contain a Routing Segment that allows connections to exist across multiple CIP networks. The Forward_Open request may also contain an electronic key of the target device (Vendor ID, Device Type, Product Code, Revision), as well as configuration information that will be forwarded to the Configuration Assembly of the target device. Some networks, like ControlNet and EtherNet/IP, may also make extensive use of Unconnected Explicit Messaging, while DeviceNet uses Unconnected Messaging only to establish connections. All connections in a CIP network can be divided into I/O Connections and Explicit Messaging Connections:

• I/O Connections provide dedicated, special-purpose communication paths between a producing application and one or more consuming applications. Application-specific I/O data move through these ports; this form of exchange is often referred to as Implicit Messaging. These messages are typically multicast.
• Explicit Messaging Connections provide generic, multipurpose communication paths between two devices. These connections are often referred to as just Messaging Connections. Explicit Messages provide the typical Request/Response-oriented network communications. These messages are typically point-to-point.

The actual data transmitted in CIP I/O Messages are the I/O data in an appropriate format; they may be prepended by a Sequence Count value. This Sequence Count value can be used to distinguish old data from new, e.g., if a message has been re-sent as a heartbeat in a COS Connection. The two states Run and Idle can be indicated with an I/O Message either by prepending a Run/Idle header (used for ControlNet and EtherNet/IP) or by sending I/O data (Run) or no I/O data (Idle), mainly used for DeviceNet.
Run is the normal operative state of a device; the reaction to receiving an Idle event is vendor-specific and application-specific. Typically, this means bringing all outputs of the device to an Idle state, and that typically means “off,” i.e., de-energized. Explicit Messaging requests, on the other hand, contain a Service Code with path information to the desired object (attribute) within the target device followed by data (if any). The associated responses repeat the Service Code followed by status fields followed by data (if any). DeviceNet uses a condensed format for Explicit Messages, while ControlNet and EtherNet/IP use the full format. More details of the messaging protocol can be found in Chapter 2 of the CIP Specification [4].
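The contents of a Forward_Open request listed above can be summarized as a plain data structure. The field names below are illustrative; the actual wire encoding of the request is defined in the specification:

```python
from dataclasses import dataclass

@dataclass
class ForwardOpenRequest:
    """Fields a Forward_Open request carries (names illustrative, encoding omitted)."""
    timeout_ticks: int            # time-out information for the connection
    o_to_t_connection_id: int     # originator -> target network Connection ID
    t_to_o_connection_id: int     # target -> originator network Connection ID
    originator_vendor_id: int     # identity of the originator ...
    originator_serial_number: int # ... Vendor ID plus Serial Number
    max_o_to_t_size: int          # (maximum) data sizes on this connection
    max_t_to_o_size: int
    trigger: str                  # e.g., "cyclic" or "change-of-state"
    connection_path: tuple        # path to the application object data; may also
                                  # include a routing segment across CIP networks
```
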
14.2.4 Communication Objects

The CIP communication objects manage and provide the runtime exchange of messages. While these objects follow the overall principles and guidelines for CIP objects, the communication objects are unique in that they are the focal point for all CIP communication. It therefore makes sense to look at them in more detail. Every instance of a communication object contains a link producer part or a link consumer part, or both. I/O Connections may be producing, consuming, or both producing and consuming, while Explicit Messaging Connections are always both producing and consuming. Figure 14.6 and Figure 14.7 show the typical connection arrangement for CIP I/O Messaging and CIP Explicit Messaging. The attributes of a Connection Object describe vital parameters of this connection. First of all, they state what kind of connection this is: whether it is an I/O Connection or an Explicit Messaging Connection, but also the maximum size of the data to be exchanged across this connection, and the source and sink of these data. Note that Explicit Messages are always directed to the Message Router Object. Further attributes define the state of this connection and what kind of behavior this connection is to show. Of particular importance are how messages are triggered (from the application, through Change of State or Change of Data, through Cyclic events or network events) and the timing of the connections
© 2005 by CRC Press
14-8
The Industrial Communication Technology Handbook
(time-out associated with this connection and a predefined action if a time-out occurs). CIP allows multiple connections to coexist in a device, although simple devices, e.g., simple DeviceNet slaves, will typically only have one or two connections alive at any given point in time. Complete details of the communication objects can be found in Chapter 3 of the CIP Specification [4].

FIGURE 14.6 CIP I/O Multicast Connection. [An I/O-producing application object in Device #1 sends I/O Messages over a producing I/O Connection to consuming I/O Connections and I/O-consuming application objects in Device #2 and Device #3.]

FIGURE 14.7 CIP Explicit Messaging Connection. [Request and response Explicit Messages travel between an application object in Device #1 and the Message Router and objects in Device #2.]
14.2.5 Object Library

The CIP family of protocols contains a very large collection of commonly defined objects (currently 48 object classes). The overall set of object classes can be subdivided into three types:

• General-use objects
• Application-specific objects
• Network-specific objects

Apart from the objects that are network-specific, all other objects are used in all three CIP network types. Figure 14.8 shows the general-use objects, Figure 14.9 shows a group of application-specific objects, and Figure 14.10 shows a group of network-specific objects. New objects are added on an ongoing basis. The general-use objects can be found in many different devices, while the application-specific objects are typically only found in devices hosting such applications.
FIGURE 14.8 General-use objects:
• Identity Object (see Section 14.2.5.1)
• Message Router Object
• Assembly Object (see Section 14.2.5.3)
• Connection Object (see Section 14.2.4)
• Connection Manager Object (see Section 14.2.4)
• Register Object
• Parameter Object (see Section 14.2.5.2)
• Parameter Group Object
• Acknowledge Handler Object
• Connection Configuration Object
• Port Object
• Selection Object
• File Object

FIGURE 14.9 Application-specific objects:
• Discrete Input Point Object
• Discrete Output Point Object
• Analog Input Point Object
• Analog Output Point Object
• Presence Sensing Object
• Group Object
• Discrete Input Group Object
• Discrete Output Group Object
• Discrete Group Object
• Analog Input Group Object
• Analog Output Group Object
• Analog Group Object
• Position Sensor Object
• Position Controller Supervisor Object
• Position Controller Object
• Sequencer Object
• Command Block Object
• Motor Data Object
• Control Supervisor Object
• AC/DC Drive Object
• Overload Object
• Softstart Object
• S-Device Supervisor Object
• S-Analog Sensor Object
• S-Analog Actor Object
• S-Single Stage Controller Object
• S-Gas Calibration Object
• Trip Point Object
• S-Partial Pressure Object

FIGURE 14.10 Network-specific objects:
• DeviceNet Object (see Section 14.3.1.4.1)
• ControlNet Object (see Section 14.3.2.4.1)
• ControlNet Keeper Object (see Section 14.3.2.4.2)
• ControlNet Scheduling Object (see Section 14.3.2.4.3)
• TCP/IP Interface Object (see Section 14.3.3.5.1)
• Ethernet Link Object (see Section 14.3.3.5.2)
FIGURE 14.11 Typical device object model. [Required objects: Identity, Message Router, Connection(s), and a network link object (DeviceNet, ControlNet, or Ethernet). Optional objects: Parameter, Assembly, and application object(s), reached through I/O and Explicit Messaging connections over the CIP network.]
This looks like a large number of object types, but typical devices implement only a subset of these objects. Figure 14.11 shows the object model of such a typical device. The objects required in a typical device are:

• Either a Connection Object or a Connection Manager Object
• An Identity Object
• One or several network link-related objects (depending on the network)
• A Message Router Object (at least its function)
Further objects are added according to the functionality of the device. This allows very good scalability of implementations, so that small devices such as a proximity sensor on DeviceNet are not burdened with unnecessary overhead. Developers typically use publicly defined objects (see above list), but can also create their own objects in the vendor-specific areas, e.g., Class IDs 100 to 199. However, developers are strongly encouraged to work with the Special Interest Groups (SIGs) of ODVA and ControlNet International to create common definitions for further objects instead of inventing private ones. Out of the general-use objects, several will be described in more detail below.

14.2.5.1 Identity Object (Class Code 0x01)

This object is described in more detail for two reasons: (1) being a relatively simple object, it can easily be used to show the general principles, and (2) every device must have an Identity Object. It is therefore of general interest in this context. The vast majority of devices support only one instance of the Identity Object. Thus, there are typically no requirements for any class attributes that would describe further class details, e.g., how many instances exist in the device; only instance attributes are required in most cases. There are mandatory attributes (Figure 14.12) and optional attributes (Figure 14.13).

• The Vendor ID attribute allows an identification of the vendor of every device. This UINT (Unsigned Integer) value (for Data Type descriptions, see Section 14.2.9) is assigned to a specific vendor by ODVA or ControlNet International. If a vendor intends to build products for more than one CIP network, it will get the same Vendor ID for all networks.
• The Device Type specifies which profile has been used for this device. It must be one of the Device Types described in Chapter 6 of the CIP Specification [4] or a vendor-specific type (see Section 14.2.6).
• The Product Code is a UINT number defined by the vendor of the device. It is used to distinguish multiple products of the same Device Type from the same vendor.
FIGURE 14.12 Mandatory attributes: Vendor ID, Device Type, Product Code, Revision, Status, Serial Number, Product Name.

FIGURE 14.13 Optional attributes: State, Configuration Consistency Value, Heartbeat Interval, Languages Supported.
• The Revision is split into two USINT (Unsigned Short Integer) values specifying a Major Revision and a Minor Revision. Any change of the device that results in a modified behavior of the device on the network must be reflected in a change of at least the Minor Revision. Any change in the device that needs a revised Electronic Data Sheet (EDS; see Section 14.2.7) must be reflected in a change of the Major Revision. Vendor ID, Device Type, Product Code, and Major Revision allow an unambiguous identification of an EDS for this device.
• The Status attribute provides information on the status of the device, e.g., whether it is owned (controlled by another device), whether it is configured (to something different from the out-of-the-box default), and whether any major or minor faults have occurred.
• The Serial Number is used to uniquely identify individual devices in conjunction with the Vendor ID; i.e., no two CIP devices of a vendor may carry the same Serial Number. The 32 bits of the Serial Number allow ample space for a subdivision into number ranges that could be used by different divisions of larger companies.
• The Product Name attribute allows the vendor to give a meaningful ASCII name string (up to 32 characters) to the device.
• The State attribute describes the state of a device in a single UINT value; it is thus less detailed than the Status attribute.
• The Configuration Consistency Value allows a distinction between a configured and an unconfigured device or between different configurations in a device. This helps avoid unnecessary configuration downloads.
• The Heartbeat Interval allows enabling of the Device Heartbeat Message and setting the maximum time between two heartbeats to 1 to 255 s.

The services supported by the class and instance attributes are either Get_Attribute_Single (typically implemented in DeviceNet devices) or Get_Attributes_All (typically implemented in ControlNet and EtherNet/IP devices).
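The mandatory Identity Object instance attributes can be sketched as a simple record that enforces the size constraints mentioned above. Attribute numbering and CIP data type encodings are omitted, and the field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class IdentityInstance:
    """Mandatory Identity Object instance attributes as plain fields (a sketch)."""
    vendor_id: int       # UINT, assigned by ODVA or ControlNet International
    device_type: int     # UINT, identifies the profile used by the device
    product_code: int    # UINT, vendor-assigned, distinguishes the vendor's products
    revision: tuple      # (major, minor), two USINT values
    status: int          # UINT bit field (owned, configured, fault flags)
    serial_number: int   # 32-bit value, unique per vendor
    product_name: str    # ASCII, up to 32 characters

    def __post_init__(self):
        if len(self.product_name) > 32:
            raise ValueError("Product Name is limited to 32 characters")
        if not 0 <= self.serial_number < 2 ** 32:
            raise ValueError("Serial Number must fit in 32 bits")
```
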
None of the attributes is settable, except for the Heartbeat Interval (if implemented). The only other service that is typically supported by the Identity Object is the reset service. The behavior of the Identity Object is described through a state transition diagram. This and further details of the Identity Object can be found in Chapter 5 of the CIP Specification [4]. 14.2.5.2 Parameter Object (Class Code 0x0F) This object is described in some detail since its concept is referred to in Section 14.2.7, “Configuration and Electronic Data Sheets.” This object, when used, comes in two “flavors”: a complete object and an abbreviated version (Parameter Object Stub). This abbreviated version is mainly used by DeviceNet
devices that only have small amounts of memory available. The Object Stub in conjunction with the Electronic Data Sheet has more or less the same functionality as the full object (see Section 14.2.7). The purpose of this object is to provide a general means to allow access to many attributes of the various objects in the device without a simple tool (such as a handheld terminal) having to know anything about the specific objects in the device.
The class attributes of the Parameter Object contain information on how many instances exist in this device and a Class Descriptor indicating, among other properties, whether a full or stub version is supported. Furthermore, they tell whether a Configuration Assembly is used and what language is used in the Parameter Object. Of the instance attributes, the first six are those required for the Object Stub. These are listed in Figure 14.14.

FIGURE 14.14 Parameter Object Stub attributes.
• Parameter Value: the actual parameter.
• Link Path Size and Link Path: these two attributes contain information on what application object/instance/attribute the parameter value is retrieved from.
• Descriptor: describes parameter properties, e.g., read-only, monitor parameter, etc.
• Data Type: must be one of the Data Types described in Chapter C-6.1 of the CIP Specification; see Section 14.2.9.
• Data Size: data size in bytes.

These six attributes already allow access, interpretation, and modification of the parameter value, but the remaining attributes make the parameter considerably easier to use:
• The next three attributes provide ASCII strings with the name of the parameter, its engineering units, and an associated help text.
• Another three attributes contain the minimum, maximum, and default values of the parameter.
• The next four attributes allow scaling of the parameter value so that the parameter can be displayed in a more meaningful way, e.g., a raw value in multiples of 10 mA displayed in amps.
• Another four attributes can link the scaling values to other parameters. This feature allows variable scaling of parameters, e.g., percentage scaling relative to a full-range value that is set by another parameter.
• Attribute 21 defines how many decimal places are to be displayed if the parameter value is scaled.
• Finally, the last three attributes are international language versions of the parameter name, its engineering units, and the associated help text.
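As an illustration of the scaling idea, consider the example above of a raw value counted in multiples of 10 mA but displayed in amps. The sketch below is a simplified linear model with a hypothetical multiplier/divisor pair and a decimal-places setting, not the exact attribute semantics of the Parameter Object, which are defined in the CIP Specification:

```python
def display_value(raw, mult=1, div=1, decimal_places=1):
    """Simplified illustration of parameter scaling for display:
    a raw count becomes an engineering-unit value."""
    return round(raw * mult / div, decimal_places)

# Raw value in multiples of 10 mA, displayed in amps:
# 1 count = 10 mA = 0.01 A, so divide by 100.
print(display_value(250, mult=1, div=100))  # -> 2.5 (amps)
```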
14.2.5.3 Assembly Object (Class Code 0x04)
Using the Assembly Object gives the option of mapping data from attributes of different instances of various classes into one single attribute (attribute 3) of an instance of the Assembly Object. This mapping is generally used for I/O Messages to maximize the efficiency of the control data exchange on the network. Due to the Assembly mapping, the I/O data are available in one block; thus, there are fewer Connection Object instances and fewer transmissions on the network. The process data are normally combined from different application objects. An Assembly Object can also be used to configure a device with a single data block, rather than having to set individual parameters.
CIP makes a distinction between Input and Output Assemblies. Input and output in this context are viewed from the network. An Input Assembly reads data from the application and produces it on the network. An Output Assembly consumes data from the network and writes the data to the application. This data mapping is very flexible; even mapping of individual bits is permitted. Assemblies can also be used to transmit a complete set of configurable parameters instead of accessing them individually. These Assemblies are called Configuration Assemblies.

FIGURE 14.15 Example of an Assembly mapping.

Figure 14.15 shows an example of an Assembly mapping. The data from application objects 100 and 101 are mapped into two instances of the Assembly Object. Instance 1 is set up as an Input Assembly for the input data and instance 2 as an Output Assembly for output data. The data block is always accessed via attribute 3 of the relevant Assembly instance. Attributes 1 and 2 contain mapping information. The I/O Assembly mapping is specified for certain Device Profiles (e.g., Motor Starters) by the ODVA. Device developers can then choose which Assemblies they support in their products. If none of the publicly defined Assemblies fully represent the functionality of the product, a device vendor may implement additional vendor-specific Assemblies (Instance IDs 100 to 199). CIP defines static and dynamic Assembly Objects. Whereas the mapping for static Assemblies is permanently programmed in the device (ROM), it can be modified and extended for dynamic mapping (RAM). Most simple CIP devices support only static Assembly Objects. Dynamic Assembly Objects tend to be used in more complex devices.
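The mapping idea of Figure 14.15 can be sketched as follows. The application data and byte positions are hypothetical; the point is that an Input Assembly gathers data from several application objects into one block, while an Output Assembly distributes one received block back to them:

```python
# Two hypothetical application objects, as in Figure 14.15.
app_obj_100 = {"input": b"\x11\x22", "output": b"\x00"}
app_obj_101 = {"input": b"\x33",     "output": b"\x00\x00"}

# Instance 1 (Input Assembly): read from the application,
# produce one block (attribute 3) on the network.
input_assembly_attr3 = app_obj_100["input"] + app_obj_101["input"]

# Instance 2 (Output Assembly): consume one block from the network,
# write the pieces back to the application objects.
def consume_output(block: bytes):
    app_obj_100["output"] = block[0:1]
    app_obj_101["output"] = block[1:3]

consume_output(b"\x01\x02\x03")
print(input_assembly_attr3.hex())  # one block instead of several transfers
```

One connection carrying one 3-byte block replaces three separate attribute accesses, which is exactly the efficiency argument made above.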
FIGURE 14.16 CIP Device Types.
• Generic Device (0x00)
• AC Drives (0x02)
• Motor Overload (0x03)
• Limit Switch (0x04)
• Inductive Proximity Switch (0x05)
• Photoelectric Sensor (0x06)
• General Purpose Discrete I/O (0x07)
• Resolver (0x09)
• Communications Adapter (0x0C)
• ControlNet Programmable Logic Controller (0x0E)
• Position Controller (0x10)
• DC Drives (0x13)
• Contactor (0x15)
• Motor Starter (0x16)
• Soft Starter (0x17)
• Human Machine Interface (0x18)
• Mass Flow Controller (0x1A)
• Pneumatic Valve(s) (0x1B)
• Vacuum/Pressure Gauge (0x1C)
• Process Control Valve (0x1D)
• Residual Gas Analyzer (0x1E)
• DC Power Generator (0x1F)
• RF Power Generator (0x20)
• Turbomolecular Vacuum Pump (0x21)
• ControlNet Physical Layer (0x32) (this is not a “normal” profile; it does not contain any objects)
14.2.6 Device Profiles
It would be possible to design products using only the definitions of communication links and objects, but this could easily result in similar products having quite different data structures and behavior. To overcome this situation and to make the application of CIP devices much easier, devices of similar functionality have been grouped into Device Types with associated profiles. Such a CIP profile contains the full description of the object structure and behavior. Figure 14.16 shows the Device Types and associated profiles that have been defined so far in the CIP Specification [4] (profile numbers are in parentheses). Device developers must use a profile. Any device that does not fall into the scope of one of the specialized profiles must use the Generic Device profile or a vendor-specific profile. What profile is used and which parts of it are implemented must be described in the user documentation of the device. Every profile consists of a set of objects, some required, some optional, and a behavior associated with that particular type of device. Most profiles also define one or several I/O data formats (Assemblies) that define the meaning of the individual bits and bytes of the I/O data. In addition to the publicly defined object set and I/O data Assemblies, vendors can add objects and Assemblies of their own if they have devices that have additional functionality. If that is still not appropriate, vendors can create profiles of their own within the vendor-specific profile range. They are then free to define whatever behavior and objects are required for their device as long as they stick to some general rules for profiles. Whenever additional functionality is used by multiple vendors, ODVA and ControlNet International encourage coordinating these new features through discussion in the Special Interest Groups (SIGs).
They can then create new profiles and additions to the existing profiles for everybody’s use and the benefit of the device users. All open (ODVA/CI-defined) profiles carry numbers in the 0x00 through 0x63 or 0x0100 through 0x02FF ranges, while vendor-specific profiles carry numbers in the 0x64 through 0xC7 or 0x0300 through 0x04FF ranges. All other profile numbers are reserved by CIP. Complete details of the CIP profiles can be found in Chapter 6 of the CIP Specification [4].
14.2.7 Configuration and Electronic Data Sheets
CIP has made provisions for several options to configure devices:
• A printed data sheet
• Parameter Objects and Parameter Object Stubs
• An Electronic Data Sheet (EDS)
• A combination of an EDS and Parameter Object Stubs
• A Configuration Assembly combined with any of the above methods
When using configuration information collected on a printed data sheet, configuration tools can only provide prompts for service, class, instance, and attribute data and relay this information to a device. While this procedure can do the job, it is the least desirable solution, since the tool cannot determine the context, content, or format of the data.
Parameter Objects, on the other hand, provide a full description of all configurable data of a device. This allows a configuration tool to gain access to all parameters and maintain a user-friendly interface, since the device itself provides all the necessary information. However, this method requires the device to hold the full parameter information, which may be an excessive burden for a small device, e.g., a simple DeviceNet slave. Therefore, an abbreviated version of the Parameter Object, called the Parameter Object Stub, may be used (see Section 14.2.5.2). This still allows access to the parameter data, but it does not describe any meaning of this data. Parameter Stubs in conjunction with a printed data sheet are usable, but certainly not optimal.
An EDS, in contrast, supplies all the information that a full Parameter Object contains, in addition to I/O Connection information. The EDS thus provides the full functionality and ease of use of the Parameter Object without imposing an excessive burden on the individual devices. A further benefit of the EDS is that it provides a means for tools to perform offline configuration and download the configuration data to the device at a later point in time.
An EDS is a simple ASCII text file that can be generated with any ASCII editor. The CIP Specification lays down a set of rules for the overall design and syntax of an EDS. The main purpose of the EDS is to give information on several aspects of the device’s capabilities, the most important ones being the I/O Connections it supports and what parameters for display or configuration exist within the device.
It is highly recommended that all supported I/O Connections be described within the EDS; this makes the application of a device much easier. When it comes to parameters, it is up to the developer to decide which items to make accessible to the user. Let us look at some details of the EDS. EDSs are structured into sections. Every section starts with a section name in square brackets []. The first two sections are mandatory for all EDSs.
• [File]: Describes the contents and revision of the file
• [Device]: Is equivalent to the Identity Object information and is used to match an EDS to a device
• [Device Classification]: Describes what network the device can be connected to; optional for DeviceNet, required for ControlNet and EtherNet/IP
• [IO_Info]: Describes connection methods and I/O sizes; DeviceNet only
• [Variant_IO_Info]: Describes multiple IO_Info data sets; DeviceNet only
• [ParamClass]: Describes class-level attributes of the Parameter Objects
• [Params]: Identifies all configuration parameters in the device; follows the Parameter Object definition, further details below
• [EnumPar]: Enumeration list of parameter choices to present to the user; old method, specified for DeviceNet only
• [Assembly]: Describes the structure of data items
• [Groups]: Identifies all parameter groups in the device and lists group name and Parameter Object instance numbers
• [Connection Manager]: Describes connections supported by the device; typically used in ControlNet and EtherNet/IP
• [Port]: Describes the various network ports a device may have
• [Modular]: Describes modular structures inside a device
• [Capacity]: New EDS section to specify the communication capacity of EtherNet/IP and ControlNet devices
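For illustration, a fragment of a hypothetical EDS might look as follows. The section names are the ones listed above; the entry names follow common EDS conventions, but all values here are invented for the example rather than taken from a real device:

```
$ Example Electronic Data Sheet fragment (hypothetical device)
[File]
        DescText   = "Example AC Drive EDS";
        CreateDate = 06-15-2004;
        Revision   = 1.1;

[Device]
        VendCode   = 1234;     $ Vendor ID, as in the Identity Object
        ProdType   = 2;        $ Device Type: AC Drives (0x02)
        ProdCode   = 100;
        MajRev     = 2;
        MinRev     = 7;
        ProdName   = "Example AC Drive";
```

A tool matches the [Device] section against the Identity Object of a device found on the network, as described below.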
With these sections, a very detailed description of a device can be made. Only a few of these details are described here, and further reading is available in [25] and [26]. A tool with a collection of EDSs will first use the device section to try to match an EDS with each device it finds on a network. Once this is done and a particular device is chosen, the tool can then display device properties and parameters and allow their modification (if necessary). A tool may also display what I/O Connections a device may allow and which of these are already in use. EDS-based tools are mainly used for slave or adapter devices; scanner devices are typically too complex to be configured through EDSs. For those devices, the EDS is mainly used to identify the device and then guide the tool to call a matching configuration applet. A particular strength of the EDS approach lies in the methodology of parameter configuration. A configuration tool typically takes all the information supplied by the Parameter Object and an EDS and displays it in a user-friendly manner. This enables the user to configure a device in many cases without the need of a detailed manual; the tool presentation of the parameter information together with help texts allows one to make the right decisions for a complete device configuration, provided, of course, that the developer has supplied all information required. A complete description of what can be done with EDSs goes well beyond the scope of this handbook. For further details, consult [25], [26], and Chapter 7 of the CIP Specification [4].
14.2.8 Bridging and Routing
CIP has defined mechanisms that allow the transmission of messages across multiple networks, provided the bridging devices (routers) between the various networks are equipped with a suitable set of capabilities (objects and support of services). Once this is the case, the message will be forwarded from router to router until it has reached its destination node. Here is how it works:
For Unconnected Explicit Messaging, the actual Explicit Message to be executed on the target device is wrapped up using another type of Explicit Message service, the so-called Unconnected_Send Message. This Unconnected_Send Message (Service Code 0x52 of the Connection Manager Object) contains complete information on the transport mechanism, in particular time-outs (they may be different while the message is still en route) and path information. The first router device that receives an Unconnected_Send Message will take its contents and forward them to the next network as specified within the path section of the message. Before the message is actually sent, the used part of the path is removed, but remembered by the intermediate router device for the return of any response. This process is executed for every hop until the final destination network is reached. The number of hops is theoretically limited only by the message length.
Once the Unconnected_Send Message has arrived at the target network, the inner Explicit Message is sent to the target device, which executes the requested service and generates a response. This response is then routed back through all the routers it passed on its forward journey until it finally reaches the originating node. It is important to note in this context that the transport mechanism may have been successful in forwarding the message and returning the response, but the response could still contain an indication that the desired service could not be performed successfully in the target network/device.
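The hop-by-hop path consumption described above can be sketched as follows. This is a toy model, not the wire format: each path element is assumed to be a port/address pair naming the next hop, and each router strips the element it used but remembers it for the response:

```python
def route_unconnected_send(path, embedded_request):
    """Toy model of Unconnected_Send forwarding across routers."""
    hops_taken = []
    while path:
        port, address = path[0]   # next hop, taken from the message's path
        hops_taken.append((port, address))
        path = path[1:]           # the used part of the path is removed
    # At the target network the inner Explicit Message is delivered;
    # the response travels back through the remembered hops in reverse.
    response = f"response to {embedded_request}"
    return response, list(reversed(hops_taken))

resp, return_path = route_unconnected_send(
    [(2, 17), (3, 4)], "Get_Attribute_Single")
print(resp)
print(return_path)  # -> [(3, 4), (2, 17)]
```

Because each router learns its part of the path only from the message itself, no router needs any preprogrammed routing table, which is the "seamless routing" property described below.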
Through this mechanism, the router devices do not need to know anything about the message paths ahead of time. Thus, no programming of any of the router devices is required. This is often referred to as seamless routing. When a connection (I/O or Explicit) is set up using the Forward_Open service (see Section 14.3.2.10), it may go to a target device on another network. To enable the appropriate setup process, the Forward_Open Message may contain a field with path information describing a route to the target device. This is very similar to the Unconnected_Send service described above. This routing information is then used to create routed connections within the routing devices between the originator and the target of the message. Once set up, these connections automatically forward any incoming messages for this connection to the outgoing port en route to the target device. Again, this is repeated until the message has reached its target node. As with routed Unconnected Explicit Messages, the number of hops is generally limited
only by the capability of the devices involved in this process. In contrast to routed Unconnected Messages, routed Connected Messages do not carry path information. Since Connected Messages always use the same path for any given connection, the path information that was given to the routing devices during connection setup is held there as long as the connection exists. Again, the routing devices do not have to be preprogrammed; they are self-configured during the connection establishment process.

FIGURE 14.17 Logical Segment encoding example.
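The kind of encoding shown in Figure 14.17 can be sketched as follows, using the 8-bit Logical Segment formats (segment type bytes 0x20 for Class ID, 0x24 for Instance ID, and 0x30 for Attribute ID; these codes should be verified against Appendix C of the CIP Specification before being relied upon):

```python
def logical_path(class_id, instance_id, attribute_id=None):
    """Build a CIP logical path using the 8-bit segment formats."""
    path = bytes([0x20, class_id,       # 8-bit Class ID segment
                  0x24, instance_id])   # 8-bit Instance ID segment
    if attribute_id is not None:
        path += bytes([0x30, attribute_id])  # 8-bit Attribute ID segment
    return path

# Point at attribute 3 of instance 1 of the Assembly Object (class 0x04):
print(logical_path(0x04, 0x01, 0x03).hex())  # -> "200424013003"
```

The 16- and 32-bit segment formats mentioned in the text use different type bytes and are omitted here for brevity.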
14.2.9 Data Management
The Data Management part of the CIP Specification describes addressing models for CIP entities and the data structure of the entities themselves. The entity addressing is done by so-called Segments, a method that allows very flexible usage so that many different types of addressing methods can be accommodated. The first byte of a CIP Segment allows a distinction between a Segment Address (0x00 to 0x9F) and a Data Type description (0xA0 to 0xDF). Two uses of this addressing scheme (Logical Segments and Data Types) are looked at in a little more detail here; all of them are described in Appendix C of the CIP Specification [4].
14.2.9.1 Logical Segments
Logical Segments (first byte = 0x20 to 0x3F) are addressing segments that can be used to address objects and their attributes within a device. They are typically structured into [Class ID] [Instance ID] [Attribute ID, if required]. Each element of this structure allows various formats (1, 2, and 4 bytes). Figure 14.17 shows a typical example of this addressing method. This type of addressing is commonly used to point to Assemblies, parameters, or any other addressable attribute within a device. It is extensively used in EDSs, but also within Unconnected Messages, to name just a few application areas. A complete list of all Segment types and their encoding can be found in Appendix C of the CIP Specification [4].
14.2.9.2 Data Types
Data Types (first byte = 0xA0 to 0xDF) can be either structured (first byte = 0xA0 to 0xA3) or elementary (first and only byte = 0xC1 to 0xDE) Data Types. Structured Data Types can be arrays of elementary Data Types or any assembly of arrays or elementary Data Types. Of particular importance in the context of this handbook are the elementary Data Types. They are used within EDSs to specify the Data Types of parameters and other entities.
Here is a list of commonly used Data Types:
• 1 bit (encoded into 1 byte):
  • Boolean, BOOL, Type Code 0xC1
• 1 byte:
  • Bit string, 8 bits, BYTE, Type Code 0xD1
  • Unsigned 8-bit integer, USINT, Type Code 0xC6
  • Signed 8-bit integer, SINT, Type Code 0xC2
• 2 bytes:
  • Bit string, 16 bits, WORD, Type Code 0xD2
  • Unsigned 16-bit integer, UINT, Type Code 0xC7
  • Signed 16-bit integer, INT, Type Code 0xC3
• 4 bytes:
  • Bit string, 32 bits, DWORD, Type Code 0xD3
  • Unsigned 32-bit integer, UDINT, Type Code 0xC8
  • Signed 32-bit integer, DINT, Type Code 0xC4
The Data Types in CIP follow the requirements of IEC 61131-3 [9]. A complete list of all Data Types and their encodings can be found in Appendix C of the CIP Specification [4].
14.2.9.3 Maintenance and Further Development of the Specifications
Both ODVA and ControlNet International have a set of working groups tasked with maintaining the specifications and creating protocol extensions, e.g., new profiles or functional enhancements such as CIP Sync and CIP Safety. These groups are called Special Interest Groups (SIGs) for DeviceNet and ControlNet and Joint Special Interest Groups (JSIGs) for EtherNet/IP. The JSIGs are called “joint” because a combination of ODVA and ControlNet International members does the work, the EtherNet/IP technology being jointly administered by both groups. The results of these SIGs are written up as DSEs (DeviceNet Specification Enhancements), CSEs (ControlNet Specification Enhancements), ESEs (EtherNet/IP Specification Enhancements), or CIPSEs (CIP Specification Enhancements), presented to the Technical Review Board (TRB) for approval, and then incorporated into the specifications. Only ODVA or ControlNet International members can work within the SIGs, and those participants have the advantage of advance knowledge of technical changes. Participation in one or several SIGs is therefore highly recommended.
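Returning to the elementary Data Types listed in Section 14.2.9.2: CIP transmits multibyte values least significant byte first, so the integer types map naturally onto little-endian struct formats. The mapping below is a sketch; the type codes are the ones from the list above:

```python
import struct

# Elementary CIP Data Types: name -> (type code, little-endian struct format)
CIP_TYPES = {
    "SINT":  (0xC2, "<b"), "INT":   (0xC3, "<h"), "DINT":  (0xC4, "<i"),
    "USINT": (0xC6, "<B"), "UINT":  (0xC7, "<H"), "UDINT": (0xC8, "<I"),
    "BYTE":  (0xD1, "<B"), "WORD":  (0xD2, "<H"), "DWORD": (0xD3, "<I"),
}

def encode(type_name, value):
    """Encode a value as the given elementary CIP Data Type."""
    _, fmt = CIP_TYPES[type_name]
    return struct.pack(fmt, value)

print(encode("UINT", 0x1234).hex())  # -> "3412" (least significant byte first)
```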
14.3 Network Adaptations of CIP
To date, there are three public derivatives of CIP. These three derivatives are based on quite different data link layers and transport mechanisms, but they all maintain the principles of CIP.
14.3.1 DeviceNet
DeviceNet was the first public implementation of CIP. As already mentioned in Section 14.2, DeviceNet is based on the Controller Area Network (CAN). The adaptations of CIP are done to accommodate certain limitations of the CAN protocol (up to 8 bytes payload only) and to allow for a simple device with only minimal processing power; for a more detailed description of the CAN protocol and some of its applications, see [10]. DeviceNet uses a subset of the CAN protocol (11-bit identifier only, no remote frames). Figure 14.18 shows the relationship between CIP, DeviceNet, and the ISO/OSI layer model.
14.3.1.1 Physical Layer and Relationship to CAN
The physical layer of DeviceNet is an extension of ISO 11898 [11]. This extension defines the following additional details:
• Improved transceiver characteristics that allow the support of up to 64 nodes per network
• Additional circuitry for overvoltage and miswiring protection
• Several types of cables for a variety of applications
• Several types of connectors for open (IP20) and enclosed (IP65/67) devices
These extensions result in a communication system with the following physical layer characteristics:
• Trunkline/dropline configuration
• Support for up to 64 nodes
• Node removal without severing the network
• Simultaneous support for both network-powered (sensors) and separately powered (actuators) devices
FIGURE 14.18 Relationship between CIP and DeviceNet. (The figure maps CIP onto the ISO/OSI layers: user Device Profiles such as Semiconductor, Pneumatic Valves, Position Controller, and AC Drives sit on top of the common CIP layers, i.e., the CIP Application Layer with its Application Object Library, the CIP Data Management Services, and CIP Message Routing and Connection Management for Explicit and I/O Messages. Below these are the network-specific layers: ControlNet Transport over CTDMA, DeviceNet Transport over CAN CSMA/NBA, and the EtherNet/IP Encapsulation over TCP/UDP/IP and Ethernet CSMA/CD, each with its own physical layer. Possible future alternatives include ATM, USB, and FireWire.)
• Use of sealed or open-style connectors
• Protection from wiring errors
• Selectable data rates of 125, 250, and 500 kBaud
• Adjustable power configuration to meet individual application needs
• High current capability (up to 16 amps per supply)
• Operation with off-the-shelf power supplies
• Power taps that allow the connection of several power supplies from multiple vendors that comply with DeviceNet standards
• Built-in overload protection
• Power available along the bus: both signal and power lines contained in the cable
The cables described in the DeviceNet Specification have been specially designed to meet minimum propagation speed requirements to make sure they can be used up to the maximum system length. Using the specified cables, in conjunction with suitable transceiver circuits, results in overall system characteristics as specified in Figure 14.19. ODVA has issued a guideline [7] that gives complete details on how to build the physical layer of a DeviceNet network.

FIGURE 14.19 Data rate vs. trunk and drop length.

Data Rate   Trunk: Thick Cable   Thin Cable   Flat Cable   Drop: Maximum   Drop: Cumulative
125 kBaud   500 meters           100 meters   420 meters   6 meters        156 meters
250 kBaud   250 meters           100 meters   200 meters   6 meters        78 meters
500 kBaud   100 meters           100 meters   75 meters    6 meters        39 meters

Developers of DeviceNet devices have the choice of creating DeviceNet circuits with or without physical layer isolation; both versions are fully specified. Furthermore, a device may take some or all of its power
from the bus, thus avoiding extra power lines for devices that can live on the power supplied through the DeviceNet cable. All DeviceNet devices must be equipped with one of the connectors described in the DeviceNet Specification. Hard wiring of a device is allowed, provided the node is removable without severing the trunk.
14.3.1.2 Protocol Adaptations
On the protocol side, there are basically two adaptations of CIP (apart from the addition of the DeviceNet Object) that have been made to better accommodate the CAN data frame:
• Limitation to short messages (8 bytes or less) where possible; introduction of fragmentation for longer messages
• Introduction of the Master/Slave communication profile to minimize connection establishment management (see Section 14.3.1.12)
These two features have been created to allow the use of simple and thus inexpensive microcontrollers. This is particularly important for small, cost-sensitive devices like photo-eyes or proximity sensors. As a result of this specialization, the DeviceNet protocol in its simplest form has been implemented on 8-bit microprocessors with as little as 4 kbyte of code memory and 175 bytes of RAM.
The fragmentation of messages comes in two varieties: For I/O Messages, which are typically sent with a fixed length, the use of fragmentation is defined through the maximum length of data to be transmitted through a connection. Any connection that has more than 8 bytes to transmit always uses the fragmentation protocol, even if the actual data to be transmitted are 8 bytes or less, e.g., an Idle Message. For Explicit Messaging, the use of the fragmentation protocol is indicated with every message, since the actual message length will vary. The actual fragmentation protocol is contained in one extra byte within the message that indicates whether the fragment is a start, middle, or end fragment.
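The fragmentation scheme just described can be sketched as follows. The layout is assumed to be the commonly documented one: the fragment type (first, middle, or last) in the top two bits of the extra byte and a modulo-64 rolling counter in the lower six bits; the numeric type values used here should be checked against the DeviceNet Specification:

```python
FIRST, MIDDLE, LAST = 0, 1, 2  # assumed fragment-type codes

def fragment(payload: bytes, chunk=7):
    """Split a payload into CAN-sized fragments: 1 header byte + up to
    7 data bytes per 8-byte CAN frame."""
    frames = []
    chunks = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    for n, data in enumerate(chunks):
        if n == 0:
            ftype = FIRST
        elif n == len(chunks) - 1:
            ftype = LAST
        else:
            ftype = MIDDLE
        header = (ftype << 6) | (n % 64)  # type in bits 7-6, counter in 5-0
        frames.append(bytes([header]) + data)
    return frames

frames = fragment(bytes(range(20)))
print(len(frames))  # -> 3 (20 bytes split into 7-byte chunks)
```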
A modulo 64 rolling fragment counter allows very long fragmented messages, in theory limited only by the maximum Produced or Consumed Connection Sizes (65,535 bytes). In reality, it is the capability of the devices that limits the message sizes.
14.3.1.3 Indicators and Switches
DeviceNet devices may be built with or without indicators, but it is recommended to incorporate some of the indicators described in the specification. These indicators allow the user to determine the state of the device and its network connection. Devices may have additional indicators with a behavior not described in the specification. However, any indicators that carry names of those described in the specification must also follow their specified behavior.
Devices may be built with or without switches or other directly accessible means for configuration. If switches for MAC ID and baud rate exist, then certain rules apply regarding how these values have to be used at power-up and during the operation of the device.
14.3.1.4 Additional Objects
The DeviceNet Specification defines one additional object, the DeviceNet Object.
14.3.1.4.1 DeviceNet Object (Class Code 0x03)
A DeviceNet Object is required for every DeviceNet port of the device. The instance attributes of this object contain information on how this device uses the DeviceNet port. In particular, there is information about the MAC ID of the device and the (expected) baud rate of the DeviceNet network this device is attached to. Both attributes are always expected to be nonvolatile; i.e., after a power interruption, the device is expected to try to go online again with the same values that were stored in these attributes before the power interruption. Devices that set these values through switches typically override any stored values at power-up.
14.3.1.5 Network Access
DeviceNet uses the network access mechanisms described in the CAN specification, i.e., bitwise arbitration through the CAN Identifier for every frame to be sent.
This requires a system design that does not
allow multiple uses of any of these identifiers. Since the node number of every device is coded into the CAN Identifier (see Section 14.3.1.10), it is generally sufficient to make sure that none of the node numbers exists more than once on any given network. This is guaranteed through the Network Access algorithm (see Section 14.3.1.6).
14.3.1.6 Going Online
Any device that wants to communicate on DeviceNet must go through a Network Access algorithm before any communication is allowed. The main purpose of this process is to avoid duplicate Node IDs on the same network. Every device that is ready to go online sends a Duplicate MAC ID Check Message containing its Port Number, Vendor ID, and Serial Number. If another device is already online with this MAC ID or is in the process of going online, it responds with a Duplicate MAC ID Response Message that directs the checking device to go offline and not communicate any further.
If two or more devices with the same MAC ID happen to try to go online at exactly the same time, all of them will win arbitration at the same time (same CAN ID) and will proceed with their messages. However, since they will exhibit different values in the data field of the message, all devices on the link will flag Cyclic Redundancy Check (CRC) errors and thus cause a repetition of the message. This may eventually result in a Bus-Off condition for these devices, but a situation with duplicate Node IDs is safely avoided.
14.3.1.7 Offline Connection Set
The Offline Connection Set is a set of messages that have been created to communicate with devices that have failed to go online (see Section 14.3.1.6), e.g., to allow setting a new MAC ID. Full details of these messages can be found in [5] or [10].
14.3.1.8 Explicit Messaging
All Explicit Messaging in DeviceNet is done via connections and the associated Connection Object instances. However, these objects must first be set up in the device.
This can be done by using the Predefined Master/Slave Connection Set to activate a static Connection Object already available in the device or by using the Unconnected Message Manager (UCMM) port of a device, through which this kind of Connection Object can be dynamically set up. The only messages that can be sent to the UCMM are Open or Close requests that set up or tear down a Messaging Connection, while the only messages that can be sent to the Master/Slave equivalent are an Allocate or Release request (see also Section 14.3.1.12). Explicit Messages always pass via the Message Router Object to the individual objects (refer to Figure 14.11). As mentioned in Section 14.2.3, Explicit Messages on DeviceNet have a very compact structure to make them fit into the 8-byte CAN frame in most cases. Figure 14.20 shows a typical example of a request message.

FIGURE 14.20 Format of a nonfragmented Explicit Request Message using the 8/8 message body format (1 byte for Class ID, 1 byte for Instance ID).
• Byte 0 (message header): Frag [0] (bit 7), XID (bit 6), MAC ID (bits 5 to 0)
• Byte 1 (message body): R/R [0] (bit 7), Service Code (bits 6 to 0)
• Byte 2: Class ID
• Byte 3: Instance ID
• Bytes 4 to 7: Service data (optional)
© 2005 by CRC Press
14-22
The Industrial Communication Technology Handbook
  Byte 0:    Frag [0] (bit 7), XID (bit 6), MAC ID (bits 5-0)   (Message header)
  Byte 1:    R/R [1] (bit 7), Service Code (bits 6-0)           (Message body)
  Bytes 2-7: Service data ... (optional)

FIGURE 14.21 Format of a nonfragmented 8/8 Explicit Response Message.
The consumer of this Explicit Message responds in the format shown in Figure 14.21. The consumer sets the R/R (Request/Response) bit and repeats the Service Code of the request message. If data are transferred with the response, they are entered in the service data field. Most messages will use the 8/8 format shown in Figure 14.20, since they only need to address Class and Instance IDs up to 255. If there is a need to address any Class/Instance combinations above 255, this is negotiated between the two communication partners during the setup of the connection. Should an error occur, the receiver responds with the Error Response Message. The Service Code for this message is 0x14; 2 bytes of error code are returned in the service data field. Further details of the message encoding, including the use of fragmentation, can be found in [5] and [10].

14.3.1.9 I/O Messaging

I/O Messages have a very compact structure; only the naked data are transmitted, without the Run/Idle header and Sequence Count value used in ControlNet and EtherNet/IP. For messages up to 8 bytes long, the full CAN data field is used for I/O data. I/O Messages that are longer use 1 byte of the CAN data field for the fragmentation protocol (Figure 14.22 and Figure 14.23). I/O Messages without data (i.e., with zero-length data) indicate the Idle state of the producing application. Any producing device can do this: master, slave, or peer.
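The request/response exchange described above can be illustrated by packing the fields into bytes. This is a hedged sketch of the 8/8 layout from the text (byte 0: Frag, XID, MAC ID; byte 1: R/R and Service Code), not a complete DeviceNet implementation; the example Service Code 0x0E (Get_Attribute_Single) is the common CIP service but is used here purely for illustration.

```python
# Sketch: packing a nonfragmented DeviceNet Explicit Request/Response in the
# 8/8 message body format. Field layout per the text:
#   byte 0 = Frag (bit 7) | XID (bit 6) | MAC ID (bits 5-0)
#   byte 1 = R/R (bit 7)  | Service Code (bits 6-0)

def build_explicit_request(mac_id, service_code, class_id, instance_id,
                           service_data=b"", xid=0):
    header = (0 << 7) | (xid << 6) | (mac_id & 0x3F)         # Frag = 0
    body = bytes([(0 << 7) | (service_code & 0x7F),          # R/R = 0 (request)
                  class_id & 0xFF, instance_id & 0xFF]) + service_data
    return bytes([header]) + body                            # fits an 8-byte CAN frame

def build_explicit_response(mac_id, service_code, service_data=b"", xid=0):
    header = (0 << 7) | (xid << 6) | (mac_id & 0x3F)
    body = bytes([(1 << 7) | (service_code & 0x7F)]) + service_data  # R/R = 1
    return bytes([header]) + body

# Request to Class 0x01, Instance 1 on the node with MAC ID 4:
req = build_explicit_request(4, 0x0E, 0x01, 0x01)
assert len(req) <= 8 and req[1] == 0x0E
# The matching response repeats the Service Code with the R/R bit set:
rsp = build_explicit_response(4, 0x0E, b"\x02")
assert rsp[1] == 0x8E
```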
  Bytes 0-7: Process data (0-8 bytes)

FIGURE 14.22 Format of a nonfragmented I/O Message, 0 to 8 bytes.
  Byte 0:    Fragmentation protocol
  Bytes 1-7: Process data (0-7 bytes)

FIGURE 14.23 Format of the fragmented I/O Message.
14-23
The CIP Family of Fieldbus Protocols
FIGURE 14.24 I/O Messaging Connections.
As already mentioned, I/O Messages are used to exchange high-priority application and process data via the network, and this communication is based on the Producer/Consumer model. The associated I/O data are always transferred from one producing application object to one or more consuming application objects. This is undertaken using I/O Messages via I/O Messaging Connection Objects (Figure 14.24 shows two consuming applications) that must have been previously set up in the device. This can be done in one of two ways by using:

• The Predefined Master/Slave Connection Set to activate a static I/O Connection Object already available in the device
• An Explicit Messaging Connection Object already available in the device to dynamically create and set up an appropriate I/O Connection Object

I/O Messages usually pass directly to the data of the assigned application object. The Assembly Object is the most common application object used with I/O Connections. Refer to Figure 14.11.

14.3.1.10 Using the CAN Identifier

DeviceNet is based on the standard CAN protocol and therefore uses an 11-bit message identifier. A distinction can therefore be made between 2^11 = 2048 messages. Six bits are sufficient to identify a device, because a DeviceNet network is limited to a maximum of 64 participants. This 6-bit Device Identifier (node address) is also called the MAC ID. The overall CAN Identifier range is divided into four Message Groups of varying sizes (Figure 14.25). In DeviceNet, the CAN Identifier is the Connection ID. It is composed of the ID of the Message Group, the Message ID within this group, and the MAC ID of the device. The MAC ID may be either the source or the destination address; which one applies, and the significance of the message within the system, are defined by the Message Group and the Message ID.

  Connection ID = CAN Identifier (bits 10:0)               Used for
  0 | Message ID (4 bits)  | Source MAC ID (6 bits)        Message Group 1
  1 0 | MAC ID (6 bits)    | Message ID (3 bits)           Message Group 2
  1 1 | Message ID (3 bits) | Source MAC ID (6 bits)       Message Group 3
  1 1 1 1 1 | Message ID (6 bits)                          Message Group 4
  1 1 1 1 1 1 1 | x x x x                                  Invalid CAN Identifiers

FIGURE 14.25 Definition of the Message Groups.
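The Message Group layout can be made concrete with a small decoder that splits an 11-bit CAN Identifier into its group, Message ID, and MAC ID fields. This is an illustrative sketch based on the group boundaries given in the text, not production code.

```python
# Illustrative decoder for the DeviceNet Connection ID layout (Figure 14.25):
# the 11-bit CAN Identifier is split into Message Group, Message ID, and MAC ID.

def decode_connection_id(can_id):
    if can_id <= 0x03FF:                  # Group 1: 0 | MsgID(4) | Source MAC(6)
        return ("Group 1", (can_id >> 6) & 0x0F, can_id & 0x3F)
    if can_id <= 0x05FF:                  # Group 2: 10 | MAC ID(6) | MsgID(3)
        return ("Group 2", can_id & 0x07, (can_id >> 3) & 0x3F)
    if can_id <= 0x07BF:                  # Group 3: 11 | MsgID(3) | Source MAC(6)
        return ("Group 3", (can_id >> 6) & 0x07, can_id & 0x3F)
    if can_id <= 0x07EF:                  # Group 4: 11111 | MsgID(6), no MAC ID
        return ("Group 4", can_id & 0x3F, None)
    return ("Invalid", None, None)        # 0x07F0-0x07FF: not permitted

# Master's Poll Command to slave 3 (Group 2, Message ID 5, MAC ID 3):
assert decode_connection_id(0x041D) == ("Group 2", 5, 3)
# Slave 3's Poll Response (Group 1, Message ID 0xF, source MAC ID 3):
assert decode_connection_id(0x03C3) == ("Group 1", 0xF, 3)
```

Note that in Group 2 the MAC ID field sits in the high bits, which is what allows a CAN controller with an 8-bit hardware mask to filter Group 2 Messages by MAC ID, as the text points out.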
The four Message Groups are used as follows:

• Message Group 1: Assigned 1024 CAN Identifiers (0x0000 to 0x03FF), 50% of all identifiers available. Up to 16 different Message IDs are available to the user per device (node) within this group. The priority of a message from this group is primarily determined by the Message ID (the significance of the message) and only after that by the source MAC ID (the producing device). If two devices transmit at the same time, the device with the lower Message ID always wins the arbitration; if two devices transmit the same Message ID at the same time, the device with the lower node number wins. A 16-stage priority system can thus be set up relatively easily, so the messages of Group 1 are well suited for the exchange of high-priority process data.

• Message Group 2: Assigned 512 identifiers (0x0400 to 0x05FF). Most of the Message IDs of this group are optionally defined for what is commonly referred to as the Predefined Master/Slave Connection Set (see Section 14.3.1.12). One Message ID is defined for network management (Section 14.3.1.6). The priority here is primarily determined by the device address, and only after that by the Message ID. Considering the bit positions in detail shows that a CAN controller with an 8-bit mask is able to filter out its Group 2 Messages based on MAC ID.

• Message Group 3: Has 448 CAN Identifiers (0x0600 to 0x07BF) and a structure similar to that of Message Group 1. Unlike Group 1, however, it mainly carries low-priority process data and is also used for setting up dynamic Explicit Connections. Seven Message IDs are possible per device; two of these are reserved for what is commonly referred to as the UCMM port (Section 14.3.1.11).

• Message Group 4: Has 48 CAN Identifiers (0x07C0 to 0x07EF) and does not include any device addresses, only Message IDs. The messages of this group are only used for network management. Four Message IDs are currently assigned for services of the Offline Connection Set.

The remaining 16 CAN Identifiers (0x07F0 to 0x07FF) are invalid CAN IDs and thus not permitted in DeviceNet systems. This type of CAN Identifier issuing system means that unused Connection IDs (CAN Identifiers) cannot be used by other devices. Each device has exactly 16 Message IDs in Group 1, 8 Message IDs in Group 2, and 7 Message IDs in Group 3. One advantage of this system is that the CAN Identifiers used in the network can always be clearly assigned to a device. Devices are responsible for managing their own identifiers. This simplifies not only the design but also troubleshooting and diagnosis in DeviceNet systems. A central tool that keeps a record of all assignments on the network is not needed.

14.3.1.11 Connection Establishment

As described in Sections 14.3.1.8 and 14.3.1.9, messages in DeviceNet are always exchanged in a connection-based manner. Communication objects must be set up for this purpose. These are not initially available when a device is switched on; they first have to be created. The only port by which a DeviceNet device can be addressed when first switched on is the Unconnected Message Manager port (UCMM port) or the Group 2 Only Unconnected Explicit Message port of the Predefined Master/Slave Connection Set. Picture these ports as doors to the device, each with a lock that only one particular key will fit. The appropriate key is the Connection ID, i.e., the CAN Identifier of the selected port. Other doors in the device can only be opened once the appropriate key is available and other Connection Objects are set up. Setting up a link via the UCMM port is the general procedure to be strived for with all DeviceNet devices. Devices that, in addition to having the Predefined Master/Slave Connection Set, are also UCMM capable are called Group 2 Servers.
A Group 2 Server can be addressed by one or more connections from one or more clients. Since UCMM capable devices need a good amount of processing power to service multiple communication requests, a simplified communication establishment and I/O data exchange method has been created for low-end devices. This is called the Predefined Master/Slave Connection Set (see Section 14.3.1.12). This covers as many as five predefined connections that can be activated (assigned) when
accessing the device. The Predefined Master/Slave Connection Set represents a subset of the general connection establishment method. It is limited to pure Master/Slave relations. Slave devices that are not UCMM capable, and only support this subset, are called Group 2 Only Servers in DeviceNet speak. Only the master that allocates it can address a Group 2 Only Server. All messages received by this device are defined in Message Group 2. More details of the connection establishment using UCMM and the Master/Slave Connection Set can be found in [5] and [10].

14.3.1.12 Predefined Master/Slave Connection Set

Establishing a connection via the UCMM port requires a relatively large number of individual steps that have to be conducted to allow for data exchange via DeviceNet. The devices must provide resources to administer the dynamic connections. Because every device can set up a connection with every other device, and the source MAC ID of the devices is contained in the Connection ID, the CAN Identifier (Connection ID) may have to be filtered via software. This depends on how many connections a device supports, and the type and number of screeners (hardware CAN ID filters) of the CAN chip used in the device's implementation. While this approach provides for taking full advantage of the multicast, peer-to-peer, and Producer/Consumer capabilities of CAN, a simpler method that needs fewer CPU resources is needed for low-end devices. To that end, the Predefined Master/Slave Connection Set was defined. The Group 2 Only Unconnected Explicit Message port of the Predefined Master/Slave Connection Set therefore provides an interface for a set of five preconfigured connection types in a node. The basis of this model is a 1:n communication structure consisting of one control device and decentralized I/O devices. The central instance of such a system is known as the Master, and the decentralized devices are known as Slaves.
Multiple masters are allowed on the network, but a slave can only be allocated to one master at any point in time. The predefined Connection Objects occupy instances 1 to 5 in the Connection Object (Class ID 0x05; see Section 14.2.4):

• Explicit Messaging Connection:
  • Group 2 Explicit Request/Response Message (Instance ID 1)
• I/O Messaging Connections:
  • Polled I/O Connection (Instance ID 2)
  • Bit-Strobe I/O Connection (Instance ID 3)
  • Change of State or Cyclic I/O Connection (Instance ID 4)
  • Multicast Polling I/O Connection (Instance ID 5)

The messages to the slave are defined in Message Group 2, and some of the responses from the slave are contained in Message Group 1. The distribution of Connection IDs for the Predefined Master/Slave Connection Set is defined as shown in Figure 14.26. Because the CAN ID of most of the messages the master produces contains the destination MAC ID of the slave, it is imperative that only one master talks to any given slave. Therefore, before a master can use this Predefined Connection Set, it must first allocate it with the device. The DeviceNet Object manages this important function in the slave device. It allows only one master to allocate its Predefined Connection Set, thereby preventing duplicate CAN IDs from appearing on the wire. The two services used are called Allocate_Master/Slave_Connection_Set (Service Code 0x4B) and Release_Group_2_Identifier_Set (Service Code 0x4C). These two services always access Instance 1 of the DeviceNet Object (Class ID 0x03) (Figure 14.27). Figure 14.27 shows the Allocate Message with 8-bit Class ID and 8-bit Instance ID, a format that is always used when it is sent as a Group 2 Only Unconnected Message. It may also be sent across an existing connection and in a different format if a format other than 8/8 was agreed upon during the connection establishment. The Allocation Choice Byte is used to set which predefined connections are to be allocated (Figure 14.28).
  Connection ID = CAN Identifier (bits 10:0)       Used for
  0 | Group 1 Message ID | Source MAC ID           Group 1 Messages
  0 | 1 1 0 0 | Source MAC ID                      Slave's I/O Multicast Poll Response
  0 | 1 1 0 1 | Source MAC ID                      Slave's I/O Change of State or Cyclic Message
  0 | 1 1 1 0 | Source MAC ID                      Slave's I/O Bit-Strobe Response Message
  0 | 1 1 1 1 | Source MAC ID                      Slave's I/O Poll Response or COS/Cyclic Ack Message
  1 0 | MAC ID | Group 2 Message ID                Group 2 Messages
  1 0 | Source MAC ID | 0 0 0                      Master's I/O Bit-Strobe Command Message
  1 0 | Source MAC ID | 0 0 1                      Master's I/O Multicast Poll Group ID
  1 0 | Destination MAC ID | 0 1 0                 Master's Change of State or Cyclic Acknowledge Message
  1 0 | Source MAC ID | 0 1 1                      Slave's Explicit/Unconnected Response Messages
  1 0 | Destination MAC ID | 1 0 0                 Master's Explicit Request Messages
  1 0 | Destination MAC ID | 1 0 1                 Master's I/O Poll Command/COS/Cyclic Message
  1 0 | Destination MAC ID | 1 1 0                 Group 2 Only Unconnected Explicit Request Messages
  1 0 | Destination MAC ID | 1 1 1                 Duplicate MAC ID Check Messages

FIGURE 14.26 Connection IDs of the Predefined Master/Slave Connection Set.
  Byte 0: Frag [0] (bit 7), XID (bit 6), MAC ID (bits 5-0)   (Message header)
  Byte 1: R/R [0] (bit 7), Service Code [0x4B] (bits 6-0)    (Message body)
  Byte 2: Class ID [0x03]
  Byte 3: Instance ID [0x01]
  Byte 4: Allocation Choice
  Byte 5: 0 (bits 7-6), Allocator's MAC ID (bits 5-0)

FIGURE 14.27 Allocate_Master/Slave_Connection_Set Request Message.

  Bit 7: Reserved
  Bit 6: Ack Suppression
  Bit 5: Cyclic
  Bit 4: Change of State
  Bit 3: Multicast Polling
  Bit 2: Bit-Strobe
  Bit 1: Polled
  Bit 0: Explicit Message

FIGURE 14.28 Format of the Allocation Choice Byte.
The associated connections are activated by setting the appropriate bits. Change of State and Cyclic Connections are mutually exclusive choices. The Change of State/Cyclic Connection may be configured as not acknowledged using acknowledge suppression. The individual connection types are described in more detail below. The allocator’s MAC ID contains the address of the node (master) that wants to assign the Predefined Master/Slave Connection Set. Byte 0 of this message differs from the allocator’s MAC ID if this service has been passed on to a Group 2 Only Server via a Group 2 Only Client (what is commonly referred to as a proxy function).
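The Allocation Choice Byte can be modeled as a set of bit flags. The bit assignments below follow the text; the flag names and the helper function are illustrative, not part of the DeviceNet Specification.

```python
# Sketch of the Allocation Choice Byte (Figure 14.28) as bit flags.
# Bit positions are taken from the text; names are illustrative.
from enum import IntFlag

class AllocationChoice(IntFlag):
    EXPLICIT_MESSAGE  = 0x01  # bit 0
    POLLED            = 0x02  # bit 1
    BIT_STROBE        = 0x04  # bit 2
    MULTICAST_POLLING = 0x08  # bit 3
    CHANGE_OF_STATE   = 0x10  # bit 4
    CYCLIC            = 0x20  # bit 5
    ACK_SUPPRESSION   = 0x40  # bit 6 (bit 7 is reserved)

def make_allocation_choice(*choices):
    value = AllocationChoice(0)
    for c in choices:
        value |= c
    # Change of State and Cyclic are mutually exclusive choices.
    if (value & AllocationChoice.CHANGE_OF_STATE
            and value & AllocationChoice.CYCLIC):
        raise ValueError("Change of State and Cyclic are mutually exclusive")
    return int(value)

# Allocate the Explicit Messaging and Polled I/O Connections:
assert make_allocation_choice(AllocationChoice.EXPLICIT_MESSAGE,
                              AllocationChoice.POLLED) == 0x03
```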
FIGURE 14.29 Polled I/O Connections. The master (MAC ID 0) holds a Link Producer/Link Consumer Object pair per slave; each slave (MAC IDs 3, 5, and 10) consumes the Poll Command Message and produces the Poll Response Message of its Application Object, e.g., Poll Command CID 0x041D and Poll Response CID 0x03C3 for the slave at MAC ID 3, 0x042D/0x03C5 for MAC ID 5, and 0x0455/0x03CA for MAC ID 10 (CID = Connection ID).
The slave, if not already claimed, responds with a Success Message. The connections are now in configuring status. Setting the Expected_Packet_Rate (EPR) (Set_Attribute_Single service to attribute 9 in the appropriate Connection Object, value in ms) starts the connection's time-monitoring function. The connection then changes into the established state, and I/O Messages begin transferring via this connection. The allocated connections can be released individually or collectively through the Release_Group_2_Identifier_Set service (Service Code 0x4C), using the same format as in Figure 14.27, except that the last byte (allocator's MAC ID) is omitted. The following is an explanation of the four I/O Connection types in the Predefined Master/Slave Connection Set.

14.3.1.12.1 Polled I/O Connection

A Polled I/O Connection is used to implement a classic Master/Slave relationship between a control unit and a device. In this setup, a master can transfer data to a slave using the poll request and receive data from the slave using the poll response. Figure 14.29 shows the exchange of data between one master and three slaves in the Polled I/O mode. A message between a master and a slave using the Polled I/O Connection can carry data of any length; if the length exceeds 8 bytes, the fragmentation protocol is automatically used. A Polled I/O Connection is always a point-to-point connection between a master and a slave. The slave consumes the Poll Message and sends back an appropriate response, normally its input data. The Polled Connection is subject to an adjustable time-monitoring function in the device. A Poll Command must be received within this time (4 × EPR); otherwise, the connection changes over into time-out mode. When a connection times out, the node may optionally go to a preconfigured fault state as set up by the user. A master usually polls all the slaves in a round-robin manner.
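The 4 × EPR time-monitoring described above amounts to a simple watchdog. The following is a minimal sketch under the stated assumption that timestamps are supplied in milliseconds by the caller; state names and method names are illustrative.

```python
# Illustrative time-monitoring for a Polled I/O Connection: if no Poll Command
# arrives within 4 x EPR (Expected_Packet_Rate), the connection times out.

class PolledConnection:
    def __init__(self, epr_ms):
        self.epr_ms = epr_ms
        self.last_poll_ms = None
        self.state = "established"

    def on_poll_command(self, now_ms):
        self.last_poll_ms = now_ms           # watchdog is restarted on each poll

    def check_timeout(self, now_ms):
        if (self.last_poll_ms is not None
                and now_ms - self.last_poll_ms > 4 * self.epr_ms):
            self.state = "timed_out"         # node may enter a preconfigured fault state
        return self.state

conn = PolledConnection(epr_ms=100)
conn.on_poll_command(0)
assert conn.check_timeout(350) == "established"   # 350 ms < 4 x 100 ms
assert conn.check_timeout(450) == "timed_out"     # watchdog expired
```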
  Byte 0: bits 7-0 = MAC ID 7 ... MAC ID 0
  Byte 1: bits 7-0 = MAC ID 15 ... MAC ID 8
  ...
  Byte 6: bits 7-0 = MAC ID 55 ... MAC ID 48
  Byte 7: bits 7-0 = MAC ID 63 ... MAC ID 56

FIGURE 14.30 Data format of the Bit-Strobe I/O Connection.
A slave's response time to a poll command is not defined in the DeviceNet Specification. This provides much flexibility for slave devices to be designed appropriately for their primary applications, but it may also exclude a device from use in higher-speed applications.

14.3.1.12.2 Bit-Strobe I/O Connection

The master's transmission on this I/O Connection is also known as a Bit-Strobe Command. Using this command, a master multicasts one message to reach all its slaves allocated for the Bit-Strobe Connection. The frame sent by the master using a Bit-Strobe Command is always 8 bytes or 0 bytes (if Idle). From these 8 bytes, each slave is assigned one bit (Figure 14.30). Each slave can send back as many as 8 data bytes in its response. A Bit-Strobe I/O Connection represents a multicast connection between one master and any number of strobe-allocated slaves (Figure 14.31). Since all devices in a network receive the Bit-Strobe Command at the same time, they can be synchronized by this command. When the Bit-Strobe Command is received, the slave may consume its associated bit and then send a response of up to 8 bytes.
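The per-slave bit mapping of Figure 14.30 (MAC ID n sits in byte n // 8, bit n % 8) can be sketched as follows; the function name is illustrative.

```python
# Sketch of the Bit-Strobe data mapping (Figure 14.30): the 8-byte command
# carries one bit per slave, indexed by MAC ID (byte = MAC ID // 8, bit = MAC ID % 8).

def strobe_bit_for(mac_id, strobe_data):
    """Return the bit addressed to `mac_id` from an 8-byte Bit-Strobe Command."""
    if len(strobe_data) != 8:
        raise ValueError("Bit-Strobe Command data is always 8 bytes")
    return (strobe_data[mac_id // 8] >> (mac_id % 8)) & 1

# Byte 0 = 0x08 sets only bit 3, i.e., the bit for the slave at MAC ID 3:
data = bytes([0x08, 0, 0, 0, 0, 0, 0, 0])
assert strobe_bit_for(3, data) == 1
assert strobe_bit_for(5, data) == 0
assert strobe_bit_for(10, data) == 0   # MAC ID 10 maps to byte 1, bit 2
```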
FIGURE 14.31 Bit-Strobe I/O Connections. The master (MAC ID 0) multicasts one Bit-Strobe Command Message (CID 0x0400) that is consumed by all strobed slaves (MAC IDs 3, 5, and 10); each slave's Application Object produces its own Bit-Strobe Response Message (CIDs 0x0383, 0x0385, and 0x038A, respectively; CID = Connection ID).
FIGURE 14.32 COS/Cyclic I/O Connections. The master (MAC ID 0) and its slaves produce data independently: an acknowledged master-originated connection to the slave at MAC ID 3 (Master COS/Cyclic Message CID 0x041D, Slave Acknowledge Message CID 0x03C3), an acknowledged slave-originated connection from MAC ID 5 via its Ack Handler Object (Slave COS/Cyclic Message CID 0x0345, Master Acknowledge Message CID 0x042A), and an unacknowledged pair with the slave at MAC ID 10 (Master COS/Cyclic Message CID 0x0455, Slave COS/Cyclic Message CID 0x034A; CID = Connection ID).
Since this command uses the source MAC ID in the Connection ID (Figure 14.26), devices that support the Bit-Strobe I/O Connection and have a CAN chip whose screening is limited to only 8 of the 11 CAN ID bits must perform software screening of the CAN Identifier.

14.3.1.12.3 Change of State/Cyclic I/O Connection

The COS/Cyclic I/O Connection differs from the other types of I/O Connections in that both endpoints produce their data independently. This can be done in a change-of-state or cyclic manner. In the former case, the COS I/O Connection recognizes that the application object data indicated by the Produced_Connection_Path have changed. In the latter case, a timer of the Cyclic I/O Connection expires and thereby triggers the transfer of the latest data from the application object. A COS/Cyclic I/O Connection can be set up as acknowledged or unacknowledged. When acknowledged, the consuming side of the connection must set up a defined path to the Acknowledge Handler Object to ensure that retries, if needed, are properly managed. Figure 14.32 shows the various COS/Cyclic I/O Connection possibilities. A COS/Cyclic I/O Connection can also originate from a master; such a connection then looks like a Polled I/O Connection to the slave. This can be seen in Figure 14.26, where the same Connection ID is issued for the master's Polled I/O Message as for the master's COS/Cyclic I/O Message. COS Connections have two additional behaviors. The Expected Packet Rate (EPR) is used as a default production trigger: if the data have not changed by the time the EPR timer expires, the data are re-sent as a heartbeat, so that the consuming node can distinguish a dead node from one whose data simply have not changed. COS Connections also have a Production Inhibit timer to prevent a chattering node from using too much bandwidth.
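The COS/Cyclic production rules (change-of-state trigger, EPR heartbeat, Production Inhibit time) can be condensed into one decision function. This is a simplified sketch of the behavior described above; parameter names are illustrative.

```python
# Illustrative production trigger for a COS/Cyclic I/O Connection:
#  - Cyclic: produce whenever the EPR timer expires.
#  - COS: produce on data change, but not before the Production Inhibit time
#    has elapsed; resend unchanged data as a heartbeat when the EPR expires.

def should_produce(mode, data_changed, since_last_ms, epr_ms, inhibit_ms):
    if mode == "cyclic":
        return since_last_ms >= epr_ms       # timer-driven production
    if since_last_ms < inhibit_ms:
        return False                         # Production Inhibit active
    if data_changed:
        return True                          # change-of-state trigger
    return since_last_ms >= epr_ms           # EPR heartbeat (node is alive)

# A chattering input is held back by the Production Inhibit time:
assert should_produce("cos", data_changed=True, since_last_ms=5,
                      epr_ms=500, inhibit_ms=10) is False
assert should_produce("cos", data_changed=True, since_last_ms=20,
                      epr_ms=500, inhibit_ms=10) is True
# Unchanged data are re-sent as a heartbeat once the EPR expires:
assert should_produce("cos", data_changed=False, since_last_ms=600,
                      epr_ms=500, inhibit_ms=10) is True
```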
14.3.1.12.4 Multicast Polled I/O Connection

This connection is similar to the regular I/O poll, except that all of the slaves belonging to a multicast group consume the same output data from the master. Each slave responds with its own reply data. A unique aspect of this connection is that the master picks the CAN ID from one of the slaves in the multicast group and must then set the consumed CAN ID in each of the other slaves to that same value. If during runtime that slave's connection times out, the master must either stop producing its Multicast Poll Command or pick another slave in the group and reset the command CAN ID in all the remaining slaves in the group to that value before sending another Multicast Poll Command.

14.3.1.12.5 I/O Data Sharing

Due to the inherent broadcast nature of all CAN frames, applications can be set up to listen to the data produced by other applications. Such a listen-only mode is not described in the DeviceNet Specification, but some vendors have created products that do exactly that, e.g., Shared Inputs in Allen-Bradley scanners.

14.3.1.12.6 Typical Master/Slave Start Sequence

A typical start-up of a DeviceNet network with a scanner and a set of slaves is executed as follows:

• All devices run their self-test sequence and then try to go online with the algorithm described in Section 14.3.1.6. Any device that uses an autobaud mechanism to detect the baud rate of a network will have to wait with its Duplicate Node ID Message until it has seen enough CAN frames to detect the correct baud rate.
• Once online, slave devices will do nothing until their master allocates them.
• Once online, a master will try to allocate each slave configured into its scan list by running the following sequence of messages:
  • Try to open a connection to the slave using a UCMM Open Message.
  • If successful, the master can then use this connection for further communication with the slave.
  • If not successful, the master will try again after a minimum wait time of 1 s.
  • If unsuccessful again, the master will try to allocate the slave using the Group 2 Only Unconnected Explicit Request Message (at least for Explicit Messaging).
  • If successful, the master can then use this connection for further communication with the slave.
  • If not successful, the master will try again after a minimum wait time of 1 s.
  • If unsuccessful again, the master will start all over again with the UCMM Message. This process carries on indefinitely until the master has allocated the slave.
• Once the master has allocated the slave, it may carry out some verification to see whether it is safe to start I/O Messaging with the slave. The master may also apply some further configuration to the connections it has established, e.g., setting the Explicit Messaging Connection to "Deferred Delete."
• Setting the EPR value(s) brings the I/O Connection(s) to the established state so that I/O Messaging can commence.

14.3.1.12.7 Master/Slave Summary

The task of supporting the Predefined Master/Slave Connection Set represents a solution that can be easily implemented for the device manufacturer. Simple BasicCAN controllers may be used; software screening of the CAN Identifier is generally not necessary, enabling the use of low-cost 8-bit controllers. This may represent an advantage as far as the devices are concerned but entails disadvantages for the system design. Group 2 Only (i.e., UCMM incapable) devices permit only one Explicit Connection between client (master) and server (slave), whereas UCMM capable devices can maintain Explicit Messaging Connections with more than one client at the same time. If a device wants to communicate with one of the allocated slaves that do not support UCMM, the master recognizes this situation and sets up a communication link with the requestor instead. Any communication between the requestor and the slave is then automatically routed via the master. This is called the proxy function. Since this puts an additional burden on the master and on network bandwidth, it is recommended that slave devices support UCMM.
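The allocation loop of the start sequence in Section 14.3.1.12.6 can be sketched as follows. The two callbacks `try_ucmm_open` and `try_group2_allocate` are hypothetical stand-ins for the real message exchanges, and `max_rounds` bounds the loop only for illustration; a real master retries indefinitely.

```python
# Sketch of the master's allocation sequence (Section 14.3.1.12.6): try UCMM
# first, fall back to the Group 2 Only Unconnected Explicit Request, and wait
# at least 1 s between attempts. The callbacks are hypothetical stand-ins.

import time

def allocate_slave(try_ucmm_open, try_group2_allocate,
                   retry_wait_s=1.0, max_rounds=3):
    for _ in range(max_rounds):              # a real master retries indefinitely
        for attempt in (try_ucmm_open, try_ucmm_open,          # UCMM, then retry
                        try_group2_allocate, try_group2_allocate):  # fallback, retry
            if attempt():
                return True                  # connection established
            time.sleep(retry_wait_s)         # minimum wait before next attempt
    return False

# A UCMM-capable slave answers the first Open request:
assert allocate_slave(lambda: True, lambda: False, retry_wait_s=0.0) is True
# A silent (absent) slave is never allocated within the bounded retries:
assert allocate_slave(lambda: False, lambda: False, retry_wait_s=0.0) is False
```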
Although not explicitly defined in the DeviceNet Specification, DeviceNet masters can, under certain conditions, automatically configure their scan lists or the devices contained in their scan lists. This functionality simply makes use of the messaging capabilities of masters and slaves, which allow the master to read from a slave whatever information is required to start an I/O communication and to download any configurable parameter that has been communicated to the master via EDS. This functionality facilitates the replacement of even complex slave devices without the need for a tool, reducing downtime of a system dramatically.

14.3.1.13 Device Profiles

DeviceNet uses the full set of profiles described in Chapter 6 of the CIP Specification [4].

14.3.1.14 Configuration

EDSs of DeviceNet devices can make full use of all EDS features, but they do not necessarily contain all sections. Typical DeviceNet devices contain (apart from the mandatory sections) at least an IO_Info section. This section specifies which types of Master/Slave connections are supported and which one(s) should be enabled as default. It also declares which I/O Connections may be used in parallel. Chapter 7 of the DeviceNet Specification [5] gives full details of this section of a DeviceNet EDS. A full description of what can be done in DeviceNet EDSs would go well beyond the scope of this handbook, so references [25] and [26] are recommended for further reading.

14.3.1.15 Conformance Test

At an early stage, the ODVA defined test and approval procedures for DeviceNet devices and systems. Manufacturers are given the opportunity to have their devices checked for conformance with the DeviceNet Specification in one of several independent DeviceNet conformance test centers. Only then do two key characteristics of all DeviceNet devices become possible: interoperability and interchangeability. Interoperability means that DeviceNet devices from all manufacturers can be configured to operate with each other on the network. Interchangeability goes one step farther by providing the means for devices of the same type (i.e., they comply with the same Device Profile) to be logical replacements for each other, regardless of the manufacturer. The conformance test checks both of these characteristics. This test is divided into three parts:

• A software test to verify the function of the DeviceNet protocol. Depending on the complexity of the device, as many as several thousand messages are transmitted to the device under test (DUT). To allow a test that is closely adapted to the characteristics of the DUT, a formal description of all relevant features of the DUT must be provided by the manufacturer.
• A hardware test to check conformance with the characteristics of the physical layer. This test checks all requirements of the specification, such as miswiring protection, overvoltage withstand, grounding, and the CAN transceiver. The test may be destructive for noncompliant devices.
• A system interoperability test that verifies that the device can function in a network with more than 60 nodes and a variety of scanners from various manufacturers.

The software test is available from ODVA. It is a Windows-based tool, running on various PC CAN interface cards from a number of suppliers. It is recommended that device developers run this test in their own lab before taking devices to the official ODVA test. The hardware test and the system interoperability test involve more complex test setups that are typically not available to device developers. When a device passes the test, it is said to be DeviceNet CONFORMANCE TESTED®.* Many DeviceNet users now demand this seal. A device that has not been tested accordingly has a significant market disadvantage. Devices that have passed conformance testing are published on the ODVA Web site.
*DeviceNet CONFORMANCE TESTED® is a registered certification mark of ODVA.
14.3.1.16 Tools

Tools for DeviceNet networks can be divided into three groups:

• Physical layer tools: Tools (hardware and/or software) that verify the integrity and conformance of the physical layer or monitor the quality of the data transmission.
• Configuration tools: Software tools that are capable of communicating with individual devices for data monitoring and configuration purposes. They can range from very basic software operating from handheld devices to powerful PC-based software packages to configure complete networks. Most configuration tools are EDS-based; however, more complex devices like scanners tend to have their own configuration applets that are only partially based on EDSs. Some of these tools support multiple access paths to the network, e.g., via Ethernet and suitable bridging devices, and thus allow remote access. High-level tools also actively query the devices on the network to identify them and monitor their health.
• Monitoring tools: Typically PC-based software packages that can capture and display the CAN frames on the network. A raw CAN frame display may be good enough for some experts, but it is recommended to use a tool that allows both raw CAN display and DeviceNet interpretation of the frames.

For a typical installation, a configuration tool is all that is needed. However, to ensure the network is operating reliably, a check with a physical layer tool is highly recommended. Experience shows that the overwhelming majority of DeviceNet network problems are caused by inappropriate physical layer installation. Protocol monitoring tools are mainly used to investigate interoperability problems and to assist during the development process. Turn to the DeviceNet product catalog on the ODVA Web site to access a list of vendors that provide tools for DeviceNet.
14.3.1.17 Advice for Developers
Before any development of a DeviceNet product is started, the following issues should be considered in detail:
• What functionality does the product require today and in future applications?
  • Slave functionality
  • Master functionality
  • Peer-to-peer messaging
  • Combination of the above
• What are the physical layer requirements? Is IP 65/67 required or is IP 20 good enough?
• What type of hardware should be chosen for this product?
• What kind of firmware should be used for this product? Will a commercially available communication stack be used?
• Will the development of hardware and software be done internally or will it be designed by an outside company?
• What kind of configuration software should be used for this product? Will a commercially available software package be used; i.e., is an EDS adequate to describe the device or is custom software needed?
• What are the configuration requirements?
• Will the product be tested for conformance and interoperability (highly recommended)?
• What design and verification tools should be used?
• What is an absolute must before the products can be placed on the market (own the specification, have a Vendor ID)?
A full discussion of these issues goes well beyond the scope of this book; see [27] for further reading.
The CIP Family of Fieldbus Protocols
FIGURE 14.33 Relationship between CIP and ControlNet. [Layer diagram, mapped to the ISO/OSI model: at the Application layer, the User Device Profiles (Semiconductor, Pneumatic valves, Position controller, AC Drives, other profiles) and the CIP Application Layer with its Application Object Library; at the Presentation layer, the CIP Data Management Services; at the Session layer, CIP Message Routing and Connection Management; at the Transport and Network layers, Explicit and I/O Messages carried by the ControlNet Transport, the DeviceNet Transport, or Encapsulation over TCP/UDP and IP; at the Data Link layer, ControlNet CTDMA, CAN CSMA/NBA, or Ethernet CSMA/CD; and at the Physical layer, the ControlNet, DeviceNet, or Ethernet physical layer. ATM, USB, FireWire, etc., are shown as possible future alternatives.]
14.3.1.18 DeviceNet Overall Summary
Since its introduction in 1994, DeviceNet has been used successfully in millions of nodes in many different applications. It is a de facto standard in many countries, and this situation is reflected in several national and international standards [16], [17], [18]. Due to its universal communication characteristics, it is one of the most versatile fieldbuses for low-end devices. While optimized for devices with small amounts of I/O, it can easily accommodate larger devices as well. Powerful EDS-based configuration tools allow easy commissioning and configuration of even complex devices without the need to consult manuals. While most applications are of the Master/Slave type, peer-to-peer communication is used in a growing number of applications, greatly simplifying the design, operation, and maintenance of these networks. With the introduction of CIP Safety on DeviceNet, many machine-level applications that today need a set of dedicated networks will soon be accommodated in only one DeviceNet network. Finally, its use of CIP and object structure allows the blending of DeviceNet networks into an overall CIP network structure that permits seamless communication, as if it were only one network.
14.3.2 ControlNet ControlNet is based on a physical layer and a bus access mechanism that was specifically developed for this network to provide absolute determinism. All other features are based on CIP. Figure 14.33 shows the relationship between CIP, ControlNet, and the ISO/OSI layer model. 14.3.2.1 Physical Layer and Frame Structure The physical layer of ControlNet has specifically been designed for this network; it does not reuse any existing open technology. The basis of the physical layer is a 75-Ω coax trunk line cable (typically of the RG6 type) terminated at both ends. To reduce impedance mismatch, all ControlNet devices are connected to the network through special taps that consist of a coupling network and a specific length of dropline (1 m). There is no minimum distance requirement between any two of these taps, but since every tap introduces some signal attenuation, each tap reduces the maximum length of the trunkline by 16.3 m.
FIGURE 14.34 Coax medium topology limits. [Graph of segment length (m) versus number of taps, falling from 1000 m at 2 taps to 250 m at 48 taps: maximum allowable segment length = 1000 m - 16.3 m × (number of taps - 2).]
This results in a full-length trunkline of 1000 m with only two taps at the ends, while a fully populated physical network with 48 taps allows a trunkline length of 250 m (Figure 14.34). This physical layer limitation was taken into account from the very beginning by including repeaters in the design that can increase the network size without lowering the speed. Therefore, if a network is to be built with a higher number of nodes (up to 99 nodes are possible) or with a topology that goes beyond the single trunkline limitations, repeaters can be used to create any tree topology or even a ring topology using a special type of repeater. There are also repeaters for fiber-optic media that can be used either to increase the system size even further or to allow very good isolation of network segments in harsh EMC (Electromagnetic Compatibility) environments or for high-voltage applications. The number of repeaters in series between any two nodes was until recently limited to five; better repeater technology now allows up to 20 repeaters in series. However, whatever media technology is used, the overall length of a ControlNet system (distance between any two nodes on the network) is limited. This fundamental limit is due to propagation delay. With currently available media, this translates into approximately 20 km. To better accommodate industry requirements, ControlNet supports redundant media, allowing bumpless transfer from primary to secondary media or vice versa if one of them should fail or deteriorate. Developers are encouraged to support this redundant media feature in their design. For cost-sensitive applications, less expensive device variants may then be created by populating one channel only. Another feature often used in the process industry is the capability to run ControlNet systems into areas with an explosion hazard. The system is fully approved to meet worldwide standards for intrinsic safety (explosion protection).
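The tap derating just described lends itself to a quick calculation. The sketch below (Python; the function and parameter names are ours, not from the ControlNet Specification) applies the 16.3 m-per-tap rule to a single coax segment:

```python
def max_segment_length_m(num_taps: int) -> float:
    """Maximum allowable ControlNet coax segment length, applying the
    rule that every tap beyond the first two shortens the 1000 m
    trunkline by 16.3 m."""
    if not 2 <= num_taps <= 48:
        raise ValueError("a coax segment supports 2 to 48 taps")
    return 1000.0 - 16.3 * (num_taps - 2)

print(max_segment_length_m(2))    # 1000 m with taps only at the ends
print(max_segment_length_m(48))   # about 250 m when fully populated
```

Note that a fully populated segment (48 taps) comes out at just over 250 m, matching the figure.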
The connectors used for copper media are of the BNC type; TNC type connectors have been introduced recently for applications that require IP 67 protection. Devices may also implement a Network Access Port (NAP). This feature takes advantage of the repeater function of the ControlNet application-specific integrated circuits (ASICs). It uses an additional connector (RJ45) with RS 422-based signals that provides easy access to any node on the network for configuration devices. The signal transmitted on the copper media is a 5 Mbit/s Manchester encoded signal with an amplitude of up to 9.5 V (pk-pk) at the transmitter that can be attenuated down to 510 mV (pk-pk) at the receiving end. The transmitting and receiving circuits, coupled to the cable through transformers, are described in full detail in the ControlNet Specification [3].
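As a sanity check (our own arithmetic, not a figure from the specification), the transmit and receive amplitudes just quoted imply an end-to-end attenuation budget of roughly 20·log10(9.5 V / 510 mV) ≈ 25 dB:

```python
import math

def attenuation_budget_db(tx_vpp: float, rx_min_vpp: float) -> float:
    """Attenuation budget in dB between the transmitted amplitude and
    the weakest amplitude the receiver must still accept (both values
    in peak-to-peak volts)."""
    return 20.0 * math.log10(tx_vpp / rx_min_vpp)

print(round(attenuation_budget_db(9.5, 0.510), 1))  # about 25.4 dB
```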
14.3.2.2 Protocol Adaptation
ControlNet can use all features of CIP. The ControlNet frame is big enough that fragmentation is rarely required. Since ControlNet is not expected to be used in very simple devices, there is no scaling.
14.3.2.3 Indicators and Switches
ControlNet devices must be built with device status and network status indicators, as described in the specification. Devices may have additional indicators, which must not carry any of the names of those described in the specification. Devices may be built with or without switches or other directly accessible means for configuration. If switches for the MAC ID exist, then certain rules apply about how these values have to be used at power-up and during the operation of the device.
14.3.2.4 Additional Objects
The ControlNet Specification defines three additional objects, the ControlNet Object (Class Code 0xF0), the Keeper Object (Class Code 0xF1), and the Scheduling Object (Class Code 0xF2).
14.3.2.4.1 ControlNet Object
The ControlNet Object contains a host of information on the state of the ControlNet link of the device, among them diagnostic counters, data link and timing parameters, and the MAC ID. A ControlNet Object is required for every physical layer attachment of the device. A redundant channel pair counts as one attachment.
14.3.2.4.2 Keeper Object
The Keeper Object, which is not required in every device, holds, on behalf of the link scheduling software, a copy of the Connection Originator schedule data for all Connection Originator devices using the network. Every ControlNet network with scheduled I/O traffic must have at least one device with a Keeper Object (typically a Programmable Logic Controller (PLC) or another Connection Originator). If there are multiple Keeper Objects on a link, they perform negotiations to determine which Keeper is the Master Keeper and which Keeper(s) performs Backup Keeper responsibilities. The Master Keeper is the Keeper actively distributing attributes to the nodes on the network.
A Backup Keeper is one that monitors Keeper-related network activity and can transition into the role of Master Keeper should the original Master Keeper become inoperable.
14.3.2.4.3 Scheduling Object
The Scheduling Object is required in every device that can originate an I/O Messaging Connection. Whenever a link scheduling tool accesses a Connection Originator on a ControlNet link, an instance of the Scheduling Object is created and a set of object-specific services is used to interface with this object. Once the instance is created, the link scheduling tool can then read and write connection data for all connections originating from this device. After having read all connection data from all Connection Originators, the link scheduling tool can calculate an overall schedule for the ControlNet link and write this data back to all Connection Originators. The scheduling session is ended by deleting the instance of the Scheduling Object.
14.3.2.5 Network Access
The bus access mechanism of ControlNet allows full determinism and repeatability while still maintaining sufficient flexibility for various I/O Message triggers and Explicit Messaging. This bus access mechanism is called Concurrent Time Domain Multiple Access (CTDMA); it is illustrated in Figure 14.35. The time axis is divided into equal intervals called the Network Update Time (NUT). Within each NUT there is a subdivision into a Scheduled Service Time, an Unscheduled Service Time, and the Guardband. Figure 14.36 shows the function of the Scheduled Service. Every node up to and including the SMAX node (maximum node number participating in the Scheduled Service) has a chance to send a message
FIGURE 14.35 Media access through CTDMA.
FIGURE 14.36 Scheduled Service.
within the Scheduled Service. If a particular node has no data to send, it will nevertheless send a short frame to indicate that it is still alive. If a node fails to send its frame, the next-higher node number will step in after a very short, predetermined waiting time. This makes sure that the failure of a node will not lead to an interruption of the NUT cycle.
Figure 14.37 shows the function of the Unscheduled Service. Since this service is designed for non-time-critical messages, only one node is guaranteed to get access to the bus during the Unscheduled Service Time. If there is time left, then the other nodes (with higher node numbers) will also get a chance to send. As with the Scheduled Service Time, if a node fails to send when its turn comes, the next node will step in. The node number that is allowed to send first within the Unscheduled Service Time is increased by 1 in each NUT. This guarantees an equal chance to all nodes. All node sequencing in this interval wraps; UMAX is followed by the lowest node number (typically 1) on the network. These two service intervals, combined with the Guardband, guarantee determinism and repeatability while still maintaining sufficient freedom to allow for unscheduled message transmissions, e.g., for parameterization.
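The sequencing rules of the two service intervals can be mimicked in a few lines. This sketch models only the node ordering (fixed rotation with dead-node skipping in the Scheduled Service, a rotating first sender in the Unscheduled Service); all names are ours and no ControlNet timing is modeled:

```python
def scheduled_order(smax, live_nodes):
    """Scheduled Service: nodes 1..SMAX transmit in a fixed order each
    NUT; a node that fails to send is skipped after a short,
    predetermined wait, so the cycle is never interrupted."""
    return [n for n in range(1, smax + 1) if n in live_nodes]

def unscheduled_first_sender(nut_index, umax):
    """Unscheduled Service: the node allowed to send first advances by
    one every NUT and wraps after UMAX, giving every node an equal
    chance over successive NUTs."""
    return (nut_index % umax) + 1

# Over UMAX consecutive NUTs, each node leads the unscheduled interval
# exactly once.
leads = [unscheduled_first_sender(i, umax=4) for i in range(4)]
```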
FIGURE 14.37 Unscheduled Service.
FIGURE 14.38 MAC frame format. [Preamble (16 bits) | Start Delimiter (8 bits) | Source MAC ID (8 bits) | one or more Lpackets (0...510 bytes in total) | CRC (16 bits) | End Delimiter (8 bits).]
14.3.2.6 Frame Description
Every frame transmitted on ControlNet has the format of the MAC frame (Figure 14.38). Within every MAC frame, there is a field of up to 510 bytes that is available for the transmission of data or messages. This field may be populated with one or several Lpackets (link packets). These Lpackets carry the individual messages (I/O or Explicit) of CIP. There are also some specialized Lpackets used for network management. Since every node always listens to all MAC frames, any node can consume any Lpacket in a frame, whether that Lpacket is unicast, multicast, or broadcast in nature. This feature allows fine-tuned multicasting of small amounts of data to different sets of consumers without too much overhead. There are two types of Lpacket formats: fixed tag and generic tag. The fixed tag Lpackets are used for Unconnected Messaging and network administration, while the generic tag Lpackets are used for all Connected Messaging (I/O and Explicit). Figure 14.39 shows the format of a fixed tag Lpacket. By including the destination MAC ID, this format reflects the fact that these Lpackets are always directed from the requesting device (sending the MAC frame) to the target device (the destination MAC ID). The service byte within the fixed tag Lpacket does
FIGURE 14.39 Fixed tag Lpacket format. [Size (1 byte) | Control (1 byte) | Service (1 byte) | Destination MAC ID (1 byte) | Link data (3...506 bytes).]
FIGURE 14.40 Generic tag Lpacket format. [Size (1 byte) | Control (1 byte) | Connection ID (3 bytes) | Link data (0...504 bytes).]
not represent the service of an Explicit Message, but a more general service type since the fixed tag Lpacket format can be used for a variety of actions such as network administration. Figure 14.40 shows the format of a generic tag Lpacket. The size byte specifies the number of words within the Lpacket; the control byte gives information on what type of Lpacket this is. The 3-byte Connection Identifier specifies which connection this Lpacket belongs to. These three bytes are the three lower bytes of the 4-byte Connection ID specified in the Forward_Open Message. The ControlNet Specification gives full details on how to assemble the three lower bytes of the Connection ID; the uppermost byte is always zero. For a device that receives the MAC frame, the Connection ID is the indication whether to ignore the Lpacket (the device is not part of the connection), to consume the data and forward it to the application (the device is an endpoint of this connection), or to forward the data to another network (the device acts as a bridge in a bridged connection). 14.3.2.7 Network Start-Up After power-up, every ControlNet device goes through a process of getting access to the ControlNet communication link and learning the current NUT and other timing requirements. This is a fairly complex process typically handled by the commercially available ControlNet ASICs. It would go beyond the scope of this handbook to describe all the details here. 14.3.2.8 Explicit Messaging Unlike DeviceNet, Explicit Messages on ControlNet can be sent connected or unconnected; both are typically transmitted within the unscheduled part of the NUT. Connected Explicit Messaging requires setting up a connection first (see Section 14.3.2.10). This, of course, means that all resources required for the management of the connection must stay reserved for this purpose as long as the connection exists. To avoid tying up these resources, most Explicit Messages can also be sent unconnected. 
Every part of an Explicit Message (request, response, acknowledgments) is wrapped into an Lpacket using the fixed tag Lpacket format for Unconnected Messaging (Figure 14.39) and the generic tag Lpacket format for Connected Messaging (Figure 14.40). The service/class/instance/attribute fields (see Section 14.2.3) of the Explicit Message are contained in the link data field.
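The Connection ID handling described in Section 14.3.2.6 can be made concrete with a small builder for generic tag Lpackets. This is an illustration only: the field order and sizes follow Figure 14.40, and the size byte counts 16-bit words as the text states (we read it as covering the whole Lpacket, including itself); the little-endian byte order and the zero pad byte for odd-length packets are our assumptions, not statements of the specification:

```python
def generic_tag_lpacket(control: int, connection_id: int, link_data: bytes) -> bytes:
    """Build a generic tag Lpacket: size (in 16-bit words), control,
    3-byte Connection ID, link data.  Only the three lower bytes of the
    4-byte Connection ID from the Forward_Open travel on the wire; the
    uppermost byte is always zero on ControlNet."""
    if connection_id >> 24:
        raise ValueError("uppermost byte of the Connection ID must be zero")
    body = bytes([control]) + connection_id.to_bytes(3, "little") + link_data
    total = 1 + len(body)          # assumed: the size byte counts itself too
    words = (total + 1) // 2       # round up to whole 16-bit words
    packet = bytes([words]) + body
    if total % 2:                  # assumed zero pad byte for odd lengths
        packet += b"\x00"
    return packet
```

For an I/O Message, the link data would carry the 16-bit sequence count followed by the I/O data, as described in Section 14.3.2.9.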
FIGURE 14.41 Device levels.
14.3.2.9 I/O Messaging
ControlNet I/O Messaging, like any other CIP I/O Messaging, is done across connections, and it always takes place in the scheduled part of the NUT. Only one MAC frame may be transmitted by any device within its time slot, but this MAC frame may contain multiple Lpackets so that data can be sent to multiple nodes in one NUT. The individual Lpackets may be consumed by one node only or by multiple nodes if they are set up to consume the same data. I/O Messages use the generic tag Lpacket format (Figure 14.40). The link data field contains the I/O data prepended with a 16-bit sequence count number for the packet. I/O data transmission without the Sequence Count Number is possible in principle, but is not used today. Run/Idle can be indicated within a prepended Run/Idle header or by sending the data packet (Run) or no data packet (Idle). Which of the two methods is used is indicated in the connection parameters in the Connection Manager section of the EDS. However, only the Run/Idle header method has been in use for ControlNet up to now.
14.3.2.10 Connection Establishment
All connections on ControlNet are established using a UCMM Forward_Open message (see Section 14.2.3); therefore, all devices must support the UCMM function.
14.3.2.11 Device Levels
While not official categories, it is useful to distinguish among several levels of devices (Figure 14.41); a device need only implement the functionality it requires. The minimal device function (level 1) is that of a Messaging Server. It is used for Explicit Messaging applications only and acts as a target for Connected and Unconnected Explicit Messages, e.g., for program upload/download, data collection, status monitoring, etc. The next class of device (level 2) is an I/O Server. It adds I/O Messaging support to a level 1 device and acts as a target for both Explicit and I/O Messages, e.g., simple I/O devices, pneumatic valves, AC drives, etc. These devices are also called adapters.
Another class of devices is a Messaging Client (level 3). It adds client support to level 1 Explicit Messaging applications and acts as a target and an originator for messaging applications, e.g., computer interface cards or Human-Machine Interface (HMI) devices. Finally, the most powerful class of device is a scanner (level 4). It adds I/O Message origination support to levels 1, 2, and 3 and acts as a target and an originator for Explicit and I/O Messages, e.g., PLCs, I/O scanners, etc. 14.3.2.12 Device Profiles ControlNet uses the full set of profiles described in Chapter 6 of the CIP Specification [4]. 14.3.2.13 Configuration ControlNet devices typically come with EDSs, as described in Section 14.2.7. For EDS-based configuration tools, the EDS should contain a Connection Manager section to describe the details of the connections that can be made into the device. This section basically is a mirror of what is contained in the
Forward_Open message that a Connection Originator would send to the device. Multiple connections can be specified within an EDS, and one or more can then be chosen by the configuration tool. An EDS may also contain individual parameters or a Configuration Assembly with a complete description of all parameters within this Assembly. In many applications, the Configuration Assembly is transmitted as an attachment to the Forward_Open Message.
14.3.2.14 Conformance Test
ControlNet International has defined a conformance test for ControlNet devices. Currently, this test is a protocol conformance test only, since it is expected that most implementations use the commercially available components for transformers and drivers. As many as several thousand messages are transmitted to the DUT, depending on the complexity of the device. To allow a test that is closely adapted to the characteristics of the DUT, a formal description of all relevant features of the DUT must be provided by the manufacturer. The software test is available from ControlNet International. It is a Windows-based tool, running on a PC interface card through a NAP connection (see Section 14.3.2.1). It is recommended that device developers run this test in their own labs before taking devices to the official ControlNet International test. When a device passes the test, it is said to be ControlNet CONFORMANCE TESTED™.* Many ControlNet users now demand this seal; a device that has not been tested accordingly is at a significant disadvantage in the market. Devices that have passed conformance testing are published on the ControlNet International Web site.
14.3.2.15 Tools
Tools for ControlNet networks can be divided into three groups:
• Physical layer tools: Tools (hardware and software) that verify the integrity and conformance of the physical layer or monitor the quality of the data transmission.
• Configuration tools: Software tools that are capable of communicating with individual devices for data monitoring and configuration purposes. Most configuration tools are EDS-based; however, more complex devices like scanners tend to have their own configuration applets that are only partially based on EDSs. Some of these tools support multiple access paths to the network, e.g., via Ethernet and suitable bridging devices, and thus allow remote access. High-level tools also actively query the devices on the network to identify them and monitor their health. Configuration tools may also be integrated into other packages like PLC programming software.
• Monitoring tools: Typically PC-based software packages that can capture and display the ControlNet frames on the network. A raw ControlNet frame display may be good enough for some experts, but it is recommended that a tool that allows both raw ControlNet frame display and interpreted frames be used.
For a typical installation, a configuration tool is all that is needed. However, to ensure that the network is operating reliably, a check with a physical layer tool is highly recommended. Experience shows that the overwhelming majority of ControlNet network problems are caused by inappropriate physical layer installation. Protocol monitoring tools are mainly used to investigate interoperability problems and to assist during the development process. Turn to the ControlNet product catalog on the ControlNet International Web site to access a list of vendors that provide tools for ControlNet.
14.3.2.16 Advice for Developers
Before any development of a ControlNet product is started, the following issues should be considered in detail:
*ControlNet CONFORMANCE TESTED™ is a certification mark of ControlNet International.
• What functionality (device class; see Section 14.3.2.11) does the product require today and in future applications?
  • Messaging server only
  • Adapter functionality
  • Messaging client
  • Scanner functionality
• What are the physical layer requirements? Is IP 65/67 required or is IP 20 good enough?
• Will the development be based on commercially available hardware components and software packages (recommended) or designed from scratch (possible but costly)?
• What are the configuration requirements?
• Will the product be tested for conformance (highly recommended)?
• What design and verification tools should be used?
• What is an absolute must before products can be placed on the market (own the specification, have a Vendor ID)?
ControlNet chip sets and associated software packages are available from Rockwell Automation and through ControlNet International. Turn to the ControlNet International Web site for a list of companies that can support ControlNet developments.
14.3.2.17 ControlNet Overall Summary
Since its introduction in 1997, ControlNet has been used successfully in hundreds of thousands of nodes in many different applications. It is the network of choice for many high-speed I/O and PLC interlocking applications. Like DeviceNet, ControlNet has been turned into an international standard [19]. Due to its universal communication characteristics, it is one of the most powerful controller-level fieldbuses. The specific strength of ControlNet is its full determinism and repeatability, which make it ideally suited for many high-speed applications while maintaining full Explicit Messaging capabilities without compromising its real-time behavior. Finally, its use of CIP and object structure allows the blending of ControlNet networks into an overall CIP network structure that permits seamless communication, as if it were only one network.
14.3.3 EtherNet/IP EtherNet/IP is the newest member of the CIP family; it is a technology supported by both ODVA and ControlNet International. EtherNet/IP has evolved from ControlNet and is therefore very similar to ControlNet in the way the CIP Specification is applied. Due to the length of the Ethernet frames and the typical multimaster structure of Ethernet networks, there are no particular limitations in the EtherNet/IP implementation of CIP. Basically all that is required is a mechanism to encode CIP Messages into Ethernet frames. Figure 14.42 shows that there is an encapsulation mechanism (see Section 14.3.3.6) that specifies how I/O and Explicit Messages are wrapped into Ethernet frames. The well-known TCP/IP protocol is used for the encapsulation of Explicit Messages, while UDP/IP is used for the encapsulation of I/O Messages. The use of the very popular TCP/IP and UDP/IP stacks for encapsulation means that many applications will not require extra middleware for this purpose, since these stacks are already in use in many applications anyway. Even with the use of certain infrastructure devices (see Section 14.3.3.16) it is difficult to make today’s Ethernet fully deterministic. Therefore, many CIP users may prefer ControlNet for applications that require full determinism and repeatability. However, future extensions to CIP such as CIP Sync (see Section 14.5.1) will allow EtherNet/IP to be used in highly synchronous and deterministic applications like coordinated drives. 14.3.3.1 Physical Layer Adaptation Since EtherNet/IP is taking the Ethernet protocol to the factory floor, there are some restrictions and further requirements on the physical layer [12] that is to carry EtherNet/IP in a typical factory automation
FIGURE 14.42 Relationship between CIP and EtherNet/IP. [The same ISO/OSI layer diagram as Figure 14.33, here read along the EtherNet/IP path: CIP Explicit and I/O Messages are encapsulated over TCP and UDP, IP, Ethernet CSMA/CD, and the Ethernet physical layer; ATM, USB, FireWire, etc., are shown as possible future alternatives.]
environment. The actual signaling is left unchanged, but there are some additional specifications on connectors and cabling. For IP 20 applications, the well-known RJ45 connector is used, but for applications that require a higher degree of protection, suitable connectors have been specified. The EtherNet/IP Specification lists a sealed connector based on the RJ45 type. A second connector (D-coded M12) is a recent addition for devices that require a more compact connector. This connector has also been specified by a number of other organizations, so it is expected that it will become the de facto standard for field devices. Cat 5E or Cat 6 shielded or unshielded cables are recommended for EtherNet/IP. The use of shielded cables is specifically recommended in applications where adjacent material, such as metal cable ducts, may have substantial influence on the characteristics of the cable. Copper media may only be used for distances up to 100 m. Fiber-optic media are recommended for longer distances. Fiber-optic media may also be advisable for applications with very high electromagnetic disturbances or high-voltage potential differences between devices.
14.3.3.2 Frame Structure
EtherNet/IP uses standard Ethernet TCP/IP and UDP/IP frames as defined by international standards [12]. Therefore, no further frame details are described here.
14.3.3.3 Protocol Adaptation
EtherNet/IP can use all features of CIP. The Ethernet frame is big enough that fragmentation is rarely required. Since EtherNet/IP is not expected to be used in very simple devices, no further scaling than that described in Section 14.3.3.10 is required.
14.3.3.4 Indicators and Switches
EtherNet/IP devices that need to conform to the industrial performance level must have the set of indicators described in Chapter 9 of the EtherNet/IP Specification [6]. Devices may have additional indicators, which must not carry any of the names of those described in the specification.
Devices may be built with or without switches or other directly accessible means for configuration.
FIGURE 14.43 Relationship between CIP and Ethernet frames. [Ethernet frame: Ethernet Header | IP Header | TCP or UDP Header | Encapsulation Header | Encapsulation Data | Trailer. Only the encapsulation header and encapsulation data are described in the EtherNet/IP Specification.]
14.3.3.5 Additional Objects
The EtherNet/IP Specification defines two additional objects, the TCP/IP Object (Class Code 0xF5) and the Ethernet Link Object (Class Code 0xF6).
14.3.3.5.1 TCP/IP Object
The TCP/IP interface object provides a mechanism to configure a device's TCP/IP network interface. Examples of configurable items include the device's IP address, network mask, and gateway address.
14.3.3.5.2 Ethernet Link Object
The Ethernet link object maintains link-specific counters and status information for an Ethernet 802.3 communications interface. Each device has exactly one instance of the Ethernet link object for each Ethernet 802.3 communications interface. A request to access instance 1 of the Ethernet link object always refers to the instance associated with the communications interface over which the request was received.
14.3.3.6 EtherNet/IP Encapsulation
EtherNet/IP is completely based on existing TCP/IP and UDP/IP technologies and uses these principles without any modification. TCP/IP is mainly used for the transmission of Explicit Messages, while UDP/IP is used mainly for I/O Messaging. The encapsulation protocol defines a reserved TCP port number that is supported by all EtherNet/IP devices. All EtherNet/IP devices accept at least two TCP connections on TCP port number 0xAF12. The encapsulation protocol also defines a reserved UDP port number that is supported by all EtherNet/IP devices. All devices accept UDP packets on UDP port number 0xAF12. However, most UDP port assignments in EtherNet/IP are determined by (TCP) Explicit Messages; most EtherNet/IP UDP messages do not, in fact, use port 0xAF12. Since UDP, unlike TCP, does not provide ordered delivery, whenever UDP is used to send an encapsulated message, the entire message is sent in a single UDP packet, and only one encapsulated message is present in any UDP packet.
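Two of the rules above are worth pinning down in code: the single reserved port number (0xAF12, i.e., 44818 decimal, shared by TCP and UDP) and the one-encapsulated-message-per-datagram constraint on UDP. A minimal sketch (names ours):

```python
ENCAP_PORT = 0xAF12   # 44818: the reserved EtherNet/IP TCP and UDP port

def udp_payloads(encapsulated_messages):
    """UDP, unlike TCP, offers no ordered byte stream, so each
    encapsulated message maps to exactly one UDP datagram payload;
    messages are never split across datagrams or coalesced."""
    return [bytes(m) for m in encapsulated_messages]

payloads = udp_payloads([b"\x01\x02", b"\x03"])  # two messages, two datagrams
```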
14.3.3.6.1 General Use of the Ethernet Frame
Since EtherNet/IP is completely based on Ethernet with TCP/IP and UDP/IP, all CIP-related messages sent on an EtherNet/IP network are based on Ethernet frames with an IP header (Figure 14.43). The Ethernet, IP, and TCP or UDP headers are described through international standards (see Section 14.3.3.2); therefore, details of these headers are only mentioned in the EtherNet/IP Specification when necessary to understand how they are used. The encapsulation header describes the meaning of the encapsulation data. Most encapsulation data use the so-called Common Packet Format. I/O Messages sent in UDP frames do not carry an encapsulation header, but they still follow the Common Packet Format.
14.3.3.6.2 Encapsulation Header and Encapsulation Commands
The overall encapsulation packet has the structure described in Figure 14.44. While the description of some of the encapsulation header details would go beyond the scope of this handbook, the command field needs some more attention here. However, only those commands that are
© 2005 by CRC Press
14-44
The Industrial Communication Technology Handbook
FIGURE 14.44 Structure of the encapsulation packet. The encapsulation header comprises the Command (2 bytes), Length (2 bytes), Session Handle (4 bytes), Status (4 bytes), Sender Context (8 bytes), and Options (4 bytes) fields, followed by 0 to 65,511 bytes of encapsulated data in the Common Packet Format.
FIGURE 14.45 Common Packet Format. An Item Count (2 bytes) is followed by an Address Item, a Data Item, and optional additional items; each item consists of a Type ID (2 bytes), a Length (2 bytes), and its data.
needed to understand EtherNet/IP are described here, and their descriptions only list the main features. The encapsulated data as such follows the Common Packet Format (see Section 14.3.3.6.2.4).
14.3.3.6.2.1 ListIdentity Command — The ListIdentity command is a broadcast UDP message that tells all EtherNet/IP devices to return a data set with identity information. This command is typically used by software tools to browse a network.
14.3.3.6.2.2 RegisterSession/UnRegisterSession Commands — These two commands are used to register and unregister a CIP Session between two devices. Once such a session is established, it can be used to exchange further messages. Multiple sessions may exist between two devices, but this is not common. The device requesting the session creates a sender context value; the device receiving the session request creates a session handle. Both values are used to identify messages between the two devices.
14.3.3.6.2.3 SendRRData/SendUnitData Commands — The SendRRData command is used for Unconnected Messaging; the SendUnitData command is used for Connected Explicit Messaging.
14.3.3.6.2.4 Common Packet Format — The Common Packet Format is a construct that allows packing of multiple items into one encapsulation frame (Figure 14.45). However, in most cases, only one Address Item and one Data Item are present. All encapsulated messages are then assembled using at least these two items within the Common Packet Format. Full details of this encapsulation can be found in Chapter 2 of the EtherNet/IP Specification [6].
14.3.3.7 IP Address Assignment
Since the initial development of TCP/IP, numerous methods for configuring a device's IP address have evolved. Not all of these methods are suitable for industrial control devices. In the office environment, for example, it is common for a PC to obtain its IP address via the Dynamic Host Configuration Protocol (DHCP), potentially getting a different address each time the PC reboots.
This is acceptable because the PC is typically a client device that only makes requests, so there is no impact if its IP address changes.
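Because a target device's address may not be known beforehand, browsing tools typically fall back on the ListIdentity broadcast described in Section 14.3.3.6.2.1. A minimal sketch in Python follows; the function names are illustrative, and parsing of the identity data in each reply is omitted:

```python
import socket
import struct

ENIP_UDP_PORT = 0xAF12  # 44818, the reserved EtherNet/IP UDP port

def build_list_identity():
    """ListIdentity (0x63): a bare 24-byte encapsulation header, no data."""
    return struct.pack('<HHII8sI', 0x63, 0, 0, 0, b'\x00' * 8, 0)

def browse(timeout=2.0):
    """Broadcast ListIdentity and collect (address, raw reply) pairs."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    sock.sendto(build_list_identity(), ('255.255.255.255', ENIP_UDP_PORT))
    replies = []
    try:
        while True:
            data, addr = sock.recvfrom(4096)
            replies.append((addr[0], data))
    except socket.timeout:
        pass
    finally:
        sock.close()
    return replies
```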
The CIP Family of Fieldbus Protocols
FIGURE 14.46 UCMM request encapsulation. The encapsulation header carries Command [0x6F], Length [bytes], Session Handle, Status [0], Sender Context, and Options [0]; the data portion carries Interface Handle [0], Timeout, Item Count [2], a Null Address Item [Type 0, Length 0], and a Data Item [Type 0x00B2] whose length field precedes the Message Router Request Packet.
However, for an industrial control device that is a target of communication requests, the IP address cannot change at each power-up. If you're talking to a particular PLC, you want that PLC to be at the same address the next time it powers up. To further complicate matters, the only interface common to all EtherNet/IP devices is an Ethernet communications port. Some devices may also have a serial port, user interface display, hardware switches, or other interfaces, but these are not universally shared across all devices. Since Ethernet is the common interface, the initial IP address must at least be configurable over Ethernet. The EtherNet/IP Specification, via the TCP/IP Interface Object, defines a number of ways to configure a device's IP address. A device may obtain its IP address via the Bootstrap Protocol (BOOTP), DHCP, or an explicit Set_Attribute (single or set-all) service. None of these methods is mandated, however. As a result, vendors could choose different methods for configuring IP addresses. From the user's perspective, it is desirable for vendors to support some common mechanism(s) for IP address configuration. Therefore, ODVA, the Profibus User Organization (PNO), and Modbus/IDA (Interface for Distributed Automation) are currently working on mandating a set of common methods to assign an IP address across the Ethernet link. The current ODVA recommendations on this subject can be downloaded from the ODVA Web site [8].
14.3.3.8 Use of the Encapsulation Data
14.3.3.8.1 Explicit Messaging
Unlike DeviceNet, Explicit Messages on EtherNet/IP can be sent connected or unconnected. Connected Explicit Messaging requires setting up a connection first (see Section 14.3.3.9). This, of course, means that all resources required for the management of the connection must stay reserved for this purpose as long as the connection exists. To avoid tying up these resources, most Explicit Messages can also be sent unconnected.
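An unconnected request of this kind can be sketched as follows. The fragment assembles the SendRRData encapsulation of Figure 14.46 around a given Message Router Request; the names are illustrative, and the session handle is assumed to come from a prior RegisterSession exchange:

```python
import struct

def send_rr_data(mr_request, session, timeout=0):
    """Wrap a Message Router Request packet in a SendRRData (0x6F) message.

    Layout per Figure 14.46: interface handle (0 for CIP) and timeout,
    then a Common Packet Format with a Null Address Item (Type 0x0000)
    and an Unconnected Data Item (Type 0x00B2) carrying the request."""
    cpf = struct.pack('<H', 2)                        # item count
    cpf += struct.pack('<HH', 0x0000, 0)              # null address item
    cpf += struct.pack('<HH', 0x00B2, len(mr_request)) + mr_request
    data = struct.pack('<IH', 0, timeout) + cpf       # interface handle, timeout
    header = struct.pack('<HHII8sI', 0x6F, len(data), session, 0,
                         b'\x00' * 8, 0)              # encapsulation header
    return header + data
```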
Explicit Messages on EtherNet/IP are sent with a TCP/IP header and use encapsulation with the SendRRData Command (unconnected) and the SendUnitData Command (connected). As an example, the full encapsulation of a UCMM request is shown in Figure 14.46. The Message Router Request Packet, containing the message as such, follows the general format of Explicit Messages defined in Chapter 2 of the CIP Specification [4]. 14.3.3.8.2 I/O Messaging I/O Messages on EtherNet/IP are sent with a UDP/IP header. No encapsulation header is required, but the message still follows the Common Packet Format (e.g., Figure 14.47). The data field contains the I/O data prepended with a 16-bit Sequence Count Number for the packet. I/O data transmission without the Sequence Count Number is possible in principle, but is not used today. Run/Idle can be indicated within a Run/Idle header or by sending the data packet (Run) or no data packet (Idle). Which of the two methods is used is indicated in the connection parameters of the
FIGURE 14.47 I/O Message encapsulation. Item Count [2] is followed by a Sequenced Address Item [Type 0x8002, Length 8] carrying the Connection ID and Sequence Number, and a Connected Data Item [Type 0x00B1] whose data comprise the Sequence Count Value, the Run/Idle Header, and the I/O Data.
Connection Manager section of the EDS. However, the Run/Idle header method is recommended for use in EtherNet/IP, and this is what is shown in Figure 14.47. I/O Messages from the originator to the target are typically sent as UDP unicast frames, while those sent from the target to the originator are typically sent as UDP multicast frames. This allows other EtherNet/IP devices to listen to this input data. To avoid these UDP multicast frames propagating all over the network, it is highly recommended that switches that support Internet Group Management Protocol (IGMP) Snooping be used. IGMP (see [41]) is a protocol that allows the automatic creation of multicast groups. Using this functionality, the switch will automatically create and maintain a multicast group consisting of the devices that need to consume these multicast messages. Once the multicast groups have been established, the switch will direct such messages only to those devices that have subscribed to the multicast group of that message. 14.3.3.9 Connection Establishment All connections on EtherNet/IP are established using a UCMM Forward_Open Message (see Section 14.2.3); therefore, all devices must support the UCMM function. 14.3.3.10 Device Levels (Clients, Servers) While not official categories, it is useful to distinguish among several levels of devices (Figure 14.48); one only has to implement the functionality needed. The minimal device function (level 1) is that of a Messaging Server. It is used for Explicit Messaging applications only and acts as a target for Connected and Unconnected Explicit Messages, e.g., for program upload/download, data collection, status monitoring, etc. The next class of device (level 2) is an I/O Server. It adds I/O Messaging support to a level 1 device and acts as a target for both Explicit and I/O Messages, e.g., simple I/O devices, pneumatic valves, AC drives, etc. These devices are also called adapters. Another class of device is a Messaging Client (level 3). 
It adds client support to level 1 Explicit Messaging applications and acts as a target and an originator for messaging applications, e.g., computer interface cards or HMI devices. Finally, the most powerful class of device is a scanner (level 4). It adds I/O Message origination support to levels 1, 2, and 3 and acts as a target and an originator for Explicit and I/O Messages, e.g., PLCs, I/O scanners, etc. 14.3.3.11 Device Profiles EtherNet/IP uses the full set of profiles described in Chapter 6 of the CIP Specification [4]. 14.3.3.12 Configuration EtherNet/IP devices typically come with EDSs, as described in Section 14.2.7. For EDS-based configuration tools, the EDS should contain a Connection Manager section to describe the details of the connections that can be made into the device. This section basically is a mirror of what is contained in the Forward_Open message that a Connection Originator would send to the device. Multiple connections can be specified within an EDS that can then be chosen by the configuration tool.
FIGURE 14.48 Device levels.
An EDS may also contain individual parameters or a Configuration Assembly with a complete description of all parameters within this Assembly. In many applications, the Configuration Assembly is transmitted as an attachment to the Forward_Open Message. 14.3.3.13 Conformance Test Conformance testing is mandatory for all EtherNet/IP devices. Currently, this test is a protocol conformance test only since it is expected that most implementations use commercially available components for media access and physical attachments. Depending on the complexity of the device, as many as several thousand messages are transmitted to the DUT. To allow a test that is closely adapted to the characteristics of the DUT, a formal description of all relevant features of the DUT must be provided by the manufacturer. The software test is available from ODVA. It is a Windows-based tool, running on a PC with a standard Ethernet card. It is recommended that device developers run this test in their own labs before taking devices to the official ODVA test. When a device passes the test, it is said to be EtherNet/IP CONFORMANCE TESTED™.* Devices that have passed conformance testing are published on the ODVA Web site. 14.3.3.14 Requirements for TCP/IP Support In addition to the various requirements set forth in the EtherNet/IP Specification, all EtherNet/IP hosts are required to have a minimally functional TCP/IP suite and transport mechanism. The minimum host requirements for EtherNet/IP hosts shall be those covered in RFC 1122 [36], RFC 1123 [37], and RFC 1127 [38] and the subsequent documents that may supersede them. Whenever a feature or protocol is implemented by an EtherNet/IP host, that feature shall be implemented in accordance with the appropriate RFC documents, regardless of whether the feature or protocol is considered required or optional by this specification. The Internet and RFCs are dynamic. 
There will be changes to the RFCs and to the requirements included in this section as the Internet and this specification evolve, and these changes will not always provide for backward compatibility. All EtherNet/IP devices shall at a minimum support:
• Internet Protocol (IP version 4) (RFC 791 [29])
• User Datagram Protocol (UDP) (RFC 768 [28])
• Transmission Control Protocol (TCP) (RFC 793 [31])
• Address Resolution Protocol (ARP) (RFC 826 [32])
• Internet Control Message Protocol (ICMP) (RFC 792 [30])
• Internet Group Management Protocol (IGMP) (RFC 1112 [35] and RFC 2236 [41])
• IEEE 802.3 (Ethernet) as defined in RFC 894 [33]
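The IGMP requirement ties directly to the multicast I/O traffic discussed in Section 14.3.3.8.2: a consumer must join the producer's multicast group, and the resulting IGMP membership report is what an IGMP-snooping switch uses to limit flooding. A sketch with standard sockets follows; the port constant and the function names are assumptions for illustration, since on a real connection the group address and port are taken from the Forward_Open exchange:

```python
import socket

IO_UDP_PORT = 0x08AE  # 2222; an assumption here, conveyed by Forward_Open in practice

def membership_request(group_ip, local_ip='0.0.0.0'):
    """8-byte ip_mreq structure: multicast group address + local interface."""
    return socket.inet_aton(group_ip) + socket.inet_aton(local_ip)

def join_io_multicast(group_ip, local_ip='0.0.0.0', port=IO_UDP_PORT):
    """Bind a UDP socket and join the producer's multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((local_ip, port))
    # IP_ADD_MEMBERSHIP triggers the IGMP membership report on the wire
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group_ip, local_ip))
    return sock
```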
*EtherNet/IP CONFORMANCE TESTED™ is a certification mark of ODVA.
FIGURE 14.49 Relationship of CIP to other typical Ethernet protocols.
Although the encapsulation protocol is suitable for use on networks other than Ethernet that support TCP/IP, and products may be implemented on these other networks, conformance testing of EtherNet/IP products is limited to those products on Ethernet. Other suitable networks include:
• Point-to-Point Protocol (PPP) (RFC 1171 [39])
• ARCNET (RFC 1201 [40])
• Fiber Distributed Data Interface (FDDI) (RFC 1103 [34])
14.3.3.15 Coexistence of EtherNet/IP and Other Ethernet-Based Protocols
EtherNet/IP devices are encouraged but not required to support other Internet protocols and applications not specified in the EtherNet/IP Specification. For example, they may support Hypertext Transfer Protocol (HTTP), Telnet, File Transfer Protocol (FTP), etc. The EtherNet/IP Specification makes no requirements with regard to these protocols and applications. Figure 14.49 shows the relationship between CIP and other typical Ethernet-based protocol stacks. Since EtherNet/IP, like many other popular protocols, is based on TCP/IP and UDP/IP, coexistence with many other services and protocols is no problem at all, and CIP blends nicely into the set of already existing functions. This means that anybody already using some or all of these popular Ethernet services can add CIP without too much of a burden; the existing services like HTTP or FTP may remain as before, and CIP will become another service on the process layer.
14.3.3.16 Ethernet Infrastructure
To successfully apply EtherNet/IP to the automation world, the issue of determinism has to be considered. The inherent principle of the Ethernet bus access mechanism, whereby collisions are detected and nodes back off and try again after a while, cannot guarantee determinism. While Ethernet in its present form cannot be made strictly deterministic, there are ways to improve this situation.
First, the hubs typically used in many office environments have to be replaced by the more intelligent switches that will forward only those Ethernet frames that are intended for nodes connected to this switch. With the use of full-duplex switch technology, collisions are completely avoided; instead of colliding, multiple messages sent to the same node at the same time are queued up inside the switch and are then delivered one after another. As already mentioned in Section 14.3.3.8.2, it is highly recommended that switches that support IGMP Snooping be used.
If EtherNet/IP networks are to be connected to a general company network, then this should always be done through a router. The router keeps the UDP multicast messages from propagating into the company network and makes sure that broadcast or multicast office traffic does not congest the control network. Even though the router separates the two worlds, it can be set up to allow the TCP/IP-based Explicit Messages to pass through so that a configuration tool sitting in a PC in the office environment may very well be capable of monitoring and configuring devices on the control network. 14.3.3.17 Tools Tools for EtherNet/IP networks can be divided into four groups: • Physical layer tools: Tools (hardware and software) that verify the integrity and conformance of the physical layer or monitor the quality of the data transmission. • Commissioning tools: All EtherNet/IP devices need an IP address. In some cases, the setting of this address can only be achieved through the Ethernet link (see Section 14.3.3.7). In these cases, a BOOTP/DHCP server tool is required such as the free BOOTP/DHCP routine downloadable from the Rockwell Automation Web site. • Configuration tools: Software tools that are capable of communicating with individual devices for data monitoring and configuration purposes. Most configuration tools are EDS based; however, more complex devices like scanners tend to have their own configuration applets that are only partially based on EDSs. Some of these tools support multiple access paths to the network, e.g., via suitable bridging devices. High-level tools also actively query the devices on the network to identify them and monitor their health. Configuration tools may also be integrated into other packages like PLC programming software. • Monitoring tools: Typically PC-based software packages (called sniffers) that can capture and display the Ethernet frames on the network. 
A raw Ethernet frame display may be good enough for some top experts, but it is recommended that a tool that allows both raw Ethernet frame display and multiple levels of frame interpretation (IP, TCP/UDP, EtherNet/IP header interpretation) be used. Due to the popularity of Ethernet, a large number of sniffers are available, but not all of them support EtherNet/IP decoding. For a typical installation, a commissioning tool and a configuration tool are all that is needed. Protocol monitoring tools are mainly used to investigate interoperability problems and to assist during the development process. Turn to the EtherNet/IP product catalog on the ODVA Web site to access a list of vendors that provide tools for EtherNet/IP. 14.3.3.18 Advice for Developers Before any development of an EtherNet/IP product is started, the following issues should be considered in detail: • What functionality (device class; see Section 14.3.3.10) does the product require today and in future applications? • Messaging server only • Adapter functionality • Messaging client • Scanner functionality • What are the physical layer requirements? Is IP 65/67 required or is IP 20 good enough? • Will the development be based on commercially available hardware components and software packages (recommended) or designed from scratch (possible but costly)? • What are the configuration requirements? • What design and verification tools should be used?
• What is an absolute must before products can be placed on the market (own the specification, have a Vendor ID, have the product conformance tested)? Ethernet chip sets and associated base software packages are available from a large number of vendors on the market. For support of the EtherNet/IP part of the development, turn to the ODVA Web site for a list of companies that can support EtherNet/IP developments. 14.3.3.19 EtherNet/IP Overall Summary Since its introduction in 2000, EtherNet/IP has shown remarkable growth in many applications that used to be done with traditional fieldbuses. This success is largely attributed to the fact that this TCP/UDP/ IP-based Ethernet system has introduced real-time behavior into the Ethernet domain without giving up any of its highly appreciated features such as company-wide access with standard and specialized tools through corporate networks. The specific strength of EtherNet/IP is the fact that it does not require a modified or highly segregated network; standard switches and routers as known in the office world can be used without modification. At the same time, this means that all existing transport-level or TCP/UDP/IP-level protocols can continue to be used without any need for special bridging devices. The substantially improved real-time behavior of CIP Sync and the introduction of CIP Safety will soon allow EtherNet/IP to be used in applications that today need a set of several dedicated fieldbuses. Finally, its use of CIP and object structure allows the blending of EtherNet/IP networks into an overall CIP network structure that allows seamless communication, just as if it was only one network.
14.4 Benefits of the CIP Family The benefits of the CIP family can be subdivided into two groups: • Benefits for the manufacturer of devices • Benefits for the user of devices and systems
14.4.1 Benefits for the Manufacturer of Devices Major benefits for manufacturers come from the fact that existing knowledge can be reused from one protocol to another. This results in lower training costs for development, sales, and support personnel. Reduced development costs can be achieved since certain parts (e.g., parameters, profiles) of the embedded firmware can be reused from one network to another because they are identical. As long as these parts are written in a high-level language, the adaptation is simply a matter of running the right compiler for the new system. Another very important advantage for manufacturers is the easy routing of messages from one system to another. Any routing device can be designed very easily, since there is no need to invent a translation from one system to another; both systems already speak the same language. Manufacturers also benefit from dealing with the same organizations for support and conformance testing.
14.4.2 Benefits for the Users of Devices and Systems Major benefits for users come from the fact that existing knowledge can be reused from one protocol to another, e.g., Device Profiles and the behavior of devices are identical from one system to another. This results in lower training costs. Technical personnel and users do not have to make very large changes to adapt an application from one type of CIP network to another. The system integrator can choose the CIP network that is best suited to his application without having to sacrifice functionality. A further, very important benefit comes from the ease of bridging and routing within the CIP family. Moving information between noncompatible fieldbuses is always difficult and cumbersome, since it is almost impossible to translate functionality from one fieldbus to another. This is where the full benefits
of CIP can be reaped. Forwarding of data and messages from top to bottom and back again is very easy to implement and uses very little system overhead. There is no need to translate from one data structure to another — they are the same. Services and status codes share the same benefit: these, too, are identical over all CIP networks. Finally, creating a message that runs through multiple hops of CIP networks is simply a matter of inserting the full path from the originating to the target device. Not a single line of code or any other configuration is required in the routing devices. This results in fast and efficient services that are easy to create and maintain. Even though these networks may be used in different parts of the application, messaging from one end to another really works as if there is only one network. Finally, the very efficient Producer/Consumer mechanisms used in all CIP networks result in very fast and efficient use of the transmission bandwidth, with the result that system performance is often much higher than that with other fieldbuses running at higher raw baud rates. Only the data that are really important will be transmitted, instead of repeating old data again and again. Planned and future protocol extensions will always be integrated in a manner that allows coexistence of normal devices with enhanced devices like those supporting CIP Sync and CIP Safety. Therefore, no strict segmentation into Standard, CIP Sync, and CIP Safety networks is required unless there is a compelling reason, e.g., unacceptably high response time due to high bus load.
14.5 Protocol Extensions under Development
14.5.1 CIP Sync
14.5.1.1 General Considerations
CIP networks as described in [3], [4], [5], and [6] have a real-time behavior that is appropriate for many applications, but there are a growing number of applications that require much tighter control of certain real-time parameters. Let us have a look at some of them:
• Real time: This term is used with a number of different meanings in various contexts. For further use in this section, the following definition applies: a system exhibits real-time behavior when it can react to an external stimulus within a predetermined time. How short or how long this time is depends on the application. Demanding industrial control applications require reactions in the millisecond range, while some process control applications can often live with reaction times of several seconds or more.
• Determinism: A deterministic system allows worst-case determination (not a prediction or a probability) of when a certain action takes place. Industrial communication systems may offer determinism to a greater or lesser degree depending on how they are implemented and used. Networks featuring message transmission at a predetermined point in time, such as ControlNet, SERCOS interface, and Interbus-S, are often said to offer absolute determinism. On the other hand, networks such as Ethernet may become nondeterministic under certain load conditions, specifically when deployed in half-duplex mode with hubs. However, when Ethernet is deployed with full-duplex high-speed switches, it operates in a highly deterministic manner (see Section 14.3.3.16).
• Reaction time: In an industrial control system, the overall system reaction time is what determines the real-time behavior. The communication system is only one of several contributing factors to the overall reaction time. In general, it is the time from an input stimulus to a related output action.
• Jitter: The term jitter is used to define the time deviation of a certain event from its average occurrence. Some communication systems rely on a very small message jitter, while most applications only require that a certain jitter is not exceeded for actions at the borders of the system, such as input sampling jitter and output action jitter.
• Synchronicity: Distributed systems often require certain actions to take place in a coordinated fashion; i.e., these actions must take place at a predetermined moment in time independent of
where the action is to take place. A typical application is coordinated motion or electronic gearing. Some of these applications require a synchronicity in the microsecond range.
• Data throughput: This is the capability of a system to process a certain amount of data within a certain time span. For communication systems, protocol efficiency, the communication model (e.g., Producer/Consumer), and endpoint processing power are most important, while the wire speed only sets the limit of how much raw data can be transmitted across the physical media.
CIP Sync is a CIP-based communication principle that enables synchronous low-jitter system reactions without the need for low-jitter data transmission. This is of great importance in systems that do not provide absolutely deterministic data transmission or where it is desirable for a variety of higher-layer protocols to run in parallel to the application system protocol. The latter situation is characteristic of Ethernet. Most users of TCP/IP-based Ethernet want to keep using it as before without the need to resort to a highly segregated network segment to run the real-time protocol. The CIP Sync communication principle meets these requirements.
14.5.1.2 Using IEEE 1588 Clock Synchronization
The recently published IEEE standard 1588 — Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems [24] — lays the foundation for a precise synchronization of real-time clocks in a distributed system. An IEEE 1588 system consists of a Time Master that distributes its system time to Time Slaves in a tree-like structure. The Time Master may be synchronized with another real-time clock higher up in the hierarchy, while the Time Slaves may be Time Masters for other devices below them. A Time Slave that is Time Master to another set of devices (typically in another part of the system) is also called a Boundary Clock.
The time distribution is done by multicasting a message with the actual time of the master clock. This message originates in a relatively high layer of the communication stack, and therefore, the actual transmission takes place at a slightly later point in time. Also, there will be a variation of the stack processing time from one message to another. To compensate for this delay and its jitter, the actual transmission time can be captured in a lower layer of the communication stack, such as noting the "transmit complete" feedback from the communication chip. The captured transmission time is then distributed in a follow-up message. The average transmission delay is also determined so that the time offset between master and slave clock can also be compensated. This protocol has been fully defined for Ethernet UDP/IP systems, and the protocol details for further industrial communication systems are to follow. The clock synchronization accuracy that can be achieved with this system largely depends on the precision time capture of the master clock broadcast message. Hardware-assisted time capture systems can reach a synchronization accuracy of 250 ns or less. It is expected that Ethernet chip manufacturers will offer integrated IEEE 1588 hardware support in the very near future.
14.5.1.3 Additional Object
CIP Sync will require the addition of a new time synchronization object. This object manages the real-time clock inside a CIP Sync device and provides access to the IEEE 1588 timing information. Figure 14.50 shows the relationship of the additional object required for CIP Sync.
14.5.1.4 CIP Sync Communication Principles
Real-time clocks coordinated through the IEEE 1588 protocol on their own do not constitute a real-time system yet. Additional details to show how time stamping is used for input sampling and for the coordination of output actions will be added. Some Device Profiles will be extended as well to incorporate time information in their I/O Assemblies.
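The Sync/Follow_Up exchange and the delay measurement described above reduce to a simple pair of equations. A sketch follows, assuming a symmetric path delay; the function name and timestamps are illustrative:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Classic IEEE 1588 offset/delay computation (simplified sketch).

    t1: master time the Sync message left (from the Follow_Up message)
    t2: slave time the Sync message arrived
    t3: slave time the Delay_Req message left
    t4: master time the Delay_Req arrived (from the Delay_Resp message)
    Assumes the path delay is the same in both directions."""
    offset = ((t2 - t1) - (t4 - t3)) / 2.0  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2.0   # one-way path delay
    return offset, delay

# A slave clock 100 us ahead of the master over a 10 us one-way path:
offset, delay = ptp_offset_and_delay(0.0, 110e-6, 200e-6, 110e-6)
```

The slave then steers its local clock by the computed offset; repeating the exchange tracks drift over time.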
Details of this activity are under discussion in the ODVA Distributed Motion Control JSIG. 14.5.1.5 Message Prioritization Combining these three elements (Sections 14.5.1.2, 14.5.1.3, and 14.5.1.4) with collision-free infrastructure (see Section 14.3.3.16) is sufficient to build a real-time system. However, it is necessary to consider all traffic within the system and arrange all application messages containing time-critical data in such a
FIGURE 14.50 CIP extensions required for CIP Sync. The figure shows the ISO/OSI layering of CIP: the Device Profiles (semiconductor, pneumatic valves, AC drives, position controller, and others) and the CIP Application Object Library, with the new CIP Sync Object added alongside them; below, the CIP data management, message routing, and connection management services over the DeviceNet Transport, ControlNet Transport, and the encapsulation over TCP/UDP/IP; the data-link layers (ControlNet CTDMA, CAN CSMA/NBA, Ethernet CSMA/CD) and their physical layers; and possible future alternatives such as ATM, USB, and FireWire.
way that they are guaranteed to arrive at all consumers in time. When other Ethernet protocols, such as HTTP or FTP, with possibly very long frames, need to coexist in the same system, the situation may need careful configuration. Ethernet frames with up to 1500 bytes of payload (approximately 122 µs long in a 100 Mbit/s system) can easily congest the system and delay important messages by an undetermined amount of time, possibly too long for correct functioning of the system. This is where message prioritization becomes an important element. Of the many prioritization schemes in use or proposed for Ethernet today, EtherNet/IP uses message prioritization according to IEEE 802.3:2002 [13], a scheme supported by many switches available today. It allows preferential treatment of Ethernet frames in such a way that the frames with the highest priority jump the message queues in a switch and get transmitted first. Messages with high priority are transmitted immediately, while those with lower priority typically have to wait. Suitable priority assignment for all time-critical messages then guarantees their preferential treatment. Standard EtherNet/IP and other Ethernet messages get low or no priority and thus have to wait until all higher-priority messages have passed. Once this prioritization scheme is implemented, one full-length frame can be tolerated within every communication cycle consisting of a set of prioritized input (port A through port E) and output (port F) messages. Figure 14.51 illustrates this process.
14.5.1.6 Applications of CIP Sync
Typical applications for CIP Sync are time-stamping sensor inputs, distributed time-triggered outputs, and distributed motion such as electronic gearing or camming applications. For example, in motion applications, the sensors sample their actual positions at a predetermined time, i.e., in a highly synchronized way, and transmit them to the application master that coordinates the motion.
The application master then calculates the new reference values and sends them to the motion drives. Using CIP Sync, it is no longer necessary to have extremely low jitter in the communication system; it is sufficient to transmit all time-critical messages in time, and their exact arrival time becomes irrelevant. The assignment of suitable priorities to CIP Sync communication guarantees that all time-critical messages always get the bandwidth they need and all other traffic is automatically limited to the remaining bandwidth.
FIGURE 14.51 Ethernet frame prioritization. (Diagram: prioritized CIP frames arrive at switch ports A through E, and one Ethernet frame without priority (frame 2) arrives among them; after prioritization in the switch, port F transmits the frames in the order 1, 3, 4, 5, 2, 6, 7, 8, 9. The numbers inside the frames indicate their relative arrival time at the switch port.)
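The forwarding behavior shown in Figure 14.51 amounts to strict priority queuing at the switch output port. Below is a minimal sketch; the two priority levels and the frame numbering follow the figure, while the function and data layout are invented for illustration:

```python
# Minimal sketch of strict-priority forwarding in a switch output queue
# (cf. Figure 14.51). Frame IDs and the two priority levels are taken from
# the figure; everything else is an illustrative assumption.
import heapq

def transmit_order(frames):
    """frames: list of (arrival, priority, frame_id). Higher priority goes
    first; ties are broken by arrival order (FIFO within a priority class)."""
    queue = []
    for arrival, priority, frame_id in frames:
        # heapq is a min-heap, so negate the priority to pop highest first
        heapq.heappush(queue, (-priority, arrival, frame_id))
    return [heapq.heappop(queue)[2] for _ in range(len(queue))]

# Frames 1, 3, 4, and 5 carry prioritized CIP traffic; frame 2 is a plain
# Ethernet frame that arrived second but is forwarded last.
queued = [(1, 1, 1), (2, 0, 2), (3, 1, 3), (4, 1, 4), (5, 1, 5)]
print(transmit_order(queued))   # -> [1, 3, 4, 5, 2]
```

Real switches are non-preemptive, which is why the text tolerates one full-length low-priority frame per communication cycle: a frame already on the wire cannot be interrupted.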
As a result of these measures, CIP Sync devices can coexist side by side with other EtherNet/IP devices without any need for network segmentation or special hardware. Even non-EtherNet/IP devices, provided they do not override any of the CIP Sync prioritizations, can be connected without any loss of performance in the CIP Sync application.
14.5.1.7 Expected Performance of CIP Sync Systems
As already mentioned, CIP Sync systems can be built to maintain a synchronization accuracy of better than 250 ns, in many cases without the use of Boundary Clocks. The communication cycle, and thus the reaction delay to unexpected events, is largely governed by the number of CIP Sync devices in a system. Allowing some bandwidth (approximately 40%) for non-CIP Sync messages, as described in Section 14.5.1.5, the theoretical limit (close to 100% wire load) for the communication cycle of a CIP Sync system based on a 100 Mbit/s Ethernet link is around 500 µs for 30 coordinated motion axes, with 32 bytes of data each.
14.5.1.8 CIP Sync Summary
CIP Sync based on EtherNet/IP is a natural extension of the EtherNet/IP system into the very fast real-time domain. In contrast to many other proposed or existing real-time extensions, it does not require any strict network segmentation between high-performance real-time sections and other parts of the communication system. This results in truly open systems that can tolerate the vast majority of parallel TCP/IP-based protocols found in today's industrial communication architectures without compromising performance. In a first phase, the CIP Sync principles will be applied to EtherNet/IP, while an extension to the other CIP implementations will follow at a later time.
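The cycle-time figure quoted in Section 14.5.1.7 can be sanity-checked with standard Ethernet framing overhead. The assumption of one input and one output frame per axis per cycle is ours, and CIP encapsulation overhead is ignored, so this is an order-of-magnitude estimate only:

```python
# Back-of-the-envelope check of the ~500 us cycle for 30 axes with 32 bytes
# of data each on 100 Mbit/s Ethernet. Per-frame overhead uses standard
# Ethernet/IP/UDP numbers; CIP encapsulation overhead is ignored.
WIRE_RATE = 100e6 / 8                   # 100 Mbit/s -> 12.5e6 bytes per second
PER_FRAME = 8 + 14 + 4 + 12 + 20 + 8    # preamble + MAC hdr + FCS + gap + IP + UDP
DATA = 32                               # motion data per axis (above the 18-byte
                                        # minimum left by IP/UDP in a minimal frame)

frame_time = (PER_FRAME + DATA) / WIRE_RATE   # about 7.8 microseconds per frame
cycle_time = 2 * 30 * frame_time              # one input and one output frame per axis
print(round(cycle_time * 1e6, 1), "us")       # same order as the quoted ~500 us
```

The estimate lands just under 500 µs at full wire load, consistent with the figure in the text once protocol overhead and the reserved non-CIP Sync bandwidth are taken into account.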
14.5.2 CIP Safety
CIP Safety, like other safety protocols based on industry-standard networks, adds additional services to transport data with high integrity. Unlike with other networks, the user of CIP Safety does not have to change his approach when going from one network or medium to another. CIP Safety presents a scalable, network-independent approach to safety network design, in which the safety services are described in a well-defined layer. This approach also enables the routing of safety data, allowing the user to create end-to-end safety chains across multiple links without being forced to use difficult-to-manage gateways.
14.5.2.1 General Considerations
Hardwired safety systems, used in the past and still today, employ safety relays that are interconnected to provide a safety function. Hardwired systems are difficult to develop and maintain for all but the most trivial applications. Furthermore, these systems place significant restrictions on the distance between devices.
FIGURE 14.52 CIP communications layers, including safety.
Because of these issues, as well as distance and cost considerations, it is desirable to allow safety services to be implemented on standard communication networks. The key to the development of safety networks was not to create a network that could not fail, but to create a system where failures in the network would cause safety devices to go to a known state. If the user knew which state the system would go to, he could make his application safe. But this meant that significantly more checking and redundant coding information would be required. To determine the additional safety requirements, an existing railway standard [20] was used and later extended by the German Safety Bus committee [21]. This committee provided design guidelines to safety network developers to allow their networks and safety devices to be certified according to IEC 61508 [15]. Using these results, the Common Industrial Protocol, which allows network-independent routing of standard data, was extended to allow high-integrity safety services. The result is a scalable, routable, network-independent safety layer, which removes the requirement for dedicated safety gateways. Since all safety devices execute the same protocol, independent of which medium they reside on, the user approach is consistent and independent of the medium or network used. CIP Safety is an extension to standard CIP that has been approved by TÜV Rheinland for use in IEC 61508 SIL 3 and EN 954-1 category 4 applications. It extends the model by adding CIP Safety application layer functionality, as shown in Figure 14.52. The additions include several safety-related objects and Safety Device Profiles. Because the safety application layer extensions ensure integrity themselves (see Section 14.5.2.3) and do not rely on the integrity of the underlying standard CIP (described in Section 14.2) or the data link layers (described in Sections 14.3.1, 14.3.2, and 14.3.3), single-channel (nonredundant) hardware can be used for the data link communication interface.
This same partitioning of functionality allows standard routers to be used to route safety data, as shown in Figure 14.53. The routing of safety messages is possible because the end device is responsible for ensuring the integrity of the data. If an error occurs in the transmission of data or in an intermediate router, the end device will detect the failure and take an appropriate action. This routing capability allows fast-reacting DeviceNet Safety cells to be interconnected with other cells via a backbone network such as EtherNet/IP for interlocking, as shown in Figure 14.54. Only the safety data that are needed are routed to the required cell, which reduces the
FIGURE 14.53 Routing of safety data. (Diagram: CIP Safety application objects and CIP Safety Connections in the two end devices, with CIP Routing in an intermediate device linking the DeviceNet transport and data link layer to the EtherNet/IP transport and data link layer.)
FIGURE 14.54 Network routing. (Diagram: Safety PLCs on an EtherNet/IP Safety backbone, connected through routers to DeviceNet Safety cells 1, 2, and 3, each containing safety input and output devices.)
individual bandwidth requirements. The combination of fast-responding local safety cells and the intercell routing of safety data allows users to create large safety applications with fast response times. Another benefit is the ability to multicast safety messages across multiple networks.
14.5.2.2 Implementation of Safety
As indicated in Figure 14.52, all CIP Safety devices also have an underlying standard CIP functionality. The extension to the CIP Safety application layer is specified using a Safety Validator Object. This object is responsible for managing the CIP Safety Connections (standard CIP Connections are managed through communication objects) and serves as the interface between the safety application objects and the link layer connections, as shown in Figure 14.55. The Safety Validator ensures the integrity of the safety data transfers by applying the integrity-ensuring measures described in Section 14.5.2.3.
• The producing safety application uses an instance of a Client Validator to produce safety data and ensure time coordination.
• The client uses a link data producer to transmit the data and a link consumer to receive time coordination messages.
• The consuming safety application uses a Server Validator to receive and check data.
• The server uses a link consumer to receive data and a link producer to transmit time coordination messages.
FIGURE 14.55 Relationship of Safety Validators.
The link producers and consumers have no knowledge of the safety packet and fulfill no safety function. The responsibility for high-integrity transfer and checking of safety data lies within the Safety Validators.
14.5.2.3 Ensuring Integrity
CIP Safety does not prevent communication errors from occurring, but it ensures transmission integrity by detecting errors and allowing devices to take appropriate actions. The Safety Validator is responsible for detecting these communication errors. The nine communication errors that must be detected are shown in Figure 14.56, along with the five measures CIP Safety uses to detect them, based on reference [21].
FIGURE 14.56 Error detection measures. (Table: the nine communication errors, namely message repetition, message loss, message insertion, incorrect sequence, message corruption, message delay, increased age of data in a bridge, coupling of safety and safety data, and coupling of safety and standard data, are mapped against the five detection measures: time expectation via time stamp, ID for send and receive, Safety CRC, redundancy with cross-checking, and a diverse measure. The Safety CRC additionally protects against communication errors in fragmented messages.)
FIGURE 14.57 Time stamp. (Diagram: free-running producer and consumer counters. A ping exchange lets the producer compute its offset to the consumer clock (offset = 92 − 5 = 87); subsequent productions are stamped with producer time plus offset (time stamp = 8 + 87 = 95, later 14 + 87 = 101), and the consumer computes the data age from its own clock (max. age = 98 − 95 = 3 and 104 − 101 = 3).)
14.5.2.3.1 Time Expectation via a Time Stamp
All CIP Safety data are produced with a time stamp, which allows Safety Consumers to determine the age of the produced data. This detection measure is superior to more conventional reception timers. Reception timers can tell how much time has elapsed since a message was last received, but they do not convey any information about the actual age of the data. A time stamp allows transmission, media access/arbitration, queuing, retry, and routing delays to be detected. Time is coordinated between producers and consumers using ping requests and ping responses, as shown in Figure 14.57. After a connection is established, the producer produces a ping request, which causes the consumer to respond with its consumer time. The producer notes the time difference between the ping production and the ping response and stores this as an offset value. The producer adds this offset value to its producer time for all subsequent data transmissions, and this value is transmitted as the time stamp. When the consumer receives a data message, it subtracts the time stamp from its internal clock to determine the data age. If the data age is less than the maximum age allowed, the data are applied; otherwise, the connection goes to the safety state. The device application is notified so that the connection safety state can be appropriately reflected. The ping request-and-response sequence is repeated periodically to correct for any drift in the producer or consumer time bases.
14.5.2.3.2 Production Identifier
A Production Identifier (PID) is encoded in each data production of a Safety Connection to ensure that each received message arrives at the correct consumer. The PID is derived from an electronic key, the device Serial Number, and the CIP Connection Serial Number. Any safety device inadvertently receiving a message with the incorrect PID will go to a safety state.
Any safety device that does not receive a message within the expected time interval with the correct PID will also go to a safety state. This measure ensures that messages are routed correctly in multilink applications.
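The time-expectation mechanism of Section 14.5.2.3.1 can be sketched as follows, using the numbers from Figure 14.57. The class and attribute names are invented for illustration, and real time stamps are finite-width counters that roll over:

```python
# Illustrative sketch of the CIP Safety time-expectation check.
class SafetyProducer:
    def __init__(self):
        self.clock = 0
        self.offset = 0                    # consumer time - producer time

    def on_ping_response(self, consumer_time):
        # Figure 14.57: offset = 92 - 5 = 87
        self.offset = consumer_time - self.clock

    def produce(self, data):
        # stamp the data in the consumer's time base, e.g., 8 + 87 = 95
        return data, self.clock + self.offset

class SafetyConsumer:
    def __init__(self, max_age):
        self.clock = 0
        self.max_age = max_age
        self.safety_state = False

    def consume(self, message):
        data, stamp = message
        age = self.clock - stamp           # e.g., 98 - 95 = 3
        if age <= self.max_age:
            return data                    # fresh enough: apply the data
        self.safety_state = True           # stale: go to the safety state
        return None

producer = SafetyProducer()
producer.clock = 5
producer.on_ping_response(92)              # offset becomes 87
producer.clock = 8
message = producer.produce("position")     # ("position", 95)

consumer = SafetyConsumer(max_age=3)
consumer.clock = 98
fresh = consumer.consume(message)          # accepted, age = 3
```

Note how the age check catches delays accumulated anywhere along the path, including in intermediate routers, which a simple reception timer could not.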
14.5.2.3.3 Safety CRC
All safety transfers on CIP Safety use Safety CRCs to ensure the integrity of the transfer of information. The Safety CRCs serve as the primary measure to detect possible corruption of transmitted data. They provide detection up to a Hamming distance of 4 for each data transfer section, though the overall Hamming distance coverage is greater for the complete transfer due to the redundancy of the protocol. The Safety CRCs are generated in the Safety Producers and checked in the Safety Consumers. Intermediate routing devices do not examine the Safety CRCs. Thus, by employing end-to-end Safety CRCs, the individual data link CRCs are not part of the safety function. This eliminates certification requirements for intermediate devices and helps to ensure that the safety protocol is independent of the network technology. The Safety CRC also provides a strong protection mechanism that allows underlying data link errors, such as bit stuffing or fragmentation errors, to be detected. The individual link CRCs are not relied on for safety, but they are still enabled. This provides an additional level of protection and noise immunity by allowing data retransmission for transient errors on the local link.
14.5.2.3.4 Redundancy and Cross-Check
Data and CRC redundancy with cross-checking provide an additional measure of protection by detecting possible corruption of transmitted data and effectively increase the Hamming distance of the protocol. These measures allow long safety data packets, up to 250 bytes, to be sent with high integrity. For short packets of 2 bytes or less, data redundancy is not required; however, redundant CRCs are cross-checked to ensure integrity.
14.5.2.3.5 Diverse Measures for Safety and Standard
The CIP Safety protocol is present only in safety devices; this prevents standard devices from masquerading as safety devices.
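The redundancy-with-cross-checking idea for short packets can be illustrated as below. CIP Safety defines its own Safety CRC polynomials; zlib.crc32 is used here only as a stand-in, and the packet layout is an assumption for illustration:

```python
# Illustrative sketch of redundant, cross-checked CRCs for short safety
# packets (2 bytes of data or less). Not the actual CIP Safety format.
import zlib

def build_short_packet(data: bytes):
    assert len(data) <= 2                        # short format only
    complemented = bytes(b ^ 0xFF for b in data)
    # two CRCs, the second over bit-complemented data, to be cross-checked
    return data, zlib.crc32(data), zlib.crc32(complemented)

def check_short_packet(packet) -> bool:
    data, crc_a, crc_b = packet
    complemented = bytes(b ^ 0xFF for b in data)
    # both independently computed CRCs must match for the data to be applied
    return crc_a == zlib.crc32(data) and crc_b == zlib.crc32(complemented)
```

Computing the second CRC over complemented data means a single corruption mechanism is unlikely to fool both checks the same way, which is the point of the cross-check.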
14.5.2.4 Safety Connections
CIP Safety provides two types of Safety Connections:
• Unicast
• Multicast
A unicast connection, as shown in Figure 14.58, allows a Safety Validator client to be connected to a Safety Validator server using two link layer connections.
FIGURE 14.58 Unicast connection.
FIGURE 14.59 Multicast connection.
A multicast connection, as shown in Figure 14.59, allows up to 15 Safety Validator servers to consume safety data from a Safety Validator client. When the first Safety Validator server establishes a connection with a Safety Validator client, a pair of link layer connections is established: one for data and time correction and one for time coordination. Each new Safety Validator server then uses the existing data-and-time-correction connection and establishes a new time coordination connection with the Safety Validator client. To optimize throughput on DeviceNet, three data link connections are used for each multicast connection, as shown in Figure 14.60: the data and time correction messages are sent on separate connections. This allows short messages to be transmitted on DeviceNet within a single CAN frame and reduces the overall bandwidth, since the time correction and time coordination messages are sent at a much slower periodic interval. When multicast messages are routed off link, the router combines the data and time correction messages from DeviceNet and separates them again when messages reach DeviceNet. Since the safety message contents are unchanged, the router provides no safety function.
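The connection sharing just described can be sketched as a small bookkeeping class; the class, method, and connection names are invented for illustration:

```python
# Hypothetical sketch of multicast Safety Connection establishment: the
# first Safety Validator server triggers creation of the shared
# data-and-time-correction connection, while every server gets its own
# time coordination connection. All names are illustrative.
class MulticastSafetyClient:
    MAX_SERVERS = 15   # up to 15 Safety Validator servers per connection

    def __init__(self):
        self.data_connection = None   # shared data-and-time-correction link
        self.coordination = {}        # one time coordination link per server

    def add_server(self, server_id):
        if len(self.coordination) >= self.MAX_SERVERS:
            raise RuntimeError("multicast connection limited to 15 servers")
        if self.data_connection is None:
            # first server: establish the shared link layer connection
            self.data_connection = "data-and-time-correction"
        self.coordination[server_id] = f"time-coordination/{server_id}"
        return self.data_connection
```

The shared data connection is created exactly once, which is what keeps the bandwidth cost of additional consumers low.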
FIGURE 14.60 Multicast connection on DeviceNet.
14.5.2.5 Message Packet Sections
CIP Safety has four message sections:
• Data section
• Time-stamp section
• Time correction section
• Time coordination section
The description of these formats would go beyond the scope of this handbook. Reference [23] provides further details.
14.5.2.6 Configuration
Before safety devices can be used in a safety system, they must first be configured and connections must be established. The process of configuration requires configuration data from a configuration tool to be placed in a safety device. There are two possible sequences for configuration:
FIGURE 14.61 Configuration transfers. (Diagram: a Safety Network Configuration Tool performs (1) a download to an originator device or (2) a download directly to the target device; the originator device can then perform (3) an originator-to-target download or (4) a Safety_Open configuration of the target device.)
• Configuration tool directly to device
• Via an intermediate device
In the configuration tool-to-device case, as shown in Figure 14.61, the configuration tool writes directly to the device to be configured (1 and 2). In the case of intermediate device configuration, the tool first writes to an originator (1), and the originator writes to the target using an originator-to-target download (3) or a Safety_Open service (4). The Safety_Open service (4) is unique in that it allows a Safety Connection to be established at the same time that a device is configured.
14.5.2.7 Connection Establishment
CIP provides a connection establishment mechanism, using a Forward_Open service, that allows producer-to-consumer connections to be established locally or across multiple links via intermediate routers. An extension of the Forward_Open, called the Safety_Open service, has been created to allow the same multilink connections for safety. There are two types of Safety_Open requests:
• Type 1: With configuration
• Type 2: Without configuration
With the Type 1 Safety_Open request, configuration and connections are established at the same time. This allows rapid configuration of devices with simple and relatively small configuration data. With the Type 2 Safety_Open request, the safety device must first be configured, and the Safety_Open request then establishes a Safety Connection. This separation of configuration and connection establishment allows the configuration of devices with large and complex configuration data. In both cases, the Safety_Open request establishes all underlying link layer connections, across the local link as well as any intermediate links and routers.
14.5.2.8 Configuration Implementation
CIP Safety provides the following protection measures to ensure the integrity of configuration:
• Safety Network Number
• Password protection
• Configuration ownership
• Configuration locking
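The two Safety_Open request types described in Section 14.5.2.7 can be sketched as a small state machine; the class and method names are invented for illustration:

```python
# Hedged sketch of Type 1 vs. Type 2 Safety_Open behavior: Type 1 delivers
# configuration and opens the connection in one step, while Type 2 opens a
# connection only if the target was configured beforehand.
class SafetyTarget:
    def __init__(self):
        self.config = None
        self.connected = False

    def safety_open_type1(self, config):
        # configuration and connection established at the same time
        self.config = config
        self.connected = True

    def safety_open_type2(self):
        # device must already hold its (possibly large) configuration data
        if self.config is None:
            raise RuntimeError("Type 2 Safety_Open: device not configured")
        self.connected = True
```

Splitting configuration from connection establishment (Type 2) is what makes devices with large, complex configuration data manageable.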
14.5.2.8.1 Safety Network Number
The Safety Network Number provides a unique network identifier for each network in the safety system. The Safety Network Number, combined with the local device address, allows any device in the safety system to be uniquely addressed.
FIGURE 14.62 Safety device objects. (Diagram: application objects and parameters linked through Safety and Standard I/O Assemblies, with the Message Router, Identity, Safety Supervisor, and Safety Validator objects connecting Safety I/O Connections and explicit messaging to the network link: DeviceNet, ControlNet, or Ethernet.)
14.5.2.8.2 Password Protection
All safety devices support the use of an optional password. The password mechanism provides an additional protection measure, prohibiting the reconfiguration of a device without the correct password.
14.5.2.8.3 Configuration Ownership
The owner of a CIP Safety device can be specified and enforced. Each safety device can specify that it may be configured only by a selected originator or only by a configuration tool.
14.5.2.8.4 Configuration Locking
Configuration Locking provides the user with a mechanism to ensure that all devices have been verified and tested prior to being used in a safety application.
14.5.2.9 Safety Devices
The relationship of the objects within a safety device is shown in Figure 14.62. Note that CIP Safety extends the CIP object model with the addition of Safety I/O Assemblies and the Safety Validator and Safety Supervisor Objects.
14.5.2.10 Safety Supervisor
The Safety Supervisor Object provides a common configuration interface for safety devices. It centralizes and coordinates application object state behavior and related status information, provides exception status indications (alarms and warnings), and defines a behavior model that is assumed by objects identified as belonging to safety devices.
14.5.2.11 CIP Safety Summary
The concept presented here demonstrates a scalable, routable, network-independent safety protocol based on extensions to the CIP architecture. This concept can be used in solutions ranging from device-level networks such as DeviceNet to higher-level networks such as EtherNet/IP. By designing network independence into CIP Safety, multilink routing of Safety Connections can be supported. Functions such as multilink routing and multicast messaging provide a strong foundation that enables users to create the fast-responding local cells and interconnected remote cells required for today's safety applications. The design also enables expansion to future network technologies as they become available.
14.6 Conclusion
The CIP family of protocols is a very versatile set of fieldbus protocols, scalable to allow use in many applications and at many levels of the automation architecture. Due to the universal applicability of the underlying protocol, it is very easy to switch from one system to another. The Producer/Consumer principle, together with the open-object architecture used in the CIP family, allows very efficient use of the communication bandwidth and ensures that these modern systems can be used for many years to come.
References*
[1] CIP Common Specification, Release 1.0, © 2000, 2001 by ControlNet International and Open DeviceNet Vendor Association.
[2] DeviceNet Specification, Release 2.0, including Errata 5, March 31, 2002, © 1995–2002 by Open DeviceNet Vendor Association.
[3] ControlNet Specification, Release 2.0, including Errata 2, December 31, 1999, © 1998, 1999 by ControlNet International.
[4] CIP Common Specification, Edition 2.0, © 2001–2004 by ODVA and ControlNet International.
[5] DeviceNet Adaptation of CIP Specification, Edition 1.0, December 15, 2003, © 1994–2004 by Open DeviceNet Vendor Association.
[6] EtherNet/IP Specification, Release 1.0, June 5, 2001, © 2000, 2001 by ControlNet International and Open DeviceNet Vendor Association.
[7] Planning and Installation Manual, DeviceNet Cable System, Publication PUB00027R1, downloadable from the ODVA Web site (http://www.odva.org/).
[8] Recommended IP Addressing Methods for EtherNet/IP Devices, Publication PUB00028R0, downloadable from the ODVA Web site (http://www.odva.org/).
[9] IEC 61131-3:1993, Programmable Controllers: Part 3: Programming Languages.
[10] Controller Area Network: Basics, Protocols, Chips and Application, IXXAT Automation, 2001.
[11] ISO 11898:1993, Road Vehicles: Interchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication.
[12] IEEE 802.3:2000, ISO/IEC 8802-3:2000, Information Technology: Local and Metropolitan Area Networks: Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specification.
[13] IEEE 802.3:2002, Information Technology: Telecommunication and Information Exchange between Systems: LAN/MAN: Specific Requirements: Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications.
[14] ISO/IEC 7498-1:1994, Information Technology: Open Systems Interconnection: Basic Reference Model.
[15] IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, 1998.
*All RFCs are downloadable from http://www.faqs.org/rfcs/.
[16] IEC 62026-3, Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 3: DeviceNet, 2000.
[17] EN 50325-2, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 2: DeviceNet, 2000.
[18] GB/T 18858 (Chinese national standard), Low-Voltage Switchgear and Controlgear Controller-Device Interface, 2003.
[19] IEC 61158, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems, 2000.
[20] EN 50159-1:2001, Railway Applications, Communication, Signaling and Processing Systems.
[21] Draft Proposal Test and Certification Guideline, Safety Bus Systems, BG Fachausschuß Elektrotechnik, May 28, 2000.
[22] IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems.
[23] David A. Vasko and Suresh R. Nair, CIP Safety: Safety Networking for the Future, in Proceedings of the 9th International CAN Conference, 2003.
[24] IEEE 1588:2002, Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems.
[25] Viktor Schiffer, Modular EDSs and Other EDS Enhancements for DeviceNet, in Proceedings of the 9th International CAN Conference, 2003.
[26] Viktor Schiffer, Device Configuration Using Electronic Data Sheets, ODVA Conference and 9th Annual Meeting, 2003, downloadable from the ODVA Web site.
[27] Viktor Schiffer and Ray Romito, DeviceNet Development Considerations, downloadable from the ODVA Web site, 2000.
[28] RFC 768, User Datagram Protocol, 1980.
[29] RFC 791, Internet Protocol, 1981.
[30] RFC 792, Internet Control Message Protocol, 1981.
[31] RFC 793, Transmission Control Protocol, 1981.
[32] RFC 826, Ethernet Address Resolution Protocol, or Converting Network Protocol Addresses to 48.bit Ethernet Address for Transmission on Ethernet Hardware, 1982.
[33] RFC 894, Standard for the Transmission of IP Datagrams over Ethernet Networks, 1984.
[34] RFC 1103, Proposed Standard for the Transmission of IP Datagrams over FDDI Networks, 1989.
[35] RFC 1112, Host Extensions for IP Multicasting, 1989.
[36] RFC 1122, Requirements for Internet Hosts: Communication Layers, 1989.
[37] RFC 1123, Requirements for Internet Hosts: Application and Support, 1989.
[38] RFC 1127, Perspective on the Host Requirements RFCs, 1989.
[39] RFC 1171, Point-to-Point Protocol for the Transmission of Multi-Protocol Datagrams over Point-to-Point Links, 1990.
[40] RFC 1201, Transmitting IP Traffic over ARCNET Networks, 1991.
[41] RFC 2236, Internet Group Management Protocol, Version 2, 1997.
15 The Anatomy of the P-NET Fieldbus
Christopher G. Jenkins, PROCES-DATA (U.K.) Ltd.
15.1 Background
15.2 The Embodiment of P-NET
15.3 The Communication Skeleton
15.4 Layer 1: The Physical Layer
RS 485 • RS 232 • Light-Link • 4-WIRE P-NET • Ethernet
15.5 Layer 2: The Data Link Layer
Node Address Field • Control Status Field • Info Length Field • Info Field • The Error Detection Field • Master–Slave • Multimaster Bus Access
15.6 Layer 3: The Network Layer
15.7 Layer 4: The Service Layer
15.8 Layer 7: The Application Layer
15.9 Layer 8: The User Layer
15.10 The Intelligent P-NET Node
15.11 The PC and P-NET
15.12 The Appliance of P-NET
15.13 Worldwide Fallout
15.14 P-NET for SMEs
Bibliography
Within a restricted number of pages, there needs to be a compromise between describing a technical concept so briefly as to not do it justice and delving so deeply into its functionality as to leave no room to convey its general essence. With this in mind, this study concentrates on describing the skeleton of P-NET's structure in as much detail as is necessary to at least highlight the main attributes of this fieldbus type. Hopefully, the reader will still be left with an enhanced knowledge of the techniques involved when using P-NET within a process automation system.
15.1 Background
P-NET was initially conceived in Denmark during the 1980s as a means of transferring measurement and control data between industrial process transmitters and programmable controllers, connected together within a serial communications network, hence the name P-NET, derived from the words describing its use within a process network. Prior to this, data had to be individually transmitted in analogue form using current (e.g., 4 to 20 mA) or voltage (e.g., 0 to 100 mV) or a one-to-one digital link (e.g., 20-mA current loop or RS 232). Therefore, there was little chance of avoiding many point-to-point connections from transmitters to a central controller, involving a multitude of wires, cables, and terminations. This is not the place to discuss the unfavorable
FIGURE 15.1 P-NET segments showing inter-network communication.
cost comparison between even a small process plant based on point-to-point wiring and a digitally networked system, due to the additional cabling and manpower, the decrease in reliability with an increased number of terminations, and the lack of expansion flexibility. However, it is worth mentioning that studies and practical implementations since that time have indicated that the latter can achieve a 40% lifetime cost advantage over the former. So, in justifying the economic advantages of developing P-NET in the first place, how does this compare with the evolution of other forms of fieldbus available today? Again, this brief study is not intended to be a comparison with, or to cast aspersions upon, any other industrial communications standard. However, the aim is to convey that there are a number of important differences between P-NET and other fieldbus standards, which provide it with its own unique characteristics.
15.2 The Embodiment of P-NET

Before we dig deeper into the technicalities of the P-NET protocol, it would be as well to briefly describe some of the attributes that give P-NET its particular personality:
• Although P-NET is generically grouped and described as a "Fieldbus for use in Industrial Control Systems" and to which an international standard has been applied (IEC 61158/4), its capabilities extend beyond being purely a single-level bus system. As the name implies, it has built-in networking properties. This provides the means for an industrial control system to be partitioned into a number of autonomous segments (buses), each communicating with the others using the same protocol through gateways. This means that, if required, such a system can also be structured into hierarchical layers dealing with the lower sensor level, through the device level, up to the control system backbone. P-NET is therefore known as a multinet protocol (Figure 15.1).
• Some fieldbus types support only a single controller, or master: the device that controls communication with a number of slave nodes connected to the bus. P-NET is a multimaster protocol, which allows up to 32 master devices to communicate with up to 125 other nodes. However, this restriction applies only to a single bus or network segment. By incorporating a P-NET multiport master device onto the bus, that device will act as a transparent bridge between two nets. Another 32 masters among up to 125 nodes can again be connected to this new bus. It does not take too much thought to realize that by adding more multiport masters to interconnected buses, one will eventually build up a highly complex web of network segments and layers.
While such extended facilities would not perhaps be considered particularly useful for a simple measurement and control system, the flexibility offered by P-NET has allowed it to also be structured as a completely integrated industrial control communication network to run an entire process or manufacturing plant (Figure 15.2).
The Anatomy of the P-NET Fieldbus
FIGURE 15.2 A complete network showing a variety of physical media.
15.3 The Communication Skeleton

Let us now start discussing the P-NET protocol in more detail. When fully specifying a communications protocol, it is quite common to use the Open Systems Interconnection (OSI) seven-layer reference model as a means of breaking the protocol down into manageable parts. The methodology and generic meaning of these layers are described elsewhere in this book. It is worth mentioning, however, that although most fieldbus types can be described using just layers 1, 2, and 7, P-NET also implements layers 3 and 4, because it offers the additional multinet and gateway features. Since the structure of a complete message can be seen in the lower layers, we shall concentrate mainly on these.
15.4 Layer 1: The Physical Layer

Layer 1 is concerned with how raw bits are transmitted over the bus and network. It specifies the cable or transmission medium, how many nodes are allowed and what topology is used to connect them together, how a 1 and a 0 are represented in terms of voltage level, the timing of each bit, etc. It is worth noting that this is the only layer where the electrical activity at the transmitting and receiving nodes will be the same. There are three electrical standards specified for P-NET in the current IEC standard. However, in recent years, other specifications have evolved, all of which are now being used extensively. If we also consider the local and wide area transport media of Ethernet and wireless local area network (WLAN), together with dial-up networking using the public switched telephone network (PSTN), integrated services digital network (ISDN), and global system for mobile communication (GSM) (including the short message service [SMS]), or the permanent service provided by broadband via the Internet, there are a myriad of possibilities for fully utilizing the networking capabilities of P-NET.
15.4.1 RS 485

RS 485 is an international electrical standard in its own right, which enables multiple communicating devices to be connected together on the same piece of cable (Figure 15.3).
FIGURE 15.3 Recommended interface circuit.
This standard was chosen for P-NET because electrically it uses a balanced-line transmission principle. This means that it has better noise immunity and higher speed and distance capabilities than a standard in which each line is referenced to a fixed voltage. The specification has been extended by P-NET by ensuring that all connected nodes incorporate a galvanically isolated connection. This enables up to 127 nodes to be simultaneously connected to a single cable (Figure 15.4). The topology of choice for this medium is for the screened twisted-pair cable to be connected in a continuous ring having a length of up to 1.2 km. This improves noise immunity by reducing reflections and enhances reliability should there be a single-point break in the cable. However, the classic bus architecture is also permitted if a specific termination network is fitted. The bit rate of standard P-NET was carefully chosen to ensure that any commonly available microcontroller with a serial interface could be used as a P-NET node. At the same time, it needed to be ensured
FIGURE 15.4 Ring topology.
FIGURE 15.5 Bus topology.
that the line length of a single bus segment performed over a workable distance without signal degradation. This standard rate is fixed at 76.8 kbit/s, a software-selectable baud rate in most microcontroller serial interfaces. This also enables the line length to extend around a complete plant. The consequent data transfer rate is shown in Figure 15.10.
15.4.2 RS 232

This electrical standard is also part of the P-NET standard. It plays an important role in providing gateway possibilities between two P-NET network segments and in linking to equipment that does not possess a P-NET interface. RS 232 is a point-to-point serial link and does not allow more than two transmit–receive devices to be connected together; unlike RS 485, therefore, it is not a multidrop medium. Due to the way bits are transmitted, it is more susceptible to noise unless distances are kept to a minimum. However, when used as a P-NET port gateway, the bit rate is selectable, these days ranging from 300 to 230,400 bps, including the standard 76,800 bps. This means that dial-up modems associated with PSTN, ISDN, and GSM can all be utilized to transfer data between P-NET segments that are physically located thousands of miles apart. Of course, if a P-NET device includes an RS 232 port, it can also be connected to printers, bar code readers, and other devices having an RS 232 interface.
15.4.3 Light-Link

Light-link is a medium designed to transfer data via infrared (IR) nonvisible light. It was designed by PROCES-DATA primarily as a means of transferring data using the P-NET protocol throughout a local cluster of individually mounted DIN* rail input/output (I/O) slave modules and multiport programmable master modules. Such a facility negates the need to perform any physical wiring between modules, since the act of mounting adjacent modules automatically connects the light path. The means are also provided for extending the light-link path to other local clusters or individual nodes using standard fiber-optic cable (Figure 15.6 and Figure 15.7).
15.4.4 4-WIRE P-NET

4-WIRE P-NET extends the usability of P-NET devices to areas where it is important to transmit power as well as data within one cable, perhaps where a number of low-power single-channel sensors/actuators need to be distributed around a building or home. This is opposed to connecting an individual power source to each distributed node or cluster, as with classic RS 485. Another major advantage of this medium is that, together with an appropriate barrier, it is also suitable for connecting ATEX†-approved P-NET nodes within hazardous areas. A hazardous area is one where an ignitable gas or material permanently or occasionally exists, and where a spark caused by the connection or disconnection of, or a fault within, electrical equipment could cause an explosion. Electrical equipment therefore has to be designed, approved, and marked for use in such areas. The subject of intrinsic safety and other forms of hazardous-area protection is a wide one, and it would be inappropriate to discuss it in any detail here. Suffice it to say that by using this medium, a number of approved P-NET devices can be used within the industries
*DIN, Deutsches Institut für Normung (the German standards institute). †ATEX, from the EU directive covering equipment for use in explosive atmospheres (ATmosphères EXplosibles).
FIGURE 15.6 Mounting principle of P-NET modules from PROCES-DATA.
FIGURE 15.7 Two clusters of modules joined by extending light-link via fiber optics.
that warrant such safety protection (e.g., oil, petroleum, gas, pharmaceuticals, mining, etc.). It should also be understood that a change from one P-NET medium to another is transparent and operates at the same speed using the same protocol (Figure 15.8).
15.4.5 Ethernet

Ethernet is another electrical standard for the transmission of data, in the same way that RS 485 is. It has gained wide popularity as a means of exchanging data between office equipment, such as PCs, printers, faxes, etc. As an electrical standard, it can be used to transfer data using various protocols, one of which is the packet-oriented Internet Protocol (IP). As the name implies, this protocol enables not only computers within an office environment to talk together, but also equipment separated by vast distances by
FIGURE 15.8 Conversion to 4-WIRE P-NET directly or through ATEX barrier (not shown).
each having a connection to the Internet. As with a P-NET device having an RS 232 port to communicate with another P-NET device directly or via various modem types, devices incorporating an Ethernet port are also available to connect P-NET devices together locally, or via other devices, including WLAN, to the Internet. Although such a connection through an Ethernet switch would also be transparent to P-NET communicating nodes, in this case each P-NET frame is wrapped within a User Datagram Protocol (UDP)/IP packet (Figure 15.9 and Figure 15.10).
15.5 Layer 2: The Data Link Layer

The task of the data link layer is to:
• Create and recognize frame boundaries and node addresses
• Perform transmission error control
• Control access to the bus, including multimaster access

All communication on a bus is sent within a structured frame. This consists of a series of asynchronously transmitted 9-bit bytes. The important feature of each is that the ninth bit indicates whether the remaining bits are associated with a node address or some other data. Any microcontroller that is to be considered for use as a P-NET node must have the ability in its serial interface Universal Asynchronous

[Figure: notebook and desktop PCs with WLAN interfaces connect through an access point and an Ethernet switch to a VIGO PC and PD602 modules.]

FIGURE 15.9 Using P-NET across Ethernet and WLAN media.
The physical media can be summarized as follows (each entry lists the electrical standard, bus structure, medium, bus length, number of nodes per segment, communication speed, and cycle time):

P-NET RS 485 — RS 485; multimaster bus supporting multinet structure, as a ring without termination or a bus with terminators; screened twisted-pair cable (STP), 100–120 ohms impedance (IBM Twinax); 1200 m ring or 600 m bus; up to 125 nodes, including up to 32 masters; 76.8 kbit/s, providing 300 confirmed floating-point values/sec; 3.3 mS cycle time.

P-NET RS 232 — RS 232; point to point; screened twisted pair (Cat 5 STP); 12 m; 2 nodes; 300 to 230.4 kbit/s.

P-NET Light-Link — IR; multimaster bus supporting multinet structure; plastic optical fiber, 1000 µm, built into PD M36 DIN rail modules; 0.5 m; up to 125 nodes, including up to 32 masters; 76.8 and 230.4 kbit/s, providing 300 or 900 confirmed floating-point measurements/sec; 3.3 or 1.1 mS cycle time; accessories: cutter and optic wedges; bending radius > 30 mm.

4-WIRE P-NET — combined power and RS 485; bus; a single screened dual-twisted-pair cable for both communication and power; 600 m; number of nodes depends on load; 76.8 kbit/s, providing 300 confirmed floating-point measurements/sec; 3.3 mS cycle time.

P-NET Ethernet — Ethernet; star connection using switches; twisted-pair cable without screen (Cat 5 UTP); 100 m; 100 nodes; 10/100 Mbit/s, providing 1000 confirmed messages/sec; 1 mS cycle time.

FIGURE 15.10 Overview of the physical media used by P-NET.
Receiver/Transmitter (UART) for this additional bit to be included within the byte. The byte structure is therefore as follows (Figure 15.11):
• One start bit (logical 0)
• Eight data bits with least significant bit (LSB) first (bits 0 to 7)
• One address/data bit
• One stop bit (logical 1)
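As a sketch of this layout, the 11 bit cells making up one P-NET byte on the wire can be generated as follows (Python is used purely for illustration; the function name is ours, not part of the standard):

```python
def pnet_byte_cells(value, is_address):
    """Serialize one P-NET byte as the 11 bit cells sent on the wire:
    one start bit (0), eight data bits LSB first, the address/data bit,
    and one stop bit (1)."""
    assert 0 <= value <= 0xFF
    cells = [0]                                    # start bit, logical 0
    cells += [(value >> i) & 1 for i in range(8)]  # data bits, LSB first
    cells.append(1 if is_address else 0)           # address/data bit
    cells.append(1)                                # stop bit, logical 1
    return cells
```

An address byte carrying the value 5, for example, serializes as `[0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1]`: the LSB of the value follows the start bit immediately.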
A frame is divided up into a number of variable- and fixed-length fields as follows (Figure 15.12):
• Node address field — 2 to 24 bytes
• Control/status field — 1 byte
• Info length — 1 byte
• Info field — 0 to 63 bytes
• Error detection field — 1 to 2 bytes
FIGURE 15.11 Structure of byte.
FIGURE 15.12 Structure of frame.
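The field sequence above can be modelled as a small container; this is a sketch only, the class and field names are ours, and the real info length byte also carries the SwNo/offset codes described later:

```python
from dataclasses import dataclass


@dataclass
class PNetFrame:
    """Minimal model of the P-NET frame fields (error detection bytes
    would be appended after the info field)."""
    address: list        # node address field, 2 to 24 bytes
    control: int         # control/status field, 1 byte
    info: bytes = b""    # info field, 0 to 63 bytes

    def body_bytes(self):
        """Concatenate the fields in frame order."""
        assert 2 <= len(self.address) <= 24
        assert len(self.info) <= 63
        return (list(self.address)
                + [self.control, len(self.info)]
                + list(self.info))
```

For instance, a simply addressed Load request to node 5 from master 3 with two info bytes serializes its address bytes, control byte, info length, and info field in that order.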
Complete frames sent by masters are separated by an idle interval of no more than 30 bit periods before a response frame is received. At the standard communication speed of 76.8 kbaud, this is equivalent to 390 µS. The start of a frame can always be recognized by the fact that the first byte has the address/data bit set to 1. In addition, the first address-identified byte in the frame having bit 7 set true will contain the node address (bits 0 to 6) of the token-holding master (see below). This introduces the fact that P-NET addressing includes the requesting node address as well as the destination node address from which a response is expected. Bit 7 of each address byte is thus used to indicate whether it is associated with the (slave) address from which a response is expected or is being made (bit 7 = 0) or the requesting (master) source address of the transmission to which a response is expected or is being received (bit 7 = 1). In other words, because the first byte of an address field is always a destination address, if bit 7 is false, then the transmission is a request from a master. Conversely, if bit 7 is true, the transmission is a response from a slave.
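The bit 7 rule for simple addressing can be sketched as follows (an illustrative helper of our own, not part of the standard):

```python
def classify_simple_address(first, second):
    """Inspect bit 7 of the first two address bytes of a simply addressed
    frame: a (0, 1) sequence marks a request from a master, a (1, 0)
    sequence a response from a slave.  Returns the kind plus the two
    node addresses (bits 0 to 6)."""
    bits = ((first >> 7) & 1, (second >> 7) & 1)
    kinds = {(0, 1): "request", (1, 0): "response"}
    kind = kinds.get(bits, "complex or invalid")
    return kind, first & 0x7F, second & 0x7F
```

A request to slave 5 from master 3 thus carries the bytes 0x05 and 0x83; the response reverses the flags, giving 0x85 and 0x03.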
15.5.1 Node Address Field

Node addresses (NAs) can extend from 0 to 127 (7 bits):
• Address 0 — Normally used as the default address when shipped from the manufacturer. This guarantees that a node will have a unique address when first incorporated into a network of nodes, before it is automatically set to its system-defined address.
• Addresses 1 to 125 — Range of addresses within an individual P-NET sector, where addresses 1 to 32 are normally reserved for master nodes.
• Address 126 — Broadcast to all nodes without stimulating a response.
• Address 127 — Reserved for test.

Since P-NET has the ability to communicate with nodes in other network segments (up to 10 layers away), the address field is of variable length. For simple connections between nodes on the same bus segment, only two bytes are required. The first is the destination address of the slave node. The second is the source address of the master. This simple address strategy can be recognized by the 0, 1 sequence of bit 7 in the two address bytes. The response from the slave (or master) also uses the same simple format, by reversing the received address fields so that the master's source address becomes the slave's destination address (sending it back from where it came) and the master's destination address becomes the slave's source address. This address strategy is recognized by the 1, 0 sequence of bit 7 in the two address bytes. This slave response format is always the same, no matter whether the request has been made by a local master or one originating from a remote segment. For a master node to address a slave node through a gateway into an adjacent P-NET segment or farther, it is necessary to use the services provided by a multiport master. Here the master that is originating the request, being aware of the route map to the node that is to give the response, prepares a series of signposts within the address field of the frame.
The technique that P-NET uses to define a complete address consists of node addresses and port numbers, together with an indication of how far the journey is. Also, some kind of identity for this particular communication is included. Node addresses have already been discussed, so an explanation of what a port number is becomes relevant here. Because of the networking capabilities of P-NET, there needs to be a device with the ability to transparently transfer data from one network segment to another. This is called a gateway and consists of two or more input/output ports. This gateway is also a master node, because it has to control the access to the individual buses at the appropriate time. Of course, master devices that have only one port cannot act as gateways. Commonly, multiport masters have two or three external ports, but P-NET allows devices to have as many as is practically useful. Each port is numbered from 1 upward, and if it is being used as an output port, it is used within the address to define the complete route. So basically, an address that is to extend across one or more gateways (as opposed to simple local bus addressing) is formed by using the node address of the gateway in the local bus segment, then the port number that is
[Figure: the Master (node 5 of 6) and Gateway 1 (node 4 of 6, port 2) sit on segment A; Gateway 1 (node 2 of 2) and Gateway 2 (node 1 of 2, port 3) sit on segment B; Gateway 2 (node 3 of 4) and the Slave (node 34) sit on segment C.]

FIGURE 15.13 Communication path to a remote slave.
connected to the adjacent segment, followed by the node address of the slave node or gateway to another segment, then the port number, and so on. As previously discussed, a P-NET address is structured in terms of its destination and source addresses, so that the response can find its way back to the requesting node. This concept is obviously a little more complex when we consider a long path through a number of gateways. The fact that each bus segment within a larger network operates both simultaneously and independently means that it is unlikely that the communication path will be completely continuous at any particular instant in time. In other words, due to the equal-priority philosophy of P-NET masters, it may be necessary for a gateway to wait before its port can access the adjacent bus. The more gateways that are involved in the transaction, the more traffic-controlled crossroads can be expected during the journey. In order to help describe the route across these network areas (and back again), two additional pieces of data are incorporated into the address field (those bytes with the address/data bit set to 1). If the address field consists of more than the two bytes used for simple addressing, the third byte is used to define how many extra bytes are to be involved in the address field. In addition, the last byte is used as a requesting master message identity. In simple terms, this is chosen by the master to ensure that when a response message is returned, it recognizes that this response is associated with a particular request. More specifically, it will be a number between 16 ($10) and 127 ($7F) and is in fact associated with a particular task running in the master. It can therefore be deduced that a master could send out a number of requests over various paths, and the responses may not return in the order sent out.
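Under these rules, an outgoing complex address field can be sketched as below. The layout (destination node and port first, the extra-byte count as the third byte, then the source route and the task identity with bit 7 set) follows the description above, but the helper and its argument names are ours, and the standard fixes the exact byte ordering:

```python
def build_complex_address(dest_route, source_route, task_id):
    """Assemble a multi-segment P-NET address field.  dest_route lists
    the node and port numbers out to the responding node; source_route
    lists the route back to the requester.  The third byte holds the
    number of address bytes beyond the two used for simple addressing,
    and the final byte is the requesting master's message identity
    ($10..$7F), sent with bit 7 set like the other source bytes."""
    assert 0x10 <= task_id <= 0x7F
    source = [b | 0x80 for b in source_route] + [task_id | 0x80]
    extra = len(dest_route) + len(source) - 2      # bytes beyond the first two
    return dest_route[:2] + [extra] + dest_route[2:] + source
```

For example, a request addressed to a gateway (node 1), out through its port 3 to slave node 34, with the route back given as node 2, port 2, node 5 and task identity 20, would under this reading produce an eight-byte field with an extra-byte count of 5.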
It can also be concluded that, more often than not, a master is a programmable device/controller programmed in a multitasking high-level language (e.g., Pascal), having the ability to obtain values from network-based variables (e.g., Process-Pascal). So, there are two automatic tasks that a gateway node must perform. In order to satisfy the rule that a slave must reply to a request within 30 bit periods, the gateway must reply to the originating master that an "answer comes later." This allows that master to release the token, so that its local segment can continue dealing with other bus transfers, whether local or across gateways, from other masters or the same one (see below). The other task of the gateway is to respecify the address structure to enable the next communication stage to be achieved while ensuring that the way back for a response is not lost. In this respect, a number that previously identified an outgoing port is changed to become the equivalent incoming node address. Let us use an illustrative example to show how a master obtains data from a slave located in a remote network segment. The physical paths could include any of the media types already described (Figure 15.13). The address fields (address/data bit = 1) in Figure 15.14 show how a request is made across the above example of three P-NET segments. The hatched bytes relate to source rather than destination addresses (Figure 15.14). This inaugural address field includes the complete path to the slave. The first source node
FIGURE 15.14 Request from Master to gateway 1.
Node 5 [1 1]   Node 4 [0 1]      (each byte: [bit 7, address/data bit])

FIGURE 15.15 Response from gateway 1 to Master.
Node 1 [0 1]   Port 3 [0 1]   Extra Bytes 5 [0 1]   Node 34 [0 1]   Node 2 [1 1]   Port 2 [1 1]   Node 5 [1 1]   Task 20 [1 1]      (each byte: [bit 7, address/data bit])

FIGURE 15.16 Request from gateway 1 to gateway 2.
Node 2 [1 1]   Node 1 [0 1]      (each byte: [bit 7, address/data bit])

FIGURE 15.17 Response from gateway 2 to gateway 1.
address indicates that node 5 in segment A currently holds the token (explained below). A complex address is recognized as one having more than three sequential zeros for bit 7 (Figure 15.15). The first destination and source node addresses have been reversed. The adjacent control/status field will indicate “answer comes later” and no data will be included. This releases node 5 for one of the other five masters to gain access to the bus in segment A (Figure 15.16). Gateway 1 port has been changed to a source node, and gateway 2 node changed to a source port. The first source node address indicates that node 2 in segment B currently holds the token (Figure 15.17). The first destination and source addresses have been reversed. The adjacent control/status field will indicate “answer comes later” and no data will be included. This releases node 2 for the one other master in segment B to gain access to the bus (Figure 15.18). The final part of the request path uses simple addressing as if gateway 2 had originated the request. It is a request because of the 0, 1 sequence of bit 7. The first source node address indicates that node 3 in segment C currently holds the token (Figure 15.19). The first destination and source addresses have been reversed and indicate a simple address response because of the 1, 0 sequence of bit 7. The following info field will contain the requested data. This releases node 3 for one of the other three masters in segment C to gain access to the bus (Figure 15.20). Now the source addresses in the request from gateway 1 to gateway 2 have been swapped to destination addresses, and the data from the slave passed on within the attached info field. This is sent
Node 34 [0 1]   Node 3 [1 1]      (each byte: [bit 7, address/data bit])

FIGURE 15.18 Request from gateway 2 to slave.
15-12
The Industrial Communication Technology Handbook
Node 3 [1 1]   Node 34 [0 1]      (each byte: [bit 7, address/data bit])

FIGURE 15.19 Response from slave to gateway 2.
Node 2 [0 1]   Port 2 [0 1]   Extra Bytes 6 [0 1]   Node 5 [0 1]   Task 20 [0 1]   Node 1 [1 1]   Port 3 [1 1]   Node 34 [1 1]   End 0 [1 1]      (each byte: [bit 7, address/data bit])

FIGURE 15.20 Answer from gateway 2 to gateway 1.
Node 5 [0 1]   Task 20 [0 1]   Extra Bytes 6 [0 1]   Node 4 [1 1]   Port 1 [1 1]   Node 5 [1 1]   Port 1 [1 1]   Node 34 [1 1]   End 0 [1 1]      (each byte: [bit 7, address/data bit])

FIGURE 15.21 Answer from gateway 1 to master.
when node 1 in segment B gains access to the bus. No additional response will be generated by gateway 1 to gateway 2, because the final address byte contains zero, meaning that this is an awaited answer (Figure 15.21). The final part of the request–response cycle is received by the originating node 5 while node 4 is holding the bus token. It indicates that the source of the data is node 34, that the data has traveled across two segments, and that the response is associated with task 20. The attached info field contains the requested data. The generation of any response is not required or expected, for the reasons given above.
15.5.2 Control/Status Field

Control/status is the means by which it is possible to communicate an instruction from a master device to a slave during a request, or to reveal status or error information in a response. It consists of just one byte. The three least significant bits (0 to 2) are used as a reduced instruction set during a request:
• 001 Store — Used when transferring data from a master to a slave. The response from the slave contains no data.
• 010 Load — Used to request specific data from slave to master. The response from the slave contains all or part of the requested data.
• 011 AND — Used to AND data from a master with that in a specific location in a slave. The response contains no data.
• 100 OR — As above, except the data is ORed.
• 101 Test and Set — Used to send data from a master to a slave and, after being ANDed with the contents of the specified location, returned to the master.
• 110 Long Load — Used when it is known that the number of bytes of data requested from a slave will exceed 64. Here the communication will be automatically divided into 56-byte messages, so that other masters can also be given the opportunity to access the bus.
• 111 Long Store — As above, except that the data will be transferred from master to slave.

15.5.2.1 Softwire Numbers

In a request, bit 3 is used to define whether the location address of the data is to be regarded as a symbolic SwNo (0) or an absolute address (1), the latter being mainly used for test purposes. This is the first time this
word has been used so far, but it is an important concept in the world of P-NET. A softwire number is a way of symbolically defining the location of a variable within a P-NET node. SoftWiring can be conceptualized as the means of connecting a declared variable in an application to its value held in a remote node. Using softwire numbers means that an ordered set of variables all associated with a particular measurement or collection of data can be defined (a channel), even though these may be scattered throughout a microcontroller's memory and may even use various memory types. This contrasts with absolute addressing, where the location of a measurement in a device designed around one microcontroller is likely to differ for the same measurement using another type of controller. In a response, the use of the control/status byte changes to signal various coded qualitative states, varying from "OK," "busy," and "answer comes later" to error codes representing "no response," "data format error," "node address error," "write protection," and "net short circuit." These are just a few examples of the tests and analyses that are performed on a message on its journey to and from a P-NET node. Generally speaking, if bits 0 to 2 are not all zero, then the status given is error-free, although bit 7 is used to show whether any historical error has been flagged in the node. Conversely, if bits 0 to 2 are all zero, then the code relates to various error states.
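For a request, the instruction and addressing-mode bits can be picked apart as sketched here (the mapping follows the list above; the helper names are ours):

```python
# Instruction codes carried in bits 0..2 of a request control/status byte
REQUEST_INSTRUCTIONS = {
    0b001: "Store", 0b010: "Load", 0b011: "AND", 0b100: "OR",
    0b101: "Test and Set", 0b110: "Long Load", 0b111: "Long Store",
}


def decode_request_control(ctrl):
    """Bits 0 to 2 select the instruction; bit 3 chooses symbolic
    softwire number (0) or absolute (1) addressing of the data
    location.  The remaining bits carry further standard-defined
    flags not modelled here."""
    instruction = REQUEST_INSTRUCTIONS.get(ctrl & 0b111)
    absolute_addressing = bool(ctrl & 0b1000)
    return instruction, absolute_addressing
```

A control byte of 010 with bit 3 clear is thus a Load of a softwire-addressed variable, while 1001 would be a Store to an absolute address.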
15.5.3 Info Length Field

The info length field defines how many bytes of data are included in this transmission. The single byte is divided into two parts. Bits 0 to 5 define, within the range 0 to 63, how many bytes are included in the info field. Bits 6 and 7 are encoded to indicate whether the most significant bytes in the data-containing info field are a SwNo of 2 or 4 bytes long and whether an offset is included.
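The split of this byte can be sketched as follows (the meaning of each two-bit code value is fixed by the standard and not modelled here):

```python
def decode_info_length(byte):
    """Split the info length byte: bits 0 to 5 give the number of info
    bytes (0 to 63); bits 6 and 7 form the code describing whether the
    info field starts with a 2- or 4-byte SwNo and whether an offset
    follows."""
    return byte & 0x3F, (byte >> 6) & 0b11
```

So an info length byte of 01000101 would indicate 5 info bytes with SwNo/offset code 1.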
15.5.4 Info Field

The whole purpose of a communication protocol is to send and receive some kind of data or information. It is the info field that is the culmination of all the internal negotiating communications between the different layers within each node, to provide this within a frame container to the outside world. Of course, the frame has to be decoded and travel back up the layers again before the user can see any result, such as a measurement presented on a screen or that a solenoid has been activated. It is this assembling and disassembling from a command block into a packet into a frame and back again that is the essence of a communication protocol. The info field contains the number of data bytes defined by the info length field.
15.5.5 The Error Detection Field

The error detection field provides a means of checking whether any corruption of data has occurred during transmission. There are two selectable levels of detection. The first is called normal error detection. During the formation of the frame, each byte to be sent is exclusive-ORed into each of two previously cleared registers, and for each byte transmitted, the second register is shifted one bit to the left. When the formation of the info field has been completed, the value of the first register is sent, followed by the value of the second register. The receiving node mirrors the process: each byte received is exclusive-ORed into each of two previously cleared byte registers, and for each byte received, the second register is shifted one bit to the left. On receipt of the first error detection field byte, the first register should be zero. On receipt of the second error detection field byte, the second register should be zero. If this is not the case, an error detect failure is generated. This method of error detection has a Hamming distance of 4, meaning that up to 3 randomly spaced erroneous bits within 64 will be detected. The second level of error detection is called reduced error detection. Here the Hamming distance is 2, where any single-bit error will be detected. The method involves a single register to which each transmission byte is added without carry. When the info field has been transmitted, the 2's complement of the contents of the register is sent in the error detection field. The receiving node reverses the process by adding each byte received to a register. When the complete frame has been received, the register should contain zero. If this is not the case, an error detect failure will be generated.
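Both detection levels can be sketched directly from the description above. This is an illustrative model only: the 8-bit register width, the exact treatment of the check bytes, and all names are our assumptions, not taken from the standard.

```python
def normal_check_bytes(data):
    """Normal error detection (sender side): XOR every byte into two
    registers; after each byte the second register is shifted one bit
    left (modelled as an 8-bit register).  The first check byte is the
    first register; it is then fed through the same process so that the
    receiver's registers cancel to zero."""
    r1 = r2 = 0
    for b in data:
        r1 ^= b
        r2 = ((r2 ^ b) << 1) & 0xFF
    c1 = r1
    c2 = ((r2 ^ c1) << 1) & 0xFF
    return c1, c2


def normal_verify(frame):
    """Receiver side: mirror the sender.  The first register must read
    zero just after the first check byte, and the second register zero
    after the second check byte."""
    r1 = r2 = 0
    r1_after_c1 = None
    for i, b in enumerate(frame):
        r1 ^= b
        r2 = ((r2 ^ b) << 1) & 0xFF
        if i == len(frame) - 2:      # just consumed the first check byte
            r1_after_c1 = r1
    return r1_after_c1 == 0 and r2 == 0


def reduced_check_byte(data):
    """Reduced error detection: 2's complement of the no-carry byte sum."""
    return (-sum(data)) & 0xFF


def reduced_verify(frame):
    """Receiver side: the byte sum of the whole frame must be zero."""
    return sum(frame) & 0xFF == 0
```

Flipping any single bit of the transmitted bytes makes either check fail, which is the single-bit guarantee the text ascribes to the reduced method.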
In addition to the two methods of error detection described above, every received byte is checked for a correct start bit, address/data bit, and stop bit. If an error is found, an overrun/framing error bit is set in the control/status byte. It should be noted that whichever error detection mechanism is chosen, all nodes within the same network segment must use the same method.
15.5.6 Master–Slave

One of the characteristics of P-NET is that it is a multimaster–slave protocol, meaning that only one of a number of master devices at a time can instigate a request to another node (slave) to send back or receive data. However, it should be remembered that a master can (and often does) also act as a slave when it is not holding the token. To maximize efficiency, it is ensured that a request–reply cycle is completed in the fastest time possible. This is done by arranging for a slave module to prepare an immediate response once the slave has established that the request on the bus is for itself. Since the first byte in a request contains the node address, which is read by all nodes, only the slave with that address needs to perform any further processing beyond that point. All other slaves can "go to sleep" with respect to communication and get on with their own specific functional processing. Once the identified slave establishes that the frame generated by the master is complete, it must start to respond to that master within 11 to 30 bit periods, which is equivalent to a maximum of 390 µS at 76.8 kbit/s. During the period that the slave is transmitting the response frame, it has exclusive access to the bus.
15.5.7 Multimaster Bus Access

As explained previously, P-NET has the versatility to include more than one master node within a single bus segment. In fact, up to 32 masters are permitted (some of which could be ordinary PCs). Since any P-NET segment utilizes only one common communication cable, there is a need for a mechanism to determine which of the master nodes is permitted to access the bus. In communication terms, this uses a technique called token passing: only one master is able to hold the token at any one time, and while it does, it is permitted to access the bus. When that master has finished using the bus, it passes the token to another master. Unlike some token-passing techniques, P-NET uses a rather ingenious method to transfer the right of access in which no actual data representing the token are communicated between masters. This method is therefore called virtual token passing.

It has also been seen previously that the data (info) held in a frame can vary from 0 to 63 bytes. Although this does not prevent large amounts of concatenated data (record, array, database, program, etc.) from being transmitted from one master to another node, such transfers are fragmented into token-controlled transfers of up to 56 bytes at a time. The consequence is that every master connected to the same bus gets the same level of priority and opportunity to access the bus, and no master can clog up the system when transferring large amounts of data.

So, how is this done? Well, a master needs to incorporate an additional mechanism not required in a slave. The main part consists of two counters: the idle bus bit period counter (IBBPC) and the access counter (AC). The latter is associated with node addressing. Each master is given a node address between 1 and 32 (normally sequentially) as well as a number representing the maximum number of masters that are expected to be connected to the bus (e.g., node 3 of four masters).
The IBBPC is designed to increment for each bit period during which the bus has not made a transition from a 1 to a 0. For a bus operating at the standard 76.8 kbit/s, this bit period is 13 µs ± 0.1%. If such a transition is seen, the IBBPC is reset to zero. When transmissions are being made, the first byte in a frame that has bit 7 set to 1 will contain the node address of the token-holding master, because it is the only one allowed to start a transmission. All other masters can see this and will therefore synchronize their access counters to the same number as the node address of the active master. When a slave has finished responding to a request from the token-holding master, there will then be an idle period of 40 bits (Figure 15.22). At this point in time, therefore, all IBBPCs will have a count of 40. This is the first of the decade values (40, 50, 60, …) that stimulate an access counter to increment by 1, or to reset to 1 if it becomes greater than
The Anatomy of the P-NET Fieldbus
FIGURE 15.22 Sharing bus access between four master nodes.
FIGURE 15.23 A conceptual view of the P-NET communication layers.
the recorded maximum number of masters. The master node that has a node address equal to the value of the access counter is now said to be holding the token. All this has been achieved without any data passing between masters. This master must now use the bus within 2 to 7 bit periods, although there is no obligation to do so if it has nothing to transmit (or indeed, a master with that address may not be powered or even connected). If this is the case, then after 10 more bit periods all IBBPCs will have increased by another 10 (i.e., 50, 60, 70, …), which will increase each access counter by 1, thereby giving the next master in the cycle the chance to access the bus (hold the virtual token). If a master does respond within the timescale, the bus will no longer be idle, and all IBBPCs will therefore be reset to zero, awaiting the end of the current acknowledge transmission (Figure 15.23).
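The counter mechanism described above can be simulated in a short sketch. Class and method names are illustrative; the behavior follows the text (synchronize AC to the active master's address, then step the AC at idle counts 40, 50, 60, …, wrapping past the recorded maximum number of masters).

```python
class MasterAccess:
    # Per-master bookkeeping for P-NET virtual token passing (a sketch).
    def __init__(self, node_addr, max_masters):
        self.node_addr = node_addr
        self.max_masters = max_masters
        self.ibbpc = 0                # idle bus bit period counter
        self.ac = 1                   # access counter

    def bus_transition(self):
        # Any 1 -> 0 transition on the bus resets the idle counter.
        self.ibbpc = 0

    def sync_to_active_master(self, addr):
        # First frame byte with bit 7 set carries the token holder's address;
        # every master synchronizes its access counter to it.
        self.ac = addr

    def idle_bit_period(self):
        # Called once per bit period while the bus stays idle.
        self.ibbpc += 1
        if self.ibbpc >= 40 and (self.ibbpc - 40) % 10 == 0:
            self.ac += 1
            if self.ac > self.max_masters:
                self.ac = 1
        # True means this master now holds the virtual token.
        return self.ac == self.node_addr
```

For example, with four masters, after master 2 finishes and 40 idle bit periods elapse, every access counter reads 3 and master 3 holds the token; 10 more idle bits pass it to master 4, and so on around the cycle.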
15.6 Layer 3: The Network Layer

The structure and contents of the frame as described in the data link layer (layer 2) have encompassed practically all the important aspects of addressing and communicating data between two P-NET nodes. However, the frame was the result of assembling commands, addresses, and routes from within the other layers of communication activity. In the same way that there is a parallel-to-serial conversion between the byte structure of the frame in layer 2 and the electrical impulses of layer 1, there is also a structure conversion between layer 2 and layer 3, the network layer. The frame is converted into a packet and vice versa. Its form is not much different: essentially, the error detect field in a frame is replaced with a retry timer in a packet.

Layer 3 is basically the P-NET post office, which transfers packets according to the destination address. A message may need to be sent out of another P-NET port, or into the P-NET service layer (layer 4), or back to the requesting application, or a message may be returned indicating that an address is unknown. Layer 3 also performs any action necessary to ensure that a response finds its way back. It may be deduced that a gateway will require greater activity at this level than a single-port master. In a slave device, layer 3 is practically transparent.
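The "post office" dispatch can be sketched as follows. The packet layout and names here are illustrative assumptions, not the standard's encoding; the point is the four possible destinations named above.

```python
def route(packet, local_addrs, ports):
    # Layer-3 dispatch: deliver locally, forward out of another port,
    # or reply that the address is unknown.
    dest = packet["dest"]
    if dest in local_addrs:
        return ("to_service_layer", packet)       # up into layer 4
    for port, reachable in ports.items():
        if dest in reachable:
            return ("forward", port, packet)      # out of another P-NET port
    # No route: send a reply telling the requester the address is unknown.
    return ("reply", {"dest": packet["src"], "error": "address unknown"})
```

A gateway would populate `ports` with the node addresses reachable through each of its P-NET ports, which is why it does more work at this level than a single-port master.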
15.7 Layer 4: The Service Layer

In essence, layer 4 performs two kinds of service: one when the node is acting as a slave, called the P-NET service, and the other when acting as a requesting master, called the program service. One common aspect is that each is now able to access internal data memory and an area called the softwire list. As mentioned before, a softwire number (SwNo) is an important aspect of P-NET in that, together with a node address, it is the means to symbolically address a memory location anywhere within a P-NET system. A softwire list is a memory-based table. It holds details about the device's own internally held global variables, as well as details of the variables used by this device's application program that have been declared to be located in other addressable nodes. In either case, the list index is the SwNo; an entry maps not only the absolute address of the variable, including the node address if external, but also the data and memory type.

Therefore, if a device acting as a slave has received a request for data, it is this layer that interprets the command instruction (load, store), translates the SwNo into an absolute memory location, retrieves or stores the data, and converts the packet into a response by removing all addresses having bit 7 = 0. This is then sent back to the network layer.

If the device requires the sending of data or the requesting of data from another node, a command is received from the application layer. The associated SwNo of the variable is translated into a complete node address, and a task number is attached to identify to which application task this message applies. Together with the command and any data, this is all formed into a packet and sent to layer 3. If the amount of data to be sent exceeds 56 bytes, it is this layer that holds all the fragmented packets for release each time the master regains access to the bus.
If the application has made a number of requests to which “answer comes later” has been received, this layer is responsible for ensuring that the response is returned to the correct application task.
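A softwire list can be pictured as a table indexed by SwNo. The field names and the table layout below are illustrative assumptions; real softwire lists are device-specific.

```python
from dataclasses import dataclass

@dataclass
class SoftwireEntry:
    node_addr: int          # 0 => the variable lives in this device
    abs_addr: int           # absolute memory location
    data_type: str          # e.g. "Real", "Long Integer", "Record"
    mem_type: str           # e.g. "RAM", "EEPROM"

softwire_list = {
    2:  SoftwireEntry(0,    0x0100, "Real", "RAM"),     # internal variable
    10: SoftwireEntry(0x23, 0x0040, "Real", "EEPROM"),  # variable in node 0x23
}

def resolve(swno):
    # P-NET service: a local SwNo maps straight to a memory location;
    # program service: a remote SwNo yields the node address for a request.
    e = softwire_list[swno]
    if e.node_addr == 0:
        return ("local", e.abs_addr)
    return ("remote", e.node_addr, e.abs_addr)
```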
15.8 Layer 7: The Application Layer

Layer 7 is the means by which application programs begin the process of accessing variables in other nodes and networks. When a task in a running program requires the value of an external global variable to be obtained or changed, the program code is translated into a command block containing a code defining the required operation, the SwNos of both the internal and external variables, an expected data length, and a means to relate the block to the calling task. The P-NET protocol is therefore able to deal with multitasking processes, where a number of tasks may be making simultaneous calls on network variables. Such blocks are used in layer 4, which refers to the softwire list to form the complete node address, to be structured into packets for transfer through the other layers. It is the act of declaring variable types and locations in the user program that generates, during compilation, the softwire table for use by the running application(s).
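A command block of the kind described above might be modeled as follows. The field names are illustrative, but the contents mirror the text: an operation code, the two SwNos, an expected data length, and a task identifier.

```python
from dataclasses import dataclass

@dataclass
class CommandBlock:
    op_code: str        # the required operation, e.g. "load" or "store"
    internal_swno: int  # SwNo of the variable in this device
    external_swno: int  # SwNo of the variable in the remote node
    data_length: int    # expected number of data bytes
    task_id: int        # relates the block to the calling task

# Several tasks can have blocks outstanding at once; the task_id lets the
# service layer return each "answer comes later" response to its caller.
pending = {
    1: CommandBlock("load", 2, 10, 4, 1),
    2: CommandBlock("store", 3, 11, 8, 2),
}
```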
15.9 Layer 8: The User Layer

As far as the traditional OSI seven-layer model is concerned, there is no layer 8 in the analysis of a communication path between two devices. This is quite reasonable since, for example, in describing a
RegNo  Identifier       Memory Type      Read Out  Type          SI Unit
0      FlagReg          RAM Read Write   Binary    Bit8          --
1      IOTimer          RAM Read Write   Decimal   Real          s
2      Counter          RAM Auto Save    Decimal   Long Integer  --
3      OutCurrent       RAM Read Only    Decimal   Real          A
4      Operatingtime    RAM Auto Save    Decimal   Real          s
5      LevelVoltage     RAM Read Only    Decimal   Real          V
6      MinMaxCurTimer   RAM Read Write   Decimal   Real          s
7      InputHoldPreset  RAM Init EEPROM  Decimal   Real          s
8      OutputPreset     RAM Init EEPROM  Decimal   Record        s
9      ChConfig         EEPROM RPW       --        Record        --
10     MinMaxCurrent    EEPROM RPW       Decimal   Record        --
11     VoltageScale     EEPROM RPW       Decimal   Record        --
12     Maintenance      EEPROM RPW       --        Record        --
13     ChType           PROM Read Only   --        Record        --
14     ChError          RAM Read Only    Binary    Record        --

FIGURE 15.24 A standard digital input/output channel.
telephone network, one does not really need to know what the conversation between the two parties is about once the connection has been made. However, in the current age of object-oriented programming, an individual process measurement object can have many properties associated with it and various methods applied within its bounds. The P-NET protocol lends itself very well to this object-oriented concept, and there are certain rules that need to be followed before a device can be said to have P-NET compatibility.

The object in this case is a formally structured table of data associated with a particular process or process variable, the index of which has an associated SwNo. We touched on the idea of softwire numbers in layers 4 and 7, but that was to describe how the global address of an individual global variable is obtained. Here the table, which may just be a small part of the softwire list previously described, incorporates all there is to know about that process variable (its properties), be that its value or state, how long it has been active, or how many times it has switched, together with its ability to perform automatic functions (its methods). Such a table definition is called a channel and is the basis of a P-NET object.

In basic terms, a channel consists of 16 registers of variables or constants, of any simple or complex data type, that can be derived from unrelated memory locations and technologies (Figure 15.24). It is the channel and register numbers that define the SwNo, which together with the node address of the device provides the complete P-NET address. Any universal P-NET device must have at least one channel that represents the whole device as an object. It is within this channel that properties such as node address, manufacturer, serial number, device type, etc., are found or specified. This is called the service channel and is always identified as channel 0.
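The composition of a complete P-NET address from node address, channel, and register can be sketched as below. The particular SwNo encoding used (channel number times 0x10 plus register number) is an assumption for illustration; consult the standard for the exact mapping.

```python
SERVICE_CHANNEL = 0            # always present; identifies the device itself

def swno(channel, register):
    # A channel holds 16 registers, so the register number fits in 0..15.
    assert 0 <= register < 16
    return channel * 0x10 + register

def pnet_address(node_addr, channel, register):
    # Node address plus SwNo gives the complete P-NET address.
    return (node_addr, swno(channel, register))
```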
Since anyone using P-NET knows how to access the identifying properties of a device object, it is a simple matter to set up a node address or to check that a connected node is of the correct type and not exhibiting any errors. Of course, it is in the slave nodes that the object-oriented structure of channels is most highly utilized. A device can have as many channels as is thought practical, as long as they include a service channel. Each of these can be of the same channel type or different. A device manufacturer may decide to have an eight-channel module concentrating on digital input/output. Another may wish to mix analogue, digital I/O, and other channel types in the same device. A low-cost transmitter could be made using one special measurement channel and a service channel.

Over the years, many standardized channel types have been defined. These are held by the International P-NET User Organization (IPUO) and help to ensure that a device manufacturer of, for example, a weight transmitter can provide a channel structure compatible with one that has already been produced. It is these predefined object structures that make it easy for the programming user to treat as local variables any variable within the distributed nodes connected throughout a highly complex networked
FIGURE 15.25 A fieldbus node requiring additional protocol hardware and memory.
system. Since such structures hold the address and data type of all device variables, the programmer only has to declare that he is using an instance of this predefined device type and then just specifies its node address. All contained and named variables within this device (also defined in the device’s own softwire list), of which there may be hundreds, are therefore also declared, and can be referred to in high-level languages by means of highly convenient object-oriented identifiers such as Tank1.Temperature2.Value. The programming user, of course, does not have to worry about writing any code associated with the transport of data between nodes, and can instead concentrate on designing the required processes using a meaningful naming and object-oriented strategy.
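How a declared device instance can expose remote registers through identifiers such as Tank1.Temperature2.Value is sketched below. The register numbers, the SwNo encoding, and the in-memory "bus" are all illustrative assumptions standing in for the real fieldbus transport.

```python
class FakeBus:
    # Stand-in for the fieldbus transport: a dict keyed by (node, SwNo).
    def __init__(self):
        self.mem = {}
    def load(self, node, swno):
        return self.mem[(node, swno)]

BUS = FakeBus()

class Channel:
    REGISTERS = {"Value": 3, "Counter": 2}        # assumed register map
    def __init__(self, node, chan):
        self._node, self._chan = node, chan
    def __getattr__(self, name):
        # Attribute access becomes a fieldbus read of the mapped register.
        return BUS.load(self._node, self._chan * 0x10 + self.REGISTERS[name])

class FlowTransmitter:
    # "Declaring an instance" only needs the node address; the channel
    # layout is part of the predefined device type.
    def __init__(self, node):
        self.Temperature2 = Channel(node, chan=2)

Tank1 = FlowTransmitter(node=0x23)
BUS.mem[(0x23, 0x23)] = 21.5        # channel 2, register 3 => SwNo 0x23
reading = Tank1.Temperature2.Value  # reads node 0x23, SwNo 0x23
```

The application code never mentions node addresses or SwNos at the point of use; the declared device type carries that mapping, which is the convenience the text describes.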
15.10 The Intelligent P-NET Node

One of the main philosophies behind P-NET is that of distributing processing power, popularly known as intelligence. The idea of controlling a complete system from a central point, even if basic transducers and actuators are networked, is at odds with the concept of segmenting a system into autonomous but intercommunicating parts. Fieldbus was born when integrated microcontrollers became available, providing both programmability and serial communication. Some fieldbus types offer manufacturers an add-on chip to provide the means to communicate with that protocol. Others offer a complete microcontroller with built-in protocol (Figure 15.25).

P-NET has been designed so that a wide selection of standard microcontrollers can be used, depending on the application. Not only does this make it unnecessary to interface with additional hardware, but it also allows the user to choose his preferred or most economic supplier (Figure 15.26). The protocol is therefore completely software based and, for a slave at least, could just about be derived from what has been described here so far. Alternatively, a copy of the international standard will provide the formal structure. However, the most attractive route may be to make a one-off purchase of the code for the specific micro family of your choice. Any of these routes ensures that no extra royalties or levies will be paid to a supplier for special chip purchases.

As mentioned elsewhere in this treatise, one of the first P-NET devices was a magnetic flow meter transmitter. This was and continues to be a highly intelligent fieldbus node. Not only does the microcontroller deal with the aspects of electronically providing a measurement of flow and converting this into volume, but it is able to present these measurements configured into the unit of measurement the user chooses. Furthermore, by integrating
FIGURE 15.26 A P-NET node where the communications protocol is part of the application software.
temperature measurement with digital and analogue I/O, and a PID channel, closed-loop temperature-compensated batch and blend control can be undertaken independently, without any intervention from the fieldbus apart from perhaps initially sending a set point or requesting a flow measurement. This fact alone reduces the requirement for high bandwidth, because high-speed operations can be performed locally. Since the design of this device, many other specialized sensors and general-purpose digital and analogue input/output modules have been produced (Figure 15.27). Thus, the general trend is that any object-oriented channel incorporated into a module has the intelligence to perform some form of autonomous task or process without the need for constant fieldbus communication. For digital devices, this may include performing high-speed counting, measuring the load current, and automatically switching off if a set point level is reached. (Pulse width and duty cycle control can also be performed, depending on the setting of individual control bits within a channel register.) For analogue channels, the most likely
FIGURE 15.27 Example of a slave module containing a mix of analogue and digital I/O channels and two internal channels.
FIGURE 15.28 A PC as an integral part of a P-NET system.
task is to scale a measurement into engineering units for immediate use elsewhere on the network, to monitor signal levels, and to set alarm bits.
15.11 The PC and P-NET

No process automation system based on a fieldbus could yield its full potential without somehow providing the opportunity to incorporate the ubiquitous PC somewhere during its development, commissioning, or operation. P-NET is no exception in this respect: many PC-based tools have been developed for defining a system structure, setting up node addresses, configuring channel functions, and monitoring individual variables or lists of variables. Furthermore, software development editors, compilers, and debuggers have been made available for downloading multitasking, high-level-language, object-oriented process automation programs to program channels incorporated into P-NET multiple-port master devices (Figure 15.28).

To do this, there must first be an interface between the PC and P-NET, and second, the PC must be running a real-time operating system. The first part is relatively straightforward, in that either a built-in interface card or an external module attached to the parallel port can be used. In fact, the PC serial or local area network port could also be used, if this were directly connected to an RS 232 or Ethernet port on a P-NET master. As far as the operating system is concerned, there is really only one package to consider: VIGO. In the same way that P-NET was derived from a phrase, VIGO was derived from the term virtual global object, to indicate that this package has the means to generate and manage objects (Figure 15.29). Since the time P-NET was conceived, VIGO has evolved into the real-time Windows-based PC operating system for P-NET, in the same way that Microsoft Windows is really
FIGURE 15.29 The object-oriented nature of P-NET.
FIGURE 15.30 Conceptual view of the structure of VIGO.
the only choice for Intel-based PCs. VIGO, as well as a great deal of other P-NET hardware and software, has been and continues to be developed by PROCES-DATA. As with Windows, VIGO is constantly evolving and is really an integrated suite of interrelated programs operating around a central core (Figure 15.30). At the time of writing, VIGO is well into the version 5.x level, having first appeared in the early 1990s, and is freely available on the Internet.

One could regard VIGO as a means to turn a PC into a P-NET multimaster node, and indeed it does just that. A PC can be used as a transparent gateway node between P-NET segments, or between P-NET and other electrical standards and protocols such as RS 232 for Modbus, or Ethernet for linking to local nodes or the Internet. Since VIGO is extensible, additional protocols, including those for other fieldbus types, can be incorporated.

The other exceptional feature of VIGO is that it is also a Windows OLE2 automation server. This means that any standard Windows application designed as an object linking and embedding (OLE) client, such as Excel or Access, can transfer data between itself and any remotely located P-NET node. Furthermore, customized graphical applications can be written in any language available on a PC, e.g., Delphi, Visual Basic, C++, etc., and can interface with all declared P-NET variables. One such specific application is included in the P-NET suite: Visual VIGO. This is a graphical SCADA (supervisory control and data acquisition) application, in which a system is drawn by the system designer using graphical components or objects. Each is allied to a P-NET variable so that, for example, the liquid level in a tank can be displayed as a configurable bar graph, or a picture of a valve can be clicked with the mouse to open and close it (Figure 15.31).
Visual VIGO also incorporates data acquisition components, where variables either can be conditionally logged in a P-NET master and then used by Visual VIGO to plot historical trend graphs or tables, or can be directly logged by the PC in real time. In both cases, collected data are held on the PC hard disc in a form suitable for conversion to any other database format (Figure 15.32).
FIGURE 15.31 Visual VIGO display of a mixing system.
15.12 The Appliance of P-NET

From its inception in Denmark, P-NET was, among other things, initially used to link together groups of intelligent hygienic flow meter transmitters, so that they could be configured, provide multiple measurement values already scaled into engineering units, set batches, and control flow with PID loops, all from a central point using only one cable. The early successes with this fieldbus type therefore tended to revolve around industries requiring hygienic liquid measurement, e.g., dairies, breweries, and soft drinks companies. The fact that Denmark is renowned for the quality of its dairy products and beer ensured that the advantages of this emerging fieldbus technology were rapidly adopted nationally during any upgrading or building of new plants. P-NET was made a national standard and, together with two other open standards from Germany and France, formed a combined European standard: EN 50170.

Another industry of importance to Denmark is intensive pig farming, providing significant exports of bacon all over the world. P-NET provided an economic opportunity to modernize and automate animal feeding systems, to the extent that customized animal feeding systems have also been exported widely. The networking and object-oriented nature of P-NET was recognized by other diverse industries for its systems' ability to be packaged for export. The manufacture of concrete products, involving weighing, mixing, conveyor belt control, etc., has been one of the successful examples of a customizable system completely removed from the dairy/brewing sector.

Another major environment within which P-NET has had a major impact has been shipping. Systems for large tankers, container ships, ferries, and luxury yachts involve ballast control, engine management, level measurement, and general heating/lighting/security duties.
One of the reasons for its popularity in this sector is that P-NET is multimaster, allowing simultaneous monitoring and distributed control from various locations aboard a ship. In addition, the fact that P-NET is multinet means that there is naturally built-in redundancy for signal paths, essential in order to gain worldwide insurance and safety approvals.
FIGURE 15.32 Visual VIGO logged data.
P-NET has also been found to suit the requirements of smaller mobile systems, such as delivery/collection trucks. It goes without saying that the flow metering requirements of the dairy/brewing industries have also extended to the collection of ex-farm milk and, in Scandinavia at least, to the delivery of beer in bulk to bars and hotels. Such systems may only require a single multiport master and some slave transducers, but the ability to port to GPS for customer or supplier location recording and the use of GSM for downloading collected data demonstrate the flexible advantages offered by P-NET at the other end of the complexity spectrum.

We cannot leave Denmark without mentioning a huge national project confirming the "green" credentials of this high-tech country. There is legislation in some European countries to levy a charge on drink bottles and cans, to ensure that these are returned when empty for reuse or recycling. This has, in the past, involved additional resources to sort, store for return, and credit the customer. The steady increase in the variety of bottles and the addition of cans into the equation warranted a technological solution to ensure a continued green advantage. This massive undertaking by up to 10,000 retail shops and supermarkets is being highlighted because the only fieldbus permitted to be used by the machine manufacturers is P-NET. The system consists of at least two machines, each of which can be supplied by different companies. Obviously, the question of compatibility and interoperability is an important consideration here. The use of the P-NET protocol over various physical media is also demonstrated, in that automatic communication between each shop and central depots uses the data communication facilities offered by the GSM mobile phone network. This enables all outlets to be kept up to date with current charges, together with the identities of all permitted bottles and cans.
15.13 Worldwide Fallout

The diverse nature and popularity of P-NET applications within its country/region of origin has no doubt had a profound effect on the quantity of product, systems, and knowledge exported to other areas of the
FIGURE 15.33 P-NET modules mounted in an electrical distribution box.
world. In the same way as other fieldbus types, P-NET is an enabling technology, and as additional product has become available through systems and component manufacturers and distributors, system integrators and designers have embraced the business opportunities presented to produce enhanced or new systems in many other countries. Some memorable examples include:

• A well-known engineering company in India is producing tire-manufacturing plants controlled by P-NET.
• A Chinese company is using P-NET to control wind turbines.
• A German company is using an intrinsically safe form of P-NET to produce metering systems for petroleum and domestic heating oil delivery trucks.
• A U.K. company is producing fuel management systems based on P-NET to fuel trains from depots throughout British railways.
• A Canadian company manufactures P-NET-enabled mixing systems for the soft drink industry (Figure 15.33).

Other industrial sectors within which P-NET is represented include:

• Fish farming
• Agricultural systems
• Propane gas container filling systems
• Textile manufacture
• Blood testing equipment manufacture
• Building management systems
• Product selection machines in retail outlets
• Weather stations
• Home automation
15.14 P-NET for SMEs

It could be argued that as a fieldbus type, P-NET is not quite as well known as some of the more publicized protocols. This may be because its diverse utilization within many industrial sectors and world locations has been promoted more by communication between engineers and programmers looking to provide solutions than through the budgets of sales and marketing organizations. However, like its siblings, P-NET is an international standard because it meets and often exceeds the criteria required for it to be so. Its often unsung implementation by small and medium enterprises (SMEs) in many major projects can perhaps be likened to the quiet enthusiasm of certain software engineers for the Linux operating system over Windows, where the well-known use of the latter can often overshadow a technical or commercial advantage of the former.
For those already having some knowledge of fieldbus types, it is hoped that this chapter has helped put P-NET into some kind of comparative perspective. For those new to industrial communication techniques, its compilation will have been worthwhile if there is now at least a conviction that fieldbus technology like P-NET has much to offer in achieving robust solutions to the challenges presented by modern industrial processing.
Bibliography

[1] The P-NET Fieldbus for Process Automation, International P-NET User Organization.
[2] P-NET 502 058 01, International P-NET User Organization.
[3] Installation Guide, PROCES-DATA A/S.
[4] www.p-net.org. (Various figures and diagrams were obtained from this source.)
[5] www.proces-data.com. (Various figures and diagrams were obtained from this source.)
16 INTERBUS Means Speed, Connectivity, Safety
Jürgen Jasperneite Phoenix Contact GmbH & Co. KG
16.1 Introduction to Field Communication ...........................16-1 16.2 INTERBUS Overview........................................................16-2 16.3 INTERBUS Protocol .........................................................16-4 16.4 Diagnostics.........................................................................16-7 16.5 Functional Safety...............................................................16-8 16.6 Interoperability, Certification ...........................................16-9 16.7 Connectivity ....................................................................16-10 16.8 IP over INTERBUS .........................................................16-12 16.9 Performance Evaluation..................................................16-13 16.10 Conclusions .....................................................................16-14 References ...................................................................................16-14
16.1 Introduction to Field Communication

The growing degree of automation in machines and systems increases the amount of cabling required for parallel wiring, due to the large number of input/output (I/O) points. This brings with it increased configuration, installation, start-up, and maintenance effort. The cable requirements are often high because, for example, special cables are required for the transmission of analog values. Parallel field wiring thus entails serious cost and time factors. In comparison, the serial networking of components in the field using fieldbus systems is much more cost-effective. The fieldbus replaces the bundle of parallel cables with a single bus cable and connects all levels, from the field to the control level. Regardless of the type of automation device used, e.g., programmable logic controllers (PLCs) from various manufacturers or PC-based control systems, the fieldbus transmission medium networks all components. They can be distributed anywhere in the field and are all connected locally. This provides a powerful communication network for today's rationalization concepts.

There are numerous advantages to a fieldbus system in comparison with parallel wiring: the reduced amount of cabling saves time during planning and installation, while the cabling, terminal blocks, and control cabinet dimensions are also reduced (Figure 16.1). Self-diagnostics minimize downtimes and maintenance times. Open fieldbus systems standardize data transmission and device connection regardless of the manufacturer, so the user is not tied to any manufacturer-specific standards. The system can be easily extended or modified, offering flexibility as well as investment protection.

Fieldbus systems, which are suitable for networking sensors and actuators with control systems, have represented state-of-the-art technology for some time. The main fieldbus systems are combined under
16-1 © 2005 by CRC Press
The Industrial Communication Technology Handbook
FIGURE 16.1 Serial instead of parallel wiring.

TABLE 16.1 The Four Basic Types of Arithmetic Operations for Field Communication

Signal acquisition: Quick and easy acquisition of signals from I/O devices
Functional safety: Transmission of safety-related information (e.g., emergency stop)
Drive synchronization: Quick and precise synchronization of drive functions for distributed closed-loop controllers
Connectivity: Creation of seamless communication between business processes and production
the umbrella of IEC 61158 [1]. This also includes INTERBUS as type 8 of IEC 61158, with an installed base of 6.7 million nodes and more than 1000 device manufacturers. The requirements of these systems can be grouped according to the four basic types of arithmetic operations for field communication shown in Table 16.1.
16.2 INTERBUS Overview

INTERBUS has been designed as a fast sensor–actuator bus for transmitting process data in industrial environments. Due to its transmission procedure and ring topology, INTERBUS offers features such as fast, cyclic, and time-equidistant process data transmission, diagnostics to minimize downtime, and easy operation and installation, and is ideally suited to fiber-optic technology. In terms of topology, INTERBUS is a ring system; i.e., all devices are actively integrated in a closed transmission path (Figure 16.2). Each device amplifies the incoming signal and forwards it, enabling higher transmission speeds over longer distances. Unlike other ring systems, the data forward and return lines in the INTERBUS system are led to all devices via a single cable. This means that the general physical appearance of the system is an open tree structure. A main line exits the bus master and can be used to form seamless subnetworks up to 16 levels deep. This means that the bus system can be quickly adapted to changing applications. The INTERBUS master–slave system enables the connection of up to 512 devices across 16 network levels. The ring is automatically closed by the last device. The point-to-point connection eliminates the need for termination resistors. The system can be adapted flexibly to meet the user's requirements by adding or removing devices. Countless topologies can be created. Branch terminals create branches,
INTERBUS Means Speed, Connectivity, Safety
FIGURE 16.2 Topology flexibility.
which enable the connection and disconnection of devices. The coupling elements between the bus segments enable the connection and disconnection of a subsystem and thus make it possible to work on the subsystem without problems, e.g., in the event of an error or when extending the system. Unlike in other systems, where data are assigned by entering a bus address using dual in-line package (DIP) or rotary switches on each individual device, in the INTERBUS system data are automatically assigned to devices using their physical location in the system. This plug-and-play function is a great advantage with regard to the installation effort and service friendliness of the system. The problems and errors that may occur when manually setting device addresses during installation and servicing are often underestimated. The ability to assign easy-to-understand software names to the physical addresses enables devices to be added or removed without readdressing. In order to meet the individual requirements of a system, various basic elements must be used (Figure 16.2):

1. Controller board: The controller board is the master that controls bus operation. It transfers output data to the corresponding modules, receives input data, and monitors data transfer. In addition, diagnostic messages are displayed and error messages are transmitted to the host system.

2. Remote bus: The controller board is connected to the remote bus devices via the remote bus. A branch from this connection is referred to as a remote bus branch. Data can be physically transmitted via copper cables (RS-485 standard), fiber optics, optical data links, slip rings, or other media (e.g., wireless). Special bus terminal modules and certain I/O modules or devices such as robots, drives, or operating devices can be used as remote bus devices. Each has a local voltage supply and an electrically isolated outgoing segment. In addition to the data transmission lines, the installation remote bus can also carry the voltage supply for the connected I/O modules and sensors.

3. Bus terminal module: The bus terminal modules, or devices with embedded bus terminal module functions, are connected to the remote bus. The distributed local buses, with the I/O modules that establish the connection between INTERBUS and the sensors and actuators, branch out of the bus terminal module. The bus terminal module divides the system into individual segments, thus enabling individual branches to be switched on or off during operation. The bus terminal module amplifies the data signal (repeater function) and electrically isolates the bus segments.

4. Local bus: The local bus branches from the remote bus via a bus coupler and connects the local bus devices. Branches are not allowed at this level. The communications power is supplied by the bus terminal module, while the switching voltage for the outputs is applied separately at the output modules. Local bus devices are typically I/O modules.
FIGURE 16.3 The layer 2 summation frame structure of INTERBUS.
16.3 INTERBUS Protocol

INTERBUS recognizes two cycle types: the identification cycle for system configuration and error management, and the data transfer cycle for the transmission of user data. Both cycle types are based on a summation frame structure (Figure 16.3). The layer 2 summation frame consists of a special 16-bit loopback word (preamble), the user data of all devices, and a terminating 32-bit frame check sequence (FCS). Because the ring structure of the INTERBUS system allows data to be sent and received simultaneously (full-duplex mode), the protocol efficiency is very high. The logical method of operation of an INTERBUS slave between its incoming and outgoing interfaces is determined by the register set shown in Figure 16.4. Each INTERBUS slave is part of a large, distributed shift register ring, whose start and end point is the INTERBUS master. During data transfer, input data, i.e., data to be transmitted to the master, are loaded into the input data register. The output data registers and the cyclic redundancy check (CRC) register are switched in parallel to the input data register [8]. The polynomial g(x) = x^16 + x^12 + x^5 + 1 is used for the CRC. After a valid data transfer cycle has finished, output data from the output data register are written to a memory and then accepted by the local application. The CRC registers are used during the frame check sequence to check whether the data have been transmitted correctly. The length of the I/O data registers depends on the number of I/Os of the individual node. The master needs to know which devices are connected to the bus so that it can assign the right I/O data to the right device.
FIGURE 16.4 Basic model of an INTERBUS slave node.
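The CRC named above can be sketched as a bitwise implementation of the generator polynomial g(x) = x^16 + x^12 + x^5 + 1 (0x1021). The chapter does not specify the register preset or bit ordering, so the init value and MSB-first processing below are assumptions; note also that the 32-bit FCS field of the summation frame carries more than this 16-bit checksum.

```python
def crc16_interbus(data: bytes, poly: int = 0x1021, init: int = 0x0000) -> int:
    """Bitwise CRC-16 over `data` using g(x) = x^16 + x^12 + x^5 + 1.

    MSB-first shift register, no reflection, no final XOR (assumed)."""
    crc = init
    for byte in data:
        crc ^= byte << 8          # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:      # top bit set: shift and subtract the polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

With an all-zero preset, appending the computed checksum to the message makes the CRC of the extended message zero, which is how the receiving node verifies the frame.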
Once the bus system has been switched on, the master starts a series of identification cycles, which enable it to detect how many and which devices are connected. Each slave has an identification data register, which holds a 16-bit ID code. The master can use this ID code to assign a slave node to a defined device class (e.g., digital I/O node, analog I/O node) and detect the length of the I/O data registers in a data cycle. The control data registers are switched in parallel to the identification data registers, whereby the individual devices can be managed by the master. Commands are transmitted, e.g., for local resets or outgoing interface shutdown. The identification cycle is also used to find the cause of a transmission error and to check the integrity of the shift register ring. The individual registers are switched in the different phases of the INTERBUS protocol via the selector in the ring. On layer 1, INTERBUS uses two telegram formats with start and stop bits similar to those of a Universal Asynchronous Receiver/Transmitter (UART):

• The 5-bit status telegram
• The 13-bit data telegram

The status telegram is used to generate defined activity on the medium during pauses in data transmission. The slave nodes use the status telegram to reset their internal watchdogs, which are used to control a fail-safe state. The data telegram is used to transmit a byte of the layer 2 payload. The remaining bits of both telegrams are used to distinguish between data and ID cycles, as well as between the data transfer and FCS phases within a cycle. This information is used by the selector shown in Figure 16.4 to switch the relevant register into the ring. INTERBUS uses a physical transmission speed of 500 kbps or 2 Mbps. The cycle time, i.e., the time required for I/O data to be exchanged once with all the connected modules, depends on the amount of user data in an INTERBUS system. Depending on the configuration, INTERBUS can achieve cycle times of just a few milliseconds.
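As a rough illustration of the protocol efficiency these formats imply: each layer 2 payload byte costs 13 bits on the wire, and every summation frame adds 6 overhead bytes (the 16-bit loopback word plus the 32-bit FCS). The sketch below ignores status telegrams and transmission pauses.

```python
def wire_efficiency(n_payload_bytes: int) -> float:
    """Fraction of transmitted bits that carry layer 2 user data:
    8 payload bits per 13-bit data telegram, plus 6 overhead bytes
    (loopback word and FCS) per summation frame."""
    payload_bits = 8 * n_payload_bytes
    wire_bits = 13 * (n_payload_bytes + 6)
    return payload_bits / wire_bits
```

For a medium-size configuration of 125 payload bytes this gives roughly 59%, and the efficiency rises toward 8/13 as the frame grows, since the fixed overhead is amortized over more payload.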
The cycle time increases linearly with the number of I/O points, as it depends on the amount of information to be transmitted. For more detailed information, refer to the performance evaluation in Section 16.9. The architecture of the INTERBUS protocol is based on the OSI reference model according to ISO 7498. The protocol architecture of INTERBUS provides a cyclic process data channel and an acyclic parameter data channel using the services of the peripheral message specification (PMS), as well as a peripheral network management (PNM) channel (see Figure 16.5). As is typical for fieldbus systems, for reasons of efficiency, ISO layers 3 to 6 are not explicitly used, but are combined in the lower-layer interface (LLI) in layer 7. The process data channel enables direct access to the cyclically transmitted process data. It is characterized by its ability to transmit process-relevant data quickly and efficiently. From the application point of view, it acts as a memory interface. The parameter channel enables data to be accessed via a service interface. The data transmitted in the parameter channel have a low dynamic response and occur relatively infrequently (e.g., updating text in a display). Network management is used for manufacturer-independent configuration, maintenance, and startup of the INTERBUS system. Network management is used, for example, to start or stop INTERBUS cycles, to execute a system reset, and for fault management. Furthermore, logical connections between devices can be established and aborted via the parameter channel in the form of context management. To transmit parameter data and time-critical process data simultaneously, the data format of the summation frame must be extended by a specific time slot. In several consecutive bus cycles, a different part of the data is inserted in the time slot provided for the addressed devices. The Peripherals Communication Protocol (PCP) performs this task [5].
It inserts a part of the telegram in each summation frame and recombines it at its destination (see Figure 16.6). The parameter channels are activated only if necessary and do not affect the transfer of I/O data. The longer transmission time for parameter data segmented over several bus cycles is acceptable given the low timing requirements placed on the transmission of parameter information. INTERBUS uses a master–slave procedure for data transmission. The parameter channel follows the client–server paradigm. It is possible to transmit parameter data between two slaves (peer-to-peer communication). This means that both slaves can adopt both the client and server functions. With this
FIGURE 16.5 Protocol architecture of an INTERBUS node.
FIGURE 16.6 Transmission of parameter data with a segmentation and recombination mechanism.
function, layer 2 data are not exchanged directly between the two slaves, but are relayed via the physical master–slave structure; i.e., the data are first transmitted from the client to the master and then forwarded from the master to the server. The server response data are also transmitted via the master. However, this diversion is invisible to slave applications. The task of a server is described using the model of a virtual field device (VFD). The virtual field device model unambiguously represents that part of a real application process that is visible and accessible through the communication. A real device contains process objects. Process objects include the entire data of an application process (e.g., measured values, programs, or events). The process objects are entered in the object dictionary (OD) as communication objects. The object dictionary is a standardized public list in which communication objects are entered with their properties. To ensure that data are exchanged smoothly in the network, additional items, which can be accessed by each device, must be standardized in addition to the OD. This includes device features such as the manufacturer name or defined device functions that are manufacturer independent. These settings are
used to achieve a closed and manufacturer-independent representation of a real device from the point of view of the communication system. This kind of modeling is known as a virtual field device.
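The PCP segmentation and recombination mechanism described above (Figure 16.6) can be sketched as a simple split of a parameter telegram across consecutive bus cycles. The slot size and the omission of sequence numbering and acknowledgment are simplifying assumptions for illustration only.

```python
def segment_pdu(pdu: bytes, slot_size: int) -> list:
    """Split a parameter telegram into pieces that each fit the
    time slot reserved in one summation-frame cycle."""
    return [pdu[i:i + slot_size] for i in range(0, len(pdu), slot_size)]

def recombine(segments: list) -> bytes:
    """Reassemble the telegram at the destination node."""
    return b"".join(segments)
```

A 13-byte telegram in 4-byte slots thus occupies four bus cycles; because the slot rides inside the summation frame, the cyclic I/O data exchange is unaffected.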
16.4 Diagnostics

System diagnostics play an important role in practical applications. In increasingly complex systems, errors must be located quickly using system diagnostics and clearly indicated to the user. In addition to detecting errors, good error diagnostics include reliable error localization. In message-oriented fieldbus systems with a bus structure, only one telegram is ever transmitted to a device at any one time. An error that affects the system via a specific device or a device nearby can destroy even telegrams that are not directed at the faulty device itself but at remote devices. It is therefore virtually impossible to determine the exact error location. INTERBUS uses the CRC procedure in each device to monitor the transmission paths between two devices and, in the event of CRC errors, can therefore determine in which segment the error occurred. An important criterion for maintaining data communication is the response of the master in the event of the following errors:

• Cable break
• Failure of a device
• Short circuit on the line
• Diagnostics of temporary electromagnetic interference (EMI)
In all fieldbus systems, in the event of a line interrupt, the devices after the interrupt are no longer reached. The error localization capability depends on the transmission system used. In linear systems, telegrams are still sent to all devices. However, these telegrams are lost because the devices are no longer able to respond. After a certain period, the master detects the data loss. However, it cannot precisely determine the error location because the physical position of the devices is not known. The system diagrams must be consulted so that the service or maintenance personnel can determine the probable error location (Figure 16.7). Unlike linear systems, the individual devices in the INTERBUS system are networked so that each one behaves as a separate bus segment. Following a fatal error, the outgoing interfaces of all devices are fed
FIGURE 16.7 The behavior of bus systems and ring systems in the event of a cable break.
back internally via a bypass switch. In the event of a line interrupt between the devices, the master activates each separate device in turn. To do this, the master opens the outgoing interface, starting from the first device up until the error location, thus clearly identifying the inaccessible device. The controller board can then clearly assign the error location as well as the station or station name and display it in plain text. This is a huge advantage, particularly in large bus structures with numerous devices, where bus systems are typically used. If a device fails, the fieldbus behaves in the same way as for a line interrupt. However, the functional capability of the remaining stations differs in linear and ring systems. In a linear system, bus operation cannot be maintained because the condition of physical bus termination using a termination resistor is no longer met. This can lead to reflections within the bus configuration. The resulting interference level means that correct operation is not possible. In an INTERBUS ring system, the termination resistor is opened and closed together with a bypass switch, which ensures that the condition of the closed ring is always met. In the event of a line interrupt or device failure, the master can either place the devices in a safe state or start up the remaining bus configuration autonomously. Short circuits on the line are a major challenge in a bus system. In the event of a direct or indirect (e.g., via ground) short circuit on the line, the transmission path is blocked for the entire section. In linear systems, the transmission line is used equally for all devices, which means that the master cannot reach segment parts either. This considerably reduces further error localization. In the INTERBUS system, the user is aided by the physical separation of the system into different bus segments. 
As described for the line interrupt, the devices are activated by the master in turn and the ring is closed prior to the short circuit, which means that subsystems can be started up again. The error location is reported in clear text on the controller board. Linear systems also support a division into different segments. Repeaters, which are placed at specific points, can then perform diagnostic functions. However, a repeater cannot monitor the entire system; it can only cover a defined number of devices per segment. Furthermore, the use of repeaters incurs additional costs, which should not be underestimated, and increased configuration effort. In summary, the INTERBUS diagnostic features are essentially based on the physical segmentation of the network into numerous point-to-point connections. This feature makes INTERBUS particularly suitable for use with fiber optics, which are used increasingly for data transmission in applications with large drives, welding robots, etc. In linear systems, the use of fiber optics (like bus segmentation) requires expensive repeaters, which simulate a ring structure. The fiber-optic path check in the INTERBUS system is another feature that is not offered by other buses. In this system, a test pattern for the fiber-optic cable is transmitted between the interfaces to determine the quality of the connection. If the cable deteriorates due to dirt, loose connections, bending, etc., the transmission power is increased automatically. If a critical value is reached, the system generates a warning message so that the service personnel can intervene before the deterioration leads to expensive downtimes. Studies by the German Association of Electrical and Electronic Manufacturers (ZVEI) and the German Engineering Federation (VDMA) indicate that many bus errors are caused by direct or hidden installation faults. For this reason alone, bus diagnostics simplify start-up and ensure the smooth operation of the system, even in the event of extensions, servicing, and maintenance work. Every bus system should automatically carry out comprehensive diagnostics of all connected bus devices without the need to install and configure additional tools. Additional software tools for system diagnostics often cost several thousand euros. In the INTERBUS system, all diagnosed states can be displayed directly on the controller board. If the master has a diagnostic display, various display colors can be used so that serious errors are clearly visible even from a distance. In addition, each master has a diagnostic interface, which can be used to transfer all functions to visualization systems or other software tools.
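The sequential segment activation used for error localization can be sketched as follows. The `link_ok` abstraction is hypothetical: it stands in for the master opening each outgoing interface in turn and running a loopback test on the partially closed ring.

```python
def locate_fault(link_ok: list):
    """Extend the ring one segment at a time, as the INTERBUS master
    does after a fatal error, and return the index of the first
    segment whose loopback test fails (None if the ring is intact).

    link_ok[i] is True if the link leading to device i is intact."""
    for segment, ok in enumerate(link_ok):
        if not ok:
            # The ring closed before this segment still worked, so the
            # fault lies in the link to device `segment`.
            return segment
    return None
```

Because each device forms its own point-to-point segment, the first failing step pinpoints the faulty link, which is what allows the controller board to report the error location in plain text.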
16.5 Functional Safety

In recent years, safety technology has become increasingly important in machine and system production. This is because complex automation solutions require flexible and cost-effective safety concepts, which
FIGURE 16.8 The safety extension of INTERBUS.
offer the same advantages that users have come to appreciate in nonsafe areas. This means that considerable savings can be made, e.g., in terms of both cost and time, by changing from parallel to serial wiring. From the user's point of view, however, various requirements must be taken into consideration. First, safe and nonsafe signals must be separated in order to simplify the programming, operation, and acceptance of safety applications. Second, all components used should be operated on a fieldbus, because a standardized installation concept and standard operation make planning, operation, start-up, and maintenance easier. These requirements have led to the safety extension of INTERBUS. The INTERBUS master is a controller board with an integrated safe control system; this controller board with integrated safe controller is the basic unit in the system (Figure 16.8). It processes all safety-related inputs and confirms them to the standard control system by setting or resetting an output. This method of operation is similar to existing contact-based safety technology. The enabling of output data is programmed with preapproved blocks such as emergency stop, two-hand control, or electrosensitive protective equipment in SafetyProg Windows software, which is compatible with IEC 61131. The amount of programming required is reduced considerably through the use of blocks and the enable principle. The safe input and output components form the interface to the connected I/Os. They control, for example, contactors or valves and read the input status of the connected safety sensors, including intelligent sensors. The user uses the parameterization function of INTERBUS to select the settings for the I/O components, such as clock selection, sensor type, and signal type. The INTERBUS Safety system supports safety functions up to category 4 according to EN 954 [6] and safety integrity level 3 (SIL 3) according to IEC 61508 [7].
Depending on the application, the user can choose to use either a one-cable solution with integrated safety technology or a two-cable solution, where one bus cable is used for standard signals and the other for safety signals. A safety protocol is used between the safe controller and the connected I/O safety devices. This protocol provides the desired security of data transmission and can only be interpreted by the connected safety devices. The safety data are integrated transparently into the INTERBUS summation frame (Figure 16.9). This feature ensures the simultaneous operation of standard and safety devices in the bus system.
16.6 Interoperability, Certification

The basic aim of open systems is to enable the exchange of information between application functions implemented on devices made by different manufacturers. This includes fixed application functions, a uniform user interface for communication, and a uniform transmission medium. For the user, this profile definition is a useful supplement to standardized communication and provides a generally valid model for data content and device behavior. These function definitions standardize some essential device
FIGURE 16.9 Safety protocol on top of the INTERBUS summation frame.
parameters. As a result, devices from different manufacturers exhibit the same behavior on the communication medium and can be exchanged without altering the application software when these standard parameters are used. INTERBUS takes a rigorous approach to the area of interoperability using the Extensible Markup Language (XML)-based device description FDCML (Field Device Configuration Markup Language) [9]. Thanks to the generic device model used, FDCML enables the different views of a field device to be described. Some examples include identification, connectivity, device functions, diagnostic information, and mechanical description of a device. This electronic device description is used in the configuration software for configuration, start-up, and other engineering aspects. Different applications can use FDCML to evaluate various aspects of a component. For example, Figure 16.10 shows the use of the FDCML file as an electronic data sheet in a Web browser. To simplify interoperability and interchangeability, the members of the INTERBUS Club compile a set of standard device profiles in several user groups for common devices such as drives (DRIVECOM Profile 22), human-machine interfaces (MMI-COM D1), welding controllers (WELD-COM C0), and I/O devices (Sensor/Actuator Profile 12). These profiles can also be described neutrally with regard to the manufacturer and bus system in FDCML. The INTERBUS Club and various partner institutes have offered certification for several years to ensure maximum safety when selecting components. Independent test laboratories carry out comprehensive tests on devices as part of this process. The device only receives the "INTERBUS Certified" quality mark, which is increasingly important among users, if the test object passes all the INTERBUS conformance tests. The conformance test is composed of examinations that are carried out by test laboratories using various tools.
The conformance test is divided into the following sections:

• Basic function test (mandatory)
• General section (valid for all interface types)
• Fiber optics (for devices with fiber-optic interfaces)
• Burst noise immunity test (mandatory)
• PCP software conformance test (for devices with PCP communication)
16.7 Connectivity

As shown in Table 16.1, connectivity is one of the four basic arithmetic operations of field communication. Connectivity is the integration of fieldbus technology into company networks.
FIGURE 16.10 Example of an FDCML device description visualized in a Web browser.
However, there are still no standard concepts for connectivity solutions. This makes it more difficult to integrate the field level into the company-wide distributed information system; such integration is only possible through increased programming and parameterization effort. The Internet Protocol (IP) can be used here as an integration tool [2, 3]. IP is increasingly used in automation technology in conjunction with Ethernet and is then frequently referred to as Industrial Ethernet (see, for example, [4]). In many cases, IP is already well suited to the field. This section deals with transparent IP communication at the field level, taking into consideration the real-time requirements. This means that the advantages of fieldbus technology are maintained, and at the same time, the user is provided with a new quality of connectivity. For example, well-known office applications such as browsers can be used to load device descriptions, e-mails can be used to send maintenance messages, or File Transfer Protocol (FTP) can be used to upload and download files (Figure 16.10). Ethernet's role in the future of automation is an important current issue. On the one hand, its specification suggests it could solve all of the communication problems in automation applications and supersede fieldbuses. On the other hand, fieldbuses, with their special characteristics, have arisen because the real world does not consist simply of bits and bytes. However, Ethernet and INTERBUS can only be fully integrated if transparent communication beyond the boundaries of the system is possible without complex conventional gateway processes. This is achieved by using Transmission Control Protocol (TCP)/IP as the standard communication protocol on both systems. While TCP/IP is now standard on Ethernet, this is by no means the case in the factory environment. Virtually all fieldbus organizations and promoters map their fieldbus protocol to Ethernet TCP/IP in order to protect their existing investments.
INTERBUS took a different direction early on and integrated TCP/IP into the hybrid INTERBUS
protocol. TCP/IP standard tools and Internet technologies based on TCP/IP can therefore be readily transferred to the factory environment without additional expense. For example, on INTERBUS, the FTP service can be used to download control programs and other data to a process controller. The use of FTP to upload and download files such as robot programs is just one advantage, since TCP/IP opens up automation to the world of the Internet. Internet browsers will also be the standard user interface of the factory of the future, when all devices have their own integrated Web page. Special configuration tools, now supplied for devices by virtually every manufacturer, will no longer be needed, as in the future these devices will be configured through ActiveX controls or Java applets that are loaded through the network, and therefore do not have to be present on the control computer beforehand.
16.8 IP over INTERBUS

Figure 16.12 shows the system architecture for IP tunneling. The known Ethernet TCP/IP structure can be seen on the left, and the extended protocol structure of an INTERBUS device can be seen on the right. An IP router, with the same mechanisms as in the office environment, is used for coupling. This function is best performed in the PLC. IP tunneling is performed by introducing a new data-send-acknowledged (DAS) service in the INTERBUS parameter channel (Figure 16.13).
FIGURE 16.11 Connectivity creates new options such as Web-based management.
FIGURE 16.12 Basic architecture for IP tunneling (Ethernet TCP/IP client on the left, INTERBUS-attached server on the right, coupled via IP forwarding between Ethernet and INTERBUS).
FIGURE 16.13 Protocol architecture of an IP-enabled INTERBUS node. The LLI user services comprise Data-Transfer-Confirmed (DTC), Data-Transfer-Acknowledged (DTA), Data-Send-Acknowledged (DAS), Associate (ASS), and Abort (ABT); the Internet Protocol (IP) and PMS sit above the DAS service in layer 7, on top of the lower-layer interface (LLI) and the peripherals data link (PDL) in layer 2.
This DAS service enables LLI user protocol data units (PDUs) to be transmitted for unconfirmed, connectionless LLI user services and is used for transparent IP data transmission. These data are transmitted in the same way as parameter channel (PMS) data, simultaneously with the time-critical process data (PD) exchange.
16.9 Performance Evaluation

This section considers the performance of the concept in relation to the relevant fieldbus parameters, such as the number of I/O modules and the amount of cyclic I/O data. The achievable data throughput for IP tunneling is a key performance indicator. Due to the determinism of the INTERBUS system, the throughput can be easily calculated. The following applies to the INTERBUS medium access control (MAC) cycle time TIB (ms):

TIB = 13 · (6 + N) · (1 / Baud rate) · 1000 + TSW

where

N = Σ (i = 1 to k) PLi

The IP throughput (IP_Th) of a device is calculated as follows:

IP_Th = (M - 1) · 8 / TIB

where:
N = total payload size: sum of all user data (bytes) of all k devices, where N ≤ 512 bytes and k ≤ 512
Baud rate = physical transmission speed of INTERBUS in bit/s (0.5 or 2 Mbps)
TSW = software runtime (ms) of the master (0.7 ms typical, depending on implementation)
PLi = layer 2 payload of the ith device (bytes), where 1 ≤ i ≤ k and k ≤ 512
M = reserved MAC payload (bytes) for the IP channel of a device (M = 8 typical)
IP_Th = throughput (kbps) for IP data telegrams
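The two formulas translate directly into a small calculation; the defaults below (2 Mbps, TSW = 0.7 ms, M = 8) are the typical values listed above.

```python
def interbus_cycle_time_ms(n: int, baud_bps: int = 2_000_000,
                           t_sw_ms: float = 0.7) -> float:
    """MAC cycle time TIB in ms: 13 wire bits per payload byte, 6 frame
    overhead bytes (loopback word + FCS), plus the master's software runtime."""
    return 13 * (6 + n) / baud_bps * 1000 + t_sw_ms

def ip_throughput_kbps(n: int, m: int = 8, **kwargs) -> float:
    """IP throughput: (M - 1) tunneled bytes per cycle, converted to kbit/s
    (bits per millisecond equal kbit/s)."""
    return (m - 1) * 8 / interbus_cycle_time_ms(n, **kwargs)
```

For N = 125 bytes this yields a cycle time of about 1.55 ms and an IP throughput of about 36 kbps, matching the values quoted for Figure 16.14.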
The Industrial Communication Technology Handbook
FIGURE 16.14 Performance data: INTERBUS cycle time T_IB (ms) at 2 Mbps and IP throughput IP_Th (kbps), plotted as functions of the payload size N (bytes); a typical configuration is marked near N = 100.
Figure 16.14 shows the INTERBUS cycle time at the MAC level and the IP throughput for a baud rate of 2 Mbps as functions of the payload size N. For a medium-size configuration of N = 125 bytes, an IP throughput of 36 kbps at an INTERBUS cycle time of approximately 1.6 ms is achieved. This roughly corresponds to the quality of an analog modem for Internet access. For smaller configurations, even Integrated Services Digital Network (ISDN) quality can be achieved. The calculated values were confirmed in a practical application. Note that, in addition to the dedicated process data devices, several IP-compatible devices can be operated at the same time with this throughput.
16.10 Conclusions

The open INTERBUS fieldbus system for modern automation seamlessly connects all the I/O and field devices commonly used in control systems. The serial bus cable can be used to network sensors and actuators, to control machine and system parts, to network production cells, and to connect higher-level systems such as control rooms. After a comprehensive introduction to INTERBUS, IP tunneling for fieldbuses was described for improved connectivity at the field level. An essential requirement is that time-critical process data transport is not affected. The integration of the Internet Protocol in INTERBUS creates a seamless communication platform to enable the use of new IP-based applications, which can make it considerably easier to engineer distributed automation solutions. Analysis has shown that the IP throughput can be as high as ISDN quality.
17
Data Transmission in Industrial Environments Using IEEE 1394 FireWire

Michael Scholles, Uwe Schelinski, Petra Nauber, Klaus Frommhagen
Fraunhofer Institute of Photonic Microsystems

17.1 Introduction 17-1
17.2 IEEE 1394 Basics 17-2
17.3 IEEE 1394 System Design 17-4
17.4 Industrial Applications of IEEE 1394 17-6
17.5 IEEE 1394 Automation Protocol 17-9
17.6 Summary 17-11
References 17-12
17.1 Introduction

Almost every modern distributed technical system requires some kind of digital communication infrastructure with high bandwidth, no matter whether it concerns consumer electronics or industrial applications. On the one hand, computers and devices for acquisition and reproduction of digital images and sound converge into so-called multimedia systems. On the other hand, industrial control systems incorporate multiple sensors such as cameras for optical quality inspection, while control and status information, normally distributed via fieldbuses, run along the same cable as mass data. All these systems have one common characteristic: they require an efficient peer-to-peer data communication mechanism, since they do not necessarily include a computer that can act as a data hub. Even if such a central node exists, sending data from one node to the computer and then forwarding it to another external device is often not the optimum communication pattern. Modern consumer electronics and communication systems for industrial and factory automation applications have several further features and requirements in common:

High bandwidth: An industrial black-and-white camera with VGA resolution of 640 × 480 pixels, 12-bit depth per pixel, and a frame rate of 25 Hz, which is often used for automated optical quality inspection, produces a data rate of almost 100 Mbit/s, which can easily be increased by higher geometrical resolution and the use of multicamera systems.

Real-time streaming: If the communication system supports a special type of data transfer that ensures bandwidth for real-time streaming of data or a well-defined latency for message transmission, the definition of higher-layer protocols is significantly simplified. Standard Ethernet does not fulfill this requirement, which makes it difficult to implement hard real-time applications using this bus technology.
One single cabling: Mass data as well as status and control information shall be exchanged via the same cable, which is used for all purposes within the system. In order to reduce the costs of cabling and connectors, serial data transmission is preferred.

Easy reconfiguration: While plug-and-play capability is self-evident for consumer electronics, this feature can also be very helpful in industrial environments, such as a complex measurement system with many different signal acquisition and processing devices. If users can simply plug the devices together without any elaborate setup, setup time is significantly reduced. This requirement also implies that no specific network topology is enforced.

All these requirements are fulfilled by the bus defined in IEEE 1394 for a high-performance serial bus [1–4], with commercial implementations known as FireWire and i.LINK.* The current version of the standard is made up of three documents. The original standard, IEEE 1394–1995 [1], and the first amendment, IEEE 1394a–2000 [2], are the basis for almost all currently commercially available IEEE 1394 devices. The first amendment remedies flaws in the original standard but provides no new features for the user, whereas the second amendment, IEEE 1394b–2002 [4], completely replaces the lowest, physical layer of the IEEE 1394 protocol stack, which is necessary for extending the maximum speed and adding new media for data transmission. Because of the features listed above, this bus standard, originally used for consumer electronics and computer peripherals, is becoming increasingly popular for industrial and factory automation applications. The following sections describe the fundamentals of IEEE 1394, a reference design for industrial IEEE 1394 nodes, and some real-world applications for industrial environments. Aspects of new media for data transmission, including optical fibers, will also be covered.
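The data-rate figure in the high-bandwidth requirement above is straightforward arithmetic and can be checked directly (a quick sketch):

```python
# Raw data rate of the example camera: 640 x 480 pixels,
# 12 bits per pixel, 25 frames per second.
width, height, bits_per_pixel, fps = 640, 480, 12, 25

rate_mbit = width * height * bits_per_pixel * fps / 1e6  # 92.16 Mbit/s

# Doubling the resolution in each dimension quadruples the rate,
# which is why higher-resolution or multicamera setups quickly
# exceed what slower buses can carry.
hires_mbit = (2 * width) * (2 * height) * bits_per_pixel * fps / 1e6
```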
17.2 IEEE 1394 Basics

IEEE 1394 is a serial bus connecting nodes with current data rates up to 800 Mbit/s; such devices can be mixed with older devices running at data rates of 100, 200, and 400 Mbit/s. The standard supports two completely different types of data transfer:

1. Asynchronous transfers are rather short messages that are mainly used for control and setup purposes. Their exchange is controlled by a request-and-response scheme that guarantees data delivery for read or write operations and generates well-defined error codes. This type of communication is used if reliability is more important than timing, as it cannot be exactly determined at which time an asynchronous request of the application is actually sent to the connected node.

2. Isochronous channels are used for mass data that require a fixed, guaranteed bandwidth. The streaming data are divided into packets that are sent every 125 µs. This 8-kHz clock is distributed across the network by a special packet called the cycle start packet. Typical examples of isochronous transfers are real-time video streams, where late data are useless. The reception of the data is not secured; the sender does not even know whether any other node is listening to the data.

These two transfer types use a common physical and link layer of the IEEE 1394 protocol stack, as depicted in Figure 17.1. The link layer provides addressing, data checking, and data framing for packet transmission and reception, whereas the physical layer transforms the logical symbols used by the link layer into electrical signals, which includes bus arbitration in case several nodes want to send data at the same time. The isochronous data are directly fed into the link layer; for asynchronous data there is an additional transaction layer as a standard interface for the application on top of the protocol stack. Its main task is to implement the secure protocol using requests and responses for asynchronous transactions.
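As a back-of-the-envelope illustration of the isochronous scheme (a sketch, not part of the standard text), the 8-kHz cycle clock fixes how much payload a talker must place in each 125 µs cycle to sustain a given stream rate:

```python
# 125 µs isochronous cycles -> 8000 cycles per second.
CYCLE_PERIOD_S = 125e-6
cycles_per_second = round(1 / CYCLE_PERIOD_S)  # 8000

def payload_per_cycle(stream_bps):
    """Bytes that must fit into each 125 µs cycle for a stream."""
    return stream_bps / 8 / cycles_per_second

# The ~92 Mbit/s camera stream from the introduction needs
# 1440 bytes of isochronous payload in every cycle.
bytes_per_cycle = payload_per_cycle(92.16e6)
```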
All three layers exchange data with the so-called serial bus management, which incorporates some special functions for managing the isochronous bandwidth and optimizing the efficiency of the bus.

*FireWire is a trademark of Apple Computer, Inc., and the 1394 Trade Association. i.LINK is a trademark of Sony Corporation.

Normally, the physical and link layers are realized by dedicated chip sets, whereas the transaction layer and serial
FIGURE 17.1 Layered protocol architecture of IEEE 1394.

TABLE 17.1 Supported Media by IEEE 1394
bus management are implemented in firmware. Serial bus management is optional and can normally be omitted in the design of embedded nodes if a host computer with a common operating system is part of the network. However, at least one node with serial bus management capabilities must exist in the network; otherwise, no isochronous operation is possible.

In IEEE 1394–1995 and IEEE 1394a–2000, the physical media used for IEEE 1394 were restricted to special shielded twisted-pair (STP) copper cables and IEEE 1394-specific sockets and connectors. For signaling and data exchange between two nodes, a pair of differential signals is used, with additional information coded in the direct current (DC) voltage level. A modified version of the STP cables still exists in IEEE 1394b, but true differential signaling is now used, and, more importantly, a number of other media have been added. A matrix of the supported speeds and reach for the different kinds of media is shown in Table 17.1. IEEE 1394b offers two promising options for IEEE 1394 in industrial applications. On the one hand, existing CAT5 cabling can be used if a data rate of 100 Mbit/s is sufficient, e.g., if only short status and control messages have to be exchanged. On the other hand, optical media combine the advantage of high data rates with long distances and solve almost all electromagnetic interference (EMI) problems. If the special IEEE 1394 STP cables are used (for both IEEE 1394a and IEEE 1394b), not only data but also power can be provided. A maximum current of 1.5 A at a typical voltage of 12 V is sufficient for many applications, so that a separate power supply can be omitted for most nodes. IEEE 1394b only affects the physical layer: existing applications that use the original IEEE 1394–1995 standard and the first amendment IEEE 1394a–2000 can easily migrate to IEEE 1394b by just replacing the physical layer; no change of software is required.
All data are transmitted as packets consisting of a header, which describes the type and destination address of the packet, and a payload. The receiver can check the integrity by means of cyclic redundancy check (CRC) values that are also included. The IEEE 1394 standard only specifies how packets are transmitted from one node to another; i.e., it covers only the lower layers of the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) architecture. The definition of a common interpretation of the payload is left to application-specific transport protocols. Currently, more than 60 additional specifications exist, most of them for audio and video consumer electronics. Only a few of them are actually of interest for industrial applications:

1. Serial Bus Protocol 2 (SBP-2) [5] defines a generic method for asynchronous data exchange between two nodes. In principle, it encapsulates an arbitrary command set, but it is mainly used for the implementation of the SCSI-2 protocol via IEEE 1394. SBP-2 is a very versatile protocol but has significant administrative overhead, so it is not well suited for extremely high performance applications. This limitation is overcome by a new version of this standard, called SBP-3, that significantly reduces the amount of overhead and also allows isochronous transport of data. The well-known Apple iPod uses SBP-3 for music download.

2. The industrial and instrumentation digital camera (IIDC) 1394-based digital camera specification [6] (DCAM for short) describes how industrial cameras delivering uncompressed video data are accessed by other components, as well as the data format of the image data. The main applications are industrial inspection systems for quality control and the like.

3. The Instrument and Industrial Control Protocol (IICP) specification [7] defines how to implement the IEEE 488 protocol on IEEE 1394.
The situation concerning IEEE 1394 for high-performance industrial control systems is somewhat complicated. Of course, IEEE 1394 is already used for this application area, but existing solutions are based on proprietary protocols. In the meantime, leading European companies selling products for factory automation, motion control, and the like, as well as research institutes working on IEEE 1394, factory automation, and production technologies, have formed the 1394 Automation Association. Their main task was to develop and maintain a standard called 1394AP (1394 Automation Protocol) that allows subsystems from different vendors to communicate with each other via IEEE 1394. Additionally, DCAM-compliant commands and video streams can be embedded into the 1394AP packets. More details and issues concerning the implementation of 1394AP are described in Section 17.5. However, the exact specification of 1394AP will only be accessible to members of the 1394 Automation Association until the first products using 1394AP are on the market. Afterwards, the responsibility for the protocol will be handed over to the 1394 Trade Association. This process will happen in January 2005.
17.3 IEEE 1394 System Design

This section describes a generic architecture of an embedded IEEE 1394 device, i.e., everything that is not a conventional computer. As stated above, only the lower layers of the IEEE 1394 protocol are realized by dedicated hardware; the upper layers are firmware. Therefore, every IEEE 1394 node must contain some kind of processor that executes the higher layers. This leads to the system architecture depicted in Figure 17.2. An IEEE 1394a solution is described here; however, if IEEE 1394b is necessary, either for speed or because of the need for alternative media, only different hardware for the link and physical layers has to be chosen; the overall architecture remains the same. For the physical layer, any standard PHY chip can be used. A three-port PHY gives the most flexibility for the network topology. IEEE 1394a forbids closed loops within the system, which results in a treelike network topology, whereas IEEE 1394b automatically breaks up loops but also favors treelike networks. A variety of link layer controllers (LLCs) from different manufacturers exists. For industrial applications, the TSB12LV32 GP2Lynx by Texas Instruments is well suited. It provides a memory-mapped interface to a microcontroller for asynchronous data and setup of the IEEE 1394 data transmission, as well as a 16-bit high-speed data port that can send or receive data at the full speed of 400 Mbit/s. Unfortunately,
FIGURE 17.2 Hardware architecture of embedded IEEE 1394 device.
until now (November 2004) no embedded IEEE 1394b LLC capable of handling 800 Mbit/s and dedicated to isochronous transmission is available. Oxford Semiconductor produces a chip called OXUF922 that supports 800 Mbit/s but is mainly intended for mass storage devices like hard drives, as it includes some hardware support for SBP-2 but not for isochronous streaming. One solution might be to use the TSB82AA2 LLC of Texas Instruments, which is IEEE 1394b compliant and has a peripheral component interconnect (PCI) bus interface. If just one application-specific hardware is connected to the PCI interface, this can be treated as a general-purpose 32-bit-wide interface. The complexity of PCI only arises if it is really used as a bus accessed by numerous components. An important issue, especially for industrial applications not using the IEEE 1394b standard, is the electrical interface between PHY and LLC. Since in IEEE 1394a DC voltage levels communicate some information between the PHYs, all PHYs are directly connected with each other. If the nodes are powered from an external supply, they might operate at different ground levels, which may produce a current flow on the IEEE 1394 cables. In order to prevent a faulty operation of the bus or even damage of the nodes, a galvanic isolation between the LLC and the PHY is strongly recommended. Normally, if the chips include some bus holder circuitry, a capacitor inserted in the data and control lines is sufficient. Of course, the problem of additional galvanic isolation vanishes if IEEE 1394b and optical media are used. Two different kinds of optical media are supported. One solution is plastic optical fibers (POFs), which are relatively easy to install. However, commercially available LED-based optical transceivers that are necessary for inexpensive designs are limited to 200 Mbit/s data rates. For high-performance applications, glass optical fibers (GOFs) have to be used. 
In this case, Gigabit Ethernet transceivers can be used for IEEE 1394b systems. A crucial point is the choice of the processor for the IEEE 1394 node. Usually, the data handling is split: the processor executes the IEEE 1394 stack, handles the asynchronous data, and controls additional low-bandwidth hardware via programmed bit I/O, whereas the isochronous mass data are processed by application-specific hardware. In this case, a 16-bit microcontroller like the SAB C161PI by Infineon gives sufficient computing power. Another processor that has become very popular for embedded IEEE 1394 devices is the ARM7 type. Of course, the necessary amount of memory depends on the application; for the IEEE 1394 stack, 64 KByte each of RAM and ROM is sufficient. The use of a field-programmable gate array (FPGA) for the hardware processing of mass data provides flexibility, so that the same design can be used for a broad range of applications. Some buffering of data is required at the high-speed port of the LLC in order to transform the streaming data into isochronous packets or vice versa. Some LLCs have a sufficient amount of internal first-in first-out (FIFO) memory; others, like the GP2Lynx, require an external buffer realized by a dual-ported RAM or a FIFO. Another solution is to implement the FIFO inside the FPGA. The architecture of the software implementation of the protocol stack (Figure 17.3) represents the layer structure as defined in the standard. However, two additional layers are inserted, each providing a universal programming interface. The embedded application closely interacts with the serial bus management, transaction layer, and link layer. In order to ease the coding of the application for the user, a common Application Programming Interface (API) must be provided.
API calls include routines for initialization of the bus, the basic asynchronous transactions (read, write, and lock), setup of isochronous transfers, and inquiry of information on the status of the bus and the local node as well as callback
FIGURE 17.3 Embedded IEEE 1394 protocol stack.
functions that are automatically executed in case of external bus events. The latter generate responses to incoming requests without explicit programming in the application. However, the API, transaction layer, and serial bus management shall be independent of the link layer controller in order to be usable for different embedded systems. Therefore, an additional hardware abstraction layer (HAL) is used that transforms services requested from the link layer into corresponding hardware accesses. No embedded real-time operating system is required; time-critical tasks are scheduled via timers of the microcontroller. For proper operation of IEEE 1394 in industrial environments, a number of requirements must be fulfilled. First of all, it is important to follow the rules of the PHY manufacturers for PCB design, which include a short distance between PHY and LLC, traces that match the line impedance of the cable, and avoidance of vias between the PHY and the IEEE 1394 connectors. Second, galvanic isolation is self-evident for IEEE 1394a, as described above. Third, EMI protection of the power supply must be applied. Fourth, shielding and grounding have to be accurate. If these rules are observed, stable operation of IEEE 1394 nodes can be guaranteed even under harsh industrial conditions. Some examples are described in the following sections.
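To make the API call groups described in this section concrete, here is a hypothetical sketch of such an embedded interface; all names and signatures are illustrative inventions for this description, not taken from any vendor stack, and the bus behavior is simulated:

```python
# Hypothetical embedded IEEE 1394 API sketch (illustrative names only).
# It mirrors the call groups described above: bus initialization,
# asynchronous read/write, isochronous setup, and event callbacks.
class Bus1394:
    def __init__(self):
        self.initialized = False
        self.callbacks = {}       # event name -> handler function
        self.iso_channels = {}    # channel number -> bytes per cycle
        self.remote_memory = {}   # (node, address) -> data (simulated)

    def initialize(self):
        """Reset the link layer and take part in bus configuration."""
        self.initialized = True

    def on_event(self, event, handler):
        """Register a callback executed on external bus events,
        e.g. a bus reset or an incoming asynchronous request."""
        self.callbacks[event] = handler

    def async_write(self, node, address, data):
        """Confirmed asynchronous write (request/response elided)."""
        self.remote_memory[(node, address)] = data

    def async_read(self, node, address):
        """Confirmed asynchronous read."""
        return self.remote_memory[(node, address)]

    def iso_open(self, channel, bytes_per_cycle):
        """Reserve bandwidth for an isochronous channel."""
        self.iso_channels[channel] = bytes_per_cycle

bus = Bus1394()
bus.initialize()
bus.on_event("bus_reset", lambda: None)
bus.async_write(node=1, address=0xF0000400, data=b"\x01\x02")
bus.iso_open(channel=3, bytes_per_cycle=1024)
```

A real implementation would sit on top of the hardware abstraction layer described in this section, so that only the HAL changes when the link layer controller is swapped.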
17.4 Industrial Applications of IEEE 1394

Since the most distinguishing feature of IEEE 1394 is its high bandwidth (currently a maximum of 50 MByte/s for IEEE 1394a and 100 MByte/s for IEEE 1394b), it is mainly used in industrial systems that comprise some kind of image sensor. Image sensors produce a huge amount of data that must be transmitted to an image processing system in real time. A typical example is automated optical quality control of goods. Therefore, the following examples refer to this kind of application. All of them still use IEEE 1394a. The application of IEEE 1394 as a fieldbus replacement is covered in a separate section.

The first example is an industrial camera that contains a CMOS (complementary metal oxide semiconductor) image sensor. By using special readout electronics, a dynamic range of 120 dB per pixel is achieved, which allows the capture of images having both extremely bright and dark regions, like the welding scene in Figure 17.4. The comparison with an equivalent CCD image (also in Figure 17.4) shows that no artifacts like blooming or smearing occur in the CMOS image. The main applications of this camera will be optical inspection in industrial production, like the welding scene in Figure 17.4, but also automotive applications, like driver assistance. Here, intense sunlight and shadows produce images with enormous contrast that cannot be acquired with conventional charge-coupled device (CCD) cameras. The IEEE 1394 interface of this camera uses the hardware and software described in the previous section. The camera operates in accordance with the DCAM standard. Since a number of interface cards, image processing systems, and software tools that support the DCAM standard are already available, this camera can be used for a broad range of industrial inspection systems. Since another application is driver assistance in passenger cars, the special requirements of automotive electronics must
FIGURE 17.4 Welding scene acquired with conventional CCD camera (left image) and advanced CMOS camera (right image).
be fulfilled. Special IEEE 1394 latching connectors and chip sets for extended temperature ranges are commercially available.

A second application is a so-called data logger for a machine that measures the outer contours of rotationally symmetric workpieces like crankshafts. The piece is placed between an infrared (IR) lighting unit and a number of CCD line cameras (a maximum of 10), the number depending on the diameter of the workpiece. Both the lighting and the cameras are moved along the axis of the workpiece in a synchronous manner, as shown in Figure 17.5. The plug-and-play capability of IEEE 1394 allows an easy reconfiguration of the system for different types of workpieces. Another reason for IEEE 1394 in this application is bandwidth: each camera has 2048 pixels with 12-bit color depth and operates at a maximum pixel clock of 20 MHz. The resulting data rate exceeds the usable bandwidth of IEEE 1394a even for a single camera, so areas outside the region of interest are clipped using a lookup table and run-length encoded by hardware. This task, together with the transformation of pixel and additional mechanical data into IEEE 1394 packets, is performed by circuitry implemented in the FPGA. The packets are written into the FIFO and transmitted via the IEEE 1394 subsystem shown in Figure 17.2. Concerning the type of transaction, a trade-off must be made between asynchronous packets, which guarantee secure transmission at the cost of bandwidth for response packets, and isochronous transfers, which lead to high bus throughput but lack feedback on success or failure. For reasons of security, asynchronous transfers in combination with a proprietary transport protocol have been chosen. However, first experiments have shown that while the data logger is able to generate data at up to 30 MByte/s, the connected PC that performs the image analysis cannot handle such an amount of data over long periods.
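The bandwidth pressure can be quantified from the figures given above (a quick sketch; note that 20 MHz × 12 bit works out to 240 Mbit/s, i.e. 30 MByte/s, which lines up with the generation rate observed in the experiments):

```python
# Raw data rate of one CCD line camera: 2048-pixel lines with
# 12-bit depth, read out at a pixel clock of up to 20 MHz.
pixel_clock_hz = 20e6
bits_per_pixel = 12

rate_mbit = pixel_clock_hz * bits_per_pixel / 1e6  # 240 Mbit/s
rate_mbyte = rate_mbit / 8                         # 30 MByte/s

# With up to 10 cameras, the aggregate raw rate dwarfs the 400 Mbit/s
# of IEEE 1394a, hence the hardware clipping and run-length encoding.
aggregate_mbit = 10 * rate_mbit                    # 2400 Mbit/s
```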
Therefore, it is likely that a second version will use isochronous channels for data transmission. Prototype systems of the data logger have been realized (Figure 17.6) and successfully tested. For design reasons, only a 6-inch-square printed circuit board (PCB) area can be used per camera. The complete system has passed intensive EMC/ESD (electromagnetic compatibility/electrostatic discharge) testing, since all rules mentioned in the previous section have been strictly followed. The machine operates in the direct neighborhood of machine tools, but no faulty operation of the IEEE 1394a bus has been observed so far.

The third device is a so-called color photo scanner for extremely high resolution image acquisition. It is used for applications in which short acquisition time is not the most important issue, but rather the precise imaging of fine details, as in optical inspection of printed circuit boards. The design goal has been a geometrical resolution of more than 8000 pixels for the shorter edge of the image area and a color depth
FIGURE 17.5 Working principle of optical measurement system.
FIGURE 17.6 Module for data compression and transmission used in optical measurement machine.
of 12 bits per elementary color. In order to avoid the enormous cost of a corresponding CCD area sensor, a mixed mechanical-optical principle has been chosen. An arbitrary three-dimensional scene is projected onto a focusing screen via a lens, and the resulting two-dimensional image is scanned with a CCD line sensor that is moved by a step motor. This results in an image size of 8,192 × 12,000 pixels. In order to reduce the scanning time to the physical minimum (integration time per line × number of lines), two hardware tasks must operate in parallel: one controls the CCD sensor, motor, and analog and digital image processing circuitry, and the other the transmission of data via IEEE 1394. The architecture of the color photo scanner (see the photo of the final product in Figure 17.7) is derived from the reference design described in the last section. The two hardware tasks are implemented inside the FPGA. Instead of the FIFO, a fast SRAM (static random access memory) organized as a cyclic buffer is used as the interface between these two tasks. The design makes use of the fact that the high-speed port of the LLC can also handle asynchronous packets at high speed, so that the controller does not have to care about the mass data. It only executes the IEEE 1394 protocol stack and an implementation of the SBP-2 protocol that is used for control and setup of the scanner. The microcontroller provides enough computing power for that purpose. The device, which has become a commercial product, is able to capture a complete image within 90 s, which can only be achieved with the hardware support of the FPGA. All necessary parts fit into the modified housing of a single-lens reflex (SLR) camera.
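A brief sanity check on the quoted numbers (arithmetic only; the three-channel assumption follows from the 12 bits per elementary color):

```python
# Color photo scanner: 8192 x 12000 pixels, full capture in 90 s.
lines = 12_000
capture_s = 90.0

# Physical minimum = integration time per line x number of lines,
# so the time budget per scanned line is:
line_time_ms = capture_s / lines * 1000     # 7.5 ms per line

# Raw image size at 12 bits for each of 3 elementary colors:
pixels = 8192 * lines
image_mbyte = pixels * 3 * 12 / 8 / 1e6     # ~442 MByte per image

# Averaged over the 90 s capture, the bus load stays below 5 MByte/s,
# comfortably within IEEE 1394a limits.
avg_mbyte_per_s = image_mbyte / capture_s
```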
FIGURE 17.7 Color photo scanner.
17.5 IEEE 1394 Automation Protocol

The previous section describes some industrial applications of IEEE 1394 but leaves real industrial communication networks out of consideration. This application field has been dominated by proprietary solutions in the past. Only recently has the situation changed with the passing of the 1394 Automation Protocol. Its theoretical principles are defined in [8]. In industrial automation, a trend toward decentralized systems has been observed over the last few years. In conjunction with the growing capabilities of embedded hardware, most of the system functionality is transferred from the central control unit to distributed control units. A typical example is motion control: the system consists of several input/output modules and decentralized control units that drive the axes. One control node is designated as the host control. This node is responsible for maintaining the network, which includes start-up and parameterization, control and supervision of all system components, and network management. Most of the time, the host control is implemented on an industrial computer, but this task may be assigned to any one of the control units. Several buses have been established in industrial control to connect the decentralized modules, such as Ethernet and its derivatives, Profibus, CANBus, SERCOS, and others. All these technologies lack one important feature: a method for cyclic, deterministic distribution of control information with guaranteed delivery at fixed intervals, which is a prerequisite for industrial, and especially motion control, systems. However, this type of information exchange is one of the characteristics of IEEE 1394: its isochronous mode of data transmission. Therefore, a number of companies selling products for factory automation have integrated IEEE 1394 in their equipment but are still using proprietary protocols, so interoperability is not ensured. This limitation shall be overcome by 1394AP.
Its specification, as well as the development of a reference implementation, was initiated by the European 1394 Automation Association, which comprises more than a dozen factory automation companies and research institutes. IEEE 1394 matches the requirements of communication in industrial control systems:
• The communication network has to be fast, robust, and inexpensive.
• IEEE 1394 supports distributed architectures such as the tree structure. Isochronous and asynchronous transmission schemes permit unrestricted communication between the system components.
• IEEE 1394 itself already provides the cycle clock synchronization that is necessary to control specific data exchange.
• The low jitter of IEEE 1394 is one reason this bus is attractive for industrial control.
• The variety of IEEE 1394 devices guarantees independence from a single supplier and its pricing policy.
Therefore, IEEE 1394 is an almost ideal choice for industrial communication networks. However, specific features like network topology, application-specific control and status registers, packet payload, network management, and message synchronization have to be defined; these definitions make up 1394AP. Because
© 2005 by CRC Press
17-10
The Industrial Communication Technology Handbook
of the special requirements of industrial communication, 1394AP differs from the usual properties of an IEEE 1394-based transport protocol.

One node of the network acts as the so-called application master. Most of the time it will be the control PC of the network, but any other node with sufficient computing power may take over this function. The capability of becoming application master is stated by special entries in the configuration ROM of the 1394AP node. Since the application master is responsible for the synchronization of all nodes and the usual IEEE 1394 cycle start packets are used for this purpose, the root of the treelike network shall become the application master. All nonroot nodes are called slaves.

The main task of the application master is the cyclic transfer of input data to the slaves. This information is summarized in the master data telegram (MDT), which is the payload of an IEEE 1394 packet. The update rate of the control variables can be adjusted via the 1394AP-specific application cycle, whose length is based on the requirements of the application. For high-speed applications, the application cycle is identical to the standard IEEE 1394 isochronous cycle of 125 µs. In this case, the MDT is the payload of isochronous packets, which are sent immediately after the cycle start packet. In case of reduced performance requirements, the application cycle is spread over several isochronous cycles. Then, for the MDT, both isochronous and asynchronous packets can be used: in the first case, each packet carries only part of the complete MDT; in the second case, asynchronous broadcast write requests are used. The MDT of 1394AP only defines a data structure for the transmission of application-specific variables; it does not specify the meaning of the variables. This is left to the application, which gives flexibility in the use of 1394AP.
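As an illustration, the relation between the application cycle and the 125 µs isochronous cycle, and the resulting fragmentation of the MDT, can be sketched as follows. The helper names and the round-up policy are illustrative assumptions, not part of the 1394AP specification:

```python
ISO_CYCLE_US = 125  # length of the IEEE 1394 isochronous cycle in microseconds

def cycles_per_app_cycle(app_cycle_us: int) -> int:
    """Number of isochronous cycles one application cycle spans (rounded up)."""
    return -(-app_cycle_us // ISO_CYCLE_US)

def mdt_fragment_len(mdt_len: int, app_cycle_us: int) -> int:
    """Bytes of MDT payload carried per isochronous packet when the MDT is
    split over all cycles of one application cycle (rounded up)."""
    n = cycles_per_app_cycle(app_cycle_us)
    return -(-mdt_len // n)

# A 500 us application cycle spans 4 isochronous cycles, so a 96-byte MDT
# is split into 24-byte fragments, one per isochronous packet.
print(cycles_per_app_cycle(500), mdt_fragment_len(96, 500))  # 4 24
```

For a high-speed application with a 125 µs application cycle, the same computation yields a single packet carrying the whole MDT.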
While the slaves receive the MDT and extract the data that are necessary for proper operation, they output their data via the device data telegrams (DDTs). The DDTs transfer the output data both to the master and to the other slaves, so that real peer-to-peer data transfer is ensured. The remarks on the application cycle for the MDTs are also valid for DDTs.

Each node provides network management services, which include node activation, suspension, configuration, initialization, and reset, as well as error handling. The network master (identical to the application master) controls changes in the state of the nodes.

One of the main reasons why IEEE 1394 has been chosen as a control bus for real-time systems is its deterministic timing via isochronous data transfer. The clocks running in each individual node are synchronized every 125 µs by the cycle start packets. Data to be sent as either MDT or DDT are held in local buffers and marked with a time stamp. The time information contained in the cycle start packets is used as a trigger that releases the data for transmission.

One problem to be overcome arises when isochronous and asynchronous data transfers are mixed. After a cycle start, all isochronous packets are sent first; the remaining part of the cycle is used for asynchronous data. However, IEEE 1394 does not check whether the transfer of an asynchronous packet can be completed within 125 µs of the last cycle start. If it cannot, the next cycle start packet will be delayed. For 1394AP, this results in nondeterministic timing, which cannot be accepted. Therefore, some precalculation of the overall bandwidth has to be carried out, which prevents the delay by managing the limited asynchronous resources among all nodes of the network. Isochronous transport is preferred in any case.

The MDTs and DDTs define the software interface of 1394AP to the application.
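The bandwidth precalculation can be pictured as a rough admission check: an asynchronous packet is started only if its transfer can finish before the next cycle start. The numbers below (about 50 bytes per microsecond at 400 Mbit/s, plus a fixed overhead term) are illustrative assumptions for the sketch, not values taken from the standard:

```python
CYCLE_US = 125.0           # isochronous cycle length in microseconds
S400_BYTES_PER_US = 50.0   # ~400 Mbit/s is roughly 50 bytes per microsecond

def async_packet_fits(iso_time_us: float, packet_bytes: int,
                      overhead_us: float = 5.0) -> bool:
    """Admit an asynchronous packet only if, after the isochronous traffic
    of this cycle, its transfer completes before the next cycle start."""
    transfer_us = packet_bytes / S400_BYTES_PER_US + overhead_us
    return iso_time_us + transfer_us <= CYCLE_US

# With 80 us of isochronous traffic, a 1024-byte packet (~25 us total)
# still fits in the cycle, while a 4096-byte packet (~87 us) does not
# and would delay the next cycle start.
print(async_packet_fits(80.0, 1024), async_packet_fits(80.0, 4096))  # True False
```

In a real system this check would have to be coordinated among all nodes, since the asynchronous bandwidth of a cycle is a shared resource.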
In order to ease the migration from other bus solutions to 1394AP without having to rewrite major parts of the application software, well-known communication profiles will be added to 1394AP. As a first example, the CANopen Communication Profile [9] has been taken into consideration and implemented in 1394AP.

For first experiments, a communication node has been designed that supports features of a preliminary version of the 1394AP protocol in hardware. It can only be used as a slave node. In principle, it reuses the system design depicted in Section 17.3, but comprises a different link layer controller because of the special communication pattern defined in 1394AP. The GP2Lynx is only capable of either sending or receiving isochronous data, but not both simultaneously, as required for MDTs and DDTs. Therefore, the Texas Instruments TSB42AB4 CeLynx device is used in this design. Originally targeted at consumer electronics, it is also well suited for industrial applications and supports full-duplex isochronous streaming.
Data Transmission in Industrial Environments Using IEEE 1394 FireWire
FIGURE 17.8 Architecture of node for synchronous industrial communication via 1394AP.
FIGURE 17.9 Node for synchronous industrial communication via IEEE 1394.
The resulting schematic of the node is shown in Figure 17.8. The FPGA filters the data of incoming MDTs and DDTs, so that the microcontroller, an Infineon C167CS, only has to handle relevant data. Outgoing DDTs are sent as asynchronous packets and written directly into the link layer controller by the microcontroller. Figure 17.9 shows the industrial communication node. It is a stacked design using two PCBs; therefore, part of the circuitry is not visible in Figure 17.9. Experiments with this embedded system have shown that 1394AP is suited for industrial real-time communications, especially for motion control systems.
17.6 Summary

The four examples in the last two sections show that IEEE 1394 is an emerging and well-suited technology for industrial applications. However, some minor problems exist:
1. Currently, almost all IEEE 1394 devices use IEEE 1394a, which limits the distance between two nodes to 4.5 m, as defined by the standard, while high-quality cables allow up to 10 m. For large industrial systems this still might not be sufficient.
2. The DC coupling of physical layers in IEEE 1394a implies the danger of disturbance problems, especially in harsh environments.
3. The number of devices on one bus is limited to 63, which may not be enough for certain applications. IEEE 1394 bridges that can connect up to 1023 buses are currently in the specification phase, but it will take some time until the first devices are available.
4. If IEEE 1394 is used inside machines with little space left for cabling, the conventional cables and connectors are not optimal. Smaller IEEE 1394 connections for short distances, for instance, PCB stacks or flexible printed tapes, are desirable.
The first two items will vanish with the widespread use of the IEEE 1394b amendment; the third one is tackled by the upcoming IEEE 1394.1 specification. Unfortunately, the last item remains a field of design experiments, because a standardized solution does not exist at the moment. These items will not prevent the use of IEEE 1394 for industrial and factory automation applications. The basic technology, consisting of IEEE 1394 chip sets, hardware reference designs, and software protocol stacks, is already available, and first commercial applications have been realized. Therefore, it is very likely that a large number of industrial systems will incorporate IEEE 1394 for data transmission in the near future. The most important limitations of the original standard, the DC coupling of nodes and the limited reach of 4.5 m between two nodes via copper cables, have been overcome by the new IEEE 1394b amendment. In summary, IEEE 1394 is already more than an emerging technology and is likely to be widely used in industrial applications very soon.
References

[1] IEEE, IEEE Standard for a High-Performance Serial Bus 1394–1995, IEEE Press, 1996.
[2] IEEE, IEEE Standard for a High-Performance Serial Bus Amendment 1 1394a–2000, IEEE Press, New York, 2000.
[3] D. Anderson, FireWire System Architecture, 2nd edition, Addison-Wesley, Reading, MA, 1999.
[4] IEEE, IEEE Standard for a High-Performance Serial Bus Amendment 2 1394b–2002, IEEE Press, New York, 2002.
[5] Serial Bus Protocol 2 (SBP-2), ANSI Standard NCITS 325-1998.
[6] IIDC 1394-Based Digital Camera Specification, Version 1.30, available from the 1394 Trade Association (see http://www.1394ta.org/Technology/Specifications/specifications.htm).
[7] 1394TA IICP Specification for the Instrument and Industrial Control Protocol, Version 1.00, available from the 1394 Trade Association (see http://www.1394ta.org/Technology/Specifications/specifications.htm).
[8] G. Beckmann, Ein Hochgeschwindigkeits-Kommunikationssystem für die industrielle Automation, dissertation, Technical University of Braunschweig, 2001 (in German).
[9] CANopen Communication Profile for Industrial Systems, CiA Draft Standard 301, Revision 3.0, available from CAN in Automation e.V.
18
Configuration and Management of Fieldbus Systems

Stefan Pitzek
Vienna University of Technology

Wilfried Elmenreich
Vienna University of Technology

18.1 Introduction ......................................................................18-1
18.2 Concepts and Terms..........................................................18-2
Configuration vs. Management • Smart Devices • Plug and Play vs. Plug and Participate • State
18.3 Requirements on Configuration and Management........18-3
18.4 Interface Separation ..........................................................18-4
The Interface File System Approach
18.5 Profiles, Data Sheets, and Descriptions ...........................18-6
Profiles • Electronic Data Sheets
18.6 Application Development...............................................18-10
18.7 Configuration Interfaces .................................................18-13
Hardware Configuration • Plug and Participate • Application Download
18.8 Management Interfaces...................................................18-15
Monitoring and Diagnosis • Calibration
18.9 Maintenance in Fieldbus Systems ..................................18-17
18.10 Conclusion.......................................................................18-18
References ...................................................................................18-18
18.1 Introduction Fieldbus systems are often evaluated by their technical merits, like performance, efficiency, and suitability for a particular application. Being designed to perform control applications, most industrial communication networks are well capable of performing their respective application tasks. Besides these ostensible criteria, however, there are some other capabilities a fieldbus system must provide, which in some cases might actually have a greater influence on the usability of a particular system than the technical ability to fulfill the given control requirements. These capabilities deal with the configuration and management, i.e., the setup, configuration, monitoring, and maintenance of the fieldbus system. Powell [32] describes the problematic situation in the past: “Fifteen years ago, a typical process automation plant consisted of various field devices from half a dozen of vendors. Each device had its own setup program with different syntax for the same semantics. The data from the devices often differed in the data formats and the routines to interface each device.” Since that time, a lot of concepts and methods have been devised in order to support these configuration and management tasks. Many of the concepts have been implemented in fieldbus technologies such as HART (highway addressable remote transducer), Profibus, Foundation Fieldbus, LON (local operating
network), etc. It is the objective of this chapter to give an introduction to state-of-the-art concepts and methods for the configuration and management of fieldbus systems.

The remainder of the chapter is organized as follows: Section 18.2 defines the concepts and terms used in the context of configuration and management of fieldbus systems. Section 18.3 investigates the requirements for configuration and management tasks. Section 18.4 analyzes the necessary interfaces of a field device and proposes a meaningful distinction of interface types. Section 18.5 discusses profiles and other representation mechanisms for system properties in several fieldbus systems. Section 18.6 gives an overview of application development methods and their implications for the configuration and management of fieldbus networks. Section 18.7 examines the initial setup of a system, including hardware configuration, plug and participate, and application download. Section 18.8 deals with management interfaces of fieldbus systems, such as monitoring, diagnosis, and calibration of devices. Section 18.9 presents maintenance methods for reconfiguration, repair, and reintegration of fieldbus devices.
18.2 Concepts and Terms The purpose of this section is to introduce and define some important concepts and terms that are used throughout this chapter.
18.2.1 Configuration vs. Management The term configuration is used for a wide range of actions. Part of the configuration deals with setting up the hardware infrastructure of a fieldbus network and its nodes, i.e., physically connecting nodes (cabling) and configuring (e.g., by using switches, jumpers) nodes in a network. On the other hand, configuration also involves setting up the network on the logical (i.e., software) level. Depending on the network topology and underlying communication paradigm (and other design decisions), this leads to very different approaches to how configuration mechanisms are implemented. In contrast, management deals with handling an already built system and includes maintenance, diagnosis, monitoring, and debugging. As with configuration, different fieldbus systems can greatly differ in their support and capabilities for these areas. Often configuration and management are difficult to separate since procedures such as plug and play (see Section 18.2.3) involve configuration as well as management tasks.
18.2.2 Smart Devices The term smart or intelligent device was first used in this context by Ko and Fung [21], meaning a sensor or actuator device that is equipped with a network interface in order to support an easy integration into a distributed control application. In the context of fieldbus systems, a smart device supports its configuration and management by providing its data via a well-defined network interface [23] or offering a self-description of its features. The description usually comes in a machine-readable form (e.g., as an Extensible Markup Language (XML) description) that resides either locally at the fieldbus device (e.g., IEEE 1451.2 [17]) or at a higher network level being referenced by a series number (e.g., OMG Smart Transducer Interface [26]).
18.2.3 Plug and Play vs. Plug and Participate Plug and play describes a feature for the automatic integration of a newly connected device into a system without user intervention. While this feature works well for personal computers within an office environment, it is quite difficult to achieve this behavior for automation systems, since without user intervention the system would not be able to guess what sensor data should be used and what actuator should be instrumented by a given device. Therefore, in the automation domain the more correct term plug and participate should be used, describing the initial configuration and integration of a new device that can be automated. For example, after connecting a new sensor to a network, it could be automatically detected,
given a local name, and assigned to a communication slot. The task of the human system integrator is then reduced to deciding on the further processing and usage of the sensor data.
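The automated part of plug and participate might be sketched as follows; the class, state names, and slot bookkeeping are hypothetical illustrations, not drawn from any particular fieldbus:

```python
# Hypothetical sketch: a newly connected device is detected, given a
# local name, and assigned a communication slot automatically; the
# decision on how its data is used is left to the human integrator.
class NewDevice:
    def __init__(self, serial: int):
        self.serial = serial       # unique series number read from the node
        self.local_name = None
        self.slot = None
        self.state = "detected"

def plug_and_participate(device, assigned_names, free_slots):
    device.local_name = len(assigned_names)   # next free logical name
    assigned_names.append(device.serial)
    device.slot = free_slots.pop(0)           # next free communication slot
    device.state = "awaiting integrator"      # usage decision is not automated
    return device

names, slots = [], [4, 5, 6]
dev = plug_and_participate(NewDevice(serial=0xCAFE), names, slots)
print(dev.local_name, dev.slot, dev.state)  # 0 4 awaiting integrator
```

The final state deliberately stops short of full plug and play: the device is integrated into the communication system, but not yet into the application.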
18.2.4 State Zadeh states that the “notion of state of a system at any given time is the information needed to determine the behavior of the system from that time on” [40, p. 3]. In real-time computer systems, we distinguish between the initialization state (i-state) and the history state (h-state) [22]. The i-state encompasses the static data structure of the computer system, i.e., data that are usually located in the static (read-only) memory of the system. The i-state does not change during the execution of a given application, e.g., calibration data of a fieldbus node. The h-state is the “dynamic data structure … that undergoes change as the computation progresses” [22, p. 91]. An example for an h-state is the cached results of a sequence of measurements that are used to calculate the current state of a process variable. The size of the h-state at a given level of abstraction may vary during execution. A good system design will aim at having a ground state, i.e., when the size of the h-state becomes zero. In a distributed system, this usually requires that no task is active and no messages are in transit.
18.3 Requirements on Configuration and Management

The requirements on a configuration and management framework are driven by several factors. We have identified the following points:
• (Semi)automatic configuration: The requirement for a plug-and-play-like configuration can be justified by three arguments:
1. An automatic or semiautomatic configuration saves time and therefore leads to better maintainability and lower costs.
2. The necessary qualification of the person who sets up the system may be lower if the overall system is easier to configure.
3. The number of configuration faults will decrease, since monotonous and error-prone tasks like looking up configuration parameters in bulky manuals are done by the computer.
In most cases, a fully automatic configuration will only be possible if the functionality of the system is reduced to a manageable subset. For more complex applications, consulting the human mind is unavoidable. Thus, we distinguish two cases: (i) the automatic setup of simple subsystems, i.e., systems that require an automatic and autonomous (without human intervention) reconfiguration of network and communication participants in order to adapt to different operating environments; usually, such systems either use very sophisticated (and often costly) negotiation protocols or work only in closely bounded and well-known application domains; and (ii) the computer-supported configuration of large distributed systems, which is the usual approach.
• Comprehensible interfaces: In order to minimize errors, all interfaces should be made as comprehensible as possible. This includes the uniform representation of data provided by the interfaces and the capability of selectively restricting an interface to the data required by its user. The comprehensibility of an interface can be expressed by the mental load that it puts on the user. Different users need different specialized interfaces, each with a minimum of mental load.
For example, an application developer mostly has a service-centered view of the system. Physical network details and other properties not relevant for the application should be hidden from the developer [27].
• Uniform data structures: The configuration and management of fieldbus systems require representations of system properties that are usable by software tools. In order to avoid a situation where each application deals with the required information in its own way, these representations should be generic, highly structured, and exactly specified.
• Low overhead on the embedded system: Fieldbus systems employ embedded hardware for reasons of cost, size, power consumption, and mechanical robustness. Such embedded hardware usually provides far less memory and processing power than average desktop systems. Currently, typical microcontrollers provide several hundred bytes of RAM and a few kilobytes of flash ROM. Clocked by an internal oscillator, these microcontrollers deliver between about 0.5 and 16 MIPS of processing power. Therefore, the designers of configuration and management tools must take care that there is as little overhead on the embedded system nodes as possible (e.g., static data required for management should be stored in a central repository outside the network).
• Use of standard software/hardware: Computers running standard Windows or Linux operating systems do not provide guaranteed response times for programs, and most hardware interfaces are controlled by the operating system. Since this might violate the special timing requirements of a fieldbus protocol, it is often not possible to directly connect a configuration host computer to the fieldbus network using the fieldbus protocol itself. Instead, a configuration tool must use some other means of communication, such as standard communication protocols or interfaces like Transmission Control Protocol (TCP)/Internet Protocol (IP), RS232, universal serial bus (USB), or standard middleware like CORBA (Common Object Request Broker Architecture). Since fieldbus nodes might not be powerful enough to implement these mechanisms, communication will often be performed using dedicated gateway nodes. In order to reduce the complexity of the involved conversion and transformation steps, the interface to and from the fieldbus node must be comprehensible, structurally simple, and easy to access.
18.4 Interface Separation

If different user groups access a system for different purposes, they should only be provided with interfaces to the information relevant for their respective purposes [33]. Interfaces for different purposes may differ in the accessible information and in the temporal behavior of the access across the interface. Kopetz et al. [23] have identified three interfaces to the transducer nodes of a fieldbus:
1. The configuration and planning (CP) interface allows the integration and setup of newly connected nodes. It is used to generate the "glue" in the network that enables the components of the network to interact in the intended way. Usually, the CP interface is not time critical.
2. The diagnostic and management (DM) interface is used for the parameterization and calibration of devices and to collect diagnostic information to support maintenance activities. For example, a remote maintenance console can request diagnostic information from a certain sensor. The DM interface is usually not time critical.
3. The real-time service (RS) interface is used to communicate the application data, e.g., sensor measurements or set values for an actuator. This interface usually has to fulfill timing constraints such as a bounded latency and a small communication jitter.
The RS interface has to be configured by means of the CP (e.g., communication schedules) or DM (e.g., calibration data or level monitors) interface. The TTP/A (time-triggered protocol for SAE class A applications) fieldbus system [24] uses time-triggered scheduling that provides a deterministic communication scheme for the RS interface. A specified part of the bandwidth is reserved for arbitrary CP and DM activities. Therefore, it is possible to perform configuration and planning tasks while the system is in operation without a probe effect on the real-time service [15].
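The three interface types and their differing timing demands might be summarized in code; this is an illustrative model for discussion, not part of any fieldbus specification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldbusInterface:
    name: str
    purpose: str
    time_critical: bool

# The three interface types identified by Kopetz et al. [23].
CP = FieldbusInterface("CP", "configuration and planning", time_critical=False)
DM = FieldbusInterface("DM", "diagnostics and management", time_critical=False)
RS = FieldbusInterface("RS", "real-time service", time_critical=True)

# Only the RS interface must meet bounded latency and low jitter; CP and DM
# traffic can be served from bandwidth reserved for non-critical activities.
print([i.name for i in (CP, DM, RS) if i.time_critical])  # ['RS']
```

Separating the interfaces in this way is what allows a scheduler to reserve bandwidth for CP and DM activities without a probe effect on the real-time service.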
18.4.1 The Interface File System Approach The concept of the Interface File System (IFS) was introduced by Kopetz et al. [23]. The IFS provides a unique addressing scheme to all relevant data of the nodes in a distributed system. Thus, the IFS maps
real-time data, all kinds of configuration data, self-describing information, and internal state reports for diagnosis purposes. The IFS is organized hierarchically as follows: the cluster name addresses a particular fieldbus network; within the cluster, a specific node is addressed by the node name; the IFS of a node is structured into files and records, where each record is a unit of four bytes of data. The IFS is a generic approach that has been implemented with the TTP/A protocol [24] as a case study for the OMG Smart Transducer Interface. The IFS approach supports well the integration and management of heterogeneous fieldbus networks and provides the following benefits:
• It establishes a well-defined interface between network communication and the local application. The local application uses API (application programming interface) functions to read and write data from or into the IFS. The communication interface accesses the IFS to exchange data across the network.
• The IFS hides network communication from the node application and provides location transparency for a message, since a task does not have to discriminate between data that are locally provided and data that are communicated via the network.
• Since the configuration and management data are also mapped into the IFS, configuration and management tools can directly use the CORBA STI (smart transducer interface) for accessing this information from outside the network.
Figure 18.1 depicts an architecture with configuration and management tools that access the IFS of a fieldbus network from the Internet.
FIGURE 18.1 Architecture for remote configuration and monitoring.
The IFS maps real-time service data, configuration data, and management data all in the same way. In fact, the management interface can be used to define the real-time service data set dynamically (e.g., to select between a smoothed value or a dynamic value as the result from a sensor). While it is required to provide real-time guarantees for communication of real-time data, the access to configuration and management data is not time critical. This enables the employment of Web-based tools for remote maintenance. Tools that interface with the IFS have been implemented using CORBA as middleware. CORBA is an object model managed by the Object Management Group (OMG) that provides transparent communication among remote objects. Objects can be implemented in different programming languages and can run on different platforms. The standardized CORBA protocol IIOP (Internet Inter-ORB Protocol) can be routed over TCP/IP, thus supporting worldwide access to and communication between CORBA objects across the Internet. Alternatively, it is possible to use Web Services as the management interface to a fieldbus network. A case study that implements Web Services on top of the IFS of a fieldbus is described in [36].
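The hierarchical IFS naming scheme (cluster, node, file, record, with four-byte records) can be sketched as a packed 32-bit address word. The one-byte field widths used here are an assumption for illustration, not taken from the OMG STI specification:

```python
RECORD_SIZE = 4  # each IFS record is a unit of four bytes of data

def ifs_pack(cluster: int, node: int, file_no: int, record: int) -> int:
    """Pack a hierarchical cluster/node/file/record address into one
    32-bit word, giving a uniform name for every record in the system.
    One byte per level is an illustrative assumption."""
    for part in (cluster, node, file_no, record):
        assert 0 <= part <= 0xFF
    return (cluster << 24) | (node << 16) | (file_no << 8) | record

# Record 3 of file 2 on node 5 in cluster 1:
addr = ifs_pack(cluster=1, node=5, file_no=2, record=3)
print(hex(addr))  # 0x1050203
```

Such a uniform address lets configuration and management tools name any piece of real-time, configuration, or diagnostic data in the same way, regardless of which node holds it.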
18.5 Profiles, Data Sheets, and Descriptions In order to build and configure systems, users require information on different properties of the parts of the targeted system. Such information comes in the form of, e.g., hardware manuals or data sheets. Since this information is intended for human consumption, representation and content are typically less formal than would be required for computer processing of this information. For that reason, dedicated computer-readable representations of fieldbus system properties are required, which play a similar role as information sources for a computer-based support framework during configuration and management of a system. These representations allow for establishing common rule sets for developing and configuring applications and for accessing devices and system properties (for configuration as well as management functions). In the following, we examine several representation mechanisms.
18.5.1 Profiles Profiles are a widely used mechanism to create interoperability in fieldbus systems. We distinguish several types of profiles, i.e., application, functional, or device profiles. Heery and Patel [16] propose a very general and short profile definition that we adopt for our discussion: “Profiles are schemas, which consist of data elements drawn from one or more name spaces,* combined together by implementors, and optimized for a particular local application.” In many cases, a profile is the result of the joint effort of a group of device vendors in a particular area of application. Usually, a task group is founded that tries to identify reoccurring functions, usage patterns, and properties in their domain and then creates strictly formalized specifications according to these identified parts, resulting in so-called profiles. More specific, for each device type, a profile exactly defines what kind of communication objects, variables, and parameters have to be implemented so that a device conforms to the profile. Profiles usually distinguish several types of variables and parameters (e.g., process parameters, maintenance parameters, user-defined parameters) and provide a hierarchical conformance model that allows for the definition of user-defined extensions of a profile. A device profile need not necessarily correspond to a particular physical device; for example, a physical node could consist of multiple virtual devices (e.g., multipurpose input/output (I/O) controller), or a virtual device could be distributed over several physical devices. Protocols supporting device, functional, and application profiles are CANopen [7], Profibus, and LON [25] (LonMark functional profiles). Figure 18.2 depicts, as an example, the visual specification of a LonMark† functional profile for an analog input object. 
The profile defines a set of network variables (in this example only the mandatory ones are shown) and local configuration parameters (none in this example). The arrow specifies that this profile outputs a digital representation of an analog value, whereas the structure of this output is defined with the standardized (in LON) network variable type SNVT_lev_percent (–163.84 to 163.84% of full scale). In addition, a profile also specifies other important properties, such as timing information, valid range, update rate, power-up state, error condition, and behavior (usually as a state diagram). While the approach for creating profiles is comparable for different protocols, the profiles are not always interchangeable between the various fieldbuses, although advancements (at least for process control-related fieldbuses) have been made within IEC 61158 [19]. Block- and class-based concepts, such as function blocks as they are defined for the Foundation Fieldbus or Profibus DP, or component classes in IEEE 1451.1 [18], can be considered implementations of the functional profile concept.

FIGURE 18.2 Functional profile for an analog input.

*That is, sources.
†http://www.lonmark.org.
18.5.2 Electronic Data Sheets

Classical data sheets usually provide a detailed description of mostly physical key properties of a device, such as available pins, electrical properties of pins, available amount and layout of memory, processing power, etc. Electronic data sheets play a conceptually similar role, but usually with a different focus, since they often try to abstract from the details of the physical properties of the underlying system and describe properties of a higher-level system model (e.g., the Institute of Electrical and Electronics Engineers (IEEE) digital transducer interface [17] or the Interface File System [26]). Such electronic data sheets follow strict and formalized specification rules in order to allow computer-supported processing of the represented information. A generic electronic data sheet format was developed as part of the smart transducer-related IEEE 1451 standards family. IEEE 1451.2 [17] specifies the transducer electronic data sheet (TEDS) and a digital interface to access that data sheet and to read sensors or set actuators. Figure 18.3 depicts the TEDS in the context of the system architecture as defined in IEEE 1451:
• Smart Transducer Interface Module (STIM): A STIM contains from 1 to 255 transducers of various predefined types together with their descriptions in the form of the corresponding TEDSs.
• Network-capable application processor (NCAP): The NCAP is the interface to the overall network. By providing an appropriate NCAP, the transducer interface is independent of the physical fieldbus protocol.
• Transducer-independent interface (TII): The TII is the interface between the STIM and the NCAP. It is specified as an abstract model of a transducer instrumented over 10 digital communication lines.
TEDSs describe node-specific properties, such as the structure and temporal properties of devices and transducer data. Since the transducer interface in IEEE 1451 is line based, the basic communication
[Figure: a Smart Transducer Interface Module containing a sensor with ADC, an actuator with DAC, address logic, and the Transducer Electronic Data Sheet (TEDS), connected via the DIO Transducer-Independent Interface to the Network-Capable Application Processor (NCAP), which attaches to the network.]
FIGURE 18.3 Smart Transducer Interface Module connected to NCAP.
primitive is a channel. A channel represents a single flow path for digital data or an analog signal. One STIM may contain multiple channels and has an associated meta-TEDS that describes properties of the STIM, such as device identification information, number of implemented channels, command response time, or worst-case timing information. Each channel has an associated channel TEDS that describes channel-related information such as data structure, transducer, data conversion, timing, etc. IEEE 1451 aims at self-contained nodes. Thus, TEDSs are stored in a memory directly located at the nodes. This requires considerable memory resources, so the representation of the configuration information for such a system must be very compact. IEEE 1451 achieves this goal by providing a large set of predefined transducer types and modes based on enumerated information, where identifiers are associated with more detailed prespecified descriptions (similar to error codes). An instance of a transducer description can be derived from the predefined types, and thus the memory requirements for the transducer description are kept low. The smart transducer descriptions (STDs), as defined in [28], take a comparable role for describing properties of devices that follow the CORBA Smart Transducer Interfaces standard (the descriptions themselves are currently not part of the standard), although there are some notable differences between both approaches. Unlike the commonly used enumeration-based description of properties, the STD and related formats use XML [39] as the primary representation mechanism for all relevant system aspects. Together with related standards, such as XML Schema or XSLT, XML provides advanced structuring, description, representation, and transformation capabilities. It is becoming the de facto standard for data representation and has extensive support throughout the industry. 
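The containment structure just described, one meta-TEDS per STIM plus one channel TEDS per channel, can be sketched as follows. The field names are illustrative placeholders, not the binary layout defined by IEEE 1451.2.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the IEEE 1451.2 structure described above:
# a STIM holds a meta-TEDS plus one channel TEDS per implemented channel.

@dataclass
class ChannelTEDS:
    channel_type: str        # e.g., "sensor" or "actuator"
    data_model: str          # e.g., "uint16"
    update_time_s: float     # worst-case channel timing

@dataclass
class MetaTEDS:
    manufacturer_id: int
    num_channels: int
    worst_case_response_s: float

@dataclass
class STIM:
    meta: MetaTEDS
    channels: List[ChannelTEDS] = field(default_factory=list)

    def consistent(self):
        # a STIM holds 1 to 255 channels, and the meta-TEDS channel
        # count must match the implemented channel TEDSs
        return (1 <= self.meta.num_channels <= 255
                and self.meta.num_channels == len(self.channels))

stim = STIM(MetaTEDS(17, 2, 0.01),
            [ChannelTEDS("sensor", "uint16", 0.005),
             ChannelTEDS("actuator", "uint16", 0.005)])
print(stim.consistent())  # True
```

A configuration tool would perform a consistency check of this kind when reading the TEDSs off a newly attached module.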
Some examples of XML used in applications in the fieldbus domain can be found in [6, 9, 37]. As the name implies, the smart transducer descriptions describe the properties of nodes in the smart transducer network. The STD format is used for describing both static properties of a device family (comparable to classic data sheets) and devices that are configured as part of a particular application (e.g., the local STD also contains the local node address). The properties described in STDs can be divided into the following categories:
• Microcontroller information: This block holds information on the microcontroller and clock of the smart transducer (e.g., controller vendor, clock frequency, clock drift).
• Node information: This block describes properties that are specific to a particular node and mostly consist of identification information, such as vendor name, device name/version, and node identifiers (serial number, local name).
FIGURE 18.4 Example STD element.
• Protocol information: This block holds protocol-specific information, such as the version of the communication protocol, supported baud rates, Universal Asynchronous Receiver/Transmitter (UART) types, and the IFS layout.
• Node service information: The information in this block specifies the behavior and the capabilities of a node. In the current approach, a service plays a role similar to that of a functional profile (see Section 18.5.1) or function block. Such functional units are especially important for supporting the creation of applications. They conform to the interface model of the CORBA STI standard, since a service consists of a service identifier (e.g., name), input and output parameters, configuration parameters, and management parameters [12]. Parameters are specified by data type and multiple constraints (range, precision, minimum interval time, maximum runtime).
Figure 18.4 shows the description of a file in the IFS, consisting of the name of the file, its length (in records), and the location of the data, i.e., the memory type (RAM, Flash, ROM) where the file should be located (e.g., data specifies that a file is mapped into the internal RAM of the microcontroller). The prefix rodl: is shorthand for an XML name space. Name spaces allow the reuse of element definitions in multiple places. For example, the elements from the rodl (round descriptor list) name space are defined once separately and used in smart transducer descriptions as well as in additional related formats, such as the cluster configuration descriptions (CCDs). While the STD focuses on the nodes, the CCD format deals with system-level aspects. Not all relevant information can be stored outside the node, but by reducing the information required on the node to a minimum, extensive external meta-information can be used without size constraints. The reference to this external information is the unique combination of series and serial numbers of the node.
The series number is identical for all nodes of the same type. The serial number identifies the instance of a node among all nodes of a series. The advantages of this approach are twofold:
1. The overhead at the node is very low. Current low-cost microcontrollers provide internal RAM and EPROM memory of around 256 bytes. This is not sufficient to store more than the most basic parts of data sheets according to, e.g., IEEE 1451.2 without extra hardware such as an external memory element. With the proposed description approach, only the memory for storing the series and serial numbers is necessary, which is 8 bytes.
2. Instead of implicitly representing the node information with many predefined data structures mapped to a compact format, it is possible to have an explicit representation of the information in a well-structured and easy-to-understand way. A typical host computer running the configuration and management tools can easily deal with even very extensive generic XML descriptions. Furthermore, XML formats are inherently easy to extend, so the format is open for future extensions of transducer or service types.
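The description-by-reference idea can be sketched as follows: the node stores only an 8-byte identifier, and the full XML description lives in an external repository keyed by the series number. The 4+4 byte split and all names are illustrative assumptions, not part of the standard.

```python
import struct

# Hypothetical repository: series number -> description of the node *type*
REPOSITORY = {
    0x0001A2B3: "<smartTransducerDescription>temperature sensor"
                "</smartTransducerDescription>",
}

def node_id_bytes(series, serial):
    """Pack series and serial number into the 8-byte on-node identifier
    (assumed here to be two big-endian 32-bit integers)."""
    return struct.pack(">II", series, serial)

def lookup(description_repo, node_id):
    """Host-side lookup: recover the serial number and fetch the
    external description for the node's series."""
    series, serial = struct.unpack(">II", node_id)
    return serial, description_repo[series]

nid = node_id_bytes(0x0001A2B3, 42)
print(len(nid))              # 8 -- the only on-node overhead
serial, descr = lookup(REPOSITORY, nid)
print(serial)                # 42
```

The node itself never parses XML; only the host-side tools interpret the repository entry.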
FIGURE 18.5 Process variable represented with a device description.
Another interesting description mechanism is the device description language (DDL), which has a relatively long history in the fieldbus sector. First drafts emerged around 1990 at Endress+Hauser, where development of a predecessor language, called the parameter description language, had already been performed in the late 1980s [1]. DDL was first used with the HART fieldbus [5], but was later adopted for the Foundation Fieldbus [13] and most recently for Profibus (where it is called electronic device description). Unfortunately, the different versions are not fully compatible, since they have been extended within the scope of the respective fieldbus protocols. The syntax of the DDL is similar to the syntax of the C programming language, but conceptually the language strongly relates to specialized markup languages like the hypertext markup language (HTML). In addition to these markup capabilities, DDL also provides enhancements like conditional evaluation and loops. DDL serves several purposes in the description of field devices:
• It describes the information items presented in the memory of the described devices.
• It supports the representation of the described information on different accessing devices (with different displaying capabilities).
• It supports the detailed specification of device properties, such as labels for parameters, engineering units, display precision, help texts, the relationships of parameters, and the layout of calibration and diagnostic menus.
Unlike the other presented approaches, the device descriptions (DDs) based on DDL play a bigger role in system management, since they not only describe the data in the memory of the devices, but also support defining rich meta-information for improving the interaction with devices. Figure 18.5 depicts a process variable defined with DDL. The example DDL fragment defines the representation of a variable on an access device.
It specifies a label to represent the variable on the display, the data type, formatting information for the value to be displayed, and constraints on valid inputs for changing the value. DDs can be stored on devices themselves (using a compact encoding of the information in the DD), as well as externally (e.g., delivered on a disc together with the device or centrally available in a DD repository).
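The kind of meta-information a DD carries (label, display precision, valid range) might be consumed by an access device as sketched below. The field names and values are illustrative; this is not DDL syntax, only a model of how the described information could drive rendering and input validation.

```python
# Hypothetical DD-derived meta-information for one process variable.
process_variable_dd = {
    "label": "Pressure",
    "unit": "bar",
    "precision": 2,   # digits shown on the display
    "min": 0.0,
    "max": 16.0,
}

def render(dd, value):
    """Format a value for the access device's display using the
    label, precision, and unit from the description."""
    return f'{dd["label"]}: {value:.{dd["precision"]}f} {dd["unit"]}'

def validate_input(dd, value):
    """Reject inputs outside the valid range declared in the description."""
    return dd["min"] <= value <= dd["max"]

print(render(process_variable_dd, 3.14159))       # Pressure: 3.14 bar
print(validate_input(process_variable_dd, 20.0))  # False: out of range
```

The same description can thus serve devices with very different display capabilities, as the text above notes.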
18.6 Application Development

At the center of a fieldbus system is the actual fieldbus application. In the following we examine several application development approaches and how they influence system configuration. A widely used development approach for fieldbus applications is model-based development. The basic idea behind this approach is to create a model of the application that consists of components connected via links that represent the communication flow between the components. Different approaches usually differ in what constitutes a component (e.g., function blocks, subsystems, services, functional profiles, physical devices) and in the detailed semantics of a link. Many approaches support the recursive definition of components, which allows for grouping multiple lower-level components into one higher-level component. Figure 18.6 depicts a typical small control application consisting of two analog inputs receiving values from two sensors, two PID (proportional-integral-derivative) controllers, and one analog output controlling an actuator.
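Such an application model can be sketched as a plain component graph, components plus directed links carrying the communication flow. The component and link names below are illustrative.

```python
# Sketch of a model-based application: components with links that
# represent the communication flow between them (cf. Figure 18.6).

components = ["AnalogIn1", "AnalogIn2", "PID1", "PID2", "AnalogOut"]
links = [  # (source component, destination component)
    ("AnalogIn1", "PID1"),
    ("AnalogIn2", "PID2"),
    ("PID1", "AnalogOut"),
    ("PID2", "AnalogOut"),
]

def downstream(component):
    """Components that receive data from the given component."""
    return [dst for src, dst in links if src == component]

print(downstream("AnalogIn1"))  # ['PID1']
print(downstream("PID1"))       # ['AnalogOut']
```

A configuration tool traverses exactly this kind of graph when mapping the model to physical devices and deriving the communication configuration.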
[Figure: two sensors feed two Analog In blocks; each Analog In is linked to a PID block, and both PID blocks are linked to a single Analog Out block driving an actuator.]
FIGURE 18.6 Example for an application model.
[Figure: hierarchy of levels Enterprise, Site, Area, Process Cell, Unit, Equipment, Control Module.]
FIGURE 18.7 ANSI/ISA-88.01–1995 hierarchical model.
But the model-based approach is not the only application design approach. Another approach, used by multiple fieldbus configuration tools, is the ANSI/ISA-88.01–1995 procedural control model [20]. This modeling approach enforces a strictly modular, hierarchical organization of the application (Figure 18.7). There should be little or no interaction between multiple process cells; interaction between components within a process cell is allowed. To make best use of this approach, the structure of the network site and the application should closely correspond to the hierarchy specified by this model. This modeling approach conceptually follows the typical hierarchy of process control applications with multiple locally centralized programmable logic controllers (PLCs) that drive several associated control devices. This eases the transition from predecessor systems and improves overall robustness, since this approach provides fault containment at the process cell level. As a downside, the coupling between the physical properties of the system and the application is rather tight. An example of a fieldbus protocol that supports this modeling approach is the Profibus PA protocol, which provides a universal function block parameter for batch identification [4]. Another design approach is two-level design [31], which originated in the domain of safety-critical systems. In this approach, the communication between components must be configured before the devices are configured. While this requires that many design decisions be made very early in the design process, this approach greatly improves the overall composability of the components in the system. Abstract application models provide several advantages for application development:
• The modular design of applications helps to deal with complexity by applying a divide-and-conquer strategy. Furthermore, it supports reuse of application components and physical separation.
• The separation of application logic from physical dependencies allows hardware-independent design that enables application development before hardware is available, as well as eases migration and possibly allows the reuse (of parts) of applications. For configuring a physical fieldbus system from such an application model, we must examine (1) how this application model maps to the physical nodes in the network and (2) how information flow is maintained in the network.
[Figure: function block parameters, addressed by slot index and parameter index (slot 1/slot 2, Param1/Param2), map through modules acting as virtual devices (Module1 to Module3) to addressed data items in the device memory.]
FIGURE 18.8 Mapping of function blocks to a physical device in Profibus DP.
In order to map the application model to actual devices, fieldbuses often provide a model for specifying physical devices as well. For example, in Profibus DP the physical mapping between function blocks and the physical device is implemented as follows (Figure 18.8). A physical device can be subdivided into several modules that take the role of virtual devices. Each device can have from one slot (in the case of simple functionality) up to many slots. A function block is mapped to a slot, while slots may also have associated physical and transducer blocks. Physical and transducer blocks represent physical properties of a fieldbus device. Parameters of a function block are indexed, and the slot number and parameter index together define the mapping to actual data in the device memory. In contrast, the Foundation Fieldbus (FF) follows an object-oriented design philosophy. Thus, all information items related to configuring a device and the application (control strategy) are represented as objects. This includes function blocks, parameters, and subelements of parameters. These objects are collected in an object dictionary (OD), where each object is assigned an index. This OD defines the actual mapping to the physical memory on the respective device. In order to understand the methods for controlling the communication flow between the application components, we first examine some recurring important communication properties in fieldbus applications:
• The use of state communication as the primary communication mechanism for operating a fieldbus [29], i.e., performing the fieldbus application. State communication usually involves cyclically updating the associated application data.
• Support for asynchronous/sporadic communication (event communication) in order to perform management functions and deal with parts of the application that cannot be performed with state communication.
A common method to achieve these properties is scheduling.
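The Profibus DP-style (slot, parameter index) addressing described above can be sketched as a simple lookup. The per-slot base addresses and the parameter size are illustrative assumptions, not the Profibus encoding.

```python
# Sketch: (slot number, parameter index) addresses a data item in
# device memory. Base addresses and sizes are hypothetical.

SLOT_BASE = {1: 0x0100, 2: 0x0200}   # assumed per-slot base addresses
PARAM_SIZE = 2                        # assume 2-byte parameters

def data_address(slot, param_index):
    """Resolve a function block parameter to its memory address
    (parameter indexes start at 1, as in Figure 18.8)."""
    return SLOT_BASE[slot] + (param_index - 1) * PARAM_SIZE

print(hex(data_address(1, 1)))  # 0x100 -> first parameter of slot 1
print(hex(data_address(2, 2)))  # 0x202 -> second parameter of slot 2
```

The FF object dictionary plays the same resolving role, with a single object index instead of a (slot, parameter) pair.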
There are many scheduling approaches with vastly different effects on configuration. Following are some commonly used approaches adopted in fieldbus systems:
• Multicycle polling: In this approach, the communication is controlled by a dedicated node that authorizes other nodes to transmit their data [8]. This approach is used, for example, in WorldFIP, FF, and ControlNet. For configuring devices in such a network, the authorizing node requires at least a list of the nodes to be polled; i.e., in the case of a master–slave configuration, only one node must be configured with the timing information in order to control the whole cluster. For better control of the timely execution of the overall application, a time-division multiplexing scheme is used for bus access.
• Time triggered: In a time-triggered communication model, the communication schedule is derived from the progression of physical time. This approach requires a predefined collision-free schedule that defines a priori when a device is allowed to broadcast its data, and an agreement on the global time, which requires the synchronization of the local clocks of all participating devices [10]. Some examples of protocols that support time-triggered communication are TTP/A [24], TTP/C [35], and the synchronous part of the FlexRay protocol [14]. In order to configure the communication in these systems, the schedules must be downloaded to all the nodes in the network.
• Event triggered: Event-triggered communication implements a push model, where the sender decides when to send a message, e.g., when a particular value has changed by more than a given delta. Collisions on the bus are resolved by collision detection/retransmission or by collision avoidance, i.e., bitwise arbitration protocols such as Controller Area Network (CAN) [34]. Event-triggered communication does not depend on scheduling, since communication conflicts are resolved either by the protocol at the data link layer (e.g., bitwise arbitration) or by the application.
The scheduling information is usually stored in dedicated data structures that are downloaded to the nodes in the network in order to be available for use by the network management functions of the node. The TTP/A protocol deals with both application- and communication-specific configuration information in an integrated way. In this approach, the local communication schedules (called round descriptor lists) as well as the interfaces of application services [12] are mapped onto the same interfacing mechanism, the Interface File System (see Section 18.4.1).
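A time-triggered round, in the spirit of the round descriptor lists mentioned above, can be sketched as a collision-free schedule derived from global time. The slot layout, node names, and durations are illustrative.

```python
# Sketch of a time-triggered communication round: a predefined,
# collision-free slot table evaluated against the global time.

ROUND = [            # (slot start offset in ms, sending node)
    (0.0, "master"),
    (1.0, "sensor1"),
    (2.0, "sensor2"),
    (3.0, "actuator"),
]
ROUND_LENGTH_MS = 4.0

def sender_at(global_time_ms):
    """Which node may transmit at a given point of the (synchronized)
    global time. The schedule repeats every round."""
    t = global_time_ms % ROUND_LENGTH_MS
    owner = None
    for start, node in ROUND:
        if t >= start:
            owner = node   # last slot whose start we have passed
    return owner

print(sender_at(1.5))   # sensor1
print(sender_at(7.2))   # actuator (7.2 mod 4.0 falls in the last slot)
```

Because every node evaluates the same table against the same global time, no arbitration is needed; changing the schedule, however, requires downloading the new table to all nodes, as the text notes.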
For the representation of the overall system, the cluster configuration description format was developed; it acts as a central and uniform data structure that stores all the information pertinent to the fieldbus system. This information includes:
• Cluster description meta-information: This description block holds information on the cluster description itself, such as the maintainer, the name of the description file, or the version of the CCD format.
• Communication configuration information: This information includes round sequence lists as well as round descriptor lists, which represent the detailed specification of the communication behavior of the cluster. Additionally, this part of the CCD also includes (partially physical) properties important for communication, such as the UART specification, line driver, and minimum or maximum signal runtimes.
• Cluster node information: This block contains information on the nodes in a cluster, where nodes are represented with the smart transducer description format.
18.7 Configuration Interfaces

In the last section we focused on the relation between application and configuration. In the following, we examine aspects of system configuration that are mostly independent of the application. We will take a brief look at the physical configuration of fieldbus systems, at how nodes are recognized by the configuration system, and at how the actual application code is downloaded into the fieldbus nodes.
18.7.1 Hardware Configuration

Hardware configuration involves the setup of the plugs and cables of the fieldbus system. Several fieldbus systems implement means to avoid mistakes, such as connecting a power cable to a sensitive input, which would cause permanent damage to the fieldbus system or could even harm people. Moreover, hardware configuration interfaces such as plugs and clamps are often subject to failure in harsh environments, e.g., on a machine that induces a lot of vibration. For hardware configuration, the following approaches can be identified:
• The use of special jacks and cables that support a tight mechanical connection and, by their geometry, prevent mistakes in orientation and polarity. For example, the actuator–sensor interface* (AS-i) specifies a mechanically coded flat cable that allows the connection of slaves at any position on the cable by using piercing connectors. AS-i uses cables with two wires transporting data and energy via the same line. The piercing connectors support simple connection, safe contacting, and protection up to class IP67.
• Baptizing of devices in order to obtain an identifier that allows addressing the newly connected device. This can be done explicitly by assigning an identifier to the device (e.g., by setting DIP switches or entering a number over a local interface) or implicitly by the cabling topology (e.g., devices could be daisy-chained and obtain their names according to their position in the chain). Alternatively, it is possible to assign unique identifiers to nodes in advance. This approach is taken, for example, with Ethernet devices, where the medium access control (MAC) address is a worldwide unique identifier, or in the TTP/A protocol, which also uses unique node IDs. However, such a worldwide unique identifier has many digits, so that it is usually not feasible to have the number printed somewhere on the device. To overcome this problem, machine-readable identifiers in the form of bar codes or radio frequency (RF) tags are used during hardware configuration.
• Simple configuration procedures, which can be carried out and verified by nonexpert personnel.
18.7.2 Plug and Participate

Since the hardware configuration is intended to be simple, a fieldbus system should behave intelligently in order to relieve human personnel of error-prone tasks. During the plug-and-participate stage, the fieldbus system runs an integration task that identifies new nodes, obtains information about these nodes, and changes the network configuration in order to include the new nodes in the communication. Identification of new nodes can be supported with manual baptizing as described in the previous section. Alternatively, it is also possible to automatically search for new nodes and identify them, as described in [11]. If there can be different classes of nodes, it is necessary to obtain information on the type of a newly connected node. This information will usually be available in the form of an electronic data sheet that can be obtained from the node or from an adequate repository. The necessary changes to the network configuration for including the new node depend greatly on the employed communication paradigm. In the case of a polling paradigm, only the list of nodes to be polled has to be extended. In the case of a time-triggered paradigm, the schedule has to be changed and updated in all participating nodes. In the case of an event-triggered paradigm, only the new node has to be authorized to send data; however, it is very difficult to predict how a new sender will affect the timing behavior of an event-triggered system. In all three cases, critical timing might be affected by a change of the response time, i.e., when the cycle time has to be changed. Thus, in time-critical systems, extensibility must be taken into account during system design, e.g., by reserving initially unused bandwidth or including spare communication slots.
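For the polling case, the integration task reduces to extending the master's poll list, provided the electronic data sheet of the new node's type is available. The following sketch uses invented names throughout.

```python
# Sketch of a plug-and-participate integration task for a
# polling-based network: a new node is accepted only if a description
# of its type (its "electronic data sheet") can be obtained.

poll_list = ["node1", "node2"]

def integrate(poll_list, new_node, known_types, node_type):
    """Add a new node to the master's poll list."""
    if node_type not in known_types:
        raise LookupError("no electronic data sheet for " + node_type)
    if new_node not in poll_list:
        poll_list.append(new_node)
    return poll_list

known = {"pressure-sensor": {"update_ms": 10}}  # hypothetical repository
integrate(poll_list, "node3", known, "pressure-sensor")
print(poll_list)   # ['node1', 'node2', 'node3']
```

In a time-triggered system the same step would instead recompute the schedule and download it to every node, which is why spare slots are typically reserved at design time.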
18.7.3 Application Download

Some frequently recurring fieldbus applications, like standard feedback control loops, alert monitoring, and simple control algorithms, can often be put in place like building bricks, since these applications are generically available (e.g., a PID controller). For more complex or unorthodox applications, however, it is necessary to implement user-defined applications. In these cases, code must be downloaded into the target devices.
*http://www.as-interface.net/.
Ten years ago, the most common method to reprogram a device was to use a socketed EPROM memory chip that was taken out of the circuit, erased under UV radiation, programmed using a dedicated development system (i.e., a PC with a hardware programming device), and then put back into the system. Today, most memory devices and microcontrollers provide an interface for in-system serial programming of Flash and EEPROM memory. The hardware interface for in-system serial programming usually consists of a connector with four to six pins that is attached either to an external programming device or directly to the development PC. These programming interfaces are often proprietary to particular processor families, but there also exist standard interfaces that support a larger variety of devices. For example, the Joint Test Action Group (JTAG) debugging interface (IEEE 1149.1) also supports the download of application code. While the in-system serial programming approach is much more convenient than the socketed EPROM method, both approaches are conceptually quite similar, since it is still necessary to establish a separate hardware connection to the target system. The most advanced approach for downloading applications is in-system application download. In this approach, it is possible to program and configure a device without taking it out of the distributed target system and without using extra cables and hardware interfaces. In-system configuration is supported by state-of-the-art Flash devices, which can reprogram themselves in part by using a boot loader program. This approach is supported, for example, by state-of-the-art TTP nodes. A cluster consists of a set of at least four TTP/C nodes and a monitoring node that is connected to the development system. Whenever a new application has to be set up, the monitoring node sends a signal that causes the nodes to go into the so-called download mode.
In this mode, it is possible to download application code via the fieldbus network. During the download phase, the real-time service is inactive. Misconfigurations that lead to a failure of the download function must be corrected by locally connecting a programming tool. Alternatively, application code could be downloaded via the fieldbus into RAM at start-up. In this case, only the boot loader resides in the persistent memory of the device, and the user-defined application code has to be downloaded at each start-up. This approach has the advantage of being stateless, so that errors in the system are removed at the next start-up. Thus, engineers can handle many faults by simply restarting the system. On the other hand, this approach depends on the configuration instance at start-up: the system cannot be started if the configuration instance is down. Moreover, the restart time of the system may be considerably longer.
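The trade-off between the two variants, persistent Flash-resident code versus stateless download into RAM, can be sketched as a start-up decision. States and names are illustrative, not any protocol's actual boot logic.

```python
# Sketch of the two application-download variants discussed above:
# a Flash-resident application starts on its own; the stateless RAM
# variant needs the configuration instance at every start-up.

def boot(persistent_app, config_service_up, downloader):
    """Return the application image the node runs after start-up."""
    if persistent_app is not None:
        return persistent_app        # Flash-resident application
    if not config_service_up:
        raise RuntimeError("cannot start: configuration instance down")
    return downloader()              # stateless: fetch fresh image into RAM

app = boot(None, True, lambda: "app-v2")
print(app)   # app-v2
```

The `RuntimeError` branch captures the downside named in the text: the stateless system cannot start when the configuration instance is unavailable.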
18.8 Management Interfaces

The ability to perform remote management operations on distributed fieldbus devices is one of the most important advantages of fieldbus systems. Wollschläger [38, p. 89] states that “in automation systems, engineering functions for administration and optimization of devices are gaining importance in comparison with control functions.” Typical management operations are monitoring, diagnosis, and node calibration. Unlike the primary fieldbus applications, which often require cyclical, multidrop communication, these management operations usually use a one-to-one (client–server) communication style. For this reason, most fieldbus systems support both communication styles. A central question is whether and how this management traffic influences the primary application, the so-called probe effect [15]. System management operations that influence the timing behavior of the network communication are especially critical for typical fieldbus applications (e.g., process control loops) that require exact real-time behavior. The probe effect can be avoided by reserving a fixed amount of bandwidth for management operations. For example, in the Foundation Fieldbus and WorldFIP protocols, the application cycle (macrocycle) is chosen to be longer than strictly required by the application, and the remaining bandwidth is free for management traffic.
In order to avoid collisions within this management traffic window, adequate mechanisms for avoiding or resolving such conflicts must be used (e.g., token passing between nodes that want to transmit management information, priority-based arbitration). In TTP/A, the management communication is implemented by interleaving real-time data broadcasts (implemented by multipartner rounds) with so-called master–slave rounds that open a communication channel to individual devices. If management traffic is directly mingled with application data, such as in CAN, LonWorks, or Profibus PA, care must be taken that this management traffic does not influence the primary control application. This is typically achieved by analyzing network traffic and leaving enough bandwidth headroom. For complex systems and safety-critical systems that require certain guarantees on system behavior, this analysis can become very difficult.
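The macrocycle-headroom idea above amounts to simple arithmetic: choose the cycle longer than the cyclic application traffic strictly requires, and the slack becomes the management window. All numbers below are illustrative.

```python
# Back-of-the-envelope sketch of reserving bandwidth for management
# traffic by over-sizing the macrocycle.

macrocycle_ms = 100.0   # chosen application cycle
app_messages = 20       # cyclic application messages per macrocycle
msg_time_ms = 2.5       # transmission time per message

app_time_ms = app_messages * msg_time_ms
management_window_ms = macrocycle_ms - app_time_ms

print(app_time_ms)            # 50.0 ms of cyclic application traffic
print(management_window_ms)   # 50.0 ms per cycle free for management
```

Because the application slots are fixed, management traffic confined to this window cannot perturb the control loops, which avoids the probe effect by construction.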
18.8.1 Monitoring and Diagnosis

In order to perform passive monitoring of the communication of the application, it is usually sufficient to trace the messages transmitted on the bus. However, the monitoring device must have knowledge of the communication scheme used in the network in order to be able to understand and decode the data traffic. If this scheme is controlled by physical time, as is the case in time-triggered networks, the monitoring node must also synchronize itself to the network. Advanced field devices often have built-in self-diagnostic capabilities and can disclose their status to the management system. How such information reaches the management framework depends on the capabilities of the fieldbus system. Typically, a diagnosis tool or the diagnosis part of the management framework will regularly check the information in the nodes. This method is called status polling. In some fieldbus protocols (e.g., FF), devices can also transmit status messages by themselves (alert reporting). In general, the restrictions from the implementation of the management interface of a fieldbus protocol also apply to monitoring, since in most fieldbus systems the monitoring traffic is transmitted using the management interface. For systems that do not provide this separation of management from application information at the protocol level, other means must be taken to ensure that monitoring does not interfere with the fieldbus application. Since status polling is usually performed periodically, it should be straightforward to reserve adequate communication resources during system design, so that the control application is not disturbed. In the case of alert reporting, the central problem without adequate arbitration and scheduling mechanisms is how to avoid overloading the network in the case of “alarm showers,” where many devices want to send their messages at once.
It can be very difficult to give timeliness guarantees (e.g., on the time between when an alarm occurs and when it is received by the respective target) in such cases. The typical approach to this problem (e.g., as taken in CAN) is to provide ample bandwidth headroom. For in-depth diagnosis of devices, it is sometimes also desirable to monitor the operation and internals of individual field devices. This temporarily involves greater data traffic that cannot easily be reserved a priori. Therefore, the management interface must provide some flexibility regarding the diagnosis data in order to dynamically adjust to the proper level of detail using some kind of pan-and-zoom approach [2].
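The two reporting styles can be contrasted in a minimal sketch: periodic status polling, whose fixed cost can be reserved at design time, and alert reporting protected against alarm showers by a simple per-cycle alert budget. The polling structure and the budget value are illustrative assumptions, not taken from any fieldbus standard:

```python
# Sketch: status polling vs. alert reporting with an alarm-shower guard.
# The per-cycle alert budget is an illustrative assumption.

from collections import deque

class Device:
    def __init__(self, name, status="ok"):
        self.name, self.status = name, status

def poll_cycle(devices):
    """Status polling: bandwidth cost is fixed (one query per device),
    so it can be reserved during system design."""
    return {d.name: d.status for d in devices}

def drain_alerts(pending, budget=2):
    """Alert reporting: at most `budget` alerts are sent per cycle;
    the rest stay queued, so a shower cannot overload the network."""
    return [pending.popleft() for _ in range(min(budget, len(pending)))]

devices = [Device("valve", "ok"), Device("pump", "overtemp")]
print(poll_cycle(devices))

alarms = deque(["pump:overtemp", "valve:stuck", "motor:vibration"])
print(drain_alerts(alarms))   # two alerts sent this cycle
print(list(alarms))           # one alert deferred to the next cycle
```

Deferring excess alerts bounds the network load but lengthens the worst-case alarm latency, which is exactly the timeliness trade-off discussed above.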
18.8.2 Calibration

The calibration of transducers is an important management function in many fieldbus applications. There is some ambiguity concerning the use of this term. Berge [4] strictly distinguishes between calibration and range setting: "Calibration is the correction of sensor reading and physical outputs so they match a standard" [p. 363]. According to this definition, calibration cannot be performed remotely, since the device must be connected to a standardized reference input. Range setting, in contrast, shifts the value range of the device so that the resulting value delivers a correctly scaled percentage. It does not require applying an input and measuring an output; thus,
© 2005 by CRC Press
Configuration and Management of Fieldbus Systems
18-17
it can be performed remotely. In the HART bus, confusingly, this range-setting operation is called calibration, whereas calibration in the above sense is called trim. Fieldbus technology does not change the way calibration is performed, although information that is required for calibration is stored as part of the properties that describe a device. Such information could be, e.g., the minimum calibration span limit, i.e., the minimum distance between two calibration points within the supported operation range of a device. Additionally, calibration-related information, such as the individual calibration history, can be stored in the devices themselves. This information is then remotely available to management tools for checking the calibration status of devices. Together with the self-diagnosis capabilities of the field devices, this allows a focused and proactive management strategy.
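Range setting in the above sense amounts to a linear rescaling of the raw reading into a percentage of the configured range, which is why it needs no reference input and can be done over the bus. A minimal sketch, with illustrative names and values:

```python
# Sketch of range setting: map a raw sensor reading onto a 0-100 %
# scale given the configured lower and upper range values.
# No reference input is needed, so this can be done remotely;
# calibration (trim) against a physical standard cannot.

def range_percent(raw, lower_range, upper_range):
    """Return the reading as a percentage of the configured range."""
    span = upper_range - lower_range
    return 100.0 * (raw - lower_range) / span

# A hypothetical temperature transmitter ranged 0..200 degrees C:
print(range_percent(50.0, 0.0, 200.0))   # 25.0
```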
18.9 Maintenance in Fieldbus Systems

Fieldbus maintenance is the activity of keeping the system in good working order. The extensive management functions provided by fieldbus systems, such as diagnosis and monitoring, greatly help in maintaining systems. There are several maintenance schemes that influence the way these steps are executed in detail. The choice of a particular maintenance scheme is usually motivated by the application requirements [4]:
• Reactive maintenance is a scheme in which a device is only fixed after it has been found to be broken. This should be avoided in environments where downtimes are costly (such as in factory applications), so designers of such applications will usually choose more active maintenance strategies. Nonetheless, fieldbus systems also provide advantages for this scheme, since they support the fast detection of faulty devices.
• Preventive maintenance is a scheme in which devices are serviced at regular intervals even if they are working correctly. This strategy prevents unexpected downtime, thus improving availability. Due to the associated costs, this approach will only be taken in safety-related applications, such as aviation or train control, or where unexpected downtimes would lead to very high costs.
• Predictive maintenance is similar to preventive maintenance, differing in a dynamic service interval that is optimized by using long-time statistics on devices.
• Proactive maintenance focuses on devices that are expected to require maintenance.
Basically, maintenance involves the following steps:
• Recognizing a defective device
• Repairing (replacing) the defective device
• Reintegrating the serviced device
In fieldbus systems, faulty devices will usually be recognized via the network. This is achieved by monitoring the fieldbus nodes and the application, or with devices that are capable of sending alerts (refer to Section 18.8).
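Predictive maintenance, for instance, derives the service interval from accumulated device statistics rather than using a fixed interval. A minimal sketch under the simplifying assumption that the nominal interval is scaled by the observed fault rate; the scaling rule is illustrative, not taken from [4]:

```python
# Sketch of a predictive service interval: shorten the nominal
# preventive interval for devices whose long-time statistics show
# more faults. The scaling rule is an illustrative assumption.

def predictive_interval(nominal_days, faults, operating_years):
    """Scale the nominal service interval by the observed fault rate."""
    faults_per_year = faults / operating_years
    # One fault per year halves the interval, two cut it to a third, etc.
    return nominal_days / (1.0 + faults_per_year)

print(predictive_interval(180, faults=0, operating_years=3))  # 180.0
print(predictive_interval(180, faults=3, operating_years=3))  # 90.0
```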
After the source of a problem has been found, the responsible node must be serviced. This often requires disconnecting the node from the network. Thus, we require strategies for how the system should deal with disconnecting a node, as well as with reconnecting and reintegrating the replacement node. If the whole system must be powered down for maintenance, a faulty node can simply be replaced, and the integration of the new node occurs as part of the normal initial start-up process. If powering down the whole system is undesirable or even impossible (in the sense of leading to severe consequences, as in the case of safety-critical applications), this process becomes more complicated. In this case, we have several options:
• Implementation of redundancy: This approach must be taken for safety- or mission-critical devices, where operation must be continued after a device becomes defective or during its replacement. A detailed presentation of redundancy and fault-tolerant systems can be found in [30].
The Industrial Communication Technology Handbook
• Shutdown of part of the application: In the case of factory communication systems, which are often organized as multilevel networks or use modular approaches, it might be feasible to shut down a local subnetwork (e.g., a local control loop or a process cell as defined in the ANSI/ISA-88.01-1995 standard).
The replacement node must be configured with individual node data, such as calibration data (these data usually differ between the replaced and the replacement node), and with the state of the node. The state information can include:
• Information that is accumulated at runtime (the history state of a system). This information must be transferred from the replaced to the replacement node.
• Timing information, so that the node can synchronize with the network. For example, in networks that use a distributed static schedule (e.g., TTP/A), each node must be configured with its part of the global schedule in order to obtain a network-wide consistent communication configuration.
One alternative for avoiding the transfer of system state is to design a stateless system in the first place. Bauer [3] proposes a generic approach for creating stateless systems from systems with state. Another possibility is to provide well-defined reintegration points where this state is minimized. Since fieldbus applications typically use a cyclical communication style, the start of a cycle is a natural reintegration point.
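The cycle-start reintegration point can be sketched as follows: a replacement node first receives its schedule slot and calibration data while remaining passive, then joins at the start of the next communication cycle, where the state it must inherit is minimal. The class and field names are illustrative assumptions:

```python
# Sketch of reintegration at a cycle boundary: the new node is
# configured while passive and only joins at the next cycle start.
# Class and field names are illustrative.

class ReplacementNode:
    def __init__(self, slot, calibration):
        self.slot = slot                  # its part of the global schedule
        self.calibration = calibration    # individual node data
        self.integrated = False

    def observe(self, event):
        """Stay passive until a cycle-start event is observed."""
        if event == "cycle_start":
            self.integrated = True
        return self.integrated

node = ReplacementNode(slot=4, calibration={"offset": 0.12})
for event in ["slot_2_frame", "slot_3_frame", "cycle_start"]:
    node.observe(event)
print(node.integrated)   # True: node may now transmit in its slot
```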
18.10 Conclusion

Configuration and management play an important role in fieldbus systems. The configuration phase can be subdivided into a part that requires local interaction, such as connection of hardware and setting DIP switches, and a part that can be done remotely via the fieldbus system. An intelligent design keeps the local part as simple as possible, so that nonexpert personnel can carry it out, and supports both parts with an adequate architecture and tools that assist the system integrator in tedious and error-prone tasks such as adjusting parameters according to the data sheet of a device. Examples of such architectures are, among others, IEEE 1451 and the OMG Smart Transducer Standard, which both provide machine-readable electronic data sheets. Management encompasses functions like monitoring, diagnosis, calibration, and support for maintenance. In contrast to the configuration phase, most management functions are used concurrently with the real-time service during operation. Some management functions, such as monitoring, may even require real-time behavior themselves. In order to avoid a probe effect on the real-time service, the scheduling of a fieldbus system must be designed to integrate management traffic with real-time traffic.
References

[1] Borst Automation. Device description language. The HART Book, 9, May 1999. Available at http://www.thehartbook.com/.
[2] L. Bartram, A. Ho, J. Dill, and F. Henigman. The continuous zoom: a constrained fisheye technique for viewing and navigating large information spaces. In ACM Symposium on User Interface Software and Technology, 1995, pp. 207–215.
[3] G. Bauer. Transparent Fault Tolerance in a Time-Triggered Architecture. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2001.
[4] J. Berge. Fieldbuses for Process Control: Engineering, Operation, and Maintenance. ISA — The Instrumentation, Systems, and Automation Society, Research Triangle Park, NC, 2002.
[5] R. Bowden. HART: A Technical Overview. Fisher-Rosemount, Chanhassen, MN, 1997.
[6] D. Bühler. The CANopen Markup Language: representing fieldbus data with XML. In Proceedings of the 26th Annual Conference of the IEEE Industrial Electronics Society (IECON 2000), Nagoya, Japan, October 2000.
[7] CAN in Automation e.V. CANopen: Communication Profile for Industrial Systems, 2002. Available at http://www.can-cia.de/downloads/.
[8] S. Cavalieri, S. Monforte, A. Corsaro, and G. Scapellato. Multicycle polling scheduling algorithms for fieldbus networks. Real-Time Systems, 25:157–185, 2003.
[9] S. Eberle. XML-basierte Internetanbindung technischer Prozesse. In Informatik 2000: Neue Horizonte im neuen Jahrhundert. Springer-Verlag, Heidelberg, 2000, pp. 356–371.
[10] W. Elmenreich, G. Bauer, and H. Kopetz. The time-triggered paradigm. In Proceedings of the Workshop on Time-Triggered and Real-Time Communication, Manno, Switzerland, December 2003.
[11] W. Elmenreich, W. Haidinger, P. Peti, and L. Schneider. New node integration for master–slave fieldbus networks. In Proceedings of the 20th IASTED International Conference on Applied Informatics (AI 2002), February 2002, pp. 173–178.
[12] W. Elmenreich, S. Pitzek, and M. Schlager. Modeling distributed embedded applications using an interface file system. Accepted for presentation at the 7th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, 2004.
[13] Fieldbus Technical Overview: Understanding FOUNDATION fieldbus Technology, 2001. Available at http://www.fieldbus.org.
[14] T. Führer, F. Hartwich, R. Hugel, and H. Weiler. FlexRay: The Communication System for Future Control Systems in Vehicles. Paper presented at SAE World Congress 2003, Detroit, MI, March 2003.
[15] J. Gait. A probe effect in concurrent programs. Software: Practice and Experience, 16:225–233, 1986.
[16] R. Heery and M. Patel. Application profiles: mixing and matching metadata schemas. Ariadne, September 25, 2000. Available at http://www.ariadne.ac.uk.
[17] Institute of Electrical and Electronics Engineers, Inc. IEEE 1451.2–1997, Standard for a Smart Transducer Interface for Sensors and Actuators: Transducer to Micro-Processor Communication Protocols and Transducer Electronic Data Sheet (TEDS) Formats, September 1997.
[18] Institute of Electrical and Electronics Engineers, Inc. IEEE 1451.1–1999, Standard for a Smart Transducer Interface for Sensors and Actuators: Network Capable Application Processor (NCAP) Information Model, June 1999.
[19] International Electrotechnical Commission (IEC). Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems: Part 1: Overview and Guidance for the IEC 61158 Series, April 2003.
[20] ANSI/ISA-88.01, Batch Control Part 1: Models and Terminology, December 1995.
[21] W.H. Ko and C.D. Fung. VLSI and intelligent transducers. Sensors and Actuators, 2:239–250, 1982.
[22] H. Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer Academic Publishers, Boston, 1997.
[23] H. Kopetz, M. Holzmann, and W. Elmenreich. A universal Smart Transducer Interface: TTP/A. International Journal of Computer System Science and Engineering, 16:71–77, 2001.
[24] H. Kopetz et al. Specification of the TTP/A Protocol, Version 2.00. Technical report, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002. Available at http://www.ttagroup.org.
[25] D. Loy, D. Dietrich, and H.-J. Schweinzer (Eds.). Open Control Networks. Kluwer Academic Publishers, Boston, 2001.
[26] OMG. Smart Transducers Interface, V1.0. Available specification, document number formal/2003-01-01, Object Management Group, Needham, MA, January 2003. Available at http://doc.omg.org/formal/2003-01-01.
[27] S. Pitzek and W. Elmenreich. Managing fieldbus systems. In Proceedings of the Work-in-Progress Session of the 14th Euromicro International Conference, June 2002.
[28] S. Pitzek and W. Elmenreich. Configuration and management of a real-time smart transducer network. In Proceedings of the 9th IEEE International Conference on Emerging Technologies and Factory Automation, Volume 1, Lisbon, Portugal, September 2003, pp. 407–414.
[29] P. Pleinevaux and J.-D. Decotignie. Time critical communication networks: field buses. IEEE Network, 2:55–63, 1998.
[30] S. Poledna. Fault-Tolerant Real-Time Systems: The Problem of Replica Determinism. Kluwer Academic Publishers, Boston, 1995.
[31] S. Poledna, H. Angelow, M. Glück, M. Pisecky, I. Smaili, G. Stöger, C. Tanzer, and G. Kroiss. TTP Two Level Design Approach: Tool Support for Composable Fault-Tolerant Real-Time Systems. Paper presented at SAE World Congress 2000, Detroit, MI, March 2000.
[32] J. Powell. The "Profile" Concept in Fieldbus Technology. Technical article, Siemens Milltronics Process Instruments Inc., 2003.
[33] A. Ran and J. Xu. Architecting software with interface objects. In Proceedings of the 8th Israeli Conference on Computer-Based Systems and Software Engineering, 1997, pp. 30–37.
[34] Robert Bosch GmbH, Stuttgart. CAN Specification, Version 2.0, 1991.
[35] TTAGroup. Specification of the TTP/C Protocol. TTAGroup, 2003. Available at http://www.ttagroup.org.
[36] M. Venzke. Spezifikation von interoperablen Webservices mit XQuery. Ph.D. thesis, Technische Universität Hamburg-Harburg, Hamburg, Germany, 2003.
[37] M. Wollschläger. A framework for fieldbus management using XML descriptions. In Proceedings of the 2000 IEEE International Workshop on Factory Communication Systems (WFCS 2000), September 2000, pp. 3–10.
[38] M. Wollschläger, C. Diedrich, T. Bangemann, J. Müller, and U. Epple. Integration of fieldbus systems into on-line asset management solutions based on fieldbus profile descriptions. In Proceedings of the 4th IEEE International Workshop on Factory Communication Systems, August 2002, pp. 89–96.
[39] World Wide Web Consortium (W3C). Extensible Markup Language (XML) 1.0, 2nd ed., October 2000. Available at http://www.w3.org.
[40] L.A. Zadeh. The concept of system, aggregate, and state in system theory. In Inter-University Electronics Series, Volume 8. McGraw-Hill, New York, 1969, pp. 3–42.
19 Which Network for Which Application

Jean-Dominique Decotignie
CSEM (Centre Suisse d'Electronique et de Microtechnique)

19.1 Introduction
19.2 Production Hierarchies
19.3 Process Types
    Batch Systems
19.4 Control Systems
    Time-Triggered Systems • Discrete Event Control Systems
19.5 Communication Systems
    Communication Models • Temporal Consistency • Spatial Consistency • Event Ordering • Influence of Failures
19.6 Parameters to Consider in a Choice
19.7 An Overview of Some Solutions
    Actuator Sensor Interface • Controller Area Network • HART • INTERBUS • LON • MIL-STD-1553 • PROFIBUS-FMS • PROFIBUS-DP • SERCOS • WorldFIP • Ethernet-Based Solutions • Solutions from Nonindustrial Markets
19.8 Conclusion
References
19.1 Introduction

Since the advent of industrial communications, dozens of solutions have been designed that all claim to solve the problem posed by this chapter's title. Most of these claims are true, but only within a given context, for there is no single case in industrial communications. Industrial processes are usually organized in a hierarchical manner, and depending on the location in the hierarchy, needs will differ. The way the control software is written also places different demands on the communication networks. To select a communication network, it is thus important to understand the different views the network designers had when they built their solutions. In this chapter, we will first describe the way industrial production is organized. Then the different approaches to architecting control applications will be explained. Introducing a network to support communication between control entities adds a number of constraints, such as errors and delays; as they have an impact on the system, they will be detailed first. The last section will present selected industrial networks and show which types of application they may support and how well they do the required job.
19.2 Production Hierarchies

Control of a production system is performed by numerous computers organized along several hierarchical levels. Computer networks provide communications between the computers within a level and with some of the computers at the adjacent levels.
TABLE 19.1 Production Hierarchy

Level  Control function         Manufacturing            Process control              Network to the level below
6      plant management         plant management         plant management             office LAN (public network above)
5      factory controller       supervisory controller   supervisory controller       industrial LAN
4      cell/line controller     line                     distributed control system   cell network
3      workstation controller   machine                  process controller           backplane/cell network
2      automation module ctrl   axes                     dedicated controller         field bus
1      device controller        axes                     dedicated controller         field bus
0      sensor or actuator       sensor or actuator       sensor or actuator
Before presenting the different levels in more detail, let us note that the closer an application is to the process, the tighter the temporal constraints (ISO, 1994). Conversely, the quantity of transferred data increases with the level in the hierarchy. The position of an application in the production hierarchy also influences the way the application software is built. Applications at the lowest levels often adopt a time-triggered approach (see Section 19.4), while at higher levels an event-driven approach is used in most cases. Numerous factory models, also known as computer-integrated manufacturing (CIM) models, have been described in the literature (for a summary, see Jones, 1989). In terms of hierarchy, these proposals do not differ very much; the differences concern mainly the number and labeling of the levels. Table 19.1 depicts a possible summary of these proposals. The table also shows that a more profound difference exists between process control and manufacturing. The primary purpose of a plant control system, organized in a hierarchical manner as in Table 19.1, is to manage the process as seen through the sensors (level 0) by acting upon it through actuators. Sensors and actuators are linked to the first level of automation (level 1), where each automation device controls a variable of the process in such a way that it stays within limits or follows set-point data given by the next level of automation (level 2). Level 1 corresponds to an axis (control of a single entity) in manufacturing or a local loop in process control. The variable may be discrete, such as a tool-changing mechanism in a machine tool, or continuous, such as an axis in a robot or a heater in a distillation column. Units at level 2 elaborate the set-point data or the limits for variables that have to be linked or related to ensure proper operation.
This is typically the purpose of an interpolator in the computer numerical controller (CNC), or a tool magazine controller for a machine tool, or a robot path controller that gives the set-point data for joints in order to follow a given path. These units receive their commands from and return their status to the machine or process level (level 3). Units at level 3 coordinate the actions on, or the operating conditions of, several elements or groups of elements in order to realize or optimize operations or sequences of operations on objects, either solid or fluid. They receive their commands and report to the line or distributed control system (DCS) level (level 4). Units at levels 1 to 3 also have to perform diagnostics on themselves, detect emergency conditions, and make immediate decisions if they have enough information to do so. Otherwise, they report to the next higher level for decisions. Level 4 units are responsible for optimizing the production, that is, scheduling when and where given operations are performed on objects, and for ensuring that all necessary resources are present.
Level 5 normally corresponds to the elaboration of a single product or a family of products on the same devices. That is where process planning for products is performed. Level 6 represents the top level of a plant. Basic functions performed at this level are product design, resource management, high-level production management with inter-area production scheduling, establishment of plant policies, etc. At the top, we may find a corporate management level (not shown in Table 19.1) that is linked to the different sites or plants via public or corporate networks. At this level all the nonproduction activities (research, finance, etc.) of the enterprise can be linked. It should be noted that some of these levels might be absent in a given enterprise. Depending on the level of automation and the control structure, level 2 and even level 1 might be included in level 3. Also, levels 5 and 6 might be collapsed into a single level. There is today a clear tendency to reduce the number of levels. However, the functions described above are always present, whatever the number of levels.
19.3 Process Types

Processes controlled by computers are usually classified according to two paradigms: continuous systems and discrete event systems (Halang, 1992). An example of a continuous process is a temperature regulator that keeps the temperature in a reactor within preset limits around a given set-point value. Such a system reads the temperature as given by a sensor, compares it with the set-point value, computes the necessary correction using an adequate control algorithm, and accordingly energizes a heating actuator. This sequence of operations is repeated at regular intervals, often periodically. The period is set according to the dynamics of the controlled process. On the other hand, the control of an elevator pertains to the class of discrete systems. Let us describe a part of a typical sequence of operations. While the elevator is approaching its destination floor, the system expects the floor deceleration sensor to turn on. In this case, the control system has to react by decreasing the speed of the elevator. In the next step, it has to stop the elevator when the floor sensor indicates that the floor has been reached. During this sequence, other events may happen, such as summons requests from external passengers. Such requests may either alter the current sequence of operations or just need to be memorized for later treatment. In addition, a given event may not be interesting at all times. In the above example, the deceleration event is only of interest if the elevator has to stop on the corresponding floor. Event detection needs to be disabled sometimes and enabled again later. An essential characteristic of discrete event systems is that while the action triggered by the occurrence of an event is perfectly known, the point in time at which this event will occur is a priori unknown. Furthermore, the order in which two or more events occur is often very important.
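The continuous control sequence described above (read the sensor, compare with the set point, compute a correction, energize the actuator, repeat periodically) can be sketched as follows; the proportional gain, the toy plant model, and all names are illustrative assumptions:

```python
# Sketch of the continuous (periodic) control sequence: sample the
# input, compute the correction, set the actuator, repeat.
# The gain and the toy plant model are illustrative assumptions.

def control_step(read_temp, set_point, heat, gain=0.5):
    """One periodic iteration of a proportional temperature regulator."""
    error = set_point - read_temp()    # sample the input
    heat(max(0.0, gain * error))       # energize the heating actuator
    return error

# Hypothetical process interface for demonstration:
temperature = [18.0]
def read_temp(): return temperature[0]
def heat(power): temperature[0] += 0.1 * power   # toy plant response

for _ in range(3):                     # three control periods
    control_step(read_temp, set_point=20.0, heat=heat)
    # on a real system, the loop would wait here for the next period
print(round(temperature[0], 3))        # temperature creeps toward 20.0
```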
19.3.1 Batch Systems

Many physical systems present both of these aspects; such systems are called batch systems.
19.4 Control Systems

Continuous systems and discrete event systems implemented by cyclic or periodic polling are often called time-triggered systems or sampled data systems. Discrete event systems implemented using interrupts or internal software events are referred to as event-triggered systems (Kopetz, 1991).
19.4.1 Time-Triggered Systems

From the control system viewpoint, continuous systems will be implemented as looping tasks that await the beginning of the period, sample their inputs, compute the new output values, and set the actuators according to the new values. Periodicity is not mandatory but often assumed, as it leads to simpler
algorithms and more stable and secure systems. Most of the algorithms developed with this assumption are very sensitive to variations in the period duration, i.e., jitter in the starting instant. This is especially the case for motor controllers in precision machines. Simultaneous sampling of inputs is also an important stability factor. Let us assume that the controller implements a state-space control algorithm in which the state vector includes the position, speed, and acceleration of a moving part, and that the actual values are obtained by three sensors: a position encoder, a tachometer, and an accelerometer. The control algorithm assumes that the measured values were acquired at the same instant. For the control system, this translates into what is called simultaneous sampling. The acquisitions are indeed not really simultaneous if only one processor is devoted to the three operations. However, their instants must be as close as possible and remain within a limit that may be roughly estimated at two orders of magnitude below the period.
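A check of this near-simultaneity requirement can be sketched directly from the rule of thumb above: the spread of the three acquisition instants must stay within about one hundredth of the period. The timestamps and the 10 ms period are illustrative assumptions:

```python
# Sketch: checking near-simultaneous sampling of the three state
# variables (position, speed, acceleration). Acquisition instants must
# stay within roughly period/100 of each other, as estimated above.
# The 10 ms period and the timestamps are illustrative assumptions.

PERIOD = 0.010                      # 10 ms control period (assumed)
LIMIT = PERIOD / 100                # two orders of magnitude below

def sampling_ok(timestamps, limit=LIMIT):
    """True if all acquisition instants lie within `limit` seconds."""
    return max(timestamps) - min(timestamps) <= limit

# Hypothetical instants for encoder, tachometer, and accelerometer:
print(sampling_ok([0.100000, 0.100020, 0.100055]))  # 55 us spread: True
print(sampling_ok([0.100000, 0.100300, 0.100050]))  # 300 us spread: False
```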
19.4.2 Discrete Event Control Systems

Discrete event systems may be implemented as a set of tasks activated by event occurrences. Let us define an event as a significant change (or a sequence of significant changes) of the state of the process (the so-called occurrences). This change, the event, is detected by some input circuitry that continuously monitors the input and is transformed into an interrupt (detection may also be done in software). The event is handled by a task that undertakes the required actions. The time elapsed between the occurrence of the event and the corresponding reaction, often called the reaction time, is bounded and given in the requirements. Reaction times may depend on the kind of event. Reactions to events are normally handled according to the order of arrival of the events. However, some events may be more important than others: the emergency stop in an elevator is clearly more important than a summons request. This translates into priorities in the event handling. In any case, the order in which events occur is important for the application. The above implementation technique is conventional but not the only one. It is always possible to implement a discrete event system as a continuous one. In such a case, all the inputs are sampled at regular intervals by the control software, which detects the changes and undertakes the necessary reactions. This is the way most programmable logic controllers (PLCs) are implemented. With such an implementation, precedence between events may only be asserted if the events are detected during different polling cycles. If two events are detected during the same cycle or period, they will be considered simultaneous. The cycle duration or period should be selected in such a way that all events can be detected. In the elevator example, if the deceleration switch remains closed for a minimum of 20 ms, the polling period should clearly be lower in order to detect the closing event.
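The PLC-style polled implementation can be sketched as a scan that compares consecutive input snapshots: any change is an event, and two changes seen in the same scan are indistinguishable in time. The input names and scan traces are illustrative assumptions:

```python
# Sketch of PLC-style polled event detection: all inputs are sampled
# each scan cycle and events are derived from state changes. Two events
# detected in the same cycle are considered simultaneous.
# Input names and scan snapshots are illustrative assumptions.

def detect_events(previous, current):
    """Compare two input snapshots; report inputs that changed."""
    return [name for name in current
            if current[name] != previous.get(name)]

scan0 = {"decel_switch": 0, "floor_switch": 0}
scan1 = {"decel_switch": 1, "floor_switch": 0}
scan2 = {"decel_switch": 1, "floor_switch": 1}

print(detect_events(scan0, scan1))   # ['decel_switch']
print(detect_events(scan1, scan2))   # ['floor_switch']
```

Because the two changes fell in different scans, the controller can assert that the deceleration event preceded the floor event; had both appeared in one scan, no ordering could be claimed.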
In summary, continuous control systems exhibit four important characteristics:
• They are cyclic and often periodic; period values are set according to the process dynamics.
• Jitter in the period should be limited to a few percent of the period.
• The instants of input acquisition and output setting are known in advance and dictated by the control system.
• All inputs need to be sampled nearly simultaneously.
Discrete event systems may be implemented as continuous ones. They then exhibit the same characteristics but are not necessarily periodic. However, the cycle time should be kept low enough that all events may be sensed. These systems may also be implemented using interrupts, with the following characteristics:
• Event occurrence instants are not known.
• The reaction time to an event is bounded.
• The order of occurrence of events is important.
• Reaction to some events may have a higher priority.
• Detection of a given event may be temporarily disabled.
• There is a limit on the density of event occurrences that the control system can handle.
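The interrupt-driven characteristics listed above (priorities, arrival order, temporary disabling of detection, bounded handling capacity) can be combined in one minimal sketch; the event names, priorities, and the per-cycle bound are illustrative assumptions:

```python
# Sketch of interrupt-style event handling with prioritized dispatch,
# per-event enable/disable, and a bound on events handled per cycle.
# Event names, priorities, and the bound are illustrative assumptions.

import heapq

PRIORITY = {"emergency_stop": 0, "floor_reached": 1, "summons": 2}

class EventHandler:
    def __init__(self):
        self.queue, self.seq, self.disabled = [], 0, set()

    def raise_event(self, name):
        if name not in self.disabled:            # detection may be masked
            self.seq += 1                        # preserve arrival order
            heapq.heappush(self.queue, (PRIORITY[name], self.seq, name))

    def dispatch(self, max_events=8):
        """Handle events by priority, then arrival order, up to a bound."""
        handled = []
        while self.queue and len(handled) < max_events:
            _, _, name = heapq.heappop(self.queue)
            handled.append(name)
        return handled

h = EventHandler()
h.disabled.add("summons")        # detection temporarily disabled
for e in ["floor_reached", "summons", "emergency_stop"]:
    h.raise_event(e)
print(h.dispatch())              # ['emergency_stop', 'floor_reached']
```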
From the control point of view, a control system is often composed of a time-triggered component and an event-triggered component.
19.5 Communication Systems

The communication system is there to support the interaction between the control applications. On a given computer, different applications may coexist, some being time triggered and some being event triggered. They may communicate with distant applications of the other kind (a time-triggered application may communicate with an event-triggered application and vice versa). The communication network itself may be built according to the event-triggered paradigm, the time-triggered paradigm (Kopetz, 1994), or a combination of both. Matching a time-triggered application with a time-triggered network is obviously more easily performed than doing so with an event-triggered communication system. The latter requires some additional adaptation. Ideally, the communication system should support both views.
19.5.1 Communication Models

Communication models define how the different application processes may cooperate (Thomesse, 1993). The most widely used communication model is the client–server model. In this model, processes interact through requests and responses. The client is the process that requests an action be performed by another process, the server. The server carries out the work and returns a message with the result. Client and server only refer to the roles the processes play during the remote operation; a process that is a client in one operation may become the server in another. This model is hence clearly a point-to-point model. It exhibits a number of drawbacks for control applications. First, time is not taken into account: it is impossible to specify a delay between the request and the response and to verify that this delay will be respected, because the server application is involved in producing the response. Second, if a client wants to make simultaneous requests to several servers, this may only be performed sequentially, one request after the other. Last, if two clients make the same request to a server, this model will treat the requests in sequence, and the answers may differ. For example, two control nodes may request the value of a sensor attached to a server node, and the value returned to the first client may differ from the value returned to the second. The last two problems may be solved by adequate algorithms running on top of the client–server interaction, but this results in heavy implementations with often poor performance. However, for a number of remote operations, in particular during the configuration and setup phases, these problems do not appear, and the client–server model is a good solution. For real-time operations, industrial networks need to offer efficient and simple solutions that solve the problems of the client–server model.
This has led to the producer–consumer model (sometimes called publisher–subscriber model), which is a multipoint model. This model is restricted to the transfer of information values and as such is well suited to time-triggered systems. In the producer–consumer model, each piece of information has a producer and one or more consumers. When the producer has some information ready, it gives it to the network. The network transfers the information and makes it available to the consumers. This has some advantages over the client–server model: • The producer and consumers do not need to be synchronized. The consumer does not need to wait for the response of the producer as in the client–server model. If the information is already transferred, it may use it; otherwise, it considers that no new information is available. • The same information can be transferred at the same time to all consumers. The network may thus be used in a more efficient way. • Two or more consumers will work with the same value at a given time. • Flow control is no longer necessary as new information overwrites the previous information. This assumes that the production of new information outdates the previous one.
© 2005 by CRC Press
The Industrial Communication Technology Handbook
• Synchronization between applications can be implemented as production of a synchronization variable.

However, this model comes at a price. As the consumer has no way to relate the information with an explicit request (as in the client–server model), it must be able to determine the age of the information. Networks implementing the producer–consumer model should therefore be able to tag the information with attributes from which its age can be derived.

The producer–distributor–consumer model is an extension of the producer–consumer model that adds an extra level of separation between the production and the transfer of information. The additional role, the distributor, is in charge of transferring the information from the producer site to the consuming sites. In this way, transfers are no longer triggered by production but are done according to rules defined for the distributor. This offers more flexibility in the scheduling of transfers in the network and results in improved efficiency.
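The overwrite semantics described above can be illustrated with a minimal sketch: a produced value simply replaces the previous one in a single-slot buffer, and consumers read the latest sample without any flow control. The class and attribute names below are illustrative, not taken from any particular fieldbus stack.

```python
import threading
import time

class ProducedVariable:
    """Single-slot buffer: new information overwrites the previous
    information, so no flow control between producer and consumers
    is needed."""
    def __init__(self, validity_s):
        self._lock = threading.Lock()
        self._value = None
        self._produced_at = None
        self.validity_s = validity_s   # how long a sample stays usable

    def produce(self, value):
        with self._lock:
            self._value = value
            self._produced_at = time.monotonic()  # tag with production time

    def consume(self):
        """Return (value, fresh); consumers never wait for the producer."""
        with self._lock:
            if self._produced_at is None:
                return None, False     # no information available yet
            age = time.monotonic() - self._produced_at
            return self._value, age <= self.validity_s

temperature = ProducedVariable(validity_s=0.5)
temperature.produce(21.5)              # producer side
value, fresh = temperature.consume()   # any number of consumers read the same sample
```

Any number of consumers calling `consume()` between two productions see the same value, which is precisely the multipoint property the model provides.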
19.5.2 Temporal Consistency

Temporal consistency has two facets: absolute temporal consistency and relative temporal consistency (Kopetz, 1988). Absolute temporal consistency is related to the age of information. A piece of data is produced at a given time, later transmitted over the communication system, and finally used at some point in time. A piece of data is said to be temporally consistent in the absolute sense as long as the difference between the instant of production and the current instant does not exceed its validity duration. In other words, a piece of data should not be used if it is too old. This behavior is easy to ensure when everything takes place on the same computer; when the data are transported over a network, the production-time information is often lost.

In time-triggered applications, the control application expects that all input data have been acquired at the same time, i.e., that the measurements have been taken within a given time window. This property is called production time coherence in the Field Instrumentation Protocol (FIP) standard, and more generally, we can speak of time consistency each time some action must be done within a time interval (ISO, 1994). It is called temporal (or relative temporal) consistency (Kopetz and Kim, 1990) and is defined as follows. Consider two variables a and b, and let [a, ta, va] and [b, tb, vb] be two observations of a and b, where va and vb are samples of a and b taken at times ta and tb. The samples are said to be temporally coherent from the production point of view if |ta – tb| < R, where R is the temporal consistency threshold.
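Both facets can be expressed as simple predicates. The following sketch uses variable names of our own choosing, not identifiers from the cited standards:

```python
def absolutely_consistent(t_produced, t_now, validity):
    """Absolute temporal consistency: a sample may be used only while
    its age does not exceed its validity duration."""
    return (t_now - t_produced) <= validity

def relatively_consistent(observations, r):
    """Relative temporal consistency: samples are temporally coherent
    from the production point of view if all sampling instants lie
    within a window narrower than the threshold R (for two samples
    this reduces to |ta - tb| < R)."""
    times = [t for _, t in observations]
    return (max(times) - min(times)) < r

# Two observations (value, sampling time), taken 3 ms apart, with R = 5 ms:
obs = [(4.2, 0.100), (7.9, 0.103)]
coherent = relatively_consistent(obs, r=0.005)
```

Note that evaluating either predicate at the consumer requires that the production instants travel with the data, which is exactly the tagging requirement discussed above.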
19.5.3 Spatial Consistency

As defined by Le Lann (1990), a distributed system is “a computing system whose behavior is determined by algorithms explicitly designed to work with multiple loci of control.” In such systems, there are a number of control nodes that do not work independently but cooperate to perform common tasks. Cooperation may be needed to achieve a certain degree of fault tolerance or because the overall control task is too large to be handled on a single node. Time-triggered distributed systems work periodically, and the various loci of control synchronize their operations according to the passage of time. To do so, they cannot rely on local timers and need to share a common sense of time (Verissimo, 1994). This may be achieved through a distributed clock synchronization algorithm (Kopetz and Ochsenreiter, 1987) or, with some restrictions, using temporal events generated by the network (He et al., 1990). In such systems, the various loci of control only exchange data, or state information. Obviously, all data necessary for the computations should be available to all loci of control before the synchronization instants; flow control is done statically. A single unit of data may be needed by several control nodes. As these nodes do not share a common memory, the data unit needs to be replicated in each node. This may be done by a broadcast transfer, but we need to have some guarantee that all replicas are identical at the time they are used. This property is called spatial consistency. It may be obtained by a reliable broadcast algorithm (Hadzilacos and Toueg, 1993) or, as in FIP, if only an indication of spatial consistency is necessary (Decotignie and Prasad, 1994).
Which Network for Which Application
19.5.4 Event Ordering

Event-triggered control systems are often very sensitive to the order in which events occur. Networks do not guarantee that requests for information transfers are handled in the order they are submitted. This means that applications cannot rely on the order in which they receive events from the network to establish the order of occurrence of events in time.
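One common workaround, sketched below, is to tag each event at its source with a sequence number (or a globally synchronized timestamp) and to re-establish the order at the receiver. This is our illustration, not a mechanism any particular network mandates:

```python
import heapq

class EventReorderer:
    """Restore source order at the receiver when the network may
    deliver events out of order."""
    def __init__(self):
        self._pending = []

    def on_receive(self, tag, event):
        # tag = source-assigned sequence number or synchronized timestamp
        heapq.heappush(self._pending, (tag, event))

    def in_order(self):
        """Drain pending events in the order they occurred at the source."""
        out = []
        while self._pending:
            out.append(heapq.heappop(self._pending)[1])
        return out

r = EventReorderer()
r.on_receive(2, "limit_switch_released")   # delivered first by the network...
r.on_receive(1, "limit_switch_pressed")    # ...although it occurred first
ordered = r.in_order()
```

Note that this only works if the tags themselves are trustworthy; for timestamps, that brings back the clock synchronization problem of Section 19.5.3.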
19.5.5 Influence of Failures

Failures cannot be avoided, but their influence must be minimized. In the value domain, possible failures may be undefined values or values that look correct but are incorrect with respect to the related input. In the time domain, crash failures, omission failures, and timing (early or late) failures may occur. Coping with failures is the subject of fault-tolerant computing (Le Lann, 1992) and, as such, outside our scope. However, it is worth discussing the impact of the network on the system with regard to failures. With the introduction of a network, faults may occur in a number of additional entities: links, emission and reception circuits, and software. Links may be cut intermittently or permanently and may be subject to perturbations that corrupt the transmitted information. Emission and reception parts in nodes may stop, respond too early or too late, or transmit when they are not allowed to, or even constantly.

Networks have been designed to resist faults by detecting and correcting errors using three types of redundancy: time redundancy, physical resource redundancy, and information redundancy. For example, redundant information is added to each message transferred on the network in the form of an error detection code, parity, or cyclic redundancy check (CRC). Each code exhibits a given detection capacity, which means that some errors may not be detected at the network level. Crash, omission, and timing failures are normally detected through the use of timers. In time-triggered systems, this detection is easy because each node has to transmit periodically: the absence of a message within the period indicates a possible failure, and appropriate countermeasures may be taken in the application. Furthermore, error correction may be based on temporal redundancy, by keeping the previous value and simply waiting for a new value at the next period. Event-triggered systems are more difficult to handle.
Mutilated frames may be detected as described above and signaled to the sender by a negative acknowledgment from the receiver. The sender waits for the acknowledgment, either positive or negative. Omission failures are detected by the absence of an acknowledgment within a given delay. In case of a negative or absent acknowledgment, the emitter retransmits the same message. This process is repeated until success or until a maximum number of retries has been reached. This means that, to cope with possible faults, acknowledgments are necessary. Furthermore, the time necessary to transfer a message from a sender to a receiver may vary greatly, with a corresponding degradation of the application response time. It also introduces a delay in the transmission of other messages.

A second potential problem in event-triggered systems is the difficulty of distinguishing a node crash or an omission failure from an absence of event occurrence. A receiver node, for instance, an actuator node, that does not receive any message may assume either that the control node has failed or that there is no new command. In the first case, the actuator should be put in some safe mode, while in the latter no special treatment is needed. This means that liveness messages should be sent at regular intervals in addition to normal messages.

Some failures may prevent the network from functioning if the network has not been designed properly. For instance, if all traffic is ruled by a single node, as in centralized medium access control, a crash or an omission failure of this node causes the network to stop unless this node is duplicated and some recovery protocol is implemented. Some nodes may also start transmitting messages constantly or at a point in time when they are not allowed to do so. This may be the case with some deterministic carrier-sense multiple-access (CSMA) protocols where collision resolution is based on priority.
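The acknowledgment-and-retry scheme described above can be sketched as follows; the callback signatures are our own abstraction of the emission and reception circuits, not an API of any real stack:

```python
def send_with_retries(transmit, wait_for_ack, max_retries, timeout):
    """Retransmit on a negative or missing acknowledgment, up to
    max_retries times; return True on success, False on give-up."""
    for _ in range(max_retries + 1):
        transmit()
        # 'ACK' = positive, 'NACK' = mutilated frame, None = omission (timeout)
        ack = wait_for_ack(timeout)
        if ack == 'ACK':
            return True
    return False

# A channel that loses the first two frames:
attempts = {"count": 0}
def transmit():
    attempts["count"] += 1
def wait_for_ack(timeout):
    return 'ACK' if attempts["count"] >= 3 else None

delivered = send_with_retries(transmit, wait_for_ack, max_retries=5, timeout=0.01)
```

The variable number of loop iterations is exactly the source of the variable transfer time, and hence the response time degradation, mentioned above.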
19.6 Parameters to Consider in a Choice

When selecting a network for a given application, there are a number of parameters to consider:
• Communication models: As described above, two main models are used in building an application layer: the client–server model and the producer–distributor–consumer model (sometimes also called the publisher–subscriber model). Distributed applications are likely to use the latter, while hierarchically organized applications may use the former.
• Traffic types: Information transfers may be sporadic (event triggered), cyclic, or periodic. In the last case, the application may impose strict limits on the jitter between consecutive transmissions of the same data.
• Topology and transmission medium: While most solutions permit a tree-like topology, there may be restrictions on branch lengths and the number of nodes in the tree. Some solutions also offer transmission media other than copper-based twisted pairs, such as optical fiber or radio transmission.
• Immunity to noise and environment: Networks may be used in harsh environments, as is often the case in transportation and petrochemical applications. The physical layer may be more or less resistant to such environments; selecting an inadequate solution may lead to an increased transmission error rate or, in some cases, a complete loss of communication.
• Errors: As described above (Section 19.5), transmission errors cannot be avoided, even if, with proper cabling and an adequate physical layer, their level can be kept very low. In many applications, temporary errors can be tolerated even if the application is not informed of their occurrence. In some cases, however, this is not acceptable, and the network should provide ways to inform the application that an error has occurred.
• Throughput: The raw bit rate of a given network is a poor measure of the actual throughput as seen from the application. Overheads in the protocols, response delays in the network stack, medium access control schemes, traffic scheduling algorithms, and delays in the application reduce the actual throughput.
As an example, transferring a 16-bit value requires the actual transfer of around 1000 bits with Ethernet, 450 bits with PROFIBUS-FMS, 200 bits with the Controller Area Network (CAN), and 90 bits with FIP. Due to the other effects, the actual throughput may be one or two orders of magnitude below the raw bit rate.
• Guarantees: A lot of research work has been devoted to calculating the guarantees offered by industrial networks, as this is of prime importance to the applications. Guarantees give answers to questions such as:
• Will the network be able to withstand a given traffic load?
• What will be the maximum transfer time of a sporadic event?
• What happens in case of network overload?
• What will be the maximum jitter of a periodic transmission?
While most solutions offer some guarantees, few give answers to all of the questions listed above. Furthermore, guarantees are usually given under the assumption that no transmission error will occur, which cannot be taken for granted.
• Consistency: As mentioned above, temporal consistency is assumed in most control applications. Unfortunately, most solutions do not offer any support for temporal consistency, let alone spatial consistency. This implies that this aspect must be added on top of the selected network.
• Horizontal vs. vertical traffic: As described above, a network may be used to support traffic between computers of adjacent levels (vertical traffic). It may also be used to ensure communication between control devices of the same level to guarantee coordination. Traffic patterns and communication relationships are often different in each case, and the network solution should be able to sustain both.
• Services: The application layer of a network provides a number of services to the applications. Services that are not available will need to be implemented in the application.
Examples of such services are:
• Read and write typed variables (or objects) with possible indication of freshness
• Read sets of data with possible indication of temporal consistency
• Download of data and programs
• Remote invocation (start, stop, pause, resume) of programs
• Synchronization between applications
• Receive, send, or subscribe to events with possible indication of time of occurrence
• Configuration ease: For most networks, vendors offer tools that allow configuration of a network. This may include setting addresses, initial values for the timers used in the network, periods, priorities, etc. Tools are also available to monitor the network. The availability of such tools may greatly reduce commissioning and management efforts.
• Connection with the Internet: With the widespread use of Web browsers, it is tempting to put a Web server inside each industrial device. Such a server is used to access non-real-time information such as configuration data, the user manual, etc. To implement this capability, the transport protocol should be able to carry Hypertext Transfer Protocol (HTTP) messages; however, the Transmission Control Protocol (TCP) is not mandatory.
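The throughput figures quoted under the Throughput item above translate into protocol efficiencies as follows. This is a rough calculation using only the frame sizes given in the text:

```python
# Bits actually transmitted to carry one 16-bit value, per the text:
frame_bits = {"Ethernet": 1000, "PROFIBUS-FMS": 450, "CAN": 200, "FIP": 90}
payload_bits = 16

for network, bits in frame_bits.items():
    efficiency = payload_bits / bits
    print(f"{network}: {efficiency:.1%} of the raw bit rate carries user data")
```

Ethernet thus carries roughly 1.6% user data per frame and FIP roughly 17.8%; scheduling and stack delays then push the usable throughput even lower, consistent with the one-to-two-orders-of-magnitude figure given above.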
19.7 An Overview of Some Solutions

Dozens of networks are used in the industrial domain. Here, we briefly present some of the best-known solutions. The objective is to highlight the features that make each of them fit one type of application rather than another.
19.7.1 Actuator Sensor Interface

Actuator Sensor interface (ASi) (CENELEC EN 50295, IEC 62026-2) is a communication bus targeted at simple remote inputs and outputs for a single industrial computer. It is based on a low-cost electromechanical multidrop connection system designed to operate over a two-wire cable, over a distance of up to 100 m, or more if repeaters are used. Data and power are transmitted on the same two-wire cable. It is especially suitable for the lower levels of plant automation, where simple, often binary, field devices such as switches need to interoperate in a stand-alone local area automation network controlled by a PLC or PC. The master polls the network by issuing commands and receiving and processing replies from the slaves. Connected slaves are polled cyclically by the master at a data rate of 166.67 kbit/s, which gives a maximum latency of 5148 µs on a fully loaded network (31 slaves). Sixty-two slaves can be connected in the extended addressing mode. Each slave may receive or transmit four data bits. There is provision for automatic slave detection. ASi does not define any application layer but provides management functions to configure or reset slaves and detect new slaves.
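The 5148 µs cycle figure can be reconstructed from the bit rate under assumptions we make explicit here: roughly 26 bit times per master-call/slave-reply transaction and two extra slots per cycle for management traffic. These slot details are an approximation for illustration, not quoted from the ASi specification:

```python
# Back-of-the-envelope check of the worst-case ASi cycle quoted above.
bit_time_us = 1e6 / 166_670        # one bit time at 166.67 kbit/s, about 6 µs
slot_us = 26 * bit_time_us         # one poll transaction, about 156 µs
cycle_us = (31 + 2) * slot_us      # 31 slaves plus 2 assumed management slots
# cycle_us comes out at approximately 5148 µs, i.e. about 5.1 ms per cycle
```

The extended addressing mode alternates between the A and B slaves, so with 62 slaves the effective cycle roughly doubles.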
19.7.2 Controller Area Network

The Controller Area Network (CAN) results from an effort by Robert Bosch to provide a serial multiplexed communication system for the transmission of information inside a vehicle. CAN layers 1 and 2 are standardized by the International Organization for Standardization (ISO) (1993). Strictly speaking, there are a few application layers for CAN, among which CANopen is the best known. CAN is a bus without repeaters, which restricts the possible topologies. Up to 30 stations can be connected. Every message to be transmitted is uniquely identified, and there is no means to directly address a given station unless a unique identifier is attached to that station. Every station may access the bus when no other station is transmitting. If two stations start transmitting at the same time, arbitration takes place and the message with the lowest identifier wins. This means that the transmission of a message with a high identifier value may be deferred for a long period, and time constraints can only be fulfilled statistically unless an additional protocol (time slots or a central access controller) is added. It should be noted that, by using an open-collector connection to the cable, CAN implements an immediate and collective acknowledgment of any transmitted frame. The protocol is hence efficient, and spatial consistency may be easily implemented.
At the application layer, CANopen offers adequate services for sporadic (event) transfers using the producer–consumer model with minimal support for consistency. CAN is not adequate for periodic transfers; in response, time-triggered CAN (TTCAN) has been developed. Through its deterministic collision resolution mechanism, CAN offers ways to determine the guarantees offered by the network. Its behavior in the presence of overload is known: low-priority messages can no longer access the network. This implies a careful configuration of the network. DeviceNet (IEC 62026-3, 2000b) and Smart Distributed System (SDS) (IEC 62026-5, 2000c) are partly based on CAN.
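The bitwise arbitration that lets the lowest identifier win can be sketched as an idealized model of the wired-AND bus, where a dominant 0 overrides a recessive 1:

```python
def can_arbitration(competing_ids):
    """Return the identifier that wins CSMA/CR arbitration when all
    stations start transmitting simultaneously (11-bit identifiers,
    sent most significant bit first)."""
    contenders = set(competing_ids)
    for bit in range(10, -1, -1):
        # Open-collector wiring: the bus carries the AND of all bits,
        # so a single dominant 0 masks every recessive 1.
        bus_level = min((ident >> bit) & 1 for ident in contenders)
        # Stations that read back a level they did not send back off:
        contenders = {i for i in contenders if (i >> bit) & 1 == bus_level}
    (winner,) = contenders
    return winner

winner = can_arbitration({0x65, 0x12, 0x7FF})   # 0x12 has the lowest identifier
```

The winning frame is transmitted undisturbed, which is why arbitration is nondestructive; the flip side, as noted above, is that a frame with a high identifier can be deferred indefinitely under load.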
19.7.3 HART

HART (Rosemount, 1991) was designed by Rosemount to interconnect transmitters in process control applications. HART may run over existing 4- to 20-mA lines in a point-to-point mode or over twisted pairs in a bus configuration. In the first case, process values are usually transmitted by analog means and HART is used for configuration and tests. In the multipoint (bus) mode, up to 15 devices may be remotely powered from the master station while fulfilling intrinsic safety requirements. If they are powered locally, a much larger number of devices may be connected. All traffic either comes from or goes to the master station. An additional master station (handheld terminal) may be connected to the bus, mainly for management purposes. HART defines an application layer suited to process control applications and is well established in the field. Outside this scope, HART is restricted to very slow master–slave applications.
19.7.4 INTERBUS

INTERBUS was developed by Phoenix Contact to implement distributed inputs and outputs on a PLC. It complies with the collapsed three-layer Open Systems Interconnection (OSI) model and has been standardized by CENELEC (EN 50254, 1998a). INTERBUS uses a ring topology with a master station (PLC) and up to 256 slave stations. Twisted pairs are used, but fiber-optic cable may easily be used instead. Topologies are slightly restricted by the ring cabling. In INTERBUS, most of the traffic is cyclic or periodic, with a single period for all traffic. The traffic flows from the master to the slaves and vice versa. As such, INTERBUS is targeted at remote inputs and outputs for time-triggered applications. Simultaneous sampling is available, and network guarantees can be easily determined.
19.7.5 LON

LON was designed by Echelon, especially with building control in mind. It complies with the full seven-layer OSI model and provides a variety of transmission media, including wireless at limited speed (5 kbit/s). It has been standardized by the Electronic Industries Alliance (EIA-709.1, 1998). On twisted pairs, LON uses the RS-485 standard. Repeaters are permitted and a rather general topology may be implemented. The medium access control is decentralized and contention based (predictive CSMA); time constraints can hence only be fulfilled statistically. LON has a network layer, and routers are available to interconnect up to 255 subnetworks. The application layer provides services for sporadic variable and message exchanges without temporal relationship (no support for time stamping). Variables and messages may have different priorities and may be authenticated; the authentication key is assigned to the node. LON has no support for cyclic transfers and cannot ensure periodicity. Variables may be typed using one of the predefined types or a C declaration. In the former case, a remote station will be able to read the variable type. Each node may declare a variable as an input from or an output to the network. In the latter case, when the value of the variable is updated, the new value may be automatically sent to all nodes that have declared the variable as an input. Such transmissions seem to
take place in a series of point-to-point transfers and not in a broadcast mode as in WorldFIP. A second difference is that the same variable may be updated (declared as output) by several nodes. Messages may be sent with or without acknowledgment in the point-to-point or multipoint mode. The content of a message is user defined. Messages provide the means to extend the LON functionality. Loss of interoperability is the price to pay.
19.7.6 MIL-STD-1553

MIL-STD-1553 was developed for the U.S. Air Force and the U.S. Navy (Haverty, 1986). It may be considered the precursor of fieldbuses, and most of its concepts have been reused in other proposals. Information is communicated over a bus in Manchester II biphase encoding at a rate of 1 Mbit/s. A maximum of 31 remote terminals (RTs) can be connected to the bus, and each RT can have up to 30 subaddresses. RTs are coupled to the bus controller (BC) either directly or via transformers, over a screened twisted pair. The bus controller initiates all message transfers: it issues a command frame to a given RT to transmit or receive data, and the RT responds by sending a data frame if necessary and terminates the transfer with a status frame. The bus controller may simultaneously issue a command to one RT to receive data and a command to another RT to transmit data; this mechanism allows cross-communication between RTs. Command, data, and status frames consist of 16 data bits, 1 parity bit, and 3 bits of Manchester code violation for synchronization at the beginning of the frame. Very strong requirements are set for cables, coupling transformers, and isolation resistors to ensure very good noise immunity. MIL-STD-1553 does not define any application layer. There have been some attempts to use it in industry, but due to the high interface cost, its success has been very limited.
19.7.7 PROFIBUS-FMS

PROFIBUS is a European standard (EN 50170, 1996b) that adopts a three-layer architecture and includes a full network management part. PROFIBUS has strong analogies with MiniMAP and may be considered a rather cheap implementation of this network. It distinguishes between master stations, which may transmit on their own, and slave stations, which may only respond to inquiries from master stations. This allows simple devices to be implemented economically. A rather general topology is possible, with up to 127 devices connected. Intrinsically safe devices are available from some vendors, and medium redundancy is supported. PROFIBUS relies on a simplified token-passing mechanism that does not offer any guaranteed transfer time. The elaboration of spatial and temporal consistency indications, as well as of the age of data, is not standardized and is left to the users. The application layer may be considered a subset of the Manufacturing Message Specification (MMS) (the Manufacturing Automation Protocol (MAP) application layer) (CCE-CNMA, 1995b), even if the syntax is somewhat different. Room is left for profiles that are more or less equivalent to companion standards in MMS. In addition, data access protection is defined and applies to all objects (variables, programs, domains) manipulated by the application services. PROFIBUS-FMS introduces the idea of cyclic exchanges of data; however, this has very little impact on the application services. In fact, cyclic data exchanges were added to handle data transfers from slave stations, not for cyclic user data transfers. As a cell network, PROFIBUS fulfills most of the requirements. One may regret the absence of application services for event management and application synchronization (semaphores).
19.7.8 PROFIBUS-DP

PROFIBUS-DP is also covered by European standard EN 50170. It shares its physical and data link layers with PROFIBUS-FMS. DP is targeted at remote inputs and outputs and normally has one master station and a number of slave stations.
Traffic is essentially cyclic from the master to the slaves and vice versa. In the absence of other master stations, some real-time guarantees can be calculated.
19.7.9 SERCOS

SERCOS results from a common effort of the German machine-tool (VDW) and drive manufacturers' (ZVEI) associations to design a communication means between computerized numerical controllers (CNCs) and motor drives. It has been standardized by IEC (61491, 2002) and CENELEC (1998b). SERCOS is a ring in which the link between nodes is an optical fiber, providing high noise immunity. There is a central station in the CNC, and all traffic either comes from or goes to this master station. Medium access, however, is not based on per-transfer polling by the master, which would have made the network much less efficient; rather, every slave (drive) has a time slot in which it may transmit information. The selection of the slot is done at start-up and takes into account the response time of each device (drive). Each transmitted frame may contain real-time data as well as messages for configuration. As a maximum of 2 bytes of message data may be sent per frame, messages must be segmented into a large number of segments and reassembled at reception. Even if SERCOS is not really structured according to the OSI model, a large number of application layer services are defined; these services mainly apply to drives. SERCOS provides simultaneous sampling of the inputs and handles periodic traffic with a minimum period of 62.5 µs. Real-time traffic is guaranteed. Up to 254 drives can be connected.
19.7.10 WorldFIP

The Field Instrumentation Protocol (FIP) is a European standard (CENELEC, 1996a) that complies with the ISO collapsed three-layer model. Its physical layer conforms to the international fieldbus standard (IEC 61158-2) and provides IEC level 3 EMC noise immunity. It includes a full network management part. FIP assumes that most of the traffic is cyclic or periodic; in this case, each transfer goes from one producer to a number of consumers. For real-time data, the producer–consumer model is used. Different periods or cycle durations may coexist on the same network. The deterministic nature of the traffic justifies a central medium access controller, the distributor or bus arbiter; for reliability purposes, redundant bus arbiters can be added. In FIP, cyclic traffic scheduling is performed by the bus arbiter based on the needs expressed during the initialization phase by each communicating entity. Sporadic traffic is scheduled based on the needs expressed at runtime.

FIP supports two transmission media, shielded twisted pair and fiber optics, and the topology is very flexible. Medium and line-driver redundancy are supported. Up to 64 stations can be connected without repeaters. As mentioned above, FIP medium access control is centralized: all transfers are under the control of the bus arbiter, which schedules transfers to comply with timing requirements. The data link layer provides two types of transmission services, those for variable exchange and those for message transfer. Transfers of variables and messages may take place on request from any station or cyclically (or periodically) according to the system configuration. Variables are exchanged according to the producer–distributor–consumer model and are identified by a unique 16-bit identifier known to the producer and consumers; the identifier is not related to any physical address. Messages are transferred from a source station to a single destination station or to all destination stations according to a client–server model.
Each message holds its source and destination addresses. These addresses are 24 bits long and identify the segment number and the address of the station on the segment (in fact, the link service access point). Messages are optionally acknowledged. For real-time data exchange, FIP behaves like a distributed database refreshed by the network periodically or on demand. All the application services related to periodic and sporadic data exchange are called MPS. MPS provides local read and write services (periodic) as well as remote read and write
services (sporadic). Accessed objects are variables or lists of variables. For variables, information on freshness is available; for lists of variables, FIP provides information on temporal and spatial consistency status. In addition, FIP provides a conventional client–server model for messages and events with a subset of MMS as the application layer. The available services are defined according to classes: sensor, actuator, input/output (I/O) concentrator, PLC, operator console, and programming console. The MMS subset covers domain management, program invocation, variable access, semaphore and journal management, and the basics of event management. Syntax and encoding rules conform to Abstract Syntax Notation One (ASN.1).
19.7.11 Ethernet-Based Solutions

Ethernet has been used since the beginning of the 1980s in industrial communications (CCE-CNMA, 1995a). Even though a few solutions introducing some degree of predictability into this technology have been designed and produced (Le Lann, 1993), users in the industrial domain were reluctant to adopt Ethernet because of its lack of guarantees. More recently, due to its low cost and wide availability, there has been a revival of interest in using Ethernet (IEEE 802.3) as a communication network in factories. Ethernet (IEEE 802.3, 2000) defines the lowest two protocol layers of the OSI model. Its topology is a tree in which each node is either a switch or a hub; this decreases cabling flexibility and increases the cost of the solution. The main question is: which protocol should be used over Ethernet? Layer 3 (the network layer) is often the Internet Protocol (IP) (IETF, 1981a), which is adequate for real time. Selecting the Transmission Control Protocol (TCP) (IETF, 1981b) as the transport protocol is not really adequate for real-time use (Decotignie, 2001); other solutions such as the Xpress Transfer Protocol (XTP) (Sanders and Weaver, 1990) do a much better job. TCP may still be used for non-real-time traffic. While adequate application layers can be found easily, the missing part is what we could call a real-time layer that can provide indications of consistency and event ordering. For the time being, a number of organizations are pushing for Ethernet-based solutions, including a PROFIBUS version; unfortunately, there is no compatibility between these solutions. Let us simply state that it is certainly possible to build adequate solutions on top of Ethernet, but this is still to come.
19.7.12 Solutions from Nonindustrial Markets

It is tempting to use, in the industrial domain, solutions that were designed for the consumer market. They are often cheaper, and many tools exist to support them. Universal Serial Bus (USB) and Firewire (IEEE 1394) are examples of such technologies. They were not designed for computer-to-computer networking but as remote input and output media for a single computer. The main questions are:
• Is the solution able to withstand industrial use?
• Is there a clear advantage over other solutions?
The answer to the first question is case dependent. Successful experiments have been reported with Firewire (Ruiz, 1999). However, the physical layer of both USB and Firewire was not designed for industrial use. The second question has several facets. In terms of throughput, there is no definitive advantage over other high-speed solutions (Dallemagne and Decotignie, 2001). The main problem is that most of the required functionality is missing: these networks only define the lowest two layers of the OSI model and completely lack the other layers.
19.8 Conclusion

The selection of a communication system to interconnect industrial applications on different computers depends on the type of application. We have presented taxonomies of control applications and shown that a related taxonomy exists for industrial communication networks. However, distributing industrial applications over several sites connected by a network introduces a number of problems that differ depending on the type of application. Industrial networks should support the resolution of these
© 2005 by CRC Press
19-14
The Industrial Communication Technology Handbook
problems. Finally, selected example networks were briefly described, and their ability to support the different types of applications and to address the problems related to distribution was outlined.
References

CCE-CNMA (1995a), CCE: An Integration Platform for Distributed Manufacturing Applications, Research Report ESPRIT, Springer-Verlag, Berlin.
CCE-CNMA (1995b), MMS: A Communication Language for Manufacturing, Research Report ESPRIT, Springer-Verlag, Berlin.
CENELEC EN 50170 (1996a), General Purpose Field Communication System, Vol. 3/3 (WorldFIP).
CENELEC EN 50170 (1996b), General Purpose Field Communication System, Vol. 2/3 (PROFIBUS).
CENELEC EN 50254 (1998a), High Efficiency Communication Subsystem for Small Data Packages, December 1998.
CENELEC EN 50295 (1999), Low-Voltage Switchgear and Controlgear: Controller and Device Interface Systems: Actuator Sensor Interface (AS-i).
CENELEC EN 61491 (1998b), Electrical Equipment of Industrial Machines: Serial Data Link for Real-Time Communication between Controls and Drives (SERCOS).
Dallemagne P., Decotignie J.-D. (2001), A comparison of USB 2.0 and Firewire in industrial applications, in Proceedings of the FeT 2001 International Conference on Fieldbus Systems and Their Applications, Nancy, France, November 15–16, pp. 16–23.
Decotignie J.-D. (2001), A perspective on Ethernet-TCP/IP as a fieldbus, in Proceedings of the FeT 2001 International Conference on Fieldbus Systems and Their Applications, Nancy, France, November 15–16, pp. 138–143.
Decotignie J.-D., Prasad P. (1994), Spatio-temporal constraints in fieldbus: requirements and current solutions, in the 19th IFAC/IFIP Workshop on Real-Time Programming, Isle of Reichenau, June 22–24, pp. 9–14.
EIA-709.1 (1998), Control Network Specification, Electronic Industries Alliance, Arlington, VA.
Hadzilacos V., Toueg S. (1993), Fault-tolerant broadcasts and related problems, in Distributed Systems, S. Mullender, Ed., ACM Press, New York.
Halang W., Sacha K. (1992), Real-Time Systems, World Scientific, Singapore.
Haverty M. (1986), MIL-STD-1553: a standard for data communications, Communication and Broadcasting, 10, 29–33.
He J., Mammeri Z., Thomesse J.-P. (1990), Clock synchronization in real-time distributed systems based on FIP field bus, in 2nd IEEE Workshop on Future Trends of Distributed Computing Systems, Cairo, Egypt, September 30–October 2, pp. 135–141.
IEC 61158-2 (2000–8), Fieldbus Standard for Use in Industrial Control Systems: Part 2: Physical Layer Specification and Service Definition.
IEC 61491 (2002), Electrical Equipment of Industrial Machines: Serial Data Link for Real-Time Communication between Controls and Drives.
IEC 62026-2 (2000a), Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 2: Actuator Sensor Interface (AS-i).
IEC 62026-3 (2000b), Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 3: DeviceNet.
IEC 62026-5 (2000c), Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 5: Smart Distributed System (SDS).
IEEE 802.3 (2000), Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications, pp. i–1515.
IETF RFC 791 (1981a), Internet Protocol, September 1.
IETF RFC 793 (1981b), Transmission Control Protocol, September 1.
ISO 11898 (1993), Road Vehicles: Exchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication.
Which Network for Which Application
19-15
ISO (1994), Time Critical Communication Architectures: User Requirements, ISO TR 12178, Geneva.
Jones A., et al. (1989), Issues in the design and implementation of a system architecture for computer integrated manufacturing, International Journal of Computer Integrated Manufacturing, 2, 65–76.
Kopetz H. (1988), Consistency constraints in distributed real time systems, in Proceedings of the 8th IFAC Workshop on Distributed Computer Control Systems, Vitznau, Switzerland, September 13–15, pp. 29–34.
Kopetz H. (1991), Event-triggered versus time-triggered real-time systems, in Operating Systems of the 90s and Beyond, Lecture Notes in Computer Science 563, Springer, Heidelberg.
Kopetz H., Grunsteidl G. (1994), TTP: a protocol for fault-tolerant real-time systems, IEEE Computer, 27, 14–23.
Kopetz H., Kim K. (1990), Temporal uncertainties in interactions among real-time objects, in Proceedings of the 9th Symposium on Reliable Distributed Systems, Huntsville, AL, October 9–10, pp. 165–174.
Kopetz H., Ochsenreiter W. (1987), Clock synchronization in distributed real-time systems, IEEE Transactions on Computers, C-36, 933–940.
Le Lann G. (1990), Critical issues in the development of distributed real-time computing systems, in 2nd IEEE Workshop on Future Trends of Distributed Computing Systems, Cairo, September 30–October 2, pp. 96–105.
Le Lann G. (1992), Designing real-time dependable distributed systems, Computer Communications, 14, 225–234.
Le Lann G., Rivierre N. (1993), Real-Time Communications over Broadcast Networks: The CSMA-DCR and the DOD-CSMA-CD Protocols, Research Report INRIA 1863.
Rosemount, Inc. (1991), HART Smart Communications Protocol Specifications, revision 5.1.4, January.
Ruiz L., et al. (1999), Using Firewire as an industrial network, in SCCC ’99, Talca, Chile, pp. 201–208.
Sanders R., Weaver A. (1990), The Xpress Transfer Protocol (XTP): a tutorial, Computer Communications Review, 20, 65–80.
Thomesse J.-P. (1993), Time and industrial local area networks, in the 7th Annual European Computer Conference on Computer Design, Manufacturing and Production (COMPEURO ’93), Paris-Evry, France, May, pp. 365–374.
Verissimo P. (1994), Ordering and timeliness requirements of dependable real-time programs, Real-Time Systems, 7, 104–128.
II Ethernet and Wireless Network Technologies

20 Approaches to Enforce Real-Time Behavior in Ethernet .............................................20-1
   P. Pedreiras and L. Almeida
21 Switched Ethernet in Automation Networking .............................................................21-1
   Tor Skeie, Svein Johannessen, and Øyvind Holmeide
22 Wireless LAN Technology for the Factory Floor: Challenges and Approaches..........22-1
   Andreas Willig
23 Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment ......................................................................................................................23-1
   Kirsten Matheus
20 Approaches to Enforce Real-Time Behavior in Ethernet

Paulo Pedreiras, Universidade de Aveiro
Luís Almeida, Universidade de Aveiro

20.1  Introduction ......................................................................20-1
20.2  Ethernet Roots...................................................................20-2
20.3  Why Use Ethernet at the Fieldbus Level? ........................20-3
20.4  Making Ethernet Real Time .............................................20-4
20.5  CSMA/CD-Based Protocols..............................................20-5
      NDDS • ORTE • Traffic Smoothing
20.6  Modified CSMA Protocols................................................20-8
      Virtual-Time CSMA • Windows Protocol • CSMA/DCR • EQuB
20.7  Token Passing ..................................................................20-13
      Timed-Token Protocols • RETHER • RT-EP: Real-Time Ethernet Protocol
20.8  Time-Division Multiple Access ......................................20-17
      The MARS Bus • Variable-Bandwidth Allocation Scheme
20.9  Master–Slave Techniques ................................................20-18
      FTT-Ethernet Protocol • ETHERNET Powerlink
20.10 Switched Ethernet ...........................................................20-20
      EDF Scheduled Switch • EtheReal
20.11 Recent Advances ..............................................................20-24
20.12 Conclusion.......................................................................20-25
References ...................................................................................20-26
20.1 Introduction

Nowadays, intelligent nodes, i.e., microprocessor-based nodes with communication capabilities, are extensively used in the lower layers of both process control and manufacturing industries [52]. In these environments, applications range from embedded command-and-control systems to image processing, monitoring, human–machine interfaces, etc. Moreover, the communication between different nodes has specific requirements [5] that are quite different from, and sometimes opposed to, those found in office environments. For instance, predictability is favored over average throughput, and message transmission is typically time and precedence constrained. Furthermore, a lack of compliance with those constraints can have a significant negative impact on the quality of the control action in distributed computer control systems (DCCS), or on the quality of the observation of the system state in distributed monitoring systems (DMS). Therefore, to deliver adequate quality of service, special-purpose networks have been developed, essentially during the last two decades, which are generically called fieldbuses and are particularly adapted to support frequent exchanges of small amounts of data under time, precedence, and dependability
constraints [52]. Probably the most well known are PROFIBUS, WorldFIP, P-NET, Foundation Fieldbus, TTP/C, CAN, and CAN-based systems such as DeviceNet. In the early days of DCCS, network nodes presented simple interfaces and supported limited sets of actions. However, the quantity, complexity, and functionality of nodes on a DCCS have been increasing steadily. As a consequence of this evolution, the amount of information that must be exchanged over the network has also increased, either for configuration or for operational purposes. The increase in the amount of data exchanged among DCCS nodes is reaching limits that are not achievable using traditional fieldbuses due to their limited bandwidth, typically between 1 and 5 Mbps [5]. Machine vision is just one example of a killer application for those systems. Therefore, other alternatives are required to support higher-bandwidth demands while retaining the main requirements of a real-time communication system: predictability, timeliness, and bounded delays and jitter. Since the 1980s, several general-purpose networks exhibiting higher bandwidth than the traditional fieldbus protocols have also been proposed for use at the field level. For example, two prominent networks, Fiber Distributed Data Interface (FDDI) and ATM, have been extensively analyzed for both hard and soft real-time communication systems [50]. However, due to their high complexity, high cost, lack of flexibility, and limited interconnection capacity [50], these protocols have not gained general acceptance. Another communication protocol that has been evaluated for use at the field level is Ethernet. The main factors that favor the use of this protocol are [5] cheap silicon availability, easy integration with the Internet, a clear path for future expandability, and compatibility with the networks used at the higher layers of the factory structure.
However, the nondeterministic arbitration mechanism used by Ethernet impedes its direct use at the field level, at least for hard real-time communications. Therefore, in the past, several attempts have been made to allow Ethernet to support time-constrained communications. The methods that have been used to achieve deterministic message transmission over Ethernet range from modifications of the medium access control (MAC) layer (e.g., [28]) to the addition of sublayers over the Ethernet layer to control the time instants of message transmission (e.g., [54]) and therefore avoid collisions. More recently, with the advent of switched Ethernet, and therefore the intrinsic absence of collisions, a new set of works concerning the ability of this topology to carry time-constrained communications has appeared (e.g., [50]). This chapter presents a brief description of the Ethernet protocol, followed by a discussion of several techniques that have been proposed or used to enforce real-time communication capabilities over Ethernet during the last two decades. The techniques referred to include those that support either probabilistic or deterministic analysis of the network access delay, thus covering diverse levels of real-time requirements from soft to hard real-time applications. This chapter aims to bring together in one volume different dispersed pieces of related work, trying to provide a global view of this niche area of real-time communication over Ethernet. The presentation is focused more on conceptual grounds than on mathematical formalism, which can be found in the references provided in the text. Moreover, for the sake of coherency with the original work referred to, several different expressions are used interchangeably in the text, but with similar meaning. 
This is the case of node, station, and host, referring to a computing element in a distributed system with independent network access, as well as message, frame, and packet, referring to an Ethernet protocol data unit.
20.2 Ethernet Roots

Ethernet was born about 30 years ago, invented by Robert Metcalfe at Xerox’s Palo Alto Research Center. Its initial purpose was to connect two products developed by Xerox: a personal computer and a brand-new laser printer. Since then, this protocol has evolved in many ways. For instance, concerning the transmission speed, it has grown from the original 2.94 Mbps to 10 Mbps [6, 7, 15–18], then to 100 Mbps [19], and more recently to 1 Gbps [20] and 10 Gbps [21]. Concerning physical medium and network topology, Ethernet started as a bus topology based initially on thick coaxial cable [15], and afterwards on thin coaxial cable [16]. In the mid-1980s, a more structured and fault-tolerant approach,
[FIGURE 20.1 Ethernet CSMA/CD simplified state diagram (states Idle, Sense, Transmit, Jam sequence, and Wait backoff time, with transitions labeled TX request, Bus busy, Bus idle, Collision, and TX OK).]
based on a star topology, was standardized [17], running only at 1 Mbps, however. At the beginning of the 1990s, an improvement was standardized [18], running at 10 Mbps over category 5 unshielded twisted-pair cable. In this evolution process, two fundamental properties have been kept unchanged:
1. Single collision domain; i.e., frames are broadcast on the physical medium, and all the network interface cards (NICs) connected to it receive them.
2. The arbitration mechanism, which is called carrier-sense multiple access with collision detection (CSMA/CD).
According to the CSMA/CD mechanism (Figure 20.1), a NIC with a message to be transmitted must wait for the bus to become idle, and only then does it start transmitting. However, several NICs may have sensed the bus during the current transmission and then tried to transmit simultaneously thereafter, causing a collision. In this case, all the stations abort the transmission of the current message, wait for a random time interval, and try again. This retry mechanism is governed by the truncated binary exponential backoff (BEB) algorithm, which doubles the randomization interval on every retry, reducing the probability of further collisions. The number of retries is limited to 16. The use of a single broadcast domain and the CSMA/CD arbitration mechanism has created a bottleneck when facing highly loaded networks: above a certain threshold, as the submitted load increases, the throughput of the bus decreases — a phenomenon referred to as thrashing. At the beginning of the 1990s, the use of switches in place of hubs was proposed as an effective way to deal with thrashing. A switch creates a single collision domain for each of its ports. If a single node is connected to each port, collisions never actually occur unless they are created on purpose, e.g., for flow control. Switches also keep track of the addresses of the NICs connected to each port by inspecting the source address in the incoming messages.
This allows incoming messages to be forwarded directly to the respective outgoing ports according to their destination addresses, a mechanism generally known as forwarding. When a match between a destination address and a port cannot be established, the switch forwards the message to all ports, a process commonly referred to as flooding. The forwarding mechanism provides a higher degree of traffic isolation, so that each NIC receives only the traffic addressed to it. Moreover, since each forwarding action uses a single output port, several of these actions can be carried out in parallel, resulting in multiple simultaneous transmission paths across the switch and, consequently, in a significant increase in the global throughput.
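The truncated BEB retry rule described above can be sketched as a small helper (an illustrative reconstruction, not code from the standard; the function name and structure are ours): after the n-th consecutive collision, a station draws its waiting time uniformly from 0 to 2^min(n,10) − 1 slot times, and the frame is dropped after 16 failed attempts.

```python
import random

MAX_ATTEMPTS = 16   # the frame is dropped after 16 failed attempts
BACKOFF_LIMIT = 10  # the backoff exponent is truncated at 10

def backoff_slots(attempt: int) -> int:
    """Slot times to wait after the given consecutive-collision count.

    Truncated binary exponential backoff: the randomization window
    doubles on every retry, up to 2**BACKOFF_LIMIT slots.
    """
    if attempt >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(attempt, BACKOFF_LIMIT)
    return random.randrange(0, 2 ** k)
```

With this rule, stations that collide repeatedly quickly diverge in their waiting times, which is what makes further collisions unlikely; the delay any single frame experiences, however, remains unbounded in the worst case, which is precisely the nondeterminism the rest of this chapter works around.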
20.3 Why Use Ethernet at the Fieldbus Level?

Operation at the fieldbus level implies the capacity to convey time-constrained traffic associated with sensors, controllers, and actuators. However, as mentioned above, Ethernet was not designed to support that type of traffic, and some of its properties, such as the nondeterministic arbitration mechanism, pose
serious challenges for that purpose. Thus, why use it? Several works address the pros and cons of using Ethernet at the field level (e.g., [5][30][54]). Commonly cited arguments in favor are [5]:
• It is cheap, due to mass production.
• Integration with the Internet is easy (Transmission Control Protocol (TCP)/Internet Protocol (IP) stacks over Ethernet are widely available, allowing the use of application layer protocols such as File Transfer Protocol (FTP) and Hypertext Transfer Protocol (HTTP)).
• Steady increases in the transmission speed have happened in the past and are expected to continue in the near future.
• Due to its inherent compatibility with the communication protocols used at higher levels within industrial systems, information exchange with the plant level becomes easier.
• The bandwidth made available by existing fieldbuses is insufficient to support some recent developments, such as the use of multimedia (e.g., machine vision) at the field level.
• Technicians familiar with this protocol are widely available.
• Test equipment is widely available from different sources.
• It is a mature technology, well specified, with equipment available from many sources and without incompatibility issues.
On the other hand, the most common argument against using Ethernet at the field level is its destructive and nondeterministic arbitration mechanism. A potential remedy is the use of switched Ethernet, which allows bypassing of the native CSMA/CD arbitration mechanism. In this case, provided that a single NIC is connected to each port and the operation is full duplex, no collisions occur. However, merely avoiding collisions does not make Ethernet deterministic; for example, if a burst of messages destined for a single port arrives at the switch in a given time interval, their transmission must be serialized. If the arrival rate is greater than the transmission rate, buffers will be exhausted and messages will be lost. Therefore, even with switched Ethernet, some kind of higher-level coordination is required. Moreover, a bounded transmission delay is not the only requirement of a fieldbus. Other important factors commonly referred to in the literature are temporal consistency indication, precedence constraints, and efficient handling of periodic and sporadic traffic, as well as of short packets. Clearly, Ethernet, even with switches, does not provide answers to all of these demands.
20.4 Making Ethernet Real Time

The previous section discussed the pros and cons of using Ethernet for real-time communication, particularly for use as a fieldbus. Basically, Ethernet by itself cannot fulfill all the properties that are expected from a fieldbus. Therefore, specifically concerning real-time communication, several approaches have been developed and used. Many of them override Ethernet’s CSMA/CD medium access control by setting an upper transmission control layer that eliminates, or at least reduces, the occurrence of collisions at the medium access. Other approaches propose modification of the CSMA/CD medium access control layer so that collisions either seldom occur or, when they do, the collision resolution is deterministic and takes a bounded worst-case time. Moreover, some approaches support such deterministic reasoning on the network access delay, while other ones allow for a probabilistic characterization only. In the remainder of this chapter, some paradigmatic efforts to improve Ethernet’s behavior with respect to meeting deadlines for the network access delay are briefly discussed. For the sake of clarity, they are classified as:
• CSMA/CD-based protocols
• Modified CSMA protocols
• Token passing
• TDMA
• Master–slave techniques
• Switched Ethernet
20.5 CSMA/CD-Based Protocols

This category of protocols paradoxically achieves real-time behavior by using standard Ethernet network adapters and relying on the original CSMA/CD contention resolution mechanism. They exploit the fact that the probability of collision upon network access is closely related to the traffic properties, namely, the bus utilization factor, message lengths, and precedence constraints [2][25]. In fact, knowing such traffic properties allows for computing probabilities of either packet losses or deadline misses [25][41]. In many distributed real-time applications the network utilization can be characterized at design time, the traffic load is light, and there is a predominance of short messages. In these cases, the expected deadline miss ratio is low. Reference [41] presents a table that contains, for a set of different load, bandwidth, and deadline requirements, the interval of time for which there is a 99% probability of all messages being delivered within their respective deadlines. For instance, in a 100-Mbps network with 1000 messages of 128 bytes payload generated every second, and a message deadline of 1 ms, the referred interval is 1140 years. However, for the same load but with 2-ms deadlines in a 10-Mbps network, that interval is reduced to 1 h. This huge reduction in the interval for which all deadlines are met with 99% probability results from an increase in bus utilization from approximately 1 to 10%. The numbers above show a strong dependence between the network utilization and the deadline miss probability. They also show that by keeping the utilization sufficiently low (1% in that case) and using relatively short messages (128 bytes of payload in the example), the deadline miss probability can be almost negligible. Although this may seem a significant waste of resources, the fact that Ethernet bandwidth is significantly higher than the requirements of many practical applications makes this a practical possibility.
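The utilization figures behind this example can be reproduced with a back-of-envelope calculation (a sketch using standard Ethernet framing constants; the function is ours, not taken from [41]):

```python
# Standard Ethernet framing overhead (bytes): preamble + SFD 8, MAC header 14,
# FCS 4, plus the 12-byte inter-frame gap.
OVERHEAD_BYTES = 8 + 14 + 4 + 12

def utilization(msgs_per_s: int, payload_bytes: int, bitrate_bps: float) -> float:
    """Fraction of the bus consumed by periodic messages of the given size."""
    payload = max(payload_bytes, 46)  # minimum Ethernet payload is 46 bytes
    bits_per_frame = (payload + OVERHEAD_BYTES) * 8
    return msgs_per_s * bits_per_frame / bitrate_bps

print(f"100 Mbps: {utilization(1000, 128, 100e6):.1%}")  # about 1.3%
print(f" 10 Mbps: {utilization(1000, 128, 10e6):.1%}")   # about 13.3%
```

The results, roughly 1% at 100 Mbps and roughly 13% at 10 Mbps, match the order-of-magnitude utilization jump that explains the drastic reduction of the 99% interval quoted above.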
In fact, there are protocols for real-time communication over Ethernet that rely solely on such a low-bandwidth utilization factor with small payloads, as shown in the following subsections on NDDS and ORTE. Achieving higher-bandwidth utilization factors together with a low deadline-miss probability requires further control over the traffic in order to avoid bursts. This is called traffic smoothing or shaping, and it is explained later in this section. In both cases, the probability of deadline misses is nonzero but can be made arbitrarily small with a corresponding penalty in bandwidth efficiency. Thus, in principle, these techniques may also be used in hard real-time applications, but their natural target is soft real-time ones, such as multimedia, in which an occasional deadline miss just causes some transitory performance degradation. The application of these techniques to distributed computer control systems is also possible as long as such systems, from the control point of view, are designed to tolerate occasional sample losses.
20.5.1 NDDS

Network Data Delivery Service (NDDS) [38] is a middleware for distributed real-time applications made by Real-Time Innovations, Inc., based on the Real-Time Publisher–Subscriber model [43]. It is currently in the process of standardization within the Object Management Group (OMG), which has recently released the “Data Distribution Service for Real-Time Systems Specification” [4]. The NDDS architecture is depicted in Figure 20.2. The system architecture is centered on the NDDS database, which holds all the pertinent data concerning groups of publishers and subscribers. This database is accessible to the NDDS library and to a set of NDDS tasks. The former component provides a comprehensive set of services to the user applications, in the form of an Application Programming Interface. The NDDS tasks manage subscriptions and services, and send and receive publication updates. The NDDS database is shared among all network nodes, providing them with a holistic view of the communication requirements. Such global knowledge may be used to compute the probability of deadline misses for the current load. This information is then made available to the application. There is no other mechanism to support traffic timeliness beyond admission control on the current communication load.
[FIGURE 20.2 NDDS architecture: user applications call the NDDS library; the library and the NDDS tasks share the NDDS database, on top of the operating system and network interface.]
However, in terms of fault tolerance, which is often a requirement of real-time systems, NDDS provides other mechanisms to support publisher redundancy. Thus, each group may have several subscribers and several replicated publishers of the same entity, e.g., a temperature value, all producing it in parallel. Each publication has two additional associated parameters: strength and persistency. The strength defines the relative weight of a publisher with respect to other publishers of the same entity. The persistency specifies the temporal validity of the publication. Subscribers consider a publication only if its strength is greater than or equal to the strength of the last one they received of that entity. In case the persistency window expires, the first publication of that entity after that instant is always accepted, irrespective of its strength. These mechanisms are complemented with sequence numbers assigned to each publication that allow the detection of missing instances. Publishers keep their publications in a buffer for a specified period. During that period, subscribers that missed a publication may request its retransmission.
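The strength/persistency acceptance rule described above can be sketched as follows (an illustrative sketch; the class and field names are ours, not taken from the NDDS API):

```python
from dataclasses import dataclass

@dataclass
class Publication:
    value: float
    strength: int       # relative weight among replicated publishers
    persistence: float  # temporal validity of the publication, in seconds
    timestamp: float    # reception time, in seconds

class SubscriberState:
    """Tracks the last accepted publication of one entity (e.g., a temperature)."""

    def __init__(self):
        self.last = None  # last accepted Publication, if any

    def accept(self, pub):
        """Apply the strength/persistency rule; return True if the publication is accepted."""
        if self.last is None:
            ok = True
        elif pub.timestamp - self.last.timestamp > self.last.persistence:
            ok = True  # persistency window expired: accept regardless of strength
        else:
            ok = pub.strength >= self.last.strength
        if ok:
            self.last = pub
        return ok
```

With this rule, a weaker replica is ignored while a stronger publisher is alive, but automatically takes over once the stronger publisher's persistency window expires, which is how the publisher redundancy described above yields fail-over without explicit reconfiguration.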
20.5.2 ORTE

The OCERA Real-Time Ethernet (ORTE) communication protocol [47] has been developed in the scope of an open-source implementation of the Real-Time Publisher–Subscriber protocol [43]. This protocol allows establishing statistical real-time channels on Ethernet by limiting the bandwidth utilization. The internal architecture of ORTE is presented in Figure 20.3 and is functionally equivalent to the NDDS architecture (Section 20.5.1). The ORTE layer is composed of manager objects (M), which are responsible for the traffic management, and managed applications (MA), which are the objects that represent the user application within the ORTE layer. Publisher redundancy and acknowledged message transmissions are supported by mechanisms equivalent to the ones presented in Section 20.5.1.
20.5.3 Traffic Smoothing

Kweon et al. [25] introduced the traffic-smoothing technique. In this work, the authors showed analytically that it is possible to provide a probabilistic guarantee that packets may be successfully transmitted
[FIGURE 20.3 ORTE internal architecture (API; databases of M, MA, publication, and subscription objects; user data and metatraffic carried over UDP).]
within a predefined time bound, if the total arrival rate of new packets generated by all the stations on the network is kept below a threshold, called the network-wide input limit. The probabilistic guarantee can be expressed by P(D ≤ Dk*) > 1 − Z, where Z is the loss tolerance and Dk* is the worst-case delay suffered by a packet when it is successfully transmitted at the kth trial. Therefore, if the network average load is kept below a given threshold and bursts of traffic are avoided, a low probability of collisions can be obtained, together with an estimate of the network-induced delay. To enforce this behavior, an interface layer called a traffic smoother (Figure 20.4) is placed just above the Ethernet data link layer. This element is in charge of shaping the traffic generated by the respective node according to a desired rate, commonly referred to as the station input limit. The traffic smoother implements a leaky bucket with appropriate depth and rate that captures and smoothes the non-real-time traffic generated by the node. On the other hand, the real-time traffic is, by its nature, nonbursty, thus spaced in time, with short payloads, resulting in a low collision probability. Therefore, it does not need smoothing and is immediately sent to the network, bypassing the leaky bucket.
[FIGURE 20.4 Software architecture of traffic smoothing: the traffic smoother sits between the {TCP,UDP}/IP stack and the Ethernet layer; non-real-time (NRT) traffic passes through a leaky bucket, while real-time (RT) traffic bypasses it.]
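The probabilistic guarantee P(D ≤ Dk*) > 1 − Z can be illustrated numerically (a deliberate simplification assuming independent transmission attempts, not the analysis of [25]): if each attempt succeeds with probability q, the smallest number of trials k whose cumulative success probability reaches 1 − Z follows directly.

```python
import math

def attempts_needed(q: float, Z: float) -> int:
    """Smallest k with 1 - (1 - q)**k >= 1 - Z, i.e., (1 - q)**k <= Z."""
    return math.ceil(math.log(Z) / math.log(1.0 - q))

# With a 5% collision probability per attempt and a loss tolerance Z = 1e-6,
# five attempts suffice; the worst-case delay Dk* then follows from the
# backoff waits accumulated over those attempts.
print(attempts_needed(q=0.95, Z=1e-6))  # prints 5
```

This makes the trade-off visible: lowering the per-attempt collision probability (by smoothing the load) shrinks k and hence the delay bound Dk* attached to the chosen loss tolerance Z.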
The station input limit, i.e., the parameters of the leaky bucket, can be defined either statically at design time or dynamically according to the current traffic conditions. The original implementation used the static approach, in which the station input limit is assigned at design time. A side effect of this technique is that it may lead to poor network bandwidth utilization whenever, at runtime, one or more stations use less bandwidth than they were assigned. In such circumstances, the unused bandwidth is simply wasted. Moreover, the number of stations must be known a priori to compute the station input limits. This is not adequate for an open system with stations that connect and disconnect at runtime. Kweon et al. [26] and Lo Bello et al. [29] proposed the dynamic approach, in which the bus load is assessed online and the station input limit may vary within a predefined range. Thus, stations having higher communication requirements at some particular instant in time may reclaim bandwidth that is not being used by other stations at that time, thus increasing the potential bandwidth utilization. Moreover, as stations dynamically adapt themselves to the traffic conditions, this solution scales well when the number of stations increases. Lo Bello et al. [31] developed a further evolution of the dynamic approach that consists of estimating the network load online using two parameters, the number of collisions and the throughput, both observed in a given interval. These parameters are then fed to a fuzzy controller that sets the instantaneous station input limit. The resulting efficiency is substantially higher than that of both the static and dynamic approaches that rely on a single parameter to assess the bus state.
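A minimal smoother of the kind described above can be sketched as a token bucket (an illustrative sketch; class and parameter names are ours): credit accrues at the station input limit rate up to the bucket depth, and a non-real-time frame is released only when enough credit has accumulated.

```python
class LeakyBucket:
    """Token-bucket smoother for a node's non-real-time traffic.

    `rate` is the station input limit in bytes per second; `depth` is the
    bucket depth in bytes, which bounds the largest burst the node can
    submit at once.
    """

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.credit = depth  # start with a full bucket
        self.last = 0.0      # time of the last update, in seconds

    def try_send(self, frame_bytes, now):
        """Release the frame if enough credit is available at time `now`."""
        # Refill credit for the elapsed time, capped at the bucket depth.
        self.credit = min(self.depth, self.credit + (now - self.last) * self.rate)
        self.last = now
        if frame_bytes <= self.credit:
            self.credit -= frame_bytes
            return True   # frame released to the Ethernet layer
        return False      # frame held back: node is exceeding its input limit
```

Real-time frames bypass `try_send` entirely, as in Figure 20.4; a dynamic smoother in the sense of [26][29] would additionally adjust `rate` online from the observed bus load.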
20.6 Modified CSMA Protocols
As opposed to the previous category, in which the native arbitration mechanism of Ethernet, i.e., CSMA/CD, is used as is, in this category the arbitration protocol, namely the backoff-and-retry mechanism, is adequately modified in order to improve the temporal behavior of the network (e.g., [3][28][46]). The result is still a fully distributed arbitration protocol of the CSMA family that determines when to transmit based on local information and on the current state of the bus only. There are two common options in this category: delaying transmissions to reduce the probability of collisions or sorting out collisions in a controlled way. This section presents four protocols. The first (virtual-time CSMA) follows the first option by implementing a type of CSMA with collision avoidance (CSMA/CA) that delays message transmissions according to a temporal parameter. The remaining three protocols, Windows, CSMA/DCR, and EQuB, follow the second option, modifying the backoff-and-retry mechanism so that the network access delay for any contending message can be bounded. CSMA/DCR and EQuB support a deterministic bound, while Windows still uses a probabilistic approach to sort out particular situations.
20.6.1 Virtual-Time CSMA
The virtual-time CSMA protocol is presented in [33] and [37]. It allows implementing different scheduling policies by assigning different waiting times to messages submitted for transmission. The traffic on the bus is then serialized according to such waiting times, following an order that approximates the chosen scheduling policy. This mechanism is highly flexible in the sense that all common real-time scheduling policies can be implemented, either based on static priorities (e.g., rate monotonic and deadline monotonic) or on dynamic priorities (e.g., minimum laxity first and earliest deadline first). One of the most interesting features of this protocol is that its decisions are based on the assessment of the communication channel status only. When the bus becomes idle and a node has a message to transmit, it waits for a given amount of time, related to the scheduling policy implemented. For example, if minimum laxity first (MLF) scheduling is used, the waiting time is derived directly from the laxity using a proportional constant. When this amount of time expires, and if the bus is still idle, the node tries to transmit the message. If a collision occurs, then the scheduler outcome resulted in more than one message having permission to be transmitted at the same time (e.g., when two messages in two nodes have the same laxity in MLF). In this case, the protocol can recalculate the waiting time either using the
Approaches to Enforce Real-Time Behavior in Ethernet
[Figure 20.5 shows messages m, a, b, and c on the bus: after message m, message a is transmitted, then messages b and c collide, then message b is transmitted, then message c. Legend: A(l,z) is the arrival instant of the lth instance of message z; d(l,z) is the absolute deadline of instance l of message z, with d(j,b) = d(k,c).]
FIGURE 20.5 Example of Virtual-Time CSMA operation using MLF.
same rule or using a probabilistic approach. This last option is important to sort out situations in which the scheduler cannot differentiate messages; e.g., messages with the same laxity would always collide.
Figure 20.5 shows the operation of the virtual-time CSMA protocol with MLF scheduling. During the transmission of message m, messages a and b become ready. Because the laxity of message a (i.e., time to the deadline minus message transmission time) is shorter than the laxity of message b, message a is transmitted first. During the transmission of message a, message c arrives. Messages b and c have the same deadline and the same laxity. Therefore, an attempt will be made to transmit them at the same time, causing a collision. Then the algorithm uses the probabilistic approach, with message b being given a random waiting time lower than that of message c, and thus being transmitted next. When the transmission of message b ends, the waiting time for message c is recomputed, and only after the expiration of this interval is message c finally transmitted.
Beyond the advantage of using standard Ethernet hardware, this approach does not require any global information other than the channel status, which is readily available from all NICs. Thus, the protocol can be implemented in a fully distributed and uniform way, with a relatively low computational overhead. Nevertheless, this approach presents some important drawbacks:
• Performance is highly dependent on the proportional constant used to generate the waiting time, leading to:
  • An excess of collisions if it is too short
  • A large amount of idle time if it is too long
• The proportional constant depends on the properties of the message set; therefore, online changes to that set can lead to poor performance.
• Due to possible collisions, the worst-case transmission time can be much higher than the average transmission time, and thus only probabilistic timeliness guarantees can be given.
• When implemented in software using off-the-shelf NICs, the computational overhead grows with the level of traffic on the bus because every transmission or collision raises an interrupt in all nodes to trigger the intervals of waiting time. This can be costly for networks with higher transmission rates, such as Fast or Gigabit Ethernet, mainly when the bus traffic includes many short messages.
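The MLF waiting-time rule described above can be sketched as follows. The proportional constant `K` and the range of the random tie-breaking component are illustrative assumptions, not values from [33] or [37].

```python
import random

K = 0.01  # proportional constant mapping laxity to waiting time (assumed value)

def laxity(deadline: float, now: float, tx_time: float) -> float:
    """Laxity = time to the deadline minus the message transmission time."""
    return (deadline - now) - tx_time

def waiting_time(deadline: float, now: float, tx_time: float,
                 collided: bool = False) -> float:
    """Virtual-time CSMA with MLF: wait proportionally to the laxity. After
    a collision among equal-laxity messages, add a random component so the
    contenders can be differentiated on retry (probabilistic tie-breaking)."""
    base = K * laxity(deadline, now, tx_time)
    if collided:
        return base + random.uniform(0.0, K)
    return base
```

Note how the drawbacks listed above show up directly in this sketch: a small `K` shortens all waits and causes excess collisions, while a large `K` leaves the bus idle.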
20.6.2 Windows Protocol
The Windows protocol was proposed for both CSMA/CD and Token Ring networks [33]. Concerning the CSMA/CD implementation, the operation is as follows: The nodes on a network agree on a common time interval referred to as a window. All nodes synchronize upon a successful transmission, restarting the respective window. The bus state is used to assess the number of nodes with messages to be transmitted within the window:
• If the bus remains idle, there are no messages to be transmitted in the window.
[Figure 20.6 shows three successively shrinking windows: in window 1 (step 1) the latest sending times of messages A, B, and C all fall inside, so the three messages collide; in window 2 (step 2) messages A and B collide; in window 3 (step 3) only message A remains inside and is transmitted. Legend: lst x is the latest sending time of message x.]
FIGURE 20.6 Resolving collisions with the Windows protocol.
• If only one message is in the window, it will be transmitted.
• If two or more messages are within the window, a collision occurs.
Depending on the bus state, several actions can be performed:
• If the bus remains idle, the window duration is increased in all nodes.
• In the case of a collision, the time window is shortened in all nodes.
• In the case of a successful transmission, the window is restarted and its duration is kept as it is.
In the first two cases, the window duration is changed but the window is not restarted. Moreover, the window duration varies between a maximum (initial) and a minimum value. Whenever there is a sufficiently long idle period on the bus, the window returns to its original maximum length. If a new node enters the system dynamically, it may have an instantaneous window duration different from that of the remaining nodes. This may cause some perturbation during an initial period, with more collisions than expected. However, as soon as an idle period occurs, all windows will converge to the initial length. A probabilistic retry mechanism may also be necessary when the windows are shrunk to their minimum and collisions still occur (e.g., when two messages have the same transmission time).
Figure 20.6 shows a possible operational scenario using the Windows protocol implementing MLF message scheduling. The top axis represents the latest send times (lst) of messages A, B, and C. The lst of a message is the latest time instant by which the message transmission must start so that the respective deadline is met. This is equivalent to the laxity of the message as presented in the previous subsection. The first window (step 1) includes the lst of three messages, thus leading to a collision. The intervening nodes detect the collision, and the window is shrunk (step 2). However, the lst of messages A and B are still inside the window, causing another collision. In response to this event, the window size is shrunk again (step 3).
In this case only message A has its lst within the window, leading to a successful transmission. This method exhibits properties that are very similar to those of the previous method (virtual-time protocol). It is based on local information only, it supports probabilistic bounds to the network delay, and it can be easily implemented in a fully distributed and uniform way. The computational overhead is also similar to that of the previous case, growing for higher levels of bus traffic when implemented in software. However, this approach is more efficient than virtual time because of its adaptive behavior, which can easily cope with a dynamic number of nodes and dynamic communication requirements. Its efficiency, though, is substantially influenced by the magnitude of the steps in the variations of the window duration.
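The window-adaptation rules can be sketched as follows. The doubling/halving step and the limit values are assumptions for illustration; the actual step magnitude is precisely the tuning factor whose influence on efficiency is noted above.

```python
W_MAX = 1.0e-3  # initial/maximum window duration in seconds (assumed value)
W_MIN = 1.0e-5  # minimum window duration in seconds (assumed value)

def adapt_window(window: float, event: str) -> float:
    """Adjust the shared window after observing the bus state, following the
    Windows protocol rules: grow on idle, shrink on collision, keep on success."""
    if event == "idle":
        return min(W_MAX, window * 2.0)   # no lst in the window: enlarge it
    if event == "collision":
        return max(W_MIN, window / 2.0)   # two or more lst: shrink it
    if event == "success":
        return window                     # exactly one message: keep duration
    raise ValueError(f"unknown bus event: {event}")
```

Replaying the scenario of Figure 20.6 under these assumed values: two successive collisions shrink the window from 1 ms to 0.25 ms, after which only one lst remains inside and the transmission succeeds without changing the window duration.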
20.6.3 CSMA/DCR
In [28], Le Lann and Rivierre presented the CSMA/DCR protocol, where DCR stands for deterministic collision resolution. This protocol implements a fully deterministic network access scheme that consists
[Figure 20.7 shows the binary search tree over the source index space; each visited branch is annotated with its chronological order c and channel status s (Cn: collided slot with n colliding sources; V: empty channel slot; Xi: successful transmission of source i).]
FIGURE 20.7 Example of tree search with CSMA/DCR.

TABLE 20.1 Tree Search Example

Search order   Channel status   Source index
     1               C          2, 3, 5, 12, 14, 15
     2               C          2, 3, 5
     3               C          2, 3
     4               I          —
     5               C          2, 3
     6               X          2
     7               X          3
     8               X          5
     9               C          12, 14, 15
    10               I          —
    11               C          12, 14, 15
    12               X          12
    13               C          14, 15
    14               X          14
    15               X          15

Note: Channel status: C, collision; I, idle; X, transmission.
of a binary tree search of colliding messages; i.e., there is a hierarchy of priorities in the retry mechanism that allows calculation of the maximum network delay that a message can suffer. During normal operation, CSMA/DCR follows the standard IEEE 802.3 protocol (random access mode). However, whenever a collision is detected, the protocol switches to the epoch mode. In this mode, lower-priority message sources voluntarily cease contending for the bus, and higher-priority ones try again. This process is repeated until a successful transmission occurs (Figure 20.7). After all frames involved in the collision are transmitted, the protocol switches back to the random access mode.
Figure 20.7, together with Table 20.1, depicts the CSMA/DCR operation in a situation where six messages collide. Considering that lower indexes correspond to higher priorities, after the initial collision the right branch of the tree (messages 12, 14, and 15) ceases contending for bus access. Since there are still three messages on the left branch, a new collision appears, between messages 2, 3, and 5. Thus, the left subbranch is selected again, leaving message 5 out. In the following slot, messages 2 and 3 collide again. The subbranch selected after this collision has no active messages, and thus in the following time slot the bus is idle (step 4). This causes a move to the right subbranch, where messages 2 and 3 reside, resulting in a new collision (step 5). Finally, in step 6, the branch containing only message 2 is selected, resulting in a successful transmission. The algorithm continues this way until all messages are successfully transmitted (the remaining messages 3, 5, 12, 14, and 15 in steps 7, 8, 12, 14, and 15, as shown in Table 20.1). Despite ensuring a bounded access time to the transmission medium, this approach exhibits two main drawbacks:
• In some cases (e.g., [28]), the firmware must be modified; therefore, the economy of scale obtained when using standard Ethernet hardware is lost.
• The worst-case transmission time, which is the main factor considered when designing real-time systems, can be substantially longer than the average transmission time. Consequently, all worst-case analysis will be very pessimistic, leading to low bandwidth utilization, at least concerning real-time traffic.
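The epoch-mode tree search can be modeled as a recursive binary search over the source index space. This is an illustrative sketch, not the firmware algorithm of [28]; it assumes indexes in [0, 16) with lower index meaning higher priority, which reproduces the slot sequence of Table 20.1.

```python
def dcr_resolve(sources, lo: int = 0, hi: int = 16):
    """Deterministic collision resolution (sketch): a binary search over the
    source index space [lo, hi). Each recursion corresponds to one channel
    slot: idle (no active source in the branch), success (exactly one), or
    collision (two or more, in which case the left, higher-priority half
    contends first). Returns the resulting transmission order."""
    active = sorted(s for s in sources if lo <= s < hi)
    if not active:
        return []                 # empty channel slot
    if len(active) == 1:
        return active             # successful transmission
    mid = (lo + hi) // 2          # collision: split the index space in two
    return dcr_resolve(sources, lo, mid) + dcr_resolve(sources, mid, hi)
```

With the six colliding sources of the example, the resulting transmission order is 2, 3, 5, 12, 14, 15, matching Table 20.1; since the search depth is bounded by the logarithm of the index space, so is the worst-case access delay.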
20.6.4 EQuB
Sobrinho and Krishnakumar [49] proposed the EQuB protocol, which allows achieving predictable behavior on shared Ethernet networks. It consists of a mechanism overlaid on the native CSMA/CD that allows real-time and non-real-time traffic to coexist on the same network while providing privileged access to the former over the latter, with a first-come, first-served (FCFS) access discipline between contending real-time sources. The collision resolution mechanism for real-time sources requires disabling the native exponential backoff mechanism of Ethernet and the capacity to transmit jamming sequences with predefined durations. Both features must be configured in the network interface card of the respective hosts, but the latter feature is not commonly supported by off-the-shelf NICs. The underlying real-time traffic model assumes that during long intervals of time, called sessions, real-time hosts continuously generate periodic streams of data to be transmitted over the network. This is the case, for example, when a host initiates the transmission of a video stream at a constant bit rate. Collisions involving non-real-time hosts only are sorted out by the native CSMA/CD mechanism of Ethernet. However, when real-time hosts participate in a collision, they transmit a jamming signal that is longer than that specified in the Ethernet MAC protocol, i.e., 32 bit times. These crafted jamming signals are called black bursts, and their maximum duration is set proportional to the time a given host has been waiting to transmit a given message, i.e., the duration of the collision resolution process. During the transmission of a black burst, the bus state is continuously monitored.
If, at some moment, a real-time host contending for the bus detects that no other nodes are sending black bursts, it infers that it is the host having the oldest ready message (highest priority according to FCFS), subsequently aborts the transmission of its own black burst, and transmits the data message immediately after. If a real-time host transmits its complete black burst and still feels the bus jammed, it infers that other hosts having longer black bursts, and consequently longer waiting times, are also disputing the bus. In this circumstance the host backs off, waiting for the bus to become idle for the duration of an interframe space (IFS). At this time, the black burst duration is recomputed to reflect the increased waiting time, and a new attempt is made to transmit the message.
Figure 20.8 illustrates the bus arbitration mechanism with two hosts having one real-time message each, 1 and 2, scheduled for transmission at instants t0 and t1, respectively, while a third data message is being transmitted. Since both hosts feel the bus busy, they wait for the end of the current message transmission plus an IFS, which occurs at instant t3. According to EQuB, both nodes attempt to transmit their message at time t3, generating a collision and starting the transmission of black bursts (t4). Since message 2 has a shorter waiting time than message 1, its black burst is completely transmitted, terminating at instant t5, and the respective host backs off, waiting for the bus to become idle again before retrying the message transmission. At that point, the winning host, which has the oldest message, detects that there are no more jamming sequences from other hosts, stops sending its own jamming signal, and immediately initiates the transmission of its data message, which happens at instant t6.
It is important to realize that non-real-time data messages always lose the arbitration against any real-time message, because real-time hosts transmit their messages right after the jamming signal without further delay, while the non-real-time messages follow the standard Ethernet backoff-and-retry mechanism (BEB). On the other hand, among real-time messages, the ones with longer waiting times are associated with longer black bursts. Thus, they are transmitted before other real-time messages with shorter waiting times, resulting in the FCFS serialization, as discussed before.
[Figure 20.8 shows the timelines of two RT hosts and the bus: RT packets 1 and 2 become ready at t0 and t1 while a data packet k is being transmitted; after the bus goes idle plus an IFS (t3), both hosts transmit, collide, and start black bursts (t4); host 2's shorter black burst ends at t5 and it backs off, while host 1 keeps jamming, stops, and transmits its RT packet at t6. The maximum black burst duration of each host is proportional to its waiting time (twait), measured from its access instant (tacc).]
FIGURE 20.8 Black burst contention resolution mechanism.
Moreover, the EQuB protocol also takes advantage of the underlying periodic model of the real-time traffic and schedules the next transmission in each host one period later with respect to the transmission instant of the current instance. Thus, in some circumstances, particularly when the message periods in all real-time hosts are equal or harmonic, the future instances of the respective messages will not collide again, leading to a high efficiency in bus utilization and to a round-robin service of real-time hosts.
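The black-burst arbitration can be sketched as follows; the slot unit and the host names are assumptions for illustration, not values or identifiers from [49].

```python
import math

def black_burst_slots(wait_time: float, slot: float) -> int:
    """Black burst length in jamming slots, proportional (via ceiling) to how
    long the host has been waiting; older messages therefore jam for longer.
    `slot` is an assumed black-burst time unit, not a value from the paper."""
    return max(1, math.ceil(wait_time / slot))

def equb_winner(wait_times: dict) -> str:
    """Among colliding RT hosts, the one with the longest waiting time (and
    hence the longest black burst) is the last still jamming: it detects an
    otherwise idle bus, aborts its burst, and transmits -> FCFS order."""
    return max(wait_times, key=wait_times.get)
```

In the Figure 20.8 scenario, host 1 has been waiting longer than host 2 at the collision instant, so its black burst outlasts host 2's and its message is transmitted first, exactly the FCFS serialization discussed above.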
20.7 Token Passing
One well-known medium access control technique suited for shared broadcast bus networks is token passing. According to this technique, there is a single token in the entire network at any instant, and only the node having possession of the token is allowed to trigger message transactions. The token is then circulated among all nodes according to an order that is protocol dependent. In the simplest and most common way, the token rotates in a circular fashion, which tends to divide the bandwidth equally among all nodes in high-traffic load conditions. For asymmetrical bandwidth distribution, some protocols allow the token to visit the same node more than once in each token round. In both cases, a basic condition for real-time operation is that the time spent by the token at each node must be bounded. This can be achieved by using a timed-token protocol [32], as in the well-known cases of FDDI, IEEE 802.4 Token Bus, and PROFIBUS. The same technique, i.e., a timed-token protocol, can be used to enforce real-time behavior on Ethernet networks, overriding the native CSMA/CD arbitration.
A common pitfall of such approaches is that token losses take time to be detected and recovered from, causing a transitory disruption in the network operation. Also, the bus access is not periodic, due to the irregularity of token arrivals, causing considerable jitter in high-rate periodic message streams. These pitfalls have been addressed by other protocols, e.g., RETHER and RT-EP, both explained below, that have substantially different token management policies. For example, in RETHER the token rotation is triggered periodically, irrespective of the traffic transmitted in each cycle. In RT-EP, on the other hand, the token is first circulated among all nodes to reach an agreement on the highest-priority message ready to be transmitted, and then it is sent directly to the respective transmitting node.
[Figure 20.9 shows the extended protocol stack: video/audio traffic with direct access, and real-time traffic alongside TCP/UDP–IP traffic above the logical link control; a QoS sublayer implementing the token-passing protocol sits between the logical link control and the medium access control (MAC) and physical layers.]
FIGURE 20.9 Extended Ethernet protocol stack for timed-token operation.
20.7.1 Timed-Token Protocols
In timed-token protocols [32], the token visits all the nodes in a fixed order, without previous knowledge of their states concerning the number or priority of ready messages. Therefore, upon token arrival, a node may have several messages ready for transmission or none. In the former case, the node transmits its ready messages while in possession of the token. In the latter case, the token is forwarded immediately. The crux of the protocol consists of enforcing an upper limit on the interval of time that a node can hold the token before forwarding it, i.e., the token-holding time. This interval of time is set dynamically upon each token arrival according to the difference between the target and the effective token rotation times. The target token rotation time is a configuration parameter with a deep impact on the temporal behavior of the system. For example, it directly influences the worst-case interval between two consecutive token visits. The effective token rotation time is the interval that actually elapses between a token arrival and the arrival of the previous one at the same node. Therefore, during each token visit, a node has more or less time to transmit messages depending on whether the token arrived early or late, respectively. In any case, a minimum transmission capacity is always granted to every node during each token visit to reduce network inaccessibility periods (synchronous bandwidth in FDDI and 802.4, or one high-priority message in PROFIBUS). Knowing the global communication requirements as well as the number of nodes in the network, it is possible to upper bound the time between two consecutive token visits to each node, thus providing an upper bound on the real-time traffic latency. The respective feasibility analysis is shown in [32] for IEEE 802.4 and in [53] for PROFIBUS. Steffen et al. [51] present an implementation of this concept on shared-media local area networks.
Although aiming particularly at shared Ethernet, the method may also be applied to networks like HomePNA [14] and Powerline [40]. The extended Ethernet protocol stack proposed in [51] is depicted in Figure 20.9. All the nodes connected to the network have a quality-of-service (QoS) sublayer (the token-passing protocol in Figure 20.9), which sits between the logical link control and the medium access control layers. The QoS sublayer overrides the native arbitration mechanism, controlling access to the bus via a token-passing mechanism. This protocol defines two distinct types of message streams: synchronous and asynchronous. Synchronous traffic is assumed to be periodic, and its maximum latency can be bounded. It is characterized by the message transmission time, period, and deadline. Asynchronous traffic is handled according to a best-effort policy, and thus no real-time guarantees are provided. Asynchronous streams are characterized by the message transmission time and desired average bandwidth.
[Figure 20.10 shows a sample network of six nodes on a shared Ethernet, each holding a queue of NRT messages; nodes 1 and 4 additionally hold RT messages, while the remaining nodes have exclusively NRT messages.]
FIGURE 20.10 Sample network configuration for RETHER.
Whenever the token arrives at a node, the synchronous messages are sent first. All nodes are granted at least a predefined synchronous bandwidth in all token visits to send this type of traffic. After the synchronous bandwidth is exhausted, a node can continue to transmit until the exhaustion of its token-holding time. After that, the token is forwarded to the next node in the circulation list. In [51], Steffen et al. present the adaptation of existing analytical tools to carry out the feasibility analysis of the real-time communication requirements.
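The token-holding rule described above can be sketched as follows; the time units are arbitrary, and the name `sync_bw` (the guaranteed synchronous allocation) is a naming assumption for this illustration.

```python
def token_holding_time(ttrt: float, last_arrival: float,
                       now: float, sync_bw: float) -> float:
    """Timed-token rule (sketch): a node may always use its synchronous
    allocation; additional (asynchronous) traffic may only use the slack
    left when the token arrives earlier than the target token rotation
    time (ttrt)."""
    effective_trt = now - last_arrival       # time since the previous visit
    slack = max(0.0, ttrt - effective_trt)   # early token -> spare capacity
    return sync_bw + slack
```

An early token (effective rotation time below the target) grants extra time for asynchronous traffic, while a late token leaves only the minimum synchronous allocation; it is this feedback that keeps the token rotation time bounded.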
20.7.2 RETHER
The RETHER protocol was proposed by Venkatramani and Chiueh [56]. This protocol operates in normal Ethernet CSMA/CD mode until the arrival of a real-time communication request, upon which it switches to Token Bus mode. In Token Bus mode, real-time data are considered to be periodic, and time is divided into cycles of fixed duration. During each cycle, access to the bus is regulated by a token. First, the token visits all nodes that are sources of real-time (RT) messages. After this phase, and if there is enough time until the end of the cycle, the token visits the sources of non-real-time (NRT) messages. An online admission control policy ensures that all accepted RT requests can always be served and that new RT requests cannot jeopardize the guarantees of existing RT messages. Therefore, in each cycle all RT nodes can send their RT messages. However, concerning the NRT traffic, no timeliness guarantees are granted.
Figure 20.10 illustrates a possible network configuration with six nodes. Nodes 1 and 4 are sources of RT messages, forming the RT set. The remaining nodes have no such RT requirements and constitute the NRT set. The token first visits all the members of the RT set and afterwards, if possible, the members of the NRT set. A possible token visit sequence could be cycle i {1 – 4 – 1 – 2 – 3 – 4 – 5 – 6}, cycle i + 1 {1 – 4 – 1 – 2}, cycle i + 2 {1 – 4 – 1 – 2 – 3 – 4}, etc. In the ith cycle the load is low enough that the token has time to visit the RT set plus all nodes in the NRT set, too. In the following cycle, besides the RT set, the token only visits nodes 1 and 2 of the NRT set, and in the next cycle, only nodes 1 through 4 of the NRT set are visited. This approach supports deterministic analysis of the worst-case network access delay, particularly for the RT traffic.
Furthermore, if the NRT traffic is known a priori, it is also possible to bound the respective network access delay, which can be important, for example, for sporadic real-time messages. However, since the bandwidth available for NRT messages is distributed according to the node order established in the token circulation list, the first nodes always get precedence over the following ones, leading to potentially very long worst-case network access delays. Moreover, this method involves a considerable communication overhead caused by the circulation of the token.
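The online admission test at the core of RETHER's guarantee can be sketched as follows; the parameter names and the single aggregate `overhead` term are simplifying assumptions, not the exact test of [56].

```python
def admit_rt_request(reserved, new_tx: float,
                     cycle: float, overhead: float) -> bool:
    """RETHER-style admission control (sketch): accept a new RT request only
    if the already reserved RT transmission times plus the new one, plus the
    token-circulation overhead, still fit within the fixed-duration cycle."""
    return sum(reserved) + new_tx + overhead <= cycle
```

Every accepted request is thus guaranteed its transmission in each cycle; NRT nodes receive only whatever time remains after the RT set has been served.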
20.7.3 RT-EP: Real-Time Ethernet Protocol
The Real-Time Ethernet Protocol (RT-EP) [34][35] is a token-passing protocol that operates over Ethernet and that was designed to be easily analyzable using well-known schedulability analysis techniques.
[Figure 20.11 shows the RT-EP node architecture: tasks use the Init_Comm, Send_Info, and Recv_Info services; Send_Info places messages in a priority TX queue and Recv_Info reads them from per-application RX queues, while the main communication thread mediates between these queues and Ethernet.]
FIGURE 20.11 RT-EP node architecture.
An RT-EP network is logically organized as a ring, each node knowing which other nodes are its predecessor and successor. The token circulates from node to node within this logical ring. Access to the bus takes place in two phases: arbitration and application message transmission. In the arbitration phase, the token visits all the nodes arranged in the logical ring to determine the one holding the highest-priority message ready for transmission. For this purpose, the token conveys a priority threshold that is initialized with the lowest priority in the system every time an application message is transmitted. Then, upon token arrival, each node compares the priority of its own ready messages, if any, with the priority encoded in the token. If any of its ready messages has a higher priority than the one encoded in the token, the token is updated. The token also carries the identity of the node that contains the highest-priority message found so far. After one token round, the arbitration phase is concluded and the token is sent directly to the node having the highest-priority ready message so that it can transmit it, i.e., the application message transmission phase. After concluding the application message transmission, the same node starts a new arbitration phase.
Internally (Figure 20.11), each node has one priority transmission queue, within which all outgoing messages are stored in priority order, and a set of reception priority queues, one for each application requesting the reception of messages. The main communication thread handles all the interaction with the network, carrying out all the protocol-related operations, namely, the arbitration and the application message transmission and reception.
User applications have access to three services:
• Init_Comm: Performs network initialization
• Send_Info: Places a message in the TX queue for transmission
• Recv_Info: Reads a message (if present) from the application RX queue
RT-EP packets are carried in the data field of the Ethernet frames. There are two distinct types of RT-EP packets: token packets and info packets. The token packets are used during the arbitration phase and contain a packet identifier, specifying the functionality of the packet; priority and station address fields, identifying the highest-priority ready message as well as the respective station ID; and a set of fields used to handle faults. The info packets carry the actual application data and contain a packet identifier field, specifying the packet's type; a priority field, which contains the priority of the message being conveyed; a channel ID field, identifying the destination queue in the receiver node; a length field, defining the message data size; an info field, carrying the actual message data; and a packet number field, which is a sequence number used for fault tolerance purposes.
The fault tolerance mechanism [35] allows recovering from message losses, including token losses, within a bounded time. This mechanism is based on forcing all stations to permanently listen to the bus. Following any transaction, the predecessor station monitors the bus, waiting for the transmission of the next frame by the receiving station. If the receiving station does not transmit any frame within a given time window, the predecessor station assumes a message loss and retransmits it. After a predefined number of unsuccessful retries, the receiving station is considered a failing station and is excluded from the logical ring. This mechanism may lead to the occurrence of message duplicates. The sequence number field, present in both the token and info packets, is used to discard duplicate messages at the receiving nodes. This protocol has been implemented on nodes running MaRTE OS, a POSIX-compliant real-time kernel.
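The arbitration phase can be sketched as one pass of the token around the logical ring. In this illustration, higher numbers mean higher priority and the node names are hypothetical; only the "best priority and station seen so far" token fields from the description above are modeled.

```python
def rtep_arbitration_round(ring, ready):
    """One RT-EP arbitration round (sketch): the token starts at the lowest
    priority in the system and visits every node in the logical ring; each
    node holding a higher-priority ready message overwrites the token's
    priority and station fields. After the full round, the token would be
    sent to the returned winner, which then transmits its message."""
    token_prio, token_station = -1, None   # lowest priority in the system
    for node in ring:
        prio = ready.get(node, -1)         # node's highest-priority ready msg
        if prio > token_prio:
            token_prio, token_station = prio, node
    return token_station, token_prio
```

Because the token always completes a full round before the winner transmits, the arbitration cost is one token circulation per message, which is exactly what makes the access delay analyzable with standard fixed-priority schedulability techniques.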
20.8 Time-Division Multiple Access
Another well-known technique to achieve predictable temporal behavior on shared communication networks is to assign exclusive time slots to distinct data sources, either nodes or devices, in a cyclic fashion. This is known as time-division multiple access (TDMA), and it implies a global synchronization framework so that all nodes agree on their respective transmission slots. Hence, this is also a collision-free medium access protocol that can be used on top of shared Ethernet to override its native CSMA/CD mechanism and prevent the negative impact of collisions. TDMA mechanisms are widely used, mainly in safety-critical applications. Examples of TDMA-based protocols include TTP/C, TT-CAN, SAFEBus, and SWIFTNET. The remainder of this section addresses two particular TDMA implementations on shared Ethernet.
20.8.1 The MARS Bus
The MARS bus was the networking infrastructure used in the MARS (Maintainable Real-Time System) architecture [24, 45] developed in the late 1980s. Soon after, the MARS bus evolved into what is nowadays the TTP/C protocol. The MARS architecture aimed at fault-tolerant distributed systems, providing active redundancy mechanisms to achieve high predictability and ease of maintenance. In MARS, all activities, including tasks and messages, are scheduled offline. The resulting schedule is then used online to trigger the system transactions at the appropriate instants in time. Interactions among tasks, either local or remote, are carried out via MARS messages. It is the role of the MARS bus to convey MARS messages between distinct nodes (cluster components). The MARS bus was based on a 10BASE2 Ethernet using standard Ethernet interface cards. A TDMA scheme was used to override Ethernet's native CSMA/CD medium access control. The TDMA round consisted of a sequence of slots of equal duration, each assigned to one node in a circular fashion. Moreover, during each slot, the tasks in each node were scheduled in a way that prevents contention between tasks on bus access [44].
20.8.2 Variable-Bandwidth Allocation Scheme The variable-bandwidth allocation scheme was proposed for Ethernet networks by Lee and Shin [27]. Basically, it is a TDMA transmission control mechanism in which the slots assigned to each node in the TDMA round can have different durations. This feature allows tailoring the bandwidth distribution among nodes according to their effective communication requirements, and thus it is more bandwidth efficient than TDMA-based mechanisms relying on equal-duration slots, as was the case in the MARS bus. Nowadays, this feature has been incorporated in most of the existing TDMA-based protocols, e.g., TTP/C and TT-CAN, improving their bandwidth efficiency. Moreover, this technique also encompasses the possibility of changing the system configuration online, namely, adding or removing nodes, a feature that is sometimes referred to as flexible TDMA (FTDMA) [57]. The nomenclature of [27] uses the term frame to refer to the TDMA round. Both the frame duration (frame time, F) and the slot durations (slot times, Hi) are computed according
© 2005 by CRC Press
The Industrial Communication Technology Handbook
FIGURE 20.12 The structure of a TDMA frame: a control slot (Tc) followed by the slot times H1, H2, …, Hn, separated by interslot times, within one frame time.
to the specific traffic characteristics. The first slot in each frame (Tc) is reserved for control purposes, such as time synchronization and addition or deletion of nodes. The structure of a TDMA frame is depicted in Figure 20.12. The transmission of the control slot, Tc, as well as the interslot times, represents communication overhead. The interslot time must be sufficient to accommodate a residual global clock inaccuracy and to allow nodes to process incoming messages before the start of the following slot. In their work, the authors derive a set of necessary conditions that a given allocation scheme f has to fulfill to compute both the frame duration (F) and the slot durations (Hi) from the communication requirements, i.e., message transmission times (Ci), periods (Pi), and system overhead (g): f: ({Ci}, {Pi}, g) → ({Hi}, F). Based on those conditions, the authors present an algorithmic approach for carrying out the computation of F and Hi and compare the results of this methodology with other TDMA approaches, namely, MARS. The results obtained show the improvement in bandwidth utilization that may be achieved with this variable-bandwidth allocation scheme.
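To make the mapping f concrete, here is a deliberately simplified allocation in Python. It is not Lee and Shin's algorithm; it merely illustrates the shape of the inputs and outputs: each source gets a slot equal to its transmission time plus the interslot overhead, the frame time is set to the smallest period so every message is served at least once per period, and the allocation is rejected if the slots (plus the control slot Tc) do not fit in the frame.

```python
# A deliberately simplified sketch of an allocation f: ({Ci},{Pi},g) -> ({Hi},F).
# NOT Lee and Shin's algorithm: it grants each source a slot equal to its
# transmission time plus the interslot overhead, and picks the frame time as
# the largest value that still serves every period.

def allocate(C, P, g, Tc):
    """C: transmission times, P: periods (same units), g: interslot overhead,
    Tc: control-slot duration. Returns (H, F) or None if infeasible."""
    H = [c + g for c in C]                 # slot = payload + overhead
    F = min(P)                             # one slot per source per frame
    busy = Tc + g + sum(H)                 # control slot + all data slots
    return (H, F) if busy <= F else None   # frame must fit within F
```

With C = [10, 20], P = [100, 200], g = 2, and Tc = 5, the sketch yields H = [12, 22] and F = 100, leaving the remainder of the frame idle.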
20.9 Master–Slave Techniques One of the simplest ways of enforcing real-time communication over a shared broadcast bus, including Ethernet, consists of using a master–slave approach, in which a special node, the master, controls the access of all other nodes, the slaves, to the medium. The traffic timeliness is then reduced to a scheduling problem that is local to the master. However, this approach typically leads to considerable underexploitation of the network bandwidth because every data message must be preceded by a control message issued by the master, resulting in substantial communication overhead. Moreover, there is some extra overhead related to the turnaround time, i.e., the time that must elapse between consecutive messages, since every node must fully receive and decode the control message before transmitting the respective data message. Nevertheless, it is a rugged transmission control strategy that has been used in many protocols. This section describes two examples: ETHERNET Powerlink [9] and flexible time-triggered (FTT) Ethernet [39]. The case of FTT-Ethernet deserves particular attention because it implements a variant of the master–slave technique, called the master–multislave approach [1], that allows a substantial reduction in the protocol communication overhead: the bus time is broken into cycles and the master issues only one control message per cycle, indicating which data messages must be transmitted therein. This mechanism has been developed within the FTT communication framework [8] and has been implemented over different network protocols, such as Controller Area Network [1] and Ethernet [39].
20.9.1 FTT-Ethernet Protocol The FTT-Ethernet protocol [39] combines the master–multislave transmission control technique with centralized scheduling, maintaining both the communication requirements and the message scheduling policy localized in one single node, the master, and facilitating online changes to both, thus supporting a high level of operational flexibility.
Approaches to Enforce Real-Time Behavior in Ethernet
FIGURE 20.13 FTT-Ethernet traffic structure: consecutive elementary cycles (EC[i], EC[i+1]), each starting with a trigger message (TM) followed by a synchronous window carrying the scheduled messages (e.g., SM1, SM3, SM8, SM9) and an asynchronous window carrying control (CM) and non-real-time (NRT) messages.
The bus time is divided into fixed-duration time slots called elementary cycles (ECs), each further decomposed into two phases with different characteristics, the synchronous and asynchronous windows (Figure 20.13). The synchronous window carries the periodic time-triggered traffic that is scheduled by the master node. The expression time triggered implies that this traffic is synchronized to a common time reference, which in this case is imposed by the master. The asynchronous window carries the sporadic traffic related to protocol control messages, event-triggered messages, or non-real-time traffic in general. There is strict temporal isolation between the two phases, so that the sporadic traffic does not interfere with the time-triggered one. Despite allowing online changes to the attributes of the time-triggered traffic, the FTT-Ethernet protocol enforces global timeliness using online admission control. Due to the global knowledge and centralized control of the time-triggered traffic, the protocol supports arbitrary scheduling policies (e.g., rate monotonic (RM) and earliest deadline first (EDF)) and may easily support dynamic QoS management complementary to admission control. Beyond the flexibility and timeliness properties that this protocol exhibits, there is also a drawback: the computational overhead required in the master to execute both the message scheduling and the schedulability analysis online. This overhead is, however, confined to one node; as far as the communication protocol is concerned, the slaves only need enough computational power to decode the trigger message in time and start the due transmissions at the right moments. Finally, in safety-critical applications the master must be replicated, for which there are specific mechanisms to ensure coherency between the replicas’ internal databases holding the system communication requirements.
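The master's per-cycle work can be sketched as follows (an illustrative Python fragment; the message attributes, the EDF ordering on relative deadlines, and the window accounting are assumptions, not the actual FTT-Ethernet encoding): in each EC the master selects the periodic messages that are ready, orders them by earliest deadline, and packs as many as fit in the synchronous window into the trigger message.

```python
# Illustrative sketch of how an FTT-style master might build the trigger
# message for one elementary cycle: pick ready periodic messages by EDF
# until the synchronous window is full.

def build_trigger(ec_index, msgs, sync_window):
    """msgs: list of dicts with 'id', 'period_ec', 'tx_time', 'deadline_ec'.
    Returns the ids scheduled for this EC, earliest deadline first."""
    ready = [m for m in msgs if ec_index % m["period_ec"] == 0]
    ready.sort(key=lambda m: m["deadline_ec"])        # EDF ordering
    used, trigger = 0.0, []
    for m in ready:
        if used + m["tx_time"] <= sync_window:        # fit in sync window
            used += m["tx_time"]
            trigger.append(m["id"])
    return trigger
```

Because the requirements table lives only in the master, changing a period or admitting a new message is a local update, which is the source of the operational flexibility noted above.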
20.9.2 ETHERNET Powerlink ETHERNET Powerlink [9] is a commercial protocol providing deterministic isochronous real-time communication, operating over hub-based Fast Ethernet networks. A more recent version (version 2.0) also allows operation over switched Ethernet networks, but only for applications with more relaxed temporal constraints. The protocol supports both periodic (isochronous) and event-driven (asynchronous) data exchanges, and when implemented over hubs, it also provides very tight time synchronization (accuracy better than 1 µs) and fast update cycles (on the order of 500 µs) for the periodic traffic. From architectural and functional points of view, this protocol bears many resemblances to the WorldFIP fieldbus. The ETHERNET Powerlink protocol uses a master–slave transmission control technique, which completely prevents the occurrence of collisions at the bus access [10]. The network architecture is asymmetric, composed of a so-called Powerlink manager (master) and a set of Powerlink controllers (slaves). The former controls all the communication activities, assigning time slots to all the remaining stations. The latter are passive bus stations, sending information only after an explicit request from the manager. The Powerlink protocol operates isochronously, with the data exchanges occurring in a cyclic framework based on a microcycle of fixed duration, i.e., the Powerlink cycle. Each cycle is divided into four distinct phases: the start, cyclic, asynchronous, and idle periods (Figure 20.14).
FIGURE 20.14 Powerlink cycle structure: a start-of-cycle message from the manager opens the cycle; the cyclic period carries poll request/response pairs between manager and controllers; an end-of-cycle message opens the asynchronous period (invite/send); and an idle period pads the remainder of the cycle time.
A Powerlink cycle starts with a start-of-cycle message, sent by the manager. This is a broadcast message that announces to the controllers that a new cycle is starting and thus allows them to prepare the necessary data. The start period is followed by the cyclic period, in which the controllers transmit the isochronous traffic. The transactions carried out in this period (window) are fully controlled by the manager, which issues poll requests (PollRequest) to the controllers. Upon reception of a PollRequest, the controller responds by transmitting the corresponding data message (PollResponse). The PollRequest message is a unicast message, directly addressed to the controller node involved in the transaction. The corresponding PollResponse is a broadcast message, thus facilitating the distribution of data among all system nodes that may need it (producer–distributor–consumer communication model). Isochronous messages may be issued every cycle or every given number of cycles, according to the application communication requirements. After completing all isochronous transactions of one cycle, the manager transmits an end-of-cycle message, signaling the end of the cyclic period. Asynchronous transactions may be carried out between the end of the cyclic period and the end of the Powerlink cycle. These may be asynchronous data messages (invite/send) or management messages, like Ident/AsyncSend, issued by the manager to detect active stations. Since these transactions are still triggered by the Powerlink manager, any node having asynchronous data to send must first notify the manager of that fact. This is performed during an isochronous transaction involving that particular node, using piggybacked signaling in the respective PollResponse message. The manager maintains a set of queues for the different asynchronous request sources and schedules the respective transactions within the asynchronous period, if there is enough time up to the end of the cycle.
If there is not enough time to complete a given asynchronous transaction, or no asynchronous transaction is scheduled, the protocol inserts idle time in the cycle (idle period) in order to strictly respect the period of the start-of-cycle message. ETHERNET Powerlink also handles Ethernet packets carrying foreign protocols, such as TCP/IP. This traffic is conveyed within the asynchronous period, and Powerlink provides a special-purpose device driver that interfaces with such upper-protocol stacks.
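Two aspects of the cycle just described lend themselves to a minimal model (Python; the multiplexing rule "poll node n every k-th cycle" and the function names are illustrative assumptions, not Powerlink's actual encoding): which controllers are polled in a given cycle, and how much idle time must pad the cycle so the start-of-cycle period is kept.

```python
# Rough model of a Powerlink-style cycle from the manager's point of view.

def polled_in_cycle(k, controllers):
    """controllers: list of (node_id, every_n_cycles). Returns the node ids
    whose isochronous data is requested in cycle k (multiplexed polling)."""
    return [nid for nid, n in controllers if k % n == 0]

def idle_time(cycle_time, start_p, cyclic_p, async_p):
    """Idle padding so that start-of-cycle messages stay strictly periodic."""
    return max(0.0, cycle_time - (start_p + cyclic_p + async_p))
```

For example, with controllers [(1, 1), (2, 2), (3, 4)], node 1 is polled every cycle, node 2 every second cycle, and node 3 every fourth cycle.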
20.10 Switched Ethernet For roughly a decade, interest in using Ethernet switches has been growing, as a means to improve global throughput and traffic isolation and to reduce the impact of the nondeterministic features of the original CSMA/CD arbitration mechanism. Switches, unlike hubs, provide a private collision domain for each of their ports; i.e., their ports are not directly connected to each other. When a message arrives at a switch port, it is buffered, analyzed concerning its destination, and moved to the buffer of the destination port (Figure 20.15). The packet-handling block in the figure, commonly referred to as the switch fabric, transfers messages from input to output ports. When the arrival rate of messages at a port, either input or output, is greater than the rate of departure, the messages are queued. Currently, most switches are fast enough at handling message arrivals that queues do not build up at the input ports
FIGURE 20.15 Switch internal architecture: receiving buffers at the input ports feed a packet-handling block (address lookup, traffic classification), which moves frames to scheduled output queues at the output ports.
(these are commonly referred to as nonblocking switches). However, queues may always build up at the output ports whenever several messages arrive within a short interval and are routed to the same port. In such a case, queued messages are transmitted sequentially, normally in FCFS order. This queue-handling policy may, however, lead to substantial network-induced delays, because higher-priority or more important messages may be blocked in the queue while lower-priority or less important ones are being transmitted. Therefore, the use of several parallel queues for different priority levels has been proposed (formerly IEEE 802.1p, now integrated within IEEE 802.1D). The number of distinct priority levels is limited to eight, but many current switches that support traffic prioritization offer even fewer. The scheduling policy used to handle the messages queued at each port also strongly impacts the network timing behavior [22]. A common misconception is that the use of switches, thanks to the elimination of collisions, is enough to enforce real-time behavior in Ethernet networks. This is not true in the general case. For instance, if a burst of messages destined to the same port arrives at the switch, output queues can overflow, thus losing messages. This situation, despite seeming somewhat unrealistic, can occur with nonnegligible probability in certain communication protocols based on the producer–consumer model, e.g., the Common Industrial Protocol (CIP) and its lower-level protocols such as EtherNet/IP (Industrial Protocol) [36], or on the publisher–subscriber model, such as RTPS [43] used within the Interface for Distributed Automation (IDA). In fact, according to these models, each node that produces a given datum (producer or publisher) transmits it to potentially several nodes (consumers or subscribers) that need it. This model is efficiently supported in Ethernet by means of special addresses, called multicast addresses.
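The prioritized output queues can be modeled as eight strict-priority FIFOs per port, as in this minimal Python sketch (the class and its strict-priority service discipline are illustrative; real switches may apply other disciplines, such as weighted round-robin, across the classes):

```python
# Minimal model of an output port with eight strict-priority queues
# (IEEE 802.1D user priorities 0..7): a frame is taken from the highest
# non-empty priority, and frames of equal priority leave in FCFS order.

from collections import deque

class PriorityPort:
    def __init__(self, levels=8):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, frame, priority):
        self.queues[priority].append(frame)

    def dequeue(self):
        for q in reversed(self.queues):   # priority 7 is served first
            if q:
                return q.popleft()
        return None                       # port idle
```

Even with such queues, the model makes the overflow problem visible: nothing bounds the length of any queue, so a burst routed to one port still accumulates there until memory runs out.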
Each network interface card can define the multicast addresses related to the information that it should receive. However, the switch has no knowledge of such addresses and thus treats all multicast traffic as broadcast; i.e., messages with multicast destination addresses are transmitted to all ports (flooding). Therefore, when the predominant type of traffic is multicast/broadcast instead of unicast, one can expect a substantial increase in the peak traffic at each output port, which increases the probability of queue overflow and degrades the network performance. Furthermore, in these circumstances, one of the main benefits of using switched Ethernet, i.e., multiple simultaneous transmission paths, can be compromised. A possible way to limit the impact of multicasts is to use virtual LANs (VLANs), so that flooding affects only the ports of the respective VLAN [36]. Other problems concerning the use of switched Ethernet are discussed in [5], such as the additional latency introduced by the switch in the absence of collisions, as well as the low number of available priority levels, which hardly supports the implementation of efficient priority-based scheduling. These problems are, however, essentially technological and are expected to be eliminated in the near future. Moreover, switched Ethernet does alleviate the nondeterminism inherent in CSMA/CD medium access control and opens the way to efficient implementations of real-time communication over Ethernet. The remainder of this section presents two protocols that operate over switched Ethernet to support real-time communication.
FIGURE 20.16 System architecture for EDF-based switched Ethernet: end nodes run application protocols and RT channel management over UDP/TCP/IP, with a real-time layer (RT and NRT queues) above the Ethernet MAC/PHY; the switch runs the real-time layer on each of its ports.
20.10.1 EDF Scheduled Switch Hoang et al. [12][13] developed a technique that supports a mix of real-time (RT) and non-real-time (standard IP) traffic coexisting in a switch-based Ethernet network. The RT traffic is scheduled according to the earliest-deadline-first policy, and its timeliness is guaranteed by means of adequate online admission control. The proposed system architecture, depicted in Figure 20.16, requires the addition of a real-time layer (RT-l) to the network components, both the end nodes and the switch. The RT-l is responsible for establishing real-time connections, performing admission control, providing time synchronization, and managing the transmission and reception of both real-time and non-real-time traffic classes. The switch RT channel management layer provides time synchronization by periodically transmitting a time reference message. Moreover, this layer also takes part in the admission control process, by assessing the internal state of the switch, and consequently its ability to fulfill the timeliness requirements of the real-time message streams, and by acting as a broker between the nodes requesting RT channels and the targets of such requests. Finally, this layer also disseminates the internal switch state, namely the status of the queues, to allow flow control of non-real-time traffic on the end nodes. Real-time communication is carried out within real-time channels: point-to-point logical connections with reserved bandwidth. Whenever a node needs to send real-time data, it issues a request to the switch, indicating the source and destination addresses (both MAC and IP) and the period, transmission time, and deadline of the message. Upon reception of such a request, the switch performs the first part of the admission control mechanism, which consists of evaluating the feasibility of the communication between the source node and the switch (uplink) and between the switch and the target node (downlink).
If the switch finds the request feasible, it forwards the request to the destination node. The target node analyzes the request and informs the switch whether it accepts the real-time connection. The switch then forwards this answer to the originator node. If the RT channel is accepted, it is assigned a systemwide channel ID that uniquely identifies the connection. The real-time layer comprises two distinct queues, one for real-time traffic and the other for non-real-time traffic. The former is a priority queue, where messages are kept sorted by distance to their deadlines. The non-real-time queue holds the messages in a first-in, first-out scheme. Thus, real-time messages are transmitted according to their deadlines, while non-real-time messages are transmitted according to their arrival instants. The feasibility analysis proposed in [13] is derived from EDF task scheduling analysis, but with adaptations to account for some system specifics, such as the overheads due to control messages and the impact of nonpreemptive message transmission. In the scope of that work, deadlines are defined on an end-to-end basis. Since the traffic is transmitted in two separate steps (uplink and downlink), the analysis must ensure that the total delay induced by these steps together does not exceed the total end-to-end deadline. For a given real-time message stream i, if d_i^u is the deadline for the uplink and d_i^d the deadline for the downlink, then the end-to-end deadline
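The essence of the per-link admission test and the deadline-sorted RT queue can be sketched as follows (Python; the plain utilization bound is a simplification, since the analysis in [13] also accounts for control-message overhead and nonpreemptive transmission, which are omitted here):

```python
# Sketch of the per-link admission test an EDF-scheduled switch might run:
# accept a new channel (C: transmission time, P: period) only if the total
# link utilization stays at or below 1. Simplified relative to [13].

import bisect

def admit(channels, C, P):
    """channels: list of (C, P) pairs already admitted on this link."""
    u = sum(c / p for c, p in channels) + C / P
    return u <= 1.0

def enqueue_edf(queue, msg, deadline):
    """Keep the RT queue sorted by absolute deadline (earliest first)."""
    bisect.insort(queue, (deadline, msg))
```

Transmission then simply pops the head of the sorted queue, which is exactly the "sorted by distance to their deadlines" behavior described above.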
FIGURE 20.17 Connection setup procedure in the EtheReal architecture: a sender application on host A issues a request to its real-time communication daemon (RTCD), which is forwarded hop by hop through EtheReal switches M and N toward host B, and the result is propagated back to the sender.
d_i^ee must be at least as large as the sum of the two: d_i^u + d_i^d ≤ d_i^ee. In [12], the authors assume an end-to-end deadline equal to the period of the respective message stream and a symmetric partitioning of that deadline between the uplink and downlink. An improvement is presented in [13], where the authors propose an asymmetric deadline partitioning scheme. Although more complex, this method allows higher efficiency in bandwidth utilization, because a larger fraction of the deadline can be assigned to the more loaded links, thus increasing the overall schedulability level.
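The asymmetric idea can be illustrated with a load-proportional split (Python; the proportionality rule is an assumption chosen for illustration, not the exact scheme of [13]): the busier link receives the larger share of the end-to-end deadline.

```python
# Illustrative asymmetric deadline partitioning: split the end-to-end
# deadline between uplink and downlink in proportion to each link's load.

def partition_deadline(d_ee, load_up, load_down):
    """Returns (d_u, d_d) with d_u + d_d == d_ee."""
    total = load_up + load_down
    d_u = d_ee * load_up / total
    return d_u, d_ee - d_u          # remainder goes to the downlink
```

For instance, with an uplink three times as loaded as the downlink, the uplink receives three quarters of the end-to-end deadline, relaxing the feasibility test on the bottleneck link.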
20.10.2 EtheReal The EtheReal protocol [54] is another proposal to achieve real-time behavior on switched Ethernet networks. In this approach, the protocol is supported by services implemented on the switch only, without any changes in the operating system and network layers of end nodes. The switch services are accessible to the end nodes by means of user-level libraries. EtheReal has been designed to support both real-time and non-real-time traffic via two distinct classes. The real-time variable-bit-rate (RT-VBR) service class is meant to support real-time applications. These services use reserved bandwidth and try to minimize the packet delay and packet delay variation (jitter). Applications must provide the desired traffic characteristics during the connection setup, namely, average traffic rate and maximum burst length. If these parameters are violated at runtime, the real-time guarantees do not hold and packets may be lost. The second service class is best effort (BE); it was developed specifically to support existing non-real-time applications like telnet, HTTP, etc., without requiring any modification. No guarantees are provided for this type of traffic. Real-time services in EtheReal are connection oriented, which means that applications have to follow a connection setup protocol before being able to send data to the real-time channels. The connection setup procedure is started by sending a reservation request to a user-level process called real-time communication daemon (RTCD), running on the same host (Figure 20.17). This daemon is responsible for the setup and teardown of all connections in which the host node is engaged. The reservation request for RT connections contains the respective QoS requirements: average traffic rate and maximum burst length. 
Upon reception of a connection setup request, the RTCD contacts the neighbor EtheReal switch that evaluates whether it has enough resources to meet the QoS requirements of the new RT connection without jeopardizing the existing ones, namely, switch fabric bandwidth, CPU bandwidth for packet scheduling, and data buffers for packet queuing. If it has such resources and if the destination node is directly attached to the same switch, it positively acknowledges the request. If the destination node is in another segment, i.e., connected to another switch, the switch that received the request forwards it to the next switch in the path. A successful connection is achieved if and only if all the switches in the path between the source and target node have enough resources to accommodate the new RT connection. If one switch does not have enough resources, it sends back a reject message, which is propagated down to the requestor node. This procedure serves to notify the requestor application about the result of the
operation, as well as to let the intermediate EtheReal switches de-allocate the resources associated with that connection request. The EtheReal architecture employs traffic shaping and policing, within both hosts and switches. Traffic shaping is performed to smooth the interpacket arrival time, generating a constant-rate flow of traffic. Traffic policing is used to ensure that the declared QoS parameters are met at runtime. These functions are also implemented in the switches to ensure that an ill-behaved node, due to either malfunction or malicious software, does not harm the other connections on the network. With respect to packet scheduling inside the switch, the EtheReal architecture employs a cyclic round-robin scheduling algorithm. All real-time connections are served within a predefined cycle. A part of that cycle is also reserved for best-effort traffic, to avoid starvation and subsequent time-outs in the upper-layer protocols. Applications access the real-time services by means of a real-time data transmission/reception (RTTR) library, which provides services for connection setup and teardown and for data transmission and reception, in addition to internal functions already mentioned, such as traffic shaping and policing. Another interesting feature of this protocol is its scalability and high recovery capability when compared with standard switches. For example, the spanning tree protocol (IEEE 802.1D) is used in networks of standard switches to allow redundant paths and automatic reconfiguration upon a link/switch failure. However, such reconfiguration may take several tens of seconds with the network down, typically around 30 s, which is intolerable for most real-time applications. On the other hand, the authors claim that EtheReal networks may recover nearly three orders of magnitude faster, within 32 ms [55].
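The declared QoS parameters, average traffic rate and maximum burst length, map naturally onto a token bucket; the following Python sketch illustrates policing in that style (the class is an assumption for illustration, not EtheReal's actual implementation):

```python
# Token-bucket sketch of traffic policing: a flow declared with an average
# rate (tokens per second) and a maximum burst (bucket depth) is conformant
# only while enough tokens are available.

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.t = burst, 0.0       # bucket starts full

    def conformant(self, now, size):
        """True if a packet of `size` bytes at time `now` meets the contract."""
        self.tokens = min(self.burst, self.tokens + (now - self.t) * self.rate)
        self.t = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

A switch applying such a check per connection can drop or downgrade nonconformant packets, which is how an ill-behaved node is prevented from harming other connections.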
20.11 Recent Advances The interest in Ethernet and its use to support real-time communication continues to grow in the industrial domain, in embedded systems, and even in LANs that support QoS-sensitive distributed applications (e.g., video conferencing, Voice-over-IP (VoIP)). This growing interest also motivates a substantial research effort toward solving current limitations and improving the real-time performance of Ethernet-based communication systems. This section summarizes a few of the latest results related to the Ethernet technologies presented in this chapter, dealing with issues such as protocol stack implementation, switched topologies, traffic scheduling within the switches, and shared Ethernet. The way packets are handled by the protocol software (protocol stack) within the system nodes is one of the issues recently addressed in the literature. Most operating systems implement a single queue, usually with a first-come, first-served policy, for both real-time and non-real-time traffic. This approach causes priority inversions and induces unforeseen delays. A methodology recently proposed to solve this problem is to implement multiple transmit/receive queues [48]. With this approach, the real-time traffic is intrinsically separated from the non-real-time traffic, and the latter is sent/processed only when the real-time queues are empty. It is also possible to build separate queues for each traffic class, providing internal priority-aware scheduling. The topology of a switched Ethernet network is another important issue that has recently received a great deal of attention. In fact, the topology has an impact on the number of switches that messages have to cross before reaching the target, which in turn affects the temporal properties of the traffic.
For instance, the line topology proposed in [23], in which each device integrates a simplified switch and all devices are chained in a line, eases the cabling, but it is not the most suitable topology for real-time behavior and fault tolerance. On the other hand, the work in [42], and more recently in [6], proposes using a tree topology with only two levels. An optimization algorithm decides the allocation of nodes to switches so that all time constraints are met, taking into account the whole traffic in each branch of the tree, i.e., in each switch. This topology favors the real-time behavior of the system but leads to more complex cabling. However, neither of these topologies considers redundant paths for improved fault tolerance. This issue is addressed in [55], in which a variant of the spanning
tree protocol is proposed that is capable of managing redundant paths with recovery times on the order of a few tens of milliseconds, a magnitude that is compatible with the time constraints of a large set of practical applications. Another aspect of switch-based Ethernet networks concerns the scheduling policy within the switch itself. Switches support up to eight distinct statically prioritized traffic classes, and different message scheduling strategies have a strong impact on the real-time behavior of the switch [23]. In particular, strategies oriented toward average performance and fairness, which are relevant for general-purpose networks, may have a negative impact on the switch real-time performance. Recent research presented in [13] and [55] addresses the use of different scheduling policies within switches, namely, earliest deadline first (EDF) and modified round-robin with fixed-duration cycles, respectively. Both works also address admission control to provide timeliness guarantees to the current real-time traffic, while supporting online channel setup and teardown. On the other hand, the interest in shared Ethernet continues, either for applications requiring frequent multicasting, in which case the benefits of using switches are substantially reduced, or for applications requiring precise control of transmission timing, such as high-speed servoing. In fact, switches induce higher delay and jitter in message forwarding than hubs, caused by internal mechanisms such as MAC-address-to-port translation during forwarding. In the previous sections, several recent enhancements to shared Ethernet were discussed, such as the work on adaptive traffic smoothing [31] and master–slave techniques, including both the ETHERNET Powerlink [9] and FTT-Ethernet [39] protocols. Nevertheless, these two protocols can also operate over switches, incurring additional forwarding delays in the process.
In the latter protocol, the switch-based implementation may take advantage of the message queuing in the switch ports to simplify the transmission control. Several existing Ethernet-based industrial protocols, such as Ethernet/IP, also take advantage of switches to improve their real-time capabilities [11][36]. Particularly, Ethernet/IP is receiving considerable support from major international associations of industrial automation suppliers, such as Open DeviceNet Vendor Association (ODVA), ControlNet International (CNI), Industrial Ethernet Association (IEA), and Industrial Automation Open Networking Alliance (IAONA).
20.12 Conclusion Ethernet is the most popular technology for LANs today. Due to its low cost, high availability, and easy integration with other networks, among other characteristics, Ethernet has become an attractive option in application domains for which it was not originally designed. Some of these application domains, e.g., industrial automation, impose real-time constraints on the communication services that must be delivered to the applications. This requirement conflicts with the medium access control technique originally embedded in the protocol, CSMA/CD, which is nondeterministic and behaves very poorly under medium to high network loads. Therefore, many adaptations and technologies for Ethernet have been proposed to support the desired real-time behavior. This chapter presented an overview of some paradigmatic techniques, ranging from changes to the bus arbitration, to the addition of transmission control layers, to the use of special networking equipment such as switches. These techniques were described and briefly analyzed with respect to their pros and cons in different types of applications, followed by a brief summary of the latest results. With the current trend to bring Ethernet into the world of distributed automation systems, it is likely that Ethernet (and its variants based on different solutions to support hard real-time and deterministic behavior, among other requirements) will establish itself as the de facto communication standard for this area. Although its efficiency in terms of bandwidth utilization is still low for short messages, particularly compared with the majority of fieldbuses, its already high and increasing bandwidth seems more than enough to compensate for this deficiency. Ethernet thus has a chance to become the single networking technology within automation systems, supporting the integration of all levels, from the plant floor to management, maintenance, the supply chain, etc.
© 2005 by CRC Press
The Industrial Communication Technology Handbook
References

[1] Almeida, L., Pedreiras, P., and Fonseca, J.A. The FTT-CAN protocol: why and how. IEEE Transactions on Industrial Electronics, 49, 2002.
[2] Tanenbaum, A.S. Computer Networks, 4th edition. Prentice Hall, Englewood Cliffs, NJ, 2002.
[3] Court, R. Real-time Ethernet. Computer Communications, 15, 198–201, 1992.
[4] Data Distribution Service for Real-Time Systems Specification, Final Adopted Specification ptc/03-03-07. Object Management Group, Inc., July 2003.
[5] Decotignie, J.-D. A perspective on Ethernet as a fieldbus. In Proceedings of FeT 2001: 4th International Conference on Fieldbus Systems and Their Applications, pp. 138–143. Nancy, France, November 2001.
[6] Divoux, T., Georges, J.P., Krommenacker, N., and Rondeau, E. Designing suitable switched Ethernet architectures regarding real-time application constraints. In Proceedings of INCOM 2004 (11th IFAC Symposium on Information Control Problems in Manufacturing). Salvador, Brazil, April 2004.
[7] DIX Ethernet V2.0 specification, 1982.
[8] FTT Web page. Available at http://www.ieeta.pt/lse/ftt.
[9] ETHERNET Powerlink protocol. Available at www.ethernet-powerlink.org.
[10] ETHERNET Powerlink Data Transport Services White-Paper Ver. 0005. Bernecker + Rainer Industrie-Elektronic Ges.m.b.H., September 2002. Available at http://www.ethernet-powerlink.org.
[11] Ethernet/IP (Industrial Protocol) specification. Available at www.odva.org.
[12] Hoang, H., Jonsson, M., Hagstrom, U., and Kallerdahl, A. Switched real-time Ethernet with earliest deadline first scheduling: protocols and traffic handling. In Proceedings of WPDRTS 2002: 10th International Workshop on Parallel and Distributed Real-Time Systems. Fort Lauderdale, FL, April 2002.
[13] Hoang, H. and Jonsson, M. Switched real-time Ethernet in industrial applications: asymmetric deadline partitioning scheme. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA '03. Porto, Portugal, July 2003.
[14] Home Phoneline Association. Available at http://www.homepna.org.
[15] IEEE 802.3 10BASE5 standard.
[16] IEEE 802.3 10BASE2 standard.
[17] IEEE 802.3c 1BASE5 StarLAN standard.
[18] IEEE 802.3i 10BASE-T standard.
[19] IEEE 802.3u 100BASE-T standard.
[20] IEEE 802.3z 1000BASE-X standard.
[21] IEEE 802.3ae-2002, 10 Gbit/s standard.
[22] Jasperneite, J. and Neumann, P. Switched Ethernet for factory communication. In Proceedings of ETFA 2001: 8th IEEE International Conference on Emerging Technologies and Factory Automation. Antibes, France, October 2001.
[23] Jasperneite, J., Neumann, P., Theis, M., and Watson, K. Deterministic real-time communication with switched Ethernet. In Proceedings of WFCS '02: 4th IEEE Workshop on Factory Communication Systems, pp. 11–18. Västeras, Sweden, August 2002.
[24] Kopetz, H., Damm, A., Koza, C., Mulazzani, M., Schwabl, W., Senft, C., and Zainlinger, R. Distributed fault-tolerant real-time systems: the MARS approach. IEEE Micro, 9, 25–40, 1989.
[25] Kweon, S.-K., Shin, K.G., and Zheng, Q. Statistical real-time communication over Ethernet for manufacturing automation systems. In Proceedings of the 5th IEEE Real-Time Technology and Applications Symposium. June 1999.
[26] Kweon, S.-K., Shin, K.G., and Workman, G. Achieving real-time communication over Ethernet with adaptive traffic smoothing. In Proceedings of RTAS '00: 6th IEEE Real-Time Technology and Applications Symposium, pp. 90–100. Washington, DC, June 2000.
[27] Lee, J. and Shin, H. A variable bandwidth allocation scheme for Ethernet-based real-time communication. In Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications, pp. 28–33. Tokyo, Japan, October 1995.
Approaches to Enforce Real-Time Behavior in Ethernet
[28] LeLann, G. and Rivierre, N. Real-Time Communications over Broadcast Networks: The CSMA-DCR and the DOD-CSMA-CD Protocols, INRIA Report RR1863. 1993.
[29] Lo Bello, L., Lorefice, M., Mirabella, O., and Oliveri, S. Performance analysis of Ethernet networks in the process control. In Proceedings of the 2000 IEEE International Symposium on Industrial Electronics. Puebla, Mexico, December 2000.
[30] Lo Bello, L. and Mirabella, O. Design issues for Ethernet in automation. In Proceedings of FeT 2001: 4th FeT IFAC Conference. Nancy, France, 2001.
[31] Lo Bello, L., Mirabella, O., et al. Fuzzy traffic smoothing: an approach for real-time communication over Ethernet networks. In Proceedings of WFCS 2002: 4th IEEE Workshop on Factory Communication Systems. Västeras, Sweden, August 2002.
[32] Malcolm, N. and Zhao, W. The timed-token protocol for real-time communications. IEEE Computer, 27, 35–41, 1994.
[33] Malcolm, N. and Zhao, W. Hard real-time communications in multiple-access networks. Real-Time Systems, 9, 75–107, 1995.
[34] Martínez, J., Harbour, M., and Gutiérrez, J. A multipoint communication protocol based on Ethernet for analyzable distributed applications. In Proceedings of the 1st International Workshop on Real-Time LANs in the Internet Age, RTLIA '02. Vienna, Austria, 2002.
[35] Martínez, J., Harbour, M., and Gutiérrez, J. RT-EP: real-time Ethernet protocol for analyzable distributed applications on a minimum real-time POSIX kernel. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA '03. Porto, Portugal, July 2003.
[36] Moldovansky, A. Utilization of modern switching technology in Ethernet/IP networks. In Proceedings of the 1st International Workshop on Real-Time LANs in the Internet Age, RTLIA '02. Vienna, Austria, 2002.
[37] Molle, M. and Kleinrock, L. Virtual time CSMA: why two clocks are better than one. IEEE Transactions on Communications, COM-33, 919–933, 1985.
[38] Pardo-Castellote, G., Schneider, S., and Hamilton, M. NDDS: The Real-Time Publish-Subscribe Middleware. Real-Time Innovations, Inc., Sunnyvale, CA, August 1999. Available at http://www.rti.com/products/ndds/literature.html.
[39] Pedreiras, P., Gai, P., and Almeida, L. The FTT-Ethernet protocol: merging flexibility, timeliness and efficiency. In Proceedings of the 14th Euromicro Conference on Real-Time Systems, pp. 152–160. Vienna, Austria, 2002.
[40] Powerline Alliance. Available at http://www.powerlineworld.com.
[41] Real-Time Innovations, Inc. Can Ethernet Be Real-Time? Available at http://www.rti.com/products/ndds/literature.html.
[42] Rondeau, E., Divoux, T., and Adoud, H. Study and method of Ethernet architecture segmentation for industrial applications. In Proceedings of the 4th IFAC Conference on Fieldbus Systems and Their Applications, pp. 165–172. Nancy, France, November 2001.
[43] RTPS (Real-Time Publisher–Subscriber protocol), part of the IDA (Interface for Distributed Automation) specification. Available at www.ida-group.org.
[44] Schwabl, W., Reisinger, J., and Grunsteidl, G. A Survey of MARS, Research Report 16/89. Vienna University of Technology, Austria, October 1989.
[45] Schutz, W. A Test Strategy for the Distributed Real-Time System MARS, Research Report 1/90. Vienna University of Technology, Austria, January 1990.
[46] Shimokawa, Y. and Shiobara, Y. Real-time Ethernet for industrial applications. In Proceedings of IECON, pp. 829–834. 1985.
[47] Smolik, P., Sebek, Z., and Hanzalek, Z. ORTE: open source implementation of Real-Time Publish-Subscribe Protocol. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA '03. Porto, Portugal, July 2003.
[48] Skeie, T., Johannessen, S., and Holmeide, O. The road to an end-to-end deterministic Ethernet. In Proceedings of WFCS '02: 4th IEEE International Workshop on Factory Communication Systems, pp. 3–9. Västeras, Sweden, August 2002.
[49] Sobrinho, J.L. and Krishnakumar, A.S. EQuB: Ethernet quality of service using black bursts. In Proceedings of the 23rd Conference on Local Computer Networks, pp. 286–296. Boston, MA, October 1998.
[50] Song, Y. Time constrained communication over switched Ethernet. In Proceedings of FeT 2001: 4th International Conference on Fieldbus Systems and Their Applications, pp. 138–143. Nancy, France, November 2001.
[51] Steffen, R., Zeller, M., and Knorr, R. Real-time communication over shared media local area networks. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA '03. Porto, Portugal, July 2003.
[52] Thomesse, J.-P. Fieldbus and interoperability. Control Engineering Practice, 7, 81–94, 1999.
[53] Tovar, E. and Vasques, F. Cycle time properties of the PROFIBUS timed token protocol. Computer Communications, 22, 1206–1216, 1999.
[54] Varadarajan, S. and Chiueh, T. EtheReal: a host-transparent real-time Fast Ethernet switch. In Proceedings of the 6th International Conference on Network Protocols, pp. 12–21. Austin, TX, October 1998.
[55] Varadarajan, S. Experiences with EtheReal: a fault-tolerant real-time Ethernet switch. In Proceedings of the 8th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), pp. 184–195. Antibes, France, October 2001.
[56] Venkatramani, C. and Chiueh, T. Supporting real-time traffic on Ethernet. In Proceedings of the IEEE Real-Time Systems Symposium. San Juan, Puerto Rico, December 1994.
[57] Willig, A. A MAC protocol and a scheduling approach as elements of a lower layer architecture in wireless industrial LANs. In Proceedings of WFCS '97: IEEE International Workshop on Factory Communication Systems. Barcelona, Spain, October 1997.
21
Switched Ethernet in Automation Networking

Tor Skeie
ABB Corporate Research

Svein Johannessen
ABB Corporate Research

Øyvind Holmeide
OnTime Networks

21.1 The Switches Are Not the Complete Network
21.2 Analyzing Switched Fast Ethernet
    The Learning Process inside the Switch
21.3 There Are Always Bottlenecks
    Even Highways Have Queues • Introducing a Standard for Priority and Delivery • High-Priority Packets Get High-Priority Treatment • Bottleneck Conclusions
21.4 Time Synchronization across Switched Ethernet
    The Concept of Time Stamping • Synchronization Requirements in Substation Automation • How to Be Extremely Accurate: IEC Class T3 • Measurements on an Actual Network • Beyond the Speed of Light: Class T5 • Summary and Conclusions
21.5 Introducing Virtual Subnetworks
    Port Group VLANs • Group-Based VLANs: GARP VLAN Registration Protocol • Layer 3-Based VLANs
References
In recent years Ethernet technology has taken several giant evolutionary steps. From the 10 Mbit/s shared cable in 1990, the state of the art has moved to a switch-based communication technology running at 100 or 1000 Mbit/s. The network switches are also getting more sophisticated in that they have started including support for packet priority and virtual networks. Since all this power comes at a steadily decreasing cost, it is invading other cost-sensitive areas as well, most notably automation networks. This chapter will look at some critical aspects of switched Fast Ethernet as an automation network.
21.1 The Switches Are Not the Complete Network

Exchanging the classic Ethernet infrastructure for a switched infrastructure does wonders for the infrastructure itself, but not necessarily for our automation network. The reason for this is illustrated in Figure 21.1, which shows all the stages in an information transfer between an automation controller and an input/output (I/O) node. The purpose of this chapter is to:
• Analyze the advantages, disadvantages, and peculiarities of a switched Ethernet automation network
• Point out the remaining bottlenecks between the controller software and the I/O node software
FIGURE 21.1 The complete end-to-end network: controller software, protocol stack, and network interface in each end node, connected by drop links to the network.
• Discuss how the characteristics of the switched network affect some specialized automation network tasks
21.2 Analyzing Switched Fast Ethernet

Let us start by recapitulating some basic facts about switched Ethernet automation networks:
1. Just as in a hub, an Ethernet switch contains a number of ports. Each port is either active (connected to an active Ethernet node) or passive (disconnected or connected to an inactive Ethernet node).
2. The connection between an active port and its associated Ethernet node is point to point. The connection may be full duplex (send and receive simultaneously) if the associated node supports it; otherwise, it is half duplex.
3. Each port in an Ethernet switch has a table of network addresses associated with that port. These tables are created by inspecting network packets sent from the node and extracting the source address from the packets.
4. Ethernet switches use a store-and-forward approach, which means that the complete network packet is received and verified by the switch before it is transferred to the output port.
5. The transfer of a network packet from one port to another inside the Ethernet switch is done by memory-to-memory copy at a very high speed.
6. An Ethernet switch does not use the collision mechanism of classic Ethernet.
7. Several transfers between different ports may take place more or less simultaneously. If an Ethernet switch has N ports, it should also be capable of handling N simultaneous connections running at full link speed (assuming that none of them request the same output port). This means that an Ethernet switch has a potentially much greater data transfer capability than a hub.
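Facts 3 and 7 above describe, in effect, a learning switch. A minimal software model of the learn-and-forward behavior (our own sketch, not taken from the chapter; real switches do this in hardware) might look like this:

```python
class LearningSwitch:
    """Toy model of per-switch MAC address learning and forwarding."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # source MAC address -> port number

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of output ports for one frame."""
        # Learn: associate the source address with the arrival port.
        self.mac_table[src_mac] = in_port
        # Forward to the known port, or flood to all other ports
        # (hub-like behavior) when the destination is still unknown.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in range(self.num_ports) if p != in_port]

sw = LearningSwitch(4)
print(sw.handle_frame(0, "A", "B"))  # B unknown: flood to [1, 2, 3]
print(sw.handle_frame(1, "B", "A"))  # A already learned: [0]
```

After the first exchange, both addresses are in the tables and every subsequent frame between A and B occupies exactly one output port, which is the source of the switch's bandwidth advantage over a hub.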
21.2.1 The Learning Process inside the Switch

Item 3 above gave the algorithm the switch uses for the internal transfer of network packets. What happens if the switch does not have the destination node address in any port table? In this case, the switch plays it safe and transfers the packet to every output port in the switch. This reduces the switch performance to that of a hub, but it ensures that if the destination node is anywhere on the network, the packet will get through. In the meantime, the switch has learned the location of the source node and stored it in the correct table. This learning process is usually very fast, since each node only has to answer once in order for it to be associated with the correct port. There are, however, some cases when the learning process fails, the most important being:
• Ethernet broadcast
• Ethernet multicast

21.2.1.1 Broadcast: Use with Care

Ethernet inherited the concept of broadcast (an address that is accepted by every node on the network) from high-level data link control (HDLC). This functionality turned out to be very useful in several network protocols:
• Name announcing in NetBIOS ("Hi everybody, my name is …")
• The Service Advertising Protocol (SAP) in IPX/SPX
• Network address inquiry in ARP ("Hello everybody, who has got this IP address?")
• IP address deployment in BOOTP and DHCP ("I just woke up and I want an IP address!")
The drawbacks inherent in excessive usage of broadcast messages are:
• Broadcast messages cannot be filtered by hardware and must perforce be handled by software, thus consuming CPU resources.
• If broadcast messages are allowed to pass through bridges and routers unchecked, they may create a broadcast storm (broadcast messages propagated in circles and thereby clogging the whole network).
• If a broadcast address is used as a source address, it may create a lot of network problems (this is a well-known way of creating network meltdown).
• It reduces the performance of a switched network to that of a hub-based network.

21.2.1.2 Multicast: A Useful, but Dangerous Concept

A huge number of Ethernet addresses are reserved for multicast usage. The multicast concept addresses the need for creating network groups of some sort. Traditionally, the multicast concept has been popular in some automation contexts, usually associated with a philosophy called publish-and-subscribe. This philosophy allows independently developed distributed applications to exchange information in an event-driven manner without needing to know the source of the data or the network topology. Information producers publish information anonymously. Subscribers anonymously receive messages without requesting them. Like other broadcast-based models, publish-and-subscribe is efficient when used on a classic network. For example, if the cost of energy changes in a distribution system, only a single transmission is required to update all of the devices dependent on the energy price. (This is, of course, in the best or most optimistic case.) On a switched Ethernet network, the situation changes drastically. For a standard (unmanaged) switch, multicasting actually uses more bandwidth than sending the same message to one recipient after another. This surprising fact is best illustrated with an example.
If we have a sixteen-port switch and send the same message to four nodes, the message will occupy the sending port four times and the four receiving ports one time each, for a total of eight message times. Sending it as a multicast means that the message will occupy all sixteen ports for one message time — a total of sixteen message times. The moral in this case is: Know your protocols and your infrastructure — and make sure they work together, not against one another.
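The arithmetic in this example can be checked mechanically. The little model below simply counts one "message time" per port occupancy, exactly as the text does (function names are ours):

```python
def unicast_port_times(num_receivers):
    # Sending the same message to each recipient in turn occupies the
    # sending port once per copy and each receiving port once.
    return num_receivers + num_receivers

def multicast_port_times(num_ports):
    # An unmanaged switch floods a multicast frame, occupying every
    # port of the switch for one message time.
    return num_ports

print(unicast_port_times(4))     # 8 message times
print(multicast_port_times(16))  # 16 message times
```

The crossover depends on switch size versus group size: flooding only pays off when the multicast group covers most of the switch's ports.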
21.3 There Are Always Bottlenecks

Referring again to Figure 21.1, we observe that the network infrastructure is just one part of the communication path from the controller application to the I/O node application. The drop links are unchanged from the hub-based network, and the end nodes (the controller and the I/O station) may well be unchanged from the days of the 10-Mbit/s network, or even older. We shall now proceed to look at the most important bottlenecks in our switched network.
21.3.1 Even Highways Have Queues

21.3.1.1 Queues in the Switches

The nondeterministic behavior of traditional switched Ethernet is caused by unpredictable traffic patterns. At times, packets from several input ports will be destined for the same output port, and some of them must perforce be queued up waiting for the output port to be free. The reason is that at times there
will be a lot of low-priority traffic in the system (node status reports, node firmware updates, etc.). If we do not introduce some sort of traffic rules, such a situation will give an unpredictable buffering delay depending on the number of involved packets and their length. In the worst case, packets can be lost when the amount of traffic sent to an output port exceeds the bandwidth of this port for a period longer than the output buffer is able to handle. An automation network will have such coexisting real-time traffic (raw data, time sync data, commands, etc.) and noncritical data (Transmission Control Protocol (TCP)/Internet Protocol (IP) file transfer, etc.). A maximum-size packet (1518 bytes) represents an extra delay of 122 µs for any following queued packets in the case of a 100-Mbit/s network.

21.3.1.2 Queues on the Drop Links

Even if we manage to introduce some sort of priority mechanism in our system, we have one queue mechanism left. Since some standards (International Electrotechnical Commission (IEC) 61850-9 springs to mind) propose to transfer real-time data by using a publish-and-subscribe approach (which is in turn based on the use of multicast groups), a standard unmanaged Ethernet switch has to route these data packets onto every drop link in the system, adhering to the broadcast paradigm. Since all those real-time packets might have the same priority, they will be put in the same switch queue on all output ports. The processing rate for these queues is equal to the bandwidth of the drop link between the switch and the nodes, so each is effectively a drop link queue. This way, important data packets destined for one node may be delayed by multicast traffic destined for another node. Obviously, we need to introduce traffic rules/mechanisms for multicast data packets as well in order to reduce the worst-case transfer time.
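The worst-case extra delay quoted above is simply the time needed to clock a maximum-size frame onto the wire, which is easy to verify (a quick sanity check of our own; the figure covers only the 1518-byte MAC frame, ignoring preamble and interframe gap):

```python
def serialization_delay_us(frame_bytes, link_mbps):
    """Time to clock one frame onto the wire, in microseconds."""
    # bits to send, divided by bits transmitted per microsecond
    return frame_bytes * 8 / link_mbps

# A maximum-size Ethernet frame on 100 Mbit/s Fast Ethernet:
print(round(serialization_delay_us(1518, 100), 1))   # 121.4 (µs)

# The same frame on Gigabit Ethernet:
print(round(serialization_delay_us(1518, 1000), 2))  # 12.14 (µs)
```

This is why a single queued bulk-transfer frame matters at 100 Mbit/s but becomes an order of magnitude less harmful on gigabit uplinks.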
21.3.1.3 Queues in the Nodes

Introducing traffic rules in the Ethernet switches will improve the worst-case latency across the network (from the Ethernet controller in the source node to the Ethernet controller in the destination node). Inside the node, however, there is usually only a single network task and a single hardware queue associated with the Ethernet controller. Since a network packet spends a large percentage of its total end-to-end time inside a node, internal packet prioritization is needed in order to have maximal control over the total packet transfer time (80 to 90% of the end-to-end message latency is spent within the end nodes, at least when adhering to the Internet Engineering Task Force (IETF) protocols, User Datagram Protocol (UDP)/TCP/IP).
• For the transmit operation, one could typically have a situation where multiple maximum-size Ethernet packets, for example, fragments of an FTP transfer, are queued up at the Ethernet driver level. In a standard implementation, real-time packets will be added to the end of the queue. Such behavior will cause a nonpredictable delay in transmission.
• For the receive operation, the protocol stack implementation represents a possible bottleneck in the system. This is mainly due to the first-come, first-served queue at the protocol multiplexer level (the level where the network packets are routed to the appropriate protocol handler software).
21.3.2 Introducing a Standard for Priority and Delivery

21.3.2.1 High-Priority Packets Jump Ahead in Switch Queues

Institute of Electrical and Electronics Engineers (IEEE) 802.1D (see [3]) has been introduced to alleviate the switch queue problem; the standard specifies a layer 2 mechanism for giving mission-critical data preferential treatment over noncritical data. The concept has primarily been driven by the multimedia industry and is based on priority tagging of packets and implementation of multiple queues within the network elements in order to discriminate packets [1]. For tagging purposes, IEEE 802.1Q [4] defines an extra field for the Ethernet medium access control (MAC) header. This field is called the Tag Control Info (TCI) field and is inserted as indicated in Figure 21.2. It contains three priority bits; thus, the standard defines eight different levels of priority.
FIGURE 21.2 MAC header (layer 2) with tag. The tag sits between the source address and the type field: a 16-bit tagged-frame type of 0x8100, followed by the 16-bit TCI field holding a 3-bit priority field (802.1D), a 1-bit canonical-format flag, and a 12-bit 802.1Q VLAN identifier; the frame ends with the FCS.
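As a sketch of the TCI layout in Figure 21.2, the 16-bit field can be packed and unpacked in a few lines (helper names are ours, not from any standard API):

```python
TPID = 0x8100  # tagged-frame type preceding the TCI, per IEEE 802.1Q

def build_tci(priority, cfi, vlan_id):
    """Pack the 16-bit Tag Control Info field:
    3-bit priority | 1-bit canonical flag | 12-bit VLAN identifier."""
    assert 0 <= priority < 8 and cfi in (0, 1) and 0 <= vlan_id < 4096
    return (priority << 13) | (cfi << 12) | vlan_id

def parse_tci(tci):
    """Unpack a TCI value into (priority, cfi, vlan_id)."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF

tci = build_tci(priority=5, cfi=0, vlan_id=42)
print(hex(tci))        # 0xa02a
print(parse_tci(tci))  # (5, 0, 42)
```

The three priority bits give the eight priority levels mentioned above, and the 12-bit identifier gives the 4094 usable VLANs discussed in Section 21.5.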
21.3.2.2 Multicast Distribution Rules Reduce Drop Link Traffic

Figure 21.2 defines more than some priority bits. If you look closely, you can see something called a "12-bit 802.1Q VLAN identifier." This virtual local area network (VLAN) identifier is a mandatory part of each TCI field and can, when properly used, remove all unnecessary drop link traffic in an automation network based on publish-and-subscribe. We will discuss the handling and usage of this field in the section on VLANs.
21.3.3 High-Priority Packets Get High-Priority Treatment

A network data packet spends a relatively small percentage of its application-to-application transfer time on the physical network. The actual percentage varies with the speed of the network and the performance of the node, but for 100 Mbit/s Fast Ethernet, the average percentage is between 0.1 and 20%. For this reason, implementing network priority will have little influence on the average application-to-application transfer time (it will, however, have a large influence on the worst-case transfer time). In order to improve the average application-to-application transfer time, the concept of priority must be extended to include the protocol layers in both the sending and receiving ends. In order to accomplish this, we must consider:
• Adjustable process priority for the protocol stack software
• Several instances of the protocol stack software running at different priority levels
• Multiple transmit queues at the Ethernet driver level

21.3.3.1 Matching Network Process Priority to Packet Priority

In most real-time operating systems, the protocol stack runs as a single thread in the context of some networking task. If we want the task priority to depend on the packet priority, we can implement it in a very simple way:
1. At compile time, decide on the task priority that should correspond to each packet priority and to an untagged packet. Put those priorities in a table.
2. Set a high basic networking task priority (the priority it uses when no packet is being processed).
3. When the networking task starts to process a packet, it should extract the packet priority and use it with the priority table described above to find the appropriate task priority.
4. Change the task priority by executing a system call.

21.3.3.2 Multiple Instances of the Protocol Stack Software

There is, however, a problem with the previous solution.
Adjusting the task priority to correspond to the packet priority ensures that the processing of high-priority network packets does not get interrupted by less important administrative tasks. Still, networking tasks process incoming messages one at a time in a linear fashion. What we really want is to process high-priority messages before low-priority ones. In fact, if we could suspend the processing of low-priority network packets when a high-priority message arrives, we would have the perfect solution. At first glance, there exists an ideal solution:
1. At compile time, decide on the task priority that should correspond to each packet priority and to an untagged packet. Create one task for each priority and put the task IDs in a table.
2. When a packet arrives, extract the packet priority, use it with the task table described above, send the packet to that task, and send a signal to the task in order for it to start processing.
The problem with this solution is that it supposes that the network software is reentrant, a condition that is seldom fulfilled. Rewriting the stack software to make it reentrant is not hard, just tedious and time-consuming. Of course, it also means that you have to support the rewritten software in the future. Running multiple instances of the protocol software may be the most elegant and efficient solution, but be prepared to spend some time and resources on it.

21.3.3.3 Multiple Receive Queues and Adjustable Priority

If we do not want to spend time and money on making the network software reentrant, there is an alternative solution available. This solution does not suspend the processing of lower-priority packets, but selects the next packet to be processed using a set of criteria based on the packet priority. One possible implementation goes like this:
1. At compile time, decide on the task priority that should correspond to each packet priority and to an untagged packet. Create one queue for each priority and put a pointer to the queue in a table.
2. Whenever the network software is ready to process the next packet, it pulls all packets from the input queue and distributes them to the priority queues.
3. When the input queue is empty, the priority selection algorithm is run. This algorithm may be implemented in several different ways, for example:
• Always pick the packet from the highest-priority nonempty queue.
• Pick the packet from the highest-priority nonempty queue a certain number of times. Then move the first packet from each nonempty queue to the next higher-priority queue before picking the packet from the highest-priority nonempty queue again.
• Introduce a LOW flag. Wait for a packet to appear in the highest-priority queue or for the LOW flag to be set.
If the packet appeared in the highest-priority queue, set the LOW flag and process that packet. If the LOW flag was set, reset it and process the packet from the highest-priority nonempty queue.

21.3.3.4 Implementing Multiple Transmit Queues

On a transmit request, a standard Ethernet driver takes a buffer, does some housekeeping, and transfers it to the Ethernet controller hardware. Such a driver is simple and fair (first come, first served), but may be unsuitable for an efficient priority implementation. One or more large low-priority packets, once they are scheduled for transmission, will delay high-priority packets for the time it takes to transfer the low-priority ones. If we want high-priority packets to go to the head of the transmission queue, and the hardware does not support multiple queues, we must do some priority handling at the driver level. The simplest solution is to use two queues: the hardware queue and a software queue. Low-priority packets go into the software queue, and high-priority packets go straight into the hardware queue. From this point onward, there exist several algorithms addressing different needs.
• If the real-time requirements are moderate, move the first packet in the software queue to the hardware queue whenever the hardware queue is empty.
• If the real-time requirements are strict, move the first packet in the software queue to the hardware queue whenever a high-priority packet has been placed in the hardware queue.
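The simplest two-queue transmit driver described above can be sketched as follows (a toy model with invented names; a real driver would feed the Ethernet controller's transmit ring rather than return frames from a method):

```python
from collections import deque

class TwoQueueTxDriver:
    """Transmit driver with a hardware queue for high-priority frames
    and a software holding queue for low-priority ones (the moderate
    real-time variant: refill the hardware queue only when it drains)."""

    def __init__(self):
        self.hw_queue = deque()  # models the controller's own queue
        self.sw_queue = deque()  # low-priority holding queue

    def send(self, frame, high_priority):
        if high_priority:
            self.hw_queue.append(frame)  # straight to the hardware
        else:
            self.sw_queue.append(frame)  # parked at driver level

    def next_frame(self):
        """Called when the hardware is ready for the next frame."""
        if not self.hw_queue and self.sw_queue:
            # Hardware queue empty: promote one low-priority frame.
            self.hw_queue.append(self.sw_queue.popleft())
        return self.hw_queue.popleft() if self.hw_queue else None

drv = TwoQueueTxDriver()
drv.send("bulk-1", high_priority=False)
drv.send("rt-1", high_priority=True)
print(drv.next_frame())  # rt-1 jumps ahead of bulk-1
print(drv.next_frame())  # bulk-1
```

Note that even this scheme cannot preempt a frame already handed to the hardware; the serialization delay of one maximum-size frame remains as residual jitter.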
21.3.4 Bottleneck Conclusions

The conclusions, based on a large set of real-world measurements (see [2]), are:
1. At 100 Mbit/s, the Ethernet switch and the drop links do not constitute a bottleneck under FTP load.
2. If several switches are interconnected by standard drop link cables, these cables might represent bottlenecks. Most switches have provisions for Gigabit Ethernet (1000 Mbit/s) interconnections, however, thereby removing this possibility.
3. The main communication delays are inside the nodes. A suitable queue strategy for high-priority packets ensures that those will be processed before low-priority packets.
4. For a high-performance node processor, implementing prioritized transmit queues has a greater overall impact on high-priority packet delays than implementing prioritized receive queues.
5. If you want to implement internal receive priority queuing, ensure that it is possible to configure the chosen switch not to remove the priority tagging information.
21.4 Time Synchronization across Switched Ethernet

Now and then you come across measurement problems that are tightly associated with the notion of synchronicity, meaning that things need to happen simultaneously. The usual things that need such synchronicity are data sampling and motion control. In the case of data sampling, you need to know the value of two different quantities measured at the same time (within a narrow tolerance). If the measurement sources are close together, this is fairly easy to accomplish, but if they are far apart and connected to different measurement nodes, it suddenly gets harder. The usual choices are:
1. Use a special hardware signal on a separate cable between the controller and all nodes that need synchronization. If the nodes are far apart and the tolerances are tight, make sure that all cables that carry the synchronization signal have the same length.
2. Add a local clock to each node and use the present automation network to keep them in synchronization. Tell each node how often the measurement sources should be sampled and require the node to time-stamp each measurement.
We shall now take a look at the hardest synchronization requirements for automation purposes and discuss the possibility of implementing class T5 (1 µs) and class T3 (25 µs) synchronization in a multitraffic switched Ethernet environment. Common to both solutions is that they adhere to the same standardized time protocol. Such a step would significantly reduce the cabling and transceiver cost, since costly dedicated (separate) links are used for this purpose today.
21.4.1 The Concept of Time Stamping

Let us start at the very beginning — the concept of time stamping: Time stamping is the association of a data set with a time value. In this context, time may also include the date. Why would anybody want to time-stamp anything? The closest example may be on your own PC — whenever you create a document and save it, the document is automatically assigned a date and time value. This value enables you to look for:
• Documents created on a certain date (for example, last Monday)
• Documents created within a certain time span (for example, the last half of 1998)
• The order in which a set of documents was created (for example, the e-mails in your inbox)

If we just look at the examples above, we see that the accuracy we need for the time stamping is about the same as that which we expect from our trusty old wristwatch. This means “within a couple of minutes,” but as long as the clock does not stop, it does not really matter much how precise it is.

21.4.1.1 Let Us Synchronize Our Watches

Now we know about time stamping on our own PC. The next step is to connect the PC to a network, maybe even to the Internet, and start exchanging documents and e-mails. What happens if the clock in your PC (the clock that is used for time stamping) is wrong by a significant amount?
• If you have an e-mail correspondence with someone, a reply (which is time-stamped at the other end) might appear to have been written before the question (which is time-stamped at your end).
• If you collaborate on some documents, getting the latest version might be problematic.
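On a PC, that automatic date-and-time value is simply the file system's modification time stamp, and all three lookups above can be answered from it. For instance (a Python sketch; the directory is supplied by the caller):

```python
import os

def documents_newest_first(directory):
    """Return file names ordered by their stored time stamp, newest
    first -- the third use case in the list above."""
    entries = [(entry.name, entry.stat().st_mtime)
               for entry in os.scandir(directory) if entry.is_file()]
    entries.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in entries]
```

Filtering the same `(name, mtime)` pairs against a date or a time span covers the other two use cases.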
© 2005 by CRC Press
21-8
The Industrial Communication Technology Handbook
Therefore, when several PCs are connected together in any sort of network, the PC clocks are still accurate enough, but a new requirement is that they should be synchronized (show the same time at any given point in time). Now, we could go around to each PC, look at our wristwatch, and set the PC clock to agree with it. The trouble is that this is a boring and time-consuming job, and we should look for a better solution.

One solution is to elect one PC to be the time reference, which means that every other PC should get the current time from it at least once a day and set its own clock to agree with that time. This solution works satisfactorily on a LAN, but all PC clocks will lag the time reference by the time it takes a clock value to travel from the time reference to the synchronizing PC. Except for very unusual cases, this lag is less than 1 s and thus good enough for office purposes.

Enter the Internet. Suddenly the synchronization problem escalates, since two collaborating PCs may be located in different time zones (remember to compensate for that) and a synchronization message may take a long time to travel from one PC to the other. Fortunately, the Internet Network Time Protocol has a solution to both problems. This protocol involves sending a time-stamped time request message to a timeserver. This timeserver adds an arrival time stamp and a retransmit time stamp before returning the request message to the requesting PC. The requesting PC time-stamps the message when it returns and uses all the time stamps in calculating the correct time. This protocol and its little brother, the Simple Network Time Protocol (SNTP), are able to synchronize computers across the Internet with a precision in the low milliseconds.
21.4.2 Synchronization Requirements in Substation Automation

In the energy distribution world, a substation is an installation where the energy is combined, split, or transformed. A substation automation (SA) system refers to the tasks that must be performed in order to control, monitor, and protect the primary equipment of such a substation and its associated feeders. In addition, the SA system has administrative duties such as configuration, communication management, and software management. Communication within SA systems is crucial in that the functionality demands very time-critical data exchange. These requirements are substantially harder than the corresponding requirements in general automation. This is also true for the required synchronization accuracy of the IEDs'* internal clocks in order to guarantee precise time stamping of current and voltage samples. Various SA protection functions require different levels of synchronization accuracy. IEC has provisionally defined five levels: IEC classes T1 to T5 (IEC 61850-5, Sections 12.6.6.1 and 12.6.6.2):
• IEC class T1: 1 ms
• IEC class T2: 0.1 ms
• IEC class T3: ±25 µs
• IEC class T4: ±4 µs
• IEC class T5: ±1 µs
Since these definitions and classes are not yet frozen, we will refer to them here as class T1, class T2, etc. (without IEC). At this point in time, the substation automation field is also on the verge of migrating toward the usage of switched Fast Ethernet as the automation network infrastructure. The ultimate vision is to achieve interoperability between products from different vendors on all levels within the substation automation field. A proof of this new trend is the upcoming IEC 61850 standard, Communication Networks and Systems in Substations, issued by the responsible IEC technical committee. The invention of de facto standard concepts and the adoption of off-the-shelf technologies are key instruments in reaching the interoperability goal. Figure 21.3 illustrates the communication structure of a future substation adhering to switched Ethernet as a common network concept holding multiple coexisting traffic types [5].
*IED: Intelligent Electronic Device.
Switched Ethernet in Automation Networking
FIGURE 21.3 The communication network in a future substation automation system using Ethernet as a common network infrastructure.
We already know that switched Fast Ethernet has sufficient real-time characteristics to meet very demanding automation requirements. What is left to show is that it is possible to implement the various IEC classes of synchronization accuracy over such a network. This is considered to be the final obstacle to fully migrating to Ethernet in substation automation.

21.4.2.1 Time Synchronization Status

There is a plethora of proposed theory and methods for synchronizing clocks in distributed systems. The most prominent public-domain synchronization method is the Network Time Protocol (NTP), which is standardized by the IETF in RFC 1305. A subset of NTP, the Simple Network Time Protocol (SNTP), is also defined and is protocol compatible with NTP. The intended use of NTP is to synchronize computer clocks in the global Internet. For this purpose, it relies on sophisticated mechanisms to access national time, organize time synchronization subnets possibly implemented over various media, and adjust the local clock in each participating peer. SNTP, on the other hand, does not implement the full set of NTP algorithms and targets simpler synchronization purposes. Common to this body of work is that it does not present solutions for low-microsecond accuracy, instead targeting synchronization of LANs and wide area networks (WANs) in the general sense, where a precision of some milliseconds is sufficient. Looking at the automation field in general, and especially at the SA world, we find a diversity of proprietary and patented solutions for achieving highly accurate time synchronization over Ethernet. Interoperability concerns are, however, present.

21.4.2.2 Stating the Problem: Why Network Synchronization Is Difficult

The delays from the time stamping of a time synchronization message in the message source node until it is time-stamped in the message destination node are:
• Message preparation delay
• Communication stack traversal delay (transmission)
• Network access delay
• Network traversal delay
• Communication stack traversal delay (reception)
• Message handling delay
Variations in the delays are due to:
• Real-time operating system (RTOS) scheduling unpredictability
• Network access unpredictability
• Network traversal time variations
Time stamping at the lowest stack level helps eliminate the stack delay variations and RTOS scheduling unpredictability, but introduces some complications in the implementation.
21.4.3 How to Be Extremely Accurate: IEC Class T3

We have mentioned that the precision that may be achieved by traditional NTP/SNTP implementations is 1 ms at best. Basically, this stems from the time stamping of incoming and outgoing NTP/SNTP packets at the NTP/SNTP application layer. As stated above, this makes time stamping a victim of real-time OS scheduling unpredictability. In this section we describe how a high degree of accuracy can be achieved using a tuned SNTP implementation and standard Ethernet switches.

21.4.3.1 A Tuned SNTP Time Protocol Implementation

The NTP/SNTP time synchronization algorithm presented has definite limits to the level of attainable accuracy. Let us recapitulate: the NTP/SNTP algorithm is based on a network packet containing three time stamps:
• t1: The (client) time the packet was generated in the client asking for the current time
• t2: The (server) time the packet arrived at the timeserver
• t3: The (server) time the packet was updated and put into the transmission queue at the server

In addition, the calculations require:
• t4: The (client) time the packet arrived back at the client

Now, t2 and t4 can easily be determined down to the microsecond (and perhaps even better) using hardware or software time stamps based on packet arrival interrupts. The other two have definite problems, however. For full accuracy, t1 and t3 should be the times when the packet left the time client or timeserver, respectively. The problem is that this time stamp is not available until the packet has really left the timeserver or client, and then it is, of course, too late to incorporate it into the packet. Therefore, the time synchronization inaccuracy for an NTP/SNTP setup is the variation in the delay between t1 and the time the packet leaves the time client, plus the variation in the delay between t3 and the time the packet leaves the timeserver.
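Given the four time stamps, the standard NTP/SNTP calculation is compact (a Python sketch of the well-known RFC 1305 arithmetic; variable names are ours):

```python
def sntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP/SNTP calculation.

    t1: client transmit time    t2: server receive time
    t3: server transmit time    t4: client receive time
    t1/t4 are read from the client clock, t2/t3 from the server clock.
    """
    # Clock offset of the client relative to the server, assuming a
    # symmetric path (equal delay in both directions).
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    # Round-trip delay, excluding the server turnaround time (t3 - t2).
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Example: client 0.5 s behind the server, 10 ms each-way delay,
# 2 ms server turnaround -> offset 0.5 s, delay 20 ms.
offset, delay = sntp_offset_and_delay(100.000, 100.510, 100.512, 100.022)
```

The inaccuracy discussed above enters through t1 and t3: any unaccounted delay between those stamps and the actual wire departures shows up directly in the computed offset.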
It turns out that there are several ways of determining the time when the synchronization packet actually leaves the time client. One algorithm runs like this:
1. Create the NTP packet and fill in t1 (the originate time stamp) as usual. Transmit the packet to the timeserver.
2. Get hold of the time stamp when the packet leaves the time client, using any appropriate mechanism. Store this time stamp (t11) in a data structure together with t1.
3. When the packet returns from the timeserver, extract t2 from the packet and use t1 to look up the corresponding value of t11. Since t11 and t1 are stored together, there is no chance of confusion here.
4. Substitute t11 for t1 in the packet.

21.4.3.2 Achieving Precise Time Stamping within a Node

The nature of real-time operating systems (they guarantee a maximum response time for an event but allow for a wide variation below that) introduces a substantial variation in the time spent in the communication stacks. This fact has necessitated interrupt-level time stamping in both the time client and timeserver. The IEC class T3 solution described here adheres to the principle of interrupt-level time stamping of the SNTP request packet when sent from the time client and when received at the timeserver. Moreover, we propose that the synchronization be based on the (corrected) transmit time stamp set by the client (referred to as t1 in SNTP terminology) and the receive time stamp set by the server (referred to as t2). Usage of a possible low-level transmit time stamping of the corresponding SNTP reply packet (referred to as t3) necessitates novel techniques for controlling the nondeterministic access of an Ethernet packet to the Ethernet bus.
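The bookkeeping behind the four-step substitution algorithm of Section 21.4.3.1 can be sketched as follows (Python; the class and the hook names are our own illustration of where a driver or ISR would call in, not part of SNTP itself):

```python
class OriginateStampTable:
    """Sketch of steps 2 and 3: the low-level departure time stamp t11 is
    stored under the originate time stamp t1, so the returning packet's
    t1 field is enough to find it again."""

    def __init__(self):
        self._pending = {}

    def packet_sent(self, t1):
        self._pending[t1] = None       # step 1: request is on its way

    def wire_departure(self, t1, t11):
        self._pending[t1] = t11        # step 2: ISR/hardware time stamp

    def packet_returned(self, packet):
        # steps 3 and 4: look up t11 via t1 and substitute it in the packet
        packet["t1"] = self._pending.pop(packet["t1"])
        return packet
```

After the substitution, the usual offset calculation proceeds with the corrected transmit time stamp.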
FIGURE 21.4 The SNTP time client–server relation using low-level time stamping.
A side effect of using only t1 and t2 is that no mechanism for automatic calibration of the network latency will be available; therefore, a manual calibration of the propagation delays of the drop links and the minimum switch latency* must be performed. Figure 21.4 illustrates the setup of an SNTP time client and timeserver adhering to interrupt-level time stamping.

21.4.3.3 Time Client Implementation Issues

There are several ways of time-stamping a network packet. We shall look at three of them and show that only the first two are suitable for accurate time synchronization:
1. Hardware time stamping in the Ethernet controller.
2. Software time stamping in an interrupt service routine (ISR) outside the RTOS. This ISR should be connected to the Ethernet interrupt request signal and have top hardware priority.
3. Software time stamping in an ISR controlled by the RTOS (Ethernet driver). This ISR is connected to the Ethernet interrupt request signal with normal hardware priority.

Using any of these low-level time-stamping methods is considered an implementation issue and will not cause any incompatibility between a low-level time-stamping client and a standard high-level time-stamping server. In addition to low-level time stamping, the time client must consider the following aspects:
• The interval between time updates
• The specifications of the local time-of-day clock with respect to resolution, accuracy/stability, and the availability of drift and offset correction mechanisms
• The usage of adaptive filtering and time-stamp validation methods in order to remove network delay variations

21.4.3.4 Timeserver Implementation Issues

In order to achieve class T3 accuracy, the timeserver should be able to time-stamp an incoming message with an accuracy of better than 2 µs independently of network load.
The exact time should be taken from a global positioning system (GPS) receiver, and the time parameters distributed from the timeserver should be based on GPS time representation instead of absolute time (i.e., coordinated universal time (UTC) timing) in order to cope with the leap-second problem. It is also convenient if the timeserver supports full-duplex connectivity in order to avoid a situation where upstream data introduce extra switch latency in downstream data (i.e., time requests).

21.4.3.5 Ethernet Infrastructure Implementation Issues

Preferably, only one switch should be allowed between a time client and a timeserver. Having multiple switch levels will impose increased jitter† through the infrastructure, which again might call for more
*The minimum Ethernet switch delay is usually given in the switch data sheet. †Jitter: Variations in the delay.
complex filtering at the time client side. The Ethernet switch must also have good switch latency characteristics. The switch latency from the client drop link to the server drop link depends on several parameters:
• General switch load: This means all the network load on the switch except for the packets sent to the timeserver. The variations in the switch latency from the client drop link to the server drop link should be less than 2 µs.
• Timeserver load: This parameter means other packets sent to the timeserver that may introduce extra delay in the transmission of a given SNTP request packet. This delay can be handled at the time client side using various filtering techniques (see time client requirements).
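One simple way to realize such client-side filtering, combined with the manual calibration of drop-link propagation delay and minimum switch latency required by the t1/t2-only scheme of Section 21.4.3.2, might look like this (a sketch; all calibration figures are hypothetical, and the filter itself is one common idea, not prescribed by the chapter):

```python
# Hypothetical, manually calibrated path figures.
DROP_LINK_DELAY = 0.5e-6      # measured propagation delay of both drop links (s)
MIN_SWITCH_LATENCY = 4e-6     # from the switch data sheet (s)

def filtered_offset(samples, window=8):
    """Estimate the client clock offset from (t1, t2) sample pairs.

    Queuing in the switch or timeserver only ever *adds* to the apparent
    offset t2 - t1, so the minimum over a window of recent samples is the
    least-disturbed one. The calibrated path delay is then subtracted."""
    recent = samples[-window:]
    raw = min(t2 - t1 for t1, t2 in recent)
    return raw - (DROP_LINK_DELAY + MIN_SWITCH_LATENCY)
```

A sample taken while the server drop link happened to be idle yields the minimum and thus dominates the estimate; heavily delayed samples are simply discarded by the filter.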
21.4.4 Measurements on an Actual Network

Extensive tests and measurements regarding time synchronization accuracy on a switched Ethernet network have been undertaken [6]. The conclusions from this body of work are:
• Traffic not destined for the timeserver does not interfere with traffic to the timeserver.
• The switch latency for Ethernet packets to the timeserver depends to a great extent on other traffic to the timeserver.

The conclusions from the full network measurements are:
• Software time stamping using a sufficiently high-priority interrupt (preferably nonmaskable) is for all practical purposes indistinguishable from time stamping using special-purpose hardware.
• Software time stamping using an interrupt under RTOS control needs sophisticated filtering and statistical techniques before it can be used for time synchronization purposes. In that respect, this time-stamping method is not suitable for IEC class T3 synchronization.
• IEC class T3 time synchronization using tuned SNTP over a switched Ethernet has been shown to be eminently feasible.
21.4.5 Beyond the Speed of Light: Class T5

It is now possible to procure industrial-class Fast Ethernet switches fulfilling an extensive list of environmental requirements relevant for substation automation applications. Some of these switches can even be delivered with an integrated SNTP timeserver. Since the internal switch logic has full control over input and output ports, time-stamping an SNTP request packet on arrival is no problem. In addition, the switch logic can insert the transmit time stamp whenever the output port is ready for the reply packet (and even adjust the time stamp for the delay between time stamping and actual transmission). Thus, the traditional problem related to nondeterministic access to the Ethernet does not arise here, due to the tight interaction between the SNTP timeserver and the switch architecture. This time synchronization scheme provides the following:
• Timing synchronization accuracy better than 1 µs if time stamping in the time client is performed in hardware; see “Time Client Implementation Issues.”
• Both server time stamps — t2 (receive) and t3 (transmit) — may be used at the time client for synchronization purposes, and the drop link propagation delay can easily be removed based on the calculated round-trip delay.
• The timing accuracy is independent of the network load.
• No clever filtering/erasure techniques are needed in the time client.
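The round-trip removal mentioned in the second bullet can be written out explicitly; assuming a symmetric path, it is algebraically the same as the standard SNTP offset formula, just phrased in terms of the calculated round-trip delay (a Python sketch with our own names):

```python
def t5_offset(t1, t2, t3, t4):
    """Client clock offset using both server time stamps t2 and t3.

    The round-trip delay (excluding the server turnaround time t3 - t2)
    is computed first; half of it is the drop-link/switch one-way delay,
    which is then removed -- no manual calibration needed."""
    round_trip = (t4 - t1) - (t3 - t2)
    one_way = round_trip / 2.0            # symmetric-path assumption
    return (t2 - t1) - one_way
```

Expanding the expression gives ((t2 - t1) + (t3 - t4)) / 2, the familiar NTP formula, confirming that the two views agree.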
21.4.6 Summary and Conclusions

We have presented general solutions for achieving class T5 (±1 µs) and class T3 (±25 µs) time synchronization over switched Ethernet. The former is based on a dedicated Ethernet switch/timeserver
combination, while the latter relies on standard switches. Common to both solutions is that they adhere to a low-level time-stamp implementation of the Simple Network Time Protocol. Hardware time stamping or low-level software time stamping outside the real-time operating system eliminates client inaccuracy from the error budget of the SNTP time synchronization loop. If the SNTP timeserver relies on the same time-stamping techniques, the only remaining factor to be handled in the error budget is possible time delay variations within the infrastructure. In these settings, class T3 synchronization is possible over switched Ethernet.
21.5 Introducing Virtual Subnetworks

What is a virtual local area network? Although the VLAN is a concept made possible by Ethernet switches, the multitude of vendor-specific VLAN solutions and implementation strategies has made it very difficult indeed to define precisely what a VLAN is. Nevertheless, most people would agree that a VLAN might be roughly equated to a broadcast domain.* More specifically, a VLAN can be seen as a group of end stations, perhaps on multiple physical LAN segments, that are not constrained by their physical location and can communicate as if they were on a common LAN. Currently, three types of VLANs are of interest:
• Port group VLANs
• Layer 2 multicast group-based VLANs
• Layer 3-based VLANs

Here we will only give a brief overview of these different types of VLANs. An important aspect to note in this context is the fact that it is now possible to define dynamic VLANs that correspond exactly to a multicast group. This means that multicast frames will only propagate to members of the indicated VLAN and not to anyone else. In particular, the frames will not occupy drop links or CPU resources in nodes that do not belong to the multicast group.
21.5.1 Port Group VLANs

Port-based VLANs rely on purely manual configuration of Ethernet switches (bridges) to set up VLAN membership. Many initial VLAN implementations defined VLAN membership by groups of switch ports (for example, ports 1, 2, 3, 7, and 8 on a switch made up VLAN A, while ports 4, 5, and 6 made up VLAN B). Furthermore, in most initial implementations, VLANs could only be supported on a single switch. Second-generation implementations support VLANs that span multiple switches (for example, ports 1 and 2 of switch 1 and ports 4, 5, 6, and 7 of switch 2 make up VLAN A, while ports 3, 4, 5, 6, 7, and 8 of switch 1 combined with ports 1, 2, 3, and 8 of switch 2 make up VLAN B). Port grouping is still the most common method of defining VLAN membership, and configuration is fairly straightforward. Defining VLANs purely by port group does not allow multiple VLANs to include the same physical segment (or switch port). However, the primary limitation of defining VLANs by port is that the network manager must reconfigure the VLAN membership when a node is moved from one port to another.
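Conceptually, the second-generation example above is nothing more than a set of (switch, port) pairs per VLAN, and the forwarding decision is a set lookup. A Python sketch of the concept (not any vendor's configuration format):

```python
# The multi-switch example from the text, as plain (switch, port) sets.
VLAN_A = {("switch1", 1), ("switch1", 2),
          ("switch2", 4), ("switch2", 5), ("switch2", 6), ("switch2", 7)}
VLAN_B = {("switch1", 3), ("switch1", 4), ("switch1", 5), ("switch1", 6),
          ("switch1", 7), ("switch1", 8),
          ("switch2", 1), ("switch2", 2), ("switch2", 3), ("switch2", 8)}

def may_forward(ingress, egress, vlan):
    """A frame received on `ingress` is forwarded toward `egress` only
    if both ports belong to the same port-group VLAN."""
    return ingress in vlan and egress in vlan
```

The limitation discussed above is visible directly in this model: moving a node to a new port means editing the sets by hand.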
21.5.2 Group-Based VLANs: GARP VLAN Registration Protocol

In the 802.1D standard (see [3]), the Generic Attribute Registration Protocol (GARP) is introduced. This is a general protocol that allows network nodes to control selected switch properties. The first two
*Broadcast domain: A collection of all nodes that can be reached by a broadcast message from one of them.
implementations of this protocol are the GARP Multicast Registration Protocol and the GARP VLAN Registration Protocol (GVRP). GVRP provides a mechanism that allows switches and end stations to dynamically register (and subsequently de-register) VLAN membership information with the Ethernet switches attached to the same LAN segment, and allows that information to be disseminated across all switches in the LAN that support extended filtering services. Moreover, no manual configuration of the switches is required, as opposed to the port-based solution; on the other hand, the switches must implement extra software as specified by IEEE 802.1D [3]. The operation of GVRP relies upon the services provided by GARP. The information registered, de-registered, and disseminated via GVRP takes the following forms:
1. VLAN membership information indicates that one or more GVRP participants that are members of a particular VLAN (or VLANs) exist, and the Ethernet frames carry a 12-bit VID* (see Figure 21.2) that states the membership. The act of registering/de-registering a VID affects the contents of dynamic VLAN registration entries to indicate the port(s) on which members of the VLAN(s) have been registered.
2. Registration of VLAN membership information allows the Ethernet switches in a LAN to be made aware that frames associated with a particular VID should only be forwarded in the direction of the registered members of that VLAN. In this way, the VLAN membership is propagated through the Ethernet infrastructure, and forwarding of frames associated with the VID therefore occurs only on ports on which such membership registration has been received.

GVRP is a very new protocol concept and is not yet widely supported by industrial Ethernet switches, but it is foreseen to be one of the future technologies for handling the publish-and-subscribe automation network philosophy. One such handling algorithm consists of three steps:
1. Map every multicast group to one specific VLAN identifier.
This is, of course, an offline activity.
2. At system start-up time, each node sends out one small packet for each VLAN (and thereby multicast group) it wants to join. This packet, which is not part of any complicated protocol stack, gives the network infrastructure the information it needs in order to map VLAN identifiers to physical drop links. For simple data source nodes, these packets can be precompiled.
3. Whenever a multicast packet is to be transmitted, it should be tagged with “real-time priority” and the VLAN identifier corresponding to the multicast group.

These simple steps ensure that multicast packets do not block any unnecessary drop links. Below we discuss possible solutions for taking control of the latency within the end nodes.
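The tag of step 3 is simply the four-byte IEEE 802.1Q header: a 16-bit tag protocol identifier (0x8100) followed by a 3-bit priority field, a 1-bit CFI, and the 12-bit VID (see Figure 21.2). A sketch in Python (the group-to-VID table is a hypothetical instance of the offline mapping in step 1, and 6 is used here as an assumed "real-time" priority value):

```python
# Hypothetical offline mapping for step 1: multicast group -> VLAN identifier.
GROUP_TO_VID = {
    "01:00:5e:00:01:0a": 100,
    "01:00:5e:00:01:0b": 101,
}

def vlan_tag(vid, priority):
    """Build the 4-byte IEEE 802.1Q tag of step 3: TPID 0x8100, then the
    tag control information word (3-bit priority, 1-bit CFI left at 0,
    12-bit VID)."""
    assert 0 <= vid < 4096 and 0 <= priority < 8
    tci = (priority << 13) | vid
    return (0x8100).to_bytes(2, "big") + tci.to_bytes(2, "big")

# Tag a frame destined for the first multicast group.
tag = vlan_tag(GROUP_TO_VID["01:00:5e:00:01:0a"], priority=6)
```

A transmitting node inserts these four bytes into the Ethernet header, and every GVRP-aware switch on the path then forwards the frame only toward registered members of that VID.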
21.5.3 Layer 3-Based VLANs

VLANs based on layer 3 information take into account the protocol type (if multiple protocols are supported) or network layer address (for example, the subnet address for TCP/IP networks) in determining VLAN membership. Although these VLANs are based on layer 3 information, this does not constitute a routing function and should not be confused with network layer routing. Even though a switch inspects a packet’s IP address to determine VLAN membership, no route calculation is undertaken, and frames traversing the switch are usually bridged according to the implementation of the Spanning Tree Algorithm (the purpose of this algorithm is to make sure that no network packet is caught in an endless loop between switches). There are several advantages to defining VLANs at layer 3. First, it enables partitioning by protocol type. This may be an attractive option for network managers who are dedicated to a service- or application-based VLAN strategy. Second, nodes can physically move their workstations without having to reconfigure each workstation’s network address — a benefit primarily for TCP/IP nodes. Third, defining

*VID: VLAN identifier.
VLANs at layer 3 can eliminate the need for frame tagging in order to communicate VLAN membership between switches, reducing transport overhead. One of the disadvantages of defining VLANs at layer 3 (vs. MAC- or port-based VLANs) can be performance. Inspecting layer 3 addresses in packets is more time consuming than looking at MAC addresses in frames. For this reason, switches that use layer 3 information for VLAN definition are generally slower than those that use layer 2 information. It should be noted that this performance difference is true for most, but not all, vendor implementations.
References

[1] Ø. Holmeide and T. Skeie, VoIP drives realtime ethernet, in Industrial Ethernet Book, Vol. 5, GGH Marketing Communications, Titchfield, UK, 2001.
[2] T. Skeie, S. Johannessen, and Ø. Holmeide, The road to an end-to-end deterministic Ethernet, in Proceedings of the 4th IEEE International Workshop on Factory Communication Systems (WFCS), September 2002.
[3] IEEE 802.1D, Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Communication Specification: Part 3: Media Access Control Bridges, 1998.
[4] IEEE 802.1Q, IEEE Standards for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks, 1998.
[5] T. Skeie, S. Johannessen, and C. Brunner, Ethernet in substation automation, IEEE Control Systems Magazine, 22: 43–51, 2002.
[6] T. Skeie, S. Johannessen, and Ø. Holmeide, Highly accurate time synchronization over switched Ethernet, in Proceedings of the 8th IEEE Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juan-les-Pins, France, 2001, pp. 195–204.
22 Wireless LAN Technology for the Factory Floor: Challenges and Approaches 22.1 Introduction ......................................................................22-1 22.2 Wireless Industrial Communications and Wireless Fieldbus: Challenges and Problems .................................22-2 System Aspects • Real-Time Transmission over Error-Prone Channels • Integration of Wired and Wireless Stations/Hybrid Systems • Mobility Support • Security Aspects and Coexistence
22.3 Wireless LAN Technology and Wave Propagation .........22-5 Wireless LANs • Wave Propagation Effects
22.4 Physical Layer: Transmission Problems and Solution Approaches .........................................................22-7 Effects on Transmission • Wireless Transmission Techniques
22.5 Problems and Solution Approaches on the MAC and Link Layer ..........................................................................22-8 Problems for Wireless MAC Protocols • Methods for Combating Channel Errors and Channel Variation
22.6 Wireless Fieldbus Systems: State of the Art ..................22-12 CAN • FIP/WorldFIP • PROFIBUS • Other Fieldbus Technologies
22.7 Wireless Ethernet/IEEE 802.11.......................................22-14 Brief Description of IEEE 802.11 • Real-Time Transmission over IEEE 802.11
Andreas Willig University of Potsdam
22.8 Summary..........................................................................22-15 References ...................................................................................22-15
22.1 Introduction

Wireless communication systems have diffused into an ever-increasing number of application areas and achieved wide popularity. Wireless telephony and cellular systems are now an important part of our daily lives, and wireless local area network (WLAN) technologies are more and more becoming the primary way to access business and personal data. Two important benefits of wireless technology are key to this success: the need for cabling is greatly reduced, and computers as well as users can be truly mobile. This saves
costs and enables new applications. In factory plants, wireless technology can be used in many interesting ways [24, Chap. 2]:
• Provision of the communication services for distributed control applications involving mobile subsystems like autonomous transport vehicles, robots, or turntables
• Implementation of distributed control systems in explosible areas or in the presence of aggressive chemicals
• Easing frequent plant reconfiguration, as fewer cables have to be remounted
• Mobile plant diagnosis systems and wireless stations for programming and on-site configuration

However, when adopting WLAN technologies for the factory floor, some problems occur. The first problem is the tension between the hard reliability and timing requirements (hard real time) pertaining to industrial applications, on the one hand, and the problem of wireless channels having time-variable and sometimes quite high error rates, on the other. A second major source of problems is the desire to integrate wireless and wired stations into one single network (henceforth called a hybrid system or hybrid network). This integration calls for the design of interoperable protocols for the wired and wireless domains. Furthermore, using wireless technology imposes problems not anticipated in the original design of the (wired) fieldbus protocols: security problems, interference, mobility management, and so on.

In this chapter we survey some issues pertaining to the design and evaluation of protocols and architectures for (integrated) wireless industrial LANs and provide an overview of the state of the art. The emphasis is on aspects influencing the time and reliability behavior of wireless transmission. However, we discuss not only the problems but also different solution approaches on the physical, medium access control (MAC), or data link layer.
These layers are key to the success of wireless fieldbus systems because they have important responsibilities in fulfilling timing and reliability requirements, and furthermore, they are exposed most directly to the wireless link characteristics. In the second part of this chapter, we focus on technologies and the creation of hybrid systems. On the one hand, there are a number of existing fieldbus standards like Controller Area Network (CAN), Factory Instrumentation Protocol (FIP)/WorldFIP, or PROFIBUS. For these systems we discuss problems and approaches to create hybrid systems. On the other hand, one could start from existing wireless technologies and ask about their capabilities with respect to timeliness and reliability. The most widely deployed WLAN technology is currently the IEEE 802.11 WLAN standard; its suitability for industrial applications is discussed. This chapter is structured as follows: In Section 22.2 important general considerations and problems of wireless industrial communications and wireless fieldbus systems are presented. In Section 22.3, we discuss some basic aspects of wireless LAN technology and wireless wave propagation. The transmission impairments resulting from certain wave propagation effects and some physical layer approaches to deal with them are presented in Section 22.4. Wireless wave propagation also has some interesting consequences on the operation of the MAC and data link layer; these are discussed in Section 22.5. The following two sections take a more technology-oriented perspective. Specifically, in Section 22.6 we survey the state of the art regarding wireless industrial communication systems and wireless fieldbus systems. In Section 22.7 we present the important aspects of the IEEE 802.11 WLAN standard with respect to transmission of real-time data. Finally, in Section 22.8 we provide a brief summary. 
The chapter is restricted to protocol-related aspects of wireless transmission; other aspects like signal processing, analog and digital circuitry, or energy aspects are not considered. There are many introductory and advanced books on wireless networking, for example, [1, 12, 37, 56, 61, 63, 65, 66]. Several separate topics in wireless communications are treated in [25]. Furthermore, this chapter is not intended to serve as an introduction to fieldbus technologies; some background information can be found in [14, 46].
22.2 Wireless Industrial Communications and Wireless Fieldbus: Challenges and Problems

In this section we survey some of the problem areas arising in wireless fieldbus systems.
© 2005 by CRC Press
Wireless LAN Technology for the Factory Floor: Challenges and Approaches
22.2.1 System Aspects

First, wireless fieldbus systems will operate in environments similar to those of wired ones. Typically, a small to moderate number of stations are distributed over geographically small areas with no more than 100 m between any pair of stations [33]. Wired fieldbus systems offer bit rates ranging from hundreds of kilobits to (tens of) megabits per second, and wireless fieldbus systems should have comparable bit rates. The wireless transceivers have to meet electromagnetic compatibility (EMC) requirements, meaning that they not only have to restrict their radiated power and frequencies, but also should be properly shielded from strong magnetic fields and electromagnetic noise emanating from strong motors, high-voltage electrical discharges, and so on. This may pose a serious problem when off-the-shelf wireless transceivers are used (for example, commercial IEEE 802.11 hardware), since these are typically designed for office environments and have no industrial-strength shielding. Another problem is that many small fieldbus devices get their energy supply from the same wire as that used for data transmission. If the cabling is to be removed from these devices, there is not only the problem of wireless data transmission, but also the issue of wireless power transmission [31], which requires substantial effort. For battery-driven devices, the need to conserve energy arises. This has important consequences for the design of protocols [18, 27] but is not discussed further in this chapter.
22.2.2 Real-Time Transmission over Error-Prone Channels

In industrial applications, hard real-time requirements often play a key role. In accordance with [58] we assume the following important characteristics of hard real-time communications: (1) safety-critical messages must be transmitted reliably within an application-dependent deadline; (2) there should be support for message priorities to distinguish between important and unimportant messages; (3) messages with stringent timing constraints typically have a small size; and (4) both periodic and aperiodic/asynchronous traffic is present. The qualifier hard stems from the fact that losses or deadline misses of safety-critical packets can cost lives or damage equipment. Both periodic and aperiodic messages in fieldbus systems can be subject to hard real-time constraints. Wireless media tend to exhibit time-variable and sometimes high error rates, which creates a problem for fulfilling the hard real-time requirements. As an example, the measurements presented in [82] have shown that in a certain industrial environment, for several seconds no packet gets through the channel. Therefore, seeking deterministic guarantees regarding timing and reliability is not appropriate. Instead, stochastic guarantees become important. An example formulation might be: the percentage of safety-critical messages that can be transmitted reliably within a prespecified time bound should be at least 99.x%. Of course, the error behavior limits the application areas of wireless industrial LANs: when deterministic guarantees in the range of 10 to 100 ms are essential, wireless transmission is ruled out (at least at the current state of the art). However, if occasional emergency stop conditions due to message loss or missed deadlines are tolerable, wireless technologies can offer their potential. The goal is to reduce the frequency of losses and deadline misses. How transmission reliability can be implemented depends on the communication model.
In many fieldbus systems (for example, PROFIBUS) packets are transmitted from a sender to an explicitly addressed receiver station without involving other stations. Reliability can be ensured by several mechanisms, for example, retransmissions, packet duplications, or error-correcting codes. On the other hand, systems like FIP/WorldFIP [73] and CAN [35] implement the model of a real-time database where data are identified instead of stations. A piece of data has one producer and potentially many consumers. The producer broadcasts the data and all interested consumers copy the data packet into an internal buffer. This broadcast approach prohibits the use of acknowledgments and packet retransmissions, but error-correcting codes can still be used to increase transmission reliability. Often the data are transmitted periodically and (repeated) packet losses can be detected by comparing the known period and the time of the last arrival of a data packet. This freshness information can be used by the application to react properly.
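The freshness check described above — detecting (repeated) packet losses by comparing a variable's known period with the time of its last arrival — can be sketched in a few lines. This is an illustrative sketch only; the class name, API, and the tolerance factor are assumptions, not part of any fieldbus standard.

```python
class FreshnessMonitor:
    """Tracks periodically produced process variables and flags stale
    ones, i.e., variables whose producer broadcasts were likely lost.
    Hypothetical sketch: names and interface are illustrative."""

    def __init__(self):
        self.period = {}     # variable id -> expected production period (s)
        self.last_seen = {}  # variable id -> time of last arrival (or None)

    def register(self, var_id, period):
        self.period[var_id] = period
        self.last_seen[var_id] = None

    def on_packet(self, var_id, now):
        # called by the consumer when a data packet for var_id arrives
        self.last_seen[var_id] = now

    def stale(self, now, tolerance=2.0):
        """Return variables not updated within `tolerance` periods,
        so the application can react (e.g., enter a safe state)."""
        result = []
        for var_id, p in self.period.items():
            t = self.last_seen[var_id]
            if t is None or now - t > tolerance * p:
                result.append(var_id)
        return result
```

A consumer would call `on_packet` from its receive path and poll `stale` from the application loop; the reaction to staleness (hold last value, fall back to a default, raise an alarm) remains application specific.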
The Industrial Communication Technology Handbook
22.2.3 Integration of Wired and Wireless Stations/Hybrid Systems

There is a huge number of existing and productive fieldbus installations, and it is best if wireless stations can be integrated into them. A network with both wireless stations (stations with a wireless transceiver) and wired stations is called a hybrid system. The most important requirements for hybrid systems are:

• Transparency: There should be no need to modify the protocol stack of wired stations.
• Using specifically tailored protocols: Most fieldbus systems are specified on layers 1 (physical layer), 2 (medium access control and link layer), and 7 (application layer). The introduction of a wireless physical layer affects the behavior and performance of both the medium access control and link layer. The existing protocols for wired fieldbus systems are not designed for a wireless environment and should be replaced by protocols specifically tailored for the wireless link. However, this comes at the cost of protocol conversion between wired and wireless protocols.
• Portability of higher-layer software: If the link layer interface is the same for both the wireless and wired protocol stacks, implementations of higher-layer protocols and application software can be used in the same way on both types of stations.

The different approaches to integrate wireless stations into wired fieldbus LANs can be classified according to the layer of the Open Systems Interconnection (OSI) reference model where the integration actually happens [13, 81]. Almost all fieldbus systems are restricted to the physical, data link, and application layers [14]. The classification is as follows:

• Wireless cable replacement approach: All stations are wired stations and thus attached to a cable. A piece of cable can be replaced by a wireless link, and special bridgelike devices translate the framing rules used on the wireless and wired media, respectively. In this approach, no station is aware of the wireless link.
A typical application scenario is the wireless interconnection of two fieldbus segments.
• Wireless MAC-unaware bridging approach: The network is composed of both wired and wireless stations, but integration happens solely at the physical layer. Again, a bridgelike device translates the framing rules between wired and wireless media. The wireless stations use merely an alternative physical layer (PHY), but the medium access control (MAC) and link layer protocols remain the same as for wired stations.
• Wireless MAC-aware bridging approach: The LAN is composed of both wired and wireless stations and integration happens at the MAC and data link layer. There are two different MAC and link layer protocol stacks for wired and wireless stations, but both offer the same link layer interface. The wireless MAC and link layer protocols should be (1) specifically tailored to the wireless medium and (2) easily integrable with the wired MAC and link layer protocols. An intelligent bridgelike device is responsible for both translation of the different framing rules and interoperation of the different MAC protocols.
• Wireless gateway approach: In this approach integration happens at the application layer or even in the application itself. Entirely different protocols can be used on the different media types.
• Some mixture of these approaches.

Any of these approaches requires special coupling devices at the media boundaries. For the wireless cable replacement and the MAC-unaware bridging approaches, these devices can be simple. The other approaches may require complex and stateful operations. Hence, the issues of failure and redundancy need to be addressed.
22.2.4 Mobility Support

The potential station mobility is one of the main attractions of wireless systems. We can assume that wireless fieldbus systems will be mostly infrastructure based (meaning that there are base stations or access points). A handover must be performed when a mobile station moves from the range of one access point into the
range of another access point. Typically, handover processes involve exchange of signaling packets between the mobile and access points. Ideally, a station can fulfill timing and reliability requirements even during a handover. The applicability and performance of handover schemes depend on the maximum speed of a mobile station. In industrial applications, it is typically forklifts, robots, or moving plant subsystems that are mobile, and it is safe to assume that these devices will have a maximum speed of 20 km/h [33]. A simple consequence of mobility is that stations may enter and leave a network at unforeseeable times. To support this, a protocol stack at a minimum must offer functionalities to make a new station known to the network/the other stations, and sometimes address assignment is also needed. On the other hand, fieldbus systems and their applications often are designed with the assumption that the network is set up once and not changed afterwards. Consequently, some fieldbus systems do not support any dynamics in their set of stations. Consider, for example, the FIP/WorldFIP fieldbus [73]. This system belongs to the class of real-time database systems, and the producers of data items (called process variables) are polled by a dedicated central station, the bus arbiter. The bus arbiter keeps a table of variable identifiers and traverses it cyclically. To include a new station into the system, the arbiter’s table has to be modified by a human operator. It is worth noting that the most widely used fieldbus systems do not offer any support for dynamic address assignment.
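The cyclic table traversal of the FIP/WorldFIP bus arbiter described above can be sketched as follows. This is a sketch of the polling logic only; WorldFIP framing, timing, and aperiodic windows are omitted, and the `poll` callback is a hypothetical stand-in for broadcasting an identifier frame and collecting the produced value.

```python
def arbiter_cycle(table, poll):
    """One traversal of the bus arbiter's table of variable
    identifiers: each identifier is broadcast in turn and the
    corresponding producer answers with the current value.
    `poll` models that request/response; returns id -> value."""
    return {var_id: poll(var_id) for var_id in table}
```

Because the table is fixed and traversed cyclically, adding a new station means editing this table — which is exactly why a human operator must intervene to extend a running WorldFIP system.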
22.2.5 Security Aspects and Coexistence

Security played no important role in the initial design of the fieldbus standards. This was reasonable, because physical access to a wire is needed to eavesdrop or inject packets. However, the introduction of wireless media allows an attacker to eavesdrop packets at some distance, for example, on the factory's parking lot. Even worse, an attacker could generate interference on the operating frequency of a wireless fieldbus system and distort all transmissions (including the time-critical and important ones). An attacker might also try to inject malicious packets into the network, for example, false valve commands. Therefore, security measures (integrity, authentication, authorization) have to be added to wireless fieldbus systems [64]. Noise and interference are not only generated purposely by some attackers, but can also come from co-located wireless systems working in the same frequency band. As an example, both IEEE 802.11 and Bluetooth use the 2.4-GHz ISM (industrial, scientific, and medical) band and create mutual interference. This coexistence problem is explored in [10].
22.3 Wireless LAN Technology and Wave Propagation

In this section we discuss some basic characteristics of WLAN technology and present some of the fundamental wave propagation effects. In Sections 22.4 and 22.5 we discuss physical layer and MAC/link layer approaches to overcome or at least relax some of the problems created by the propagation effects.
22.3.1 Wireless LANs

Wireless LANs are designed for packet-switched communications over short distances (up to a few hundred meters) and with moderate to high bit rates. As an example, the IEEE 802.11 WLAN standard offers bit rates between 1 and 54 Mb/s [54, 69]. Wireless LANs usually use either infrared or radio frequencies. In the latter case, license-free bands like the 2.4-GHz ISM band are particularly attractive, since the only restriction in using this band is a transmit power limit. On the other hand, since anyone can use these bands, several systems have to coexist. Radio waves below 6 GHz propagate through walls and can be reflected on several types of surfaces, depending on both frequency and material. Thus, with radio frequencies non-line-of-sight (NLOS) communication is possible. In contrast, systems based on infrared only allow for line-of-sight (LOS) communications over a short distance. An example is the IrDA (Infrared Data Association) system [79]. Wireless LANs can be roughly subdivided into ad hoc networks [71] and infrastructure-based networks. In the latter case some centralized facilities like access points or base stations are responsible for tasks
like radio resource management, forwarding data to distant stations, mobility management, and so on. In general, stations cannot communicate without the help of the infrastructure. In ad hoc networks there is no prescribed infrastructure and the stations have to organize network operation by themselves. Infrastructure-based WLANs offer some advantages for industrial applications. Many industrial communication systems already have an asymmetric structure that can be naturally accommodated in infrastructure-based systems. The often used master–slave communication scheme serves as an example. Furthermore, the opportunity to offload certain protocol processing tasks to the infrastructure keeps mobile stations simpler and allows them to make efficient centralized decisions. Compared to other wireless technologies like cellular systems and cordless telephony, WLAN technologies seem to offer the best compromise between data rate, geographical coverage, and license-free/ independent operation.
22.3.2 Wave Propagation Effects

In the wireless channel waves propagate through the air, which is an unguided medium. The wireless channel characteristics are significantly different from those of guided media, like cables and fibers, and create unique challenges for communication protocols. A transmitted waveform is subjected to phenomena like path loss, attenuation, reflection, diffraction, scattering, adjacent and co-channel interference, thermal or man-made noise, and imperfections in the transmitter and receiver circuitry [8, 61]. The path loss characterizes the loss in signal power when increasing the distance between a transmitter T and a receiver R. In general, the mean received power level E[P_Rx] can be represented as the product of the transmit power P_Tx and the mean path loss E[PL]:

E[P_Rx] = P_Tx · E[PL]

A typical path loss model for omnidirectional antennas is given by [61, Chap. 4.9]:

E[PL](d) = C · (d/d0)^(-n)

where d ≥ d0 is the distance between T and R, E[PL](d) is the mean path loss, d0 is a reference distance that depends on the antenna technology, C is a technology- and frequency-dependent scaling factor, and n is the so-called path loss exponent. Typical values for n are between 2 (free-space propagation) and 5 (shadowed urban cellular radio); see also [61, Chap. 4.9].

Reflection occurs when a waveform impinges on a smooth surface with structures significantly larger than the wavelength. Not all signal energy is reflected; some energy is absorbed by the material. The mechanism of diffraction allows a wave to propagate into a shadowed region, provided that some sharp edge exists. Scattering is produced when a wavefront hits a rough surface having structures smaller than the wavelength; it leads to a signal diffusion in many directions. The most important types of interference are co-channel interference and adjacent channel interference. In co-channel interference a signal transmitted from T to R on channel c1 is distorted by a parallel transmission on the same channel. In case of adjacent channel interference the interferer I transmits on an adjacent channel c2, but due to imperfect filters R captures frequency components from c2. Alternatively, an interferer I transmitting on channel c2 leaks some signal energy into channel c1 due to imperfect transmit circuitry (amplifier). Noise can be thermal noise or man-made noise. Thermal noise is created in the channel or in transceiver circuitry and can be found in almost any communications channel. Man-made noise in industrial environments can have several sources, for example, remote controls, motors, or microwave ovens.
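The path loss model above is easy to evaluate numerically. The following sketch computes the mean received power for a given transmit power and distance; the default values for C, d0, and n are illustrative placeholders, not measured constants for any particular technology.

```python
def mean_rx_power(p_tx, d, d0=1.0, c=1e-4, n=2.0):
    """Mean received power E[P_Rx] = P_Tx * C * (d/d0)**(-n).

    p_tx : transmit power (linear units, e.g. watts)
    d    : transmitter-receiver distance (must satisfy d >= d0)
    d0   : reference distance of the model
    c    : technology/frequency-dependent scaling factor (placeholder)
    n    : path loss exponent, typically between 2 and 5
    """
    if d < d0:
        raise ValueError("model only valid for d >= d0")
    return p_tx * c * (d / d0) ** (-n)
```

Note how the exponent governs the decay: with n = 2 (free space), doubling the distance quarters the received power, while with n = 4 or 5 (heavily shadowed environments) the same doubling costs 12 to 15 dB.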
22.4 Physical Layer: Transmission Problems and Solution Approaches

The previously discussed wave propagation effects can lead to channel errors. In general, their impact depends on a multitude of factors, including frequency, modulation scheme, and the current propagation environment. The propagation environment is characterized by the distance between stations, interferers, the number of different paths and their respective losses, and more. These factors can change when a station or parts of the environment move. Consequently, the transmission quality is time variable.
22.4.1 Effects on Transmission

The notion of slow fading refers to significant variations in the mean path loss, as they occur due to significant changes in distance between transmitter T and receiver R or by moving beyond large obstacles. Slow fading phenomena usually occur on longer timescales; they often coincide with human activity like mobility. For short durations in the range of a few seconds, the channel can often be assumed to have constant path loss. An immediate result of reflection, diffraction, and scattering is that multiple copies of a signal may travel on different paths from T to R. Since these paths usually have different lengths, the copies arrive at different times (delay spread) and with different phase shifts at the receiver and overlap. This has two consequences:

• The overlapping signals can interfere constructively or destructively. Destructive interference may lead to up to a 40-dB loss of received power. Such a situation is often called a deep fade.
• The delay spread leads to intersymbol interference, since signals belonging to neighboring information symbols overlap at the receiver.

If the stations move relative to each other or to the environment, the number of paths and their phase shifts vary in time. This results in a fast fluctuating signal strength at the receiver (called fast fading or multipath fading). It is important to note that these fluctuations are much faster than those caused by slow fading. Fast fading happens on the scale of milliseconds, whereas slow fading happens at scales of seconds or minutes. On the timescale of milliseconds, the mean signal strength is constant. If the delay spread is small relative to the duration of a channel symbol, the channel is called non-frequency selective or flat; otherwise, it is called frequency selective. These problems translate into bit errors or packet losses. Packet losses occur when the receiver fails to acquire bit synchronization [82].
In case of bit errors synchronization is successfully acquired, but a number of channel symbols are decoded incorrectly. The bit error rate can, for example, be reduced by using forward error correction (FEC) techniques [11, 45]. The statistical properties of bit errors and packet losses were investigated in a number of studies [16, 52, 82]. While the results are not immediately comparable, certain trends show up in almost every study:

• Both bit errors and packet losses are bursty; they occur in clusters with error-free periods between the clusters. The distributions of the cluster lengths and the lengths of error-free periods often have a large coefficient of variation or even seem to be heavy tailed.
• The bit error rates depend on the modulation scheme; typically schemes with higher bit rates/symbol rates exhibit higher error rates.
• The wireless channel is much worse than wired channels; often bit error rates of 10^-3 to 10^-6 can be observed. Furthermore, the bit error rate can vary over several orders of magnitude within minutes.

Some knowledge about error generation patterns and error statistics can be helpful in designing more robust protocols.
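A common way to reproduce the bursty error behavior reported in these measurement studies is a two-state Markov ("Gilbert-Elliott") channel model: a good state with a low bit error rate and a bad state with a high one, with random transitions between them. The sketch below is illustrative; the transition probabilities and error rates are assumed values, not taken from any of the cited measurements.

```python
import random

def gilbert_elliott(n_bits, p_gb=0.01, p_bg=0.1,
                    ber_good=1e-6, ber_bad=0.3, seed=42):
    """Simulate a bursty bit-error process with a two-state Markov
    model.  Returns a list of booleans: True = bit i was corrupted.

    p_gb / p_bg : per-bit probability of switching good->bad / bad->good
    ber_good / ber_bad : bit error rate within each state
    (all parameter values are illustrative placeholders)
    """
    rng = random.Random(seed)   # seeded for reproducibility
    state = "good"
    errors = []
    for _ in range(n_bits):
        ber = ber_good if state == "good" else ber_bad
        errors.append(rng.random() < ber)
        # state transition after each bit
        if state == "good" and rng.random() < p_gb:
            state = "bad"
        elif state == "bad" and rng.random() < p_bg:
            state = "good"
    return errors
```

Because errors cluster in the bad state, such a trace lets one study, e.g., how retransmission timing interacts with burst lengths — something an independent-error model cannot show.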
22.4.2 Wireless Transmission Techniques

A number of different transmission techniques have been developed to combat the impairments of the wireless channel and to increase the reliability of data transmission. Many types of WLANs (including IEEE 802.11) rely on spread-spectrum techniques [26], where a narrowband information signal is spread to a wideband signal at the transmitter and de-spread back to a narrowband signal at the receiver. By using a wideband signal, the effects of narrowband noise or narrowband interference are reduced. The two most important spread-spectrum techniques are direct-sequence spread spectrum (DSSS) and frequency-hopping spread spectrum (FHSS). In DSSS systems an information bit is multiplied (XORed) with a finite bipolar chip sequence such that transmission takes place at the chip rate instead of the information bit rate. The chip rate is much higher than the information rate and consequently requires more bandwidth; accordingly, the duration of a chip is much smaller than the duration of a user symbol. The chip rate is chosen such that the average delay spread is larger than the chip duration; thus, the channel is frequency selective. Receivers can exploit this in different ways. To explain the first one, let us assume that the receiver receives a signal S from a line-of-sight path and a delayed signal S′ from another path, such that the delay difference (called lag) between S and S′ is more than the duration of a single chip. The chip sequences are designed such that the autocorrelation between the sequence and a shifted version of it is low for all lags of more than one chip duration. If a coherent matched-filter receiver is synchronized with the direct signal S, the delayed signal S′ appears as white noise and produces only a minor distortion. In the RAKE receiver approach, delayed signal copies are not treated as noise but as a useful source of information [65, Section 10.4].
Put briefly, a RAKE receiver tries to acquire the direct signal and the strongest time-delayed copies and combines them coherently. However, RAKE receivers are much more complex than simple matched-filter DSSS receivers. In FHSS the available spectrum is divided into a number of subchannels. The transmitter hops through the subchannels according to a predetermined schedule, which is also known to the receiver. The advantage of this scheme is that a subchannel currently subject to transmission errors is used only for a short time before the transmitter hops to the next channel. The hopping frequency is an important parameter of FHSS systems, since high frequencies require fast and accurate synchronization. As an example, the FHSS version of IEEE 802.11 hops at 2.5 Hz and many packets can be transmitted before the next hop. In Bluetooth the hopping frequency is 1.6 kHz and at most one packet can be transmitted before the next hop. Packets are always transmitted without being interrupted by hopping. Recently there has been considerable interest in orthogonal frequency-division multiplexing (OFDM) techniques [75]. OFDM is a multicarrier technique, where blocks of N different symbols are transmitted in parallel over a number of N subcarriers. Hence, a single symbol has an increased symbol duration N · t, compared to full-rate transmission with symbol duration t. The symbol duration N · t is usually much larger than the delay spread of the channel, thereby combating intersymbol interference and increasing channel quality. IEEE 802.11a [54] as well as HIPERLAN/2 [20, 21] use an OFDM physical layer.
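The DSSS spreading operation described above — XORing each data bit with a chip sequence and recovering it by correlation — can be illustrated with the 11-chip Barker code of IEEE 802.11 (shown here in one common bit-level representation). This is a didactic sketch at the bit level; a real receiver correlates analog samples, not clean chips.

```python
# 11-chip Barker sequence (bit-level form of +1 -1 +1 +1 -1 +1 +1 +1 -1 -1 -1)
BARKER11 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]

def spread(bits, chips=BARKER11):
    """Spread each data bit by XORing it with the whole chip sequence:
    bit 0 sends the sequence as-is, bit 1 sends it inverted."""
    return [b ^ c for b in bits for c in chips]

def despread(chip_stream, chips=BARKER11):
    """Recover the data bits by correlating each 11-chip block against
    the known sequence; a majority of matching chips means bit 0."""
    n = len(chips)
    bits = []
    for i in range(0, len(chip_stream), n):
        block = chip_stream[i:i + n]
        agree = sum(ch == c for ch, c in zip(block, chips))
        bits.append(0 if agree > n // 2 else 1)
    return bits
```

The majority decision is what buys robustness: up to 5 of the 11 chips of a bit can be corrupted (by narrowband interference, say) and the bit is still decoded correctly.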
22.5 Problems and Solution Approaches on the MAC and Link Layer

The MAC and the link layer are exposed most to the error behavior of wireless channels and should do most of the work needed to improve the channel quality. Specifically for hard real-time communications, the MAC layer is a key component: if the delays on the MAC layer are not bounded, the upper layers cannot compensate for this. In general, the operation of the MAC protocol is largely influenced by the properties of the physical layer. Some of the unique problems of wireless media are discussed in this section. For a general discussion of MAC and link layer protocols, refer to [15, 28, 40, 68, 74].
FIGURE 22.1 Hidden-terminal scenario.
22.5.1 Problems for Wireless MAC Protocols

Several problems arise due to path loss in conjunction with a threshold property: wireless receivers require the signal to have a minimum strength to be recognized. For a given transmit power, this requirement translates into an upper bound on the distance between two stations wishing to communicate*; if the distance between two stations is larger, they cannot hear each other's transmissions. For MAC protocols based on carrier sensing (carrier-sense multiple access (CSMA)), this property creates the hidden-terminal [70] and exposed-terminal problems. The hidden-terminal problem is sketched in Figure 22.1: consider three stations A, B, and C with transmission radii as indicated by the circles. Stations A and C are in the range of B, but A is not in the range of C and vice versa. If C starts to transmit to B, A cannot detect this by its carrier-sensing mechanism and considers the medium to be free. Consequently, A also starts packet transmission and a collision occurs at B. The exposed-terminal problem is also a result of false prediction of the channel state at the receiver. An example scenario is shown in Figure 22.2. The four stations A, B, C, and D are placed such that the pairs A/B, B/C, and C/D can hear each other; all other combinations cannot. Consider the situation where B transmits to A, and one short moment later C wants to transmit to D. Station C performs carrier sensing and senses the medium busy due to B's transmission. Consequently, C postpones its transmission. However, C could safely transmit its packet to D without disturbing B's transmission to A. This leads to a loss of efficiency. Two approaches to solve these problems are busy-tone solutions [70] and the RTS/CTS protocol (defined below). In the busy-tone solution two channels are assumed: a data channel and a separate control channel for the busy-tone signals.
The receiver of a packet transmits a busy-tone signal on the control channel during packet reception. If a prospective transmitter wants to perform carrier sensing, it listens
FIGURE 22.2 Exposed-terminal scenario.
*To complicate things, wireless links are not necessarily bidirectional: it may well happen that station A can hear station B but not vice versa.
on the control channel instead of the data channel. If the control channel is free, the transmitter can start to transmit its packet on the data channel. This protocol solves the exposed-terminal problem. The hidden-terminal scenario is also solved except in those rare cases where A and C start their transmissions simultaneously. However, if the busy tone is transmitted only when the receiver detects a valid packet header, the two colliding stations A and C can abort their transmissions quickly when they perceive the lack of a busy tone. The busy-tone solution requires two channels and two transceivers. The RTS/CTS protocol attacks the hidden-terminal problem using only a single channel. Here we describe the variant used in the IEEE 802.11 WLAN (there are other ones). Consider the case that station A has a data packet for B. After A has obtained channel access, it sends a short request-to-send (RTS) packet to B. This packet includes the time duration needed to finish the whole packet exchange sequence, including the final acknowledgment. If B receives the RTS packet properly, it answers with a clear-to-send (CTS) packet, again including the time needed to finish the packet exchange sequence. Station A starts to transmit its data packet immediately after receiving the CTS packet. Any other station C, hearing the RTS or CTS packet, defers its transmissions for the indicated time, thereby not disturbing the ongoing packet transmission. It is a conservative choice to defer on any of the RTS or CTS packets, and in fact the exposed-terminal problem still exists. One solution could be to let C defer only on reception of a CTS frame, but to allow C a packet transmission if it hears an RTS without a corresponding CTS frame.* The RTS/CTS protocol described here does not prevent collisions of RTS packets, it has significant overhead, and it is still susceptible to subtle variants of the hidden-terminal problem [60].
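The deferral behavior of an overhearing station — in 802.11 terms, its network allocation vector (NAV), the virtual carrier-sensing timer set from the duration field of overheard RTS/CTS frames — can be sketched as follows. This is a strongly simplified model: real 802.11 combines the NAV with physical carrier sensing, backoff, and inter-frame spaces, all of which are omitted here.

```python
class Station:
    """Virtual carrier sensing sketch: a station that overhears an
    RTS or CTS treats the medium as reserved until the announced
    packet exchange (data + acknowledgment) is over."""

    def __init__(self):
        self.nav_until = 0.0   # time until which the medium is reserved

    def on_control_frame(self, now, duration):
        # RTS and CTS both carry the remaining duration of the
        # exchange; keep the longest reservation seen so far
        self.nav_until = max(self.nav_until, now + duration)

    def medium_free(self, now):
        # the station may contend only once the NAV has expired
        return now >= self.nav_until
```

This illustrates why the scheme defeats the hidden terminal: a station out of range of the transmitter still hears the receiver's CTS and keeps silent for the full exchange.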
A significant problem of wireless transceivers is their inability to transmit and receive simultaneously on the same frequency band. Hence, a fast collision detection procedure similar to the CSMA/CD protocol of Ethernet is impossible to implement. Instead, collision detection has to resort to other mechanisms like the busy-tone approach described above (rarely used) or the use of MAC layer acknowledgments (used frequently). Unfortunately, there are fieldbus systems relying on such a feature, for example, the Controller Area Network (CAN) fieldbus [35] with its priority arbitration protocol. In this class of protocols each message is tagged with a priority value, and this value is used to deterministically resolve collisions. In the CAN protocol, all stations are tightly time synchronized and the priority field is always at the start of a packet. All contending stations start packet transmission at the same time. Each contender transmits its priority field bit by bit and reads back the signal from the medium. If the medium state is the same as the transmitted bit, the station continues; otherwise, the station gives up and waits for the next contention cycle. This protocol requires not only the ability to simultaneously transmit and listen on the same channel, but also that the channel produce meaningful values from overlapping signals. Alternative implementations are sketched in Section 22.6.1. Even receiver-based collision detection may not work reliably due to the near–far effect: consider two stations A and B transmitting packets in parallel to a station C. For simplicity, let us assume that both stations use the same transmit power. Station A is very close to C, whereas station B is far away but still in reach of C. Consequently, A's signal at C is much stronger than B's. In this case, it may happen that C successfully decodes a packet sent by A despite B's parallel transmission.
This situation is advantageous for the system throughput but disadvantageous for MAC protocols relying on collision detection or collision resolution.
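The CAN arbitration rule can be emulated in software. The following Python sketch models the wired-AND medium, where a dominant 0 driven by any contender overrides a recessive 1; the 11-bit identifier width follows the CAN base frame format, but the function itself is only an illustration of the mechanism:

```python
# Sketch of CAN-style bitwise priority arbitration. A logical 0 is the
# dominant bus level: if any contender drives 0, every station reads 0.
# A station that wrote a recessive 1 but reads back 0 withdraws, so the
# station with the numerically lowest identifier wins deterministically.

def arbitrate(ids, width=11):
    contenders = set(ids)
    for bit in range(width - 1, -1, -1):            # MSB first
        bus = min((i >> bit) & 1 for i in contenders)  # wired-AND level
        # stations whose transmitted bit differs from the bus level give up
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    assert len(contenders) == 1                     # identifiers are unique
    return contenders.pop()

print(hex(arbitrate([0x65, 0x123, 0x21])))  # -> 0x21: lowest ID wins
```

This is precisely the step that has no direct wireless equivalent: the `min(...)` line assumes a medium that merges overlapping transmissions into a well-defined value.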
22.5.2 Methods for Combating Channel Errors and Channel Variation
A challenging problem for real-time transmission is the error-prone and time-varying channel. There are many possible control knobs for improving the channel quality, for example, transmit power, bit rate/modulation, coding scheme/redundancy scheme, packet length, choice of retransmission scheme (automatic repeat request (ARQ)), postponing schemes and timing of (re)transmissions, diversity schemes [57], and adaptation as a meta-method [22]. In general, adaptation at the transmitter requires
*Clearly, if C receives a distorted CTS packet it should defer.
© 2005 by CRC Press
Wireless LAN Technology for the Factory Floor: Challenges and Approaches
22-11
feedback from the receiver. This feedback can be created by using immediate acknowledgment packets after each data packet. Varying the transmit power and varying the bit rate/modulation scheme are both equivalent to varying the energy per bit, which in turn influences the bit error rate [59, 61]. Roughly, higher transmit powers and slower bit rates/modulation schemes increase the probability of successful packet reception [32].
A common way to protect data bits against bit errors is to use redundancy. Example approaches are error-detecting and -correcting codes (also called forward error correction (FEC)) [44] and the transmission of multiple copies of a packet [2]. The latter approach can also be classified as a time-diversity scheme [57]. It is beneficial for the overall throughput to control the amount of redundancy according to the current channel state such that no or only a little redundancy is added when the channel currently shows only a few errors [9, 16].
A second standard way to deal with transmission errors is retransmissions and suitable ARQ schemes. For channels with bursty errors, it is unwise to retransmit the same packet immediately on the same channel. Specifically, when the mean length of error bursts is of the same order as or larger than the packet length, both the original packet and its immediate retransmission are likely to be hit by the same error burst. Hence, under these circumstances an immediate retransmission wastes time and energy. The transmitter can instead postpone the retransmission for a while and possibly transmit packets to other stations or over other channels in the meantime. If the postponing delay is well chosen, the channel will have left the error burst and the retransmission succeeds. Indeed, it has been demonstrated in [4–6] that such an approach can reduce the number of wasted packets and increase the throughput significantly. But how should one choose the postponing delay?
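The effect of the postponing delay can be illustrated with a two-state (Gilbert–Elliott style) bursty channel model. All transition and loss probabilities below are illustrative, not measured values:

```python
# Monte Carlo sketch contrasting an immediate retransmission with a
# postponed one on a two-state bursty channel.
import random

random.seed(7)               # for reproducibility of the sketch
P_GB, P_BG = 0.05, 0.2       # transition probabilities good->bad, bad->good
LOSS_GOOD, LOSS_BAD = 0.01, 0.8

def step(state):
    if state == "good":
        return "bad" if random.random() < P_GB else "good"
    return "good" if random.random() < P_BG else "bad"

def retry_success(delay, trials=20000):
    """Fraction of retransmissions that succeed `delay` slots after a
    loss; the original packet was just lost, so we start in 'bad'."""
    ok = 0
    for _ in range(trials):
        state = "bad"
        for _ in range(delay):
            state = step(state)
        loss = LOSS_BAD if state == "bad" else LOSS_GOOD
        ok += random.random() >= loss
    return ok / trials

print(retry_success(delay=1))   # retry in the next slot: often still in the burst
print(retry_success(delay=10))  # postponed retry: channel has likely recovered
```

With these parameters the immediate retry succeeds in roughly a third of the cases, the postponed one in roughly four out of five, which is the qualitative effect reported in [4–6].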
One option is to adopt some fixed value, which could be based on measurements or on a priori knowledge about the channel. Another option is to occasionally send small probing packets [84] that the receiver has to acknowledge. If the transmitter captures such an acknowledgment, it assumes the channel to be back in a good state and continues data transmission. For real-time systems, the postponing decision should consider not only the channel state but also the deadline of a packet. The authors of [17] describe a scheme that takes both the estimated channel state (for postponing decisions) and the packet deadline into account to select one coding scheme from a suite of available schemes.
Retransmissions do not necessarily need to use the same channel as the original packet. It is well known that wireless channels are spatially diverse: a signal transmitted by station A can be in a deep fade at geographical position p1 and at the same time good enough to be properly received at another position p2. This property is exploited by certain diversity techniques, for example, receiver diversity [61]: the receiver has two antennas and can pick the stronger/better of the two signals it reads from its antennas. If the spacing between the antennas is large enough,* the signals appear to be uncorrelated. The spatial diversity of wireless channels can also be exploited at the protocol level: assume that station A transmits a packet to station B. The channel from A to B is currently in a deep fade, but station C successfully captures A’s packet. If the channel from C to B is currently in a good state, the packet can be successfully transmitted over this channel. In effect, station C helps A with its retransmission. This idea has been applied in [81] to the retransmission of data packets as well as to poll packets in a polling-based MAC protocol. In general, ARQ schemes can be integrated with forward error correction schemes into hybrid error control schemes [45].
Ideally, for industrial applications, deadlines should be taken into account when designing these schemes. In [3, 72], retransmissions and FEC are combined with the concept of deadlines by increasing the coding strength with each retransmitted packet as the packet deadline approaches. This is called deadline-dependent coding. Another interesting hybrid error control technique is packet combining [30, 39, 78]. Put briefly, in these schemes the receiver tries to take advantage of the partially useful information contained in already received erroneous copies of a packet. For example, if the receiver has received at least three erroneous copies of a packet, it can try to reconstruct the original packet by
*The two antennas should at a minimum have a mutual distance of 40% of the wavelength [61, Chap. 5]. If the system works in the 2.4-GHz ISM band, this amounts to 5 to 6 cm.
The Industrial Communication Technology Handbook
applying bit-by-bit majority voting. There are other packet-combining techniques, for example, equal-gain combining. Sometimes the packet error probability (and therefore the need for retransmissions) can be reduced by proper tuning of packet sizes. Intuitively, it is clear that larger packets are more likely to be hit by errors than smaller ones. On the other hand, with smaller packets the fixed-size packet header becomes more dominant and leads to increased overhead. If the transmitter has estimates of current channel conditions, it can choose the appropriate packet size, striking the desired trade-off between reliability and efficiency [47].
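A bit-by-bit majority vote over three received copies can be written down directly. The payload below is a made-up example; real combiners operate on demodulator outputs rather than Python byte strings:

```python
# Sketch of packet combining by bit-wise majority voting over three
# erroneous copies of the same packet, as described above.

def majority_combine(copies):
    # For each bit position, keep the value that the majority of the
    # received copies agree on (copies are equal-length byte sequences).
    out = bytearray(len(copies[0]))
    for i in range(len(out)):
        for bit in range(8):
            votes = sum((c[i] >> bit) & 1 for c in copies)
            if votes * 2 > len(copies):
                out[i] |= 1 << bit
    return bytes(out)

original = b"sensor-reading-42"
# three copies, each hit by a different single-bit error
c1 = bytearray(original); c1[0] ^= 0x04
c2 = bytearray(original); c2[5] ^= 0x80
c3 = bytearray(original); c3[9] ^= 0x01
print(majority_combine([c1, c2, c3]) == original)  # -> True
```

With three copies, a bit is recovered unless at least two copies err at the same position, which is unlikely for independent errors but quite possible within one error burst.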
22.6 Wireless Fieldbus Systems: State of the Art
Fieldbus systems are designed to deliver hard real-time services under harsh environmental conditions. A wireless fieldbus [13] should be designed to provide as stringent stochastic timing and reliability guarantees as possible over wireless links. However, in most of the literature surveyed in this section this issue is not addressed. Nonetheless, we discuss existing approaches for different popular fieldbus systems.
22.6.1 CAN
As already described in Section 22.5.1, the CAN system [35] uses a priority arbitration protocol on the MAC layer, which cannot be implemented directly on a wireless link. Some approaches have been developed to circumvent this; here we discuss a centralized and two distributed solutions [42].
The distributed WMAC protocol uses a CSMA/CA (carrier-sense multiple access with collision avoidance) scheme with priority-dependent backoffs. A station wishing to transmit a packet uses a carrier-sense mechanism to wait for the end of an ongoing packet transmission. After this, the station picks a backoff time depending on the priority value of the current packet. The station listens on the channel during the backoff time. If no other station starts transmission, the station assumes that it has the highest priority and starts transmitting its own packet. Otherwise, the station defers and starts over after the other packet transmission has finished.
In another distributed scheme the CAN message priority value is mapped onto the channel using an on–off keying scheme [41]: a station transmits a short burst if the current priority bit is a logical one; otherwise, it switches into receive mode. If the station receives any signal, it gives up; otherwise, it continues with the next bit. The priority bits are considered from the most significant bit to the least significant bit. If the station is still contending after the last bit, it transmits the actual data packet. This approach requires tight synchronization and fast switching between transmit and receive modes of the radio transceiver, which is a problem for certain WLAN technologies.
The centralized RFMAC protocol leverages the fact that CAN belongs to the class of systems using the real-time database communication model. Data items are identified by unique identifiers.
Similar to FIP/WorldFIP, all communication is controlled by a central station broadcasting the variable identifiers and causing the producers of the corresponding data items to transmit the data.
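The priority-dependent backoff of the WMAC scheme described above can be sketched as follows; the direct mapping of the priority value to a backoff slot count and the station set are illustrative assumptions, not details from [42]:

```python
# Sketch of WMAC-style contention: each station backs off for a number
# of idle slots derived from its message priority (lower value = more
# important = shorter backoff) while listening. The first station whose
# backoff expires transmits; all others hear it and defer.

def contend(stations):
    """stations: dict name -> priority; assumes unique priorities,
    since equal backoffs would collide on a real channel."""
    backoff = dict(stations)                   # slots left per station
    for slot in range(max(backoff.values()) + 1):
        starters = [n for n, b in backoff.items() if b == slot]
        if starters:
            return starters[0], slot           # winner and its start slot

winner, slot = contend({"A": 7, "B": 3, "C": 12})
print(winner, slot)  # -> B 3: the highest-priority message wins
```

Unlike CAN's bitwise arbitration, this scheme needs no wired-AND medium, only carrier sensing during the backoff, which is why it maps more naturally onto WLAN hardware.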
22.6.2 FIP/WorldFIP
The FIP/WorldFIP fieldbus uses a polling table to implement a real-time database [73]. To couple wired and wireless stations, in [49] a wireless-to-wired gateway is introduced, serving as a central station for the wireless part. The wireless MAC protocol uses time-division multiple access (TDMA), and each TDMA slot is used to transmit one data item (also called process variable).
In the OLCHFA project, a prototype system integrating wired and wireless FIP stations has been developed. This system works in the 2.4-GHz ISM band using a DSSS physical layer [36]. The available publications put emphasis on the management of configuration data and on distributed algorithms for clock synchronization. The MAC and data link protocols of the Factory Instrumentation Protocol were not modified. Since FIP broadcasts the values of process variables periodically, the protocol contains no retransmission scheme for the time-critical data. Instead, the OLCHFA approach is to enhance the FIP
process variable model with so-called time-critical variables, which provide freshness information to the applications. Applications can use this to handle cases of repeated losses.
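A time-critical variable of this kind can be sketched as a value paired with freshness bookkeeping. The class and field names below are illustrative, not taken from the FIP standard or the OLCHFA publications:

```python
# Sketch of a time-critical process variable carrying freshness
# information: after repeated broadcast losses the consumer can tell
# a fresh value from a stale one instead of silently using old data.

class TimeCriticalVar:
    def __init__(self, validity_s):
        self.validity_s = validity_s           # how long a value stays valid
        self.value, self.updated = None, None

    def produce(self, value, now):
        self.value, self.updated = value, now

    def is_fresh(self, now):
        # stale when no (re)broadcast arrived within the validity window
        return (self.updated is not None
                and now - self.updated <= self.validity_s)

v = TimeCriticalVar(validity_s=0.05)           # 50 ms validity window
v.produce(21.5, now=0.00)
print(v.is_fresh(now=0.03))  # -> True: value still fresh
print(v.is_fresh(now=0.12))  # -> False: updates lost, value is stale
```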
22.6.3 PROFIBUS
The R-FIELDBUS project (www.rfieldbus.de) evaluated how IEEE 802.11 with DSSS can be used in a PROFIBUS fieldbus system and how such a system can be used for transmission of IP-based multimedia data [33, 62]. Two different architectures have been proposed: the single logical ring and the multiple logical ring solutions, discussed below. Both solutions run the (almost) unmodified PROFIBUS protocol. The PROFIBUS protocol uses token passing on top of a broadcast medium. The token is passed between active stations along a logical ring, and much of the protocol’s complexity deals with ring maintenance. The token itself is a small control frame.
In the single logical ring solution all wired and wireless stations are integrated into a single logical token-passing ring. The coupling devices between the wired and wireless domains simply forward all packets. This approach is easy to realize but subjects both data packets and control packets like the token frame to the errors on wireless links. It is shown in [83] for PROFIBUS and in [38] for the similar IEEE 802.4 Token Bus protocol that repeated losses of token frames can create severe problems with the achievable real-time performance. Since there is only a single logical ring, the whole network is affected.
In contrast, in the multiple logical ring solution [23] wireless and wired stations are separated into several logical rings. These rings are coupled by intelligent devices called brouters (a merger of bridge and router). In this solution, transmission problems distort only one ring; the other logical rings remain operational. A second benefit of having multiple rings is traffic segmentation: if the segments are chosen carefully, most of the traffic will be intrasegment, and thus the overall traffic capacity can be increased. A drawback of the multiple logical ring solution, however, is that intersegment traffic is not natively supported by the PROFIBUS protocol and extensions are required.
In [80, 81] a system following the wireless MAC-aware bridging approach is proposed. On the wireless side, specifically tailored polling-based protocols are used, whereas wired stations run the unmodified PROFIBUS protocol stack. The goal is to avoid token passing on wireless segments. It is shown that for bursty channel errors the polling-based protocols achieve substantially better performance in terms of stochastic hard real-time behavior than the PROFIBUS token-passing protocol; for certain kinds of channels the 99% quantile of the delay needed to successfully transmit a high-priority packet is up to an order of magnitude smaller than for the PROFIBUS protocol. To integrate both protocols, the coupling device between wired and wireless media provides a virtual ring extension [80]. In this scheme the coupling device acts on the wired side on behalf of the wireless stations. For example, it creates token frames and executes the ring maintenance mechanisms.
Finally, in [43] a scheme for integration of wireless nodes into a PROFIBUS-DP network (single master, many slaves, no token passing) is described. An application layer gateway is integrated with a virtual master station. The virtual master acts as a proxy for the wireless stations; it polls them using standard IP and IEEE 802.11 distributed coordination function (DCF) protocols.
22.6.4 Other Fieldbus Technologies
For the International Electrotechnical Commission (IEC) fieldbus [34], which uses a centralized, polling-based access protocol for periodic data and a token-passing protocol for asynchronous data, reference [7] proposes an architecture that allows coupling of several fieldbus segments using a wireless backbone based on IEEE 802.11 with the point coordination function (PCF). In [50], it is investigated how the Manufacturing Automation Protocol (MAP)/Manufacturing Message Specification (MMS) application layer protocol can be enhanced with mobility. In the proposed system the IEEE 802.11 WLAN with DCF is used; time-critical transmissions and channel errors are not considered. In [48], the same question was investigated with the digital European cordless telephone (DECT) as the underlying technology.
22.7 Wireless Ethernet/IEEE 802.11
Instead of developing WLAN technology for the factory floor from scratch, existing technologies might serve as a starting point. A good candidate is the IEEE 802.11 WLAN standard [53, 54, 69], since it is the most widely used WLAN technology. Some alternative systems are HIPERLAN [19, 20], Bluetooth [29], and HomeRF [51].
22.7.1 Brief Description of IEEE 802.11
IEEE 802.11 belongs to the IEEE 802.x family of LAN standards. The standard describes architecture, services, and protocols for an Ethernet-like wireless LAN, using a CSMA/CA-based MAC protocol with enhancements for time-bounded services. The protocols run on top of several PHYs: an FHSS PHY, a DSSS PHY offering 1 and 2 Mb/s [69], 5.5 and 11 Mb/s extensions of the DSSS PHY [53], and an OFDM PHY with 54 Mb/s [54].
The standard describes an ad hoc mode and an infrastructure-based mode. In the infrastructure mode all communications are relayed through fixed access points (APs). An access point and the mobile stations associated with it constitute a cell, and mobile stations have to associate with the closest access point. The access points are connected by a distribution system that allows the forwarding of data packets between mobile stations in different cells. In the ad hoc mode, there are neither access points nor a distribution system; stations communicate in a peer-to-peer fashion. A detailed description of IEEE 802.11 can be found in [55].
The basic MAC protocol of 802.11 is called the distributed coordination function (DCF). It is a CSMA/CA protocol using the RTS/CTS scheme described in Section 22.5.1 and different interframe gaps to give control frames (for example, acknowledgments, CTS frames) priority over data frames. However, data frames cannot be differentiated according to priorities. The IEEE 802.11 MAC provides a connectionless and semireliable best-effort service to its users by performing a bounded number of retransmissions. Users cannot specify any quality-of-service requirements for their packets; they can only choose between contention-based and contentionless transmission (see below). Furthermore, it is not possible to specify attributes like transmit power, modulation scheme, or the number of retransmissions on a per-packet basis. This control would be desirable for treating different packet types differently.
As an example, one could transmit high-priority packets with high transmit power and a low bit rate to increase their reliability.
The enhancement for time-bounded services is called the point coordination function (PCF) and works only in the infrastructure mode. The PCF defines a superframe structure with superframes of variable but bounded length. A superframe consists of a superframe header followed by a contention-free period (CFP) and a contention period (CP), both of variable length. During the CP all stations operate in the DCF mode, including the access points. To initiate the start of the CFP, the AP (also called the point coordinator (PC)) has to acquire the wireless medium before it can transmit its beacon packet. Therefore, beacon transmissions and contention-free periods are not strictly periodic, and isochronous services are not supported. The beacon indicates the length of the contention-free period, and all stations receiving the beacon are forbidden to initiate transmissions during this time. Instead, they wait to be polled by the point coordinator. When polled, a station can use the medium exclusively for the transmission of a single packet. After the contention-free period ends, the stations return to their usual DCF behavior and can initiate transmissions at will.
The AP has a poll list of station addresses. The polling scheme itself is not fully specified. A station that desires to be polled has to signal this to the AP during the association process. The poll list membership ends upon disassociation or when the station reassociates itself without requesting contention-free service.
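The superframe structure can be sketched as a simple schedule builder. The fixed slot and CFP durations are illustrative simplifications; the real PCF uses interframe spaces and variable-length frames rather than equal slots:

```python
# Sketch of a PCF-style superframe: the point coordinator sends a
# beacon announcing the (bounded) contention-free period, polls the
# stations on its poll list one by one, then releases the medium for
# the contention period.

def superframe(poll_list, cfp_max, slot):
    events = [(0, f"beacon announces CFP <= {cfp_max}")]
    t = slot                                   # beacon occupies one slot here
    for sta in poll_list:
        if t + slot > cfp_max:
            break                              # CFP bound reached; the rest
        events.append((t, f"poll+data {sta}")) # wait for the next superframe
        t += slot
    events.append((t, "CF-End, contention period starts"))
    return events

for t, what in superframe(["S1", "S2", "S3"], cfp_max=30, slot=10):
    print(t, what)
```

The sketch shows why retransmissions get no priority: a station whose poll slot did not fit into this CFP simply waits a full superframe, exactly the behavior criticized in Section 22.7.2.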
22.7.2 Real-Time Transmission over IEEE 802.11
The PCF is designed to provide time-bounded services. Many studies [67, 76, 77] confirm that indeed packets transmitted during the CFP receive substantially smaller delays than those transmitted during
the CP, but at the cost of substantial overhead: in [77] the authors show a scenario where eight voice calls, each having a data rate of 8 kbit/s, require approximately 50% of the bandwidth of a 1 Mbit/s transmission medium (without channel errors). When transmission has to be both timely and reliable despite channel errors, retransmissions are needed. When the transmission of a time-critical packet during the CFP fails, the station can retry during the following CP or during the next CFP one superframe later (except in the case where multiple entries in the polling list are allocated to the mobile station, so that it receives multiple polls during the same CFP). Hence, retransmissions of important packets receive no priority in 802.11.
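The order of magnitude of the overhead figure quoted above can be checked with a back-of-the-envelope computation. The 10 ms packetization interval and the 500 bits of per-packet overhead (poll frame, MAC/PHY headers, acknowledgment, interframe spaces) are rough assumptions for illustration, not values taken from [77]:

```python
# Rough sanity check of the voice-over-PCF overhead figure.
rate = 1_000_000             # 1 Mbit/s medium
calls = 8
voice_rate = 8_000           # 8 kbit/s per call
payload = calls * voice_rate
print(payload / rate)        # -> 0.064: raw payload is only 6.4%

# With ~10 ms packetization, each call sends 100 tiny packets per
# second; assume ~500 overhead bits around each ~80-bit payload.
pkts_per_s = calls * 100
overhead_bits = 500
used = payload + pkts_per_s * overhead_bits
print(used / rate)           # -> 0.464: roughly the ~50% reported
```

The point is structural, not numerical: for small periodic packets the fixed per-packet cost of polling dwarfs the payload, whatever the exact overhead value.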
22.8 Summary
This chapter presented some problems and solution approaches for bringing WLAN technology to the factory floor and benefiting from reduced cabling and mobility. The basic problem is the tension between the hard timing and reliability requirements of industrial applications, on the one hand, and the serious error rates and time-varying error behavior of wireless channels, on the other hand. Many techniques have been developed to improve the reliability and timeliness behavior of lower-layer wireless protocols, but up to now, wireless fieldbus systems have not been deployed on a large scale, as the problem of reliable transmission despite channel errors is not solved satisfactorily. It is not clear which combination of mechanisms and technologies has the potential to bound the number of deadline misses under realistic channel conditions. It remains an open question whether just more engineering is needed to make wireless transmission suitable for fulfilling hard real-time and reliability requirements, or whether there is a fundamental limit to what can be achieved. Fortunately, wireless communications and WLAN technology are a very active field of research and development. New technologies are created and existing technologies are enhanced. As an example, the IEEE 802.11g and IEEE 802.11e working groups are working on delivering higher bit rates and better quality of service to users. It will be exciting to see how industrial applications can benefit from this.
References
[1] Lars Ahlin and Jens Zander. Principles of Wireless Communications. Studentlitteratur, Lund, Sweden, 1998.
[2] A. Annamalai and Vijay K. Bhargava. Analysis and optimization of adaptive multicopy transmission ARQ protocols for time-varying channels. IEEE Transactions on Communications, 46:1356–1368, 1998.
[3] Henrik Bengtsson, Elisabeth Uhlemann, and Per-Arne Wiberg. Protocol for wireless real-time systems. In Proceedings of the 11th Euromicro Conference on Real-Time Systems, York, England, 1999.
[4] Pravin Bhagwat, Partha Bhattacharya, Arvind Krishna, and Satish K. Tripathi. Using channel state dependent packet scheduling to improve TCP throughput over wireless LANs. Wireless Networks, 3:91–102, 1997.
[5] Richard Cam and Cyril Leung. Multiplexed ARQ for time-varying channels. Part I. System model and throughput analysis. IEEE Transactions on Communications, 46:41–51, 1998.
[6] Richard Cam and Cyril Leung. Multiplexed ARQ for time-varying channels. Part II. Postponed retransmission modification and numerical results. IEEE Transactions on Communications, 46:314–326, 1998.
[7] S. Cavalieri and D. Panno. On the integration of fieldbus traffic within IEEE 802.11 wireless LAN. In Proceedings of the 1997 IEEE International Workshop on Factory Communication Systems (WFCS ’97), Barcelona, Spain, 1997.
[8] James K. Cavers. Mobile Channel Characteristics. Kluwer Academic Publishers, Boston, 2000.
[9] R. Chen, K.C. Chua, B.T. Tan, and C.S. Ng. Adaptive error coding using channel prediction. Wireless Networks, 5:23–32, 1999.
[10] Carla-Fabiana Chiasserini and Ramesh R. Rao. Coexistence mechanisms for interference mitigation in the 2.4-GHz ISM band. IEEE Transactions on Wireless Communications, 2:964–975, 2003.
[11] Daniel J. Costello, Joachim Hagenauer, Hideki Imai, and Stephen B. Wicker. Applications of error-control coding. IEEE Transactions on Information Theory, 44:2531–2560, 1998.
[12] Klaus David and Thorsten Benkner. Digitale Mobilfunksysteme. Informationstechnik. B.G. Teubner, Stuttgart, 1996.
[13] Jean-Dominique Decotignie. Wireless fieldbusses: a survey of issues and solutions. In Proceedings of the 15th IFAC World Congress on Automatic Control (IFAC 2002), Barcelona, Spain, 2002.
[14] Jean-Dominique Decotignie and Patrick Pleineveaux. A survey on industrial communication networks. Annales des Télécommunications, 48:435ff, 1993.
[15] Lou Dellaverson and Wendy Dellaverson. Distributed channel access on wireless ATM links. IEEE Communications Magazine, 35:110–113, 1997.
[16] David A. Eckhardt and Peter Steenkiste. A trace-based evaluation of adaptive error correction for a wireless local area network. MONET: Mobile Networks and Applications, 4:273–287, 1999.
[17] Moncef Elaoud and Parameswaran Ramanathan. Adaptive use of error-correcting codes for real-time communication in wireless networks. In Proceedings of IEEE INFOCOM 1998, San Francisco, March 1998.
[18] Anthony Ephremides. Energy concerns in wireless networks. IEEE Wireless Communications, 9:48–59, 2002.
[19] ETSI. High Performance Radio Local Area Network (HIPERLAN): Draft Standard. ETSI, March 1996.
[20] ETSI. TR 101 683, HIPERLAN Type 2: System Overview. ETSI, February 2000.
[21] ETSI. TS 101 475, BRAN, HIPERLAN Type 2: Physical (PHY) Layer. ETSI, March 2000.
[22] Andras Farago, Andrew D. Myers, Violet R. Syrotiuk, and Gergely V. Zaruba. Meta-MAC protocols: automatic combination of MAC protocols to optimize performance for unknown conditions. IEEE Journal on Selected Areas in Communications, 18:1670–1681, 2000.
[23] Luis Ferreira, Mario Alves, and Eduardo Tovar. Hybrid wired/wireless PROFIBUS networks supported by bridges/routers. In Proceedings of the 2002 IEEE Workshop on Factory Communication Systems, WFCS 2002, pp. 193–202, Västerås, Sweden, 2002.
[24] Funbus-Projektkonsortium. Das Verbundprojekt Drahtlose Feldbusse im Produktionsumfeld (Funbus): Abschlußbericht. INTERBUS Club Deutschland e.V., Postf. 1108, 32817 Blomberg, Bestell-Nr: TNR 5121324, October 2000. Available at http://www.softing.de/d/NEWS/Funbusbericht.pdf.
[25] Jerry D. Gibson, editor. The Communications Handbook. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[26] Savo Glisic and Branka Vucetic. Spread Spectrum CDMA Systems for Wireless Communications. Artech House, Boston, 1997.
[27] Andrea J. Goldsmith and Stephen B. Wicker. Design challenges for energy-constrained ad hoc wireless networks. IEEE Wireless Communications, 9:8–27, 2002.
[28] Ajay Chandra V. Gummalla and John O. Limb. Wireless medium access control protocols. IEEE Communications Surveys and Tutorials, 3, 2000. Available at http://www.comsoc.org/pubs/surveys.
[29] Jaap C. Haartsen. The Bluetooth radio system. IEEE Personal Communications, 7:28–36, 2000.
[30] Bruce A. Harvey and Stephen B. Wicker. Packet combining systems based on the Viterbi decoder. IEEE Transactions on Communications, 42:1544–1557, 1994.
[31] Junji Hirai, Tae-Woong Kim, and Atsuo Kawamura. Practical study on wireless transmission of power and information for autonomous decentralized manufacturing system. IEEE Transactions on Industrial Electronics, 46:349–359, 1999.
[32] Gavin Holland, Nitin Vaidya, and Paramvir Bahl. A rate-adaptive MAC protocol for wireless networks. In Proceedings of the Seventh Annual International Conference on Mobile Computing and Networking 2001 (MobiCom), Rome, Italy, July 2001.
[33] Jörg Hähniche and Lutz Rauchhaupt. Radio communication in automation systems: the R-Fieldbus approach. In Proceedings of the 2000 IEEE International Workshop on Factory Communication Systems (WFCS 2000), pp. 319–326, Porto, Portugal, 2000.
[34] IEC (International Electrotechnical Commission). IEC-1158-1, FieldBus Specification, Part 1, FieldBus Standard for Use in Industrial Control: Functional Requirements. IEC.
[35] International Organization for Standardization. ISO Standard 11898: Road Vehicle: Interchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication. ISO, 1993.
[36] Ivan Izikowitz and Michael Solvie. Industrial needs for time-critical wireless communication and wireless data transmission and application layer support for time-critical communication. In Proceedings of Euro-Arch ’93, Munich, 1993.
[37] W.C. Jakes, editor. Microwave Mobile Communications. IEEE Press, Piscataway, NJ, 1993.
[38] Hong ju Moon, Hong Seong Park, Sang Chul Ahn, and Wook Hyun Kwon. Performance degradation of the IEEE 802.4 Token Bus network in a noisy environment. Computer Communications, 21:547–557, 1998.
[39] Samir Kallel. Analysis of a type-II hybrid ARQ scheme with code combining. IEEE Transactions on Communications, 38:1133–1137, 1990.
[40] J.F. Kurose, M. Schwartz, and Y. Yemini. Multiple-access protocols and time-constrained communication. ACM Computing Surveys, 16:43–70, 1984.
[41] A. Kutlu, H. Ekiz, M.D. Baba, and E.T. Powner. Implementation of “comb” based wireless access method for control area network. In Proceedings of the 11th International Symposium on Computer and Information Science, pp. 565–573, Antalya, Turkey, November 1996.
[42] A. Kutlu, H. Ekiz, and E.T. Powner. Performance analysis of MAC protocols for wireless control area network. In Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, pp. 494–499, Beijing, China, June 1996.
[43] Kyung Chang Lee and Suk Lee. Integrated network of PROFIBUS-DP and IEEE 802.11 wireless LAN with hard real-time requirement. In Proceedings of IEEE 2001 International Symposium on Industrial Electronics, Pusan, Korea, 2001.
[44] Shu Lin and Daniel J. Costello. Error Control Coding: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1983.
[45] Hang Liu, Hairuo Ma, Magda El Zarki, and Sanjay Gupta. Error control schemes for networks: an overview. MONET: Mobile Networks and Applications, 2:167–182, 1997.
[46] Nitaigour Premchand Mahalik, editor. Fieldbus Technology: Industrial Network Standards for Real-Time Distributed Control. Springer, Berlin, 2003.
[47] Eytan Modiano. An adaptive algorithm for optimizing the packet size used in wireless ARQ protocols. Wireless Networks, 5:279–286, 1999.
[48] Philip Morel. Mobility in MAP networks using the DECT wireless protocols. In Proceedings of the 1995 IEEE Workshop on Factory Communication Systems, WFCS ’95, Leysin, Switzerland, 1995.
[49] Philip Morel and Alain Croisier. A wireless gateway for fieldbus. In Proceedings of the Sixth International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 95), 1995.
[50] Philip Morel and Jean-Dominique Decotignie. Integration of wireless mobile nodes in MAP/MMS. In Proceedings of the 13th IFAC Workshop on Distributed Computer Control Systems DCCS 95, 1995.
[51] Kevin J. Negus, Adrian P. Stephens, and Jim Lansford. HomeRF: wireless networking for the connected home. IEEE Personal Communications, 7:20–27, 2000.
[52] Giao T. Nguyen, Randy H. Katz, Brian Noble, and Mahadev Satyanarayanan. A trace-based approach for modeling wireless channel behavior. In Proceedings of the Winter Simulation Conference, Coronado, CA, December 1996.
[53] Editors of IEEE 802.11. IEEE Standard for Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Networks: Specific Requirements: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher Speed Physical Layer (PHY) Extension in the 2.4 GHz Band. IEEE, 1999.
[54] Editors of IEEE 802.11. IEEE Standard for Telecommunications and Information Exchange between Systems: LAN/MAN Specific Requirements: Part 11: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications: High Speed Physical Layer in the 5 GHz Band. IEEE, 1999.
[55] Bob O’Hara and Al Petrick. IEEE 802.11 Handbook: A Designer’s Companion. IEEE Press, New York, 1999.
[56] K. Pahlavan and A.H. Levesque. Wireless Information Networks. John Wiley & Sons, 1995.
[57] Arogyaswami Paulraj. Diversity techniques. In Jerry D. Gibson, editor, The Communications Handbook, pp. 213–223. CRC Press/IEEE Press, Boca Raton, FL, 1996.
[58] Juan R. Pimentel. Communication Networks for Manufacturing. Prentice Hall International, Englewood Cliffs, NJ, 1990.
[59] John G. Proakis. Digital Communications, 3rd edition. McGraw-Hill, New York, 1995.
[60] C.S. Raghavendra and Suresh Singh. PAMAS: power aware multi-access protocol with signalling for ad hoc networks. ACM Computer Communication Review, 27, 1998.
[61] Theodore S. Rappaport. Wireless Communications: Principles and Practice. Prentice Hall, Upper Saddle River, NJ, 2002.
[62] Lutz Rauchhaupt. System and device architecture of a radio-based fieldbus: the RFieldbus system. In Proceedings of the Fourth IEEE Workshop on Factory Communication Systems 2002 (WFCS 2002), Västerås, Sweden, 2002.
[63] Asuncion Santamaria and Francisco J. Lopez-Hernandez, editors. Wireless LAN: Standards and Applications, Mobile Communication Series. Artech House, Boston, 2001.
[64] Günter Schäfer. Security in Fixed and Wireless Networks: An Introduction to Securing Data Communications. John Wiley & Sons, Chichester, U.K., 2003.
[65] William Stallings. Wireless Communications and Networks. Prentice Hall, Upper Saddle River, NJ, 2001.
[66] Ivan Stojmenovic, editor. Handbook of Wireless Networks and Mobile Computing. John Wiley & Sons, New York, 2002.
[67] Takahiro Suzuki and Shuji Tasaka. Performance evaluation of video transmission with the PCF of the IEEE 802.11 standard MAC protocol. IEICE Transactions on Communications, E83-B:2068–2076, 2000.
[68] Andrew S. Tanenbaum. Computernetzwerke, 3rd edition. Prentice Hall, Munich, 1997.
[69] Editors of IEEE 802.11. IEEE Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, November 1997.
[70] Fouad A. Tobagi and Leonard Kleinrock. Packet switching in radio channels. Part II. The hidden terminal problem in CSMA and busy-tone solutions. IEEE Transactions on Communications, 23:1417–1433, 1975.
[71] Chai-Keong Toh. Ad Hoc Mobile Wireless Networks: Protocols and Systems. Prentice Hall, Upper Saddle River, NJ, 2002.
[72] Elisabeth Uhlemann, Per-Arne Wiberg, Tor M. Aulin, and Lars K. Rasmussen. Deadline-dependent coding: a framework for wireless real-time communication. In Proceedings of the International Conference on Real-Time Computing Systems and Applications, pp. 135–142, Cheju Island, South Korea, December 2000.
[73] Union Technique de l’Electricité. General Purpose Field Communication System, EN 50170, Volume 3, WorldFIP. Union Technique de l’Electricité, 1996.
[74] Harmen R. van As. Media access techniques: the evolution towards terabit/s LANs and MANs. Computer Networks and ISDN Systems, 26:603–656, 1994.
[75] Richard van Nee and Ramjee Prasad. OFDM for Wireless Multimedia Communications. Artech House, Boston, 2000.
[76] Malathi Veeraraghavan, Nabeel Cocker, and Tim Moors. Support of voice services in IEEE 802.11 wireless LANs. In Proceedings of IEEE INFOCOM 2001, Anchorage, AK, April 2001.
[77] Matthijs A. Visser and Magda El Zarki. Voice and data transmission over an 802.11 wireless network. In Proceedings of the IEEE Personal, Indoor and Mobile Radio Conference (PIMRC) 95, pp. 648–652, Toronto, Canada, September 1995.
[78] Xin Wang and Michael T. Orchard. On reducing the rate of retransmission in time-varying channels. IEEE Transactions on Communications, 51:900–910, 2003.
[79] Stuart Williams. IrDA: past, present and future. IEEE Personal Communications, 7, February 2000.
[80] Andreas Willig. An architecture for wireless extension of PROFIBUS. In Proceedings of IECON 03, Roanoke, VA, 2003.
[81] Andreas Willig. Polling-based MAC protocols for improving realtime performance in a wireless PROFIBUS. IEEE Transactions on Industrial Electronics, 50:806–817, 2003.
[82] Andreas Willig, Martin Kubisch, Christian Hoene, and Adam Wolisz. Measurements of a wireless link in an industrial environment using an IEEE 802.11-compliant physical layer. IEEE Transactions on Industrial Electronics, 49:1265–1282, 2002.
[83] Andreas Willig and Adam Wolisz. Ring stability of the PROFIBUS token passing protocol over error prone links. IEEE Transactions on Industrial Electronics, 48:1025–1033, 2001.
[84] Michele Zorzi and Ramesh R. Rao. Error control and energy consumption in communications for nomadic computing. IEEE Transactions on Computers, 46:279–289, 1997.
23 Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment

Kirsten Matheus
Carmeq GmbH

23.1 Introduction
23.2 WLAN, WPAN, Cellular Networks, and Ad Hoc Networks
23.3 Bluetooth Technology
    Technical Background • Performance
23.4 IEEE 802.11
    Technical Background • Performance
23.5 ZigBee
    Technical Background • Performance
23.6 Coexistence of WPAN and WLAN (Bluetooth and IEEE 802.11b)
23.7 Summary and Conclusions
References
23.1 Introduction

The convenience of true mobility offered by wireless connectivity is the main factor behind the widespread acceptance of wireless technologies. The global system for mobile communication (GSM), a second-generation cellular system designed mainly for mobile telephony, currently has more than one billion users worldwide. Systems like GSM or the third-generation universal mobile telecommunication system (UMTS) nevertheless require extensive infrastructure. The commercial and industrial deployment of systems that function on a smaller scale and do not require costly frequency licensing or infrastructure has become more appealing; such systems include wireless personal area networks (WPANs) and wireless local area networks (WLANs). As a consequence, the Bluetooth and IEEE 802.11 technologies and the newly emerging ZigBee have received a significant amount of public and scientific attention. Bluetooth, like ZigBee, is a typical WPAN representative that is inexpensive, consumes little power, is small in size, and supports voice and data services. The different IEEE 802.11 variants are WLAN representatives that provide comparably high user data rates at the cost of a higher battery power consumption. ZigBee is limited to small data rates, but at the same time consumes very little power. With their original purposes fulfilled, new areas of deployment are being
developed for these technologies. Bluetooth plays a larger role in markets like warehousing, retailing, and industrial applications [5]. IEEE 802.11 is considered for seamless coverage of complete cities [4, 59]. Depending on the exact application, users of these wireless technologies have certain expectations concerning the quality of the systems. The application requirements have to be considered carefully in order to choose the most suitable technology. The main criteria are generally throughput, delay, and reliability. In addition, cost, power consumption, security, and, last but not least, availability can be important issues. Note, though, that owing to the possibility of interference, adverse radio conditions, or range limits, hard quality-of-service (QoS) guarantees for throughput cannot be provided by wireless systems. In industrial environments the radio conditions can be especially difficult because metal walls have a significant impact on the transmission. Metal shields radio transmissions and at the same time causes additional reflections. Systems that require a guaranteed data rate within a strict time window, e.g., because they are security dependent, should not be wireless. In addition to the parameters discussed above, factors like unit density, traffic demand, mobility, environmental changes during deployment, interference, and frequency range determine how well a technology satisfies the requirements. Thus, both the individual link performance and the overall network capacity should be optimized. This chapter first describes in Section 23.2 the basic differences between WLANs, WPANs, cellular networks, and ad hoc networks. In Sections 23.3, 23.4, and 23.5 the technologies Bluetooth, IEEE 802.11, and ZigBee are described in more detail.
Each of those sections provides the technical background on the technology under consideration, as well as investigations on the performance of the systems and their suitability for industrial applications/factory floor environments. Section 23.6 shows how Bluetooth and IEEE 802.11b/g, which are placed in the same frequency band and are possibly used at the same time in the same location, coexist. Section 23.7 provides a summary and conclusions.
23.2 WLAN, WPAN, Cellular Networks, and Ad Hoc Networks

The expressions wireless local area network (WLAN), wireless personal area network (WPAN), cellular networks, and ad hoc networks are commonly used, though often without consistency or precision. In the following, a clarification of the terminology is given:

WLAN: A wireless LAN has the same functionality as a wired LAN, with the difference that the wires are replaced by air links. This means that within a restricted area (home, office, hot spot), intercommunication between all devices connected to the network is possible; the focus is on data communication and high data rates. The definition of a WLAN says nothing about how the network is organized. Often an infrastructure of mounted access points (APs) enables wireless access to the wired LAN behind the APs, thus representing a cellular network structure. Nevertheless, a wireless LAN can also function on an ad hoc basis.

WPAN: In a wireless PAN all devices are likewise interconnectable. The difference is that all units are somehow associated with someone or something (either because they are yours or because they are shared or public devices you want to use) and are very nearby. A PAN can consist of a variety of devices and can even include different technologies. The applications are therefore not limited to data transmission; voice communication can be used in a PAN as well. While you move within a WLAN, you generally move with your WPAN. This means that several independent WPANs can coexist in the same area, each being self-sufficient without any infrastructure. Thus, they generally function on an ad hoc basis.

The difference between cellular and ad hoc networks is visualized in Figure 23.1. As can be seen, there are several steps that lead to ad hoc networking: a pure ad hoc network employs neither any infrastructure nor a specific unit (like an access point or base station) for the organization of coverage, synchronization, and services.
Nevertheless, a network can be ad hoc even when it supports single hops only. It can be seen that WLAN technologies like IEEE 802.11 in the infrastructure mode or HIPERLAN/2 are in the same classification as typical cellular systems like GSM or UMTS/wireless code-division multiple
FIGURE 23.1 Classification of wireless technologies between cellular and ad hoc; MANET stands for mobile (or multihop [38]) ad hoc network (designed for Internet applications [31]), PRNET for packet radio network [46], and ODMA for opportunity-driven multiple access [1, 54]. A Bluetooth scatternet means that several Bluetooth piconets are interlinked. BS = base station.
access (WCDMA); all are based on infrastructure and use a specific unit for central control. It would thus be correct to call these WLAN systems cellular systems. Despite this, there are distinct differences with regard to coverage. The wide (instead of local) area coverage of cellular systems like GSM has caused cellular systems to be associated with complete coverage and access everywhere, even though this is not correct. The description wireless wide area network (WWAN) gives a better idea of the difference from WLAN systems. The fact that the existing WWAN technologies like GSM and UMTS focus on voice communication, while WLAN technologies focus on data transmission, is not such an important difference. More important is that WWAN technologies are designed to support user mobility (and roaming) up to very high velocities, while WLAN systems support stationary or portable access. Because of the licensing regulations and costs for extensive infrastructure, WWAN systems like GSM and UMTS are not of interest for industrial applications. Thus, they are not discussed any further. From the radio network point of view, the most important technical distinction with respect to the discussed terminology — with far-reaching consequences to system design, network optimization, etc. [52] — does not have to be made between WPAN, WLAN, or WWAN technologies. It has to be made between systems organized in cells and ad hoc systems: • As cellular networks are systematically laid out, the minimum distance to the next co-channel interferer (i.e., the next uncoordinated user transmitting at the same time on the same frequency) is generally controllable, known, and fixed. In contrast, in ad hoc networks the next co-channel interferer can be very close from one moment to the next, without any possibility of influencing the situation (Figure 23.2). 
• The centralized control in cellular networks allows for effective and fair distribution of resources, because of the accurate knowledge of the overall traffic demand and the possibility of a more global resource management. For ad hoc networks or several coexisting WPANs, knowledge of overall traffic demand is generally not available and the systems compete for the resources.
FIGURE 23.2 Next closest co-channel interferer in cellular and ad hoc networks.
23.3 Bluetooth Technology

23.3.1 Technical Background

Bluetooth (BT) is first of all a cable replacement technology aiming at effortless wireless connectivity in an ad hoc fashion. It supports voice as well as data transmission [7, 8, 10, 11, 26]. Its desired key associations are ease of use, low power consumption, and low cost, with the aim that the Bluetooth functionality is integrated in as many devices as possible. To meet these goals, the BT special interest group (SIG) placed the technology in the unlicensed ISM (industrial, scientific, and medical) band at 2.4 GHz. This allows close to worldwide deployment without the need to pay fees for frequency licensing. Nevertheless, it requires complying with the respective sharing rules.* As a consequence, Bluetooth performs a rather fast frequency hopping (FH) over 79 carriers of 1-MHz bandwidth each, such that every Bluetooth packet is transmitted on a newly chosen frequency (which results in a nominal hop rate of 1600 hops/s). To further reduce cost and support the distribution of the Bluetooth technology, the Bluetooth specification is an open standard that can be used without even needing to pay for the use of its key patents, on the condition that the strict qualification procedure is passed.† The latter is to ensure the acceptance of the technology. For the same purpose, the specification contains application profiles. The profiles describe in detail the implementation of the foreseen applications, thus enabling units of different manufacturers to communicate. The most important characteristics on the physical layer are as follows: the data are Gaussian frequency shift keying (GFSK) modulated at 1 Mbps and organized in packets consisting of access code, header, and payload. The employment of forward error correction (FEC) for the payload is optional. Interpacket interleaving is not performed. This allows for lower chip prices, because memory can be saved.
Bluetooth uses a master–slave concept in which the unit that initiates a connection is temporarily assigned master status (for as long as the connection is up). The master organizes the traffic of up to seven other active units, called slaves, of this piconet. From the master’s device address the identity of each piconet, and with it the frequency-hopping sequence, can be derived. The header of a packet contains the actual addressee, the length of the packet, and other control information. Note that within one piconet the slave can only communicate with the master (and not directly with the other slaves) and, in the case of data connections, only after having been asked (i.e., polled).‡ The channel is organized in a time-division multiple access (TDMA)/time-division duplex (TDD) [24] scheme (Figure 23.3). It is partitioned into 625-µs time slots. Within this slot grid the master can only start transmission in the even-numbered slots, while the slaves can only respond in odd-numbered ones. When a unit is not already in one of the specific power save modes (sniff, hold, park), the slotting is power consumption friendly, because every unit has to listen only during the first 10 µs of its receive slot whether there is a packet arriving (and if not, can close down until the next receive slot**). This means it needs to listen into the channel only 10 µs/(2 · 625 µs) = 0.8% of the time during an active connection in which no packets are sent. Yet another facet of the long battery life is the low basic transmit power of 0 dBm (resulting in a nominal range of about 10 m). Bluetooth can also be used with up to 20-dBm transmit power. This results in a larger range but requires the implementation of power control to fulfill the Federal Communications Commission (FCC) sharing rules.
*For the U.S. [20, Part 15], for Europe [16], for Japan [48].
†The Bluetooth specification has also been adopted by the IEEE. It can be found under IEEE 802.15.1.
‡Every BT unit can simultaneously be a member of up to four piconets (though it can be master in only one of them). A formation in which several piconets are interlinked in that manner is called a scatternet. Aspects like routing, which are of interest in this constellation, will not be covered in this chapter. The chapter will thus focus on the properties of a single or multiple independent piconets.
**This is quite different from channel access schemes like CSMA, as used in IEEE 802.11 (see Section 23.4). Unless asleep, IEEE 802.11 always has to listen into the channel.
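The duty-cycle arithmetic above is easily checked; a minimal sketch using the slot length and listen window quoted in the text:

```python
# Idle-listening duty cycle of a Bluetooth unit in an active but
# currently silent connection: it owns every second 625-us slot as a
# receive slot and listens only during the first 10 us of it
# (values taken from the text above).

SLOT_US = 625       # Bluetooth time slot in microseconds
LISTEN_US = 10      # listen window at the start of a receive slot

duty_cycle = LISTEN_US / (2 * SLOT_US)
print(f"idle listen duty cycle: {duty_cycle:.1%}")  # prints 0.8%
```

This 0.8% idle duty cycle, together with the low transmit power, is what makes the slotted TDD scheme so battery friendly compared with always-listening CSMA schemes.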
FIGURE 23.3 Example slot occupation within one piconet consisting of a master and three slaves; to slave 1 there is an SCO link, to slave 2 (best-effort) traffic is transmitted in both directions, and slave 3 currently has nothing to send (but has to respond to the POLL packet with an acknowledgment (ACK)). During the transmission of multislot packets the frequency is not changed.
Bluetooth provides two in principle different types of connections: asynchronous connectionless (ACL) links, foreseen for data transmission, and synchronous connection-oriented (SCO) links, foreseen for speech transmission. For ACL links there are six packet types defined. The packets occupy either one, three, or five (625-µs) time slots, and their payloads are either uncoded (called DH1, DH3, or DH5, respectively) or protected with a 2/3 rate FEC using a (15, 10) shortened Hamming block code without any interleaving (called DM1, DM3, DM5, respectively). An automatic repeat request (ARQ) scheme initiates the retransmission of a packet in case the evaluation of the cyclic redundancy check (CRC) included in each ACL payload shows inconsistencies. This ensures error-free reception of the transmitted information. Table 23.1 gives an overview of the throughput values achievable with ACL connections. The maximum (unidirectional) Bluetooth throughput is 723 kbps. As speech transmission is delay sensitive, the original SCO links support three different packet types that are transmitted at fixed intervals. These types were designed to transport continuous-variable slope delta (CVSD) encoded speech at 64 kbps. The packet types always occupy just one (625-µs) time slot, but they are differentiated by their payload FEC. The packet payloads are either unprotected (called HV3), 2/3 rate FEC encoded (HV2), or protected with a 1/3 rate repetition code (HV1). For an HV3 connection, a packet is transmitted every sixth slot (Figure 23.3); for HV2, every fourth slot; and for HV1, every second slot (meaning that with one HV1 connection no other traffic can be transmitted in the piconet). Up to Bluetooth Specification 1.1 [9] there was no ARQ scheme for SCO links. In case of an erroneous reception of the packet overhead, the SCO packet was replaced by an erasure pattern. In case noncorrectable bit errors occurred in the payload only, these errors were forwarded to the speech decoder.
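The fixed SCO intervals are exactly what is needed to carry 64-kbps CVSD speech. A small sketch illustrates this; note that the per-packet user payload sizes (10/20/30 bytes for HV1/HV2/HV3) are taken from the Bluetooth specification, not from the text above:

```python
# SCO rate check: each HV packet type carries the same net 64 kbps of
# CVSD speech, trading FEC protection against transmission interval.
# Payload sizes (assumed, from the Bluetooth specification):
#   HV1 = 10 user bytes, HV2 = 20, HV3 = 30.
# Intervals (from the text): every 2nd / 4th / 6th slot.

SLOT_S = 625e-6  # Bluetooth slot duration in seconds

sco_links = {            # packet: (user bytes per packet, slot interval)
    "HV1": (10, 2),
    "HV2": (20, 4),
    "HV3": (30, 6),
}

for name, (payload_bytes, interval) in sco_links.items():
    rate = payload_bytes * 8 / (interval * SLOT_S)
    print(f"{name}: {rate / 1000:.0f} kbps")   # 64 kbps for each type
```

The heavier the FEC (HV1), the more often a packet must be sent to sustain the same speech rate, which is why one HV1 link already saturates the piconet.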
The latest specification, Bluetooth Specification 1.2 [10], includes an enhanced SCO link. This link allows very flexible deployment of the SCO link, providing a reserved bandwidth for several transmission rates and a limited number of retransmissions.

TABLE 23.1 Throughput Values for ACL Connections

Name  No. of  FEC?  Max. No. of   Unidirectional Throughput   Bidirectional Throughput
      Slots         User Bytes    Forward      Reverse        Forward      Reverse
DH1   1       No    27            172.8k       172.8k         172.8k       172.8k
DH3   3       No    183           585.6k       86.4k          390.4k       390.4k
DH5   5       No    339           723.2k       57.6k          433.9k       433.9k
DM1   1       2/3   17            108.8k       108.8k         108.8k       108.8k
DM3   3       2/3   121           387.2k       54.4k          258.1k       258.1k
DM5   5       2/3   224           477.8k       36.3k          286.7k       286.7k

Note: The reverse link in the unidirectional case transmits DH1 or DM1 packets, depending on whether the forward link uses a DH or DM packet type.
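The unidirectional forward throughput in Table 23.1 follows directly from the slot arithmetic: a k-slot forward packet is always answered in one return slot, so one round trip occupies (k + 1) slots of 625 µs. A minimal sketch that reproduces the forward column (up to rounding):

```python
# Reconstructing the unidirectional forward throughput of Table 23.1
# from first principles: user bits per packet divided by the round-trip
# time of (slots + 1) Bluetooth slots.

SLOT_S = 625e-6  # Bluetooth slot duration in seconds

acl_packets = {          # name: (slots occupied, max. user bytes)
    "DH1": (1, 27), "DH3": (3, 183), "DH5": (5, 339),
    "DM1": (1, 17), "DM3": (3, 121), "DM5": (5, 224),
}

for name, (slots, payload_bytes) in acl_packets.items():
    throughput = payload_bytes * 8 / ((slots + 1) * SLOT_S)
    print(f"{name}: {throughput / 1000:.1f} kbps")
```

For DH5 this yields 339 · 8 bits per 3.75 ms, i.e., the 723.2 kbps maximum quoted in the text.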
To further improve coexistence with other systems in the ISM band, Bluetooth version 1.2 includes the possibility of performing adaptive frequency hopping (AFH), i.e., of excluding carrier frequencies used by other systems from the hop sequence. With AFH the nominal hop rate is halved, because the specification has been changed such that the slave responds on the same frequency on which it received the packet from the master [10]. Security is supported in Bluetooth by the specification of authentication and encryption. For the future, a high-rate mode is envisioned that allows direct slave-to-slave communication at roughly a 10-fold transmission rate. The transmission takes place on 4-MHz channels that are chosen at particularly good locations within the 79-MHz bandwidth.
23.3.2 Performance

On the factory floor Bluetooth can be used as a wireless add-on to wired systems or as a replacement of existing cabling. It can cover machine-to-machine communication, wireless/remote monitoring, or tracking and some type of positioning of moving entities [5, 23]. Considering the comparably short range of Bluetooth and the likely association with a specific unit (represented by a machine, person, or task), it is possible that several independently active Bluetooth piconets coexist and overlap in space. The use of frequency hopping helps to mitigate the effects of interference among these piconets. When assuming more or less time-synchronized piconets, a worst-case approximation of the loss rate can be made with Equation 23.1. It calculates the probability P(x, n) that of x other piconets, n hop onto the same frequency as the considered piconet:

$$P(x, n) = \binom{x}{n}\left(\frac{1}{79}\right)^{n}\left(\frac{78}{79}\right)^{x-n} \tag{23.1}$$
The probability that at least one of the other x piconets transmits on the same frequency is then P(x) = 1 – P(x, 0). The smaller the number of interfering piconets, the better the approximation offered by this approach, because for larger numbers, the distances to some of the interferers are likely to be too large to be harmful. In [52, 61, 62] a more sophisticated approach has been chosen, and Bluetooth–Bluetooth coexistence results have been obtained with the help of detailed radio network simulations that include traffic, distribution, and fading models, as well as adjacent channel effects. All results have been obtained for an office of 10 × 20 m², assuming an average master–slave distance of 2 m. Naturally, a factory floor is likely to be significantly larger than 10 × 20 m². Nevertheless, the increased delay spread on the factory floor does not really affect Bluetooth, due to its small range (which is different for WLAN technologies; see Section 23.4.2). On the factory floor it is thus possible to place the Bluetooth units with the same density as in the investigated office scenario without loss in performance. Because a factory floor is larger than the investigated office, the overall number of piconets that can be used simultaneously on the factory floor is larger too. Additionally, location and traffic of the factory floor units are likely to be more predictable. Directive antennas also help to improve the performance. The results of the aforementioned publications thus give a good idea of what performance is achievable:

• A 10 × 20 m² room supports 30 simultaneous HV3 speech connections with an average packet loss rate of 1% (a limit that still allows for acceptable quality).
• HV3 packet types are preferable to HV2 and HV1. The subjective quality will not increase with additional payload coding. Using a coded HV packet just increases (unnecessarily) the interference in the network and the power consumption.
• One hundred simultaneous World Wide Web (WWW) sessions (bursty traffic with an average data rate of 33.2 kbps each) in the 10 × 20 m² room result in only a 5% degradation of the aggregate throughput.
• The maximum aggregate throughput in the room is 18 Mbps (at 50 fully loaded piconets). These piconets then transmit at a unidirectional data rate of 360 kbps each.
• Long and uncoded packets are preferable to shorter and coded ones. It takes 60 interfering piconets using the same packet type, 10 interfering HV1 connections (worst case), or a link distance of 27 m (which is far beyond the envisioned range of 10 m) before another packet type yields a larger throughput than DH5 [52]. It is advisable not to use the optional FEC (DM packet types). As the coding is not appropriate to handle the almost binary character of the Bluetooth (ad hoc) transmission channel,* the additional power that would be needed for the coding can be saved. Bluetooth is inexpensive and consumes significantly less power than IEEE 802.11 systems. The ACL link is reliable with best-effort traffic (with a maximum throughput of 723 kbps). The SCO link has reserved bandwidth, though the packets might contain residual bit errors (even when using the enhanced SCO link). In principle, Bluetooth is very robust against other Bluetooth interference and good performance can be achieved even in very dense environments. Note that customized implementations, which cannot be based on existing profiles, might be difficult to realize, as the regulations do not allow the implementation of proprietary solutions. The specification of new profiles, though, can be quite time-consuming.
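To get a quick feel for the interference levels behind these results, the worst-case approximation of Equation 23.1 can be evaluated numerically; a minimal sketch:

```python
# Worst-case Bluetooth co-channel collision approximation (Equation
# 23.1): P(x, n) is the probability that exactly n of x other
# (slot-synchronized) piconets hop onto the same carrier out of the
# 79 available.

from math import comb

def p_collide(x: int, n: int) -> float:
    """Probability that exactly n of x other piconets share the carrier."""
    return comb(x, n) * (1 / 79) ** n * (78 / 79) ** (x - n)

def p_any(x: int) -> float:
    """Probability that at least one of x other piconets interferes."""
    return 1 - p_collide(x, 0)

# With 30 coexisting piconets (the speech scenario above), the
# worst-case per-packet collision probability is roughly 32%:
print(f"P(30) = {p_any(30):.3f}")
```

The actual simulated loss rates are much lower than this worst case, because most of the 30 piconets are too far away to be harmful, as noted in the text.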
23.4 IEEE 802.11

23.4.1 Technical Background

IEEE 802.11 includes a number of specifications that define the lower layers (mainly the physical (PHY) and medium access control (MAC) layers) for WLANs [32–34, 37, 53]. Being part of the IEEE 802 group means that an interface (IEEE 802.2) can be used to connect to the higher layers, which are then not aware of the network that is actually transporting the data (and which, with IEEE 802.11, is wireless). The key intentions of IEEE 802.11 are thus to provide a high-throughput and continuous network connection like that available in wired LANs. To encourage the wide employment of the technology, the use of IEEE 802.11 does not incur frequency licensing fees. IEEE 802.11 either uses infrared (IR) or transmits in the unlicensed ISM band at 2.4 GHz (like Bluetooth) or in 5-GHz bands that are license-exempt in Europe and unlicensed in the U.S. (UNII bands). In contrast to Bluetooth, the companies holding key patents within IEEE 802.11 can charge developers of IEEE 802.11 products for using the patents “on reasonable terms” [36]. In principle, it is possible to have an IEEE 802.11 WLAN consisting of mobile stations (MSs) only. It is more likely, though, that IEEE 802.11 is used as a wireless access technology to a wired LAN, to which the connection is made by IEEE 802.11 access points (APs). Should the access point employ only the distributed coordination function,† the MAC layer supports collision avoidance by employing carrier-sense multiple access (CSMA). This means that before transmitting a packet, the respective unit has to listen for the availability of the channel.‡ If the channel is sensed free after having been busy, the unit waits a certain period (called DIFS) and then enters a random backoff period** of
*The reasons are manifold. Without interference, the channel already varies due to hopping over 79 relatively narrowband channels. Additionally, with the wavelength used in Bluetooth, even small changes in position can cause large changes in the received signal strength. When there is interference, the effect becomes more pronounced. The existence or nonexistence of a close co-channel interferer can make the channel change from very good to very bad within the fraction of a moment (Figure 23.2).
†Which is likely and assumed in the remainder of this chapter. In theory, the standard also provides for the use of a centralized point coordination function.
‡The implementor can choose whether the units react (1) to just other IEEE 802.11 traffic, (2) to just other IEEE 802.11 traffic above a certain receive signal strength, or (3) to any signal above a certain receive signal strength [34, Section 18.4.8.4].
**The random backoff period is entered only when the channel was busy before. Otherwise, the unit will transmit at once after DIFS.
$$\mathrm{random}\Bigl(0 \,\ldots\, \underbrace{\min\bigl(2^{\,n_{PHY}+n_r} - 1,\ 1023\bigr)}_{CW}\Bigr) \cdot t_{slot} \tag{23.2}$$
with $n_{PHY}$ a parameter depending on the type of physical layer chosen, $n_r$ the index of the retransmission of the packet, $t_{slot}$ the slot duration, and CW the contention window (with $CW_{min} = 2^{n_{PHY}} - 1$). If the channel is available after this period, the unit transmits its packet (consisting of a PHY header, MAC header, and payload). Upon correct reception, the addressee responds with an ACK packet a short period (called SIFS) later (Figure 23.4). This ARQ mechanism ensures reliable data transfer. Obviously, the IEEE 802.11 WLAN MAC concept was designed for best-effort data traffic. Services with strict delay requirements, like speech, are not supported well by the current IEEE 802.11 specifications. To be able to provide QoS in the future, there is an ongoing activity within the IEEE that extends the MAC protocol with the necessary parameters (see Table 23.3). At the moment, QoS is difficult to provide, especially when multiple units coexist in the network. The IEEE 802.11 MAC concept also includes a mechanism to solve the hidden-terminal problem. Whether this request-to-send/clear-to-send (RTS/CTS) packet exchange saves more bandwidth (due to avoided retransmissions) than it needs depends on the terminal density and payload packet length [6]. As the RTS/CTS mechanism is optional and consumes additional bandwidth, it will be assumed in the following that the RTS/CTS mechanism is not used. IEEE 802.11 has a significantly larger power consumption than Bluetooth. Note that this is due not only to the higher transmit power (20 dBm in Europe, 30 dBm in the U.S.) but also to the CSMA concept. IEEE 802.11 units not specifically in sleep status have to listen to the channel all the time (unlike Bluetooth, which listens only at the beginning of the receive slot). Naturally, the higher transmit power allows for a larger range of about 50 m (with 20 dBm).
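The contention-window growth implied by Equation 23.2 can be sketched as follows; nPHY = 5 and a 20-µs slot are assumed example values (they match the 802.11b DSSS PHY), not parameters fixed by the text:

```python
# Sketch of the 802.11 DCF random backoff of Equation 23.2: the
# contention window doubles with every retransmission of a packet and
# is capped at 1023 slots. nPHY and the slot time are PHY-dependent;
# the values below are illustrative assumptions.

import random

N_PHY = 5          # PHY-dependent exponent (assumed; gives CWmin = 31)
T_SLOT = 20e-6     # slot duration in seconds (assumed)
CW_MAX = 1023      # upper cap on the contention window

def backoff_time(n_retry: int) -> float:
    """Random backoff in seconds before the n_retry-th retransmission."""
    cw = min(2 ** (N_PHY + n_retry) - 1, CW_MAX)
    return random.randint(0, cw) * T_SLOT

for nr in range(7):
    cw = min(2 ** (N_PHY + nr) - 1, CW_MAX)
    print(f"retry {nr}: CW = {cw} slots")   # 31, 63, ..., 1023, 1023
```

The exponential growth spreads competing stations further apart after each collision, at the price of a backoff delay that is unbounded in the number of retries and hence unsuitable for hard real-time traffic.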
There are six different options for the physical layer implementation of IEEE 802.11:

IR: The infrared mode transmits near-visible light at 850- to 950-nm wavelength. The data are pulse position modulated at 1 or 2 Mbps. In principle, the signal needs line of sight (LOS) and cannot go through walls. This, together with the low market visibility of IEEE 802.11 IR products, is the reason the IR mode is not covered further in this chapter.

FHSS: The frequency-hopping spread-spectrum mode is placed (like Bluetooth) in the 2.4-GHz ISM band. The data are GFSK modulated using two levels for the 1-Mbps and four levels for the 2-Mbps modulation rates. The FHSS mode divides the 79 hop frequencies into three distinct sets with 26 different sequences each. The hopping rate can be as slow as 2.5 hops/s. Despite its comparably good interference robustness [56], the popularity of the FHSS mode is limited due to its comparably low transmission rates. Note, though, that its principles have been incorporated in the HomeRF standard [11, 29].

DSSS: The direct-sequence spread-spectrum mode is also used in the 2.4-GHz ISM band. The nominal bandwidth of the main lobe is 22 MHz. The transmit power reduction in the first and residual side lobes is supposed to be 30 and 50 dB, respectively (see Figure 23.5 for a measured spectrum).
FIGURE 23.4 Basic timing behavior of IEEE 802.11 under the distributed coordination function; note that the random back-off timer of nontransmitting units continues after the next DIFS with the remaining number of slots.
© 2005 by CRC Press
Wireless Local and Wireless Personal Area Network Technologies
23-9
FIGURE 23.5 Measured spectrum of an IEEE 802.11b WLAN PCMCIA card.
In principle, 11/13 (U.S./Europe) center frequencies are available for the DSSS system. Nevertheless, using several systems in parallel requires a spacing of 25/30 MHz (U.S./Europe), which consequently allows only three systems to be used simultaneously. The DSSS mode includes the original version (specified in IEEE 802.11) and a high-rate extension (specified in IEEE 802.11b). In the original mode the chipping of the baseband signal is performed at 11 MHz, employing an 11-chip pseudorandom code (Barker sequence). For the 1-Mbps modulation rate, a 1-bit DBPSK symbol is spread with the Barker sequence; for the 2-Mbps modulation rate, a 2-bit DQPSK symbol is spread with the same sequence.* The high-rate extension IEEE 802.11b was, at the time of writing, the most popular and widespread WLAN technology. For the PHY header, IEEE 802.11b uses the same 1- and 2-Mbps modulations as the plain DSSS mode. Note, though, that a shortened header of 96 µs can be used. For the IEEE 802.11b PHY payload (consisting of the MAC header and the user data), 5.5- and 11-Mbps complementary code keying (CCK) modulation is used. CCK employs a variation of M-ary orthogonal signaling (complex Walsh–Hadamard functions) with eight complex chips in each spreading code word. For the 5.5-Mbps modulation rate, 4 bits are mapped onto 8 chips; for 11 Mbps, 8 bits are mapped onto 8 chips.

PBCC: The packet binary convolutional code (PBCC) physical layer is one of three additional possibilities standardized in IEEE 802.11g as an even higher-rate extension to IEEE 802.11b in the 2.4-GHz band. In this optional PHY a single-carrier modulation scheme is used that encodes the payload using a 256-state PBCC. The foreseen modulation rates are 22 and 33 Mbps.

OFDM: The orthogonal frequency-division multiplexing (OFDM) physical layer was originally designed for the 5-GHz bands (also referred to as IEEE 802.11a) but has now been adopted for the 2.4-GHz band as well (as part of IEEE 802.11g).
*Note that in contrast to a typical CDMA system like UMTS, all users use the same spreading code.

The parameters of the IEEE 802.11 OFDM PHY had at the time of standardization been harmonized with those of HIPERLAN/2.* Seven modes are defined, ranging from BPSK with a rate R = 1/2 FEC (lowest modulation rate, 6 Mbps) to 64-QAM with a rate R = 3/4 FEC (highest modulation rate, 54 Mbps; Table 23.2). The OFDM technique is based on a 64-point IFFT/FFT, while only 52 of the subcarriers are used (48 for user data, 4 as pilot carriers). The subcarrier spacing is Δf = 20 MHz/64 = 0.3125 MHz. Note that full OFDM symbols always have to be transmitted; this means that they possibly have to be filled up with dummy bits. To transmit one OFDM symbol, tsym = 1/Δf + (1/4) · 1/Δf = 4 µs is needed, with the latter part representing the guard interval used to combat multipath propagation. For synchronization, channel estimation, and equalization, a training sequence is transmitted, which consists of ten repeated short and two repeated long OFDM symbols [35, 58].

DSSS-OFDM: This optional physical layer format of IEEE 802.11g combines the DSSS PHY with the OFDM PHY such that DSSS is used for the header while the payload employs OFDM (including the OFDM preamble).

Table 23.2 compares the theoretical maximum throughput (TP) values of the different IEEE 802.11 PHY versions after the MAC. The maximum payload length is 4095 bytes (which has to include the 34-byte MAC header). Fifteen hundred bytes is the common length of an Ethernet packet (plus 34 bytes for the MAC header and checksum), 576 bytes is a typical length for a Web-browsing packet, and 60 bytes is the length of a Transmission Control Protocol (TCP) acknowledgment. The throughput TP is calculated as follows:
TP = (PayBytes · 8) / (DIFS + (CWmin/2) · tslot + tdata packet + SIFS + tACK)    (23.3)

where DIFS = 2 · tslot + SIFS, and the term (CWmin/2) · tslot represents the average back-off time.
The durations needed to transmit the data packet, tdata packet, and the acknowledgment, tACK, vary depending on the physical layer chosen. For the FHSS and DSSS modes they are calculated as follows:

tdata DSSS/FHSS = tPHYh + (34 · 8)/ModRate + (PayBytes · 8)/ModRate;    tACK DSSS/FHSS = tPHYh + (14 · 8)/ModRate    (23.4)

where the (34 · 8)/ModRate term accounts for the MAC header.
For the OFDM physical layer in the 5-GHz bands, Equation 23.5 needs to be calculated. For the OFDM mode in the 2.4-GHz band, an additional 6-µs signal extension has to be added to both durations. "Ceil" stands for rounding up to the next larger integer:

tdata OFDMa = tPHYh + tsym · ceil( (16 + (34 + PayBytes) · 8 + 6) / ((ModRate/12 Mbps) · 48) )

tACK OFDMa = tPHYh + tsym · ceil( (16 + 14 · 8 + 6) / ((ModRate/12 Mbps) · 48) )    (23.5)
The packet and acknowledgment durations for the DSSS-OFDM PHY are calculated quite similarly to Equation 23.5:
*Originally HIPERLAN/2 was intended to be the WLAN technology for the European market, while IEEE 802.11 was its counterpart for North America. Owing to delays in development, HIPERLAN/2 lost the chance to establish itself on the market, despite its better overall network performance (which would have given users higher data rates). Publications on HIPERLAN/2 include [17, 18, 19, 24, 30, 40, 41, 43, 45, 47, 49, 57].
TABLE 23.2 Comparison of Different Achievable Maximum Throughput Rates (in Mbps) for the Different IEEE 802.11 PHY Modes^a

PHY parameters:

Mode   Frq. Band         tslot            SIFS           CWmin     tPHYh
FHSS   2.4 GHz           50 µs            28 µs          15        128 µs
DSSS   2.4 GHz           20 µs            10 µs          31        192 µs (96 µs short)
OFDM   5 GHz (2.4 GHz)   9 µs (9/20 µs)   16 µs (10 µs)  15 (31)   20 µs; tsym = 4 µs

Throughput after the MAC:

Mode   Modulation        ModRate (Mbps)   TP for PayBytes = 60    576               1500              4061
FHSS   GFSK (2-level)    1                0.29                    0.79              0.91              0.96
FHSS   GFSK (4-level)    2                0.39                    1.40              1.72              1.89
DSSS   DBPSK             1                0.30                    0.80              0.91              0.97
DSSS   DQPSK             2                0.40                    1.42              1.72              1.89
DSSS   CCK (QPSK)        5.5              0.67                    3.14              4.26              4.97
DSSS   CCK (QPSK)        11               0.75                    4.54              7.11              9.16
OFDM   BPSK, R = 1/2     6                1.51 (1.24/0.83)        4.57 (4.29/3.64)  5.37 (5.2/4.8)    5.76 (5.68/5.49)
OFDM   BPSK, R = 3/4     9                1.81 (1.44/0.91)        6.32 (5.81/4.67)  7.79 (7.43/6.64)  8.51 (8.35/7.96)
OFDM   QPSK, R = 1/2     12               2.02 (1.55/0.96)        7.86 (7.05/5.44)  10.0 (9.45/8.21)  11.2 (10.9/10.3)
OFDM   QPSK, R = 3/4     18               2.25 (1.70/1.01)        10.4 (8.97/6.53)  14.1 (13.0/10.8)  16.3 (15.8/14.4)
OFDM   16-QAM, R = 1/2   24               2.38 (1.76/1.03)        12.3 (10.3/7.22)  17.6 (15.9/12.7)  21.2 (20.2/18.1)
OFDM   16-QAM, R = 3/4   36               2.59 (1.86/1.07)        15.2 (12.3/8.14)  23.7 (20.8/15.6)  30.3 (28.4/24.3)
OFDM   64-QAM, R = 1/2   48               2.64 (1.89/1.07)        17.1 (13.7/8.7)   28.5 (24.3/17.5)  38.4 (35.4/29.3)
OFDM   64-QAM, R = 3/4   54               2.70 (1.92/1.08)        17.9 (14.2/8.90)  30.8 (26.0/18.3)  42.2 (38.6/31.4)

Values in parentheses apply to the OFDM mode in the 2.4-GHz band (with tslot = 9 and 20 µs, respectively).

^a Except for the optional DSSS-OFDM and PBCC.
TABLE 23.3 Overview of Activities within the 802.11 Group

       Subject                                                   Status
a      PHY in the 5-GHz bands                                    Completed
b      High-rate mode in the 2.4-GHz band                        Completed
c      Extensions for specific MAC procedures                    Completed
d      Supplements for new regulatory regions                    Completed
e      Enhancements for QoS                                      Ongoing
f      To achieve multivendor access point interoperability      Completed
g      Enhancements of 802.11b data rates                        Completed
h      Extensions for channel selection for 802.11b              Almost completed
i      Enhancements for security and authentication algorithms   Ongoing
j      Enhancements for the use of 802.11a in Japan              Ongoing
k      Definition of radio resource management measurements      Ongoing, initialized in 2003
l      Nonexistent
m      Maintenance of 802.11-1999                                Ongoing, initialized in 2003
tdata DSSS-OFDM = tPHYh DSSS + tPreamble OFDM + tsym · ceil( (16 + (34 + PayBytes) · 8 + 6) / ((ModRate/12 Mbps) · 48) ) + 6 µs

tACK DSSS-OFDM = tPHYh DSSS + tPreamble OFDM + tsym · ceil( (16 + 14 · 8 + 6) / ((ModRate/12 Mbps) · 48) ) + 6 µs    (23.6)

For the PBCC mode at 22 Mbps, the data packet and acknowledgment durations are given in Equation 23.7. In the case of 33 Mbps, a 1-µs clock switch time has to be added to both:

tdata PBCC 22 = tPHYh + ((34 + PayBytes + 1) · 8)/ModRate;    tACK PBCC 22 = tPHYh + (14 · 8)/ModRate    (23.7)
For small payload sizes (60 and 576 bytes) the throughput values are not very good. When considering Ethernet packets, the highest theoretical throughput rates are 7.11 Mbps for IEEE 802.11b and 30.8/26.0 Mbps for the OFDM modes. Naturally, these wireless throughput rates are smaller than the wired ones (where 70 to 80 Mbps is possible), but at least for the higher modulation rates with 1500-byte Ethernet packets the throughput values are reasonably good. Note that the real-life throughput values for IEEE 802.11b systems are still smaller than the theoretically possible ones: values around 5 Mbps have been measured [50, 51]. This is because, in actual implementations, higher protocol layers like TCP/IP cause additional overhead and delays.

For security, IEEE 802.11 WLANs support several authentication processes, which are listed in the specification (none are mandatory).* Table 23.3 lists the standardization activities involving IEEE 802.11.
23.4.2 Performance

Next to aspects like individual link throughput, network capacity, and interference robustness, the transmission environment has to be taken into consideration when contemplating the use of IEEE 802.11 on the factory floor. Because the scenarios envisioned for IEEE 802.11 were placed primarily in homes and offices, some differences occur when looking at the delay spread. While in homes and offices the delay spread is assumed to be

(d)

pGlobalGem->GWControl().GetControlState(ControlState);   /* get current control state */
pGemConfControlStateChange pControlStateChange = new GemConfControlStateChange;
pControlStateChange->NewControlState = ControlState;   /* set state */
pControlStateChange->ControlStateName = getStrControlState(ControlState);
OnGemConfControlStateChange(0, (LPARAM) pControlStateChange);   /* notify state change */
FIGURE 42.15 (a) Sample GCD file. (b) C++ sample, header files. (c) C++ sample; create GWGEM object. (d) C++ sample; set up control state. (e) C++ sample; set up communication state. (f) C++ sample; set up spooling state. (g) C++ sample; remove GWGEM object. (h) C++ sample; fire an event. (i) C++ sample; disable the communication link. (j) C++ sample; send S1, F13 to host.
SEMI Interface and Communication Standards: An Overview and Case Study
(e)

pGlobalGem->GWLink().GetLinkState(LinkState);   /* get initial communications state */
SetDlgItemText(IDS_LINKSTATE, getStrLinkState(LinkState));   /* display on GUI */

(f)

pGlobalGem->GWSpool().GetSpoolState(SpoolState);   /* get the spool state */
SetDlgItemText(IDS_SPOOLSTATE, getStrSpoolState(SpoolState));   /* display */
(g)

delete pGEM;   /* remove GWGEM object */

(h)

int status = pGlobalGem->GWEvent().Send(EventID);   /* send event with ID = EventID */
(i)
pGlobalGem->GWLink().Disable();
(j)
/* The following code sends S1F13 to the host system */
SDRTKT tkx = 0;                              /* set SDR ticket value to 0 */
unsigned char buffer[512];                   /* message text buffer */
unsigned char ModelNum[7] = "SDR";           /* set model number */
unsigned char SoftRev[7] = "Rev10";          /* software version */
PSDRMSG pmsg;                                /* message structure declared in gwmessage.h */
pmsg->stream = 1;                            /* set stream to 1 */
pmsg->function = 13;                         /* set function to 13 */
pmsg->wbit = 1;                              /* request reply */
pmsg->buffer = buffer;                       /* pointer to message buffer */
pmsg->length = sizeof(buffer);
pGlobalGem->GWSdr().SdrItemInitO(pmsg);      /* fill up SECS-II message */
pGlobalGem->GWSdr().SdrItemOutput(pmsg, GWS2_L, NULL, (SDRLENGTH)2);
pGlobalGem->GWSdr().SdrItemOutput(pmsg, GWS2_STRING, ModelNum, (SDRLENGTH)6);
pGlobalGem->GWSdr().SdrItemOutput(pmsg, GWS2_STRING, SoftRev, (SDRLENGTH)6);
int status = pGlobalGem->GWSdr().SdrRequest(0, pmsg, &tkx);   /* send S1F13 out */
FIGURE 42.15 Continued.
FIGURE 42.16 Design of the intercommunication process. (The figure shows the host application stacked on a SECS-II message handler and HSMS driver, connected over TCP/IP to the equipment-side HSMS driver and SECS-II message handler; these feed the equipment state model, which drives the PLC over an RS-232 connection, controlling the actuators, sensors, and mechanics.)
References

[1] Tin O., Competitive Analysis and Conceptual Design of SEMI Equipment Communication Standards and Middleware Technology, Master of Science (Computer Integrated Manufacturing) dissertation, Nanyang Technological University, 2003.
[2] SEMATECH, Generic Equipment Model (GEM) Specification Manual: The GEM Specification as Viewed from Host, Technology Transfer 97093366A-XFR, 2000, pp. 4–39.
[3] SEMATECH, High Speed Message Services (HSMS): Technical Education Report, Technology Transfer 95092974A-TR, 1999, pp. 11–34.
[4] GW Associates, Inc., Solutions for SECS Communications, product training (PowerPoint slides), 1999.
[5] SEMI International Standards, CD-ROM, SEMI, 2003.
[6] Semiconductor Equipment and Materials International, Equipment Automation/Software, Volumes 1 and 2, SEMI, 1995.
[7] SEMATECH, CIM Framework Architecture Guide 1.0, 97103379A-ENG, 1997, pp. 1–31.
[8] SEMATECH, CIM Framework Architecture Guide 2.0, 1998, pp. 1–24.
[9] SEMI, Standard for the Object-Based Equipment Model, SEMI Draft Document 2748, 1998, pp. 1–52.
[10] SEMI E98, Provisional Standard for the Object-Based Equipment Model.
[11] Weiss M., Increasing Productivity in Existing Fabs by Simplified Tool Interconnection, 12th edition, Semiconductor FABTECH, 2001, pp. 21–24.
[12] Yang H.-C., Cheng F.-T., and Huang D., Development of a Generic Equipment Manager for Semiconductor Manufacturing, paper presented at 7th IEEE International Conference on Emerging Technologies and Factory Automation, Barcelona, October 1996, pp. 727–732.
[13] Feng C., Cheng F.-T., and Kuo T.-L., Modeling and Analysis for an Equipment Manager of the Manufacturing Execution System in Semiconductor Packaging Factories, 1998, pp. 469–474.
[14] ControlPRoTM, Developer Guide, Realtime Performance, Inc., 1996.
[15] Kaufmann T., The Paradigm Shift for Manufacturing Execution Systems in European Projects and SEMI Activities, 8th edition, Semiconductor FABTECH, 2002, pp. 17–25.
[16] GW Associates, Inc., SECSIMPro GEM Compliance Scripts User's Guide, 2001.
[17] GW Associates, Inc., SECSIMPro, SSL Reference Guide, 2001.
[18] GW Associates, Inc., SECSIMPro, User's Guide, 2001.
[19] SEMATECH, SEMASPEC GEM Purchasing Guidelines 2.0, Technology Transfer 93031573B-STD, 1994, pp. 10–30.
Web References

1. www.cimetrix.com — Home page of Cimetrix Software
2. www.abakus-soft.de — Home page of Abakus Software
3. www.kinesysinc.com — Home page of Kinesys Software — The GEM Box
4. www.secsandgem.com — Home page of Ergo Tech Software
5. www.sdiusa.com — Home page of SDI Software
6. www.yokogawa.com.sg — Home page of Yokogawa Software
7. www.asyst.com — Home page of Asyst Software
8. www.siautomation.com — Home page of SI Automation Software
9. www.ais-dresden.de — Home page of VECC Product
10. www.agilent.com — Home page of Agilent Software