Springer Tracts in Advanced Robotics Volume 11 Editors: Bruno Siciliano · Oussama Khatib · Frans Groen
Springer Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
J.-H. Kim D.-H. Kim Y.-J. Kim K.-T. Seow
Soccer Robotics With 205 Figures and 12 Tables
Professor Bruno Siciliano, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Via Claudio 21, 80125 Napoli, Italy, email: [email protected]
Professor Oussama Khatib, Robotics Laboratory, Department of Computer Science, Stanford University, Stanford, CA 94305-9010, USA, email: [email protected]
Professor Frans Groen, Department of Computer Science, Universiteit van Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands, email: [email protected]
STAR (Springer Tracts in Advanced Robotics) has been promoted under the auspices of EURON (European Robotics Research Network)
Authors
Dr. Jong-Hwan Kim Dept. of Electrical Engineering and Computer Science Korea Advanced Institute of Science and Technology (KAIST) 373-1 Gusong-dong, Yusong-gu Daejeon 305-701 Republic of Korea
Dr. Dong-Han Kim Dept. of Electrical Engineering and Computer Science Korea Advanced Institute of Science and Technology (KAIST) 373-1 Gusong-dong, Yusong-gu Daejeon 305-701 Republic of Korea
Dr. Yong-Jae Kim Intelligent Robot Lab. Inst. of Intel. Syst. Mechatronic Center Samsung Electronics Co. Maeton 3-dong Paldal-gu, Suwon-si Gyeonggi-do 442-742 Republic of Korea
Dr. Kiam-Tian Seow School of Computer Engineering Nanyang Technological University Nanyang Avenue Singapore 639798 Singapore
ISSN 1610-7438 ISBN 3-540-21859-9
Springer-Verlag Berlin Heidelberg New York
Library of Congress Control Number: 2004104485

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under German Copyright Law.

Springer-Verlag is a part of Springer Science+Business Media
springeronline.com

© Springer-Verlag Berlin Heidelberg 2004
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Digital data supplied by authors.
Data-conversion and production: PTP-Berlin Protago-TeX-Production GmbH, Germany
Cover-Design: design & production GmbH, Heidelberg
Printed on acid-free paper 62/3020Yu - 5 4 3 2 1 0
Editorial Advisory Board

EUROPE
Herman Bruyninckx, KU Leuven, Belgium
Raja Chatila, LAAS, France
Henrik Christensen, KTH, Sweden
Paolo Dario, Scuola Superiore Sant’Anna Pisa, Italy
Rüdiger Dillmann, Universität Karlsruhe, Germany

AMERICA
Ken Goldberg, UC Berkeley, USA
John Hollerbach, University of Utah, USA
Lydia Kavraki, Rice University, USA
Tim Salcudean, University of British Columbia, Canada
Sebastian Thrun, Carnegie Mellon University, USA

ASIA/OCEANIA
Peter Corke, CSIRO, Australia
Makoto Kaneko, Hiroshima University, Japan
Sukhan Lee, Sungkyunkwan University, Korea
Yangsheng Xu, Chinese University of Hong Kong, PRC
Shin’ichi Yuta, Tsukuba University, Japan
Foreword
At the dawn of the new millennium, robotics is undergoing a major transformation in scope and dimension. From a largely dominant industrial focus, robotics is rapidly expanding into the challenges of unstructured environments. Interacting with, assisting, serving, and exploring with humans, the emerging robots will increasingly touch people and their lives.

The goal of the new series of Springer Tracts in Advanced Robotics (STAR) is to bring, in a timely fashion, the latest advances and developments in robotics on the basis of their significance and quality. It is our hope that the wider dissemination of research developments will stimulate more exchanges and collaborations among the research community and contribute to further advancement of this rapidly growing field.

This monograph, written by Jong-Hwan Kim, Dong-Han Kim, Yong-Jae Kim and Kiam-Tian Seow, forms an introduction to the field of Soccer Robotics. Soccer Robotics has become an important research area with different competing initiatives. It integrates mechatronics, computer science and artificial intelligence techniques to create real-world autonomous systems that are more than just fun to watch. Soccer Robotics also forms a test bed for the system integration of autonomous systems, allowing different approaches to be compared in various competitions with different levels of distributed perception and collaboration. Soccer Robotics opens the route towards collaborating autonomous robot systems in a real-world adversarial setting.

The focus of this monograph is on the FIRA framework of Soccer Robotics, in particular MiroSoT, which uses a central overhead camera to overview the whole soccer field, and a central control of the robots. The monograph gives a complete description of the different aspects needed to create a soccer team. It describes the hardware aspects, the computer vision needed, navigation, action selection, basic skills and game strategy.
These aspects are described at an undergraduate level, and up to a junior graduate level, which makes the monograph suitable as a textbook and a must for everyone who wants to enter MiroSoT robotics. A fine addition to the series!

Amsterdam, February 2004
Frans Groen STAR Editor
Preface
Autonomous robots which are adaptable, communicative and objective-oriented, and intelligent multi-agent robotic systems in general, are so evidently complex that it has become increasingly necessary to find a domain that can serve as an integrated framework for the complementary purposes of research and education. Robot soccer is one such suitable domain that is representative of intelligent multi-agent robotic systems, in which multiple robotic agents (or simply, multiple robots) need to cooperate in an adversarial environment to achieve specific objectives. It is a game based on the modified rules of human soccer and is played in a scaled-down soccer field, in which two soccer robot teams compete by attempting to move a ball into the opponent team’s goal. The team with the higher score at the end of regulation time wins.

Technically, robot soccer is a competitive game that makes heavy demands in all the key areas of robot technology, namely, mechanics, control, sensors, communication, and intelligence. On the one hand, it spurs wide-ranging multidisciplinary research work by providing a comprehensive test bed that facilitates the concrete demonstration and performance evaluation of new ideas and concepts. On the other hand, it captivates as an educational tool that helps students better understand and appreciate the scientific knowledge and technological developments in an inherently multidisciplinary setting of intelligent multi-agent robotic systems.

Since its inception in 1995, robot soccer has evolved into a recognized area of its own. This area, called Soccer Robotics, is a subfield of AI Robotics that offers a challenging domain for research and education in a large spectrum of issues integrating the problems of sensing, deciding and acting that are of relevance to the development of complete autonomous agents in general.
The hope in Soccer Robotics, of course, is that by discovering how to get a team of robots to sense with acuity, decide collaboratively and act in coordination within the limited context of a soccer game, it will be possible to use the same techniques and technologies to build robots that carry out other more useful tasks. The development of this subfield is actively supported through the Micro-Robot Soccer Tournament (MiroSoT) and Simulated-Robot Soccer Tournament (SimuroSoT) Categories of the FIRA Cup, an international event organized by the Federation of International Robot-soccer Association (FIRA, http://www.fira.net). FIRA Cup, held annually since 1996, has
been the ‘examination’ ground for the testing of new techniques and technologies integrated in the game of robot soccer, and has provided much excitement and entertainment for all those who participated.

This new book Soccer Robotics is intended to be a comprehensive introduction to the field of soccer robotics, emphasizing breadth of coverage and accessibility of the material to readers with possibly different backgrounds. Its key feature is the emphasis placed on a robot soccer-programming framework that integrates all the key areas of robot technology. Until now, these areas had been treated mainly in separate books or in research literature only, outside the arena of soccer robotics.

A substantial portion of this book is based on the first author’s lectures EE006 Robot Soccer System at the Korea Advanced Institute of Science and Technology in the period July 13 - August 14, 1998. The material on robot soccer originated with the KAIST postgraduate theses of Dong-Han Kim (2003, 1998), Yong-Jae Kim (2003), Hyun-Sik Shim (1998), Mun-Soo Lee (2000), Heung-Soo Kim (1997) and others, together with joint publications with the first author. The experimental robot soccer system program for Small League MiroSoT that supplements this book has been developed with the help of many students in the first author’s Robot Intelligence Technology (RIT) Lab at KAIST, while the simulator package for Large League SimuroSoT has been developed by Bing-Rong Hong’s research team at Harbin Institute of Technology, P.R. China. Both are available for free download from the FIRA website http://www.fira.net.

Soccer Robotics is written as a textbook for practical courses at the undergraduate level, and up to the first-year graduate level. It is useful for researchers and practising engineers interested in trying out new techniques in the domain of robot soccer. This book is also suitable for anyone interested in learning and developing robot soccer systems for edutainment purposes.
For those interested in participating in either the MiroSoT or SimuroSoT categories of the annual FIRA Cup and other robot-soccer championship events, the material in this book will provide a firm foundation for the development of robot soccer systems to competitive standards. The book will be of interest to scientists, engineers and students in a variety of disciplines besides AI Robotics, where the use of robot soccer as a test bed is relevant: sensors (including computer vision), control, communication, multi-agent systems and artificial intelligence.

To review the chapters briefly: Chapter 1 defines the multi-agent framework of soccer robotics in terms of the three commonly accepted primitives of AI robotics, namely, SENSE, DECIDE and ACT. The goals of soccer robotics in research and education of intelligent multi-agent robotic systems are explained. The various categories of robot soccer created by FIRA, an international regulating body for robot soccer, are described. The classification of robot soccer systems for MiroSoT is also examined.
Chapter 2 presents the basic theoretical background on the mechanical motion of mobile robots, with emphasis on the kinematics of a two-wheel MiroSoT robot. The essentials of hardware and firmware needed to build a two-wheel MiroSoT robot with IR or RF communication are covered in sufficient detail.

Chapter 3 focusses on the (visual) SENSE primitive; in particular, it presents how the postures of target objects in robot soccer can be computed using centralized vision techniques. The basics of computer vision are first introduced. Real examples are then provided to highlight the practical considerations in building a good vision system for a MiroSoT team.

Chapter 4 focusses on the DECIDE and ACT primitives. A hybrid control architecture is introduced that integrates the three primitives of SENSE, DECIDE and ACT in a hierarchy of four interacting levels, namely, role, action, behaviour and execution. To expose the technical challenges involved, example strategies are given at the role level and action level. Action designs for robot soccer, to be implemented at the behaviour level, are classified and explained. An overview of classical PID control, applicable at the behaviour level and execution level, follows. Finally, two different navigation methods, applicable at the behaviour level, are presented.

Chapter 5 motivates the importance of the various aspects of intelligence, namely, search and evolution, knowledge representation and inference, and learning and adaptation, as needed by the DECIDE and ACT primitives. It then demonstrates, by examples, the applicability of Petri nets, Q-learning, neural networks, evolutionary programming and fuzzy logic to robot soccer under the MiroSoT category. These soft-computing paradigms make concrete (at least one of) the abstract aspects of intelligence.
For each paradigm, one or two examples are provided that address some key issues at specific hierarchical levels of the hybrid control architecture introduced in Chapter 4.

Chapter 6 introduces a host software model for a MiroSoT robot soccer system. An overview of the programming framework for robot soccer is then presented, in which a number of the robot soccer concepts described in earlier chapters are illustrated through example ‘C’ programs that implement the key functions of a robot soccer system for Small League (3-a-side) MiroSoT.

Chapter 7 complements the real-system programming framework presented in the previous chapter with a computer-simulated system programming framework. It presents the core simulator system and programming framework for Large League (11-a-side) SimuroSoT. Example ‘C++’ code is provided for illustration.

As do all authors of technical work, we wish to acknowledge the many contributors on whose work our own presentation is partly based. The list of references gives some indication of those to whom we are indebted. On a more
personal level, we would expressly like to thank Hyun-Sik Shim, Myung-Jin Jung, Heung-Soo Kim, Kuk-Hyun Han, Kui-Hong Park, Ming Yu-Chi, Jun-Su Jang, Kang-Hee Lee, Jayyati Ghoshal and many other students in the RIT Lab who have contributed to this book in a wide range of invaluable ways.

This book would never have been possible without the funding that came from a variety of sources to support the research and development work in soccer robotics, and the writing of this book. The first author would like to acknowledge each of these agencies: LG, Samsung, POSCO, KOSEF and MRDEC. The last author would like to acknowledge the award of a ‘Brain Korea 21’ Institute Fellowship in 2002 that supported his joint research and authorship at KAIST.

Finally, the authors are indebted to Dr. Thomas Ditzinger, Engineering Editor at Springer-Verlag, for editorial assistance and quality production of this book.

KAIST, Daejeon, Korea, Jan 2003
Jong-Hwan Kim
KAIST, Daejeon, Korea
Dong-Han Kim
Samsung Electronics, Suwon, Korea
Yong-Jae Kim
NTU, Singapore
Kiam-Tian Seow
Contents
Foreword
Preface
List of Figures
List of Tables

1. Soccer Robotics
  1.1 Introduction
    1.1.1 Agents, Multi-agent Systems, and AI Robotics
    1.1.2 Cooperative Robot Teams
    1.1.3 Domain Characteristics
    1.1.4 Robot Soccer
  1.2 The Goals of Soccer Robotics
    1.2.1 Test Bed for Robotics Research and Development (R&D)
    1.2.2 Educational Tool for AI Robotics
    1.2.3 FIRA Robot World Cup
    1.2.4 Technology Transfer to New Useful Tasks
  1.3 Fundamental Motion Benchmarks for Robot Soccer
    1.3.1 Striking the Ball
    1.3.2 Passing the Ball to Another Robot
    1.3.3 Striking a Moving Ball
    1.3.4 Passing a Moving Ball to a Moving Robot
    1.3.5 Dribbling the Ball Past Obstacles
  1.4 Categories of Robot Soccer
    1.4.1 MiroSoT and NaroSoT
    1.4.2 SimuroSoT
    1.4.3 RoboSoT and KheperaSoT
    1.4.4 HuroSoT
  1.5 The MiroSoT Robot Soccer System
  1.6 Classification of MiroSoT Robot Soccer Systems
    1.6.1 Command-Based Robots
    1.6.2 Action-Based Robots
    1.6.3 Intelligence-Based Robots
  1.7 Purpose of This Book
  Notes on Selected References

2. Robot Soccer System: Hardware and Firmware Components
  2.1 Introduction
  2.2 Mobile Robots
    2.2.1 Mechanical Movement Mechanisms
    2.2.2 Kinematics of a Two-Wheel Robot
    2.2.3 Basic Motion Control: A Circular Path Analysis
  2.3 A Two-Wheel Command-Based Soccer Robot
    2.3.1 Microcontroller
    2.3.2 DC Motors and Auxiliary Components
    2.3.3 Motor Driving and Circuits
    2.3.4 Velocity and Duty Cycle
    2.3.5 Communication
    2.3.6 Power System
    2.3.7 Other Considerations
  Notes on Selected References

3. How to Sense? Use Computer Vision Techniques
  3.1 Introduction
  3.2 Vision Basics
    3.2.1 Computer Vision
    3.2.2 Vision System Operations
    3.2.3 Sampling, Pixel, and Quantization
    3.2.4 Gray Scale, Binary, and Colour Images
    3.2.5 Colour Models
  3.3 Binary Image Processing
    3.3.1 Thresholding
    3.3.2 Computing Geometric Properties
    3.3.3 Labelling
    3.3.4 Labelling Algorithm 1: Recursive
    3.3.5 Labelling Algorithm 2: Sequential, 4-Connectivity
    3.3.6 Size Filtering
  3.4 Vision System For MiroSoT Robot Soccer
    3.4.1 System Processing
    3.4.2 Image and Physical Coordinates on MiroSoT Playground
    3.4.3 Example 1: System Hardware
    3.4.4 Example 2: Vision Processing
    3.4.5 Example 3: Information Extraction
    3.4.6 Example 4: Window Tracking for Fast Vision Processing
  Notes on Selected References

4. How to Decide and Act? Use Intelligent Systems and Control Techniques
  4.1 Introduction
  4.2 Hybrid Control Architecture
    4.2.1 Role Level: The Who Issue
    4.2.2 Action Level: The What Issue
    4.2.3 Behaviour Level: The How Issue
    4.2.4 Execution Level: The Motion Issue
  4.3 Example Strategies
    4.3.1 Action-Level Strategy
    4.3.2 Role-Level Strategy
  4.4 Design of Robot Soccer Actions
    4.4.1 Base Class: Primitive
    4.4.2 Attacker Class: Shoot
    4.4.3 Defender Class: Push
    4.4.4 Goalkeeper Class: Block
  4.5 Control Basics
    4.5.1 PID Control
  4.6 Unified Navigation Control
    4.6.1 Control of a Two-Wheel Robot
    4.6.2 Univector Field Method
    4.6.3 Limit-Cycle Method
  Notes on Selected References

5. How to Improve Intelligence? Use Soft Computing Techniques
  5.1 Introduction
  5.2 Intelligence Basics
    5.2.1 Search and Evolution
    5.2.2 Knowledge Representation and Inference
    5.2.3 Learning and Adaptation
  5.3 Petri Nets
    5.3.1 Petri Net Structure and Graph
    5.3.2 Petri Net Markings
    5.3.3 Rules for Petri Net Execution
    5.3.4 Example 1: Role Level
    5.3.5 Example 2: Action Level
  5.4 Q-Learning: A Model-Free Reinforcement Learning Method
    5.4.1 Standard Reinforcement Learning
    5.4.2 Q-Learning
    5.4.3 Example 1: Role Level
  5.5 Neural Networks
    5.5.1 A Simple Neuron
    5.5.2 Neural Network Structure
    5.5.3 Learning
    5.5.4 Example 2: Action Level
  5.6 Evolutionary Programming
    5.6.1 The EP Process
    5.6.2 EP and GAs
    5.6.3 EP and ES
    5.6.4 Example: Behaviour Level
  5.7 Fuzzy Logic and Control
    5.7.1 Fuzzy Logic
    5.7.2 A Fuzzy Controller
    5.7.3 Fuzzy Inference
    5.7.4 Defuzzification
    5.7.5 Example: Behaviour Level
  Notes on Selected References

6. Robot Soccer System: Software Components and Programming
  6.1 Introduction
  6.2 MiroSoT Host Software Model
    6.2.1 Modular Software
    6.2.2 Modular Design Rules
  6.3 Programming Framework: An Overview
  6.4 Basic Skill Functions
    6.4.1 Velocity()
    6.4.2 Angle()
    6.4.3 Position()
    6.4.4 Shoot()
  6.5 Applied Skill Functions
    6.5.1 Kick()
    6.5.2 Goalie()
    6.5.3 AvoidBound()
  6.6 Game Strategy Development
    6.6.1 Zone-Defence Strategy
    6.6.2 Univector Field Navigation
    6.6.3 Limit-Cycle Navigation
  Notes on Selected References

7. Simulated Robot Soccer
  7.1 Introduction
  7.2 Client-Server Architecture
    7.2.1 Server Side
    7.2.2 Client Side
  7.3 Kinematics Models
    7.3.1 For the Ball
    7.3.2 For the Robot
  7.4 How To Run the Simulator
    7.4.1 System Requirements
    7.4.2 Server Program
    7.4.3 Client Program
  7.5 Client Programming
    7.5.1 Basic Program Structure
    7.5.2 Attack Direction
    7.5.3 System-Defined Variables and Constants
    7.5.4 Velocity() and Position() Functions
    7.5.5 Example Game Strategy Programs: NormalGame()
  Notes on Selected References

Appendix

A. Programming the PIC16C73/73A Microcontroller
  A.1 On-chip PWM Programming for Robot Motion Control
  A.2 On-chip USART Programming for Robot Communication

B. Reference Manual for an Experimental MiroSoT System
  B.1 Vision System: Set-up and Initialization
    B.1.1 Build Program Executable Code
    B.1.2 Run the Program
    B.1.3 Set Camera Image
    B.1.4 Set Ball Colour
    B.1.5 Set Robot Team ID Colour
    B.1.6 Set Robot ID Colour
    B.1.7 Set Playground Boundary
    B.1.8 Set Robot Size
    B.1.9 Set Pixel Size
    B.1.10 Save Vision Settings
    B.1.11 Open Vision Settings File
    B.1.12 Set Auto Colours
    B.1.13 Change Colour
  B.2 System Library
    B.2.1 The SENSE Category
    B.2.2 The Communication Category
    B.2.3 The DECIDE-and-ACT Category

References
Index
List of Figures

1.1 Hardware entities of a robot soccer team
1.2 Basic set-up for the ‘dribbling the ball past obstacles’ test
1.3 FIRA robot soccer: off-board/centralized vision categories
1.4 FIRA robot soccer: simulation category
1.5 FIRA robot soccer: onboard/distributed vision categories
1.6 FIRA robot soccer: humanoid category
1.7 General set-up for robot soccer (MiroSoT Category)
1.8 A general SENSE-DECIDE-ACT block diagram for robot soccer
1.9 Command-based robot soccer system
1.10 Action-based robot soccer system
1.11 Intelligence-based robot soccer system
2.1 Different types of ‘move by rolling’ mechanism for a mobile robot
2.2 The robot’s posture
2.3 Different wheel assemblies resulting in different ICR-axes
2.4 Kinematics of a robot
2.5 Computation of the robot’s unique ICR
2.6 Circular path and angle of turning
2.7 Rotational velocity profile of the two wheels
2.8 Hardware architecture of a soccer robot
2.9 Hardware of a soccer robot
2.10 Normal operational sequence of a microcontroller
2.11 Handling an interrupt request
2.12 Supporting architecture for a microcontroller
2.13 Scheme separating address and data signals
2.14 Input-output of a chip selector
2.15 An example CS equation
2.16 A system address map and corresponding CS equations
2.17 Selecting the clock frequency
2.18 Device interfacing
2.19 Pin Configuration of PIC16C73/73A microcontroller
2.20 A DC motor ‘package’ and its exploded view
2.21 Working principle of a DC motor
2.22 Rotational velocity and current versus torque
2.23 Torque about the wheel axis
2.24 Two mechanical designs of motor-wheel assembly
2.25 H-Bridge circuit for motor driving
2.26 Amplified PWM signals with different duty cycles
2.27 PWM-based operations of H-bridge circuit
2.28 Graph showing the relationship between the wheel rotational velocity ω̄G and the PWM data W+ required to attain it
2.29 IR communication using ASK and IrDA1.0 methods
2.30 Module block diagrams implementing the base band method for IR communication
2.31 Generic module for IR base band communication
2.32 A game set-up for teams using IR base band communication
2.33 IR transmission coverage
2.34 Circuit block diagram of a transceiver
2.35 Communication message format
2.36 RF communication using the FSK method
2.37 An ALLINTEK ARFM-424 RF communication module
2.38 An ALLINTEK RF transceiver circuit for use by the host computer
2.39 An RF transceiver circuit for use by each team robot (within dotted lines)
2.40 Communication message format
2.41 Examples of power regulation IC chips
3.1 Basic architecture of a computer vision system
3.2 An n × m image grid
3.3 The RGB colour cube
3.4 A gray level image and its resulting binary images using different thresholds
3.5 An image and its histogram
3.6 An example showing the centre position (x̄, ȳ) of an image
3.7 Finding the orientation of the object
3.8 A binary image and its labelled connected components
3.9 The 4- and 8-neighbourhoods of a pixel at square position [i, j]
3.10 Examples of a 4-path and an 8-path
3.11 An example illustrating the workings of the sequential connected components algorithm on an image
3.12 A noisy binary image and its resulting image after application of a size filter (Af = 8)
3.13 A high-level flow chart showing vision processing as a software component of a robot soccer host-system program
3.14 On mapping the image and physical coordinate points in the playground
3.15 Frame grabber (Media camp 7 plus)
3.16 Example 1: Labelled components image
3.17 A robot’s colour patch layout
3.18 Computing a robot’s posture
3.19 Basic window tracking
4.1 A hybrid control architecture for robot soccer (MiroSoT Category)
4.2 Situational problems encountered by role-level assigner
4.3 Basis areas of manoeuvre for zone defence strategy (Small League MiroSoT Category)
4.4 Formations for Middle League MiroSoT
4.5 Wandering action
4.6 SweepBall action
4.7 Shoot action
4.8 Cannon Shoot action
4.9 Position To Shoot action
4.10 PushBall action
4.11 Position To PushBall action
4.12 BlockBall action
4.13 Block diagram representation of a PID controller
4.14 Robot soccer situation: A robot (in white) should kick the ball (round) avoiding an opponent robot (in grey)
4.15 Robot modeling
4.16 Available velocity region
4.17 Component forces for generating a potential field
4.18 A potential field
4.19 A univector field
4.20 Univector field for obstacle avoidance by a point object
4.21 Modified univector field for obstacle avoidance by a robot while moving towards a target point g
4.22 Phase portrait of a limit-cycle
4.23 Navigation using the limit-cycle method
4.24 Multiple obstacle situation
4.25 Decision of rotational direction
4.26 Navigation example
4.27 Local minima with two overlapped obstacles
4.28 Extended navigation method
4.29 Robot soccer example
5.1 A Petri net structure and its graph
5.2 A Petri-net graph for role assignment supervision
5.3 A Petri-net supervisor for role assignment: transition firings and token redistributions
5.4 Role selection: who should attack?
5.5 A role-action structure for Petri net supervision and control
5.6 Four key situations for the defending robot
5.7 A Petri-net graph for defending robot control
5.8 Predictor of target point of the ball
5.9 A Petri-net graph for goalkeeping robot control
5.10 A Petri-net control for robot goalkeeping: a transition firing and token redistributions
5.11 The standard reinforcement-learning model
5.12 States of the attacking robot
5.13 States of the defending robot
5.14 States of the ball
5.15 A Q-learning state (ra3, θ1, rd2, b5) of a role-level strategy for robot soccer
5.16 Schematic diagram of a simple neuron
5.17 Two sigmoid functions, with λ1 = 0, λ2 = 1
5.18 A feedforward neural network
5.19 Structure of the proposed ASM
5.20 The process of selecting aji f s
5.21 A one-a-side MiroSoT game
5.22 Four situation variables characterizing ball possession
5.23 Four situation variables representing the team (or home) robot’s winning score against the opponent robot and the risk level of conceding a goal
5.24 Structure of the feedforward neural network
5.25 Pseudocode of algorithm EP
5.26 Grid net of the function approximator
5.27 Simple example of grid net
5.28 Illustration of membership functions for fuzzy values. The upper diagram shows the membership functions of cold, moderate, and hot. The middle diagram shows the membership functions for cold and moderate; the lower diagram shows the membership functions for cold or moderate.
5.29 A fuzzy PD controller
5.30 Illustration of fuzzy inference with two rules using the min-max rule
5.31 Shooting from the left side when the line connecting the ball to the opponent goal is on the right
5.32 Variables for relative posture characterization
5.33 Overall fuzzy control structure for the Shoot action
5.34 Desired output
5.35 Membership functions
5.36 θd sampled at each input region in the vicinity of the ball at (0, 0)
5.37 Obstacle avoidance scheme
5.38 Membership functions
5.39 Membership functions
5.40 FLC for obstacle block
5.41 Membership functions
6.1 Host software model for a MiroSoT team
6.2 Overall program structure
6.3 My Strategy()
6.4 Angle or turning control
6.5 Position control
6.6 Problem of oscillation about θe = ±90° with Position()
6.7 Graph of Vc against de
6.8 Different paths of the robot for shooting the ball in different desired directions
6.9 Geometric relationships for calculating the robot’s desired heading angle θd at its online position (x, y)
6.10 Problem of chattering with Shoot()
6.11 A modified univector field for solving the problem of chattering
6.12 A strategy for the Kick() function
6.13 Mapping playground subareas to desired directions of ball kick
6.14 State S0: the desired point (pos[0], pos[1]) behind the ball the specified robot should move to.
6.15 In state S0: to switch to state S1 if the robot is at less than a distance of Dc from the desired point (pos[0], pos[1]). Note that in this figure, the illustrated C0-condition ‘√(dx × dx + dy × dy) < Dc’ is true.
6.16 In state S1: to switch to state S2 if the angle error is less than a specified value Ad. Note that in this figure, the illustrated C1-condition ‘|θe| < Ad’ is false.
6.17 State S2: the desired point (pos[0], pos[1]) the specified robot should reach in order to move through the ball’s position at a certain positive velocity.
6.18 In state S2: to switch to state S0 if the robot is at less than a distance of Df from the ball, or is behind the ball.
6.19 Areas of surveillance for the Goalie() function
6.20 Desired position (estimate x, estimate y) of goalkeeping robot when the ball is in far-distance area, with y = (dy/dx) · x, where dy = PositionOfBall[1] − 130/2 and dx = PositionOfBall[0] − 15
6.21 Desired position (estimate x, estimate y) of goalkeeping robot when the ball is in middle-distance area
6.22 Desired position (estimate x, estimate y) of goalkeeping robot when the ball is in near-distance area
6.23 The two Y-coordinate boundaries of the goalkeeping robot
6.24 Numbered exception-situations and the desired turning directions for the goalkeeping robot
6.25 Labelled exception-situations characterized by specified bounds (DISTANCE BOUND, ANGLE BOUND) and wall locations, namely top and bottom walls, left and right walls
6.26 Team robots’ assigned areas according to the zone-defence strategy
7.1 Client-server platform for SimuroSoT
7.2 Internal architecture of client-server based simulator for robot soccer programming
7.3 SimuroSoT simulation display
7.4 SimuroSoT client interface
7.5 SimuroSoT client connection window
A.1 Using CCP in PWM mode for motor driving and USART for data reception
A.2 Switch for assigning Team ID
B.1 Visual C++ window environment
B.2 MiroSoT system user-interface
B.3 Colour-setting buttons
B.4 Default colour setting
B.5 Colour parameter-setting window
B.6 Default camera image
B.7 Adjusting Brightness
B.8 Adjusting Hue
B.9 Adjusting Saturation
B.10 Setting ball colour
B.11 Ball colour control Window
B.12 Well-set ball colour
B.13 Selecting a robot on the real-image screen
B.14 Setting a robot team ID colour
B.15 Setting robot ID colour
B.16 Setting boundary of playground
B.17 Setting robot size
B.18 Setting the pixel size of colours with the Set Pixel Size button
B.19 Saving settings with [Save Vision File]
B.20 Importing settings with [Open Vision File]
B.21 Zooming in after clicking on Auto Set Colour button
B.22 Setting the team ID colour with auto colour setting
B.23 Setting robot ID colour with Auto Colour Setting
B.24 Setting the ID number for a robot
B.25 Robot colour 2
B.26 Menu for changing colours
B.27 Change colour parameter
B.28 Display function
B.29 Set-colour function
B.30 Vision function
B.31 Communication function
List of Tables

1.1 Robot primitives defined in terms of inputs and outputs
1.2 Robot primitives and system entities for robot soccer (MiroSoT Category)
2.1 Some motor and gear characteristics (source: datasheets)
2.2 Pin Configuration of ALLINTEK ARFM-424 module
2.3 Some batteries and their characteristics
4.1 Robot primitives and hierarchy levels defined for robot soccer (MiroSoT Category)
5.1 Algorithm EP for offline training of univector fields
5.2 Representation of the fuzzy PD controller as a table
5.3 Rules for the destination block
5.4 Rules for the obstacle block, FLC1 (left) and FLC2 (right)
5.5 Rules for right wheel
6.1 Program functions for robot soccer system (MiroSoT category)
1. Soccer Robotics
1.1 Introduction

Soccer robotics is an emerging field that combines artificial intelligence and mobile robotics with the popular sport of soccer. In essence, it studies how mobile robots can be built and trained to play a game of soccer. It arises out of a need to find a domain that can serve as an integrated framework for the complementary purposes of research and education in multi-agent robotics. Indeed, robot soccer, a roboticized version of human soccer, has gained worldwide acceptance as an interesting and challenging domain for studying and investigating a large spectrum of issues of relevance to the development of complete autonomous robots in multi-robot systems. Since its official founding in 1997, the Federation of International Robotsoccer Association (FIRA, http://www.fira.net), an international non-profit regulating body for robot soccer, has been actively promoting the development of soccer robotics.

This chapter presents the motivations and an overview of the FIRA framework for soccer robotics as a subfield of AI robotics. More specifically, it presents the ‘what’ of intelligent multi-agent robotic systems and robot soccer systems. The basic terminology and concepts used in AI robotics and multi-agent systems are defined. The domain characteristics inherent in robot soccer, and of significant relevance to the study of multi-agent robotic systems, are examined. The goals of soccer robotics in research and education of intelligent multi-agent robotic systems are explained. The various categories of robot soccer created by FIRA are described. The classification of robot soccer systems for the Micro-Robot Soccer Tournament (MiroSoT) is examined. MiroSoT is an important category of robot soccer, especially from the perspective of edutainment, and is the focus of this book.
J.-H. Kim, D.-H. Kim, Y.-J. Kim, K.-T. Seow: Soccer Robotics, STAR 11, pp. 1-26, 2004. © Springer-Verlag Berlin Heidelberg 2004

1.1.1 Agents, Multi-agent Systems, and AI Robotics

The field of intelligent multi-agent robotic systems is concerned with the study of building (artificially) intelligent robotic agents. An intelligent mobile robotic agent is an autonomous physical robot situated in a real environment, and is reactive, proactive and communicative. To elaborate, an autonomous robot requires minimal or no human intervention. In order to satisfy its design objectives, the robotic agent (or simply, the robot) is
• reactive in that it can perceive its environment and respond in a timely fashion to changes that occur in it;
• proactive in that it is capable of taking its own initiative; and
• communicative in that it is able to interact with other agents, possibly including humans.

Reaction, proaction and communication are the means by which a robot interacts with its environment. A robot’s intelligence is said to emerge from such interactions. In general, therefore, intelligence is not a property of the robot in isolation, but rather a result of its interplay with the environment.

A software agent shares the same means of interaction. Perhaps the most crucial aspect that sets a robotic agent apart from a software agent is embodiment: a robot has an individual physical presence (an individual body), unlike a software agent. This spatial reality has implications for the robot’s dynamic interactions with the environment that cannot be simulated faithfully.

A robot perceives its environment through sensors and acts upon that environment through actuators, usually after some decisive reasoning. Accordingly, the artificial intelligence (AI) in the robot, the degree of which is exhibited by the way it behaves when taking actions, can be organized naturally in terms of the three commonly accepted primitives of AI robotics, namely SENSE, DECIDE and ACT, with ACT further subcategorized into Intelligent Control and Actuation. If a function collects information from the robot’s sensors and produces an output for use by the robot’s other functions, then it falls in the SENSE category. If a function takes in information (either from the sensors or from the robot’s own knowledge about the application domain and environment) and selects an action for the robot to perform, it is in the DECIDE category. Functions that produce output commands to the motor actuators fall into the ACT:Control category.
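The primitives introduced above (SENSE, DECIDE, and ACT with its control and actuation parts) can be pictured as a chain of functions, each consuming the output of the one before it. The following C++ sketch is purely illustrative: the type names (`SensorData`, `SensedInfo`, `Action`, `WheelCommand`), the toy ball-chasing rule, the sign convention (positive y lies to the robot's left) and all numeric constants are assumptions made for this example and are not taken from any FIRA or MiroSoT software.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical data types, one per arrow between the primitives.
struct SensorData   { double ballOffsetPxX, ballOffsetPxY; }; // ball offset from the robot, in image pixels
struct SensedInfo   { double ballX, ballY; };                 // ball position in the robot frame (cm)
enum class Action   { Wait, TurnLeft, TurnRight, GoForward };
struct WheelCommand { double left, right; };                  // normalised wheel speeds in [-1, 1]

// SENSE: sensor data -> sensed information (assumed scale of 0.5 cm per pixel).
SensedInfo sense(const SensorData& raw) {
    const double cmPerPixel = 0.5;
    return { raw.ballOffsetPxX * cmPerPixel, raw.ballOffsetPxY * cmPerPixel };
}

// DECIDE: information -> selected action (toy rule: face the ball, then approach it).
Action decide(const SensedInfo& info) {
    const double distance = std::hypot(info.ballX, info.ballY);
    if (distance < 1.0) return Action::Wait;                   // ball already reached
    const double bearing = std::atan2(info.ballY, info.ballX); // 0 rad = straight ahead
    if (bearing >  0.2) return Action::TurnLeft;
    if (bearing < -0.2) return Action::TurnRight;
    return Action::GoForward;
}

// ACT:Control: sensed information and selected action -> actuation commands.
WheelCommand control(const SensedInfo& info, Action action) {
    // Slow down near the ball; the gain 0.05 is an arbitrary illustrative value.
    const double speed = std::min(1.0, 0.05 * std::hypot(info.ballX, info.ballY));
    switch (action) {
        case Action::TurnLeft:  return { -0.3,  0.3 };
        case Action::TurnRight: return {  0.3, -0.3 };
        case Action::GoForward: return { speed, speed };
        default:                return { 0.0, 0.0 };
    }
}

// One pass through SENSE, DECIDE and ACT:Control. ACT:Actuation would drive the
// motors with the returned command; it is hardware-specific and omitted here.
WheelCommand step(const SensorData& raw) {
    const SensedInfo info = sense(raw);
    return control(info, decide(info));
}
```

Under these assumptions, `step({80.0, 0.0})` (a ball 40 cm straight ahead) commands full forward speed on both wheels, whereas a ball directly to the robot's left produces a turn on the spot.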
Functions that drive the robot hardware to produce physical motion fall into the ACT:Actuation category. Functions under the categories of DECIDE and ACT:Control constitute the core of a robot’s intelligence. Table 1.1 defines these primitives in terms of the inputs and outputs for each primitive.

Table 1.1. Robot primitives defined in terms of inputs and outputs

ROBOT PRIMITIVE   INPUT                                      OUTPUT
SENSE             Sensor Data                                Sensed Information
DECIDE            Information (Sensed and/or Cognitive)      Selected Actions
ACT: Control      Sensed Information and Selected Actions    Actuation Commands
ACT: Actuation    Actuation Commands                         Physical Motion

A multi-agent robotic system is said to be formed when two or more robots are situated in the same environment. When a group of robots in a multi-agent system come together to share a common ultimate objective, they are said to form a team. In other words, a team is always objective-oriented. Team members (or teammates) cooperate via interactions to achieve the ultimate objective. Other robots in the same environment that have objectives opposing the team’s ultimate objective are the team’s opponents.

1.1.2 Cooperative Robot Teams

Broadly speaking, the multi-agent robotic problems of SENSE, DECIDE and ACT fall within the general framework of building cooperative robot
teams. Ideally, a robot team should satisfy the system design requirements of robustness and fault tolerance, reliability, flexibility or adaptability, coherence and scalability. In this section, we examine each requirement to emphasize its significance in robot teams.

Robustness and Fault Tolerance. Robustness refers to the ability of a system to gracefully degrade in the presence of partial system failure. The related notion of fault tolerance refers to the ability of a system to detect and compensate for partial system failures. To achieve robustness and fault tolerance, cooperative teams need to minimize their vulnerability to individual robot outages.

To achieve this design requirement, first, one must ensure that critical control behaviors are distributed across as many robots as possible rather than being centralized in one or a few robots. This complicates the issue of action selection among the robots, but results in a more robust multi-robot cooperation system since the failure of one robot does not jeopardize the system’s objective entirely. Second, one must ensure that an individual robot does not rely on orders from a higher-level robot to determine the appropriate actions it should employ. Relying on one or a few coordinating robots makes the team much more vulnerable to individual robot failures. Instead, each robot should be able to perform some meaningful tasks, up to its physical limitations, even when all other robots have failed. And third, one must ensure that robots have some means for redistributing tasks among themselves when some robots fail. This characteristic of
task reallocation is essential for a team to achieve its objective in a dynamic environment.

Reliability. Reliability refers to the dependability of a system, i.e., whether it functions properly each time it is utilized. As an example of a reliability problem, consider a situation in which two robots, r1 and r2, have two tasks, t1 and t2, to perform. Let us assume that they negotiate for task allocation, which results in robot r1 performing task t1 and robot r2 performing task t2. Further, suppose that robot r1 experiences a mechanical failure that neither robot r1 nor robot r2 can detect. While robot r1 attempts in vain to complete task t1, robot r2 successfully completes task t2. However, although robot r2 also has the ability to complete task t1, it does nothing further since it expects robot r1 to complete task t1. Thus, as a team, the robots never achieve the objective of completing the two tasks. One would hardly call such a team reliable, since a mere reallocation of the tasks would have led to achieving the objective.

Flexibility or Adaptability. The term flexibility or adaptability refers to the ability of team members to modify their actions as the environment or robot team changes. Ideally, a cooperative team should be responsive to changes in individual robots’ skills and performances as well as in the dynamic environment. In addition, the team should not rely on a prespecified group composition in order to achieve its objective.

The capabilities of the team robots can change over time due to learning, which should enhance performance, or due to mechanical or environmental causes, which may reduce or increase a robot’s success at certain tasks. Team members should respond to these changes in performance levels by taking over tasks that are no longer being adequately performed or by relinquishing those tasks better executed by others.
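The two-robot reliability example and the idea of taking over inadequately performed tasks can be made concrete with a small sketch. The C++ code below is one possible remedy under assumptions of our own: it supposes that every robot can observe when progress on a task was last made (even though the underlying mechanical failure itself is undetectable), and that an idle robot adopts any unfinished task that has stalled for longer than a timeout. The names `Task`, `findTaskToTakeOver` and `reallocate` are hypothetical and do not come from the book's software.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for one task.
struct Task {
    int    assignedRobot;     // index of the robot currently responsible
    bool   done;              // true once the task has been completed
    double lastProgressTime;  // time (s) at which progress was last observed
};

// Returns the index of a task that robot `self` should take over, or -1 if none.
// A task counts as stalled when it is unfinished, belongs to another robot, and
// has shown no observable progress for longer than `timeout` seconds.
int findTaskToTakeOver(const std::vector<Task>& tasks, int self,
                       double now, double timeout) {
    // A robot that still has unfinished work of its own takes over nothing.
    for (const Task& t : tasks)
        if (t.assignedRobot == self && !t.done) return -1;

    for (std::size_t i = 0; i < tasks.size(); ++i) {
        const Task& t = tasks[i];
        if (!t.done && t.assignedRobot != self &&
            now - t.lastProgressTime > timeout)
            return static_cast<int>(i);
    }
    return -1;
}

// Reassigns a stalled task (if any) to `self`; returns true if a task was taken.
bool reallocate(std::vector<Task>& tasks, int self, double now, double timeout) {
    const int i = findTaskToTakeOver(tasks, self, now, timeout);
    if (i < 0) return false;
    tasks[i].assignedRobot = self;
    tasks[i].lastProgressTime = now;  // restart the stall clock for the new owner
    return true;
}
```

In the scenario of the example, with t1 assigned to r1 and stalled since time 0, and t2 completed by r2, a call to `reallocate` on behalf of r2 at time 30 with a 10-second timeout hands t1 to r2, so the team can still complete both tasks.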
Each robot must decide which task to undertake based on the actual rather than predetermined performance of the team robots for all the tasks. Robots must also exhibit flexibility in their action selection in response to the dynamic nature of their environment. Obviously, in real world environments, some changes that occur cannot be attributed to the actions of any robot team member or members. Rather, outside forces not under the influence of the robot team affect the state of the environment throughout the course of execution. These effects may be either destructive or beneficial, leading to an increase or decrease in the workload of the robot team members. The robot team should therefore be flexible in its action selections, opportunistically adapting to environmental changes that eliminate the need for certain tasks, or activating other tasks that a new environmental situation requires. Finally, the flexibility requirement also deals with the ease of deploying robot teams in various applications. The human designer should be given the liberty to form teams as desired from subsets of the available robots. However, different groups of robots may be useful for different objectives and
thus the team composition can vary from one team to another. The aim is to have these robots perform reasonably well in their teams the very first time they are grouped together, without requiring any robot to have prior knowledge of the abilities of the other team members. However, over time, we want a given team of robots to improve its performance by having each robot learn how the presence of other specific robots on the team would affect its own behavior. For example, a robot that prefers to clean floors but can also empty the garbage should learn that in the presence of another robot that is only capable of cleaning the floors, it should automatically take the role of emptying the garbage.

Coherence. Coherence refers to how well the team performs as a whole in terms of whether the actions of individual agents are purposefully combined toward some unifying objective. Typically, coherence is measured by criteria such as the quality of the solution or the efficiency of computing the solution. Efficiency considerations are particularly important in teams of heterogeneous robots whose capabilities overlap, in that different robots are able to perform the same task but with quite different performance characteristics. In a highly efficient team, the team robots select tasks such that the overall team performance is as close to the optimal as possible. A team in which robots pursue conflicting actions or duplicate one another’s actions cannot be considered a highly coherent team. A coherent team, however, need not be totally free of all possible conflicts. Rather, the team robots must be able to resolve conflicts as they arise. As a simple example, conflicts do occur whenever multiple robots physically share the same workspace. Although they have the same high-level objective, they may at times try to occupy the same position in space, giving rise to positioning conflicts that need to be resolved.
Clearly, robot teams exhibiting low coherence are of limited use in solving practical engineering problems. Achieving high coherence is therefore an important design objective in building cooperative robot teams.

Scalability. Scalability refers to the ease with which a team can admit more team members so as to improve the overall team performance in working on its problem tasks. As an example of a scalability problem, consider a situation in which two robots, r1 and r2, clean up some toxic waste. During execution, two robots, r3 and r4, are added. The extra robots, r3 and r4, know about the existing robots, r1 and r2. However, the existing robots do not know about these extra robots. Thus, several problems can occur, such as a degradation in team efficiency due to the interference among the robots and an increase in both communication collisions and job reallocations. To cope with these problems, considerations for scalability are necessary. In particular, the issues of complexity that arise as the team size increases must be mitigated if one is to produce a highly scalable team.
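As an illustration of the task-selection ideas in this section, the sketch below picks the one-task-per-robot assignment with the lowest total cost by exhaustive search. It is a minimal example under assumptions: the function name `best_assignment`, the cost table and its numbers are hypothetical and do not come from this book. The factorial growth of the search also hints at why complexity must be mitigated as team size increases.

```python
from itertools import permutations

INF = float("inf")


def best_assignment(cost, tasks):
    """Exhaustively search one-task-per-robot assignments for the lowest total cost.

    cost  -- dict: robot -> dict: task -> cost; a missing entry means
             the robot cannot perform that task
    tasks -- list of task names, at least as many as there are robots
    Returns (total cost, {robot: task}); the mapping is None if infeasible.
    """
    robots = sorted(cost)
    best_total, best_map = INF, None
    for perm in permutations(tasks, len(robots)):
        total = sum(cost[r].get(t, INF) for r, t in zip(robots, perm))
        if total < best_total:
            best_total, best_map = total, dict(zip(robots, perm))
    return best_total, best_map


# Hypothetical costs for the floor-cleaning example: robot A prefers floors
# but can also empty the garbage; robot B can only clean floors.
cost = {"A": {"floors": 1.0, "garbage": 2.0}, "B": {"floors": 1.5}}
total, roles = best_assignment(cost, ["floors", "garbage"])
```

With these made-up costs, the only feasible assignment gives robot A the garbage and robot B the floors, matching the behavior the flexibility discussion asks robot A to learn.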
1.1.3 Domain Characteristics

Robot teams can be deployed in many application domains. The following domain characteristics present challenging problems of SENSE, DECIDE and ACT for building such distributed agent systems:

1. Cooperative domains are those in which a group of agents shares a common objective.
2. Adversarial domains are those in which there are agents with opposing objectives.
3. Real-time domains are those in which success depends on acting in time in response to a dynamically changing environment. A dynamically changing environment is one that has agents actively operating on it, and making changes in ways generally beyond the control of any individual agent.
4. Noisy domains are those in which the agents cannot accurately perceive the environment they are situated in, nor can they accurately affect it.

These characteristics are inherent in the real world. Interestingly, the domain of robot soccer has these characteristics, making it particularly appropriate yet enjoyable for studying the problems of multi-agent robotic systems. Robot soccer is a game based on modified rules of human soccer, and is played on a scaled-down soccer field. In a game of robot soccer, two robot soccer teams compete by attempting to move a ball into the opponent team’s goal. The team with the higher score at the end of regulation time wins.

1.1.4 Robot Soccer

Through the game of robot soccer, soccer robotics studies how multiple soccer-playing robots on each team could be built to cooperate in an adversarial environment to achieve specific objectives. The domain of robot soccer has the following characteristics:

• Independent agents with the same well-defined high-level objective of scoring as many goals as possible to win the match: teammates.
• Agents with the conflicting well-defined high-level objective of counter-scoring as many goals as possible to win the match: opponents.
• A need for real-time decision-making.
• Sensor and actuator noise.
Note that in a competitive setting, the intermediate or low-level objectives of teammates and opponents can differ indeterminately. The different low-level objectives are usually associated with different roles such as defending and attacking. A teammate or opponent can dynamically assume these roles in accordance with its own team’s strategy.
The robots are assumed to have, at their disposal, the following resources:

• Sensors that provide partial, noisy information about the environment.
• The ability to process sensory information in order to update a world model (of the environment).
• Noisy actuators that affect the environment.
• Low-bandwidth, unreliable (wireless) communication capabilities.

In order to cooperate well in such a domain, soccer robots must perform real-time visual recognition and tracking of moving objects, collaborate with teammates (to decide their roles of attack or defence in a dynamic game situation), navigate in coordination with teammates and in counteraction against opponents, and strike the ball in the goal-ward direction. All these demand robots that are efficient (functioning under time and resource constraints), reactive and proactive (deciding what actions to take based on strategic reasoning of the game situation, and perhaps learning and adapting from experience), communicative (as part of collaborating and coordinating with one another when deciding what actions will accomplish the low-level objectives that are beyond an individual’s capabilities) and autonomous (sensing, deciding and acting as an independent system). The point to emphasize is that all these capabilities must be well integrated into a single and complete robot soccer system. Soccer robotics studies how such integrated robots can be built, using different approaches from those employed in separate research disciplines.

This book provides course material for developing a soccer team of independent robots that can cooperate and work towards the ultimate objective in a complex, real-time, noisy, and adversarial environment. As a quick introduction, the system set-up considered is a robot soccer team that consists of micro-robots, a global vision system, a communication module and a host computer. Fig. 1.1 shows the hardware composition of the robot soccer team.
More information on this game is given in Section 1.4.1. The rules of the game are given on the FIRA website http://www.fira.net. To build such a system successfully, this book uses a common architecture and basic robot hardware design, but emphasizes a robot soccer-programming framework on which to learn how to write programs to incorporate intelligence into the system, i.e., to integrate the abilities in a soccer team of robots to play different roles and utilize different strategies and control techniques in their behavior.
1.2 The Goals of Soccer Robotics

The goals of soccer robotics are closely linked to those of the Federation of International Robot-soccer Association (FIRA), officially founded in June 1997¹. FIRA actively promotes the game through organizing tournaments,

¹ FIRA started as an organizing committee of the first Micro-Robot World Cup Soccer Tournament held in November 1996 at KAIST, Daejeon, Korea.
Fig. 1.1. Hardware entities of a robot soccer team
conferences, workshops and other activities, and is an international non-profit regulating body that created the various categories of competitions and devised the game rules to reflect the state of the art in robot technology.

1.2.1 Test Bed for Robotics Research and Development (R&D)

The domain of robot soccer is real-time, noisy, adversarial and cooperative, and thus provides a representative test bed for study and investigative research on the real issues of SENSE, DECIDE and ACT for intelligent multi-agent robotic systems. Although realistic simulation environments exist, it is important to evaluate physical robotic systems in order to address the full complexity of the issues. The test bed enables direct comparison between two different teams of robots, namely, by pitting them against each other in competitions such as those organized by FIRA. These competitions offer an independent measure of progress in intelligent multi-agent robotics. Different approaches to the same problem of robot soccer are demonstrated and evaluated in an environment with rules specified by an independent committee, rather than in a laboratory setting engineered to produce the most favorable but possibly biased results.

1.2.2 Educational Tool for AI Robotics

The current study and research into building intelligent multi-agent robotic systems necessitate an integrated approach to problem solving in computing, science and engineering. Students in this area come from a broad base of computer, control and electrical engineering, computer science, mathematics, biology, physics, biomedical engineering and neuroscience. Robot soccer not only provides an experimentation test bed for multi-agent robotic systems research, but also a useful education tool for practical courses in this area. The domain of robot soccer is sufficiently complex yet accessible, with standard game rules well-defined and regulated by FIRA. Technically, the game of robot soccer makes heavy demands in all the key areas of robot technology, namely, mechanics, control, sensors, communication, and intelligence. It thus provides a sound educational, integrated project to progress students to real-world problem solving. Students will have the unique opportunity to focus on an easily understood standard problem where a wide range of these technologies would need to be developed, examined and integrated in a multidisciplinary and teamwork setting. The four key education areas involved are discussed below.

Integrated knowledge. Students become quickly aware in a project on robot soccer that specialist intellectual knowledge from many pure discipline areas, such as mathematics, physics, electronics, computing, etc., must be brought together into an integrated whole. Complex systems are indeed complex, and to study their evolution and control requires an investment and application of a vast amount of multidisciplinary knowledge, forcing a teamwork approach.

Teamwork. A project on robot soccer is undertaken typically by interdisciplinary teams, rather than by a single individual.
This introduces participating students to a teamwork approach to problem solving which is quite lacking in many current educational systems that encourage individual problem solving within specific discipline areas.

Real world issues. Through their involvement in the development of their multi-robot system, students are quickly brought into the realities of working with real-time evolution of complex, nonlinear physical systems. Unlike the many textbook problems studied and examined by students in an undergraduate curriculum, either by theory or through computer simulations where information and modelling are exact, real world problems are inherently complex systems where the modelling process is never exact, input and output data have a certain degree of uncertainty, parameter measurements may be imprecise and uncertain, and definitive control evaluation, for example, is impossible to achieve.
Computer programs that run simulation models very well with exact input and output data may not, for example, run on a micro-robot because of the limitations of its onboard processor and memory. Students must learn to overcome these difficulties in practical applications; students lacking this hands-on experience greatly underestimate these real world issues.

Critical thinking and creativity. The involvement and participation in building a robot soccer system brings to the student a sense of creativity and critical thinking necessary for the student’s transition to a professional worker or researcher in our technological world.

1.2.3 FIRA Robot World Cup

One basic goal of soccer robotics is to take the spirit of science and technology in AI robotics to the laymen and the younger generation, worldwide. In line with this, FIRA has its flagship event, the FIRA Cup, held annually since 1998. The FIRA Cup has its predecessor in the Micro-Robot World Cup Soccer Tournament, held in 1996 and 1997 at KAIST, Korea, and is an international competition that seeks to fulfill the dual purposes of research evaluation and edutainment (education plus entertainment). Current information on this event can be found on the FIRA website http://www.fira.net.

1.2.4 Technology Transfer to New Useful Tasks

The game of robot soccer allows researchers to discover and learn how to get a team of robots to sense with acuity, decide collaboratively and act in coordination within the limited context of a soccer game. The hope is that it will be possible to use or modify the same techniques and technologies to build robots that carry out other more useful tasks in industries. This should extend eventually to robots capable of working for or with humans in their envisaged roles as personal, service or field agents.
Below, we enumerate these roles in various human-oriented applications which are currently subjects of intensive research and development, but which could potentially develop into gigantic enterprises, as personal computer businesses are today, to serve the needs of our 21st century society:

1. Personal Robots: homeostasis and utility oriented, e.g., as household servants, subject tutors (education) and pets (entertainment).
2. Service Robots: occupation oriented, e.g., as hospital nursing and surgery assistants, museum tour guides, restaurant waiters and service-on-demand driverless taxis (intelligent transportation).
3. Field Robots: intensive-labour or hazardous-mission oriented, e.g., as construction workers, farmers and military agents.
Central to these human-oriented applications is the need for ease and safety in operating these robots after switching them on. They should be commanded intuitively even by non-experts via multi-modal interfaces such as voice, speech, graphics and vision, and should dynamically adapt to ever-changing environmental conditions. Personal and service robots should be fail-safe while field robots should be survivable. As a test bed that even laymen can easily identify with, we believe that robot soccer can help fire the imagination in terms of scientifically imitating human abilities, both mental and physical, leading to the creation of powerful technological ideas initially directed at developing a soccer team of fully autonomous humanoid robots capable of winning against the human world soccer champion team. It is envisaged that the technological know-how developed in the process could culminate in overcoming challenging issues facing the design and development of personal, service and field robots. Much of the accumulated technology would have high potential for transfer, eventually realizing robotic agents capable of carrying out other more useful tasks in human-oriented roles.
1.3 Fundamental Motion Benchmarks for Robot Soccer

The fundamental motion of robot soccer is defined as the possibility of a robot starting from an arbitrary posture (x, y, θ) and moving to another posture (x′, y′, θ′) in the minimum time period possible, t, where (x, y) is the coordinate position of the centre of the robot in a Cartesian frame, and θ is the angle of the robot’s heading in that frame. The need for fundamental motion is clear: soccer robots must be able to move from where they are to different strategic postures in time during a game. If this cannot be done with a minimum level of accuracy and reliability, any (cooperative) game strategy will be so overwhelmed by randomness that its real effectiveness cannot be ascertained. For this reason, FIRA has outlined a series of benchmark tests. For the purpose of performance evaluation, each benchmark test standardizes and makes explicit a (level of) skill that assumes the fundamental motion. The extent to which the robots measure up to these motion benchmarks will significantly influence the extent to which a proposed game strategy can be smoothly coordinated in execution. These individual skills and strategic teamwork are fundamental in the overall performance of a robot soccer team. The following briefly describes the benchmark tests.

1.3.1 Striking the Ball

The Ball-Striking test requires a robot to move from (x, y, θ) to strike a stationary ball at (x′, y′) to make the ball pass through (x′′, y′′). A time period could also be added.
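One way to read the Ball-Striking test in terms of the fundamental motion is to convert it into a desired posture: line the robot up behind the ball, on the line from the target point through the ball, and head toward the target. The sketch below assumes exactly that geometric simplification; the function name `strike_posture` and the `standoff` distance are hypothetical and are not part of the FIRA benchmark definition.

```python
import math


def strike_posture(ball, target, standoff=0.05):
    """Posture (x, y, theta) from which a straight push sends the ball toward target.

    ball, target -- (x, y) tuples in the playground frame (metres assumed)
    standoff     -- hypothetical distance behind the ball at which to line up
    """
    dx, dy = target[0] - ball[0], target[1] - ball[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        raise ValueError("target coincides with the ball")
    theta = math.atan2(dy, dx)  # heading that points from the ball to the target
    return (ball[0] - standoff * dx / dist,
            ball[1] - standoff * dy / dist,
            theta)
```

The returned posture can then be handed to whatever motion controller moves the robot from its current posture to a desired one; real ball contact, robot geometry and noise are ignored here.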
In other words, the robot should be able to start from anywhere to strike a ball placed anywhere to make it go in any specified direction. This is a basic requirement to play soccer competently. It is necessary for the most elementary tasks such as kicking off, taking goal kicks and taking penalties.

1.3.2 Passing the Ball to Another Robot

The Ball-Passing test requires a robot at (x, y, θ) to strike a stationary ball at (x′, y′) so that another robot starting from (x′′, y′′, θ′′) can strike the ball to pass through (x′′′, y′′′). This is actually a very difficult test to pass. However, the skill of ball passing is the absolute minimum for robots to be able to coordinate as teammates to move the ball around from one team robot to another. Without this skill, no game strategy beyond schoolboy ‘kick-and-run’ can be effectively executed.

1.3.3 Striking a Moving Ball

The Moving Ball-Striking test requires a robot to move from (x, y, θ) to strike a moving ball at (x′, y′, θ′) to make the ball pass through (x′′, y′′). This test reflects a real game situation, in which the ball is moving most of the time, and sometimes it is moving quite fast, exceeding 1 m/s. To hit a moving ball, the vision system must be working well at the highest possible image sampling rate, since the decision-making module has to know the speed of the ball to predict where it will be when the robot hits it.

1.3.4 Passing a Moving Ball to a Moving Robot

The Passing a Moving Ball test requires a robot at (x, y, θ) to strike a moving ball at (x′, y′, θ′) to be struck at (x′′, y′′) by another robot starting from (x′′′, y′′′, θ′′′) to pass through (x′′′′, y′′′′).

1.3.5 Dribbling the Ball Past Obstacles

Perhaps the most challenging, this test requires a robot to manoeuvre with the ball past a series of obstacles, without colliding with any one of them. The robot needs to plan and move with the ball through a zigzag or winding course that avoids obstacles.
These obstacles simulate the opponent team robots and they could be stationary or moving. A basic test set-up, as shown in Fig. 1.2, has one test robot, one ball and two stationary obstacles. One obstacle is placed directly behind the other, at (xo1, y) and (xo2, y), with xo1 < xo2. The Cartesian x-distance between the obstacles is just long enough to place two imaginary robots, rotating about their individual centres, to form a straight line (through the centres of these four objects). The test robot is placed at (x, y, θ), in front of the obstacle at (xo1, y), with x < xo1; the Cartesian x-distance between them is just long enough to place one imaginary robot rotating about its own centre to form a straight line. All imaginary robots replicate the test robot. The ball is placed directly in front of the test robot. The test then requires the robot at (x, y, θ) to push the ball around and through the two obstacles in an ‘S’-like path to a good posture behind the obstacle at (xo2, y).

Fig. 1.2. Basic set-up for the ‘dribbling the ball past obstacles’ test (X-Y frame; Robot at (x, y, θ), Obstacle1 at (xo1, y), Obstacle2 at (xo2, y))
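The moving-ball tests of this section depend on knowing the ball's speed in order to predict where it will be when the robot reaches it. A minimal sketch of that prediction follows, assuming constant ball velocity between vision frames, no friction or wall rebounds, and a robot that travels in a straight line at constant speed. All function names, the search step and the time limit are hypothetical choices for illustration.

```python
import math


def estimate_velocity(p_prev, p_curr, dt):
    """Finite-difference ball velocity from two consecutive vision frames."""
    return ((p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt)


def predict_position(p, v, horizon):
    """Ball position after `horizon` seconds, assuming constant velocity."""
    return (p[0] + v[0] * horizon, p[1] + v[1] * horizon)


def earliest_intercept(robot, ball, v, robot_speed, step=0.01, t_max=5.0):
    """Earliest time at which a robot moving in a straight line at
    `robot_speed` can reach the predicted ball position.

    Returns (time, point), or None if no intercept exists within t_max.
    """
    for k in range(int(t_max / step) + 1):
        t = k * step
        p = predict_position(ball, v, t)
        if math.hypot(p[0] - robot[0], p[1] - robot[1]) <= robot_speed * t:
            return t, p
    return None
```

A shorter frame interval `dt` gives a fresher velocity estimate, which is one reason a high image sampling rate matters for striking a moving ball; in practice the estimate would also need filtering against sensor noise.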
1.4 Categories of Robot Soccer

In this section, we briefly describe the categories of robot soccer created by FIRA. Each category is a tournament with a defined set of operating rules and physical or simulated conditions that lend R&D focus to certain aspects of developing robot soccer systems, but in a competition setting that people from all walks of life can easily understand and enjoy. The FIRA Cup is one such major event that runs these tournaments on an annual basis. The detailed game rules of each category are available on the FIRA website http://www.fira.net. These categories, taken together, reflect the state of the art in AI robotics from the robot soccer perspective. They are by no means fixed, and will evolve in tandem with the R&D progress in robot technology.
1.4.1 MiroSoT and NaroSoT

The Micro-Robot Soccer Tournament (MiroSoT) and the Nano-Robot Soccer Tournament (NaroSoT) are the two categories of robot soccer that use a vision camera overlooking the playground to enable global (i.e., off-board and centralized) vision processing. The set-up of an overhead camera emulates an accessible environment, i.e., one in which complete information about the environment can be obtained if one wishes to. It considers the fact that in current vision technology, cameras mounted onboard the robots cannot deliver the same quality of position information as simply as an overhead camera. The intention of the overhead vision camera is to simplify the process of gathering visual information so that the main focus can be placed on the other two components, DECIDE and ACT, of the system. These two categories are of interest to those whose research or edutainment programmes would be held back by distributed (or localized) vision problems without the simple expediency of an overhead camera.

There are two leagues in MiroSoT, namely, a small league (3-a-side) and a middle league (5-a-side); for each league, the number of robots specified for a team includes the goalkeeper. The size of each robot is limited to a cubic box of 7.5cm × 7.5cm × 7.5cm. The small league made its debut in 1996, while the middle league made its debut in 2001. Fig. 1.3(a) shows the snapshot of a game of Middle League MiroSoT.

There is currently only one league in NaroSoT, namely, a middle league (5-a-side). This league made its debut in 1998. The size of each robot is smaller, and is limited to a rectangular box with a square base of 4cm × 4cm and a height of 5.5cm. Except for the size of the robot and the playground, the game rules for Middle League MiroSoT and Middle League NaroSoT are quite similar. Fig. 1.3(b) shows the snapshot of a game of Middle League NaroSoT.
Once the fundamental problems of robot motion can be efficiently solved, it is hoped that a full 11-a-side league would eventually emerge under this set-up. This is one good reason for introducing NaroSoT for, to put more robots on the playground, a practical option is to make the robots smaller rather than the playground bigger. The playground for the Middle League MiroSoT is the largest. Experience has shown that up to this size, robots can be quite easily accessed and positioned from the sidelines for games and experiments. Anything larger would require people to step on the playground, and make the playground ungainly and difficult to move. Besides, the camera only needs to be positioned about 2 metres above the playground to capture the full view of the playground without excessive optical distortion due to the lens. A larger playground would require the camera to be positioned higher than most conventional buildings would feasibly allow. MiroSoT is the focus of this book. More will be said of MiroSoT from Section 1.5 onwards.
Fig. 1.3. FIRA robot soccer: off-board/centralized vision categories. (a) MiroSoT; (b) NaroSoT
1.4.2 SimuroSoT

The Simulated-Robot Soccer Tournament (SimuroSoT) is MiroSoT played in a computer simulated environment. Without robot and vision hardware, the problems of sensing and acting are reduced to non-issues, and it becomes possible to focus on game strategy development for the two bigger leagues in SimuroSoT, namely the middle league (5-a-side) and the large or full league (11-a-side). Both leagues made their official debuts in 2001. The SimuroSoT framework consists of a network of three computers. One computer is configured as a server that simulates vision processing and physical motion of the robots and the ball, with a monitor screen that displays the game situation (i.e., the two competing team robots on a MiroSoT-simulation playground). The other two computers serve as clients, each assigned to a different competing team. Each client loads and runs a program that executes the designed game strategy of the team. Dynamic information on postures
Fig. 1.4. FIRA robot soccer: simulation category (SimuroSoT)
(i.e., the coordinate positions and directions of move) of the robots and the ball are passed as input to the client programs which then execute the necessary team game strategy, and pass control information back to the server computer that updates its monitor display accordingly. Fig. 1.4 shows the screen snapshot of a simulation game in Middle League SimuroSoT. A computer simulation game, this category is decidedly a competitive test of complex strategy development (the DECIDE component) using advanced AI techniques.

1.4.3 RoboSoT and KheperaSoT

The Robot Soccer Tournament (RoboSoT) made its debut in 2001. Each team can consist of one, or up to three robots of which one can be the goalkeeper. The size of each robot is limited to a rectangular box with a square base of 20cm × 20cm and a height of 40cm. This differentiates it from a similar but one-a-side tournament, the Khepera Robot Soccer Tournament (KheperaSoT), which made its debut in 1998. Khepera is the name of a commercially available robot; it is much smaller, has a vertical cylindrical shape, and has also been used for other research and education purposes in autonomous mobile robotics. Although KheperaSoT is primarily intended for Khepera robots, any similar cylindrical robot is allowed as long as its base diameter does not exceed 6 cm. Fig. 1.5 shows the respective snapshots of a game of RoboSoT and KheperaSoT.

A major difference of these two categories from the MiroSoT/NaroSoT categories is that in the physical set-up, the overhead camera is abandoned in favour of vision processing systems onboard each team robot. With this onboard / distributed vision-based set-up, the environment becomes inaccessible
Fig. 1.5. FIRA robot soccer: onboard/distributed vision categories. (a) RoboSoT; (b) KheperaSoT
in that each robot will always only have a partial view of the environment. Therefore, this set-up offers a unique opportunity to promote research and development in distributed (or local) vision processing as an equally important aspect of robotic agents. These categories also require the team robots to be endowed with the local capabilities to decide and act. In other words, the SENSE-DECIDE-ACT components are (required to be) distributed in the team robots, making them truly autonomous as individual agents. Because of increased complexity, the level of play in these two categories, as demonstrated in past international tournaments thus far, has been inferior to the MiroSoT/NaroSoT categories, but they are much closer to the research goals of intelligent multi-agent robotic systems. Below, we briefly discuss some non-trivial problems of local sensing and local decision-making, not found in centralized vision-based systems, that teams in the FIRA RoboSoT and KheperaSoT categories must contend with.
• In local visual sensing, at the individual agent level, object recognition becomes a very difficult problem due to occlusion of objects and the fact that the same object to be tracked can appear very different in terms of shape and size from different views of the camera on-board a robot; these different views are due to the relative motion of the environmental objects and the robot. At the team level, a mobile robot now needs to dynamically decide which view to focus on in relation to its team members’, in order to fulfill the team’s overall visual needs. Besides, broadcasting of time-critical visual information to all other team members over an inherently bandwidth-limited communication network is often infeasible without sacrificing real-time requirements. To overcome this, each agent would need to determine which team members need the information it acquires and send it only to them. All these problems have technical implications in the development of robot navigation techniques aimed at attaining a robot’s desired posture while avoiding obstacles in its way.
• In local or distributed decision-making, the moving robot would need to keep track of its own position and obtain critical information from its team members to complement its own. Based on such information, it can then determine its current ability with respect to all the available actions. It needs to constantly update this capability status and other information in order to work cooperatively (i.e., in collaboration and coordination) with its team members, to arrive at good joint decisions made in real-time, and under various trade-offs to achieve different objectives.

Clearly, from the perspective of teamwork, central to the problems of local visual sensing and local decision-making is the general problem of cooperation.
The idea of cooperation presents numerous challenges as it has to be carried out among distributed robotic agents with limited sensing capabilities, over a bandwidth-limited communication network. This is an important area of active research, and for a start, the interested reader might want to refer to the survey paper [1].

1.4.4 HuroSoT

Fig. 1.6 shows some robot participants at the first Humanoid Robot Soccer Tournament (HuroSoT). A category at the stage of infancy, HuroSoT made its debut in 2002. It is the only category in which the robots assume the primitive form of humans, from which the term humanoid is derived. These robots do not move on wheels, but are biped (i.e., ‘two legged’), admitting critical problems of dynamic motion control and balancing not found in the other categories. The HuroSoT initiative aims to stimulate and promote research and development in humanoid (biped) robotics. Because of the current state of the art, this game is still quite far away from an actual soccer game. Presently, the competition is organized as a series
1.5 The MiroSoT Robot Soccer System
19
(a) HuroSoT Fig. 1.6. FIRA robot soccer: humanoid category
of tests, including robot dash, penalty kick and obstacle avoidance. Robot dash is a sprint event; in penalty kick, the robot participants have to kick a ball into an empty goal and in obstacle avoidance, they have to avoid obstacles simulating stationary opponent players. These tests are to be seen as preparations for humanoid robot soccer in subsequent years of development. The format of the HuroSoT will actively evolve in tandem with the state of the art developments in humanoid robotics. Although still a distant dream, it is hoped that the game will eventually evolve into one with two competing humanoid robot teams playing a full game of soccer.
1.5 The MiroSoT Robot Soccer System
Fig. 1.7 shows the general set-up for the Micro-Robot Soccer Tournament (MiroSoT). The set-up is a combination of mobile (small-sized) micro-robots, a vision camera overlooking the playground connected to a centralized host computer, and a wireless communication module connecting the host computer to the robots. The fact that visual sensing is achieved by a video camera that overlooks the complete playground offers an opportunity to get a global view of the dynamic game situation. This set-up may have simplified the sharing of information among multiple robots, but it still presents a real challenge for reliable and real-time processing of the movement of multiple moving objects, namely, the ball, the team robots as well as the robots on the opposing team.

Fig. 1.7. General set-up for robot soccer (MiroSoT Category)

In a well-defined processing cycle, the global vision system perceives the dynamic game situation and processes the image frames, giving the postures of each robot and the ball (the SENSE functionality); the decision-making program module, given this information, uses its strategic knowledge to decide what action each robot has to take next (the DECIDE functionality); the intelligent control module, based on the selected action or sensed information, determines the actuation commands which are communicated to the individual robots that translate them into physical robot motion using their resident actuation routines (the ACT functionality). To adapt to the dynamic game situation, the team robots under control should always change their desired postures², and reactively determine the appropriate paths towards these desired postures. In order not to miss critical visual information that helps the team robots adapt in time to the dynamic game situation, the moving objects in the playground must be tracked as closely and as accurately as possible, and this necessarily requires fast sampling of the image frames of the game situation captured by the vision camera. Clearly, the processing cycle time (of sensing, deciding, controlling and communicating) must not exceed this sampling time in order not to miss any image frame. To satisfy this, the processing cycle time must be kept correspondingly short for a fast (image frame) sampling rate. Keeping this processing cycle time short while attempting to increase the overall intelligence of the system is a challenging constraint to handle in robot soccer, and in real-time multi-agent robotic systems in general. Table 1.2 maps the robot primitives of SENSE-DECIDE-ACT onto the robot soccer domain and the possible system (hardware) entities; the wireless communication that always exists between the host computer and the micro-robots is implicit.

² Of course, what is ‘desired’ is decidedly a subjective opinion of the system designer. In general, such an opinion is coded in the system as a strategy formulated in terms of the positions of the other team robots, the opponent team robots and the ball.

Table 1.2. Robot primitives and system entities for robot soccer (MiroSoT Category)
PRIMITIVE | ENTITY | INPUT | OUTPUT
SENSE [S] | Vision Camera and Host Computer (Vision System) | Field of View | Robots and Ball Postures
DECIDE [D] | Host Computer or Robots | Robots and Ball Postures and/or Strategic Knowledge | Selected Actions
ACT [A]: Control [C] | Host Computer or Robots | Robots and Ball Postures and Selected Actions | Desired Wheel Velocities
ACT [A]: Actuation [M] | Robots | Desired Wheel Velocities | Physical Motion
ACT [A]: Actuation [M] | Robots (with Feedback Control) | Physical Motion | Current Wheel Velocities

Fig. 1.8 shows a typical block diagram of an organization relating these primitives. There are many ways to design and implement this organization. The flexibility is due to one salient feature of the organization, namely, it does not enforce the cyclic operational sequence of SENSE, DECIDE, ACT, but allows DECIDE to modify the SENSE and ACT couplings as needed. It is easy to see from the block diagram why intelligent control is sometimes called outer-loop control and actuation is sometimes called inner-loop control. In outer-loop (robot) control, regardless of the techniques used, there are conceptually two major steps, namely, desired posture generation and posture control. In inner-loop control, encoders (together with some auxiliary devices) are needed to quantify the physical motion of the wheels (input) and measure their current velocities (output) as feedback information; these devices will be covered in Chapter 2. Strictly speaking, an encoder is a motion sensor and thus comes under the SENSE primitive. However, for conceptual neatness as laid out in Table 1.2, and as it is better to present motion sensors as an integral part of the MiroSoT robot hardware, we classify motion sensing as part of actuation under the ACT primitive. The SENSE primitive is viewed exclusively as sensing the real outer world or environment in which the motors run. The actual entity for each primitive depends on the system of play used, which we elaborate next.

Fig. 1.8. A general SENSE-DECIDE-ACT block diagram for robot soccer (blocks: DECIDE; ACT, comprising Intelligent Control with Posture Generation and Control, and Actuation; Motors; Encoders; SENSE; REAL WORLD. Signals: selected actions, desired postures of objects, desired wheel velocities, PWM signals, wheel velocities, motor rotations, status of actions, postures of objects, field of view)
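The timing constraint stated in Section 1.5, that the processing cycle (sensing, deciding, controlling and communicating) must not exceed the image sampling period, can be checked with a few lines of arithmetic. The following is a minimal sketch only: the class `CycleBudget`, the function names and all stage durations are hypothetical and are not taken from the experimental system.

```python
from dataclasses import dataclass


@dataclass
class CycleBudget:
    """Time spent in each stage of one processing cycle, in milliseconds.

    All field names and values are illustrative assumptions.
    """
    sense_ms: float
    decide_ms: float
    control_ms: float
    comm_ms: float

    def total_ms(self) -> float:
        # The cycle time is the sum of sensing, deciding, controlling
        # and communicating.
        return self.sense_ms + self.decide_ms + self.control_ms + self.comm_ms


def frame_period_ms(frame_rate_hz: float) -> float:
    """Sampling period of the camera for a given frame rate."""
    return 1000.0 / frame_rate_hz


def cycle_fits(budget: CycleBudget, frame_rate_hz: float) -> bool:
    """True if no image frame is missed, i.e. cycle time <= sampling period."""
    return budget.total_ms() <= frame_period_ms(frame_rate_hz)


if __name__ == "__main__":
    budget = CycleBudget(sense_ms=10.0, decide_ms=8.0, control_ms=5.0, comm_ms=5.0)
    for rate in (30.0, 60.0):
        print(rate, "Hz:", "ok" if cycle_fits(budget, rate) else "frames missed")
```

With these assumed durations the cycle takes 28 ms, which fits a 30 Hz sampling rate but not a 60 Hz one; a faster sampling rate leaves correspondingly less time for decision-making.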
1.6 Classification of MiroSoT Robot Soccer Systems
The MiroSoT set-up supports three systems of play, namely, command-based robot system, action-based robot system and intelligence-based robot system. Each system has a centralized (software) component for (visual) sensing [S], and is characterized by whether the other two key components of deciding [D] and acting [A] are centralized in the host computer or distributed in the team robots. Here, intelligent control [C] and actuation [M] - the subcategories of primitive A - are used to help characterize or classify the robot soccer systems. The following describes the system classification in some detail, and also discusses the relative advantages and disadvantages of the systems.
1.6.1 Command-Based Robots
In the command-based robot system, the S-D-C components are centralized in the host computer; only the M component is distributed in the (hardware of the) team robots. In other words, the core intelligence (due to D and A:C) of the system resides in the host computer, as depicted in Fig. 1.9. In this system, each robot is similar to an RC-car (radio-frequency control car), but is under intelligent control resident in the host computer. Only a one-way wireless communication link is required, to transmit actuation commands from the host computer to the team robots. Of the three systems of play, this system is the most economical one to build; no visual sensor is mounted onboard the robots; the major efforts are focussed on writing software programs for strategic cooperation (and communication) amongst the robots. But it is also the one with the heaviest computational load in the host computer. However, increasingly, this is becoming a non-critical issue because of the availability of low-cost but powerful personal computers. This system should be preferred by many who have some knowledge of multi-agent systems and computer vision, and whose primary aim is to learn the basics of robot soccer or participate and win in a major tournament.

Fig. 1.9. Command-based robot soccer system (figure label: Command)
1.6.2 Action-Based Robots
In the action-based robot system, the S-D components are resident in the host computer but the A component is distributed in the team robots. In other words, the core intelligence is split between the D component in the host computer and the A:C subcomponent distributed in the robots, as depicted in Fig. 1.10. In the command-based robot system, the control functions reside in the host computer. In this system, they are embedded in each robot, thereby reducing the host computer’s computational load. As in the command-based robot system, only a one-way wireless communication link is required, but is used instead to transmit visual information or action commands (i.e., selected actions to take) from the host computer to the team robots. This system is analogous to a human instructing a well-trained dog. Here, the human instructor plays the role of the host computer and the dog plays the role of the robot. The human only needs to give instructions without having to know the detailed movements of the dog. This makes it easier for the instructor since the dog knows how to act autonomously, such as avoiding collision with any obstacle. This system should be favoured by those whose research emphasis is on robot control programming to build soccer robots individually capable of executing a selected action autonomously.

Fig. 1.10. Action-based robot soccer system (figure label: Action)
1.6.3 Intelligence-Based Robots
In the intelligence-based robot system, only the S component remains centralized in the host computer; the D-A components are distributed in the team robots. In other words, the core intelligence of the system is distributed in the robots, as depicted in Fig. 1.11. The computational load in this system is the most distributed and well-balanced in the individual robots and the host computer, with no computational load heavily concentrated in any one hardware entity. In principle, therefore, this system is the most scalable in terms of the number of team robots. It is also the closest to the notion of an intelligent multi-agent robotic system. Not only is a one-way wireless communication link required to transmit visual information from the host computer to the team robots; two-way wireless communication links are also needed to enable cooperative exchange of strategic information among the team robots. But many research issues remain open on how the team robots can effectively communicate and cooperate to achieve their ultimate objective. As a result, such robots are currently difficult, if not impossible, to build. This system is a good infrastructure for researchers who intend to use robot soccer as a test bed to develop or improve the techniques of multi-agent systems and distributed agent communication for real world applications, but under the assumption that the operating environment is accessible, as provided for by the overhead vision camera.

Fig. 1.11. Intelligence-based robot soccer system (figure label: Posture)
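The classification of Section 1.6 amounts to a small lookup table recording, for each system of play, whether the S, D, C and M components sit in the host computer or in the robots. The sketch below encodes that table as described in the text; the names `SYSTEMS`, `host_components` and `needs_robot_to_robot_links` are illustrative assumptions, and tying the robot-to-robot links to a distributed D component is one reading of Section 1.6.3 rather than a statement from it.

```python
HOST, ROBOT = "host", "robot"

# Placement of SENSE [S], DECIDE [D], Control [C] and Actuation [M]
# for the three systems of play described in Section 1.6.
SYSTEMS = {
    "command-based":      {"S": HOST, "D": HOST,  "C": HOST,  "M": ROBOT},
    "action-based":       {"S": HOST, "D": HOST,  "C": ROBOT, "M": ROBOT},
    "intelligence-based": {"S": HOST, "D": ROBOT, "C": ROBOT, "M": ROBOT},
}


def host_components(system: str) -> list:
    """Components centralized in the host computer, in S-D-C-M order."""
    placement = SYSTEMS[system]
    return [c for c in "SDCM" if placement[c] == HOST]


def needs_robot_to_robot_links(system: str) -> bool:
    """Assumption: two-way links among robots are needed once D is distributed."""
    return SYSTEMS[system]["D"] == ROBOT


if __name__ == "__main__":
    for name in SYSTEMS:
        print(name, host_components(name), needs_robot_to_robot_links(name))
```

Reading down the table shows the progression: each system moves one more component from the host computer to the robots, which is why the host's computational load falls from the command-based to the intelligence-based system.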
1.7 Purpose of This Book
This book presents course material that discusses in detail the concept, design and construction of appropriate vision algorithms (SENSE), strategies (DECIDE), micro-robots (the hardware for sensing and movement) and control and actuation algorithms (ACT) devised for a command-based robot soccer system for MiroSoT. The cooperative strategies include the robots organizing themselves in formations, engaging in zone defence and continually switching specific roles to achieve their common objective. The intelligent control techniques to carry out some designed actions include novel navigation methods of a robot manoeuvring towards a desired posture, without colliding with any other robot. The general architectures, techniques and tools available for developing a robot soccer system are described and their applicability to the various aspects of robot soccer demonstrated as far as possible. The ‘C’ source code of an experimental (command-based) robot soccer program (for MiroSoT) is available for download from the FIRA website http://www.fira.net. The details of all major algorithms implemented in the robot soccer program are presented throughout the book to enable students to learn and build their own robot soccer systems from scratch, and up to competitive standards. A reference manual for the experimental system is given in Appendix B; the information provided therein should compress the learning curve in the practical design and development of a MiroSoT robot soccer system.
Notes on Selected References
The textbook [2] presents an excellent introduction to AI robotics under the SENSE-DECIDE-ACT paradigm. For an advanced treatment of the same material, refer to the book [3]. The DECIDE primitive is referred to as the PLAN primitive in these two books. A special column, The Robot Competition Corner, of the International Journal of Robotics and Autonomous Systems, Elsevier Science, publishes information about the robot soccer competitions organized by FIRA since 1996, their interesting highlights and results, among those of other competitions. A special session, Entertainment Robotics, of the 1997 IEEE International Conference on Robotics and Automation, presented a list of papers [4, 5, 6, 7] reporting early research efforts on robot soccer. Two special journal issues contain selected papers from workshop participants of the inaugural event MiroSoT’96 [8] and MiroSoT’97 [9]. A survey on multi-agent systems from the robot soccer perspective is presented in [10]. Two books, written in Korean and published under the auspices of FIRA, provide a comprehensive introduction to robot soccer systems [11] and cover the engineering know-how of building MiroSoT systems [12]. Material on the benchmark tests for the fundamental motion of robot soccer is taken from [13].
2. Robot Soccer System: Hardware and Firmware Components
2.1 Introduction
In a robot soccer game, individual team robots must be capable of communicating and moving in real time. That a soccer robot could receive and execute various motion commands, driving itself into proper postures at the right times, is crucial to the success of physical team coordination. Such real-time capabilities need to be enabled by the hardware and firmware for a soccer robot, integrated in a suitable robot architecture. This chapter presents the necessary background on the mechanical motion and basic architecture of a mobile robot. On the former, some mechanical structures for robot mobility via ‘rolling’ are examined; the kinematics of a differential-drive (two-wheel) robot is then presented. On the latter, the essentials of electronic hardware and firmware needed to build a differential-drive robot of a MiroSoT system are covered in sufficient detail.
2.2 Mobile Robots
A mobile robot is defined as one capable of locomotion on a surface solely through the actuation of a movement mechanism on which the robot is mounted, and which is in contact with the surface. This definition encompasses every robot equipped with a movement mechanism, including a wheel-based robot, a six-legged walking robot that assumes the shape of an insect, and a two-legged robot that assumes the shape of a human.

2.2.1 Mechanical Movement Mechanisms
The characteristic movements exhibited by a robot depend on the design of the mechanical structure for its movement mechanism. It is therefore important to choose the right type of structure for a robot’s movement mechanism in a given application. Here, we concentrate only on the different types of ‘move by rolling’ mechanism for a robot, as shown in Fig. 2.1.

Fig. 2.1. Different types of ‘move by rolling’ mechanism for a mobile robot: (a) Wheel, (b) Caterpillar, (c) Omni-directional

Fig. 2.1(a) shows a commonly used differential-drive wheel-based mechanism. A robot on this mechanism can turn smoothly. However, the robot tends to slip easily because of the small contact areas its wheels make with the floor surface. Fig. 2.1(b) shows a caterpillar-based mechanism. A robot on this mechanism can move in a straight line easily, but is unable to negotiate sharp bends. Fig. 2.1(c) shows a 3-wheel mechanism, with the rim of each wheel lined with small bar rollers placed parallel to the rotation axis. Unlike the previous two mechanisms, a robot on this mechanism can move freely in any direction with proper rotation of the three wheels. A robot with this characteristic mechanism is called an omni-directional mobile robot. Such a movement mechanism is, however, difficult to construct due to its structural complexity. The mechanisms shown in Fig. 2.1(a) and Fig. 2.1(b) are the most commonly used in general, and are also popular with designers of soccer robots for the FIRA tournament series (except HuroSoT).

J.-H. Kim, D.-H. Kim, Y.-J. Kim, K.-T. Seow: Soccer Robotics, STAR 11, pp. 27-70, 2004. © Springer-Verlag Berlin Heidelberg 2004

Wheel Mobile Robots. In this section, we analyze how different wheel assembly designs for a wheel mobile robot can affect the turning ability of a robot that has a rigid body. A wheel assembly is a movement mechanism or device which provides or allows motion between its mounted robot and the surface, on which each wheel is intended to have a single point of rolling contact. The mechanical flexibility (to move and turn) determines if the robot is capable of attaining an intended posture, depicted in Fig. 2.2 and defined by P as follows:

\[
P = \begin{bmatrix} x \\ y \\ \theta \end{bmatrix}, \tag{2.1}
\]
where (x, y) is the coordinate position of the robot’s centre in the Cartesian X1-Y1 frame, and θ, called its heading angle, is the angle of orientation of the robot in the X1-Y1 frame, defined by the angle the robot’s heading in the X2-direction makes with the X1-axis. By convention, the heading angle increases counter-clockwise.
Fig. 2.2. The robot’s posture (axes: X1, Y1 and X2, Y2; heading angle θ)
In order not to clutter the analysis, some simplifying assumptions are made: The navigation floor of the robot (i.e., the floor on which the robot moves) is a flat horizontal surface; (the plane surface of) each wheel of the robot is perpendicular to the floor. The condition of pure rolling is also assumed. Pure rolling refers to rolling without slipping in any direction, including the direction of motion. Conceptually speaking, a slip is said to occur if in the consecutive positions of a moving wheel-based robot, a point on the circumference of each wheel comes into contact with more than one point on the floor. Most wheels, with the exception of the Swedish wheel, satisfy this condition within reasonable tolerances. A Swedish wheel is designed specially with embedded ball rollers in its rim to admit the freedom of moving in any direction, including that perpendicular to the plane surface of the wheel.
To explain more concisely and without ambiguity, the following terminology needs to be defined:
• A wheel’s surface normal refers to the imaginary line that is perpendicular to (the cross section of) the wheel surface, and passes through the centre of the wheel surface.
• A robot’s instantaneous centre of rotation (ICR) refers to the intersection point of the surface normals of any two wheels of the robot. That it is instantaneous suggests that the ICR can shift as the robot moves.
• A robot’s unique ICR refers to the common ICR at which the surface normals of all the robot’s wheels intersect.
• A robot’s unique ICR-axis refers to the line that is perpendicular to the navigation floor and passes through the robot’s unique ICR.

For a robot with two or more wheels, more than one ICR may exist, but the robot can turn provided a unique ICR exists (with respect to its wheel assembly). This states the mechanical principle on which a wheel-based robot turns. The unique ICR-axis is the vertical line about which the robot can turn. Clearly, the further the unique ICR is from the robot, the less pronounced is the curvature of the turn. We now analyze four robots with different wheel assemblies, as shown in Fig. 2.3.
Fig. 2.3. Different wheel assemblies resulting in different ICR-axes
In Fig. 2.3(a) is a robot with a unique ICR. The robot’s wheel assembly is similar to that of an automobile; the robot can turn left about the unique ICR-axis which exists on its left-hand side. Fig. 2.3(b) shows a robot on two parallel wheels, with the centres of these wheels aligned, resulting in a common surface normal. For this wheel assembly, the unique ICR can exist at any point on this line, and is determined (marked off on the line) by the turning speed induced by the difference in the velocities of the two wheels.
Fig. 2.3(c) shows a robot with no unique ICR. Any robot with this wheel assembly cannot move in any direction and always stays ‘locked’ in any position it is placed. Fig. 2.3(d) shows an assembly similar to that of Fig. 2.3(b) but with the wheels’ centres not aligned. The unique ICR in this case is, in principle, located at infinity since the two parallel wheel surface normals are deemed to intersect there. This implies that the robot cannot turn, but can only move forward. When building a more compact robot (e.g. a NaroSoT robot), the wheel assembly shown in Fig. 2.3(d) may be chosen. Robot turning is possible in practice because the two wheels attached to the same mechanical body always create some slip. The slip is due to the ‘pull and push’ that results whenever the wheels move at different velocities. But since the left side is not symmetrical to the right side, an analysis to determine the unique ICR is difficult.

2.2.2 Kinematics of a Two-Wheel Robot
We now analyze the kinematics of a MiroSoT soccer robot that moves on a two-wheel mechanism as shown in Fig. 2.3(b). Robot kinematics refers to a robot in motion, analyzed in terms of the mathematical relations between the robot’s position and its wheels’ velocities, without reference to force and mass. Consider a MiroSoT soccer robot as shown in Fig. 2.4. If the rotational velocities of the left and the right wheels are ωL and ωR respectively, then assuming no slipping of the wheels, the wheel velocities at the respective contact points are

\[
V_L = r\omega_L, \qquad V_R = r\omega_R, \tag{2.2}
\]

where r is the radius of the wheel. Let ν be the velocity of the robot at its centre and ω be the turning velocity of the robot (about its unique ICR-axis). Then, ν, ω, ωR and ωL have the following relationship:

\[
\nu = \frac{V_L + V_R}{2} = r\,\frac{\omega_L + \omega_R}{2}; \qquad
\omega = \frac{V_R - V_L}{L} = r\,\frac{\omega_R - \omega_L}{L}, \tag{2.3}
\]

where L is the distance between the two wheels. [x y θ]^T ¹ and [ν ω]^T have the following relationship:

\[
\dot{P} = \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{bmatrix}
= \begin{bmatrix} \cos\theta & 0 \\ \sin\theta & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \nu \\ \omega \end{bmatrix}. \tag{2.4}
\]

¹ For notational convenience, we also denote it by (x, y, θ).

Fig. 2.4. Kinematics of a robot

Eq. (2.4) is the kinematics equation of the robot, where

\[
U = \begin{bmatrix} \nu \\ \omega \end{bmatrix} \tag{2.5}
\]

is the control vector input. Along the line orthogonal to the plane of the two parallel wheels (i.e., the line passing through the centres of these two centrally-aligned wheels), the resultant component velocity is given by

\[
H \cdot \dot{P} = \begin{bmatrix} \sin\theta & -\cos\theta \end{bmatrix}
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}
= \dot{x}\sin\theta - \dot{y}\cos\theta, \tag{2.6}
\]

where H is the unit vector orthogonal to the plane of the wheels. Obviously, there is no motion along this line, hence:

\[
H \cdot \dot{P} = 0. \tag{2.7}
\]

Eq. (2.7) is called the nonholonomic constraint of the robot. It can be rewritten as:

\[
\tan\theta = \frac{\dot{y}}{\dot{x}}. \tag{2.8}
\]

Now, Eq. (2.4) has 3 component equations but only two input control variables ν and ω (obtainable from Eq. (2.3), given the related wheel velocities VR and VL). The nonholonomic constraint is not an independent equation as it can be obtained by combining the first two component equations of Eq. (2.4). This explains why, in general, no control solution is guaranteed to move the robot from a given posture (x, y, θ) to a desired posture (x′, y′, θ′).
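Eqs. (2.2) to (2.7) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the wheel radius and wheel separation used in the usage lines are assumed values, not the dimensions of an actual MiroSoT robot.

```python
import math


def body_velocities(omega_l, omega_r, r, wheel_sep):
    """Eqs. (2.2) and (2.3): wheel rotational velocities -> (nu, omega).

    r is the wheel radius and wheel_sep is the distance L between the wheels.
    """
    v_l = r * omega_l                    # Eq. (2.2)
    v_r = r * omega_r
    nu = (v_l + v_r) / 2.0               # Eq. (2.3), velocity at the centre
    omega = (v_r - v_l) / wheel_sep      # Eq. (2.3), turning velocity
    return nu, omega


def posture_rate(theta, nu, omega):
    """Eq. (2.4): time derivative (x_dot, y_dot, theta_dot) of the posture."""
    return nu * math.cos(theta), nu * math.sin(theta), omega


def nonholonomic_residual(theta, x_dot, y_dot):
    """Left-hand side of Eq. (2.7); zero when there is no sideways motion."""
    return x_dot * math.sin(theta) - y_dot * math.cos(theta)


if __name__ == "__main__":
    # Assumed values: r = 0.02 m, L = 0.07 m, wheel speeds in rad/s.
    nu, omega = body_velocities(10.0, 14.0, r=0.02, wheel_sep=0.07)
    x_dot, y_dot, theta_dot = posture_rate(0.7, nu, omega)
    print(nu, omega, nonholonomic_residual(0.7, x_dot, y_dot))
```

Whatever wheel velocities are applied, the residual of Eq. (2.7) stays at zero, which is the code-level view of the constraint: the two inputs can never produce a velocity component along the wheel axis.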
Fig. 2.5. Computation of the robot’s unique ICR (labels: ICR, R, R1, R2, L, VL, VR)
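Instantaneous Turning Radius. Consider the two-wheel robot shown in Fig. 2.5. As this robot has a rigid body, both wheels turn about the ICR with the same turning velocity ω, so that

\[
V_L = R_1\,\omega, \qquad V_R = R_2\,\omega. \tag{2.9}
\]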
Since L is the distance between the two wheels, it is easy to deduce from Fig. 2.5 that:

\[
R_1 = R - \frac{L}{2}; \qquad R_2 = R + \frac{L}{2}, \tag{2.10}
\]
where R is the turning radius of the robot. Substituting R1, R2 of Eq. (2.10) into Eq. (2.9) and eliminating ω, we obtain the following formula for turning radius R:

\[
R = \frac{L}{2}\left(\frac{V_L + V_R}{V_R - V_L}\right). \tag{2.11}
\]
The robot moves in a straight line if VL = VR, implying from Eq. (2.11) that R → ∞. It turns about its own centre if R = 0, implying, from Eq. (2.11), that VL = −VR.
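The turning-radius formula of Eq. (2.11) and its two limiting cases can be captured in a short function. This is a minimal sketch: the function name and the numerical values are assumptions for illustration, and equal wheel velocities are mapped to an infinite radius to avoid a division by zero.

```python
import math


def turning_radius(v_l, v_r, wheel_sep):
    """Eq. (2.11): R = (L / 2) * (V_L + V_R) / (V_R - V_L).

    wheel_sep is the distance L between the two wheels. Equal wheel
    velocities give a straight line, represented here as R = infinity.
    """
    if v_l == v_r:
        return math.inf
    return (wheel_sep / 2.0) * (v_l + v_r) / (v_r - v_l)


if __name__ == "__main__":
    L = 0.08  # assumed wheel separation in metres
    print(turning_radius(1.0, 1.0, L))   # straight line
    print(turning_radius(-1.0, 1.0, L))  # spin about the robot's own centre
    print(turning_radius(1.0, 3.0, L))   # curved path
```

As written, the sign of R follows the sign of VR − VL when the robot moves forward, so it matches the sign of ω in Eq. (2.3); callers who only need the curvature magnitude can take the absolute value.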
2.2.3 Basic Motion Control: A Circular Path Analysis
Consider the two-wheel robot shown in Fig. 2.6. Suppose we want the robot to move from one position to another along a circular path. To analyze circular motion control, it is convenient to denote the relative coordinate positions of the robot by (R, 0) and (R, ϕ), which specify its current and desired coordinate positions respectively. ϕ is called the robot’s turning angle about its unique ICR-axis.
Fig. 2.6. Circular path and angle of turning (labels: path length D, turning radius R, turning angle ϕ)
The robot is assumed to be stationary at (R, 0), say at time t0 . In applying the wheel (rotational) velocity set (ωR , ωL ) (an equivalent to control input [ν ω]T by Eq. (2.3)), the robot’s motor-driven wheels will first accelerate to attain and then cruise at these applied velocities, before decelerating to a halt at the desired (R, ϕ). This gives rise to a velocity profile which, for a simple analysis, is reasonably approximated by a piece-wise linear function, as shown in Fig. 2.7. Based on this profile, the length of the circular path D is computed with reference to Eq. (2.3), as follows:
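\[
\begin{aligned}
D &= \int_{t_0}^{t_3} \nu \, dt \\
  &= \int_{t_0}^{t_3} \frac{V_L + V_R}{2} \, dt \\
  &= r \int_{t_0}^{t_3} \frac{\omega_L + \omega_R}{2} \, dt \\
  &= \frac{1}{2}\, r \left( \int_{t_0}^{t_3} \omega_L \, dt + \int_{t_0}^{t_3} \omega_R \, dt \right) \\
  &= \frac{1}{4}\, r \,(\omega_L + \omega_R)\,(t_3 - t_0 + t_2 - t_1).
\end{aligned}
\tag{2.12}
\]

Fig. 2.7. Rotational velocity profile of the two wheels: (a) Left Wheel, (b) Right Wheel (vertical axis: ω, reaching ωL and ωR respectively; horizontal axis: Time, marked at t0, t1, t2, t3)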
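By Eqs. (2.2) and (2.11), the turning radius R is given by:

\[
R = \frac{L}{2}\left(\frac{V_L + V_R}{V_R - V_L}\right)
  = \frac{L}{2}\left(\frac{\omega_L + \omega_R}{\omega_R - \omega_L}\right). \tag{2.13}
\]

Hence, the desired turning angle ϕ (in radians) is determined by:

\[
\varphi = \frac{D}{R} = \frac{r}{2L}\,(\omega_R - \omega_L)\,(t_3 - t_0 + t_2 - t_1). \tag{2.14}
\]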
Note that only when the ratio of the two wheel velocities is constant, such as if t0 = t1 and t2 = t3, will the robot move in a perfectly circular path. As a final remark, circular motion control is fundamental in some navigation methods such as a recently proposed limit-cycle method; the essence of this method comes from the observation that when a robot is moving towards a specified position, it can avoid colliding with any obstacle by turning clockwise or counter-clockwise around the obstacle. More will be said of this method in subsequent chapters.
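The closed-form results of Eqs. (2.12) to (2.14) for the piece-wise linear (trapezoidal) velocity profile can be evaluated numerically as a quick check. The sketch below uses hypothetical function names and assumed values for the wheel radius, wheel separation, cruise velocities and switching times.

```python
import math


def profile_factor(t0, t1, t2, t3):
    """The common factor (t3 - t0 + t2 - t1) of Eqs. (2.12) and (2.14)."""
    return (t3 - t0) + (t2 - t1)


def path_length(omega_l, omega_r, r, times):
    """Eq. (2.12): D = (1/4) r (omega_L + omega_R) (t3 - t0 + t2 - t1)."""
    return 0.25 * r * (omega_l + omega_r) * profile_factor(*times)


def turning_angle(omega_l, omega_r, r, wheel_sep, times):
    """Eq. (2.14): phi = r / (2 L) (omega_R - omega_L) (t3 - t0 + t2 - t1)."""
    return r / (2.0 * wheel_sep) * (omega_r - omega_l) * profile_factor(*times)


def radius_from_wheel_rates(omega_l, omega_r, wheel_sep):
    """Eq. (2.13), assuming omega_R != omega_L."""
    return (wheel_sep / 2.0) * (omega_l + omega_r) / (omega_r - omega_l)


if __name__ == "__main__":
    # Assumed values: r = 0.02 m, L = 0.08 m, cruise speeds in rad/s,
    # switching times (t0, t1, t2, t3) in seconds.
    times = (0.0, 0.5, 1.5, 2.0)
    d = path_length(10.0, 20.0, 0.02, times)
    phi = turning_angle(10.0, 20.0, 0.02, 0.08, times)
    radius = radius_from_wheel_rates(10.0, 20.0, 0.08)
    print(d, phi, radius, math.isclose(phi, d / radius))
```

The final comparison confirms that the three equations agree with one another, i.e. that ϕ = D/R holds for the chosen values.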
2.3 A Two-Wheel Command-Based Soccer Robot
Fig. 2.8 shows the hardware architecture of a two-wheel command-based soccer robot. The essential components of the architecture are
1. a microcontroller (with supporting logic devices and associated circuitry),
2. two motors (with gears and wheels) and two motor drivers,
3. a wireless communication (receiver) module and a power system.
For educational purposes, we omit inner-loop control (see Fig. 1.8) that would have required encoders (and encoder counters) for feedback sensing needed for determining the robot’s current wheel velocities. Fig. 2.9 shows three views of a physical robot built using this hardware architecture. The robot designed comprises two printed circuit boards (PCBs), a rechargeable battery, two small DC motors (including gears and wheels) and a mechanical housing frame. Mounted on the bottom PCB are the power regulator and the IC chip for driving the motors. Mounted on the top PCB are the microcontroller, the wireless communication module and the associated circuitry. The communication module is a receiver that establishes a one-way link with a transmitter at the host computer. The rechargeable battery supplies power to the motor-driving and other onboard logic circuitry. The (gear box accompanying the) motor used for this robot design has a gear ratio of 130 : 1 (motor rotations : wheel rotations).

Fig. 2.8. Hardware architecture of a soccer robot

Fig. 2.9. Hardware of a soccer robot

2.3.1 Microcontroller
This section focusses on the microcontroller which is the main processing unit in the robot hardware. We shall discuss the following aspects of a microcontroller:
• basic functions,
• supporting architecture,
• data and address buses,
• chip selector,
• clock and
• interfacing.
Finally, we briefly describe a microcontroller chip, and elaborate the use of the microcontroller’s on-chip PWM (pulse width modulation) for actuating the motors, and its on-chip serial communication interface for receiving data.

Basic Functions. There are 4 basic functions in a microcontroller, namely reset, instruction fetch, instruction execution and service interrupts.
1. Reset: This function sets the program counter (PC) to the starting address held in the reset vector. You may think of an address as a pointer to a memory location. The PC holds the physical address of the next instruction to be read from memory after the current instruction is executed. The reset function also initializes the other internal registers to default values.
2. Instruction Fetch: The number of machine operations required for a single instruction fetch varies, depending on the type of microcontroller. In general, fetching an instruction is divided into 4 basic operations. First, the function writes the current PC value into the memory address register (MAR), and hence onto the address bus. Second, it sends out a read control signal on the control bus, upon which the requested data from the addressed memory is output on the data bus. Third, it checks the memory buffer register (MBR) to see if the requested data has been latched into it. Fourth, it reads the data from the MBR. Each machine operation takes up one timing state, with a duration of 2 or 3 clock pulses.
3. Instruction Execution: This function stores the instructions read from MBR (as done by the instruction fetch function) in the instruction register (IR). It then decodes and executes the opcode contained in the IR. An opcode is a machine code executable by the processor in the microcontroller to carry out a corresponding instruction.
4. Service Interrupts: This function checks for interrupt requests. An interrupt request occurs if a higher priority task arrives. Upon such a request, it suspends the running program by storing the program’s states (current memory address held in PC and other data) in a stack and then setting the PC with the data in the interrupt vector; this data is the starting physical address of the interrupt handling routine. After the interrupt handling routine has been executed, this function restores the suspended program’s states from the stack, which enables the instruction fetch and execution functions to continue on the original program.
The cycle in which the basic functions except service interrupts are carried out in a microcontroller is shown in Fig. 2.10.

Fig. 2.10. Normal operational sequence of a microcontroller (Reset: internal registers’ values initialized; then Instruction Fetch; then Instruction Execution: Decode, Execute; then back to Instruction Fetch)
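The four basic functions described above can be mimicked by a toy software model. The following is a sketch under loud assumptions: the class `ToyMicrocontroller`, its five-opcode instruction set (LOAD, ADD, OUT, RETI, HALT), the tuple-based memory and the single accumulator are all invented for illustration and do not correspond to any real microcontroller; the reset and interrupt vectors are passed in directly as addresses, and the return from the handler is modelled by a RETI opcode.

```python
class ToyMicrocontroller:
    """Toy model of reset, instruction fetch, execution and interrupt service."""

    def __init__(self, memory, reset_vector, interrupt_vector):
        self.memory = memory                  # list of (opcode, operand) pairs
        self.reset_vector = reset_vector      # start address of the program
        self.interrupt_vector = interrupt_vector  # start address of the handler
        self.reset()

    def reset(self):
        # Reset: PC takes the start address; other registers get defaults.
        self.pc = self.reset_vector
        self.mar = self.mbr = self.ir = None
        self.acc = 0
        self.stack = []
        self.output = []
        self.halted = False

    def fetch(self):
        # Instruction fetch: PC -> MAR, addressed memory -> MBR, PC advances.
        self.mar = self.pc
        self.mbr = self.memory[self.mar]
        self.pc += 1

    def execute(self):
        # Instruction execution: MBR -> IR, then decode and execute.
        self.ir = self.mbr
        opcode, operand = self.ir
        if opcode == "LOAD":
            self.acc = operand
        elif opcode == "ADD":
            self.acc += operand
        elif opcode == "OUT":
            self.output.append(self.acc)
        elif opcode == "RETI":
            # Restore the suspended program's states from the stack.
            self.pc, self.acc = self.stack.pop()
        elif opcode == "HALT":
            self.halted = True
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))

    def service_interrupt(self):
        # Save program states on the stack, then jump to the handler.
        self.stack.append((self.pc, self.acc))
        self.pc = self.interrupt_vector

    def run(self, interrupt_at_step=None):
        self.reset()
        step = 0
        while not self.halted:
            if step == interrupt_at_step:
                self.service_interrupt()
            self.fetch()
            self.execute()
            step += 1
        return self.output


# Main program at addresses 0-3, interrupt handling routine at addresses 4-6.
PROGRAM = [
    ("LOAD", 1), ("ADD", 2), ("OUT", None), ("HALT", None),
    ("LOAD", 99), ("OUT", None), ("RETI", None),
]

if __name__ == "__main__":
    mc = ToyMicrocontroller(PROGRAM, reset_vector=0, interrupt_vector=4)
    print(mc.run())                     # no interrupt requested
    print(mc.run(interrupt_at_step=1))  # interrupt after the first instruction
```

In the second run the handler's output appears first and the main program then resumes with its accumulator intact, which is the behaviour the save-and-restore of program states is meant to guarantee.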
The sequence in the microcontroller’s handling of an interrupt request is shown in Fig. 2.11.

Fig. 2.11. Handling an interrupt request (sequence: Requests interrupt service; Stores all program states in stack; Gets start address of interrupt handling routine from interrupt vector; Executes interrupt handling routine; Restores all program states from the stack, and continues with program execution)

Supporting Architecture. Fig. 2.12 shows the basic architecture set up to support the functionality of a microcontroller. The main integrated circuit (IC) chips and buses (interconnections in which the electrical signals flow) are shown. Here, it suffices to know that an IC chip is a microelectronic semiconductor device consisting of many interconnected transistors and other components. As the name implies, the data bus carries data bits, and the address bus carries address bits. Each bit is a digital signal representing binary logic 0 or 1. Typically, a microcontroller has an 8-bit data bus (D7-D0) and a 16-bit address bus (A15-A0), with D0 and A0 by convention referring to the least significant bits, and D7 and A15 referring to the most significant bits of the respective buses. The byte A15-A8 is called the upper-byte address and the byte A7-A0 is called the lower-byte address.

Fig. 2.12. Supporting architecture for a microcontroller (blocks: Microcontroller, ROM, RAM, Chip selector; connections: Data Bus, Address Bus, Read/Write)
The ROM (read-only memory) is used to store programs (the ‘firmware’), and the RAM (random-access memory) to store data. To read from and write to the memories, the microcontroller needs to generate the read and write control signals respectively. These control and address signals ‘direct’ the chip selector accordingly: to read, the chip selector enables either the RAM or the ROM, so that data or an instruction can be read from the addressed memory location; to write, the chip selector enables the RAM, so that data can be written into the addressed memory location. The chip selector and associated logic circuitry are designed to enable only one memory chip at a time. In practice, we select the ROM and RAM such that the total memory size is twice that estimated for an application program and data. The ROM, RAM and chip selector are essential devices in a microcontroller architecture. The microcontroller chip may or may not have all these devices built-in. The 89C52 IC chip is an example of a microcontroller that has an internal 8-Kbyte ROM but no chip selector. The 80C196 IC chip is an example of one that has neither a ROM nor a chip selector. In selecting a microcontroller, due consideration must be given to the extra peripherals and associated logic circuits that may need to be added externally.
Address and Data Buses. Typically, a microcontroller with an 8-bit data bus and a 16-bit address bus is compactly designed with its lower-byte address A7-A0 multiplexed with the 8-bit data D7-D0. In other words, the 8-bit data and the lower-byte address share the same bus, labelled AD0-AD7 in Fig. 2.13. This design is possible because, during instruction fetch, the address bits are put on the buses A15-A8 and AD7-AD0 first, before the data bits are output on the lower-byte bus AD7-AD0. The lower-byte address bits are on the bus AD7-AD0 only during the active ALE (Address Latch Enable) signal generated by the microcontroller chip. Thus, this ALE signal can be employed to separate the data and address signals, using the address latch scheme shown in Fig. 2.13.
[Figure: circuit diagram with the labels Microcontroller (lines AD0-AD7, A8-A15 and ALE), 8-bit Latch (outputs A0-A7), Data Bus and Address Bus]
Fig. 2.13. Scheme separating address and data signals
To elaborate, in this scheme, when the ALE signal goes active, the 8-bit latch is enabled such that its output signals, labelled A7-A0, ‘follow’ the signals on the bus AD7-AD0, which at that time carries the lower-byte address. When the ALE signal turns inactive thereafter, the latch output holds this lower-byte address; during this time, the upper-byte address remains on the bus A15-A8, and the data bits are put on the AD7-AD0 bus. The data signals are thus separated from the address signals that ‘point’ at the memory location which the data is being read from or written into.
Chip Selector. As mentioned, a chip selector enables only one device at any instant, and disables all the other devices connected to it. Depending on the microcontroller used, a chip selector need not be added externally to the microcontroller board. Microcontrollers such as the AM188 and 80296 chips do have a chip selector built in.
[Figure: block diagram. Chip selector with inputs Control Signal and Address, and output Chip Select Signals CS1 to CS8]
Fig. 2.14. Input-output of a chip selector
Fig. 2.14 shows the inputs and outputs of a chip selector. As shown, the chip selector receives at its inputs control signals such as Read/Write (RD/WR) and the address signals, usually the upper byte A15-A8, from the microcontroller, and generates chip-select (CS) digital signals at its outputs. Each CS output is connected to exactly one device chip. So, for the architecture in Fig. 2.12, two CS signals are needed, one each for the RAM and ROM. A CS output signal can be made to go active (in order to enable the connected device) according to a logic CS equation expressed in terms of the logic variables for the inputs. An example of a CS equation is shown in Fig. 2.15. The CS output signal is said to be active-low if it is functional (with respect to exclusive chip select) at logic 0, and active-high if it is functional at logic 1.
!CNT1_CS = !A15 & A14 & A13 & A12 & A11 & A10 & A9 & !A8 & !RD;
Required levels of A15 to A8, in order: 0 1 1 1 1 1 1 0 (with RD at logic 0)
Active-low = 7E00H~7EFFH
Note: Symbol ‘!’ denotes the ‘NOT’ logic operator and ‘&’ the ‘AND’ operator.
Fig. 2.15. An example CS equation
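The CS equation of Fig. 2.15 can be checked by evaluating it over the whole 16-bit address space. The following is a minimal sketch; the function names are hypothetical, and it assumes, as the !RD term implies, that RD is at logic 0 while a read is in progress.

```python
def cnt1_cs(address, rd):
    """Level of the active-low CNT1_CS output (0 = device enabled).

    address: 16-bit address A15-A0; rd: level of the RD signal
    (assumed to be 0 during a read).
    """
    a = [(address >> i) & 1 for i in range(16)]   # a[15] is bit A15
    not_cs = (not a[15] and a[14] and a[13] and a[12] and a[11]
              and a[10] and a[9] and not a[8] and not rd)
    return 0 if not_cs else 1


def active_range():
    """First address, last address and count for which CNT1_CS is active."""
    hits = [addr for addr in range(0x10000) if cnt1_cs(addr, rd=0) == 0]
    return hits[0], hits[-1], len(hits)
```

Evaluating `active_range()` gives the block 7E00H to 7EFFH (256 addresses), which is the range stated in the figure; with RD at logic 1 the output stays inactive for every address.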
To ensure that only one CS signal is active at any instant, the logical ‘AND’ing of any two CS equations must always be at logic 0. As a design principle, the CS equation, being expressed partly in terms of the upper-byte address bits A15-A8, should define a unique upper-byte address or a unique set of upper-byte addresses that enables the connected device such as a ROM, RAM, PPI (programmable peripheral interface) or counter, and thus uniquely situates the device in a block of full addresses A15-A0 in the system (memory) address map (0000H-FFFFH). An example of an address map ‘partitioned’ by a set of CS equations is given in Fig. 2.16.
[Figure: system address map. ROM from 0000H; PPI chip-select from 7D00H; Counter1 chip-select from 7E00H; Counter2 chip-select from 7F00H to 7FFFH; RAM up to FFFFH]
Equations:
!RAM_CS = A15;
!PPI_CS = !A15 & A14 & A13 & A12 & A11 & A10 & !A9 & A8;
!CNT1_CS = !A15 & A14 & A13 & A12 & A11 & A10 & A9 & !A8 & !RD;
!CNT2_CS = !A15 & A14 & A13 & A12 & A11 & A10 & A9 & A8 & !RD;
!ROM_CS = !A15 & PPI_CS & CNT1_CS & CNT2_CS;
Fig. 2.16. A system address map and corresponding CS equations
The set of CS equations has to be designed and programmed to configure the internal hardware logic of the chip selector. Generally speaking, a decoder may be used as a chip selector but programmable array logic (PAL) and generic array logic (GAL) are preferred because they are programmable and physically more compact. Note however that a PAL can be programmed only once, whereas a GAL is re-programmable. Programming methods for PAL and GAL may differ depending on the compilers used.
Clock. The operations in a microcontroller are synchronized by running digital signals known as the clock. As a result, the processing speed of a microcontroller depends on the frequency of the clock. There are two methods to generate clock signals, as shown in Fig. 2.17.
[Figure: two circuits. Left: a crystal with capacitors C1 and C2 connected to the microcontroller pins XTAL1 and XTAL2, with GND. Right: an external oscillator signal connected to XTAL1, with XTAL2 marked NC, and GND]
Fig. 2.17. Selecting the clock frequency
The first method is a circuit (shown on the left-hand side) that consists of a crystal used to drive the internal oscillation circuit. The frequency of the crystal determines the clock speed. The second method (shown on the right-hand side) uses an external oscillator as the microcontroller’s clock; this method is used when a clock is needed to synchronize the operational timings of two or more IC chips.
Interfacing. To add a peripheral device chip, the following aspects need to be considered:
• the number of address and data bits required,
• whether the device’s chip select signal (CS) is active-high or active-low,
• whether the device’s reset signal is generated by software or hardware.
[Figure: device chip with the connections Data Bus, Address Bus (A0, A1), RD, WR, CS, Reset and Output]
Fig. 2.18. Device interfacing
Consider the example shown in Fig. 2.18. This example shows a device chip interfaced with an 8-bit data bus, a 2-bit address, two control signals RD and WR, the reset signal and a chip select signal CS. The CS signal is from the chip selector; the reset signal, used to initialize the device, and all the other signals are from the microcontroller. As a note of precaution, care must be taken when interfacing with analog devices because most analog devices tend to draw large instantaneous currents. Drawing a large instantaneous current from a digital device could damage the device. It is highly advisable to use only the driver (logic) circuits recommended by the manufacturer for interfacing analog devices such as DC motors.
Microcontroller PIC16C73/73A. A microcontroller chip is typically a central-processing unit (CPU) integrated with modules for motion control and communication purposes. The PIC16C73/73A microcontroller is one such chip; the pin configuration is shown in Fig. 2.19. The PIC16C73/73A device has 192 bytes of RAM and 22 I/O pins. In addition, several peripheral features are available, including: three timers/counters, two Capture/Compare/PWM (CCP) modules, two (serial) Universal Synchronous Asynchronous Receiver Transmitter (USART) modules and a 5-channel high-speed 8-bit A/D converter. The USART module is also known as the Serial Communication Interface (SCI). Appendix A elaborates on the use of its CCP and USART modules for motion control and communication, respectively.
OSC1/CLKIN: Oscillator crystal input/external clock source input.
OSC2/CLKOUT: Oscillator crystal output.
MCLR: Master clear (reset) input. This pin is active low to reset the device.
RA0-RA5: PORTA is a bi-directional I/O port. It can also be used for analog input.
RB0-RB7: PORTB is a bi-directional I/O port. PORTB can be software programmed for internal weak pull-up on all inputs.
RC0-RC7: PORTC is a bi-directional I/O port. RC1 can also be configured as the CCP2 output pin, RC2 as the CCP1 output pin, RC6 as the USART asynchronous transmit pin and RC7 as the USART asynchronous receive pin.
VSS: Ground reference for logic and I/O pins.
VDD: Positive supply for logic and I/O pins.
Fig. 2.19. Pin Configuration of PIC16C73/73A microcontroller
2.3.2 DC Motors and Auxiliary Components
Broadly speaking, the hardware of a soccer robot comprises a mechanical and an electrical subsystem. Motors are the main components of the mechanical subsystem. DC (Direct Current) motors are commonly used in soccer robots because they are smaller and cheaper than stepper motors. DC motors, however, usually need additional components for good motion control performance. Besides the DC motors, motor drivers and wheels, the DC motor subsystem is usually also equipped with gears and encoders (with encoder counters). Encoder pulse signals, measuring the motor rotation angle, are fed back and used to calculate the motor (and hence the wheel) velocities, and the gear boxes are used to increase the output torque of the DC motors (needed to rotate the wheels). Motor driver chips are needed to increase the current output of the PWM signals (generated by the microcontroller) to drive the DC motors.
To build a compact soccer robot, one can generally reduce the size of the printed circuit board by using SMD (Surface Mount Device) components and through optimal layout of circuits. However, not many options are available when it comes to adding DC motors to the robot hardware. This is because a DC motor usually has to be integrated as a mechanical ‘package’ comprising the motor, a gearbox and an encoder, but individual components that are compatible with one another are available in only a few fixed sizes. Fig. 2.20 shows an exploded view of such a DC motor ‘package’.
[Figure: exploded view with the labels Encoder, Motor and Gear box]
Fig. 2.20. A DC motor ‘package’ and its exploded view
Working Principle of a DC Motor. A rotor and a stator constitute a DC motor. The stator consists of two permanent bar magnets with opposite polarity facing each other, creating a magnetic field B in between, as depicted in Fig. 2.21. A current I flowing through a commutator brush (not shown) to the rotor of length l will result in opposite forces F exerted on it in the directions indicated in Fig. 2.21, and governed by
$$F = B \cdot I \cdot l. \qquad (2.15)$$
[Figure: rotor of length l carrying current I between the N and S stator magnets, with opposite forces F acting on the rotor]
Fig. 2.21. Working principle of a DC motor
The direction of the force F follows Fleming’s left-hand rule: Stretch out the thumb, first and second fingers of your left hand to be mutually perpendicular; then, with the first Finger pointing in the direction of the magnetic Field and the seCond finger in the direction of the Current, the Thumb is pointing in the direction of the Thrust or force. The opposite forces F rotate the rotor that ‘cuts’ the magnetic field continually, inducing, according to Faraday’s law, a back-emf (see footnote 2) in the rotor, which can be shown to be given by
$$E = k_{bemf} \cdot \omega_M, \qquad (2.16)$$
where E: back emf, kbemf: back emf constant, ωM: rotational velocity of (motor) rotor.
Hence, by Kirchhoff’s voltage law,
$$V = E + I \cdot r_a, \qquad (2.17)$$
where V: voltage applied, I: rotor current, ra: armature resistance.
V is the voltage of the amplified PWM signals output by the motor driver; the motor driving circuit will be discussed in Section 2.3.3. Finally, the torque generated, i.e., the ‘angular’ force due to the opposite forces F about the (rotor’s) axis of rotation, and the current flow I are related by
$$(\tau_M + \tau_r) = k_T \cdot I, \qquad (2.18)$$
where τM: motor torque, τr: frictional torque, kT: torque constant.
Footnote 2: The electromotive force (abbreviated ‘emf’) is an archaic term for an induced electric potential (i.e., an induced voltage).
The parameters of Eqs. (2.16) to (2.18) are documented in the manufacturers’ catalogues on DC motors.
Torque. Following the combination of Eqs. (2.16) and (2.17) and Eq. (2.18), we get two graphs of rotational velocity ωM versus motor torque τM and I versus τM, as shown in Fig. 2.22.
[Figure: two plots. Current I (mA) versus torque τM (mNm); rotational velocity ωM (rpm) versus torque τM (mNm)]
Fig. 2.22. Rotational velocity and current versus torque
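To make the combination explicit, the algebra can be sketched as follows, assuming the applied voltage V is held constant; no symbols beyond those of Eqs. (2.16) to (2.18) are introduced.

```latex
% From Eq. (2.18): the current is linear in the motor torque.
I = \frac{\tau_M + \tau_r}{k_T}
% From Eqs. (2.16) and (2.17): the back emf is what remains of V.
k_{bemf} \cdot \omega_M = V - I \cdot r_a
% Substituting the first line into the second:
\omega_M = \frac{1}{k_{bemf}} \left( V - \frac{r_a \cdot \tau_r}{k_T} \right)
           - \frac{r_a}{k_{bemf} \cdot k_T} \cdot \tau_M
```

Both relations are straight lines in τM: the current rises with slope 1/kT, while the rotational velocity falls with slope ra/(kbemf · kT).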
The graphs show that there is a tradeoff between motor torque and velocity; increasing the drive current I leads to an increase in the motor torque τM , but a decrease in the motor velocity ωM , and conversely. Most soccer robots move on two wheels. In order to drive the wheels, a force FG is needed against a frictional force Fr , as depicted in Fig. 2.23.
[Figure: wheel of radius r on a wheel shaft of radius ro, with the driving force FG and the frictional force Fr]
Fig. 2.23. Torque about the wheel axis
The formulae for the wheel-driving force FG and frictional force Fr are as follows:
$$F_r = \mu \cdot m \cdot g, \qquad F_G = F_r \cdot \frac{r_o}{r}, \qquad (2.19)$$
where µ: frictional constant, m: mass of robot, g: gravitational acceleration, r: radius of the wheel, ro: radius of the shaft.
The torque τM generated by the motor and the torque τG applied to the wheel (attached to it) are related by
$$\frac{\tau_M}{\tau_G} = \frac{1}{N \cdot \eta_G}, \qquad (2.20)$$
where N : 1 is the gear ratio and ηG is the gear (‘torque-transfer’) efficiency.
Now, it can be shown that
$$\tau_G = \frac{1}{2} \cdot F_G \cdot r = \frac{1}{2} \cdot \mu \cdot m \cdot g \cdot r_o. \qquad (2.21)$$
Thus, substituting Eq. (2.21) into Eq. (2.20), we get
$$\tau_M = \frac{1}{2} \left( \frac{\mu \cdot m \cdot g}{N \cdot \eta_G} \right) \cdot r_o. \qquad (2.22)$$
Eq. (2.22) shows that the motor torque τM is directly proportional to the radius ro of the wheel shaft.
Power Consumption. It is important that the power supplied by the batteries can last through one half of the game since it is convenient to change batteries only during the half-time interval. The battery power that is consumed by a DC motor is the product of the voltage V across the motor and the current I flowing through it. Rewriting Eq. (2.18), the current I is given by
$$I = \frac{\tau_M + \tau_r}{k_T}. \qquad (2.23)$$
Define Io by
$$I_o = \frac{\tau_r}{k_T}. \qquad (2.24)$$
Then Eq. (2.23) becomes
$$I = \frac{\tau_M}{k_T} + I_o. \qquad (2.25)$$
Io is known as the no-load current as I = Io when there is no load, i.e., τM = 0. Some DC motor catalogues do not specify the τr values; in such cases, the no-load current Io can be easily obtained (by measurement) for Eq. (2.25).
Linear and Rotational Velocity. The gear ratio N : 1 relates the velocities of the motor and the wheel as follows:
$$\frac{\nu_G}{\nu_M} = \frac{\omega_G}{\omega_M} = \frac{1}{N}, \qquad (2.26)$$
where νM: motor linear velocity, ωM: motor rotational velocity, νG: wheel linear velocity, ωG: wheel rotational velocity.
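Combining Eqs. (2.16) and (2.17) and rearranging, the rotational velocity ωM in rotations per minute (rpm) is given by
$$\omega_M = \frac{1}{k_{bemf}} \cdot (V - I \cdot r_a) \ \text{rpm}. \qquad (2.27)$$
Since ωM is in rpm and not rad/s, the wheel (linear) velocity νG in cm/s is given by
$$\nu_G = r \cdot \frac{2 \cdot \pi \cdot \omega_G}{60} = \left( \frac{2 \cdot \pi \cdot r}{60} \right) \cdot \omega_G \ \text{cm/s}, \qquad (2.28)$$
where r in cm is the radius of the wheel. Substituting Eq. (2.26) into Eq. (2.28), we have
$$\nu_G = \left( \frac{2 \cdot \pi \cdot r}{60} \right) \cdot \omega_G = \left( \frac{2 \cdot \pi \cdot r}{60 N} \right) \cdot \omega_M \ \text{cm/s}. \qquad (2.29)$$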
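Eqs. (2.27) to (2.29) translate directly into a few helper functions. The sketch below uses hypothetical function names and assumes the units used in the text: kbemf in V per 1000 rpm (as motor datasheets commonly quote it), the wheel radius in cm and all rotational velocities in rpm.

```python
import math


def motor_speed_rpm(v, i, r_a, k_bemf_v_per_krpm):
    """Eq. (2.27): motor speed in rpm from voltage (V), current (A), resistance (ohm)."""
    return 1000.0 * (v - i * r_a) / k_bemf_v_per_krpm


def wheel_speed_cm_s(omega_g_rpm, r_cm):
    """Eq. (2.28): wheel linear velocity in cm/s from wheel speed in rpm."""
    return (2.0 * math.pi * r_cm / 60.0) * omega_g_rpm


def wheel_speed_from_motor_cm_s(omega_m_rpm, r_cm, gear_ratio_n):
    """Eq. (2.29): wheel linear velocity in cm/s from motor speed in rpm."""
    return wheel_speed_cm_s(omega_m_rpm / gear_ratio_n, r_cm)
```

For instance, with an illustrative wheel radius of 2 cm and a 27:1 gear, a motor turning at 2700 rpm drives the wheel at 100 rpm, i.e. roughly 21 cm/s.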
Size. As explained, DC motors and their compatible gearboxes and encoders are individually available, but in only a few fixed sizes. As a result, placing the motors, gearboxes and encoders within a confined cubic space of 7.5cm × 7.5cm × 7.5cm for a MiroSoT robot becomes a challenging mechanical design problem. No systematic approach exists, and this design is, at best, done with a lot of engineering ingenuity. Fig. 2.24 shows two sample designs that meet the size specifications for MiroSoT. The design on the left-hand side uses tendons (timing belts or chains) wrapped around a pair of wheels on each side; the design on the right-hand side uses gears to connect the gearbox (or gear head) to the encoder on each side.
Fig. 2.24. Two mechanical designs of motor-wheel assembly
How to Select a Motor Package: An Example. Consider a two-wheel MiroSoT robot with the following features.
1. 12V - 550mA power supplied by NiMH batteries.
2. Electronic circuitry draws a maximum current of 300mA. This means that each DC motor to be selected can draw a maximum current of (250/2)mA or 125mA.
3. Wheel radius r = 2cm, wheel shaft radius ro = 0.25cm.
4. Mass of robot m = 0.4kg.
In this example, we want to select two (identical) motor packages for the robot, such as the one shown in Fig. 2.20. We assume that we have already selected two identical encoders, each with a length of 10mm. Suppose the maximum (linear) wheel velocity νG|max desired is at least 1m/s. Then
$$\omega_G|_{max} \ge \frac{60 \cdot 100\,\text{cm/s}}{2 \cdot \pi \cdot 2\,\text{cm}} \approx 480\,\text{rpm}.$$
Assume that the floor’s frictional constant µ is 0.43. Then the output torque τG|min that a motor (with gear) must at least produce is
$$\tau_G|_{min} = \frac{1}{2} \cdot \mu \cdot m \cdot g \cdot r_o = \frac{1}{2} \cdot 0.43 \cdot 0.4\,\text{kg} \cdot 9.8\,\text{m/s}^2 \cdot 0.0025\,\text{m} = 0.0021\,\text{Nm} = 2.1\,\text{mNm}.$$
Finally, assume that the mechanical design of the robot can accommodate two identical motor packages, each having a total length not exceeding 60mm; in other words, the length Lm of each package minus its encoder must not exceed 50mm. By the values stated or determined above, each identical motor (plus gear) to be selected needs to meet the following requirements.
Criterion | Requirement
Power consumption P | ≤ 12V, ≤ 125mA
Rotational velocity ωG|max | ≥ 480rpm
Torque τG | ≥ 2.1mNm
Length Lm | ≤ 50mm
Let’s consider Escap’s 16C18-205 DC motor and B16C27 gearbox, which have the characteristics shown in Table 2.1.
Table 2.1. Some motor and gear characteristics (source: datasheets)
Characteristic | Unit | Value
Measuring Voltage | V | 12
No-load Speed | rpm | 17300
Stall Torque | mNm | 1.2
Average No-load Current Io | mA | 9
Typical Starting Voltage | V | 0.15
Max. Continuous Current | A | 0.16
Max. Continuous Torque | mNm | 0.96
Max. Angular Acceleration | 10^3 rad/s^2 | 59
Back-emf Constant kbemf | V/1000rpm | 0.66
Torque Constant kT | mNm/A | 6.3
Terminal Resistance ra | ohm | 65
(a) 16C18-205 DC Motor
Characteristic | Value
Ratio N | 27
Efficiency ηG | 0.73
Length (with 16C18) Lm | 33.7mm
Mass m | 6g
(b) B16C27 Gearbox
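Taking the robot data of this example together with the datasheet values of Table 2.1, the selection check can be sketched in a few lines. The function and variable names are hypothetical; the formulas are Eqs. (2.20), (2.21), (2.25), (2.26) and (2.27). Because the sketch carries unrounded intermediate values, its results differ slightly from hand calculations that round at each step.

```python
import math


def motor_package_check():
    # Robot data from the example.
    v_supply = 12.0                     # V
    i_budget_ma = (550.0 - 300.0) / 2   # mA available per motor
    r_cm, r_o_m = 2.0, 0.0025           # wheel radius (cm), shaft radius (m)
    mass_kg, mu, g = 0.4, 0.43, 9.8
    v_wheel_cm_s = 100.0                # desired maximum wheel velocity
    max_len_mm = 50.0

    # Datasheet values from Table 2.1.
    k_bemf = 0.66      # V per 1000 rpm
    k_t = 6.3          # mNm/A
    r_a = 65.0         # ohm
    i_o_ma = 9.0       # no-load current, mA
    tau_cont = 0.96    # max. continuous torque, mNm
    i_cont_ma = 160.0  # max. continuous current, mA
    n, eta = 27, 0.73  # gear ratio and efficiency
    len_mm = 33.7

    # Requirements.
    omega_g_req = 60.0 * v_wheel_cm_s / (2.0 * math.pi * r_cm)   # rpm
    tau_g_req = 0.5 * mu * mass_kg * g * r_o_m * 1000.0          # mNm, Eq. (2.21)

    # Candidate motor and gearbox.
    tau_m = tau_g_req / (n * eta)                    # mNm, Eq. (2.20)
    i_ma = tau_m / k_t * 1000.0 + i_o_ma             # mA, Eq. (2.25)
    omega_m = 1000.0 * (v_supply - i_ma / 1000.0 * r_a) / k_bemf   # rpm, Eq. (2.27)
    omega_g = omega_m / n                            # rpm, Eq. (2.26)

    checks = {
        "torque": tau_m <= tau_cont,
        "current": i_ma <= min(i_budget_ma, i_cont_ma),
        "speed": omega_g >= omega_g_req,
        "length": len_mm <= max_len_mm,
    }
    return {"omega_g_req": omega_g_req, "tau_g_req": tau_g_req, "tau_m": tau_m,
            "i_ma": i_ma, "omega_m": omega_m, "omega_g": omega_g, "checks": checks}
```

All four checks pass for this motor and gearbox combination; the computed motor torque is about 0.107 mNm, the current about 26 mA and the wheel speed about 579 rpm.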
Using a B16C27 gearbox, the minimum motor torque τM|min required is
$$\tau_M|_{min} = \frac{\tau_G|_{min}}{N \cdot \eta_G} = \frac{2.1\,\text{mNm}}{27 \cdot 0.73} = 0.1061\,\text{mNm}.$$
The 16C18-205 DC motor can generate a maximum continuous torque of 0.96mNm, which is greater than the required torque of 0.1061mNm. The required torque is also much less than the stall torque of 1.2mNm, defined as the minimum torque at or beyond which the motor will stall. Thus, in this aspect of torque generation, the 16C18-205 DC motor is a good choice. Using a B16C27 gearbox, the minimum current I|min required to produce τM|min is
$$I|_{min} = \frac{\tau_M|_{min}}{k_T} + I_o = \frac{0.1061\,\text{mNm}}{6.3\,\text{mNm/A}} + 9\,\text{mA} = 25.8\,\text{mA}.$$
This current of 25.8mA is much less than the maximum of 125mA that can be supplied, and much less than the maximum continuous current of 160mA allowed. Thus, in this aspect of current drive, the 16C18-205 DC motor is a good choice. With this 16C18-205 DC motor, the maximum motor rotational velocity ωM|max is
$$\omega_M|_{max} = \frac{1}{k_{bemf}} \cdot (V|_{max} - I|_{min} \cdot r_a) = \frac{1}{0.66\,\text{V/1000rpm}} \cdot (12\,\text{V} - 25.8\,\text{mA} \cdot 65\,\text{ohm}) = 15641\,\text{rpm}.$$
The corresponding maximum wheel rotational velocity ωG|max using a B16C27 gearbox is
$$\omega_G|_{max} = \frac{\omega_M|_{max}}{N} = \frac{15641\,\text{rpm}}{27} = 579\,\text{rpm}.$$
This means that the wheels can rotate at up to 579rpm, which exceeds the required maximum of 480rpm. By the selection criteria, the combination of a 16C18-205 DC motor and a B16C27 gearbox is an acceptable choice for the MiroSoT soccer robot.
2.3.3 Motor Driving and Circuits
The PWM (Pulse Width Modulation) method is one of the many methods used to drive DC motors. Referring to Fig. A.1, the amplified PWM signal V output by the motor driver can rotate (the rotor of) the DC motor at a velocity ωM that is proportional to the applied voltage V, according to Eq. (2.27). Why is a motor driver needed? It serves two purposes, as explained below.
1. For Current Amplification. Eq. (2.18) shows that the motor torque τM is proportional to the current drive I. Referring to Fig. A.1, the PWM signal produced by the microcontroller has a low current output. Without amplification, this current is too weak to generate a sufficient torque τM to rotate the motor. The motor driver amplifies this signal to produce one with a stronger current drive for this purpose.
2. For Enabling Clockwise and Counter-clockwise Rotation. Consider the basic circuit of a motor driver, connected to a DC motor as shown in Fig. 2.25. The basic circuit is an H-bridge circuit, with 4 transistors acting as switches. Motor driver IC chips such as the L293 and L298 from SGS-Thomson contain two such H-bridges per chip. Some typical amplified PWM signal waveforms V output at the terminals of the DC motor by this circuit are shown in Fig. 2.26. To produce the output waveforms shown, the H-bridge circuit ‘switches’ the direction of the instantaneous current flow I in the motor according to the logic level of the PWM signals input from the microcontroller. The following explains how it works: Referring to Fig. 2.27(a), when the periodic PWM signal goes to logic 1, SW1 and SW3 are turned on, and SW2 and SW4 are turned off; the result is a current flow I in the direction indicated. The opposite occurs when the PWM signal goes to logic 0, as illustrated in Fig. 2.27(b). Hence the waveforms shown in Fig. 2.26.
[Figure: circuit diagram. Switches SW1 to SW4 between Vcc and GND with the motor M (terminals + and -) across the bridge; the PWM logic signals drive the switches, with a NOT gate in the signal path]
Fig. 2.25. H-Bridge circuit for motor driving
[Figure: three waveforms of V against time, each switching between Vp and -Vp with the average voltage marked. STOP: average voltage zero; CLOCKWISE: average voltage positive; COUNTER-CLOCKWISE: average voltage negative]
Fig. 2.26. Amplified PWM signals with different duty cycles
[Figure: two H-bridge states between Vcc and GND. (a) PWM Signal at Logic 1: SW1 = ON, SW2 = OFF, SW3 = ON, SW4 = OFF, current flow I through the motor M. (b) PWM Signal at Logic 0: SW1 = OFF, SW2 = ON, SW3 = OFF, SW4 = ON, current flow -I]
Fig. 2.27. PWM-based operations of H-bridge circuit
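The switching behaviour of Fig. 2.27 can be summarised in a small model. This is a sketch with hypothetical names; it idealises the switches (no voltage drop, instantaneous switching) and represents the PWM signal as a list of equally spaced logic samples.

```python
def h_bridge_state(pwm_level):
    """Switch states and current direction (+1 or -1) for one PWM logic level."""
    if pwm_level == 1:
        return {"SW1": True, "SW2": False, "SW3": True, "SW4": False, "current": +1}
    return {"SW1": False, "SW2": True, "SW3": False, "SW4": True, "current": -1}


def average_voltage(pwm_samples, v_p):
    """Mean motor terminal voltage over equally spaced PWM logic samples."""
    total = sum(h_bridge_state(level)["current"] * v_p for level in pwm_samples)
    return total / len(pwm_samples)
```

With samples `[1, 1, 1, 0]` (logic 1 for three quarters of the period) and Vp = 12 V, the average voltage is +6 V; with `[1, 0]` it is zero, which corresponds to the STOP waveform of Fig. 2.26.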
2.3.4 Velocity and Duty Cycle
Referring to Fig. 2.26 again, the average voltage V̄ of the amplified PWM signal V is
$$\bar{V} = \frac{1}{T_{PWM}} \int_{nT_{PWM}}^{(n+1)T_{PWM}} V \, dt = \frac{1}{T_{PWM}} \cdot (T_{ON} - T_{OFF}) \cdot V_p = (2d - 1) \cdot V_p, \qquad (2.30)$$
for an arbitrary n ≥ 0, where d = TON/TPWM, 0 ≤ d ≤ 1, is the duty cycle of the PWM signal. Note that ideally, Vp = Vcc. This average voltage V̄ applied across the motor, with an average current Ī flowing through it, determines the resultant DC motor rotational velocity ω̄M according to the following formula.
$$\bar{\omega}_M = \frac{1}{k_{bemf}} \cdot (\bar{V} - \bar{I} \cdot r_a) \ \text{rpm}. \qquad (2.31)$$
This formula follows directly from Eq. (2.27). Since |V̄| >> |Ī · ra|, V̄ (dominantly) determines, by its magnitude, how fast the DC motor rotates, and by its sign (+ or -), determines its direction of rotation (clockwise or counter-clockwise, respectively). Substituting Eq. (2.30) into Eq. (2.31), we obtain
$$\bar{\omega}_M = \frac{1}{k_{bemf}} \cdot \left( (2d - 1) \cdot V_p - \bar{I} \cdot r_a \right) \ \text{rpm} \approx \frac{1}{k_{bemf}} \cdot (2d - 1) \cdot V_p \ \text{rpm}. \qquad (2.32)$$
Hence the resultant DC motor velocity (and rotational direction, as indicated by its sign) can be altered by changing the duty cycle d of the input PWM signal, as illustrated in Fig. 2.26. By combining Eqs. (2.26) and (2.32) and rewriting, we have
$$\bar{\omega}_G \approx \omega_G|_{max} \cdot (2d - 1), \quad \text{where } \omega_G|_{max} = \frac{V_p}{N \cdot k_{bemf}}. \qquad (2.33)$$
Hence the resultant DC motor-driven velocity (and rotational direction, as indicated by its sign) can be altered by changing the duty cycle d of the input PWM signal.
Practical Implementation of Motor Actuation: An Example using PIC16C73/73A Microcontroller. Consider an example of motor actuation using the PIC16C73/73A microcontroller introduced in Section A.1. Then, by setting the integer values (PR2) and W as binary numbers in register PR2 and CCPRxL:CCPxCON<5:4> of Eqs. (A.1) and (A.2) respectively, we get
$$d = \frac{T_{ON}}{T_{PWM}} = \frac{W}{4A}, \quad \text{where } A = (PR2) + 1. \qquad (2.34)$$
With 0 ≤ d ≤ 1, and by substituting Eq. (2.34) into Eq. (2.33), we obtain
$$\bar{\omega}_G = \omega_G|_{max} \cdot \left( \frac{W}{2A} - 1 \right) \ \text{rpm}, \quad \text{where } 0 \le W \le 4A. \qquad (2.35)$$
By Eq. (2.35), the ‘delimiting values’ of ω̄G (in rpm) follow.
$$\bar{\omega}_G = \begin{cases} -\omega_G|_{max} & \text{if } W = 0, \\ 0 & \text{if } W = 2A, \\ \omega_G|_{max} & \text{if } W = 4A. \end{cases}$$
The value of A is fixed a priori in the robot’s microcontroller program. Thus, wheel actuation at a dynamically changing desired velocity is reduced to computing and recomputing the velocity data W to generate the required pulse width of the PWM signals that drive the corresponding motor. Strictly speaking, for a given integer value of A, Eq. (2.35) holds only for those values of ω̄G for which the values of the integer variable W exist. So, a more general equation that holds for all values of ω̄G should have the error ΔW included, as follows:
$$\bar{\omega}_G = \omega_G|_{max} \cdot \left( \frac{W + \Delta W}{2A} - 1 \right) \ \text{rpm}, \quad \text{where } 0 \le W \le 4A. \qquad (2.36)$$
Rewriting Eq. (2.36), we get
$$\underbrace{\bar{\omega}_G}_{\text{specified}} = \underbrace{\omega_G|_{max} \cdot \left( \frac{W}{2A} - 1 \right)}_{\text{actual}} + \underbrace{\omega_G|_{max} \cdot \frac{\Delta W}{2A}}_{\text{error}} \ \text{rpm}. \qquad (2.37)$$
For a specified velocity ω̄G, the integer value of W should be determined with minimum error |ΔW| ≤ 0.5 (ideally |ΔW| = 0). Therefore, for a more precise arbitrary velocity setting, the value of A should be fixed as large as possible (but up to 2^8, as allowed by the 8-bit register PR2 plus 1). This will provide a more finely divided denominator range over which the integer W determined using Eq. (2.37) has a higher probability of corresponding more closely (if not exactly) to the specified velocity ω̄G.
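The mapping between the PWM data W, the duty cycle and the wheel velocity can be sketched as below. The function names are hypothetical, and the quantization error is written as `dw` here; the numerical values used to exercise the functions (A = 256 and a maximum wheel velocity of 579 rpm) are illustrative only.

```python
def duty_cycle(w, a):
    """Eq. (2.34): duty cycle d for PWM data W and A = (PR2) + 1."""
    return w / (4 * a)


def wheel_speed_rpm(w, a, omega_g_max):
    """Eq. (2.35): wheel velocity in rpm produced by integer PWM data W."""
    return omega_g_max * (w / (2 * a) - 1)


def nearest_w(omega_target, a, omega_g_max):
    """Integer W in [0, 4A] closest to the target velocity, and its error dw.

    W + dw is the real-valued solution of Eq. (2.36) for the target velocity.
    """
    w_exact = 2 * a * (omega_target / omega_g_max + 1)
    w = min(max(round(w_exact), 0), 4 * a)
    return w, w_exact - w
```

For a target of 100 rpm with A = 256 and a maximum of 579 rpm, the exact solution is about 600.43, so W = 600 and the error is about 0.43, which by Eq. (2.37) corresponds to a velocity error of roughly 0.5 rpm. A smaller A would make each step of W correspond to a larger velocity increment.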
Dead Zone and Saturation. There is a dead zone range of ±zd, where zd ≥ 0, around W = 2A, within which the rotational velocity ω̄G is 0. Motor saturation occurs at |ω̄G| = ωsat ≤ ωG|max. Both phenomena are due to the inherent motor characteristics. Incorporating these into Eq. (2.36), we have a piece-wise function for ω̄G (in rpm) as follows:
$$\bar{\omega}_G = \begin{cases} \min\left( \omega_G|_{max} \cdot \left( \dfrac{W^{+} - z_d}{2A} - 1 \right),\ \omega_{sat} \right) & \text{if } W^{+} \ge (2A + z_d), \\ 0 & \text{if } |W^{+} - 2A| \le z_d, \\ \max\left( \omega_G|_{max} \cdot \left( \dfrac{W^{+} + z_d}{2A} - 1 \right),\ -\omega_{sat} \right) & \text{if } W^{+} \le (2A - z_d); \end{cases} \qquad (2.38)$$
where W⁺ = W + ΔW (ΔW being the error term of Eq. (2.36)) and −zd ≤ W⁺ ≤ 4A + zd. The actual values of zd and ωsat depend on the motor used. Equivalently, we have a piece-wise function for W⁺ as follows:
$$W^{+} = \begin{cases} \min\left( 2A \cdot \left( \dfrac{\bar{\omega}_G}{\omega_G|_{max}} + 1 \right) + z_d,\ 2A + W^{+}_{sat} \right) & \text{if } \bar{\omega}_G > 0, \\ 2A & \text{if } \bar{\omega}_G = 0, \\ \max\left( 2A \cdot \left( \dfrac{\bar{\omega}_G}{\omega_G|_{max}} + 1 \right) - z_d,\ 2A - W^{+}_{sat} \right) & \text{if } \bar{\omega}_G < 0; \end{cases} \qquad (2.39)$$
$$\text{where } W^{+}_{sat} = 2A \cdot \frac{\omega_{sat}}{\omega_G|_{max}} + z_d.$$
The graph for Eq. (2.38) or Eq. (2.39) is shown in Fig. 2.28. The host computer program for a command-based robot soccer system needs to compute W⁺ using Eq. (2.39) and send W as wheel velocity data to a respective team robot for its motor actuation.
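A host-side computation along the lines of Eqs. (2.38) and (2.39) might look as follows. This is a sketch under stated assumptions: the function names are hypothetical, the real-valued PWM data is written `w_plus`, and the motor parameters used to exercise it (A = 256, maximum 600 rpm, dead zone 10, saturation 500 rpm) are made-up illustrative values, not measured ones.

```python
def wheel_speed_with_limits(w_plus, a, omega_g_max, z_d, omega_sat):
    """Eq. (2.38): wheel velocity (rpm) including dead zone and saturation."""
    if w_plus >= 2 * a + z_d:
        return min(omega_g_max * ((w_plus - z_d) / (2 * a) - 1), omega_sat)
    if w_plus <= 2 * a - z_d:
        return max(omega_g_max * ((w_plus + z_d) / (2 * a) - 1), -omega_sat)
    return 0.0   # inside the dead zone


def pwm_data_for_speed(omega, a, omega_g_max, z_d, omega_sat):
    """Eq. (2.39): real-valued PWM data needed for a desired wheel velocity."""
    w_sat = 2 * a * omega_sat / omega_g_max + z_d
    if omega > 0:
        return min(2 * a * (omega / omega_g_max + 1) + z_d, 2 * a + w_sat)
    if omega < 0:
        return max(2 * a * (omega / omega_g_max + 1) - z_d, 2 * a - w_sat)
    return float(2 * a)
```

With the illustrative parameters above, a request for 300 rpm yields 778, and feeding 778 back through Eq. (2.38) returns 300 rpm; a request beyond saturation is clipped so that the resulting velocity is 500 rpm. The integer W actually transmitted would be this value rounded, as discussed for Eq. (2.37).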
[Figure: graph of the wheel rotational velocity ω̄G (vertical axis, between −ωG|max and ωG|max) against the PWM data W⁺ (horizontal axis)]
Note: Only the range [2A − W⁺sat, 2A + W⁺sat] of W⁺ is valid.
Fig. 2.28. Graph showing the relationship between the wheel rotational velocity ω̄G and the PWM data W⁺ required to attain it
2.3.5 Communication
In this section, we discuss two communication means and the associated methods. One method uses radio frequency (RF) while the other uses infrared (IR) light. These methods are applicable to our purpose of sending actuation command data such as the desired velocity data from the host computer to the individual team robots.
IR is an alternative medium frequently used in the remote switching of TV channels and many other consumer electronic products. IR supports communication with directivity; hence communication can be easily localized to a targeted area. IR transmitter/receiver circuits are simpler and available on an IC chip that is usually smaller than an RF communication module. But IR only supports short distance communication and is sensitive to fluctuations in environmental lighting.
RF is a medium of communication that supports long distance and multichannel communication, with no directivity (i.e., there is no need to position and direct (or ‘point’) the transmitters at the receivers). But building an RF communication module is generally difficult because RF transmitter/receiver circuits are quite complex and require a certain degree of engineering knowhow. In building the communication module for a robot soccer system, a commercially available RF module is preferred. In employing RF communication, the selected carrier frequency should preferably not fall within or near the (carrier) frequency bands allocated for commercial use (e.g. cellular phone and pager). This is to avoid possible interference or ‘jamming’ of signal transmissions.
Associated with each communication method is a set of communication protocols; a protocol is a standard set of rules that determines how computer-based and robotic agents communicate with one another across the medium. When these agents communicate with one another, they exchange a series of messages. To understand and act on these messages, the agents must agree on what a message means. A protocol has its rules described in terms of the format that a message must take and the systematic procedure in which agents must exchange messages within the context of a particular activity, such as sending and receiving messages across the medium. The message exchange procedure attempts to ensure the electronic messages are correctly formatted and transmitted from the originating agent to the destination agent. Agents of different types are able to communicate with one another on a certain activity - in spite of their differences - when they agree to use an appropriate communication protocol that offers a standard format and message exchange procedure.
IR: Communication Circuit and Protocol. To use IR as the means of communication, two methods are available. One is the ASK (Amplitude Shift Keying) method and the other is the base band method. These methods follow the standards set by the Infrared Data Association (IrDA, http://www.irda.org/). IrDA is an international non-profit organization that creates and promotes interoperable, low cost infrared data interconnection standards. In the ASK method, data is transmitted on a carrier signal, and in the base band method (IrDA1.0), it is transmitted by switching the transmitter on and off. Fig. 2.29 illustrates how the serial data is transmitted using these methods.
Fig. 2.29. IR communication using ASK and IrDA1.0 methods
Referring to Fig. 2.29, when the data bit is logic 0, the ASK method generates and transmits the corresponding digital signals at a frequency of 500kHz; when the data bit is logic 1, it does not generate any signal. In the base band (IrDA 1.0) method, when the data bit is logic 0, the method generates a corresponding pulse for the first 3/16 of the one bit time $T_b$; when the data bit is logic 1, it does not generate any pulse. In the rest of this section, we focus on the base band (IrDA1.0) method.
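The base band rule just described can be sketched in a few lines of Python. This is only an illustrative model, not the behaviour of any particular encoder chip: it assumes that each bit time is divided into 16 equal slots, and the names `SLOTS_PER_BIT`, `encode_bits` and `decode_levels` are hypothetical.

```python
SLOTS_PER_BIT = 16   # assumption: one bit time Tb is modelled as 16 equal slots
PULSE_SLOTS = 3      # a logic 0 is sent as a pulse over the first 3/16 of Tb


def encode_bits(bits):
    """Return IR on/off levels (1 = LED on) for a sequence of data bits."""
    levels = []
    for bit in bits:
        if bit == 0:
            # logic 0: pulse for the first 3 slots, then silence
            levels += [1] * PULSE_SLOTS + [0] * (SLOTS_PER_BIT - PULSE_SLOTS)
        else:
            # logic 1: no pulse during the whole bit time
            levels += [0] * SLOTS_PER_BIT
    return levels


def decode_levels(levels):
    """Recover the data bits: any pulse within a bit time means logic 0."""
    bits = []
    for start in range(0, len(levels), SLOTS_PER_BIT):
        window = levels[start:start + SLOTS_PER_BIT]
        bits.append(0 if any(window) else 1)
    return bits
```

For example, `decode_levels(encode_bits([0, 1, 0, 0]))` gives back `[0, 1, 0, 0]`.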
(a) Transmitter Module: serial data output by host computer through an external data dispatcher → Serial Input → Encoder (3/16th Pulse Width Modulator) → Transmitter Circuit (LED Driver) → IR Transmitter (LED)
(b) Receiver Module: IR Receiver (PIN) → Receiver Circuit (Amplifier and Quantizer) → Decoder (Edge Detector and Pulse Width Demodulator) → Serial Output → serial data input to USART of robot's microcontroller
Fig. 2.30. Module block diagrams implementing the base band method for IR communication
Fig. 2.30 shows the block diagrams of the transmitter and receiver modules that implement the base band method. These module block diagrams, which are self-explanatory, can be realized using a HSDL7001 encoder/decoder IC chip and a HSDL1000 LED driver/receiver IC chip in a generic module for data transmission (at the host computer) or reception (at the team robot), as shown in Fig. 2.31. We turn our attention to the experimental robot soccer system. This system uses the base band method for IR communication. As explained on page 280 in Section A.2, without carrier signals at different frequencies to identify the teams, the two teams need to share the same IR communication
module, set up as a transceiver (a data ‘dispatcher’) connected to several IR transmitter modules, as shown in Fig. 2.32.

Fig. 2.31. Circuit of the generic IR module built around the HSDL7001 and HSDL1000 IC chips
Fig. 2.32. A game set-up for teams using IR base band communication
Because of the directivity of IR radiation and the small maximal view angle θt or θr (about 14°) of the IR transmitters and receivers, several IR transmitters are needed to cover the playground area. The MiroSoT game set-up allows the IR transmitter modules depicted in Fig. 2.32 to be placed 2m above the 150cm × 130cm playground, and hence four IR transmitter modules are required to sufficiently cover the whole playground area; the plan view of their positions is shown in Fig. 2.33. This coverage ensures that a robot can receive data at any position on the playground. Fig. 2.34 shows the circuit block diagram of the transceiver. The transceiver receives the data packets from (the host computer of) each team and transmits them out serially (i.e., one packet at a time), with each packet sent simultaneously through the four IR transmitter modules. Each data packet is a communication message with the format as defined in Fig. 2.35.
Fig. 2.33. IR transmission coverage
The simple procedure of the broadcast protocol used for data send and receive is as follows:
• The team host computer transmits the messages continually to the team robots through the transceiver.
• The IR receiver module onboard each of the 3 team robots receives and decodes the messages as they arrive; the microcontroller stores these messages in its memory automatically via its configured hardware logic. The microcontroller program examines every stored message for the following conditions.
  1. Start of message: the 0th to 2nd bytes each contain the value 0xFF;
  2. Team ownership: the 3rd byte contains its team ID;
  3. Data delimiter: the 6th, 9th and 12th bytes each contain the value 0xAA.
• If all the conditions are true, it proceeds to extract the left-wheel and right-wheel velocity data from the (3i + 4)th and (3i + 5)th bytes respectively, where i ∈ [0, 2] is the unique robot ID.

Data Communication from Host Computer to Team Robots. Now, consider a command-based robot soccer system with the host computer program continually deciding and sending the velocity data W (for each wheel of each PIC16C73/73A microcontroller-based robot) through the send and receive communication protocol introduced above. Recall that W determines the duty cycle as in Eq. (2.34). In the experimental robot soccer system, the duty cycle of the PWM signal has a resolution of at most 10 bits, i.e., $0 \le (\mathrm{CCPRxL} : \mathrm{CCPxCON}\langle 5{:}4 \rangle)_2 \le 2^{10} - 1$.
Fig. 2.34. Circuit block diagram of a transceiver
| 0xFF | 0xFF | 0xFF | TID | $H_L^0$ | $H_R^0$ | 0xAA | $H_L^1$ | $H_R^1$ | 0xAA | $H_L^2$ | $H_R^2$ | 0xAA |

Format Definition
• The first 3 bytes of 0xFF (‘0x’ denotes hexadecimal) indicate start of message.
• Byte TID contains a team ID 0x0F or 0xF0 that denotes Team A or Team B, respectively.
• Byte $H_L^i$ and byte $H_R^i$, for 0 ≤ i ≤ 2, contain, respectively, the unsigned velocity data for the left and right wheels of the team robot with robot ID i.
• Byte 0xAA indicates the end of the velocity data pair for each team robot.

Fig. 2.35. Communication message format
In other words, the velocity data W needs at least 10 bits, but the protocol defined admits only one byte for velocity data. The simple approach used is for the host computer to send byte data H, given by

$$H = \frac{W}{4}, \text{ rounded to the nearest integer} \le 2^{8} - 1 \text{ (i.e., 255)}.$$
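As an illustration of the message format of Fig. 2.35 and the byte quantization H = W/4, the following Python sketch builds a 13-byte message on the host side and checks and unpacks it on the robot side. It is a minimal sketch, not the system's actual firmware: the function names are hypothetical, and the way a value colliding with the special codes 0xFF and 0xAA is nudged is an assumption made only for this example.

```python
START = 0xFF              # start-of-message code (sent three times)
DELIM = 0xAA              # end of each robot's velocity data pair
TEAM_A, TEAM_B = 0x0F, 0xF0


def to_byte(w):
    """Quantize a 10-bit value W (0..1023) to one byte H = round(W / 4).

    Assumption: values that would collide with the special codes are
    clamped to 254 or moved down by one; the text only says that the
    host does not use 0xFF and 0xAA for velocity data.
    """
    h = min(round(w / 4), 254)    # keep 255 (0xFF) free
    if h == DELIM:                # keep 170 (0xAA) free
        h -= 1
    return h


def build_packet(team_id, wheel_w):
    """wheel_w holds three (W_left, W_right) pairs, indexed by robot ID."""
    packet = [START, START, START, team_id]
    for w_left, w_right in wheel_w:
        packet += [to_byte(w_left), to_byte(w_right), DELIM]
    return bytes(packet)


def parse_packet(packet, team_id, robot_id):
    """Return (4*H_L, 4*H_R) for this robot, or None if a check fails."""
    if len(packet) != 13:
        return None
    if any(packet[k] != START for k in (0, 1, 2)):    # start of message
        return None
    if packet[3] != team_id:                          # team ownership
        return None
    if any(packet[k] != DELIM for k in (6, 9, 12)):   # data delimiters
        return None
    h_left = packet[3 * robot_id + 4]
    h_right = packet[3 * robot_id + 5]
    return 4 * h_left, 4 * h_right
```

A robot of Team A with ID 0 would call `parse_packet(packet, TEAM_A, 0)`; a message addressed to Team B is simply ignored because the team ownership check fails.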
The microcontroller program in a team robot with robot ID i would then, upon receiving each of its team communication messages, extract and compute the pair of desired velocity data, $4H_L^i$ and $4H_R^i$. Finally, the byte values 0xFF (255) and 0xAA (170) are special codes defined in the communication format. As a design principle, the host computer program does not use them for velocity data. Hence $H \in \{\, j \mid 0 \le j \le 254 \text{ and } j \ne 170 \,\}$.

RF: Communication Circuit and Protocol. For RF communication, a commercial communication module, the ARFM 424/447 from ALLINTEK (http://www.allintek.co.kr), can be used. The module implements the frequency-shift keying (FSK) method. In FSK, the binary values (0 and 1) are represented by two different frequencies near the carrier frequency.
Fig. 2.36. RF communication using the FSK method
As depicted in Fig. 2.36, at time t, the resulting FSK signal s(t) is

$$s(t) = \begin{cases} A_o \cos(2\pi f_1 t) & \text{if } data(t) = 1, \\ A_o \cos(2\pi f_2 t) & \text{if } data(t) = 0, \end{cases} \tag{2.40}$$

where $A_o$ is an amplitude constant, and typically, $f_1 = f_c + \Delta f$ and $f_2 = f_c - \Delta f$, i.e., they are offset from the carrier frequency $f_c$ by an amount $\Delta f$.

Like most commonly used commercial modules, the ARFM 424/447 transceiver module has a fixed frequency of either 424MHz or 447MHz. The communication mode is half-duplex, i.e., at any one time the module supports either signal-transmit only or signal-receive only. This communication module can send data at up to 40 kbps. Some electrical characteristics of this module are as follows:

Supply Power: DC 3.3V - 5V
Current Consumption: 45mA Max
Input/Output Impedance: 50 Ohms
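To make Eq. (2.40) concrete, the sketch below samples an FSK waveform for a short bit sequence using only the Python standard library. The function name `fsk_samples` and the toy parameter values in the usage note are assumptions chosen for readability; they are not the operating parameters of the ARFM module.

```python
import math


def fsk_samples(bits, fc, delta_f, bit_time, sample_rate, amplitude=1.0):
    """Sample s(t) of Eq. (2.40).

    A data bit 1 is sent at f1 = fc + delta_f and a data bit 0 at
    f2 = fc - delta_f. As in the equation, the phase is taken from the
    absolute time t, so it is not forced to be continuous at bit edges.
    """
    samples_per_bit = int(round(bit_time * sample_rate))
    out = []
    for n in range(len(bits) * samples_per_bit):
        t = n / sample_rate
        bit = bits[n // samples_per_bit]          # data(t)
        f = fc + delta_f if bit == 1 else fc - delta_f
        out.append(amplitude * math.cos(2 * math.pi * f * t))
    return out
```

For instance, `fsk_samples([0, 1, 0, 0], fc=10.0, delta_f=2.0, bit_time=1.0, sample_rate=1000.0)` returns 4000 samples in which the second bit oscillates at 12 Hz and the other three at 8 Hz.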
Fig. 2.37 shows the appearance and physical dimensions of the real product. The pin configuration is listed in Table 2.2.

Table 2.2. Pin Configuration of ALLINTEK ARFM-424 module

| Pin No. | Function | Description |
|---|---|---|
| 1 | RXD | Receiving Data (TTL level) |
| 2 | TXD | Transmitting Data (TTL level) |
| 3 | RXE | Reception Enable Pin, 'H' Enable / 'L' Disable |
| 4 | TXE | Transmission Enable Pin, 'H' Enable / 'L' Disable |
| 5 | VCC | DC Power, 3.2V - 5.0V |
| 6 | GND | Ground |
| 7 | ANT | Transmitting Data RF Output Pin, Antenna Port Impedance: 50 Ohms |
| 8 | AGND | ANT Ground |
Designed in combination with some auxiliary components, the ARFM-424 transceiver module can be used in a command-based robot soccer system as follows:
1. At the host computer. Fig. 2.38 shows a commercially available RF transceiver circuit for use (as a transmitter module) by the host computer. The circuit is set (via a mode switch) to modulate and transmit the signals from the host computer via the attached RF antenna.
2. On each team robot. Fig. 2.39 shows an RF transceiver circuit for use (as a receiver module) by each team robot. The RF communication circuit is set (via a mode switch) to demodulate the signals received from the host computer via the attached RF antenna.
3. Communication protocol. Fig. 2.40 shows the message format of a protocol for RF communication. The first byte is a dummy byte for frequency locking and the second byte (the header) indicates start of message. Two bytes, $H_L^i$ and $H_R^i$, for i ∈ [0, 2] (a 3-a-side team), contain, respectively, the unsigned velocity data for the left and right wheels of the team robot with ID i. The byte mode-i is reserved for extended use as a control mode for the team robot with ID i.
Fig. 2.37. An ALLINTEK ARFM-424 RF communication module: (a) photo image; (b) physical dimensions
| dummy | header | $H_L^0$ | $H_R^0$ | mode-0 | $H_L^1$ | $H_R^1$ | mode-1 | $H_L^2$ | $H_R^2$ | mode-2 |

Fig. 2.40. Communication message format

2.3.6 Power System
The choice of a power source is crucial from the hardware point of view. Batteries constitute a significant percentage of the total weight of a MiroSoT soccer robot. On the one hand, larger batteries usually supply higher voltages (and hence currents), but this may be at the expense of attaining good speed, as they might contribute excessive weight and size to the overall mechanical design of the robot. On the other hand, smaller batteries might not be able to supply the robot with a constant current that is sufficient to drive its logic and motor circuitry, or that lasts long enough to sustain the required autonomy (during the game). Besides, the under-weight robot that results might not be favourable strategically, since the fairly rugged nature of the game means that it can be easily ‘pushed off’ its posture during the game. As a guideline from the viewpoint of sustained robot autonomy, the batteries must supply sufficient current through one half of the two-half game duration.

Functionally, a battery converts chemical energy into electrical energy. There are basically two types of battery, namely, the rechargeable and the non-rechargeable (i.e., ‘used once’) type. For a robot soccer system, it is more economical to use rechargeable batteries.

Table 2.3. Some batteries and their characteristics

| Battery Chemistry | Recharge | Energy Density (Whr/kg) | Cell Voltage | Typical Capacity (Ah) | Internal Resistance (ohms) | Comments |
|---|---|---|---|---|---|---|
| Alkaline | No | 130 | 1.5 | AA: 1.4, C: 4.5, D: 10 | 0.1 | Most common primary battery |
| Lead-acid | Yes | 40 | 2.0 | C: 1.2-120 | 0.006 | Available in a wide variety |
| Lithium | No | 300 | 3.0 | A: 1.8, C: 5, D: 14 | 0.3 | Excellent energy density, high unit cost |
| Mercury | No | 120 | 1.35 | Coin: 0.19 | 10 | |
| NiCd | Yes | 38 | 1.2 | AA: 0.5, C: 1.8, D: 4 | 0.009 | Low internal resistance, available from many sources |
| NiMH | Yes | 57 | 1.3 | AA: 1.1, 4/3A: 2.3 | | Better energy density than NiCd, expensive |
| Zinc-air | No | 310 | 1.4 | | | High energy density but not widely available, limited range of sizes |
| Carbon-zinc | No | 75 | 1.5 | D: 6 | | Inexpensive but obsolete |
Table 2.3 lists some commercially available batteries and their characteristics. The capacity (or energy stored) of a battery is quantified in terms of amp-hours (Ah) or milliamp-hours (mAh). For example, a battery with 500mAh means that it can supply 500mA for one hour. Therefore, if a supply of 300mA is needed for 5 hours, a battery with 1500mAh is appropriate.

When the motors start up or reverse direction, they could draw large transient currents from the power system, leading to a possibly large instantaneous drop in the supply voltage. Thus, in using a common power system for the logic circuits and the motors of a soccer robot, batteries with the right capacities need to be chosen to ‘withstand’ the maximum transient currents that the motors might draw. As a guideline, a rechargeable battery needs to have a capacity (in mAh) numerically at least one third of the (total) maximum instantaneous current (in mA) that can be drawn. For example, if the maximum instantaneous current is 3000mA, the battery capacity must be at least 1000mAh. Otherwise, the battery supply voltage will drop instantaneously and excessively when such a current occurs, leading to battery ‘breakdown’ and/or logic circuit reset. Overall, in selecting the batteries for a soccer robot’s power system, one needs to balance the economic cost with the capacity (in mAh) and total weight of the batteries.

Power Regulation. It is uncommon for a circuit to use only IC chips, such as those of the 74HC series, that can operate over a wide (discrete) range of supply voltages (i.e., the ‘Vcc’s). Most IC chips are powered by applying a fixed Vcc voltage. Thus, for an electrical circuit system containing several IC chips powered by different voltages (e.g. 5V and 12V), power regulation is needed. Two types of power regulator are commonly used; as illustrated in Fig. 2.41, a linear regulator ‘steps down’ a supply voltage while a DC-DC converter ‘steps up’ the voltage; a DC-DC converter can also be used to reverse the polarity of the voltage. Besides voltage level transformation, a power regulator importantly ensures that its output voltage is constant regardless of some inevitable fluctuations in the supply voltage or the load (i.e., the electrical currents drawn by the system circuitry) due to, for example, the start up or direction reversal of the motors in the soccer robot. The logic circuits need to operate under a constant applied voltage, and hence this voltage is best provided by the regulator output (see Fig. 2.8).
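The two battery sizing guidelines of this section (capacity as current times duration, and capacity of at least one third of the peak transient current) amount to simple arithmetic, sketched below in Python. The function names are hypothetical, and taking the larger of the two requirements is an assumed way of combining them that the text does not state explicitly.

```python
def required_capacity_mah(current_ma, hours):
    """Capacity needed to supply a steady current for a given time."""
    return current_ma * hours


def min_capacity_for_transient_mah(peak_current_ma):
    """Guideline: capacity (mAh) at least one third of the peak current (mA)."""
    return peak_current_ma / 3


def pick_capacity_mah(current_ma, hours, peak_current_ma):
    """Assumed combination: satisfy both requirements by taking the larger."""
    return max(required_capacity_mah(current_ma, hours),
               min_capacity_for_transient_mah(peak_current_ma))
```

With the figures used in the text, a 300mA load for 5 hours needs 1500mAh, and a 3000mA peak current calls for at least 1000mAh, so a 1500mAh battery would satisfy both.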
2.3.7 Other Considerations
We conclude with some mechanical design considerations, listed as follows:
1. The battery compartment in the robot should allow easy replacement of batteries since this is often done during the half-time interval of the game.
2. The robot’s wheels should be designed to have good friction with the contact surface; this is to minimize slips and hence reduce position errors.

Notes on Selected References
The two companion books [14, 15] provide a reasonably good reference on PIC microcontrollers and the auxiliary devices for various applications.
Fig. 2.41. Examples of power regulation IC chips: (a) a linear regulator (7805; 7~35 volts supply in, VCC out); (b) a DC-DC converter (MAX631; 2~5 volts in, 5 volts out)
3. How to Sense? Use Computer Vision Techniques
3.1 Introduction
In a robot soccer game, real-time information about the robots’ and the ball’s dynamically changing coordinate positions and directions of move is vital. In a MiroSoT set-up, such information is obtained using a real-time vision system which consists of a vision algorithm continually processing the digital images captured by a vision board that receives the analog images from a video camera overlooking the complete playground. The FIRA MiroSoT rules specify well defined colours for different objects in the playground, and these are used as major cues for object detection. Vision processing in a robot soccer system is therefore colour-based.

The performance of a vision system for robot soccer is gauged in terms of its rate and accuracy in determining the objects’ coordinate positions and directions of move in the playground. Assuming that a good and fast enough vision processing algorithm exists, the former is limited by the image sampling rate of its vision processing board (also called an image frame grabber); the latter is limited by the quality of the digital images it processes.

Most commercial vision boards provide about 30 image frames per second, implying that new image frames are sampled once every 1/30 s (or about 33.3ms). Thus, in a robot soccer system, if the system processing cycle time (of vision processing, deciding, controlling and communicating) does not exceed this frame sampling time, a maximum of 30 actuation commands per second can be received by the team robots. But each sampled image frame is interlaced with an even and an odd field, and each field is captured at 1/60 s (about 16.7ms) intervals. So, if individual image fields are processed instead, and the system processing cycle time does not exceed this field sampling time, the maximum rate is doubled to 60 actuation commands received per second. But this higher rate is achieved at the cost of lower accuracy because the individual image fields processed are clearly of lower resolution than an image frame.

This chapter focusses on the (visual) SENSE primitive; it covers the basics of vision processing systems most relevant to the study of robot soccer systems, and presents how the postures of target objects in robot soccer can be computed using centralized vision techniques. Real examples then follow to study several specialized aspects of real vision systems as used by previous FIRA Cup MiroSoT teams. These examples highlight the practical considerations in building a good vision system for a MiroSoT team.

J.-H. Kim, D.-H. Kim, Y.-J. Kim, K.-T. Seow: Soccer Robotics, STAR 11, pp. 71-101, 2004. © Springer-Verlag Berlin Heidelberg 2004
3.2 Vision Basics

3.2.1 Computer Vision
Computer or machine vision studies how useful information of a scene can be extracted from the images of the scene. The information refers to the features of objects found in the scene. Examples of an object’s features include its position, heading angle, contours and colours. The kind of information to be extracted depends on the application. Besides mobile robotics, examples of vision applications include medical diagnosis and weather forecasting based on satellite images, to mention a few.

Related Disciplines. Some fields related to computer vision include image processing, computer graphics, pattern recognition and artificial intelligence. A comprehensive discussion of these related fields is beyond the scope of this book, but it is noteworthy that many techniques from these fields have significant bearings on computer vision, as briefly mentioned below.
1. The output of image processing is an image which is either an enhanced, compressed, de-blurred or focus-corrected version of the input image; image processing is therefore useful in the early stages of a vision system, since it could be used to enhance particular information and suppress noise in the original image frame.
2. Computer graphics studies how images can be generated from geometric primitives (image synthesis), and is therefore an inverse of computer vision, which is concerned with estimating geometric primitives and other features of objects from images (image analysis). Graphics techniques such as those for curve and surface representations are applicable to approximating object contours in computer vision.
3. Pattern recognition studies how numerical and symbolic data are classified as patterns. Many statistical and syntactical techniques developed for classifying patterns play an important role in computer vision for object recognition.
4. Artificial intelligence studies how intelligent systems are built as well as the computational aspects of intelligence. Techniques from artificial intelligence can be used to analyze scenes by constructing a symbolic representation based on the features of the scene objects obtained by vision processing. In fact, artificial intelligence techniques play such important roles in all aspects of computer vision that vision is often considered a subfield of artificial intelligence.
Fig. 3.1. Basic architecture of a computer vision system
3.2.2 Vision System Operations
Images are two-dimensional (2-D) projections of the three-dimensional (3-D) scene. The information of a scene is therefore not directly available. To extract it, a vision system requires high-level knowledge about the objects in the scene, and low-level knowledge on image formation, namely projection geometry and the physics of light. Projection geometry determines the location in the image (display) plane of an arbitrary point in the scene; the physics of light determines the brightness of a point in the image plane as a function of scene illumination and surface properties.

Fig. 3.1 shows a basic architecture of a computer vision system. The knowledge on image formation is built mainly into the vision hardware (i.e., the camera and frame grabber) while the application knowledge about the scene, such as the models of and the relationships among the objects which could be found in the scene, is coded into the computer vision algorithm.

Through its optical lens, the camera projects a 3-D scene onto a 2-D intensity image. The illumination or brightness of this image is cast on the light-sensitive photocells of the CCD^1 (Charge Coupled Device), an image sensor that converts and outputs these photo intensities as a continuous (electrical) charge signal. Alternative image sensors include CMOS (Complementary Metal Oxide Semiconductor), CID (Charge Injection Device) and PDA (Photo Diode Array). For robot soccer systems, CCD cameras are popularly used. The frame grabber digitizes the analog charge signal output by the CCD camera into evenly time-spaced integer data points (that constitute the sensed 2-D digital images), and stores them in memory as 2-D image arrays under computer control. The host computer program then calls upon a resident vision algorithm to process the stored digital images.

^1 CCD is a semiconductor technology used to build light-sensitive electronic devices such as cameras and image scanners. Such devices may detect either colour or black-and-white. Each CCD chip consists of an array of light-sensitive photocells. The photocell is sensitized by giving it an electrical charge prior to exposure.

3.2.3 Sampling, Pixel, and Quantization
Images are classified as either stationary or dynamic. Images captured in a robot-soccer game are dynamic, and they are either black and white (gray scale) or coloured, depending on the type of overhead camera used. A real image is a continuous tone picture, and is output by the image sensor as a continuous signal waveform^2 that represents brightness. The frame grabber digitizes this waveform by sampling^3 using its scanner and quantizing using its analog-to-digital (A/D) converter to transform it into a 2-D array of integer data points or samples. Depending on the type of camera and optical filters used, each sample (for gray scale images) or a vector of ‘neighbourhood’ samples (for colour images) constitutes what is called a picture element, or pixel for short.

Sampling selects the evenly time-spaced ‘data points’ on the charge signal waveform; each data point indicates the original intensity value of the selected or sampled signal point. Quantization assigns each real-valued data point an integer-valued number to represent the intensity level for computer storage.

Graphically, as depicted in Fig. 3.2, a 2-D n × m image array, which has n rows and m columns of pixels, may be conveniently represented as a grid of equal-sized squares. For each pixel denoted by a, a[i, j] stores an intensity level quantized to an integer value for a gray scale (one-channel) image; for a colour (multi-channel) image, it stores a vector of intensity levels, one for each basis colour channel and each quantized to an integer. The basis colours depend on the colour model used. [i, j] refers to the pixel position, which is a square point in the grid, where i, 1 ≤ i ≤ n, is the row index and j, 1 ≤ j ≤ m, is the column index. The grid represents the image plane and each grid square is said to be occupied by a pixel.
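The sampling and quantization steps described above can be imitated on a list of real-valued signal points. The sketch below is only a toy model of what a frame grabber does in hardware: the function names are hypothetical, the signal is assumed to be already normalized to [0, v_max], and the code uses 0-based indices where the text uses 1-based indices.

```python
def quantize(value, v_max=1.0, bits=8):
    """Map a real intensity in [0, v_max] to an integer level in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    clipped = min(max(value, 0.0), v_max)
    return int(round(clipped / v_max * levels))


def digitize(signal, n_rows, n_cols, bits=8):
    """Sample evenly spaced points of `signal` and store them as a 2-D array.

    Assumption: the signal holds at least n_rows * n_cols points, and
    every step-th point is kept as a sample.
    """
    step = len(signal) // (n_rows * n_cols)
    samples = signal[::step][:n_rows * n_cols]
    return [[quantize(samples[i * n_cols + j], bits=bits)
             for j in range(n_cols)]
            for i in range(n_rows)]
```

Calling `digitize(signal, 2, 3)` on a 12-point signal keeps every second point and returns a 2 × 3 array of 8-bit intensity levels.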
The sampling rate of a chosen frame grabber determines the image array size or pixel resolution, i.e., how many pixels the digitized image will have. The quantization range or intra-pixel (intensity) resolution, limited by the finite word size of the computer (usually set at 8 bits), determines how many levels are available to represent the intensity level of a signal sample point.

^2 Strictly speaking, this waveform is not analog, but a dense spectrum of signal points each indicating the level of light intensity in a photocell of the exposed CCD (photocell) array.
^3 Note that the image sensor’s photocell ‘scanning’ rate is an integer multiple of this sampling rate.

Fig. 3.2. An n × m image grid

3.2.4 Gray Scale, Binary, and Colour Images
For a gray scale image, the pixel is frequently represented as an unsigned 8-bit integer, and hence the quantization range is [0, 255], with 0 corresponding to black, 255 corresponding to white and shades of gray distributed over the middle values. If the pixel is represented by a one-bit integer, the quantization range is [0, 1]; in this case, the image formed is called a binary image. For colour images based on the RGB colour model, each pixel $a[i, j] = [R\ G\ B]^T$, where R, G and B are the integer values for its red, green and blue components; if the component channels are each quantized to an 8-bit integer, $2^8 \times 2^8 \times 2^8$ (i.e., 16,777,216, or about $1.68 \times 10^7$) different colours can be represented for a pixel.

It should be clear that the quality of digital images obtained depends on the image pixel and intra-pixel resolutions, as well as the environmental lighting conditions. In many practical vision applications, the sampling and quantizing rates are predetermined due to the limited choice of available cameras (or image acquisition hardware). It is however important to know the effects of sampling and quantizing rates on retaining information in digital images; this is discussed in the book [16].

3.2.5 Colour Models
Visible light has a wavelength ranging from 400nm to 700nm. Colours are created by mixing different visible lights. A colour model (or colour space) is a way of representing these colours and their relationship to one another.

The RGB Colour Model. The RGB colour space consists of the three additive primaries: red, green, and blue. Spectral components of these colours combine additively to produce a resultant colour. In what follows, the colours are normalized (i.e., their values lie between 0 and 1.0). This is easily accomplished by dividing the colour by its maximum
value allowed by the quantization range. For example, an 8-bit colour is normalized by dividing by 255. The RGB model is represented by a 3-dimensional cube with red, green and blue at the corners on each axis, as shown in Fig. 3.3. Black is at the origin. White is at the opposite end of the cube. The gray scale follows the line from black to white. In a 24-bit colour graphics system with 8 bits per colour channel, red is (255,0,0). On the colour cube, it is (1,0,0).

Fig. 3.3. The RGB colour cube. Corners: Black = (0,0,0), Red = (1,0,0), Green = (0,1,0), Blue = (0,0,1), Yellow = (1,1,0), Magenta = (1,0,1), Cyan = (0,1,1), White = (1,1,1)
Often, it becomes necessary to convert an RGB image into a gray scale image. To convert an image from RGB colour to gray scale, the following equation is used.

$$\text{Gray scale intensity} = 0.299R + 0.587G + 0.114B. \tag{3.1}$$

This equation comes from the NTSC^4 standard for luminance. Another common conversion from RGB colour to gray scale is a simple average.

$$\text{Gray scale intensity} = 0.333R + 0.333G + 0.333B. \tag{3.2}$$

This is used in many applications.

^4 National Television System Committee, a committee that sets colour television standards which are used in America, Korea and Japan.
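Eqs. (3.1) and (3.2) translate directly into code. The following Python functions are a minimal sketch with hypothetical names; they accept either normalized values in [0, 1] or integer values in [0, 255] and return a real number that the caller may round.

```python
def gray_ntsc(r, g, b):
    """Eq. (3.1): luminance-weighted gray scale intensity."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_average(r, g, b):
    """Eq. (3.2): simple average, using the rounded weights 0.333."""
    return 0.333 * r + 0.333 * g + 0.333 * b
```

Pure red (255, 0, 0) maps to about 76 with Eq. (3.1) and to about 85 with Eq. (3.2), which shows how differently the two conversions weight the colour channels.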
Other Colour Models. Different image processing systems use different colour models for different reasons. The colour picture publishing industry uses the CMY colour model. Colour CRT monitors and most computer graphics systems use the RGB colour model. Systems that must manipulate hue, saturation, and intensity separately use the HSI colour model. The YIQ and YUV (or $YC_bC_r$) colour models are used respectively in NTSC and PAL^5 video for broadcast television. Many commercial frame grabbers use the RGB model, but some use the YUV model, such as the one in the example system of Section 3.4.3. Other colour models, based on human vision, include XYZ, LAB and LUV; these were proposed by the Commission Internationale de l’Eclairage (the International Commission on Illumination).

Relationship between Colour Models. Relationships exist to convert from one colour model to another and back. Listed below are some matrix equations that directly convert the RGB model to the YIQ, YUV and $YC_bC_r$ models, respectively.

$$\begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.528 & 0.311 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{3.3}$$

$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{3.4}$$

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.299 & -0.587 & 0.886 \\ 0.701 & -0.587 & -0.114 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{3.5}$$
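The three conversions of Eqs. (3.3) to (3.5) are plain matrix-vector products, as the following sketch illustrates. The coefficients are copied from the equations as printed; the constant and function names are hypothetical.

```python
RGB_TO_YIQ = [[0.299, 0.587, 0.114],
              [0.596, -0.275, -0.321],
              [0.212, -0.528, 0.311]]      # Eq. (3.3)

RGB_TO_YUV = [[0.299, 0.587, 0.114],
              [-0.169, -0.331, 0.500],
              [0.500, -0.419, -0.081]]     # Eq. (3.4)

RGB_TO_YCBCR = [[0.299, 0.587, 0.114],
                [-0.299, -0.587, 0.886],
                [0.701, -0.587, -0.114]]   # Eq. (3.5)


def convert(matrix, rgb):
    """Multiply a 3 x 3 conversion matrix by an [R, G, B] column vector."""
    return [sum(m * c for m, c in zip(row, rgb)) for row in matrix]
```

For example, `convert(RGB_TO_YUV, (1.0, 1.0, 1.0))` gives a luminance of 1 with both chrominance components at (numerically almost) zero, as expected for white.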
^5 Phase Alternating Line, another set of colour television standards used in West Germany, The United Kingdom, parts of Europe, South America, parts of Asia and Africa.

3.3 Binary Image Processing
Binary images have only 2 (gray level) intensity levels, 0 and 1. Compared to gray scale or colour images, binary images require less memory storage and can be processed more quickly, but clearly contain a lot less information of the scenes they represent. However, many techniques developed for binary vision systems are also applicable to vision systems which use gray scale or colour images. A convenient way to represent an object in a gray scale or colour image is to use its mask. The mask of an object is a binary image in which the pixel values of the object are 1 and those of the background are 0. After an object has been ‘separated’ from the background, its geometric properties such as size, position and orientation may be required for decision making. These features can be computed from its binary image. In other words, many of the basic concepts and processing techniques of computer vision are found in the simpler domain of binary image processing. This section introduces and explains these concepts and techniques through the processing of binary images.

3.3.1 Thresholding
Thresholding is a method to convert a gray scale image into a binary image so that objects of interest are separated from the background. For thresholding to be effective in object-background separation, it is necessary that objects and background have sufficient contrast and that the intensity levels of either the objects or the background are known. Thresholding is therefore an applicable vision technique for a MiroSoT system since the rectangular playground is black, providing a good contrast with the 5cm high white side walls and with the blue or yellow patch worn on top of each soccer robot as a team identification (ID) colour that visually identifies the team it belongs to.

For a gray scale image F, let F[i, j] be the original intensity level of its pixel (referenced as a[i, j] in the grid of Fig. 3.2) and $F_T[i, j]$ be the thresholded value of the same pixel predicated on a criterion C on F[i, j]. To obtain a binary image B (for which the pixel intensity level $B[i, j] = F_T[i, j] \in \{0, 1\}$), the thresholding function for $F_T[i, j]$ is defined by

$$F_T[i, j] = \begin{cases} 1 & \text{if } C(F[i, j]), \\ 0 & \text{otherwise.} \end{cases} \tag{3.6}$$
Depending on the application knowledge, the criterion function C(F[i, j]) can take one of the following forms.
1. $F[i, j] \le T_u$, for some upper bound threshold $T_u$. This is used to separate darker colour objects (lower gray levels) from the lighter colour background (higher gray levels).
2. $F[i, j] \ge T_l$, for some lower bound threshold $T_l$. This is used to separate lighter colour objects (higher gray levels) from the darker colour background (lower gray levels).
3. $T_1 \le F[i, j] \le T_2$, for some threshold range $[T_1, T_2]$. This is used to separate objects with intensity values in the range $[T_1, T_2]$ from the background known to be outside this range.
4. $F[i, j] \in Z$, where Z is a union of several disjoint threshold ranges. This is a general thresholding scheme used to separate objects with intensity levels that may come from several disjoint ranges from the background known to be outside all these ranges.
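The thresholding function of Eq. (3.6) and the four criteria listed above can be sketched as follows, with an image stored as a list of rows. The function names are hypothetical and serve only this illustration.

```python
def threshold(image, criterion):
    """Eq. (3.6): F_T[i][j] = 1 if criterion(F[i][j]) holds, else 0."""
    return [[1 if criterion(v) else 0 for v in row] for row in image]


def at_most(t_u):
    """Criterion 1: F[i, j] <= T_u (dark objects on a light background)."""
    return lambda v: v <= t_u


def at_least(t_l):
    """Criterion 2: F[i, j] >= T_l (light objects on a dark background)."""
    return lambda v: v >= t_l


def in_range(t1, t2):
    """Criterion 3: T_1 <= F[i, j] <= T_2."""
    return lambda v: t1 <= v <= t2


def in_any(ranges):
    """Criterion 4: F[i, j] lies in a union Z of disjoint ranges."""
    return lambda v: any(lo <= v <= hi for lo, hi in ranges)
```

For example, `threshold(F, in_range(21, 48))` marks with 1 exactly those pixels of F whose gray level lies between 21 and 48 inclusive.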
Fig. 3.4 shows a gray level image and its resulting binary images obtained by using different thresholds.
Fig. 3.4. A gray level image and its resulting binary images using different thresholds: (a) image; (b) Tl = 48; (c) T1 = 21, T2 = 48
Automated thresholding of images is often the first step in the analysis of images in computer vision systems. Many available thresholding techniques utilize the intensity distribution in an image and the knowledge about the objects of interest for selecting a proper threshold value automatically. To elaborate briefly, consider an image of a MiroSoT robot in the playground and its histogram, as shown in Fig. 3.5; an image histogram is an intensity
distribution plot showing the number of image pixels for each gray scale level. The MiroSoT robot appears as a bright square in the image and it lies in the gray level range [140, 180]. From the ‘valley’ between the two peaks in the histogram, it is thus clear that setting the threshold T (either lower or upper bound) to 135 will quite distinctly separate the square from the background.
Fig. 3.5. An image and its histogram
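The idea of reading a threshold off the ‘valley’ between two histogram peaks can be illustrated as follows. This is a rough sketch under stated assumptions: the two peak gray levels are supplied by the user, the image is a list of rows of integers in [0, levels), and the function names are hypothetical. Real histograms are noisy, so a practical system would usually smooth the histogram first.

```python
def histogram(image, levels=256):
    """Count the number of pixels at each gray level."""
    counts = [0] * levels
    for row in image:
        for f in row:
            counts[f] += 1
    return counts

def valley_threshold(hist, peak_lo, peak_hi):
    """Pick a gray level with the smallest count strictly between two peaks.

    When several levels tie for the minimum, the middle one is returned.
    """
    candidates = list(range(peak_lo + 1, peak_hi))
    lowest = min(hist[g] for g in candidates)
    ties = [g for g in candidates if hist[g] == lowest]
    return ties[len(ties) // 2]

# Made-up image: dark background at level 20, bright object at level 160.
image = [[20, 20, 160],
         [20, 160, 160],
         [20, 20, 20]]
hist = histogram(image)
T = valley_threshold(hist, 20, 160)
object_pixels = sum(1 for row in image for f in row if f >= T)
```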
One should note that although the thresholding techniques available are useful tools, a proper threshold value is usually selected on the basis of human experience with the application domain.

3.3.2 Computing Geometric Properties

In this section, we assume that a thresholding technique has yielded a binary image B of size n × m (n rows, m columns). This image has only one object; the pixel values of the object are 1 and those of the background are 0.

Size. The area A occupied by the object in binary image B is given by

\[ A = \sum_{i=1}^{n} \sum_{j=1}^{m} B[i,j]. \tag{3.7} \]
Position. The position of an object in an image plays an important role in robot soccer. In MiroSoT, the objects, namely, the robots and the ball, appear on a known surface - the rectangular playground - and the position of the overhead camera is known with respect to the playground. In this case, an object’s position in the image determines its spatial location (in the playground). The position of an object in an image may be defined using the centre of area of the object image. Though other methods such as ‘using a rectangle
to enclose the object image’ may be used, the centre of area of the object image is a point and is relatively insensitive to noise in the whole image. The following equation provides formulae to compute the centre position (x̄, ȳ) of the object in binary image B with respect to the image Cartesian x-y plane.

\[ \bar{x} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} j \cdot B[i,j]}{A}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} i \cdot B[i,j]}{A}. \tag{3.8} \]
To illustrate, consider the image in Fig. 3.6. Note the origin of the image Cartesian x-y plane and the directions of the image x-y axes.

Fig. 3.6. An example showing the centre position (x̄, ȳ) of an image
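In this example, A = 13. The centre position (x̄, ȳ) is calculated using Eq. (3.8) as follows:

\[ \bar{x} = \frac{5 \times 4 + 6 \times 5 + 7 \times 4}{13} = 6.0, \qquad \bar{y} = \frac{4 \times 2 + 5 \times 3 + 6 \times 3 + 7 \times 3 + 8 \times 2}{13} = 6.0. \]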
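Eqs. (3.7) and (3.8) translate directly into code. The sketch below assumes the binary image is a list of rows and uses 1-based row index i and column index j, as in the text. Since the pixel layout of Fig. 3.6 is not reproduced here, the 13-pixel object is a hypothetical one chosen to have the same column counts (4, 5, 4 in columns 5 to 7) and row counts (2, 3, 3, 3, 2 in rows 4 to 8) as the worked example.

```python
def area(B):
    """Eq. (3.7): number of unity-valued pixels."""
    return sum(sum(row) for row in B)

def centre(B):
    """Eq. (3.8): centre of area (x_bar, y_bar), with 1-based rows i and columns j."""
    A = area(B)  # the caller must ensure the object is not empty (A > 0)
    x_bar = sum(j * v for row in B for j, v in enumerate(row, start=1)) / A
    y_bar = sum(i * v for i, row in enumerate(B, start=1) for v in row) / A
    return x_bar, y_bar

# Hypothetical object pixels as (row i, column j) pairs.
pixels = {(4, 5), (4, 6),
          (5, 5), (5, 6), (5, 7),
          (6, 5), (6, 6), (6, 7),
          (7, 5), (7, 6), (7, 7),
          (8, 6), (8, 7)}
B = [[1 if (i, j) in pixels else 0 for j in range(1, 11)] for i in range(1, 11)]
A = area(B)
x_bar, y_bar = centre(B)
```

With this object the sums are 78/13 in both directions, matching the centre (6.0, 6.0) of the worked example.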
In general, the calculated x̄ and ȳ may not be integers, and usually lie between the column and row indices of two pixel positions. This does not imply, however, that the calculated position is more accurate than the resolution of pixel positions.

Orientation. Calculating the orientation of an object is a little more complicated than calculating its position. For an object which is circular, its orientation is not unique. An object’s orientation is unique if it is elongated, such as that shown in Fig. 3.7.

Fig. 3.7. Finding the orientation of the object (x-axis along array column j, y-axis along array row i; the figure marks the line equation and the orientation)
The angle θ between the thick line and x-axis in Fig. 3.7 is defined as the object’s orientation. This thick line, called the object’s orientation line, is the least squares fit of the positions of all the pixels of the object (or simply called object points) in binary image B, i.e., it is the line that best fits the object points in that the sum of the squared distances between the object points and the line is minimized. Formally, to find the equation of such a line from which the orientation information about the object can be obtained, minimize λ², the sum of the squared perpendicular distances of all object points from the line given by

\[ \lambda^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} d_{ij}^2 \cdot B[i,j], \tag{3.9} \]
where dij is the perpendicular distance from an object point [i, j] to the thick line. To avoid numerical problems when the line is nearly vertical, represent the line in polar coordinates:
\[ \rho = x \cos\theta_o + y \sin\theta_o. \tag{3.10} \]
Referring to Fig. 3.7, θo is the angle that the normal to the thick line makes with the x-axis, and ρ is the normal (and hence shortest) distance between the line and the origin. The normal distance d between an arbitrary Cartesian coordinate (x, y) and the line characterized by Eq. (3.10) satisfies the following equation.

\[ d^2 = (x \cos\theta_o + y \sin\theta_o - \rho)^2. \tag{3.11} \]
Plugging Eq. (3.11) for every image point into Eq. (3.9) (the minimization criterion), we get
\[ \lambda^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_{ij} \cos\theta_o + y_{ij} \sin\theta_o - \rho)^2 \cdot B[i,j], \tag{3.12} \]
where (x_ij , y_ij ) is the Cartesian coordinate point of pixel position [i, j] (i.e., pixel at the i-th row and j-th column). The characteristic equation model (ρ, θo ) of the line that best fits the object points can be obtained by minimizing λ², done as follows: In Eq. (3.12), set the derivative of λ² with respect to ρ to zero. Then solving for ρ yields

\[ \rho = \bar{x} \cos\theta_o + \bar{y} \sin\theta_o, \tag{3.13} \]
which shows that the regression line passes through the centre (x̄, ȳ) of the object points. Define (x̃_ij , ỹ_ij ) = (x_ij − x̄, y_ij − ȳ). Substituting these definitions and Eq. (3.13) into Eq. (3.12), we get

\[ \lambda^2 = a \cos^2\theta_o + b \sin\theta_o \cos\theta_o + c \sin^2\theta_o, \tag{3.14} \]

where

\[ a = \sum_{i=1}^{n} \sum_{j=1}^{m} (\tilde{x}_{ij})^2 \cdot B[i,j], \qquad b = 2 \cdot \sum_{i=1}^{n} \sum_{j=1}^{m} \tilde{x}_{ij}\, \tilde{y}_{ij} \cdot B[i,j], \qquad c = \sum_{i=1}^{n} \sum_{j=1}^{m} (\tilde{y}_{ij})^2 \cdot B[i,j]. \]
Eq. (3.14) can be rewritten as

\[ \lambda^2 = \frac{a + c}{2} + \frac{(a - c) \cdot \cos 2\theta_o}{2} + \frac{b \sin 2\theta_o}{2}. \tag{3.15} \]
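In Eq. (3.15), by setting the derivative of λ² with respect to θo to zero, we get

\[ \tan 2\theta_o = \frac{b}{a - c}. \tag{3.16} \]

Therefore,

\[ \theta_o = \frac{1}{2} \arctan\left(\frac{b}{a - c}\right). \tag{3.17} \]

It follows easily that

\[ \theta = \theta_o + \frac{\pi}{2} = \frac{1}{2} \arctan\left(\frac{b}{a - c}\right) + \frac{\pi}{2}. \tag{3.18} \]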
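The computation of a, b and c and of the orientation can be sketched as below. Two points are assumptions of this sketch, not statements from the text. First, the pixel at row i and column j is given coordinates x = j and y = i, so angles are measured in the image frame. Second, Eq. (3.16) fixes 2θo only up to a multiple of π, and of the two resulting candidate lines one minimizes λ² while the other maximizes it; the sketch picks the minimizing one by computing the line angle as θ = ½·atan2(b, a − c), which agrees with Eq. (3.18) modulo π whenever that equation yields the minimum. The function name is hypothetical.

```python
import math

def orientation(B):
    """Angle (radians, image frame) of the least-squares orientation line.

    Returns None when b = 0 and a = c, i.e. when the orientation is undefined.
    Assumes the object is not empty.
    """
    pts = [(j, i) for i, row in enumerate(B, start=1)
           for j, v in enumerate(row, start=1) if v]
    n_pts = len(pts)
    x_bar = sum(x for x, _ in pts) / n_pts
    y_bar = sum(y for _, y in pts) / n_pts
    a = sum((x - x_bar) ** 2 for x, _ in pts)
    b = 2 * sum((x - x_bar) * (y - y_bar) for x, y in pts)
    c = sum((y - y_bar) ** 2 for _, y in pts)
    if b == 0 and a == c:
        return None
    # atan2 selects the branch of tan(2*theta) = b / (a - c) that minimizes lambda^2.
    return 0.5 * math.atan2(b, a - c)

theta_row = orientation([[1, 1, 1, 1, 1]])                     # bar along the x-axis
theta_diag = orientation([[1, 0, 0], [0, 1, 0], [0, 0, 1]])    # bar along y = x
theta_square = orientation([[1, 1], [1, 1]])                   # no preferred direction
```

Because a line has no direction, the returned angle is only meaningful modulo π; a separate cue is needed to turn it into a heading.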
To fix the angular reference, let θ ∈ (−π, π]. Note that if b = 0 and a = c, the object’s orientation is undefined.

3.3.3 Labelling

Given a binary image, we need to group all spatially close pixels of value 1 into connected components that distinctly represent the different objects. This is done using a component labelling algorithm, which finds all connected components in the image and assigns a unique label, usually an integer, to all pixels in the same component. Fig. 3.8 shows an example of component labelling of a binary image derived using some thresholding technique. The dark pixels (see Fig. 3.8(a)) have ‘1’ values and the others have ‘0’ values. There are a total of 4 connected components and they are given label values of 1, 2, 3 and 4 (see Fig. 3.8(b)) through some labelling algorithm.

Computing the geometric properties (such as size, position and orientation) of each object, as has been covered in Section 3.3.2, is usually done as its component is labelled. Other properties that can be computed are perimeter (number of pixels at perimeter) and compactness of the object image. Compactness is defined by

\[ \text{Compactness} = \frac{4\pi \cdot \text{Area}}{\text{Perimeter}^2}, \]

where ‘Area’ refers to the object image area defined by A in Eq. (3.7). A circular object has a compactness value of 1; usually, objects with more complex shapes have smaller values. If the shapes of objects are known, as in robot soccer, compactness and perimeter values are helpful in finding and recognizing them.

Computing object properties can be easily integrated into the labelling algorithm. However, in order not to clutter the idea of component labelling, this section only presents the intrinsic labelling algorithms. The notion of spatial proximity has to be made precise before two labelling algorithms can be clearly presented. For this purpose, some definitions are introduced first.
Fig. 3.8. A binary image and its labelled connected components: (a) binary image; (b) labelled binary image
Neighbours. Consider a digital image represented on a grid of squares each representing an image pixel. In this representation, a pixel has a common boundary with each of four other pixels, i.e., it shares each side of its square with a different pixel. It also shares each corner of its square with one of four additional pixels. We say that two pixels are 4-neighbours if they share a common boundary, and are 8-neighbours if they share at least one corner. As shown in Fig. 3.9, for a pixel at square position [i, j] in a grid, its four 4-neighbours are at [i + 1, j], [i − 1, j], [i, j + 1], [i, j − 1] and its eight 8-neighbours are at [i + 1, j + 1], [i + 1, j − 1], [i − 1, j + 1], [i − 1, j − 1], plus the positions of all of its 4-neighbours. A pixel is said to be 4-connected to its 4-neighbours and 8-connected to its 8-neighbours.

Paths. A path from the pixel at [i0 , j0 ] to the pixel at [in , jn ] is a position sequence of pixels [i0 , j0 ], [i1 , j1 ], [i2 , j2 ], · · · , [in , jn ] such that any two consecutive pixels in the sequence, at [ik , jk ] and [ik+1 , jk+1 ], 0 ≤ k ≤ n − 1, are neighbours. If the consecutive pixels at [ik , jk ] and [ik+1 , jk+1 ] are 4-neighbours for all k, the path is a 4-path; if they are 8-neighbours, the path is an 8-path. Simple examples of these are shown in Fig. 3.10.

Foreground. The set of all unity valued pixels in a binary image is called the foreground and is denoted by S.
Fig. 3.9. The 4- and 8-neighbourhoods of a pixel at square position [i, j]: (a) 4-neighbours; (b) 8-neighbours
Fig. 3.10. Examples of a 4-path and an 8-path: (a) 4-path; (b) 8-path
Connectivity. A pixel p ∈ S is said to be connected to q ∈ S if there is a path from p to q consisting entirely of pixels of S. For any three pixels p, q, r ∈ S, the following properties are satisfied.
1. Pixel p is connected to itself (reflexivity).
2. If p is connected to q, then q is connected to p (symmetry).
3. If p is connected to q and q is connected to r, then p is connected to r (transitivity).
In other words, mathematically, connectivity is an equivalence relation.

Connected Components and Spatial Proximity. A subset of S in which each pixel is connected to all other pixels is called a connected component. Points over the same object surface project onto spatially close pixels in an image, and this is captured by the concept of a connected component.
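The definitions of neighbours, paths and connectivity can be made concrete with a small search routine. The sketch below is an illustration only: it uses 0-based (row, column) tuples rather than the 1-based [i, j] of the text, a breadth-first search as one possible way of looking for a path, and hypothetical names (N4, N8, connected).

```python
from collections import deque

# Offsets (di, dj) to the 4-neighbours and 8-neighbours of a pixel.
N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def connected(B, p, q, offsets=N4):
    """True if foreground pixels p and q are joined by a path lying entirely in S."""
    rows, cols = len(B), len(B[0])
    if not (B[p[0]][p[1]] and B[q[0]][q[1]]):
        return False          # connectivity is defined only between pixels of S
    seen, queue = {p}, deque([p])
    while queue:
        i, j = queue.popleft()
        if (i, j) == q:
            return True
        for di, dj in offsets:
            n = (i + di, j + dj)
            inside = 0 <= n[0] < rows and 0 <= n[1] < cols
            if inside and B[n[0]][n[1]] and n not in seen:
                seen.add(n)
                queue.append(n)
    return False

# Two pixels touching only at a corner: 8-connected but not 4-connected.
B = [[1, 0],
     [0, 1]]
four = connected(B, (0, 0), (1, 1), N4)
eight = connected(B, (0, 0), (1, 1), N8)
```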
Next, we present two basic algorithms for finding and labelling connected components in a binary image. One is recursive while the other is sequential. A recursive algorithm is very time inefficient on a sequential processor, and so is usually implemented on parallel processors. A sequential algorithm takes less computation time and memory.

3.3.4 Labelling Algorithm 1: Recursive

Recursive Connected Components Algorithm
1. Initialize label L = 1.
2. Scan the binary image to find an unlabelled pixel p ∈ S and assign it label L. If no such pixel exists, stop.
3. Recursively assign the label L to all pixels q ∈ S that p is connected to.
4. Set L := L + 1.
5. Go to Step 2.
Given below is a pseudocode Label(r,c) that implements Step 3 of the algorithm using 4-connectivity.

Label(r,c):
// Begin
    Store(r,c,L);
    If p[r][c-1] is 1 and unlabeled, Label(r,c-1);
    If p[r][c+1] is 1 and unlabeled, Label(r,c+1);
    If p[r-1][c] is 1 and unlabeled, Label(r-1,c);
    If p[r+1][c] is 1 and unlabeled, Label(r+1,c);
// End
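A runnable counterpart of the recursive procedure can be sketched in Python as follows. It is an illustration, not the book's implementation: to stay clear of Python's recursion depth limit on large components, the recursion of Label(r,c) is replaced by an explicit stack, which visits the same pixels. Indices are 0-based, image borders are checked explicitly (the pseudocode leaves this implicit), and the function name is hypothetical.

```python
def label_components(B):
    """Label 4-connected components of binary image B; returns a label array."""
    rows, cols = len(B), len(B[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means "unlabelled"
    L = 1                                        # Step 1
    for r in range(rows):                        # Step 2: scan for an unlabelled pixel
        for c in range(cols):
            if B[r][c] == 1 and labels[r][c] == 0:
                labels[r][c] = L
                stack = [(r, c)]
                while stack:                     # Step 3: spread L over the component
                    i, j = stack.pop()
                    for ni, nj in ((i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)):
                        if (0 <= ni < rows and 0 <= nj < cols
                                and B[ni][nj] == 1 and labels[ni][nj] == 0):
                            labels[ni][nj] = L
                            stack.append((ni, nj))
                L += 1                           # Step 4, then back to Step 2
    return labels

B = [[1, 1, 0, 0],
     [0, 0, 0, 1],
     [1, 0, 1, 1]]
labels = label_components(B)
```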
Label(r,c) is a recursive function. Store(r,c,L) assigns label L to a pixel at [r,c]. p[r][c] stores the value of the pixel at row r and column c. Hence, Label(r,c) labels the pixel at [r,c] and then recursively labels all its unlabelled 4-neighbour pixels (around it) that have unity values.

3.3.5 Labelling Algorithm 2: Sequential, 4-Connectivity

Sequential Connected Components Algorithm
1. Scan the binary image from left to right, top to bottom.
2. If a pixel at [i, j] is in S (i.e., its pixel value is 1), then for the two pixels at [i − 1, j] and [i, j − 1]:
   a) If one has been assigned⁶ a label (say, L1 ) and the other is not labelled, assign label L1 to the pixel at [i, j].
   b) If both have been assigned the same label (say L2 ), assign label L2 to the pixel at [i, j].
   c) If both have been assigned but with different labels, then assign the label of the pixel at [i − 1, j] to that at [i, j], and record the two labels in the equivalence table (as equivalent labels).
   d) If neither pixel is labelled, assign a new label L to the pixel at [i, j] and record it in the equivalence table.
3. If not all pixels in S are labelled, go to Step 2.
4. Determine the lowest-valued label for each equivalent-label set in the equivalence table.
5. Scan the image again and replace each label by the lowest-valued label in its equivalent-label set.
Equivalent labels are different label values assigned to pixels of the same connected component. Equivalent labels constitute an equivalent-label set, and these sets constitute an equivalence table. The algorithm requires two scans of the image. In the first scan (Steps 1-3), the connected components are found, and all those labels of the pixels in one component are put into its equivalent-label set. In the second scan (Steps 4-5), each label assigned to a pixel in the image is replaced by the lowest-valued label in its equivalent-label set.

Given below is a pseudocode that implements Step 2 of the algorithm, but with details of the equivalent label recording in Step 2 abstracted away. It assumes that the elements of array label are all initialized to 0; this means that all the pixels of a binary image are initially unlabelled.

// Begin
L=1;    // set once, before the scan starts
If p[r][c] == 1 {
    if(label[r-1][c] > 0 && label[r][c-1] == 0) label[r][c] = label[r-1][c];
    if(label[r-1][c] == 0 && label[r][c-1] > 0) label[r][c] = label[r][c-1];
    if(label[r-1][c] > 0 && label[r][c-1] > 0) label[r][c] = label[r-1][c];    // Steps 2b and 2c
    if(label[r-1][c] == 0 && label[r][c-1] == 0) { label[r][c] = L; L = L+1; }
}
// End

⁶ Note that only pixels which have unity values are labelled.

Fig. 3.11 shows an example of the algorithm’s workings on an image. The equivalence table derived after Steps 1-3 is ET = {ES1, ES2}, and each equivalent-label set contains equivalent labels, namely, ES1 = {1, 3} and ES2 = {2, 4}. Each label in a set belongs to the same connected component. After Step 4, the lowest-valued labels in ES1 and ES2 are 1 and 2 respectively. In Step 5, all the pixels with labels 3 and 4 are reassigned with labels 1 and 2, respectively. The final outcome is two connected components, each uniquely identified by a different label assigned to all its pixels.
Fig. 3.11. An example illustrating the workings of the sequential connected components algorithm on an image
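The complete sequential algorithm, including the equivalence bookkeeping that the pseudocode leaves out, might be sketched as follows. The representation of the equivalence table as a union-find forest (a dictionary mapping each label to a parent label) is a choice made for this sketch and is not prescribed by the text; indices are 0-based and pixels outside the image are treated as unlabelled.

```python
def label_sequential(B):
    """Two-scan, 4-connectivity labelling of binary image B."""
    rows, cols = len(B), len(B[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = {}                        # equivalence table: label -> parent label

    def find(x):                       # representative (lowest label) of x's set
        while parent[x] != x:
            x = parent[x]
        return x

    def union(x, y):                   # record that labels x and y are equivalent
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[max(rx, ry)] = min(rx, ry)   # the lowest label stays the root

    next_label = 1
    for i in range(rows):              # first scan (Steps 1-3)
        for j in range(cols):
            if B[i][j] != 1:
                continue
            up = labels[i - 1][j] if i > 0 else 0
            left = labels[i][j - 1] if j > 0 else 0
            if up and left:            # Steps 2b and 2c
                labels[i][j] = up
                if up != left:
                    union(up, left)
            elif up or left:           # Step 2a
                labels[i][j] = up or left
            else:                      # Step 2d
                labels[i][j] = next_label
                parent[next_label] = next_label
                next_label += 1
    for i in range(rows):              # second scan (Steps 4-5)
        for j in range(cols):
            if labels[i][j]:
                labels[i][j] = find(labels[i][j])
    return labels

# A U-shaped object: its two arms get labels 1 and 2, which are merged at the base.
U = [[1, 0, 1],
     [1, 0, 1],
     [1, 1, 1]]
labels_U = label_sequential(U)
```

As in the example of Fig. 3.11, the surviving labels are the lowest ones of each set, so they need not be consecutive integers.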
3.3.6 Size Filtering

Noise is inherent in computer vision. Some ‘extraneous’ components in an image could appear due to noise arising from the unstable resolution of the camera and the uneven illumination in environmental lighting. The high irregularities of noise often result in many scattered noise components in the (labelled) binary image, but these components are usually small, with ragged contours.
In many applications such as robot soccer, the objects of interest have connected components that are individually of greater sizes (i.e., areas in terms of the number of pixels in them) than the biggest noise component. Therefore, one may use what is called size filtering to remove noise after component labeling. This involves changing all the pixel values of a component from 1 to 0 if the component area is less than an appropriately selected size filter Af . This simple filtering mechanism has been found to be very effective in removing noise. Fig. 3.12 shows an example.
Fig. 3.12. A noisy binary image and its resulting image after application of a size filter (Af = 8): (a) noisy image; (b) noise filtered image
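Size filtering on a labelled image takes only a few lines. The sketch assumes the input is a label array such as the one produced by a labelling algorithm (0 for background, positive integers for components) and removes every component whose area is less than Af; the function name is hypothetical.

```python
from collections import Counter

def size_filter(labels, a_f):
    """Set to 0 every pixel of a component whose pixel count is below a_f."""
    sizes = Counter(v for row in labels for v in row if v)
    return [[v if v and sizes[v] >= a_f else 0 for v in row] for row in labels]

labels = [[1, 1, 0, 2],
          [1, 1, 0, 0],
          [0, 0, 3, 3]]
filtered = size_filter(labels, 3)   # components 2 (1 pixel) and 3 (2 pixels) are removed
```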
3.4 Vision System For MiroSoT Robot Soccer

The purpose of the vision system in robot soccer is to compute the robots’ postures and the ball’s position through the processing of the situation images in the playground during the game. A robot’s direction of motion, or heading direction, is indicated by the robot’s heading angle, which is the angle its heading direction makes with the x-axis of the Cartesian x-y frame. In this section, we outline the basic steps in MiroSoT vision processing, describe the relation between a physical coordinate and its corresponding image coordinate, and provide examples on several practical but specialized aspects of the real vision systems used by previous FIRA Cup MiroSoT teams.

3.4.1 System Processing

The basic steps involved in MiroSoT vision processing are as follows:
1. Calibrate the vision system for colour recognition of target objects with respect to a colour model (we shall assume that the Y U V colour model is used in our description):
   • This amounts to determining and setting the intensity ranges [Ymin , Ymax ], [Umin , Umax ] and [Vmin , Vmax ] of every colour used for the target objects under the Y U V colour model. According to the FIRA MiroSoT rules, each robot soccer team is distinguished by a team colour of either yellow or blue patches placed on top of its team robots. Additional colours can be placed to uniquely identify each individual robot of a team. The target objects in robot soccer are the robots and the ball; thus the Y U V intensity ranges of each different colour for the team ID patch, robot ID patch and the ball need to be determined and set.
2. Run the vision system
   When the vision system is running, it performs the following steps in a cyclic fashion.
   a) Obtain a binary image of target objects from a captured Y U V -colour image.
      • By applying a thresholding technique against the Y U V intensity ranges.
   b) Obtain a labelled connected components image from the binary image.
      • By applying a labelling algorithm to the binary image.
   c) Remove noise in the labelled connected components image.
      • By performing size filtering of the labelled connected components image.
   d) Determine the (centre) positions of the remaining connected components.
      • By applying Eq. (3.8) to each connected component (with other components ‘virtually’ removed).
   e) Recognize target objects from all the remaining connected components and compute the postures of the target robots.
      • Through various approaches that exploit the geometry of the target objects (i.e., the robots and the ball) and the layout of the colour patches on top of each robot.

During a soccer match, Steps 2a - 2e of vision processing (the SENSE functionality) are executed continually and, as shown in the flow chart of Fig. 3.13, they work in a closed loop in conjunction with the DECIDE and ACT:Control functionalities.

3.4.2 Image and Physical Coordinates on MiroSoT Playground

Before proceeding to the examples, we formalize the simple relationship between a physical coordinate point (x, y) in the 150cm × 130cm playground
Fig. 3.13. A high-level flow chart showing vision processing as a software component of a robot soccer host-system program (blocks shown: Start_Game, Set_Colour_Ranges, Grab_Image, Scan_Image, Label_Image, Locate_Objects, DECIDE and Generate CONTROL, Send_Command, Interrupt?, Stop_AllRobots, Stop_Game)
and the corresponding point (x^j , y^i ) which is the image coordinate of the pixel at [i, j] (i.e., in the i-th row and j-th column) in its n × m image (i.e., image of n rows and m columns of pixels). Note that in the image coordinate point (x^j , y^i ), x^j and y^i correspond to column j and row i of the pixel, respectively. The physical and image coordinate frames are depicted in Fig. 3.14. The two frames are displaced such that the top-left corner of the playground has the image coordinate point (Jmin , Imin ), Jmin ≥ 0 and Imin ≥ 0; and the bottom-right corner has the image coordinate point (Jmax , Imax ), Jmax ≤ m and Imax ≤ n. Then, for an arbitrary image coordinate point (x^j , y^i ), the corresponding physical coordinate point is (x, y), given by

\[ x = \frac{x^j - J_{min}}{J_{max} - J_{min}} \times 150\,\text{cm}, \qquad y = \frac{I_{max} - y^i}{I_{max} - I_{min}} \times 130\,\text{cm}. \tag{3.19} \]

The intra-pixel spatial resolution of the image is defined by
Fig. 3.14. On mapping the image and physical coordinate points in the playground (the image x-axis runs along columns j and the image y-axis along rows i, from [1,1] to [n,m]; the playground corner at image point (Jmin , Imin ) is the physical point (0, 130) and the corner at (Jmax , Imax ) is the physical point (150, 0))
\[ \frac{L_g}{J_{max} - J_{min}}\ \text{cm per pixel (column-wise)}, \qquad \frac{B_g}{I_{max} - I_{min}}\ \text{cm per pixel (row-wise)}, \tag{3.20} \]
where the length Lg and breadth Bg of the playground are 150 cm and 130 cm, respectively. In general, the lower this intra-pixel spatial resolution is (i.e., the fewer centimetres each pixel spans), the higher the accuracy of the computed physical position (x, y). Note that elsewhere in this book, we rely on context rather than notation to indicate if a coordinate (x, y) is an image point or a physical point.

3.4.3 Example 1: System Hardware

Most participating teams of the previous FIRA Cups used similar vision hardware, differing only in the various vision software algorithms for computing the robots’ postures. There are many commercially available vision boards that support very high sampling rates, but these teams used the relatively cheaper boards that sample at 30 image frames per second.
Fig. 3.15. Frame grabber (Media camp 7 plus)
The hardware components of an example vision system are as follows:
1. Frame grabber: Media camp7 Plus (see Fig. 3.15) - DOOIN electronics
   a) Input: Video 1 / Video 2 / SVHS / TV.
   b) Real image capture: 240 × 320, 30 frames/sec.
   c) Video signal: NTSC / PAL.
2. Overhead camera: PULNiX TMC-7
   a) 768(H) × 494(V) resolution.
   b) Controllable shutter speed: 1/30 - 1/10,000 second.
   c) Lens: 8mm F1.3.
3. Host computer: Pentium PC.

The vision board uses the Y U V colour model at 4 : 1 : 1 format. This ratio means that 4 consecutive colour-filtered signal point values (sampled, quantized and stored) in a row of the 2-D signal array constitute a pixel in the 2-D pixel array of the image captured; of which two signal points are for the Y constituent colour, and one each is for the U and V constituent colours. Hence, the actual pixel resolution of the image is not 240 × 320, but 240 × 80.

3.4.4 Example 2: Vision Processing

This example illustrates the steps of vision processing for a MiroSoT team, up to and including Step 2d (see Section 3.4.1). The hardware in Example 1 is used.
1. Calibrate the vision system for colour recognition of target objects.
   • The calibration steps of determining and setting the Y U V intensity ranges for the different colours of the team ID patch, robot ID patch and the ball are as follows:
   a) Store a sampled image into memory, and display it on the host monitor screen.
   b) Do a scan of the area where an object (team robot or ball) with the target colour is.
   c) Scrutinize the Y , U and V constituent colour values of pixels in that area to determine the maximum and minimum values of each constituent colour for the target colour. To lessen any deviation due to noise, these maximum and minimum values are adjusted and readjusted several times, possibly on a different team robot patch, until each and every image pixel of the object (or that part of the object) which has the target colour appears on the monitor screen in that colour.
   d) Store the finalized maximum and minimum values in computer memory.
   e) Repeat Steps 1c-1d for each different target colour.
2. Run the vision system
   a) Apply a thresholding technique against the Y U V intensity ranges. In this process, the Y , U and V constituent values of every pixel in the 2-D colour image frame are checked to see if they fall in the corresponding constituent ranges of any target colour. If so, the value of the pixel is changed to a unity value, and its position is stored in association with the matched target colour.
   b) Apply a labelling algorithm to the binary image. The labelling algorithm used modifies Step 2 of the sequential labelling algorithm introduced in Section 3.3.5. The modified step is given below.
      If a pixel at [i, j] has unity value, then do the following:
      i. If position [i, j] is within the intra-component distance of an existing representative pixel of a component (say, with label L), then
         A. assign label L to the pixel,
         B. increment the pixel count for this component by 1.
      ii. Else,
         A. include the pixel as a representative of a new component and assign it a new component label,
         B. create and initialize the pixel count for this new component to 1.
      A component representative pixel is defined as the first unity-valued pixel found for a new component. A pixel at [i, j] is said to be within intra-component distance dc of a component representative pixel at [ir , jr ] (and therefore is also a pixel of the component) if

      \[ \sqrt{(i - i_r)^2 + (j - j_r)^2} \le d_c. \]

      This intra-component distance is user-specified.
The resulting algorithm is faster than the sequential algorithm but can erroneously map a bigger component if the objects come very close to one another. It assumes that the sizes of the target objects are known and will remain constant in the images captured; this assumption is, however, easily satisfied since the robots and the ball are of fixed sizes, and the overhead camera is viewing the whole playground vertically downward from a fixed height. Fig. 3.16 shows an example of a connected components image labelled by this modified algorithm.

Fig. 3.16. Example 1: Labelled components image (components 1 to 4, each with a marked representative pixel; pixel positions marked in the figure: [1,1], [3,3], [4,13], [9,5], [10,16], [13,21])
c) Do size filtering of the labelled connected components image. The size filter Af needs to be set. Suppose Af = 5, then component 4 in Fig. 3.16 will be filtered out (i.e., removed). d) Determine the (centre) positions of the remaining connected components. After filtering, three labelled connected components in Fig. 3.16, namely 1, 2 and 3, remain. Their centre coordinates (with respect to the image x-y axes) can be calculated using Eq. (3.8), and converted to physical coordinates using Eq. (3.19). For instance, for connected component 1, its centre position (¯ xj1 , y¯1i ) can be computed as follows:
\[ \bar{x}^j_1 = \frac{3 \times 3 + 4 \times 3 + 5 \times 2}{8} = 3.875, \qquad \bar{y}^i_1 = \frac{3 \times 2 + 4 \times 3 + 5 \times 3}{8} = 4.125. \]
The image pixel resolution is n × m, where n = 13 rows and m = 21 columns. Suppose that Imin = Jmin = 0, and Imax = 13 and Jmax = 21, i.e., the playground and its image boundaries coincide. Then the physical coordinate (x̄1 , ȳ1 ) of the centre position of component 1 is given by

\[ \bar{x}_1 = \frac{3.875}{21} \times 150\,\text{cm} \approx 27.68\,\text{cm}, \qquad \bar{y}_1 = \frac{13 - 4.125}{13} \times 130\,\text{cm} = 88.75\,\text{cm}. \]
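Two parts of Example 2 lend themselves to a short sketch: the modified labelling step based on representative pixels, and the image-to-physical mapping of Eq. (3.19). The code below is illustrative only. It assumes a Euclidean intra-component distance, assigns a pixel to the first representative found within that distance (the text does not say how ties between representatives are resolved), uses 1-based [i, j] positions as in the text, and takes the playground dimensions as default arguments; all function and parameter names are hypothetical.

```python
import math

def label_by_representative(B, d_c):
    """Modified Step 2: group unity pixels around component representative pixels."""
    reps = []        # reps[k] = (i_r, j_r), representative pixel of component k + 1
    counts = []      # counts[k] = pixel count of component k + 1
    labels = [[0] * len(row) for row in B]
    for i, row in enumerate(B, start=1):
        for j, v in enumerate(row, start=1):
            if not v:
                continue
            for k, (i_r, j_r) in enumerate(reps):
                if math.hypot(i - i_r, j - j_r) <= d_c:   # case i
                    labels[i - 1][j - 1] = k + 1
                    counts[k] += 1
                    break
            else:                                         # case ii: new component
                reps.append((i, j))
                counts.append(1)
                labels[i - 1][j - 1] = len(reps)
    return labels, reps, counts

def to_physical(x_j, y_i, j_min, j_max, i_min, i_max, length=150.0, breadth=130.0):
    """Eq. (3.19): map an image point (x^j, y^i) to a physical point (x, y) in cm."""
    x = (x_j - j_min) / (j_max - j_min) * length
    y = (i_max - y_i) / (i_max - i_min) * breadth
    return x, y

B = [[1, 1, 0, 0, 0, 1],
     [1, 0, 0, 0, 0, 1]]
labels, reps, counts = label_by_representative(B, d_c=1.5)

# Centre of component 1 from the worked example, with the image covering the playground.
x1, y1 = to_physical(3.875, 4.125, j_min=0, j_max=21, i_min=0, i_max=13)
```

The pixel counts gathered during labelling are exactly what the size filter of Step 2c needs, which is presumably why the modified step maintains them.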
3.4.5 Example 3: Information Extraction Following up on Example 2, this example shows how to recognize the target objects and compute the posture of each team robot. This addresses Step 2e of the vision process (see Section 3.4.1).
Fig. 3.17. A robot’s colour patch layout (two layouts, (a) and (b), each showing a robot ID colour patch and a team ID colour patch)
Two commonly used colour patch layouts for the top of a robot are as shown in Fig. 3.17. In Fig. 3.17(a), the placement of the robot ID colour square-patch on the upper left quadrant (on the square top of the robot), with the team ID
colour square-patch on the lower right quadrant, makes it possible to draw the robot’s orientation line through the centres of the square patches and the robot square top, and visually set the robot’s heading direction as shown in Fig. 3.18.
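Fig. 3.18. Computing a robot’s posture (physical X-Y frame of the playground with origin (0, 0) and far corner (220cm, 180cm); the figure marks the patch centres (xr , yr ) and (xt , yt ), the robot centre (x, y), the orientation line of the colour patches with angle θo , the heading direction with angle θ, and an angle of π/4)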
For the layout in Fig. 3.17(b), the robot’s orientation line can be drawn by applying the general technique introduced in Section 3.3.2 (on page 82) to the team ID colour hexagon-patch (placed obliquely on the square top of the robot). The placement of the robot ID colour triangle-patch on the upper right-hand corner, with the ‘base’ of the triangle-patch parallel to the orientation line, then provides a means to help visually set the robot’s heading direction.

In this example, the layout as shown in Fig. 3.17(a) is used. The target objects are recognized from the labelled connected components image as follows:
1. The Ball
   Identified as the connected component with the largest number of pixels.
2. Team Robot
   Identified from a pair of connected components, one representing a team ID colour patch and the other a robot ID colour patch. Let Dr and Dt contain the labels of components with robot ID colours and team ID colours, respectively, and do be the intra-object distance; do is user-specified. Then, the pairing is done using ‘Nearest Neighbour Association’, as follows:
   For each connected component L^r_p ∈ Dr , do the following:
   a) For each connected component L^t_q ∈ Dt ,
      • compute the distance d_pq between the centre points (x^r_p , y^r_p ) and (x^t_q , y^t_q ) of components L^r_p and L^t_q , respectively, using

        \[ d_{pq} = \sqrt{(x^t_q - x^r_p)^2 + (y^t_q - y^r_p)^2}. \]

   b) Select component L^t_s for which
      • d_ps = min{d_pq | for all L^t_q , given a L^r_p }.
   c) If the shortest distance d_ps ≤ do , then
      • pair up the components L^r_p and L^t_s .

After associating the (labelled) connected components with the target objects, each target robot’s posture can be computed. Referring to Fig. 3.18, the image coordinates (xr , yr ) and (xt , yt ) are, respectively, the centre points of the robot ID and team ID square colour patches that uniquely identify the team robot. The image coordinate (x, y) of the robot’s centre point can be calculated as follows:

\[ x = \frac{x_r + x_t}{2}, \qquad y = \frac{y_r + y_t}{2}. \tag{3.21} \]
To find the heading angle θ, the orientation angle θo should be computed first; this can actually be determined based on the general technique introduced in Section 3.3.2 (on page 82) by treating the team ID and robot ID colour patches as one whole, and using some means for direction indication. In this example, a simpler and perhaps more practical method is used by computing θo as follows:

\[ \theta_o = \tan^{-1}\left(\frac{y_{rt}}{x_{rt}}\right), \quad \text{where } y_{rt} = (y_r - y_t) \text{ and } x_{rt} = (x_r - x_t). \tag{3.22} \]
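The Nearest Neighbour Association and Eqs. (3.21) and (3.22) can be sketched together. The code is an illustration under assumptions: component centres are passed as dictionaries mapping a label to an (x, y) pair, and θo is computed with the two-argument arctangent, which uses the signs of y_rt and x_rt to select the quadrant and returns a value in [−π, π]. The step from θo to the heading angle θ depends on the patch layout and is not included. All names are hypothetical.

```python
import math

def pair_patches(robot_centres, team_centres, d_o):
    """Pair each robot ID patch with its nearest team ID patch, if within d_o."""
    pairs = {}
    if not team_centres:
        return pairs
    for p, (x_r, y_r) in robot_centres.items():
        s, d_ps = min(((q, math.hypot(x_t - x_r, y_t - y_r))
                       for q, (x_t, y_t) in team_centres.items()),
                      key=lambda item: item[1])
        if d_ps <= d_o:
            pairs[p] = s
    return pairs

def posture(robot_centre, team_centre):
    """Robot centre (Eq. 3.21) and orientation angle theta_o (Eq. 3.22)."""
    (x_r, y_r), (x_t, y_t) = robot_centre, team_centre
    x, y = (x_r + x_t) / 2, (y_r + y_t) / 2
    theta_o = math.atan2(y_r - y_t, x_r - x_t)   # quadrant-aware arctangent
    return x, y, theta_o

# Made-up patch centres, keyed by component label.
robot_centres = {1: (10.0, 10.0), 2: (50.0, 50.0)}
team_centres = {7: (12.0, 11.0), 8: (49.0, 52.0), 9: (90.0, 90.0)}
pairs = pair_patches(robot_centres, team_centres, d_o=5.0)
x, y, theta_o = posture((3.0, 4.0), (1.0, 2.0))
```

Note that nothing in this sketch prevents two robot ID patches from being paired with the same team ID patch; whether that needs handling depends on how close robots can get relative to do.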
Note that y_rt and x_rt can be positive or negative, and their signs together define the heading direction of the robot. Fixing θo ∈ (−π, π],

θo ∈ [0, π/2]       if y_rt ≥ 0 [...]
θo ∈ (π/2, π]       if y_rt ≥ 0 [...]
θo ∈ (−π/2, 0)      if y_rt [...]
θo ∈ (−π, −π/2]     if y_rt [...]

[...] r2 . The general form of the limit-cycle is thus derived, using which we can adjust the radius and the direction of the limit-cycle while maintaining motion stability. Importantly, it provides an easy programming basis for implementation, as will be elaborated in the next section. The limit-cycle method generates a local navigation plan by applying the limit-cycle characteristics exhibited in Eq. (4.31). The plan thus shows an efficient way by which the robot can avoid obstacles without having to move far away from them.

The Limit-Cycle Local Navigation. Figure 4.23 depicts the limit-cycle method which can drive a robot towards the desired direction and avoid an obstacle. At this time, the direction, either clockwise or counter-clockwise, should be decided. Fig. 4.24 shows a situation in robot soccer where the rightmost robot needs to avoid three obstacle robots A, B and C in moving towards the target (ball).
Fig. 4.23. Navigation using the limit-cycle method (labels: robot, desired direction, obstacle, rv)
Before applying the limit-cycle to local navigation, the required terminology is defined as follows.
• Rotational direction: The turning direction taken to avoid an obstacle, either counter-clockwise (CCW) or clockwise (CW).
4. How to Decide and Act?
Fig. 4.24. Multiple obstacle situation (labels: obstacles A, B and C; line l; target; tangent points T11, T12, T21, T22; On1, Od1, Od2)
• Variable obstacle (Ov): In general, the robot is assumed to be a point mass in a simulated situation. This may lead to collision with actual obstacles in a real implementation. So, we define the variable obstacle whose radius is decided by its relative position to the robot and the sizes of the obstacle and the robot, which will be explained later. Here, for simplicity, the variable obstacle is assumed to be circular. The circle of Ov will be a limit-cycle such that the robot follows the circle boundary of Ov.
• Variable radius (rv): The radius of the variable obstacle. It varies with the size of the robot and the obstacle’s relative position to the robot. If we use rv as the radius of a limit-cycle, the robot can navigate without collision with the obstacle.
• Disturbing obstacle (Od): Variable obstacles that are in the way between the robot and the target point. These obstacles are assigned consecutive numbers such that for any two such obstacles Odx and Ody, Odx is nearer to the robot than Ody if x < y. The disturbing obstacle nearest to the robot is designated Od1, and the next one is Od2, etc.
• Non-disturbing obstacle (On): Variable obstacles that are not in the way between the robot and the target point. The non-disturbing obstacle nearest to the robot is designated On1, and the next one is On2, etc.
• Tangent points (Tn1, Tn2): Intersection points of the circle with the variable radius and its tangent lines through the target point. Note that there are two tangent points on each obstacle.
Now, the steps of the limit-cycle method (for local navigation) are as follows:
1. Draw a line l from the robot to the target in a global coordinate OXY as follows:
4.6 Unified Navigation Control
$$a x + b y + c = 0. \tag{4.32}$$
2. Treat variable obstacles as disturbing obstacles (Odi) if the line l crosses them; otherwise, treat them as non-disturbing obstacles (On).
3. Move towards the target if there is no Od.
Fig. 4.25. Decision of rotational direction (labels: obstacle centre (Qx, Qy) with radius rv; robot (Rx, Ry); target (Gx, Gy); line l; distance d; axes X, Y)
4. Referring to Fig. 4.25, calculate the distance d from the centre of the nearest disturbing obstacle, Od, to the line l, using
$$d = \frac{a Q_x + b Q_y + c}{\sqrt{a^2 + b^2}}, \tag{4.33}$$
where (Qx, Qy), (Gx, Gy) and (Rx, Ry) are the xy-values of centre positions of the obstacle, the target and the robot, respectively. Eq. (4.30) is extended to fit to the navigation plan by substituting d and rv. If x1 and x2 are matched with x and y in the global coordinate OXY, calculate the desired direction of the robot at each position using
$$\begin{aligned} \dot{x} &= \frac{d}{|d|}\, y + x\,(r_v^2 - x^2 - y^2), \\ \dot{y} &= -\frac{d}{|d|}\, x + y\,(r_v^2 - x^2 - y^2), \end{aligned} \tag{4.34}$$
where x and y are relative values to the obstacle. In this equation, if d is positive, the robot avoids the obstacle Od clockwise. If d is negative, the avoidance takes place in a counter-clockwise direction. Calculate rv by the size of the robot and the relative position to the obstacle, using the following equation:
$$r_v = r_r + r_o + \delta, \tag{4.35}$$
where rr and ro are the radii of the robot and the obstacle, respectively. Here, ro = r, where r is the radius of the real obstacle if the obstacle is a disturbing one; ro = 0 if it is non-disturbing or virtual. δ is a safety margin for collision avoidance. As the robot moves, the line l varies. So, repeat Steps 2 ∼ 4 until the destination is reached. Note that to obtain rv experimentally, we enclose the obtained 2D image of the obstacle within an appropriate circle. The radius of the circle is ro. With rr and δ given and ro measured, rv is obtained using Eq. (4.35).
For example, suppose, as shown in Fig. 4.24, there are three obstacles between the robot and the target. The robot should move towards the target avoiding these obstacles, which are marked as A, B and C. First, a line l can be marked from the robot to the target (Step 1). This line goes through two obstacles B and C, so they are considered as Od1 and Od2, respectively, and obstacle A as On1 (Step 2). Using the direction of line l through Od1, the robot decides the direction in which it should avoid obstacle B. The counter-clockwise direction is chosen (Step 4) as shown in Fig. 4.26(a). It follows the chosen direction until it avoids obstacle B. Once the robot passes obstacle B, the line l ceases to go through obstacle B. Thus, obstacle B becomes On1 and obstacles C and A become Od1 and On2, respectively (Steps 1 ∼ 2). Applying the limit-cycle method again (to avoid obstacle C), the navigation path thus generated is as shown by the solid line in Fig. 4.26(b).
Extended Limit-Cycle Local Navigation. The robot in Fig. 4.27 is avoiding obstacle A counter-clockwise by the limit-cycle navigation method. Later, however, it moves clockwise beside obstacle B and hence it will get stuck in a local minimum between obstacle A and obstacle B. To overcome this problem, we have to add the following rule to Step 4. The distance d from the centre of the obstacle Od to the line l can be calculated as
$$d = \frac{a Q_x + b Q_y + c}{\sqrt{a^2 + b^2}}. \tag{4.36}$$
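The basic limit-cycle steps (the line l, the signed distance d, the variable radius rv and the direction field of Eq. (4.34)) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are invented here, the text only gives the general form ax + by + c = 0, so the particular sign convention chosen for (a, b, c) is an assumption, and d = 0 is arbitrarily treated as positive.

```python
import math

def line_through(robot, target):
    """Coefficients (a, b, c) of a*x + b*y + c = 0 through robot and target.

    Assumed sign convention: an obstacle to the left of the direction of
    travel gets a negative distance d.
    """
    (rx, ry), (gx, gy) = robot, target
    a = gy - ry
    b = rx - gx
    c = -(a * rx + b * ry)
    return a, b, c

def signed_distance(line, obstacle):
    """Signed distance from the obstacle centre (Qx, Qy) to the line."""
    a, b, c = line
    qx, qy = obstacle
    return (a * qx + b * qy + c) / math.hypot(a, b)

def variable_radius(r_robot, r_obstacle, delta):
    """Eq. (4.35): rv = rr + ro + delta."""
    return r_robot + r_obstacle + delta

def limit_cycle_direction(robot, obstacle, d, r_v):
    """Eq. (4.34): desired direction (x_dot, y_dot) at the robot position."""
    x = robot[0] - obstacle[0]          # position relative to the obstacle
    y = robot[1] - obstacle[1]
    s = 1.0 if d >= 0 else -1.0         # d/|d|; d == 0 treated as +1 (assumption)
    radial = r_v**2 - x**2 - y**2       # pushes towards the circle of radius rv
    return (s * y + x * radial, -s * x + y * radial)

# Hypothetical scene: robot at the origin, target on the x-axis,
# one obstacle slightly to the left of the straight path.
robot, target, obstacle = (0.0, 0.0), (10.0, 0.0), (5.0, 1.0)
d = signed_distance(line_through(robot, target), obstacle)
r_v = variable_radius(0.75, 0.75, 0.5)
vx, vy = limit_cycle_direction(robot, obstacle, d, r_v)
```

With these numbers d is −1, so the avoidance is counter-clockwise, and the robot passes on the side of the obstacle that is closer to the straight line. Only the direction of (vx, vy) matters as a heading command; its magnitude grows quickly away from the circle, so an implementation would probably normalise it.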
If more than two variable obstacles are overlapped, they can be regarded as one obstacle and a new central position of obstacles can be defined as
$$Q_x = \frac{1}{n} \sum_{k=0}^{n} Q_{xk}, \qquad Q_y = \frac{1}{n} \sum_{k=0}^{n} Q_{yk}, \tag{4.37}$$
where Qxk and Qyk are the x-y coordinate values of the centre positions of the overlapped obstacles. With this (Qx, Qy), a new distance d for all overlapped Od’s can be calculated. In the global coordinate OXY, the desired direction of the robot at each position can be calculated from
Fig. 4.26. Navigation example
$$\begin{aligned} \dot{x} &= \frac{d}{|d|}\, y + x\,(r_v^2 - x^2 - y^2), \\ \dot{y} &= -\frac{d}{|d|}\, x + y\,(r_v^2 - x^2 - y^2), \end{aligned} \tag{4.38}$$
where x and y are relative values to the obstacle. rv can be obtained from Eq. (4.35). For example, Fig. 4.28 shows three overlapped variable obstacles. First, a new (Qx, Qy) can be calculated by the modified Step 4. Substituting this in Eq. (4.36), d can be calculated. Obstacle B is the closest to the robot.
Fig. 4.27. Local minima with two overlapped obstacles
So, with rv and d, a navigation plan to avoid the obstacle can be provided by Eq. (4.38). Finally, the robot can move towards the target while avoiding the three obstacles, as shown in Fig. 4.28.
Fig. 4.28. Extended navigation method
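The extension for overlapped obstacles can be sketched as below. Two points are assumptions made here rather than statements from the text: Eq. (4.37) is read as the centroid (arithmetic mean) of the overlapped centres, and two variable obstacles are taken to overlap when the distance between their centres is smaller than the sum of their variable radii. The function names are hypothetical.

```python
import math

def overlapping(centre_a, r_a, centre_b, r_b):
    """Assumed overlap test: the two circles of variable radius intersect."""
    dist = math.hypot(centre_a[0] - centre_b[0], centre_a[1] - centre_b[1])
    return dist < r_a + r_b

def merged_centre(centres):
    """Centroid of the overlapped obstacle centres (reading of Eq. (4.37))."""
    n = len(centres)
    qx = sum(x for x, _ in centres) / n
    qy = sum(y for _, y in centres) / n
    return qx, qy

# Three hypothetical overlapped obstacles, treated as one.
group = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
qx, qy = merged_centre(group)
```

The merged centre (qx, qy) then replaces (Qx, Qy) when computing d with Eq. (4.36), so that all obstacles in the group share one rotational direction and the robot is not pulled into the gap between them.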
Application to Robot Soccer. In the previous section, the limit-cycle navigation method was proposed for avoiding obstacles and moving to a target. In robot soccer, a primary task of the robot is to kick the ball into the opponent goal. So, as shown in Fig. 4.29(a), when the robot reaches the ball, it has to position itself behind the ball in such a way that it faces the opponent goal area. In other words, both the final position and the final direction must be suitable for kicking. To apply the limit-cycle navigation method to robot soccer, the following rule should be added to Step 4 in Section 2.4. Putting two virtual variable obstacles on either side of the ball, the modified target, where the target heading is the centre of the goal, is on the extended line and
$$d = \begin{cases} -1\ (\mathrm{CCW}) & \text{if a virtual variable obstacle is on the left of the ball,} \\ \phantom{-}1\ (\mathrm{CW}) & \text{if a virtual variable obstacle is on the right of the ball.} \end{cases} \tag{4.39}$$
Fig. 4.29. Robot soccer example: (a) the limit-cycle navigation method without virtual variable obstacles; (b) the limit-cycle navigation method with virtual variable obstacles
For example, a situation as in Fig. 4.29(a) is modified to a situation where it is assumed that two virtual variable obstacles are on either side of the ball as in Fig. 4.29(b). Thus, the modified target is on the extended line from the opponent goal to the ball and is adjacent to virtual variable obstacles with the minimum variable radius rvmin. In robot soccer, however, the robot moves as fast as 150 cm/s. It is meaningless to calculate the minimum variable radius without considering the centrifugal velocity. The following equation shows the non-slippery minimum radius of the robot:
$$r_{v\min} \le \frac{m v_c^2}{F_c} \tag{4.40}$$
where m is the mass of the robot, vc is the centrifugal velocity and Fc is the frictional force of the robot. It should be noted that the upper limit of the minimum variable radius is constrained by the centrifugal velocity and the frictional force of the robot. Since the frictional force is fixed and measurable, the minimum variable radius can be decided if the centrifugal velocity is known. In Fig. 4.29(b), On1 is on the left side of the ball, so d = −1 by the above rule; for Od3, d = 1. Then, the robot moves using the limit-cycle navigation plan as shown in Fig. 4.29(b). Without this modification, if there is no obstacle A in Fig. 4.29(b), d is negative by the former limit-cycle navigation method, so the robot may kick the ball towards its home side.
Since the limit-cycle navigation method does not calculate all the trajectories in the current situation, but only calculates the next trajectory of the robot using the current positions of the target and any obstacle relative to the robot, this method generates the navigation plan incrementally and “adapts” to the dynamically changing environment. The limit-cycle navigation method, as already described, can adjust the direction and the safety distance for obstacle avoidance, and so is applicable to robot soccer.
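The two soccer-specific rules can be written down directly. The sketch below encodes the direction rule of Eq. (4.39) and evaluates the right-hand side of Eq. (4.40); the function names are hypothetical, and the mass and friction values are made-up illustration numbers (only the speed of 150 cm/s comes from the text).

```python
def virtual_obstacle_direction(side):
    """Eq. (4.39): -1 (CCW) for a virtual obstacle on the left of the ball,
    +1 (CW) for one on the right."""
    if side == "left":
        return -1
    if side == "right":
        return 1
    raise ValueError("side must be 'left' or 'right'")

def radius_bound(mass, v_c, f_c):
    """Right-hand side of Eq. (4.40): m * vc^2 / Fc."""
    return mass * v_c**2 / f_c

# Hypothetical robot: 0.5 kg, moving at 1.5 m/s (150 cm/s),
# with an assumed frictional force of 2.25 N.
bound = radius_bound(mass=0.5, v_c=1.5, f_c=2.25)   # in metres
d_left = virtual_obstacle_direction("left")
d_right = virtual_obstacle_direction("right")
```

Because the bound scales with the square of the speed, halving the approach speed reduces it by a factor of four, which is why the text ties the choice of rvmin to the velocity.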
Notes on Selected References
The book [29] provides excellent coverage of PID control. Aspects of digital PID and its implementation can be found in the books [29, 36]. A special journal issue on PID control [30] focusses on the design methods and future potential of PID control. For an introduction to robot navigation, refer to the textbook [37]. The univector method originated with the work of [38]. The limit-cycle navigation method originated with the work reported in [39]. Several other navigation methods have been reported for robot soccer, including [40] that proposes an optimal path generating navigation method using a combination of a geometric method and a fuzzy logic method optimized using evolutionary programming. The limitations of potential field navigation methods are discussed in [33, 34]. Earlier unified navigation methods include motor-schema [3], navigation templates [41, 42] and artificial potential functions [43].
5. How to Improve Intelligence? Use Soft Computing Techniques
5.1 Introduction
The field of robot soccer provides numerous opportunities for the application of AI methods for game strategy development. As mentioned in the previous chapter, good strategies are needed to decide the roles and actions of team robots during the game. Chapter 4 has introduced a hybrid control architecture in which these strategies can be organized or integrated for proper management and control. In general, building a proper strategy is best guided by the intelligence aspects of search and evolution, knowledge representation and inference, and learning and adaptation. In this chapter, these aspects of intelligence as needed by the DECIDE and ACT primitives, and their importance, are first discussed. The basics of some widely known soft-computing paradigms that make concrete (at least one of) these abstract aspects are then introduced. They include the formalisms of Petri nets, Q-learning, neural networks, evolutionary programming and fuzzy logic. Along the way, the use of each paradigm for formulating strategies in robot soccer is motivated through simplified examples taken from previous FIRA Cup MiroSoT teams that demonstrate and emphasize its applicability in control, either at high-level (also called supervisory) or low-level. More specifically, for each paradigm, one or two examples are provided that address some key issues at specific hierarchical levels of the hybrid control architecture introduced in Chapter 4. By this, however, we do not imply that these paradigms cannot be applied at the other levels. The reader interested in the performance evaluations of the example techniques presented should consult the research papers referenced therein. As a note of caution though, the performance evaluation results are often inconclusive and are frequently based on limited empirical testing. As each paradigm is an elaborate field in itself, the reader interested in these paradigms in general should consult the many textbooks referenced.
J.-H. Kim, D.-H. Kim, Y.-J. Kim, K.-T. Seow: Soccer Robotics, STAR 11, pp. 141-204, 2004 Springer-Verlag Berlin Heidelberg 2004
5.2 Intelligence Basics
The central theme in soccer robotics is the concept of an intelligent agent. The notion of such an agent has been defined in Chapter 1. To build the decision-making mechanism of an agent or multi-agent program, it is useful to understand the various aspects that guide the realization of the high-level features underlying agent intelligence, namely, autonomy, reactiveness, pro-activeness and communicativeness. One view held is that these features constitute the primary basis of intelligence, and can be combined in various ways to give rise to other agent features such as cooperation, robustness, fault-tolerance and reliability. The three mutually dependent aspects, also called intelligence basics, that guide the building of such features into agents, are
1. search and evolution,
2. knowledge representation and inference,
3. learning and adaptation.
5.2.1 Search and Evolution
Given an objective to attain, an agent often needs to decide what to do next by systematically considering the outcomes of various sequences of actions it might take. A state is a discrete representation of the relevant aspects of the agent’s working environment. In the space of states that includes states satisfying the objective, each action taken will lead the agent from one state to another. In general, with several immediate options of unknown values, the agent can decide what to do by first examining different possible sequences of actions that lead to states of known values, and then choosing the best sequence. The process of looking for such a sequence is called search. A search algorithm takes a problem as input and returns a solution in the form of an action sequence. The agent then uses the solution to guide its actions, doing whatever the solution recommends as the next action to take, and then removing the action from the sequence. Once the sequence has been executed, the objective is said to be achieved, and the agent will find or be assigned a new objective. Finding a solution is done by searching through the state space.
The search procedure involves expanding a current state that is not an objective-satisfying state by applying operators to it to generate a new set of states, from which the agent needs to choose one. The essence of search then is in choosing one option and putting the others aside, in case the first choice does not lead to a solution. Choosing, testing and expanding continue until a solution is found, or there are no more states to be expanded. The choice of which state to expand first is determined by the search strategy, evaluated in terms of the following criteria.
• Completeness: Is the strategy guaranteed to find a solution when there is one?
• Time complexity: How long does it take to find a solution?
• Space complexity: How much memory does it require to perform the search?
• Optimality: Does the strategy find the highest-quality solution with respect to some objective function when there are several alternative solutions?
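As one concrete instance of the expand-test-choose loop described above, the following sketch implements breadth-first search, a standard strategy chosen here purely for illustration (the text does not single out any particular strategy). The grid world, the blocked cells and all names are hypothetical.

```python
from collections import deque

def breadth_first_search(start, is_goal, successors):
    """Return a list of actions from start to a goal state, or None.

    successors(state) yields (action, next_state) pairs, i.e. the result
    of applying each operator to the expanded state.
    """
    frontier = deque([(start, [])])     # states waiting to be expanded
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if is_goal(state):              # test before expanding
            return actions
        for action, nxt in successors(state):
            if nxt not in visited:      # other options are kept aside in the queue
                visited.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None                         # no more states to expand

# Hypothetical 3x3 grid with two blocked cells between start and goal.
BLOCKED = {(1, 0), (1, 1)}
MOVES = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def successors(state):
    x, y = state
    for action, (dx, dy) in MOVES.items():
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < 3 and 0 <= nxt[1] < 3 and nxt not in BLOCKED:
            yield action, nxt

plan = breadth_first_search((0, 0), lambda s: s == (2, 0), successors)
```

In terms of the four criteria, breadth-first search on a finite state space is complete and returns a solution with the fewest actions, but its memory use grows with the number of states on the frontier.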
We refer the reader to the book [28] for the many search algorithms using various basic strategies.
Evolution. In the literature, a class of stochastic search methods inspired by the process of natural evolution has emerged. These are called evolutionary algorithms (EAs), and are distinguished from classical search algorithms by the evolutionary paradigm wherein a population of candidate solutions undergoes iterative operations of variation and selection. There are three predominant EAs, namely, evolutionary programming, evolution strategies and genetic algorithms; more will be said about them in Section 5.6. A drawback of any evolutionary algorithm is that a solution is ‘better’ only in comparison to other, presently known solutions; such an algorithm actually has no concept of an ‘optimal solution,’ or any way to test whether a solution is optimal. For this reason, evolutionary algorithms are best employed on problems where it is difficult or impossible to test for optimality. This also means that an evolutionary algorithm never knows for certain when to stop, aside from the length of time, or the number of iterations or candidate solutions, that it is initially allowed to explore.
5.2.2 Knowledge Representation and Inference
States and actions need to be appropriately represented as knowledge in order that an agent can maintain a relevant description of its environment as new sensory information (or percepts) arrives, and draw new inferences to decide a course of action to achieve its goal. The terms ‘inference’ and ‘reasoning’ are generally used to cover any process by which conclusions are reached. A knowledge representation (KR) is not a data structure; what makes it representational is that it carries meaning called semantics, i.e., there is a correspondence between its constructs, called syntax, and the things it models in the external environment. That correspondence in turn carries with it some constraints.
While every representation must be implemented in the machine by some data structure, the representational property is in the correspondence to the things it models in the environment, and in the constraints this correspondence imposes. Some key schemes used for knowledge representation and reasoning (KRR) are
1. Procedural knowledge: Knowledge is encoded in functions/procedures.
2. Networks: A compromise between declarative and procedural schemes, knowledge is represented in a labeled, directed graph whose nodes represent concepts and entities, while its arcs represent relationships between these entities and concepts.
3. Frames: A network in which each node represents prototypical concepts and/or situations. Each node has several property slots whose values may be specified or inherited by default.
4. Logic: A way of declaratively representing and inferring knowledge.
5. Decision trees: Concepts are organized in the form of a tree.
6. Statistical knowledge: The use of certainty factors.
7. Rules: The use of production systems to encode condition-action rules.
8. Hybrid schemes: Any representation formalism employing a combination of KRR schemes.
5.2.3 Learning and Adaptation
Learning is perhaps the only way an agent can acquire what it needs to know in the absence of complete knowledge about the environment that the agent designer can build into the agent. Learning provides autonomy, and helps the agent improve its behaviour through diligent study of its own experience. Here, no explicit distinction is made between adaptation and learning; instead, it is assumed that adaptation is covered by learning, in that according to common usage, the term adaptation is only applied to those self-modifications that enable an agent to survive in a changed environment. There is a great variety in the possible forms of learning for an agent in a multi-agent environment, and there are several key criteria that may be applied in order to structure this variety. Two standard examples of such criteria, which are well known in the field of machine learning (ML), are the following:
1. The learning method or strategy used by a learning entity (a single agent or several agents). The following methods are usually distinguished.
• rote learning (i.e., direct implantation of knowledge and skills without requiring further inference or transformation from the learner);
• learning from instruction and by advice taking (i.e., operationalization - transformation into an internal representation and integration with prior knowledge and skills - of new information like an instruction or a piece of advice that is not directly executable by the learner);
• learning from examples and by practice (i.e., extraction and refinement of knowledge and skills like a general concept or a standardized pattern of motion from positive and negative examples or from practical experience);
• learning by analogy (i.e., solution-preserving transformation of knowledge and skills from a solved to a similar but unsolved problem);
• learning by discovery (i.e., gathering new knowledge and skills by making observations, conducting experiments, and generating and testing hypotheses or theories on the basis of the observational and experimental results).
A major difference between these methods lies in the amount of learning effort required by them (increasing from top to bottom).
2. The learning feedback that is available to a learning entity and that indicates the performance level achieved so far. This criterion leads to the following usual distinction.
• supervised learning (i.e., the feedback specifies the desired activity of the learner and the objective of learning is to match this desired action as closely as possible);
• reinforcement learning (i.e., the feedback only specifies the utility of the actual activity of the learner and the objective is to maximize this utility);
• unsupervised learning (i.e., no explicit feedback is provided and the objective is to find out useful and desired activities on the basis of trial-and-error and self-organization processes).
In all three cases the learning feedback is assumed to be provided by the system environment or the agents themselves. This means that the environment or an agent providing feedback acts as a ‘teacher’ in the case of supervised learning and as a ‘critic’ in the case of reinforcement learning; in the case of unsupervised learning, the environment and the agents just act as passive ‘observers’. It is important to see that different agents do not necessarily have to learn on the basis of the same learning method or the same type of learning feedback. Moreover, in the course of learning an agent may employ different learning methods and types of learning feedback. Both criteria directly or indirectly lead to the distinction between learning and teaching agents, and they show the close relationship between multiagent learning on the one hand and teaching and tutoring on the other. Examples of criteria other than these two standard ones, together with a brief description of their extreme values, are the following:
1. The purpose and goal of learning. This criterion allows one to distinguish between the following two extremes (and many gradations in between them).
• Learning that aims at an improvement with respect to one single agent, its skills and abilities.
• Learning that aims at an improvement with respect to the agents as a unit, their coherence and coordination.
This criterion could be refined with respect to the number and compatibility of the learning goals pursued by the agents. Generally, an agent may pursue several learning goals at the same time, and some of the learning goals pursued by the agents may be incompatible while others are complementary.
2. The decentralization of a learning process (where a learning process consists of all activities carried out by one or more agents in order to achieve a particular learning goal). This criterion concerns the degree of distribution and parallelism, and there are two obvious extremes:
• only one of the available agents is involved in the learning process, and the learning steps are neither distributed nor parallelized;
• all available agents are involved, and the learning steps are ‘maximally’ distributed and parallelized.
Of course, the degree of decentralization may vary for different learning processes.
3. An agent’s involvement in a learning process. With respect to the importance of involvement, one can identify the following two extremes:
• the involvement of the agent under consideration is not a necessary condition for achieving the pursued learning goal (e.g., because it can be replaced by another equivalent agent);
• the learning goal cannot be achieved without the involvement of exactly this agent.
Other aspects of involvement that could be applied in order to refine this criterion are its duration and intensity. It also has to be taken into consideration that an agent may be involved in several learning processes, because it may pursue several learning goals.
4. The agent-agent and agent-environment interaction required for realizing a learning process. Two obvious extremes are the following:
• learning requires only a minimal degree of interaction;
• learning would not be possible without extensive interaction.
This criterion could be further refined with respect to the frequency, persistence, level, pattern and type of interaction.
Many combinations of different values for these criteria are possible. For instance, one might think of a small group of agents that intensively interact (by discussing, negotiating, etc.) in order to understand why the overall system performance has decreased in the past, or of a large group of agents that loosely interact (by sometimes giving advice, sharing insights, etc.) in order to enhance the knowledge base of one of the group members. The above criteria characterize learning in multi-agent systems at the single-agent and the total-system level, and they define a large space of possible forms of multiagent learning.
Each point in this space represents a form of multiagent learning having its specific characteristics and its specific demands on the skills and abilities of the individual agents.
5.3 Petri Nets
Petri nets are a tool for the study of systems. Petri net theory allows a system to be modelled by a Petri net, a mathematical representation of the system. The model should encapsulate what the designer feels are the important aspects of the system to be developed.
5.3.1 Petri Net Structure and Graph
A Petri net is composed of four parts: a set of places P, a set of transitions T, an input function I, and an output function O. The input and output functions relate transitions and places. The input function I is a mapping from a transition tj to a collection of places I(tj), known as the input places of the transition. The output function O maps a transition tj to a collection of places O(tj) known as the output places of the transition. Formally, the structure of a Petri net, defined by its places, transitions, input function, and output function, is given as follows:
Definition 5.3.1 (Petri Net Structure). A Petri net structure, C, is a four-tuple, C = (P, T, I, O). P = {p1, p2, . . . , pn} is a finite set of places, n ≥ 1. T = {t1, t2, . . . , tm} is a finite set of transitions, m ≥ 1. The set of places and the set of transitions are disjoint, i.e., P ∩ T = ∅. $I : T \to P^\infty$ is the input function, a mapping from transitions to bags of places. $O : T \to P^\infty$ is the output function, a mapping from transitions to bags of places.
The cardinality of the set P is n, and the cardinality of the set T is m. We denote an arbitrary element of P by pi, i = 1, . . . , n, and an arbitrary element of T by tj, j = 1, . . . , m. A place pi is an input place of a transition tj if pi ∈ I(tj); pi is an output place if pi ∈ O(tj). The inputs and outputs of a transition are collections of places called bags. A bag is a generalization of sets which allows multiple occurrences of an element in a bag. In other words, an element may be in a bag zero times (not in the bag), or one time, two times, three times or any specified number of times. The use of bags, rather than sets, for the inputs and outputs of a transition allows a place to be a multiple input or a multiple output of a transition. Corresponding to a place and a transition in a Petri net structure are a circle ◦ and a bar | in its graphical representation. For convenience, we simply call the circles places and the bars transitions. A Petri net graph is often useful in illustrating the concepts of Petri net theory.
Fig. 5.1 shows an example of a Petri net structure and its graphical representation. A line connecting a place and a transition is called an arc. Multiple lines connecting one place to one transition indicate that the place is a multiple input or output of the transition.

C = (P, T, I, O)
P = {p1, p2, p3, p4, p5, p6}
T = {t1, t2, t3, t4, t5}
I(t1) = {p1}                  O(t1) = {p2, p3}
I(t2) = {p3}                  O(t2) = {p3, p5, p5}
I(t3) = {p2, p3}              O(t3) = {p2, p4}
I(t4) = {p4, p5, p5, p5}      O(t4) = {p4}
I(t5) = {p2}                  O(t5) = {p6}
Fig. 5.1. A Petri net structure and its graph: (a) Structure; (b) Graph

5.3.2 Petri Net Markings
A marking µ is an assignment of tokens to the places of a Petri net. A token is a primitive concept for Petri nets, like places and transitions are. Tokens are assigned to, and can be thought to reside in, the places of a Petri net. The number of tokens and the places they reside in may change during the execution of a Petri net. The tokens are used to define the execution of a Petri net.
Definition 5.3.2 (Marking). A marking µ of a Petri net C = (P, T, I, O) is a function from the set of places P to the nonnegative integers N, µ : P → N. It can also be defined as an n-vector, n = |P|, such that µ = (µ(p1), µ(p2), · · · , µ(pi), · · · , µ(pn)). µ(pi) gives the number of tokens in place pi.
Definition 5.3.3 (Marked Petri Net). A marked Petri net M = (C, µ) is a Petri net structure C = (P, T, I, O) and a marking µ. It is sometimes written as M = (P, T, I, O, µ).
On a Petri net graph, a token is represented by a small dot • in a place. In cases where the number of tokens µ(p) assigned to place p is large, the convention is to write the number inside the place.
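The structure of Fig. 5.1 and a marking can be held in ordinary Python containers. The sketch below is one possible encoding, not a prescribed one: bags are represented with `collections.Counter` (a multiset), the variable names are chosen here for illustration, and the initial marking with a single token in p1 is a hypothetical example.

```python
from collections import Counter

# Petri net structure C = (P, T, I, O) of Fig. 5.1; each bag is a Counter.
places = ["p1", "p2", "p3", "p4", "p5", "p6"]
transitions = ["t1", "t2", "t3", "t4", "t5"]

inputs = {
    "t1": Counter(["p1"]),
    "t2": Counter(["p3"]),
    "t3": Counter(["p2", "p3"]),
    "t4": Counter(["p4", "p5", "p5", "p5"]),   # p5 is a triple input of t4
    "t5": Counter(["p2"]),
}
outputs = {
    "t1": Counter(["p2", "p3"]),
    "t2": Counter(["p3", "p5", "p5"]),         # p5 is a double output of t2
    "t3": Counter(["p2", "p4"]),
    "t4": Counter(["p4"]),
    "t5": Counter(["p6"]),
}

# A marking assigns a nonnegative token count to every place
# (hypothetical example: one token in p1, none elsewhere).
marking = Counter({"p1": 1})

def marking_vector(marking):
    """The marking as an n-vector (mu(p1), ..., mu(pn))."""
    return tuple(marking[p] for p in places)
```

A `Counter` returns 0 for a place that holds no tokens, which matches the definition of a marking as a function from P to the nonnegative integers.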
5.3.3 Rules for Petri Net Execution
The execution of a Petri net is controlled by the number and distribution of tokens in the Petri net. Tokens reside in the places and control the executions of the transitions of the net. A Petri net executes by firing transitions. A transition is said to fire by removing tokens from its input places and creating new tokens which are distributed to its output places. A transition can fire provided it is enabled. A transition is enabled if each of its input places has at least as many tokens in it as the arcs from the place to the transition. Multiple tokens are needed for multiple input arcs. The tokens in the input places which enable a transition are its enabling tokens. For example, if the only inputs to transition t4 are places p1 and p2, i.e., input bag I(t4) = {p1, p2}, then t4 is enabled if p1 has at least one token and p2 has at least one token. For a transition t7 with input bag I(t7) = {p6, p6, p6}, place p6 must have at least three tokens to enable t7.
A transition fires by removing all of its enabling tokens from its input places and then depositing into each of its output places one token for each arc from the transition to the place. A transition t3 with I(t3) = {p2} and O(t3) = {p7, p13} is enabled whenever there is at least one token in place p2. Transition t3 fires by removing one token from place p2 and depositing one token in place p7 and one token in place p13 (its outputs). Extra tokens in place p2 are not affected by firing t3, although they may enable additional firings of t3. A transition t2 with I(t2) = {p21, p23} and O(t2) = {p23, p25, p25} fires by removing one token from p21 and one token from p23 and then depositing one token in p23 and two tokens in p25 (since p25 occurs twice in the output bag O(t2)). Note that firing a transition will in general change the marking µ of the Petri net to a new marking µ′.
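The enabling and firing rules translate almost literally into code. The following sketch again represents bags and markings with `collections.Counter`; the function names are assumptions made for illustration, and the example reuses the transitions t2 and t7 described above with hypothetical markings.

```python
from collections import Counter

def is_enabled(marking, input_bag):
    """A transition is enabled if every input place holds at least as many
    tokens as it occurs in the input bag."""
    return all(marking[p] >= k for p, k in input_bag.items())

def fire(marking, input_bag, output_bag):
    """Return the new marking after firing; the old marking is left unchanged."""
    if not is_enabled(marking, input_bag):
        raise ValueError("transition is not enabled")
    new_marking = Counter(marking)
    new_marking.subtract(input_bag)     # remove the enabling tokens
    new_marking.update(output_bag)      # deposit one token per output arc
    return new_marking

# Transition t2 from the text: I(t2) = {p21, p23}, O(t2) = {p23, p25, p25}.
m0 = Counter({"p21": 1, "p23": 1})
m1 = fire(m0, Counter(["p21", "p23"]), Counter(["p23", "p25", "p25"]))

# Transition t7 needs three tokens in p6; two are not enough.
t7_enabled = is_enabled(Counter({"p6": 2}), Counter(["p6", "p6", "p6"]))
```

After firing, `m1` holds no token in p21, one in p23 and two in p25. Because `fire` refuses to run on a transition that is not enabled, no token count can become negative.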
Since only enabled transitions can fire, the number of tokens in each place always remains nonnegative when a transition is fired. Firing a transition can never try to remove a token which is not there. If there are not enough tokens in any input place of a transition, then the transition is not enabled and cannot fire. Transition firings can continue as long as there exists at least one enabled transition. If ever there is no enabled transition, the execution halts.

To summarize, the execution rules described above only specify when a transition is enabled and how tokens are distributed and re-distributed when an enabled transition fires. In building a system using Petri nets, in general, one needs to complete the Petri net design with a formulation of ‘liveness’ conditions that trigger the actual firing of each enabled transition; this is an important but application-dependent design issue.

5.3.4 Example 1: Role Level

This example is taken from [44]. It illustrates how a role-assignment supervisor is designed using Petri nets. With the role of a robot fixed as goalkeeper,
5. How to Improve Intelligence?
the supervisor, according to the game situation, assigns the role of attacking or defending to each of the other two robots. To read the game situation, the supervisor continually receives feedback information on the robots’ postures and the ball’s position.

Supervisor Model. The following places and transitions are defined for a Petri net model C^s of the supervisor:

Structure       : C^s = (P, T, I, O).
Places          : P = {P1, P2, P3, P4, P5, P6, P7, P8}.
Transitions     : T = {T1, T2, T3, T4, T5, T6}.
Input function  : I(T1) = {P1}, I(T2) = {P2}, I(T3) = {P1, P3}, I(T4) = {P4, P5}, I(T5) = {P7, P8}, I(T6) = {P2, P6}.
Output function : O(T1) = {P2}, O(T2) = {P1}, O(T3) = {P1, P4, P8}, O(T4) = {P3}, O(T5) = {P6}, O(T6) = {P2, P5, P7}.
The following places are defined to model the supervisor:

P1 : robot 1 defending, robot 2 attacking,
P2 : robot 1 attacking, robot 2 defending,
P3 : robot 1 attacking,
P4 : robot 1 defending,
P5 : both robot 1 and robot 2 defending,
P6 : robot 2 attacking,
P7 : robot 2 defending,
P8 : both robot 1 and robot 2 attacking.
The transitions are defined as follows:

T1 : robot 1 is in a good position to attack,
T2 : robot 2 is in a good position to attack,
T3 : robot 1 is presently defending, but it is in a good position to attack, and it takes the attacking role,
T4 : robot 1 is presently attacking, but it is in a good position to defend, and it takes the defending role,
T5 : robot 2 is presently defending, but it is in a good position to attack, and it takes the attacking role, and
T6 : robot 2 is presently attacking, but it is in a good position to defend, and it takes the defending role.

The graph of this Petri net supervisor is shown in Fig. 5.2. To start with, there are three tokens in P1, P4 and P6, meaning that initially robot 1 is defending and robot 2 is attacking. In the following instant, if transition T1 is fired, the token in P1 moves to P2 (see Fig. 5.3(a)), then T6 is fired, and the tokens in P2 and P6 move to P2, P5 and P7 (see Fig. 5.3(b)). When T4 fires, the tokens in P4 and P5 move to P3 (see Fig. 5.3(c)). The team robots used are assumed to be good at moving straight, but not so good at turning. Thus, when a robot is close to the ball and oriented
Fig. 5.2. A Petri-net graph for role assignment supervision

Fig. 5.3. A Petri-net supervisor for role assignment: transition firings and token redistributions
towards it, it can attack more effectively than the other robot, and should change its role to attacking if it is not already attacking. For the Petri net supervisor, this has to be formalized as ‘liveness’ conditions for transitions T1 and T2. To do so for transition T1, with i ∈ {1, 2}, let

1. d_i be the distance between robot i and the ball;
2. θ_i be the angle between the heading direction of robot i and the ball.

This is depicted in Fig. 5.4.

Fig. 5.4. Role selection: who should attack?

Then, an example of the ‘liveness’ condition for transition T1, used for assigning the role of attacking or defending to the two team robots, is given as follows: Suppose robot 1 plays the defending role while robot 2 plays the attacking role. Then, if

\[ d_1 < 2 d_2, \quad -45^\circ < \theta_1 < 45^\circ \quad \text{and} \quad |\theta_1| < |\theta_2|, \tag{5.1} \]

the roles of robot 1 and robot 2 are interchanged. Note that by convention, the angle θ_i is positive if, in the counter-clockwise direction, the heading directional line is ‘leading’ (relative to the directional line from the robot’s centre to the ball), and negative if it is ‘lagging’.
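Condition (5.1) translates directly into a boolean test. The following is a minimal sketch; the function name should_swap_roles and the choice of degrees as the angle unit are assumptions made here for illustration.

```python
def should_swap_roles(d1, d2, theta1_deg, theta2_deg):
    """Liveness condition (5.1) for T1: robot 1 (defending) takes over the
    attacking role from robot 2 when all three inequalities hold."""
    return (
        d1 < 2 * d2
        and -45 < theta1_deg < 45
        and abs(theta1_deg) < abs(theta2_deg)
    )

# Robot 1 is somewhat farther from the ball but much better oriented.
swap = should_swap_roles(d1=30, d2=20, theta1_deg=10, theta2_deg=60)
```

In this usage example all three inequalities hold (30 < 40, |10| < 45 and 10 < 60), so the roles would be interchanged.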
The condition for T2 is similar, but with the two robot IDs swapped in (5.1). All other transitions can fire immediately once enabled.

5.3.5 Example 2: Action Level

The example of Section 5.3.4 is extended to modelling what a robot should do in an assigned role of defending, goalkeeping or attacking. The overall DECIDE or role-action structure is as given in Fig. 5.5.
Fig. 5.5. A role-action structure for Petri net supervision and control (blocks: Supervisor with role assignment; attacking, defending and goalkeeping controllers)
The example Petri net controllers for the defending and goalkeeping robots are given.

Defending Robot Controller. The principle to apply is simple and not unlike human soccer play: the robot assigned to defend should kick the ball away from its own team goal before any opponent robot can kick it. Assuming that the left half of the playground is the opponent side, it is reasonable that the defending robot must move to the right side of the ball as soon as possible, so as to get behind the ball. Four simple situations are assumed for the defending robot (controller):

(a) defending robot behind the ball,
(b) defending robot kicks the ball,
(c) self goal position, and
(d) defending robot in contact with ball and behind.
Fig. 5.6 depicts the four key situations. In situation (a), the defending robot is in a probable position to kick; in situation (b), it is kicking the ball; in situation (c), it is in front of the ball and facing its own team goal (self goal position), so it needs to be careful to avoid a self goal; and in situation (d), it is in contact with the ball.

[Fig. 5.6: four panels, each showing the defending robot, the ball and the team goal]

Defence Control. ‘Angle’ is used to refer to the angle between the heading direction of the defending robot and the ball. ‘Distance’ is used to refer to the distance between the defending robot and the ball in pixels. In what follows, the values for ‘Angle’ and ‘Distance’ are as fixed by the designer.

In situation (a), the defending robot controller should command the robot to move to the ball and kick it. In situation (b), if ‘Angle’ is above 45° or ‘Distance’ is more than 25 pixels, the defending robot controller transits to situation (a). In situation (a), if the ball is on the right side of the defending robot (deemed as a self goal position), the defending robot controller should transit to situation (c). In situation (b), if the defending robot fails to kick the ball, the controller transits to situation (c). In situation (c), the controller commands the defending robot to move sideways, and if it comes behind the ball without touching it, the controller transits to situation (a). In situation (c), if ‘Distance’ is below 15 pixels, the controller transits to situation (d), in which case it should command the robot to move away from the ball until ‘Distance’ is above 25 pixels, before transiting to situation (c).

Defence Control Model. The following places and transitions are defined for a Petri net model C^d of the defending robot controller. Note that the key situations described above are represented by places Pi, i ∈ {1, 2, 3, 4}.

Structure       : C^d = (P, T, I, O).
Places          : P = {P1, P2, P3, P4, P5, P6, P7, P8}.
Transitions     : T = {T1, T2, T3, T4, T5, T6, T7}.
Input function  : I(T1) = {P2, P5}, I(T2) = {P1, P6}, I(T3) = {P1, P7}, I(T4) = {P3, P5}, I(T5) = {P4, P7}, I(T6) = {P3, P8}, I(T7) = {P4, P7}.
Output function : O(T1) = {P1}, O(T2) = {P2}, O(T3) = {P3}, O(T4) = {P1}, O(T5) = {P3}, O(T6) = {P4}, O(T7) = {P3}.
The following places are defined to model the defending robot controller:

P1 : moves behind the ball,
P2 : kicks the ball,
P3 : tries to escape from self goal position,
P4 : in contact with the ball and behind, so it has to move far away from the ball,
P5 : not in a good position to kick,
P6 : in a good position to kick,
P7 : in self goal position,
P8 : in front of the ball (self goal position).

The transitions are defined as follows:

T1 : tries to kick the ball, though it is not in a good position to kick,
T2 : in front of the ball and at the following instant it is in a good position to kick,
T3 : in front of the ball, and moving to a self goal position,
T4 : in self goal position and escaping from that,
T5 : misses the ball, and is in front of the ball,
T6 : in self goal position, and then in contact with the ball and behind, and
T7 : away from the ball and behind, but still in a self goal position.

It should be clear that the purpose of such a controller is to keep track of and influence the situational occurrences as modelled by transitions, according to situations modelled by (tokenized) places that the defending robot finds itself in. With the above defined places and transitions, the Petri-net graph for the defending robot controller is arrived at, as shown in Fig. 5.7. The ‘liveness’ condition for each enabled transition can be easily formulated from the descriptions above.
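As a rough illustration of how the defence control rules could be coded, the sketch below treats the four situations (a) to (d) as states and applies the thresholds quoted in the text (45°, 25 pixels and 15 pixels). It is a plain state-transition function rather than the Petri net itself. The function name next_situation, the boolean flags, the test used to move from (a) to (b), and the order in which competing conditions are checked are all assumptions made here.

```python
ANGLE_LIMIT_DEG = 45   # from the text
FAR_PIXELS = 25        # from the text
NEAR_PIXELS = 15       # from the text

def next_situation(situation, angle_deg, distance_px,
                   ball_on_right=False, kick_failed=False, behind_ball=False):
    """Return the next defence situation, one of 'a', 'b', 'c' or 'd'."""
    if situation == "a":
        if ball_on_right:                 # deemed a self goal position
            return "c"
        # Assumption: a "good position to kick" means being inside both limits.
        if abs(angle_deg) <= ANGLE_LIMIT_DEG and distance_px <= FAR_PIXELS:
            return "b"
        return "a"
    if situation == "b":
        if kick_failed:
            return "c"
        if abs(angle_deg) > ANGLE_LIMIT_DEG or distance_px > FAR_PIXELS:
            return "a"
        return "b"
    if situation == "c":
        if distance_px < NEAR_PIXELS:     # in contact with the ball
            return "d"
        if behind_ball:                   # came behind the ball without touching it
            return "a"
        return "c"
    if situation == "d":
        # Move away until 'Distance' is above 25 pixels, then return to (c).
        return "c" if distance_px > FAR_PIXELS else "d"
    raise ValueError("unknown situation: %r" % situation)
```

In the Petri net controller the same decisions would appear as tokens placed in the auxiliary places P5 to P8 by the vision system; the function above only mirrors the prose rules.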
Fig. 5.7. A Petri-net graph for defending robot control
Note that in the control execution of this Petri net controller, a token would be put in an auxiliary place Pi , i ∈ {5, 6, 7, 8} provided the visual feedback information asserts that the situation (or condition) the place characterizes (or is predicated on) becomes true. Such places are said to be controllable. Goalkeeping Robot Controller. The principle of goalkeeping also follows human soccer play: the goalkeeping robot should block or kick the ball away from its own team goal. In this example, the goalkeeping controller is designed to react to situations characterized according to the distance between the team goal and the ball. The three situations considered are (a) far distance, (b) medium distance, and (c) in goal area.
In the following, the distance values for these situations are as fixed by the designer. ‘Far distance’ means the distance between the goal and the ball is above 60 pixels, ‘medium distance’ means it is above 20 pixels but within 60 pixels, and ‘in goal area’ means, it is below 20 pixels. A subroutine predictor() predicts the ball direction using a linear equation. It is assumed that the ball always moves in a straight line. Using the past four positions and the current position of the ball, the predictor can derive an equation for the line of movement of the ball using a curve fitting algorithm. Fig. 5.8 illustrates the role of predictor().
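A predictor of this kind could be sketched as an ordinary least-squares line fit. The book does not give the code of predictor(), so everything below is an assumption made for illustration: the function name predict_target, the representation of ball positions as (x, y) pairs, and a goal line that is vertical at x = goal_x.

```python
def predict_target(positions, goal_x):
    """Fit y = m*x + c through the ball positions by least squares and return
    the y coordinate where the fitted line crosses the goal line x = goal_x.
    Returns None when the ball has not moved along x (no unique line)."""
    n = len(positions)
    mean_x = sum(x for x, _ in positions) / n
    mean_y = sum(y for _, y in positions) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in positions)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in positions)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m * goal_x + c

# Four past positions and the current one, moving towards a goal line at x = 0.
track = [(50, 10), (40, 12), (30, 14), (20, 16), (10, 18)]
target_y = predict_target(track, goal_x=0)
```

For the track shown, the points lie on the line y = 20 - 0.2x, so the predicted crossing point on the goal line is at y = 20 (up to rounding).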
Fig. 5.8. Predictor of target point of the ball (labels: past ball positions, current ball position, position ‘TARGET’, goal keeper).
Goalkeeping Control. The goalkeeping robot would only move within the goal area, along a line parallel to the team goal line. The goalkeeping robot controller performs the following:

(a) It commands the goalkeeping robot to move to the centre of the goal for ‘kick off’ when the game commences/resumes.
(b) It commands the goalkeeping robot to guard the ‘TARGET’ position by moving to and staying in between the ball and the ‘TARGET’ position, called the ‘TARGET’-intercept position, when the ball is in the opponent’s half of the playground, i.e., at a far distance from the team goal. The ‘TARGET’ position is the point where the predicted path of the moving ball intersects the goal line. The predicted path is determined by predictor().
(c) It commands the goalkeeping robot to move to the ‘ASSUME’ position when the ball is in the team’s half of the playground but outside the goal area, i.e., at a medium distance from the team goal. The ‘ASSUME’ position refers to the point of intersection of the goalkeeping robot’s predetermined path (which is parallel to the goal line) and the straight line that passes through the current position of the ball and the centre of the goal (line).
(d) It commands the goalkeeping robot to (try to) kick the ball away from the team goal when the ball is in the goal area.

Goalkeeping Control Model. The following places and transitions are defined for a Petri net model C^g of the goalkeeping robot controller.

Structure       : C^g = (P, T, I, O).
Places          : P = {P1, P2, P3, P4, P5, P6, P7, P8, P9}.
Transitions     : T = {T1, T2, T3, T4, T5, T6, T7, T8}.
Input function  : I(T1) = {P1, P5, P6}, I(T2) = {P2, P6, P9}, I(T3) = {P1, P5, P7}, I(T4) = {P3, P7, P9}, I(T5) = {P1, P5, P8}, I(T6) = {P4, P8, P9}.
Output function : O(T1) = {P2}, O(T2) = {P1}, O(T3) = {P3}, O(T4) = {P1}, O(T5) = {P4}, O(T6) = {P1}, O(T7) = {P3}, O(T8) = {P4}.
The following places are defined to model the goalkeeping robot controller:

P1 : goal keeper moving to the centre of the goal,
P2 : goal keeper moving to the ‘TARGET’-intercept position,
P3 : goal keeper moving to the ‘ASSUME’ position,
P4 : goal keeper moving toward the ball (as if it is going to kick the ball),
P5 : ball is coming to team goal area,
P6 : ball is far away,
P7 : ball is at medium distance,
P8 : ball is in goal area,
P9 : ball is moving away.
The transitions are defined as follows:

T1 : ball is at far distance and coming towards the goal,
T2 : ball is at far distance and moving away,
T3 : ball is at medium distance and coming in,
T4 : ball is at medium distance and moving away,
T5 : ball is in goal area and coming in,
T6 : ball is in goal area and moving away,
T7 : ball is at far distance, but coming to medium distance, and
T8 : ball is at medium distance, but coming to goal area.

With these defined places and transitions, the Petri-net graph for the goalkeeping robot controller is arrived at, as shown in Fig. 5.9.

Fig. 5.9. A Petri-net graph for goalkeeping robot control
The places, P5 , P6 , P7 , P8 , and P9 are controllable places, meaning that a token would be put in such a place provided the visual feedback information asserts that the situation (or condition) the place characterizes (or is predicated on) becomes true. For example, two tokens are created, one each in P5 and P6 (see Fig. 5.10(a)), when the ball is approaching the team goal area from a far distance; this enables T1 , which when fired, redistributes the tokens to P2 accordingly (see Fig. 5.10(b)).
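How visual feedback might be turned into tokens for the controllable places can be sketched as follows. The 60-pixel and 20-pixel thresholds are those quoted earlier for ‘far distance’, ‘medium distance’ and ‘in goal area’. The function name controllable_tokens, the use of the previous distance to decide whether the ball is approaching, and the handling of the boundary values (exactly 20 pixels, or an unchanged distance) are assumptions made here.

```python
def controllable_tokens(distance_px, previous_distance_px):
    """Return the set of controllable places (P5..P9) that receive a token,
    given the current and previous ball-to-goal distances in pixels."""
    tokens = set()
    if distance_px > 60:
        tokens.add("P6")      # ball is far away
    elif distance_px > 20:
        tokens.add("P7")      # ball is at medium distance
    else:
        tokens.add("P8")      # ball is in goal area (boundary case assumed)
    if distance_px < previous_distance_px:
        tokens.add("P5")      # ball is coming to team goal area
    else:
        tokens.add("P9")      # ball is moving away (or not approaching)
    return tokens

# Ball approaching from a far distance: tokens appear in P5 and P6.
tokens = controllable_tokens(distance_px=80, previous_distance_px=90)
```

The usage line reproduces the example in the text, where a ball approaching from a far distance creates one token each in P5 and P6.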
Fig. 5.10. A Petri-net control for robot goalkeeping: a transition firing and token redistributions

5.4 Q-Learning: A Model-Free Reinforcement Learning Method

Reinforcement learning is the problem faced by an agent that learns how to act through trial-and-error interactions with a dynamic environment. It can be seen as a way of programming agents by reward and punishment without the need to specify or model how the task is to be achieved.

5.4.1 Standard Reinforcement Learning

In the standard reinforcement-learning model, an agent is connected to its environment in the SENSE-DECIDE-ACT paradigm, abstracted as in Fig. 5.11. On each step of interaction, the agent receives as input i some indication of the current state s of the environment; the agent then chooses an action a (see footnote 1) to generate as output. The action changes the state of the environment, and the value of this state transition is communicated to the agent through a scalar reinforcement (or reward) signal r. The agent’s decision-making mechanism should choose actions that tend to increase the long-run sum of values of the reinforcement signal. It can learn to do this over time by systematic trial and error, guided by a wide variety of algorithms [45]. We shall, however, concentrate only on Q-learning, a classic model-free algorithm for reinforcement learning.
Fig. 5.11. The standard reinforcement-learning model
Formally, the model consists of

• a discrete set of environment states S,
• a discrete set of agent actions A, and
• a set of scalar reinforcement signals; typically {0, 1} or a set of real numbers.

Fig. 5.11 also includes an input function, which determines how the agent views the environment state; we will assume that it is an identity function (that is, i = s, implying the agent perceives the exact state of the environment).

Footnote 1: Note that the term action is used here in a broader sense than that defined under the hierarchical control architecture in Chapter 4.

The agent’s job is to find a policy π, mapping states to actions, that maximizes some long-run measure of reinforcement. Notice that after choosing an action, the agent is told the immediate reward and the subsequent state, but is not told which action would have been in its best long-term interests. To act optimally, the agent must actively gather useful experience about the possible system states, actions and rewards.

5.4.2 Q-Learning

Q-learning is one of the simplest and most promising reinforcement learning methods. It provides agents with the capability of learning to act optimally in Markovian domains by experiencing the consequences of actions, without the need to build maps of the domains.

Consider the following finite state, finite action Markov decision problem: at each discrete time step t, the agent observes the state $s_t \in S$ of the Markov process, chooses its action $a_t \in A(s_t)$, where $A(s_t)$ is the set of actions available at state $s_t$, receives a probabilistic reward $r_{t+1}$, whose mean value $R_{s_t}(a_t)$ depends only on the state and action, and the state of the environment changes probabilistically at t + 1 to state $s'$ according to the law:

\[ \mathrm{Prob}[\, s_{t+1} = s' \mid s_t, a_t \,] = P_{s_t s'}[a_t]. \tag{5.2} \]
The task facing the agent is to determine an optimal policy, one that maximizes the cumulative discounted expected reward $R_{s_t}(a_t)$ for performing an action $a_t \in A(s_t)$ at every state $s_t \in S$, modelled by

\[ R_{s_t}(a_t) = E\Big\{ \sum_{j=0}^{\infty} \gamma^{j} r_{t+1+j} \Big\}, \tag{5.3} \]

where γ, 0 ≤ γ < 1, is a discount factor. By discounted reward, we mean that a reward received j + 1 time steps ahead is worth less, by a factor of $\gamma^j$ (0 < γ < 1), than the same reward received one time step ahead. Under a policy π, the value V of state s is

\[ V^{\pi}(s) \equiv R_s(\pi(s)) + \gamma \sum_{s'} P_{ss'}[\pi(s)]\, V^{\pi}(s') \tag{5.4} \]

because the agent expects to receive $R_s(\pi(s))$ immediately after performing the action which policy π recommends, and then moves to a state that is ‘worth’ $V^{\pi}(s')$, with probability $P_{ss'}[\pi(s)]$. The theory of dynamic programming assures us that there is at least one optimal stationary policy $\pi^*$ such that

\[ V^{*}(s) \equiv V^{\pi^*}(s) = \max_{a} \Big\{ R_s(a) + \gamma \sum_{s'} P_{ss'}[a]\, V^{\pi^*}(s') \Big\}. \tag{5.5} \]

Dynamic programming provides a number of methods for calculating $V^*$ and the corresponding $\pi^*$, assuming that $R_s(a)$ and $P_{ss'}[a]$ are known. For a policy π, define Q values (or state-action values) as

\[ Q^{\pi}(s, a) = R_s(a) + \gamma \sum_{s'} P_{ss'}[a]\, V^{\pi}(s'). \tag{5.6} \]
In other words, the Q(s, a) value is the expected discounted reward for executing an action a at state s and following policy π thereafter. The objective in Q-learning is to estimate the Q values for an optimal policy. For convenience, define these as $Q^*(s, a) \equiv Q^{\pi^*}(s, a)$, ∀s, a. It is straightforward to show that $V^*(s) = \max_a Q^*(s, a)$ and that if $a^*$ is an action at which the maximum is attained, then an optimal policy can be formed as $\pi^*(s) \equiv a^*$. Herein lies the utility of the Q values - if an agent can learn them, it can easily decide what it is optimal to do. Although there may be more than one optimal policy or $a^*$, the $Q^*$ values are unique.

In Q-learning, an agent learns through its experience, which consists of a sequence of distinct stages or episodes (also called ‘trials’). In the nth episode, n ≥ 1, the agent

• observes its current state $s_t$,
• selects and performs an action $a_t$,
• observes the subsequent state $s_{t+1}$,
• receives an immediate payoff $r_{t+1}$ and
• adjusts its $Q_{n-1}(s, a)$ values at state $s_{t+1}$, using a learning factor $\alpha_n$ and discount factor γ, according to

\[ Q_n(s, a) = \begin{cases} (1 - \alpha_n)\, Q_{n-1}(s, a) + \alpha_n \big[ r_{t+1} + \gamma \max_{a_{t+1}} \{ Q_{n-1}(s_{t+1}, a_{t+1}) \} \big] & \text{if } s = s_t \text{ and } a = a_t, \\ Q_{n-1}(s, a) & \text{otherwise.} \end{cases} \tag{5.7} \]

Note that strictly speaking, the notations $s_t$ and $a_t$ should be replaced by $s_{t,n}$ and $a_{t,n}$, respectively, to more appropriately denote the state and action at time instant t of episode n, and $r_{t+1}$ should be replaced by $r_{t+1,n}$ to denote the reward at a next time instant t + 1 of episode n, when the agent has just entered a new state $s_{t+1,n}$ due to taking action $a_{t,n}$ at state $s_{t,n}$. $\max_{a_{t+1}} \{ Q_{n-1}(s_{t+1}, a_{t+1}) \}$ is the best the agent thinks it can do from state $s_{t+1}$. Of course, in the early stages of learning, the Q values may not accurately reflect the policy they implicitly define. The initial Q values, $Q_0(s, a)$, for all states s ∈ S and actions a ∈ A are either set to zero or assumed to be known.
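The update rule (5.7) can be sketched for a tabular Q function as below. This is a minimal illustration only: the function name q_update, the dictionary keyed by (state, action) pairs, and the state and action labels in the usage lines are assumptions made here, with the initial Q values set to zero as the text allows.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions_next, alpha, gamma):
    """Apply update (5.7) to the single entry Q(s, a); all other entries
    are left unchanged. Returns the new value of Q(s, a)."""
    # Best value the agent currently thinks it can obtain from s_next.
    best_next = max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]

Q = defaultdict(float)          # Q_0(s, a) = 0 for all s, a
first = q_update(Q, "s0", "left", 1.0, "s1", ["left", "right"],
                 alpha=0.5, gamma=0.9)

Q[("s1", "right")] = 2.0        # pretend this value was learned earlier
second = q_update(Q, "s0", "left", 1.0, "s1", ["left", "right"],
                  alpha=0.5, gamma=0.9)
```

The first call gives 0.5 · 0 + 0.5 · (1 + 0.9 · 0) = 0.5; the second gives 0.5 · 0.5 + 0.5 · (1 + 0.9 · 2) = 1.65, showing how a better estimate for the next state raises the value of the preceding state-action pair.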
5.4.3 Example 1: Role Level

This example illustrates how a role-assignment supervisor is designed using Q-learning [44].

Problem Formulation. The states, actions and rewards are defined as follows:

1. The distance between the attacking robot and the ball is classified into 5 states (see Fig. 5.12): ra0, ra1, ra2, ra3, ra4.
2. The angle between the forward direction of the attacking robot and the ball is classified into 7 states: θ0, θ1, θ2, θ3, θ4, θ5, θ6.

[Fig. 5.12 residue; recoverable labels: θ0 = 30°, θ1 = 30°, θ5 = 45°, θ6 = 30°, ra4 > 40 cm]

0, 0 if Max ηj(k) = 0.
Denote the action corresponding to Max ηj(k) by $a^j_{i_{it}}$. Then, $i_{it} \in I^j_{it}$ and $a^j_{i_{it}} \in A^j_{it}$. $a^j_{i_{it}}$ is transferred to the Final Selection module.

5. Final Selection takes into account the outputs of all the other modules and selects an action for each agent for the situation considered. The following are needed to define the Final Selection module:

• $a^j_{i_{fs}}$: the action selected based on the outputs at time k of the Supervisor, Internal Motive and Intervention modules; $i_{fs} \in I^j_a$, $a^j_{i_{fs}} \in A^j$.
• $A\_FS^j(k)$: the final action selected by the Final Selection module (output of ASM); $A\_FS^j(k) \in A^j$.
• $d_{a^j_i}(k)$: the duration until time k, in which $a^j_i$ holds the same action; nonnegative integer.
• $t\_d_{a^j_i}$: threshold of $d_{a^j_i}(k)$; nonnegative integer.

$t\_d_{a^j_i}$ is a parameter to keep the persistence of actions, which is directly related to the stability of the overall system. In selecting action $a^j_{i_{fs}}$, the priority among the modules of the ASM is set as follows:

Supervisor (highest) > Intervention > Internal Motive (lowest).

Fig. 5.20 shows how to select $a^j_{i_{fs}}$ with this order of priority.

Fig. 5.20. The process of selecting $a^j_{i_{fs}}$

$d_{a^j_{i_{fs}}}(k)$ corresponding to $a^j_{i_{fs}}$ is compared with $t\_d_{a^j_{i_{fs}}}$ and $A\_FS^j(k)$ is finally determined as follows:

\[ A\_FS^j(k) = \begin{cases} a^j_{i_{fs}} & \text{if } d_{a^j_{i_{fs}}}(k) > t\_d_{a^j_{i_{fs}}}, \\ A\_FS^j(k-1) & \text{otherwise.} \end{cases} \]
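The priority ordering and the persistence rule can be sketched as two small functions. The details of Fig. 5.20 are not reproduced in the text, so select_candidate below simply takes the first available proposal in priority order; that behaviour, the function names and the use of None for "no proposal" are assumptions made here. Only final_selection follows the rule for A_FS(k) stated above.

```python
def select_candidate(supervisor=None, intervention=None, internal_motive=None):
    """Pick a candidate action by priority:
    Supervisor > Intervention > Internal Motive (assumed first-available rule)."""
    for proposal in (supervisor, intervention, internal_motive):
        if proposal is not None:
            return proposal
    return None

def final_selection(candidate, duration, threshold, previous_action):
    """Adopt the candidate only once it has persisted longer than its
    threshold; otherwise keep the previously selected action."""
    return candidate if duration > threshold else previous_action

candidate = select_candidate(intervention="BlockBall", internal_motive="Shoot")
action_now = final_selection(candidate, duration=3, threshold=2,
                             previous_action="Stop")
```

Here the Intervention proposal outranks the Internal Motive proposal, and because it has persisted for 3 steps against a threshold of 2, it replaces the previous action.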
The Intervention module needs to determine $\phi^{jl}_i(k)$, a measure of the activation level due to the disturbance from opponent agent l, in order to select an action. But as it is not clear how an explicit model of the game can be obtained in terms of the generally non-deterministic behaviour of opponent agents, approaches such as neural networks, fuzzy logic and Q-learning, which can learn human judgements of what action is appropriate in various game situations, are appropriate for the Intervention module.

Problem Formulation: One-A-Side MiroSoT Game. Consider the following ASM formulation for a one-a-side game, depicted in Fig. 5.21, where each team has only one robot player.

• $I_r = \{1\}$, $j \in I_r$,
• $I_o = \{1\}$, $l \in I_o$,
• $A^j$ = {Shoot, Position To Shoot, Sweep Ball, Stop, BlockBall},
• $I^j_a = \{1, 2, 3, 4, 5, 6\}$, $N^j_a = 6$, $i \in I^j_a$,
• $A^j_{im}$ = {Shoot, Position To Shoot, Stop},
• $N^j_{im} = 4$, $I^j_{im} = \{1, 2, 3, 5\}$,
• $P^j_1 = 4$, $P^j_2 = 3$, $P^j_3 = 2$, $P^j_5 = 1$,
• $A^j_{it}$ = {SweepBall, BlockBall},
• $N^j_{it} = 2$, $I^j_{it} = \{4, 6\}$,
• $t\_d_{f^j_i} = 1$, $\theta^j_i = 0.8$, $t\_d_{a^j_i} = 2$.
In the feedforward neural network approach, the training data to collect are the situation-action pairs. The pairing is done using human judgement; more
5.5 Neural Networks
Fig. 5.21. A one-a-side MiroSoT game
will be said of this later. The inputs u_i to the neural network are situation variables that characterize a game situation at each time instant k. The 10 input variables used are as follows:

• the ball’s velocity; the opponent robot’s velocity;
• the four variables characterizing ball possession (depicted in Fig. 5.22), defined by
  – θBR : angle between the team (or home) robot’s heading direction and its direction towards the ball,
  – θBO : angle between the opponent robot’s heading direction and its direction towards the ball,
  – DBR : distance between the team robot and the ball,
  – DBO : distance between the opponent robot and the ball;
• the two variables representing the risk level of conceding a goal (depicted in Fig. 5.23), defined by
  – DBRG : distance between the ball and the team (or home) goal,
  – DIRG : distance between the centre of the goal and the intersection point of the team goal line and the line passing through the opponent robot and the ball;
• and the two variables representing the team robot’s winning score against the opponent robot (depicted in Fig. 5.23), defined by
  – DBOG : distance between the ball and the opponent goal,
  – DIOG : distance between the centre of the goal and the intersection point of the opponent goal line and the line passing through the team robot and the ball.
Fig. 5.22. Four situation variables characterizing ball possession

Fig. 5.23. Four situation variables representing the team (or home) robot’s winning score against the opponent robot and the risk level of conceding a goal
The sigmoid function (see footnote 2) used as the activation function for $\phi^{jl}_i(k)$ is

\[ f(x) = \frac{1}{1 + e^{-ax}} \in (0, 1). \]

Footnote 2: Note that this sigmoid function can be obtained by setting λ1 = 1, λ2 = 2, and a = 2σ in Eq. (5.10).
In other words, the output of each simple neuron in the net represents the activation level of one of the two actions in Ajit , where SweepBall is an action done to kick away the ball and BlockBall is done to block the ball and avoid conceding a goal. Thus, the feedforward neural network to be set up for the Intervention module has 10 inputs ui and 2 outputs yj . In [47], 2 hidden layers were used, with the first and second hidden layers consisting of 12 nodes and 6 nodes, respectively. The complete net built was a 10 input, 2 output, 2 hidden layer, fully-connected, feedforward neural network. The error back propagation algorithm [46], one of the supervised learning methods, was used to train the network depicted in Fig. 5.24.
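The forward pass of a 10-12-6-2 fully connected network with this sigmoid can be sketched as follows. The weights here are random placeholders rather than trained values, and the presence of a bias term in each neuron, the helper names make_layer and forward, and the slope a = 1 are assumptions made here for illustration; the book's network was trained by error back propagation, which is not shown.

```python
import math
import random

def sigmoid(x, a=1.0):
    """Sigmoid activation f(x) = 1 / (1 + exp(-a*x)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * x))

def make_layer(n_in, n_out, rng):
    """One fully connected layer: n_out neurons, each with n_in weights
    followed by one bias term (the bias is an assumption of this sketch)."""
    return [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layers, inputs):
    """Propagate an input vector through all layers in turn."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * x for w, x in zip(weights[:-1], activations))
                    + weights[-1])
            for weights in layer
        ]
    return activations

rng = random.Random(0)
sizes = [10, 12, 6, 2]   # 10 inputs, hidden layers of 12 and 6 nodes, 2 outputs
network = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
outputs = forward(network, [0.5] * 10)   # placeholder situation variables
```

The two entries of outputs would play the role of the activation levels of SweepBall and BlockBall; with untrained weights their values carry no meaning beyond lying strictly between 0 and 1.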
Fig. 5.24. Structure of the feedforward neural network
Neural Network Training. As mentioned earlier, the training data to collect are the situation-action pairs; each training data pair consists of an input vector describing a situation and an action-output vector in which the activation value of the most desired action is set to 1 and those of the other actions are set to 0. The situation data is computed from raw data collected through a real robot soccer game. The exercise game is played between an agent with the ASM excluding the Intervention module, and an opponent agent which may have some kind of ASM or other control algorithms. The game raw data, such as the coordinate position and heading angle of each robot, and the coordinate position of the ball, are stored. Then, the human manager observes the replayed game on a computer 2-D graphics display, and identifies the situations where the activation level of an action by the team robot in response to the disturbance of the opponent agent is deemed to be very high. The best among the actions given to the Intervention module is identified and the corresponding raw data for the situation is stored. Note that if the replay is displayed with detailed three-dimensional graphics, instead of two-dimensional animation, the human manager would make better judgments on the situation-desired action pairing. The situation variables - inputs to the neural net - can be readily calculated using the raw data for each situation. Using the training data obtained this way, the back-propagation algorithm is used to train the neural network. The trained net is then applied to the recorded game to check for training effectiveness. Once the performance is deemed to be within the desired levels, the neural network-based Intervention module can be deployed in a one-a-side soccer game, but under the set-up conditions in which the training data pairs are obtained.
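To illustrate how a training pair could be assembled from raw data, the sketch below computes two of the situation variables, the angle and distance from a robot to the ball, and builds the 1-of-N target vector. The function names, the use of radians, the coordinate convention (heading measured counter-clockwise from the x axis) and the wrapping of the angle into [-π, π) are assumptions made here; the remaining eight situation variables would be computed in a similar way.

```python
import math

def ball_possession_vars(robot_x, robot_y, robot_heading_rad, ball_x, ball_y):
    """Return (angle, distance): the angle between the robot's heading and
    its direction towards the ball, and the robot-to-ball distance."""
    dx, dy = ball_x - robot_x, ball_y - robot_y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    # Wrap the difference into [-pi, pi) so left and right are symmetric.
    angle = (bearing - robot_heading_rad + math.pi) % (2 * math.pi) - math.pi
    return angle, distance

def one_hot_target(desired_action, actions=("SweepBall", "BlockBall")):
    """Target vector: 1 for the most desired action, 0 for the others."""
    return [1.0 if a == desired_action else 0.0 for a in actions]

theta_br, d_br = ball_possession_vars(0.0, 0.0, 0.0, 3.0, 4.0)
target = one_hot_target("BlockBall")
```

For a robot at the origin heading along the x axis and a ball at (3, 4), the distance is 5 and the angle is atan2(4, 3), about 0.927 rad.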
5.6 Evolutionary Programming

Evolutionary Programming (EP), originally conceived by Lawrence J. Fogel in 1960, is a stochastic optimization strategy similar to genetic algorithms (GAs). GAs arose from a desire to model the biological processes of natural selection and population genetics, with the original aim of designing autonomous learning and decision-making systems. Other analogous algorithms that have also been proposed in the literature include evolution strategies (ES). Together, EP, GAs and ES have been classified under the umbrella group of evolutionary algorithms (EAs).

EP can be better understood in relation to GAs. For this, an overview of GAs is first presented. GAs are global, parallel, search and optimisation methods, founded on Darwinian principles. They work with a population of potential solutions to a problem as follows:

1. Each individual within the population represents a particular solution to the problem, generally expressed in some form of genetic code. The population is evolved, over generations, to produce better solutions to the problem.
2. Each individual within the population is assigned a fitness value, which expresses how good the solution is at solving the problem. The fitness value probabilistically determines how successful the individual will be at propagating its genes (its code) to subsequent generations. Better solutions are assigned higher values of fitness than worse performing solutions.
3. Evolution is performed using a set of stochastic genetic operators, which manipulate the genetic code. Most GAs include operators that select individuals for reproduction, produce new individuals based on those selected, and determine the composition of the population at the subsequent generation. Crossover and mutation are two well-known operators:
   • The crossover operator involves the exchange of genetic material between chromosomes (parents), in order to create new chromosomes (offspring).
   • The mutation operator, in its simplest form, makes small, random changes to a chromosome.
4. Once the new generation has been constructed, the processes that result in the subsequent generation of the population are begun once more.

GAs explore and exploit the search space to find good solutions to the problem. It is possible for a GA to support several dissimilar, but equally good, solutions to a problem, due to its use of a population. However, despite the simple concepts involved, GAs can become quite complicated. Many variations have been proposed since the first GA was introduced. Rigorous mathematical analysis of a GA is difficult and is still incomplete.

EP is similar to GAs, but places emphasis on the behavioral linkage between parents and their offspring, rather than seeking to emulate specific genetic operators as observed in nature. The behavioral linkage can be obtained by using a zero mean Gaussian mutation. EP is similar to evolution strategies (ES), although the two approaches were developed independently. Like both ES and GAs, EP is a useful method of optimization when other techniques such as gradient descent or direct, analytical discovery are not possible. Combinatorial and real-valued function optimization problems in which the optimization surface or fitness landscape is ‘rugged’, possessing many locally optimal solutions, are well suited for evolutionary programming.

5.6.1 The EP Process

For EP, there is an underlying assumption that a fitness landscape can be characterized in terms of variables, and that there is an optimum solution (or multiple such optima) in terms of those variables. For example, if one were trying to find the shortest path in a Traveling Salesman Problem, each individual (solution candidate) would be a path.
The length of the path could be expressed as a number, which would serve as the individual’s fitness. The fitness landscape for this problem could be characterized as a hypersurface proportional to the path lengths in a space of possible paths. The goal would be to find the globally shortest path in that space, or more practically, to find very short tours in finite time.

The basic EP method involves three steps; steps 2 and 3 are repeated until an iteration limit is exceeded or an adequate solution is obtained:

1. Choose an initial population of individuals (trial solutions) at random. The number of individuals in a population is highly relevant to the speed of optimization, but no definite answers are available as to how many individuals are appropriate (other than > 1) and how many individuals are just wasteful.
2. Each individual is replicated into a new population. Each of these offspring is mutated according to a distribution of mutation types, ranging from minor to extreme with a continuum of mutation types in between. The severity of mutation is judged on the basis of the functional change imposed on the individuals.
3. Each offspring is assessed by computing its fitness. Typically, a stochastic competition is held to determine the number of individuals to be retained for the population.

It should be pointed out that EP typically does not use any crossover as a genetic operator. The pseudocode for algorithm EP is given in Fig. 5.25.
    // Begin EP
    // start with an initial time
    t := 0;
    // initialize a random population of individuals
    initpopulation P(t);
    // evaluate fitness of all initial individuals of population
    evaluate P(t);
    // test for termination criterion (time, fitness, etc.)
    while not done do {
        // perturb the whole population stochastically
        P'(t) := mutate P(t);
        // evaluate its new fitness
        evaluate P'(t);
        // stochastically select the survivors from actual fitness
        P(t+1) := survive P(t), P'(t);
        // increase the time counter (also called generation counter)
        t := t + 1;
    } // end while
    // End EP

Fig. 5.25. Pseudocode of algorithm EP
As an example, the following is an EP program that attempts to minimize the function $f(x_1, x_2) = x_1^2 + x_2^2$.
Among several mutation methods, a Gaussian random number generator is used for this example; each individual consists of an instance of $(x_1, x_2)$, associated with a corresponding instance of standard deviations $(\eta_1, \eta_2)$.

1. Begin EP
2. t := 0;
3. initpopulation $(x_i, \eta_i)$, $\forall i \in \{1, \dots, \mu\}$, where
   • $x_i = (x_{i1}, x_{i2})$, $x_{ij}$ is the j-th parameter, $j \in \{1, 2\}$, of the i-th individual,
   • $\eta_i = (\eta_{i1}, \eta_{i2})$, $\eta_{ij}$ is the standard deviation of the j-th parameter of the i-th individual for Gaussian mutations, and
   • $\mu$ is the population size;
4. evaluate P(t) for each individual $(x_i, \eta_i)$, $\forall i \in \{1, \dots, \mu\}$;
5. while not done do {
   • P'(t) := mutate P(t);
     In this step, each parent $(x_i, \eta_i)$, $i = 1, \dots, \mu$, creates a single offspring $(x'_i, \eta'_i)$ as follows:

     \[ \eta'_{ij} = \eta_{ij} \exp\bigl(\tau' N(0,1) + \tau N_j(0,1)\bigr), \qquad x'_{ij} = x_{ij} + \eta'_{ij} N_j(0,1), \]

     where
     – $x_{ij}$, $x'_{ij}$, $\eta_{ij}$ and $\eta'_{ij}$ denote the j-th parameter of the vectors $x_i$, $x'_i$, $\eta_i$ and $\eta'_i$, respectively;
     – $N(0,1)$ denotes a normally distributed one-dimensional random number with mean 0 and standard deviation 1, and $N_j(0,1)$ indicates that the random number is drawn anew for each j.
     The factors $\tau$ and $\tau'$ are commonly set to $\bigl(\sqrt{2\sqrt{n}}\bigr)^{-1}$ and $\bigl(\sqrt{2n}\bigr)^{-1}$, where n is the number of parameters.
   • evaluate P'(t) for each individual $(x'_i, \eta'_i)$, $\forall i \in \{1, \dots, \mu\}$;
   • P(t + 1) := survive P(t), P'(t);
     In this step, the following are carried out:
     a) Pairwise comparisons over the union of parents $(x_i, \eta_i)$ and offspring $(x'_i, \eta'_i)$, $\forall i \in \{1, \dots, \mu\}$, done as follows: for each individual,
        – q opponents are chosen uniformly at random from all the parents and offspring;
        – if the individual’s fitness is no smaller than the opponent’s, it receives a ‘win’.
     b) Selection of the $\mu$ individuals, out of $(x_i, \eta_i)$ and $(x'_i, \eta'_i)$, $\forall i \in \{1, \dots, \mu\}$, that have the most wins, to be parents of the next generation.
   • t := t + 1;
   } // end while
6. End EP
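The steps of this example can be sketched in Python as follows. This is a minimal illustration rather than the authors' program: the population size, number of opponents q, initial ranges and the tie-break by objective value are all assumptions, and because the problem is a minimization the sketch counts a 'win' when an individual's objective value is no larger than its opponent's.

```python
import math
import random


def sphere(x):
    """Objective to minimize: f(x1, x2) = x1^2 + x2^2."""
    return sum(v * v for v in x)


def evolve(mu=20, q=10, generations=200, n=2, seed=0):
    """Return (best_x, best_f) found by a basic self-adaptive EP (sketch)."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))   # per-parameter factor
    tau_prime = 1.0 / math.sqrt(2.0 * n)        # global factor

    # Individual = (x, eta); the initial ranges are arbitrary assumptions.
    pop = [([rng.uniform(-5.0, 5.0) for _ in range(n)], [1.0] * n)
           for _ in range(mu)]

    for _ in range(generations):
        offspring = []
        for x, eta in pop:
            g = rng.gauss(0.0, 1.0)  # N(0,1): drawn once per individual
            new_eta = [e * math.exp(tau_prime * g + tau * rng.gauss(0.0, 1.0))
                       for e in eta]
            new_x = [xj + ej * rng.gauss(0.0, 1.0)
                     for xj, ej in zip(x, new_eta)]
            offspring.append((new_x, new_eta))

        union = pop + offspring
        costs = [sphere(x) for x, _ in union]

        # q-opponent competition over parents and offspring.
        wins = []
        for c in costs:
            opponents = [rng.randrange(len(union)) for _ in range(q)]
            wins.append(sum(1 for o in opponents if c <= costs[o]))

        # Keep the mu individuals with the most wins (ties: lower cost first).
        ranked = sorted(range(len(union)), key=lambda i: (-wins[i], costs[i]))
        pop = [union[i] for i in ranked[:mu]]

    best_x, _ = min(pop, key=lambda ind: sphere(ind[0]))
    return best_x, sphere(best_x)
```

Calling `evolve()` returns the best parameter pair found and its objective value. Because the best individual always collects the maximum number of wins, it is never discarded, so the best objective value cannot get worse from one generation to the next.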
5.6.2 EP and GAs

There are two important ways in which EP differs from GAs, though the two are merging into a unified algorithm.

Representation: There is no constraint on the representation of individuals in a population. The typical GA approach involves encoding the problem solutions as a string of representative tokens, the genome. In EP, the representation follows from the problem. A neural network can be represented in the same manner as it is implemented, for example, because the mutation operation does not demand a linear encoding. (In this case, for a fixed topology, real-valued weights could be coded directly as their real values, and mutation operates by perturbing a weight vector with a zero-mean multivariate Gaussian perturbation. For variable topologies, the architecture is also perturbed, often using Poisson-distributed additions and deletions.)

Genetic Operators: While crossover and mutation operators are needed in GAs, the mutation operation in EP simply changes aspects of the individual according to a statistical distribution which retains the behavioral linkage between parents and their offspring. Further, the severity of mutations is often reduced as the global optimum is approached. There is a certain tautology here: if the global optimum is not already known, how can the spread of the mutation operation be damped as the solutions approach it? Several techniques have been proposed and implemented which address this difficulty. The most widely studied one is the Meta-Evolutionary technique, in which the variance of the mutation distribution is subject to mutation by a fixed-variance mutation operator and evolves along with the individual. EP uses stochastic competition while GAs use the roulette-wheel method for selection; the stochastic characteristics of the two selection methods are similar.
5.6.3 EP and ES

The first communication between the evolutionary programming and evolution strategies groups occurred in early 1992, just prior to the first annual EP conference. Despite their independent development over 30 years, they share many similarities. When implemented to solve real-valued function optimization problems, both typically operate on the real values themselves (rather than any coding of the real values as is often done in GAs). Multivariate zero-mean Gaussian mutations are applied to each parent in a population and a selection mechanism is applied to determine which individuals to remove (i.e., ‘cull’) from the population. The similarities extend to the use of self-adaptive methods for determining the appropriate mutations to use: methods in which each parent carries not only a potential solution to the problem at hand, but also information on how it will distribute new trials
(offspring). Most of the theoretical results on convergence (both asymptotic and speed) developed for ES or EP also apply directly to the other. The main differences between ES and EP are:

1. Selection: EP typically uses stochastic selection via a competition. Each individual in the population faces competition against a preselected number of opponents and receives a ‘win’ if it is at least as good as its opponent in each encounter. The number of wins is counted. Selection then eliminates those individuals with the least wins. In contrast, ES typically uses deterministic selection in which the worst individuals are purged from the population based directly on their fitness evaluation.
2. Recombination: EP is an abstraction of evolution at the level of reproductive populations (i.e., species) based on phenotypic representation, and thus no recombination mechanisms are typically used because recombination does not occur between species (by definition: see Mayr’s biological species concept [48, p. 318]). In contrast, ES is an abstraction of evolution at the level of individual behavior. When self-adaptive information is incorporated, this is purely genetic information (as opposed to phenotypic) and thus some forms of recombination are reasonable and many forms of recombination have been implemented within ES. The effectiveness of such operators depends on the problem at hand.

5.6.4 Example: Behaviour Level

This example illustrates how EP can be used to train univector fields for robot navigation [38]. The idea of generating univector fields for robot navigation has already been introduced in Section 4.6.2 of the previous chapter. To exploit the univector field F to achieve higher performance in robot control, the field has to be optimized. For this purpose, a grid net and a function approximator are developed. To start with, in general, a grid of size b × a is located within the workspace³ as shown in Fig. 5.26(a).
The shape and density of the grid net can be varied in accordance with the application and the desired accuracy. A node represents the point of intersection of the grid lines. A denser grid implies a larger number of nodes. $p_{i,j}$ is the (coordinate) position of node (i, j) and $F_{i,j}$ represents the field vector at $p_{i,j}$. The set of angles of the univectors $F_{i,j}$ forms a b × a matrix, which is defined as the univector field matrix Φ as follows:

\[ \Phi = \{\psi_{i,j} \mid 1 \le i \le b,\ 1 \le j \le a\}, \tag{5.12} \]
where $\psi_{i,j}$ is the angle of vector $F_{i,j}$.

³ In robot soccer, the playground is the workspace.

Fig. 5.26. Grid net of the function approximator: (a) grid; (b) interpolation

To determine the field vector at an arbitrary position p, an interpolating operation is adopted. First, the operator finds the four neighbouring nodes at positions $p_{i,j}$, $p_{i+1,j}$, $p_{i,j+1}$, and $p_{i+1,j+1}$ surrounding the point p. Then, as shown in Fig. 5.26(b), the distances $d_a$, $d_b$, $d_c$, and $d_d$ between these four nodes and p are computed. The interpolated field vector F(p) and its angle ψ(p) at p are calculated as follows:

\[ F(p) = \frac{\bar F}{\|\bar F\|}, \qquad \psi(p) = \angle F(p), \tag{5.13} \]

with

\[ \bar F = \frac{(d_b d_c d_d)\, F_{i,j} + (d_a d_c d_d)\, F_{i,j+1} + (d_a d_b d_d)\, F_{i+1,j} + (d_a d_b d_c)\, F_{i+1,j+1}}{d_b d_c d_d + d_a d_c d_d + d_a d_b d_d + d_a d_b d_c}, \]

where $\|\bar F\|$ is the magnitude of $\bar F$. Each node vector is weighted by the product of the distances from p to the other three nodes, so that

\[ d_a = \|p - p_{i,j}\|, \quad d_b = \|p - p_{i,j+1}\|, \quad d_c = \|p - p_{i+1,j}\|, \quad d_d = \|p - p_{i+1,j+1}\|. \]
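The interpolation of Eq. (5.13) can be sketched as follows. The function name and the way nodes are passed in are illustrative assumptions. The sketch weights each node vector by the product of the distances from p to the other three nodes, which is the pairing required for F(p) to converge to a node's vector as p approaches that node; the common denominator is omitted because it cancels when the result is normalized.

```python
import math


def interpolate_field(p, nodes):
    """Interpolate a unit field vector at point p (sketch of Eq. 5.13).

    nodes: four (position, angle) pairs for the surrounding grid nodes,
           with position = (x, y) and angle in radians.
    Returns (unit_vector, angle).
    """
    dists = [math.dist(p, pos) for pos, _ in nodes]
    fx = fy = 0.0
    for k, (_, angle) in enumerate(nodes):
        # Weight of node k: product of the distances to the other three nodes.
        w = math.prod(d for m, d in enumerate(dists) if m != k)
        fx += w * math.cos(angle)
        fy += w * math.sin(angle)
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        raise ValueError("node vectors cancel; direction is undefined")
    return (fx / norm, fy / norm), math.atan2(fy, fx)
```

For a point that coincides with one node, all weights except that node's vanish, so the function returns that node's angle.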
$F_{i,j}$ at any node (i, j) is known beforehand. Thus F(p) can be evaluated. F(p) represents an intermediate vector for the $F_{i,j}$, $F_{i,j+1}$, $F_{i+1,j}$, and $F_{i+1,j+1}$ vectors. As p approaches $p_{i,j}$, F(p) converges to $F_{i,j}$. Thus, by setting the elements of the matrix $\{F_{i,j} \mid 1 \le i \le b,\ 1 \le j \le a\}$ to each of the node values, all the vectors in the field F can be fully determined by interpolation using Eq. (5.13). For example, consider the following 3 × 3 univector field matrix:

Φ = [3 × 3 matrix of node angles, given as fractions and multiples of π with one zero entry; the entry positions are not recoverable here]   (5.14)

Fig. 5.27(a) shows the field vectors represented by the matrix Φ and Fig. 5.27(b) is the univector field calculated using Eq. (5.13). In the next section, the optimization of field vector $F_{i,j}$ is discussed.

Fig. 5.27. Simple example of grid net: (a) field vectors at the nodes; (b) interpolated univector field

Evolutionary Programming for the Grid Net. To optimize the univector field, the univector field matrix Φ is used as the data structure for an individual (i.e., a trial solution) of the population. In this example, the evaluation function is decided based on the elapsed time, the heading angle error, the positioning error, the distance from the obstacle, and the maximal angular acceleration. These criteria are merged to form an evaluation function $f(t_s, \theta_c, p, \dot\omega)$ for each followed path as follows:

\[ f(t_s, \theta_c, p, \dot\omega) = k_t t_s + k_d\, |\theta_c(t_s) - \theta_d| + f_t(p) + f_o(p) + f_a(\dot\omega), \tag{5.15} \]
where $t_s$ is the elapsed time and $\theta_d$ is the desired heading angle at point g. This evaluation function is used for optimizing the univector field matrix $\{F_{i,j} \mid 1 \le i \le b,\ 1 \le j \le a\}$. The first term in the evaluation function is to ensure quick reachability of the desired point g and the second term forces the robot to converge to the desired heading angle $\theta_d$ at point g. The third term $f_t(p)$ makes the robot move to the desired point g:

\[ f_t(p) = \begin{cases} 0 & \text{if arrived at point } g \text{ within allowable error bound,} \\ T_p + \min_{t \in [0, t_s]} |p(t) - g| & \text{otherwise,} \end{cases} \tag{5.16} \]

where p(t) is the position of the robot centre at time t, point g is the desired position, and $T_p$ is a penalty value that is added when the robot does not arrive at point g. If the robot does not reach the desired position, the smallest distance from the robot centre to the desired point, $\min_{t \in [0, t_s]} |p(t) - g|$, and $T_p$ are used to obtain $f_t(p)$ as a penalty, as indicated in (5.16).

The fourth term $f_o(p)$ prevents the robot from colliding with an obstacle and assumes the following values:

\[ f_o(p) = \begin{cases} 0 & \text{if no obstacle collision,} \\ B_p + \max_{t \in \Omega} |p(t) - p_b| & \text{otherwise,} \end{cases} \tag{5.17} \]

where $B_p$ is a penalty parameter, $\Omega \subset [0, t_s]$ refers to the time interval during which the robot is within an obstacle boundary, and $p_b$ is the closest point on the obstacle boundary from the robot centre. When the robot collides with an obstacle, the function $f_o(p)$ is calculated by projecting the robot trajectory nearest to the obstacle center. The shortest distance of such a point from the periphery of the obstacle is used for getting the value of the function $f_o(p)$.

The last term $f_a(\dot\omega)$ prevents the robot angular acceleration, $\dot\omega$, from exceeding its limit $\alpha_{max}$:

\[ f_a(\dot\omega) = \begin{cases} 0 & \text{when } \dot\omega \text{ is within the limit } \alpha_{max}, \\ A_p + \max_{t \in [0, t_s]} |\dot\omega(t) - \alpha_{max}| & \text{otherwise.} \end{cases} \tag{5.18} \]

In the computer simulation [38], the scaling factors $k_t$ and $k_d$ are taken as 1 and 5, respectively.
The penalty values $T_p$, $B_p$, and $A_p$ are taken as 500 cm, 100 cm, and 50 rad/s², respectively. The value of $T_p$ is set to be greater than the sum of the other two penalty values $B_p$ and $A_p$ in the evaluation function. The terms $f_t(p)$, $f_o(p)$, and $f_a(\dot\omega)$ are made to satisfy the constraints, which ensure that the robot reaches the desired position without collision and exhibits movements without ripples. The remaining terms $k_t t_s$ and $k_d\, |\theta_c(t_s) - \theta_d|$ of (5.15) are used as fine tuning for short navigation time and desired heading angle at point g. Once the univectors are optimized properly, the values of $f_t(p)$, $f_o(p)$, and $f_a(\dot\omega)$ are all zero. The penalty parameters $T_p$, $B_p$, and $A_p$ determine the properties of optimization progress. At the first stage of optimization, the individuals that fail
to drive the robot to the desired position are weeded out because $T_p$ is the strongest penalty. But if all the individuals violate this constraint, the term on the right-hand side of $T_p$ in (5.16) becomes meaningful. By this term the individuals are inclined to approach the desired position. The terms $B_p$ and $A_p$ in Eqs. (5.17) and (5.18) are used in a similar manner. The penalty values of $T_p$, $B_p$, and $A_p$ can be varied to suit the designer’s intentions.

Table 5.1. Algorithm EP for offline training of univector fields

1. Initialization
   a) t ← 0
   b) Initialize population P_0
2. While (not termination condition) do
   a) t ← t + 1
   b) q ← 0
   c) While (not q = n_p) do
      i. q ← q + 1
      ii. Mutate q-th univector field matrix
      iii. S(q) ← 0
      iv. k ← 0
      v. While (not k = n_s) do
         A. k ← k + 1
         B. Simulate the robot navigation
         C. Calculate evaluation function f
         D. S(q) ← S(q) + f
   d) Select the best n_p candidates using S() for population P_{t+1}.
3. End
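The evaluation function of Eq. (5.15), with the penalty terms of Eqs. (5.16) to (5.18), can be sketched as below. The function and argument names are hypothetical; the sketch assumes that a simulator has already reduced each followed path to a few summary quantities (elapsed time, final heading error, closest approach to the goal, and so on), and the default constants are the values quoted from the computer simulation.

```python
def evaluate_path(t_s, heading_err, reached, min_goal_dist,
                  collided, max_penetration, max_accel_excess,
                  k_t=1.0, k_d=5.0, T_p=500.0, B_p=100.0, A_p=50.0):
    """Evaluate one simulated path (sketch of Eq. 5.15); lower is better.

    t_s              elapsed time
    heading_err      theta_c(t_s) - theta_d at the end of the path
    reached          True if the robot arrived at g within the error bound
    min_goal_dist    min over t of |p(t) - g|
    collided         True if the robot entered an obstacle boundary
    max_penetration  max over the collision interval of |p(t) - p_b|
    max_accel_excess max over t of |dw/dt - alpha_max| when the limit is
                     exceeded, 0.0 when it never is
    """
    f_t = 0.0 if reached else T_p + min_goal_dist                    # Eq. (5.16)
    f_o = B_p + max_penetration if collided else 0.0                 # Eq. (5.17)
    f_a = A_p + max_accel_excess if max_accel_excess > 0.0 else 0.0  # Eq. (5.18)
    return k_t * t_s + k_d * abs(heading_err) + f_t + f_o + f_a
```

A path that reaches the goal cleanly is scored only by its duration and final heading error, while any violated constraint adds at least its penalty value, so constraint-violating individuals rank behind feasible ones.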
Algorithm EP using the evaluation function in Eq. (5.15) is summarized in Table 5.1. In Table 5.1, $n_p$ is the number of individuals in a population and $n_s$ is the number of simulations per individual. The cumulative evaluation value S(q) for the q-th individual is stored in S(), which is used to select the best $n_p$ individuals. Different termination conditions can be used. In this example, the optimization is terminated if the total number of generations exceeds a predefined limit. By this algorithm, a sub-optimal univector field matrix can be obtained. For mutation, the following self-adaptive Gaussian mutation [49], which is widely applied in optimization problems, is used:

\[ \sigma'_{ij} = \sigma_{ij} \exp\bigl(\tau' G(0,1) + \tau G_{ij}(0,1)\bigr), \qquad m_{ij} = \sigma'_{ij}\, G_{ij}(0,1), \qquad \tau = \frac{1}{\sqrt{2\sqrt{K_v}}}, \quad \tau' = \frac{1}{\sqrt{2 K_v}}, \tag{5.19} \]

where $K_v$ is the number of variables in each individual of the population. G(0, 1) is a random variable of normal probability distribution whose mean and variance are 0 and 1, respectively. The global factor $\exp(\tau' G(0,1))$ allows an overall change in the mutability and guarantees the preservation of all degrees of freedom, whereas $\exp(\tau G_{ij}(0,1))$ allows individual changes with a mean step size of $\sigma_{ij}$. The univector field matrix is updated using the following:

\[ \psi(p_{ij}) \leftarrow \psi(p_{ij}) + k\, m_{i,j} + \frac{1-k}{4} \bigl(m_{i-1,j} + m_{i,j-1} + m_{i+1,j} + m_{i,j+1}\bigr), \tag{5.20} \]

where 0 ≤ k < 1. The smoothing coefficient k suppresses the ripples in the fields. For details on constrained optimization by evolutionary programming, the reader is referred to [50].

Real-time Univector Field Navigation for Robot Soccer. Note that the optimization of the univector field is done separately for the two subfields (of the same scale for the same workspace). In optimizing the subfield for a target posture, the target point g and guidance point r are fixed; in particular, point g is fixed at (0, 0), and point r is fixed at $(x_r, 0)$. The length $|x_r|$ of the line gr depends on which action it implements. In training the subfield for obstacle avoidance, an obstacle of a known size is considered, with the centre of the obstacle (let’s call it the obstacle point) positioned at (0, 0). In robot soccer, a robot is treated as (an object placed within the boundary of) a circular obstacle.

In deploying these optimized subfields in a real-time MiroSoT game, the desired heading angle $\theta_d$ of a team robot at an arbitrary point p, i.e., $\theta_d = \angle F(p)$, can then be determined by (some translational and rotational displacements of) these two subfields. Conceptually, such transformation should be done in such a way that

1. the target point g and guidance point r of the subfield for a target posture coincide respectively with the actual target point and guidance point in the playground;
2. the obstacle point (and there are 5 of them for Small League MiroSoT) of each duplicate subfield for obstacle avoidance coincides with the actual centre of the obstacle in the playground.

All the 6 subfield univectors whose positions coincide at the actual robot’s position p are then mathematically ‘combined’ to yield the univector F(p), and hence the desired heading angle $\angle F(p)$, as discussed in Section 4.6.2 and depicted in Fig. 4.21.
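Returning to the offline training step, the self-adaptive mutation of Eq. (5.19) and the smoothed update of Eq. (5.20) can be sketched as follows. This is only an illustration under stated assumptions: τ and τ' are taken as 1/√(2√K_v) and 1/√(2K_v), neighbours that fall outside the grid are assumed to contribute zero (the boundary treatment is not specified in the text), and the function name is hypothetical.

```python
import math
import random


def mutate_field(psi, sigma, k=0.5, rng=random):
    """Return (new_psi, new_sigma) for a b x a grid of angles (sketch)."""
    b, a = len(psi), len(psi[0])
    K_v = b * a                                   # number of variables
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(K_v))   # per-node factor
    tau_prime = 1.0 / math.sqrt(2.0 * K_v)        # global factor
    g = rng.gauss(0.0, 1.0)                       # one global draw G(0,1)

    # Eq. (5.19): self-adapt the step sizes, then draw the raw perturbations.
    new_sigma = [[sigma[i][j] * math.exp(tau_prime * g
                                         + tau * rng.gauss(0.0, 1.0))
                  for j in range(a)] for i in range(b)]
    m = [[new_sigma[i][j] * rng.gauss(0.0, 1.0) for j in range(a)]
         for i in range(b)]

    def m_at(i, j):
        # Assumption: neighbours outside the grid contribute zero.
        return m[i][j] if 0 <= i < b and 0 <= j < a else 0.0

    # Eq. (5.20): blend each node's perturbation with its four neighbours'.
    new_psi = [[psi[i][j] + k * m[i][j] + (1.0 - k) / 4.0 *
                (m_at(i - 1, j) + m_at(i, j - 1)
                 + m_at(i + 1, j) + m_at(i, j + 1))
                for j in range(a)] for i in range(b)]
    return new_psi, new_sigma
```

A smaller k gives more weight to the neighbours' perturbations, so adjacent angles move together and the mutated field stays smoother.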
5.7 Fuzzy Logic and Control

Fuzzy control is a control paradigm that has received a lot of attention recently. In this section we will give a brief description of the key ideas. We will start with fuzzy logic, which has inspired the development.
5.7.1 Fuzzy Logic

Ordinary Boolean logic deals with quantities that are either true or false. Fuzzy logic is an attempt to develop a method for logic reasoning that is less sharp. This is achieved by introducing linguistic variables and associating them with membership functions, which take values between 0 and 1. In fuzzy control, the logical connectives ‘and’, ‘or’ and ‘not’ are operators on linguistic variables. These operations can be expressed in terms of operations on the membership functions of values of the linguistic variables. Consider two linguistic values, A and B, of a linguistic variable x, with the respective membership functions, $\mu_A(x)$ and $\mu_B(x)$. The logical operations are defined by the following operations on the membership functions:

\[ \begin{aligned} \mu_{A \text{ and } B}(x) &= \min(\mu_A(x), \mu_B(x)), \\ \mu_{A \text{ or } B}(x) &= \max(\mu_A(x), \mu_B(x)), \\ \mu_{\text{not } A}(x) &= 1 - \mu_A(x). \end{aligned} \tag{5.21} \]

A linguistic value A, where the membership function is zero everywhere except for one particular measured value $x_0$ of the linguistic variable x, is called a crisp value. Formally, it is characterized by the following membership function:

\[ \mu_A(x) = \begin{cases} 1 & \text{if } x = x_0, \\ 0 & \text{if } x \neq x_0. \end{cases} \tag{5.22} \]
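The connectives of Eq. (5.21) can be sketched with a few helper functions. The triangular shape and the breakpoints chosen for the two example values are illustrative assumptions, not values read from a figure.

```python
def triangle(a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def fuzzy_and(mu_a, mu_b):
    """'and' of Eq. (5.21): pointwise minimum."""
    return lambda x: min(mu_a(x), mu_b(x))


def fuzzy_or(mu_a, mu_b):
    """'or' of Eq. (5.21): pointwise maximum."""
    return lambda x: max(mu_a(x), mu_b(x))


def fuzzy_not(mu_a):
    """'not' of Eq. (5.21): complement."""
    return lambda x: 1.0 - mu_a(x)


# Hypothetical temperature values (degrees); breakpoints are assumptions.
cold = triangle(-10.0, 0.0, 20.0)
moderate = triangle(0.0, 15.0, 30.0)
cold_and_moderate = fuzzy_and(cold, moderate)
cold_or_moderate = fuzzy_or(cold, moderate)
```

With these assumed shapes, a temperature of 10 belongs to cold with degree 0.5 and to moderate with degree about 0.67, so the ‘and’ gives 0.5 and the ‘or’ gives about 0.67.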
Assume for example that we want to reason about temperature. For this purpose we introduce the linguistic values cold, moderate, and hot, and we associate them with the membership functions shown in Fig. 5.28. The membership functions for the linguistic values ‘cold and moderate’ and ‘cold or moderate’ are also shown in the figure.

5.7.2 A Fuzzy Controller

A block diagram of a fuzzy PD controller is shown in Fig. 5.29. The measured values of the linguistic variables, the control error e and the time derivative (or rate of change) of the error ce, are converted to so-called ‘linguistic values’ in a process called ‘fuzzification.’ This procedure converts continuous values (of the linguistic variables) to a collection of linguistic values. The number of linguistic values is typically quite small. Examples of linguistic values are: Negative Big (NB), Negative Medium (NM), Negative Small (NS), Negative Zero (NE), Zero (ZE), Positive Zero (PO), Positive Small (PS), Positive Medium (PM) and Positive Big (PB). The control strategy is expressed in terms of a function that maps linguistic variables to linguistic variables. This function is defined in terms of a set of if-then rules. As an illustration, we
Fig. 5.28. Illustration of membership functions for fuzzy values. The upper diagram shows the membership functions of cold, moderate, and hot. The middle diagram shows the membership functions for cold and moderate; the lower diagram shows the membership functions for cold or moderate.

Fig. 5.29. A fuzzy PD controller (blocks: fuzzifier with inputs e and ce, inference engine with fuzzy rule base, defuzzifier with output u)
give the rules for a PD controller where the error e and its derivative ce are each characterized by three linguistic values (N, Z, P) and the control input u is characterized by five linguistic values (NB, NM, ZE, PM, PB).

Rule 1: If e is N and ce is P, then u is ZE.
Rule 2: If e is N and ce is Z, then u is NM.
Rule 3: If e is N and ce is N, then u is NB.
Rule 4: If e is Z and ce is P, then u is PM.
Rule 5: If e is Z and ce is Z, then u is ZE.
Rule 6: If e is Z and ce is N, then u is NM.
Rule 7: If e is P and ce is P, then u is PB.
Rule 8: If e is P and ce is Z, then u is PM.
Rule 9: If e is P and ce is N, then u is ZE.
These rules can also be expressed in table form, see Table 5.2.

Table 5.2. Representation of the fuzzy PD controller as a table (entries are the values of u)

   e \ ce     P     Z     N
     N        ZE    NM    NB
     Z        PM    ZE    NM
     P        PB    PM    ZE
The membership functions representing the linguistic values normally overlap (see Fig. 5.28). Due to this, several rules contribute to the control input u. The inferred output of each rule is aggregated. The aggregated output is represented by a (fuzzy) linguistic set. The linguistic set representing the control input is then mapped into a real number by an operation called ‘defuzzification.’ More details are given in the following.

5.7.3 Fuzzy Inference

Many different shapes of membership functions can be used. In fuzzy control it is common practice to use overlapping triangular shapes like the ones shown in Fig. 5.28 for both the error and its derivative, and the control input.
Typically, only a few membership functions are involved in the inferencing of each rule for the measured variables. Fuzzy logic is used to a moderate extent in fuzzy control. A key issue is to interpret logic expressions of the type that appears in the description of the fuzzy controller. Some special methods are used in fuzzy control. To describe these, we assume that $\mu_A$, $\mu_B$, and $\mu_C$ are the membership functions associated with the linguistic values A, B, and C. Furthermore, let x and y represent measurements. If the values $x_0$ and $y_0$ are measured, they are considered as crisp values. The fuzzy statement

   If x is A and y is B

is then interpreted as the crisp (‘specific’) value

\[ Z_0 = \min(\mu_A(x_0), \mu_B(y_0)), \tag{5.23} \]

where ‘and’ is realized by the minimum operation of the membership functions. $\min(w_i, w_j)$ is called a T-norm operator and its arguments, $w_i$ and $w_j$, are called firing strengths. Instead of the min(.) operator, any other T-norm operator for implementing ‘and’, such as algebraic product, bounded product and drastic product, can be used. The linguistic variable u defined by

   If x is A and y is B then u is C

is interpreted as a linguistic set C' with the membership function

\[ \mu_{C'}(u) = \min(Z_0, \mu_C(u)). \tag{5.24} \]

If there are several rules, as in the description of the PD controller, each rule is evaluated individually. The results obtained for each rule are aggregated using the ‘or’ operator. This corresponds to taking the maximum operation of the membership functions obtained for each individual rule. Similarly, instead of the maximum operator max(.), called a T-conorm operator, any other T-conorm operator for implementing ‘or’, such as algebraic sum, bounded sum and drastic sum, can be used.

Fig. 5.30 is a graphical illustration for the case of the first two rules of the PD controller. The figure shows how the so-called qualified (induced) consequent membership function corresponding to each rule is constructed, and how the overall output membership function representing the control input is obtained by taking the maximum of the membership functions obtained from all rules.

The inference procedure described is called ‘min-max.’ This refers to the operations on the membership functions. Other inference procedures are also used in fuzzy control. The ‘and’ operation is sometimes represented by taking the product of two membership functions and the ‘or’ operator by taking an algebraic sum. Combinations of the schemes are also used. In this way, it is possible to obtain ‘min-max’ and ‘min-sum’ inference.
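The min-max procedure can be sketched for the nine PD rules as follows. The rule table is the one given for the PD controller; the triangular membership functions and their breakpoints on a normalized range are illustrative assumptions, as are all function and variable names.

```python
def tri(a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Assumed membership functions on a normalized universe.
IN_SETS = {"N": tri(-2.0, -1.0, 0.0), "Z": tri(-1.0, 0.0, 1.0),
           "P": tri(0.0, 1.0, 2.0)}
OUT_SETS = {"NB": tri(-1.5, -1.0, -0.5), "NM": tri(-1.0, -0.5, 0.0),
            "ZE": tri(-0.5, 0.0, 0.5), "PM": tri(0.0, 0.5, 1.0),
            "PB": tri(0.5, 1.0, 1.5)}

# (value of e, value of ce) -> value of u, as in the PD rule list.
RULES = {("N", "P"): "ZE", ("N", "Z"): "NM", ("N", "N"): "NB",
         ("Z", "P"): "PM", ("Z", "Z"): "ZE", ("Z", "N"): "NM",
         ("P", "P"): "PB", ("P", "Z"): "PM", ("P", "N"): "ZE"}


def infer(e, ce, u):
    """Aggregated output membership at u for crisp inputs e and ce."""
    result = 0.0
    for (e_label, ce_label), u_label in RULES.items():
        # 'and' by minimum, Eq. (5.23): the rule's firing strength.
        strength = min(IN_SETS[e_label](e), IN_SETS[ce_label](ce))
        # Clip the consequent, Eq. (5.24).
        clipped = min(strength, OUT_SETS[u_label](u))
        # Aggregate over rules with 'or' by maximum.
        result = max(result, clipped)
    return result
```

Evaluating `infer(e, ce, u)` over a grid of u values yields the sampled output membership function that a defuzzifier then reduces to a single control value.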
Fig. 5.30. Illustration of fuzzy inference with two rules using the min-max rule.
5.7.4 Defuzzification

Fuzzy inference results in a control input expressed as a linguistic set and defined by its membership function. To apply a control input to the real system, we must have a real value. Thus, the linguistic set defining the control input must be converted to a real number through the operation of ‘defuzzification.’ This can be done in several different ways. Consider an overall output linguistic set C with the membership function $\mu_C(u)$. Defuzzification by the ‘centroid of area’ method gives the value

\[ u_0 = \frac{\int u\, \mu_C(u)\, du}{\int \mu_C(u)\, du}. \tag{5.25} \]

Defuzzification by the ‘bisector of area’ method gives a real variable $u_0$ that satisfies

\[ \int_{-\infty}^{u_0} \mu_C(u)\, du = \int_{u_0}^{\infty} \mu_C(u)\, du. \tag{5.26} \]
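Both defuzzification methods can be approximated numerically when the output membership function is available as samples. The sketch below uses trapezoidal integration over a sorted grid of u values; the function names are hypothetical, and the bisector is only located to within one grid step.

```python
def centroid(us, mus):
    """Centroid of area, Eq. (5.25), by trapezoidal integration.

    us: increasing sample points; mus: membership values at those points.
    """
    num = den = 0.0
    for k in range(len(us) - 1):
        du = us[k + 1] - us[k]
        num += 0.5 * (us[k] * mus[k] + us[k + 1] * mus[k + 1]) * du
        den += 0.5 * (mus[k] + mus[k + 1]) * du
    if den == 0.0:
        raise ValueError("membership function has zero area")
    return num / den


def bisector(us, mus):
    """Bisector of area, Eq. (5.26): first sample point at which the
    accumulated area reaches half of the total area."""
    areas = [0.5 * (mus[k] + mus[k + 1]) * (us[k + 1] - us[k])
             for k in range(len(us) - 1)]
    total = sum(areas)
    if total == 0.0:
        raise ValueError("membership function has zero area")
    acc = 0.0
    for k, area in enumerate(areas):
        acc += area
        if acc >= 0.5 * total:
            return us[k + 1]
    return us[-1]
```

For a symmetric output set the two methods agree; they differ when the aggregated set is skewed, in which case the centroid is pulled further towards the long tail.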
5.7.5 Example: Behaviour Level

This example implements a navigation controller (at the behaviour level) for a Shoot action using fuzzy logic control [40].

Fig. 5.31. Shooting from the left side when the line connecting the ball to the opponent goal is on the right (one approach is marked ‘Allowed’, the other ‘Not allowed’)
In the example, the following two constraints are considered:

• Constraint 1: The robot should approach the ball from the side opposite to that with the line connecting the ball and the opponent goal; this is depicted in Fig. 5.31. In other words, it should always approach the ball so as to ‘bump’ it towards the opponent goal.
• Constraint 2: The robot should avoid obstacles in the playground that are not very close to the ball.

The relative posture of a robot is characterized by three variables (ρ, ϕ, θ), where the polar coordinate (ρ, ϕ) is the robot’s position relative to the ball’s, as depicted in Fig. 5.32. These variables are needed to implement the shooting action in view of the abovementioned constraints.

Overall Fuzzy Navigation Controller. The overall fuzzy navigation control structure is as shown in Fig. 5.33. It consists of two sub-controllers organized in a two-level hierarchy. The higher-level sub-controller is a fuzzy
Fig. 5.32. Variables for relative posture characterization
path-planner; it generates a desired global path connecting the robot’s current position to the ball, without violating the two constraints. The lower-level sub-controller is a fuzzy posture-controller; it outputs the robot’s left and right wheel velocities to follow the desired path from the robot’s current posture.

Fuzzy Planner. The fuzzy planner generates a global path that meets the constraints by calculating the robot’s desired heading angle θd at each relative position (ρ, ϕ). It comprises two blocks: one is the destination block, which generates a path that leads to the destination (the ball) and satisfies Constraint 1; the other is the obstacle block, which compensates θd for obstacle avoidance so as to satisfy Constraint 2.

Destination Block. This is for determining the desired heading angle at each robot’s position relative to the ball, (ρ, ϕ). Fig. 5.34 shows the basic idea of constructing a path. A desired path is represented by a line extending to an arc; to move along a directional path, a robot’s desired heading angle θd at an arbitrary point on the path is the angle that the tangent at the point (in the direction of motion) makes with the X-axis. In Fig. 5.34, the turning radius Rmin is set to 5 cm, considering the size of the ball and that of the MiroSoT robot. The input variable membership functions are depicted in Fig. 5.35. The output θd has singleton values obtained at sampled positions as shown in Fig. 5.36; this figure shows the upper-half plane. Since the lower-half and upper-half planes are symmetrical about the X-axis, it suffices to consider the upper-half plane.
Fig. 5.33. Overall fuzzy control structure for the Shoot action
The input, output and rules for the destination block are defined as follows:

1. Input space (ρ, ϕ), relative position of the robot to the ball: ρ ∈ [0 cm, 60 cm], ϕ ∈ [0 deg., 180 deg.].
2. Output space (θd), desired heading angle: θd ∈ [-180 deg., 180 deg.] (indicated by arrows → in Fig. 5.36).
3. Rules: 49 rules are obtained for θd at sampled positions as shown in Fig. 5.36, with ϕ and ρ each characterized by seven linguistic values (NB, NM, NS, ZE, PS, PM, PB). Since the input space is uniformly divided, the rules are ‘sampled’ at the centre of each input region. The resultant rules for the destination block are represented in Table 5.3.
Fig. 5.34. Desired output

Table 5.3. Rules for the destination block (entries are θd; rows: ρ, columns: ϕ)

   ρ \ ϕ      NB        NM        NS        ZE        PS        PM       PB
    NB      -270.0    -240.0    -200.0    -170.0    -140.0    -20.0     0.0
    NM      -216.9    -201.5    -187.9    -180.0    -135.0    -30.0     0.0
    NS      -202.6    -180.0    -155.5    -126.9     -80.0    -34.2     0.0
    ZE      -196.3    -171.0    -143.6    -120.6     -76.9    -35.9     0.0
    PS      -192.7    -166.1    -137.6    -114.3     -71.8    -35.5     0.0
    PM      -190.4    -163.0    -134.0    -110.7     -69.0    -40.2     0.0
    PB      -188.8    -161.0    -131.6    -108.4     -67.4    -45.4     0.0
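One way to turn the singleton rules of Table 5.3 into a continuous heading command is a weighted average of the rule outputs, sketched below. Several things here are assumptions rather than the source's design: the seven linguistic values are modelled as evenly spaced, half-overlapping triangles with peaks at the ends and interior of each input range, the rows of the table are read as ρ and the columns as ϕ, ‘and’ is taken as the minimum, and the function names are hypothetical.

```python
# Table 5.3 consequents in degrees: THETA_D[i][j] for rho label i, phi label j,
# with labels ordered NB, NM, NS, ZE, PS, PM, PB.
THETA_D = [
    [-270.0, -240.0, -200.0, -170.0, -140.0, -20.0, 0.0],
    [-216.9, -201.5, -187.9, -180.0, -135.0, -30.0, 0.0],
    [-202.6, -180.0, -155.5, -126.9, -80.0, -34.2, 0.0],
    [-196.3, -171.0, -143.6, -120.6, -76.9, -35.9, 0.0],
    [-192.7, -166.1, -137.6, -114.3, -71.8, -35.5, 0.0],
    [-190.4, -163.0, -134.0, -110.7, -69.0, -40.2, 0.0],
    [-188.8, -161.0, -131.6, -108.4, -67.4, -45.4, 0.0],
]


def memberships(x, lo, hi, n=7):
    """Degrees of n evenly spaced triangular sets on [lo, hi] (assumed shape)."""
    step = (hi - lo) / (n - 1)
    x = min(max(x, lo), hi)  # clamp to the input space
    return [max(0.0, 1.0 - abs(x - (lo + k * step)) / step) for k in range(n)]


def desired_heading(rho, phi):
    """Weighted average of the singleton outputs for position (rho, phi)."""
    w_rho = memberships(rho, 0.0, 60.0)    # rho in cm
    w_phi = memberships(phi, 0.0, 180.0)   # phi in degrees
    num = den = 0.0
    for i, wr in enumerate(w_rho):
        for j, wp in enumerate(w_phi):
            w = min(wr, wp)                # firing strength of rule (i, j)
            num += w * THETA_D[i][j]
            den += w
    return num / den
```

At a position that sits exactly on a pair of membership peaks only one rule fires, so the function returns the corresponding table entry; between peaks it blends the neighbouring entries.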
Obstacle Block. This block modifies θd using the offset angle θf if there is any obstacle nearby. Four variables, namely, velocity Vr, direction Dr, distance dr, and position Pr (positive if the obstacle is in front, negative otherwise), all relative to the robot, are utilized to obtain θf in the presence of obstacles. These relative quantities are necessary to obtain the escape radius Rs so as to avoid any obstacle that is either stationary or moving, as shown in Fig. 5.37. The input variable membership functions are depicted in Fig. 5.38 and Fig. 5.39.
Fig. 5.35. Membership functions: (a) for ρ (horizontal axis rho, 0 to 60); (b) for ϕ (horizontal axis phi, 0 to 180). Both panels plot the membership value of the linguistic values NB, NM, NS, ZE, PS, PM, PB.
The input, output and rules for the obstacle block are defined as follows:

1. Input space (Vr, Dr, dr, Pr), the velocity, direction, distance and position of an obstacle relative to the robot: Vr ∈ [-0.5, 1.5], Dr ∈ [0 deg., 180 deg.], dr ∈ [0 cm, 90 cm], Pr ∈ [-0.5, 1.5].
Fig. 5.36. θd sampled at each input region in the vicinity of the ball at (0, 0) (horizontal axis -60 to 60, vertical axis 0 to 60)
Fig. 5.37. Obstacle avoidance scheme (labels: robot, obstacle with velocity v, escape radius, do, dr, θs, θd, θd')
2. Output space θf, the offset angle to be added to θd to produce θd' = θd + θf: θf ∈ [-180 deg., 180 deg.].
3. Rules: Fig. 5.40 shows the fuzzy logic control (FLC) for the obstacle block. As shown there, Vr and Dr are needed to obtain Rs, while Pr and dr are used to obtain the proportional gain wo, which is multiplied by θs to produce θf. θs is calculated from the relation

$$ \theta_s = \tan^{-1}\frac{R_s}{d_o}. \tag{5.27} $$
Nine rules are obtained for Rs, with Vr and Dr each characterized by three linguistic values (NB, ZE, PB). Twelve rules are obtained for wo, with Pr characterized by three linguistic values (NE, ZE, PO) and dr characterized by four linguistic values (ZE, PS, PM, PB). The resulting rules for the obstacle block are given in Table 5.4.
Fig. 5.38. Membership functions: (a) for Vr (horizontal axis: relative velocity, -0.5 to 1.5); (b) for Dr (horizontal axis: relative direction, 0 to 180). Vertical axis: membership value; linguistic values NB, ZE, PB.

Table 5.4. Rules for the obstacle block: FLC1 (for Rs) and FLC2 (for wo)

FLC1, entries are Rs (rows: Dr, columns: Vr)

Dr \ Vr    NB    ZE    PB
NB         20    20    20
ZE         20    25    35
PB         20    30    40

FLC2, entries are wo (rows: dr, columns: Pr)

dr \ Pr    NE    ZE    PO
ZE         0.8   1.0   1.0
PS         0.7   1.0   1.0
PM         0.6   0.9   1.0
PB         0.0   0.0   0.0
Fig. 5.39. Membership functions: (a) for dr (horizontal axis -10 to 70; linguistic values ZE, PS, PM, PB); (b) for Pr (horizontal axis -40 to 40; linguistic values NE, ZE, PO).
Fuzzy Posture Controller. In the overall structure of Fig. 5.33, the fuzzy posture controller block receives θd from the fuzzy planner block and the ρ part of the robot posture information from the vision processing system. The posture controller block then generates the appropriate left-wheel and right-wheel velocities to make θ follow θd at non-zero linear speed before ρ diminishes. So the posture controller is only concerned with directing the robot’s heading angle θ to follow θd at positive linear velocity. For this conventional problem of mobile robotics, the following heuristics are incorporated:
Fig. 5.40. FLC for obstacle block (FLC1 maps Vr and Dr to Rs; θs is obtained from Rs through tan⁻¹; FLC2 maps dr and Pr to wo; θf = wo θs)
• If ρ big → VL, VR big.
• If |θe| = |θd − θ| big → |VL − VR| big.
The input variable membership functions are depicted in Fig. 5.41. The input, output, and rules for the posture controller block are defined as follows:
1. Input space (ρ, θe), the posture error of the robot to the ball and the path: ρ ∈ [0 cm, 60 cm], θe ∈ [-120 deg., 120 deg.].
2. Output space (VL, VR), the desired left-wheel and right-wheel velocities: VL, VR ∈ [-54 cm/s, 153 cm/s].
3. Rules: According to the above heuristics, 49 rules are acquired for each of the left-wheel and right-wheel velocities. Table 5.5 is the rule table for the right-wheel speed VR, with ρ and θe each characterized by seven linguistic values (NB, NM, NS, ZE, PS, PM, PB). The left-wheel speed is symmetrical about the X-axis (i.e., ϕ = 0). In the table, one unit corresponds to 1.534 cm/sec for the MiroSoT robot.

Table 5.5. Rules for right wheel (entries are VR in table units; rows: ρ, columns: ϕ)

ρ \ ϕ    NB    NM    NS    ZE    PS    PM    PB
NB      -35   -25   -15    30    15    25    35
NM      -27     8    15    30    40    51    63
NS      -27     8    22    50    44    51    63
ZE       -3    18    35    60    65    61    67
PS       -3    31    57    90    82    68    67
PM       -3    31    67   100    92    68    67
PB       -3    42    67   100    92    77    67
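A small sketch may help in reading Table 5.5. It converts table units to cm/s with the stated factor of 1.534, and it obtains a left-wheel entry by mirroring the angle column of the right-wheel table. The mirroring is one reading of the symmetry statement above and is an assumption, as are the function names.

```c
#define N_LING 7               /* NB, NM, NS, ZE, PS, PM, PB */
#define CM_PER_S_PER_UNIT 1.534

/* Table 5.5: rows indexed by rho (NB..PB), columns by the angle input. */
static const int vr_table[N_LING][N_LING] = {
    {-35, -25, -15,  30, 15, 25, 35},
    {-27,   8,  15,  30, 40, 51, 63},
    {-27,   8,  22,  50, 44, 51, 63},
    { -3,  18,  35,  60, 65, 61, 67},
    { -3,  31,  57,  90, 82, 68, 67},
    { -3,  31,  67, 100, 92, 68, 67},
    { -3,  42,  67, 100, 92, 77, 67}
};

int right_wheel_units(int rho_idx, int angle_idx)
{
    return vr_table[rho_idx][angle_idx];
}

/* Assumed symmetry: the left wheel uses the mirrored angle column. */
int left_wheel_units(int rho_idx, int angle_idx)
{
    return vr_table[rho_idx][N_LING - 1 - angle_idx];
}

double units_to_cm_per_s(int units)
{
    return units * CM_PER_S_PER_UNIT;
}
```

The extreme entries agree with the stated output range: 100 units is about 153.4 cm/s and -35 units is about -53.7 cm/s.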
Fig. 5.41. Membership functions: (a) for ρ (horizontal axis: distance, 0 to 60); (b) for θe (horizontal axis: theta error, -150 to 150). Vertical axis: membership value; linguistic values NB, NM, NS, ZE, PS, PM, PB.

You will have noticed that all the rule tables contain real-valued entries instead of linguistic values for the respective control variables, namely θd, Rs, wo, and VR. In the path planner tables for θd, Rs, and wo, the entries are crisp values that are used directly as control inputs, so no defuzzification step such as Eq. (5.25) is needed. In the posture control table for VR, these singleton values have been determined in a heuristic and empirical manner; they need to be fine-tuned for better control performance. EP can be applied to tune the fuzzy posture controller, and we refer the reader to [40] for details.
Notes on Selected References
On search algorithms, knowledge representation and learning from an agent’s perspective, the textbook [28] is a good source. To use Petri net theory for systems modelling, the book [51] should be consulted. The book [52] is a good introduction to reinforcement learning. The paper [45] surveys, from a computer-science perspective, the field of reinforcement learning prior to 1996. On Q-learning, the work of Watkins and Dayan [53] is recommended in addition to the book [52]. As demonstrated in Section 5.4.3, the state space for Q-learning is quite large for a Small League MiroSoT team, and is set to increase with the number of robots. To extend Q-learning to a Middle League MiroSoT or NaroSoT team, the state explosion problem must be mitigated; the paper [54] presents a modular Q-learning approach that attempts to address this problem for multi-agent cooperation at the action level for a NaroSoT (5-robot) team.
There are a number of good textbooks on neural networks; the book [46] is a suitable reference for beginners.
The book [55] is the landmark publication for EP applications, although many other papers appear earlier in the literature. In the book, finite state automata were evolved to predict symbol strings generated from Markov processes and non-stationary time series. Such evolutionary prediction was motivated by a recognition that prediction is a keystone to intelligent behavior (defined in terms of adaptive behavior, in that the intelligent organism must anticipate events in order to adapt behavior in light of an objective to achieve). Recent references on evolutionary programming include the book [56]. The book [57] is a self-contained volume of research papers covering both introductory material and selected advanced topics on the theory and application of evolutionary computation.
For an introductory course on fuzzy logic, refer to the textbook [58]. For a treatment of fuzzy control from a control-engineering perspective, the book [59] is recommended. A number of defuzzification methods more flexible than the ones introduced in Section 5.7.4 can be found in [60, 61, 62].
6. Robot Soccer System: Software Components and Programming
6.1 Introduction
The software complexity of a real-time robot soccer system calls for a structured development methodology and framework. In this chapter, we describe a host software model for the development of a robot soccer system. This model emphasizes modularity of design. An overview of the programming framework for robot soccer is then presented, in which a number of the robot soccer concepts described in earlier chapters are illustrated through example ‘C’ programs. These programs implement the key functions of a command-based robot soccer system for MiroSoT. An overview of the functions, categorized into basic and applied skills, is given below.
1. Basic skills
For a specified soccer robot,
a) Velocity() sets its left-wheel and right-wheel velocity data;
b) Angle() sets its desired left-wheel and right-wheel velocity data towards achieving a specified turning angle;
c) Position() sets its left-wheel and right-wheel velocity data to move towards a specified position; and
d) Shoot(), similar to Position(), additionally sets the desired angle at which the robot should arrive at the specified position (where the ball is). The purpose is to direct the robot to hit the ball in the intended direction.
2. Applied skills
These are more advanced functions that consider various strategic game situations. For a specified robot,
a) Kick() implements a strategic process of ball kicking by pushing, as described by a state machine;
b) Goalie() implements a strategic process of goalkeeping, as described by IF-THEN rules; and
c) AvoidBound() implements an auxiliary strategy to prevent the robot from getting ‘stuck’ at the side-wall. The strategy is also described by IF-THEN rules.
J.-H. Kim, D.-H. Kim, Y.-J. Kim, K.-T. Seow: Soccer Robotics, STAR 11, pp. 205-256, 2004 Springer-Verlag Berlin Heidelberg 2004
Several game and robot navigation strategies are also suggested. For the overall game, a simple zone-defence strategy is explained and its ‘C’ program code is given. The univector field and limit-cycle navigation methods, studied in Section 4.6, are good alternative strategies for implementing the functions Kick() and Position(). The ‘C’ code of the essential component functions for implementing each method is also given and explained. To build a solid MiroSoT team, these example and generic codes could be modified or expanded to incorporate other ideas and techniques, many of which have been introduced in earlier chapters.
6.2 MiroSoT Host Software Model
6.2.1 Modular Software
The desired characteristics of a host software system for a MiroSoT team include fault tolerance and ease of development. Fault tolerance is desired to accommodate time-critical activities and ensure that the system remains stable even if deadlines are sometimes missed. To achieve these characteristics, a modular approach to building the host computer software is recommended. Modularity allows the task of software development to be broken along functional boundaries and assigned to different members of the programming team. It provides for ease of development and maintenance in that the inputs and outputs of each module may be specified, inspected, and debugged. It allows for the eventual migration of some parts of the software to the robots themselves without re-programming from scratch. Finally, it even allows modules to be restarted or replaced during a match without requiring a full reset.
6.2.2 Modular Design Rules
The robot soccer system is multi-tasking and asynchronous, so there is a potential for deadlocks, which must be avoided entirely. Thus, for a modular software design to be deadlock-free, a module should adhere to the following rules:
1. It should not have any loop in the module data-dependency path.
2. It implements only a single task with a single data dependency. For example, the vision module should only depend on the video frame grabber card and not also on the status of the communication port, so the communication driver and video interface must be in separate modules.
3. It should be independent of other modules to the greatest extent possible, in order to minimize dependencies and allow independent specification, implementation, and verification.
Based on these general rules, a modular software model is presented as shown in Fig. 6.1. It shows all the (software) modules comprising the MiroSoT host software, with the directional lines denoting message passing.
Fig. 6.1. Host software model for a MiroSoT team (modules: User interface, Vision, Calibration, Strategy, Image data recorder, Real time monitor display, Message transmission, Game data recorder; line types: Data, User commands, Critical data)
The key system modules are the vision, strategy and message transmission modules; they implement the functionalities of SENSE, DECIDE and ACT:Control, and ‘interface’ between ACT:Control and ACT:Actuation, respectively. As this is a real-time system, data is continually being generated. A module implementation should therefore adhere to the following rules necessary to achieve fault tolerance in real time:
1. It should be able to act only on the most current set of data and not waste time processing old data. This implies that the system as a whole should not have any queuing.
2. It should be able to function even if the module ‘listening’ to (i.e., receiving) its output becomes inoperative. In other words, sending its output data to an inoperative or nonexistent module will not cause it to stall, a feature called non-blocking write.
3. It does not re-transmit and must communicate with other modules using only atomic messages (i.e., messages that cannot be fragmented).
4. It needs to be able to tolerate the failure of a module from which it receives its data set, and has code to handle such events. For example, the message transmission module monitoring the command data from the strategy module would need to tell the team robots to halt if the strategy module fails to send the command data within an allotted time, to prevent these robots from crashing into the side walls.
5. It should perform the same function regardless of the number of listening robots, in order to ensure consistent output from running in simulation to running in a real environment.
Note that an exception to Rule 1 is the implementation of predictor(), an important subroutine of the strategy module. This is because the subroutine requires not only the current but also the last four positions of the ball, in order to predict the next ball position (see Fig. 5.8).
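Rules 1, 2 and 4 can be illustrated with a one-slot ‘mailbox’ between two modules. The sketch below is a single-threaded illustration only: the type and function names are hypothetical, time is passed in explicitly as milliseconds, and a real multi-tasking system would additionally need the slot update to be atomic (Rule 3).

```c
/* Hypothetical command message: normalized velocities for one robot. */
typedef struct {
    int vl;
    int vr;
} WheelCommand;

/* One-slot mailbox: the writer overwrites, so nothing is ever queued. */
typedef struct {
    WheelCommand latest;
    unsigned long stamp_ms;  /* time of the last write */
    int valid;               /* 0 until the first write */
} Mailbox;

void mailbox_init(Mailbox *mb)
{
    mb->latest.vl = 0;
    mb->latest.vr = 0;
    mb->stamp_ms = 0;
    mb->valid = 0;
}

/* Non-blocking write: returns at once whether or not anyone is reading. */
void mailbox_write(Mailbox *mb, WheelCommand cmd, unsigned long now_ms)
{
    mb->latest = cmd;
    mb->stamp_ms = now_ms;
    mb->valid = 1;
}

/* Returns 1 and the newest command if it is fresh enough; otherwise
 * returns 0 and a halt command (both wheel values zero). */
int mailbox_read(const Mailbox *mb, unsigned long now_ms,
                 unsigned long timeout_ms, WheelCommand *out)
{
    if (mb->valid && now_ms - mb->stamp_ms <= timeout_ms) {
        *out = mb->latest;
        return 1;
    }
    out->vl = 0;
    out->vr = 0;
    return 0;
}
```

In this arrangement a strategy module would call mailbox_write() every cycle, and a message transmission module would call mailbox_read() with its allotted timeout, sending the halt command whenever the strategy data is missing or stale.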
6.3 Programming Framework: An Overview
The vision, strategy and message transmission modules of the host software model of Fig. 6.1 are implemented as Find_Object(), My_Strategy(), and Send_Command(), respectively, in the program structure depicted in Fig. 6.2.

Fig. 6.2. Overall program structure (blocks shown: OnReady; Find_Object, My_Strategy, Send_Command)
My_Strategy() is developed to cover all the aspects (or modes) of the MiroSoT game, as shown in Fig. 6.3. The program code fragment for selecting a game mode is given as follows:

    void My_Strategy()
    {
        switch(m_nGameMode)
        {
        case GAME_STARTKICK:
            Kick_Off();        // Strategy for kick-off
            break;
        case GAME_PENALTY:
            Penalty_Kick();    // Strategy for penalty-kick
            break;
        . . .
        case GAME_NORMAL:
            Normal_Game();     // Strategy for game
            break;
        }
    }

Fig. 6.3. My_Strategy() (blocks shown: Kick_Off, Penalty_Kick, Free_Kick, Free_Ball, Goal_Kick, Normal_Game)

The game mode is selected via the system GUI, and the program invokes the corresponding strategy via the switch variable m_nGameMode. The key robot soccer functions/procedures needed for each mode of the game are listed in Table 6.1. They are grouped into basic and applied skill functions. The next two sections describe how they are implemented as program code for the experimental robot soccer system. In this system, only the proportional (P) control law is used to compute the desired wheel velocities, and closed-loop (actuation) control is then implemented in each team robot to achieve the commanded or desired wheel velocities. The P gain values used in all the program code are empirical values. The microprocessor-based hardware and firmware for a soccer robot as introduced in Chapter 2 are assumed. The following are definitions of some programming constants, variables and arrays needed to understand the program code that follows the description of each robot soccer function:
1. Constants
HOME1 : team robot ID 0.
HOME2 : team robot ID 1.
HGOALIE : team robot ID 2.
M_PI : π (3.14).
Table 6.1. Program functions for robot soccer system (MiroSoT category)

Mode                        Functions
Kick_Off()                  Velocity(), Position(), Goalie().
Penalty_Kick()              Kick(), Goalie().
Free_Kick()                 Position(), Goalie().
Goal_Kick()                 Kick(), Position().
Normal_Game() : Attack()    Position(), Kick().
Normal_Game() : Defend()    Position(), Kick().
Normal_Game() : Goalie()    Position(), Angle(), Velocity().
2. Variables
whichrobot : robot ID.
d_e : error in distance between robot’s current position (x, y) and desired position (xd, yd).
theta_d : desired angle θd (in degrees).
theta_e : error in angle θe.
vL : desired (PWM-based) velocity data HL for left-wheel of robot.
vR : desired (PWM-based) velocity data HR for right-wheel of robot.
Recall that data H ∈ [0, 255] is PWM integer data (see Section 2.3.5, page 63).
3. Arrays and functions
• PositionOfBall[0]: the x-coordinate value of the ball’s position.
• PositionOfBall[1]: the y-coordinate value of the ball’s position.
• AngleOfHomeRobot[whichrobot]: the heading angle θ (in degrees) of robot with ID whichrobot.
• PositionOfHomeRobot[whichrobot][0]: the x-coordinate value of the position of the robot with ID whichrobot.
• PositionOfHomeRobot[whichrobot][1]: the y-coordinate value of the position of the robot with ID whichrobot.
• atan2(arg1,arg2): tan⁻¹(arg1/arg2) (in radians).
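For orientation, the definitions above might be declared along the following lines. The element types and the constant NUM_HOME_ROBOTS are assumptions made for this sketch; the source fixes only the names, the robot IDs and the meaning of each entry.

```c
#include <math.h>   /* atan2() */

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Team robot IDs */
#define HOME1   0
#define HOME2   1
#define HGOALIE 2
#define NUM_HOME_ROBOTS 3   /* assumed: one slot per team robot ID */

/* Vision results (types assumed) */
double PositionOfBall[2];                        /* [0] = x, [1] = y */
double PositionOfHomeRobot[NUM_HOME_ROBOTS][2];  /* [id][0] = x, [id][1] = y */
int    AngleOfHomeRobot[NUM_HOME_ROBOTS];        /* heading angle in degrees */

/* Velocity data H for the left and right wheels, each in [0, 255] */
int vL, vR;
```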
6.4 Basic Skill Functions
6.4.1 Velocity()
To actuate the motors, PWM (Pulse Width Modulation) is used (see Section A.1, from page 273 onwards). As explained in Section 2.3.5 (from page 63), the velocity data H is PWM data sent by the host computer and converted to the actual PWM data W by the receiving robot for motor actuation. The required PWM data H for a desired wheel (rotational) velocity ω̄G can be computed using Eq. (2.39) for W and H = (1/4)W. For a specified robot, the Velocity() function sets the H data vL and vR, given the respective ‘normalized’ velocity data vl and vr defined by

$$ v_l = \frac{1}{2} A \, \frac{\bar{\omega}_{GL}}{\omega_{GL}|_{\max}}, \qquad v_r = \frac{1}{2} A \, \frac{\bar{\omega}_{GR}}{\omega_{GR}|_{\max}}, \tag{6.1} $$

where ωGL and ωGR denote the rotational velocities, as averages (overbar) and maximum constants (|max), of the left and right wheels, respectively. Equivalently, we get

$$ v_l = \frac{1}{2} A \, \frac{V_L}{V_L|_{\max}}, \qquad v_r = \frac{1}{2} A \, \frac{V_R}{V_R|_{\max}}, \tag{6.2} $$

where VL|max and VR|max are constants denoting the maximum linear velocities of the left and right wheels, respectively. Motors of the same driving capacity are used, thus VL|max = VR|max = Vmax. Hence, the ‘PWM-version’ of Eq. (4.9) is

$$ v_l = K_\nu \cdot \nu - K_\omega \cdot \omega, \qquad v_r = K_\nu \cdot \nu + K_\omega \cdot \omega, \tag{6.3} $$

where $K_\nu = \frac{1}{2 V_{\max}} \cdot A$ and $K_\omega = K_\nu \cdot \frac{L}{2}$.
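Eq. (6.3) can be checked numerically with a short helper. The sketch assumes that L is the wheel separation of Eq. (4.9), that ν and Vmax share the same length unit per second, and that ω is in rad/s; the function name and the sample values in the note below are hypothetical.

```c
typedef struct {
    int vl;  /* normalized left-wheel data */
    int vr;  /* normalized right-wheel data */
} NormalizedPwm;

/* Eq. (6.3): vl = Kv*nu - Kw*omega, vr = Kv*nu + Kw*omega,
 * with Kv = A / (2*Vmax) and Kw = Kv * L / 2. */
NormalizedPwm twist_to_normalized_pwm(double nu, double omega,
                                      double v_max, double wheel_sep,
                                      int a)
{
    NormalizedPwm out;
    double k_nu = a / (2.0 * v_max);
    double k_omega = k_nu * wheel_sep / 2.0;

    out.vl = (int)(k_nu * nu - k_omega * omega);
    out.vr = (int)(k_nu * nu + k_omega * omega);
    return out;
}
```

With the illustrative values A = 256, Vmax = 128 cm/s and L = 8 cm, the gains are Kν = 1 and Kω = 4, so ν = 50 cm/s with ω = 2 rad/s gives vl = 42 and vr = 58, while a pure rotation (ν = 0, ω = 5 rad/s) gives vl = -20 and vr = 20.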
As we will show later, the inputs vl and vr to Velocity() are generated by selecting and applying a control law for the translational velocity ν and/or turning velocity ω of the (centre of the) robot. Referring to Eq. (2.39), suppose we set A = (PR2) + 1 = 256, and for the left (L) and right (R) motors used, i ∈ {L, R},

$$ z_{di} = 9 \ \text{(dead zone)}, \qquad \left| \frac{\bar{\omega}_{Gi}}{\omega_{Gi}|_{\max}} \right| \le 0.83 \ \text{(due to saturation)}. $$

Then applying Eq. (6.1),

$$ v_l = 128 \, \frac{\bar{\omega}_{GL}}{\omega_{GL}|_{\max}} \in [-106, 106], \qquad v_r = 128 \, \frac{\bar{\omega}_{GR}}{\omega_{GR}|_{\max}} \in [-106, 106]. $$

From Eq. (2.39) for the actual PWM data W, and H = (1/4)W for vL and vR, the program code for Velocity() follows:

    void Velocity(int whichrobot, int vl, int vr)
    {
        // Max limits for backward (-ve) and forward (+ve) directions
        // in backward direction
        if( vl < -106 ) vl = -106;   // For left wheel
        if( vr < -106 ) vr = -106;   // For right wheel
        // in forward direction
        if( vl > 106 ) vl = 106;     // For left wheel
        if( vr > 106 ) vr = 106;     // For right wheel

        // Use ASCII code extensions 128-255
        // (representation for non-printable characters)
        if( vl >=0 ) vL = 137 + vl;  // For left wheel
        else vL = 119 + vl;
        if( vr >=0 ) vR = 137 + vr;  // For right wheel
        else vR = 119 + vr;

        // To avoid 0xAA (alternating byte)
        // used in communication protocol
        if(vR == 0xAA) vR = 0xAC;
        if(vL == 0xAA) vL = 0xAC;

        switch(whichrobot){                    // Specify which robot
        case HOME1:                            // When whichrobot = 0
            command[4] = (unsigned char)(vL);  // Left-wheel velocity
            command[5] = (unsigned char)(vR);  // Right-wheel velocity
            break;
        case HOME2:                            // When whichrobot = 1
            command[6] = (unsigned char)(vL);  // Left-wheel velocity
            command[7] = (unsigned char)(vR);  // Right-wheel velocity
            break;
        case HGOALIE:                          // When whichrobot = 2
            command[8] = (unsigned char)(vL);  // Left-wheel velocity
            command[9] = (unsigned char)(vR);  // Right-wheel velocity
            break;
        }
    }

In the code, the global array command[] holds 10 one-byte elements. The experimental host program uses this array to send data by IR communication to the team robots, in the message format shown in Fig. 2.35. The elements command[i], for which i = 4,5,..,9, are the variables that store the respective velocity data H for the team robots. The other elements, for which i = 0,..,3, are the variables that store information in accordance with the IR communication protocol. The communication function used is the Send_Command() function given below:

    void Send_Command()
    {
        m_pComm.WriteCommBlock((LPSTR)command, 10);
        // Sends data in command array of size 10 bytes
    }

Note that the byte 0xAA is not included in the command[] array; the transmitter automatically adds these bytes during the actual transmission. Before Send_Command() can be used, a serial port must be selected; details of port selection are, however, omitted. Velocity() is the most fundamental function. The next two basic skills are formulated in terms of this function.

6.4.2 Angle()
This function sets the desired velocity data vL and vR for a specified robot towards achieving a desired turning angle theta_d. Referring to Fig. 6.4, the x-distance and y-distance between the position (x, y) of the robot and the desired point are |dx| and |dy|, respectively; the desired angle theta_d is thus given by

$$ \theta_d = \tan^{-1} \frac{dy}{dx}. $$
Fig. 6.4. Angle or turning control (labels: desired point, dx, dy, robot heading θh, desired angle θd, angle error θe = θd − θh)
For a specified robot, Angle() first uses a control law to set the ‘normalized’ velocity data vl and vr, given a desired angle theta_d, and then uses Velocity() to generate the velocity data vL and vR. The specified robot that receives the data will rotate its two wheels accordingly; for example, the robot depicted in Fig. 6.4 would rotate its left wheel forward and its right wheel backward to face the desired direction. Consider proportional (P) control for ω. Then

$$ \omega = K_{Pa} \cdot \theta_e, \tag{6.4} $$

where KPa is the proportional gain. Because Angle() is concerned with turning motion only, ν = 0. Substituting Eq. (6.4) and ν = 0 into Eq. (6.3), we get the ‘PWM-version’ of the P control law to move the robot:

$$ v_l = -k_a \cdot \theta_e, \qquad v_r = k_a \cdot \theta_e, \tag{6.5} $$
where ka = Kω · KPa. The robot’s heading angle theta is obtained in real time from the array AngleOfHomeRobot[]. Using this array and Eq. (6.5), the program code for Angle() follows:

    void Angle(int whichrobot, int theta_d)
    {
        // declare variables theta_e, vl, vr
        int theta_e, vl, vr;

        // calculate theta_e = theta_d - theta
        theta_e = theta_d - AngleOfHomeRobot[whichrobot];

        // keep theta_e within (-180, 180]
        while(theta_e > 180) theta_e -= 360;
        while(theta_e <= -180) theta_e += 360;

        if(abs(theta_e) > 50){   // |theta_e| > 50.
            // Calculate normalized PWM data
            vl = (int)(-9./90.*(double)theta_e); // for left wheel
            vr = (int)(9./90.*(double)theta_e);  // for right wheel
        }
        else if(abs(theta_e) > 20) {   // 20 [...]

    [...]

        // choose kl, ka according to distance error
        if(d_e > [...]) kl = 1.0, ka = 0.12;
        else if(d_e > 50) kl = 1.2, ka = 0.125;
        else if(d_e > 30) kl = 1.4, ka = 0.13;
        else if(d_e > 20) kl = 2.0, ka = 0.14;
        else if(d_e > 10) kl = 3.0, ka = 0.16;
        else kl = 5.0, ka = 0.18;

        // calculate the desired angle
        if(dx==0 && dy==0) theta_d = 90;   // prevent div. by zero
        else theta_d = (int)(180/M_PI*atan2((double)(dy),(double)(dx)));

        // keep theta_e within (-180,180]
        while(theta_e > 180) theta_e -= 360;
        while(theta_e <= -180) theta_e += 360;

        [...]
        if(theta_e > 90){
            theta_e -= 180;
            d_e = -d_e;
        }

        // calculate normalized PWM values
        vl = (int)(kl*d_e - ka*(double)theta_e); // for left wheel
        vr = (int)(kl*d_e + ka*(double)theta_e); // for right wheel

        Velocity(whichrobot, vl, vr);   // Call the Velocity function
    }

When called periodically with the same (x_d, y_d), the above program Position() directs the specified robot to move to the desired position.
However, the code does not address two practical problems. First, as the specified robot approaches the desired point, it slows down and eventually stops at the desired point. This means that if this desired point is the position of the ball, the robot stops when it reaches (or is close to) the ball. In other words, it cannot kick the ball. Second, as depicted in Fig. 6.6, when the angle error theta_e is at +90° or −90°, the robot experiences a swinging motion as it continually switches its heading between the conventional forward and backward directions. To elaborate, Position(), called when θe = +90°, will direct the robot to move forward (to the right) as shown in Fig. 6.6(a); but in so doing, the angle error θe increases marginally above 90° by the second call to Position(). In this case, Position() will switch the robot’s heading to the conventional backward direction, opposite to the thick ‘Moving direction’ arrow. The robot then moves in the new heading direction (to the left) as shown in Fig. 6.6(b), causing the angle error θe to decrease marginally below −90° by the third call to Position(). This condition leads Position() to switch the robot’s heading back to the conventional forward direction, as indicated by the thick ‘Moving direction’ arrow. This cyclic pattern continues, resulting in oscillatory motion of the robot. Assuming a stationary ball, this motion can continue for several cycles before the robot reaches the ball.
Fig. 6.6. Problem of oscillation about θe = ±90° with Position(): (a) when θe is marginally above 90°; (b) when θe is marginally below −90°
To overcome the first problem, apply proportional (P) control to ω only, i.e.,

$$ v_l = V_c - k_a \cdot \theta_e, \qquad v_r = V_c + k_a \cdot \theta_e, \tag{6.8} $$

but with Vc = Kν · ν set by the following exponential function:

$$ V_c = v_o \left( \frac{1}{1 + e^{-\varepsilon_1 d_e}} - \varepsilon_2 \right), \tag{6.9} $$

where vo, ε1 and ε2 are positive constants. A graph of Eq. (6.9) is shown in Fig. 6.7. The purpose of Eq. (6.9) is to keep the robot’s velocity ν at a certain positive limit when it reaches the desired position.

Fig. 6.7. Graph of Vc against de (vertical axis: Vc, ticks at 0, 20 and 40; horizontal axis: de, ticks at 1 cm and 5 cm)

To overcome the second problem (the oscillation problem), the code for Position() needs to be revised to switch the robot’s heading direction to the opposite direction if the angle error θe is greater than (90° + β+) or less than −(90° + β−). In addition, under the condition θe ∈ [−(90° + β−), −(90° − β−)] or θe ∈ [(90° − β+), (90° + β+)], it needs to set the velocity data vl and vr for robot turning only, in order to exit this condition swiftly. Incorporating these considerations with vo = 70, ε1 = 3 and ε2 = 0.3 for Eq. (6.9) and β+ = β− = 5, the program code for Position() is revised as follows:
    void Position(int whichrobot, double x_d, double y_d)
    {
        // declare the variables
        int theta_d=0, theta_e = 0, vl, vr, vo =70;
        double dx, dy, d_e, ka = 10.0/90.0;

        // calculate the distance error
        dx = x_d - PositionOfHomeRobot[whichrobot][0];
        dy = y_d - PositionOfHomeRobot[whichrobot][1];
        d_e = sqrt(dx*dx+dy*dy);

        if(dx==0 && dy==0) theta_d = 90;
        else theta_d = (int)(180/M_PI*atan2((double)(dy),(double)(dx)));

        // calculate theta_e = theta_d - theta_r
        theta_e = theta_d - AngleOfHomeRobot[whichrobot];

        // keep theta_e within (-180,180]
        while(theta_e > 180) theta_e -= 360;
        while(theta_e <= -180) theta_e += 360;

        // choose ka according to distance error
        if(d_e > 100) ka = 17./90.;
        else if(d_e > 50) ka = 19./90.;
        else if(d_e > 30) ka = 21./90.;
        else if(d_e > 20) ka = 23./90.;
        else ka = 25./90.;

        if (theta_e > 95 || theta_e < -95) {
            // switch robot’s heading direction
            theta_e += 180;
            if (theta_e>180) theta_e -= 360;
            if(theta_e [...]

    [...]

        if(abs(theta_e) > 30) ka = 0.2;
        else if(abs(theta_e) > 20) ka = 0.22;
        else ka = 0.24;

        // calculate normalized PWM data
        vl = (int)(Vc - ka*(double)theta_e); // for left wheel
        vr = (int)(Vc + ka*(double)theta_e); // for right wheel

        Velocity(whichrobot,vl,vr);   // call the Velocity function
    }

When called periodically with the same position inputs, the above program Shoot() should direct a specified robot to bump against the ball at the desired heading angle. However, the code does not address the technical problem of robot chattering. Ideally, Shoot() should direct the specified robot at high velocity to negotiate any bend smoothly, and then approach the ball along a straight line. But due to ‘overshoot’ in negotiating the bend at high speed, there is subsequently a continual angle-error correction in the robot’s heading as it approaches the ball, resulting in a chattering motion trajectory as depicted in Fig. 6.10. To elaborate, when the fast-moving robot ‘cuts’ the straight line from below, Shoot() would set the robot’s desired heading angle θd to negative. But by the next call to Shoot(), the fast-moving robot would have ‘cut’ the line from above, so Shoot() would set the robot’s desired heading angle θd to positive. This cyclic pattern continues rapidly, resulting in the robot chattering as it approaches the ball.
Fig. 6.10. Problem of chattering with Shoot() (labels: Robot, Ball)
Other techniques, such as the two methods detailed in Section 4.6, are suitable for implementing the Shoot() function. In particular, we highlight a modification of the univector navigation method presented in Section 4.6.2 for addressing the chattering problem, as follows. Shown in Fig. 6.11 is a modified univector field for attaining a target posture (i.e., a desired heading angle for the robot at the position of the ball). The altered area has all unit vectors at 0 degrees (i.e., pointing horizontally to the right, where the ball is). This differs from the original field, shown in Fig. 4.19, which has all unit vectors in the same area pointing at either positive or negative angles, constituting a vector flow that converges to the line of horizontal unit vectors passing through point g (the ball’s position in Fig. 6.11) and point r. Thus, in the original field, chattering can occur when the robot turns into the said area at high speed. But any unit vector that the robot ‘latches’ onto upon entering the modified area can direct the robot straight towards the ball, eliminating the chattering problem. In this modified univector field navigation approach, the width y of the modified area should ideally not exceed the robot’s width L, so that the robot can hit the ball with the intended impact direction passing through the centre of the ball.
Fig. 6.11. A modified univector field for solving the problem of chattering (labels: 0.5∆ on either side of the approach line, oB, d)
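The modification can be pictured as a thin wrapper around the original field. The sketch below is written under several assumptions that are not taken from the source: the ball sits at the origin, the robot approaches from x < 0 with a desired heading of 0 degrees, the modified area is a horizontal band of the given width centred on the approach line, and sample_original_field() is only a stand-in for the field of Section 4.6.2.

```c
/* Signature of a univector field: heading angle (deg.) at point (x, y). */
typedef double (*FieldAngleFn)(double x, double y);

/* Hypothetical stand-in for the original field: vectors above the
 * approach line point down towards it, vectors below it point up. */
static double sample_original_field(double x, double y)
{
    (void)x;
    if (y > 0.0) return -45.0;
    if (y < 0.0) return 45.0;
    return 0.0;
}

/* Inside the band behind the ball every unit vector is horizontal;
 * elsewhere the original field is used unchanged. */
double modified_field_angle(double x, double y, double band_width,
                            FieldAngleFn original)
{
    double half = band_width / 2.0;

    if (x <= 0.0 && y >= -half && y <= half)
        return 0.0;
    return original(x, y);
}
```

Once the robot is inside the band, small lateral errors no longer change the commanded heading, which is the property that removes the alternating corrections described above.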
6.5 Applied Skill Functions
The following applied robot soccer functions are formulated in terms of the basic skill functions studied in the preceding section, via a structure. For Kick(), the structure is realized by a state machine. For Goalie() and AvoidBound(), the structure is realized by IF-THEN rules.

6.5.1 Kick()
This function uses Position() and Angle() to implement the process of ball kicking by a specified robot. This function is an alternative to the Shoot() function. In the following, the pseudocode for Kick() is first given. We then explain how this function can strategically direct the specified robot to kick the ball towards the opponent goal, taking the borders (or side-walls) of the playground into consideration.

Overall Pseudocode.

    void Kick(int whichrobot)
    {
        static int flag;

        // kicking direction
        Set the kicking direction to the goal;
        Near the playground border, change the direction to avoid collision;

        switch(flag){
        case 0:   // state S0
            Go behind the ball (i.e., into ‘shootable area’) by using Position();
            If the robot is near the specified shooting position,
                switch to state S1 by setting flag = 1;
            break;
        case 1:   // state S1
            Turn to face the ball by using Angle();
            If the robot’s direction is towards the ball,
                switch to state S2 by setting flag = 2;
            break;
        case 2:   // state S2
            Kick the ball by using Position();
            If the robot is out of the shootable area,
                switch to state S0 by setting flag = 0;
            break;
        }
    }
The structure of the Kick() pseudocode follows the program structure of Fig. 6.12(a), which is naturally described by a state machine, represented as a (self-explanatory) state graph in Fig. 6.12(b).
Fig. 6.12. A strategy for the Kick() function: (a) program structure, a switch on flag in which each case (state Si) does action Ai and, if condition Ci holds, updates flag; (b) state machine over the states S0, S1 and S2.

State                    Action                      Condition
S0: Far from the ball    A0: Move behind the ball    C0: When robot is near the specified shooting position behind the ball
S1: Behind the ball      A1: Turn to face the ball   C1: When robot’s direction is towards the ball
S2: Kicking the ball     A2: Kick the ball           C2: When robot is out of shootable area
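The transitions of Fig. 6.12 can be isolated from the actions as a pure function, which makes them easy to test. In the sketch below the enum, the function name and the three boolean inputs (standing for the conditions C0, C1 and C2) are hypothetical; how each condition is evaluated from vision data is left open.

```c
typedef enum {
    KICK_S0_FAR = 0,      /* S0: far from the ball   */
    KICK_S1_BEHIND = 1,   /* S1: behind the ball     */
    KICK_S2_KICKING = 2   /* S2: kicking the ball    */
} KickState;

/* One step of the state machine: the state changes only when the
 * condition attached to the current state holds. */
KickState kick_next_state(KickState state, int near_shoot_pos,
                          int facing_ball, int out_of_area)
{
    switch (state) {
    case KICK_S0_FAR:
        return near_shoot_pos ? KICK_S1_BEHIND : KICK_S0_FAR;
    case KICK_S1_BEHIND:
        return facing_ball ? KICK_S2_KICKING : KICK_S1_BEHIND;
    case KICK_S2_KICKING:
        return out_of_area ? KICK_S0_FAR : KICK_S2_KICKING;
    }
    return KICK_S0_FAR;   /* defensive default for invalid input */
}
```

A Kick() implementation along these lines would perform action Ai for the current state on each call and then store the result of kick_next_state() in its static flag.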
We now present the code, with explanation, for the various sections of the Kick() program.
Set Kicking Direction. In this code section, ideally, the function should direct the specified robot to kick the ball towards the opponent goal. However, to prevent the robot from colliding with any playground border or side-wall, or miskicking towards its own team goal, the kicking direction has to be planned according to the position of the ball. The plan considered for this example program is shown in Fig. 6.13, where the ball kick is aimed towards the goal if the ball is in subarea A, and is otherwise modified as depicted in the figure.
Fig. 6.13. Mapping playground subareas to desired directions of ball kick (axes X and Y; subareas A, B, B', C, C', D, D'; team goal and opponent goal marked)
The code for this section follows: // calculate the angle of the vector from the ball to // the center of the opponent goal theta_d = 180/M_PI*atan2( 130./2.-PositionOfBall[1],150-PositionOfBall[0] ); // if the ball is in area C or C’ if(PositionOfBall[0]