Introduction to Embedded Systems: Interfacing to the Freescale 9S12


Introduction to Embedded Systems Interfacing to the Freescale 9S12

Jonathan W. Valvano University of Texas at Austin

Australia • Canada • Mexico • Singapore • Spain • United Kingdom • United States

Introduction to Embedded Systems: Interfacing to the Freescale 9S12, 1st Edition

© 2010 Cengage Learning

Director, Global Engineering Program: Chris Carson

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means–graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, information storage and retrieval systems, or in any other manner–except as may be permitted by the license terms herein.

Senior Developmental Editor: Hilda Gowans

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.

Editorial Assistant: Jennifer Dinsmore

For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to [email protected].

Jonathan W. Valvano

Marketing Specialist: Lauren Betsos

Library of Congress Control Number: 2009923271

Media Editor: Chris Valentine

ISBN-13: 978-0-495-41137-6 ISBN-10: 0-495-41137-X

Director, Content and Media Production: Barbara Fuller-Jacobsen

Cengage Learning 200 First Stamford Place, Suite 400 Stamford, CT 06902 USA

Content Project Manager: Emily Nesheim
Production Service: RPK Editorial Services, Inc.
Copyeditor: Shelley Gerger-Knecthl
Proofreader: Harlan James
Indexer: Shelley Gerger-Knecthl
Compositor: Integra Software Services
Senior Art Director: Michelle Kunkler
Internal Designer: John Edeen and Carmela Periera
Cover Designer: Andrew Adams
Cover Image: © Janaka/Shutterstock
Permissions Account Manager, Text: Mardell Glinski Schultz
Permissions Account Manager, Images: John Hill
Text and Image Permissions Researcher: Kristiina Paul
Senior First Print Buyer: Doug Wilke

Printed in Canada 1 2 3 4 5 6 7 13 12 11 10 09

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at: international.cengage.com/region. Cengage Learning products are represented in Canada by Nelson Education Ltd. For your course and learning solutions, visit www.cengage.com/engineering. Purchase any of our products at your local college store or at our preferred online store www.ichapters.com.

Preface

Embedded computer systems are electronic systems that include a microcomputer to perform specific dedicated tasks. The computer is hidden inside these products. Embedded systems are ubiquitous. Every week millions of tiny computer chips come pouring out of factories like Freescale, Microchip, Philips, Texas Instruments, Silicon Labs, and Mitsubishi, finding their way into our everyday products. Our global economy, our production of food, our transportation systems, our military defense, our communication systems, and even our quality of life depend on the efficiency and effectiveness of these embedded systems. Engineers play a major role in all phases of this effort: planning, design, analysis, manufacturing, and marketing.

This book provides an introduction to embedded systems, including both hardware interfacing and software fundamentals. This book employs a bottom-up educational approach. The overall educational objective is to allow students to discover how the computer interacts with its environment. It will provide hands-on experiences of how an embedded system could be used to solve Electrical Engineering (EE) problems. The focus will be on understanding and analysis, with an introduction to design. Optical sensors, motors, and sampling ADCs and DACs are the chosen mechanisms to bridge the Computer Engineering (CE) and EE worlds. EE concepts include Ohm's Law, LED voltage/current, resistance measurement, and stepper motor control. CE concepts include I/O device drivers, debugging, stacks, queues, local variables, and interrupts.

This book is based on the Freescale 9S12. It can be used effectively with any of the 9S12 derivatives, such as the 9S12C32, 9S12DG256, 9S12DP512, and 9S12E128. The hardware construction is performed on a breadboard and debugged using a multimeter (students learn to measure voltage and resistance). Software is developed in 9S12 assembly; labs may be simulated-only or first simulated and then run on the real 9S12 system. Software debugging occurs during the simulation stage. Device testing occurs on the final product.

One way to sort the broad range of topics within EE and CE is to group them into three categories: components, interfaces, and systems. Electrical and Computer Engineering curricula devote considerable effort to teaching how to design the components within a system. Components include physical devices, analog circuits, digital circuits, power circuits, digital signal processing, data structures, and software algorithms. Interfacing in general, and this book in particular, address the important task of connecting these components together. So, one effective way to educate engineering students is to first teach them how to build components, then teach them how to connect components together (this book). After the student learns how to build things and connect them together, then the student can be taught how to build systems. Of course, once a system is complete, it can be interfaced with other systems to solve more complex problems.

The book is essentially organized into three parts. Chapters 1 through 4 provide a basic introduction to computer architecture, representation of information, and assembly language programming. Parallel ports, switches, and LEDs are presented early in Chapter 2 so that students can write software that actually does something. Chapters 5, 6, 7, and 10 provide an in-depth treatment of software design as it applies to embedded systems. Interfacing and applications of embedded systems are presented in Chapters 8, 9, 11, and 12.

Objectives of the Book

The overall objective of this book is to present basic computer architecture, teach assembly language programming, and present an introduction to interfacing. Most universities teach assembly language programming not because employers wish to hire engineers and scientists ready to produce assembly code, but rather, because it affords a concrete approach for teaching how software works. Furthermore, an embedded system is an effective vehicle around which to introduce architecture, programming, and interfacing because the components are simple and inexpensive. The book describes both general processes and specific details involved in embedded system design. In particular, detailed case studies are used to illustrate fundamental concepts, and laboratory assignments are provided. The specific objectives of this book include the understanding of:

• The basic procedures involved in hardware/software simulation
• How information is represented on the computer
• The basic arithmetic and logical operations performed by the computer
• The fundamental architecture of the 9S12 family microcomputers
• The input/output operations and synchronization
• Assembly language programming: considering both function and style
• Simple hardware interfaces, including: switches, keyboards, LEDs, LCDs, DC motors, DACs, ADCs, and serial ports
• Debugging techniques: breakpoints, scanpoints, profiles, monitors, voltmeters, oscilloscopes, logic analyzers
• Program structures with a comparison between assembly and C
• Modular programming
• Elementary data structures
• Interrupt programming

This book does not discuss in detail every 9S12 instruction, but rather, it presents some of the instructions and uses them to discuss the general issues of representation of information, computer architecture, and developing embedded systems. In contrast, the Freescale programming reference guides do give details of each assembly instruction. In a similar manner, the Freescale microcomputer technical reference manuals explain all the I/O port functions. In other words, you will use this book along with the manuals from Freescale. A web site http://users.ece.utexas.edu/~valvano/ contains many reference documents for this book.

Prerequisites

This book is intended for an introductory laboratory course in microcomputer programming and/or microcomputer interfacing. It is assumed the student has some knowledge of programming, covering concepts such as conditionals, for-loops, while-loops, functions, parameter passing, and arrays. Specific knowledge of C is not required, but C programs are presented throughout the book in an effort to explain the assembly language programs. In addition, some prior knowledge of digital logic is desired, but not necessary, covering topics such as NOT gates, AND gates, OR gates, and D flip-flops. Students will need a fundamental knowledge of resistors, capacitors, and inductors, as typically covered in a freshman physics class on electromagnetics. Calculus is not required for this book. For a more advanced treatment of microcomputer interfacing and embedded systems, see Embedded Microcomputer Systems: Real Time Interfacing, Second Edition, by Jonathan W. Valvano, published by Thomson, © 2006.

Special Features

This book incorporates a number of special features specifically designed for the beginning engineer. An effective educational approach is to learn by doing. The first action component of the book is the use of checkpoints, which can be found throughout the book. A checkpoint is a short question meant as an immediate feedback mechanism for the reader to evaluate his or her level of comprehension. Checkpoints should be performed while reading the chapter. Answers to checkpoints are given in the solutions manual section at the back of the book.

The second action component of the book is the examples. Design examples are included within each chapter. The purpose of the examples is to apply knowledge presented in that chapter to solve a specific problem.

The third action component is the tutorials. Each tutorial includes a sequence of actions (specific things for the reader to do) and a list of questions. Tutorials are meant to be performed without supervision, and should be performed after reading the chapter, but before attempting the labs or homework. Answers to the tutorial questions are also given in the solutions manual section in the back of the book.

The most important action components of the book are the laboratory assignments, which can be found at the end of each chapter. Additional labs and the tutorials can be found on the web site http://users.ece.utexas.edu/~valvano/. Each laboratory solution can first be built and tested using the TExaS simulator, then downloaded and run on an actual 9S12. Only by performing the laboratory assignments can the reader truly assimilate the hardware and software concepts introduced in this book. Laboratories are meant to be performed under the supervision of an instructor, and involve the classic engineering processes of design, construction, debugging, and evaluation.

Homework problems can also be found at the end of each chapter. These problems are less detailed and are intended to evaluate the reader’s understanding of specific topics introduced in the chapter.

How to Teach a Course Based on This Book

The first step in the design of any course is to create a list of educational objectives. This book along with the materials on the book web site could be used to teach introductory microcomputer programming and/or microcomputer interfacing. Specific educational objectives that are supported in this book are microcomputer architecture, number systems, assembly language programming, debugging, I/O device interfacing, I/O device synchronization, subroutines, local variables, elementary data structures, and interrupts.

The next important decision to make is the organization of the student laboratory. Practical “hands on” experience is critical in the educational process. Unfortunately, space, staff, and money constraints force all of us to compromise, doing the best we can. On the other hand, the role of simulation is becoming increasingly important as the race for technological superiority is run with shorter and shorter design cycle times. Consequently, it is important to expose our students to all phases of engineering design, including problem specification, conceptualization, simulation, construction, and analysis. Universities that adopt this book will be allowed to download, rewrite, print out, and distribute the laboratory assignments presented in this book.

The first laboratory configuration is based entirely on material included with the book, and involves no extra costs. Each book allows the student to download and install the TExaS application on a single computer. Students, for the most part, work off campus and come to a TA station for help or lab grading. In this configuration, you can either develop software in assembly using the TExaS assembler or develop C programs using the special version of Metrowerks Codewarrior for the 9S12. The simulator itself becomes the platform on which the lab assignments are developed and tested.

A second laboratory configuration combines simulation with some real microcomputer experiments. Labs can be first simulated, then run on a real microcomputer. Students are given or loaned a 9S12 development board like the Dragon12 board from Wytec (http://www.evbplus.com/index.html) or the Adapt9S12 board from Technological Arts (http://www.technologicalarts.com). Students can work off campus on the simulation aspects of the labs, then come to a laboratory for access to test equipment such as voltmeters and oscilloscopes. In this configuration, students could first write and debug assembly software using the TExaS simulator, then use TExaS to download and test on a real 9S12 board. TExaS can be used with any 9S12 that contains the Serial Monitor in protected EEPROM $F800 to $FFFF. The special version of Metrowerks Codewarrior for the 9S12 could also be used to develop either assembly or C using either the serial monitor or a background debug module (BDM) hardware pod. This is more expensive than the first configuration because actual microcomputer hardware and debugging systems are required.

What’s on the Book Web Site?

1. TExaS installer download. Each student purchasing a book can download and install TExaS. TExaS is a complete editor, assembler, and simulator for the Freescale 9S12 microcomputer. It simulates external hardware, I/O ports, interrupts, memory, and program execution. It is intended as a learning tool for embedded systems. This software is not freeware, but the purchase of the book entitles the owner to install one copy of the program. Once installed, TExaS creates many subdirectories with example applications.
2. There are multiple short video tutorials about developing assembly language programs on TExaS. See http://users.ece.utexas.edu/~valvano/Readme.htm
3. There is a directory containing data sheets in Adobe’s pdf format. This information does not need to be copied to your hard drive; you can simply read the data sheets from the web itself. In particular there are data sheets for microcomputers, digital logic, memory chips, op amps, ADCs, DACs, timer chips and interface chips. See http://users.ece.utexas.edu/~valvano/Datasheets/
4. There is a directory containing example applications. These examples include circuit diagrams and software that can be downloaded and run on the actual 9S12 board. http://users.ece.utexas.edu/~valvano/Starterfiles/
5. There is a directory containing lecture notes and laboratory assignments based on this book. http://users.ece.utexas.edu/~valvano/EE319K/
6. There is a web site containing downloads of materials that can be used with this book. http://www.cengage.com/engineering/valvano

Acknowledgments

Many shared experiences contributed to the development of this book. First, I would like to acknowledge the many excellent teaching assistants I have had the pleasure of working with. Some of these hardworking, underpaid warriors include Dr. Nachiket Kharalkar, Dr. Robin Tsang, John Porterfield, Sri Priya Ponnapalli, Dr. Anil Kottam, Brett Hemes, Priyank Patel, Dr. Byung-geun Lee, Deepak Panwar, Tawfik Chowdhury, Jungho Jo, Usman Tariq, Glen Rhodes, Sandy Hermawan, Jacob Egner, Robby Morrill, and Kyle Hutchens. Ann Meyer developed most of the code for the HD44780 LCD simulation. My teaching assistants have contributed greatly to the contents of this book, especially Nachiket and Robin. In a similar manner, my students have recharged my energy each semester with their enthusiasm, dedication, and quest for knowledge.

Secondly, I appreciate the patience and expertise of my fellow faculty members here at the University of Texas at Austin. From a personal perspective, Dr. John Pearce provided much needed encouragement and support throughout my career. In addition, as instructors of the class around which this book was developed, Dr. Bill Bard, Dr. Nachiket Kharalkar, Dr. Nur Touba, Mr. Mark Welker, Mr. Gary Daniels, and Dr. Ramesh Yerraballi provided insight and substance for this book. Dr. Lizy John and Dr. Yale Patt contributed to the architecture sections in this book.

Thirdly, I would like to thank the experts who reviewed this manuscript. This is the third book I have written, and I was deeply impressed by the quality and quantity of suggestions made by these reviewers. The rough draft had serious flaws in how it was organized, and thanks to their helpful advice, I think this book now flows smoothly. In particular, I want to thank

Bill Bard, University of Texas at Austin
Christopher M. Cischke, Michigan Technological University
Bruce A. Harvey, Florida A & M University
Joseph J. Pfeiffer, New Mexico State University
Karkal S. Prabhu, Drexel University
Eric M. Schwartz, University of Florida

Lastly, I appreciate the valuable lessons of character and commitment taught to me by my parents and grandparents. I recall how hard my parents and grandparents worked to make the world a better place for the next generation. Most significantly, I acknowledge the love, patience and support of my wife, Barbara, and my children, Ben, Dan, and Liz. In particular, Ben helped with the web site and the animations.

JONATHAN W. VALVANO

Good luck!

Contents

1 Introduction to Embedded Microcomputer Systems
1.1 Basic Components of an Embedded System
1.2 Applications Involving Embedded Systems
1.3 Flowcharts and Structured Programming
1.4 Concurrent and Parallel Programming
1.5 Product Development Cycle
1.6 Successive Refinement
1.7 Quality Design
  1.7.1 Quantitative Performance Measurements
  1.7.2 Qualitative Performance Measurements
  1.7.3 Attitude
1.8 Debugging Theory
1.9 Tutorial 1. Getting Started
1.10 Homework Assignments

2 Introduction to Assembly Language Programming
2.1 Binary and Hexadecimal Numbers
2.2 Addresses, Registers, and Accessing Memory
2.3 Assembly Syntax
  2.3.1 Assembly Language Instructions
  2.3.2 Pseudo Operation Codes
2.4 Simplified 9S12 Machine Language Execution
2.5 Simple Addressing Modes
  2.5.1 Inherent Addressing Mode
  2.5.2 Immediate Addressing Mode
  2.5.3 Direct Addressing Mode
  2.5.4 Extended Addressing Mode
  2.5.5 Indexed Addressing Mode
  2.5.6 PC Relative Addressing Mode
2.6 The Assembly Language Development Process
2.7 Memory Transfer Operations
2.8 Subroutines
2.9 Input/Output
  2.9.1 Direction Registers
  2.9.2 Switch Interface
  2.9.3 LED Interface
2.10 Tutorial 2. Running with TExaS
2.11 Homework Assignments
2.12 Laboratory Assignments

3 Representation and Manipulation of Information
3.1 Precision
3.2 Boolean Information
3.3 8-bit Numbers
3.4 16-bit Numbers
3.5 Extended Precision Numbers
3.6 Logical Operations
3.7 Shift Operations
3.8 Arithmetic Operations: Addition and Subtractions
3.9 Arithmetic Operations: Multiplication and Divide
3.10 Character Information
3.11 Conversions
3.12 Debugging Monitor Using a LED
3.13 Tutorial 3. Arithmetic and Logical Operations
3.14 Homework Assignments
3.15 Laboratory Assignments

4 9S12 Architecture
4.1 Introduction
  4.1.1 Big and Little Endian
  4.1.2 Memory-Mapped I/O
  4.1.3 *I/O-Mapped I/O
  4.1.4 *Segmented or Partitioned Memory
  4.1.5 Memory Bus Cycles
  4.1.6 Processor Architecture
  4.1.7 I/O Port Architecture
4.2 *Understanding Software Execution at the Bus Cycle Level
4.3 9S12 Architecture Details
  4.3.1 9S12C32 Architecture
  4.3.2 9S12DP512 Architecture
  4.3.3 9S12E128 Architecture
  4.3.4 Operating Modes
  4.3.5 Phase-Lock-Loop (PLL)
4.4 The Stack
4.5 16-Bit Timer
4.6 *Memory Allocation
4.7 Performance Debugging
  4.7.1 Instrumentation
  4.7.2 Measurement of Dynamic Efficiency
4.8 Tutorial 4. Building a Microcomputer and Executing Machine Code
4.9 Homework Assignments
4.10 Laboratory Assignments

5 Modular Programming
5.1 Modular Design
  5.1.1 Definition and Goals
  5.1.2 Functions, Procedures, Methods, and Subroutines
  5.1.3 Dividing a Software Task into Modules
  5.1.4 How to Draw a Call-Graph
  5.1.5 How to Draw a Data Flow Graph
  5.1.6 Top-Down Versus Bottom-Up Design
5.2 Making Decisions
  5.2.1 Conditional Branch Instructions
  5.2.2 Conditional if-then Statements
  5.2.3 Conditional if-then-else Statements
  5.2.4 While Loops
  5.2.5 For Loops
5.3 *Macros
5.4 *Recursion
5.5 Writing Quality Software
  5.5.1 Assembly Language Style Guidelines
  5.5.2 Comments
  5.5.3 Inappropriate I/O and Portability
5.6 *How Assemblers Work
5.7 Functional Debugging
  5.7.1 Stabilization
  5.7.2 Single Stepping
  5.7.3 Breakpoints Without Filtering
  5.7.4 Conditional Breakpoints
  5.7.5 Instrumentation: Print Statements
5.8 Tutorial 5a. Editing and Assembling
5.9 Tutorial 5b. Microcomputer-Based Lock
5.10 Homework Problems
5.11 Laboratory Assignments

6 Pointers and Data Structures
6.1 Indexed Addressing Modes Used to Implement Pointers
  6.1.1 Indexed Addressing Mode
  6.1.2 Auto Pre/Post Decrement/Increment Indexed Addressing Mode
  6.1.3 Accumulator Offset Indexed Addressing Mode
  6.1.4 Indexed Indirect Addressing Mode
  6.1.5 Accumulator D Offset Indexed Indirect Addressing Mode
  6.1.6 Post-Byte Machine Coded for Indexed Addressing
  6.1.6 Load Effective Address Instructions
  6.1.7 Call-by-Reference Parameter Passing
6.2 Arrays
6.3 Strings
6.4 *Matrices
6.5 Structures
6.6 *Tables
6.7 *Trees
6.8 Finite-State Machines with Statically Allocated Linked Structures
  6.8.1 Abstraction
  6.8.2 Moore Finite-State Machines
  6.8.3 Mealy Finite-State Machines
  6.8.4 Functional Abstraction Within Finite-State Machines
6.9 *Dynamically Allocated Data Structures
  6.9.1 *Fixed Block Memory Manager
  6.9.2 *Linked List FIFO
6.10 *9S12 Paged Memory
6.11 Functional Debugging
  6.11.1 Instrumentation: Dump Into Array without Filtering
  6.11.2 Instrumentation: Dump Into Array with Filtering
6.12 Tutorial 6. Software Abstraction
6.13 Homework Assignments
6.14 Laboratory Assignments

7 Local Variables and Parameter Passing
7.1 Local Versus Global
7.2 Stack Rules
7.3 Local Variables Allocated on the Stack
7.4 Stack Frames
7.5 Parameter Passing Using Registers, Stack and Global Variables
  7.5.1 Parameter Passing in C
  7.5.2 Parameter Passing in Assembly Language
  7.5.3 C Compiler Implementation of Local and Global Variables
7.6 Tutorial 7. Debugging Techniques
7.7 Homework Problems
7.8 Laboratory Assignments

8 Serial and Parallel Port Interfacing
8.1 General Introduction to Interfacing
8.2 Serial Communications Interface, SCI
  8.2.1 RS232 Protocol
  8.2.2 Transmitting in Asynchronous Mode
  8.2.3 Receiving in Asynchronous Mode
  8.2.4 9S12 SCI Details
8.3 Synchronous Peripheral Interface, SPI
  8.3.1 SPI Fundamentals
  8.3.2 SPI Details
  8.3.3 9S12DP512 Module Routing Register
  8.3.4 8-bit DAC Interface
8.4 Scanned Keyboards
8.5 Parallel Port LCD Interface with the HD44780 Controller
8.6 Binary Actuators
  8.6.1 Interface
  8.6.2 Electromagnetic and Solid-State Relays
  8.6.3 Solenoids
8.7 *Pulse-Width Modulation
8.8 *Stepper Motors
8.9 Homework Problems
8.10 Laboratory Assignments

9 Interrupt Programming and Real-Time Systems
9.1 I/O Synchronization
9.2 Interrupt Concepts
  9.2.1 Introduction
  9.2.2 Essential Components of Interrupt Processing
  9.2.3 Sequence of Events
  9.2.4 9S12 Interrupts
  9.2.5 Polled versus Vectored Interrupts
  9.2.6 Pseudo-Interrupt Vectors
9.3 Key Wakeup Interrupts
9.4 Periodic Interrupt Programming
9.5 Real-Time Interrupt (RTI)
9.6 Timer Overflow, Output Compare and Input Capture
  9.6.1 Timer Features and Timer Overflow
  9.6.2 Output Compare Interrupts
  9.6.3 Input Capture Interrupts
9.7 Pulse Accumulator
9.8 *Direct Memory Access
9.9 Hardware Debugging Tools
9.10 Profiling
  9.10.1 Profiling using a Software Dump to Study Execution Pattern
  9.10.2 Profiling using an Output Port
  9.10.3 *Thread Profile
9.11 Tutorial 9. Profiling
9.12 Homework Problems
9.13 Laboratory Assignments

10 Numerical Calculations
10.1 Fixed-Point Numbers
10.2 *Extended Precision Calculations
  10.2.1 Addition and Subtraction
  10.2.2 Shift Operations
  10.2.3 Mathematical Instructions on the 9S12
  10.2.4 Multiplication and Division
  10.2.5 Table Lookup and Interpolation
10.3 Expression Evaluation
10.4 *IEEE Floating-Point Numbers
10.5 Tutorial 10. Overflow and Dropout
10.6 Homework Problems
10.7 Laboratory Problems

11 Analog I/O Interfacing
11.1 Approximating Continuous Signals in the Digital Domain
11.2 Digital to Analog Conversion
11.3 Music Generation
11.4 Analog to Digital Conversion
  11.4.1 9S12 ADC Details
  11.4.2 ADC Data Formats
  11.4.3 ADC Resolution
11.5 *Multiple Access Circular Queues
11.6 Real-Time Data Acquisition
11.7 *Control Systems
11.8 Tutorial 11. Analog Input Programming
11.9 Homework Problems
11.10 Laboratory Assignments

12 Communication Systems
12.1 Introduction
12.2 Reentrant Programming and Critical Sections
12.3 Interthread Communication and Synchronization
  12.3.1 Mailbox
  12.3.4 Producer Consumer Problem
  12.3.4 FIFO Queue Implementation
  12.3.4 Double Buffer
12.4 Serial Port Interface using Interrupt Synchronization
12.5 *Distributed Systems
12.6 *Design and Implementation of a Controller Area Network (CAN)
  12.6.1 The Fundamentals of CAN
  12.6.2 Details of the 9S12 CAN
  12.6.3 9S12 CAN Device Driver
12.7 *Inter-Integrated Circuit (I2C) Interface
  12.7.1 The Fundamentals of I2C
  12.7.2 I2C Synchronization
  12.7.3 9S12 I2C Details
  12.7.4 9S12 I2C Single Master Example
12.8 Wireless Communication
12.9 Tutorial 12. Performance Debugging
12.10 Homework Problems
12.11 Laboratory Assignments

Appendix 1 Embedded System Development Using TExaS
A1.1 Introduction to TExaS
A1.2 Major Components of TExaS
A1.3 Embedded System Design Process
A1.4 Running and Modifying Existing Assembly Language Programs
A1.5 TExaS Editor
A1.6 Assembly Language Syntax
  A1.6.1 Overall Structure
  A1.6.2 Label Field
  A1.6.3 Operation Field
  A1.6.4 Operand Field
  A1.6.5 Expressions
  A1.6.6 Comment Field
  A1.6.7 Assembly Listing and Errors
  A1.6.8 Assembler Pseudo-Ops
  A1.6.9 S-19 Object Code
A1.7 TExaS ViewBox
A1.8 Microcomputer Interfacing in TExaS

Appendix 2 Running on an Evaluation Board

Appendix 3 Systems Engineering
A3.1 Design for Manufacturability
A3.2 Battery Power

Glossary of Terms

Solutions Manual
Checkpoint Solutions
Tutorial Solutions

Index


1 Introduction to Embedded Microcomputer Systems

Chapter 1 objectives are to:

• Introduce embedded microcomputer systems
• Outline the basic steps in developing microcomputer systems
• Define data flow graphs, flowcharts and call graphs

An effective way to learn new techniques is by doing them. But the dilemma in learning a laboratory-based topic like embedded systems is that there is a tremendous volume of details that must first be learned before microcomputer hardware and software systems can be designed. The approach taken in this book is to learn by doing. One of the advantages of a bottom-up approach to learning is that the student begins by mastering simple concepts. Once the student truly understands simple concepts, he or she can then embark on the creative process of design, which involves putting the pieces together to create a more complex system. True creativity is needed to solve complex problems using effective combinations of simple components.

Embedded systems afford an effective platform to teach new engineers how to program for three reasons. First, there is no operating system. Thus, in a bottom-up fashion the student can see, write, and understand all software running on a system that actually does something. Second, embedded systems involve input/output that is easy for the student to touch, hear, and see. Third, embedded systems are employed in many everyday products, motivating students by showing them how electrical and computer engineering processes can be applied in the real world.

Rather than introduce the voluminous details in an encyclopedic fashion, the book is organized by basic concepts, and the details are introduced as they are needed. We will start with simple systems and progressively add complexity. The overriding theme for Chapter 1 will be to present the organizational framework with which embedded systems will be designed. Chapters 2 through 4 explain how the computer works. Chapters 5, 6, 7, and 10 present the details of software development on an embedded system. Interfacing I/O devices to build embedded systems is presented in Chapters 8, 9, 11, and 12.


1.1


Basic Components of an Embedded System

Information is stored on the computer in binary form. A binary bit can exist in one of two possible states. In positive logic, the presence of a voltage is called the ‘1’, true, asserted, or high state. The absence of a voltage is called the ‘0’, false, not asserted, or low state. Figure 1.1 shows the output of a typical complementary metal oxide semiconductor (CMOS) circuit. The left side shows the condition with a true bit, and the right side shows a false bit. The output of each digital circuit consists of a p-type transistor “on top of” an n-type transistor. In digital circuits, each transistor is essentially on or off. If the transistor is on, it is equivalent to a short circuit between its two output pins. Conversely, if the transistor is off, it is equivalent to an open circuit between its output pins. On a 9S12 powered with a 5 V supply, a voltage between 3.25 and 5 V is considered high, and a voltage between 0 and 1.75 V is considered low. Separating the two regions by 1.5 V allows digital logic to operate reliably at very high speeds. The design of transistor-level digital circuits is beyond the scope of this book. However, it is important to know that digital data exist as binary bits and are encoded as high and low voltages.

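Figure 1.1 A binary bit is true if a voltage is present and false if the voltage is 0.
[Figure: In the true state the p-type transistor is on and the n-type transistor is off, which is equivalent to a short circuit to +5V and an open circuit to ground, so Out=5V. In the false state the p-type transistor is off and the n-type transistor is on, which is equivalent to an open circuit to +5V and a short circuit to ground, so Out=0V.]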
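The voltage ranges quoted above for a 9S12 on a 5 V supply can be written as a small classification function. This is only a sketch: the function name, the enumeration, and the choice of millivolts as the unit are assumptions made for illustration, and the input is assumed to lie between 0 and 5000 mV.

```c
#include <stdint.h>

// Thresholds taken from the text: 3.25 V and above is high, 1.75 V and below is low
#define V_HIGH_MIN_MV 3250
#define V_LOW_MAX_MV  1750

// Hypothetical names, not part of any 9S12 header file
typedef enum { LOGIC_LOW, LOGIC_HIGH, LOGIC_UNDEFINED } logic_t;

// Classify a voltage given in millivolts (assumed to be 0 to 5000)
logic_t Logic_Level(uint16_t mV){
  if(mV >= V_HIGH_MIN_MV){
    return LOGIC_HIGH;       // 3.25 to 5 V
  }
  if(mV <= V_LOW_MAX_MV){
    return LOGIC_LOW;        // 0 to 1.75 V
  }
  return LOGIC_UNDEFINED;    // inside the 1.5 V gap between the regions
}
```

A voltage in the gap is reported as undefined rather than forced to one state, which mirrors why the two regions are kept apart.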

If the information we wish to store exists in more than two states, we use multiple bits. For example, a byte contains 8 bits, and is built by grouping 8 binary bits into one object, as shown in Figure 1.2. Information can take many forms, e.g., numbers, logical states, text, instructions, sounds, or images. What the bits mean depends on how the information is organized and more importantly how it is used.

Figure 1.2 A byte is comprised of 8 bits.
[Figure: eight bits, labeled Bit 7 down to Bit 0, grouped into one byte.]
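To make the bit numbering of Figure 1.2 concrete, the following sketch reads, sets, and clears one bit of a byte, with bit 7 as the most significant and bit 0 as the least significant. The function names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

// Return the value (0 or 1) of bit n in data, where n is 0 to 7
uint8_t Bit_Get(uint8_t data, uint8_t n){
  return (uint8_t)((data >> n) & 0x01);
}

// Return data with bit n set to 1
uint8_t Bit_Set(uint8_t data, uint8_t n){
  return (uint8_t)(data | (1u << n));
}

// Return data with bit n cleared to 0
uint8_t Bit_Clear(uint8_t data, uint8_t n){
  return (uint8_t)(data & ~(1u << n));
}
```

For example, Bit_Get(0x81, 7) is 1 because 0x81 has bit 7 and bit 0 set.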

Memory is a collection of hardware elements in a computer into which we store information, as shown in Figure 1.3. For most computers in today’s market, each memory cell contains one byte of information, and each byte has a unique and sequential address. The memory is called byte-addressable because each byte has a separate address. The address of a memory cell specifies its physical location, and its contents are the data. When we write to memory, we specify an address and 8 bits of data, causing that information to be stored into the memory. When we read from memory, we specify an address, causing 8 bits of data to be retrieved from the memory.

Figure 1.3 Memory is a sequential collection of data storage elements.
[Figure: a column of memory cells with addresses 103 Main St through 108 Main St, each paired with its contents.]

Read Only Memory, or ROM, is a type of memory where the information is programmed or burned into the device, and during normal operation it only allows read accesses. Random Access Memory (RAM) is used to store temporary information, and during normal operation we can read data from or write data into RAM. The information in the ROM is nonvolatile, meaning the contents are not lost when power is removed. In contrast, the information in the RAM is volatile, meaning the contents are lost when power is removed. The system can quickly and conveniently read data from a ROM. It takes a comparatively long time to program or burn data into a ROM. In contrast, it is fast and easy to both read data from and write data into a RAM.

Software is a set of instructions, stored in memory, that are executed in a complicated but well-defined manner. The processor is the digital hardware device that executes software. A port is a physical connection between the computer and its outside world. Ports allow information to enter and exit the system. Information enters via the input ports and exits via the output ports. Other names used to describe ports are I/O ports, I/O devices, interfaces, or sometimes just devices. A bus is a collection of wires used to pass information between modules. A computer is an electronic device with a processor, memory, and I/O ports, connected together with a bus. A microcomputer is a computer small enough that one person can carry it. Small in this context describes its size, not its computing power. Consequently, there can be great confusion over the term microcomputer, because it can refer to a very wide range of devices from a PIC12C508, which is an 8-pin chip with 512 words of ROM and 25 bytes of RAM, to the most powerful Pentium-based personal computer.

Computers are not intelligent. Rather, you are the true genius. Computers are electronic idiots. They can store a lot of data, but they will only do exactly what we tell them to do. Fortunately, however, they can execute our programs quite quickly, and they don’t get bored doing the same tasks over and over again.
To better understand the expression embedded microcomputer system, consider each word separately. In this context, the word “embedded” means hidden inside so one can’t see it. The term “micro” means small, and a “computer” contains a processor, memory, and a means to exchange data with the external world. In an embedded system, we use ROM for storing the software and fixed constant data, and RAM for storing temporary information. Many microcomputers employed in embedded systems use EEPROM, which is an electrically erasable programmable ROM, because the information can easily be erased and reprogrammed. The functionality of a digital watch is defined by the software programmed into its ROM. When you remove the batteries from a watch and insert new batteries, it still behaves like a watch because the ROM is nonvolatile storage. As shown in Figure 1.4, the term embedded microcomputer system refers to a device that contains one or more microcomputers inside. Microcontrollers, which are microcomputers incorporating the processor, RAM, ROM and I/O ports into a single package, are often employed in an embedded system because of their low cost, small size, and low power requirements. Microcontrollers like the 9S12 are available with a large number and wide variety of I/O devices, such as parallel ports, serial ports, timers, digital to analog converters (DAC), and analog to digital converters (ADC). The I/O devices are a crucial part of an embedded system, because they provide necessary functionality. The software together with the I/O ports and associated interface circuits give an embedded computer system its distinctive characteristics.

Figure 1.4 An embedded system includes a microcomputer interfaced to external devices.
[Figure: an embedded system containing a 9S12 microcontroller, in which a bus connects the processor, RAM, ROM, and I/O ports; the I/O ports connect to external electrical, mechanical, chemical, or optical devices, and the ADC and DAC handle analog signals.]

Checkpoint 1.1: What is an embedded system?

A digital multimeter, as shown in Figure 1.5, is a typical embedded system. This embedded system has two inputs, the mode selection dial on the front and the red/black test probes. The output is a liquid crystal display (LCD) showing measured parameters. The large black chip inside the box is a microcontroller. The software that defines its very specific purpose is programmed into the ROM of the microcontroller. As you can see, there is not much else inside this box other than the microcontroller, a fuse, a few interfacing resistors, and a battery.

Figure 1.5 A digital multimeter contains a microcontroller programmed to measure voltage, current and resistance.

As defined previously, a microcomputer is a small computer. One typically restricts the term embedded to refer to systems that do not look and behave like a typical computer. Most embedded systems do not have a keyboard, a graphics display, or secondary storage (disk). There are two ways to develop embedded systems. The first technique uses a microcontroller, like the 9S12. In general, there is no operating system, so the entire software system must be developed. These devices are suitable for low-cost, low-performance systems. On the other hand, one can develop a high-performance embedded system around the Arm or PC architecture. These systems typically employ an operating system, and are first designed on a development platform, and then the software and hardware are migrated to a standalone embedded platform.

Checkpoint 1.2: What is a microcomputer?

The external devices attached to the microcontroller allow the system to interact with its environment. An interface is defined as the hardware and software that combine to allow the computer to communicate with the external hardware. We must also learn how to interface a wide range of inputs and outputs that can exist in either digital or analog form. This book provides an introduction to microcomputer programming, hardware interfacing, and the design of embedded systems. In general, we can classify I/O interfaces into four categories:

Parallel: binary data is available simultaneously on groups of lines
Serial: binary data is available one bit at a time on a single line
Analog: data is encoded as a variable voltage
Time: data is encoded as a period, frequency, pulse width or phase shift

A device driver is a set of software functions that facilitate the use of an I/O port. One of the simplest I/O ports on the 9S12 is a parallel port called PTT, meaning it is a collection of eight pins that can be used for either input or output. If PTT is an input port, then when the software reads from PTT, it gets eight bits (each bit is 1 or 0), representing the digital levels (high or low) that exist at the time of the read. If PTT is an output port, then when the software writes to PTT, it sets the outputs on the eight pins high (1) or low (0), depending on the data value the software has written.

The other general concept involved in most embedded systems is that they run in real time. In a real-time computer system, we can put an upper bound on the time required to perform the input-calculation-output sequence. A real-time system can guarantee a worst-case upper bound on the response time between when the new input information becomes available and when that information is processed. This response time is called interface latency. Another real-time requirement that exists in many embedded systems is the execution of periodic tasks. A periodic task is one that must be performed at equal time intervals. A real-time system can put a small and bounded limit on the time error between when a task should be run and when it is actually run.
Because of the real-time nature of these systems, microcomputers have a rich set of features to handle many aspects of time.

Checkpoint 1.3: An input device allows information to be entered into the computer. List some of the input devices available on a general purpose computer.

Checkpoint 1.4: An output device allows information to exit the computer. List some of the output devices available on a general purpose computer.
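The description of PTT as an eight-pin parallel port can be sketched as a very small device driver. In this sketch PTT and DDRT are ordinary variables standing in for the port and its direction register so that the code can run on any computer; on a real 9S12 they would be memory-mapped registers defined by the compiler's header file. The driver function names are hypothetical, and the convention that a 1 in a direction bit makes the pin an output is assumed here.

```c
#include <stdint.h>

// Stand-ins for the 9S12 registers (assumption: plain variables, not real I/O)
static volatile uint8_t PTT;   // data on the eight port pins
static volatile uint8_t DDRT;  // direction, assumed 1 = output, 0 = input

// Make all eight pins outputs
void PortT_InitOutput(void){
  DDRT = 0xFF;
}

// Make all eight pins inputs
void PortT_InitInput(void){
  DDRT = 0x00;
}

// Write eight bits; each pin goes high (1) or low (0)
void PortT_Out(uint8_t data){
  PTT = data;
}

// Read eight bits representing the pin levels at the time of the read
uint8_t PortT_In(void){
  return PTT;
}
```

Hiding the register accesses behind functions like these is what lets the rest of the software use the port without knowing its details.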

The embedded computer systems in this book will contain a Freescale 9S12, which will be programmed to perform a specific dedicated application. Software for embedded systems typically solves only a limited range of problems. The microcomputer is embedded or hidden inside the device. In an embedded system, the software is usually programmed into ROM and therefore fixed. Even so, software maintenance (e.g., verification of proper operation, updates, fixing bugs, adding features, extending to new applications, end user configurations) is still extremely important. In fact, because microcomputers are employed in many safety-critical devices, injury or death may result if there are hardware and/or software faults. Consequently, testing must be considered in the original design, during development of intermediate components, and in the final product. The role of simulation is becoming increasingly important in today’s marketplace as we race to build better and better machines with shorter and shorter design cycles. An effective approach to building embedded systems is to first design the system using a hardware/software simulator, then download and test the system on an actual microcontroller.

1.2

Applications Involving Embedded Systems

An embedded computer system includes a microcomputer with mechanical, chemical and electrical devices attached to it, programmed for a specific dedicated purpose, and packaged up as a complete system. Any electrical, mechanical, or chemical system that involves inputs, decisions, calculations, analyses, and outputs is a candidate for implementation as an embedded system. Electrical, mechanical, and chemical sensors collect information.


Electronic interfaces convert the sensor signals into a form acceptable for the microcomputer. For example, a tachometer is a sensor that measures the revolutions per second of a rotating shaft. Microcomputer software performs the necessary decisions, calculations, and analyses. Additional interface electronics convert the microcomputer outputs into the necessary form. Actuators can be used to create mechanical or chemical outputs. For example, an electrical motor converts electrical power into mechanical power. One automobile may soon employ up to 100 microcontrollers. In fact, upscale homes already contain as many as 150 microcontrollers, and the average consumer now interacts with microcontrollers thousands of times each day. Embedded microcomputers impact virtually all aspects of daily life:

■ Consumer electronics
■ Communication systems
■ Automotive systems
■ Military hardware
■ Business applications
■ Medical devices

Table 1.1 presents typical embedded microcomputer applications and the function performed by the embedded microcomputer. Each microcomputer accepts inputs, performs calculations, and generates outputs. In contrast, a general-purpose computer system typically has a keyboard, disk and graphics display and can be programmed for a wide variety of purposes. Typical general-purpose applications include word processing, electronic mail, business accounting, scientific computing, and database systems. The user of a general-purpose computer does have access to the software that controls the machine. In other words, the user decides which operating system to run and which applications to launch. Because the general-purpose computer has a removable disk or network interface, new programs can easily be added to the system. The most common type of general-purpose computer is the personal computer, e.g., the Apple Macintosh or the IBM-PC compatible computer. Computers more powerful than the personal computer can be grouped in the workstation category, ranging from $10,000 to $50,000. Supercomputers cost above $50,000. These computers often employ multiple processors and have much more memory than the typical personal computer. The workstations and supercomputers are used for handling large amounts of information (business applications) or performing large calculations (scientific research). This book will not specifically cover the general-purpose computer, although many of the basic principles of embedded computers do apply to all types of computer systems.

Checkpoint 1.5: There is a microcomputer embedded in a digital watch. List three operations the software must perform.

1.3

Flowcharts and Structured Programming

The remainder of this chapter will discuss the art and science of designing embedded systems from a general perspective. If you need to write a paper, you decide on a theme, then begin with an outline. In the same manner, if you design an embedded system, you define its specification (what it does), and begin with an organizational plan. In this chapter, we will present three graphical tools to describe the organization of an embedded system: flowcharts, data flow graphs and call graphs. You should draw all three for every system you design. In this section, we introduce the flowchart syntax that will be used throughout the book.

Programs themselves are written in a linear or one-dimensional fashion. In other words, we type one line of software after another in a sequential fashion. Writing programs this way is a natural process, because the computer itself usually executes the program in a top-to-bottom sequential fashion. This one-dimensional format is fine for simple programs, but conditional branching and function calls may create complex behaviors that are not easily observed in a linear fashion. Flowcharts are one way to describe software in a two-dimensional format, specifically providing convenient mechanisms to visualize conditional branching and function calls. Flowcharts are very useful in the initial design stage of a software system to define complex algorithms. Furthermore, flowcharts can be used in the final documentation stage of a project, once the system is operational, in order to assist in its use or modification.

Observation: TExaS is one of the few software development systems that allow you to add flowcharts directly into your software as part of its documentation.

Table 1.1 Embedded system applications.

Application | Function Performed by the Microcomputer

Consumer electronics
  Washing machine | Controls the water and spin cycles
  Exercise equipment | Measures speed, distance, calories, heart rate
  Remote controls | Accepts key touches, and sends infrared pulses
  Clocks and watches | Maintains the time, alarm, and display
  Games and toys | Entertains the user, accepts joystick input, displays video output
  Audio/video electronics | Interacts with the operator and enhances performance
  Set-back thermostats | Adjusts day/night thresholds saving energy
  Camera, camcorder | Records and organizes images
  Television, VCR, cable box | Accepts inputs and processes audio/visual signals

Communication systems
  Answering machines | Plays outgoing message, saves and organizes messages
  Telephones | Transmits voice and data information
  Fax machines | Sends and receives images
  Radios | Sends and receives audio, noise rejection
  Cellular phones, pagers | Accepts key pad input, outputs sound, and enables communication

Automotive systems
  Automatic braking | Optimizes stopping on slippery surfaces
  Noise cancellation | Improves sound quality
  Locks | Allows keyless entry, detects intruders, activates alarms
  Electronic ignition | Controls sparks and fuel injectors
  Power windows and seats | Remembers preferred settings for each driver
  Cruise control | Maintains constant speed
  Collision avoidance | Reduces accidents
  Climate control | Improves comfort
  Emission control | Reduces pollution
  Instrumentation | Collects and provides necessary information

Military hardware
  Smart weapons | Recognizes friendly targets
  Missile guidance systems | Directs ordnance at the desired target
  Global positioning systems | Determines where you are on the planet
  Surveillance systems | Collects information about enemy activities

Business applications
  Cash registers | Accepts inputs and manages money
  Vending machines | Collects money and dispenses product
  ATM machines | Provides both security and convenience
  Traffic controllers | Senses car positions and controls traffic lights
  Industrial robots | Accepts input from sensors, controls motors
  Bar code readers and writers | Controls inventory and optimizes shipping
  Automatic sprinklers | Controls the wetness of the soil
  Elevator controllers | Maximizes traffic, minimizes waiting time
  RFID systems | Identifies products using radiofrequency tags
  Lighting and heating systems | Maximizes comfort and minimizes cost

Medical devices
  Monitors | Measures important functions
  Drug delivery systems | Administers proper doses
  Cancer treatments | Controls doses of radiation, drugs, or heat
  Pacemakers | Helps the heart beat regularly
  Prosthetic devices | Increases mobility for the handicapped
  Dialysis machines | Performs functions normally done by the kidney

Figures throughout this section illustrate the syntax used to draw flowcharts. The oval shapes define entry and exit points. The main entry point is the starting point of the software. Each function, or subroutine, also has an entry point. The exit point returns the flow of control back to the place from which the function was called. When the software runs continuously, as is typically the case in an embedded system, there will be no main exit point. We use rectangles to specify process blocks. In a high-level flowchart, a process block might involve many operations, but in a low-level flowchart, the exact operation is defined in the rectangle. The parallelogram will be used to define an input/output operation. Some flowchart artists use rectangles for both processes and input/output. Since input/output operations are an important part of embedded systems, we will use the parallelogram format, which will make it easier to identify input/output in our flowcharts. The diamond-shaped objects define a branch point or decision block. The rectangle with double lines on the side specifies a call to a predefined function. In this book, functions, subroutines and procedures are terms that all refer to a well-defined section of code that performs a specific operation. Functions usually return a result parameter, while procedures usually do not. Functions and procedures are terms used when describing a high-level language, while the term subroutine is often used when describing assembly language. When a function (or subroutine or procedure) is called, the software execution path jumps to the function, the specific operation is performed, and the execution path returns to the point immediately after the function call. Circles are used as connectors.

Common Error: In general, it is bad programming style to develop software that requires a lot of connectors when drawing its flowchart.

There are a seemingly unlimited number of tasks one can perform on a computer, and the key to developing great products is to select the correct ones. Just like hiking through the woods, we need to develop guidelines (like maps and trails) to keep us from getting lost. One of the fundamental issues when developing software, regardless of whether it is a microcontroller with 1000 lines of assembly code or a large computer system with billions of lines of code, is to maintain a consistent structure. One such framework is called structured programming. A good high-level language will force the programmer to write structured programs. Structured programs are built from three basic building blocks: the sequence, the conditional, and the while-loop. At the lowest level, the process block contains simple and well-defined commands. I/O functions are also low-level building blocks. Structured programming involves combining existing blocks into more complex structures, as shown in Figure 1.6.

Figure 1.6 Flowchart showing the basic building blocks of structured programming.
[Flowchart: a sequence runs Block 1 and then Block 2; a conditional runs either Block 1 or Block 2; a while-loop repeats one Block.]
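The three building blocks of Figure 1.6 map directly onto C. The functions below are a minimal sketch with arbitrary contents; their names and the arithmetic inside them are assumptions chosen only to show one block of each kind.

```c
#include <stdint.h>

// Sequence: Block 1 is followed by Block 2
int16_t Sequence(int16_t x){
  x = x + 1;        // Block 1
  x = 2 * x;        // Block 2
  return x;
}

// Conditional: exactly one of Block 1 or Block 2 is executed
int16_t Conditional(int16_t a, int16_t b){
  int16_t max;
  if(a > b){
    max = a;        // Block 1
  } else {
    max = b;        // Block 2
  }
  return max;
}

// While-loop: the Block repeats as long as the condition is true
int16_t WhileLoop(int16_t n){
  int16_t sum = 0;
  while(n > 0){
    sum = sum + n;  // Block
    n = n - 1;
  }
  return sum;
}
```

Each function has one entry point and one exit point, so any of them could itself be used as a block inside a larger structure.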


Example 1.1: Using a flowchart, describe the control algorithm that a toaster might use to cook toast. There will be a start button the user pushes to activate the machine. There is another input that measures toast temperature. The desired temperature is preprogrammed into the machine. The output is a heater, which can be on or off. The toast is automatically lowered into the oven when heat is applied and is ejected when the heat is turned off.

Solution This example illustrates a common trait of embedded systems, that is, they perform the same set of tasks over and over forever. The program starts at main when power is applied, and the system behaves like a toaster until it is unplugged. Figure 1.7 shows a flowchart for one possible toaster algorithm. The system initially waits for the operator to push the start button. If the switch is not pressed, the system loops back, reading and checking the switch over and over. After the start button is pressed, heat is turned on. When the toast temperature reaches the desired value, heat is turned off and the process is repeated.

Figure 1.7 Flowchart illustrating the process of making toast.

[Flowchart: main; input from switch; decision on Start, looping back while not pressed; when pressed, output heat is on; input toast temperature; decision, looping back to the temperature input while toast < desired (too cold); when toast ≥ desired, output heat is off and return to the switch input.]

Checkpoint 1.6: What safety feature might you add to this toaster to reduce the chance of a fire?
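One pass through the toaster flowchart of Figure 1.7 might be sketched in C as follows. The switch, temperature sensor, and heater are simulated with ordinary variables so the sketch can run on any computer; the function names, the desired temperature of 150, and the way the simulated toast warms up are all assumptions, not details from the text. In a real toaster the main program would call this function inside an endless loop.

```c
#include <stdint.h>

#define DESIRED_TEMP 150          // hypothetical preprogrammed temperature

// Simulated hardware state (assumptions for this sketch only)
static int PollsUntilPressed = 3; // Start reads as not pressed this many times
static int16_t ToastTemp = 20;    // simulated toast temperature
static int Heater = 0;            // 1 means heat is on, 0 means heat is off

// Input from switch: returns 1 if Start is pressed, 0 if not
static int Switch_In(void){
  if(PollsUntilPressed > 0){
    PollsUntilPressed--;
    return 0;
  }
  return 1;
}

// Input toast temperature: the simulated toast warms while heat is on
static int16_t Temperature_In(void){
  if(Heater){
    ToastTemp += 10;
  }
  return ToastTemp;
}

// Output to heater: 1 turns heat on, 0 turns heat off
static void Heater_Out(int on){
  Heater = on;
}

// One pass through the flowchart
void Toaster_Cycle(void){
  while(Switch_In() == 0){}                 // not pressed, check again
  Heater_Out(1);                            // output heat is on
  while(Temperature_In() < DESIRED_TEMP){}  // too cold, measure again
  Heater_Out(0);                            // output heat is off
}
```

Both waits are while-loops around an input operation, so the whole algorithm is built from the sequence and while-loop blocks of structured programming.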

Example 1.2: Design a flowchart to illustrate the process of reading a book. The inputs to this system are words read from the book, and definitions looked up in a dictionary. The objective of this system will be to store knowledge into a database. There will be no formal output per se.

Solution This second example illustrates the concept of a subroutine. We break a complex system into smaller components so that the system is easier to understand, and easier to test. In particular, once we know how to look up definitions of words in a dictionary, we will encapsulate that process into a subroutine, called Lookup. In this example, the main program performs the tasks of reading and remembering. We use a while-loop to read each word of the book in order until the end of the book is reached. After we read a word from the book, we use a conditional to determine whether or not we understand the meaning of the word. If we do not understand the word, we call the Lookup subroutine to find the definition in the dictionary. After we have read and understood each word, we record the knowledge we have learned into a database. The letters A through D in Figure 1.8 specify the software activities in this simple example. In this example, execution is sequential and predictable (if BD is to occur, it will come after A and before C). A software task is called a thread. More formally, a thread is the execution of software or the action caused by the execution. In this example, there is one thread. Consider a book with 10 words, and we do not know the meaning of word 4 and word 7. The thread caused by the execution when reading this 10-word book will be

A0 C0 A1 C1 A2 C2 A3 C3 A4 B4 D4 C4 A5 C5 A6 C6 A7 B7 D7 C7 A8 C8 A9 C9

where the number following each letter (a subscript in the original) refers to the word number. The main program executes the sequence AC or ABDC over and over as it finishes reading the book.

Figure 1.8 Flowchart illustrating the process of reading a book.

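[Flowchart: main starts at connector 1 and tests for the end of the book; at the end of the book it returns; if there is more, it reads the next word w (A) and tests whether w is understood; if not understood, it calls Lookup(w) (B); it then performs Remember (C) and goes back to connector 1. The function Lookup(w) reads w in the dictionary (D) and returns.]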
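The thread described in Example 1.2 can be reproduced with a short simulation that logs each activity as it occurs. This is only a sketch: the Trace buffer, the Log helper, and the rule that words 4 and 7 are the ones not understood are assumptions set up to match the 10-word book in the example.

```c
#include <stdio.h>
#include <string.h>

static char Trace[256];   // record of the thread, e.g. "A0 C0 A1 C1"

// Append one activity letter and word number to the trace
static void Log(char activity, int word){
  char item[16];
  snprintf(item, sizeof item, "%s%c%d",
           (Trace[0] != '\0') ? " " : "", activity, word);
  strncat(Trace, item, sizeof Trace - strlen(Trace) - 1);
}

// Assumption for this sketch: every word except 4 and 7 is understood
static int Understand(int w){
  return (w != 4) && (w != 7);
}

// Subroutine that reads w in the dictionary
static void Lookup(int w){
  Log('D', w);
}

// Main program: read and remember every word of the book
void ReadBook(int numWords){
  int w = 0;
  while(w < numWords){      // more words remain
    Log('A', w);            // A: read next word
    if(!Understand(w)){
      Log('B', w);          // B: call Lookup
      Lookup(w);
    }
    Log('C', w);            // C: remember
    w = w + 1;
  }
}
```

After calling ReadBook(10), Trace holds the same sequence listed in the example, with ABDC appearing only for words 4 and 7.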

1.4


Concurrent and Parallel Programming

Many problems cannot be implemented using the single-threaded execution pattern described in the previous section. Parallel programming allows the computer to execute multiple threads at the same time. State-of-the-art multi-core processors can execute a separate program in each of their cores. Fork and join are the fundamental building blocks of parallel programming. After a fork, two or more software threads will be run in parallel, i.e., the threads will run simultaneously on separate processors. Two or more simultaneous software threads can be combined into one using a join. The flowchart symbols for fork and join are shown in Figure 1.9. Software execution after the join will wait until all threads above the join are complete. As an analogy, if I want to dig a big hole in my back yard, I will invite three friends over and give everyone a shovel. The fork operation changes the situation from me working alone to four of us ready to dig. The four digging tasks are run in parallel. When the overall task is complete, the join operation causes the friends to go away, and I am working alone again.

Concurrent programming allows the computer to execute multiple threads, but only one at a time. Interrupts are one mechanism to implement concurrency on real-time systems. Interrupts have a hardware trigger and a software action. An interrupt is a parameterless subroutine call, triggered by a hardware event. The flowchart symbols for interrupts are also shown in Figure 1.9. The trigger is a hardware event signaling it is time to do something. Examples of interrupt triggers we will see in this book include new input data has arrived, output device is idle, and periodic event. The second component of an interrupt-driven system is the software action called an interrupt service routine (ISR). The foreground thread is defined as the execution of the main program, and the background threads are executions of the ISRs.

Figure 1.9 Flowchart symbols to describe parallel and concurrent programming.
[Figure: a fork splits execution into several processes that run until a join; an interrupt is drawn as a trigger, a process, and a return from interrupt.]

Consider the analogy of sitting in a comfy chair reading a book. Reading a book is like executing the main program in the foreground. You start reading at the beginning of the book and basically read one page at a time in a sequential fashion. You might jump to the back and look something up in the glossary, then jump back to where you were, which is analogous to a function call. Similarly, you might read the same page a few times, which is analogous to a program loop. Even though you skip around a little, the order of pages you read follows a logical and well-defined sequence. Conversely, if the telephone rings, you place a bookmark in the book, and answer the phone. When you are finished with the phone conversation, you hang up the phone and continue reading in the book where you left off. The ringing phone is analogous to a hardware trigger and the phone conversation is like executing the ISR.

Example 1.3 Design a flowchart for a system that performs two independent tasks. The first task is to output a pulse on PTT every 1.024 ms in real time. The second task is to find all the prime numbers, and there are no particular time constraints on when or how fast one finds the prime numbers.

Solution In this example, there are two threads: foreground and background. Real-time means the output pulse must occur every 1.024 ms. Therefore, we will use a periodic interrupt to guarantee this real-time requirement. In particular, the timer system will be configured so that a hardware trigger will occur every 1.024 ms, and the software action will issue the pulse on PTT. The background thread causes the output to go high, then low. Tasks that are not time-critical can be performed in the foreground by the main program. In this example, the foreground thread finds prime numbers. Because both threads are active at the same time, we say the system is multithreaded and the threads are running concurrently. The letters (A through F) in Figure 1.10 specify the software activities in this multithreaded example. In particular, main, Factor, and Record are executed in the foreground. In the foreground, execution is sequential and predictable (if C is to occur, it will come after B and before D). On the other hand, with interrupts, the hardware trigger causes the interrupt service routine to execute. The execution of the ISR is predictable too; in this case it is executed every 1.024 ms, but ISR execution does not depend on execution in the foreground. In a single processor system like the 9S12, the interrupt must suspend foreground execution, execute the interrupt service routine in the background, then resume execution of the foreground. The symbol < signifies the hardware halting the main program and launching the ISR. The symbol > signifies the ISR software executing a return from interrupt instruction (rti), which resumes execution in the main program.

Figure 1.10 Flowchart for a multithreaded solution of a system performing two tasks.
[Flowchart: in the foreground, main sets n=2 (A), calls Factor(n) (B), calls Record(n) if n is prime (C), and then sets n = n+1 (D) before repeating; in the background, the interrupt trigger (<) starts Clock, which outputs PTT = 1 (E) and PTT = 0 (F) and then returns from interrupt (>).]

    void interrupt 7 Clock(void){
      PTT = 1;            // E
      PTT = 0;            // F
    }
    void main(void){ int n=2;   // A
      while(1){
        if(Factor(n))     // B
          Record(n);      // C
        n = n+1;          // D
      }
    }

The execution sequence of this two-threaded system might be something like the following (2, 3, 5, 7 are prime)

Foreground: A B2 C2 D2 B3 C3 D3 B4 D4 B5 C5 D5 B6 D6 B7 C7 D7 B8 D8 B9 D9 B10 D10
Background: EF EF EF

where the number following each letter (a subscript in the original) refers to the current value of n. The main program executes the sequence BCD or BD over and over as it searches for prime numbers. In this example, the periodic timer causes the execution of EF every 1.024 ms. Even though C will come after B and before D, interrupts may or may not inject an EF between any two instructions of the foreground thread. Being able to inject an EF exactly every 1.024 ms is how the real-time constraint is satisfied.

Figure 1.11 Parallel programming solution for finding the maximum value in a buffer.

[Flowchart: two parallel branches, one comparing Buf[0] with Buf[1] to select x and one comparing Buf[2] with Buf[3] to select y, followed by a comparison of x with y to select max.]

1) power--; } } }

Homework 1.10 Write C code for the flowchart shown in Figure Hw1.10. PORTB is an output connected to a stepper motor. PORTA is an input connected to a toggle switch.

Figure Hw1.10 Flowchart showing a stepper motor controller, used for Homework 1.10.

[Flowchart: main reads PORTA, tests bit 0 (0 or 1), and calls step with the values 5, 6, 9, and 10. The function step(n) sets PORTB=n and cnt = 10000, then repeats cnt = cnt-1 while cnt is greater than 0, and returns when cnt equals 0.]

Homework 1.11 Draw a data flow graph of the thermostat algorithm developed in Homework 1.7.

Homework 1.12 Draw a data flow graph of the cruise control algorithm developed in Homework 1.8.

Homework 1.13 Draw a flowchart of this C program using just the three basic building blocks of structured programming. In particular, first draw the flowchart in the regular way, then show the groupings that define each basic block. short data[100],sum; void calc(void){ short i; sum=0; for(i=0;i

; 9S12DP512
PTT   equ  $0240
DDRT  equ  $0242
      org  $0800   ;RAM
      org  $4000   ;EEPROM
main  lds  #$4000  ;SP=>RAM
      ldaa #$07
      staa DDRT    ;PT2,PT1,PT0 outputs
      cli          ;allow debugger
loop  ldaa #4
      staa PTT     ;output
      ldaa #2
      staa PTT     ;output
      ldaa #1
      staa PTT     ;output
      bra  loop
      org  $FFFE   ;EEPROM
      fdb  main    ;reset vector

// 9S12DP512
void main(void){
  DDRT = 0x07; // PT2,PT1,PT0 outputs
  asm cli
  while(1){
    PTT = 0x04;
    PTT = 0x02;
    PTT = 0x01;
  }
}

Program 2.2 Software solution to Example 2.2.

To better understand how the computer translates our program into actions, we will analyze the explicit actions that occur as Program 2.2 executes. After typing in our source code, we will assemble the program generating the machine code and listing file. Program 2.3 is the listing file for Program 2.2. Line numbers were manually added to show instructions that will be executed. Looking at the big picture, we see that lines 1 through 4 are executed once to initialize the system, and lines 5 through 11 are repeated over and over as the system produces the infinite output sequence 4-2-1-4-2-1-4-2-1- . . .

1. $2000 for the 9S12E128 and $3800 for the 9S12C32


2 䡲 Introduction to Assembly Language Programming

Program 2.3 Listing file for Program 2.2.

                 ; 9S12DP512
$0240            PTT   equ  $0240
$0242            DDRT  equ  $0242
$0800                  org  $0800   ;RAM
$4000                  org  $4000   ;EEPROM
$4000 CF4000     main  lds  #$4000  ;SP=>RAM        *Line 1
$4003 8607             ldaa #$07                    *Line 2
$4005 7A0242           staa DDRT    ;PT2,PT1,PT0    *Line 3
$4008 10EF             cli          ;allow debugger *Line 4
$400A 8604       loop  ldaa #4                      *Line 5
$400C 7A0240           staa PTT     ;output         *Line 6
$400F 8602             ldaa #2                      *Line 7
$4011 7A0240           staa PTT     ;output         *Line 8
$4014 8601             ldaa #1                      *Line 9
$4016 7A0240           staa PTT     ;output         *Line 10
$4019 20EF             bra  loop                    *Line 11
$FFFE                  org  $FFFE   ;EEPROM
$FFFE 4000             fdb  main    ;reset vector

The machine code is programmed into the flash EEPROM. In particular, locations $4000 to $401A will contain the machine code for this program, and locations $FFFE to $FFFF will always contain the reset vector, as shown in Figure 2.20.

Figure 2.20 Memory model of Program 2.2. After reset, PC=$4000.

[Figure: the processor (RegA, SP, PC) is connected by the bus to the I/O ports and the EEPROM. I/O: PTT at $0240 and DDRT at $0242. EEPROM, $4000 to $401A: $CF $40 $00 $86 $07 $7A $02 $42 $10 $EF $86 $04 $7A $02 $40 $86 $02 $7A $02 $40 $86 $01 $7A $02 $40 $20 $EF. Reset vector: $FFFE contains $40 and $FFFF contains $00.]

When power is applied to the system, or when the reset button is pushed, the computer reads the 16-bit number from location $FFFE and $FFFF and places it into the PC. This defines the place the program will begin execution. In this example, the software will start executing at $4000.

Lines 1 through 4: These four lines perform the initialization sequence. Executing the lds instruction will initialize the stack pointer. Although not specifically used in this example, the stack is an important structure and should be initialized in this manner for all our 9S12 software. During the execution of each instruction, the PC is incremented to the next instruction. Executing the ldaa instruction will set register A equal to $07. Since this is immediate mode addressing, the data can be found in the machine code itself. Since it is immediate mode, the data will be fixed, and can only be changed by editing the source code and reassembling the program. Executing the staa instruction will set DDRT equal to $07. Since this is extended mode addressing, the machine code contains the address of DDRT, $0242. DDRT specifies whether each pin of Port T is an input or an output. This store instruction produces a write cycle to address $0242 with data $07, causing PT2, PT1 and PT0 to become output pins. Notice that the load instructions bring data from memory or a port into a register, and the store instructions send data from a register out to memory or a port. Executing the cli instruction will enable interrupts. Although this program does not specifically use interrupts, the debugger needs to have interrupts enabled.

Address  Object code  Source code   Action     After completion
$4000    $CF4000      lds  #$4000   SP=$4000   PC=$4003
$4003    $8607        ldaa #$07     A=$07      PC=$4005
$4005    $7A0242      staa DDRT     DDRT=$07   PC=$4008
$4008    $10EF        cli           I=0        PC=$400A

Lines 5 through 10: These lines perform the body of the program, causing the 4-2-1 output sequence. Executing the ldaa instructions will set register A equal to a constant. The # symbol specifies immediate mode addressing, so the constant data can be found in the machine code itself. Executing the staa instructions will set PTT. This is extended mode addressing, therefore the machine code contains the address of PTT, $0240. The staa instructions produce write cycles to address $0240. When you write 1/0 binary data to Port T, the high/low digital voltages occur on the corresponding output pins. In this example, each staa instruction sets a new output on PTT, and notice the sequence will be 4-2-1, as desired.

Address  Object code  Source code   Action     After completion
$400A    $8604        ldaa #4       A=4        PC=$400C
$400C    $7A0240      staa PTT      PTT=4      PC=$400F
$400F    $8602        ldaa #2       A=2        PC=$4011
$4011    $7A0240      staa PTT      PTT=2      PC=$4014
$4014    $8601        ldaa #1       A=1        PC=$4016
$4016    $7A0240      staa PTT      PTT=1      PC=$4019

Line 11: This line causes the execution of the body of the program to occur over and over. The bra instruction uses PC-relative addressing. During the fetching of the two bytes of machine code, the PC is incremented twice, changing it from $4019 to $401B. The PC-relative offset, $EF, is sign extended to $FFEF, which means -17. This is an unconditional branch, so PC = PC-17 (or $401B+$FFEF), setting PC back to line 5.

Address  Object code  Source code   Action     After completion
$4019    $20EF        bra  loop                PC=$400A
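The PC-relative arithmetic described for the bra instruction can be checked with a short C sketch. This is only an illustration that runs on any computer with a C compiler; the function names sign_extend8 and branch_target are hypothetical and are not part of the 9S12 software or of TExaS.

```c
#include <stdint.h>

/* Sign-extend an 8-bit PC-relative offset to 16 bits ($EF becomes $FFEF). */
uint16_t sign_extend8(uint8_t offset) {
    return (offset & 0x80) ? (uint16_t)(0xFF00u | offset) : (uint16_t)offset;
}

/* Destination of a two-byte short branch located at instr_addr.
   The PC has already advanced past both bytes when the offset is added,
   and the 16-bit sum wraps around, so adding $FFEF subtracts 17. */
uint16_t branch_target(uint16_t instr_addr, uint8_t offset) {
    uint16_t pc_after_fetch = (uint16_t)(instr_addr + 2u);
    return (uint16_t)(pc_after_fetch + sign_extend8(offset));
}
```

Calling branch_target(0x4019, 0xEF) reproduces the calculation in the text: $401B+$FFEF wraps around to $400A, the address of line 5.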

2.10

Tutorial 2. Running with TExaS This tutorial explains some of the debugging features available with TExaS. A vast amount of information exists as the computer executes software. A good debugger allows us to selectively filter this information, showing us only data relevant to the problem at hand. There are two aspects of this filter: what information will we see? and when (or how often) will it be collected? The run mode allows us to adjust the level of detail observable during the simulation.

Action: Watch the second movie, called Lesson 2. Lesson 2 is located on the web at http://users.ece.utexas.edu/~valvano/Readme.htm. This lesson introduces some of the debugging features. It takes about 11 minutes and provides a narrated overview of debugging within TExaS. You need not install TExaS, just download and run the Windows media file.

Question 2.1. A good debugger allows us to filter data that we observe. What are the two aspects of this filtering? I.e., in what two ways does the debugger filter data?
Question 2.2. What format code do we use in the ViewBox to see a variable in 8-bit unsigned decimal?
Question 2.3. What does CycleView mode do?
Question 2.4. What does InstructionView mode do?
Question 2.5. What does LogRecord mode do?
Question 2.6. What is a ScanPoint?

2.11

Homework Assignments

Homework 2.1 What are the differences between the following four instructions:
      ldaa 10
      ldaa #10
      ldaa $10
      ldaa #$10

Homework 2.2 What is the difference between the following two instructions:
      ldaa #10
      ldx  #10

Homework 2.3 Identify the addressing mode used in each of the following instructions:
      staa 200
      staa 2000
      staa 200,x
      staa 2000,x
      bra  2000
      jmp  2000

Homework 2.4 Identify the addressing mode used in each of the following instructions:
      subd 2,x
      clra
      ldaa #$36
      ldd  $3800
      bra  loop

Homework 2.5 You will need to look up the address of Ports A and J in your data sheet to answer this question. Identify the addressing mode used in each of the following instructions:
      cli
      subd #0
      bsr  $5000
      jsr  $5000
      ldy  2,y
      ldaa PTJ    ;Port J
      stab PORTA  ;Port A
      rts

The next three homework assignments in this chapter involve hand assembly. Pass1 contains three steps. The first step is to determine the addressing mode for each instruction. Next, you calculate the object code size for the instruction. The third step is to create the symbol table. Pass2 contains two steps. The first step is to determine the object code for each instruction, and the second step is to write the listing (address, data) for each line.

Homework 2.6 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
DDRH  equ  $0262   ; Port H Data Direction Register
DDRT  equ  $0242   ; Port T Data Direction Register
PTH   equ  $0260   ; Port H I/O Register
PTT   equ  $0240   ; Port T I/O Register
      org  $4000   ; Object code goes in EEPROM
Main  ldaa #$FF
      staa DDRT    ; Port T is output
      ldaa #$00
      staa DDRH    ; Port H is input
loop  ldaa PTH     ; Read inputs
      staa PTT     ; Set output
      bra  loop    ; Repeat
      org  $FFFE
      fdb  Main    ; Starting address after a RESET

Homework 2.7 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
DDRP  equ  $025A      ; Port P Data Direction Register
PTP   equ  $0258      ; Port P I/O Register
      org  $0800      ; Variables go in RAM
Data  rmb  1
      org  $4000      ; Object code goes in EEPROM
Main  movb #$00,DDRP  ; Port P is input
loop  ldaa PTP        ; Read inputs
      staa Data       ; Save in variable
      bra  loop       ; Repeat
      org  $FFFE
      fdb  Main       ; Starting address after a RESET

Homework 2.8 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
      org  $0800      ; Variables go in RAM
Data  rmb  1
      org  $4000      ; Object code goes in EEPROM
Main  lds  #$4000     ; Initialize stack
      movb #10,Data   ; Data=10
loop  bsr  Add1
      bra  loop       ; Repeat
Add1  ldaa Data
      inca            ; Add one
      staa Data
      rts
      org  $FFFE
      fdb  Main       ; Starting address

Homework 2.9 During an 8-bit memory read bus cycle to address $3800, what memory locations are modified? During an 8-bit memory write bus cycle to address $3800, what memory locations are modified?

Homework 2.10 Consider this assembly instruction
Here  bsr  Lookup     ;call Lookup function
For each of the addresses listed below, give the machine code for the instruction and the value pushed on the stack when the instruction is executed. If it is not possible to assemble this instruction, state “not possible”.
Here    Lookup   machine code   value pushed
$4040   $4060
$5050   $5020
$5050   $4060

Homework 2.11 Consider this assembly instruction
Here  jsr  Lookup     ;call Lookup function
For each of the addresses listed below, give the machine code for the instruction and the value pushed on the stack when the instruction is executed. If it is not possible to assemble this instruction, state “not possible”.
Here    Lookup   machine code   value pushed
$4040   $4060
$5050   $5020
$5050   $4060

Homework 2.12 Assume RegX is $3800, RegD is $4647, the PC is $4123, and RAM locations $3800 to $38FF are initially $00, $01, . . . $FF respectively. E.g., location $3856 contains $56. Show the simplified bus cycles occurring when the ldd 2,x instruction is executed. Specify which registers get modified during each cycle, and the corresponding new values. Do not worry about changes to the CCR. Just show the one instruction.
$4123 EC02      ldd  2,x

Homework 2.13 Assume PC is $4120, and the SP is initially $3FF4. Show the simplified bus cycles occurring when the bsr instruction is executed. Specify which registers get modified during each cycle, and the corresponding new values. Do not worry about changes to the CCR. Just show the one instruction.
$4120 07F0      bsr  MyFunction

Homework 2.14 What does the effective address register contain?

Homework 2.15 What is the purpose of the following registers: CCR, SP, PC, IR, EAR?

Homework 2.16 Show the simplified bus cycles generated by the execution of the following program. The first step is to find the object code for the three instructions, and the second step is to break each instruction into individual bus cycles required to execute it.
      org  $F000
      ldaa #44
      ldy  #$0010
      staa 4,y

Homework 2.17 Show the simplified bus cycles generated by the execution of the following program. The first step is to find the object code for the three instructions, and the second step is to break each instruction into individual bus cycles required to execute it.
      org  $F000
      ldab #$55
      ldx  #$0020
      stab 5,x

Homework 2.18 The following data is stored in sequential memory locations. Determine the sequence of memory instructions this data represents.
      org  $F000
      fcb  $86,$55,$CE,$02,$50,$F6,$F0,$00,$5A,$01,$6B,$08,$20,$FA

Homework 2.19 The following data is stored in sequential memory locations. Determine the sequence of memory instructions this data represents. Each value is in hexadecimal.
      org  $4000
      fcb  $87,$CE,$02,$40,$F6,$40,$01,$5A,$08,$54,$6B,$02,$20,$FB

Homework 2.20 Write an assembly language subroutine that initializes Port J bits 5, 4, 1, 0 to outputs and bits 7, 6, 3, 2 to inputs. Make all Port H bits inputs, and all Port T bits outputs.

Homework 2.21 Write an assembly language subroutine that initializes Port T bits 7, 4, 3, 0 to outputs and bits 6, 5, 2, 1 to inputs. Make all Port M bits outputs.

Homework 2.22 Write assembly language software that initializes Port T bit 3 to an output. All other bits are inputs.

Homework 2.23 Write assembly language software that initializes Port H bit 1 to an input. All other bits are outputs.

Homework 2.24 Write assembly software that makes Port T bits 1, 3, 5, and 7 outputs and the rest inputs.

Homework 2.25 Interface a LED that requires 1 mA at 2.5 V. A digital output high on PT0 turns on the LED.

Homework 2.26 Interface a LED that requires 2 mA at 2.0 V. A digital output low on PT1 turns on the LED.

Homework 2.27 Interface a LED that requires 15 mA at 2.5 V. Use a 7405 driver and a current limiting resistor. A digital output high on PT2 turns on the LED. The 7405 output voltage VOL is 0.5 V.

Homework 2.28 Interface a LED that requires 30 mA at 1.5 V. Use a 7406 driver and a current limiting resistor. A digital output high on PT3 turns on the LED. The 7406 output voltage VOL is 0.5 V.

2.12

Laboratory Assignments For each lab in this chapter, you will have two binary switch inputs and one LED output. The LED represents the output, and the operator will toggle the switches in order to set the inputs. Let T be the Boolean variable representing the output (0 means LED is off and output is zero, 1 means LED is on and the output is 1). Let H and J be Boolean variables representing the state of the two switches (0 means the switch is not pressed, and 1 means the switch is pressed). Use the TExaS simulator to create three files. Lab2.rtf will contain the assembly source code. Lab2.uc will contain the microcomputer configuration. Lab2.io will define the external connections, which should be the two switches and one LED. Use the Mode-Processor command to select the desired processor. You should connect switches to PH0 (means Port H bit 0) and to PJ0 (means Port J bit 0). You should connect an LED to PT0 (means Port T bit 0). The switches should be labeled H and J, and the LED should be labeled T. When the H switch is in the “off” or open position, the signal at PH0 will be 0 V, which is a logic “0”. For this situation, your software will consider H to be false. When the H switch is in the “on” or closed position, the signal at PH0 will be 5 V, which is a logic “1”. In this case, your software will consider H to be true. The J switch, which is connected to PJ0, will operate in a similar fashion. When your software writes a “1” to PT0, the LED will turn on. You will write assembly code that inputs from PH0 and PJ0, and outputs to PT0. A template structure for your assembly program is shown as Program 2.2. To solve this lab you will need the ldaa, staa, anda, coma, and bra instructions. You can use the movb instruction if you wish. You can copy and paste the address definitions for ports H, J, and T from the port12.rtf file. In particular, you will need to define DDRH, DDRJ, DDRT, PTH, PTJ, and PTT.

The opening comments include: file name, overall objectives, hardware connections, specific functions, author name, and date. The equ pseudo-op is used to define port addresses. Global variables are declared in RAM, and the main program is placed in EEPROM. The 16-bit contents at $FFFE and $FFFF define where the computer will begin execution after a reset.

Lab 2.1 The specific device you will create is a digital NAND with two binary switch inputs and one LED output. The specific function you will implement is $T = \overline{H \& J}$. This means the output will be zero if and only if both the H switch and the J switch are pressed. Program L2.1 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.

Program L2.1 The C program to illustrate Lab 2.1.

void main(void){
  DDRH = 0x00;        // make Port H an input, PH0 is H
  DDRJ = 0x00;        // make Port J an input, PJ0 is J
  DDRT = 0xFF;        // make Port T an output, PT0 is T
  while(1){
    PTT = ~(PTJ&PTH); // LED off iff PJ0=1 and PH0=1
  }
}
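The remark that a better solution affects only the bits necessary can be sketched in C as follows. This is a hedged illustration, not the lab solution: the helper name nand_bit0 and the BIT0 mask are hypothetical, and the function works on ordinary byte values so that it can be compiled and checked without the 9S12 port registers.

```c
#include <stdint.h>

#define BIT0 0x01u   /* mask selecting bit 0 (assumed name, for illustration) */

/* Return the new Port T value: bit 0 becomes the NAND of bit 0 of the two
   inputs, and bits 7 through 1 of the old Port T value are left unchanged. */
uint8_t nand_bit0(uint8_t ptt_old, uint8_t pth, uint8_t ptj) {
    uint8_t t = (uint8_t)(~(pth & ptj) & BIT0);         /* NAND, bit 0 only */
    return (uint8_t)((ptt_old & (uint8_t)~BIT0) | t);   /* keep bits 7..1   */
}
```

On the microcomputer, the loop body would then read something like PTT = nand_bit0(PTT, PTH, PTJ); so that the other seven Port T pins keep their previous values.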

Lab 2.2 The specific device you will create is a digital NOR with two binary switch inputs and one LED output. The specific function you will implement is $T = \overline{H} \& \overline{J}$. This means the output will be one if and only if both the H switch and the J switch are not pressed. Program L2.2 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.

Program L2.2 The C program to illustrate Lab 2.2.

void main(void){
  DDRH = 0x00;            // make Port H an input, PH0 is H
  DDRJ = 0x00;            // make Port J an input, PJ0 is J
  DDRT = 0xFF;            // make Port T an output, PT0 is T
  while(1){
    PTT = (~PTJ)&(~PTH);  // LED on iff PJ0=0 and PH0=0
  }
}

Lab 2.3 The specific device you will create is a digital lock with two binary switch inputs and one LED output. The LED output represents the lock, and the operator will toggle the switches in order to unlock the door. The specific function you will implement is $T = H \& \overline{J}$. This means the LED will be on if and only if the H switch is pressed and the J switch is not pressed. Program L2.3 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.

Program L2.3 The C program to illustrate Lab 2.3.

void main(void){
  DDRH = 0x00;         // make Port H an input, PH0 is H
  DDRJ = 0x00;         // make Port J an input, PJ0 is J
  DDRT = 0xFF;         // make Port T an output, PT0 is T
  while(1){
    PTT = (~PTJ)&PTH;  // LED on iff PJ0=0 and PH0=1
  }
}

3

Representation and Manipulation of Information

Chapter 3 objectives are:
• Introduce the concept of how numbers are stored on the computer
• Discuss how characters are represented
• Define terms like precision and basis
• Review arithmetic and logic operations
• Explain the usage of condition code bits
• Develop mechanisms to convert between character strings and binary numbers

Numbers, like all information, are stored on the computer in binary form. On most computers, the memory is organized into 8-bit bytes. This means each 8-bit byte stored in memory will have a separate address. In this chapter we will learn about unsigned numbers, signed numbers, characters, and how to perform basic logical and arithmetic calculations. In order to develop reliable systems it is important to understand how the computer can make mistakes during calculations. With this knowledge, we can write software that detects when an error occurs, or better yet, we can write software that does not make mistakes.

3.1

Precision Precision is the number of distinct or different values. We express precision in alternatives, decimal digits, bytes, or binary bits. Alternatives are defined as the total number of possibilities. For example, an 8-bit number format can represent 256 different numbers. An 8-bit digital to analog converter (DAC) can generate 256 different analog outputs. An 8-bit analog to digital converter (ADC) can measure 256 different analog inputs. Table 3.1 illustrates the relationship between precision in binary bits and precision in alternatives.

Table 3.1 Relationship between bits, bytes and alternatives as units of precision.

Binary Bits   Bytes     Alternatives
8             1         256
10            2         1024
12            2         4096
16            2         65536
20            3         1,048,576
24            3         16,777,216
30            4         1,073,741,824
32            4         4,294,967,296
n             [[n/8]]   $2^n$

The operation [[x]] is defined as the smallest integer greater than or equal to x (the ceiling of x). E.g., [[2.1]], [[2.9]], and [[3.0]] are all equal to 3. The Bytes column in Table 3.1 specifies how many bytes of memory it would take to store a number with that precision, assuming the data were not packed or compressed in any way.

Checkpoint 3.1: How many bytes of memory would it take to store a 50-bit number?
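As a small illustration of the Bytes and Alternatives columns of Table 3.1, the two relationships can be written as C functions. The names bytes_for_bits and alternatives are hypothetical, chosen only for this sketch, and alternatives is limited to fewer than 64 bits because it returns a 64-bit result.

```c
#include <stdint.h>

/* Bytes needed to store an unpacked number of the given precision:
   the ceiling of bits/8, computed with integer arithmetic. */
uint32_t bytes_for_bits(uint32_t bits) {
    return (bits + 7u) / 8u;
}

/* Number of alternatives, 2 to the power bits (valid for bits < 64). */
uint64_t alternatives(uint32_t bits) {
    return (uint64_t)1 << bits;
}
```

For instance, bytes_for_bits(20) gives 3 and alternatives(20) gives 1,048,576, matching the 20-bit row of the table.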

Decimal digits are used to specify precision of measurement systems that display results as numerical values, as defined in Table 3.2. A full decimal digit can be any value 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. A digit that can be either 0 or 1 is defined as a ½ decimal digit. The terminology of a ½ decimal digit did not arise from a mathematical perspective of precision, but rather it arose from the physical width of the LED/LCD module used to display a blank or ‘1’ as compared to the width of a full digit. Notice in Figure 3.1 that the 7-segment modules capable of displaying 0 to 9 are about 1 cm wide; however, the corresponding 2-segment modules capable of being blank or displaying a 1 are about half as wide. Similarly, we define a digit that can be + or - also as a half decimal digit, because it has two choices. A digit that can be 0, 1, 2, 3 is defined as a ¾ decimal digit, because it is wider than a ½ digit but narrower than a full digit. We also define a digit that can be -1, -0, +0, or +1 as a ¾ decimal digit, because it also has 4 choices. We use the expression 4½ decimal digits to mean 20,000 alternatives and the expression 4¾ decimal digits to mean 40,000 alternatives. The use of a ½ decimal digit to mean twice the number of alternatives or one additional binary bit is widely accepted. On the other hand, the use of a ¾ decimal digit to mean four times the number of alternatives or two additional binary bits is not as commonly accepted. For example, consider the two ohmmeters shown in Figure 3.1. As illustrated in the figure, both are set to the 0 to 200 kΩ range. The 3½ digit ohmmeter has a resolution of 0.1 kΩ with measurements ranging from 0.0 to 199.9 kΩ. On the other hand, the 4½ digit ohmmeter has a resolution of 0.01 kΩ with measurements ranging from 0.00 to 199.99 kΩ.

Table 3.2 Definition of decimal digits as a unit of precision.

Decimal Digits   Alternatives
3                1000
3½               2000
3¾               4000
4                10000
4½               20000
4¾               40000
n                $10^n$
n½               $2 \cdot 10^n$
n¾               $4 \cdot 10^n$

Observation: A good rule of thumb to remember is $2^{10 \cdot n} \approx 10^{3 \cdot n}$.

Figure 3.1 Two ohmmeters: the one on the left has 3½ decimal digits and the one on the right has 4½.

Checkpoint 3.2: How many binary bits are equivalent to 3½ decimal digits?
Checkpoint 3.3: About how many decimal digits is 64 binary bits? You can answer this without a calculator, just using the “rule of thumb”.

A great deal of confusion exists over the abbreviations we use for large numbers. In 1998 the International Electrotechnical Commission (IEC) defined a new set of abbreviations for the powers of 2, as shown in Table 3.3. These new terms are endorsed by the Institute of Electrical and Electronics Engineers (IEEE) and International Committee for Weights and Measures (CIPM) in situations where the use of a binary prefix is appropriate. The confusion arises over the fact that the mainstream computer industry, such as Microsoft, Apple, and Dell, continues to use the old terminology. According to the companies that market to consumers, 1 GHz is 1,000,000,000 Hz but 1 Gbyte of memory is 1,073,741,824 bytes. The correct terminology is to use the SI-decimal abbreviations to represent powers of 10, and the IEC-binary abbreviations to represent powers of 2. The scientific meaning of 2 kilovolts is 2000 volts, but 2 kibibytes is the proper way to specify 2048 bytes. The term kibibyte is a contraction of kilo binary byte and is a unit of information or computer storage, abbreviated KiB.
1 KiB = $2^{10}$ bytes = 1024 bytes
1 MiB = $2^{20}$ bytes = 1,048,576 bytes
1 GiB = $2^{30}$ bytes = 1,073,741,824 bytes
These abbreviations can also be used to specify the number of binary bits. The term kibibit is a contraction of kilo binary bit, and is a unit of information or computer storage, abbreviated Kibit.
1 Kibit = $2^{10}$ bits = 1024 bits
1 Mibit = $2^{20}$ bits = 1,048,576 bits
1 Gibit = $2^{30}$ bits = 1,073,741,824 bits
A mebibyte (1 MiB is 1,048,576 bytes) is approximately equal to a megabyte (1 MB is 1,000,000 bytes), but mistaking the two has nonetheless led to confusion and even legal disputes. In the engineering community, it is appropriate to use terms that have a clear and unambiguous meaning.

Checkpoint 3.4: A 2 tebibyte storage system can store how many bytes?
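One way to keep the two families of prefixes apart in software is to define separate constants for each. The following C sketch assumes macro names (KIB, MIB, GIB, KB, MB, GB) and a helper gib_excess_bytes that are purely illustrative and do not come from any standard header.

```c
#include <stdint.h>

/* IEC binary prefixes: powers of 2 */
#define KIB ((uint64_t)1 << 10)
#define MIB ((uint64_t)1 << 20)
#define GIB ((uint64_t)1 << 30)

/* SI decimal prefixes: powers of 10 */
#define KB ((uint64_t)1000)
#define MB (1000 * KB)
#define GB (1000 * MB)

/* How many more bytes n GiB holds than n GB. */
uint64_t gib_excess_bytes(uint64_t n) {
    return n * GIB - n * GB;
}
```

With these definitions, MIB - MB evaluates to 48,576 bytes, which is the gap between a mebibyte and a megabyte described above.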

Table 3.3 Common abbreviations for large numbers.

Value      SI   Decimal   Value      IEC   Binary
$1000^1$   k    kilo-     $1024^1$   Ki    kibi-
$1000^2$   M    mega-     $1024^2$   Mi    mebi-
$1000^3$   G    giga-     $1024^3$   Gi    gibi-
$1000^4$   T    tera-     $1024^4$   Ti    tebi-
$1000^5$   P    peta-     $1024^5$   Pi    pebi-
$1000^6$   E    exa-      $1024^6$   Ei    exbi-
$1000^7$   Z    zetta-    $1024^7$   Zi    zebi-
$1000^8$   Y    yotta-    $1024^8$   Yi    yobi-

3.2

Boolean Information A Boolean number has two states. The two values represent logical true and false. In Chapter 1, we defined positive logic so that true is a 1 or high, and false is a 0 or low. In C programming, a false is represented by a zero, and a true as any non-zero value. If you were controlling a motor, light, heater, or air conditioner, the Boolean could mean on or off. Figure 3.2 shows the simulation using TExaS of a simple switch connected to PC0 that has two states and a LED that can be on or off. PB0 is a digital output of the microcomputer, which can be either high or low. The output of the LED driver is low or HiZ (shown as z in Figure 3.2.) In communication systems, we represent the information as a sequence of Booleans: mark or space. For black or white graphic displays we use Booleans to specify the state of each pixel. The most efficient storage of Booleans on a computer is to map each Boolean into one memory bit. In this way, we can pack eight Booleans into each byte. If we have just one Boolean to store in memory, out of convenience we allocate an entire byte for it. A common positive logic definition for Boolean information is: False is defined as all zeros, and True is defined as any nonzero value.

Figure 3.2 External to the microcomputer, Boolean information is encoded as voltage (0 or 5 V), position of a switch (off, on), and the presence of light (dark, light).
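Packing eight Booleans into one byte, as described in the text, can be sketched with three small C helpers. The names flag_set, flag_clear, and flag_test are hypothetical, and the bit number n is assumed to be in the range 0 to 7.

```c
#include <stdint.h>

/* Make Boolean number n (0 to 7) true by setting its bit. */
void flag_set(uint8_t *flags, unsigned n) {
    *flags |= (uint8_t)(1u << n);
}

/* Make Boolean number n (0 to 7) false by clearing its bit. */
void flag_clear(uint8_t *flags, unsigned n) {
    *flags &= (uint8_t)~(1u << n);
}

/* Return 1 if Boolean number n is true, 0 if it is false. */
int flag_test(uint8_t flags, unsigned n) {
    return (int)((flags >> n) & 1u);
}
```

Each helper touches only the selected bit, so the other seven Booleans stored in the same byte are preserved.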

Checkpoint 3.5: Give an example of a switch that is not binary.

In negative logic, the absence of a voltage is the true or asserted state. The presence of a voltage is called the false or not asserted state. In other words, the 0 V or low voltage means true, and the 5 V or high voltage means false. RS232 serial communication uses a negative logic encoding where -12 V means true, and +12 V means false. More about serial interfacing can be found in Chapters 8 and 12.

3.3

8-bit Numbers We saw 8-bit and 16-bit numbers in Chapter 2, but more formal definitions will be presented in the next few sections. A byte contains 8 bits as shown in Figure 3.3, where each bit b7, . . . , b0 is binary and has the value 1 or 0. We specify b7 as the most significant bit or MSB, and b0 as the least significant bit or LSB.

Figure 3.3 8-bit binary format.

b7 b6 b5 b4 b3 b2 b1 b0

If a byte is used to represent an unsigned number, then the value of the number is

$N = 128 \cdot b_7 + 64 \cdot b_6 + 32 \cdot b_5 + 16 \cdot b_4 + 8 \cdot b_3 + 4 \cdot b_2 + 2 \cdot b_1 + b_0$

Notice that the significance of bit n is $2^n$. There are 256 different unsigned 8-bit numbers. The smallest unsigned 8-bit number is 0 and the largest is 255. For example, %00001010 is 8+2 or 10. Other examples are shown in Table 3.4. The least significant bit can tell us if the number is even or odd.

Table 3.4 Example conversions from unsigned 8-bit binary to hexadecimal and to decimal.

Binary      Hex   Calculation              Decimal
%00000000   $00                            0
%01000001   $41   64+1                     65
%00010110   $16   16+4+2                   22
%10000111   $87   128+4+2+1                135
%11111111   $FF   128+64+32+16+8+4+2+1     255

Checkpoint 3.6: Convert the binary number %01101010 to unsigned decimal.
Checkpoint 3.7: Convert the hex number $45 to unsigned decimal.

The basis of a number system is a subset from which linear combinations of the basis elements can be used to construct the entire set. The basis represents the “places” in a “place-value” system. For positive integers, the basis is the infinite set {1, 10, 100, . . .}, and the “values” can range from 0 to 9. Each positive integer has a unique set of values such that the dot-product of the value vector times the basis vector yields that number. For example, 2345 is ( . . . , 2,3,4,5)•(. . . , 1000,100,10,1), which is 2*1000+3*100+4*10+5. For the unsigned 8-bit number system, the basis is {1, 2, 4, 8, 16, 32, 64, 128}. The values of a binary number system can only be 0 or 1. Even so, each 8-bit unsigned integer has a unique set of values such that the dot-product of the values times the basis yields that number. For example, 69 is (0,1,0,0,0,1,0,1)•(128,64,32,16,8,4,2,1), which equals 0*128+1*64+0*32+0*16+0*8+1*4+0*2+1*1. Conveniently, there is no other set of 0’s and 1’s such that the set of values multiplied by the basis is 69. One way for us to convert a decimal number into binary is to use the basis elements. The overall approach is to start with the largest basis element and work towards the smallest. More precisely, we start with the most significant bit and work towards the least significant bit. One by one, we ask ourselves whether or not we need that basis element to create our number. If we do, then we set the corresponding bit in our binary result and subtract the basis element from our number. If we do not need it, then we clear the corresponding bit in our binary result. We will work through the algorithm with the example of converting 100 to 8-bit binary, see Table 3.5. We start with the largest basis element (in this case 128) and ask whether or not we need to include it to make 100. Since our number is less than 128, we do not need it, so bit 7 is zero. We go to the next largest basis element, 64, and ask, “do we need it?” We do need 64 to generate our 100, so bit 6 is one and we subtract 100 minus 64 to get 36. Next, we go to the next basis element, 32, and ask, “do we need it?” Again, we do need 32 to generate our 36, so bit 5 is one and we subtract 36 minus 32 to get 4. Continuing along, we do not need basis elements 16 or 8, but we do need basis element 4. Once we subtract the 4, our working result is zero, so basis elements 2 and 1 are not needed. Putting it together, we get %01100100 (which means 64+32+4).

Table 3.5 Example conversion from decimal to unsigned 8-bit binary to hexadecimal.

Number   Basis   Need It?   Bit       Operation
100      128     no         bit 7=0   none
100      64      yes        bit 6=1   subtract 100-64
36       32      yes        bit 5=1   subtract 36-32
4        16      no         bit 4=0   none
4        8       no         bit 3=0   none
4        4       yes        bit 2=1   subtract 4-4
0        2       no         bit 1=0   none
0        1       no         bit 0=0   none

Checkpoint 3.8: In this conversion algorithm, how can we tell if a basis element is needed?
Observation: If the least significant binary bit is zero, then the number is even.
Observation: If the right-most n bits (least significant) are zero, then the number is divisible by \(2^n\).
Observation: Bit 7 of an 8-bit number determines whether its value is greater than or equal to 128.
Checkpoint 3.9: Give the representations of the decimal 45 in 8-bit binary and hexadecimal.
Checkpoint 3.10: Give the representations of the decimal 200 in 8-bit binary and hexadecimal.
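The conversion traced in Table 3.5 can be sketched in C. This is a minimal illustration, not code from the text; the function name basis_to_binary is hypothetical, and the input is assumed to lie between 0 and 255.

```c
#include <stdint.h>

// Convert an unsigned value (assumed 0 to 255) to 8-bit binary by walking
// the basis {128, 64, 32, 16, 8, 4, 2, 1} from largest to smallest.
// The function name is hypothetical, chosen for this sketch.
uint8_t basis_to_binary(uint16_t number) {
    uint8_t result = 0;
    for (uint16_t basis = 128; basis >= 1; basis >>= 1) {
        if (number >= basis) {          // this basis element is needed
            result |= (uint8_t)basis;   // set the corresponding bit
            number -= basis;            // subtract it from the working value
        }                               // otherwise the bit stays clear
    }
    return result;
}
```

Calling basis_to_binary(100) follows the same eight steps as the table and returns $64, which is %01100100.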

One of the first schemes to represent signed numbers was called one’s complement. It was called one’s complement because to negate a number, we complement (logical not) each bit. For example, if 25 equals 00011001 in binary, then -25 is 11100110. An 8-bit one’s complement number can vary from -127 to +127. The most significant bit is a sign bit, which is 1 if and only if the number is negative. The difficulty with this format is that there are two zeros: +0 is 00000000, and -0 is 11111111. Another problem is that one’s complement numbers do not have basis elements. These limitations led to the use of two’s complement.

The two’s complement number system is the most common approach used to define signed numbers. It is called two’s complement because to negate a number, we complement each bit (like one’s complement), then add 1. For example, if 25 equals 00011001 in binary, then -25 is 11100111. If a byte is used to represent a signed two’s complement number, then the value of the number is

N = -128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0

Observation: One usually means two’s complement when one refers to signed integers.

There are 256 different signed 8-bit numbers. The smallest signed 8-bit number is -128 and the largest is 127. For example, %10000010 equals -128+2 or -126. Other examples are shown in Table 3.6.
Checkpoint 3.11: Convert the signed binary number %11101010 to signed decimal.
Checkpoint 3.12: Are the signed and unsigned decimal representations of the 8-bit hex number $45 the same or different?

For the signed 8-bit number system the basis is {1, 2, 4, 8, 16, 32, 64, -128}.
Observation: The most significant bit in a two’s complement signed number will specify the sign.

Table 3.6 Example conversions from signed 8-bit binary to hexadecimal and to decimal.

Binary      Hex   Calculation              Decimal
%00000000   $00                            0
%01000001   $41   64+1                     65
%00010110   $16   16+4+2                   22
%10000111   $87   -128+4+2+1               -121
%11111111   $FF   -128+64+32+16+8+4+2+1    -1
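To make the signed weights concrete, here is a short C sketch that evaluates an 8-bit pattern with the basis {1, 2, 4, 8, 16, 32, 64, -128} and performs the complement-then-add-1 negate. The helper names signed8_value and negate8 are hypothetical and do not come from the text.

```c
#include <stdint.h>

// Value of an 8-bit pattern under the two's complement weights
// {1, 2, 4, 8, 16, 32, 64, -128}. Hypothetical helper name.
int16_t signed8_value(uint8_t pattern) {
    static const int16_t weight[8] = {1, 2, 4, 8, 16, 32, 64, -128};
    int16_t n = 0;
    for (int i = 0; i < 8; i++) {
        if (pattern & (1u << i)) {   // bit i is set, so add its weight
            n += weight[i];
        }
    }
    return n;
}

// Two's complement negate: complement every bit, then add 1.
uint8_t negate8(uint8_t pattern) {
    return (uint8_t)(~pattern + 1u);
}
```

For example, signed8_value(0x87) adds -128+4+2+1 and returns -121, and negate8(0x19) turns the pattern for 25 into $E7, the pattern for -25.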


Notice that the same binary pattern of %11111111 could represent either 255 or -1. It is very important for the software developer to keep track of the number format. The computer cannot determine whether the 8-bit number is signed or unsigned. You, as the programmer, will determine whether the number is signed or unsigned by the specific assembly instructions you select to operate on the number. Some operations like addition, subtraction, and shift left (multiply by 2) use the same hardware (instructions) for both unsigned and signed operations. On the other hand, multiply, divide, and shift right (divide by 2) require separate hardware (instructions) for unsigned and signed operations. For example, the multiply instruction, mul, operates on unsigned values. Software that employs the mul instruction implements unsigned arithmetic. There is also a signed multiply instruction, emuls, and if you use it, you are implementing signed arithmetic.

Similar to the unsigned algorithm, we can use the basis to convert a decimal number into signed binary. We will work through the algorithm with the example of converting -100 to 8-bit binary, as shown in Table 3.7. We start with the most significant bit (in this case -128) and decide whether we need to include it to make -100. Yes (without -128, we would be unable to add the other basis elements together to get any negative result), so we set bit 7 and subtract the basis element from our value. Our new value equals -100 minus -128, which is 28. We go to the next largest basis element, 64, and ask, “do we need it?” We do not need 64 to generate our 28, so bit 6 is zero. Next we go to the next basis element, 32, and ask, “do we need it?” We do not need 32 to generate our 28, so bit 5 is zero. Now we need the basis element 16, so we set bit 4 and subtract 16 from our number 28 (28-16=12). Continuing along, we need basis elements 8 and 4 but not 2 or 1. Putting it together we get %10011100 (which means -128+16+8+4).

Table 3.7 Example conversion from decimal to signed 8-bit binary.

Number   Basis   Need It   Bit       Operation
-100     -128    yes       bit 7=1   subtract -100 - (-128)
28       64      no        bit 6=0   none
28       32      no        bit 5=0   none
28       16      yes       bit 4=1   subtract 28-16
12       8       yes       bit 3=1   subtract 12-8
4        4       yes       bit 2=1   subtract 4-4
0        2       no        bit 1=0   none
0        1       no        bit 0=0   none

Observation: To take the negative of a two’s complement signed number we first complement (flip) all the bits, then add 1.

A second way to convert negative numbers into binary is to first convert them into unsigned binary, then do a two’s complement negate. For example, we earlier found that 100 is %01100100. The two’s complement negate is a two-step process. First we do a logic complement (flip all bits) to get %10011011. Then we add one to the result to get %10011100. A third way to convert negative numbers into binary is to first add 256 to the number, then convert the unsigned result to binary using the unsigned method. For example, to find -100, we add 256 plus -100 to get 156. Then we convert 156 to binary, resulting in %10011100. This method works because in 8-bit binary math adding 256 to a number does not change the value. E.g., 256-100 has the same 8-bit binary value as -100.
Checkpoint 3.13: Give the representations of -45 in 8-bit binary and hexadecimal.
Checkpoint 3.14: Why can’t you represent the number 200 using 8-bit signed binary?
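The three conversion methods can be compared side by side in C. The sketch below is only an illustration: the function names are hypothetical, the first function assumes its input lies between -128 and 127, and the second assumes a magnitude between 0 and 128.

```c
#include <stdint.h>

// Method 1: walk the signed basis {-128, 64, 32, 16, 8, 4, 2, 1}.
// Assumes -128 <= number <= 127.
uint8_t signed_basis_to_binary(int16_t number) {
    uint8_t result = 0;
    if (number < 0) {            // only -128 can make the total negative
        result |= 0x80;          // set bit 7
        number += 128;           // subtracting -128 is the same as adding 128
    }
    for (int16_t basis = 64; basis >= 1; basis >>= 1) {
        if (number >= basis) {
            result |= (uint8_t)basis;
            number -= basis;
        }
    }
    return result;
}

// Method 2: start from the unsigned magnitude, complement, then add 1.
uint8_t negate_magnitude(uint8_t magnitude) {
    return (uint8_t)(~magnitude + 1u);
}

// Method 3: add 256, then keep the result as an unsigned 8-bit value.
uint8_t add_256(int16_t number) {
    return (uint8_t)(number + 256);
}
```

All three calls signed_basis_to_binary(-100), negate_magnitude(100), and add_256(-100) produce the same pattern, $9C or %10011100.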


Sign-magnitude representation dedicates one bit as the sign, leaving the remaining bits to specify the magnitude of the number. If b7 is 1 then the number is negative, otherwise the number is positive.

\(N = (-1)^{b_7}\)•(64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0)

Unfortunately, there is no basis set for the sign-magnitude number system. For example, %10000010 equals -1•2 or -2. Other examples are shown in Table 3.8.

Table 3.8 Example conversions from sign-magnitude 8-bit binary to hexadecimal and to decimal.

Binary      Hex   Calculation                Decimal
%00000000   $00                              0
%01000001   $41   64+1                       65
%00010110   $16   16+4+2                     22
%10000111   $87   -1•(4+2+1)                 -7
%11111111   $FF   -1•(64+32+16+8+4+2+1)      -127

Another problem with sign-magnitude is that there are two representations of the number 0: “00000000” and “10000000”. But, the biggest advantage of two’s complement signed numbers over sign-magnitude is that the same addition and subtraction hardware (e.g., the adda, suba instructions) can be used for both signed and unsigned numbers. We also can use the same hardware for shift left (e.g., asla is the same instruction as lsla). Although the hardware for these three operations works for both signed and unsigned numbers, the overflow (error) conditions are distinct. The C bit in the condition code register (CCR) signifies unsigned overflow, and the V bit in the CCR means a signed overflow has occurred. Unfortunately, we must use separate signed and unsigned operations for multiply, divide, and shift right. Common Error: An error will occur if you use signed operations on unsigned numbers, or use unsigned operations on signed numbers. Maintenance Tip: To improve the clarity of our software, always specify the format of your data (signed versus unsigned) when defining or accessing the data.
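For comparison with two’s complement, a sign-magnitude pattern can be decoded with a mask and a conditional negate. This is a small sketch with a hypothetical function name, not code from the text.

```c
#include <stdint.h>

// Sign-magnitude decode: bit 7 is the sign, bits 6 to 0 are the magnitude.
// Hypothetical helper name.
int16_t sign_magnitude_value(uint8_t pattern) {
    int16_t magnitude = pattern & 0x7F;            // keep the low 7 bits
    return (pattern & 0x80) ? (int16_t)(-magnitude) : magnitude;
}
```

Both $00 and $80 decode to 0 with this function, which is the two-zeros problem described above.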

When communicating with humans (input or output), computers need to store information in an easy-to-read decimal format. One such format is binary coded decimal or BCD. The 8-bit BCD format contains two decimal digits, and each decimal digit is encoded in 4-bit binary. For example, the number 72 is stored as $72 or %01110010. We can represent numbers from 0 to 99 using 8-bit BCD. Checkpoint 3.15: What binary values are used to store the number 25 in 8-bit BCD format?
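Packing and unpacking 8-bit BCD takes one divide and a few shifts. The following sketch uses hypothetical function names and assumes the decimal input is between 0 and 99.

```c
#include <stdint.h>

// Pack a decimal value (assumed 0 to 99) into 8-bit BCD: tens digit in
// the high nibble, ones digit in the low nibble.
uint8_t to_bcd8(uint8_t decimal) {
    return (uint8_t)(((decimal / 10) << 4) | (decimal % 10));
}

// Unpack 8-bit BCD back into an ordinary binary value.
uint8_t from_bcd8(uint8_t bcd) {
    return (uint8_t)(10 * (bcd >> 4) + (bcd & 0x0F));
}
```

With these definitions to_bcd8(72) returns $72, matching the example above, and from_bcd8($72) returns 72.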

3.4 16-bit Numbers

A word or double byte contains 16 bits, where each bit b15, . . . , b0 is binary and has the value 1 or 0, as shown in Figure 3.4.

Figure 3.4 16-bit binary format.

b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0

If a word is used to represent an unsigned number, then the value of the number is

N = 32768•b15 + 16384•b14 + 8192•b13 + 4096•b12 + 2048•b11 + 1024•b10 + 512•b9 + 256•b8 + 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0


There are 65536 different unsigned 16-bit numbers. The smallest unsigned 16-bit number is 0 and the largest is 65535. For example, %0010000110000100 or $2184 is 8192+256+128+4 or 8580. Other examples are shown in Table 3.9.

Table 3.9 Example conversions from unsigned 16-bit binary to hexadecimal and to decimal.

Binary              Hex     Calculation                                                        Decimal
%0000000000000000   $0000                                                                      0
%0000010000000001   $0401   1024+1                                                             1025
%0000110010100000   $0CA0   2048+1024+128+32                                                   3232
%1000111000000010   $8E02   32768+2048+1024+512+2                                              36354
%1111111111111111   $FFFF   32768+16384+8192+4096+2048+1024+512+256+128+64+32+16+8+4+2+1       65535

Checkpoint 3.16: Convert the 16-bit binary number %0010000001101010 to unsigned decimal. Checkpoint 3.17: Convert the 16-bit hex number $1234 to unsigned decimal.

For the unsigned 16-bit number system the basis is {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768} Checkpoint 3.18: Convert the unsigned decimal number 1234 to 16-bit hexadecimal. Checkpoint 3.19: Convert the unsigned decimal number 10000 to 16-bit binary.

There are also 65536 different signed 16-bit numbers. The smallest two’s complement signed 16-bit number is -32768 and the largest is 32767. For example, %1101000000000100 or $D004 is -32768+16384+4096+4 or -12284. Other examples are shown in Table 3.10.

Table 3.10 Example conversions from signed 16-bit binary to hexadecimal and to decimal.

Binary              Hex     Calculation                                                        Decimal
%0000000000000000   $0000                                                                      0
%0000010000000001   $0401   1024+1                                                             1025
%0000110010100000   $0CA0   2048+1024+128+32                                                   3232
%1000010000000010   $8402   -32768+1024+2                                                      -31742
%1111111111111111   $FFFF   -32768+16384+8192+4096+2048+1024+512+256+128+64+32+16+8+4+2+1      -1

If a word is used to represent a signed two’s complement number, then the value of the number is

N = -32768•b15 + 16384•b14 + 8192•b13 + 4096•b12 + 2048•b11 + 1024•b10 + 512•b9 + 256•b8 + 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0

Checkpoint 3.20: Convert the 16-bit hex number $1234 to signed decimal.
Checkpoint 3.21: Convert the 16-bit hex number $ABCD to signed decimal.


For the signed 16-bit number system the basis is {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, -32768}.
Common Error: An error will occur if you use 16-bit operations on 8-bit numbers, or use 8-bit operations on 16-bit numbers.
Maintenance Tip: To improve the clarity of your software, always specify the precision of your data when defining or accessing the data.
Checkpoint 3.22: Convert the signed decimal number 1234 to 16-bit hexadecimal.
Checkpoint 3.23: Convert the signed decimal number –10000 to 16-bit binary.

3.5 Extended Precision Numbers

Consider an unsigned number with n bits, where each bit \(b_{n-1}, \ldots, b_0\) is binary and has the value 1 or 0. If an n-bit number is used to represent an unsigned integer, then the value of the number is

\[ N = 2^{n-1} b_{n-1} + 2^{n-2} b_{n-2} + \cdots + 2 b_1 + b_0 = \sum_{i=0}^{n-1} 2^i b_i \]

There are \(2^n\) different unsigned n-bit numbers. The smallest unsigned n-bit number is 0 and the largest is \(2^n - 1\). For the unsigned n-bit number system, the basis is

\[ \{1, 2, 4, \ldots, 2^{n-2}, 2^{n-1}\} \]

If an n-bit binary number is used to represent a signed two’s complement number, then the value of the number is

\[ N = -2^{n-1} b_{n-1} + 2^{n-2} b_{n-2} + \cdots + 2 b_1 + b_0 = -2^{n-1} b_{n-1} + \sum_{i=0}^{n-2} 2^i b_i \]

There are also \(2^n\) different signed n-bit numbers. The smallest signed n-bit number is \(-2^{n-1}\) and the largest is \(2^{n-1} - 1\). For the signed n-bit number system, the basis is

\[ \{1, 2, 4, \ldots, 2^{n-2}, -2^{n-1}\} \]

Maintenance Tip: When programming in C, we will use data types char, short, and long when we wish to explicitly specify the precision as 8-bit, 16-bit, or 32-bit. Whereas, we will use the int data type only when we don’t care about precision, and we wish the compiler to choose the most efficient way to perform the operation.
Observation: When programming in assembly, we will always explicitly specify the precision of our numbers and calculations.
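The two n-bit value formulas translate directly into loops over the bits. The sketch below is an illustration with hypothetical function names; it assumes 1 <= n <= 32 and uses 64-bit accumulators so the sums cannot overflow.

```c
#include <stdint.h>

// Unsigned value of the low n bits of pattern: sum of 2^i * b_i for
// i = 0 to n-1. Assumes 1 <= n <= 32.
uint64_t unsigned_value(uint32_t pattern, unsigned n) {
    uint64_t sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += (uint64_t)((pattern >> i) & 1u) << i;
    }
    return sum;
}

// Two's complement value of the same bits: bit n-1 carries the weight
// -2^(n-1), and bits n-2 down to 0 keep their positive weights.
int64_t signed_value(uint32_t pattern, unsigned n) {
    int64_t sum = 0;
    for (unsigned i = 0; i + 1 < n; i++) {
        sum += (int64_t)((pattern >> i) & 1u) << i;
    }
    sum -= (int64_t)((pattern >> (n - 1)) & 1u) << (n - 1);
    return sum;
}
```

For instance, with n = 16 the pattern $D004 evaluates to -12284 as a signed number, and with n = 8 the pattern $FF evaluates to 255 unsigned and -1 signed.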

The binary coded decimal or BCD format is convenient for storing data that has just been input or is just about to be output. Each byte of a BCD number contains two decimal digits, and each decimal digit is encoded in four-bit binary. For example, the number 1,234,567 is stored in four bytes as $01234567. If m is the number of bytes, then the numbers from 0 to \(100^m - 1\) can be stored.
Checkpoint 3.24: What hexadecimal values are used to store the number 3456 in 16-bit BCD format?
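A multi-byte BCD value can be built one decimal digit at a time. The sketch below handles the four-byte case (m = 4) and returns the bytes as one 32-bit word; the function name is hypothetical and the input is assumed to be at most 99,999,999.

```c
#include <stdint.h>

// Pack a decimal value (assumed 0 to 99999999) into four BCD bytes,
// returned as one 32-bit word with the most significant digits on top.
uint32_t to_bcd32(uint32_t decimal) {
    uint32_t bcd = 0;
    for (int shift = 0; shift < 32; shift += 4) {
        bcd |= (decimal % 10) << shift;   // one decimal digit per nibble
        decimal /= 10;
    }
    return bcd;
}
```

Calling to_bcd32(1234567) returns $01234567, the four-byte pattern given in the text.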

3.6 Logical Operations

Software uses logical operations to combine information, to extract information and to test information. A unary operation produces its result given a single input parameter. For example, negate, increment, and decrement are unary operations.


In discrete digital logic, the complement operation is called a NOT gate, as shown in Figure 3.5. The complement function is defined in Table 3.11. CMOS refers to complementary metal oxide semiconductor. The “HC” in 74HC04 stands for high-speed CMOS. Most microcomputers, including the 9S12, are made with high-speed CMOS logic. As we saw in Chapter 1, CMOS circuits are built with p-type and n-type transistors. There are just a few rules one needs to know for understanding how CMOS transistor-level circuits work. Each transistor acts like a switch between its source and drain pins. In general, current can flow from source to drain across an active p-type transistor, and no current will flow if the switch is open. To a first approximation, we can assume no current flows into or out of the gate. For a p-type transistor, the switch will be closed (transistor active) if its gate is low. A p-type transistor will be off (its switch is open) if its gate is high. The gate on the n-type works in a complementary fashion, hence the name complementary metal oxide semiconductor. For an n-type transistor, the switch will be closed (transistor active) if its gate is high. An n-type transistor will be off (its switch is open) if its gate is low.

Now consider the two possibilities for the circuit in Figure 3.5. If A is high (5 V), then the p-type is off and the n-type is active. The closed switch across the source-drain of the n-type will make the output low (0 V). Conversely, if A is low (0 V), then the p-type is active and the n-type is off. The closed switch across the source-drain of the p-type will make the output high (5 V). The 9S12 performs the complement in a bit-wise fashion. For example, the calculation r = ~n means each bit is calculated separately, r7 = ~n7, r6 = ~n6, . . . , r0 = ~n0.

Figure 3.5 Logical NOT operation can be implemented with discrete transistors or digital gates.

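[Figure 3.5 shows a transistor-level NOT circuit built from one p-type and one n-type transistor (gate, source, and drain labeled) powered from +5 V, next to the 74HC04 gate symbol. The figure includes the following state table.]

A      p-type   n-type   ~A
0 V    active   off      +5 V
+5 V   off      active   0 V

Table 3.11 Logical complement.

A   ~A
0   1
1   0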

A binary operation produces a single result given two inputs. The logical AND (&) operation yields a true result if both input parameters are true. The logical OR (|) operation yields a true result if either input parameter is true. The exclusive OR (^) operation yields a true result if exactly one input parameter is true. The logical operators are summarized in Table 3.12 and shown as digital gates in Figure 3.6.

Table 3.12 Logical operations.

A   B   A&B   A|B   A^B
0   0   0     0     0
0   1   0     1     1
1   0   0     1     1
1   1   1     1     0

Figure 3.6 Logical operations can be implemented with discrete transistors or digital gates. [Panels: AND gate (74HC08), OR gate (74HC32), and EOR gate (74HC86), with transistor-level circuits for the AND and OR gates, each built from transistors T1 through T6 and powered from +5 V.]

We can understand the operation of the AND gate by observing the behavior of its six transistors. If both A and B are high, both T3 and T4 will be active. Furthermore, if A and B are both high, T1 and T2 will be off. In this case, the signal labeled ~(A&B) will be low because the T3,T4 switch combination will short this signal to ground. If A is low, T1 will be active and T3 off. Similarly, if B is low, T2 will be active and T4 off. Therefore, if either A is low or B is low, the signal labeled ~(A&B) will be high because one or both of the T1,T2 switches will short this signal to 5 V. Transistors T5 and T6 create a logical complement, converting the signal ~(A&B) into the desired result of A&B.

We can understand the operation of the OR gate by observing the behavior of its six transistors. If both A and B are low, both T1 and T2 will be active. Furthermore, if A and B are both low, T3 and T4 will be off. In this case, the signal labeled ~(A|B) will be high because the T1,T2 switch combination will short this signal to 5 V. If A is high, T3 will be active and T1 off. Similarly, if B is high, T4 will be active and T2 off. Therefore, if either A is high or B is high, the signal labeled ~(A|B) will be low because one or both of the T3,T4 switches will short this signal to ground. Transistors T5 and T6 create a logical complement, converting the signal ~(A|B) into the desired result of A|B.

Checkpoint 3.25: Using just the 74HC gates shown in Figures 3.5 and 3.6, design an equals circuit, such that the output is 1 if and only if input A equals input B. There will be two input signals and one output signal.
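The switch rules for the six-transistor gates can be checked with a small switch-level model in C. This is a sketch under the assumptions stated in the text: a p-type switch closes when its gate is low, an n-type switch closes when its gate is high, and T5,T6 form the output inverter. All function names are hypothetical.

```c
#include <stdbool.h>

// A p-type switch closes when its gate is low; an n-type switch closes
// when its gate is high.
static bool p_closed(bool gate) { return !gate; }
static bool n_closed(bool gate) { return gate; }

// Inverter stage (T5 p-type to +5 V, T6 n-type to ground).
bool cmos_not(bool a) {
    return p_closed(a) && !n_closed(a);
}

// First stage of the AND gate: either p-type (T1, T2) can pull the node
// to +5 V, and both n-types (T3, T4) are needed to pull it to ground.
bool cmos_nand(bool a, bool b) {
    bool pull_up   = p_closed(a) || p_closed(b);
    bool pull_down = n_closed(a) && n_closed(b);
    return pull_up && !pull_down;
}

// First stage of the OR gate: both p-types (T1, T2) are needed to pull
// the node to +5 V, and either n-type (T3, T4) can pull it to ground.
bool cmos_nor(bool a, bool b) {
    bool pull_up   = p_closed(a) && p_closed(b);
    bool pull_down = n_closed(a) || n_closed(b);
    return pull_up && !pull_down;
}

bool cmos_and(bool a, bool b) { return cmos_not(cmos_nand(a, b)); }
bool cmos_or(bool a, bool b)  { return cmos_not(cmos_nor(a, b)); }
```

For every input combination exactly one of the pull-up and pull-down networks conducts, so the model reproduces the A&B and A|B columns of Table 3.12.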

Most 8-bit logical instructions take two inputs, one from a register and the other from memory. The 9S12 performs these operations in a bit-wise fashion on two 8-bit parameters yielding an 8-bit result. For example, the calculation r = m&n means each bit is calculated separately, r7 = m7&n7, r6 = m6&n6, . . . , r0 = m0&n0. All but the bita and bitb instructions put the result back in the register. The N bit will be set if the result is negative. The Z bit will be set if the result is zero. These logical instructions will clear the V bit and leave the C bit unchanged.

anda #w   ;RegA=RegA&w               Logical and RegA with a constant
anda U    ;RegA=RegA&[U]             Logical and RegA with a memory value
andb #w   ;RegB=RegB&w               Logical and RegB with a constant
andb U    ;RegB=RegB&[U]             Logical and RegB with a memory value
bita #w   ;RegA&w                    Logical and RegA with a constant
bita U    ;RegA&[U]                  Logical and RegA with a memory value
bitb #w   ;RegB&w                    Logical and RegB with a constant
bitb U    ;RegB&[U]                  Logical and RegB with a memory value
coma      ;RegA=$FF-RegA, RegA=~RegA Complement RegA
comb      ;RegB=$FF-RegB, RegB=~RegB Complement RegB
eora #w   ;RegA=RegA ^ w             Exclusive or RegA with a constant
eora U    ;RegA=RegA ^ [U]           Exclusive or RegA with a memory value
eorb #w   ;RegB=RegB ^ w             Exclusive or RegB with a constant
eorb U    ;RegB=RegB ^ [U]           Exclusive or RegB with a memory value
oraa #w   ;RegA=RegA | w             Logical or RegA with a constant
oraa U    ;RegA=RegA | [U]           Logical or RegA with a memory value
orab #w   ;RegB=RegB | w             Logical or RegB with a constant
orab U    ;RegB=RegB | [U]           Logical or RegB with a memory value
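The effect of a logical instruction on the result and on the N, Z, and V bits can be modeled in C. The sketch below models anda only; the struct and function names are hypothetical, and the C bit is left out because these instructions do not change it.

```c
#include <stdint.h>
#include <stdbool.h>

// Result and condition code bits produced by a logical instruction.
// Hypothetical type for this sketch.
typedef struct {
    uint8_t result;
    bool n;   // set if the result is negative (bit 7 of the result)
    bool z;   // set if the result is zero
    bool v;   // always cleared by the logical instructions
} logic_result;

// Model of anda: bit-wise AND of RegA with an operand, plus flags.
logic_result model_anda(uint8_t reg_a, uint8_t operand) {
    logic_result r;
    r.result = reg_a & operand;
    r.n = (r.result & 0x80) != 0;
    r.z = (r.result == 0);
    r.v = false;
    return r;
}
```

For example, model_anda(0xF0, 0x0F) gives a result of 0 with Z set, while model_anda(0xF0, 0x80) gives $80 with N set.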


Condition code bits are set, where R is the result of the operation.
N: result is negative, N = R7
Z: result is zero, Z = ~R7•~R6•~R5•~R4•~R3•~R2•~R1•~R0
V: signed overflow, V = 0
C: unchanged

Example 3.1 Write software to set bit 4 and clear bits 1 and 0 of an 8-bit variable N.

Solution We use an 8-bit register because we wish to operate on 8-bit data. We “or with 1” to set bits and we “and with 0” to clear bits. The logical function N = $FC&(N|$10) performs the desired effect. Immediate mode addressing is used when operating on fixed constants.

     ldaa N       ;RegA = N
     oraa #$10    ;RegA = N|$10 (set bit 4)
     anda #$FC    ;RegA = $FC&(N|$10) (clears bits 1,0)
     staa N       ;N = $FC&(N|$10)

To illustrate how the above program works, let b7 b6 b5 b4 b3 b2 b1 b0 be the values of the original 8 bits of variable N. The ldaa instruction brings these values into Register A. The oraa instruction sets bit 4, the anda instruction clears bits 1,0, and the staa instruction stores the result back to N.

b7  b6  b5  b4  b3  b2  b1  b0     value of N
0   0   0   1   0   0   0   0      $10 constant
b7  b6  b5  1   b3  b2  b1  b0     result of the oraa instruction
1   1   1   1   1   1   0   0      $FC constant
b7  b6  b5  1   b3  b2  0   0      result of the anda instruction
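The same set-then-clear sequence can be written in C on an ordinary 8-bit value. This is a sketch for comparison with the assembly above; the function name is hypothetical.

```c
#include <stdint.h>

// Set bit 4 and clear bits 1 and 0 of an 8-bit value, mirroring the
// oraa #$10 and anda #$FC steps. Hypothetical helper name.
uint8_t set4_clear10(uint8_t n) {
    n |= 0x10;   // "or with 1" sets bit 4
    n &= 0xFC;   // "and with 0" clears bits 1 and 0
    return n;
}
```

Starting from $FF the function returns $FC, and starting from $03 it returns $10; in both cases bits 7, 6, 5, 3, and 2 keep their original values.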

Checkpoint 3.26: Write assembly code that implements RegD = RegD&$0F3C.
Checkpoint 3.27: Write assembly code that implements RegX = RegX|$1234.
Checkpoint 3.28: Let N be an 8-bit location. Write assembly code that clears bit 4.

We can use the AND operation to extract, or mask, individual bits from a value.

Example 3.2 Write software that sets a global variable to true if a switch is pressed.

Solution The first step is to interface a switch to an input port of the 9S12. We will use a positive logic interface because we want the digital signal in to be high if and only if the switch is pressed, as shown in Figure 3.7. In particular, PTT bit 0 contains a signal that is high or low depending on the position of the switch. Some switches bounce, which means there will be multiple open/closed cycles when the switch is changed. This simple solution can be used if the switch doesn’t bounce or if the bouncing doesn’t matter. Bit 0 of the Port T direction register should be made zero during the initialization. When the computer reads PTT it gets all 8 bits of the input port. On the other hand, the expression PTT&0x01 will be zero if and only if bit 0 of PTT is zero.

Figure 3.7 Interface of a switch to a microcomputer input. [Components shown: +5 V, switch, 10 kΩ resistor, signal in, 9S12 pin PT0.]

The following C code will set the variable Pressed to true (nonzero) if the switch is pressed.

     Pressed = PTT&0x01;   // true if the switch is pressed

The following 9S12 assembly code uses the anda instruction to perform the same operation.

     ldaa PTT       ;read input Port T
     anda #$01      ;clear all bits except bit 0
     staa Pressed   ;true iff the switch is pressed

To illustrate how the above program works, let a7 a6 a5 a4 a3 a2 a1 a0 be the values of the 8 individual bits in PTT. The ldaa instruction brings these values into Register A. The anda instruction clears all bits except bit 0, and the staa instruction stores the result into the variable called Pressed.

a7  a6  a5  a4  a3  a2  a1  a0     value of PTT
0   0   0   0   0   0   0   1      $01 constant
0   0   0   0   0   0   0   a0     result of the anda instruction

Often we combine many small systems together to make larger systems. Once we debug a small system we would like to have confidence that it will still work when combined with other systems. One difficulty arises when two or more systems share an I/O port (for example system 1 uses PT1, and system 2 uses PT2). Friendly software modifies just the bits that need to be modified, making it easier to combine with other software. Conversely, an unfriendly solution modifies all 8 bits of a register when needing only to modify less than 8 bits.

Example 3.3 Write software that makes PT4 and PT5 outputs and clears both outputs without affecting the other bits of PTT.

Solution This system uses just bits 4 and 5 of PTT, and the other 6 bits are not needed in this problem. If we implement a friendly solution, this system can be combined with other systems that use the other bits of PTT. We begin by setting DDRT bits 4 and 5, so PT4 and PT5 become outputs. Usually, we set the direction register once at the start of our program. Rather than just setting DDRT=0x30 (unfriendly), we perform a read-modify-write so just bits 4 and 5 are affected. The following C code uses the OR operation to set bits 4 and 5 of the register DDRT. The other six bits of DDRT remain constant.

     DDRT |= 0x30;   // set bits 4 and 5, making PT4 and PT5 outputs

The following 9S12 assembly code uses the oraa instruction to perform the same operation.

     ldaa DDRT    ;read previous value of DDRT
     oraa #$30    ;set bits 4 and 5, other 6 bits left unchanged
     staa DDRT    ;update the actual direction register

To illustrate how the above program works, let c7 c6 c5 c4 c3 c2 c1 c0 be the values of the original 8 bits in DDRT. The ldaa instruction brings these values into Register A. The oraa instruction sets bits 4 and 5, and the staa instruction stores the result back to DDRT.

c7  c6  c5  c4  c3  c2  c1  c0     value of DDRT
0   0   1   1   0   0   0   0      $30 constant
c7  c6  1   1   c3  c2  c1  c0     result of the oraa instruction

We use another read-modify-write to clear bits 4 and 5 of PTT. Notice that ~0x30 is 0xCF. This complement is executed at compile-time rather than at run-time.

     PTT &= ~0x30;   // clear bits 4 and 5, PT4 and PT5 become 0

The following 9S12 assembly code uses the anda instruction to clear the two bits of PTT.

     ldaa PTT     ;read previous value of PTT
     anda #$CF    ;clear bits 4 and 5, other 6 bits left unchanged
     staa PTT     ;update the actual PTT register

Maintenance Tip: When interacting with just some of the bits of an I/O register, it is better to modify just the bits of interest, leaving the other bits unchanged. In this way, the action of one piece of software does not undo the action of another piece.
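The difference between friendly and unfriendly software can be seen on a stand-in variable. In this sketch fake_port is a hypothetical plain variable used in place of a real memory-mapped register such as DDRT or PTT, so the code can run on any C compiler.

```c
#include <stdint.h>

// Stand-in for an 8-bit I/O register. On the 9S12 this would be the
// memory-mapped DDRT or PTT; the variable name here is hypothetical.
static volatile uint8_t fake_port;

// Unfriendly: overwrites all 8 bits, undoing what other software set.
void unfriendly_set(void) { fake_port = 0x30; }

// Friendly: read-modify-write that changes only bits 5 and 4.
void friendly_set(void) { fake_port |= 0x30; }

// Friendly clear of bits 5 and 4; the other six bits are preserved.
void friendly_clear(void) { fake_port &= (uint8_t)~0x30; }
```

If another module has already set bits 2 and 1 (value $06), friendly_set leaves the register at $36, while unfriendly_set leaves $30 and silently clears the other module's bits.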

These read-or-write and read-and-write sequences are extremely useful in manipulating individual bits within direction registers and output ports. So useful in fact that the 9S12 has instructions to perform these logical operations. Notice that these two instructions directly affect memory space without using registers, and that the data size is always 8 bits. These instructions have two addressing modes. The first addressing mode determines the memory location to change. For now this first addressing mode will be direct or extended addressing, but later in Chapter 6, we will see that indexed addressing mode also can be used to specify the memory location. The second addressing mode will always be immediate, specifying which bits to modify. The N bit will be set if the result is negative. The Z bit will be set if the result is zero. These logical instructions will clear the V bit and leave the C bit unchanged.

bclr U,#w   ;[U]=[U]&(~w)   Clear bits in memory
bset U,#w   ;[U]=[U] | w    Set bits in memory

Condition code bits are set, where R is the result of the operation.
N: result is negative, N = R7
Z: result is zero, Z = ~R7•~R6•~R5•~R4•~R3•~R2•~R1•~R0
V: signed overflow, V = 0
C: unchanged

Example 3.4 Write software that toggles a PT3 output without affecting the other bits of PTT. Toggle means change. I.e., if it is 1, make it 0. If it is 0, make it 1.

Solution The exclusive or operation can be used to toggle bits. The following C code toggles PT3 by inverting bit 3 of PTT, while the other seven bits remain constant. Notice that 0x08 is %00001000 in binary.

     PTT ^= 0x08;   // toggle PT3 from 0 to 1 or from 1 to 0

The following 9S12 assembly code uses the eora instruction to perform the same operation.

     ldaa PTT     ;read output Port T
     eora #$08    ;toggle just bit 3, other 7 bits left unchanged
     staa PTT     ;update the actual output port

To illustrate how the above program works, let b7 b6 b5 b4 b3 b2 b1 b0 be the values of the original 8 bits in PTT. The ldaa instruction brings these values into Register A. The eora instruction toggles bit 3, and the staa instruction stores the result back to PTT.

b7  b6  b5  b4  b3   b2  b1  b0     value of PTT
0   0   0   0   1    0   0   0      $08 constant
b7  b6  b5  b4  ~b3  b2  b1  b0     result of the eora instruction


Example 3.5 Generate two out-of-phase squarewaves as shown in Figure 3.8.

Solution Out of phase means one signal goes high when the other one goes low. During the initialization, we specify PT1 and PT0 as outputs, then establish the initial values as 0 and 1 respectively. We use the exclusive or operation to toggle both bits at the same time. The infinite loop program will repeat the exclusive or operation over and over, creating the out-of-phase squarewaves on Port T bits 1 and 0. The other six bits of Port T remain unchanged.

     DDRT |= 0x03;            // make PT1 PT0 output
     PTT = (PTT&0xFD)|0x01;   // PT1=0, PT0=1
     while(1){
       PTT ^= 0x03;           // toggle bits 1 and 0
     }

The following assembly code uses logical instructions to perform the function in a friendly manner. The period of the squarewave is determined by the speed of the microcomputer. Figure 3.8 shows the simulated waveforms running on a 1 MHz 9S12.

main bset DDRT,#$03   ;make PT1, PT0 outputs, leaving other bits as is
     bclr PTT,#$02    ;make PT1=0
     bset PTT,#$01    ;make PT0=1, leaving other bits as is
loop ldaa PTT         ;read previous value of Port T
     eora #$03        ;toggle bits 1,0
     staa PTT         ;change PT1 PT0, leaving other bits as is
     bra  loop

Figure 3.8 Scope window showing the execution of Example 3.5.
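The toggling in Example 3.5 can be checked off-target with a model that works on a plain variable instead of the real PTT. The function below is a hypothetical test helper, not part of the original program; it applies the initialization once and then a chosen number of exclusive or operations.

```c
#include <stdint.h>

// Software model of Example 3.5 on a plain variable (not the real PTT).
// Returns the port value after `toggles` passes through the loop body.
uint8_t squarewave_model(uint8_t port, unsigned toggles) {
    port = (uint8_t)((port & 0xFD) | 0x01);   // PT1=0, PT0=1
    while (toggles--) {
        port ^= 0x03;                         // toggle bits 1 and 0
    }
    return port;
}
```

Starting from $F0, the model yields $F1, $F2, $F1, and so on: bits 1 and 0 are always opposite, and the upper six bits never change.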

Other convenient logical operators are summarized in Table 3.13 and shown as digital gates in Figure 3.9. The NAND operation is defined by an AND followed by a NOT. If you compare the transistor-level circuits in Figures 3.6 and 3.9, it would be more precise to say AND is defined as a NAND followed by a NOT. Similarly, the OR operation is a NOR followed by a NOT. The exclusive NOR operation implements the bit-wise equals operation.

Table 3.13 Convenient logical operations.

A   B   NAND   NOR   Exclusive NOR
0   0   1      1     1
0   1   1      0     0
1   0   1      0     0
1   1   0      0     1
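C has no single operator for NAND, NOR, or exclusive NOR, but each is a complement applied to &, |, or ^. The following sketch shows bit-wise 8-bit versions; the function names are hypothetical.

```c
#include <stdint.h>

// NAND: AND followed by NOT, applied to each of the 8 bits.
uint8_t nand8(uint8_t a, uint8_t b) { return (uint8_t)~(a & b); }

// NOR: OR followed by NOT.
uint8_t nor8(uint8_t a, uint8_t b) { return (uint8_t)~(a | b); }

// Exclusive NOR: each result bit is 1 where the two input bits are equal.
uint8_t xnor8(uint8_t a, uint8_t b) { return (uint8_t)~(a ^ b); }
```

The cast back to uint8_t matters because C promotes the operands to int before applying the complement. Since xnor8 is the bit-wise equals operation, xnor8(x, x) is $FF for any x.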

Figure 3.9 Other logical operations can also be implemented with discrete logic. [Panels: NAND (74HC00, output ~(A&B)), NOR (74HC02, output ~(A|B)), Ex NOR (74HC7266, output ~(A^B)), and open collector NOT (74HC05; 7405 or 7406, output ~A), with transistor-level circuits powered from +5 V.]

The output of an open collector gate, drawn with the ‘x’, has two states: low (0 V) and HiZ (floating). TExaS signifies this floating state with a z, as seen in Figure 3.2. Consider the operation of the transistor-level circuit for the 74HC05. If A is high (5 V), the transistor is active, and the output is low (0 V). If A is low (0 V), the transistor is off, and the output is neither high nor low. In general, we can use an open collector NOT gate to control the current to a device, such as a relay, an LED, a solenoid, a small motor, or a small light. The 74HC05, the 7405, and the 7406 are all open collector NOT gates. The 74HC04 is high-speed CMOS and can only sink up to 4 mA when its output is low. Since the 7405 and 7406 are transistor-transistor-logic (TTL), they can sink more current. In particular, the 7405 has a maximum output low current (IOL) of 16 mA, whereas the 7406 has a maximum IOL of 40 mA.

Example 3.6 The goal is develop a means for the microcontroller to turn on and turn off an AC-powered appliance. The interface will use a solid-state relay with a control parameters of 2 V and 10 mA. Write necessary subroutines to operate the system. Solution The control portion of the solid-state relay (SSR) is an LED, which we interface using an open collector NOT gate just like Figure 2.17. We choose an electronic circuit that has an output current larger than the 10 mA needed by the SSR. Since the maximum IOL of the 7405 is 16 mA, it can sink the 10 mA required by the SSR. The 7406 could also have been used. The resistor is selected to control the current to the diode. Using the LED design equation, R (5 Vd VOL)/Id (5 2 0.5 V)/0.01 A 250 . The closest standard value 5% resistor is 240 . A 240 resistor will generate Id (5 2 0.5 V)/240 10.4 mA, which will be close enough to activate the relay. When the input to the 7405 is high (p 5 V), the output is low (q 0.5 V), see Figure 3.10. In this state, a 10 mA current is applied to the diode, and relay switch activates. This causes 120 VAC power to be delivered to the appliance. But, when the input is low (p 0), the output floats (q HiZ, which is neither high or low). This floating output state causes the LED current to be zero, and the relay switch opens. In this case, no power is delivered to the appliance. Figure 3.10 Solid-state relay interface using a 7405 open collector driver.

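[Figure 3.10 schematic: the 9S12 output PT5 (signal p) drives the input of a 7405; +5 V, a 240 Ω resistor, and the SSR control input are in series with the 7405 output (signal q); the SSR switches 120 VAC to the appliance.]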
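The resistor selection in Example 3.6 can be checked numerically. The following is a minimal sketch in C, not part of the book's software; the function names are hypothetical, and voltages are passed in millivolts and currents in microamps so that integer math suffices.

```c
#include <stdint.h>

// Hypothetical helpers illustrating the LED design equation
// R = (Vsupply - Vd - VOL)/Id. Voltages are in mV, currents in uA.

// Returns the ideal series resistance in ohms (truncated).
uint32_t led_resistor_ohms(uint32_t supply_mV, uint32_t vd_mV,
                           uint32_t vol_mV, uint32_t id_uA){
  return ((supply_mV - vd_mV - vol_mV)*1000u)/id_uA;
}

// Returns the LED current in uA (truncated) for a chosen resistor.
uint32_t led_current_uA(uint32_t supply_mV, uint32_t vd_mV,
                        uint32_t vol_mV, uint32_t r_ohms){
  return ((supply_mV - vd_mV - vol_mV)*1000u)/r_ohms;
}
```

With a 5 V supply, Vd = 2 V, VOL = 0.5 V, and Id = 10 mA, the first function gives 250 Ω, and the second gives about 10.4 mA for the 240 Ω standard value, which stays below the 16 mA IOL limit of the 7405.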


The initialization subroutine will set bit 5 of DDRT to make PT5 an output, see Program 3.1. This function should be called once at the start of the system. After initialization, the on and off functions can be called to control the appliance. Software that affects only the bits it has to, without changing any of the other bits, is called friendly. The oraa instruction is used to set bits and the anda instruction clears bits.

Program 3.1 Subroutines to control a solid-state relay.

SSR_Init ldaa DDRT
         oraa #$20
         staa DDRT   ;PT5 output
         rts
SSR_On   ldaa PTT
         oraa #$20
         staa PTT    ;PT5 high
         rts
SSR_Off  ldaa PTT
         anda #$DF   ;clear bit 5 only
         staa PTT    ;PT5 low
         rts

// Make PT5 an output
void SSR_Init(void){
  DDRT |= 0x20;
}
// Make PT5 high
void SSR_On(void){
  PTT |= 0x20;
}
// Make PT5 low
void SSR_Off(void){
  PTT &= ~0x20;
}
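To see what friendly means in practice, the following sketch replaces the hardware registers with ordinary variables so it can run on a desktop compiler. The names ending in _sim and the unfriendly variant are assumptions made for illustration; on the real target, DDRT and PTT come from the device header file.

```c
#include <stdint.h>

// Stand-ins for the 9S12 registers (assumption: ordinary RAM variables).
volatile uint8_t DDRT_sim, PTT_sim;

void SSR_Init_sim(void){ DDRT_sim |= 0x20; }           // PT5 output, other bits unchanged
void SSR_On_sim(void){   PTT_sim |= 0x20; }            // set bit 5 only
void SSR_Off_sim(void){  PTT_sim &= (uint8_t)~0x20; }  // clear bit 5 only

// Unfriendly version for contrast: it forces the other seven bits to 0.
void SSR_On_unfriendly(void){ PTT_sim = 0x20; }
```

If PTT_sim holds $81 before the call, the friendly SSR_On_sim leaves it at $A1, whereas the unfriendly version destroys bits 7 and 0 by writing $20.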

Checkpoint 3.29: Rewrite the assembly code in Program 3.1 using the bset and bclr instructions.

While we’re introducing digital circuits, we need digital storage devices, which are essential components used to make registers and memory. The simplest storage device is the set-reset flip-flop. One way to build one is shown on the left side of Figure 3.11. If the inputs are S*=0 and R*=1, then the Q output will be 1. Conversely, if the inputs are S*=1 and R*=0, then the Q output will be 0. Normally, we leave both the S* and R* inputs high. We make the signal S* go low, then back high to set the flip-flop, making Q=1. Conversely, we make the signal R* go low, then back high to reset the flip-flop, making Q=0. If both S* and R* are 1, the value on Q will be remembered or stored. This flip-flop enters an unpredictable mode when S* and R* are simultaneously low.

Figure 3.11 Digital storage elements.

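[Figure 3.11 panels: a set-reset flip-flop (inputs S* and R*, output Q); a gated D flip-flop (inputs D and W generating S* and R*, output Q); a 74HC74 clocked D flip-flop (D, clock, Q); and a 74HC374 8-bit D flip-flop (D, clock, gate G, Q).]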

The gated D flip-flop is also shown in Figure 3.11. The front-end circuits take a data input, D, and a control signal, W, and produce the S* and R* commands for the set-reset flip-flop. For example, if W=0, then the flip-flop is in its quiescent state, remembering the value on Q that was previously written. However, if W=1, then the data input is stored into the flip-flop. In particular, if D=1 and W=1, then S*=0 and R*=1, making Q=1. Furthermore, if D=0 and W=1, then S*=1 and R*=0, making Q=0. So, to use the gated flip-flop, we first put the data on the D input, next we make W go high, then we make W go low. This causes the data value to be stored at Q. After W goes low, the data does not need to exist at the D input anymore. If the D input changes while W is high, then the Q output will change correspondingly. However, the last value on the D input is remembered or latched when W falls, as shown in Table 3.14.

The D flip-flop, shown on the right of Figure 3.11, can also be used to store information. D flip-flops are the basic building block of RAM and registers on the computer. To save information, we first place the digital value we wish to remember on the D input, then give a rising edge to the clock input. After the rising edge of the clock, the value is available at the Q output, and the D input is free to change. The operation of the clocked D flip-flop is defined on the right side of Table 3.14. The 74HC374 is an 8-bit D flip-flop, such that all 8 bits are stored on the rising edge of a single clock. The 74HC374 is similar in structure and operation to a register, which is high-speed memory inside the processor. If the gate (G) input on the 74HC374 is high, its outputs will be HiZ (floating), and if the gate is low, the outputs will be high or low depending on the values stored in the flip-flop.

Table 3.14 D flip-flop operation. Qold is the stored value, i.e., the value of the D input at the time of the previous active edge of W or clock.

Gated D flip-flop              Clocked D flip-flop
D   W              Q           D   clock          Q
0   0              Qold        0   0              Qold
1   0              Qold        0   1              Qold
0   1              0           1   0              Qold
1   1              1           1   1              Qold
0   falling edge   0           0   rising edge    0
1   falling edge   1           1   rising edge    1
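The difference between the two storage elements in Table 3.14 can be expressed as a small behavioral model. This is a sketch for illustration only; the type and function names are hypothetical, and the model ignores setup time, hold time, and propagation delay.

```c
#include <stdint.h>

// Gated D flip-flop: Q follows D while W is 1 and holds its value when W is 0.
typedef struct { uint8_t q; } GatedD;
uint8_t GatedD_Step(GatedD *ff, uint8_t d, uint8_t w){
  if(w){ ff->q = d; }
  return ff->q;
}

// Clocked D flip-flop: Q changes only on a rising edge of clock.
typedef struct { uint8_t q; uint8_t lastClock; } ClockedD;
uint8_t ClockedD_Step(ClockedD *ff, uint8_t d, uint8_t clock){
  if(clock && !ff->lastClock){ ff->q = d; }   // 0-to-1 transition
  ff->lastClock = clock;
  return ff->q;
}
```

Calling GatedD_Step repeatedly with W=1 shows Q tracking D, while ClockedD_Step ignores changes on D until the next 0-to-1 transition of the clock.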

Second, the tristate driver, shown in Figure 3.12, can be used to dynamically control signals within the computer. The tristate driver is an essential component from which computers are built. To activate the driver, we make its gate (G) low. When the driver is active, its output (Y) equals its input (A). To deactivate the driver, we make its G high. When the driver is not active, its output Y floats independent of A. We saw this floating state with the open collector logic, and it is also called HiZ or high impedance. The HiZ output means the output is driven neither high nor low. The operation of a tristate driver is defined in Table 3.15. The 74HC244 is an 8-bit tristate driver, such that all 8 bits are activated or deactivated by a single gate. The 74HC374 8-bit D flip-flop includes tristate drivers on its outputs. Normally, we can’t connect two digital outputs together. The tristate driver provides a way to connect multiple outputs to the same signal, as long as at most one of the gates is active at a time.

Figure 3.12 A 1-bit and an 8-bit tristate driver.

[Figure 3.12: logic symbols for the 74HC125 1-bit tristate driver (input A, gate G, output Y) and the 74HC244 8-bit tristate driver, together with a transistor-level circuit built from transistors T1 through T8 between +5 V and ground.]

Table 3.15 Tristate driver operation. HiZ is the floating state, such that the output is not high or low.

A  G  T1   T2   T3   T4   T5   T6   T7   T8   Y
0  0  on   off  on   off  on   off  on   on   0
1  0  on   off  off  on   on   on   off  on   1
0  1  off  on   on   off  off  off  on   off  HiZ
1  1  off  on   off  on   off  on   off  off  HiZ

To understand how a tristate driver works, look at the various pieces of the circuit in Figure 3.12. Transistors T1 and T2 create the logical complement of G. Similarly, transistors T3 and T4 create the complement of A. An input of G=0 causes the driver to be active. In this case, both T5 and T8 will be on. With T5 and T8 on, the circuit behaves like a cascade of two NOT gates, so the output Y equals the input A. However, if the input G=1, both T5 and T8 will be off. Since T5 is in series with the +5 V, and T8 is in series with the ground, the output Y will be neither high nor low; i.e., it will float.
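The rule that at most one gate may be active at a time can be illustrated with a behavioral model of a shared wire. This is a sketch under assumptions: the HIZ and CONFLICT markers and both function names are hypothetical, and real hardware would experience a short circuit rather than return an error code.

```c
#include <stdint.h>

#define HIZ      (-1)   // hypothetical marker: output is floating
#define CONFLICT (-2)   // hypothetical marker: two drivers active on one wire

// One tristate driver: active when g is 0 (negative logic), floating when g is 1.
int Tristate(int a, int g){
  return (g == 0) ? (a & 1) : HIZ;
}

// Resolve the level on a wire shared by n tristate drivers.
int Bus_Resolve(const int a[], const int g[], int n){
  int result = HIZ;
  for(int i = 0; i < n; i++){
    int y = Tristate(a[i], g[i]);
    if(y == HIZ) continue;              // this driver is not active
    if(result != HIZ) return CONFLICT;  // more than one driver is active
    result = y;
  }
  return result;
}
```

The Tristate function reproduces the A, G, and Y columns of Table 3.15, and Bus_Resolve shows why a shared signal is well defined only when a single gate is low.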

3.7 Shift Operations

When programming in C, the shift is a binary operation. In other words, the << and >> operators take two inputs and yield one output, e.g., r = m >> n. But at the machine level (i.e., assembly programming), the shift operators are actually unary operations, e.g., r = m >> 1. The assembly instructions used for shifting will shift one bit at a time. If you want to shift multiple times, you will have to execute the instruction multiple times. The logical shift right (LSR) is equivalent to an unsigned divide by 2, as shown in Figure 3.13. A zero is shifted into the most significant position, and the carry flag will hold the bit shifted out.

Figure 3.13 8-bit logical shift right.

Consider the top row of 8 D flip-flops of Figure 3.14 as a register containing an 8-bit value. The LSR function can be implemented in hardware as a two-step process. The first step, which occurs on the falling edge of shift (rising edge of copy), is to make a copy of the 8 bits into the lower row of D flip-flops. Then, on the rising edge of the shift signal, the new shifted value is clocked back into the top row.

Figure 3.14 8-bit logical shift right hardware.

[Figure 3.14: a top row of D flip-flops holding b7 through b0 plus the carry C, clocked by shift, with a 0 feeding b7; a lower row of D flip-flops, clocked by copy, holds the temporary copy.]

The arithmetic shift right (ASR) is the equivalent to a signed divide by 2, as shown in Figure 3.15. Notice that the sign bit is preserved and the carry flag will hold the bit shifted out. Figure 3.15 8-bit arithmetic shift right.


Checkpoint 3.30: Use D flip-flops like Figure 3.14 to build an 8-bit ASR function.

The same shift left operation works for both unsigned and signed multiply by 2, as shown in Figure 3.16. In other words, the arithmetic shift left (ASL) is identical to the logical shift left (LSL). A zero is shifted into the least significant position, and the carry bit will contain the bit that was shifted out. Figure 3.16 8-bit shift left.


The roll operations can be used to create multiple-byte shift functions. Roll right and roll left are shown in Figure 3.17. In each case, the carry is shifted into the 8-bit byte, and the carry bit will contain the bit that was shifted out.

Figure 3.17 8-bit roll right and 8-bit roll left.

The simplest way to perform a shift operation on the microcomputer is to use a register like Register A or Register B. The asla and lsla instructions have identical machine codes. The two assembly language names allow the programmer to write clearer code (using lsla for unsigned numbers and asla for signed numbers). The shift instructions use inherent addressing. The N bit is set if the result is negative. The Z bit is set if the result is zero. The V bit is set on a signed overflow, which is detected by a change in the sign bit. The C bit is the carry out after the shift.

asla  ;RegA=RegA*2  Signed shift left, same as lsla
aslb  ;RegB=RegB*2  Signed shift left, same as lslb
asld  ;RegD=RegD*2  Signed shift left, same as lsld
lsla  ;RegA=RegA*2  Unsigned shift left, same as asla
lslb  ;RegB=RegB*2  Unsigned shift left, same as aslb
lsld  ;RegD=RegD*2  Unsigned shift left, same as asld
asra  ;RegA=RegA/2  Signed shift right
asrb  ;RegB=RegB/2  Signed shift right
asrd  ;RegD=RegD/2  Signed shift right
lsra  ;RegA=RegA/2  Unsigned shift right
lsrb  ;RegB=RegB/2  Unsigned shift right
lsrd  ;RegD=RegD/2  Unsigned shift right
rola  ;Rotate RegA (C←A7←...←A0←C)
rolb  ;Rotate RegB (C←B7←...←B0←C)
rora  ;Rotate RegA (C→A7→...→A0→C)
rorb  ;Rotate RegB (C→B7→...→B0→C)
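The effect of each 8-bit shift on the data and on the carry can be modeled in C. The following is a sketch, not the processor's definition: the function names are hypothetical, the carry is passed through a pointer to mimic the C bit, and the N, Z, and V bits are not modeled. The last function shows how a shift of the low byte followed by a roll of the high byte produces a multiple-byte shift.

```c
#include <stdint.h>

// Each function returns the shifted byte and writes the bit shifted out to *carry.
uint8_t Lsr8(uint8_t x, uint8_t *carry){   // 0 enters bit 7
  *carry = x & 0x01;
  return x >> 1;
}
uint8_t Asr8(uint8_t x, uint8_t *carry){   // sign bit is preserved
  *carry = x & 0x01;
  return (uint8_t)((x >> 1) | (x & 0x80));
}
uint8_t Lsl8(uint8_t x, uint8_t *carry){   // 0 enters bit 0
  *carry = (x >> 7) & 0x01;
  return (uint8_t)(x << 1);
}
uint8_t Rol8(uint8_t x, uint8_t *carry){   // old carry enters bit 0
  uint8_t in = *carry & 0x01;
  *carry = (x >> 7) & 0x01;
  return (uint8_t)((x << 1) | in);
}
uint8_t Ror8(uint8_t x, uint8_t *carry){   // old carry enters bit 7
  uint8_t in = *carry & 0x01;
  *carry = x & 0x01;
  return (uint8_t)((x >> 1) | (in << 7));
}

// 16-bit shift left: shift the low byte, then roll the carry into the high byte.
uint16_t Lsl16(uint8_t hi, uint8_t lo){
  uint8_t c;
  lo = Lsl8(lo, &c);
  hi = Rol8(hi, &c);
  return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

For example, Asr8 applied to $81 (which is -127) yields $C0 (which is -64) with the carry set, while Lsr8 applied to the same pattern yields $40.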

Example 3.7 Write assembly code to implement M = N >> 2, where M and N are 16-bit unsigned variables.

Solution We need to use a 16-bit register, because we have 16-bit data. First, we perform a 16-bit read, bringing N into Register D. Second, we divide by 4 using two shift right operations, and lastly we store the result into M. Since the value gets smaller, no overflow can occur. If the variables were signed, then the two lsrd instructions should be replaced with asrd instructions.

      ldd  N
      lsrd
      lsrd
      std  M

Checkpoint 3.31: Let N and M be 8-bit signed locations. Write assembly code to implement M=4*N.

Maintenance Tip: Use the asla instruction when manipulating signed numbers, and use the lsla instruction when shifting unsigned numbers.

Example 3.8 Take two 4-bit nibbles and combine them into one 8-bit value. Solution The solution uses the shift operation to move the bits into position, then it uses the or operation to combine the two parts into one number. Let High and Low be the unsigned 4-bit


components, which will be combined into a single unsigned 8-bit Result. We will assume both High and Low are bounded within the range of 0 to 15. The expression High127 R=127


The C code in Program 3.3 adds and subtracts two 8-bit signed numbers. The compiler will automatically promote A and B to signed 16-bit values before the addition. Program 3.3 Using promotion to detect and compensate for signed overflow errors.

char A,B,R; void add(void){ short result = A+B; /* if(result>127){ /* result = 127; /* } if(result127){ /* result = 127; /* } if(result -1234)isGreater();

Homework 5.3 Assume you have an 8-bit unsigned global variable G. Write assembly code that implements
if(G < 50) isLess(); else isMore();

Homework 5.4 Assume you have a 16-bit signed global variable H. Write assembly code that implements
if(H < -500) isLess(); else isMore();

Homework 5.5 Assume you have an 8-bit global variable G. Write assembly code that implements
while(G&0x80)body();

Homework 5.6 Write assembly code that implements
while(PTT&0x01)body();

Homework 5.7 You will write four assembly language versions of the following C code
n=100; while(n!=0){n--; body();}
a) Assume the variable n is implemented as a 16-bit global variable.
b) Assume the variable n is implemented as an 8-bit global variable.
c) Assume the variable n is implemented as a 16-bit variable in Register D.
d) Assume the variable n is implemented as an 8-bit variable in Register A.

Homework 5.8 You will write four assembly language versions of the following C code
n=0; while(n3

where the divide by 8 is integer math without rounding. Notice that if J is less than or equal to 31, then J divided by 8 will be less than or equal to 3. Let K be the bottom three bits of J. K = J&0x07;

A mask will specify the bit location within the byte. In C, the following array can be used unsigned const char Masks[8]={0x80,0x40,0x20,0x10,0x08,0x04,0x02, 0x01};

In assembly, this array can be defined in ROM as Masks fcb $80,$40,$20,$10,$08,$04,$02,$01

Recall that K is the bottom three bits of J. For example, if K is 010 in binary (K = 2), then we use the bit mask of $20 to access the information stored in the appropriate byte of the Video buffer
mask = Masks[K];

Program 6.11 takes the row and column index values and calculates the memory address and bit mask to access that bit in the Video matrix. Access is a private function for this module. A helper function is another name for a private function: it is used inside a module but is not called by software outside the module. Conversely, the other four functions of this module are public. Functions to clear, set, and toggle bits in the Video matrix are shown in Program 6.12. For all four public functions, the parameters I and J are passed by value, and the video buffer itself is a private global within this module. A function that tests the current value within the matrix is shown in Program 6.13.

In order for the image to appear on the display, there must be a hardware interface that translates the data in the video buffer onto the graphics hardware. A typical way this translation occurs is for the video buffer to exist in the display hardware itself. The software reads and writes this buffer in a similar way as described in this example. The graphics hardware is then responsible for copying the data from the buffer onto the display.

Program 6.11 A helper function to access a bit matrix.

; ********* Access ***************
; Access the Video bit at (I,J)
;Input:  Reg A is the row index(I is 0 to 11)
;        Reg B is the column index(J is 0 to 31)
;Output: Reg X points to the byte of interest
;        Reg A is the Mask to access that bit
Access lsla
       lsla           ;4*I
       pshb           ;save a copy of J
       lsrb
       lsrb
       lsrb           ;Reg B = J>>3
       aba            ;Reg A = 4*I + (J>>3)
       ldx  #Video
       tab
       abx            ;Reg X = Video + 4*I + (J>>3)
       pulb           ;Reg B = J again
       andb #$07      ;Reg B = K (bottom three bits of J)
       ldy  #Masks
       ldaa B,Y       ;Reg A = mask = Masks[K]
       rts

Program 6.12 Functions that modify the bit matrix.

; Clear the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_ClrBit
       bsr  Access
       coma           ;Not(mask) zero in bit location
       anda 0,x       ;Clear bit
       staa 0,x
       rts
; Set the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_SetBit
       bsr  Access
       oraa 0,x       ;Set bit
       staa 0,x
       rts
; Invert the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_InvBit
       bsr  Access
       eora 0,x       ;Flip bit
       staa 0,x
       rts
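For comparison, the bit-matrix helper and the functions that clear, set, invert, and test a bit might be written in C roughly as follows. This is a sketch, not the book's code: it assumes a 12-row by 32-column matrix stored four bytes per row, as implied by the assembly comments, and it returns the mask through a pointer parameter because a C function returns only one value.

```c
#include <stdint.h>

#define ROWS 12
#define COLS 32
uint8_t Video[ROWS*COLS/8];    // 48 bytes, 4 bytes per row (assumed layout)
const uint8_t Masks[8] = {0x80,0x40,0x20,0x10,0x08,0x04,0x02,0x01};

// Helper: returns a pointer to the byte holding bit (i,j) and sets *mask.
static uint8_t *Access(uint8_t i, uint8_t j, uint8_t *mask){
  *mask = Masks[j & 0x07];           // K is the bottom three bits of J
  return &Video[4*i + (j >> 3)];     // byte index is 4*I + J/8
}
void Display_ClrBit(uint8_t i, uint8_t j){
  uint8_t m; uint8_t *p = Access(i, j, &m);
  *p &= (uint8_t)~m;                 // friendly clear
}
void Display_SetBit(uint8_t i, uint8_t j){
  uint8_t m; uint8_t *p = Access(i, j, &m);
  *p |= m;                           // friendly set
}
void Display_InvBit(uint8_t i, uint8_t j){
  uint8_t m; uint8_t *p = Access(i, j, &m);
  *p ^= m;                           // flip one bit
}
uint8_t Display_ReadBit(uint8_t i, uint8_t j){
  uint8_t m; uint8_t *p = Access(i, j, &m);
  return (uint8_t)((*p & m) != 0);   // 1 if the bit is set, 0 otherwise
}
```

Setting the bit at row 2, column 10 touches byte 4*2+1 = 9 with mask $20, which matches the K = 2 example given earlier.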

Program 6.13 A function that reads the bit matrix.

; Read the Video bit at (I,J)
;Input:  Reg A is the row index(I is 0 to 11)
;        Reg B is the column index(J is 0 to 31)
;Output: Reg CC zero bit is the value read from the array
;        Reg A is zero or not zero depending on the bit
Display_ReadBit
       bsr  Access
       anda 0,x       ;Z=1 if bit was zero, Z=0 if bit was one
       rts

6.5 Structures

A structure has elements with different types and/or precisions. In C, we use struct to define a structure. The const modifier causes the structure to be allocated in ROM. Without the const, the C compiler will place the structure in RAM, allowing it to be dynamically changed. In the example shown in Figure 6.13, Name is a variable-length ASCII string, but as you can see, we have to specify its maximum size.

const struct port{
  unsigned char AndMask;   // bits that can change
  unsigned char OrMask;    // bits that must stay high
  unsigned char *Addr;     // Port Address
  unsigned char Name[10];  // ASCII string
};
typedef const struct port portType;
portType PortT={0x15,0x82,(unsigned char *)0x0240,"PTT"};

Figure 6.13 A structure collects objects of different sizes into one object.

Address  Contents              Field
$F950    $15                   AndMask
$F951    $82                   OrMask
$F952    $0240                 Addr
$F954    "PTT",0,0,0,0,0,0,0   Name

Checkpoint 6.13: Most C compilers will align 16-bit elements within structures to an even address. How would Figure 6.13 have been different if the positions of OrMask and Addr had been reversed?

In Program 6.14, we can use the equ pseudo-op to make our software more readable. The subroutine Port_Out uses call by reference for the port structure and call by value for the data written to the port. Program 6.14 Assembly language example of a structure.

AndMask equ 0
OrMask  equ AndMask+1
Addr    equ OrMask+1
Name    equ Addr+2
; Reg A = data to output
; Reg X = pointer to Port structure
Port_Out psha
        anda AndMask,x  ;modify input with andmask
        oraa OrMask,x   ;modify input with ormask
        ldy  Addr,x     ;get Port address
        staa 0,y        ;output
        leax Name,x     ;pointer to string
        jsr  OutString  ;print string
        pula
        rts
;********************************
PortT   fcb  $15,$82    ;AndMask,OrMask
        fdb  $0240      ;pointer to PTT
        fcc  "PTT"      ;string
        fcb  0,0,0,0,0,0,0
main    lds  #$4000
        movb #$FF,DDRT
        ldaa #$00       ;data
loop    ldx  #PortT     ;pointer to structure
        bsr  Port_Out
        inca
        bra  loop

Without the const, the C compiler will place the structure in RAM, allowing it to be dynamically changed. If the structure resides in RAM, then the system will have to initialize it explicitly via software execution. Again, most C compilers will implicitly initialize variable structures.
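A C counterpart to the structure-based output routine might look like the following sketch. The names PortSketch, Port_Out_Sketch, FakePTT, and PortT_Sketch are hypothetical, the port register is replaced by an ordinary variable so the code runs on a desktop compiler, and printing of the name string is omitted.

```c
#include <stdint.h>

typedef struct {
  uint8_t AndMask;         // bits that can change
  uint8_t OrMask;          // bits that must stay high
  volatile uint8_t *Addr;  // port address
  char Name[10];           // ASCII string
} PortSketch;

// Call by reference for the structure, call by value for the data.
// Returns the value actually written to the port.
uint8_t Port_Out_Sketch(const PortSketch *pt, uint8_t data){
  data = (uint8_t)((data & pt->AndMask) | pt->OrMask);
  *(pt->Addr) = data;
  return data;
}

// Desktop stand-in for PTT; on the 9S12 Addr would be (volatile uint8_t *)0x0240.
volatile uint8_t FakePTT;
const PortSketch PortT_Sketch = {0x15, 0x82, &FakePTT, "PTT"};
```

Passing a pointer to the structure keeps the routine independent of which port it drives; a second port only requires a second constant structure.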

6.6 *Tables

A table is a collection of identically sized structures. Program 6.15 and Figure 6.14 show a table containing a simple data base. Each entry in the table records the name, life span, and the year of inauguration. The names are variable length, but a fixed size will be allocated so that each table entry will be exactly 36 bytes. The C compiler will fill the unused bytes in the Name field with zeros.

Program 6.15 A simple data base with three entries.

const struct entry{
  unsigned char Name[30];  // null-terminated string
  unsigned short life[2];  // birth year, year died
  unsigned short year;     // year of inauguration
};
typedef const struct entry entryType;
entryType Presidents[3]={
  {"George Washington",{1732,1799},1789},
  {"John Adams",{1735,1826},1797},
  {"Thomas Jefferson",{1743,1826},1801}
};

Figure 6.14 A table collects structures of same size into one object.

Name                  life[0]  life[1]  year
"George Washington"   1732     1799     1789
"John Adams"          1735     1826     1797
"Thomas Jefferson"    1743     1826     1801

Checkpoint 6.14: Why do elements of a table all have to be the same size?

Program 6.16 shows the assembly language definition of the data base. We use equ pseudo-ops to make the software more readable.

Program 6.16 The entries of a table written in assembly language.

NAME equ 0
LIFE equ NAME+30
YEAR equ LIFE+4
SIZE equ YEAR+2
Presidents
     fcb "George Washington",0
     fcb 0,0,0,0,0,0,0,0,0,0,0,0
     fdb 1732,1799
     fdb 1789
     fcb "John Adams",0
     fcb 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
     fdb 1735,1826
     fdb 1797
     fcb "Thomas Jefferson",0
     fcb 0,0,0,0,0,0,0,0,0,0,0,0,0
     fdb 1743,1826
     fdb 1801

To access the inauguration year of the second president in C, we could execute
theyear = Presidents[1].year;

This operation in assembly is
     ldd Presidents+SIZE+YEAR
     std theyear

If we wanted the year the third president died in C, we could execute
theyear = Presidents[2].life[1];

This operation in assembly is
     ldd Presidents+2*SIZE+LIFE+2
     std theyear

Program 6.17 shows an assembly language function that prints the name of the nth president. First it calculates the address of the nth entry (Presidents+n*SIZE). In general, the next step would be to add the offset (in this case NAME is zero). This program assumes SIZE*n is less than 256, because abx adds only the 8-bit Register B to Register X.

Program 6.17 A subroutine that prints the name of a president.

;Print the name of the nth entry
;Reg A is the index n ranging from 0 to 2
OutPresident
     ldx  #Presidents ;Reg X points to the table
     ldab #SIZE       ;36 bytes in each entry
     mul              ;Reg D = SIZE*n
     abx              ;Reg X = base + SIZE*n
     jsr  OutString   ;Prints name
     rts
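The address calculation base + n*SIZE + offset that the assembly performs by hand is what a C compiler generates for an array of structures. The sketch below makes that arithmetic explicit for the name field; the names ending in Sketch and the two accessor functions are hypothetical, and the exact size and padding of the structure depend on the compiler.

```c
#include <stdint.h>

typedef struct {
  char Name[30];       // null-terminated string, unused bytes are zero
  uint16_t life[2];    // birth year, year died
  uint16_t year;       // year of inauguration
} EntrySketch;

const EntrySketch PresidentsSketch[3] = {
  {"George Washington", {1732,1799}, 1789},
  {"John Adams",        {1735,1826}, 1797},
  {"Thomas Jefferson",  {1743,1826}, 1801}
};

// Explicit version of base + n*SIZE + NAME, where the NAME offset is zero.
const char *President_Name(uint8_t n){
  const uint8_t *base = (const uint8_t *)PresidentsSketch;
  return (const char *)(base + n*sizeof(EntrySketch));
}

// The compiler performs the same arithmetic for the other fields.
uint16_t President_YearDied(uint8_t n){
  return PresidentsSketch[n].life[1];
}
```

Because every entry has the same size, the nth entry can be located with one multiply and one add, which is the reason table elements must be identically sized.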

The table, shown in Program 6.18, contains five identically formatted structures. Each structure (e.g., PORTA) contains five entries: an 8-bit ASCII character, two pointers, and two byte values. Again, the equ pseudo-ops clarify access to the table. It could be used to write an I/O port driver, separating the high-level software from the low-level hardware.

Program 6.18 A table containing the information about some 9S12 I/O ports.

PortChar equ 0     ;ASCII character specifying port name
DataPT   equ 1     ;Pointer to port address
DirPT    equ 3     ;Pointer to direction register
InitDir  equ 5     ;8-bit initial value of direction reg
InitData equ 6     ;8-bit initial value to output
Table
PORTA fcb 'A'      ;Port A
      fdb $0000    ;Address of PortA
      fdb $0002    ;DDRA
      fcb 0,0      ;Initially input
PORTB fcb 'B'      ;Port B
      fdb $0001    ;Address of PortB
      fdb $0003    ;DDRB
      fcb $FF,$55  ;Initially output=$55
PORTJ fcb 'J'      ;Port J
      fdb $0268    ;Address of PortJ
      fdb $026A    ;DDRJ
      fcb 0,0      ;Initially input
PORTM fcb 'M'      ;Port M
      fdb $0250    ;Address of PortM
      fdb $0252    ;DDRM
      fcb 0,0      ;Initially input
PORTT fcb 'T'      ;Port T
      fdb $0240    ;Address of PortT
      fdb $0242    ;DDRT
      fcb $FF,$00  ;Initially output=$00

6.7 *Trees

A graph is a general linked structure without limitations, see Figure 6.15. An acyclic graph is a linked structure without loops. Although there may be multiple pathways to access a node in an acyclic graph, all paths have a finite length. A tree is an acyclic graph with a single root node from which a unique path to each node can be traced. The pointers in an acyclic graph do not form closed circles.

Figure 6.15 Graphs and trees have nodes and are linked with pointers. [Panels: Graph, Acyclic graph, Tree.]

Figure 6.16 shows that an arbitrary tree can have a variable number of leaves, while a binary tree consists of nodes with exactly two pointers (i.e., links or branches). One way to implement an arbitrary tree is to place a null-terminated list of pointers in each node. For the binary tree, each node has exactly two links, and we use a null pointer to specify the link is not valid.

Checkpoint 6.15: Neglecting shortcuts and the StartMenu for now, what type of organization best describes the file structure on the Windows OS?

Checkpoint 6.16: Shortcuts and the StartMenu on the Windows OS allow for files and programs to be accessed in multiple ways. Observe the properties of a shortcut on your computer. Does Windows OS implement an acyclic graph?

Checkpoint 6.17: If you made an electronic dictionary where each word in the definition portion of an entry was linked to its definition, what type of structure would you have?

Figure 6.16 A tree can be constructed with only down arrows, and there is a unique path to each node.

[Figure 6.16 panels: an arbitrary tree, in which each node holds Info and a list with 0, 1, 2, ... links; and a binary tree, in which each node holds Info and exactly 2 links, with null marking an unused link.]

A null pointer signifies the end or leaf of the tree. Since each node of a tree has exactly one pointer to it, there is a unique path from the root to each node. One application of a tree is dictionary storage, as shown in Figure 6.17. Each word is stored as a node in the tree. The position of each word in the tree is determined from its alphabetical order. In this simple dictionary, each node contains a name that is a single letter and a value that is an 8-bit number. The binary tree is sorted by name, meaning elements alphabetically before this node can be found using the first link, and elements alphabetically after this node can be found using the second link. Figure 6.17 A binary tree is constructed so that earlier elements are to the left and later ones to the right.

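[Figure 6.17: Root points to S ($88). The left link of S points to F ($84) and its right link points to V ($86). The left link of F points to A ($8B), and the left link of V points to T ($8A). All other links are null.]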

Program 6.19 shows the definition of the tree structure drawn in Figure 6.17. If the dictionary is static, then we can define it in ROM. If it needs to be dynamic, then it must be allocated in RAM and initialized at run time. In Program 6.19, the tree is implemented as a constant structure.

Program 6.19 Definition of a simple binary tree.

Name  equ 0        ;name of the node
Data  equ 1        ;data for this node
Left  equ 2        ;pointer to son
Right equ 4        ;pointer to son
Root  equ WS       ;Pointer to top
NULL  equ 0        ;undefined address
WS    fcb 'S',$88  ;name,data
      fdb WF       ;Left son
      fdb WV       ;Right son
WV    fcb 'V',$86
      fdb WT       ;WT is a left son
      fdb NULL     ;no right son
WT    fcb 'T',$8A
      fdb NULL     ;no children
      fdb NULL
WF    fcb 'F',$84
      fdb WA       ;WA is a left son
      fdb NULL     ;no right son
WA    fcb 'A',$8B
      fdb NULL     ;no children
      fdb NULL

#define NULL 0
const struct Node{
  unsigned char Name;
  unsigned char Data;
  const struct Node *Left;
  const struct Node *Right;};
typedef const struct Node NodeType;
typedef NodeType * NodePtr;
#define Root WS
#define WS &Tree[0]
#define WV &Tree[1]
#define WT &Tree[2]
#define WF &Tree[3]
#define WA &Tree[4]
NodeType Tree[5]={
  { 'S',0x88, WF, WV},
  { 'V',0x86, WT, NULL},
  { 'T',0x8A, NULL, NULL},
  { 'F',0x84, WA, NULL},
  { 'A',0x8B, NULL, NULL}};

Program 6.20 presents assembly and C functions that search the binary tree. To look up a word in this dictionary, one starts at the root. The following sequence is repeated until the entry is found (success) or a null pointer is reached (failure). If the current name matches, then the search quits, returning the data (its definition) at that node. If the current word is not correct, then we will search left or right. If the look up word is less than the current word, go left. If the look up word is greater than the current word, go right. The program quits with a false result if the pointer becomes null.

;Inputs:  Reg A = look up letter
;Outputs: Reg A=0 if not found,
;               =data if found
; If fails RegY=>last link
Look  ldy  #Root
      ldx  0,y       ;start at root
loop  cpx  #NULL
      beq  fail
      cmpa Value,x   ;Match
      beq  found     ;Skip if found
      blo  golft
      leay Right,x   ;letter>value
      ldx  0,y       ;go right
      bra  loop
golft leay Left,x    ;letterValue == letter){
      return(pt->Data); // good
    }
    if(pt->Value < letter){
      pt = pt->Right;
    }
    else{
      pt = pt->Left;
    }
  }
  return 0; /* not in tree */
}

Program 6.20 Binary tree search functions.

In order to add and remove nodes at run time, the tree must be defined in RAM. Program 6.21 shows how to insert a new word into the dictionary. One first searches for the word (the search should fail), then changes the null pointer to point to the new node. If the search fails in the previous Look subroutine, Reg Y contains the address of the null pointer to be changed.

Program 6.21 Program to add a node to a binary tree.

; Inputs : Reg Y points to a new word to be added to the dictionary
;   the new word is already somewhere in memory formatted e.g.,
;      fcb 'J',6
;      fdb NULL
;      fdb NULL
New   pshy
      ldaa 0,Y     ;Reg A is the name of the new word
      bsr  Look
      pulx         ;RegX points to new node to add
      tsta
      bne  OK      ;skip if already defined
      stx  0,Y     ;link into existing tree
OK    rts
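The search and insert steps described above can be sketched in C for a tree that lives in RAM. This is a minimal illustration and not the book's Program 6.20 or 6.21; the type NodeS and the functions Tree_Look and Tree_Add are hypothetical names. The pointer-to-pointer variable link plays the role that Reg Y plays in the assembly: it holds the address of the null pointer that must be changed.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct NodeS {
  uint8_t Name;          // single-letter key
  uint8_t Data;          // 8-bit value
  struct NodeS *Left;    // names alphabetically before this node
  struct NodeS *Right;   // names alphabetically after this node
} NodeS;

// Returns the data for letter, or 0 if the letter is not in the tree.
uint8_t Tree_Look(const NodeS *pt, uint8_t letter){
  while(pt != NULL){
    if(pt->Name == letter) return pt->Data;
    pt = (letter < pt->Name) ? pt->Left : pt->Right;
  }
  return 0;
}

// Links newNode into the tree; returns 1 on success, 0 if the name already exists.
int Tree_Add(NodeS **root, NodeS *newNode){
  NodeS **link = root;             // address of the pointer that may change
  while(*link != NULL){
    if((*link)->Name == newNode->Name) return 0;
    link = (newNode->Name < (*link)->Name) ? &(*link)->Left : &(*link)->Right;
  }
  newNode->Left = newNode->Right = NULL;
  *link = newNode;                 // replace the null pointer with the new node
  return 1;
}
```

Adding S, F, V, A, and T in that order reproduces the tree of Figure 6.17, and adding J afterwards makes J the right son of F.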

Figure 6.18 shows the binary tree as the nodes J, U, and G are added to the dictionary. Notice that after J and U are added, there is something inefficient about this tree of depth 4 and size 7. A binary tree of depth n is capable of holding 2^n - 1 nodes. A binary tree is full if it has depth n and contains from 2^(n-1) to 2^n - 1 nodes. The tree in Figure 6.18 after J and U are added is not full; the other three trees are full.

Figure 6.18 Nodes are added to a binary tree such that the alphabetical order is maintained.

[Figure 6.18: four binary trees. Initial tree: S with sons F and V, A under F, and T under V. After adding J: J becomes the right son of F. After adding U: U becomes the right son of T. After adding G: G becomes the left son of J.]

This may seem like a lot of trouble for such a simple problem. However, the search time for a binary tree increases as the log2 of the size of the dictionary (more precisely, the search time increases linearly with the depth of the tree). For a simple linear structure (e.g., table or linked list), the search time increases linearly with the dictionary size. When the dictionary is millions of words, the time savings can be extraordinary. There are similar savings in the insertion and deletion times. The dynamic efficiency (execution speed) is enhanced at the cost of static efficiency (memory storage).

Checkpoint 6.18: Consider the problem of designing a large address book where each entry has a first name, a last name, and an address field. You wish to be able to search the data base both by first name and by last name. How do you organize the structure?

6.8 Finite-State Machines with Statically Allocated Linked Structures

6.8.1 Abstraction

Software abstraction allows us to define a complex problem with a set of basic abstract principles. If we can construct our software system using these abstract building blocks, then we have a better understanding of both the problem and its solution. This is because we can separate what we are doing (policies) from the details of how we are getting it done (mechanisms.) This separation also makes it is easier to optimize. Abstraction provides for a proof of correct function, and simplifies both extensions and customization. The abstraction presented in this section is the Finite-State Machine (FSM.) The abstract principles of FSM development are the inputs, outputs, states, and state transitions. The FSM state graph defines the time-dependent relationship between its inputs and outputs. If we can take a complex problem and map it into a FSM model, then we can solve it with a simple FSM software tools. Our FSM software implementation will be easy to understand, debug, and modify. Other examples of software abstraction include Proportional Integral Derivative digital controllers, fuzzy logic digital controllers, neural networks, and linear systems of differential equations. In each case, the problem is mapped into well-defined model with a set of abstract yet powerful rules. Then, the software solution is a matter of implementing the rules of the model. In our case, once we prove our software correctly solves one FSM, then we can make changes to the state graph and be confident that our software solution correctly implements the new FSM. The FSM controller employs a well-defined model or framework with which we solve our problem. The state graph will be specified using either a linked or table data structure. An important aspect of this method is to create a one-to-one mapping from the state graph into the data structure. 
The three advantages of this abstraction are (1) it can be faster to develop because many of the building blocks preexist; (2) it is easier to debug (prove correct) because it separates conceptual issues from implementation; and (3) it is easier to change. In a Moore FSM, the output value depends only on the current state, and the inputs affect the state transitions. On the other hand, the outputs of a Mealy FSM depend both on the current state and the inputs. When designing a FSM, we begin by defining what constitutes a state. In a simple system like a single intersection traffic light, a state might be defined as the pattern of lights (i.e., which lights are on and which are off). In a more sophisticated traffic controller, what it means to be in a state might also include information about traffic volume at this and other adjacent intersections. The next step is to make a list of the various states in which the system might exist. As in all designs, we add outputs so the system can affect the external environment and inputs so the system can collect information about its environment or receive commands as needed.

The execution of a Moore FSM repeats this sequence over and over:
1. Perform output, which depends on the current state
2. Wait a prescribed amount of time (optional)
3. Input
4. Go to next state, which depends on the input and the current state

The execution of a Mealy FSM repeats this sequence over and over:
1. Wait a prescribed amount of time (optional)
2. Input
3. Perform output, which depends on the input and the current state
4. Go to next state, which depends on the input and the current state

There are other possible execution sequences. Therefore, it is important to document the sequence before the state graph is drawn. The high-level behavior of the system is defined by the state graph. The states are drawn as circles. Descriptive state names help explain what the machine is doing. Arrows are drawn from one state to another and labeled with the input value causing that state transition.
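The two execution sequences above differ only in where the output is computed. The following is a minimal, hardware-independent sketch in C, assuming a hypothetical two-state machine with a one-bit input. The names MooreStep and MealyStep and the three tables are illustrative only and are not part of the programs in this chapter; the optional wait is omitted.

```c
#include <assert.h>

#define NUM_STATES 2
#define NUM_INPUTS 2

/* Hypothetical two-state machine, used only for illustration. */
static const unsigned char NextState[NUM_STATES][NUM_INPUTS] = {
  {0, 1},   /* state 0: stay on input 0, go to state 1 on input 1 */
  {0, 1}    /* state 1: go to state 0 on input 0, stay on input 1 */
};

/* Moore: one output per state. */
static const unsigned char MooreOut[NUM_STATES] = {0x00, 0x01};

/* Mealy: one output per (state, input) pair. */
static const unsigned char MealyOut[NUM_STATES][NUM_INPUTS] = {
  {0x00, 0x01},
  {0x02, 0x00}
};

/* Moore step: the output depends on the current state only;
   the input is used just to select the next state. */
unsigned char MooreStep(unsigned char *state, unsigned char input){
  unsigned char out;
  assert(*state < NUM_STATES && input < NUM_INPUTS);
  out = MooreOut[*state];
  *state = NextState[*state][input];
  return out;
}

/* Mealy step: the output depends on both the input and the current state. */
unsigned char MealyStep(unsigned char *state, unsigned char input){
  unsigned char out;
  assert(*state < NUM_STATES && input < NUM_INPUTS);
  out = MealyOut[*state][input];
  *state = NextState[*state][input];
  return out;
}
```

In both functions the next state is found the same way; only the lookup of the output changes, which is why the choice between Moore and Mealy is mostly a question of what defines a state.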

6.8 䡲 Finite-State Machines with Statically Allocated Linked Structures


Observation: If the machine is such that a specific output value is necessary “to be a state”, then a Moore implementation will be more appropriate.

Observation: If the machine is such that no specific output value is necessary “to be a state”, but rather the output is required to transition the machine from one state to the next, then a Mealy implementation will be more appropriate.

A linked structure consists of multiple identically structured nodes. Each node of the linked structure defines one state. One or more of the entries in the node is a pointer (or link) to other nodes. In an embedded system, we usually use statically allocated fixed-size linked structures, which are defined at compile time and exist throughout the life of the software. In a simple embedded system, the state graph is fixed, so we can store the linked data structure in nonvolatile memory. For complex systems where the control functions change dynamically (e.g., the state graph itself varies over time), we could implement dynamically allocated linked structures, which are constructed at run time and where the number of nodes can grow and shrink in time. We can also use a table structure to define the state graph, which consists of multiple contiguous, identically structured elements. Each element of the table defines one state. One or more of the entries is an index to other elements. An important factor when implementing FSMs is that there should be a clear and one-to-one mapping between the FSM state graph and the data structure. That is, there should be one element of the structure for each state. If each state has four arrows, then each node of the linked structure should have four links.

6.8.2 Moore Finite-State Machines

In a Moore FSM, the outputs are a function of only the current state. In contrast, the outputs are a function of both the input and the current state in a Mealy FSM. Often, in a Moore FSM, the specific output pattern defines what it means to be in the current state. In the first example, the inputs and outputs are simple binary numbers read from and written to a parallel port.

Example 6.6 Design a traffic-light controller for the intersection of two equally busy one-way streets. The goal is to maximize traffic flow, minimize waiting time at a red light, and avoid accidents.

Solution The intersection has two one-way roads with the same amount of traffic: North and East, as shown in Figure 6.19. Controlling traffic is a good example, because we all know what is supposed to happen at the intersection of two busy one-way streets. We begin the design by defining what constitutes a state. In this system, a state describes which road has authority to cross the intersection. The basic idea, of course, is to prevent Southbound cars from entering the intersection at the same time as Westbound cars. In this system, the light pattern defines which road has right of way over the other. Since an output pattern to the lights is necessary to remain in a state, we will solve this system with a Moore FSM. It will have two inputs (car sensors on North and East roads) and six outputs (one for each light in the traffic signal).

Figure 6.19 Traffic light interface. [Port T of the 9S12: PT1 and PT0 are the sensor inputs; PT7 through PT2 drive the red (R), yellow (Y), and green (G) lights for North and East.]

The six traffic lights are interfaced to Port T bits 7 to 2 and the two sensors are connected to Port T bits 1 to 0, such that
PT1=0, PT0=0 means no cars exist on either road
PT1=0, PT0=1 means there are cars on the East road
PT1=1, PT0=0 means there are cars on the North road
PT1=1, PT0=1 means there are cars on both roads

The next step in designing the FSM is to create some states. Again, the Moore implementation was chosen because the output pattern (which lights are on) defines which state we are in. Each state is given a symbolic name:
goN, PT7-2 = 100001 makes it green on North and red on East
waitN, PT7-2 = 100010 makes it yellow on North and red on East
goE, PT7-2 = 001100 makes it red on North and green on East
waitE, PT7-2 = 010100 makes it red on North and yellow on East

The output pattern for each state is drawn inside the state circle. The time to wait for each state is also included. How the machine operates will be dictated by the input-dependent state transitions. We create decision rules defining what to do for each possible input and for each state. For this design we can list heuristics describing how the traffic light is to operate:
If no cars are coming, we will stay in a green state, but which one doesn’t matter.
To change from green to red, we will implement a yellow light of exactly 5 seconds.
Green lights will last at least 30 seconds.
If cars are only coming in one direction, we will move to and stay green in that direction.
If cars are coming in both directions, we will cycle through all four states.

Before we draw the state graph, we need to decide on the sequence of operations.
1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
   a) Output to traffic lights, which depends on the state
   b) Delay, which depends on the state
   c) Input from sensors
   d) Change states, which depends on the state and the input

We implement the heuristics by defining the state transitions, as illustrated in Figure 6.20. Instead of using a graph to define the finite-state machine, we could have used a table, as shown in Table 6.4.

Figure 6.20 Graphical form of a Moore FSM that implements a traffic light. [State graph: each circle shows the state name, the output, and the wait time; each arrow is labeled with the inputs that cause the transition (next if input is ...).]

Table 6.4 Tabular form of a Moore FSM that implements a traffic light. Each state is listed with its (output, wait time in seconds).

State \ Input        00     01     10     11
goN (100001,30)      goN    waitN  goN    waitN
waitN (100010,5)     goE    goE    goE    goE
goE (001100,30)      goE    goE    waitE  waitE
waitE (010100,5)     goN    goN    goN    goN
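Table 6.4 can be checked on a desktop compiler before it is committed to the microcontroller. The sketch below is only an illustration under stated assumptions: TrafficNext and TrafficRun are hypothetical names, and the port I/O and the delays are omitted so that only the state transitions are exercised.

```c
#include <assert.h>

enum { goN, waitN, goE, waitE, NUM_STATES };

/* Next-state table copied row by row from Table 6.4.
   The column index is the sensor input:
   bit 1 = cars on the North road, bit 0 = cars on the East road. */
static const unsigned char TrafficNext[NUM_STATES][4] = {
  /*  00     01     10     11   */
  { goN,  waitN, goN,   waitN },  /* goN   */
  { goE,  goE,   goE,   goE   },  /* waitN */
  { goE,  goE,   waitE, waitE },  /* goE   */
  { goN,  goN,   goN,   goN   }   /* waitE */
};

/* Apply the same input for 'steps' transitions and return the final state. */
unsigned char TrafficRun(unsigned char state, unsigned char input, int steps){
  assert(state < NUM_STATES && input < 4);
  while(steps-- > 0){
    state = TrafficNext[state][input];
  }
  return state;
}
```

Holding the input at 00 should leave the machine in whichever green state it started in, holding it at 01 or 10 should settle in the green state for that road, and holding it at 11 should return to the starting state after four transitions. These are the heuristics listed before the state graph was drawn.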


The next step is to map the FSM graph onto a data structure that can be stored in EEPROM. Program 6.22 uses a linked data structure, where each state is a node, and state transitions are defined as pointers to other nodes. The four Next parameters define the input-dependent state transitions. The wait times are defined in the software as fixed-point decimal numbers with units of 0.01 seconds, giving a range of 10 ms to about 10 minutes. Using good labels makes the program easier to understand; in other words, goN is more descriptive than &fsm[0]. The main program begins by specifying the Port T bits 1 and 0 to be inputs. The initial state is defined as goN. The main loop of our controller first outputs the desired light pattern to the six LEDs, waits for the specified amount of time, reads the sensor inputs from Port T, then switches to the next state depending on the input data. The timer functions were presented earlier as Program 4.5. The function Timer_Wait10ms will wait 10 ms times the parameter in RegY, and not destroy Registers D or X. We could have eliminated the two shift-left instructions by storing the data in the structure already shifted.

;Linked data structure
      org $4000 ;Put in ROM
OUT   equ 0 ;offset for output
WAIT  equ 1 ;offset for time
NEXT  equ 3 ;offset for next
goN   fcb $21 ;North green, East red
      fdb 3000 ;30sec
      fdb goN,waitN,goN,waitN
waitN fcb $22 ;North yellow, East red
      fdb 500 ;5sec
      fdb goE,goE,goE,goE
goE   fcb $0C ;North red, East green
      fdb 3000 ;30 sec
      fdb goE,goE,waitE,waitE
waitE fcb $14 ;North red, East yellow
      fdb 500 ;5sec
      fdb goN,goN,goN,goN
Main  lds #$4000 ;stack init
      bsr Timer_Init ;enable TCNT
      ldaa #$FC ;PT7-2 are lights
      staa DDRT ;PT1-0 are sensors
      ldx #goN ;State pointer
FSM   ldab OUT,x ;Output value
      lslb
      lslb ;line up with 7-2
      stab PTT ;set lights
      ldy WAIT,x ;Time delay
      bsr Timer_Wait10ms
      ldab PTT ;Read input
      andb #$03 ;just bits 1,0
      lslb ;2 bytes/address
      abx ;add 0,2,4,6
      ldx NEXT,x ;Next state
      bra FSM
      org $FFFE
      fdb Main ;reset vector

// Linked data structure
const struct State {
  unsigned char Out;
  unsigned short Time;
  const struct State *Next[4];};
typedef const struct State STyp;
#define goN &FSM[0]
#define waitN &FSM[1]
#define goE &FSM[2]
#define waitE &FSM[3]
STyp FSM[4]={
  {0x21,3000,{goN,waitN,goN,waitN}},
  {0x22, 500,{goE,goE,goE,goE}},
  {0x0C,3000,{goE,goE,waitE,waitE}},
  {0x14, 500,{goN,goN,goN,goN}}};
void main(void){
  STyp *Pt; // state pointer
  unsigned char Input;
  Timer_Init();
  DDRT = 0xFC; // lights and sensors
  Pt = goN;
  while(1){
    PTT = Pt->Out<<2; // line up with PT7-2
    Timer_Wait10ms(Pt->Time); // wait
    Input = PTT&0x03; // sensor bits 1,0
    Pt = Pt->Next[Input]; // next state
  }
}

Program 6.22 Linked data structure implementation of the traffic-light controller.
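The remark that the two shift-left instructions could be eliminated can be illustrated with a small sketch. Assuming the same four 6-bit light patterns as Program 6.22, storing them already aligned with PT7-2 removes the run-time shift. The macro LIGHTS, the array OutShifted, and the function LightOutput are hypothetical names introduced here for illustration.

```c
#include <assert.h>

/* Align a 6-bit light pattern with Port T bits 7-2 at compile time. */
#define LIGHTS(pattern) ((unsigned char)((pattern) << 2))

/* Outputs for goN, waitN, goE, waitE, stored already shifted. */
static const unsigned char OutShifted[4] = {
  LIGHTS(0x21),  /* 0x84: North green,  East red    */
  LIGHTS(0x22),  /* 0x88: North yellow, East red    */
  LIGHTS(0x0C),  /* 0x30: North red,    East green  */
  LIGHTS(0x14)   /* 0x50: North red,    East yellow */
};

/* Value to write to PTT for state number n; no shift is needed at run time. */
unsigned char LightOutput(unsigned char n){
  assert(n < 4);
  return OutShifted[n];
}
```

With a table like this, the main loop would write the stored value directly to PTT. The trade-off is that the constants in the structure no longer match the 6-bit patterns drawn in the state graph, so the macro (or a comment) has to carry the one-to-one mapping.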


Program 6.23 implements the same traffic-light controller using a table data structure. In the linked data structure implementation, the Next parameters contained 16-bit pointers to the next state. In the table implementation, the Next parameters contain 8-bit indices specifying the index of the next state. In this machine, the Next field will be 0, 1, 2, or 3. Although each state only requires 7 bytes of storage, 8 bytes will be allocated to simplify the address calculations (it is easier to multiply by 8 than to multiply by 7).

;Table structure
      org $4000 ; Put in ROM
OUT   equ 1 ;offset for output
WAIT  equ 2 ;offset for time
NEXT  equ 4 ;offset for next
goN   equ 0 ;North green, East red
Fsm   fdb $21,3000 ;30sec
      fcb goN,waitN,goN,waitN
waitN equ 1 ;North yellow, East red
      fdb $22,500 ;5sec
      fcb goE,goE,goE,goE
goE   equ 2 ;North red, East green
      fdb $0C,3000 ;30 sec
      fcb goE,goE,waitE,waitE
waitE equ 3 ;North red, East yellow
      fdb $14,500 ;5sec
      fcb goN,goN,goN,goN
Main  lds #$4000 ;stack init
      bsr Timer_Init ;enable TCNT
      ldaa #$FC ;PT7-2 are lights
      staa DDRT ;PT1-0 are sensors
      ldab #goN ;State number n
FSM   ldx #Fsm
      tba
      lsla
      lsla
      lsla ;8*n
      leax a,x ;Fsm[n]
      ldaa OUT,x ;Output value
      lsla
      lsla ;line up with 7-2
      staa PTT ;set lights
      ldy WAIT,x ;Time delay
      bsr Timer_Wait10ms
      ldaa PTT ;Read input
      anda #$03 ;just bits 1,0
      leax a,x ;add 0,1,2,3
      ldab NEXT,x ;Next state
      bra FSM
      org $FFFE
      fdb Main ;reset vector

// Table implementation
const struct State {
  unsigned char Out;
  unsigned short Time;
  unsigned char Next[4];};
typedef const struct State STyp;
#define goN 0
#define waitN 1
#define goE 2
#define waitE 3
STyp FSM[4]={
  {0x21,3000,{goN,waitN,goN,waitN}},
  {0x22, 500,{goE,goE,goE,goE}},
  {0x0C,3000,{goE,goE,waitE,waitE}},
  {0x14, 500,{goN,goN,goN,goN}}};
void main(void){
  unsigned char n; // state number
  unsigned char Input;
  Timer_Init();
  DDRT = 0xFC; // lights and sensors
  n = goN;
  while(1){
    PTT = FSM[n].Out<<2; // line up with PT7-2
    Timer_Wait10ms(FSM[n].Time); // wait
    Input = PTT&0x03; // sensor bits 1,0
    n = FSM[n].Next[Input]; // next state
  }
}

[Assembly listing of Program 6.24, operand and comment fragments only: state; PTT; #$01 ;0,1; ;0,2; Time,X ;Time to wait; a,x; Out,x; PORTAB ;start motors; Timer_Wait1ms ;wait in ms; #0,PORTAB ;stop motors; Next,X ;next; loop; $FFFE; main]

struct State{
  unsigned short Time; // wait in ms
  unsigned short Out[2]; // if input=0,1
  struct State *Next[2]; // if input=0,1
};
typedef struct State StateType;
typedef StateType * StatePtr;
#define Trot1 &fsm[0]
#define Trot2 &fsm[1]
#define Trot3 &fsm[2]
#define Trot4 &fsm[3]
StateType fsm[4]={
  {500,{0,0x8484},{ Trot1, Trot2}},
  {500,{0,0x2121},{ Trot2, Trot3}},
  {500,{0,0x4848},{ Trot3, Trot4}},
  {500,{0,0x1212},{ Trot4, Trot1}}
};
void main(void){
  StatePtr Pt; // Current State
  unsigned char Input;
  Pt = Trot1; // Initial State
  DDRA = 0xFF; // Right legs
  DDRB = 0xFF; // Left legs
  DDRT &= ~0x01; // Trot switch
  Timer_Init();
  while(1){
    Input = PTT&0x01; // 0 or 1
    PORTAB = Pt->Out[Input]; // output
    Timer_Wait1ms(Pt->Time); // wait
    PORTAB = 0; // motors off
    Pt = Pt->Next[Input]; // next
  }
}

Program 6.24 Mealy FSM.

6.8.4 Functional Abstraction within Finite-State Machines

In the previous examples, the input was obtained by simply reading a parallel port. Similarly, the output was performed by writing to a parallel port. However, finite-state machines can be used in systems where the input and output processes are more complex. In this section, we will develop FSMs where the input is obtained by calling a function, which returns a number to be used by the FSM controller. Similarly, the output process will involve calling a function. The use of function calls adds a layer of abstraction between the high-level FSM and the low-level I/O occurring at the ports.


Example 6.8 Design a vending machine with two outputs (soda, change) and two inputs (dime, nickel).

Solution This vending machine example illustrates additional flexibility that we can build into our FSM implementations. In particular, rather than simple digital inputs, we will create an input subroutine that returns the current values of the inputs. Similarly, rather than simple digital outputs, we will implement general functions for each state. We could have solved this particular vending machine using the approach in the previous examples, but this approach provides an alternative mechanism when the input and/or output operations become complex. Our simple vending machine has two coin sensors: one for dimes and one for nickels. When a coin falls through a slot in the front of the machine, an electrical connection (modeled by a SPST switch) makes a connection between 5 V and a Port A input, as in Figure 6.23. If the digital input is high (1), this means there is a coin currently falling through the slot. When a coin is inserted into the machine, the sensor goes high, then low. Because of the nature of vending machines, we will assume there cannot be both a nickel and a dime at the same time. To create the soda and change dispensers, we will interface two solenoids to Port B. The coil current of the solenoids is less than 40 mA, so we can use the 7406 open collector driver. For example, if the software makes PB0 high, waits 10 ms, then makes PB0 low, one soda will be dispensed.

Figure 6.23 A simulated vending machine interfaced to a Freescale 9S12. [Input port: the dime switch connects +5 V to PA1 and the nickel switch connects +5 V to PA0, each with a 10 kΩ resistor. Output port: PB1 drives the change solenoid and PB0 drives the soda solenoid through a 7406 driver; each solenoid is powered from +12 V with a 1N914 diode.]

We need to decide on the sequence of operations before we draw the state graph:
1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
   a) Call an output function, which depends on the state
   b) Delay, which depends on the state
   c) Call an input subroutine to get the status of the coin sensors
   d) Change states, which depends on the state and the input

Figure 6.24 shows the Moore FSM that implements the vending machine. A soda costs 15 cents, and the machine accepts nickels and dimes. We have an input sensor to detect nickels (bit 0) and an input sensor to detect dimes (bit 1). We choose the wait time in each state to be 20 ms, which is smaller than the time it takes the coin to pass by the sensor. Waiting in each state will debounce the sensor, preventing multiple counting of a single event. Notice that we wait in all states, because the sensor may bounce both on touch and release. Each state also has a function to execute. The function Soda will trigger the Port B output so that a soda is dispensed. Similarly, the function Change will trigger the Port B output so that a nickel is returned. The M states refer to the amount of collected money. When we are in a W state, we have collected that much money, but we’re still waiting for the last coin to pass the sensor. For example, we start with no money in state M0. If we insert a dime, the input will go to binary 10, and our state machine will jump to state W10. We will stay in state W10 until the dime passes by the coin sensor. In particular, when the input goes to 00, we go to state M10. If we insert a second dime, the input will go to binary 10, and our state machine will jump to state W20. Again, we will stay in state W20 until this dime passes. When the input goes to 00, we go to state M20. Now we call the function Change and jump to state M15. Lastly, we call the function Soda and jump back to state M0.

Figure 6.24 This Moore FSM implements a vending machine. [State graph with nine states: M0, W5, M5, W10, M10, W15, M15, W20, and M20. Each circle shows the state name, the wait time (20), and the function (none, soda, or change); arrows are labeled with the inputs 00, 01, and 10.]

Since this is a layered system, we will begin by designing the low-level input/output functions that handle the operation of the sensors and solenoid, as in Program 6.25.

Coin_Init
      bclr DDRA,#$03 ;PA1,0 sensor in
      rts
Coin_Input ;0 means none
      ldaa PORTA ;1 means nickel
      anda #$03 ;2 means dime
      rts
Solenoid_Init
      bset DDRB,#$03 ;PB1,0 solenoid out
      rts
Solenoid_None
      rts
Solenoid_Soda
      bset PORTB,#$01 ;activate solenoid
      ldd #10000
      jsr Timer_Wait ;10 msec
      bclr PORTB,#$01 ;deactivate
      rts
Solenoid_Change
      bset PORTB,#$02 ;activate solenoid
      ldd #10000
      jsr Timer_Wait ;10 msec
      bclr PORTB,#$02 ;deactivate
      rts

void Coin_Init(void){
  DDRA &= ~0x03; // PA1,0 sensor in
}
unsigned char Coin_Input(void){
  return PORTA&0x03; // 0 none, 1 nickel, 2 dime
}
void Solenoid_Init(void){
  DDRB |= 0x03; // PB1,0 solenoid out
}
void Solenoid_None(void){
}
void Solenoid_Soda(void){
  PORTB |= 0x01; // activate solenoid
  Timer_Wait(10000); // 10 msec
  PORTB &= ~0x01; // deactivate
}
void Solenoid_Change(void){
  PORTB |= 0x02; // activate solenoid
  Timer_Wait(10000); // 10 msec
  PORTB &= ~0x02; // deactivate
}

Program 6.25 Low-level input/output functions for the vending machine.

The main program, Program 6.26, begins by specifying the Port A bits 1 and 0 to be inputs. The initial state is defined as M0. Our controller software first calls the function for this state, waits for the specified amount of time, reads the sensor inputs from PORTA, then switches to the next state depending on the input data. The Timer_Wait function was defined previously. Notice again the one-to-one correspondence between the state graph in Figure 6.24 and the data structure in Program 6.26.

CmdPt equ 0 ;output function
Time  equ 2 ;wait time
Next  equ 4 ;3 pointers to next
M0    fdb Solenoid_None,20000
      fdb M0,W5,W10
W5    fdb Solenoid_None,20000
      fdb M5,W5,W5
M5    fdb Solenoid_None,20000
      fdb M5,W10,W15
W10   fdb Solenoid_None,20000
      fdb M10,W10,W10
M10   fdb Solenoid_None,20000
      fdb M10,W15,W20
W15   fdb Solenoid_None,20000
      fdb M15,W15,W15
M15   fdb Solenoid_Soda,20000
      fdb M0,M0,M0
W20   fdb Solenoid_None,20000
      fdb M20,W20,W20
M20   fdb Solenoid_Change,20000
      fdb M15,M15,M15
main  lds #$4000
      jsr Coin_Init
      jsr Solenoid_Init
      jsr Timer_Init
      ldx #M0 ;Initial State
loop  jsr [CmdPt,x] ;output
      ldd Time,x
      jsr Timer_Wait ;wait
      jsr Coin_Input ;0,1,2
      lsla ;0,2,4
      leax a,x
      ldx Next,x ;next state
      bra loop

const struct State {
  void (*CmdPt)(void); // output
  unsigned short Time; // wait time
  const struct State *Next[3];};
typedef const struct State StateType;
#define M0 &fsm[0]
#define W5 &fsm[1]
#define M5 &fsm[2]
#define W10 &fsm[3]
#define M10 &fsm[4]
#define W15 &fsm[5]
#define M15 &fsm[6]
#define W20 &fsm[7]
#define M20 &fsm[8]
StateType fsm[9]={
  {&Solenoid_None, 20000,{M0,W5,W10}}, // M0
  {&Solenoid_None, 20000,{M5,W5,W5}}, // W5
  {&Solenoid_None, 20000,{M5,W10,W15}}, // M5
  {&Solenoid_None, 20000,{M10,W10,W10}}, // W10
  {&Solenoid_None, 20000,{M10,W15,W20}}, // M10
  {&Solenoid_None, 20000,{M15,W15,W15}}, // W15
  {&Solenoid_Soda, 20000,{M0,M0,M0}}, // M15
  {&Solenoid_None, 20000,{M20,W20,W20}}, // W20
  {&Solenoid_Change,20000,{M15,M15,M15}}}; // M20
void main(void){
  StateType *Pt;
  unsigned char Input;
  Coin_Init();
  Solenoid_Init();
  Timer_Init();
  Pt = M0; // Initial State
  while(1){
    (*Pt->CmdPt)(); // output
    Timer_Wait(Pt->Time); // wait
    Input = Coin_Input(); // 0,1,2
    Pt = Pt->Next[Input]; // next
  }
}

Program 6.26 Vending machine controller.
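The walk through the state graph described for Figure 6.24 can be replayed on a desktop compiler. The following is a minimal sketch under stated assumptions: it uses state indices instead of pointers, replaces the solenoid functions with two counters, and drops the 20 ms wait. The names VendState, Vend, VendRun, SodaCount, and ChangeCount are hypothetical and do not appear in Program 6.26.

```c
#include <assert.h>

enum { M0, W5, M5, W10, M10, W15, M15, W20, M20, NUM_STATES };
enum { CMD_NONE, CMD_SODA, CMD_CHANGE };

struct VendState {
  unsigned char Cmd;      /* output performed while in this state */
  unsigned char Next[3];  /* next state for input 0 (none), 1 (nickel), 2 (dime) */
};

/* Same transitions as Program 6.26, with indices instead of pointers. */
static const struct VendState Vend[NUM_STATES] = {
  { CMD_NONE,   { M0,  W5,  W10 } },  /* M0  */
  { CMD_NONE,   { M5,  W5,  W5  } },  /* W5  */
  { CMD_NONE,   { M5,  W10, W15 } },  /* M5  */
  { CMD_NONE,   { M10, W10, W10 } },  /* W10 */
  { CMD_NONE,   { M10, W15, W20 } },  /* M10 */
  { CMD_NONE,   { M15, W15, W15 } },  /* W15 */
  { CMD_SODA,   { M0,  M0,  M0  } },  /* M15 */
  { CMD_NONE,   { M20, W20, W20 } },  /* W20 */
  { CMD_CHANGE, { M15, M15, M15 } }   /* M20 */
};

int SodaCount;    /* sodas dispensed so far */
int ChangeCount;  /* nickels returned so far */

/* Run the Moore sequence (output, input, next state) over n sampled inputs
   and return the final state. */
unsigned char VendRun(unsigned char state, const unsigned char *inputs, int n){
  int i;
  for(i = 0; i < n; i++){
    assert(state < NUM_STATES && inputs[i] < 3);
    if(Vend[state].Cmd == CMD_SODA)   SodaCount++;
    if(Vend[state].Cmd == CMD_CHANGE) ChangeCount++;
    state = Vend[state].Next[inputs[i]];
  }
  return state;
}
```

Feeding the samples for two dimes (each coin seen as 10 for two samples, then 00) should visit W10, M10, W20, M20, M15 and end in M0 with one nickel returned and one soda dispensed. Three nickels should end in M0 with one soda and no change.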

The next example involves a Mealy FSM with both the input and output processes being performed using function calls. The example also abstracts the high-level FSM from the low-level I/O.

Example 6.9 Design a robot that sits, stands, and lies down (depending on its mood, which can be OK, tired, curious, or anxious).


Solution The goal of this section is to design a robot controller, as illustrated in Figure 6.25. We begin the design by defining what constitutes a state. In this system, a state describes the position of the robot: standing, sitting, or sleeping. Since the outputs are necessary to cause a change in state, we will solve this system with a Mealy FSM. Rather than generate the output as a simple write to a port, the outputs on this robot will be defined as abstract functions, which perform a sequence of operations as needed to complete the task. The output functions are
None, it performs no movement
SitDown, assuming the robot is standing, it will perform a sequence of moves to sit down
StandUp, assuming the robot is sitting, it will perform a sequence of moves to stand up
LieDown, assuming the robot is sitting, it will perform a sequence of moves to lie down
SitUp, assuming the robot is sleeping, it will perform a sequence of moves to sit up

This robot has mood sensors, which are read and processed at the low level. There is an abstract input function, called Sensor_Input, which returns one of four possible conditions
00 OK, the robot is feeling fine
01 Tired, the robot energy levels are low
10 Curious, the robot senses activity around it
11 Anxious, the robot senses danger

Before we draw the state graph, we need to decide on the sequence of operations:
1. Initialize inputs and outputs
2. Specify initial state
3. Perform FSM controller
   a) Call the Sensor_Input function to determine the current mode
   b) Call the appropriate robot output function, which depends on the input and the state
   c) Change states, which depends on the state and the input

Figure 6.25 Robot interface. [Block diagram: inputs to the 9S12 and outputs from the 9S12.]

The outputs (which output function to call) depend on both the input and the current state. For this design, we can list heuristics describing how the robot is to operate:
If the robot is OK, we will stay in whichever state we are currently in.
If the robot’s energy levels are low (tired), it will go to sleep.
If the robot senses activity around it (curious), it will awaken from sleep.
If the robot senses danger (anxious), it will stand up.

These rules are converted into a finite-state machine graph, as shown in Figure 6.26. Each arrow specifies both an input and an output. For example, the “Tired/SitDown” arrow from Standing to Sitting states means if we are in the Standing state and the input is Tired, then we will call the SitDown function and go to the Sitting state. Mealy machines can have time delays; this example simply does not use them. The next step is to define the FSM graph using a linked data structure. Program 6.27 shows the implementation of the Mealy FSM using abstract functions to perform the input and output. Pointers to the functions are stored in the output field of the data structure. Similar to the other FSM implementations, the four Next parameters define the input-dependent state transitions.


Figure 6.26 Mealy FSM for a robot controller. [State graph with three states: Standing, Sitting, and Sleeping. Each arrow is labeled input/output, for example Tired/SitDown from Standing to Sitting, Tired/LieDown from Sitting to Sleeping, Anxious/StandUp from Sitting to Standing, and Curious/SitUp and Anxious/SitUp from Sleeping to Sitting; the remaining arrows (such as OK/None) stay in the same state.]

;Input/output defined as functions
         org $4000 ;EEPROM
Out      equ 0 ;Pointers to functions
Next     equ 8 ;Next states
Standing fdb None,SitDown,None,None
         fdb Standing,Sitting,Standing,Standing
Sitting  fdb None,LieDown,None,StandUp
         fdb Sitting,Sleeping,Sitting,Standing
Sleeping fdb None,None,SitUp,SitUp
         fdb Sleeping,Sleeping,Sitting,Sitting
Main     lds #$4000
         jsr Robot_Init
         ldx #Standing ;current state
LL       jsr Sensor_Input ;0,1,2,3
         lsla ;0,2,4,6
         leax a,x ;Base+2*input
         jsr [Out,x] ;Call output function
         ldx Next,x
         bra LL ;Infinite loop
         org $FFFE
         fdb Main ;reset vector

// Input/outputs defined as functions
const struct State{
  void (*CmdPt[4])(void); // outputs
  const struct State *Next[4]; // Next
};
typedef const struct State StateType;
#define Standing &fsm[0]
#define Sitting &fsm[1]
#define Sleeping &fsm[2]
StateType fsm[3]={
  {{&None,&SitDown,&None,&None}, //Standing
   {Standing,Sitting,Standing,Standing}},
  {{&None,&LieDown,&None,&StandUp},//Sitting
   {Sitting,Sleeping,Sitting,Standing }},
  {{&None,&None,&SitUp,&SitUp}, //Sleeping
   {Sleeping,Sleeping,Sitting,Sitting}}
};
void main(void){
  StateType *Pt; // Current State
  unsigned char Input;
  Robot_Init(); // initialize hardware
  Pt = Standing; // Initial State
  while(1){
    Input = Sensor_Input(); // Input=0-3
    (*Pt->CmdPt[Input])(); // function
    Pt = Pt->Next[Input]; // next state
  }
}

Program 6.27 A Mealy FSM implemented with functional abstraction.
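Because the inputs and outputs of this Mealy FSM are function calls, the state graph can be exercised without any robot hardware. The sketch below is an illustration under stated assumptions: the output functions are stubs that append one letter to a log, the moods are supplied from an array instead of Sensor_Input, and the names ActionLog, Log, and RobotRun are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define LOG_SIZE 64
char ActionLog[LOG_SIZE];   /* letters appended by the stub output functions */

static void Log(char c){
  size_t n = strlen(ActionLog);
  if(n + 1 < LOG_SIZE){ ActionLog[n] = c; ActionLog[n + 1] = 0; }
}

/* Stub output functions; a real robot would drive its motors here. */
static void None(void){ }
static void SitDown(void){ Log('d'); }
static void StandUp(void){ Log('u'); }
static void LieDown(void){ Log('l'); }
static void SitUp(void){ Log('s'); }

struct State {
  void (*CmdPt[4])(void);       /* output function for each input */
  const struct State *Next[4];  /* next state for each input */
};
typedef const struct State StateType;

#define Standing (&fsm[0])
#define Sitting  (&fsm[1])
#define Sleeping (&fsm[2])
static StateType fsm[3] = {
  {{None, SitDown, None,  None},    {Standing, Sitting,  Standing, Standing}},
  {{None, LieDown, None,  StandUp}, {Sitting,  Sleeping, Sitting,  Standing}},
  {{None, None,    SitUp, SitUp},   {Sleeping, Sleeping, Sitting,  Sitting}}
};

/* Mealy sequence: for each mood (0=OK, 1=Tired, 2=Curious, 3=Anxious),
   call the output function, then change state.
   Returns 0, 1, or 2 for a final state of Standing, Sitting, or Sleeping. */
int RobotRun(const unsigned char *moods, int n){
  StateType *pt = Standing;
  int i;
  ActionLog[0] = 0;
  for(i = 0; i < n; i++){
    assert(moods[i] < 4);
    (*pt->CmdPt[moods[i]])();
    pt = pt->Next[moods[i]];
  }
  return (int)(pt - fsm);
}
```

For the mood sequence Tired, Tired, Curious, Anxious, OK the robot should sit down, lie down, sit up, stand up, and then do nothing, ending in the Standing state.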

6.9 *Dynamically Allocated Data Structures

In order to reuse memory and provide for efficient use of RAM, we need dynamic memory allocation. The previous examples in this chapter used fixed allocation, meaning the sizes of the data structures are decided in advance and specified in the source code. In addition, the location of these structures is determined by the assembler at assembly time. With dynamic allocation, the size and location will be determined at run time. To implement dynamic allocation, we will manage a heap. The heap is a chunk of RAM that is
1. Dynamically allocated by the program when it creates the data structure
2. Used by the program to store information
3. Dynamically released by the program when the structure is no longer needed


The heap manager provides the system with two operations:
  pt = malloc(size); // returns a pointer to a block of size bytes
  free(pt); // deallocates the block at pt

The implementation of this general memory manager is beyond the scope of this book. Instead, we will develop a very useful, but simple, heap manager with these two operations:
  pt = Heap_Allocate(); // returns a pointer to a block of fixed size
  Heap_Release(pt); // deallocates the block at pt

6.9.1 *Fixed-Block Memory Manager

In general, the heap manager allows the program to allocate a variable block size, but in this section, we will develop a simplified heap manager that handles just fixed-size blocks. In this example, the block size is specified by SIZE. The initialization will create a linked list of all the free blocks (Figure 6.27).

Figure 6.27 The initial state of the heap has all of the free blocks linked in a list. [FreePt points to the first block; each block points to the next, and the last block contains null.]

Program 6.28 shows the global structures for the heap. These entries are defined in RAM. SIZE is the number of 8-bit bytes in each block. All blocks allocated and released with this memory manager will be of this fixed size. NUM is the number of blocks to be managed. FreePt points to the first free block.

SIZE   equ 4
NUM    equ 5
NULL   equ 0
FreePt rmb 2
Heap   rmb SIZE*NUM

#define SIZE 4
#define NUM 5
#define NULL 0 // empty pointer
char *FreePt;
char Heap[SIZE*NUM];

Program 6.28 Private global structures for the fixed-block memory manager.

Initialization must be performed before the heap can be used. Program 6.29 shows the software that partitions the heap into blocks and links them together. FreePt points to a linear linked list of free blocks. Initially, these free blocks are contiguous and in order, but as the manager is used, the positions and order of the free blocks can vary. It will be the pointers that will thread the free blocks together.

Heap_Init ldx #Heap
          stx FreePt ;FreePt=&Heap[0];
          ldab #SIZE
imLoop    pshx
          puly ;RegY = pt;
          aby ;pt+SIZE
          sty 0,x ;*pt=pt+SIZE;
          abx ;pt=pt+SIZE;
          cpx #Heap+SIZE*(NUM-1)
          bne imLoop
          ldy #NULL
          sty 0,x ;*pt=NULL;
          rts

Program 6.29 Functions to initialize the heap.

void Heap_Init(void){
  char *pt;
  FreePt = &Heap[0];
  for(pt=&Heap[0]; pt!=&Heap[SIZE*(NUM-1)]; pt=pt+SIZE){
    *(short*)pt =(short)(pt+SIZE);
  }
  *(short*)pt = NULL;
}


To allocate a block, the manager just removes one block from the free list; see Program 6.30. The Heap_Allocate function will fail and return a null pointer when the heap becomes empty. The Heap_Release function returns a block to the free list. This system does not check to verify that a released block actually was previously allocated.

; returns RegX points to new block
; RegX=NULL if no more available
Heap_Allocate
      ldx FreePt ;pt=FreePt;
      cpx #NULL
      beq aDone ;if (pt!=NULL)
      ldy 0,x
      sty FreePt ;FreePt=*pt;
aDone rts
; RegX => block being released
Heap_Release
      ldy FreePt ;oldFreePt=FreePt;
      stx FreePt ;FreePt=pt;
      sty 0,x ;*pt=oldFreePt;
      rts

void *Heap_Allocate(void){
  char *pt;
  pt = FreePt;
  if(pt != NULL){
    FreePt = (char*) *(char**)pt;
  }
  return(pt);
}
void Heap_Release(void *pt){
  char *oldFreePt;
  oldFreePt = FreePt;
  FreePt = (char*)pt;
  *(short*)pt = (short)oldFreePt;
}

Program 6.30 Functions to allocate and release memory blocks.

Checkpoint 6.20: Consider a system that needs variable-size memory allocation, where the size can range from 2 to a maximum of 20 bytes. How might this simple heap be used?
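Programs 6.29 and 6.30 store each link with a cast to short, which relies on the 16-bit pointers of the 9S12. To experiment with the same free-list idea on a desktop compiler, one possible variant stores a full pointer in each free block. This is a sketch under stated assumptions, not the book's implementation: the union Block is a hypothetical type, and SIZE is raised to 8 so that a block is at least as large as a pointer on common 32-bit and 64-bit hosts.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 8   /* bytes per block; assumed to be >= sizeof(void*) */
#define NUM  5   /* number of blocks */

/* A block holds user data while allocated, or a link while it is free. */
union Block {
  union Block *Next;
  char Data[SIZE];
};

static union Block Heap[NUM];
static union Block *FreePt;   /* first free block, NULL when the heap is empty */

/* Link all blocks into the free list, in order. */
void Heap_Init(void){
  int i;
  for(i = 0; i < NUM - 1; i++){
    Heap[i].Next = &Heap[i + 1];
  }
  Heap[NUM - 1].Next = NULL;
  FreePt = &Heap[0];
}

/* Remove the first block from the free list; returns NULL if none is left. */
void *Heap_Allocate(void){
  union Block *pt = FreePt;
  if(pt != NULL){
    FreePt = pt->Next;
  }
  return pt;
}

/* Put a block back at the front of the free list. */
void Heap_Release(void *pt){
  union Block *blk = (union Block *)pt;
  assert(blk != NULL);
  blk->Next = FreePt;
  FreePt = blk;
}
```

As in the original, nothing verifies that a released block was previously allocated. Because released blocks go to the front of the list, the most recently released block is the next one handed out.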

6.9.2 *Linked List FIFO

An example application of a dynamically allocated data structure is a FIFO. In this structure, GetPt points to the oldest node (the one to get next), and PutPt points to the newest node: the place to add more data. The Next pointer of the newest node (if it exists) is null. The Fifo_Put operation fails (full) when the heap runs out of space. The Fifo_Get operation fails (empty) when GetPt equals NULL. Program 6.31 shows the global variables defined in RAM. Figure 6.28 shows an example FIFO with three elements (after running with lots of putting and getting). In this example, element 1 is the oldest because it was put first. This system uses Programs 6.28, 6.29, and 6.30 with SIZE equal to 4 bytes.

Next  equ 0 ;next
Data  equ 2 ;16-bit data for node
GetPt rmb 2 ; GetPt is pointer to oldest node
PutPt rmb 2 ; PutPt is pointer to newest node

struct Node{
  struct Node *Next;
  short Data;
};
typedef struct Node NodeType;
typedef NodeType *NodePtr;
NodePtr PutPt; // place to put
NodePtr GetPt; // place to get

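Program 6.31 Definition of the linked list structure.

Figure 6.28 A linked list FIFO after putting 1,2,3. [GetPt points to the node holding 1, which links to 2, which links to 3; PutPt points to the node holding 3, whose Next field is null. FreePt points to the remaining free blocks, ending in null.]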

Program 6.32 shows the three functions which implement the FIFO. Figure 6.29 is a flowchart of the Put and Get functions. The FIFO is full only when the heap is full (Heap_Allocate returns a failure). The Put operation first allocates space for the new entry, then stores the new information into the Data field. Since this element will be last, its Next field is set to null. The last part of Put links this new node at the end of the linked list. The Get function first checks to make sure the FIFO is not empty. Next, the Data field is retrieved from the node. This node is then unlinked from the linked list, and the memory block is released to the heap. There is a special case that handles the situation where you get the one remaining node in the linked list. In this case both PutPt and GetPt point to this node. When you get this node, both PutPt and GetPt are set to null, signifying the FIFO is now empty.

Fifo_Init ldx #NULL
          stx GetPt ;GetPt=NULL
          stx PutPt ;PutPt=NULL
          jsr Heap_Init
          rts
; Inputs: RegD data to put
; Outputs: V=0 if successful
;          V=1 if unsuccessful
Fifo_Put  jsr Heap_Allocate
          cpx #NULL
          beq PFul ;skip if full
          std Data,x ;store data
          ldy #NULL
          sty Next,x ;next=NULL
          ldy PutPt
          cpy #NULL ;previously MT?
          beq PMT
          stx Next,y ;link to previous
          bra PCon
PMT       stx GetPt ;Now one entry
PCon      stx PutPt ;points to newest
          clv ;success
          bra PDon
PFul      sev ;failure, full
PDon      rts
; Inputs: none
; Outputs: RegD data removed
;          V=0 if successful
;          V=1 if empty
Fifo_Get  ldx GetPt
          cpx #NULL
          beq GMT ;empty if NULL
          ldd Data,x ;read
          ldy Next,x ;pointer to next
          sty GetPt
          cpy #NULL
          bne GCon
          sty PutPt ;Now empty
GCon      sty GetPt ;points to oldest
          jsr Heap_Release
          clv ;success
          bra GDon
GMT       sev ;failure, empty
GDon      rts

Program 6.32 Implementation of the linked list FIFO.

void Fifo_Init(void){
  GetPt = NULL;       // Empty when null
  PutPt = NULL;
  Heap_Init();
}
int Fifo_Put(short theData){
  NodePtr pt;
  pt = (NodePtr)Heap_Allocate();
  if(!pt){
    return(0);        // full
  }
  pt->Data = theData; // store
  pt->Next = NULL;
  if(PutPt){
    PutPt->Next = pt; // Link
  }
  else{
    GetPt = pt;       // first one
  }
  PutPt = pt;
  return(1);          // successful
}
int Fifo_Get(short *datapt){
  NodePtr pt;
  if(!GetPt){
    return(0);        // empty
  }
  *datapt = GetPt->Data;
  pt = GetPt;
  GetPt = GetPt->Next;
  if(GetPt==NULL){    // one entry
    PutPt = NULL;
  }
  Heap_Release(pt);
  return(1);          // success
}
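To try the C functions of Program 6.32 on a desktop compiler, the heap routines need some stand-in. The following is a minimal sketch under stated assumptions: the three-block free-list heap (Pool, HEAPSIZE, FreePt) and the field order in NodeType are hypothetical, chosen only so the example is self-contained. The FIFO functions themselves follow the listing above.

```c
#include <assert.h>
#include <stddef.h>

// Node layout: field order is an assumption made for this sketch.
typedef struct Node {
  struct Node *Next;  // link to the next newer node
  short Data;         // stored value
} NodeType;
typedef NodeType *NodePtr;

// Hypothetical stand-in heap: HEAPSIZE fixed-size blocks on a free list.
#define HEAPSIZE 3
NodeType Pool[HEAPSIZE];
NodePtr FreePt;

void Heap_Init(void){
  int i;
  for(i = 0; i < HEAPSIZE-1; i++){
    Pool[i].Next = &Pool[i+1];   // chain all blocks together
  }
  Pool[HEAPSIZE-1].Next = NULL;
  FreePt = &Pool[0];
}
void *Heap_Allocate(void){       // returns NULL when the heap is full
  NodePtr pt = FreePt;
  if(pt){
    FreePt = pt->Next;
  }
  return pt;
}
void Heap_Release(void *block){  // put the block back on the free list
  NodePtr pt = (NodePtr)block;
  assert(pt != NULL);
  pt->Next = FreePt;
  FreePt = pt;
}

NodePtr GetPt;  // oldest node, NULL when empty
NodePtr PutPt;  // newest node, NULL when empty

void Fifo_Init(void){
  GetPt = NULL;
  PutPt = NULL;
  Heap_Init();
}
int Fifo_Put(short theData){
  NodePtr pt = (NodePtr)Heap_Allocate();
  if(!pt){
    return(0);          // full
  }
  pt->Data = theData;
  pt->Next = NULL;
  if(PutPt){
    PutPt->Next = pt;   // link after the previous newest node
  }
  else{
    GetPt = pt;         // first entry is also the oldest
  }
  PutPt = pt;
  return(1);
}
int Fifo_Get(short *datapt){
  NodePtr pt;
  if(!GetPt){
    return(0);          // empty
  }
  *datapt = GetPt->Data;
  pt = GetPt;
  GetPt = GetPt->Next;
  if(GetPt == NULL){
    PutPt = NULL;       // removed the last node
  }
  Heap_Release(pt);
  return(1);
}
```

With HEAPSIZE set to 3, a fourth Fifo_Put fails until a Fifo_Get returns a block to the heap, which illustrates that the FIFO is full only when the heap is full.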


Figure 6.29 Flowcharts of a linked list FIFO Put and Get operations.

Checkpoint 6.21: Draw a picture like Figure 6.28 of a doubly linked list. How might this more complicated structure be more efficient than the singly linked list?

6.10 *9S12 Paged Memory

16-bit pointers can access only 64 KiB of memory. The 9S12 uses a paged memory system to access memory beyond this 64 KiB barrier. On most of the 9S12 microcontrollers, the extended address contains 20 bits and thus can access up to 1 MiB of memory. The paged memory system is organized into a maximum of 64 pages with a fixed page size of 16 KiB. The software must first write the page number into PPAGE, which is an 8-bit register located at $0030 (only the bottom 6 bits are used). On the 9S12, addresses in the $8000 to $BFFF window invoke the paged memory system. The top 6 bits of the 20-bit extended address are retrieved from the PPAGE register, and the bottom 14 bits come from the regular 16-bit address, as shown in Figure 6.30. In particular, when the software accesses any address in the $8000 to $BFFF window, the bottom 6 bits of PPAGE are concatenated with the bottom 14 bits of the window address to create the 20-bit extended address used to access memory. This logical to physical address translation occurs automatically whenever an address in the $8000 to $BFFF window is accessed. On the 9S12DP512, the full 512 KiB of flash EEPROM can only be accessed using this paged memory system. Only 32 pages are needed for the 512 KiB flash EEPROM; the 9S12DP512 uses page numbers $20 through $3F. Page $3E is actually the same as regular EEPROM at $4000 to $7FFF, and page $3F is the same as EEPROM at $C000 to $FFFF.

Figure 6.30 The address is comprised of two components. [PPAGE = 0 0 PIX5 PIX4 PIX3 PIX2 PIX1 PIX0. A 16-bit address in $8000 to $BFFF = 1 0 a13 ... a0. The 20-bit address = PIX5 ... PIX0 a13 ... a0.]

Observation: If the software sets and leaves PPAGE at $20 (actually any constant value from $20 to $3D), then the EEPROM behaves like a simple 48 KiB memory from $4000 to $FFFF.
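The translation described above can be written as one line of arithmetic. The following is a minimal sketch, not 9S12 code: the function name ExtendedAddress and its parameters are hypothetical, and it models in portable C what the hardware does automatically.

```c
#include <assert.h>
#include <stdint.h>

// Form the 20-bit extended address for an access through the page window.
// ppage:  value of the PPAGE register (only bits 5-0 are used)
// addr16: 16-bit CPU address, assumed to lie in $8000 to $BFFF
uint32_t ExtendedAddress(uint8_t ppage, uint16_t addr16){
  assert(addr16 >= 0x8000 && addr16 <= 0xBFFF);  // window addresses only
  // top 6 bits from PPAGE, bottom 14 bits from the 16-bit address
  return ((uint32_t)(ppage & 0x3F) << 14) | (uint32_t)(addr16 & 0x3FFF);
}
```

For instance, with PPAGE equal to $20, the window address $8101 maps to the extended address $80101.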

We will present two applications of paged memory. In this first application, the flash EEPROM on the 9S12DP512 will contain a 256 KiB data buffer. Because these data are located in EEPROM, we will consider them as constant and provide a function to access the data. The buffer will be accessed using a single 18-bit linear address, which is passed into the subroutine in registers B and X. In this example, we assume the system's executable object code fits entirely in the 32 KiB space $4000 to $7FFF and $C000 to $FFFF. The 256 KiB buffer will be stored into 16 pages from $20 to $2F. The subroutine, shown as Program 6.33, first sets the PPAGE register to select the correct page, then reads from the $8000 to $BFFF window to retrieve the specified data.

;****Buf_Read*******
;Read byte from buffer
;Input  B:X is 18-bit linear address
;Output A is data
Buf_Read pshx             ;save addr bits 15-0
         xgdx
         lsld             ;C=addr bit 15
         xgdx
         rolb             ;B=addr>>15
         xgdx
         lsld             ;C=addr bit 14
         xgdx
         rolb             ;B=addr>>14
         andb #$0F        ;page index 0 to 15
         addb #$20        ;B=$20+(addr>>14)
         stab PPAGE
         puld             ;D=addr bits 15-0
         anda #$3F        ;D=addr&$3FFF
         adda #$80        ;D=$8000+addr&$3FFF
         tfr  d,x         ;X=$8000+addr&$3FFF
         ldaa 0,x         ;A=data from buffer
         rts

// Read byte from buffer
unsigned char Buf_Read(unsigned long addr){
  unsigned char *pt;
  unsigned char page;
  page = (unsigned char)(addr>>14);
  PPAGE = 0x20+(page&0x0F);
  pt = (unsigned char *)(0x8000+(addr&0x3FFF)); // address within the window
  return (*pt);
}

Program 6.33 A 256 Kibibyte data buffer implemented in paged memory.
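As a worked check of the address arithmetic in Program 6.33, the following sketch splits a linear buffer address into the page number and the window address without touching any hardware. The names PagedAddr, Buf_Locate, BUF_FIRST_PAGE, and BUF_SIZE are hypothetical and introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_FIRST_PAGE 0x20     // buffer occupies pages $20 to $2F
#define BUF_SIZE       0x40000UL // 256 KiB

typedef struct {
  uint8_t  ppage;   // value that would be written to PPAGE
  uint16_t window;  // address to read within $8000 to $BFFF
} PagedAddr;

// Split an 18-bit linear buffer address into page and window address.
PagedAddr Buf_Locate(uint32_t addr){
  PagedAddr p;
  assert(addr < BUF_SIZE);                       // 18-bit addresses only
  p.ppage  = (uint8_t)(BUF_FIRST_PAGE + ((addr >> 14) & 0x0F));
  p.window = (uint16_t)(0x8000 + (addr & 0x3FFF));
  return p;
}
```

For example, linear address $12345 lies in the fifth 16 KiB page of the buffer, so it selects page $24 and window address $A345.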

The second application implements a system with a code size of more than 48 KiB. In this system, we will partition the code into separate 16 KiB pieces. The system will be most efficient if the partitioning is done according to access probability. In other words, if module A frequently calls module B, then A and B will be placed into the same 16 KiB page. We will place the most frequently used code and the starting location into the regions $4000 to $7FFF and $C000 to $FFFF. Accessing these locations is simple and uses standard 16-bit pointers. We place the remaining code into paged memory. Subroutine calls within the same page can utilize the standard bsr and jsr instructions. To call a subroutine located in a different page, the call instruction is used. Figure 6.31 shows the stack before and after the call instruction is executed on the 9S12DP512. The call instruction pushes the old PPAGE and PC values on the stack and then loads PPAGE and PC with the address of the subroutine. When op codes are fetched from the $8000 to $BFFF window, the 6-bit PPAGE is combined with the lower 14 bits of the PC to form a 20-bit address. The translation occurs automatically in hardware.

Figure 6.31 The call instruction is used to call a subroutine in paged memory. [Before CALL: PPAGE = $20 and PC = $8101 (extended address $80101, call sub,#$21). After CALL: PPAGE = $21, PC = $976C (extended address $8576C, sub), and the stack holds the old PPAGE ($20) and the return address ($8105).]

Consider the case where the PPAGE register equals $20 and the PC is $8101 (left picture of Figure 6.31).

PPAGE = $20 = 00100000
PC = $8101 = 1000000100000001
PPAGE + Lower 14 bits of PC = 100000+00000100000001 = $80101

After the call instruction, the PPAGE register equals $21, and the PC is $976C (right picture of Figure 6.31).

PPAGE = $21 = 00100001
PC = $976C = 1001011101101100
PPAGE + Lower 14 bits of PC = 100001+01011101101100 = $8576C

The rtc instruction will return to the program that called the subroutine. Both the PPAGE and PC values are pulled off the stack. Figure 6.32 shows the stack before and after execution of the rtc instruction.

Figure 6.32 The rtc instruction is used to return from a subroutine in paged memory. [Before RTC: PPAGE = $21, PC = $976D, and the stack holds the old PPAGE ($20) and the return address ($8105). After RTC: PPAGE = $20 and PC = $8105.]
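The stack activity of call and rtc can be traced with a small software model. This is a sketch under stated assumptions rather than a description of the hardware: Mem, SP, PC, PPAGE, Call, and Rtc are hypothetical names in a desktop C program, and the push order (return address first, then the old PPAGE on top) is taken from the stack pictures in Figures 6.31 and 6.32.

```c
#include <assert.h>
#include <stdint.h>

uint8_t  Mem[0x10000];  // flat model of the 64 KiB address space
uint16_t SP;            // stack pointer
uint16_t PC;            // program counter
uint8_t  PPAGE;         // page register

// Model of call: push the return address, push the old PPAGE,
// then load PPAGE and PC with the location of the subroutine.
void Call(uint16_t target, uint8_t page, uint16_t returnAddr){
  assert(SP >= 3);                         // room for three bytes
  SP = (uint16_t)(SP - 2);
  Mem[SP]     = (uint8_t)(returnAddr >> 8);   // high byte of return address
  Mem[SP + 1] = (uint8_t)(returnAddr & 0xFF); // low byte of return address
  SP = (uint16_t)(SP - 1);
  Mem[SP] = PPAGE;                         // old page ends up on top
  PPAGE = page;
  PC = target;
}

// Model of rtc: pull PPAGE, then pull the return address into PC.
void Rtc(void){
  PPAGE = Mem[SP];
  SP = (uint16_t)(SP + 1);
  PC = (uint16_t)((Mem[SP] << 8) | Mem[SP + 1]);
  SP = (uint16_t)(SP + 2);
}
```

Starting from PPAGE = $20 with a return address of $8105, Call(0x976C, 0x21, 0x8105) leaves $20, $81, $05 on the stack, and Rtc restores PPAGE = $20 and PC = $8105.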


Programs 6.34, 6.35, and 6.36 illustrate the use of call and rtc to create a paged memory system on the 9S12. Program 6.34 will be programmed into main EEPROM.

Program 6.34 Main memory programs for this paged memory system.

func1 equ  0            ; relative offset in paged memory
func2 equ  3            ; relative offset in paged memory
      org  $4000        ; main EEPROM memory
main  lds  #$4000       ; stack in main RAM
      clra
loop  call func1,#$21   ; call function 1 in page $21 (add 1)
      call func1,#$22   ; call function 1 in page $22 (add 3)
      call func2,#$21   ; call function 2 in page $21 (add 2)
      call func2,#$22   ; call function 2 in page $22 (add 4)
      bra  loop

Program 6.35 will be programmed into external page $21.

Program 6.35 Page $21 programs for this paged memory system.

      org  $0000        ; page $21 external memory
      lbra fun1         ; link to actual function
      lbra fun2         ; link to actual function
fun1  adda #1
      rtc
fun2  adda #2
      rtc

Program 6.36 will be programmed into external page $22.

Program 6.36 Page $22 programs for this paged memory system.

      org  $0000        ; page $22 external memory
      lbra fun1         ; link to actual function
      lbra fun2         ; link to actual function
fun1  adda #3
      rtc
fun2  adda #4
      rtc

The TExaS simulator does not support external paged memory, but it will execute the call and rtc instructions in the same way as a regular jsr and rts subroutine call.

6.11 Functional Debugging

6.11.1 Instrumentation: Dump Into Array Without Filtering

As mentioned in the last chapter, one of the difficulties with print statements is that they can significantly slow down the execution speed in real-time systems. Many times the bandwidth of the print functions cannot keep pace with the existing system. For example, our system may wish to call a function 1000 times a second (or every 1 ms). If we add print statements to it that require 50 ms to perform, the presence of the print statements will significantly affect the system operation. In this situation, the print statements would be considered extremely intrusive. Another problem with print statements occurs when the system is using the same output hardware for its normal operation as is required to perform the print function. In this situation, debugger output and normal system output are intertwined. To solve both these situations, we can add a debugger instrument that dumps strategic information into an array at run time. We can then observe the contents of the array at a later time. One of the advantages of dumping is that the 9S12 BDM debugger module allows you to visualize memory even when the program is running. So this technique will be quite useful in systems connected to a debugger. Assume happy and sad are strategic 8-bit variables. The first step when instrumenting a dump is to define a buffer in RAM to save the debugging measurements.

#define SIZE 20
unsigned char Buffer[2*SIZE];
unsigned char Cnt;

SIZE   equ 20
Buffer rmb SIZE*2
Cnt    rmb 1

The Cnt will be used to index into the buffers. Cnt must be initialized to zero, before the debugging begins. The debugging instrument, shown in Program 6.37, saves the strategic variables into the Buffer.

Program 6.37 Instrumentation dump.

Save pshb
     pshx               ;save
     ldab Cnt
     cmpb #SIZE*2       ;full?
     beq  done
     ldx  #Buffer
     movb happy,B,X     ;save happy
     incb
     movb sad,B,X       ;save sad
     incb
     stab Cnt
done pulx
     pulb
     rts

void Save(void){
  if(Cnt < SIZE*2){
    Buffer[Cnt] = happy;
    Cnt++;
    Buffer[Cnt] = sad;
    Cnt++;
  }
}

Next, you add jsr Save statements at strategic places within the system. You can either use the debugger to display the results or add software that prints the results after the program has run and stopped. Observation: You should save registers at the beginning and restore them back at the end, so the debugging instrument itself doesn’t cause the software to crash.
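The second option, printing the results after the program has run and stopped, could look something like the following. This is a hedged sketch for a desktop compiler: Dump_Print is a hypothetical name, and printf stands in for whatever output routine the target system provides. Save repeats the C version of Program 6.37 so the example is self-contained.

```c
#include <assert.h>
#include <stdio.h>

#define SIZE 20
unsigned char Buffer[2*SIZE];  // happy,sad pairs
unsigned char Cnt;             // next free index, must start at 0
unsigned char happy, sad;      // strategic variables of the system under test

// Instrument from Program 6.37: save one happy,sad pair if room remains.
void Save(void){
  if(Cnt < SIZE*2){
    Buffer[Cnt] = happy;
    Cnt++;
    Buffer[Cnt] = sad;
    Cnt++;
  }
}

// Run after the program under test has stopped.
// Prints each saved pair and returns the number of pairs printed.
int Dump_Print(void){
  int i;
  assert(Cnt % 2 == 0);        // Save always stores two bytes at a time
  for(i = 0; i < Cnt; i += 2){
    printf("%2d: happy=%3u sad=%3u\n", i/2,
           (unsigned)Buffer[i], (unsigned)Buffer[i+1]);
  }
  return Cnt/2;
}
```

Because the printing happens only after the run, its execution time does not disturb the real-time behavior being measured; only the short Save calls are intrusive.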

6.11.2 Instrumentation: Dump Into Array With Filtering

One problem with dumps is that they can generate a tremendous amount of information. If you suspect a certain situation is causing the error, you can add a filter to the instrument. A filter is a software/hardware condition that must be true in order to place data into the array. In this situation, if we suspect the error occurs when another variable gets large, we could add a filter that saves in the array only when the variable is above a certain value. In the example shown in Program 6.38, the instrument saves the strategic variables into the buffer only when sad is greater than 100.

Program 6.38 Instrumentation dump with filter.

Save pshb
     pshx               ;save
     ldab sad
     cmpb #100          ;save only
     ble  done          ;when sad >100
     ldab Cnt
     cmpb #SIZE*2       ;full?
     beq  done
     ldx  #Buffer
     movb happy,B,X     ;save happy
     incb
     movb sad,B,X       ;save sad
     incb
     stab Cnt
done pulx
     pulb
     rts

void Save(void){
  if(sad > 100){
    if(Cnt < SIZE*2){
      Buffer[Cnt] = happy;
      Cnt++;
      Buffer[Cnt] = sad;
      Cnt++;
    }
  }
}

6.12 Tutorial 6 Software Abstraction

The purpose of this tutorial is to evaluate two stepper motor interfaces. Tutor6a.rtf spins a stepper motor using the switch statement. Tutor6b.rtf spins a stepper motor using a linked structure. You first will be asked to calculate the execution speed for each example. Then, you will study its ease of modification by adding additional states to the system.

Action: Open and assemble the switch statement program Tutor6a.rtf.

Question 6.1 What is the static efficiency of the step subroutine in the Tutor6a.rtf system in ROM bytes?

Action: Run the Tutor6a.rtf system and observe the stepper motor signals.

Question 6.2 Put a ScanPoint somewhere in the loop. Run the system and measure the minimum and maximum time (in cycles) to step the motor.

Question 6.3 Add four more output values to implement half-stepping. The new sequence should be $05,$04,$06,$02,$0A,$08,$09,$01.

Question 6.4 What is the static efficiency of the new system? Also, measure the minimum and maximum time (in cycles) to step the motor.

Action: Open and assemble the linked-structure program Tutor6b.rtf.

Question 6.5 What is the static efficiency of the linked structure and the step subroutine in the Tutor6b.rtf system in ROM bytes?

Action: Run the Tutor6b.rtf system and observe the stepper motor signals.

Question 6.6 Put a ScanPoint somewhere in the loop. Run the system and measure the minimum and maximum time (in cycles) to step the motor.

Question 6.7 Add four more output values to implement half-stepping. The new sequence should be $05,$04,$06,$02,$0A,$08,$09,$01.

Question 6.8 What is the static efficiency of the new system? Also, measure the minimum and maximum time (in cycles) to step the motor. Comment on the differences between the two approaches.

6.13 Homework Assignments

Homework 6.1 Assume Register X contains the address $2000, Register Y contains the address $2080, Register A contains $45, and Register B contains $67. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.

     staa 40,x
     stab $40,x
     std  $66,y
     staa 25,y
     stab $FF,y
     std  $CD,x

Homework 6.2 Assume Register X contains the address $2000, and Register Y contains the address $2080. Assume memory contains the following initial values: $2000 = 0, $2001 = 1, . . . , $20FF = $FF. For each of the following instructions, specify the effective address and the resulting operation. Give all your answers in hexadecimal.

     ldaa 40,x
     ldab $40,y
     ldaa $66,x
     ldaa 25,y
     ldd  $FE,x
     ldd  $0D,y

Homework 6.3 Assume Register X contains the address $0800, Register Y contains the address $0900, Register A contains $02, and Register B contains $67. Assume locations $0802 and $0803 contain the 16-bit value $0A00. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.

     staa b,x
     stab -$40,y
     std  [2,x]
     stx  d,y
     stab 1,-y
     std  2,x+

Homework 6.4 Assume Register X contains the address $0800, Register Y contains the address $0900, Register A contains $03, and Register B contains $67. Assume locations $0804 and $0805 contain the 16-bit value $0B12. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.

     stab a,x
     staa -1,y
     std  [4,x]
     sty  d,x
     staa 1,+x
     std  2,y-

Homework 6.5 Write assembly code that adds 10 to Register X and subtracts 100 from Register Y.

Homework 6.6 Write assembly code that sets Register X equal to Register Y plus 100.

Homework 6.5 Write assembly code that adds Register D to Register X and stores the sum in Register Y.


Homework 6.7 Look up the machine code created by the following instructions. Explain the basic function of each instruction. The first one is completed.

Machine Code   Instruction   Comment
$860A          ldaa #10      RegA = 10
               ldaa 10
               ldaa 10,x
               ldaa 10,y

Homework 6.8 Look up the machine code created by the following 9S12 instructions. Explain the basic function of each instruction. The first one is completed.

Machine Code   Instruction   Comment
$A602          ldaa 2,x      RegA = [X + 2]
               ldaa -2,x
               ldaa 2,+x
               ldaa 2,x+
               ldaa 2,-x
               ldaa 2,x-
               ldaa [2,x]

Homework 6.9 Write a subroutine that converts a null-terminated string to upper case. In particular, convert all lower case ASCII characters to upper case. The original data is in RAM, so this routine overwrites the string. The calling sequence is

     ldx #string      ; pointer to ASCII string
     jsr UpperCase

Homework 6.10 Write a subroutine that converts a null-terminated string to lower case. In particular, convert all upper case ASCII characters to lower case. The original data is in RAM, so this routine overwrites the string. The calling sequence is

     ldx #string      ; pointer to ASCII string
     jsr LowerCase

Homework 6.11 Write a subroutine that compares two null-terminated strings. Register A will be 0 if the strings do not match and will be nonzero if the strings match. The calling sequence is

     ldx #string1     ; pointer to first string
     ldy #string2     ; pointer to second string
     jsr StringCompare

Homework 6.12 Write a subroutine that adds two equal-sized arrays. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The first array, pointed to by RegX, should be added to the second array, pointed to by RegY, and the sum placed back in the second array. Assume the data is 8-bit unsigned, and implement a ceiling operation (set result to 255) on overflow.

Homework 6.13 Write a subroutine that implements the dot product of two equal-sized arrays. The arrays contain 8-bit unsigned numbers. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The return parameter is an unsigned 16-bit number in Reg D. For example, consider these two arrays:

Vector1 fcb 10,20,30  ; 3-D vector
Vector2 fcb 1,0,2     ; 3-D vector

The dot product is 10*1+20*0+30*2 = 70. The calling sequence is

     ldaa #3          ; size of arrays
     ldx  #Vector1    ; pointer to first array
     ldy  #Vector2    ; pointer to second array
     jsr  DotProduct

Homework 6.14 Write a subroutine that counts the number of characters in a string. The string is null-terminated. Register X is a call-by-reference pointer to the string. The number of characters in the string is returned in Reg B. For example, consider this string:

Name fcc "Valvano"
     fcb 0

The size is 7. The calling sequence is:

     ldy #Name        ; pointer to string
     jsr Count

Homework 6.15 Write a subroutine that finds the maximum number in an array. The array contains 8-bit signed numbers. The first element of the array is the size. Register Y is a call-by-reference pointer to the array. The maximum value in the array is returned in Reg B. For example, consider this array:

Array fcb 8,-10,20,-30,40,-50,-60,-70,-80

The maximum value is 40. The calling sequence is

     ldy #Array       ; pointer to array
     jsr Maximum

Homework 6.16 Write a subroutine that finds the largest absolute value in an array. The array contains 8-bit signed numbers. The first element of the array is the size. Register Y is a call-by-reference pointer to the array. The maximum absolute value in the array is returned in Reg B. For example, consider this array:

Array fcb 8,-10,20,-30,40,-50,-60,-70,-80

The maximum absolute value is 80. The calling sequence is

     ldy #Array       ; pointer to array
     jsr Maximum

Homework 6.17 Write a subroutine that compares two equal-sized arrays. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The return parameter is in RegB. RegB is 1 if the arrays are equal and 0 if they are different. For example, consider these two arrays containing 8-bit numbers:

Array1 fcb 10,20,30,40,50,60,70,80
Array2 fcb 10,20,30,41,50,60,70,80

These arrays are different. The calling sequence is

     ldaa #8          ; size of arrays
     ldx  #Array1     ; pointer to first array
     ldy  #Array2     ; pointer to second array
     jsr  ArrayEqual

Homework 6.18 Write a subroutine that counts the frequency of occurrence of letters in a text buffer. Register X points to a null-terminated ASCII buffer. There is a 26-element array into which the frequency data will be entered. For example, the first element of Freq will contain the number of A's and a's. Count only the upper case and lower case letters.

Freq ds.w 26          ;twenty six 16-bit counters

The calling sequence is

     ldx #buffer      ; pointer to text buffer
     jsr CalcFreq

Homework 6.19 Write three debugging subroutines that implement a debugging array dump. Assume there are two global 16-bit variables AA and BB that are strategic to the system under test. The first subroutine initializes your system. The second subroutine saves AA, BB, and TCNT in the array. Your system should be able to support up to ten measurements. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine will display the collected data. These three subroutines will be added to the original system with the first being called at the beginning, the second placed at strategic places within the program under test, and the last one called at the end. Estimate the level of intrusiveness of this debugging process. In particular, how long does it take to call the second subroutine? These subroutines will be added to the original software using an editor, then the combination will be assembled and downloaded to the target.

Homework 6.20 Assume we have some 6-row by 8-column matrix data structures. The precision of each entry is 16 bits. The information is stored in column-major format (the data for each column is stored contiguously) with zero indexing. I.e., the row index, I, ranges 0 ≤ I ≤ 5, and the column index, J, ranges 0 ≤ J ≤ 7. Write the assembly language subroutine which accepts a pointer to the array and the I,J indices, and returns the 16-bit contents. Don't save/restore registers.

;Inputs  RegA row index I=0,1,...,5
;        RegB column index J=0,1,...,7
;        RegX pointer to a 6 by 8 matrix
;Outputs RegD 16-bit contents at matrix[I,J]

Homework 6.21 Assume we have some 5-row by 10-column matrix data structures. The precision of each entry is 16 bits. The information is stored in column-major format (the data for each column is stored contiguously) with zero indexing. I.e., the row index, I, ranges 0 ≤ I ≤ 4, and the column index, J, ranges 0 ≤ J ≤ 9. Write the assembly language subroutine which accepts a pointer to the array and the I,J indices, and returns the 16-bit contents. Don't save/restore registers.

;Inputs  RegA row index I=0,1,...,4
;        RegB column index J=0,1,...,9
;        RegX pointer to a 5 by 10 matrix
;Outputs RegD 16-bit contents at matrix[I,J]

Homework 6.22 Consider the following table structure:

const struct theRoom{
  unsigned char windows;   // number of windows
  unsigned char doors;     // number of doors
  unsigned short size[3];  // x,y,z dimensions
};
typedef const struct theRoom roomType;
roomType Building[4]={
  { 3,2,{16,16,8}},
  { 4,1,{20,20,10}},
  { 5,3,{32,16,12}},
  { 0,1,{18,10,8}}};

a) Show the assembly code required to define this structure in ROM. Use equ to make the code easier to understand.
b) Write an assembly program to return the number of windows of a room. The room number is passed by value in Register A, and the result is returned by value in Register A. For example, if the room number is 2, then the number of windows will be 5.
c) Write an assembly program to return the number of doors of a room. The room number is passed by value in Register A, and the result is returned by value in Register A. For example, if the room number is 0, then the number of doors will be 2.
d) Write an assembly program to return the volume of a room. The room number is passed by value in Register A, and the result is returned by value in Register D. For example, if the room number is 1, then the volume will be 20*20*10 = 4000.

Homework 6.23 Consider the following table structure:

const struct thedesk{
  unsigned char legs;      // number of legs
  unsigned char drawers;   // number of drawers
  unsigned short size[2];  // top x,y dimensions 0.1 feet
};
typedef const struct thedesk deskType;
deskType furniture[4]={
  { 4,5,{30,50}},   // 4 legs 5 drawers
  { 4,0,{45,45}},   // square table
  { 6,7,{40,65}},
  { 4,4,{35,55}}};

a) Show the assembly code required to define this structure in ROM. Use equ to make the code easier to understand.
b) Write an assembly program to return the number of legs of a desk. The desk number is passed by value in Register A, and the result is returned by value in Register A. For example, if the desk number is 2, then the number of legs will be 6.
c) Write an assembly program to return the number of drawers of a desk. The desk number is passed by value in Register A, and the result is returned by value in Register A. For example, if the desk number is 0, then the number of drawers will be 5.
d) Write an assembly program to return the area of a desk top with units in². The desk number is passed by value in Register A, and the result is returned by value in Register D. For example, if the desk number is 3, then the desk top area will be (35*55*144)/100 = 2772. Worry about accuracy (divide last) and overflow (use enough bits in the multiply stage to prevent overflow). You could factor the 144/100 terms to calculate (35*55*36)/25 = 2772. Your solution has to work for these four examples.

Homework 6.24 Write an assembly main program that implements this Mealy finite-state machine. The FSM data structure, shown below, is given and cannot be changed. The next state links are defined as 16-bit pointers. Each state has eight outputs and eight next-state links. The input is on Port M bits 2, 1, and 0 and the output is on Port T bits 5, 4, 3, 2, 1, and 0. There are three states (S0, S1, and S2), and the initial state is S0. Show all assembly software required to execute this machine, including the reset vector. You need not be friendly, but do initialize the direction registers.
The repeating execution sequence is input, output (depends on input and current state), and next (depends on input and current state).

     org $4000                     ;EPROM
* Finite State Machine
S0   fcb 0,0,5,6,3,9,3,0           ; Outputs for inputs 0 to 7
     fdb S0,S0,S1,S1,S1,S2,S2,S2   ; Next states for inputs 0 to 7
S1   fcb 1,2,3,9,6,5,3,3           ; Outputs for inputs 0 to 7
     fdb S2,S0,S0,S0,S2,S2,S2,S1   ; Next states for inputs 0 to 7
S2   fcb 1,2,3,9,6,5,3,3           ; Outputs for inputs 0 to 7
     fdb S2,S2,S2,S2,S0,S0,S2,S1   ; Next states for inputs 0 to 7

Homework 6.25 Design a microcomputer-based controller using a linked-list finite-state machine. The system has one input and one output.

Figure Hw6.25 Electronic ignition. [Block diagram: the Angle and Spark signals connect the machine to 9S12 pins PT3 and PT2. Timing diagram: Angle has a period of about 1 ms, and each Spark pulse is exactly 50 μs wide.]

The input, Angle, is a periodic signal with a frequency of about 1 kHz (has a period of about 1 ms). The output, Spark, should be a positive pulse (exactly 50 μs wide) every time Angle goes from 0 to 1. The delay between the rising edge of Angle and the start of the Spark pulse should be as short as possible. The period of Angle can vary from 1 ms to 50 ms. Since Angle is an input, you cannot control it, only respond to its rising edge.


a) Design the one-input, one-output finite-state machine for this system. Draw the FSM graph. Use descriptive state names (i.e., don't call them S0, S1, S2 . . .).
b) Show the assembly code to create the statically allocated linked list. Include org statement(s) to place it in the proper location on your microcomputer.
c) Show the assembly language controller. Include org statement(s) to place it in the proper location on a microcomputer. Assume this is the only task that the microcomputer executes. I.e., show ALL the instructions necessary. Make the program automatically start on a RESET.

Homework 6.26 Implement the following Mealy finite-state machine using linked lists. The initial state is Stop. Do not convert the finite-state machine to an equivalent Moore machine; rather, implement it as a Mealy machine. There is no wait parameter for the states.

Figure Hw6.26 Engine controller. [Block diagram: Control is an input on PT0, Gas is an output on PT1, and Break is an output on PT2 of the 9S12. State graph: two states, Stop (initial) and Go, with transitions labeled 0/10, 1/00, 1/01, and 0/00.]

There is one input, Control, connected to PT0. There are two outputs: Break connected to PT2, and Gas connected to PT1. Each state has two next states and two outputs which depend on the current input. The controller continuously repeats the sequence:

Input from Control (PT0)
Output to Break,Gas (PT2 and PT1) which depends on the input Control
Next state which depends on the input Control

E.g., if the state is Stop and Control is 0, then the Break output is 1, the Gas output is 0, and the next state is Stop. Show ALL the assembly language software required to implement this machine on a single chip microcomputer. Use equ statements to clarify the data structure. Use org statements to implement the appropriate segmentation.

Homework 6.27 Write an assembly main program that implements this Moore finite-state machine. The FSM state graph, shown in Figure Hw6.27, is given and cannot be changed. The input is on Port T bits 1 and 0 and the output is on Port M bits 4, 3, 2, 1, and 0. There are three states (happy, hungry, and sleepy). The initial state is happy.

Figure Hw6.27 Finite state graph.

a) Show the ROM-based FSM data structure.
b) Show the initialization and controller software. Initialize the direction registers, making all code friendly. You may add variables in any appropriate manner (registers, stack, or global RAM). The repeating execution sequence is . . . output, input, next. . . . Please make your code that accesses Port M friendly.

Homework 6.28 Write an assembly main program that implements this Mealy finite-state machine. The FSM state graph, shown in Figure Hw6.28, is given and cannot be changed. The input is on Port T bit 0 and the output is on Port M bits 3, 2, 1, and 0. There are three states (happy, hungry, and sleepy). The initial state is happy.

Figure Hw6.28 Finite state graph. [State graph: states happy, hungry, and sleepy, with transitions labeled 0/3, 0/7, 1/2, 1/8, 0/4, and 1/3.]

a) Show the ROM-based FSM data structure.
b) Show the initialization and controller software. Initialize the direction registers, making all code friendly. You may add variables in any appropriate manner (registers, stack, or global RAM). The repeating execution sequence is . . . input, output, next. . . . Please make your code that accesses Port M friendly.

Homework 6.29 Write the Stepper_CCW subroutine as described in Example 6.1.

6.14 Laboratory Assignments

Lab 6.1 Minimally Intrusive Debugging

Purpose. The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project you will run with a short time delay. After the software is debugged, you will build your hardware and run your software on the real 9S12. During this phase of the project you will run with time delays long enough so you will be able to see the LED flash (slower than 8 Hz).

Description. You will first design a system, and then add debugging instruments to prove the system is functioning properly. The system has one input switch and one output LED. The basic function of the system is to respond to the input switch, causing certain output patterns on the LED. Interface a positive logic switch to PT3. This means the PT3 signal will be 0 (low, 0V) if the switch is not pressed, and the PT3 signal will be 1 (high, 5V) if the switch is pressed. Overall functionality of this system is described in the following rules. The system starts with the LED off (make PT2 0). The system will return to the off state if the switch is not pressed (PT3 is 0). If the switch is pressed (PT3 is 1), then the LED will flash on and off at about 4 Hz. During the first phase of this lab, you will simulate these hardware circuits in TExaS using a positive logic mode for the switch and LED. During the second phase, you will interface a real switch and LED to your 9S12. When visualizing software running in real time on an actual microcomputer, it is important to use minimally intrusive debugging tools. The objective of this lab is to develop debugging methods that do not depend on the simulator. During the first phase of this lab, you will develop and test your program and debugging instruments on the TExaS simulator. In particular, you will write debugging instruments to record input and output information as your system runs in real time.
This software dump should store data into an array while it is running, and the information will be viewed at a later time. Software dumps are an effective technique when debugging software on an actual microcomputer. During the second phase of this lab, you will run your system on the real 9S12 with and without your debugging instruments. a) Design the hardware interface of the switch and LED first in TExaS, then on the real system. b) Write a main program that implements the input/output system. To implement the 125 ms delay, use the timer functions from Chapter 4. The basic steps for the main program are shown in Program L6.1. c) Write two debugging subroutines that implement a dump instrument. This is called functional debugging because you are capturing input/output data of the system without information

       Initialize the stack pointer
       Enable interrupts for the Metrowerks debugger, cli
       Set the direction register so PT3 is an input and PT2 is an output
       Set PT2 so the LED is off
loop   delay about 125 ms (any delay from 60 to 500 ms is OK)
       read the switch and go to flash if the switch is pressed
       Set PT2 so the LED is off
wait   read the switch and go to wait if the switch is not pressed
flash  toggle the LED (if on turn it off, if off turn it on)
       go to loop

DDRT &= ~0x08;               // PT3 input
DDRT |= 0x04;                // PT2 output
PTT &= ~0x04;                // PT2 off
while(1){
  Delay();                   // you write this
  if((PTT&0x08)==0){
    PTT &= ~0x04;            // PT2 off
    while((PTT&0x08)==0){};
  }
  PTT = PTT^0x04;            // toggle
}

Program L6.1 Program used to develop minimally intrusive debugging instruments.

specifying when the input/output was collected. The first subroutine (Debug_Init) initializes your debugging system. The initialization should initialize a 100-byte array (start it at $3880), initializing pointers and/or counters as needed. The second subroutine (Debug_Capture) saves one data point (PT3 input data, and PT2 output data) in the array. Since there are only two bits to save, pack the information into one 8-bit value for storage and ease of visualization. For example:

Input (PT3)   Output (PT2)   Saved Data
0             0              0000,0000 (binary), or $00
0             1              0000,0001 (binary), or $01
1             0              0001,0000 (binary), or $10
1             1              0001,0001 (binary), or $11

In this way, you will be able to visualize the entire array in an efficient manner. Place a call to Debug_Init at the beginning of the system, and a call to Debug_Capture just after each time you output to PTT (there will be 3 or 4 places where your software writes to PTT). Within TExaS you can observe the debugging array using a Stack window.

The basic steps involved in designing the data structures for this debugging instrument are as follows:
  Allocate a 100-byte buffer starting at address $3880
  Allocate a 16-bit pointer, which will point to the place to save the next measurement

The basic steps involved in designing Debug_Init are as follows:
  Set all entries of the 100-byte buffer to $FF (meaning no data yet saved)
  Initialize the 16-bit pointer to the beginning of the buffer

The basic steps involved in designing Debug_Capture are as follows:
  Return immediately if the buffer is full (pointer past the end of the buffer)
  Read PTT                            data = PTT
  Mask capturing just bits 3,2        data = ((data&$08)<<1) | ((data&$04)>>2)
  Dump information into buffer        (*pt) = data
  Increment pointer to next address   pt = pt+1

Both routines should save and restore any registers they modify (except CCR), so that the original program is not affected by the execution of the debugging instruments. The temporary variable data may be implemented in a register. However, the 100-byte buffer and the 16-bit pointer, pt, should be permanently allocated in global RAM.

d) By counting cycles in the listing file, estimate the execution time of the Debug_Capture subroutine. Assuming the actual E clock speed, convert the number of cycles to time. This time will be a quantitative measure of the intrusiveness of your debugging instrument.
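The dump instrument steps above can be sketched in C before writing the assembly version. This is only an illustration under stated assumptions: PTT is modeled as an ordinary variable so the sketch runs on a PC, the buffer is not placed at $3880, and the names DebugBuf and DebugPt are hypothetical (the lab text only calls the pointer pt).

```c
#include <stdint.h>

#define DEBUG_SIZE 100        /* buffer length given in the lab text */

/* Assumption: PTT is simulated by a variable; on the 9S12 it is the Port T register. */
volatile uint8_t PTT;

uint8_t  DebugBuf[DEBUG_SIZE];  /* hypothetical name; the lab places this at $3880 */
uint8_t *DebugPt;               /* hypothetical name; points to the next free entry */

/* Fill the buffer with $FF (no data yet saved) and reset the pointer. */
void Debug_Init(void) {
  for (int i = 0; i < DEBUG_SIZE; i++) {
    DebugBuf[i] = 0xFF;
  }
  DebugPt = DebugBuf;
}

/* Save one packed data point: PT3 goes to bit 4, PT2 goes to bit 0. */
void Debug_Capture(void) {
  if (DebugPt >= DebugBuf + DEBUG_SIZE) {
    return;                     /* buffer full, do nothing */
  }
  uint8_t data = PTT;
  data = (uint8_t)(((data & 0x08) << 1) | ((data & 0x04) >> 2));
  *DebugPt = data;
  DebugPt++;
}
```

With both PT3 and PT2 high (PTT = $0C), one call to Debug_Capture stores $11, which matches the last row of the saved-data table.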

Lab 6.2 Hand Assembly and Execution

Purpose. In this lab you will learn how to hand-assemble source code. During pass 1 you will create the symbol table. During pass 2 you will create the object code. Another objective is to understand how the microcomputer executes instructions. For each memory cycle during execution, you will predict the R/W line, the 16-bit address, and the 8-bit data bus.

Description. In preparation for this assignment, you should familiarize yourself with the format of the Microcomputer Programming Reference Manual. In particular, you should understand the addressing modes. You need to be able to look up op codes for each instruction. For each instruction, you need to determine the object code and CPU execution cycles. Many instructions have multiple addressing modes; each addressing mode has a distinct object code and execution cycles.

a) Pretend you are pass 1 of the cross-assembler and create the symbol table for Program L6.2. Labels start in column 1. A symbol table is a list of symbols and their 16-bit unsigned values. There will be an entry in the symbol table for each label. For all labels except equ or set, the value of a symbol is the beginning address of that line. For labels with equ or set, the value of the label is the 16-bit value of the operand.

b) Pretend you are pass 2 of the cross-assembler and create the object code for Program L6.2. Include four fields for each line of assembly code:

1. The address is the 16-bit unsigned hexadecimal location of the start of this line
2. The object code is a group of 8-bit unsigned hexadecimal values
3. The number of cycles to execute this line (called Cycles in the manual)
4. The execution pattern is called Access Detail in the CPU manual

        org  $900     ; RAM
Result  rmb  2
Index   rmb  1
        org  $F800    ; EEPROM
Main    lds  #$0C00
        ldy  #data
        ldaa #2
        bsr  Sum
        std  Result
        stop
Sum     pshy
        staa Index
        ldd  #0
SLoop   addd 2,y+
        dec  Index
        bne  SLoop
        puly
        rts
        org  $FC00
data    fdb  13,9
        org  $FFFE
        fdb  Main

Program L6.2 The assembly program for Lab 6.2.

Every line has an address. Some pseudo-ops will create object code (e.g., fcb [5] and fdb [6]). Since pseudo-ops are not executed, no pseudo-op will have values for the Cycles or Access Detail entries. For example, the 6812 yields

Address   Object Code(s)   Cycles   Access Detail   Source Code
$F800                                               org  $F800
$F800     B6 08 00         [3]      rOP             ldaa $0800

[5] In TExaS, the pseudo-ops fcb, dc.b, and dc are identical.
[6] In TExaS, the pseudo-ops fdb and dc.w are identical.

c) Type the source code into the system and run the cross-assembler. Please correct your part b with a red pen. Please do parts a and b on paper first, then run the machine.

d) Pretend you are the microcomputer and hand-execute this program up until the stop instruction. Perform the pseudo-execution showing the R/W, 16-bit Address, and 8-bit Data in hexadecimal for each cycle. On the 6812 the pseudo-execution will not match the actual 6812 execution. This is because the 6812 has an instruction queue and can actually fetch 16 bits at a time. TExaS does not simulate the 6812 instruction queue and will always fetch 8 bits. TExaS properly simulates the software timing on all its microcomputers. For example, TExaS will show the 6812 instruction ldaa $0800 as four pseudo cycles

R/W    Address   Data   Comment
Read   $F800     B6     fetch opcode
Read   $F801     08     fetch operand
Read   $F802     00     fetch operand
Read   $0800     xx     memory read, xx is the contents at $800

but the simulated time will be correctly incremented by 3. In fact, all timing aspects of the simulation will be accurate. Add in the comment field at the start of each instruction which instruction is being executed. e) Run this program with the simulator and verify your answers to part d. Correct any mistakes with a red pen. Please do part d on paper first, then run the machine. Lab 6.3 Profiling Purpose. The TExaS simulator provides a rich set of debugging tools, but eventually we will be asked to run programs on an actual microcomputer. The objective of this lab is to develop profiling tools that do not depend on the simulator. Even though we will still be using the simulator for this lab, these techniques can be used when debugging software on an actual microcomputer. Procedure. a) Write three debugging subroutines that implement profiling. The first subroutine (Debug_Init) initializes your system. The second subroutine (Debug_Capture) saves a profile point (time, data, and PC position) in an array. The time parameter is the current TCNT value, the data parameter is the hexadecimal value in Register D, and the PC position information can be obtained by reading the return address off the stack. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine (Debug_Display) displays the profile on the SCI/CRT interface. Be careful to save and restore registers so the original subroutine will execute. Program L6.3 shows an example application of these debugging functions. Measure the execution time of the Debug_Capture subroutine. This time will be a quantitative measure of the intrusiveness of the debugging instrument. b) In this part, you will instrument the original program with debugging code that outputs to a parallel port. The purpose of this debugging is to count the number of times sqrt is called. Modify the main program so sqrt is called exactly 15 times. 
Connect unused parallel port bits to an external device that will assist in the visualization (LED, LCD, etc.). Run your instrumented system and use it to show that sqrt is called 15 times. Measure the execution times of your debugging instruments. These times will be a quantitative measure of their intrusiveness.

c) Again, you will instrument the original program with debugging code that outputs to a parallel port. The purpose of this debugging is to visualize the execution pattern within sqrt. Modify the main program so sqrt is called once with an input of Reg A = 100. Connect unused parallel port bits to a logic analyzer. Run your instrumented system and visualize the execution pattern. In particular, you should see the subroutine start, visualize how many times it loops, and see it finish. Measure the execution times of your debugging instruments. These times will be a quantitative measure of their intrusiveness.

Lab 6.4 MicroForth Interpreter

Purpose. In this lab, you will build a binary-tree data structure. You will design an interpreter that performs simple arithmetic operations. Your system must handle signed underflow and overflow conditions.

        org  $0800
t       rmb  1       transformed to sqrt(s)
cnt     rmb  1       loop counter
s16     rmb  2       16*input
        org  $F000
* binary fixed point squareroot, 2**-4
* Input: Reg A is s (0 to 15.9375)
* Output: Reg B is t=sqrt(s) 0 to 4.00
sqrt    psha
        clrb
        tsta
        beq  done    ; test for Input==0
        ldab #16
        mul
        std  s16     ; 16*input
        ldaa #32
        staa t       ; t=2.0, initial
        ldaa #4
        staa cnt
next    ldab t       ; RegA=t
        clra
        xgdx         ; RegX=t
        ldaa t
        tab          ; RegB=t
        mul          ; RegD=t*t
        addd s16     ; RegD=t*t+16*s
        idiv         ; RegX=(t*t+16*s)/t
        xgdx         ; RegD=(t*t+16*s)/t
        lsrd         ; RegB=((t*t+16*s)/t)/2
        adcb #0      ; round up?
        stab t       ; t=((t*t+16*s)/t)/2
        dec  cnt
        bne  next
done    pula
        rts          ; RegB=sqrt(s)
main    lds  #$0900
        clra
loop    pshx
        bsr  sqrt
check   nop
        inca
        pulx
        bne  loop
        stop
        org  $FFFE
        fdb  main

* with debugging added
        org  $0800
t       rmb  1       transformed to sqrt(s)
cnt     rmb  1       loop counter
s16     rmb  2       16*input
        org  $F000
* binary fixed point squareroot, 2**-4
* Input: Reg A is s (0 to 15.9375)
* Output: Reg B is t=sqrt(s) 0 to 4.00
sqrt    jsr  Debug_Capture
        psha
        clrb
        tsta
        beq  done    ; test for Input==0
        ldab #16
        mul
        std  s16     ; 16*input
        ldaa #32
        staa t       ; t=2.0, initial
        ldaa #4
        staa cnt
next    ldab t       ; RegA=t
        clra
        xgdx         ; RegX=t
        ldaa t
        tab          ; RegB=t
        mul          ; RegD=t*t
        addd s16     ; RegD=t*t+16*s
        idiv         ; RegX=(t*t+16*s)/t
        xgdx         ; RegD=(t*t+16*s)/t
        lsrd         ; RegB=((t*t+16*s)/t)/2
        adcb #0      ; round up?
        stab t       ; t=((t*t+16*s)/t)/2
        jsr  Debug_Capture
        dec  cnt
        bne  next
done    pula
        rts          ; RegB=sqrt(s)
main    lds  #$0900
        clra
loop    pshx
        jsr  Debug_Init
        bsr  sqrt
        jsr  Debug_Display
check   nop
        inca
        pulx
        bne  loop
        stop
        org  $FFFE
        fdb  main

Program L6.3 Profiling added to the squareroot program.

Description. In preparation for this assignment, review binary trees, command interpreters, and the last-in-first-out queue (stack). See the simple binary interpreter in TREE.rtf (installed with TExaS). The major advantage of a binary tree structure over a linear list is the speed of lookup. In the worst case, the maximum number of compares one must do to find an entry is the maximum depth of the tree. Let size be the number of entries and depth be the maximum distance from the root to any leaf. If the binary tree is full, the maximum depth is less than or equal to the next greatest integer of log2(size). For example, a full tree with 1023 entries requires only 10 searches to find an entry. A linear search on the same 1023 entries would take on average 512 searches. In this assignment, we will have only 15 entries, but still will implement a linked-list binary tree.

There are two basic approaches to binary searching: the linked list and the indexed table. In the linked list, each entry contains a string called name, a pointer to the function to execute called command, and two pointers: left and right. If both left and right are null, then the node is a leaf.

[Figure: full binary tree, depth = 4. Each node holds a name, a command, and left and right pointers; the pointers of the leaves are null.
  Level 1 (root):    "1" push1
  Level 2:           "-1" pshm1, "in" in
  Level 3:           "+" add, "/" divide, "drop" drop, "out" out
  Level 4 (leaves):  "*" mult, "-" sub, "-2" pshm2, "0" push0, "2" push2, "dup" dup, "mod" mod, "over" over]

Figure L6.4a Tree structure containing the names and function addresses.

In this procedure, input is a string to find. We begin searching at the root.

[Flowchart: pt = root; compare input with pt->name. If input == pt->name, execute pt->command() and finish with success. If input < pt->name, pt = pt->left. If input > pt->name, pt = pt->right. If pt == null, finish with failure; if pt != null, compare again.]

Figure L6.4b Flowchart for the interpreter.
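The flowchart search can be sketched in C as follows. This is a minimal illustration, not the lab solution (which is written in assembly): the Node layout follows the fields named in the text (name, command, left, right), while Tree_Find, LastCommand, the Cmd_ functions, and the three-node excerpt of the tree are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <string.h>

typedef struct Node {
  const char *name;            /* string to match */
  void (*command)(void);       /* function to execute on a match */
  const struct Node *left;     /* names alphabetically before this one */
  const struct Node *right;    /* names alphabetically after this one */
} Node;

/* Hypothetical stand-ins for the real commands: each records which one ran. */
int LastCommand = 0;
void Cmd_pshm1(void) { LastCommand = -1; }
void Cmd_push1(void) { LastCommand = 1; }
void Cmd_in(void)    { LastCommand = 100; }

/* Three-node excerpt of the tree: root "1" with children "-1" and "in". */
const Node NodeM1 = { "-1", Cmd_pshm1, NULL, NULL };
const Node NodeIn = { "in", Cmd_in,    NULL, NULL };
const Node Root   = { "1",  Cmd_push1, &NodeM1, &NodeIn };

/* Return the node whose name equals input, or NULL if the search fails. */
const Node *Tree_Find(const Node *root, const char *input) {
  const Node *pt = root;
  while (pt != NULL) {
    int cmp = strcmp(input, pt->name);
    if (cmp == 0) {
      return pt;                           /* success */
    }
    pt = (cmp < 0) ? pt->left : pt->right; /* go left or right */
  }
  return NULL;                             /* failure */
}
```

A caller would test the result against NULL and, on success, invoke the command through the pointer, for example Tree_Find(&Root, "in")->command().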

If the input is less than the name of the current node (pt->name) (alphabetically before), then the search will go left (pt = pt->left). If the input is greater than the name of the current node (pt->name) (alphabetically after), then the search will go right (pt = pt->right).

The second approach (which you will not be implementing, but is included for your consideration) is called an indexed table. In this scheme we start numbering at index 1. The table must be sorted alphabetically. Rather than storing the pointers explicitly as we did in the previous example, notice how the index number when viewed in binary provides the same information. If the size is not exactly a power of two, we must allocate additional entries and place them alphabetically at the beginning or the end.

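[Figure: the binary indices of Table arranged as a search tree: 1000 at the root; 0100 and 1100 below it; then 0010, 0110, 1010, and 1110; then the odd indices 0001 through 1111 as leaves.]

Index (binary)   Index   Name     Command
0001             1       "*"      mult
0010             2       "+"      add
0011             3       "-"      sub
0100             4       "-1"     pshm1
0101             5       "-2"     pshm2
0110             6       "/"      divide
0111             7       "0"      push0
1000             8       "1"      push1
1001             9       "2"      push2
1010             10      "drop"   drop
1011             11      "dup"    dup
1100             12      "in"     in
1101             13      "mod"    mod
1110             14      "out"    out
1111             15      "over"   over

Figure L6.4c Finite state graph.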

Again, input is the string that is used to match the name field of the table. This is also a binary search because the number of tests will be less than or equal to the next greatest integer of log2 size.

[Flowchart: I = 0; mask = 0x08. Loop: I = I+mask; compare input with Table[I].name. If input == Table[I].name, execute Table[I].command() and finish with success. If input < Table[I].name, I = I-mask. In either unequal case, mask = mask>>1; if mask > 0, repeat the loop; if mask == 0, finish with failure.]

Figure L6.4d Flowchart for the indexed table interpreter.
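The indexed-table search in Figure L6.4d can be sketched in C. This is an illustration under assumptions: the Entry type and Table_Find are hypothetical names, each entry carries an integer id in place of the command pointer, index 0 is left unused so that numbering starts at 1 as in the text, and the names are compared with strcmp, which gives the same ASCII ordering as the sorted table.

```c
#include <string.h>

typedef struct {
  const char *name;   /* command name, sorted in ASCII order */
  int id;             /* hypothetical stand-in for the command pointer */
} Entry;

/* Index 0 is unused so that valid indices run from 1 to 15. */
const Entry Table[16] = {
  {"", 0},
  {"*", 1},  {"+", 2},     {"-", 3},    {"-1", 4},  {"-2", 5},
  {"/", 6},  {"0", 7},     {"1", 8},    {"2", 9},   {"drop", 10},
  {"dup", 11}, {"in", 12}, {"mod", 13}, {"out", 14}, {"over", 15}
};

/* Return the index (1 to 15) of the matching entry, or 0 on failure. */
int Table_Find(const char *input) {
  unsigned I = 0;
  for (unsigned mask = 0x08; mask > 0; mask >>= 1) {
    I = I + mask;                       /* try the next bit of the index */
    int cmp = strcmp(input, Table[I].name);
    if (cmp == 0) {
      return (int)I;                    /* success */
    }
    if (cmp < 0) {
      I = I - mask;                     /* input is before this entry */
    }
  }
  return 0;                             /* mask reached 0: failure */
}
```

Each pass fixes one bit of the index, so at most four compares are needed for the 15 entries.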

The linked-list lookup will be a little faster to execute, because it is quicker to access pt->name than it is to access Table[I].name. On the other hand, it is easy to make minor changes in the indexed table. If the space is already allocated, then adding a node at run time involves shifting the entries down so that the list remains alphabetical. Deleting a node simply involves shifting the nodes up. The only disadvantage is that the size cannot increase so that it exceeds the next power of two.

The following table lists the 15 commands your FORTH interpreter will execute. Your software system will have two stacks. The return stack, pointed to by SP, will contain return addresses for the usual jsr rts subroutine call functions. The data stack, pointed to by RegY, will contain the input/output parameters for the functions. Commands will be separated by returns (ASCII 13). The idea is to input an entire line using InString, then look up the command in the tree and, if found, execute the function. You should display (without popping) the top data stack entry in the LCD display.

Command   Function
in        Input 8-bit signed number from CRT keyboard, push on data stack
out       Pop from data stack and output 8-bit signed number to CRT display
dup       Duplicates top of data stack
over      Duplicates next to top of data stack
drop      Pop and discard top of data stack
+         Pops two numbers from data stack, add, push result on data stack
-         Pops two numbers from data stack, subtract, push result on data stack
*         Pops two numbers from data stack, multiply, push result on data stack
/         Pops two numbers from data stack, divide, push quotient on data stack
mod       Pops two numbers from data stack, divide, push remainder on data stack
0         Pushes the constant 0 on the stack
1         Pushes the constant 1 on the stack
2         Pushes the constant 2 on the stack
-1        Pushes the constant -1 on the stack
-2        Pushes the constant -2 on the stack

We will create 10 bytes of space for the data stack (Y) separate from the hardware stack (SP). Register Y will always point into this space. You must explicitly test for data stack overflow and underflow. You must also implement ceiling and floor handling during the addition, subtraction, multiply, divide, and modulo functions.

datastack    rmb  8
penultimate  rmb  1
ultimate     rmb  1
bottom       rmb  0

[Figure: the return stack, pointed to by SP, holds subroutine return addresses; the data stack, pointed to by Y, starts at datastack. Each stack is drawn with a free area, the top of stack, and valid data. The last locations of the data stack are labeled antepenultimate, penultimate, ultimate, and bottom.]

Figure L6.4e Data and return stacks.

Notice that we can determine how many bytes are on the data stack by comparing Y to the fixed addresses:

If Y Equals    This Many Bytes Are on the Data Stack
Bottom         None (empty)
Ultimate       One
Penultimate    Two
Datastack      Ten (full)

Use reverse-polish format for subtraction and division. E.g., 2 1 - is 1, and 2 1 / is 2. The usual stack rules apply to this data stack as well.

1. Stack accesses (PUSH or PULL) should not be performed outside the allocated area.
2. Stack reads should not be performed from the free area.
3. Stack PUSH should first decrement Y, then store the data (not vice versa).
4. Stack PULL should first read the data, then increment Y (not vice versa).
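The stack rules and the ceiling and floor requirement can be modeled in C before writing the assembly routines. This is a sketch under assumptions, not the lab solution: the pointer variable Y models Register Y, and the function names Push, Pull, Saturate, Add, and Sub are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define STACK_BYTES 10

int8_t  DataStack[STACK_BYTES];
int8_t *Y = DataStack + STACK_BYTES;    /* models Register Y; starts at bottom (empty) */

/* PUSH: first decrement Y, then store. Fails if the stack is full. */
bool Push(int8_t data) {
  if (Y <= DataStack) return false;     /* overflow */
  Y--;
  *Y = data;
  return true;
}

/* PULL: first read, then increment Y. Fails if the stack is empty. */
bool Pull(int8_t *data) {
  if (Y >= DataStack + STACK_BYTES) return false;  /* underflow */
  *data = *Y;
  Y++;
  return true;
}

/* Ceiling and floor: clamp a 16-bit result into the 8-bit signed range. */
int8_t Saturate(int16_t x) {
  if (x > 127)  return 127;
  if (x < -128) return -128;
  return (int8_t)x;
}

/* "+" command: needs at least two entries on the data stack. */
bool Add(void) {
  if (Y > DataStack + STACK_BYTES - 2) return false;
  int8_t top, next;
  Pull(&top);
  Pull(&next);
  return Push(Saturate((int16_t)(next + top)));
}

/* "-" command in reverse-polish order: next-to-top minus top. */
bool Sub(void) {
  if (Y > DataStack + STACK_BYTES - 2) return false;
  int8_t top, next;
  Pull(&top);
  Pull(&next);
  return Push(Saturate((int16_t)(next - top)));
}
```

Pushing 2, then 1, then calling Sub leaves 1 on the stack, which agrees with the reverse-polish example in the text.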

Here are a couple of the routines to get you started:

* duplicate next to top
over     cpy  #penultimate  check for at least 2 elements
         bhi  overend       skip if no data to duplicate
         cpy  #datastack    check for full
         bls  overend       skip if stack is already full
         ldaa 1,y           copy of next to top
         staa 1,-y          push on data stack
overend  rts
* push a 2 on the data stack
push2    cpy  #datastack    check for full
         bls  psh2end       skip if stack is already full
         movb #2,1,-y       push 2 on data stack
psh2end  rts
* multiply top two entries
mult     cpy  #penultimate  check for at least 2 elements
         bhi  overend       skip if no data to duplicate
         ldaa 1,y+          pop top of stack
         sex  a,x           X=multiplicand (-128 to +127)
         ldaa 1,y+          pop next to top
         sex  a,d           D=multiplicand (-128 to +127)
         exg  x,y           Y,D are multiplicands (X is stack pt)
         emuls              D=product (no overflow possible in 16-bit D)
         exg  x,y           Y is stack pt again, -16256 <= D <= 16384
         cpd  #127
         bgt  ceiling
         cpd  #-128
         bge  ok
floor    ldab #-128         since D<-128, set B = -128
         bra  ok
ceiling  ldab #127          since D>127, set B = 127
ok       stab 1,-y          push result
         rts

a) One by one, write and debug the 15 individual commands. Use stabilization to test each routine.

b) Design the fixed binary tree containing the names and function addresses for all your commands. This structure will exist in EEPROM and cannot be modified unless the source code is edited and the program reassembled. Note, most FORTH interpreters place the binary tree in RAM and allow commands to be added and subtracted at run time. Use binding (equ) to make the program more readable.

c) Write the main program that interprets input from the CRT keyboard and displays output back to the CRT display. Remember to display the top of the data stack on the LCD display.

Lab 6.5 Traffic Light Controller

Purpose. This lab has these major objectives: the usage of linked list data structures, to create a segmented software system, and real-time synchronization by designing an input-directed traffic light controller. In preparation for this assignment, review finite-state machines, linked lists, and memory allocation. You should also run and analyze the linked-list controllers found in example files moore.rtf and mealy.rtf.

Description. The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project, you will run with a fast TCNT clock (TSCR2 = 0). After the software is debugged, you will interface actual lights and switches to the 9S12 and run your software on the real 9S12. During this phase of the project you will run with a slow TCNT clock (TSCR2 = $07). As you have experienced, the simulator requires more actual time to simulate one cycle of the microcomputer. On the other hand, the correct simulation time is maintained in the TCNT register, which is incremented every cycle of simulation time. The simulator speed depends on the amount of information it needs to update into the windows. Unfortunately, even with the least amount of window updates, it would take a long time for the simulator to process the typical 3 minutes it might take for a “real” car to pass through a “real” traffic intersection. Consequently, the cars in this traffic intersection travel much faster than “real” cars. In other words, you are encouraged to adjust the time delays so that the operation of your machine is convenient for you to debug and for the TA to observe during demonstration.
You will create a segmented software system putting global variables into RAM, local variables into RAM, constants and fixed data structures into EEPROM, and program object code into EEPROM. Most microcontrollers have a rich set of timer functions. For this lab, you will need the ability to wait a prescribed amount of time. In general, cycle-counting (simple for loops) has the problem of conditional branches and data-dependent execution times. If an interrupt were to occur during a cycle-counting delay, then the delay would be inaccurate using the cycle-counting method. Using the TCNT timer, however, the timing will be very accurate, even if an interrupt were to occur while the microcomputer was waiting. In more sophisticated systems, other timer modes provide even more flexible mechanisms for microcomputer synchronization. A linked list solution may not run the fastest or occupy the fewest memory bytes, but it is a structured technique that is easy to understand, easy to implement, easy to debug, and easy to upgrade.

Consider a typical four-corner intersection as shown in Figure L6.5. The two one-way streets are labeled South (cars travel North) and West (cars travel East). There are three inputs to your 9S12: two are car sensors, and one is a walk button. The South sensor will be true (1) if one or more cars are near the South intersection. Similarly, the West sensor will be true (1) if one or more cars are near the West intersection. The Walk sensor will be true (1) if a pedestrian wishes

to cross in any direction. There are eight outputs from your microcomputer that control the two Red/Yellow/Green traffic lights and the two walk/don’t walk lights. The simulator allows you to attach binary switches to simulate the three inputs and LED lights to simulate the eight outputs.

[Figure: the intersection with the South and West traffic lights (R, Y, G each), the Walk button, and the walk/don’t walk lights (R for Don’t walk, G for Walk).]

Figure L6.5 Traffic light intersection.

Traffic should not be allowed to crash. I.e., there should not be a green or yellow on South at the same time there is a green or yellow on West. You should exercise common sense when assigning the length of time that the traffic light will spend in each state, so that the simulated system changes at a speed convenient for the TA (stuff changes fast enough so the TA doesn’t get bored, but not so fast that the TA can’t see what is happening). Cars should not be allowed to hit the pedestrians. The walk sequence should be realistic (walk, flashing don’t, continuous don’t). Your system should consider both the average and worst-case waiting time. You may assume the two car sensors remain active for as long as service is required. On the other hand, the walk button may be pushed and released, and the system must remember the walk has been requested.

a) Build an I/O system in TExaS with the appropriate names and colors on the lights and switches. Think about which ports you will be using in part d so that you simulate the exact system you will eventually plan to build.

b) Design a finite-state machine that implements a good traffic-light system. Include a graphical picture of your finite-state machine showing the various states, inputs, outputs, wait times, and transitions. Remember the wait function will return input data collected while it is waiting.

c) Write the assembly code that implements the traffic-light control system. There is no single, “best” way to implement your traffic light. However, your scheme must be segmented into RAM/EEPROM, and you must use a linked-list data structure.
There should be a one-to-one mapping from the FSM states to the linked list elements. A “good” solution has about 10 to 20 states in the finite-state machine and provides for input dependence. Try not to focus on the civil engineering issues. Rather, build a quality computer engineering solution that is easy to understand and easy to change. Do something reasonable, and have 10 to 20 states. A good solution has

1. One-to-one mapping between state graph and data structure
2. No conditional branches in program
3. The state graph defines exactly what it does in a clear and unambiguous fashion
4. The format of each state is the same
5. Good names and labels
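The properties listed above can be illustrated with a small table-driven controller in C. This is a deliberately reduced sketch, not a solution to the lab: it has only four states, two sensor inputs, and no walk signal, and the output bit encoding (South R, Y, G in bits 5 to 3, West R, Y, G in bits 2 to 0), the wait times, and the names goS, waitS, goW, waitW, and FSM_Next are all assumptions made for this example.

```c
#include <stdint.h>

typedef struct State {
  uint8_t  out;                   /* light pattern (assumed encoding, see lead-in) */
  uint16_t wait;                  /* time to stay in this state, in 10 ms units */
  const struct State *next[4];    /* next state, indexed by the 2-bit input */
} State;

#define goS   (&FSM[0])           /* South green,  West red */
#define waitS (&FSM[1])           /* South yellow, West red */
#define goW   (&FSM[2])           /* South red,    West green */
#define waitW (&FSM[3])           /* South red,    West yellow */

/* Input index = (South sensor << 1) | West sensor. Every state has the same format. */
const State FSM[4] = {
  { 0x0C, 300, { goS,   waitS, goS,   waitS } },
  { 0x14,  50, { goW,   goW,   goW,   goW   } },
  { 0x21, 300, { goW,   goW,   waitW, waitW } },
  { 0x22,  50, { goS,   goS,   goS,   goS   } }
};

/* One transition: a table lookup with no conditional branch. */
const State *FSM_Next(const State *pt, uint8_t input) {
  return pt->next[input & 0x03];
}
```

On the target, the controller loop would repeat four steps: write pt->out to the output port, wait pt->wait, read the sensors, and set pt = FSM_Next(pt, input). Changing the state graph then means editing only the FSM table.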

Typically in real applications using an embedded system, we put the executable instructions and the finite-state machine linked-list data structure into the nonvolatile memory (flash EEPROM). A good implementation will allow minor changes to the finite-state machine (adding states, modifying times, removing states, moving transition arrows, and changing the initial state) simply by changing the linked list controller, without changing the executable instructions. Making changes to executable code requires you to debug/verify the system again. If there is a one-to-one mapping from FSM to linked-list data structure, then if we just change the state graph and follow the one-to-one mapping, we can be confident our new system still operates properly. Obviously, if we add another input sensor or output light, it may be necessary to update the executable part of the software and re-assemble. During the debugging phase with the TExaS simulator, you can run with a fast TCNT clock (TSCR2 = $00).

d) After the software has been debugged on the simulator, you will implement it on the real board. The first step is to interface three pushbutton switches for the sensors. Do not place or remove wires on the protoboard while the power is on. Build the switch circuits and test the voltages using a digital voltmeter. You can also use the debugger to observe the input pin to verify the proper operation of the interface. The next step is to build six LED output circuits. You can use the two LEDs on the docking module (PT1, PT0) in addition to the six external LEDs you will build on your protoboard. Look up the pin assignments in the 7406 data sheet. Be sure to connect 5 V power to pin 14 and ground to pin 7. You can use the debugger to set the direction register to output. Then, you can set the output high and low, and measure the three voltages (input to 7406, output from 7406 which is the LED cathode voltage, and the LED anode voltage).

e) Debug your combined hardware/software system on the actual 9S12 board. When using the real 9S12, you should run with a slow TCNT clock (TSCR2 = $07). An interesting question that may be asked during checkout is how you could experimentally prove your system works. In other words, what data should be collected and how would you collect it?

7 Local Variables and Parameter Passing

Chapter 7 objectives are to:
• Explain how to implement local variables on the stack
• Show how various C compilers implement local variables and pass parameters
• Compare and contrast call-by-value versus call-by-reference parameter passing

Variables are an important component of software design, and there are many factors to consider when creating variables. Some of the obvious considerations are the size and format of the data. Another factor is the scope of a variable. The scope of a variable defines which software modules can access the data. Variables with an access that is restricted to one software module are classified as private, and variables shared between multiple modules are public. In general, a system is easier to design (because the modules are smaller and simpler), easier to change (because code can be reused), and easier to verify (because interactions between modules are well-defined) when we limit the scope of our variables. However, since modules are not completely independent, we need a mechanism to transfer information from one to another. In this chapter, we will develop parameter passing methodologies. Because their contents are allowed to change, all variables must be allocated in RAM and not ROM. On the one hand, global variables contain information that is permanent and are usually assigned a fixed location in RAM. On the other hand, local variables contain temporary information and are stored in a register or allocated on the stack. One of the important objectives of this chapter is to present design steps for creating, using, and destroying local variables on the stack. In summary, there are three types of variables: public globals (shared permanent), private globals (unshared permanent), and private locals (unshared temporary). Because there is no appropriate way to create a public local variable, we usually refer to private local variables simply as local variables, and the fact that they are private is understood.

7.1

Local Versus Global A local variable contains temporary information. Since we will implement local variables on the stack or in registers, this information can not be shared with other software modules. Therefore, under most situations, we can further classify these variables as private. Local variables are allocated, used, then deallocated, in this specific order. For speed reasons, we wish to assign local variables to registers. When we assign a

256

7.1 䡲 Local Versus Global

257

local variable to a register, we can do so in a formal manner. There will be a certain line in the assembly software at which the register begins to contain the variable (allocation), followed by lines where the register contains the information (access or usage), and a certain line in the software after which the register no longer contains the information (deallocation). As an example, consider the register allocation used in a finite-state machine controller, shown earlier as Program 6.22, and again here as Program 7.1. Register B is allocated for holding the Output value in Line 6, used in Lines 6 through 9, then deallocated, such that after Line 9, Register B can be used for other purposes. Register B and Y are used in this program to temporarily hold information, and hence are classified as local variables. Constrast this to how Register X is used. This is a VERY simple program, and in such, the usage of Register X is unusual. This main program assigns Register X to hold the state pointer (Pt) in Line 5. From that point in time, Register X always contains Pt, and hence we classify this assignment of Register X as global (meaning permanent). It is appropriate to assign a register as a global only in the most simple situations (e.g., less than a 20-line program with no interrupts). Program 7.1 Register assignments in a finite-state machine controller.

Line  Program                    Register B  Register X  Register Y
1     Main lds  #$4000
2          bsr  Timer_Init
3          ldab #$FC             $FC
4          stab DDRT
5          ldx  #goN                         Pt
6     FSM  ldab OUT,x            Output      Pt
7          lslb                  Output      Pt
8          lslb                  Output      Pt
9          stab PTT              Output      Pt
10         ldy  WAIT,x                       Pt          Wait
11         bsr  Timer_Wait10ms               Pt          Wait
12         ldab PTT              Input       Pt
13         andb #$03             Input       Pt
14         lslb                  Input       Pt
15         abx                   Input       Pt
16         ldx  NEXT,x                       Pt
17         bra  FSM                          Pt

The information stored in a local variable is not permanent. This means if we store a value into a local variable during one execution of the module, the next time that module is executed the previous value is not available. Examples include loop counters and temporary sums. We use a local variable to store data that is temporary in nature. We can implement a local variable using the stack or registers.

Some reasons why we choose local variables over global variables include:
• Dynamic allocation/release allows for reuse of RAM memory.
• Limited scope of access (making it private) provides for data protection; only the program that created the local variable can access it.
• Since an interrupt will save registers and create its own stack frame, the code is reentrant.
• Since absolute addressing is not used, the code is relocatable.

Some reasons why we place local variables on the stack rather than using registers include:
• We can use symbolic names for the local variables, making the code easier to understand.
• The number of variables is limited only by the size of the stack, which is much larger than the register set.
• Because it is more general, it will be easier to add additional variables in the future.

Checkpoint 7.1: How do you create a local variable in C?

A global variable is allocated at a permanent and fixed location in RAM. A public global variable contains information that is shared by more than one program module. We must use global variables to pass data between the main program (i.e., foreground thread) and an ISR (i.e., background thread). If a function called from the foreground belongs to the same module as the ISR, then a global variable used to pass data between the function and the ISR is classified as a private global (assuming software outside the module does not directly access the data).

Global variables are allocated at assembly time and never deallocated. Allocation of a global variable means the assembler assigns the variable a fixed location in RAM. The information they store is permanent. Examples include time of day, date, calibration tables, user name, temperature, FIFO queues, and message boards. We use absolute addressing (direct or extended) to access their information. When dealing with complex data structures like the ones presented in Chapter 6, pointers to the data structures are shared. In general, it is a poor design practice to employ public global variables. On the other hand, private global variables are necessary to store information that is permanent in nature.

Observation: Sometimes we store temporary information in global variables because it is easier to observe the contents using the debugger. This usage is appropriate during the early stages of development, but once the module is tested, temporary information should be converted to local, and the system should be tested again.

Checkpoint 7.2: How do you create a global variable in C?

In C, a static local has permanent allocation, which means it maintains its value from one call to the next. It is still local in scope, meaning it is only accessible from within the function. I.e., modifying a local variable with static changes its allocation (it is now permanent), but doesn't change its scope (it is still private). In the following example, count contains the number of times MyFunction is called. The initialization of a static local occurs just once, during startup.

void MyFunction(void){
  static short count=0;
  count++;
}
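The difference in allocation can be observed directly by returning the counter. The following is a minimal sketch; CountCalls and CountAuto are hypothetical names introduced only for this illustration, and the assert is an added guard that is not part of the original example.

```c
#include <assert.h>
#include <limits.h>

/* Static local: permanent allocation, private scope. */
short CountCalls(void){
  static short count = 0;   /* initialized once, before the first call */
  assert(count < SHRT_MAX); /* the next increment must not overflow */
  count++;
  return count;             /* value carries over from call to call */
}

/* Automatic local: temporary allocation, private scope. */
short CountAuto(void){
  short count = 0;          /* created and initialized on every call */
  count++;
  return count;             /* the previous value is never available */
}
```

Successive calls to CountCalls return increasing values, while every call to CountAuto returns the same value, because its local is reallocated each time.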

In C, we create a private global variable using the static modifier. Modifying a global variable with static does not change its allocation (it is still permanent), but does reduce its scope. Regular globals can be accessed from any function in the system (public), whereas a static global can be accessed only by functions within the same file. Static globals are private. Functions can be static also, meaning they can be called only from other functions in the file. E.g.,

static short myPrivateGlobalVariable; // accessible by this file only
static void MyPrivateFunction(void){
}
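One common way to use a private global is to hide it behind public access functions, so other files can reach the data only through a controlled interface. The sketch below assumes a hypothetical module with names Total, Clamp, Total_Add, and Total_Get, and an arbitrary range of 0 to 1000; none of these come from the text.

```c
#include <assert.h>

/* Private global: permanent allocation, visible only in this file. */
static long Total = 0;

/* Private function: callable only from functions in this file. */
static long Clamp(long value){
  if(value > 1000) return 1000;
  if(value < 0) return 0;
  return value;
}

/* Public functions: the only access other files have to Total. */
void Total_Add(short amount){
  Total = Clamp(Total + amount);
}
long Total_Get(void){
  assert(Total >= 0 && Total <= 1000);  /* invariant kept by Clamp */
  return Total;
}
```

Because no other file can write Total directly, the range check in Clamp cannot be bypassed, which is the data protection that limited scope provides.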

In C, a const global is read-only. It is allocated in the ROM portion of memory. Constants, of course, must be initialized at compile time. E.g.,

const short Slope=21;
const unsigned char SinTable[8]={0,50,98,142,180,212,236,250};

Common Error: If you leave off the const modifier in the SinTable example, the table will be allocated twice: once in ROM containing the initial values and once in RAM containing data to be used at run time. Upon startup, the system copies the ROM-version into the RAM-version.

Maintenance Tip: It is good practice to specify whether an assembly variable is signed or unsigned in the comments. If the information has units (e.g., volts, seconds, etc.) this should be included also.

7.2 Stack Rules

In the last section, we discussed the important issue of global versus local variables. One of the more flexible means to create local variables will be the stack. In this section, we define a set of rules for proper use of the stack. A last-in-first-out (LIFO) stack is implemented in hardware by most computers. The stack can be used for local variables (temporary storage), saving return addresses during subroutine calls, passing parameters to subroutines, and saving registers during the processing of an interrupt. The first advantage of placing local variables on the stack is that the storage can be dynamically allocated before usage and deallocated after usage. The second advantage is the facilitation of reentrant software.

The stack pointer (SP) on the 9S12 points to the top entry of the stack, as shown in Figure 7.1. If it exists, we define the data immediately below the top (larger memory address) as next to top. To push a byte on the stack, we first decrement the stack pointer (SP), then we store the byte at the location pointed to by the SP. To pull a byte from the stack, first we read the byte from memory pointed to by SP, then we increment the SP. To push a 16-bit word on the stack, we first decrement the SP by 2, then we store the word into that location. To pull a 16-bit word from the stack, we first read the word from the location pointed to by SP, then we increment the SP by 2.

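Figure 7.1 The 9S12 stack. The white boxes are free spaces, and the shaded boxes contain data.
[Figure: two panels, "Empty Stack" and "Stack with 3 elements". In each panel SP marks the top of the stack; in the second panel the entries labeled top and next are shown.]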
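The push and pull sequences described above can be modeled in C. This is only a sketch of the ordering rules, not 9S12 code: the array Ram, the 256-byte stack size, and the function names are assumptions made for illustration, while the initial value $4000 follows the lds #$4000 used in this book. The 9S12 stores the most significant byte of a 16-bit word at the lower address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BOTTOM 0x4000u   /* initial SP, as in lds #$4000 */
#define STACK_SIZE   256u      /* assumed size of the stack area in bytes */

static uint8_t  Ram[STACK_SIZE];    /* models addresses $3F00 to $3FFF */
static uint16_t SP = STACK_BOTTOM;  /* empty stack */

/* Map a simulated address onto Ram; the assert catches any access
   outside the allocated stack area (overflow or underflow). */
static uint8_t *Mem(uint16_t addr){
  assert(addr >= STACK_BOTTOM - STACK_SIZE && addr < STACK_BOTTOM);
  return &Ram[addr - (STACK_BOTTOM - STACK_SIZE)];
}

void Push8(uint8_t data){    /* like psha: decrement SP, then store */
  SP = SP - 1;
  *Mem(SP) = data;
}
uint8_t Pull8(void){         /* like pula: read, then increment SP */
  uint8_t data = *Mem(SP);
  SP = SP + 1;
  return data;
}
void Push16(uint16_t data){  /* like pshx: SP=SP-2, then store the word */
  SP = SP - 2;
  *Mem(SP)     = (uint8_t)(data >> 8);    /* most significant byte */
  *Mem(SP + 1) = (uint8_t)(data & 0xFF);  /* least significant byte */
}
uint16_t Pull16(void){       /* like pulx: read the word, then SP=SP+2 */
  uint16_t data = (uint16_t)((*Mem(SP) << 8) | *Mem(SP + 1));
  SP = SP + 2;
  return data;
}
uint16_t GetSP(void){ return SP; }
```

Pushing a byte and then a word moves the simulated SP from $4000 to $3FFD, and pulling them in the reverse order returns the same data and restores SP to $4000.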

Checkpoint 7.3: How do we push/pull a 16-bit word onto/from the stack?

The instruction tsx will transfer a copy of the stack pointer into Register X. The instruction causes Register X to point to the top element of the stack, as shown in Figure 7.2. The instruction tsy works in a similar manner with Register Y. The tsx and tsy instructions do not modify the stack pointer. Formally, there is only SP that defines what data is on the stack. However, having a second pointer also point into the stack provides additional flexibility for accessing data.

Figure 7.2 The tsx instruction creates a stack frame pointer.
[Figure: two panels, "Stack before" and "Stack after tsx". In both panels SP points to the top entry, followed by next; after tsx, Register X also points to the top entry.]


We can read and write previously allocated locations on the stack using indexed mode addressing. For example, to read an 8-bit value from the next to the top byte:

      tsx          ;Reg X points to the top byte of the stack
      ldaa 1,X     ;Reg A = the next to the top byte

Stack pointer indexed mode also can be used to read any data on the stack:

      ldaa 1,SP    ;Reg A = the next to the top byte

The LIFO stack has a few rules (repeated from Chapter 5):
1. Program segments should have an equal number of pushes and pulls.
2. Stack accesses (push or pull) should not be performed outside the allocated area.
3. Stack reads and writes should not be performed within the free area.
4. Stack push should first decrement SP, then store the data.
5. Stack pull should first read the data, then increment SP.

Programs that violate rule number 1 will probably crash when an rts instruction pulls an illegal address off the stack at the end of a subroutine. The TExaS simulator will usually recognize this error as an illegal memory access when the processor tries to fetch an op code at this incorrect address. The backdump command will be useful to retrace the steps leading up to the crash.

Figures 7.1 and 7.2 show the free area as white boxes. Violations of rule number 2 can be caused by a stack underflow or overflow. Stack underflow is caused when there are more pulls than pushes and is always the result of a software bug. The TExaS simulator will recognize this error as an illegal memory access when the processor tries to pull data from an address that doesn't exist. A stack overflow has two possible causes. If the software mistakenly pushes more than it pulls, then the stack pointer will eventually overflow its bounds. Even when there is exactly one pull for each push, a stack overflow can occur if the allocated stack is not large enough. Stack overflow is a very difficult bug to recognize, because the first consequence occurs when the computer pushes data onto the stack and overwrites data stored in a global variable. At this point, the local variables and global variables exist at overlapping addresses. Setting a breakpoint at the first address of the allocated stack area allows you to detect a stack overflow situation.

Checkpoint 7.4: How do you specify the size of the stack?

The following 9S12 assembly code violates rule 3, and will not work if interrupts are active. The objective is to save Register A onto the stack. When an interrupt occurs, registers automatically will be pushed on the stack, destroying the data.

      staa -1,SP   ;Store Reg A onto the stack (***illegal***)

To use the stack, one first allocates, then saves. The following assembly code also violates rule 3, because it first stores the data on the stack, then allocates space. The objective is to push a zero onto the stack. If an interrupt were to occur between the clr and des instructions in the following example, the zero will be destroyed when registers are pushed on the stack by the interrupt context switch:

      tsx          ;Reg X points to the top of the stack
      clr  -1,X    ;Store zero onto the stack (***illegal***)
      des          ;Make space for the zero

The proper technique is to allocate first, then store:

      des          ;Allocate stack space first
      clr  0,SP    ;Store zero onto the stack

or

      clr  1,-SP   ;Store zero onto the stack
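The reason for rule 3 can be seen in a small C model of the stack. This is a sketch under stated assumptions: Ram, SP, and the function names are hypothetical, and the interrupt is modeled simply as nine bytes of register data written just below SP (the 9S12 stacks PC, Y, X, A, B, and CCR), followed by their removal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t Ram[64];   /* small simulated stack area */
static int SP = 64;       /* index of the top entry; 64 means empty */

/* Model of an interrupt: hardware pushes 9 bytes of registers,
   the ISR runs, and rti pulls them again. */
static void SimulatedInterrupt(void){
  assert(SP >= 9);               /* room for the register push */
  SP -= 9;
  memset(&Ram[SP], 0xEE, 9);     /* register contents overwrite the free area */
  SP += 9;                       /* rti restores SP */
}

/* Wrong order (violates rule 3): store below SP, then allocate. */
uint8_t StoreThenAllocate(uint8_t value){
  uint8_t result;
  Ram[SP - 1] = value;           /* like staa -1,SP: writes into the free area */
  SimulatedInterrupt();          /* interrupt arrives before the des */
  SP -= 1;                       /* like des */
  result = Ram[SP];
  SP += 1;                       /* deallocate */
  return result;
}

/* Correct order: allocate, then store. */
uint8_t AllocateThenStore(uint8_t value){
  uint8_t result;
  SP -= 1;                       /* like des */
  SimulatedInterrupt();          /* pushes land below the allocated byte */
  Ram[SP] = value;               /* like staa 0,SP */
  SimulatedInterrupt();
  result = Ram[SP];
  SP += 1;                       /* deallocate */
  return result;
}
```

In this model the value written below SP is overwritten by the simulated register push, whereas the value stored after allocation survives any number of interrupts.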


Constants can be pushed on the stack with the movb and movw instructions. For example, to push the byte 7:

      movb #7,1,-SP   ;push a 7 onto the stack

Checkpoint 7.5: Write an assembly instruction that pushes a 16-bit 1000 onto the stack.

7.3 Local Variables Allocated on the Stack

Stack implementation of local variables has four stages: binding, allocation, access, and deallocation.

1. Binding is the assignment of the address (not value) to a symbolic name. The symbolic name will be used by the programmer when referring to the local variable. The assembler binds the symbolic name to a stack index, and the computer calculates the physical location during execution. In the following example, the local variable will be at address SP+0, and the programmer will access the variable using sum,SP addressing:

sum   set  0       ;16-bit local variable, stored on the stack

Checkpoint 7.6: Why is set better than equ for binding?

2. Allocation is the generation of memory storage for the local variable. The computer allocates space during execution by decrementing the SP. In this first example, the software allocates the local variable by pushing a register on the stack. An 8-bit push (e.g., psha) creates an uninitialized 8-bit local variable, and a 16-bit push (e.g., pshx) creates an uninitialized 16-bit local variable. The value in the register is irrelevant; these instructions are used because they are a fast way to decrement the SP.

      pshx         ;allocate 16-bit sum

In this next example, the software allocates the local variable by decrementing the stack pointer. This local variable is also uninitialized. This method is the most general, allowing the allocation of an arbitrary amount of data.

      leas -2,SP   ;allocate sum

Checkpoint 7.7: In what way is pshx better than leas -2,sp for allocating a 16-bit local? In what way is leas -2,sp better?

If you wished to allocate a 16-bit local and initialize it to zero, you could execute:

      ldx  #0
      pshx         ;allocate sum=0

or

      movw #0,2,-sp   ;allocate sum=0

Checkpoint 7.8: Assume Register A contains the size in bytes of an array, determined at run-time. Write assembly code to allocate the array on the stack.

3. The access to a local variable is a read or write operation that occurs during execution. In the next code fragments, the value of the local variable sum is initialized to 0. One way is

      tsx          ;X points to locals
      ldd  #0
      std  sum,x   ;sum=0

and another way is

      movw #0,sum,sp   ;sum=0


In the next code fragment, the local variable sum is incremented. We could use RegX to access the data

      tsx
      ldd  sum,x
      addd #1
      std  sum,x   ;sum=sum+1

or use the SP directly.

      ldd  sum,sp
      addd #1
      std  sum,sp  ;sum=sum+1

4. Deallocation is the release of memory storage for the local variable. The computer deallocates space during execution by incrementing SP. In this first example, the software deallocates the local variable by pulling a register from the stack.

      pulx         ;deallocate sum

Observation: When the software uses the “push-register” technique to allocate and the “pull-register” technique to deallocate, it looks like it is saving and restoring the register. Because most applications of local variables involve storing into the local, the value pulled will NOT match the value pushed.

In this next example, the software deallocates the 16-bit local variable by incrementing the stack pointer by 2.

      leas 2,SP    ;deallocate sum

Checkpoint 7.9: Write a 9S12 subroutine that allocates then deallocates three 8-bit locals.

7.4 Stack Frames

Assume the SP is initialized to $4000. By definition, the SP points to the top of the stack. Therefore, all data on the stack exist at addresses between SP and $3FFF, i.e., SP ≤ address ≤ $3FFF. However, sometimes it is convenient to set up a second pointer into the stack, using either Register X or Y, called a stack frame pointer. For example, the stack frame pointer can point to a set of local variables and parameters of the function. It is important in this implementation that once the stack frame pointer is established (e.g., using the tsx instruction), the stack frame register (X) not be modified. The term frame refers to the fact that the pointer value is fixed. If Register X is a fixed pointer to the set of local variables, then a fixed binding (using the equ or set pseudo op) can be established between Register X and the local variables (even if additional information is pushed on the stack). Because the stack frame pointer should not be modified, every subroutine will save the old stack frame pointer of the function that called the subroutine (e.g., pshx at the top) and restore it before returning (e.g., pulx at the bottom). In some cases, the txs instruction can be used to deallocate the local variables. Local variable access uses the indexed addressing mode using Register X.

Observation: One advantage of using a stack frame is that you can push and pull within the body of the function and still be able to access local variables using their symbolic name.

Observation: One disadvantage of using a stack frame is that a register is dedicated as the frame pointer, and thus, it is unavailable for general use.

Programs 7.2, 7.3, and 7.4 all calculate the 16-bit sum of the first 100 numbers. The purpose of these simple programs is to demonstrate various implementations of local variables. In these programs, the result will be returned by value in Register D.

unsigned short calc(void){
  unsigned short sum,n;
  sum = 0;
  for(n=100;n>0;n--){
    sum=sum+n;
  }
  return sum;
}

Program 7.2 A simple function with two local 16-bit variables.

Program 7.3 shows two implementations using regular stack pointer addressing, as drawn in Figure 7.3 (left). The implementation on the left of Program 7.3 has no binding and is difficult to understand. In this version, the variable n is accessed using 2,SP addressing mode. The version on the right has exactly the same machine code as the left (same size and execution speed), but is easier to understand because the local variables are referred to by their symbolic names.

Figure 7.3 Local variables on the stack, accessed with indexed addressing modes.
[Figure: two stack diagrams, each 16 bits wide. Stack for Program 7.3: SP points to sum at 0,SP, followed by n at 2,SP, then the return address. Stack for Program 7.4: SP points to sum at -4,X, followed by n at -2,X, then the old Reg X (pointed to by X), then the return address.]

;***NO BINDING USED*****
; *******allocation phase *********
calc  leas -4,sp       ;allocate n,sum
; ********access phase ************
      movw #0,0,sp     ;sum=0
      movw #100,2,sp   ;n=100
loop  ldd  2,sp        ;RegD=n
      addd 0,sp        ;RegD=sum+n
      std  0,sp        ;sum=sum+n
      ldd  2,sp        ;n=n-1
      subd #1
      std  2,sp
      bne  loop
      ldd  0,sp        ;RegD=sum (return value)
; ********deallocation phase *****
      leas 4,sp        ;deallocation
      rts              ;RegD=sum

; *****binding phase***************
sum   set  0           ;16-bit number
n     set  2           ;16-bit number
; *******allocation phase *********
calc  leas -4,sp       ;allocate n,sum
; ********access phase ************
      movw #0,sum,sp   ;sum=0
      movw #100,n,sp   ;n=100
loop  ldd  n,sp        ;RegD=n
      addd sum,sp      ;RegD=sum+n
      std  sum,sp      ;sum=sum+n
      ldd  n,sp        ;n=n-1
      subd #1
      std  n,sp
      bne  loop
      ldd  sum,sp      ;RegD=sum (return value)
; ********deallocation phase *****
      leas 4,sp        ;deallocation
      rts              ;RegD=sum

Program 7.3 Stack pointer implementation of a function with two local 16-bit variables. The first listing (on the left in the original layout) is a poor style without binding, and the second (on the right) is a good style with binding.

Program 7.4 shows two implementations using stack frame pointer addressing. The one on the left has no binding and is difficult to understand. The one on the right has exactly the same machine code but is easier to understand. The program establishes the frame pointer, then allocates the variables. In Program 7.4, the variable n is accessed using -2,X addressing mode, as shown in Figure 7.3 (right). Notice in both cases of Figure 7.3 that valid data on the stack exists in memory at addresses greater than or equal to the stack pointer. In particular, one does not allocate/deallocate stack space by changing Registers X or Y. I.e., decrementing SP allocates space, and incrementing SP deallocates space.


;***NO BINDING USED*****
; *******allocation phase *********
calc  pshx             ;save old Reg X
      tsx              ;stack frame pointer
      leas -4,sp       ;allocate n,sum
; ********access phase ************
      movw #0,-4,x     ;sum=0
      movw #100,-2,x   ;n=100
loop  ldd  -2,x        ;RegD=n
      addd -4,x        ;RegD=sum+n
      std  -4,x        ;sum=sum+n
      ldd  -2,x        ;n=n-1
      subd #1
      std  -2,x
      bne  loop
      ldd  -4,x        ;RegD=sum (return value)
; ********deallocation phase *****
      txs              ;deallocation
      pulx             ;restore old X
      rts

; *****binding phase***************
sum   set  -4          ;16-bit number
n     set  -2          ;16-bit number
; *******allocation phase *********
calc  pshx             ;save old Reg X
      tsx              ;stack frame pointer
      leas -4,sp       ;allocate n,sum
; ********access phase ************
      movw #0,sum,x    ;sum=0
      movw #100,n,x    ;n=100
loop  ldd  n,x         ;RegD=n
      addd sum,x       ;RegD=sum+n
      std  sum,x       ;sum=sum+n
      ldd  n,x         ;n=n-1
      subd #1
      std  n,x
      bne  loop
      ldd  sum,x       ;RegD=sum (return value)
; ********deallocation phase *****
      txs              ;deallocation
      pulx             ;restore old X
      rts

Program 7.4 Stack frame pointer implementation of a function with two local 16-bit variables. The first listing (on the left in the original layout) is a poor style without binding, and the second (on the right) is a good style with binding.
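The benefit of a fixed frame pointer can be illustrated with a C model of the stack. The sketch below mirrors the structure of the stack frame version of calc, with one deliberate addition: an extra push and pull inside the loop body, to show that frame-relative bindings remain valid while SP moves. The array Ram, the index SP, and the helper names are assumptions for this illustration, not part of the book's software.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t Ram[64];   /* simulated stack area */
static int SP = 64;       /* index of the top entry; 64 means empty */

/* Write and read a 16-bit big-endian word at a simulated address. */
static void Write16(int addr, uint16_t data){
  assert(addr >= 0 && addr + 1 < 64);
  Ram[addr]     = (uint8_t)(data >> 8);
  Ram[addr + 1] = (uint8_t)(data & 0xFF);
}
static uint16_t Read16(int addr){
  assert(addr >= 0 && addr + 1 < 64);
  return (uint16_t)((Ram[addr] << 8) | Ram[addr + 1]);
}

enum { SUM = -4, N = -2 };   /* binding relative to the frame pointer */

uint16_t CalcWithFrame(void){
  int X;                     /* plays the role of Register X */
  uint16_t result;
  SP -= 2;                   /* pshx: save old Reg X (value not modeled) */
  X = SP;                    /* tsx: establish the frame pointer */
  SP -= 4;                   /* leas -4,sp: allocate n,sum */
  Write16(X + SUM, 0);       /* sum=0 */
  Write16(X + N, 100);       /* n=100 */
  while(Read16(X + N) != 0){
    SP -= 2;                 /* extra push inside the body moves SP */
    Write16(X + SUM, (uint16_t)(Read16(X + SUM) + Read16(X + N)));
    Write16(X + N, (uint16_t)(Read16(X + N) - 1));
    SP += 2;                 /* matching pull */
  }
  result = Read16(X + SUM);  /* read sum before deallocating */
  SP = X;                    /* txs: deallocate locals */
  SP += 2;                   /* pulx: restore old Reg X */
  return result;
}
int GetSP(void){ return SP; }
```

With SP-relative bindings, the offsets of sum and n would each have to be adjusted by 2 between the extra push and pull; with the frame pointer they do not change. After the call, the simulated SP is back at its starting value, satisfying rule 1.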

Example 7.1. Write an assembly subroutine with three 8-bit and one 16-bit local variables allocated on the stack. Name the variables cnt, n, flag, and pt.

Solution There are two general approaches for creating local variables on the stack. Stack pointer addressing is faster, but stack frame addressing is more flexible, allowing for additional stack pushes within the body of the subroutine. The solutions in Program 7.5 begin by allocating five bytes of storage. When using SP addressing, we simply decrement the stack pointer by 5. When using stack frame pointer addressing, we save the frame pointer, copy the SP into the frame pointer, and then decrement the stack pointer by 5. We then draw a picture of the stack at this point, and assign the four variables into the five bytes of storage, as shown in Figure 7.4. There is no particular advantage of one assignment over another, as long as the four variables exist contiguously. We label the addressing mode to be used to access each variable, and use these numbers to assign the bindings in our software.

; *****binding phase***************
cnt   set  0           ;8-bit number
n     set  1           ;8-bit number
flag  set  2           ;8-bit number
pt    set  3           ;16-bit number
; *******allocation phase *********
func  leas -5,sp       ;allocate cnt,n,flag,pt
; ********access phase ************
; ********deallocation phase *****
      leas 5,sp        ;deallocation
      rts

; *****binding phase***************
cnt   set  -5          ;8-bit number
n     set  -4          ;8-bit number
flag  set  -3          ;8-bit number
pt    set  -2          ;16-bit number
; *******allocation phase *********
func  pshx             ;save old Reg X
      tsx              ;stack frame pointer
      leas -5,sp       ;allocate cnt,n,flag,pt
; ********access phase ************
; ********deallocation phase *****
      txs              ;deallocation
      pulx             ;restore old X
      rts

Program 7.5 Three 8-bit and one 16-bit local variables on the stack. The first listing (on the left in the original layout) uses stack pointer addressing, and the second (on the right) uses a stack frame pointer.

Figure 7.4 Three 8-bit and one 16-bit local variables on the stack.
[Figure: two stack diagrams, each 8 bits wide. Stack pointer addressing: SP points to cnt at 0,SP, followed by n at 1,SP, flag at 2,SP, pt at 3,SP, then the return address. Stack frame pointer addressing: SP points to cnt at -5,X, followed by n at -4,X, flag at -3,X, pt at -2,X, then the old Reg X (pointed to by X), then the return address.]

7.5 Parameter Passing Using Registers, Stack, and Global Variables

Up to this point in the book, we used registers to pass data into and out of subroutines. The input parameters (or arguments) are pieces of data passed from the calling routine into the subroutine during execution. The output parameter (or argument) is information returned from the subroutine back to the calling routine after the subroutine has completed its task. As previously defined in Chapter 6, there are two methods to pass parameters: call by reference and call by value. With call by reference, a pointer to the object is passed. In this way, the subroutine and the module that calls the subroutine have access to the exact same object. Call by reference can be used to pass a large quantity of data and can be used to implement a parameter that is both an input and an output parameter. With call by value, a copy of the data itself is passed. Using the stack to pass parameters provides a much greater flexibility not possible with just the registers.

7.5.1 Parameter Passing in C

The call-by-reference method passes a pointer to the object. In other words, references (pointers) to the actual arguments are passed, instead of copies of the actual arguments themselves. In this scheme, assignment statements have implied side effects on the actual arguments; that is, variables passed to a function are affected by changes to the formal arguments. Sometimes side effects are beneficial, and sometimes they are not. As an example, consider a stepper motor program shown in Program 7.6. Both assembly and C versions are shown. With call-by-reference parameter passing, there is one copy of the information, and the calling program (e.g., main) passes an address (RegX in the assembly version) to the function. The read and write accesses to the parameter affect the original variable.

;RegX points to the angle
next  inc  0,x         ;(*pt)++
      ldaa 0,x         ;RegA=(*pt)
      cmpa #200
      bne  skip
      clr  0,x         ;(*pt) = 0
skip  rts
angle set  0           ;0 to 199
main  lds  #$4000
      clr  1,-SP       ;angle=0
      jsr  Stepper_Init
loop  jsr  Stepper_Step
      leax angle,sp    ;RegX=&angle
      bsr  next
      bra  loop

void next(unsigned char *pt){
  (*pt)++;
  if((*pt) == 200){
    (*pt) = 0;
  }
}
void main(void){
  unsigned char angle=0; // 0 to 199
  Stepper_Init();
  while(1){
    Stepper_Step();
    next(&angle);
  }
}

Program 7.6 An input/output parameter is implemented using call by reference.

Since C supports only one formal output parameter, we can implement additional output parameters using call by reference. The calling program passes pointers to empty objects (RegX and RegY in the assembly version), and the where function fills the objects with data. Program 7.7 shows a function that returns two parameters using call by reference. Assume global variables Xx and Yy are private to the where function and contain the true current position.

Xx    rmb  2           ;private to where
Yy    rmb  2
where movw Xx,0,X      ;RegX = xpt
      movw Yy,0,Y      ;RegY = ypt
      rts
myX   set  0           ;16-bit
myY   set  2
func  leas -4,sp       ;allocate
      leax myX,sp      ;RegX=&myX
      leay myY,sp      ;RegY=&myY
      bsr  where
;do something based on myX,myY
      leas 4,sp        ;deallocate
      rts

short Xx,Yy; /* position */
void where(short *xpt, short *ypt){
  (*xpt) = Xx; // return Xx
  (*ypt) = Yy; // return Yy
}
void func(void){
  short myX,myY;
  where(&myX,&myY);
  // do something based on myX,myY
}

Program 7.7 Multiple output parameters implemented using call by reference.

When we use the call-by-value scheme, the values (not references) are passed to functions. With call by value, copies are made of the parameters. Within a called function, references to formal arguments access the copied values, instead of the original objects from which they were taken. At the time when the computer is executing within next, as shown in Program 7.8, there will be two separate and distinct copies of the angle data.

;Input: RegA is theAngle
;Output:RegA is theAngle
next  inca             ;theAngle++
      cmpa #200
      bne  skip
      clra             ;theAngle=0
skip  rts
angle set  0           ;0 to 199
main  lds  #$4000
      clr  1,-SP       ;angle=0
      jsr  Stepper_Init
loop  jsr  Stepper_Step
      ldaa angle,sp    ;copy
      bsr  next
      staa angle,sp
      bra  loop

unsigned char next(unsigned char theAngle){
  theAngle++; // next angle
  if(theAngle == 200){
    theAngle = 0; // one rotation
  }
  return(theAngle);
}
void main(void){
  unsigned char angle=0; // 0 to 199
  Stepper_Init();
  while(1){
    Stepper_Step();
    angle = next(angle);
  }
}

Program 7.8 Parameters are implemented using call by value.

An important point to remember about passing arguments by value in C is that there is no connection between an actual argument and its source. Changes to the arguments made within a function have no effect whatsoever on the objects that might have supplied their values. They can be changed and the original values will not be affected. This removes a burden of concern from the programmer, who may use arguments as local variables without side effects. It also avoids the need to define temporary variables just to prevent side effects.

It is precisely because C uses call by value that we can pass expressions, not just variables, as arguments. The value of an expression can be copied, but it cannot be referenced since it has no existence in memory. Therefore, call by value adds important generality to the language. Since expressions may include assignment, increment, and decrement operators, it is possible for argument expressions to affect the values of arguments lying to their right. Consider, for example,

func(y=x+1, 2*y);

where the first argument has the value x+1 and the second argument is intended to have the value 2*(x+1). The actual value of the second argument depends on whether the arguments are evaluated right-to-left or left-to-right. This kind of situation should be avoided, since the C language does not guarantee the order of argument evaluation. The safe way to write this is

y=x+1;
func(y, 2*y);

The value of the expression is calculated at the time of the call, and that value is passed into the subroutine.

Checkpoint 7.10: What is the difference between call by value and call by reference?
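Two consequences of call by value, using an argument as a local variable and passing an expression as an argument, can be sketched as follows. SumTo and CallerUnchanged are hypothetical names for this illustration, and the limit of 361 is an added precondition so that the sum fits in 16 bits.

```c
#include <assert.h>

/* n arrives as a private copy, so it can serve as the loop counter. */
unsigned short SumTo(unsigned short n){
  unsigned short sum = 0;
  assert(n <= 361);         /* 361*362/2 = 65341 still fits in 16 bits */
  while(n > 0){
    sum = sum + n;
    n--;                    /* changes the copy, not the caller's variable */
  }
  return sum;
}

/* Returns 1 if the caller's variable is unchanged by the call. */
int CallerUnchanged(void){
  unsigned short count = 100;
  unsigned short total = SumTo(count);
  return (count == 100) && (total == 5050);
}
```

A call such as SumTo(2*5) is also legal, because the expression is evaluated at the time of the call and only its value is passed.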

7.5.2 Parameter Passing in Assembly Language

In contrast to C, it is easy to return multiple parameters in assembly language. If just a few parameters need to be returned we can use the registers. In Program 7.9, the values of ports A, B, T, and M are to be returned. Notice that it packs two 8-bit parameters into the 16-bit Register X.

; Reg A = Port A, Reg B = Port B
; Reg X = Ports T and M
GetPorts ldaa PTT
         ldab PTM
         xgdx
         ldaa PORTA
         ldab PORTB
         rts
********calling sequence******
         jsr  GetPorts
* Reg A,B,X have four results
         staa first
         stab second
         xgdx
         staa third
         stab fourth

Program 7.9 Multiple return parameters implemented with registers.


If many parameters are needed, then the stack can be used. Program 7.10 also returns the values of ports A, B, T, and M. Space for the output parameters is allocated by the calling routine, and GetPorts stores the results into those stack locations.

dataA    set  2
dataB    set  3
dataT    set  4
dataM    set  5
GetPorts movb PORTA,dataA,sp
         movb PORTB,dataB,sp
         movb PTT,dataT,sp
         movb PTM,dataM,sp
         rts
********calling sequence******
         leas -4,sp    ;allocate
         jsr  GetPorts
         pula          ;first
         staa first
         pula          ;second
         staa second
         pula          ;third
         staa third
         pula          ;fourth
         staa fourth

Program 7.10 Multiple return parameters passed on the stack.

An input parameter is information passed from the calling program into the subroutine before the subroutine is executed. An output parameter is information passed out of the subroutine back to the calling program after the subroutine is executed. A parameter can be both an input and an output. The purpose of the next set of examples is to illustrate parameter passing. The subroutine Add8 calculates M = M + N, and sets the flag P if there is an unsigned overflow. M is a 16-bit input/output parameter, N is an 8-bit input parameter, and P is a 1-bit output parameter.

The simplest and fastest method to pass parameters uses registers. In this method, the information is contained in the registers. Because concurrent programs have "separate" registers and stack areas, the subroutine is reentrant. Program 7.11 shows the addition module. Reentrancy will be discussed in Chapter 12.

; Subroutine Calling Sequence
;   place information in A,X
;   bsr Add8
;   use information in CC,X
; Subroutine Definition
; N is an input parameter, an unsigned 8-bit byte, passed in Reg A
; M is an input/output, a 16-bit number, passed/returned in Reg X
; P is an output parameter, a Boolean flag,
;   returned in Reg CC carry bit
Add8  psha             ;Put N on the stack
      xgdx             ;Place M in Reg D
      addb 1,SP+       ;Add N to the LSByte of M
      adca #0          ;Reg D=M+N, CC(carry bit) = P
      xgdx             ;Return result in Reg X
      rts

Program 7.11 Addition function that passes parameters call by value in registers.


A simple but completely inappropriate method is to pass parameters using global variables. In this method, the information is contained in global memory variables. Because of the writes to global memory M and P, the subroutine, shown in Program 7.12, is not reentrant. Many embedded systems use this approach because the processor has limited or no facilities for handling data on the stack.

Program 7.12 Addition function that passes call-by-value parameters in global variables.

; These three variables can be anywhere in RAM memory
N     rmb  1   ;N is an input parameter, an unsigned 8-bit number
M     rmb  2   ;M is an input/output parameter, 16 bits
P     rmb  1   ;P is an output parameter, a Boolean flag,
               ;  0 means no overflow, -1 means overflow
; Subroutine Calling Sequence
;   place information in N,M
;   bsr Add8
;   use information in M,P
; Subroutine Definition
Add8  clr  P   ;Assume no overflow, P=0
      ldd  M   ;Place M in Reg D
      addb N   ;Add N to the LSByte of M
      adca #0  ;Reg D=M+N, CC(carry bit) = P
      bcc  POK ;Skip if P should remain zero
      com  P   ;Overflow, P=-1
POK   std  M   ;Return result in M
      rts

A flexible and elegant method is to pass parameters using the stack. In this method, the information is placed on the system or user stack. As we will see later, most high-level languages generate code that passes the first parameter in a register but uses the stack to pass additional parameters. However, most high-level languages have only a single output parameter, which is usually returned in a register. When interrupts are enabled, it is possible to have multiple threads active at the same time. There is still only one processor, so exactly one thread is actually running at a time, but we define concurrent programming as the state where multiple threads are “ready to run” at the same time. The interrupt hardware provides the mechanism to switch from one thread to the next. Because concurrent threads have “separate” registers and stack areas, software that uses the stack will operate properly in a concurrent environment. Conversely, extreme care is required when using global variables (including the I/O ports) in a concurrent environment. The other advantage of using the stack is that memory space is used temporarily, then deallocated. Program 7.13 passes both input and output parameters on the stack. Figure 7.5 shows the stack while the subroutine is being executed.

Program 7.13 Addition function that passes call-by-value parameters on the stack.

; Subroutine Calling Sequence
;   des                 Make room on the stack for P
;   push M (16 bits) onto the stack
;   push N (8 bits) onto the stack
;   bsr  Add8
;   ins                 Discard input only parameter, N
;   pop M (16 bits) off the stack
;   pop P (8 bits) off the stack
; Subroutine Definition
; N is an input parameter, an unsigned 8-bit number, passed on the top of the stack
; M is an input/output, a 16-bit number, passed/returned on top-1, top-2
; P is an output parameter, a Boolean flag, returned on top-3
;               Access  Contents
;               0,SP    16-bit return address
N     set  2    ;N,SP   8-bit N
M     set  3    ;M,SP   16-bit M
P     set  5    ;P,SP   8-bit P
Add8  clr  P,SP ;Assume no overflow, P=0
      ldd  M,SP ;Place M in Reg D
      addb N,SP ;Add N to the LSByte of M
      adca #0   ;Reg D=M+N, CC(carry bit) = P
      bcc  POK  ;Skip if P should remain zero
      com  P,SP ;Overflow, P=-1
POK   std  M,SP ;Return result in M
      rts       ;Return

Figure 7.5 Stack diagram showing the parameters as passed in Program 7.13.

Offset          Contents (each entry is 8 bits wide)
0,SP to 1,SP    return address (SP points here)
2,SP            N
3,SP to 4,SP    M
5,SP            P

7.5.3 C Compiler Implementation of Local and Global Variables

One of the most important applications of learning assembly language involves analyzing assembly listings when programming in a high-level language. When one programs in a high-level language, there are many design decisions to be made affecting accuracy (e.g., overflow, dropout), reliability (e.g., buffer overflow, critical section, race condition), speed, and code size. Often, these decisions can be best understood at the assembly language level. In fact, one cannot tell if a section of high-level language code is critical without looking at the associated assembly language generated by the compiler. For another example, assume you are designing a finite-state machine in C. You could implement the FSM using a linked data structure like Program 6.22 or with a table like Program 6.23. If you compiled them both and observed the generated listing files, you could determine which version runs faster. Sometimes we have a high-level language program that we know doesn’t work, but we just can’t seem to find the bug. Often it is easier to visualize bugs by looking at the assembly listing in and around the bugged code. Another application of observing the assembly listing generated by the compiler involves proving program correctness. For example, we might ask if the following C code causes an overflow error (assuming both In and Out are 8-bit unsigned char).

  Out = (99*In)/100;

There are two ways to determine if overflow could occur. First, we could exhaustively test the software giving all possible inputs and verifying the correct output for each test case.


Second, knowing the architecture and assembly language of the machine, we could look at the compiler listing and prove that overflow cannot occur. The following assembly code was generated by the Metrowerks Codewarrior V4.6 compiler. Because Out can never be greater than In, the multiplication is 8 by 8 into 16 bits, and the division is 16 by 16 into 16 bits, this software cannot overflow. Furthermore, we see this code takes exactly 23 cycles to execute.

Address  Object code  Cycles  Instruction
0006     c663         [1]     LDAB  #99
0008     b60000       [3]     LDAA  In
000b     12           [1]     MUL
000c     ce0064       [2]     LDX   #100
000f     1815         [12]    IDIVS
0011     b751         [1]     TFR   X,B
0013     7b0000       [3]     STAB  Out
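The first approach, exhaustive testing, is practical here because there are only 256 possible inputs. The following is a minimal sketch, assuming a C compiler whose int is at least 16 bits (so that 99*255 = 25245 fits); the function name count_scale_errors is hypothetical and not part of the book's software.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical exhaustive test: tries every 8-bit input and counts the
// cases where the 8-bit result differs from a wide-arithmetic reference.
int count_scale_errors(void) {
    int errors = 0;
    for (unsigned in = 0; in <= 255; in++) {
        uint8_t In = (uint8_t)in;
        uint8_t Out = (uint8_t)((99 * In) / 100);        // expression under test
        unsigned long expected = (99UL * in) / 100UL;    // reference in wide arithmetic
        if (Out != expected) {
            errors++;                                    // truncation changed the answer
        }
        assert(Out <= In);                               // the result never exceeds the input
    }
    return errors;
}
```

A return value of zero means no input produced a truncated result. This confirms the behavior for one compiler and data size only; the listing analysis above is what shows why it holds.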

The specific goal of this section is to study how compilers implement local variables and pass parameters. However, in the big picture, we can improve our understanding of both the machine architecture and our high-level language programs by looking at the assembly code generated by the compiler. Program 7.14 shows a simple C program with a global variable G, two local variables both called z, and function parameters m and n. All three compilers analyzed in this section will pass one parameter in Register D and push the other parameter on the stack. If there were additional parameters, they too would have been pushed on the stack by the calling routine. Furthermore, all three compilers will push the one parameter initially passed in Register D onto the stack at the beginning of the subroutine. In this way, during the execution of the subroutine sub, the parameters are all on the stack. The first two compilers studied in this section will place the local variables on the stack. The third compiler will generate more efficient code by placing the local variables in registers as needed.

Program 7.14 An example used to illustrate the C compiler’s use of the stack.

short G;                  // definition of a global variable
short sub(short n, short m){
  short z;
  z = n-m;
  return(z);
}
int main(void){           // returns a value, so declared int
  short z;                // definition of a local variable
  G = 5;                  // access global variable
  z = 6;                  // access local variable
  G = sub(z,1);           // call function, pass parameter
  return(0);
}

Observation: Although the local variables of the main program are on the stack, and it IS possible to access them, the compiler will NOT allow the subroutine to access them. In C, there is a clear distinction between the parameters pushed on the stack that are supposed to be accessed by the subroutine and the local variables of the calling program, which are not supposed to be accessed. Common Error: It would be a grievous programming error to access the local variables of the main program from the subroutine. Therefore, in assembly language, it is essential to make the distinction between local variables and data passed on the stack to the subroutine.


7 䡲 Local Variables and Parameter Passing

Program 7.15 Assembly code generated for the 6812 by the GCC compiler.

z     set  2
n     set  0
m     set  8
sub   movw $0800,2,-SP  ;1)save previous stack frame pointer
      pshx              ;allocate space for n,z
      pshx
      sts  $0800        ;2)establish stack frame pointer
      ldx  $0800
      std  n,X          ;place n on the stack
      ldx  $0800        ;3) use frame to access n
      ldd  n,X          ;RegD=n
      ldx  $0800        ;3) use frame to access m
      subd m,X          ;RegD=n-m
      ldx  $0800        ;3) use frame to access z
      std  z,X          ;z=n-m
      ldx  $0800        ;3) use frame to access z
      ldd  z,X          ;RegD=z
      pulx              ;deallocate n,z
      pulx
      movw 2,SP+,$0800  ;4)restore previous stack frame pointer
      rts
z     set  0
main  movw $0800,2,-SP  ;1)save previous stack frame pointer
      pshx              ;allocate z
      sts  $0800        ;2)establish stack frame pointer
      movw #5,G         ;G=5
      ldx  $0800        ;3) use frame to access z
      movw #6,z,X       ;z=6
      movw #1,2,-SP     ;push second parameter onto stack
      ldx  $0800        ;3) use frame to access z
      ldd  z,X          ;first parameter in RegD
      bsr  sub
      leas 2,SP         ;discard parameter
      std  G            ;G = sub(z,1)
      ldd  #0
      pulx              ;deallocate z
      movw 2,SP+,$0800  ;4)restore previous stack frame pointer
      rts

The first compiler we will study is GCC Release 3.1 for the 6812. The assembly listing, shown as Program 7.15, has been edited to be consistent with the syntax of this book. In particular, the set pseudo-ops were added to help see where information is stored on the stack. The sts instruction establishes a stack frame pointer, at global memory $0800. The use of the stack frame pointer follows the typical pattern: (1) save old frame, (2) establish a new frame, (3) use the frame whenever accessing data on the stack, and (4) restore the previous frame. The pshx instruction allocates local variables. The Register X indexing mode is used to access the data on the stack. The pulx instruction deallocates the local variables. The stack pictures for the three compilers at the time of the subd instruction are drawn in Figure 7.6. Although the local variable of main is on the stack, it will not be (and should not be) accessed by the subroutine.

GCC for the 6812 (global area: frame, G; stack area, 16-bit entries)
  0,X    n             (SP and frame point here)
  2,X    z of sub
  4,X    old Frame
  6,X    return addr
  8,X    m
         z of main
         old Frame

ICCV7 for the 6812 (global area: G; stack area, 16-bit entries)
  0,SP   z of sub      (SP points here)
  2,SP   n
  4,SP   return addr
  6,SP   m
         z of main

Metrowerks Codewarrior 4.6 (global area: G; stack area, 16-bit entries)
  0,SP   m             (SP points here)
  2,SP   return addr
  4,SP   n

Figure 7.6 The stack contains local variables, parameters, and the return address.

The next compiler we will study is ImageCraft ICCV7 for the Freescale 6812. Again, the disassembled output has been edited to clarify its operation, and is shown as Program 7.16. The global symbol, G, will be assigned or bound by the linker/loader. The leas instruction allocates and deallocates local variables, and stack pointer addressing is used to access parameters and local variables. This compiler passes the first input parameter into the subroutine by placing it in Register D. The remaining parameters are pushed on the stack by the calling routine.

Program 7.16 Assembly code generated for the 6812 by the ICCV7 compiler.

z     set  0
m     set  6
n     set  2
sub   pshd          ;place n on the stack
      leas -2,SP    ;allocate z
      ldd  n,SP     ;RegD = n
      subd m,SP     ;RegD = n-m
      tfr  D,Y
      sty  z,SP     ;z = n-m
      tfr  Y,D
      leas 4,SP     ;deallocate z,n
      rts
z     set  2
main  leas -4,SP    ;allocate z,secondParameter
      movw #5,G     ;G=5
      movw #6,z,SP  ;z=6
      ldy  #1
      sty  0,SP     ;put second parameter on stack
      ldd  z,SP     ;first parameter in RegD
      jsr  sub
      tfr  D,X
      std  G        ;G = sub(z,1)
      ldd  #0
      leas 4,SP     ;deallocate z,secondParameter
      rts

The third compiler we will study is Metrowerks Codewarrior 4.6 for the Freescale 9S12. Again, the disassembled output has been edited to clarify its operation (see Program 7.17). This is a highly optimized compiler. The local variable in both main and sub was implemented in a register. For this compiler, the second (or last) parameter is passed in Register D and the remaining parameters are pushed on the stack.


Program 7.17 Assembly code generated for the 9S12 by the Metrowerks Codewarrior compiler.

m     set  0
n     set  4
sub   pshd          ;place m on the stack
      ldd  n,sp     ;RegD = n
      subd m,sp     ;RegD = n-m
      pulx          ;deallocate m
      rts
main  ldab #5
      clra
      std  G        ;G=5
      incb          ;RegD=z=6
      pshd          ;put first parameter on stack
      ldab #1       ;second parameter in RegD
      bsr  sub
      leas 2,sp     ;discard parameter
      std  G        ;G = sub(z,1)
      clrb
      clra
      rts

Observation: Notice the difference in code efficiency between a free compiler (GCC), a compiler costing about $250 (ICCV7), and a compiler costing over $3000 (Metrowerks Codewarrior).

7.6 Tutorial 7 Debugging Techniques

The objective of this tutorial is to illustrate some debugging techniques. In particular, we will use TExaS to visualize stack overflow and stack underflow.

Action: Copy the Tutor7.rtf and Tutor7.uc files from the Web onto your hard drive. Start a fresh copy of TExaS and open these files from within TExaS. This should open the corresponding microcomputer window. This program contains an integer square root subroutine, based on Newton’s method. There is a bug in it that causes a stack overflow. The purpose of this main program is to exhaustively test this function by giving it all possible input patterns and manually checking the validity of all outputs. Being able to evaluate a subroutine with a known and repeatable sequence of inputs is called stabilization. Once a system is stabilized (the inputs are fixed and known), changes to the subroutine can be made with confidence that changes in the output are a result of software modification and not due to changes in the input.

Question 7.1 This is a very easy bug to spot, but it represents a typical programming error. By visual inspection of the main program, identify the programming error that causes the stack overflow, but don’t fix it.

Question 7.2 What’s the difference between a breakpoint and a ScanPoint?

Action: Assemble the program. Notice that Input and Output parameters with unsigned 8-bit decimal format are in the ViewBox. A breakpoint has been added at the location in the main program labeled check. You can add breakpoints in two ways. The first way is to left-click the line in the listing file, then right-click and execute BreakAtCursor. The second way is to type the address (you should use the symbolic address check rather than its numerical value) into the Break/ScanPoints box and click the add button. You could have used its absolute address, but absolute addresses must be recalculated each time the software is modified. The double red arrow («) points in the listing file to the breakpoint. Make check a ScanPoint by toggling the Mode->BreakMode command until the check mark is removed. Figure T7.1 shows the resulting configuration.


Figure T7.1 A ScanPoint is added to Tutorial 7.

Action: Run the system until the first ten outputs are calculated, then stop the simulation with F12. You should see the following results in the TheLog.rtf file. These results are correct.

Input=0  Output=0
Input=1  Output=1
Input=2  Output=2
Input=3  Output=2
Input=4  Output=2
Input=5  Output=2
Input=6  Output=3
Input=7  Output=3
Input=8  Output=3
Input=9  Output=3

Question 7.3 Explain how these first ten results are correct. In particular, verify how the output is the square root of the input. Are there any minor errors?

Action: Run the system until TExaS gives the “Write to EEPROM address 0x07FF” error. Hit reset, run it again, and this time observe the memory box in the Stack window. Notice locations $0800 (Input) and $0801 (Output). The rest of the memory ($0802 to $0901) is the stack. In particular, watch in the memory box as the stack overflows.

Question 7.4 Look in the TheList.rtf file and identify which instruction caused the error. The cursor arrow (») will point to the instruction after the one that caused the error.

Action: When a stack instruction causes a bug, observing the stack pointer makes sense. Add the SP to the ViewBox, hit reset, and run it again. The last few outputs are shown below.

Input=59  Output=8  SP=$0813
Input=60  Output=8  SP=$080F
Input=61  Output=8  SP=$080B
Input=8   Output=8  SP=$0807
Write to EEPROM address 0x07FF.

Question 7.5 Stack errors can cause weird behavior. Why did Input change from 61 to 8, when it should have been 62?

Action: Fix the bug (change the second pshx to a pulx), assemble, and run the debugged system.

Action: Sometimes a stack error results in the program branching to a location that is not part of your program. Remove the pshb instruction from the first line of the sqrt subroutine. Assemble the software with this new bug and run the system. This stack underflow will cause an error. You should get a Read from uninitialized RAM address error.

Question 7.6 You won’t be able to find the cursor arrow (») in the TheList.rtf file. Add the PC to the ViewBox, hit reset, run the system again, and check the value of the PC at the time of error.

Question 7.7 There are two ways to find this bug. The first way is to execute Action-BackDump. What are the last five instructions to be executed just before the error? Where in the program are these five instructions?

Question 7.8 The second way to visualize the error is to activate Mode-FollowPC. Click this option, reset the computer, and run it again. The rts instruction is highlighted, showing you the last instruction to execute. What does the purple color on the pulb instruction mean?

7.7 Homework Problems

Homework 7.1 What does it mean to say a function is public versus private? Why is this distinction important?

Homework 7.2 What does it mean to say a variable is public versus private? Why is this distinction important?

Homework 7.3 What does it mean to say a variable is local versus global?

Homework 7.4 Write assembly code that finds the average value of a ten-element array. The two parameters are passed by reference on the stack. Local variables must be allocated on the stack.

void average(unsigned short *pt, unsigned short *ave){
  unsigned short sum,n;
  sum = 0;
  for(n=0;n
[...]
Buffer[j]){
        temp = Buffer[j-1];      /* Exchange */
        Buffer[j-1] = Buffer[j];
        Buffer[j] = temp;
      }
    }
  }
}

A typical calling sequence is

      ldx  #mydata  ; pointer to 20-byte structure (call by reference)
      pshx
      ldaa #20      ; Count (call by value)
      psha
      jsr  Bubble
      pula          ; balance stack
      pulx

b) Use this simple assembly code to debug your Bubble Sort algorithm.

        org  $0800
mydata  rmb  5
main    lds  #$4000
        ldaa #$35
        staa mydata    ; initialize
        ldd  #$3433
        std  mydata+1
        ldd  #$3231
        std  mydata+3  ; mydata[]={'5','4','3','2','1'}
        ldx  #mydata   ; pointer to 5-byte structure (call by reference)
        pshx
        ldab #5        ; Count (call by value)
        pshb
        jsr  Bubble
        ins            ; balance stack
        pulx
        stop

c) Write assembly code that tests the Bubble Sort algorithm. Copy and paste the SCI device driver software from tut2.rtf. This main program will input an ASCII string from an SCI-CRT interface (call SCI_InString), calculate its length, call the bubble sort subroutine, and output the sorted string on the SCI-CRT (call SCI_OutString).

d) Add debugging code to the test software in part c) that measures the elapsed execution time for the sort subroutine. Plot the execution time versus buffer size (using worst-case initial data) for buffer sizes 10, 20, 30, and 40 bytes. Fit this data to a quadratic equation to derive a general solution for all sizes.

Lab 7.2 Heap Sort

Purpose. This lab has these major objectives:
• To evaluate the static and dynamic efficiency of software
• To learn how to pass subroutine parameters on the stack
  • By value, pushing the value onto the stack
  • By reference, pushing a pointer onto the stack
• To implement local variables on the stack
• To study the Heap Sort algorithm

Description.

a) Write assembly code that implements the Heap Sort algorithm. The input parameters are passed on the stack. Local variables must be allocated on the stack. The buffer size (Count) is 1 to 255.

void HeapSort(char *Buffer, unsigned char Count){
// Count is the size of the byte array Buffer[i]
  unsigned char i,j;   // used when sifting
  unsigned char ir;
  unsigned char m;     // used in the hiring phase
  char z;              // temporary, used to sort
  m = (Count>>1)+1;    // initial value Count/2+1
  ir = Count;
  for(;;){
    if(m > 1){
      --m;
      z = Buffer[m];            // still hiring
    }
    else{                       // in retirement and promotion
      z = Buffer[ir];           // clear space at end
      Buffer[ir] = Buffer[1];   // Retire top of heap into it
      if(--ir == 1){            // Done with last promotion?
        Buffer[1] = z;          // least competent worker of all
        break;
      }
    }
    i = m;      // whether in the hiring or promotion phase
    j = m+m;    // we set up to sift down element z to
    while(j
[...]
=1; // shift into position
    }
    pt++;
  }
  return key;
}

8.5 䡲 Parallel Port LCD Interface with the HD44780 Controller


once in the initialization. The Key_Scan function returns two parameters. One parameter is the number of keys pressed. If there is exactly one key pressed, the second parameter contains the ASCII code representing that key. A debounced interface is created by scanning the keyboard at a rate slower than the time of the bouncing. For example, if the bounce is less than 5 ms, then scan the keyboard every 10 ms. This way a bouncing key will not be seen as touched/released/touched.

Observation: An n by n matrix keypad has n² keys, but requires only 2n I/O pins. You can detect any 0, 1, or 2 key combinations, but it has trouble when 3 or more are pressed.

Checkpoint 8.3: What happens if the three keys ‘1’, ‘2’, and ‘5’ are all pressed?

Checkpoint 8.4: Why wouldn’t you use a matrix approach when creating a music keyboard for an electric piano?

The key wakeup and input capture will be presented in the next chapter. Either mechanism can be used to generate interrupts on touch and release. We can “arm” this interface for interrupts by driving all the rows to zero.

8.5 Parallel Port LCD Interface with the HD44780 Controller

Microprocessor-controlled LCD displays are widely used, having replaced most of their LED counterparts, because of their low power and flexible display graphics. This example illustrates how a handshaked parallel port of the microcomputer is used to output to the LCD display. The hardware for the display uses an industry standard HD44780 controller, as shown in Figure 8.13. The low-level software initializes and outputs to the HD44780 controller. The 9S12 simply writes ASCII characters to the HD44780 controller. Each ASCII character is mapped into a 5 by 8 bit pixel image, called a font. A 1 by 16 LCD display is 80 pixels wide by 8 pixels tall, and the HD44780 is responsible for refreshing the pixels in a raster-scanned manner similar to maintaining an image on a TV screen or computer monitor.

Figure 8.13 Interface of a HD44780 LCD controller.

LCD pin  Signal          9S12 pin
1        Vss (ground)
2        Vdd (power)
3        Vee (contrast)
4        RS              PH0
5        R/W             PH1
6        E               PH2
7        DB0             PP0
8        DB1             PP1
9        DB2             PP2
10       DB3             PP3
11       DB4             PP4
12       DB5             PP5
13       DB6             PP6
14       DB7             PP7

Other labels in the figure: +5, 10 kΩ, 1 by 16 LCD display, HD44780 controller, 5 by 8 bit font.

There are four types of access cycles to the HD44780 depending on RS and R/W as shown in Table 8.12.

Table 8.12 Two control signals specify the type of access to the HD44780.

RS   R/W   Cycle
0    0     Write to Instruction Register
0    1     Read Busy Flag (bit 7)
1    0     Write data from µP to the HD44780
1    1     Read data from HD44780 to the µP


8 䡲 Serial and Parallel Port Interfacing

Normally, you write ASCII characters into the data buffer (called DDRAM in the data sheets) to have them displayed on the screen. However, you can create up to eight new characters for the LCD by writing to the CGRAM; notice the University of Texas (UT) symbol in Figure 8.14. These new characters exist as ASCII data 0 to 7.

Figure 8.14 HD44780-based LCD display interfaced to a 9S12. (Courtesy of Jonathan Valvano.)

Two types of synchronization can be used, blind-cycle and busy-waiting. Most operations require 40 µs to complete while some require 1.64 ms. This implementation uses the timer to create the blind-cycle wait. A busy-waiting interface would have provided feedback to detect a faulty interface, but has the problem of creating a software crash if the LCD never finishes. A better interface would have utilized both busy-waiting and blind-cycle, so that the software can return with an error code if a display operation does not finish on time (due to a broken wire or damaged display). First we present a low-level private helper function, see Program 8.6. This function would not have a prototype in the LCD.H file.

E       equ  4          ;PH2
RW      equ  2          ;PH1
RS      equ  1          ;PH0
; Output command to LCD
; Inputs: RegA is command, Outputs: none
OutCmd  staa PTP
        movb #0,PTH     ;E=0, RS=0, R/W=0
        movb #E,PTH     ;E=1, RS=0, R/W=0
        movb #0,PTH     ;E=0, RS=0, R/W=0
        ldd  #40
        jsr  Timer_Wait ;at least 37us
        rts

#define E  4  // on PH2
#define RW 2  // on PH1
#define RS 1  // on PH0
void OutCmd(unsigned char command){
  PTP = command;
  PTH = 0;          // E=0, R/W=0, RS=0
  PTH = E;          // E=1, R/W=0, RS=0
  PTH = 0;          // E=0, R/W=0, RS=0
  Timer_Wait(40);   // at least 37us
}

Program 8.6 Private functions for an HD44780 controlled LCD display.
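The combined busy-wait and timeout idea mentioned above can be sketched as a polling loop with an upper bound on the number of polls. This is not part of the book's LCD driver: LCD_WaitReady, the BusyFn callback type, and the two stand-in functions are hypothetical names introduced here so the logic can be exercised without hardware. On a real system the callback would read the busy flag (bit 7) with RS=0 and R/W=1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*BusyFn)(void);   // hypothetical: returns true while the LCD is busy

// Poll is_busy at most max_polls times.
// Returns 0 when the device reports ready, -1 if it never does (timeout).
int LCD_WaitReady(BusyFn is_busy, uint16_t max_polls) {
    assert(is_busy != NULL);
    for (uint16_t i = 0; i < max_polls; i++) {
        if (!is_busy()) {
            return 0;           // device finished in time
        }
    }
    return -1;                  // broken wire or damaged display
}

// Stand-ins for hardware, used only to exercise the logic.
int fake_countdown = 0;         // number of polls that will still report busy
bool fake_busy_n(void) {
    if (fake_countdown > 0) {
        fake_countdown--;
        return true;
    }
    return false;
}
bool fake_busy_stuck(void) {    // models an LCD that never finishes
    return true;
}
```

With this structure the caller can propagate the -1 as an error code instead of hanging, at the cost of having to choose max_polls long enough for the slowest operation.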

Next, we show the high-level public functions, see Program 8.7. These functions would have prototypes in the LCD.H file. The initialization sequence is copied from the data sheet of the HD44780. Figure 8.15 shows a rough sketch of the E, RS, R/W and data signals as the LCD_OutChar function is executed.

; Initialize HD44780 LCD display
; Inputs: none, Outputs: none
LCD_Init    movb #$FF,DDRP      ;LCD data
            movb #$FF,DDRH      ;PH2=E, PH1=R/W, PH0=RS
            jsr  Timer_Init     ;1us TCNT
            ldy  #15
            jsr  Timer_Wait1ms  ;15ms
            ldaa #$38           ;first time
            jsr  OutCmd
            ldy  #4
            jsr  Timer_Wait1ms  ;4ms
            ldaa #$38           ;second time
            jsr  OutCmd
            ldd  #100
            jsr  Timer_Wait     ;100us
            ldaa #$38           ;third time
            jsr  OutCmd
            ldaa #$38           ;N=1 two line, F=0 5x7
            jsr  OutCmd         ;DL=1 8-bit data
            ldaa #$08           ;display off
            jsr  OutCmd
            jsr  LCD_Clear
            ldaa #$0E           ;set D=1, C=1, B=0
            jsr  OutCmd         ;cursor on,no blink
            ldaa #$06           ;set I/D, S
            jsr  OutCmd         ;inc, no shift
            ldaa #$14           ;cursor move (S/C=0)
            jsr  OutCmd         ;right (R/L=1)
            rts
; Output one character to LCD
; Inputs: RegA is ASCII, Outputs: none
LCD_OutChar staa PTP
            movb #RS,PTH        ;E=0, R/W=0, RS=1
            movb #E+RS,PTH      ;E=1, R/W=0, RS=1
            movb #RS,PTH        ;E=0, R/W=0, RS=1
            ldd  #40
            jsr  Timer_Wait     ;at least 40us
            rts
LCD_Clear   ldaa #$01
            jsr  OutCmd         ;Clear Display
            ldd  #1600
            jsr  Timer_Wait     ;at least 1.52ms
            ldaa #$02
            jsr  OutCmd         ;Cursor to home
            ldd  #1600
            jsr  Timer_Wait     ;at least 1.52ms
            rts

Program 8.7 Public functions for an HD44780 controlled LCD display.


void LCD_Init(void){
  DDRH = 0xFF;
  DDRP = 0xFF;
  Timer_Init();        // 1us TCNT
  Timer_Wait1ms(15);   // 15 ms
  OutCmd(0x38);        // function set
  Timer_Wait1ms(4);    // 4 ms
  OutCmd(0x38);        // second time
  Timer_Wait(100);     // 100us
  OutCmd(0x38);        // third time
                       // now the busy flag could be read
  OutCmd(0x38);        // 8bit, N=1 2line, F=0 5by7
  OutCmd(0x08);        // D=0 displayoff
  LCD_Clear();
  OutCmd(0x0E);        // D=1 displayon,
                       // C=1 cursoron, B=0 blinkoff
  OutCmd(0x06);        // Entry mode
                       // I/D=1 Increment, S=0 nodisplayshift
  OutCmd(0x14);        // S/C=0 cursormove, R/L=1 shiftright
}
void LCD_OutChar(unsigned char letter){
  // letter is ASCII code
  PTP = letter;
  PTH = RS;            // E=0, R/W=0, RS=1
  PTH = E+RS;          // E=1, R/W=0, RS=1
  PTH = RS;            // E=0, R/W=0, RS=1
  Timer_Wait(40);      // 40 us wait
}
void LCD_Clear(void){
  OutCmd(0x01);        // Clear Display
  Timer_Wait(1600);    // 1.6 ms wait
  OutCmd(0x02);        // Cursor to home
  Timer_Wait(1600);    // 1.6 ms wait
}


Figure 8.15 Timing diagram of the LCD signals as data is sent to the HD44780 display.

The diagram plots the data, RS, R/W, and E signals against the statements of LCD_OutChar:

  PTP = letter;      // data becomes valid
  PTH = RS;          // E=0, R/W=0, RS=1
  PTH = E+RS;        // E=1, R/W=0, RS=1
  PTH = RS;          // E=0, R/W=0, RS=1
  Timer_Wait(40);    // 40 us wait

Checkpoint 8.5: Assuming the 9S12 is running at 8 MHz, how many µs wide is the E pulse for the assembly language solution in Program 8.7? The movb instruction requires 4 cycles.

8.6 Binary Actuators

8.6.1 Interface

Relays, solenoids, and DC motors are grouped together because their electrical interfaces are similar. We can add speakers to this group if the sound is generated with a square wave. In each case, there is a coil, and the computer must drive (or not drive) current through the coil. To interface a coil, we consider voltage, current, and inductance. We need a power supply at the desired voltage requirement of the coil. If the only available power supply is larger than the desired coil voltage, we use a voltage regulator (rather than a resistor divider) to create the desired voltage. We connect the power supply to the positive terminal of the coil, shown as +V in Figure 8.16. We will use a transistor device to drive the negative side of the coil to ground. The computer can turn the current on and off using this transistor. The second consideration is current. In particular, we must select the power supply and an interface device that can support the coil current. The 7406 is a digital inverter with open collector outputs (hiZ and low). The 2N2222 is a bipolar junction transistor (BJT), NPN type, with moderate current gain. The TIP120 is a Darlington transistor, also NPN type, that can handle larger currents. The IRF540 is a MOSFET transistor that can handle even more current. BJT and Darlington transistors are current-controlled (meaning the output is a function of the input current), while the MOSFET is voltage-controlled (output is a function of input voltage). When interfacing a coil to the microcontroller, we use information like Table 8.13 to select an interface device capable of supplying the current necessary to activate the coil. It is a good design practice to select a driver with a maximum current at least twice the required coil current. When the digital Port output is high, the interface transistor is active and current flows through the coil. When the digital Port output is low, the transistor is not active and no current flows through the coil.

Figure 8.16 Binary interface to EM relay, solenoid, DC motor or speaker.

The figure shows two circuits. In one, a 9S12 Port pin drives a 7406, which sinks the coil current IOL at output voltage VOL; the coil is modeled as R, L, and emf between +V and the 7406 output, with a 1N914 diode. In the other, a 9S12 Port pin (output voltage VOH) drives a 2N2222, TIP120, or IRF540 through the resistor Rb (base current IB); the transistor carries the coil current IC, with VBE and VCE marked, and the coil (R, L, emf) connects to +V with a 1N914 diode.

Table 8.13 Four possible devices that can be used to interface a coil compared to the 9S12.

Device   Type             Maximum Current
9S12     CMOS             10 mA
7406     TTL logic        40 mA
2N2222   BJT NPN          500 mA
TIP120   Darlington NPN   5 A
IRF540   power MOSFET     28 A

Similar to the solenoid and EM relay, the DC motor has a frame that remains motionless, and an armature that moves. In this case, the armature moves in a circular manner (shaft rotation). A DC motor has an electromagnet as well. When current flows through the coil, a magnetic force is created causing a rotation of the shaft. Brushes positioned between the frame and armature are used to alternate the current direction through the coil, so that a DC current generates a continuous rotation of the shaft. When the current is removed, the magnetic force stops, and the shaft is free to rotate. The resistance in the coil (R) comes from the long wire that goes from the + terminal to the – terminal of the motor. The inductance in the coil (L) arises from the fact that the wire is wound into coils to create the electromagnet. The coil itself can generate its own voltage (emf) because of the interaction between the electric and magnetic fields. If the coil is a DC motor, then the emf is a function of both the speed of the motor and the developed torque (which in turn is a function of the applied load on the motor). Because of the internal emf of the coil, the current will depend on the mechanical load. For example, a DC motor running with no load might draw 50 mA, but under load (friction) the current may jump to 500 mA.

Observation: It is important to realize that many devices cannot be connected directly to the microcontroller. In the specific case of motors, we need an interface that can handle the voltage and current required by the motor.

The third consideration is inductance in the coil. The 1N914 diode in Figure 8.16 provides protection from the back emf generated when the switch is turned off, and the large dI/dt across the inductor induces a large voltage (on the negative terminal of the coil), according to $V = L \cdot dI/dt$. For example, if you are driving 0.1 A through a 0.1 mH coil (Port output = 1) using a 2N2222, then disable the driver (Port output = 0), the 2N2222 will turn off in about 20 ns. This creates a dI/dt of at least $5 \cdot 10^{6}$ A/s, producing a back emf of 500 V! The 1N914 diode shorts out this voltage, protecting the electronics from potential damage. The 1N914 is called a snubber diode. If you are sinking 16 mA (IOL) with the 7406, the output voltage (VOL) will be 0.4 V. However, when the IOL of the 7406 equals 40 mA, its VOL will be 0.7 V. 40 mA is not a lot of current when it comes to typical coils. However, the 7406 interface is appropriate to control small reed relays.

Checkpoint 8.6: A reed relay is interfaced with the 7406 circuit in Figure 8.16. The positive terminal of the coil is connected to +5 V and the coil requires 40 mA. What will be the voltage across the coil when active?
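The back emf figure above follows directly from V = L·dI/dt with dI/dt approximated as the change in current divided by the turn-off time. A minimal sketch of that arithmetic is shown below; the function name back_emf_volts is hypothetical, and the linear approximation of dI/dt is an assumption of this sketch.

```c
#include <assert.h>

// Hypothetical helper: estimates the back emf of a coil, V = L * dI/dt,
// approximating dI/dt as delta_i_a / delta_t_s.
// Units: henries, amperes, seconds; the result is in volts.
double back_emf_volts(double inductance_h, double delta_i_a, double delta_t_s) {
    assert(delta_t_s > 0.0);                       // turn-off time must be positive
    double di_dt = delta_i_a / delta_t_s;          // A/s
    return inductance_h * di_dt;                   // volts
}
```

For the example in the text, back_emf_volts(0.1e-3, 0.1, 20e-9) evaluates 0.1 mH times 5·10^6 A/s, which is about 500 V.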

There are lots of motor driver chips, but they are fundamentally similar to the circuits shown in Figure 8.16. For the 2N2222 and TIP120 NPN transistors, if the Port output is low, no current can flow into the base, so the transistor is off, and the collector current, IC, will be zero. If the Port output is high, current does flow into the base and VBE goes above VBEsat, turning on the transistor. The transistor is in the linear range if $V_{BE} \le V_{BEsat}$ and $I_c = h_{fe} \cdot I_b$. The transistor is in the saturated mode if $V_{BE} \ge V_{BEsat}$, $V_{CE} = 0.3$ V and $I_c < h_{fe} \cdot I_b$. We select the resistor for the NPN transistor interfaces to operate right at the transition between linear and saturated mode. We start with the desired coil current, Icoil (the voltage across the coil will be $+V - V_{CE}$, which will be about $+V - 0.3$ V). Next, we calculate the needed base current (Ib) given the current gain of the NPN
$I_b = I_{coil}/h_{fe}$


knowing the current gain of the NPN (hfe). See Table 8.14. Finally, given the output high voltage of the microcontroller (VOH is about 5 V) and the base-emitter voltage of the NPN (VBEsat) needed to activate the transistor, we can calculate the desired interface resistor.
$R_b \le (V_{OH} - V_{BEsat})/I_b = h_{fe} \cdot (V_{OH} - V_{BEsat})/I_{coil}$
The inequality means we can choose a smaller resistor, creating a larger Ib. Because the hfe of the transistors can vary a lot, it is good design practice to make the Rb resistor about one half the value given by the above equation. Since the transistor is saturated, the increased base current produces the same VCE and thus the same coil current.

Table 8.14 Design parameters for the 2N2222 and TIP120.

Parameter            2N2222 (IC = 150 mA)   2N2222 (IC = 500 mA)   TIP120 (IC = 3 A)
hfe                  100                    40                     1000
VBEsat               0.6 V                  2 V                    2.5 V
VCE at saturation    0.3 V                  1 V                    2 V
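The resistor selection procedure can be sketched as a pair of small functions. This is only an illustration under stated assumptions: the names RbMax and RbDesign are hypothetical, voltages are passed in millivolts and the coil current in milliamps so that integer arithmetic gives ohms directly, and the factor of one half is the design margin suggested in the text.

```c
/* Hypothetical helpers illustrating Rb <= hfe*(VOH - VBEsat)/Icoil.
   Voltages are in mV and current in mA, so mV/mA yields ohms. */
unsigned long RbMax(unsigned long hfe, unsigned long voh_mv,
                    unsigned long vbesat_mv, unsigned long icoil_ma){
  /* largest base resistor that still saturates the transistor */
  return (hfe * (voh_mv - vbesat_mv)) / icoil_ma;
}

unsigned long RbDesign(unsigned long hfe, unsigned long voh_mv,
                       unsigned long vbesat_mv, unsigned long icoil_ma){
  /* use about half the limit to cover the variability in hfe */
  return RbMax(hfe, voh_mv, vbesat_mv, icoil_ma) / 2;
}
```

With the TIP120 values from Table 8.14 (hfe = 1000, VBEsat = 2.5 V), VOH = 5 V and a 1 A coil, RbMax returns 2500 ohms and RbDesign returns 1250 ohms; in practice we would pick the nearest standard resistor value.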

The IRF540 MOSFET is a voltage-controlled device. If the Port output is low, the MOSFET is off, and the coil current will be zero. If the Port output is high, the MOSFET is on, and the drain-to-source voltage (the MOSFET counterpart of VCE) will be very close to 0. No resistor is needed between the Port output and the gate of the MOSFET, but often we add a resistor (i.e., Rb = 1 kΩ) to limit current into and out of the 9S12 during the turn on/off transients.

Because of the resistance of the coil, there will not be significant dI/dt when the device is turned on. Consider a DC motor as shown in Figure 8.16 with +V = 12 V, R = 50 Ω and L = 100 µH. Assume we are using a 2N2222 with a VCE of 1 V at saturation. Initially the motor is off (no current to the motor). At time t = 0, the digital port goes from 0 to 5 V and the transistor turns on. Assuming for this section that the emf is zero (the motor has no external torque applied to the shaft) and that the transistor turns on instantaneously, we can derive an equation for the motor current (Ic) as a function of time. The voltage across L and R together is 12 − VCE = 11 V just after time 0. Just after time 0, the inductor is an open circuit. Conversely, at time ∞, the inductor is a short circuit. The Ic just before time 0 is 0, and the current will not change instantaneously because of the inductor. Thus, the Ic is also 0 just after time 0. The Ic is 11 V/50 Ω = 220 mA at time ∞.
$11\ \mathrm{V} = I_c \cdot R + L \cdot dI_c/dt$
The general solution to this differential equation is
$I_c = I_0 + I_1 e^{-t/\tau}$
$dI_c/dt = -(I_1/\tau) e^{-t/\tau}$
We plug the general solution into the differential equation and boundary conditions.
$11\ \mathrm{V} = (I_0 + I_1 e^{-t/\tau}) \cdot R - L \cdot (I_1/\tau) e^{-t/\tau}$
To solve the differential equation, the time constant will be $\tau = L/R = 2\ \mu\mathrm{s}$. Using initial conditions, we get
$I_c = 220\ \mathrm{mA} \cdot (1 - e^{-t/2\mu\mathrm{s}})$
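For readers who want the intermediate algebra, the steps between the substituted equation and the final result can be written out as follows. This is a worked sketch using the same symbols as above ($I_0$, $I_1$, $\tau$), with R = 50 Ω and L = 100 µH as given in the text.

```latex
\begin{align*}
11\ \mathrm{V} &= I_0 R + I_1\left(R - \frac{L}{\tau}\right)e^{-t/\tau}
  && \text{(collect the exponential terms)}\\
R - \frac{L}{\tau} &= 0 \;\Rightarrow\; \tau = \frac{L}{R}
  = \frac{100\ \mu\mathrm{H}}{50\ \Omega} = 2\ \mu\mathrm{s}
  && \text{(must hold for all } t\text{)}\\
I_0 R &= 11\ \mathrm{V} \;\Rightarrow\; I_0 = \frac{11\ \mathrm{V}}{50\ \Omega} = 220\ \mathrm{mA}
  && \text{(constant term)}\\
I_c(0) &= I_0 + I_1 = 0 \;\Rightarrow\; I_1 = -220\ \mathrm{mA}
  && \text{(initial condition)}\\
I_c(t) &= 220\ \mathrm{mA}\cdot\left(1 - e^{-t/2\,\mu\mathrm{s}}\right)
\end{align*}
```

After about five time constants (10 µs) the current is within 1% of its final value of 220 mA, which is consistent with the remark that turn-on produces no significant dI/dt.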

Example 8.3 Design an interface for two 12 V 1A geared DC motors. These two motors will be used to propel a robot with two independent drive wheels as shown in Figure 8.17.


Figure 8.17 Geared DC motors provide a good torque and speed for light-weight robots. (Courtesy of Jonathan Valvano.)

Solution We will use two copies of the TIP120 circuit in Figure 8.16 because the TIP120 can sink at least three times the current needed for this motor. We select a 12 V supply and connect it to the +V in the circuit. The needed base current is
$I_b = I_{coil}/h_{fe} = 1\ \mathrm{A}/1000 = 1\ \mathrm{mA}$
The desired interface resistor is
$R_b \le (V_{OH} - V_{BE})/I_b = (5 - 2.5)/1\ \mathrm{mA} = 2.5\ \mathrm{k\Omega}$
To cover the variability in hfe, we will use a 1.24 kΩ resistor instead of the 2.5 kΩ. The actual voltage on the motor when active will be 12 − 2 = 10 V. The coils and transistors can vary a lot, so it is appropriate to experimentally verify the design by measuring the voltages and currents.

8.6.2 Electromagnetic and Solid-State Relays

A relay is a device that responds to a small current or voltage change by activating switches or other devices in an electric circuit. It is used to remotely switch signals or power. The input control is usually electrically isolated from the output switch. The input signal determines whether the output switch is open or closed. Relays are classified into three categories depending upon whether the output switches power (i.e., high currents through the switch) or electronic signals (i.e., low currents through the switch). Another difference is how the relay implements the switch. An electromagnetic (EM) relay uses a coil to apply EM force to a contact switch that physically opens and closes. The solid state relay (SSR) uses transistor switches made from solid state components to electronically allow or prevent current flow across the switch. The three types are:
• The classic general purpose relay has an EM coil and can switch AC power
• The reed relay has an EM coil and can switch low level DC electronic signals
• The solid state relay (SSR) has an input triggered semiconductor power switch
Two solid state relays are shown in Figure 8.18. Interfacing an SSR is identical to interfacing an LED, which was previously described in Section 2.8.3, Figure 2.17. An SSR interface was presented earlier as Figure 3.10. SSRs allow the microcontroller to switch AC loads from 1 to 30 A. They are appropriate in situations where the power is turned on and off many times.


Figure 8.18 Solid state relays can be used to control power to an AC appliance. (Courtesy of Jonathan Valvano.)

The input circuit of an EM relay is a coil with an iron core. The output switch includes two sets of silver or silver-alloy contacts (called poles). One set is fixed to the relay frame, and the other set is located at the end of leaf spring poles connected to the armature. The contacts are held in the “normally closed” position by the armature return spring. When the input circuit energizes the EM coil, a “pull in” force is applied to the armature and the “normally closed” contacts are released (called break) and the “normally open” contacts are connected (called make). The armature pull in can either energize or de-energize the output circuit depending on how it is wired. Relays are mounted in special sockets, or directly soldered onto a PC board.

The number of poles (e.g., single pole, double pole, 3P, 4P, etc.) refers to the number of switches that are controlled by the input. Single throw means each switch has two contacts that can be open or closed. Double throw means each switch has three contacts. The common contact will be connected to one of the other two contacts (but not both at the same time). The parameters of the output switch include maximum AC (or DC) power, maximum current, maximum voltage, on resistance, and off resistance. A DC signal will weld the contacts together at a lower current value than an AC signal; therefore the maximum ratings for DC are considerably smaller than for AC. Other relay parameters include turn on time, turn off time, life expectancy, and input/output isolation. Life expectancy is measured in number of operations. Figure 8.19 illustrates the various configurations available. The sequence of operation is described in Table 8.15.

Figure 8.19 Standard relay configurations.

Form A: SPST-NO    Form B: SPST-NC    Form C: SPDT    Form D: SPDT    Form E: SPDT (B-M-B)

Table 8.15 Standard definitions for five relay configurations.

Form   Activation Sequence          Deactivation Sequence
A      Make 1                       Break 1
B      Break 1                      Make 1
C      Break 1, Make 2              Break 2, Make 1
D      Make 1, Break 2              Make 2, Break 1
E      Break 1, Make 2, Break 3

8.6.3 Solenoids

Solenoids are used in discrete mechanical control situations such as door locks, automatic disk/tape ejectors, and liquid/gas flow control valves (on/off type). Much like an EM relay, there is a frame that remains motionless, and an armature that moves in a discrete fashion (on/off). A solenoid has an electro-magnet. When current flows through the coil, a magnetic force is created, causing a discrete motion of the armature. Each of the solenoids shown in Figure 8.20 has a cylindrically shaped armature that moves in the horizontal direction relative to the photograph. The solenoid on the top is used in a door lock, and the second from the top is used to eject the tape from a video cassette player. When the current is removed, the magnetic force stops, and the armature is free to move. The motion in the opposite direction can be produced by a spring, gravity, or by a second solenoid.

Figure 8.20 Photo of four solenoids. (Courtesy of Jonathan Valvano.)

8.7 *Pulse-Width Modulation

In the previous interfaces the microcontroller was able to control electrical power to a device in a binary fashion: either all on or all off. Sometimes it is desirable for the microcontroller to vary the delivered power over a continuous range. One effective way to do this is to use pulse width modulation (PWM). The basic idea of PWM is to create a digital output wave of fixed frequency, but allow the microcontroller to vary its duty cycle. Figure 8.21 shows various waveforms that are high for H cycles and low for L cycles. The system is designed in such a way that H + L is constant (meaning the frequency is fixed). The duty cycle is defined as the fraction of time the signal is high:
$\mathrm{Duty} = \frac{H}{H + L}$
Hence, duty cycle varies from 0 to 1. We interface this digital output wave to an external actuator (like a DC motor), such that power is applied to the motor when the signal is high, and no power is applied when the signal is low. We purposely select a frequency high enough so the DC motor does not start/stop with each individual pulse, but rather responds to the overall average value of the wave. The average value of a PWM signal is linearly related to its duty cycle and is independent of its frequency. Let P (P = V*I) be the power


Figure 8.21 Pulse width modulation used to vary power delivered to a DC motor.

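[Figure 8.21 content: the 9S12 PWM output PP0 drives a 2N2222, TIP120, or IRF540 through Rb; the DC motor (modeled as R, L, and emf) is connected to +V with a 1N914 diode across it. Three example PP0 waveforms are shown: H = 200, L = 50; H = 125, L = 125; H = 50, L = 200.]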

to the DC motor, shown in Figure 8.21, when the PP0 signal is high. Notice the circuit in Figure 8.21 is one of the examples previously described in Figure 8.16. Under conditions of constant speed and constant load, the delivered power to the motor is linearly related to duty cycle.
$\text{Delivered power} = \mathrm{duty} \cdot P = \frac{H}{H + L} \cdot P$

Unfortunately, as speed and torque vary, the developed emf will affect delivered power. Nevertheless, PWM is a very effective mechanism, allowing the microcontroller to adjust delivered power. Appreciating the importance of pulse-width modulation, Freescale added dedicated hardware to handle PWM, not previously available in the 6811. The 9S12C32 has six channels, the 9S12DP512 has eight channels, and the 9S12E128 has 12 channels. This section will present the details on the 9S12DP512. With the exception of the MODRR register, the PWM operation on all 9S12 microcontrollers is identical. Table 8.16 shows the 9S12DP512 registers used to create pulse-width modulated outputs. There are eight 8-bit channels, but

Table 8.16 9S12DP512 registers used to configure pulse-width modulated outputs.

Address  Bit 7   6       5       4       3       2       1       Bit 0   Name
$00A0    PWME7   PWME6   PWME5   PWME4   PWME3   PWME2   PWME1   PWME0   PWME
$00A1    PPOL7   PPOL6   PPOL5   PPOL4   PPOL3   PPOL2   PPOL1   PPOL0   PWMPOL
$00A2    PCLK7   PCLK6   PCLK5   PCLK4   PCLK3   PCLK2   PCLK1   PCLK0   PWMCLK
$00A3    0       PCKB2   PCKB1   PCKB0   0       PCKA2   PCKA1   PCKA0   PWMPRCLK
$00A4    CAE7    CAE6    CAE5    CAE4    CAE3    CAE2    CAE1    CAE0    PWMCAE
$00A5    CON67   CON45   CON23   CON01   PSWAI   PFRZ    0       0       PWMCTL
$00A8    Bit 7   6       5       4       3       2       1       Bit 0   PWMSCLA
$00A9    Bit 7   6       5       4       3       2       1       Bit 0   PWMSCLB

Address  msb                                      lsb  Name
$00B4    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMPER01
$00B6    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMPER23
$00B8    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMPER45
$00BA    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMPER67
$00BC    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMDTY01
$00BE    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMDTY23
$00C0    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMDTY45
$00C2    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0         PWMDTY67


two 8-bit channels can be concatenated together to create one 16-bit channel. In particular, each of the 16-bit registers in Table 8.16 could be considered as two separate 8-bit registers. For example, the 16-bit register PWMPER01 could be considered as the two 8-bit registers PWMPER0 (at address $00B4) and PWMPER1 (at address $00B5).

On the 9S12DP512, the PWM channels always use outputs on Port P (PP7-PP0). Bits 4, 5, and 6 of the MODRR register are used to map the SPI channels onto Port P, Port H or Port S, as described in Table 8.10. Since PWM has precedence over SPI (see Table 4.8), a Port P pin will become a PWM output if the corresponding bit in the PWME register is set (regardless of MODRR and SPI).

On the 9S12C32, the six PWM channels use outputs on Port P (PP5 to PP0) or on Port T (PT4 to PT0). PP5 is available on all 9S12C32 packages, but the other five channels can be connected to either Port P or Port T. If a bit in the MODRR register is 1, the corresponding Port T pin is connected to the PWM system (see Table 8.17). If the bit is 0, the corresponding Port T pin is connected to the timer system.

Address  Bit 7  6  5  4       3       2       1       Bit 0   Name
$0247    0      0  0  MODRR4  MODRR3  MODRR2  MODRR1  MODRR0  MODRR

Table 8.17 9S12C32 MODRR register determines if PWM is on Port P or Port T.

On the 9S12E128, six PWM channels can be created on Port P (PP5 to PP0) and six more on Port U (PU5-PU0). The MODRR register can be used to map the bottom four bits of Port U onto either PWM or a timer module.

The PWME register allows you to enable/disable individual PWM channels. The PWMCTL register is used to concatenate two 8-bit channels into one 16-bit PWM. For example, if the CON23 bit is 1, then channels 2 and 3 become one 16-bit channel with the output generated on PP3. Concatenated channels are controlled using the higher of the two channels. For example, concatenated channel 23 is configured with bits PWME3, PPOL3, PCLK3, and CAE3. The PWMPOL register specifies the polarity of the output. Figure 8.22 shows a PWM output for the case when the PPOLx bit is 1. The output will be high for the number of counts in the PWMDTY register. The PWMPER register contains the number of counts in one complete cycle. The duty cycle, defined as the fraction of time the signal is high and expressed as a percent, depends on PWMPER and PWMDTY.
Duty cycle = 100% * PWMDTYx/PWMPERx

Figure 8.22 PWM output generated when PPOL = 1.

[Timing diagram of PPx, with PWMDTYx and PWMPERx marked.]

If the PPOLx bit is 0, the output will be low for the number of counts in the PWMDTY register, as illustrated in Figure 8.23. The duty cycle, defined as the fraction of time the signal is high, is
Duty cycle = 100% * (PWMPERx − PWMDTYx)/PWMPERx

Figure 8.23 PWM output generated when PPOL = 0.

[Timing diagram of PPx, with PWMDTYx and PWMPERx marked.]
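The two duty cycle formulas differ only in which part of the period is counted by PWMDTYx, so they can be captured in one small function. The following is a hedged sketch rather than code from the PWM module: the name DutyPercent is hypothetical, the arguments are plain values rather than register reads, it assumes dty is less than or equal to per, and returning 0 when per is 0 is simply a choice made for this illustration.

```c
/* Hypothetical helper: percent of time the PWM output is high.
   ppol : value of the PPOLx bit (nonzero means high first)
   dty  : value written to PWMDTYx (assumed <= per)
   per  : value written to PWMPERx
   Integer division truncates the result toward zero. */
unsigned short DutyPercent(unsigned char ppol, unsigned short dty,
                           unsigned short per){
  if(per == 0){
    return 0;   /* choice made for this sketch to avoid dividing by zero */
  }
  if(ppol){
    return (unsigned short)((100UL * dty) / per);       /* high for dty counts */
  }
  return (unsigned short)((100UL * (per - dty)) / per); /* low for dty counts */
}
```

For example, with PWMPERx equal to 250, a PWMDTYx of 125 gives 50% for either polarity, while a PWMDTYx of 50 gives 20% when PPOLx is 1 and 80% when PPOLx is 0.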


There are many possible choices for the clock. The base clock is derived from the E clock. Activating the PLL affects the E clock, hence will affect the PWM generation. Channels 0, 1, 4, and 5 use either clock A or clock SA. Channels 2, 3, 6, and 7 use either clock B or clock SB. The six bits in the PWMPRCLK register, as shown in Table 8.18, determine the relationship between clocks A,B and the E clock.

Table 8.18 Clock A and Clock B prescale in PWMPRCLK.

PCKB2  PCKB1  PCKB0  Clock B      PCKA2  PCKA1  PCKA0  Clock A
0      0      0      E            0      0      0      E
0      0      1      E/2          0      0      1      E/2
0      1      0      E/4          0      1      0      E/4
0      1      1      E/8          0      1      1      E/8
1      0      0      E/16         1      0      0      E/16
1      0      1      E/32         1      0      1      E/32
1      1      0      E/64         1      1      0      E/64
1      1      1      E/128        1      1      1      E/128

It is possible to divide the A and B clocks further using the PWMSCLA and PWMSCLB registers. The period of the SA clock is the period of the A clock multiplied by two times the value in the PWMSCLA register. Similarly, the period of the SB clock is the period of the B clock multiplied by two times the value in the PWMSCLB register. If the value in PWMSCLA(B) is 0, then a divide by 512 is selected. The clock used for each channel is determined by the PWMCLK register. The period of the PWM output is the period of the selected clock times the value in the PWMPER register.

PCLKn = 1  Clock SB is the clock source for PWM channel n, where n = 7, 6, 3, or 2
PCLKn = 0  Clock B is the clock source for PWM channel n
PCLKm = 1  Clock SA is the clock source for PWM channel m, where m = 5, 4, 1, or 0
PCLKm = 0  Clock A is the clock source for PWM channel m

Let n be the 3-bit value for PCKA2-0 in the PWMPRCLK register. Let the E clock period be PeriodE. Then if the A clock is selected for channel x, the periods of the A clock and PWM output will be
$\mathrm{PeriodA} = 2^n \cdot \mathrm{PeriodE}$
$\mathrm{PeriodPTx} = 2^n \cdot \mathrm{PWMPERx} \cdot \mathrm{PeriodE}$
If the SA clock is selected for channel x, the periods of the SA clock and PWM output will be
$\mathrm{PeriodSA} = 2^n \cdot 2 \cdot \mathrm{PWMSCLA} \cdot \mathrm{PeriodE}$
or $\mathrm{PeriodSA} = 2^n \cdot 512 \cdot \mathrm{PeriodE}$ (if PWMSCLA equals 0)
$\mathrm{PeriodPTx} = 2^n \cdot 2 \cdot \mathrm{PWMSCLA} \cdot \mathrm{PWMPERx} \cdot \mathrm{PeriodE}$
or $\mathrm{PeriodPTx} = 2^n \cdot 512 \cdot \mathrm{PWMPERx} \cdot \mathrm{PeriodE}$ (if PWMSCLA equals 0)
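When trying out different register settings, it can help to compute the resulting PWM period on a desktop computer before touching the hardware. The function below is a minimal sketch of the period equations above; the name PwmPeriodNs and its parameters are hypothetical, the arguments are plain numbers rather than register reads, and the same arithmetic is assumed to apply to the B and SB clocks using PCKB2-0 and PWMSCLB.

```c
/* Hypothetical helper: PWM output period in nanoseconds.
   pck        : 3-bit prescale value (PCKA2-0 or PCKB2-0), 0 to 7
   use_scaled : nonzero selects the scaled clock (SA or SB)
   scl        : value in PWMSCLA or PWMSCLB (0 selects divide by 512)
   per        : value in PWMPERx
   e_ns       : E clock period in nanoseconds */
unsigned long long PwmPeriodNs(unsigned char pck, unsigned char use_scaled,
                               unsigned char scl, unsigned long per,
                               unsigned long e_ns){
  unsigned long long divide = 1ULL << (pck & 0x07);  /* 2^n prescale */
  if(use_scaled){
    divide *= (scl == 0) ? 512ULL : 2ULL * scl;      /* extra SA/SB scaling */
  }
  return divide * per * e_ns;
}
```

As a check, with a 125 ns E clock, a prescale of E/32 (pck = 5), the scaled clock with a scale register value of 5, and a period register value of 250, the function returns 10,000,000 ns, or 10 ms.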

The design of a PWM system considers three factors. The first factor is the period of the PWM output. Most applications choose a period, initialize the waveform at that period, and adjust the duty cycle dynamically. The second factor is precision, which is the total number of duty cycles that can be created. An 8-bit PWM channel may have up to 256 different outputs, while a 16-bit channel can potentially create up to 65536 different duty cycles. More specifically, since the duty cycle register must be less than or equal to the period register (e.g., PWMDTYx ≤ PWMPERx), the precision of the system will equal PWMPERx + 1 alternatives. The last consideration is the number of channels. The 9S12DP512 supports up to eight 8-bit channels or four 16-bit channels. It is possible to mix and match, creating for example four 8-bit channels and two 16-bit channels. Different versions of the 9S12 will have different numbers of PWM channels.


Example 8.4 Implement a 10-ms 8-bit PWM.

Solution The software for this module will have two public functions, one function to turn it on, and a second function to set the duty cycle. In this design example, we will create the PWM output using channel 0 generated on the PP0 output, using the hardware shown in Figure 8.21. In order to maximize precision, it is best to create the 10 ms period using as large a value in PWMPER0 as possible. We have the limitation that the prescale and PWMPER0 factors will be integers. Since 10 ms/256 equals 39.0625 µs, we need a clock period just larger than 39 µs. The fastest clock that can be used has a period of 40 µs, resulting in PWMPER0 equal to 250. Assuming the E clock period is 125 ns, the prescale needs to be 40/0.125 or 320. There are a number of ways to make this happen, but one way is to select Clock A to be E/32, create SA = A/10, then select the SA clock for channel 0, as shown in Program 8.8.

Checkpoint 8.7: Give another way to create a prescale of 320 on channel 0.

PWM_Init0                 ;10ms PWM on PP0
      bset PWME,#$01      ;enable chan 0
      bset PWMPOL,#$01    ;high then low
      bset PWMCLK,#$01    ;Clock SA
      ldaa PWMPRCLK
      anda #$F8
      oraa #$05
      staa PWMPRCLK       ;A=E/32
      movb #5,PWMSCLA     ;SA=A/10
      movb #250,PWMPER0   ;10ms period
      clr  PWMDTY0        ;initially off
      rts
PWM_Duty0                 ;RegA is duty cycle
      staa PWMDTY0        ;0 to 250
      rts

// 10ms PWM on PP0
void PWM_Init0(void){
  PWME |= 0x01;      // enable channel 0
  PWMPOL |= 0x01;    // PP0 high then low
  PWMCLK |= 0x01;    // Clock SA
  PWMPRCLK = (PWMPRCLK&0xF8)|0x05; // A=E/32
  PWMSCLA = 5;       // SA=A/10, 0.125*320=40us
  PWMPER0 = 250;     // 10ms period
  PWMDTY0 = 0;       // initially off
}
// Set the duty cycle on PP0 output
void PWM_Duty0(unsigned char duty){
  PWMDTY0 = duty;    // 0 to 250
}

Program 8.8 Implementation of an 8-bit PWM output.

Checkpoint 8.8: How would you modify Program 8.8 to have a period of 100 ms?

Example 8.5 Implement a 1-second 16-bit PWM.

Solution Again this module will have two public functions, one function to turn it on, and a second function to set the duty cycle. To create a 16-bit PWM we need to concatenate two 8-bit channels. We could have used channels 01, 23, 45, or 67. In this example, we choose to create the PWM output using concatenated channel 23 with its output generated on the PP3 output. In order to maximize precision, it is best to create the 1 s period using as large a value in PWMPER23 as possible. Since 1 s/65536 equals 15.2587890625 µs, we need a clock period just larger than 15 µs. The fastest clock that can be used has a period of 16 µs, resulting in PWMPER23 equal to 62500. Assuming the E clock period is 125 ns, the prescale needs to be 16/0.125 or 128. There are a number of ways to make this happen, but one way is to make Clock B equal to E/128, then select the B clock for channel 23, as shown in Program 8.9.


PWM_Init3                   ;1s PWM on PP3
      bset PWME,#$08        ;enable chan 3
      bset PWMPOL,#$08      ;high then low
      bclr PWMCLK,#$08      ;Clock B
      bset PWMCTL,#$20      ;concat 2+3
      ldaa PWMPRCLK
      anda #$8F
      oraa #$70
      staa PWMPRCLK         ;B=E/128
      movw #62500,PWMPER23  ;1s period
      movw #0,PWMDTY23      ;off
      rts
PWM_Duty3                   ;RegD is duty cycle
      std  PWMDTY23         ;0 to 62500
      rts

// 1s PWM on PP3
void PWM_Init3(void){
  PWME |= 0x08;      // enable channel 3
  PWMPOL |= 0x08;    // PP3 high then low
  PWMCLK &=~0x08;    // Clock B
  PWMCTL |= 0x20;    // Concatenate 2+3
  PWMPRCLK = (PWMPRCLK&0x8F)|0x70; // B=E/128
  PWMPER23 = 62500;  // 1s period
  PWMDTY23 = 0;      // initially off
}
// Set the duty cycle on PP3 output
void PWM_Duty3(unsigned short duty){
  PWMDTY23 = duty;   // 0 to 62500
}

Program 8.9 Implementation of a 16-bit PWM output.

Checkpoint 8.9: What would be the effect of creating the 1 s output using a 1 ms SB clock and a PWMPER23 value of 1000?

Checkpoint 8.10: Are programs 8.9 and 8.10 friendly enough to be used together?

8.8 *Stepper Motors

A motor can be evaluated in terms of its maximum speed (RPM), its torque (N-m), and the efficiency with which it translates electrical power into mechanical power. Sometimes, however, we wish to use a motor to control the rotational position (θ, the motor shaft angle) rather than control the rotational speed (ω = dθ/dt). Stepper motors are used in applications where precise positioning is more important than high RPM, high torque, or high efficiency. Stepper motors are very popular for microcontroller-based embedded systems because of their inherent digital interface. Figure 8.24 shows three stepper motors. The larger motors provide more

Figure 8.24 Photo of three stepper motors. (Courtesy of Jonathan Valvano.)


torque, but require more current. It is easy for a computer to control both the position and velocity of a stepper motor in an open-loop fashion. Although the cost of a stepper motor is typically higher than an equivalent DC permanent magnetic field motor, the overall system cost is reduced because stepper motors may not require feedback sensors. They are used in printers to move paper and print heads, tapes/disks to position read/write heads, and high-precision robots. For example, the stepper motor shown in Figure 6.8 moves the R/W head from one track to another on an audio tape recorder.

A bipolar stepper motor has two coils on the stator (the frame of the motor), labelled A and B in Figures 8.25 and 8.26. Typically, there is always current flowing through both coils. When current flows through both coils, the motor does not spin (it remains locked at that shaft angle). Stepper motors are rated by their holding torque, which is their ability to hold stationary against a rotational force (torque) when current is constantly flowing through both coils. To move a bipolar stepper, we reverse the direction of current through one (not both) of the coils, see Figure 8.25. To move it again we reverse the direction of current in the other coil. Remember, current is always flowing through both coils. Let the direction of the current be signified by up and down. To make the current go up, the microcontroller outputs a binary 01 to the interface. To make the current go down, it outputs a binary 10. Since there are 2 coils, four outputs will be required (e.g., $0101_2$ means up/up). To spin the motor, we output the sequence $0101_2$, $0110_2$, $1010_2$, $1001_2$, . . .

Figure 8.25 A bipolar stepper has 2 coils, but a unipolar stepper divides the two coils into four parts.

[Figure 8.25 content: a bipolar stepper (coils A and B) and a unipolar stepper (coils A, A', B, and B' with center taps at +V), each driven through an interface by the output sequence 0101, 0110, 1010, 1001.]

[Figure 8.26 content: stator coils A and B around a toothed rotor with N and S poles, drawn for successive outputs (0101, 0110, 1010) as the currents in A and B are flipped.]

Figure 8.26 To rotate this stepper by 18°, the interface flips the direction of one of the currents.


over and over. Each output causes the motor to rotate a fixed angle. To rotate the other direction, we reverse the sequence ($0101_2$, $1001_2$, $1010_2$, $0110_2$ . . .).

There is a North and a South permanent magnet on the rotor (the part that spins). The amount of rotation caused by each current reversal is a fixed angle depending on the number of teeth on the permanent magnets. For example, the rotor in Figure 8.26 is drawn with 5 North teeth and 5 South teeth. If there are n teeth on the South magnet (also n teeth on the North magnet), then the stepper will move 90/n degrees per step. This means there will be 4n steps per rotation. Because moving the motor involves accelerating a mass (rotational inertia) against a load friction, after we output a value, we must wait an amount of time before we can output again. If we output too fast, the motor does not have time to respond. The speed of the motor is related to the number of steps per rotation and the time in between outputs. For information on stepper motors see the data sheets web page at http://users.ece.utexas.edu/~valvano/Datasheets.

The unipolar stepper motor provides for bi-directional currents by using a center tap, dividing each coil into two parts. In particular, coil A is split into coil A and A’, and coil B is split into coil B and B’. The center tap is connected to the +V power source and the four ends of the coils can be controlled with open collector drivers. Because only half of the electromagnets are energized at one time, a unipolar stepper has less torque than an equivalent-sized bipolar stepper. However, unipolar steppers are easier to interface. For example, you can use four copies of the circuit in Figure 8.16 to interface a unipolar stepper motor.

Figure 8.27 shows a circular linked graph containing the output commands to control a stepper motor. This simple FSM has no inputs, four output bits and four states. There is one state for each output pattern in the usual stepper sequence 5, 6, 10, 9, . . . The circular FSM is used to spin the motor in a clockwise direction. Notice the one-to-one correspondence between the state graph in Figure 8.27 and the fsm[4] data structure in Program 8.10.

Figure 8.27 This stepper motor FSM has four states. The 4-bit outputs are given in binary.

[Figure 8.27 content: circular state graph, each state labelled with its name and output: S5 (0101) → S6 (0110) → S10 (1010) → S9 (1001) → S5.]

Example 8.6 Design a stepper motor controller that spins the motor at 6 RPM.

Solution We choose a stepper motor according to the speed and torque requirements of the system. A stepper with 200 steps/rotation will provide a very smooth rotation while it spins. Just like the DC motor, we need an interface that can handle the currents required by the coils. We can use an L293 to interface either unipolar or bipolar steppers that require less than 1 A per coil. In general, the output current of a driver must be large enough to energize the stepper coils. We control the interface using an output port of the microcontroller, as shown in Figure 8.28. The circuit shows the interface of a unipolar stepper, but the bipolar stepper interface is similar except there is no +V connection to the motor. The main program, Program 8.10, begins by initializing the Port T output and the state pointer. Every 50 ms the program outputs a new stepper command. The function Timer_Wait1ms() from Program 4.5 uses the built-in timer to generate an appropriate delay between outputs to the stepper. For a 200 step/rotation stepper, we need to wait 50 ms between outputs to spin at 6 RPM.
Speed = (1 rotation/200 steps)*(1000 ms/s)*(60 s/min)*(1 step/50 ms) = 6 RPM
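The speed calculation, together with the earlier rule that a rotor with n teeth per magnet gives 4n steps per rotation, can be expressed as two short functions. This is only a sketch for checking a design on paper; the names StepsPerRotation and StepperSpeedMilliRPM are hypothetical, and the speed is returned in thousandths of an RPM so that integer arithmetic keeps some resolution.

```c
/* Hypothetical helper: steps per rotation for a stepper whose rotor has
   'teeth' teeth on each of the North and South magnets (4n steps). */
unsigned short StepsPerRotation(unsigned short teeth){
  return (unsigned short)(4 * teeth);
}

/* Hypothetical helper: speed in units of 0.001 RPM, given the number of
   steps per rotation and the wait time in ms between outputs.
   60000 ms/min * 1000 = 60,000,000; both arguments must be nonzero. */
unsigned long StepperSpeedMilliRPM(unsigned long steps_per_rotation,
                                   unsigned long wait_ms){
  return 60000000UL / (steps_per_rotation * wait_ms);
}
```

For the rotor drawn in Figure 8.26, StepsPerRotation(5) gives 20 steps, or 18° per step. For this example, StepperSpeedMilliRPM(200, 50) gives 6000, which is 6 RPM.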

Figure 8.28 A unipolar stepper motor interfaced to a Freescale 9S12.

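[Figure 8.28 content: 9S12 PT3 → L293 pin 2 (1A), pin 3 (1Y) → coil A; PT2 → pin 7 (2A), pin 6 (2Y) → coil A'; PT1 → pin 10 (3A), pin 11 (3Y) → coil B; PT0 → pin 15 (4A), pin 14 (4Y) → coil B'. L293 pin 16 is +5, pin 8 is +V, enable pins 1 (1,2EN) and 9 (3,4EN) are tied to +5, and pins 4, 5, 12, 13 are the remaining supply pins. A 1N914 diode is placed on each of the four coil outputs, and the stepper motor center taps connect to +V.]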

      org  $4000       ; in ROM
Out   equ  0           ; output for this state
Next  equ  1           ; clockwise next
S5    fcb  5
      fdb  S6
S6    fcb  6
      fdb  S10
S10   fcb  10
      fdb  S9
S9    fcb  9
      fdb  S5
main  lds  #$4000
      jsr  Timer_Init
      movb #$FF,DDRT   ;output to stepper
      ldx  #S5         ;initial state
loop  movb Out,x,PTT   ;output
      ldy  #50
      jsr  Timer_Wait1ms
      ldx  Next,x      ;clockwise step
      bra  loop

const struct State {
  unsigned char Out;          // command
  const struct State *next;   // clockwise
};
typedef const struct State StateType;
#define S5  &fsm[0]
#define S6  &fsm[1]
#define S10 &fsm[2]
#define S9  &fsm[3]
StateType fsm[4]={
  { 5, S6},   // Out=0101, Next=S6
  { 6,S10},   // Out=0110, Next=S10
  {10, S9},   // Out=1010, Next=S9
  { 9, S5}};  // Out=1001, Next=S5
void main(void){ StateType *Pt;
  Timer_Init();
  DDRT = 0xFF;           // outputs
  Pt = S5;               // initial state
  while(1){              // embedded systems never quit
    PTT = Pt->Out;       // stepper out
    Timer_Wait1ms(50);   // 50ms wait
    Pt = Pt->next;       // Clockwise step
  }
}

Program 8.10 Stepper motor controller.

To illustrate how easy it is to make changes to this implementation, let’s consider these three modifications. To make it spin in the other direction, we simply change pointers to sequence in the other direction. To make it spin at a different rate, we change the wait time. To implement an eight-step sequence (the half-stepping outputs are 5, 4, 6, 2, 10, 8, 9, 1, . . .), we add the four new states and link all eight states in the desired sequence. These changes can be easily made.

Checkpoint 8.11: If the stepper motor were to have 36 steps per rotation, how fast would the motor spin using Program 8.10?
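As one possible illustration of the first and third modifications, the data structure can be extended to eight half-stepping states with a pointer for each direction. This is a sketch, not part of Program 8.10: the names HalfState, HalfFsm, and HalfStep are hypothetical, and which pointer corresponds to clockwise rotation depends on how the coils are wired.

```c
/* Hypothetical eight-state half-stepping FSM with links in both directions. */
struct HalfState {
  unsigned char Out;              /* 4-bit command for the stepper interface */
  const struct HalfState *Next;   /* next state in the listed sequence */
  const struct HalfState *Prev;   /* next state in the reverse sequence */
};

const struct HalfState HalfFsm[8] = {
  { 5, &HalfFsm[1], &HalfFsm[7]},   /* 0101 */
  { 4, &HalfFsm[2], &HalfFsm[0]},   /* 0100 */
  { 6, &HalfFsm[3], &HalfFsm[1]},   /* 0110 */
  { 2, &HalfFsm[4], &HalfFsm[2]},   /* 0010 */
  {10, &HalfFsm[5], &HalfFsm[3]},   /* 1010 */
  { 8, &HalfFsm[6], &HalfFsm[4]},   /* 1000 */
  { 9, &HalfFsm[7], &HalfFsm[5]},   /* 1001 */
  { 1, &HalfFsm[0], &HalfFsm[6]}    /* 0001 */
};

/* Return the state that follows pt; a nonzero 'forward' follows the listed
   sequence 5,4,6,2,10,8,9,1 and zero follows it in reverse. */
const struct HalfState *HalfStep(const struct HalfState *pt, int forward){
  return forward ? pt->Next : pt->Prev;
}
```

In the main loop, the line that advances the state pointer would call HalfStep with the desired direction after each output and wait. Because half-stepping doubles the number of steps per rotation, the wait time would be halved to keep the same shaft speed.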

Checkpoint 8.12: What would you change in Program 8.10 to make the motor spin at 30 RPM?

Performance Tip: Use a DC motor for applications requiring high torque or high speed, and use a stepper motor for applications requiring accurate positioning at low speed.

Performance Tip: To get high torque at low speed, use a geared DC motor (the motor spins at high speed, but the shaft spins slowly).

8.9 Homework Problems

Homework 8.1 Assume the baud rate is 9600 bits/sec. Show the serial port output versus time waveform that occurs when the ASCII characters “ABC” are transmitted one right after another. What is the total time to transmit the three characters?

Homework 8.2 Assume the baud rate is 19200 bits/sec. Show the serial port output versus time waveform that occurs when the ASCII characters “125” are transmitted one right after another. What is the total time to transmit the three characters?

Homework 8.3 Assume the 9S12 E clock is 8 MHz. Write an assembly language subroutine that initializes the serial port to communicate at 9600 bits/sec, 8-bit data, 1 start bit, and 1 stop bit.

Homework 8.4 Sometimes it is important for the software to know when the SCI transmission is complete. The transmit complete (TC) flag is set after the data in the shift register has been transmitted. Rewrite the SCI_OutChar subroutine so that it first writes to the data register, then waits for the TC flag to be set. The TC flag is cleared by first reading the status register with TC set followed by writing into the transmit data register.

Homework 8.5 Design an interface for a 64-key keyboard, which is configured with eight rows and eight columns. Show the hardware interface to Ports H and J. Show the initialization ritual. Assume there is either no key or one key pressed. Write an input subroutine that returns the key number 0 to 63 if a key is pressed or -1 if no key is pressed. Assume the keys do not bounce.

Homework 8.6 Design an interface for a 20-key keyboard, which is configured with four rows and five columns. Show the hardware interface to Ports H and J. Show the initialization ritual. Assume there is either no key or one key pressed. The keys bounce with a maximum time of 1 ms. Use a periodic interrupt at a rate of 2 ms, and scan the keyboard in the ISR. Set a public global variable (called Key) equal to 0 to 19 if a key is pressed or -1 if no key is pressed.

Homework 8.7 Let P be the 16-bit unsigned period of a squarewave in cycles. Each cycle is 500 ns. Calculate the equivalent frequency, f, in Hz. In particular, f = 2000000/P. The input is passed by value in Register D, and the result is also returned by value in Register D.

Homework 8.8 Let P be the 16-bit unsigned period of a squarewave in cycles. Each cycle is 125 ns. Calculate the equivalent frequency, f, in Hz. In particular, f = 8000000/P. The input is passed by value in Register D, and the result is also returned by value in Register D.

Homework 8.9 Interface an electromagnetic relay (2 wires) to the 9S12 pin PP5. The coil requires 250 mA at 5 V. Write a ritual to initialize the interface. Write a subroutine, called On, that activates the relay, and a subroutine, called Off, that deactivates the relay.

Homework 8.10 Interface a solenoid (2 wires) to the 9S12 pin PP5. The coil requires 100 mA at 5 V. Write a ritual to initialize the interface. Write a subroutine, called Pulse, that activates the solenoid for 10 ms (then shuts off). No interrupts are needed; use Timer_Wait.


Homework 8.11 Interface a DC motor (2 wires) to the 9S12. The coil requires 500 mA at 12 V. In addition to the motor output, there are two inputs. When the Go input is high the motor spins (when Go is low, no power is delivered). When the motor is spinning, the other input (Direction) determines the CCW/CW rotational direction. Use an L293 H-bridge driver.

Homework 8.12 There is a 9S12 digital output connected to a 9S12 digital input across a long cable. The connection has an equivalent capacitance of 25 pF into a 10 MΩ resistance. The capacitance results from the long cable, and the resistance results from the input impedance of the 9S12. What is the time constant of this system? If we operate 10 times slower than the time constant, what is the maximum period allowed for this system? List two ways to speed up this transmission.

Homework 8.13 Considering the voltages shown in Table 8.2, prove that you can connect a 9S12 output (VDD = 5 V) to a 7404 input. Similarly, prove that you cannot connect a 7404 output to a 9S12 input. Which logic family types shown in Table 8.2 allow the output of the digital gate to be connected to a 9S12 input? (By the way, if you wanted to connect a 7404 output to a 9S12 input, you could add a 1 kΩ pull-up resistor on the 7404 output to 5 V, increasing the VOH of the output.)

Homework 8.14 Interface a 12-bit DAC, MAX539, to the 9S12 SPI port. Connect MAX539 pins 1, 2, and 3 to the 9S12 SPI. Leave pin 4 not connected. Use a REF03 to create a 2.5 V reference and connect it to the MAX539 pin 6 reference input. Pin 8 is 5 V power and pin 5 is ground. Write two functions, one to initialize and one to update the DAC analog output. Updating the DAC output will require three SPI transmissions.

Homework 8.15 Design an 8-bit PWM driver for Port P pin 5. Implement positive logic (PPOL5 equals 1) and left justified (CAE5 equals 0). There will be three functions: one to initialize the system at 1000 Hz 50% duty cycle, one to set the period, and a third function to set the duty cycle. You should fix the PWMPER5 to a constant value of 250, then allow the user to modify the clock using the second function. Add comments to your software that explain how the PWM driver can be used.

Homework 8.16 Interface a unipolar stepper motor (5 wires) to the 9S12 pins PM3 to 0. Each coil requires 500 mA at 12 V. There are 200 steps per revolution. Write software that spins the motor at 1 rps, using Timer_Wait.

Homework 8.17 Interface a unipolar stepper motor (5 wires) to the 9S12 pins PM3 to 0. Each coil requires 100 mA at 6 V. There are 36 steps per revolution. Write software that spins the motor at 10 rps, using Timer_Wait.

Homework 8.18 Interface a bipolar stepper motor (4 wires) to the 9S12 pins PT3 to 0. Each coil requires 500 mA at 12 V. There are 200 steps per revolution. Write software that spins the motor at 5 rps, using Timer_Wait.

Homework 8.19 Interface a 32 Ω speaker (2 wires) to the 9S12 PT0. To make a sound, output a 1 kHz squarewave to the interface, creating about 1 V peak-to-peak on the speaker (about 30 mA pulsed current). Use the 5 V supply and an NPN transistor. Write a main program to activate the sound.

Homework 8.20 Write open-loop software to control power to the robot shown in Figure 8.17. Assume two copies of the TIP120 circuit from Figure 8.16 are connected to two 8-bit PWM channels. Write a Motor_Init subroutine to initialize the two PWM channels. Write a Motor_Left subroutine that adjusts delivered power to the left wheel. Write a Motor_Right subroutine that adjusts delivered power to the right wheel. Assume call by value parameters 0 to 250 in RegA for the left and right subroutines.

8.10 Laboratory Assignments

Lab 8.1 Keyboard Device Driver

Purpose: You will design the hardware interface between a keyboard and a microcomputer, create the low-level device driver, interface a single LED, and implement a keyboard security system.

Description: In this keyboard lab, you will design the keyboard interface using busy-wait synchronization. In the next chapter we will learn interrupts. Placing the key input task into a background thread frees the main program to execute other tasks while the software is waiting for the operator to type something. This security system doesn’t have anything else to do, but in a complex system, it is important to be able to perform multiple tasks. The second advantage of interrupts is the ability to create accurate time delays even within a complex software environment. In this implementation, you will use busy-wait.

One way to solve the switch-bounce problem is to wait in between scanning the keyboard. The time in between scans must be longer than the bounce time of the switch, but shorter than the total time a key is touched or released. For example, if the switch has a bounce time of 500 µs, then you could scan every 1 ms. If there is exactly one key typed and this key is different from the pattern observed at the time of the previous scan, then you will return the ASCII code.

This experiment will illustrate how a parallel port of the microcomputer will be used to control a keyboard matrix. In each case your computer will drive the rows (output 0 or HiZ) and read the columns. The low-level software (inputs, scans, debounces, and saves keys in a FIFO) runs in a background periodic interrupt thread. Your system must handle two-key rollover. For example, if the operator were to type “1,2,3”, they could push “1”, push “2”, release “1”, push “3”, release “2”, then release “3”.

Low-level device drivers normally exist in the BIOS ROM and have direct access to the hardware. They provide the interface between the hardware and the rest of the software. Good low-level device drivers allow:
• New hardware to be installed
• New synchronization methods to be implemented (like changing busy-waiting to interrupts)
• New algorithms to be added (error detection, data compression)
• Higher level features to be built on top of the low level

and still maintain the same software interface. In larger systems like the Workstation and IBM-PC, the low-level I/O software is compiled and burned in ROM separate from the code that will call it, so it makes sense to implement the device drivers as software traps or software interrupts (swi) and specify the calling sequence in assembly language. In embedded systems like we use, it is OK to provide a source code file that the user can assemble into their application. Linking is the process of resolving addresses to code and programs that have been compiled separately. With a software interrupt, the routines can be called from any program without requiring complicated linking. In other words, when the device driver is implemented with an swi, the linking is built into the operation of the software interrupt instruction. In our embedded system, the assembler will perform the linking.

The concept of a device driver can be illustrated with a prototype device driver. You are encouraged to modify/extend this example, and define/develop/test your own format. A prototype keyboard device driver follows. The device driver software is grouped into four categories.

1. Data structures: global, private (accessed only by the device driver, not the user)
   openFlag    Boolean that is true if the keyboard port is open
               Initially false, set to true by Key_Open, set to false by Key_Close
               Static storage (or dynamically created at bootstrap time, i.e., when loaded into memory)

2. Initialization routines (called by user)
   Key_Open    Initialization of keyboard port
               Sets openFlag to true
               Initializes hardware
               Returns an error code in RegA if unsuccessful (already open)
               Input Parameters (none)
               Output Parameter (error code)
               Typical calling sequence
                    jsr  Key_Open
                    tsta              ; 0 if opened correctly
                    bne  error
   Key_Close   Release of keyboard port
               Sets openFlag to false
               Returns an error code in RegA if not previously open
               Input Parameters (none)
               Output Parameter (error code)
               Typical calling sequence
                    jsr  Key_Close
                    tsta              ; 0 if closed correctly
                    bne  error


3. Regular I/O calls (called by user to perform I/O)
   Key_In      Input an ASCII character from the keyboard port
               Waits for a key to be pressed, then waits for it to be released (there is bounce and two-key rollover)
               Returns data in RegB if successful
               Returns an error code in RegA if unsuccessful: device not open, hardware failure (probably not applicable here)
               Output Parameters: RegB is data, RegA is error code
               Typical calling sequence
                    jsr  Key_In
                    tsta              ; 0 if input is OK
                    bne  error
                    stab data         ; save new key data
   Key_Status  Returns the status of the keyboard port
               Returns a true in RegA if a call to Key_In would return with a key
               Returns a false in RegA if a call to Key_In would not return right away, but rather it would wait
               Returns a true if device not open, hardware failure (probably not applicable here)
               Typical calling sequence
               loop jsr  work         ; perform work until key is typed
                    jsr  Key_Status
                    tsta              ; true if a key is typed
                    beq  loop
                    jsr  Key_In       ; read and process the key

4. Support software (private code). If you have any helper functions, these would be considered local to your driver and would be placed in this category. In C, these helper functions would be defined as private. In C, we could define the helper functions in the .c file, but not place a prototype in the .h file. In this way, the function could only be called from functions in the .c implementation, and not by the user. In assembly language we are very careful not to call a helper function from outside the device driver. An interrupt service routine is an example of support software.

a) Create an I/O window and build a keyboard similar to the one shown in Figure L8.1.

b) Write the low-level keyboard device driver. The main program will implement an access code based security system. Each access code will consist of four digits from 0 to 9.

Figure L8.1 0-9 keyboard with up arrow, down arrow, 2nd, CLEAR, HELP, and ENTER. (Top and bottom views of the keypad; the connector wires are numbered 9 to 1 on 0.1" centers.)


The security system can recognize up to five access codes. You will specify these codes in global memory. The keyboard will be used to enter access codes. If the entered access code is one of the valid codes, checked by searching the access code database, the single LED is turned on. The LED will remain on until the next key is typed. The main program will need its own data structure to hold the last four keys typed. Assume “1257” and “2222” are valid codes. The following example shows the LED status (0 off, 1 on) after each key hit.

Key: 1 2 1 2 5 7 8 9 2 2 2 2 2 2 6 1 2 5 7 4
LED: 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 1 0

Write a main program to test the keyboard device driver. Collect some latency measurements (time from key touch to return of Key_In).

c) Build the LED display, writing a simple device driver that allows you to turn the LED on and off.

d) Write a main program that implements the security system.

Lab 8.2 Input/Output Interface to a Stepper Motor

Purpose: The purpose of this laboratory is to develop a microcomputer system that spins a stepper motor.

Description: a) Design the interface between the stepper and the 9S12. Use the simulator to create three files. Stepper.rtf will contain the assembly source code. Stepper.uc will contain the microcomputer configuration. Stepper.io will define the external connections. You should specify the microcomputer and attach one switch and the four signals to the stepper motor controller. The four stepper motor signals are called B, B̄, A, and Ā.

b) You will write assembly code that inputs from the switch and outputs to the stepper. When the input switch is in the “off” or open position, Port A bit 0 will be “0”. For this situation, your software will not change the Port B stepper motor outputs. When the input switch is in the “on” or closed position, Port A bit 0 will be “1”. In this case, your software will output the sequence 5,6,10,9,5,6,10,9, . . . over and over again to the stepper motor.
The motor will turn 1.8° for every new output to Port B. Instead of a stepper motor, the four outputs will be connected to four LEDs. The following C program describes the software algorithm.

Program L8.2 The C program to illustrate Lab 8.2.

unsigned char Angle;             // ranges from 0 to 199
void main(void){
  Angle = 0;                     // initialize global
  DDRA = 0;                      // make Port A inputs
  DDRB = 0xFF;                   // make Port B outputs
  while(1){
    while((PORTA&0x01)==0){};    // stop if PA0=0, continue if PA0=1
    PORTB = 5;  Angle++;
    PORTB = 6;  Angle++;
    PORTB = 10; Angle++;
    PORTB = 9;  Angle++;
    if(Angle==200) Angle = 0;    // wrap around after one full revolution
  }
}

The software variable Angle varies from 0 to 199 as the stepper motor angle varies from 0 to 358.2°.

c) During the demonstration, you will be asked to run the program to verify proper operation. Be prepared to use the debugger to determine how fast the simulated motor is spinning. Each output to Port B causes a 1.8° step.

Lab 8.3 Calculator

Purpose: The objectives of this lab are to:
• Interface a matrix keyboard and HD44780 LCD display to the microcomputer
• Write device drivers for the keyboard and HD44780 LCD display
• Implement a four-function integer calculator


Description: In this lab you will design a four-function 8-bit unsigned integer calculator. The matrix keypad will include the numbers ‘0’ to ‘9’, and the symbols ‘+’, ‘-’, ‘*’, ‘/’, ‘=’ and ‘C’. The HD44780 LCD display will show both an 8-bit global accumulator and an 8-bit temporary register. You are free to design the calculator functionality in any way you wish, but you must be able to: (1) clear the accumulator and temporary; (2) type numbers in using the matrix keyboard; (3) add, subtract, multiply, and divide; (4) display the results on the HD44780 LCD display. Recall that a device driver is a set of software functions that facilitate the use of an I/O port.

a) Create new program, microcomputer and I/O files. Attach a 16-key matrix keyboard and HD44780 display. You can assume the matrix keyboard does not bounce. During the initial debugging stages of the lab, you may disable the HD44780 busy flag, but your final demonstration will have to include the realistic timing for the LCD.

b) Write a device driver for the HD44780. You should be able to: (1) initialize the interface; (2) clear the display; (3) output a character; (4) output an 8-bit integer; and (5) output a string. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.

c) Write a device driver for the matrix keyboard. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the keyboard must be included in this driver. The names of all the public driver subroutines should start with the letters “Key_”. Draw flowcharts of these subroutines.

d) Write the main program that implements the calculator functionality. Include a “call-graph” of the system.

Lab 8.4 Stepper Motor Controller

Purpose: The objectives of this lab are to:
• Interface a matrix keyboard, an LCD display and a stepper motor to the microcomputer
• Write device drivers for the keyboard, LCD display and stepper motor
• Implement a stepper motor controller

Description: In this lab you will design a simple stepper motor controller. The matrix keypad will include the numbers ‘0’ to ‘9’, and the letters ‘c’ and ‘g’. To move the motor, the operator types in the desired angle (0 to 359), then hits the ‘g’ key. As the operator enters the numbers, the digits are displayed on the three-digit LCD. If the operator types ‘c’, the command is cleared, and no motion occurs. The system should move clockwise or counterclockwise, whichever is fewer steps. While the motor is moving, the three-digit LCD display will show the current angle of the stepper motor (0 to 359). Recall that a device driver is a set of software functions that facilitate the use of an I/O port.

a) Create new program, microcomputer and I/O files. Attach a 12-key matrix keyboard, a three-digit LCD display and one stepper motor. You can assume the matrix keyboard does not bounce.

b) Write a device driver for the 3-digit LCD. You should be able to initialize the interface and output an angle as a number from 0 to 359. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.

c) Write a device driver for the matrix keyboard. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the keyboard must be included in this driver. The names of all the public driver subroutines should start with the letters “Key_”. Draw flowcharts of these subroutines.

d) Write a device driver for the stepper interface. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the stepper motor must be included in this driver. The names of all the public driver subroutines should start with the letters “Step_”. Draw flowcharts of these subroutines.

e) Write the main program that implements the stepper motor controller functionality. Include a “call-graph” of the system.
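The Lab 8.4 requirement that the motor move clockwise or counterclockwise, whichever is fewer steps, reduces to modular arithmetic on the step position. The following is a minimal C sketch of that one calculation, not a lab solution. It assumes a 200-step motor (1.8° per step, as in Lab 8.2); the names Step_FromDegrees and Step_ShortestMove are hypothetical helpers that do not appear in the text, and which sign corresponds to clockwise depends on how the coils are wired.

```c
#include <stdint.h>

#define STEPS_PER_REV 200   /* assumption: 1.8 degrees per step, as in Lab 8.2 */

/* Hypothetical helper: convert an angle in degrees (0 to 359)
   to the nearest step index (0 to STEPS_PER_REV-1). */
uint16_t Step_FromDegrees(uint16_t degrees) {
  uint32_t steps = ((uint32_t)degrees * STEPS_PER_REV + 180u) / 360u; /* rounded */
  return (uint16_t)(steps % STEPS_PER_REV);
}

/* Hypothetical helper: signed number of steps from current to target,
   taking the shorter way around. Both arguments are step indices
   in the range 0 to STEPS_PER_REV-1. A positive result means step in the
   direction of increasing index, a negative result means the other way. */
int16_t Step_ShortestMove(uint16_t current, uint16_t target) {
  /* forward distance, always 0 to STEPS_PER_REV-1 */
  int16_t diff = (int16_t)((target + STEPS_PER_REV - current) % STEPS_PER_REV);
  if (diff > STEPS_PER_REV / 2) {
    diff -= STEPS_PER_REV;      /* going backward is shorter */
  }
  return diff;
}
```

For example, with the motor at step 190 and a target of step 10, Step_ShortestMove returns +20 rather than -180. A move of exactly half a revolution is the same length either way; this sketch arbitrarily reports it as positive.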

9 Interrupt Programming and Real-Time Systems

Chapter 9 objectives are to:
• Explain the fundamentals of interrupt programming
• Introduce interrupt-driven I/O, and implement periodic interrupts
• Explain key wakeup interrupts and use them to interface individual switches
• Present the timer-based modules needed for real-time systems
• Use the pulse accumulator and input capture to measure period and pulse width
• Develop methods to debug real-time events

An embedded system uses its input/output devices to interact with the external world. Input devices allow the computer to gather information, and output devices can display information. Output devices also allow the computer to manipulate its environment. The tight coupling between the computer and external world distinguishes an embedded system from a regular computer system. The challenge is that in most situations the software executes much faster than the hardware. For example, the software may ask the hardware to clear the LCD display, but within the hardware this action might take 1 ms to complete. During this time, the software could execute thousands of instructions. Therefore, the synchronization between the executing software and its external environment is critical for the success of an embedded system. This chapter begins with an overview of I/O synchronization. We then present general concepts about interrupts, and specific details for the 9S12. We will then use periodic interrupts to cause a software task to be executed on a periodic basis. This chapter describes the timer-based modules used to design real-time embedded systems.

9.1 I/O Synchronization

Latency is the time between when the I/O device needs service and the time when service is initiated. Latency includes hardware delays in the digital hardware plus computer software delays. For an input device, software latency (or software response time) is the time between new input data being ready and the software reading the data. For an output device, latency is the delay between the output device becoming idle and the software giving the device new data to output. In this book, we will also have periodic events. For example, in our data acquisition systems, we wish to invoke the analog to digital converter (ADC) at a fixed time interval. In this way we can collect a sequence of digital values that approximate the continuous analog signal. Software latency in this case is the time between when the ADC is supposed to be started, and when it is actually started. The microcomputer-based control system also employs periodic software processing. Similar to the data acquisition system, the latency in a control system is the time between when the control software is supposed to be run, and when it is actually run. A real-time system is one that can guarantee a worst-case latency. In other words, the software response time is small and bounded.

Throughput or bandwidth is the maximum data flow in bytes/second that can be processed by the system. Sometimes the bandwidth is limited by the I/O device, while other times it is limited by computer software. Bandwidth can be reported as an overall average or a short-term maximum.

Priority determines the order of service when two or more requests are made simultaneously. Priority also determines if a high priority request should be allowed to suspend a low priority request that is currently being processed. We may also wish to implement equal priority, so that no one device can monopolize the computer. In some computer literature, the term “soft real-time” is used to describe a system that supports priority.

The purpose of our interface is to allow the microprocessor to interact with its external I/O device. There are five mechanisms to synchronize the microprocessor with the I/O device. Each mechanism synchronizes the I/O data transfer to the busy to done transition. The methods are discussed in the following paragraphs.

Blind cycle is a method where the software simply waits a fixed amount of time and assumes the I/O will complete before that fixed delay has elapsed. For an input device, the software triggers (starts) the external input hardware, waits a specified time, then reads data from the device, see the left part of Figure 9.1. For an output device, the software writes data to the output device, triggers (starts) the device, then waits a specified time.
We call this method blind, because there is no status information about the I/O device reported to the computer software. It is appropriate to use this method in situations where the I/O delay is short and predictable. One appropriate application of blind cycle synchronization is an ADC. For example, we can ask the ADC to convert, wait exactly 7 µs, then read the digital result. This method works because the ADC conversion time is short and predictable. Another good example of blind cycle synchronization is spinning a stepper motor. If we repeat this 8-step sequence over and over: (1) output a 0x05, (2) wait 1 ms, (3) output a 0x06, (4) wait 1 ms, (5) output a 0x0A, (6) wait 1 ms, (7) output a 0x09, (8) wait 1 ms, the motor will spin at a constant speed. The LCD interface developed in Section 8.5 utilized blind cycle synchronization.

Busy Waiting is a software loop that checks the I/O status, waiting for the done state. For an input device, the software waits until the input device has new data, then reads it from the input device, see the middle part of Figure 9.1. For an output device, the software writes data, triggers the output device, then waits until the device is finished. Another approach to output device interfacing is for the software to wait until the output device has finished the previous output, write data, then trigger the device. Busy-wait synchronization will be used in situations where the software system is relatively simple and real-time response is not important. The ADC could also have been interfaced with busy-wait synchronization. For example, we can ask the ADC to convert, wait until the sequence conversion flag (SCF) in the ADC is set, then read the digital result.

An interrupt uses hardware to cause special software execution. With an input device, the hardware will request an interrupt when the input device has new data.
The software interrupt service will read from the input device and save the data in a global structure, see the right part of Figure 9.1. With an output device, the hardware will request an interrupt when the output device is idle. The software interrupt service will get data from a global structure, then write to the device. Sometimes we configure the hardware timer to request interrupts on a periodic basis. The software interrupt service will perform a special function. For example, a data acquisition system needs to read the ADC at a regular rate. The 9S12 microcomputer will execute special software (trap) when it tries to execute an illegal instruction. Other computers can be configured to request an interrupt on an access to an illegal address or a divide by zero. The Freescale microcomputers do not provide for a divide by zero trap, but many computers do. Interrupt synchronization will be used in situations where the system is fairly complex (e.g., a lot of I/O devices) or when real-time response is important.

Periodic Polling uses a clock interrupt to periodically check the I/O status. At the time of the interrupt the software will check the I/O status, performing actions as needed. With an input device, a ready flag is set when the input device has new data. At the next periodic interrupt after an input flag is set, the software will read the data and save them in a global structure. With an output device, a ready flag is set when the output device is idle. At the next periodic interrupt after an output flag is set, the software will get data from a global structure, and write it. Periodic polling will be used in situations that require interrupts, but the I/O device does not support interrupt requests directly.

DMA, or direct memory access, is an interfacing approach that transfers data directly to/from memory. With an input device, the hardware will request a DMA transfer when the input device has new data. Without the software’s knowledge or permission the DMA controller will read data from the input device and save it in memory. With an output device, the hardware will request a DMA transfer when the output device is idle. The DMA controller will get data from memory, then write it to the device. Sometimes we configure the hardware timer to request DMA transfers on a periodic basis. DMA can be used to implement a high-speed data acquisition system. DMA synchronization will be used in situations where high bandwidth and low latency are important.

One can think of the hardware being in one of three states. The idle state is when the device is disabled or inactive. No I/O occurs in the idle state. When active (not idle) the hardware toggles between the busy and ready states.
The interface includes a flag specifying either busy (0) or ready (1) status.
• The hardware will set the flag when the hardware component of the I/O operation is complete.
• The software can read the flag to determine if the device is busy or ready.
• The software can clear the flag, signifying the software component is complete.
• This flag serves as the hardware trigger event for an interrupt.

For an input device, a status flag is set when new input data is available. The “busy to ready” state transition will cause a busy-wait loop to complete, see middle of Figure 9.1. Once the software recognizes the input device has new data, it will read the data and ask the input device to create more data. It is the busy to ready state transition that signals to the computer that service is required. When the hardware is in the done state the I/O transaction is complete. Often the simple process of reading the data will clear the flag and request another input.

Figure 9.1 The input device sets a flag when it has new data.

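[Figure 9.1 flowcharts. Blind Cycle: wait a fixed time, read data, return. Busy-Wait: loop while the status is busy, read data when ready, return. Interrupt: the input interrupt service routine reads data, puts it in the FIFO, and returns from interrupt; the main program waits while the FIFO is empty, then gets data from the FIFO and returns.]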
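The blind-cycle and busy-wait input flowcharts of Figure 9.1 can be sketched in C as follows. This is an illustrative sketch only: the InputDevice structure is a hypothetical software model of a device (a ready flag plus a data register) so that the code runs on a desktop compiler, and the names Input_Start, Input_Tick, Input_BlindCycle, and Input_BusyWait are assumptions, not 9S12 registers or functions from the text. On real hardware the flag and data would be memory-mapped I/O registers and time would pass on its own.

```c
#include <stdint.h>

/* Hypothetical model of an input device (not a real 9S12 register map). */
typedef struct {
  volatile uint8_t ready;  /* status flag: 0 = busy, 1 = ready */
  volatile uint8_t data;   /* input data register */
  uint8_t ticksLeft;       /* simulation only: time until the input completes */
  uint8_t nextValue;       /* simulation only: value the device produces next */
} InputDevice;

/* Simulation only: one unit of time passes inside the device. */
static void Input_Tick(InputDevice *dev) {
  if (dev->ready == 0) {
    if (dev->ticksLeft > 0) dev->ticksLeft--;
    if (dev->ticksLeft == 0) {
      dev->data = dev->nextValue++;
      dev->ready = 1;            /* busy to ready transition */
    }
  }
}

/* Software trigger: clear the flag and start creating new input. */
void Input_Start(InputDevice *dev, uint8_t ticksNeeded) {
  dev->ready = 0;
  dev->ticksLeft = ticksNeeded;
}

/* Blind cycle: wait a fixed time, then read without checking the status. */
uint8_t Input_BlindCycle(InputDevice *dev, uint8_t fixedDelay) {
  uint8_t i;
  for (i = 0; i < fixedDelay; i++) {
    Input_Tick(dev);             /* stands in for a software delay loop */
  }
  return dev->data;
}

/* Busy-wait: poll the flag until the device is ready, then read. */
uint8_t Input_BusyWait(InputDevice *dev) {
  while (dev->ready == 0) {
    Input_Tick(dev);             /* on real hardware the loop body is empty */
  }
  return dev->data;
}
```

If the fixed delay passed to Input_BlindCycle is shorter than the time the device actually needs, the function returns whatever stale value is still in the data register, which is why blind cycle is appropriate only when the I/O delay is short and predictable. Input_BusyWait cannot return stale data, but it never returns if the device never becomes ready.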

The problem with I/O devices is that they are usually much slower than software execution. Therefore, we need synchronization, which is the process of the hardware and software waiting for each other in a manner such that data is properly transmitted. A way to visualize this synchronization is to draw a state versus time plot of the activities of the hardware and software. For an input device, the software begins by waiting for new input. When the input device is busy it is in the process of creating new input. When the input device is ready, new data is available. When the input device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software accepts the input, it can release the input device hardware. The arrows in Figure 9.2 represent the synchronizing events. In this example, the time for the software to read and process the data is less than the time for the input device to create new input. This situation is called I/O bound, meaning the bandwidth is limited by the speed of the I/O hardware.

Figure 9.2 The software must wait for the input device to be ready.

[Figure 9.2 timing diagram: the input device alternates between Busy and Ready while the software alternates between Wait and Read/Process; the horizontal axis is time.]

If the input device were faster than the software, then the software waiting time would be zero. This situation is called CPU bound (meaning the bandwidth is limited by the speed of the executing software). From this figure we can see that the bandwidth depends on both the hardware and the software. The busy-wait method is classified as unbuffered because the hardware and software must wait for each other during the transmission of each piece of data. The interrupt solution (shown in the right part of Figure 9.1) is classified as buffered, because the system allows the input device to run continuously, filling a FIFO with data as fast as it can. In the same way, the software can empty the buffer whenever it is ready and whenever there is data in the buffer. We will implement a buffered interface for the serial port input in Chapter 12 using interrupts. For an output device, a status flag is set when the output is idle and ready to accept more data. The “busy to ready” state transition causes a busy-wait loop to complete, see the middle part of Figure 9.3. Once the software recognizes the output is idle, it gives the output device another piece of data to output. It will be important to make sure the software clears the flag each time new output is started. Figure 9.3 The output device sets a flag when it has finished outputting the last data.

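[Figure 9.3 flowcharts. Blind Cycle: write data, wait a fixed time, return. Busy-Wait: loop while the status is busy, write data when ready, return. Interrupt: the main program waits while the FIFO is full, then puts data in the FIFO and returns; the output interrupt service routine gets data from the FIFO when it is not empty, writes the data, and returns from interrupt.]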
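The busy-wait output flowchart of Figure 9.3 can be sketched in C along the same lines. As before, this is a sketch under stated assumptions: OutputDevice is a hypothetical software model so that the code runs on a desktop compiler, and Output_Tick and Output_BusyWait are illustrative names, not 9S12 registers or functions from the text. The waitPolls counter exists only to make the waiting visible.

```c
#include <stdint.h>

/* Hypothetical model of an output device (not a real 9S12 register map). */
typedef struct {
  volatile uint8_t ready;  /* status flag: 1 = idle and ready, 0 = busy */
  volatile uint8_t data;   /* output data register */
  uint8_t ticksLeft;       /* simulation only: time until the output completes */
  uint8_t ticksPerOutput;  /* simulation only: time one output takes */
  uint32_t waitPolls;      /* simulation only: polls the software spent waiting */
} OutputDevice;

/* Simulation only: one unit of time passes inside the device. */
static void Output_Tick(OutputDevice *dev) {
  if (dev->ready == 0) {
    if (dev->ticksLeft > 0) dev->ticksLeft--;
    if (dev->ticksLeft == 0) dev->ready = 1;   /* busy to ready transition */
  }
}

/* Busy-wait output: wait for the previous output to finish, then write.
   Writing new data starts the next output and clears the flag. */
void Output_BusyWait(OutputDevice *dev, uint8_t value) {
  while (dev->ready == 0) {
    dev->waitPolls++;
    Output_Tick(dev);            /* on real hardware the loop body is empty */
  }
  dev->data = value;             /* give the device the new data */
  dev->ready = 0;                /* flag cleared: device is busy again */
  dev->ticksLeft = dev->ticksPerOutput;
}
```

Because this version waits before writing rather than after, the function returns as soon as the output has been started, so the software can generate the next piece of data while the device is busy. When the device is slower than the software, the second and later calls spend most of their time in the polling loop.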

Figure 9.4 contains a state versus time plot of the activities of the output device hardware and software. For an output device, the software begins by generating data and then sending it to the output device. While the output device is busy, it is processing the data. Normally, when the software writes data to an output port, that only starts the output process. The time it takes an output device to process data is usually longer than the software execution time. When the output device is done, it is ready for new data. When the output device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software writes data to the output, it releases the output device hardware. The output interface illustrated in Figure 9.4 is also I/O bound because the time for the output device to process data is longer than the time for the software to generate and write it. The arrows in Figure 9.4 signify the synchronizing events. Again, I/O bound means the bandwidth is limited by the speed of the I/O hardware. The busy-wait solution for this output interface is also unbuffered, because when the hardware is done it waits for the software, and after the software generates data it waits for the hardware. On the other hand, the interrupt solution (shown as the right part of Figure 9.3) is buffered, because the system allows the software to run continuously, filling a FIFO as fast as it wishes. In the same way, the hardware can empty the buffer whenever it is ready and whenever there is data in the FIFO. We will implement a buffered interface for the serial port output in Chapter 12 using interrupts.

Figure 9.4 The software must wait for the output device to finish the previous operation. [Timing diagram: the output device alternates between Busy and Ready; the software repeats Generate, Wait, Write; the horizontal axis is Time.]
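The buffering idea can be illustrated with a small FIFO. The sketch below is an assumption-laden illustration, not the implementation developed in Chapter 12: the names Fifo, fifo_init, fifo_put, and fifo_get and the capacity FIFO_SIZE are hypothetical. In a buffered output interface the main program would call fifo_put and the output ISR would call fifo_get.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 4u   /* illustrative capacity, chosen small for testing */

typedef struct {
    uint8_t  buf[FIFO_SIZE];
    uint16_t put_index;  /* where the next element will be stored */
    uint16_t get_index;  /* where the oldest element will be read */
    uint16_t count;      /* number of elements currently stored */
} Fifo;

void fifo_init(Fifo *f) {
    f->put_index = 0;
    f->get_index = 0;
    f->count = 0;
}

/* Store one byte. Returns 1 on success, 0 if the FIFO is full. */
int fifo_put(Fifo *f, uint8_t data) {
    if (f->count == FIFO_SIZE) {
        return 0;
    }
    f->buf[f->put_index] = data;
    f->put_index = (uint16_t)((f->put_index + 1u) % FIFO_SIZE);
    f->count++;   /* on a real system this shared update must be atomic */
    return 1;
}

/* Remove the oldest byte. Returns 1 on success, 0 if the FIFO is empty. */
int fifo_get(Fifo *f, uint8_t *data) {
    assert(data != 0);
    if (f->count == 0) {
        return 0;
    }
    *data = f->buf[f->get_index];
    f->get_index = (uint16_t)((f->get_index + 1u) % FIFO_SIZE);
    f->count--;   /* on a real system this shared update must be atomic */
    return 1;
}
```

Because count is modified by both the producer and the consumer, a real foreground/background system would have to protect those updates; this host-side sketch runs in a single thread and omits that protection.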

9.2 Interrupt Concepts

9.2.1 Introduction

An interrupt is the automatic transfer of software execution in response to a hardware event that is asynchronous with the current software execution. This hardware event is called a trigger. The hardware event can either be busy to ready transition in an external I/O device (like the SCI input/output) or an internal event (like an op code fault, memory fault, power failure, or a periodic timer). When the hardware needs service, signified by a busy to ready-state transition, it will request an interrupt by setting its trigger flag. A thread is defined as the path of action of software as it executes. The execution of the interrupt service routine is called a background thread. This thread is created by the hardware interrupt request and is killed when the interrupt service routine executes the rti instruction. A new thread is created for each interrupt request. It is important to consider each individual request as a separate thread because local variables and registers used in the interrupt service routine are unique and separate from one interrupt event to the next interrupt. In a multithreaded system, we consider the threads as cooperating to perform an overall task. Consequently we will develop ways for the threads to communicate (e.g., FIFO) and synchronize with each other. Most embedded systems have a single common overall goal. On the other hand, general-purpose computers can have multiple unrelated functions to perform. A process is also defined as the action of software as it executes. Processes do not necessarily cooperate towards a common shared goal. Threads share access to I/O devices, system resources, and global variables, while processes have separate global variables and system resources. Processes do not share I/O devices. The software has dynamic control over aspects of the interrupt request sequence. First, each potential interrupt trigger has a separate arm bit that the software can activate or deactivate. 
The software will set the arm bits for those devices from which it wishes to accept interrupts, and will clear the arm bits of those devices from which interrupts are not to be allowed. In other words, it uses the arm bits to individually select which devices will and which devices will not request interrupts. The second aspect that the software controls is the interrupt enable bit, I, which is in the condition code register. The software can enable interrupts by making I = 0, or it can disable interrupts by setting I = 1. An interrupt occurs only when all three conditions are met: trigger, arm, and enable. The disabled interrupt state (I = 1) does not dismiss the interrupt requests; rather, it postpones them until a later time, when the software deems it convenient to handle the requests. We will pay special attention to these enable/disable software actions. In particular, we will need to disable interrupts when executing nonreentrant code, but disabling interrupts will have the effect of increasing the response time of software.

The interrupt service routine (ISR) is the software module that is executed when the hardware requests an interrupt. There may be one large ISR that handles all requests (polled interrupts), or many small ISRs specific for each potential source of interrupt (vectored interrupts). The design of the interrupt service routine requires careful consideration of many factors.

Three conditions must be true for an interrupt to be generated. A device must be armed (e.g., RIE is set), interrupts must be enabled (I = 0), and an external event must occur setting a trigger flag (e.g., new SCI input ready sets RDRF). An interrupt causes the following sequence of events. First, the current instruction is finished. There are exceptions to this rule: the 9S12 instructions rev, revw, and wav take a long time to execute, hence these three instructions can be interrupted in the middle of their execution. Second, the execution of the main program is suspended, pushing all the registers on the stack. Third, the PC is loaded with the address of the ISR (vector). Lastly, interrupts are disabled (I = 1). These four steps, called a context switch, occur automatically in hardware as the context is switched from foreground to background. Next, the software executes the ISR. When the ISR is done, it executes an rti, causing the main program execution to be resumed. When the microcomputer accepts an interrupt request, it will automatically save the execution state of the main thread by pushing the registers (CCR, A, B, X, Y, and PC) on the stack.
After the ISR provides the necessary service, it will execute an rti instruction. This instruction pulls these registers from the stack, which returns control to the main program. Since all threads use the same stack pointer, it is imperative that the ISR software balance the stack before exiting via the rti instruction. Execution of the main program will then continue with the exact stack and register values that existed before the interrupt. Although interrupt handlers can create and use local variables, parameter passing between threads must be implemented using shared global memory variables. A private global variable can be used if an interrupt thread wishes to pass information to itself, e.g., from one interrupt instance to another. The execution of the main program is called the foreground thread, and the executions of the various interrupt service routines are called background threads.

An axiom with interrupt synchronization is that the interrupt program should execute as fast as possible. The interrupt should occur when it is time to perform a needed function, and the interrupt service routine should perform that function and return right away. Placing backward branches (busy-waiting loops, iterations) in the interrupt software should be avoided if possible. The percentage of time spent executing interrupt software should be minimized.

For an input device, the interface latency of an interrupt-driven input device is the time between when new input is available and the time when the software reads the input data. We can also define device latency as the response time of the external I/O device. For example, if we request that a certain sector be read from a disk, then the device latency is the time it takes to find the correct track and spin the disk (seek) so the proper sector is positioned under the read head.
For an output device, the interface latency of an interrupt-driven output device is the time between when the output device is idle and the time when the software writes new data. A real-time system is one that can guarantee a worst-case interface latency.

Many factors should be considered when deciding the most appropriate mechanism to synchronize hardware and software. One should not always use busy-waiting because one is too lazy to implement the complexities of interrupts. On the other hand, one should not always use interrupts because they are fun and exciting. Busy-waiting synchronization is appropriate when the I/O timing is predictable, and when the I/O structure is simple and fixed. Busy-waiting should be used for dedicated single-thread systems where there is nothing else to do while the I/O is busy. Interrupt synchronization is appropriate when the I/O timing is variable, and when the I/O structure is complex. In particular, interrupts are efficient when there are I/O devices with different speeds. Interrupts allow for quick response times to important events. In particular, using interrupts is one mechanism to design real-time systems, where the interface latency must be short and bounded. They can also be used for infrequent but critical events like power failure, memory faults, and machine errors. Interrupts can be used to assist program development by triggering on stack overflow, invalid op code, and breakpoints. Periodic interrupts will be useful for real-time clocks, data acquisition systems, and control systems. For extremely high bandwidth and low latency interfaces, DMA should be used.

An atomic operation is a sequence that once started will always finish, and cannot be interrupted. Most instructions on the 9S12 are atomic. The exceptions are rev, revw, and wav, which can be suspended to process an interrupt. If we wish to make a section of code atomic, we can run that code with I = 1. In this way, interrupts will not be able to break apart the sequence. In particular, to implement an atomic operation we will (1) save the current value of the CCR, (2) disable interrupts, (3) execute the operation, and (4) restore the CCR back to its previous value.

Checkpoint 9.1: What three conditions must be true for an interrupt to occur?
Checkpoint 9.2: How do you enable interrupts?
Checkpoint 9.3: What are the steps that occur when an interrupt is processed?
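The four-step recipe for an atomic operation (save the CCR, disable interrupts, execute the operation, restore the CCR) can be sketched as below. This is a host-side simulation: the variable sim_ccr stands in for the real condition code register, and the names critical_begin, critical_end, atomic_increment, and shared_counter are hypothetical. On a real 9S12 the save, disable, and restore steps would be done with CPU instructions rather than a C variable.

```c
#include <assert.h>
#include <stdint.h>

#define CCR_I 0x10u               /* I bit is bit 4 of the CCR */

uint8_t  sim_ccr = CCR_I;         /* simulated CCR; I = 1 out of reset */
uint16_t shared_counter = 0;      /* example of data shared between threads */

/* Steps 1 and 2: save the current CCR, then disable interrupts (I = 1). */
uint8_t critical_begin(void) {
    uint8_t saved = sim_ccr;
    sim_ccr = (uint8_t)(sim_ccr | CCR_I);
    return saved;
}

/* Step 4: restore the CCR to its previous value. */
void critical_end(uint8_t saved) {
    sim_ccr = saved;
}

/* Step 3 wrapped by the other steps: an atomic read-modify-write. */
void atomic_increment(void) {
    uint8_t saved = critical_begin();
    assert((sim_ccr & CCR_I) != 0);   /* interrupts are disabled here */
    shared_counter++;
    critical_end(saved);
}
```

Restoring the saved CCR, rather than unconditionally clearing I, matters when the routine is called with interrupts already disabled: the caller's disabled state is preserved instead of being silently re-enabled.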

9.2.2 Essential Components of Interrupt Processing

In this section, we will present the specific details for the 9S12 microcomputers. As you develop experience using interrupts, you will come to notice a few common aspects that most computers share. The following paragraphs outline three essential mechanisms that are needed to utilize interrupts. Although every computer that uses interrupts includes all three mechanisms there are a wide spectrum of implementation methods. All interrupting systems must have the ability for the hardware to request action from computer. The interrupt requests can be generated using a separate connection to microprocessor for each device, or using a shared negative logic wire-or requests using open collector logic. The shared interrupt request line on the 9S12 is IRQ, which is on the PE1 pin. The XIRQ line on the PE0 pin can also be shared, but XIRQ is usually reserved for catastrophic errors. The Freescale microcomputers support both types. All interrupting systems must have the ability for the computer to determine the source. A vectored interrupt system employs separate connections for each device so that the computer can give automatic resolution. You can recognize a vectored system because each device has a separate interrupt vector address. With a polled interrupt system, the interrupt software must poll each device, looking for the device that requested the interrupt. The third necessary component of the interface is the ability for the computer to acknowledge the interrupt. Normally there is a trigger flag in the interface that is set on the busy to ready state transition. In essence this trigger flag is the cause of the interrupt. Acknowledging the interrupt involves clearing this flag. It is important to shut off the request, so that the computer will not mistakenly request a second (and inappropriate) interrupt service for the same condition. Some Intel systems use a hardware acknowledgment that automatically clears the request. 
Most Freescale microcomputers use a software acknowledge. So when designing an interrupting interface on the 9S12, it will be important to know exactly what hardware conditions will set the trigger flag (and request an interrupt) and how the software will clear it (acknowledge) in the ISR.

There are no standard definitions for the terms mask, enable, and arm in the professional, Computer Science, or Computer Engineering communities. Nevertheless, in this book we will adhere to the following specific meanings. To arm (disarm) a device means to enable (shut off) the source of interrupts. Each potential interrupting device has a separate arm bit. One arms (disarms) a device if one is (is not) interested in interrupts from this source. For example, the 9S12 TIE register has eight arm bits for the output compare and input capture interrupts. The Freescale literature calls the arm bit an “interrupt enable mask”. To enable (disable) means to allow interrupts at this time (postponing interrupts until a later time). On the 9S12 there is one interrupt enable bit for the entire interrupt system. We disable interrupts if it is currently not convenient to accept interrupts. In particular, to disable interrupts we set the I bit in the 9S12 condition code register using the sei instruction.

The software interrupt (swi) instruction and illegal instruction trap cannot be disarmed or disabled. The XIRQ interrupt can be enabled by clearing the X bit in the CCR, but XIRQ interrupts cannot be disabled. In particular, once cleared, the software cannot set the X bit. The reset line will halt execution and load the PC with the 16-bit contents at $FFFE, but does not save the current state by pushing registers on the stack. Reset can’t be disarmed or disabled.

Common Error: The system will crash if the interrupt service routine doesn’t either acknowledge or disarm the device requesting the interrupt.

Common Error: The ISR software doesn’t have to explicitly disable interrupts at the beginning (sei) or explicitly reenable interrupts at the end (cli). The disabling and enabling occur automatically.

9.2.3 Sequence of Events

The sequence of events begins with the hardware-needs-service (busy to done) transition. This signal is connected to an input of the microcomputer that can generate an interrupt. For example, the key wakeup, input capture, serial communication interface (SCI) and serial peripheral interface (SPI) systems support interrupt requests. Some interrupts are internally generated, like output compare, real-time interrupt (RTI), and timer overflow. The second event is the setting of a trigger flag in one of the I/O status registers of the microcomputer. This is the same flag that a busy-waiting interface would be polling on. Examples include the key wakeup (KWIFJn), serial communication interface (RDRF and TDRE), output compare (CnF), real-time interrupt (RTIF), and timer overflow (TOF). In order for an interrupt to be requested, the appropriate trigger flag bit must be armed. Examples include the key wakeup (KWIEJn), serial communication interface (RIE and TIE), output compare (CnI), real-time interrupt (RTII), and timer overflow (TOI). In summary, three conditions must be met simultaneously for an interrupt service to occur. These three conditions can occur in any order.

1. A device is armed (e.g., C3I = 1)
2. The microcomputer interrupts are enabled (I = 0)
3. An interrupting event occurs that sets the trigger (e.g., C3F = 1)
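The three conditions can be expressed as a single logical test. The sketch below is a simulation with hypothetical names (IntState, interrupt_requested); the fields stand in for a flag register, an arm register, and the I bit, and are not actual 9S12 register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the interrupt-related state of one device group. */
typedef struct {
    uint8_t flags;   /* trigger flags, e.g., bit 3 plays the role of C3F */
    uint8_t arm;     /* arm bits, e.g., bit 3 plays the role of C3I */
    uint8_t i_bit;   /* CCR I bit: 0 = interrupts enabled, 1 = disabled */
} IntState;

/* An interrupt is requested for the source selected by mask only when
   the trigger flag is set, the source is armed, and I = 0. */
int interrupt_requested(const IntState *s, uint8_t mask) {
    assert(mask != 0);
    return ((s->flags & mask) != 0) &&
           ((s->arm & mask) != 0) &&
           (s->i_bit == 0);
}
```

Because the test only looks at the current state, the order in which the three conditions became true does not matter. A flag that is set while I = 1 stays set, so the request is postponed, not lost, and is recognized as soon as I becomes 0.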

The third event in the interrupt processing sequence is the context switch, or thread-switch. The thread-switch is performed by the microcomputer hardware automatically. First, the microcomputer will finish the current instruction (rev, revw, and wav are interruptible). After the current instruction is complete, it takes 9 more bus cycles on the 9S12 to perform the thread-switch:

1. The 16-bit interrupt vector address is read (eventually this is loaded into the PC)
2. The PC is pushed (return address)
3. The first of three op code fetches is performed to fill the instruction queue
4. Register Y is pushed on the stack
5. Register X is pushed on the stack
6. The second of three op code fetches is performed to fill the instruction queue
7. Registers B and A are pushed on the stack (RegD is pushed little endian)
8. The CCR is pushed, with the I bit still equal to 0, then I is set to 1
9. The third of three op code fetches is performed to fill the instruction queue (queue is full)
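To make the resulting stack layout concrete, the following sketch simulates the register pushes of the thread-switch and the matching pulls of rti on a host computer. It is a simplified model under stated assumptions: the Cpu structure and function names are hypothetical, the op code fetches and bus-cycle timing are not modeled, and the vector is read at the end of the function rather than in the first cycle.

```c
#include <assert.h>
#include <stdint.h>

#define CCR_I 0x10u   /* I bit is bit 4 of the CCR */

/* Hypothetical model of the registers and a 64 KiB address space. */
typedef struct {
    uint8_t  ccr, a, b;
    uint16_t x, y, pc, sp;
    uint8_t  mem[0x10000];
} Cpu;

static void push8(Cpu *c, uint8_t v) {
    c->sp = (uint16_t)(c->sp - 1u);
    c->mem[c->sp] = v;
}
static void push16(Cpu *c, uint16_t v) {   /* high byte at the lower address */
    push8(c, (uint8_t)(v & 0xFFu));
    push8(c, (uint8_t)(v >> 8));
}
static uint8_t pull8(Cpu *c) {
    uint8_t v = c->mem[c->sp];
    c->sp = (uint16_t)(c->sp + 1u);
    return v;
}
static uint16_t pull16(Cpu *c) {
    uint16_t hi = pull8(c);
    uint16_t lo = pull8(c);
    return (uint16_t)((hi << 8) | lo);
}

/* Thread-switch into the ISR: push PC, Y, X, A, B, CCR, set I, load the vector. */
void interrupt_entry(Cpu *c, uint16_t vector) {
    assert(c->sp >= 9u);
    push16(c, c->pc);
    push16(c, c->y);
    push16(c, c->x);
    push8(c, c->a);                        /* A ends up above B in memory */
    push8(c, c->b);
    push8(c, c->ccr);                      /* pushed with I still 0 */
    c->ccr = (uint8_t)(c->ccr | CCR_I);    /* then I = 1 */
    c->pc = (uint16_t)((c->mem[vector] << 8) | c->mem[(uint16_t)(vector + 1u)]);
}

/* rti: pull CCR, B, A, X, Y, and PC, which also restores the old I bit. */
void return_from_interrupt(Cpu *c) {
    c->ccr = pull8(c);
    c->b = pull8(c);
    c->a = pull8(c);
    c->x = pull16(c);
    c->y = pull16(c);
    c->pc = pull16(c);
}
```

After interrupt_entry the stack pointer has moved down by 9 bytes, with the old CCR at SP, B at SP+1, A at SP+2, X at SP+3, Y at SP+5, and the return address at SP+7.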


The fourth event is the software execution of the interrupt service routine (ISR). For a polled interrupt configuration, the ISR must poll each possible device, and branch to the specific handler for that device. The polling order establishes device priority. For a vectored interrupt configuration, you could poll anyway to check for runtime hardware/software errors. The ISR must either acknowledge or disarm the interrupt. We acknowledge an interrupt by clearing the trigger flag that was set in the second event shown above. After we acknowledge a low-priority interrupt, we may re-enable interrupts (cli) to allow higher priority devices to go first. All ISRs must perform the necessary operations (read data, write data, etc.) and pass parameters through shared global memory (e.g., FIFO queue).

The last event is another thread-switch in order to return control back to the thread that was running when the interrupt was processed. In particular, the software executes an rti at the end of the ISR, which will pull CCR, B, A, X, Y, and PC off the stack. At the beginning of the interrupt service the CCR was pushed on the stack with I = 0. Therefore, the execution of rti automatically re-enables interrupts. After the ISR executes rti, the stack is restored to the state it was in before the interrupt. The ISR may change global variables or I/O ports, but the registers and stack are left unchanged by the ISR. The interrupt hardware will automatically save all registers on the stack during the thread-switch, as shown in Figure 9.5. The thread-switch is the process of stopping the foreground (main) thread and starting the background (interrupt handler). The “oldPC” value on the stack points to the place in the foreground thread to resume once the interrupt is complete. At the end of the interrupt handler, another thread-switch occurs as the rti instruction restores registers from the stack (including the PC).

Checkpoint 9.4: What would happen if the ISR forgot to acknowledge the interrupt?
Checkpoint 9.5: If you didn’t want to or couldn’t acknowledge, what else might the ISR do?

Figure 9.5 Stack before and after an interrupt.

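[Figure 9.5 panels. “Before interrupt”: I = 0, SP points to the top of the stack in RAM, and PC points into main. “After Context Switch”: the four steps are 1) finish instruction, 2) push registers, 3) PC = {Vector}, 4) I = 1; the stack now holds old CC, old B, old A, old X, old Y, and old PC, and PC points into the handler, which ends with rti. In both panels RAM (with the stack) ends at $3FFF, EEPROM begins at $4000 and contains main, the handler, and the vector, and the address map ends at $FFFF.]

9.2.4 9S12 Interrupts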

On the 9S12, exceptions include resets, software interrupts and hardware interrupts. Each exception has an associated 16-bit vector that points to the memory location where the ISR that handles the exception is located. Vectors are stored in the upper 128 bytes of the standard 64 kibibyte address map. As we have seen previously, the reset vector points to the main program, but the other vectors will point to interrupt service routines. A hardware priority hierarchy determines which exception is serviced first when simultaneous requests are made. Basically, the exception with the vector at a higher address has priority over an exception with a vector at a lower address. Since the reset vector is at $FFFE, it is the highest priority exception. Six exceptions are nonmaskable, meaning there is no associated arm bit, and the exception is not affected by the I bit in the CCR. The remaining sources have an arm bit that can be activated (armed) or deactivated (disarmed). The priorities of the nonmaskable sources are:

1. Power-On-Reset (POR) or regular hardware RESET pin
2. Clock monitor reset
3. Computer-Operating-Properly (COP) watchdog reset
4. Unimplemented instruction (trap)
5. Software interrupt instruction (swi)
6. XIRQ signal (if X bit in CCR = 0)
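The fixed priority rule (the vector at the higher address wins) can be illustrated with a few lines of C. This sketch assumes that every request in the list is simultaneously pending and eligible to be serviced; the function name highest_priority_vector is hypothetical, and the effects of the I and X bits are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given the vector addresses of all simultaneously pending, eligible
   requests, return the one serviced first: the highest address.
   Returns 0 if the list is empty. */
uint16_t highest_priority_vector(const uint16_t *pending, size_t n) {
    uint16_t best = 0;
    size_t i;
    assert(pending != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (pending[i] > best) {
            best = pending[i];
        }
    }
    return best;
}
```

For example, if IRQ ($FFF2), timer channel 0 ($FFEE), and SCI0 ($FFD6) are all pending, the IRQ request is serviced first; a reset ($FFFE) would take precedence over all of them.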

Maskable interrupt sources include on-chip peripheral systems and external interrupt service requests. Interrupts from these sources are recognized when the interrupt enable bit (I) in the CCR is cleared. The default state of the I bit out of reset is one, but it can be written at any time. The 9S12 has two external requests, XIRQ and IRQ, that are level zero active. Many of the internal I/O devices can generate interrupt requests based on external events (e.g., key wakeup, input capture, SCI, SPI, etc.). Other than the six non-maskable sources listed above, the remaining interrupt requests will temporarily set the I bit in the CCR during the interrupt program to prevent other interrupts (including itself). On the other hand, the XIRQ request temporarily sets both the I and X bits in the CCR during the interrupt program to postpone all other interrupt sources. The interrupts have a fixed priority, but you can elevate one request to highest priority using the HPRIO, Hardware Priority Interrupt Register ($001F). The relative priorities of the other interrupt sources remain the same.

We typically use XIRQ to interface a single highest priority device. XIRQ has a separate interrupt vector ($FFF4) and a separate enable bit (X). Once the X bit is cleared (enabled), the software cannot disable it. An XIRQ interrupt is requested when the external XIRQ pin is low and the X bit in the CCR is 0. XIRQ processing will automatically set X = I = 1 (an IRQ cannot interrupt an XIRQ service) at the start of the XIRQ handler. Just like regular interrupts, the X and I bits will be restored to their original values by the rti instruction.

The priority is fixed in the order shown in Table 9.1, with Key Wakeup P having the lowest priority and Reset having the highest. Not all interrupt sources are available on every 9S12, but this list defines some of the interrupt sources. Any one particular application usually uses just a few interrupts.
In particular, those devices that need prompt service should be armed to request an interrupt. The software arms (specific for each possible source) and enables (I = 0 globally) interrupts. The external event triggers the interrupt by setting the trigger flag. The interrupt service routine (ISR) is executed in response to the trigger. The ISR acknowledges the interrupt by clearing the trigger flag. For some interrupt sources, such as the SCI interrupts, flags are automatically cleared during the response to the interrupt requests. For example, the RDRF flag in the SCI system is cleared by the automatic clearing mechanism, consisting of a read of the SCI status register while RDRF is set, followed by a read of the SCI data register. The normal response to an RDRF interrupt request is to read the SCI status register to check for receive errors, then to read the received data from the SCI data register. These two steps satisfy the automatic clearing mechanism without requiring any special instructions. On the other hand, many trigger flags employ a confusing, but effective, way for the software to acknowledge them. Flags such as RTIF, CnF, TOF, PIFJn, PIFHn, and PIFPn are cleared when the software writes a 1 into the bit position of that flag. Writing a zero to the flag register has no effect, and writing a $FF clears all the flag bits in the register. Many of the potential interrupt requests share the same interrupt vector. E.g., there are 8 possible key wakeup interrupt sources (PH7 to PH0) that all use the vector at $FFCC. Therefore, when this request is processed the ISR software must determine which of the 8 possible signals caused the interrupt.

Vector Address | CW Number | Interrupt Source or Trigger Flag | Enable | Local Arm | HPRIO to Elevate
$FFFE | 0 | Reset | none | none | –
$FFFC | 1 | COP Clock Monitor Fail Reset | none | COPCTL.CME, COPCTL.FCME | –
$FFFA | 2 | COP Failure Reset | none | COP rate selected | –
$FFF8 | 3 | Unimplemented Instruction Trap | none | none | –
$FFF6 | 4 | SWI | none | none | –
$FFF4 | 5 | XIRQ | X bit | none | –
$FFF2 | 6 | IRQ | I bit | INTCR.IRQEN | $F2
$FFF0 | 7 | Real Time Interrupt, RTIF | I bit | CRGINT.RTIE | $F0
$FFEE | 8 | Timer Channel 0, C0F | I bit | TIE.C0I | $EE
$FFEC | 9 | Timer Channel 1, C1F | I bit | TIE.C1I | $EC
$FFEA | 10 | Timer Channel 2, C2F | I bit | TIE.C2I | $EA
$FFE8 | 11 | Timer Channel 3, C3F | I bit | TIE.C3I | $E8
$FFE6 | 12 | Timer Channel 4, C4F | I bit | TIE.C4I | $E6
$FFE4 | 13 | Timer Channel 5, C5F | I bit | TIE.C5I | $E4
$FFE2 | 14 | Timer Channel 6, C6F | I bit | TIE.C6I | $E2
$FFE0 | 15 | Timer Channel 7, C7F | I bit | TIE.C7I | $E0
$FFDE | 16 | Timer Overflow, TOF | I bit | TIE.TOI | $DE
$FFDC | 17 | Pulse Acc. Overflow, PAOVF | I bit | PACTL.PAOVI | $DC
$FFDA | 18 | Pulse Acc. Input Edge, PAIF | I bit | PACTL.PAI | $DA
$FFD8 | 19 | SPI0 Transfer Complete, SPIF; SPI0 Transmit Empty, SPTEF | I bit | SPI0CR1.SPIE; SPI0CR1.SPTIE | $D8
$FFD6 | 20 | SCI0 Transmit Buff Empty, TDRE; SCI0 Transmit Complete, TC; SCI0 Receiver Buffer Full, RDRF; SCI0 Receiver Idle, IDLE | I bit | SCI0CR2.TIE; SCI0CR2.TCIE; SCI0CR2.RIE; SCI0CR2.ILIE | $D6
$FFD4 | 21 | SCI1 Transmit Buff Empty, TDRE; SCI1 Transmit Complete, TC; SCI1 Receiver Buffer Full, RDRF; SCI1 Receiver Idle, IDLE | I bit | SCI1CR2.TIE; SCI1CR2.TCIE; SCI1CR2.RIE; SCI1CR2.ILIE | $D4
$FFD2 | 22 | ATD0 Sequence Complete, ASCIF | I bit | ATD0CTL2.ASCIE | $D2
$FFD0 | 23 | ATD1 Sequence Complete, ASCIF | I bit | ATD1CTL2.ASCIE | $D0
$FFCE | 24 | Key Wakeup J, PIFJ.[7:6],[1:0] | I bit | PIEJ.[7:6],[1:0] | $CE
$FFCC | 25 | Key Wakeup H, PIFH.[7:0] | I bit | PIEH.[7:0] | $CC
$FFC8 | 27 | Pulse Acc. Overflow, PBOVF | I bit | PBCTL.PBOVI | $C8
$FFC0 | 31 | I2C | I bit | IBCR.IBIE | $C0
$FFBE | 32 | SPI1 Transfer Complete, SPIF; SPI1 Transmit Empty, SPTEF | I bit | SPI1CR1.SPIE; SPI1CR1.SPTIE | $BE
$FFBC | 33 | SPI2 Transfer Complete, SPIF; SPI2 Transmit Empty, SPTEF | I bit | SPI2CR1.SPIE; SPI2CR1.SPTIE | $BC
$FFB6 | 36 | CAN wakeup | I bit | CANRIER.WUPIE | $B6
$FFB4 | 37 | CAN errors | I bit | CANRIER.CSCIE; CANRIER.OVRIE | $B4
$FFB2 | 38 | CAN receive | I bit | CANRIER.RXFIE | $B2
$FFB0 | 39 | CAN transmit | I bit | CANTIER.TXEIE[2:0] | $B0
$FF8E | 56 | Key Wakeup P, PIFP.[7:0] | I bit | PIEP.[7:0] | $8E

Table 9.1 Some of the interrupt vectors for the 9S12. CW stands for CodeWarrior.
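The write-1-to-clear acknowledge and the polling of a shared vector can be sketched together. The code below is a host-side simulation under stated assumptions: the KeyWakeupH structure, pifh_write, and keywakeup_h_isr are hypothetical names, and pifh_write models the rule that writing a 1 clears a flag bit while writing a 0 has no effect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the Port H key wakeup flag and arm registers. */
typedef struct {
    uint8_t  pifh;       /* trigger flags, one per pin */
    uint8_t  pieh;       /* arm bits, one per pin */
    uint32_t count[8];   /* number of serviced requests per pin */
} KeyWakeupH;

/* Simulated write to the flag register: each 1 bit written clears
   that flag; 0 bits leave the corresponding flags unchanged. */
void pifh_write(KeyWakeupH *k, uint8_t value) {
    k->pifh &= (uint8_t)~value;
}

/* All eight sources share one vector, so the ISR polls each flag.
   It services and acknowledges only the armed flags that are set. */
void keywakeup_h_isr(KeyWakeupH *k) {
    uint8_t pending = (uint8_t)(k->pifh & k->pieh);
    unsigned bit;
    for (bit = 0; bit < 8; bit++) {
        uint8_t mask = (uint8_t)(1u << bit);
        if (pending & mask) {
            pifh_write(k, mask);   /* acknowledge just this one flag */
            k->count[bit]++;       /* stand-in for the real service */
        }
    }
}
```

Note that the acknowledge writes only the mask of the flag being serviced. With write-1-to-clear semantics, a read-modify-write such as writing (flags | mask) back to the register would write a 1 to every flag that is currently set and so acknowledge requests that have not been serviced.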

9.2.5 Polled Versus Vectored Interrupts

As we defined earlier, when more than one source of interrupt exists the computer must have a reliable method to determine which interrupt request has been made. There are two common approaches, and the Freescale microcomputers apply a combination of both methods. The first approach is called vectored interrupts. With a vectored interrupt system each potential interrupt source has a unique interrupt vector address. You simply place the correct handler address in each vector, and the hardware automatically calls the correct software when an interrupt is requested, see Table 9.1. The second approach is called polled interrupts. SCI, SPI, and key wakeup must be polled. With a polled interrupt system multiple interrupt sources share the same interrupt vector address (e.g., both RDRF and TDRE share the same vector). Once the interrupt has occurred, the ISR software must poll the potential devices to determine which device needs service. The 9S12 systems have a separate acknowledgment, so that if both interrupts are pending, acknowledging one will not satisfy the other, so the second device will request a second interrupt and get serviced. Common Error: If two interrupts were requested, it would be a mistake to service just one and acknowledge them both. Observation: External events are often asynchronous to program execution, so careful thought is required to consider the effect if an external interrupt request were to come in between each pair of instructions. Observation: The computer automatically sets the I bit during processing, so that an interrupt handler will not interrupt itself.

9.2.6 Pseudo-Interrupt Vectors

Table 9.2 Some pseudo-interrupt vectors for the 9S12.

Some development boards do not allow you to erase and reprogram the interrupt vectors from $FF80 to $FFFF. In these development systems, the ROM at $FF80 to $FFFF has the interrupt vectors pointing to memory locations that can be set. The locations to which the real vectors point are called pseudo-interrupt vectors. Typically, the pseudo-interrupt vectors are defined in the same order as the real vectors. Pseudo vectors for three debuggers are shown in Table 9.2. In the old 6811 development boards each pseudo vector was in RAM and required 3 bytes. Three bytes were required to place a jmp instruction to your ISR. During a 6811 initialization, the program places jmp instructions into the pseudo vectors. In contrast, most 9S12 debuggers only require 2 bytes for each pseudo vector. The MON12 debugger on 9S12 boards from Axiom (http://www.axman.com) and the D-Bug12 debugger from Technological Arts implement 16-bit pseudo vectors in RAM. During initialization at run-time, your program

Real MON12 Vector Pseudo Vector

D-Bug12 Pseudo Vector

Serial Monitor Interrupt Source or Pseudo Vector Trigger Flag

$FFFE $FFFC $FFFA $FFF8 $FFF6 $FFF4 $FFF2 $FFF0 $FFEE ...

none none none $3E78 $3E76 $3E74 $3E72 $3E70 $3E6E ...

none $F7FC $F7FA $F7F8 $F7F6 $F7F4 $F7F2 $F7F0 $F7EE ...

none $0FFC $0FFA $0FF8 $0FF6 $0FF4 $0FF2 $0FF0 $0FEE ...

Reset COP Clock Monitor Fail Reset COP Failure Reset Unimplemented Instruction Trap SWI XIRQ IRQ Real Time Interrupt, RTIF Timer Channel 0, C0F ...


must place pointers to your ISRs into the pseudo vectors. Every time an interrupt occurs, the Axiom MON12 debugger requires 21 extra bus cycles to implement the indirect jump to your ISR. The Freescale Serial Monitor used by Metrowerks CodeWarrior and TExaS also employs pseudo vectors. The difference is that the Serial Monitor pseudo vectors are in EEPROM. Your software does not have to perform any run-time initialization of the pseudo vector. Rather, the Serial Monitor will automatically translate a “Program ROM” command from $FF80-$FFFF down to $F780-$F7FF. For example, this code is the proper way to set the TC0 interrupt vector in a system without pseudo vectors, see Table 9.1.

            org  $FFEE
            fdb  TC0han

However, when your software is loaded into EEPROM, this vector transparently and automatically ends up being programmed at $F7EE. Every time an interrupt occurs, the Serial Monitor requires 19 extra bus cycles to implement the pseudo vector. The actual Serial Monitor code for an interrupt is

uvector08:  bsr  ISRHandler   ;TC0 interrupt starts executing here
            ...
ISRHandler: pulx              ;pull bsr return address off stack
            ldy  -$0636,X     ;get value of pseudo vector
            cpy  #$FFFF       ;is it programmed?
            beq  BadVector
            jmp  ,Y           ;jump to your ISR

SCI interrupts with the serial monitor include an overhead longer than 19 cycles, because the SCI interrupts are used by the debugger itself to perform its actions. In particular, after a SCI interrupt, the debugger will check the LOAD/RUN switch to see if the debugger or user program should process the interrupt.
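The Serial Monitor translation from real vectors to pseudo vectors, and the check for an unprogrammed pseudo vector, can be modeled in C. This is only an illustration of the address arithmetic described above ($FF80-$FFFF maps down to $F780-$F7FF, and $FFFF means unprogrammed); the function names are hypothetical and this is not the monitor's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_BASE   0xFF80u   /* first real vector address */
#define PSEUDO_OFFSET 0x0800u   /* $FF80-$FFFF maps to $F780-$F7FF */
#define UNPROGRAMMED  0xFFFFu   /* erased value of a pseudo vector */

/* Address of the pseudo vector that corresponds to a real vector. */
uint16_t pseudo_vector_address(uint16_t real_vector) {
    assert(real_vector >= VECTOR_BASE);
    return (uint16_t)(real_vector - PSEUDO_OFFSET);
}

/* Decide where to jump, given the 16-bit contents of a pseudo vector.
   Returns 1 and stores the ISR address if the vector is programmed,
   or returns 0 (the BadVector case) if it still holds $FFFF. */
int pseudo_vector_target(uint16_t contents, uint16_t *isr_address) {
    assert(isr_address != 0);
    if (contents == UNPROGRAMMED) {
        return 0;
    }
    *isr_address = contents;
    return 1;
}
```

For the TC0 example, the real vector at $FFEE corresponds to the pseudo vector at $F7EE, which is where the fdb value ends up when the software is loaded through the Serial Monitor.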

9.3 Key Wakeup Interrupts

The basic idea of key wakeup is to connect an input to the 9S12 and configure the interface so an interrupt is requested on either the rising or falling edge of the input. Using key wakeup allows software to respond quickly to changes in the external world. The 9S12C32 has ten possible key wakeup interrupt sources, which are available on Ports J and P. The 9S12DP512 has twenty key wakeup interrupt sources, which are available on Ports H, J, and P. See Table 9.3. Any or all of these pins can be configured as a key wakeup interrupt. Each of the wakeup lines has a separate I/O pin (PTH, PTJ, PTP), a direction register bit (DDRH, DDRJ, DDRP), a trigger flag bit (PIFH, PIFJ, PIFP), an arm bit (PIEH, PIEJ, PIEP), and a polarity bit (PPSH, PPSJ, PPSP).

First we identify external digital signals containing strategic edges (rising or falling). In particular, strategic means we wish to execute software whenever one of these edges occurs. We connect these digital signals to individual key wakeup pins. To use key wakeup, we must make these lines inputs, and configure the strategic edge to be active. Key wakeup interrupts can be configured to be active on either the rising or falling edge. If the corresponding bit in the PPSH/PPSJ/PPSP register is 0, then a falling edge will set the trigger flag. Conversely, if the bit in the PPSH/PPSJ/PPSP register is 1, then a rising edge will set the trigger flag. A key wakeup interrupt will be generated if the trigger flag bit is set, the arm bit is set, and the interrupts are enabled (I = 0).
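The configuration steps for one key wakeup pin can be sketched as a small simulation. The WakeupPort structure and the functions below are hypothetical; the fields play the roles of the direction, polarity, arm, and flag registers of one port, and wakeup_edge stands in for the hardware edge detector. On real hardware the registers would be accessed directly and the stale flag would be cleared by writing a 1 to it.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EDGE_FALLING = 0, EDGE_RISING = 1 } Edge;

/* Hypothetical model of one key wakeup port. */
typedef struct {
    uint8_t ddr;   /* direction: 0 = input, 1 = output */
    uint8_t pps;   /* polarity: 0 = falling edge, 1 = rising edge */
    uint8_t pie;   /* arm bits */
    uint8_t pif;   /* trigger flags */
} WakeupPort;

/* Configure the pins in mask as inputs, select the strategic edge,
   clear any stale flags, and arm the pins. */
void wakeup_arm(WakeupPort *p, uint8_t mask, Edge edge) {
    assert(mask != 0);
    p->ddr &= (uint8_t)~mask;
    if (edge == EDGE_RISING) {
        p->pps |= mask;
    } else {
        p->pps &= (uint8_t)~mask;
    }
    p->pif &= (uint8_t)~mask;   /* models the acknowledge of old flags */
    p->pie |= mask;
}

/* Simulated hardware: an edge on the pins in mask sets the trigger
   flag of every pin whose polarity bit selects that edge. */
void wakeup_edge(WakeupPort *p, uint8_t mask, Edge edge) {
    uint8_t selected = (edge == EDGE_RISING) ? p->pps : (uint8_t)~p->pps;
    p->pif |= (uint8_t)(mask & selected);
}

/* Interrupt requested when a flag is set, that pin is armed, and I = 0. */
int wakeup_requested(const WakeupPort *p, uint8_t i_bit) {
    return ((p->pif & p->pie) != 0) && (i_bit == 0);
}
```

With bit 0 armed for the falling edge, a rising edge on that pin leaves the flag clear, while a falling edge sets the flag and, with I = 0, results in an interrupt request.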


Address  Bit 7   6       5       4       3       2       1       Bit 0   Name
$0260    PH7     PH6     PH5     PH4     PH3     PH2     PH1     PH0     PTH
$0261    PH7     PH6     PH5     PH4     PH3     PH2     PH1     PH0     PTIH
$0262    DDRH7   DDRH6   DDRH5   DDRH4   DDRH3   DDRH2   DDRH1   DDRH0   DDRH
$0263    RDRH7   RDRH6   RDRH5   RDRH4   RDRH3   RDRH2   RDRH1   RDRH0   RDRH
$0264    PERH7   PERH6   PERH5   PERH4   PERH3   PERH2   PERH1   PERH0   PERH
$0265    PPSH7   PPSH6   PPSH5   PPSH4   PPSH3   PPSH2   PPSH1   PPSH0   PPSH
$0266    PIEH7   PIEH6   PIEH5   PIEH4   PIEH3   PIEH2   PIEH1   PIEH0   PIEH
$0267    PIFH7   PIFH6   PIFH5   PIFH4   PIFH3   PIFH2   PIFH1   PIFH0   PIFH
$0268    PJ7     PJ6     –       –       –       –       PJ1     PJ0     PTJ
$0269    PJ7     PJ6     –       –       –       –       PJ1     PJ0     PTIJ
$026A    DDRJ7   DDRJ6   –       –       –       –       DDRJ1   DDRJ0   DDRJ
$026B    RDRJ7   RDRJ6   –       –       –       –       RDRJ1   RDRJ0   RDRJ
$026C    PERJ7   PERJ6   –       –       –       –       PERJ1   PERJ0   PERJ
$026D    PPSJ7   PPSJ6   –       –       –       –       PPSJ1   PPSJ0   PPSJ
$026E    PIEJ7   PIEJ6   –       –       –       –       PIEJ1   PIEJ0   PIEJ
$026F    PIFJ7   PIFJ6   –       –       –       –       PIFJ1   PIFJ0   PIFJ
$0258    PP7     PP6     PP5     PP4     PP3     PP2     PP1     PP0     PTP
$0259    PP7     PP6     PP5     PP4     PP3     PP2     PP1     PP0     PTIP
$025A    DDRP7   DDRP6   DDRP5   DDRP4   DDRP3   DDRP2   DDRP1   DDRP0   DDRP
$025B    RDRP7   RDRP6   RDRP5   RDRP4   RDRP3   RDRP2   RDRP1   RDRP0   RDRP
$025C    PERP7   PERP6   PERP5   PERP4   PERP3   PERP2   PERP1   PERP0   PERP
$025D    PPSP7   PPSP6   PPSP5   PPSP4   PPSP3   PPSP2   PPSP1   PPSP0   PPSP
$025E    PIEP7   PIEP6   PIEP5   PIEP4   PIEP3   PIEP2   PIEP1   PIEP0   PIEP
$025F    PIFP7   PIFP6   PIFP5   PIFP4   PIFP3   PIFP2   PIFP1   PIFP0   PIFP

Table 9.3 9S12 key wakeup ports (all twenty pins are available on the 9S12DP512, while just the ten shaded pins are available on the 9S12C32).

Another convenience of Ports H, J, and P is the available pull-up or pull-down resistors, as shown in Table 9.4. Each of the pins of Ports H, J, and P can be configured separately.

DDRH/DDRJ/DDRP  PPSH/PPSJ/PPSP  PERH/PERJ/PERP  Port Mode
1               –               –               Regular output
0               0               0               Regular input, falling edge
0               1               0               Regular input, rising edge
0               0               1               Input with passive pull-up, falling edge
0               1               1               Input with passive pull-down, rising edge

Table 9.4 Pull up/down modes of Ports H, J and P.
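As a quick way to check a configuration against Table 9.4, the decoding can be written as a host-side helper. This is a sketch only; the `PortMode` enumeration and `port_mode` function are hypothetical names introduced here, and the arguments are ordinary byte values rather than the hardware registers.

```c
#include <stdint.h>

/* The five modes listed in Table 9.4 (names are illustrative). */
typedef enum {
    MODE_OUTPUT,            /* regular output */
    MODE_INPUT_FALLING,     /* regular input, falling edge */
    MODE_INPUT_RISING,      /* regular input, rising edge */
    MODE_PULLUP_FALLING,    /* input with passive pull-up, falling edge */
    MODE_PULLDOWN_RISING    /* input with passive pull-down, rising edge */
} PortMode;

/* Decode the mode of one pin (0 to 7) from copies of DDRx, PPSx and PERx. */
PortMode port_mode(uint8_t ddr, uint8_t pps, uint8_t per, unsigned pin) {
    uint8_t mask = (uint8_t)(1u << (pin & 7u));
    if (ddr & mask) {
        return MODE_OUTPUT;              /* PPS and PER do not matter */
    }
    if (per & mask) {                    /* internal resistor enabled */
        return (pps & mask) ? MODE_PULLDOWN_RISING : MODE_PULLUP_FALLING;
    }
    return (pps & mask) ? MODE_INPUT_RISING : MODE_INPUT_FALLING;
}
```

Note that the polarity bit does double duty: the same PPS bit selects both the active edge and, when the resistor is enabled, whether it pulls up or down.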

A typical application of pull-up is the interface of simple switches. Using pull-up or pull-down mode eliminates the need for an external resistor when interfacing a switch. The PJ6 and PT6 interfaces in Figure 9.6(a) implement negative logic switch inputs, and the PJ7 and PT7 interfaces in Figure 9.6(b) implement positive logic switch inputs. The Port J interfaces employ internal resistors, while the Port T interfaces use external 10 kΩ resistors.


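Figure 9.6 Key wakeup or input capture can generate interrupts on a switch touch. [Circuit diagram: (a) pull-up interface, with switches on PJ6 and PT6, where PT6 has an external 10 kΩ resistor to +5 V; (b) pull-down interface, with switches to +5 V on PJ7 and PT7, where PT7 has an external 10 kΩ resistor.]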

Checkpoint 9.6: What values do you write into DDRJ, PPSJ and PERJ to configure the switch interfaces of PJ6 and PJ7 in Figure 9.6?

Three conditions must be simultaneously true for a key wakeup interrupt to be requested:
• The trigger flag bit is set
• The arm bit is set
• The I bit in the 9S12 CCR is 0

Even though there are twenty key wakeup lines, there are only three interrupt vectors, one each for Port H, Port J, and Port P. So, if two or more wakeup interrupts are used on the same port, it will be necessary to poll. Interrupt polling is the software function that checks which of the potential sources requested the interrupt. A flag bit is cleared by writing a one to it. For example, to clear Port P trigger flag 7 in C we can execute

    PIFP = 0x80;          // clears flag bit 7 of Port P

In assembly, to clear Port P trigger flag 7

    movb #$80,PIFP        ; clears flag bit 7 of Port P

Example 9.1 You are asked to design a measurement system for the robot in Figure 8.17 that counts the number of times the wheel turns. This count will be a measure of the total distance travelled. The desired resolution is 1⁄32 of a turn and the desired range is 0 to 2047 31⁄32 revolutions.

Solution Whenever you measure something, it is important to consider the resolution and range. The basic idea is to use an optical sensor (QRB1134) to visualize the position of the wheel. A black/white striped pattern is attached to the wheel, and an optical reflective sensor is placed near the stripes. The sensor has an infrared LED output and a light-sensitive transistor. The current to the 1.8 V LED is controlled by the R1 resistor. In this circuit, the LED current will be (5 − 1.8 V)/200 Ω, which is 16 mA. The R2 pull-up resistor on the transistor creates an output swing at V1 depending on whether the sensor sees a black stripe or a white stripe. Unfortunately, the signal V1 is not digital. The rail-to-rail op amp, in open loop mode, creates a clean digital signal at V2, which has the same frequency as V1. The negative terminal is set to a voltage approximately in the center of V1, shown as 2 V in Figure 9.7. In general, we should select the threshold at the place in the wave where the slope is maximum. We then interface V2 to a key wakeup pin, and configure the system to trigger a key wakeup interrupt on each rising edge. This solution uses PP5, such that a rising edge triggers an interrupt on Port P key wakeup, see Program 9.1. Because there are 32 stripes on the wheel, there will be 32 interrupts each time the wheel rotates once. A 16-bit counter is used, because we expect no more than 65535 counts. The count is a binary fixed-point number with a resolution of \(2^{-5}\) revolutions. E.g., if the count is 100, this means 100/32 or 3.125 revolutions. We also assume no other key wakeup channels on Port P will be used.
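To make the fixed-point interpretation of the count concrete, the following host-side sketch converts the 16-bit count into whole revolutions plus 32nds, and into thousandths of a revolution for display. The type and function names are hypothetical and are not part of Program 9.1.

```c
#include <stdint.h>

/* Revolutions split into a whole part and a fraction in units of 1/32. */
typedef struct {
    uint16_t whole;   /* 0 to 2047 */
    uint8_t  frac32;  /* 0 to 31, in units of 1/32 revolution */
} Revolutions;

/* The count has 5 fractional bits, so shift and mask separate the parts. */
Revolutions count_to_revolutions(uint16_t count) {
    Revolutions r;
    r.whole  = (uint16_t)(count >> 5);
    r.frac32 = (uint8_t)(count & 0x1Fu);
    return r;
}

/* Same value in thousandths of a revolution, e.g. 100 -> 3125 (3.125 rev). */
uint32_t count_to_millirevs(uint16_t count) {
    return ((uint32_t)count * 1000u) / 32u;
}
```

A count of 100 yields 3 whole revolutions and 4/32, and the largest count, 65535, yields 2047 and 31/32, which matches the specified range.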

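Figure 9.7 An optical sensor is used to detect rotations on a wheel. [Circuit diagram: a QRB1134 reflective sensor with a 200 Ω LED resistor R1 and a 5 kΩ pull-up resistor R2, both to +5 V, produces V1; a TLC2274 op amp powered from +5 V compares V1 on its + input with a 2 V threshold on its − input and produces V2, which is connected to PP5 of the 9S12. Waveform: V1 swings above and below 2 V, and V2 switches between 0 V and 5 V.]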

    ;Rising edge on PP5 causes an interrupt
               org  $0800       ;($3800 if 9S12C32)
    Count      rmb  2           ;0.03125 revolutions
               org  $4000
    Key_Init   movw #0,Count
               bclr DDRP,#$20   ; PP5 is input
               bclr PERP,#$20   ; no pull down on PP5
               bset PPSP,#$20   ; rising edge active
               bset PIEP,#$20   ; arm PP5
               movb #$20,PIFP   ; clear flag
               cli
               rts
    Keyhandler movb #$20,PIFP   ; ack, clear flag
               ldx  Count
               inx
               stx  Count       ; units 1/32 revolution
               rti
               org  $FF8E
               fdb  Keyhandler

    // Rising edge on PP5 causes an interrupt
    unsigned short Count;  // 1/32 revolutions
    void Key_Init(void){
      Count = 0;
      DDRP &= ~0x20;  // PP5 is input
      PERP &= ~0x20;  // no pull down on PP5
      PPSP |= 0x20;   // rising edge active
      PIEP |= 0x20;   // arm PP5
      PIFP = 0x20;    // clear flag
      asm cli         // enable interrupts
    }
    interrupt 56 void Keyhandler(void){
      PIFP = 0x20;    // clear flag
      Count++;        // 1/32 revolution
    }

Program 9.1 Assembly and C implementations of an interrupting key wakeup.

Because of the read-modify-write sequence, the following software clears all the flag bits (hence these are inappropriate ways to clear one flag).

    bset PIFP,#$04

    PIFP |= 0x04;
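The reason the read-modify-write form misbehaves can be seen in a small host-side model of a write-one-to-clear register. This is a sketch under the assumption that the register behaves as the text describes (writing 1 clears a flag, writing 0 has no effect); the `FlagReg` type and the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a write-one-to-clear flag register such as PIFP. */
typedef struct { uint8_t flags; } FlagReg;

/* A read returns the flags that are currently set. */
uint8_t flag_read(const FlagReg *r) { return r->flags; }

/* A write clears every flag whose written bit is 1; 0 bits change nothing. */
void flag_write(FlagReg *r, uint8_t value) { r->flags &= (uint8_t)~value; }

/* Correct acknowledge, equivalent to PIFP = mask; */
void clear_one(FlagReg *r, uint8_t mask) { flag_write(r, mask); }

/* Incorrect acknowledge, equivalent to PIFP |= mask; the value read back
   contains a 1 for every flag that is set, so writing it clears them all. */
void clear_with_or(FlagReg *r, uint8_t mask) {
    flag_write(r, (uint8_t)(flag_read(r) | mask));
}
```

Starting with flags 2 and 5 set (0x24), `clear_one(&r, 0x04)` leaves 0x20, whereas `clear_with_or(&r, 0x04)` leaves 0x00 and the pending event on bit 5 is lost.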

Observation: All 8 key wakeup lines on Port P use the same interrupt vector, but they have separate polarity, arm, pull-up/down, and flag bits.

Checkpoint 9.7: How do you modify Program 9.1 so it counts falling edges?

If a pin is configured as an input, then reads to PTH/PTJ/PTP return the same value as reads to PTIH/PTIJ/PTIP, which will be the digital value at the input. Conversely, if a pin is configured as an output, then reads to PTH/PTJ/PTP return the most recent value written to the output port, while reads to PTIH/PTIJ/PTIP will return the digital value at the output pin. The RDRH/RDRJ/RDRP register determines the drive strength of an output signal. If the bit is 1, then the corresponding output will have 1/3 drive current. This mode is used to reduce supply current to the 9S12.


9.4 Periodic Interrupt Programming

We will continue our interrupt examples with periodic interrupts. Periodic interrupts are both simple to understand and extremely useful for real-time embedded systems. A periodic interrupt is one that is requested on a fixed time basis. Periodic interrupts are required for data acquisition and control systems, because software execution must be performed periodically at accurate time intervals. For a data acquisition system, it is important to establish an accurate sampling rate. The time in between ADC samples must be equal (and known) in order for the digital signal processing to function properly. Similarly for microcomputer-based control systems, it is important to maintain both the timing with the sensors (inputs) and with the actuators (outputs).

One synchronization method that uses periodic interrupts is called “intermittent polling” or “periodic polling”. In regular busy-waiting, the main program polls the I/O devices continuously. With intermittent polling, the I/O devices are polled on a regular basis, established by a periodic interrupt, as shown in the flowchart of Figure 9.8. Assume for a moment that all n devices are simultaneously ready. It is an appropriate design constraint for the time it takes to service all n devices (maximum time to execute the ISR) to be small compared to the interrupt period used for the periodic polling. This constraint will prevent the periodic polling ISR from capturing all the available CPU time. Similarly, the time to execute this ISR will affect the response time of other interrupts in the system. On the other hand, the interrupt frequency used for the periodic polling should be large compared to the bandwidth of the I/O channel, so no data are lost. If no device needs service, then the interrupt simply returns. This method frees the main program from the I/O tasks. The original IBM-PC computer used an 18 Hz periodic interrupt to interface its keyboard.

Figure 9.8 An ISR flowchart that implements periodic polling. [Flowchart: on each periodic interrupt, Device 1, Device 2, ..., Device n are checked in turn; a device that is ready has its input/output data transferred, and a device that is busy is skipped; the ISR then acknowledges the interrupt and executes rti.]

It is appropriate to use periodic polling when the following two conditions apply:
1. The I/O hardware cannot generate interrupts directly.
2. We wish to perform the I/O functions in the background.

Observation: The average response time of an event interfaced with periodic polling is 1/2 the period.

Observation: The worst case response time of an event interfaced with periodic polling is the period.
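The body of a periodic polling ISR can be sketched in portable C as a loop over a table of devices. Everything here is an assumption made for illustration: the `PolledDevice` structure, `periodic_poll`, and the two simulated devices are hypothetical, and on the 9S12 this loop would sit inside an ISR that also acknowledges the periodic interrupt.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per polled device: a status check and a service routine. */
typedef struct {
    bool (*ready)(void);    /* true when the device needs service */
    void (*service)(void);  /* performs the input/output for the device */
} PolledDevice;

/* Poll every device once; service only those that are ready.
   Returns how many devices were serviced on this interrupt. */
size_t periodic_poll(const PolledDevice *devices, size_t n) {
    size_t serviced = 0;
    for (size_t i = 0; i < n; i++) {
        if (devices[i].ready()) {
            devices[i].service();
            serviced++;
        }
    }
    return serviced;   /* 0 means the interrupt simply returns */
}

/* Two simulated devices so the sketch can run on a desktop machine. */
bool dev1_flag = true;
bool dev2_flag = false;
int  dev1_count = 0;
int  dev2_count = 0;
bool dev1_ready(void)   { return dev1_flag; }
void dev1_service(void) { dev1_flag = false; dev1_count++; }
bool dev2_ready(void)   { return dev2_flag; }
void dev2_service(void) { dev2_flag = false; dev2_count++; }
```

The worst-case execution time of `periodic_poll` occurs when every device is ready at once, which is the case the design constraint in the text asks us to keep small compared to the interrupt period.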

There are three mechanisms on the 9S12 that generate periodic interrupts: real-time interrupt (RTI), timer overflow (TOF) and output compare (OC).

9.5 Real-Time Interrupt (RTI)

First, the real-time interrupt (RTI) mechanism can generate interrupts at a fixed rate. Seven bits (RTR6-0) in the RTICTL register specify the interrupt rate. The 7-bit value is composed of two parts:

Let RTR6, RTR5, RTR4 be n, which is a 3-bit number ranging from 0 to 7
Let RTR3, RTR2, RTR1, RTR0 be m, which is a 4-bit number ranging from 0 to 15

Table 9.5 shows the 9S12 registers used in RTI interrupts. The entries shown in bold will be used in this section.

Table 9.5 9S12 registers used to configure real time interrupts.

Address  Bit 7  6      5      4       3      2      1      Bit 0  Name
$0037    RTIF   PORF   0      LOCKIF  LOCK   TRACK  SCMIF  SCM    CRGFLG
$0038    RTIE   0      0      LOCKIE  0      0      SCMIE  0      CRGINT
$003B    0      RTR6   RTR5   RTR4    RTR3   RTR2   RTR1   RTR0   RTICTL

If n is zero, then the RTI system is off. A 9S12C32 with an 8 MHz crystal will have an OSCCLK frequency of 8 MHz and a default E clock frequency of 4 MHz. A 9S12DP512 with a 16 MHz crystal will have an OSCCLK frequency of 16 MHz and a default E clock frequency of 8 MHz. Let \(f_{crystal}\) be the crystal frequency; then the RTI interrupt frequency and period can be calculated using

\[ \text{RTI interrupt frequency} = f_{crystal} \cdot 2^{-n} / (m+1) / 512 \]

\[ \text{RTI interrupt period} = 512 \cdot (m+1) \cdot 2^{n} / f_{crystal} \]

Observation: The phase-lock-loop (PLL) on the 9S12 will not affect the RTI rates.
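As a check on the period formula, the calculation can be carried out on a host computer. The function below is a sketch with a hypothetical name; it takes a copy of the RTICTL value and the crystal frequency, and returns the period in nanoseconds (0 when the RTI system is off).

```c
#include <stdint.h>

/* RTI period in ns for a given RTICTL value and crystal frequency in Hz.
   Returns 0 when n (bits 6-4) is zero, meaning the RTI system is off. */
uint64_t rti_period_ns(uint8_t rtictl, uint32_t fcrystal_hz) {
    uint32_t n = (rtictl >> 4) & 0x07u;   /* RTR6-RTR4 */
    uint32_t m = rtictl & 0x0Fu;          /* RTR3-RTR0 */
    if (n == 0 || fcrystal_hz == 0) {
        return 0;
    }
    /* period = 512 * (m+1) * 2^n crystal cycles */
    uint64_t crystal_cycles = 512ull * (m + 1u) * (1ull << n);
    return crystal_cycles * 1000000000ull / fcrystal_hz;
}
```

For example, `rti_period_ns(0x77, 16000000)` and `rti_period_ns(0x73, 8000000)` both evaluate to 32768000 ns, that is, 32.768 ms.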

The interrupt rate is determined by the crystal clock and the RTICTL value. Table 9.6 shows the available interrupt periods, assuming an 8 MHz crystal. Table 9.7 shows the available interrupt periods, assuming a 16 MHz crystal. Basically, the RTIF trigger flag is set periodically. If armed (RTIE = 1), this trigger flag will request an interrupt. To clear the RTIF flag (acknowledge the interrupt), the software writes a one to it.


Table 9.6 9S12 real-time interrupt period in ms, assuming an 8 MHz crystal.

                            n [6:4] of the RTICTL
m [3:0]  000   001     010     011     100      101      110      111
0000     off   0.128   0.256   0.512   1.024    2.048    4.096    8.192
0001     off   0.256   0.512   1.024   2.048    4.096    8.192    16.384
0010     off   0.384   0.768   1.536   3.072    6.144    12.288   24.576
0011     off   0.512   1.024   2.048   4.096    8.192    16.384   32.768
0100     off   0.640   1.280   2.560   5.120    10.240   20.480   40.960
0101     off   0.768   1.536   3.072   6.144    12.288   24.576   49.152
0110     off   0.896   1.792   3.584   7.168    14.336   28.672   57.344
0111     off   1.024   2.048   4.096   8.192    16.384   32.768   65.536
1000     off   1.152   2.304   4.608   9.216    18.432   36.864   73.728
1001     off   1.280   2.560   5.120   10.240   20.480   40.960   81.920
1010     off   1.408   2.816   5.632   11.264   22.528   45.056   90.112
1011     off   1.536   3.072   6.144   12.288   24.576   49.152   98.304
1100     off   1.664   3.328   6.656   13.312   26.624   53.248   106.496
1101     off   1.792   3.584   7.168   14.336   28.672   57.344   114.688
1110     off   1.920   3.840   7.680   15.360   30.720   61.440   122.880
1111     off   2.048   4.096   8.192   16.384   32.768   65.536   131.072

Table 9.7 9S12 real-time interrupt period in ms, assuming a 16 MHz crystal.

                            n [6:4] of the RTICTL
m [3:0]  000   001     010     011     100      101      110      111
0000     off   0.064   0.128   0.256   0.512    1.024    2.048    4.096
0001     off   0.128   0.256   0.512   1.024    2.048    4.096    8.192
0010     off   0.192   0.384   0.768   1.536    3.072    6.144    12.288
0011     off   0.256   0.512   1.024   2.048    4.096    8.192    16.384
0100     off   0.320   0.640   1.280   2.560    5.120    10.240   20.480
0101     off   0.384   0.768   1.536   3.072    6.144    12.288   24.576
0110     off   0.448   0.896   1.792   3.584    7.168    14.336   28.672
0111     off   0.512   1.024   2.048   4.096    8.192    16.384   32.768
1000     off   0.576   1.152   2.304   4.608    9.216    18.432   36.864
1001     off   0.640   1.280   2.560   5.120    10.240   20.480   40.960
1010     off   0.704   1.408   2.816   5.632    11.264   22.528   45.056
1011     off   0.768   1.536   3.072   6.144    12.288   24.576   49.152
1100     off   0.832   1.664   3.328   6.656    13.312   26.624   53.248
1101     off   0.896   1.792   3.584   7.168    14.336   28.672   57.344
1110     off   0.960   1.920   3.840   7.680    15.360   30.720   61.440
1111     off   1.024   2.048   4.096   8.192    16.384   32.768   65.536

Example 9.2 Write software that increments a global variable every 32.768 ms.

Solution The solution will use a periodic RTI interrupt that occurs every 32.768 ms. RTI is simple, and accurate if the desired interrupt period matches one of the possibilities shown in Table 9.6 or 9.7. The main program executes RTI_Init to initialize the RTI interrupts, as shown in Program 9.2. The RTI rate is determined by the crystal frequency and the RTICTL register. Bit 7 of the CRGINT register is set to arm the RTI system. The RTI_Init routine initializes the global variable and enables interrupts (cli). The ISR will acknowledge the interrupt and increment a global variable, Time. The ISR makes the trigger flag zero by writing a one to it.

    ; 9S12C32 4 MHz, 9S12DP512 8 MHz
             org  $0800        ;($3800 if C32)
    Time     rmb  2
             org  $4000
    RTI_Init sei               ;make atomic
             movb #$77,RTICTL  ;($73 if C32)
             movb #$80,CRGINT  ;arm RTI
             movw #0,Time
             cli               ;enable IRQ
             rts
    ; interrupts every 32.768ms
    RTIHan   movb #$80,CRGFLG  ;ack
             ldd  Time
             addd #1
             std  Time
             rti
             org  $FFF0
             fdb  RTIHan       ;vector

    // 9S12C32 4 MHz, 9S12DP512 8 MHz
    unsigned short Time;
    void RTI_Init(void){
      asm sei          // Make atomic
      RTICTL = 0x77;   // (0x73 if C32)
      CRGINT = 0x80;   // Arm
      Time = 0;        // Initialize
      asm cli
    }
    // interrupts every 32.768ms
    void interrupt 7 RTIHan(void){
      CRGFLG = 0x80;   // Acknowledge
      Time++;
    }

Program 9.2 Implementation of a periodic interrupt using the real time clock feature.

Checkpoint 9.8: How would you modify Program 9.2 to count every 10.24 ms?

9.6 Timer Overflow, Output Compare, and Input Capture

9.6.1 Timer Features and Timer Overflow

Table 9.8 shows the 9S12 registers used in timer overflow, input capture and output compare. The entries shown in bold will be used in this section. The timer overflow interrupt feature can also be used to generate interrupts at a fixed rate, as listed in Table 9.9. The 16-bit TCNT register is incremented at a fixed rate. The TOF trigger flag is set when the counter overflows and wraps back around (automatically) to zero. If armed, the TOF trigger flag will generate an interrupt. Three bits (PR2, PR1, and PR0) in the TSCR2 register determine the rate at which the counter will increment, and hence determine the TOF interrupt rate. To clear the TOF flag (acknowledge the interrupt), the software writes a one to it. To create a TOF periodic interrupt, we enable the timer (TEN = 1), arm the timer overflow (TOI), and set the rate (PR2-0). Let n be the 3-bit number (0 to 7) formed from the least significant three bits of TSCR2. Let \(f_E\) be the frequency of the E clock (adjusted by the PLL). The TOF interrupt rate is

\[ \text{TOF interrupt frequency} = f_E / 2^{\,n+16} \]

\[ \text{TOF interrupt period} = 2^{\,n+16} / f_E \]
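The TCNT and TOF periods in Table 9.9 follow directly from the prescale bits, and a short host-side sketch makes the relationship explicit. The function names are hypothetical; results are truncated to whole nanoseconds or microseconds, so the 24 MHz entries would come out rounded down, and the E clock frequency is assumed to be nonzero.

```c
#include <stdint.h>

/* Period of one TCNT count in ns: 2^n / fE, with n = PR2:PR0 (0 to 7). */
uint64_t tcnt_period_ns(uint32_t fe_hz, unsigned n) {
    return (1000000000ull << (n & 7u)) / fe_hz;
}

/* Period between timer overflows in us: 2^(n+16) / fE. */
uint64_t tof_period_us(uint32_t fe_hz, unsigned n) {
    return (1000000ull << ((n & 7u) + 16u)) / fe_hz;
}
```

With an 8 MHz E clock and n = 2, `tof_period_us` gives 32768 us; with a 4 MHz E clock the same period requires n = 1.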


Address  msb                                            lsb  Name
$0044    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TCNT
$0050    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC0
$0052    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC1
$0054    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC2
$0056    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC3
$0058    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC4
$005A    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC5
$005C    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC6
$005E    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0     TC7

Address  Bit 7   6       5       4       3       2       1       Bit 0   Name
$0240    PT7     PT6     PT5     PT4     PT3     PT2     PT1     PT0     PTT
$0242    DDRT7   DDRT6   DDRT5   DDRT4   DDRT3   DDRT2   DDRT1   DDRT0   DDRT
$0046    TEN     TSWAI   TSBCK   TFFCA   0       0       0       0       TSCR1
$004D    TOI     0       0       0       TCRE    PR2     PR1     PR0     TSCR2
$0040    IOS7    IOS6    IOS5    IOS4    IOS3    IOS2    IOS1    IOS0    TIOS
$004C    C7I     C6I     C5I     C4I     C3I     C2I     C1I     C0I     TIE
$004E    C7F     C6F     C5F     C4F     C3F     C2F     C1F     C0F     TFLG1
$004F    TOF     0       0       0       0       0       0       0       TFLG2
$0048    OM7     OL7     OM6     OL6     OM5     OL5     OM4     OL4     TCTL1
$0049    OM3     OL3     OM2     OL2     OM1     OL1     OM0     OL0     TCTL2
$004A    EDG7B   EDG7A   EDG6B   EDG6A   EDG5B   EDG5A   EDG4B   EDG4A   TCTL3
$004B    EDG3B   EDG3A   EDG2B   EDG2A   EDG1B   EDG1A   EDG0B   EDG0A   TCTL4

Table 9.8 9S12 registers used for timer overflow, input capture, and output compare.

                          E = 4 MHz                   E = 8 MHz                   E = 24 MHz
PR2 PR1 PR0  Divide by    TCNT period  TOF period     TCNT period  TOF period     TCNT period  TOF period
0   0   0    1            250 ns       16.384 ms      125 ns       8.192 ms       42 ns        2.73067 ms
0   0   1    2            500 ns       32.768 ms      250 ns       16.384 ms      83 ns        5.46133 ms
0   1   0    4            1 µs         65.536 ms      500 ns       32.768 ms      167 ns       10.9227 ms
0   1   1    8            2 µs         131.072 ms     1 µs         65.536 ms      333 ns       21.8453 ms
1   0   0    16           4 µs         262.144 ms     2 µs         131.072 ms     667 ns       43.6907 ms
1   0   1    32           8 µs         524.288 ms     4 µs         262.144 ms     1.33 µs      87.3813 ms
1   1   0    64           16 µs        1048.576 ms    8 µs         524.288 ms     2.67 µs      174.763 ms
1   1   1    128          32 µs        2097.152 ms    16 µs        1048.576 ms    5.33 µs      349.525 ms

Table 9.9 Timer overflow periods for various E clock frequencies.

Example 9.3 Write software that increments a global variable every 32.768 ms.

Solution The solution will use a periodic timer overflow interrupt that occurs every 32.768 ms. The main program executes TOF_Init to initialize the periodic interrupts, as shown in Program 9.3. The interrupt rate is determined by the crystal frequency, the PLL and the TSCR2 register. With an E clock of 8 MHz, 32.768 ms/125 ns is \(2^{18}\), so the bottom three bits of TSCR2 should be 2. Bit 7 of the TSCR2 register is set to arm the TOF system. The TOF_Init routine initializes the global variable and enables interrupts (cli). The ISR will acknowledge the interrupt and increment a global variable, Time. The ISR makes the trigger flag zero by writing a one to it.

    ; 9S12C32 4MHz, 9S12DP512 8 MHz
             org  $0800       ;($3800 if C32)
    Time     rmb  2
             org  $4000
    TOF_Init sei              ;make atomic
             movb #$80,TSCR1  ;enable TCNT
             movb #$82,TSCR2  ;($81 if C32)
             movw #0,Time
             cli              ;enable IRQ
             rts
    TOFHan   movb #$80,TFLG2  ;acknowledge
             ldd  Time
             addd #1
             std  Time
             rti
             org  $FFDE
             fdb  TOFHan      ;vector

    // 9S12C32 4MHz, (9S12DP512 8 MHz)
    unsigned short Time;
    void TOF_Init(void){
      asm sei         // Make atomic
      TSCR1 = 0x80;   // enable counter
      TSCR2 = 0x82;   // (0x81 if C32)
      Time = 0;       // Initialize
      asm cli         // enable interrupts
    }
    interrupt 16 void TOFHan(void){
      TFLG2 = 0x80;   // Acknowledge
      Time++;
    }

Program 9.3 Implementation of a periodic interrupt using timer overflow.

Checkpoint 9.9: How would you modify Program 9.3 to count approximately every 1 second?

9.6.2 Output Compare Interrupts

The third mechanism to generate periodic interrupts is output compare. There are 8 independent output compare channels, numbered 0 to 7. Let i be the channel number, 0 ≤ i ≤ 7. To enable output compare, the corresponding bit in the TIOS register must be set. When the TCNT register matches TCi, the output compare flag, CiF, is set. If armed (CiI = 1), then it will request an interrupt. To clear the CiF flag (acknowledge the interrupt), the software writes a one to it. The ISR will acknowledge the interrupt and set TCi = TCi + PERIOD, where PERIOD is a constant specifying the time for the next interrupt. The interrupting period is determined by the TCNT period (set by TSCR2) multiplied by the constant PERIOD. Let n be the 3-bit number (0 to 7) formed from the least significant three bits of TSCR2. Let \(f_E\) be the frequency of the E clock (adjusted by the PLL). The output compare interrupt rate is

\[ \text{OC interrupt frequency} = f_E / 2^{n} / \text{PERIOD} \]

\[ \text{OC interrupt period} = \text{PERIOD} \cdot 2^{n} / f_E \]

The TCTL1 and TCTL2 registers are also used for output compare. If OMi = OLi = 0, then an output compare event will not directly affect the output pin. If the pair (OMi,OLi) equals (0,1), then the output pin will toggle on each output compare. If the pair (OMi,OLi) equals (1,0), then the output pin will clear on each output compare. If the pair (OMi,OLi) equals (1,1), then the output pin will set on each output compare.


Example 9.4 Write software that increments a global variable every 1 second.

Solution With an E clock of 8 MHz, 1 s/125 ns is 8,000,000. Because PERIOD must fit in 16 bits, the only possibility is to make n equal to 7 and PERIOD equal to 62500. Program 9.4 shows a periodic interrupt using output compare 6, incrementing a global variable, Time, every 1 sec. During the initialization, bit 7 of TSCR1 is set to activate the timer system. The TCNT period is set to 16 µs in TSCR2. Bit 6 in TIOS is set to activate output compare on channel 6. The arm bit is set in TIE. The global variable is cleared in the initialization. The initial value of TC6 is set so the first interrupt occurs in 80 µs (subsequent interrupts will occur every 1 s). It is possible the C6F flag might already be set, due to activity occurring before the initialization is executed. Clearing the C6F trigger flag in the initialization guarantees the first interrupt will occur exactly 80 µs later.

    ; 9S12C32 4 MHz, 9S12DP512 8 MHz
    PERIOD   equ  62500       ;in 16usec
             org  $0800       ;($3800 if C32)
    Time     rmb  2
             org  $4000
    OC6_Init sei              ;make atomic
             movb #$80,TSCR1  ;enable TCNT
             movb #$07,TSCR2  ;($06 if C32)
             bset TIOS,#$40   ;activate OC6
             bset TIE,#$40    ;arm OC6
             movw #0,Time
             ldd  TCNT        ;time now
             addd #5          ;first in 80us
             std  TC6
             movb #$40,TFLG1  ;clear C6F
             cli              ;enable IRQ
             rts
    OC6Han   movb #$40,TFLG1  ;acknowledge
             ldd  TC6
             addd #PERIOD
             std  TC6         ;next in 1 s
             ldd  Time
             addd #1
             std  Time
             rti
             org  $FFE2
             fdb  OC6Han      ;vector

    // 9S12C32 4 MHz, 9S12DP512 8 MHz
    #define PERIOD 62500
    unsigned short Time;
    void OC6_Init(void){
      asm sei          // Make atomic
      TSCR1 = 0x80;    // 16us TCNT
      TSCR2 = 0x07;    // (0x06 if C32)
      TIOS |= 0x40;    // activate OC6
      TIE |= 0x40;     // arm OC6
      Time = 0;        // Initialize
      TC6 = TCNT+5;    // first in 80us
      TFLG1 = 0x40;    // clear C6F
      asm cli          // enable IRQ
    }
    interrupt 14 void OC6handler(void){
      TC6 = TC6+PERIOD; // next in 1 s
      TFLG1 = 0x40;     // acknowledge C6F
      Time++;
    }

Program 9.4 Implementation of a periodic interrupt using output compare.

Checkpoint 9.10: How would you modify Program 9.4 to count at 100 Hz?

Observation: The phase-lock-loop (PLL) on the 9S12 will affect the TOF and output compare rates.
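The choice of n and PERIOD in Example 9.4 can be reproduced with a small search that runs on a host computer. This is a sketch, not part of Program 9.4: the `OcSetting` type and `oc_find_setting` function are hypothetical, and the policy of preferring the smallest prescale (finest TCNT resolution) that gives an exact 16-bit count is an assumption made for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Result of the search: prescale n (PR2:PR0) and the 16-bit PERIOD. */
typedef struct {
    uint8_t  n;
    uint16_t period;
} OcSetting;

/* Given the desired interrupt period in E clock cycles, find the smallest
   n (0 to 7) such that e_cycles / 2^n is an exact count from 1 to 65535.
   Returns false if no exact setting exists. */
bool oc_find_setting(uint32_t e_cycles, OcSetting *out) {
    for (uint8_t n = 0; n <= 7; n++) {
        uint32_t divisor = 1ul << n;
        if (e_cycles % divisor != 0) {
            break;  /* a larger power of two cannot divide evenly either */
        }
        uint32_t counts = e_cycles / divisor;
        if (counts >= 1 && counts <= 65535u) {
            out->n = n;
            out->period = (uint16_t)counts;
            return true;
        }
    }
    return false;
}
```

For 8,000,000 E cycles (1 s at 8 MHz) the search returns n = 7 and PERIOD = 62500; for 4,000,000 cycles (1 s at 4 MHz) it returns n = 6 and PERIOD = 62500, which agrees with the values in the listing.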

Example 9.5 Design an interface for a 32 Ω speaker and use it to generate a loud 1 kHz sound.

Solution At 5 V, a 32 Ω speaker will require a current of about 150 mA. We will use the 2N2222 circuit in Figure 8.16 because it can sink at least three times the current needed for this speaker. In this example the interface will be connected to PT6. We select a 5 V supply and connect it to the +V in the circuit. The needed base current is

\[ I_b = I_{coil}/h_{fe} = 150\ \text{mA}/100 = 1.5\ \text{mA} \]

The desired interface resistor is

\[ R_b \le (V_{OH} - V_{be})/I_b = (5 - 0.6)/1.5\ \text{mA} = 2.9\ \text{k}\Omega \]

To cover the variability in \(h_{fe}\), we will use a 1.5 kΩ resistor instead of the 2.9 kΩ. The actual voltage on the speaker when active will be 5 − 0.3 = 4.7 V. We can make the sound quieter by using a larger resistor for Rb. To generate the 1 kHz sound we need a 1 kHz squarewave. There are two good methods on the 9S12 to generate squarewaves. First, the output compare module can be used to create an interrupt every 0.5 ms, and make the output toggle at each interrupt. The second method uses the pulse width modulator (PWM), previously presented in Section 8.6. The output compare method is used here (Program 9.4 adapted), but the PWM approach has the advantage of not requiring a periodic interrupt. The initialization of Program 9.5 selects toggle mode for output compare 6. Specifically, we set the bits (OM6,OL6) to (0,1) in TCTL1. To select the frequency of the sound we simply set the rate at which output compare interrupts are generated. To turn the sound off, we disarm OC6 interrupts. Notice that with toggle mode, the output compare hardware changes the PT6 output automatically. Using automatic mode (as compared to having the software set and clear the port) creates a squarewave with very low jitter (down to the stability of the crystal).

    ; 9S12C32 4 MHz, 9S12DP512 8 MHz
    OC6_Init sei              ;make atomic
             movb #$80,TSCR1  ;enable TCNT
             movb #$03,TSCR2  ;($02 if C32)
             bset TIOS,#$40   ;activate OC6
             bset TIE,#$40    ;arm OC6
             bclr TCTL1,#$20  ;OM6=0
             bset TCTL1,#$10  ;OL6=1
             ldd  TCNT        ;time now
             addd #50         ;first in 50us
             std  TC6
             movb #$40,TFLG1  ;clear C6F
             cli              ;enable IRQ
             rts
    OC6Han   movb #$40,TFLG1  ;acknowledge
             ldd  TC6
             addd #500
             std  TC6         ;next in 0.5 ms
             rti
             org  $FFE2
             fdb  OC6Han      ;vector

    // 9S12C32 4 MHz, 9S12DP512 8 MHz
    void OC6_Init(void){
      asm sei          // Make atomic
      TSCR1 = 0x80;    // 1 MHz TCNT
      TSCR2 = 0x03;    // (0x02 if C32)
      TIOS |= 0x40;    // activate OC6
      TIE |= 0x40;     // arm OC6
      TCTL1 = (TCTL1&0xCF)|0x10;
      TC6 = TCNT+50;   // first in 50us
      TFLG1 = 0x40;    // clear C6F
      asm cli          // enable IRQ
    }
    interrupt 14 void OC6handler(void){
      TC6 = TC6+500;   // next in 0.5 ms
      TFLG1 = 0x40;    // acknowledge C6F
    }

Program 9.5 Sound output using output compare.

Observation: To make a quieter sound, we could use a larger resistor between the 9S12 output and the 2N2222 base.
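Since the sound frequency is selected by the output compare interval, it can help to compute that interval for an arbitrary tone. The following host-side sketch uses a hypothetical function name and assumes toggle mode, so two output compare events make one cycle of the squarewave; the result is rounded to the nearest count.

```c
#include <stdint.h>

/* Number of TCNT counts between toggles for a squarewave of tone_hz,
   given the TCNT frequency in Hz. The pin toggles twice per cycle, so the
   interval is tcnt_hz / (2 * tone_hz), rounded to the nearest count.
   Returns 0 if tone_hz is 0 or the result does not fit in 16 bits. */
uint16_t toggle_counts(uint32_t tcnt_hz, uint32_t tone_hz) {
    if (tone_hz == 0) {
        return 0;
    }
    uint32_t counts = (tcnt_hz + tone_hz) / (2u * tone_hz);
    if (counts < 1 || counts > 65535u) {
        return 0;
    }
    return (uint16_t)counts;
}
```

With a 1 MHz TCNT, `toggle_counts(1000000, 1000)` gives 500, the constant added to TC6 in Program 9.5. A very low tone such as 1 Hz does not fit in 16 bits at this TCNT rate, so the function reports 0 and a slower prescale would be needed.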


9.6.3 Input Capture Interrupts

We can use input capture to measure the period or pulse width of digital signals. The input capture system can also be used to trigger interrupts on rising or falling transitions of external signals. Table 9.8 shows the registers needed for input capture. TCNT is a 16-bit counter incremented at a fixed rate, determined by the E clock and the TSCR2 register. On most 9S12 microcontrollers, an input capture feature exists for each of the eight Port T inputs (let n be 0 to 7, representing the inputs PT0 to PT7 respectively). There is a separate 16-bit input capture register for each of the 8 input capture modules (TC0 to TC7). Each input capture module has
• A direction register bit, DDRTn
• An external input pin, PTn
• A flag bit, CnF
• Two edge control bits, EDGnB and EDGnA
• An interrupt arm bit, CnI
• A 16-bit input capture register, TCn

In this book, we use the term arm to describe the bit that allows/denies a specific flag from requesting an interrupt. The Freescale manuals refer to this bit as a mask. I.e., the device is armed when the mask bit is 1. Typically, there is a separate arm bit for every flag that can request an interrupt. An external input signal is connected to the input capture pin (PT0 to PT7). The EDGnB, EDGnA bits specify whether the rising, falling or both rising and falling edges of the external signal will trigger an input capture event, see Table 9.10. Two or three actions result from an input capture event:
1. The current TCNT value is copied into the input capture register, TCNT → TCn
2. The input capture flag is set, 1 → CnF
3. An interrupt is requested if CnI equals 1

This means an interrupt can be requested on a capture event. The input capture mechanism has many uses. Three common applications of input capture are:
1. An interrupt service routine is executed on the active edge of the external signal
2. Perform two rising edge input captures and subtract the measurements to get period
3. Perform rising edge then falling edge captures and subtract the measurements to get pulse width

The flag bits do not behave like a regular memory location. In particular, a flag cannot be set by software. Rather, an input capture or output compare hardware event will set the flag. The other peculiar behavior of the flag is that the software must write a one to the flag in order to clear it. If the software writes a zero to the flag, no change will occur. The pin is selected as input capture by placing a 0 in the corresponding bit of the TIOS register. There is a direction register, DDRT, and we should clear the corresponding bits for the input capture inputs. We specify the active edge (i.e., the edge that latches TCNT and sets the flag) by initializing the TCTL3 and TCTL4 registers, as described in Table 9.10. We can arm or disarm the input capture interrupts by initializing the TIE register. Our software can determine if an input capture event has occurred by reading the TFLG1 register. Every time the TCNT register overflows from $FFFF to 0, the TOF flag in the TFLG2 register is set. The TOF flag will cause an interrupt if the mask TOI equals 1.

Checkpoint 9.11: When does an input capture event occur?

Table 9.10 Two control bits define the active edge used for input capture.

EDGnB  EDGnA  Active Edge
0      0      None
0      1      Capture on rising
1      0      Capture on falling
1      1      Capture on both rising and falling
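The position of the two edge control bits for a given channel follows the layout in Table 9.8: channels 0 to 3 live in TCTL4 and channels 4 to 7 in TCTL3, two bits per channel. The sketch below computes the new register value on a host computer; the `EdgeMode` enumeration and `set_edge_mode` function are hypothetical names, and the register is passed in as an ordinary byte.

```c
#include <stdint.h>

/* EDGnB:EDGnA encodings from Table 9.10. */
typedef enum {
    EDGE_NONE    = 0,
    EDGE_RISING  = 1,
    EDGE_FALLING = 2,
    EDGE_BOTH    = 3
} EdgeMode;

/* Return the updated value of the edge control register that holds the
   channel's bit pair (TCTL4 for channels 0-3, TCTL3 for channels 4-7).
   Only the two bits belonging to that channel are changed. */
uint8_t set_edge_mode(uint8_t reg, unsigned channel, EdgeMode mode) {
    unsigned shift = 2u * (channel & 3u);        /* 0, 2, 4 or 6 */
    uint8_t  mask  = (uint8_t)(0x03u << shift);  /* the channel's 2 bits */
    uint8_t  field = (uint8_t)((unsigned)mode << shift);
    return (uint8_t)((reg & (uint8_t)~mask) | field);
}
```

For channel 1 and a rising edge, the mask is 0x0C and the field is 0x04, which is the same operation as the expression `(TCTL4&0xF3)|0x04`.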


Checkpoint 9.12: What happens during an input capture event?

Observation: The TCNT timer is very accurate because of the stability of the crystal clock. Therefore, measurements based on the clock will also be very accurate.

Observation: When measuring period or pulse-width, the measurement resolution will equal the TCNT period.

The flags in the TFLG1 and TFLG2 registers are cleared by writing a 1 into the specific flag bit we wish to clear. For example, writing a $FF into TFLG1 will clear all 8 flags. The following is a valid method for clearing C3F. I.e., this acknowledge sequence clears the C3F flag without affecting the other 7 flags in the TFLG1 register.

    TFLG1 = 0x08;

Checkpoint 9.13: Write assembly or C code to clear C6F.

Common Error: Executing TFLG1 |= 0x08; will mistakenly clear all the bits in the TFLG1 register.

Example 9.6 Design a system that measures period with a resolution of 1 µs.

Solution Period is defined as the time from one rising edge to the next rising edge. The input signal will be connected to PT1 (any Port T pin could have been used) and the input capture system will be used to measure period. The initialization function first sets the I bit, so interrupts do not occur until the entire initialization sequence is complete, see Program 9.6. TIOS bit 1 and DDRT bit 1 are cleared so PT1 will be an input capture. Input capture is part of the timer module, which is activated by setting the TEN bit. The resolution of the system is determined by the period of the TCNT, so TSCR2 is set to make the TCNT period equal to 1 µs, assuming the E clock is 8 MHz. Because the 9S12 must execute the ISR on every rising edge, we should not try to use this solution to measure periods less than 50 µs. In particular, it takes 9 bus cycles to perform an interrupt context switch plus 31 cycles to execute this assembly language ISR (the Metrowerks Codewarrior C ISR executes in 30 cycles), so 40 cycles or 5 µs are required for each edge. If the input wave has a period of 50 µs, then the ISR software consumes 10% of the available processor execution. On the other extreme, this solution will be incorrect for periods over 65.535 ms. The TCTL4 register is configured so PT1 captures on each rising edge. Global variables are initialized and interrupts are armed and enabled. The 16-bit subtraction in the ISR calculates the number of TCNT clocks between rising edges. Since the ritual does not wait for the first edge, the first period measurement will be incorrect and should be neglected.

    Period rmb  2           ;resolution 1us
    First  rmb  2           ;TCNT at first edge
    Done   rmb  1           ;set each rising
    Init   sei              ;make atomic
           bclr TIOS,#$02   ;PT1=input capture
           bclr DDRT,#$02   ;PT1 is input
           movb #$80,TSCR1  ;enable TCNT
           movb #$03,TSCR2  ;1us clk
           bclr TCTL4,#$08  ;EDG1BA =01
           bset TCTL4,#$04  ;on rise of PT1
           movw TCNT,First  ;init global
           clr  Done
           movb #$02,TFLG1  ;clear C1F
           bset TIE,#$02    ;Arm C1F
           cli              ;enable
           rts
    TC1Han ldd  TC1         ;                 [3]
           subd First       ;                 [3]
           std  Period      ;1us resolution   [3]
           movw TC1,First   ;setup            [6]
           movb #$02,TFLG1  ;clear C1F        [4]
           movb #$FF,Done   ;                 [4]
           rti              ;                 [8]
           org  $FFEC       ;timer channel 1
           fdb  TC1Han

    // Range = 50 us to 65.535 ms,
    // no overflow checking
    unsigned short Period;  // 1us units
    unsigned short First;   // TCNT first edge
    unsigned char Done;     // Set each rising
    void Init(void){
      asm sei          // make atomic
      TIOS &=~0x02;    // PT1 input capture
      DDRT &=~0x02;    // PT1 is input
      TSCR1 = 0x80;    // enable TCNT
      TSCR2 = 0x03;    // 1us clock
      TCTL4 = (TCTL4&0xF3)|0x04; // rising
      First = TCNT;    // first will be wrong
      Done = 0;        // set on subsequent
      TFLG1 = 0x02;    // Clear C1F
      TIE |= 0x02;     // Arm IC1
      asm cli
    }
    void interrupt 9 TC1Han(void){
      Period = TC1-First;  // 1us resolution
      First = TC1;         // Setup for next
      TFLG1 = 0x02;        // ack by clearing C1F
      Done = 0xFF;
    }

Program 9.6 A software system implementing 16-bit period measurement.

Because the input capture interrupt has a separate vector, the software does not poll. An interrupt is requested on each rising edge of the input signal. Figure 9.9 illustrates the period measurement for one situation with a period of 8192 μs. On the first interrupt, TCNT ($F000) is latched into TC1. The ISR will save the $F000 in the private global called First. On the second interrupt, TCNT ($1000) is latched into TC1. The ISR will perform a 16-bit subtraction of $1000 − $F000 = $2000 = 8192, and store the 8192 into the public global called Period. This method is accurate as long as the period is between 50 and 65535 μs.

Figure 9.9 Example measurement of an input with an 8192 μs period.

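[Timing diagram: TCNT increments every 1 μs from $EFFF through $F000, $FFFF, and $0000 to $1001; the two rising edges of PT1 are 8192 μs = 8192 cycles apart; C1F is set on each edge, and TC1 changes from XXXX to $F000 on the first edge and to $1000 on the second.]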
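The wraparound in this measurement can be checked on a desktop compiler. The following is a minimal sketch, not part of Program 9.6; the function name period_ticks is hypothetical. It shows that an unsigned 16-bit subtraction gives the correct elapsed count even when TCNT rolls over from $FFFF to $0000 between the two edges.

```c
#include <stdint.h>

/* Elapsed TCNT ticks between two captured values.
   Unsigned 16-bit arithmetic wraps modulo 65536, so the result is
   correct as long as fewer than 65536 ticks separate the captures. */
uint16_t period_ticks(uint16_t first, uint16_t second) {
    return (uint16_t)(second - first);
}
```

Calling period_ticks(0xF000, 0x1000) reproduces the 8192-tick result of Figure 9.9.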

Checkpoint 9.14: How would you modify Program 9.6 to implement a 2 μs measurement resolution?

The interface circuit in Figure 9.7 could be combined with Program 9.6 to measure the speed of a spinning motor by connecting V2 to PT1 and calculating Speed = constant/Period.
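As an illustration of Speed = constant/Period, the sketch below converts a period in 1 μs units into revolutions per minute. It assumes the sensor produces a fixed number of pulses per revolution; the name speed_rpm and the constant PULSES_PER_REV are hypothetical, and the actual constant depends on the tachometer used.

```c
#include <stdint.h>

#define PULSES_PER_REV 1u   /* assumption: one pulse per revolution */

/* Convert a measured period (1 us units) into motor speed in RPM.
   There are 60,000,000 us in one minute, so
   RPM = 60000000 / (period_us * PULSES_PER_REV).
   A period of zero means no valid measurement, so 0 is returned. */
uint32_t speed_rpm(uint16_t period_us) {
    if (period_us == 0) {
        return 0;
    }
    return 60000000ul / ((uint32_t)period_us * PULSES_PER_REV);
}
```

Integer division truncates the result, which is usually acceptable because the period itself has only 1 μs resolution.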

9.7 Pulse Accumulator

The pulse accumulator is a mechanism on the 9S12 to count events, measure frequency, or measure pulse width on a digital input signal. For example, if we wished to know how fast a motor is spinning, we could use a tachometer, which generates a square wave with a frequency that is related to motor speed. We interface the tachometer output to the PT7 input and use the pulse accumulator to measure either frequency or pulse width. The software then converts the pulse accumulator measurements into motor speed. The 9S12 pulse accumulator is a 16-bit read/write counter that can operate in either of two modes. External event counting mode can be used for counting events or frequency measurement. We will use gated time accumulation mode for pulse width measurement. The I/O ports involved in the 9S12 pulse accumulator are shown in Table 9.11. The bits used in this section are shown in bold.

Address  Bit 7  6      5      4      3     2     1      Bit 0  Name
$0046    TEN    TSWAI  TSBCK  TFFCA  0     0     0      0      TSCR1
$0060    0      PAEN   PAMOD  PEDGE  CLK1  CLK0  PAOVI  PAI    PACTL
$0061    0      0      0      0      0     0     PAOVF  PAIF   PAFLG
$0240    PT7    PT6    PT5    PT4    PT3   PT2   PT1    PT0    PTT
$0242    DDRT7  6      5      4      3     2     1      Bit 0  DDRT

Address  msb                                       lsb  Name
$0062    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0          PACNT

Table 9.11 9S12 I/O ports used by the pulse accumulator.

DDRT7 is the Data Direction bit for PT7. Normally, the DDRT7 bit is cleared so PT7 is an input, but even if it is configured for output, PT7 still drives the pulse accumulator. PAEN is the Pulse Accumulator System Enable bit. Turn this bit on to activate the pulse accumulator. The PAMOD and PEDGE bits select the operation mode, as shown in Table 9.12.

PAMOD  PEDGE  Mode                     Action on Clock                     Sets PAIF
0      0      event counting           PT7 falling edge increments PACNT   Falling edge
0      1      event counting           PT7 rising edge increments PACNT    Rising edge
1      0      gated time accumulation  Counts when PT7 = 1                 Falling edge
1      1      gated time accumulation  Counts when PT7 = 0                 Rising edge

Table 9.12 9S12 pulse accumulator operation modes on PT7.

In the event counting mode, the 16-bit counter (PACNT) is incremented on either the rising edge or falling edge of PT7. The maximum clocking rate for the external event counting mode is the E clock frequency divided by two. Event counting mode does not require the timer to be enabled. To use counting mode to measure frequency, we count the number of edges in a fixed time, T. We define frequency resolution as the smallest change in frequency the system can recognize. In this approach, the frequency resolution will be 1/T. The range of frequencies that can be measured will be 0 to 65535/T.

In the gated time accumulation mode, a free-running clock (E clock divided by 64) increments the 16-bit counter. In particular, the E clock divided by 64 increments PACNT while the PT7 input is active. Gated accumulation mode does require the TEN bit in the TSCR1 register to be set. We can use gated accumulation mode to measure pulse width. We define pulse width resolution as the smallest change in pulse width the system can recognize. Let tE be the period of the E clock. The pulse width resolution will be 64*tE. The range of pulse widths that can be measured will be 64*tE to 65535*64*tE.

The PAOVF status bit is set each time the pulse accumulator count rolls over from $FFFF to $0000. To clear this status bit, we write a one to the PAFLG register bit 1. The PAOVI bit will arm the device so that a pulse accumulator interrupt is requested when PAOVF is set. When PAOVI is zero, pulse accumulator overflow interrupts are disarmed. The PAIF status bit is automatically set each time a selected edge is detected at the PT7 pin (PEDGE = 0 means falling edge, and PEDGE = 1 means rising edge). To clear this status bit, write a one to the PAFLG register bit 0. The PAI bit will arm the device so that a pulse accumulator interrupt is requested when PAIF is set. When PAI is zero, pulse accumulator input interrupts are disarmed.
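The resolution and range formulas above can be evaluated with a few helper functions. This is a sketch for checking the arithmetic on a desktop compiler, not 9S12 driver code; all four function names are hypothetical, and integer units (milliseconds, millihertz, nanoseconds) are used to avoid floating point.

```c
#include <stdint.h>

/* Event counting mode: edges are counted for a fixed gate time T. */

/* Frequency resolution 1/T, returned in millihertz, for T in ms.
   The gate time must be nonzero. */
uint32_t freq_resolution_mHz(uint32_t gate_ms) {
    return 1000000ul / gate_ms;
}

/* Largest measurable frequency 65535/T, in Hz, for T in ms. */
uint32_t freq_max_Hz(uint32_t gate_ms) {
    return (65535ul * 1000ul) / gate_ms;
}

/* Gated time accumulation mode: PACNT is clocked by E/64. */

/* Pulse width resolution 64*tE, in ns, for tE in ns. */
uint64_t pulse_resolution_ns(uint32_t tE_ns) {
    return 64ull * tE_ns;
}

/* Largest measurable pulse width 65535*64*tE, in ns. */
uint64_t pulse_max_ns(uint32_t tE_ns) {
    return 65535ull * 64ull * tE_ns;
}
```

With a 1000 ms gate time the resolution is 1000 mHz (1 Hz) and the range extends to 65535 Hz. With tE = 125 ns the pulse width resolution is 8000 ns (8 μs) and the maximum is 524,280,000 ns, or about 0.52 s.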

Observation: The PACNT input and timer channel 7 use the same pin PT7. To use the pulse accumulator, disconnect PT7 from the output compare logic by clearing bits OM7 and OL7. Also clear the channel 7 output compare 7 mask bit, OC7M7.

Example 9.7 Design a system that measures frequency with a resolution of 1 Hz.

Solution To establish the frequency resolution at 1 Hz, we count the number of falling edges that occur in one second. The signal to be measured will be connected to the pulse accumulator input, which is PT7 on the 9S12. The frequency measurement function, shown in Program 9.7, enables the pulse accumulator and selects event counting mode. When measuring frequency it usually doesn't matter whether we count rising or falling edges. In this case, falling edges will be counted. The approach will be to initialize the pulse accumulator to event counting, clear the count, wait 1 second, then read the counter. Since frequency is defined as the number of edges in one second, the value in the PACNT after the one second time delay will be frequency in Hz. The 9S12 can measure 0 to 65535 Hz. In both the assembly and C versions, the frequency resolution (which is the smallest change in frequency that can be distinguished) will be 1 Hz. In general, the frequency resolution will be one divided by the fixed time during which counts are measured. The PAOVF bit will be set if the input frequency exceeds the measurement range. If the input signal has a frequency of 22.1 Hz (as illustrated in Figure 9.10), then the function will return a result of 22.

Freq_Init
        bclr DDRT,#$80       ;PT7 is input
        movb #$40,PACTL      ;count falling
        rts
;measures 0 to 65535 Hz
;returns Reg D = freq in Hz
Freq_Measure
        movw #0,PACNT
        movb #$02,PAFLG      ;clear PAOVF
        ldy  #1000
        bsr  Timer_Wait1ms
        brclr PAFLG,#$02,ok  ;check PAOVF
bad     ldd  #65535          ;too big
        bra  out
ok      ldd  PACNT           ;units in Hz
out     rts

void Freq_Init(void){
  DDRT &= ~0x80;   // PT7 input
  PACTL = 0x40;    // count falling
}
// measures 0 to 65535 Hz
// returns result in Hz
unsigned short Freq_Measure(void){
  PACNT = 0;
  PAFLG = 0x02;
  Timer_Wait1ms(1000);
  if(PAFLG&0x02){
    return(65535);
  }
  return PACNT;    // frequency
}

Program 9.7 Frequency measurement using the pulse accumulator.

Figure 9.10 Example measurement with an input with a 22 Hz frequency.

[Timing diagram: during the 1 s measurement window, the edges on PT7 increment PACNT from 0 to 22.]

Checkpoint 9.15: What will be the output of Program 9.7 if the frequency is 1234.56 Hz?

Checkpoint 9.16: How do you modify Program 9.7 so that it measures frequency with a resolution of 1 kHz? What would the output then be if the frequency is 1234.56 Hz?

Example 9.8 Design a system that measures pulse width with a resolution of 8 μs.

Solution Pulse width is defined as the time the input signal is high. Again, the input signal will be connected to the pulse accumulator input, which is PT7 on the 9S12. The pulse width measurement function, shown in Program 9.8, enables the pulse accumulator and selects gated accumulation mode. In this case, PEDGE is set to zero, so the PACNT will accumulate when the input is high. With PEDGE equal to zero, the PAIF will be set on the falling edge of the input, signaling the pulse width measurement is complete. The approach will be to initialize the pulse accumulator to gated accumulation mode, clear the count, wait for PAIF to be set, then read the counter. Since PACNT counts while the input is high, the value in this counter will represent the width of the pulse. The pulse width resolution is the smallest change in pulse width that can be distinguished. In general, the pulse width resolution will be the period of the free-running clock used to increment the counter. Assuming the 9S12 E clock period is 125 ns, the pulse width resolution will be 8 μs. The 9S12 can measure 8 μs to 0.52 s. The PAOVF bit will be set if the input pulse width exceeds the measurement range. If the input signal has a pulse width of 152 μs (as illustrated in Figure 9.11), then the function will return a result of 152/8 or 19.

Pulse_Init
        bclr DDRT,#$80       ;PT7 is input
        movb #$60,PACTL      ;measure high
        rts
;returns Reg D = pulse width in 8us
; measures 8us to 0.52s
Pulse_Measure
        movw #0,PACNT
        movb #$02,PAFLG      ;clear PAOVF
loop    brclr PAFLG,#$01,loop
        brclr PAFLG,#$02,ok  ;check PAOVF
bad     ldd  #65535          ;too big
        bra  out
ok      ldd  PACNT           ;units in 8us
out     rts

void Pulse_Init(void){
  DDRT &= ~0x80;   // PT7 input
  PACTL = 0x60;    // measure high
}
// measures 8us to 0.52 sec
// returns result in 8us
unsigned short Pulse_Measure(void){
  PACNT = 0;
  PAFLG = 0x02;
  while((PAFLG&0x01)==0){};
  if(PAFLG&0x02){
    return(65535);
  }
  return PACNT;    // pulse width
}

Program 9.8 Pulse width measurement using the pulse accumulator.

Figure 9.11 Example measurement of an input with a 152 μs pulse width.

[Timing diagram: while PT7 is high for 152 μs, the E/64 clock (8 μs period) increments PACNT from 0 to 19; PAIF is set at the end of the pulse.]

Checkpoint 9.17: The pulse width resolution of the system in Program 9.8 is 8 μs. What does that mean?

Checkpoint 9.18: What will be the output of Program 9.8 if the pulse width is 1234.5 μsec?

9.8 *Direct Memory Access

The purpose of this section is to introduce terminology of high-speed I/O interfacing. The bandwidth of an I/O device is the number of bytes/sec that can be transferred. Real-time systems have extremely tight requirements for both latency and bandwidth. The 9S12 has a 16-bit data bus. If it is executing at 24 MHz, the data bus bandwidth is 48 Mbytes/sec. A high speed SCI interface can achieve only 10,000 bytes/sec. The SPI clock can run at 12 Mbps, but peak bandwidth for an SPI interface will be limited by software speed.

One of the limitations of software-based interfaces such as busy-wait and interrupts is that the data must be brought into the processor and manipulated before it can be transferred to memory. If you wish to transfer data from an SPI input device into RAM, you must first transfer it from SPIDR to Register A, then from Register A into RAM. In order to achieve high bandwidth, we need to be able to transfer data directly from input to RAM or RAM to output using Direct Memory Access, or DMA. Because DMA is faster, we will use this method to interface high bandwidth devices like disks and networks.

A key architecture component is the availability of a co-processor that can perform I/O functions in parallel with but separate from the processor execution. We program a co-processor in much the same way as we program the regular processor. For example, there can be a program counter and general purpose registers. However, the instructions are usually very simple, explicitly defining an I/O operation to perform. An architecture with simple and explicit machine codes is called a Reduced Instruction Set Computer (RISC). Devices that support DMA include the hard drive controller on the PC, the video graphics controller on the PC, and the XGate peripheral co-processor on the 9S12X series of microcontrollers from Freescale.

During a read DMA cycle (Figure 9.12) data flows directly from the memory to the output device. During the DMA cycles the co-processor drives the address and control bus.

Figure 9.12 A DMA read cycle copies data from RAM, ROM or input device into an output device.

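[Block diagram of a DMA read cycle: the processor, RAM, ROM, input ports, and output ports share the address, control, and data buses; the address bus carries $3800, the control bus signals a read (R), and the data value $98 appears on the data bus for the output ports.]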

During a write DMA cycle (Figure 9.13) data flows directly from the input device to memory.

Figure 9.13 A DMA write cycle copies data from the input device into RAM, or output device.

[Block diagram of a DMA write cycle: the processor, RAM, ROM, input ports, and output ports share the address, control, and data buses; the address bus carries $3800, the control bus signals a write (W), and the data value $25 comes from the input ports.]

Prediction: The need for I/O bandwidth will increase faster than the processor execution speed, therefore I/O co-processors will become more prevalent in embedded systems of the future.

Prediction: The power requirements increase linearly with bandwidth, so there will always be a place in the embedded systems market for low speed low power systems.

9.9 Hardware Debugging Tools

Microcomputer related problems often require the use of specialized equipment to debug the system hardware and software. Two very useful tools are the logic analyzer and in-circuit emulator (ICE). A logic analyzer is essentially a multiple channel digital storage scope with many ways to trigger (see Figure 9.14). As a troubleshooting aid, it allows the experimenter to observe numerous digital signals at various points in time and thus make decisions based upon such observations. As with any debugging process, it is necessary to select which information to observe out of a vast set of possibilities. Any digital signal in the system can be connected to the logic analyzer. Figure 9.14 shows an 8-channel logic analyzer, but real devices can support 128 or more channels.

One problem with logic analyzers is the massive amount of information that they generate. With logic analyzers (similar to other debugging techniques) we must strategically select which signals in the digital interfaces to observe and when to observe them. In particular, the triggering mechanism can be used to capture data at appropriate times, eliminating the need to sift through volumes of output. Sometimes there are extra I/O pins on the microcontroller, not needed for the normal operation of the system (shown as the bottom two wires in Figure 9.14). In this case, we can connect the pins to a logic analyzer, and add software debugging instruments that set and clear these pins at strategic times within the software. In this way we can visualize the hardware/software timing.

Figure 9.14 A logic analyzer and example output.

[Block diagram: a logic analyzer connected to the digital interfaces of a 9S12 system, with PT1 and PT0 wired to the bottom two analyzer channels.]

Some microcontrollers have external pins that carry the address, R/W, and data signals, exposing bus cycle information as discussed in Section 4.2. In this case, we could connect address, R/W, and data to the logic analyzer. The logic analyzer must be synchronized to the processor, so that the analyzer knows which memory reads are op code fetches. This way the location and the data it calculates can be reconstructed from the bus cycles. This debugging method is nonintrusive. This process doesn't work on high-performance processors such as the Pentium because (1) there is an internal memory cache to contain data it needs most frequently, and (2) it fetches many op codes that are never actually executed, as it tries to prefetch machine codes it thinks the processor will need in the future.

An in-circuit emulator is a hardware debugging tool that recreates the input/output signals of the processor chip. To use an ICE, we remove the processor chip. One side of the cable is inserted into the vacated processor chip socket, and the other side is connected to the ICE. Figure 9.15 shows the microcomputer system with and without the ICE. Notice the cable between the debugging instrument (ICE) and the microcomputer socket on the target board. In most cases, the emulator/computer system operates at full speed. The emulator allows the programmer to observe and modify internal registers of the processor. Emulators are often integrated into a personal computer, so that its editor, hard drive, and printer are available for the debugging process.

Figure 9.15 In-circuit emulator and example output.

[Block diagram: an embedded system with the 9S12 microcomputer and I/O, and the same system with the emulator cabled into the processor socket. Example emulator output: registers A = $55, B = $31, X = $1234, Y = $5678, SP = $0BF0, PC = $F103; I/O ports PortH = $83, PortJ = $00, PortS = $55, PortT = $0F, PortE = $21, TCNT = $A010.]

Observation: Many target microcomputer systems have the microcomputer chip soldered onto the circuit board, and thus it cannot be removed.

To debug a board-level system where the program is stored in an external ROM, we can use another class of emulator called the ROM-emulator (see Figure 9.16). This debugging tool replaces the ROM with a cable that connects to a dual-port RAM within the emulator. While the software is running, it fetches information from the emulator RAM just as if it were the ROM. While the software is halted, you can modify its contents.

Figure 9.16 In-circuit ROM emulator and example output.

[Block diagram: the processor's ROM socket is cabled over the address/data bus to RAM inside the emulator. Example emulator output:]

Address  Contents  Interpretation
$E000    $B6       ldaa $0240
$E001    $02
$E002    $40
$E003    $B7       staa $0250
$E004    $02
$E005    $50

Observation: An in-circuit ROM emulator can only be used in a microcomputer system that stores the program into an external ROM chip.

The only disadvantage of the in-circuit emulator is its cost. To provide some of the benefits of this high-priced debugging equipment, the 9S12 has a background debug module (BDM). The BDM hardware exists on the microcomputer chip itself and communicates with the debugging computer via a dedicated serial interface, as shown in Figure 9.17. Although not as flexible as an ICE, the BDM can provide the ability to observe software execution in real-time, the ability to set breakpoints, the ability to stop the computer and the ability to read and write registers, I/O ports and memory. The registers can only be observed when the computer is halted, but the memory and I/O ports are accessible while the program is executing.

Figure 9.17 P&E Microcomputer Systems Multilink BDM.

9.10 Profiling

Profiling is similar to performance debugging because both involve dynamic behavior. Profiling is a debugging process that collects the time history of strategic variables. For example, if we could collect the time-dependent behavior of the program counter, then we could see the execution patterns of our software. We can profile the execution of a multiple thread software system to detect reentrant activity. We can profile a software system to see which of two software modules is run first. For a real-time system, we need to guarantee the time between when software should be run and when it actually runs is short and bounded. Profiling allows us to measure when software is actually run, experimentally verifying the system is real-time.

9.10.1 Profiling Using a Software Dump to Study Execution Pattern

In this section, we will use a debugging instrument to study the execution pattern of our software. In order to collect information concerning execution we will define a debugging instrument that saves the time and location in an array (like a dump), as shown in Program 9.9. The debugging session will initialize the private global N to zero. In this profile, the place p will be an integer, uniquely specifying from which place in the software Profile is called. The assembly version of Profile requires 44 cycles to execute (including the ldy and jsr). If the 9S12 is running at 24 MHz, this debugging instrument consumes less than 2 μs per call. This amount of time would usually be classified as minimally intrusive.

Time    rmb  200
Place   rmb  200
N       rmb  1
Profile ;RegY contains p
        pshb
        pshx
        ldab N
        cmpb #100        ;full?
        bhs  Pdone
        lslb             ;16-bits each
        ldx  #Time
        movw TCNT,B,x    ;record time
        ldx  #Place
        sty  B,X         ;record place
        inc  N
Pdone   pulx
        pulb
        rts

unsigned short Time[100];
unsigned short Place[100];
unsigned char N;
void Profile(unsigned short p){
  if(N<100){
    Time[N] = TCNT;   // record time
    Place[N] = p;     // record place
    N++;
  }
}

Program 9.9 Debugging instrument for profiling.

The instrument is then called from strategic places in the software under test, here a square root function:

unsigned char sqrt(unsigned char s){
unsigned char t;      // resolution 1/16
unsigned char cnt;    // loop counter
unsigned short s16;
  Profile(0);
  t = 0;              // secant method
  if(s>0) {
    Profile(1);
    s16 = 16*s;
    t = 32;           // guess 2.0
    for(cnt=3; cnt; cnt--){
      Profile(2);
      t = ((t*t+s16)/t)/2;
    }
  }
  Profile(3);
  return t;
}
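Once a dump has been collected, the time spent between two instrumented places is the difference of consecutive Time entries. The sketch below is a host-side illustration of that post-processing, not 9S12 code: the variable fake_TCNT is a hypothetical stand-in for the hardware TCNT register so the logic can be exercised on a desktop compiler, and elapsed_ticks is a hypothetical helper.

```c
#include <stdint.h>

#define DUMP_SIZE 100

uint16_t Time[DUMP_SIZE];    /* TCNT value at each call */
uint16_t Place[DUMP_SIZE];   /* which instrument made the call */
uint8_t  N = 0;              /* number of entries recorded */
uint16_t fake_TCNT = 0;      /* assumption: stands in for the TCNT register */

/* Same logic as the dump instrument: record time and place until full. */
void Profile(uint16_t p) {
    if (N < DUMP_SIZE) {
        Time[N]  = fake_TCNT;
        Place[N] = p;
        N++;
    }
}

/* TCNT ticks that elapsed between entry i-1 and entry i.
   The unsigned 16-bit subtraction stays correct across one TCNT
   rollover. Returns 0 when i does not name a valid pair of entries. */
uint16_t elapsed_ticks(uint8_t i) {
    if (i == 0 || i >= N) {
        return 0;
    }
    return (uint16_t)(Time[i] - Time[i - 1]);
}
```

Multiplying the returned tick count by the TCNT period gives the elapsed time between the two places.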

Observation: Debugging instruments need to save and restore registers so the original function is not disrupted.

9.10.2 Profiling Using an Output Port

In this section, we will discuss a hardware/software combination to visualize program activity. Our debugging instrument will set output port bits. We will place these instruments at strategic places in the software. If we are using a regular oscilloscope, then we must stabilize the system so that the function is called over and over. We connect the output pins to a scope or logic analyzer and observe the program activity. Program 9.11 uses an output port to profile.

;------t=sqrt(s)------
; input  s RegA, resolution 1/16
; output t RegB, resolution 1/16
t       rmb  1           ;8-bit, res=1/16
cnt     rmb  1           ;loop counter
s16     rmb  2           ;16-bit 16*s
sqrt    movb #0,PTT
        clrb             ;sqrt(0)=0
        tsta
        beq  done
        movb #1,PTT
        ldab #16
        mul              ;16*s
        std  s16         ;s16=16*s
        movb #32,t       ;t=2.0
        movb #3,cnt
next    movb #2,PTT
        ldaa t           ;RegA=t
        tab              ;RegB=t
        tfr  a,x         ;RegX=t
        mul              ;RegD=t*t
        addd s16         ;RegD=t*t+16*s
        idiv             ;RegX=(t*t+16*s)/t
        tfr  x,d
        lsrd             ;RegB=((t*t+16*s)/t)/2
        adcb #0
        stab t
        dec  cnt
        bne  next
done    movb #3,PTT
        rts

//------t=sqrt(s)------
unsigned char sqrt(unsigned char s){
unsigned char t;      // resolution 1/16
unsigned char cnt;    // loop counter
unsigned short s16;
  PTT = 0;
  t = 0;              // secant method
  if(s>0) {
    PTT = 1;
    s16 = 16*s;
    t = 32;           // guess 2.0
    for(cnt=3; cnt; cnt--){
      PTT = 2;
      t = ((t*t+s16)/t)/2;
    }
  }
  PTT = 3;
  return t;
}

Program 9.11 A time/position profile using two output bits.

Checkpoint 9.19: Write two friendly debugging instruments, one that sets Port B bit 3 high, and the other makes it low.

9.10.3 *Thread Profile

When more than one thread is active, you could use the previous technique to visualize the thread that is currently running. For each thread, we assign an output pin. The debugging instrument would set the corresponding bit high when the thread starts and clear the bit when the thread stops. We would then connect the output pins to a multiple channel scope or logic analyzer to visualize in real-time the thread that is currently running. For an example of this type of profile, run one of the thread.* examples included with the TExaS simulator, and observe the logic analyzer. Program 9.12 shows a simple thread profile of a system with a foreground thread (main program) and a background thread (ISR). PT1 will be high when the software is running in the foreground and PT0 will be high when executing in the background. The debugging instruments are shown in bold. The ISR saves the previous PTT value at the beginning and restores it at the end. The results shown in Figure 9.18 demonstrate the interrupt occurs every 128 μs and most of the time, the software is running in the foreground.

        org  $0800        ;($3800 if C32)
Time    rmb  2
        org  $4000
main    lds  #$4000
        bset DDRT,#$03    ;PT1,PT0 output
        movb #$20,RTICTL  ;($10 if C32)
        movb #$80,CRGINT  ;arm RTI
        movw #0,Time
        cli               ;enable IRQ
        movb #$02,PTT     ;foreground
loop    bra  loop
; interrupts every 128us
RTIHan  ldab PTT          ;save
        movb #$01,PTT     ;background
        movb #$80,CRGFLG  ;ack
        ldx  Time
        inx
        stx  Time
        stab PTT          ;restore
        rti
        org  $FFF0
        fdb  RTIHan       ;vector

unsigned short Time;
void main(void){
  DDRT |= 0x03;    // PT1,PT0 output
  RTICTL = 0x20;   // (0x10 if C32)
  CRGINT = 0x80;   // Arm
  Time = 0;        // Initialize
asm cli
  PTT = 0x02;      // foreground
  while(1){
  }
}
// interrupts every 128us
void interrupt 7 RTIHan(void){
char oldPTT=PTT;
  PTT = 0x01;      // background
  CRGFLG = 0x80;   // Acknowledge
  Time++;
  PTT = oldPTT;
}

Program 9.12 Implementation of a periodic interrupt using the real time clock feature.

Figure 9.18 Real-time thread profile measured with a logic analyzer.

Observation: Notice in Figure 9.18 that the time to execute the ISR (when PT0 is high) is short compared to the time between interrupt requests (period of PT0). This represents a good interrupt design.
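The fraction of processor time consumed by an ISR is its execution time divided by the time between interrupt requests, both of which can be read from a profile such as this one. The helper below is a small sketch of that calculation; the name isr_load_permille is hypothetical, and the result is expressed in tenths of a percent to keep the arithmetic in integers.

```c
#include <stdint.h>

/* Processor load of a periodic ISR in tenths of a percent (per mille).
   isr_time_ns is the time spent in the ISR per interrupt (including the
   context switch), period_ns is the time between interrupt requests.
   A zero period is treated as "no measurement" and returns 0. */
uint32_t isr_load_permille(uint32_t isr_time_ns, uint32_t period_ns) {
    if (period_ns == 0) {
        return 0;
    }
    return (uint32_t)(((uint64_t)isr_time_ns * 1000u) / period_ns);
}
```

For instance, the period measurement ISR of Example 9.6 needs 5 μs per edge, so with a 50 μs input period isr_load_permille(5000, 50000) gives 100, that is, 10% of the processor.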

9.11 Tutorial 9. Profiling

In this tutorial we will profile a real-time system that uses four periodic output compare interrupts. The goal of the system is to periodically execute four separate tasks in the background. Each task is performed at a fixed rate; the four rates are similar but unequal, as shown in Table T9.1. As you can see from the table, in each 1 second the time to execute all four tasks is less than 200 ms. In other words, we plan to use only 20 percent of the available processor time. The TCNT period will be set to 16 μs.

Task    ISR code     Interrupt period  Time to execute Task  Total time in 1 second
Task 0  TC0=TC0+79   1264 μs           50 μs                 39.6 ms
Task 1  TC1=TC1+73   1168 μs           50 μs                 42.8 ms
Task 2  TC2=TC2+67   1072 μs           50 μs                 46.6 ms
Task 3  TC3=TC3+50   800 μs            50 μs                 62.5 ms

Table T9.1 Real-time requirements of an embedded system.

Monitors and memory dumps are minimally intrusive techniques to collect strategic information without slowing down too much the system we are testing. At the start of each ISR, one bit in Port T will be set high, and at the end of the ISR, that bit will be cleared. In addition, the main program will toggle PT4. We profile this system by observing all five bits on a logic analyzer. This profile will allow us to see where and when our tasks are running. We will be studying Task 3 in particular, so we expect PT3 to go high for 50 μs every 800 μs.

The second debugging instrument used in this tutorial is a memory dump. It is a memory dump because the debugging information is not output or displayed, but rather it is just dumped into a memory buffer. In particular, we will measure the time between one execution of Task3 and the next execution of Task3. These measurements are entered into a histogram so we can see the variability in the period. Let I be the difference in TCNT cycles, which we expect to be 50 each time. We calculate the time error or jitter as J = I − 50. Next, we make it unsigned (K = J + 8) and apply upper and lower bounds (if K < 0 then K = 0, if K > 16 then K = 16). A histogram is a count of the number of times an event occurs, so we perform Dbg_Hist[K]++. The first entry, Dbg_Hist[0], is the number of times the time between executing Task3 is more than 96 μs too early. The middle entry, Dbg_Hist[8], is the number of times it is perfect (50 cycles or 800 μs). Similarly, Dbg_Hist[16] is the number of times it is more than 96 μs too late.

Dbg_Hist rmb 34 ;16-bit counts

Question 9.1 What could cause a delay in executing the Task3 ISR?

The debugging instruments shown in Program T9.1 were used to profile the system. FirstFlag is a flag used to skip the first measurement, because there is no previous interrupt to measure the time delay from. PreviousTCNT is the TCNT measurement from the previous execution of Task3. The initialization is called once, and the measurement is called from the start of Task3.

Question 9.2 Why is it important to know the variability in the time between successive executions of a periodic task?

Question 9.3 Consider the situation when two interrupts are requested at the same time. Is one lost or just delayed? If both are executed, which one goes first?

Observation: Profiling is made easier if the subroutine has a single rts exit point at the bottom of the function.

Action: Copy the Tutor9.rtf, Tutor9.uc, and Tutor9.scp files from the web onto your hard drive. Start a fresh copy of TExaS and open these files from within TExaS. Assemble and run the system, observing the logic analyzer. You should see something like Figure T9.1.

Question 9.4 Observe Figure T9.1. PT3 signifies the execution of Task3. The time between the first and second PT3 pulses is noticeably longer than the time between the second and third PT3 pulses. Why?

Dbg_Init
        bset DDRT,#$1F      ;monitors
        movb #1,FirstFlag
        ldx  #Dbg_Hist
        ldy  #34
Dbg.1   clr  1,x+           ;clear
        dbne Y,Dbg.1
        rts

Dbg_Measure
        ldd  TCNT           ;time now
        pshd
        tst  FirstFlag
        bne  Dbg.2
        subd PreviousTCNT
        subd #42            ;means 42*16=672us
        bpl  Dbg.3
        ldd  #0             ;way too small
Dbg.3   cpd  #16
        bls  Dbg.4
        ldd  #16            ;way too big
Dbg.4   ldx  #Dbg_Hist
        lsld                ;16-bit entries
        ldy  D,X
        iny
        sty  D,X
Dbg.2   puld
        std  PreviousTCNT
        clr  FirstFlag
        rts

Program T9.1 Debugging instruments to measure time jitter of a periodic task.
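The histogram update can also be sketched in C. This is an illustrative host-side version of the same idea, not a line-by-line translation of the assembly: the current TCNT value is passed in as a parameter (an assumption made so the logic can be tested without hardware), the constant names are hypothetical, and the elapsed count I is treated as an unsigned 16-bit value.

```c
#include <stdint.h>

#define EXPECTED_TICKS 50   /* 800 us at 16 us per TCNT tick */
#define HIST_SIZE      17   /* entries 0..16, entry 8 is "on time" */

uint16_t Dbg_Hist[HIST_SIZE];
uint16_t PreviousTCNT;
uint8_t  FirstFlag = 1;

/* Clear the histogram and arrange to skip the first measurement. */
void Dbg_Init(void) {
    for (int i = 0; i < HIST_SIZE; i++) {
        Dbg_Hist[i] = 0;
    }
    FirstFlag = 1;
}

/* Call at the start of the periodic task with the current TCNT value. */
void Dbg_Measure(uint16_t now) {
    if (!FirstFlag) {
        uint16_t I = (uint16_t)(now - PreviousTCNT); /* elapsed ticks */
        int32_t  K = (int32_t)I - EXPECTED_TICKS + 8; /* K = J + 8 */
        if (K < 0)  K = 0;    /* far too early */
        if (K > 16) K = 16;   /* far too late */
        Dbg_Hist[K]++;
    }
    PreviousTCNT = now;
    FirstFlag = 0;
}
```

After a long run, a system with little jitter accumulates nearly all of its counts in Dbg_Hist[8] and its immediate neighbors.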

Question 9.5 In the original design specification we expected the four tasks to occupy 20% of the available processor time. Does the data in Figure T9.1 support or reject this hypothesis?

Figure T9.1 Profile the system.

Question 9.6 Observing the listing file, estimate the intrusiveness of the Dbg_Measure instrument.

Action: Close the logic analyzer window (so the simulation runs faster). Start the system and let it run for a long time.

Question 9.7 The Dbg_Hist[8] entry will get very large, but does either Dbg_Hist[0] or Dbg_Hist[16] ever get incremented? What does that mean?

Action: If we were to activate the PLL, changing the E clock from 8 MHz to 24 MHz, then the tasks would run 3 times faster. We can quickly simulate this effect by changing the 400 in the Fiftyus function to 133, making all four tasks complete in about 17 μs instead of 50 μs. Changing the E clock does not change how often the tasks should be run. That is, Task 3 should still run every 800 μs, but now only takes 17 μs to complete. Assemble the new system and let it run for a long time.

Question 9.8 Can you say this new system is real time?

Question 9.9 How would you prove Task 3 is now running in real time?

9.12 Homework Problems

Homework 9.1 Your job is to design a device driver for a computer mouse. Assuming it is to be written in C, give the Mouse.h header file that lists the prototypes for the public functions. You show just the header file, not the implementation file.

Homework 9.2 Your job is to design a device driver for a black and white text-based video screen. There are 24 lines and 80 columns. Assuming it is to be written in C, give the Video.h header file that lists the prototypes for the public functions.

Homework 9.3 In this problem you will write an assembly language subroutine that outputs data to the following printer using a busy-waiting handshake protocol.

Figure Hw9.3 Printer interface.

[Block diagram: the 9S12 pins PA4, PA0, and PB7-PB0 are wired to the printer's Start, Ack, and Data signals.]

The following sequence will print one ASCII character:
1. The microcomputer puts the 8-bit ASCII on the Data lines
2. The microcomputer issues a Start pulse (does not matter how wide)
3. The microcomputer waits for the Ack pulse (Printer is done)

a) Show the subroutine that outputs a character. You may assume the Ack pulse is larger than 10 µs. The 8-bit ASCII data to print is passed by value in Reg B. An example calling sequence is

    ldab #'V      ; ASCII 'V'
    jsr  Output

b) How long is your Start pulse? Explain your calculation.

Homework 9.4 Redesign the printer interface of Homework 9.3 using interrupt synchronization. Connect the printer to Ports H and J and use a key wakeup interrupt on the Ack signal. Write three routines: an initialization subroutine to turn it on, a public function that accepts a null-terminated ASCII string pointed to by register X, and an interrupt service routine triggered by the rising edge of Ack. The ASCII string to print is passed by reference in Reg X. An example calling sequence is

    ldx  #String  ; pointer to null-terminated ASCII string
    jsr  OutString

After the string has been printed, the system should disarm

Homework 9.5 What happens if you forget to execute cli in the initialization in a system using interrupts?

Homework 9.6 What happens if you execute cli as the first instruction in an ISR?

Homework 9.7 What happens if you execute sei as the last instruction in an ISR?

Homework 9.8 Write interrupting software that maintains the time of day. Give the initialization, the ISR, and the interrupt vector. The initial time of day is passed in when initialization is called. Register A contains the initial hour, Register B contains the initial minute. Assume the initial seconds are 0. Implement military time, where the hour goes from 0 to 23.

Homework 9.9 Write interrupting software that counts a global variable at 1 Hz. Give the initialization, the ISR, and the interrupt vector.

Homework 9.10 Assuming the object code is running in RAM, write three debugging subroutines that implement a ScanPoint system. The first subroutine initializes your system. The second subroutine adds a ScanPoint at the address passed into it in Register D. You may assume that the ScanPoint address is the first byte of an op code. When the target program executes that scanned instruction, the values of the registers are displayed, the original instruction is executed, and the program continues execution. Your system should be able to support up to ten ScanPoints. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine removes a ScanPoint at the address passed into it in Register D. For simplicity, you may assume ScanPoints are only placed at single-byte instructions.

Homework 9.11 Assuming the object code is running in RAM, write debugging subroutines that implement single stepping. In particular, write a subroutine that executes the target software at the address passed into it in Register D. You may assume that the starting address is the first byte of an op code. Your system should execute the target program one instruction at a time, showing the values of the registers, and pausing for SCI input after each instruction. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. If the operator types 'q', then the debugging halts and control is returned to the program that called your subroutine. For any other input, you should execute the next instruction. This is an advanced topic and will require output compare interrupts to solve.

Homework 9.12 Create the repeating waveform on PT7 output as shown in Figure Hw9.12. Design the software system using RTI periodic interrupts. Show all the software for this system: direction registers, global variables, stack initialization, RTI initialization, main program, RTI ISR, RTI vector and reset vector. The main program initializes the system, then executes a do-nothing loop. The RTI ISR performs output to Port T. Please make your code that accesses Port T friendly. Variables you need should be allocated in the appropriate places.

Figure Hw9.12 Desired output. [Repeating waveform on PT7 with alternating intervals of 5.12 ms and 10.24 ms.]

Homework 9.13 Create the repeating waveform on PT1 output as shown in Figure Hw9.13. Design the software system using OC1 periodic interrupts. Show all the software for this system: direction registers, global variables, stack initialization, OC1 initialization, main program, OC1 ISR, OC1 vector and reset vector. The main program initializes the system, then executes a do-nothing loop. The OC1 ISR performs output to Port T. Please make your code that accesses Port T friendly. Variables you need should be allocated in the appropriate places. Figure Hw9.13 Desired output.

[Repeating waveform on PT1 with alternating intervals of 5 ms and 10 ms.]

Homework 9.14 Redesign the FSM in Homework 6.24 to run in the background using RTI interrupts. Execute the FSM every 2.048 ms. There are no backward jumps in the ISR.

Homework 9.15 Assume the PLL is running so the E clock is 25 MHz. Redesign the FSM in Homework 6.25 to run in the background using input capture and output compare interrupts. The FSM is run whenever there is a rising edge on PT3. There are no backward jumps in the ISR.

Homework 9.16 Redesign the FSM in Homework 6.26 to run in the background using output compare interrupts. Execute the FSM every 10 ms. There are no backward jumps in the ISR.

Homework 9.17 Redesign the FSM in Homework 6.27 to run in the background using output compare interrupts. Execute the FSM every 5 ms. There are no backward jumps in the ISR.

Homework 9.18 Redesign the FSM in Homework 6.28 to run in the background using TOF interrupts. Execute the FSM every 16.384 ms. There are no backward jumps in the ISR.


Homework 9.19 Assume the PLL is running so the E clock is 24 MHz. Redesign the system in Homework 6.25 without using the FSM to run in the background using input capture and output compare interrupts. An input capture occurs on the rising edge on PT3. The pulse is created with output compare. There are no backward jumps in the ISR.

Homework 9.20 These seven events all occur during each output compare 7 interrupt.
1. The TCNT equals TC7 and the hardware sets the flag bit (e.g., C7F = 1)
2. The output compare 7 vector address is loaded into the PC
3. The I bit in the CCR is set by hardware
4. The software executes movb #$80,TFLG1
5. The CCR, A, B, X, Y, PC are pushed on the stack
6. The software executes something like

    ldd  TC7
    addd #2000
    std  TC7

7. The software executes rti

List one possible order in which the events occur.

Homework 9.21 There is a digital squarewave connected to input PT0. Use input capture on PT0 and output compare on channel 1 to measure the frequency on PT0. The range of values is 0 to 10000 Hz, and the desired resolution is 1 Hz. Have the input capture interrupt on every rising edge of the input signal. Within the input capture ISR, increment a private global called Count. Have the output compare interrupt every 1 second. Within the output compare ISR, copy the Count value to a public global variable called Frequency, then clear Count for the next measurement. For example, if the frequency is 1000 Hz, the variable will be written with 1000. Show the ritual, input capture ISR and output compare ISR. Assume the E clock is 8 MHz.

Homework 9.22 There is a digital squarewave connected to input PT2. Use input capture on PT2 and output compare on channel 3 to measure the frequency on PT2. Have the input capture interrupt on every rising edge of the input signal. Within the input capture ISR, increment a private global called Count. Have the output compare interrupt every 0.1 second. Within the output compare ISR, copy the Count value to a public global variable called Frequency, then clear Count for the next measurement. The range of values is 0 to 10000 Hz, and the desired resolution is 10 Hz. For example, if the frequency is 1000 Hz, the variable will be written with 100. Show the ritual, input capture ISR and output compare ISR. Assume the E clock is 8 MHz.

Homework 9.23 Interface a switch to PJ7. Use positive logic (switch pressed makes PJ7 = 1). The switch bounce time is 10 ms. Use key wakeup on PJ7 and output compare on channel 0 to count the number of times the switch is pressed. Interrupt on the rising edge of PJ7. In the PJ7 ISR, disarm PJ7 and arm OC0 to interrupt in 15 ms. In the OC0 ISR, if the switch is pressed, increment the global variable, Count. The OC0 ISR software should disarm OC0 and rearm PJ7 key wakeup. Show the ritual, key wakeup ISR and output compare ISR. Assume the E clock is 8 MHz. Each touch will cause one key wakeup and one OC interrupt (Count incremented). Similarly, each release will cause one key wakeup and one OC interrupt (Count not incremented).

9.13 Laboratory Assignments

Lab 9.1 Traffic Light Controller

Purpose. This lab has these major objectives: use linked list data structures, create a segmented software system, and apply interrupt synchronization by designing an input-directed traffic light controller.

Description. Design, implement, and test the traffic light system described in Lab 6.5 with the added constraint that the software runs in the background using periodic interrupts. In particular, there are three components: a data structure containing the state graph, an initialization function that is called once to start the machine, and a periodic interrupt service routine that executes the state machine. All the other specifications and constraints described in Lab 6.5 still apply.

10 Numerical Calculations

Chapter 10 objectives are to:
- Introduce fixed-point and use it to develop numerical solutions
- Develop extended precision mathematical calculations
- Define floating point formats

The overall theme of this chapter is numerical calculations. Non-integer values can be represented on the computer using either fixed-point or floating point. Without hardware support, floating point operations run many times slower than fixed-point. Therefore, on a microcontroller like the 9S12 without floating point hardware, we would rather employ fixed-point. In general, we can use fixed-point for situations where the range of values is known at design time, and this range is small.

10.1 Fixed-Point Numbers

We will use fixed-point numbers when we wish to express values in our software that have noninteger values. A fixed-point number contains two parts. The first part is a variable integer, called $I$. This integer may be signed or unsigned. An unsigned fixed-point number is one that has an unsigned variable integer. A signed fixed-point number is one that has a signed variable integer. The precision of a number system is the total number of distinguishable values that can be represented. The precision of a fixed-point format is determined by the number of bits used to store the variable integer. On the 9S12, we typically use 8 bits or 16 bits. Extended precision can be implemented, but the execution speed will be slower because the calculations will have to be performed using software algorithms rather than with hardware instructions. This integer part is saved in memory and is manipulated by software. These manipulations include but are not limited to add, subtract, multiply, divide, convert to BCD, and convert from BCD.

The second part of a fixed-point number is a fixed constant, called $\Delta$. This value is fixed at design time and cannot be changed at run time. The fixed constant is not stored in memory. Usually we specify the value of this fixed constant using software comments to explain our fixed-point algorithm. The value of the fixed-point number is defined as the product of the two parts:

Fixed-point number $\equiv I \cdot \Delta$

The resolution of a number is the smallest difference that can be represented. In the case of fixed-point numbers, the resolution is equal to the fixed constant ($\Delta$). Sometimes we express the resolution of the number as its units. For example, a decimal fixed-point number with a resolution of 0.001 volts is really the same thing as an integer with units of mV. When inputting numbers from a keyboard or outputting numbers to a display, it may be convenient to use decimal fixed-point. With decimal fixed-point the fixed constant is a power of 10.

Decimal fixed-point number $= I \cdot 10^{m}$ for some constant integer $m$


Again, the integer $m$ is fixed and is not stored in memory. Decimal fixed-point will be easy to input or output to humans, while binary fixed-point will be easier to use when performing mathematical calculations. With binary fixed-point the fixed constant is a power of 2.

Binary fixed-point number $= I \cdot 2^{m}$ for some constant integer $m$

Observation: If the range of numbers is known and small, then the numbers can be represented in a fixed-point format.

Checkpoint 10.1: Give an approximation of $\pi$ using the decimal fixed-point ($\Delta = 0.001$) format.

Checkpoint 10.2: Give an approximation of $\pi$ using the binary fixed-point ($\Delta = 2^{-8}$) format.
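As an illustration of the two formats, the following is a minimal sketch of a design-time helper that finds the integer part for a nonnegative value. It is meant to run on a desktop compiler, not on the 9S12, and the function names to_fixed, to_decimal_fixed, and to_binary_fixed are hypothetical. It assumes a decimal format with a fixed constant of 0.001 and a binary format with a fixed constant of 2^-8.

```c
#include <stdint.h>

/* Design-time helper (hypothetical, for a desktop compiler): returns the
   nearest integer part I for a nonnegative value, given the number of
   counts per unit, which is 1/(fixed constant). */
static uint16_t to_fixed(double value, double counts_per_unit) {
    return (uint16_t)(value * counts_per_unit + 0.5);  /* round to nearest */
}

/* Decimal fixed-point, fixed constant 0.001, so 1000 counts per unit */
uint16_t to_decimal_fixed(double value) { return to_fixed(value, 1000.0); }

/* Binary fixed-point, fixed constant 2^-8, so 256 counts per unit */
uint16_t to_binary_fixed(double value) { return to_fixed(value, 256.0); }
```

For example, the value 1.41421356 becomes 1414 in the decimal format (representing 1.414) and 362 in the binary format (representing 362/256 = 1.4140625). Only the integer would be stored in the 9S12; the fixed constant lives in the comments.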

In the first example, we will develop the equations that a 9S12 would need to implement a digital voltmeter. The 9S12 has a built-in analog to digital converter (ADC) that can be used to transform an analog signal into digital form. The 10-bit ADC analog input range is 0 to 5 V, and the ADC digital output varies 0 to 1023 respectively. Let Vin be the analog voltage in volts and N be the digital ADC output, then the equation that relates the analog to digital conversion is

Vin = 5*N/1023 = 0.0048876*N

Resolution is defined as the smallest change in voltage that the ADC can detect. This ADC has a resolution of about 5 mV. In other words, the analog voltage must increase or decrease by 5 mV for the digital output of the ADC to change by at least one bit. It would be inappropriate to save the voltage as an integer, because the only integers in this range are 0, 1, 2, 3, 4, and 5. Since the 9S12 does not support floating point, the voltage data will be saved in fixed-point format. Decimal fixed-point is chosen because the voltage data for this voltmeter will be displayed. A fixed-point resolution of 0.001 V is chosen because it is slightly smaller (better) than the ADC resolution. Table 10.1 shows the performance of the system. The table shows us that we need to store the variable part of the fixed-point number in a 16-bit variable.

Table 10.1 Performance data of a microcomputer-based voltmeter.

Vin (V)         N                     I (0.001 V)
Analog input    ADC digital output    variable part of the fixed-point data
0.000           0                     0
0.005           1                     5
1.000           205                   1000
2.500           512                   2500
5.000           1023                  5000

One possible software formula to convert N into I is as follows.

I = (5000*N + 512)/1023

It is very important to carefully consider the order of operations when performing multiple integer calculations. There are two mistakes that can happen. The first error is overflow, and it is easy to detect. Overflow occurs when the result of a calculation exceeds the range of the number system. The two solutions of the overflow problem were discussed earlier, promotion and ceiling/floor. The other error is called drop-out. Drop-out occurs after a right shift or a divide, and the consequence is that an intermediate result loses its ability to represent all of the values. To avoid drop-out, it is very important to divide last when performing multiple integer calculations. If you divided first, e.g., I = 5000*(N/1023), then the values of I would be only 0 or 5000. The addition of "512" has the effect of rounding to the closest integer. The value 512 is selected because it is about one half of the denominator. For example, the calculation (5000*N)/1023 = 4 for N = 1, whereas the "(5000*1 + 512)/1023" calculation yields the better answer of 5. The display algorithm is given as Program 10.1.

Program 10.1 Print unsigned 16-bit decimal fixed-point number to an output device.

    void OutFDec(unsigned short n){ // fixed constant is 0.001
      OutUDec(n/1000);              // digits to the left of the decimal point
      OutChar('.');                 // decimal point
      OutUDec((n%1000)/100);        // tenths digit
      OutUDec((n%100)/10);          // hundredths digit
      OutUDec(n%10);                // thousandths digit
      OutChar('V');}                // units
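The voltmeter conversion and display steps can be tried on a desktop compiler with a minimal sketch like the one below. The names adc_to_mv, adc_to_mv_divide_first, and format_fdec are hypothetical; format_fdec is a stand-in that writes the digits into a string instead of calling OutUDec and OutChar, and it assumes the digits to the left of the decimal point are n/1000 for a fixed constant of 0.001 V.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a 10-bit ADC sample n (0..1023) to I in units of 0.001 V.
   The product 5000*n can reach 5,115,000, so it is promoted to 32 bits.
   Adding 512 rounds to nearest; the divide comes last to avoid drop-out. */
uint16_t adc_to_mv(uint16_t n) {
    return (uint16_t)((5000UL * n + 512UL) / 1023UL);
}

/* Counter-example: dividing first drops out to 0 for every n below 1023. */
uint16_t adc_to_mv_divide_first(uint16_t n) {
    return (uint16_t)(5000U * (n / 1023U));
}

/* Hypothetical stand-in for the display routine: format I as "d.dddV". */
void format_fdec(uint16_t i, char *buf, size_t len) {
    snprintf(buf, len, "%u.%u%u%uV",
             (unsigned)(i / 1000),          /* digits left of the decimal point */
             (unsigned)((i % 1000) / 100),  /* tenths digit */
             (unsigned)((i % 100) / 10),    /* hundredths digit */
             (unsigned)(i % 10));           /* thousandths digit */
}
```

With these definitions, adc_to_mv(1) gives 5 and adc_to_mv(1023) gives 5000, while adc_to_mv_divide_first(1022) gives 0, which shows the drop-out that occurs when the divide is not performed last.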

When adding or subtracting two fixed-point numbers with the same $\Delta$, we simply add or subtract their integer parts. First, let x, y, z be three fixed-point numbers with the same $\Delta$. Let $x = I \cdot \Delta$, $y = J \cdot \Delta$, and $z = K \cdot \Delta$. To perform $z = x + y$, we simply calculate $K = I + J$. Similarly, to perform $z = x - y$, we simply calculate $K = I - J$.

When adding or subtracting fixed-point numbers with different fixed parts, we must first convert the two inputs to the format of the result before adding or subtracting. This is where binary fixed-point is more convenient, because the conversion process involves shifting rather than multiplication/division. In this next example, let x, y, z be three binary fixed-point numbers with different $\Delta$s. In particular, we define x to be $I \cdot 2^{-5}$, y to be $J \cdot 2^{-2}$, and z to be $K \cdot 2^{-3}$. To convert x to the format of z, we divide I by 4 (right shift twice). To convert y to the format of z, we multiply J by 2 (left shift once). To perform $z = x + y$, we calculate

K = (I >> 2) + (J << 1)

For the general case, we define x to be $I \cdot 2^{n}$, y to be $J \cdot 2^{m}$, and z to be $K \cdot 2^{p}$. To perform any general operation, we derive the fixed-point calculation by starting with the desired result. For addition, we have $z = x + y$. Next, we substitute the definitions of each fixed-point parameter

$K \cdot 2^{p} = I \cdot 2^{n} + J \cdot 2^{m}$

Lastly, we solve for the integer part of the result

$K = I \cdot 2^{n-p} + J \cdot 2^{m-p}$

For multiplication, we have $z = x \cdot y$. Again, we substitute the definitions of each fixed-point parameter

$K \cdot 2^{p} = I \cdot 2^{n} \cdot J \cdot 2^{m}$

Lastly, we solve for the integer part of the result

$K = I \cdot J \cdot 2^{n+m-p}$

For division, we have $z = x/y$. Again, we substitute the definitions of each fixed-point parameter

$K \cdot 2^{p} = (I \cdot 2^{n}) / (J \cdot 2^{m})$

Lastly, we solve for the integer part of the result

$K = (I/J) \cdot 2^{n-m-p}$

Again, it is very important to carefully consider the order of operations when performing multiple integer calculations. We must worry about overflow and drop-out. In particular, in the division example, if $(n - m - p)$ is positive then the left shift ($I \cdot 2^{n-m-p}$) should be performed before the divide (/J). We can use these fixed-point algorithms to perform complex operations using the integer functions of our 9S12.
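The add, multiply, and divide rules above can be sketched in C as follows. This is only an illustration under stated assumptions: the inputs are unsigned, x has a fixed constant of 2^-5 and y has a fixed constant of 2^-2; the sum and product use a result format of 2^-3, and the quotient uses a result format of 2^-6 so that the shift exponent is positive. The function names are hypothetical.

```c
#include <stdint.h>

/* x = I*2^-5, y = J*2^-2. Intermediate results are promoted to 32 bits. */

/* z = x + y with z = K*2^-3: K = I*2^(-5+3) + J*2^(-2+3) */
uint16_t fix_add(uint16_t i, uint16_t j) {
    return (uint16_t)(((uint32_t)i >> 2) + ((uint32_t)j << 1));
}

/* z = x * y with z = K*2^-3: K = I*J*2^(-5-2+3) = (I*J) >> 4.
   Multiply first and shift last to limit drop-out. */
uint16_t fix_mul(uint16_t i, uint16_t j) {
    return (uint16_t)(((uint32_t)i * j) >> 4);
}

/* z = x / y with z = K*2^-6: K = (I/J)*2^(-5+2+6) = (I << 3) / J.
   The exponent is positive, so shift left before dividing.
   Assumes j is not zero. */
uint16_t fix_div(uint16_t i, uint16_t j) {
    return (uint16_t)(((uint32_t)i << 3) / j);
}
```

For example, with x = 1.0 (I = 32) and y = 0.75 (J = 3), fix_add returns 14 (14/8 = 1.75), fix_mul returns 6 (6/8 = 0.75), and fix_div returns 85 (85/64 is about 1.328). Dividing before shifting would instead give (32/3) << 3 = 80, or 1.25, which is the drop-out the text warns about.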

Example 10.1 Rewrite the following digital filter using fixed-point calculations.

$y = x - 0.0532672 \cdot x_1 + x_2 + 0.0506038 \cdot y_1 - 0.9025 \cdot y_2$

Solution In this case, the variables y, y1, y2, x, x1, and x2 are all integers, but the constants will be expressed in binary fixed-point format. The value 0.0532672 can be approximated by $14 \cdot 2^{-8}$. The value 0.0506038 can be approximated by $13 \cdot 2^{-8}$. Lastly, the value 0.9025 can be approximated by $231 \cdot 2^{-8}$. The fixed-point implementation of this digital filter is

y = x + x2 + ((-14*x1 + 13*y1 - 231*y2) >> 8)

Common Error: Lazy or incompetent programmers use floating-point in many situations where fixed-point would be preferable.

Observation: As the fixed constant is made smaller, the accuracy of the fixed-point representation is improved, but the variable integer part also increases. Unfortunately, larger integers will require more bits for storage and calculations.

Checkpoint 10.3: Using a fixed constant of $2^{-8}$, rewrite the digital equation $F = 1.8 \cdot C + 32$ in binary fixed-point format.

Checkpoint 10.4: Using a fixed constant of $10^{-3}$, rewrite the digital filter $y = x - 0.0532672 \cdot x_1 + x_2 + 0.0506038 \cdot y_1 - 0.9025 \cdot y_2$ in decimal fixed-point format.

Checkpoint 10.5: Assume resistors R1, R2, R3 are the integer parts of 16-bit unsigned binary fixed-point numbers with a fixed constant of $2^{-4}$ ohms. Write an equation to calculate R3 = R1 || R2 (parallel combination).
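One step of the filter in Example 10.1 might look like the following in C. This is a sketch under assumptions, not the book's listing: the signs of the terms are assumed to be those of a notch filter (x1 and y2 subtracted, x2 and y1 added), the names asr8 and filter_step are hypothetical, and the shift is written out by hand because the result of >> on a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Shift right by 8 with rounding toward minus infinity, written so the
   result does not depend on how the compiler shifts negative values. */
static int32_t asr8(int32_t v) {
    return (v >= 0) ? (v >> 8) : -((-v + 255) >> 8);
}

/* One filter step with coefficients 14, 13, 231 in units of 2^-8.
   ASSUMED signs: y = x - 14/256*x1 + x2 + 13/256*y1 - 231/256*y2.
   The products are summed in 32 bits and the shift is done last. */
int16_t filter_step(int16_t x, int16_t x1, int16_t x2,
                    int16_t y1, int16_t y2) {
    int32_t acc = -14L * x1 + 13L * y1 - 231L * y2;
    return (int16_t)(x + x2 + asr8(acc));
}
```

Summing the three products before the single shift keeps the drop-out to at most one count per output, whereas shifting each product separately would lose up to three.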

10.2 *Extended Precision Calculations

In this section, we will study various techniques to perform extended precision calculations. Sometimes complex calculations can be performed simply by combining simpler operations, while at other times, more sophisticated algorithms will be required. Three 32-bit local variables are used in the examples of this section. For most situations, local variables are more appropriate than globals, although using globals is often faster and easier to debug. Assume there are 12 bytes allocated on the stack pointed to by the stack pointer SP, and the following local variable binding.

    N set 0 ;32-bit local
    M set 4 ;32-bit local
    P set 8 ;32-bit local


10.2.1 Addition and Subtraction

Program 10.2 gives a 32-bit addition algorithm. The approach starts with the least significant byte and uses the add-with-carry operation to combine the 8-bit additions to form the 32-bit operation.

Program 10.2 A 32-bit addition operation.

    ; 32-bit addition P=N+M
    ; Input: Two 32-bit numbers N,M
    ; Output: One 32-bit sum P
    ; Error: C/V set for unsigned/signed overflow
    add32 ldaa N+3,sp ; start with least significant byte
          adda M+3,sp
          staa P+3,sp
          ldaa N+2,sp ; next byte
          adca M+2,sp ; carry from previous addition
          staa P+2,sp
          ldaa N+1,sp ; next byte
          adca M+1,sp ; carry from previous addition
          staa P+1,sp
          ldaa N,sp   ; last byte
          adca M,sp   ; carry from previous addition
          staa P,sp
    ; C bit set if unsigned overflow
    ; V bit set if signed overflow, Z bit is not correct

Checkpoint 10.6: Why isn’t the Z bit correct?

Program 10.3 gives a 32-bit subtraction algorithm. Again, the approach starts with the least significant byte and uses the subtract-with-borrow operation to combine the 8-bit subtractions to form the 32-bit operation. Similar to addition, the V and C bits are properly set, while the Z bit is incorrect.

Program 10.3 A 32-bit subtraction operation.

    sub32 ldaa N+3,sp ; start with least significant byte
          suba M+3,sp
          staa P+3,sp
          ldaa N+2,sp ; next byte
          sbca M+2,sp ; borrow from previous subtraction
          staa P+2,sp
          ldaa N+1,sp ; next byte
          sbca M+1,sp ; borrow from previous subtraction
          staa P+1,sp
          ldaa N,sp   ; last byte
          sbca M,sp   ; borrow from previous subtraction
          staa P,sp
    ; C bit set if unsigned overflow
    ; V bit set if signed overflow, Z bit is not correct
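The byte-at-a-time approach of Programs 10.2 and 10.3 can be mirrored in C for testing on a desktop compiler. The sketch below is an illustration, not a translation of the assembly: the names add32_bytes and sub32_bytes are hypothetical, each 32-bit number is assumed to be stored as four bytes with the most significant byte first (index 0), and the return value plays the role of the C bit after the last byte.

```c
#include <stdint.h>

/* P = N + M, one byte at a time, least significant byte (index 3) first.
   Returns the final carry: 1 means unsigned overflow. */
uint8_t add32_bytes(const uint8_t n[4], const uint8_t m[4], uint8_t p[4]) {
    uint16_t carry = 0;
    for (int k = 3; k >= 0; k--) {
        uint16_t sum = (uint16_t)(n[k] + m[k] + carry); /* like adda/adca */
        p[k] = (uint8_t)sum;                            /* low 8 bits */
        carry = sum >> 8;                               /* carry to next byte */
    }
    return (uint8_t)carry;
}

/* P = N - M, one byte at a time, least significant byte (index 3) first.
   Returns the final borrow: 1 means the unsigned result wrapped around. */
uint8_t sub32_bytes(const uint8_t n[4], const uint8_t m[4], uint8_t p[4]) {
    uint16_t borrow = 0;
    for (int k = 3; k >= 0; k--) {
        uint16_t diff = (uint16_t)(n[k] - m[k] - borrow); /* like suba/sbca */
        p[k] = (uint8_t)diff;                             /* low 8 bits */
        borrow = (diff >> 8) & 1;                         /* set if it went negative */
    }
    return (uint8_t)borrow;
}
```

For instance, adding $0001FFFF and $00000001 produces $00020000 with no final carry, because the carry ripples through the two low bytes into the third.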

Program 10.4 presents functions that add (R = A + B) and subtract (R = A - B) two unsigned 8-bit values, using promotion to detect errors. The assembly language version implements the 16-bit local result in Register D. This C program was previously presented as Program 3.2.

Program 10.4 Using promotion to detect and compensate for unsigned overflow errors.

    add  ldab A     ;promote to 16 bits
         clra
         addb B
         adca #0    ;A+B (16 bits)
         cpd  #255
         bls  aOK
         ldd  #255  ;ceiling
    aOK  stab R     ;demote
         rts
    sub  ldab A     ;promote to 16 bits
         clra
         subb B
         sbca #0    ;A-B (16 bits)
         cpd  #0
         bge  sOK
         ldd  #0    ;floor
    sOK  stab R     ;demote
         rts

    unsigned char A,B,R;
    void add(void){ unsigned short result;
      result = A+B;        /* promote */
      if(result>255){      /* overflow ?*/
        result = 255;      /* yes */
      }
      R = result;          /* demote */
    }
    void sub(void){ short result;
      result = A-B;        /* promote */
      if(result<0){        /* below zero? */
        result = 0;        /* floor */
      }
      R = result;          /* demote */
    }

if(result127){ /* result = 127; /* } if(result127){ /* result = 127; /* } if(result=M)

    ; C,V bits set on divide by zero (M=0)
    ; modifies Reg A,B,X,Y
    i     set  0        ; loop counter
    div32 leas -1,s     ; allocate i
          ldd  M
          bne  d32A     ; divisor not zero
          ldd  M+2
          bne  d32A     ; divisor not zero
          sev           ; divide by zero
          sec
          bra  d32E
    d32A  movw #0,M+4
          movw #0,M+6   ; divisor is 64 bits, right justified
          movw #0,Q
          movw #0,Q+2   ; quotient=0
          movb #32,i,s  ; i=32
    d32B  ldx  #M
          jsr  lsr64    ; M=M>>1
          ldx  #Q
          jsr  lsl32    ; Q=Q