Communicating Process Architectures 2011:  WoTUG-33


COMMUNICATING PROCESS ARCHITECTURES 2011

Concurrent Systems Engineering Series Series Editors: M.R. Jane, J. Hulskamp, P.H. Welch, D. Stiles and T.L. Kunii

Volume 68

Previously published in this series:
Volume 67, Communicating Process Architectures 2009 (WoTUG-32), P.H. Welch, H.W. Roebbers, J.F. Broenink, F.R.M. Barnes, C.G. Ritson, A.T. Sampson, G.S. Stiles and B. Vinter
Volume 66, Communicating Process Architectures 2008 (WoTUG-31), P.H. Welch, S. Stepney, F.A.C. Polack, F.R.M. Barnes, A.A. McEwan, G.S. Stiles, J.F. Broenink and A.T. Sampson
Volume 65, Communicating Process Architectures 2007 (WoTUG-30), A.A. McEwan, S. Schneider, W. Ifill and P.H. Welch
Volume 64, Communicating Process Architectures 2006 (WoTUG-29), P.H. Welch, J. Kerridge and F.R.M. Barnes
Volume 63, Communicating Process Architectures 2005 (WoTUG-28), J.F. Broenink, H.W. Roebbers, J.P.E. Sunter, P.H. Welch and D.C. Wood
Volume 62, Communicating Process Architectures 2004 (WoTUG-27), I.R. East, J. Martin, P.H. Welch, D. Duce and M. Green
Volume 61, Communicating Process Architectures 2003 (WoTUG-26), J.F. Broenink and G.H. Hilderink
Volume 60, Communicating Process Architectures 2002 (WoTUG-25), J.S. Pascoe, P.H. Welch, R.J. Loader and V.S. Sunderam
Volume 59, Communicating Process Architectures 2001 (WoTUG-24), A. Chalmers, M. Mirmehdi and H. Muller
Volume 58, Communicating Process Architectures 2000 (WoTUG-23), P.H. Welch and A.W.P. Bakkers
Volume 57, Architectures, Languages and Techniques for Concurrent Systems (WoTUG-22), B.M. Cook
Volumes 54–56, Computational Intelligence for Modelling, Control & Automation, M. Mohammadian
Volume 53, Advances in Computer and Information Sciences ’98, U. Güdükbay, T. Dayar, A. Gürsoy and E. Gelenbe

Transputer and OCCAM Engineering Series
Volume 45, Parallel Programming and Applications, P. Fritzson and L. Finmo
Volume 44, Transputer and Occam Developments (WoTUG-18), P. Nixon
Volume 43, Parallel Computing: Technology and Practice (PCAT-94), J.P. Gray and F. Naghdy
Volume 42, Transputer Research and Applications 7 (NATUG-7), H. Arabnia

ISSN 1383-7575 (print)
ISSN 1879-8039 (online)

Communicating Process Architectures 2011 WoTUG-33

Edited by

Peter H. Welch University of Kent, UK

Adam T. Sampson University of Abertay Dundee, UK

Jan B. Pedersen University of Nevada, Las Vegas, USA

Jon Kerridge Edinburgh Napier University, UK

Jan F. Broenink University of Twente, the Netherlands

and

Frederick R.M. Barnes University of Kent, UK

Proceedings of the 33rd WoTUG Technical Meeting, 19–22 June 2011, University of Limerick, Ireland

Amsterdam • Berlin • Tokyo • Washington, DC

© 2011 The authors and IOS Press.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.
ISBN 978-1-60750-773-4 (print)
ISBN 978-1-60750-774-1 (online)
Library of Congress Control Number: 2011929917

Publisher: IOS Press BV, Nieuwe Hemweg 6B, 1013 BG Amsterdam, Netherlands; fax: +31 20 687 0019; e-mail: [email protected]
Distributor in the USA and Canada: IOS Press, Inc., 4502 Rachael Manor Drive, Fairfax, VA 22032, USA; fax: +1 703 323 3668; e-mail: [email protected]

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS

Communicating Process Architectures 2011
P.H. Welch et al. (Eds.)
IOS Press, 2011
© 2011 The authors and IOS Press. All rights reserved.


Preface

This thirty-third Communicating Process Architectures conference, CPA 2011, takes place at the University of Limerick, 19-22 June, 2011. It is hosted by Lero, the Irish Software Engineering Research Centre, and (as for CPA 2009) co-located with FM 2011 (the 17th International Symposium on Formal Methods). Also co-located this year are SEW-34 (the 34th Annual IEEE Software Engineering Workshop) and several specialist workshops.

We are very pleased this year to have Gavin Lowe, Professor of Computer Science at the University of Oxford Computing Laboratory, as our keynote speaker. His research over the past two decades has made significant contributions to the field of concurrency, with special emphasis on CSP and the formal modelling of computer security. His paper addresses a long-standing and crucial issue for this community: the verified implementation of CSP external choice, with no restrictions.

We have also received a good set of papers covering many of the key issues in modern computer science, which all seem to concern concurrency in one form or another these days. Inside, you will find papers on concurrency models and their theory, pragmatics (the effective use of multicores), language ideas and implementation (for mobile processes, generalised forms of choice), tools to assist verification and performance, applications (large-scale simulation, robotics, web servers), benchmarks (for scientific and distributed computing) and, perhaps most importantly, education. They reflect the increasing relevance of concurrency both to express and manage complex problems and to exploit readily available parallel hardware.

Authors from all around the world, old hands and new faces, PhD students and professors will be gathered here this week. We hope everyone will have a good time and engage in many stimulating discussions and much learning, both in the formal sessions of the conference and in the many opportunities afforded by the evening receptions and dinners, which happen every night and run into the early hours beyond.

We thank the authors for their submissions and the Programme Committee for their hard work in reviewing the papers. We also thank Mike Hinchey at Lero for inviting CPA 2011 to be part of the week of events surrounding FM 2011 and for being so helpful during the long months of planning. Finally, we thank Patsy Finn and Susan Mitchell, also at Lero, for all the detailed (and extra) work they put into researching and making all the special arrangements we requested for CPA.

Peter Welch (University of Kent), Adam Sampson (University of Abertay Dundee), Frederick Barnes (University of Kent), Jan B. Pedersen (University of Nevada, Las Vegas), Jan Broenink (University of Twente), Jon Kerridge (Edinburgh Napier University).


Editorial Board

Dr. Frederick R.M. Barnes, School of Computing, University of Kent, UK
Dr. Jan F. Broenink, Control Engineering, Faculty EEMCS, University of Twente, The Netherlands
Prof. Jon Kerridge, School of Computing, Edinburgh Napier University, UK
Prof. Jan B. Pedersen, School of Computer Science, University of Nevada, Las Vegas, USA
Dr. Adam T. Sampson, Institute of Arts, Media and Computer Games, University of Abertay Dundee, UK
Prof. Peter H. Welch, School of Computing, University of Kent, UK (Chair)


Reviewing Committee

Dr. Alastair R. Allen, Aberdeen University, UK
Mr. Philip Armstrong, University of Oxford, UK
Dr. Paul S. Andrews, University of York, UK
Dr. Rick Beton, Equal Experts, UK
Dr. John Markus Bjørndalen, University of Tromsø, Norway
Dr. Jim Bown, University of Abertay Dundee, UK
Dr. Phil Brooke, University of Teesside, UK
Mr. Neil C.C. Brown, University of Kent, UK
Dr. Kevin Chalmers, Edinburgh Napier University, UK
Dr. Barry Cook, 4Links Ltd., UK
Mr. Martin Ellis, University of Kent, UK
Dr. Oliver Faust, Altreonic, Belgium
Dr. Bill Gardner, University of Guelph, Canada
Prof. Michael Goldsmith, University of Warwick, UK
Mr. Marcel Groothuis, University of Twente, The Netherlands
Dr. Gerald Hilderink, The Netherlands
Dr. Kohei Honda, Queen Mary & Westfield College, UK
Mr. Jason Hurt, University of Nevada, Las Vegas, USA
Ms. Ruth Ivimey-Cook, UK
Prof. Matthew Jadud, Allegheny College, USA
Mr. Brian Kauke, University of Nevada, Las Vegas, USA
Prof. Gavin Lowe, University of Oxford, UK
Dr. Jeremy M.R. Martin, GlaxoSmithKline, UK
Dr. Alistair McEwan, University of Leicester, UK
Dr. Fiona A.C. Polack, University of York, UK
Mr. Carl G. Ritson, University of Kent, UK
Mr. Herman Roebbers, TASS Technology Solutions BV, The Netherlands
Mr. Mike Rogers, University of Nevada, Las Vegas, USA
Mr. David Sargeant, University of Nevada, Las Vegas, USA
Prof. Steve Schneider, University of Surrey, UK
Prof. Marc L. Smith, Vassar College, USA
Prof. Susan Stepney, University of York, UK
Mr. Bernard Sufrin, University of Oxford, UK
Dr.ir. Johan P.E. Sunter, TASS, The Netherlands
Dr. Øyvind Teig, Autronica Fire and Security, Norway
Dr. Gianluca Tempesti, University of Surrey, UK
Dr. Helen Treharne, University of Surrey, UK
Dr. Kevin Vella, University of Malta, Malta
Prof. Brian Vinter, Copenhagen University, Denmark
Prof. Alan Wagner, University of British Columbia, Canada
Prof. Alan Winfield, University of the West of England, UK
Mr. Doug N. Warren, University of Kent, UK
Prof. George C. Wells, Rhodes University, South Africa



Contents

Preface
  Peter Welch, Adam Sampson, Frederick Barnes, Jan B. Pedersen, Jan Broenink and Jon Kerridge   v

Editorial Board   vi

Reviewing Committee   vii

Implementing Generalised Alt – A Case Study in Validated Design Using CSP
  Gavin Lowe   1

Verification of a Dynamic Channel Model Using the SPIN Model Checker
  Rune Møllegaard Friborg and Brian Vinter   35

Programming the CELL-BE Using CSP
  Kenneth Skovhede, Morten N. Larsen and Brian Vinter   55

Static Scoping and Name Resolution for Mobile Processes with Polymorphic Interfaces
  Jan Bækgaard Pedersen and Matthew Sowders   71

Prioritised Choice over Multiway Synchronisation
  Douglas N. Warren   87

An Analysis of Programmer Productivity Versus Performance for High Level Data Parallel Programming
  Alex Cole, Alistair McEwan and Satnam Singh   111

Experiments in Multicore and Distributed Parallel Processing Using JCSP
  Jon Kerridge   131

Evaluating an Emergent Behaviour Algorithm in JCSP for Energy Conservation in Lighting Systems
  Anna Kosek, Aly Syed and Jon Kerridge   143

LUNA: Hard Real-Time, Multi-Threaded, CSP-Capable Execution Framework
  M.M. Bezemer, R.J.W. Wilterdink and J.F. Broenink   157

Concurrent Event-Driven Programming in occam-π for the Arduino
  Christian L. Jacobsen, Matthew C. Jadud, Omer Kilic and Adam T. Sampson   177

Fast Distributed Process Creation with the XMOS XS1 Architecture
  James Hanlon and Simon J. Hollis   195

Serving Web Content with Dynamic Process Networks in Go
  James Whitehead II   209

Performance of the Distributed CPA Protocol and Architecture on Traditional Networks
  Kevin Chalmers   227

Object Store Based Simulation Interworking
  Carl G. Ritson, Paul S. Andrews and Adam T. Sampson   243

A Model for Concurrency Using Single-Writer Single-Assignment Variables
  Matthew Huntbach   255

The Computation Time Process Model
  Martin Korsgaard and Sverre Hendseth   273

SystemVerilogCSP: Modeling Digital Asynchronous Circuits Using SystemVerilog Interfaces
  Arash Saifhashemi and Peter A. Beerel   287

Process-Oriented Subsumption Architectures in Swarm Robotic Systems
  Jeremy C. Posso, Adam T. Sampson, Jonathan Simpson and Jon Timmis   303

A Systems Re-Engineering Case Study: Programming Robots with occam and Handel-C
  Dan Slipper and Alistair A. McEwan   317

The Flying Gator: Towards Aerial Robotics in occam-π
  Ian Armstrong, Michael Pirrone-Brusse, Anthony Smith and Matthew Jadud   329

CONPASU-Tool: A Concurrent Process Analysis Support Tool Based on Symbolic Computation
  Yoshinao Isobe   341

Development of an ML-Based Verification Tool for Timed CSP Processes
  Takeshi Yamakawa, Tsuneki Ohashi and Chikara Fukunaga   363

Mobile Processes and Call Channels with Variant Interfaces (a Duality)
  Eric Bonnici and Peter H. Welch   377

Adding Formal Verification to occam-π
  Peter H. Welch, Jan B. Pedersen, Fred R.M. Barnes, Carl G. Ritson and Neil C.C. Brown   379

Subject Index   381

Author Index   383

Communicating Process Architectures 2011
P.H. Welch et al. (Eds.)
IOS Press, 2011
© 2011 The authors and IOS Press. All rights reserved.
doi: 10.3233/978-1-60750-774-1-1


Implementing Generalised Alt
A Case Study in Validated Design using CSP

Gavin LOWE
Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK; e-mail: [email protected]

Abstract. In this paper we describe the design and implementation of a generalised alt operator for the Communicating Scala Objects library. The alt operator provides a choice between communications on different channels. Our generalisation removes previous restrictions on the use of alts that prevented both ends of a channel from being used in an alt. The cost of the generalisation is a much more difficult implementation, but one that still gives very acceptable performance. In order to support the design, and greatly increase our confidence in its correctness, we build CSP models corresponding to our design, and use the FDR model checker to analyse them.

Keywords. Communicating Scala Objects, alt, CSP, FDR.

Introduction

Communicating Scala Objects (CSO) [14] is a library of CSP-like communication primitives for the Scala programming language [12]. As a simple example, consider the following code:

  val c = OneOne[String];
  def P = proc{ c!"Hello world!"; }
  def Q = proc{ println(c?); }
  (P || Q)();

The first line defines a (synchronous) channel c that can communicate Strings (intended to be used by one sender and one receiver—hence the name OneOne; CSO also has channels whose ends can be shared); the second and third lines define processes (more accurately, threads) that, respectively, send and receive a value over the channel; the final line combines the processes in parallel, and runs them.

CSO —inspired by occam [9]— includes a construct, alt, to provide a choice between communicating on different channels. In this paper we describe the design and implementation of a generalisation of the alt operator. We begin by describing the syntax and (informal) semantics of the operator in more detail. As an initial example, the code

  alt ( c --> { println("c: "+(c?)); }
      | d --> { println("d: "+(d?)); } )

tests whether the environment is willing to send this process a value on either c or d, and if so fires an appropriate branch. Note that the body of each branch is responsible for performing the actual input: the alt just performs the selection, based on the communications offered by the environment. Channels may be closed, preventing further communication; each alt considers only its open channels.


Each branch of an alt may have a boolean guard. For example, in the alt:

  alt ( (n >= 0 &&& c) --> { println("c: "+(c?)); }
      | d --> { println("d: "+(d?)); } )

the communication on c is enabled only if n >= 0. An alt may also have a timeout branch, for example:

  alt ( c --> { println("c: "+(c?)); }
      | after(500) --> { println("timeout"); } )

If no communication has taken place on a different branch within the indicated time (in milliseconds) then the alt times out and selects the timeout branch. Finally, an alt may have an orelse branch, for example:

  alt ( (n >= 0 &&& c) --> { println("c: "+(c?)); }
      | orelse --> { println("orelse"); } )

If every other branch is disabled —that is, the guard is false or the channel is closed— then the orelse branch is selected. (By contrast, if there is no orelse branch and all the other branches are disabled, then the alt throws an Abort exception.) Each alt may have at most one timeout or orelse branch.

In the original version of CSO —as in occam— alts could perform selections only between input ports (the receiving ends of channels, known as InPorts). Later this was extended to include output ports (the sending ends of channels, known as OutPorts), for example:

  alt ( in -?-> { println("in: "+(in?)); }
      | out -!-> { out!2011; } )

The different arrows -?-> and -!-> show whether the InPort or OutPort of the channel is to be used; the simple arrow --> can be considered syntactic sugar for -?->. Being able to combine inputs and outputs in the same alt can be useful in a number of circumstances. The following example comes from the bag-of-tasks pattern [4]. A server process maintains a collection of tasks (in this case, in a stack) to be passed to worker processes on channel toWorker. Workers can return (sub-)tasks to the server on channel fromWorker. In addition, a worker can indicate that it has completed its last task on channel done; the server maintains a count, busyWorkers, of the workers who are currently busy. The main loop of the server can be defined as follows:

  serve( (!stack.isEmpty &&& toWorker) -!-> { toWorker!(stack.pop); busyWorkers += 1; }
       | (busyWorkers > 0 &&& fromWorker) -?-> { stack.push(fromWorker?); }
       | (busyWorkers > 0 &&& done) -?-> { done?; busyWorkers -= 1 } )

The construct serve represents an alt that is repeatedly executed until all its branches are disabled — in this case, assuming no channels are closed, when the stack is empty and busyWorkers = 0. In the above example, it is possible to replace the output branch (the first branch) by one where the server receives a request from a worker (on channel req) before sending the task:

  (!stack.isEmpty &&& req) -?-> { req?; toWorker!(stack.pop); busyWorkers += 1; }

However, such a solution adds complexity for the programmer; a good API should hide such complexities. Further, such a solution is not always possible. However, the existing implementation of alt has the following restriction [15]:

  A channel's input and output ports may not both simultaneously participate in alts.



This restriction makes the implementation of alts considerably easier. It means that at least one end of each communication will be unconditional, i.e. that offer to communicate will not be withdrawn once it is made. However, the restriction can prove inconvenient in practice, preventing many natural uses of alts. For example, consider a ring topology, where each node may pass data to its clockwise neighbour or receive data from its anticlockwise neighbour; this pattern can be used to adapt the above bag-of-tasks to a distributed-bag-of-tasks as follows, where give and get are aliases for the channels connecting this node to its neighbours:1

  serve( (!stack.isEmpty &&& toWorker) -!-> { toWorker!(stack.pop); workerBusy = true; }
       | (workerBusy &&& fromWorker) -?-> { stack.push(fromWorker?); }
       | (workerBusy &&& done) -?-> { done?; workerBusy = false; }
       | (!stack.isEmpty &&& give) -!-> { give!(stack.pop); }
       | ((!workerBusy && stack.isEmpty) &&& get) -?-> { stack.push(get?); } )

However, now the InPorts and OutPorts of channels connecting nodes are both participating in alts, contrary to the above restriction. One goal of this paper is to present a design and implementation for a generalised alt operator, that overcomes the above restriction.

McEwan [11] presents a formal model for a solution to this problem, based on a two-phase commit protocol, with the help of a centralised controller. Welch et al. [17,18] implement a generalised alt, within the JCSP library. The implementation makes use of a single (system-wide) Oracle server process, which arbitrates in all alts that include an output branch or a barrier branch (which allows multi-way synchronisation); alts that use only input branches can be implemented without the Oracle. This is a pragmatic solution, but has the disadvantage of the Oracle potentially being a bottleneck. Brown [1] adopted the same approach within the initial version of the CHP library. However, later versions of CHP were built upon Software Transactional Memory [6] and so were decentralised, in that alts offering to communicate on disjoint channels did not need to interact; see [3,2].

Our aim in this paper is to investigate an alternative, more scalable design. In particular, we are aiming for a design with no central controller, and that does not employ additional channels internally. However, coming up with a correct design is far from easy. Our development strategy, described in later sections, was to build CSP [13] models of putative designs, and then to analyse them using FDR [5]. In most cases, our putative designs turned out to be incorrect: FDR revealed subtle interactions between the components that led to incorrect behaviour. Debugging CSP models using FDR is very much easier than debugging code by testing, for a number of reasons:

  • FDR does exhaustive state space exploration, whereas execution of code explores the state space nondeterministically, and so may not detect errors;
  • the counterexamples returned by FDR are of minimal length (typically about 20 in this work), whereas counterexamples found by testing are likely to be much longer (maybe a million times longer, based on our experience of a couple of bugs that did crop up in the code);
  • CSP models are more abstract and so easier to understand than code.

1 This design ignores the problem of distributed termination; a suitable distributed termination protocol can be layered on top of this structure.



A second goal of this paper, then, is to illustrate the use of CSP in such a development. One factor that added to the difficulty was that we were aiming for an implementation using the concurrency primitives provided by the Scala programming language, namely monitors. A third goal of this paper is an investigation of the relationship between abstract CSP processes and implementations using monitors: what CSP processes can be implemented using monitors, and what design patterns can we use?

One may use formal analysis techniques with various degrees of rigour. Our philosophy in this work has been pragmatic rather than fully rigorous. Alts and channels are components, and do not seem to have abstract specifications against which the designs can be verified. The best we can do is analyse systems built from the designs, and check that they act as expected. We have analysed a few such systems; this gives us a lot of confidence that other systems would be correct — but does not give us an absolute guarantee of that. Further, the translation from the CSP models to Scala code has been done informally, because, in our opinion, it is fairly obvious.

The rest of this paper is structured as follows. Below we present a brief overview of CSP and of monitors. In Section 1 we present an initial attempt at a design; this design will be incorrect, but presenting it will help to illustrate some of the ideas, and indicate some of the difficulties. In Section 2 we present a correct design, but omitting timeouts and closing of channels; we validate the design using FDR. That design, however, does not seem amenable to direct implementation using a monitor. Hence, in Section 3, we refine the design, implementing each alt as the parallel composition of two processes, each of which could be implemented as a monitor. In Section 4 we extend the design, to include timeouts and the closing of channels; this development requires the addition of a third component to each alt.
In Section 5 we describe the implementation: each of the three processes in the CSP model of the alt can be implemented using a monitor. We sum up in Section 6.

CSP

In this section we give a brief overview of the syntax for the fragment of CSP that we will be using in this paper. We then review the relevant aspects of CSP semantics, and the use of the model checker FDR in verification. For more details, see [7,13].

CSP is a process algebra for describing programs or processes that interact with their environment by communication. Processes communicate via atomic events. Events often involve passing values over channels; for example, the event c.3 represents the value 3 being passed on channel c. Channels may be declared using the keyword channel; for example, channel c : Int declares c to be a channel that passes an Int. The notation {|c|} represents the set of events over channel c. In this paper we will have to talk about both CSP channels and CSO channels: we will try to make clear which we mean in each case.

The simplest process is STOP, which represents a deadlocked process that cannot communicate with its environment. The process a → P offers its environment the event a; if the event is performed, the process then acts like P. The process c?x → P is initially willing to input a value x on channel c, i.e. it is willing to perform any event of the form c.x; it then acts like P (which may use x). Similarly, the process c?x:X → P is willing to input any value x from set X on channel c, and then act like P (which may use x). The process c!x → P outputs value x on channel c. Inputs and outputs may be mixed within the same communication, for example c?x!y → P.

The process P □ Q can act like either P or Q, the choice being made by the environment: the environment is offered the choice between the initial events of P and Q; hence the alt operator in CSO is very similar to the external choice operator of CSP. By contrast, P ⊓ Q may act like either P or Q, with the choice being made internally, not under the control of the environment. □ x:X • P(x) and ⊓ x:X • P(x) are indexed versions of these operators, with the choice being made over the processes P(x) for x in X. The process P ▷ Q represents a sliding choice or timeout: it initially acts like P, but if no event is performed then it can internally change state to act like Q. The process if b then P else Q represents a conditional. It will prove convenient to write assertions in our CSP models, similar in style to assertions in code. We define Assert(b)(P) as shorthand for if b then P else error → STOP; we will later check that the event error cannot occur, ensuring that all assertions are true.

The process P [| A |] Q runs P and Q in parallel, synchronising on events from A. The process P ||| Q interleaves P and Q, i.e. runs them in parallel with no synchronisation. The process ||| x:X • P(x) represents an indexed interleaving. The process P \ A acts like P, except the events from A are hidden, i.e. turned into internal, invisible events. Prefixing (→) binds tighter than each of the binary choice operators, which in turn bind tighter than the parallel operators.

A trace of a process is a sequence of (visible) events that a process can perform. We say that P is refined by Q in the traces model, written P ⊑T Q, if every trace of Q is also a trace of P. FDR can test such refinements automatically, for finite-state processes. Typically, P is a specification process, describing what traces are acceptable; this test checks whether Q has only such acceptable traces. Traces refinement tests can only ensure that no "bad" traces can occur: they cannot ensure that anything "good" actually happens; for this we need the stable failures or failures-divergences models. A stable failure of a process P is a pair (tr, X), which represents that P can perform the trace tr to reach a stable state (i.e. where no internal events are possible) where X can be refused, i.e., where none of the events of X is available.
We say that P is refined by Q in the stable failures model, written P ⊑F Q, if every trace of Q is also a trace of P, and every stable failure of Q is also a stable failure of P. We say that a process diverges if it can perform an infinite number of internal (hidden) events without any intervening visible events. In this paper, we will restrict ourselves to specification processes that cannot diverge. If P is such a process then we say that P is refined by Q in the failures-divergences model, written P ⊑FD Q, if Q also cannot diverge, and every stable failure of Q is also a stable failure of P (which together imply that every trace of Q is also a trace of P). This test ensures that if P can stably offer an event a, then so can Q; hence such tests can be used to ensure Q makes useful progress. Again, such tests can be performed using FDR.

Monitors

A monitor is a program module —in Scala, an object— with a number of procedures that are intended to be executed under mutual exclusion. A simple monitor in Scala typically has a shape as below.

  object Monitor{
    private var x, ...;                            // private variables
    def procedure1(arg1: T1) = synchronized{...};
    ...
    def proceduren(argn: Tn) = synchronized{...};
  }

The keyword synchronized indicates a synchronized block: before a thread can enter the block, it must acquire the lock on the object; when it leaves the block, it releases the lock; hence at most one thread at a time can be executing within the code of the monitor.
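As a small, self-contained illustration of this locking discipline (our own example, not taken from CSO), consider a shared counter: without the synchronized blocks, concurrent increments could be lost; with them, each call runs under the object's lock.

```scala
// A toy monitor (illustrative only): all access to `n` happens while
// holding the object's lock, so increments from different threads
// cannot interleave and lose updates.
object Counter {
  private var n = 0
  def inc(): Unit = synchronized { n += 1 }
  def value: Int  = synchronized { n }
}

// Two threads each perform 10000 locked increments; none are lost.
val ts = for (_ <- 1 to 2) yield
  new Thread(() => for (_ <- 1 to 10000) Counter.inc())
ts.foreach(_.start()); ts.foreach(_.join())
assert(Counter.value == 20000)
```

Replacing the synchronized bodies with unprotected updates would allow the classic lost-update interleaving, and the final count could fall short of 20000.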


It is sometimes necessary for a thread to suspend part way through a procedure, to wait for some condition to become true. It can do this by performing the command wait(); it releases the object's lock at this point. Another thread can wake it up by performing the command notify(); this latter thread retains the object's lock at this point, and the awoken thread must wait to re-obtain the lock. The following producer-consumer example illustrates this technique. Procedures are available to put a piece of data into a shared slot, and to remove that data; each procedure might have to suspend, to wait for the slot to be emptied or filled, respectively.

  object Slot{
    private var value = 0;              // the value in the slot
    private var empty = true;           // is the slot empty?

    def put(v: Int) = synchronized{
      while(!empty) wait();             // wait until space is available
      value = v; empty = false;         // store data
      notify();                         // wake up consumer
    }

    def get: Int = synchronized{
      while(empty) wait();              // wait until value is available
      val result = value; empty = true; // get and clear value
      notify();                         // wake up producer
      return result;
    }
  }

An unfortunate feature of the implementation of wait within the Java Virtual Machine (upon which Scala is implemented) is that sometimes a process will wake up even if no other process has performed a notify, a so-called spurious wake-up. It is therefore recommended that all waits are guarded by a boolean condition that is unset by the awakening thread; for example:

  waiting = true;
  while(waiting) wait();

with awakening code:

  waiting = false;
  notify();
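Putting the two fragments together, a minimal one-shot signal (our own sketch, not part of CSO) shows the recommended shape: the wait() sits in a loop testing a flag that only the awakening thread clears, so a spurious wake-up simply re-tests the flag and waits again.

```scala
// Sketch: a one-shot signal, robust against spurious wake-ups because
// every wait() is guarded by the `waiting` flag.
object Signal {
  private var waiting = true
  def await(): Unit = synchronized {
    while (waiting) wait()   // re-check the guard after every wake-up
  }
  def fire(): Unit = synchronized {
    waiting = false          // unset the guard, then wake the waiter
    notify()
  }
}
```

A thread calling Signal.await() blocks until another thread calls Signal.fire(), however many spurious wake-ups occur in between.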

1. Initial Design

In this section we present our initial design for the generalised alt. The design is not correct; however, our aims in presenting it are:

  • to act as a stepping-stone towards a correct design;
  • to illustrate some of the difficulties in producing a correct design;
  • to introduce some features of the CSP models;
  • to illustrate how model checking can discover flaws in a design.

For simplicity, we do not consider timeouts or the closing of channels within this model. We begin by describing the idea of the design informally, before presenting the CSP model and the analysis.


In order for an alt to fire a particular branch, say the branch for channel c, there must be another process —either another alt or not— willing to communicate on the other port of c. In order to ascertain this, an alt will register with the channel for each of its branches. • If another process is already registered with channel c’s other port, and ready to communicate, then c will respond to the registration request with YES, and the alt will select that branch. The act of registration represents a promise by the alt, that if it receives an immediate response of YES it will communicate. • However, if no other process is registered with c’s other port and ready to communicate, then c responds with NO, and the alt will continue to register with its other channels. In this case, the registration does not represent a firm promise to communicate, since it may select a different branch: it is merely an expression of interest. If an alt has registered with each of its channels without receiving a positive response, then it waits to hear back from one of them. This process is illustrated in the first few steps of Figure 1: Alt1 registers with Chan1 and Chan2, receiving back a response of NO, before waiting. Chan2

Figure 1. First sequence diagram

When a channel receives another registration attempt, it checks whether any of the alts already registered on its other port is able to commit to a communication. If any such alt agrees, the channel returns a positive response to the registering alt; at this point, both alts deregister from all other channels, and the communication goes ahead. However, if none of the registered alts is able to commit, then the channel returns a negative result to the registering alt. This process is illustrated in the last few steps of Figure 1. Alt2 registers with Chan1; Chan1 checks whether Alt1 can commit, and receives a positive answer, which is passed on to Alt2.

In the Scala implementation, our aim will be to implement the messages between components as procedure calls and returns. For example, the commit messages will be implemented by a procedure in the alt, also called commit; the responses will be implemented by the values returned from that procedure. A difference between the two types of components is that each alt will be thread-like: a thread will be executing the code of the alt (although at times that thread will be within procedure calls to other components); by contrast, channels will be object-like: they will be mostly passive, but willing to receive procedure calls from active threads.
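The object-like/thread-like split can be illustrated with a heavily simplified Java sketch (hypothetical names; the real implementation is in Scala, distinguishes the two ports, and handles multiple simultaneous registrations, all omitted here). Registration is a method call whose result is the channel's response, and commit is a callback into a registered alt:

```java
// Hedged sketch of the registration protocol as method calls.
// The channel is passive: it only reacts to register/deregister calls.
enum Resp { YES, NO }

interface AltCallback {
    Resp commit();   // the channel asks a registered alt to commit
}

class ChannelSketch {
    // Simplification: a single registration slot for the other port.
    private AltCallback other = null;

    // Register interest in communicating; YES means "communicate now",
    // NO is merely an expression of interest.
    synchronized Resp register(AltCallback alt) {
        if (other != null && other.commit() == Resp.YES) {
            other = null;   // the pair communicates; registration consumed
            return Resp.YES;
        }
        other = alt;
        return Resp.NO;
    }

    synchronized void deregister(AltCallback alt) {
        if (other == alt) other = null;
    }
}
```

Here the registering alt's thread runs both its own code and, via register, the channel's and the other alt's commit code, matching the thread-like/object-like division described above.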


1.1. CSP Model

Each CSP model will be defined in two parts: a definition of a (generic) alt and channel; and the combination of several alts and channels into a system. The definition of each system will include two integer values, numAlts and numChannels, giving the number of alts and CSO channels, respectively. Given these, we can define the identities of alts and channels:

  AltId     = {1..numAlts}       -- IDs of Alts
  ChannelId = {1..numChannels}   -- IDs of channels

We can further define a datatype of ports, and a datatype of responses:

  datatype Port = InPort.ChannelId | OutPort.ChannelId
  datatype Resp = YES | NO

We can now declare the CSP channels used in the model. The register, commit and deregister channels, and response channels for the former two, are declared as follows.²

  channel register     : AltId.Port
  channel registerResp : Port.AltId.Resp
  channel commit       : Port.AltId
  channel commitResp   : AltId.Port.Resp
  channel deregister   : AltId.Port

We also include a CSP channel on which each alt can signal that it thinks that it is executing a branch corresponding to a particular CSO channel; this will be used for specification purposes.

  channel signal : AltId.ChannelId

The process Alt(me, ps) represents an alt with identity me, with branches corresponding to the ports ps. It starts by registering with each of its ports. Below, reged is the set of ports with which it has registered, and toReg is the set of ports with which it still needs to register. It chooses (nondeterministically, at this level of abstraction) a port with which to register, and receives back a response; this is repeated until either it receives a positive response, or it has registered with all the ports.

  Alt(me, ps) = AltReg(me, ps, {}, ps)

  AltReg(me, ps, reged, toReg) =
    if toReg == {} then AltWait(me, ps, reged)
    else
      ⊓ p : toReg •
        register.me.p → registerResp?p'!me?resp →
        Assert(p' == p)(
          if resp == YES then AltDereg(me, ps, remove(reged, p), p)
          else AltReg(me, ps, add(reged, p), remove(toReg, p)))

Here we use two helper functions, to remove an element from a set, and to add an element to a set:³

  remove(xs, x) = diff(xs, {x})
  add(xs, x)    = union(xs, {x})

² deregister does not return a result, and can be treated as atomic, so we do not need a response channel.
³ diff and union are the machine-readable CSP functions for set difference and union.


If the alt registers unsuccessfully with each of its ports, then it waits to receive a commit message from a port, which it accepts.

  AltWait(me, ps, reged) =
    commit?p : reged!me → commitResp.me.p!YES →
    AltDereg(me, ps, remove(reged, p), p)

Once an alt has committed to a particular port, p, it deregisters with each of the other ports, and then signals, before returning to its initial state. During this time, if the alt receives a commit event, it responds negatively.

  AltDereg(me, ps, toDereg, p) =
    if toDereg == {} then signal.me.chanOf(p) → Alt(me, ps)
    else
      ( (⊓ p1 : toDereg •
           deregister.me.p1 → AltDereg(me, ps, remove(toDereg, p1), p))
        □ commit?p1 : aports(me)!me → commitResp.me.p1!NO →
          AltDereg(me, ps, toDereg, p) )

Here chanOf returns the channel corresponding to a port:

  chanOf(InPort.c)  = c
  chanOf(OutPort.c) = c

We now consider the definition of a channel. The process Channel(me, reged) represents a channel with identity me, where reged is a set of (port, alt) pairs, showing which alts have registered at its two ports.

  Channel(me, reged) =
    register?a?port : ports(me) →
      ( let toTry = { (p, a1) | (p, a1) ← reged, p == otherP(port) }
        within ChannelCommit(me, a, port, reged, toTry) )
    □ deregister?a?p : ports(me) → Channel(me, remove(reged, (p, a)))

Here, ports(me) gives the ports corresponding to this channel:

  ports(me) = {InPort.me, OutPort.me}

The set toTry, above, represents all the previous registrations with which this new registration might be matched; otherP(port) returns this channel's other port.

  otherP(InPort.me)  = OutPort.me
  otherP(OutPort.me) = InPort.me

The channel now tries to find a previous registration with which this new one can be paired. The parameter toTry represents those previous registrations which the channel still needs to check. The channel chooses (nondeterministically) a previous registration to try, and sends a commit message. It repeats until either (a) it receives back a positive response, in which case it sends a positive response to the registering alt a, or (b) it has exhausted all possibilities, in which case it sends back a negative response.⁴

⁴ The notation pa' @@ (port', a') : toTry binds the identifier pa' to an element of toTry, and also binds the identifiers port' and a' to the two components of pa'.


  ChannelCommit(me, a, port, reged, toTry) =
    if toTry == {} then  -- None can commit
      registerResp.port.a!NO → Channel(me, add(reged, (port, a)))
    else
      ⊓ pa' @@ (port', a') : toTry •
        commit.port'.a' → commitResp.a'.port'?resp →
        if resp == YES then
          registerResp.port.a!YES → Channel(me, remove(reged, pa'))
        else
          ChannelCommit(me, a, port, remove(reged, pa'), remove(toTry, pa'))

1.2. Analysing the Design

Figure 2. A simple configuration

We consider a simple configuration of two alts and two channels, as in Figure 2 (where the arrows indicate the direction of dataflow, so Alt(1) accesses Channel(1)'s inport and Channel(2)'s outport, for example). This system can be defined as follows.

  numAlts = 2
  numChannels = 2

  Channels = ||| me : ChannelId • Channel(me, {})

  aports(1) = {InPort.1, OutPort.2}
  aports(2) = {InPort.2, OutPort.1}

  Procs = ||| me : AltId • Alt(me, aports(me))

  System =
    let internals = {| register, registerResp, commit, commitResp, deregister |}
    within (Channels [| internals |] Procs) \ internals

The two processes should agree upon the channel on which to communicate; that is, they should (repeatedly) signal success on the same channel. Further, no error events should occur. This requirement is captured by the following CSP specification.

  Spec = □ c : ChannelId •
           ( signal.1.c → signal.2.c → Spec
             □ signal.2.c → signal.1.c → Spec )

When we use FDR to test if System refines Spec in the traces model, the test succeeds. However, when we do the corresponding test in the stable failures model, the test fails, because System deadlocks. Using the FDR debugger shows that the deadlock occurs after the system (without the hiding) has performed

  <register.2.InPort.2, register.1.InPort.1, registerResp.InPort.2.2.NO,
   registerResp.InPort.1.1.NO, register.1.OutPort.2, register.2.OutPort.1>

Figure 3. The behaviour leading to deadlock

This is illustrated in Figure 3. Each alt has registered with one channel, and is trying to

register with its other channel. In the deadlocked state, Channel(1) is trying to send a commit message to Alt(1), but Alt(1) refuses this because it is waiting for a response to its last register event; Channel(2) and Alt(2) are behaving similarly. The following section investigates how to overcome this problem.

2. Improved Design

The counterexample in the previous section shows that alts should be able to accept commit messages while waiting for a response to a register. But how should an alt deal with such a commit? It would be wrong to respond with YES, for then it would be unable to deal with a response of YES to the register message (recall that an alt must respect a response of YES to a register message). It would also be wrong to respond NO to the commit, for then the chance to communicate on this channel would be missed. Further, a little thought shows that delaying replying to the commit until after a response to the register has been received would also be wrong: in the example of the last section, this would again lead to a deadlock.

Our solution is to introduce a different response, MAYBE, that an alt can send in response to a commit; informally, the response of MAYBE means "I'm busy right now; please call back later". The sequence diagram in Figure 4 illustrates the idea. Alt1 receives a commit from Chan1 while waiting for a response to a register. It sends back a response of MAYBE, which gets passed back to the initiating Alt2. Alt2 pauses for a short while (to give Alt1 a chance to finish what it's doing), before again trying to register with Chan1. Note that it is the alt's responsibility to retry, rather than the channel's, because we are aiming for an implementation where the alt is thread-like, but the channel is object-like.

2.1. CSP Model

We now adapt the CSP model from the previous section to capture this idea. First, we expand the type of responses to include MAYBE:

  datatype Resp = YES | NO | MAYBE

When an alt pauses before retrying, it will signal on the channel pause; we will later use this for specification purposes.

  channel pause : AltId


Figure 4. Using MAYBE

An alt again starts by registering with each of its channels. It may now receive a response of MAYBE; the parameter maybes below stores those ports for which it has received such a response. Further, it is willing to receive a commit message during this period, in which case it responds with MAYBE.

  Alt(me, ps) = AltReg(me, ps, {}, ps, {})

  AltReg(me, ps, reged, toReg, maybes) =
    if toReg == {} then
      ( if maybes == {} then AltWait(me, ps, reged)
        else pause.me → AltPause(me, ps, reged, maybes) )
      □ commit?p : aports(me)!me → commitResp.me.p!MAYBE →
        AltReg(me, ps, reged, toReg, maybes)
    else
      ( ⊓ p : toReg • register.me.p → AltReg'(me, ps, reged, toReg, maybes, p) )
      □ commit?p : aports(me)!me → commitResp.me.p!MAYBE →
        AltReg(me, ps, reged, toReg, maybes)

  -- Waiting for response from p
  AltReg'(me, ps, reged, toReg, maybes, p) =
    registerResp?p'!me?resp →
      Assert(p' == p)(
        if resp == YES then AltDereg(me, ps, remove(reged, p), p)
        else if resp == NO then
          AltReg(me, ps, add(reged, p), remove(toReg, p), maybes)
        else  -- resp == MAYBE
          AltReg(me, ps, reged, remove(toReg, p), add(maybes, p)) )
    □ commit?p1 : aports(me)!me → commitResp.me.p1!MAYBE →
      AltReg'(me, ps, reged, toReg, maybes, p)


If an alt receives no positive response, and at least one MAYBE, it pauses for a short while before retrying. However, it accepts any commit request it receives in the meantime.⁵

  AltPause(me, ps, reged, maybes) =
    ( STOP ⊓ AltReg(me, ps, reged, maybes, {}) )
    □ commit?p : aports(me)!me → commitResp.me.p!YES →
      AltDereg(me, ps, remove(reged, p), p)

If an alt receives only negative responses to its register messages, it again waits.

  AltWait(me, ps, reged) =
    commit?p : aports(me)!me → commitResp.me.p!YES →
    AltDereg(me, ps, remove(reged, p), p)

Once the alt has committed, it deregisters the other ports, and signals, as in the previous model.

  AltDereg(me, ps, toDereg, p) =
    if toDereg == {} then signal.me.chanOf(p) → Alt(me, ps)
    else
      ( (⊓ p1 : toDereg •
           deregister.me.p1 → AltDereg(me, ps, remove(toDereg, p1), p))
        □ commit?p1 : aports(me)!me → commitResp.me.p1!NO →
          AltDereg(me, ps, toDereg, p) )

The definition of a channel is a fairly straightforward adaptation from the previous model. In the second process below, the parameter maybeFlag is true if any alt has responded MAYBE. The port is registered at the channel only if each commit message received a response of NO.

  Channel(me, reged) =
    register?a?port : ports(me) →
      ( let toTry = { (p, a1) | (p, a1) ← reged, p == otherP(port) }
        within ChannelCommit(me, a, port, reged, toTry, false) )
    □ deregister?a?p : ports(me) → Channel(me, remove(reged, (p, a)))

  ChannelCommit(me, a, port, reged, toTry, maybeFlag) =
    if toTry == {} then  -- None can commit
      if maybeFlag then registerResp.port.a!MAYBE → Channel(me, reged)
      else registerResp.port.a!NO → Channel(me, add(reged, (port, a)))
    else
      ⊓ pa' @@ (port', a') : toTry •
        commit.port'.a' → commitResp.a'.port'?resp →
        if resp == YES then
          registerResp.port.a!YES → Channel(me, remove(reged, pa'))
        else if resp == MAYBE then
          ChannelCommit(me, a, port, reged, remove(toTry, pa'), true)
        else  -- resp == NO
          ChannelCommit(me, a, port, remove(reged, pa'), remove(toTry, pa'), maybeFlag)

⁵ CSP cognoscenti may point out that the "STOP ⊓" does not affect the behaviour of the process; we include it merely to illustrate the desired behaviour of our later Scala implementation.

2.2. Analysing the Design

We can again combine these alts and channels into various configurations. First, we consider the configuration in Figure 2; this is defined as earlier, but also hiding the pause events. FDR can then be used to verify that this system refines the specification Spec, in both the traces and the stable failures model.

Figure 5. Behaviour causing divergence

However, the refinement does not hold in the failures-divergences model, since the system can diverge. The divergence can happen in a number of different ways; one possibility is shown in Figure 5.⁶ Initially, each alt registers with one channel. When each alt tries to register with the other channel, a commit message is sent to the other alt, receiving a response of MAYBE; each alt then pauses. These attempts to register can be repeated arbitrarily many times, causing a divergence.

The problem is that the two alts are behaving symmetrically, each sending its register events at about the same time: if one alt were to send its register while the other is pausing, it would receive back a response of YES, and the symmetry would be broken. In the implementation, the pause will be of a random amount of time, to ensure the symmetry is eventually broken (with probability 1).

We can check that the only way that the system can diverge is through repeated pauses and retries: we can show that the system without the pause events hidden refines the following specification, in which each alt keeps on pausing until both signal.

  SpecR = ( □ p : ChannelId •
              signal.1.p → SpecR1(p) □ signal.2.p → SpecR2(p) )
          □ pause.1 → SpecR
          □ pause.2 → SpecR

  SpecR1(p) = signal.2.p → SpecR □ pause.1 → SpecR1(p)
  SpecR2(p) = signal.1.p → SpecR □ pause.2 → SpecR2(p)
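The retry-after-MAYBE loop with a random pause might look roughly as follows in Java (illustrative only: the names and the bound on the pause are our own, and the paper's implementation is in Scala):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Sketch of an alt retrying a registration that returned MAYBE,
// pausing for a random time so that two symmetric alts eventually
// desynchronise (with probability 1).
class RetrySketch {
    enum Response { YES, NO, MAYBE }

    static Response registerWithRetry(Supplier<Response> attempt) {
        while (true) {
            Response r = attempt.get();
            if (r != Response.MAYBE) return r;    // YES or NO: decided
            try {                                  // "call back later"
                Thread.sleep(ThreadLocalRandom.current().nextLong(1, 10));
            } catch (InterruptedException e) { }
        }
    }
}
```

A fixed pause would not break the symmetry between two competing alts; the randomness is what makes eventual progress probabilistically certain.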

⁶ In fact, FDR finds a slightly simpler divergence, where only one alt repeatedly tries to register; in the implementation, this would correspond to the other alt being starved of the processor; we consider the example in the figure to be more realistic.

Figure 6. Three test configurations

We have built other configurations, including those in Figure 6. For each, we have used

FDR to check that it refines a suitable specification that ensures that suitable signal events are available, in particular that if an alt signals at one port of a channel then another signals at the other port. We omit the details in the interests of brevity. But as the alts and channels are components, we would really like to analyse all systems built from them: this seems a particularly difficult case of the parameterised model checking problem, beyond the capability of existing techniques.

3. Compound Alts

The model in the previous section captures the desired behaviour of an alt. However, it does not seem possible to implement this behaviour using a single monitor. We would like to implement the main execution of the alt as a procedure apply, and to implement the commit and commitResp events as a procedure commit and its return. However, these two procedures will need to be able to run concurrently, so cannot be implemented in a single monitor. Instead we implement the alt using two monitors.

• The MainAlt will implement the apply procedure, to register with the channels, deregister at the end, execute the appropriate branch of the alt, and generally control the execution.
• The Facet will provide the commit procedure, responding appropriately; it will receive messages from the MainAlt, informing it of its progress; if the Facet receives a call to commit while the MainAlt is waiting, the Facet will wake up the MainAlt.

The definition of a channel remains the same as in the previous section. Figure 7 illustrates a typical scenario, showing how the two components cooperate to achieve the behaviour of Alt1 from Figure 1. The MainAlt starts by initialising the Facet, and then registers with Chan1. When the Facet receives a commit message from Chan1, it replies with MAYBE, since it knows the MainAlt is still registering with channels. When the MainAlt finishes registering, it informs the Facet, and then waits.
When the Facet subsequently receives another commit message, it wakes up the MainAlt, passing the identity of Chan1, and returns YES to Chan1. The MainAlt deregisters the other channels, and informs the Facet. In addition, if the Facet had received another commit message after sending YES to Chan1, it would have replied with NO.
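A rough Java analogue of the Facet monitor (hypothetical names and a deliberately reduced state space; the paper's Scala version differs in detail) makes the status-dependent replies concrete:

```java
// Sketch of the Facet: a passive monitor answering commit calls
// according to how far the MainAlt has progressed.
class FacetSketch {
    enum Status { INIT, WAIT, DEREG, DONE }
    enum Reply  { YES, NO, MAYBE }

    private Status status = Status.INIT;

    // MainAlt informs the Facet of its progress.
    synchronized void changeStatus(Status s) { status = s; }

    // Called by a channel on behalf of another alt.
    synchronized Reply commit() {
        switch (status) {
            case INIT:               // MainAlt still registering
                return Reply.MAYBE;  // "call back later"
            case WAIT:               // MainAlt is blocked in wait()
                status = Status.DEREG;
                notify();            // wake the MainAlt
                return Reply.YES;
            default:                 // deregistering or done
                return Reply.NO;
        }
    }
}
```

The essential point is that commit never blocks for long: it inspects the recorded status and answers immediately, which is why it can live in a separate monitor from the thread-like MainAlt.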

16

G. Lowe / Implementing Generalised Alt

MainAlt

Chan2

Facet

Chan1

/

INIT

/

register

o o

NO

o

register

commit

/

NO

/

MAYBE

/

WAIT

e

wait

o

commit

o wakeUp.Chan1 o

deregister

/

YES

/

DONE

Figure 7. Expanding the alt

As noted earlier, if the MainAlt receives any reply of MAYBE when trying to register with channels, it pauses for a short while, before retrying; Figures 8 and 9 illustrate this for the compound alt (starting from the point where the alt tries to register with Chan2). Before pausing, the MainAlt informs the Facet. If the Facet receives a commit in the meantime, it replies YES (and would reply NO to subsequent commits). When the MainAlt finishes pausing, it checks back with the Facet to find out if any commit was received, getting a positive answer in Figure 8, and a negative one in Figure 9.

Figure 8. A commit received while pausing

3.1. CSP Model

We now describe a CSP model that captures the behaviour described informally above. We define a datatype and channel by which the MainAlt informs the Facet of changes of status.

  datatype Status = Init | Pause | Wait | Dereg | Done
  channel changeStatus : Status

Figure 9. Pausing before retrying

When the Facet wakes up the MainAlt, it sends the identity of the port whose branch should be run, on channel wakeUp.

  channel wakeUp : Port

When the MainAlt finishes pausing, it either receives from the Facet on channel getToRun the identity of a port from which a commit has been received, or receives a signal getToRunNo indicating that no commit has been received.

  channel getToRun : Port
  channel getToRunNo

The alt is constructed from the two components, synchronising on and hiding the internal communications:

  Alt(me, ps) =
    let A = {| wakeUp, changeStatus, getToRun, getToRunNo |}
    within (MainAlt(me, ps) [| A |] Facet(me)) \ A

The definition of the MainAlt is mostly similar to the definition of the alt in Section 2, so we just outline the differences here. The MainAlt does not receive the commit messages, but instead receives notifications from the Facet. When it finishes pausing (state MainAltPause below), it either receives from the Facet the identity of the branch to run on channel getToRun, or receives on channel getToRunNo an indication that no commit event has been received. When it is waiting (state MainAltWait), it waits until it receives a message from the Facet on channel wakeUp, including the identity of the process to run.

  MainAlt(me, ps) = changeStatus!Init → MainAltReg(me, ps, {}, ps, {})

  MainAltReg(me, ps, reged, toReg, maybes) =
    if toReg == {} then
      if maybes == {} then MainAltWait(me, ps, reged)
      else pause.me → changeStatus!Pause → MainAltPause(me, ps, reged, maybes)
    else
      ⊓ p : toReg •
        register.me.p → registerResp?p'!me?resp →
        Assert(p' == p)(
          if resp == YES then
            changeStatus!Dereg → MainAltDereg(me, ps, remove(reged, p), p)
          else if resp == NO then
            MainAltReg(me, ps, add(reged, p), remove(toReg, p), maybes)
          else  -- resp == MAYBE
            MainAltReg(me, ps, reged, remove(toReg, p), add(maybes, p)))

  MainAltPause(me, ps, reged, maybes) =
    STOP ⊓ ( getToRunNo → MainAltReg(me, ps, reged, maybes, {})
             □ getToRun?p → MainAltDereg(me, ps, remove(reged, p), p) )

  MainAltWait(me, ps, reged) =
    changeStatus!Wait → wakeUp?p : reged →
    MainAltDereg(me, ps, remove(reged, p), p)

  MainAltDereg(me, ps, toDereg, p) =
    if toDereg == {} then
      changeStatus!Done → signal.me.chanOf(p) → MainAlt(me, ps)
    else
      ⊓ p1 : toDereg •
        deregister.me.p1 → MainAltDereg(me, ps, remove(toDereg, p1), p)

The Facet tracks the state of the MainAlt; below we use similar names for the states of the Facet as for the corresponding states of the MainAlt. When the MainAlt is pausing, the Facet responds YES to the first commit it receives (state FacetPause), and NO to subsequent ones (state FacetPause'); it passes on this information on getToRun or getToRunNo. When the MainAlt is waiting, if the Facet receives a commit message, it wakes up the MainAlt (state FacetWait).

  Facet(me) = changeStatus.Init → FacetReg(me)

  FacetReg(me) =
    commit?p : aports(me)!me → commitResp.me.p!MAYBE → FacetReg(me)
    □ changeStatus?s →
        if s == Wait then FacetWait(me)
        else if s == Dereg then FacetDereg(me)
        else Assert(s == Pause)(FacetPause(me))

  FacetPause(me) =
    commit?p : aports(me)!me → commitResp.me.p!YES → FacetPause'(me, p)
    □ getToRunNo → FacetReg(me)

  FacetPause'(me, p) =
    commit?p1 : aports(me)!me → commitResp.me.p1!NO → FacetPause'(me, p)
    □ getToRun!p → FacetDereg(me)

  FacetWait(me) =
    commit?p : aports(me)!me → wakeUp!p → commitResp.me.p!YES → FacetDereg(me)

  FacetDereg(me) =
    commit?p : aports(me)!me → commitResp.me.p!NO → FacetDereg(me)
    □ changeStatus?s → Assert(s == Done)(Facet(me))


3.2. Analysing the Design

We have built configurations using this compound alt, as in Figures 2 and 6. We have again used FDR to check that each refines a suitable specification.

In fact, the compound alt defined in this section is not equivalent to, or even a refinement of, the sequential alt defined in the previous section. The compound alt has a number of behaviours that the sequential alt does not, caused by the fact that it takes some time for information to propagate through the former. For example, the compound alt can register with each of its ports, receiving NO in each case, and then return MAYBE in response to a commit message (whereas the sequential alt would return YES), because the (internal) changeStatus.Wait event has not yet happened. We see the progression from the sequential to the compound alt as a step of development rather than formal refinement: such (typically small) changes in behaviour are common in software development.

4. Adding Timeouts and Closing of Channels

We now extend our compound model from the previous section to capture two additional features of alts, namely timeouts and the closing of channels. We describe these features separately from the main operation of alts, since they are rather orthogonal. Further, this follows the way we developed the implementation, and how we would recommend similar developments are carried out: get the main functionality right, then add the bells and whistles.

We describe the treatment of timeouts first. If the alt has a timeout branch, then the waiting stage from the previous design is replaced by a timed wait. If the Facet receives a commit during the wait, it can wake up the MainAlt, much as in Figure 7. Alternatively, if the timeout time is reached, the alt can run the timeout branch. However, there is a complication: the Facet may receive a commit at almost exactly the same time as the timeout is reached, a race condition.
In order to resolve this race, we introduce a third component into the compound alt: the Arbitrator will arbitrate in the event of such a race, so that the Facet and MainAlt proceed in a consistent way.

Figure 10 corresponds to the earlier Figure 7. The WAIT message informs the Facet that the MainAlt is performing a wait with a timeout. When the Facet subsequently receives a commit message, it checks with the Arbitrator that this commit has not been preempted by a timeout. In the figure, it receives a returned value of true, indicating that there was no race, and so the commit request can be accepted. Figure 11 considers the case where the timeout is reached without a commit message being received in the meantime. The MainAlt checks with the Arbitrator that indeed no commit message has been received, and then deregisters all channels before running the timeout branch.

Figures 12 and 13 consider cases where the timeout happens at about the same time as a commit is received. The MainAlt and the Facet both contact the Arbitrator; whichever does so first "wins" the race, so the action it is dealing with is the one whose branch will be executed. If the Facet wins, then the MainAlt waits for the Facet to wake it up (Figure 12). If the MainAlt wins, then the Facet replies NO to the commit, and waits for the MainAlt to finish deregistering channels (Figure 13).

We now consider the treatment of channels closing. Recall that if there is no timeout branch and all the channels close, then the alt should run its orelse branch, if there is one, or throw an Abort exception. However, if there is a timeout branch, then it doesn't matter if all the branches are closed: the timeout branch will eventually be selected. When a channel closes, it sends a chanClosed message to each alt that is registered with it; this message is received by the Facet, which keeps track of the number of channels that have

Figure 10. Expanding the alt

Figure 11. After a timeout

closed. If an alt subsequently tries to register with a closed channel, the channel returns a response of CLOSED.

When the MainAlt is about to do a non-timed wait, it sends the Facet a setReged message (replacing the WAIT message in Figure 7), including a count of the number of channels with which it has registered. The Facet returns a boolean that indicates whether all the channels have closed. If so, the MainAlt runs its orelse branch or throws an Abort exception. Otherwise, if the Facet subsequently receives sufficient chanClosed messages that all channels are closed, it wakes up the MainAlt by sending it an allClosed message; again, the MainAlt either runs its orelse branch or throws an Abort exception.
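The closed-channel bookkeeping on the Facet's side can be sketched in Java (hypothetical names; the full design additionally performs the allClosed wake-up and interacts with the other states of the Facet, all omitted here):

```java
// Sketch: count chanClosed notifications; once every channel the
// MainAlt registered with has closed, the alt should run its orelse
// branch or throw Abort.
class ClosedCount {
    private int registered = 0;
    private int closed = 0;

    // MainAlt, about to do a non-timed wait: record how many channels
    // it registered with; returns true if all of them already closed.
    synchronized boolean setReged(int n) {
        registered = n;
        return closed >= registered;
    }

    // Called by a closing channel; returns true when the count is
    // reached (the point at which the Facet would send allClosed).
    synchronized boolean chanClosed() {
        closed++;
        return registered > 0 && closed >= registered;
    }
}
```

Both checks are made under the monitor lock, so a chanClosed arriving concurrently with setReged cannot be lost between the count and the decision to wait.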

Figure 12. A commit beating a timeout in a race

Figure 13. A timeout beating a commit in a race
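The Arbitrator's role admits a very small Java sketch (our own illustrative rendering, not the paper's Scala code, and the CSP model threads status information through checkRace that we omit): the first of the Facet (on a commit) and the MainAlt (on a timeout) to ask wins the race, and the loser's query resets the state for the alt's next round:

```java
// Sketch of the Arbitrator: resolves a commit/timeout race in favour
// of whichever party asks first.
class ArbitratorSketch {
    private boolean decided = false;

    // Returns true iff the caller has won the race.
    synchronized boolean checkRace() {
        if (decided) {        // the other party got in first
            decided = false;  // reset for the alt's next round
            return false;
        }
        decided = true;
        return true;
    }
}
```

Because checkRace is synchronized, the two racing calls are serialised, giving exactly one winner however closely the commit and the timeout coincide.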

4.1. CSP Model

We now describe a CSP model that captures the behaviour described informally above. We extend the types of ports, responses and status values appropriately.

  datatype Port   = InPort.ChannelId | OutPort.ChannelId | TIMEOUT | ORELSE
  datatype Resp   = YES | NO | MAYBE | CLOSED
  datatype Status = Init | Pause | Dereg | Commit | WaitTO | Done | Timedout

We extend the type of signals to include timeouts and orelse. We also include events to indicate that a process has aborted, and (to help interpret debugging traces) that a process has timed out.

  TIMEOUTSIG = 0
  ORELSESIG  = -1



    channel signal  : AltId . union(ChannelId, {TIMEOUTSIG, ORELSESIG})
    channel abort   : AltId
    channel timeout : AltId

Finally we add channels for a (CSO) channel to inform an alt that it has closed, and for communications with the Arbitrator (for simplicity, the latter channel captures communications in both directions using a single event).

    channel chanClosed : Port . AltId
    channel checkRace  : Status . Bool

The alt is constructed from the three components, synchronising on and hiding the internal communications:

    Alt(me, ps) =
      let A = {| wakeUp, changeStatus, getToRun, getToRunNo |}
      within
        ((MainAlt(me, ps) [| A |] Facet(me)) [| {| checkRace |} |] Arbitrator(Init))
          \ union(A, {| checkRace |})

The definition of the MainAlt is mostly similar to that in Section 3, so we just describe the main differences here. It starts by initialising the other two components, before registering with channels as earlier.

    MainAlt(me, ps) =
      changeStatus!Init → checkRace.Init?b → MainAltReg(me, ps, {}, ps, {})

    MainAltReg(me, ps, reged, toReg, maybes) =
      if toReg == {} then
        if maybes == {} then
          if member(TIMEOUT, ps) then MainAltWaitTimeout(me, ps, reged)
          else MainAltWait(me, ps, reged)
        else retry.me → changeStatus!Pause → MainAltPause(me, ps, reged, maybes)
      else
        ⊓ p : toReg •
          if p == TIMEOUT or p == ORELSE then
            MainAltReg(me, ps, reged, remove(toReg, p), maybes)
          else
            register.me.p → registerResp?p'!me?resp →
            Assert(p' == p)(
              if resp == YES then
                changeStatus!Dereg → MainAltDereg(me, ps, remove(reged, p), p)
              else if resp == NO then
                MainAltReg(me, ps, add(reged, p), remove(toReg, p), maybes)
              else -- resp == MAYBE
                MainAltReg(me, ps, reged, remove(toReg, p), add(maybes, p)))

    MainAltPause(me, ps, reged, maybes) =
      STOP ▷ (  getToRunNo → MainAltReg(me, ps, reged, maybes, {})
              □ getToRun?p → MainAltDereg(me, ps, remove(reged, p), p))

Before doing an untimed wait, the MainAlt sends a message to the Facet on setReged, giving the number of registered channels, and receiving back a boolean indicating whether all branches are closed. If so (state MainAltAllClosed) it runs the orelse branch if there is one, or aborts. If not all branches are closed, it waits to receive either a wakeUp or allClosed message.



    MainAltWait(me, ps, reged) =
      setReged!card(reged)?allBranchesClosed →
      if allBranchesClosed then MainAltAllClosed(me, ps, reged)
      else -- wait for signal from Facet
        (  wakeUp?p : reged → MainAltDereg(me, ps, remove(reged, p), p)
         □ allClosed → MainAltAllClosed(me, ps, reged))

    MainAltAllClosed(me, ps, reged) =
      if member(ORELSE, ps) then
        changeStatus!Dereg → MainAltDereg(me, ps, reged, ORELSE)
      else abort.me → STOP

The state MainAltWaitTimeout describes the behaviour of waiting with the possibility of selecting a timeout branch. The MainAlt can again be woken up by a wakeUp event; we also model the possibility of an allClosed event, but signal an error if one occurs (subsequent analysis with FDR verifies that they can't occur). We signal a timeout on the timeout channel. The MainAlt then checks with the Arbitrator whether it has lost a race with a commit; if not (then branch) it runs the timeout branch; otherwise (else branch) it waits to be woken by the Facet.

    MainAltWaitTimeout(me, ps, reged) =
      changeStatus!WaitTO →
      ( (  wakeUp?p : reged → MainAltDereg(me, ps, remove(reged, p), p)
         □ allClosed → error → STOP )
        ▷ timeout.me → checkRace.Timedout?resp →
            if resp then changeStatus!Dereg → MainAltDereg(me, ps, reged, TIMEOUT)
            else wakeUp?p : reged → MainAltDereg(me, ps, remove(reged, p), p) )

    MainAltDereg(me, ps, toDereg, p) =
      if toDereg == {} then
        changeStatus!Done → signal.me.chanOf(p) → MainAlt(me, ps)
      else
        ⊓ p1 : toDereg • deregister.me.p1 → MainAltDereg(me, ps, remove(toDereg, p1), p)

The model of the Facet is a fairly straightforward extension of that in Section 3, dealing with the closing of channels and communications with the Arbitrator as described above.

    Facet(me) = changeStatus?s → Assert(s == Init)(FacetReg(me, 0))

    FacetReg(me, closed) =
        commit?p : aports(me)!me → commitResp.me.p!MAYBE → FacetReg(me, closed)
      □ changeStatus?s →
          ( if s == WaitTO then FacetWaitTimeout(me, closed)
            else if s == Dereg then FacetDereg(me)
            else Assert(s == Pause)(FacetPause(me, closed)) )
      □ chanClosed?p : aports(me)!me → Assert(closed ...

    ... // code for P1
    ...
    case n => ... // code for Pn
    }

and by providing procedures of the following form, for k = 1,...,n (corresponding to events of the form ek!arg in the other process).

    def ek(arg: Tk) = synchronized{
      assert(waiting);
      wakeUpType = k; xk = arg;  // pass data
      waiting = false; notify(); // wake up waiting process
    }
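As a cross-check of this wake-up pattern, here is a minimal Python analogue (Python used purely for illustration; the class and method names are ours, not CSO's). The waiting process sleeps inside the monitor until some caller delivers an event number k and its data.

```python
import threading

class EventMonitor:
    """Sketch of the monitor wake-up pattern: one process waits for
    any of the events e1..en; a caller delivers an event and its data."""
    def __init__(self):
        self._cond = threading.Condition()
        self._waiting = False
        self._wake_up_type = None
        self._data = None

    def await_event(self):
        # corresponds to the wait inside a main procedure fk
        with self._cond:
            self._waiting = True
            while self._waiting:
                self._cond.wait()
            return self._wake_up_type, self._data

    def event(self, k, arg):
        # corresponds to procedure ek(arg): only legal while waiting
        with self._cond:
            assert self._waiting, "ek fired while the process was not waiting"
            self._wake_up_type, self._data = k, arg   # pass data
            self._waiting = False
            self._cond.notify()                       # wake up waiting process
```

Note the assertion: exactly as in the text, a caller may fire ek only while the process is in its waiting state.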



Here waiting, wakeUpType and xk (k = 1,...,n) are private variables of the monitor. In order for this to work, we need to ensure that other processes try to perform one of e1,...,en only when this process is in this waiting state. Further, we need to be sure that no other process calls one of the main procedures f1,...,fn while this process is in this state. We can test for both of these requirements within our CSP models.

The restrictions in the previous paragraph prevent many processes from being directly implemented as monitors. In such cases we believe that we can often follow the pattern corresponding to the use of the Facet: having one monitor that performs most of the functionality, and a second monitor (like the Facet) that keeps track of the state of the main monitor, receives procedure calls, and passes data on to the main monitor where appropriate. In some such cases, it will also be necessary to follow the pattern corresponding to the use of the Arbitrator, to arbitrate in the case of race conditions. We leave further investigation of the relationship between CSP and monitors for future work.

6.2. Priorities

An interesting question concerns the behaviour of a system built as the parallel composition of prialts with differing priorities, such as P || Q where:

    def P = proc{ prialt( c1 -!-> { c1!1; } | c2 -!-> { c2!2; } ) }
    def Q = proc{ prialt( c2 -?-> { println(c2?); } | c1 -?-> { println(c1?); } ) }

It is clear to us that such a system should be able to communicate on either c1 or c2, since both components are able to; but we should be happy whichever way the choice between the channels is resolved. Consider the implementation in this paper. Suppose that P runs first, and registers with both of its channels before Q runs; then when Q tries to register with c2, it will receive a response of YES, so that branch will run: in other words, Q's priority will be followed. Similarly, if Q runs first, then P's priority will be followed. If both run at the same time, so that they both receive a response of MAYBE to their second registration attempt, then they will both pause; which channel is chosen depends upon the relative lengths of their pauses.

6.3. Future Plans

Finally, we have plans for developing the implementation of alts further. We would like to change the semantics of alt, so that the alt operator is responsible for performing the read or write of the branch it selects. This will remove the first restriction discussed at the end of Section 5. (This would also remove a possible source of bugs, where the programmer forgets to read or write the channel in question.) This would not change the basic protocol described in this paper.

A barrier synchronisation [10] allows n processes to synchronise together, for arbitrary n. It would be useful to extend alts to allow branches to be guarded by barrier synchronisations, as is allowed in JCSP [17].

Acknowledgements

We would like to thank Bernard Sufrin for implementing CSO and so interesting us in the subject, and also for numerous discussions involving the intended semantics for alts. We would also like to thank the anonymous referees for a number of useful comments and suggestions.



References

[1] Neil Brown. Communicating Haskell Processes: Composable explicit concurrency using monads. In Communicating Process Architectures (CPA 2008), pages 67–83, 2008.
[2] Neil Brown. Choice over events using STM. http://chplib.wordpress.com/2010/03/04/choice-over-events-using-stm/, 2010.
[3] Neil Brown. Conjoined events. In Proceedings of the Advances in Message Passing Workshop, 2010. http://twistedsquare.com/Conjoined.pdf.
[4] N. Carriero, D. Gelernter, and J. Leichter. Distributed data structures in Linda. In Proc. Thirteenth ACM Symposium on Principles of Programming Languages, pages 236–242, 1986.
[5] Formal Systems (Europe) Ltd. Failures-Divergence Refinement—FDR 2 User Manual, 1997. Available via URL http://www.formal.demon.co.uk/FDR2.html.
[6] Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice Herlihy. Composable memory transactions. In PPoPP '05, pages 48–60, 2005.
[7] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985.
[8] IEEE 802.3 Ethernet Working Group website, http://www.ieee802.org/3/.
[9] INMOS Ltd. The occam Programming Language. Prentice Hall, 1984.
[10] H. F. Jordan. A special purpose architecture for finite element analysis. In Proc. 1978 Int. Conf. on Parallel Processing, pages 263–266, 1978.
[11] Alistair A. McEwan. Concurrent Program Development. DPhil thesis, Oxford University, 2006.
[12] Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala. Artima Press, 2008.
[13] A. W. Roscoe. The Theory and Practice of Concurrency. Prentice Hall, 1997.
[14] Bernard Sufrin. Communicating Scala Objects. In Proceedings of Communicating Process Architectures (CPA 2008), 2008.
[15] Bernard Sufrin. CSO API documentation. http://users.comlab.ox.ac.uk/bernard.sufrin/CSO/doc/, 2010.
[16] Andrew S. Tanenbaum. Computer Networks. Prentice Hall, 1996.
[17] Peter Welch, Neil Brown, James Moores, Kevin Chalmers, and Bernhard Sputh. Integrating and extending JCSP. In Communicating Process Architectures (CPA 2007), 2007.
[18] Peter Welch, Neil Brown, James Moores, Kevin Chalmers, and Bernhard Sputh. Alting barriers: synchronisation with choice in Java using CSP. Concurrency and Computation: Practice and Experience, 22:1049–1062, 2010.

A. Code Listing

We give here the code for the MainAlt.

    private object MainAlt extends Pausable{
      private var waiting = false;           // flag to indicate the alt is waiting
      private var toRun = -1;                // branch that should be run
      private var allBranchesClosed = false; // are all branches closed?
      private var n = 0;                     // index of current event

      /* Execute the alt */
      def apply(): Unit = synchronized{
        Facet.changeStatus(INIT); Arbitrator.checkRace(INIT);
        var enabled = new Array[Boolean](eventCount); // values of guards
        var reged = new Array[Boolean](eventCount);   // is event registered?
        var nReged = 0;            // number of registered events
        var done = false;          // Have we registered all ports or found a match?
        var success = false;       // Have we found a match?
        var maybes = false;        // have we received a MAYBE?
        var timeoutMS : Long = 0;  // delay for timeout
        var timeoutBranch = -1;    // index of timeout branch
        var orElseBranch = -1;     // index of orelse branch
        if (priAlt) n = 0;
        toRun = -1; allBranchesClosed = false;

        // Evaluate guards; this must happen before registering with channels
        for(i ...
        var count = 0;
        while( ... ){
          if (!reged(n)){
            if (enabled(n)){
              event match{
                case ... =>   // timeout event
                  if (timeoutBranch>=0 || orElseBranch>=0)
                    throw new RuntimeException("Multiple timeout/orelse branches in alt");
                  else{ timeoutMS = tf(); timeoutBranch = n; reged(n) = true; }
                case Alt.OrElseEvent(_, _) =>
                  if (timeoutBranch>=0 || orElseBranch>=0)
                    throw new RuntimeException("Multiple timeout/orelse branches in alt");
                  else{ orElseBranch = n; reged(n) = true; }
                case _ => { // InPortEvent or OutPortEvent
                  event.register(theAlt, n) match{
                    case YES => {
                      Facet.changeStatus(DEREG);
                      toRun = n; done = true; success = true; }
                    case NO => { reged(n) = true; nReged += 1; }
                    case MAYBE => maybes = true;
                    case CLOSED => enabled(n) = false; // channel has just closed
                  } // end of event.register(theAlt, n) match
                } // end of case
              } // end of event match
            } // end of if (enabled(n))
          } // end of if (!reged(n))
          n = (n+1)%eventCount; count += 1;
        } // end of inner while

        if (!done){ // All registered, without finding a match
          if (maybes){
            // Random length pause to break symmetry
            Facet.changeStatus(PAUSE);
            pause; // see if a commit has come in
            toRun = Facet.getToRun;
            if (toRun ...
          }
        }
        if (!success){
          if ( ... ){ // untimed wait
            ... if (orElseBranch>=0) toRun = orElseBranch else throw new Abort;
            // Need to wait for a channel to become ready
            waiting = true;
            allBranchesClosed = Facet.setReged(nReged);
            if (!allBranchesClosed) while(waiting) wait(); // wait to be awoken
          }
          else{ // with timeout
            Facet.changeStatus(WAITTO);
            waiting = true;
            wait(timeoutMS); // wait to be awoken or for timeout
            if (waiting){
              // assume timeout was reached (this could be a spurious wakeup)
              if (Arbitrator.checkRace(TIMEDOUT)){ waiting = false; toRun = timeoutBranch; }
              else // A commit was received just before the timeout.
                while(waiting) wait() // Wait to be woken



            } // end of if (waiting)
          } // end of else (with timeout)
        } // end of if (!success)

        // Can now run branch toRun, unless allBranchesClosed
        if (allBranchesClosed)
          if (orElseBranch>=0) toRun = orElseBranch else throw new Abort;

        // Deregister events
        Facet.changeStatus(DEREG);
        for(n ...

    if
    :: (A == true) -> printf("A is true, B is unknown");
    :: (B == true) -> printf("B is true, A is unknown");
    :: else -> printf("A and B are false");
    fi

    do
    :: (skip) -> printf("If A is always true, then this may never be printed.");
                 break; /* breaks the do loop */
    :: (A == true) -> printf("A is true"); i = i + 1;
    od

If the SPIN model checker performs an automatic verification of the above code, it will visit every possible state until it aborts with the error "max search depth too small". The reason is that there is no deterministic set of values for i, so the system state space can never be completely explored. It is crucial that all control flows have a valid end-state, otherwise SPIN cannot verify the model.

The SPIN model checker can verify models written in Promela. In 1986, Vardi and Wolper [3] published the foundation for SPIN, an automata-theoretic approach to automatic program verification. SPIN [4] can verify a model for correctness by generating a C program that performs an exhaustive verification of the system state space. During simulation and verification, SPIN checks for the absence of deadlocks, livelocks, race conditions, unspecified receptions and unexecutable code. The model checker can also be used to show the correctness of system invariants, to find non-progress execution cycles, and to check linear-time temporal constraints, though we have not used any of those features for the model checking in this paper.

1. Related Work

Various possibilities for synchronous communication can be found in most network libraries, but we focus exclusively on network-enabled communication libraries that support Hoare's CSP algebra [5,6]. Several projects have investigated how to do CSP in a distributed environment. JCSP [7], Pony/occam-π [8] and C++CSP [9] provide network-enabled channels. Common to all three is that they use a specific naming for the channels, such that channels are reserved for one-to-one, one-to-any, network-enabled and so on. JCSP and C++CSP2 have the limitation that they can only perform external choice (alt) on some channel types. Pony enables transparent network support for occam-π. Schweigler and Sampson [8] write: "As long as the interface between components (i.e. processes) is clearly defined, the programmer should not need to distinguish whether the process on the other side of the interface is located on the same computer or on the other end of the globe". Unfortunately the Pony implementation in occam-π is difficult to use as the basis for a CSP library in languages like C++, Java or Python, as it relies heavily on the internal workings of occam-π. Pony/occam-π does

R.M. Friborg and B. Vinter / Verification of a Dynamic Channel Model


not currently have support for networked buffered channels. The communication overhead in Python is quite high, so we are especially interested in fast one-to-one buffered networked channels, because they have the potential to hide the latency of the network. For large parallel computations, this would make it possible to overlap computation with communication.

2. The Dynamic Channel

We present the basis for a dynamic channel type that combines multiple channel synchronisation mechanisms. The interface of the dynamic channel resembles a single channel type. When the channel is first created, it may be an any-to-any specialised for co-routines. The channel is then upgraded on request, depending on whether it participates in an alt and on the number of channel ends connected. The next synchronisation level for the channel may be an optimised network-enabled one-to-one with no support for alt. Every upgrade stalls the communication on the channel momentarily while all active requests for a read or write are transformed to a higher synchronisation level. The upgrades continue until the lowest common denominator (a network-enabled any-to-any with alt support) is reached.

This paper presents three models that are crucial parts of the dynamic channel design: a local channel synchronisation model for shared memory, a distributed synchronisation model, and the model for on-the-fly switching between synchronisation levels. We have excluded the following features to avoid state explosion during automatic verification: mobility of channel ends, termination handling, buffered channels, skip/timeout guards, and a discovery service for channel homes. Basically, we have simplified a larger model as much as possible and left out important parts, to focus on the synchronisation model handling the communication. The different models are written in Promela to verify the design using the SPIN model checker.
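The upgrade-only behaviour of the dynamic channel can be sketched as a tiny state machine. This Python sketch is ours, not part of the paper's models: the level names are hypothetical, and the real transformation of pending requests during an upgrade is only indicated by a comment.

```python
import threading

# Hypothetical synchronisation levels, ordered from cheapest to most general.
LEVELS = ["coroutine_any2any", "net_one2one", "net_any2any_alt"]

class DynamicChannel:
    """Sketch: a channel that only ever upgrades, never downgrades.
    upgrade_for() moves to the requested level if it is higher."""
    def __init__(self):
        self._lock = threading.Lock()
        self.level = 0                  # start at the cheapest level

    def upgrade_for(self, required_level):
        with self._lock:                # stall communication momentarily
            if required_level > self.level:
                # a real implementation would transfer all pending read/write
                # requests to the new synchronisation mechanism here
                self.level = required_level
            return LEVELS[self.level]
```

A request for a lower level than the current one is a no-op, which matches the paper's invariant that upgrades continue monotonically towards the most general level.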
The verification phase is presented in section 3, where the three models are model-checked successfully. The full model-checked models are available at the PyCSP repository [10]. After the following overview, the models are described in detail:

• the local synchronisation model is built around the two-phase locking protocol. It provides a single CSP channel type supporting any-to-any communication with basic read/write and external choice (alt).
• the distributed synchronisation model is developed from the local model, providing the same set of constructs. The remote communication is similar to asynchronous sockets.
• the transition model enables the combination of a local (and faster) synchronisation model with more advanced distributed models. Channels are able to change synchronisation mechanisms, for example based on the location of channel ends, making it a dynamic channel.

For all models presented we do not handle operating system errors that cause threads to terminate or lose channel messages. We assume that all models are implemented on top of systems that provide reliable threads and message protocols.

2.1. Channel Synchronisation with Two-Phase Locking

The channel model presented here is similar to the PyCSP implementation (threads and processes) from 2009 [11] and will work as a verification of the method used in [11,12]. It is a single CSP channel type supporting any-to-any communication with basic read/write and external choice (alt). In figure 1 we show an example of how the matching of channel operations comes about. Four processes are shown communicating on two channels using the presented design for negotiating read, write and external choice. Three requests have been posted to channel A



and two requests to channel B. During an external choice, a request is posted on multiple channels. Process 2 has posted its request to multiple channels and has been matched. Process 1 is waiting for a successful match. Process 3 has been matched and is going to remove its request. Process 4 is waiting for a successful match. In the future, process 1 and process 4 will be matched. The matching is initiated by both, but only one process marks the match as successful.


Figure 1. Example of four processes matching channel operations on two channels.

Listing 3. Simple model of a mutex lock with a condition variable. This is the minimum functionality which can be expected from any multi-threading library.

    typedef processtype {
      mtype state;
      bit lock;
      bit waitX;
    };
    processtype proc[THREADS];

    inline acquire(lock_id){
      atomic{ (proc[lock_id].lock == 0); proc[lock_id].lock = 1; }
    }
    inline release(lock_id){
      proc[lock_id].lock = 0;
    }
    inline wait(lock_id){
      assert(proc[lock_id].lock == 1);  /* lock must be acquired */
      atomic{
        release(lock_id);
        proc[lock_id].waitX = 0;        /* reset wait condition */
      }
      (proc[lock_id].waitX == 1);       /* wait */
      acquire(lock_id);
    }
    inline notify(lock_id){
      assert(proc[lock_id].lock == 1);  /* lock must be acquired */
      proc[lock_id].waitX = 1;          /* wake up waiting process */
    }
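For comparison, the same acquire/release/wait/notify contract can be written against a real threading library. This Python sketch (our own, for illustration; not part of the verified models) mirrors Listing 3, including its restriction to a single waiter and the assertions that the lock is held when waiting or notifying.

```python
import threading

class ProcLock:
    """Python analogue of Listing 3: a lock plus a single-waiter condition.
    wait() and notify() may only be called with the lock held."""
    def __init__(self):
        self._lock = threading.Lock()
        self._cond = threading.Condition(self._lock)
        self._signalled = False         # models the waitX bit

    def acquire(self): self._lock.acquire()
    def release(self): self._lock.release()

    def wait(self):
        assert self._lock.locked()      # lock must be acquired
        self._signalled = False         # reset wait condition
        while not self._signalled:      # cond.wait releases the lock while waiting
            self._cond.wait()

    def notify(self):
        assert self._lock.locked()      # lock must be acquired
        self._signalled = True          # wake up waiting process
        self._cond.notify()
```

As in the Promela model, a notify delivered while no process is between wait() calls would be lost; the paper avoids this by never having more than one waiting process per lock.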

We use the two-phase locking protocol for channel synchronisation. When two processes are requesting to communicate on a channel, we accept the communication by first acquiring the two locks, then checking the state of the two requests, then, if successful, updating the states, and finally releasing the two locks. This method requires many lock requests, resulting in a large overhead, but it has the advantage that it never has to roll back from trying to update a shared resource.

To perform the local synchronisation between threads, we implement the simple lock model shown in listing 3. This is straightforward to model in Promela, as every statement in Promela must be executable and will block the executing thread until it becomes executable. The implemented lock model is restricted to a single process calling wait. If multiple processes called wait, then the second could erase a recent notify. For the models in this paper, we never have more than one waiting process on each lock.

Now that we can synchronise processes, the process state proc[id].state can be protected on read and update. When blocked, we wait on a condition lock instead of wasting cycles using busy waiting, though the condition lock adds a little overhead. To avoid deadlocks, the process lock must be acquired before a process initiates a wait on a condition lock, and before another process notifies the condition lock. The process calls wait in write (Listing 4) and is blocked until notified by offer (Listing 6). The offer function is called by the matching algorithm, which is initiated when a request is posted.

To provide an overview, figure 2 shows a pseudo call graph of the model with all inline functions and the call relationships. A process can call read, write or alt to communicate on channels. These then post the necessary requests to the involved channels, and the matching algorithm calls offer for all matching pairs. Eventually a matching pair arrives at a success and the waiting process is notified.


Figure 2. Pseudo call graph for the local channel synchronisation.

In write (Listing 4), a write request is posted to the write queue of the channel and removed again after a successful match with a read request. The corresponding functions read, post_read and remove_read are not shown since they are similar, except that remove_read returns the read value.



Listing 4. The write construct and the functions for posting and removing write requests. The process index _pid contains the Promela thread id.

    inline write(ch_id, msg){
      proc[_pid].state = READY;
      post_write(ch_id, msg);
      /* if no success, then wait for success */
      acquire(_pid);
      if
      :: (proc[_pid].state == READY) -> wait(_pid);
      :: else skip;
      fi;
      release(_pid);
      assert(proc[_pid].state == SUCCESS);
      remove_write(ch_id)
    }

    inline post_write(ch_id, msg_to_write){
      /* acquire channel lock */
      atomic{ (ch[ch_id].lock == 0) -> ch[ch_id].lock = 1; }
      ...
      match(ch_id);
      ch[ch_id].lock = 0;  /* release channel lock */
    }

    inline remove_write(ch_id){
      /* acquire channel lock */
      atomic{ (ch[ch_id].lock == 0) -> ch[ch_id].lock = 1; }
      ...
      ch[ch_id].lock = 0;  /* release channel lock */
    }

When matching read and write requests on a channel, we use the two-phase locking protocol, where the locks of both involved processes are acquired before the system state is changed. To handle the specific cases where multiple processes have posted multiple read and write requests, a global ordering of the locks (Roscoe's deadlock rule 7 [13]) must be used, to make sure they are always acquired in the same order. In this local thread system we order the locks based on their memory address. This is both quick and ensures that the ordering never changes during execution. An alternative index for a distributed system would be to generate an index as a combination of the node address and the memory address.

Listing 5. Matching pairs of read and write requests for the two-phase locking.

    inline match(ch_id){
      w = 0; r = 0;
      do  /* Matching all reads to all writes */
      :: (r < LEN) ->
           w = 0;
           do
           :: (w < LEN) -> offer(ch_id, r, w); w = w+1;
           :: else break;
           od;
           r = r+1;
      :: else break;
      od;
    }
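The global lock ordering can be illustrated outside Promela as well. In this Python sketch (ours, for illustration; an integer id stands in for the memory address) both locks are always acquired in id order and released in reverse, so two concurrent offers on the same pair of requests cannot deadlock.

```python
import threading

class Request:
    """A posted channel request: a lock, a state and a message slot.
    lock_id stands in for the memory address used for global ordering."""
    _next_id = 0
    def __init__(self, msg=None):
        self.lock = threading.Lock()
        self.lock_id = Request._next_id
        Request._next_id += 1
        self.state = "READY"
        self.msg = msg

def offer(reader, writer):
    """Two-phase offer: acquire both locks in global lock_id order,
    match if both requests are still READY, release in reverse order."""
    first, second = sorted((reader, writer), key=lambda r: r.lock_id)
    first.lock.acquire()            # phase 1: acquire, always in the same order
    second.lock.acquire()
    try:
        if reader.state == writer.state == "READY":
            reader.state = writer.state = "SUCCESS"
            reader.msg = writer.msg  # transfer the message
            return True
        return False
    finally:                        # phase 2: release in reverse order
        second.lock.release()
        first.lock.release()
```

Because the states are only changed while both locks are held, a committed request can never be matched a second time.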



The two-phase locking in offer (Listing 6) is executed for every possible pair of read and write requests found by match (Listing 5). The first phase acquires locks and the second phase releases locks. Between the two phases, updates can be made. When a match is successful, three things are updated: the condition lock of each process is notified, the message is transferred from the writer to the reader, and proc[id].state is updated.

One disadvantage of the two-phase locking is that we may have to acquire the locks of many read and write requests that are not in a ready state. The impact of this problem can easily be reduced by testing the state variable before acquiring the lock. Normally, such an unlocked test would be a race condition. However, a request can never change back to the ready state once it has been committed, for as long as it remains posted on the channel. Because of this, the state can be tested before acquiring the lock, to decide whether time should be spent acquiring the lock at all. Once the lock is acquired, the state must be checked again to ensure that the request is still in the ready state. PyCSP [10] uses this approach in a similar offer method to reduce the number of acquired locks.

Listing 6. The offer function offering a possible successful match between two requests.

    inline offer(ch_id, r, w){
      r_pid = ch[ch_id].rqueue[r].id;
      w_pid = ch[ch_id].wqueue[w].id;
      if  /* acquire locks using global ordering */
      :: (r_pid < w_pid) -> acquire(r_pid); acquire(w_pid);
      :: else -> acquire(w_pid); acquire(r_pid);
      fi;
      if  /* Do the two processes match? */
      :: (proc[r_pid].state == READY && proc[w_pid].state == READY) ->
           proc[r_pid].state = SUCCESS;
           proc[w_pid].state = SUCCESS;
           /* Transfer message */
           ch[ch_id].rqueue[r].msg = ch[ch_id].wqueue[w].msg;
           ch[ch_id].wqueue[w].msg = NULL;
           proc[r_pid].result_ch = ch_id;
           proc[w_pid].result_ch = ch_id;
           notify(r_pid);
           notify(w_pid);
           /* break match loop by updating w and r */
           w = LEN; r = LEN;
      :: else skip;
      fi;
      if  /* release locks using reverse global ordering */
      :: (r_pid < w_pid) -> release(w_pid); release(r_pid);
      :: else -> release(r_pid); release(w_pid);
      fi;
    }
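The state pre-test optimisation used by PyCSP can be sketched in Python (a sketch under our own names, not PyCSP's actual code). The unlocked peek is safe only because a committed request can never return to READY; the state is then re-checked under the locks.

```python
import threading

class Req:
    """Minimal request record (hypothetical names, for illustration)."""
    _next = 0
    def __init__(self, msg=None):
        self.lock = threading.Lock()
        self.lock_id = Req._next
        Req._next += 1
        self.state = "READY"
        self.msg = msg

def offer_pretested(reader, writer):
    # Cheap unlocked peek: a committed request never returns to READY,
    # so a non-READY peek is final and we can skip both lock acquisitions.
    if reader.state != "READY" or writer.state != "READY":
        return False
    first, second = sorted((reader, writer), key=lambda r: r.lock_id)
    with first.lock, second.lock:
        # Re-check under the locks: the unlocked peek may have been stale.
        if reader.state == "READY" and writer.state == "READY":
            reader.state = writer.state = "SUCCESS"
            reader.msg = writer.msg   # transfer the message
            return True
        return False
```

This is the classic "test, then test again under the lock" pattern; it trades one extra comparison for avoiding many useless lock acquisitions.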

The alt construct shown in listing 7 is basically the same as a read or write, except that the same process state is posted to multiple channels, thus ensuring that only one branch will be matched. The alt construct should scale linearly with the number of guards. For the verification of the model we simplify alt to accept only two guards. If the model is model-checked successfully with two guards, we expect an extended model to model-check successfully with more than two guards. Adding more guards to the alt construct in listing 7 is a very simple task, but it enlarges the system state space and is unnecessary for the results presented in this paper.

Listing 7. The alt construct.

    inline alt(ch_id1, op1, msg1, ch_id2, op2, msg2, result_chan, result){
      proc[_pid].state = READY;
      result = NULL;
      if
      :: (op1 == READ) -> post_read(ch_id1);
      :: else post_write(ch_id1, msg1);
      fi;
      if
      :: (op2 == READ) -> post_read(ch_id2);
      :: else post_write(ch_id2, msg2);
      fi;
      acquire(_pid);
      /* if no success, then wait for success */
      if
      :: (proc[_pid].state == READY) -> wait(_pid);
      :: else skip;
      fi;
      release(_pid);
      assert(proc[_pid].state == SUCCESS);
      if
      :: (op1 == READ) -> remove_read(ch_id1, result);
      :: else remove_write(ch_id1);
      fi;
      if
      :: (op2 == READ) -> remove_read(ch_id2, result);
      :: else remove_write(ch_id2);
      fi;
      result_chan = proc[_pid].result_ch;
    }
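The key property of alt — one request posted to several channels, of which at most one can commit — follows directly from the shared process state. A minimal Python sketch of that invariant (illustrative names; not the verified model):

```python
import threading

class AltRequest:
    """One request object shared between all channels an alt is posted to;
    the single state variable ensures at most one channel can commit it."""
    def __init__(self):
        self.lock = threading.Lock()
        self.state = "READY"
        self.result_ch = None

    def try_commit(self, ch_id):
        with self.lock:
            if self.state != "READY":
                return False          # another channel already matched it
            self.state = "SUCCESS"
            self.result_ch = ch_id    # like result_ch in Listing 6
            return True

# post the same request to two channels; only the first match can succeed
req = AltRequest()
wins = [req.try_commit("A"), req.try_commit("B")]
```

Whichever channel commits first wins; the loser's offer simply fails, which is why the alt process is woken exactly once.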

2.2. Distributed Channel Synchronisation

The local channel synchronisation described in the previous section has a process waiting until a match has been made. The matching protocol performs a continuous two-phase locking for all pairs, so a match with the waiting process is constantly being attempted even though the process itself is passive. This method is not possible in a distributed model with no shared memory; instead, an extra process is created to function as a remote lock, protecting updates of the posted channel requests. Similar to the local channel synchronisation, we must lock both processes in the offer function and retrieve the current process state from each process. Finally, when a match is found, both processes are notified and their process states are updated.

In figure 3, an overview of the distributed model is shown. The communicating process can call read, write or alt to communicate on channels. These then post the necessary requests to the involved channels through a Promela message channel. The channel home (channelThread) receives the request and initiates the matching algorithm to search for a successful offer amongst all matching pairs. During an offer, the channel home communicates with the lock processes (lockThread) to ensure that no other channel home conflicts. Finally, a matching pair arrives at a success and the lock process can notify the waiting process.

In listing 8 all Promela channels are created with a buffer size of 10 to model an asynchronous connection. We have chosen a buffer size of 10, as this is large enough never to get filled during the verification in section 3. Every process communicating on a channel is required to have a lock process (Listing 9) associated, to handle the socket communication going in on the proc_*_chan types.

R.M. Friborg and B. Vinter / Verification of a Dynamic Channel Model

[Figure 3 omitted: pseudo call graph showing the communicating process (Channel.read, Channel.write, alt), its lockThread (Lock.wait, Lock.notify) and the channelThread (Channel.match, Channel.offer) exchanging POST_READ/POST_WRITE, REMOVE_READ/REMOVE_WRITE, ACQUIRE_LOCK/ACCEPT_LOCK and RELEASE_LOCK/NOTIFY_SUCCESS messages over the network.]

Figure 3. Pseudo call graph for the distributed channel synchronisation.

Listing 8. Modeling asynchronous sockets.

/* Direction: communicating process -> channelThread */
chan ch_cmd_chan[C] = [10] of { byte, byte, byte }; /* cmd, pid, msg */
#define POST_WRITE 1
#define POST_READ 2
#define REMOVE_WRITE 3
#define REMOVE_READ 4

/* Direction: channelThread -> communicating process */
chan proc_cmd_chan[P] = [10] of { byte, byte, byte }; /* cmd, ch, msg */
#define REMOVE_ACK 9

/* Direction: channelThread -> lockThread */
chan proc_acquire_lock_chan[P] = [10] of { byte }; /* ch */

/* Direction: lockThread -> channelThread */
chan ch_accept_lock_chan[C] = [10] of { byte, byte }; /* pid, proc_state */

/* Direction: channelThread -> lockThread */
chan proc_release_lock_chan[P] = [10] of { byte, byte, byte } /* cmd, ch, msg */
#define RELEASE_LOCK 7
#define NOTIFY_SUCCESS 8
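As a rough analogue of the bounded Promela channels above (a Python sketch, not part of the model; the class and method names are invented for illustration), each direction of the asynchronous connection can be pictured as a bounded FIFO, with the channel home consuming (command, pid, msg) requests:

```python
from queue import Queue

POST_READ, POST_WRITE = "POST_READ", "POST_WRITE"

class ChannelHome:
    """Hypothetical sketch of a channel home fed by a bounded queue."""
    def __init__(self, capacity=10):  # buffer size 10, as in the model
        self.cmd_queue = Queue(maxsize=capacity)
        self.readers, self.writers = [], []

    def post(self, cmd, pid, msg=None):
        # Non-blocking as long as fewer than `capacity` requests are queued,
        # which mirrors the assumption that the buffer never fills up.
        self.cmd_queue.put((cmd, pid, msg))

    def service_one(self):
        # Consume one request and record it; the matching algorithm that
        # would run next is omitted here.
        cmd, pid, msg = self.cmd_queue.get()
        if cmd == POST_READ:
            self.readers.append(pid)
        elif cmd == POST_WRITE:
            self.writers.append((pid, msg))
```

The bounded capacity corresponds to the `[10]` buffer in the Promela declarations: a full queue would make `put` block, which the verification shows never happens.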


The lockThread in listing 9 handles the remote locks used for reading and updating the process state from the channel home thread. The two functions remote_acquire and remote_release are called from the channel home process during the offer procedure. The lockThread and the communicating process use the mutex lock operations from listing 3 for synchronisation.

Listing 9. The lock process for a communicating process.

proctype lockThread(byte id) {
  byte ch_id, cmd, msg;
  byte ch_id2;
  bit locked;
  do
  :: proc_acquire_lock_chan[id] ? ch_id ->
       ch_accept_lock_chan[ch_id] ! id, proc[id].state;
       locked = 1;
       do
       :: proc_release_lock_chan[id] ? cmd, ch_id2, msg ->
            if
            :: cmd == RELEASE_LOCK ->
                 assert(ch_id == ch_id2);
                 break;
            :: cmd == NOTIFY_SUCCESS ->
                 assert(ch_id == ch_id2);
                 acquire(id);               /* mutex lock op */
                 proc[id].state = SUCCESS;
                 proc[id].result_ch = ch_id2;
                 proc[id].result_msg = msg;
                 notify(id);                /* mutex lock op */
                 release(id);               /* mutex lock op */
            fi;
       od;
       locked = 0;
  :: proc_cmd_chan[id] ? cmd, ch_id, msg ->
       if
       :: cmd == REMOVE_ACK ->
            proc[id].waiting_removes--;
       fi;
  :: timeout ->
       assert(locked == 0);
       assert(proc[id].waiting_removes == 0);
       break;
  od;
}

inline remote_acquire(ch_id, lock_pid, get_state) {
  proc_acquire_lock_chan[lock_pid] ! ch_id;
  ch_accept_lock_chan[ch_id] ? id, get_state;
  assert(lock_pid == id);
}

inline remote_release(ch_id, lock_pid) {
  proc_release_lock_chan[lock_pid] ! RELEASE_LOCK, ch_id, NULL;
}


The offer function in listing 10 performs a distributed version of the function in listing 6. In this model we exchange the message from the write request to the read request, update the process state to SUCCESS, notify the condition lock and release the lock process, all in one transmission on the Promela channel proc_release_lock_chan. We may still have to acquire the locks of many read and write requests that are not in the ready state. Acquiring the locks is now more expensive than in the local channel model, and it happens more often, due to the latency of getting old requests removed. If an extra flag is added to a request, the offer function can update the flag on success. If the flag is set, we know that the request has already been accepted, and we avoid the extra remote lock operations. If the flag is not set, the request may still be old and not ready, as it might have been accepted by another process.

Listing 10. The offer function for distributed channel communication.

inline offer(ch_id, r, w) {
  r_pid = ch[ch_id].rqueue[r].id;
  w_pid = ch[ch_id].wqueue[w].id;
  if /* acquire locks using global ordering */
  :: (r_pid < w_pid) ->
       remote_acquire(ch_id, r_pid, r_state);
       remote_acquire(ch_id, w_pid, w_state);
  :: else ->
       remote_acquire(ch_id, w_pid, w_state);
       remote_acquire(ch_id, r_pid, r_state);
  fi;
  if /* do the two processes match? */
  :: (r_state == READY && w_state == READY) ->
       proc_release_lock_chan[r_pid] !
         NOTIFY_SUCCESS, ch_id, ch[ch_id].wqueue[w].msg;
       proc_release_lock_chan[w_pid] !
         NOTIFY_SUCCESS, ch_id, NULL;
       w = LEN; r = LEN; /* break match loop */
  :: else skip;
  fi;
  if /* release locks using reverse global ordering */
  :: (r_pid < w_pid) ->
       remote_release(ch_id, w_pid);
       remote_release(ch_id, r_pid);
  :: else ->
       remote_release(ch_id, r_pid);
       remote_release(ch_id, w_pid);
  fi;
}
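The deadlock-freedom argument rests on acquiring the two process locks in a fixed global order and releasing them in reverse. A minimal Python sketch of that discipline (illustrative only; the names and data structures are invented, not taken from the model) is:

```python
import threading

# Hypothetical per-process locks and states for four processes.
locks = {pid: threading.Lock() for pid in range(4)}
state = {pid: "READY" for pid in range(4)}

def offer(r_pid, w_pid, msg, results):
    # Acquire both process locks smallest-id first; since every concurrent
    # offer uses the same total order, no cycle of lock waits can form.
    first, second = sorted((r_pid, w_pid))
    with locks[first], locks[second]:
        if state[r_pid] == "READY" and state[w_pid] == "READY":
            state[r_pid] = state[w_pid] = "SUCCESS"
            results[r_pid] = msg  # message moves from writer to reader
            return True
        return False  # one side already matched: offer fails
```

As in the Promela version, an offer against a request that is no longer READY simply fails, leaving the stale request to be removed later.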

Every channel must have a channel home, where the read and write requests for communication are held and where offers are made. The channel home invokes the matching algorithm for every posted request, as the post_* functions did in the local channel model. In this model, every channel home is a process (Listing 11). In another implementation, there might be only one process per node, maintaining multiple channel homes through a simple channel dictionary.
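The single-process-per-node alternative mentioned above could look roughly like the following Python sketch (hypothetical; the class and command names are invented for illustration), where one server demultiplexes requests for many channel homes via a dictionary:

```python
class NodeChannelServer:
    """Hypothetical sketch: one thread per node serving many channel homes."""
    def __init__(self):
        self.homes = {}  # ch_id -> (readers, writers)

    def handle(self, ch_id, cmd, pid, msg=None):
        # Look up (or lazily create) the queues for this channel home.
        readers, writers = self.homes.setdefault(ch_id, ([], []))
        if cmd == "POST_READ":
            readers.append(pid)
        elif cmd == "POST_WRITE":
            writers.append((pid, msg))
        # A real server would now run the matching algorithm for ch_id.
```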

Listing 11. The channel home process.

proctype channelThread(byte ch_id) {
  DECLARE_LOCAL_CHANNEL_VARS
  do
  :: ch_cmd_chan[ch_id] ? cmd, id, msg ->
       if
       :: cmd == POST_WRITE ->
            match(ch_id);
       :: cmd == POST_READ ->
            match(ch_id);
       :: cmd == REMOVE_WRITE ->
            proc_cmd_chan[id] ! REMOVE_ACK, ch_id, NULL;
       :: cmd == REMOVE_READ ->
            proc_cmd_chan[id] ! REMOVE_ACK, ch_id, NULL;
       fi;
  :: timeout -> /* controlled shutdown */
       /* read and write queues must be empty */
       assert(ch[ch_id].rlen == 0 && ch[ch_id].wlen == 0);
       break;
  od;
}

The functions read, write and alt for the distributed channel model are identical to those of the local channel model. We can now transfer a message locally using the local channel model, or between nodes using the distributed channel model.

2.3. Dynamic Synchronisation Layer

The following model allows channels to change their synchronisation mechanism on-the-fly. This means that a local channel can be upgraded to become a distributed channel. The upgrade may be triggered by a remote process requesting to connect to the local channel. The model presented in this section cannot detect which synchronisation mechanism to use; it must be set explicitly. If channel-ends were part of the implementation, a channel could keep track of the location of all channel-ends and would thus know which mechanism to use.

A feature of the dynamic synchronisation mechanism is that specialised channels can be used, such as a low-latency one-to-one channel, resulting in improved communication time and lower latency. A specialised channel may not support constructs like external choice (alt), but if an external choice occurs the channel is upgraded. The upgrade procedure adds an overhead, but since channels are typically used more than once, this is acceptable.

Figure 4 shows an overview of the transition model. In the figure, the communicating process calls read or write to communicate on channels. These in turn call the enter, wait and leave functions. The enter function posts the request to the channel. The wait function ensures that the request is posted at the correct synchronisation level; otherwise it calls the transcend function. The leave function is called when the request has been matched successfully. The model includes a thread that may at any time trigger a switch in synchronisation level and thus force a call to the transcend function.
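In outline, the enter/wait/leave lifecycle with a transcend step can be sketched in Python as follows (an analogue of the idea, not the Promela model; the classes and helpers are invented, and blocking is elided):

```python
class Channel:
    """Hypothetical channel with a mutable synchronisation level."""
    def __init__(self):
        self.sync_level = 0
        self.posted = {}  # request -> level it was posted at

    def post(self, req):
        self.posted[req] = self.sync_level

    def remove(self, req, level):
        self.posted.pop(req, None)

class DynamicChannelEnd:
    def __init__(self, channel):
        self.channel = channel
        self.proc_sync_level = None
        self.state = "READY"

    def enter(self):
        # Record the channel's current level, then post at that level.
        self.proc_sync_level = self.channel.sync_level
        self.channel.post(self)

    def wait(self):
        while self.state == "READY":
            if self.proc_sync_level != self.channel.sync_level:
                self.transcend()  # level moved on: re-post the request
            else:
                break  # the real model would block here until notified

    def transcend(self):
        # Remove using the old level, then re-enter at the new level.
        self.channel.remove(self, self.proc_sync_level)
        self.enter()
```

The essential invariant mirrored here is that a request is always removed with the level it was posted at and re-posted with the channel's current level.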


[Figure 4 omitted: pseudo call graph showing Channel.read and Channel.write split into enter, wait and leave parts, transcend_read/transcend_write removing a request at the old sync_level and re-posting it at the new one, Lock.wait/Lock.notify around Channel.match and Channel.offer, and a thread invoking Channel.switch_sync_level.]

Figure 4. Pseudo call graph for the dynamic synchronisation layer.

To model the transition between two levels (layers), we set up two groups of channel request queues and a synchronisation level variable per channel. Every access to a channel variable includes the channel id and the new synchronisation level variable sync_level. Every communicating process is viewed as a single channel-end and is provided with a proc_sync_level. This way the communicating process knows which synchronisation level it is currently at, even when the channel's sync_level variable changes. The synchronisation level of a channel may change at any time through the switch_sync_level function in listing 12.

The match and offer functions from section 2.1 have been extended with a sync_level parameter used to access the channel container. The post_* functions update the proc_sync_level variable to the channel synchronisation level before posting a request, while the remove_* functions read the proc_sync_level variable and use the methods of that level to remove the request. Apart from this, the functions match, offer, post_* and remove_* are similar to the ones from the local channel model.

The switching of synchronisation level in listing 12 works by notifying all processes that have a request for communication posted to the channel. The channel sync_level variable is changed before the processes are notified. In listing 14, when a process either tries to enter wait or is awoken by the notification, it checks that the proc_sync_level variable of the posted request still matches the sync_level variable of the channel. If these do not match, the transcend function (Listing 13) is activated. During a transition, the proc_state variable is temporarily changed to SYNC, so that the request is not matched by another process between release and leave_read. The leave_read function calls remove_read, which uses the proc_sync_level variable to remove the request, and enter_read calls post_read, which uses the updated channel sync_level variable.

Listing 12. Switching the synchronisation level of a channel.

inline switch_sync_level(ch_id, to_level) {
  byte SL;
  byte r, w, r_pid, w_pid;
  SL = ch[ch_id].sync_level;
  atomic { (ch[ch_id].lvl[SL].lock == 0) -> ch[ch_id].lvl[SL].lock = 1; } /* acquire */
  ch[ch_id].sync_level = to_level;

  /* Notify connected processes */
  r = 0;
  do
  :: (r < ch[ch_id].lvl[SL].rlen) ->
       r_pid = ch[ch_id].lvl[SL].rqueue[r];
       acquire(r_pid);
       if
       :: proc_state[r_pid] == READY ->
            notify(r_pid); /* Notify process to transcend */
       :: else -> skip;
       fi;
       release(r_pid);
       r = r + 1;
  :: else break;
  od;
  w = 0;
  do
  :: (w < ch[ch_id].lvl[SL].wlen) ->
       w_pid = ch[ch_id].lvl[SL].wqueue[w];
       acquire(w_pid);
       if
       :: proc_state[w_pid] == READY ->
            notify(w_pid); /* Notify process to transcend */
       :: else -> skip;
       fi;
       release(w_pid);
       w = w + 1;
  :: else break;
  od;
  ch[ch_id].lvl[SL].lock = 0; /* release */
}

Listing 13. The transition mechanism for upgrading posted requests.

inline transcend_read(ch_id) {
  proc_state[pid] = SYNC;
  release(pid);
  leave_read(ch_id);
  enter_read(ch_id);
  acquire(pid);
}

In listing 14, the read function from the local channel model (Section 2.1) is split into an enter, a wait and a leave part. To upgrade blocking processes we use the transition mechanism in listing 13, which can only be used between an enter part and a leave part. We require that every synchronisation level has an enter part, a wait/notify state and a leave part.


Listing 14. The read function is split into an enter, wait and leave part.

inline enter_read(ch_id) {
  proc_state[pid] = READY;
  post_read(ch_id);
}

inline wait_read(ch_id) {
  /* if no success, then wait for success */
  acquire(pid);
  do
  :: (proc_sync_level[pid] == ch[ch_id].sync_level) &&
     (proc_state[pid] == READY) ->
       wait(pid);
  :: (proc_sync_level[pid] != ch[ch_id].sync_level) &&
     (proc_state[pid] == READY) ->
       transcend_read(ch_id);
  :: else break;
  od;
  release(pid);
}

inline leave_read(ch_id) {
  assert(proc_state[pid] == SUCCESS || proc_state[pid] == SYNC);
  remove_read(ch_id);
}

inline read(ch_id) {
  enter_read(ch_id);
  wait_read(ch_id);
  leave_read(ch_id);
}

The three models presented can be used separately in new projects, or they can be combined into the following: a CSP library for a high-level programming language where channel-ends are mobile and can be sent to remote locations. The channel is automatically upgraded, which means that the communicating processes can exist as co-routines, threads and nodes. Specialised channel implementations can be used without the awareness of the communicating processes. Any channel implementation working at a synchronisation level in the dynamic channel must provide six functions to the dynamic synchronisation layer: enter_read, wait_read, leave_read, enter_write, wait_write and leave_write.

3. Verification Using SPIN

The commands in listing 15 verify the state-space of a SPIN model written in Promela. The verification process checks for the absence of deadlocks, livelocks, race conditions, unspecified receptions, unexecutable code and violations of user-specified assertions. One of these user-specified assertions checks that the message is transferred correctly for a channel communication. All verifications were run in a single thread on an Intel Xeon E5520 with 24 GB of DDR3 memory with ECC.

Listing 15. The commands for running an automatic verification of the models.

spin -a model.p
gcc -o pan -O2 -DVECTORSZ=4196 -DMEMLIM=24000 -DSAFETY \
    -DCOLLAPSE -DMA=1112 pan.c
./pan


The local and the distributed channel models are verified for six process configurations, and the transition model is verified for three process configurations. The results from running the SPIN model checker are listed in table 1. The automatic verification of the models found no errors. The "threads in model" column shows the threads needed for running the configuration in the specific model. The number of transitions in table 1 does not relate to how a real implementation of the model performs; it is the total number of different transitions between states. If the number of transitions is high, then the model allows a large number of statements to happen in parallel. The SPIN model checker tries every possible transition, and if all transitions are legal the model is verified successfully for a process configuration. This means that for the verified configuration, the model has no deadlocks, no livelocks, no starvation and no race conditions, and does not fail with a wrong end-state.

The longest running verification that completed was the distributed model for the configuration in figure 5(f). This configuration completed after verifying the full state-space in 9 days. This means that adding an extra process to the model would multiply the total number of states to a level where we would not be able to complete a verification of the full state-space. The DiVinE model checker [14] is a parallel LTL model checker that should be able to handle larger models than SPIN by performing a distributed verification. DiVinE has not been used with the models presented in this paper.

Table 1. The results from using the SPIN model checker to verify models.

Model                  Configuration  Threads in model  Depth  Transitions
Local                  Fig. 5(a)             2            91   1217
Local                  Fig. 5(b)             2           163   10828
Local                  Fig. 5(c)             3           227   149774
Local                  Fig. 5(d)             4           261   2820315
Local                  Fig. 5(e)             3           267   420946
Local                  Fig. 5(f)             3           336   2056700
Distributed            Fig. 5(a)             5           151   90260
Distributed            Fig. 5(b)             6           245   28042640
Distributed            Fig. 5(c)             7           326   18901677
Distributed            Fig. 5(d)             9           446   1.1157292e+09
Distributed            Fig. 5(e)             8           406   6.771875e+08
Distributed            Fig. 5(f)             8           532   1.2102407e+10
Transition sync layer  Fig. 5(a)             3           162   43277
Transition sync layer  Fig. 5(c)             4           346   18567457
Transition sync layer  Fig. 5(d)             5           467   3.9206391e+09

The process configurations in figure 5 cover a wide variety of possible transitions for the local and distributed models. None of the configurations checks a construct with more than two processes, but we expect the configurations to be correct for more than two processes. The synchronisation mechanisms are the same for a reading process and a writing process in the presented models. Based on this, we can expect that all the configurations in figure 5 can be mirrored and model-checked successfully. The local one-to-one communication is handled by the configuration in figure 5(a). The configurations in figures 5(c) and 5(d) cover the one-to-any and any-to-any cases, and we expect any-to-one to be correct as well, since it is a mirrored version of one-to-any. The alt construct supports both input and output guards, so figure 5(b) presents an obvious configuration to verify. In CSP networks this configuration does not make sense, but the verification of the configuration in figure 5(b) shows that two competing alts configured with the worst-case priority do not cause any livelocks. We must also model-check the cases where an alt communicates with reads or writes (Figure 5(e)).


[Figure 5 omitted: six process configurations (a)-(f) combining read, write and alt processes on channels: (a) read-write, (b) alt-alt, (c) one writer with several readers, (d) several writers and readers, (e) alt with reads and writes, (f) alts on one-to-any and any-to-one channels.]

Figure 5. Process configurations used for verification.

Finally, the configuration in figure 5(f) verifies alts communicating on one-to-any and any-to-one channels. These configurations cover most situations for up to two processes.

4. Conclusions

We have presented three building blocks for a dynamic channel capable of transforming its internal synchronisation mechanisms during execution. The change in synchronisation mechanism is a basic part of the channel and can occur at any time. In the worst case, the communicating processes will see a delay caused by having to repost a communication request to the channel.

Three models have been presented and model-checked: the shared memory channel synchronisation model, the distributed channel synchronisation model and the dynamic synchronisation layer. The SPIN model checker has been used to perform an automatic verification of these models separately. During the verification it was checked, using assertions, that the communicated messages were transferred correctly. All models were found to verify with no errors for a variety of configurations of communicating sequential processes. The full model of the dynamic channel has not been verified, since its large state-space may make it unsuited for exhaustive verification using a model checker.

With the results from this paper, we can also conclude that the synchronisation mechanism in the current PyCSP [11,12] can be model-checked successfully by SPIN. The current PyCSP uses the two-phase locking approach with total ordering of locks, which has now been shown to work correctly for both the shared memory model and the distributed model.

4.1. Future Work

The equivalence between the dynamic channel presented in this paper and CSP channels, as defined in the CSP algebra, remains to be shown. Through equivalence, it can also be shown that networks of dynamic channels function correctly. The models presented in this paper will be the basis for a new PyCSP channel that can start out as a simple pipe and evolve into a distributed channel spanning multiple nodes.
This channel will support mobility of channel ends, termination handling, buffering, scheduling of lightweight processes, skip and timeout guards and a discovery service for channel homes.


5. Acknowledgements

The authors would like to extend their gratitude for the rigorous review of this paper, including numerous constructive proposals from the reviewers.

References

[1] David Beazley. Understanding the Python GIL. http://dabeaz.com/python/UnderstandingGIL.pdf. Presented at PyCon 2010.
[2] Rune M. Friborg and Brian Vinter. Rapid Development of Scalable Scientific Software Using a Process Oriented Approach. Journal of Computational Science, page 11, March 2011.
[3] Moshe Y. Vardi and Pierre Wolper. An Automata-Theoretic Approach to Automatic Program Verification. Proc. First IEEE Symp. on Logic in Computer Science, pages 322-331, 1986.
[4] Gerard J. Holzmann. The Model Checker SPIN. IEEE Trans. on Software Engineering, pages 279-295, May 1997.
[5] C.A.R. Hoare. Communicating Sequential Processes. Communications of the ACM, pages 666-676, August 1978.
[6] C.A.R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985.
[7] Peter H. Welch, Neil Brown, James Moores, Kevin Chalmers, and Bernhard Sputh. Integrating and Extending JCSP. In A.A. McEwan, S. Schneider, W. Ifill, and P. Welch, editors, Communicating Process Architectures 2007, July 2007.
[8] M. Schweigler and A. Sampson. pony - the occam-π Network Environment. Communicating Process Architectures 2006, pages 77-108, January 2006.
[9] Neil C. Brown. C++CSP Networked. In Ian R. East, David Duce, Mark Green, Jeremy M. R. Martin, and Peter H. Welch, editors, Communicating Process Architectures 2004, pages 185-200, September 2004.
[10] PyCSP distribution. http://code.google.com/p/pycsp.
[11] Rune M. Friborg, John Markus Bjørndalen, and Brian Vinter. Three Unique Implementations of Processes for PyCSP. In Communicating Process Architectures 2009, pages 277-292, 2009.
[12] Brian Vinter, John Markus Bjørndalen, and Rune M. Friborg. PyCSP Revisited. In Communicating Process Architectures 2009, pages 263-276, 2009.
[13] A. W. Roscoe. The Theory and Practice of Concurrency. Prentice-Hall International Series in Computer Science, 2005.
[14] J. Barnat, L. Brim, M. Češka, and P. Ročkai. DiVinE: Parallel Distributed Model Checker. In Parallel and Distributed Methods in Verification 2010, pages 4-7, 2010.

Communicating Process Architectures 2011 P.H. Welch et al. (Eds.) IOS Press, 2011 © 2011 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-774-1-55


Programming the CELL-BE using CSP

Kenneth SKOVHEDE a,*, Morten N. LARSEN a and Brian VINTER a
a eScience Center, Niels Bohr Institute, University of Copenhagen

Abstract. The current trend in processor design seems to focus on using multiple cores, similar to a cluster-on-a-chip model. These processors are generally fast and power efficient, but due to their highly parallel nature, they are notoriously difficult to program for most scientists. One such processor is the CELL broadband engine (CELL-BE), which is known for its high performance, but also for a complex programming model which makes it difficult to exploit the architecture to its full potential. To address this difficulty, this paper proposes to change the programming model to use the principles of CSP design, thus making it simpler to program the CELL-BE and avoid livelocks, deadlocks and race conditions. The CSP model described here comprises a thread library for the synergistic processing elements (SPEs) and a simple channel based communication interface. To examine the scalability of the implementation, experiments are performed with both scientific computational cores and synthetic workloads. The implemented CSP model has a simple API and is shown to scale well for problems with significant computational requirements.

Keywords. CELL-BE, CSP, Programming

Introduction

The CELL-BE processor is an innovative architecture that attempts to tackle the problems that prevent processors from achieving higher performance [1,2,3]. The limitations in traditional processors are primarily problems relating to heat, clock frequency and memory speed. Instead of using the traditional chip design, the CELL-BE consists of multiple units, effectively making it a cluster-on-a-chip processor with high interconnect speed. The CELL-BE processor consists of a single PowerPC (PPC) based processor connected to eight SPEs1 through a 204.8 GB/s EIB2 [4]. The computing power of a CELL-BE chip is well investigated [5,6], and a single CELL blade with two CELL-BE processors can yield as much as 460 GFLOPS [7] at one GFLOPS per Watt [7].

Unfortunately, the computing power comes at the price of a very complex programming model. As there is no cache coherent shared memory in the CELL-BE, the processes must explicitly transfer data between the units using a DMA model which resembles a form of memory mapped IO [8,4]. Furthermore, to fully utilize the CELL-BE, the application must use task-, memory-, data- and instruction-level (SIMD3) parallelization [5]. A number of papers discuss various computational problems on the CELL-BE, illustrating that achieving good performance is possible, but the process is complex [5,9,10]. In this paper we focus on the communication patterns and disregard instruction-level and data parallelization methods, because they depend on application specific computations and cannot be easily generalized.

C.A.R. Hoare introduced the CSP model in 1978, along with the concept of explicit communication through well-defined channels. Using only channel based communication, each

* Corresponding Author: E-mail: .
1 Synergistic Processing Elements, a RISC based processor.
2 Element Interconnect Bus.
3 Single Instruction Multiple Data.


participating process becomes a sequential program [11,12]. Using the CSP algebra, it is possible to prove that a CSP based program is free from deadlocks and livelocks [11]. Furthermore, CSP based programs are easy to understand, because the processes consist of sequential code and channels handle the communication between them. This normally means that the individual processes have very little code, but that the total number of processes is very high. This work uses the CSP design rules and not the CSP algebra itself. By using a CSP-like interface, we can hide the underlying complexity from the programmer, giving the illusion that all transfers are simply channel communications. We believe that this abstraction greatly simplifies the otherwise complex CELL-BE programming model. By adhering to the CSP model, the implementation automatically obtains properties from CSP, such as being free of race conditions and having detectable deadlocks. Since the library does not use the CSP algebra, the programmer does not have to learn a new language but can still achieve many of the CSP benefits.

1. Related Work

A large number of programming models for the CELL-BE are available [13,14,15,16], illustrating the need for a simpler interface to this complex machine. Most general purpose libraries cannot be directly used on the CELL-BE, because the SPEs use a different instruction set than the PPC. Furthermore, the limited amount of memory available on the SPEs makes it difficult to load a general purpose library onto them.

1.1. Programming Libraries for the CELL-BE

The ALF [13] system allows the programmer to build a set of dependent tasks which are then scheduled and distributed automatically according to their dependencies. The OpenMP [14] and CellSs [15] systems provide automatic parallelization of otherwise sequential code through the use of code annotation.
As previously published [16], the Distributed Shared Memory for the CELL-BE (DSMCBE) is a distributed shared memory system that gives the programmer the "illusion" that the memory in a cluster of CELL-BE machines is shared. The channel based communication system described in this paper uses the communication system from DSMCBE, but does not use any DSM functionality. It is possible to use both communication models at the same time; however, this is outside the scope of this paper. The CellCSP [17] library shares the goals of the channel based system described in this paper, but schedules independent processes with a focus on processes rather than communication.

1.2. CSP Implementations

The Transterpreter [18] is a virtual machine that can run occam-π programs. By modifying the Transterpreter to run on the SPEs [19], it becomes possible to execute occam-π on the CELL-BE processor and also utilize the SPEs. The Transterpreter implementation that runs on the CELL-BE [19] has been extended to allow programs running in the virtual machine to access some of the SPE hardware. A similar project, trancell [20], allows a subset of occam-π to run on the SPU by translating Extended Transputer Code to SPU binary code.

Using occam-π requires that the programmer learns and understands the occam-π programming language and model, and also requires that the programs are re-written in occam-π. The Transterpreter for the CELL-BE has an extension that allows callbacks to native code [19], which can mitigate this issue to some extent.

A number of other CSP implementations are available, such as C++CSP [21], JCSP [22] and PyCSP [23]. Although these may work on the CELL-BE processor, they can currently


only utilize the PPC and not the high performing SPEs. We have used the simplified channel interface in the newest version of PyCSP [24] as a basis for developing the channel communication interface. Since DSMCBE [16] is written in C, we have produced a flattened, non-object-oriented interface.

2. Implementation

This section gives a short introduction to DSMCBE and describes some design and implementation details of the CSP library. For a more detailed description and evaluation of the DSMCBE system, see previous work [16].

2.1. Distributed Shared Memory for the CELL-BE (DSMCBE)

As mentioned in the introduction, the basis for the implementation is the DSMCBE system. The main purpose of DSMCBE is to provide the user with a simple API that establishes a distributed shared memory system on the CELL-BE architecture. Apart from its main purpose, the underlying framework can also be adjusted to serve as a more generic platform for communication between the PowerPC element (PPE) and the Synergistic Processing Elements (SPEs). Figure 1 shows the DSMCBE model along with the components involved. The DSMCBE system consists of four elements, which we describe below:

Figure 1. DSMCBE Internal Structure.

The DSMCBE PPE/SPE modules contain the DSMCBE functions which the programmer calls from user code. To manipulate objects in the system, the programmer uses the functions from the modules to create, acquire and release objects. In addition, the two modules are responsible for communicating with the main DSMCBE modules, which are located on the PPC. The PPE handler is responsible for handling communication between the PPC user code and the request coordinator (see below). Like the PPE handler, the SPE handler is responsible for handling communication between user code on the SPEs and the request coordinator. However, the SPE handler also manages allocation and deallocation of Local Store (LS) memory, which enables it to perform memory management without interrupting the SPEs.


The DSMCBE library uses a single processing thread, called the request coordinator, which is responsible for servicing requests from the other modules. Components communicate with the request coordinator by supplying a target for the answer. Using this single-thread approach makes it simpler to execute atomic operations and reduces the number of locks to a pair per participating component. Each PPC thread and each SPE functions as a single component, so the request coordinator cannot determine whether a participant is a PPC thread or an SPE. As most requests must pass through the request coordinator, an obvious drawback to this method is that it easily becomes a bottleneck. With this communication framework it is straightforward to implement channel based communication, as the request coordinator can simply be extended to handle channel requests.

2.2. Extending DSMCBE with Channel Based Communication for CELL-BE

This section describes how we propose to extend the DSMCBE model with channel based communication. We have used the DSMCBE system as a framework to ensure atomicity and enable memory transfers within the CELL-BE processor. The implementation does not use any DSM methods and consists of a separate set of function calls. We have intentionally made the programming model very simple; it consists of only six functions:

• dsmcbe_csp_channel_create
• dsmcbe_csp_channel_read
• dsmcbe_csp_channel_write
• dsmcbe_csp_channel_poison
• dsmcbe_csp_item_create
• dsmcbe_csp_item_free

All functions return a status code which describes the outcome of the call.

2.2.1. Channel Communication

The basic idea in the communication model is to use channels to communicate. There are two operations defined for this: dsmcbe_csp_channel_read and dsmcbe_csp_channel_write. As in other CSP implementations, the read and write operations block until a matching request arrives, making each operation a synchronized atomic event. When writing to a channel, the calling process must supply a pointer to the data area. The result of a read operation is a pointer to a data area, as well as the size of the data area. After receiving a pointer, the caller is free to read and write the contents of the area. As the area is exclusively owned by the process, there is no possibility of a race condition. As C allows writes to arbitrary memory locations, it is the programmer's responsibility not to use the data area after a call to write. Logically, the caller can consider the dsmcbe_csp_channel_write operation as transferring the data and ownership of the area to the recipient. After receiving a pointer from a read operation, and possibly modifying the data area, the process may forward the pointer again using dsmcbe_csp_channel_write. As the reading process has exclusive ownership of the data area, it is also responsible for freeing the data area if it is no longer needed. The operations produce the same result regardless of which CELL-BE processor the call originates from. If both processes are in the same memory space the data is not copied, ensuring maximal speed. If the data requires a transfer, the library will attempt to do so in the most efficient manner.

2.2.2. Transferable Items

The CELL-BE processor requires that data is aligned and has certain block sizes, a constraint that is not normally encountered by a programmer. We have chosen to expose a simple pair of functions that mimic the well-known malloc and free functions, called dsmcbe_csp_item_create and dsmcbe_csp_item_free, respectively. A process wishing to communicate can allocate a block of memory by calling the dsmcbe_csp_item_create function and get a standard pointer to the allocated data area. The process is then free to write data into the allocated area. After a process has used a memory block, it can either forward the block to another channel, or release the resources held by calling dsmcbe_csp_item_free.

2.2.3. Channel Creation

When the programmer wants to use a channel, it is necessary to create it by calling the dsmcbe_csp_channel_create function. To distinguish channels, the create function must be called with a unique number, similar to a channel name or channel object in other CSP systems. This channel number is used to uniquely identify the channel in all subsequent communication operations. The create function allows the caller to set a buffer size on the channel, thus allowing channel writers to write data into the channel without awaiting a matching reader. A buffer in the CSP model works by generating a chain of processes where each process simply reads and writes an element; the number of processes in the chain determines the size of the buffer. The semantics of the implemented buffer are the same as those of a chain of processes, but the implementation uses a more efficient method with a queue. The channel type specifies the expected use of the channel, with the following options: one-to-one, one-to-any, any-to-one, any-to-any and one-to-one-simple. Using the channel type it is possible to verify that the communication patterns correspond to the intended use. In situations where the participating processes do not change, it is possible to enable "low overhead" communication by using the channel type one-to-one-simple. Section 2.2.8 describes this optimization in more detail.
A special convention borrowed from the DSMCBE model is that read or write operations on a channel that has not yet been created will cause the caller to block. Since a program must call the create function exactly once for each channel, some start-up situations are difficult to handle without this convention. Once the channel is created, all pending operations are processed as if they had been issued after the channel creation.

2.2.4. Channel Poison

As all calls are blocking, they can complicate the shutdown phase of a CSP network. The current implementation supports a channel poison state, which causes all pending and subsequent operations on that channel to return the poison. To poison a channel, a process calls dsmcbe_csp_channel_poison with the id of an existing channel. When using poison, it is important to check the return value of the read and write operations, as they may return the poison status. A macro named CSP_SAFE_CALL can be used to check the return value and exit the current function when poison is encountered. However, the programmer remains fully responsible for making the program handle and distribute poison correctly.

2.2.5. External Choice

As a read operation is blocking, it is not possible to wait for data on more than one channel, nor is it possible to probe a channel for its content. If a process could see whether or not a channel has content, a race condition could be introduced: a second process could read the item right after the probe, so that the probing process would then block on its read. To solve this issue, CSP uses the concept of external choice, where a process can request data from multiple channels and then gets a response once a channel is ready. To use external choice, the process must call a variation of the dsmcbe_csp_channel_read function named dsmcbe_csp_channel_read_alt, where "alt" is short for "alternation", the term used in C.A.R. Hoare's original paper [25]. Using this function, the process can block for a read operation on multiple channels. When one of the channels has data, the data is returned, as with the normal read operation, along with the id of the originating channel. This way of dealing with reads ensures that race conditions cannot occur. With the channel selection done externally, the calling process has no way of controlling which channel to read from, should there be multiple available choices. To remedy this, the calling process must also specify which strategy to use if multiple channels are ready. The JCSP library offers three strategies: arbitrary, priority and fair. Arbitrary picks a channel at random, whereas priority chooses the first available channel, prioritized by the order in which the channels are given. Fair selection keeps count of the number of times each channel has been selected and attempts to even out the usage of the channels. The current implementation of CSP channels for CELL-BE only supports priority select, but the programmer can emulate the two other modes. Similar to the read function, an alternation variant of the write function allows a process to write to the first available channel. This function also supports a selection strategy and returns the id of the channel written to. There is currently no mechanism to support the simultaneous selection of channel readers and writers, though there are other ways of engineering this.

2.2.6. Guards

To prevent a call from blocking, the calling process can supply a guard, which takes effect when no data is available. The implementation defines a reserved channel number, acting as a skip guard, which can be given as a channel id when requesting a read or write on multiple channels. If the operation would otherwise block, the call returns a NULL pointer and the skip guard number as the channel value.
Other CSP implementations also offer a time-out guard, which performs a skip, but only if the call blocks for a certain period. This functionality is not available in the current implementation, but could be added without much complication.

2.2.7. Processes for CELL-BE

The hardware in the CELL-BE is limited to a relatively low number of physical SPEs, which prevents the creation of a large number of CSP processes. To remedy this situation, the implementation also supports running multiple processes on each SPE. Since the SPEs have little support for timed interrupts, the implementation is purely based on cooperative switching. To allow multiple processes on the SPE, we have used an approach similar to CELL-MT [26], basically implementing a user-mode thread library based on the standard C functions setjmp and longjmp. The CSP threading library implements the main function and allocates ABI-compliant stacks for each of the processes when started. After setting up the multithreading environment, the scheduler is activated, which transfers control to the first process. Since the main function is implemented by the library, the user code must instead implement a process entry function, which is activated for each process in turn. This means that all processes running on a single SPE must use the same entry function, but each process can query its unique id, which can be used to determine what code the process will execute. When a process is executing, it can cooperatively yield control by calling the library's yield function, which will save the process state and transfer control to the next available process. Whenever a process is waiting for an API response, the library will automatically call a similar conditional yield function, which yields only if another process is ready to execute, meaning that the other process is not currently awaiting an API response. The effect of this is that each API call appears to be blocking, allowing the programmer to write a fully sequential program and transparently run multiple processes. As there is no preemptive scheduling of threads, it is possible for a single process to prevent other processes from executing. This is a common trade-off between allowing the SPE to execute code at full speed and ensuring progress in all processes. It can be remedied by inserting calls to the yield function inside computationally heavy code, which allows the programmer to balance single-process execution and overall system progress in a fine-grained manner. The scheduler is a simple round-robin scheduler using a ready queue and a waiting queue. The number of threads possible is limited primarily by the amount of available LS memory, which is shared among program code, stack and data. The running time of the scheduler is O(N), which we deem sufficient, given that all processes share the limited LS, making more than 8 processes per SPE unrealistic.

2.2.8. SPE-to-SPE Communication

Since the PPC is rarely a part of the actual problem solving, memory blocks can often be transferred directly from SPE to SPE without passing through main memory. If an SPE is writing to a buffered channel, the data may not be read immediately after the write. Thus, the SPE may run out of memory, since the data is kept on the SPE in anticipation of an SPE-to-SPE transfer. To remedy this, the library will flush data to main memory if an allocation would otherwise fail. This is in effect a caching system, and as such it is subject to the usual benefits and drawbacks of a cache. One noticeable drawback is that, due to the limited available memory, the SPEs are especially prone to memory fragmentation, which happens more often when using a cache, as the memory stays fully populated for longer periods.
If the channel is created with the type one-to-one-simple, the first communication will be used to determine the most efficient communication pattern, removing some of the internal synchronization otherwise required. If two separate SPEs are communicating, the communication will be handled locally in the SPE handler shown in Figure 1, eliminating the need to pass messages through the request coordinator. A similar optimization is employed if two processes on the same SPE communicate. In this case the data is kept on the SPE, and all communication is handled locally on the SPE in the DSMCBE SPE module shown in Figure 1. Due to the limited amount of memory available on the SPE, data may be flushed out if the channel has large buffers or otherwise exhausts the available memory. These optimizations can only work if the communication is done in a one-to-one fashion where the participating processes never change. Should the user code attempt to use such a channel in an unsupported manner, an error code will be returned.

2.2.9. Examples

To illustrate the usage of the channel based communication, Listing 1 shows four simple CSP processes. Listing 2 presents a simple example that uses the alternation method to read two channels and writes the sum to an output channel.

3. Experiments

When evaluating system performance, we focus mainly on the scalability aspect. If the system scales well, further optimizations may be made specific to the application, utilizing the SIMD capabilities of the SPEs. The source code for the experiments is available online.


#include <dsmcbe_csp.h>

int delta1(GUID in, GUID out) {
    void* value;

    while(1) {
        CSP_SAFE_CALL("read", dsmcbe_csp_channel_read(in, NULL, &value));
        CSP_SAFE_CALL("write", dsmcbe_csp_channel_write(out, value));
    }
}

int delta2(GUID in, GUID outA, GUID outB) {
    void *inValue, *outValue;
    size_t size;

    while(1) {
        CSP_SAFE_CALL("read", dsmcbe_csp_channel_read(in, &size, &inValue));
        CSP_SAFE_CALL("allocate", dsmcbe_csp_item_create(&outValue, size));

        memcpy(outValue, inValue, size); // Copy contents as we need two copies

        CSP_SAFE_CALL("write A", dsmcbe_csp_channel_write(outA, inValue));
        CSP_SAFE_CALL("write B", dsmcbe_csp_channel_write(outB, outValue));
    }
}

int prefix(GUID in, GUID out, void* data) {
    CSP_SAFE_CALL("write", dsmcbe_csp_channel_write(out, data));

    return delta1(in, out);
}

int tail(GUID in, GUID out) {
    void* tmp;

    CSP_SAFE_CALL("read", dsmcbe_csp_channel_read(in, NULL, &tmp));
    CSP_SAFE_CALL("free", dsmcbe_csp_item_free(tmp));

    return delta1(in, out);
}

Listing 1. Four simple CSP processes.

int add(GUID inA, GUID inB, GUID out) {
    void *data1, *data2;

    GUID channelList[2];
    channelList[0] = inA;
    channelList[1] = inB;

    GUID chan;

    while(1) {
        dsmcbe_csp_channel_read_alt(CSP_ALT_MODE_PRIORITY, channelList, 2, &chan, NULL, &data1);
        dsmcbe_csp_channel_read(chan == inA ? inB : inA, NULL, &data2);

        *(int*)data1 = *((int*)data1) + *((int*)data2);

        dsmcbe_csp_item_free(data2);
        dsmcbe_csp_channel_write(out, data1);
    }
}

Listing 2. Reading from two channels with alternation read and external choice. To better fit the layout of the article, the CSP_SAFE_CALL macro is omitted.


All experiments were performed on an IBM QS22 blade, which contains 2 connected CELL-BE processors, giving access to 4 PPE cores and 16 SPEs.

3.1. CommsTime

A common benchmark for any CSP implementation is the CommsTime application, which sets up a ring of processes that simply forward a single message. The conceptual setup is shown in Figure 2. This benchmark measures the communication overhead of the channel operations, since almost no computation is required in the processes. To better measure the scalability of the system, we have deviated slightly from the normal CommsTime implementation by inserting extra successor processes as needed. This means that each extra participating process adds an extra channel, and thus produces a longer communication ring. Figure 3 shows the CommsTime when communicating among SPE processes. The PPE records the time between each received message, thus measuring the time it takes for the message to traverse the ring. The time shown is an average over 10 runs of 10,000 iterations. As can be seen, the time stabilizes around 80 μs when using one thread per SPE. When using two, three, or four threads per SPE, the times stabilize around 38 μs, 27 μs, and 20 μs respectively. When using multiple threads, much of the communication is performed internally on the SPEs, which results in a minimal communication overhead and causes the average communication time to decrease.
Figure 2. Conceptual setup for the CommsTime experiment with 4 SPEs.

We have executed the CommsTime sample from the JCSP library v.1.1rc4 on the PPE. The JCSP sample uses four processes in a setup similar to Figure 2, but with all processes placed on the PPE. Each communication took on average 63 μs, which is faster than our implementation, which runs at 145 μs on the PPE. Even though JCSP is faster, it does not utilize the SPEs, and therefore cannot exploit the full potential of the CELL-BE.

3.2. Prototein Folding

Prototeins are a simplified 2D model of a protein, with only two amino acids and only 90-degree folds [27]. Folding a prototein is computationally simpler than folding a full protein, but exhibits the same computational characteristics. Prototein folding can be implemented with a bag-of-tasks type solution, illustrated in Figure 4, where partially folded prototeins are placed in the bag. The partially folded prototeins have no interdependencies, but may differ in the required number of combinations and thus the required computational time. As seen in Figure 5, the problem scales very close to linearly with the number of SPEs, which is to be expected for this type of problem. This indicates that the communication latency is not a limiting factor, which also explains why the number of SPE threads has very little effect on the scalability.

Figure 3. CommsTime using 2-16 SPEs with 1-4 threads per SPE.

Figure 4. Conceptual setup for Prototein folding with 3 SPEs.

Figure 5. Speedup of prototein folding using 1-16 SPEs.

3.3. k Nearest Neighbors (kNN)

The kNN application is a port of a similar application written for PyCSP [28]. Where the PyCSP model is capable of handling an extreme number of concurrent processes, our library is limited by the number of available SPEs and the number of threads each SPE can accommodate. Because of this, the source code of the two applications is hard to compare directly, but the overall approach and communication patterns are the same. Figure 6 shows a conceptual ring based setup for finding the kNN.
Figure 6. Conceptual setup for the kNN experiment with 4 SPEs, each running 2 threads.

This ring-based approach means that each process communicates only with its neighbor. To support arbitrary problem sizes, one of the channels is buffered. The underlying system will attempt to keep data on the SPE in anticipation of a transfer, but as the SPE runs out of memory, the data will be swapped to main memory. This happens completely transparently to the process, but adds an unpredictable overhead to the communication. This construction allows us to run the same problem size on 1 to 16 SPEs.

Figure 7. Speedup of the k Nearest Neighbors problem using 1-16 SPEs to search for 10 nearest neighbors in a set with 50k elements with 72 dimensions.

As seen in Figure 7, the speedup is not linear, but given the interdependencies we consider this a fairly good result. Figure 7 also shows that using threads to run multiple solver processes on each SPE offers a performance gain, even though the processes compete for the limited LS memory. This happens because the threads implement an implicit form of double buffering, allowing each SPE to mask communication delays with computation. The achieved speedup indicates that there is a good balance between the communication and computation performed in the experiment. The speedup for both graphs is calculated from the measured time for running the same problem size on a single SPE with a single solver thread.


3.4. Communication to Computation Ratio

The ring based communication model used in the kNN experiment is quite common for problems that use an n² approach. However, the scalability of such a setup is highly dependent on the amount of work required in each subtask. To quantify the communication to computation ratio required for a well-scaling system, we have developed a simple ring based program that allows us to adjust the number of floating point operations performed between communications. The computation performed is adjustable and does not depend on the size of the transmitted data, allowing us to freely experiment with the computational workload. The setup for this communication system is shown in Figure 8. The setup is identical to the one used in the kNN experiment, except that instead of having two communicating processes on the same SPE, the processes are spread out. This change causes the setup to lose the very fast internal SPE communication channels, which places more load on the PPE and thus gives a more realistic measurement of the communication delays.

Figure 8. Conceptual setup for non-structured ring based communication.

As seen in Figure 9, the implementation scales well if the computation performed in each ring iteration is around 100 Mflop. Comparing the two graphs in Figure 9 shows that increasing the number of threads on the SPEs results in a decrease in performance. This happens because the extra processes introduce more communication. This increase in communication places a bigger strain on the PPE, which results in more latency than the processes can hide. In other words, the threads cause more latency than they can hide in this setup. The speedup for both graphs in Figure 9 is calculated from measurements of a run with the same data size on a single SPE with a single thread. Comparing the communication to computation experiment with the kNN experiment reveals that the use of optimized channels reduces the latency of requests to a level where the threads are unable to hide the remaining latency. In other words, the latency becomes so low that the thread switching overhead is larger than the latency it attempts to hide. This is consistent with the results from the CommsTime experiment, which show that the communication time is very low when performing inter-SPE communication. This does not mean that the latency is as low as it can be, but it means that the extra communication generated by the threads increases the amount of latency that must be hidden.

4. Future Work

The main problem with any communication system is the overhead introduced by the communication. As the experiments show, this overhead exists but can be hidden, because the CELL-BE and the library are capable of performing communication and computation simultaneously. However, this hiding only works if the computational part of a program is sufficiently large. To remedy this, the communication overhead should be reduced significantly.



Figure 9. Communication To Computation ratio, 16 bytes of data.

The decision to use the request coordinator to handle the synchronization simplifies the implementation, but also introduces two performance problems. One problem is that if the system becomes overwhelmed with requests, the execution will be sequential, as the processes will only progress as fast as the request coordinator responds to messages. The other problem is that the requests pass through both the SPU handler and the request coordinator, which adds load to the system and latency to each communication operation.

4.1. Reduce Request Latency

Since the SPEs are the main workhorse of the CELL-BE, it makes sense to move much of the decision logic into the SPU handler rather than handle it in the request coordinator. The request coordinator is a legacy item from the DSM system, but there is nothing that prevents participating PPE processes from communicating directly with the SPU handler.

4.2. Increase Parallelism

Even if the request coordinator is removed completely, the PPE can still be overwhelmed with requests, which will make everything run sequentially rather than in parallel. It is not possible to completely remove a single synchronization point, but many communication operations involve exactly two processes. In the common case where these two processes reside on separate SPEs, it is possible to perform direct SPE-to-SPE communication through the use of signals and DMA transfers. If this is implemented, it will greatly reduce the load on the PPE for all the presented experiments.


4.3. Improve Performance of the SPU Handler

The current implementation uses a shared spinning thread that constantly checks for SPE and request coordinator messages. It is quite possible that this can be improved by using a thread for each SPE that relies on SPE events rather than spinning. Experiments performed for the DSMCBE [16] system show that improving the SPU handler can improve the overall system performance.

4.4. Improve Memory Exhaustion Handling

When the communication is handled internally by the SPEs, it is likely that they will run out of memory. If the SPU handler is involved, such situations are detected and handled gracefully. Since this is essentially a cache system, a cache policy can greatly improve the performance of the system by selectively choosing which elements to remove from the LS and when such an operation is initiated.

4.5. Process Migration

Processes are currently bound to the SPE that started them, but it may turn out that a setup is ineffective and can be improved by moving communicating processes closer together, i.e. to the same SPE. There is limited support for this in the CELL-BE architecture itself, but the process state can be encapsulated to involve only the current thread stack and active objects. However, it may prove impossible to move a process, as other data may occupy the same LS area on the target SPE. Since the C language uses pointers, data locations cannot be changed during a switch from one SPE to another. One solution to this could be to allocate processes in slots, such as those used in CELL CSP [17].

4.6. Multiple Machines

The DSMCBE system already supports multiple machines, using standard TCP/IP communication. It would be desirable to also support multiple machines for CSP. The main challenge with multiple machines is to implement a well-scaling version of the alternation operations, because the involved channels can span multiple machines. This could use the cross-bar approach used in JCSP [29].

5. Conclusion

In this paper we have described a CSP-inspired communication model and a thread library that can help programmers handle the complex programming model of the CELL-BE. We have shown that even though the presented models introduce some overhead, it is possible to get good speedup for most problems. On the other hand, Figure 9 shows that if the computation to communication ratio is too low, meaning too little computation per communication, it is very hard to scale the problems to utilize all 16 SPEs. However, we believe that for most programmers solving reasonably sized problems, the tools provided can significantly simplify the writing of programs for the CELL-BE architecture. We have also shown that threads can be used to mask some latency, but at the same time they generate some latency, which limits their usefulness to certain problems. DSMCBE and the communication model described in this paper are open source software under the LGPL license and are available online.


Acknowledgements The authors acknowledge the Danish National Advanced Technology Foundation (grant number 09-067060) and the innovation consortium (grant number 09-052139) for supporting this research project. Furthermore the authors acknowledge Georgia Institute of Technology, its Sony-Toshiba-IBM Center of Competence, and the National Science Foundation, for the use of Cell Broadband Engine resources that have contributed to this research.


Communicating Process Architectures 2011 P.H. Welch et al. (Eds.) IOS Press, 2011 © 2011 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-774-1-71


Static Scoping and Name Resolution for Mobile Processes with Polymorphic Interfaces

Jan Bækgaard PEDERSEN 1, Matthew SOWDERS
School of Computer Science, University of Nevada, Las Vegas

Abstract. In this paper we consider a refinement of the concept of mobile processes in a process oriented language. More specifically, we investigate the possibility of allowing resumption of suspended mobile processes with different interfaces. This is a refinement of the approach taken currently in languages like occam-π. The goal of this research is to implement varying resumption interfaces in ProcessJ, a process oriented language being developed at UNLV.

Keywords. ProcessJ, process oriented programming, mobile processes, static name resolution

Introduction

In this paper we redefine static scoping rules for mobile processes with polymorphic (multiple possible varying) suspend/resume interfaces, and develop an algorithm to perform correct name resolution. One of the core ideas behind mobile processes is the ability to suspend execution (almost) anywhere in the code and return control to the caller, who can then treat the suspended process as a piece of data that can be transmitted to a different (physical) location and, at a later point in time, resumed to continue executing from where it left off. We shall use the word start for the first time a mobile procedure is executed/invoked, and resume for all subsequent executions/invocations. Let us illustrate the problem with an example from occam-π. In occam-π [16], mobile processes are all initially started and subsequently resumed with the original (procedure) interface; that is, every resumption requires the same parameter list, even if some of these parameters have no meaning for the code that is to be executed. An example from [17] is shown in Figure 1. The reindelf process only uses the initialise channel (line 1) in the in station compound (initialise local state) code block (line 7). For each subsequent resumption (lines 11, 13, and 15) of this process, a 'dummy' channel-end must be passed as the first parameter. The channel end represents a channel on which no communication is ever going to happen. Not only does that make the code harder to read, but it also opens the possibility of incorrect code should the channel be used for communication in the subsequent code blocks. Similarly, should subsequent resumptions of the process require different channels, the initial call must provide 'dummy' values for these the first time the process is called.

1 Corresponding Author: Jan Bækgaard Pedersen, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV, 89154, United States of America.
Tel.: +1 702 895 2557; Fax: +1 702 895 2639; E-mail: [email protected].


J.B. Pedersen and M. Sowders / Static Scoping and Name Resolution for Mobile Processes

 1: MOBILE PROC reindelf (CHAN AGENT.INITIALIZE initialize?,
 2:                       SHARED CHAN AGENT.MESSAGE report!,
 3:                       SHARED CHAN INT santa.a!, santa.b!)
 4:   IMPLEMENTS AGENT
 5:   ... local state declarations
 6:   SEQ
 7:     ... in station compound (initialise local state)
 8:     WHILE TRUE
 9:       SEQ
10:         ... in station compound
11:         SUSPEND -- move to gathering place
12:         ... in the gathering place
13:         SUSPEND -- move to santa's grotto
14:         ... in santa's grotto
15:         SUSPEND -- move to compound
16: :

Figure 1. occam-π example.

For ProcessJ [13], a process oriented language being developed at the University of Nevada, Las Vegas, we propose a different approach to mobile process resumption. When a process explicitly suspends, it defines with which interface it should be resumed. This of course means that parameters from the previous resumption are no longer valid. Static scoping analysis as we know it no longer suffices to perform name resolution. In this paper we present a new approach to name resolution for mobile processes with polymorphic interfaces. In ProcessJ, a suspend point is represented by the three keywords suspend resume with followed by a parameter list in parentheses (like a formal parameter list for a procedure as found in most languages). A suspended mobile process is resumed by a simple invocation using the name of the variable holding the reference to it, followed by a list of actual parameters (like a regular procedure call). For example, if a suspended mobile is held in a variable f, and the interface defines one integer parameter, then f(42) is a valid resumption. Let us start with a small example without any channels or local variables:

1: mobile void foo(int x, int y) {
2:   B1
3:   while (B2) {
4:     B3
5:     suspend resume with (int z);
6:     B4
7:   }
8:   B5
9: }

Figure 2. Simple ProcessJ example.

The first (and only) time B1 is executed, it has access to the parameters x and y from the original interface (line 1). The first time B2 is executed will be immediately after the execution of B1, which had access to the parameters x and y; B2, however, cannot access x or y, as we will see shortly. If B2 evaluates to true the first time it is reached, the process will execute B3 and suspend itself. B4 will be executed when the process is resumed through the interface that declares the parameter z (line 5). The previous parameters x and y are now no longer valid. To realize why these parameters should no longer be valid, imagine they held channels to the previous local environment (the caller's


environment) in which the process was executed, but in which it no longer resides; these channels can no longer be used, so it is imperative that the parameters holding references to them not be used again. Therefore, B4 can only reference the z parameter, and not x and y. But what happens now when B2 is reached a second time? x and y are no longer valid, but what about z? Naturally, z cannot be referenced by B2 either, as the first time B2 was reached, the process was started through the original interface, and there was no z in that interface. Furthermore, if we look closely at the code, we also realize that the first time the code in block B3 is reached, just like B2, the parameters from the latest process resumption (which here is also the first) would be x and y. The second time the code block B3 is executed will be during the second execution of the body of the while loop. This means that foo has been suspended and resumed once, and since the interface of the suspend statement has just one parameter, namely z, and not x and y, neither x nor y can be referenced. So in general, we cannot guarantee that x and y can be referenced anywhere except block B1. The same argument holds for z in block B4. We can illustrate this by creating a table with a trace of the program and by listing with which parameters the most recent resumption of the process happened. Table 1 shows a trace of the process where B2 is evaluated three times, the first two times to true, and the last time to false. By inspecting Table 1, we see that both B2 and B3 can be reached with disjoint sets of parameters, therefore disallowing references to both x and y as well as z. B5 could have appeared with the parameters x and y had B2 evaluated to false the first time it was evaluated; thus we can draw the same conclusion for B5 as we did for B2 and B3.

Table 1. Trace of sample execution.

Started/resumed interface   Block   Parameters from latest resumption   Remarks
foo(x, y)                   —       —                                   foo(int x, int y)
                            B1      {x, y}
                            B2      {x, y}                              B2 = true
                            B3      {x, y}                              suspend resume with (int z);
foo(z)                      —       —
                            B4      {z}
                            B2      {z}                                 B2 = true
                            B3      {z}                                 suspend resume with (int z);
foo(z)                      —       —
                            B4      {z}
                            B2      {z}                                 B2 = false
                            B5      {z}
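The suspend/resume behaviour traced in Table 1 can be mimicked with plain Python generators (a conceptual sketch only, not ProcessJ): each yield plays the role of a suspend point, and the value sent in on resumption plays the role of the new parameter list.

```python
# Conceptual sketch of a mobile process with differing start/resume
# interfaces, using a Python generator. The trace records which block ran
# and which parameters were visible to it.

def foo(x, y):                 # start interface: foo(int x, int y)
    trace = [("B1", {"x": x, "y": y})]
    (z,) = yield trace         # suspend resume with (int z)
    trace.append(("B4", {"z": z}))
    return trace

m = foo(1, 2)                  # start the mobile with x=1, y=2
t = next(m)                    # runs B1, then suspends at the yield
try:
    m.send((42,))              # resume with z=42; runs B4 and finishes
except StopIteration as done:
    t = done.value

assert t == [("B1", {"x": 1, "y": 2}), ("B4", {"z": 42})]
```

After resumption, only z is visible, mirroring the argument above that x and y cannot be referenced past the suspend point.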

Table 2 shows in which blocks (Bi) the three interface parameters can be referenced. Later on we shall add local variables to the code and redo the analysis.

Table 2. Parameters that can be referenced in various blocks.

Parameter   Blocks that may reference it
x           B1
y           B1
z           B4

If we had changed z to x (retaining the shared type int), x would suddenly also be a valid reference in the blocks B2, B3, and B5; that is, everywhere in the body of the procedure.


We start by examining the parameters of the interfaces, and later return to incorporate the local variables (for which regular static scoping rules apply) into a single name resolution pass containing both parameters and local variables. In the next section we look at related work, and then proceed in section 2 to present a method for constructing a control flow graph (CFG) based on the ProcessJ source code. In section 3 we define sets of declarations to be used in the computation of valid references, in section 4 we illustrate how to compute these sets, and in section 5 we present the new name resolution algorithm for mobile processes with polymorphic interfaces. Finally, we wrap up with a result section and some thoughts about future work.

1. Related Work

The idea of code mobility has been around for a long time. In 1969 Jeff Rulifson introduced a language called the Decode-Encode-Language (DEL) [15]. One could download a DEL program from a remote machine, and the program would control communication and efficiently use limited bandwidth between the local and remote hosts [4]. Though not exactly similar to how a ProcessJ process can be sent to different computational environments, DEL could be considered the beginning of mobile agents. Resumable processes are similar to mobile agents. In [5], Chess et al. provide a classification of Mobile Code Languages. In a Mobile Code Language, a process can move from one computational environment to another. A computational environment is a container of components, not necessarily a host. For example, two Java Virtual Machines running on the same host would be considered two different computational environments. The term Strong Mobility [5] is used when the process code, state, and control state are saved before passing them to another process to resume at the same control state and with the same variable state in a potentially different computational environment.
The term Weak Mobility, in contrast, does not preserve control state. Providing mobility transparently means the programmer will not need to save the state before sending the process. All that is needed is to define the positions where the process can return control, using a suspend statement or a suspend resume statement. The process scheduling is also transparent to the end programmer, because mobile processes are scheduled the same as normal processes.

1.1. The Join Calculus and Chords

The Join Calculus [9] is a process algebra that extends Milner's π-calculus [12] and that models distributed and mobile programming. Mobility is treated slightly differently in the Join Calculus. The Join Calculus has the concept of Locality, or the computational environment [5] where the process is executed. Locality is inherent to the system, and a process can define its locality rather than using the suspend-send-resume approach of occam-π. Cω [3] is a language implementation of the Join Calculus and an extension of the C# programming language. Cω uses chords: methods with multiple interfaces that can be invoked in any order. The body of the method will not execute until every interface has been invoked at least once. ProcessJ does not treat multiple interfaces this way; only one interface is correct at a time, and the process can only be resumed with that exact interface. Therefore, we are forced to either implement run-time errors, or allow querying the suspended mobile about which interface it is ready to accept.

1.2. The Actor Model

ProcessJ also differs from Hewitt's actor model [2,10,11] in the same way; in the actor model, any valid interface can be invoked, and the associated code will execute. Again, for ProcessJ, only the interface that the suspended process is ready to accept can be invoked.


A modern example of the Actor Model is Erlang actors. Erlang uses pattern matching and receive to respond to messages sent. Figure 3 shows a basic actor that takes several differing message types and acts according to each message sent. It is possible to specify a wild card '_' pattern that will match all other messages, so there is a defined default behavior. Erlang also has the ability to dynamically load code on all nodes in a cluster using the nl command [1], or to send a message to a process running on another node. A combination of these features could be used to implement a type of weak mobility in Erlang; this is illustrated in Figure 3.

loop() ->
    receive
        % If I receive a string "a" print "a" to standard out
        "a" ->
            io:format("a"),
            loop();
        % If I receive a process id and a string "b"
        % write "echo" to the given process id
        {Pid, "b"} ->
            Pid ! "echo",
            loop();
        % handle any other message I might receive
        _ ->
            io:format("do not know what to do."),
            loop()
    end.

Figure 3. Erlang Actors can respond to multiple message interfaces.

1.3. Delimited Continuations and Swarm

In 2009, Ian Clarke created a project called Swarm [6]. Swarm is a framework for transparent scaling of distributed applications utilizing delimited continuations in Scala through the use of a Scala compiler plug-in. A delimited continuation, also known as a functional continuation [8], is a functional representation of the control state of a process. The goal of Swarm is to deploy an application to an environment with distributed data and move the computations to where the data resides, instead of moving the data to where the process resides. This approach is similar to that used in MapReduce [7], though it is more broadly applicable, because not every application can map to the MapReduce paradigm.

1.4. occam-π Versus ProcessJ Mobiles

The occam-π language has built-in support for mobile processes [16]. The method adopted by occam-π allows processes to suspend rather than always needing to complete. A suspended process can then be communicated on a channel and resumed from the same state in which it was suspended, providing strong mobility. In occam-π, a mobile process must implement a mobile process type [16]; this is to assure that the process receiving the (suspended) mobile will have the correct set of resources to re-animate the mobile. Mobile processes in ProcessJ with polymorphic interfaces cannot make use of such a technique, as there is no way of guaranteeing that the receiving process will resume the mobile with the correct interface. Naturally, this can be rather detrimental to the further execution of the code; a runtime error would be generated if the mobile is not in a state to accept the interface with which it is resumed. The runtime check added by the


compiler is inexpensive, and is similar in use to an ArrayIndexOutOfBoundsException in Java. In ProcessJ we approach this problem (though it is not within the scope of this paper, it is worth mentioning) in the following way: it is possible to query a mobile process about its next interface (the one waiting to be invoked); this can be done as illustrated in Figure 4. If a process is not in a

MobileProc p = c.read();      // Receive a mobile on channel c
if (p.accepts(chan.read)) {   // Is p's next interface (chan.read)?
  chan intChan;
  par {
    p(intChan.read);          // Resume p with a reading channel end
    intChan.write(42);
  }
}

Figure 4. Runtime check to determine if a process accepts a specific interface.

state in which it is capable of accepting a resumption with a certain interface, the check will evaluate to false, and no such resumption is performed. This kind of check is necessarily a runtime check.

2. Control Flow Graphs and Rewriting Rules

The key idea in determining which parameters can be referenced in a block is to consider all paths from interfaces leading into that block. If all paths to a block include a definition, from an interface, of a parameter with the same name and type, then this parameter can be referenced in that block. This can be achieved by computing the intersection of all the parameters declared in interfaces that can flow into a block (directly or indirectly through other nodes). We will develop this technique through the example code in Figure 2. The first step is to generate a source code-based control flow graph (CFG), which can be achieved using a number of simple graph construction rules for control diverting statements (these are if-, while-, do-, for-, switch-, and alt-statements, as well as break and continue). These rules are illustrated in Figure 5. For the sake of completeness, it should be noted that the depiction of the switch statement in Figure 5 is based on each case having a break statement at its end; that is, there are no fall-through cases. If, for example, B1 could fall through to B2, the graph would have an arc from e to B1, from e to B2, and, to represent the fall-through case, an arc from B1 to B2. continue statements in loops add an extra arc to the boolean expression controlling the loop, and a break in an if statement would skip the rest of the nodes from it to the end of the statement by adding an arc directly to the next node in the graph.

If we apply the CFG construction rules from Figure 5, in which we treat procedure calls and suspend/resume statements as non-control-diverting statements (the original process interface can be thought of as a resume point and will thus be the first 'statement' in the first block in the CFG), we get the control flow graph shown in Figure 6. Note that the I0 before B1 represents the original procedure interface, and the I1 between B3 and B4 represents the suspend/resume interface. Having the initial interface and the suspend/resume statements mixed with the regular block commands will not work for the analysis to come, so we need to separate those out. This can be done using a simple graph rewriting rule; each interface gets its own node. This rewriting rule is illustrated in Figure 7.
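As an aside, such a construction rule is mechanical enough to sketch in a few lines of Python (the node names and the `wire_while` helper are our own illustration, not part of the ProcessJ compiler); applied to the code of Figure 2, with the interfaces still embedded in their blocks, the while rule yields exactly the edges of the CFG in Figure 6:

```python
# Sketch of the while-statement CFG rule from Figure 5 (hypothetical helper,
# not ProcessJ compiler code). Nodes are strings; edges are (src, dst) pairs.

def wire_while(edges, pred, b, s, succ):
    """while rule: pred -> b, b -> s (enter the body), s -> b (loop back),
    and b -> succ (exit the loop)."""
    edges |= {(pred, b), (b, s), (s, b), (b, succ)}

# Figure 2, before the rewriting step: I0 and I1 still sit inside the blocks.
edges = set()
wire_while(edges, pred="I0 B1", b="B2", s="B3 I1 B4", succ="B5")

assert edges == {("I0 B1", "B2"), ("B2", "B3 I1 B4"),
                 ("B3 I1 B4", "B2"), ("B2", "B5")}
```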

[Figure 5 (diagrams, one per statement): the if-then-else rule (b with arcs to S1 and S2), the if-then rule (b with an arc to S), the while rule (b with arcs to S and to the exit, and S looping back to b), the do rule (S followed by b, with b looping back to S), the for rule (i, then b, with arcs to S and to the exit; S leads to the update u, which loops back to b), the alt rule (a fan-out over the guards g1, ..., gn to the bodies S1, ..., Sn), and the switch rule (e with a fan-out to the cases B1, ..., Bn).]

Figure 5. CFG construction rules.

[Figure 6 (diagram): the node I0 B1 leads to B2; B2 leads to B3 I1 B4 (the loop body) and to B5 (the loop exit); B3 I1 B4 leads back to B2.]

Figure 6. CFG for the example code in Figure 2.

We will refer to the nodes representing interfaces as interface nodes and all others (with code) as code nodes. With an interface node we associate a set of name/type/interface triples (ni ti Ii), namely the name (ni) of the parameter, its type (ti), and the interface (Ii) in which it was declared. In addition, we introduce a comparison operator =̂ between triples, defined in the following way: (ni ti Ii) =̂ (nj tj Ij) ⇔ (ni = nj ∧ ti = tj). The corresponding set intersection operator is denoted ∩̂. We introduce interface nodes for suspend/resume points into the graph in the following manner: if a code block Bi has m suspend/resume statements, then split Bi into m + 1 new code blocks Bi1, ..., Bim+1 interspersed with interface nodes Ii1, ..., Iim. Bi1 and/or Bim+1 might be empty code nodes (Technically, so might all the

other code nodes, but that would be a little strange, as that would signify two or more suspend statements following each other without any code in between). Also, since the parameters of the procedure interface technically also make up an interface, we need to add an interface node for these as well. This is also covered by the rewriting rule in Figure 7, and in this case Bi1 will be empty and Ii1 will be I0.

[Figure 7 (diagram): a code block Bi is rewritten into code blocks Bi1, ..., Bim+1 interspersed with interface nodes Ii1, ..., Iim.]

Figure 7. CFG rewriting rule.

Rewriting the CFG from Figure 6 results in the graph depicted in Figure 8. We now have a CFG with code and interface nodes. Each interface node has information about the parameters it declares, as well as their types. This CFG is a directed graph (VCFG, ECFG), where the vertices in VCFG are either interface nodes (Ii) or code nodes (Bi). An edge in ECFG is a pair of nodes (N, M) representing a directed edge in the CFG from N to M; that is, if (N, M) ∈ ECFG, then control flows from the code represented by vertex N to the code represented by vertex M in the program.

3. In and Out Sets

For the nodes representing an interface, Ii, we are not interested in the incoming arcs: since a suspend/resume point represented by an interface node re-defines which parameters can be accessed, its parameters overwrite any existing parameters. We can now define, for each node in the CFG, sets representing incoming and outgoing parameters. We define two sets for each node N (N is either a code node (Bi) or an interface node (Ii)) in the CFG, namely the in set (Ik(N)) and the out set (Ok(N)). Each of these sets

[Figure 8 (diagram): interface node I0, labelled {(x int I0), (y int I0)}, leads to B1 and on to B2; B2 leads to B3 and to B5; B3 leads to interface node I1, labelled {(z int I1)}, which leads to B4 and back to B2.]

Figure 8. The altered CFG of the example in Figure 6.

are subscripted with a k denoting a generation. Generations of in and out sets depend on the previous generations. The in set of a code block ultimately represents the parameters that can be referenced in that block. The out set for a code block is a copy of the in set; while technically not necessary, it makes the algorithm that we will present later look nicer. For interface nodes, in sets are ignored (there is no code in an interface node). We can now define the following generation 0 sets for an interface node Ii (representing an interface (ti,1 ni,1, ..., ti,ki ni,ki)) and a code node Bi:

    I0(Ii) := { }
    O0(Ii) := {(ni,1 ti,1 Ii), ..., (ni,ki ti,ki Ii)}
    I0(Bi) := { }
    O0(Bi) := { }

Since an interface node introduces a new set of parameters, we only define its out set. The (k+1)th generation of in and out sets can easily be computed based on the kth generation. Recall that a parameter (of a certain name and type) can only be referenced in a code block Bi if all interfaces Ij that have a path to Bi define it (both name and type must be the same!); this leads us to the following definition of the (k+1)th generation of in and out sets:

    Ik+1(Ii) := { }
    Ok+1(Ii) := Ok(Ii)
    Ik+1(Bi) := ∩̂ { Ok(N) : (N, Bi) ∈ ECFG }
    Ok+1(Bi) := Ik+1(Bi)

That is, the (k+1)th generation of the in set of block Bi is the intersection of the out sets of all its immediate predecessors at generation k in the CFG. To determine the set of references that are valid within a code block we repeatedly apply the four rules (only the two rules for the code blocks will change any sets after the first iteration) until no sets change. Table 3 shows the results after two generations; the third does not change anything, so the result can be observed in the column labeled I1. To see that neither x and y nor z can be referenced in block B2, consider the set I1(B2):

    I1(B2) := O0(B1) ∩̂ O0(B4) = {(x int I0), (y int I0)} ∩̂ {(z int I1)} = { }
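The intersection above can be checked mechanically; the following Python fragment (our own tuple encoding of the triples, not part of the paper's implementation) sketches the =̂-based intersection, under which triples match on name and type only:

```python
# Sketch of the "intersection modulo interface" operator: a triple survives
# if some triple on the other side has the same (name, type); matching
# triples from BOTH sides are kept, so duplicates with different interface
# components can appear in the result.

def tri_intersect(a, b):
    common = {(n, t) for n, t, _ in a} & {(n, t) for n, t, _ in b}
    return {(n, t, i) for s in (a, b) for n, t, i in s if (n, t) in common}

o_b1 = {("x", "int", "I0"), ("y", "int", "I0")}
o_b4 = {("z", "int", "I1")}
assert tri_intersect(o_b1, o_b4) == set()      # so nothing reaches B2

# Had z been renamed to x (keeping the type int), x would survive, and both
# triples would appear in the result:
assert tri_intersect(o_b1, {("x", "int", "I1")}) == {
    ("x", "int", "I0"), ("x", "int", "I1")}
```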

Table 3. Result of in and out sets after 2 generations.

Node   I0    O0                         I1                         O1
I0     {}    {(x int I0), (y int I0)}   {}                         {(x int I0), (y int I0)}
B1     {}    {}                         {(x int I0), (y int I0)}   {(x int I0), (y int I0)}
B2     {}    {}                         {}                         {}
B3     {}    {}                         {}                         {}
I1     {}    {(z int I1)}               {}                         {(z int I1)}
B4     {}    {}                         {(z int I1)}               {(z int I1)}
B5     {}    {}                         {}                         {}

If two triples have the same name and type, both triples will be represented in the result set (with different interface numbers, of course). We can now formulate the algorithm for computing in and out sets.

4. Algorithm for In and Out Set Computation

Input: A ProcessJ mobile procedure.

Method:

1. Using the CFG construction rules from Figure 5, construct the control flow graph G.
2. For each interface node Ii and code node Bj in G = (V, E), initialize:
       Ik+1(Ii) := { }
       Ok+1(Ii) := Ok(Ii)
       Ik+1(Bj) := ∩̂ { Ok(N) : (N, Bj) ∈ E }
       Ok+1(Bj) := Ik+1(Bj)
3. Execute this code:
       done = false;
       while (!done) {
         done = true;
         for (B ∈ V) do {       // only for code nodes
           B′ = ∩̂ { O(N) : (N, B) ∈ E }
           if (B′ ≠ B) done = false;
           O(B) = I(B) = B′
         }
       }

Result: In sets for all code blocks with valid parameter references.

It is worth pointing out that generations of in and out sets are not used in the algorithm. This does not impact the correctness of the computation (because the operator used is the intersection operator). If anything, it shortens the runtime by allowing sets from generation k+1 to be used in the computation of other generation k+1 sets. With this in hand, we can now turn to performing the actual scope resolution. This can be achieved using a regular static scope resolution algorithm with a small twist, as we shall see in the following section.
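As a sanity check, the whole fixed-point computation can be sketched in Python for the running example (the dict/tuple encoding and helper below are our own assumptions, not ProcessJ compiler code); it reproduces the final column of Table 3:

```python
# Sketch of the in/out-set fixed-point computation on the rewritten CFG of
# Figure 8. Interface nodes carry their declared (name, type, interface)
# triples; code-node sets start out empty and are iterated to a fixed point.

interfaces = {
    "I0": {("x", "int", "I0"), ("y", "int", "I0")},
    "I1": {("z", "int", "I1")},
}
code_nodes = ["B1", "B2", "B3", "B4", "B5"]
edges = [("I0", "B1"), ("B1", "B2"), ("B2", "B3"), ("B3", "I1"),
         ("I1", "B4"), ("B4", "B2"), ("B2", "B5")]

def tri_intersect(a, b):
    """Intersection modulo the interface component: keep every triple whose
    (name, type) pair occurs on both sides."""
    common = {(n, t) for n, t, _ in a} & {(n, t) for n, t, _ in b}
    return {(n, t, i) for s in (a, b) for n, t, i in s if (n, t) in common}

in_sets = {b: set() for b in code_nodes}
out_sets = {**interfaces, **{b: set() for b in code_nodes}}

done = False
while not done:                        # iterate until no set changes
    done = True
    for b in code_nodes:               # only code nodes are updated
        preds = [n for n, m in edges if m == b]
        new = out_sets[preds[0]]
        for p in preds[1:]:
            new = tri_intersect(new, out_sets[p])
        if new != in_sets[b]:
            done = False
        in_sets[b] = out_sets[b] = new

# Matches Table 2 and the I1 column of Table 3:
assert in_sets["B1"] == {("x", "int", "I0"), ("y", "int", "I0")}
assert in_sets["B4"] == {("z", "int", "I1")}
assert all(in_sets[b] == set() for b in ("B2", "B3", "B5"))
```

As the paper notes, no generation bookkeeping is needed: updated sets may be used immediately within the same sweep without affecting the fixed point.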


5. Static Name Resolution for Mobile Processes

Let us re-introduce the code from Figure 2, but this time with local variables added (lines 2, 5, and 8); this code can be found in Figure 9. Also note that the local variable z in line 8 has the same name as the parameter in the interface in line 7. Naturally, this means that the interface parameter is hidden by the local variable.

 1: mobile void foo(int x, int y) {
 2:   int a;
 3:   B1
 4:   while (B2) {
 5:     int q;
 6:     B3
 7:     suspend resume with (int z);
 8:     int w,z;
 9:     B4
10:   }
11:   B5
12: }

Figure 9. Simple ProcessJ example with local variables.

As briefly mentioned in the previous section, the regular static name resolution algorithm works almost as-is. The only differences are that we have to incorporate the in sets computed by the algorithm in the previous section into the resolution pass, and that the way scopes are closed will differ slightly. Different languages have different scoping rules, so let us briefly state the static scoping rules for parameters and locals in a procedure in ProcessJ.

• Local variables cannot be re-declared in the same scope.
• An interface/procedure declaration opens a scope in which only the parameters are held. The scoping rules of interface parameters are what we defined in this paper.
• The body of a procedure opens a scope for local variables. (This means that we can have parameters and locals with the same name, but the parameters will be hidden by the local variables.)
• A block (a set of { }) opens a new scope. (Local variable names can now be reused, though re-declared local variables hide other local variables or parameters in enclosing scopes. The scope of a local variable declared in a block is from the point of declaration to the end of the block.)
• A for-statement opens a scope. (It is legal to declare variables in the initialization part of a for-statement; the scope of such variables is the rest of the for-statement.)
• A suspend/resume point opens a new scope for the new parameters. Since we treat a suspend/resume point's interface like the original procedure interface, an implicit block ensues immediately after, so a new scope is opened for that as well. (If we did not do this, we would break the rule that parameters and locals can have the same names, as they would in this situation reside in the same scope.)

A symbol table, in this context, is a two-dimensional table mapping names to attributes. In addition, a symbol table has a parent (table), and an access list of block numbers that represents which blocks may perform look-ups in it.
This access list contains the result of the algorithm that computed which blocks can access an interface’s parameters. If the use of a name in block Bi requires a look-up in a table that does not list i in its access list, the look-up query is passed to the parent recursively, until either the name is successfully resolved, or the end of the chain of tables is reached, resulting in an unsuccessful lookup of that name.
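The access-list look-up rule just described can be sketched as follows. This is an illustrative sketch only; the class and method names (SymbolTable, lookup) are invented for the example and are not taken from the ProcessJ compiler.

```python
# Illustrative sketch of the access-list look-up rule described above.

class SymbolTable:
    def __init__(self, parent=None, access=None):
        self.parent = parent            # enclosing table, or None at the root
        self.access = access or set()   # block numbers allowed to look up here
        self.symbols = {}               # name -> attributes

    def declare(self, name, attr):
        self.symbols[name] = attr

    def lookup(self, name, block):
        # A query from block Bi may only be answered by a table whose
        # access list contains i; otherwise it is passed to the parent,
        # until the name resolves or the chain of tables is exhausted.
        if block in self.access and name in self.symbols:
            return self.symbols[name]
        if self.parent is not None:
            return self.parent.lookup(name, block)
        return None                     # unsuccessful look-up
```

Modelling the procedure's interface table (parameters x and y, accessible only from block 1), a look-up of x from block 4 fails even though the name exists further up the chain, exactly as required for interface parameters.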


J.B. Pedersen and M. Sowders / Static Scoping and Name Resolution for Mobile Processes

Using the example from Figure 9, a total of 5 scopes are opened: two by interfaces (the original procedure’s interface declaring parameters x and y, accessible only by code in block B1, and the suspend/resume point’s interface declaring parameter z, accessible only by code in block B4), one by the main body of the procedure (declaring local variable a), one by a block (declaring local variable q), and one following the suspend/resume point (declaring the local variables w and z, the latter of which hides the parameter from the interface of the suspend/resume statement). In Figure 10, the code has been decorated with +Ti to mark where the ith scope is opened, and −Ti to mark where it is closed. Furthermore, the implicit scopes opened by the parameter list of an interface and by the body following a suspend/resume statement have been added; these are the underlined brackets in lines 2, 12, 14, 17, 18, and 22. Note the closure of three scopes, −T4 , −T3 , −T2 , at the end of the block making up the body


 1: mobile void foo
 2: {+T0
 3:   (int x, int y)
 4:   {+T1
 5:     int a;
 6:     B1
 7:     while (B2) {+T2
 8:       int q;
 9:       B3
10:       suspend
11:       resume with
12:       {+T3
13:         (int z);
14:         {+T4
15:           int w, z;
16:           B4
17:         }−T4
18:       }−T3
19:     }−T2
20:     B5
21:   }−T1
22: }−T0

Figure 10. Simple ProcessJ example annotated with scope information.

of the while-loop. Since there are no explicit markers in the code that close down the scopes for suspend/resume points (T3) and the following scope (T4), these get closed automatically when an enclosing scope (T2) is closed. This is easily controlled when traversing the code (and not the CFG), as a typical name resolution pass would. Figure 11 illustrates the 5 symbol tables, the symbols they declare, their access lists, and the nodes in the CFG with which they are associated. We summarize in Table 4 which variables (locals and parameters) can be referenced in which blocks. Note that although block 4 appears in the access list of symbol table T3 in Figure 11 (and the parameter z is in O1(B4)), the local variable z in table T4 hides the parameter.
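The automatic closure behaviour described above can be sketched with a scope stack (names invented for this illustration): when an explicit block closes, any implicit scopes still open inside it are popped first.

```python
# Sketch of automatic closure of implicit scopes (names invented).
# Implicit scopes (a suspend/resume interface and its following block)
# have no closing marker in the source, so closing an explicit block
# also closes every implicit scope opened inside it.

class ScopeStack:
    def __init__(self):
        self.stack = []                  # (scope name, is_implicit) pairs

    def open(self, name, implicit=False):
        self.stack.append((name, implicit))

    def close_explicit(self):
        closed = []
        # pop any implicit scopes first, then the explicit one itself
        while self.stack and self.stack[-1][1]:
            closed.append(self.stack.pop()[0])
        closed.append(self.stack.pop()[0])
        return closed
```

Opening T2 (explicit) and then T3 and T4 (implicit), a single explicit close yields the closure order −T4, −T3, −T2 seen at the end of the while-loop body.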



[Figure 11 (diagram not reproduced in this extraction) shows the CFG annotated with the five symbol tables T0–T4: T0 declares the parameters x and y, T1 the local a, T2 the local q, T3 the parameter z, and T4 the locals w and z; each table carries its access list of block numbers.]

Figure 11. CFG with symbol tables.

Table 4. Final list of which variables/parameters can be accessed in which blocks.

Block  Locals                            Parameters
B1     a ∈ T1                            x ∈ T0, y ∈ T0
B2     a ∈ T1                            −
B3     q ∈ T2, a ∈ T1                    −
B4     w ∈ T4, z ∈ T4, q ∈ T2, a ∈ T1    z ∈ T4
B5     a ∈ T1                            −

6. Results and Conclusion

We have presented an algorithm that can be applied to create a control flow graph (CFG) at the source code level, and an algorithm to determine which procedure parameters and suspend/resume parameters can be referenced in the code of a mobile procedure. Additionally, we presented a method for performing static scope resolution on a mobile procedure (mobile process) in a process-oriented language like ProcessJ. This analysis obeys the standard static scoping rules for local variables and also takes into account the new rules introduced by making a procedure mobile with polymorphic interfaces (and thus resumable in the ‘middle of the code’, immediately after the point of exit (suspend point)).

7. Future Work

The ProcessJ compiler generates Java code using JCSP to implement CSP primitives like channels, processes and alternations. Additional implementation work is required to integrate



the algorithm as well as the JCSP code generation into the ProcessJ compiler. A possible implementation of mobiles using Java/JCSP can follow the approach taken in [14], which unfortunately requires the generated (and compiled) bytecode to be rewritten; this involves reloading the bytecode and inserting new bytecode instructions, something that can be rather cumbersome. However, we do have a new approach, which does not require any bytecode rewriting at all. We expect to be able to report on this in a different paper in the very near future.

References

[1] Ericsson AB. Erlang STDLIB, 2010. http://www.erlang.org/doc/apps/stdlib/stdlib.pdf.
[2] Gul Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, 1986.
[3] Nick Benton, Luca Cardelli, and Cedric Fournet. Modern concurrency abstractions for C#. ACM Transactions on Programming Languages and Systems, pages 415–440, 2002.
[4] Peter Braun and Wilhelm Rossak. Mobile Agents: Basic Concepts, Mobility Models, and the Tracy Toolkit. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.
[5] David Chess, Colin Harrison, and Aaron Kershenbaum. Mobile agents: Are they a good idea? In Jan Vitek and Christian Tschudin, editors, Mobile Object Systems: Towards the Programmable Internet, volume 1222 of Lecture Notes in Computer Science, pages 25–45. Springer Verlag, Berlin, 1997.
[6] Ian Clarke. swarm-dpl - a transparent scalable distributed programming language, 2008. http://code.google.com/p/swarm-dpl/.
[7] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51:107–113, January 2008.
[8] Matthias Felleisen. Beyond Continuations. Computer Science Dept., Indiana University, Bloomington, IN, 1987.
[9] Cédric Fournet and Georges Gonthier. The Join Calculus: a language for distributed mobile programming. In Gilles Barthe, Peter Dybjer, Luís Pinto, and João Saraiva, editors, Applied Semantics, volume 2395 of Lecture Notes in Computer Science, pages 268–332. Springer Verlag, Berlin/Heidelberg, 2000.
[10] Carl Hewitt. Viewing control structures as patterns of passing messages. Artificial Intelligence, 8(3):323–364, June 1977.
[11] Carl Hewitt, Peter Bishop, Irene Greif, Brian Smith, Todd Matson, and Richard Steiger. Actor induction and meta-evaluation. In ACM Symposium on Principles of Programming Languages, pages 153–168, 1973.
[12] Robin Milner. Communicating and Mobile Systems: the Pi-Calculus. Cambridge University Press, Cambridge/New York, 1999.
[13] Jan B. Pedersen et al. The ProcessJ homepage, 2011. http://processj.cs.unlv.edu.
[14] Jan B. Pedersen and Brian Kauke. Resumable Java bytecode - process mobility for the JVM. In Communicating Process Architectures 2009 (WoTUG-32), Eindhoven, The Netherlands, November 2009, pages 159–172, 2009.
[15] Jeff Rulifson. DEL, 1969. http://www.ietf.org/rfc/rfc0005.txt.
[16] Peter H. Welch and Frederick R.M. Barnes. Communicating mobile processes: introducing occam-π. In Ali E. Abdallah, Cliff B. Jones, and Jeff W. Sanders, editors, 25 Years of CSP, volume 3525 of Lecture Notes in Computer Science, pages 175–210. Springer Verlag, April 2005.
[17] Peter H. Welch and Jan B. Pedersen. Santa Claus - with Mobile Reindeer and Elves. Fringe presentation at the Communicating Process Architectures conference, September 2008.



A. Appendix

To illustrate the construction of the CFG in more depth, Figure 13 shows the control flow graph for a for loop with conditional break and continue. The code from which the CFG in Figure 13 was generated is shown in Figure 12. In Figure 13 the body of the for loop is represented by the largest shaded box, the if statement containing the break statement is the box shaded with vertical lines, and the if statement containing the continue statement is the box shaded with horizontal lines.

for (i; b1; u) {
  B1
  if (b2) {
    B2
    break;
  }
  B3
  if (b3) {
    B4
    continue;
  }
  B5
}

Figure 12. Example code with conditional break and continue statements.

Figure 13. CFG for the example code shown in Figure 12.
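Since the diagram itself is not reproduced here, the edge structure of such a CFG can be sketched as an adjacency map. This is a plausible reconstruction following the usual semantics of break and continue, not necessarily identical node-for-node to Figure 13; the node names are assumptions.

```python
# Plausible reconstruction of the successor edges of the CFG for the code
# in Figure 12 (node names assumed). 'break' edges jump past the loop;
# 'continue' edges jump to the update expression 'u'.

cfg = {
    "i":    ["b1"],            # initialisation, then the loop condition
    "b1":   ["B1", "exit"],    # condition: enter the body or leave the loop
    "B1":   ["b2"],
    "b2":   ["B2", "B3"],      # if (b2): then-branch or fall through
    "B2":   ["exit"],          # break leaves the loop entirely
    "B3":   ["b3"],
    "b3":   ["B4", "B5"],      # if (b3): then-branch or fall through
    "B4":   ["u"],             # continue jumps to the update expression
    "B5":   ["u"],             # normal end of the loop body
    "u":    ["b1"],            # update, then back to the condition
    "exit": [],
}

def reachable(start):
    # Simple depth-first reachability over the adjacency map.
    seen, work = set(), [start]
    while work:
        node = work.pop()
        if node not in seen:
            seen.add(node)
            work.extend(cfg[node])
    return seen
```

The break edge (B2 → exit) and the continue edge (B4 → u) are exactly the two edges that distinguish this CFG from that of a plain for loop.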


Communicating Process Architectures 2011 P.H. Welch et al. (Eds.) IOS Press, 2011 © 2011 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-774-1-87


Prioritised Choice over Multiway Synchronisation

Douglas N. WARREN
School of Computing, University of Kent, Canterbury, UK

Abstract. Previous algorithms for resolving choice over multiway synchronisations have been incompatible with the notion of priority. This paper discusses some of the problems resulting from this limitation and offers a subtle expansion of the definition of priority to make choice meaningful when multiway events are involved. Presented in this paper is a prototype extension to the JCSP library that enables prioritised choice over multiway synchronisations and which is compatible with existing JCSP Guards. Also discussed are some of the practical applications for this algorithm as well as its comparative performance. Keywords. CSP, JCSP, priority, choice, multiway synchronisation, altable barriers.

Introduction

CSP [1,2] has always been capable of expressing external choice over multiway synchronisation: the notion of more than one process being able to exercise choice over a set of shared events, such that all processes making that choice select the same events. For some time, algorithms for resolving such choices at run-time were unavailable and, when such algorithms were proposed, they were non-trivial [3,4]. Conversely priority, a notion not expressed in standard CSP, has been a part of CSP-based languages from a very early stage. Priority, loosely, is the notion that a process may reliably select one event over another when both are available. While priority is compatible with simple events such as channel inputs, algorithms for resolving choice over multiway synchronisation have been incompatible with it. This paper introduces an algorithm for implementing Prioritised Choice over Multiway Synchronisation (PCOMS) in JCSP [5,6] through the use of the AltableBarrier class. This addition to the JCSP library allows entire process networks to be atomically paused or terminated by alting over such barriers. It also enables the suspension of sub-networks of processes, for the purposes of process mobility, with manageable performance overheads. This paper assumes some knowledge of both JCSP and occam-π [7,8] - the latter being the basis of most pseudo-code throughout the paper. This paper intends to establish that using the AltableBarrier class simplifies certain problems of multiway synchronisation. However, there are no immediately obvious problems which require PCOMS per se. For example, graceful termination of process networks can be achieved using conventional channel communication. However, if such networks have a complicated layout or if consistency is required at the time of termination then graceful termination using channel communication becomes more complicated.
This same problem using PCOMS requires only that all affected processes are enrolled on (and regularly ALT over) an AltableBarrier which they prioritise over other events.

Corresponding Author: Douglas N. Warren. E-mail: [email protected].


D.N. Warren / Prioritised Choice over Multiway Synchronisation

Introduced in this paper are the background and limitations of existing multiway synchronisation algorithms. In Section 3 the limitations of existing notions of priority and readiness are discussed, and proposals are made for processes to pre-assert their readiness to synchronise on barriers. This section also proposes the notion of nested priority: the idea that several events may be considered to be of the same priority but to exist in a wider priority ordering. Section 5 details the interface that JCSP programmers need to use in order to include AltableBarriers in their programs. Section 6 details the inner workings of the algorithm itself. Section 7 details stress tests performed on the algorithm as well as comparative performance tests against the previous (unprioritisable) AltingBarrier algorithm. The results are discussed in Section 8, as are some proposed patterns for implementing fair alting and for influencing the probability of certain events being selected through partial priority. Section 9 concludes the paper.

1. Background

This section considers some of the existing algorithms for resolving choice over multiway synchronisation, both where the set of events is limited and where the set of events may be arbitrarily large. Also considered are some of the attempts to model priority in CSP. Some of the earliest algorithms resolving choice over multiway synchronisation are database transaction protocols such as the two-phase commit protocol [9]. Here the choice is between selecting a ‘commit’ event or one or more processes choosing to ‘abort’ an attempt to commit changes to the database. Initially such protocols were blocking. After the commit attempt was initiated, a coordinator would ask the enrolled nodes to commit to the transaction. If all nodes commit in this way then the transaction is confirmed by an acknowledgement; otherwise the nodes are informed that they should abort the transaction.
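The blocking two-phase vote just described can be sketched as follows. This is a deliberate simplification to illustrate the shape of the choice (commit vs. abort); the function and variable names are invented, and no fault handling is modelled.

```python
# Simplified sketch of the blocking two-phase commit vote described above;
# an illustration of the idea, not a faithful protocol implementation.

def two_phase_commit(votes):
    # Phase 1: the coordinator collects a commit/abort vote from every
    # enrolled node. Phase 2: only a unanimous vote confirms the
    # transaction; any single dissenter aborts it for everyone.
    return "commit" if all(votes) else "abort"
```

Viewed this way, the protocol is a choice over exactly two events: a unanimous ‘commit’ or an ‘abort’ forced by any one participant.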
In either case the network and the nodes themselves were considered to be reliable and responsive to such requests. Later incarnations were non-blocking and tolerated faults by introducing the possibility that transactions could time out [10]; these are sometimes referred to as three-phase commit protocols. The first phase asks nodes if they are in a position to synchronise (which the coordinator acknowledges); the second involves the processes actually committing to the synchronisation, this being subject to timeouts; the third ensures that the ‘commit’ or ‘abort’ is consistent for all nodes. The protocols above are limited in that they can be considered to be choosing over two events, ‘commit’ and ‘abort’. A more general solution was proposed by McEwan [3], which reduced the choice to state machines connected to a central controller. This was followed by an algorithm which coordinates all multiway synchronisations through a single central Oracle [11], implemented as a library extension for JCSP [5,4] in the form of the AltingBarrier class. All of the above algorithms are incompatible with user-defined priority. The database commit protocols are only compatible with priority to the extent that committing to a transaction is favoured over aborting it. The more general algorithms have no mechanism by which priority can be imposed, and in the case of JCSP AltingBarriers this incompatibility is made explicit. There have been many attempts to formalise event priority in CSP. Fidge [12] considers previous approaches which (either statically or dynamically) assign global absolute priority values to specific events; these approaches are considered to be less modular and compositional. Fidge instead proposes an asymmetric choice operator (an external choice decorated with an arrow over its left operand) which favours the left operand. Such an operator is distinguished from the regular external choice operator in that



it excludes the traces of the right hand (low priority) operand where both are allowed by the system, i.e. the high priority event is always chosen where possible. While this might be considered ideal, in practice the arbitrary nature of scheduling may allow high priority events not to be ready, even when the system allows it. Therefore low priority events are not excluded in practice in CSP-based languages. However, the introduction of readiness tests to CSP by Lowe [13] allows priority to be modelled as implemented in CSP-based languages. Using this model, priority conflicts (an inevitable possibility with locally defined relative priority structures) are resolved by arbitrary selection; this is the same result as occurs with JCSP AltableBarriers (albeit with higher performance costs). However, Lowe treats readiness as a binary property of events; in Section 3.1 a case is presented for treating readiness as a (possibly false) assertion that all enrolled processes will be in a position to synchronise on an event in the near future. This distinction allows processes to pre-emptively wait for multiway synchronisations to occur. Section 2.2 establishes this as being necessary to implement meaningful priority.

2. Limitations of Existing External Choice Algorithms

Existing algorithms offering choice over multiway synchronisation do not offer any mechanism for expressing priority; they offer only arbitrary selection (which is all that standard CSP describes). Listed in this section are two (not intuitively obvious) ways in which repeated use of these selection algorithms can be profoundly unfair - although it is worth bearing in mind that CSP choice has no requirement for priority and repeated CSP choice has no requirement for fairness. As such, while these aspects of existing choice resolution algorithms are arguably undesirable, they all constitute valid implementations of external choice.
2.1. Arbitration by Barrier Size

Pre-existing algorithms for resolving choice over multiway synchronisation have in common an incompatibility with priority [4]. This means that event selection is considered to be arbitrary - in other words, no guarantees are made about priority, fairness or avoiding starvation. It is therefore the responsibility of programmers to ensure that this limitation has no adverse effects on their code. While the problems of arbitrary selection may be relatively tractable for code involving channel communications, code containing barrier guards poses extra complications. Consider the following occam-π pseudo-code:

PROC P1 (BARRIER a, b)
  ALT
    SYNC a
      SKIP
    SYNC b
      SKIP
:

PROC P2 (BARRIER a)
  SYNC a
:

PROC P3 (BARRIER b)
  SYNC b
:

These are three different types of processes: one enrolled on both ‘a’ and ‘b’, and two others each enrolled on one or the other but not both. Consider a process network containing only P1 and P3 processes:


PROC main (VAL INT n, m)
  BARRIER a, b, c:
  PAR
    PAR i = 0 FOR n
      P1 (a, b)
    PAR i = 0 FOR m
      P3 (b)
:

In such a network event ‘a’ is favoured. In order for either event to happen, all of the processes enrolled on that event must be offering it. Since the set of processes enrolled on ‘a’ is a subset of those enrolled on ‘b’, for ‘b’ to be ready implies that ‘a’ is also ready (although the reverse is not true). It is therefore necessary for all of the P3 processes to offer ‘b’ before all of the P1 processes offer ‘a’ and ‘b’ in order for synchronisation on ‘b’ to be possible (and even then the final selection is arbitrary). However, as the ratio of P1 to P3 processes increases, this necessary (but not sufficient) condition becomes less and less likely. This state of affairs may, however, be desirable to a programmer. For example, another process in the system may be enrolled on ‘a’ but also waiting for user input. Provided that processes P1 and P3 are looped ad infinitum, ‘a’ may represent a high priority, infrequently triggered event while ‘b’ is less important and is only serviced when ‘a’ is unavailable. A naive programmer may consider that this property will always hold true. However, consider what happens if P2 processes are dynamically added to the process network. Initially ‘a’ continues to be prioritised over ‘b’, but once the P2 processes outnumber the P3 processes it becomes more and more likely that ‘b’ will be picked over ‘a’, even if ‘a’ would otherwise be ready. For this reason a programmer needs to be aware not only of the overall structure of their program, in order to reason about which events are selected, but also of the numbers of processes enrolled on those events. This becomes even more difficult if the numbers of enrolled processes change dynamically.
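The readiness condition underlying this discussion can be sketched in a few lines (the function and process names are invented for the example): an event is ready only when every process enrolled on it is offering it, so when one event's enrolled set is a subset of another's, readiness of the larger event implies readiness of the smaller.

```python
# Sketch of the readiness condition discussed above (names invented):
# an event is ready exactly when every process enrolled on it offers it.

def ready(event, enrolled, offers):
    return all(event in offers[p] for p in enrolled[event])
```

With two P1-style processes enrolled on both barriers and one P3-style process enrolled only on ‘b’, ‘a’ can be ready while ‘b’ is not, but never the other way around - which is why ‘a’ is favoured.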
2.2. Unselectable Barriers

As well as making the selection of events depend (to an extent) on the relative numbers of processes enrolled on competing barriers, existing algorithms for resolving external choice over multiway synchronisation can allow the selection of certain events to be not only unlikely but (for practical purposes) impossible. Consider the pseudo-code for the following two processes:

PROC P1 (BARRIER a, b)
  WHILE TRUE
    ALT
      SYNC a
        SKIP
      SYNC b
        SKIP
:

PROC P2 (BARRIER a, c)
  WHILE TRUE
    ALT
      SYNC a
        SKIP
      SYNC c
        SKIP
:



If a process network is constructed exclusively out of P1 and P2 processes then the sets of processes enrolled on ‘a’, ‘b’ and ‘c’ have some interesting properties. The sets of processes enrolled on ‘b’ and on ‘c’ are both strict subsets of the set enrolled on ‘a’. Further, the intersection of the sets for ‘b’ and ‘c’ is the empty set. Since choice resolution algorithms (like the Oracle algorithm used in JCSP AltingBarriers) always select events as soon as they are ready (i.e. all enrolled processes are in a position to synchronise on the event), this means that for an event to be selected it must become ready either at the same time as or before any competing events. However, because the enrolled set of ‘a’ is a superset of those of ‘b’ and ‘c’, it would be necessary for ‘a’, ‘b’ and ‘c’ to become ready at the same time for ‘a’ to be selectable. This is impossible because only one process may make or retract offers at a time and no process offers ‘a’, ‘b’ and ‘c’ simultaneously. It is therefore impossible for ‘a’ to be selected, as either ‘b’ or ‘c’ must become ready first. The impossibility of event ‘a’ being selected in the above scenario holds true for AltingBarrier events: each process offers its set of events atomically and the Oracle deals with each offer atomically (i.e. without interruption by other offers). However, this need not happen. If it is possible for processes to have offered event ‘a’ but not yet to have offered event ‘b’ or ‘c’, then ‘a’ may be selected if a sufficient number of processes have offered ‘a’ and are being slow about offering ‘b’ or ‘c’. This gives a clue as to how priority can be introduced into a multiway synchronisation resolution algorithm.

3. Limitations of Existing Priority Models

3.1. Case for Redefining Readiness and Priority

As discussed in Section 2.2, selecting events as soon as all enrolled processes have offered to synchronise can cause serious problems for applying priority to choice over multiway synchronisation.
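The impossibility argument of Section 2.2 can be made concrete with a small sketch (all names are invented): each process publishes its whole offer set in one atomic step, and an oracle selects the first event whose enrolled processes are all offering it. Whichever process offers first, one of the solo events ‘b’ or ‘c’ becomes ready before ‘a’ can.

```python
# Sketch (names invented) of the atomic-offer behaviour of Section 2.2.
# P1 offers {a, b} atomically, P2 offers {a, c} atomically; the oracle
# selects as soon as any event's enrolled set is fully offering.

def run_oracle(order, offer_sets, enrolled):
    offering = {p: set() for p in offer_sets}
    for p in order:
        offering[p] = offer_sets[p]          # one atomic offer at a time
        for event in ("a", "b", "c"):        # 'a' is even checked first
            if all(event in offering[q] for q in enrolled[event]):
                return event                 # selected as soon as ready
    return None
```

Even though ‘a’ is checked first after every offer, it is never selected: the first process to offer immediately makes its solo event ready.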
As such, meaningful priority may be introduced by allowing processes to pre-emptively wait for synchronisations to occur or by suppressing the readiness of other events in favour of higher priority ones. Here, to pre-emptively wait on a given event means to offer only that event and to exclude the possibility of synchronising on any others that would otherwise be available in an external choice. Events which are not part of that external choice may be used to stop a process pre-emptively waiting; for example, a timeout elapsing may trigger this. Once a process stops pre-emptively waiting it is once again free to offer any of the events in an external choice. In other words, a process waits for the completion of one event over any other in the hope that it will be completed soon; if it is not, then the process may consider offering other events. Waiting in this way requires the resolution of two problems. The first is that if processes wait indefinitely for synchronisations to occur, the network to which the process belongs may deadlock. The corollary to this is that where it is known in advance that an event cannot be selected, it should be possible for processes to bypass waiting for that event altogether (so as to avoid unnecessary delays). The second is that, as a consequence of the first problem, when a process does stop waiting for a high priority event and begins waiting for a lower priority one, it is possible that the higher priority event may become ready again. Here, ready again means that the event now merits its set of enrolled processes pre-emptively waiting for its completion. Thus it must be possible for a process to switch from pre-emptively waiting for a low priority synchronisation to a higher priority one. While there are an almost infinite number of ways of pre-emptively determining the readiness of any event, it is proposed that PCOMS barriers use flags to pre-emptively assert the readiness of enrolled processes.
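The flag scheme just proposed can be sketched as follows (class and method names are invented for the example; this is not the JCSP AltableBarrier implementation):

```python
# Sketch of the proposed flag-based readiness assertions (names invented):
# every enrolled process asserts, via a flag, that it expects to be able
# to synchronise in the near future; a synchronisation attempt is only
# viable while all flags are raised, and e.g. a timeout retracts a flag.

class FlagBarrier:
    def __init__(self, processes):
        self.flags = {p: False for p in processes}

    def assert_ready(self, p):
        self.flags[p] = True

    def retract(self, p):
        # a timeout falsifies the process's readiness assertion
        self.flags[p] = False

    def attempt_viable(self):
        # a synchronisation attempt may begin (or continue) only while
        # every enrolled process is asserting readiness
        return all(self.flags.values())
```

Retracting any single flag aborts the attempt, while all flags becoming raised is what triggers lower-priority waiters to switch to this event.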
Each process uses its flags (one per barrier that it is enrolled on) to assert whether or not it is in a position to synchronise on that event in the near future. It is a necessary condition that all enrolled processes assert their readiness for those



processes to begin waiting for that synchronisation to take place. If this condition becomes false during such a synchronisation attempt then that attempt is aborted. Conversely, if this condition becomes true then this triggers enrolled processes waiting for lower priority events to switch to the newly available event. For the purposes of causing synchronisation attempts to be aborted because of a timeout, such timeouts falsify the assertion of the absent processes that they are in a position to synchronise in the near future. Their flags are changed to reflect this, which in turn causes the synchronisation attempt as a whole to be aborted. In this way high priority events are given every opportunity to be selected over their lower priority counterparts, while the programmer is given every opportunity to avoid wasteful synchronisation attempts where it is known that such a synchronisation is unlikely.

3.2. Case for Nested Priority

While there are positive uses for prioritising some multiway synchronisations over others (graceful termination, pausing, etc.), there may be some circumstances where imposing a priority structure on existing arbitrary external choices can be undesirable. Consider the process network for the TUNA project’s one-dimensional blood clotting model [11]. Each SITE process communicates with others through a ‘tock’ event and an array of ‘pass’ events, each process being enrolled on a pass event corresponding to itself as well as the two processes in front of it in a linear pipeline. Although the events offered at any given time depend on the SITE process’s current state, it is a convenient abstraction to consider that the SITE process offers all events at all times, as in the following pseudo-code:

PROC site (VAL INT i)
  WHILE TRUE
    ALT
      ALT n = 0 FOR 3
        SYNC pass[i + n]
          SKIP
      SYNC tock
        SKIP
:

Here the SITE process makes an arbitrary selection over the events that it is enrolled on. Now suppose that the SITE processes also offer to synchronise on a ‘pause’ barrier. This barrier would need to be of higher priority than the other barriers and would presumably only be triggered occasionally by another process waiting for user interaction. A naive way of implementing this could be the following:

PROC site (VAL INT i)
  WHILE TRUE
    PRI ALT
      SYNC pause
        SKIP
      PRI ALT n = 0 FOR 3
        SYNC pass[i + n]
          SKIP
      SYNC tock
        SKIP
:

Here the SITE process prioritises the ‘pause’ barrier most highly, followed by the ‘pass’ barriers in numerical order, followed by the ‘tock’ barrier. This might not initially be considered a problem, as any priority ordering is simply a refinement of an arbitrary selection scheme.



However, when more than one process like this is composed in parallel, problems begin to emerge: each individual SITE process, identified by the ‘i’ parameter passed to it, prefers the ‘pass[i]’ event over other pass events further down the pipeline. In other words SITE2 prefers ‘pass[2]’ over ‘pass[3]’, while SITE3 prefers ‘pass[3]’ over all others, and so on. This constitutes a priority conflict, as there is no event consistently favoured by all processes enrolled on it. To paraphrase, each process wishes to select its own ‘pass’ event and will only consider lower priority events when it is satisfied that its own ‘pass’ event is not going to complete. Since no processes can agree on which event is to be prioritised, there is no event which can be selected that is consistent with every process’s priority structure. There are two ways in which this can be resolved. The first is that the system deadlocks. The second is that each process wastes time waiting for its favoured event to complete, comes to the conclusion that the event will not complete, and begins offering other events. This second option is effectively an (inefficient) arbitrary selection. The proposed solution to this problem for PCOMS barriers is to allow groups of events in an external choice to have no internal priority but for that group to exist in a wider prioritised context. For the purposes of expressing this as occam-π pseudo-code, a group of guards in an ALT block are considered to have no internal priority structure, but if that block is embedded in a PRI ALT block then those events all fit into the wider priority context of the PRI ALT block. For example, in this code:

PROC site (VAL INT i)
  WHILE TRUE
    PRI ALT
      SYNC pause
        SKIP
      ALT
        ALT n = 0 FOR 3
          SYNC pass[i + n]
            SKIP
        SYNC tock
          SKIP
:

The ‘pause’ event is considered to have higher priority than all other events, but the ‘pass’ and ‘tock’ events are all considered to have the same priority, thereby eliminating a