Recently I read several papers about synchronous dataflow programming languages, it broadened my view about programming languages, thus I want to summarize and share what I’ve learned in this post.

Introduction

Broadly speaking, computer systems can be classified into two categories:

  • transformational systems
  • reactive systems

Transformational systems usually take some input, execute and generate the output on termination. For example, the command grep for searching is a transformational system.

In contrast, reactive systems continuously interact with the environment. Usually, the response time of reactive systems have to satisfy the requirement induced in specific application domains. The typical usage of reactive systems is automatic control in avionics, manufacturing and nuclear plants.

Synchronous dataflow programming languages, e.g. Synchronous Lucid, Lustre, Signal and Esterel, are proposed for programming reactive systems – automatic control systems in particular. These systems not only have strong time constraints, but also require high reliability. The language Lustre has been successfully used in several critical industries, including the application in Airbus A380.

Dataflow Programming

We can think of dataflow programming as a network. For example, the following diagram shows the dataflow representation of the expression (1 + x)*y + (1 + x)*z:

1                       y
--->-----+           +---<---
x        |           |
---->---[+]---->----[*]---->----[+]---->--
         |                       |
         |                       |
         +----->----[*]---->-----+
                     |
         ------->----+
         z

The flow of data in the network resembles the flow of water in channels. The nodes in the network are processing stations of data, which correspond to functions in the program. These processing stations can work in parallel, i.e. the sibling functions can be executed at the same time by multi-processors or multi-computers. Thus, the dataflow model is a simple and natural model for parallel programming.

To program in dataflow style is usually to define a set of equations describing the network. Each equation corresponds to one processing station in the network. The order of the equations is usually insignificant, and each variable can only be assigned once. For example, following equations describe the network we’ve seen above:

a = 1 + x
b = a * y
c = a * z
d = b + c

The single assignment requirement implies that dataflow programming is declarative (instead of imperative, or control-flow), like logic programming and functional programming. Declarative programming makes verification easier thanks to referential transparency, i.e. substitution of variables by its definition is always safe.

Every Data is a Flow

The most distinctive feature of dataflow programming is that every data is a flow. A variable x means an infinite sequence of x1, x2, x3, ..., the constant 1 means an infinite sequence of 1s.

Correspondingly, the operators +, - and if/else are extended to operate on sequences of data. The following table illustrates this by concrete examples:

|      c          |     true         false         true     ...
|      x          |     1              3            5       ...
|      y          |     2              4            6       ...
|    x + y        |     3              7            11      ...
| if c x then y   |     1              4            5       ...

Sequence Operators

Dataflow programming allows manipulation of data sequences by sequence operators, such as fby and next, which enables operating on the history of a sequence. The meanings of these operators are illustrated below:

|      x          |     1              3            5       ...
|      y          |     2              4            6       ...
|    next y       |     4              6            8       ...
|    x fby y      |     1              2            4       ...

The expression x fby y returns a new sequence by prepending the sequence y with the first element of the sequence x.

Following code shows the natural number sequence and the Fibonacci sequence represented in dataflow programming:

n = 1 fby n + 1

fib = 0 fby ( 1 fby fib + next fib )

Kahn[1974] gave a formal semantics of dataflow programming. In the five page paper, he not only gave a formalization of dataflow networks, but also introduced recursive parallel networks. He mentioned three operators on data sequences, namely first (F), remainder (R) and append (A).

As noted by Kahn, a distinctive merit of dataflow programming is the eradication of the notion of state of a complex system and the resulting of deterministic programs. The determinism also enables sequential simulation/execution of dataflow programs.

Seen from a different perspective, the explicit representation of time (history) avoids stateful computation. Time and state are closed intertwined in programming.

Synchronous Dataflow Programming

The general dataflow model is not very specific about the temporal behaviors of the data – whether the flow of data is at regular speed or asynchronous. Synchronous dataflow programming assumes that the flow of data in the network is at determinate frequency, though different edges in the network may have different frequencies.

This assumption greatly enriches the expressiveness of the general dataflow model in terms of temporal behaviors of systems. It has been proved that the synchronous dataflow language Lustre is adequate to describe digital circuits.

Synchronous dataflow programs can be compiled to sequential programs. This is another merit of the model.

In the following, I’ll take the language Lustre as an example to introduce the most distinctive features of synchronous dataflow programming.

In synchronous dataflow programming, each data in the stream strictly correspond to one clock. Thus, the sequence operators have a strict temporal interpretation. The meanings of these operators are illustrated below:

|      x          |     1        |      3       |     5    |   ... |
|      y          |     2        |      4       |     6    |   ... |
|    x fby y      |     1        |      2       |     4    |   ... |
|     pre y       |     nil      |      1       |     3    |   ... |
|     x -> y      |     1        |      4       |     6    |   ... |

The expression x fby y returns a new sequence by prepending the sequence y with the first element of the sequence x. The expression pre x delays the sequence x by one clock. The expression x -> y constructs a new sequence by replacing the first element of the sequence y by the first element of the sequence x.

It’s easy to see that x -> pre y is equivalent to x fby y.

Following code can detect edges of a boolean signal:

let edge c = false -> c & not (pre c)

Synchronous dataflow languages usually support programming multiple clocks in one system through some clock calculus.

Synchronous Dataflow with Functional Programming

Marc Pouzet[2006] proposed Synchronous Lucid, which is a combination of strongly typed functional programming with synchronous dataflow programming. More precisely, the combination of ML with Lustre.

Just as there are two categories of operators, namely extensional operators and temporal operators (intensional operators), there are also two kinds of functions, combinatory functions and sequential functions (or extensional/intensional). Combinatory functions are stateless, the output only depends on the same instant of the inputs. Sequential functions are stateful, the output usually depends on history valuations of the inputs. The following code shows two example functions, one for each kind:

let xor (a, b) = (a & not b) or (not a & b)   // combinatory

let node edge c = false -> c & not (pre c)  // sequential

Synchronous Lucid allows programming in multiple clocks, each data may have a different clock. The clock information is captured in the type system by dependent kinding. The operator when and merge enables manipulation of clocks. The keyword clock enables introducing a clock by a sequence of boolean values.

let clock half = count 2 true
let node odd x = x when half

It’s also possible to define algebraic data types, which should be interpreted as a sequence of values, each value is of the corresponding algebraic data type. Following example code shows how to detect the direction of a colored turning disk:

type color = Bleu | Rouge | Vert
type dir = Direct | Indirect | Indetermine | Immobile

let node direction i = d where
  rec pi = i fby i
  and ppi = i fby pi
  and match ppi, pi, i with (Rouge, Rouge, Rouge) | (Bleu, Bleu, Bleu)
     | (Vert, Vert, Vert) ->
       do d = Immobile done
     | (_, Bleu, Rouge) | (_, Rouge, Vert) | (_, Vert, Bleu) ->
       do d = Direct done
     | (_, Rouge, Bleu) | (_, Bleu, Vert) | (_, Vert, Rouge) ->
       do d = Indirect done
     | _ -> do d = Indetermine done
     end

Synchronous Lucid also supports programming with signals, shared memories and automaton.

Dataflow with Object-Orientation

We can also think Word, Excel and Powerpoint as reactive systems. We know object-oriented programming has a success in this field. It implies it’s possible to combine dataflow and object-orientation.

Dataflow with Generative Programming

The idea of staged computation may be used to improve the performance of dataflow programs and increase the expressiveness of dataflow languages.

Staged computation (e.g. LMS) can differentiate compilation-time values from run-time runs, thus enabling the mixing of stream values with static values.

References

  • Gilles, K. A. H. N. (1974). The semantics of a simple language for parallel programming. Information Processing’74: Proceedings of the IFIP Congress (Vol. 74, pp. 471-475).

  • Caspi, P., Hamon, G., & Pouzet, M. Lucid Synchrone, un langage de programmation des systèmes réactifs. Systèmes Temps-réel: Techniques de Description et de Vérification-Théorie et Outils, 217-260. ISO 690

  • Halbwachs, N., Caspi, P., Raymond, P., & Pilaud, D. (1991). The synchronous data flow programming language LUSTRE. Proceedings of the IEEE, 79(9), 1305-1320.

  • Lucid Synchrone

[Back to Top]