Recently I read several papers about synchronous dataflow programming languages, it broadened my view about programming languages, thus I want to summarize and share what I’ve learned in this post.
Broadly speaking, computer systems can be classified into two categories:
- transformational systems
- reactive systems
Transformational systems usually take some input, execute and generate
the output on termination. For example, the command
searching is a transformational system.
In contrast, reactive systems continuously interact with the environment. Usually, the response time of reactive systems have to satisfy the requirement induced in specific application domains. The typical usage of reactive systems is automatic control in avionics, manufacturing and nuclear plants.
Synchronous dataflow programming languages, e.g. Synchronous Lucid, Lustre, Signal and Esterel, are proposed for programming reactive systems – automatic control systems in particular. These systems not only have strong time constraints, but also require high reliability. The language Lustre has been successfully used in several critical industries, including the application in Airbus A380.
We can think of dataflow programming as a network. For example, the
following diagram shows the dataflow representation of the expression
(1 + x)*y + (1 + x)*z:
1 y --->-----+ +---<--- x | | ---->---[+]---->----[*]---->----[+]---->-- | | | | +----->----[*]---->-----+ | ------->----+ z
The flow of data in the network resembles the flow of water in channels. The nodes in the network are processing stations of data, which correspond to functions in the program. These processing stations can work in parallel, i.e. the sibling functions can be executed at the same time by multi-processors or multi-computers. Thus, the dataflow model is a simple and natural model for parallel programming.
To program in dataflow style is usually to define a set of equations describing the network. Each equation corresponds to one processing station in the network. The order of the equations is usually insignificant, and each variable can only be assigned once. For example, following equations describe the network we’ve seen above:
a = 1 + x b = a * y c = a * z d = b + c
The single assignment requirement implies that dataflow programming is declarative (instead of imperative, or control-flow), like logic programming and functional programming. Declarative programming makes verification easier thanks to referential transparency, i.e. substitution of variables by its definition is always safe.
Every Data is a Flow
The most distinctive feature of dataflow programming is that every
data is a flow. A variable
x means an infinite sequence of
x3, ..., the constant
1 means an infinite sequence of 1s.
Correspondingly, the operators
if/else are extended to
operate on sequences of data. The following table illustrates this by
| c | true false true ... | x | 1 3 5 ... | y | 2 4 6 ... | x + y | 3 7 11 ... | if c x then y | 1 4 5 ...
Dataflow programming allows manipulation of data sequences by sequence
operators, such as
next, which enables operating on the
history of a sequence. The meanings of these operators are illustrated
| x | 1 3 5 ... | y | 2 4 6 ... | next y | 4 6 8 ... | x fby y | 1 2 4 ...
x fby y returns a new sequence by prepending the
y with the first element of the sequence
Following code shows the natural number sequence and the Fibonacci sequence represented in dataflow programming:
n = 1 fby n + 1 fib = 0 fby ( 1 fby fib + next fib )
Kahn gave a formal semantics of dataflow programming. In the
five page paper, he not only gave a formalization of dataflow
networks, but also introduced recursive parallel networks. He
mentioned three operators on data sequences, namely
remainder (R) and
As noted by Kahn, a distinctive merit of dataflow programming is the eradication of the notion of state of a complex system and the resulting of deterministic programs. The determinism also enables sequential simulation/execution of dataflow programs.
Seen from a different perspective, the explicit representation of time (history) avoids stateful computation. Time and state are closed intertwined in programming.
Synchronous Dataflow Programming
The general dataflow model is not very specific about the temporal behaviors of the data – whether the flow of data is at regular speed or asynchronous. Synchronous dataflow programming assumes that the flow of data in the network is at determinate frequency, though different edges in the network may have different frequencies.
This assumption greatly enriches the expressiveness of the general dataflow model in terms of temporal behaviors of systems. It has been proved that the synchronous dataflow language Lustre is adequate to describe digital circuits.
Synchronous dataflow programs can be compiled to sequential programs. This is another merit of the model.
In the following, I’ll take the language Lustre as an example to introduce the most distinctive features of synchronous dataflow programming.
In synchronous dataflow programming, each data in the stream strictly correspond to one clock. Thus, the sequence operators have a strict temporal interpretation. The meanings of these operators are illustrated below:
| x | 1 | 3 | 5 | ... | | y | 2 | 4 | 6 | ... | | x fby y | 1 | 2 | 4 | ... | | pre y | nil | 1 | 3 | ... | | x -> y | 1 | 4 | 6 | ... |
x fby y returns a new sequence by prepending the
y with the first element of the sequence
pre x delays the sequence
x by one clock. The
x -> y constructs a new sequence by replacing the first
element of the sequence
y by the first element of the sequence
It’s easy to see that
x -> pre y is equivalent to
x fby y.
Following code can detect edges of a boolean signal:
let edge c = false -> c & not (pre c)
Synchronous dataflow languages usually support programming multiple clocks in one system through some clock calculus.
Synchronous Dataflow with Functional Programming
Marc Pouzet proposed Synchronous Lucid, which is a combination of strongly typed functional programming with synchronous dataflow programming. More precisely, the combination of ML with Lustre.
Just as there are two categories of operators, namely extensional operators and temporal operators (intensional operators), there are also two kinds of functions, combinatory functions and sequential functions (or extensional/intensional). Combinatory functions are stateless, the output only depends on the same instant of the inputs. Sequential functions are stateful, the output usually depends on history valuations of the inputs. The following code shows two example functions, one for each kind:
let xor (a, b) = (a & not b) or (not a & b) // combinatory let node edge c = false -> c & not (pre c) // sequential
Synchronous Lucid allows programming in multiple clocks, each data may
have a different clock. The clock information is captured in the type
system by dependent kinding. The operator
manipulation of clocks. The keyword
clock enables introducing a
clock by a sequence of boolean values.
let clock half = count 2 true let node odd x = x when half
It’s also possible to define algebraic data types, which should be interpreted as a sequence of values, each value is of the corresponding algebraic data type. Following example code shows how to detect the direction of a colored turning disk:
type color = Bleu | Rouge | Vert type dir = Direct | Indirect | Indetermine | Immobile let node direction i = d where rec pi = i fby i and ppi = i fby pi and match ppi, pi, i with (Rouge, Rouge, Rouge) | (Bleu, Bleu, Bleu) | (Vert, Vert, Vert) -> do d = Immobile done | (_, Bleu, Rouge) | (_, Rouge, Vert) | (_, Vert, Bleu) -> do d = Direct done | (_, Rouge, Bleu) | (_, Bleu, Vert) | (_, Vert, Rouge) -> do d = Indirect done | _ -> do d = Indetermine done end
Synchronous Lucid also supports programming with signals, shared memories and automaton.
Dataflow with Object-Orientation
We can also think Word, Excel and Powerpoint as reactive systems. We know object-oriented programming has a success in this field. It implies it’s possible to combine dataflow and object-orientation.
Dataflow with Generative Programming
The idea of staged computation may be used to improve the performance of dataflow programs and increase the expressiveness of dataflow languages.
Staged computation (e.g. LMS) can differentiate compilation-time values from run-time runs, thus enabling the mixing of stream values with static values.
Gilles, K. A. H. N. (1974). The semantics of a simple language for parallel programming. Information Processing’74: Proceedings of the IFIP Congress (Vol. 74, pp. 471-475).
Caspi, P., Hamon, G., & Pouzet, M. Lucid Synchrone, un langage de programmation des systèmes réactifs. Systèmes Temps-réel: Techniques de Description et de Vérification-Théorie et Outils, 217-260. ISO 690
Halbwachs, N., Caspi, P., Raymond, P., & Pilaud, D. (1991). The synchronous data flow programming language LUSTRE. Proceedings of the IEEE, 79(9), 1305-1320.