Where is the Domain in Domain-Specific Languages

03 Mar 2021

Why Domain-Specific Languages?
What is a Domain-Specific Language?
Where is the Domain?
The Pitfall of DSLs
References

Domain-specific languages (DSLs) and domain-specific architectures (DSAs) are hot topics in both research and industry [1, 2, 3, 4, 5]. However, the concept of “domain-specific languages” raises some questions:

If a general-purpose language can do everything a domain-specific language can do, why bother with domain-specific languages?
Programming languages have roots in logic, which is known to be topic-neutral. How can a language possibly be made domain-specific?

In this post, I’d like to convince you that

The main concern of DSLs is to improve the qualities of programs and the ergonomics of programming, such as safety, security, reliability, productivity, and performance, rather than functionality.
There are no domains in domain-specific languages. Domains influence the design of DSLs only indirectly.

Why Domain-Specific Languages?

If a general-purpose language can do everything a domain-specific language can do, why bother with domain-specific languages? There is no computation that can be expressed in a DSL but not in a general-purpose language. Therefore, expressiveness cannot be the main concern in the design of a DSL, though it might be a concern when improving a DSL.

The natural answer is that DSLs mainly care about the qualities of programs and the ergonomics of programming, such as safety, security, reliability, productivity, and performance.

For example, in real-time control systems, we might remove general recursion from a DSL to ensure that programs always terminate.
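To make the termination argument concrete, here is a minimal sketch of a hypothetical control-loop DSL, deep-embedded in Python; every name in it (`Num`, `Var`, `Add`, `Assign`, `Repeat`, `run`) is invented for illustration. The language has no functions and no `while`, and its only loop, `Repeat`, fixes its iteration count before the first iteration, so by induction on the program structure every program terminates.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical control-loop DSL: expressions and statements only.
# There are no function definitions, no calls, and no while-loops.

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, Add]

@dataclass
class Assign:
    name: str
    expr: Expr

@dataclass
class Repeat:
    count: Expr          # evaluated once, before the first iteration
    body: List["Stmt"]

Stmt = Union[Assign, Repeat]

def eval_expr(e: Expr, env: dict) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    return eval_expr(e.left, env) + eval_expr(e.right, env)

def run(stmts: List[Stmt], env: dict) -> dict:
    # The interpreter recurses only into syntactically nested statements,
    # so its recursion depth is bounded by the size of the program text.
    for s in stmts:
        if isinstance(s, Assign):
            env[s.name] = eval_expr(s.expr, env)
        else:
            bound = max(0, eval_expr(s.count, env))  # fixed before looping
            for _ in range(bound):
                run(s.body, env)
    return env

# x starts at 0 and is incremented by 2 five times.
prog = [Assign("x", Num(0)),
        Repeat(Num(5), [Assign("x", Add(Var("x"), Num(2)))])]
print(run(prog, {})["x"])  # 10
```

Because the bound is computed up front, a loop body that changes the variable used in the count cannot extend the loop.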

In numeric computing, a language might make matrices and their operations first-class language elements. The DSL compiler can then use domain knowledge, namely the algebraic laws of matrices, to optimize programs during compilation.
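A minimal sketch of what such an optimization might look like, assuming a hypothetical deep-embedded matrix DSL in Python (`Mat`, `Mul`, `T`, `shape`, `cost` and `optimize` are illustrative names, not an existing library). The compiler sees the whole expression tree, so it can use the associativity of matrix multiplication to pick a cheaper bracketing and the law (A^T)^T = A to drop redundant transposes.

```python
from dataclasses import dataclass

# Hypothetical deep embedding of matrix expressions. Because a program is a
# tree rather than a computed value, a compiler pass can rewrite it using
# algebraic laws of matrices before any arithmetic happens.

@dataclass(frozen=True)
class Mat:
    name: str
    rows: int
    cols: int

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

@dataclass(frozen=True)
class T:  # transpose
    arg: object

def shape(e):
    """Static shape check; raises before any data is touched."""
    if isinstance(e, Mat):
        return (e.rows, e.cols)
    if isinstance(e, T):
        r, c = shape(e.arg)
        return (c, r)
    (r, k), (k2, c) = shape(e.left), shape(e.right)
    if k != k2:
        raise TypeError(f"cannot multiply {r}x{k} by {k2}x{c}")
    return (r, c)

def cost(e):
    """Scalar multiplications done by naive multiplication (transposes are free here)."""
    if isinstance(e, Mat):
        return 0
    if isinstance(e, T):
        return cost(e.arg)
    (r, k), (_, c) = shape(e.left), shape(e.right)
    return cost(e.left) + cost(e.right) + r * k * c

def optimize(e):
    """Local, greedy rewriting; not a full matrix-chain optimizer."""
    if isinstance(e, Mat):
        return e
    if isinstance(e, T):
        inner = optimize(e.arg)
        # Law: (A^T)^T = A
        return inner.arg if isinstance(inner, T) else T(inner)
    left, right = optimize(e.left), optimize(e.right)
    candidates = [Mul(left, right)]
    # Law (associativity): (A B) C = A (B C); keep the cheaper bracketing.
    if isinstance(left, Mul):
        candidates.append(Mul(left.left, Mul(left.right, right)))
    if isinstance(right, Mul):
        candidates.append(Mul(Mul(left, right.left), right.right))
    return min(candidates, key=cost)

A, B = Mat("A", 1000, 1000), Mat("B", 1000, 1000)
v = Mat("v", 1000, 1)
prog = Mul(Mul(A, B), v)
print(cost(prog), cost(optimize(prog)))  # 1001000000 2000000
```

Rebracketing `(A B) v` as `A (B v)` replaces a matrix-matrix product with two matrix-vector products. A library that evaluates each operation as soon as it is called never sees the whole expression and cannot make this choice on its own.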

In a multi-tenant system (e.g., a web browser or a SaaS platform), we might restrict side effects in a DSL for security reasons. This way, we can guarantee that programs written by customers cannot harm the system.

Likewise, while it is possible to hand-write a recursive-descent parser, it is much easier to use parser combinators for the purpose, or to write the grammar in a DSL and let a parser generator such as Lex/Yacc derive a parser automatically.
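The parser-combinator approach can be sketched from scratch in a few dozen lines of Python. All names below (`regex`, `seq`, `alt`, `many`, `fmap`, `parse_all`) are hypothetical rather than taken from a particular library, and to stay short the parser evaluates sums of integers while parsing instead of building a syntax tree. Each grammar rule becomes an ordinary value built from small combinators, so the code mirrors the grammar `expr := term ("+" term)*`, `term := number | "(" expr ")"`.

```python
import re

# Hypothetical, from-scratch parser combinators (no external library).
# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def regex(pattern):
    compiled = re.compile(r"\s*(" + pattern + r")")  # skip leading whitespace
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

def many(p):
    # Zero or more repetitions; assumes p consumes input whenever it succeeds.
    def parse(text, pos):
        values = []
        while True:
            r = p(text, pos)
            if r is None:
                return values, pos
            value, pos = r
            values.append(value)
    return parse

def fmap(p, f):
    def parse(text, pos):
        r = p(text, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse

# Grammar:  expr := term ("+" term)*      term := number | "(" expr ")"
def expr(text, pos):  # indirection that allows the recursive rule below
    return expr_rule(text, pos)

number = fmap(regex(r"\d+"), int)
term = alt(number, fmap(seq(regex(r"\("), expr, regex(r"\)")), lambda v: v[1]))
expr_rule = fmap(seq(term, many(fmap(seq(regex(r"\+"), term), lambda v: v[1]))),
                 lambda v: v[0] + sum(v[1]))

def parse_all(p, text):
    r = p(text, 0)
    if r is None or text[r[1]:].strip():
        raise SyntaxError(f"cannot parse {text!r}")
    return r[0]

print(parse_all(expr, "1 + (2 + 3) + 4"))  # 10
```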

What is a Domain-Specific Language?

In this post, we restrict ourselves to languages with an explicit and structured representation of programs. With an explicit representation, it is possible to check properties of a program statically, to optimize it, and to compile it to different target platforms.
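As a small illustration of what an explicit representation buys, the hypothetical Python sketch below represents programs of a tiny arithmetic language as a syntax tree (`Lit`, `BinOp`), rejects ill-typed programs with `typecheck` before running them, and compiles the same tree to two different targets: Python source text and instructions for a toy stack machine. The names and the stack-machine format are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical expression language with an explicit AST. The same tree is
# checked statically and then handed to two independent backends.

@dataclass(frozen=True)
class Lit:
    value: object  # an int or a bool

@dataclass(frozen=True)
class BinOp:
    op: str        # "+", "*" or "<"
    left: object
    right: object

def typecheck(e):
    """Return "int" or "bool"; raise TypeError without running the program."""
    if isinstance(e, Lit):
        return "bool" if isinstance(e.value, bool) else "int"
    if typecheck(e.left) != "int" or typecheck(e.right) != "int":
        raise TypeError(f"operands of {e.op!r} must be int")
    return "bool" if e.op == "<" else "int"

def to_python(e):
    """Backend 1: Python source text."""
    if isinstance(e, Lit):
        return repr(e.value)
    return f"({to_python(e.left)} {e.op} {to_python(e.right)})"

def to_stack(e):
    """Backend 2: instructions for a tiny stack machine."""
    if isinstance(e, Lit):
        return [("push", e.value)]
    return to_stack(e.left) + to_stack(e.right) + [(e.op,)]

def run_stack(code):
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b, "<": lambda a, b: a < b}
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[instr[0]](a, b))
    return stack.pop()

prog = BinOp("<", BinOp("+", Lit(1), Lit(2)), BinOp("*", Lit(2), Lit(2)))
print(typecheck(prog))                                   # bool
print(to_python(prog))                                   # ((1 + 2) < (2 * 2))
print(eval(to_python(prog)), run_stack(to_stack(prog)))  # True True

try:
    typecheck(BinOp("+", BinOp("<", Lit(1), Lit(2)), Lit(3)))
except TypeError as err:
    print("rejected before running:", err)
```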

DSLs can be implemented standalone with their own surface syntax (external DSLs), or embedded in a host language, reusing the syntax of the host language (internal DSLs).

Here we are stricter than Martin Fowler in his definition of internal DSLs [6]:

An internal DSL is just a particular idiom of writing code in the host language. So a Ruby internal DSL is Ruby code, just written in particular style which gives a more language-like feel.

The internal DSLs defined above are usually referred to as shallow embeddings. They have no explicit representation of programs and thus cannot support manipulating DSL programs.

In contrast, internal DSLs with an explicit representation of programs are usually called deep embeddings, which permit interesting manipulations of programs, such as static checks, optimization, and compilation. An example of such a language is Chisel [9], a DSL for hardware design embedded in Scala.
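The difference between the two kinds of embedding can be sketched in Python, which, like Scala, supports operator overloading; this is only an analogy and not how Chisel is implemented. The same user function `poly` is run twice: applied to a plain integer it behaves like a shallow embedding and yields only a number, while applied to the hypothetical `Var` node it builds a syntax tree that can be printed, simplified and evaluated. All class and function names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical deep embedding via operator overloading. Arithmetic on Expr
# values builds a syntax tree instead of computing a number.
class Expr:
    def __add__(self, other): return Add(self, lift(other))
    def __radd__(self, other): return Add(lift(other), self)
    def __mul__(self, other): return Mul(self, lift(other))
    def __rmul__(self, other): return Mul(lift(other), self)

@dataclass(frozen=True)
class Const(Expr):
    value: int

@dataclass(frozen=True)
class Var(Expr):
    name: str

@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr

@dataclass(frozen=True)
class Mul(Expr):
    left: Expr
    right: Expr

def lift(x):
    return x if isinstance(x, Expr) else Const(x)

def poly(x):
    # One piece of user code; what it means depends on what x is.
    return x * x + 1 * x + 0

def show(e):
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    op = "+" if isinstance(e, Add) else "*"
    return f"({show(e.left)} {op} {show(e.right)})"

def simplify(e):
    """Apply x + 0 = x and x * 1 = x (and their mirror images)."""
    if not isinstance(e, (Add, Mul)):
        return e
    left, right = simplify(e.left), simplify(e.right)
    unit = Const(0) if isinstance(e, Add) else Const(1)
    if left == unit:
        return right
    if right == unit:
        return left
    return type(e)(left, right)

def evaluate(e, env):
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    left, right = evaluate(e.left, env), evaluate(e.right, env)
    return left + right if isinstance(e, Add) else left * right

print(poly(3))                             # shallow: only the result, 12
prog = poly(Var("x"))                      # deep: an inspectable program
print(show(prog))                          # (((x * x) + (1 * x)) + 0)
print(show(simplify(prog)))                # ((x * x) + x)
print(evaluate(simplify(prog), {"x": 3}))  # 12
```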

We do not regard shallowly embedded DSLs as DSLs here. They are definitely important and useful in practice, but they are less interesting from the language perspective: without an explicit representation, we cannot perform any operation on their programs.

Where is the Domain?

Programming languages have roots in logic, which is known to be topic-neutral. How can a language possibly be made domain-specific?

Let’s examine the major design considerations of a DSL:

syntax
semantics
common constructs, such as tuples, records, lists, options, and variants
scoping, naming and name resolution
type system and type checking
custom data types
identity and equality of values
functions and procedures
effects, such as I/O, randomness, and errors (division by zero, overflow, etc.)
parametric polymorphism and type inference
exception mechanism
mutability vs. immutability
overloading of operators and/or methods
global names and their initialization semantics
module system and separate compilation
metaprogramming, such as macros

As can be seen above, most tasks in the design and implementation of a DSL have little to do with a particular domain. That is one reason why I say there are no domains in domain-specific languages.

But do not get me wrong: domains do affect the design of domain-specific languages. The influence comes mainly not from putting a “domain” inside the language, but from shaping the language to ensure non-functional properties of the programs written in it.

For example, in a multi-tenant cloud environment, a DSL might be equipped with a capability system so that code snippets from customers can execute safely in the shared system and perform only permitted actions.
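A minimal sketch of such a capability discipline, assuming a hypothetical deep-embedded tenant DSL in Python where the only way to act on the outside world is a `Call` to a named capability, and the host grants capabilities per tenant as a dictionary of functions. The names (`Lit`, `Call`, `required`, `run_tenant_program`, `price_of`) are made up for illustration. Because programs have an explicit representation, the host can compute every capability a program might use and reject it before executing anything.

```python
from dataclasses import dataclass

# Hypothetical tenant DSL: the only way a program can act on the outside
# world is to Call a named capability that the host has explicitly granted.

@dataclass(frozen=True)
class Lit:
    value: object

@dataclass(frozen=True)
class Call:
    capability: str
    args: tuple = ()

def required(e):
    """Every capability a program could invoke, found without running it."""
    if isinstance(e, Lit):
        return set()
    caps = {e.capability}
    for a in e.args:
        caps |= required(a)
    return caps

def _eval(e, granted):
    if isinstance(e, Lit):
        return e.value
    args = [_eval(a, granted) for a in e.args]
    return granted[e.capability](*args)

def run_tenant_program(program, granted):
    missing = required(program) - set(granted)
    if missing:
        raise PermissionError(f"ungranted capabilities: {sorted(missing)}")
    return _eval(program, granted)

# The host decides, per tenant, which operations are available.
prices = {"apple": 3, "pear": 5}
granted = {"price_of": prices.__getitem__, "add": lambda a, b: a + b}

order = Call("add", (Call("price_of", (Lit("apple"),)), Call("price_of", (Lit("pear"),))))
print(run_tenant_program(order, granted))  # 8

sneaky = Call("delete_file", (Lit("data.db"),))
try:
    run_tenant_program(sneaky, granted)
except PermissionError as err:
    print(err)  # rejected before any capability is called
```

The sketch trusts the granted functions themselves; the DSL only ensures that nothing else is reachable.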

Of course, just as we may augment a logic with a domain theory by adding a set of axioms, we can also add a domain to the language. For example, we may add matrices and their operations as native constructs in a numerical language. But compared to the whole language, such additions are small.

Meanwhile, such a domain is not a domain of human activity, such as the car industry, avionics, insurance, or scientific research. The proper domains in domain-specific languages are always mathematical abstractions, and abstraction implies moving away from a specific domain. An abstraction might originate from a particular domain of human activity, but its essence has nothing to do with that domain, and it is potentially applicable to completely different domains that share the same mathematical structure. That is the second reason why I say there are no domains in domain-specific languages.
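One way to see an abstraction outliving its original domain is the semiring, a standard example that the post itself does not use. In the hypothetical Python sketch below, a single Floyd-Warshall-style `closure` routine is parameterized by a `Semiring` record: with (min, +) it computes shortest road distances, and with (or, and) it computes which build targets transitively depend on which. The names are illustrative, and the shortest-path instance assumes non-negative edge weights.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any                          # value meaning "no path"
    one: Any                           # value meaning "empty path"
    plus: Callable[[Any, Any], Any]    # choose between alternative paths
    times: Callable[[Any, Any], Any]   # extend a path with another

def closure(sr, n, edges):
    """Floyd-Warshall-style all-pairs closure over a semiring.

    A sketch: assumes an idempotent `plus` and, for shortest paths,
    non-negative weights.
    """
    d = [[sr.one if i == j else edges.get((i, j), sr.zero) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = sr.plus(d[i][j], sr.times(d[i][k], d[k][j]))
    return d

# Same algorithm, two unrelated "domains".
min_plus = Semiring(math.inf, 0, min, lambda a, b: a + b)                 # road distances
or_and = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)  # build dependencies

roads = {(0, 1): 4, (1, 2): 1, (0, 2): 7}
print(closure(min_plus, 3, roads)[0][2])     # 5, via the route 0 -> 1 -> 2

depends_on = {(0, 1): True, (1, 2): True}
print(closure(or_and, 3, depends_on)[0][2])  # True: 0 transitively depends on 2
```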

It is difficult to get the abstractions right, and getting them right makes a difference. For example, GPUs are domain-specific computing devices. If the abstractions are not done right, GPUs can only be used for graphics. When they are done correctly, GPUs become a general computing platform for genomics, deep learning, scientific computing, etc.

A particular domain of human activity relates to a DSL through the programs written in the DSL. While a DSL is usually domain-agnostic, a program written in it can refer to entities and solve problems in a particular domain of human activity. For example, SQL itself is domain-agnostic, yet it is used in nearly every domain of human activity.

The specific domain may also indirectly influence the design of a domain-specific language by committing it to a particular programming paradigm or by incorporating useful linguistic abstractions. In “The Early History of Smalltalk”, Alan Kay observed that programming languages seem to be either an “agglutination of features” or a “crystallization of style.” These design strategies can also be applied to domain-specific languages to make domain-specific programs more reliable, more secure, and easier to construct and optimize. For example, relational databases favor SQL, a declarative language, which offloads most of the optimization work from programmers to the database engine and database administrators.
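The effect of the declarative style can be observed with Python's built-in `sqlite3` module. In the sketch below (table, column and index names are made up), the query text never changes, yet after an index is created the engine switches from scanning the whole table to searching the index. The exact wording of `EXPLAIN QUERY PLAN` output differs between SQLite versions, so the comments only describe it loosely.

```python
import sqlite3

# Illustrative schema and data, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [(f"c{i % 100}", float(i)) for i in range(1000)])

# The query says which rows are wanted, not how to find them.
query = "SELECT id, amount FROM orders WHERE customer = ?"

def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(query, ("c7",))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query, ("c7",))                 # same query text as before
rows = conn.execute(query, ("c7",)).fetchall()

print(before)     # typically a full scan of orders
print(after)      # typically a search using idx_orders_customer
print(len(rows))  # 10
```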

I hope that, by now, I have provided enough arguments to support the view that the domain in a domain-specific language is neither a domain of business nor one of scientific activity.

The Pitfall of DSLs

Designing domain-specific programming languages is hard, as pointed out by Tony Hoare in his report [7]:

The design of a DSL needs to address all the problems of general-purpose language design, such as abstractions for code reuse, type checking, effects and exceptions, polymorphism, and separate compilation.
Users will ask for more features to be supported in the DSL, eventually making it as complex as a general-purpose language.

Users usually ask for new features:

to improve code reuse, e.g., polymorphism and classes
to compose programs in new ways
to perform side effects in programs
to use recursion in programs

This is a slippery slope. While some additions are benign, a new feature may also break program properties. For example, adding general recursion might make some static checks undecidable, and programs would no longer be guaranteed to terminate.

One potential defense against feature creep is to base a DSL on a small, well-studied calculus. For example, SQL is based on relational algebra [10]. This approach might not be bulletproof, but I believe it can at least resist drastic and reckless changes to the language and maintain the integrity of its design.
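To give a feel for how small such a core can be, here is a toy, set-based model of three relational-algebra operators (selection, projection and natural join) in Python. It is not how any SQL engine is implemented, and the function names are hypothetical. The point is that a compact calculus comes with well-understood laws, such as pushing a selection below a join, that implementers and users can rely on.

```python
# Toy set-based relational algebra. A row is a frozenset of (attribute, value)
# pairs and a relation is a frozenset of rows, so duplicates disappear.

def relation(*rows):
    return frozenset(frozenset(r.items()) for r in rows)

def select(pred, rel):
    """Selection: keep the rows satisfying pred."""
    return frozenset(r for r in rel if pred(dict(r)))

def project(attrs, rel):
    """Projection: keep only the named attributes."""
    return frozenset(frozenset((a, v) for a, v in r if a in attrs) for r in rel)

def join(left, right):
    """Natural join: merge rows that agree on every shared attribute."""
    out = set()
    for lrow in left:
        for rrow in right:
            ld, rd = dict(lrow), dict(rrow)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return frozenset(out)

employees = relation({"name": "ada", "dept": 1}, {"name": "bob", "dept": 2})
depts = relation({"dept": 1, "title": "R&D"}, {"dept": 2, "title": "Sales"})

def is_rnd(row):
    return row["title"] == "R&D"

# A law of the algebra: a selection that only mentions attributes of one
# operand can be pushed below the join, shrinking the join's input.
q1 = project({"name"}, select(is_rnd, join(employees, depts)))
q2 = project({"name"}, join(employees, select(is_rnd, depts)))
print(q1 == q2, q1 == relation({"name": "ada"}))  # True True
```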

Another effective strategy is to ask all involved parties to read The Emperor’s Old Clothes [11]:

There is nothing a mere scientist can say that will stand against the flood of a hundred million dollars. But there is one quality that cannot be purchased in this way – and that is reliability. The price of reliability is the pursuit of the utmost simplicity.

References

[1] A Domain-Specific Architecture for Deep Neural Networks, Norman P. Jouppi et al., 2018
[2] Domain-Specific Hardware Accelerators, William J. Dally et al., 2020
[3] A New Golden Age for Computer Architecture, John L. Hennessy et al., 2019
[4] When and how to develop domain-specific languages, Marjan Mernik et al., 2005
[5] Domain Specific Languages, Martin Fowler, 2010
[6] DSL Q & A, Martin Fowler
[7] Everything You’ve Wanted to Know about Programming Languages but Have Been Afraid to Ask, Tony Hoare, 1978
[8] Digital Design with Implicit State Machines, Fengyun Liu et al., 2020
[9] Chisel 3: A Modern Hardware Design Language
[10] Relational Algebra
[11] The Emperor’s Old Clothes, Tony Hoare, 1980