Copyright © 2008 David Schmidt

Chapter 8:
Domain-Specific Languages


8.1 Domain-specific software architecture
8.2 Domain-specific language
8.3 Domain-specific programming language: a ``little language''
    8.3.1 Top-down DSPLs
    8.3.2 Bottom-up DSPL
8.4 How to design a top-down DSL
8.5 Example Top-down DSPLs
    8.5.1 Document language
    8.5.2 Parser language
    8.5.3 Gamer language
8.6 How to design a bottom-up DSL
8.7 Further reading


It is unlikely that you will ever design a general-use language like Fortran, C++, ML, or Prolog, but if you become a professional software engineer or software architect, it is highly likely that you will specialize in some problem area, like telecommunications, aviation, banking, or gaming. You will become expert at building systems in your problem area, and you may well design a notation, a language, that helps you and others write solutions to their problems in this area. In this case, you are a designer of a domain-specific language that is used to build domain-specific software architectures.

This chapter introduces these concepts, applying the concepts already learned.


8.1 Domain-specific software architecture

Every large system is built from software and hardware components; the pattern of layout and connection of the components is called its architecture. A software architecture is the layout of software components. The software architecture is deployed (installed) on the hardware architecture.

Specific problem areas, e.g., flight-control or telecommunications or banking, use specific hardware architectures, and they also use specific software architectures. When a new model of airplane is designed, the hardware architecture (the airplane hardware, including its computers) is based on a hardware design that has succeeded in the past. (It is too great a risk to start from scratch; it is also better to build on and refine what is known to work.) The software architecture for the plane will also be based on some standard layout that is known to work well.

Software architects use a collection of concepts and techniques to build a new system in an established problem area; this collection is called a domain-specific software architecture. It contains concepts, languages, tools, and methods:

  1. application domain: defines the problem area, including the fundamental domain concepts and terminology (how the clients, designers, and builders understand and discuss the problem); customer requirements (what must the system do); scenarios (examples of behaviors); configuration models (the high-level blueprints of the system and its operation --- entity-relationship (dependency) diagrams, data flow diagrams, deployment diagrams, etc.)
  2. reference requirements: These are the ``features'' or ``customizations'' or ``attributes'' or ``ordering options'' that the clients choose to configure the desired system. (Think about all the choices you make when you order a brand new car from an auto dealer --- colors, engine options, accessories --- these are the reference requirements for the car you want.)

    Strictly speaking, the reference requirements form part of the terminology of the application domain, but they are often specially identified because they are treated specially in the implementation methodology.

  3. reference architecture: the software and hardware architectures that will support the implementation.
  4. supporting environment/infrastructure: hardware and software languages, libraries, frameworks, tools, and platforms for modelling, designing, implementing, and evaluating the system.
  5. a process or methodology for designing, implementing, and evaluating the system using the requirements, reference architecture, and environment.
We can't study all these topics here. But it is important to know that professionals start from a domain-specific software architecture to build complex, working systems. What we will study here is primarily the first item, namely the terminology/language one uses to discuss and solve problems within an application domain. This language is called a domain-specific language.


8.2 Domain-specific language

English is a general-purpose language. Legal English is a special-purpose language, dedicated to the writing of contracts and laws --- it is specific to the domain of contracts and laws. Algebra is a domain-specific language for stating numerical relationships.

A language that is designed for discussing problems, behaviors, and solutions within a problem domain is a domain-specific language (DSL). The language's vocabulary includes concepts and notation from the problem domain: the nouns, pronouns, adjectives, verbs, and adverbs of the language. The language lets participants (people and machines) discuss and implement solutions within the domain. Because its vocabulary is limited to the specific domain, a DSL is often useless for discussing and solving problems outside the domain.

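A DSL uses concepts familiar to people who work in the domain. For example, say that you must install an alarm system in an office building, and you must discuss the setup with the building's owners and employees. A DSL for sensor-alarm networks would discuss the devices in the network (movement detectors, cameras), the people involved (guards, intruders), and the events and messages that pass among them.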

Here is a scenario, stated in the DSL:
``when a movement detector detects an intruder in a room, it generates a movement-event for a camera and sends a message to a guard....''
The DSL lets you talk about the behaviors of the alarm system so that you can extract, design, and even implement the system using the DSL's vocabulary.
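
As a rough illustration of how the scenario's vocabulary might carry over into code, here is a minimal Python sketch. Every name in it (MovementEvent, Camera, Guard, MovementDetector, the room ``lobby'') is hypothetical and invented for this example; it is one possible encoding, not a definitive design.

```python
from dataclasses import dataclass, field


@dataclass
class MovementEvent:
    """A noun from the scenario: something moved in a room."""
    room: str


@dataclass
class Camera:
    """Hypothetical device that records the events it receives."""
    received: list = field(default_factory=list)

    def handle(self, event: MovementEvent) -> None:
        self.received.append(event)


@dataclass
class Guard:
    """Hypothetical person who receives text messages."""
    inbox: list = field(default_factory=list)

    def notify(self, message: str) -> None:
        self.inbox.append(message)


@dataclass
class MovementDetector:
    """Detector wired to one camera and one guard (an assumed layout)."""
    room: str
    camera: Camera
    guard: Guard

    def detect_intruder(self) -> None:
        # "generates a movement-event for a camera ..."
        self.camera.handle(MovementEvent(self.room))
        # "... and sends a message to a guard"
        self.guard.notify(f"movement detected in {self.room}")


camera, guard = Camera(), Guard()
detector = MovementDetector("lobby", camera, guard)
detector.detect_intruder()
```

The nouns of the scenario become classes and its verbs become methods, so the body of detect_intruder reads almost like the scenario itself.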

Compare the lingo of sensor alarms to the lingo you write in Java --- in the latter, the ``nouns'' are numbers, strings, and variables that name numbers, strings, commands, etc. The ``adjectives'' are data types and other declaration modifiers. The ``operations'' are arithmetic, data-structure indexing, function call, etc. ``Actions'' are commands, or groups of commands. ``Events'' can be GUI events or a call to a method to start execution. Java is a ``DSL'' for computation on (arrays of) numbers and strings.

Now, take your view of Java and think again of a domain like sensor alarms, or network protocols, or music composition or gaming. What are the domains of interest, their elements, the features, operations, actions, and events? How many of these are directly implemented (that is, ``understood'') by a computer? How many must be ``refined'' to be computational (understood by a computer)?

A simplistic view is that a DSL is a kind of ``restricted'' programming language, much like Legal English is ``restricted'' English:

RANGE OF PROGRAMMING LANGUAGES:

More General                                 More Specific
< ------------------------------------------------------- >
GPL (C, Java, ML, etc.)     DSL                      GUI
(Here, we treat a GUI as a ``programming language'' because a user ``programs'' with mouse drags and clicks.)

But a DSL is not exactly a restricted GPL: Consider the relationship between English and algebra --- the notation of the latter is definitely not a mere restriction of the former. It is the restriction of the problem area that is significant.

A DSL lets stakeholders (the participants in a systems project) communicate their ideas (needs, suggestions, solutions, implementations, orders). The DSL is a modelling language that lets us discuss models, structures, and behaviors specialized to a problem domain like telecommunications, banking, transportation, gaming, algebra, typesetting, etc.

If the computer is a ``participant,'' that is, we can use the DSL to tell the computer what to do --- we can program the computer --- then the DSL is a domain-specific programming language.


8.3 Domain-specific programming language: a ``little language''

A domain-specific programming language (DSPL) is a DSL that one uses to tell a computer what to do to solve a problem in the domain. For example, arithmetic is a DSPL for numbers, and algebra is a DSPL for numerical relationships. A DSPL will often provide some data structures and control structures in addition to its expressible values and operations. Programmers sometimes call DSPLs ``little languages'' (e.g., ``here is a little language for drawing figures''; ``here is a little language for linking files''.) Here is a short list of DSPLs that have/had wide use:
  1. Make --- for linking files
  2. Matlab --- for doing mathematics
  3. SQL --- for doing lookup queries and updates in databases
  4. VHDL --- for laying out hardware circuits
  5. Yacc, Bison, ANTLR (BNF) --- for programming a parser
  6. Excel --- for laying out and computing a spreadsheet
  7. HTML --- for formatting web documents
  8. PS, groff, LaTeX --- for typesetting documents
  9. eqn --- for typesetting math formulas
Many of the above languages came about because someone thought,
``It would be nice to have a little language to help me do ...this little job....''
So, that person designed a little language to do the little job.

In terms of domain-specific software architecture, someone might ask you,

``It would be nice to have a little language to help ...somebody... do ...some little job in this domain.... Can you put together something for us?''

For example, ``It would be nice to have a little language to help us lay out the wiring and sensors for a building's alarm system.''

Or, ``It would be nice to have a little language to help us write the protocols for how the movement detectors send/receive messages to/from the other devices and people in the network.''

This kind of wishful thinking can lead to a domain-specific programming language, in particular, a top-down domain-specific programming language.


8.3.1 Top-down DSPLs

Each of the languages in the list in the previous section does one thing well in one application domain; by no means can any of these languages be used for general-purpose computing. All the examples in the list are called top-down DSPLs because they are designed as stand-alone languages that implement domain concepts and nothing more. Since a top-down DSPL is a ``little language,'' it should be easier to learn and use than a general-purpose language. (If it isn't, then it is a failure!) Less-experienced programmers and non-programmers should be able to use a top-down DSPL to write solutions.

Excel is a good example --- it has a nice mix of graphical and textual notations that falls within the grasp of a user who has rudimentary math and problem-solving skills. The user can lay out a spreadsheet that computes totals of rows and columns.

Another good example is Yacc --- a user writes the BNF rules of a language, and from these rules the Yacc compiler builds a parser that matches them.

Another good example is SQL --- without knowing the internal layout of a database, a user can write a query in terms of sets and set operations, and the SQL interpreter executes the query as if it were a data-structures lookup algorithm.
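
To make the SQL point concrete, here is a small sketch that uses Python's standard sqlite3 module. The table sensors and its rows are made up for illustration. The query states which set of rows is wanted and says nothing about how the rows are stored or searched.

```python
import sqlite3

# In-memory database; the table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (id INTEGER, room TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?)",
    [(1, "lobby", "movement"), (2, "lobby", "smoke"), (3, "vault", "movement")],
)

# The query names WHAT is wanted (a set of rows), not HOW to find it.
rows = conn.execute(
    "SELECT id, room FROM sensors WHERE kind = 'movement' ORDER BY id"
).fetchall()
conn.close()
```

Here Python acts only as a carrier for the SQL text; the selection logic lives entirely in the little language.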

HTML lets a user format a web document in terms of paragraphs, lists, and fonts and hides from the user the details of spacing, line breaks, and painting text and pictures. (The document you are reading is ``programmed'' in HTML that I wrote by hand. Use the View/PageSource menu option on your web browser to see the program.)

A top-down DSPL can be used ``standalone'' without a connected general-purpose programming language.

Upon first hearing, it sounds like top-down DSPLs are wonderful --- a language for just my problem that lets me say exactly what I want! --- but in reality, a top-down DSPL is a ``mixed bag'' of assets and drawbacks:

For these reasons, top-down DSPLs are not always the best tool for solving domain-specific problems.


8.3.2 Bottom-up DSPL

There is another variant of DSPL, one that is used by an experienced programmer who wants to ``extend'' a general-purpose language with concepts specific to the problem domain. In this situation, the DSPL is programmed in the general-purpose, host language as a library of data structures and operations.

This is called a bottom-up DSPL. DSPLs for GUI-building are typically bottom-up DSPLs, because a GUI by itself is useless --- the GUI must be connected to components that do something. Here are some examples of GUI libraries (GUI DSPLs) that are mated to general-purpose languages:

Each of these DSPLs has its own nouns, verbs, styles, and templates. But they are implemented in their host languages, and a programmer must be capable of programming in the host language to use the GUI language.

Because of the popularity of GUIs, the DSPL for GUI building is ``married'' to its host language in Visual Basic and Visual C++. (But in the beginning, there were only Basic and C++.)

Experienced programmers naturally become bottom-up DSPL designers, because over time they will assemble a library of custom data structures, functions, and control structures (templates) that they use over and over again to solve problems in the same domain. Eventually, the programs they write consist mostly of the components of their library and less and less of new code, finally reaching the point where the underlying, host programming language acts merely as ``glue'' for connecting the components selected from the library.

At this point, the host language with its library is essentially a bottom-up DSPL, because the library has become ``more important'' to problem solving than the host language itself. What has happened is this:

The programmer has extended the host language ``upwards'' towards the problems to be solved.
This makes the host(glue)-language-plus-its-library a bottom-up DSPL.

In practice, bottom-up DSPLs often evolve starting from a dynamic-data-structure host language, like Scheme or Perl or Python or Ruby, because (i) there is less keyword notation to clutter the programs and (ii) there are few preset limitations on combinations of data and control structure. The custom-written library for the problem area is written in the host language, and it is oriented towards encoding ``domain-concepts-as-code'' (nouns as data structures, verbs as operations, adjectives and adverbs as control-structure templates) so that the scenarios discussed in the problem area's DSL can be readily converted into programs using the DSPL. Experienced programmers have good instincts for coding domain concepts as code and saving them as libraries. It is almost a matter of survival --- there is never enough time to build a new solution completely from scratch!
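
The idea of ``domain-concepts-as-code'' can be sketched in Python as follows. The tiny library below (the names when, signal, and the handlers are all hypothetical) offers one control-structure template, ``when an event happens, do an action,'' and one verb for announcing events. It is only an illustration of the approach under these assumptions, not a recommended alarm-system design.

```python
from collections import defaultdict

# Hypothetical mini-library: a registry mapping event names to handlers.
_handlers = defaultdict(list)


def when(event_name):
    """Control-structure template: 'when <event> happens, do <action>'."""
    def register(action):
        _handlers[event_name].append(action)
        return action
    return register


def signal(event_name, **details):
    """Verb: announce an event and run every action registered for it."""
    return [action(**details) for action in _handlers[event_name]]


# A "program" in the bottom-up DSPL: Python serves only as glue.
@when("movement")
def start_camera(room):
    return f"camera recording in {room}"


@when("movement")
def alert_guard(room):
    return f"guard alerted about {room}"


results = signal("movement", room="lobby")
```

The two decorated functions read close to the scenario text, while the host language supplies only function definitions and a call.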

A bottom-up DSPL has its strengths and weaknesses also:


8.4 How to design a top-down DSL

Become an expert in the problem domain: learn the vocabulary --- nouns, verbs, and adjectives. Develop many scenarios (case studies) within the domain. Extract from the scenarios patterns or schemes of structure, behavior, computation.

Try to develop the ``levels'' or ``layers'' of the language, which suggest the syntax (BNF grammar) of the DSL. It can be helpful to draw ad-hoc parse trees/operator trees for the scenarios that you wish to state in the domain. The trees will help you see the levels, the nouns, adjectives, verbs (operations), actions (sentences), and events (calls). If the scenarios read as if they are understandable by a computer (that is, computable), then this can be the basis of a top-down DSPL.
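
As a small illustration of moving from a scenario to a grammar and an operator tree, the Python sketch below parses one-line alarm rules. The grammar in the comment and the function parse_rule are invented for this example; a real DSL would need a richer grammar and better error reporting.

```python
# Hypothetical grammar for one-line alarm rules (invented for this sketch):
#   rule   ::= "when" DEVICE "detects" EVENT "then" action
#   action ::= VERB TARGET
#   DEVICE, EVENT, VERB, TARGET ::= a single word


def parse_rule(text):
    """Parse one rule into a small operator tree built from nested tuples."""
    words = text.split()
    keywords = (words[0], words[2], words[4]) if len(words) == 7 else None
    if keywords != ("when", "detects", "then"):
        raise ValueError(f"not a rule: {text!r}")
    device, event, verb, target = words[1], words[3], words[5], words[6]
    return ("rule", ("detects", device, event), ("action", verb, target))


tree = parse_rule("when detector detects movement then notify guard")
```

The resulting tree has two levels below the root, a condition and an action, which mirrors the levels one would look for when drawing the tree by hand.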

The parts of the scenarios that are not computable are probably part of a DSL modelling/design language, and you must consider how to refine those parts into something that is computable. (That's why methodology is a key part of Domain-Specific Software Architecture.)

Here are questions to ask:

Also, consider --- who are the users (programmers) for the DSPL? The language must be friendly towards these persons' views of the domain. If the top-down DSPL is for non-expert programmers (as Excel and HTML are), then you must de-emphasize imperative control and data-structure and use more definitional concepts, like functional (arithmetic-like) and predicate (Prolog-like) ones.

Most non-experts have difficulty with nested structures of any form --- sequencing is about the most they can handle. Repetition is also a challenge.

Data structures must be kept simple, resembling real-life, physical structures (a sheet of graph paper, a chest of drawers, a filing cabinet, a dictionary) or resembling the structures that are basic to the domain in which the users are immersed (hallways, buildings, wiring bundles...).

Keep this directive in mind, always:

The programmers must ``see'' their domain and the actions within it in the DSPL!

If the DSPL's users see notation and concepts that lie outside their domain, the users will get lost. (That's why non-programmers don't use Java as a DSPL for spreadsheet building!)


8.5 Example Top-down DSPLs


8.5.1 Document language

Perhaps we want a DSPL for document layout within web browsers, where some details are managed by the browser but others within the document are fixed. Sample scenarios:
  1. on-line newspaper page: is a collection of columns, which hold titles, text paragraphs, pictures and captions
  2. on-line photo album: lays out pictures, borders, and captions, in grid or linear layouts
  3. on-line notebook (e.g., Google or Twitter listings): sequence of titled paragraphs, linear layouts.
Extract from such scenarios the elements/data, the structures for collecting the elements, the operations on the elements, and any standard patterns for structuring or operating on the elements. Consider expressible, denotable, and storable values, then components. Here is what we might use for the document DSPL:

  1. elements: text (``long strings'') and images (jpg files)
  2. data structures: paragraphs, lists (bulleted, numbered, unlabelled), tables (grids), custom layouts
  3. operations: font and size on text; sizes, borders, and padding on structures
  4. patterns: standard layouts for titles, tables, and columns, perhaps parameterized by lengths, widths, paddings
  5. denotable values: the user can name customized layouts or environments that can be used in multiple places in the document
  6. storable values: is there a need for variables, to count maximum columns or layout? Should variables name text that is created or updated? Should time be a variable, say, for real-time updates of a document?
  7. component structures: should a document be organized with functions or procedures or modules (to make it easy to move around text)?

At this point, think about every text editor program you have ever used. Think about every GUI attached to the program. If you have used a document-layout language, like PS, or LaTeX, or Roff, or HTML, think about how its programs looked. Compare your experiences to the above lists. How do you get from the above wish list to the document program?

In the case of HTML, the primitive data are text, which is embedded within the program (document), and images, whose filenames are embedded. Both operations and data structures are stated with bracket pairs of the form <op ...> and </op>, where ... can hold arguments to the operation/structure. (Examples: <p> ... </p>, <font face=NAME> ... </font>, <ul> ... <li> ... </ul>, etc.) There are standard environments for laying out pages, and a user can formulate and name a custom environment.
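
The bracket-pair structure can be imitated with a few lines of Python. In this sketch the helper functions element and text are hypothetical names invented for illustration; only html.escape comes from the standard library. Each call to element produces one bracket pair, with keyword arguments playing the role of the arguments inside the opening bracket.

```python
from html import escape


def element(op, *children, **args):
    """Wrap children in a bracket pair <op ...> ... </op>."""
    attrs = "".join(f' {name}="{escape(str(value))}"' for name, value in args.items())
    return f"<{op}{attrs}>{''.join(children)}</{op}>"


def text(s):
    """Primitive data: plain text, escaped so it cannot be read as markup."""
    return escape(s)


# A data structure (list) holding two elements.
menu = element("ul", element("li", text("Fish & chips")), element("li", text("Tea")))

# An operation with one argument applied to a paragraph.
title = element("p", text("Hello"), align="center")
```

Nesting calls to element mirrors the nesting of structures in the document, which is the main design idea the bracket notation expresses.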

How should we implement the document language? Should it be an extension of an existing programming language? Should it be a new language? Should it be expressed within a GUI so that programs are typed with the mouse plus keystrokes or as a text file, typed solely with a keyboard?

Document languages tend to be static --- they don't need variables, identifiers, loops, and components as much as other languages, and their users tend to be less sophisticated at composition and planning ahead than software people, so a top-down, stand-alone language is sensible here. The language might be GUI-based (e.g., Word) or text based (e.g., PS or LaTeX or Roff) or both (HTML).


8.5.2 Parser language

BNF notation; LL(k). Notation for tree construction. Use my own conception. Note yacc, bison.

Note how Perl is a regular-expression language added bottom-up to a scripting language. Note how Python's mail/internet libraries work this way. Java's libraries are less successful.
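
A regular-expression language embedded in a host language can be seen in Python's standard re module, as in the sketch below. The log line and its format are made up for illustration; the pattern string is the ``program'' written in the little language, and Python only compiles and applies it.

```python
import re

# The pattern is a program in the regular-expression little language;
# the log-line format it matches is an assumption made for this example.
pattern = re.compile(
    r"(?P<room>\w+): (?P<event>movement|smoke) at (?P<hour>\d{2}):(?P<minute>\d{2})"
)

match = pattern.search("lobby: movement at 23:15")
fields = match.groupdict() if match else {}
```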


8.5.3 Gamer language

Add to favorite scripting language: library for multi-player games.


8.6 How to design a bottom-up DSL

Experienced programmers naturally become bottom-up DSPL designers: Over time, they will assemble a library of custom data structures, functions, and control structures (templates) that they use over and over again to solve problems in the same domain. Eventually, the programs they write consist mostly of the components of their library and less and less of new code, finally reaching the point where the underlying, host programming language acts merely as ``glue'' for connecting the components selected from the library.

At this point, the host language with its library is essentially a bottom-up DSPL, because the library has become ``more important'' to problem solving than the host language itself. What has happened is this:

The programmer has extended the host language ``upwards'' towards the problems to be solved.
This makes the host(glue)-language-plus-its-library a bottom-up DSPL.

So, if you work in the same problem domain on a regular basis, you will do well to consciously organize your work into a library that can evolve into a bottom-up DSPL. To plan ahead for this, you should review the list of items at the beginning of the chapter that define domain-specific software architecture. Then, for each item, ask yourself, ``How much of this affects what I have to program?'' and more specifically, ``What patterns of programming will I do for this?''

When you build systems in the domain, you will need data structures and control structures and component structures that match the parts of the software architecture. You want to have a good match between each concept, idea, or technique in the domain and a piece of code, so that you can convert somewhat mechanically from the concepts within a domain-specific software architecture to their software coding.

Take note of patterns of coding you do --- what repeats over and over in the coding of data structures, control structures, and component structures. Are the patterns important? Do they match the concepts in the domain-specific software architecture?

These exercises should give you strong suggestions about what you should include in the library you develop. Once you start your library, force yourself to use it as much as possible (instead of writing from scratch something similar) and improve it so that it can be used in the future. Your goal should be to do programming mostly by selecting code from your library and ``gluing'' it together with the underlying host language.

If you are having good success at developing and using your library, then you might consider how to make the library as ``stand-alone'' as possible, that is, writing program skeletons and saving them in the library, so that you write a new program by selecting the appropriate skeleton and inserting into it the data structures, operations, and control structures that you also have saved in the library. This means you use the underlying host language only as an ``interface language'' to contact external components that you have not written and you use the host language only as a ``trap door'' to write code that must ``escape'' from the problem domain area.

Implicit in the previous paragraph are the notions of framework and product line. A framework is a library that has one or more program skeletons that one starts with to build a system. The programmer selects the appropriate skeleton and fills in the gaps with a mix of other library code and code custom written in the host language. GUI libraries are almost always organized as frameworks.
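
A program skeleton with gaps might look like the following Python sketch. The classes AlarmApp and SmokeAlarmApp and the threshold 50 are hypothetical, chosen only to illustrate the idea: the skeleton fixes the control flow, and the programmer fills in the two marked gaps.

```python
class AlarmApp:
    """Hypothetical skeleton: run() fixes the control flow; subclasses fill gaps."""

    def run(self, readings):
        log = []
        for reading in readings:
            if self.is_alarming(reading):          # gap 1
                log.append(self.respond(reading))  # gap 2
        return log

    def is_alarming(self, reading):
        raise NotImplementedError

    def respond(self, reading):
        raise NotImplementedError


class SmokeAlarmApp(AlarmApp):
    """One system built by filling the skeleton's gaps."""

    def is_alarming(self, reading):
        return reading > 50  # assumed threshold, for illustration only

    def respond(self, reading):
        return f"sprinklers on (smoke level {reading})"


log = SmokeAlarmApp().run([10, 75, 30, 90])
```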

A product line is a family of programs that are structured almost exactly the same and differ only in minor customizations. (Consider a product line of cars all based on the same engine-chassis assembly. A software example is Notepad/Wordpad/Word, which are all based on the same structure but have different degrees of customizations for font choices, formatting, and file formats.) A product line of software is built from a library when one and the same skeleton is used for all software products, and the gaps in the skeleton are all filled by other library components.
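
The product-line idea can be sketched in Python as one skeleton whose gaps are filled entirely from a library. All names here (FORMATTERS, SAVERS, make_editor, and the two products) are hypothetical and serve only to illustrate the structure.

```python
# Hypothetical library components (the "gaps" are all filled from here).
FORMATTERS = {
    "plain": lambda s: s,
    "upper": lambda s: s.upper(),
}
SAVERS = {
    "txt": lambda s: s.encode("utf-8"),
    "utf16": lambda s: s.encode("utf-16-le"),
}


def make_editor(formatter, saver):
    """The single skeleton shared by every product in the line."""
    def save(text):
        return SAVERS[saver](FORMATTERS[formatter](text))
    return save


# Two products that differ only in the customizations chosen.
basic_editor = make_editor("plain", "txt")
shouting_editor = make_editor("upper", "utf16")
```

Adding a new product then means choosing a new combination of existing components, with no new code written in the host language.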


8.7 Further reading

When and how to develop domain-specific languages by M. Mernik, J. Heering, A.M. Sloane, CWI, Amsterdam, 2005.