Tuesday, July 23, 2013

What Declarative Languages Are

On his blog, Bob Harper asks what, if anything, a declarative language is. He notes that "declarative" is often used to mean "logic or functional programming", and is (justly) skeptical that this is a useful pairing.

However, there's actually a surprisingly simple and useful definition of declarative language: a declarative language is any language with a semantics has some nontrivial existential quantifiers in it. To illustrate this definition, let's begin by looking at some examples of declarative languages:

  • Regular expressions
  • Context-free grammars
  • Database query languages based on relational algebra (eg, SQL)
  • Logic programming languages (eg, Prolog)
  • Constraint-based languages for layout (eg, CSS)

The common factor is that all of these languages have a semantics which relies on some existential quantifiers which it is not immediately obvious how to discharge. For example, the semantics of regular expressions can be explained in terms of string membership, by giving a judgement $w \in r$ meaning that the string $w$ is an inhabitant of the regular expression $r$. $$ \begin{array}{c} \frac{} {\epsilon \in \cdot} \\[1em] \frac{} { c \in c} \\[1em] \frac{ w \in r_i \qquad i \in \{1,2\} } { w \in r_1 \vee r_2} \\[1em] \mbox{(no rule for $w \in \bot$)} \\[1em] \frac{\exists w_1, w_2.\; w = w_1 \cdot w_2 \qquad w_1 \in r_1 \qquad w_2 \in r_2 } {w \in r_1 \cdot r_2} \\[1em] \frac{\exists w_1, w_2.\; w = w_1 \cdot w_2 \qquad w_1 \in r \qquad w_2 \in r* } {w \in r*} \end{array} $$

In particular, note the appearance of an existential quantifier in the premises of the sequential composition and Kleene star cases, and note the nondeterministic choice of a branch in the alternation case. So read as a logic program, this semantics is not well-moded.

Of course, to implement a regular expression matcher, you need an operational semantics which is well-moded and functional, which we get by constructing a finite state machine from the regular expression. But now we have a proof obligation to show that the operational semantics agrees with the declarative semantics.

We can make a similar distinction for each of the examples above. Context-free languages also have a declarative membership relation, but are recognized with parsing algorithms such as CYK and Earley parsing. Query languages have a declarative semantics in terms of the relational algebra, where relational composition is hiding an existential, but are implemented in terms of nested loops and indices. Logic programming has a declarative semantics in terms of the model theory of first-order logic, and an implementation in terms of backtracking and unification. Constraint languages have a declarative semantics in terms of simultaneous satisfaction of a collection of equations, with the existentials lurking in the values assigned to the free variables, but are implemented in terms of simplification and propagation through imperative graphs.

This also lets us make the prediction that the least-loved features of any declarative language will be the ones that expose the operational model, and break the declarative semantics. So we can predict that people will dislike (a) backreferences in regular expressions, (b) ordered choice in grammars, (c) row IDs in query languages, (d) cut in Prolog, (e) constraint priorities in constraint languages. And lo and behold, these are indeed the features which programmers are encouraged to avoid as much as possible!

This definition also lets say that functional programming is not a declarative language -- the reduction relation of the lambda calculus is well-moded, in that we do not need to guess any part of a term to figure out how to reduce it. (And if you fix an evaluation order, the reduction relation is even deterministic.) The same is true for imperative languages like ML or Haskell: now we're just additionally threading a store through the reduction relation.


  1. Interesting -- so you don't consider a well-moded logic program to be declarative? I think this is a pretty different sense from how the parallelism folks Bob mentioned were using it, and from how I'd use it, but maybe it's a meaningful distinction anyhow.

  2. Hi Chris, you're reading me right! If every parameter of the declaration has a unique mode, then I don't really think it's declarative, since you can turn it into a functional program without doing anything very interesting. For example, when I prove that a type system is "algorithmic", essentially I see my job as proving that the rules can be read as a logic program with (a) some decreasing metric, and that (b) each argument has a unique input or output mode.

    An ambiguous case is when a logic program has multiple valid modes (eg, list concatenation in Prolog). Then I'm not sure whether I'd call it declarative or not. I'm also not sure about things like difference lists, which are operationally very deterministic, but which simple mode systems can't handle.

    But I think this definition *does* cover what parallelism people are thinking of. In that space, your cost semantics is affected by a whole bunch of issues like scheduling, assigning work to different processors, caching, memory traffic, and so on. BUT you don't really want to expose these choices to programmers, since the machines the program will run on will change so much more frequently than the program will. So you want to say that the distribution of work will be chosen by the compiler/runtime, and that's the nontrivial instantiation of existential quantifiers I'm thinking of.

    In the end, though, I don't think it's really a big deal -- I can't see "declarative" being a definition you'd use in a theorem, and so fundamentally the choice is a matter of taste. I just think there actually is a coherent sense in which people use it.

  3. This is a very natural, syntactic definition! Thank you! A mundane remark: you're missing a start rule concerning the Kleene star I think. On a more interesting level, I'm not sure I follow you about functional programming: imagine the pure lambda-calculus with beta and no fixed reduction strategy ie. a general context rule. That context rule *does* have a hidden existential (the context C[] in which to apply beta), so we *do* need to guess a part of the rule (ok, not a term but a term with a hole). So following your definition, it is a declarative language, only a "confluent" one. What do you think?

  4. This is analagous to the difference between "logic program with a defined mode" and "logic program with potentially many modes, or no valid moding," I think.

    When I write an Standard ML program there is a totally defined operational semantics. (Same for Haskell.) The syntax could be divorced from that evaluation order, leading to a declarative reading. But in point of fact that is incompatible with the way people use sequential functional programming languages (Standard ML or Haskell, at least), but it *is* common to divorce the text of a logic program from its mode or modes. This is the matter of taste bit.

  5. This condition doesn't seem very rigorous, in a way that leads to the answers we expect. What about an operational semantics for C that says for a call to function "f": "there exists an entry in the list of program functions such that the 'name' field of the entry is 'f'"? Sure, we can make this lookup process manifestly algorithmic, but we can do the same to implement regular expressions.

  6. Pardon my ignorance.. I would love to know.. where can I find an explanation of the symbols used above in the definition of regular expressions ? Thank you!