September 6, 2018

Functional DSLs in two languages

The objective of a domain-specific language (DSL) is to improve our productivity when programming for a certain domain. A DSL is likely more than just a set of function calls. A good DSL is not only easy to use, but also simple enough that it can be debugged and extended. Last but not least, it should compose and coexist with the rest of the code base.

I wanted to generate some Java code and came up with a DSL for doing that. I make no claim that the approach is original, but I would nevertheless like to document it because I think it is reusable for rendering many different kinds of data, such as

  • Text (e.g. source code as in my case)
  • Graphics
  • Audio

The approach is based on function composition and is functional in nature, so I will first implement it in Clojure and then show an implementation in C++.

All source code can be found in this GitHub repository.

Sample problem

We would like to implement a DSL for generating Java source code. For instance, we would like this code

(named-class
 "Kattskit"
 (variable "double" "sum")
 (variable "double" "sumSquares")
 (variable "int" "count")
 (private
  (variable "boolean" "_isDirty"))
 (static
  (variable "int" "INSTANCE_COUNTER")))

to generate this Java source code:

public class Kattskit {
  public double sum ;
  public double sumSquares ;
  public int count ;
  private boolean _isDirty ;
  public static int INSTANCE_COUNTER ;
}

We are going to implement the DSL such that every subexpression evaluates to a function that helps build up the output. As an example, (variable "double" "sum") should evaluate to a function that appends a line declaring double sum to the output, prefixed with public or private depending on the visibility, and with static if the context says so.

We will need two types of data that will flow through the functions: an accumulator will hold the result that the DSL is producing, and a context will hold any contextual information, for instance whether the code should be generated as static.

So in our case, we have the initial context being

(def java-src-context {:visibility :public
                       :static? false
                       :new-line-prefix "\n"})

containing contextual information about the code we are generating. The accumulator, that is the Java source code, will just be an empty string to start with:

(def java-src-accumulator "")

We will now define a helper function that can be reused for any DSL of this kind:

(defn run-dsl [context accumulator body]
  (cond
    (nil? body) accumulator
    (fn? body) (body context accumulator)
    (sequential? body) (reduce (partial run-dsl context) accumulator body)
    :else (throw (ex-info "Invalid body" {:body body}))))

The first two arguments to this function are the context and the accumulator. The body parameter is DSL code that we would like to evaluate on the accumulator to produce a new accumulator, which is returned. Here, the body can be (i) nil, which leaves the accumulator unchanged, (ii) a function context x accumulator -> accumulator, or (iii) a sequential collection of bodies, such as a vector or a list, which are evaluated in order. Any other value is treated as invalid.

Coming back to our Java source code example, let's first define a function to generate the appropriate keyword for the visibility given a context:

(defn visibility-str [ctx]
  (-> ctx :visibility name))

For instance, we have

fdsls.core> (visibility-str (merge java-src-context {:visibility :public}))
"public"

And let's also write a function to generate the word static from a context:

(defn static-str [ctx]
  (if (:static? ctx)
    "static" ""))

These two functions, static-str and visibility-str, will be used later. To generate code, we will need to append things to the accumulator. Here is a function for doing that:

(defn output-line [new? & parts]
  (fn [context accumulator]
    (str accumulator
         (if new?
           (:new-line-prefix context)
           "")
         (clojure.string/join " " (filter (complement empty?) parts)))))

This function can be used as part of our DSL. Its first argument is a boolean indicating whether it should start a new line first. The remaining arguments, collected in parts, are the strings to output; empty strings are skipped and the rest are joined with spaces. If we just evaluate this function with some arguments, we get a new function:

fdsls.core> (output-line true "Mjao")
#function[fdsls.core/output-line/fn--6695]

This function can now be used as a body to the run-dsl function that we wrote before:

fdsls.core> (run-dsl java-src-context "" (output-line true "Mjao"))
"\nMjao"

The output we get is the new accumulator, that is, the generated text. The output-line function will be a fundamental building block. For source code to be easy to read, it needs to be properly indented, so we will write a function for that:

(defn indent-more [& body]
  (fn [context accumulator]
    (run-dsl (update context :new-line-prefix #(str % "  "))
             accumulator body)))

To see its effect, we can wrap it around a call to output-line and evaluate the DSL:

fdsls.core> (run-dsl java-src-context "" (indent-more (output-line true "Mjao")))
"\n  Mjao"

We get the same result as before except that it is indented. So simple, and so composable!

We will add more functions. Here is a function that wraps a block of code in braces and indents the code inside. It is composed of functions that we previously wrote:

(defn block [& body]
  [(output-line false " {")
   (indent-more body)
   (output-line true "}")])

and we add a few more functions:

(defn named-class [name & body]
  (fn [context accumulator]
    (run-dsl
     context
     accumulator
     [(output-line false (visibility-str context) "class" name)
      (block
       body)])))

(defn static [& body]
  (fn [context accumulator]
    (run-dsl (assoc context :static? true)
             accumulator
             body)))

(defn private [& body]
  (fn [context accumulator]
    (run-dsl (assoc context :visibility :private)
             accumulator
             body)))

(defn variable [type name]
  (fn [context accumulator]
    (run-dsl context
             accumulator
             (output-line
              true
              (visibility-str context)
              (static-str context)
              type name ";"))))

So in order to generate the Java code, we can just call

(run-dsl java-src-context
         java-src-accumulator
         (named-class
          "Kattskit"
          (variable "double" "sum")
          (variable "double" "sumSquares")
          (variable "int" "count")
          (private
           (variable "boolean" "_isDirty"))
          (static
           (variable "int" "INSTANCE_COUNTER"))))

Because this DSL is not based on syntax but only on composition of functions as values, we can easily break it up and reuse parts as we please, e.g.

(def public-vars
  [(variable "double" "sum")
   (variable "double" "sumSquares")
   (variable "int" "count")])

(def the-class (named-class
                "Kattskit"
                public-vars
                (private
                 (variable "boolean" "_isDirty"))
                (static
                 (variable "int" "INSTANCE_COUNTER"))))

(run-dsl java-src-context
         java-src-accumulator
         the-class)

In either case, we get the output we expected:

public class Kattskit {
  public double sum ;
  public double sumSquares ;
  public int count ;
  private boolean _isDirty ;
  public static int INSTANCE_COUNTER ;
}

C++ implementation

In order to show that this pattern can work in other languages too, here follows a C++ implementation. We will implement it using templates, with one template parameter Context for the type of the context, and one template parameter Accumulator for the type of the accumulator.

First some quite generic code that we will use:

#include <functional>
#include <iostream>
#include <string>
#include <vector>

template <typename Context, typename Accumulator>
using DslFun = std::function<Accumulator(Context, Accumulator)>;

template <typename Context, typename Accumulator>
using DslBody = std::vector<DslFun<Context, Accumulator>>;

template <typename Context, typename Accumulator>
Accumulator runDsl(const Context& c, const Accumulator& acc0,
  const DslBody<Context, Accumulator>& fs) {
  Accumulator acc = acc0;
  for (const auto& f: fs) {
    acc = f(c, acc);
  }
  return acc;
}

template <typename Context, typename Accumulator>
DslFun<Context, Accumulator> group(
  const DslBody<Context, Accumulator>& body) {
  return [=](const Context& c, const Accumulator& a) {
    return runDsl(c, a, body);
  };
}

The function runDsl does what the run-dsl Clojure function does: given a context and an initial accumulator, it executes the DSL on that accumulator and returns the resulting accumulator. Unlike run-dsl, it only accepts a flat vector of functions, so the function group is a small helper that converts a vector of DSL functions into one DSL function, which makes nesting possible.
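To illustrate that runDsl and group do not depend on Java code generation, here is a minimal sketch with a toy DSL where the context is a scale factor and the accumulator is a running sum. The names addScaled and sumWithScale are hypothetical and exist only for this illustration; the generic definitions are restated so that the sketch compiles on its own.

```cpp
#include <functional>
#include <vector>

// Generic definitions, restated from above.
template <typename Context, typename Accumulator>
using DslFun = std::function<Accumulator(Context, Accumulator)>;

template <typename Context, typename Accumulator>
using DslBody = std::vector<DslFun<Context, Accumulator>>;

template <typename Context, typename Accumulator>
Accumulator runDsl(const Context& c, const Accumulator& acc0,
                   DslBody<Context, Accumulator> fs) {
  Accumulator acc = acc0;
  for (const auto& f : fs) {
    acc = f(c, acc);
  }
  return acc;
}

template <typename Context, typename Accumulator>
DslFun<Context, Accumulator> group(
    const DslBody<Context, Accumulator>& body) {
  return [=](const Context& c, const Accumulator& a) {
    return runDsl(c, a, body);
  };
}

// Hypothetical toy DSL: the context is a scale factor, the accumulator a sum.
using SumFun = DslFun<int, int>;

// Adds n, multiplied by the scale stored in the context, to the accumulator.
SumFun addScaled(int n) {
  return [=](int scale, int acc) { return acc + n * scale; };
}

// Runs a fixed body with the given scale, starting from zero.
int sumWithScale(int scale) {
  DslBody<int, int> inner = {addScaled(1), addScaled(2)};
  // group turns the two-element body into a single DSL function, so it can
  // sit next to addScaled(3) in an outer body.
  DslBody<int, int> outer = {group(inner), addScaled(3)};
  return runDsl(scale, 0, outer);
}
```

Calling sumWithScale(2) threads the accumulator through all three functions and yields (1 + 2 + 3) * 2 = 12. The context is passed unchanged to every function in the body, while the accumulator is replaced at each step.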

For the specific case of generating Java source code, we introduce a struct JavaSrcContext that will be the Context parameter in the above functions:

struct JavaSrcContext {
  enum Visibility {
    Public, Private
  };
  
  Visibility visibility = Public;
  bool isStatic = false;
  std::string newLinePrefix = "\n";

  std::string staticStr() const {
    return isStatic? "static" : "";
  }

  std::string visibilityStr() const {
    switch (visibility) {
    case Public: return "public";
    case Private: return "private";
    }
    return "";  // Not reached for valid enum values; avoids falling off the end.
  }
};

and introduce a few extra typedefs for convenience:

typedef DslFun<JavaSrcContext, std::string> JavaDslFun;
typedef std::vector<JavaDslFun> JavaBody;

Here is a quite direct port of all the code used to generate Java code:

JavaDslFun outputLine(bool newLine, std::vector<std::string> parts) {
  return JavaDslFun([=](JavaSrcContext c, std::string acc) {
      if (newLine) {
        acc += c.newLinePrefix;
      }
      bool rest = false;
      for (const auto& p : parts) {
        if (!p.empty()) {
          if (rest) {
            acc += " ";
          }
          acc += p;
          rest = true;
        }
      }
      return acc;
  });
}

JavaDslFun indentMore(JavaBody body) {
  return [=](JavaSrcContext c, std::string acc) {
    c.newLinePrefix += "  ";
    return runDsl(c, acc, body);
  };
}

JavaDslFun block(JavaBody body) {
  return group<JavaSrcContext, std::string>({
    outputLine(false, {" {"}),
    indentMore({group(body)}),
    outputLine(true, {"}"})
  });
}

JavaDslFun namedClass(const std::string& name, JavaBody body) {
  return [=](const JavaSrcContext& c, const std::string& acc) {
    return runDsl(c, acc, {
        outputLine(false, {c.visibilityStr(), "class", name}),
        block(body)
    });
  };
}

JavaDslFun Static(const JavaBody& body) {
  return [=](JavaSrcContext c, const std::string& acc) {
    c.isStatic = true;
    return runDsl(c, acc, body);
  };
}

JavaDslFun Private(const JavaBody& body) {
  return [=](JavaSrcContext c, const std::string& acc) {
    c.visibility = JavaSrcContext::Private;
    return runDsl(c, acc, body);
  };
}

JavaDslFun Variable(
  const std::string& type, const std::string& name) {
  return [=](const JavaSrcContext& c, const std::string& acc) {
    return runDsl(c, acc, {
        outputLine(true, {
            c.visibilityStr(), c.staticStr(), type, name, ";"})
      });
  };
}

And finally, we can use the DSL like this:

  auto code = namedClass("Kattskit", {
      Variable("double", "sum"),
      Variable("double", "sumSquares"),
      Variable("int", "count"),
      Private({
        Variable("boolean", "_isDirty")
      }),
      Static({
        Variable("int", "INSTANCE_COUNTER")
      })
  });

  std::cout << "The source code is\n" 
            << runDsl(JavaSrcContext(), 
                 std::string(""), 
                 {code}) 
            << std::endl;
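The Clojure section showed that parts of a DSL expression can be stored in variables and reused. The same should hold in C++, since a JavaBody is an ordinary value. The following is a sketch under that assumption: it restates a condensed subset of the definitions above (leaving out namedClass, block and indentMore, and using non-template versions of runDsl and group) and renders one body under two different contexts. The function renderStatsTwice is a hypothetical name used only for this example.

```cpp
#include <functional>
#include <string>
#include <vector>

// Condensed restatement of the definitions above so that the sketch compiles
// on its own. namedClass, block and indentMore are left out for brevity.
struct JavaSrcContext {
  enum Visibility { Public, Private };
  Visibility visibility = Public;
  bool isStatic = false;
  std::string newLinePrefix = "\n";
  std::string staticStr() const { return isStatic ? "static" : ""; }
  std::string visibilityStr() const {
    return visibility == Public ? "public" : "private";
  }
};

using JavaDslFun = std::function<std::string(JavaSrcContext, std::string)>;
using JavaBody = std::vector<JavaDslFun>;

std::string runDsl(const JavaSrcContext& c, std::string acc,
                   const JavaBody& fs) {
  for (const auto& f : fs) acc = f(c, acc);
  return acc;
}

JavaDslFun group(JavaBody body) {
  return [=](JavaSrcContext c, std::string acc) {
    return runDsl(c, acc, body);
  };
}

JavaDslFun outputLine(bool newLine, std::vector<std::string> parts) {
  return [=](JavaSrcContext c, std::string acc) {
    if (newLine) acc += c.newLinePrefix;
    bool rest = false;
    for (const auto& p : parts) {
      if (p.empty()) continue;
      if (rest) acc += " ";
      acc += p;
      rest = true;
    }
    return acc;
  };
}

JavaDslFun Private(JavaBody body) {
  return [=](JavaSrcContext c, std::string acc) {
    c.visibility = JavaSrcContext::Private;
    return runDsl(c, acc, body);
  };
}

JavaDslFun Variable(std::string type, std::string name) {
  return [=](JavaSrcContext c, std::string acc) {
    return runDsl(c, acc, {outputLine(true,
        {c.visibilityStr(), c.staticStr(), type, name, ";"})});
  };
}

// Hypothetical example: one body value, rendered under two contexts.
std::string renderStatsTwice() {
  JavaBody stats = {Variable("double", "sum"), Variable("int", "count")};
  return runDsl(JavaSrcContext(), "", {
      group(stats),    // rendered with the default (public) context
      Private(stats)   // the same value, now rendered as private
  });
}
```

Because a Variable only decides how to render itself once it receives a context, the same stats value produces public declarations in one place and private declarations in the other. Note that group is needed where a body is placed inside another body, whereas Private accepts the body directly.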

Conclusion

We looked at a technique for implementing DSLs in two languages. The DSL is expressed using nested function calls, so no extra syntax is needed. Here we demonstrated how it can be used to generate source code, but we can imagine other applications too, such as generating graphics, build scripts, or perhaps SQL queries. There is probably room for improvements that would make the DSL code even more concise.
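As a rough illustration of how the pattern might carry over to graphics, here is a sketch in which the context is the current translation and the accumulator is a list of absolute points. Everything in it (Offset, point, translate, runGfx, sampleDrawing) is hypothetical and not part of the code in the repository.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical graphics DSL: the context is the current translation and the
// accumulator is the list of points emitted so far, in absolute coordinates.
struct Offset {
  double x = 0;
  double y = 0;
};
using Point = std::pair<double, double>;
using Points = std::vector<Point>;
using GfxFun = std::function<Points(Offset, Points)>;
using GfxBody = std::vector<GfxFun>;

// Same role as runDsl: thread the accumulator through every function.
Points runGfx(const Offset& c, Points acc, const GfxBody& body) {
  for (const auto& f : body) acc = f(c, acc);
  return acc;
}

// Emits one point, shifted by the translation in the context.
GfxFun point(double x, double y) {
  return [=](Offset c, Points acc) {
    acc.push_back({c.x + x, c.y + y});
    return acc;
  };
}

// Evaluates body with a shifted context, like indentMore does for text.
GfxFun translate(double dx, double dy, GfxBody body) {
  return [=](Offset c, Points acc) {
    c.x += dx;
    c.y += dy;
    return runGfx(c, acc, body);
  };
}

Points sampleDrawing() {
  return runGfx(Offset(), {}, {
      point(0, 0),
      translate(10, 0, {point(0, 0), translate(0, 5, {point(1, 1)})}),
      point(2, 2)});
}
```

The nested translate calls accumulate, so the innermost point ends up at (11, 6), while the final point(2, 2) is unaffected because the modified context only reaches the body of translate.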

As mentioned, the technique is based on function calls and program code. But code is data, so with some extra tweaks, code written in this DSL could probably also be stored or transferred as data.
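One possible way to get there in C++, sketched here under my own assumptions, is to describe a DSL expression as a plain tree of nodes and to turn that tree into DSL functions in a separate step. The Node struct, the op names "line" and "indent", and the functions compile and renderSample are all hypothetical; the tree itself contains no functions, so it could in principle be serialized.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Context: the current new-line prefix. Accumulator: the generated text.
using TextFun = std::function<std::string(std::string, std::string)>;

// Hypothetical data representation of a DSL expression. It holds only
// strings and child nodes, no functions.
struct Node {
  std::string op;              // "line" or "indent"
  std::string text;            // used by "line"
  std::vector<Node> children;  // used by "indent"
};

// Turns a data node into a DSL function.
TextFun compile(const Node& n) {
  if (n.op == "line") {
    std::string text = n.text;
    return [=](std::string prefix, std::string acc) {
      return acc + prefix + text;
    };
  }
  if (n.op == "indent") {
    std::vector<TextFun> body;
    for (const auto& child : n.children) body.push_back(compile(child));
    return [=](std::string prefix, std::string acc) {
      // Children see a prefix that is two spaces longer.
      for (const auto& f : body) acc = f(prefix + "  ", acc);
      return acc;
    };
  }
  throw std::invalid_argument("unknown op: " + n.op);
}

std::string renderSample() {
  Node tree{"indent", "", {
      Node{"line", "a", {}},
      Node{"indent", "", {Node{"line", "b", {}}}}}};
  return compile(tree)("\n", "");
}
```

Here renderSample produces the line a indented by two spaces followed by the line b indented by four. The design choice is to keep the function-based DSL as the execution model and to treat the data form as a front end for it.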

Tags: clojure c++ programming