Tuesday, 19 November 2013

How Clojure works: a simple namespace

Have you ever wondered how Clojure works at runtime? Perhaps you've wondered how Clojure hot swaps code at the REPL or why Clojure seems to start up so slowly. Well, here is a chance to get to know a little more about how Clojure works under the covers. We'll start by examining how a namespace bootstraps itself.

A minimal example

Let's start with a minimal example and compile the following namespace:

(ns greeter.hello)

This should generate hardly any code, right? It does nearly nothing, after all. Well, if we look at the compiler output, we find three different classes have been generated:

  1. greeter.hello__init
  2. greeter.hello$loading__4910__auto__
  3. greeter.hello$fn__17

That seems like a lot. What's going on here? Let's take a closer look.

greeter.hello__init

greeter.hello__init is the class that bootstraps the greeter.hello namespace. Every time you try to load a namespace, Clojure looks for an AOT-compiled class whose name matches the namespace with an __init suffix. By loading this class, the namespace itself is loaded. So, what does such a class look like? Using the javap disassembler, we come up with the following:

package greeter;
import clojure.lang.*;

public class hello__init {
  public static {};

  public static final Var const__0;
  public static final AFn const__1;
  public static final AFn const__2;

  public static void load();
  public static void __init0();
}

What are these constants? What are all of these methods? What's that weird nameless method?

The nameless method is a static initialiser, which contains code that will run when the class itself is loaded. Clojure relies on this static initialiser (the public static {}; entry, line 5 of the listing above) to do the actual work of bootstrapping the namespace, so let's disassemble the class and take a closer look at what is going on. This will also help us find out what these constants and static methods do.
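If you want to convince yourself that a static initialiser really runs when a class is first loaded, and only then, here is a minimal sketch in plain Java. The names StaticInitSketch and FakeInit are made up for illustration and have nothing to do with the classes Clojure generates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; a sketch of when a static initialiser runs.
class StaticInitSketch {
    // Records events in the order they happen.
    static final List<String> EVENTS = new ArrayList<>();

    // Stand-in for a generated __init class.
    static class FakeInit {
        static {
            EVENTS.add("initialiser");
        }
    }

    static List<String> run() {
        EVENTS.add("before");
        try {
            // A class literal alone does not initialise the class;
            // Class.forName(String) does.
            String name = FakeInit.class.getName();
            Class.forName(name);
            // A second lookup finds the class already initialised,
            // so the static block does not run again.
            Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        EVENTS.add("after");
        return EVENTS;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The "initialiser" event shows up exactly once, between "before" and "after", no matter how often the class is looked up afterwards.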

The static initialiser

The javap tool will output the disassembled code for the initialiser, and it looks something like this:

public static {};
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: invokestatic  #75                 // Method __init0:()V
         3: ldc           #77                 // String greeter.hello__init
         5: invokestatic  #83                 // Method java/lang/Class.forName:(Ljava/lang/String;)Ljava/lang/Class;
         8: invokevirtual #87                 // Method java/lang/Class.getClassLoader:()Ljava/lang/ClassLoader;
        11: invokestatic  #93                 // Method clojure/lang/Compiler.pushNSandLoader:(Ljava/lang/ClassLoader;)V
        14: invokestatic  #95                 // Method load:()V
        17: invokestatic  #98                 // Method clojure/lang/Var.popThreadBindings:()V
        20: goto          27
        23: invokestatic  #98                 // Method clojure/lang/Var.popThreadBindings:()V
        26: athrow
        27: return
      Exception table:
         from    to  target type
            14    17    23   any

This is pretty cryptic, and from now on I'll skip the bytecode and just present Java code that reflects what's going on, such as:

static {
  __init0();
  ClassLoader loader = Class.forName("greeter.hello__init").getClassLoader();
  clojure.lang.Compiler.pushNSandLoader(loader);
  try {
    load();
  } finally {
    clojure.lang.Var.popThreadBindings();
  }
}

We'll see what the __init0 and load() methods do below. The Compiler.pushNSandLoader() and Var.popThreadBindings() calls create a binding context, roughly equivalent to the following Clojure code:

(binding [clojure.core/*ns*         nil
          clojure.core/*fn-loader*  loader
          clojure.core/*read-eval*  true]
  (greeter.hello__init/load))

At this point, this is not too interesting. We just see that:

  • *ns* has been nulled out,
  • The namespace's class loader is available via *fn-loader*, and
  • *read-eval* has been set to true.
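The push/pop pairing around load() is easier to picture with a toy model. What follows is only a sketch of a per-thread binding stack in plain Java; the class BindingStackSketch and its string-keyed frames are my own simplification and not how clojure.lang.Var is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-thread binding stack; not Clojure's Var.
class BindingStackSketch {
    // Each thread gets its own stack of binding frames.
    private static final ThreadLocal<Deque<Map<String, Object>>> FRAMES =
        ThreadLocal.withInitial(ArrayDeque::new);

    static void pushThreadBindings(Map<String, Object> bindings) {
        // A new frame inherits the bindings of the frame below it.
        Map<String, Object> frame = new HashMap<>();
        Map<String, Object> top = FRAMES.get().peek();
        if (top != null) {
            frame.putAll(top);
        }
        frame.putAll(bindings);
        FRAMES.get().push(frame);
    }

    static void popThreadBindings() {
        FRAMES.get().pop();
    }

    // Returns the innermost binding, or null when nothing is bound.
    static Object lookup(String name) {
        Map<String, Object> top = FRAMES.get().peek();
        return top == null ? null : top.get(name);
    }

    static String demo() {
        StringBuilder seen = new StringBuilder();
        Map<String, Object> bindings = new HashMap<>();
        bindings.put("*ns*", null);
        bindings.put("*read-eval*", Boolean.TRUE);
        pushThreadBindings(bindings);
        try {
            // Inside the binding context the value is visible.
            seen.append(lookup("*read-eval*"));
        } finally {
            // The frame is popped even if the body throws.
            popThreadBindings();
        }
        // Outside the context the binding is gone again.
        seen.append(',').append(lookup("*read-eval*"));
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The try/finally shape is the same one we saw in the static initialiser: whatever happens inside the body, the frame that was pushed gets popped.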

Let's see what __init0 does next.

__init0()

You probably noticed that greeter.hello__init has a few class constants. These are values that the Clojure compiler has identified as being used within the class and only needing to be looked up once. In a very complex namespace, such as clojure.core, there can be thousands of these constants[1]. Once it has collected all of these constants, the compiler generates one __initN method for every 100 constants.
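The batching arithmetic is simple enough to sketch. The helper below is hypothetical (it is not taken from the compiler); the batch size of 100 comes from the paragraph above, and my guess is that the batching exists to keep each method under the JVM's limit on method size.

```java
// Hypothetical helper; the batch size of 100 is taken from the text above.
class ConstantBatchSketch {
    static final int CONSTANTS_PER_INIT = 100;

    // Number of __initN methods needed for the given number of constants
    // (integer ceiling of constants / 100).
    static int initMethodCount(int constants) {
        return (constants + CONSTANTS_PER_INIT - 1) / CONSTANTS_PER_INIT;
    }

    // Index N of the __initN method that would assign const__<constantIndex>.
    static int initMethodFor(int constantIndex) {
        return constantIndex / CONSTANTS_PER_INIT;
    }

    public static void main(String[] args) {
        System.out.println(initMethodCount(3));
        System.out.println(initMethodCount(2342));
        System.out.println(initMethodFor(150));
    }
}
```

With three constants this gives a single method, and with the 2342 constants quoted in the first footnote it gives 24, which matches the count reported there.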

For our namespace, we only have three constants and the one __init0 method, and the code looks something like this:

static void __init0() {
  const__0 = (Var)RT.var("clojure.core", "in-ns");
  const__1 = (AFn)Symbol.intern(null, "greeter.hello");
  const__2 = (AFn)Symbol.intern(null, "clojure.core");
}

So, we need the var clojure.core/in-ns and two symbols[2]. Why these in particular? Well, keep in mind that ns is actually a macro, and expand it once:

(do
  (clojure.core/in-ns (quote greeter.hello))
  (clojure.core/with-loading-context (clojure.core/refer (quote clojure.core)))
  (if (.equals (quote greeter.hello) (quote clojure.core))
    nil
    (do
      (clojure.core/dosync (clojure.core/commute (clojure.core/deref (var clojure.core/*loaded-libs*))
                                                 clojure.core/conj
                                                 (quote greeter.hello)))
      nil)))

That helps clarify things somewhat, but why are only in-ns, greeter.hello, and clojure.core saved as constants and not the rest? Well, taking a look at the full macro-expansion tells us why:

(do
  (clojure.core/in-ns (quote greeter.hello))
  ((fn* loading__4910__auto__
        ([]
         (. clojure.lang.Var (clojure.core/pushThreadBindings {clojure.lang.Compiler/LOADER (. (. loading__4910__auto__ getClass) getClassLoader)}))
         (try
           (clojure.core/refer (quote clojure.core))
           (finally
             (. clojure.lang.Var (clojure.core/popThreadBindings)))))))
  (if (. (quote greeter.hello) equals (quote clojure.core))
    nil
    (do
      (. clojure.lang.LockingTransaction (clojure.core/runInTransaction (fn*
                                                                          ([]
                                                                           (clojure.core/commute (clojure.core/deref (var clojure.core/*loaded-libs*))
                                                                                                 clojure.core/conj
                                                                                                 (quote greeter.hello))))))
      nil)))

Wow. That's ugly, but we're not here to look at beautiful Clojure code; we're here to see how beautiful Clojure code really works. We see that with-loading-context and dosync do some of their work using anonymous functions. This explains the origin of the mysterious greeter.hello$fn__17 and greeter.hello$loading__4910__auto__ classes, and why greeter.hello__init has relatively few constants: each function class carries its own.
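The JVM only knows about classes and methods, so every function body has to live in a class of its own. Java's anonymous classes get the same treatment from javac, which gives a rough feel for where names with a $ in them come from. This is only an analogy, with a made-up class name, and not a description of how the Clojure compiler picks its names.

```java
// A sketch in plain Java: an anonymous body still gets a class of its own.
class AnonymousClassSketch {
    static String anonymousClassName() {
        Runnable body = new Runnable() {
            @Override
            public void run() {
                // Nothing to do; we only care about the class holding this method.
            }
        };
        // The compiler-generated name is the enclosing class name,
        // a '$', and a number.
        return body.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(anonymousClassName());
    }
}
```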

Well, that's it for __init0, and there's just one method left to look at in greeter.hello__init, load().

load()

Having seen the macro-expansion of (ns greeter.hello), we can get a pretty good idea of what to expect in load(). Let's take a look:

public static void load() {
  // (in-ns 'greeter.hello)
  IFn inNs = (IFn)const__0.getRawRoot();
  inNs.invoke(const__1);

  // (with-loading-context (refer 'clojure.core))
  IFn loading4910auto = (IFn)new greeter.hello$loading__4910__auto__();
  loading4910auto.invoke();

  // (if (.equals 'greeter.hello 'clojure.core)
  //   nil
  //   (do
  //     (LockingTransaction/runInTransaction (fn* …))
  //     nil))
  Symbol greeterHello = (Symbol)const__1;
  if (!greeterHello.equals(const__2)) {
    Callable callable = (Callable)new greeter.hello$fn__17();
    LockingTransaction.runInTransaction(callable);
  }
  // Both branches evaluate to nil, which is discarded since load() returns void.
}

Unsurprisingly, the first thing that happens is that (in-ns 'greeter.hello) is run. This will create the greeter.hello namespace and set *ns* to point to it. The next couple of lines perform (refer 'clojure.core), but do so in an anonymous function. Finally, the last part of the function checks to see if greeter.hello is the same as clojure.core. If not, it instantiates the second anonymous function, which will ensure that Clojure knows that the namespace has been loaded.

And that's all the greeter.hello__init class does, so now let's take a closer look at those anonymous functions.

greeter.hello$loading__4910__auto__

Remember that greeter.hello$loading__4910__auto__ is the expansion of (with-loading-context (refer 'clojure.core)), which is essentially:

  (fn* loading__4910__auto__
   ([]
    (clojure.lang.Var/pushThreadBindings {clojure.lang.Compiler/LOADER (.getClassLoader (.getClass loading__4910__auto__))})
    (try
     (refer 'clojure.core)
     (finally
      (clojure.lang.Var/popThreadBindings)))))

So, what does the class signature look like? Let's see:

package greeter;
import clojure.lang.*;

public final class hello$loading__4910__auto__ extends AFunction {
  public static final Var const__0;
  public static final AFn const__1;

  public static {};
  public hello$loading__4910__auto__();

  public Object invoke();
}

Right off the bat, we see some similarities to our namespace class, namely the class constants and the static initialiser. Let's check out the decompiled results.

The static initialiser

As expected, the static initialiser does bear a great resemblance to the __init0 method we saw above:

static {
  const__0 = (Var)RT.var("clojure.core", "refer");
  const__1 = (AFn)Symbol.intern(null, "clojure.core");
}

We can surmise that this is a common technique used by the Clojure compiler. The big difference between function classes and namespace classes appears to be that function classes instantiate their constants right in the static initialiser instead of delegating to a helper method.

All of this takes place once, when the class is loaded. However, each time the function is used, a new object will have to be constructed.
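The difference between the two costs can be shown with a couple of counters. This is a sketch with made-up names (LoadOnceSketch, FnClass) standing in for a compiled function class.

```java
// Hypothetical sketch: constants are set up once per class,
// while the constructor runs once per instance.
class LoadOnceSketch {
    static int staticInitRuns = 0;   // incremented when FnClass is initialised
    static int constructorRuns = 0;  // incremented for every new instance

    // Stand-in for a compiled function class.
    static class FnClass {
        static final String CONSTANT;

        static {
            CONSTANT = "looked up once";
            staticInitRuns++;
        }

        FnClass() {
            constructorRuns++;
        }

        Object invoke() {
            return CONSTANT;
        }
    }

    // Creates and invokes the function object the given number of times.
    static int[] run(int calls) {
        for (int i = 0; i < calls; i++) {
            new FnClass().invoke();
        }
        return new int[] {staticInitRuns, constructorRuns};
    }

    public static void main(String[] args) {
        int[] counts = run(3);
        System.out.println(counts[0] + " static init, " + counts[1] + " constructions");
    }
}
```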

Constructor

It turns out that the constructor is fairly trivial:

public hello$loading__4910__auto__() {
  super();
}

This most likely would not have been the case had the function been a closure. In such a case, the closed-over values would have been arguments to the constructor and there would have been corresponding fields.
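To make that concrete, here is roughly what I would expect a closure such as (fn [x] (+ x n)) to look like as a class. This is a hand-written guess in plain Java, not compiler output: the Fn base class is a stand-in for clojure.lang.AFunction and the Adder name is invented.

```java
// Hypothetical analogue of a compiled closure such as (fn [x] (+ x n)).
class ClosureSketch {
    // Stand-in for clojure.lang.AFunction.
    abstract static class Fn {
        abstract Object invoke(Object arg);
    }

    static final class Adder extends Fn {
        // The closed-over value lives in a field.
        final long n;

        // The closed-over value arrives through the constructor.
        Adder(long n) {
            this.n = n;
        }

        @Override
        Object invoke(Object x) {
            return ((Number) x).longValue() + n;
        }
    }

    public static void main(String[] args) {
        Fn addFive = new Adder(5);
        System.out.println(addFive.invoke(37L));
    }
}
```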

Finally, let's get to the juicy part, invoke().

invoke()

public Object invoke() {
  // (Var/pushThreadBindings {Compiler/LOADER (.getClassLoader (.getClass loading__4910__auto__))})
  Object[] bindings = new Object[2];
  bindings[0] = Compiler.LOADER;
  bindings[1] = this.getClass().getClassLoader();
  Var.pushThreadBindings((Associative)RT.mapUniqueKeys(bindings));

  try {
    // (refer 'clojure.core)
    IFn refer = (IFn)const__0.getRawRoot();
    return refer.invoke(const__1);
  } finally {
    Var.popThreadBindings();
  }
}

The content is more or less what you'd expect. The only really interesting part is how the map for the thread bindings is created. It calls the variadic RT.mapUniqueKeys utility method, which will, depending on the number of arguments given, return a singleton empty map, a PersistentArrayMap instance, or a PersistentHashMap instance.
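The idea behind that choice is that a small map is cheaper as a flat array scanned linearly, while a large one needs hashing. The sketch below only illustrates the decision; the class is hypothetical and the threshold of 16 array slots (8 entries) is an assumption on my part rather than something shown in this article.

```java
// Hypothetical sketch of a size-based choice of map representation.
class MapChoiceSketch {
    // Assumed threshold: 16 array slots, i.e. 8 key/value pairs.
    static final int HASHTABLE_THRESHOLD = 16;

    // Names the representation chosen for a flat key/value array.
    static String representationFor(Object... init) {
        if (init == null || init.length == 0) {
            return "empty map";
        }
        if (init.length <= HASHTABLE_THRESHOLD) {
            return "array map";
        }
        return "hash map";
    }

    // Array-map style lookup: a linear scan over key/value pairs.
    static Object arrayLookup(Object[] kvs, Object key) {
        for (int i = 0; i + 1 < kvs.length; i += 2) {
            if (kvs[i].equals(key)) {
                return kvs[i + 1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Object[] bindings = {"LOADER", "some class loader"};
        System.out.println(representationFor(bindings));
        System.out.println(arrayLookup(bindings, "LOADER"));
    }
}
```

For the single LOADER binding in invoke(), the array holds just two slots, so the small-map case is the one that applies.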

And that's that for greeter.hello$loading__4910__auto__. We can finally look at the last anonymous function.

greeter.hello$fn__17

This anonymous function holds some code that needs to run in a transaction, as seen in the snippet below:

(dosync (commute (deref #'clojure.core/*loaded-libs*)
                 conj
                 'greeter.hello))

This is modifying the global *loaded-libs* var to let Clojure know that the greeter.hello namespace has been loaded. The expansion of this code looks like:

(LockingTransaction/runInTransaction (fn* ([] (commute (deref #'clojure.core/*loaded-libs*)
                                                       conj
                                                       'greeter.hello))))
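If STM is unfamiliar, the effect of this bookkeeping can be approximated in plain Java with an atomic reference to an immutable set. To be clear, this is a rough analogue and not Clojure's transaction machinery; the class name is invented, and I am assuming that *loaded-libs* holds a set of namespace names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

// A rough analogue, not Clojure's STM: an atomic reference to an immutable set.
class LoadedLibsSketch {
    static final AtomicReference<Set<String>> LOADED_LIBS =
        new AtomicReference<>(Collections.unmodifiableSet(new TreeSet<String>()));

    // Comparable in spirit to (commute ref conj lib): the update function may be
    // retried under contention, which is harmless because adding to a set
    // gives the same result in any order.
    static Set<String> markLoaded(String lib) {
        return LOADED_LIBS.updateAndGet(old -> {
            Set<String> copy = new TreeSet<>(old);
            copy.add(lib);
            return Collections.unmodifiableSet(copy);
        });
    }

    public static void main(String[] args) {
        System.out.println(markLoaded("greeter.hello"));
    }
}
```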

This anonymous function holds the commute invocation, so let's see what this function's class looks like:

package greeter;
import clojure.lang.*;

public final class hello$fn__17 extends AFunction {
  public static final Var const__0;
  public static final Var const__1;
  public static final Var const__2;
  public static final Var const__3;
  public static final AFn const__4;

  public static {};

  public hello$fn__17();

  public Object invoke();
}

There is nothing surprising here. We just have a few more constants than in our previous examples.

The static initialiser and constructor

By now, the contents of the static initialiser should be predictable. Can you guess what each field will be?

static {
  const__0 = (Var)RT.var("clojure.core", "commute");
  const__1 = (Var)RT.var("clojure.core", "deref");
  const__2 = (Var)RT.var("clojure.core", "*loaded-libs*");
  const__3 = (Var)RT.var("clojure.core", "conj");
  const__4 = (AFn)Symbol.intern(null, "greeter.hello");
}

Likewise, the constructor is trivial:

public hello$fn__17() {
  super();
}

Let's see if invoke() is any more interesting.

invoke()

The heart of any Clojure function class is its invoke() method. What does this one look like?

public Object invoke() {
  IFn commute = (IFn)const__0.getRawRoot();
  IFn deref = (IFn)const__1.getRawRoot();
  // (deref #'clojure.core/*loaded-libs*)
  Object loadedLibs = deref.invoke(const__2);
  Object conj = const__3.getRawRoot();
  // (commute loadedLibs conj 'greeter.hello)
  return commute.invoke(loadedLibs, conj, const__4);
}

Again, nothing new. Each var's value is retrieved and the deref and commute functions are invoked.

Parting thoughts

I went through this exercise in an attempt to get a better understanding of Clojure's runtime. As we can see, just loading a namespace involves:

  • Loading three classes, two of which may never need to be reused
  • Instantiating at least a half dozen objects
  • Getting references to vars and symbols, at least once for each class that uses them
  • Getting the value of a var each time it is used

While this seems like a lot of overhead, the fact of the matter is that this enables Clojure's dynamic runtime environment. Without it, there would be no REPL or REPL-driven development.
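The key to that flexibility is the indirection through vars: a call site holds on to the var, not to the function, and asks for the current value on every call. Here is a heavily simplified sketch in plain Java. The Var class below is my own stand-in and shares only the method names getRawRoot and bindRoot with clojure.lang.Var.

```java
import java.util.function.Function;

// Hypothetical, heavily simplified stand-in for clojure.lang.Var.
class VarSketch {
    static final class Var {
        private volatile Object root;

        Var(Object root) {
            this.root = root;
        }

        Object getRawRoot() {
            return root;
        }

        void bindRoot(Object newRoot) {
            this.root = newRoot;
        }
    }

    // A call site: it keeps the Var and fetches the current value on every call.
    @SuppressWarnings("unchecked")
    static String call(Var fnVar, String arg) {
        Function<String, String> f = (Function<String, String>) fnVar.getRawRoot();
        return f.apply(arg);
    }

    static String demo() {
        Var greet = new Var((Function<String, String>) name -> "Hello, " + name);
        String before = call(greet, "world");
        // Redefining the function, as the REPL would, swaps the root value.
        greet.bindRoot((Function<String, String>) name -> "Goodbye, " + name);
        String after = call(greet, "world");
        return before + " / " + after;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The call site never changes, yet the second call picks up the new definition. That is the price and the payoff of looking the value up each time.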

However, sometimes you don't want all of that. A dynamic runtime is great for development, but some production environments have constraints that make this flexibility a bad trade-off. It's been a long-standing desire of mine to have a leaner Clojure runtime, and I believe it's important to understand how the current runtime works.

In a future entry, I plan to examine the impact of adding a var to the namespace.

Footnotes

  1. A quick glance at clojure.core reveals 2342 constants and 24 __init functions.
  2. It's somewhat curious that symbols are stored as abstract functions.
