Monday, 20 January 2014

How Clojure works: namespace metadata

In the first How Clojure works post, we examined how a Clojure namespace bootstraps itself. In particular, we saw how beguiling the following program can be.

(ns greeter.hello)

This program actually ends up creating three classes, including two anonymous function classes, each with a static initializer and a handful of constants.

Although I had promised to look at how def works, I'd first like to add a bit more to our namespace declaration. Let's add some metadata:

(ns greeter.hello
  "A simple namespace, worth decompiling."
  {:author "Daniel Solano Gómez"})

We have added a namespace docstring as well as an attribute map that will be added to the namespace metadata. What do you think will be the result?

Anticipating the changes

Last time, we saw that ns is a macro that actually does quite a bit. So, let's expand it once (and clean up the result so that it can be read):

(do
  (in-ns 'greeter.hello)
  (with-loading-context (refer 'clojure.core))
  (if (.equals 'greeter.hello 'clojure.core)
    nil
    (do
      (dosync 
        (commute @#'clojure.core/*loaded-libs* 
                 conj 
                 'greeter.hello)) 
      nil)))

Well, that's interesting. It's not any different than what we had before. Where did the metadata go? Is it possible that it's all lost? That's not likely. Further macro-expansion won't help, so let's start decompiling.

Edit: As we see in the follow-up entry, using *print-meta* allows us to see the metadata.

Decompilation overview

When we look at the list of generated classes, we find the same three generated classes as before¹:

  • greeter.hello__init
  • greeter.hello$fn__17
  • greeter.hello$loading__4910__auto__

We are still not seeing anything new, so it's time to break out the decompiler and see what's going on at a deeper level. Let's start with the namespace class, greeter.hello__init.

greeter.hello__init

The class signature of greeter.hello__init hasn't changed:

package greeter;
import clojure.lang.*;

public class hello__init {
  public static {};

  public static final Var const__0;
  public static final AFn const__1;
  public static final AFn const__2;

  public static void load();
  public static void __init0();
}

However, if we examine the decompiled code, we find some changes to the __init0 method, so let's take a closer look at that.

__init0()

Examining the new content of the __init0 method, we begin to see what's going on:

static void __init0() {
  const__0 = (Var)RT.var("clojure.core", "in-ns");
  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "author");
  meta[1] = "Daniel Solano Gómez";
  meta[2] = RT.keyword(null, "doc");
  meta[3] = "A simple namespace, worth decompiling.";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__1 = (AFn)iobj.withMeta(metaMap);
  const__2 = (AFn)Symbol.intern(null, "clojure.core");
}

As before, const__0 refers to the clojure.core/in-ns var and const__2 refers to the clojure.core symbol. The big difference is that Clojure no longer stores the bare greeter.hello symbol. Instead, it creates the symbol, 'adds'² the metadata to it, and stores the result in const__1.
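The 'adds' step can be illustrated with a minimal sketch in plain Java. SketchSymbol is a hypothetical stand-in written for this post, not clojure.lang.Symbol; it only assumes the two properties mentioned here and in the footnotes: annotating a symbol yields a new object, and metadata plays no part in equality.

```java
import java.util.Collections;
import java.util.Map;

// Simplified stand-in for a symbol with metadata. This is NOT clojure.lang.Symbol.
final class SketchSymbol {
    final String name;
    final Map<String, String> meta; // null when the symbol carries no metadata

    private SketchSymbol(String name, Map<String, String> meta) {
        this.name = name;
        this.meta = meta;
    }

    static SketchSymbol intern(String name) {
        return new SketchSymbol(name, null);
    }

    // Returns a new symbol with the given metadata; the receiver is left untouched.
    SketchSymbol withMeta(Map<String, String> newMeta) {
        return new SketchSymbol(name, newMeta);
    }

    // Equality and hashing look only at the name, never at the metadata.
    @Override
    public boolean equals(Object other) {
        return other instanceof SketchSymbol
            && ((SketchSymbol) other).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

final class SymbolMetaDemo {
    public static void main(String[] args) {
        SketchSymbol bare = SketchSymbol.intern("greeter.hello");
        SketchSymbol annotated = bare.withMeta(
            Collections.singletonMap("doc", "A simple namespace, worth decompiling."));

        System.out.println(bare == annotated);      // false: withMeta made a new object
        System.out.println(bare.equals(annotated)); // true: metadata is ignored by equals
        System.out.println(bare.meta);              // null: the original is unchanged
    }
}
```

The constant stored in const__1 corresponds to annotated in this sketch: a fresh object that compares equal to the bare symbol.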

This explains, to some extent, where the metadata went. It has been preserved by the compiler, but how can the Clojure runtime access the metadata? The greeter.hello__init class doesn't implement IMeta. It seems unlikely that the runtime would scour the class constants of loaded namespace classes looking for metadata.

Clearly, there is more to investigate. Let's take a look at the greeter.hello$loading__4910__auto__ class next.

greeter.hello$loading__4910__auto__

This is the class that implements (with-loading-context (refer 'clojure.core)). It hasn't changed as a result of the new metadata, so let's move on to the last generated class.

greeter.hello$fn__17

This is the anonymous function class that registers the namespace with Clojure. It effectively implements the following Clojure code:

(commute @#'clojure.core/*loaded-libs* 
         conj 
         'greeter.hello)

Decompiling the class, we see that it hasn't changed much. As with greeter.hello__init, the class signature is identical. What differs is the implementation of the static initialiser:

static {
  const__0 = (Var)RT.var("clojure.core", "commute");
  const__1 = (Var)RT.var("clojure.core", "deref");
  const__2 = (Var)RT.var("clojure.core", "*loaded-libs*");
  const__3 = (Var)RT.var("clojure.core", "conj");
  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "author");
  meta[1] = "Daniel Solano Gómez";
  meta[2] = RT.keyword(null, "doc");
  meta[3] = "A simple namespace, worth decompiling.";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__4 = (AFn)iobj.withMeta(metaMap);
}

As before, the first four class constants refer to the vars for clojure.core/commute, clojure.core/deref, clojure.core/*loaded-libs*, and clojure.core/conj. For the fifth class constant, instead of storing the symbol greeter.hello directly, it adds the metadata to the symbol before storing it in the class constant. So what are the consequences of this?

Well, when invoke() is called on this anonymous function, it ensures that clojure.core/*loaded-libs* will contain the symbol that carries the metadata. So, this must be where the namespace metadata comes from, right?

Digging deeper

At this point in my investigation, I was a little bit confused. At first, I thought that the namespace metadata must come from the *loaded-libs* var, but that's just a ref to a sorted set of symbols. However, if I want to get the metadata from a namespace at the REPL, I use (meta (find-ns 'greeter.hello)), and the object returned by find-ns is a Namespace instance, not a Symbol. This got me thinking: what is the purpose of *loaded-libs*, and where is the Namespace instance created?

The purpose of *loaded-libs*

*loaded-libs* is a private var declared in core.clj. You can get its content, a sorted set of symbols, via the loaded-libs function. It is used indirectly by require and use to keep track of what namespaces have been loaded. For example, when you use require without :reload or :reload-all, the presence of the namespace name symbol in *loaded-libs* will keep the namespace from being reloaded.

When using :reload-all, Clojure uses an initially-empty, thread-local binding of *loaded-libs*. This allows all dependencies of the desired library to be reloaded once, and the resulting set of loaded namespace name symbols is added to the root binding of *loaded-libs*.

As a result, the metadata attached to the symbol in *loaded-libs* is not the metadata we get from the namespace object. For that, we'll have to take a closer look at the symbol with metadata stored in greeter.hello__init/const__1.
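The bookkeeping described in this section can be sketched as a toy model in plain Java. SketchLoader, its dependency table, and its load log are all hypothetical names invented for illustration; this is not how core.clj is written, only a model of the behaviour described above: a plain require skips anything already recorded, while :reload-all starts from an empty set and merges the result into the root set afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the *loaded-libs* bookkeeping. NOT Clojure's implementation.
final class SketchLoader {
    private final Map<String, List<String>> deps;           // hypothetical dependency table
    private final Set<String> rootLoaded = new TreeSet<>(); // stands in for the root *loaded-libs*
    final List<String> loadLog = new ArrayList<>();         // records every load, in order

    SketchLoader(Map<String, List<String>> deps) {
        this.deps = deps;
    }

    // Models require without :reload or :reload-all.
    void require(String lib) {
        load(lib, rootLoaded);
    }

    // Models :reload-all: start from an empty set, then merge it into the root set.
    void requireReloadAll(String lib) {
        Set<String> fresh = new TreeSet<>();
        load(lib, fresh);
        rootLoaded.addAll(fresh);
    }

    // Loads dependencies first, then the library itself, each at most once per set.
    private void load(String lib, Set<String> loaded) {
        if (loaded.contains(lib)) {
            return; // already recorded as loaded, so skip it
        }
        for (String dep : deps.getOrDefault(lib, Collections.emptyList())) {
            load(dep, loaded);
        }
        loadLog.add(lib);
        loaded.add(lib);
    }

    Set<String> loadedLibs() {
        return Collections.unmodifiableSet(rootLoaded);
    }
}
```

With a table in which app depends on util and db, and db depends on util, require("app") logs util, db, app; a second require("app") logs nothing; requireReloadAll("app") logs all three once more, with util still loaded only once in that pass.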

Another look at greeter.hello__init/load

Looking back at greeter.hello__init, the namespace name symbol with metadata is stored in a class constant, const__1. The only place where this constant is used is in the load() method, which is decompiled as follows:

public static void load() {
  // (in-ns 'greeter.hello)
  IFn inNs = (IFn)const__0.getRawRoot();
  inNs.invoke(const__1);

  // (with-loading-context (refer 'clojure.core))
  IFn loading4910auto = (IFn)new greeter.hello$loading__4910__auto__();
  loading4910auto.invoke();

  // (if (.equals 'greeter.hello 'clojure.core)
  //   nil
  //   (do
  //     (LockingTransaction/runInTransaction (fn* …))
  //     nil))
  Symbol greeterHello = (Symbol)const__1;
  if (greeterHello.equals(const__2)) {
    return;  // the nil result is discarded, since load() is void
  } else {
    Callable callable = (Callable)new greeter.hello$fn__17();
    LockingTransaction.runInTransaction(callable);
    return;  // the nil result is discarded here as well
  }
}

As we see here, there are two places where this constant is used:

  1. In lines 3-4, it is used as the argument for in-ns.
  2. In lines 15-16, it is used in a comparison to 'clojure.core.

In the second case, the metadata has no effect, but what of the first?

A closer look at in-ns

in-ns is a bit special. Unlike most of clojure.core, it is not defined in core.clj. Instead, it is constructed within RT.java. Its value is actually an anonymous AFn implementation also defined in RT.java. This implementation is fairly simple, and the noteworthy bit is that the symbol that is passed to in-ns is further passed to the static method clojure.lang.Namespace/findOrCreate.

[Figure: Class diagram of clojure.lang.Namespace]

Namespace contains a static member called namespaces, which is a map³ of namespace name symbols to Namespace object instances. When findOrCreate is called and there is no mapping for the symbol yet, a new Namespace instance is created and inserted into the map.

The Namespace class extends clojure.lang.AReference, which holds metadata and indirectly implements clojure.lang.IMeta. When a Namespace is constructed, it takes the metadata of the namespace name symbol and uses it as its own metadata.

At last, we now know how a namespace gets its metadata. Looking at the implementation of find-ns, we see that it just calls Namespace/find which merely does a lookup in the namespaces map.
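To make the lookup-or-create step concrete, here is a minimal sketch in plain Java. SketchNamespace is a hypothetical model, not clojure.lang.Namespace: it keys the map by a plain string instead of a symbol and passes the metadata in as a separate map, purely to keep the example self-contained.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the namespace registry. This is NOT clojure.lang.Namespace.
final class SketchNamespace {
    // One shared, mutable map from namespace name to namespace object.
    private static final ConcurrentHashMap<String, SketchNamespace> NAMESPACES =
        new ConcurrentHashMap<>();

    final String name;
    final Map<String, String> meta; // copied from the name symbol at creation time

    private SketchNamespace(String name, Map<String, String> symbolMeta) {
        this.name = name;
        this.meta = symbolMeta;
    }

    // Returns the existing namespace, or creates and registers a new one.
    static SketchNamespace findOrCreate(String name, Map<String, String> symbolMeta) {
        SketchNamespace existing = NAMESPACES.get(name);
        if (existing != null) {
            return existing;
        }
        SketchNamespace created = new SketchNamespace(name, symbolMeta);
        // putIfAbsent keeps the first registration if another thread got there first.
        SketchNamespace raced = NAMESPACES.putIfAbsent(name, created);
        return raced == null ? created : raced;
    }

    // A plain lookup, as find-ns is described above; null when nothing is registered.
    static SketchNamespace find(String name) {
        return NAMESPACES.get(name);
    }
}

final class NamespaceDemo {
    public static void main(String[] args) {
        SketchNamespace ns = SketchNamespace.findOrCreate(
            "greeter.hello", Collections.singletonMap("author", "Daniel Solano Gómez"));
        System.out.println(SketchNamespace.find("greeter.hello") == ns); // true
        System.out.println(ns.meta.get("author"));                       // Daniel Solano Gómez
    }
}
```

In this sketch, metadata is captured only when the namespace object is first created; a later findOrCreate for the same name returns the existing object unchanged. That is a property of the model, and I have not checked it against every Clojure version.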

Parting thoughts

  1. If the purpose of *loaded-libs* is primarily to keep track of what namespaces have been loaded, does it really need metadata? Metadata doesn't affect the equality of symbols. Arguably, adding metadata to symbols in *loaded-libs* is a waste of memory.
  2. One interesting finding is that at the very heart of Clojure is a bit of mutable state. Keeping track of loaded libraries uses Clojure's concurrency utilities and persistent data structures, but namespaces rely on a Java concurrent collection.
  3. All new namespaces are initialised with a default set of import mappings, mostly classes from java.lang. This has two main implications:
    1. The only thing special about classes from java.lang is that their mappings are hard-coded. If a new java.lang class were added to Java, it would not be imported by default until RT.java was updated with a new mapping.
    2. Imports are mappings of symbols to class objects, and there is not a separate set of mappings for Clojure vars.
  4. Since Clojure keeps track of what's loaded in two different places, it's possible to mess up the environment in strange ways. In particular, remove-ns does not clear a symbol from *loaded-libs*, meaning that it would be possible to get Clojure into a state where it thinks a namespace is loaded when actually it is not.
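The last point can be modelled with a short sketch in plain Java. SketchRuntime and its members are hypothetical names for illustration only; the model assumes nothing beyond what is described above, namely that require consults one collection while remove-ns touches only the other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the two separate pieces of bookkeeping. NOT Clojure's implementation.
final class SketchRuntime {
    final Set<String> loadedLibs = new TreeSet<>();         // stands in for *loaded-libs*
    final Map<String, Object> namespaces = new HashMap<>(); // stands in for the namespaces map
    int loadCount = 0;

    // Models require without :reload: load only if the name is not recorded as loaded.
    void require(String name) {
        if (loadedLibs.contains(name)) {
            return;
        }
        namespaces.put(name, new Object());
        loadedLibs.add(name);
        loadCount++;
    }

    // Models remove-ns as described above: only the namespace registry is touched.
    void removeNs(String name) {
        namespaces.remove(name);
    }
}

final class OutOfSyncDemo {
    public static void main(String[] args) {
        SketchRuntime rt = new SketchRuntime();
        rt.require("greeter.hello");
        rt.removeNs("greeter.hello");
        rt.require("greeter.hello"); // skipped: the name is still in loadedLibs

        System.out.println(rt.loadedLibs.contains("greeter.hello"));  // true
        System.out.println(rt.namespaces.containsKey("greeter.hello")); // false
        System.out.println(rt.loadCount);                             // 1
    }
}
```

In this model the second require is a no-op, so the namespace stays missing even though it is still recorded as loaded.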

Footnotes

  1. Note that the names of the anonymous function classes can be different each time you compile.
  2. Unsurprisingly, symbols are immutable and annotating one with metadata generates a new symbol.
  3. In particular, it is a ConcurrentHashMap, Java's concurrency-friendly map implementation. It is not a persistent data structure, but it supports concurrent reads and a limited degree of concurrent writes.
