Friday, 24 January 2014

How Clojure works: more on namespace metadata

In the post How Clojure works: namespace metadata, I commented on how the metadata seemed to be missing from the following macro-expansion of ns:

(ns greeter.hello
  "A simple namespace, worth decompiling."
  {:author "Daniel Solano Gómez"})

; macro-expands once to (with some cleanup):
(do
  (in-ns 'greeter.hello)
  (with-loading-context (refer 'clojure.core))
  (if (.equals 'greeter.hello 'clojure.core)
    nil
    (do
      (dosync 
        (commute @#'*loaded-libs* 
                 conj 
                 'greeter.hello)) 
      nil)))

*print-meta*

Stuart Sierra pointed out in a comment that if we set *print-meta* to true, the metadata actually shows up three times in the macro-expansion:

(do
  (in-ns (quote ^{:author "Daniel Solano Gómez",
                  :doc "A simple namespace, worth decompiling."}
                greeter.hello))
  (with-loading-context (refer 'clojure.core))
  (if (.equals (quote ^{:author "Daniel Solano Gómez",
                        :doc "A simple namespace, worth decompiling."}
                      greeter.hello)
               'clojure.core)
    nil
    (do
      (dosync (commute @#'*loaded-libs*
                       conj
                       (quote ^{:author "Daniel Solano Gómez",
                                :doc "A simple namespace, worth decompiling."}
                              greeter.hello)))
      nil)))
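
For anyone who wants to try this at a REPL, here is a minimal sketch of how *print-meta* changes what the printer shows. The helper name pr-str-with-meta is something I made up for illustration; it is not part of clojure.core and simply binds *print-meta* around pr-str.

```clojure
;; Hypothetical helper (not part of clojure.core): render a form as a string
;; with *print-meta* bound to true, so that metadata is printed as well.
(defn pr-str-with-meta [form]
  (binding [*print-meta* true]
    (pr-str form)))

;; A symbol carrying the same kind of metadata the ns macro attaches.
(def sym
  (with-meta 'greeter.hello
    {:author "Daniel Solano Gómez"
     :doc    "A simple namespace, worth decompiling."}))

;; Default printing hides the metadata entirely.
(def hidden (pr-str sym))

;; With *print-meta* bound to true, the metadata map is printed
;; in front of the symbol, prefixed by ^.
(def shown (pr-str-with-meta sym))
```

Passing (macroexpand-1 '(ns greeter.hello …)) to the same helper should produce something along the lines of the listing above, although the exact expansion may differ between Clojure versions.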

A couple of observations:

  1. While it was obvious that Clojure wasn't losing the metadata, now we can actually see how it gets processed.

  2. Even though the metadata appears three separate times in the macro-expansion, it only shows up twice in the compiled result. Apparently, when compiling a particular class, the compiler keeps track of which symbols are being used and deduplicates them.

This last point got me thinking: how sensitive is the compiler to the symbols and metadata it encounters?

Modifying the macro-expansion

To find out how sensitive the compiler is to symbols and their metadata, we can replace the ns form above with its macro-expansion and modify it just slightly:

(do
  (in-ns (quote ^{:author "Daniel Solano Gómez",
                  :doc "A simple namespace, worth decompiling."}
                greeter.hello))
  (clojure.core/with-loading-context (clojure.core/refer 'clojure.core))
  (if (.equals 'greeter.hello 'clojure.core)
    nil        
    (do               
      (dosync (commute @#'clojure.core/*loaded-libs*
                       conj
                       (quote ^{:author "Daniel Solano Gómez",
                                :doc "A simple namespace, worth decompiling."}
                              greeter.hello)))
      nil)))                  

The only change is in the comparison on line 6 (the .equals form), where we have removed the metadata from the greeter.hello symbol. Functionally, this has no effect, as metadata doesn't affect equality. However, does this change the generated code?
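
As a quick aside, the claim that metadata doesn't affect equality is easy to check at a REPL. The following is only a small sketch; the var names are arbitrary and chosen for illustration.

```clojure
;; The bare symbol, as it appears in the modified comparison.
(def plain-sym 'greeter.hello)

;; The same symbol with metadata attached. with-meta returns a new object.
(def annotated-sym
  (with-meta plain-sym {:doc "A simple namespace, worth decompiling."}))

;; Equality only looks at the symbol's namespace and name.
(def equal? (= plain-sym annotated-sym))

;; Identity is a different matter: these are two distinct objects.
(def same-object? (identical? plain-sym annotated-sym))
```

Here equal? ends up true while same-object? is false, so code that compares the two symbols by value cannot tell them apart, but code that compares them by reference can.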

Examining the impact

As a matter of fact, it does. We have been careful so that the change has only affected the greeter.hello__init class. Just looking at the class signature, we can see this change made an impact:

package greeter;

import clojure.lang.*;

public class hello__init {
  public static final Var const__0;
  public static final AFn const__1;
  public static final AFn const__2;
  public static final AFn const__3;

  static {}

  public static void load();
  public static void __init0();
}

There is now an additional AFn class constant. When we look at the decompiled __init0 method, we can see exactly what has changed:

public static void __init0() {
  const__0 = (Var)RT.var("clojure.core", "in-ns");
  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "doc");
  meta[1] = "A simple namespace, worth decompiling.";
  meta[2] = RT.keyword(null, "author");
  meta[3] = "Daniel Solano Gómez";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__1 = (AFn)iobj.withMeta(metaMap);
  const__2 = (AFn)Symbol.intern(null, "greeter.hello");
  const__3 = (AFn)Symbol.intern(null, "clojure.core");
}

These changes include:

  • On lines 5-8, the order of the map metadata was changed. I don't think it's a significant change, but it is a change nonetheless;
  • On line 11, const__2 now holds a version of the symbol greeter.hello without the metadata; and
  • On line 12, const__3 holds the reference to clojure.core, which used to be in const__2.

When we examine the decompiled output of load(), we see that, as expected, the version of the greeter.hello symbol that has the metadata is used for the in-ns call and the version without the metadata is used in the comparison to clojure.core:

public static void load() {
  // (in-ns 'greeter.hello)
  IFn inNs = (IFn)const__0.getRawRoot();
  inNs.invoke(const__1); // version with metadata

  // (with-loading-context (refer 'clojure.core))
  IFn loading4910auto = (IFn)new greeter.hello$loading__4910__auto();
  loading4910auto.invoke();

  // (if (.equals 'greeter.hello 'clojure.core)
  //   nil
  //   (do
  //     (LockingTransaction/runInTransaction (fn* …))
  //     nil))
  Symbol greeterHello = (Symbol)const__2; // version without metadata
  if (greeterHello.equals(const__3)) {
    return; // nil; load() is void, so the value is discarded
  } else {
    Callable callable = (Callable)new greeter.hello$fn__17();
    LockingTransaction.runInTransaction(callable);
    return; // nil, likewise discarded
  }
}

Closing thoughts

First, a big thanks to Stuart Sierra for the tip about *print-meta*; it has been really helpful.

Second, examining the Compiler source code, it becomes a bit clearer what's going on: as the compiler encounters constants, it stores them in a vector to be emitted later. Additionally, it ensures that constants are not duplicated by using an IdentityHashMap, which relies on identity rather than equality. As such, we can see how the two symbols (with and without metadata) would be considered different.
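
To make that concrete, here is a simplified model of the bookkeeping just described. It is not the compiler's actual code: make-constant-pool and register-constant! are hypothetical names, and the real implementation lives in Java inside the Compiler class. The sketch only assumes what is stated above, namely a vector of constants plus an IdentityHashMap used to avoid duplicates.

```clojure
;; Hypothetical model of a constant pool: a vector of constants plus an
;; IdentityHashMap from each constant (by reference) to its index.
(defn make-constant-pool []
  {:ids       (java.util.IdentityHashMap.)
   :constants (atom [])})

;; Hypothetical registration function: returns the index of c, adding c to
;; the pool only if this exact object has not been registered before.
(defn register-constant! [{:keys [ids constants]} c]
  (if (.containsKey ids c)
    (.get ids c)
    (let [i (count @constants)]
      (swap! constants conj c)
      (.put ids c i)
      i)))

(def pool (make-constant-pool))

(def with-metadata (with-meta 'greeter.hello {:author "Daniel Solano Gómez"}))
(def without-metadata 'greeter.hello)

(def first-index  (register-constant! pool with-metadata))    ; new entry
(def second-index (register-constant! pool with-metadata))    ; same object, reused
(def third-index  (register-constant! pool without-metadata)) ; equal, but a new entry
```

Registering the same object twice yields index 0 both times, while the symbol without metadata gets index 1 even though the two symbols are equal. A map keyed on equality would have collapsed them into a single entry.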

However, what's not entirely clear is how the compiler knows that, in the original macro-expansion, the two symbols with metadata are identical. I spent some time studying the compiler source, but it's somewhat hard to follow. I could probably use a debugger to trace its execution, but that's an exercise for another day.
