How Clojure works: namespace metadata
In the first How Clojure works post, we examined how a Clojure namespace bootstraps itself. In particular, we saw how beguiling the following program can be.
(ns greeter.hello)
This program actually ends up creating three classes, including two anonymous function classes, each with a static initializer and a handful of constants.
Although I had promised to look at how a def works, I'd like to first add a bit more to our namespace declaration. Let's add some metadata:
(ns greeter.hello "A simple namespace, worth decompiling." {:author "Daniel Solano Gómez"})
We have added a namespace docstring as well as an attribute map that will be added to the namespace metadata. What do you think will be the result?
Anticipating the changes
Last time, we saw that ns is a macro that actually does quite a bit. So, let's expand it once (and clean up the result so that it can be read):
(do
  (in-ns 'greeter.hello)
  (with-loading-context
    (refer 'clojure.core))
  (if (.equals 'greeter.hello 'clojure.core)
    nil
    (do
      (dosync
        (commute @#'clojure.core/*loaded-libs* conj 'greeter.hello))
      nil)))
Well, that's interesting. It's not any different than what we had before. Where did the metadata go? Is it possible that it's all lost? That's not likely. Further macro-expansion won't help, so let's start decompiling.
Edit: As we see in the follow-up entry, using *print-meta* allows us to see the metadata.
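A minimal sketch of that approach, assuming a REPL or script with only clojure.core available: binding *print-meta* to true while printing the expansion makes the metadata on the namespace symbol visible. The var name expansion-text is illustrative, and the exact shape of the expansion varies between Clojure versions.

```clojure
;; Expand the ns form without evaluating it, then print the result with
;; *print-meta* bound to true so that metadata shows up as ^{...} prefixes.
(def expansion-text
  (binding [*print-meta* true]
    (pr-str
      (macroexpand-1
        '(ns greeter.hello
           "A simple namespace, worth decompiling."
           {:author "Daniel Solano Gómez"})))))

(println expansion-text)
```

In the printed form, each occurrence of the greeter.hello symbol is preceded by its metadata map.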
Decompilation overview
When we look at the list of generated classes, we find the same three generated classes as before¹:
greeter.hello__init
greeter.hello$fn__17
greeter.hello$loading__4910__auto__
We are still not seeing anything new, so it's time to break out the decompiler and see what's going on at a deeper level. Let's start with the namespace class, greeter.hello__init.
greeter.hello__init
The class signature of greeter.hello__init hasn't changed:
package greeter;

import clojure.lang.*;

public class hello__init {
  public static {};
  public static final Var const__0;
  public static final AFn const__1;
  public static final AFn const__2;
  public static void load();
  public static void __init0();
}
However, if we examine the decompiled code, we find some changes to the __init0 method, so let's take a closer look at that.
__init0()
Examining the new content of the __init0 method, we begin to see what's going on:
static void __init0() {
  const__0 = (Var)RT.var("clojure.core", "in-ns");

  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "author");
  meta[1] = "Daniel Solano Gómez";
  meta[2] = RT.keyword(null, "doc");
  meta[3] = "A simple namespace, worth decompiling.";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__1 = (AFn)iobj.withMeta(metaMap);

  const__2 = (AFn)Symbol.intern(null, "clojure.core");
}
As before, const__0 refers to the clojure.core/in-ns var and const__2 refers to the clojure.core symbol. The big difference here is that Clojure is no longer storing the bare greeter.hello symbol it creates. Instead, it creates that symbol, 'adds'² the metadata to it, and stores the result in const__1.
This explains, to some extent, where the metadata went. It has been preserved by the compiler, but how can the Clojure runtime access the metadata? The greeter.hello__init class doesn't implement IMeta. It seems unlikely that the runtime would scour the class constants of loaded namespace classes looking for metadata.
Clearly, there is more to investigate. Let's take a look at the greeter.hello$loading__4910__auto__ class next.
greeter.hello$loading__4910__auto__
This is the class that implements (with-loading-context (refer 'clojure.core)). It hasn't changed as a result of the new metadata, so let's move on to the last generated class.
greeter.hello$fn__17
This is the anonymous function class that registers the namespace with Clojure. It effectively implements the following Clojure code:
(commute @#'clojure.core/*loaded-libs* conj 'greeter.hello)
Decompiling the class, we see that it hasn't changed much. As with greeter.hello__init, the class signature is identical; only the implementation of the static initialiser differs:
static {
  const__0 = (Var)RT.var("clojure.core", "commute");
  const__1 = (Var)RT.var("clojure.core", "deref");
  const__2 = (Var)RT.var("clojure.core", "*loaded-libs*");
  const__3 = (Var)RT.var("clojure.core", "conj");

  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "author");
  meta[1] = "Daniel Solano Gómez";
  meta[2] = RT.keyword(null, "doc");
  meta[3] = "A simple namespace, worth decompiling.";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__4 = (AFn)iobj.withMeta(metaMap);
}
As before, the first four class constants refer to the vars for clojure.core/commute, clojure.core/deref, clojure.core/*loaded-libs*, and clojure.core/conj. For the fifth class constant, instead of storing the symbol greeter.hello directly, it adds the metadata to the symbol before storing it in the class constant. So what are the consequences of this?
Well, when invoke() is called on this anonymous function, it ensures that clojure.core/*loaded-libs* will contain the symbol that carries the metadata. So, this must be where the namespace metadata comes from, right?
Digging deeper
At this point in my investigation, I was a little bit confused. At first, I thought that the namespace metadata must come from the *loaded-libs* var, but that's just a ref to a sorted set of symbols. However, if I want to get the metadata from a namespace at the REPL, I use (meta (find-ns 'greeter.hello)), and the type of the object returned by find-ns is a Namespace instance, not a Symbol. This got me thinking: what is the purpose of *loaded-libs* and where is the Namespace instance created?
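The REPL check described here can be sketched as follows. This assumes a fresh Clojure process in which greeter.hello has not been created yet; wrapping the ns form in binding and eval is only a convenience so that the current namespace is left unchanged, and the name hello-ns is illustrative.

```clojure
;; Evaluate the ns form without permanently switching the current namespace.
(binding [*ns* *ns*]
  (eval '(ns greeter.hello
           "A simple namespace, worth decompiling."
           {:author "Daniel Solano Gómez"})))

(def hello-ns (find-ns 'greeter.hello))

(class hello-ns)           ; clojure.lang.Namespace, not a Symbol
(:doc (meta hello-ns))     ; "A simple namespace, worth decompiling."
(:author (meta hello-ns))  ; "Daniel Solano Gómez"
```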
The purpose of *loaded-libs*
*loaded-libs* is a private var declared in core.clj. You can get its content, a sorted set of symbols, via the loaded-libs function. It is used indirectly by require and use to keep track of what namespaces have been loaded. For example, when you use require without :reload or :reload-all, the presence of the namespace name symbol in *loaded-libs* will keep the namespace from being reloaded.
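As a rough illustration of this bookkeeping, the following sketch inspects the set through the public loaded-libs function. It uses clojure.set only because it ships with Clojure; any library on the classpath would do. The name libs is illustrative.

```clojure
(require 'clojure.set)

(def libs (loaded-libs))

(sorted? libs)                 ; true, it is a sorted set
(every? symbol? libs)          ; true, the elements are namespace name symbols
(contains? libs 'clojure.set)  ; true, recorded when the library was loaded

;; A second plain require finds the symbol in the set and does not reload.
(require 'clojure.set)

;; Passing :reload forces the library to be loaded again regardless.
(require 'clojure.set :reload)
```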
When using :reload-all, Clojure uses an initially-empty, thread-local binding of *loaded-libs*. This allows all dependencies of the desired library to be reloaded once, and the resulting set of loaded namespace name symbols is added to the root binding of *loaded-libs*.
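The :reload-all mechanism can be modelled in a few lines. This is a simplified sketch of the idea, not the code in core.clj: *tracked*, track!, and reload-all-sketch are hypothetical names, and *tracked* merely stands in for the private *loaded-libs* var.

```clojure
;; Hypothetical stand-in for the private clojure.core/*loaded-libs* var.
(def ^:dynamic *tracked* (ref (sorted-set 'already.loaded)))

(defn track!
  "Records lib in whichever ref *tracked* is currently bound to."
  [lib]
  (dosync (commute *tracked* conj lib)))

(defn reload-all-sketch
  "Tracks libs against a fresh, thread-local set, then merges that set
  into the root set."
  [libs]
  (let [fresh (binding [*tracked* (ref (sorted-set))]
                (doseq [lib libs]
                  (track! lib))
                @*tracked*)]
    (dosync (commute *tracked* into fresh))
    fresh))

(reload-all-sketch '[lib.b lib.a])  ; the fresh set holds lib.a and lib.b
@*tracked*                          ; the root set now holds all three symbols
```

Because the thread-local set starts out empty, nothing looks loaded while the binding is in effect, which is what lets every dependency be loaded again exactly once.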
All of this means that the metadata attached to the symbols in *loaded-libs* is not the metadata we get from the namespace object. For that, we'll have to take a closer look at the metadata attached to the symbol at greeter.hello__init/const__1.
Another look at greeter.hello__init/load
Looking back at greeter.hello__init, the namespace name symbol with metadata is stored in a class constant, const__1. The only place where this constant is used is in the load() method, which is decompiled as follows:
public static void load() {
  // (in-ns 'greeter.hello)
  IFn inNs = (IFn)const__0.getRawRoot();
  inNs.invoke(const__1);

  // (with-loading-context (refer 'clojure.core))
  IFn loading4910auto = (IFn)new greeter.hello$loading__4910__auto__();
  loading4910auto.invoke();

  // (if (.equals 'greeter.hello 'clojure.core)
  //   nil
  //   (do
  //     (LockingTransaction/runInTransaction (fn* …))
  //     nil))
  Symbol greeterHello = (Symbol)const__1;
  if (greeterHello.equals(const__2)) {
    return;
  } else {
    Callable callable = (Callable)new greeter.hello$fn__17();
    LockingTransaction.runInTransaction(callable);
    return;
  }
}
As we see here, there are two places where this constant is used:
- In lines 3-4, it is used as the argument for in-ns.
- In lines 15-16, it is used in a comparison to 'clojure.core.
In the second case, the metadata has no effect, but what of the first?
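The second case can be checked at the REPL. The following is a small sketch using only clojure.core; the names plain and annotated are illustrative. It also shows that with-meta returns a new symbol rather than modifying the original.

```clojure
(def plain 'greeter.hello)

;; with-meta does not mutate: it returns a fresh symbol carrying the map.
(def annotated
  (with-meta plain {:doc "A simple namespace, worth decompiling."}))

(meta plain)                  ; nil, the original symbol is untouched
(meta annotated)              ; the :doc map supplied above
(identical? plain annotated)  ; false, two distinct objects
(= plain annotated)           ; true, equality ignores metadata
(.equals annotated plain)     ; true, the same method that load() calls
```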
A closer look at in-ns
in-ns is a bit special. Unlike most of clojure.core, it is not defined in core.clj. Instead, it is constructed within RT.java.
Its value is actually an anonymous AFn implementation also defined in RT.java. This implementation is fairly simple, and the noteworthy bit is that the symbol that is passed to in-ns is further passed to the static method clojure.lang.Namespace/findOrCreate.
Namespace contains a static member called namespaces, which is a map³ of namespace name symbols to Namespace object instances. When findOrCreate is called and there is no mapping for the symbol yet, a new Namespace instance is created and inserted into the map.
The Namespace class extends clojure.lang.AReference, which holds metadata and indirectly implements clojure.lang.IMeta. The Namespace constructor passes the metadata from the namespace name symbol up to AReference, so the symbol's metadata becomes the namespace's metadata.
At last, we now know how a namespace gets its metadata. Looking at the implementation of find-ns, we see that it just calls Namespace/find, which merely does a lookup in the namespaces map.
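To make the chain concrete, here is a hedged sketch that calls Namespace/findOrCreate directly through Java interop, much as in-ns does internally. The namespace name greeter.scratch and the var names are made up for the example, and the sketch assumes that namespace does not already exist.

```clojure
(def scratch-sym
  (with-meta 'greeter.scratch {:doc "Created by hand."}))

;; No mapping exists yet, so a new Namespace is built from the symbol
;; and takes the symbol's metadata as its own.
(def scratch-ns (clojure.lang.Namespace/findOrCreate scratch-sym))

(meta scratch-ns)                                   ; {:doc "Created by hand."}
(identical? scratch-ns (find-ns 'greeter.scratch))  ; true, same map entry

;; A later call finds the existing instance; the new metadata is not used.
(def again
  (clojure.lang.Namespace/findOrCreate
    (with-meta 'greeter.scratch {:doc "Ignored."})))

(identical? scratch-ns again)  ; true
(:doc (meta again))            ; "Created by hand."

(remove-ns 'greeter.scratch)   ; tidy up the example namespace
```

The second call shows why the metadata is only picked up when the namespace is first created: once a mapping exists, findOrCreate returns it as-is.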
Parting thoughts
- If the purpose of *loaded-libs* is primarily to keep track of what namespaces have been loaded, does it really need metadata? Metadata doesn't affect the equality of symbols. Arguably, adding metadata to symbols in *loaded-libs* is a waste of memory.
- One interesting finding is that at the very heart of Clojure is a bit of mutable state. Keeping track of loaded libraries uses Clojure's concurrency utilities and persistent data structures, but namespaces rely on a Java concurrent collection.
- All new namespaces are initialised with a default set of import mappings, mostly classes from java.lang. This has two main implications:
  - The only thing special about classes from java.lang is that their mappings are hard-coded. If a new java.lang class were added to Java, it would not be imported by default until RT.java is updated with a new mapping.
  - Imports are mappings of symbols to class objects, and there is no separate set of mappings for Clojure vars: both live in the same map.
- Since Clojure keeps track of what's loaded in two different places, it's possible to mess up the environment in strange ways. In particular, remove-ns does not clear a symbol from *loaded-libs*, meaning that it would be possible to get Clojure into a state where it thinks a namespace is loaded when actually it is not.
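The last point can be reproduced with a throwaway namespace. This sketch assumes greeter.temp has no source file on the classpath; the var names are illustrative, and reaching the private var through @#'clojure.core/*loaded-libs* for the clean-up is a workaround for illustration, not a supported API.

```clojure
;; Create greeter.temp via ns without leaving the current namespace.
(binding [*ns* *ns*]
  (eval '(ns greeter.temp)))

(remove-ns 'greeter.temp)

(def gone? (nil? (find-ns 'greeter.temp)))                     ; true
(def still-recorded? (contains? (loaded-libs) 'greeter.temp))  ; true

;; A plain require sees the symbol in the set and skips loading,
;; so the namespace stays missing.
(require 'greeter.temp)
(def still-gone? (nil? (find-ns 'greeter.temp)))               ; true

;; Manual clean-up of the stale entry.
(dosync (commute @#'clojure.core/*loaded-libs* disj 'greeter.temp))
```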
Footnotes
1. Note that the names of the anonymous function classes can be different each time you compile.
2. Unsurprisingly, symbols are immutable and annotating one with metadata generates a new symbol.
3. In particular, it is a ConcurrentHashMap, a concurrency-friendly implementation of a map from Java. It is not a persistent data structure, but it supports concurrent reads and a limited degree of concurrent writes.
Comments
- On Tuesday, 21 Jan 2014 12:57, Stuart Sierra wrote the following:
  Metadata is usually attached to symbols when they are defined. If you look at the macroexpansion of `ns` with *print-meta* set to true, you can see that the symbol's metadata is constructed three times, including the transaction that commutes *loaded-libs*.
- On Tuesday, 21 Jan 2014 13:15, Daniel Solano Gómez wrote the following:
  Hello, Stuart,
  Thanks for the tip. I wasn't aware of *print-meta*, and I see now that using it helps me a lot. I'll need to update this post or write a new post to reflect this.