Sunday, 26 January 2014

Book review: Boost C++ Application Development Cookbook

Although I write a lot about Clojure, I use C++ and Boost for the majority of my work. As such, when Packt offered me a review copy of Boost C++ Application Development Cookbook (sample chapter), I gladly took them up on the offer.

About the book

As you may infer from the word ‘cookbook’ in the title, this is not a comprehensive Boost reference or a book you expect to read from cover to cover. It consists of scores of recipes, all of which follow the same formula:

  1. An introduction to the problem to solve.
  2. A brief statement of the prerequisite knowledge for the solution.
  3. A step-by-step walkthrough on how to solve the problem.
  4. A brief explanation as to how/why the solution works.
  5. Discussion about comparable/related functionality in C++11 or in other parts of Boost.
  6. Pointers to related recipes, external resources, and related Boost documentation.

These recipes are all grouped together into chapters, with each chapter having a general topic such as resource management or multithreading.

The intended audience for this book is experienced C++ developers who may not be familiar with all of Boost’s functionality and how it compares with C++11.

Highlights

There are a lot of things that I like about this book, in particular its emphasis on C++11 and the way it introduces a topic and then points you to resources for more in-depth learning.

Though it’s been out for a couple of years now, C++11 is still a relatively new standard and it takes time for programmers and programs to adopt it. Many of the new library features in C++11 are based on Boost libraries, and some Boost libraries exist to help ‘backport’ new language features to older versions of C++. The Cookbook does a very good job of letting the reader know whether C++11 has the same or similar features as Boost and how they differ.

The other thing I really enjoyed about this book is how it gently introduces the reader to Boost. There are a lot of Boost libraries, and the quality of the official documentation varies from very good to cryptic. In the past, I have avoided some libraries simply because I could never figure out how to even get started using them. This book can make some of these more accessible by giving me a simple example from which I can get a toehold. From that point, I can start to make sense of the documentation.

Room for improvement

No book is perfect, and there are a couple of ways in which this book could be more useful:

  1. It is not exhaustive. Granted, the number of Boost libraries is enormous, and some of them have limited applicability. Fortunately, the Cookbook does a good job of covering the most useful libraries.
  2. It only barely touches on the situation where you have to deal with different Boost versions. Some Boost libraries have source and ABI incompatibilities between versions, and it can sometimes be a bit of a nightmare to write code that has to support different versions of Boost. It would have been nice to hear whether the author had any insights on how to handle that issue.

Concluding thoughts

Boost C++ Application Development Cookbook is definitely worth considering if you are a C++ developer who uses or would like to use Boost. It’s a good reference to have handy when you find yourself in a situation where you think: ‘There has got to be a library for this.’

Friday, 24 January 2014

How Clojure works: more on namespace metadata

In the post How Clojure works: namespace metadata, I commented on how the metadata seemed to be missing from the following macro-expansion of ns:

(ns greeter.hello
  "A simple namespace, worth decompiling."
  {:author "Daniel Solano Gómez"})

; macro-expands once to (with some cleanup):
(do
  (in-ns 'greeter.hello)
  (with-loading-context (refer 'clojure.core))
  (if (.equals 'greeter.hello 'clojure.core)
    nil
    (do
      (dosync 
        (commute @#'*loaded-libs* 
                 conj 
                 'greeter.hello)) 
      nil)))

*print-meta*

Stuart Sierra pointed out in a comment that if we set *print-meta* to true, the metadata actually shows up three times in the macro-expansion:

(do
  (in-ns (quote ^{:author "Daniel Solano Gómez",
                  :doc "A simple namespace, worth decompiling."}
                greeter.hello))
  (with-loading-context (refer 'clojure.core))
  (if (.equals (quote ^{:author "Daniel Solano Gómez",
                        :doc "A simple namespace, worth decompiling."}
                      greeter.hello)
               'clojure.core)
    nil
    (do
      (dosync (commute @#'*loaded-libs*
                       conj
                       (quote ^{:author "Daniel Solano Gómez",
                                :doc "A simple namespace, worth decompiling."}
                              greeter.hello)))
      nil)))

A couple of observations:

  1. While it was obvious that Clojure wasn't losing the metadata, now we can actually see how it gets processed.

  2. Even though the metadata is expanded three separate times, it only shows up twice in the compiled result. Apparently, when compiling a particular class, the compiler keeps track of which symbols are being used and deduplicates them.

This last point got me thinking: how sensitive is the compiler to the symbols and metadata it encounters?

Modifying the macro-expansion

To find out how sensitive the compiler is to symbols and their metadata, we can replace the ns form above with its macro-expansion and modify it just slightly:

(do
  (in-ns (quote ^{:author "Daniel Solano Gómez",
                  :doc "A simple namespace, worth decompiling."}
                greeter.hello))
  (clojure.core/with-loading-context (clojure.core/refer 'clojure.core))
  (if (.equals 'greeter.hello 'clojure.core)
    nil        
    (do               
      (dosync (commute @#'clojure.core/*loaded-libs*
                       conj
                       (quote ^{:author "Daniel Solano Gómez",
                                :doc "A simple namespace, worth decompiling."}
                              greeter.hello)))
      nil)))                  

The only change is in the comparison on line 6 where we have removed the metadata from the greeter.hello symbol. Functionally, this has no effect as metadata doesn't affect equality. However, does this change the generated code?

Examining the impact

As a matter of fact, it does. We were careful to make sure that the change affects only the greeter.hello__init class. Just looking at the class signature, we can see that this change made an impact:

package greeter;

import clojure.lang.*;

public class hello__init {
  public static final Var const__0;
  public static final AFn const__1;
  public static final AFn const__2;
  public static final AFn const__3;

  static {}

  public static void load();
  public static void __init0();
}

There is now an additional AFn class constant. When we look at the decompiled __init0 method, we can see exactly what has changed:

public static void __init0() {
  const__0 = (Var)RT.var("clojure.core", "in-ns");
  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "doc");
  meta[1] = "A simple namespace, worth decompiling";
  meta[2] = RT.keyword(null, "author");
  meta[3] = "Daniel Solano Gómez";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__1 = (AFn)iobj.withMeta(metaMap);
  const__2 = (AFn)Symbol.intern(null, "greeter.hello");
  const__3 = (AFn)Symbol.intern(null, "clojure.core");
}

These changes include:

  • On lines 5-8, the order of the metadata map entries was changed. I don't think it's a significant change, but it is a change nonetheless;
  • On line 11, const__2 now holds a version of the symbol greeter.hello without the metadata; and
  • On line 12, const__3 holds the reference to clojure.core, which used to be in const__2.

When we examine the decompiled output of load(), we see that, as expected, the version of the greeter.hello symbol that has the metadata is used for the in-ns call and the version without the metadata is used in the comparison to clojure.core:

public static void load() {
  // (in-ns 'greeter.hello)
  IFn inNs = (IFn)const__0.getRawRoot();
  inNs.invoke(const__1); // version with metadata

  // (with-loading-context (refer 'clojure.core))
  IFn loading4910auto = (IFn)new greeter.hello$loading__4910__auto__();
  loading4910auto.invoke();

  // (if (.equals 'greeter.hello 'clojure.core)
  //   nil
  //   (do
  //     (LockingTransaction/runInTransaction (fn* …))
  //     nil))
  Symbol greeterHello = (Symbol)const__2; // version without metadata
  if (greeterHello.equals(const__3)) {
    return;
  } else {
    Callable callable = (Callable)new greeter.hello$fn__17();
    LockingTransaction.runInTransaction(callable);
    return;
  }
}

Closing thoughts

First, a big thanks to Stuart Sierra for the tip about *print-meta*; it has been really helpful.

Second, examining the Compiler source code, it becomes a bit clearer what's going on: as the compiler encounters constants, it stores them in a vector to be emitted later. Additionally, it ensures that constants are not duplicated by using an IdentityHashMap, which relies on identity rather than equality. As such, we can see how the two symbols (with and without metadata) would be considered different.

However, what's not entirely clear is how, in the original macro-expansion, the compiler knows that the two symbols with metadata are identical. I spent some time studying the compiler source, but it's somewhat hard to follow. I could probably use a debugger to trace its execution, but that's an exercise for another day.
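To make the identity-versus-equality point concrete, here is a minimal sketch in Java. It assumes a hypothetical PoolSym class whose equality ignores metadata (as with Clojure symbols) and a hypothetical ConstantPool that, in the spirit of the IdentityHashMap in the Compiler, allocates a new slot only for a reference it has not seen before. It is an illustration, not the Compiler's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class ConstantPoolDemo {
    public static void main(String[] args) {
        PoolSym withMeta = new PoolSym("greeter.hello",
                Collections.singletonMap("doc", "A simple namespace, worth decompiling."));
        PoolSym plain = new PoolSym("greeter.hello", Collections.<String, String>emptyMap());

        ConstantPool pool = new ConstantPool();
        int a = pool.register(withMeta); // first sighting: new slot
        int b = pool.register(withMeta); // same reference: slot reused
        int c = pool.register(plain);    // equal, but a different object: new slot

        // prints: 0 0 1 equal=true
        System.out.println(a + " " + b + " " + c + " equal=" + withMeta.equals(plain));
    }
}

// Hypothetical symbol: equality and hashing use only the name, never the metadata.
final class PoolSym {
    final String name;
    final Map<String, String> meta;

    PoolSym(String name, Map<String, String> meta) {
        this.name = name;
        this.meta = meta;
    }

    @Override public boolean equals(Object o) {
        return o instanceof PoolSym && ((PoolSym) o).name.equals(name);
    }

    @Override public int hashCode() { return name.hashCode(); }
}

// Hypothetical constant pool that deduplicates by reference identity, not equals().
final class ConstantPool {
    private final List<Object> constants = new ArrayList<>();
    private final IdentityHashMap<Object, Integer> ids = new IdentityHashMap<>();

    // Returns the slot for o, allocating a new one only for an unseen reference.
    int register(Object o) {
        Integer id = ids.get(o);
        if (id != null) return id;
        int slot = constants.size();
        constants.add(o);
        ids.put(o, slot);
        return slot;
    }

    int size() { return constants.size(); }
}
```

Registering the same object twice reuses its slot, while an equal but distinct symbol gets a slot of its own, which is consistent with the extra const__2 we saw in hello__init.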

Monday, 20 January 2014

How Clojure works: namespace metadata

In the first How Clojure works post, we examined how a Clojure namespace bootstraps itself. In particular, we saw how beguiling the following program can be.

(ns greeter.hello)

This program actually ends up creating three classes, including two anonymous function classes, each with a static initializer and a handful of constants.

Although I had promised to look at how def works, I'd like to first add a bit more to our namespace declaration. Let's add some metadata:

(ns greeter.hello
  "A simple namespace, worth decompiling."
  {:author "Daniel Solano Gómez"})

We have added a namespace docstring as well as an attribute map that will be added to the namespace metadata. What do you think will be the result?

Anticipating the changes

Last time, we saw that ns is a macro that actually does quite a bit. So, let's expand it once (and clean up the result so that it can be read):

(do
  (in-ns 'greeter.hello)
  (with-loading-context (refer 'clojure.core))
  (if (.equals 'greeter.hello 'clojure.core)
    nil
    (do
      (dosync 
        (commute @#'clojure.core/*loaded-libs* 
                 conj 
                 'greeter.hello)) 
      nil)))

Well, that's interesting. It's not any different than what we had before. Where did the metadata go? Is it possible that it's all lost? That's not likely. Further macro-expansion won't help, so let's start decompiling.

Edit: As we see in the follow-up entry, using *print-meta* allows us to see the metadata.

Decompilation overview

When we look at the list of generated classes, we find the same three generated classes as before¹:

  • greeter.hello__init
  • greeter.hello$fn__17
  • greeter.hello$loading__4910__auto__

We are still not seeing anything new, so it's time to break out the decompiler and see what's going on at a deeper level. Let's start with the namespace class, greeter.hello__init.

greeter.hello__init

The class signature of greeter.hello__init hasn't changed:

package greeter;
import clojure.lang.*;

public class hello__init {
  static {}

  public static final Var const__0;
  public static final AFn const__1;
  public static final AFn const__2;

  public static void load();
  public static void __init0();
}

However, if we examine the decompiled code, we find some changes to the __init0 method, so let's take a closer look at that.

__init0()

Examining the new content of the __init0 method, we begin to see what's going on:

static void __init0() {
  const__0 = (Var)RT.var("clojure.core", "in-ns");
  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "author");
  meta[1] = "Daniel Solano Gómez";
  meta[2] = RT.keyword(null, "doc");
  meta[3] = "A simple namespace, worth decompiling";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__1 = (AFn)iobj.withMeta(metaMap);
  const__2 = (AFn)Symbol.intern(null, "clojure.core");
}

As before, const__0 refers to the clojure.core/in-ns var and const__2 refers to the clojure.core symbol. The big difference here is that Clojure is no longer storing the greeter.hello symbol it creates. Instead, it creates that symbol, 'adds'² the metadata to the symbol, and stores the result in const__1.
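Because symbols are immutable, ‘adding’ metadata really means creating a new symbol that carries it. Here is a minimal sketch of that behaviour in Java, using a hypothetical Sym class loosely modelled on clojure.lang.Symbol (not its real code): withMeta never touches the original, and equality ignores metadata.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class WithMetaDemo {
    public static void main(String[] args) {
        Sym plain = Sym.create("greeter.hello");
        Map<String, Object> meta = new HashMap<>();
        meta.put("author", "Daniel Solano Gómez");
        meta.put("doc", "A simple namespace, worth decompiling.");

        Sym annotated = plain.withMeta(meta);
        System.out.println(plain == annotated);     // false: a new object
        System.out.println(plain.equals(annotated)); // true: metadata is ignored
        System.out.println(plain.meta().isEmpty());  // true: the original is untouched
    }
}

// Hypothetical immutable symbol; all fields are final and never reassigned.
final class Sym {
    private final String name;
    private final Map<String, Object> meta;

    private Sym(String name, Map<String, Object> meta) {
        this.name = name;
        // Defensive, read-only copy so callers cannot mutate our metadata later.
        this.meta = Collections.unmodifiableMap(new HashMap<>(meta));
    }

    static Sym create(String name) {
        return new Sym(name, Collections.<String, Object>emptyMap());
    }

    // Returns a fresh symbol with the given metadata instead of mutating this one.
    Sym withMeta(Map<String, Object> newMeta) {
        return new Sym(name, newMeta);
    }

    Map<String, Object> meta() { return meta; }

    String getName() { return name; }

    @Override public boolean equals(Object o) {
        return o instanceof Sym && ((Sym) o).name.equals(name);
    }

    @Override public int hashCode() { return name.hashCode(); }

    @Override public String toString() { return name; }
}
```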

This explains, to some extent, where the metadata went. It has been preserved by the compiler, but how can the Clojure runtime access the metadata? The greeter.hello__init class doesn't implement IMeta. It seems unlikely that the runtime would scour the class constants of loaded namespace classes looking for metadata.

Clearly, there is more to investigate. Let's take a look at the greeter.hello$loading__4910__auto__ class next.

greeter.hello$loading__4910__auto__

This is the class that implements (with-loading-context (refer 'clojure.core)). It hasn't changed as a result of the new metadata, so let's move on to the last generated class.

greeter.hello$fn__17

This is the anonymous function class that registers the namespace with Clojure. It effectively implements the following Clojure code:

(commute @#'clojure.core/*loaded-libs* 
         conj 
         'greeter.hello)

Decompiling the class, we see that it hasn't changed much. As with greeter.hello__init, the class signature is identical. In this case, the implementation of the static initialiser differs:

static {
  const__0 = (Var)RT.var("clojure.core", "commute");
  const__1 = (Var)RT.var("clojure.core", "deref");
  const__2 = (Var)RT.var("clojure.core", "*loaded-libs*");
  const__3 = (Var)RT.var("clojure.core", "conj");
  IObj iobj = (IObj)Symbol.intern(null, "greeter.hello");
  Object[] meta = new Object[4];
  meta[0] = RT.keyword(null, "author");
  meta[1] = "Daniel Solano Gómez";
  meta[2] = RT.keyword(null, "doc");
  meta[3] = "A simple namespace, worth decompiling";
  IPersistentMap metaMap = (IPersistentMap)RT.map(meta);
  const__4 = (AFn)iobj.withMeta(metaMap);
}

As before, the first four class constants refer to the vars for clojure.core/commute, clojure.core/deref, clojure.core/*loaded-libs*, and clojure.core/conj. For the fifth class constant, instead of storing the symbol greeter.hello directly, it adds the metadata to the symbol before storing it in the class constant. So what are the consequences of this?

Well, when invoke() is called on this anonymous function, it ensures that clojure.core/*loaded-libs* will contain the symbol that contains the metadata. So, this must be where the namespace metadata comes from, right?

Digging deeper

At this point in my investigation, I was a little bit confused. At first, I thought that the namespace metadata must come from the *loaded-libs* var, but that's just a ref to a sorted set of symbols. However, if I want to get the metadata from a namespace at the REPL, I use (meta (find-ns 'greeter.hello)), and the type of the object returned by find-ns is a Namespace instance, not a Symbol. This got me thinking: what is the purpose of *loaded-libs* and where is the Namespace instance created?

The purpose of *loaded-libs*

*loaded-libs* is a private var declared in core.clj. You can get its content, a sorted set of symbols, via the loaded-libs function. It is used indirectly by require and use to keep track of what namespaces have been loaded. For example, when you use require without :reload or :reload-all, the presence of the namespace name symbol in *loaded-libs* will keep the namespace from being reloaded.

When using :reload-all, Clojure uses an initially empty, thread-local binding of *loaded-libs*. This allows all dependencies of the desired library to be reloaded once, and the resulting set of loaded namespace name symbols is added to the root binding of *loaded-libs*.
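The bookkeeping described above can be sketched in Java with a hypothetical LibLoader class. This is a simplified model, not Clojure's load-lib/load-all code: it assumes a plain TreeSet for the root value (ignoring the ref and transaction), uses a ThreadLocal for the binding, and records each ‘load’ in a log instead of actually loading files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class LoadedLibsDemo {
    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("app.core", Collections.singletonList("app.util"));
        LibLoader loader = new LibLoader(deps);

        loader.require("app.core");          // loads app.util, then app.core
        loader.require("app.core");          // already recorded: nothing is loaded
        loader.requireReloadAll("app.core"); // every dependency is reloaded once

        System.out.println(loader.loadLog);  // [app.util, app.core, app.util, app.core]
        System.out.println(loader.loaded()); // [app.core, app.util]
    }
}

// Hypothetical model of the *loaded-libs* bookkeeping.
final class LibLoader {
    private final SortedSet<String> root = new TreeSet<>();                     // root value
    private final ThreadLocal<SortedSet<String>> binding = new ThreadLocal<>(); // thread-local binding
    private final Map<String, List<String>> deps;
    final List<String> loadLog = new ArrayList<>();

    LibLoader(Map<String, List<String>> deps) { this.deps = deps; }

    private SortedSet<String> current() {
        SortedSet<String> bound = binding.get();
        return bound != null ? bound : root;
    }

    // Plain require: a library already in the current set is skipped.
    void require(String lib) {
        if (current().contains(lib)) return;
        for (String dep : deps.getOrDefault(lib, Collections.<String>emptyList())) {
            require(dep);
        }
        loadLog.add(lib);   // stands in for actually loading the file
        current().add(lib); // stands in for the commute in the ns expansion
    }

    // :reload-all: bind an empty set so each dependency reloads once,
    // then merge whatever was loaded into the enclosing set.
    void requireReloadAll(String lib) {
        SortedSet<String> previous = binding.get();
        SortedSet<String> fresh = new TreeSet<>();
        binding.set(fresh);
        try {
            require(lib);
        } finally {
            if (previous == null) binding.remove(); else binding.set(previous);
        }
        current().addAll(fresh);
    }

    SortedSet<String> loaded() { return Collections.unmodifiableSortedSet(root); }
}
```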

This means that the metadata on the symbol stored in *loaded-libs* is not the metadata we get from the namespace object. For that, we'll have to take a closer look at the metadata attached to the symbol at greeter.hello__init/const__1.

Another look at greeter.hello__init/load

Looking back at greeter.hello__init, the namespace name symbol with metadata is stored in a class constant, const__1. The only place where this constant is used is in the load() method, which is decompiled as follows:

public static void load() {
  // (in-ns 'greeter.hello)
  IFn inNs = (IFn)const__0.getRawRoot();
  inNs.invoke(const__1);

  // (with-loading-context (refer 'clojure.core))
  IFn loading4910auto = (IFn)new greeter.hello$loading__4910__auto__();
  loading4910auto.invoke();

  // (if (.equals 'greeter.hello 'clojure.core)
  //   nil
  //   (do
  //     (LockingTransaction/runInTransaction (fn* …))
  //     nil))
  Symbol greeterHello = (Symbol)const__1;
  if (greeterHello.equals(const__2)) {
    return;
  } else {
    Callable callable = (Callable)new greeter.hello$fn__17();
    LockingTransaction.runInTransaction(callable);
    return;
  }
}

As we see here, there are two places where this constant is used:

  1. In lines 3-4, it is used as the argument for in-ns.
  2. In lines 15-16, it is used in a comparison to 'clojure.core.

In the second case, the metadata has no effect, but what of the first?

A closer look at in-ns

in-ns is a bit special. Unlike most of clojure.core, it is not defined in core.clj. Instead, it is constructed within RT.java. Its value is actually an anonymous AFn implementation also defined in RT.java. This implementation is fairly simple, and the noteworthy bit is that the symbol that is passed to in-ns is further passed to the static method clojure.lang.Namespace/findOrCreate.

[Figure: class diagram of clojure.lang.Namespace]

Namespace contains a static member called namespaces, which is a map³ of namespace name symbols to Namespace object instances. When findOrCreate is called and there is no mapping for the symbol yet, a new Namespace instance is created and inserted into the map.

The Namespace class extends clojure.lang.AReference, which holds metadata and indirectly implements clojure.lang.IMeta. When it is constructed, a Namespace takes the metadata of its namespace name symbol as its own.

At last, we know how a namespace gets its metadata. Looking at the implementation of find-ns, we see that it just calls Namespace/find, which merely does a lookup in the namespaces map.
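Here is a minimal sketch of that registry in Java, assuming hypothetical NsSymbol and Ns classes rather than the real clojure.lang types. It shows a find-or-create over a ConcurrentHashMap that copies the name symbol's metadata into a new namespace, and a find that is a plain map lookup. Because symbol equality ignores metadata, a bare symbol finds the namespace created from the annotated one.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NamespaceDemo {
    public static void main(String[] args) {
        NsSymbol named = new NsSymbol("greeter.hello",
                Collections.singletonMap("doc", "A simple namespace, worth decompiling."));
        Ns created = Ns.findOrCreate(named); // roughly what in-ns does

        // A symbol without metadata still finds it (roughly what find-ns does).
        Ns found = Ns.find(new NsSymbol("greeter.hello", Collections.<String, String>emptyMap()));
        System.out.println(created == found);        // true
        System.out.println(found.meta().get("doc")); // A simple namespace, worth decompiling.
    }
}

// Hypothetical symbol: equality and hashing use only the name, not the metadata.
final class NsSymbol {
    final String name;
    final Map<String, String> meta;

    NsSymbol(String name, Map<String, String> meta) {
        this.name = name;
        this.meta = meta;
    }

    @Override public boolean equals(Object o) {
        return o instanceof NsSymbol && ((NsSymbol) o).name.equals(name);
    }

    @Override public int hashCode() { return name.hashCode(); }
}

// Hypothetical, simplified model of the static registry in clojure.lang.Namespace.
final class Ns {
    private static final ConcurrentHashMap<NsSymbol, Ns> namespaces = new ConcurrentHashMap<>();

    private final NsSymbol name;
    private final Map<String, String> meta;

    private Ns(NsSymbol name) {
        this.name = name;
        this.meta = name.meta; // the namespace adopts its name symbol's metadata
    }

    static Ns findOrCreate(NsSymbol name) {
        Ns existing = namespaces.get(name);
        if (existing != null) return existing;
        Ns fresh = new Ns(name);
        // putIfAbsent returns the existing value if another thread registered first.
        Ns winner = namespaces.putIfAbsent(name, fresh);
        return winner == null ? fresh : winner;
    }

    static Ns find(NsSymbol name) { return namespaces.get(name); }

    NsSymbol name() { return name; }

    Map<String, String> meta() { return meta; }
}
```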

Parting thoughts

  1. If the purpose of *loaded-libs* is primarily to keep track of what namespaces have been loaded, does it really need metadata? Metadata doesn't affect the equality of symbols. Arguably, adding metadata to symbols in *loaded-libs* is a waste of memory.
  2. One interesting finding is that at the very heart of Clojure is a bit of mutable state. Keeping track of loaded libraries uses Clojure's concurrency utilities and persistent data structures, but namespaces rely on a Java concurrent collection.
  3. All new namespaces are initialised with a default set of import mappings, mostly classes from java.lang. This has two main implications:
    1. The only thing special about classes from java.lang is that their mappings are hard-coded. If a new java.lang class were added to Java, it wouldn't get imported by default until RT.java was updated with a new mapping.
    2. Imports are mappings of symbols to class objects, and there is not a separate set of mappings for Clojure vars.
  4. Since Clojure keeps track of what's loaded in two different places, it's possible to mess up the environment in strange ways. In particular, remove-ns does not clear a symbol from *loaded-libs*, meaning that it would be possible to get Clojure into a state where it thinks a namespace is loaded when actually it is not.
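To illustrate point 4, here is a minimal sketch in Java of two independent records drifting apart. MiniRuntime is a hypothetical class, not Clojure code: its namespaces map stands in for Namespace's registry, its loadedLibs set stands in for *loaded-libs*, and require consults only the latter, as a plain require does.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class DriftDemo {
    public static void main(String[] args) {
        MiniRuntime rt = new MiniRuntime();
        rt.require("greeter.hello");
        rt.removeNs("greeter.hello");
        rt.require("greeter.hello"); // skipped: still recorded as loaded

        // prints: namespace present: false, recorded as loaded: true, loads: 1
        System.out.println("namespace present: " + rt.namespaces.containsKey("greeter.hello")
                + ", recorded as loaded: " + rt.loadedLibs.contains("greeter.hello")
                + ", loads: " + rt.loadCount);
    }
}

// Hypothetical model with two independent records of what is loaded.
final class MiniRuntime {
    final ConcurrentHashMap<String, Object> namespaces = new ConcurrentHashMap<>(); // like Namespace's map
    final Set<String> loadedLibs = new TreeSet<>();                                // like *loaded-libs*
    int loadCount = 0;

    // Like require without :reload, this checks only loadedLibs.
    void require(String lib) {
        if (loadedLibs.contains(lib)) return;
        loadCount++;
        namespaces.putIfAbsent(lib, new Object()); // loading creates the namespace
        loadedLibs.add(lib);                       // and records the library
    }

    // Like remove-ns, this forgets the namespace but leaves loadedLibs alone.
    void removeNs(String lib) {
        namespaces.remove(lib);
    }
}
```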

Footnotes

  1. Note that the names of the anonymous function classes can be different each time you compile.
  2. Unsurprisingly, symbols are immutable and annotating one with metadata generates a new symbol.
  3. In particular, it is a ConcurrentHashMap, a concurrency-friendly implementation of a map from Java. It is not a persistent data structure, but it allows concurrent reads and a limited degree of concurrent writes.

Saturday, 18 January 2014

Dart vs. ClojureScript: first impressions

Update: I have spent some more time with both Dart and ClojureScript since I wrote this, and I have written more about this in Dart vs. ClojureScript: two weeks later.

Dart is a relatively new programming language created by Google for client-side web development. I've started looking into it a bit as I am planning a local Dart Flight School event as one of the organisers for GDG Houston. In particular, I spent about an hour going through the Darrrt code lab. Having done this, I was curious to see how the same application coded in ClojureScript would compare.

First impressions

Let me start by saying I haven't done much JavaScript programming since about 1996, and I am generally more comfortable with the server side of web programming than the client side. This makes a big difference in how I approach Dart and ClojureScript, as I am relatively unfamiliar with the underlying platform. Someone with a lot of experience with JavaScript and client-side web programming may approach these languages differently.

Dart

My first impression of Dart is that it is very Java-like. Its features include:

  • static typing
  • object-oriented
  • Java-like syntax
  • generics
  • exceptions

One of the major ways in which Dart differs from Java is that it has lexical closures and first-class functions, which are welcome additions. Additionally, I think Dart has pretty good documentation and a fairly decent standard library.

ClojureScript

This isn't my first time using ClojureScript; nonetheless, it's still different enough in tooling and language details that it takes me a little bit more time to get up and running compared to a traditional Clojure project. However, the biggest problem that I ran into wasn't the language, but the libraries.

With Dart, things like making asynchronous HTTP requests and manipulating the DOM are baked right into the standard library. With ClojureScript, it's not quite as straightforward. Do I use JavaScript primitives, the Google Closure library, or look for a ClojureScript library that wraps Closure or raw JavaScript? In the end, I used ClojureScript libraries such as Domina and storage-atom.

Some comparisons

Even within this short code lab, there are a few places where the differences between the two languages were remarkable.

Serialisation

The code lab includes both storing bits of information in HTML5 local storage and loading data from an external file. In the case of Dart, I found it to be somewhat painful as it required transforming things into or from JSON.

For example, a Dart PirateName class requires the following JSON read/write code:

import 'dart:convert';
import 'dart:html';

final String TREASURE_KEY = 'pirateName';

class PirateName {
  String _firstName;
  String _appellation;

  PirateName.fromJSON(String jsonString) {
    Map storedName = JSON.decode(jsonString);
    _firstName = storedName['f'];
    _appellation = storedName['a'];
  }

  String get jsonString => '{ "f": "$_firstName", "a": "$_appellation" }';
}

PirateName getBadgeNameFromStorage() {
  String storedName = window.localStorage[TREASURE_KEY];
  if (storedName != null) {
    return new PirateName.fromJSON(storedName);
  } else {
    return null;
  }
}

In comparison, with ClojureScript we can use the reader and a map instead of an object. Combining this with the storage-atom library makes reading from and writing to local storage trivial.

(def storage (local-storage (atom {}) :pirate-storage))

; write to local storage
(swap! storage assoc :pirate-name name)

; read from local storage
(:pirate-name @storage)

Likewise, in Dart, reading a set of names and appellations from an external file requires decoding JSON:

class PirateNames {
  static List names = [];
  static List appellations = [];

  static void _parsePirateNamesFromJSON(String jsonString) {
    Map pirateNames = JSON.decode(jsonString);
    names = pirateNames['names'];
    appellations = pirateNames['appellations'];
  }
}

In Clojure, we can just use read-string.

Built-in functions

While Dart's library has a fairly decent set of functions for doing things like interacting with the DOM, it doesn't have some of the functionality that ClojureScript has built in.

For example, compare (rand-nth names) to:

import 'dart:math';

final Random indexGen = new Random();

name = names[indexGen.nextInt(names.length)];

Asynchronous code

With ClojureScript, you can use core.async as described by David Nolen to write asynchronous code that reads logically. Dart has no comparable functionality (though it does have functionality to support asynchronous code in general). While using core.async for this code lab is probably unnecessary, I can definitely see how it could make a more complex application easier to understand and maintain.

Final thoughts

I can definitely see how Dart is a big improvement over JavaScript. Nonetheless, I think that ClojureScript is fundamentally a more powerful language. Its shortcomings in comparison to Dart seem to lie mainly in its libraries, which can be easily written by the community. As such, while I have found Dart interesting, and will continue to learn more about it, at this time I would lean more towards using ClojureScript in a project.