Here at RJMetrics we recently added 10% time to our development cycle. I decided to do my project in Clojure because I needed easy concurrency and wanted to learn a new skill. Since I had never worked in a Lisp-based language before, I found it a little confusing at first that Clojure has two ways to resolve a symbol’s scope. I found many posts that helped explain the difference, but few took a beginner’s approach. I hope this post will clear up any confusion that new Clojurians may have.
A “scope” in computer science is a context that restricts the visibility of variables (or symbols in Clojure) to a smaller piece of code. Without any scopes, every variable would be global. Can you imagine if every time you used the variable x or i you had to worry about interfering with another piece of code? Luckily, most sane languages provide a means of creating and restricting the scope of variables. Some languages (e.g. Clojure, Common Lisp, Perl) have two means of controlling scope: lexical (aka static) and dynamic.
These are a few basic Clojure terms you will need to know to understand the rest of this post.
- A symbol is similar to an identifier in other languages. Symbols are used to associate values with common names. In an imperative language like PHP you might say something like “$x = 5;”, after which the variable $x equals 5. In Clojure semantics I would say the symbol x is associated with the value 5. Clojure has an opinionated philosophy on separating identities and values.
- A value is just actual data, such as a number, string, function (code is data), vector, object, etc.
- A var is a way of tying a symbol to a potentially changing value; vars are created using the def special form. This will be covered in more depth when I cover dynamic scope.
The lexical scope of a symbol is the “textual region” (aka code block) where the symbol definition occurs, plus all child code blocks. You can imagine your code creating a tree of code blocks that the compiler can traverse. What defines a code block depends on the language. Lexical scoping is by far the most common among modern ALGOL-influenced languages (C, Java, etc.). When the compiler is trying to resolve a reference to a variable, it first looks in the innermost scope and continues up the levels of scope (code blocks) until it finds the declaration. Lexical scope is sometimes called static scope since all symbol references can be resolved at compile time, as opposed to dynamic scope, which needs runtime information. In C, for example, a variable i declared in a for loop is available inside that loop and inside a while loop nested within it, because the while loop is a child block of the for loop. However, i is not available outside of the for loop, since that code sits above the scope in which the variable was defined.
This is the scope you will be using in Clojure the vast majority of the time. The main difference is that in Clojure the scope blocks are declared explicitly by the programmer rather than implied by the syntax of the language. In Clojure, a lexical scope region is most commonly defined by a let block. In the code example below, x, y, and add are all symbols that are associated with the values 3, 5, and a function that adds all of its arguments, respectively (remember functions are first-class values).
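A minimal sketch of such a let block, using the names from the paragraph above. The var name let-result is only there so the value can be inspected afterwards; it is my own addition for illustration.

```clojure
;; x, y, and add only exist between the opening and closing parens of the let.
(def let-result
  (let [x 3
        y 5
        add (fn [& args] (apply + args))] ; a function is just another value
    (add x y)))

;; let-result is 8. Referring to x, y, or add out here would fail to compile,
;; because those symbols are not in scope outside the let block.
```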
The lexical environment is encapsulated by the parentheses (aka parens) surrounding the let block, which means that as soon as I close the let block, those symbols go back to whatever they referred to before it, if anything. Here’s a more complex example:
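What follows is a sketch of what such an example might look like, not a definitive listing. The names shadowing-demo and add-x are assumptions chosen for illustration. The first form shows an inner let shadowing an outer one, and the last form returns a function from inside a let.

```clojure
;; An inner let shadows the outer x; once the inner block closes,
;; x refers to the outer value again.
(def shadowing-demo
  (let [x 1]
    [(let [x 2] x) ; inside the inner let, x is 2
     x]))          ; back in the outer let, x is 1

;; A function created inside a let keeps access to that let's symbols.
(def add-x
  (let [x 1]
    (fn [n] (+ n x))))

;; Calling add-x here, outside of the let, still uses x = 1.
(def add-x-result (add-x 5))
```

Here shadowing-demo evaluates to [2 1] and add-x-result to 6.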
In this last example, while inside a let block, I create and return a function that takes one argument and adds it to the value of x. Since the code that created the function was within a let block, the value of x is 1 even if the function is called outside of the let. Because this function retains the scope it had at the time of creation, it is said to “close” over the lexical environment. This type of function can be referred to as a lexical closure, or just closure for short (the name Clojure is a play on this word).
Dynamic scope is another method of resolving symbol names. A dynamic scope environment is created using the binding macro, which syntactically looks like let, but has some big differences:
- You can’t create new symbols. Every symbol has to be associated with a previously initialized var.
- The initialized vars also have to be defined as “dynamic”.
- The new bindings affect the entire call stack, not just the immediate text region.
- The bindings that are created are pushed onto a thread-local stack and cannot be shared between multiple threads.
We can’t make up new symbols, so we have to shadow existing vars. This seems like a restricted version of let initially, but after a few examples the purpose of dynamic scope will hopefully become clear. We already know we can make new vars using def, and that these vars, as I mentioned before, act like constants. The exception to that rule is when they are defined as dynamic. Functions defined with defn are stored in a var, so they can also be declared as dynamic (the defn macro uses def internally). A var can be defined as dynamic simply by adding “:dynamic true” to its metadata. Let’s take a look at this next example:
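A minimal sketch of such an example. The root values 2 and 3, the rebound values 4 and 5, and the names of the result vars are arbitrary assumptions on my part. ^:dynamic is reader shorthand for the metadata map ^{:dynamic true}.

```clojure
;; Three dynamic vars: two numbers and a function.
(def ^:dynamic x 2)
(def ^:dynamic y 3)
(defn ^:dynamic sum-of-squares [a b]
  (+ (* a a) (* b b)))

;; Looks up x, y, and sum-of-squares every time it is called.
(defn sum-of-squares-for-x-and-y []
  (sum-of-squares x y))

;; No thread-local bindings yet, so the root values are used: 4 + 9.
(def root-result (sum-of-squares-for-x-and-y))

;; Push new thread-local bindings for x and y: 16 + 25.
(def bound-result
  (binding [x 4
            y 5]
    (sum-of-squares-for-x-and-y)))

;; The function var can be rebound too; here it just adds: 2 + 3.
(def rebound-fn-result
  (binding [sum-of-squares (fn [a b] (+ a b))]
    (sum-of-squares-for-x-and-y)))

;; After the binding blocks end, the root values are back.
(def after-result (sum-of-squares-for-x-and-y))
```

With these values the four results are 13, 41, 5, and 13.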
[NOTE: Before Clojure 1.3 all vars were implicitly dynamic, which means there is a lot of broken code on the internet (including the official Clojure website at the time of this writing).]
The vars x, y, and sum-of-squares are all declared to be dynamic at the top of the file. The function sum-of-squares-for-x-and-y calculates the sum of squares for the vars associated with the symbols x and y. Remember that x, y, and sum-of-squares refer to vars and not values, so the value of each var is looked up every time this function is run. When the Clojure runtime encounters a var, it first checks for any thread-local bindings; if it can’t find any, it returns the root value of the var, which is shared by all threads. So in the above example, when we first call sum-of-squares-for-x-and-y, all of the vars referenced in the function definition defer to their root binding. Next, we push new bindings onto our thread-local stack and call the same function, and now the function uses these new values and returns a different result. The key point here is that the entire call stack below the binding point is affected, not just the text within the binding block. Once we end the binding block, the bindings get popped off the stack and the vars return to their root values.
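The same lookup rule applies to functions that are created inside a binding block and called later. Here is a small sketch, assuming a dynamic var x with root value 1 and a hypothetical helper named make-adder:

```clojure
(def ^:dynamic x 1)

;; Returns a function that adds the var x to its argument.
;; The var is dereferenced when the returned function runs,
;; not when make-adder creates it.
(defn make-adder []
  (fn [n] (+ n x)))

;; The adder is created while x is bound to 10...
(def add-x
  (binding [x 10]
    (make-adder)))

;; ...but it is called after the binding has been popped,
;; so it sees the root value 1.
(def add-x-result (add-x 5))
```

Here add-x-result is 6 rather than 15.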
In the above example, we have a function that returns an adder function for the var x just like we did earlier using a let block. Since x is a var, the value is determined at the time the function is run, not when it is created. You cannot close over a dynamic scope! In the example below I got around this by lexically shadowing the var x with its current dynamic value.
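A sketch of that workaround, again assuming a dynamic var x with root value 1 and a hypothetical make-adder helper. The let copies the current dynamic value of x into a lexical local of the same name, and the returned function closes over that local.

```clojure
(def ^:dynamic x 1)

(defn make-adder []
  ;; The x on the right is the var's current value; the x on the left
  ;; is a new lexical symbol that shadows the var inside this let.
  (let [x x]
    (fn [n] (+ n x))))

;; Created while x is bound to 10, so the local x is 10.
(def add-x
  (binding [x 10]
    (make-adder)))

;; Called outside the binding, but the closure still holds 10.
(def add-x-result (add-x 5))
```

This time add-x-result is 15.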
So why would you ever want to use dynamic scope over lexical? Let’s take a look at how dynamic vars are used in the core Clojure library. The stdin and stdout streams, which many core I/O functions use (e.g. println, read), are held in the dynamic vars *in* and *out*. By naming convention, dynamic vars are wrapped in asterisks (aka earmuffs). Maybe you want your application to print its output to some log file instead of stdout. We can simply dynamically rebind the *out* var at the base of our application.
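A minimal sketch of that rebinding, assuming a hypothetical run-app function standing in for the real application and a log file named log.txt in the working directory:

```clojure
(require '[clojure.java.io :as io])

;; Stand-in for the real application entry point (hypothetical).
(defn run-app []
  (println "application started")
  (println "application finished"))

;; with-open closes (and flushes) the writer when the block exits.
(with-open [log (io/writer "log.txt")]
  (binding [*out* log]
    (run-app)))
```

Nothing inside run-app mentions the log file; it simply calls println, which looks up whatever *out* is currently bound to.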
Now the println function (and any other function that uses *out*) will print to log.txt. Let’s try something a little more complicated. Say we have a multi-threaded application and we don’t want the print output from the different threads to be interleaved. One way we could accomplish this is to have each thread print to its own log file. Since each thread maintains its own stack of bindings, this is trivial to do.
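A sketch of the per-thread log files described above. The worker function, the thread count of 3, and the file names thread-0.log, thread-1.log, and thread-2.log are all assumptions made for illustration.

```clojure
(require '[clojure.java.io :as io])

;; Each worker rebinds *out* on its own thread, so its output
;; cannot mix with the output of the other workers.
(defn run-worker [id]
  (with-open [log (io/writer (str "thread-" id ".log"))]
    (binding [*out* log]
      (println "worker" id "starting")
      (println "worker" id "done"))))

;; Clojure functions implement Runnable, so they can be handed to Thread.
(def threads
  (mapv (fn [id] (Thread. #(run-worker id)))
        (range 3)))

(doseq [t threads] (.start t))
(doseq [t threads] (.join t)) ; wait for every worker to finish
```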
Now each thread’s print statements will go into their own file. Note that it is rare to create Java threads directly like this in Clojure, since Clojure already has more expressive concurrency primitives (agents, futures, pmap, etc.). However, since those all use threads internally, these same binding rules apply, which can lead to unexpected results. Some other use cases of dynamic scope include the following:
- You can debug function calls by dynamically overriding a function so that it prints every time it gets called. Check out clojure/tools.trace for the excellent dotrace macro.
- The clojure/java.jdbc library uses a dynamic var *db* so that you don’t have to pass in a database connection to every function call. You can just use the with-connection macro to bind *db* to the database you want to use, and then perform any database queries within the body of the macro.
- It allows you to mock functions that have side-effects to make them more unit-testable.
- If you look at the Clojure core library you will find a lot of other use cases. For example, the current namespace is stored in *ns*, and you can use the in-ns function to switch it to some other namespace and then add functions to that namespace.
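As an illustration of the mocking use case from the list above, here is a small sketch. The functions send-email! and notify-user are hypothetical and exist only for this example; the “real” implementation just throws so that it is obvious when the mock is in effect.

```clojure
;; A side-effecting function, declared dynamic so tests can rebind it.
(defn ^:dynamic send-email! [address body]
  (throw (ex-info "no mail server available in this sketch" {:to address})))

;; The code under test calls send-email! through its var.
(defn notify-user [address]
  (send-email! address "Your report is ready")
  :notified)

;; In a test, record the calls instead of sending anything.
(def sent (atom []))

(def result
  (binding [send-email! (fn [address body]
                          (swap! sent conj [address body])
                          nil)]
    (notify-user "a@example.com")))
```

After this runs, result is :notified and @sent holds the single recorded call, without any mail having been sent.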
In general, when you think you need a dynamic var you probably don’t. Lexical scope is easier to understand because the programmer can determine the resolution of the symbols by looking at the code. Using dynamic vars also breaks referential transparency, which greatly reduces code readability and testability. However, Clojure is a practical language that allows the programmer to come up with creative and expressive solutions to problems, and it recognizes that sometimes using dynamic scope might be the most straightforward way to solve a problem.