Category Archives: Clojure

From Clojure to ArangoDB – Clarango Version 0.6 released

Clarango is the bridge between Clojure and the multi-model database ArangoDB. I just released version 0.6 of this driver, and here is an overview of the new features.

HTTP API compatibility updates

Some changes were necessary to make Clarango work with the ever-changing HTTP API of the latest ArangoDB version (2.3).

The admin namespace had to be removed completely because ArangoDB 2.3 no longer supports the former admin interface. Instead, a misc namespace was added to hold general-purpose methods in the future. Currently it has only one method, version, which retrieves ArangoDB version info.

ArangoDB’s HTTP API also has a new graph interface that is much more flexible than the old one. I started the migration, but due to lack of time a lot remains to be done before it is complete.

Helper Methods to test if a Database, Graph or Collection exists

In the database namespace you will now find the following methods to quickly check whether a database, collection or graph already exists: database-exists?, collection-exists? and graph-exists?. They work just as you would expect, taking a name as argument and returning true or false.

Getting a whole collection at once with delayed documents

Wouldn’t it be cool if you could just get a whole collection with all its content at once? That’s what I thought. But since collections can be arbitrarily large and every Clarango method call should only result in one HTTP request, I thought of this:

You can now call the method get-delayed-collection in the collection namespace and it will return a map associating the keys of every document in the collection with the document itself, represented as a delay.

That means you can retrieve all document keys like so:

(keys (collection/get-delayed-collection :my-collection :my-db))

To get the content of all documents one after another, you could do something like:

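(require '[clojure.pprint :refer [pprint]])

(let [delayed-collection (collection/get-delayed-collection :my-collection :my-db)]
  (doseq [k (keys delayed-collection)]
    ;; dereferencing (@) forces the delay, which loads this document
    (pprint @(get delayed-collection k))))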
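To see the idea behind get-delayed-collection without a running ArangoDB, here is a minimal sketch in plain Clojure. It is not Clarango's actual implementation: fetch-document is a hypothetical stand-in for the per-document HTTP request, and it counts its calls so you can see that nothing is fetched until a delay is dereferenced, and that each document is fetched at most once.

```clojure
;; Hypothetical stand-in for the per-document HTTP request: it only counts
;; how often it is called and returns a fake document.
(def fetch-count (atom 0))

(defn fetch-document [k]
  (swap! fetch-count inc)
  {:_key k :description (str "content of " k)})

;; Map every document key to a delay that fetches the document on first deref.
(defn delayed-collection [doc-keys]
  (into {} (map (fn [k] [k (delay (fetch-document k))])) doc-keys))

(def docs (delayed-collection ["a" "b" "c"]))

(keys docs)     ; listing the keys triggers no fetch
@fetch-count    ; => 0
@(get docs "b") ; => {:_key "b", :description "content of b"}
@(get docs "b") ; served from the delay's cache, no second fetch
@fetch-count    ; => 1
```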

This is an experimental feature; let me know what you think about it!

More Nesting possible

It’s now possible to nest more method calls in order to shorten expressions. Instead of doing everything in a strictly sequential style, you can, for example, create resources like this:

(clacore/with-db (database/create :my-db [])
  (document/create-with-key {:description "nice stuff"} :my-doc 
    (collection/create :my-collection)))
(graph/create :test-graph 
  (collection/create :people {"type" 2})
  (collection/create :friendships {"type" 3}))

So what would be the shortest possible way to create a document? I can think of several; one example would be:

(clacore/with-connection
  {:connection-url "http://localhost:8529/"
   :db-name (database/create "my-db" [])
   :collection-name (collection/create "my-collection")}
  (document/create-with-key {:description "nice stuff"} :my-doc))

This is as much an experimental feature as the last one, so let me know if you find it useful. I’m sure it’s convenient in some cases, but it also makes the method signatures less precise. To make this work, I’m storing the collection and database names in the metadata of the return values of their respective create methods. That means the Clarango methods now also accept maps as arguments, where before they only accepted keywords or strings. This may be confusing at times and can also make debugging harder, since Clarango can no longer simply throw an error if you pass a map instead of a resource name.
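The metadata trick can be sketched in a few lines of plain Clojure. This is a hypothetical illustration, not Clarango's code: create-collection-stub stands in for a create method, and resource-name is an assumed helper showing how a function might accept a keyword, a string, or a map that carries the name in its metadata.

```clojure
(defn create-collection-stub
  "Hypothetical stand-in for a create call: returns a response map and
  records the resource name in the map's metadata."
  [coll-name]
  (with-meta {:created true} {:resource-name (name coll-name)}))

(defn resource-name
  "Hypothetical helper: accepts a keyword, a string, or a map carrying
  :resource-name metadata."
  [x]
  (cond
    (keyword? x) (name x)
    (string? x)  x
    (and (map? x) (:resource-name (meta x))) (:resource-name (meta x))
    :else (throw (IllegalArgumentException.
                   (str "Cannot determine a resource name from " (pr-str x))))))

(resource-name :people)                               ; => "people"
(resource-name (create-collection-stub :friendships)) ; => "friendships"
```

A plain map without that metadata, such as {:created true}, falls through to the :else branch, which shows the debugging problem: the error only appears once the name lookup fails, not where the wrong map was passed in.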

To learn more about Clarango, check out the GitHub repository. You will find more examples there and also links to further documentation.

Btw: There is still a lot to be done to make Clarango a complete driver, and I can’t do it all on my own since the ArangoDB HTTP API is changing quite a lot. If you wish to contribute to Clarango, send me a message or just open a pull request!

Quick Overview: State, Transactions and Concurrency Primitives in Clojure

“What was the difference between delays and futures again…?” If you’ve ever found yourself asking that, you are in the right place! In this blog post I will give a short overview of the language constructs that Clojure offers for dealing with mutable state and concurrent behavior. This is not meant to be exhaustive at all; my main goal was to keep it short and concise.

Vars

To get a complete overview let’s start with vars:

(def somevar "some content")

Vars offer mutable state and are bound to a namespace. Dynamic vars can be rebound locally on a per-thread basis. I wrote about this in detail here. Changing a var’s root binding is not a good idea when you are running multiple threads. You probably already know about vars anyway, so let’s skip ahead to the more interesting parts…

Refs and Transactions

Refs are references to data whose modification is restricted: every change to a ref has to be made inside a transaction.

Define a ref:

> (def bowloffruits (ref ["apple"]))

To obtain the content of a ref, you dereference it. Dereferencing always returns a snapshot of the ref’s state at the time of dereferencing:

> @bowloffruits ;; or use the long version: (deref bowloffruits)
["apple"]

Now, to modify the ref we need to run a transaction. We do this with the dosync macro; inside it, alter applies a function to the ref’s current value, and ref-set replaces the value outright:

> (def emptybowl (ref []))
> (dosync (alter bowloffruits conj "banana"))
["apple" "banana"]
> (dosync 
    (alter emptybowl conj (first @bowloffruits)) 
    (ref-set bowloffruits (vec (rest @bowloffruits))))
> @emptybowl
["apple"]
> @bowloffruits
["banana"]

Using transactions to modify refs ensures that either all changes within the dosync block take effect or none of them do. Furthermore, if two transactions conflict because they run at the same time, one of them is automatically retried.
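The coordination guarantee pays off once several threads are involved. Below is a minimal bank-transfer sketch (account-a, account-b and transfer! are made-up names): 100 futures each move one unit between two refs, and because both alters run in one transaction, the total is never lost, even when conflicting transactions are retried.

```clojure
(def account-a (ref 100))
(def account-b (ref 0))

(defn transfer!
  "Moves amount from one ref to another in a single transaction, so any
  transaction reading both refs always sees a consistent total."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

;; Start 100 concurrent transfers of 1, then wait for all of them.
(run! deref (doall (repeatedly 100 #(future (transfer! account-a account-b 1)))))

[@account-a @account-b] ; => [0 100]
```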

Atoms

Atoms are encapsulated pieces of data just like refs, except that you don’t need a transaction to modify them. Use an atom if you want thread safety and atomic state changes for a single entity but don’t need coordination with other activities. If two conflicting changes are made to an atom at the same time, the first one to complete wins and the other one is retried against the new value. That is also why the function passed to swap! should be free of side effects: it may run more than once.

> (def food (atom "apples"))
> @food
"apples"
> (reset! food "carrots")
> (swap! food str " and peas")
> @food
"carrots and peas"

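To see the retry behavior at work, here is a minimal sketch (hits is a made-up name) in which 1000 futures increment the same atom concurrently. Because swap! retries on conflict, no increment is lost. Additional arguments to swap! are passed on to the update function.

```clojure
(def hits (atom 0))

;; 1000 concurrent increments; swap! retries conflicting updates.
(run! deref (doall (repeatedly 1000 #(future (swap! hits inc)))))
@hits            ; => 1000

;; Extra arguments are passed to the function: (+ current-value 10)
(swap! hits + 10) ; => 1010
```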
Agents

Agents are encapsulated pieces of data that are modified by functions sent to them. As with atoms, changes to an agent’s state are uncoordinated and happen independently of other agents (no transactions). Unlike with atoms, though, the modifications are always executed asynchronously in another thread.

> (def counter (agent 0))
> (send counter inc)
> (await counter) ;; wait until the sent action has run
> @counter
1

Apart from send, there is another function available to modify an agent’s state: send-off. The difference between the two is that send uses a fixed-size thread pool, while send-off uses a pool that grows as needed, so there is no fixed limit on the number of concurrent threads. That means you shouldn’t run blocking operations with send, because they can tie up the fixed pool, and other actions then have to wait for the blocking operation to finish before they can start. Before exiting a program, call shutdown-agents so that the agent thread pools shut down; otherwise the JVM may not exit right away.
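Here is a minimal sketch of the send-off case. slow-fetch is a hypothetical function that sleeps to simulate blocking I/O, which is exactly the kind of work that belongs on send-off rather than send. Actions sent from one thread to one agent run one at a time and in order, and await blocks until all actions dispatched so far have run.

```clojure
(def fetched (agent []))

(defn slow-fetch
  "Hypothetical blocking call (e.g. network I/O), simulated with a sleep."
  [acc id]
  (Thread/sleep 50)
  (conj acc id))

;; Blocking work goes through send-off, not send.
(doseq [id (range 3)]
  (send-off fetched slow-fetch id))

(await fetched) ; block until all actions sent so far have run
@fetched        ; => [0 1 2]
```

In a standalone program you would still call shutdown-agents before exiting.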

Futures

A future takes a body of one or more expressions, executes it in another thread and yields the value of the last expression. Creating a future returns immediately, so the calling thread is not blocked. Dereferencing the future, though, blocks the thread until the value is available.

> (def waitabit (future (Thread/sleep 5000) "That was worth waiting for!"))
> @waitabit
"That was worth waiting for!"

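If you don’t want to block indefinitely, deref also accepts a timeout in milliseconds and a fallback value, and future-done? tells you whether the future has finished. A minimal sketch (slow is a made-up name; the 100 ms timeout assumes the machine isn’t stalled for the full two seconds):

```clojure
(def slow (future (Thread/sleep 2000) :done))

;; Wait at most 100 ms, then fall back to :timed-out.
(deref slow 100 :timed-out) ; => :timed-out
(future-done? slow)         ; => false, still sleeping
@slow                       ; => :done, blocks for the remaining time
(future-done? slow)         ; => true
```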
Delays

A delay suspends a body of code until its value is demanded. When the delay is dereferenced for the first time, the code is executed once and the result is cached; every later dereference returns that same result without executing the code again.

> (def delayed-slurp (delay (slurp "http://www.peterfessel.com")))
> @delayed-slurp
...
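The "executed once" part is easy to verify with a counter. In this minimal sketch (evaluations and expensive are made-up names), the delay body bumps an atom each time it runs; force works like deref for delays.

```clojure
(def evaluations (atom 0))

(def expensive
  (delay
    (swap! evaluations inc) ; count how often the body actually runs
    :computed))

(realized? expensive) ; => false, nothing has run yet
@expensive            ; => :computed
(force expensive)     ; => :computed, cached result
@evaluations          ; => 1, the body ran exactly once
```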

Promises

Promises have characteristics similar to delays and futures. When dereferenced, a promise blocks the current thread until it is fulfilled and has a value to deliver. The difference from the constructs above is that a promise is not initialized with a body of code that will eventually produce the value. Instead, it is the user’s responsibility to deliver the value to the promise once it’s available:

> (def p (promise))
> (realized? p)
false
> (deliver p "keeping my promise")
> (realized? p)
true
> @p
"keeping my promise"

(Note that the realized? function can also be used with delays and futures.)
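Promises become more interesting when the consumer lives on another thread. In this minimal sketch (result and consumer are made-up names), a future blocks on the promise until the main thread delivers. A promise can be delivered only once: a second deliver has no effect and returns nil.

```clojure
(def result (promise))

;; The consumer blocks on @result until a value is delivered.
(def consumer (future (str "got " @result)))

(deliver result 42)
@consumer           ; => "got 42"

;; A second delivery is ignored; the first value stays.
(deliver result 99) ; => nil
@result             ; => 42
```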

Core.async channels

Clojure has some extra nice asynchronous features in the core.async library, which you need to add as a separate dependency if you want to use it. Core.async is mostly about channels. Channels are somewhat similar to promises: you put something in on one side and it comes out on the other. Yet channels are far more sophisticated:

> (require '[clojure.core.async :refer :all])
> (def jackie (chan))
> (thread (println "Some" (<!! jackie) "are what I was waiting for.")) 
> (>!! jackie "carrots")
Some carrots are what I was waiting for.

Here we define a channel. We start a thread and let it listen on one end of the channel (<!! reads from the channel and blocks until there is something to read). Once we put something on the channel via >!!, we get the output from the thread. The thread macro used here works much like future, except that it returns a channel that will receive the result of its body.

This is cool, but not very efficient, since we have a thread sitting idle and blocked until there is something to read from the channel. That’s why core.async offers a more lightweight way to do this with go blocks:

> (def jackie (chan 2))
> (go (loop [food (<! jackie)]
    (if food
      (do 
        (println "Some" food "is what I was waiting for.")
        (<! (timeout 1000)) ;; park the go block instead of blocking a pool thread
        (recur (<! jackie))))))
> (doseq [food ["carrots" "peas"]]
    (println "deliver" food)
    (>!! jackie food)
    (Thread/sleep 1000))
deliver carrots
Some carrots is what I was waiting for.
deliver peas
Some peas is what I was waiting for.

The go macro runs its body on a small pool of threads. Whenever a channel operation can’t complete yet, the go block parks: its execution is put on hold without blocking a thread. Inside go blocks you use the parking operations <! and >! to read from and write to a channel. In this case we make the channel buffered via (chan 2), because in our code two foods may be delivered to the channel before they are taken out of it, so we give the channel a fixed buffer size of 2.

This only scratches the surface of what is possible. For a more detailed yet easy-to-follow introduction to core.async and channels, I recommend this blog article.

Java Threads

… and then it’s still possible to use Java’s Threads, if you must :-)

> (.start (Thread.
    (fn []
      (dotimes [i 5]
        (println i)))))
0
1
2
3
4
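A raw Thread doesn’t hand back a result, and printing from several threads makes the output order unpredictable. A minimal sketch of a common workaround: collect results in an atom (seen is a made-up name) and wait for the thread with .join.

```clojure
(def seen (atom []))

(let [t (Thread. (fn [] (dotimes [i 5] (swap! seen conj i))))]
  (.start t)
  (.join t)) ; block the current thread until t has finished

@seen ; => [0 1 2 3 4]
```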

Summary

We can divide the constructs from this post into two groups. First, there are vars, refs, atoms and agents: entities holding mutable state. We modify refs, atoms and agents by passing a function (alter, swap!, send). Vars take a special position here, because usually we just use them as containers that bind the other entities to a namespace. For refs, all changes happen in a coordinated way through transactions, while atoms do the same thing without coordination. In agents, the modifications happen in another thread and are put in order through an unbounded queue. You can listen for changes in all of these entities via watches (for vars, watches fire when the root binding changes).
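Watches are registered with add-watch. The watch function receives a key, the reference, and the old and new state. In this minimal sketch (temperature and changes are made-up names), the watch on an atom runs on the thread that made the change, so the log is complete as soon as swap! and reset! return.

```clojure
(def temperature (atom 20))
(def changes (atom []))

(add-watch temperature :log
  (fn [_key _ref old-state new-state]
    (swap! changes conj [old-state new-state])))

(swap! temperature + 5)
(reset! temperature 18)
@changes ; => [[20 25] [25 18]]

(remove-watch temperature :log)
```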

The rest of the constructs (futures, delays, promises and channels) we can use to manage the control flow of our code, defer the execution of a body of code or have it executed in another thread. Futures execute a body of code in a separate thread, while delays just defer its execution. With promises, a single producer can deliver a value once, which can then be read by multiple consumers. Channels, on the other hand, let multiple producers deliver an unlimited stream of data through a bounded queue, but each value can be taken by only a single consumer.

It may take some time to really understand the principles behind the primitives described here, but I hope this little overview helps you decide which primitive to use in which situation. Writing it certainly helped me understand the differences between them.

Understanding the difference between lexical and dynamic scope in Clojure

What is the difference between lexical and dynamic scope in Clojure? I asked myself this question several times, because the difference was difficult for me to grasp and to remember. I’ll try to give a short explanation here.

What is scope?

First, what is scope after all? In programming, we usually need to store values in order to work with them, and these values need to be referenced somehow. That’s why we give them names, so we can use them again later. The scope of a name is the part of the program in which that name refers to a certain value. In lexical scoping, this corresponds to the region of the code where the name was declared, while in dynamic scoping it depends on the stack of bindings at runtime.

A Clojure example

To understand this, let’s first look at an example:

(def non-dynamic-var "this is a non dynamic var")
(def ^:dynamic *dynamic-var* "this is a dynamic var")

(defn function-using-dynamic-var []
  (println *dynamic-var*))

(defn function-using-non-dynamic-var []
  (println non-dynamic-var))

(defn some-function []

  (function-using-dynamic-var)
  ;; dynamically rebind dynamic-var
  (binding [*dynamic-var* "this is some new content for the dynamic var"]
    (function-using-dynamic-var))

  (function-using-non-dynamic-var)
  ;; shadow the name with a local; the function still reads the var
  (let [non-dynamic-var "this is some new content that won't be used"]
    (function-using-non-dynamic-var))
  ;; redefine the root binding of non-dynamic-var (a global change)
  (def non-dynamic-var "this is some new content that will be used")
  (function-using-non-dynamic-var))

We are declaring a normal and a dynamic var (the latter by adding the ^:dynamic metadata; the leading and trailing asterisks in the name are a naming convention for dynamic vars) and, for each one, a function that prints it. Then we call each function before and after rebinding/redefining the vars and see what changes.

Output from a call of some-function is:

this is a dynamic var
this is some new content for the dynamic var
this is a non dynamic var
this is a non dynamic var
this is some new content that will be used

We can see here that binding and def changed the value seen by function-using-dynamic-var and function-using-non-dynamic-var, respectively. The difference is that binding only changes the value of *dynamic-var* while the binding expression is being executed, whereas def changes the root binding of non-dynamic-var from the point of execution on. What does not work here is using let to rebind non-dynamic-var: let only introduces a local name that shadows the var inside the let body and doesn’t affect the var that function-using-non-dynamic-var reads.

Lexical scope vs. dynamic scope

In lexical scope, a variable always refers to its local lexical environment. The lexical environment is dependent only on the program text. Thus, a static analysis (at compile time) of the program text can always tell us the scope of a variable. That’s why lexical scope is also called static scope.

In contrast to static scope, dynamic scope depends on the runtime call stack. This means the program needs to be executed to determine the scope of dynamic variables. Each dynamic variable has a stack of bindings (in Clojure, one per thread), and its content is difficult to reason about just by looking at the code, since a piece of code may be invoked in several different dynamic contexts.
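The difference shows up clearly with closures. In this minimal sketch (*mode*, current-mode, lexical-fn and dynamic-fn are made-up names), a closure over a let local keeps the value from where it was written, while a closure created inside binding looks *mode* up only when it is called, after the binding has already ended.

```clojure
(def ^:dynamic *mode* :root)

(defn current-mode [] *mode*)

;; Lexical: the closure captures the local `mode` from the program text.
(def lexical-fn (let [mode :local] (fn [] mode)))

;; Dynamic: created inside binding, but *mode* is resolved at call time.
(def dynamic-fn (binding [*mode* :bound] (fn [] (current-mode))))

(lexical-fn)                             ; => :local
(binding [*mode* :bound] (current-mode)) ; => :bound
(dynamic-fn)                             ; => :root
```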

A note on local scope and with-redefs

In the example above we see that local scope, as we know it from other languages like Java, can be achieved in Clojure only with let and binding, not with def. The latter doesn’t give us any local scope, since it always applies changes globally.

Note, though, that there is another way to change a var temporarily: the with-redefs macro changes the root binding of a var (dynamic or not) for the duration of its body and restores the original value afterwards. You might now wonder: what’s the difference between binding and with-redefs, and why do we need dynamic vars at all? Can’t we achieve the same thing with with-redefs?

It’s a little confusing, yes. The difference is that binding rebinds vars only thread-locally, while the changes made by with-redefs are visible in all threads. The takeaway is that dynamic vars and binding are the more controlled way to rebind a var, and you should use them whenever you can. Use with-redefs only for testing, or when you have no other choice because the var you want to rebind is outside your control and wasn’t declared ^:dynamic.
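The thread-visibility difference can be demonstrated directly. In this minimal sketch (*env* and env-on-plain-thread are made-up names), the var is read on a plain java.lang.Thread. A plain thread is used on purpose, because future conveys the caller’s bindings and would hide the difference. The same dynamic var is used with both macros so that only the rebinding mechanism changes.

```clojure
(def ^:dynamic *env* :prod)

(defn env-on-plain-thread
  "Reads *env* on a new java.lang.Thread, which does not inherit the
  caller's thread-local bindings."
  []
  (let [p (promise)]
    (.start (Thread. (fn [] (deliver p *env*))))
    @p))

(binding [*env* :test] (env-on-plain-thread))     ; => :prod, binding is thread-local
(with-redefs [*env* :test] (env-on-plain-thread)) ; => :test, the root binding changed
*env*                                             ; => :prod, root restored afterwards
```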