2013-07-26: Clojure Parallelism with pmap

I've been playing around with Clojure a lot recently. Here is a little example of adding parallelism to your Clojure code.

Let's say you want to get some weather data from Weatherbug.

We define a function raw-weatherdata-from-zipcode that will:

  1. construct a URL string from the zipcode parameter (and my API key that I've left out)
  2. fetch the data at that URL and parse it into an XML structure
  3. flatten the XML structure into a sequence of its nodes


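(require '[clojure.xml :as cxml])

;; apikey is assumed to be defined elsewhere; it is left out of this post.
(defn raw-weatherdata-from-zipcode
  "Fetches the weather feed for zipcode and returns its XML nodes as a sequence."
  [^String zipcode]
  (-> (str "http://api.wxbug.net/getLiveCompactWeatherRSS.aspx?ACode=" apikey "&zipcode=" zipcode "&unittype=0")
      cxml/parse   ; given a URL string, parse fetches and parses the document
      xml-seq))    ; depth-first sequence of every node in the tree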

Now that we have a sequence, we can easily filter out the XML elements we want via their tag:

(defn struct-from-weathertag
  [tag xs]
  (first (filter #(= tag (:tag %)) xs)))
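To see how the tag filter behaves without calling the API, here is a minimal sketch that parses a small in-memory document. The sample XML, its tag names, and the names sample-xml and sample-seq are made up for illustration; they are not real Weatherbug output.

```clojure
(require '[clojure.xml :as cxml])
(import '(java.io ByteArrayInputStream))

;; Same tag filter as above, repeated so this snippet runs on its own.
(defn struct-from-weathertag
  [tag xs]
  (first (filter #(= tag (:tag %)) xs)))

;; Hypothetical sample document; not real Weatherbug output.
(def sample-xml
  "<weather><station name=\"Test Station\"/><condition>Sunny</condition></weather>")

;; clojure.xml/parse also accepts an InputStream, so no network call is needed.
(def sample-seq
  (-> (ByteArrayInputStream. (.getBytes sample-xml "UTF-8"))
      cxml/parse
      xml-seq))

(-> (struct-from-weathertag :station sample-seq) :attrs :name)
;; => "Test Station"
(-> (struct-from-weathertag :condition sample-seq) :content first)
;; => "Sunny"
```

Each element comes back as a map with :tag, :attrs and :content keys, which is why plain keyword lookups are enough to pull out the pieces we want.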

Putting it all together in a function:

(defn city-condition-by-zip
  [^String zipcode]
  (let [weather (raw-weatherdata-from-zipcode zipcode)
        station (struct-from-weathertag :aws:station weather)
        cc      (struct-from-weathertag :aws:current-condition weather)]
    (str (-> station :attrs :name) ": " (-> cc :content first))))

We can now get the weather for individual zipcodes:

user=> (city-condition-by-zip "01602")
"Worcester Regional Airport: Rain Showers"

But what if we need the weather for 100 zipcodes? We can do that with the fantastic map function. For example, to fetch the conditions for the same zipcode 100 times sequentially:

(time
  (let [result (map city-condition-by-zip (repeat 100 "01602"))]
    (println result)))
...
"Elapsed time: 6287.561 msecs"

That doesn't seem terribly efficient: each result is independent of the others, yet the requests run one at a time. It would certainly take a lot of code to parallelize this process, right? Nope. One measly letter, p, gets us to where we need to be:

(time
  (let [result (pmap city-condition-by-zip (repeat 100 "01602"))]
    (println result)))
...
"Elapsed time: 857.029 msecs"
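If you want to try the comparison without an API key, here is a minimal sketch that swaps the network call for a simulated delay. slow-lookup and its 50 ms sleep are hypothetical stand-ins, so the timings you see will depend on your machine and will not match the numbers above.

```clojure
;; Hypothetical stand-in for a network call: sleeps 50 ms, then returns a string.
(defn slow-lookup
  [zipcode]
  (Thread/sleep 50)
  (str zipcode ": ok"))

(def zipcodes (repeat 20 "01602"))

;; map and pmap return lazy sequences; doall forces them so the work
;; actually happens inside the call to time.
(def sequential (time (doall (map slow-lookup zipcodes))))
(def parallel   (time (doall (pmap slow-lookup zipcodes))))

;; pmap returns results in input order, so both sequences are equal.
(= sequential parallel)
;; => true
```

pmap only pays off when each call is slow compared to the cost of coordinating threads, as it is with a blocking HTTP request. It also runs on futures, so in a standalone script you may want to call (shutdown-agents) at the end to let the JVM exit promptly.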