Wednesday, March 12, 2008

Some examples of Software Transactional Memory in Haskell

I've just finished teaching a course on Concurrent Programming. It's been fun. It's an increasingly important subject as software becomes more complex and processors get more cores. Most of the course has focused on lock-based shared memory synchronization. It's a tricky subject, you have to be very careful to avoid all the possibilities for deadlock. Fairness is also a tricky issue.

In the course I went through a couple of language constructs for synchronization, for example semaphores, monitors with condition variables and message passing. But at the end of the course I had time for one extra lecture and so I thought it would be nice to talk about transactional memory, which is the new hot language construct for simplifying concurrent programming. Sure, it can be implemented as a library as well, and that's how it's made in most languages, but that's a very fragile solution.

So, despite the fact that the languages used in the course are Java, JR and Erlang I chose to use Haskell in the lecture, simply because GHC provides better support for software transactional memory (STM) than any other system that I know of. I think it worked well, the kind of Haskell I used in the lectures is pretty imperative looking, simply because STM in Haskell uses monads, and so the code doesn't look to foreign even for someone not used to functional programming.

In this post I thought I'd share some snippets of code that I wrote for the lecture. They're all just standard examples in concurrent programming. Most of them look rather unimpressive but that's the whole point: programming with transactional memory is such a relief compared to fiddling with lock-based solutions.

This post is a literate Haskell file. Just copy-n-paste it into a file and you can try it out yourself.

> module STM where
>
> import Random
> import Control.Monad
> import Control.Concurrent
> import Control.Concurrent.STM
Let's start with something easy: binary semaphores. Or locks as they're also called. I've used the traditional names p and v here for acquiring and releasing the lock. They're really awful names though.
> type Semaphore = TVar Bool
>
> newSem :: Bool -> IO Semaphore
> newSem available = newTVarIO available
>
> p :: Semaphore -> STM ()
> p sem = do b <- readTVar sem
>            if b
>               then writeTVar sem False
>               else retry
>
> v :: Semaphore -> STM ()
> v sem = writeTVar sem True
The most impressive thing with the transactional memory implementation in GHC is in my opinion the retry combinator. It handles all conditional synchronization by complete magic. Sure, I happen to know how it works under the hood but the simplicity of the API is just marvellous. Whenever we have a condition that is not fulfilled, in the above code the condition is that the lock happens to be already taken, we just call retry. The runtime system will take care of waking up the process at the appropriate time. No messing around with condition variables or anything, like with monitors. It just couldn't be simpler. We'll see more of this in the examples that follow.

A slightly more interesting example: an unbounded buffer which processes can insert stuff into and then take out. It's written with clarity in mind, not efficiency. It's easy to make it more efficient but I didn't want to introduce more data types to the students so I used a naive list implementation. I leave an efficient implementation as an exercise to the reader.

> type Buffer a = TVar [a]
>
> newBuffer :: IO (Buffer a)
> newBuffer = newTVarIO []
> 
> put :: Buffer a -> a -> STM ()
> put buffer item = do ls <- readTVar buffer
>                      writeTVar buffer (ls ++ [item])
> 
> get :: Buffer a -> STM a
> get buffer = do ls <- readTVar buffer
>                 case ls of
>                   [] -> retry
>                   (item:rest) -> do writeTVar buffer rest
>                                     return item
Again the code is so clear and easy to understand that it makes you wonder how is could be any different. It should be said though that this implementation gives priority to processes writing to the buffer. If there is a single process writing to the buffer he can make an arbitrary amount of readers retry. A lock-based implementation provides mutual exclusion so only one process at a time has access to the buffer. In such an implementation processes get access in the same order as they requested it so writers do not have priority over readers. But that implementation of course has all the usual problems that a lock-based implementation has.

One standard pattern that I used a lot in the course was resource allocation. I used a distilled version where there is a counter keeping track of the amount of resources available. Processes may ask for an arbitrary amount of resources and block if they're not available. This interface has the advantage that it's resource agnostic and can be implemented in a language like Java. I'm sure there are nicer ways to do it in Haskell, depending on the resource.

> type Resource = TVar Int
> 
> acquire :: Resource -> Int -> STM ()
> acquire res nr = do n <- readTVar res
>                     if n < nr
>                        then retry
>                        else writeTVar res (n - nr)
>                             
> release :: Resource -> Int -> STM ()
> release res nr = do n <- readTVar res
>                     writeTVar res (n + nr)
As a final example I will give an implementation of the dining philosophers. It's one of the classic problems in concurrent programming and so it's interesting to see how STM performs.

To make this program interesting we have to output what the philosophers are doing so that we can verify that the simulation is correct. But the standard I/O primitives in Haskell are not thread safe so we have to take a bit of care when writing stuff to standard out. The way I've solved it is to have a buffer which all the philosopher processes writes to and then I have a separate thread which reads from the buffer and writes to the screen. That way only one process does the outputting and there is no risk of weird output.

I'm not going to go through the exact formulation of the dining philosophers problem. If you're unsure how it goes I suggest you read the wikipedia article on it. I've chosen to implement the forks as the binary semaphores that we defined above.

> simulation n = do forks <- replicateM n (newSem True)
>                   outputBuffer <- newBuffer
>                   for [0..n-1] $ \i -> 
>                     forkIO (philosopher i outputBuffer
>                             (forks!!i) 
>                             (forks!!((i+1)`mod`n)))
>                   output outputBuffer
>
> output buffer = 
>     do str <- atomically $ get buffer
>        putStrLn str
>        output buffer
>
> for = flip mapM_
This first bit of code just sets everything up for the simulation. The function simulation take the number of philosophers as an argument. Then it creates the required number of forks, the output buffer and spawns off all the philosopher processes which are given their corresponding forks. Finally the main thread goes into a loop which reads of the strings from the output buffer and prints them.
> philosopher :: Int -> Buffer String -> Semaphore -> Semaphore -> IO ()
> philosopher n out fork1 fork2 = 
>     do atomically $ put out ("Philosopher " ++ show n ++ " is thinking.")
>        randomDelay
>
>        atomically $ do
>          p fork1
>          p fork2
>
>        atomically $ put out ("Philosopher " ++ show n ++ " is eating.")
>        randomDelay
>
>        atomically $ do
>          v fork1
>          v fork2
>
>        philosopher n out fork1 fork2
>
> randomDelay = do r <- randomRIO (100000,500000)
>                  threadDelay r
Again, this implementation is so simple that you wonder what the problem was in the first place. The power comes from being able to compose several transactions together sequentially and perform them atomically.

Note that the philosophers execute in the IO monad and only invoke the transactional memory bit when it has to do synchronization. This is the typical usage of transactional memory, something we didn't see with the earlier examples.

There's one thing that's not so nice with GHC's implementation of STM though. You can see it for yourself if you change the call to randomDelay to a call to threadDelay with a fixed time. What will happen is that one of the philosophers will starve. I will not say more about that here though, it's the topic of another blog post.

That's all for this post. I hope you find the examples useful.

7 comments:

adam brown said...

Hello I just entered before I have to leave to the airport, it's been very nice to meet you, if you want here is the site I told you about where I type some stuff and make good money (I work from home): here it is

pbv said...

To make this program interesting we have to output what the philosophers are doing so that we can verify that the simulation is correct. But the standard I/O primitives in Haskell are not thread safe so we have to take a bit of care when writing stuff to standard out.

I think the issue is not that primitives aren't thread safe, but rather that IO actions can't be performed inside a transition to allow retrying in case the commit fails.

Nijlo said...

I/O can be performed inside of a transaction; the transaction just has to be irrevocable (inevitable) or isolated. Welc et al. and Spear et al. have good papers out explaining irrevocability (inevitability).

Cheers,
Justin

javieth said...

Interesting blog, i usally be aware all about all different kind of sofware. i am online all the time, and this action allow me to see a site costa rica homes for sale and i like it too much. beyond all doubt without my computer i never would have seen this site too.

victorholmes02 said...

Well... round about every blog posts online don't have much originality as I found on yours.. Just keep updating much useful information so that reader like me would come back over and over again.
How to seduce a woman

cheater said...

Hi,
I have run your example with a constant threadDelay and none of the philosophers starve. They all seem to enjoy the same amount of transitions. I tried with 5 and 50 philosophers and then got bored, but my intuition says it works for any amount. I am on GHC 7.4.1 and Linux 3.5.0 generic. I tried with -threaded and -threaded -rtsopts too and couldn't reproduce.

Josef said...

cheater,

Since I wrote this example I've patched GHC so that the starvation issue doesn't come up.

http://hackage.haskell.org/trac/ghc/ticket/2319