Computational Thoughts

Powers of minus one and some bit twiddling

2010-10-01T12:47:00.000-07:00

Since the start of this year I'm employed as a postdoc on a project for designing a new domain specific language for writing dsp algorithms. The language is called Feldspar and if you want to check it out you can look at the official homepage or download it from hackage. As you can see on these pages Feldspar is a collaboration between Chalmers (where I'm employed), Ericsson and Elte University in Budapest.

There are two main goals for the design of Feldspar: first of all it should be possible to program at a very high level, close to how dsp algorithms are normally specified. Secondly, the generated code needs to be very efficient as the kind of applications that Ericsson has for dsp applications are performance critical.

I'm involved in various parts of Feldspar but one particular thing on my plate is making sure that the generated code is fast.

Recently the language has been used in a pilot project within Ericsson to implement part of the 3gpp standard (a mobile broadband standard, unsurprisingly, given Ericssons involvement). Having people using the language is tremendously useful for us language implementors as we really get a chance to see how well the language works in its intended environment.

The Feldspar code that was written in this pilot project was kept very close to the standard and was very similar to the mathematical specification of the algorithms. It's a very nice feature of Feldspar that this is possible, but it poses a challenge for us language implementers. While eyeballing some of the code I notice a little piece of code that I would like to discuss a little:

(-1) ^ v30

Just to clarify, in Feldspar this means minus one to the power of v30 and it has nothing to do with xor. v30 is just a variable name.

Powers of minus one is a common idiom in mathematics for saying that a value should change sign. If the exponent is even the result will be positive and if the exponent is odd the result is negative. This kind of thing is useful in various places and apparently also in the 3gpp standard.

However, actually performing exponentiation would in this case be ridiculously inefficient. But the question is what kind of code we should generate for this? Currently our compiler generates C so the code examples I will show from here on will be written in C.

Remember what I wrote above that the result only depends on whether the exponent is even or not. That is very easy to check, it's just the least significant bit! So we might generate the following code (assuming that the exponent is v30 as in the example above):

v30 & 1 ? -1 : 1;

This is very short and nice and most likely as good as we can hope for, at least for the kind of processors we are targeting.

However, I started programming in the 80's and I still instinctively flinch when I see branches in performance critical code. So I got curious to see if I could write the above as straight line code.

A straight line code solution which computes the above function would most likely involve some bit twiddling. I spent some time trying to come up with a solution on my own but wasn't very happy with what I managed to produce. I was aiming for a three instruction solution and my solutions were nowhere near that. So I decided to look around, maybe someone else had solved the problem before me.

Bit Twiddling Hacks to the rescue! This wonderful list of various bit twiddling tricks doesn't have anything which solves my particular problem but there is one little nugget which is close enough that I could make good use of it: "Conditionally negate a value without branching". The value that we will be negating is 1, we want to negate it depending on the least significant bit of our input value (v30 in our example above). I will not reproduce the code from Bit Twiddling Hacks, you can check it out yourself via the link. Instead I'm just going to present the final result of using that code on my particular problem (using C code):

int v; //The exponent
int r; //Will contain -1 to the power of 
int isOdd = v & 1; //Is the exponent odd?
r = (1 ^ (-isOdd)) + isOdd;

This is a fairly clever piece of code and I'm quite happy with it. However, it will compile to four instructions on typical architectures and I was really hoping for a three instruction solution. Anyone out there who knows of a shorter solution?

Using Stow

2010-07-26T09:18:00.000-07:00

Stow

Sometimes I've found myself in the position where I would like to have access to several versions of the same programs. Maybe not all at once but at least a convenient way to switch between versions. Especially as a developer this need comes up every now and then, for instance trying to compile a piece of source code with several different versions of a compiler.

In the past when I wanted to try out an alternative version of a program I installed it in a temporary directory and ran it from there. It works but it's not terribly convenient. One has to be very careful to always specify the correct path, or chaos ensues.

So a while ago I started looking for a saner solution. This is where I came upon stow, a little command line tool for switching between different versions of the same program. I've found it quite useful, it's simple but does it the job.

However, I use stow with very irregular intervals and sometimes I forget some detail about how it works. The natural thing then is to turn to the stow manual. Unfortunately, the stow manual is not the most helpful of documents. It describes in some detail what stow does to the file system when it is invoked. While this is good to know if you want to implement your own version of stow it leaves you to infer yourself how to invoke it if you're an ordinary user.

Therefor I've written down some notes on how stow works. These notes are mostly for myself but I've put them on this blog in the hope that others might find them useful. This short user manual is by no means complete, but it cover my own usecase and I think that's the one that most users care about.

A short user manual for `stow`

Preliminaries

Before going in to how to use stow it is useful to know a little bit about how stow expects things to be organized.

To begin with I'm going to use a running example. Suppose you have a program called frobnitz and you would like to have several versions of it installed and be able to switch between them seamlessly.

Normally when programs like frobnitz are installed they end up somewhere under the directory /usr/local. There's where the shell looks for programs when you try to execute them on the command line. All programs are stored together in one big lump there. But stow likes things differently. It expects a new directory /usr/local/stow. This is a directory which you have to create yourself. Each version of each program will have its own directory in the /usr/local/stow directory. So, if you have versions 1.2, 1.3 and 1.4b of frobnitz installed then they will be installed in directories /usr/local/stow/frobnitz-1.2, /usr/local/stow/frobnitz-1.3 and /usr/local/stow/frobnitz-1.4b respectively.

Installing programs for use `stow`

The first step is to install a program that you which to have under stow's control. Using stow to control the installation of your programs is not compatible with other package managers such as those that come with your distribution. So if you have a program which is already installed via your distribution you must first uninstall it.

Recall from above that stow expects each version of each program to be in a separate directory. Installing a program into its own directory like this is typically just a matter of setting the right prefix when configuring the software. Below is a fictional example of how to install version 1.3 of frobnitz.

./configure --prefix=/usr/local/stow/frobnitz-1.3
make
sudo make install

I've seen more complicated ways of commicating with both configure and make to tell them exactly where files are expected to exist. But the above method has worked without problems for me so I'm sticking with it.

Enabling a program

Installing a program in its own directory like I showed above means that it is not automatically available from the command line and other things like man pages will not show up either. So to enable the program we have to tell stow that we want to use it. This is done by cd:ing to the stow directory and invoking stow with the program that we want to enable.

Here's how to enable version 1.2 of the frobnitz software (assuming it's already installed).

cd /usr/local/stow
sudo stow frobnitz-1.2

After executing the above commands you will have access to the program frobnitz and you will be using version 1.2.

Switching programs

The whole purpose of using stow is so that we can switch between different versions when we want to. Here's how to do that.

Following our running example, suppose you have version 1.2 of the program frobnitz enabled and you would like to switch to version 1.4b. This is done by first disabling 1.2. This is done with the delete command in stow. It might seem a little scary to invoke a command called delete but stow will never destroy things for you. After disabling version 1.2 it's a simple matter to enable 1.4b afterwards.

Here's how you switching programs is done:

cd /usr/local/stow
sudo stow --delete frobnitz-1.2
sudo stow frobnitz-1.4b

Wrapping up

It's not hard to use stow once you know the basic use cases. I hope you've found this little mini-guide useful.

There is at least one other tool which helps having multiple versions of the same program installed at once. The package manager nix can be tricked into doing this but last time I checked it involved to edit some configuration files and I prefer a tool which is a little more user friendly. It is also overkill if all you're after is the functionality that stow offers. That being said, nix looks really nice and I've been meaning to try it out. Any decade now.

A comment on Parameterized Monads

2009-02-15T05:48:00.000-08:00

This post was really intended to be just a comment to another blog post, but since I wanted to include some code it turned out to be better to write a little post myself. sigfpe has a nice post about Parameterized Monads, which pop up naturally ones you've done a bit of monadic programming. He argues that the ordinary Monad concept we have in Haskell is the wrong one, Paramterized Monads fall out so naturally that they ought to be the default. He then goes on to say "At the very least, do-notation needs to be adapted to support ParameterisedMonad." But GHC already supports do-notation for Parameterized Monads! Here's a proof:


> {-# LANGUAGE NoImplicitPrelude #-}
> module PMonad where

> import Prelude (fromInteger,(+),(*),show)

> class PMonad m where
>   return :: a -> m s s a
>  (>>=)  :: m s1 s a -> (a -> m s s2 b) -> m s1 s2 b
>   (>>)   :: m s1 s2 a -> m s2 s3 b -> m s1 s3 b
>   fail   :: m s1 s2 a

> data State s1 s2 a = State { runState :: s1 -> (a,s2) }

> instance PMonad State where
>   return a = State (\s -> (a,s))
>   f >>= m  = State (\s1 ->
>              case runState f s1 of
>                (a,s2) -> runState (m a) s2)
>   m1 >> m2 = m1 >>= \_ -> m2
>   fail     = fail

> get   = State (\s -> (s,s))
> put s = State (\_ -> ((),s))

> test1 = do x <- return 1
>            y <- return 2
>            z <- get
>            put (x+y*z)
>            return z

> go1 = runState test1 10

> test2 = do x <- return 1
>            y <- return 2
>            z <- get
>            put (show (x+y*z))
>            return z

> go2 = runState test2 10

[1 of 1] Compiling PMonad           ( PMonad.hs, interpreted )
Ok, modules loaded: PMonad.
*PMonad> go2
(10,"21")

And there you go. It's the NoImplicitPrelude that does the trick.

Some examples of Software Transactional Memory in Haskell

2008-03-12T08:30:00.000-07:00

I've just finished teaching a course on Concurrent Programming. It's been fun. It's an increasingly important subject as software becomes more complex and processors get more cores. Most of the course has focused on lock-based shared memory synchronization. It's a tricky subject, you have to be very careful to avoid all the possibilities for deadlock. Fairness is also a tricky issue.

In the course I went through a couple of language constructs for synchronization, for example semaphores, monitors with condition variables and message passing. But at the end of the course I had time for one extra lecture and so I thought it would be nice to talk about transactional memory, which is the new hot language construct for simplifying concurrent programming. Sure, it can be implemented as a library as well, and that's how it's made in most languages, but that's a very fragile solution.

So, despite the fact that the languages used in the course are Java, JR and Erlang I chose to use Haskell in the lecture, simply because GHC provides better support for software transactional memory (STM) than any other system that I know of. I think it worked well, the kind of Haskell I used in the lectures is pretty imperative looking, simply because STM in Haskell uses monads, and so the code doesn't look to foreign even for someone not used to functional programming.

In this post I thought I'd share some snippets of code that I wrote for the lecture. They're all just standard examples in concurrent programming. Most of them look rather unimpressive but that's the whole point: programming with transactional memory is such a relief compared to fiddling with lock-based solutions.

This post is a literate Haskell file. Just copy-n-paste it into a file and you can try it out yourself.

> module STM where
>
> import Random
> import Control.Monad
> import Control.Concurrent
> import Control.Concurrent.STM

Let's start with something easy: binary semaphores. Or locks as they're also called. I've used the traditional names p and v here for acquiring and releasing the lock. They're really awful names though.

> type Semaphore = TVar Bool
>
> newSem :: Bool -> IO Semaphore
> newSem available = newTVarIO available
>
> p :: Semaphore -> STM ()
> p sem = do b <- readTVar sem
>            if b
>               then writeTVar sem False
>               else retry
>
> v :: Semaphore -> STM ()
> v sem = writeTVar sem True

The most impressive thing with the transactional memory implementation in GHC is in my opinion the retry combinator. It handles all conditional synchronization by complete magic. Sure, I happen to know how it works under the hood but the simplicity of the API is just marvellous. Whenever we have a condition that is not fulfilled, in the above code the condition is that the lock happens to be already taken, we just call retry. The runtime system will take care of waking up the process at the appropriate time. No messing around with condition variables or anything, like with monitors. It just couldn't be simpler. We'll see more of this in the examples that follow.

A slightly more interesting example: an unbounded buffer which processes can insert stuff into and then take out. It's written with clarity in mind, not efficiency. It's easy to make it more efficient but I didn't want to introduce more data types to the students so I used a naive list implementation. I leave an efficient implementation as an exercise to the reader.

> type Buffer a = TVar [a]
>
> newBuffer :: IO (Buffer a)
> newBuffer = newTVarIO []
> 
> put :: Buffer a -> a -> STM ()
> put buffer item = do ls <- readTVar buffer
>                      writeTVar buffer (ls ++ [item])
> 
> get :: Buffer a -> STM a
> get buffer = do ls <- readTVar buffer
>                 case ls of
>                   [] -> retry
>                   (item:rest) -> do writeTVar buffer rest
>                                     return item

Again the code is so clear and easy to understand that it makes you wonder how is could be any different. It should be said though that this implementation gives priority to processes writing to the buffer. If there is a single process writing to the buffer he can make an arbitrary amount of readers retry. A lock-based implementation provides mutual exclusion so only one process at a time has access to the buffer. In such an implementation processes get access in the same order as they requested it so writers do not have priority over readers. But that implementation of course has all the usual problems that a lock-based implementation has.

One standard pattern that I used a lot in the course was resource allocation. I used a distilled version where there is a counter keeping track of the amount of resources available. Processes may ask for an arbitrary amount of resources and block if they're not available. This interface has the advantage that it's resource agnostic and can be implemented in a language like Java. I'm sure there are nicer ways to do it in Haskell, depending on the resource.

> type Resource = TVar Int
> 
> acquire :: Resource -> Int -> STM ()
> acquire res nr = do n <- readTVar res
>                     if n < nr
>                        then retry
>                        else writeTVar res (n - nr)
>                             
> release :: Resource -> Int -> STM ()
> release res nr = do n <- readTVar res
>                     writeTVar res (n + nr)

As a final example I will give an implementation of the dining philosophers. It's one of the classic problems in concurrent programming and so it's interesting to see how STM performs.

To make this program interesting we have to output what the philosophers are doing so that we can verify that the simulation is correct. But the standard I/O primitives in Haskell are not thread safe so we have to take a bit of care when writing stuff to standard out. The way I've solved it is to have a buffer which all the philosopher processes writes to and then I have a separate thread which reads from the buffer and writes to the screen. That way only one process does the outputting and there is no risk of weird output.

I'm not going to go through the exact formulation of the dining philosophers problem. If you're unsure how it goes I suggest you read the wikipedia article on it. I've chosen to implement the forks as the binary semaphores that we defined above.

> simulation n = do forks <- replicateM n (newSem True)
>                   outputBuffer <- newBuffer
>                   for [0..n-1] $ \i -> 
>                     forkIO (philosopher i outputBuffer
>                             (forks!!i) 
>                             (forks!!((i+1)`mod`n)))
>                   output outputBuffer
>
> output buffer = 
>     do str <- atomically $ get buffer
>        putStrLn str
>        output buffer
>
> for = flip mapM_

This first bit of code just sets everything up for the simulation. The function simulation take the number of philosophers as an argument. Then it creates the required number of forks, the output buffer and spawns off all the philosopher processes which are given their corresponding forks. Finally the main thread goes into a loop which reads of the strings from the output buffer and prints them.

> philosopher :: Int -> Buffer String -> Semaphore -> Semaphore -> IO ()
> philosopher n out fork1 fork2 = 
>     do atomically $ put out ("Philosopher " ++ show n ++ " is thinking.")
>        randomDelay
>
>        atomically $ do
>          p fork1
>          p fork2
>
>        atomically $ put out ("Philosopher " ++ show n ++ " is eating.")
>        randomDelay
>
>        atomically $ do
>          v fork1
>          v fork2
>
>        philosopher n out fork1 fork2
>
> randomDelay = do r <- randomRIO (100000,500000)
>                  threadDelay r

Again, this implementation is so simple that you wonder what the problem was in the first place. The power comes from being able to compose several transactions together sequentially and perform them atomically.

Note that the philosophers execute in the IO monad and only invoke the transactional memory bit when it has to do synchronization. This is the typical usage of transactional memory, something we didn't see with the earlier examples.

There's one thing that's not so nice with GHC's implementation of STM though. You can see it for yourself if you change the call to randomDelay to a call to threadDelay with a fixed time. What will happen is that one of the philosophers will starve. I will not say more about that here though, it's the topic of another blog post.

That's all for this post. I hope you find the examples useful.

Fun Functions: Flatten

2007-08-13T08:28:00.000-07:00

This post is going to be about a function which may not seem like much, it takes a tree and returns the elements of the tree in a list the way an inorder traversal would visit them. While this is not an uncommon task in programming it is so mundane that it hardly deserves a blog post on its own. So why am I writing this? The thing that have captured me is a particular implementation of this function. Let us first set the scene. This blog post is a literate Haskell file and you should be able so save it to a file and run it yourself.

> module Flatten where
> import Test.QuickCheck
> data Tree a = Node (Tree a) a (Tree a) | Leaf deriving Show

So we have a data type for trees, it's the standard one. Nothing surprising here. The scene is set. How would we go about implementing flatten then? There are in fact a whole number of ways to implement it and I'm going to show several of them. The last version will be my favorite one and after having seen all the other versions I hope you will appreciate it as I do. Let us then start with something rather naive. A straightforward recursive function which turns a tree into a list:

> flatten1 :: Tree a -> [a]
> flatten1 Leaf           = []
> flatten1 (Node t1 a t2) = flatten1 t1 ++ [a] ++ flatten1 t2

That's very concise and easy to understand. But this solution has a slight problem. It takes quadratic time. This is due to the fact that ++ take linear time in it's left argument. It is a well known problem with many solutions. The standard solution, due to John Hughes, is to represent lists as functions from lists to lists. This way we get constant time concatenation. Incidentally, some people seem to think that this version is fine. It is used on the type level by alpheccar in his blog. Implementing lists the Hughes way gives us our second variation of flatten.

> flatten2 :: Tree a -> [a]
> flatten2 tree = help2 tree []
> 
> help2 :: Tree a -> [a] -> [a]
> help2 Leaf           = id
> help2 (Node t1 a t2) = help2 t1 . (a :) . help2 t2

Hmmmm. I don't know what you think of this version but I don't like it. For two reasons. It uses higher order programming when it isn't necessary, and that just obfuscates things. Secondly, it needs a helper function. It would be nice if we could do without that. Let's start by tackling the first problem of higher order programming. One way to remove the higher orderness is to give the function help2 an extra argument (called eta expansion, if that means anything to you). This gives us the following version.

> flatten3 :: Tree a -> [a]
> flatten3 tree = help3 tree []
> 
> help3 :: Tree a -> [a] -> [a]
> help3 Leaf           rest = rest
> help3 (Node t1 a t2) rest = help3 t1 (a : help3 t2 rest)

I call the extra argument rest because it contains the rest of the elements that we should put in the list. This solution still uses a helper function which is a bit of a nuisance but unfortunately I don't know an easy way to remove it. Instead I'm going to present another way to think about flattening that will take us down another road. How would you go about implementing an inorder traversal with a while loop instead of recursion? The solution would be to use a stack to remember the parts of the tree that we haven't visited yet. We can use the same idea to write a new version of flatten, using a list as the stack. The top of the stack will be the part of the tree that we are visiting at the moment. We will need a little trick to remember when we have visited a subtree and that is to simply remove it from its parent. The final code looks like this.

> flatten4 :: Tree a -> [a]
> flatten4 tree = help4 [tree]
>
> help4 :: [Tree a] -> [a]
> help4 [] = []
> help4 (Leaf : ts) = help4 ts
> help4 (Node Leaf a t : ts) = a : help4 (t : ts)
> help4 (Node t1 a t2 : ts) = help4 (t1 : Node Leaf a t2 : ts)

This version is pretty clever. It takes apart the tree and spreads it out into the list until there is nothing left of the tree. But one might argue that this version is too clever. It is certainly not easy to understand from the code. Plus it uses a helper function, something that we have been trying to get away from. The final version that I will show builds upon the idea that we can represent the list of trees from our previous version as simply a tree. We are going to use the input tree as the stack and modify it as we go along. It goes like this.

> flatten5 :: Tree a -> [a]
> flatten5 Leaf = []
> flatten5 (Node Leaf a t) = a : flatten5 t
> flatten5 (Node (Node t1 a t2) b t3) = flatten5 (Node t1 a (Node t2 b t3))

What happens in the third equation is that the tree is restructured so that it becomes a linear structure, just like a list. It is then simple to read off the element in the second equation. This is my favorite version. It's short, without helper functions and no higher order stuff. Plus it is efficient. Now, you might complain that it does more allocations that some of the previous versions. Indeed we have to allocate new trees in the third case. But note that that case is tail recursive. That means that it doesn't use any stack. It is possible to implement this version of the function so that it doesn't any more space than any of the other versions above. The control stack is encoded into the tree. One additional thing about this version is that it's termination is not entirely obvious. This function will fool several termination checkers that I know of. In particular, in the third equation the argument tree in recursive call is not smaller than the input tree. This is a criterion that many termination checker use. To understand that the function terminates you must instead see that it is the left subtree of the input that gets smaller together with the fact that trees are finite. To me this is an extra bonus. You might think otherwise. Finally, I have some code to test that all the version I've given above are equal.

> instance Arbitrary a => Arbitrary (Tree a) where
>   arbitrary = sized arbTree
>    where arbTree 0 = return Leaf
>          arbTree n | n>0 = frequency $
>                 [(1, return Leaf)
>                 ,(4, do elem <- arbitrary
>                         t1   <- arbTree (n `div` 2)
>                         t2   <- arbTree (n `div` 2)
>                         return (Node t1 elem t2))]
> prop_flatten2 t = flatten1 t == flatten2 (t ::Tree Int)
> prop_flatten3 t = flatten1 t == flatten3 (t ::Tree Int)
> prop_flatten4 t = flatten1 t == flatten4 (t ::Tree Int)
> prop_flatten5 t = flatten1 t == flatten5 (t ::Tree Int)
> runTests = mapM_ quickCheck [prop_flatten2,prop_flatten3,
>                              prop_flatten4,prop_flatten5]
>

Learning Monads in 1996

2007-08-13T07:39:00.000-07:00

Given the plethora of monad tutorials these days it's hard to imagine how anybody could have learned about this concept during the 90's. I learned most of what I know about monads in 1996, back in the days when even researchers had a hard time understanding the list monad. This post will not be another monad tutorial. I'm simply going to explain how I managed to become friendly with monads in a time where monad tutorials where very scarce. So, why did I start looking into monads in the first place? In 1995 I started my undergraduate studies at Chalmers and the first language we were taught was Gofer, a now extinct dialect of Haskell. When the time came for teaching us I/O they didn't show us the monadic way of doing things. Apparently our teacher (who by the way was not a functional programmer, a strange fact considering that Chalmers abound with functional programmers) thought that the concept of monads was too scary and complicated and taught us continuation I/O, something which I experienced as painfully horrible and utterly confusing. But some of us students had heard about monads and how they were supposed to be they way of the future when it came to I/O and other things. They sounded really cool, and hey, they couldn't possibly be worse than continuation I/O right? So I set my mind to learning monads. I tried to read everything that was available on the subject which wasn't much at the time. There were two papers which I spent a considerable time with: "The essence of functional programming" and "Monads for functional programming" by Phil Wadler. I implemented every single bit of code in those papers, tore the code to pieces and put it back again. I started implementing monadic parser libraries based on the paper "Monadic parser combinators" by Hutton and Meijer. Trying out different implementations and combinators I wrote variation after variation of these libraries. I don't know how many of parser libraries I implemented in the end. I've lost count. Slowly I was starting to grasp monads as I played around with them. After some time I was even able to digest and understand the paper "Monad Transformers and Modular Interpreters" but it certainly didn't happen the first time I read it. What finally pushed me over the edge was when a friend asked me how to combine different monads. I didn't know the answer and for some reason it kind of bugged me that I couldn't answer his question. But I had a vague recollection of ideas in the above paper. Determined to understand this, I sat down and started to hack on monad transformers and didn't finish until I finally had a rudimentary understanding of them, several hours later. Monadic I/O turned out to be a breeze to learn thanks to the nice "Monadic I/O in Haskell 1.3" by Kevin Hammond and Andrew Gordon. And there you have it. The way I learned about monads. No magic, just hard work. I'm a slow learner and you can probably get comfortable with monads much faster than I. But I don't think you'll find any method which is significantly different from mine. What makes me say that? Why are monads such a difficult concept to learn? The conclusion I've come to is that the difficulty lie in the fact that it's an abstraction. Abstract things are simply hard, especially for people who have little experience with abstractions before. As an example take an abstraction like the concept of a group. What is a group? It's a set, call it G, and a binary operator on G which is associative and has inverses and an identity element. For anyone used to abstractions this is a perfectly fine definition. But not so for your average man on the street, or even programmer. They will react something like this: "Ok, I understand the definition. But what *is* a group? I mean, in *reality*?" I remember having similar thoughts myself. The fact that a group is not a concrete thing is a great confusion. In my experience this only settles after the person has gained some experience with lots of examples of what a group is, and a few examples of things that fail to be a group. By then most people will have gotten pretty comfortable with the concept. It's the same thing with monads. They're an abstract concept which will take time to digest. And this digestion is best done by playing around with various examples of monads and occasionally meet examples of data types which are not monads and for what reason they fail. I don't think there are any shortcuts to learning abstract concepts. But I think that one can become better at it and it certainly helps to have a good tutorial when learning a subject, such as monads.