A blag containing my current adventures in logic, haskell and agents.

Friday 11 September 2009

Breaking GHC and the monad laws

So the summer school Applied Functional Programming at the University of Utrecht ended a few weeks ago. I really had a lot of fun and learned even more than I had fun ;). Anyway during the summer school we received a lot of extra exercises you could do out of your own interests. So I picked up one of the exercises I didn't do during course. Namely an extension of the state monad.

In this exercise you should extend the state monad in such a way it keeps track of the number of binds, returns and some other capabilities. One can read the full exercise here. In short if you have a computation like this:
do return 3 >> return 4
return 5
diagnostics

After running this computation in the state monad one should have a result that somewhat looks like this:
 "[bind=3, diagnostics=1, return=3]"


That looks doable, but is a lot more tedious than you except at first. Anyway, a very important point you should catch is that after implementing this functionality we now do NOT abide to the monad laws anymore.

Remember the monad laws (stolen from here):
 
1. "Left identity": return a >>= f ≡ f a
2. "Right identity": m >>= return ≡ m
3. "Associativity": (m >>= f) >>= g ≡ m >>= (\x -> f x >>= g)

As you can see all three laws can transform an expression on the left side to an expression that contains one bind less. Therefore if we perform a computation in our state monad we would have a different number of binds after transforming and could therefore give a different result when getting out our number of binds and returns.


But let's first return to the implementation of this extension. I chose a Map with strings to int that can keep track of the number of binds and annotations. So we now have a State Monad with some extra functionality:
data StateMonadPlus s a = StateMonadPlus {runStateMonadPlus:: (s,Map String Int) -> (a, s, Map String Int)}

This is very similar to the original definition of the State Monad. (If you forgot some parts of it you can read a part of my State Monad tutorial here.

Anyway, before defining the Monad instance we define a useful helper function for updating the Map that keeps track of the number of binds and annotations and another function that puts the diagnostics as the result of the computation.
diagnostics :: StateMonadPlus s String
diagnostics = StateMonadPlus $ \(s,m) ->
let m2 = updateMap "diagnostics" m
in ((show m2 ), s, m2)
updateMap :: Ord k => k -> Map k Int -> Map k Int
updateMap k = M.insertWith (+) k 1


And now for the Monad instance:
test1 = do diagnostics
return 4
return 5
mget

diag = diagStateMonadPlus test1 undefined
eval = evalStateMonadPlus test1 undefined
diageval = (diag, eval)


Anyway, halfway during the implementation I got stuck on something that seemed like a bug in my program, but before spoiling everything I will first show the functions I defined for running and evaluating our freshly made State Monad.
evalStateMonadPlus :: StateMonadPlus s a -> s -> a
evalStateMonadPlus (StateMonadPlus s) st = let (a,b,c) = s (st, M.empty) in a

execStateMonadPlus :: StateMonadPlus s a -> s -> s
execStateMonadPlus (StateMonadPlus s) st = let (a,b,c) = s (st, M.empty) in b

diagStateMonadPlus :: StateMonadPlus s a -> s -> Map String Int
diagStateMonadPlus (StateMonadPlus s) st = let (a,b,c) = s (st, M.empty) in c

instance MonadState s (StateMonadPlus s) where
get = StateMonadPlus $ \(s,m) -> (s, s, m)
put a = StateMonadPlus $ \(_, m) -> ((),a, m)

mget:: StateMonadPlus s (Map String Int)
mget = StateMonadPlus $ \(s,m) -> (m, s, m)


Here the get and put functions are the familiar functions in the original state monad, and mget does something similar to diagnostics, by putting our map into the result.

Because I defined a multi parameter type class, and a flexible instance too, we will have to add some compiler pragma's to be able to use them. And we furthermore used some imports. So just add this to the start of the module:
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module StateMonadExercise where
import Control.Monad.State
import qualified Data.Map as M
import Data.Map(Map)


So the only part left to test this functionality is to define some tests:
test1 = do return 4 
return 5
mget

diag = diagStateMonadPlus test1 undefined
eval = evalStateMonadPlus test1 undefined
diageval = (diag, eval)


So now we're ready to run these test right? (I've enclosed the full test module at the end of this blog post for your convenience, so you could just copy the whole code at once if you'd like.)

Anyway, after running these test, (and I think it's not a bug on my part but a (maybe) too enthusiastic optimization of GHC (6.10.4)). We get these results:
 
*StateMonadExercise> diag
fromList [("bind",2),("return",2)]
*StateMonadExercise> eval
fromList [("return",2)]
*StateMonadExercise> diageval
(fromList [("bind",2),("return",2)]
,fromList [("return",2)])


So here GHC just optimized away 2 binds for the result but not from the map. It serves me right though, since I'm breaking the monad laws anyway :-). So if anyone can confirm this is GHC's "fault" (or doing) and not mine I would be happy.
One final comment: I will probably post the full implementation later this month.


The full code:
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module StateMonadExercise where
import Control.Monad.State
import qualified Data.Map as M
import Data.Map(Map)

data StateMonadPlus s a = StateMonadPlus {runStateMonadPlus:: (s,Map String Int) -> (a, s, Map String Int)}

instance Monad (StateMonadPlus s) where
return a = StateMonadPlus $ (\ (s, m) -> (a, s, updateMap "return" m))
(StateMonadPlus s) >>= f = StateMonadPlus $ \(state, m) ->
(let (a1, st1, m1) = s (state,m) in
let (a2, st2, m2) = runStateMonadPlus (f a1) (st1,m1)
in (a2, st2, updateMap "bind" m2)
)

updateMap :: Ord k => k -> Map k Int -> Map k Int
updateMap k = M.insertWith (+) k 1

diagnostics :: StateMonadPlus s String
diagnostics = StateMonadPlus $ \(s,m) ->
let m2 = updateMap "diagnostics" m
in ((show m2 ), s, m2)

evalStateMonadPlus :: StateMonadPlus s a -> s -> a
evalStateMonadPlus (StateMonadPlus s) st = let (a,b,c) = s (st, M.empty) in a

execStateMonadPlus :: StateMonadPlus s a -> s -> s
execStateMonadPlus (StateMonadPlus s) st = let (a,b,c) = s (st, M.empty) in b

diagStateMonadPlus :: StateMonadPlus s a -> s -> Map String Int
diagStateMonadPlus (StateMonadPlus s) st = let (a,b,c) = s (st, M.empty) in c

instance MonadState s (StateMonadPlus s) where
get = StateMonadPlus $ \(s,m) -> (s, s, m)
put a = StateMonadPlus $ \(_, m) -> ((),a, m)

mget:: StateMonadPlus s (Map String Int)
mget = StateMonadPlus $ \(s,m) -> (m, s, m)

test1 = do return 4
return 5
mget

diag = diagStateMonadPlus test1 undefined
eval = evalStateMonadPlus test1 undefined
diageval = (diag, eval)

{-
*StateMonadExercise> diag
fromList [("bind",2),("return",2)]
*StateMonadExercise> eval
fromList [("return",2)]
*StateMonadExercise> diageval
(fromList [("bind",2),("return",2)]
,fromList [("return",2)])
-}

Followers