Writing a Single-Stepping Interpreter with Monads
One nice thing about Haskell monads - perhaps the nicest - is how they
permit separation of concerns. You can write code in a
straightforward way, without knowing that underneath it all, all kinds
of implicit control flow may be happening. Usually, when we use
monads in Haskell, we use the standard ones: Maybe, Either, lists,
and so on. When we design our programs, use tend to use compositions
of Reader, Writer, State, and probably IO somewhere. This is
fine. Going overboard with odd monadic effects is probably not good
for readability. But, sometimes it is important to remember that
defining your own idiosyncratic monad can allow an elegant
implementation of something seemingly very complicated. In this
Literate Haskell program, we will see how to implement an
single-stepping interpreter for a very simple Lisp dialect, structured
in a monadic style. In fact, the single-stepping support is going to
be a minuscule part of the overall code. Please note that when we
write "Lisp", we mean a highly cut-down and simplified language that
bears only a superficial resemblance to modern industrial Lisp
dialects.
This is not a monad tutorial. This program will not be comprehensible unless you already know how to work with monads in Haskell. All I am trying to do is demonstrate a technique that I find cool.
First, we need the usual module boilerplate. Three of the four imports are for the small command line interface we will add near the end.
module Main (main) where
import Control.Monad (ap, liftM)
import Data.Char (isSpace, isDigit)
import System.Environment (getArgs)
import Text.ParserCombinators.ReadPS-Expressions
In Lisp, the same structure is used for representing both code and
data values: the so-called S-expression. An S-expression is either
an atom or a cons cell (or just cons). A cons is just a pair of
two other values, called car and cdr for ancient and arcane
reasons. An atom is either a symbol or a number. We encapsulate this
as one data type:
data SExp = Cons SExp SExp | Symbol String | Number IntWe print a cons as (a . b), where a and b are its constituents.
There are two special cases:
if
bis itself a cons(c . d), we print(a c . d), and recursively.if
bis the symbolnil, then we print simply(a).
By convention, the empty list is represented as the symbol nil.
These conventions let us print the common case of a linked list as (a b c) instead of (a . (b . (c . nil))). We implement the printing
as an instance of the Show typeclass - I am not normally a fan of
using this for human-readable information, but I do not wish to
include a prettyprinting library for this program.
instance Show SExp where
show (Symbol s) = s
show (Number x) = show x
show (Cons a b) = "(" ++ show a ++ recurse b ++ ")"
where recurse (Symbol "nil") = ""
recurse (Cons c d) = " " ++ show c ++ recurse d
recurse x = " . " ++ show xWe can turn any Haskell list of S-expressions into an S-expression
that represents that same list, using the cars to contain the head,
and the cdr the link to the tail of the list. We use the symbol
nil to represent the empty list.
toSExp :: [SExp] -> SExp
toSExp [] = Symbol "nil"
toSExp (x:xs) = Cons x (toSExp xs)We can also try to convert a SExp to a Haskell list of SExps, although this is not guaranteed to work.
fromSExp :: SExp -> Maybe [SExp]
fromSExp (Symbol "nil") = Just []
fromSExp (Cons car cdr) = do cdr' <- fromSExp cdr
Just $ car : cdr'
fromSExp _ = NothingThe Monad
Before we write the evaluation functions, we'll define the InterpM
monad we will be using. Computation can be in three different states:
either we have a value (Result), we have encountered an error during
evaluation (Error), or we are at a stepping point (Step). This
is a labeled breakpoint in execution where we have the option of
continuing execution by executing the embedded monadic value. Or we
can stop. Or we can save the state for later. This is very similar
to continuation-passing style (and indeed, the InterpM monad is just
a special case of a continuation monad).
data InterpM a = Result a
| Error String
| Step String (InterpM a)The Monad instance is completely straightforward by the types.
instance Monad InterpM where
return = Result
fail = Error
Result x >>= f = f x
Error s >>= _ = Error s
Step desc path >>= f = Step desc (path >>= f)The Functor and Applicative instances, which are required in GHC
versions newer than 7.8, are completely mechanical.
instance Functor InterpM where
fmap = liftM
instance Applicative InterpM where
pure = return
f <*> x = f `ap` xWe wrap the Step constructor in a function to create a veneer of
abstraction. In principle, the monad could be much more complicated
(it would be in most practical implementations), but with a similarly
simple interface.
step :: String -> InterpM a -> InterpM a
step = StepThe Interpreter
We are now ready to implement our Lisp interpreter. First, we define our variable table, which is a mapping from variable names to their values. Values are, of course, just S-expressions.
type VarTable = [(String,SExp)]Evaluation with the eval function happens in the context of a
variable table, and evaluates an S-expression, giving back an
InterpM action. Thus, we might get a result (if evaluation
finished), an error, or a stepping point, where we can choose to
continue execution.
eval :: VarTable -> SExp -> InterpM SExpWe will need a few building blocks before we can define the eval
function itself. And most importantly, we must understand how Lisp is
evaluated. A Lisp form is a list (x y z...). The first element
of the list is called the operator, and the remaining elements the
operands. In the general case, the operator is the name of a
function, and the operands are recursively evaluated, with the
operator applied to the results. However, some forms are special
forms, which have specialised evaluation semantics. Our Lisp dialect
has a few of those, which we will discuss when we get to them.
The most basic evaluation rule is that the operator of a form must be a symbol, so we define a helper function for determining whether an S-expression is a symbol.
isSymbol :: SExp -> Maybe String
isSymbol (Symbol v) = Just v
isSymbol _ = NothingThe operands of a form must constitute a valid list, which we can
check using the previously defined fromSExp function.
getOperands :: SExp -> InterpM [SExp]
getOperands = maybe (fail "Invalid form") return . fromSExpOnce we have the name of a function, we must also be able to call it.
This is the job of getFunction, which returns a Haskell function
corresponding to a named Lisp function.
getFunction :: String -> InterpM (VarTable -> [SExp] -> InterpM SExp)Before we give getFunction a definition, let's define eval. We
use step to create a stepping point before any S-expression is
evaluated, followed by matching on the structure of the S-expression.
eval vt sexp = step ("Evaluating " ++ show sexp) $
case sexp ofThe simplest case is a number, which evaluates to itself.
Number x -> return $ Number xSymbols are looked up in the variable table.
Symbol v -> case lookup v vt of
Nothing -> fail $ "Unknown variable: " ++ v
Just se -> return seThe only special form in our Lisp is quote, which produces its
single operand without evaluating it. Thus, (quote a) evaluates
to the symbol a, and (quote (a b c)) to the list (a b c). This
special form is how we write literal Lisp data.
Cons (Symbol "quote") cdr ->
case fromSExp cdr of
Just [x] -> return x
_ -> fail $ "Bad arguments to quote: " ++ show cdrIf the operator is not quote, then it must be the name of a
function. We use getFunction to get the corresponding Haskell
function, evaluate the operands, then apply the function to the
evaluated operands. We also create a stepping point, just for good
measure.
Cons (Symbol operator) rest -> do
operator' <- getFunction operator
args <- getOperands rest
args' <- mapM (eval vt) args
step ("Applying " ++ operator ++ " to " ++ show (toSExp args')) $
operator' vt args'If none of the previous cases matched, then the S-expression must be malformed.
Cons car cdr ->
fail ("Bad form: " ++ show (Cons car cdr))The definition of getFunction is mostly uninteresting. This is
where we put built-in functions. One of those is list, which takes
any number of arguments, and simply returns them.
getFunction "list" = return $ const $ return . toSExpThe cons function takes two arguments, and returns a cons cell
containing them.
getFunction "cons" = return $ \_ args ->
case args of
[x, y] -> return $ Cons x y
_ -> fail $ "Wrong number of arguments to cons: " ++ show argsThe functions car and cdr are used to access the components of a
cons cell.
getFunction "car" = return $ \_ args ->
case args of
[Cons car _] -> return car
_ -> fail $ "Bad arguments to car: " ++ show args
getFunction "cdr" = return $ \_ args ->
case args of
[Cons _ cdr] -> return cdr
_ -> fail $ "Bad arguments to cdr: " ++ show argsWe also have the usual arithmetic operators on numbers.
getFunction "+" = return $ \_ args ->
case args of
[Number x, Number y] -> return $ Number $ x+y
_ -> fail $ "Invalid arguments to +: " ++ show args
getFunction "-" = return $ \_ args ->
case args of
[Number x, Number y] -> return $ Number $ x-y
_ -> fail $ "Invalid arguments to -: " ++ show args
getFunction "/" = return $ \_ args ->
case args of
[Number x, Number y] -> return $ Number $ x `div` y
_ -> fail $ "Invalid arguments to /: " ++ show args
getFunction "*" = return $ \_ args ->
case args of
[Number x, Number y] -> return $ Number $ x*y
_ -> fail $ "Invalid arguments to *: " ++ show argsThe most interesting built-in function is apply. The apply
function takes two arguments: the first represents a function, the
second a list of arguments, and apply applies the function to those
arguments. The simplest case is the one where the first argument is a
symbol - we just call the function with that name.
getFunction "apply" = return $ \vt args ->
case args of
[Symbol fname, fargs]
| Just fargs' <- fromSExp fargs -> do
f <- getFunction fname
step ("Calling " ++ fname ++ " with arguments " ++ show fargs) $
f vt fargs'More interestingly, we also permit the function argument to be a
lambda form. This is an S-expression with the structure (lambda (params...) body). When apply is given a lambda form, it binds the
named parameters to given arguments (they must match in number) and
evaluates the body.
[Cons (Symbol "lambda") rest, fargs]
| Just fargs' <- fromSExp fargs,
Just [params, body] <- fromSExp rest,
Just params' <- mapM isSymbol =<< fromSExp params,
length params' == length fargs' -> do
step ("Calling lambda with parameters " ++ show params ++
" bound to " ++ show fargs) $
eval (zip params' fargs' ++ vt) body
_ -> fail $ "Invalid arguments to funcall: " ++ show argsIf apply is given anything else, it reports an error.
getFunction f = fail ("Unknown function: " ++ f)Apart from the calls to step, the above Lisp interpreter is
completely bog-standard. The single-stepping capability has not
intruded upon the way we structured the functions. Yet, the
capability is there, and we can use it to create an interesting
frontend, where the user is prompted before every evaluation step.
stepIO :: SExp -> IO ()
stepIO = loop . eval mempty
where loop (Result v) = do
putStrLn $ "Evaluation finished. Result: " ++ show v
loop (Error err) = do
error err
loop (Step desc s) = do
putStrLn $ desc ++ " (press Enter to continue)"
_ <- getLine
loop sThis frontend could easily be made much more sophisticated - for example, it could allow the user to move backwards in time, or inspect the variable table (although this would require an extension of the monad).
A Small Parser
To tie it all together, we define a quick little Lisp parser. Note that this one does not support the dotted-pair notation for conses.
token :: ReadP a -> ReadP a
token p = skipSpaces >> p
schar :: Char -> ReadP Char
schar = token . char
numberOrSymbol :: ReadP SExp
numberOrSymbol = token $ do s <- munch1 $ \c -> not (isSpace c || c `elem` "()")
return $ if all isDigit s then Number $ read s
else Symbol s
readSExp :: ReadP SExp
readSExp = numberOrSymbol
+++ between (schar '(') (schar ')') sexps
where sexps = many readSExp >>= return . toSExp
parseString :: String -> Either String SExp
parseString s =
case readP_to_S (do {e <- readSExp; token eof; return e}) s of
[(e, [])] -> Right e
_ -> Left "Parse error"Command Line Usage
Finally, we create a main function that allows us to run this very
file as a Haskell program.
main :: IO ()
main = do
args <- getArgs
case args of
[f] -> do s <- readFile f
case parseString s of
Right sexp -> stepIO sexp
Left err -> error err
_ -> error "Needs an argument."While this Lisp dialect is quite limited (in particular, dynamic
scoping is an anachronism), it is still more expressive than you may
think. We can use lambda forms to simulate local variables and named
functions. For example, if we put the following program in double.lisp:
(apply (quote (lambda (square)
(apply square (list 2))))
(list (quote (lambda (x) (* x x)))))
Then we can run our interpreter on it as such:
$ runhaskell stepping.lhs double.lisp
Evaluating (apply (quote (lambda (square) (apply square (list 2)))) (list (quote (lambda (x) (* x x))))) (press Enter to continue)
Evaluating (quote (lambda (square) (apply square (list 2)))) (press Enter to continue)
Evaluating (list (quote (lambda (x) (* x x)))) (press Enter to continue)
Evaluating (quote (lambda (x) (* x x))) (press Enter to continue)
Applying list to ((lambda (x) (* x x))) (press Enter to continue)
Applying apply to ((lambda (square) (apply square (list 2))) ((lambda (x) (* x x)))) (press Enter to continue)
Calling lambda with parameters (square) bound to ((lambda (x) (* x x))) (press Enter to continue)
Evaluating (apply square (list 2)) (press Enter to continue)
Evaluating square (press Enter to continue)
Evaluating (list 2) (press Enter to continue)
Evaluating 2 (press Enter to continue)
Applying list to (2) (press Enter to continue)
Applying apply to ((lambda (x) (* x x)) (2)) (press Enter to continue)
Calling lambda with parameters (x) bound to (2) (press Enter to continue)
Evaluating (* x x) (press Enter to continue)
Evaluating x (press Enter to continue)
Evaluating x (press Enter to continue)
Applying * to (2 2) (press Enter to continue)
Evaluation finished. Result: 4
Limitations and Extensions
The interpreter implemented here is a toy. The primary limitation is
that the monad is entirely special-purpose. We do not re-use a
standard error monad, nor do we use a Reader for passing around the
variable table. Futhermore, the stepping points give us too little
information. One bit that would be nice to have would be the
evaluation depth, which would allow us to create a command for
saying "evaluate the current expression to completeness without
prompting me about the intermediate steps". This command would be
implemented by always following Steps (without prompting) until the
starting depth is reached again. Such a facility is crucial for
fast-forwarding to the interesting parts of the computation.