Whitespace is a widely-known instruction set on a stack-machine virtual machine.
The below is a sample hello world in Whitespace intermediate language. I added some notes that begin with #.
Push 0 Push 72 Store # storing the value 72 to 0th heap Push 1 Push 101 Store # storing the value 101 to 1st heap Push 2 Push 108 Store Push 3 Push 108 Store Push 4 Push 111 Store Push 5 Push 44 Store Push 6 Push 32 Store Push 7 Push 119 Store Push 8 Push 111 Store Push 9 Push 114 Store Push 10 Push 108 Store Push 11 Push 100 Store Push 12 Push 32 Store Push 13 Push 111 Store Push 14 Push 102 Store Push 15 Push 32 Store Push 16 Push 115 Store Push 17 Push 112 Store Push 18 Push 97 Store Push 19 Push 99 Store Push 20 Push 101 Store Push 21 Push 115 Store # (cont.d...) |
Push 22 Push 33 Store Push 23 Push 0 Store Push 0 Call "\t \t \t\t \t \t\t\t \t \t \t\t \t \t\t\t \t\t\t \t\t\t " # Jumping to the ☃ mark with remembering this place to return later Call "\t \t \t\t \t\t\t \t\t \t \t \t\t \t\t \t\t \t\t\t \t\t\t \t \t \t\t \t\t\t \t\t " # Jumping to the ☁ mark with remembering this place to return later End # Terminating program. Label " \t \t\t \t \t\t \t \t\t " Infix Plus Return Label "\t \t \t\t \t \t\t\t \t \t \t\t \t \t\t\t \t\t\t \t\t\t " # ☃ Dup # Copying the top value of the stack. It's 0 (if you see here for the first time.) Retrieve # Getting the 0th value of the heap. It's 72 (if you see here for the first time.) Dup If Zero " \t \t\t \t\t\t \t\t \t \t \t\t \t\t\t\t\t \t \t \t \t\t \t \t\t\t \t \t \t\t \t \t\t\t \t\t\t \t\t\t " OutputChar Push 1 Infix Plus Jump "\t \t \t\t \t \t\t\t \t \t \t\t \t \t\t\t \t\t\t \t\t\t " Label " \t \t\t \t\t\t \t\t \t \t \t\t \t\t\t\t\t \t \t \t \t\t \t \t\t\t \t \t \t\t \t \t\t\t \t\t\t \t\t\t " Discard Discard Return # Ending the function call and going back to the previous place. Label " \t \t\t \t \t\t \t \t \t\t \t \t\t\t " Dup Dup ReadChar Retrieve Dup Push 10 Infix Minus If Zero " \t \t\t \t\t\t \t\t \t \t \t\t \t\t\t\t\t \t \t \t\t \t \t\t \t \t \t\t \t \t\t\t " Discard Push 1 Infix Plus Jump " \t \t\t \t \t\t \t \t \t\t \t \t\t\t " Label " \t \t\t \t\t\t \t\t \t \t \t\t \t\t\t\t\t \t \t \t\t \t \t\t \t \t \t\t \t \t\t\t " Discard Push 1 Infix Plus Push 0 Store Return Label "\t \t \t\t \t\t\t \t\t \t \t \t\t \t\t \t\t \t\t\t \t\t\t \t \t \t\t \t\t\t \t\t " # ☁ Push 10 Push 13 OutputChar OutputChar Return |
And the result is
Hello, world of spaces!
You can run it with a little bit fixes of the official whitespace implementation wspace 0.3 written in Haskell.
A Language
I made an experimental small language and it's compiler.
(begin
(def 0 98)
(f 90 7)
(putc (ref 0))
(end)
(defn f (x y)
(putc (g x y)))
(defn g (x y)
(+ x y)))
Syntax: an expression begins with "(" and a name, arguments for it, and end with ")". If an expression has a value, you have to use the value with enclosing another expression. Otherwise the behavior is undefined. If an expression does not have a value, you must not enclose it as an argument. Otherwise the behavior is undefined.
- begin
- combines some expressions that don't have return values
- def {index} {value}
- assigns the {value} to {index}th slot of heap
- ref {index} -> {value}
- returns the {value} of {index}th slot of heap
- putc {value}
- outputs a character which ASCII code is {value}
- end
- terminates program
- defn {name} ({args}) {body}
- defines a function
- if a program come to defn without using call, the behavior is undefined.
* + {value} {value} -> {value} * obviously
You can "call" a function you made just with ({name} {arg1} {arg2}). You can use arguments of the function just by an identifier like x.
Sample code
(begin
(putc (+ 50 40))
(end))
That is compiled to
Push 50
Push 40
Infix Plus
OutputChar
End
and shows 'Z'.
I little bit complicated example
(begin
(def 0 98)
(f 90 7)
(putc (ref 0))
(end)
(defn f (x y)
(putc (g x y)))
(defn g (x y)
(+ x y)))
compiled to
Push 0
Push 98
Store
Push 90
Push 7
Call "f"
Push 0
Retrieve
OutputChar
End
Label "f"
Push (-1)
Swap
Store
Push (-2)
Swap
Store
Push (-1)
Retrieve
Push (-2)
Retrieve
Call "g"
OutputChar
Return
Label "g"
Push (-3)
Swap
Store
Push (-4)
Swap
Store
Push (-3)
Retrieve
Push (-4)
Retrieve
Infix Plus
Return
and shows "ab".
You may have noticed that the function argument is in negative number of heaps. If you updates them like by (def -3 100)
it may result in breaking something, but since this implementation doesn't support negative literals, it remains safe.
The compiler is below, written in Haskell.
import qualified VM as V
import qualified Text.Parsec as P
import Control.Applicative ((<|>), (<$>))
import qualified Control.Monad.State as S
import qualified Data.Map as M
import Data.Maybe (fromJust)
data Intermediate = Comment String
| Inst V.Instruction
| Paramdef String
| Paramref String
deriving Show
type LispParser = P.ParsecT String () (S.State String)
type ParamMap = M.Map String Integer
main :: IO ()
main = do
code <- readFile "hworld.lisp"
--mapM_ print $ parse code
let runtime = compile (parse code)
mapM_ print runtime
putStrLn "--"
V.vm (V.VM runtime (V.Stack []) (V.Stack []) M.empty 0)
parse :: String -> [Intermediate]
parse str = either (error . show) id $
S.evalState (P.runPT parseExpr () "lisp" str) "toplevel"
parseExpr :: LispParser [Intermediate]
parseExpr = P.try parseInt
<|> parseDefn
<|> parseBuiltin
<|> parseApply
<|> parseVar
parseInt :: LispParser [Intermediate]
parseInt = do
x <- P.many1 P.digit
return [Inst $ V.Push $ read x]
parseAtom :: LispParser String
parseAtom = P.many1 $ P.noneOf " \t\n()"
parseDefn :: LispParser [Intermediate]
parseDefn = do
P.try $ do
ignoringSpaces $ P.char '('
ignoringSpaces $ P.string "defn"
fname <- requireSpaces parseAtom
S.lift $ S.put fname
ignoringSpaces $ P.char '('
names <- ignoringSpaces $ parseAtom `P.sepBy` P.skipMany1 P.space
ignoringSpaces $ P.char ')'
body <- ignoringSpaces parseExpr
ignoringSpaces $ P.char ')'
S.lift $ S.put "toplevel"
return $
Comment "(defn" :
Inst (V.Label fname) :
map (Paramdef . ((fname ++ "/") ++)) names ++
body ++ [Inst V.Return] ++ [Comment ")"]
parseBuiltin :: LispParser [Intermediate]
parseBuiltin = P.try $ do
(fname, xs) <- atomAndArgs
x <- case (fname, length xs) of
("+", 2) -> return [Inst $ V.Infix V.Plus]
("putc", 1) -> return [Inst V.OutputChar]
("def", 2) -> return [Inst V.Store]
("ref", 1) -> return [Inst V.Retrieve]
("end", 0) -> return [Inst V.End]
("begin", _) -> return []
_ -> fail "omg"
return $ Comment ('(' : fname) : concat xs ++ x ++ [Comment ")"]
parseApply :: LispParser [Intermediate]
parseApply = do
(fname, xs) <- atomAndArgs
return $ concat xs ++ [Inst $ V.Call fname]
atomAndArgs :: LispParser (String, [[Intermediate]])
atomAndArgs = do
ignoringSpaces $ P.char '('
fname <- ignoringSpaces parseAtom
xs <- ignoringSpaces $ parseExpr `P.sepBy` P.many1 P.space
P.char ')'
return (fname, xs)
parseVar :: LispParser [Intermediate]
parseVar = do
name <- ignoringSpaces $ P.many1 $ P.noneOf " \t\n()"
fname <- S.lift S.get
return [Paramref $ fname ++ '/' : name]
ignoringSpaces :: LispParser a -> LispParser a
ignoringSpaces f = P.skipMany P.space >> f
requireSpaces :: LispParser a -> LispParser a
requireSpaces f = P.skipMany1 P.space >> f
compile :: [Intermediate] -> [V.Instruction]
compile inters = concat $ S.evalState (mapM compile' inters) M.empty
compile' :: Intermediate -> S.State ParamMap [V.Instruction]
compile' (Comment _) = return []
compile' (Inst x) = return [x]
compile' (Paramdef name) = do
idx <- pred . negate . fromIntegral . M.size <$> S.get
S.modify $ M.insert name idx
return [V.Push idx, V.Swap, V.Store]
compile' (Paramref name) = do
idx <- fromJust . M.lookup name <$> S.get
return [V.Push idx, V.Retrieve]
This code depends on VM.hs from wspace-0.3 to share the data structure of VM Instruction and to execute the compiled program. If you only want to compile given programs, you don't need VM.hs but just to add the following definition.
data Instruction =
Push Integer
| Dup
| Ref Int
| Slide Int
| Swap
| Discard
| Infix Op
| Store
| Retrieve
| Label Label
| Call Label
| Jump Label
| If Test Label
| Return
| OutputChar
| OutputNum
| ReadChar
| ReadNum
| End
deriving (Show,Eq)
By the way wspace-0.3 had an issue that it can only handle sequential indices of heap. You can store values in 0th, 1st and 2nd slots of heap, but you cannot store in 100th without completing all indices between 0 to 100. I wrote a patch to allow any index. Feel free to use it.
diff --git VM.hs VM.hs
index c9e96ab..bb74374 100644
--- VM.hs
+++ VM.hs
@@ -1,6 +1,8 @@
module VM where
import IO
+import qualified Data.Map as M
+import Data.Maybe (fromJust)
{- Stack machine for running whitespace programs -}
@@ -35,7 +37,7 @@ type Loc = Integer
type Program = [Instruction]
newtype Stack = Stack [Integer]
-type Heap = [Integer]
+type Heap = M.Map Integer Integer
data VMState = VM {
program :: Program,
@@ -130,13 +132,7 @@ findLabel' m (_:xs) i = findLabel' m xs (i+1)
-- Heap management
retrieve :: Integer -> Heap -> IO Integer
-retrieve x heap = return (heap!!(fromInteger x))
+retrieve x heap = return $ fromJust $ M.lookup x heap
store :: Integer -> Integer -> Heap -> IO Heap
-store x 0 (h:hs) = return (x:hs)
-store x n (h:hs) = do hp <- store x (n-1) hs
- return (h:hp)
-store x 0 [] = return (x:[])
-store x n [] = do hp <- store x (n-1) []
- return (0:hp)
-
+store x n h = return $ M.insert n x h