Standards: 2011

Monday, December 26, 2011

Compiling a Language to Whitespace

Whitespace is a widely-known instruction set on a stack-machine virtual machine.

The below is a sample hello world in Whitespace intermediate language. I added some notes that begin with #.

Push 0
Push 72
Store
# storing the value 72 to 0th heap
Push 1
Push 101
Store
# storing the value 101 to 1st heap
Push 2
Push 108
Store
Push 3
Push 108
Store
Push 4
Push 111
Store
Push 5
Push 44
Store
Push 6
Push 32
Store
Push 7
Push 119
Store
Push 8
Push 111
Store
Push 9
Push 114
Store
Push 10
Push 108
Store
Push 11
Push 100
Store
Push 12
Push 32
Store
Push 13
Push 111
Store
Push 14
Push 102
Store
Push 15
Push 32
Store
Push 16
Push 115
Store
Push 17
Push 112
Store
Push 18
Push 97
Store
Push 19
Push 99
Store
Push 20
Push 101
Store
Push 21
Push 115
Store
# (cont.d...)

Push 22
Push 33
Store
Push 23
Push 0
Store
Push 0
Call "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
# Jumping to the ☃ mark with remembering this place to return later
Call "\t \t  \t\t  \t\t\t \t\t \t  \t \t\t   \t\t \t\t \t\t\t \t\t\t \t \t  \t\t  \t\t\t \t\t "
# Jumping to the ☁ mark with remembering this place to return later
End
# Terminating program.
Label "  \t  \t\t   \t  \t\t \t    \t\t "
Infix Plus
Return
Label "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
# ☃
Dup
# Copying the top value of the stack. It's 0 (if you see here for the first time.)
Retrieve
# Getting the 0th value of the heap. It's 72 (if you see here for the first time.)
Dup
If Zero "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t \t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
OutputChar
Push 1
Infix Plus
Jump "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
Label "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t \t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
Discard
Discard
Return
# Ending the function call and going back to the previous place.
Label "  \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Dup
Dup
ReadChar
Retrieve
Dup
Push 10
Infix Minus
If Zero "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t   \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Discard
Push 1
Infix Plus
Jump "  \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Label "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t   \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Discard
Push 1
Infix Plus
Push 0
Store
Return
Label "\t \t  \t\t  \t\t\t \t\t \t  \t \t\t   \t\t \t\t \t\t\t \t\t\t \t \t  \t\t  \t\t\t \t\t "
# ☁
Push 10
Push 13
OutputChar
OutputChar
Return

And the result is

Hello, world of spaces!

You can run it with a little bit fixes of the official whitespace implementation wspace 0.3 written in Haskell.

A Language

I made an experimental small language and it's compiler.

(begin
  (def 0 98)
  (f 90 7)
  (putc (ref 0))
  (end)
  (defn f (x y)
        (putc (g x y)))
  (defn g (x y)
        (+ x y)))

Syntax: an expression begins with "(" and a name, arguments for it, and end with ")". If an expression has a value, you have to use the value with enclosing another expression. Otherwise the behavior is undefined. If an expression does not have a value, you must not enclose it as an argument. Otherwise the behavior is undefined.

begin
- combines some expressions that don't have return values
def {index} {value}
- assigns the {value} to {index}th slot of heap
ref {index} -> {value}
- returns the {value} of {index}th slot of heap
putc {value}
- outputs a character which ASCII code is {value}
end
- terminates program
defn {name} ({args}) {body}
- defines a function
- if a program come to defn without using call, the behavior is undefined.

* + {value} {value} -> {value} * obviously

You can "call" a function you made just with ({name} {arg1} {arg2}). You can use arguments of the function just by an identifier like x.

Sample code

(begin
  (putc (+ 50 40))
  (end))

That is compiled to

Push 50
Push 40
Infix Plus
OutputChar
End

and shows 'Z'.

I little bit complicated example

(begin
  (def 0 98)
  (f 90 7)
  (putc (ref 0))
  (end)
  (defn f (x y)
        (putc (g x y)))
  (defn g (x y)
        (+ x y)))

compiled to

Push 0
Push 98
Store
Push 90
Push 7
Call "f"
Push 0
Retrieve
OutputChar
End
Label "f"
Push (-1)
Swap
Store
Push (-2)
Swap
Store
Push (-1)
Retrieve
Push (-2)
Retrieve
Call "g"
OutputChar
Return
Label "g"
Push (-3)
Swap
Store
Push (-4)
Swap
Store
Push (-3)
Retrieve
Push (-4)
Retrieve
Infix Plus
Return

and shows "ab".

You may have noticed that the function argument is in negative number of heaps. If you updates them like by (def -3 100) it may result in breaking something, but since this implementation doesn't support negative literals, it remains safe.

The compiler is below, written in Haskell.

import qualified VM as V
import qualified Text.Parsec as P
import Control.Applicative ((<|>), (<$>))
import qualified Control.Monad.State as S
import qualified Data.Map as M
import Data.Maybe (fromJust)

data Intermediate = Comment String
  | Inst V.Instruction
  | Paramdef String
  | Paramref String
  deriving Show

type LispParser = P.ParsecT String () (S.State String)
type ParamMap = M.Map String Integer

main :: IO ()
main = do
  code <- readFile "hworld.lisp"
  --mapM_ print $ parse code
  let runtime = compile (parse code)
  mapM_ print runtime
  putStrLn "--"
  V.vm (V.VM runtime (V.Stack []) (V.Stack []) M.empty 0)

parse :: String -> [Intermediate]
parse str = either (error . show) id $
  S.evalState (P.runPT parseExpr () "lisp" str) "toplevel"

parseExpr :: LispParser [Intermediate]
parseExpr = P.try parseInt
         <|> parseDefn
         <|> parseBuiltin
         <|> parseApply
         <|> parseVar

parseInt :: LispParser [Intermediate]
parseInt = do
  x <- P.many1 P.digit
  return [Inst $ V.Push $ read x]

parseAtom :: LispParser String
parseAtom = P.many1 $ P.noneOf " \t\n()"

parseDefn :: LispParser [Intermediate]
parseDefn = do
  P.try $ do
    ignoringSpaces $ P.char '('
    ignoringSpaces $ P.string "defn"
  fname <- requireSpaces parseAtom
  S.lift $ S.put fname

  ignoringSpaces $ P.char '('
  names <- ignoringSpaces $ parseAtom `P.sepBy` P.skipMany1 P.space
  ignoringSpaces $ P.char ')'

  body <- ignoringSpaces parseExpr
  ignoringSpaces $ P.char ')'
  S.lift $ S.put "toplevel"
  return $
    Comment "(defn" :
    Inst (V.Label fname) :
    map (Paramdef . ((fname ++ "/") ++)) names ++
    body ++ [Inst V.Return] ++ [Comment ")"]

parseBuiltin :: LispParser [Intermediate]
parseBuiltin = P.try $ do
  (fname, xs) <- atomAndArgs
  x <- case (fname, length xs) of
       ("+", 2) -> return [Inst $ V.Infix V.Plus]
       ("putc", 1) -> return [Inst V.OutputChar]
       ("def", 2) -> return [Inst V.Store]
       ("ref", 1) -> return [Inst V.Retrieve]
       ("end", 0) -> return [Inst V.End]
       ("begin", _) -> return []
       _ -> fail "omg"
  return $ Comment ('(' : fname) : concat xs ++ x ++ [Comment ")"]

parseApply :: LispParser [Intermediate]
parseApply = do
  (fname, xs) <- atomAndArgs
  return $ concat xs ++ [Inst $ V.Call fname]

atomAndArgs :: LispParser (String, [[Intermediate]])
atomAndArgs = do
  ignoringSpaces $ P.char '('
  fname <- ignoringSpaces parseAtom
  xs <- ignoringSpaces $ parseExpr `P.sepBy` P.many1 P.space
  P.char ')'
  return (fname, xs)

parseVar :: LispParser [Intermediate]
parseVar = do
  name <- ignoringSpaces $ P.many1 $ P.noneOf " \t\n()"
  fname <- S.lift S.get
  return [Paramref $ fname ++ '/' : name]

ignoringSpaces :: LispParser a -> LispParser a
ignoringSpaces f = P.skipMany P.space >> f

requireSpaces :: LispParser a -> LispParser a
requireSpaces f = P.skipMany1 P.space >> f

compile :: [Intermediate] -> [V.Instruction]
compile inters = concat $ S.evalState (mapM compile' inters) M.empty

compile' :: Intermediate -> S.State ParamMap [V.Instruction]
compile' (Comment _) = return []
compile' (Inst x) = return [x]
compile' (Paramdef name) = do
  idx <- pred . negate . fromIntegral . M.size <$> S.get
  S.modify $ M.insert name idx
  return [V.Push idx, V.Swap, V.Store]
compile' (Paramref name) = do
  idx <- fromJust . M.lookup name <$> S.get
  return [V.Push idx, V.Retrieve]

This code depends on VM.hs from wspace-0.3 to share the data structure of VM Instruction and to execute the compiled program. If you only want to compile given programs, you don't need VM.hs but just to add the following definition.

data Instruction =
       Push Integer
     | Dup
     | Ref Int
     | Slide Int
     | Swap
     | Discard
     | Infix Op
     | Store
     | Retrieve
     | Label Label
     | Call Label
     | Jump Label
     | If Test Label
     | Return
     | OutputChar
     | OutputNum
     | ReadChar
     | ReadNum
     | End
   deriving (Show,Eq)

By the way wspace-0.3 had an issue that it can only handle sequential indices of heap. You can store values in 0th, 1st and 2nd slots of heap, but you cannot store in 100th without completing all indices between 0 to 100. I wrote a patch to allow any index. Feel free to use it.

diff --git VM.hs VM.hs
index c9e96ab..bb74374 100644
--- VM.hs
+++ VM.hs
@@ -1,6 +1,8 @@
 module VM where

 import IO
+import qualified Data.Map as M
+import Data.Maybe (fromJust)

 {- Stack machine for running whitespace programs -}

@@ -35,7 +37,7 @@ type Loc = Integer

 type Program = [Instruction]
 newtype Stack = Stack [Integer]
-type Heap = [Integer]
+type Heap = M.Map Integer Integer

 data VMState = VM {
         program :: Program,
@@ -130,13 +132,7 @@ findLabel' m (_:xs) i = findLabel' m xs (i+1)
 -- Heap management

 retrieve :: Integer -> Heap -> IO Integer
-retrieve x heap = return (heap!!(fromInteger x))
+retrieve x heap = return $ fromJust $ M.lookup x heap

 store :: Integer -> Integer -> Heap -> IO Heap
-store x 0 (h:hs) = return (x:hs)
-store x n (h:hs) = do hp <- store x (n-1) hs
-             return (h:hp)
-store x 0 [] = return (x:[])
-store x n [] = do hp <- store x (n-1) [] 
-         return (0:hp)
-
+store x n h = return $ M.insert n x h

Thursday, December 22, 2011

Continuous-Passing Conversion in Haskell

http://en.wikipedia.org/wiki/Continuation-passing_style

Convert from

(+ (f 0 (g 1)) 2)

(g' (lambda (r0) (f' (lambda (r1) (+ r1 2)) 0 r0)) 1)

where data structure internally in Haskell is like

data AST = Node [AST] | Leaf Value
data Value = IntVal Int | Plus | Atom String | Lambda [String]

Implementation and description

import qualified Control.Monad.State as S

data AST = Node [AST] | Leaf Value
instance Show AST where
  show (Node xs) = "(" ++ unwords (map show xs) ++ ")"
  show (Leaf v) = show v

data Value = IntVal Int | Plus | Atom String | Lambda [String]
instance Show Value where
  show (IntVal i) = show i
  show Plus = "+"
  show (Atom name) = name
  show (Lambda names) = "lambda (" ++ unwords names ++ ")"

-- (+ (f 0 (g 1)) 2)
-- (g' (lambda (r0) (f' (lambda (r1) (+ r1 2)) 0 r0)) 1)
program :: AST
program = Node [Leaf Plus,
  Node [Leaf (Atom "f"), Leaf (IntVal 0), Node [Leaf (Atom "g"), Leaf (IntVal 1)]],
  Leaf (IntVal 2)]

main = do
  print program
  print $ cps program

cps :: AST -> AST
cps ast =
  let (newAst, modifiers) = S.runState (cps' ast) [] in
      foldl (flip ($)) newAst modifiers

cps' :: AST -> S.State [AST -> AST] AST
cps' (Node (Leaf (Atom f) : xs)) = do
  xs' <- mapM cps' xs
  n <- length `fmap` S.get
  let name = 'r' : show n
  append $ \root -> Node $
    (Leaf . Atom $ f ++ "'") :
    Node [Leaf (Lambda [name]), root] :
    xs'
  return $ Leaf (Atom name)
cps' (Node xs) = Node `fmap` mapM cps' xs
cps' c@(Leaf _) = return c

append x = S.modify (x :)

This converts correctly.

I used State Monad to modify given tree. The function cps starts state and the actual function cps' traverses given AST subtrees recursively.

(+ (f 0 (g 1)) 2)
   ^^^^^^^^^^^

When cps' sees this subtree, oh yes the first item of the list is a user-defined function and it's not tail-call, so cps' wants to replace the part with a new variable (say r), and enclose whole tree with new function f' and the arguments.

(f' (lambda (r) ((+ r 2) 0 (g 1))))
^^^^^^^^^^^^^^^^^   ^   ^^^^^^^^^^^

It's easy to change subtree but it's not trivial to change outside the subtree. But fortunately we already know that we only have to enclose something around the whole tree, so you can just save a function in state.

After cps' process is done, you apply all functions that the state has accumulatively to enclose trees. That will complete the job.

Tuesday, December 20, 2011

Unlambda Interpreter in Haskell

Unlambda is a minimal, "nearly pure"[1] functional programming language invented by David Madore. It is based on combinatory logic, a version of the lambda calculus that omits the lambda operator. It relies mainly on two built-in functions (s and k) and an "apply" operator (written `, the backquote character). These alone make it Turing-complete, but there are also some I/O functions to make it possible to interact with the user, some shortcut functions and a function for lazy evaluation. There are no variables in the language.

http://en.wikipedia.org/wiki/Unlambda

import qualified Text.Parsec as P
import Control.Applicative ((*>), (<$>), (<*>))

data AST = Apply AST AST | Val Value
instance Show AST where
  show (Apply a b) = "(" ++ show a ++ " " ++ show b ++ ")"
  show (Val (Dot c)) = "put-" ++ [c]
  show (Val (Builtin c)) = [c]

data Value = Dot Char
  | Builtin Char
  | PendingK Value
  | PendingS1 Value
  | PendingS2 Value Value
  deriving Show

main = do
  let helloworld = "`r```````````.H.e.l.l.o. .w.o.r.l.di"
  let fibonacci = "```s``s``sii`ki`k.*``s``s`ks``s`k`s`ks``s``s`ks``s`k`s`kr``s`k`sikk`k``s`ksk"
  print $ desugar $ parse helloworld
  eval $ desugar $ parse helloworld
  --eval $ desugar $ parse fibonacci

parse :: String -> AST
parse = either (error . show) id . P.parse parse' "unlambda"

parse' = P.try (P.char '`' *> (Apply <$> parse' <*> parse'))
         P.<|>
         P.try (P.char '.' *> (Val . Dot <$> P.anyChar))
         P.<|>
         P.try (Val . Builtin <$> P.anyChar)

desugar :: AST -> AST
desugar (Apply a b) = Apply (desugar a) (desugar b)
desugar (Val (Builtin 'r')) = Val (Dot '\n')
desugar (Val (Builtin 'i')) = Apply (Apply (Val (Builtin 's')) (Val (Builtin 'k'))) (Val (Builtin 'k')) -- i = ``skk
desugar x = x

eval :: AST -> IO (Value)
eval (Apply a b) = do
  a' <- eval a
  b' <- eval b
  apply a' b'
eval (Val x) = return x

apply :: Value -> Value -> IO Value
apply (Dot c) x = putChar c >> return x
apply (Builtin 'k') x = return $ PendingK x
apply (Builtin 's') x = return $ PendingS1 x
apply (PendingK x) y = return $ x
apply (PendingS1 x) y = return $ PendingS2 x y
apply (PendingS2 x y) z = do
  a <- apply x z
  b <- apply y z
  apply a b

parse the given string to abstract syntax tree
desugar the ast; expanding macros like r or i.
interpreter evaluates all nodes!

AST

(put-\n (((((((((((put-H put-e) put-l) put-l) put-o) put- ) put-w) put-o) put-r) put-l) put-d) ((s k) k)))

Result of helloworld

Hello world

Result of fibonacci

*
*
**
***
*****
********
*************
*********************
**********************************
*******************************************************
*****************************************************************************************

(added at Tue Dec 20 23:57:12 PST 2011)

I also made a stackmachine-based virtual machine and a compiler for it.

https://gist.github.com/1505131

This was actually much simpler/easier than I thought. There's a difference between pure interpreter and this virtualmachine, but it's not very big.

For example very short program "hi" that shows "hi" is "``.h.ii" in unlambda. First this parser converts the text to AST.

((put-h put-i) ((s k) k))

Then the compiler converts the tree to sequence of instructions.

IPush (Dot 'h')
IPush (Dot 'i')
IApply
IPush (Builtin 's')
IPush (Builtin 'k')
IApply
IPush (Builtin 'k')
IApply
IApply

Then the virtualmachine runtime will run it.

hi

Sunday, December 18, 2011

Read Raw POST Request Body in Compojure

Brief introduction to Compojure

Compojure gives you a shortcut to make a monofunctional web application. For example a Lingr bot is a web application that only need to be responsible with one single endpoint that handles POST request. The below is a web application that only shows "hello" in / endpoint with GET request.

(defroutes hello
           (GET "/" [] "hello"))
(run-jetty hello {:port 80})

Note that it requires you to be the root of the system if you are going to run a web app on port 80.

The main part of the app is just 3 lines of code. That reminds me of code examples for Sinatra, a Ruby web library.

get '/' do
  'hello'
end

Anyways the Compojure example code doesn't work only with the main logic. You are supposed to make a Leiningen project usually to manage the app and its dependent libraries.

$ lein new hello
$ cd hello

project.clj

(defproject hello "1.0.0-SNAPSHOT"
            :main hello.core
            :description "FIXME: write description"
            :dependencies [[org.clojure/clojure "1.2.1"]
                           [compojure "1.0.0-RC2"]
                           [ring "1.0.1"]])

src/hello/core.clj

(ns hello.core
  (:use
    compojure.core
    ring.adapter.jetty))

(defroutes hello
           (GET "/" [] "hello"))
(run-jetty hello {:port 80})

then

$ lein deps
$ lein run

it should work.

Parameters

(defroutes hello
           (GET "/" [] "hello"))
(run-jetty hello {:port 80})

The 2nd argument of GET, [] in this case, is parameter list for the expression you give in 3rd argument, which mostly for referring GET parameters. That's actually a hashmap that contains :params key which value is also a hashmap of GET parameters. Ditto in POST.

How can we get the raw post parameter?

(POST "/" {params :params} (...))

In that way you cannot get raw data because it's after the process. You can reconstruct the raw data only when the given parameter is like proper a=1\nb=2 form. These days some web apis are required to handle raw POST data, which is mostly in JSON, like a Lingr Bot API.

The answer is in :body of the parameter, but it's not a String but a mysterious HttpParser.Input object, assuming you are using ring as the middleware.

http://jetty.codehaus.org/jetty/jetty-6/apidocs/org/mortbay/jetty/HttpParser.Input.html

This class looks weird because even though this has read() method the return value type isn't String but int. The other read() looks like you are supposed to pass a mutable data and refer the changed data.

Fortunately we can use slurp Clojure function to hide this complicated behaviour.

(defroutes hello
           (POST "/" {body :body} (slurp body)))

This shows the given raw POST parameter!

Tuesday, November 15, 2011

Lazy List in C

#include <stdio.h>
#include <stdlib.h>

typedef struct list_ {
  int x;
  struct closure_int_int_list_ *tail;
} *list;

typedef struct closure_int_int_list_ {
  list (*call)(int, int);
  int x1;
  int x2;
} *closure_int_int_list;

closure_int_int_list newclosure(list call(int, int), int x1, int x2)
{
  closure_int_int_list c;
  c = (closure_int_int_list)malloc(sizeof(*c));
  c->call = call;
  c->x1 = x1;
  c->x2 = x2;
  return c;
}

list newlist(int x, closure_int_int_list tail)
{
  list xs = (list)malloc(sizeof(struct list_));
  xs->x = x;
  xs->tail = tail;
  return xs;
}

list listtail(list xs)
{
  if (xs->tail == NULL) return NULL;
  return xs->tail->call(xs->tail->x1, xs->tail->x2);
}

void deletelist(list xs)
{
  free(xs->tail);
  free(xs);
}

int *takelist(int num, list xs)
{
  int *array;
  int i;
  list p;
  array = (int *)malloc(sizeof(int) * num);
  p = xs;
  for (i = 0; i < num; ++i) {
    array[i] = p->x;
    p = listtail(p);
  }
  return array;
}

list fibnext(int a, int b)
{
  return newlist(b, newclosure(fibnext, b, a + b));
}

void printarray(int *xs, int size)
{
  int i;
  for (i = 0; i < size; ++i) {
    printf("%d ", xs[i]);
  }
}

int main(int argc, char const* argv[])
{
  list xs;
  int *array;
  xs = newlist(1, newclosure(fibnext, 1, 1));
  array = takelist(10, xs);
  printarray(array, 10);
  free(array);
  deletelist(xs);
  return 0;
}

result:

1 1 2 3 5 8 13 21 34 55

Sunday, November 13, 2011

Type Inferences of Ambiguous Literals

The Haskell code below works.

main = print $ x + y
x = 1
y = 2.3

This results 3.3. x isn't Int because x is used with (+) operator that also takes 2.3.

On the other hand, the code below causes a type error in compile time.

main = print $ x + y
x = 1
y = 2.3

z = x :: Int

error:

No instance for (Fractional Int)
  arising from the literal `2.3'
Possible fix: add an instance declaration for (Fractional Int)
In the expression: 2.3
In an equation for `y': y = 2.3

You can make x ambiguous with NoMonomorphismRestriction option.

{-# LANGUAGE NoMonomorphismRestriction #-}

or -XNoMonomorphismRestriction in command line option.

http://stackoverflow.com/questions/7055146/why-do-3-and-x-which-was-assigned-3-have-different-inferred-types-in-haskell

Thanks @ikegami__!

Thursday, November 3, 2011

Lossless Data Compressions

Run-length encoding

http://en.wikipedia.org/wiki/Run-length_encoding

One of the easiest encoding method to write.

An implementatin in Haskell:

import Data.List (group)
main = do
  print $ runlength "AAAAABBCBADDDDDDDD"

runlength = concatMap f . group
  where
    f xs@(x:_) = x : show (length xs)

"A5B2C1B1A1D8"

Huffman coding

http://en.wikipedia.org/wiki/Huffman_coding

An implementatin in Haskell:

import Data.List (group, sort, sortBy)
import Data.Function (on)
import Data.Map (fromList, lookup)
import Data.Maybe (fromJust)
import Prelude hiding (lookup)

main = print $ huffman "AAAAABBCBADDDDDDDD"

huffman s = flip concatMap s $ fromJust . flip lookup (huffmanMap s)

huffmanMap s =
  let x = map head . sortBy (compare `on` length) . group . sort $ s in
      fromList $ zipWith ((,)) x (bits $ length x)
bits :: Int -> [[Int]]
bits length = (take length $ repeat 1) : reverse (take (length - 1) $ iterate (1 :) [0])

[1,0,1,0,1,0,1,0,1,0,1,1,0,1,1,0,1,1,1,1,1,1,0,1,0,0,0,0,0,0,0,0,0]

BPE: Byte Pair Encoding

http://en.wikipedia.org/wiki/Byte_pair_encoding

import Data.List (group, sort, sortBy, (\\))
import Data.Function (on)
import qualified Data.Map as M
import Data.Map (insert, empty, notMember)
import Data.Maybe (fromJust)
import Safe (fromJustDef)

main = do
  let orig = "ABCDCDABCDCDE"
  print orig
  let (encoded, table) = bpe orig
  print (encoded, table)
  print $ bpeDecode encoded table

bpe xs = bpe' xs empty
bpe' xs table =
  let x = head $ sortBy (flip compare `on` length) $ group $ sort $ zipWith ((,)) xs (tail xs) in
      if length x == 1
         then (xs, table)
         else let (a, b) = head x
                  new = pickupOther xs table in
                  bpe' (replace2 xs a b new) (insert new (a, b) table)

bpeDecode xs table = concatMap (replace (expand (M.map (\(a, b) -> [a, b]) table))) xs
  where
    replace :: M.Map Char String -> Char -> String
    replace expandedTable c = maybe [c] id $ M.lookup c expandedTable
    expand :: M.Map Char String -> M.Map Char String
    expand table = M.map (concatMap f) table
      where
        f :: Char -> String
        f c = maybe [c] (concatMap f) $ M.lookup c table

pickupOther xs table =
  head $ filter (flip notMember table) $ ['Z', 'Y'..'A'] \\ xs

replace2 (a:[]) _ _ _ = [a]
replace2 (a:b:xs) c d e
  | a == c && b == d = e : replace2 xs c d e
  | otherwise = a : replace2 (b : xs) c d e

"ABCDCDABCDCDE"
("WWE",fromList [('W',('X','Z')),('X',('Y','Z')),('Y',('A','B')),('Z',('C','D'))])
"ABCDCDABCDCDE"

Monday, October 31, 2011

Optimizer Comparisons -- Skip Nops

In Haskell

The following Haskell code is supposed to just output "hello".

main = print $ times 1000000000000 id "hello"

times 0 _ x = x
times n f x = times (n-1) f (f x)

The times function I defined applies the given function many times.

times 3 f x is...
- times 2 f (f x)
- times 1 f (f (f x))
- times 0 f (f (f (f x)))
- f (f (f x))

id, defined in Prelude, just returns the given argument. You may want to assume the GHC internal optimizer to eliminate any id call. In other words, the Haskell code in the very beginning is supposed to be transformed just to main = print "hello".

Unfortunately, no it doesn't, even with -O2 option.

$ ghc -O2 a.hs -o a
$ ./a
... (long long hours) ..

In C

I just translated the original Haskell code to C.

#include "stdio.h"

char *id(char *x) {
  return x;
}
char *times(long long n, char *(*f)(char *), char *x) {
  if (n == 0) {
    return x;
  } else {
    return times(n-1, f, f(x));
  }
}

int main(int argc, char const* argv[]) {
  puts(times(1000000000000LL, id, "hello"));
  return 0;
}

I tried it with GCC and Clang.

GCC on Mac:

i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
```
$ gcc -O2 a.c -o a
$ ./a
... (long long hours) ..
```

The assembly code for the main function with -O2 -m32 option is the following.

_main:
  pushl %ebp
  movl  %esp, %ebp
  pushl %ebx
  subl  $20, %esp
  call  L13
"L00000000001$pb":
L13:
  popl  %ebx
  leal  LC0-"L00000000001$pb"(%ebx), %eax
  movl  %eax, 8(%esp)
  leal  _id-"L00000000001$pb"(%ebx), %eax
  movl  %eax, 4(%esp)
  movl  $-727379968, (%esp)
  call  _times
  movl  %eax, (%esp)
  call  _puts
  xorl  %eax, %eax
  addl  $20, %esp
  popl  %ebx
  leave
  ret

You see it's calling _times for sure.

Clang:

$ clang -O2 a.c -o a
$ ./a
hello

This gives you the message immediately!

The below is the generated LLVM IR. You can easily tell that it just calls puts function directly. Note that the @.str const is just "hello" ending with null.

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable ssp {
entry:
  %call1 = tail call i32 @puts(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0)) nounwind
  ret i32 0
}

GCC on Linux:

gcc 4.4

main:
  pushl %ebp
  movl  %esp, %ebp
  andl  $-16, %esp
  subl  $16, %esp
  movl  $.LC0, (%esp)
  call  puts
  xorl  %eax, %eax
  leave
  ret

It ran like the one with Clang!

When the big number isn't long long

I wrote the first argument of times in C as int at first.

I got warning messages from both GCC and Clang that the value, 1000000000000, was too big for int. In the case, GCC on Linux made the following assembly code.

main:
  pushl %ebp
  movl  %esp, %ebp
.L11:
  jmp   .L11

It's just an infinite loop! That actually makes sense because any int values never go to non-int value 1000000000000.

Summary

GHC doesn't eliminate id in optimization
GCC eliminates it except for Apple GCC 4.2
Clang eliminates id
Some optimizations can conflict if the code depends on undefined behaviour

Sunday, October 30, 2011

How To Compile C to IA32

Q. How to compile a C file to IA32 assembly language on AMD64 Debian?

$ gcc -S -m32 a.c
In file included from /usr/include/features.h:378,
                 from /usr/include/stdio.h:28,
                 from a.c:1:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory

A. install gcc-multilib.

$ sudo aptitude install gcc-multilib

Friday, October 28, 2011

URL Encodings

URL Encoding, specifically Percent-encoding, is one of common encodings that you may use in arbitorary programming languages.

Let me encode a text into URL Encoding, which includes UTF-8 Japanese characters.

JavaScript (node.js)

Use encodeURIComponent.

var sys = require('sys');
sys.puts(encodeURIComponent('abc あいう-#!@'));

result:

abc%20%E3%81%82%E3%81%84%E3%81%86-%23!%40

Note that escape and encodeURI aren't proper for URI encoding.

sys.puts(escape('abc あいう-#!@'));
//=> abc%20%u3042%u3044%u3046-%23%21@
sys.puts(encodeURI('abc あいう-#!@'));
//=> abc%20%E3%81%82%E3%81%84%E3%81%86-#!@
sys.puts(encodeURIComponent('abc あいう-#!@'));
//=> abc%20%E3%81%82%E3%81%84%E3%81%86-%23!%40

(Thanks @javascripter!)

Ruby

Use ERB::Util.url_encode.

# coding: utf-8
require 'erb'
puts ERB::Util.url_encode 'abc あいう-#!@'

result:

abc%20%E3%81%82%E3%81%84%E3%81%86-%23%21%40

This result is exactly same to JavaScript's except for "!".

Note that URI.encode or URI.encode_www_form_component aren't proper.

require 'uri'
puts URI.encode 'abc あいう-#!@'
#=> abc%20%E3%81%82%E3%81%84%E3%81%86-%23!@
puts URI.encode_www_form_component 'abc あいう-#!@'
#=> abc+%E3%81%82%E3%81%84%E3%81%86-%23%21%40

Python

Use urllib.quote.

# coding: utf-8
import urllib
print urllib.quote('abc あいう-#!@')

result:

abc%20%E3%81%82%E3%81%84%E3%81%86-%23%21%40

(Thanks @raa0121!)

Vim script

Use Vital.Web.Http.encodeURI.

let H = vital#of('vital').import('Web.Http')
echo H.encodeURI('abc あいう-#!@')

result:

abc%20%E3%81%82%E3%81%84%E3%81%86-%23%21%40

Haskell

Use Network.HTTP.urlEncode. But...

import Network.HTTP (urlEncode)
main = putStrLn $ urlEncode "abc あいう-#!@"

result:

abc%20%30%42%30%44%30%46-%23%21%40

oh...

Tuesday, October 25, 2011

Converting From/To String

Haskell has three dominant types of the implementations of strings; String, ByteString and Text(*1). Since String is the default literal of strings, most libraries only support String as strings, even though you can write ByteString or Text directly in a Haskell code with OverloadedStrings pragma.

Haskell has IsString class that means you can convert the type to String, using fromString function. IsString only requires the type to have the fromString function.

-- in Data.String
class IsString a where
  fromString :: String -> a

ToString

I implemented ToString class that you can convert from the type to String by toString function, oppositely to the IsString class and its fromString function.

{-# LANGUAGE TypeSynonymInstances #-}
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString as B

class ToString a where
  toString :: a -> String

instance ToString String where
  toString = id

instance ToString T.Text where
  toString = T.unpack

instance ToString B.ByteString where
  toString = C.unpack

You can use them like the following way.

{-# LANGUAGE OverloadedStrings #-}
main = do
  putStrLn $ toString ("hello world!" :: String)
  putStrLn $ toString ("hello world!" :: T.Text)
  putStrLn $ toString ("hello world!" :: B.ByteString)

It might remind you about show :: a -> String. show and the toString I defined here have the same type but the purpose and the behaviour are different. show is to give human-readable and machine-parsable String represantation while toString is just to convert from a type which is an implementation of the concept of strings to String.

If you are familiar with Ruby, this may make sense; show in Haskell is inspect in Ruby while toString in Haskell is to_str in Ruby. Note that that's not to_s whom Integer or some other non-string classes also have.

Use case

An example is Network.HTTP.urlEncode function; the function is String -> String but you may want to use it for Text values.

import qualified Data.Text as T
import qualified Network.HTTP as H
urlEncodeText :: T.Text -> T.Text
urlEncodeText = T.pack . H.urlEncode . T.unpack

You can define Text-specific version of urlEncode easily. You can define ByteString-specific version as well in the same way. But you cannot have general one.

It's turn to use the ToString we saw here.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeSynonymInstances #-}
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString as B
import qualified Network.HTTP as H
import Data.String (fromString, IsString)

main = do
  putStrLn $ urlEncode ("hello world!" :: String)
  T.putStrLn $ urlEncode ("hello world!" :: T.Text)
  B.putStrLn $ urlEncode ("hello world!" :: B.ByteString)

class ToString a where
  toString :: a -> String

instance ToString String where
  toString = id

instance ToString T.Text where
  toString = T.unpack

instance ToString B.ByteString where
  toString = C.unpack

urlEncode :: IsString a => ToString a => a -> a
urlEncode = fromString . H.urlEncode . toString

This urlEncode function defined here can use String, ByteString and Text just by the single function! I just combined IsString and ToString there :)

Footnote

*1 see also: http://www.slideshare.net/ujihisa/text-manipulation-withwithout-parsec

Wednesday, October 5, 2011

Completing Method Names in Ruby Based on Input and Output

I described how to auto-complete method names of an object on Vim but here let me describe about a bit more restricted one. How can we auto-complete method names of an object that goes to a given object?

An example here that you send succ or next on an object 1 when you want to get an object 2.

1.succ #=> 2

Another example. first method will let you get an object :a by [:a, :b, :c].

[:a, :b, :c].first #=> :a

It is natural that you expect Vim automatically completes the method names when the cursor is on the . just after the object.

neco-rubymf.vim

A neocomplcache plugin, neco-rubymf, does exactly what I described above.

Installation:

Install a Rubygems package methodfinder.
```
$ gem install methodfinder
```
Install the neocomplcache plugin neco-rubymf. If your Vim still don't have neocomplcache, install it as well.

Try inputting the following line in a buffer where the 'filetype' is ruby. Note that ¶ means cursor here.

1.¶ #=> 2

That shows methods next and succ in completion popup automatically, which the method an object 1 owns and also the returning value is 2.

Similarly,

[:a, :b, :c].¶ #=> :a

gives this.

The screenshots below are other examples.

※The last example occurs stochastically.

Wednesday, August 3, 2011

Benchmarks between Clang and GCC about Ruby 1.9.3

Since Matz Ruby Implimentation ruby 1.9.3 has been released as preview release, I think that all Rubyists should abandon the ruby they are usually using, ruby 1.9.2, and should use 1.9.3 now.

Ruby 1.9.3 also supports clang instead of gcc inofficially. You can compile ruby with clang 3.0 and almost everything works(*1).

Building clang 3.0

There are two things you have to understand before you build clang.

Clang is a subproject of LLVM. You cannot build clang independently to LLVM unless you are an LLVM specialist.
LLVM build directly has to be not in LLVM source tree for some reason.

Otherwise it just fails building.

$ mkdir -p ~/src/llvm-git-build
$ cd ~/git
$ git clone http://llvm.org/git/llvm.git
$ cd tools
$ git clone http://llvm.org/git/clang.git
$ cd ~/src/llvm-git-build
$ ~/git/llvm/configure --prefix=`pwd`/local
$ make && make install

That will install llvm commands and clang in ~/src/llvm-git-build/local/bin directory.

$ ~/src/llvm-git-build/local/bin/clang -v
clang version 3.0 (http://llvm.org/git/clang.git d484040a5fc8d8d8a7a6a0239fc3b34486258181)
Target: x86_64-apple-darwin10.8.0
Thread model: posix

Building ruby 1.9.3 with clang

$ cd ~/git
$ git clone http://github.com/ruby/ruby.git ruby193
$ cd ruby193
$ git checkout ruby_1_9_3 origin/ruby_1_9_3

a.sh

export CC=~/src/llvm-git-build/local/bin/clang
autoconf && ./configure --with-out-ext=tk\* --with-arch=x86_64 --prefix=`pwd`/local --with-readline-dir=/usr/local --with-libyaml-include=/usr/local/include && make && make install

then

$ sh a.sh

Benchmark

$ local/bin/ruby benchmark/run.rb --ruby=local/bin/ruby

I ran on ruby made by clang and ruby made by gcc and compared them.

	gcc/clang	gcc [sec]	clang [sec]
loop_generator	1.55	1.187	0.768
vm_thread_pass_flood	1.38	0.52	0.377
vm1_const	1.23	1.449	1.175
so_random	1.15	1.191	1.04
so_k_nucleotide	1.11	0.02	0.018
vm2_case	1.10	0.338	0.307
vm2_regexp	1.07	1.8	1.683
loop_whileloop	1.07	0.677	0.634
vm_thread_pipe	1.07	1.659	1.556
so_count_words	1.06	0.442	0.418
loop_whileloop2	1.06	0.148	0.14
so_reverse_complement	1.06	0.019	0.018
vm1_swap	1.06	0.956	0.906
vm2_array	1.05	1.543	1.468
vm1_ivar	1.04	1.662	1.596
vm1_length	1.04	1.467	1.412
vm_thread_create_join	1.03	4.45	4.321
vm_thread_pass	1.02	2.308	2.258
loop_times	1.02	1.834	1.796
app_uri	1.01	1.362	1.346
io_select2	1.00	3.976	3.957
vm1_rescue	1.00	0.799	0.798
vm1_ivar_set	1.00	1.766	1.765
io_select3	1.00	0.03	0.03
vm1_ensure	1.00	0.736	0.738
io_select	1.00	3.503	3.513
vm_thread_alive_check1	1.00	0.327	0.328
so_mandelbrot	0.99	6.776	6.817
vm1_not	0.99	0.958	0.965
vm2_poly_method_ov	0.98	0.422	0.43
io_file_write	0.98	1.78	1.815
so_nbody	0.97	4.861	5.012
vm2_eval	0.97	28.817	29.772
so_partial_sums	0.97	6.003	6.211
so_concatenate	0.96	5.322	5.536
so_exception	0.96	1.653	1.723
loop_for	0.95	1.865	1.96
vm1_neq	0.95	1.206	1.269
io_file_read	0.95	3.764	3.965
so_sieve	0.94	1.06	1.13
vm3_gc	0.93	1.593	1.712
so_pidigits	0.92	0.905	0.982
so_spectralnorm	0.92	3.911	4.265
vm2_proc	0.91	0.858	0.941
so_nested_loop	0.91	1.415	1.554
so_fannkuch	0.89	2.321	2.6
vm3_clearmethodcache	0.89	0.683	0.767
so_array	0.89	1.859	2.098
vm2_defined_method	0.88	4.449	5.034
io_file_create	0.88	4.266	4.831
vm1_simplereturn	0.88	2.034	2.307
vm_thread_mutex3	0.88	1.894	2.164
vm2_send	0.87	0.494	0.567
vm2_unif1	0.87	0.419	0.484
so_matrix	0.86	1.173	1.357
so_nsieve_bits	0.86	4.002	4.637
app_tak	0.86	1.133	1.316
so_lists	0.85	1.439	1.698
app_tarai	0.85	0.9	1.062
vm1_block	0.85	2.818	3.331
so_ackermann	0.82	0.891	1.082
vm_thread_mutex1	0.82	1.145	1.393
so_nsieve	0.80	3.188	3.967
vm2_super	0.80	0.68	0.852
so_binary_trees	0.79	0.492	0.62
app_strconcat	0.79	1.93	2.444
vm2_mutex	0.79	1.546	1.968
vm2_zsuper	0.78	0.682	0.88
vm2_poly_method	0.77	2.654	3.444
app_factorial	0.76	1.317	1.734
app_erb	0.76	1.834	2.426
so_meteor_contest	0.75	4.953	6.626
so_object	0.74	1.047	1.416
vm2_method	0.71	2.174	3.049
app_pentomino	0.70	25.525	36.325
app_fib	0.70	0.812	1.168
so_fasta	0.67	2.866	4.302
app_answer	0.66	0.075	0.113
app_mandelbrot	0.65	2.422	3.743
app_raise	0.49	0.861	1.743
vm_thread_mutex2	0.41	1.228	2.977

Considering all the results gcc-made ruby is faster than clang-made ruby but not in all the benchmarks. Hopefully the results must be an interesting example for some people.

Footnote and references

*1: make test-all shows that there is one issue in Fiber with the ruby made by clang.
http://ujihisa.blogspot.com/2011/01/how-to-build-mri-n-times-faster.html comparison between clang and gcc about time to build ruby

Monday, July 25, 2011

Compiling a C File, Preserving the LLVM File

Assuming you have a C file a.c and want to get a.ll and a.o.

An intuitive way is that

$ clang -flto -S a.c
$ mv a.s a.ll
$ clang -flto -c a.c

But the second clang looks waste of CPU; you generate an LLVM assembly language and then you generate an object file through LLVM assembly language without using the generated assembly file.

Let me break down the second command into small pieces..

$ clang -flto -S a.c
$ mv a.s a.ll
$ llc a.ll
$ gcc -c a.s

In GCC

$ gcc -c --save-temps a.c

oh... very easy...

http://www.qiita.com/questions/87

Saturday, July 23, 2011

String Interpolation in Haskell

Ruby has a special syntac in String literal which name is String Interpolation that you can write the result of a expression in a string easily.

"hello, #{name}!"

The notation is equivalent to the following notation.

"hello, " + name + "!"

That's not only available in Ruby but also in CoffeeScript.

I wrote this for Haskell in Haskell.

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Attoparsec.Text as P
import Control.Applicative

main = T.putStrLn $ stringInterpolate "as{df \"jkl#{x}jkw\"jl}kf"

(<<) = T.append

stringInterpolate :: T.Text -> T.Text
stringInterpolate x = case P.parseOnly stringInterpolate' x of
                           Left x -> T.pack x
                           Right x -> x

stringInterpolate' :: P.Parser T.Text
stringInterpolate' = P.try $
  T.concat <$> P.many (block <|> string <|> T.singleton `fmap` P.notChar '}')

block :: P.Parser T.Text
block = P.try $ do
  x <- (P.char '{') *> stringInterpolate' <* (P.char '}')
  return $ "{" << x << "}"

-- "jklsfj#{...}sdjfkl"
string :: P.Parser T.Text
string = P.try $ do
  x <- (P.char '"') *> stringContent <* (P.char '"')
  return $ "(\"" << x << "\")"

-- jklsfj#{...}sdjfkl
stringContent :: P.Parser T.Text
stringContent = P.try $
  T.concat <$> P.many (interpolate <|> T.singleton `fmap` P.notChar '"')

-- #{...}
interpolate :: P.Parser T.Text
interpolate = P.try $ do
  x <- P.string "#{" *> stringInterpolate' <* P.char '}'
  return $ "\" ++ " << x << " ++ \""

String concatination in Haskell is ++. The program above converts string literals like the below.

"hello, #{name}!"

("hello, " ++ name ++ "!")

You can write arbitrary expression in the placeholder.

"hello, #{[ toUpper c | c <- ['a'..'z'] ]}!"

("hello, " ++ [ toUpper c | c <- ['a'..'z'] ] ++ "!")

Thursday, June 30, 2011

Vanrb Lightning Talk Slides: Ruby and Vim

http://www.meetup.com/vancouver-ruby-rails/events/21277581/

structure

RSense

A Ruby development toolset

Code completion
Type inspection
Definition jump

RSense code completion

a.rb

a = '123'
b = a.to_i
b.

rsense command

$ rsense code-completion --file=a.rb --location=3:2
completion: to_enum Object#to_enum Object METHOD
completion: succ! String#succ! String METHOD
completion: swapcase! String#swapcase! String METHOD
completion: instance_variable_defined? Object#instance_variable_defined? Object METHOD
...

Implementation

Front-end command bin/rsense
Back-end server process with Java&JRuby
Type-inference algorithm "modified-Cartesian Product Algorithm"

limitations

classes in the buffer
1.8 built-in libraries
incorrect syntax
evals and exceptions

Vim

Vim

Vim and me

fullscreen MacVim
vimshell (no other terminals)
neocomplcache
unite
quickrun

RSense built-in Vim plugin

User-defined completion <C-x><C-u>

a = '123'
b = a.to_i
b._

Insert-mode completions in general

<C-n> Keyword
<C-x><C-f> File name
<C-x><C-s> English spelling
<C-x><C-o> Omni
<C-x><C-u> User-defined

neocomplcache

Insert-mode auto-completion plugin framework

No key mapping necessary
Integrates lots of completions

RSense as omni func

~/.vimrc

let g:rsenseUseOmniFunc = 1
let g:rsenseHome = expand('~/git/rsense')

let g:neocomplcache_omni_patterns = {}
let g:neocomplcache_omni_patterns.ruby = '[^. *\t]\.\w*\|\h\w*::'

one more thing

neco-: prefix for neocomplcache plugins
neco-rake: rake task completion

structure

http://ujihisa.blogspot.com
http://twitter.com/ujm
google rsense
google neocomplcache if you are a Vimmer

Sunday, June 12, 2011

String\#slice of Ruby in Vim script and Haskell

A Ruby example of String#[]

str = "Oh, this is a pen."
p str[/this is a (\w+)\./, 1]

The result is "pen". Since String#[] is just an alias of String#slice, (*1)

p str.slice(/this is a (\w+)\./, 1)

The result is completely same.

in Vim script

Vim script version needs two separate processes; getting the list of first-matched string itself and containing sub matches, and then getting the specific item.

let str = "Oh, this is a pen."
echo matchlist(str, 'this is a \(\w\+\)\.')[1]

(Added at Sun Jun 12 09:26:44 PDT 2011) thinca suggested that matchstr() with \zs and \ze is very handy, particularly because of the different behavior in the case when it didn't match.

let str = "Oh, this is a pen."
echo matchstr(str, 'this is a \zs\w\+\ze\.')

in Haskell

Haskell version needs three separate processes with Text.Regex.Posix.=~; it's almost same to Vim but the default =~ behaviour is to assume the regex object has "global" option, so you have to pick which match.

import Text.Regex.Posix ((=~))
main = do
  let str = "Oh this is a pen."
  print $ head (str =~ "this is a ([a-zA-Z_]*)" :: [[String]]) !! 1

(Added at Sun Jun 12 12:54:01 PDT 2011) The following code is another example; it's safe in runtime and also this supports Vim's \zs and \ze.

import Text.Regex.PCRE ((=~))
import Data.String.Utils (replace)
import Safe (headMay, atMay)
import Data.Maybe (fromMaybe)

matchstr :: String -> String -> String
matchstr expr pat =
  let x = replace "\\zs" "(" $ replace "\\ze" ")" pat in
  fromMaybe "" $ headMay (expr =~ x :: [[String]]) >>= \y -> atMay y 1

main = print $ matchstr "this is a pen" " \\zs\\w+\\ze"

in Clojure

(Added at Thu Mar 22 17:49:40 PDT 2012)

((re-find #"this is a (\w+)\." "Oh, this is a pen.") 1)
; "pen"

(*1) Precisely no. Try and check the differences between them without the second argument.

Tuesday, May 31, 2011

A Vim Workshop in Tokyo, Japan in May

I organized a Vim workshop in Tokyo, Japan. The name of the workshop was ujihisa.vim. More than 40 people tried to come but 20+ people attended at this workshop due to the capacity of the room.

The venue

I went to Tokyo for this workshop.

a photo by guyon

Thanks TMI and kana1!

Talks

shadow.vim with me
- shadow.vim https://github.com/ujihisa/shadow.vim
- arbitrary macros for arbitrary programming languages
- e.g. compiling CoffeeScript into JavaScript secretly
- it's not a framework but just a tool; you can combine it with others easily
- shadow.el is also available https://github.com/mooz/shadow.el
all your vimrc is belong to us with kana1
- picks a victim, points a lot of bad smells in the vimrc of the victim, and fix them!
- if the length of your vimrc is less than 1000, you aren't a Vimmer yet.
- syntax on is evil. use syntax enable.
i18n with niw
- strchars() doesn't treat multi-byte character correctly.
tiny-vim with guyon
- about Debian package "tiny-vim"
- no built-in documentation!
Vim in 5 minutes with Shougo
- neocomplcache, unite, vimshell and vimfiler author
- explained everything about Vim exactly within 5 minutes and extra minutes for further details
vital.vim with me
- it's amazing. use it if you are a Vim plugin author.
- something like jQuery (from JavaScript) + Bundler (from Ruby)
- there's an issue in vital.vim about debugging with quickrun. (*)
metarw-yakiudon with sora_h
- a metarw plugin yakiudon
- works almost exactly same as blogger.vim but the backend Ruby library is different
- after this presentation he talked about Ruby core development

(*) thinca, the quickrun author, fixed this issue with the following change in vital during the workshop!

Misc

Before and after the workshop some of the attendees had lunch and dinner with good Vim conversation. The topics were a lot.. vimshell, CoffeeScript, Titanium, HootSuite, Twitter, Ruby core, the ultimate programming language Zimbu, PHP vs Vim script, Emacs lisp and etc...

Quotes

"It's... it's not the Vim I knew..."
"I came to Japan from San Francisco only for attending at this workshop"
"Wow! HootSuite for iPad works amazingly!!!"

Monday, May 9, 2011

Workshops in Amagasaki, Japan by Me

I organized the 7th Vim workshop and a CoffeeScript workshop in Amagasaki, Japan on the last Saturday.

About 20 to 30 people attended the workshops. Even though the room had an air conditioner, the room was so hot and humid like sauna by them (and their MacBooks.)

I was supposed to make presentations with demonstrations, but unfortunately I forgot bringing my mini-DVI port for my initial version of MacBook Air. There were many MacBook Air attendees but nobody had the same version to me. I eventually gave up using my computer and just made presentation by friends' computers. Sorry for missing demonstrations.

Vim Workshop#7

There were five talks

Recommended Vim plugins (by me)
The implementation of lingr.vim (by tsukkee)
Bundle with Vundle (by kozo2)
Unite is useful, and rails.vim as well (by Sixeight)
All about vital.vim (by me)

I broadcasted them and recorded some of them (I just forgot recording the others.) http://ustream.tv/channel/uji

CoffeeScript Workshop

There were five talks.

Introduction to CoffeeScript -- syntax and usage (by me)
Node.js on Windows, and introduction to no.de (by cuzic)
iPhone app development with CoffeeScript and Titanium (by deguchi)
TDD with Jasmine and CoffeeScript (by http://twitter.com/nanki)
Details of CoffeeScript syntax (by yhara and me)

by fukui

Tuesday, February 22, 2011

Vim 7.3 Conceal Current Issue

Vim 7.3 new feature Conceal has an issue that when you change your colorscheme the current conceal information will be partly removed automatically.

I found this issue with the combination of Haskell-Conceal plugin and unite-colorscheme plugin which internally uses tabpagecolorscheme plugin. Every time I moved the tab, the concealed text highlighted wrongly. After a research I figured out that it was not by changing the tab but by changing colorscheme into the same colorscheme.

First of all, the idiom to define a concealed token in a syntax file is (1) to define a syntax match {your favourite name here} "{regexp}" conceal cchar={the replacement} (2) and to link the color by highlight! link Conceal {token name}. Note that normal highlights only needs to hightlight without the bang.

The main part of the implementation of colorscheme command is defined as below which is from src/syntax.c of vim repository.

6755 /*
6756  * Load color file "name".
6757  * Return OK for success, FAIL for failure.
6758  */
6759     int
6760 load_colors(name)
6761     char_u *name;
6762 {
6763     char_u *buf;
6764     int        retval = FAIL;
6765     static int recursive = FALSE;
6766 
6767     /* When being called recursively, this is probably because setting
6768      * 'background' caused the highlighting to be reloaded.  This means it is
6769      * working, thus we should return OK. */
6770     if (recursive)
6771    return OK;
6772 
6773     recursive = TRUE;
6774     buf = alloc((unsigned)(STRLEN(name) + 12));
6775     if (buf != NULL)
6776     {
6777    sprintf((char *)buf, "colors/%s.vim", name);
6778    retval = source_runtime(buf, FALSE);
6779    vim_free(buf);
6780 #ifdef FEAT_AUTOCMD
6781    apply_autocmds(EVENT_COLORSCHEME, NULL, NULL, FALSE, curbuf);
6782 #endif
6783     }
6784     recursive = FALSE;
6785 
6786     return retval;
6787 }

Basically this function runs :source colors/{the colorscheme name}.vim and them runs autocmds already registers in Colorscheme event. This function itself looks good.

Every colorscheme files run :highlight clear in the beginning of the files. This removes all highlights except for user-defined ones. Thus it preserves syntactic information of each syntax files, but breaks conceal information given by each syntax files, because it's not treated as user-defined one but as just extended built-in one.

Possible solutions

I propose the following solutions. Each of them are independent and exclusive.

Remove the procedure of removing conceal information from :highlight clear command
Fix colorscheme command to recover the conceal information after changing colorscheme
The behaviour is specification. Just add documentation in Vim core to persuade conceal users to add autocmd event to change conceal again after colorscheme changed.

The below is the patch for the solution (a).

diff --git src/syntax.c src/syntax.c
index 369311f..90165cf 100644
--- src/syntax.c
+++ src/syntax.c
@@ -6572,10 +6572,6 @@ static char *(highlight_init_light[]) =
  CENT("ColorColumn term=reverse ctermbg=LightRed",
       "ColorColumn term=reverse ctermbg=LightRed guibg=LightRed"),
 #endif
-#ifdef FEAT_CONCEAL
-   CENT("Conceal ctermbg=DarkGrey ctermfg=LightGrey",
-        "Conceal ctermbg=DarkGrey ctermfg=LightGrey guibg=DarkGrey guifg=LightGrey"),
-#endif
 #ifdef FEAT_AUTOCMD
  CENT("MatchParen term=reverse ctermbg=Cyan",
       "MatchParen term=reverse ctermbg=Cyan guibg=Cyan"),
@@ -6662,10 +6658,6 @@ static char *(highlight_init_dark[]) =
  CENT("MatchParen term=reverse ctermbg=DarkCyan",
       "MatchParen term=reverse ctermbg=DarkCyan guibg=DarkCyan"),
 #endif
-#ifdef FEAT_CONCEAL
-   CENT("Conceal ctermbg=DarkGrey ctermfg=LightGrey",
-        "Conceal ctermbg=DarkGrey ctermfg=LightGrey guibg=DarkGrey guifg=LightGrey"),
-#endif
 #ifdef FEAT_GUI
  "Normal gui=NONE",
 #endif

Saturday, February 5, 2011

Delegation in Vim script

In case when you define a function F1 which behaves same as a given function F2 in Vim script, you may just choose the following cases depending on the arguments of F2.

The solution is just to use call().

function! F1(...)
  return call('F2', a:000)
endfunction

But if you also want to preserve the definition of arguments? There are still solution of each cases.

Named arguments (or no arguments)

function! F1(x)
  return F2(a:x)
endfunction

Unnamed arguments

function! F1(...)
  return call('F2', a:000)
endfunction

Then you can make mixed version, using call().

function! F1(x, ...)
  return call('F2', insert(deepcopy(a:000), a:x))
endfunction

Basically this is "cons"ing a named variable into the top of the list.

Summary

call() is so handy.

Shougo Matsu, the neocomplcache and unite author, suggested me to use call() instead of the following "First version" of this blog post. That's absolutely better idea than I had. Thanks Shougo!

First version

The following sentense was legacy. I leave them just for your additional information.

Named arguments (or no arguments)

function! F1(x)
  return F2(a:x)
endfunction

Unnamed arguments

function! F1(...)
  return eval('F2(' . join(map(range(1, a:0), '"a:" . v:val'), ', ') . ')')
endfunction

This looks more difficult than it should be. Let me explain it.

a:0 is the number of the arguments of the function. If you call F1(x, y, z), a:0 is 3. You can access each arguments as a:1, a:2 and a:3.

range(1, a:0)

This becomes a list [1, 2, 3] if the number of the arguments of F1 is 3.

map(range(1, a:0), '"a:" . v:val'), ', ')

This becomes 'a:1, a:2, a:3'. Then you can make a string "F2(a:1, a:2, a:3)" dynamically, and call it by eval.

Then you can make mixed version easily.

function! F1(x, ...)
  return eval('F2(a:x, ' . join(map(range(1, a:0), '"a:" . v:val'), ', ') . ')')
endfunction

The difference is only just "F2(a:x, " which was originally "F2(".

How To Build MRI n Times Faster

Clang, a compiler front end which you can use instead of gcc, can build MRI: Matz Ruby Implementation, only if you succeeded in avoiding all pitfalls.

Build the latest LLVM and Clang

Mac OS X Snow Leopard has builtin clang command but it's too old to build ruby. First of all, get the latest clang and it's dependency, the latest LLVM.

$ svn checkout http://llvm.org/svn/llvm-project/llvm/trunk llvm && cd llvm/tools && svn checkout http://llvm.org/svn/llvm-project/cfe/trunk clang && cd ..
$ ./configure && time make

Note that it takes really long time. I recommend you not to build before you go away from keyboard for a while but to build before you go to bed.

A simple test

hello.c

#include<stdio.h>
int main() {
  puts("Hello, world!");
  return 0;
}

building on gcc and clang

$ time gcc hello.c -o hello-gcc
!!!        0.34 real         0.01 user         0.03 sys!!!

[%] /private/tmp
$ time ~/src/llvm/Debug+Asserts/bin/clang hello.c -o hello-clang
!!!        0.11 real         0.06 user         0.02 sys!!!

clang is 3 times faster than gcc to build the hello world code. The executable files work samel.

Ruby

On GCC

$ git clone https://github.com/ruby/ruby.git ruby193-gcc
$ cd ruby193-gcc
$ autoconf
$ ./configure --prefix=`pwd`/local --enable-pthread --disable-install-doc --with-out-ext=tk\* --program-suffix=193 --with-readline-dir=/usr/local --with-arch=x86_64
$ time make
...
!!!      214.93 real       191.58 user        23.57 sys!!!

On Clang

$ git clone https://github.com/ruby/ruby.git ruby193
$ cd ruby193
$ ./configure --prefix=`pwd`/local --enable-pthread --disable-install-doc --with-out-ext=tk\* --program-suffix=193 --with-readline-dir=/usr/local PATH=/Users/ujihisa/src/llvm/Debug+Asserts/bin/:$PATH CC=/Users/ujihisa/src/llvm/Debug+Asserts/bin/clang CFLAGS=-Qunused-arguments CPPFLAGS=-Qunused-arguments --with-arch=x86_64
$ time make
!!!      932.06 real       883.44 user        31.09 sys!!!

It took 3 min 34.93 sec by GCC while it took 15 min 32.06 sec by Clang. GCC is 4.34 times faster than Clang to build MRI.

Pitfalls

If you use clang command of Mac OS X Snow Leopard, even though you'll succeed in finishing make ruby 1.9.3, but you cannot do anything by the Ruby.

$ ./local/bin/ruby -ve 1
ruby 1.9.3dev (2011-01-25 trunk 30651) [x86_64-darwin10.6.0]
!!!dyld: lazy symbol binding failed: Symbol not found: _rb_encdb_declare!!!
!!!  Referenced from: /Users/ujihisa/git/ruby193/local/lib/ruby/1.9.1/x86_64-darwin10.6.0/enc/encdb.bundle!!!
!!!  Expected in: flat namespace!!!

!!!dyld: Symbol not found: _rb_encdb_declare!!!
!!!  Referenced from: /Users/ujihisa/git/ruby193/local/lib/ruby/1.9.1/x86_64-darwin10.6.0/enc/encdb.bundle!!!
!!!  Expected in: flat namespace!!!

!!!/Users/ujihisa/git/ruby193/local/lib/ruby/1.9.1/x86_64-darwin10.6.0/enc/encdb.bundle: [BUG] Segmentation fault!!!
!!!ruby 1.9.3dev (2011-01-25 trunk 30651) [x86_64-darwin10.6.0]!!!

!!!-- Control frame information -----------------------------------------------!!!
!!!c:0002 p:-537663266 s:0004 b:0004 l:000003 d:000003 TOP   !!!
!!!c:0001 p:0000 s:0002 b:0002 l:001dd8 d:001dd8 TOP   !!!


!!!-- See Crash Report log file under ~/Library/Logs/CrashReporter or ---------!!!
!!!-- /Library/Logs/CrashReporter, for the more detail of ---------------------!!!
!!!-- C level backtrace information -------------------------------------------!!!

!!!-- Other runtime information -----------------------------------------------!!!

!!!* Loaded script: ./local/bin/ruby!!!

!!!* Loaded features:!!!

!!!    0 enumerator.so!!!

!!![NOTE]!!!
!!!You may have encountered a bug in the Ruby interpreter or extension libraries.!!!
!!!Bug reports are welcome.!!!
!!!For details: http://www.ruby-lang.org/bugreport.html!!!


!!!vimshell: signal 6(SIGABRT) "./local/bin/ruby -ve 1"!!!

Conclusion

You can build Ruby 1.9.3 by Clang. It's 0.23 times faster than building by GCC.

Wednesday, January 5, 2011

A Contributed Article to Kernel/VM Advent Calendar: VIM=VM

There are two kinds of programmers; one uses Vim and the other uses Emacs. I don't think there are positive counter opinions about that Emacs is not an editor but an environment. On the other hand, there are some opinions that Vim is an editor or not, even though everyone agrees with the fact that vi is an editor. Some people, including the author of this article, claim that Vim is a virtual machine. Here I give you instructions of a Vim plugin shadow.vim instead of explaining why Vim is VM directly.

shadow.vim

The Vim plugin shadow.vim supports wrapping a virtual file with a program automatically with thin configuration.

Here it's a quote from a programmer:

"Nobody knows the Java code you committed is originally written in Scheme."

Usage

Assuming the product is a.pl, create a.pl.shd first.

a.pl (in Vim):

## ruby -e 'puts $<.read.gsub(/$/, ";")'
$a = 1
print($a)

Open a.pl in Vim. The Vim actually shows the contents of a.pl.shd. When you save the file, the command in the first line without ## runs, then the actual a.pl will be the result.

a.pl (actually):

$a = 1;
print($a);

Install

Unarchive the zip file into a directory that is under &rtp of your Vim, which stands for run time path, including ~/.vim dir.

Use Case

Here there are three examples, but you can use more general purposes.

Commit JavaScript files which was written in CoffeeScript

before

    ## coffee -csb
    f = (x) -> x + 1
    print f 10

    # vim: set ft=coffee :

after

    var f;
    f = function(x) {
      return x + 1;
    };
    print(f(10));

Use cpp command before committing Java files.
Markdown, Haml or something else to HTML

More examples

You can write code in C and save the file as GNU Assembly Language on your Linux automatically, so you can avoid portability.

References

VIM=VM http://www.slideshare.net/Shougo/vim-presentation-5330807 by Shougo
http://atnd.org/events/10701

Monday, December 26, 2011

A Language

Thursday, December 22, 2011

Implementation and description

Tuesday, December 20, 2011

(added at Tue Dec 20 23:57:12 PST 2011)

Sunday, December 18, 2011

Brief introduction to Compojure

Parameters

Tuesday, November 15, 2011

Sunday, November 13, 2011

Thursday, November 3, 2011

Run-length encoding

Huffman coding

BPE: Byte Pair Encoding

Monday, October 31, 2011

In Haskell

In C

When the big number isn't long long

Summary

Sunday, October 30, 2011

Friday, October 28, 2011

JavaScript (node.js)

Ruby

Python

Vim script

Haskell

Tuesday, October 25, 2011

ToString

Use case

Footnote

Wednesday, October 5, 2011

neco-rubymf.vim

Wednesday, August 3, 2011

Building clang 3.0

Building ruby 1.9.3 with clang

Benchmark

Footnote and references

Monday, July 25, 2011

In GCC

Saturday, July 23, 2011

Thursday, June 30, 2011

RSense

RSense code completion

Implementation

limitations

Vim

Vim and me

RSense built-in Vim plugin

Insert-mode completions in general

neocomplcache

RSense as omni func

one more thing

Sunday, June 12, 2011

in Vim script

in Haskell

in Clojure

Tuesday, May 31, 2011

The venue

Talks

Misc

Quotes

Monday, May 9, 2011

Vim Workshop#7

CoffeeScript Workshop

See also

Tuesday, February 22, 2011

Possible solutions

Saturday, February 5, 2011

Summary

First version

Thursday, January 27, 2011

Build the latest LLVM and Clang

A simple test

Ruby

Pitfalls

Conclusion

Wednesday, January 5, 2011

shadow.vim

Usage