# Standards

Blogged by Ujihisa. Standard methods of programming and thoughts including Clojure, Vim, LLVM, Haskell, Ruby and Mathematics written by a Japanese programmer. github/ujihisa

## Monday, October 31, 2011

### Optimizer Comparisons -- Skip Nops

The following Haskell code is supposed to just output "hello".

``````main = print \$ times 1000000000000 id "hello"

times 0 _ x = x
times n f x = times (n-1) f (f x)
``````

The `times` function I defined applies the given function many times.

• `times 3 f x` is...
• `times 2 f (f x)`
• `times 1 f (f (f x))`
• `times 0 f (f (f (f x)))`
• `f (f (f x))`

`id`, defined in Prelude, just returns the given argument. You may want to assume the GHC internal optimizer to eliminate any `id` call. In other words, the Haskell code in the very beginning is supposed to be transformed just to `main = print "hello"`.

Unfortunately, no it doesn't, even with `-O2` option.

``````\$ ghc -O2 a.hs -o a
\$ ./a
... (long long hours) ..
``````

## In C

I just translated the original Haskell code to C.

``````#include "stdio.h"

char *id(char *x) {
return x;
}
char *times(long long n, char *(*f)(char *), char *x) {
if (n == 0) {
return x;
} else {
return times(n-1, f, f(x));
}
}

int main(int argc, char const* argv[]) {
puts(times(1000000000000LL, id, "hello"));
return 0;
}
``````

I tried it with GCC and Clang.

GCC on Mac:

• i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)

``````\$ gcc -O2 a.c -o a
\$ ./a
... (long long hours) ..
``````

The assembly code for the `main` function with `-O2 -m32` option is the following.

``````_main:
pushl %ebp
movl  %esp, %ebp
pushl %ebx
subl  \$20, %esp
call  L13
"L00000000001\$pb":
L13:
popl  %ebx
leal  LC0-"L00000000001\$pb"(%ebx), %eax
movl  %eax, 8(%esp)
leal  _id-"L00000000001\$pb"(%ebx), %eax
movl  %eax, 4(%esp)
movl  \$-727379968, (%esp)
call  _times
movl  %eax, (%esp)
call  _puts
xorl  %eax, %eax
popl  %ebx
leave
ret
``````

You see it's calling `_times` for sure.

Clang:

``````\$ clang -O2 a.c -o a
\$ ./a
hello
``````

This gives you the message immediately!

The below is the generated LLVM IR. You can easily tell that it just calls `puts` function directly. Note that the `@.str` const is just "hello" ending with null.

``````define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable ssp {
entry:
%call1 = tail call i32 @puts(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0)) nounwind
ret i32 0
}
``````

GCC on Linux:

• gcc 4.4

``````main:
pushl %ebp
movl  %esp, %ebp
andl  \$-16, %esp
subl  \$16, %esp
movl  \$.LC0, (%esp)
call  puts
xorl  %eax, %eax
leave
ret
``````

It ran like the one with Clang!

## When the big number isn't long long

I wrote the first argument of `times` in C as `int` at first.

I got warning messages from both GCC and Clang that the value, 1000000000000, was too big for int. In the case, GCC on Linux made the following assembly code.

``````main:
pushl %ebp
movl  %esp, %ebp
.L11:
jmp   .L11
``````

It's just an infinite loop! That actually makes sense because any int values never go to non-int value 1000000000000.

## Summary

• GHC doesn't eliminate `id` in optimization
• GCC eliminates it except for Apple GCC 4.2
• Clang eliminates id
• Some optimizations can conflict if the code depends on undefined behaviour

## Sunday, October 30, 2011

### How To Compile C to IA32

Q. How to compile a C file to IA32 assembly language on AMD64 Debian?

``````\$ gcc -S -m32 a.c
In file included from /usr/include/features.h:378,
from /usr/include/stdio.h:28,
from a.c:1:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
``````

A. install `gcc-multilib`.

``````\$ sudo aptitude install gcc-multilib
``````

## Friday, October 28, 2011

### URL Encodings

URL Encoding, specifically Percent-encoding, is one of common encodings that you may use in arbitorary programming languages.

Let me encode a text into URL Encoding, which includes UTF-8 Japanese characters.

## JavaScript (node.js)

Use `encodeURIComponent`.

``````var sys = require('sys');
sys.puts(encodeURIComponent('abc あいう-#!@'));
``````

result:

``````abc%20%E3%81%82%E3%81%84%E3%81%86-%23!%40
``````

Note that `escape` and `encodeURI` aren't proper for URI encoding.

``````sys.puts(escape('abc あいう-#!@'));
//=> abc%20%u3042%u3044%u3046-%23%21@
sys.puts(encodeURI('abc あいう-#!@'));
//=> abc%20%E3%81%82%E3%81%84%E3%81%86-#!@
sys.puts(encodeURIComponent('abc あいう-#!@'));
//=> abc%20%E3%81%82%E3%81%84%E3%81%86-%23!%40
``````

(Thanks @javascripter!)

## Ruby

Use `ERB::Util.url_encode`.

``````# coding: utf-8
require 'erb'
puts ERB::Util.url_encode 'abc あいう-#!@'
``````

result:

``````abc%20%E3%81%82%E3%81%84%E3%81%86-%23%21%40
``````

This result is exactly same to JavaScript's except for "!".

Note that `URI.encode` or `URI.encode_www_form_component` aren't proper.

``````require 'uri'
puts URI.encode 'abc あいう-#!@'
#=> abc%20%E3%81%82%E3%81%84%E3%81%86-%23!@
puts URI.encode_www_form_component 'abc あいう-#!@'
#=> abc+%E3%81%82%E3%81%84%E3%81%86-%23%21%40
``````

## Python

Use `urllib.quote`.

``````# coding: utf-8
import urllib
print urllib.quote('abc あいう-#!@')
``````

result:

``````abc%20%E3%81%82%E3%81%84%E3%81%86-%23%21%40
``````

(Thanks @raa0121!)

## Vim script

Use `Vital.Web.Http.encodeURI`.

``````let H = vital#of('vital').import('Web.Http')
echo H.encodeURI('abc あいう-#!@')
``````

result:

``````abc%20%E3%81%82%E3%81%84%E3%81%86-%23%21%40
``````

Use `Network.HTTP.urlEncode`. But...

``````import Network.HTTP (urlEncode)
main = putStrLn \$ urlEncode "abc あいう-#!@"
``````

result:

``````abc%20%30%42%30%44%30%46-%23%21%40
``````

oh...

## Tuesday, October 25, 2011

### Converting From/To String

Haskell has three dominant types of the implementations of strings; String, ByteString and Text(*1). Since String is the default literal of strings, most libraries only support String as strings, even though you can write ByteString or Text directly in a Haskell code with `OverloadedStrings` pragma.

Haskell has `IsString` class that means you can convert the type to String, using `fromString` function. `IsString` only requires the type to have the `fromString` function.

``````-- in Data.String
class IsString a where
fromString :: String -> a
``````

## ToString

I implemented `ToString` class that you can convert from the type to String by `toString` function, oppositely to the `IsString` class and its `fromString` function.

``````{-# LANGUAGE TypeSynonymInstances #-}
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString as B

class ToString a where
toString :: a -> String

instance ToString String where
toString = id

instance ToString T.Text where
toString = T.unpack

instance ToString B.ByteString where
toString = C.unpack
``````

You can use them like the following way.

``````{-# LANGUAGE OverloadedStrings #-}
main = do
putStrLn \$ toString ("hello world!" :: String)
putStrLn \$ toString ("hello world!" :: T.Text)
putStrLn \$ toString ("hello world!" :: B.ByteString)
``````

It might remind you about `show :: a -> String`. `show` and the `toString` I defined here have the same type but the purpose and the behaviour are different. `show` is to give human-readable and machine-parsable String represantation while `toString` is just to convert from a type which is an implementation of the concept of strings to String.

If you are familiar with Ruby, this may make sense; `show` in Haskell is `inspect` in Ruby while `toString` in Haskell is `to_str` in Ruby. Note that that's not `to_s` whom Integer or some other non-string classes also have.

## Use case

An example is Network.HTTP.urlEncode function; the function is `String -> String` but you may want to use it for Text values.

``````import qualified Data.Text as T
import qualified Network.HTTP as H
urlEncodeText :: T.Text -> T.Text
urlEncodeText = T.pack . H.urlEncode . T.unpack
``````

You can define Text-specific version of `urlEncode` easily. You can define ByteString-specific version as well in the same way. But you cannot have general one.

It's turn to use the `ToString` we saw here.

``````{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeSynonymInstances #-}
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString as B
import qualified Network.HTTP as H
import Data.String (fromString, IsString)

main = do
putStrLn \$ urlEncode ("hello world!" :: String)
T.putStrLn \$ urlEncode ("hello world!" :: T.Text)
B.putStrLn \$ urlEncode ("hello world!" :: B.ByteString)

class ToString a where
toString :: a -> String

instance ToString String where
toString = id

instance ToString T.Text where
toString = T.unpack

instance ToString B.ByteString where
toString = C.unpack

urlEncode :: IsString a => ToString a => a -> a
urlEncode = fromString . H.urlEncode . toString
``````

This `urlEncode` function defined here can use String, ByteString and Text just by the single function! I just combined IsString and ToString there :)

## Wednesday, October 5, 2011

### Completing Method Names in Ruby Based on Input and Output

I described how to auto-complete method names of an object on Vim but here let me describe about a bit more restricted one. How can we auto-complete method names of an object that goes to a given object?

An example here that you send `succ` or `next` on an object 1 when you want to get an object 2.

``````1.succ #=> 2
``````

Another example. `first` method will let you get an object `:a` by `[:a, :b, :c]`.

``````[:a, :b, :c].first #=> :a
``````

It is natural that you expect Vim automatically completes the method names when the cursor is on the `.` just after the object.

## neco-rubymf.vim

A neocomplcache plugin, neco-rubymf, does exactly what I described above.

Installation:

• Install a Rubygems package `methodfinder`.

``````\$ gem install methodfinder
``````
• Install the neocomplcache plugin neco-rubymf. If your Vim still don't have neocomplcache, install it as well.

Try inputting the following line in a buffer where the `'filetype'` is `ruby`. Note that ¶ means cursor here.

``````1.¶ #=> 2
``````

That shows methods `next` and `succ` in completion popup automatically, which the method an object 1 owns and also the returning value is 2.

Similarly,

``````[:a, :b, :c].¶ #=> :a
``````

gives this.

The screenshots below are other examples.

※The last example occurs stochastically.