Standards

Wednesday, May 13, 2009

Fixed uri-open of Termtter

Termtter's uri-open plugin

I committed some to Termtter. The following diff is the summary of my patches today.

diff --git a/lib/plugins/uri-open.rb b/lib/plugins/uri-open.rb
index 0afe73f..06a521b 100644
--- a/lib/plugins/uri-open.rb
+++ b/lib/plugins/uri-open.rb
@@ -8,44 +8,43 @@ module Termtter::Client
     :points => [:output],
     :exec_proc => lambda {|statuses, event|
       statuses.each do |s|
-        public_storage[:uris] += s[:text].scan(%r|https?://[^\s]+|)
+        public_storage[:uris] = s[:text].scan(%r|https?://[^\s]+|) + public_storage[:uris]
       end
     }
   )

   def self.open_uri(uri)
-    unless config.plugins.uri_open.browser.empty?
-      system config.plugins.uri_open.browser, uri
-    else
-      case RUBY_PLATFORM
-      when /linux/
-        system 'firefox', uri
-      when /mswin(?!ce)|mingw|bccwin/
-        system 'explorer', uri
+    cmd =
+      unless config.plugins.uri_open.browser.empty?
+        config.plugins.uri_open.browser
       else
-        system 'open', uri
+        case RUBY_PLATFORM
+        when /linux/; 'firefox'
+        when /mswin(?!ce)|mingw|bccwin/; 'explorer'
+        else; 'open'
+        end
       end
-    end
+    system cmd, uri
   end

   register_command(
     :name => :'uri-open', :aliases => [:uo],
-    :exec_proc => lambda{|arg|
+    :exec_proc => lambda {|arg|
       case arg
-      when /^\s+$/
-        public_storage[:uris].each do |uri|
-          open_uri(uri)
-        end
-        public_storage[:uris].clear
+      when ''
+        open_uri public_storage[:uris].shift
       when /^\s*all\s*$/
-        public_storage[:uris].each do |uri|
-          open_uri(uri)
-        end
-        public_storage[:uris].clear
+        public_storage[:uris].
+          each {|uri| open_uri(uri) }.
+          clear
       when /^\s*list\s*$/
-        public_storage[:uris].each_with_index do |uri, index|
-          puts "#{index}: #{uri}"
-        end
+        public_storage[:uris].
+          enum_for(:each_with_index).
+          to_a.
+          reverse.
+          each  do |uri, index|
+            puts "#{index}: #{uri}"
+          end
       when /^\s*delete\s+(\d+)\s*$/
         puts 'delete'
         public_storage[:uris].delete_at($1.to_i)
@@ -53,19 +52,20 @@ module Termtter::Client
         public_storage[:uris].clear
         puts "clear uris"
       when /^\s*(\d+)\s*$/
-        open_uri(public_storage[:uris][$1.to_i])
-        public_storage[:uris].delete_at($1.to_i)
+        open_uri(public_storage[:uris].delete_at($1.to_i))
+      else
+        puts "**parse error in uri-open**"
       end
     },
-    :completion_proc => lambda{|cmd, arg|
-      %w(all list delete clear).grep(/^#{Regexp.quote arg}/).map{|a| "#{cmd} #{a}"}
+    :completion_proc => lambda {|cmd, arg|
+      %w(all list delete clear).grep(/^#{Regexp.quote arg}/).map {|a| "#{cmd} #{a}" }
     }
   )
 end
-# ~/.termtter
+# ~/.termtter/config
 # plugin 'uri-open'
 #
 # see also: http://ujihisa.nowa.jp/entry/c3dd00c4e0
 #
 # KNOWN BUG
-# * In Debian, exit or C-c in the termtter kills your firefox.
+# * In Debian, exit or C-c in the termtter would kill your firefox.

Lisp like symbol in Ruby

I succeeded in adding a lisp like symbol literal to ruby by fixing parse.y.

diff --git a/parse.y b/parse.y
index e2e92ce..c9fbb03 100644
--- a/parse.y
+++ b/parse.y
@@ -6319,6 +6319,9 @@ static int
 parser_yylex(struct parser_params *parser)
 {
     register int c;
+    int cs[1024];
+    int i;
+    char flag; // 't': sTring, 'y': sYmbol
     int space_seen = 0;
     int cmd_state;
     enum lex_state_e last_state;
@@ -6631,8 +6634,26 @@ parser_yylex(struct parser_params *parser)
         return tXSTRING_BEG;

       case '\'':
-        lex_strterm = NEW_STRTERM(str_squote, '\'', 0);
-        return tSTRING_BEG;
+        flag = 'y'; // sYmbol by default
+        for (i=0; i<sizeof(cs); i++) {
+            cs[i] = nextc();
+            if (cs[i] == '\'') {
+                flag = 't'; // sTring
+                break;
+            }
+            if (cs[i] == '\n' || cs[i] == -1) {
+                break; // sYmbol
+            }
+        }
+        while (i >= 0)
+            pushback(cs[i--]);
+        if (flag == 't') {
+            lex_strterm = NEW_STRTERM(str_squote, '\'', 0);
+            return tSTRING_BEG;
+        } else {
+            lex_state = EXPR_FNAME;
+            return tSYMBEG;
+        }

       case '?':
         if (lex_state == EXPR_END || lex_state == EXPR_ENDARG) {

After applying this patch, we can write such like:

p 'aaa'      #=> "aaa"
p 'bbb       #=> :bbb
p 'ccc, true #=> :ccc
             #   true
p :ddd       #=> :ddd

The followings are the methodology how it works:

If the ruby parser finds single quote, the parser peeks the next characters from the next character
The parser stocks each characters into a stack which maximum size is 1024
If the parser finds another single quote, the parser does back tracking with the stack and then considers the following characters as a string.
Otherwise if the parser does not find another single quote within the line, the parser does back tracking with the stack and then considers the following characters as a symbol.

Obviously, there are some known bugs:

It cannot handle a line which has 1024+ characters after a quote
It confuses in p 'aaa, 'bbb
It cannot handle a multi line string

e.g.

p 'aaa
   bbb'

To tell the truth, I wrote this patch just for my curiousity. I don't believe this syntax will harmonize well with Ruby.

Tuesday, May 12, 2009

Cryptic infinite recursion in parse.y

When a single quote comes, parse.y will do:

case '\'':
  lex_strterm = NEW_STRTERM(str_squote, '\'', 0);
  return tSTRING_BEG;

This piece of codes is very same as a part of :'aaa'

NEW_STRTERM is a macro. It is a wrapper of a macro rb_node_newnode, which is a wrapper of a function node_newnode

#define NEW_STRTERM(func, term, paren) \
        rb_node_newnode(NODE_STRTERM, (func), (term) | ((paren) << (CHAR_BIT * 2)), 0)

#define rb_node_newnode(type, a1, a2, a3) node_newnode(parser, type, a1, a2, a3)

static NODE*
node_newnode(struct parser_params *parser, enum node_type type, VALUE a0, VALUE a1, VALUE a2)
{
    NODE *n = (rb_node_newnode)(type, a0, a1, a2);
    nd_set_line(n, ruby_sourceline);
    return n;
}

The last function node_newnode calls rb_node_newnode, that is node_newnode itself, recursively.

Here is the node_newnode which macro is expanded:

static NODE*
node_newnode(struct parser_params *parser, enum node_type type, VALUE a0, VALUE a1, VALUE a2)
{
    NODE *n = node_newnode(parser, type, a0, a1, a2);
    nd_set_line(n, ruby_sourceline);
    return n;
}

It's mysterious. node_newnode seems infinite recursion without any terminate conditions. Why does it work?

Read parse.y and learn about a colon

In Ruby, there are a lot of usages of colon.

Symbol :aaa
Symbol with single quotes :'aaa'
Symbol with double quotes :"aaa"
Conditional operator aaa ? bbb : ccc
Class hierarchy A::B

The following is an excerpt from parse.y relates a colon:

case ':':
  c = nextc();
  if (c == ':') {
      if (IS_BEG() ||
          lex_state == EXPR_CLASS || (IS_ARG() && space_seen)) {
          lex_state = EXPR_BEG;
          return tCOLON3;
      }
      lex_state = EXPR_DOT;
      return tCOLON2;
  }
  if (lex_state == EXPR_END || lex_state == EXPR_ENDARG || (c != -1 && ISSPACE(c))) {
      pushback(c);
      lex_state = EXPR_BEG;
      return ':';
  }
  switch (c) {
    case '\'':
      lex_strterm = NEW_STRTERM(str_ssym, c, 0);
      break;
    case '"':
      lex_strterm = NEW_STRTERM(str_dsym, c, 0);
      break;
    default:
      pushback(c);
      break;
  }
  lex_state = EXPR_FNAME;
  return tSYMBEG;

If the next character of ':' is ':', in a nutshell if '::' has come, it returns tCOLON3 (:: at the very left) or tCOLON2 (:: at the right hand). Otherwise if the colon is just after expressions, it's a conditional operator. Ditto if the next character is a space. Otherwise finally the colon is a prefix of symbol.

I've never known that we can have a newline after ::!

class A::

B
end

It's valid.

OK... So, let's try to add a new literal :-) which has equivalent to =>.

diff --git a/parse.y b/parse.y
index e2e92ce..8e49bf4 100644
--- a/parse.y
+++ b/parse.y
@@ -7082,6 +7082,9 @@ parser_yylex(struct parser_params *parser)

       case ':':
         c = nextc();
+        if (c == '-' && nextc() == ')') {
+            return tASSOC;
+        }
         if (c == ':') {
             if (IS_BEG() ||
                 lex_state == EXPR_CLASS || (IS_ARG() && space_seen)) {

That's easy.

$ ./ruby -e 'p({ 1 :-) 2 })'
{1=>2}

cool!

Monday, May 11, 2009

Save the iTerm

System Preferences

Now you don't have to care about typing Cmd-Q or Cmd-W by mistake.

Thursday, May 7, 2009

About Array#<=> (a.k.a. Spacecraft Operator)

The documentations about Array#<=> are:

http://www.ruby-doc.org/core/classes/Array.html#M002227 (in English)
http://doc.okkez.net/191/view/method/Array/i/=3c=3d=3e (in Japanese)

For example:

[1, 2, 3, 4] <=> [1, 2, 3, 3] #=> 1
[1, 2, 3, 4] <=> [1, 2, 3, 4] #=> 0
[1, 2, 3, 4] <=> [1, 2, 3, 5] #=> -1
[1, 2, 3, 4] <=> [1, 2, 3] #=> 1

The implementation in MRI 1.9 is:

VALUE
rb_ary_cmp(VALUE ary1, VALUE ary2)
{
    long len;
    VALUE v;

    ary2 = to_ary(ary2);
    if (ary1 == ary2) return INT2FIX(0);
    v = rb_exec_recursive(recursive_cmp, ary1, ary2);
    if (v != Qundef) return v;
    len = RARRAY_LEN(ary1) - RARRAY_LEN(ary2);
    if (len == 0) return INT2FIX(0);
    if (len > 0) return INT2FIX(1);
    return INT2FIX(-1);
}

static VALUE
recursive_cmp(VALUE ary1, VALUE ary2, int recur)
{
    long i, len;

    if (recur) return Qnil;
    len = RARRAY_LEN(ary1);
    if (len > RARRAY_LEN(ary2)) {
        len = RARRAY_LEN(ary2);
    }
    for (i=0; i<len; i++) {
        VALUE v = rb_funcall(rb_ary_elt(ary1, i), id_cmp, 1, rb_ary_elt(ary2, i));
        if (v != INT2FIX(0)) {
            return v;
        }
    }
    return Qundef;
}

I translated it from C to Ruby literally:

class Array
  def yet_another_cmp(you)
    you = you.to_ary
    return 0 if self == you
    v = nil
    (0...[self.length, you.length].min).each do |i|
      v = self[i] <=> you[i]
      return v if v != 0
    end
    len = self.length - you.length
    return 0 if len == 0
    return 1 if len > 0
    -1
  end
end

It works exactly the same.

You may think it is easy to write a simpler equivalent implement with Array#zip, but unfortunately I found that it was not so simple. The following is the simplest code I can write. Of course it works exactly the same as original <=>.

class Array
  def simple_cmp(you)
    self.zip(you) {|x, y|
      break if (x && y).nil?
      (v = x <=> y) == 0 or return v
    }
    self.length <=> you.length
  end
end

In conclusion, Array#<=> itself is complicated one. Enjoy your space travel!

Saturday, May 2, 2009

A benchmark of Array#to_s

A bugfix patch may make Array#to_s slow. I am finding out how slow it becomes.

That patch changes the definition of Array#to_s from:

rb_define_method(rb_cArray, "to_s", rb_ary_inspect, 0);
rb_define_method(rb_cArray, "inspect", rb_ary_inspect, 0);

to:

rb_define_method(rb_cArray, "inspect", rb_ary_inspect, 0);
rb_define_alias(rb_cArray,  "to_s", "inspect");

This change effects only in the array has itself recursively.

Benchmark code I use is:

require 'benchmark'
short_a = Array.new(100) { rand }
long_a = Array.new(10000) { rand }

Benchmark.bmbm do |b|
  b.report do
    1000.times do
      short_a.to_s
    end
  end

  b.report do
    10.times do
      long_a.to_s
    end
  end
end

And the results are below.

Conventional (before the patch applied)

Rehearsal ------------------------------------
   0.810000   0.010000   0.820000 (  0.836625)
   0.870000   0.010000   0.880000 (  0.901661)
--------------------------- total: 1.700000sec

       user     system      total        real
   0.740000   0.010000   0.750000 (  0.748501)
   0.770000   0.010000   0.780000 (  0.801869)

Current (after the patch applied)

Rehearsal ------------------------------------
   0.820000   0.000000   0.820000 (  0.852360)
   0.860000   0.010000   0.870000 (  0.897602)
--------------------------- total: 1.690000sec

       user     system      total        real
   0.850000   0.010000   0.860000 (  0.873125)
   0.830000   0.010000   0.840000 (  0.871903)

There are certainly slow changes, but they seem enough slight. What do you think about it?