How does Ruby execute your code?

Bhanu Prakash
December 21, 2018
#backendengineering

Ruby goes through the following steps when you run ruby sample_code.rb:

  • Tokenization
  • Parsing
  • Compiling
  • Execution

Figure: How Ruby executes the code

Tokenization

When you run ruby sample_code.rb, the first thing Ruby does is read the characters in the file, group them into tokens, and label each token. For example, it will convert the following code

# sample_code.rb
puts "Hello Aliens"

into the following tokens:

  • puts ( identifier )
  • “ “  ( space )
  • ‘ “ ‘ ( string beginning )
  • “Hello Aliens” ( string content )
  • ‘ “ ‘ ( string end )
  • “\n” ( new line )

You can verify this with the Ripper class from Ruby’s standard library. Let’s modify the code in sample_code.rb to print the tokens produced by tokenization

# sample_code.rb
require 'ripper'
require 'pp'

code = <<STR
puts "Hello Aliens"
STR

puts code
pp Ripper.lex(code)

When you execute the above code, you will get the following output

puts "Hello Aliens"
[[[1, 0], :on_ident, "puts"],
 [[1, 4], :on_sp, " "],
 [[1, 5], :on_tstring_beg, "\""],
 [[1, 6], :on_tstring_content, "Hello Aliens"],
 [[1, 18], :on_tstring_end, "\""],
 [[1, 19], :on_nl, "\n"]]

Ripper.lex returns an array of arrays, one entry per token. In each entry, the first element is the [line, column] position where the token was found, the second is the classification of the token (string content, integer, new line, whitespace, identifier, keyword, etc.), and the third is the text of the token itself. Newer Ruby versions append a fourth element describing the lexer state, so your output may contain one more field than shown here.
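
As a small sketch of how these entries can be consumed, the snippet below destructures each token into its position, type, and text. The helper name describe_tokens is made up for this illustration, and the splat is an assumption-friendly way to ignore the extra lexer-state field on Ruby versions that include it.

```ruby
require 'ripper'

# Hypothetical helper (not part of Ripper): format each token entry
# returned by Ripper.lex as "line:column type text".
# `*_rest` absorbs the lexer-state field that newer Rubies append.
def describe_tokens(code)
  Ripper.lex(code).map do |(line, column), type, text, *_rest|
    format("%d:%-2d %-20s %s", line, column, type, text.inspect)
  end
end

puts describe_tokens(%(puts "Hello Aliens"\n))
```

Each printed line corresponds to one entry of the array shown above, which makes it easier to scan longer token streams.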

The tokenizer doesn’t check syntax, i.e. even if you give it invalid code it will blindly convert it into tokens. Let’s modify sample_code.rb so that it contains a syntax error (an unterminated string)

# sample_code.rb
require 'ripper'
require 'pp'

code = <<STR
puts "Hello Aliens
STR

puts code
pp Ripper.lex(code)

The output will be

puts "Hello Aliens
[[[0, 6], :on_tstring_end, "Hello Aliens\n"],
 [[1, 0], :on_ident, "puts"],
 [[1, 4], :on_sp, " "],
 [[1, 5], :on_tstring_beg, "\""]]
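
To contrast the tokenizer with the parser on the same broken input, here is a minimal sketch. It assumes Ripper.sexp (which runs the parser and is covered in the next section) returns nil when the code cannot be parsed; lex_and_parse is a hypothetical helper name used only for this example.

```ruby
require 'ripper'

# Hypothetical helper: run both stages on the same source and report
# how many tokens were produced and whether parsing succeeded.
def lex_and_parse(code)
  {
    token_count: Ripper.lex(code).size, # tokens are produced even for invalid code
    parsed: !Ripper.sexp(code).nil?     # Ripper.sexp returns nil on a syntax error
  }
end

p lex_and_parse(%(puts "Hello Aliens"\n)) # valid code
p lex_and_parse(%(puts "Hello Aliens\n))  # unterminated string
```

The second call should still report a non-zero token count while parsed is false, which is the behaviour described above: the error only surfaces once the parser gets involved.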

Parsing

During parsing, Ruby groups the tokens into phrases that make sense to it. Ruby uses a parser generator called Bison to create the parser. The input to Bison is a set of grammar rules, and the parser is generated from them when Ruby itself is built. The generated parser uses the LALR algorithm (Look-Ahead LR, where LR means the input is read Left to right and a Rightmost derivation is produced in reverse). It reads the tokens from left to right, trying to match them against one or more grammar rules. The parser also looks at the next token in the stream when deciding which rule to match.
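
To make "grouping tokens into phrases" a little more concrete, here is a minimal sketch that uses Ripper.sexp, the parser-facing counterpart of Ripper.lex. The helper first_statement is a hypothetical name introduced for this example; it simply returns the node for the first statement in a piece of code.

```ruby
require 'ripper'
require 'pp'

# Hypothetical helper: parse the code and return the node of its
# first statement. Raises if the code cannot be parsed.
def first_statement(code)
  sexp = Ripper.sexp(code)
  raise ArgumentError, "could not parse: #{code.inspect}" if sexp.nil?

  _program, statements = sexp # [:program, [statement, ...]]
  statements.first
end

pp first_statement("2 + 3 * 4")
```

The tokens arrive as a flat stream (2, +, 3, *, 4), but the grammar rules group 3 * 4 into its own nested :binary node inside the addition, reflecting operator precedence.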

To see the output of the parsing stage, pass the code to Ripper.sexp. Here is some sample code:

# sample_code.rb
require 'ripper'
require 'pp'

code = <<STR
puts "Hello Aliens"
STR

puts code
pp Ripper.sexp(code)

The output of the above code is:

puts "Hello Aliens"
[:program,
 [[:command,
   [:@ident, "puts", [1, 0]],
   [:args_add_block,
    [[:string_literal,
      [:string_content, [:@tstring_content, "Hello Aliens", [1, 6]]]]],
    false]]]]


This output is a data structure called an abstract syntax tree (AST). It records the structure and meaning of the Ruby code. The graphical representation of the above output is

Figure: Abstract Syntax Tree

The command node represents a method call; it is followed by the identifier of the method being called. The args_add_block node holds the list of arguments passed to the method, and its last element (false here) indicates that no block argument was passed. In this example a string literal with the content “Hello Aliens” is passed to the puts method.
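
Because the tree is made of plain nested arrays, it can be walked with ordinary indexing. The sketch below pulls the method name and the string argument out of the tree shown above. command_summary is a hypothetical helper, and it assumes exactly this shape: a single :command node whose first argument is a plain string literal.

```ruby
require 'ripper'

# Hypothetical helper, written only for the tree shape shown above.
# Returns nil when the code does not parse or is not a :command node.
def command_summary(code)
  sexp = Ripper.sexp(code)
  return nil if sexp.nil?

  _program, statements = sexp
  type, ident, args = statements.first
  return nil unless type == :command

  method_name = ident[1]            # [:@ident, "puts", [1, 0]]
  string_literal = args[1].first    # first entry of the argument list
  content = string_literal[1][1][1] # [:@tstring_content, "Hello Aliens", [1, 6]]
  { method: method_name, argument: content }
end

p command_summary(%(puts "Hello Aliens"\n))
```

Real code has many more node types than this, so anything beyond a demonstration would need to handle them rather than rely on fixed positions.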

Compilation

Ruby 1.9 introduced a compiler that compiles Ruby code before executing it. The compiler translates your code into another language that Ruby’s virtual machine understands, and it runs behind the scenes without any interaction on your part. Ruby compiles the AST into low-level bytecode that YARV (Yet Another Ruby Virtual Machine) can understand. YARV is the interpreter that executes this bytecode.
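
You can peek at the compiled bytecode with RubyVM::InstructionSequence. The sketch below assumes you are running the standard C implementation of Ruby (CRuby/MRI), since this class is specific to it; bytecode_for is a hypothetical helper name, and the exact instructions printed vary between Ruby versions.

```ruby
# RubyVM::InstructionSequence is specific to CRuby (MRI); other
# implementations may not provide it.

# Hypothetical helper: compile the source and return a human-readable
# listing of the resulting YARV instructions.
def bytecode_for(code)
  RubyVM::InstructionSequence.compile(code).disasm
end

puts bytecode_for(%(puts "Hello Aliens"\n))
```

The listing should contain an instruction that pushes the string "Hello Aliens" followed by one that sends the puts message, which is the bytecode YARV goes on to execute.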