Zordoz

6.3.90.900

package: zordoz

Zordoz is a tool for exploring .zo bytecode files. It offers a simple command-line interface for exploring string representations of bytecode structures.

This document describes the REPL and the API functions supporting it. For example usage, jump to the Sample Interaction at the end of the REPL section.

1 Overview

1.1 Quickstart

1.1.1 Explorer

1.1.2 Automated Search

1.2 Testing

1.3 Project Goals

2 REPL

2.1 Summary

2.2 Commands

2.3 Sample Interaction

3 API

3.1 Starting a REPL

3.2 String Representations

3.3 Traversing zo Structs

3.4 Searching Structs

3.5 Compiling and Decompiling

3.6 Compiling C Modules

3.7 Typed API

4 Future Work

4.1 Checking two files for differences

4.2 More Search Options

4.3 Bytecode Graphs

4.4 DrRacket Integration

1 Overview

1.1 Quickstart

To install, either use raco:

raco pkg install zordoz

or clone the repository and install it manually with raco:

$ git clone https://github.com/bennn/zordoz
$ raco pkg install zordoz/

Zordoz provides a raco command. To see help information, run:

raco zordoz --help

1.1.1 Explorer

The default mode is to interactively explore a bytecode file. Assuming FILE.zo is a compiled file on your computer,

raco zordoz FILE.zo

will start a REPL session. Type help at the REPL to see available commands. See REPL for a detailed explanation of each.

1.1.2 Automated Search

To search a bytecode file for occurrences of a certain zo struct, use the -f flag. (This flag may be supplied more than once.)

raco zordoz -f STRUCT-NAME FILE.zo

The number of occurrences of each struct will be printed to the console. For example:

$ raco zordoz -f branch -f lam private/compiled/zo-string_rkt.zo
INFO: Loading bytecode file 'private/compiled/zo-string_rkt.zo'...
INFO: Parsing bytecode...
INFO: Parsing complete! Searching...
FIND 'branch': 427 results
FIND 'lam': 433 results
All done!

1.2 Testing

Each source file contains a module+ test with unit tests. Run them all with:

raco test zordoz

or individually using:

raco test FILE.rkt

1.3 Project Goals

Racket offers a de-compilation API; however, the structs it produces are still dense reading. This project takes a de-compiled zo struct and offers:

A string representation of the struct, with name and fields clearly labeled.
Interactive exploration of the struct’s fields.
A simple search interface for finding patterns nested within a struct.

This library should be available for as many versions of Racket as possible, and kept up to date.

We also hope to add more features, especially a tool for comparing two bytecode files. Feedback and suggestions appreciated!

2 REPL

The REPL is a simple, interactive way to explore bytecode files. This document is a users’ guide for the REPL. See the API page for alternate ways of starting a REPL (besides the command line).

2.1 Summary

The REPL works by storing an internal context and reacting to commands. This context is either:

A zo struct
A list of zo structs
Search results, obtained by calling find.

The commands observe or advance this context. Commands may be separated by newlines or semicolons.

For convenience, the REPL records previous states. We call this recorded past the history of the REPL; it is a stack of contexts.

Keeping this stack in mind is useful for understanding the REPL commands.
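
The context-plus-history model can be sketched in a few lines of Racket. This is an illustrative model only, not Zordoz's implementation: the names repl-state, enter, and go-back are hypothetical, and the contexts are stand-in symbols rather than real zo structs.

```racket
#lang racket/base
;; Hypothetical model of the REPL state: a current context plus a
;; history stack of earlier contexts (most recent first).
(struct repl-state (context history) #:transparent)

;; Moving to a new context pushes the old one onto the history.
(define (enter st new-context)
  (repl-state new-context
              (cons (repl-state-context st) (repl-state-history st))))

;; Going back pops the most recent context. With an empty history
;; there is nothing to pop, so the state is returned unchanged.
(define (go-back st)
  (define hist (repl-state-history st))
  (if (null? hist)
      st
      (repl-state (car hist) (cdr hist))))

;; Example: start at the top, move into a field, then come back.
(define s0 (repl-state 'compilation-top '()))
(define s1 (enter s0 'prefix))
(define s2 (go-back s1))
```

In this sketch, go-back on an empty history is a silent no-op; the real REPL prints a warning in that situation.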

2.2 Commands

2.2.1 alst

List all command aliases.

For uniformity, the canonical name of each command has 4 letters. But each has a few mnemonic aliases to choose from. For example, you can type ls instead of info and cd instead of dive.

2.2.2 back

Move up to the previous context.

Each successful dive or find command changes the current context to a new struct or list. Before making such a transition, the REPL saves the previous context on a stack. The back command pops the most recent context from this stack and switches to it.

Note that back will fail (and print a warning) at the top of the zo struct hierarchy or the top of a saved subtree.

2.2.3 dive

Enter a struct’s field.

This is where exploring happens. Each struct has a few fields; you can see these by printing with info. Any field containing zo structs is a candidate for dive. For example, the struct assign has a field rhs, which can be accessed by:

dive rhs

If you know where you are going, you can chain paths together. Starting at an average compilation-top, this command should move to the body of the enclosed module.

dive code/body

Extra Notes:

Only fields that contain zo structures or lists of zo structures may be explored.
Changing to a zo structure field changes the context to the child zo structure. Changing to a list field changes context to that list, from which you can select a natural-number position in the list to explore.
dive takes exactly one argument; supplying more or fewer is not permitted.
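
Chained paths such as code/body can be thought of as a fold over the "/"-separated field names. The sketch below is a toy model under stated assumptions: nested immutable hashes stand in for zo structs, and dive-path is a hypothetical helper, not part of Zordoz.

```racket
#lang racket/base
(require racket/string)

;; Stand-in for a bytecode tree: nested hashes keyed by field name.
(define top
  (hash "prefix" (hash)
        "code"   (hash "body" (hash "name" 'stand-in-body))))

;; Follow each "/"-separated field in turn. Returns the context
;; reached, or #f if a field is missing or does not hold a nested
;; hash (mirroring the rule that only struct-valued fields may be
;; explored).
(define (dive-path ctx path)
  (for/fold ([cur ctx])
            ([field (in-list (string-split path "/"))])
    (define next (and (hash? cur) (hash-ref cur field #f)))
    (and (hash? next) next)))

;; Example: reach the nested "body" context in one step.
(define body-context (dive-path top "code/body"))
```

A path that runs into plain data, such as "code/body/name" here, yields #f in this model, much as the REPL refuses to dive into an integer field.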

2.2.4 find

Search the current struct’s children for a certain zo struct.

find uses string matching to automate a simple search process. Give it the name of a zo struct: for instance, find lam searches for lam structs nested within the current context. The string must be the name of a zo struct; anything else returns no results.

A successful find changes context to a list of zo structs. Exploring any element of the list changes the current history to be that element’s history. You are free to explore the children and parents of any struct returned by a find query. Use jump to immediately return to the search results.

Note:

If, after exploring a search result, you move back past the list of search results, the REPL will print a notice.

2.2.5 help

Print command information.

Shows a one-line summary of each command. The tabernacle is all-knowing.

2.2.6 info

Print the current context.

This info command does the real work of exploring. It shows the current context, whether struct or list. Lists give their length and the names of their elements. Zo structs show their name, their fields’ names, and their fields’ values.

Struct fields are printed as best we can.

Fields which are zo structures print their names. These fields may be dived into.
Fields which are lists containing at least one zo structure are printed with a natural number in square brackets, indicating the number of zo structs inside the list. These fields may also be dived into.
Other fields are printed with Racket’s default printer. Be aware, lists and hashes can sometimes cause very large printouts.

2.2.7 jump

Warp back to a previously-saved context.

The commands jump and save work together. After saving or making a successful query with find, the current history is saved. At this point, a step backwards will recover this history. The interesting thing is that steps forward create a new history, and you can immediately forget that new history by calling jump.

For example, if you call find and explore one of the results, you can immediately jump back to your search results.

2.2.8 save

Mark the current context and history as a future target for jump. This is useful for marking a struct you want to backtrack to.

Note that if you manually backtrack past a saved struct, its mark disappears and the REPL prints a notice.
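
The interplay of save and jump can be modeled as a second stack of snapshots kept alongside the history. The following is a hypothetical sketch, not the REPL's actual code: the state struct and the dive, save, and jump functions are illustrative names, and the choice to keep a mark in place after jumping to it is an assumption.

```racket
#lang racket/base
;; Hypothetical model: a state is a current context, a history
;; stack, and a stack of saved (context . history) snapshots.
(struct state (context history saved) #:transparent)

;; Moving forward pushes the current context onto the history.
(define (dive st ctx)
  (state ctx
         (cons (state-context st) (state-history st))
         (state-saved st)))

;; save: remember the current context together with its history.
(define (save st)
  (state (state-context st)
         (state-history st)
         (cons (cons (state-context st) (state-history st))
               (state-saved st))))

;; jump: restore the most recent snapshot, discarding any history
;; built up since it was taken. With nothing saved, stay put.
(define (jump st)
  (define marks (state-saved st))
  (if (null? marks)
      st
      (state (car (car marks)) (cdr (car marks)) marks)))

;; Example: save at the search results, explore two levels, jump back.
(define s0 (state 'results '(top) '()))
(define s1 (save s0))
(define s2 (dive (dive s1 'branch) 'application))
(define s3 (jump s2))
```

This model omits the rule that a mark disappears when the user backtracks past it.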

2.2.9 quit

Exit the REPL.

2.3 Sample Interaction

Let’s explore the REPL’s own bytecode. Starting from the directory you cloned this repo to (or where raco put it on your filesystem):

$ raco zordoz private/compiled/zo-string_rkt.zo
INFO: Loading bytecode file 'private/compiled/zo-string_rkt.zo'...
INFO: Parsing bytecode...
INFO: Parsing complete!
--- Welcome to the .zo shell,version 1.0 'vortex'---
zo>

Now we can start typing commands, like info.

zo> info
<zo:compilation-top>
  max-let-depth : 31
  prefix        : <zo:prefix>
  code          : <zo:mod>

Next, let’s try a dive.

zo> dive max-let-depth
'dive max-let-depth'not permitted.

Didn’t work! That’s because max-let-depth is an integer. Let’s try one of the structs.

zo> dive prefix
zo> info
<zo:prefix>
  num-lifts : 0
  toplevels : [#f]
  stxs      : []

Great! We can’t dive any further from here, so let’s go back up.

zo> back
zo> info
<zo:compilation-top>
  max-let-depth : 31
  prefix        : <zo:prefix>
  code          : <zo:mod>

And we’re back to where we began. From here we could dive to the code field and print it, but the printout is a little overwhelming. The module we’re exploring, zo-string, creates over 40 different functions. There’s just a lot of data to look at, and because it’s heterogeneous data we do not have a nice way of truncating it.

Instead, we’ll try the find command. Be warned, the search might take a minute.

zo> find compilation-top
FIND returned 0 results

Zero results is good: there should not be any other compilation-top structs besides the one we’re currently in. Now try searching for something else, like branch.

zo> find branch
FIND returned 422 results
FIND automatically saving context
<zo:branch>[422]

Wow! Over 400 results. We can start exploring one of them:

zo> dive 17
zo> info
<zo:branch>
  test : <zo:application>
  then : <zo:seq>
  else : <zo:branch>

We can also explore its children and parents.

zo> dive test
zo> info
<zo:application>
  rator : <zo:primval>
  rands : [<zo:localref>]
zo> dive rator
zo> info
<zo:primval>
  id : 90
zo> up
zo> up
zo> info
<zo:branch>
  test : <zo:application>
  then : <zo:seq>
  else : <zo:branch>
zo> up
zo> info
<zo:branch>
  test : <zo:localref>
  then : <zo:branch>
  else : #f

And if we do a jump, we return to the search results.

zo> jump
zo> info
<zo:branch>[422]

3 API

These functions support the REPL, but may be useful in more general settings. Import them with (require zordoz).

3.1 Starting a REPL

procedure
(filename->shell fname) → void?
fname : path-string?

Start a REPL to explore a .zo bytecode file.

procedure
(zo->shell z) → void?
z : zo?

Start a REPL to explore a zo struct.

procedure
(syntax->shell stx) → void?
stx : syntax?

Start a REPL to explore a syntax object. First compiles the syntax to a zo representation.

3.2 String Representations

These tools convert a zo structure to a pretty-printed string, or a more structured representation.

procedure
(zo->string z #:deep? deep) → string?
z : zo?
deep : boolean?

Convert a zo struct into a string. When the optional argument #:deep? is set, include the struct’s fields in the string representation. Otherwise, only print the name.

Examples:

> (displayln (zo->string (primval 129)))
<zo:primval>
  id : 129
> (displayln (zo->string (primval 129) #:deep? #f))
<zo:primval>
> (displayln (zo->string (branch (= 3 1) "true" 'false)))
<zo:branch>
  test : #f
  then : true
  else : false

procedure
(zo->spec z) → spec/c
z : zo?

Convert a zo struct into a spec/c representation. A spec/c is a list containing:

A string, representing its name.
Pairs, representing the struct’s fields. The first element of each pair should be a string representing the field name. The second element should be a thunk that, when forced, yields either a string or another spec.

The thunks delay pretty-printing an entire nested struct.

Examples:

> (zo->spec (primval 129))
'("primval" ("id" . #<procedure:...te/zo-string.rkt:87:26>))
> (zo->spec (branch (= 3 1) "true" 'false))
'("branch"
  ("test" . #<procedure:...te/zo-string.rkt:87:26>)
  ("then" . #<procedure:...te/zo-string.rkt:87:26>)
  ("else" . #<procedure:...te/zo-string.rkt:87:26>))
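
To make the role of the thunks concrete, here is a small sketch that renders a hand-built spec by forcing its thunks one level deep. It assumes only the shape described above (a name followed by field-name/thunk pairs); spec->string, inner-spec, and outer-spec are hypothetical names, and the output format merely imitates the <zo:...> style shown in this document without aligning the colons.

```racket
#lang racket/base
(require racket/string)

;; Hand-built specs: a name followed by (field-name . thunk) pairs.
;; Each thunk yields either a string or a nested spec.
(define inner-spec
  (list "primval" (cons "id" (lambda () "90"))))
(define outer-spec
  (list "application"
        (cons "rator" (lambda () inner-spec))
        (cons "rands" (lambda () "[]"))))

;; Render only the name (shallow), or force each field's thunk once
;; and print nested specs by name only (deep).
(define (spec->string spec #:deep? [deep? #t])
  (define name (format "<zo:~a>" (car spec)))
  (if (not deep?)
      name
      (string-join
       (cons name
             (for/list ([field (in-list (cdr spec))])
               (define v ((cdr field)))
               (format "  ~a : ~a"
                       (car field)
                       (if (string? v) v (spec->string v #:deep? #f)))))
       "\n")))

;; Example: the nested primval is named but not expanded.
(define rendered (spec->string outer-spec))
```

Because nested specs sit behind thunks, a shallow rendering never touches them, which is the point of the delayed representation.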

3.3 Traversing zo Structs

Racket does not provide a reflective way of accessing struct fields at runtime, so we provide a function that does this by force, just for zo structures.

procedure
(zo-transition z str)
→
(values (or/c zo? (listof zo?)) boolean?)
z : zo?
str : string?

Identify what specific zo struct z is, then access its field named str, if any. The multiple return values deal with the following cases:

If the field str does not exist, or does not denote a zo struct, return the argument z and the boolean value #f.
If the field str denotes a list and we can parse zo structs from the list, return a list of zo structs and the boolean #t.
(Expected case) If the field points to a zo struct, return the new zo struct and the boolean #t.

Examples:

> (let-values ([(z success?) (zo-transition (primval 42) "foo")])
    (displayln success?)
    z)
#f
'#s((primval expr 0 form 0 zo 0) 42)
> (let-values ([(z success?) (zo-transition (primval 42) "id")])
    (displayln success?)
    z)
#f
'#s((primval expr 0 form 0 zo 0) 42)
> (let-values ([(z success?) (zo-transition
                               (application (primval 42) '())
                               "rator")])
    (displayln success?)
    z)
#t
'#s((primval expr 0 form 0 zo 0) 42)
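
The three cases can also be illustrated without any bytecode at all. The sketch below is a toy model, not the library's implementation: node is a hypothetical stand-in for zo structs (a name plus an association list of fields), and transition is a hypothetical function that follows the same two-value convention.

```racket
#lang racket/base
;; Toy stand-in for zo structs: a name plus an association list
;; mapping field names to values.
(struct node (name fields) #:transparent)

;; Return the struct (or list of structs) found in the named field
;; together with #t, or the original struct together with #f.
(define (transition z field-name)
  (define entry (assoc field-name (node-fields z)))
  (define v (and entry (cdr entry)))
  (cond
    [(node? v) (values v #t)]                ; field holds a struct
    [(and (list? v) (ormap node? v))
     (values (filter node? v) #t)]           ; list holding structs
    [else (values z #f)]))                   ; missing or plain data

;; Example data: an application whose rator is a primval.
(define leaf (node "primval" (list (cons "id" 42))))
(define app
  (node "application"
        (list (cons "rator" leaf)
              (cons "rands" (list leaf 7)))))
```

Returning the unchanged argument alongside #f lets a caller keep its current context on failure without a separate error path.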

3.4 Searching Structs

If you know the name of the zo struct you hope to find by exploring a subtree, you can automate the exploring. Literally, find is repeated application of zo->string and zo-transition.

procedure
(zo-find z str [#:limit lim]) → (listof result?)
  z : zo?
  str : string?
  lim : (or/c natural-number/c #f) = #f

Starting with the children of the struct z, search recursively for struct instances matching the string str. For example, if str is application then find will return all application structs nested below z.

The return value is a list of result structs rather than plain zo structs because we record the path from the argument z down to each match.

Examples:

> (let* ([seq* (list (seq '()) (seq '()))]
         [z (seq (list (seq seq*) (seq seq*)))])
    (zo-find z "seq" #:limit 1))
(list
(result
  '#s((seq form 0 zo 0) (#s((seq form 0 zo 0) ()) #s((seq form 0 zo 0) ())))
  '())
(result
  '#s((seq form 0 zo 0) (#s((seq form 0 zo 0) ()) #s((seq form 0 zo 0) ())))
  '()))
> (let* ([thn (primval 0)]
         [els (branch #t (primval 1) (primval 2))]
         [z (branch #t thn els)])
    (map result-zo (zo-find z "primval")))
'(#s((primval expr 0 form 0 zo 0) 0)
  #s((primval expr 0 form 0 zo 0) 1)
  #s((primval expr 0 form 0 zo 0) 2))
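
The idea of a recursive search that records the path to each match can be sketched on a toy tree. Everything here is hypothetical and for illustration only: node and hit stand in for zo structs and result structs, find-nodes is not the library's zo-find, and storing the path with the nearest ancestor first is an assumption.

```racket
#lang racket/base
;; Toy tree: each node has a name and a list of child nodes.
(struct node (name children) #:transparent)
;; A hit pairs a matching node with the path of ancestors above it.
(struct hit (node path) #:transparent)

;; Search the children of z (not z itself) for nodes named target.
;; path accumulates ancestors, nearest ancestor first.
(define (find-nodes z target [path '()])
  (define path* (cons z path))
  (apply append
         (for/list ([c (in-list (node-children z))])
           (define below (find-nodes c target path*))
           (if (equal? (node-name c) target)
               (cons (hit c path*) below)
               below))))

;; Example tree: a branch holding a primval and a nested branch.
(define leaf1 (node "primval" '()))
(define leaf2 (node "primval" '()))
(define inner (node "branch" (list leaf2)))
(define root  (node "branch" (list leaf1 inner)))
(define hits  (find-nodes root "primval"))
```

As in the description above, the search starts with the children of its argument, so searching root for "branch" finds only the inner branch.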

procedure
(result-zo result) → zo?
result : zo-result?

Converts a zo-result? to the found zo? field. See zo-find.

struct
(struct result (z path)
    #:transparent)
  z : zo?
  path : (listof zo?)

A result contains a zo struct and a path leading to it from the search root. In the context of find, the path is always from the struct find was called with.

procedure
(find-all fname qry* [#:limit lim]) → void?
  fname : path-string?
  qry* : (Listof String)
  lim : (or/c natural-number/c #f) = #f

Apply find iteratively on the bytecode file fname. Print the results for each string in the list qry* to current-output-port.

3.5 Compiling and Decompiling

Tools for compiling syntax fragments rather than entire modules.

procedure
(syntax->zo stx) → zo?
stx : syntax?

Compiles a syntax object to a zo struct. The result is wrapped in a compilation-top struct.

Examples:

> (syntax->zo #'6)
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) 6)
> (syntax->zo #'(member 'a '(a b c)))
'#s((compilation-top zo 0)
    2
    #hash()
    #s((prefix zo 0)
       0
       (#s((module-variable zo 0)
           #<module-path-index:("member.rkt" "pre-base.rkt" "private/base.rkt" racket/base)>
           member
           0
           0
           #f))
       ()
       insp0)
    #s((application expr 0 form 0 zo 0)
       #s((toplevel expr 0 form 0 zo 0) 2 0 #f #t)
       (a (a b c))))
> (syntax->zo #'(if #t 'left 'right))
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) left)

procedure
(syntax->decompile stx) → any/c
stx : syntax?

Compiles a syntax object, then immediately decompiles the compiled code back to an S-expression. Similar to syntax->zo, except the final output is Racket code and not a zo structure.

Examples:

> (syntax->decompile #'6)
'(begin (quote inspector insp0) '6)
> (syntax->decompile #'(member 'a '(a b c)))
'(begin
  (quote inspector insp0)
  (|_member@(lib "racket/private/member.rkt")| 'a '(a b c)))
> (syntax->decompile #'(if #t 'left 'right))
'(begin (quote inspector insp0) 'left)

procedure
(compiled-expression->zo cmp) → zo?
cmp : compiled-expression?

Converts a compiled expression into a zo struct. Differs from zo-parse in that the input is expected to be a compiled-expression?. This function is the inverse of zo->compiled-expression.

Examples:

> (compiled-expression->zo (compile-syntax #'6))
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) 6)
> (compiled-expression->zo (compile-syntax #'(member 'a '(a b c))))
'#s((compilation-top zo 0)
    2
    #hash()
    #s((prefix zo 0)
       0
       (#s((module-variable zo 0)
           #<module-path-index:("member.rkt" "pre-base.rkt" "private/base.rkt" racket/base)>
           member
           0
           0
           #f))
       ()
       insp0)
    #s((application expr 0 form 0 zo 0)
       #s((toplevel expr 0 form 0 zo 0) 2 0 #f #t)
       (a (a b c))))
> (compiled-expression->zo (compile-syntax #'(if #t 'left 'right)))
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) left)

procedure
(zo->compiled-expression z) → compiled-expression?
z : zo?

Transform a zo struct to compiled code. The compiled code can be run with eval. If the struct z encodes a module (i.e., contains a mod sub-struct) then the result of (zo->compiled-expression z) can be written to a .rkt file and run using the Racket executable.

Example:

> (let* ([stx #'(string-append "hello, " "world")]
         [z     (syntax->zo stx)]
         [e     (zo->compiled-expression z)])
    (eval e (make-base-namespace)))
"hello, world"

procedure
(toplevel-syntax->zo stx) → (listof zo?)
stx : syntax?

A variant of syntax->zo that can handle top-level syntax expressions. It uses eval-compile-time-part-of-top-level/compile to compile syntax rather than just compile. As such, this function returns a list of zo structs rather than just one.

Example:

> (toplevel-syntax->zo #'(begin
                           (define x 5)
                           x))
'(#s((compilation-top zo 0)
     0
     #hash()
     #s((prefix zo 0) 0 (#s((global-bucket zo 0) x)) () insp0)
     #s((def-values form 0 zo 0)
        (#s((toplevel expr 0 form 0 zo 0) 0 0 #f #f))
        5))
  #s((compilation-top zo 0)
     0
     #hash()
     #s((prefix zo 0) 0 (#s((global-bucket zo 0) x)) () insp0)
     #s((toplevel expr 0 form 0 zo 0) 0 0 #f #f)))

3.6 Compiling C Modules

Tools for compiling modules implemented in C.

procedure
(compile-c-module c-path) → void?
c-path : (or/c path-string? path?)

Compiles a C module to a form where it can be required later.

See Inside: Racket C API for more information on how to build Racket modules in C.

WARNING: Do not replace the file produced by the functions while still inside the Racket VM. Doing so will cause undefined and potentially catastrophic behavior. As a general rule of thumb, if you modify a C file implementing a module, shut down all Racket VMs using that library. This means restarting DrRacket (not just reloading the file) whenever the C file is modified.

c-path is the path to the C file that implements the module.

For example:

(require zordoz)
(compile-c-module "c-module.c")
(dynamic-require "c-module" 0)

syntax
(from-c c-path)

c-path : path-string?

A convenience form to compile a C module and require it directly. Use outside of a require form is a syntax error.

c-path is the path to the C file that implements the module.

For example:

(require zordoz
         (from-c "c-module.c"))

3.7 Typed API

A typed variant of this API is available with (require zordoz/typed).

(require zordoz/typed)

package: zordoz

Require zordoz/typed for a typed version of zordoz.

(require zordoz/typed/zo-structs)

package: zordoz

Require zordoz/typed/zo-structs for a typed version of Racket’s compiler/zo-structs.

4 Future Work

Bytecode is a useful format to program with. The language is much simpler than full-blown Racket. Besides this simple command-line explorer, here are a few ideas for future tools.

4.1 Checking two files for differences

Bytecode should be much more amenable to "diffing" than source files. Superficial changes should disappear once the compiler has extracted the core functionality. Potential applications:

Code search – find a library function similar to a code chunk
Detect malware / corrupted files
Cheating detection software

These are just ideas for now. First thing is to explore and see what’s possible and feasible.

See S6 project for inspiration. Also, possibly, MOSS.

4.2 More Search Options

The current find tool is very simple. It is only string matching on struct names. You need to know what structs are available before using it at all.

The question is, what search tools would be most useful? Some ideas:

Search for an identifier from the source code.
Search for patterns of structs (lambdas containing if-statements, I dunno)
Find contracts, or instances of another higher-order construct.
Search for often-run structs.

SXPath seems promising; we should try replacing the spec/c representation with XML.

4.3 Bytecode Graphs

Working with the REPL gives a nice idea of what the struct hierarchy in the bytecode looks like. With a paper and pencil, I can trace out the whole picture for myself.

Unless space / efficiency becomes a problem, we should be able to generate a picture. Possibly use the racket-explorer, or maybe just make a graphical tool for DrRacket and record the images as the user steps through. (Imagine a REPL that remembered the picture of a user’s search path.)

4.4 DrRacket Integration

The command line is nice, but an interactive panel within DrRacket would be much nicer. One of these days.
