Zordoz
(require zordoz) | package: zordoz |
Zordoz is a tool for exploring .zo bytecode files. It offers a simple command-line interface for exploring string representations of bytecode structures.
These files describe the REPL and the API functions supporting it. Jump to the bottom of the REPL section for example usage.
1 Overview
1.1 Quickstart
To install, either use raco:
raco pkg install zordoz
Or clone the repository and install manually via raco:
$ git clone https://github.com/bennn/zordoz
$ raco pkg install zordoz/
Zordoz provides a raco command. To see help information, run:
raco zordoz --help
1.1.1 Explorer
The default mode is to interactively explore a bytecode file. Assuming FILE.zo is a compiled file on your computer,
raco zordoz FILE.zo
will start a REPL session. Type help at the REPL to see available commands. See REPL for a detailed explanation of each.
1.1.2 Automated Search
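To search a bytecode file for occurrences of a zo struct, pass the struct's name with the -f flag. The flag may be given more than once.
raco zordoz -f STRUCT-NAME FILE.zo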
$ raco zordoz -f branch -f lam private/compiled/zo-string_rkt.zo
INFO: Loading bytecode file 'private/compiled/zo-string_rkt.zo'...
INFO: Parsing bytecode...
INFO: Parsing complete! Searching...
FIND 'branch': 427 results
FIND 'lam': 433 results
All done!
1.2 Testing
Each source file contains a module+ test with unit tests. Run them all with:
raco test zordoz
or individually using:
raco test FILE.rkt
1.3 Project Goals
Racket offers a de-compilation API; however, the structs it produces are still dense reading. This project takes a de-compiled zo struct and offers:
A string representation of the struct, with name and fields clearly labeled.
Interactive exploration of the struct’s fields.
A simple search interface for finding patterns nested within a struct.
This library should be available on as many versions of Racket as possible, and kept up to date.
We also hope to add more features, especially a tool for comparing two bytecode files. Feedback and suggestions appreciated!
2 REPL
The REPL is a simple, interactive way to explore bytecode files. This document is a users’ guide for the REPL. See the API page for alternate ways of starting a REPL (besides the command line).
2.1 Summary
The REPL keeps track of a current context, which is one of:
A zo struct
A list of zo structs
Search results, obtained by calling find.
The commands observe or advance this context. Commands may be separated by newlines or semicolons.
For convenience, the REPL records previous states. We call this recorded past the history of the REPL; it is a stack of contexts.
Keeping this stack in mind is useful for understanding the REPL commands.
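As a rough mental model (not zordoz's actual implementation), the context and its history can be pictured as a value plus a stack of earlier values. The names repl-state, advance, and go-back below are hypothetical and exist only for this sketch.

```racket
#lang racket/base
;; Hypothetical model of the REPL state: a current context plus a
;; history stack of earlier contexts (most recent first).
(struct repl-state (context history) #:transparent)

;; Moving forward (as dive or find would) pushes the old context.
(define (advance st new-context)
  (repl-state new-context
              (cons (repl-state-context st) (repl-state-history st))))

;; Moving back pops the most recent context. At the top of the
;; history there is nothing to pop, so the state is returned unchanged.
(define (go-back st)
  (define hist (repl-state-history st))
  (if (null? hist)
      st
      (repl-state (car hist) (cdr hist))))

(define s0 (repl-state 'compilation-top '()))
(define s1 (advance s0 'prefix))   ; like "dive prefix"
(define s2 (go-back s1))           ; like "back"
```

Here s2 is equal to s0 again, which is the behavior the back command relies on.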
2.2 Commands
2.2.1 alst
List all command aliases.
For uniformity, the canonical name of each command has 4 letters. But each has a few mnemonic aliases to choose from. For example, you can type ls instead of info and cd instead of dive.
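Alias resolution can be pictured as a table lookup that falls back to the name itself. This is only a sketch: the table below contains just the two aliases mentioned above, and canonical-name is a hypothetical helper, not part of zordoz.

```racket
#lang racket/base
;; Hypothetical alias table; only ls and cd are taken from the text above.
(define aliases (hash "ls" "info" "cd" "dive"))

;; Resolve a typed command name to its canonical 4-letter name.
;; Names that are not aliases are returned unchanged.
(define (canonical-name cmd)
  (hash-ref aliases cmd cmd))

(define resolved (map canonical-name '("ls" "cd" "info")))
```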
2.2.2 back
Move up to the previous context.
Each successful dive or find command changes the current context to a new struct or list. Before making these transitions, we save the previous context to a stack. The back command pops and switches to the most recent element from this stack.
Note that back will fail (and print a warning) at the top of the zo struct hierarchy or the top of a saved subtree.
2.2.3 dive
Enter a struct’s field.
dive rhs
dive code/body
Only fields that contain zo structures or lists of zo structures may be explored.
Changing to a zo structure field changes the context to the child zo structure. Changing to a list field changes context to that list, from which you can select a natural-number position in the list to explore.
dive takes exactly one argument. Any more or less is not permitted.
2.2.4 find
Search the current struct’s children for a certain zo struct.
Find uses string matching to automate a simple search process.
Give it a string, for instance find lam, and it searches for lam structs nested within the current context.
The string must be the name of a zo struct.
A successful find changes context to a list of zo structs. Exploring any element of the list changes the current history to be that element’s history. You are free to explore the children and parents of any struct returned by a find query. Use jump to immediately return to the search results.
If, after exploring a search result, you move back past the list of search results, the REPL will print a notice.
2.2.5 help
Print command information.
Shows a one-line summary of each command. The tabernacle is all-knowing.
2.2.6 info
Print the current context.
This info command does the real work of exploring. It shows the current context, whether struct or list. Lists give their length and the names of their elements. Zo structs show their name, their fields’ names, and their fields’ values.
Fields which are zo structures print their names. These fields may be dived into.
Fields which are lists containing at least one zo structure are printed with a natural number in square brackets, indicating the number of zo structs inside the list. These fields may also be dived into.
Other fields are printed with Racket’s default printer. Be aware, lists and hashes can sometimes cause very large printouts.
2.2.7 jump
Warp back to a previously-saved context.
The commands jump and save work together. After saving or making a successful query with find, the current history is saved. At this point, a step backwards will recover this history. The interesting thing is that steps forward create a new history, and you can immediately forget that new history by calling jump.
For example, if you call find and explore one of the results, you can immediately jump back to your search results.
2.2.8 save
Mark the current context and history as a future target for jump. This is useful for marking a struct you want to backtrack to.
Note that if you manually backtrack past a saved struct, its mark disappears and the REPL prints a notice.
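The interplay of save and jump can be sketched as snapshots of the context and history. This is a simplified model under stated assumptions, not zordoz's implementation: the names state, dive, save, and jump are local to the sketch, the mark is assumed to survive a jump, and the rule about marks disappearing when you backtrack past them is not modeled.

```racket
#lang racket/base
;; Hypothetical model: a state is a current context, a history stack,
;; and a stack of saved (context . history) snapshots.
(struct state (context history saved) #:transparent)

;; Moving forward pushes the old context onto the history.
(define (dive st new-context)
  (state new-context
         (cons (state-context st) (state-history st))
         (state-saved st)))

;; save records the current context and history as a jump target.
(define (save st)
  (state (state-context st)
         (state-history st)
         (cons (cons (state-context st) (state-history st))
               (state-saved st))))

;; jump restores the most recent snapshot, forgetting whatever history
;; was built after it. With nothing saved, the state is unchanged.
(define (jump st)
  (define saved (state-saved st))
  (if (null? saved)
      st
      (state (car (car saved)) (cdr (car saved)) saved)))

(define s0 (state 'top '() '()))
(define s1 (save (dive s0 'results)))        ; mark the search results
(define s2 (dive (dive s1 'item) 'field))    ; wander two steps further
(define s3 (jump s2))                        ; back at the marked context
```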
2.2.9 quit
Exit the REPL.
2.3 Sample Interaction
Let’s explore the REPL’s own bytecode. Starting from the directory you cloned this repo to (or where raco put it on your filesystem):
$ raco zordoz private/compiled/zo-string_rkt.zo
INFO: Loading bytecode file 'private/compiled/zo-string_rkt.zo'...
INFO: Parsing bytecode...
INFO: Parsing complete!
--- Welcome to the .zo shell,version 1.0 'vortex'---
zo>
Now we can start typing commands, like info.
zo> info
<zo:compilation-top>
max-let-depth : 31
prefix : <zo:prefix>
code : <zo:mod>
Next, let’s try a dive.
zo> dive max-let-depth
'dive max-let-depth'not permitted.
Didn’t work! That’s because max-let-depth is an integer. Let’s try one of the structs.
zo> dive prefix
zo> info
<zo:prefix>
num-lifts : 0
toplevels : [#f]
stxs : []
Great! We can’t dive any further from here, so let’s go back up.
zo> back
zo> info
<zo:compilation-top>
max-let-depth : 31
prefix : <zo:prefix>
code : <zo:mod>
And we’re back to where we began. From here we could dive to the code field and print it, but the printout is a little overwhelming. The module we’re exploring, zo-string, creates over 40 different functions. There’s just a lot of data to look at, and because it’s heterogeneous data we do not have a nice way of truncating it.
Instead, we’ll try the find command. Be warned, the search might take a minute.
zo> find compilation-top
FIND returned 0 results
Zero results is good: there should not be any other compilation-top structs besides the one we’re currently in. Now try searching for something else, like branch.
zo> find branch
FIND returned 422 results
FIND automatically saving context
<zo:branch>[422]
Wow! Over 400 results. We can start exploring one of them:
zo> dive 17
zo> info
<zo:branch>
test : <zo:application>
then : <zo:seq>
else : <zo:branch>
We can also explore its children and parents. (Here up is an alias for back.)
zo> dive test
zo> info
<zo:application>
rator : <zo:primval>
rands : [<zo:localref>]
zo> dive rator
zo> info
<zo:primval>
id : 90
zo> up
zo> up
zo> info
<zo:branch>
test : <zo:application>
then : <zo:seq>
else : <zo:branch>
zo> up
zo> info
<zo:branch>
test : <zo:localref>
then : <zo:branch>
else : #f
And if we do a jump, we return to the search results.
zo> jump
zo> info
<zo:branch>[422]
3 API
These functions support the REPL, but may be useful in more general settings. Import them with (require zordoz).
3.1 Starting a REPL
procedure
(filename->shell fname) → void?
fname : path-string?
Starts a REPL session for exploring the bytecode file fname.
procedure
(syntax->shell stx) → void?
stx : syntax?
Starts a REPL session for exploring the bytecode compiled from the syntax object stx.
3.2 String Representations
These tools convert a zo structure to a pretty-printed string, or a more structured representation.
procedure
(zo->string z [#:deep? deep]) → string?
z : zo?
deep : boolean? = #t
> (displayln (zo->string (primval 129)))
<zo:primval>
id : 129
> (displayln (zo->string (primval 129) #:deep? #f))
<zo:primval>
> (displayln (zo->string (branch (= 3 1) "true" 'false)))
<zo:branch>
test : #f
then : true
else : false
A spec, as produced by zo->spec, is a list containing:
A string, representing the struct’s name.
Pairs, representing the struct’s fields. The first element of each pair should be a string representing the field name. The second element should be a thunk that, when forced, yields either a string or another spec.
> (zo->spec (primval 129))
'("primval" ("id" . #<procedure:...te/zo-string.rkt:87:26>))
> (zo->spec (branch (= 3 1) "true" 'false))
'("branch"
("test" . #<procedure:...te/zo-string.rkt:87:26>)
("then" . #<procedure:...te/zo-string.rkt:87:26>)
("else" . #<procedure:...te/zo-string.rkt:87:26>))
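To make the shape of a spec concrete, here is a small sketch that builds one by hand and renders it. It does not call zordoz: example-spec is a hand-written value in the format described above, and spec->lines is a hypothetical renderer that forces each thunk and shows nested specs by name only.

```racket
#lang racket/base
(require racket/string)

;; A hand-built spec: a name followed by (field-name . thunk) pairs.
;; Each thunk yields either a string or a nested spec.
(define example-spec
  (list "branch"
        (cons "test" (lambda () "#f"))
        (cons "then" (lambda () (list "primval" (cons "id" (lambda () "1")))))
        (cons "else" (lambda () "false"))))

;; Hypothetical renderer: force each thunk and print one field per line.
;; Nested specs are shown by name, without descending into them.
(define (spec->lines spec)
  (cons (format "<zo:~a>" (car spec))
        (for/list ([field (in-list (cdr spec))])
          (define v ((cdr field)))
          (format "~a : ~a"
                  (car field)
                  (if (string? v) v (format "<zo:~a>" (car v)))))))

(define rendered (string-join (spec->lines example-spec) "\n"))
```

Keeping field values behind thunks means a consumer only pays for the fields it actually forces, which matters for deeply nested structs.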
3.3 Traversing zo Structs
Racket does not provide a reflective way of accessing struct fields at runtime, so we provide a function, zo-transition, that does this by force, just for zo structures. Given a zo struct z and a field name str, it returns two values:
If the field str does not exist, or does not denote a zo struct, return the argument z and the boolean value #f.
If the field str denotes a list and we can parse zo structs from the list, return a list of zo structs and the boolean #t.
(Expected case) If the field points to a zo struct, return the new zo struct and the boolean #t.
> (let-values ([(z success?) (zo-transition (primval 42) "foo")]) (displayln success?) z)
#f
'#s((primval expr 0 form 0 zo 0) 42)
> (let-values ([(z success?) (zo-transition (primval 42) "id")]) (displayln success?) z)
#f
'#s((primval expr 0 form 0 zo 0) 42)
> (let-values ([(z success?) (zo-transition (application (primval 42) '()) "rator")]) (displayln success?) z)
#t
'#s((primval expr 0 form 0 zo 0) 42)
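The three cases can be modeled without zordoz on a toy data structure. In this sketch, node is a hypothetical stand-in for a zo struct and transition is a simplified imitation of the behavior described above, not the library's code.

```racket
#lang racket/base
;; Hypothetical stand-in for a zo struct: a name plus an association
;; list mapping field names to values.
(struct node (name fields) #:transparent)

;; Returns two values: the next context and a success flag.
(define (transition z field-name)
  (define entry (assoc field-name (node-fields z)))
  (define v (and entry (cdr entry)))
  (cond
    ;; The field holds a struct: move to it.
    [(node? v) (values v #t)]
    ;; The field holds a list with at least one struct: move to the structs.
    [(and (list? v) (ormap node? v)) (values (filter node? v) #t)]
    ;; Missing field or plain data: stay where we are and report failure.
    [else (values z #f)]))

(define pv (node "primval" (list (cons "id" 42))))
(define app (node "application" (list (cons "rator" pv) (cons "rands" '()))))

(define-values (missing missing-ok?) (transition pv "foo"))   ; no such field
(define-values (plain plain-ok?) (transition pv "id"))        ; field is a number
(define-values (child child-ok?) (transition app "rator"))    ; field is a struct
```

Returning the original struct together with #f lets a caller keep its current context on failure without a separate error path.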
3.4 Searching Structs
If you know the name of the zo struct you hope to find by exploring a subtree, you can automate the exploring. Literally, find is repeated application of zo->string and zo-transition.
The return value is a list of result structs rather than plain zo structs because we record the path from the argument z down to each match.
> (let* ([seq* (list (seq '()) (seq '()))] [z (seq (list (seq seq*) (seq seq*)))]) (zo-find z "seq" #:limit 1))
(list
(result
'#s((seq form 0 zo 0) (#s((seq form 0 zo 0) ()) #s((seq form 0 zo 0) ())))
'())
(result
'#s((seq form 0 zo 0) (#s((seq form 0 zo 0) ()) #s((seq form 0 zo 0) ())))
'()))
> (let* ([thn (primval 0)] [els (branch #t (primval 1) (primval 2))] [z (branch #t thn els)]) (map result-zo (zo-find z "primval")))
'(#s((primval expr 0 form 0 zo 0) 0)
#s((primval expr 0 form 0 zo 0) 1)
#s((primval expr 0 form 0 zo 0) 2))
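The idea of pairing each match with the path that led to it can be sketched on a toy tree. Everything here is hypothetical: node stands in for a zo struct, hit for a result, and find-by-name is a plain depth-first search written for illustration. In particular, the exact contents and order of the path that zordoz records are assumptions.

```racket
#lang racket/base
(struct node (name children) #:transparent)  ; stand-in for a zo struct
(struct hit (zo path) #:transparent)         ; a match plus the nodes passed on the way down

;; Depth-first search of z's descendants (not z itself) for nodes whose
;; name is `name`. `path` holds the ancestors below the root, nearest first.
(define (find-by-name z name [path '()])
  (apply append
         (for/list ([child (in-list (node-children z))])
           (define below (find-by-name child name (cons child path)))
           (if (equal? (node-name child) name)
               (cons (hit child path) below)
               below))))

(define inner (node "branch" (list (node "primval" '()) (node "primval" '()))))
(define z (node "branch" (list (node "primval" '()) inner)))
(define hits (find-by-name z "primval"))
```

With the path stored next to each match, a caller can walk back up from any hit without searching again.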
procedure
(find-all fname qry* [#:limit lim]) → void?
fname : path-string?
qry* : (Listof String)
lim : (or/c natural-number/c #f) = #f
3.5 Compiling and Decompiling
Tools for compiling syntax fragments rather than entire modules.
procedure
(syntax->zo stx) → zo?
stx : syntax?
> (syntax->zo #'6)
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) 6)
> (syntax->zo #'(member 'a '(a b c)))
'#s((compilation-top zo 0)
2
#hash()
#s((prefix zo 0)
0
(#s((module-variable zo 0)
#<module-path-index:("member.rkt" "pre-base.rkt" "private/base.rkt" racket/base)>
member
0
0
#f))
()
insp0)
#s((application expr 0 form 0 zo 0)
#s((toplevel expr 0 form 0 zo 0) 2 0 #f #t)
(a (a b c))))
> (syntax->zo #'(if #t 'left 'right))
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) left)
procedure
(syntax->decompile stx) → any/c
stx : syntax?
> (syntax->decompile #'6)
'(begin (quote inspector insp0) '6)
> (syntax->decompile #'(member 'a '(a b c)))
'(begin
(quote inspector insp0)
(|_member@(lib "racket/private/member.rkt")| 'a '(a b c)))
> (syntax->decompile #'(if #t 'left 'right))
'(begin (quote inspector insp0) 'left)
procedure
(compiled-expression->zo cmp) → zo?
cmp : compiled-expression?
> (compiled-expression->zo (compile-syntax #'6))
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) 6)
> (compiled-expression->zo (compile-syntax #'(member 'a '(a b c))))
'#s((compilation-top zo 0)
2
#hash()
#s((prefix zo 0)
0
(#s((module-variable zo 0)
#<module-path-index:("member.rkt" "pre-base.rkt" "private/base.rkt" racket/base)>
member
0
0
#f))
()
insp0)
#s((application expr 0 form 0 zo 0)
#s((toplevel expr 0 form 0 zo 0) 2 0 #f #t)
(a (a b c))))
> (compiled-expression->zo (compile-syntax #'(if #t 'left 'right)))
'#s((compilation-top zo 0) 0 #hash() #s((prefix zo 0) 0 () () insp0) left)
procedure
(zo->compiled-expression z) → compiled-expression?
z : zo?
> (let* ([stx #'(string-append "hello, " "world")] [z (syntax->zo stx)] [e (zo->compiled-expression z)]) (eval e (make-base-namespace)))
"hello, world"
procedure
(toplevel-syntax->zo stx) → (listof zo?)
stx : syntax?
> (toplevel-syntax->zo #'(begin (define x 5) x))
'(#s((compilation-top zo 0)
0
#hash()
#s((prefix zo 0) 0 (#s((global-bucket zo 0) x)) () insp0)
#s((def-values form 0 zo 0)
(#s((toplevel expr 0 form 0 zo 0) 0 0 #f #f))
5))
#s((compilation-top zo 0)
0
#hash()
#s((prefix zo 0) 0 (#s((global-bucket zo 0) x)) () insp0)
#s((toplevel expr 0 form 0 zo 0) 0 0 #f #f)))
3.6 Compiling C Modules
Tools for compiling modules implemented in C.
procedure
(compile-c-module c-path) → void?
c-path : (or/c path-string? path?)
See Inside: Racket C API for more information on how to build Racket modules in C.
WARNING: Do not replace the file produced by the functions while still inside the Racket VM. Doing so will cause undefined and potentially catastrophic behavior. As a general rule of thumb, if you modify a C file implementing a module, shut down all Racket VMs using that library. This means restarting DrRacket (not just reloading the file) whenever the C file is modified.
c-path is the path to the C file that implements the module.
For example:
(require zordoz)
(compile-c-module "c-module.c")
(dynamic-require "c-module" 0)
syntax
(from-c c-path)
c-path : path-string?
c-path is the path to the C file that implements the module.
For example:
(require zordoz (from-c "c-module.c"))
3.7 Typed API
A typed variant of this API is available with (require zordoz/typed).
(require zordoz/typed) | package: zordoz |
(require zordoz/typed/zo-structs) | package: zordoz |
4 Future Work
Bytecode is a useful format to program with. The language is much simpler than full-blown Racket. Besides this simple command-line explorer, here are a few ideas for future tools.
See also the project’s issue tracker.
4.1 Checking two files for differences
Code search – find a library function similar to a code chunk
Detect malware / corrupted files
Cheating detection software
See S6 project for inspiration. Also, possibly, MOSS.
4.2 More Search Options
The current find tool is very simple. It is only string matching on struct names. You need to know what structs are available before using it at all.
Search for an identifier from the source code.
Search for patterns of structs (lambdas containing if-statements, I dunno)
Find contracts, or instances of another higher-order construct.
Search for often-run structs.
SXPath seems promising; we should try replacing the spec/c representation with XML.
4.3 Bytecode Graphs
Working with the REPL gives a nice idea of what the struct hierarchy in the bytecode looks like. With a paper and pencil, I can trace out the whole picture for myself.
Unless space / efficiency becomes a problem, we should be able to generate a picture. Possibly use the racket-explorer, or maybe just make a graphical tool for DrRacket and record the images as the user steps through. (Imagine a REPL that remembered the picture of a user’s search path.)
4.4 DrRacket Integration
The command line is nice, but an interactive panel within DrRacket would be much nicer. One of these days.