EBML: A binary encoding format
1 EBML
This is a Racket library to read and write EBML, the "extended binary markup language." It was designed by the implementors of the “matroska” project.
EBML is a packed format for representing structured data.
element = header data |
|
header = encoded-id encoded-len |
|
data = element ... |
| non-element |
The encoded-id and encoded-len fields use a
variable-length encoding. The first byte of this encoding
indicates—
So, for instance, the byte string
(bytes #b10001011)
... encodes the length 11, while the byte string
(bytes #b01000001 #b00000010)
... encodes the length 258. Note that there is more than one way to encode a given length; you could also encode the length 11 with the byte string
(bytes #b00100000 #b00000000 #b00001011)
Header IDs are limited to four bytes, and data lengths are limited to 8.
EBML has at least one big problem, which is that the packed representation is ambiguous. Specifically, there’s no way to reliably distinguish data that is a sequence of elements from data that is a binary blob.
To choose a concrete example, if you were using EBML to encode s-expressions, you might choose a particular header id (say, 1) to encode a pair, and another one (say, 2) to encode atoms (let’s use 3 to indicate null, just to simplify). The pairs would include two sub-elements, and the atoms would contain, say string or integer. When decoding an encoded stream, the reader needs to have a priori knowledge that the header id 1 contains sub-elements, and the headers 2 and 3 do not.
The reader and writer both use this representation of ebml-elements:
(define ebml-element? (flat-murec-contract ([ebml-element? (list/c exact-nonnegative-integer? (or/c bytes? (listof ebml-element?)))]) ebml-element?))
(require ebml/reader) | package: ebml |
;ping-pong required to know when to recur: (define (read-ebml-sexps bytes) (map expand-ebml-sexp (ebml-read bytes))) ;recur on elements that are containers: (define (expand-ebml-sexp element) (match element [(list 1 exps-bytes) (apply cons (read-ebml-sexps exps-bytes))] [(list 2 atom-bytes) (string->symbol (bytes->string/utf-8 atom-bytes))] [(list 3 atom-bytes) empty]))
procedure
in : (or/c input-port? path-string? bytes?)
(require ebml/writer) | package: ebml |
procedure
(ebml-write elements [port]) → void?
elements : (listof ebml-element?) port : (output-port?) = (current-output-port)