6.3.90.900

Binary-class: parsing and saving binary data

Roman Klochkov <kalimehtar@mail.ru>

 (require binary-class) package: binary-class

The binary-class module combines binary-class/base, binary-class/common, and binary-class/string.

1 Binary class. Base system.

 (require binary-class/base) package: binary-class

This package is based on an idea from Practical Common Lisp.

Binary formats are usually represented as a sequence of fields, so this module lets you define classes whose fields are described by binary types.

For example, the ID3 tag header is specified like this:

ID3/file identifier      "ID3"
ID3 version              $02 00
ID3 flags                %xx000000
ID3 size             4 * %0xxxxxxx

It may be represented as:
(define-binary-class id3-tag
  ((file-identifier (iso-8859-1-bytes 3))
   (major-version   u1)
   (revision        u1)
   (flags           u1)
   (size            id3-tag-size)
   (frames          (id3-frames size))))

Here iso-8859-1-bytes should be a function of one argument that returns a binary structure; u1 and id3-tag-size are simply such structures.

struct

(struct binary (read write)
    #:extra-constructor-name make-binary)
  read : (-> input-port? any)
  write : (or/c (->* (output-port? any/c) #:rest list? void?)
                (-> output-port? any/c void?))
A structure type for binary values. read is a function that reads from an input port and returns the data (possibly as several values). write is a function that takes an output port and the data, and writes the data to the port.
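
As a minimal sketch of building such a structure by hand, the following defines a hypothetical bool-byte type (not part of the library) that stores a boolean in one byte. It assumes only the binary constructor described above and the read-value and write-value helpers from this module.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; Hypothetical type: a boolean stored as a single byte (0 = #f, anything else = #t).
(define bool-byte
  (binary
   (lambda (in) (not (zero? (read-byte in))))       ; read: input port -> value
   (lambda (out v) (write-byte (if v 1 0) out))))   ; write: output port, value -> void

;; Read one value from an in-memory port.
(define flag (read-value bool-byte (open-input-bytes (bytes 1))))

;; Write one value to an in-memory port.
(define out (open-output-bytes))
(write-value bool-byte out #f)
(define written (get-output-bytes out))
```

Because bool-byte is an ordinary binary value, it could also be used as the type of a field in define-binary-class.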

Note that you may use the values of previous fields to calculate the type of the next one. In the example, the value of the field size is used to set the type of the field frames.

Another common case is "tagged structures": there is a tag at the beginning of the data, and the structure of the rest of the data depends on the value of the tag.

To handle this case, use the #:dispatch option.
(define-binary-class id3-frame
  ((id     (iso-8859-1-bytes 3))
   (size   u3))
  #:dispatch (find-frame-class id))

The function find-frame-class should return the binary class for the given id.

You may even use an arbitrary expression after #:dispatch:
(define-binary-class id3-tag
  ((identifier     (iso-8859-1-bytes 3))
   (major-version  u1)
   (revision       u1)
   (flags          u1)
   (size           id3-tag-size))
  #:dispatch
   (case major-version
     ((2) id3v2.2-tag)
     ((3) id3v2.3-tag)))

You may also use inheritance: simply add the superclass after the class name.
(define-binary-class id3v2.2-tag id3-tag
  ((frames (id3-frames size id3v2.2-frame))))

If you use #:dispatch, the resulting class should either inherit from the current class or at least have all the fields that the current class has. The superclass of a binary class may also be a non-binary class, but then it must not have methods named read and write.
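
The following sketch puts #:dispatch and inheritance together on a made-up format. The classes shape, circle and rect and the byte layout are hypothetical; the example assumes that the dispatched class continues reading after the fields of its superclass, as in the id3-tag example above, and that fields are accessible with get-field.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; Hypothetical tagged format: one tag byte, then a tag-specific payload.
(define-binary-class shape
  ((tag u1))
  #:dispatch (case tag
               [(1) circle]    ; classes defined below; looked up when reading
               [(2) rect]))

(define-binary-class circle shape
  ((radius u2)))

(define-binary-class rect shape
  ((width  u2)
   (height u2)))

;; Tag 2 selects rect; the remaining bytes are two big-endian u2 values.
(define s (read-value shape (open-input-bytes (bytes 2 0 3 0 4))))
(define s-is-rect? (is-a? s rect))
(define s-width  (get-field width s))
(define s-height (get-field height s))
```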

syntax

(define-binary-class id [superclass-expr]
  ((field field-expr arg ...) ...)
  [#:dispatch dispatch-expr]
  class-body ...)
 
field = _
  | id
  | (id ...)
 
  superclass-expr : class?
  field-expr : (or/c binary? (implementation?/c binary<%>))
  dispatch-expr : (is-a?/c binary<%>)
Defines a new binary class and binds it to id. class-body is any sequence of definitions allowed inside class.

field may be _, which means that the field is omitted: no field is created in the class, but the data is still read from and written to the binary port. The value used for writing is #f.

field may be a list of ids. In this case the binary returned from field-expr should produce the same number of values.

superclass-expr may be either the id of a binary class or any expression returning a non-binary class. Returning a binary class from an expression is not an error, but the fields of that class will not be visible inside the field-exprs of the current class.

If field-expr returns a class implementing binary<%>, then the provided args are used as init arguments to make-object; otherwise they are ignored.

If field-expr returns neither a binary<%> class nor a binary, then the returned value is assigned to the field as is.
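
A minimal sketch of defining and using a class, under the assumption that the generated fields can be read with get-field. The class point and its layout are hypothetical.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; Hypothetical record: two big-endian 16-bit coordinates.
(define-binary-class point
  ((x u2)
   (y u2)))

;; Read an object from an in-memory port and inspect its fields.
(define p (read-value point (open-input-bytes (bytes 0 1 0 2))))
(define xy (list (get-field x p) (get-field y p)))

;; Write the same object back using the write method of binary<%>.
(define out (open-output-bytes))
(send p write out)
(define round-trip (get-output-bytes out))
```

Writing the object back should reproduce the four input bytes, since every field is written with the same type it was read with.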

A binary class implements the interface binary<%>:

interface

binary<%> : interface?

method

(send a-binary read in) → (is-a?/c binary<%>)

  in : input-port?
Reads the object from in and returns it.

method

(send a-binary write out) → void?

  out : output-port?
Writes the object to out.

1.1 Utilities

To make the module easier to use, there are some shortcuts for reading and writing binary values.

procedure

(read-value type in init-v ...) → any

  type : any/c
  in : input-port?
  init-v : any/c
Reads a binary value from an input port and returns it. When type is a binary, read-value returns the value read from in with the read field of type. When type is a binary<%> class, read-value makes an object with the given init-vs, fills it from in, and returns it. When type is any other value, read-value returns type itself.

procedure

(write-value type out value) → void?

  type : any/c
  out : output-port?
  value : any/c
Writes a binary value to an output port. When type is a binary, write-value writes value to out with the write field of type. When type is a binary<%> class, write-value assumes that value is a binary object and writes it to out. When type is any other value, write-value does nothing.
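
A short sketch of both helpers on in-memory ports, using the u2 type from binary-class/common. The symbol 'not-a-type is just an arbitrary non-type value chosen for illustration.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; Reading: two bytes interpreted as a big-endian unsigned integer, 1*256 + 2.
(define n (read-value u2 (open-input-bytes (bytes 1 2))))

;; Writing: the same value encoded back into two bytes.
(define out (open-output-bytes))
(write-value u2 out 258)
(define encoded (get-output-bytes out))

;; Per the description above, a value that is neither a binary nor a
;; binary<%> class is returned unchanged and nothing is read from the port.
(define passthrough (read-value 'not-a-type (open-input-bytes #"")))
```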

2 Common datatypes

 (require binary-class/common) package: binary-class

The most common data in binary files are integers in little-endian or big-endian order and byte strings. This module provides binary types for them.

procedure

(integer-be bytes [bits-per-byte]) → binary?

  bytes : exact-positive-integer?
  bits-per-byte : exact-positive-integer? = 8
Returns a binary datatype for an unsigned integer in big-endian order.

procedure

(integer-le bytes [bits-per-byte]) → binary?

  bytes : exact-positive-integer?
  bits-per-byte : exact-positive-integer? = 8
Returns a binary datatype for an unsigned integer in little-endian order.

procedure

(signed base-type bytes [bits-per-byte]) → binary?

  base-type : (-> exact-positive-integer? exact-positive-integer? binary?)
  bytes : exact-positive-integer?
  bits-per-byte : exact-positive-integer? = 8
Returns a binary datatype for a signed integer. base-type is expected to be integer-be, integer-le, or any other function with the same signature that returns a binary type.
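
The sketch below shows how these constructors might be combined. It assumes that bits-per-byte controls how many low bits of each byte are significant (as in the ID3 size field, which uses 7 bits per byte) and that signed uses two's complement; id3-tag-size and s1 are names chosen here for illustration.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; ID3 stores the tag size in 4 bytes with 7 significant bits each.
(define id3-tag-size (integer-be 4 7))
(define size (read-value id3-tag-size (open-input-bytes (bytes 0 0 2 1))))

;; Same two bytes, little-endian this time: 1 + 2*256.
(define le (read-value (integer-le 2) (open-input-bytes (bytes 1 2))))

;; A signed one-byte integer built on top of integer-be.
(define s1 (signed integer-be 1))
(define minus-one (read-value s1 (open-input-bytes (bytes #xFF))))
```

Under those assumptions, size is 2*128 + 1, le is 513, and minus-one is -1.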

value

u1 : binary?

value

u2 : binary?

value

u3 : binary?

value

u4 : binary?

value

l1 : binary?

value

l2 : binary?

value

l3 : binary?

value

l4 : binary?

Binary types for big-endian integers u1 through u4 and little-endian ones l1 through l4. The digit 1 to 4 gives the length of the integer in bytes.

Binary types for real numbers: big-endian float-be and double-be, and little-endian float-le and double-le.

procedure

(bytestring bytes) → binary?

  bytes : exact-positive-integer?
Returns a binary type that reads the given number of bytes from the binary port as a bytes? value and writes a bytes? value back to the port.

2.1 Control binary types

procedure

(discard bytes) → binary?

  bytes : exact-positive-integer?
Reads the given number of bytes and returns #f. Writes the given number of null bytes. Recommended for use with field id _ when you see "Reserved" in the specification.

procedure

(constant bytes) → binary?

  bytes : bytes?
When reading, checks that bytes is what was read and returns #f. When writing, writes bytes. Recommended for use with field id _ when you see "Signature" in the specification.
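
A sketch of both control types on a made-up layout. The class chunk-header and its format are hypothetical; the example uses constant with the _ field as recommended above and exercises discard directly through write-value.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; Hypothetical header: a 4-byte signature followed by a little-endian size.
(define-binary-class chunk-header
  ((_          (constant #"RIFF"))   ; checked on read, emitted on write
   (chunk-size l4)))

(define h
  (read-value chunk-header
              (open-input-bytes (bytes-append #"RIFF" (bytes 4 0 0 0)))))
(define chunk-size-read (get-field chunk-size h))

;; discard ignores the value it is given and writes null bytes.
(define out (open-output-bytes))
(write-value (discard 3) out #f)
(define padding (get-output-bytes out))
```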

current-position: returns the current position in the file when reading. Writes nothing.

procedure

(move-position position) → binary?

  position : exact-nonnegative-integer?
Sets the position in the input port when reading and in the output port when writing.

procedure

(ref position type init-v ...) → binary?

  position : exact-nonnegative-integer?
  type : binary?
  init-v : any/c
Moves to the given position, processes type, then returns to the original position. If init-vs are given and type is a binary<%> class, the init-vs are passed to read-object when reading type.

3 Strings

 (require binary-class/string) package: binary-class

This module provides several binary types for reading and writing string? values.

procedure

(generic-string length character-type) → binary?

  length : exact-positive-integer?
  character-type : binary?
Returns a type describing a string with the given fixed length. character-type describes how to read and write each char?.

procedure

(generic-terminated-string terminator character-type) → binary?

  terminator : char?
  character-type : binary?
Returns a type describing a string with the given terminator and character-type. The terminator is present in the file, but not in the Racket string.

procedure

(iso-8859-1-string length) → binary?

  length : exact-positive-integer?
A string represented in the file as an ISO 8859-1 string with fixed length. Only char?s with codes up to 255 are allowed.

procedure

(iso-8859-1-terminated-string [terminator]) → binary?

  terminator : char? = #\nul
A string represented in the file as an ISO 8859-1 string with a terminator. Only char?s with codes up to 255 are allowed.
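
A small sketch of the fixed-length and terminated ISO 8859-1 types on in-memory ports. It assumes that reading a terminated string consumes the terminator, which is how the description of generic-terminated-string above reads.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; Fixed length: exactly three characters are read.
(define fixed (read-value (iso-8859-1-string 3) (open-input-bytes #"ID3")))

;; Terminated: characters are read up to the default #\nul terminator.
(define in (open-input-bytes (bytes-append #"abc" (bytes 0) #"rest")))
(define terminated (read-value (iso-8859-1-terminated-string) in))
(define remaining (read-bytes 4 in))   ; whatever follows the terminator

;; Writing appends the terminator after the characters.
(define out (open-output-bytes))
(write-value (iso-8859-1-terminated-string) out "hi")
(define written (get-output-bytes out))
```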

procedure

(ucs-2-string length) → binary?

  length : exact-positive-integer?
A string represented in the file as a UCS-2 string with fixed length. Only char?s with codes up to 65535 are allowed.

procedure

(ucs-2-terminated-string [terminator]) → binary?

  terminator : char? = #\nul
A string represented in the file as a UCS-2 string with a terminator. Only char?s with codes up to 65535 are allowed.

iso-8859-1-char: binary type for a single ISO 8859-1 character.

procedure

(ucs-2-char swap?) → binary?

  swap? : boolean?
A UCS-2 character. swap? is #t when the byte order in the file differs from that of the operating system.

procedure

(ucs-2-char-type byte-order-marker) → binary?

  byte-order-marker : (or/c 65279 65534)
Returns the UCS-2 character type for the given byte order marker (BOM). The BOM should read as #xFEFF; if it reads as #xFFFE, then ucs-2-char should be swapped.
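
One way the BOM might be used is to read it as a u2 field and derive the character type for the following text from it. This is only a sketch: the class ucs-2-text and its layout are hypothetical, and it assumes that a BOM read with u2 is the value ucs-2-char-type expects.

```racket
#lang racket
(require binary-class)   ; assumes the binary-class package is installed

;; Hypothetical layout: a BOM followed by exactly two UCS-2 characters.
(define-binary-class ucs-2-text
  ((bom  u2)
   ;; The type of this field depends on the value of the previous one.
   (text (generic-string 2 (ucs-2-char-type bom)))))

(define doc
  (read-value ucs-2-text
              (open-input-bytes (bytes #xFE #xFF 0 65 0 66))))
(define bom-read  (get-field bom doc))
(define text-read (get-field text doc))
```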

4 Performance and safety

By default, contracts in binary-class/* check only function arguments. If you need more safety or more performance, you may use their submodules instead: the submodule safe gives maximum safety with full contract checks, and the submodule unsafe gives maximum performance with no checks at all.

 (require (submod binary-class safe))
 (require (submod binary-class unsafe))
 (require (submod binary-class/base safe))
 (require (submod binary-class/base unsafe))
 (require (submod binary-class/common safe))
 (require (submod binary-class/common unsafe))
 (require (submod binary-class/string safe))
 (require (submod binary-class/string unsafe))

5 Contracts

 (require binary-class/contract) package: binary-class

This module introduces useful contracts for use with binary classes.

procedure

(binary/c value-contract) → contract?

  value-contract : contract?
Returns a contract for a binary whose value is contracted by value-contract.

syntax

(binary-class/c binary-class-id maybe-opaque member-spec ...)

 
maybe-opaque = 
  | #:opaque
     
member-spec = method-spec
  | (field field-spec ...)
  | (init field-spec ...)
  | (init-field field-spec ...)
  | (inherit method-spec ...)
  | (inherit-field field-spec ...)
  | (super method-spec ...)
  | (inner method-spec ...)
  | (override method-spec ...)
  | (augment method-spec ...)
  | (augride method-spec ...)
  | (absent absent-spec ...)
     
method-spec = method-id
  | (method-id method-contract-expr)
     
field-spec = field-id
  | (field-id contract-expr)
     
absent-spec = method-id
  | (field field-id ...)
Defines a contract for a binary class. binary-class-id should be the id of an existing binary class. The remaining arguments are the same as for class/c.

procedure

(unsigned-integer/c bytes [bits-per-byte]) → flat-contract?

  bytes : exact-integer?
  bits-per-byte : exact-integer? = 8
Defines a contract for integer-be and integer-le with the given bytes and bits-per-byte. Checks that the value is an exact integer within the available range.

procedure

(signed-integer/c bytes [bits-per-byte]) → flat-contract?

  bytes : exact-integer?
  bits-per-byte : exact-integer? = 8
Defines a contract for signed with the given bytes and bits-per-byte. Checks that the value is an exact integer within the available range.
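
The integer contracts can be tried out as plain predicates. The sketch below uses flat-contract-predicate from racket/contract and assumes the usual ranges for a one-byte integer: 0 to 255 unsigned, and -128 to 127 signed (two's complement).

```racket
#lang racket
(require binary-class/contract)   ; assumes the binary-class package is installed

;; Turn the flat contracts into ordinary predicates.
(define u8-ok? (flat-contract-predicate (unsigned-integer/c 1)))
(define s8-ok? (flat-contract-predicate (signed-integer/c 1)))

(define unsigned-results (list (u8-ok? 0) (u8-ok? 255) (u8-ok? 256) (u8-ok? -1)))
(define signed-results   (list (s8-ok? -128) (s8-ok? 127) (s8-ok? 128)))
```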

value

u1? : flat-contract?

value

u2? : flat-contract?

value

u3? : flat-contract?

value

u4? : flat-contract?

Contracts for values of u1 through u4 and l1 through l4.

value

iso-8859-1-char? : flat-contract?

Contract for an ISO 8859-1 character. Asserts that it can be converted to the ISO 8859-1 charset.

value

ucs-2-char? : flat-contract?

Contract for a UCS-2 character. Asserts that it can be converted to the UCS-2 charset.

value

iso-8859-1-string? : flat-contract?

Contract for an ISO 8859-1 string. Asserts that all chars can be converted to the ISO 8859-1 charset.

value

ucs-2-string? : flat-contract?

Contract for a UCS-2 string. Asserts that all chars can be converted to the UCS-2 charset.

procedure

(iso-8859-1-len/c len) → flat-contract?

  len : real?
Recognizes ISO 8859-1 strings that have fewer than len characters.

procedure

(ucs-2-len/c len) → flat-contract?

  len : real?
Recognizes UCS-2 strings that have fewer than len characters.

procedure

(iso-8859-1-terminated/c terminator) → flat-contract?

  terminator : char?
Recognizes ISO 8859-1 strings that do not contain the terminator character.

procedure

(ucs-2-terminated/c terminator) → flat-contract?

  terminator : char?
Recognizes UCS-2 strings that do not contain the terminator character.

procedure

(string-terminated/c terminator [char-contract]) → flat-contract?

  terminator : char?
  char-contract : flat-contract? = any/c
Recognizes strings that do not contain the terminator character and whose characters match char-contract.
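
A brief sketch of the string contracts used as predicates, again via flat-contract-predicate from racket/contract. The sample strings are arbitrary.

```racket
#lang racket
(require binary-class/contract)   ; assumes the binary-class package is installed

;; Strings that can be written with a #\nul terminator.
(define no-nul? (flat-contract-predicate (iso-8859-1-terminated/c #\nul)))
(define plain-ok?    (no-nul? "abc"))
(define embedded-ok? (no-nul? (string #\a #\nul #\b)))   ; contains the terminator

;; Strings whose characters all fit into ISO 8859-1.
(define latin1? (flat-contract-predicate iso-8859-1-string?))
(define accent-ok? (latin1? "caf\u00E9"))   ; U+00E9 is within ISO 8859-1
(define lambda-ok? (latin1? "\u03BB"))      ; U+03BB is outside ISO 8859-1
```

Checks like these are useful before writing, since a string containing the terminator could not be read back intact.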