Usage
*****
.. current-library:: binary-data
.. current-module:: binary-data
Terminology
===========
A vector of bytes that has an associated definition for its
interpretation is a *frame*. These come in two variants: some cannot
be broken down further structurally, we call those *leaf frames*. The
others have a composite structure, those are *container frames*. They
consist of a number of *fields*, which are the named components of
that frame. Every field in a container frame is a frame in itself,
leading to a recursive definition of frames. The description of the
structure of a container frame in our domain-specific language
:macro:`define binary-data` is referred to as a *binary data
definition*.
Representation in Dylan
=======================
The binary-data library provides an extension to Dylan for manipulating frames,
with a representation of frames as Dylan objects, and a set of functions on
these objects to perform the manipulation. The representation used
introduces a class hierarchy rooted at the abstract superclass :class:``,
with the two disjoint abstract subclasses :class:`` and
:class:``. Every type of frame in the system is represented
as a concrete subclass of either one, and actual frames are instances of
these classes. A pair of generic functions, :class:`parse-frame` and
:gf:`assemble-frame`, convert a given byte vector into the appropriate
high-level instance of :class:``, or vice versa.
Typical code that handles a frame then looks like this:
.. code-block:: dylan
let frame = parse-frame(, some-byte-vector);
format-out("This packet goes from %= to %=\n\",
frame.source-address,
frame.destination-address);
The first line binds the variable frame to an instance of some subclass of
````. This instance is created from the vector of bytes
passed to the call of :gf:`parse-frame`. Then, the value of the source and
destination address fields in the Ethernet frame are extracted and printed.
The class :class:`` defines several generic functions:
.. hlist::
* :gf:`parse-frame` instantiates a :class:`` with the value taken from a given byte-vector
* :gf:`assemble-frame` encodes a :class:`` instance into its byte-vector
* :gf:`frame-size` returns the size (in bit) of the given frame
* :gf:`summary` prints a human-readable summary of the given frame
Some properties are mixed in into our class hierarchy by introducing
the direct subclasses of :class:``:
For efficiency reasons, there is a distinction between frames that
have a static (compile-time) size (:class:``) and
frames of dynamic size (:class:``).
Another property is translation of the value into a Dylan object of
the standard library. An example of such a :class:``
is the (fixed size) type :class:`<2byte-big-endian-unsigned-integer>`
which is translated into a Dylan :drm:``. This is referred to
as a *translated frame* while frames without a matching Dylan type are
known as *untranslated frames* (:class:``).
The appropriate classes and accessor functions are not written
directly for container frames. Rather, they are created by invocation
of the macro :macro:`define binary-data`. This serves two purposes: it
allows a more compact representation, eliminating the need to write
boilerplate code over and over again, and it hides implementation
details from the user of the DSL.
Frame Types
===========
Leaf Frames
-----------
A leaf frame can be fixed or variable size, and translated or
untranslated. Examples are:
.. hlist::
* :class:`` has a variable size and no translation
* :class:`` has a zero size and no translation
* :class:`` (e.g. an IPv4 address) has a fixed size and no translation
* :class:`<2byte-big-endian-unsigned-integer>` has a fixed size of 16 bits, and its translation is a Dylan :drm:``.
FIXME: :class:`` is variable size and
untranslated, though :drm:`as` in both directions with :drm:``
is provided (should inherit from translated frame)
The generic function :gf:`read-frame` is used to convert a
:drm:`` into an instance of a ``.
FIXME: why is read-frame not defined on container-frame?
The running example in this guide will be an ````,
which contains the mac address of the source and a mac-address of the
destination. A mac address is the unique address of each network
interface, assigned by the IEEE. It consists of 6 bytes and is usually
printed in hexadecimal, each byte separated by ``:``.
The definition of the ```` class in Dylan is:
.. code-block:: dylan
define class ()
end;
define inline method field-size (type == )
=> (length :: )
6 * 8
end;
define method mac-address (data :: )
=> (res :: )
parse-frame(, data)
end;
define method mac-address (data :: )
=> (res :: )
read-frame(, data)
end;
define method read-frame (type == , string :: )
=> (res :: )
let res = as-lowercase(string);
if (any?(method(x) x = ':' end, res))
//input: 00:de:ad:be:ef:00
let fields = split(res, ':');
unless(fields.size = 6)
signal(make())
end;
make(,
data: map-as(,
rcurry(string-to-integer, base: 16),
fields))
else
//input: 00deadbeef00
...
end;
end;
define method as (class == , frame :: )
=> (string :: )
reduce1(method(a, b) concatenate(a, ":", b) end,
map-as(,
rcurry(integer-to-string, base: 16, size: 2),
frame.data))
end;
The data is stored in the ``data`` slot of the
:class:``, the ``field-size`` method
returns statically 48 bit, syntax sugar for constructing
```` instances are provided, ``read-frame`` converts a
````, whereas ``as`` converts a ```` into human
readable output.
A leaf frame on its own is not very useful, but it is the building
block for the composed container frames.
Container Frame
---------------
The container frame class inherits from :class:``
and :class:``.
A container frame consists of a sequence of fields. A field represents
the static information about a protocol: the name of the field, the
frame type, possibly a start and length offset, a length, a method for
fixing the byte vector, ...
The list of fields for a given :class:`` persists
only once in memory, the dynamic values are represented by
:class:`` objects.
Methods defined on :class:``:
.. hlist::
* :gf:`fields` returns the list of :class:`` instances
* :gf:`field-count` returns the size of the list
* :gf:`frame-name` returns a short identifier of the frame
The definer macro :macro:`define binary-data` translates the
binary-data DSL into a class definition which is a subclass of
:class:`` (and other useful stuff).
The class :class:`` is a direct subclass of
:class:`` which is used for container frames which
consist of a header (addressing, etc) and some payload, which might
also be a container-frame of variable type.
The running example is an ````, which is shown as
binary data definition.
.. code-block:: dylan
define binary-data ()
summary "ETH %= -> %=", source-address, destination-address;
field destination-address :: ;
field source-address :: ;
layering field type-code :: <2byte-big-endian-unsigned-integer>;
variably-typed field payload, type-function: frame.payload-type;
end;
The first line specifies the name ````, and its
superclass, :class:``.
The second line specialises the method :gf:`summary` on an
```` to print ``ETH``, the source address and the
destination address.
The remaining lines represent each one field in the ethernet frame
structure. The ``source-address`` and ``destination-address`` are each
of type ````.
The ``type-code`` field is a 16 bit integer, and it is a ``layering``
field (:class:``). This means that its value is used
to determine the type of its payload! Also, when assembling such a
frame, the layering field will be filled out automatically depending
on the payload type. There can be at most one ``layering`` field in a
binary data definition.
The last field is the payload, whose type is variable and given by
applying the function ``payload-type`` to the concrete frame instance.
The default type-function of a :class:`` is
:func:`payload-type`.
A payload for an ```` might be a ````, if
the ``type-code`` is ``#x8100`` (the ``over`` keyword takes care of
the hairy details).
.. code-block:: dylan
define binary-data ()
over #x8100;
summary "VLAN: %=", vlan-id;
field priority :: <3bit-unsigned-integer> = 0;
field canonical-format-indicator :: <1bit-unsigned-integer> = 0;
field vlan-id :: <12bit-unsigned-integer>;
layering field type-code :: <2byte-big-endian-unsigned-integer>;
variably-typed field payload, type-function: frame.payload-type;
end;
Default values for fields can be provided, similar to Dylan class
definitions, using the equals sign (``=``) after the field type.
A more detailed description of the binary data language can be found
in its reference :macro:`define binary-data`.
Inheritance: Variably Typed Container Frames
--------------------------------------------
A container frame can inherit from another container frame that
already has some fields defined. The
:class:`` class is used in container
frames which have the type information encoded in the frame. The
layering field (:class:``) of such container
frames must be parsed in order to determine the actual type.
Continuing with the ```` example, consider the `options of an
IPv4 packet `__. These share a
common header (``copy-flag`` and ``option-type``), but a concrete option
might have additional fields. The end of the options list is determined by
the ``header-length`` field of an IPv4 packet and by the
```` (whose ``option-type`` is 0).
.. code-block:: dylan
define abstract binary-data ()
field copy-flag :: <1bit-unsigned-integer>;
layering field option-type :: <7bit-unsigned-integer>;
end;
define binary-data ()
over 0;
end;
define binary-data ()
over 20;
field router-alert-length :: = 4;
field router-alert-value :: <2byte-big-endian-unsigned-integer>;
end;
This defines the ```` which has the ``option-type`` field
in the ip-option frame set to ``0``. An ```` does not
contain any further fields, thus only has the two fields inherited
from the ````.
The ```` specifies two more fields, which are
appended to the inherited fields.
Fields
======
The domain-specific language :macro:`define binary-data` provides
syntactic sugar to create :class:`` instances. A client should
not need to instantiate these directly. A field contains the static
information (such as type, length, default value) of a sequence of
bits inside of a :class:``.
Binary data formats have some common patterns which are directly
integrated into this library:
* *variably-typed* fields for payloads
* *layering* of protocols in the OSI network stack
* *enumeration* where the bit value has a direct correspondence to a :drm:``
* *repeating* occurences of a field, such as key-value pairs
.. note:: There might be more patterns, if you find any, please tell us!
Variably-typed
--------------
Most fields have the same type in all frame instances, i.e. they are
statically typed. In some cases however, the type of a field can
depend on the value of another field in the same :class:``.
Such fields can be defined using :class:`` which does
not have a static type, but an expression determining the field type for a
concrete frame instance.
This example uses the ``variably-typed field`` syntax. The
``type-function`` keyword has ``frame`` bound to the concrete frame
object.
.. code-block:: dylan
field length-type :: <2bit-unsigned-integer>;
variably-typed field body-length,
type-function: select (frame.length-type)
0 => ;
1 => <2byte-big-endian-unsigned-integer>;
2 => <4byte-big-endian-unsigned-integer>;
3 => ;
end;
Note that whenever the actual type of a variably-typed field resolves to the
type it means that the field is completely missing from the
container frame.
Layering
--------
Binary data format stacking is omnipresent in network protocols. An
ethernet frame can contain different types of payload, amongst others
ARP frames, IPv4 frames. This library provides syntactic sugar
``layering`` to define which field in a frame determines the type of
the payload. A binary data definition can also specify which value is
used to be the payload of another binary data format.
A layering field (:class:``) provides the information
that the value of this field controls the type of the payload, and
establishes a registry for field values and matching payload types.
The registry can be extended with the ``over`` syntax of
:macro:`define binary-data`, and it can be queried using the
convinience function :func:`payload-type`, or :gf:`lookup-layer` and
:gf:`reverse-lookup-layer`.
Enumeration
-----------
An enumerated field (:class:``) provides a set of mappings
from the binary value to a Dylan :drm:``. Note that the binary
value must be a numerical type so that the mapping is from an integer
to a symbol.
In this example, accessing the value of the field would return one of
the symbols rather than the value of the :class:``. For
mappings not specified, the integer value is used:
.. code-block:: dylan
enum field command :: = 0,
mappings: { 1 <=> #"connect",
2 <=> #"bind",
3 <=> #"udp associate" };
Repeating
---------
Repeated fields (:class:``) have a list of values of
the field type, instead of just a single one. Currently two kinds
of repeated fields are supported, :class:``
and :class:``, they only differ in the way the
number of elements in the repeated field is determined.
A self-delimited field definition uses an expression to evaluate whether
or not the end of the list of values has been reached, usually by checking
for a magic value. This expression should return ``#t`` when the field is
fully parsed. For example:
.. code-block:: dylan
repeated field options :: ,
reached-end?:
instance?(frame, );
A count field definition uses another field in the frame to determine
how many elements are in the field. For example:
.. code-block:: dylan
field number-methods :: ,
fixup: frame.methods.size;
repeated field methods :: ,
count: frame.number-methods;
Note the use of the ``fixup`` keyword on the ``number-methods`` field to
calculate a value for use by :gf:`assemble-frame` if the value is not
otherwise specified.
Adding a New Leaf Frame Type
----------------------------
Depending on the properties of the frame, there are different methods
which should be specialized. In general, there need to be a
specialization of the size, how to parse, and how to assemble the
frame.
There are two generic functions which should be specialized by every
:class:`` subclass: :gf:`parse-frame` and
:gf:`read-frame`.
.. note:: there should be a ``print-frame`` as well, rather than using ``as(, frame)``.
Fixed size frames must specialize :gf:`field-size`, variable sized
ones :gf:`frame-size`.
Translated frames must specialize :gf:`high-level-type` and
:gf:`assemble-frame-into-as`.
Untranslated frames must specialize :gf:`assemble-frame-into`.
There are already several classes and macros implemented where these
methods are defined.
See also
- :class:``
- :class:``
- :class:``
- :class:``
- :class:``
- :macro:`define n-byte-vector`
- :macro:`define n-bit-unsigned-integer`
- :macro:`define n-byte-unsigned-integer`
- :class:``
- :class:``
- :class:``
- :class:``
- :class:``
- :class:``
Efficiency Considerations
=========================
The design goal of this library is, as usual in object-centered
programming, that the time and space overhead are minimal (the
compiler should remove all the indirections!).
This library is carefully designed to achieve this goal, while not
limiting the expressiveness, sacrificing the safety, or burdening the
developer with inconvenient syntactic noise. A story about binary data
is that there are often big chunks of data, and deeply nested pieces
of data. The good news is that most applications do not need all
binary data.
The binary data library was designed with lazy parsing in mind: if a
byte vector is received, the high-level object does not parse the byte
vector completely, but only the requested fields. To achieve this, we
gather information about each field, specifically its start and end
offset, and also its length, already at compile time, using a number
system consisting of the type union between :drm:`` and
:const:`$unknown-at-compile-time`, for which basic arithmetic is
defined.
For fixed sized fields, meaning single fields with a static and fixed
size frame type, their length is propagated while the DSL iterates
over the fields. All field offsets for the ```` are
known at compile time. Accessing the ``payload`` is a subsequence
operation (performing zerocopy) starting at bit 112 (or byte 15) of
the binary vector.
While at the user level arithmetic is on the bit level, accesses at
byte boundaries are done directly into the byte vector. This is
encapsulated in the class :class:``
FIXME: move to a separate module
Each binary data macro call defines a container class with two direct
subclasses, a high-level decoded class
(:class:``) and a partially parsed one with
an attached byte-vector (:class:``). The
decoded class has a list of :class:`` instances, which
contain the metadata (size, fixup function, reference to the field,
etc.) of each field. The partially parsed class reuses this class in
its ``cache`` slot, and keeps a reference to its byte vector in
another slot.