The Meta Library

Introduction

This is an implementation of Meta, a technique used to simplify the task of writing parsers. [Baker91] describes Meta and shows the main ideas for an implementation in Common Lisp.

If all META did was recognize regular expressions, it would not be very useful. It is a programming language, however, and the operations [], {} and $ correspond to the Common Lisp control structures AND, OR, and DO.[8] Therefore, we can utilize META to not only parse, but also to transform. In this way, META is analogous to “attributed grammars” [Aho86], but it is an order of magnitude simpler and more efficient. Thus, with the addition of the “escape” operation “!”, which allows us to incorporate arbitrary Lisp expressions into META, we can not only parse integers, but produce their integral value as a result. – [Baker91]

The macro defined here is an attempt to implement Meta (with slightly adapted syntax) for Dylan. It is functional, but not yet optimized.

Exported facilities

The meta-library exports the meta module with the following macros:

meta-definer Macro

Meta integrates the ability to parse from streams and strings in one facility. (The parsing of lists is not implemented yet, because it’s rather useless in Dylan. This addition would be simple to do, though.)

Macro-call:
define meta name (variables) => (results)
  meta body
end
Parameters:
  • name – The meta-function name, which is immediately transformed into scan-name

  • variables – – token-holders used in meta body.

  • results – An expression returned on a successful scan. This can be omitted in which case, the return value will be #f.

  • body (meta) – A sequence of Meta expressions to scan

Discussion:

The meta-definer form works only with the parse-string source-type of the with-meta-syntax form.

The user of this form has control over the return value. Usually #t is sufficient (in which case the results clause may be omitted, see below); however, e.g., the values of the variables may need to be manipulated during the parse phase.

Example:
define meta public-id(s, pub) => (pub)
  "PUBLIC", scan-s(s), scan-pubid-literal(pub)
end meta public-id;

This definition returns pub when it successfully scans the tokens “PUBLIC”, (some) spaces, and a literal which pub receives. Note that, hereafter, the meta definition is referred to as scan-public-id outside the meta syntax block.

This example, without an explicit results block, is from the meta library:

define meta s(c)
  element-of($space, c), loop(element-of($space, c))
end meta s;

Scans in at least one space (element-of and loop are discussed in the section on Meta expressions).

with-meta-syntax Macro

The guts of the meta-definer form; use when requiring precise control of variables or constructs

Macro Call:

with-meta-syntax
source-type (source #key keys)
  [ variables ]
  meta;
  body
end

Parameters:
  • source-type – Either parse-stream or parse-string.

  • source – Either a stream or a string, depending on source-type

  • start (#key) – If source-type is parse-string, the index to start at.

  • end (#key) – If source-type is parse-string, the index to finish before.

  • pos (#key) – If source-type is parse-string, a name that will be bound to the current index during execution of the with-meta-syntax forms.

  • meta – A Meta expression.

  • body – A body. Evaluated only if parsing is successful.

Values:
  • results – If parsing fails #f, otherwise the values of body.

Discussion:

Special programming aids:

variables (variable [ :: type ] [ = init ], ...);

Bind variables to init, which defaults to #f;

Future versions will have further special forms.

Example:
with-meta-syntax parse-stream (*standard-input*)
  body
end with-meta-syntax;

let query :: <string> = ask-user();
with-meta-syntax parse-string (query, start: 23, end: 42)
  body
end with-meta-syntax;

with-meta-syntax parse-string (query)
  ... ['\n', finish()] ...
  values(these, values, will, be, returned);
end with-meta-syntax;
collector-definer Macro

General facility to collect data into sequences (by default, into strings). Initially with-meta-syntax had this functionality integrated. This is a more modular approach.

with-collector Macro

The guts of the collector-definer form. This macro allows collecting data into a sequence. This is similar in spirit to Common Lisp’s LOOP clauses COLLECT and APPEND, but more flexible. If you want to extract subsequences from a string while parsing it, this is the tool to use.

Macro Call:

with-collector operation ... #key collect, append;
  body
end

Parameters:
  • operation – Specifies the mode of operation. See below.

  • collect – A name for a function that, called with a parameter, inserts this parameter into the sequence.

  • append – A name for a function that, called with a sequence, appends this parameter to the sequence.

  • body – A Dylan body (bnf).

Values:
  • result – Normally the values of body. There is minimal form of with-collector, which always returns the collected sequence.

Discussion:

Like COLLECT, with-collector can put objects into a list. Unlike COLLECT, it can also create vectors or write into already created vectors.

Collecting into a list or vector

Two simple forms of with-collector are into-list and into-vector. They create a list or a vector and write into it. The sequence is available as a variable with a user-defined name:

with-collector into-list name #key collect, append;
  body
end

with-collector into-vector name #key collect, append;
  body
end

Writing into an existing vector

into-vector, by default, creates a <stretchy-sequence>. If you don’t like this behaviour, you can specify a different vector that will be used. For instance, if you already know how long the result will be, you might want to create a string in the first place.

with-collector into-vector name = init, #key collect, append;
  body
end

Using buffers

Normally it is not known in advance how long the result will be. What is really needed is a sequence that automatically reduces its size after processing is finished. into-buffer implements this by returning a subsequence of the original vector.

Instead of a variable holding the sequence there is now a function which creates the subsequence.

with-collector into-buffer function-name, #key collect, append;
  body
end

But how do you find out what the maximum buffer size has to be? A safe guess is the length of the original vector you are extracting elements from. The following construct automatically creates a vector of the same class (well, type-for-copy) and size as big-one:

with-collector into-buffer function-name like big-one, #key collect, append;
  body
end

A minimal collection form

If you don’t need to write into vectors or use buffers, but just want to collect some stuff and return it, use this idiom:

with-collector {into-list|into-vector}, #key collect, append;
  body
  // Note: Values of body will be thrown away.
end

Example:
define function parse-finger-query (query :: <string>)
  with-collector into-buffer user like query, collect: collect;
    with-meta-syntax parse-string (query)
      let (whois, at, c);
      [loop(' '), {[{"/W", "/w"}, yes!(whois)], []},        // Whois switch?
       loop(' '), loop({[{'\n', '\r'}, finish()],           // Newline? Quit.
            {['@', yes!(at), do(collect('@'))], // @? Indirect.
             [type(<character>, c),             // Else:
              do(collect(c))]}})];              //   Collect char
      values(whois, user(), at);
    end with-meta-syntax;
  end with-collector;
end function parse-finger-query;

Aside from the syntactic constructors exported above, the Meta library also provides some commonly-used forms for scanning and parsing:

scan-s Function

Scans in at least one space.

scan-word Function

Scans in a token and returns that token as a <string> instance. A “word” is surrounded by spaces or any of the following characters: ‘<’, ‘>’, ‘{’, ‘}’, ‘[’, ‘]’, punctuation (‘,’, ‘?’, or ‘!’), or the single- or double- quotation-mark.

scan-int Function

Reads in digit characters and returns an <integer> instance.

scan-number Function

Although this is not an all-encompassing conversion utility (although, IMHO, it’s good enough to be part of the standard, once there is one, YMMV), it reads in just about any fixed-point number format and returns a <real> instance.

string-to-number Function
Parameters:
  • str – An instance of <string>, the string to convert to a number.

  • base – An instance of <integer>, defaults to 10, the base of the number in the string.

Values:
  • ans – An instance of <real>, the resulting number.

Discussion:

This really should belong to the common-dylan spec, so that instead of rolling their own, everyone should use this function. It is therefore exported to this end.

Scanning tokens usually entails using some common character types. Meta exports the following:

$space Constant
Equivalent:

Any whitespace.

$digit Constant
Equivalent:

[0-9]

$letter Constant
Equivalent:

[a-zA-Z]

$num-char Constant
Equivalent:

$digit ++ [.eE+]

$graphic-char Constant
Equivalent:

[_@#$%^&*()+=~/]

$any-char Constant
Equivalent:

$letter ++ $num-char ++ $graphic-char

Meta expressions

Meta is a small, but featureful language, so naturally it has its own syntax. This syntax is adapted to Dylan’s way of writing things, of course.

There are several basic Meta expressions implementing the core functionality. Additionally there are some pseudo-functions, syntactically function-like constructs which simplify certain tasks that would otherwise have to be written manually.

Basic Meta expressions as described by Baker

Baker

``with-meta-syntax``

Description

fragment

fragment

try to match this

[a b c ... n]

[a, b, c, ..., n]

and/try all

{a b c ... n}

{a, b, c, ..., n}

or/first hit

@(type variable)

type(type, variable)

match any type, store result in variable

Warning

deprecated type is most often used in seeing if a character is one of several possibilities. Use element-of instead.

$foo

loop(foo)

zero or more

!Lisp

(Dylan)

call the code (and check result)

The same grammar which works for streams will works for strings. When parsing strings, more than just one-character look-ahead is possible, though. You can therefore not only match against characters, but also whole substrings. This does not work when reading from a stream.

Additional pseudo-function expressions

``with-meta-syntax``

Description

Could be written as

do(Dylan)

call the code and continue (whatever the result is)

(Dylan; #t)

finish()

finish parsing successfully

not possible

test(predicate)

Match against a predicate.

not possible

test(predicate, variable)

Match against a predicate, saving the result.

not possible

peeking(variable, test)

Save result first, so that expression test can use it.

not possible;

Warning: deprecated, usepeek instead

peek(variable, test)

Look one character ahead and store in variable if it passes test. Leave the character on the stream.

not possible

element-of(sequence, variable)

Sees if the variable (a character) is a member of the sequence, storing the result

{ ‘a’, ‘b’, ‘c’ } (but not storing result)

yes!(variable)

Set variable to #t and continue.

(variable := #t)

no!(variable)

Set variable to #f and continue.

(variable := #f; #t)

set!(variable, value)

Set variable to value and continue.

(variable := value; #t)

accept(variable)

Match anything and save result.

type(<object>, variable)

Example code

Parsing an integer (base 10)

Common Lisp version:

(defun parse-integer (&aux (s +1) d (n 0))
  (and
   (matchit
    [{#\+ [#\- !(setq s -1)] []}
    @(digit d) !(setq n (digit-to-integer d))
    $[@(digit d) !(setq n (+ (* n 10) (digit-to-integer d)))]])
   (* s n)))

Direct translation to Dylan:

define constant <digit> = one-of('0','1','2','3','4','5','6','7','8','9');

define function parse-integer (source :: <stream>);
  let s = +1; // sign
  let n = 0;  // number
  with-meta-syntax parse-stream (source)
    variables(d);
    [{'+', ['-', (s := -1)], []},
     type(<digit>, d), (n := digit-to-integer(d)),
     loop([type(<digit>, d), (n := digit-to-integer(d) + 10 * n)])];
    (s * n)
  end with-meta-syntax;
end function parse-integer;

Alternative version:

// this will actually return a fn named 'scan-int', not 'parse-int'
define collector int(i) => (as(<string>, str).string-to-integer)
  loop([element-of("+-0123456789", i), do(collect(i))])
end collector int;

Parsing finger queries

define function parse-finger-query (query :: <string>)
  with-collector into-buffer user like query (collect: collect)
    with-meta-syntax parse-string (query)
      variables (whois, at, c);
      [loop(' '), {[{"/W", "/w"}, yes!(whois)], []},        // Whois switch?
       loop(' '), loop({[{'\n', '\r'}, finish()],           // Newline? Quit.
            {['@', yes!(at), do(collect('@'))], // @? Indirect.
             [accept(c), do(collect(c))]}})];   // then collect char
      values(whois, user(), at);
    end with-meta-syntax;
  end with-collector;
end function parse-finger-query;

References

[Baker91] Baker, Henry. “Pragmatic Parsing in Common Lisp”. ACM Lisp Pointers 4, 2 (Apr-Jun 1991), 3-15.