Squicky: a quick wiki parser

Norman Gray <http://nxg.me.uk>

This is a parser for a wiki syntax based closely on WikiCreole, as described below. The source repository is available at Bitbucket.

Version 1.1.2, released 2015 September 22

1 Usage

The dialect parsed here is the consensus WikiCreole syntax of http://www.wikicreole.org/. It handles all of the WikiCreole test cases, except for one test of wiki-internal links (which is in any case somewhat underspecified).

In particular, the supported syntax is
  • //italics//

  • **bold** : A line which begins with **, with possible whitespace either side, is a (second-level) bulleted list if the line before it is a bulleted list, but is a paragraph starting with bold text otherwise.

  • ##monospaced text## : A line which begins with ##, with possible whitespace either side, is a (second-level) enumerated list if the line before it is an enumerated list, but is a paragraph starting with monospace text otherwise. [This is not specified in the WikiCreole definition, but is clearly compatible with it].

  •  * bulleted list : (including sublists; the asterisk may or may not be indented)

  •  # numbered list : (including sublists)

  • >quoted paragraph : including multiple levels (this appears to be an extension of WikiCreole).

  • [[link to wikipage]]

  • [[URL|description]]

  • {{image.png}} or {{image.png|alt text}} or {{image.png|att=value;att2=value; or more}}. In the last form, which is an extension to the Creole syntax, each att names an attribute of the HTML <img> element, such as class. An att must immediately follow the semicolon, so in this example the text after the second semicolon does not start a new attribute, and the final attribute parses as att2=value; or more. If the att is omitted, it defaults to alt.

  • == heading

  • === subheading

  • ==== subsubheading

  • line\\break

  • ----

    (four dashes in a row, on a line by themselves) horizontal rule

  • ~x escaped character, and ~http://url which isn’t linked

  • {{{in-line literal text}}}

Blocks of verbatim text (which will typically be rendered to <pre> blocks) can be specified with:

{{{
preformatted text
}}}

The opening {{{, and its closing partner, must be on lines by themselves. The newline after the opening marker, and the newline before the closing one, are ignored.

Tables look like this:

|=Heading Col 1 |=Heading Col 2         |
|Cell 1.1       |Two lines\\in Cell 1.2 |
|Cell 2.1       |Cell 2.2               |

To this I add syntax:
  • ::key value... : adds ‘metadata’, which can be retrieved with the lookup function. For example, after ::title Interesting things, the value of the ‘title’ key will be Interesting things. The :: must be at the beginning of a line.

  • "quoted" : corresponds to <q>quoted</q> (note that’s a double-quote character, not two single quotes).

  • <<element-name content>> : adds <element-name>content</element-name> to the output.

  • The att=value syntax for {{image.png}} is an extension.

  • !!target value ... : processing instruction support – adds a sexp equivalent to <?target value ...?> to the output

As an example, the following parses some input text, and writes it out as XML.

(require xml
         squicky
         (prefix-in srfi19: srfi/19))
 
(define (write-html-to-port wiki-text port)
  (write-xml/content
   (xexpr->xml
    `(html ((xmlns "http://www.w3.org/1999/xhtml"))
           (head
            ,@(cond [(or (lookup-parsed wiki-text 'date)
                         (lookup-parsed wiki-text 'updated))
                     => (λ (date)
                           `((meta ((name "DC.date")
                                    (content ,(srfi19:date->string date "~4"))))))]
                    [else '()])
            (title ,(or (lookup wiki-text 'title) "Title")))
           (body
            ,@(body wiki-text))))
   port)
  (newline port))
 
(write-html-to-port (parse (current-input-port))
                    (current-output-port))

Suitable input text would be:

::date 2010 December 12
== Here is a heading
Here is some text, with a list comprising:
  * one
  * two.

That's quite //astonishing!//.

The parsing is intended to be tolerant. No matter how garbled the WikiCreole syntax, the parser should not produce an error, or a body which fails to satisfy the (listof xexpr?) contract.
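
The tolerance claim can be checked informally with a minimal sketch such as the following. The input string is an arbitrary, deliberately malformed example invented for illustration, and the check simply restates the contract mentioned above using xexpr? from the xml library.

```racket
#lang racket/base
(require squicky
         (only-in xml xexpr?))

;; Deliberately malformed input (illustrative only): an unclosed bold
;; marker, a dangling link, and an unterminated in-line literal.
(define garbled
  "== heading **unclosed bold\n[[dangling link\n{{{ unterminated")

;; If the parser is as tolerant as intended, this does not raise an error.
(define parsed (parse garbled))

;; The body is expected to satisfy (listof xexpr?) even for garbled input.
(define body-ok? (andmap xexpr? (body parsed)))
```

If body-ok? were ever #f, or parse raised an exception, that would indicate a departure from the intended behaviour rather than an error in the input.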

2 Command line

To convert input text to output, use

% racket -l squicky -- --html input.wiki >input.html

Give the option --help for other instructions.

3 Reference

procedure

(parse source) → wikitext/c

  source : (or/c port? string?)
Parse the source into a wikitext object.

procedure

(wikitext? x) → boolean?

  x : any/c
Returns #t if x is a parsed wikitext object.

procedure

(body wikitext) → (listof xml:xexpr?)

  wikitext : wikitext/c
Extract the body of the document from the wikitext object. Each ‘block’ structure – such as a paragraph or a header – produces a separate XML xexpr?. This sequence of xexpr? will have to be wrapped inside a further list/element before it can, for example, be processed by the XML module’s xexpr->xml function (so (cons 'doc (body wikitext)) creates an xexpr representing a doc element containing the parsed content).
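
A minimal sketch of that wrapping step, assuming a short inline string as input. The element name doc is arbitrary, and xexpr->string is the standard function from the xml library.

```racket
#lang racket/base
(require squicky
         (only-in xml xexpr->string))

;; A small illustrative document, parsed from a string rather than a port.
(define wiki (parse "== A heading\n\nSome //emphasised// text.\n"))

;; (body wiki) is a list of xexprs, one per block, so it needs a single
;; enclosing element before it can be serialised. 'doc is an arbitrary name.
(define doc-xexpr (cons 'doc (body wiki)))

(display (xexpr->string doc-xexpr))
(newline)
```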

procedure

(lookup wikitext key) → (or/c string? false/c)

  wikitext : wikitext/c
  key : symbol?
Retrieve the metadata value corresponding to key, or #f if the key was not specified.
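
As a short sketch of this, assuming an inline input string with a single ::title line (the author key is simply one that the input does not set):

```racket
#lang racket/base
(require squicky)

(define wiki (parse "::title Interesting things\n\nSome body text.\n"))

;; A string, since the input contains a ::title line.
(define title (lookup wiki 'title))

;; #f, since the input contains no ::author line.
(define missing (lookup wiki 'author))
```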

procedure

(lookup/multiple wikitext key) → (listof string?)

  wikitext : wikitext/c
  key : symbol?
Retrieve the multiple metadata values corresponding to key, or an empty list if there are none. Thus if a metadata key appears several times in the input file, then all of its values appear here, in order.
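
For instance, a repeated key might be handled as in the following sketch. The author and editor keys, and the names used as values, are purely illustrative.

```racket
#lang racket/base
(require squicky)

;; The same key is given twice in the input.
(define wiki (parse "::author Ann\n::author Bob\n\nSome body text.\n"))

;; All values given for 'author, in input order.
(define authors (lookup/multiple wiki 'author))

;; No ::editor line appears in the input, so this is the empty list.
(define editors (lookup/multiple wiki 'editor))
```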

procedure

(lookup-parsed wikitext key) → any

  wikitext : wikitext/c
  key : symbol?
Like lookup, except that, depending on the key, the value is returned as a parsed object. See also lookup-value-parser.

procedure

(lookup-keys wikitext) → (listof symbol?)

  wikitext : wikitext/c
Return the list of available keys.

procedure

(set-metadata! wikitext key value) → any

  wikitext : wikitext/c
  key : symbol?
  value : string?
Set a metadata key to have the given value. This changes the value retrieved by lookup, but lookup/multiple returns this and any previous values. There is currently no way to fully replace a value.
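
The interaction between set-metadata!, lookup and lookup/multiple can be sketched as follows, with illustrative title values. The order in which lookup/multiple lists the old and new values is not stated here, so the sketch makes no assumption about it.

```racket
#lang racket/base
(require squicky)

(define wiki (parse "::title Draft\n\nSome body text.\n"))

(set-metadata! wiki 'title "Final")

;; lookup now reports the newly set value.
(define current (lookup wiki 'title))

;; lookup/multiple still includes the earlier value as well as the new one.
(define all-titles (lookup/multiple wiki 'title))
```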

procedure

(squicky-version) → string?

(squicky-version with-repo-revision?) → string?
  with-repo-revision? : boolean?
Returns a string giving the version of the squicky parser. If with-repo-revision? is true, then the output includes the identifier of the repository revision this represents.

3.1 Parsing metadata lookups

The default parsing function for lookup-parsed treats specially only the 'date and 'updated keys, which are returned as SRFI-19 date objects. The date parser is reasonably lenient, and recognises all of 2010-09-01, 2010-09-01T12:34:56, 2010 September 1, 1 Sep 2010, Sep 1, 2010, September 1, 2010, 1-09-2010, 1/9/2010 and 1 September 2010 as the same date. Note that nn/nn/nnnn dates are parsed as day-month-year, not month-day-year; ISO-8601-style formats are probably the most reliable in general.
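
A brief sketch of the default behaviour, assuming an inline input string with an ISO-8601-style date. The "~4" format directive is the one used in the usage example earlier in this document.

```racket
#lang racket/base
(require squicky
         (prefix-in srfi19: srfi/19))

(define wiki (parse "::date 2010-09-01\n::title A title\n\nSome body text.\n"))

;; With the default parser, the 'date key yields an SRFI-19 date object.
(define parsed-date (lookup-parsed wiki 'date))

;; Keys other than 'date and 'updated are not treated specially.
(define parsed-title (lookup-parsed wiki 'title))

;; Format the date in ISO-8601 style for display.
(define date-string (srfi19:date->string parsed-date "~4"))
(display date-string)
(newline)
```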

You may override this parsing with a parameter:

parameter

(lookup-value-parser) → (-> symbol? string? any/c)

(lookup-value-parser parser) → void?
  parser : (-> symbol? string? any/c)
A parameter which evaluates to a parsing function for lookup-parsed. The function is given a key and a non-#f string, and may return anything, including #f. If, for a given key, lookup would return the value #f, then lookup-parsed returns #f. Otherwise, lookup-parsed returns the value of ((lookup-value-parser) key value). Thus, if the function does not recognise the key, it should return the value unchanged.
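
One way to install a custom parser is sketched below. The priority key and the priority-aware-parser function are hypothetical names invented for this illustration. The sketch delegates to whatever parser was previously in force, so that keys such as 'date keep their existing handling.

```racket
#lang racket/base
(require squicky
         (only-in racket/string string-trim))

;; Keep hold of the parser currently in force, so that other keys
;; (including 'date and 'updated) are handled as before.
(define default-parser (lookup-value-parser))

;; Hypothetical parser: the illustrative 'priority key is read as a number.
;; string->number returns #f if the value is not numeric.
(define (priority-aware-parser key value)
  (if (eq? key 'priority)
      (string->number (string-trim value))
      (default-parser key value)))

(define wiki (parse "::priority 3\n::title Things\n\nSome body text.\n"))

;; The custom parser applies only within the dynamic extent of parameterize.
(define priority
  (parameterize ([lookup-value-parser priority-aware-parser])
    (lookup-parsed wiki 'priority)))
```

Using parameterize rather than setting the parameter globally keeps the override local to the calls that need it.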