Squicky: a quick wiki parser

6.3.90.900


This is a parser for a wiki syntax based closely on WikiCreole, as described below. The source repository is available on Bitbucket.

Version 1.1.2, released 2015 September 22

1 Usage

The dialect parsed here is the consensus WikiCreole syntax of http://www.wikicreole.org/. It handles all of the WikiCreole test cases, except for one test of wiki-internal links (which is in any case somewhat underspecified).

In particular, the supported syntax is:

//italics//
**bold** : A line which begins with **, with possible whitespace on either side, is a (second-level) bulleted list if the line before it is a bulleted list, but is a paragraph starting with bold text otherwise.
##monospaced text## : A line which begins with ##, with possible whitespace on either side, is a (second-level) enumerated list if the line before it is an enumerated list, but is a paragraph starting with monospace text otherwise. [This is not specified in the WikiCreole definition, but is clearly compatible with it.]
* bulleted list : (including sublists; the asterisk may or may not be indented)
# numbered list : (including sublists)
>quoted paragraph : including multiple levels (this appears to be an extension of WikiCreole).
[[link to wikipage]]
[[URL|description]]
{{image.png}} or {{image.png|alt text}} or {{image.png|att=value;att2=value; or more}} : in the last form, which is an extension to the Creole syntax, each att names an attribute of the HTML <img> element, such as class. An attribute name must immediately follow the semicolon, so the last example gives att2 the value ‘value; or more’. If the att= part is omitted, as in {{image.png|alt text}}, the attribute defaults to alt.
== heading
=== subheading
==== subsubheading
line\\break
---- : (four dashes in a row, on a line by themselves) horizontal rule
~x escaped character, and ~http://url which isn’t linked
{{{in-line literal text}}}
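The context-dependent reading of a leading ** (described under **bold** above) can be checked directly. This sketch assumes the squicky collection is installed; it parses a ** line that follows a bulleted item and a ** line with no list before it, then prints the xexprs so the two structures can be compared. The exact element names produced are not asserted here.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require squicky)

;; After a bulleted item, a leading ** starts a second-level bullet.
(define as-sublist (body (parse "* fruit\n** apples\n")))

;; With no list before it, a leading ** starts bold text in a paragraph.
(define as-bold (body (parse "**Note:** apples are fruit.\n")))

;; Print each block-level xexpr for comparison.
(for-each writeln as-sublist)
(for-each writeln as-bold)
```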

Blocks of verbatim text, which will typically be rendered as <pre> blocks, can be specified with:

{{{
preformatted text
}}}

The opening {{{, and its closing partner, must be on lines by themselves. The newline after the opening marker, and the newline before the closing one, are ignored.

Tables look like this:

|=Heading Col 1 |=Heading Col 2 |
|Cell 1.1 |Two lines\\in Cell 1.2 |
|Cell 2.1 |Cell 2.2 |

To this, I add the following syntax:

::key value... : adds ‘metadata’, which can be retrieved with the lookup function. For example, after ::title Interesting things, the value of the ‘title’ key will be Interesting things. This must be at the beginning of a line.
"quoted" : corresponds to <q>quoted</q> (note that’s a double-quote character, not two single quotes).
<<element-name content>> : adds <element-name>content</element-name> to the output.
The att=value syntax for {{image.png}} is an extension.
!!target value ... : processing instruction support – adds a sexp equivalent to <?target value ...?> to the output
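As a sketch of the ::key syntax, assuming the squicky collection is installed, the following parses a document whose first line sets a title and reads the value back with lookup. A key that never appears in the input gives #f.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require squicky)

;; The ::title line starts at the beginning of a line, as required.
(define doc
  (parse "::title Interesting things\n\nSome body text.\n"))

(define title (lookup doc 'title))    ; documented value: "Interesting things"
(define missing (lookup doc 'author)) ; no ::author line, so #f

(printf "title: ~a\n" title)
```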

For example, the following program parses some input text and writes it out as XML.

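#lang racket/base
(require xml
         squicky
         (prefix-in srfi19: srfi/19))

;; Write WIKI-TEXT (a parsed wikitext object) to PORT as an XHTML document.
(define (write-html-to-port wiki-text port)
  (write-xml/content
   (xexpr->xml
    `(html ((xmlns "http://www.w3.org/1999/xhtml"))
           (head
            ;; Emit a Dublin Core date only if a ::date or ::updated key is present.
            ,@(cond [(or (lookup-parsed wiki-text 'date)
                         (lookup-parsed wiki-text 'updated))
                     => (λ (date)
                           `((meta ((name "DC.date")
                                    (content ,(srfi19:date->string date "~4"))))))]
                    [else '()])
            (title ,(or (lookup wiki-text 'title) "Title")))
           (body
            ;; Splice in the block-level xexprs produced by the parser.
            ,@(body wiki-text))))
   port)
  (newline port))

;; Read wiki text from standard input and write XHTML to standard output.
(write-html-to-port (parse (current-input-port))
                    (current-output-port))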

Suitable input text would be:

::date 2010 December 12

== Here is a heading

Here is some text, with a list comprising:

* one
* two.

That's quite //astonishing!//.

The parsing is intended to be tolerant. No matter how garbled the WikiCreole syntax, the parser should neither raise an error nor produce a body which fails to satisfy the (listof xexpr?) contract.
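The tolerance guarantee can be exercised with deliberately malformed input. This is a minimal check, assuming the squicky collection is installed: it feeds in an unclosed bold marker, an unterminated link, an unclosed verbatim block and a stray table cell, then confirms that the body still satisfies (listof xexpr?).

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require xml squicky)

;; Unclosed bold, unterminated link, unclosed verbatim block, stray cell.
(define garbled
  "**unclosed bold\n[[broken link|\n{{{\nno closing braces\n|= lonely cell\n")

;; Per the documented guarantee, this should not raise an error...
(define result (body (parse garbled)))

;; ...and every element should be a valid xexpr.
(printf "all xexprs? ~a\n" (andmap xexpr? result))
```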

2 Command line

To convert a wiki file to HTML from the command line, use

% racket -l squicky -- --html input.wiki >input.html

Give the option --help for a description of the other options.

3 Reference

procedure
(parse source) → wikitext/c
source : (or/c port? string?)

Parse the source into a wikitext object.

procedure
(wikitext? x) → boolean?
x : any/c

Returns #t if x is a parsed wikitext object.
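Since parse accepts either a string or an input port, the same text can come from a literal, a string port or a file. This sketch assumes the squicky collection is installed; parse-file is a hypothetical helper, not part of squicky.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require squicky)

;; parse accepts a string...
(define from-string (parse "== Heading\n\nA paragraph.\n"))

;; ...or an input port, such as a string port.
(define from-port
  (parse (open-input-string "== Heading\n\nA paragraph.\n")))

;; Hypothetical helper: parse a wiki file from disk via a file port.
(define (parse-file path)
  (call-with-input-file path parse))

(printf "~a ~a\n" (wikitext? from-string) (wikitext? from-port))
```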

procedure
(body wikitext) → (listof xml:xexpr?)
wikitext : wikitext/c

Extract the body of the document from the wikitext object. Each ‘block’ structure, such as a paragraph or a heading, produces a separate xexpr. This list of xexprs must be wrapped inside a further element before it can be processed by, for example, the xml module’s xexpr->xml function: (cons 'doc (body wikitext)) creates an xexpr representing a doc element containing the parsed content.
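The wrapping step looks like this in practice. This sketch assumes the squicky collection is installed and uses xexpr->string from Racket's xml library to serialise the wrapped document in one go. The doc element name comes from the description above and is arbitrary.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require xml squicky)

(define wt (parse "== Notes\n\nSome //italic// text.\n"))

;; body returns a list of block-level xexprs; wrap them in a single
;; element before handing them to the xml library.
(define doc-xexpr (cons 'doc (body wt)))

;; Serialise the wrapped document as one XML string.
(define doc-string (xexpr->string doc-xexpr))
(displayln doc-string)
```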

procedure
(lookup wikitext key) → (or/c string? false/c)
wikitext : wikitext/c
key : symbol?

Retrieve the metadata value corresponding to key, or #f if that key was not specified.

procedure
(lookup/multiple wikitext key) → (listof string?)
wikitext : wikitext/c
key : symbol?

Retrieve all the metadata values corresponding to key, or an empty list if there are none. If a metadata key appears several times in the input file, all of its values appear here, in order.
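For a key that is given more than once, lookup/multiple collects every value in input order. This is a sketch assuming the squicky collection is installed; the keyword key is an arbitrary example, not one squicky treats specially.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require squicky)

;; The same (arbitrary) key given twice, each at the start of a line.
(define wt
  (parse "::keyword racket\n\n::keyword wiki\n\nBody text.\n"))

(define all-keywords (lookup/multiple wt 'keyword)) ; values in input order
(define no-values (lookup/multiple wt 'absent))     ; '() for an unused key

(printf "~s\n" all-keywords)
```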

procedure
(lookup-parsed wikitext key) → any
wikitext : wikitext/c
key : symbol?

Like lookup, except that, depending on the key, the value is returned as a parsed object. See also lookup-value-parser.

procedure
(lookup-keys wikitext) → (listof symbol?)
wikitext : wikitext/c

Return the list of available keys.
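Combined with lookup, lookup-keys makes it easy to dump all of a document's metadata. A minimal sketch, assuming the squicky collection is installed:

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require squicky)

(define wt
  (parse "::title Notes\n\n::date 2010-09-01\n\nBody text.\n"))

;; Print every metadata key with its unparsed string value.
(for ([key (in-list (lookup-keys wt))])
  (printf "~a: ~a\n" key (lookup wt key)))
```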

procedure
(set-metadata! wikitext key value) → any
wikitext : wikitext/c
key : symbol?
value : string?

Set a metadata key to have the given value. This changes the value retrieved by lookup, but lookup/multiple returns this value together with any previous ones. There is currently no way of fully replacing a value.
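The interaction between set-metadata!, lookup and lookup/multiple can be seen in a short sketch, assuming the squicky collection is installed. The status key is an arbitrary example. The test only checks membership in the lookup/multiple result, because the order of old and new values is not specified above.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require squicky)

(define wt (parse "::status draft\n\nBody text.\n"))

;; Record a new value for an existing (arbitrary) key.
(set-metadata! wt 'status "published")

(define latest (lookup wt 'status))           ; the value just set
(define history (lookup/multiple wt 'status)) ; still includes "draft"

(printf "~a ~s\n" latest history)
```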

procedure
(squicky-version) → string?
(squicky-version with-repo-revision?) → string?
with-repo-revision? : boolean?

Returns a string giving the version of the squicky parser. If with-repo-revision? is true, then the output includes the identifier of the repository revision this represents.

3.1 Parsing metadata lookups

The default parsing function for lookup-parsed treats only the 'date and 'updated keys specially, returning their values as SRFI-19 date objects. The date parser is reasonably lenient: it recognises all of 2010-09-01; 2010-09-01T12:34:56; 2010 September 1; 1 Sep 2010; Sep 1, 2010; September 1, 2010; 1-09-2010; 1/9/2010; and 1 September 2010 as the same date. Note that nn/nn/nnnn dates are parsed as day-month-year, not month-day-year; ISO-8601-style formats are probably the most reliable in general.
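The leniency can be checked by feeding several of the spellings above through lookup-parsed and comparing the resulting SRFI-19 dates field by field. This sketch assumes the squicky collection is installed. Note that 1/9/2010 is included to show the day-month-year reading.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require squicky
         (prefix-in srfi19: srfi/19))

;; Some of the spellings listed above for 1 September 2010.
(define spellings
  '("2010-09-01" "2010 September 1" "1 Sep 2010" "1/9/2010"))

;; Parse each as a ::date line and extract year, month and day from
;; the SRFI-19 date returned by lookup-parsed.
(define (ymd spelling)
  (define d
    (lookup-parsed (parse (string-append "::date " spelling "\n")) 'date))
  (list (srfi19:date-year d) (srfi19:date-month d) (srfi19:date-day d)))

(define results (map ymd spellings))
(for-each writeln results)
```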

You may override this parsing with a parameter:

parameter
(lookup-value-parser) → (-> symbol? string? any/c)
(lookup-value-parser parser) → void?
parser : (-> symbol? string? any/c)

A parameter which evaluates to a parsing function for lookup-parsed. The function is given a key and a non-#f string, and may return anything, including #f. If, for a given key, lookup would return the value #f, then lookup-parsed returns #f. Otherwise, lookup-parsed returns the value of ((lookup-value-parser) key value). Thus, if the function does not recognise the key, it should return the value unchanged.
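As an illustration of overriding the parser, the sketch below handles a hypothetical tags key, holding a comma-separated list, and defers every other key to the parser that was installed before. It assumes the squicky collection is installed. The tags key and tags-aware-parser are illustrative names, not part of squicky.

```racket
#lang racket/base
;; Assumes the squicky collection is installed.
(require racket/string squicky)

;; Keep the current (default) parser so that other keys, including
;; 'date and 'updated, are still handled as before.
(define default-parser (lookup-value-parser))

;; Hypothetical 'tags key: split a comma-separated list into strings.
(define (tags-aware-parser key value)
  (case key
    [(tags) (map string-trim (string-split value ","))]
    [else (default-parser key value)]))

(define wt (parse "::tags wiki, racket, parsing\n\nBody text.\n"))

;; Install the custom parser only for the duration of this lookup.
(define tags
  (parameterize ([lookup-value-parser tags-aware-parser])
    (lookup-parsed wt 'tags)))
```

Delegating to the previously installed parser, rather than returning unrecognised values unchanged, preserves the built-in date handling for 'date and 'updated.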
