Squicky: a quick wiki parser
Norman Gray <http://nxg.me.uk>
This is a parser for a wiki syntax based closely on WikiCreole, as described below. The source repository is available at Bitbucket.
Version 1.1.2, released 2015 September 22
1 Usage
The dialect parsed here is the consensus WikiCreole syntax of http://www.wikicreole.org/. It handles all of the WikiCreole test cases, except for one test of wiki-internal links (which is in any case somewhat underspecified).
//italics//
**bold** : A line which begins with **, with possible whitespace either side, is a (second-level) bulleted list if the line before it is a bulleted list, but is a paragraph starting with bold text otherwise.
##monospaced text## : A line which begins with ##, with possible whitespace either side, is a (second-level) enumerated list if the line before it is an enumerated list, but is a paragraph starting with monospace text otherwise. [This is not specified in the WikiCreole definition, but is clearly compatible with it].
* bulleted list : (including sublists, the asterisk may or may not be indented)
# numbered list : (including sublists)
>quoted paragraph : including multiple levels (this appears to be an extension of WikiCreole).
[[link to wikipage]]
[[URL|description]]
{{image.png}} or {{image.png|alt text}} or {{image.png|att=value;att2=value; or more}}. In the last form (which is an extension to the Creole syntax), each att names an attribute of the HTML <img> element, such as class. The att must immediately follow the semicolon, so the example parses as att2=‘value; or more’. If the att is omitted, it defaults to alt.
== heading
=== subheading
==== subsubheading
line\\break
---- : (four dashes in a row, on a line by themselves) horizontal rule
~x : escaped character, and ~http://url which isn’t linked
{{{in-line literal text}}}
{{{
preformatted text
}}}
|=Heading Col 1 |=Heading Col 2 |
|Cell 1.1 |Two lines\\in Cell 1.2 |
|Cell 2.1 |Cell 2.2 |
::key value... : adds ‘metadata’, which can be retrieved with the lookup function. For example, after ::title Interesting things, the value of the ‘title’ key will be Interesting things. The :: must be at the beginning of a line.
"quoted" : corresponds to <q>quoted</q> (note that’s a double-quote character, not two single quotes).
<<element-name content>> : adds <element-name>content</element-name> to the output.
The att=value syntax for {{image.png}} is an extension.
!!target value ... : processing instruction support – adds a sexp equivalent to <?target value ...?> to the output
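The ::key metadata syntax above can be tried out with a short sketch such as the following. It assumes that parse accepts any input port (this page only shows it reading from (current-input-port)), so a string port is used here; the sample text is invented for illustration.

```racket
#lang racket/base
(require squicky)

;; A tiny document: one metadata line followed by a paragraph.
;; Assumption: `parse` accepts any input port, including a string port.
(define wt
  (parse (open-input-string
          "::title Interesting things\nSome **bold** text.\n")))

;; The metadata value recorded by the ::title line.
(lookup wt 'title)

;; The parsed content, as a list of xexprs.
(body wt)
```

The metadata line itself is not expected to appear in the body; only the paragraph should.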
For example, the following parses some input text and writes it out as XML.
(require xml
         squicky
         (prefix-in srfi19: srfi/19))

(define (write-html-to-port wiki-text port)
  (write-xml/content
   (xexpr->xml
    `(html ((xmlns "http://www.w3.org/1999/xhtml"))
           (head
            ,@(cond
                [(or (lookup-parsed wiki-text 'date)
                     (lookup-parsed wiki-text 'updated))
                 => (λ (date)
                      `((meta ((name "DC.date")
                               (content ,(srfi19:date->string date "~4"))))))]
                [else '()])
            (title ,(or (lookup wiki-text 'title) "Title")))
           (body ,@(body wiki-text))))
   port)
  (newline port))

(write-html-to-port (parse (current-input-port))
                    (current-output-port))
::date 2010 December 12
== Here is a heading
Here is some text, with a list comprising:
* one
* two.

That's quite //astonishing!//.
The parsing is intended to be tolerant. No matter how garbled the WikiCreole syntax, the parser should not produce an error, or a body which fails to satisfy the (listof xexpr?) contract.
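A quick way to probe this tolerance is to feed the parser deliberately broken markup and check the result against xexpr? from the xml library. This is only a sketch: the garbled input is invented, and parse is assumed to accept a string port.

```racket
#lang racket/base
(require xml squicky)

;; Deliberately malformed markup: unclosed bold, link, image and
;; preformatted block, plus a ragged table row.
(define garbled
  "**bold [[unclosed link\n{{image.png|\n|=broken |table\n{{{ never closed\n")

;; Assumption: `parse` accepts any input port, including a string port.
(define wt (parse (open-input-string garbled)))

;; If the parser is as tolerant as intended, every element of the
;; body is a valid xexpr and this expression is true.
(andmap xexpr? (body wt))
```

A false result or an exception here would indicate a parser bug rather than a problem with the input.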
2 Command line
% racket -l squicky -- --html input.wiki >input.html
Give the option --help for other instructions.
3 Reference
procedure
(body wikitext) → (listof xml:xexpr?)
wikitext : wikitext/c
procedure
(set-metadata! wikitext key value) → any
wikitext : wikitext/c
key : symbol?
value : string?
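As an illustration of the contract above, the following sketch adds a title to a parsed document that did not declare one. It assumes that a value stored with set-metadata! is afterwards visible to lookup, which this page does not state explicitly, and that parse accepts a string port.

```racket
#lang racket/base
(require squicky)

;; Assumption: `parse` accepts any input port, including a string port.
(define wt (parse (open-input-string "== A page without a title\n")))

;; The key is a symbol and the value a string, per the contract.
(set-metadata! wt 'title "Added afterwards")

;; Assumption: the stored value can now be retrieved with `lookup`.
(lookup wt 'title)
```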
procedure
(squicky-version) → string?
(squicky-version with-repo-revision?) → string?
with-repo-revision? : boolean?
3.1 Parsing metadata lookups
The default parsing function for lookup-parsed treats specially only the 'date and 'updated keys, which are returned as SRFI-19 date objects. The date parser is reasonably lenient, and detects all of 2010-09-01, 2010-09-01T12:34:56, 2010 September 1, 1 Sep 2010, Sep 1, 2010, September 1, 2010, 1-09-2010, 1/9/2010 and 1 September 2010 as the same date (that is, nn/nn/nnnn dates are parsed as day-month-year, not month-day-year; ISO-8601-style formats are probably the most reliable in general).
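The leniency described above can be checked with a sketch like the following, which parses several of the listed spellings and prints each as an ISO-8601 day using the SRFI-19 "~1" format directive. The helper date-of is hypothetical (not part of squicky), and parse is assumed to accept a string port.

```racket
#lang racket/base
(require squicky (prefix-in srfi19: srfi/19))

;; Hypothetical helper: parse a document consisting only of a ::date
;; line and return the date as a year-month-day string, or #f if no
;; date was recognised.
(define (date-of date-text)
  (define wt
    (parse (open-input-string (string-append "::date " date-text "\n"))))
  (define d (lookup-parsed wt 'date))
  (and d (srfi19:date->string d "~1")))

;; Each spelling is described above as denoting the same date.
(for ([s '("2010-09-01" "2010 September 1" "1 Sep 2010" "1/9/2010")])
  (printf "~a => ~a\n" s (date-of s)))
```

If the parser behaves as described, every line of output should show the same day, with 1/9/2010 read as 1 September rather than 9 January.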
You may override this parsing with a parameter: