txexpr: Tagged X-expressions

Matthew Butterick <mb@mbtype.com>

 (require txexpr) package: txexpr
 (require (submod txexpr safe))

A set of small but handy functions for improving the readability and reliability of programs that operate on tagged X-expressions (for short, txexprs).

1 Installation

At the command line:

raco pkg install txexpr

After that, you can update the package from the command line:

raco pkg update txexpr

2 Importing the module

The module can be imported in two modes: fast or safe.

Fast mode is the default, which you get by importing the module in the usual way: (require txexpr).

Safe mode enables the function contracts documented below. Use safe mode by importing the module as (require (submod txexpr safe)).
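
As a rough illustration of safe mode, the sketch below imports the safe submodule and traps the error raised when an accessor receives something that is not a txexpr. The helper name tag-or-false is hypothetical, and the sketch assumes only that a contract violation surfaces as an ordinary exn:fail exception.

```racket
#lang racket/base
(require (submod txexpr safe))

;; Hypothetical helper: return the tag of v, or #f if get-tag rejects v.
(define (tag-or-false v)
  (with-handlers ([exn:fail? (λ (e) #f)])
    (get-tag v)))

(tag-or-false '(div "Hello"))   ; a valid txexpr
(tag-or-false "not a txexpr")   ; rejected by the get-tag contract
```

In fast mode the same bad input is not checked against the documented contract, so any error you see comes from whatever the function happens to do internally.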

3 What’s a txexpr?

It’s an X-expression with the following grammar:

  txexpr = (list tag (list attr ...) element ...)
  | (cons tag (list element ...))
     
  tag = symbol?
     
  attr = (list key value)
     
  key = symbol?
     
  value = string?
     
  element = xexpr?

A txexpr is a list with a symbol in the first position — the tag — followed by a series of elements, which are other X-expressions. Optionally, a txexpr can have a list of attributes in the second position.

Examples:
> (txexpr? '(span "Brennan" "Dale"))

#t

> (txexpr? '(span "Brennan" (em "Richard") "Dale"))

#t

> (txexpr? '(span [[class "hidden"][id "names"]] "Brennan" "Dale"))

#t

> (txexpr? '(span lt gt amp))

#t

> (txexpr? '("We really" "should have" "a tag"))

#f

> (txexpr? '(span [[class not-quoted]] "Brennan"))

#f

> (txexpr? '(span [class "hidden"] "Brennan" "Dale"))

#t

The last one is a common mistake. Because the key–value pair is not enclosed in a list, it’s interpreted as a nested txexpr within the first txexpr, as you may not find out until you try to read its attributes:

Examples:
> (get-attrs '(span [class "hidden"] "Brennan" "Dale"))

'()

> (get-elements '(span [class "hidden"] "Brennan" "Dale"))

'((class "hidden") "Brennan" "Dale")

There’s no way of eliminating this ambiguity, short of always requiring an attribute list (empty if necessary) in your txexpr. See also xexpr-drop-empty-attributes.
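
If you do want an attribute list in every txexpr, one possible approach is to rebuild each expression from its parts. This is only a sketch: with-explicit-attrs is a hypothetical helper, not part of the module, and it handles the top level only, not nested txexprs.

```racket
#lang racket/base
(require txexpr)

;; Hypothetical helper: rebuild a txexpr so that the attribute list is
;; always present in the second position, even when it is empty.
(define (with-explicit-attrs tx)
  (list* (get-tag tx) (get-attrs tx) (get-elements tx)))

(with-explicit-attrs '(span "Brennan" "Dale"))
; => '(span () "Brennan" "Dale")
(with-explicit-attrs '(span [[class "hidden"]] "Brennan" "Dale"))
; => '(span ((class "hidden")) "Brennan" "Dale")
```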

Tagged X-expressions are most commonly found in HTML & XML documents. Though the notation is different in Racket, the data structure is identical:

Examples:
> (xexpr->string '(span [[id "names"]] "Brennan" (em "Richard") "Dale"))

"<span id=\"names\">Brennan<em>Richard</em>Dale</span>"

> (string->xexpr "<span id=\"names\">Brennan<em>Richard</em>Dale</span>")

'(span ((id "names")) "Brennan" (em () "Richard") "Dale")

After converting to and from HTML, we get back the original X-expression. Well, almost. The brackets turned into parentheses — no big deal, since they mean the same thing in Racket. Also, per its usual practice, string->xexpr added an empty attribute list after em. This is also benign.

4 Why not just use match, quasiquote, and so on?

If you prefer those, please do. But I’ve found two benefits to using module functions:

Readability. In code that already has a lot of matching and quasiquoting going on, these functions make it easy to see where & how txexprs are being used.

Reliability. Because txexprs come in two close but not quite equal forms, careful coders will always have to take both cases into account.

The programming is trivial, but the annoyance is real.
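
To make the annoyance concrete, here is a sketch of what reading attributes looks like with plain match. It uses only the standard library; the name attrs-via-match is hypothetical. Every function that touches attributes needs both clauses, which is the repetition that get-attrs hides.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical helper: read the attributes of a txexpr with plain match.
(define (attrs-via-match tx)
  (match tx
    ;; Form 1: tag, attribute list, then elements.
    [(list (? symbol?)
           (and attrs (list (list (? symbol?) (? string?)) ...))
           _ ...)
     attrs]
    ;; Form 2: tag followed directly by elements.
    [(list (? symbol?) _ ...) '()]))

(attrs-via-match '(span ((id "names")) "Brennan"))  ; => '((id "names"))
(attrs-via-match '(span "Brennan" "Dale"))          ; => '()
```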

5 Interface

procedure

(txexpr? v)  boolean?

  v : any/c

procedure

(txexpr-tag? v)  boolean?

  v : any/c

procedure

(txexpr-attr? v)  boolean?

  v : any/c

procedure

(txexpr-attr-key? v)  boolean?

  v : any/c

procedure

(txexpr-attr-value? v)  boolean?

  v : any/c

procedure

(txexpr-element? v)  boolean?

  v : any/c
Predicates for txexprs that implement this grammar:

  txexpr = (list tag (list attr ...) element ...)
  | (cons tag (list element ...))
     
  tag = symbol?
     
  attr = (list key value)
     
  key = symbol?
     
  value = string?
     
  element = xexpr?

procedure

(txexpr-tags? v)  boolean?

  v : any/c

procedure

(txexpr-attrs? v)  boolean?

  v : any/c

procedure

(txexpr-elements? v)  boolean?

  v : any/c
Predicates equivalent to a list of txexpr-tag?, txexpr-attr?, or txexpr-element?, respectively.

procedure

(validate-txexpr possible-txexpr)  txexpr?

  possible-txexpr : any/c
Like txexpr?, but raises a descriptive error if possible-txexpr is invalid, and otherwise returns possible-txexpr itself.

Examples:
> (validate-txexpr 'root)

validate-txexpr: 'root: not an X-expression

> (validate-txexpr '(root))

'(root)

> (validate-txexpr '(root ((id "top")(class 42))))

validate-txexpr-attrs: in '(root ((id "top") (class 42))),

'((id "top") (class 42)) is not a valid list of attributes

because '(class 42) is not in the form '(symbol "string")

> (validate-txexpr '(root ((id "top")(class "42"))))

'(root ((id "top") (class "42")))

> (validate-txexpr '(root ((id "top")(class "42")) ("hi")))

validate-txexpr-element: in '(root ((id "top") (class "42"))

("hi")), '("hi") is not a valid element (must be txexpr,

string, symbol, XML char, or cdata)

> (validate-txexpr '(root ((id "top")(class "42")) "hi"))

'(root ((id "top") (class "42")) "hi")
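
Because validate-txexpr signals invalid input by raising an error, code that wants a fallback has to catch it. A minimal sketch, assuming the error is an ordinary exn:fail exception; the helper name validate-or-false is hypothetical.

```racket
#lang racket/base
(require txexpr)

;; Hypothetical helper: return v if it is a valid txexpr, #f otherwise.
(define (validate-or-false v)
  (with-handlers ([exn:fail? (λ (e) #f)])
    (validate-txexpr v)))

(validate-or-false '(root ((id "top")) "hi"))  ; the txexpr itself
(validate-or-false 'root)                      ; invalid input
```

If you only need a yes-or-no answer, txexpr? does the same job without the exception machinery. The wrapper is useful mainly when you also want to log (exn-message e) before falling back.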

procedure

(can-be-txexpr-attr-key? v)  boolean?

  v : any/c

procedure

(can-be-txexpr-attr-value? v)  boolean?

  v : any/c
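Predicates for input arguments that can be trivially converted to an attribute key or value with the conversion functions ->txexpr-attr-key and ->txexpr-attr-value.

txexpr->values dissolves a txexpr into its components (tag, attributes, and elements) and returns all three as separate values.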

Examples:
> (txexpr->values '(div))

'div

'()

'()

> (txexpr->values '(div "Hello" (p "World")))

'div

'()

'("Hello" (p "World"))

> (txexpr->values '(div [[id "top"]] "Hello" (p "World")))

'div

'((id "top"))

'("Hello" (p "World"))

txexpr->list is like txexpr->values, but returns the three components in a list.

Examples:
> (txexpr->list '(div))

'(div () ())

> (txexpr->list '(div "Hello" (p "World")))

'(div () ("Hello" (p "World")))

> (txexpr->list '(div [[id "top"]] "Hello" (p "World")))

'(div ((id "top")) ("Hello" (p "World")))
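
Since txexpr->values returns multiple values, you typically bind them with define-values (or let-values). The sketch below takes a txexpr apart, swaps the tag, and reassembles it with txexpr. The variable names are illustrative only.

```racket
#lang racket/base
(require txexpr)

(define tx '(div [[id "top"]] "Hello" (p "World")))

;; Bind the three components in one step.
(define-values (tag attrs elements) (txexpr->values tx))

;; Reassemble with a different tag, keeping attributes and elements.
(define renamed (txexpr 'section attrs elements))
renamed  ; => '(section ((id "top")) "Hello" (p "World"))
```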

procedure

(xexpr->html x)  string?

  x : xexpr?
Convert x to an HTML string. Unlike xexpr->string, it follows the HTML spec by not escaping text that appears within script or style blocks. For convenience, this function will take any X-expression, not just tagged X-expressions.

Examples:
> (define tx '(root (script "3 > 2") "Why is 3 > 2?"))
> (xexpr->string tx)

"<root><script>3 &gt; 2</script>Why is 3 &gt; 2?</root>"

> (xexpr->html tx)

"<root><script>3 > 2</script>Why is 3 &gt; 2?</root>"

> (map xexpr->html (list "string" 'entity 65))

'("string" "&entity;" "&#65;")

procedure

(get-tag tx)  txexpr-tag?

  tx : txexpr?

procedure

(get-attrs tx)  txexpr-attrs?

  tx : txexpr?

procedure

(get-elements tx)  (listof txexpr-element?)

  tx : txexpr?
Accessor functions for the individual pieces of a txexpr.

Examples:
> (get-tag '(div [[id "top"]] "Hello" (p "World")))

'div

> (get-attrs '(div [[id "top"]] "Hello" (p "World")))

'((id "top"))

> (get-elements '(div [[id "top"]] "Hello" (p "World")))

'("Hello" (p "World"))

procedure

(txexpr tag [attrs elements])  txexpr?

  tag : txexpr-tag?
  attrs : txexpr-attrs? = empty
  elements : txexpr-elements? = empty
Assemble a txexpr from its parts. If you don’t have attributes, but you do have elements, you’ll need to pass empty as the second argument. Note that unlike xml->xexpr, if the attribute list is empty, it’s not included in the resulting expression.

Examples:
> (txexpr 'div)

'(div)

> (txexpr 'div '() '("Hello" (p "World")))

'(div "Hello" (p "World"))

> (txexpr 'div '[[id "top"]])

'(div ((id "top")))

> (txexpr 'div '[[id "top"]] '("Hello" (p "World")))

'(div ((id "top")) "Hello" (p "World"))

> (define tx '(div [[id "top"]] "Hello" (p "World")))
> (txexpr (get-tag tx)
  (get-attrs tx) (get-elements tx))

'(div ((id "top")) "Hello" (p "World"))
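
The constructor is handy inside small tag-building functions, where it saves you from deciding by hand whether to emit an attribute list. A sketch with two hypothetical helpers, link and para:

```racket
#lang racket/base
(require txexpr)

;; Hypothetical helper: build an anchor with an href attribute.
(define (link url text)
  (txexpr 'a (list (list 'href url)) (list text)))

;; Hypothetical helper: build a paragraph with no attributes.
(define (para . elements)
  (txexpr 'p '() elements))

(link "https://example.com" "Example")
; => '(a ((href "https://example.com")) "Example")
(para "See " (link "https://example.com" "this"))
; => '(p "See " (a ((href "https://example.com")) "this"))
```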

procedure

(make-txexpr tag [attrs elements])  txexpr?

  tag : txexpr-tag?
  attrs : txexpr-attrs? = empty
  elements : txexpr-elements? = empty
Alternate name for txexpr.

procedure

(can-be-txexpr-attrs? v)  boolean?

  v : any/c
Predicate for functions that handle txexpr-attrs. Covers values that are easily converted into pairs of attr-key and attr-value. Namely: a single txexpr-attr, a list of txexpr-attrs (i.e., what you get from get-attrs), or interleaved symbols and strings (each symbol and the string after it will be combined into a single txexpr-attr).

procedure

(attrs->hash x ...)  hash-eq?

  x : can-be-txexpr-attrs?

procedure

(hash->attrs h)  txexpr-attrs?

  h : hash?
Convert attrs to an immutable hash, and back again.

Examples:
> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (attrs->hash (get-attrs tx))

'#hasheq((class . "red") (id . "top"))

> (hash->attrs '#hasheq((class . "red") (id . "top")))

'((class "red") (id "top"))
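
Once attributes are in a hash, the usual hash functions apply. The sketch below looks up keys with a default and adds a key before converting back. The variable names are illustrative. No particular ordering of the attributes returned by hash->attrs is assumed.

```racket
#lang racket/base
(require txexpr)

(define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
(define attrs-hash (attrs->hash (get-attrs tx)))

;; hash-ref with a default does not raise for a missing key.
(define class-value (hash-ref attrs-hash 'class #f))
(define title-value (hash-ref attrs-hash 'title #f))

;; The hash is immutable, so hash-set returns an updated copy.
(define new-attrs (hash->attrs (hash-set attrs-hash 'title "Greeting")))
(define new-tx (txexpr (get-tag tx) new-attrs (get-elements tx)))
```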

procedure

(attrs-have-key? attrs key)  boolean?

  attrs : (or/c txexpr-attrs? txexpr?)
  key : can-be-txexpr-attr-key?
Returns #t if the attrs contain a value for the given key, #f otherwise.

Examples:
> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (attrs-have-key? tx 'id)

#t

> (attrs-have-key? tx 'grackle)

#f

procedure

(attrs-equal? attrs other-attrs)  boolean?

  attrs : (or/c txexpr-attrs? txexpr?)
  other-attrs : (or/c txexpr-attrs? txexpr?)
Returns #t if attrs and other-attrs contain the same keys and values, #f otherwise. The order of attributes is irrelevant.

Examples:
> (define tx1 '(div [[id "top"][class "red"]] "Hello"))
> (define tx2 '(p [[class "red"][id "top"]] "Hello"))
> (define tx3 '(p [[id "bottom"][class "red"]] "Hello"))
> (attrs-equal? tx1 tx2)

#t

> (attrs-equal? tx1 tx3)

#f

procedure

(attr-ref tx key)  can-be-txexpr-attr-value?

  tx : txexpr?
  key : can-be-txexpr-attr-key?
Given a key, look up the corresponding value in the attributes of a txexpr. Asking for a nonexistent key produces an error.

Examples:
> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (attr-ref tx 'class)

"red"

> (attr-ref tx 'id)

"top"

> (attr-ref tx 'nonexistent-key)

attr-ref: no value found for key 'nonexistent-key
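
When a missing key should not be an error, you can check for the key first. A minimal sketch built only from attrs-have-key? and attr-ref as documented here; the helper name attr-ref/default is hypothetical.

```racket
#lang racket/base
(require txexpr)

;; Hypothetical helper: look up key in tx, or return default if absent.
(define (attr-ref/default tx key default)
  (if (attrs-have-key? tx key)
      (attr-ref tx key)
      default))

(define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
(attr-ref/default tx 'class "none")            ; => "red"
(attr-ref/default tx 'nonexistent-key "none")  ; => "none"
```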

procedure

(attr-ref* tx key)  (listof can-be-txexpr-attr-value?)

  tx : txexpr?
  key : can-be-txexpr-attr-key?
Like attr-ref, but returns a recursively gathered list of all the values for that key within tx. Asking for a nonexistent key produces null.

Examples:
> (define tx '(div [[class "red"]] "Hello" (em ([class "blue"]) "world")))
> (attr-ref* tx 'class)

'("red" "blue")

> (attr-ref* tx 'nonexistent-key)

'()
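
A typical use is surveying a whole document for the values of one attribute. The sketch below collects the distinct class names; it sorts the result so that it does not depend on the order in which attr-ref* gathers values. The name classes-used is hypothetical.

```racket
#lang racket/base
(require racket/list txexpr)

;; Hypothetical helper: sorted list of distinct class values within tx.
(define (classes-used tx)
  (sort (remove-duplicates (attr-ref* tx 'class)) string<?))

(define tx '(div [[class "red"]] "Hello"
                 (em [[class "blue"]] "world")
                 (p [[class "red"]] "again")))
(classes-used tx)
```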

procedure

(attr-set tx key value)  txexpr?

  tx : txexpr?
  key : can-be-txexpr-attr-key?
  value : can-be-txexpr-attr-value?
Given a txexpr, set the value of attribute key to value. Return the updated txexpr.

Examples:
> (define tx '(div [[class "red"][id "top"]] "Hello" (p "World")))
> (attr-set tx 'id "bottom")

'(div ((class "red") (id "bottom")) "Hello" (p "World"))

> (attr-set tx 'class "blue")

'(div ((class "blue") (id "top")) "Hello" (p "World"))

> (attr-set (attr-set tx 'id "bottom") 'class "blue")

'(div ((class "blue") (id "bottom")) "Hello" (p "World"))

procedure

(attr-set* tx key value ... ...)  txexpr?

  tx : txexpr?
  key : can-be-txexpr-attr-key?
  value : can-be-txexpr-attr-value?
Like attr-set, but accepts any number of keys and values.

Examples:
> (define tx '(div "Hello"))
> (attr-set* tx 'id "bottom" 'class "blue")

'(div ((class "blue") (id "bottom")) "Hello")

procedure

(attr-join tx key value)  txexpr?

  tx : txexpr?
  key : can-be-txexpr-attr-key?
  value : can-be-txexpr-attr-value?
Given a txexpr, append value to the existing value of attribute key, separated by a space. Return the updated txexpr.

Examples:
> (define tx '(div [[class "red"]] "Hello"))
> (attr-join tx 'class "small")

'(div ((class "red small")) "Hello")

procedure

(merge-attrs attrs ...)  txexpr-attrs?

  attrs : (listof can-be-txexpr-attrs?)
Combine a series of attributes into a single txexpr-attrs item. This function addresses three annoyances that surface in working with txexpr attributes.

  1. You can pass the attributes in multiple forms. See can-be-txexpr-attrs? for further details.

  2. Attributes with the same name are merged, with the later value taking precedence (i.e., hash behavior).

  3. Attributes are sorted in alphabetical order.

Examples:
> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (define tx-attrs (get-attrs tx))
> tx-attrs

'((id "top") (class "red"))

> (merge-attrs tx-attrs 'editable "true")

'((class "red") (editable "true") (id "top"))

> (merge-attrs tx-attrs 'id "override-value")

'((class "red") (id "override-value"))

> (define my-attr '(id "another-override"))
> (merge-attrs tx-attrs my-attr)

'((class "red") (id "another-override"))

> (merge-attrs my-attr tx-attrs)

'((class "red") (id "top"))

procedure

(remove-attrs tx)  txexpr?

  tx : txexpr?
Recursively remove all attributes.

Examples:
> (define tx '(div [[id "top"]] "Hello" (p [[id "lower"]] "World")))
> (remove-attrs tx)

'(div "Hello" (p "World"))

procedure

(map-elements proc tx)  txexpr?

  proc : procedure?
  tx : txexpr?
Recursively apply proc to all elements, leaving tags and attributes alone. Using plain map will only process elements at the top level of the current txexpr. Usually that’s not what you want.

Examples:
> (define tx '(div "Hello!" (p "Welcome to" (strong "Mars"))))
> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map upcaser tx)

'(div "HELLO!" (p "Welcome to" (strong "Mars")))

> (map-elements upcaser tx)

'(div "HELLO!" (p "WELCOME TO" (strong "MARS")))

In practice, most xexpr-elements are strings. But woe befalls those who pass string procedures to map-elements, because an xexpr-element can be any kind of xexpr?, and an xexpr? is not necessarily a string.

Examples:
> (define tx '(p "Welcome to" (strong "Mars" amp "Sons")))
> (map-elements string-upcase tx)

string-upcase: contract violation

  expected: string?

  given: 'amp

> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map-elements upcaser tx)

'(p "WELCOME TO" (strong "MARS" amp "SONS"))

procedure

(map-elements/exclude proc tx exclude-test)  txexpr?

  proc : procedure?
  tx : txexpr?
  exclude-test : (txexpr? . -> . boolean?)
Like map-elements, but skips any txexprs that evaluate to #t under exclude-test. The exclude-test gets a whole txexpr as input, so it can test any of its parts.

Examples:
> (define tx '(div "Hello!" (p "Welcome to" (strong "Mars"))))
> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map-elements upcaser tx)

'(div "HELLO!" (p "WELCOME TO" (strong "MARS")))

> (map-elements/exclude upcaser tx (λ(x) (equal? (get-tag x) 'strong)))

'(div "HELLO!" (p "WELCOME TO" (strong "Mars")))

Be careful with the wider consequences of exclusion tests. When exclude-test is true, the txexpr is excluded, but so is everything underneath that txexpr. In other words, there is no way to re-include (un-exclude?) elements nested under an excluded element.

Examples:
> (define tx '(div "Hello!" (p "Welcome to" (strong "Mars"))))
> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map-elements upcaser tx)

'(div "HELLO!" (p "WELCOME TO" (strong "MARS")))

> (map-elements/exclude upcaser tx (λ(x) (equal? (get-tag x) 'p)))

'(div "HELLO!" (p "Welcome to" (strong "Mars")))

> (map-elements/exclude upcaser tx (λ(x) (equal? (get-tag x) 'div)))

'(div "Hello!" (p "Welcome to" (strong "Mars")))
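
Sometimes excluding a whole subtree is exactly what you want, for instance to leave verbatim material untouched. A sketch, with the hypothetical predicate verbatim? treating pre and code as off-limits:

```racket
#lang racket/base
(require txexpr)

(define upcaser (λ (x) (if (string? x) (string-upcase x) x)))

;; Hypothetical exclusion test: #t for tags whose contents should be
;; left alone. memq returns a list, so coerce the result to a boolean.
(define (verbatim? tx)
  (and (memq (get-tag tx) '(pre code)) #t))

(define doc '(div "Hello" (pre "raw text") (p "World" (code "x"))))
(map-elements/exclude upcaser doc verbatim?)
```

Here the strings directly under div and p are upcased, while everything inside pre and code passes through unchanged.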

procedure

(splitf-txexpr tx pred [replace-proc])  txexpr? (listof txexpr-element?)
  tx : txexpr?
  pred : procedure?
  replace-proc : procedure? = (λ(x) null)
Recursively descend through tx and extract all elements that match pred. Returns two values: a txexpr with the matching elements removed, and the list of matching elements. Sort of esoteric, but I’ve needed it more than once, so here it is.

Examples:
> (define tx '(div "Wonderful day" (meta "weather" "good") "for a walk"))
> (define is-meta? (λ(x) (and (txexpr? x) (equal? 'meta (get-tag x)))))
> (splitf-txexpr tx is-meta?)

'(div "Wonderful day" "for a walk")

'((meta "weather" "good"))

Ordinarily, the result of the split operation is to remove the elements that match pred. But you can change this behavior with the optional replace-proc argument.

Examples:
> (define tx '(div "Wonderful day" (meta "weather" "good") "for a walk"))
> (define is-meta? (λ(x) (and (txexpr? x) (equal? 'meta (get-tag x)))))
> (define replace-meta (λ(x) '(em "meta was here")))
> (splitf-txexpr tx is-meta? replace-meta)

'(div "Wonderful day" (em "meta was here") "for a walk")

'((meta "weather" "good"))
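
Because splitf-txexpr returns two values, you will usually capture them with define-values. The sketch below separates meta elements from the body and turns them into a hash. It assumes each meta holds exactly two string elements (a key and a value); the names body, metas, and meta-hash are illustrative.

```racket
#lang racket/base
(require txexpr)

(define tx '(div "Wonderful day" (meta "weather" "good")
                 "for a walk" (meta "dog" "Roxy")))
(define is-meta? (λ (x) (and (txexpr? x) (equal? 'meta (get-tag x)))))

;; First value: tx without the metas. Second value: the metas themselves.
(define-values (body metas) (splitf-txexpr tx is-meta?))

;; Assumes every meta looks like (meta "key" "value").
(define meta-hash
  (for/hash ([m (in-list metas)])
    (define es (get-elements m))
    (values (car es) (cadr es))))
```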

procedure

(findf*-txexpr tx pred)  (or/c #f (listof txexpr-element?))

  tx : txexpr?
  pred : procedure?

procedure

(findf-txexpr tx pred)  (or/c #f txexpr-element?)

  tx : txexpr?
  pred : procedure?
Like splitf-txexpr, but only retrieve the elements that match pred. findf*-txexpr retrieves all results; findf-txexpr only the first. In both cases, if there are no matches, you get #f.

Examples:
> (define tx '(div "Wonderful day" (meta "weather" "good")
                   "for a walk" (meta "dog" "Roxy")))
> (define is-meta? (λ(x) (and (txexpr? x) (eq? 'meta (get-tag x)))))
> (findf*-txexpr tx is-meta?)

'((meta "weather" "good") (meta "dog" "Roxy"))

> (findf-txexpr tx is-meta?)

'(meta "weather" "good")

> (define is-zimzam? (λ(x) (and (txexpr? x) (eq? 'zimzam (get-tag x)))))
> (findf*-txexpr tx is-zimzam?)

#f

> (findf-txexpr tx is-zimzam?)

#f
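
The #f result makes it easy to supply a fallback. A sketch of a hypothetical page-title helper that returns the first element of the first h1 it finds, or a default string when there is no usable h1:

```racket
#lang racket/base
(require txexpr)

(define (h1? x) (and (txexpr? x) (eq? (get-tag x) 'h1)))

;; Hypothetical helper: first element of the first h1, else "Untitled".
(define (page-title tx)
  (define heading (findf-txexpr tx h1?))
  (if (and heading (pair? (get-elements heading)))
      (car (get-elements heading))
      "Untitled"))

(page-title '(root (h1 "Hello") (p "World")))  ; => "Hello"
(page-title '(root (p "World")))               ; => "Untitled"
```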

procedure

(check-txexprs-equal? tx1 tx2)  void?

  tx1 : txexpr?
  tx2 : txexpr?
Designed to be used with rackunit. Check whether tx1 and tx2 are equal? except for ordering of attributes (which ordinarily has no semantic significance). Return void if so, otherwise raise a check failure.

Examples:
> (define tx1 '(div ((attr-a "foo")(attr-z "bar"))))
> (define tx2 '(div ((attr-z "bar")(attr-a "foo"))))
> (parameterize ([current-check-handler (λ _ (display "not "))])
    (display "txexprs are ")
    (check-txexprs-equal? tx1 tx2)
    (displayln "equal"))

txexprs are equal

If ordering of attributes is relevant to your test, then just use check-equal? as usual.

Examples:
> (define tx1 '(div ((attr-a "foo")(attr-z "bar"))))
> (define tx2 '(div ((attr-z "bar")(attr-a "foo"))))
> (parameterize ([current-check-handler (λ _ (display "not "))])
    (display "txexprs are ")
    (check-equal? tx1 tx2)
    (displayln "equal"))

txexprs are not equal
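
In an ordinary test file you would call the check directly, without the parameterize wrapper used above for display purposes. A sketch that tests the result of attr-set* without depending on the order in which it emits attributes:

```racket
#lang racket/base
(require rackunit txexpr)

(define built (attr-set* '(div "Hello") 'id "top" 'class "red"))

;; Passes whichever order the attributes come back in.
(check-txexprs-equal? built '(div ((id "top") (class "red")) "Hello"))
```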

6 License & source code

This module is licensed under the LGPL.

Source repository at http://github.com/mbutterick/txexpr. Suggestions & corrections welcome.