Functional Relational Algebra
(require fra) | package: fra |
This package provides a purely functional implementation of Relational Algebra in Racket.
1 Examples
In this example, we will build a relational database for a university grade book. First, we define a database with a few relations:
(define GradebookDB (Database [Students [StudentId Name Course] (0 "Jonah" 'CS330) (1 "Aidan" 'CS142) (2 "Sarah" 'CS630)] [Projects [ProjectId Course Title] (0 'CS330 "Garbage Collectors") (1 'CS330 "Prolog") (2 'CS142 "UFO") (3 'CS630 "Theorem Prover")] [Grades [StudentId ProjectId Grade] [0 0 2/3] [1 2 99] [2 3 -inf.0]]))
At this point GradebookDB is bound to a value obeying database/c that contains three relations: 'Students, 'Projects, and 'Grades. The first S-expression after each relation identifier is the schema/c for that relation. Each S-expression after that is a Tuple that is in the relation. As you can see, any Scheme value can be included in a tuple.
Let’s do some queries!
(require racket/set) (define (print-relation r) (for ([c (in-list (relation-schema r))]) (printf "~a\t" c)) (printf "~n") (for ([t (in-set (relation-tuples r))]) (for ([i (in-range 0 (tuple-length t))]) (printf "~a\t" (tuple-ref t i))) (printf "~n")))
> (with-database GradebookDB (print-relation (execute-query (query-relation 'Students))))
StudentId Name Course
0 Jonah CS330
1 Aidan CS142
2 Sarah CS630
As you can see from this interaction, a relation is just a set of tuples, which are a vector-like abstraction. Now for some more interesting queries:
(define (>30 n) (> 30 n))
> (with-database GradebookDB (print-relation (execute-query (query-selection (Proposition (>30 Grade)) (query-relation 'Grades)))))
StudentId ProjectId Grade
0 0 2/3
2 3 -inf.0
Proposition can be any Scheme value that may be applied.
Suppose that we attempted to use that proposition on a relation that did not have 'Grade in its schema?
> (with-database GradebookDB (query-selection (Proposition (>30 Grade)) (query-relation 'Students))) query-selection: Proposition must refer to subset of query's
attributes: >30((Grade))
As you can see, the error is detected before the query is ever run.
Now, let’s have some joins:
> (with-database GradebookDB (print-relation (execute-query (query-rename 'Title 'Project (query-projection '(Name Course Title Grade) (query-natural-join (query-relation 'Projects) (query-natural-join (query-relation 'Grades) (query-relation 'Students))))))))
Name Course Project Grade
Jonah CS330 Garbage Collectors 2/3
Aidan CS142 UFO 99
Sarah CS630 Theorem Prover -inf.0
Finally, some functional database modification:
> (with-database GradebookDB (print-relation (execute-query (query-relation 'Students))))
StudentId Name Course
0 Jonah CS330
1 Aidan CS142
2 Sarah CS630
> (define GradebookDB+1 (database-insert GradebookDB 'Students (Tuple 3 "Omega" (lambda () ((lambda (x) (x x)) (lambda (x) (x x)))))))
> (with-database GradebookDB+1 (print-relation (execute-query (query-relation 'Students))))
StudentId Name Course
0 Jonah CS330
1 Aidan CS142
2 Sarah CS630
3 Omega #<procedure>
> (define GradebookDB+2 (database-delete GradebookDB+1 'Students (Tuple 0 "Jonah" 'CS330)))
> (with-database GradebookDB+2 (print-relation (execute-query (query-relation 'Students))))
StudentId Name Course
1 Aidan CS142
2 Sarah CS630
3 Omega #<procedure>
2 API
This section documents the APIs of the package.
2.1 Schemas
Schemas are defined as lists of symbols.
value
schema/c : contract?
2.2 Propositions
Propositions are used by query-selection to compute sub-relations.
procedure
(make-prop:or lhs rhs) → prop?
lhs : prop? rhs : prop?
procedure
(make-prop:and lhs rhs) → prop?
lhs : prop? rhs : prop?
procedure
(make-prop:not p) → prop?
p : prop?
procedure
(make-prop:op op cols) → prop?
op : procedure? cols : (listof symbol?)
Propositions constructors.
syntax
(Proposition p)
p = (not p) | (and p p) | (or p p) | (proc attribute ...)
proc : procedure?
attribute : identifier?
2.3 Queries
Queries are used by execute-query to run relational queries.
value
database-schema/c : contract?
value
current-database-schema : (parameter/c database-schema/c)
procedure
(query-schema q) → schema/c
q : query?
procedure
(query-relation rid) → query?
rid : symbol?
procedure
(query-union q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-difference q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-intersection q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-product q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-projection s q) → query?
s : schema/c q : query?
procedure
(query-selection p q) → query?
p : prop? q : query?
procedure
(query-rename old-attr new-attr q) → query?
old-attr : symbol? new-attr : symbol? q : query?
procedure
(query-rename* renaming q) → query?
renaming : (hash/c symbol? symbol?) q : query?
procedure
(query-natural-join q1 q2 [equal?]) → query?
q1 : query? q2 : query? equal? : (any/c any/c . -> . boolean?) = equal?
procedure
(query-theta-join p q1 q2) → query?
p : prop? q1 : query? q2 : query?
procedure
(query-semi-join q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-anti-join q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-division q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-left-outer-join q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-right-outer-join q1 q2) → query?
q1 : query? q2 : query?
procedure
(query-outer-join q1 q2) → query?
q1 : query? q2 : query?
query-rename* applies multiple renamings at once using the dictionary to map old names to new names.
query-natural-join takes an optional equal? argument to compare attribute values for equality.
2.4 Tuples
Tuples are the contents of relations.
procedure
t : tuple?
procedure
(tuple-ref t i) → any/c
t : tuple? i : exact-nonnegative-integer?
syntax
(Tuple elem ...)
elem : any/c
2.5 Relations
Relations are the contents of databases and the results of queries.
procedure
(relation-schema r) → schema/c
r : relation?
procedure
(relation-tuples r) → (set? tuple?)
r : relation?
syntax
(Relation [attribute ...] (elem ...) ...)
attribute : identifier?
elem : any/c
2.6 Database
value
database/c : contract?
procedure
(database-insert db rid t) → database/c
db : database/c rid : symbol? t : tuple?
procedure
(database-delete db rid t) → database/c
db : database/c rid : symbol? t : tuple?
procedure
(call-with-database db thnk) → any
db : database/c thnk : (-> any)
syntax
(with-database db e ...)
db : database/c
procedure
(execute-query q) → relation?
q : query?
value
NULL : any/c
syntax
(Database [relation [attribute ...] (elem ...) ...] ...)
relation : identifier?
attribute : identifier?
elem : any/c
3 Implementation Notes
The current implementation uses immutable hashes as relations, vectors as tuples (except that they can be efficient appended), and lists as schemas. Propositions are structures, but are compiled to procedures (with attribute identifiers resolved to tuple positions) prior to query execution.
execute-query uses a simple query optimizer. It has two passes: first it tries to push selection operations to the leaves of the query to reduce relation sizes prior to products, then it pulls selection operations towards the root (but not passed products) to reduce the number of iterations over all the elements of a tuple. During both passes, some simplifications are performed, such as merging adjacent renamings, projections, and identical (or trivial) selections. This optimization happens independent of any statistics about relation sizes, etc.