LURQL

LURQL is a specialized language for querying metadata via JMI. As input, a LURQL query takes a JMI object extent and its corresponding metamodel, and as output produces references to the JMI objects which match the query specification. LURQL has several special features which make it the right tool for the job: By contrast, SQL is excellent for querying relational data, but not for querying the complex object hierarchies and associations typical of metadata. (Recent additions of object types and recursive queries to SQL have improved the expressiveness, but optimizer technology hasn't caught up enough yet to make usage practical, and metamodel-guided evaluation is still completely lacking.) In the other corner, XQuery is good for semi-structured hierarchical data, and has some support for recursion, but is not appropriate for metadata (which has rich graph structure and a well-defined metamodel). However, a special-purpose language like LURQL is necessarily limited in what it can do, and there is much room for improvement.

This document presents the syntax and semantics of LURQL along with examples of how to use it. It assumes familiarity with UML.

Example Model

As our example metamodel, we use the Common Warehouse Metamodel (CWM), which defines standard relational objects such as Schema, Table, View, and Column using UML concepts such as Namespace, Class, and Attribute as a base. More details on Farrago's usage of this model are provided in the Farrago model documentation.

As sample metadata to be retrieved (instantiating the above metamodel), we use the standard Farrago example schema. Here's a UML object diagram for the partial definition of one of the tables (including its containers, columns, and their datatypes):

Simple Queries

Here's our first LURQL query, which finds a table by name together with all of its columns:

select * 
from class Table 
where name = 'DEPTS' 
then (
    follow association ClassifierFeature
);

EXECUTION RESULT:
LocalTable: DEPTS
StoredColumn: DEPTNO
StoredColumn: NAME
As you can see, LURQL uses a SQL-ish select/from/where structure, but the resemblance is only superficial. LURQL queries have a mapping to the directed graph structure of the metamodel of interest. In this case, we are starting from the extent of all tables, filtering down to just those tables whose name matches the string literal 'DEPTS', and then following the association ClassifierFeature to pick up the corresponding columns. The result set is a collection of JMI objects (one table and two columns); we only display the object types and names in this document, but the actual return value is a reference to the live object in the repository (from which all of its attributes and links are accessible). Note that the actual type returned for DEPTS (LocalTable) is a subtype of the requested CWM type (Table); LURQL automatically includes all subtypes in its search.

Unlike SQL, the * in the select clause does not project attributes; instead, it projects entire nodes of the query graph. For example, if we only wanted to retrieve the columns:


select c
from class Table as t where name='DEPTS' then (
    follow association ClassifierFeature as c
);

EXECUTION RESULT:
StoredColumn: DEPTNO
StoredColumn: NAME
The as clause can be used to apply a label to each from or follow clause. In this case the label "t" is unused and is only specified as a syntax example. For follow clauses, the label is applied to the set of classes reached via the association (or associations) traversed (StoredColumn for c).

Follow clauses can be chained to arbitrary depth. For example, to retrieve the schema and table containing column EMPNO:


select s, t
from class Column where name='EMPNO' then (
    follow association ClassifierFeature as t then (
        follow association ElementOwnership backward as s
    )
);

EXECUTION RESULT:
LocalSchema: SALES
LocalTable: EMPS
LocalTable: TEMPS
View: EMPSVIEW
View: TEMPSVIEW
Wait, where did those extra objects come from? It turns out that there are four different objects in schema SALES with columns named EMPNO: two tables (EMPS and TEMPS) and two views (EMPSVIEW and TEMPSVIEW). LURQL followed all of these paths from the root column object EMPNO, and then merged all of them at the single parent schema (duplicates are always removed from the final result set).

Note that the query above specified the backward qualifier on the traversal of association ElementOwnership, which is what takes us from table to containing schema. The reason is that the ElementOwnership association also links tables to other objects such as constraints and dependencies. LURQL imposes direction on the underlying metamodel as follows:

In this case, we happen to know that the only traversal we are interested in follows the backward direction. As contrast, here are the query results without the directional qualifier:

select s, t
from class Column where name='EMPNO' then (
    follow association ClassifierFeature as t then (
        follow association ElementOwnership as s
    )
);

EXECUTION RESULT:
Dependency: EMPS$DEP
Dependency: EMPSVIEW$DEP
Dependency: TEMPS$DEP
Dependency: TEMPSVIEW$DEP
LocalSchema: SALES
LocalTable: EMPS
LocalTable: TEMPS
PrimaryKeyConstraint: SYS$PRIMARY_KEY
PrimaryKeyConstraint: SYS$PRIMARY_KEY
UniqueKeyConstraint: SYS$UNIQUE_KEY$EMPID
View: EMPSVIEW
View: TEMPSVIEW
Direction is not the only qualifier available for filtering association traversals. The others are covered in some of the later examples as well as the formal syntax.

Divergence and Convergence

In the previous example, we saw how LURQL can follow many different links in parallel, all matching a single follow clause. It is also possible to explicitly specify more than one follow branching out from the same origin along different associations. For example, suppose we'd like to find the datatype and containing tables of a column:

select t, d
from class Column where name='CITY' then (
    follow destination class Table as t
    union
    follow destination class SQLDataType as d
);

EXECUTION RESULT:
LocalTable: EMPS
LocalTable: TEMPS
SQLSimpleType: VARCHAR
The keyword union is used because the final results are based on the two paths combined (with duplicates removed as always). Intersect and other set operations are not currently supported. The follow qualifier used here is based on the class (Table or SQLDataType) reached rather than the association. The destination qualifier means filter based on the class reached rather than the starting class (the origin qualifier provides this alternative).

The example above shows diverging paths; it is also possible for paths to converge and then carry on together. Suppose we'd like to query the union of the columns of view EMPSVIEW and table DEPTS in schema SALES:


select c
from class Schema where name='SALES' then (
    follow destination class Table where name='DEPTS' 
    union
    follow destination class View where name='EMPSVIEW'
) gather then (
    follow destination class Column as c
);

EXECUTION RESULT:
StoredColumn: DEPTNO
StoredColumn: NAME
ViewColumn: EMPNO
ViewColumn: NAME
The gather clause combines the leaves of the unioned paths and then applies the last then clause to that combination. Schematically:

In some cases, a query may need to contain excursions; the gather with parent variation can be used for this purpose. For example, suppose we want to query the columns of a view together with all columns of directly underlying tables:


select c, t
from class View where name='JOINVIEW' then (
    follow association ElementOwnership destination class Dependency then (
        follow destination end supplier destination class Table as t
    )
) gather with parent then (
    follow association ClassifierFeature as c
);

EXECUTION RESULT:
LocalTable: DEPTS
LocalTable: EMPS
StoredColumn: AGE
StoredColumn: CITY
StoredColumn: DEPTNO
StoredColumn: DEPTNO
StoredColumn: EMPID
StoredColumn: EMPNO
StoredColumn: GENDER
StoredColumn: MANAGER
StoredColumn: NAME
StoredColumn: NAME
StoredColumn: PUBLIC_KEY
StoredColumn: SLACKER
ViewColumn: DNAME
ViewColumn: ENAME
The first follow chain defines the view dependency traversal (and demonstrates association filtering via end names). It is an "excursion" in the sense that regardless of what it finds, we also want to include the original view (the "parent") as a source for the final follow after the gather with parent. Here's the corresponding query graph:

Recursion

The examples so far have contained query graphs which match the structure of the underlying metamodel exactly. If the structure is complex, this may be cumbersome (consider enumerating all of the parts of an object such as a table). And in some cases, the structure may be recursive, allowing chains of arbitrary depth. To address these requirements, LURQL provides recursion. Here's a simple example which expands all of the containers of a table:

select *
from class Table where name='TEMPS' then (
    recursively (
        follow composite backward
    )
);

EXECUTION RESULT:
LocalCatalog: LOCALDB
LocalSchema: SALES
LocalTable: TEMPS
The query did not mention any particular association or class other than Table; the structure was discovered automatically by query execution.

Recursion may contain follow chains (or diverging chains as long as they all eventually converge back to a single endpoint) and may be followed by more non-recursive query processing. Here's the previous view dependency example, this time capable of handling arbitrary depth (the results are not shown since the example views only have one level):


select c, t
from class View where name='JOINVIEW' then (
    recursively (
        follow association ElementOwnership destination class Dependency then (
            follow destination end supplier destination class ColumnSet as t
        )
    )
) gather with parent then (
    follow association ClassifierFeature as c
);
Note that the dependency destination has been changed from Table to ColumnSet (which includes the views we want to recursively expand). Such a recursive query can be diagrammed as a cyclic graph:

Existence Filters

So far we have seen how LURQL can navigate complex models, but the filtering performed during navigation has been very localized. LURQL supports an exists clause for filtering based on the existence of related objects.

Suppose we'd like to find all schemas which contain tables named 'EMPS':


select s
from class Schema as s where exists (
    follow association ElementOwnership destination class Table 
    where name='EMPS'
);

EXECUTION RESULT:
LocalSchema: SALES
By default, exists tests whether results are found for any node in the subquery (which can contain any kind of LURQL construct, including union and recursion). For more selective existence tests, it is possible to make the decision based on a select list. Suppose we'd like to find tables with columns of type BOOLEAN:

select t
from class Table as t where exists d in (
    follow association ClassifierFeature then (
        follow destination class SQLDataType as d
        where name='BOOLEAN'
    )
);

EXECUTION RESULT:
LocalTable: EMPS
LocalTable: TEMPS
If multiple variables are specified in the select list, then the existence test passes if any of them return at least one result. (Union rather than intersection.)

SQL Integration

At times it can be useful to complement LURQL with SQL. When properly configured, a LURQL query processor can use SQL queries as sources for filter lists. Here's an example of how to get the datatypes for all columns having a name ending with 'NO':

select dt
from class Column where mofId in
[select "mofId" from sys_cwm."Relational"."Column" where "name" like '%NO']
then (
    follow destination class SQLDataType as dt
);

EXECUTION RESULT:
SQLSimpleType: INTEGER
Any query text in between square brackets is not parsed by LURQL; it is instead sent to the configured SQL connection for execution. The SQL query should return a result set with exactly one column.

In this case, the SQL query is against the catalog views derived from the same metadata being queried by LURQL. The "mofId" column is a special internal column representing the JMI object ID.

API

TBD

Formal Grammar


<lurql-query> ::= 
select <select-list>
from <root>

select-list ::=
'*'
| 
<id> [, <id> ... ]

<root> ::=
<simple-root>
|
<compound-root>

<simple-root> ::=
class <id> [ as <id> ]
[ <where-clause> ]
[ then <path-spec> ]

<compound-root> ::=
'(' <root> [ union <root> ... ] ')'
[ gather then <path-spec> ]

<where-clause> ::=
where <filter> [ and <filter> ... ]

<filter> ::= [ NOT ] <leaf-filter>

<leaf-filter> ::=
<id> '=' <value>
|
<id> matches <regular-expression-value>
|
<id> in '(' <value> [, <value> ... ] ')'
|
<id> in '[' <sql-query-text> ']'
|
<id> in <dynamic-param>
|
exists [ <select-list> in ] <path-spec>

<value> ::=
<literal>
| <dynamic-param>

<regular-expression-value> ::= <value>

<dynamic-param> ::=
'?' <id>

<path-spec> ::=
<path-spec-without-gather> 
[ gather [ with parent ] then <path-spec> ]

<path-spec-without-gather> ::=
'(' <path-branch> [ union <path-branch> ... ] ')'

<path-branch> ::=
<follow>
|
<recursion>

<follow> ::=
follow [ <association-filter> ... ]
[ as <id> ]
[ <where-clause> ]
[ then <path-spec> ]

<association-filter> ::=
( origin | destination ) ( end | class ) <id>
| composite
| noncomposite
| association <id>
| forward
| backward

<recursion> ::=
recursively ( <path-spec-without-gather> | '(' <path-spec> ')' )
[ then <path-spec> ]

<id> ::= 
<double-quoted-id> 
| <unquoted-id>

<literal> ::=
<single-quoted-string>
| null

Formal Semantics

TBD

Related Topics

For information on how to use LURQL queries to drive metadata visualization, see the DMV docs.

History and Credits

LURQL was originally developed in 1998 under the name BERQL by Roy Goldman, John Sichi, and others at Broadbase Software. BERQL was an acronym for Broadbase Entity Relationship Query Language, but there also happens to be a restaurant on Telegraph Avenue in Berkeley named Berkel Berkel. The original implementation of the language didn't have its own parser; instead, queries were phrased in XML (just like XQueryX), which was nice for machines but no so nice for humans.

In 2005, the code was contributed under GPL licensing to The Eigenbase Project via LucidEra, a startup which has acquired the rights to the Broadbase code. Besides rechristening as LURQL, major changes from BERQL include: