Release 2.0 - 12 August 2010
A description of the simple SQL-like Ganymede Query Language (GanyQL) supported by Ganymede 2.0.
Starting with version 2.0, Ganymede supports a simple textual query language to represent queries against the Ganymede data store. The Ganymede server incorporates a translator to turn queries specified in the new Ganymede Query Language (GanyQL) into the old-school linked tree representation that Ganymede has always supported behind the scenes.
Programmers writing code directly to the internal Ganymede API can make use of GanyQL to simplify their coding on plugins and the like and, best of all, end users can make use of GanyQL directly from the command line using the Ganymede 2.0 xmlclient.
by Jonathan Abbey, GanyQL designed and implemented by Deepak Giridharagopal.
In Ganymede 1.0, the only way to query the Ganymede data store was to manually create a Ganymede arlut.csd.ganymede.common.Query object with a linked tree of arlut.csd.ganymede.common.QueryNode objects representing the query's exploded parse tree. Writing code within Ganymede server plugins which carried out queries on the data store was was difficult, and creating shell scripts to run specific queries from the Ganymede server was simply impossible. All queries, no matter how simple, required the creation of custom Java code to assemble the appropriate structured query trees.
To bring light into this darkness, Ganymede 2.0 introduces a simple new Ganymede Query Language (GanyQL). GanyQL is very loosely patterned after SQL syntax, and is designed to make it possible to express queries on the Ganymede data store in a simple and direct fashion. All you need to do is write a bit of GanyQL and parsing code on the Ganymede server will take care of translating your request into the Ganymede query engine's internal structure.
GanyQL was designed and implemented by Deepak Giridharagopal using Terrance Parr's excellent ANTLR parser generation system. If you ever find yourself needing to incorporate language parsing into your development project, we recommend ANTLR to you without reservation. It is a superb tool.
Here are some examples of what GanyQL queries look like:
select from editable User where UID >= 2000
select Username,UID from editable User
select Username from User where expiration_date defined
select GroupName,GID from Group where Notes =~_ci "(atl|isl) group"
select Username,UID,Home_Group from editable User
where Home_Group->(GID >= 2000 and GID <= 8000)
select Name from "Owner Group" where Members->User->"Home Group"->Users len<= 5
select system_name from System where Virtual defined and owner_list == "ITS"
In English, these queries are as follows:
As you can see from these examples, you can express some quite involved queries with a very little bit of GanyQL.
When dealing with object types and field names, case is not significant, and you can use underscores for space characters if you like. In such cases, quotation marks around the object type or field name are not required.
Note that this freedom does not extend to string values (including object names) that you are seeking to match against. In such cases, the quotation marks are mandatory, and you may not substitute underscores for spaces.
As powerful as it is, and as much as the syntax is designed to look like SQL, GanyQL is really very different. Queries are made on objects, not tables, and you can't get back data from multiple kinds of objects with a single query. You can express queries and subqueries that traverse Ganymede's object pointer (invid) fields, though, which makes up somewhat for the lack of table joins and the like.
Perhaps most significant is the fact that, unlike SQL, GanyQL is just a query language. To actually submit data changes to the Ganymede server, you need either to use the xmlclient in conjunction with the Ganymede XML data format, or to write custom Java code.
To fully express the syntax of GanyQL would require an exposition as long and precise as the actual GanyQL grammar. If you've had any experience with abstract syntax definitions, you may find reading the grammar both interesting and informative.
The GanyQL syntax is reasonably simple. All queries in GanyQL have this general structure:
The words in boldface are mandatory keywords. Words in italic boldface are optional, and text in plain italics are where you will have to provide input.
The "Object Type" string is the name of the type of object in the Ganymede data store that you want to query. The (double or single) quotation marks are only needed here if the object type has a space in it, and you want to represent it in its native form.
The return values can be skipped or be the object keyword, if you want to get back all fields from the object, or a comma-separated list of field name strings to be retrieved from "Object Type".
The editable keyword is optional. If it is present, the query will only examine and return objects that you have permission to check out for editing. If editable is not present, the query will match against all objects of the appropriate type that you have permission to view.
Finally, there's the optional where clause and test expression. If you do not provide a where clause, all objects of the appropriate Object Type that you have permission to see and/or edit will be returned.
If you do provide a where clause, you can place specific restrictions on the objects you are interested in querying.
The where clause consists of the where keyword followed by a logical combination of field tests. These tests are specified using simple operators, and can be combined using the logical operators and, or, and not. You can also use parentheses to control the grouping of expressions.
The field tests supported by GanyQL are listed in the following table:
Operator | Meaning | Field Types Supporting | Argument Types |
---|---|---|---|
=~ | Regular Expression Match | Invid, I.P. Address, String | Regular Expression |
=~_ci | Regular Expression Match (case insensitive) | Invid, I.P. Address, String | Regular Expression |
== | Equality | Boolean, Date, Float, Invid, I.P. Address, Number, String | String, Date, Booleans, Numeric, I.P. Address |
==_ci | Equality (case insensitive) | Invid, I.P. Address, String | String, Date, Booleans, Numeric, I.P. Address |
< | Less Than | Date, Float, Invid, Number, String | String, Date, Numeric |
<= | Less Than or Equal | Date, Float, Invid, Number, String | String, Date, Numeric |
> | Greater Than | Date, Float, Invid, Number, String | String, Date, Numeric |
>= | Greater Than or Equal | Date, Float, Invid, Number, String | String, Date, Numeric |
starts | Begins With | Invid, I.P. Address, String | String |
ends | Ends With | Invid, I.P. Address, String | String |
len< | Array Length Less Than | Any array field | Numeric |
len<= | Array Length Less Than or Equal | Any array field | Numeric |
len> | Array Length Greater Than | Any array field | Numeric |
len>= | Array Length Greater Than or Equal | Any array field | Numeric |
len== | Array Length Equality | Any array field | Numeric |
defined | Field is defined (not empty) | All | n/a |
Note that most of these field test operators only work on certain kind of Ganymede data fields. See XXX for details.. ??
Most of these field test operators take a pair of arguments, with the name of the field to look at before the operator and a value to compare the field to after the operator.
The sole exceptions to this pattern is the defined operator, which does not take a comparison value.
The field tests can be combined with the following operators:
Operator | Meaning |
---|---|
and | Logical AND |
or | Logical OR |
not | Logical Negation |
( ) | Grouping |
Finally, GanyQL features a special pointer field dereference operator, which can be used to extend queries to objects linked from the primary object type being queried.
Operator | Meaning |
---|---|
-> | Test Linked Object |
There are two ways to use the -> operator. You can use the pointer dereference operator to execute a specific test against a field in the object pointed to by the invid field being dereferenced, or you can use the parentheses to execute a logical combination of tests on multiple fields in the object pointed to.
For example, both of these examples work just fine:
"invid field"->field == "value"
"invid field"->(field 1 == "value1" and field2 < 200)
In the latter case, the tests against "field1" and "field2" are evaluated for each object that is pointed to by "invid field" in the primary query. If "invid field" is a multivalued invid array field, the test will be applied against each object pointed to by "invid field" until the test is passed or all invids in the array have been examined, whichever comes first.
Through the use of the pointer dereference operator, it is possible to create queries that can examine quite a lot of objects in an attempt to determine matches against the query expression.
Here's another example:
"invid field"->"invid field2"->(GID <= 3000 or Users->(UID < 4000 or "User Netgroup"->Members->UID > 8192))
This query expression returns true for all objects which have an "invid field" which points to an object which has an "invid field2" that points to an object that has either a GID field that is less than or equal to 3000, or that has a Users field which points to an object that has either a UID field which is less than 4000 or a "User Netgroup" field which points to an object that has a Members field which points to an object that has a UID field which holds a value greater than 8192.
If we assume that all invid fields involved in pointer dereferencing in this query are single-valued (not array) fields, to evaluate this query could require the Ganymede server to examine fully six objects:
If any of those invid (object pointer) fields are actually multivalued array fields, the number of objects that may need to be examined to determine the truth of this expression for a single object may increase by the product of the number of values in each of these invid fields. If, say, the "Users", "User Negroup", and "Members" fields were each array invid fields, with four values in each of the fields we examine to test this query expression, that would add up to an additional 64 objects that might need to be examined in order to determine the truth of this expression for a single "invid field".
In general, you probably won't need to get this crazy with subqueries, but it's nice to know that the system is capable of doing whatever work is required to get the job done for stuff like this.
Anywhere where you provide a string delimited by single or double quote characters, you are dealing with a string value. All string values handled by the GanyQL parser support a simple backslash escape encoding to handle inclusion of the quotation characters within the string.
Here are the encoding rules:
Character | Encoding |
---|---|
' | \' |
" | \" |
\ | \\ |
<newline> | \n |
So if we wanted to look for a user whose name includes a pair of double quote characters, for some reason, we could do it in one of two ways:
First, by using the opposite style of quotation for the string value:
Second, by using backslash escaping:
We can also use the escaping rules to embed newlines into the query string:
Remember that these character escaping rules are in addition to the standard rules for regular expression syntax, so if you are wanting to use a regular expression match to search for a string that contains a backslash character (which has special meaning to the regular expression parser) you'd need to do something like this:
In this case, we're really looking for any user whose full name includes the substring Wu\Tang. We need to add one backslash to get past the initial GanyQL parser, so we'd have Wu\\Tang, to get Wu\Tang to the regular expression parser. The regular expression parser, however, requires a backslash escape itself. Thus we need to provide Wu\\\\Tang to get the proper encoding all the way down to the regular expression parser.