The edu.upc.dama.dex.script package can be used either from the
command line or through function calls.
It is designed to interpret plain text files. These files contain a
set of commands that construct a DEX database and load data into it.
The scripting utilities require .des files to define a
DEX graph schema.
The following example defines a schema to analyze phone
calls:
    // calls-schema.des file
    create dbgraph PhoneCalls into 'calls.dex'
    create node 'person' (
        'name' string unique,
        'phone' string indexed
    )
    create edge 'calls' from 'person' to 'person' (
        'date' timestamp,
        'duration' long
    )

This schema defines one node type, person, and one edge type, calls.
Person has two attributes: name and phone. Calls is a materialized,
directed, restricted edge type from person to person; it has two
attributes: date and duration.
The previous .des file defines a schema, but it contains no
data. DEX scripts (.des) can also load data into a
DEX database. The data must be available in
CSV format.
An extra .des file is required to load an existing
CSV file:
    // calls-load.des file
    use dbgraph PhoneCalls into 'calls.dex'
    load nodes 'persons.csv'
        columns 'name', 'phone'
        into 'person'
        fields terminated ';'
        from 1
        mode rows
    load edges 'calls.csv'
        columns 'caller', 'called', 'date', 'duration'
        into 'calls'
        ignore 'caller', 'called'
        where tail 'caller' = 'person'.'phone' head 'called' = 'person'.'phone'
        fields terminated ';'
        from 1
        mode rows

This .des file loads information from persons.csv and from calls.csv.
The former contains two columns: one with values for the name
attribute and another with values for the phone attribute. The latter
contains the columns needed to build the person-to-person calls
relationships (two columns: caller and called) and the attribute
values (two more columns: date and duration).
CSV files are read by using the {@link edu.upc.dama.dex.io} package.
They are simple separated-value text files, like the ones generated
by most database managers and applications.
The persons.csv file loaded by
calls-load.des is:
    name;phone
    John Bronson;555312111
    "William ""Will"" Thomson";555192939
    Maria Garudo;555443322

The calls.csv file loaded by calls-load.des is:
    caller;called;date;duration
    555192939;555443322;2001-05-23 13:37:27;12
    555312111;555192939;2001-05-24 21:32:24;3
    555443322;555192939;2001-05-24 01:49:48;54
    555192939;555443322;2001-05-25 15:21:12;14
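The quoted name in the persons.csv sample follows the common CSV
convention of doubling a quote character inside a quoted field. The
following is a minimal sketch of how such rows could be produced,
assuming that convention and the ';' separator of this example. The
class CsvRows and its methods are hypothetical helpers and are not
part of DEX.

```java
// Hypothetical helper, not part of DEX: formats rows for a separated-value file.
class CsvRows {

    // Quotes a field only when it contains the separator, a quote or a line break.
    // Embedded quotes are doubled, as in "William ""Will"" Thomson".
    static String quote(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Quotes each field as needed and joins the fields into one row.
    static String row(char separator, String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(quote(fields[i], separator));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(row(';', "name", "phone"));
        System.out.println(row(';', "John Bronson", "555312111"));
        System.out.println(row(';', "William \"Will\" Thomson", "555192939"));
        System.out.println(row(';', "Maria Garudo", "555443322"));
    }
}
```

Run as is, the class prints the four lines of the persons.csv sample.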
These files can be loaded into a DEX database from the command
line. The commands to run the previous .des files are:
    java edu.upc.dama.dex.script.ScriptParser calls-schema.des
    java edu.upc.dama.dex.script.ScriptParser calls-load.des

The former launches the script that creates the schema; the latter
launches the script that loads data into the DEX schema. The CSV
files must be in the same directory so that their values can be
loaded. A relative or absolute path can also be used as a filename.
The "jdex.jar" file must be in the Java class path.
These files can also be executed from Java classes through the API.
An application that runs the previous .des files is:
    import edu.upc.dama.dex.script.ScriptParser;

    class Example {
        public static void main(String[] args) {
            ScriptParser.main(new String[] {"calls-schema.des"});
            ScriptParser.main(new String[] {"calls-load.des"});
        }
    }

This class uses the static main method of ScriptParser to execute
both DEX scripts.
A DEX Script file (.des) is a file that contains an
ordered list of commands.
DEX executes the commands of a script file in order.
Commands create schemas, define nodes and edges, and load
data into a previously defined DEX schema.
There are six main commands.
The schema command defines a graph schema and creates it in a DEX
database.
A schema definition has an alias (the name of the database), the
filename where the database is stored, and multiple node and edge
definitions.
A node or edge definition has the type name and a list of
attribute names and types. Optionally, an attribute can
be indexed or unique (the default is basic).
An attribute can also be indexed later (see Schema update), after the
data has been loaded.
An edge definition is also parametrized to support multiple kinds of
edges (directed, undirected, restricted, ...) and, optionally,
to specify the connected node types (restricted) and attributes.
Edges can also enable or disable materialized neighbors.
The defined schema is created in a new DEX database and used by the
load commands that follow.
The schema command syntax is as follows:
    CREATE DBGRAPH alias INTO filename

    CREATE NODE node_type_name "("
        [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC], ...]
    ")"

    CREATE [UNDIRECTED] EDGE edge_type_name [FROM node_type_name TO node_type_name] "("
        [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT), ...]
    ")" [MATERIALIZE NEIGHBORS]
    create dbgraph WIKIPEDIA into 'wikipedia.dex'
    create node TITLES (
        ID int unique,
        'TEXT' string,
        NLC string,
        TITLE string indexed
    )
    create node IMAGES (
        ID int unique,
        NLC string,
        filename string indexed
    )
    create edge REFS (
        NLC string,
        'TEXT' string,
        TYPE string
    )
    create edge IMGS ( )

    create dbgraph FAMILY into 'family.dex'
    create node PERSON (NAME string indexed, ID int unique, YEAR int)
    create node DOG (NAME string indexed, YEAR int)
    create edge CHILD from PERSON to PERSON (YEAR int)
    create edge HUSBAND from PERSON to PERSON (YEAR int) materialize neighbors
    create edge WIFE from PERSON to PERSON (YEAR int) materialize neighbors
    create edge PET from PERSON to DOG () materialize neighbors

    create dbgraph CARMODEL into 'cars.dex'
    create node PERSON (NAME string, ID int unique, YEAR int)
    create node CAR (MODEL string, ID int, OWNER int indexed)
    create edge CAROWNER from PERSON to CAR
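Because a .des file is plain text, a schema script can also be
generated by a program instead of being written by hand. The following
is a minimal sketch under that assumption. SchemaScript and its methods
are hypothetical names, not part of DEX, and the sketch does not handle
names that themselves contain a single quote.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of DEX: builds "create node" statements as text.
class SchemaScript {

    // Wraps a name in single quotes, the form used in this document
    // for names such as 'TEXT' or 'person'.
    static String quoted(String name) {
        return "'" + name + "'";
    }

    // Builds one create node statement. Each map value is the attribute type,
    // optionally followed by indexed, unique or basic (for example "string unique").
    static String createNode(String typeName, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        sb.append("create node ").append(quoted(typeName)).append(" (");
        boolean first = true;
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            sb.append(first ? " " : ", ");
            sb.append(quoted(attribute.getKey())).append(' ').append(attribute.getValue());
            first = false;
        }
        sb.append(" )");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in insertion order.
        Map<String, String> person = new LinkedHashMap<>();
        person.put("name", "string unique");
        person.put("phone", "string indexed");
        System.out.println("create dbgraph PhoneCalls into 'calls.dex'");
        System.out.println(createNode("person", person));
    }
}
```

The generated text could be written to a .des file or wrapped in a
java.io.StringReader.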
The use command loads a schema from an existing DEX database and
uses it in the load commands that follow.
This command specifies an alias for a DEX database and loads the
DEX schema from the given filename. The selected DEX database is used
by the load commands that follow.
The use command syntax is the following:

    USE DBGRAPH alias INTO filename
The load nodes command creates nodes from a CSV file and sets their
attribute values. For each CSV row, a new node is created in
the previously selected DEX database
(see dbgraph use).
This command selects the file to read and names the columns to read
(a column can optionally be named * to ignore it). The type of the new
nodes is defined by the into parameter. Node attributes
are set from every column except those listed in the
ignore parameter or named *.
The default field delimiter is a comma (,), but it can be
changed with the fields parameter. All CSV lines are
read unless the from or max parameters
are present. The former makes it possible to skip headers, such as
field definition rows. The latter makes it possible to test loads by
reading a limited number of rows. The default load mode is by rows, but
the mode parameter can change it to optimize the load.
The load nodes command syntax is the following:

    LOAD NODES file_name
        COLUMNS attribute_name [alias_name], ...
        INTO node_type_name
        [IGNORE (attribute_name|alias_name), ...]
        [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE]]
        [FROM num]
        [MAX num]
        [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]

As mentioned above in the Schema definition section, in some cases the
file_name, attribute_name, alias_name and node_type_name parameters
must be written in inverted commas. The following examples show this
case.

    load nodes 'titles.csv'
        columns ID, NLC, 'TEXT', TITLE
        into TITLES
        mode rows

    load NODES 'images.csv'
        columns ID, NLC, FILENAME
        into IMAGES
        from 1
        max 10000
        mode columns

    load nodes 'people.csv'
        columns *, DNI, NAME, AGE, *, ADDRESS
        into PEOPLE
        fields terminated ';' enclosed '"'
        mode rows
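The effect of the columns, from and max parameters can be pictured
with a small Java sketch. It is an illustration only, not DEX code.
It assumes that from is a zero-based row index (so that from 1 skips
one header row) and that max is the largest number of rows read. The
class LoadNodesSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only, not DEX code: mimics how COLUMNS, FROM and MAX
// are described to select data from a CSV file.
class LoadNodesSketch {

    // Keeps at most 'max' rows starting at the zero-based row index 'from'.
    // A negative 'max' stands for "no limit" in this sketch.
    static List<String> selectRows(List<String> rows, int from, int max) {
        List<String> selected = new ArrayList<>();
        for (int i = Math.max(from, 0); i < rows.size(); i++) {
            if (max >= 0 && selected.size() >= max) {
                break;
            }
            selected.add(rows.get(i));
        }
        return selected;
    }

    // Pairs each column name with the field at the same position and
    // drops the columns named "*".
    static Map<String, String> attributes(List<String> columns, String[] fields) {
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < columns.size() && i < fields.length; i++) {
            if (!"*".equals(columns.get(i))) {
                values.put(columns.get(i), fields[i]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "name;phone",
                "John Bronson;555312111",
                "Maria Garudo;555443322");
        List<String> columns = Arrays.asList("name", "phone");
        for (String row : selectRows(rows, 1, -1)) {
            // Plain split: quoted fields are out of scope for this sketch.
            System.out.println(attributes(columns, row.split(";")));
        }
    }
}
```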
The load edges command creates edges from a CSV file and sets their
attribute values. For each CSV row, a new edge is created
in the previously selected DEX database
(see dbgraph use).
This command has the same parameters as load nodes, plus
edge-specific parameters.
An edge is always created as a link between two existing nodes;
the where parameter selects which ones.
This means that the nodes must be created first (see the load nodes
command).
Note that the node attributes referenced in the
where clause (both tail and head) must be indexed or unique attributes.
The tail node is defined by the tail property: it is
the node of the given type whose given attribute has the same value
as the file column.
The head node is defined by the head property, which selects
the destination node of the edge in the same way.
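The lookup that the where clause describes can be pictured with a
small Java sketch, using rows from the calls.csv sample. It is an
illustration only, not DEX code: a HashMap stands in for the indexed
phone attribute, the node ids are made up, and rejecting a row whose
node is missing is a choice of this sketch rather than documented DEX
behaviour. The class LoadEdgesSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only, not DEX code: resolves the tail and head of each edge
// through a lookup on an attribute value, as the where clause describes.
class LoadEdgesSketch {

    // Returns one {tailNodeId, headNodeId} pair per CSV row. 'phoneIndex' plays
    // the role of the indexed 'phone' attribute: it maps a phone value to a node id.
    static List<long[]> resolveEdges(Map<String, Long> phoneIndex, List<String> rows) {
        List<long[]> edges = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split(";");
            Long tail = phoneIndex.get(fields[0]); // 'caller' column
            Long head = phoneIndex.get(fields[1]); // 'called' column
            if (tail == null || head == null) {
                // No node has this phone value, so the edge cannot be created.
                throw new IllegalArgumentException("Unknown node in row: " + row);
            }
            edges.add(new long[] {tail, head});
        }
        return edges;
    }

    public static void main(String[] args) {
        // Node ids are made up for the example.
        Map<String, Long> phoneIndex = new HashMap<>();
        phoneIndex.put("555312111", 1L);
        phoneIndex.put("555192939", 2L);
        phoneIndex.put("555443322", 3L);

        List<String> rows = new ArrayList<>();
        rows.add("555192939;555443322;2001-05-23 13:37:27;12");
        rows.add("555312111;555192939;2001-05-24 21:32:24;3");

        for (long[] edge : resolveEdges(phoneIndex, rows)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

Without an index on the referenced attribute, each row would need a
scan over all nodes of the type, which suggests why the where clause
is restricted to indexed or unique attributes.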
The load edges command syntax is the following:

    LOAD EDGES file_name
        COLUMNS attribute_name [alias_name], ...
        INTO edge_type_name
        [IGNORE (attribute_name|alias_name), ...]
        WHERE TAIL (attribute_name|alias_name) = node_type_name.attribute_name
              HEAD (attribute_name|alias_name) = node_type_name.attribute_name
        [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE]]
        [FROM num]
        [MAX num]
        [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]

As mentioned above in the Schema definition section, in some cases the
file_name, attribute_name, alias_name and type name parameters
must be written in inverted commas. The following examples show this
case.

    load edges 'references.csv'
        columns NLC, 'TEXT', TYPE, 'FROM' F, 'TO' T
        into REFS
        ignore F, T
        where tail F = TITLES.ID head T = TITLES.ID
        mode columns split partitions 3

    load edges 'imagesReferences.csv.gz'
        columns 'From', 'To'
        into IMGS
        ignore 'From', 'To'
        where tail 'From' = TITLES.ID HEAD 'To' = IMAGES.ID
        mode rows
In these examples,
Schema update commands update the schema of a graph pool. Currently, node types, edge types and attributes can be removed. The indexing of node attributes can also be modified.
    DROP (NODE|EDGE|ATTRIBUTE) name
    INDEX node_type_name.attribute_name [INDEXED|UNIQUE|BASIC]
    drop attribute IMAGES.ID
    drop attribute IMAGES.NLC
    drop attribute IMAGES.FILENAME
    drop node 'IMAGES'
    drop edge IMGS
    index PEOPLE.NAME indexed
    index TITLES.TITLE unique
Some conventions apply to all commands.
These conventions are the following:
Use

    SET TIMESTAMP FORMAT "yyyy-MM-dd hh:mm:ss"

to change the default format of the timestamp data loaded into DEX.
After this command is used, all timestamp data is loaded with this format.

Created DEX Scripts (.des files) are executed from the command line or inside a Java application through the API.
The DEX Script API provides a class with an executable main method
that runs a DEX Script file.
This command line tool requires one argument, which is the path and
name of the .des file.
The command line syntax is the following:

    java edu.upc.dama.dex.script.ScriptParser file.des

This tool requires the whole DEX environment, including the DEX classpath and the DEX JNI native libraries.
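As a minimal sketch of what that environment can look like, the
following Java class assembles a full command line with the class path
and the standard java.library.path JVM property for the native
libraries. The class ScriptCommand is hypothetical and not part of DEX,
and the jar and native library locations are placeholders for the local
installation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not part of DEX: assembles the java command line
// that runs ScriptParser on one script file.
class ScriptCommand {

    // 'jarPath' and 'nativeDir' are placeholders for the local installation.
    static List<String> build(String jarPath, String nativeDir, String script) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(jarPath);
        // Standard JVM property that tells the JVM where to look for JNI libraries.
        command.add("-Djava.library.path=" + nativeDir);
        command.add("edu.upc.dama.dex.script.ScriptParser");
        command.add(script);
        return command;
    }

    public static void main(String[] args) {
        List<String> command =
                build("lib" + File.separator + "jdex.jar", "lib", "calls-schema.des");
        System.out.println(String.join(" ", command));
        // To actually launch it:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```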
DEX provides an API to run DEX Scripts from Java applications.
The current package provides the
{@link edu.upc.dama.dex.script.ScriptParser} class, which executes
DEX Script files and streams.
There are two ways to execute DEX Scripts from the API: through main
or through parser construction. The former reuses the
{@link edu.upc.dama.dex.script.ScriptParser#main(String[]) }
operation executed from the command line. It receives a string array
of one element, which is the file name.
An example is the following:
    import edu.upc.dama.dex.script.ScriptParser;

    class Example1 {
        // main must be static to be usable as an entry point
        public static void main(String[] args) {
            ScriptParser.main(new String[] {"file.des"});
        }
    }

This solution is limited to executing existing files on the user's
file system, and it uses a new DEX instance. The latter way is to use
an instance of {@link edu.upc.dama.dex.script.ScriptParser}. Its
constructor receives a DEX object and a Reader object with a DEX
script. Once the object is constructed, a call to
{@link edu.upc.dama.dex.script.ScriptParser#parse(boolean)} parses the
reader into the DEX object. An example is the following:
    import java.io.StringReader;
    import edu.upc.dama.dex.core.DEX; // assumed package of the DEX class
    import edu.upc.dama.dex.script.ScriptParser;

    class Example2 {
        public static void main(String args[]) throws Exception {
            DEX dex = new DEX();
            // Script text written with the schema command syntax described above
            StringReader reader = new StringReader(
                "create dbgraph People into 'people.dex' " +
                "create node 'person' ( 'name' string )"
            );
            ScriptParser ps = new ScriptParser(dex, reader);
            ps.parse(true);
            dex.close();
        }
    }

@see edu.upc.dama.dex.io