Provides classes and tools that help create schemas and load data. These utilities read plain text files, called DEX scripts, that contain commands. The commands describe how to create DEX schemas and how to load data into DEX.

1. Introduction

The edu.upc.dama.dex.script package can be used either from the command line or through function calls.
It is designed to interpret plain text files. These files contain a set of commands that construct a DEX database and load data into it.

1.1. Quick .des File

The scripting utilities require .des files to define a DEX graph schema.
The following example defines a schema for analyzing phone calls:

                // calls-schema.des file
                create dbgraph PhoneCalls into 'calls.dex'
                
                create node 'person' (
                    'name' string unique,
                    'phone' string indexed
                    )
                    
                create edge 'calls' from 'person' to 'person' (
                    'date' timestamp,
                    'duration' long
                    )
            
This schema defines one node type, person, and one edge type, calls. Person has two attributes: name and phone. Calls is a materialized, directed, restricted edge type from person to person; it has two attributes: date and duration.

The previous .des file defines a schema, but it contains no data. DEX scripts (.des) can also load data into a DEX database. The data must be in CSV format. A second .des file is required to load the existing CSV files:

                // calls-load.des file
                use dbgraph PhoneCalls into 'calls.dex'

                load nodes 'persons.csv'
                    columns 'name', 'phone'
                    into 'person'
                    fields terminated ';'
                    from 1
                    mode rows

                load edges 'calls.csv'
                    columns 'caller', 'called', 'date', 'duration'
                    into 'calls'
                    ignore 'caller', 'called'
                    where tail 'caller' = 'person'.'phone' head 'called' = 'person'.'phone'
                    fields terminated ';'
                    from 1
                    mode rows
            
This .des file loads data from persons.csv and from calls.csv. The former contains two columns: one with the values of the name attribute and another with the values of the phone attribute. The latter contains the data needed to build the person-to-person calls relationships (two columns: caller and called) and the attribute values (two more columns: date and duration).

1.2. Quick .csv File

CSV files are read using the {@link edu.upc.dama.dex.io} package. They are simple delimiter-separated text files, like the ones generated by most database managers and applications (the examples here use ; as the field delimiter).
The persons.csv file loaded by calls-load.des is:

                name;phone
                John Bronson;555312111
                "William ""Will"" Thomson";555192939
                Maria Garudo;555443322
            
The calls.csv file loaded by calls-load.des is:
                caller;called;date;duration
                555192939;555443322;2001-05-23 13:37:27;12
                555312111;555192939;2001-05-24 21:32:24;3
                555443322;555192939;2001-05-24 01:49:48;54
                555192939;555443322;2001-05-25 15:21:12;14
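
The quoted name in persons.csv relies on the common CSV convention of doubling a quote character inside a quoted field. The following is a minimal sketch of that convention, not the edu.upc.dama.dex.io implementation; the class name CsvLineSketch and its split method are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows how one delimited line with quoted fields can be split.
class CsvLineSketch {
    /** Splits a line on the delimiter; inside quotes, a doubled quote is a literal quote. */
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote becomes one literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);         // delimiters inside quotes are plain text
                }
            } else if (c == quote) {
                inQuotes = true;               // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());        // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        // Second data row of persons.csv
        System.out.println(split("\"William \"\"Will\"\" Thomson\";555192939", ';', '"'));
        // prints [William "Will" Thomson, 555192939]
    }
}
```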
            

1.3. Quick Command Line

These files can be loaded into a DEX database from the command line. The commands to run the previous .des files are:

                java edu.upc.dama.dex.script.ScriptParser calls-schema.des
                java edu.upc.dama.dex.script.ScriptParser calls-load.des
            
The first command runs the script that creates the schema; the second runs the script that loads data into it. The CSV files must be in the same directory so that their values can be loaded.

1.4. Quick Java Invocation

These files can also be executed from Java classes through the API. An application that runs the previous .des files is:

                import edu.upc.dama.dex.script.ScriptParser;
                class Example {
                    public static void main(String[] args) throws Exception
                    {
                        ScriptParser.main(new String[] {"calls-schema.des"});
                        ScriptParser.main(new String[] {"calls-load.des"});
                    }
                }
            
This class calls the static main method of ScriptParser to execute both DEX scripts.

2. Script .des File Format

A DEX script file (.des) contains an ordered list of commands. DEX executes the commands of a script file in order. Commands create schemas, define nodes and edges, and load data into a previously defined DEX schema.
There are six main commands:

All commands share the naming and spacing conventions presented in this document.

2.1. Schema Definition

The schema commands define a graph schema and create it in a DEX database.
A schema definition consists of an alias, which names the database, the filename in which to store the database, and any number of node and edge definitions.
A node or edge definition has a type name and a list of attribute names and types. Optionally, an attribute can be indexed or unique (the default is basic). An attribute can also be indexed later (see Schema Update), once the data has been loaded.
An edge definition is also parameterized to support several kinds of edges: it can be directed or undirected, it can optionally restrict the node types it connects (restricted), and it can be defined over attributes (virtual). In addition, materialized edges (that is, all non-virtual edges) can optionally materialize neighbors.
The defined schema is created in a new DEX database and is used by the load commands that follow.
The syntax of the schema commands is as follows:

            CREATE DBGRAPH alias INTO filename 
            
            CREATE NODE node_type_name "("
                [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC], ...]
            ")"

            CREATE [UNDIRECTED|VIRTUAL] EDGE edge_type_name
                [FROM node_type_name[.attribute_name] TO node_type_name[.attribute_name]] "("
                [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT), ...]
            ")" [MATERIALIZE NEIGHBORS]
        

Note that in some cases the alias, filename, node_type_name, attribute_name and edge_type_name must be enclosed in single quotes, as in the following examples. If one of these names matches a word reserved by the schema definition language and is not quoted, the script parser reads it as a reserved word instead of as a parameter and reports an error. The reserved words are:


Examples
            create dbgraph WIKIPEDIA into 'wikipedia.dex'
            create node TITLES (
                ID int unique,
                'TEXT' string ,
                NLC string,
                TITLE string indexed
            )
            create node IMAGES (
                ID int unique,
                NLC string,
                filename string indexed
            )
            create edge REFS (
                NLC string,
                'TEXT' string,
                TYPE string
            )
            create edge IMGS ( )

            create dbgraph FAMILY into 'family.dex'
            create node PERSON (NAME string indexed, ID int unique, YEAR int)
            create node DOG (NAME string indexed, YEAR int)
            create edge CHILD from PERSON to PERSON (YEAR int)
            create undirected edge MARRIED from PERSON to PERSON (YEAR int) materialize neighbors
            create edge PET from PERSON to DOG () materialize neighbors

            create dbgraph CARMODEL into 'cars.dex'
            create node PERSON (NAME string, ID int unique, YEAR int)
            create node CAR (MODEL string, ID int, OWNER int indexed)
            create virtual edge CAROWNER from PERSON.ID to CAR.OWNER
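
When scripts are generated by a program, one way to avoid collisions with reserved words is to quote every name. The following is a minimal sketch under two assumptions that this document does not confirm: that quoting a name is always accepted, and that names never contain a single quote (the sketch rejects such names because no escape rule is described here). The class SchemaScriptSketch and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical generator for "create node" statements with every name quoted.
class SchemaScriptSketch {
    /** Wraps a name in single quotes; rejects names that already contain one. */
    static String quote(String name) {
        if (name.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("name contains a quote: " + name);
        }
        return "'" + name + "'";
    }

    /**
     * Builds a "create node" statement. Map keys are attribute names; values are
     * the type plus optional index kind, for example "string indexed".
     */
    static String createNode(String type, Map<String, String> attributes) {
        StringJoiner attrs = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            attrs.add(quote(e.getKey()) + " " + e.getValue());
        }
        return "create node " + quote(type) + " " + attrs;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>(); // keeps declaration order
        attrs.put("TEXT", "string");
        attrs.put("TITLE", "string indexed");
        System.out.println(createNode("TITLES", attrs));
        // prints create node 'TITLES' ('TEXT' string, 'TITLE' string indexed)
    }
}
```

A LinkedHashMap is used so that attributes appear in the statement in the order they were added.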
        

2.2. Schema Use

Loads a schema from an existing DEX database and uses it for the load commands that follow.
This command specifies an alias for a DEX database and loads the DEX schema from the given filename. The selected DEX database is used by the load commands that follow.
The syntax of the use command is the following:

            USE DBGRAPH alias INTO filename
        

2.3. Data Node Load

The load nodes command creates nodes from a CSV file and sets their attribute values. A new node is created in the previously selected DEX database (see Schema Use) for each CSV row.
The command selects the file to read and the names of the columns to read (a column can be named * to ignore it). The type of the created nodes is given by the into parameter. A node attribute is set from every column except those listed in the ignore parameter or named *. The default field delimiter is ; but this can be changed with the fields parameter. All CSV lines are read unless the from or max parameter is given. The former skips leading lines, such as a header row with the field names. The latter limits the number of rows read, which is useful for testing a load. The default load mode is by rows, but the mode parameter can change it to optimize the load.
As mentioned in the Schema Definition section, in some cases the file_name, attribute_name, alias_name and node_type_name parameters must be written in single quotes, as some of the examples below show.

Load nodes command syntax is the following:

            LOAD NODES file_name
                COLUMNS attribute_name [alias_name], ...
                INTO node_type_name
                [IGNORE (attribute_name|alias_name), ...]
                [FIELDS
                    [TERMINATED char]
                    [ENCLOSED char]
                    [ALLOW_MULTILINE]]
                [FROM num]
                [MAX num]
                [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]
        
Examples
            load nodes 'titles.csv'
                columns  ID, 'TEXT', NLC, TITLE
                into TITLES
                mode rows

            load NODES 'images.csv'
                columns ID, NLC, FILENAME
                into IMAGES
                from 2
                max 10000
                mode columns

            load nodes 'people.csv'
                columns *, DNI, NAME, AGE, *, ADDRESS
                into PEOPLE
                mode rows
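
The effect of the columns and ignore parameters on a single CSV row can be pictured with the following minimal sketch. It only models the rule described above (columns named * or listed in ignore set no attribute) and is not how DEX implements the load; the class ColumnMappingSketch, the mapRow method and the sample row values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of how one CSV row maps onto node attributes.
class ColumnMappingSketch {
    /** Pairs column names with row values, skipping "*" and ignored columns. */
    static Map<String, String> mapRow(List<String> columns, List<String> values, Set<String> ignore) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (int i = 0; i < columns.size() && i < values.size(); i++) {
            String column = columns.get(i);
            if (column.equals("*") || ignore.contains(column)) {
                continue; // this column sets no attribute
            }
            attributes.put(column, values.get(i));
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Column list from the people.csv example; the row itself is invented.
        List<String> columns = List.of("*", "DNI", "NAME", "AGE", "*", "ADDRESS");
        List<String> row = List.of("17", "40000000X", "Maria", "31", "n/a", "Barcelona");
        System.out.println(mapRow(columns, row, Set.of()));
        // prints {DNI=40000000X, NAME=Maria, AGE=31, ADDRESS=Barcelona}
    }
}
```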
        

2.4. Data Edge Load

The load edges command creates edges from a CSV file, links them to their end nodes and sets their attribute values. A new edge is created in the previously selected DEX database (see Schema Use) for each CSV row.
This command has the same parameters as load nodes, plus some edge-specific parameters.
An edge always links two existing nodes, and the where parameter selects which ones. This means that the nodes must be created first (see the load nodes command).
Note that the node attributes referenced in the where clause (for both tail and head) must be indexed or unique attributes.
The tail node is defined by the tail property: it is the node of the given type whose value for the given attribute equals the value of the given file column. The head node is defined in the same way by the head property.
As mentioned in the Schema Definition section, in some cases the file_name, attribute_name, alias_name, edge_type_name and node_type_name parameters must be written in single quotes, as some of the examples below show.

Load edges command syntax is the following:

            LOAD EDGES file_name
                COLUMNS attribute_name [alias_name], ...
                INTO edge_type_name
                [IGNORE (attribute_name|alias_name), ...]
                WHERE
                    TAIL (attribute_name|alias_name) = node_type_name.attribute_name
                    HEAD  (attribute_name|alias_name) = node_type_name.attribute_name
                [FIELDS
                    [TERMINATED char]
                    [ENCLOSED char]
                    [ALLOW_MULTILINE]]
                [FROM num]
                [MAX num]
                [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]
        
Examples
            load edges 'references.csv'
                columns NLC, 'TEXT', TYPE, FROM F, TO T
                into REFS
                ignore F, T
                where tail F = TITLES.ID head T = TITLES.ID
                mode columns split partitions 3

            load edges 'imagesReferences.csv'
                columns From, To
                into IMGS
                ignore From, To
                where tail From = TITLES.ID HEAD To = IMAGES.ID
                mode rows

            load edges 'calls.gz'
                columns From, To, Time, Long
                into CALLS
                ignore From, To
                where tail From = PEOPLE.DNI head To = PEOPLE.DNI
                mode columns
        

In these examples, TITLES.ID, IMAGES.ID, and PEOPLE.DNI must be indexed (or unique) attributes.
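
The reason for this requirement can be illustrated with a small model of the where clause: each row needs two lookups from an attribute value to a node, which is cheap only if an index exists. The sketch below uses a plain map in place of the DEX attribute index and the phone numbers from the Quick .csv File section. The node identifiers, the class EdgeResolutionSketch and the choice to throw when a value matches no node are assumptions made for illustration, not documented DEX behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of tail/head resolution during an edge load.
class EdgeResolutionSketch {
    /** For each {caller, called} row, returns the pair {tailId, headId}. */
    static List<long[]> resolve(Map<String, Long> phoneIndex, List<String[]> rows) {
        List<long[]> edges = new ArrayList<>();
        for (String[] row : rows) {
            Long tail = phoneIndex.get(row[0]); // where tail 'caller' = 'person'.'phone'
            Long head = phoneIndex.get(row[1]); // head 'called' = 'person'.'phone'
            if (tail == null || head == null) {
                // Assumption of this sketch: an unmatched value is treated as an error.
                throw new IllegalStateException(
                        "no person with phone " + (tail == null ? row[0] : row[1]));
            }
            edges.add(new long[] {tail, head});
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, Long> phoneIndex = new HashMap<>(); // stands in for the index on phone
        phoneIndex.put("555312111", 1L); // John Bronson (ids are invented)
        phoneIndex.put("555192939", 2L); // William "Will" Thomson
        phoneIndex.put("555443322", 3L); // Maria Garudo

        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"555192939", "555443322"});
        rows.add(new String[] {"555312111", "555192939"});

        for (long[] edge : resolve(phoneIndex, rows)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
        // prints "2 -> 3" and then "1 -> 2"
    }
}
```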

2.5. Schema Update

The schema update commands update the schema of a graph pool. Currently it is possible to remove node types, edge types and attributes. The indexing of a node attribute can also be modified.

            DROP (NODE|EDGE|ATTRIBUTE) name

            INDEX node_type_name.attribute_name [INDEXED|UNIQUE|BASIC]
        
Examples
            drop edge REFS

            drop node 'TITLES'

            drop attribute PEOPLE.DNI

            index PEOPLE.NAME indexed

            index CAR.ID unique
        

2.6. DEX Script Conventions

Some conventions apply to all commands. These conventions are the following:

3. Running a DEX Script

DEX scripts (.des files) are executed either from the command line or inside a Java application through the API.

3.1. Running from Command Line

The DEX Script API provides a class with an executable main method that runs a DEX script file. This command line tool requires one argument: the path and name of the .des file. The command line syntax is the following:

            java edu.upc.dama.dex.script.ScriptParser file.des
        

This tool requires the whole DEX environment, including the DEX classpath and the DEX JNI native libraries.
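
One way to supply both requirements is to pass them explicitly when the JVM is launched. The following minimal sketch builds such a command line. It assumes that DEX loads its native library through the standard JNI lookup, which consults the java.library.path system property. The jar name lib/dex.jar and the directory lib/native are hypothetical placeholders, as is the class RunScriptSketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher helper: assembles the arguments for running one script.
class RunScriptSketch {
    /** Returns the argument list for "java -cp ... -Djava.library.path=... ScriptParser script". */
    static List<String> command(String classpath, String nativeLibDir, String script) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);                              // where the DEX classes live
        cmd.add("-Djava.library.path=" + nativeLibDir);  // where JNI libraries are searched
        cmd.add("edu.upc.dama.dex.script.ScriptParser");
        cmd.add(script);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder paths: replace them with the locations of a real DEX installation.
        List<String> cmd = command("lib/dex.jar", "lib/native", "calls-schema.des");
        System.out.println(String.join(" ", cmd));
        // To launch it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```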

3.2. Running from Java Application

DEX provides an API to run DEX scripts from Java applications. This package provides the {@link edu.upc.dama.dex.script.ScriptParser} class, which executes DEX script files and streams.
There are two ways to execute DEX scripts through the API: calling main or constructing a parser. The first reuses the {@link edu.upc.dama.dex.script.ScriptParser#main(String[])} operation that is executed from the command line. It receives a string array with one element, the file name. An example is the following:

                import edu.upc.dama.dex.script.ScriptParser;
                class Example1 {
                    public static void main(String[] args) throws Exception
                    {
                        ScriptParser.main(new String[] {"file.des"});
                    }
                }
            
This approach can only execute files that exist on the user's file system, and it uses a new DEX instance. The second way is to use an instance of {@link edu.upc.dama.dex.script.ScriptParser}. Its constructor receives a DEX object and a Reader object that supplies the DEX script. Once the parser is constructed, a call to {@link edu.upc.dama.dex.script.ScriptParser#parse(boolean)} executes the script from the reader against the DEX object. An example is the following:
                import java.io.StringReader;
                import edu.upc.dama.dex.core.DEX; // assumed package of the DEX class
                import edu.upc.dama.dex.script.ScriptParser;
                class Example2 {
                      public static void main(String args[]) throws Exception {
                            DEX dex = new DEX();
                            StringReader reader= new StringReader(
                                    "create dbgraph People into 'people.dex' "+
                                            "create node 'person' ( 'name' string )"
                                    );
                            ScriptParser ps = new ScriptParser(dex, reader);
                            ps.parse(true);
                            dex.close();
                      }
                }
            

@see edu.upc.dama.dex.io