DEX
There are two different licenses:
- Academic license:
DEX is available for personal and academic use at the
DEX home page. This
version limits the amount of data it can handle.
Specifically, it is limited to one million nodes for the
persistent graph database and to a single user session.
License details are shown here.
- Business license:
Contact us to
obtain an unrestricted business license.
A license code is required; see the
configuration section.
DEX requires:
- Java Runtime Environment 5.0 or higher (the
Sun Microsystems JVM is recommended)
- Operating system: Windows or Linux, 32-bit or 64-bit,
or Mac OS X Snow Leopard.
Just add jdex.jar to the Java classpath. For example, to run
the Java program myprog with the DEX library
installed in the path ./dex:
$JAVA_HOME/bin/java -cp ./dex/jdex.jar myprog
DEX is a high-performance graph database that allows for the
efficient storage and management of very large graphs.
It is implemented as a Java library for persistent graph-like data manipulation
and querying, and it fulfills the
conditions of a graph database model since (i) its data representation
is in the form of a large graph, (ii) the query operations are based
on graph operations or extensions to graph operations, (iii) query
results are also in the form of new graphs, and (iv) there are
constraints based on node and edge types, explicit and implicit relationships,
and attribute domains.
The basic logical data structure in DEX is a labeled and
directed attributed multigraph, where:
- Labeled graph
- A graph which has a label for each node and edge which denotes the
node or edge type.
- Directed graph
- A graph which allows for edges with a fixed direction from the
tail or source node to the head or destination node.
- Attributed graph
- A graph which allows a variable list of attributes for each node or
edge, where an attribute is a value associated to a string which
identifies the attribute.
- Multigraph
- A graph which allows multiple edges between two nodes even if those
edges have the same label (that is, they belong to the same edge
type) and direction.
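The four properties above can be sketched with a small in-memory model. This is only an illustration of the logical data structure, not the DEX implementation; every class and method name in it (MultigraphSketch, addNode, addEdge, countEdges) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model only; none of these names belong to the DEX API.
class MultigraphSketch {
    /** An edge with a label (its type), a direction (tail -> head) and attributes. */
    static final class Edge {
        final String label;
        final long tail;
        final long head;
        final Map<String, Object> attributes = new HashMap<String, Object>();

        Edge(String label, long tail, long head) {
            this.label = label;
            this.tail = tail;
            this.head = head;
        }
    }

    private final Map<Long, String> nodeLabels = new HashMap<Long, String>();
    private final List<Edge> edges = new ArrayList<Edge>();
    private long nextNodeId = 1;

    /** Labeled graph: every node carries the label of its type. */
    long addNode(String label) {
        long id = nextNodeId++;
        nodeLabels.put(id, label);
        return id;
    }

    /** Multigraph: edges live in a list, so parallel edges are kept. */
    Edge addEdge(String label, long tail, long head) {
        Edge e = new Edge(label, tail, head);
        edges.add(e);
        return e;
    }

    /** Directed graph: (tail, head) and (head, tail) are counted separately. */
    int countEdges(String label, long tail, long head) {
        int n = 0;
        for (Edge e : edges) {
            if (e.label.equals(label) && e.tail == tail && e.head == head) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        MultigraphSketch g = new MultigraphSketch();
        long p1 = g.addNode("PERSON");
        long p3 = g.addNode("PERSON");
        // Attributed graph: each edge holds its own attribute values.
        g.addEdge("PHONES", p1, p3).attributes.put("WHEN", "4pm");
        g.addEdge("PHONES", p1, p3).attributes.put("WHEN", "5pm");
        System.out.println(g.countEdges("PHONES", p1, p3)); // prints 2
        System.out.println(g.countEdges("PHONES", p3, p1)); // prints 0
    }
}
```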
A {@link edu.upc.dama.dex.core.DEX} instance is a graph database management
system that manages one or more graph databases. Each graph database
is managed by an instance of the {@link edu.upc.dama.dex.core.GraphPool}
class which is responsible for all the memory and I/O management of a
graph database.
The persistent graph database is stored in a single DEX data file.
Temporary data generated during the activity of a DEX application,
such as {@link edu.upc.dama.dex.core.RGraph}s
or {@link edu.upc.dama.dex.core.Objects}, is stored in a different
temporary file which is destroyed when the
{@link edu.upc.dama.dex.core.GraphPool} is closed.
Thus, a new empty {@link edu.upc.dama.dex.core.GraphPool} can be
created ({@link edu.upc.dama.dex.core.DEX#create(File, String)}), or an
existing {@link edu.upc.dama.dex.core.GraphPool} can be opened from a
persistent DEX data file ({@link edu.upc.dama.dex.core.DEX#open(File)}).
Moreover, all activity with the graph database must be done through a
user session, which is created with the
{@link edu.upc.dama.dex.core.GraphPool#newSession} method.
A {@link edu.upc.dama.dex.core.Session} keeps and manages all temporary
data, which is exclusive for the session.
All DEX graphs belong to a {@link edu.upc.dama.dex.core.Session} and
are valid only while that {@link edu.upc.dama.dex.core.Session} is
active, that is, until it is closed.
Also, the persistent {@link edu.upc.dama.dex.core.DbGraph} as well as
all temporary {@link edu.upc.dama.dex.core.RGraph}s are specializations
of the {@link edu.upc.dama.dex.core.Graph} class which follows the
logical graph model defined previously.
All nodes and edges (the graph objects) in a {@link edu.upc.dama.dex.core.Graph}
are typed (labeled). Each object (node or edge) type has a unique numeric
identifier and a unique type name.
Each node in a DEX {@link edu.upc.dama.dex.core.Graph} has the
following properties:
- It belongs to a node type.
- It has a unique numeric identifier or OID (object identifier).
- It can have several attributes.
Each edge in a {@link edu.upc.dama.dex.core.Graph} has the
following properties:
- It belongs to an edge type.
- It has a unique numerical identifier or OID (object identifier).
- It relates two nodes.
- It can have several attributes.
There are different kinds of edge types:
- Directed: The edge has a direction from the tail or source node
to the head or destination node.
- Restricted: The edge type restricts the type of the
tail nodes and the type of the head nodes.
- Unrestricted: There is no restriction on the type
of the tail nodes and the type of the head nodes.
- Undirected: Both nodes of an undirected edge type can act
as either the tail or the head (there is no direction).
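The check implied by a restricted edge type can be sketched as follows. This is a minimal illustration under assumed names (RestrictedEdgeTypeSketch, allows, and the PERSON and MOVIE node types are all hypothetical), not the way DEX enforces the restriction.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; these names are not part of the DEX API.
class RestrictedEdgeTypeSketch {
    private final Map<Long, String> nodeTypes = new HashMap<Long, String>();
    private long nextOid = 1;

    long newNode(String nodeType) {
        long oid = nextOid++;
        nodeTypes.put(oid, nodeType);
        return oid;
    }

    /** True if an edge tail -> head satisfies a restriction tailType -> headType. */
    boolean allows(String tailType, String headType, long tail, long head) {
        return tailType.equals(nodeTypes.get(tail))
            && headType.equals(nodeTypes.get(head));
    }

    public static void main(String[] args) {
        RestrictedEdgeTypeSketch g = new RestrictedEdgeTypeSketch();
        long person = g.newNode("PERSON");
        long movie = g.newNode("MOVIE");
        // A hypothetical edge type restricted to PERSON -> MOVIE.
        System.out.println(g.allows("PERSON", "MOVIE", person, movie)); // prints true
        System.out.println(g.allows("PERSON", "MOVIE", movie, person)); // prints false
    }
}
```

An unrestricted edge type corresponds to skipping this check entirely.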
Virtual edges are a special kind of edge which is not materialized.
A virtual edge exists only for navigational purposes. It is defined
between two attributes, which must both belong to the same domain, in such a way
that a virtual edge exists between two nodes if they have the
same value for the attributes which define the virtual edge. In this respect, virtual edges
are similar to a foreign key in the relational model.
Thus, a virtual edge:
- Does not have an OID.
- Cannot have attributes.
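The foreign-key analogy can be made concrete with a small sketch: no edge is stored, and neighbors are computed by matching attribute values at navigation time. The class and method names, and the employee and department example data, are assumptions made for illustration; they are not the DEX API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative model only; these names are not part of the DEX API.
class VirtualEdgeSketch {
    // Values of the two attributes that define the virtual edge, keyed by node OID.
    private final Map<Long, String> tailValues = new HashMap<Long, String>();
    private final Map<Long, String> headValues = new HashMap<Long, String>();

    void setTailAttribute(long oid, String value) { tailValues.put(oid, value); }
    void setHeadAttribute(long oid, String value) { headValues.put(oid, value); }

    /** Nodes whose head attribute equals the tail attribute of the given node. */
    SortedSet<Long> neighbors(long tailOid) {
        SortedSet<Long> result = new TreeSet<Long>();
        String value = tailValues.get(tailOid);
        if (value == null) {
            return result; // no value, so no virtual edge starts here
        }
        for (Map.Entry<Long, String> e : headValues.entrySet()) {
            if (value.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        VirtualEdgeSketch works = new VirtualEdgeSketch();
        works.setTailAttribute(1L, "SALES");  // employee 1, attribute DEPT_NAME
        works.setTailAttribute(2L, "R&D");    // employee 2, attribute DEPT_NAME
        works.setHeadAttribute(10L, "SALES"); // department 10, attribute NAME
        works.setHeadAttribute(11L, "R&D");   // department 11, attribute NAME
        System.out.println(works.neighbors(1L)); // prints [10]
    }
}
```

Because the relationship is derived from values, there is nothing to attach an OID or attributes to, which matches the two restrictions listed above.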
All objects (nodes or edges) in a {@link edu.upc.dama.dex.core.Graph}
can have attributes. An attribute has the following properties:
- It is always defined for a single object type. Two types cannot
share an attribute. The only exception is global attributes.
- It has a unique numerical identifier.
- It has an attribute name that is unique within the type it belongs to, but
two attributes of two different types can have the same name.
- Only node or edge objects belonging to the type
for which the attribute is defined can have values for the
given attribute.
- An attribute is associated to a domain which can be
{@link edu.upc.dama.dex.core.Value#BOOL},
{@link edu.upc.dama.dex.core.Value#INT},
{@link edu.upc.dama.dex.core.Value#LONG},
{@link edu.upc.dama.dex.core.Value#DOUBLE},
{@link edu.upc.dama.dex.core.Value#STRING},
{@link edu.upc.dama.dex.core.Value#TIMESTAMP}, or
{@link edu.upc.dama.dex.core.Value#TEXT}.
- Attributes can be indexed or not. Indexed attributes can be
used in query methods such as select.
When an attribute is not indexed, it can only be used to store and
query the value of a given object or OID.
- A unique integrity constraint can be set for indexed attributes.
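The difference between indexed and non-indexed attributes, and the unique constraint, can be sketched with a value-to-OIDs map kept next to the per-object values. This is an assumed, simplified model of a single INT attribute (AttributeIndexSketch and its methods are hypothetical names), not a description of DEX internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative model of one INT attribute; not the DEX implementation or API.
class AttributeIndexSketch {
    private final Map<Long, Integer> valueByOid = new HashMap<Long, Integer>();
    private final TreeMap<Integer, TreeSet<Long>> oidsByValue; // null when not indexed
    private final boolean unique;

    AttributeIndexSketch(boolean indexed, boolean unique) {
        if (unique && !indexed) {
            throw new IllegalArgumentException("unique requires an indexed attribute");
        }
        this.oidsByValue = indexed ? new TreeMap<Integer, TreeSet<Long>>() : null;
        this.unique = unique;
    }

    void set(long oid, int value) {
        if (oidsByValue != null) {
            TreeSet<Long> owners = oidsByValue.get(value);
            if (unique && owners != null && !owners.contains(oid)) {
                throw new IllegalStateException("value already used: " + value);
            }
            Integer old = valueByOid.get(oid);
            if (old != null) { // drop the previous index entry of this object
                TreeSet<Long> oldOwners = oidsByValue.get(old);
                oldOwners.remove(oid);
                if (oldOwners.isEmpty()) {
                    oidsByValue.remove(old);
                }
            }
            owners = oidsByValue.get(value);
            if (owners == null) {
                owners = new TreeSet<Long>();
                oidsByValue.put(value, owners);
            }
            owners.add(oid);
        }
        valueByOid.put(oid, value);
    }

    /** Always available: read the value of one object. */
    Integer get(long oid) {
        return valueByOid.get(oid);
    }

    /** Only available for indexed attributes: select objects by condition. */
    SortedSet<Long> selectGreaterOrEqual(int value) {
        if (oidsByValue == null) {
            throw new UnsupportedOperationException("attribute is not indexed");
        }
        SortedSet<Long> result = new TreeSet<Long>();
        for (TreeSet<Long> owners : oidsByValue.tailMap(value, true).values()) {
            result.addAll(owners);
        }
        return result;
    }

    public static void main(String[] args) {
        AttributeIndexSketch age = new AttributeIndexSketch(true, false);
        age.set(1L, 18);
        age.set(2L, 30);
        age.set(3L, 25);
        System.out.println(age.selectGreaterOrEqual(25)); // prints [2, 3]
    }
}
```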
Class {@link edu.upc.dama.dex.core.Value} allows for setting or getting
attribute values of nodes and edges.
Global attributes are those defined for all node and edge types.
That is, all node or edge objects (no matter which type they belong to)
can set and get values for that attribute identifier. To do that,
{@link edu.upc.dama.dex.core.Graph#GLOBAL_TYPE} must be used when creating
the attribute.
- Graph instantiation
- {@link edu.upc.dama.dex.core.DEX#create(File,String)}
Creates a {@link edu.upc.dama.dex.core.GraphPool}.
- {@link edu.upc.dama.dex.core.DEX#open(File)}
Opens an existing {@link edu.upc.dama.dex.core.GraphPool}.
- {@link edu.upc.dama.dex.core.GraphPool#newSession}
Creates a new {@link edu.upc.dama.dex.core.Session}.
- {@link edu.upc.dama.dex.core.Session#getDbGraph()}
Gets the {@link edu.upc.dama.dex.core.DbGraph}.
- {@link edu.upc.dama.dex.core.Session#newGraph()}
Creates a temporary {@link edu.upc.dama.dex.core.Graph}.
- Graph schema manipulation
- {@link edu.upc.dama.dex.core.Graph#newNodeType(String)}
Creates a node type.
- {@link edu.upc.dama.dex.core.Graph#newEdgeType(String, boolean)}
Creates an edge type.
- {@link edu.upc.dama.dex.core.Graph#newRestrictedEdgeType(String, int, int)}
Creates an edge type where the tail nodes and head nodes
are restricted to the given node types.
- {@link edu.upc.dama.dex.core.Graph#newVirtualEdgeType(String, long, long)}
Creates a virtual edge type between the two given attributes.
- {@link edu.upc.dama.dex.core.Graph#newAttribute(int, String, short, short)}
Creates an attribute for the given node or edge type and
with the specified domain. The attribute can be indexed or not.
- Data manipulation
- {@link edu.upc.dama.dex.core.Graph#newNode(int)}
Creates a node instance of the given node type.
- {@link edu.upc.dama.dex.core.Graph#newEdge(long, long, int)}
Creates an edge instance between the two given nodes of the
given edge type.
- {@link edu.upc.dama.dex.core.Graph#getAttribute(long, long, edu.upc.dama.dex.core.Value)}
Gets the value of the given node or edge instance and the given
attribute.
- {@link edu.upc.dama.dex.core.Graph#setAttribute(long, long, edu.upc.dama.dex.core.Value)}
Sets or replaces the value of the given node or edge instance
and the given attribute.
- {@link edu.upc.dama.dex.core.Graph#drop(long)}
Drops the given node or edge instance and removes all its
attribute values.
Here is an example of a directed multigraph.
DEX queries are implemented as a combination of low-level graph-oriented
operations, which are highly optimized to get the most out of the data
structures.
These operations can be grouped into different areas.
- {@link edu.upc.dama.dex.core.Graph#select(int)}
It selects the collection of objects of the given type.
- {@link edu.upc.dama.dex.core.Graph#select(long, short, Value)}
It selects the collection of objects which satisfy a condition for the
given attribute and {@link edu.upc.dama.dex.core.Value}. The attribute
must be indexed.
- {@link edu.upc.dama.dex.core.Graph#explode(long, int, short)}
It gets the collection of edge identifiers belonging to an edge type
for which the given node identifier is the tail or head of the obtained
edges. This operation is supported only by materialized edge types,
not by virtual edges, which have no edge identifiers.
- {@link edu.upc.dama.dex.core.Graph#neighbors(long, int, short)}
It gets the collection of node identifiers which are related by means
of the given edge type and where the given node identifier is the
tail or head of the implicit relationships. This operation is supported
by all edge types, materialized or virtual.
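The difference between explode (edge identifiers) and neighbors (node identifiers) can be sketched over a plain edge list. The code below is an assumed in-memory model: NavigationSketch, its Direction enum and the numeric identifiers are hypothetical, and the real methods take a short direction argument whose constants are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative model only; these names are not part of the DEX API.
class NavigationSketch {
    enum Direction { OUT, IN, ANY }

    static final class Edge {
        final long oid;
        final int type;
        final long tail;
        final long head;

        Edge(long oid, int type, long tail, long head) {
            this.oid = oid;
            this.type = type;
            this.tail = tail;
            this.head = head;
        }
    }

    private final List<Edge> edges = new ArrayList<Edge>();

    void addEdge(long oid, int type, long tail, long head) {
        edges.add(new Edge(oid, type, tail, head));
    }

    private static boolean matches(Edge e, long node, int type, Direction dir) {
        if (e.type != type) {
            return false;
        }
        switch (dir) {
            case OUT: return e.tail == node;
            case IN:  return e.head == node;
            default:  return e.tail == node || e.head == node;
        }
    }

    /** Edge OIDs of the given type that touch the node in the given direction. */
    SortedSet<Long> explode(long node, int type, Direction dir) {
        SortedSet<Long> result = new TreeSet<Long>();
        for (Edge e : edges) {
            if (matches(e, node, type, dir)) {
                result.add(e.oid);
            }
        }
        return result;
    }

    /** Node OIDs at the other end of those edges; parallel edges collapse. */
    SortedSet<Long> neighbors(long node, int type, Direction dir) {
        SortedSet<Long> result = new TreeSet<Long>();
        for (Edge e : edges) {
            if (!matches(e, node, type, dir)) {
                continue;
            }
            if (e.tail == node && dir != Direction.IN) {
                result.add(e.head);
            }
            if (e.head == node && dir != Direction.OUT) {
                result.add(e.tail);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        final int phones = 7; // arbitrary type identifier
        NavigationSketch g = new NavigationSketch();
        g.addEdge(104L, phones, 1L, 3L);
        g.addEdge(105L, phones, 1L, 3L);
        g.addEdge(106L, phones, 3L, 2L);
        System.out.println(g.explode(1L, phones, Direction.OUT));   // prints [104, 105]
        System.out.println(g.neighbors(1L, phones, Direction.OUT)); // prints [3]
    }
}
```

In a multigraph the two results differ in size: two parallel edges yield two edge identifiers but a single neighbor.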
The {@link edu.upc.dama.dex.core.Objects} class is used to manage collections
of object (node or edge) identifiers, which are the result
of most query methods. These collections can be traversed by means of
{@link edu.upc.dama.dex.core.Objects.Iterator}s.
Additionally, a user can build their own temporary collections to store object
identifiers. These collections can be updated by adding or removing elements
or by using different methods to combine them.
Collections consume significant resources from main memory and from the
temporary storage. Thus, it is important to close
({@link edu.upc.dama.dex.core.Objects#close()})
unused collections as soon as possible to recover their memory and resources.
- {@link edu.upc.dama.dex.core.Objects#union(edu.upc.dama.dex.core.Objects)}
It performs the union (OR) operation of the two collections.
- {@link edu.upc.dama.dex.core.Objects#intersection(edu.upc.dama.dex.core.Objects)}
It performs the intersection (AND) operation of the two collections.
- {@link edu.upc.dama.dex.core.Objects#difference(edu.upc.dama.dex.core.Objects)}
It performs the difference operation of the two collections.
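The semantics of the three combination operations can be illustrated with ordinary sorted sets of identifiers. This sketch uses java.util.TreeSet as a stand-in for a collection of OIDs and assumes that each operation returns a new collection and leaves its operands untouched; ObjectsSketch is a hypothetical name, not the Objects class.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Stand-in for collections of object identifiers; not the DEX Objects class.
class ObjectsSketch {
    /** OR: identifiers present in a, in b, or in both. */
    static SortedSet<Long> union(SortedSet<Long> a, SortedSet<Long> b) {
        SortedSet<Long> result = new TreeSet<Long>(a);
        result.addAll(b);
        return result;
    }

    /** AND: identifiers present in both a and b. */
    static SortedSet<Long> intersection(SortedSet<Long> a, SortedSet<Long> b) {
        SortedSet<Long> result = new TreeSet<Long>(a);
        result.retainAll(b);
        return result;
    }

    /** Identifiers present in a but not in b. */
    static SortedSet<Long> difference(SortedSet<Long> a, SortedSet<Long> b) {
        SortedSet<Long> result = new TreeSet<Long>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        SortedSet<Long> a = new TreeSet<Long>(Arrays.asList(1L, 2L, 3L));
        SortedSet<Long> b = new TreeSet<Long>(Arrays.asList(2L, 3L, 4L));
        System.out.println(union(a, b));        // prints [1, 2, 3, 4]
        System.out.println(intersection(a, b)); // prints [2, 3]
        System.out.println(difference(a, b));   // prints [1]
    }
}
```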
A {@link edu.upc.dama.dex.core.GraphPool} can manage several temporary
graphs, or {@link edu.upc.dama.dex.core.RGraph}s.
{@link edu.upc.dama.dex.core.RGraph}s can inherit or copy nodes
from another {@link edu.upc.dama.dex.core.Graph} by means of
the {@link edu.upc.dama.dex.core.RGraph#addNode(edu.upc.dama.dex.core.Graph, long, boolean)}
method. When a node is added, its node type and the attributes defined
over its node type are added too. In case of a node from a
{@link edu.upc.dama.dex.core.DbGraph}, the {@link edu.upc.dama.dex.core.RGraph}
will keep only a reference to it,
which is called node inheritance. In case of a node from
another {@link edu.upc.dama.dex.core.RGraph}, the {@link edu.upc.dama.dex.core.RGraph} will make a complete copy of it.
Some runtime configuration parameters can be set for a DEX instance.
This configuration can be set in two different ways:
- Through a DEX properties file.
Have a look at {@link edu.upc.dama.dex.utils.DEXConfig} for
extended info.
- Through an instance of the class {@link edu.upc.dama.dex.core.DEX.Config}.
One of these configuration parameters is the license code.
The user's license restricts some capabilities of the technology, such as the
maximum number of nodes or the maximum number of concurrent sessions.
Some additional modules allow different kinds of tasks to be performed.
There are some methods to get information from a
{@link edu.upc.dama.dex.core.GraphPool}. These methods are:
- {@link edu.upc.dama.dex.core.GraphPool#dumpData(Session, String)}
To get a summary of the data stored in the graph database.
- {@link edu.upc.dama.dex.core.GraphPool#dumpStorage(Session, String)}
To get information about the internal structures of the
graph database.
It is also possible to get some of these
{@link edu.upc.dama.dex.core.GraphPool.Statistics}
by means of the method
{@link edu.upc.dama.dex.core.GraphPool#getStatistics}.
It is possible to export a {@link edu.upc.dama.dex.core.Graph} to different
formats in order to visualize it. To do that it is necessary to implement
the {@link edu.upc.dama.dex.core.Export} interface and call the
{@link edu.upc.dama.dex.core.Graph#export(PrintWriter, edu.upc.dama.dex.core.Export.Type, edu.upc.dama.dex.core.Export)}
method.
Currently, it is possible to export a {@link edu.upc.dama.dex.core.Graph}
to:
- {@link edu.upc.dama.dex.core.Export.Type#YGRAPHML}.
- {@link edu.upc.dama.dex.core.Export.Type#GRAPHML}.
- {@link edu.upc.dama.dex.core.Export.Type#GRAPHVIZ}.
Package {@link edu.upc.dama.dex.script} allows creating and populating
a graph database from CSV files by means of a scripting language.
This way, no code needs to be developed.
Package {@link edu.upc.dama.dex.io} allows easily creating and populating
a graph database from CSV files or relational databases by means of a
set of classes. In this case, it is necessary to develop a piece of
code which uses classes from this package.
Package {@link edu.upc.dama.dex.shell} contains an interactive shell
to query an existing graph database, that is, an image file where a
{@link edu.upc.dama.dex.core.GraphPool} was previously created.
Package {@link edu.upc.dama.dex.algorithms} allows running useful
graph algorithms to solve the following well-known problems:
- {@link edu.upc.dama.dex.algorithms.SinglePairShortestPathBFS}:
to solve the single-pair shortest path problem in unweighted graphs.
- {@link edu.upc.dama.dex.algorithms.SinglePairShortestPathDijkstra}:
to solve the single-pair shortest path problem in weighted graphs.
- {@link edu.upc.dama.dex.algorithms.TraversalBFS}:
to traverse a graph using the BFS algorithm.
- {@link edu.upc.dama.dex.algorithms.TraversalDFS}:
to traverse a graph using the DFS algorithm.
- {@link edu.upc.dama.dex.algorithms.WeakConnectivityDFS}:
to find all the weakly connected components in an undirected
graph.
- {@link edu.upc.dama.dex.algorithms.StrongConnectivityGabow}:
to find all the strongly connected components in a directed
graph.
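As an illustration of the first problem in the list, the following is a textbook breadth-first search for a single-pair shortest path in an unweighted directed graph. It is a generic sketch over an adjacency map and does not show how SinglePairShortestPathBFS is implemented or called; the class name and the adjacency-map representation are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Generic BFS sketch; not the DEX algorithms package.
class BfsShortestPathSketch {
    /** Returns the node sequence of a shortest path, or an empty list if none exists. */
    static List<Long> shortestPath(Map<Long, List<Long>> adjacency, long src, long dst) {
        Map<Long, Long> parent = new HashMap<Long, Long>();
        Deque<Long> queue = new ArrayDeque<Long>();
        parent.put(src, src); // marks src as visited
        queue.add(src);
        while (!queue.isEmpty()) {
            long current = queue.remove();
            if (current == dst) {
                break;
            }
            List<Long> next = adjacency.get(current);
            if (next == null) {
                continue;
            }
            for (Long n : next) {
                if (!parent.containsKey(n)) {
                    parent.put(n, current);
                    queue.add(n);
                }
            }
        }
        if (!parent.containsKey(dst)) {
            return Collections.emptyList();
        }
        LinkedList<Long> path = new LinkedList<Long>();
        long at = dst;
        while (at != src) { // walk back through the parents
            path.addFirst(at);
            at = parent.get(at);
        }
        path.addFirst(src);
        return path;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> adjacency = new HashMap<Long, List<Long>>();
        adjacency.put(1L, Arrays.asList(2L, 3L));
        adjacency.put(2L, Arrays.asList(3L));
        adjacency.put(3L, Arrays.asList(4L));
        System.out.println(shortestPath(adjacency, 1L, 4L)); // prints [1, 3, 4]
    }
}
```

For weighted graphs BFS is not sufficient, which is why a separate Dijkstra-based class is listed.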
DEX has been built using a two-layer architecture, as shown in the
figure above.
- dexcore: Native private core library.
libjdex.so for Linux systems,
libjdex.dylib for MacOSX systems and
jdex.dll for Windows systems.
- jdex: Java public API.
Java applications must be built on top of the JDEX Java public API.
Also, the public library must be added to the Java classpath
(the native library is included in the JDEX Java public library and
automatically loaded from there).
Some well-known databases were loaded for performance evaluation:
- IMDB, data until 2007.
- A synthetic social network model where users, photos, tags,
and groups are related.
- Scopus, data limited to
Information Technology and Spain.
- Wikipedia, data until 2007.
- Xmark, data generated
for three different scale factors.
Benchmark      | #Nodes   | #Edges    | Raw (GB) | DEX (GB) | Load     |
IMDB           | 13338793 | 22259869  | 1.5      | 2.4      | 21m 8s   |
Social network | 90873017 | 663546226 | 11       | 32       | 298m     |
Scopus         | 15107484 | 113294315 | 7        | 8        | 115m 50s |
Wikipedia      | 19490489 | 180230453 | 5.5      | 7.6      | 129m 53s |
Xmark_25       | 4970162  | 9848812   | 1.9      | 2        | 15m      |
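Two derived figures can be read off the table: the ratio between the DEX size and the raw size, and the number of objects (nodes plus edges) loaded per second. Neither figure is reported in the original evaluation; the sketch below only shows the arithmetic, using the IMDB row as input, and the class and method names are hypothetical.

```java
// Arithmetic on the table values only; no new measurements are introduced.
class BenchmarkRatiosSketch {
    /** Size of the DEX image relative to the raw data. */
    static double storageRatio(double rawGb, double dexGb) {
        return dexGb / rawGb;
    }

    /** Nodes plus edges loaded per second of load time. */
    static double objectsPerSecond(long nodes, long edges, long loadSeconds) {
        return (double) (nodes + edges) / loadSeconds;
    }

    public static void main(String[] args) {
        // IMDB row: 13338793 nodes, 22259869 edges, 1.5 GB raw, 2.4 GB DEX, 21m 8s.
        long loadSeconds = 21 * 60 + 8;
        System.out.printf("storage ratio: %.2f%n", storageRatio(1.5, 2.4));
        System.out.printf("objects/s: %.0f%n",
                objectsPerSecond(13338793L, 22259869L, loadSeconds));
    }
}
```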
Create and get a DbGraph, the instance which represents the persistent graph database.
DEX dex = new DEX();
GraphPool gpool = dex.create("C:/image.dex");
Session sess = gpool.newSession();
DbGraph dbg = sess.getDbGraph();
...
...
sess.close();
gpool.close();
dex.close();
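The example above closes the session, the pool and the DEX instance in the reverse order of their creation. A try/finally structure keeps that order even when the work in between fails. The sketch below uses stand-in objects rather than the real classes, so that it runs without the library; Resource, CloseOrderSketch and run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in resources; the real DEX, GraphPool and Session classes are not used here.
class CloseOrderSketch {
    static final List<String> closed = new ArrayList<String>();

    static final class Resource {
        private final String name;

        Resource(String name) {
            this.name = name;
        }

        void close() {
            closed.add(name); // record the order in which resources are released
        }
    }

    /** Opens three nested resources, runs the work, and closes them innermost first. */
    static void run(Runnable work) {
        Resource dex = new Resource("dex");
        try {
            Resource gpool = new Resource("gpool");
            try {
                Resource sess = new Resource("sess");
                try {
                    work.run();
                } finally {
                    sess.close();
                }
            } finally {
                gpool.close();
            }
        } finally {
            dex.close();
        }
    }

    public static void main(String[] args) {
        run(new Runnable() {
            public void run() {
                // queries and updates would go here
            }
        });
        System.out.println(closed); // prints [sess, gpool, dex]
    }
}
```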
Create new node types and their attributes.
After that, create new node objects and set a Value for their attributes.
...
sess.beginTx();
DbGraph dbg = sess.getDbGraph();
int person = dbg.newNodeType("PERSON");
long name = dbg.newAttribute(person, "NAME", STRING);
long age = dbg.newAttribute(person, "AGE", INT);
long p1 = dbg.newNode(person);
dbg.setAttribute(p1, name, "JOHN");
dbg.setAttribute(p1, age, 18);
long p2 = dbg.newNode(person);
dbg.setAttribute(p2, name, "KELLY");
long p3 = dbg.newNode(person);
dbg.setAttribute(p3, name, "MARY");
sess.commitTx();
...
Create new edge (directed and undirected) types and their attributes.
After that, create new edge objects and set a Value for their attributes.
...
sess.beginTx();
DbGraph dbg = sess.getDbGraph();
int friend = dbg.newUndirectedEdgeType("FRIEND");
long since = dbg.newAttribute(friend, "SINCE", INT);
long e1 = dbg.newEdge(p1, p2, friend);
dbg.setAttribute(e1, since, 2000);
long e2 = dbg.newEdge(p2, p3, friend);
dbg.setAttribute(e2, since, 1995);
...
int loves = dbg.newEdgeType("LOVES");
long e3 = dbg.newEdge(p1, p3, loves);
sess.commitTx();
...
...
sess.beginTx();
DbGraph dbg = sess.getDbGraph();
int phones = dbg.newEdgeType("PHONES");
long when = dbg.newAttribute(phones, "WHEN", TIMESTAMP);
long e4 = dbg.newEdge(p1, p3, phones);
dbg.setAttribute(e4, when, 4pm);
long e5 = dbg.newEdge(p1, p3, phones);
dbg.setAttribute(e5, when, 5pm);
long e6 = dbg.newEdge(p3, p2, phones);
dbg.setAttribute(e6, when, 6pm);
sess.commitTx();
...
Select all objects of a specific node type and then iterate over them.
...
sess.beginTx();
DbGraph dbg = sess.getDbGraph();
Objects persons = dbg.select(person);
Objects.Iterator it = persons.iterator();
while (it.hasNext()) {
long p = it.next();
// 'name' is the attribute identifier created earlier, so the value gets its own variable
String personName = dbg.getAttribute(p, name);
}
it.close();
persons.close();
sess.commitTx();
...
Select some objects and combine them.
...
DbGraph dbg = sess.getDbGraph();
...
Objects objs1 = dbg.select(when, >=, 5pm);
...
...
Objects objs2 = dbg.explode(p1, phones, OUT);
...
...
Objects objs = objs1.intersection(objs2);
...
objs.close();
objs1.close();
objs2.close();
...
Software end-user license agreement
===================================
NOTICE TO THE USER: BY COPYING, INSTALLING OR USING THIS SOFTWARE OR PART
OF THIS SOFTWARE, YOU AGREE TO THE TERMS AND CONDITIONS OF THIS AGREEMENT
AS IF IT WERE A WRITTEN AGREEMENT NEGOTIATED AND SIGNED BY YOU. THIS
AGREEMENT IS ENFORCEABLE AGAINST YOU AND ANY OTHER LEGAL PERSON ACTING ON
YOUR BEHALF.
IF, AFTER READING THE TERMS AND CONDITIONS HEREIN, YOU DO NOT AGREE TO
THEM, YOU MAY NOT INSTALL THIS SOFTWARE ON YOUR COMPUTER.
SPARSITY, S.L. (hereinafter "The licensor") IS A SPANISH SPIN-OUT COMPANY
OF UNIVERSITAT POLITECNICA DE CATALUNYA (UPC) AND IS THE EXCLUSIVE
LICENSOR OF ALL THE INTELLECTUAL PROPERTY OF THE SOFTWARE AND ONLY
AUTHORIZES YOU TO USE THE SOFTWARE IN ACCORDANCE WITH THE TERMS SET OUT
IN THIS AGREEMENT.
1. Definitions
==============
"Software" means (a) all the information provided in this agreement,
including but not limited to (i) software files and other computer
information belonging to the licensor or third parties; (ii) written material
and explanatory files ("Documentation"); and (b) any modified versions
and copies of this information, such as improvements, updates and
additions to the information provided to you by the licensor at any time,
insofar as it is not the object of another agreement (collectively,
"Updates").
"User" means the physical or legal person who accepts the terms and
conditions of this agreement.
"Computer" means a computer device that accepts information in digital
or similar form and processes that information to achieve a specific
result based on a sequence of instructions.
2. Intellectual Property
========================
The Software and the authorized copies are property of UPC and Sparsity S.L
is the exclusive licensor. This agreement shall not imply the waiver or
transfer, in whole or in part, of this ownership neither by UPC nor the
licensor. The Software is protected by law, including, but not limited to,
the laws of Spain and other countries on intellectual property, and by the
applicable international agreements.
Except to the extent expressly stipulated here, this agreement does not
grant the user any intellectual property rights over the Software. All
rights not expressly granted are reserved for the licensor.
3. License
==========
The licensor grants the user a free and non-exclusive license to use the
Software. The user may
1) Install and use a copy of the Software for personal evaluation use
2) Develop software using the Software licensed for personal evaluation use
This agreement does not authorize the user to do any of the following:
1) Use or copy the Software in any way other than that specified in
this agreement;
2) Disassemble, decompile or translate the Software, or modify it in
any way other than that permitted by the specific Spanish law on
intellectual property;
3) Sublicense, lease, rent, sell, transfer or transmit his or her
right to the use of the Software.
4) Authorize the total or partial copying of the Software onto the
computer of another natural or legal person;
5) Use the Software as the basis for software developed by third
parties.
Should the user wish to license software developed using the licensed
Software to third parties, he or she should contact the licensor.
4. Limited Guarantee
====================
The licensor does not guarantee the uninterrupted or error-free operation
of a program or that any defects that may occur will be corrected.
The user is responsible for the results obtained through the use of
the software.
5. Liability
============
The licensor shall not be liable for any claims for damages, including
partial or total noncompliance, negligence, falsification or any other
claim
6. Authorization for the use of personal data
=============================================
The user accepts that the licensor uses their personal data for the
following purposes only:
- Communication on DEX updates
- Information about the licensor courses, events and any other licensor
product updates
In order to withdraw the authorization, the user shall send an e-mail to
the following e-mail address: info@sparsity-technologies.com
7. Other Provisions
===================
7.1 No legally recognized rights of the user shall be subject to
elimination or limitation in virtue of this agreement.
7.2 The licensor may terminate this license if the user fails to comply with
the conditions of this agreement. If the licensor decides to terminate the
license, this decision also terminates the authorization of the user to
make use of the Software.
7.3 The user may not bring any action which may arise from this agreement
more than two years after the cause of said action, unless otherwise
established by Spanish law without the possibility of contractual waiver
or limitation.
7.4 This license is indefinite and failure to comply with it by the user
may give rise to legal action by the licensor at any time.
7.5 The licensor shall not be liable for noncompliance with their obligations
when said noncompliance is due to force majeure.
7.6 This agreement is governed by Spanish law and the courts of Barcelona
shall be the authority competent to resolve any conflicts that may arise
from this agreement.
Barcelona, October 1st 2010