org.opentox.ontology.rdf
Class Dataset

java.lang.Object
  extended by org.opentox.error.ErrorSource
      extended by org.opentox.ontology.rdf.RDFHandler
          extended by org.opentox.ontology.rdf.Dataset
All Implemented Interfaces:
java.io.Serializable, IDataset, IProne2Error

public class Dataset
extends RDFHandler
implements java.io.Serializable, IDataset

Author:
OpenTox - http://www.opentox.org, Sopasakis Pantelis, Sarimveis Harry
See Also:
Serialized Form

Field Summary
private static long serialVersionUID
           
 
Fields inherited from class org.opentox.ontology.rdf.RDFHandler
jenaModel
 
Fields inherited from class org.opentox.error.ErrorSource
errorRep
 
Constructor Summary
  Dataset()
          Initializes an empty Dataset object by calling the superclass constructor.
  Dataset(java.io.InputStream in)
          Initializes a new Dataset object given an input stream which can either correspond to a file on the disk or some web resource.
protected Dataset(com.hp.hpl.jena.ontology.OntModel ontological_model)
           
  Dataset(java.net.URI dataset_uri)
          Initializes a Dataset given a URI.
  Dataset(java.net.URL dataset_url)
          Initializes a Dataset object given a URL.
 
Method Summary
static void createRandomDataset(int numOfCompounds, int numOfFeatures, java.io.OutputStream out, java.lang.String Lang)
          Generates a random dataset of prescribed dimensions.
private  weka.core.FastVector getAttributes(java.util.Map<com.hp.hpl.jena.rdf.model.Resource,java.lang.String> featureTypeMap)
          Returns a FastVector of the attributes of the dataset, as used by an Instances object.
 weka.core.Instances getInstaces(java.lang.String target, boolean isClassNominal)
          Encapsulates the data of the RDF document in a weka.core.Instances object, which can be used to build regression and classification models with Weka algorithms.
 weka.core.Instances getInstances(java.lang.String model_id)
          Similar to getInstaces(java.lang.String, boolean) but the generated Instances is constructed with respect to a certain model.
static void main(java.lang.String[] atts)
           
private static java.util.Set<java.lang.String> numericXSDtypes()
          The set of XSD data types that should be cast as numeric.
 Dataset populateDataset(weka.core.Instances predictedData)
           
 java.util.Set<java.lang.String> setOfFeatures()
          Returns the set of features in the dataset.
private static java.util.Set<java.lang.String> stringXSDtypes()
          The set of XSD data types that should be cast as string.
 
Methods inherited from class org.opentox.ontology.rdf.RDFHandler
getClassMemberIteratorFor, getJenaModel
 
Methods inherited from class org.opentox.error.ErrorSource
getErrorRep
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.opentox.interfaces.IProne2Error
getErrorRep
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values
Constructor Detail

Dataset

public Dataset(java.io.InputStream in)
Initializes a new Dataset object given an input stream which can either correspond to a file on the disk or some web resource.

Parameters:
in - The input stream from which the dataset is read.
See Also:
Dataset(java.net.URI), Dataset(java.net.URL)

Dataset

protected Dataset(com.hp.hpl.jena.ontology.OntModel ontological_model)

Dataset

public Dataset(java.net.URI dataset_uri)
Initializes a Dataset given a URI.

Parameters:
dataset_uri - The URI of the dataset.
See Also:
Dataset(java.net.URL), Dataset(java.io.InputStream)

Dataset

public Dataset(java.net.URL dataset_url)
        throws java.net.URISyntaxException
Initializes a Dataset object given a URL.

Parameters:
dataset_url - The URL of the dataset.
Throws:
java.net.URISyntaxException
See Also:
Dataset(java.net.URI), Dataset(java.io.InputStream)
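
Example (sketch):
The declared URISyntaxException suggests that the URL is converted to a java.net.URI internally. That is an assumption; this page does not say so. The JDK-only sketch below shows when such a conversion fails, which may help callers decide whether to validate a URL before passing it in. The class name UrlToUriSketch and the example.org addresses are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class UrlToUriSketch {

    /** Returns true if the given URL text converts cleanly to a java.net.URI. */
    static boolean convertsToUri(String urlText) {
        try {
            // java.net.URL is lenient and accepts characters that are illegal in a URI.
            URL url = new URL(urlText);
            // URL.toURI() is strict and throws URISyntaxException for such characters.
            URI uri = url.toURI();
            return uri != null;
        } catch (MalformedURLException | URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(convertsToUri("http://example.org/dataset/1"));
        // The unescaped space is accepted by URL but rejected by toURI().
        System.out.println(convertsToUri("http://example.org/data set/1"));
    }
}
```

If the address is already known to be a valid URI, the Dataset(java.net.URI) constructor avoids the checked exception altogether.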

Dataset

public Dataset()
Initializes an empty Dataset object by calling the superclass constructor.

See Also:
RDFHandler, Dataset(java.io.InputStream), Dataset(java.net.URI), Dataset(java.net.URL)
Method Detail

setOfFeatures

public java.util.Set<java.lang.String> setOfFeatures()
Returns the set of features in the dataset.

Returns:
the set of all features in the dataset.

getInstaces

public weka.core.Instances getInstaces(java.lang.String target,
                                       boolean isClassNominal)
                                throws java.lang.Exception
Encapsulates the data of the RDF document in a weka.core.Instances object, which can be used to build regression and classification models with Weka algorithms.

Description:
This method generates datasets (as Instances) for use as input to Weka training algorithms.

Characteristics of generated Instances:
The relation name of the generated Instances is the same as the identifier of the dataset. If no identifier is available, the relation name is set to some arbitrary URI. If isClassNominal is set to false, the class attribute is not defined in this method, but it can be set externally (from the method that calls getWekaDataset). If isClassNominal is set to true, the target of the dataset is defined by the first argument of the method (String target).
The attributes of the Instances object coincide with the set of features of the dataset in RDF format.

Specified by:
getInstaces in interface IDataset
Parameters:
target - URI of the target feature of the dataset. It is optional (you may leave it null) if you are going to use the Instances for regression models and isClassNominal is set to false; otherwise you have to specify a valid feature URI.
isClassNominal - Set to true if the class attribute should be considered to be nominal.
Returns:
The Instances object which encapsulates the data in the RDF document.
Throws:
java.lang.Exception
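
Example (sketch):
The relationship between target and isClassNominal described above can be checked before calling getInstaces. The JDK-only sketch below is one possible pre-check, not part of the OpenTox API. It assumes that a "valid feature URI" means a member of the set returned by setOfFeatures(); the class name TargetContractSketch, the method isTargetAcceptable and the example.org URIs are hypothetical.

```java
import java.util.Set;

final class TargetContractSketch {

    /**
     * Hypothetical pre-check mirroring the documented contract of
     * getInstaces(target, isClassNominal).
     */
    static boolean isTargetAcceptable(String target, boolean isClassNominal, Set<String> features) {
        if (target == null) {
            // A null target is documented as allowed only when the class is not nominal.
            return !isClassNominal;
        }
        // Assumption: a valid feature URI is one of the dataset's own features.
        return features.contains(target);
    }

    public static void main(String[] args) {
        Set<String> features = Set.of(
                "http://example.org/feature/1",
                "http://example.org/feature/2");
        System.out.println(isTargetAcceptable(null, false, features));
        System.out.println(isTargetAcceptable(null, true, features));
        System.out.println(isTargetAcceptable("http://example.org/feature/2", true, features));
    }
}
```

In real use, the features argument would presumably come from setOfFeatures() on the same Dataset.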

getInstances

public weka.core.Instances getInstances(java.lang.String model_id)
Similar to getInstaces(java.lang.String, boolean) but the generated Instances is constructed with respect to a certain model.

Specified by:
getInstances in interface IDataset
Parameters:
model_id - Identifier of the model with respect to which the Instances are constructed.
Returns:
Instances for prediction using a given model.

numericXSDtypes

private static java.util.Set<java.lang.String> numericXSDtypes()
The set of XSD data types that should be cast as numeric.

Returns:
the set of XSD datatypes that should be considered as numeric.

stringXSDtypes

private static java.util.Set<java.lang.String> stringXSDtypes()
The set of XSD data types that should be cast as string.

Returns:
the set of XSD datatypes that should be considered as strings.
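
Example (sketch):
The two sets above decide whether a feature's XSD datatype becomes a numeric or a string attribute. Their actual contents are not listed on this page, so the members chosen below are assumptions, as are the class name XsdTypeSketch and the method attributeKind. The sketch only illustrates the lookup idea using the JDK.

```java
import java.util.Set;

final class XsdTypeSketch {

    /** Namespace of the XML Schema datatypes as used in RDF. */
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Assumed membership; the real sets returned by numericXSDtypes() and
    // stringXSDtypes() may differ.
    static final Set<String> NUMERIC_TYPES =
            Set.of(XSD + "double", XSD + "float", XSD + "int", XSD + "integer");
    static final Set<String> STRING_TYPES =
            Set.of(XSD + "string");

    /** Classifies an XSD datatype URI as "numeric", "string" or "unknown". */
    static String attributeKind(String xsdTypeUri) {
        if (xsdTypeUri == null) {
            return "unknown"; // Set.of(...) rejects null lookups, so handle null first
        }
        if (NUMERIC_TYPES.contains(xsdTypeUri)) {
            return "numeric";
        }
        if (STRING_TYPES.contains(xsdTypeUri)) {
            return "string";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(attributeKind(XSD + "double"));
        System.out.println(attributeKind(XSD + "string"));
        System.out.println(attributeKind(XSD + "date"));
    }
}
```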

getAttributes

private weka.core.FastVector getAttributes(java.util.Map<com.hp.hpl.jena.rdf.model.Resource,java.lang.String> featureTypeMap)
Returns a FastVector of the attributes of the dataset, as used by an Instances object.

Parameters:
featureTypeMap -
Returns:
FastVector of Attributes

createRandomDataset

public static void createRandomDataset(int numOfCompounds,
                                       int numOfFeatures,
                                       java.io.OutputStream out,
                                       java.lang.String Lang)
Generates a random dataset of prescribed dimensions.

Parameters:
numOfCompounds - The number of compounds in the dataset.
numOfFeatures - The number of features of the dataset.
out - The output stream to which the dataset is written. Can be System.out (the standard system output), a FileOutputStream or another stream. If set to null, System.out will be used.
Lang - The preferred language of the representation. Choose among "RDF/XML", "RDF/XML-ABBREV", "N-TRIPLE" and "N3".
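
Example (sketch):
The parameter rules above (a null stream falls back to System.out, and Lang is one of four names) can be mirrored on the caller's side. The JDK-only sketch below does that; it is not the library's implementation, and the class name RandomDatasetArgsSketch and its methods are hypothetical. How createRandomDataset itself reacts to an unlisted language is not documented here.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.List;

final class RandomDatasetArgsSketch {

    /** The four language names listed for the Lang parameter. */
    static final List<String> LANGUAGES =
            List.of("RDF/XML", "RDF/XML-ABBREV", "N-TRIPLE", "N3");

    /** Mirrors the documented fallback: a null stream means System.out. */
    static OutputStream resolveOut(OutputStream out) {
        return out != null ? out : System.out;
    }

    /** Returns true if lang is one of the documented language names. */
    static boolean isSupportedLanguage(String lang) {
        // List.of(...) rejects null lookups, so test for null explicitly.
        return lang != null && LANGUAGES.contains(lang);
    }

    public static void main(String[] args) {
        OutputStream buffer = new ByteArrayOutputStream();
        System.out.println(resolveOut(buffer) == buffer);
        System.out.println(resolveOut(null) == System.out);
        System.out.println(isSupportedLanguage("N3"));
        System.out.println(isSupportedLanguage("TURTLE"));
    }
}
```

Writing to a ByteArrayOutputStream rather than System.out is a convenient way to capture a generated dataset in memory, for example to feed it back into Dataset(java.io.InputStream).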

populateDataset

public Dataset populateDataset(weka.core.Instances predictedData)
Specified by:
populateDataset in interface IDataset
Parameters:
predictedData - Instances containing just the predictions.
Returns:
Populated Dataset.

main

public static void main(java.lang.String[] atts)
                 throws java.net.URISyntaxException,
                        java.lang.Exception
Throws:
java.net.URISyntaxException
java.lang.Exception