The ObjectBank class is designed to make it easy to change the format/source
of data read in by other classes and to standardize how data is read in javaNLP
classes. This should make reuse of existing code (by non-authors of the code)
easier because one has to just create a new ObjectBank which knows where to
look for the data and how to turn it into Objects, and then use the new
ObjectBank in the class. This will also make it easier to reuse code for
reading in the same data.
An ObjectBank is a Collection of Objects. These objects are taken
from input sources and then tokenized and parsed into the desired
kind of Object. An ObjectBank requires a ReaderIteratorFactory and an
IteratorFromReaderFactory. The ReaderIteratorFactory is used to get
an Iterator over java.util.Readers which contain representations of
the Objects. A ReaderIteratorFactory resembles a Collection that
takes input sources and dispenses Iterators over java.util.Readers
of those sources. An IteratorFromReaderFactory is used to turn a single
java.util.Reader into an Iterator over Objects. The IteratorFromReaderFactory
splits the contents of the java.util.Reader into Strings and then parses them
into appropriate Objects.
Example Usage:
You have a collection of files in the directory /u/nlp/data/gre/questions. Each file
contains several Puzzle documents which look like:
<puzzle>
<preamble> some text </preamble>
<question> some intro text
<answer> answer1 </answer>
<answer> answer2 </answer>
<answer> answer3 </answer>
<answer> answer4 </answer>
</question>
<question> another question
<answer> answer1 </answer>
<answer> answer2 </answer>
<answer> answer3 </answer>
<answer> answer4 </answer>
</question>
</puzzle>
First you need to build a ReaderIteratorFactory which will provide java.io.Readers
over all the files in your directory:
Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
Next you need to make a IteratorFromReaderFactory which will take the java.io.Readers
vended by the ReaderIteratorFactory, split them up into documents (Strings) and
then convert the Strings into Objects. In this case we want to keep everything
between each set of <puzzle> </puzzle> tags so we would use a BeginEndIteratorFactory.
You would also need to write a class which extends Appliable and whose apply method
converts the String between the <puzzle> </puzzle> tags into Puzzle objects.
public class PuzzleParser implements Appliable {
public Object apply (Object o) {
String s = (String)o;
...
Puzzle p = new Puzzle(...);
...
return p;
Now to build the IteratorFromReaderFactory:
IteratorFromReaderFactory rtif = BeginEndIterator.getFactory("<puzzle>", "</puzzle>", new PuzzleParser());
Now, to create your ObjectBank you just give it the ReaderIteratorFactory and
IteratorFromReaderFactory that you just created:
ObjectBank puzzles = new ObjectBank(rif, rtif);
Now, if you get a new set of puzzles that are located elsewhere and formatted differently
you create a new ObjectBank for reading them in and use that ObjectBank instead with only
trivival changes (or possible none at all if the ObjectBank is read in on a constructor)
to your code. Or even better, if someone else wants to use your code to evaluate their puzzles,
which are located elsewhere and formatted differently, they already know what they have to do
to make your code work for them.