TreePath is like XPath but more general. It can be adapted to any tree
data structure. As in XPath, a tree path expression consists of a
number of steps --
<step1> <step2> <step3>
. Each step selects a set of candidates nodes in the context of the
candidates from the preceding step, the initial context being some
node, usually the root, offered as an inital parameter to the
expression. Optionally, a step has predicates which filter the
candidate set. The nodes that remain after the final step are those
selected by the expression. So, for example,
/foo
would select the
foo
child nodes under the root node and
/foo/bar[@length = 1]
would select all the
bar
nodes that are children of
foo
nodes under the root, but these
bar
nodes must have a length attribute with a value of 1. The sections
below provide more examples and explain the mechanism and usage.
foo
nodes.
a / \ b foo | / | c d e | foo
a
or
e
.
a / \ b foo | / | c d e | foo
foo
nodes under the root.
a / \ b foo | / | c d e | foo
a / \ b foo | / | c d e | foo
In order to adapt
dfh.treepath
to a particular variety of tree, you need to extend the
Forester
class. A
Forester
then makes
Path
objects for you by compiling tree path expressions. If you give a
Path
the some node of a tree, it will select the nodes from the tree
corresponding to your expression.
Forester<Node> f = new MyForester(); Path<Node> p = f.path('//foo/>bar[@quux xor @corge]'); List<Node> l = f.select(root);
Foresters know about your trees. In particular, they know how to find
the children and parent of a node and whether a particular string
matches its “tag” (or tags). Optionally, they may also
know what sort of nodes you'd rather ignore and what, if anything,
constitutes a unique identifier. So, for example, if your nodes are of
class
Node
and have the methods
Node Node.parent()
and
java.util.List<Node> Node.children()
, the following class will give you a
Forester
that can handle your trees.
class MyForester extends Forester<Node> { @Override protected List<Node> children(Node n, Index<Node> i) { return n.children(); } @Override protected boolean hasTag(Node n, String tag) { return n.tag().equals(tag); } @Override protected boolean matchesTag(Node n, Pattern p) { return p.matcher(n.tag()).find(); } @Override protected Node parent(Node n, Index<Node> i) { return n.parent(); } }
If your nodes don't know their parents, you have to override another method:
@Override protected Index<Node> index(Node root) { return new ParentIndex<Node>(root, this); }
ParentIndex
will walk the tree from
root
, recording the parent-child relationship.
If your nodes don't have tags, you can still use tree path expressions, but you'll have to use the wildcard (*) in place of a tag name or pattern.
Foresters are
Path
factories: they compile path expressions into
Path
objects.
Path<Node> p = f.path("id(foo)//bar/*[@att = 1.5][not @quux]")
Once you have a path object -- you should reuse them when possible --
you call its select method, perhaps with an
Index
You need to provide an index if the forester
by itself cannot generate the index from whatever node you give it.
In particular, if your nodes don't know their parent and you use a ParentIndex
,
this index assumes whatever node you give it initially is the root.
If this is false, and in particularly if it is a damaging error, then
you had best generate the true index from the root and use this. Of
course, you will gain some efficiency by reusing indexes as well.
. If you do not provide an index, one will be generated.
List<Node> list1 = p.select(n1), list2 = p.select(n2, index);
An index is an object that caches information about a tree for rapid retrieval. Minimally it caches an appropriate forester and the root node. It may also cache the mapping from child to parent and unique ids to nodes. The minimal index is cheap to construct, so there is little lost in constructing a new one with every selection. If you are doing many matches against the same tree, you can get a single index for it initially using your forester.
Index<Node> i = f.index(root);
Node tests classify nodes into types -- nodes with a particular tag or whose tag matches a particular pattern, zero-width nodes, text nodes, whitespace nodes, etc. A node test has only the node itself to examine, no selection context (other than the tree index). They are used to interpret the steps, but the chief way one will interact with them while using path expressions is in defining categories of nodes to ignore.
NodeTest<Node> t = new NodeTest() { @Override public boolean passes(Node n, Index<Node> i) { return n.someTestOrOther(); } } Forester<Node> f = new MyForester(t); // now all the nodes that pass someTestOrOther are invisible
The following sections explain the theory of tree path expressions (mostly).
The following
dfh.grammar
grammar specification defines the set of valid tree path expressions.
treepath = <path> [ <s> "|" <s> <path> ]* and_cnd = <condition> [ <s> /&|(?<!\/)\band\b(?!\/)/ <s> <condition> ]+ (and_precedence) arg = <treepath> | <literal> | <num> | <attribute> | <attribute_test> | <condition> args = "(" <s> <arg> [ <s> "," <s> <arg> ]* <s> ")" attribute = <aname> [ <args> ]? attribute_test = <attribute> <s> <cmp> <s> <value> | <value> <s> <cmp> <s> <attribute> condition = <term> | <not_cnd> | <or_cnd> | <and_cnd> | <xor_cnd> | <group> first_step = [{segment} <separator>?+ <step> ] group = "(" <s> <condition> <s> ")" not_cnd = /!|(?<!\/)\bnot\b(?!\/)/ <s> <condition> (not_precedence) or_cnd = <condition> [ <s> /\|{2}|(?<!\/)\bor\b(?!\/)/ <s> <condition> ]+ path = <first_step> [ <subsequent_step> ]*+ predicate = "[" <s> [ <signed_int> | <treepath> | <attribute_test> | <condition> ] <s> "]" step = [ <full> | <abbreviated> ] [ <predicate> ]*+ subsequent_step = [{segment} <separator> <step> ] term = <attribute> | <attribute_test> | <treepath> value = <literal> | <num> | <attribute> xor_cnd = <condition> [ <s> /^|(?<!\/)\bxor\b(?!\/)/ <s> <condition> ]+ (xor_precedence) abbreviated = !-[ "//" | "/>" ] [ "." | ".." | <id> ] full = [ <axis> ]?+ <forward> num = <signed_int> | <float> axis = !-[ "//" | "/>" ] <axis_name> "::" float = /[+-]?+/ <int>?+ /\.\d++/ [ /e[+-]?+/i <int> ]?+ forward = <wildcard> | <specific> | <pattern> id = "id(" /(?:[^)\\]|\\.)++/ ")" literal = <squote> | <dquote> signed_int = /[+-]?+/ <int> aname = /@(?:[\p{L}_$]|\\.)(?:[\p{L}_$\p{N}]|[-:](?=[\p{L}_\p{N}])|\\.)*+/ axis_name = /(?>s(?>ibling(?>-or-self)?+|elf)|p(?>receding(?>-sibling)?+|arent)|leaf|following(?>-sibling)?+|descendant(?>-or-self)?+|child|ancestor(?>-or-self)?+)/ cmp = /[<>=]=?|!=/ dquote = /"(?:[^"]|\\.)*+"/ int = /\b(?:0|[1-9][0-9]*+)\b/ pattern = /~(?:[^~\\]|\\.)++~/ s = /\s*+/ separator = /\/[\/>]?/ specific = /[\p{L}_](?:[\p{L}\p{N}_]|[-:](?=[\p{L}_\p{N}])|\\.)*+/ squote = /'(?:[^']|\\.)*+'/ wildcard = "*"
This grammar contains a few conditions; e.g.,
(not_precedence)
. These conditions all govern the precedence of logical operations,
ensuring that only valid parse trees for these operations are
returned. The class that implements this grammar, including the
conditions, is
dfh.treepath.PathGrammar
.
Every tree path expression is a sequence of steps. Each step begins with a collection of nodes and chooses a new set, taking each node in the context set and selecting a new set and filtering it. For example, a step might take a particular context node, select all its children, and then filter these down to those with a particular tag or those with a particular index -- a single node. Some examples:
b
tag.
b
tag.
b
tag, then filter these down to those whose
@foo
property has the value "bar".
b
tag, then drop the first 3, then filter the remainder down to those
whose
@foo
property has the value "bar".
A step consists of a separator --
/
,
//
, or
/>
--, a node type -- e.g.,
b
--, and zero or more predicates.
descendant-or-self
, otherwise it is
descendant
.
<root><a><b><a/></b></a></root>
, the expression
/>a
would select the
a
element immediately under the root but not the nested element -- the
latter element is screened from the root by the first.
This separator is unique to tree path expressions. I have included it because I often find myself wanting to define this sort of path for a tree.
The latter two separator types cannot be followed by an explicit axis specification. The last cannot be followed by the wildcard node type.
Node type expressions select an initial candidate set of nodes. There
are three varieties, all defined relative to a node's
“tag” or tags. A node's tag, in turn, is simply a string
type label. Unlike in XPath, tree path expressions do not assume a
node has a single tag, or any tag, though in the latter case
expressions will be less efficient and one cannot use the
/>
separator.
/a //foo />bar corge ...
Literal node type expressions require that a selected node have precisely the type specified.
/~a~ //~foo~ />~bar~ ~corge~ ...
Pattern node type expressions require only that the pattern -- the bit between the tildes -- match some part of one of the node's tags. The pattern must compile into a Java regular expression or an error will be thrown when the path is compiled.
/* //* *
The wildcard node type expression matches every node.
Each step has an implicit or explicit axis -- a set of candidate nodes
chosen by their graphical relationship to a context node. The default
axis is
child::
-- all the children of the context node. If the separator preceding
the current step is
//
, the understood axis is
descendant::
(or
descendant-or-self::
, if it's the first step in the path) . For a thorough understanding
of axes try Googling "axis xpath". For the most part, the concepts are
the same. There are some differences in the axes available in tree
path expressions, however. Most importantly, tree path expressions
provide a
leaf::
axis, which is the list of nodes at or under the context node that
have no children.
See dfh.treepath.PathGrammar.Axis
for the full list of axes.
Predicates are filters which take a candidate set and eliminate nodes
that fail to pass some test. A step may have zero or more predicates.
A predicate is a function from a node and the other nodes it was
selected with given a particular context node to the values
true
or
false
. If it evaluates to
true
, the node is retained, otherwise it is discarded. As in XPath,
predicate expressions are always enclosed in square brackets.
/a[0] //a[1] />a[2] a[3] ...
Indices are the simplest predicate. They simply pick the node with the
appropriate index from the collection being evaluated. NOTE:
the index is zero-based, unlike in XPath. Why? Because I find
one always uses these path expressions while programming in a language
that uses zero-based indices, and switching back and forth is
confusing, particularly since, unlike in SQL, the formalism *looks*
the same in the expression language and the programming language. In
Java,
a[0]
picks the first element in the
a
array, so one expects
a[0]
to be the expression that picks the first
a
node immediately under the context node. Also note that
//a/b[0]
is going to pick
b
nodes that are the first node of that type immediately under some
a
node -- that is, it will not pick the first such node. The collection
selected from is that picked by one of the context nodes selected by
the preceding step, not the collection of all nodes selected by all
the context nodes together.
One may use negative indices in tree path expressions. If you add a negative index to the size of the collection indexed you get the corresponding positive index, so -1 is the index of the last node in the collection, -2, the index of the second to last node, and so forth. For example in
a /|\ c d e
we have the following correspondences:
a /|\ c d e /a/*[0] or /a/*[-3] a /|\ c d e /a/*[1] or /a/*[-2] a /|\ c d e /a/*[2] or /a/*[-1]
If you overshoot the end of the collection in either direction you simply get no nodes.
/a[foo] //a[//bar] />a[sibling::quux | ancestor::corge] a[/>nef[2]] ...
Evaluates to true if the collection of nodes it selects is not empty.
For the XML document
<root><a/></root>
, the path
/.[a]
will select the root node but
/.[b]
will select nothing.
/a[@foo] //a[@bar(1)] />a[@quux(//corge, @true)] a[@nef('a', 3, -1.5)] ...
Attributes are described in more detail below, but briefly, they are functions from candidate nodes, and perhaps other parameters, to values. The following values all evaluate to false:
Everything else evaluates to
true
.
/a[@foo = @bar] //a[@bar(1) < 2] />a[@quux(//corge, @true) > 'mim'] a[@nef('a', 3, -1.5) == @pick(//twee, 0)] ...
Attribute tests allow you to override the default boolean
interpretation of attributes.And note that
there must be an attribute on one or the other side of the operator
in a test. //a[1 < 2]
and //a[foo = bar]
and the like will not parse as tree path expressions. However, one
can convert anything into an attribute with the @echo
attribute, so this in the end is no limitation. In general there is
little value in predicates like [@echo(1) < 2]
that
evaluate the same way for all nodes, but @echo
lets you
construct them.
Attribute tests always involve a pair of expressions separated by an
operator. The available operators are
equals(Object)
; collections use equality of cardinality; numbers use equality of
value.
=
.
<
but requiring the opposite order.
<
but also returning true if the two arguments cannot be sorted.
>
but also returning true if the two arguments cannot be sorted.
/a[@foo or @bar] //a[not />corge[3]] />a[@quux != 3 ^ @nef] a[not @foo and (//bar or @chuzzle > 1)] ...
All the other predicate types aside from index predicatesOne can use the @index
attribute to
duplicate the functionality of index predicates with attributes. //a[0]
is equivalent to //a[@index = 0]
.
can be combined with boolean operators.
//a[b | c]
will
parse to a different path from //a[b || c]
. In this
case, the two paths are logically equivalent -- both select a
nodes that have either b
or c
nodes as
children, though the first collects all such nodes into one pool
whereas the other collects them into two and returns true if either
pool is non-empty. But consider //a[b | c & d]
versus //a[b || c & d]
. The first will be
equivalent to //a[(b | c) & d]
whereas the second
will be equivalent to //a[b || (c & d)]
.
I have provided both concise and long versions of all the operators to suit varying tastes, but I tend to use the long operators myself to prevent confusion.
In tree path attributes are functions that return some property of a
node in a selection context. For example, there might be an attribute
@tag
that returns the tag value of a node, or
@position
that returns its index in the candidate list it is found inThis is not to say that internally candidate sets
are always instances of java.util.List
. Rather, they are
collections that maintain the order of node discovery. The index of
an item in such a collection is just the number of times one must
iterate over the candidate collection to find it, counting the first
iteration as iteration 0.
.
Attributes are implemented as methods in some
Forester
with a particular signature and bearing the
@Attribute
anotation. For example, here is the implementation of the
@echo
attribute.
@Attribute(description = "converts anything into an attribute") protected final Object echo(N n, Collection<N> c, Index<N> i, Object o) { return o; }
The first three parameters in the method signature are required. The rest can be whatever you want them to be, though they must correspond to one of the types understood by path expressions. Here is the relevant line from the grammar:
arg = <treepath> | <literal> | <num> | <attribute> | <attribute_test> | <condition>
Tree path expressions correspond to a
java.util.Collection
of nodes; literals -- single- or double-quoted strings --, to
String
; nums to
Integer
,
Double
, or just
Number
; attributes to arbitrary objects; and attribute tests and conditions,
to
Boolean
. You can use primitive types in lieu of
Boolean
and so forth. You may also use variadic methods. For example, here is
the implementation of the
@log
attribute:
@Attribute(description = "records parameters to a log stream") protected final Boolean log(N n, Collection<N> c, Index<N> i, Object... msg) { for (Object o : msg) loggingStream.println(o); return Boolean.TRUE; }
Attributes are discovered by reflection, which is complicated code to write and inefficient but gives us incredible flexibility. All attributes in a forester's inheritance tree will be discovered aside from private attributes, and these won't be found even in the terminal class. I could handle accessibility modifiers differently, and perhaps I will in the future, but for now this is how things work. The reflective code gathers attributes the first time an instance of the class is created. This reduces the reflection overhead. Some checking of method signatures is done at this time, but it's best to create unit tests for any attributes you define. Type erasure limits the extensiveness of the signature checking ()as does my laziness).
The base
Forester
class provides a small number of attributes to all extensions:
true
value.
false
value.
null
value.
System.Err
, by default.
/0/1
is the
@uid
of the second child of the first child of the root node.
@pick(/*, 0)
will return the first child of the root node.
//foo
selects 3 nodes, then
@size(//foo)
will return 3.
//foo/bar[@index = 1]
will select the same nodes as
//foo/bar[1]
.
//foo/bar[@echo(1)]
will return 1. This attribute is chiefly useful as a wrapper to
convert anything into an attribute that can be used in an attribute
test.
//*[@leaf & @tsize = 1]
is equivalent to
//*[@leaf]
, while
//*[@leaf & @tsize = 2]
will always return an empty collection.
//*[@leaf & @width = 1]
is equivalent to
//*[@leaf]
, while
//*[@leaf & @width = 2]
will always return an empty collection.
//*[@root & @depth = 0]
is equivalent to
//*[@root]
(which is equivalent to the more efficient expression
/.
) , while
//*[@root & @depth > 0]
will always return an empty collection.
@height
of a leaf is 1. The height of
a
in
a / \ b foo | / | c d e | foois 4.
The
FunctionalForester
class is a library of additional attributes, many of them copied from
XPath. A forester that extends
FunctionalForester
,
MatchPath
, for instance, gets access to all these attributes.
@Attribute
Parameters
An attribute is created by annotating a method in a
Forester
with the
dfh.treepath.Attribute
annotation, optionally providing a name and/or a description.
@Attribute String foo(Node n, Collection<Node> c, Index<Node> i) { return "foo"; } @Attribute("short") String reallyReallyLongMethodName(Node n, Collection<Node> c, Index<Node> i, int j) { return "short" + j; } @Attribute(value = "longer", description = "has a longer name than short") String alsoLong(Node n, Collection<Node> c, Index<Node> i, String... bar) { StringBuilder b = new StringBuilder(); for (String s: bar) b.append(s); return b.toString(); } @Attribute(description = "a roundabout way of putting a number into an expression") Number num(Node n, Collection<Node> c, Index<Node> i, Number num) { return num; }
value
The
value
parameter is useful to avoid namespace collisions, use Java keywords
like
this
as attribute names, and dodge Java naming conventions. If no
value
parameter is given the attribute name will be the method name.
description
The
description
provides a bit of metadata about the attribute. This is visible in the
code and javadocs, where it is largely redundant, but it is also
exposed through the
Forester.attributes()
method. The latter returns an alphabetized map from the names of
attributes available to the forester to two-member arrays of
meta-data: the attribute description and an abbreviated description of
the corresponding method signature. The main method of
dfh.treepath.MatchPath
uses this method to summarize all the attributes available to that
class:
public static void main(String[] args) { int length = 0; for (String s : MatchPath.standard().attributes().keySet()) length = Math.max(length, s.length()); String format = "@%-" + length + "s : %s%n%" + length + "s %s%n"; for (Entry<String, String[]> e : MatchPath.standard().attributes() .entrySet()) { System.out.printf(format, e.getKey(), e.getValue()[0], "", e.getValue()[1]); } }returns
@assertion : whether node is an assertion public boolean dfh.treepath.MatchPath.assertion(...) @depth : the number of steps between node and root protected int dfh.treepath.Forester.depth(...) @echo : converts anything into an attribute protected final Object dfh.treepath.Forester.echo(...,Object) @explicit : whether the rule generating this node is explicitly defined public boolean dfh.treepath.MatchPath.explicit(...) @false : the false value protected final Boolean dfh.treepath.Forester.False(...) @group : group matched public String dfh.treepath.MatchPath.group(...) @height : the longest path between node and a leaf; 1 if node is a leaf protected int dfh.treepath.Forester.height(...) @id : the nodes string id, if any protected String dfh.treepath.Forester.id(...) @index : the index of the context node in the context collection protected int dfh.treepath.Forester.index(...) @label : rule generating match public String dfh.treepath.MatchPath.label(...) @leaf : whether the context node is a leaf protected boolean dfh.treepath.Forester.isLeaf(...) @length : length of group matched public int dfh.treepath.MatchPath.length(...) @log : records parameters to a log stream protected final Boolean dfh.treepath.Forester.log(...,Object[]) @m:abs : absolute value protected Number dfh.treepath.FunctionalForester.abs(...,Number) @m:ceil : round up protected Number dfh.treepath.FunctionalForester.ceil(...,Number) @m:floor : round down protected Number dfh.treepath.FunctionalForester.floor(...,Number) @m:int : integral portion protected Integer dfh.treepath.FunctionalForester.integralPortion(...,Number) @m:max : maximum value protected Number dfh.treepath.FunctionalForester.max(...,Number[]) @m:min : minimum value protected Number dfh.treepath.FunctionalForester.min(...,Number[]) @m:prod : product of values protected Double dfh.treepath.FunctionalForester.product(...,Number[]) @m:round : round to nearest whole number protected Number dfh.treepath.FunctionalForester.round(...,Number) @m:sum : sum of values protected Double dfh.treepath.FunctionalForester.sum(...,Number[]) @null : the null value protected final Object dfh.treepath.Forester.Null(...) @pick : picks a node from a collection protected final Object dfh.treepath.Forester.pick(...,java.util.Collection,int) @root : whether the node is root protected boolean dfh.treepath.Forester.isRoot(...) @s:cmp : compare string order protected Integer dfh.treepath.FunctionalForester.compare(...,String,String) @s:concat : string concatenating items protected String dfh.treepath.FunctionalForester.concatenate(...,Object[]) @s:contains : whether the string contains a particular infix protected boolean dfh.treepath.FunctionalForester.contains(...,String,String) @s:ends-with : whether the string parameter has a particular suffix protected boolean dfh.treepath.FunctionalForester.endsWith(...,String,String) @s:find : look for pattern in string protected boolean dfh.treepath.FunctionalForester.find(...,String,String) @s:index : the index of an infix in the string protected int dfh.treepath.FunctionalForester.index(...,String,String) @s:join : concatenate items with separator protected String dfh.treepath.FunctionalForester.join(...,Object,Object[]) @s:lc : lowercase protected String dfh.treepath.FunctionalForester.lowercase(...,String) @s:len : string length protected Integer dfh.treepath.FunctionalForester.length(...,String) @s:looking-at : match prefix protected boolean dfh.treepath.FunctionalForester.lookingAt(...,String,String) @s:matches : whether the string parameter matches a pattern protected boolean dfh.treepath.FunctionalForester.matches(...,String,String) @s:nspace : normalize whitespace protected String dfh.treepath.FunctionalForester.normalizeWhitespace(...,String) @s:replace : replace all occurrences of infix protected String dfh.treepath.FunctionalForester.replace(...,String,String,String) @s:replace-all : replace all occurrences of pattern protected String dfh.treepath.FunctionalForester.replaceAll(...,String,String,String) @s:replace-first : replace first occurrence of pattern protected String dfh.treepath.FunctionalForester.replaceFirst(...,String,String,String) @s:starts-with : whether the string parameters as a particular prefix protected boolean dfh.treepath.FunctionalForester.startsWith(...,String,String) @s:substr : select substring protected String dfh.treepath.FunctionalForester.substring(...,String,Integer[]) @s:trim : trim whitespace protected String dfh.treepath.FunctionalForester.trim(...,String) @s:uc : uppercase protected String dfh.treepath.FunctionalForester.uppercase(...,String) @s:ucfirst : capitalize only first letter protected String dfh.treepath.FunctionalForester.ucFirst(...,String) @size : the size of the node collection protected int dfh.treepath.Forester.size(...,java.util.Collection) @this : the context node protected final Object dfh.treepath.Forester.This(...) @true : the true value protected final Boolean dfh.treepath.Forester.True(...) @tsize : the size in nodes of the tree rooted at node protected int dfh.treepath.Forester.tsize(...) @u:def : whether the parameter value is non-null protected final Boolean dfh.treepath.FunctionalForester.defined(...,Object) @u:millis : current time in milliseconds protected Long dfh.treepath.FunctionalForester.millis(...) @uid : unique id of context node representing its position in its tree protected final String dfh.treepath.Forester.uid(...) @width : the number of leaves under node; 1 if node is a leaf protected int dfh.treepath.Forester.width(...) @zero : whether node is zero-width public boolean dfh.treepath.MatchPath.zero(...)
The ellipsis represents arguments you do not provide in tree path expressions. A final array represents a varargs argument.
Rationale
The rationale behind the
attributes()
method and the
description
parameter of
@Attribute
that feeds it is just that attributes are not easily discovered in
javadocs or code, particularly when they may be added ad hoc, but they
are particularly important to tree path expressions. So I made
attributes self-documenting.
If you wish to define a particular attribute you can add it to an anonymous inner class:
Forester<N> f = new MyForester() { @Attribute boolean foo(N n, Collection<N> c, Index<N> i) { return bar(n) > quux(n); } }; Path<N> p = f.path(//*[@foo]);
This is useful mostly for rapid exploratory hacks. Once you find something that works, you can move your ad hoc attributes into your forester class or an extension.
The typical way to provide attributes to an extension of tree path is
to implement them in a class extending
Forester
. This isn't done for any principled reason arising from the logic of
tree path expressions, however, but because Java's implementation of
inheritance and reflection make this easy. It would be more natural to
put attributes into libraries and load them into a Forester class as
needed. This mixin structure and Java are an uncomfortable fit, but I
have implemented it as an experimental improvement of TreePath.
A library of attributes to be mixed in orthogonally to the inheritance
hierarchy must extend the class
AttributeLibrary
, must be public, and must have a public zero-argument constructor.
For example:
public class TestLibrary extends AttributeLibrary<Node> { public TestLibrary() { super(); } @Attribute String foo(Node e, Collection<Node> c, Index<Node> i) { return "foo"; } }
To load the attributes provided by this library in a class, you call
Forester.mixin(Class<? extends
AttributeLibrary<N>>...)
. You can load libraries in at any time, but if you want the new
attributes to persist through serialization it is recommended that you
add them in by overriding the
init
method like so:
@Override protected void init() { if (attributes == null) { super.init(); mixin(Library1.class, Library2.class); } }
Note that this method can be used only to supplement the attributes
obtained from the class hierarchy. If
MyForester
provides attribute
@foo
and we try to mix an identically named method in from a library, the
native
@foo
will win out.
The only dependency of this library is my
dfh.grammar
library, which has no dependencies.
All the classes in this library are thread safe.
All the classes in this library aside from
Index
are serializable, though you have to take care if you use mixins.
I use my grammar library,
dfh.grammar
, a lot. Generally when I match something with a grammar I'm not
satisfied simply to know a match is possible, what was matched, or
where the match starts and ends: I want the entire parse tree, and
given the tree, I want to find particular parts and do things with
them. If I'm parsing a date, I want to find the year part and the
month part and so on. If I'm parsing an email address, I want to
separate the user name from the domain and I want to know the
subdomains. For many purposes the methods provided by the match object
itself suffice, but often enough I found myself writing the same sort
of picky code that basically drilled down a particular path in a
particular sort of parse tree. If this were XML I'd use XPath. If it
were HTML I'd use CSS (or perhaps XPath after converting it to XHTML).
So I thought I'd make something like XPath for my parse trees. I
didn't want it to be precisely the same as XML, however, because my
parse trees weren't precisely like XML trees. The notion of
“attribute” wasn't native to the problem, nor were
namespaces. While I was thinking about this it occurred to me that I
could make a generic library that could handle any sort of tree,
adapting the attribute idea to represent callbacks.
Having started along this path, I figured I would add features to the
expression language that I often wanted: an easy way to obtain leaves
or nodes near to a particular node but not screened from it by other
nodes of the same type -- what the
/>foo
expression represents. I found I could make it trivial to adapt the
library to a new sort of tree and trivial to add attributes, both of
which were very nice. So here we are.
I may add mathematical expressions to the things that can occur in predicates.
If I find the time I might implement the tree path specification in other languages, starting with Perl. I find this useful. I'd like someone else to as well. If I make it available to someone who prefers to work in another language, particularly one that facilitates code sharing as Perl does, then I increase the odds that someone other than myself will use it.