You need to parse an XML document into an object graph, and you would like to avoid using either the DOM or SAX APIs directly.
Use the Commons Digester to transform an XML document into an object graph. The Digester lets you map the structure of an XML document to an object model using an external XML file. That file contains a set of rules telling the Digester what to do when it encounters specific elements. In this recipe, the following XML document, which describes a play, will be parsed into an object graph:
<?xml version="1.0"?>
<plays>
  <play genre="tragedy" year="1603" language="english">
    <name>Hamlet</name>
    <author>William Shakespeare</author>
    <summary>
      Prince of Denmark freaks out, talks to ghost, gets into a
      crazy nihilistic funk, and dies in a duel.
    </summary>
    <characters>
      <character protagonist="false">
        <name>Claudius</name>
        <descr>King of Denmark</descr>
      </character>
      <character protagonist="true">
        <name>Hamlet</name>
        <descr>
          Son to the late, and nephew of the present king
        </descr>
      </character>
      <character protagonist="false">
        <name>Horatio</name>
        <descr>
          friend to Hamlet
        </descr>
      </character>
    </characters>
  </play>
</plays>
This XML document contains a list of play elements describing plays by William Shakespeare. One play element describes "Hamlet"; it includes name, author, and summary elements, as well as a characters element containing character elements that describe the characters in the play. After a document is parsed with Digester, each play element will be represented by a Play object with a set of properties and a List of Character objects:
import java.util.ArrayList;
import java.util.List;

public class Play {
    private String genre;
    private String year;
    private String language;
    private String name;
    private String author;
    private String summary;
    private List characters = new ArrayList( );

    // accessors omitted for brevity

    // Add method to support adding elements to characters.
    public void addCharacter(Character character) {
        characters.add( character );
    }
}

// Note: inside package xml.digester, this class shadows java.lang.Character.
public class Character {
    private String name;
    private String description;
    private boolean protagonist;

    // accessors omitted for brevity
}
The Digester maps XML to objects using a set of rules. These rules can be defined either in an XML file or programmatically, by creating instances of Rule and adding them to an instance of Digester. This recipe uses an XML file to create a set of rules that tell the Digester how to translate an XML document to a List of Play objects:
<?xml version="1.0"?>
<digester-rules>
  <pattern value="plays/play">
    <object-create-rule classname="xml.digester.Play"/>
    <set-next-rule methodname="add" paramtype="java.lang.Object"/>
    <set-properties-rule/>
    <bean-property-setter-rule pattern="name"/>
    <bean-property-setter-rule pattern="summary"/>
    <bean-property-setter-rule pattern="author"/>

    <!-- Nested Pattern for Characters -->
    <pattern value="characters/character">
      <object-create-rule classname="xml.digester.Character"/>
      <set-next-rule methodname="addCharacter"
                     paramtype="xml.digester.Character"/>
      <set-properties-rule/>
      <bean-property-setter-rule pattern="name"/>
      <bean-property-setter-rule pattern="descr"
                                 propertyname="description"/>
    </pattern>
  </pattern>
</digester-rules>
This mapping document (or rule set) can be explained in very straightforward language. It tells Digester how to deal with the document: "When you see an element matching the pattern plays/play, create an instance of xml.digester.Play and push it onto a Stack (object-create-rule), add it to the object beneath it on the Stack (set-next-rule), and set some properties. If you encounter an element within a play element that matches characters/character, create an instance of xml.digester.Character, set some properties, and add it to the Play object." The following code creates an instance of Digester from the XML rule set shown previously, producing a plays List that contains one Play object:
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.xmlrules.DigesterLoader;

List plays = new ArrayList( );

// Create an instance of the Digester from the XML rule set
// (resolved relative to this class's package)
URL rules = getClass( ).getResource("play-rules.xml");
Digester digester = DigesterLoader.createDigester(rules);

// Push a reference to the plays List onto the Stack
digester.push(plays);

// Parse the XML document; parse( ) throws IOException and SAXException
InputStream input = getClass( ).getResourceAsStream("plays.xml");
Object root = digester.parse(input);  // root is the plays List pushed above

// The XML document contained one play, "Hamlet"
Play hamlet = (Play) plays.get(0);
List characters = (List) hamlet.getCharacters( );
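Nested pattern elements in the rule file are relative to the pattern that encloses them. The bean-property-setter-rule with pattern="name" inside plays/play therefore applies to plays/play/name, while the one inside characters/character applies to plays/play/characters/character/name. The sketch below makes that resolution visible. It walks a rule file with the JDK's DOM API and lists the full pattern each rule element ends up attached to. RulePatternFlattener is a hypothetical helper written only for this illustration. It is not part of Digester and not DigesterLoader's own code, and it assumes only the pattern and pattern-attribute conventions used in play-rules.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper (not part of Digester): lists the full pattern each rule
// element in a digester-rules document applies to, as "pattern -> rule".
class RulePatternFlattener {
    static List<String> flatten(String rulesXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rulesXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            walk(root, "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not read rule set", e);
        }
    }

    // Joins a parent pattern and a relative pattern with "/".
    private static String join(String prefix, String relative) {
        return prefix.isEmpty() ? relative : prefix + "/" + relative;
    }

    private static void walk(Element parent, String prefix, List<String> out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;  // skip whitespace text and comments
            }
            Element e = (Element) n;
            if (e.getTagName().equals("pattern")) {
                // A nested <pattern> extends the enclosing pattern
                walk(e, join(prefix, e.getAttribute("value")), out);
            } else {
                // A rule element may add its own relative pattern attribute;
                // getAttribute returns "" when the attribute is absent
                String own = e.getAttribute("pattern");
                String full = own.isEmpty() ? prefix : join(prefix, own);
                out.add(full + " -> " + e.getTagName());
            }
        }
    }
}
```

For example, given the full play-rules.xml, the bean-property-setter-rule that fills in description would be reported under plays/play/characters/character/descr.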
Digester is simple, but there is one concept you need to understand: Digester uses a Stack to relate objects to one another. In the previous example, set-next-rule tells the Digester to relate the object on top of the Stack to the object next to the top of the Stack. Before the XML document is parsed, a List is pushed onto the Stack. Every time the Digester encounters a play element, it creates an instance of Play and pushes it onto the top of the Stack. When the end of that play element is reached, it calls add( ) on the object next to the top of the Stack, passing the Play as the argument. Since the List is next to the top of the Stack, the Digester is simply adding the Play to the plays List. Within the pattern element matching plays/play, there is another pattern element matching characters/character. Because it is nested, it matches characters/character elements inside a play, that is, plays/play/characters/character. When such an element is encountered, a Character object is created and pushed onto the top of the Stack, and the addCharacter( ) method is called on the object next to the top of the Stack. While the Character object is on top of the Stack, the Play object is next to the top, so the call to addCharacter( ) adds a Character to the List of Character objects in the Play object.
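To make the stack mechanics concrete, the following sketch replays by hand the pushes, pops, and method calls that the rules perform for one play element containing one character element. A java.util.ArrayDeque stands in for Digester's internal stack. SketchPlay and SketchCharacter are hypothetical stand-ins trimmed down from Play and Character, and nothing here calls Digester itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for Play and Character, reduced to what this sketch uses.
class SketchPlay {
    String name;
    final List<SketchCharacter> characters = new ArrayList<>();
    void addCharacter(SketchCharacter character) { characters.add(character); }
}

class SketchCharacter {
    String name;
}

class StackWalkthrough {
    // Replays by hand the stack operations for one <play> holding one <character>.
    static List<Object> run() {
        Deque<Object> stack = new ArrayDeque<>();
        List<Object> plays = new ArrayList<>();
        stack.push(plays);                                  // like digester.push(plays)

        // <play> starts: object-create-rule pushes a new Play
        stack.push(new SketchPlay());
        // <name> inside <play>: bean-property-setter-rule sets a property on the top object
        ((SketchPlay) stack.peek()).name = "Hamlet";

        // <character> starts: object-create-rule pushes a new Character,
        // so the Play is now next to the top of the stack
        stack.push(new SketchCharacter());
        ((SketchCharacter) stack.peek()).name = "Horatio";

        // </character>: the Character leaves the top of the stack and is
        // handed to addCharacter() on the Play beneath it (set-next-rule)
        SketchCharacter character = (SketchCharacter) stack.pop();
        ((SketchPlay) stack.peek()).addCharacter(character);

        // </play>: the Play leaves the top of the stack and is handed to
        // add() on the List beneath it (set-next-rule)
        SketchPlay play = (SketchPlay) stack.pop();
        @SuppressWarnings("unchecked")
        List<Object> parent = (List<Object>) stack.peek();
        parent.add(play);

        return plays;  // the List pushed first is the root of the object graph
    }
}
```

After run( ) returns, the List holds one SketchPlay, and that SketchPlay holds one SketchCharacter. This is the same shape of object graph Digester builds from the Hamlet document, minus most of the properties.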
Digester can be summed up as follows: define patterns to be matched and a sequence of actions (rules) to take when these patterns are encountered. Digester is essentially shorthand for your own SAX handler. It lets you accomplish the same task without having to deal with the complexity of the SAX API. If you look at the source for the org.apache.commons.digester.Digester class, you will see that it extends org.xml.sax.helpers.DefaultHandler. A call to parse( ) causes Digester to register itself as the content handler on an instance of org.xml.sax.XMLReader. Digester is simply a lightweight shell around SAX. Because of this, you can parse XML nearly as fast with the Digester as with a system written directly to the SAX API. The main extra cost is the reflection Digester uses to create objects and call property setters.
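For comparison, the following sketch shows roughly what a hand-written SAX handler covering part of the same job might look like. It uses only the JDK's javax.xml.parsers and org.xml.sax APIs. The handler tracks the current element path and pushes an object onto a stack at the start of each play and character element. It then sets properties from attributes and element text, and links each finished object to its parent when the element ends. This is a sketch of the technique, not Digester's actual implementation. The SaxPlay, SaxCharacter, and PlayHandler names are hypothetical, and only a few of the recipe's properties are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, trimmed-down model classes used only by this sketch.
class SaxPlay {
    String genre, name;
    final List<SaxCharacter> characters = new ArrayList<>();
}

class SaxCharacter {
    String name, description;
    boolean protagonist;
}

class PlayHandler extends DefaultHandler {
    final List<SaxPlay> plays = new ArrayList<>();
    private final Deque<Object> stack = new ArrayDeque<>();
    private final List<String> path = new ArrayList<>();   // current element path
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.add(qName);
        text.setLength(0);
        switch (String.join("/", path)) {
            case "plays/play": {
                SaxPlay play = new SaxPlay();              // like object-create-rule
                play.genre = atts.getValue("genre");       // like set-properties-rule
                stack.push(play);
                break;
            }
            case "plays/play/characters/character": {
                SaxCharacter c = new SaxCharacter();
                c.protagonist = Boolean.parseBoolean(atts.getValue("protagonist"));
                stack.push(c);
                break;
            }
            default:
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (String.join("/", path)) {
            case "plays/play/name":                        // like bean-property-setter-rule
                ((SaxPlay) stack.peek()).name = value;
                break;
            case "plays/play/characters/character/name":
                ((SaxCharacter) stack.peek()).name = value;
                break;
            case "plays/play/characters/character/descr":
                ((SaxCharacter) stack.peek()).description = value;
                break;
            case "plays/play/characters/character": {      // like set-next-rule
                SaxCharacter c = (SaxCharacter) stack.pop();
                ((SaxPlay) stack.peek()).characters.add(c);
                break;
            }
            case "plays/play":
                plays.add((SaxPlay) stack.pop());
                break;
            default:
                break;
        }
        path.remove(path.size() - 1);
        text.setLength(0);
    }

    // Parses a plays document held in a String; wraps checked exceptions.
    static List<SaxPlay> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            PlayHandler handler = new PlayHandler();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.plays;
        } catch (Exception e) {
            throw new IllegalStateException("failed to parse plays XML", e);
        }
    }
}
```

Every new element you want to map requires another case in one or both switch statements. That bookkeeping is what Digester's patterns and rules take off your hands.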
Digester rule sets can be defined in an external XML document, or programmatically in compiled Java code, but the general rules are the same. The following code recreates the rule set defined in the previous XML rule set:
import org.apache.commons.digester.BeanPropertySetterRule;
import org.apache.commons.digester.Digester;
import org.apache.commons.digester.ObjectCreateRule;
import org.apache.commons.digester.Rules;
import org.apache.commons.digester.SetNextRule;
import org.apache.commons.digester.SetPropertiesRule;

Digester digester = new Digester( );
Rules rules = digester.getRules( );

// Add Rules to parse a play element
rules.add( "plays/play", new ObjectCreateRule("xml.digester.Play"));
rules.add( "plays/play", new SetNextRule("add", "java.lang.Object") );
rules.add( "plays/play", new SetPropertiesRule( ) );
rules.add( "plays/play/name", new BeanPropertySetterRule("name") );
rules.add( "plays/play/summary", new BeanPropertySetterRule("summary") );
rules.add( "plays/play/author", new BeanPropertySetterRule("author") );

// Add Rules to parse a character element
rules.add( "plays/play/characters/character",
           new ObjectCreateRule("xml.digester.Character"));
rules.add( "plays/play/characters/character",
           new SetNextRule("addCharacter", "xml.digester.Character"));
rules.add( "plays/play/characters/character", new SetPropertiesRule( ) );
rules.add( "plays/play/characters/character/name",
           new BeanPropertySetterRule("name") );
// The document uses <descr>, which maps to the description property
rules.add( "plays/play/characters/character/descr",
           new BeanPropertySetterRule("description") );
While this is perfectly acceptable, think twice about defining Digester rule sets programmatically. Defining rule sets in an XML document provides a very clear separation between the framework used to parse XML and the configuration of the Digester. When your rule sets are separate from compiled code, it will be easier to update and maintain logic involved in parsing; a change in the XML document structure would not involve changing code that deals with parsing. Instead, you would change the model and the mapping document. Defining Digester rule sets in an XML document is a relatively new Digester feature, and, because of this, you may find that some of the more advanced capabilities of Digester demonstrated later in this chapter are not available when defining rule sets in XML.
More information about Digester XML rule sets can be found in the package documentation for org.apache.commons.digester.xmlrules (http://commons.apache.org/digester/apidocs/index.html).