Common Java Cookbook

Edition: 0.19

6.3. Namespace-Aware Parsing

6.3.1. Problem

You need to parse an XML document with multiple namespaces.

6.3.2. Solution

Use Digester with namespace awareness enabled (digester.setNamespaceAware(true)) and supply one RuleSet for each namespace. Consider the following document, which contains elements from two namespaces, http://discursive.com/page and http://discursive.com/person:

<?xml version="1.0"?>
<pages xmlns="http://discursive.com/page"
       xmlns:person="http://discursive.com/person">
  <page type="standard">
    <person:person firstName="Al" lastName="Gore">
      <person:role>Co-author</person:role> 
    </person:person>
    <person:person firstName="George" lastName="Bush">
      <person:role>Co-author</person:role> 
    </person:person>
  </page>
</pages>
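It is easiest to see what namespace awareness changes by using the JDK's own SAX parser, without Digester. The following sketch turns on SAXParserFactory.setNamespaceAware(true) and records each element as {namespace URI}local name. The class NamespaceDump and its qualifiedNames( ) method are hypothetical helpers written only for this illustration. For a trimmed copy of the document above, the unprefixed pages and page elements report the default http://discursive.com/page namespace, and person:person and person:role report http://discursive.com/person. A namespace-aware Digester matches its rules against these URIs and local names instead of the prefixes, which is why a pattern such as */person can match a person:person element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceDump {

    /** Returns one "{namespaceURI}localName" entry per element, in document order. */
    public static List<String> qualifiedNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Without this, the parser reports raw prefixed names such as "person:person"
        // and leaves the namespace URI empty.
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // uri is the resolved namespace; localName has the prefix stripped
                names.add("{" + uri + "}" + localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<pages xmlns=\"http://discursive.com/page\"\n"
            + "       xmlns:person=\"http://discursive.com/person\">\n"
            + "  <page type=\"standard\">\n"
            + "    <person:person firstName=\"Al\" lastName=\"Gore\">\n"
            + "      <person:role>Co-author</person:role>\n"
            + "    </person:person>\n"
            + "  </page>\n"
            + "</pages>";
        qualifiedNames(xml).forEach(System.out::println);
    }
}
```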

To parse this document with Digester, create a separate set of rules for each namespace and add each RuleSet object to the Digester with addRuleSet( ). A RuleSet adds Rule objects to an instance of Digester. The following class, PersonRuleSet, extends the RuleSetBase class and sets namespaceURI in its default constructor, so it defines rules for the http://discursive.com/person namespace:

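import org.apache.commons.digester.Digester;
import org.apache.commons.digester.RuleSetBase;

public class PersonRuleSet extends RuleSetBase {
    public PersonRuleSet( ) {
        // Rules added by this RuleSet only match elements in this namespace
        this.namespaceURI = "http://discursive.com/person";
    }

    public void addRuleInstances(Digester digester) {
        // Create a Person for each person element and pass it to addPerson( )
        // on the object beneath it on the stack (the enclosing Page)
        digester.addObjectCreate("*/person", Person.class);
        digester.addSetNext("*/person", "addPerson");
        // Copy the firstName and lastName attributes to bean properties
        digester.addSetProperties("*/person");
        // Set the role property from the body of the nested role element
        digester.addBeanPropertySetter("*/person/role", "role");
    }
}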

PersonRuleSet extends RuleSetBase, an implementation of the RuleSet interface that adds namespace support through the protected field namespaceURI. The PersonRuleSet constructor sets namespaceURI to http://discursive.com/person, which tells the Digester to apply these rules only to elements in the http://discursive.com/person namespace. PageRuleSet also extends RuleSetBase and provides a set of rules for the http://discursive.com/page namespace:

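import org.apache.commons.digester.Digester;
import org.apache.commons.digester.RuleSetBase;

public class PageRuleSet extends RuleSetBase {
    public PageRuleSet( ) {
        // Rules added by this RuleSet only match elements in this namespace
        this.namespaceURI = "http://discursive.com/page";
    }

    public void addRuleInstances(Digester digester) {
        // Create a Page for each page element and pass it to addPage( )
        // on the object beneath it on the stack (the Pages root)
        digester.addObjectCreate("*/page", Page.class);
        digester.addSetNext("*/page", "addPage");
        // Copy attributes such as type to bean properties
        digester.addSetProperties("*/page");
        // Set the summary property from the body of a nested summary element
        digester.addBeanPropertySetter("*/page/summary", "summary");
    }
}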

Each RuleSet instructs the Digester to create an object when it encounters its element. PageRuleSet creates a Page for every page element, matched with the wildcard pattern */page, and PersonRuleSet creates a Person for every person element. Both use digester.addSetNext( ) to pass each newly created object to the object beneath it on the Digester stack. In the following code, an instance of Pages is pushed onto the Digester stack, and both RuleSet implementations are added to a Digester using addRuleSet( ):

import java.io.InputStream;

import org.apache.commons.digester.Digester;

Pages pages = new Pages( );

Digester digester = new Digester( );
// Match rules on namespace URI and local name instead of prefixed names
digester.setNamespaceAware(true);
digester.addRuleSet( new PageRuleSet( ) );
digester.addRuleSet( new PersonRuleSet( ) );

// Pages is the root object; each Page is handed to pages.addPage( )
digester.push(pages);
// content.xml is resolved relative to this class's package
InputStream input = getClass( ).getResourceAsStream("content.xml");
digester.parse(input);
Page page = (Page) pages.getPages( ).get(0);
System.out.println(page);

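PageRuleSet uses addSetNext( ) to pass each Page to the object beneath it on the stack. Here, that object is the Pages instance pushed before parsing, so Pages must have an addPage( ) method that accepts a Page object. In the same way, Page must have an addPerson( ) method that accepts a Person.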
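The recipe does not list Pages, Page, or Person. The following is a minimal sketch of what the rules above assume, not the book's own classes. The sketch uses public classes with no-argument constructors, JavaBean setters that match the attributes (type, firstName, lastName) and the role and summary elements, and the addPage( ) and addPerson( ) methods named in the addSetNext( ) calls. The classes are nested in a hypothetical DigesterBeans holder only to keep the sketch in one file. The XML rule files later in this recipe name the classes com.discursive.jccook.xml.bean.Page and com.discursive.jccook.xml.bean.Person, so in a real project each would be a top-level class in that package. The replay( ) method, also hypothetical, performs by hand the stack operations that the rules trigger for one page containing one person, which shows how each object ends up attached to its parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DigesterBeans {

    /** Root object pushed onto the Digester stack before parsing. */
    public static class Pages {
        private final List<Page> pages = new ArrayList<>();

        // Called by the SetNextRule registered with addSetNext("*/page", "addPage")
        public void addPage(Page page) { pages.add(page); }
        public List<Page> getPages() { return pages; }
    }

    public static class Page {
        private String type;
        private String summary;
        private final List<Person> people = new ArrayList<>();

        public String getType() { return type; }
        public void setType(String type) { this.type = type; }
        public String getSummary() { return summary; }
        public void setSummary(String summary) { this.summary = summary; }

        // Called by the SetNextRule registered with addSetNext("*/person", "addPerson")
        public void addPerson(Person person) { people.add(person); }
        public List<Person> getPeople() { return people; }

        @Override
        public String toString() { return "Page[type=" + type + ", people=" + people + "]"; }
    }

    public static class Person {
        private String firstName;
        private String lastName;
        private String role;

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
        public String getRole() { return role; }
        public void setRole(String role) { this.role = role; }

        @Override
        public String toString() { return firstName + " " + lastName + " (" + role + ")"; }
    }

    /** Hypothetical helper: replays by hand what the rules do for one page and one person. */
    public static Pages replay() {
        Deque<Object> stack = new ArrayDeque<>();
        Pages pages = new Pages();
        stack.push(pages);                      // digester.push(pages)

        Page page = new Page();                 // <page>: ObjectCreateRule pushes a Page
        stack.push(page);
        page.setType("standard");               // SetPropertiesRule copies type="standard"

        Person person = new Person();           // <person:person>: ObjectCreateRule pushes a Person
        stack.push(person);
        person.setFirstName("Al");              // SetPropertiesRule copies the attributes
        person.setLastName("Gore");
        person.setRole("Co-author");            // </person:role>: BeanPropertySetterRule

        // </person:person>: SetNextRule passes the top object to addPerson( ) on the
        // object beneath it, then ObjectCreateRule pops it. Popping first and then
        // peeking at the new top has the same effect.
        Person finishedPerson = (Person) stack.pop();
        ((Page) stack.peek()).addPerson(finishedPerson);

        // </page>: the same two steps attach the Page to the Pages root
        Page finishedPage = (Page) stack.pop();
        ((Pages) stack.peek()).addPage(finishedPage);
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(replay().getPages().get(0));
    }
}
```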

6.3.3. Discussion

Each of the RuleSet implementations above defines its rules in compiled Java code. If you prefer to define each set of rules in an XML file, use FromXmlRuleSet instead of writing your own RuleSetBase subclass:

import java.io.InputStream;
import java.net.URL;

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.xmlrules.FromXmlRuleSet;

Pages pages = new Pages( );

Digester digester = new Digester( );
digester.setNamespaceAware(true);

// Add rules for the page namespace
digester.setRuleNamespaceURI("http://discursive.com/page");
URL pageRules = getClass( ).getResource("page-rules.xml");
digester.addRuleSet( new FromXmlRuleSet( pageRules ) );

// Add rules for the person namespace
digester.setRuleNamespaceURI("http://discursive.com/person");
URL personRules = getClass( ).getResource("person-rules.xml");
digester.addRuleSet( new FromXmlRuleSet( personRules ) );

digester.push(pages);
InputStream input = getClass( ).getResourceAsStream("content.xml");
digester.parse(input);
Page page = (Page) pages.getPages( ).get(0);
System.out.println(page);

Calling digester.setRuleNamespaceURI( ) associates the rules contained in each FromXmlRuleSet with a specific namespace. In the Solution, each RuleSet was associated with its namespace through the protected RuleSetBase field namespaceURI. FromXmlRuleSet inherits that field, but because it is protected, the previous example cannot set it directly. Instead, the example calls setRuleNamespaceURI( ) before it adds each FromXmlRuleSet to the Digester. page-rules.xml contains an XML rule set for parsing the http://discursive.com/page namespace:

<?xml version="1.0"?>
<!DOCTYPE digester-rules PUBLIC 
        "-//Jakarta Apache //DTD digester-rules XML V1.0//EN" 
        "http://commons.apache.org/digester/dtds/digester-rules.dtd">
<digester-rules>
  <pattern value="*/page">
    <object-create-rule classname="com.discursive.jccook.xml.bean.Page"/>
    <set-next-rule methodname="addPage"/>
    <set-properties-rule/>
    <bean-property-setter-rule pattern="summary" name="summary"/>
  </pattern>
</digester-rules>

person-rules.xml contains an XML rule set for parsing the http://discursive.com/person namespace:

<?xml version="1.0"?>
<!DOCTYPE digester-rules PUBLIC 
        "-//Jakarta Apache //DTD digester-rules XML V1.0//EN" 
        "http://commons.apache.org/digester/dtds/digester-rules.dtd">
<digester-rules>
  <pattern value="*/person">
    <object-create-rule classname="com.discursive.jccook.xml.bean.Person"/>
    <set-next-rule methodname="addPerson"/>
    <set-properties-rule/>
    <bean-property-setter-rule pattern="role"/>
  </pattern>
</digester-rules>

6.3.4. See Also

For more information relating to the use of namespaces in the Digester, refer to the Javadoc for the org.apache.commons.digester package at http://commons.apache.org/digester/apidocs.


Creative Commons License
Common Java Cookbook by Tim O'Brien is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Permissions beyond the scope of this license may be available at http://www.discursive.com/books/cjcook/reference/jakartackbk-PREFACE-1.html. Copyright 2009. Common Java Cookbook Chunked HTML Output. Some Rights Reserved.