XPath is a popular way to select nodes from an XML document. If you
are working with a single XML document or a series of XML documents in an
XML database such as Apache Xindice, XPath is used to query, address, and
filter XML content. Commons JXPath allows you to use an XPath query to
address objects in a collection or properties of a bean. JXPath is an
unconventional application of an XML standard to Java objects, which
allows you to quickly select objects from a Collection without the need for
an Iterator and a comparison. For example, if you had a List of Person
objects with an age property, you could select all of the people older than
10 by passing the expression /person[@age > 10] to a JXPathContext. JXPath
implements a large subset
of the XPath specification, and JXPath expressions can be applied to a
wide array of objects, beans, Document Object Model (DOM) Documents,
collections, and maps. This chapter shows you how to use Commons JXPath to
search and filter objects in a collection.
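
To make the contrast concrete, here is a minimal sketch of the manual "Iterator and a comparison" approach that a JXPath expression is meant to replace. The Person bean, its name property, and the ManualPersonFilter class are hypothetical names introduced only for this illustration; the sketch uses nothing beyond the standard library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper showing the hand-written loop that JXPath lets you avoid.
final class ManualPersonFilter {

    // Hypothetical bean; the text only says a Person has an age property.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public int getAge() { return age; }
    }

    // Walks the list with an explicit Iterator and keeps every Person
    // whose age is strictly greater than minAge.
    static List<Person> olderThan(List<Person> people, int minAge) {
        List<Person> matches = new ArrayList<>();
        Iterator<Person> it = people.iterator();
        while (it.hasNext()) {
            Person candidate = it.next();
            if (candidate.getAge() > minAge) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ann", 8),
                new Person("Bob", 12),
                new Person("Cleo", 35));

        // Print everyone older than 10.
        for (Person p : olderThan(people, 10)) {
            System.out.println(p.getName() + " (" + p.getAge() + ")");
        }
    }
}
```

With JXPath, the loop and the comparison inside olderThan collapse into a single expression handed to a JXPathContext, so changing the selection criteria means changing a string rather than rewriting the iteration.
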
A system will frequently need to search for and identify occurrences of text in a large set of documents. To accomplish this, you can use a tool such as Apache Lucene to create a searchable index of terms. For example, if you've used an IDE such as Eclipse, you may have searched your workspace for every occurrence of the text "testVariable". Eclipse can quickly perform any number of complex searches, and parts of Eclipse, such as its help system, use Apache Lucene to index and search content. Apache Lucene is a very efficient search engine that can be used to search a set of documents for terms and phrases and to analyze the frequency of terms within that set. Lucene offers a rich query syntax, which allows for compound queries and term matching using proximity and wildcards. This chapter combines Lucene with Commons Digester to create a tool to search a set of XML documents.
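
The idea of a "searchable index of terms" can be illustrated with a toy inverted index. The sketch below is not Lucene's API or implementation. TinyTermIndex and its methods are hypothetical names, and the tokenizer simply splits on non-word characters. It only shows why looking up a term in a prebuilt index is faster than rescanning every document, and how term frequencies fall out of the same structure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: maps each term to the documents that contain it.
final class TinyTermIndex {

    // term -> names of the documents in which the term occurs
    private final Map<String, Set<String>> postings = new HashMap<>();

    // term -> total number of occurrences across all indexed documents
    private final Map<String, Integer> frequencies = new HashMap<>();

    // Tokenizes the text on non-word characters and records each term.
    void add(String docName, String text) {
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if text starts with punctuation
            }
            String term = token.toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
            frequencies.merge(term, 1, Integer::sum);
        }
    }

    // Returns the sorted names of documents containing the term (case-insensitive).
    Set<String> search(String term) {
        Set<String> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(docs);
    }

    // Returns how many times the term occurs across all indexed documents.
    int frequency(String term) {
        return frequencies.getOrDefault(term.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        TinyTermIndex index = new TinyTermIndex();
        index.add("A.java", "int testVariable = 1; testVariable++;");
        index.add("B.java", "String other = \"x\";");
        index.add("C.java", "return testVariable;");

        System.out.println(index.search("testVariable"));
        System.out.println(index.frequency("testVariable"));
    }
}
```

A real search engine adds much more on top of this structure, including analyzers, phrase and proximity matching, wildcards, and relevance scoring, which is why the chapter relies on Lucene rather than a hand-rolled index.
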