Common Java Cookbook

Edition: 0.19

Chapter 12. Searching and Filtering

12.1. Introduction

XPath is a popular way to select nodes from an XML document. Whether you are working with a single XML document or with a series of XML documents in an XML database such as Apache Xindice, XPath is used to query, address, and filter XML content. Commons JXPath allows you to use an XPath query to address objects in a collection or properties of a bean. JXPath is an unconventional application of an XML standard to Java objects; it lets you quickly select objects from a Collection without writing an Iterator loop and a comparison. For example, if you had a List of Person objects with an age property, you could select all of the people older than 10 by passing the expression /person[@age > 10] to a JXPathContext. JXPath implements a large subset of the XPath specification, and JXPath expressions can be applied to a wide array of objects: beans, Document Object Model (DOM) Documents, collections, and maps. This chapter shows you how to use Commons JXPath to search and filter objects in a collection.
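
To make the predicate syntax concrete, here is a minimal sketch that applies the same kind of age filter to a small XML document. It uses the XPath API that ships with the JDK (javax.xml.xpath), not Commons JXPath, so it runs without extra libraries. The class name XPathAgeFilter, the people/person element layout, and the sample names are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: selects <person> elements by age with a standard XPath predicate.
class XPathAgeFilter {

    // Returns the name attribute of every <person> whose age attribute exceeds minAge.
    static List<String> namesOlderThan(String xml, int minAge) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Assumed document layout: <people><person name="..." age="..."/></people>
        String expression = "/people/person[@age > " + minAge + "]";
        try {
            NodeList matches = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                names.add(((Element) matches.item(i)).getAttribute("name"));
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people>"
                + "<person name=\"Ann\" age=\"34\"/>"
                + "<person name=\"Ben\" age=\"7\"/>"
                + "<person name=\"Cal\" age=\"12\"/>"
                + "</people>";
        System.out.println(namesOlderThan(xml, 10)); // prints [Ann, Cal]
    }
}
```

The predicate in square brackets replaces the explicit loop and comparison; JXPath carries that idea over from XML nodes to Java objects.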

A system will frequently need to search for and identify occurrences of text in a large set of documents. To accomplish this, you will use a tool such as Apache Lucene to create a searchable index of terms. For example, if you've used an IDE such as Eclipse, you may find yourself searching for all of the occurrences of the text "testVariable" in your workspace. Searching a large body of text quickly calls for an index, and Eclipse itself relies on Apache Lucene to index and search the contents of its help system. Apache Lucene is a very efficient search engine that can be used to search a set of documents for terms and phrases and to analyze the frequency of terms within a set of documents. Lucene offers a rich query syntax, which allows for compound queries and term matching using proximity and wildcards. This chapter combines Lucene with Commons Digester to create a tool to search a set of XML documents.
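
The idea of a searchable index of terms can be sketched in a few lines of plain Java. The following is a toy inverted index, not Lucene's API or its internal design: it maps each lowercased term to the set of documents that contain it, so a lookup does not have to rescan every document. The class name TinyTermIndex and the sample document ids are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: term -> sorted set of ids of the documents containing it.
class TinyTermIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text on non-word characters and records each term for docId.
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of all documents containing the term (case-insensitive).
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyTermIndex index = new TinyTermIndex();
        index.add("A.java", "int testVariable = 1;");
        index.add("B.java", "String other = \"x\";");
        index.add("C.java", "return testVariable + 2;");
        System.out.println(index.search("testVariable")); // prints [A.java, C.java]
    }
}
```

A real search engine adds analysis, scoring, phrase and wildcard queries, and on-disk storage on top of this basic structure.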


Creative Commons License
Common Java Cookbook by Tim O'Brien is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Permissions beyond the scope of this license may be available at http://www.discursive.com/books/cjcook/reference/jakartackbk-PREFACE-1.html. Copyright 2009. Common Java Cookbook Chunked HTML Output. Some Rights Reserved.