Provides classes for working with the ClueWeb09 collection. The dataset consists of one billion web pages in ten languages (5 TB compressed, 25 TB uncompressed), collected in January and February 2009. Its creation, supported by the U.S. National Science Foundation (NSF), was led by Jamie Callan of the Language Technologies Institute at Carnegie Mellon University to support research on information retrieval and related human language technologies.