Index partitioning and clustering.

This package contains the classes that provide the infrastructure for index partitioning and clustering. The tools that actually perform partitioning can be found in {@link it.unimi.di.mg4j.tool}.

An index cluster is a set of local indices that are viewed as a single index. In a {@linkplain it.unimi.di.mg4j.index.cluster.LexicalCluster lexical cluster} each local index has a disjoint set of terms, but the document pointers contained in each local index refer to the same documents. In a {@linkplain it.unimi.di.mg4j.index.cluster.DocumentalCluster documental cluster} each local index contains postings referring to a disjoint subset of the document collection.
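
The difference between the two kinds of cluster can be illustrated with a toy model. The following is a minimal sketch, not MG4J code: an index is represented as a plain map from terms to lists of document pointers, and the class name, the method names and the round-robin assignment rules are all assumptions made for illustration. For readability the documental split keeps global document pointers; a real local index would use its own numbering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only: plain maps stand in for indices; this is not MG4J code. */
final class ClusterKindsSketch {
    private ClusterKindsSketch() {}

    /** Creates k empty local indices. */
    private static List<Map<String, List<Integer>>> emptyLocals(int k) {
        List<Map<String, List<Integer>>> locals = new ArrayList<>();
        for (int i = 0; i < k; i++) locals.add(new TreeMap<>());
        return locals;
    }

    /** Lexical split: each term, with its whole posting list, goes to exactly one local index. */
    static List<Map<String, List<Integer>>> splitByTerm(Map<String, List<Integer>> index, int k) {
        List<Map<String, List<Integer>>> locals = emptyLocals(k);
        int termNumber = 0; // terms are numbered in lexicographic order
        for (Map.Entry<String, List<Integer>> e : new TreeMap<>(index).entrySet()) {
            locals.get(termNumber % k).put(e.getKey(), new ArrayList<>(e.getValue()));
            termNumber++;
        }
        return locals;
    }

    /**
     * Documental split: document d (assumed non-negative) goes to local index d % k,
     * so the same term may appear in several local indices.
     */
    static List<Map<String, List<Integer>>> splitByDocument(Map<String, List<Integer>> index, int k) {
        List<Map<String, List<Integer>>> locals = emptyLocals(k);
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            for (int document : e.getValue()) {
                locals.get(document % k)
                      .computeIfAbsent(e.getKey(), t -> new ArrayList<>())
                      .add(document);
            }
        }
        return locals;
    }
}
```

In the lexical split every posting list survives intact in exactly one local index; in the documental split a posting list is divided among the local indices according to the documents it mentions.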

Clustering indices requires mapping term numbers and document pointers back and forth between the global index and local indices. This mapping is provided by {@linkplain it.unimi.di.mg4j.index.cluster.DocumentalClusteringStrategy documental clustering strategies} and {@linkplain it.unimi.di.mg4j.index.cluster.LexicalClusteringStrategy lexical clustering strategies}.
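
As an illustration of such a mapping, the sketch below translates document pointers in both directions for a documental cluster whose local indices hold contiguous ranges of documents. The class and its method names are hypothetical and do not reproduce the interfaces of this package; the contiguous layout is just one possible assumption.

```java
/** Hypothetical contiguous pointer mapping; illustrative only, not an MG4J interface. */
final class ContiguousPointerMapSketch {
    /**
     * cutPoints[i] is the first global pointer held by local index i;
     * the last entry is the total number of documents.
     */
    private final long[] cutPoints;

    ContiguousPointerMapSketch(long... cutPoints) {
        this.cutPoints = cutPoints.clone();
    }

    /** Global to local: which local index holds this global document pointer. */
    int localIndex(long globalPointer) {
        if (globalPointer >= cutPoints[0]) {
            for (int i = 0; i < cutPoints.length - 1; i++) {
                if (globalPointer < cutPoints[i + 1]) return i;
            }
        }
        throw new IndexOutOfBoundsException("No local index for pointer " + globalPointer);
    }

    /** Global to local: the pointer used inside the local index. */
    long localPointer(long globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    /** Local to global: the inverse of the two methods above. */
    long globalPointer(int localIndex, long localPointer) {
        return cutPoints[localIndex] + localPointer;
    }
}
```

With cut points 0, 3, 7, 10, for example, global document 5 is local document 2 of local index 1, and mapping that pair back yields 5 again.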

Clusters are often generated by partitioning an index (although, for instance, {@link it.unimi.di.mg4j.tool.Scan} produces a cluster as output of the indexing process). In this case, a {@linkplain it.unimi.di.mg4j.index.cluster.DocumentalPartitioningStrategy documental partitioning strategy} or a {@linkplain it.unimi.di.mg4j.index.cluster.LexicalPartitioningStrategy lexical partitioning strategy} explains how to divide and remap term numbers and document pointers. Of course, the clustering and partitioning strategies must be suitably matched, so that the clustering strategy undoes the remapping performed by the partitioning strategy.
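
What "suitably matched" means can be sketched with term numbers. In the hypothetical example below (none of these names belong to this package), terms are partitioned round-robin among k local indices; a clustering-side mapping is matched only if it sends every (local index, local number) pair back to the global number it came from. An inverse that assumes contiguous blocks of terms is included as a deliberately mismatched counterexample.

```java
import java.util.function.LongBinaryOperator;

/** Illustrative check that a local-to-global mapping undoes a round-robin partitioning. */
final class MatchedStrategiesSketch {
    private MatchedStrategiesSketch() {}

    /** Partitioning side: local index that receives a global term number. */
    static int localIndex(long globalNumber, int k) {
        return (int) (globalNumber % k);
    }

    /** Partitioning side: term number inside that local index. */
    static long localNumber(long globalNumber, int k) {
        return globalNumber / k;
    }

    /** Clustering side, matched: inverts the round-robin remapping. */
    static long globalNumber(int localIndex, long localNumber, int k) {
        return localNumber * k + localIndex;
    }

    /** Clustering side, mismatched: assumes contiguous blocks of blockSize terms. */
    static long contiguousGlobalNumber(int localIndex, long localNumber, long blockSize) {
        return localIndex * blockSize + localNumber;
    }

    /**
     * Returns true if toGlobal, applied to (local index, local number),
     * recovers every global term number in [0, numberOfTerms).
     */
    static boolean isMatched(long numberOfTerms, int k, LongBinaryOperator toGlobal) {
        for (long g = 0; g < numberOfTerms; g++) {
            if (toGlobal.applyAsLong(localIndex(g, k), localNumber(g, k)) != g) return false;
        }
        return true;
    }
}
```

Calling `isMatched(10, 3, (i, n) -> globalNumber((int) i, n, 3))` succeeds, whereas passing `contiguousGlobalNumber` with a block size of 4 fails already at global term 1, which round-robin places at local number 0 of local index 1 but the contiguous inverse maps back to 4.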