Type-specific hash set collection classes. The classes in this package implement type-specific hash sets supporting a subset of the normal collection methods. The standard public access methods in these classes are as follows:

Method Signature             From
boolean add(type)            implementation class
void clear()                 {@link com.sosnoski.util.PrimitiveHashBase} or {@link com.sosnoski.util.ObjectHashBase}
Object clone()               implementation class
boolean contains(type)       implementation class
void ensureCapacity(int)     {@link com.sosnoski.util.PrimitiveHashBase} or {@link com.sosnoski.util.ObjectHashBase}
Iterator iterator()          {@link com.sosnoski.util.hashset.ObjectSetBase} (not implemented for primitives)
boolean remove(type)         implementation class
int size()                   {@link com.sosnoski.util.PrimitiveHashBase} or {@link com.sosnoski.util.ObjectHashBase}

The access methods are unsynchronized for best performance. The user program must implement appropriate locking if multiple threads need to access an instance of these classes while that instance may be modified.
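As a minimal sketch of such locking, the wrapper below routes every call, reads included, through one monitor. IntSetStandIn is a hypothetical placeholder written only to keep the example self-contained; it is not a class from this package, and LockedIntSet and LockedIntSetDemo are likewise illustrative names.

```java
// Hypothetical stand-in for an unsynchronized set; NOT a class from this package.
class IntSetStandIn {
    private final boolean[] present = new boolean[1024]; // accepts values 0..1023 only
    private int count;

    boolean add(int value) {
        if (present[value]) return false;
        present[value] = true;
        count++;
        return true;
    }

    boolean contains(int value) { return present[value]; }

    boolean remove(int value) {
        if (!present[value]) return false;
        present[value] = false;
        count--;
        return true;
    }

    int size() { return count; }
}

// All access goes through the same monitor, so no thread sees a half-updated set.
class LockedIntSet {
    private final IntSetStandIn set = new IntSetStandIn();

    synchronized boolean add(int value) { return set.add(value); }
    synchronized boolean contains(int value) { return set.contains(value); }
    synchronized boolean remove(int value) { return set.remove(value); }
    synchronized int size() { return set.size(); }
}

class LockedIntSetDemo {
    // Two threads add disjoint halves of 0..999; returns the final size.
    static int run() {
        final LockedIntSet shared = new LockedIntSet();
        Thread evens = new Thread(() -> { for (int i = 0; i < 1000; i += 2) shared.add(i); });
        Thread odds = new Thread(() -> { for (int i = 1; i < 1000; i += 2) shared.add(i); });
        evens.start();
        odds.start();
        try {
            evens.join();
            odds.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }
}
```

A single monitor is the simplest choice. If reads greatly outnumber writes, a read/write lock may reduce contention, at the cost of more code.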

Collections of a primitive type and of a specific object type are both supported. Hash sets of primitive types derive from the {@link com.sosnoski.util.hashset.PrimitiveSetBase} base class, and those of object types derive from the {@link com.sosnoski.util.hashset.ObjectSetBase} base class. To define a hash set of a new type, you can generally start from one of the existing classes and do a text substitution of the type names.

Defining a set for a new object type is generally not very useful, since none of the methods return values of the type held in the table. The sample implementations include sets of Strings as well as generic Objects, but there's really little reason to use anything other than the Object set in practice.

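Collections of object types may be configured to use any of three possible combinations of hash method and key comparison techniques. The choice is determined by an optional tech parameter to the constructor, with the following values inherited from {@link com.sosnoski.util.ObjectHashBase}:

STANDARD_HASH     hashCode() for hashing, equals() for comparison
IDENTITY_COMP     hashCode() for hashing, == for comparison
IDENTITY_HASH     System.identityHashCode() for hashing, == for comparison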

In testing with the Sun Java 1.3.1 JVM, System.identityHashCode() was found to be very slow in comparison to a typical hashCode() implementation, so be cautious about using the IDENTITY_HASH option. IDENTITY_COMP is always faster than STANDARD_HASH, though, so it's a good choice when you know that equal objects are always the same instance (such as Strings which have been interned).
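The difference between the two comparison techniques can be seen with plain JDK calls. The sketch below does not use the classes in this package; the class and method names are made up for illustration. It shows why identity comparison is only safe once Strings have been interned.

```java
class IdentityVersusEquals {
    // Content comparison, as done through equals().
    static boolean sameByEquals(Object a, Object b) {
        return a.equals(b);
    }

    // Identity comparison, as done through ==.
    static boolean sameByIdentity(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "key";          // string literals are interned by the JVM
        String copy = new String("key"); // distinct object with the same content

        System.out.println(sameByEquals(literal, copy));            // true
        System.out.println(sameByIdentity(literal, copy));          // false
        System.out.println(sameByIdentity(literal, copy.intern())); // true
    }
}
```

With identity comparison, the un-interned copy would be treated as a different key from the literal, so a set could end up holding both.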

Primitive types are more complicated to build. To implement a hash set of a primitive type other than those included, you're best off basing it on DoubleHashSet.java. If you're doing this for long, for instance, you'd first substitute "Long" for "Double", then "long" for "double" (except where double is used as a parameter type in the first constructor; that one needs to stay). The last step is replacing the first part of the computeSlot method with a hash code computation appropriate for the target data type. This computation must produce an int value, which the return statement of the method then converts into an initial index into the underlying array.
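For long, one plausible shape for that last step is sketched below. This is an assumption for illustration, not the code in the provided classes: the actual computeSlot signature and its return statement may differ, and LongSlotSketch and the tableLength parameter are hypothetical names. The fold of the upper and lower 32 bits is the same computation java.lang.Long.hashCode() uses.

```java
class LongSlotSketch {
    // Step 1: reduce the 64-bit value to an int hash code.
    // XOR-folding the two halves matches java.lang.Long.hashCode().
    static int hashOf(long value) {
        return (int) (value ^ (value >>> 32));
    }

    // Step 2 (hypothetical stand-in for the return statement of computeSlot):
    // clear the sign bit so the result is non-negative, then reduce it to
    // an index in the range 0..tableLength-1.
    static int computeSlot(long value, int tableLength) {
        int hash = hashOf(value);
        return (hash & Integer.MAX_VALUE) % tableLength;
    }
}
```

Masking with Integer.MAX_VALUE rather than calling Math.abs avoids the one case where Math.abs returns a negative number (Integer.MIN_VALUE).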

If you work with primitive types you'll also need to define a hashing function (the computeSlot method in these classes). Selecting a good hash computation method for your data requires considerable research. The methods used in the provided code have not been subjected to any serious testing and may produce poor distributions, which result in higher overhead for operations on the set. If you're using hash-based collections heavily, make sure the hashing method you choose works well with your data.
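A quick way to check a candidate hashing method against your own data is to count how many keys land in each slot. The sketch below is a hypothetical test harness, not part of this package; slotFor is a deliberately simple candidate, and the sample keys (multiples of 16) are chosen to show how a regular pattern in the data can interact badly with the table length.

```java
class SlotDistributionCheck {
    // Candidate slot computation under test (an assumption, not the package's method).
    static int slotFor(int key, int tableLength) {
        return (key & Integer.MAX_VALUE) % tableLength;
    }

    // Returns the largest number of keys that land in any single slot.
    static int maxKeysPerSlot(int[] keys, int tableLength) {
        int[] counts = new int[tableLength];
        int max = 0;
        for (int key : keys) {
            int n = ++counts[slotFor(key, tableLength)];
            if (n > max) max = n;
        }
        return max;
    }

    // Sample data: 0, 16, 32, ... (count values in total).
    static int[] multiplesOf16(int count) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = i * 16;
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = multiplesOf16(64);
        System.out.println(maxKeysPerSlot(keys, 64)); // 16: only 4 slots are ever used
        System.out.println(maxKeysPerSlot(keys, 67)); // 1: every key gets its own slot
    }
}
```

Running the same check on a representative sample of real keys gives a rough picture of how evenly a hashing method spreads them.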

Hash sets of both object and primitive types use considerably less memory than the corresponding standard Java library implementations, so a good approach to take when using these collections in high-performance applications is to be generous in setting the initial size for the table. It may also improve performance if you lower the fill parameter value, so that the table is expanded more often.
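To put a number on "generous", the sketch below estimates a table length that can hold an expected number of entries without passing a given fill fraction. It assumes the table grows once the entry count exceeds table length times fill, which this documentation does not spell out; InitialSizeSketch and tableLengthFor are hypothetical names, not part of this package.

```java
class InitialSizeSketch {
    // Smallest table length for which expectedCount entries stay at or
    // below the fill fraction (assumed growth rule: count > length * fill).
    static int tableLengthFor(int expectedCount, double fill) {
        return (int) Math.ceil(expectedCount / fill);
    }

    public static void main(String[] args) {
        System.out.println(tableLengthFor(1000, 0.5));  // 2000
        System.out.println(tableLengthFor(1000, 0.25)); // 4000
    }
}
```

Halving the fill fraction doubles the table length for the same number of entries, which is the memory-for-speed trade the paragraph above describes.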