Type-specific hash set collection classes. The classes in this package implement type-specific hash sets supporting a subset of the normal collection methods. The standard public access methods in these classes are as follows:
Method Signature | From |
boolean add(type) | implementation class |
void clear() | {@link com.sosnoski.util.PrimitiveHashBase} or {@link com.sosnoski.util.ObjectHashBase} |
Object clone() | implementation class |
boolean contains(type) | implementation class |
void ensureCapacity(int) | {@link com.sosnoski.util.PrimitiveHashBase} or {@link com.sosnoski.util.ObjectHashBase} |
Iterator iterator() | {@link com.sosnoski.util.hashset.ObjectSetBase} (not implemented for primitives) |
boolean remove(type) | implementation class |
int size() | {@link com.sosnoski.util.PrimitiveHashBase} or {@link com.sosnoski.util.ObjectHashBase} |
The access methods are unsynchronized for best performance. The user program must implement appropriate locking if multiple threads need to access an instance of these classes while that instance may be modified.
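As a minimal sketch of the external locking described above, the wrapper below guards every access with one lock. The class name `LockedSetSketch` is hypothetical, and `java.util.HashSet` is only a stand-in for one of the hash set classes in this package; it is not part of the package itself.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper, not part of this package: all access goes through one lock.
// java.util.HashSet stands in for an unsynchronized hash set class from this package.
final class LockedSetSketch {
    private final Set<Object> set = new HashSet<>();
    private final Object lock = new Object();

    boolean add(Object value) {
        synchronized (lock) {
            return set.add(value);
        }
    }

    boolean contains(Object value) {
        synchronized (lock) {
            return set.contains(value);
        }
    }

    int size() {
        synchronized (lock) {
            return set.size();
        }
    }
}
```

Readers need the lock as well as writers, since a lookup that runs while another thread is modifying the table may see it in an inconsistent state.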
Collections of a primitive type and of a specific object type are both supported. Hash sets of primitive types derive from the {@link com.sosnoski.util.hashset.PrimitiveSetBase} base class, and those of object types derive from the {@link com.sosnoski.util.hashset.ObjectSetBase} base class. To define a hash set of a new type, you can generally use one of the existing classes as a base and do a text substitution of the type names.
This is generally not too useful for object types, since none of the methods return values of the type present in the table. The sample implementations include sets of Strings as well as generic Objects, but there's really little reason to use anything other than the Object set in practice.
Collections of object types may be configured to use any of three possible combinations of hash method and key comparison techniques. The choice is determined by an optional tech parameter to the constructor, with the following values inherited from {@link com.sosnoski.util.ObjectHashBase}:
STANDARD_HASH - use object hashCode() and equals() comparison
IDENTITY_COMP - use object hashCode() and == comparison
IDENTITY_HASH - use System.identityHashCode() and == comparison
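The three combinations can be sketched roughly as follows. This is an illustration, not the package's implementation: the constant names come from the list above, but their numeric values, the class name `HashTechniqueSketch`, and the helper methods `hash` and `matches` are all assumptions.

```java
// Illustrative only: shows what each technique hashes with and compares with.
final class HashTechniqueSketch {
    // Names are taken from the text; the numeric values are assumptions.
    static final int STANDARD_HASH = 0;
    static final int IDENTITY_COMP = 1;
    static final int IDENTITY_HASH = 2;

    // Hypothetical helper: picks the hash source for the given technique.
    static int hash(int tech, Object key) {
        return tech == IDENTITY_HASH ? System.identityHashCode(key) : key.hashCode();
    }

    // Hypothetical helper: picks the comparison for the given technique.
    static boolean matches(int tech, Object a, Object b) {
        return tech == STANDARD_HASH ? a.equals(b) : a == b;
    }
}
```

With two distinct but equal strings, only STANDARD_HASH treats them as the same key; both identity techniques treat them as different keys.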
System.identityHashCode() was found to be very slow in comparison to a typical hashCode() implementation, so be careful about using the IDENTITY_HASH option. IDENTITY_COMP is always faster than STANDARD_HASH, though, so it's a good choice when you know that equal objects are always the same instance (such as Strings that have been interned).
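The interned String case can be checked with plain Java. The class `InternSketch` and its `sameReference` helper are hypothetical names used only for this illustration.

```java
// Illustrative only: shows why == comparison is safe for interned Strings.
final class InternSketch {
    // Hypothetical helper: reference comparison, as the identity techniques use.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = new String("abc");
        String b = new String("abc");
        // Equal contents, but two separate instances.
        System.out.println(sameReference(a, b));
        // intern() returns one canonical instance per distinct value.
        System.out.println(sameReference(a.intern(), b.intern()));
    }
}
```

Strings that have not been interned can be equal without being the same instance, so a set using == comparison would treat them as different keys.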
Primitive types are more complicated to build. To implement a hash set of a primitive type that is not already provided, you're best off basing it on DoubleHashSet.java. If you're doing this for long, for instance, you'd need to first substitute "Long" for "Double", then "long" for "double" (except for the use of double as a parameter type for the first constructor - this needs to stay!). The last step is replacing the first part of the computeSlot method with an appropriate hash code computation for the target data type. This computation must produce an int value, which the return statement of the method then converts into an initial index into the underlying array.
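For long, the two parts of such a method might look like the sketch below. The signature shown here is an assumption (the real computeSlot may take different parameters and use a different index conversion), and the class name `LongSlotSketch` is hypothetical. The folding step is the same computation the standard library uses for Long.hashCode.

```java
// Illustrative stand-in for a computeSlot method; not the package's actual code.
final class LongSlotSketch {
    // Assumed signature: the table length is passed in explicitly here.
    static int computeSlot(long value, int tableLength) {
        // First part: fold the 64-bit value into an int hash code.
        int hash = (int) (value ^ (value >>> 32));
        // Return statement: clear the sign bit, then reduce to an index in the table.
        return (hash & Integer.MAX_VALUE) % tableLength;
    }
}
```

Only the first part depends on the data type; the conversion of the int hash into an index stays the same when the class is adapted to another primitive.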
If you work with primitive types you'll also need to define a hashing function (the computeSlot method in these classes). Selecting a good hash computation method for your data requires considerable research. The methods used in the provided code have not been subjected to any serious testing and may suffer from poor distributions, which result in higher overhead for operations on the set. If you're using hash-based collections heavily, make sure the hashing method you choose works well with your data.
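One rough way to check a hashing method against your own data is to count how many keys land in the most crowded slot. The sketch below assumes a simple sign-bit-clear and modulo index conversion, which may differ from what these classes do; `SlotDistributionSketch` and `maxSlotLoad` are hypothetical names.

```java
import java.util.function.LongToIntFunction;

// Illustrative test harness; not part of this package.
final class SlotDistributionSketch {
    // Returns the number of keys in the most crowded slot of a table
    // with the given length, using the supplied hash function.
    static int maxSlotLoad(long[] keys, int tableLength, LongToIntFunction hash) {
        int[] counts = new int[tableLength];
        int max = 0;
        for (long key : keys) {
            // Assumed index conversion: clear the sign bit, then take the remainder.
            int slot = (hash.applyAsInt(key) & Integer.MAX_VALUE) % tableLength;
            max = Math.max(max, ++counts[slot]);
        }
        return max;
    }
}
```

For example, with the hash `k -> (int) k`, keys that are all multiples of 16 pile into a single slot of a 16-entry table but spread evenly across a 17-entry one. A result close to the average load suggests a reasonable distribution for that data; a result far above it suggests a poor one.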
Hash sets of both object and primitive types use considerably less memory than the corresponding standard Java library implementations, so a good approach to take when using these collections in high-performance applications is to be generous in setting the initial size for the table. It may also improve performance if you lower the fill parameter value, so that the table is expanded more often and stays more sparsely populated.
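The effect of these two settings on the number of expansions can be estimated with a small calculation. The sketch assumes the table doubles whenever the entry count exceeds capacity times fill. That growth policy is an assumption made for illustration and may not match these classes; `ExpansionCountSketch` and `expansions` are hypothetical names.

```java
// Illustrative estimate only; the real growth policy may differ.
final class ExpansionCountSketch {
    // Counts the expansions needed to hold `entries` items, ASSUMING the table
    // doubles each time the entry count exceeds capacity * fill.
    static int expansions(int initialCapacity, double fill, int entries) {
        if (initialCapacity <= 0 || fill <= 0.0) {
            throw new IllegalArgumentException("capacity and fill must be positive");
        }
        int capacity = initialCapacity;
        int count = 0;
        while (entries > (int) (capacity * fill)) {
            capacity *= 2;
            count++;
        }
        return count;
    }
}
```

Under this assumption, a generous initial size removes the expansions entirely, while a lower fill value trades a few extra expansions and more memory for a sparser table.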