public class BloomFilter<T>
extends java.lang.Object
implements java.lang.Cloneable, java.io.Serializable
Modifier and Type | Class and Description |
---|---|
static interface | BloomFilter.CustomHashFunction: An interface which can be implemented to provide custom hash functions. |
static class | BloomFilter.HashMethod: Different types of hash functions that can be used. |
Constructor and Description |
---|
BloomFilter(java.util.BitSet bloomFilter, int m, int k, BloomFilter.HashMethod hashMethod, java.lang.String hashFunctionName): Constructs a new bloom filter by using the provided bit vector bloomFilter. |
BloomFilter(double n, double p): Constructs a new bloom filter by determining the optimal bloom filter size m in bits and the number of hash functions k based on the expected number n of elements in the bloom filter and the tolerable false positive rate p. |
BloomFilter(int m, int k): Constructs a new bloom filter of the size m bits and k hash functions. |
Modifier and Type | Method and Description |
---|---|
boolean | add(byte[] value) |
boolean | add(T value) |
void | addAll(java.util.Collection<T> values) |
void | clear() |
java.lang.Object | clone() |
boolean | contains(byte[] value) |
boolean | contains(T value) |
boolean | containsAll(java.util.Collection<T> values) |
boolean | equals(java.lang.Object obj) |
java.util.BitSet | getBitSet() |
double | getBitsPerElement(int n): Calculates the number of bits per element, based on the expected number of inserted elements n. |
double | getBitZeroProbability(int n): Returns the probability that a bit is zero. |
java.lang.String | getCryptographicHashFunctionName(): Returns the name of the cryptographic hash function used when the hash method is HashMethod.Cryptographic. |
double | getFalsePositiveProbability(int n): Returns the probability of a false positive (approximated): (1 - e^(-k * insertedElements / m)) ^ k |
BloomFilter.HashMethod | getHashMethod(): Returns the hash method used to calculate hash values. |
int | getK(): Returns the number of hash functions. |
int | getM(): Returns the size of the bloom filter. |
int[] | hash(java.lang.String value): Dispatches to the hash function defined via setHashMethod(HashMethod) (default: cryptographic hash function). |
int | hashCode() |
boolean | intersect(BloomFilter<T> other): Performs the intersection operation on two compatible bloom filters. |
boolean | isEmpty() |
static int | optimalK(double n, int m): Calculates the optimal k (number of hash functions) given n (expected number of elements in bloom filter) and m (size of bloom filter in bits). |
static int | optimalM(double n, double p): Calculates the optimal size m of the bloom filter in bits given n (expected number of elements in bloom filter) and p (tolerable false positive rate). |
void | setCryptographicHashFunction(java.lang.String hashFunctionName): Uses the given cryptographic hash function. |
void | setCusomtHashFunction(BloomFilter.CustomHashFunction chf): Uses a given custom hash function. |
void | setHashMethod(BloomFilter.HashMethod hashMethod): Sets the method used to generate hash values. |
int | size(): Returns the size of the bloom filter. |
java.lang.String | toString() |
boolean | union(BloomFilter<T> other): Performs the union operation on two compatible bloom filters. |
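The sizing helpers optimalM(double, double) and optimalK(double, int) are described only in words above. The following is a minimal sketch of the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. The class name BloomSizingSketch and the choice to round up are assumptions made for illustration; the library's own rounding may differ.

```java
final class BloomSizingSketch {

    /** Size in bits for n expected elements and tolerable false positive rate p. */
    static int optimalM(double n, double p) {
        double ln2 = Math.log(2);
        // m = -n * ln(p) / (ln 2)^2, rounded up (rounding mode is an assumption)
        return (int) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Number of hash functions for n expected elements and m bits. */
    static int optimalK(double n, int m) {
        // k = ln(2) * m / n, rounded up (rounding mode is an assumption)
        return (int) Math.ceil(Math.log(2) * m / n);
    }

    public static void main(String[] args) {
        int m = optimalM(1000, 0.01);
        int k = optimalK(1000, m);
        System.out.println("m = " + m + " bits, k = " + k + " hash functions");
    }
}
```

Under these formulas, 1000 expected elements at a 1% false positive rate need a little under 10 bits per element.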
public BloomFilter(double n, double p)
n - Expected number of elements inserted in the bloom filter
p - Tolerable false positive rate
public BloomFilter(int m, int k)
Suitable values can be calculated with optimalK(double, int) and/or optimalM(double, double), depending on whether you want to provide the tolerable false positive probability or the exact size of the bloom filter.
m - The size of the bloom filter in bits.
k - The number of hash functions to use.
public BloomFilter(java.util.BitSet bloomFilter, int m, int k, BloomFilter.HashMethod hashMethod, java.lang.String hashFunctionName)
bloomFilter - the bit vector used to construct the bloom filter
k - the number of hash functions used
hashMethod - hash function type
hashFunctionName - name of the hash function to be used for the cryptographic HashMethod, i.e. MD2, MD5, SHA-1, SHA-256, SHA-384 or SHA-512
public static int optimalM(double n, double p)
n - Expected number of elements inserted in the bloom filter
p - Tolerable false positive rate
public static int optimalK(double n, int m)
n - Expected number of elements inserted in the bloom filter
m - The size of the bloom filter in bits.
public void setHashMethod(BloomFilter.HashMethod hashMethod)
clear() before being used again.
Possible hash methods are: setCryptographicHashFunction(String). It slices the digest into ranges of x bits, with 2^x > m, and performs rejection sampling on each slice. It is fast and very well distributed.
hashMethod - the method used to generate hash values
public void setCryptographicHashFunction(java.lang.String hashFunctionName)
clear() before being used again.
hashFunctionName - name of the hash function to be used, i.e. MD2, MD5, SHA-1, SHA-256, SHA-384 or SHA-512
public void setCusomtHashFunction(BloomFilter.CustomHashFunction chf)
clear() before being used again.
chf - the custom hash function
public boolean add(byte[] value)
public boolean add(T value)
public void addAll(java.util.Collection<T> values)
public void clear()
public boolean contains(byte[] value)
public boolean contains(T value)
public boolean containsAll(java.util.Collection<T> values)
public java.util.BitSet getBitSet()
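The cryptographic hash method described under setHashMethod slices a digest into ranges of x bits with 2^x > m and applies rejection sampling to each slice. The sketch below illustrates that idea in isolation. The class name SlicedDigestHashSketch, the bit order within a slice, and the salt byte used to obtain more bits once a digest is exhausted are all assumptions; it is not the library's implementation and will not reproduce the positions returned by hash(String).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class SlicedDigestHashSketch {

    /** Returns k positions in [0, m) derived from a message digest of value. */
    static int[] hash(byte[] value, int m, int k, String algorithm) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest: " + algorithm, e);
        }
        // Smallest x such that 2^x > m.
        int x = 32 - Integer.numberOfLeadingZeros(m);
        int[] positions = new int[k];
        int found = 0;
        byte salt = 0;
        while (found < k) {
            // Assumption: prefix a changing salt byte to get a fresh digest per round.
            md.update(salt++);
            byte[] digest = md.digest(value);
            int totalBits = digest.length * 8;
            for (int start = 0; start + x <= totalBits && found < k; start += x) {
                // Read x bits, most significant first, starting at bit offset 'start'.
                int slice = 0;
                for (int bit = start; bit < start + x; bit++) {
                    int b = (digest[bit / 8] >> (7 - bit % 8)) & 1;
                    slice = (slice << 1) | b;
                }
                // Rejection sampling: keep the slice only if it is a valid index.
                if (slice < m) {
                    positions[found++] = slice;
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(hash(value, 1000, 5, "MD5")));
    }
}
```

Rejecting out-of-range slices, rather than reducing them modulo m, avoids biasing the positions toward the low end of the bit vector when m is not a power of two.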
public int[] hash(java.lang.String value)
Dispatches to the hash function defined via setHashMethod(HashMethod) (default: cryptographic hash function).
value - the value to be hashed
public boolean union(BloomFilter<T> other)
other - the other bloom filter
public boolean intersect(BloomFilter<T> other)
other - the other bloom filter
public boolean isEmpty()
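The union and intersect operations above act on two compatible bloom filters. A minimal sketch of what such operations usually amount to on the underlying bit vectors is a bitwise OR and a bitwise AND. It assumes that "compatible" means both filters share the same m, k and hash function, which the documentation does not spell out; the class name BloomSetOpsSketch is hypothetical.

```java
import java.util.BitSet;

final class BloomSetOpsSketch {

    /** Bit vector of a filter representing the union: bitwise OR. */
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // leave the inputs untouched
        result.or(b);
        return result;
    }

    /** Bit vector approximating the intersection: bitwise AND. */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet(8);
        a.set(1);
        a.set(3);
        BitSet b = new BitSet(8);
        b.set(3);
        b.set(5);
        System.out.println(union(a, b));
        System.out.println(intersect(a, b));
    }
}
```

The OR of two bit vectors equals the filter that would result from inserting both element sets into one filter. The AND can leave bits set that a filter built directly from the intersection of the two sets would not have, so its false positive rate may be higher.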
public double getFalsePositiveProbability(int n)
Returns the probability of a false positive (approximated): (1 - e^(-k * insertedElements / m)) ^ k
n - The number of elements already inserted into the bloom filter
add(byte[]) operations
public double getBitsPerElement(int n)
n - The number of elements already inserted into the bloom filter
public double getBitZeroProbability(int n)
n - The number of elements already inserted into the bloom filter
add(byte[]) operations
public int size()
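The three estimates above (getFalsePositiveProbability, getBitsPerElement, getBitZeroProbability) can be sketched as plain arithmetic on m, k and n. The false positive formula is the one quoted in this documentation. The bit-zero formula (1 - 1/m)^(k * n) is the usual textbook estimate and is an assumption here, since the documentation does not state it. The class and method names are hypothetical.

```java
final class BloomProbabilitySketch {

    /** (1 - e^(-k * n / m)) ^ k, as quoted in the documentation. */
    static double falsePositiveProbability(int m, int k, int n) {
        return Math.pow(1 - Math.exp(-k * (double) n / m), k);
    }

    /** Assumed textbook estimate: (1 - 1/m) ^ (k * n). */
    static double bitZeroProbability(int m, int k, int n) {
        return Math.pow(1 - 1.0 / m, k * (double) n);
    }

    /** Bits of the filter available per element: m / n. */
    static double bitsPerElement(int m, int n) {
        return m / (double) n;
    }

    public static void main(String[] args) {
        int m = 9586, k = 7, n = 1000;
        System.out.println("false positive ~ " + falsePositiveProbability(m, k, n));
        System.out.println("bit is zero    ~ " + bitZeroProbability(m, k, n));
        System.out.println("bits/element   = " + bitsPerElement(m, n));
    }
}
```

With m = 9586, k = 7 and n = 1000, the false positive estimate comes out close to 1% and roughly half of the bits are expected to still be zero.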
public int getM()
public int getK()
public java.lang.String getCryptographicHashFunctionName()
public BloomFilter.HashMethod getHashMethod()
public java.lang.Object clone()
clone in class java.lang.Object
public int hashCode()
hashCode in class java.lang.Object
public boolean equals(java.lang.Object obj)
equals in class java.lang.Object
public java.lang.String toString()
toString in class java.lang.Object