com.twitter.common.stats
Class ReservoirSampler<T>

java.lang.Object
  extended by com.twitter.common.stats.ReservoirSampler<T>
Type Parameters:
T - Type of the sample

public class ReservoirSampler<T>
extends Object

An in-memory implementation of reservoir sampling, for drawing a fixed-size random sample from a population.

Several optimizations are possible. In particular, a more involved technique avoids drawing a random number for every item in the population. See "Random Sampling with a Reservoir", Vitter, 1985.

TODO (delip): Apply these optimizations if performance becomes a problem.
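
The basic, unoptimized technique described above can be sketched as follows. This is a minimal illustration of classic reservoir sampling (often called Algorithm R), not the source of this class: the name ReservoirSketch, its fields, and its argument checks are assumptions made for the example. It draws one random number per item once the reservoir is full, which is the cost the optimizations mentioned above aim to reduce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for illustration only; not the library's implementation.
class ReservoirSketch<T> {
  private final int numSamples;
  private final Random random;
  private final List<T> reservoir;
  private long seen = 0; // number of items offered so far

  ReservoirSketch(int numSamples, Random random) {
    if (numSamples < 0) {
      throw new IllegalArgumentException("numSamples must be non-negative");
    }
    if (random == null) {
      throw new NullPointerException("random");
    }
    this.numSamples = numSamples;
    this.random = random;
    this.reservoir = new ArrayList<T>(numSamples);
  }

  void sample(T item) {
    if (item == null) {
      throw new NullPointerException("item");
    }
    seen++;
    if (reservoir.size() < numSamples) {
      // The first numSamples items always enter the reservoir.
      reservoir.add(item);
    } else if (numSamples > 0) {
      // Pick a slot in [0, seen); the item is kept only if the slot falls
      // inside the reservoir, which happens with probability numSamples / seen.
      long slot = (long) (random.nextDouble() * seen);
      if (slot < numSamples) {
        reservoir.set((int) slot, item);
      }
    }
  }

  Iterable<T> getSamples() {
    // Read-only view; the order carries no meaning.
    return Collections.unmodifiableList(reservoir);
  }
}
```

With this sketch, offering 1000 items to a reservoir of size 10 leaves exactly 10 items in it, each of the 1000 having had the same chance of being retained.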


Constructor Summary
ReservoirSampler(int numSamples)
          Create a new sampler with the given reservoir size using the default random number generator.
ReservoirSampler(int numSamples, Random random)
          Create a new sampler with the given reservoir size using a supplied random number generator.
 
Method Summary
 Iterable<T> getSamples()
          Get samples collected in the reservoir.
 void sample(T item)
          Sample an item and store it in the reservoir if needed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ReservoirSampler

public ReservoirSampler(int numSamples,
                        Random random)
Create a new sampler with the given reservoir size using a supplied random number generator.

Parameters:
numSamples - Maximum number of samples to retain in the reservoir. Must be non-negative.
random - Instance of the random number generator to use for sampling

ReservoirSampler

public ReservoirSampler(int numSamples)
Create a new sampler with the given reservoir size using the default random number generator.

Parameters:
numSamples - Maximum number of samples to retain in the reservoir. Must be non-negative.

Method Detail

sample

public void sample(T item)
Sample an item and store it in the reservoir if needed.

Parameters:
item - The item to sample. Must not be null.

getSamples

public Iterable<T> getSamples()
Get samples collected in the reservoir.

Returns:
The samples currently held in the reservoir. The iteration order is not guaranteed.
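
Because the order of the returned samples is unspecified, a caller that needs stable output (for logging or test assertions, say) could copy the Iterable into a list and sort it. The helper below is a minimal sketch under that assumption; the class SampleOrdering and the method sortedCopy are hypothetical names and are not part of this library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SampleOrdering {
  // Hypothetical helper: copies any Iterable of comparable items into a
  // new list sorted in natural order, leaving the source untouched.
  static <T extends Comparable<? super T>> List<T> sortedCopy(Iterable<T> samples) {
    List<T> copy = new ArrayList<T>();
    for (T sample : samples) {
      copy.add(sample);
    }
    Collections.sort(copy);
    return copy;
  }
}
```

A call such as SampleOrdering.sortedCopy(sampler.getSamples()) would then give a deterministic ordering of whatever items the reservoir holds, provided T is comparable.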