com.twitter.summingbird.batch.state
Called when the batch is started. Note that the intersection is not necessarily the same as [startBatch..endBatch), since only part of the data may be available for the requested time range. It returns a batch token of type T, which is passed back for checkpointing when the job completes or fails.
Called when the Scalding job fails.
Called when planning fails, so the Scalding job cannot be started.
Called when the batches finish successfully.
Usually startBatch + numBatchesToRun. Used to calculate the requested time interval for the job.
Should be determined by checking the previous batch run. Used to calculate the requested time interval for the job.
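The relationship between startBatch, endBatch, and the intersection handed to the batch-start callback can be sketched as follows. This is a minimal illustration, not the Summingbird API: `TimeSpan`, `RequestedInterval`, and the fixed `batchMillis` batch size are hypothetical stand-ins, and batches are plain `Long` values rather than the library's own batch and interval types.

```scala
// Hypothetical stand-in for a time interval: half-open [startMs, endMs) in epoch millis.
final case class TimeSpan(startMs: Long, endMs: Long) {
  // Overlap of two spans, or None when they do not overlap.
  def intersect(other: TimeSpan): Option[TimeSpan] = {
    val s = math.max(startMs, other.startMs)
    val e = math.min(endMs, other.endMs)
    if (s < e) Some(TimeSpan(s, e)) else None
  }
}

object RequestedInterval {
  // Assumption: batch n covers [n * batchMillis, (n + 1) * batchMillis).
  def requested(startBatch: Long, numBatchesToRun: Int, batchMillis: Long): TimeSpan = {
    require(numBatchesToRun > 0, "numBatchesToRun must be positive")
    val endBatch = startBatch + numBatchesToRun // exclusive upper bound
    TimeSpan(startBatch * batchMillis, endBatch * batchMillis)
  }
}
```

For example, with startBatch 10, three batches to run, and 1000 ms batches, the requested span is [10000, 13000). If data is only available up to 12500, the intersection shrinks to [10000, 12500), which is why the span seen at batch start can be narrower than [startBatch..endBatch).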
To create an implementation of CheckpointState, first define a CheckpointStore class; see com.twitter.summingbird.batch.state.HDFSCheckpointStore for an example.
A subclass of CheckpointStore is responsible for determining the startBatch by checking the checkpoints of the previous batch run, and the endBatch from the number of batches the client asks to run.
The CheckpointStore should provide a concrete implementation of how to read the previous batch and how to checkpoint the current batch.
Type T is the token for each batch run, created by startBatch(). The token is then passed back to the CheckpointStore to checkpoint success or failure.
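The token lifecycle described above can be illustrated with a small in-memory store. This is only a sketch under simplifying assumptions: the trait `SimpleCheckpointStore`, its method names, the `RunToken` type, and `InMemoryCheckpointStore` are all hypothetical and are not taken from the Summingbird source. Batches are plain `Long` values, and here the token is created by the batch-start callback.

```scala
// Hypothetical, simplified trait. Method names are illustrative only.
trait SimpleCheckpointStore[T] {
  def startBatch: Long // derived from the previous run's checkpoint
  def endBatch: Long   // exclusive, usually startBatch + numBatchesToRun
  def onBatchStart(first: Long, until: Long): T // returns the run token
  def onSuccess(token: T): Unit
  def onFailure(token: T, err: Throwable): Unit
  def onPlanFailure(err: Throwable): Unit
}

// Token for one run: the batches [first, until) that are actually being processed.
final case class RunToken(first: Long, until: Long)

final class InMemoryCheckpointStore(initialBatch: Long, numBatchesToRun: Int)
    extends SimpleCheckpointStore[RunToken] {
  // Exclusive upper bound of the last successful run, if any.
  private var lastCompleted: Option[Long] = None
  // Recorded failure messages, newest first.
  var failures: List[String] = Nil

  def startBatch: Long = lastCompleted.getOrElse(initialBatch)
  def endBatch: Long = startBatch + numBatchesToRun

  def onBatchStart(first: Long, until: Long): RunToken = RunToken(first, until)

  // Only a successful run advances the checkpoint.
  def onSuccess(token: RunToken): Unit = {
    lastCompleted = Some(token.until)
  }

  def onFailure(token: RunToken, err: Throwable): Unit = {
    failures = s"run $token failed: ${err.getMessage}" :: failures
  }

  def onPlanFailure(err: Throwable): Unit = {
    failures = s"planning failed: ${err.getMessage}" :: failures
  }
}
```

In this sketch a failed run leaves startBatch unchanged, so the next run retries the same batches, while a successful run moves startBatch to the end of the range recorded in the token. A persistent store would keep the same shape but write the checkpoint to durable storage instead of a variable.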