com.twitter.summingbird.scalding.store

DirectoryBatchedStore

class DirectoryBatchedStore[K <: Writable, V <: Writable] extends BatchedStore[K, V]

A directory-batched Scalding store whose data files contain only (K, V) pairs. Batch information is encoded in the directory paths rather than in the data itself.
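
To make the idea of keeping batch information in directory paths concrete, here is a minimal, library-free sketch. It is not the store's actual implementation: the layout (one subdirectory per batch, named by its numeric id), the `BatchId` stand-in type, and the helper names `batchDir`, `parseBatches`, and `lastBefore` are all assumptions made for illustration. The "strictly less than" lookup mirrors the contract documented for readLast.

```scala
import scala.util.Try

object DirectoryLayoutSketch {
  // Hypothetical stand-in for Summingbird's BatchID; only the numeric id matters here.
  final case class BatchId(id: Long)

  // ASSUMPTION: each batch lives in its own subdirectory named by the batch id.
  def batchDir(rootPath: String, batch: BatchId): String =
    s"${rootPath.stripSuffix("/")}/${batch.id}"

  // Recover batch ids from directory names, skipping non-numeric names.
  def parseBatches(dirNames: Seq[String]): Seq[BatchId] =
    dirNames.flatMap(name => Try(name.toLong).toOption).map(BatchId(_))

  // Newest batch strictly below the exclusive upper bound, if there is one.
  def lastBefore(dirNames: Seq[String], exclusiveUB: BatchId): Option[BatchId] = {
    val candidates = parseBatches(dirNames).filter(_.id < exclusiveUB.id)
    if (candidates.isEmpty) None else Some(candidates.maxBy(_.id))
  }
}
```

Because the records hold only keys and values, a reader under this assumed layout learns which batch a file belongs to purely from where the file sits.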

Source
DirectoryBatchedStore.scala
Linear Supertypes
BatchedStore[K, V], Store[K, V], Serializable, AnyRef, Any

Instance Constructors

  1. new DirectoryBatchedStore(rootPath: String)(implicit inBatcher: Batcher, ord: Ordering[K], tset: TupleSetter[(K, V)], tconv: TupleConverter[(K, V)])

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val batcher: Batcher

    The batcher for this store

    Definition Classes
    DirectoryBatchedStore → BatchedStore
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def getFileStatus(p: String, conf: Configuration): (Boolean, Timestamp, String)

    Attributes
    protected
  14. def getLastBatchID(exclusiveUB: BatchID, mode: Mode): BatchID

    Attributes
    protected
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. final def merge(delta: PipeFactory[(K, V)], sg: Semigroup[V], commutativity: Commutativity, reducers: Int): PipeFactory[(K, (Option[V], V))]

    Instances of this trait MAY NOT change the logic here. It always follows the same rule: first look for existing data (in which case no deltas are read); otherwise fall back to the last checkpointed output by calling readLast, and compute the results by rolling forward from there.

    Definition Classes
    BatchedStore → Store
  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  21. val ordering: Ordering[K]

    Definition Classes
    DirectoryBatchedStore → BatchedStore
  22. def partialMerge[K1](delta: PipeFactory[(K1, V)], sg: Semigroup[V], commutativity: Commutativity): PipeFactory[(K1, V)]

    For each batch, combines values that share a key on the map side, before the keys are expanded.

    Definition Classes
    BatchedStore → Store
  23. final def planReadLast: PlannerOutput[(BatchID, FlowProducer[TypedPipe[(K, V)]])]

    The monadic version of readLast; it returns the BatchID actually on disk.

    Definition Classes
    BatchedStore
  24. def pruning: PrunedSpace[(K, V)]

    Override this to set up store pruning; by default, no (key, value) pairs are pruned. This is a housekeeping function that permanently removes entries matching a criterion.

    Definition Classes
    BatchedStore
  25. final def readAfterLastBatch[T](input: PipeFactory[T]): PlannerOutput[(BatchID, FlowProducer[TypedPipe[(K, V)]], FlowToPipe[T])]

    Reads the input data after the last batch written.

    Returns:
      - the BatchID of the last batch written
      - the snapshot of the store just before this state
      - the data from this input covering all the time SINCE the last snapshot

    Definition Classes
    BatchedStore
  26. def readDeltaLog(delta: PipeFactory[(K, V)]): PipeFactory[(K, V)]

    Combines the current inputs with the last checkpoint on disk to get a log of all deltas, each with a timestamp. This is useful for a leftJoin against a store.

    TODO: This should not be limited to batch boundaries. The batch store should only handle writing the data for full batches, but we can materialize more data if it is needed downstream.

    Note: the returned time interval does NOT include the time of the snapshot data point (which is exactly 1 millisecond before the start of the interval).

    Definition Classes
    BatchedStore
  27. def readLast(exclusiveUB: BatchID, mode: Mode): Right[Nothing, (BatchID, Reader[(FlowDef, Mode), TypedPipe[(K, V)]])]

    Gets the most recent "last" batch together with its ID, which is strictly less than the input ID. The "last" is the stream containing only the newest value for each key within the batch. Combining the last from batchID with the deltas from batchID.next gives the stream for batchID.next.

    Definition Classes
    DirectoryBatchedStore → BatchedStore
  28. val rootPath: String

  29. def select(b: List[BatchID]): List[BatchID]

    Override select if you don't want to materialize every batch. Note that select MUST return a list containing the final batch in the supplied list; otherwise data would be lost.

    Definition Classes
    BatchedStore
  30. def sumByBatches[K1, V](ins: TypedPipe[(Timestamp, (K1, V))], capturedBatcher: Batcher, commutativity: Commutativity)(implicit arg0: Semigroup[V]): TypedPipe[((K1, BatchID), (Timestamp, V))]

    Attributes
    protected
    Definition Classes
    BatchedStore
  31. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  32. final def timeSpanToBatches: PlannerOutput[List[BatchID]]

    Gives the batches needed to cover the requested input. The result is always non-empty.

    Definition Classes
    BatchedStore
  33. def toString(): String

    Definition Classes
    AnyRef → Any
  34. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. def withInitialBatch(firstNonZero: BatchID): BatchedStore[K, V]

    For batch (firstNonZero - 1), reads return empty. For all earlier batches, reads fail with an error. For all later batches, reads are proxied to this store. On write, an exception is thrown if batchID is less than firstNonZero.

    Definition Classes
    BatchedStore
  38. def writeLast(batchID: BatchID, lastVals: TypedPipe[(K, V)])(implicit flowDef: FlowDef, mode: Mode): Unit

    Records a computed batch of data.

    Definition Classes
    DirectoryBatchedStore → BatchedStore
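
The contract stated for select above (the returned list MUST contain the final batch of the supplied list) is easy to violate when thinning out batches. The following is a small, library-free sketch of a selection rule that respects it. The `BatchId` case class is a hypothetical stand-in for Summingbird's BatchID, and `everyNth` is an illustrative helper, not part of the library; a real override would apply the same logic to List[BatchID].

```scala
object SelectSketch {
  // Hypothetical stand-in for Summingbird's BatchID.
  final case class BatchId(id: Long)

  // Keep batches whose id is a multiple of n, and always keep the final
  // batch of the supplied list so that no data is lost.
  def everyNth(n: Int)(batches: List[BatchId]): List[BatchId] = {
    require(n > 0, "n must be positive")
    batches match {
      case Nil => Nil
      case _ =>
        val last = batches.last
        batches.filter(b => b.id % n == 0 || b == last)
    }
  }
}
```

For example, applying `everyNth(3)` to batches 4 through 8 keeps batch 6 (a multiple of 3) and batch 8 (the final batch), and drops the rest.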
