Package com.twitter.scalding.typed

Type Members

  1. class BijectedSourceSink[T, U] extends TypedSource[U] with TypedSink[U]

  2. trait CoGroupable[K, +R] extends HasReducers with HasDescription with Serializable

    Represents something that can be CoGrouped with another CoGroupable.

  3. trait CoGrouped[K, +R] extends KeyedListLike[K, R, CoGrouped] with CoGroupable[K, R] with WithReducers[CoGrouped[K, R]] with WithDescription[CoGrouped[K, R]]

  4. abstract class CoGroupedJoiner[K] extends Joiner

  5. case class ComputedValue[T](toTypedPipe: TypedPipe[T]) extends ValuePipe[T] with Product with Serializable

  6. case class Converter[R](conv: TupleConverter[R]) extends FlatMapFn[R] with Product with Serializable

  7. class DistinctCoGroupJoiner[K] extends CoGroupedJoiner[K]

  8. case class FilteredFn[R](fmap: FlatMapFn[R], fn: (R) ⇒ Boolean) extends FlatMapFn[R] with Product with Serializable

  9. sealed trait FlatMapFn[+R] extends (TupleEntry) ⇒ TraversableOnce[R] with Serializable

    Closures are difficult for serialization. This class avoids that.

  10. case class FlatMappedFn[T, R](fmap: FlatMapFn[T], fn: (T) ⇒ TraversableOnce[R]) extends FlatMapFn[R] with Product with Serializable

  11. trait Grouped[K, +V] extends KeyedListLike[K, V, UnsortedGrouped] with HashJoinable[K, V] with Sortable[V, [+x]SortedGrouped[K, x] with Reversable[SortedGrouped[K, x]]] with WithReducers[Grouped[K, V]] with WithDescription[Grouped[K, V]]

    This encodes the rules that 1) sorting is only possible before doing any reduce, 2) reversing is only possible after sorting, and 3) unsorted Groups can be CoGrouped or HashJoined.

    This may appear to be a complex type, but it makes sure that code won't compile if it breaks these rules.
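
    The way return types can enforce such ordering rules can be sketched in plain Scala. The names Unsorted, Sorted, Reversed and Reduced below are hypothetical stand-ins, not Scalding types, and rule 3 (joins) is not modeled; this is only a minimal illustration of the technique, under those assumptions.

```scala
// Hypothetical miniature of the typestate idea; none of these names are Scalding APIs.
object GroupedRulesSketch {
  // State before any reduce: sorting and reducing are both offered.
  final case class Unsorted[K, V](data: Map[K, List[V]]) {
    def sortValues(implicit ord: Ordering[V]): Sorted[K, V] =
      Sorted(data.map { case (k, vs) => k -> vs.sorted })
    def sumValues(implicit num: Numeric[V]): Reduced[K, V] =
      Reduced(data.map { case (k, vs) => k -> vs.sum })
  }

  // State right after sorting: reverse is offered here and nowhere else.
  final case class Sorted[K, V](data: Map[K, List[V]]) {
    def reverse: Reversed[K, V] =
      Reversed(data.map { case (k, vs) => k -> vs.reverse })
  }

  // After reversing, neither sortValues nor reverse is available.
  final case class Reversed[K, V](data: Map[K, List[V]])

  // After a reduce there is no sortValues method, so sorting cannot follow it.
  final case class Reduced[K, V](data: Map[K, V])

  val grouped = Unsorted(Map("a" -> List(3, 1, 2), "b" -> List(5, 4)))
  val descending = grouped.sortValues.reverse
  val totals = grouped.sumValues
  // grouped.sumValues.sortValues  // would not compile: Reduced has no sortValues
  // grouped.reverse               // would not compile: Unsorted has no reverse
}
```

    Each illegal call order is rejected because the method simply does not exist on the type returned by the previous step.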

  12. trait HasDescription extends AnyRef

    Used for objects that may have a description set to be used in .dot and MR step names.

  13. trait HasReducers extends AnyRef

    Used for types that may know how many reducers they need, e.g. CoGrouped, Grouped, SortedGrouped, UnsortedGrouped.

  14. sealed trait HashEqualsArrayWrapper[T] extends AnyRef

  15. final class HashEqualsBooleanArrayWrapper extends HashEqualsArrayWrapper[Boolean]

  16. final class HashEqualsByteArrayWrapper extends HashEqualsArrayWrapper[Byte]

  17. final class HashEqualsCharArrayWrapper extends HashEqualsArrayWrapper[Char]

  18. final class HashEqualsDoubleArrayWrapper extends HashEqualsArrayWrapper[Double]

  19. final class HashEqualsFloatArrayWrapper extends HashEqualsArrayWrapper[Float]

  20. final class HashEqualsIntArrayWrapper extends HashEqualsArrayWrapper[Int]

  21. final class HashEqualsLongArrayWrapper extends HashEqualsArrayWrapper[Long]

  22. final class HashEqualsObjectArrayWrapper[T] extends HashEqualsArrayWrapper[T]

  23. final class HashEqualsShortArrayWrapper extends HashEqualsArrayWrapper[Short]

  24. trait HashJoinable[K, +V] extends CoGroupable[K, V] with KeyedPipe[K]

    If we can HashJoin, then we can CoGroup, but not vice versa; i.e., HashJoinable is a strict subset of CoGroupable (CoGrouped, for instance, is CoGroupable but not HashJoinable).

  25. class HashJoiner[K, V, W, R] extends Joiner

    Only intended to be used to implement hashCogroup on TypedPipe/Grouped.

  26. case class IdentityReduce[K, V1](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with Grouped[K, V1] with Product with Serializable

  27. case class IdentityValueSortedReduce[K, V1](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], valueSort: Ordering[_ >: V1], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with SortedGrouped[K, V1] with Reversable[IdentityValueSortedReduce[K, V1]] with Product with Serializable

  28. final case class IterablePipe[T](iterable: Iterable[T]) extends TypedPipe[T] with Product with Serializable

    Creates a TypedPipe from an Iterable[T]. Prefer TypedPipe.from.

    If you avoid toPipe, this class is more efficient than IterableSource.

  29. case class IteratorMappedReduce[K, V1, V2](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reduceFn: (K, Iterator[V1]) ⇒ Iterator[V2], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with UnsortedGrouped[K, V2] with Product with Serializable

  30. trait KeyedList[K, +T] extends KeyedListLike[K, T, KeyedList]

    This is for the case where you don't want to expose any structure but the ability to operate on an iterator of the values

  31. trait KeyedListLike[K, +T, +This[K, +T] <: KeyedListLike[K, T, This]] extends Serializable

    Represents sharded lists of items of type T. There are exactly two fundamental operations:

    toTypedPipe: marks the end of the grouped-on-key operations.
    mapValueStream: further transforms all values, in order, one at a time, with a function from Iterator to another Iterator.
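
    The two operations can be modeled on in-memory collections. The Keyed class and its toPairs method below are hypothetical (toPairs plays the role of toTypedPipe); this is a sketch of the semantics only, not how Scalding executes them.

```scala
// Plain-collections model of the two operations; not the Scalding implementation.
object KeyedListSketch {
  final case class Keyed[K, V](groups: Map[K, List[V]]) {
    // Transform each key's values through an Iterator => Iterator function.
    def mapValueStream[U](fn: Iterator[V] => Iterator[U]): Keyed[K, U] =
      Keyed(groups.map { case (k, vs) => k -> fn(vs.iterator).toList })

    // Leave the keyed world: flatten back to (key, value) pairs.
    def toPairs: List[(K, V)] =
      groups.toList.flatMap { case (k, vs) => vs.map(v => (k, v)) }
  }

  val keyed = Keyed(Map("a" -> List(1, 2, 3), "b" -> List(10)))
  // Keep only the first two values of every key.
  val firstTwo = keyed.mapValueStream(_.take(2))
}
```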

  32. trait KeyedPipe[K] extends AnyRef

    Represents anything that starts as a TypedPipe of (key, value) pairs, where the value type has been erased. Acts as proof that the K in the tuple has an Ordering.

  33. case class LiteralValue[T](value: T) extends ValuePipe[T] with Product with Serializable

  34. case class MapFn[T, R](fmap: FlatMapFn[T], fn: (T) ⇒ R) extends FlatMapFn[R] with Product with Serializable

  35. class MappablePipeJoinEnrichment[T] extends AnyRef

    This class is for the syntax enrichment enabling .joinBy on TypedPipes. To access this, do import Syntax.joinOnMappablePipe

  36. class MemorySink[T] extends TypedSink[T]

  37. final case class MergedTypedPipe[T](left: TypedPipe[T], right: TypedPipe[T]) extends TypedPipe[T] with Product with Serializable

  38. trait MustHaveReducers extends HasReducers

    Used for types that must know how many reducers they need, e.g. Sketched.

  39. sealed trait NoStackAndThen[-A, +B] extends Serializable

    This type is used to implement .andThen on a function in a way that will never blow up the stack. This is done to prevent deep scalding TypedPipe pipelines from blowing the stack.

    This may be slow, but it is used in scalding at planning time.
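
    One way to get a stack-safe andThen is to keep the composed steps as data and apply them in a loop. The Chain class below is a hypothetical illustration of that idea and makes no claim about how NoStackAndThen is actually implemented.

```scala
// Hypothetical stand-in for the idea of a stack-safe andThen; not Scalding's code.
object StackSafeAndThenSketch {
  // Holds the composed steps as a list (most recently added first)
  // instead of as nested closures.
  final class Chain[A, B] private (stepsReversed: List[Any => Any]) {
    def andThen[C](next: B => C): Chain[A, C] =
      new Chain[A, C](next.asInstanceOf[Any => Any] :: stepsReversed)

    // Apply the steps one after another in a loop, so the call stack
    // does not grow with the number of composed functions.
    def apply(a: A): B = {
      var acc: Any = a
      stepsReversed.reverse.foreach(step => acc = step(acc))
      acc.asInstanceOf[B]
    }
  }

  object Chain {
    def apply[A, B](first: A => B): Chain[A, B] =
      new Chain[A, B](List(first.asInstanceOf[Any => Any]))
  }

  val inc: Int => Int = _ + 1
  // One initial step plus 200000 more composed on top of it.
  val big = (1 to 200000).foldLeft(Chain(inc))((chain, _) => chain.andThen(inc))
  val result = big(0)
}
```

    Composing the same number of steps with Function1.andThen nests each call inside the next and can overflow the stack when applied; the list-based version trades some speed for constant stack depth.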

  40. trait PartitionSchemed[P, T] extends SchemedSource with TypedSink[(P, T)] with Mappable[(P, T)] with HfsTapProvider

    Trait to assist with creating partitioned sources.

    Apart from the abstract members below, hdfsScheme and localScheme also need to be set. Note that for both of them the sink fields need to be set to only include the actual fields that should be written to file and not the partition fields.

  41. trait PartitionedDelimited extends Serializable

    Trait to assist with creating objects such as PartitionedTsv to read from separated files. Override separator, skipHeader, writeHeader as needed.

  42. case class PartitionedDelimitedSource[P, T](path: String, template: String, separator: String, fields: Fields, skipHeader: Boolean = false, writeHeader: Boolean = false, quote: String = "\"", strict: Boolean = true, safe: Boolean = true)(implicit mt: Manifest[T], valueSetter: TupleSetter[T], valueConverter: TupleConverter[T], partitionSetter: TupleSetter[P], partitionConverter: TupleConverter[P]) extends SchemedSource with PartitionSchemed[P, T] with Serializable with Product with Serializable

    Scalding source to read or write partitioned delimited text.

    For writing it expects a pair of (P, T), where P is the data used for partitioning and T is the output to write out. Below is an example.

    val data = List(
      (("a", "x"), ("i", 1)),
      (("a", "y"), ("j", 2)),
      (("b", "z"), ("k", 3))
    )
    // partition values fill the %s placeholders of the template below args("out")
    TypedPipe.from(data)
      .write(PartitionedTsv[(String, String), (String, Int)](args("out"), "col1=%s/col2=%s"))

    For reading it produces a pair (P, T) where P is the partition data and T is data in the files. Below is an example.

    val in: TypedPipe[((String, String), (String, Int))] =
      TypedPipe.from(PartitionedTsv[(String, String), (String, Int)](args("in"), "col1=%s/col2=%s"))
  43. case class PartitionedTextLine[P](path: String, template: String, encoding: String = TextLine.DEFAULT_CHARSET)(implicit valueSetter: TupleSetter[String], valueConverter: TupleConverter[(Long, String)], partitionSetter: TupleSetter[P], partitionConverter: TupleConverter[P]) extends SchemedSource with TypedSink[(P, String)] with Mappable[(P, (Long, String))] with HfsTapProvider with Serializable with Product with Serializable

    Scalding source to read or write partitioned text.

    For writing it expects a pair of (P, String), where P is the data used for partitioning and String is the output to write out. Below is an example.

    val data = List(
      (("a", "x"), "line1"),
      (("a", "y"), "line2"),
      (("b", "z"), "line3")
    )
    // partition values fill the %s placeholders of the template below args("out")
    TypedPipe.from(data)
      .write(PartitionedTextLine[(String, String)](args("out"), "col1=%s/col2=%s"))

    For reading it produces a pair (P, (Long, String)) where P is the partition data, Long is the offset into the file and String is a line from the file. Below is an example.

    val in: TypedPipe[((String, String), (Long, String))] =
      TypedPipe.from(PartitionedTextLine[(String, String)](args("in"), "col1=%s/col2=%s"))

    path: Base path of the partitioned directory
    template: Template for the partitioned path
    encoding: Text encoding of the file content

  44. class PipeTExtensions extends Serializable

  45. sealed trait ReduceStep[K, V1] extends KeyedPipe[K]

    This is a class that models the logical portion of the reduce step. Details like where this occurs, the number of reducers, etc. are left in the Grouped class.

  46. trait Reversable[+R] extends AnyRef

  47. case class SketchJoined[K, V, V2, R](left: Sketched[K, V], right: TypedPipe[(K, V2)], numReducers: Int)(joiner: (K, V, Iterable[V2]) ⇒ Iterator[R])(implicit evidence$1: Ordering[K]) extends MustHaveReducers with Product with Serializable

  48. case class Sketched[K, V](pipe: TypedPipe[(K, V)], numReducers: Int, delta: Double, eps: Double, seed: Int)(implicit serialization: (K) ⇒ Array[Byte], ordering: Ordering[K]) extends MustHaveReducers with Product with Serializable

    This class is generally only created by users with the TypedPipe.sketch method

  49. trait Sortable[+T, +Sorted[+_]] extends AnyRef

    All sorting methods defined here trigger Hadoop secondary sort on key + value. Hadoop secondary sort is an external sort, i.e. it won't materialize all values of each key in memory on the reducer.

  50. trait SortedGrouped[K, +V] extends KeyedListLike[K, V, SortedGrouped] with WithReducers[SortedGrouped[K, V]] with WithDescription[SortedGrouped[K, V]]

    After sorting, we are no longer CoGroupable, and we can only call reverse in the initial SortedGrouped created from the Sortable: .sortBy(_._2).reverse for instance

    Once we have sorted, we cannot do a HashJoin or a CoGrouping

  51. case class TemplatePartition(partitionFields: Fields, template: String) extends Partition with Product with Serializable

    Creates a partition using the given template string.

    The template string needs to have %s as placeholder for a given field.
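
    The effect of the %s placeholders can be shown with ordinary string formatting. The helper partitionPath is hypothetical, and how Scalding applies the template internally is not stated here and is not modeled; this only illustrates the shape of the resulting path.

```scala
// Plain-Scala illustration of %s template expansion; not Scalding's code.
object TemplateSketch {
  val template = "col1=%s/col2=%s"

  // Each %s is filled, in order, by one partition field value.
  def partitionPath(col1: String, col2: String): String =
    template.format(col1, col2)

  val path = partitionPath("a", "x") // "col1=a/col2=x"
}
```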

  52. trait TypedPipe[+T] extends Serializable

    Think of a TypedPipe as a distributed unordered list that may or may not yet have been materialized in memory or disk.

    Represents a phase in a distributed computation on an input data source. Wraps a cascading Pipe object, and holds the transformation done up until that point.

  53. class TypedPipeFactory[T] extends TypedPipe[T]

    This is a TypedPipe that delays having access to the FlowDef and Mode until toPipe is called

  54. class TypedPipeInst[T] extends TypedPipe[T]

    This is an instance of a TypedPipe that wraps a cascading Pipe

  55. trait TypedSink[-T] extends Serializable

    Opposite of TypedSource; used as a destination to write into.

  56. trait TypedSource[+T] extends Serializable

  57. trait UnsortedGrouped[K, +V] extends KeyedListLike[K, V, UnsortedGrouped] with HashJoinable[K, V] with WithReducers[UnsortedGrouped[K, V]] with WithDescription[UnsortedGrouped[K, V]]

    This is the state after we have done some reducing. It is not possible to sort at this phase, but it is possible to do a CoGrouping or a HashJoin.

  58. case class UnsortedIdentityReduce[K, V1](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with UnsortedGrouped[K, V1] with Product with Serializable

  59. sealed trait ValuePipe[+T] extends Serializable

    ValuePipe is a special case of a TypedPipe with just an optional single element. It is like a distributed Option type. It allows scalar-based operations on pipes, such as normalization.
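
    The normalization use case can be sketched with an ordinary Option standing in for the single optional element. This is a plain-collections analogy with hypothetical names, not the ValuePipe API.

```scala
// Option used as an in-memory analogy for a single optional scalar.
object ValuePipeSketch {
  val values = List(1.0, 3.0, 4.0)

  // The "single optional element": None when there is nothing to sum.
  val total: Option[Double] = values.reduceOption(_ + _)

  // Every element is combined with the optional scalar; when the scalar
  // is absent the element is left unchanged.
  val normalized: List[Double] = values.map(v => total.fold(v)(t => v / t))
}
```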

  60. case class ValueSortedReduce[K, V1, V2](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], valueSort: Ordering[_ >: V1], reduceFn: (K, Iterator[V1]) ⇒ Iterator[V2], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with SortedGrouped[K, V2] with Product with Serializable

  61. trait WithDescription[+This <: WithDescription[This]] extends HasDescription

    Used for objects that may _set_ a description to be used in .dot and MR step names.

  62. case class WithDescriptionTypedPipe[T](typedPipe: TypedPipe[T], description: String) extends TypedPipe[T] with Product with Serializable

  63. case class WithOnComplete[T](typedPipe: TypedPipe[T], fn: () ⇒ Unit) extends TypedPipe[T] with Product with Serializable

  64. trait WithReducers[+This <: WithReducers[This]] extends HasReducers

    Used for objects that may _set_ how many reducers they need, e.g. CoGrouped, Grouped, SortedGrouped, UnsortedGrouped.

Value Members

  1. object BijectedSourceSink extends Serializable

  2. object CoGroupable extends Serializable

  3. object CoGrouped extends Serializable

  4. object CumulativeSum

    Extension for TypedPipe to add a cumulativeSum method. Given a TypedPipe with T = (GroupField, (SortField, SummableField)), cumulativeSum will return a SortedGrouped with the SummableField accumulated according to the sort field. For example:

    ('San Francisco', (100, 100)), ('San Francisco', (101, 50)), ('San Francisco', (200, 200)),
    ('Vancouver', (100, 50)), ('Vancouver', (101, 300)), ('Vancouver', (200, 100))

    becomes

    ('San Francisco', (100, 100)), ('San Francisco', (101, 150)), ('San Francisco', (200, 350)),
    ('Vancouver', (100, 50)), ('Vancouver', (101, 350)), ('Vancouver', (200, 450))

    If you provide cumulativeSum a partition function you get the same result but you allow for more than one reducer per group. This is useful for when you have a single group that has a very large number of entries. For example in the previous example if you gave a partition function of the form { _ / 100 } then you would never have any one reducer deal with more than 2 entries.
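
    The running-total semantics can be reproduced on an in-memory list. The function below is a hypothetical plain-Scala model of the result, not the CumulativeSum extension itself, and it ignores the partition-function variant.

```scala
// In-memory model of a per-group running total ordered by the sort field.
object CumulativeSumSketch {
  // Input and output share the shape (group, (sortField, summable));
  // in the output the summable is replaced by its running total.
  def cumulativeSum(rows: List[(String, (Int, Int))]): List[(String, (Int, Int))] =
    rows.groupBy(_._1).toList.sortBy(_._1).flatMap { case (group, entries) =>
      val sorted = entries.map(_._2).sortBy(_._1)
      // scanLeft yields the totals preceded by the initial 0, which tail drops.
      val running = sorted.map(_._2).scanLeft(0)(_ + _).tail
      sorted.map(_._1).zip(running).map { case (sortField, total) =>
        (group, (sortField, total))
      }
    }

  val input = List(
    ("San Francisco", (100, 100)), ("San Francisco", (101, 50)), ("San Francisco", (200, 200)),
    ("Vancouver", (100, 50)), ("Vancouver", (101, 300)), ("Vancouver", (200, 100))
  )
  val output = cumulativeSum(input)
}
```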

  5. object Empty extends FlatMapFn[Nothing] with Product with Serializable

  6. object EmptyTypedPipe extends TypedPipe[Nothing] with Product with Serializable

    This object is the EmptyTypedPipe. Prefer to create it with TypedPipe.empty

  7. object EmptyValue extends ValuePipe[Nothing] with Product with Serializable

  8. object FlattenGroup

    Autogenerated methods for flattening the nested value tuples that result after joining many pipes together. These methods can be used directly, or via the joins available in MultiJoin.
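
    What flattening means for a three-way join can be shown on plain tuples. The helper flatten3 is a hypothetical name used for illustration; it is not claimed to match any generated method's name or signature.

```scala
// Plain-Scala illustration of flattening a nested join result.
object FlattenSketch {
  // Joining three inputs pairwise nests the values as ((a, b), c).
  def flatten3[A, B, C](nested: ((A, B), C)): (A, B, C) = {
    val ((a, b), c) = nested
    (a, b, c)
  }

  val joined: Map[String, ((Int, String), Boolean)] = Map("k" -> ((1, "x"), true))
  val flat: Map[String, (Int, String, Boolean)] =
    joined.map { case (k, v) => k -> flatten3(v) }
}
```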

  9. object Grouped extends Serializable

  10. object HashEqualsArrayWrapper

  11. object Joiner extends Serializable

  12. object KeyedListLike extends Serializable

  13. object LookupJoin extends Serializable

    lookupJoin simulates the behavior of a realtime system attempting to leftJoin (K, V) pairs against some other value type (JoinedV) by performing realtime lookups on a key-value Store.

    An example would join (K, V) pairs of (URL, Username) against a service of (URL, ImpressionCount). The result of this join would be a pipe of (URL, (Username, Option[ImpressionCount])).

    To simulate this behavior, lookupJoin accepts pipes of key-value pairs with an explicit time value T attached. T must have some sensible ordering. The semantics are: if one were to hit the right pipe's simulated realtime service at any time between T(tuple) and T(tuple + 1), one would receive Some((K, JoinedV)(tuple)).

    The entries in the left pipe's tuples have the following meaning:

    T: the time at which the (K, V) lookup occurred.
    K: the join key.
    V: the current value for the join key.

    The right pipe's entries have the following meaning:

    T: the time at which the "service" was fed an update.
    K: the join key.
    JoinedV: the value of the key at time T.

    Before the time T in the right pipe's very first entry, the simulated "service" will return None. After this time T, the right side will return None only if the key is absent; otherwise, the service will return Some(joinedV).
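
    These semantics can be modeled on in-memory lists. The function below is a hypothetical sketch, not the LookupJoin implementation. One assumption is made loudly: an update whose time equals the lookup time is treated as visible, which the description above does not specify.

```scala
// In-memory model of "latest update at or before the lookup time".
object LookupJoinSketch {
  // left:  (time, (key, value)) lookups
  // right: (time, (key, joinedValue)) updates fed to the simulated service
  def lookupJoin[T, K, V, J](
      left: List[(T, (K, V))],
      right: List[(T, (K, J))]
  )(implicit ord: Ordering[T]): List[(T, (K, (V, Option[J])))] =
    left.map { case (t, (k, v)) =>
      // Updates for this key visible at time t (tie handling is an assumption).
      val visible = right.filter { case (rt, (rk, _)) => rk == k && ord.lteq(rt, t) }
      val latest = if (visible.isEmpty) None else Some(visible.maxBy(_._1)._2._2)
      (t, (k, (v, latest)))
    }

  val lookups = List((5, ("url1", "alice")), (15, ("url1", "bob")), (25, ("url2", "carol")))
  val updates = List((10, ("url1", 3)), (20, ("url1", 7)))
  val joined = lookupJoin(lookups, updates)
}
```

    The lookup at time 5 precedes the first update and sees None, the lookup at time 15 sees the update made at time 10, and the key with no updates always sees None.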

  14. object MultiJoin extends Serializable

    This is an autogenerated object which gives you easy access to doing N-way joins so the types are cleaner. However, it just calls the underlying methods on CoGroupable and flattens the resulting tuple

  15. object NoStackAndThen extends Serializable

  16. object PartitionUtil

    Utility functions to assist with creating partitioned sources.

  17. object PartitionedCsv extends PartitionedDelimited

    Partitioned typed comma separated source.

  18. object PartitionedOsv extends PartitionedDelimited

    Partitioned typed \1 separated source (commonly used by Pig).

  19. object PartitionedPsv extends PartitionedDelimited

    Partitioned typed pipe separated source.

  20. object PartitionedTsv extends PartitionedDelimited

    Partitioned typed tab separated source.

  21. object SketchJoined extends Serializable

  22. object Syntax

    These are named syntax extensions that users can optionally import. Avoid import Syntax._

  23. object TDsl extends Serializable with GeneratedTupleAdders

    Implicits for the type-safe DSL. import TDsl._ to get the implicit conversions from Grouping/CoGrouping to Pipe, the .toTypedPipe method on standard cascading Pipes, and automatic conversion of Mappable[T] to TypedPipe[T].

  24. object TypedPipe extends Serializable

    factory methods for TypedPipe, which is the typed representation of distributed lists in scalding. This object is here rather than in the typed package because a lot of code was written using the functions in the object, which we do not see how to hide with package object tricks.

  25. object TypedPipeDiff

    Some methods for comparing two typed pipes and finding out the difference between them.

    Has support for the normal case where the typed pipes are pipes of objects usable as keys in scalding (have an ordering, proper equals and hashCode), as well as some special cases for dealing with Arrays and thrift objects.

    See diffByHashCode for comparing typed pipes of objects that have no ordering but a stable hash code (such as Scrooge thrift).

    See diffByGroup for comparing typed pipes of objects that have no ordering *and* an unstable hash code.
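
    The normal case (elements with proper equals and hashCode) can be modeled by counting occurrences on each side. The function below is a hypothetical in-memory sketch; the result shape, an element paired with its left and right counts, is an assumption made for illustration and is not taken from the TypedPipeDiff signatures.

```scala
// In-memory model of a multiset difference between two collections.
object DiffSketch {
  // Count occurrences on each side; keep only elements whose counts differ.
  def diff[T](left: List[T], right: List[T]): Map[T, (Long, Long)] = {
    def counts(xs: List[T]): Map[T, Long] =
      xs.groupBy(identity).map { case (x, same) => x -> same.size.toLong }
    val l = counts(left)
    val r = counts(right)
    (l.keySet ++ r.keySet).iterator
      .map(x => x -> ((l.getOrElse(x, 0L), r.getOrElse(x, 0L))))
      .filter { case (_, (lc, rc)) => lc != rc }
      .toMap
  }

  // "d" appears once on each side, so it is not part of the difference.
  val result = diff(List("a", "b", "b", "d"), List("b", "d", "e"))
}
```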

  26. object TypedPipeFactory extends Serializable

    This is an implementation detail (and should be marked private)

  27. object TypedSink extends Serializable

  28. object ValuePipe extends Serializable
