Class/Object

com.twitter.scalding

RichPipe

Related Docs: object RichPipe | package scalding

Permalink

class RichPipe extends Serializable with JoinAlgorithms

This is an enrichment-pattern class for cascading.pipe.Pipe. By convention, it is never used directly in input or return types; it exists only to add methods to Pipe.
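The methods below are typically reached through the implicit conversion from Pipe to RichPipe provided by the scalding Dsl (mixed into Job). A minimal sketch of a Job using a few of them (the class name, field names, and paths are illustrative only):

    import com.twitter.scalding._

    class ScoreCountJob(args: Args) extends Job(args) {
      // Tsv(...).read yields a cascading Pipe; the Dsl implicits (in scope inside a Job)
      // enrich it to RichPipe, so filter/groupBy/write can be called on it directly.
      Tsv(args("input"), ('user, 'score)).read
        .filter('score) { s: Int => s > 0 }
        .groupBy('user) { _.size('count) }
        .write(Tsv(args("output")))
    }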

Source
RichPipe.scala
Linear Supertypes
JoinAlgorithms, Serializable, AnyRef, Any

Instance Constructors

  1. new RichPipe(pipe: Pipe)

    Permalink

Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. def ++(that: Pipe): Pipe

    Permalink

    Merge or concatenate several pipes together with this one.
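    For instance (a minimal sketch; assumes thisYearEvents and lastYearEvents are pipes with the same field layout):

    val allEvents = thisYearEvents ++ lastYearEvents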

  4. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  5. def addTrap(trapsource: Source)(implicit flowDef: FlowDef, mode: Mode): Pipe

    Permalink

    Adds a trap to the current pipe, which will capture all exceptions that occur in this pipe and save them to the given trap source.

    Traps do not include the original fields in a tuple, only the fields seen in an operation. Traps also do not include any exception information.

    There can be at most one trap for each pipe.
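    A minimal sketch inside a Job (the trap path, field names, and parse helper are hypothetical):

    // rows whose downstream operations throw are written to the trap source instead of failing the job
    val guarded = pipe.addTrap(Tsv("errors"))
      .map('json -> 'userId) { json: String => parseUserId(json) } // parseUserId may throw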

  6. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  7. def blockJoinWithSmaller(fs: (Fields, Fields), otherPipe: Pipe, rightReplication: Int = 1, leftReplication: Int = 1, joiner: Joiner = new InnerJoin, reducers: Int = 1): Pipe

    Permalink

    Performs a block join, otherwise known as a replicate-fragment join (RF join). The leftReplication and rightReplication parameters control the replication of the left and right pipes respectively.

    This is useful in cases where the data has extreme skew. A symptom of this is a job stuck for a very long time on a small number of reducers.

    A block join is a way to get around this: we add a random integer field and a replica field to every tuple in the left and right pipes. We then join on the original keys and on these new dummy fields. These dummy fields make it less likely that the skewed keys will be hashed to the same reducer.

    The final data size is right * rightReplication + left * leftReplication, but because of the fragmentation, we are guaranteed the same number of hits as the original join.

    If the right pipe is really small, then you are probably better off with joinWithTiny. If, however, the right pipe is medium sized, then you are better off with blockJoinWithSmaller, and a good rule of thumb is to set rightReplication = left.size / right.size and leftReplication = 1.

    Finally, if both pipes are of similar size, e.g. in the case of a self join with high data skew, then it makes sense to set leftReplication and rightReplication to be approximately equal.

    Note

    You can only use an InnerJoin or a LeftJoin with a leftReplication of 1 (or a RightJoin with a rightReplication of 1) when doing a blockJoin.
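    A minimal sketch (field names and the replication factor are illustrative):

    // 'events' is large and skewed on 'userId; replicate the smaller 'users' pipe 10x
    val joined = events.blockJoinWithSmaller('userId -> 'id, users, rightReplication = 10)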

    Definition Classes
    JoinAlgorithms
  8. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def coGroupBy(f: Fields, j: JoinMode = InnerJoinMode)(builder: (CoGroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink

    This method is used internally to implement all joins. You can use it directly if you want to implement something like a star join, e.g., when joining a single pipe to multiple other pipes. Make sure you call this method on the larger pipe to make the grouping as efficient as possible.

    If you are only joining two pipes, then you are better off using joinWithSmaller/joinWithLarger/joinWithTiny/leftJoinWithTiny.

    Definition Classes
    JoinAlgorithms
  10. def collect[A, T](fs: (Fields, Fields))(fn: PartialFunction[A, T])(implicit conv: TupleConverter[A], setter: TupleSetter[T]): Pipe

    Permalink

    Keeps only the tuples for which the partial function is defined, and applies that function to them.
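    For instance (a minimal sketch with hypothetical 'age and 'ageGroup fields):

    // keep only rows where the partial function matches, emitting the mapped value
    val adults = pipe.collect[Int, String]('age -> 'ageGroup) { case age if age >= 18 => "adult" }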

  11. def collectTo[A, T](fs: (Fields, Fields))(fn: PartialFunction[A, T])(implicit conv: TupleConverter[A], setter: TupleSetter[T]): Pipe

    Permalink
  12. def crossWithSmaller(p: Pipe, replication: Int = 20): Pipe

    Permalink

    Does a cross-product by doing a blockJoin. Useful when doing a large cross, if your cluster can take it. Prefer crossWithTiny.

    Definition Classes
    JoinAlgorithms
  13. def crossWithTiny(tiny: Pipe): Pipe

    Permalink

    WARNING

    Doing a cross product with even a moderately sized pipe can create ENORMOUS output. The use case here is attaching a constant (e.g. a number, a dictionary, or a set) to each row in another pipe. A common use case comes from a groupAll and a reduction to one row; you then want to send the result back out to every element of a pipe.

    This uses joinWithTiny, so the tiny pipe is replicated to all mappers. If it is large, this will blow up. Get it: be foolish here and LOSE IT ALL!

    Use at your own risk.
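    A minimal sketch of the groupAll-then-broadcast pattern described above (field names are illustrative):

    // reduce everything to a single-row pipe, then attach that row to every tuple
    val total = pipe.groupAll { _.size('total) }
    val withTotal = pipe.crossWithTiny(total)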

    Definition Classes
    JoinAlgorithms
  14. def debug(dbg: PipeDebug): Pipe

    Permalink

    Print the tuples that pass, with the options configured in the given debugger. For instance:

    debug(PipeDebug().toStdOut.printTuplesEvery(100))
  15. def debug: Pipe

    Permalink

    Print all the tuples that pass to stderr

  16. def discard(f: Fields): Pipe

    Permalink

    Discard the given fields, and keep the rest. Roughly the opposite of the project method.

  17. def distinct(f: Fields): Pipe

    Permalink

    Returns the set of distinct tuples containing the specified fields

  18. def each(fs: (Fields, Fields))(fn: (Fields) ⇒ Function[_]): Each

    Permalink

    Convenience method for integrating with existing cascading Functions

  19. def eachTo(fs: (Fields, Fields))(fn: (Fields) ⇒ Function[_]): Each

    Permalink

    Same as above, but only keep the results field.

  20. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  21. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  22. def filter[A](f: Fields)(fn: (A) ⇒ Boolean)(implicit conv: TupleConverter[A]): Pipe

    Permalink

    Keep only items that satisfy this predicate.
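    For instance (hypothetical 'age field):

    val adults = pipe.filter('age) { age: Int => age >= 18 }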

  23. def filterNot[A](f: Fields)(fn: (A) ⇒ Boolean)(implicit conv: TupleConverter[A]): Pipe

    Permalink

    Keep only items that don't satisfy this predicate. filterNot is equivalent to negating a filter operation:

    filterNot('name) { name: String => name contains "a" }

    is the same as:

    filter('name) { name: String => !(name contains "a") }
  24. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  25. def flatMap[A, T](fs: (Fields, Fields))(fn: (A) ⇒ TraversableOnce[T])(implicit conv: TupleConverter[A], setter: TupleSetter[T]): Pipe

    Permalink
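    A minimal sketch (assumes a 'line field of text, e.g. from a TextLine source; emits one 'word row per token):

    val words = pipe.flatMap('line -> 'word) { line: String => line.split("""\s+""").toList }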
  26. def flatMapTo[A, T](fs: (Fields, Fields))(fn: (A) ⇒ TraversableOnce[T])(implicit conv: TupleConverter[A], setter: TupleSetter[T]): Pipe

    Permalink
  27. def flatten[T](fs: (Fields, Fields))(implicit conv: TupleConverter[TraversableOnce[T]], setter: TupleSetter[T]): Pipe

    Permalink

    the same as

    flatMap(fs) { it : TraversableOnce[T] => it }

    Common enough to be useful.

  28. def flattenTo[T](fs: (Fields, Fields))(implicit conv: TupleConverter[TraversableOnce[T]], setter: TupleSetter[T]): Pipe

    Permalink

    the same as

    flatMapTo(fs) { it : TraversableOnce[T] => it }

    Common enough to be useful.

  29. lazy val forceToDisk: Pipe

    Permalink

    Force a materialization to disk in the flow. This is useful before crossWithTiny if you filter just before. Ideally scalding/cascading would detect this (and may in future versions), but for now it is here to aid in hand-tuning jobs.

  30. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  31. def groupAll(gs: (GroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink

    Warning

    This kills parallelism. All the work is sent to one reducer.

    Only use this in the case that you truly need all the data on one reducer.

    Just about the only reasonable use of this method is to reduce all values of a column or to count all the rows.
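    For instance (a minimal sketch that counts all rows):

    // all tuples go to a single reducer, which emits one row with the total count
    val rowCount = pipe.groupAll { _.size('numRows) }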

  32. def groupAll: Pipe

    Permalink

    Group all tuples down to one reducer (due to a cascading limitation). This is probably only useful just before setting a tail such as a database tail, so that only one reducer talks to the DB. Kind of a hack.

  33. def groupAndShuffleRandomly(reducers: Int, seed: Long)(gs: (GroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink

    Like groupAndShuffleRandomly(reducers : Int) but with a fixed seed.

  34. def groupAndShuffleRandomly(reducers: Int)(gs: (GroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink

    Like shard, except do some operation in the reducers.

  35. def groupBy(f: Fields)(builder: (GroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink

    Group the Pipe based on the given fields.

    builder is typically a block that modifies the given GroupBuilder. The final output of the block is used to schedule the new pipe. Each method in GroupBuilder returns this, so it is recommended to chain them and use the default input:

    _.size.max('f1) etc...
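    A slightly fuller sketch (hypothetical 'user and 'score fields):

    // one output row per user, with the number of rows and the maximum score
    val perUser = pipe.groupBy('user) { _.size('count).max('score) }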
  36. def groupRandomly(n: Int, seed: Long)(gs: (GroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink

    Like groupRandomly(n : Int), but with a given seed for the randomization.

  37. def groupRandomly(n: Int)(gs: (GroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink

    Like groupAll, but randomly groups data into n reducers.

    You can provide a seed for the random number generator to get reproducible results.

  38. def groupRandomlyAux(n: Int, optSeed: Option[Long])(gs: (GroupBuilder) ⇒ GroupBuilder): Pipe

    Permalink
    Attributes
    protected
  39. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  40. def insert[A](fs: Fields, value: A)(implicit setter: TupleSetter[A]): Pipe

    Permalink

    Adds a field with a constant value.

    Usage

    insert('a, 1)
  41. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  42. def joinWithLarger(fs: (Fields, Fields), that: Pipe, joiner: Joiner = new InnerJoin, reducers: Int = 1): Pipe

    Permalink

    The same as joinWithSmaller with the argument order reversed.

    Definition Classes
    JoinAlgorithms
  43. def joinWithSmaller(fs: (Fields, Fields), that: Pipe, joiner: Joiner = new InnerJoin, reducers: Int = 1): Pipe

    Permalink

    Joins the first set of keys in the first pipe to the second set of keys in the second pipe. All keys must be unique UNLESS it is an inner join, in which case duplicated join keys are allowed, but the second copy is deleted (as cascading does not allow duplicated field names).

    Smaller here means that the values per key are smaller than on the left.

    Avoid going crazy adding more explicit join modes. Instead, for some other join mode with a larger pipe, do:

    .then { pipe => other.
              joinWithSmaller(('other1, 'other2)->('this1, 'this2), pipe, new FancyJoin)
          }
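    For the common case, a minimal sketch (field names are illustrative; the right-hand pipe should be the one with fewer values per key):

    val withCity = people.joinWithSmaller('cityId -> 'id, cities)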
    Definition Classes
    JoinAlgorithms
  44. def joinWithTiny(fs: (Fields, Fields), that: Pipe): Pipe

    Permalink

    This does an asymmetric join, using cascading's "HashJoin". This only runs through this pipe once, and keeps the right-hand-side pipe in memory (but is spillable).

    Choose this when Left > max(mappers, reducers) * Right, or when the left side is three orders of magnitude larger.

    Joins the first set of keys in the first pipe to the second set of keys in the second pipe. Duplicated join keys are allowed, but the second copy is deleted (as cascading does not allow duplicated field names).

    Warning

    This does not work with outer joins, or right joins, only inner and left join versions are given.

    Definition Classes
    JoinAlgorithms
  45. def joinerToJoinModes(j: Joiner): (Product with Serializable with JoinMode, Product with Serializable with JoinMode)

    Permalink
    Definition Classes
    JoinAlgorithms
  46. def leftJoinWithLarger(fs: (Fields, Fields), that: Pipe, reducers: Int = 1): Pipe

    Permalink

    This is joinWithLarger with the joiner parameter fixed to LeftJoin. If the item is absent on the right, put null for the keys and values.

    Definition Classes
    JoinAlgorithms
  47. def leftJoinWithSmaller(fs: (Fields, Fields), that: Pipe, reducers: Int = 1): Pipe

    Permalink

    This is joinWithSmaller with the joiner parameter fixed to LeftJoin. If the item is absent on the right, put null for the keys and values.

    Definition Classes
    JoinAlgorithms
  48. def leftJoinWithTiny(fs: (Fields, Fields), that: Pipe): HashJoin

    Permalink
    Definition Classes
    JoinAlgorithms
  49. def limit(n: Long): Pipe

    Permalink

    Keep at most n elements. This is implemented by keeping approximately n/k elements on each of the k mappers or reducers (whichever we wind up being scheduled on).

  50. def map[A, T](fs: (Fields, Fields))(fn: (A) ⇒ T)(implicit conv: TupleConverter[A], setter: TupleSetter[T]): Pipe

    Permalink

    If you use a map function that does not accept TupleEntry args, which is the common case, an implicit conversion in GeneratedConversions will convert your function into a (TupleEntry => T). The result type T is converted to a cascading Tuple by an implicit TupleSetter[T]. Acceptable T types are primitive types, cascading Tuples of those types, or scala.Tuple(1-22) of those types.

    After the map, the input arguments will be set to the output of the map, so following with filter or map is fine without a new using statement if you mean to operate on the output.

    map('data -> 'stuff)

    * if output equals input, REPLACE is used
    * if output or input is a subset of the other, SWAP is used
    * otherwise, we append the new fields (cascading Fields.ALL is used)

    mapTo('data -> 'stuff)

    Only the results (stuff) are kept (cascading Fields.RESULTS)

    Note

    Using mapTo is the same as using map followed by a project for selecting just the output fields
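    For instance (hypothetical temperature fields):

    // appends 'fahrenheit next to 'celsius; mapTo would keep only 'fahrenheit
    val converted = pipe.map('celsius -> 'fahrenheit) { c: Double => c * 9.0 / 5.0 + 32.0 }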

  51. def mapTo[A, T](fs: (Fields, Fields))(fn: (A) ⇒ T)(implicit conv: TupleConverter[A], setter: TupleSetter[T]): Pipe

    Permalink
  52. def name(s: String): Pipe

    Permalink

    Rename the current pipe

  53. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  54. def normalize(f: Fields, useTiny: Boolean = true): Pipe

    Permalink

    Divides each value of this field by the sum of all values; assumes without checking that division is supported on this type and that the sum is not zero.

    If those assumptions do not hold, this will throw an exception -- consider checking the sum separately and/or using addTrap.

    In some cases crossWithTiny has been broken; the implementation supports a work-around.

  55. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  56. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  57. def pack[T](fs: (Fields, Fields))(implicit packer: TuplePacker[T], setter: TupleSetter[T]): Pipe

    Permalink

    Maps the input fields into an output field of type T. For example:

    pipe.pack[(Int, Int)] (('field1, 'field2) -> 'field3)

    will pack fields 'field1 and 'field2 into field 'field3, as long as 'field1 and 'field2 can be cast into integers. The output field 'field3 will be a tuple of type (Int, Int).

  58. def packTo[T](fs: (Fields, Fields))(implicit packer: TuplePacker[T], setter: TupleSetter[T]): Pipe

    Permalink

    Same as pack but only the to fields are preserved.

  59. def partition[A, R](fs: (Fields, Fields))(fn: (A) ⇒ R)(builder: (GroupBuilder) ⇒ GroupBuilder)(implicit conv: TupleConverter[A], ord: Ordering[R], rset: TupleSetter[R]): Pipe

    Permalink

    Given a function, partitions the pipe into several groups based on the output of the function. Then applies a GroupBuilder function on each of the groups.

    Example:

    pipe
      .mapTo(() -> ('age, 'weight)) { ... }
      .partition('age -> 'isAdult) { _ > 18 } { _.average('weight) }

    pipe now contains the average weights of adults and minors.

  60. val pipe: Pipe

    Permalink
    Definition Classes
    RichPipe → JoinAlgorithms
  61. def project(fields: Fields): Pipe

    Permalink

    Keep only the given fields, and discard the rest. Takes any number of parameters, as long as we can convert them to a Fields object.

  62. def rename(fields: (Fields, Fields)): Pipe

    Permalink

    Rename some set of N fields as another set of N fields

    Usage

    rename('x -> 'z)
    rename(('x,'y) -> ('X,'Y))

    Warning

    rename('x,'y) is interpreted by scala as rename(Tuple2('x,'y)) which then does rename('x -> 'y). This is probably not what is intended but the compiler doesn't resolve the ambiguity. YOU MUST CALL THIS WITH A TUPLE2! If you don't, expect the unexpected.

  63. def sample(fraction: Double, seed: Long): Pipe

    Permalink
  64. def sample(fraction: Double): Pipe

    Permalink

    Sample a fraction of elements. fraction should be between 0.00 (0%) and 1.00 (100%). You can provide a seed to get reproducible results.

  65. def sampleWithReplacement(fraction: Double, seed: Int): Pipe

    Permalink
  66. def sampleWithReplacement(fraction: Double): Pipe

    Permalink

    Sample a fraction of elements, with replacement. fraction should be between 0.00 (0%) and 1.00 (100%). You can provide a seed to get reproducible results.

  67. def shard(n: Int, seed: Int): Pipe

    Permalink

    Force a random shuffle of all the data to exactly n reducers, with a given seed if you need repeatability.

  68. def shard(n: Int): Pipe

    Permalink

    Force a random shuffle of all the data to exactly n reducers

  69. def shuffle(shards: Int, seed: Long): Pipe

    Permalink
  70. def shuffle(shards: Int): Pipe

    Permalink

    Put all rows in random order.

    You can provide a seed for the random number generator to get reproducible results.

  71. def skewJoinWithLarger(fs: (Fields, Fields), otherPipe: Pipe, sampleRate: Double = 0.001, reducers: Int = 1, replicator: SkewReplication = SkewReplicationA()): Pipe

    Permalink
    Definition Classes
    JoinAlgorithms
  72. def skewJoinWithSmaller(fs: (Fields, Fields), otherPipe: Pipe, sampleRate: Double = 0.001, reducers: Int = 1, replicator: SkewReplication = SkewReplicationA()): Pipe

    Permalink

    Performs a skewed join, which is useful when the data has extreme skew.

    For example, imagine joining a pipe of Twitter's follow graph against a pipe of user genders, in order to find the gender distribution of the accounts every Twitter user follows. Since celebrities (e.g., Justin Bieber and Lady Gaga) have a much larger follower base than other users, and (under a standard join algorithm) all their followers get sent to the same reducer, the job will likely be stuck on a few reducers for a long time. A skewed join attempts to alleviate this problem.

    This works as follows:

    1. First, we sample from the left and right pipes with some small probability, in order to determine approximately how often each join key appears in each pipe.
    2. We use these estimated counts to replicate the join keys, according to the given replication strategy.
    3. Finally, we join the replicated pipes together.

    sampleRate

    This controls how often we sample from the left and right pipes when estimating key counts.

    replicator

    Algorithm for determining how much to replicate a join key in the left and right pipes. Note: since we do not set the replication counts, only inner joins are allowed. (Otherwise, replicated rows would stay replicated when there is no counterpart in the other pipe.)
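    A minimal sketch of the follow-graph example above (field names are illustrative; only inner joins are supported):

    // join the skewed follow graph against user genders, sampling keys at 0.1%
    val followedGenders = follows.skewJoinWithSmaller('followedId -> 'userId, genders, sampleRate = 0.001)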

    Definition Classes
    JoinAlgorithms
  73. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  74. def thenDo[T, U](pfn: (T) ⇒ U)(implicit in: (RichPipe) ⇒ T): U

    Permalink

    Insert a function into the pipeline.
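    For instance (a minimal sketch inside a Job, where the Dsl implicit conversions are in scope; reusablePart is a hypothetical helper):

    // factor a reusable transformation into a function and splice it into the chain
    def reusablePart(p: Pipe): Pipe = p.filter('score) { s: Int => s > 0 }
    val cleaned = pipe.thenDo(reusablePart _)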

  75. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  76. def unique(f: Fields): Pipe

    Permalink

    Returns the set of unique tuples containing the specified fields. Same as distinct.

  77. def unpack[T](fs: (Fields, Fields))(implicit unpacker: TupleUnpacker[T], conv: TupleConverter[T]): Pipe

    Permalink

    The opposite of pack. Unpacks the input field of type T into the output fields. For example:

    pipe.unpack[(Int, Int)] ('field1 -> ('field2, 'field3))

    will unpack 'field1 into 'field2 and 'field3

  78. def unpackTo[T](fs: (Fields, Fields))(implicit unpacker: TupleUnpacker[T], conv: TupleConverter[T]): Pipe

    Permalink

    Same as unpack but only the to fields are preserved.

  79. def unpivot(fieldDef: (Fields, Fields)): Pipe

    Permalink

    This is an analog of the SQL/Excel unpivot function, which converts columns of data into rows of data. Only the columns given as input fields are expanded in this way. For this operation to be reversible, you need to keep some unique key on each row. See GroupBuilder.pivot to reverse this operation, assuming you leave behind a grouping key.

    Example

    pipe.unpivot(('w,'x,'y,'z) -> ('feature, 'value))

    takes rows like:

    key, w, x, y, z
    1, 2, 3, 4, 5
    2, 8, 7, 6, 5

    to:

    key, feature, value
    1, w, 2
    1, x, 3
    1, y, 4

    etc...

  80. def upstreamPipes: Set[Pipe]

    Permalink

    Set of pipes reachable from this pipe (transitive closure of 'Pipe.getPrevious')

  81. def using[C <: AnyRef { def release(): Unit }](bf: ⇒ C): AnyRef { ... /* 3 definitions in type refinement */ }

    Permalink

    Beginning of a block with access to expensive nonserializable state. The state object should contain a release() function for resource-management purposes.
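    A minimal sketch (ExpensiveParser is a hypothetical nonserializable resource; the refinement's map variant is assumed to pass the resource alongside each converted input value):

    class ExpensiveParser {
      def parse(s: String): Int = s.trim.toInt // stand-in for costly, nonserializable work
      def release(): Unit = ()                 // called to clean up the resource
    }
    val parsed = pipe.using(new ExpensiveParser)
      .map('raw -> 'value) { (parser: ExpensiveParser, raw: String) => parser.parse(raw) }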

  82. def verifyTypes[A](f: Fields)(implicit conv: TupleConverter[A]): Pipe

    Permalink

    Text files can have corrupted data. If you use this function together with a cascading trap, you can filter corrupted data out of your pipe.
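    For instance (a minimal sketch; the trap source and field names are illustrative):

    // rows whose ('id, 'name) fields cannot be converted to (Long, String) go to the trap
    val checked = pipe.addTrap(Tsv("corrupt-rows"))
      .verifyTypes[(Long, String)](('id, 'name))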

  83. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  84. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  85. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  86. def write(outsource: Source)(implicit flowDef: FlowDef, mode: Mode): Pipe

    Permalink

    Write all the tuples to the given source and return this Pipe

Inherited from JoinAlgorithms

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
