Trait

com.twitter.scalding

SuccessFileSource

trait SuccessFileSource extends FileSource

Ensures that a _SUCCESS file is present in every directory included by a glob, in addition to meeting the requirements of FileSource.pathIsGood. The set of directories to check for _SUCCESS is determined by examining all paths returned by globPaths and collecting the parent directories of the non-hidden files encountered. pathIsGood should still be considered a best-effort test. For example, the following layout with an in-flight job is accepted for the glob dir*/*:

  dir1/_temporary
  dir2/file1
  dir2/_SUCCESS

Similarly, if dir1 is physically empty, pathIsGood is still true for the glob dir*/* above.

On the other hand, it rejects an empty output directory of a finished job:

  dir1/_SUCCESS
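
The directory rule described above can be sketched with a simplified in-memory model. This is not Scalding's implementation: GlobEntry and SuccessCheckSketch are hypothetical names, the glob result is passed in as a plain list instead of being read from a Hadoop FileSystem, only matched files (not matched directories) contribute parent directories, and "hidden" is assumed to mean a name starting with _ or ., following Hadoop's input-format convention.

```scala
// Hypothetical, in-memory model of one entry returned by a glob.
final case class GlobEntry(path: String, isDirectory: Boolean = false)

object SuccessCheckSketch {
  // Last path component, e.g. "dir2/file1" -> "file1".
  private def name(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)

  // Parent directory, e.g. "dir2/file1" -> "dir2"; "" when there is no parent.
  private def parent(path: String): String = {
    val i = path.lastIndexOf('/')
    if (i < 0) "" else path.substring(0, i)
  }

  // Assumption: a path is hidden when its name starts with '_' or '.'.
  private def isHidden(path: String): Boolean = {
    val n = name(path)
    n.startsWith("_") || n.startsWith(".")
  }

  // True when at least one directory holds a non-hidden file and every
  // such directory also holds a _SUCCESS file matched by the glob.
  def accepts(globResult: Seq[GlobEntry]): Boolean = {
    val files = globResult.filterNot(_.isDirectory)

    // Parent directories of the non-hidden files encountered.
    val dirsToCheck = files
      .filterNot(e => isHidden(e.path))
      .map(e => parent(e.path))
      .toSet

    // Directories in which a _SUCCESS file was seen.
    val dirsWithSuccess = files
      .filter(e => name(e.path) == "_SUCCESS")
      .map(e => parent(e.path))
      .toSet

    dirsToCheck.nonEmpty && dirsToCheck.subsetOf(dirsWithSuccess)
  }
}
```

Under this model, the in-flight layout (dir1/_temporary, dir2/file1, dir2/_SUCCESS) is accepted because dir2 is the only directory with a non-hidden file and it has a _SUCCESS marker. A lone dir1/_SUCCESS is rejected because no directory contains a non-hidden file.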

Source
FileSource.scala
Linear Supertypes
  1. FileSource
  2. HfsTapProvider
  3. LocalSourceOverride
  4. SchemedSource
  5. Source
  6. Serializable
  7. AnyRef
  8. Any

Abstract Value Members

  1. abstract def hdfsPaths: Iterable[String]

    Definition Classes
    FileSource
  2. abstract def localPaths: Iterable[String]

    A path to use for the local tap.

    Definition Classes
    LocalSourceOverride

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def checkFlowDefNotNull()(implicit flowDef: FlowDef, mode: Mode): Unit

    Attributes
    protected
    Definition Classes
    Source
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def createHdfsReadTap(hdfsMode: Hdfs): Tap[JobConf, _, _]

    Attributes
    protected
    Definition Classes
    FileSource
  8. def createHfsTap(scheme: Scheme[JobConf, RecordReader[_, _], OutputCollector[_, _], _, _], path: String, sinkMode: SinkMode): Hfs

    Definition Classes
    HfsTapProvider
  9. def createLocalTap(sinkMode: SinkMode): Tap[JobConf, _, _]

    Creates a local tap.

    sinkMode

    The mode for handling output conflicts.

    returns

    A tap.

    Definition Classes
    LocalSourceOverride
  10. def createTap(readOrWrite: AccessMode)(implicit mode: Mode): Tap[_, _, _]

    Subclasses of Source MUST override this method. They may call out to TestTapFactory for making Taps suitable for testing.

    Definition Classes
    FileSource → Source
  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def goodHdfsPaths(hdfsMode: Hdfs): Iterable[String]

    Attributes
    protected
    Definition Classes
    FileSource
  16. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  17. def hdfsReadPathsAreGood(conf: Configuration): Boolean

    Attributes
    protected
    Definition Classes
    FileSource
  18. def hdfsScheme: Scheme[JobConf, RecordReader[_, _], OutputCollector[_, _], _, _]

    The scheme to use if the source is on hdfs.

    Definition Classes
    SchemedSource
  19. def hdfsWritePath: String

    Definition Classes
    FileSource
  20. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  21. def localScheme: Scheme[Properties, InputStream, OutputStream, _, _]

    The scheme to use if the source is local.

    Definition Classes
    SchemedSource
  22. def localWritePath: String

    Definition Classes
    LocalSourceOverride
  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. final def notify(): Unit

    Definition Classes
    AnyRef
  25. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  26. def pathIsGood(p: String, conf: Configuration): Boolean

    Determines if a path is 'valid' for this source. In strict mode all paths must be valid. In non-strict mode, all invalid paths will be filtered out.

    Subclasses can override this to validate paths.

    The default implementation is a quick sanity check to look for missing or empty directories. It is necessary but not sufficient -- there are cases where this will return true but there is in fact missing data.

    TODO: consider writing a more in-depth version of this method in TimePathedSource that looks for missing days / hours etc.

    Attributes
    protected
    Definition Classes
    SuccessFileSource → FileSource
  27. def read(implicit flowDef: FlowDef, mode: Mode): Pipe

    Definition Classes
    Source
  28. val sinkMode: SinkMode

    Definition Classes
    SchemedSource
  29. def sourceId: String

    This is a name that refers to this exact instance of the source. Put another way, if s1.sourceId == s2.sourceId, the job should work the same if one is replaced with the other.

    Definition Classes
    Source
  30. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  31. def toString(): String

    Definition Classes
    AnyRef → Any
  32. def transformForRead(pipe: Pipe): Pipe

    Attributes
    protected
    Definition Classes
    Source
  33. def transformForWrite(pipe: Pipe): Pipe

    Attributes
    protected
    Definition Classes
    Source
  34. def transformInTest: Boolean

    The mock passed in to scalding.JobTest may be considered as a mock of the Tap or the Source. By default, as of 0.9.0, it is considered a mock of the Source. If you set this to true, the mock in TestMode will be considered a mock of the Tap (which must be transformed) and not the Source.

    Definition Classes
    Source
  35. def validateTaps(mode: Mode): Unit

    Definition Classes
    FileSource → Source
  36. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. def writeFrom(pipe: Pipe)(implicit flowDef: FlowDef, mode: Mode): Pipe

    Writes the pipe but returns the input so it can be chained into the next operation.

    Definition Classes
    Source

Deprecated Value Members

  1. def readAtSubmitter[T](implicit mode: Mode, conv: TupleConverter[T]): Stream[T]

    Definition Classes
    Source
    Annotations
    @deprecated
    Deprecated

    (Since version 0.9.0) replace with Mappable.toIterator
