Creates an object that registers the given URI (representing an HDFS file) for addition to the DistributedCache.
Parameter: the fully qualified URI that points to the HDFS file to add.
Returns: a CachedFile instance.
The distributed cache is Hadoop's mechanism for giving each node local access to a specific file. The file must be registered with the job's Configuration during job setup, not from within a mapper or reducer. In addition, the node-local access path needs a unique name to prevent collisions across the cluster. This class handles both concerns.
In the configuration phase, the file URI is used to construct an UncachedFile instance. The name of the symlink to use on the mappers becomes available only after calling the add() method, which registers the file, computes the unique symlink name, and returns a CachedFile instance. The CachedFile instance is Serializable: it is designed to be assigned to a val and accessed later.
The local symlink is available through .file or .path, depending on which type you need.
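The lifecycle described above (UncachedFile, then add(), then a Serializable CachedFile exposing .path and .file) can be modeled in a minimal, self-contained sketch. Everything here is an assumption for illustration. `UncachedFile`, `CachedFile`, and the helper `SymlinkNames.symlinkNameFor` are hypothetical stand-ins, not the library's actual classes or signatures. The `register` callback is also hypothetical and stands in for recording the URI in the job's Configuration. The sketch derives a unique symlink name from the file's base name plus an MD5 hex digest of the full URI. It passes that name as the URI's `#fragment`, which is how Hadoop's distributed cache conventionally names the node-local symlink.

```scala
import java.io.File
import java.net.URI
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper (not the library's API): derive a unique, deterministic
// symlink name from the full URI so files sharing a base name don't collide.
object SymlinkNames {
  def symlinkNameFor(uri: URI): String = {
    val digest = MessageDigest.getInstance("MD5")
      .digest(uri.toString.getBytes(StandardCharsets.UTF_8))
    val hex = digest.map(b => "%02x".format(b & 0xff)).mkString
    val baseName = new File(uri.getPath).getName
    s"$baseName-$hex"
  }
}

// Hypothetical model of the registered handle. Case classes are already
// Serializable, so an instance can be held in a val and shipped to tasks.
final case class CachedFile(sourceUri: URI, symlinkName: String) {
  // The symlink is created in the task's working directory,
  // so the relative symlink name serves as the local path.
  def path: String = symlinkName
  def file: File = new File(symlinkName)
}

// Hypothetical model of the configuration-phase object.
final case class UncachedFile(sourceUri: URI) {
  // `register` stands in for adding the URI to the job's Configuration;
  // call add() only during job setup, never inside a mapper or reducer.
  def add(register: URI => Unit): CachedFile = {
    val name = SymlinkNames.symlinkNameFor(sourceUri)
    // Hadoop takes the symlink name from the URI fragment ("...#name").
    register(new URI(s"${sourceUri}#$name"))
    CachedFile(sourceUri, name)
  }
}

// Usage: register during setup, keep the CachedFile in a val, read .path/.file later.
val registered = scala.collection.mutable.ArrayBuffer.empty[URI]
val lookup = UncachedFile(new URI("hdfs://namenode/data/lookup.txt")).add(u => registered += u)
```

Hashing the full URI, not just the base name, is what keeps `/a/lookup.txt` and `/b/lookup.txt` from mapping to the same symlink. MD5 is used here only as a convenient stable digest, not for security, and the real class may use a different naming scheme.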
example: