public class MapFileRecordWriter extends AbstractMapFileWriter<java.util.List<Writable>> implements RecordWriter
See also: MapFileRecordReader
Fields inherited from class AbstractMapFileWriter: convertTextTo, counter, DEFAULT_FILENAME_PATTERN, DEFAULT_INDEX_INTERVAL, DEFAULT_MAP_FILE_SPLIT_SIZE, filenamePattern, hadoopConfiguration, indexInterval, isClosed, KEY_CLASS, MAP_FILE_INDEX_INTERVAL_KEY, mapFileSplitSize, opts, outputDir, outputFiles, writers

Fields inherited from interface RecordWriter: APPEND
| Constructor and Description |
| --- |
| MapFileRecordWriter(java.io.File outputDir) Constructor for all default values. |
| MapFileRecordWriter(java.io.File outputDir, int mapFileSplitSize) Constructor for most default values. |
| MapFileRecordWriter(java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo) |
| MapFileRecordWriter(java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo, org.apache.hadoop.conf.Configuration hadoopConfiguration) |
| MapFileRecordWriter(java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, org.apache.hadoop.conf.Configuration hadoopConfiguration) |
| MapFileRecordWriter(java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, java.lang.String filenamePattern, org.apache.hadoop.conf.Configuration hadoopConfiguration) |
| MapFileRecordWriter(java.io.File outputDir, WritableType convertTextTo) |
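A minimal construction-and-write sketch is shown below. The output path and record values are hypothetical, and the imports assume the class lives in the datavec-hadoop module's org.datavec.hadoop.records.writer.mapfile package:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Writable;
import org.datavec.api.writable.WritableType;
import org.datavec.hadoop.records.writer.mapfile.MapFileRecordWriter;   // assumed package

public class MapFileWriteSketch {
    public static void main(String[] args) throws Exception {
        File outputDir = new File("/tmp/mapfile-out");   // hypothetical output directory

        // Split output into map files of at most 10,000 examples each,
        // converting any Text writables to Float before writing.
        MapFileRecordWriter writer =
                new MapFileRecordWriter(outputDir, 10_000, WritableType.Float);
        try {
            List<Writable> record =
                    Arrays.<Writable>asList(new IntWritable(0), new DoubleWritable(1.5));
            writer.write(record);   // single-record write, inherited from AbstractMapFileWriter
        } finally {
            writer.close();         // flush and close the underlying map file(s)
        }
    }
}
```

Here mapFileSplitSize = 10_000 caps each map file at 10,000 examples, and WritableType.Float asks the writer to convert any Text values to floats before they are stored, as described in the constructor parameters below.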
| Modifier and Type | Method and Description |
| --- | --- |
| protected org.apache.hadoop.io.Writable | getHadoopWritable(java.util.List<Writable> input) |
| protected java.lang.Class<? extends org.apache.hadoop.io.Writable> | getValueClass() |
| void | initialize(Configuration configuration, InputSplit split, Partitioner partitioner) Initialize the record writer with the given configuration and InputSplit |
| void | initialize(InputSplit inputSplit, Partitioner partitioner) Initialize a record writer with the given input split |
| boolean | supportsBatch() Returns true if this record writer supports efficient batch writing using RecordWriter.writeBatch(List) |
| PartitionMetaData | writeBatch(java.util.List<java.util.List<Writable>> batch) Write a batch of records |
Methods inherited from class AbstractMapFileWriter: close, convertTextWritables, getConf, setConf, write

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface RecordWriter: close, write

Methods inherited from interface Configurable: getConf, setConf
public MapFileRecordWriter(java.io.File outputDir)

Parameters:
outputDir - Output directory for the map file(s)

public MapFileRecordWriter(@NonNull java.io.File outputDir, int mapFileSplitSize)

Parameters:
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0, multiple map files will be used, each containing a maximum of mapFileSplitSize examples. This can be used to avoid having a single multi-gigabyte map file, which may be undesirable in some cases (transfer across the network, for example).

public MapFileRecordWriter(@NonNull java.io.File outputDir, WritableType convertTextTo)

Parameters:
outputDir - Output directory for the map file(s)
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances will be converted to this type. This is useful when you would rather store numerical values, even if the original record reader produces strings/text.

public MapFileRecordWriter(@NonNull java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo)

Parameters:
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0, multiple map files will be used, each containing a maximum of mapFileSplitSize examples. This can be used to avoid having a single multi-gigabyte map file, which may be undesirable in some cases (transfer across the network, for example).
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances will be converted to this type. This is useful when you would rather store numerical values, even if the original record reader produces strings/text.

public MapFileRecordWriter(@NonNull java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo, org.apache.hadoop.conf.Configuration hadoopConfiguration)

Parameters:
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0, multiple map files will be used, each containing a maximum of mapFileSplitSize examples. This can be used to avoid having a single multi-gigabyte map file, which may be undesirable in some cases (transfer across the network, for example).
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances will be converted to this type. This is useful when you would rather store numerical values, even if the original record reader produces strings/text.
hadoopConfiguration - Hadoop configuration.

public MapFileRecordWriter(@NonNull java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, org.apache.hadoop.conf.Configuration hadoopConfiguration)

Parameters:
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0, multiple map files will be used, each containing a maximum of mapFileSplitSize examples. This can be used to avoid having a single multi-gigabyte map file, which may be undesirable in some cases (transfer across the network, for example).
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances will be converted to this type. This is useful when you would rather store numerical values, even if the original record reader produces strings/text.
indexInterval - Index interval for the map file. Defaults to 1, which is suitable for most cases.
hadoopConfiguration - Hadoop configuration.

public MapFileRecordWriter(@NonNull java.io.File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, java.lang.String filenamePattern, org.apache.hadoop.conf.Configuration hadoopConfiguration)

Parameters:
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0, multiple map files will be used, each containing a maximum of mapFileSplitSize examples. This can be used to avoid having a single multi-gigabyte map file, which may be undesirable in some cases (transfer across the network, for example).
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances will be converted to this type. This is useful when you would rather store numerical values, even if the original record reader produces strings/text.
indexInterval - Index interval for the map file. Defaults to 1, which is suitable for most cases.
filenamePattern - The naming pattern for the map files. Used with String.format(pattern, int).
hadoopConfiguration - Hadoop configuration.

protected java.lang.Class<? extends org.apache.hadoop.io.Writable> getValueClass()

getValueClass in class AbstractMapFileWriter<java.util.List<Writable>>
protected org.apache.hadoop.io.Writable getHadoopWritable(java.util.List<Writable> input)

getHadoopWritable in class AbstractMapFileWriter<java.util.List<Writable>>
public boolean supportsBatch()

Returns true if this record writer supports efficient batch writing using RecordWriter.writeBatch(List)

Specified by:
supportsBatch in interface RecordWriter
public void initialize(InputSplit inputSplit, Partitioner partitioner) throws java.lang.Exception

Initialize a record writer with the given input split

Specified by:
initialize in interface RecordWriter

Parameters:
inputSplit - the input split to initialize with

Throws:
java.lang.Exception
public void initialize(Configuration configuration, InputSplit split, Partitioner partitioner) throws java.lang.Exception

Initialize the record writer with the given configuration and InputSplit

Specified by:
initialize in interface RecordWriter

Parameters:
configuration - the configuration to initialize with
split - the split to use

Throws:
java.lang.Exception
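For completeness, a hedged sketch of the two initialize overloads: MapFileRecordWriter already receives its output directory in the constructor, so these calls mainly satisfy the RecordWriter contract. FileSplit and NumberOfRecordsPartitioner are common DataVec implementations used here as assumptions, not requirements of this class:

```java
import java.io.File;

import org.datavec.api.conf.Configuration;
import org.datavec.api.split.FileSplit;
import org.datavec.api.split.InputSplit;
import org.datavec.api.split.partition.NumberOfRecordsPartitioner;    // assumed partitioner implementation
import org.datavec.hadoop.records.writer.mapfile.MapFileRecordWriter; // assumed package

public class MapFileInitializeSketch {
    public static void main(String[] args) throws Exception {
        File outputDir = new File("/tmp/mapfile-out");                 // hypothetical directory
        MapFileRecordWriter writer = new MapFileRecordWriter(outputDir);

        InputSplit split = new FileSplit(outputDir);                   // split describing the output location
        NumberOfRecordsPartitioner partitioner = new NumberOfRecordsPartitioner();

        // Either overload may be used; the second also passes a DataVec Configuration.
        writer.initialize(split, partitioner);
        // writer.initialize(new Configuration(), split, partitioner);

        writer.close();
    }
}
```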
public PartitionMetaData writeBatch(java.util.List<java.util.List<Writable>> batch) throws java.io.IOException

Write a batch of records

Specified by:
writeBatch in interface RecordWriter

Parameters:
batch - the batch to write

Throws:
java.io.IOException
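A batch-writing sketch (record contents and output path are hypothetical). It checks supportsBatch() before choosing between a single writeBatch(List) call and per-record write calls:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Writable;
import org.datavec.hadoop.records.writer.mapfile.MapFileRecordWriter;   // assumed package

public class MapFileBatchWriteSketch {
    public static void main(String[] args) throws Exception {
        MapFileRecordWriter writer =
                new MapFileRecordWriter(new java.io.File("/tmp/mapfile-out"));   // hypothetical directory
        try {
            // Each record is a List<Writable>; the batch is a list of such records.
            List<List<Writable>> batch = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                batch.add(Arrays.<Writable>asList(new IntWritable(i), new DoubleWritable(i * 0.5)));
            }

            if (writer.supportsBatch()) {
                writer.writeBatch(batch);          // one call for the whole batch
            } else {
                for (List<Writable> record : batch) {
                    writer.write(record);          // fall back to per-record writes
                }
            }
        } finally {
            writer.close();
        }
    }
}
```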