Mapreduce is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Mapreduce. We will inform the community when this feature is no longer experimental.
A Pipeline used for Mapreduce jobs.
The MapreducePipeline class is provided by the mapreduce.mapreduce_pipeline module. It "wires together" all the steps needed to perform a specific Mapreduce job: it specifies the mapper, reducer, data input reader, output writer, and the parameters for each.
On completion, the pipeline returns the filenames produced by the output writer.
- class MapreducePipeline(job_name, mapper_spec, reducer_spec, input_reader_spec, output_writer_spec=None, mapper_params=None, reducer_params=None, shards=None)
- The MapreducePipeline constructor's arguments fully specify the Mapreduce job.
- job_name: The name of the Mapreduce job. This name appears in the logs and in the UI.
- mapper_spec: The name of the mapper used in this Mapreduce job. The mapper processes, record by record, the input supplied by the input reader specified in the input_reader_spec parameter.
- reducer_spec: The name of the reducer used in this Mapreduce job. The reducer performs work on the mapper's output and yields results, which are stored by the optional output writer specified in the output_writer_spec parameter.
- input_reader_spec: The name of the input reader that supplies input to the mapper for this Mapreduce job.
- output_writer_spec: The name of the output writer (if any) used to store results from this Mapreduce job.
- mapper_params: Parameters to pass to the input reader.
- reducer_params: Parameters to pass to the output writer.
- shards: The number of shards to use for this Mapreduce job.
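A constructor call following the arguments above might look like the sketch below, using a word-count job as the example. The mapper and reducer functions, the main.* module paths, and the parameter values are illustrative assumptions; the reader and writer names follow the library's mapreduce.input_readers and mapreduce.output_writers modules.

```python
def word_count_map(line):
    """Map: yield a (word, "") pair for every word in one input line."""
    for word in line.split():
        yield (word, "")

def word_count_reduce(key, values):
    """Reduce: yield one output line with the total count for a word."""
    yield "%s: %d\n" % (key, len(values))

def build_word_count_pipeline(blobstore_key):
    # Deferred import: the mapreduce library ships with App Engine,
    # not with the Python standard library.
    from mapreduce import mapreduce_pipeline
    return mapreduce_pipeline.MapreducePipeline(
        "word_count",                     # job_name
        "main.word_count_map",            # mapper_spec (assumed module path)
        "main.word_count_reduce",         # reducer_spec (assumed module path)
        "mapreduce.input_readers.BlobstoreLineInputReader",
        "mapreduce.output_writers.BlobstoreOutputWriter",
        mapper_params={"blob_key": blobstore_key},   # consumed by the input reader
        reducer_params={"mime_type": "text/plain"},  # consumed by the output writer
        shards=16)
```

Note that mapper_params are delivered to the input reader and reducer_params to the output writer, matching the parameter descriptions above.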
A MapreducePipeline instance has the following methods:
- start(): Starts the Mapreduce job.
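Starting the job and pointing a user at its progress might look like the sketch below. Calling start() kicks off the job asynchronously via task queues; the status page path reflects the library's default handler mapping, and the status_url helper is hypothetical.

```python
def status_url(pipeline_id, base_path="/mapreduce/pipeline"):
    # Hypothetical helper: builds the conventional Pipeline status-page
    # URL, keyed by the root pipeline id.
    return "%s/status?root=%s" % (base_path, pipeline_id)

def start_job(pipe):
    # start() enqueues the job; it returns before the job completes.
    pipe.start()
    # A request handler would typically redirect here to show progress.
    return status_url(pipe.pipeline_id)
```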