Partitioner partitions the key space.
Partitioner controls the partitioning of the keys of the intermediate map-outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent to for reduction.
HashPartitioner is the default Partitioner.
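Roughly speaking: the Partitioner controls how the keys of the map tasks' output are partitioned. Each key is assigned to a partition by the Partitioner so that its record can be sent to the right reduce node for processing, and the total number of partitions equals the number of reduce tasks. The default Partitioner is HashPartitioner.
Quoted from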
http://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Partitioner
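To make "derive the partition by a hash function" concrete, here is a minimal standalone sketch of the arithmetic HashPartitioner applies: mask off the sign bit of the key's hash code, then take the remainder modulo the number of reduce tasks. The class name HashPartitionSketch, the method name partitionFor, and the sample keys are made up for illustration. The sketch has no Hadoop dependency and is not the Hadoop class itself.

```java
import java.util.Arrays;
import java.util.List;

public class HashPartitionSketch {

    // Clear the sign bit so the value is never negative, then take the
    // remainder, which always falls in the range [0, numReduceTasks).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // hypothetical job with three reducers
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple");
        for (String key : keys) {
            // Equal keys always map to the same partition number.
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the result depends only on the key's hash code and the reducer count, every record with the same key reaches the same reduce task, which is what lets the reducer see all values for a key together.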
2. How is it used?
......
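Configuration conf = getConf();
// Create the job. Job.getInstance replaces the deprecated new Job(conf, name) constructor.
Job job = Job.getInstance(conf, "hello");
......
// Set the partitioner class (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner).
// HashPartitioner is already the default, so this call only makes the choice explicit.
job.setPartitionerClass(HashPartitioner.class);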
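
Since HashPartitioner is the default, setPartitionerClass mostly matters when you want different behaviour, for example partitioning on only a subset of the key as the quoted text mentions. The following is a minimal sketch of that logic in plain Java, with no Hadoop dependency. The class name KeyPrefixPartitionSketch and the "group:detail" key format are assumptions made up for illustration. In a real job the same logic would go into getPartition(key, value, numPartitions) of a class extending org.apache.hadoop.mapreduce.Partitioner, and that class would be passed to job.setPartitionerClass.

```java
public class KeyPrefixPartitionSketch {

    // Hypothetical composite key of the form "group:detail".
    // Only the "group" part before the first ':' decides the partition.
    static int partitionFor(String key, int numPartitions) {
        int sep = key.indexOf(':');
        String group = sep >= 0 ? key.substring(0, sep) : key;
        // Same non-negative hash-then-modulo step as hash partitioning.
        return (group.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4; // hypothetical number of reduce tasks
        String[] keys = {"user1:2016-01-01", "user1:2016-01-02", "user2:2016-01-01"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

With this scheme both "user1:..." keys land in the same partition even though the full keys differ, so a single reduce task receives all records for that group.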