How to Use Hadoop's Partitioner

2025-03-28 07:09:28

1. What is a Partitioner?

Partitioner partitions the key space.
Partitioner controls the partitioning of the keys of the intermediate map-outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent to for reduction.
HashPartitioner is the default Partitioner.
In short: the Partitioner decides how the keys emitted by the map tasks are divided into partitions. Each key is assigned to a partition so that its records can be sent to a particular reduce task for processing, and the total number of partitions equals the number of reduce tasks. The default Partitioner is HashPartitioner.
Quoted from:
http://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Partitioner
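To make the hash-based partitioning concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It applies the same formula HashPartitioner uses, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, to a few sample keys. The class and method names (HashPartitionDemo, partitionFor, assign) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of hash partitioning; not part of the Hadoop API.
public class HashPartitionDemo {

    // Same rule as HashPartitioner: masking the sign bit keeps the result
    // non-negative even when hashCode() is negative.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the partition (i.e. the reduce task) they would be sent to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, numReduceTasks), p -> new ArrayList<>())
                       .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "apple", "date");
        // Both "apple" records end up in the same partition, and every
        // partition number lies in [0, 3).
        System.out.println(assign(keys, 3));
    }
}
```

Because the partition depends only on the key, every record with the same key reaches the same reduce task, which is what lets a reducer see all values for a key together.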

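2. How to use it?
......
// getConf() assumes this code runs in a class that extends Configured (e.g. a Tool implementation)
Configuration conf = getConf();

// Create the job; Job.getInstance() replaces the deprecated new Job(conf, name) constructor
Job job = Job.getInstance(conf, "hello");
......
// Set the partitioner. HashPartitioner (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner)
// is already the default, so this line only makes the choice explicit
job.setPartitionerClass(HashPartitioner.class);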
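If the default hashing does not route keys the way you want, you can write your own partitioner. In a real job it would extend org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE>, override getPartition(key, value, numPartitions), and be registered with job.setPartitionerClass(...). The sketch below stays runnable without Hadoop on the classpath by using SketchPartitioner, a hypothetical stand-in with the same getPartition shape. The "group by first letter" rule is likewise just an illustrative assumption.

```java
// Sketch of a custom partitioner; SketchPartitioner is a hypothetical stand-in
// for Hadoop's abstract Partitioner<KEY, VALUE> class.
public class FirstLetterPartitionerDemo {

    // Same shape as Partitioner.getPartition(key, value, numPartitions).
    static abstract class SketchPartitioner<K, V> {
        abstract int getPartition(K key, V value, int numPartitions);
    }

    // Hypothetical rule: words starting with the same letter (ignoring case)
    // are sent to the same reduce task.
    static class FirstLetterPartitioner extends SketchPartitioner<String, Integer> {
        @Override
        int getPartition(String key, Integer value, int numPartitions) {
            if (key.isEmpty()) {
                return 0; // empty keys all go to the first reduce task
            }
            char first = Character.toLowerCase(key.charAt(0));
            // A char is never negative, so the result is always in [0, numPartitions).
            return first % numPartitions;
        }
    }

    public static void main(String[] args) {
        FirstLetterPartitioner partitioner = new FirstLetterPartitioner();
        int numReduceTasks = 4;
        for (String word : new String[] {"Apple", "avocado", "banana", "cherry"}) {
            System.out.println(word + " -> reducer " + partitioner.getPartition(word, 1, numReduceTasks));
        }
    }
}
```

With 4 reduce tasks this prints "Apple -> reducer 1", "avocado -> reducer 1", "banana -> reducer 2" and "cherry -> reducer 3". Whatever rule you choose, getPartition must return a value between 0 and numPartitions - 1, since the number of partitions always equals the number of reduce tasks.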