How to use Hadoop's Partitioner

2025-03-28 07:09:28
Recommended answers (1)
Answer (1):

  Partitioner partitions the key space.
  Partitioner controls the partitioning of the keys of the intermediate map-outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent to for reduction.
  HashPartitioner is the default Partitioner.
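  Roughly speaking: the Partitioner controls how the keys of the map output are partitioned. Each key is assigned to a partition by the Partitioner so that its records can be sent to the right reduce task for processing, and the total number of partitions equals the number of reduce tasks. The default Partitioner is HashPartitioner.
  Quoted from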
  http://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Partitioner
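
  To make the "derive the partition by a hash function" idea concrete, here is a minimal stand-alone sketch in plain Java. It is not Hadoop's own class: it uses String keys instead of Writable types, and the class name HashPartitionSketch, the sample keys, and the reduce task count are made up for illustration. The formula (hash code with the sign bit masked off, modulo the number of reduce tasks) is the one HashPartitioner uses.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch: mirrors the hash-and-modulo idea only, it is not Hadoop's actual class.
class HashPartitionSketch {

    // Maps a key to a partition index in [0, numReduceTasks).
    // The bit mask clears the sign bit, so a negative hashCode() cannot yield a negative index.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // hypothetical number of reduce tasks
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple");
        for (String key : keys) {
            // Equal keys always print the same partition index.
            System.out.println(key + " -> partition " + getPartition(key, numReduceTasks));
        }
    }
}
```

  Because the index depends only on the key and the number of reduce tasks, every record with the same key ends up at the same reduce task, which is what lets the reducer see all values for that key together.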

  2. How do you use it?
  ......
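  Configuration conf = getConf();

  // Create the Job (Job.getInstance replaces the deprecated new Job(conf, name) constructor)
  Job job = Job.getInstance(conf, "hello");
  ......
  // Set the Partitioner class. HashPartitioner is already the default,
  // so this call only matters when you substitute your own Partitioner subclass.
  job.setPartitionerClass(HashPartitioner.class);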