Invoking a Hadoop job from Java code without a jar

I used the code below to run the Hadoop word count. WordCountDriver runs when I launch it from within Eclipse using the Hadoop Eclipse plugin. WordCountDriver also runs from the command line when I package the mapper and reducer classes as a jar file and drop it on the classpath.

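However, it fails when I try to run it from the command line without packaging the mapper and reducer classes as a jar, even though I added both classes to the classpath. I would like to know whether Hadoop has a restriction that prevents it from accepting mapper and reducer classes as ordinary class files. Is creating a jar always mandatory?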

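import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {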

 public static final String HADOOP_ROOT_DIR = "hdfs://universe:54310/app/hadoop/tmp"; 


static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> { 

    private Text word = new Text(); 
    private final IntWritable one = new IntWritable(1); 

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 

     String line = value.toString(); 
     StringTokenizer itr = new StringTokenizer(line.toLowerCase()); 
     while (itr.hasMoreTokens()) { 
      word.set(itr.nextToken()); 
      context.write(word, one); 
     } 
    } 
}

static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> { 

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { 

     int sum = 0; 

     for (IntWritable value : values) { 
      sum += value.get(); // process value 
     }  
     context.write(key, new IntWritable(sum)); 
    } 
}


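/**
* Configures the word count job, submits it, and waits for it to finish.
*
* @return 0 if the job succeeded, 1 otherwise
*/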
public int run(String[] args) throws Exception { 

    Configuration conf = getConf(); 

    conf.set("mapred.job.tracker", "universe:54311"); 

    Job job = new Job(conf, "Word Count"); 

    // specify output types 
    job.setOutputKeyClass(Text.class); 
    job.setOutputValueClass(IntWritable.class); 

    // specify input and output dirs 
    FileInputFormat.addInputPath(job, new Path(HADOOP_ROOT_DIR + "/input")); 
    FileOutputFormat.setOutputPath(job, new Path(HADOOP_ROOT_DIR + "/output")); 

    // specify a mapper 
    job.setMapperClass(WordCountDriver.WordCountMapper.class); 

    // specify a reducer 
    job.setReducerClass(WordCountDriver.WordCountReducer.class); 
    job.setCombinerClass(WordCountDriver.WordCountReducer.class); 

    job.setJarByClass(WordCountDriver.WordCountMapper.class); 

    return job.waitForCompletion(true) ? 0 : 1; 
} 

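/**
* Entry point. ToolRunner parses the generic Hadoop options before calling run().
*
* @param args command-line arguments
* @throws Exception if the job cannot be submitted or fails unexpectedly
*/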
public static void main(String[] args) throws Exception { 
    int res = ToolRunner.run(new Configuration(), new WordCountDriver(), args); 
    System.exit(res); 
}

}

Source: 2012-04-02, cosmos

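It is not entirely clear which classpath you are referring to, but in the end, if you are running on a remote Hadoop cluster, you need to provide all of the classes in a JAR file that is sent to Hadoop during the hadoop jar execution. The classpath of your local program is irrelevant.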
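If the goal is to avoid building the jar by hand, one possible workaround is to pack the directory of compiled .class files into a temporary jar at runtime, just before submitting the job. The following is only a minimal sketch that uses nothing but the JDK; the class name ClassDirJar and the method pack are made up for illustration and are not part of Hadoop. The resulting path would still have to be handed to the job configuration. On Hadoop 1.x I believe the relevant property is mapred.jar, but treat that as an assumption to verify against your version.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: packs a directory of compiled classes into a jar file.
public class ClassDirJar {

    /** Writes every regular file under classesDir into jarFile; returns the entry names. */
    public static List<String> pack(Path classesDir, Path jarFile) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // Collect the files first, so the jar itself is never picked up by the walk.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        List<String> names = new ArrayList<>();
        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            for (Path file : files) {
                // Jar entry names always use forward slashes, also on Windows.
                String name = classesDir.relativize(file).toString().replace('\\', '/');
                if (name.equals("META-INF/MANIFEST.MF")) {
                    continue; // the manifest was already written by the constructor
                }
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory that contains the compiled classes, e.g. the Eclipse bin folder.
        Path jar = Files.createTempFile("job-classes", ".jar");
        System.out.println(pack(Paths.get(args[0]), jar) + " -> " + jar);
    }
}
```

This does not remove the need for a jar; it only moves the packaging step into the driver, so the cluster still receives the classes in the form it expects.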

It probably works locally because there you are actually running a Hadoop instance inside the local process. In that case, Hadoop can find the classes on your local program's classpath.

Source: 2012-04-02 14:51:20

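My driver class is local, and Hadoop is set up as a single-node cluster. – cosmos

Adding the classes to the Hadoop classpath makes them available on the client side (that is, to your driver).

Your mapper and reducer need to be available cluster-wide. The easy way to achieve that in Hadoop is to bundle them in a jar file and either identify that jar with Job.setJarByClass(..), or add them to the job classpath using the -libjars option with GenericOptionsParser:

http://hadoop.apache.org/common/docs/r1.0.1/api/org/apache/hadoop/util/GenericOptionsParser.html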
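To see why Job.setJarByClass(..) has nothing to ship when the mapper is loaded from a plain directory of .class files, it can help to check where a class was actually loaded from. As far as I understand it, Hadoop looks up the class file as a resource and only uses the result when the URL points into a jar. The sketch below approximates that lookup with the JDK alone; ContainingJar, jarPathOf and findContainingJar are illustrative names, and this is not Hadoop's own code.

```java
import java.net.URL;

// Rough approximation of a "which jar holds this class?" lookup.
public class ContainingJar {

    /** Returns the jar path for a class-resource URL, or null if it does not point into a jar. */
    public static String jarPathOf(String url) {
        if (url == null || !url.startsWith("jar:")) {
            return null; // e.g. file:/work/bin/pkg/Foo.class is a plain directory, so no jar
        }
        String path = url.substring("jar:".length()); // file:/tmp/job.jar!/pkg/Foo.class
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!');                  // "!" separates the jar from the entry
        return bang >= 0 ? path.substring(0, bang) : path;
    }

    /** Locates the .class resource of cls and reports the jar it lives in, if any. */
    public static String findContainingJar(Class<?> cls) {
        String classFile = cls.getName().replace('.', '/') + ".class";
        ClassLoader loader = cls.getClassLoader();
        URL url = (loader != null)
                ? loader.getResource(classFile)
                : ClassLoader.getSystemResource(classFile); // bootstrap classes have no loader
        return url == null ? null : jarPathOf(url.toString());
    }

    public static void main(String[] args) {
        // Prints a jar path when run from a jar, and null when run from a classes directory.
        System.out.println(findContainingJar(ContainingJar.class));
    }
}
```

If this reports null for the mapper class, the class is visible to the driver but there is no jar for Hadoop to distribute, which would match the failure described in the question.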

Source: 2012-04-02 14:53:43
