2012-04-05

ORDER BY job in a Pig script fails when running embedded Pig from Java

I have the Pig script below. However, when I run the same script through embedded Pig in Java, the last job (the ORDER BY) fails. If I replace the ORDER BY job with something else, such as GROUP or FOREACH ... GENERATE, the whole script succeeds under embedded Pig, so I believe the ORDER BY is what causes the problem. Does anyone have any experience with this? Any help is appreciated!

Pig script:

REGISTER pig-udf-0.0.1-SNAPSHOT.jar;
user_similarity = LOAD '/tmp/sample-sim-score-results-31/part-r-00000' USING PigStorage('\t') AS (user_id: chararray, sim_user_id: chararray, basic_sim_score: float, alt_sim_score: float);
simplified_user_similarity = FOREACH user_similarity GENERATE $0 AS user_id, $1 AS sim_user_id, $2 AS sim_score;
grouped_user_similarity = GROUP simplified_user_similarity BY user_id;
ordered_user_similarity = FOREACH grouped_user_similarity {
    sorted = ORDER simplified_user_similarity BY sim_score DESC;
    top = LIMIT sorted 10;
    GENERATE group, top;
};
top_influencers = FOREACH ordered_user_similarity GENERATE com.aol.grapevine.similarity.pig.udf.AssignPointsToTopInfluencer($1, 10);
all_influence_scores = FOREACH top_influencers GENERATE FLATTEN($0);
grouped_influence_scores = GROUP all_influence_scores BY bag_of_topSimUserTuples::user_id;
influence_scores = FOREACH grouped_influence_scores GENERATE group AS user_id, SUM(all_influence_scores.bag_of_topSimUserTuples::points) AS influence_score;
ordered_influence_scores = ORDER influence_scores BY influence_score DESC;
STORE ordered_influence_scores INTO '/tmp/cc-test-results-1' USING PigStorage();
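For context, an embedded-Pig driver for a script like this typically looks like the following sketch. This is not the original poster's code; the class name is hypothetical, and only the first and last statements are shown. It uses the real Pig 0.8-era `PigServer` API (`registerJar`, `registerQuery`, `store`); note that `ExecType.LOCAL` runs on Hadoop's LocalJobRunner, which the log above warns cannot symlink the ORDER BY sampler file (`pigsample_*`) into the working directory.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class EmbeddedPigRunner {  // hypothetical driver class
    public static void main(String[] args) throws Exception {
        // ExecType.MAPREDUCE submits jobs to the cluster; ExecType.LOCAL
        // uses LocalJobRunner, which hits the symlink limitation seen
        // in the WARN line of the log above.
        PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
        pigServer.registerJar("pig-udf-0.0.1-SNAPSHOT.jar");
        pigServer.registerQuery(
            "user_similarity = LOAD '/tmp/sample-sim-score-results-31/part-r-00000' "
            + "USING PigStorage('\\t') AS (user_id: chararray, sim_user_id: chararray, "
            + "basic_sim_score: float, alt_sim_score: float);");
        // ... register the remaining statements of the script the same way ...
        pigServer.registerQuery(
            "ordered_influence_scores = ORDER influence_scores BY influence_score DESC;");
        // store() is what actually triggers execution of the whole plan,
        // including the SAMPLER and ORDER_BY jobs that fail in the log.
        pigServer.store("ordered_influence_scores", "/tmp/cc-test-results-1");
    }
}
```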

Error log from Pig:

12/04/05 10:00:56 INFO pigstats.ScriptState: Pig script settings are added to the job 
12/04/05 10:00:56 INFO mapReduceLayer.JobControlCompiler: mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 
12/04/05 10:00:58 INFO mapReduceLayer.JobControlCompiler: Setting up single store job 
12/04/05 10:00:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
12/04/05 10:00:58 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission. 
12/04/05 10:00:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
12/04/05 10:00:58 INFO input.FileInputFormat: Total input paths to process : 1 
12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths to process : 1 
12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths (combined) to process : 1 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating tmp-1546565755 in /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134-work-6955502337234509704 with rwxr-xr-x 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 
12/04/05 10:00:58 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir. 
12/04/05 10:00:58 INFO mapred.TaskRunner: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/pigsample_854728855_1333645258470 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.jar.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.jar.crc 
12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.split.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.split.crc 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.splitmetainfo.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.splitmetainfo.crc 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.xml.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.xml.crc 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.jar <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.jar 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.split <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.split 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.splitmetainfo <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.splitmetainfo 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.xml <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.xml 
12/04/05 10:00:59 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
12/04/05 10:00:59 INFO mapred.MapTask: io.sort.mb = 100 
12/04/05 10:00:59 INFO mapred.MapTask: data buffer = 79691776/99614720 
12/04/05 10:00:59 INFO mapred.MapTask: record buffer = 262144/327680 
12/04/05 10:00:59 WARN mapred.LocalJobRunner: job_local_0004 
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139) 
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) 
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248) 
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153) 
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112) 
    ... 6 more 
12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Deleted path /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 
12/04/05 10:00:59 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId: job_local_0004 
12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: job job_local_0004 has failed! Stop running all dependent jobs 
12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: 100% complete 
12/04/05 10:01:04 ERROR pigstats.PigStatsUtil: 1 map reduce job(s) failed! 
12/04/05 10:01:04 INFO pigstats.PigStats: Script Statistics: 

HadoopVersion PigVersion UserId StartedAt FinishedAt Features 
0.20.2-cdh3u3 0.8.1-cdh3u3 cchuang 2012-04-05 10:00:34 2012-04-05 10:01:04 GROUP_BY,ORDER_BY 

Some jobs have failed! Stop running all dependent jobs 

Job Stats (time in seconds): 
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs 
job_local_0001 0 0 0 0 0 0 0 0 all_influence_scores,grouped_user_similarity,simplified_user_similarity,user_similarity GROUP_BY 
job_local_0002 0 0 0 0 0 0 0 0 grouped_influence_scores,influence_scores GROUP_BY,COMBINER 
job_local_0003 0 0 0 0 0 0 0 0 ordered_influence_scores SAMPLER 

Failed Jobs: 
JobId Alias Feature Message Outputs 
job_local_0004 ordered_influence_scores ORDER_BY Message: Job failed! Error - NA /tmp/cc-test-results-1, 

Input(s): 
Successfully read 0 records from: "/tmp/sample-sim-score-results-31/part-r-00000" 

Output(s): 
Failed to produce result in "/tmp/cc-test-results-1" 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

Job DAG: 
job_local_0001 -> job_local_0002, 
job_local_0002 -> job_local_0003, 
job_local_0003 -> job_local_0004, 
job_local_0004 


12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: Some jobs have failed! Stop running all dependent jobs 

Hi Huang, I am hitting exactly the same problem with an even simpler script. Did you ever find the cause or a solution? Thanks. – AOvejero


I ran into the same problem. These two lines must be related to the error, but I don't know how to fix it :-( --- `12/04/05 10:00:58 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.` --- `Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470` –


See the answer at http://stackoverflow.com/questions/15983956/pig-order-command-fails/18035754#18035754 – user2649134

Answer


Make sure the PIG_HOME environment variable is set to point at your Pig installation.
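For example (the installation path below is illustrative; substitute the directory where Pig is actually installed on your machine):

```shell
# Point PIG_HOME at the root of the Pig installation so embedded Pig
# can locate its jars and support files, then put its bin/ on PATH.
export PIG_HOME=/usr/lib/pig          # hypothetical install location
export PATH="$PIG_HOME/bin:$PATH"
echo "PIG_HOME=$PIG_HOME"             # prints PIG_HOME=/usr/lib/pig
```

The variable must also be visible to the JVM running the embedded-Pig program, e.g. exported in the shell that launches `java`, or set in the IDE run configuration.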
