I am running Gobblin on a 3-node EMR cluster to move data from Kafka to S3. The cluster runs Hadoop 2.6.0, and I built Gobblin against 2.6.0 as well. The Gobblin MapReduce job runs successfully on EMR, but there is no output in S3.

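The map-reduce job appears to run successfully. On HDFS I can see the metrics and working directories. The metrics directory contains some files, but the working directory is empty. The S3 bucket should hold the final output, but it has no data.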

2016-04-08 16:23:26 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1366 -  Job job_1460065322409_0002 running in uber mode : false 
2016-04-08 16:23:26 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 0% reduce 0% 
2016-04-08 16:23:32 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 10% reduce 0% 
2016-04-08 16:23:33 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 40% reduce 0% 
2016-04-08 16:23:34 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 60% reduce 0% 
2016-04-08 16:23:36 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 80% reduce 0% 
2016-04-08 16:23:37 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 100% reduce 0% 
2016-04-08 16:23:38 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1384 -  Job job_1460065322409_0002 completed successfully 
2016-04-08 16:23:38 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1391 -  Counters: 30 
    File System Counters 
    FILE: Number of bytes read=0 
    FILE: Number of bytes written=1276095 
    FILE: Number of read operations=0 
    FILE: Number of large read operations=0 
    FILE: Number of write operations=0 
    HDFS: Number of bytes read=28184 
    HDFS: Number of bytes written=41960 
    HDFS: Number of read operations=60 
    HDFS: Number of large read operations=0 
    HDFS: Number of write operations=11 
Job Counters 
    Launched map tasks=10 
    Other local map tasks=10 
    Total time spent by all maps in occupied slots (ms)=1828125 
    Total time spent by all reduces in occupied slots (ms)=0 
    Total time spent by all map tasks (ms)=40625 
    Total vcore-seconds taken by all map tasks=40625 
    Total megabyte-seconds taken by all map tasks=58500000 
Map-Reduce Framework 
    Map input records=10 
    Map output records=0 
    Input split bytes=2150 
    Spilled Records=0 
    Failed Shuffles=0 
    Merged Map outputs=0 
    GC time elapsed (ms)=296 
    CPU time spent (ms)=10900 
    Physical memory (bytes) snapshot=2715054080 
    Virtual memory (bytes) snapshot=18852671488 
    Total committed heap usage (bytes)=4729077760 
File Input Format Counters 
    Bytes Read=6444 
File Output Format Counters 
    Bytes Written=0 
2016-04-08 16:23:38 UTC INFO [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 101 - Stopping the TaskStateCollectorService 
2016-04-08 16:23:38 UTC WARN [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 123 - Output task state path /gooblinOutput/working/GobblinKafkaQuickStart_mapR3/output/job_GobblinKafkaQuickStart_mapR3_1460132596498 does not exist 
2016-04-08 16:23:38 UTC INFO [main] gobblin.runtime.mapreduce.MRJobLauncher 443 - Deleted working directory /gooblinOutput/working/GobblinKafkaQuickStart_mapR3 
2016-04-08 16:23:38 UTC INFO [main] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: [email protected][Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1] 
2016-04-08 16:23:38 UTC INFO [main] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: [email protected][Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1] 
2016-04-08 16:23:38 UTC INFO [main] gobblin.runtime.app.ServiceBasedAppLauncher 158 - Shutting down the application 
2016-04-08 16:23:38 UTC INFO [MetricsReportingService STOPPING] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: j[email protected]5584dbb6 
2016-04-08 16:23:38 UTC INFO [MetricsReportingService STOPPING] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: j[email protected]5584dbb6 
2016-04-08 16:23:38 UTC WARN [Thread-7] gobblin.runtime.app.ServiceBasedAppLauncher 153 - ApplicationLauncher has already stopped 
2016-04-08 16:23:38 UTC WARN [Thread-4] gobblin.metrics.reporter.ContextAwareReporter 116 - Reporter MetricReportReporter has already been stopped. 
2016-04-08 16:23:38 UTC WARN [Thread-4] gobblin.metrics.reporter.ContextAwareReporter 116 - Reporter MetricReportReporter has already been stopped. 
The log above is the final part of the run. At the end it says that the output task state path /gooblinOutput/working/GobblinKafkaQuickStart_mapR3/output/job_GobblinKafkaQuickStart_mapR3_1460132596498 does not exist, and then it deletes the working directory /gooblinOutput/working/GobblinKafkaQuickStart_mapR3.
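The one warning that matters is easy to miss among the INFO lines. As a rough sketch (not part of Gobblin), the following Python helper pulls WARN and ERROR entries out of a log shaped like the one above. The regular expression and the name `warnings_in` are assumptions based only on the line layout shown here.

```python
import re

# Assumed layout, taken from the log lines above:
# "2016-04-08 16:23:38 UTC WARN [thread] logger.Class 123 - message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) \d+ - (?P<msg>.*)$"
)


def warnings_in(log_text):
    """Return (logger, message) pairs for every WARN or ERROR line."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in ("WARN", "ERROR"):
            found.append((match.group("logger"), match.group("msg").strip()))
    return found
```

Run against the log above, this would surface the "Output task state path ... does not exist" message from gobblin.runtime.TaskStateCollectorService, which is the first sign that the tasks wrote nothing.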

My conf files are as follows:

File 1: gobblin-mapreduce.properties

# Thread pool settings for the task executor 
taskexecutor.threadpool.size=2 
taskretry.threadpool.coresize=1 
taskretry.threadpool.maxsize=2 

# File system URIs 
fs.uri=hdfs://{host}:8020 
writer.fs.uri=${fs.uri} 
state.store.fs.uri=s3a://{bucket}/gobblin-mapr/ 

# Writer related configuration properties 
writer.destination.type=HDFS 
writer.output.format=AVRO 
writer.staging.dir=${env:GOBBLIN_WORK_DIR}/task-staging 
writer.output.dir=${env:GOBBLIN_WORK_DIR}/task-output 

# Data publisher related configuration properties 
data.publisher.type=gobblin.publisher.BaseDataPublisher 
data.publisher.final.dir=${env:GOBBLIN_WORK_DIR}/job-output 
data.publisher.replace.final.dir=false 

# Directory where job/task state files are stored 
state.store.dir=${env:GOBBLIN_WORK_DIR}/state-store 

# Directory where error files from the quality checkers are stored 
qualitychecker.row.err.file=${env:GOBBLIN_WORK_DIR}/err 

# Directory where job locks are stored 
job.lock.dir=${env:GOBBLIN_WORK_DIR}/locks 

# Directory where metrics log files are stored 
metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics 

# Interval of task state reporting in milliseconds 
task.status.reportintervalinms=5000 

# MapReduce properties 
mr.job.root.dir=${env:GOBBLIN_WORK_DIR}/working 


# s3 bucket configuration 

data.publisher.fs.uri=s3a://{bucket}/gobblin-mapr/ 
fs.s3a.access.key={key} 
fs.s3a.secret.key={key} 

File 2: kafka-to-s3.pull

job.name=GobblinKafkaQuickStart_mapR3 
job.group=GobblinKafka_mapR3 
job.description=Gobblin quick start job for Kafka 
job.lock.enabled=false 

kafka.brokers={kafka-host}:9092 
topic.whitelist={topic_name} 

source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource 
extract.namespace=gobblin.extract.kafka 

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder 
writer.file.path.type=tablename 
writer.destination.type=HDFS 
writer.output.format=txt 

data.publisher.type=gobblin.publisher.BaseDataPublisher 

mr.job.max.mappers=10 
bootstrap.with.offset=latest 

metrics.reporting.file.enabled=true 
metircs.enabled=true 
metrics.reporting.file.suffix=txt 

Run command

export GOBBLIN_WORK_DIR=/gooblinOutput 
Command : bin/gobblin-mapreduce.sh --conf /home/hadoop/gobblin-files/gobblin-dist/kafkaConf/kafka-to-s3.pull --logdir /home/hadoop/gobblin-files/gobblin-dist/logs 
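With GOBBLIN_WORK_DIR exported as above, every `${env:GOBBLIN_WORK_DIR}` placeholder in gobblin-mapreduce.properties expands to a path under /gooblinOutput, as the /gooblinOutput/working paths in the log suggest. The sketch below imitates that substitution in Python so the effective values can be inspected before a run. It is not Gobblin's configuration loader; `parse_properties`, `expand` and `uri_scheme` are hypothetical helpers, and only the `${env:NAME}` and `${key}` forms seen in this file are handled.

```python
import re
from urllib.parse import urlparse

# Matches "${env:NAME}" (environment lookup) and "${some.key}" (property lookup).
PLACEHOLDER = re.compile(r"\$\{(env:)?([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def expand(value, props, env, max_passes=10):
    """Substitute placeholders until nothing changes (bounded to avoid cycles)."""
    def replace(match):
        source = env if match.group(1) else props
        # Unknown names are left untouched so they stay visible.
        return source.get(match.group(2), match.group(0))

    for _ in range(max_passes):
        expanded = PLACEHOLDER.sub(replace, value)
        if expanded == value:
            break
        value = expanded
    return value


def uri_scheme(value):
    """Return the URI scheme ('s3a', 'hdfs', ...) or '' for a bare path."""
    return urlparse(value).scheme
```

For example, `expand(props["data.publisher.final.dir"], props, {"GOBBLIN_WORK_DIR": "/gooblinOutput"})` gives `/gooblinOutput/job-output`, and `uri_scheme` of that value is empty, meaning the value is a bare path with no `s3a://` prefix.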

I am not sure what is happening. Can someone help?

Answer

There were two issues.

I had data.publisher.final.dir=${env:GOBBLIN_WORK_DIR}/job-output

It should have been something like s3a://dev.com/gobblin-mapr6/

Also, a special character had somehow been added to topic.whitelist, so the topic could not be recognized.
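A stray character of this kind is usually invisible in an editor. The answer does not say which character it was, so the following Python sketch simply flags anything outside printable ASCII in property values, such as a zero-width space, a non-breaking space or a leftover carriage return. The helper names are hypothetical and the check is deliberately stricter than what Gobblin or Kafka would reject.

```python
import unicodedata


def suspicious_characters(value):
    """Return (index, codepoint, name) for each character outside printable ASCII."""
    hits = []
    for index, ch in enumerate(value):
        if not 32 <= ord(ch) < 127:
            hits.append((index, "U+%04X" % ord(ch), unicodedata.name(ch, "UNNAMED")))
    return hits


def scan_properties(text):
    """Map each property key to the suspicious characters found in its value."""
    report = {}
    # Split on "\n" only: str.splitlines() would also split on some of the
    # very characters we are trying to detect (for example "\r" or U+2028).
    for line in text.split("\n"):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        hits = suspicious_characters(value.lstrip(" \t"))
        if hits:
            report[key.strip()] = hits
    return report
```

Calling `scan_properties(open("kafka-to-s3.pull", encoding="utf-8").read())` on the job file would list any affected keys. An empty result means every value is plain printable ASCII.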