2016-10-20

When I set the number of supersteps to 20, the job runs fine. But when I set it to 200, it fails. It seems I cannot use even a moderately larger superstep count in Giraph. The job is launched with:

hadoop jar Test-jar-with-dependencies.jar org.apache.giraph.GiraphRunner test.Test -mc test.TestMC -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /input/test.txt -w 1 -ca mapred.job.tracker=s1 -ca mapreduce.job.counters.limit=1000 

And the final result is:

16/10/20 08:56:08 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers 
16/10/20 08:56:38 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer s3:22181 --zkNode /_hadoopBsp/job_1476868823433_0017/_haltComputation' 
16/10/20 08:56:38 INFO mapreduce.Job: Running job: job_1476868823433_0017 
16/10/20 08:56:39 INFO mapreduce.Job: Job job_1476868823433_0017 running in uber mode : false 
16/10/20 08:56:39 INFO mapreduce.Job: map 50% reduce 0% 
16/10/20 08:56:47 INFO mapreduce.Job: map 100% reduce 0% 
16/10/20 08:56:47 INFO mapreduce.Job: Job job_1476868823433_0017 failed with state FAILED due to: Task failed task_1476868823433_0017_m_000000 
Job failed as tasks failed. failedMaps:1 failedReduces:0 

16/10/20 08:56:47 INFO mapreduce.Job: Counters: 34 
    File System Counters 
     FILE: Number of bytes read=0 
     FILE: Number of bytes written=97529 
     FILE: Number of read operations=0 
     FILE: Number of large read operations=0 
     FILE: Number of write operations=0 
     HDFS: Number of bytes read=76 
     HDFS: Number of bytes written=0 
     HDFS: Number of read operations=8 
     HDFS: Number of large read operations=0 
     HDFS: Number of write operations=4 
    Job Counters 
     Failed map tasks=1 
     Launched map tasks=2 
     Other local map tasks=2 
     Total time spent by all maps in occupied slots (ms)=33269 
     Total time spent by all reduces in occupied slots (ms)=0 
     Total time spent by all map tasks (ms)=33269 
     Total vcore-seconds taken by all map tasks=33269 
     Total megabyte-seconds taken by all map tasks=34067456 
    Map-Reduce Framework 
     Map input records=1 
     Map output records=0 
     Input split bytes=44 
     Spilled Records=0 
     Failed Shuffles=0 
     Merged Map outputs=0 
     GC time elapsed (ms)=130 
     CPU time spent (ms)=7280 
     Physical memory (bytes) snapshot=186077184 
     Virtual memory (bytes) snapshot=823398400 
     Total committed heap usage (bytes)=200802304 
    Zookeeper base path 
     /_hadoopBsp/job_1476868823433_0017=0 
    Zookeeper halt node 
     /_hadoopBsp/job_1476868823433_0017/_haltComputation=0 
    Zookeeper server:port 
     s3:22181=0 
    File Input Format Counters 
     Bytes Read=0 
    File Output Format Counters 
     Bytes Written=0 

My test code is as follows. The vertex computation:

import java.io.IOException;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class Test extends BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

    @Override
    public void compute(
        Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
        Iterable<DoubleWritable> messages) throws IOException {
      // Intentionally empty: the job only iterates supersteps.
    }
}

The master computation:

import org.apache.giraph.master.DefaultMasterCompute;

public class TestMC extends DefaultMasterCompute {

    @Override
    public void compute() {
      // Halt the computation once 200 supersteps have run.
      if (getSuperstep() == 200) {
        haltComputation();
      }
    }
}

The counter limit seems to be too small (120), even though I set it to 1000 on the command line. How can I solve this problem?
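For context, a back-of-the-envelope count suggests why 200 supersteps blow past the limit while 20 do not. This sketch rests on an assumption on my part (that Giraph registers roughly one timer counter per superstep on top of a couple of dozen framework counters); the exact per-superstep overhead may differ in your Giraph version:

```java
// Rough counter-usage estimate. Assumption: ~1 counter per superstep
// plus ~20 fixed framework counters (File System, Job, Map-Reduce, etc.).
public class CounterEstimate {
    static int countersNeeded(int supersteps, int frameworkCounters) {
        return supersteps + frameworkCounters;
    }

    public static void main(String[] args) {
        System.out.println(countersNeeded(20, 20));   // 40  -> under the 120 default
        System.out.println(countersNeeded(200, 20));  // 220 -> over the 120 default
    }
}
```

Under this assumption, 20 supersteps stay comfortably below the default limit of 120, while 200 supersteps need roughly 220 counters, matching the "Too many counters: 121 max=120" error in the log below.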

The error log is as follows:

2016-10-20 08:56:38,569 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 199 took 0.016 seconds ended with state ALL_SUPERSTEPS_DONE and is now on superstep 200 
2016-10-20 08:56:38,573 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on superstep 200 
2016-10-20 08:56:38,574 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} 
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x236f zxid:0x143b txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NoNode for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir 
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710001 type:create cxid:0xd8f zxid:0x143c txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_masterJobState 
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Notifying master its okay to cleanup with /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir/0_master 
2016-10-20 08:56:38,575 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x2375 zxid:0x143f txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir 
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Node /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir already exists, no need to create. 
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 1 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir 
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanedUpZooKeeper: Waiting for the children of /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir to change since only got 1 nodes. 
2016-10-20 08:56:40,710 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120 
    at org.apache.hadoop.ipc.Client.call(Client.java:1411) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1364) 
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231) 
    at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source) 
    at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737) 
    at java.lang.Thread.run(Thread.java:745) 

2016-10-20 08:56:40,879 INFO [main-EventThread] org.apache.giraph.bsp.BspService: process: cleanedUpChildrenChanged signaled 
2016-10-20 08:56:40,880 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 2 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir 
2016-10-20 08:56:40,880 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x157df96ea710001 
2016-10-20 08:56:40,882 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:22181] org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /219.223.239.57:49390 which had sessionid 0x157df96ea710001 
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Removed HDFS checkpoint directory (_bsp/_checkpoints//job_1476868823433_0017) with return = false since the job Giraph: cost.Test succeeded 
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Halting netty client 
2016-10-20 08:56:40,890 INFO [netty-client-worker-0] org.apache.giraph.comm.netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now. 
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Netty client halted 
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Halting netty server 
2016-10-20 08:56:43,106 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Start releasing resources 
2016-10-20 08:56:43,780 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120 
    at org.apache.hadoop.ipc.Client.call(Client.java:1411) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1364) 
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231) 
    at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source) 
    at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737) 
    at java.lang.Thread.run(Thread.java:745) 

2016-10-20 08:56:43,793 INFO [communication thread] org.apache.hadoop.mapred.Task: Process Thread Dump: Communication exception 
46 active threads 
Thread 56 (netty-server-worker-15): 
    State: RUNNABLE 
    Blocked count: 0 
    Waited count: 1 
    Stack: 
    sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) 
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) 
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) 
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) 
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) 
    io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:596) 
    io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:306) 
    io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) 
    java.lang.Thread.run(Thread.java:745) 
Thread 55 (netty-server-worker-14): 
    State: RUNNABLE 
    Blocked count: 0 
    Waited count: 1 

Answer

It works now! Setting the property mapreduce.job.counters.limit on the command line does not help, but after adding it to $HADOOP_HOME/conf/mapred-site.xml, it works:

<property> 
    <name>mapreduce.job.counters.limit</name> 
    <value>20000</value> 
    <description>Limit on the number of counters allowed per job. The default value is 200.</description> 
</property> 
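As a quick sanity check of the snippet above, the fragment can be parsed with the plain JDK XML APIs to confirm the property name/value pair resolves as intended. This is only a local check of the file format (it uses a hypothetical inlined copy of the fragment, not your live cluster configuration), so it does not prove the cluster actually picked the value up:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VerifyCounterLimit {
    public static void main(String[] args) throws Exception {
        // Inlined copy of the mapred-site.xml fragment (assumption: this
        // mirrors what was added to $HADOOP_HOME/conf/mapred-site.xml).
        String xml = "<configuration><property>"
            + "<name>mapreduce.job.counters.limit</name>"
            + "<value>20000</value>"
            + "</property></configuration>";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Print each <property> as name=value, the way Hadoop would read it.
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent();
            String value = p.getElementsByTagName("value").item(0).getTextContent();
            System.out.println(name + "=" + value);
        }
    }
}
```

Note that settings in mapred-site.xml are read by the cluster daemons, which is consistent with the observation that the per-job `-ca` override had no effect on the server-side counter check.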