
Spring Boot - Spark application: cannot process the file on that node

I have the below setup:

The Spark master and slaves are configured and running on my local machine:

17/11/01 18:03:52 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
17/11/01 18:03:52 INFO Master: Starting Spark master at spark://127.0.0.1:7077
17/11/01 18:03:52 INFO Master: Running Spark version 2.2.0
17/11/01 18:03:52 INFO Utils: Successfully started service 'MasterUI' on port 8080.

I have a Spring Boot application whose properties file contents look like the below:

spark.home=/usr/local/Cellar/apache-spark/2.2.0/bin/
master.uri=spark://127.0.0.1:7077
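For context, a configuration class along these lines could turn those two properties into the SparkConf that is @Autowired further below; the class name and app name here are illustrative assumptions, not code from the question:

import org.apache.spark.SparkConf;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkConfig {

    @Value("${spark.home}")
    private String sparkHome;

    @Value("${master.uri}")
    private String masterUri;

    @Bean
    public SparkConf sparkConf() {
        // Build the SparkConf from the two entries in the properties file above.
        return new SparkConf()
                .setAppName("springboot-spark-app")
                .setSparkHome(sparkHome)
                .setMaster(masterUri);
    }
}

The service method that does the actual processing is shown next: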

@Autowired 
    SparkConf sparkConf; 

    public void processFile(String inputFile, String outputFile) { 
     JavaSparkContext javaSparkContext; 
     SparkContext sc = new SparkContext(sparkConf); 
     SerializationWrapper sw= new SerializationWrapper() { 
      private static final long serialVersionUID = 1L; 

      @Override 
      public JavaSparkContext createJavaSparkContext() { 
       // TODO Auto-generated method stub 
       return JavaSparkContext.fromSparkContext(sc); 
      } 
     }; 
     javaSparkContext=sw.createJavaSparkContext(); 

     JavaRDD<String> lines = javaSparkContext.textFile(inputFile); 
     Broadcast<JavaRDD<String>> outputLines; 
     outputLines = javaSparkContext.broadcast(lines.map(new Function<String, String>() { 

      /** 
      * 
      */ 
      private static final long serialVersionUID = 1L; 

      @Override 
      public String call(String arg0) throws Exception { 
       // TODO Auto-generated method stub 
       return arg0; 
      } 
     })); 

     outputLines.getValue().saveAsTextFile(outputFile); 
     //javaSparkContext.close(); 
    } 

When I run the code, I get the below error:

17/11/01 18:16:36 INFO TorrentBroadcast: Started reading broadcast variable 2
17/11/01 18:16:36 INFO TransportClientFactory: Successfully created connection to /192.168.0.135:51903 after 1 ms (0 ms spent in bootstraps)
17/11/01 18:16:36 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.4 KB, free 366.3 MB)
17/11/01 18:16:36 INFO TorrentBroadcast: Reading broadcast variable 2 took 82 ms
17/11/01 18:16:36 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 67.2 KB, free 366.2 MB)
17/11/01 18:16:36 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
17/11/01 18:16:36 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)

The Spring Boot-Spark app should process files based on a REST API call, in which I receive the input and output file locations; those locations are shared across the Spark nodes.
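For context, the REST entry point could look roughly like the sketch below; the controller name, request mapping, and parameter names are illustrative assumptions, not code from the question:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FileProcessingController {

    @Autowired
    SparkFileService sparkFileService; // the service holding processFile (name assumed)

    @PostMapping("/process")
    public String process(@RequestParam String inputFile, @RequestParam String outputFile) {
        // Both paths must point at storage visible to every Spark node (shared mount, HDFS, etc.).
        sparkFileService.processFile(inputFile, outputFile);
        return "submitted";
    }
}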

Any suggestions to fix the above errors?

Answer


I don't think you should broadcast the JavaRDD, because the RDD is already distributed across the cluster nodes.
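To make that concrete, here is a minimal sketch of processFile without the broadcast: the mapped RDD is saved directly. The class name SparkFileService is an assumption; the rest follows the code in the question.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class SparkFileService {

    @Autowired
    SparkConf sparkConf;

    public void processFile(String inputFile, String outputFile) {
        // Build the JavaSparkContext straight from the injected SparkConf.
        try (JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf)) {
            // The RDD is already partitioned across the worker nodes,
            // so no Broadcast wrapper is needed around it.
            JavaRDD<String> lines = javaSparkContext.textFile(inputFile);
            JavaRDD<String> outputLines = lines.map(line -> line);
            outputLines.saveAsTextFile(outputFile);
        }
    }
}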


I tried it without the broadcast, but that did not work :-( I got the same error. – user3709612


It is an environment problem. [This](https://issues.apache.org/jira/browse/SPARK-19938) may help. – Arthur
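If it is that kind of classpath problem, a commonly suggested step when the driver runs inside an embedded application (instead of going through spark-submit) is to ship the application jar to the executors explicitly. A minimal sketch, where the class name, app name, and jar path are only placeholders:

import org.apache.spark.SparkConf;

public class SparkConfFactory {

    // Minimal sketch: ship the application jar so executors can load the job classes.
    // The jar path must point at the actual built artifact.
    public static SparkConf withApplicationJar() {
        return new SparkConf()
                .setAppName("springboot-spark-app")
                .setMaster("spark://127.0.0.1:7077")
                .setJars(new String[] { "target/springboot-spark-app.jar" });
    }
}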
