
I want to build a Scala application with Spark and MongoDB Maven dependencies. The Scala version I use is 2.10. This is what my pom looks like (irrelevant parts omitted):

<properties> 
    <maven.compiler.source>1.6</maven.compiler.source> 
    <maven.compiler.target>1.6</maven.compiler.target> 
    <encoding>UTF-8</encoding> 
    <scala.tools.version>2.10</scala.tools.version> 
    <!-- Put the Scala version of the cluster --> 
    <scala.version>2.10.5</scala.version> 
</properties> 

<!-- repository to add org.apache.spark --> 
<repositories> 
    <repository> 
     <id>cloudera-repo-releases</id> 
     <url>https://repository.cloudera.com/artifactory/repo/</url> 
    </repository> 
</repositories> 

<build> 
    <sourceDirectory>src/main/scala</sourceDirectory> 
    <testSourceDirectory>src/test/scala</testSourceDirectory> 
    <!-- <pluginManagement> -->  
     <plugins> 
      <plugin> 
       <!-- see http://davidb.github.com/scala-maven-plugin --> 
       <groupId>net.alchim31.maven</groupId> 
       <artifactId>scala-maven-plugin</artifactId> 
       <version>3.1.3</version> 
       <executions> 
        <execution> 
         <goals> 
          <goal>compile</goal> 
          <goal>testCompile</goal> 
         </goals> 
         <configuration> 
          <args> 
           <arg>-make:transitive</arg> 
           <arg>-dependencyfile</arg> 
           <arg>${project.build.directory}/.scala_dependencies</arg> 
          </args> 
         </configuration> 
        </execution> 
       </executions> 
      </plugin> 
      <plugin> 
       <groupId>org.apache.maven.plugins</groupId> 
       <artifactId>maven-surefire-plugin</artifactId> 
       <version>2.13</version> 
       <configuration> 
        <useFile>false</useFile> 
        <disableXmlReport>true</disableXmlReport> 
        <!-- If you have classpath issue like NoDefClassError,... --> 
        <!-- useManifestOnlyJar>false</useManifestOnlyJar --> 
        <includes> 
         <include>**/*Test.*</include> 
         <include>**/*Suite.*</include> 
        </includes> 
       </configuration> 
      </plugin> 

      <!-- "package" command plugin --> 
      <plugin> 
       <artifactId>maven-assembly-plugin</artifactId> 
       <version>2.4.1</version> 
       <configuration> 
        <descriptorRefs> 
         <descriptorRef>jar-with-dependencies</descriptorRef> 
        </descriptorRefs> 
       </configuration> 
       <executions> 
        <execution> 
         <id>make-assembly</id> 
         <phase>package</phase> 
         <goals> 
          <goal>single</goal> 
         </goals> 
        </execution> 
       </executions> 
      </plugin> 
     </plugins> 
    <!-- </pluginManagement> -->  
</build> 

<dependencies> 
    <dependency> 
     <groupId>org.scala-lang</groupId> 
     <artifactId>scala-library</artifactId> 
     <version>${scala.version}</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.10</artifactId> 
     <version>1.6.1</version> 
    </dependency> 
    <dependency> 
     <groupId>org.mongodb.spark</groupId> 
     <artifactId>mongo-spark-connector_2.10</artifactId> 
     <version>1.1.0</version> 
    </dependency> 
    <dependency> 
     <groupId>org.mongodb.scala</groupId> 
     <artifactId>mongo-scala-driver_2.11</artifactId> 
     <version>1.1.1</version> 
    </dependency> 
</dependencies> 

When I run mvn clean assembly:assembly after adding the mongo-scala-driver_2.11 dependency, I get the following error:

C:\Develop\workspace\SparkApplication>mvn clean assembly:assembly 
[INFO] Scanning for projects... 
[INFO] 
[INFO] ------------------------------------------------------------------------ 
[INFO] Building SparkApplication 0.0.1-SNAPSHOT 
[INFO] ------------------------------------------------------------------------ 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ SparkApplication --- 
[INFO] Deleting C:\Develop\workspace\SparkApplication\target 
[INFO] 
[INFO] ------------------------------------------------------------------------ 
[INFO] Building SparkApplication 0.0.1-SNAPSHOT 
[INFO] ------------------------------------------------------------------------ 
[INFO] 
[INFO] >>> maven-assembly-plugin:2.4.1:assembly (default-cli) > package @ SparkApplication >>> 
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ SparkApplication --- 
[INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory C:\Develop\workspace\SparkApplication\src\main\resources 
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ SparkApplication --- 
[INFO] Nothing to compile - all classes are up to date 
[INFO] 
[INFO] --- scala-maven-plugin:3.1.3:compile (default) @ SparkApplication --- 
[WARNING] Expected all dependencies to require Scala version: 2.10.5 
[WARNING] xx.xxx.xxx:SparkApplication:0.0.1-SNAPSHOT requires scala version: 2.10.5 
[WARNING] com.twitter:chill_2.10:0.5.0 requires scala version: 2.10.4 
[WARNING] Multiple versions of scala libraries detected! 
[INFO] C:\Develop\workspace\SparkApplication\src\main\scala:-1: info: compiling 
[INFO] Compiling 1 source files to C:\Develop\workspace\SparkApplication\target\classes at 1477993255625 
[INFO] No known dependencies. Compiling everything 
[ERROR] error: bad symbolic reference. A signature in package.class refers to type compileTimeOnly 
[INFO] in package scala.annotation which is not available. 
[INFO] It may be completely missing from the current classpath, or the version on 
[INFO] the classpath might be incompatible with the version used when compiling package.class. 
[ERROR] C:\Develop\workspace\SparkApplication\src\main\scala\com\examples\MainExample.scala:33: error: Reference to method intWrapper in class LowPriorityImplicits should not have survived past type checking, 
[ERROR] it should have been processed and eliminated during expansion of an enclosing macro. 
[ERROR]  val count = sc.parallelize(1 to NUM_SAMPLES).map{i => 
[ERROR]        ^
[ERROR] two errors found 
[INFO] ------------------------------------------------------------------------ 
[INFO] BUILD FAILURE 
[INFO] ------------------------------------------------------------------------ 
[INFO] Total time: 10.363 s 
[INFO] Finished at: 2016-11-01T10:40:58+01:00 
[INFO] Final Memory: 20M/353M 
[INFO] ------------------------------------------------------------------------ 
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.3:compile (default) on project SparkApplication: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1(Exit value: 1) -> [Help 1] 
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. 
[ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles: 
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException 

The error only occurs when the mongo-scala-driver_2.11 dependency is present; without it, the jar is built fine. My code is currently the Pi estimation example from the Spark website:

import org.apache.spark.{SparkConf, SparkContext}

val NUM_SAMPLES = 100000 // example sample count for the Pi estimation

val conf = new SparkConf()
  .setAppName("Cluster Application")
  //.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)

val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

I also tried adding the following exclusion to each dependency, as I saw suggested in some GitHub issue, but it did not help:

<exclusions> 
     <exclusion> 
      <!-- make sure wrong scala version is not pulled in --> 
      <groupId>org.scala-lang</groupId> 
      <artifactId>scala-library</artifactId> 
     </exclusion> 
    </exclusions>   

How can I fix this? It seems the MongoDB Scala Driver is built against Scala 2.11, while Spark requires Scala 2.10.


There is no Mongo Scala driver for Scala 2.10, hence the error. If you are using Spark, the Mongo Spark Connector should be all you need. – Ross

Answer


Remove the incompatible Mongo Scala Driver dependency; it is not compiled for Scala 2.10.
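
For illustration, a minimal sketch of the dependencies section with that artifact dropped, keeping only the _2.10 artifacts already present in the pom above:

<dependencies> 
    <dependency> 
        <groupId>org.scala-lang</groupId> 
        <artifactId>scala-library</artifactId> 
        <version>${scala.version}</version> 
    </dependency> 
    <dependency> 
        <groupId>org.apache.spark</groupId> 
        <artifactId>spark-core_2.10</artifactId> 
        <version>1.6.1</version> 
    </dependency> 
    <dependency> 
        <groupId>org.mongodb.spark</groupId> 
        <artifactId>mongo-spark-connector_2.10</artifactId> 
        <version>1.1.0</version> 
    </dependency> 
</dependencies> 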

The MongoDB Spark Connector is a standalone connector. It uses the synchronous Mongo Java Driver because Spark is designed for CPU-intensive, synchronous tasks. It follows Spark's idioms and is all you need to connect MongoDB to Spark.
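
As a rough sketch of what that looks like with mongo-spark-connector_2.10 1.1.0 (the URIs, database and collection names below are placeholders, not from the original question):

import com.mongodb.spark.MongoSpark
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.Document

val conf = new SparkConf()
  .setAppName("Cluster Application")
  // the connector picks up its default input/output collections from these keys
  .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll")
  .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll")
val sc = new SparkContext(conf)

// write a few documents through the connector ...
val docs = sc.parallelize(1 to 10).map(i => Document.parse(s"{value: $i}"))
MongoSpark.save(docs)

// ... and read them back as an RDD[Document]
val rdd = MongoSpark.load(sc)
println("Documents read back: " + rdd.count())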

The Mongo Scala Driver, on the other hand, follows modern Scala conventions: all IO is fully asynchronous. That is great for web applications and for improving the scalability of individual machines.
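
If you ever do need the driver outside Spark, in a Scala 2.11 project, its Observable-based style looks roughly like this (a sketch; the connection string, database and collection names are placeholders):

import org.mongodb.scala._
import scala.concurrent.ExecutionContext.Implicits.global

val client = MongoClient("mongodb://127.0.0.1")                     // placeholder connection string
val collection = client.getDatabase("test").getCollection("docs")   // placeholder names

// find() returns an Observable; converting it to a Future keeps the call non-blocking
collection.find().toFuture().foreach { documents =>
  documents.foreach(d => println(d.toJson()))
  client.close()
}

Either way, for this Spark project on Scala 2.10, dropping the driver and keeping only mongo-spark-connector_2.10 resolves the build error.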
