Error when trying to train a Decision Tree classifier

I am trying to load a dataset and perform classification using a decision tree. When I reach the training step, I get an error (see below).

What I have done so far:

I have a .txt file whose fields are tab-separated (text \t label).

Step 1: Reading the data

val data = sparkSession.read.format("com.databricks.spark.csv")
  .option("delimiter", "\t")
  .load("data.txt")

Step 2: Splitting the data

val splits = data.randomSplit(Array(0.7, 0.3)) 
val (trainingData, testData) = (splits(0), splits(1))

Step 3: Parameter tuning

val numClasses = 2 
val categoricalFeaturesInfo = Map[String, Int]() 
val impurity = "gini" 
val maxDepth = 5 
val maxBins = 32

Step 4: Training

val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, 
    impurity, maxDepth, maxBins)

In this step I get the following error:

Main.scala:63: overloaded method value trainClassifier with alternatives: 
    (input: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.regression.LabeledPoint],numClasses: Int,categoricalFeaturesInfo: java.util.Map[Integer,Integer],impurity: String,maxDepth: Int,maxBins: Int)org.apache.spark.mllib.tree.model.DecisionTreeModel <and> 
    (input: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint],numClasses: Int,categoricalFeaturesInfo: scala.collection.immutable.Map[Int,Int],impurity: String,maxDepth: Int,maxBins: Int)org.apache.spark.mllib.tree.model.DecisionTreeModel 
cannot be applied to (org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], Int, scala.collection.immutable.Map[String,Int], String, Int, Int) 
     val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,

Any help would be much appreciated. The complete code is shown below:

import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel

object DC_classifier {
  def main(): Unit = {

    val sparkSession = SparkSession.builder
      .master("local")
      .appName("Decision tree")
      .getOrCreate()

    val sc = sparkSession.sparkContext
    import sparkSession.implicits._

    val data = sparkSession.read.format("com.databricks.spark.csv")
      .option("delimiter", "\t")
      .load("data.txt")

    val splits = data.randomSplit(Array(0.7, 0.3))
    val (trainingData, testData) = (splits(0), splits(1))

    val numClasses = 2
    val categoricalFeaturesInfo = Map[String, Int]()
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 32

    val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
      impurity, maxDepth, maxBins)

  }

}

DC_classifier.main()

Source

2017-03-02 Giorgos Myrianthous

Wrong API: 'RDD' -> 'org.apache.spark.mllib', 'Dataset' -> 'org.apache.spark.ml' (https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier in this case). Not to mention the correct types ('LabeledPoint', 'Vector' column) and feature extraction / selection. – zero323

Can you be more specific? Thanks. –

Start with https://spark.apache.org/docs/latest/ml-guide.html#example-pipeline, check which API you are interested in (the main one (= DataFrame/Dataset) or RDD), and follow the examples. For RDD, be sure to check the signatures as well. – zero323

You need to use org.apache.spark.mllib.regression.LabeledPoint for trainingData, not something like org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]. Take a look at this: http://www.bmc.com/blogs/sgd-linear-regression-example-apache-spark/

Don't use the DataBricks method when creating the data frame.
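The advice to feed trainClassifier an RDD of LabeledPoint could be sketched roughly as follows. This is only an illustration, not the asker's final code: it assumes every line of data.txt has the form "text<TAB>label" with a numeric label of 0 or 1, and it uses hashed term frequencies (HashingTF with 1000 features, an arbitrary choice) as a stand-in for real feature extraction. The object name DecisionTreeRddSketch is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

// Hypothetical object name, used only for this sketch.
object DecisionTreeRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("Decision tree (RDD API)")
      .getOrCreate()

    // Read the file as plain lines so that we end up with an RDD, not a Dataset[Row].
    val lines = spark.sparkContext.textFile("data.txt")

    // Assumption: hashed term frequencies are an acceptable feature representation.
    val hashingTF = new HashingTF(1000)

    // Assumption: every line is "<text>\t<label>" and the label parses as 0 or 1.
    // A line without a tab would fail here with a MatchError.
    val data = lines.map { line =>
      val Array(text, label) = line.split("\t", 2)
      LabeledPoint(label.trim.toDouble, hashingTF.transform(text.split(" ").toSeq))
    }

    val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

    val numClasses = 2
    // The RDD overload expects Map[Int, Int] (feature index -> number of categories),
    // not Map[String, Int] as in the question.
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 32

    val model = DecisionTree.trainClassifier(trainingData, numClasses,
      categoricalFeaturesInfo, impurity, maxDepth, maxBins)

    // Fraction of test points whose predicted label equals the true label.
    val correct = testData.filter(p => model.predict(p.features) == p.label).count()
    println(s"Test accuracy: ${correct.toDouble / testData.count()}")

    spark.stop()
  }
}
```

Two things differ from the code in the question: the input is an RDD[LabeledPoint] rather than a Dataset[Row], and categoricalFeaturesInfo is a Map[Int, Int]. Both match the Scala overload listed in the compiler error.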

Source

2017-05-25 23:16:44
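For the other route mentioned in zero323's comments, staying with a DataFrame and using the org.apache.spark.ml API, a minimal sketch might look like the one below. It rests on several assumptions that are not in the original post: the file has exactly two tab-separated columns, the column names "text" and "rawLabel" are made up for illustration, no row has an empty text field, and Tokenizer plus HashingTF is good enough as feature extraction. The object name DecisionTreeDataFrameSketch is hypothetical.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object DecisionTreeDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("Decision tree (DataFrame API)")
      .getOrCreate()

    // Assumption: exactly two tab-separated columns; the names are illustrative.
    val data = spark.read
      .option("delimiter", "\t")
      .csv("data.txt")
      .toDF("text", "rawLabel")

    val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

    // Turn the raw label strings into numeric class indices.
    val labelIndexer = new StringIndexer()
      .setInputCol("rawLabel")
      .setOutputCol("label")

    // Split the text into words, then hash the words into a feature vector.
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    val hashingTF = new HashingTF()
      .setInputCol("words")
      .setOutputCol("features")
      .setNumFeatures(1000)

    // Same impurity, depth and bin settings as in the question.
    val tree = new DecisionTreeClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setImpurity("gini")
      .setMaxDepth(5)
      .setMaxBins(32)

    val pipeline = new Pipeline()
      .setStages(Array(labelIndexer, tokenizer, hashingTF, tree))

    val model = pipeline.fit(trainingData)
    val predictions = model.transform(testData)

    val accuracy = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
      .evaluate(predictions)
    println(s"Test accuracy: $accuracy")

    spark.stop()
  }
}
```

With this API there is no numClasses or categoricalFeaturesInfo argument to pass; the classifier works from the label and features columns of the DataFrame.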
