How to rescale a range of numbers, shifting the centre, in Spark/Scala?

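Which function in Spark/Scala can convert/rescale values from an arbitrary range (for example -infinity to +infinity, or -2 to 130) to a defined maximum value? In the example below, I want 55 to map to 100 and values of 100 and above to map to 0: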

before | after
45-55 | 90-100
35-44 | 80-89
...
100+ or < 0 | 0-5

Are any of the ML feature functions suitable for this?

Source

2016-11-03 Karol Sudol

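I managed to solve it, thanks for your help @user6910411. Depending on your data you can use dense or sparse vectors, swap in MaxAbsScaler, and type the input as linalg.Vectors or DenseVector. The idea is to split the data at the required centre point, reverse-scale (negate) one half, scale both halves separately, and merge the DataFrames.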
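In formula form, the split-and-negate idea works out as follows. This is a derivation from the formula documented for Spark's MinMaxScaler with min 0 and max 100, not something stated in the answer; the symbols are introduced here for illustration: c = 55 is the split point, a and b are the smallest and largest scores below c, and m and M are the smallest and largest scores at or above c. Negating the upper half turns its min-max scaling into a decreasing map:

```latex
\[
y =
\begin{cases}
100\,\dfrac{x - a}{b - a}, & x < c \\[6pt]
100\,\dfrac{M - x}{M - m}, & x \ge c
\end{cases}
\]
```

So each half reaches 100 at its own value nearest the centre (b and m) and 0 at its extreme (a and M). If a half contains only one distinct score, MinMaxScaler returns 0.5 * (0 + 100) = 50 for it instead.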

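import org.apache.spark.ml.feature.{MinMaxScaler, VectorAssembler}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, lit, round, udf}
import spark.implicits._ // enables the 'column symbol syntax (assumes a SparkSession named `spark`, as in spark-shell)

// Assumes `df` has a numeric `score` column and an `id` column.

// Pull element `index` out of an ML vector column as a Double.
// Typed as Vector (not DenseVector) so it also accepts sparse vectors.
val vectorToColumn = udf { (x: Vector, index: Int) => x(index) }

// Upper half: negate scores >= 55 so that the smallest of them (55) becomes the
// largest value (-> 100) and the largest score becomes the smallest (-> 0).
val gt50 = df.filter("score >= 55").select('id, ('score * -1).as("score"))
// Lower half: scores < 55 keep their sign, so the largest of them maps to 100.
val lt50 = df.filter("score < 55")

// MinMaxScaler works on vector columns, so wrap the single score in a vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("score"))
  .setOutputCol("features")

val ass_lt50 = assembler.transform(lt50)
val ass_gt50 = assembler.transform(gt50)

// Rescale each half independently to [0, 100].
val scaler = new MinMaxScaler()
  .setInputCol("features")
  .setOutputCol("featuresScaled")
  .setMin(0)
  .setMax(100)

val feat_lt50 = scaler.fit(ass_lt50).transform(ass_lt50).drop("score")
val feat_gt50 = scaler.fit(ass_gt50).transform(ass_gt50).drop("score")

// Unwrap the scaled vector back into a plain, rounded numeric column.
val scaled_lt50 = feat_lt50.select('id,
  round(vectorToColumn(col("featuresScaled"), lit(0))).as("scaled_score"))

val scaled_gt50 = feat_gt50.select('id,
  round(vectorToColumn(col("featuresScaled"), lit(0))).as("scaled_score"))

// Merge the two halves (unionAll is deprecated since Spark 2.0; union behaves the same).
val scaled = scaled_lt50.union(scaled_gt50)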
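To check what the Spark pipeline above does without a cluster, here is a minimal plain-Scala sketch of the same split, negate, min-max scale and merge steps. It is not the author's code: the object `CentredRescale`, the helpers `minMax` and `rescale`, and the `(id, score)` tuple input are hypothetical, and `minMax` follows the formula documented for Spark's MinMaxScaler, including its 0.5 * (min + max) result for a constant input.

```scala
// Plain-Scala sketch (no Spark) of the split / negate / min-max / merge idea.
object CentredRescale {

  // Min-max scale xs into [lo, hi], following the formula documented for
  // Spark's MinMaxScaler, including 0.5 * (lo + hi) when all inputs are equal.
  def minMax(xs: Seq[Double], lo: Double, hi: Double): Seq[Double] =
    if (xs.isEmpty) Seq.empty
    else {
      val (eMin, eMax) = (xs.min, xs.max)
      if (eMax == eMin) xs.map(_ => 0.5 * (lo + hi))
      else xs.map(x => (x - eMin) / (eMax - eMin) * (hi - lo) + lo)
    }

  // Split (id, score) rows at `centre`, negate the upper half so it scales in
  // the opposite direction, scale each half to [0, 100], round, and merge.
  // math.round rounds halves up, matching Spark's HALF_UP round for these
  // non-negative results.
  def rescale(rows: Seq[(Int, Double)], centre: Double): Seq[(Int, Long)] = {
    val (upper, lower) = rows.partition { case (_, score) => score >= centre }
    val lowerScaled = lower.map(_._1).zip(minMax(lower.map(_._2), 0, 100))
    val upperScaled = upper.map(_._1).zip(minMax(upper.map(r => -r._2), 0, 100))
    (lowerScaled ++ upperScaled).map { case (id, y) => (id, math.round(y)) }
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample scores, for illustration only.
    val rows = Seq(1 -> -2.0, 2 -> 30.0, 3 -> 54.0, 4 -> 55.0, 5 -> 90.0, 6 -> 130.0)
    rescale(rows, 55.0).sortBy(_._1).foreach(println)
  }
}
```

With those sample scores it prints (1,0), (2,57), (3,100), (4,100), (5,53) and (6,0): 54 and 55 both reach 100 because each half is scaled to its own maximum, and the two extremes, -2 and 130, both land on 0.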

Source

2016-11-04 12:53:17