2016-04-05 2 views

Does Cassandra 2.1 insert performance depend on the number of affected columns?

Environment: Cassandra 2.1, DataStax driver 2.1.9, single-node cluster with DSE 4.8.

I created a table:

create table calc_data_test2(
    data_set_id uuid,svod_type text,section text,index_code text,value_type text,data_hash text,c1 text,c2 text,c3 text,c4 text,c5 text,c6 text,c7 text,c8 text,c9 text,c10 text,c11 text,c12 text,c13 text,c14 text,c15 text,c16 text,c17 text,c18 text,c19 text,c20 text,c21 text,c22 text,c23 text,c24 text,c25 text,c26 text,c27 text,c28 text,c29 text,c30 text,c31 text,c32 text,c33 text,c34 text,c35 text,c36 text,c37 text,c38 text,c39 text,c40 text,c41 text,c42 text,c43 text,c44 text,c45 text,c46 text,c47 text,c48 text,c49 text,c50 text,c51 text,c52 text,c53 text,c54 text,c55 text,c56 text,c57 text,c58 text,c59 text,c60 text,c61 text,c62 text,c63 text,c64 text,c65 text,c66 text,c67 text,c68 text,c69 text,c70 text,c71 text,c72 text,c73 text,c74 text,c75 text,c76 text,c77 text,c78 text,c79 text,c80 text,c81 text,c82 text,c83 text,c84 text,c85 text,c86 text,c87 text,c88 text,c89 text,c90 text,c91 text,c92 text,c93 text,c94 text,c95 text,c96 text,c97 text,c98 text,c99 text,c100 text,se1 text,se2 text,data_value double, 
    primary key ((data_set_id)) 
); 

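Then I ran some experiments with asynchronous inserts into the table. In each case there were 1,000,000 inserts into the same table with 50 parallel requests; the only difference was the number of affected columns. Here are the results:

  • 85 columns - 143860 ms
  • 65 columns - 108564 ms
  • 45 columns - 78213 ms
  • 25 columns - 68447 ms
  • 5 columns - 49812 ms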
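
For reference, the timings above can be turned into throughput and a slowdown factor relative to the 5-column run. This is only a minimal sketch: the column-to-time pairs are copied from the logs, while the names `measurements`, `opsPerSecond` and `slowdownVsBaseline` are made up for illustration. Keep in mind that these are wall-clock times with 50 requests in flight, not per-request latencies.

```kotlin
// Illustrative helper, not part of the original tool.
// Affected columns -> elapsed ms for 1,000,000 async inserts (taken from the logs).
val measurements = listOf(
    5 to 49812L,
    25 to 68447L,
    45 to 78213L,
    65 to 108564L,
    85 to 143860L
)

const val INSERT_COUNT = 1_000_000L

// Throughput, computed the same way as in MeasureTime.close().
fun opsPerSecond(elapsedMs: Long): Double = INSERT_COUNT / elapsedMs.toDouble() * 1000

// How many times longer a run took than the 5-column baseline.
fun slowdownVsBaseline(elapsedMs: Long): Double =
    elapsedMs.toDouble() / measurements.first().second

fun main() {
    for ((columns, elapsedMs) in measurements) {
        val ops = opsPerSecond(elapsedMs).toLong()
        val slowdown = "%.2f".format(slowdownVsBaseline(elapsedMs))
        println("$columns columns: $ops ops/s, $slowdown x the 5-column time")
    }
}
```

Going from 5 to 85 columns (17 times as many) takes roughly 2.9 times as long, so the cost grows with the column count, but much less than proportionally.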

Details below.


Insert of 45 columns:

    >java -jar store-utils-cli.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525','1381','9816','5721','3632','5364','3980','6635','9641','518','6394','2560','1202','5595','7466','1507','7783','9586','6724','9169','9673');" 1000000 --cassandra.connection.requests.max.local=50 
    00:33:19,972 INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established 
    
    Entering: Overall process 
    Entering: Prebuilding of statements 
    Leaving [845 ms]: Prebuilding of statements 
    Entering: Executing statements async 
    Leaving [78213 ms][12785.598302072545 ops/s]: Executing statements async 
    Leaving [79060 ms]: Overall process 
    

Insert of 65 columns:

    >java -jar store-utils-cli.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40,c41,c42,c43,c44,c45,c46,c47,c48,c49,c50,c51,c52,c53,c54,c55,c56,c57,c58,c59,c60) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525','1381','9816','5721','3632','5364','3980','6635','9641','518','6394','2560','1202','5595','7466','1507','7783','9586','6724','9169','9673','7867','8509','6889','3540','5994','4290','1925','8924','4704','4987','803','4291','4987','1111','4934','9885','6441','8212','9349','6852');" 1000000 --cassandra.connection.requests.max.local=50 
    00:28:27,393 INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established 
    
    Entering: Overall process 
    Entering: Prebuilding of statements 
    Leaving [847 ms]: Prebuilding of statements 
    Entering: Executing statements async 
    Leaving [108564 ms][9211.15655281677 ops/s]: Executing statements async 
    Leaving [109413 ms]: Overall process 
    

Insert of 85 columns:

    >java -jar store-utils-cli.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40,c41,c42,c43,c44,c45,c46,c47,c48,c49,c50,c51,c52,c53,c54,c55,c56,c57,c58,c59,c60,c61,c62,c63,c64,c65,c66,c67,c68,c69,c70,c71,c72,c73,c74,c75,c76,c77,c78,c79,c80) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525','1381','9816','5721','3632','5364','3980','6635','9641','518','6394','2560','1202','5595','7466','1507','7783','9586','6724','9169','9673','7867','8509','6889','3540','5994','4290','1925','8924','4704','4987','803','4291','4987','1111','4934','9885','6441','8212','9349','6852','6628','42','6713','3696','3316','8122','3288','3845','6063','5430','2052','5121','3343','6362','8724','2184','1380','5828','3723','8185');" 1000000 --cassandra.connection.requests.max.local=50 
    22:56:40,398 INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established 
    
    Entering: Overall process 
    Entering: Prebuilding of statements 
    Leaving [1086 ms]: Prebuilding of statements 
    Entering: Executing statements async 
    Leaving [143860 ms][6951.202558042542 ops/s]: Executing statements async 
    Leaving [144954 ms]: Overall process 
    

Insert of 25 columns:

    >java -jar store-utils-cli-1.2.0-SNAPSHOT.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525');" 1000000 --cassandra.connection.requests.max.local=50 
    00:39:29,337 INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established 
    
    Entering: Overall process 
    Entering: Prebuilding of statements 
    Leaving [885 ms]: Prebuilding of statements 
    Entering: Executing statements async 
    Leaving [68447 ms][14609.844112963314 ops/s]: Executing statements async 
    Leaving [69339 ms]: Overall process 
    

Insert of 5 columns:

    >java -jar store-utils-cli-1.2.0-SNAPSHOT.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type) VALUES(now(), '58','9281','7611','367');" 1000000 --cassandra.connection.requests.max.local=50 
    00:43:35,293 INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established 
    
    Entering: Overall process 
    Entering: Prebuilding of statements 
    Leaving [968 ms]: Prebuilding of statements 
    Entering: Executing statements async 
    Leaving [49812 ms][20075.483819160043 ops/s]: Executing statements async 
    Leaving [50782 ms]: Overall process 
    

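Is it really true that the number of columns affected by an insert has such a big impact on performance? I haven't found any information about such a dependency. Maybe I am doing something wrong?

All the meaningful code for the inserts is here: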

    import com.datastax.driver.core.BoundStatement
    import com.datastax.driver.core.PreparedStatement
    import com.datastax.driver.core.Session
    import com.google.common.util.concurrent.FutureCallback
    import com.google.common.util.concurrent.Futures
    import com.google.common.util.concurrent.ListenableFuture
    import java.io.Closeable
    import java.util.concurrent.Phaser

    // member of the CLI command class; cassandraCluster is defined elsewhere (not shown)
    override fun run(args: Array<String?>) {
        if (args.size < 2) {
            System.err.println("You should specify a query and a number of iterations: ${args.toList()}")
            return
        }

        val query: String? = args[0]
        val iterationCount: Long = args[1]!!.toLong()

        // get the session
        val session: Session = cassandraCluster.connection().driverSession
        // prepare the query
        val preparedQuery: PreparedStatement = session.prepare(query)

        MeasureTime("Overall process").use {
            // create bound statements
            val statements = MeasureTime("Prebuild statements").use {
                (1..iterationCount).map { BoundStatement(preparedQuery) }
            }

            // execute async
            MeasureTime("Execute statements async", iterationCount).use {
                // the single initial party is the calling thread
                val phaser = Phaser(1)
                statements.forEach { statement ->
                    phaser.register()
                    session.executeAsync(statement).withCallback({
                        phaser.arriveAndDeregister()
                    }, { err ->
                        System.err.println(err)
                        phaser.arriveAndDeregister()
                    })
                }
                // block until all tasks are done
                phaser.arriveAndAwaitAdvance()
            }
        }
    }

    // extension method for convenience
    private fun <T> ListenableFuture<T>.withCallback(
        onSuccessCallback: (T?) -> Unit,
        onFailureCallback: (Throwable?) -> Unit
    ): ListenableFuture<T> {
        Futures.addCallback(this, object : FutureCallback<T> {
            override fun onSuccess(p0: T?) {
                onSuccessCallback(p0)
            }

            override fun onFailure(p0: Throwable?) {
                onFailureCallback(p0)
            }
        })
        return this
    }

    // prints the elapsed time (and the throughput, if an operation count is given) when closed
    class MeasureTime(val message: String, val operationCount: Long? = null) : Closeable {
        private val startTime: Long = System.nanoTime()

        init {
            System.out.println("Entering: $message")
        }

        override fun close() {
            val endTime = System.nanoTime()
            // nanoseconds -> milliseconds
            val elapsed = (endTime - startTime) / 1000000
            val opStats = if (operationCount != null) {
                val f = operationCount / elapsed.toDouble() * 1000
                "[$f ops/s]"
            } else ""
            System.out.println("Leaving [$elapsed ms]$opStats: $message")
        }
    }
    

I believe it won't be a problem for Java people to understand what is going on in the Kotlin code.

Answer


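If you insert more data into the second table (c1 to c100 plus 2 other columns) than into the first table, it is normal for the inserts to be slower.

Now, even if you insert the same amount of data (in terms of byte count) into both tables, inserts into the second table will still be a little slower because of:

1. the metadata overhead, since you have more columns
2. the larger number of objects that have to be created in memory to store them
3. the CPU consumption of serializing many columns instead of a few

There may be other parameters that I forgot.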
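
One rough way to separate a fixed per-insert cost from the per-column overhead described above is to fit a straight line to the five timings from the question. The sketch below does that with ordinary least squares. The linear model is an assumption made for illustration, and `runs`, `LinearFit` and `fitLine` are hypothetical names, not part of the original tool or the driver.

```kotlin
// Illustrative only: a straight-line model is an assumption, not something the measurements prove.
// Affected columns -> elapsed ms for 1,000,000 async inserts (from the question's logs).
val runs = listOf(
    5.0 to 49812.0,
    25.0 to 68447.0,
    45.0 to 78213.0,
    65.0 to 108564.0,
    85.0 to 143860.0
)

// Result of the fit: elapsedMs ~ fixedMs + perColumnMs * columns
data class LinearFit(val fixedMs: Double, val perColumnMs: Double)

// Ordinary least-squares fit of a line through (x, y) points.
fun fitLine(points: List<Pair<Double, Double>>): LinearFit {
    val meanX = points.map { it.first }.average()
    val meanY = points.map { it.second }.average()
    val sxy = points.map { (it.first - meanX) * (it.second - meanY) }.sum()
    val sxx = points.map { (it.first - meanX) * (it.first - meanX) }.sum()
    val slope = sxy / sxx
    return LinearFit(fixedMs = meanY - slope * meanX, perColumnMs = slope)
}

fun main() {
    val fit = fitLine(runs)
    println("fixed cost: %.0f ms per 1,000,000 inserts".format(fit.fixedMs))
    println("per-column cost: %.0f ms per 1,000,000 inserts".format(fit.perColumnMs))
}
```

On these five runs the fit gives roughly 38 s of fixed cost plus about 1.1 s per additional column for every million inserts. The points do not lie exactly on a line (the steps from 45 to 65 and from 65 to 85 columns are larger than the earlier ones), so treat the numbers as an order-of-magnitude estimate only.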


Yes, you are right. So I updated the question to measure only the dependence of execution time on the number of columns. – sedovav
