2014-01-18 8 views
5

おそらく、この問題の解決方法は非常に簡単ですが、私はグーグルで見つけられませんでした。またはdata.table FAQです。あなたが見ることができるように、行の多くの値が列の一部について、繰り返している最小条件を使用したサブセットdata.table

> test 
    chr  bp   ID REF  ALT  AF AC AN    EFFECT IMPACT FUNCLASS CODING  GENE  pos effRank 
1: 1 860416 rs61464428  G   A 0.5000000 14 28    UPSTREAM MODIFIER   CODING  SAMD11 1:860416  21 
2: 1 860416 rs61464428  G   A 0.5000000 14 28    UPSTREAM MODIFIER   CODING  SAMD11 1:860416  21 
3: 1 860416 rs61464428  G   A 0.5000000 14 28   DOWNSTREAM MODIFIER   CODING AL645608.1 1:860416  22 
4: 1 860461 rs57465118  G   A 1.0000000 62 62    UPSTREAM MODIFIER   CODING  SAMD11 1:860461  21 
5: 1 860461 rs57465118  G   A 1.0000000 62 62    UPSTREAM MODIFIER   CODING  SAMD11 1:860461  21 
6: 1 860461 rs57465118  G   A 1.0000000 62 62   DOWNSTREAM MODIFIER   CODING AL645608.1 1:860461  22 
7: 1 860521 rs57924093  C   A 0.9840000 61 62    UPSTREAM MODIFIER   CODING  SAMD11 1:860521  21 
8: 1 860521 rs57924093  C   A 0.9840000 61 62    UPSTREAM MODIFIER   CODING  SAMD11 1:860521  21 
9: 1 860521 rs57924093  C   A 0.9840000 61 62   DOWNSTREAM MODIFIER   CODING AL645608.1 1:860521  22 
10: 1 861261 rs144896029  G   A 0.0027270 3 1100    UPSTREAM MODIFIER   CODING  SAMD11 1:861261  21 
11: 1 861261 rs144896029  G   A 0.0027270 3 1100   DOWNSTREAM MODIFIER   CODING AL645608.1 1:861261  22 
12: 1 861332     G   A 0.0009074 1 1102 NON_SYNONYMOUS_CODING MODERATE MISSENSE CODING AL645608.1 1:861332  11 
13: 1 861332     G   A 0.0009074 1 1102 NON_SYNONYMOUS_CODING MODERATE MISSENSE CODING  SAMD11 1:861332  11 
14: 1 861332     G   A 0.0009074 1 1102 NON_SYNONYMOUS_CODING MODERATE MISSENSE CODING  SAMD11 1:861332  11 
15: 1 861332     G   A 0.0009074 1 1102 NON_SYNONYMOUS_CODING MODERATE MISSENSE CODING  SAMD11 1:861332  11 
16: 1 861332     G   A 0.0009074 1 1102    UPSTREAM MODIFIER   CODING  SAMD11 1:861332  21 
17: 1 865455     C   G 0.0033190 3 904    UPSTREAM MODIFIER   CODING  SAMD11 1:865455  21 
18: 1 865628 rs41285790  G   A 0.0027780 3 1080 NON_SYNONYMOUS_CODING MODERATE MISSENSE CODING  SAMD11 1:865628  11 
19: 1 865628 rs41285790  G   A 0.0027780 3 1080 NON_SYNONYMOUS_CODING MODERATE MISSENSE CODING  SAMD11 1:865628  11 
20: 1 865628 rs41285790  G   A 0.0027780 3 1080 NON_SYNONYMOUS_CODING MODERATE MISSENSE CODING  SAMD11 1:865628  11 
21: 1 865628 rs41285790  G   A 0.0027780 3 1080  SYNONYMOUS_CODING  LOW SILENT CODING AL645608.1 1:865628  14 
22: 1 865628 rs41285790  G   A 0.0027780 3 1080    UPSTREAM MODIFIER   CODING  SAMD11 1:865628  21 
23: 1 866437 rs139076934  C   T 0.0009074 1 1102  SYNONYMOUS_CODING  LOW SILENT CODING AL645608.1 1:866437  14 
24: 1 866437 rs139076934  C   T 0.0009074 1 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:866437  14 
25: 1 866437 rs139076934  C   T 0.0009074 1 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:866437  14 
26: 1 866437 rs139076934  C   T 0.0009074 1 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:866437  14 
27: 1 866461 rs148884928  G   A 0.0009074 1 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:866461  14 
28: 1 866461 rs148884928  G   A 0.0009074 1 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:866461  14 
29: 1 866461 rs148884928  G   A 0.0009074 1 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:866461  14 
30: 1 866461 rs148884928  G   A 0.0009074 1 1102    UPSTREAM MODIFIER   CODING AL645608.1 1:866461  21 
31: 1 866511 rs71576583 CCCCT CCCCTCCCT 1.0000000 148 148    UPSTREAM MODIFIER   CODING AL645608.1 1:866511  21 
32: 1 871057     C   T 0.0009074 1 1102    UPSTREAM MODIFIER   CODING  SAMD11 1:871057  21 
33: 1 871057     C   T 0.0009074 1 1102    UPSTREAM MODIFIER   CODING AL645608.1 1:871057  21 
34: 1 871057     C   T 0.0009074 1 1102    UPSTREAM MODIFIER   CODING  SAMD11 1:871057  21 
35: 1 871215 rs28419423  C   G 0.0036300 4 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:871215  14 
36: 1 871215 rs28419423  C   G 0.0036300 4 1102  SYNONYMOUS_CODING  LOW SILENT CODING  SAMD11 1:871215  14 
37: 1 871215 rs28419423  C   G 0.0036300 4 1102    UPSTREAM MODIFIER   CODING  SAMD11 1:871215  21 
38: 1 871215 rs28419423  C   G 0.0036300 4 1102    UPSTREAM MODIFIER   CODING  SAMD11 1:871215  21 
39: 1 871215 rs28419423  C   G 0.0036300 4 1102    UPSTREAM MODIFIER   CODING AL645608.1 1:871215  21 
40: 1 871215 rs28419423  C   G 0.0036300 4 1102   DOWNSTREAM MODIFIER   CODING  SAMD11 1:871215  22 
41: 1 871287     C   G 0.0009107 1 1098    UPSTREAM MODIFIER   CODING  SAMD11 1:871287  21 
42: 1 871287     C   G 0.0009107 1 1098    UPSTREAM MODIFIER   CODING  SAMD11 1:871287  21 
43: 1 871287     C   G 0.0009107 1 1098    UPSTREAM MODIFIER   CODING AL645608.1 1:871287  21 
44: 1 871287     C   G 0.0009107 1 1098   DOWNSTREAM MODIFIER   CODING  SAMD11 1:871287  22 
45: 1 871334 rs4072383  G   T 0.6680000 474 710    UPSTREAM MODIFIER   CODING  SAMD11 1:871334  21 
46: 1 871334 rs4072383  G   T 0.6680000 474 710    UPSTREAM MODIFIER   CODING  SAMD11 1:871334  21 
47: 1 871334 rs4072383  G   T 0.6680000 474 710    UPSTREAM MODIFIER   CODING AL645608.1 1:871334  21 
48: 1 871334 rs4072383  G   T 0.6680000 474 710   DOWNSTREAM MODIFIER   CODING  SAMD11 1:871334  22 
49: 1 874415 rs74047412  C   T 0.0018250 2 1096    UPSTREAM MODIFIER   CODING  SAMD11 1:874415  21 
50: 1 874415 rs74047412  C   T 0.0018250 2 1096    UPSTREAM MODIFIER   CODING  SAMD11 1:874415  21 
    chr  bp   ID REF  ALT  AF AC AN    EFFECT IMPACT FUNCLASS CODING  GENE  pos effRank 

私はdata.tableとても似ています。私がしたいのは、effRank変数の値(最小値)に基づいて、重複した行を削除することです。キーをchr,bpeffRankに設定しました。したがって、これらの3つの列に基づいて表をソートする必要があります。私は一種の近づいた。次のコマンドは、必要な行を返しますが、必要なすべての列を返しません。

> test[,min(effRank), by=pos] 
     pos V1 
1: 1:860416 21 
2: 1:860461 21 
3: 1:860521 21 
4: 1:861261 21 
5: 1:861332 11 
6: 1:865455 21 
7: 1:865628 11 
8: 1:866437 14 
9: 1:866461 14 
10: 1:866511 21 
11: 1:871057 21 
12: 1:871215 14 
13: 1:871287 21 
14: 1:871334 21 
15: 1:874415 21 

私は必要なのは、上記のコマンドは、式の中で言及さdata.tableだけでなく、もののすべての列を返すようにする方法です。それ以外の場合は、完全に動作します。どんな助けもありがとうございます。 dputの出力は以下のとおりです。

乾杯、 デイビー

> dput(test) 
structure(list(chr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), bp = c(860416L, 860416L, 860416L, 
860461L, 860461L, 860461L, 860521L, 860521L, 860521L, 861261L, 
861261L, 861332L, 861332L, 861332L, 861332L, 861332L, 865455L, 
865628L, 865628L, 865628L, 865628L, 865628L, 866437L, 866437L, 
866437L, 866437L, 866461L, 866461L, 866461L, 866461L, 866511L, 
871057L, 871057L, 871057L, 871215L, 871215L, 871215L, 871215L, 
871215L, 871215L, 871287L, 871287L, 871287L, 871287L, 871334L, 
871334L, 871334L, 871334L, 874415L, 874415L), ID = structure(c(10L, 
10L, 10L, 8L, 8L, 8L, 9L, 9L, 9L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
1L, 7L, 7L, 7L, 7L, 7L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 11L, 
1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 6L, 6L, 6L, 
6L, 12L, 12L), .Label = c("", "rs139076934", "rs144896029", "rs148884928", 
"rs28419423", "rs4072383", "rs41285790", "rs57465118", "rs57924093", 
"rs61464428", "rs71576583", "rs74047412"), class = "factor"), 
    REF = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 
    1L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("C", 
    "CCCCT", "G"), class = "factor"), ALT = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
    1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 4L, 
    4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
    4L, 4L, 4L), .Label = c("A", "CCCCTCCCT", "G", "T"), class = "factor"), 
    AF = c(0.5, 0.5, 0.5, 1, 1, 1, 0.984, 0.984, 0.984, 0.002727, 
    0.002727, 0.0009074, 0.0009074, 0.0009074, 0.0009074, 0.0009074, 
    0.003319, 0.002778, 0.002778, 0.002778, 0.002778, 0.002778, 
    0.0009074, 0.0009074, 0.0009074, 0.0009074, 0.0009074, 0.0009074, 
    0.0009074, 0.0009074, 1, 0.0009074, 0.0009074, 0.0009074, 
    0.00363, 0.00363, 0.00363, 0.00363, 0.00363, 0.00363, 0.0009107, 
    0.0009107, 0.0009107, 0.0009107, 0.668, 0.668, 0.668, 0.668, 
    0.001825, 0.001825), AC = c(14L, 14L, 14L, 62L, 62L, 62L, 
    61L, 61L, 61L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 
    3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 148L, 1L, 1L, 1L, 
    4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 474L, 474L, 474L, 
    474L, 2L, 2L), AN = c(28L, 28L, 28L, 62L, 62L, 62L, 62L, 
    62L, 62L, 1100L, 1100L, 1102L, 1102L, 1102L, 1102L, 1102L, 
    904L, 1080L, 1080L, 1080L, 1080L, 1080L, 1102L, 1102L, 1102L, 
    1102L, 1102L, 1102L, 1102L, 1102L, 148L, 1102L, 1102L, 1102L, 
    1102L, 1102L, 1102L, 1102L, 1102L, 1102L, 1098L, 1098L, 1098L, 
    1098L, 710L, 710L, 710L, 710L, 1096L, 1096L), EFFECT = structure(c(4L, 
    4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 1L, 2L, 2L, 2L, 2L, 4L, 
    4L, 2L, 2L, 2L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
    4L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L, 4L, 
    4L, 1L, 4L, 4L), .Label = c("DOWNSTREAM", "NON_SYNONYMOUS_CODING", 
    "SYNONYMOUS_CODING", "UPSTREAM"), class = "factor"), IMPACT = structure(c(3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 
    3L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 
    3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L), .Label = c("LOW", "MODERATE", "MODIFIER" 
    ), class = "factor"), FUNCLASS = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 
    2L, 2L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 
    1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("", "MISSENSE", "SILENT"), class = "factor"), 
    CODING = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "CODING", class = "factor"), 
    GENE = structure(c(2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
    2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("AL645608.1", 
    "SAMD11"), class = "factor"), pos = structure(c(1L, 1L, 1L, 
    2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 
    7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 11L, 
    11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 
    14L, 14L, 14L, 14L, 15L, 15L), .Label = c("1:860416", "1:860461", 
    "1:860521", "1:861261", "1:861332", "1:865455", "1:865628", 
    "1:866437", "1:866461", "1:866511", "1:871057", "1:871215", 
    "1:871287", "1:871334", "1:874415"), class = "factor"), effRank = c(21L, 
    21L, 22L, 21L, 21L, 22L, 21L, 21L, 22L, 21L, 22L, 11L, 11L, 
    11L, 11L, 21L, 21L, 11L, 11L, 11L, 14L, 21L, 14L, 14L, 14L, 
    14L, 14L, 14L, 14L, 21L, 21L, 21L, 21L, 21L, 14L, 14L, 21L, 
    21L, 21L, 22L, 21L, 21L, 21L, 22L, 21L, 21L, 21L, 22L, 21L, 
    21L)), .Names = c("chr", "bp", "ID", "REF", "ALT", "AF", 
"AC", "AN", "EFFECT", "IMPACT", "FUNCLASS", "CODING", "GENE", 
"pos", "effRank"), row.names = c(NA, -50L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000000004260788>, sorted = c("chr", 
"bp", "effRank")) 

答えて

8

あなたは、行番号を与える内部変数.Iを、使用することができます。これらの値を使用して次にsubset、次のように:

それはあなたが2行に書き込む場合は、次のように理解することは簡単です
DT[DT[, .I[which.min(effRank)], pos]$V1] 

tmp <- DT[, .I[which.min(effRank)], pos] 
DT[tmp$V1] 

最初の行は、すべての行番号ので列V1を生成し、最小位置(jの式)はposでグループ化されています。次に、それらをサブセット化します。

+0

優秀!ありがとうございました。私はドキュメンテーションでそれを完全に見逃しましたか? –

関連する問題