2016-04-11 21 views

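Frequency of a row subset in a frequency table

I am stuck on how to count mutations within specific subsets of my samples. I was wondering how to use a frequency table to calculate, for each subtype, the proportion of samples that carry a mutation. My data look similar to this: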

Rownames gene1 gene2 ...gene40 
Sample1 Mut WT  WT 
Sample2 WT WT  WT 
Sample3 Mut WT  WT 
Sample4 WT MUT  MUT 
...277 

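Each sample is assigned a subtype in a separate data frame, with the samples in the same order: $Subtype = "GS" "CIN" "MSI"

For each gene, I would like to calculate the proportion of mutated samples within each subtype, as a table.

Thank you!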

Answers

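ng <- 40   # number of genes
ns <- 277  # number of samples

set.seed(1)
# simulate a samples-by-genes matrix of random WT/MUT calls
m <- matrix(sample(c('WT','MUT'), ng * ns, TRUE), ns,
            dimnames = list(paste0('Sample', seq(ns)), paste0('Gene', seq(ng))))

data <- data.frame(m, stringsAsFactors = FALSE)
# one random subtype label per sample, in the same row order as data
subtype <- sample(c("GS","CIN","MSI"), ns, TRUE)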
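In the question the subtypes sit in a second data frame rather than in a bare vector. A minimal sketch of extracting a vector aligned to the rows of the mutation table, assuming a hypothetical frame `subtype_df` with columns `Sample` and `Subtype` (the names `calls`, `subtype_df` and `Sample` are made up for illustration; the question only shows `$Subtype`):

```r
# Toy stand-ins; 'calls', 'subtype_df' and 'Sample' are hypothetical names.
calls <- data.frame(Gene1 = c("MUT", "WT", "MUT"),
                    row.names = c("Sample1", "Sample2", "Sample3"),
                    stringsAsFactors = FALSE)
subtype_df <- data.frame(Sample  = c("Sample3", "Sample1", "Sample2"),
                         Subtype = c("MSI", "GS", "CIN"),
                         stringsAsFactors = FALSE)

# match() finds, for each row of 'calls', the row of 'subtype_df' that
# describes the same sample, so the two frames need not share an order.
idx <- match(rownames(calls), subtype_df$Sample)
stopifnot(!anyNA(idx))  # every sample must have a subtype
aligned_subtype <- subtype_df$Subtype[idx]
```

If the two frames really are in the same order, as the question states, `subtype_df$Subtype` can be used directly; the `match()` step only guards against silent misalignment.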

Suppose your data look like this:

str(data)
# 'data.frame': 277 obs. of 40 variables:
#  $ Gene1 : chr "WT" "WT" "MUT" "MUT" ...
#  $ Gene2 : chr "WT" "WT" "MUT" "WT" ...
#  $ Gene3 : chr "MUT" "WT" "WT" "WT" ...
#  $ Gene4 : chr "MUT" "MUT" "MUT" "WT" ...

In that case, you could do the following:

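# one data frame of samples per subtype
sp <- split(data, subtype)
# x == 'MUT' is a logical matrix, so colMeans() gives the mutated fraction per gene
(l <- lapply(sp, function(x) colMeans(x == 'MUT')))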

## 
## 
## $CIN 
##  Gene1  Gene2  Gene3  Gene4  Gene5  Gene6  Gene7  Gene8  Gene9 
## 0.5268817 0.4516129 0.4838710 0.4408602 0.4516129 0.4301075 0.4731183 0.5376344 0.4408602 
## Gene10 Gene11 Gene12 Gene13 Gene14 Gene15 Gene16 Gene17 Gene18 
## 0.4193548 0.4946237 0.5698925 0.4301075 0.4838710 0.5053763 0.3978495 0.5161290 0.5483871 
## Gene19 Gene20 Gene21 Gene22 Gene23 Gene24 Gene25 Gene26 Gene27 
## 0.3978495 0.5698925 0.5698925 0.4516129 0.4946237 0.5268817 0.5591398 0.4731183 0.4838710 
## Gene28 Gene29 Gene30 Gene31 Gene32 Gene33 Gene34 Gene35 Gene36 
## 0.4946237 0.5161290 0.5161290 0.4301075 0.5698925 0.5376344 0.5161290 0.4516129 0.4301075 
## Gene37 Gene38 Gene39 Gene40 
## 0.4731183 0.6021505 0.5483871 0.4731183 
## 
## $GS 
##  Gene1  Gene2  Gene3  Gene4  Gene5  Gene6  Gene7  Gene8  Gene9 
## 0.4742268 0.4536082 0.5567010 0.4845361 0.4742268 0.5051546 0.5463918 0.4020619 0.4845361 
## Gene10 Gene11 Gene12 Gene13 Gene14 Gene15 Gene16 Gene17 Gene18 
## 0.4329897 0.4536082 0.4948454 0.4948454 0.4639175 0.3711340 0.5051546 0.5154639 0.5876289 
## Gene19 Gene20 Gene21 Gene22 Gene23 Gene24 Gene25 Gene26 Gene27 
## 0.5670103 0.5051546 0.5567010 0.5670103 0.5876289 0.5051546 0.4536082 0.5567010 0.5051546 
## Gene28 Gene29 Gene30 Gene31 Gene32 Gene33 Gene34 Gene35 Gene36 
## 0.4639175 0.4329897 0.5154639 0.4639175 0.4639175 0.5773196 0.5257732 0.4948454 0.4329897 
## Gene37 Gene38 Gene39 Gene40 
## 0.5360825 0.5257732 0.4742268 0.5051546 
## 
## $MSI 
##  Gene1  Gene2  Gene3  Gene4  Gene5  Gene6  Gene7  Gene8  Gene9 
## 0.4367816 0.4827586 0.5172414 0.4597701 0.4252874 0.5402299 0.4827586 0.5057471 0.5172414 
## Gene10 Gene11 Gene12 Gene13 Gene14 Gene15 Gene16 Gene17 Gene18 
## 0.5057471 0.5057471 0.5862069 0.5747126 0.5172414 0.4252874 0.5057471 0.5057471 0.5517241 
## Gene19 Gene20 Gene21 Gene22 Gene23 Gene24 Gene25 Gene26 Gene27 
## 0.5057471 0.5057471 0.5747126 0.4597701 0.5517241 0.4597701 0.6321839 0.4252874 0.4712644 
## Gene28 Gene29 Gene30 Gene31 Gene32 Gene33 Gene34 Gene35 Gene36 
## 0.4942529 0.4022989 0.5172414 0.5172414 0.4827586 0.4252874 0.5632184 0.4712644 0.5172414 
## Gene37 Gene38 Gene39 Gene40 
## 0.5172414 0.4712644 0.5977011 0.4482759 
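Note that the comparison `x == 'MUT'` is case-sensitive, and the excerpt in the question mixes `Mut` and `MUT`. If the real data do the same (an assumption about the data, not something the answer above handles), one option is to normalize the case first. A small sketch on toy data:

```r
# Toy data mixing 'Mut' and 'MUT', as in the question's excerpt.
toy <- data.frame(gene1 = c("Mut", "WT", "Mut", "WT"),
                  gene2 = c("WT", "WT", "WT", "MUT"),
                  stringsAsFactors = FALSE)
toy_subtype <- c("GS", "CIN", "GS", "MSI")

# Upper-case every column; assigning into 'toy[]' keeps the data.frame shape.
toy[] <- lapply(toy, toupper)

# Same split / colMeans approach as above, now matching both spellings.
toy_l <- lapply(split(toy, toy_subtype), function(x) colMeans(x == "MUT"))
```

Without the `toupper()` step, both `Mut` samples in `gene1` would be counted as not mutated.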

And then, to combine the per-subtype results into a single table:

do.call('rbind', l) 

#   Gene1  Gene2  Gene3  Gene4  Gene5  Gene6  Gene7  Gene8 
# CIN 0.5268817 0.4516129 0.4838710 0.4408602 0.4516129 0.4301075 0.4731183 0.5376344 
# GS 0.4742268 0.4536082 0.5567010 0.4845361 0.4742268 0.5051546 0.5463918 0.4020619 
# MSI 0.4367816 0.4827586 0.5172414 0.4597701 0.4252874 0.5402299 0.4827586 0.5057471 
or something along those lines.
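If raw counts are wanted alongside the proportions, base R's `rowsum()` can sum the mutation indicator within each subtype in one step. A sketch on toy data (the object names `demo`, `demo_subtype`, `counts` and `props` are illustrative only):

```r
demo <- data.frame(Gene1 = c("MUT", "WT", "MUT", "WT", "MUT"),
                   Gene2 = c("WT", "WT", "MUT", "MUT", "WT"),
                   stringsAsFactors = FALSE)
demo_subtype <- c("GS", "CIN", "GS", "MSI", "CIN")

is_mut <- demo == "MUT"                       # logical matrix, samples x genes
counts <- rowsum(is_mut + 0L, demo_subtype)   # mutated samples per subtype and gene
n      <- table(demo_subtype)                 # number of samples per subtype
# divide each subtype's row by that subtype's sample count
props  <- counts / as.vector(n[rownames(counts)])
```

Here `props` should agree with the `do.call('rbind', l)` style of result, while `counts` keeps the numerators in case the subtype sizes differ a lot.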