2016-05-26

R: data frame - add rows with means

I have a data frame like this:

subject_id area side value confound1 confound2 confound3 
s01 A left 5 154 952 no 
s01 A right 7 154 952 no 
s01 B left 15 154 952 no 
s01 B right 17 154 952 no 
s02 A left 3 130 870 yes 
s02 A right 5 130 870 yes 
s02 B left 12 130 870 yes 
s02 B right 11 130 870 yes 

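For each subject, I would like to add a row with the average of left and right for each area, while keeping the other variables: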

subject_id area side value confound1 confound2 confound3 
s01 A left 5 154 952 no 
s01 A right 7 154 952 no 
s01 A avg 6 154 952 no 
s01 B left 15 154 952 no 
s01 B right 17 154 952 no 
s01 B avg 16 154 952 no 
s02 A left 3 130 870 yes 
s02 A right 5 130 870 yes 
s02 A avg 4 130 870 yes 
s02 B left 12 130 870 yes 
s02 B right 11 130 870 yes 
s02 B avg 11.5 130 870 yes 

How can I do this?

Answers


Here is a method using the base R functions aggregate and rbind.

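# get the data; stringsAsFactors=TRUE makes confound3 a factor, which the
# relabelling step below relies on (this was the default before R 4.0.0)
df <- read.table(header=TRUE, stringsAsFactors=TRUE,
        text="subject_id area side value confound1 confound2 confound3
        s01 A left 5 154 952 no
        s01 A right 7 154 952 no
        s01 B left 15 154 952 no
        s01 B right 17 154 952 no
        s02 A left 3 130 870 yes
        s02 A right 5 130 870 yes
        s02 B left 12 130 870 yes
        s02 B right 11 130 870 yes")

# get the average values; cbind() turns the factor confound3 into its integer codes
dfAgg <- aggregate(cbind(value=value, confound1=confound1,
         confound2=confound2, confound3=confound3) ~
        subject_id + area, data=df, FUN=mean)
# add variables: label the new rows and map the codes back to the original levels
dfAgg$side <- "avg"
dfAgg$confound3 <- factor(levels(df$confound3)[dfAgg$confound3],
                          levels=levels(df$confound3))

# rbind the averages (rbind.data.frame matches columns by name)
dfFinal <- rbind(df, dfAgg)

# order the data: left, right, avg within each subject and area
dfFinal <- dfFinal[order(dfFinal$subject_id, dfFinal$area,
                         factor(dfFinal$side, levels=c("left", "right", "avg"))),]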

This doesn't work, because the side.avg rows always have confound3 = "no", which in this case is wrong for subject_id = "B" – Anders


@Anders That is not the case. – lmo
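A variant of the same base R approach that avoids converting confound3 to integer codes and back: because the confound columns are constant within each subject, they can be added to the grouping side of the formula, and aggregate() then copies them into the averaged rows with their original type. This is a sketch that assumes the confounds really are constant per subject; if they vary, you would get more than one avg row per subject/area. The names avg_rows and out are only illustrative.

```r
# Same data as in the question; keep text columns as character
df <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "subject_id area side value confound1 confound2 confound3
  s01 A left 5 154 952 no
  s01 A right 7 154 952 no
  s01 B left 15 154 952 no
  s01 B right 17 154 952 no
  s02 A left 3 130 870 yes
  s02 A right 5 130 870 yes
  s02 B left 12 130 870 yes
  s02 B right 11 130 870 yes")

# Average `value` per subject/area. The confounds are assumed constant within a
# subject, so grouping on them just carries them into the result unchanged.
avg_rows <- aggregate(value ~ subject_id + area + confound1 + confound2 + confound3,
                      data = df, FUN = mean)
avg_rows$side <- "avg"

# Stack the averages under the original rows, in df's column order
out <- rbind(df, avg_rows[names(df)])

# Put left, right, avg in that order within each subject/area
out <- out[order(out$subject_id, out$area,
                 match(out$side, c("left", "right", "avg"))), ]
rownames(out) <- NULL
out
```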


I would use tidyr to spread the data and then gather it again. Here:

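library(dplyr) 
library(tidyr) 

df %>% 
    spread(side, value) %>%             # one row per subject/area, with left and right columns
    mutate(avg = (left + right)/2) %>%  # average of the two sides
    gather(side, value, left:avg)       # back to long format, now including the avg rows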

     subject_id area confound1 confound2 confound3 side value 
1   s01 A  154  952  no left 5.0 
2   s01 B  154  952  no left 15.0 
3   s02 A  130  870  yes left 3.0 
4   s02 B  130  870  yes left 12.0 
5   s01 A  154  952  no right 7.0 
6   s01 B  154  952  no right 17.0 
7   s02 A  130  870  yes right 5.0 
8   s02 B  130  870  yes right 11.0 
9   s01 A  154  952  no avg 6.0 
10  s01 B  154  952  no avg 16.0 
11  s02 A  130  870  yes avg 4.0 
12  s02 B  130  870  yes avg 11.5 
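spread() and gather() still work but have been superseded since tidyr 1.0.0 by pivot_wider() and pivot_longer(). A minimal sketch of the same reshape with the newer functions, assuming tidyr 1.0.0 or later; the final arrange() is an addition that restores the row order asked for in the question (left, right, avg within each subject/area).

```r
library(dplyr)
library(tidyr)

df <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "subject_id area side value confound1 confound2 confound3
  s01 A left 5 154 952 no
  s01 A right 7 154 952 no
  s01 B left 15 154 952 no
  s01 B right 17 154 952 no
  s02 A left 3 130 870 yes
  s02 A right 5 130 870 yes
  s02 B left 12 130 870 yes
  s02 B right 11 130 870 yes")

out <- df %>%
  # wide: one row per subject/area (confounds kept as id columns)
  pivot_wider(names_from = side, values_from = value) %>%
  # new column with the mean of the two sides
  mutate(avg = (left + right) / 2) %>%
  # long again: one row per side, now including "avg"
  pivot_longer(c(left, right, avg), names_to = "side", values_to = "value") %>%
  # group the rows by subject and area as in the question
  arrange(subject_id, area)

out
```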

Using the dplyr library, you can do something like this:

library(dplyr) 
df %>% group_by(subject_id, area) %>% mutate(mean_left_right = mean(value)) 

The output is:

Source: local data frame [8 x 8] 
Groups: subject_id, area [4] 

    subject_id area side value confound1 confound2 confound3 mean_left_right 
     <chr> <chr> <chr> <int>  <int>  <int>  <chr>   <dbl> 
1  s01  A left  5  154  952  no    6.0 
2  s01  A right  7  154  952  no    6.0 
3  s01  B left 15  154  952  no   16.0 
4  s01  B right 17  154  952  no   16.0 
5  s02  A left  3  130  870  yes    4.0 
6  s02  A right  5  130  870  yes    4.0 
7  s02  B left 12  130  870  yes   11.5 
8  s02  B right 11  130  870  yes   11.5 
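The mutate() call adds the mean as a new column on every row rather than as separate rows. To get the layout from the question with dplyr, the means can be computed once per group with summarise() and bound back onto the data. A sketch, assuming dplyr 1.0.0 or later (for the .groups argument) and that the confounds are constant within each subject so they can be part of the grouping:

```r
library(dplyr)

df <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "subject_id area side value confound1 confound2 confound3
  s01 A left 5 154 952 no
  s01 A right 7 154 952 no
  s01 B left 15 154 952 no
  s01 B right 17 154 952 no
  s02 A left 3 130 870 yes
  s02 A right 5 130 870 yes
  s02 B left 12 130 870 yes
  s02 B right 11 130 870 yes")

# One averaged row per subject/area, keeping the confound columns
avg_rows <- df %>%
  group_by(subject_id, area, confound1, confound2, confound3) %>%
  summarise(value = mean(value), .groups = "drop") %>%
  mutate(side = "avg")

# bind_rows() matches columns by name; then order left, right, avg
out <- bind_rows(df, avg_rows) %>%
  arrange(subject_id, area, match(side, c("left", "right", "avg")))

out
```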

An option using data.table:

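library(data.table) 
# setDT() converts df to a data.table in place; the averages are computed per
# subject/area (carrying the confounds), reordered to df's columns, stacked
# with df and sorted so that left, right, avg follow each other
rbind(setDT(df)[, .(side = 'avg', value = mean(value)), 
    .(subject_id, area, confound1, confound2, confound3)][, 
    names(df), with = FALSE], df)[order(subject_id, area, 
     factor(side, levels = c('left', 'right', 'avg')))] 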
# subject_id area side value confound1 confound2 confound3 
# 1:  s01 A left 5.0  154  952  no 
# 2:  s01 A right 7.0  154  952  no 
# 3:  s01 A avg 6.0  154  952  no 
# 4:  s01 B left 15.0  154  952  no 
# 5:  s01 B right 17.0  154  952  no 
# 6:  s01 B avg 16.0  154  952  no 
# 7:  s02 A left 3.0  130  870  yes 
# 8:  s02 A right 5.0  130  870  yes 
# 9:  s02 A avg 4.0  130  870  yes 
#10:  s02 B left 12.0  130  870  yes 
#11:  s02 B right 11.0  130  870  yes 
#12:  s02 B avg 11.5  130  870  yes 
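A variation on the data.table answer, sketched under the same assumption that the confounds are constant per subject: it copies the data with as.data.table() instead of converting df by reference with setDT(), and builds the grouping columns as "everything except side and value", so the confound names do not have to be typed out. The names dt, keys, avg_rows and out are only illustrative.

```r
library(data.table)

df <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "subject_id area side value confound1 confound2 confound3
  s01 A left 5 154 952 no
  s01 A right 7 154 952 no
  s01 B left 15 154 952 no
  s01 B right 17 154 952 no
  s02 A left 3 130 870 yes
  s02 A right 5 130 870 yes
  s02 B left 12 130 870 yes
  s02 B right 11 130 870 yes")

dt <- as.data.table(df)  # a copy, so df itself is left unchanged

# Group by every column except the ones being summarised
keys <- setdiff(names(dt), c("side", "value"))
avg_rows <- dt[, .(side = "avg", value = mean(value)), by = keys]

# Stack by column name, then order left, right, avg within subject/area
out <- rbind(dt, avg_rows, use.names = TRUE)
out <- out[order(subject_id, area, match(side, c("left", "right", "avg")))]

out
```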