2016-04-13 6 views

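I am trying to compare large sequence data with and without SNPs, and to mark each SNP as synonymous or non-synonymous. I have a .fasta sequence and a .bim file from PLINK with the conserved (reference) and alternative nucleotides, and I need to say whether an alternative nucleotide leads to a missense mutation, that is, an amino acid change. I am planning to use the seqinr package; maybe I am going about this the wrong way?

head(test) 

  pos ALT REF
1   2   G   T
2   8   G   T
3  65   C   G
4  68   C   G
5  77   T   C
6  78   G   C

I can replace the reference nucleotides with the alternative ones:

alt=ref # work on a copy so that ref itself stays unchanged
alt[test$pos]=as.vector(test$ALT) 

So I end up with two sequence vectors (in the alt vector the alternative nucleotides are marked in upper case):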

ref=c("a","t","g","t","c","g","t","c","g","g","c","c","g","c","g","g","g","c", 
"c","a","a","g","a","c","a","a","c","g","g","a","g","a","t","a","c","c", 
"g","c","t","g","g","g","g","a","c","t","a","c","a","t","c","a","a","g", 
"t","g","g","a","t","g","t","g","c","g","g","c","g","c","c","g","g","t", 
"g","g","c","c","g","t","g","c","g","g","g","c","g","g","c","g","c","c", 
"a","t","g","g","c","c","a","a","c","c","t","c","c","a","g","c","g","c", 
"g","g","c","g","t","t","g","g","c","t","c","c","c","t","c","g","t","c", 
"c","g","t","g","a","c","a","t","t","g","g","c","g","a","c","c","c","c", 
"t","g","c","c","t","c","a","a","c","c","c","a","t","c","c","c","c","c", 
"g","t","t","a","a","g") 

alt=c("a","G","g","t","c","g","t","G","g","g","c","c","g","c","g","g","g","c", 
"c","a","a","g","a","c","a","a","c","g","g","a","g","a","t","a","c","c", 
"g","c","t","g","g","g","g","a","c","t","a","c","a","t","c","a","a","g", 
"t","g","g","a","t","g","t","g","c","g","C","c","g","C","c","g","g","t", 
"g","g","c","c","T","G","g","c","g","g","C","c","g","g","c","g","c","c", 
"a","t","g","g","c","c","a","a","c","c","t","c","c","a","g","c","g","c", 
"g","g","c","g","t","t","g","g","c","t","C","c","c","t","c","g","C","c", 
"c","T","t","g","a","c","a","T","t","g","g","c","g","a","c","c","c","c", 
"t","g","c","c","t","c","a","a","c","c","c","a","t","c","c","c","C","c", 
"g","t","t","a","a","g") 

I can translate these vectors into amino acids:

library(seqinr) # provides translate()
t_ref=translate(ref) 
t_alt=translate(alt) 

Then I can compare them and say which positions have changed:

which((ref==alt)==FALSE) 
which((t_ref==t_alt)==FALSE) 

So the question is how to mark the nucleotides in the test data frame that lead to an amino acid change. Thanks in advance.

Answer


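Use integer division (`%/%`, the companion of the modulo operator) to construct the position in the protein sequence from the pos column. Lionir pointed out that I had got the "register" of the modulo arithmetic wrong in my first attempt, which is what the first lookups below show; the assignment to test$change uses the corrected register.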

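library(seqinr) 
# First attempt: the register is off by one codon for positions that are multiples of 3
test$pos %/% 3 # returns a zero-based position, so add 1 to get a 1-based value 
#[1] 0 2 21 22 25 26 
t_ref[ 1+(test$pos %/% 3)] # look up the value in the protein sequence
#[1] "M" "S" "G" "A" "R" "A" 
t_alt[ 1+(test$pos %/% 3)] # test for equality to this value
#[1] "R" "W" "A" "A" "L" "A" 
# Corrected register: subtract 1 before dividing, so positions 1-3 map to codon 1
# Note: change is TRUE when the amino acid is the SAME in both translations
test$change <- t_ref[ 1+((test$pos-1) %/% 3)] == t_alt[ 1+((test$pos-1) %/% 3)] 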
test 
#===================== 
  pos ALT REF change
1   2   G   T  FALSE
2   8   G   T  FALSE
3  65   C   G  FALSE
4  68   C   G   TRUE
5  77   T   C  FALSE
6  78   G   C  FALSE

Properly "registered" conversion from nucleotide position to (zero-based) codon position:

> (1:21 -1) %/% 3 
[1] 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 
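
To make the register explicit for the SNP positions shown in head(test), here is a minimal base-R sketch. The names `codon_index`, `codon_offset` and `naive_index` are introduced only for illustration and are not part of the original answer.

```r
pos <- c(2, 8, 65, 68, 77, 78)          # SNP positions from head(test)

# Corrected register: positions 1-3 -> codon 1, 4-6 -> codon 2, ...
codon_index  <- (pos - 1) %/% 3 + 1     # 1 3 22 23 26 26
# Place of each SNP inside its codon (1st, 2nd or 3rd base)
codon_offset <- (pos - 1) %% 3 + 1      # 2 2 2 2 2 3

# First-attempt register, for comparison: wrong only when pos is a multiple of 3
naive_index  <- pos %/% 3 + 1           # 1 3 22 23 26 27

print(data.frame(pos, codon_index, codon_offset, naive_index))
```

Positions 77 and 78 fall in the same codon (26), so a whole-sequence comparison of `t_ref` and `t_alt` cannot tell which of the two SNPs caused a change in that codon.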

It works well when the SNPs are not close together, but not when they are close: we end up marking both SNPs as 'FALSE'. – Lionir


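You have not said what the "correct" answer is, and I cannot understand the terms "near" and "far" in this context. If it fails, it is not because of distance from the origin. The "register" of the modulo arithmetic may be wrong. See whether this test is better: `test$change <- t_ref[1+((test$pos-1) %/% 3)] == t_alt[1+((test$pos-1) %/% 3)]` –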


But anyway, you solved my problem; I just added an exception rule for neighbouring SNPs. Thanks! – Lionir
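
The thread does not show the exception rule Lionir used. One possible way to handle SNPs that share a codon is to test each SNP on its own: substitute only that one ALT base into the reference codon and compare the two amino acids. The sketch below assumes this "one SNP at a time" reading is what is wanted (it ignores whether neighbouring SNPs occur together on the same haplotype). The helper `classify_snps`, the lookup `code` and the toy data are hypothetical; the lookup is the standard genetic code written out in base R, and `seqinr::translate()` could be used on each codon instead.

```r
# Standard genetic code as a named lookup (codon -> amino acid), base R only.
bases <- c("t", "c", "a", "g")
grid  <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
code  <- strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
names(code) <- paste0(grid$b1, grid$b2, grid$b3)

# Hypothetical helper: classify each SNP in isolation, ignoring all other SNPs.
# Assumes every SNP lies in a complete codon (no trailing partial codon).
classify_snps <- function(ref, pos, alt_base) {
  ref    <- tolower(ref)
  start  <- 3 * ((pos - 1) %/% 3) + 1   # first base of the codon holding each SNP
  offset <- (pos - 1) %% 3 + 1          # place of the SNP inside its codon (1..3)
  vapply(seq_along(pos), function(i) {
    codon_ref <- ref[start[i]:(start[i] + 2)]
    codon_alt <- codon_ref
    codon_alt[offset[i]] <- tolower(alt_base[i])   # apply only this SNP
    aa_ref <- code[[paste(codon_ref, collapse = "")]]
    aa_alt <- code[[paste(codon_alt, collapse = "")]]
    if (aa_ref == aa_alt) "synonymous" else "non-synonymous"
  }, character(1))
}

# Toy data (not the sequence from the question): codons atg | cgt | tta
ref_toy <- c("a", "t", "g", "c", "g", "t", "t", "t", "a")
snps <- data.frame(pos = c(2, 6, 7), ALT = c("G", "C", "C"), stringsAsFactors = FALSE)
snps$effect <- classify_snps(ref_toy, snps$pos, snps$ALT)
print(snps)
```

With the toy data, the SNP at position 2 turns atg (M) into agg (R), while the SNPs at positions 6 and 7 leave R and L unchanged. On the question's data the call would be along the lines of `classify_snps(ref, test$pos, as.vector(test$ALT))`.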
