2016-11-06

Removing values with grepl

I'm trying to remove retweets (strings that start with RT) from a dataset, but the `grepl` command doesn't seem to work correctly.

This works fine:

grepl("[^rt|RT][:alnum]",c("RT hi","rt boo","rtlolo","im goodRT"),ignore.case=T)
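Note that this first pattern does not actually test for a leading RT. Inside square brackets every character is a literal member of a set, so `[^rt|RT]` matches one character that is not r, t, |, R or T (the `|` is not alternation there), and `[:alnum]` without an outer bracket is just the set of the characters `:`, `a`, `l`, `n`, `u`, `m`; the POSIX class has to be written `[[:alnum:]]`. An illustrative example of these rules (not part of the original post), using R's default regex engine:

```r
# Inside a bracket expression "|" is a literal character, not alternation,
# and the whole bracket matches exactly one character.
grepl("^[rt|RT]$", c("r", "|", "rt"))
#[1] TRUE TRUE FALSE

# The POSIX class only works inside an outer bracket: [[:alnum:]].
grepl("^[[:alnum:]]$", c("7", "m"))
#[1] TRUE TRUE

# Without the outer bracket, [:alnum] is only the set { : a l n u m }.
grepl("^[:alnum]$", c("7", "m", ":"))
#[1] FALSE TRUE TRUE
```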

This one fails. Why?

data<-structure(list(data = c("RT @4MySquad: This makes me sick!\\n#whiteprivilege\\n#BlackLivesMatter \\n#Policestate https:\\/\\/t.co\\/nDL0AHwWTd", 
           "RT @weaselzippers: D.C. Police Want Help Identifying #BlackLivesMatter Supporters Who Beat And Left Hero Marine For Dead\\u2026 https:\\/\\/t.co\\/tbmO\\u2026", 
           "RT @vicegandako: #PrayForMannyPacquiao #LoveWins", "\\Dig out of the binaries of right and wrong\\ - #BlackLivesMatter at Mizzou", 
           "Even Democrats think #Bernie 's ideas are unrealistiC#insane #UNLV #BigBangTheory #Hillary2016 #blacklivesmatter https:\\/\\/t.co\\/ITDyXoAvtK", 
           "RT @eelawl1966: Former NAACP President Ben Jealous endorses Bernie Sanders\\n#BlackLivesMatter #BLM #Bernie2016 \\n https:\\/\\/t.co\\/Qom1KMwLHs", 
           "#SayNoToHillary #NoMoreClintons #FeelTheBern #BernieSanders #BlackLivesMatter #Disabled4Bernie #Women4Bernie... https:\\/\\/t.co\\/I8F21ilJgv", 
           "RT @JoshuaMannery: #BlackLivesMatter \\ud83d\\udc4a\\ud83c\\udffd https:\\/\\/t.co\\/tcEITKKGhd", 
           "lang:und", "@FoxNews Did he not say, \\Yes\\? Hopefully this story won't gain traction bc it's not reflective of the #blacklivesmatter movement", 
           "President Barack Obama Is Doing Big Things With Cuba + #BlackLivesMatter https:\\/\\/t.co\\/6gEJreOiUc", 
           "RT @Uberarabic: \\u0644\\u0644\\u0639\\u0644\\u0645 \\u0639\\u0642\\u0648\\u0628\\u0629 \\u0627\\u0644\\u0645\\u062b\\u0644\\u064a\\u064a\\u0646 \\u0641\\u064a \\u062c\\u0645\\u064a\\u0639 \\u0627\\u0644\\u062f\\u0627\\u064a\\u0627\\u0646\\u0627\\u062a \\u0627\\u0644\\u0633\\u0645\\u0627\\u0648\\u064a\\u0629 \\u0647\\u064a \\u0627\\u0644\\u0642\\u062a\\u0644\\n\\n#LoveWins", 
           "RT @AishaYesufu: Let's not forget 219#ChibokGirls still in captivity today 676 days \\n#NeverToBeForgotten #CryingToBeRescued #BringBackOurGi\\u2026", 
           "RT @arctic_matters: Chukchi Sea. #LoveWins https:\\/\\/t.co\\/gH8KZgVZk3", 
           ". @DoubleFine r u joking, tim u know the servers aren't working you dumb asshole #gamergate", 
           "RT @realkingcalii: #BlackLivesMatter Kendrick Lamar \\Alright\\ - https:\\/\\/t.co\\/amlRn0fKsA", 
           "RT @DreamersMOMS: Community representing #CCA &amp; @geogroups making dirty $$$$ w\\/immigrants. #WeAreFlorida #not1more #immigration https:\\/\\/t.c\\u2026", 
           "id_str:700", "RT @DreamersMOMS: Con compa\\u00f1eras de Carolina del Norte apoy\\u00e1ndonos en #Tallahassee. #ProteccionNoDeportation #Not1More @grisalonso https:\\/\\/\\u2026", 
           "RT @IkeIsaacson2: Hey #blacklivesmatter this is a hate crime done by racists in your name. https:\\/\\/t.co\\/6uGSXAJcrM" 
)), .Names = "data", row.names = c(NA, 20L), class = "data.frame") 

data[grepl("[^rt|RT][:alnum]",data,ignore.case=T)] 
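A likely reason the second call misbehaves, separate from the pattern itself: `data` is a data frame, not a character vector. `grepl()` converts a non-character `x` with `as.character()`, which turns each column into one deparsed string, so the call returns a single logical value, and `data[...]` then selects columns rather than rows. The sketch below uses a hypothetical four-row frame, `toy`, built from the test strings above; passing the column (`toy$data`) gives one result per row.

```r
toy <- data.frame(data = c("RT hi", "rt boo", "rtlolo", "im goodRT"),
                  stringsAsFactors = FALSE)

# Whole data frame: coerced to ONE string per column,
# i.e. the deparsed text c("RT hi", "rt boo", "rtlolo", "im goodRT").
length(grepl("^RT\\s", toy, ignore.case = TRUE))
#[1] 1

# The column itself: one logical per row, as intended.
grepl("^RT\\s", toy$data, ignore.case = TRUE)
#[1] TRUE TRUE FALSE FALSE
```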

A related question also uses Twitter data; there, the pattern is written as RT at the start of the string (`^`) followed by one or more whitespace characters (`\\s+`).


Did you mean `grepl("^([rt|RT])[[:alnum:]]+", c("RT hi","rt boo","rtlolo","im goodRT"), ignore.case = TRUE) #[1] TRUE TRUE TRUE FALSE`? – akrun


I didn't want to exclude the third string, `rtlolo`. In my case it's valid, because it doesn't start with `(rt|RT)[\s]`. – Rilcon42


In case I botched that regex in the comment above: I only want to exclude tweets that start with `rt ` or `RT ` (note the trailing space). – Rilcon42
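Because `ignore.case = TRUE` also accepts mixed case such as `Rt `, one option that matches only the two spellings named in this comment is to drop `ignore.case` and write the alternation inside parentheses, where `|` really does mean "or". A sketch on the question's test strings plus one made-up string, `"Rt mixed"`, added here for illustration:

```r
tweets <- c("RT hi", "rt boo", "rtlolo", "im goodRT", "Rt mixed")

# ^ anchors at the start; (rt|RT) is true alternation because it is in
# parentheses, not square brackets; \\s requires the whitespace after it.
is_retweet <- grepl("^(rt|RT)\\s", tweets)
is_retweet
#[1] TRUE TRUE FALSE FALSE FALSE

# Keep everything that is not a retweet.
kept <- tweets[!is_retweet]
```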

Answer


A different approach. Because of `ignore.case = TRUE`, it also picks up elements that start with `rt` followed by a space:

grepl("^RT\\s+",c("RT hi","rt boo","rtlolo","im goodRT"), ignore.case=TRUE) 
#[1] TRUE TRUE FALSE FALSE 

grep("^RT\\s+", data$data, ignore.case=TRUE, value = TRUE) 
#[1] "RT @4MySquad: This makes me sick!\\n#whiteprivilege\\n#BlackLivesMatter \\n#Policestate https:\\/\\/t.co\\/nDL0AHwWTd"                                                                 
#[2] "RT @weaselzippers: D.C. Police Want Help Identifying #BlackLivesMatter Supporters Who Beat And Left Hero Marine For Dead\\u2026 https:\\/\\/t.co\\/tbmO\\u2026"                                                       
#[3] "RT @vicegandako: #PrayForMannyPacquiao #LoveWins"                                                                                  
#[4] "RT @eelawl1966: Former NAACP President Ben Jealous endorses Bernie Sanders\\n#BlackLivesMatter #BLM #Bernie2016 \\n https:\\/\\/t.co\\/Qom1KMwLHs"                                                          
#[5] "RT @JoshuaMannery: #BlackLivesMatter \\ud83d\\udc4a\\ud83c\\udffd https:\\/\\/t.co\\/tcEITKKGhd"                                                                       
#[6] "RT @Uberarabic: \\u0644\\u0644\\u0639\\u0644\\u0645 \\u0639\\u0642\\u0648\\u0628\\u0629 \\u0627\\u0644\\u0645\\u062b\\u0644\\u064a\\u064a\\u0646 \\u0641\\u064a \\u062c\\u0645\\u064a\\u0639 \\u0627\\u0644\\u062f\\u0627\\u064a\\u0627\\u0646\\u0627\\u062a \\u0627\\u0644\\u0633\\u0645\\u0627\\u0648\\u064a\\u0629 \\u0647\\u064a \\u0627\\u0644\\u0642\\u062a\\u0644\\n\\n#LoveWins" 
#[7] "RT @AishaYesufu: Let's not forget 219#ChibokGirls still in captivity today 676 days \\n#NeverToBeForgotten #CryingToBeRescued #BringBackOurGi\\u2026"                                                         
#[8] "RT @arctic_matters: Chukchi Sea. #LoveWins https:\\/\\/t.co\\/gH8KZgVZk3"                                                                            
#[9] "RT @realkingcalii: #BlackLivesMatter Kendrick Lamar \\Alright\\ - https:\\/\\/t.co\\/amlRn0fKsA"                                                                       
#[10] "RT @DreamersMOMS: Community representing #CCA &amp; @geogroups making dirty $$$$ w\\/immigrants. #WeAreFlorida #not1more #immigration https:\\/\\/t.c\\u2026"                                                       
#[11] "RT @DreamersMOMS: Con compa\\u00f1eras de Carolina del Norte apoy\\u00e1ndonos en #Tallahassee. #ProteccionNoDeportation #Not1More @grisalonso https:\\/\\/\\u2026"                                                      
#[12] "RT @IkeIsaacson2: Hey #blacklivesmatter this is a hate crime done by racists in your name. https:\\/\\/t.co\\/6uGSXAJcrM"   
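The `grep()` call above lists the retweets; to remove them instead, one way is to negate the match with `!` and use it as a row index (note the comma), with `drop = FALSE` so a one-column data frame stays a data frame. A minimal sketch on a small hypothetical frame, `tweets_df`, built from a few strings in the question; applied to the question's `data` in the same way, it should keep the 8 of 20 rows that the `grep()` output above does not list.

```r
tweets_df <- data.frame(
  data = c("RT @vicegandako: #PrayForMannyPacquiao #LoveWins",
           "lang:und",
           "rtlolo",
           "id_str:700"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose text starts with RT/rt followed by whitespace.
is_rt <- grepl("^RT\\s+", tweets_df$data, ignore.case = TRUE)

# Keep the remaining rows; drop = FALSE preserves the data frame.
no_rt <- tweets_df[!is_rt, , drop = FALSE]
nrow(no_rt)
#[1] 3
```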