Select specific rows and cells in a text file and put them into a dataframe: Python or R

Either Python or R would be fine. I would like to select the "Basic Stats" rows so that the final output looks something like this:

ROI    Min  Max   Mean  Stdev 
mrc_ranch_house -20.208261 6.025762 -8.866403 5.289712 
river_1   -20.187374 -6.694543 -12.227586 2.66464 
river_2   -18.365091 -5.820825 -13.164463 2.851231

The text file itself looks like this:

ROI: mrc_ranch_house [Red] 195 points 

Basic Stats  Min  Max   Mean  Stdev 
    Band 1 -20.208261 6.025762 -8.866403 5.289712 

Histogram   DN  Npts Total Percent  Acc Pct 
Band 1  -20.208261  1  1 0.5128  0.5128 
Bin=0.10287 -20.105383  0  1 0.0000  0.5128 
      -20.002504  1  2 0.5128  1.0256 
      -19.899626  0  2 0.0000  1.0256 
      -19.796747  0  2 0.0000  1.0256 
      -19.693869  0  2 0.0000  1.0256 
      -19.590990  0  2 0.0000  1.0256 
      -19.488112  0  2 0.0000  1.0256 

Stats for ROI: river_1 [Blue] 90 points      
Basic Stats  Min  Max   Mean  Stdev   
    Band 1 -20.187374 -6.694543 -12.227586 2.66464  

Histogram   DN  Npts Total Percent  Acc Pct  
Band 1  -20.187374 1 1 1.1111 1.1111 
Bin=0.05291 -20.134461 0 1 0 1.1111 
     -20.081548 0 1 0 1.1111 
     -20.028635 0 1 0 1.1111 
     -19.975722 0 1 0 1.1111 


Stats for ROI: river_2 [Blue] 96 points     
Basic Stats  Min  Max   Mean  Stdev  
    Band 1 -18.365091 -5.820825 -13.164463 2.851231  

Histogram    DN  Npts Total Percent  Acc Pct 
Band 1   -18.365091 1 1 1.0417 1.0417 
Bin=0.04919 -18.315898 0 1 0 1.0417 
     -18.266705 0 1 0 1.0417 
     -18.217512 0 1 0 1.0417

I would like to put this information, together with the ROI name, into a pandas dataframe or a data table in R, and so on for the remaining ROIs.

Thanks!

2017-02-28 JAG2024

You can use 'readLines' to read the data in line by line. You will also need some regex-fu to parse this tidy output. –

'readLines' in Python? Thanks for the edit, by the way. – JAG2024

It is a base R function. See also 'gsub'. –
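As a rough Python counterpart to the 'readLines' plus regex idea in the comments above, the sketch below walks the text line by line using only the standard library. The sample text, the helper name `parse_basic_stats`, and the assumption that the values row follows each "Basic Stats" header are illustrative and not taken from the real file.

```python
import re

# Hypothetical sample mirroring the layout shown in the question;
# the real file may use tabs or different spacing.
SAMPLE = """\
Stats for ROI: river_1 [Blue] 90 points
Basic Stats  Min  Max   Mean  Stdev
    Band 1 -20.187374 -6.694543 -12.227586 2.66464

Histogram   DN  Npts Total Percent  Acc Pct
Band 1  -20.187374 1 1 1.1111 1.1111
Bin=0.05291 -20.134461 0 1 0 1.1111

Stats for ROI: river_2 [Blue] 96 points
Basic Stats  Min  Max   Mean  Stdev
    Band 1 -18.365091 -5.820825 -13.164463 2.851231
"""

ROI_RE = re.compile(r"ROI:\s+(\w+)")        # captures the ROI name
NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")     # plain decimal numbers only
COLUMNS = ["ROI", "Min", "Max", "Mean", "Stdev"]


def parse_basic_stats(lines):
    """Return one dict per ROI holding its name and Basic Stats values."""
    rows = []
    roi = None
    expect_stats = False
    for line in lines:
        match = ROI_RE.search(line)
        if match:
            roi = match.group(1)
            expect_stats = False
        elif line.lstrip().startswith("Basic Stats"):
            # the next non-blank line carries the four values
            expect_stats = True
        elif expect_stats and line.strip():
            # the first number found is the "1" of "Band 1"; keep the last four
            values = [float(v) for v in NUM_RE.findall(line)[-4:]]
            rows.append(dict(zip(COLUMNS, [roi] + values)))
            expect_stats = False
    return rows


rows = parse_basic_stats(SAMPLE.splitlines())
```

The resulting list of dicts can be handed to `pandas.DataFrame(rows)` if a dataframe is wanted; histogram rows are ignored because they never directly follow a "Basic Stats" header.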

The following R code gives the result as desired:

# read the text file 
txt <- readLines('https://dl.dropboxusercontent.com/u/45095175/rois_all.txt') 

# create an index for the lines that are needed 
ti <- rep(which(grepl('ROI:', txt)), each = 3) + 1:3 
# create a grouping vector of the same length 
grp <- rep(1:33, each = 3) 

# filter the text with the index 'ti' 
# and split into a list with grouping variable 'grp' 
lst <- split(txt[ti], grp) 
# loop over the list and read the text parts in as dataframes 
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', header = TRUE, 
              blank.lines.skip = TRUE)) 

# bind the dataframes in the list together in one data.frame 
DF <- do.call(rbind, lst) 
# change the name of the first column 
names(DF)[1] <- 'ROI' 

# get the correct ROI's for the ROI-column 
DF$ROI <- sub('.*: (\\w+).*$', '\\1', txt[grepl('ROI: ', txt)])

which gives:

> DF 
       ROI  Min  Max  Mean Stdev 
1 mrc_ranch_house -20.208261 6.025762 -8.866403 5.289712 
2   river_1 -20.187374 -6.694543 -12.227586 2.664640 
3   river_2 -18.365091 -5.820825 -13.164463 2.851231 
4   river_3 -18.291010 -4.583666 -12.092995 3.479293 
5   river_4 -17.074295 -4.926921 -9.970926 2.897855 
6   river_5 -16.849176 -8.622208 -12.387085 2.168462 
7 adjacent_river_2 -18.987597 -7.957749 -13.392523 1.962263 
8 adjacent_river_3 -19.426531 -8.640042 -13.467425 1.888105 
9 adjacent_river_4 -20.452566 -6.830183 -12.833450 2.124761 
10   bcs_1_ -23.612043 -8.221417 -16.032305 2.080695 
11   bcs_2_ -24.018219 -10.648975 -16.814048 1.948863 
12   bcs_3_ -23.011086 -9.106754 -15.404174 1.867498 
13   red_1_ -22.313442 -7.839107 -14.768196 2.134152 
14   red_2_ -22.551537 -7.236300 -14.613618 2.204253 
15   red_3_ -22.057703 -7.746992 -14.483161 2.123497 
16   bcs_4 -22.705107 -8.972753 -15.201623 1.817122 
17   bcs_5 -24.109459 -10.113716 -15.776537 1.849163 
18   glade_1_ -19.913187 -6.189866 -12.695884 3.303929 
19   glade_2_ -19.812855 -4.672865 -11.995191 4.840168 
20   glade_3_ -10.078033 -2.828722 -5.877417 1.941401 
21   mwea_b -13.979379 -4.977155 -11.392434 2.019037 
22    kaga -13.114172 -8.889531 -10.649324 1.290551 
23    huku -14.206743 -7.853305 -10.608210 1.441250 
24    ruai -18.643108 -12.645180 -14.54.224183 
25   tumaini -19.543234 -13.164941 -15.899968 1.812876 
26   nkando -19.973492 -7.040238 -11.716987 2.617544 
27   jikaze -16.408030 -9.001065 -12.323898 1.942196 
28  miarage_b -15.126486 -6.661448 -10.391111 1.764279 
29   batian -15.269146 -9.603316 -11.962470 1.168859 
30   gitaraga -17.037708 -7.495215 -10.886802 2.561877 
31  wiumiririe -9.578024 -6.225223 -7.688715 1.059796 
32   chumvi -14.883148 -10.327570 -12.819469 1.231636 
33 next_to_airstrip -17.242777 -5.207252 -10.601750 1.987712

The last part (from binding the list together into one dataframe onwards) can also be done with the rbindlist function from the data.table package:

# load the 'data.table' package for the 'rbindlist' function 
library(data.table) 
# bind the dataframes in the list together to a data.table (enhanced version of a data.frame) 
DT <- rbindlist(lst) 
# change the name of the first column 
setnames(DT, 1, 'ROI') 

# get the correct ROI's for the ROI-column 
DT[, ROI := sub('.*: (\\w+).*$', '\\1', txt[grepl('ROI: ', txt)])]
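For readers working in Python, the same index-and-offset idea can be sketched with pandas. This is only an illustration under stated assumptions: the tab-separated sample text below is made up to mimic the layout (ROI line, blank line, header, Band row), and the real rois_all.txt may differ.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the layout is an assumption.
TXT = (
    "Stats for ROI: river_1 [Blue] 90 points\n"
    "\n"
    "Basic Stats\tMin\tMax\tMean\tStdev\n"
    "Band 1\t-20.187374\t-6.694543\t-12.227586\t2.66464\n"
    "\n"
    "Histogram\tDN\tNpts\tTotal\tPercent\tAcc Pct\n"
    "Band 1\t-20.187374\t1\t1\t1.1111\t1.1111\n"
    "\n"
    "Stats for ROI: river_2 [Blue] 96 points\n"
    "\n"
    "Basic Stats\tMin\tMax\tMean\tStdev\n"
    "Band 1\t-18.365091\t-5.820825\t-13.164463\t2.851231\n"
).splitlines()

# positions of the lines that announce an ROI, like which(grepl('ROI:', txt))
roi_idx = [i for i, line in enumerate(TXT) if "ROI:" in line]

frames = []
for i in roi_idx:
    # take the three lines after each ROI line and parse them as a tiny table;
    # the leading blank line is skipped by read_csv
    chunk = "\n".join(TXT[i + 1:i + 4])
    frames.append(pd.read_csv(io.StringIO(chunk), sep="\t", skip_blank_lines=True))

# bind the per-ROI frames together, like do.call(rbind, lst) or rbindlist(lst)
df = pd.concat(frames, ignore_index=True)
df = df.rename(columns={"Basic Stats": "ROI"})

# replace the "Band 1" labels with the names taken from the ROI lines
df["ROI"] = [TXT[i].split("ROI: ")[1].split()[0] for i in roi_idx]
```

Unlike the R version, no grouping vector of fixed length is needed here, because each chunk is parsed inside the loop.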

2017-02-28 16:55:19 Jaap

Gah, you beat me to it https://twitter.com/romunov/status/836622674944266240 –

@RomanLuštrik You beat me to it as well, I was a little late – Jaap

Hey @Jaap! I was wondering whether you could help me with this quick question. I now have a text file with multiple "bands" and could not get your code to work. Could you take a look at it here? http://stackoverflow.com/questions/42614688/broken-r-code-to-select-specific-rows-and-cells-in-text-file-and-put-into-data-f?noredirect=1#comment72359303_42614688 – JAG2024

I could not find a solution that does it in a single import, and every row of data is still called Band 1, but this is a good start.

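import pandas as pd 

# on_bad_lines='skip' (pandas >= 1.3) replaces the removed error_bad_lines=False 
data = pd.read_csv(r'rois_all.txt', delimiter='\t', on_bad_lines='skip', skiprows=[0, 1]) 
data = data.dropna() 
# .loc replaces the removed .ix indexer 
data = data.loc[data.loc[:, 'Basic Stats'] != 'Basic Stats', :] 
data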
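To make it clearer why this filtering leaves only the "Band 1" statistics rows, here is a small self-contained sketch. It assumes pandas 1.3 or later and a made-up tab-separated excerpt in which histogram rows have six fields while the stats header has five; the real file may not behave identically.

```python
import io

import pandas as pd

# Hypothetical tab-separated excerpt (assumed layout, not the real file).
RAW = (
    "Basic Stats\tMin\tMax\tMean\tStdev\n"
    "Band 1\t-20.187374\t-6.694543\t-12.227586\t2.66464\n"
    "Histogram\tDN\tNpts\tTotal\tPercent\tAcc Pct\n"
    "Band 1\t-20.187374\t1\t1\t1.1111\t1.1111\n"
    "Stats for ROI: river_2 [Blue] 96 points\n"
    "Basic Stats\tMin\tMax\tMean\tStdev\n"
    "Band 1\t-18.365091\t-5.820825\t-13.164463\t2.851231\n"
)

# rows with more fields than the header (the histogram rows) are skipped
data = pd.read_csv(io.StringIO(RAW), sep="\t", on_bad_lines="skip")

# the one-field "Stats for ROI" row is padded with NaN and dropped here
data = data.dropna()

# the repeated header row survives parsing as text, so filter it out
data = data[data["Basic Stats"] != "Basic Stats"].copy()

# the repeated header forced the columns to text; convert them back
cols = ["Min", "Max", "Mean", "Stdev"]
data[cols] = data[cols].apply(pd.to_numeric)
```

Note that the surviving rows keep their original positions as index labels, which matters when the ROI names are attached later.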

I then extracted all of the Basic Stats names, which look like this:

0  mrc_ranch_house 
1    river_1 
2    river_2

They were extracted with the following code:

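names = pd.read_csv(r'rois_all.txt', delimiter='\t', on_bad_lines='skip', skiprows=[0, 1]) 

# .ix has been removed from pandas; use .loc instead 
names = names.loc[names.loc[:, 'Basic Stats'] != '  Band 1'] 
names = names.loc[names.loc[:, 'Basic Stats'] != 'Basic Stats'] 
# raw string for the regex; expand=False keeps the result a Series 
names = names.loc[:, 'Basic Stats'].str.extract(r'Stats for ROI: (.*) \[.*\] [0-9]*', expand=False) 
# the first ROI line is lost to skiprows, so its name is set by hand 
names.loc[0] = 'mrc_ranch_house' 
names = names.sort_index() 
names = names.reset_index(drop=True)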

For reference, the output of data from the first snippet, where every row is still labelled Band 1, is:

Basic Stats Min   Max   Mean  Stdev 
0 Band 1 -20.208261 6.025762 -8.866403 5.289712 
3 Band 1 -20.187374 -6.694543 -12.227586 2.664640 
6 Band 1 -18.365091 -5.820825 -13.164463 2.851231

With data and names as in the examples above, join them like so:

data = data.reset_index(drop=True)  # so the index matches that of names 
data['Basic Stats'] = names

which gives:

Basic Stats  Min   Max   Mean  Stdev 
0 mrc_ranch_house -20.208261 6.025762 -8.866403 5.289712 
1 river_1   -20.187374 -6.694543 -12.227586 2.664640 
2 river_2   -18.365091 -5.820825 -13.164463 2.851231
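The name extraction and the join can also be illustrated in isolation. The sketch below assumes a handful of raw lines worded like the excerpt in the question and a stand-in `data` frame with a gappy index; the looser pattern `ROI: (\w+) \[` is a substitution chosen here so that the first ROI line does not need to be set by hand, not the pattern used in the answer.

```python
import pandas as pd

# Hypothetical raw lines, worded like the excerpt in the question.
lines = pd.Series([
    "ROI: mrc_ranch_house [Red] 195 points",
    "    Band 1",
    "Stats for ROI: river_1 [Blue] 90 points",
    "    Band 1",
    "Stats for ROI: river_2 [Blue] 96 points",
])

# capture the word after "ROI: "; lines that do not match become NaN
names = lines.str.extract(r"ROI: (\w+) \[", expand=False).dropna()

# stand-in for the filtered data frame, which keeps a gappy index
data = pd.DataFrame(
    {
        "Basic Stats": ["Band 1", "Band 1", "Band 1"],
        "Min": [-20.208261, -20.187374, -18.365091],
    },
    index=[0, 3, 6],
)

# assign by position so the differing indexes cannot introduce NaN
data = data.reset_index(drop=True)
data["Basic Stats"] = names.to_numpy()
```

Assigning a Series directly would align on index labels, so rows 3 and 6 of the original `data` would receive NaN; converting to a NumPy array sidesteps that.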

2017-02-28 16:40:03 josh

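Could you add the part about how to select the ROI names and add this information to a column? – JAG2024

This is not a complete solution. –

@RomanLuštrik I said as much in my answer. I had that part of the solution, and it seemed more useful to JAG2024 than not posting anything. – josh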

Another ugly solution. The result is a plain old data.frame.

rois_all <- file("https://dl.dropboxusercontent.com/u/45095175/rois_all.txt") 

xy <- readLines(rois_all) 

# find lines where ROI starts 
roin <- grep(pattern = "ROI: ", x = xy) 
roi <- xy[roin] 
roi <- gsub(".*ROI: (\\w+).*$", "\\1", roi) 

# find lines with stats 
stats <- grep(pattern = "Basic Stats", x = xy) 

# trim whitespace and collect Col 
cn <- trimws(sapply(strsplit(xy[stats][1], "\t"), "[", 2:5, simplify = FALSE)[[1]]) 

# split the stat line by \t and extract only elements 2 to 5. merge row-wise 
out <- do.call(rbind, sapply(strsplit(xy[stats + 1], "\t"), "[", 2:5, simplify = FALSE)) 
out <- as.data.frame(apply(out, MARGIN = 2, as.numeric)) 

# add ROI column extracted earlier 
out <- cbind(roi, out) 

colnames(out) <- c("ROI", cn) 

out 

       ROI  Min  Max  Mean Stdev 
1 mrc_ranch_house -20.208261 6.025762 -8.866403 5.289712 
2   river_1 -20.187374 -6.694543 -12.227586 2.664640 
3   river_2 -18.365091 -5.820825 -13.164463 2.851231 
4   river_3 -18.291010 -4.583666 -12.092995 3.479293 
5   river_4 -17.074295 -4.926921 -9.970926 2.897855 
6   river_5 -16.849176 -8.622208 -12.387085 2.168462 
7 adjacent_river_2 -18.987597 -7.957749 -13.392523 1.962263 
8 adjacent_river_3 -19.426531 -8.640042 -13.467425 1.888105 
9 adjacent_river_4 -20.452566 -6.830183 -12.833450 2.124761 
10   bcs_1_ -23.612043 -8.221417 -16.032305 2.080695 
11   bcs_2_ -24.018219 -10.648975 -16.814048 1.948863 
12   bcs_3_ -23.011086 -9.106754 -15.404174 1.867498 
13   red_1_ -22.313442 -7.839107 -14.768196 2.134152 
14   red_2_ -22.551537 -7.236300 -14.613618 2.204253 
15   red_3_ -22.057703 -7.746992 -14.483161 2.123497 
16   bcs_4 -22.705107 -8.972753 -15.201623 1.817122 
17   bcs_5 -24.109459 -10.113716 -15.776537 1.849163 
18   glade_1_ -19.913187 -6.189866 -12.695884 3.303929 
19   glade_2_ -19.812855 -4.672865 -11.995191 4.840168 
20   glade_3_ -10.078033 -2.828722 -5.877417 1.941401 
21   mwea_b -13.979379 -4.977155 -11.392434 2.019037 
22    kaga -13.114172 -8.889531 -10.649324 1.290551 
23    huku -14.206743 -7.853305 -10.608210 1.441250 
24    ruai -18.643108 -12.645180 -14.54.224183 
25   tumaini -19.543234 -13.164941 -15.899968 1.812876 
26   nkando -19.973492 -7.040238 -11.716987 2.617544 
27   jikaze -16.408030 -9.001065 -12.323898 1.942196 
28  miarage_b -15.126486 -6.661448 -10.391111 1.764279 
29   batian -15.269146 -9.603316 -11.962470 1.168859 
30   gitaraga -17.037708 -7.495215 -10.886802 2.561877 
31  wiumiririe -9.578024 -6.225223 -7.688715 1.059796 
32   chumvi -14.883148 -10.327570 -12.819469 1.231636 
33 next_to_airstrip -17.242777 -5.207252 -10.601750 1.987712
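The same split-by-tab approach translates fairly directly to plain Python. The following is a sketch only: the list `XY` is a made-up, tab-separated stand-in for the lines of the file, and the slice `[1:5]` is the zero-based equivalent of taking elements 2 to 5 in R.

```python
# Hypothetical tab-separated lines (assumed layout, not the real file).
XY = [
    "Stats for ROI: river_1 [Blue] 90 points",
    "Basic Stats\t Min\t Max\t Mean\t Stdev",
    "Band 1\t-20.187374\t-6.694543\t-12.227586\t2.66464",
    "Stats for ROI: river_2 [Blue] 96 points",
    "Basic Stats\t Min\t Max\t Mean\t Stdev",
    "Band 1\t-18.365091\t-5.820825\t-13.164463\t2.851231",
]

# ROI names: the first word after "ROI: "
roi = [line.split("ROI: ")[1].split()[0] for line in XY if "ROI: " in line]

# positions of the stats header lines
stats = [i for i, line in enumerate(XY) if "Basic Stats" in line]

# column names from the first header, whitespace trimmed
cn = [c.strip() for c in XY[stats[0]].split("\t")[1:5]]

# the line after each header holds the values; keep fields 2 to 5
out = []
for name, i in zip(roi, stats):
    values = [float(v) for v in XY[i + 1].split("\t")[1:5]]
    out.append({"ROI": name, **dict(zip(cn, values))})
```

Passing `out` to `pandas.DataFrame` would give a frame with the columns ROI, Min, Max, Mean and Stdev.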

2017-02-28 17:06:44

Thanks @RomanLuštrik. This code is very clear. I now have a text file with multiple bands and am trying to get it to work, but so far without success. Could you take a look: http://stackoverflow.com/questions/42614688/broken-r-code-to-select-specific-rows-and-cells-in-text-file-and-put-into-data-f?noredirect=1#comment72359303_42614688 – JAG2024
