2016-08-02

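awk to get a .csv column that contains commas and newlines

I have data in a .csv column that sometimes contains commas and newlines. When the data contains a comma, the whole string is enclosed in double quotes. How can I parse that column's output into a .txt file while accounting for the newlines and commas? My command does not work.

Sample data: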

,"This is some text with a , in it.", #data with commas are enclosed in double quotes 

,line 1 of data 
line 2 of data, #data with a couple of newlines 

,"Data that may a have , in it and 
also be on a newline as well.", 

Here is what I have so far:

awk -F "\"*,\"*" '{print $4}' file.csv > column_output.txt 

Can your double-quote-delimited fields contain double quotes, and if so, how are they escaped? For example, "foo \"bar\"" or "foo ""bar"""?

Answer

$ cat decsv.awk 
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," } 
{ 
    # create strings that cannot exist in the input to map escaped quotes to 
    gsub(/a/,"aA") 
    gsub(/\\"/,"aB") 
    gsub(/""/,"aC") 

    # prepend previous incomplete record segment if any 
    $0 = prev $0 
    numq = gsub(/"/,"&") 
    if (numq % 2) { 
     # this is inside double quotes so incomplete record 
     prev = $0 RT 
     next 
    } 
    prev = "" 

    for (i=1;i<=NF;i++) { 
     # map the replacement strings back to their original values 
     gsub(/aC/,"\"\"",$i) 
     gsub(/aB/,"\\\"",$i) 
     gsub(/aA/,"a",$i) 
    } 

    printf "Record %d:\n", ++recNr 
    for (i=0;i<=NF;i++) { 
     printf "\t$%d=<%s>\n", i, $i 
    } 
    print "#######" 
}

$ awk -f decsv.awk file 
Record 1: 
     $0=<,"This is some text with a , in it.", #data with commas are enclosed in double quotes> 
     $1=<> 
     $2=<"This is some text with a , in it."> 
     $3=< #data with commas are enclosed in double quotes> 
####### 
Record 2: 
     $0=<,"line 1 of data 
line 2 of data", #data with a couple of newlines> 
     $1=<> 
     $2=<"line 1 of data 
line 2 of data"> 
     $3=< #data with a couple of newlines> 
####### 
Record 3: 
     $0=<,"Data that may a have , in it and 
also be on a newline as well.",> 
     $1=<> 
     $2=<"Data that may a have , in it and 
also be on a newline as well."> 
     $3=<> 
####### 
Record 4: 
     $0=<,"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well.",> 
     $1=<> 
     $2=<"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well."> 
     $3=<> 
####### 

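The above uses GNU awk for FPAT and RT. I don't know of any CSV format that allows a newline in the middle of a field that isn't enclosed in quotes (if it did, you could never tell where a record ended), so the script doesn't allow for that. The above was run on this input file: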

$ cat file 
,"This is some text with a , in it.", #data with commas are enclosed in double quotes 
,"line 1 of data 
line 2 of data", #data with a couple of newlines 
,"Data that may a have , in it and 
also be on a newline as well.", 
,"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well.", 