Python equivalent of R's read.table

I'm trying to move some of my processing from R to Python. In R, I use read.table() to read really messy CSV files, and it automatically splits the records into the correct format. For example:

391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p> 

<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p> 
","windows-7 printer hp"

is correctly split into 4 columns. A single record can span many lines, and there are commas all over the place. In R I just do:

read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)

Is there something in Python that can do this equally well?

Thanks!

Source: 2013-10-23, mchangun

You can use the csv module:

import csv

# newline="" is what the csv docs recommend: the reader handles line endings itself
with open("C:/text.txt", "r", newline="") as f:
    for row in csv.reader(f, quotechar='"'):
        print(row)

Output:

['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']

Length = 4

Source: 2013-10-23 08:59:05

But this returns strings. It doesn't infer the type of each column the way read.table does.
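To illustrate the comment above: csv.reader yields every field as a str, so any numeric conversion has to be done by hand. The following is only a minimal sketch. The in-memory `data` string is a shortened stand-in for the sample record, and converting just the first column is an assumption made for illustration.

```python
import csv
import io

# Shortened, in-memory stand-in for the sample record (illustrative only).
data = '391788,"HP Deskjet 3050 scanner","<p>line one</p>\n\n<p>line two</p>\n","windows-7 printer hp"\n'

rows = list(csv.reader(io.StringIO(data)))
row = rows[0]

# Every field comes back as a string, including the numeric ID.
field_types = [type(field).__name__ for field in row]

# Conversion has to be done manually, column by column.
record_id = int(row[0])
print(len(rows), len(row), field_types, record_id)
```

The multi-line quoted field is still parsed as one value, so the record has four fields. Only the typing differs from what read.table gives you.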

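The pandas module also offers many R-like functions and data structures, including read_csv. The advantage here is that the data is read in as a pandas DataFrame, which is a little easier to manipulate than a standard Python list or dict (especially if you're accustomed to R). Here is an example: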

>>> from pandas import read_csv 
>>> ugly = read_csv("ugly.csv",header=None) 
>>> ugly 
     0            1 \ 
0 391788 HP Deskjet 3050 scanner always seems to break 

                2      3 
0 <p>I'm running a Windows 7 64 blah blah blah..... windows-7 printer hp

Source: 2013-10-23 14:25:30, David
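Since the R call in the question passes nrows=chunksize, here is a hedged sketch of how the same idea might look with pandas. read_csv accepts both an `nrows` and a `chunksize` parameter and infers column types on its own. The in-memory `data` string and its three records are made up for illustration and stand in for the real file.

```python
import io
import pandas as pd

# Shortened, in-memory stand-in for a file of records like the sample (illustrative only).
data = (
    '391788,"HP Deskjet 3050 scanner","<p>line one</p>\n\n<p>line two</p>\n","windows-7 printer hp"\n'
    '391789,"Second title","<p>a, b, c</p>","tag-a tag-b"\n'
    '391790,"Third title","<p>more text</p>","tag-c"\n'
)

# nrows reads only the first N records, similar to nrows= in read.table.
head = pd.read_csv(io.StringIO(data), header=None, nrows=2)
print(head.dtypes)  # column 0 is inferred as an integer type

# chunksize returns an iterator of DataFrames instead of one large frame.
chunk_lengths = [
    len(chunk)
    for chunk in pd.read_csv(io.StringIO(data), header=None, chunksize=2)
]
print(chunk_lengths)
```

With a real file, the io.StringIO(data) argument would be replaced by the file path. Iterating over chunks keeps memory use bounded when the file is large.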
