Error tokenizing data. C error: EOF following escape character

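I'm trying to load a CSV text file that I created with an OS X application written in Objective-C (using Xcode). The text file (temp2.csv) looks fine in an editor, but something is wrong with it and I get this error when reading it into a pandas DataFrame. If I copy the data into a new text file (temp.csv) and save that, it loads fine! The two text files are clearly different (one is 74 bytes, the other 150). Invisible characters, perhaps? This is very annoying because I want my Python code to load text files generated by my C code. The files are attached for reference.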

temp.csv

-3.132700,0.355885,9.000000,0.444416 
-3.128256,0.444416,9.000000,0.532507

temp2.csv

-3.132700,0.355885,9.000000,0.444416 
-3.128256,0.444416,9.000000,0.532507

(I can't find any help on this particular error on StackExchange.)

Python 2.7.11 |Anaconda 2.2.0 (x86_64)| (default, Dec 6 2015, 18:57:58) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
Anaconda is brought to you by Continuum Analytics. 
Please check out: http://continuum.io/thanks and https://anaconda.org 
>>> import pandas as pd 
>>> df = pd.read_csv("temp2.csv", header=None) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f 
    return _read(filepath_or_buffer, kwds) 
    File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 275, in _read 
    parser = TextFileReader(filepath_or_buffer, **kwds) 
    File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 590, in __init__ 
    self._make_engine(self.engine) 
    File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 731, in _make_engine 
    self._engine = CParserWrapper(self.f, **self.options) 
    File "/Users/billtubbs/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1103, in __init__ 
    self._reader = _parser.TextReader(src, **kwds) 
    File "pandas/parser.pyx", line 515, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4948) 
    File "pandas/parser.pyx", line 717, in pandas.parser.TextReader._get_header (pandas/parser.c:7496) 
    File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838) 
    File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649) 
pandas.parser.CParserError: Error tokenizing data. C error: EOF following escape character 
>>> df = pd.read_csv("temp.csv", header=None) 
>>> df 
           0         1  2         3
0 -3.132700  0.355885  9  0.444416
1 -3.128256  0.444416  9  0.532507

Postscript: I think I've found the problem.

>>> f = open('temp2.csv') 
>>> contents = f.read() 
>>> print contents 
??-3.132700,0.355885,9.000000,0.444416 
-3.128256,0.444416,9.000000,0.532507 
>>> contents 
'\xff\xfe-\x003\x00.\x001\x003\x002\x007\x000\x000\x00,\x000\x00.\x003\x005\x005\x008\x008\x005\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00\n\x00-\x003\x00.\x001\x002\x008\x002\x005\x006\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x005\x003\x002\x005\x000\x007\x00'

It's full of escape characters! How do I remove them?

Source

2016-01-11 Bill

The file is encoded as UTF-16, so you need to pass the encoding parameter to read_csv.

import pandas as pd 

contents = '\xff\xfe-\x003\x00.\x001\x003\x002\x007\x000\x000\x00,\x000\x00.\x003\x005\x005\x008\x008\x005\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00\n\x00-\x003\x00.\x001\x002\x008\x002\x005\x006\x00,\x000\x00.\x004\x004\x004\x004\x001\x006\x00,\x009\x00.\x000\x000\x000\x000\x000\x000\x00,\x000\x00.\x005\x003\x002\x005\x000\x007\x00' 

text_file = open("test/file1.csv", "wb") 
text_file.write(contents) 
text_file.close() 

df = pd.read_csv("test/file1.csv", header=None, encoding='utf-16') 
print df 

           0         1  2         3
0 -3.132700  0.355885  9  0.444416
1 -3.128256  0.444416  9  0.532507
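
If the encoding of incoming files is not known in advance, one option is to look at the first few bytes for a byte-order mark (BOM) before choosing the encoding. The `\xff\xfe` at the start of temp2.csv is the UTF-16 little-endian BOM. The following is only a minimal sketch, assuming Python 3; `guess_encoding` and `_BOMS` are hypothetical names introduced for illustration, not part of pandas, and files without a BOM simply fall back to the default.

```python
import codecs

# Known byte-order marks, longest first so that a UTF-32 LE BOM
# (which starts with the UTF-16 LE BOM bytes) is not misidentified.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw, default="utf-8"):
    """Return a codec name based on a leading BOM, or `default` if none is found."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


# First bytes of temp2.csv as shown in the question: BOM, then "-3.1" in UTF-16 LE.
sample = b"\xff\xfe-\x003\x00.\x001\x00"
encoding = guess_encoding(sample)
text = sample.decode(encoding)  # the "utf-16" codec consumes the BOM
print(encoding, repr(text))
```

For a real file, the first four bytes can be read with `open(path, "rb").read(4)` and the result passed on as `pd.read_csv(path, header=None, encoding=encoding)`.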

Source

2016-01-11 06:54:09 jezrael
