Reading non-ASCII characters from a text file

I'm using Python 2.7. I have tried many things, such as codecs, but nothing worked. How can I fix this?

myfile.txt

wörd

My code

f = open('myfile.txt','r') 
for line in f: 
    print line 
f.close()

Output

s\xc3\xb6zc\xc3\xbck

The output is the same in Eclipse and in the command window. I'm using Win7. There is no problem with any characters when I don't read them from a file.

Source

2012-04-29 Rckt

What result are you expecting? Technically, Python is reading the file correctly. – srgerg

Why do you print the characters line by line? Why not simply say 'for line in f: print line'? When I did that, it printed "söcük" as required. – srgerg

I tried that, but it doesn't work. It printed s\xc3\xb6zc\xc3\xbck. – Rckt
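The escaped output quoted above is how the UTF-8 bytes of the text look when they are shown as a raw byte string instead of as decoded text. A minimal sketch, assuming the file is UTF-8 encoded (the byte values are copied from the question's output; the variable names are illustrative):

```python
# The bytes from the question's output, written as a byte string literal.
raw = b's\xc3\xb6zc\xc3\xbck'

# Decoding interprets the bytes as UTF-8 and yields text
# (a unicode object in Python 2, a str in Python 3).
text = raw.decode('utf-8')

# Each two-byte sequence collapses into a single character.
print(len(raw))   # 8 bytes
print(len(text))  # 6 characters: s, o-umlaut, z, c, u-umlaut, k
```

So the file is being read correctly; what is missing is the decoding step from bytes to text.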

First of all, detect the encoding of the file:


    from chardet import detect 
    encoding = lambda x: detect(x)['encoding'] 
    print encoding(line)

Then convert it to unicode, or to a str in your default encoding:


    n_line=unicode(line,encoding(line),errors='ignore') 
    print n_line 
    print n_line.encode('utf8')
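If chardet is not installed, a cruder stand-in is to try a short list of candidate encodings in order. This is not what chardet does (chardet guesses from byte statistics), and the helper name decode_with_fallback and the candidate list are assumptions for illustration only:

```python
def decode_with_fallback(raw, candidates=('utf-8', 'cp1252')):
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte to a code point, so it never fails.
    return raw.decode('latin-1'), 'latin-1'


# The same word stored as UTF-8 and as cp1252 decodes to the same text.
text_a, used_a = decode_with_fallback(b'w\xc3\xb6rd')
text_b, used_b = decode_with_fallback(b'w\xf6rd')
```

Unlike errors='ignore', this approach never silently drops bytes, but it can still pick a wrong encoding when several candidates decode without error, so the order of the candidates matters.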

Source

2012-04-30 00:16:51 lavrton

It is the terminal's encoding. Try configuring your terminal with the same encoding that you use in your file. I recommend using UTF-8.
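One way to check this is to ask Python which encoding it believes the terminal uses, and whether the text can be represented in it. A minimal sketch; the helper name can_encode and the sample string are hypothetical:

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# sys.stdout.encoding can be None (for example when output is piped),
# so fall back to ASCII for the check.
terminal_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
sample = u'w\xf6rd'  # "word" spelled with an o-umlaut

print(terminal_encoding)
print(can_encode(sample, terminal_encoding))
```

If the second line prints False, the terminal's encoding cannot represent the character, and the terminal settings need to change rather than the code that reads the file.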

By the way, to avoid this kind of problem I recommend decoding all input and encoding all output:

f = open('test.txt','r')  
for line in f: 
    l = unicode(line, encoding='utf-8')  # decode the input
    print l.encode('utf-8')  # encode the output
f.close()
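The same decode-on-input, encode-on-output idea can be sketched with the byte/text boundary made explicit. The temporary file and the variable names below are assumptions for illustration, and the file is assumed to be UTF-8:

```python
import os
import tempfile

# Write a small UTF-8 file to read back; the path is a throwaway temp file.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(u'w\xf6rd\n'.encode('utf-8'))  # encode at the output boundary

with open(path, 'rb') as src:
    raw_lines = src.readlines()  # bytes, exactly as stored on disk
os.remove(path)

# Decode at the input boundary; from here on the program works with text.
decoded = [raw.decode('utf-8') for raw in raw_lines]

# Text operations are per character, not per byte.
print(len(raw_lines[0]))  # 6 bytes: w, 2 bytes for o-umlaut, r, d, newline
print(len(decoded[0]))    # 5 characters
shouted = decoded[0].strip().upper()
```

Keeping bytes at the edges and text in the middle means operations such as len() and upper() behave per character instead of per byte.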

Source

2012-04-30 00:18:12 jgomo3

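Now I see why they are making UTF-8 the standard in 3.0. (PEP 3120) – mgold

@mgold: PEP 3120 is all about the encoding of source (.py) files. It has nothing to do with the OP's problem, which concerns the encoding of input and output. –

Ooh. Good catch. – mgold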

import codecs
# open the file so that its contents are decoded from UTF-8
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# read the whole file into a unicode string
sfile = f.read()
f.close()

# check the type of the decoded text
print type(sfile)  # <type 'unicode'>

# encode the unicode string back to a byte string to display it properly
print sfile.encode('utf-8')

# check the type of the encoded string
print type(sfile.encode('utf-8'))  # <type 'str'>
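As an alternative sketch, io.open (available since Python 2.6, and the same function as the built-in open in Python 3) accepts the same encoding argument and decodes while reading. This swaps in io.open for codecs.open; the helper name read_lines and the temporary demo file are hypothetical:

```python
import io
import os
import tempfile


def read_lines(path, encoding='utf-8'):
    """Return the file's lines as text, decoded while reading."""
    with io.open(path, 'r', encoding=encoding) as f:  # closed automatically
        return [line.rstrip(u'\n') for line in f]


# Demo with a throwaway file containing "word" spelled with an o-umlaut.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(demo_path, 'w', encoding='utf-8') as out:
    out.write(u'w\xf6rd\n')

lines = read_lines(demo_path)
os.remove(demo_path)
```

The with statement closes the file even if decoding raises an error, and the same code runs unchanged on Python 2.6+ and Python 3.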

Source

2013-02-09 11:58:32
