Python 2.7: merging Unicode CSV files

I have a code snippet like this:

import csv, sys, os

rootdir = sys.argv[1]
for root, subFolders, files in os.walk(rootdir):
    outfileName = rootdir + "\\root-dir.csv"  # hardcoded path
    # for subdir in subFolders:
    for file in files:
        filePath = os.path.join(root, file)
        with open(filePath) as csvin:
            readfile = csv.reader(csvin, delimiter=',')
            # the with blocks close both files, so no explicit close() is needed
            with open(outfileName, 'a') as csvout:
                writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
                for row in readfile:
                    row.extend([file])
                    writefile.writerow(row)
print("Ready!")

It works great with ASCII files, but it does not work with Unicode ones. Here is an example of an autoruns log file: https://cloud.mail.ru/public/6Gqc/MKjKaqs8B. I need to merge several such files. How can I change this code to do that? It needs to work on Python 2.7.

Thank you!

Source

2017-01-23 Oleg

The Python documentation for the csv module has a great example of reading from and writing to Unicode CSVs.

import csv, codecs, cStringIO

# UTF8Recoder is the helper class from the same example in the Python 2 docs;
# UnicodeReader below depends on it.
class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """

    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

Source

2017-01-23 08:57:07 2ps

I tried using it, but it could not read the data correctly: 'utf8' codec can't decode byte 0xff in position 0. If I remove the first two bytes of the file, I get a different error: line contains NULL byte – Oleg

@Oleg It sounds like your data file is UTF-16, not UTF-8. –
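One way to check this before choosing a reader is to look at the first bytes of the file. A UTF-16 file usually starts with a byte order mark (FF FE or FE FF), which matches the 0xff at position 0 in the error above, and the NULL bytes are what ASCII characters look like once they are stored as two bytes each. The following is only a minimal sketch: `sniff_encoding` and its `default` argument are hypothetical names, not part of any library, and a file without a byte order mark is simply assumed to be in the default encoding.

```python
import codecs


def sniff_encoding(path, default="utf-8"):
    """Guess a text encoding from the byte order mark (BOM), if any.

    Hypothetical helper for illustration; files without a BOM fall back
    to `default`, which is an assumption and not a real detection.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 one.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default
```

The returned name can be passed as the `encoding` argument of `UnicodeReader` or of `io.open`; the `utf-16` and `utf-8-sig` codecs consume the byte order mark themselves, so there is no need to strip the first two bytes by hand.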

Could you suggest how to read UTF-16? – Oleg
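A possible approach, sketched under assumptions rather than tested against the linked log file: open each input with `io.open(..., encoding="utf-16")`, which decodes the text and consumes the byte order mark, and on Python 2 pass UTF-8 byte strings to the `csv` module, since it cannot handle `unicode` objects there. The names `read_rows` and `merge_csv` are illustrative, the inputs are assumed to be UTF-16 with a byte order mark, and the merged file is written as UTF-8.

```python
import csv
import io
import os
import sys

PY2 = sys.version_info[0] == 2


def read_rows(path, encoding="utf-16"):
    """Yield rows of text cells from a CSV file stored in `encoding`."""
    with io.open(path, "r", encoding=encoding, newline="") as f:
        if PY2:
            # Python 2's csv module only accepts byte strings,
            # so feed it UTF-8 and decode every cell afterwards.
            utf8_lines = (line.encode("utf-8") for line in f)
            for row in csv.reader(utf8_lines):
                yield [cell.decode("utf-8") for cell in row]
        else:
            for row in csv.reader(f):
                yield row


def merge_csv(paths, out_path, in_encoding="utf-16"):
    """Append the rows of every input file to out_path as UTF-8,
    adding the source file name as the last column."""
    if PY2:
        out = open(out_path, "wb")
    else:
        out = io.open(out_path, "w", encoding="utf-8", newline="")
    with out:
        writer = csv.writer(out, lineterminator="\n")
        for path in paths:
            name = os.path.basename(path)
            if isinstance(name, bytes):
                # Python 2 byte-string path: decode it to text first.
                name = name.decode(sys.getfilesystemencoding() or "utf-8")
            for row in read_rows(path, in_encoding):
                row = row + [name]
                if PY2:
                    row = [cell.encode("utf-8") for cell in row]
                writer.writerow(row)
```

To use it with the original directory walk, collect the paths with `os.walk(rootdir)` and `os.path.join(root, file)` first, then call `merge_csv(paths, out_path)`. Writing the merged file outside `rootdir`, or skipping it while collecting paths, keeps the walk from picking up the output as one of its own inputs.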
