Remove punctuation from Unicode

I need to remove punctuation from a Unicode string. I read several posts, and the most recommended answer was this one.

I implemented it as follows:

table = dict.fromkeys(i for i in range(sys.maxunicode) if unicodedata.category(chr(i)).startswith('P')) 

def tokenize(message): 
    message = unicode(message,'utf-8').lower() 
    #print message 
    message = remove_punctuation_unicode(message) 
    return message 

def remove_punctuation_unicode(string): 
    return string.translate(table)

However, when I run the code, this error pops up:

table = dict.fromkeys(i for i in range(sys.maxunicode) if unicodedata.category(chr(i)).startswith('P')) 
TypeError: must be unicode, not str

I can't figure out what to do about it. Can someone tell me how to fix this?

Source

2016-04-16 Krishh

Which version of Python are you using? – rvs

@rvs Python 2.7 – Krishh

Try unichr instead of chr:

Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
[GCC 5.2.1 20151010] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import sys, unicodedata 
>>> table = dict.fromkeys(i for i in range(sys.maxunicode) if unicodedata.category(unichr(i)).startswith('P')) 
>>>
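For reference, here is a minimal sketch of the whole thing with that fix applied. It is not the original poster's exact code: the helper name `to_char` and the constant `PUNCT_TABLE` are made up for illustration, and the version check is an assumption added so the same snippet also runs on Python 3, where `chr` already returns a Unicode character. It also uses `sys.maxunicode + 1` so the last code point is included in the scan.

```python
import sys
import unicodedata

# Assumption for illustration: pick the function that returns a *unicode*
# character. On Python 2 that is unichr(); on Python 3 chr() already does it.
to_char = chr if sys.version_info[0] >= 3 else unichr

# Map every code point whose Unicode category starts with 'P' (punctuation)
# to None; translate() deletes characters that map to None.
PUNCT_TABLE = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(to_char(i)).startswith('P')
)


def remove_punctuation_unicode(text):
    """Return the unicode string `text` with punctuation removed."""
    return text.translate(PUNCT_TABLE)


def tokenize(message):
    """Decode UTF-8 bytes if necessary, lowercase, then strip punctuation."""
    if isinstance(message, bytes):  # on Python 2, bytes is an alias of str
        message = message.decode('utf-8')
    return remove_punctuation_unicode(message.lower())
```

Calling `tokenize(b"Hello, World!")` should give `hello world`. Note that only the `P*` categories are removed, so symbols such as `$` or `+` (categories `Sc` and `Sm`) are left in place.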

Source

2016-04-16 11:43:45 Yurim

Worked!!! The small details matter. Thanks! – Krishh

@KrishanuKonar The [post you linked](http://stackoverflow.com/a/11066687/5374161) uses 'unichr', so why did you use 'chr'? – jfs

Honestly, I read it once, understood it, and then typed it out myself. I made a typo along the way. My bad. – Krishh
