How to use pos_tag in NLTK?

I was trying to tag a bunch of words in a list (POS tagging, to be exact).

lw is a list of words (it is really long, otherwise I would have posted it), and this is what I ran:

pos = [nltk.pos_tag(i,tagset='universal') for i in lw]

lw looks like [['hello'],['world']] (that is, a list of lists where each inner list contains one word), but when I tried to run it I got:

Traceback (most recent call last): 
    File "<pyshell#183>", line 1, in <module> 
    pos = [nltk.pos_tag(i,tagset='universal') for i in lw] 
    File "<pyshell#183>", line 1, in <listcomp> 
    pos = [nltk.pos_tag(i,tagset='universal') for i in lw] 
    File "C:\Users\my system\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\tag\__init__.py", line 134, in pos_tag 
    return _pos_tag(tokens, tagset, tagger) 
    File "C:\Users\my system\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\tag\__init__.py", line 102, in _pos_tag 
    tagged_tokens = tagger.tag(tokens) 
    File "C:\Users\my system\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\tag\perceptron.py", line 152, in tag 
    context = self.START + [self.normalize(w) for w in tokens] + self.END 
    File "C:\Users\my system\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\tag\perceptron.py", line 152, in <listcomp> 
    context = self.START + [self.normalize(w) for w in tokens] + self.END 
    File "C:\Users\my system\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\tag\perceptron.py", line 240, in normalize 
    elif word[0].isdigit(): 
IndexError: string index out of range

Can someone tell me why I am getting this error and how to fix it? Many thanks.

Source

2017-11-27 EighteenthVariable

First, use human-readable variable names, it helps =)

Next, the input to pos_tag is a list of strings. If you start from a raw string, tokenize it with word_tokenize first:

>>> from nltk import pos_tag, word_tokenize 
>>> a_sentence = 'hello world' 
>>> word_tokenize(a_sentence) 
['hello', 'world'] 
>>> pos_tag(word_tokenize(a_sentence)) 
[('hello', 'NN'), ('world', 'NN')] 

>>> two_sentences = ['hello world', 'good morning'] 
>>> [word_tokenize(sent) for sent in two_sentences] 
[['hello', 'world'], ['good', 'morning']] 
>>> [pos_tag(word_tokenize(sent)) for sent in two_sentences] 
[[('hello', 'NN'), ('world', 'NN')], [('good', 'JJ'), ('morning', 'NN')]]

If the sentences are already tokenized into lists of words, pass each list to pos_tag directly:

>>> from nltk import pos_tag 
>>> sentences = [ ['hello', 'world'], ['good', 'morning'] ] 
>>> [pos_tag(sent) for sent in sentences] 
[[('hello', 'NN'), ('world', 'NN')], [('good', 'JJ'), ('morning', 'NN')]]
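Since the shape of the input matters here, a small check before tagging can catch mistakes early. The sketch below uses only plain Python; `is_token_list` is a hypothetical helper made up for illustration, not an NLTK function. It accepts a list of strings and rejects a bare string, which Python would otherwise iterate character by character.

```python
def is_token_list(candidate):
    """True only for a list (or tuple) whose items are all strings."""
    if isinstance(candidate, str):
        # A bare string is iterable, but its items are single characters.
        return False
    return isinstance(candidate, (list, tuple)) and all(
        isinstance(item, str) for item in candidate
    )


# A bare string iterates as characters, not as words.
characters = list("hello")

checks = {
    "bare string": is_token_list("hello world"),
    "token list": is_token_list(["hello", "world"]),
    "list of lists": is_token_list([["hello"], ["world"]]),
}
```

Going by the question's description, each element of lw (for example ['hello']) would pass this check, so the shape alone does not seem to explain the error.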

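And if you have the sentences in a paragraph, you can use sent_tokenize to split it into sentences first: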

>>> from nltk import sent_tokenize, word_tokenize, pos_tag 
>>> text = "Hello world. Good morning." 
>>> sent_tokenize(text) 
['Hello world.', 'Good morning.'] 
>>> [word_tokenize(sent) for sent in sent_tokenize(text)] 
[['Hello', 'world', '.'], ['Good', 'morning', '.']] 
>>> [pos_tag(word_tokenize(sent)) for sent in sent_tokenize(text)] 
[[('Hello', 'NNP'), ('world', 'NN'), ('.', '.')], [('Good', 'JJ'), ('morning', 'NN'), ('.', '.')]]

See also the answers to: How to do POS tagging using the NLTK POS tagger in Python?

Source

2017-11-28 01:51:09 alvas

Thanks, it works, but the only problem is that I was also wondering _why_ this was happening. Nevertheless, I appreciate your answer. – EighteenthVariable
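On the _why_: the last frame of the traceback in the question fails at `word[0].isdigit()`, and indexing position 0 of a string raises `IndexError: string index out of range` only when the string is empty. So a likely cause, assuming lw really is a list of lists of strings, is that at least one inner list contains `''`. Below is a minimal sketch for locating and dropping such tokens before tagging. The helper names `find_empty_tokens` and `drop_empty_tokens` and the sample data are hypothetical, not part of NLTK or the question.

```python
def find_empty_tokens(sentences):
    """Return (sentence_index, token_index) pairs for every empty-string token."""
    return [
        (i, j)
        for i, sent in enumerate(sentences)
        for j, tok in enumerate(sent)
        if tok == ""
    ]


def drop_empty_tokens(sentences):
    """Remove empty-string tokens, then drop sentences left with no tokens."""
    cleaned = [[tok for tok in sent if tok != ""] for sent in sentences]
    return [sent for sent in cleaned if sent]


# Hypothetical data shaped like the question's lw, with one empty token.
list_of_words = [["hello"], [""], ["world"]]

# Indexing an empty string reproduces the error from the traceback.
try:
    ""[0]
except IndexError as err:
    error_message = str(err)

bad_positions = find_empty_tokens(list_of_words)
cleaned_words = drop_empty_tokens(list_of_words)
```

With the list cleaned this way, `[nltk.pos_tag(sent, tagset='universal') for sent in cleaned_words]` should no longer reach that line with an empty word, provided empty strings were in fact the cause.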
