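Calling NLTK's concordance - how to get the text before/after a word that was used?

I would like to find out what text comes after the instance that concordance returns. For example, in the example given in the 'Searching Text' section, they get the concordance of the word 'monstrous'. How would you get the words that come right after each instance of monstrous?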

Source

2012-01-17 dev.e.loper

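import nltk
import nltk.book as book  # loads the sample texts; text1 is Moby Dick

text1 = book.text1
# Index the tokens case-insensitively, the same way Text.concordance does
c = nltk.ConcordanceIndex(text1.tokens, key=lambda s: s.lower())
# For each occurrence of 'monstrous', take the token right after it
print([text1.tokens[offset+1] for offset in c.offsets('monstrous')])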

yields

['size', 'bulk', 'clubs', 'cannibal', 'and', 'fable', 'Pictures', 'pictures', 'stories', 'cabinet', 'size']

I found this by looking up how the concordance method is defined.

The IPython output below shows that text1.concordance is defined in /usr/lib/python2.7/dist-packages/nltk/text.py:

In [107]: text1.concordance? 
Type:  instancemethod 
Base Class: <type 'instancemethod'> 
String Form: <bound method Text.concordance of <Text: Moby Dick by Herman Melville 1851>> 
Namespace: Interactive 
File:  /usr/lib/python2.7/dist-packages/nltk/text.py

In that file you will find:

def concordance(self, word, width=79, lines=25):
    ...
    self._concordance_index = ConcordanceIndex(self.tokens,
                                               key=lambda s:s.lower())
    ...
    self._concordance_index.print_concordance(word, width, lines)

This shows how a ConcordanceIndex object is instantiated.

In the same file you will also find:

class ConcordanceIndex(object):
    def __init__(self, tokens, key=lambda x:x):
        ...
    def print_concordance(self, word, width=75, lines=25):
        ...
        offsets = self.offsets(word)
        ...
        right = ' '.join(self._tokens[i+1:i+context])

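With some experimentation in the IPython interpreter, this shows that self.offsets('monstrous') gives a list of numbers (offsets) at which the word monstrous can be found. You can access the actual words with self._tokens[offset], which is the same as text1.tokens[offset].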
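To make the idea of offsets concrete, here is a minimal pure-Python sketch of an index that maps each normalized token to the positions where it occurs. The name build_offsets and the sample tokens are made up for illustration; this is a simplified stand-in for the behaviour described above, not NLTK's actual code.

```python
from collections import defaultdict

def build_offsets(tokens, key=lambda s: s):
    """Map key(token) to the list of positions where that token occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[key(token)].append(position)
    return index

# Hypothetical sample tokens, standing in for text1.tokens.
tokens = ["A", "monstrous", "size", "and", "a", "Monstrous", "bulk"]

# Lower-casing the key makes the lookup case-insensitive.
offsets = build_offsets(tokens, key=lambda s: s.lower())["monstrous"]
print(offsets)
print([tokens[offset] for offset in offsets])
```

Because the key lower-cases each token, both 'monstrous' and 'Monstrous' end up under the same entry, which is presumably why the concordance call passes key=lambda s: s.lower().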

So the word that follows monstrous is text1.tokens[offset+1].
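Since the question also asks about the words before an occurrence, the same indexing idea can be extended to a window on both sides. The following is only a sketch under assumptions: the helper name neighbors and its before/after parameters are hypothetical, and the token list is a small sample rather than Moby Dick. It uses slicing instead of tokens[offset+1], so a match at the very start or end of the text does not raise an IndexError.

```python
def neighbors(tokens, offsets, before=1, after=1):
    """Return (words_before, words_after) for each offset, clipped at the text edges."""
    results = []
    for offset in offsets:
        start = max(0, offset - before)  # keep the slice start from going negative
        left = tokens[start:offset]
        right = tokens[offset + 1:offset + 1 + after]
        results.append((left, right))
    return results

# Hypothetical sample tokens; with NLTK these would be text1.tokens.
tokens = ["monstrous", "size", "of", "a", "most", "monstrous"]

# Stand-in for c.offsets('monstrous'): positions of the word, ignoring case.
offsets = [i for i, t in enumerate(tokens) if t.lower() == "monstrous"]

contexts = neighbors(tokens, offsets, before=2, after=2)
for left, right in contexts:
    print(left, "|", right)
```

With NLTK, the offsets would come from c.offsets('monstrous') and the tokens from text1.tokens, as in the snippet at the top of this answer.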

Source

2012-01-17 17:11:18 unutbu
