2015-09-19 41 views

CountVectorizer: Vocabulary wasn't fitted

I instantiated a sklearn.feature_extraction.text.CountVectorizer object, passing a vocabulary through the vocabulary argument, but I get the error message sklearn.utils.validation.NotFittedError: CountVectorizer - Vocabulary wasn't fitted. Why?

Example:

import sklearn.feature_extraction 
import numpy as np 
import pickle 

# Save the vocabulary 
ngram_size = 1 
dictionary_filepath = 'my_unigram_dictionary' 
vectorizer = sklearn.feature_extraction.text.CountVectorizer(ngram_range=(ngram_size,ngram_size), min_df=1) 

corpus = ['This is the first document.', 
     'This is the second second document.', 
     'And the third one.', 
     'Is this the first document? This is right.',] 

vect = vectorizer.fit(corpus) 
print('vect.get_feature_names(): {0}'.format(vect.get_feature_names())) 
with open(dictionary_filepath, 'wb') as f:  # pickle files should be opened in binary mode
    pickle.dump(vect.vocabulary_, f)

# Load the vocabulary
with open(dictionary_filepath, 'rb') as f:
    vocabulary_to_load = pickle.load(f)
loaded_vectorizer = sklearn.feature_extraction.text.CountVectorizer(ngram_range=(ngram_size,ngram_size), min_df=1, vocabulary=vocabulary_to_load) 
print('loaded_vectorizer.get_feature_names(): {0}'.format(loaded_vectorizer.get_feature_names())) 

Output:

vect.get_feature_names(): [u'and', u'document', u'first', u'is', u'one', u'right', u'second', u'the', u'third', u'this'] 
Traceback (most recent call last): 
    File "C:\Users\Francky\Documents\GitHub\adobe\dstc4\test\CountVectorizerSaveDic.py", line 22, in <module> 
    print('loaded_vectorizer.get_feature_names(): {0}'.format(loaded_vectorizer.get_feature_names())) 
    File "C:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 890, in get_feature_names 
    self._check_vocabulary() 
    File "C:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 271, in _check_vocabulary 
    check_is_fitted(self, 'vocabulary_', msg=msg), 
    File "C:\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 627, in check_is_fitted 
    raise NotFittedError(msg % {'name': type(estimator).__name__}) 
sklearn.utils.validation.NotFittedError: CountVectorizer - Vocabulary wasn't fitted. 

Answer

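For some reason, even though you passed vocabulary=vocabulary_to_load as an argument to sklearn.feature_extraction.text.CountVectorizer(), you still need to call loaded_vectorizer._validate_vocabulary() before you can call loaded_vectorizer.get_feature_names(). The traceback shows the immediate cause: get_feature_names() checks for the fitted attribute vocabulary_, and constructing the object did not set it.

In your example, you should therefore do the following when creating the CountVectorizer object with your vocabulary: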

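# Load the pickled vocabulary (binary mode is the safe choice for pickle files)
with open(dictionary_filepath, 'rb') as f:
    vocabulary_to_load = pickle.load(f)
loaded_vectorizer = sklearn.feature_extraction.text.CountVectorizer(
    ngram_range=(ngram_size, ngram_size), min_df=1, vocabulary=vocabulary_to_load)
# Private method: sets vocabulary_ from the vocabulary passed to the constructor
loaded_vectorizer._validate_vocabulary()
print('loaded_vectorizer.get_feature_names(): {0}'.format(
    loaded_vectorizer.get_feature_names()))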
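
If you would rather not rely on a private method, here is a minimal sketch that goes through the public API instead. It assumes that transform() validates a constructor-supplied vocabulary and sets vocabulary_ itself, which is how the scikit-learn versions I am aware of behave. The temporary file location, the variable names, and the sample sentence are made up for illustration. The hasattr check is there because newer scikit-learn releases replaced get_feature_names() with get_feature_names_out().

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document? This is right.',
]

# Fit once and keep only the learned vocabulary (term -> column index).
fitted = CountVectorizer().fit(corpus)
# Plain ints keep the pickle independent of NumPy scalar types.
vocabulary = {term: int(index) for term, index in fitted.vocabulary_.items()}

# Hypothetical location for the pickled vocabulary.
path = os.path.join(tempfile.mkdtemp(), 'my_unigram_dictionary')
with open(path, 'wb') as f:
    pickle.dump(vocabulary, f)

with open(path, 'rb') as f:
    loaded_vocabulary = pickle.load(f)

loaded = CountVectorizer(vocabulary=loaded_vocabulary)
# Assumption: transform() validates the supplied vocabulary and sets vocabulary_.
counts = loaded.transform(['This is the third document.'])

# Pick whichever feature-name accessor this scikit-learn version provides.
if hasattr(loaded, 'get_feature_names_out'):
    names = [str(name) for name in loaded.get_feature_names_out()]
else:
    names = [str(name) for name in loaded.get_feature_names()]

print(names)
print(counts.toarray())
```

Since a vectorizer with a fixed vocabulary is normally used to transform documents anyway, calling transform() first is usually no extra work, and it avoids depending on an underscore-prefixed method that may change between releases.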