Using DocumentTermMatrix with the 'dictionary' parameter in R

I want to use R for text classification. I use DocumentTermMatrix to get back a matrix of word counts:

library(tm) 
crude <- "japan korea usa uk albania azerbaijan" 
corps <- Corpus(VectorSource(crude)) 
dtm <- DocumentTermMatrix(corps) 
inspect(dtm) 

words <- c("australia", "korea", "uganda", "japan", "argentina", "turkey") 
test <- DocumentTermMatrix(corps, control=list(dictionary = words)) 
inspect(test)

The first call, inspect(dtm), works as expected and gives this result:

    Terms
Docs albania azerbaijan japan korea usa
   1       1          1     1     1   1

But the second call, inspect(test), shows this result:

    Terms
Docs argentina australia japan korea turkey uganda
   1         0         1     0     1      0      0

whereas the result I expected is:

    Terms
Docs argentina australia japan korea turkey uganda
   1         0         0     1     1      0      0
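
As a sanity check that does not depend on tm, the expected counts can be reproduced with base R alone. This is only a sketch: it assumes plain whitespace tokenisation and sorts the dictionary alphabetically to match the column order shown above. The names `tokens` and `expected` are introduced here for illustration.

```r
crude <- "japan korea usa uk albania azerbaijan"
words <- c("australia", "korea", "uganda", "japan", "argentina", "turkey")

# Split the document on whitespace (an assumption; tm's tokenizer may
# behave differently on less tidy input)
tokens <- strsplit(crude, "\\s+")[[1]]

# Count only dictionary terms; using factor levels keeps zero-count terms
# in the table, and tokens outside the dictionary become NA and are dropped
expected <- table(factor(tokens, levels = sort(words)))
print(expected)
```

Only japan and korea occur in both the document and the dictionary, so those are the only two terms that should be counted.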

Is this a bug, or am I using it the wrong way?

Source

2017-06-20 Izzur Zuhri

Corpus() seems to have a bug when indexing word frequencies.

If you use VCorpus() instead, you get the expected result.

Source
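
A minimal sketch of that workaround, reusing the data from the question. As far as I can tell, in tm 0.7 and later Corpus() on a VectorSource builds a SimpleCorpus rather than a VCorpus, which would explain why the two calls can behave differently; treat that as an assumption and check it against your installed version. The names `vcorp`, `test_v` and `counts` are introduced here for illustration.

```r
library(tm)

crude <- "japan korea usa uk albania azerbaijan"
words <- c("australia", "korea", "uganda", "japan", "argentina", "turkey")

# Build the corpus with VCorpus() rather than Corpus()
vcorp <- VCorpus(VectorSource(crude))
print(class(vcorp))  # shows which corpus class was actually built

# Restrict the matrix to the dictionary terms
test_v <- DocumentTermMatrix(vcorp, control = list(dictionary = words))

# Convert to a plain matrix so individual counts can be read off by name
counts <- as.matrix(test_v)
print(counts[1, c("japan", "korea")])
```

Comparing `class(Corpus(VectorSource(crude)))` with `class(vcorp)` is a quick way to see whether the two constructors differ in your installation.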

2017-09-27 18:11:00 AshOfFire
