R: bigram tokenizer not working for document-term matrix

I am trying to build two document-term matrices for a corpus, one with unigrams and one with bigrams. However, the bigram matrix currently comes out exactly the same as the unigram matrix, and I can't figure out why.

Code:

library(tm)
library(RWeka)

docs <- Corpus(DirSource("data", recursive = TRUE))

# Get the document term matrices
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
dtm_unigram <- DocumentTermMatrix(docs, control = list(tokenize = "words",
    removePunctuation = TRUE,
    stopwords = stopwords("english"),
    stemming = TRUE))
dtm_bigram <- DocumentTermMatrix(docs, control = list(tokenize = BigramTokenizer,
    removePunctuation = TRUE,
    stopwords = stopwords("english"),
    stemming = TRUE))

inspect(dtm_unigram)
inspect(dtm_bigram)

I also tried using ngram(x, n = 2) from the ngram package as the tokenizer, but that doesn't work either. How do I fix the bigram tokenization?

Source

2017-03-05 filaments

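I'm having this problem too. Please let me know if you find an answer. –

Sorry for the late reply. I got this working by using VCorpus instead of Corpus. – filaments

The tokenizer option does not seem to work with Corpus (SimpleCorpus). Using VCorpus instead solved the problem.

Source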
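As a minimal sketch of the fix described above, the snippet below builds the same toy corpus once with Corpus and once with VCorpus and passes a custom bigram tokenizer to both. Several things here are assumptions for illustration and not from the original post: the two in-memory example sentences stand in for the "data" directory, and the tokenizer is built on ngrams() from the NLP package (a pattern shown in the tm FAQ) so the example needs no Java; the question's RWeka-based BigramTokenizer should be usable in the same place. The likely reason for the difference is that, in recent tm versions, Corpus() returns a SimpleCorpus for simple sources, and the document-term matrix code for SimpleCorpus uses a fixed built-in tokenizer and does not accept custom functions.

```r
library(tm)  # tm depends on NLP, which provides ngrams() and words()

# Toy in-memory documents (assumption; the question reads files from "data")
texts <- c("the quick brown fox", "the lazy brown dog")

# Bigram tokenizer based on NLP::ngrams (swapped in for the RWeka tokenizer)
BigramTokenizer <- function(x)
  unlist(lapply(NLP::ngrams(NLP::words(x), 2), paste, collapse = " "),
         use.names = FALSE)

# Corpus() on a VectorSource: in tm >= 0.7 this is expected to be a
# SimpleCorpus, where the custom tokenizer is reportedly not applied
docs_simple <- Corpus(VectorSource(texts))
dtm_simple <- DocumentTermMatrix(docs_simple,
                                 control = list(tokenize = BigramTokenizer))

# VCorpus: the custom tokenize function is passed through to termFreq()
docs_v <- VCorpus(VectorSource(texts))
dtm_bigram <- DocumentTermMatrix(docs_v,
                                 control = list(tokenize = BigramTokenizer))

# Compare the term lists of the two matrices
Terms(dtm_simple)
Terms(dtm_bigram)
```

Checking class(docs_simple) and class(docs_v) is a quick way to confirm which corpus type a given tm version actually created.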

2017-03-28 18:30:48 filaments

Why 'VCorpus' over 'Corpus'? There is another related SO question [here](https://stackoverflow.com/questions/42757183/creating-n-grams-with-tm-rweka-works-with-vcorpus-but-not-corpus), but there doesn't seem to be a satisfactory explanation. – hongsy
