LDA model generates different topics every time I train on the same corpus

I am using python <code>gensim</code> to train a Latent Dirichlet Allocation (LDA) model from a small corpus of 231 sentences. However, each time I repeat the process, it generates different topics.

Why do the same LDA parameters and corpus generate different topics every time?

How do I stabilize the topic generation?

I'm using this corpus (http://pastebin.com/WptkKVF0) and this list of stopwords (http://pastebin.com/LL7dqLcj), and here's my code:

from gensim import corpora
from gensim.models import ldamodel
import codecs

# Load stopwords, skipping blank lines and '#' comment lines
stopwords = [line.strip() for line in codecs.open('stopmild', 'r', 'utf8')
             if line.strip() and not line.startswith("#")]

def generateTopics(corpus, dictionary): 
    # Build LDA model using the above corpus 
    lda = ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50) 
    corpus_lda = lda[corpus] 

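    # De-duplicate topics and split each into (weight, word) pairs.
    tops = set(lda.show_topics(50))
    top_clusters = []
    for l in tops:
        # Some gensim versions return (topic_id, topic_string) tuples, others plain strings.
        topic_string = l[1] if isinstance(l, tuple) else l
        top = []
        for t in topic_string.split(" + "):
            weight, word = t.split("*", 1)
            top.append((weight, word))
        top_clusters.append(top)

    # Generate word-only topics
    top_wordonly = []
    for i in top_clusters:
        top_wordonly.append(":".join([j[1] for j in i]))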

    return lda, corpus_lda, top_clusters, top_wordonly 

####################################################################### 

# Read textfile, build dictionary and bag-of-words corpus 
documents = [] 
for line in codecs.open("./europarl-mini2/map/coach.en-es.all","r","utf8"): 
    lemma = line.split("\t")[3] 
    documents.append(lemma) 
texts = [[word for word in document.lower().split() if word not in stopwords] 
      for document in documents] 
dictionary = corpora.Dictionary(texts) 
corpus = [dictionary.doc2bow(text) for text in texts] 

lda, corpus_lda, topic_clusters, topic_wordonly = generateTopics(corpus, dictionary) 

for i in topic_wordonly: 
    print(i)

Source: 2013-02-25, alvas

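Why do the same LDA parameters and corpus generate different topics every time?

Because LDA uses randomness in both the training and inference steps.

And how do I stabilize the topic generation?

By resetting the numpy.random seed to the same value, using numpy.random.seed, every time a model is trained or inference is performed: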

import numpy as np

SOME_FIXED_SEED = 42

# before training/inference:
np.random.seed(SOME_FIXED_SEED)

(This is ugly, and it makes Gensim results hard to reproduce; consider submitting a patch. I've already opened an issue.)
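To illustrate why the seed has to be reset before every run rather than once at the top of the script, here is a minimal sketch. It uses Python's standard `random` module as a stand-in for `numpy.random` (an assumption, so that it runs without NumPy or gensim), and `toy_random_init` is a hypothetical placeholder for a randomly initialised model, not anything gensim provides.

```python
import random

SOME_FIXED_SEED = 42


def toy_random_init(num_topics=3):
    """Hypothetical stand-in for a randomly initialised model: one weight per topic."""
    return [random.random() for _ in range(num_topics)]


# Seeding once is not enough: the second run continues the same random
# stream, so it starts from different values than the first run.
random.seed(SOME_FIXED_SEED)
first = toy_random_init()
second = toy_random_init()
seeded_once_matches = (first == second)

# Resetting the seed before every run restarts the stream, so the runs match.
random.seed(SOME_FIXED_SEED)
run_a = toy_random_init()
random.seed(SOME_FIXED_SEED)
run_b = toy_random_init()
reseeded_matches = (run_a == run_b)

print("seeded once, runs match:", seeded_once_matches)
print("reseeded each run, runs match:", reseeded_matches)
```

The global generator behind `np.random.seed` behaves the same way, which is why the seed is reset before each training or inference call.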

Source: 2013-02-25 14:44:31

If the training data is sufficient, the result should converge within a limited number of iterations, shouldn't it?

May I know how to set 'numpy.random' to 'numpy.random.seed'? Could you show me an example of how to call 'ldamodel' with 'numpy.random.seed'? – alvas

@2er0 You don't set 'np.random' *to* 'np.random.seed'; you set the seed *with* 'np.random.seed'.
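For the question in the comments about calling `ldamodel` together with `numpy.random.seed`, a minimal sketch could look like the following. The wrapper name `train_lda_reproducibly` and its defaults are hypothetical. It assumes that the installed gensim version draws its randomness from NumPy's global generator when no other random source is supplied, which is worth checking against your version.

```python
SOME_FIXED_SEED = 42


def train_lda_reproducibly(corpus, dictionary, num_topics=50, seed=SOME_FIXED_SEED):
    """Reset NumPy's global seed, then train, so repeated calls start alike."""
    # Imported inside the function only so this sketch loads where gensim is absent.
    import numpy as np
    from gensim.models import ldamodel

    np.random.seed(seed)  # reset before *every* training run
    return ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
```

With the question's variables this would be called as `lda = train_lda_reproducibly(corpus, dictionary)`. Recent gensim releases also accept a `random_state` argument on `LdaModel`, which may be a cleaner alternative if your version has it.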

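I had the same problem, even with about 50,000 comments. But you can get much more consistent topics by increasing the number of iterations the LDA runs for. It is initially set to 50, and when I raise it to 300 it usually gives me the same results, probably because it is much closer to convergence.

Specifically, you just add the following option: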

ldamodel.LdaModel(corpus, ..., iterations=<your desired iterations>)
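The intuition that more iterations make the result depend less on the random starting point can be sketched with a toy example. This is not LDA: it uses power iteration on a small matrix as a stand-in, and `power_iteration`, `spread` and `MATRIX` are illustrative names. Power iteration has a single solution to converge to, whereas LDA can settle into different local optima, so for LDA more iterations may reduce the variation without removing it.

```python
import math
import random


def power_iteration(matrix, iterations, seed):
    """Estimate the dominant eigenvector of a 2x2 matrix from a random start."""
    rng = random.Random(seed)
    vec = [rng.random() + 0.1, rng.random() + 0.1]  # random positive start
    for _ in range(iterations):
        vec = [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
               matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]
        norm = math.hypot(vec[0], vec[1])
        vec = [vec[0] / norm, vec[1] / norm]
    return vec


def spread(matrix, iterations, seeds=range(5)):
    """Largest gap in the first component across differently seeded runs."""
    firsts = [power_iteration(matrix, iterations, s)[0] for s in seeds]
    return max(firsts) - min(firsts)


MATRIX = [[2.0, 1.0], [1.0, 3.0]]
few = spread(MATRIX, iterations=1)
many = spread(MATRIX, iterations=50)
print("spread after 1 iteration:", few)
print("spread after 50 iterations:", many)
```

After a single iteration the differently seeded runs still disagree noticeably. After 50 they agree to within floating-point noise, which mirrors the effect described above of raising `iterations`.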

Source: 2017-01-21 03:56:12, Richard
