TensorFlow loss is diverging in my RNN

I'm trying to get my hands wet with TensorFlow by solving this challenge: https://www.kaggle.com/c/integer-sequence-learning.

My work is based on these blog posts.

A complete working example, including my data, can be found here: https://github.com/bottiger/Integer-Sequence-Learning. Running the example prints a lot of debug information. Run rnn-lstm-my.py to execute it. (Requires tensorflow and pandas.)

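The approach is pretty straightforward. I load all of my training sequences, store their lengths in a vector, and store the length of the longest one in a variable I call "max_length". In my training data I strip out the last element of every sequence and store it in a vector called "train_solutions".

I store all the sequences, padded with zeros, in a matrix with the shape [n_seq, max_length].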
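The preparation step described above could look roughly like the sketch below. It is plain Python rather than the pandas code in the repo, and the function name `prepare_sequences`, the choice to right-pad, and the choice to measure lengths after removing the target are all assumptions made for illustration.

```python
def prepare_sequences(sequences):
    """Split each sequence into (input, target) and zero-pad the inputs."""
    # The last element of each sequence is the value to predict.
    solutions = [seq[-1] for seq in sequences]
    inputs = [seq[:-1] for seq in sequences]

    # Lengths are measured after removing the target (an assumption).
    lengths = [len(seq) for seq in inputs]
    max_length = max(lengths)

    # Right-pad with zeros so the result has shape [n_seq, max_length].
    padded = [seq + [0] * (max_length - len(seq)) for seq in inputs]
    return padded, solutions, lengths, max_length


# Toy data, not taken from the Kaggle set.
padded, train_solutions, lengths, max_length = prepare_sequences(
    [[1, 2, 3, 4], [1, 1, 2, 3, 5, 8], [2, 4]]
)
```

With this toy input, `padded` has three rows of length 5 and `train_solutions` holds the three removed last elements.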

Since I want to predict the next number in a sequence, my output should be a single number and my input should be a sequence.

I use an RNN (tf.nn.rnn) with a BasicLSTMCell as the cell, with 24 hidden units. The output is fed into a basic linear model (xW + B), which should produce my prediction.
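To make the shapes of the linear model concrete, here is a minimal plain-Python sketch of xW + B applied to one RNN output per sequence. It assumes a single weight matrix shared across the batch, with x of shape [batch_size, num_hidden], W of shape [num_hidden, 1], and B of shape [1]. That is one common layout, not necessarily the one used in the code further down, and the helper name `linear_readout` is hypothetical.

```python
def linear_readout(x, W, b):
    """Compute x @ W + b for x: [batch, hidden], W: [hidden, out], b: [out]."""
    return [
        [sum(xi * W[k][j] for k, xi in enumerate(row)) + b[j]
         for j in range(len(b))]
        for row in x
    ]


# Toy values: batch_size = 2, num_hidden = 3, one output per sequence.
last_output = [[0.5, -0.5, 1.0], [0.0, 1.0, 0.0]]
W = [[1.0], [2.0], [3.0]]
b = [0.1]

prediction = linear_readout(last_output, W, b)  # shape [2, 1]
```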

My cost function simply compares my model's predicted number with the expected one. I calculate the cost like this:

cost = tf.nn.l2_loss(tf_result - prediction)
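For reference, tf.nn.l2_loss(t) is documented as sum(t ** 2) / 2, with no square root and no averaging over the batch. The plain-Python sketch below reproduces that definition on made-up numbers; the helper name `l2_loss` is only a stand-in for the TensorFlow op.

```python
def l2_loss(values):
    """Half the sum of squares, following the documented tf.nn.l2_loss formula."""
    return sum(v * v for v in values) / 2.0


# Made-up targets and predictions for illustration.
targets = [3.0, 5.0]
predictions = [2.0, 7.0]

cost = l2_loss([t - p for t, p in zip(targets, predictions)])
```

Because the squared errors are summed rather than averaged, the cost grows with both the batch size and the magnitude of the targets.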

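The basic dimensions seem to be correct, because the code actually runs. However, after only one or two iterations some NaNs start to occur; they spread quickly, and everything becomes NaN.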

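Here is the important part of the code, where I define and run the graph. I have omitted the loading/preparation of the data from this post. Please have a look at the git repo for details about that, but I am pretty sure that part is correct.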

cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True) 

num_inputs = tf.placeholder(tf.int32, name='NumInputs') 
seq_length = tf.placeholder(tf.int32, shape=[batch_size], name='NumInputs') 

# Define the input as a list (num elements = batch_size) of sequences 
inputs = [tf.placeholder(tf.float32,shape=[1, max_length], name='InputData') for _ in range(batch_size)] 

# Result should be 1xbatch_size vector 
result = tf.placeholder(tf.float32, shape=[batch_size, 1], name='OutputData') 

tf_seq_length = tf.Print(seq_length, [seq_length, seq_length.get_shape()], 'SequenceLength: ') 

outputs, states = tf.nn.rnn(cell, inputs, dtype=tf.float32) 

# Print the output. The NaN first shows up here 
outputs2 = tf.Print(outputs, [outputs], 'Last: ', name="Last", summarize=800) 

# Define the model 
tf_weight = tf.Variable(tf.truncated_normal([batch_size, num_hidden, frame_size]), name='Weight') 
tf_bias = tf.Variable(tf.constant(0.1, shape=[batch_size]), name='Bias') 

# Debug the model parameters 
weight = tf.Print(tf_weight, [tf_weight, tf_weight.get_shape()], "Weight: ") 
bias = tf.Print(tf_bias, [tf_bias, tf_bias.get_shape()], "bias: ") 

# More debug info 
print('bias: ', bias.get_shape()) 
print('weight: ', weight.get_shape()) 
print('targets ', result.get_shape()) 
print('RNN input ', type(inputs)) 
print('RNN input len()', len(inputs)) 
print('RNN input[0] ', inputs[0].get_shape()) 

# Calculate the prediction 
tf_prediction = tf.batch_matmul(outputs2, weight) + bias 
prediction = tf.Print(tf_prediction, [tf_prediction, tf_prediction.get_shape()], 'prediction: ') 

tf_result = result 

# Calculate the cost 
cost = tf.nn.l2_loss(tf_result - prediction) 

#optimizer = tf.train.AdamOptimizer() 
learning_rate = 0.05 
optimizer = tf.train.GradientDescentOptimizer(learning_rate) 


minimize = optimizer.minimize(cost) 

mistakes = tf.not_equal(tf.argmax(result, 1), tf.argmax(prediction, 1)) 
error = tf.reduce_mean(tf.cast(mistakes, tf.float32)) 

init_op = tf.initialize_all_variables() 
sess = tf.Session() 
sess.run(init_op) 

no_of_batches = int(len(train_input))/batch_size 
epoch = 1 

val_dict = get_input_dict(val_input, val_output, train_length, inputs, batch_size) 

for i in range(epoch): 
    ptr = 0 
    for j in range(no_of_batches): 

        print('eval w: ', weight.eval(session=sess)) 

        # inputs batch 
        t_i = train_input[ptr:ptr+batch_size] 

        # output batch 
        t_o = train_output[ptr:ptr+batch_size] 

        # sequence lengths 
        t_l = train_length[ptr:ptr+batch_size] 

        sess.run(minimize,feed_dict=get_input_dict(t_i, t_o, t_l, inputs, batch_size)) 

        ptr += batch_size 

        print("result: ", tf_result) 
        print("result len: ", tf_result.get_shape()) 
        print("prediction: ", prediction) 
        print("prediction len: ", prediction.get_shape()) 


    c_val = sess.run(error, feed_dict = val_dict) 
    print "Validation cost: {}, on Epoch {}".format(c_val,i) 


    print "Epoch ",str(i) 

print('test input: ', type(test_input)) 
print('test output: ', type(test_output)) 

incorrect = sess.run(error,get_input_dict(test_input, test_output, test_length, inputs, batch_size)) 

sess.close()

And here is the output it produces (the first lines). You can see that everything becomes NaN: http://pastebin.com/TnFFNFrr (I was not able to post it here due to the body limit).

The first time I see the NaN is here:

I tensorflow/core/kernels/logging_ops.cc:79] Last: [0 0.76159418 0 0 0 0 0 -0.76159418 0 -0.76159418 0 0 0.76159418 0.76159418 0 -0.76159418 0.76159418 0 0 0 0.76159418 0 0 0 nan nan ...]
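When scanning a dumped tensor like the one above, it can help to find the first position at which a NaN appears instead of reading the values by eye. The sketch below works on a flat list of floats. The helper name `first_nan_index` and the sample values are made up for illustration and are not taken from the real log.

```python
import math


def first_nan_index(values):
    """Return the index of the first NaN in values, or None if there is none."""
    for i, v in enumerate(values):
        if math.isnan(v):
            return i
    return None


# Made-up sample in the same spirit as the printed RNN output.
sample = [0.0, 0.76159418, 0.0, float("nan"), 0.0, float("nan")]
position = first_nan_index(sample)
```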

I hope I have made my problem clear. Thanks in advance.

Source

2016-08-04 bottiger