2016-10-13

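Understanding why tensorflow RNN is not learning toy data

I am trying to train a recurrent neural network with Tensorflow (r0.10, python 3.5) on a toy classification problem, but I am getting confusing results. The target class for each element of the sequence is the number represented by the previous and current values, treated as binary digits: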


input sequence: [0,  0,  1,  0,  1,  1] 
binary digits : [-, [0,0], [0,1], [1,0], [0,1], [1,1]] 
target class : [-,  0,  1,  2,  1,  3] 
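
The mapping above can be reproduced with a few lines of NumPy. This is only a minimal sketch: the names `seq`, `prev`, and `target` are illustrative, and the first position, which has no predecessor, is assumed to have a previous value of 0 (the same convention the training code in this post uses).

```python
import numpy as np

# Input sequence from the example above
seq = np.array([0, 0, 1, 0, 1, 1])

# Previous value at each position; position 0 has no predecessor, so assume 0
prev = np.concatenate(([0], seq[:-1]))

# Treat (previous, current) as a two-digit binary number
target = 2 * prev + seq

# Skip the first position, which is shown as "-" in the table
print(target[1:])  # -> [0 1 2 1 3]
```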



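I feed tf.nn.rnn() a list of length T whose elements are [batch_size x input_size] tensors. Since my sequence is one-dimensional, input_size is equal to 1, so I believe I am essentially inputting a list of sequences of length batch_size (the documentation is unclear to me about which dimension is treated as the time dimension). Is that understanding correct? If so, I don't understand why the RNN model isn't learning correctly.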

It is hard to reduce my full RNN to a small set of runnable code, but this is the best I can do (it is mostly adapted from the PTB model and the char-rnn model):

import tensorflow as tf 
import numpy as np 

input_size = 1 
batch_size = 50 
T = 2 
lstm_size = 5 
lstm_layers = 2 
num_classes = 4 
learning_rate = 0.1 

lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True) 
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True) 

x = tf.placeholder(tf.float32, [T, batch_size, input_size]) 
y = tf.placeholder(tf.int32, [T * batch_size * input_size]) 

init_state = lstm.zero_state(batch_size, tf.float32) 

inputs = [tf.squeeze(input_, [0]) for input_ in tf.split(0,T,x)] 
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state) 

w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w') 
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b') 

output = tf.concat(0, outputs) 

logits = tf.matmul(output, w) + b 

probs = tf.nn.softmax(logits) 

cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
    [logits], [y], [tf.ones_like(y, dtype=tf.float32)]))

optimizer = tf.train.GradientDescentOptimizer(learning_rate) 
tvars = tf.trainable_variables() 
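# NOTE: the clipping threshold is missing from this copy of the post; 5.0 is an assumed value.
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), 5.0)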
train_op = optimizer.apply_gradients(zip(grads, tvars)) 

init = tf.initialize_all_variables() 

with tf.Session() as sess:
    sess.run(init)  # initialize the variables before training
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the previous and current values treated as a binary number,
        # e.g. previous=1, current=0 -> class 2.
        train_x = np.random.randint(0, 2, (T * batch_size * input_size))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))

        # Reshape into T x batch_size x input_size
        train_x = np.reshape(train_x, (T, batch_size, input_size))

        feed_dict = {
            x: train_x, y: train_y
        }
        # Feed the final LSTM state of the previous step as the initial state
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h

        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }

        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)

        curr_state = fetches['final_state']

        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))

    # Test
    test_x = np.array([[0], [0], [1], [0], [1], [1]] * (T * batch_size * input_size))
    test_x = test_x[:(T * batch_size * input_size), :]
    probs_out = sess.run(probs, feed_dict={
        x: np.reshape(test_x, [T, batch_size, input_size]),
        init_state: curr_state
    })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
            [1, 2, 3, 5].index(i), *list(probs_out[i, :])))

The test loop prints:
0: [0.4899 0.0007 0.5080 0.0014] 
1: [0.0003 0.5155 0.0009 0.4833] 
2: [0.5078 0.0011 0.4889 0.0021] 
3: [0.0003 0.5052 0.0009 0.4936] 

This shows that the model is only learning to distinguish between [0, 2] and [1, 3]. Why isn't the model learning to use the previous value in the sequence?
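
One way to read these numbers is to collapse each row into two marginal probabilities: one for the current bit (classes 1 and 3) and one for the previous bit (classes 2 and 3). The sketch below does this for the four rows printed above; the variable names are illustrative only.

```python
import numpy as np

# Softmax outputs copied from the test run above (rows 0..3)
probs = np.array([
    [0.4899, 0.0007, 0.5080, 0.0014],
    [0.0003, 0.5155, 0.0009, 0.4833],
    [0.5078, 0.0011, 0.4889, 0.0021],
    [0.0003, 0.5052, 0.0009, 0.4936],
])

# class = 2 * previous + current, so:
#   current bit is 1 for classes 1 and 3
#   previous bit is 1 for classes 2 and 3
p_current = probs[:, 1] + probs[:, 3]
p_previous = probs[:, 2] + probs[:, 3]

for row, (pc, pp) in enumerate(zip(p_current, p_previous)):
    print('{}: P(current=1)={:.4f}  P(previous=1)={:.4f}'.format(row, pc, pp))
```

The current-bit probability is close to 0 or 1 in every row, while the previous-bit probability stays near 0.5, which is consistent with the model having no access to the previous value.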



I figured it out with the help of a blog post (which has great diagrams of the input tensors). It turns out that I did not correctly understand the shape of the inputs to tf.nn.rnn().

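Let's say you have batch_size sequences. Each sequence has input_size dimensions and has length T (these names were chosen to match the documentation of tf.nn.rnn()). Then you need to split your input into a list of length T in which each element has shape batch_size x input_size. This means that a contiguous sequence is spread out across the elements of the list. I had thought that contiguous sequences would be kept together, so that each element of the list inputs would be an example of one sequence.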
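
The difference between the two layouts can be checked with plain NumPy, without running TensorFlow. The sketch below is an illustration under assumptions: it uses small made-up sizes and hypothetical names (`flat`, `prev_bit`, `time_major`, `batch_major`), builds the "previous bit" the same way the training code does, and checks where that bit actually lives in each layout.

```python
import numpy as np

T, batch_size, input_size = 4, 3, 1  # small illustrative sizes

rng = np.random.RandomState(0)
flat = rng.randint(0, 2, T * batch_size * input_size)

# prev_bit[k] is the bit that the label at flat position k combines with flat[k]
prev_bit = np.concatenate(([0], flat[:-1]))

# Layout used in the question: [T, batch_size, input_size]
time_major = flat.reshape(T, batch_size, input_size)
prev_tm = prev_bit.reshape(T, batch_size)
# The bit used by the label at (t, b) is the input at (t, b - 1):
# same time step, neighbouring sequence, so the RNN state never sees it.
same_step_neighbour = np.array_equal(prev_tm[:, 1:], time_major[:, :-1, 0])

# Layout used in the answer: [batch_size, T, input_size]
batch_major = flat.reshape(batch_size, T, input_size)
prev_bm = prev_bit.reshape(batch_size, T)
# The bit used by the label at (b, t) is the input at (b, t - 1):
# same sequence, previous time step, which the RNN state can carry.
previous_step_same_seq = np.array_equal(prev_bm[:, 1:], batch_major[:, :-1, 0])

# The list that tf.nn.rnn() expects: T elements of shape [batch_size, input_size]
inputs = [batch_major[:, t, :] for t in range(T)]

print(same_step_neighbour, previous_step_same_seq, len(inputs), inputs[0].shape)
```

Both checks come out true: with the first layout the label depends on a value from a different sequence at the same time step, and with the second it depends on the preceding time step of the same sequence.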

The code with the corrected input shape:

import tensorflow as tf 
import numpy as np 

sequence_size = 50 
batch_size = 7 
num_features = 1 
lstm_size = 5 
lstm_layers = 2 
num_classes = 4 
learning_rate = 0.1 

lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True) 
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True) 

x = tf.placeholder(tf.float32, [batch_size, sequence_size, num_features]) 
y = tf.placeholder(tf.int32, [batch_size * sequence_size * num_features]) 

init_state = lstm.zero_state(batch_size, tf.float32) 

inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(1,sequence_size,x)] 
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state) 

w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w') 
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b') 

output = tf.reshape(tf.concat(1, outputs), [-1, lstm_size]) 

logits = tf.matmul(output, w) + b 

probs = tf.nn.softmax(logits) 

cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
    [logits], [y], [tf.ones_like(y, dtype=tf.float32)]))

# Now optimize on that cost 
optimizer = tf.train.GradientDescentOptimizer(learning_rate) 
tvars = tf.trainable_variables() 
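# NOTE: the clipping threshold is missing from this copy of the post; 5.0 is an assumed value.
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), 5.0)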
train_op = optimizer.apply_gradients(zip(grads, tvars)) 

init = tf.initialize_all_variables() 

with tf.Session() as sess:
    sess.run(init)  # initialize the variables before training
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the previous and current values treated as a binary number,
        # e.g. previous=1, current=0 -> class 2.
        train_x = np.random.randint(0, 2, (batch_size * sequence_size * num_features))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))

        # Reshape into batch_size x sequence_size x num_features
        train_x = np.reshape(train_x, [batch_size, sequence_size, num_features])

        feed_dict = {
            x: train_x, y: train_y
        }
        # Feed the final LSTM state of the previous step as the initial state
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h

        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }

        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)

        curr_state = fetches['final_state']

        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))

    # Test
    test_x = np.array([[0], [0], [1], [0], [1], [1]] * (batch_size * sequence_size * num_features))
    test_x = test_x[:(batch_size * sequence_size * num_features), :]
    probs_out = sess.run(probs, feed_dict={
        x: np.reshape(test_x, [batch_size, sequence_size, num_features]),
        init_state: curr_state
    })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
            [1, 2, 3, 5].index(i), *list(probs_out[i, :])))
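
The change from tf.concat(0, outputs) to tf.reshape(tf.concat(1, outputs), [-1, lstm_size]) matters because the flat label vector y is ordered sequence by sequence. The NumPy sketch below mimics the two operations on a fake list of outputs (the sizes and the name `fake_outputs` are made up for illustration) and shows the row order each one produces.

```python
import numpy as np

batch_size, sequence_size, lstm_size = 2, 3, 4  # small illustrative sizes

# fake_outputs[t] has shape [batch_size, lstm_size]; the row for sequence b
# at time step t is filled with the value 10 * b + t so it can be identified.
fake_outputs = [
    np.array([[10 * b + t] * lstm_size for b in range(batch_size)])
    for t in range(sequence_size)
]

# NumPy equivalent of tf.reshape(tf.concat(1, outputs), [-1, lstm_size])
batch_major_rows = np.concatenate(fake_outputs, axis=1).reshape(-1, lstm_size)

# NumPy equivalent of tf.concat(0, outputs)
time_major_rows = np.concatenate(fake_outputs, axis=0)

print(batch_major_rows[:, 0])  # -> [ 0  1  2 10 11 12]
print(time_major_rows[:, 0])   # -> [ 0 10  1 11  2 12]
```

With the reshape, all time steps of sequence 0 come first, followed by all time steps of sequence 1, which matches the order of y after the inputs are reshaped to [batch_size, sequence_size, num_features].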