How do I correctly implement backpropagation for a network learning the MNIST dataset?

So, I'm using Michael Nielsen's machine learning book as a reference for my code (it is basically identical): http://neuralnetworksanddeeplearning.com/chap1.html

The code in question:

def backpropagate(self, image, image_value):
    # declare two new numpy arrays for the updated weights & biases
    new_biases = [np.zeros(bias.shape) for bias in self.biases]
    new_weights = [np.zeros(weight_matrix.shape) for weight_matrix in self.weights]

    # -------- feed forward --------
    # store all the activations in a list
    activations = [image]

    # declare empty list that will contain all the z vectors
    zs = []
    for bias, weight in zip(self.biases, self.weights):
        print(bias.shape)
        print(weight.shape)
        print(image.shape)
        z = np.dot(weight, image) + bias
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)

    # -------- backward pass --------
    # transpose() returns the numpy array with the rows as columns and columns as rows
    delta = self.cost_derivative(activations[-1], image_value) * sigmoid_prime(zs[-1])
    new_biases[-1] = delta
    new_weights[-1] = np.dot(delta, activations[-2].transpose())

    # l = 1 means the last layer of neurons, l = 2 is the second-last, etc.
    # this takes advantage of Python's ability to use negative indices in lists
    for l in range(2, self.num_layers):
        z = zs[-1]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        new_biases[-l] = delta
        new_weights[-l] = np.dot(delta, activations[-l-1].transpose())
    return (new_biases, new_weights)

My algorithm only gets through the first round of backpropagation before this error occurs:

  File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 97, in stochastic_gradient_descent
    self.update_mini_batch(mini_batch, learning_rate)
  File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 117, in update_mini_batch
    delta_biases, delta_weights = self.backpropagate(image, image_value)
  File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 160, in backpropagate
    z = np.dot(weight, activation) + bias
ValueError: shapes (30,50000) and (784,1) not aligned: 50000 (dim 1) != 784 (dim 0)

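I get why it's an error. The number of columns in the weight matrix doesn't match the number of rows in the pixel image, so the matrix multiplication can't be done. Here's where I'm confused: there are 30 neurons used in the backpropagation, each with 50,000 images being evaluated. My understanding is that each of the 50,000 images should have 784 weights attached, one for each pixel. But when I modify the code accordingly: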

count = 0
for bias, weight in zip(self.biases, self.weights):
    print(bias.shape)
    print(weight[count].shape)
    print(image.shape)
    z = np.dot(weight[count], image) + bias
    zs.append(z)
    activation = sigmoid(z)
    activations.append(activation)
    count += 1

I still get a similar error:

ValueError: shapes (50000,) and (784,1) not aligned: 50000 (dim 0) != 784 (dim 0)

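I'm really confuzzled by all the linear algebra involved, and I think I'm just missing something about the structure of the weight matrix. Any help at all would be greatly appreciated.

Source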

2017-10-20 Eli

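The problem appears to come from changes made to the original code.

I downloaded the example from the link you provided, and it runs without errors.

Here is the complete source code I used: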

import cPickle 
import gzip 
import numpy as np 
import random 

def load_data(): 
    """Return the MNIST data as a tuple containing the training data, 
    the validation data, and the test data. 
    The ``training_data`` is returned as a tuple with two entries. 
    The first entry contains the actual training images. This is a 
    numpy ndarray with 50,000 entries. Each entry is, in turn, a 
    numpy ndarray with 784 values, representing the 28 * 28 = 784 
    pixels in a single MNIST image. 
    The second entry in the ``training_data`` tuple is a numpy ndarray 
    containing 50,000 entries. Those entries are just the digit 
    values (0...9) for the corresponding images contained in the first 
    entry of the tuple. 
    The ``validation_data`` and ``test_data`` are similar, except 
    each contains only 10,000 images. 
    This is a nice data format, but for use in neural networks it's 
    helpful to modify the format of the ``training_data`` a little. 
    That's done in the wrapper function ``load_data_wrapper()``, see 
    below. 
    """ 
    f = gzip.open('../data/mnist.pkl.gz', 'rb') 
    training_data, validation_data, test_data = cPickle.load(f) 
    f.close() 
    return (training_data, validation_data, test_data) 

def load_data_wrapper(): 
    """Return a tuple containing ``(training_data, validation_data, 
    test_data)``. Based on ``load_data``, but the format is more 
    convenient for use in our implementation of neural networks. 
    In particular, ``training_data`` is a list containing 50,000 
    2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray 
    containing the input image. ``y`` is a 10-dimensional 
    numpy.ndarray representing the unit vector corresponding to the 
    correct digit for ``x``. 
    ``validation_data`` and ``test_data`` are lists containing 10,000 
    2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional 
    numpy.ndarray containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers) 
    corresponding to ``x``. 
    Obviously, this means we're using slightly different formats for 
    the training data and the validation/test data. These formats 
    turn out to be the most convenient for use in our neural network 
    code.""" 
    tr_d, va_d, te_d = load_data() 
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]] 
    training_results = [vectorized_result(y) for y in tr_d[1]] 
    training_data = zip(training_inputs, training_results) 
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]] 
    validation_data = zip(validation_inputs, va_d[1]) 
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]] 
    test_data = zip(test_inputs, te_d[1]) 
    return (training_data, validation_data, test_data) 

def vectorized_result(j): 
    """Return a 10-dimensional unit vector with a 1.0 in the jth 
    position and zeroes elsewhere. This is used to convert a digit 
    (0...9) into a corresponding desired output from the neural 
    network.""" 
    e = np.zeros((10, 1)) 
    e[j] = 1.0 
    return e 

class Network(object): 

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network. For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron. The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1. Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent. The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs. The other non-optional parameters are
        self-explanatory. If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out. This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1}/{2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x. ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book. Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on. It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x/
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions 
def sigmoid(z): 
    """The sigmoid function.""" 
    return 1.0/(1.0+np.exp(-z)) 

def sigmoid_prime(z): 
    """Derivative of the sigmoid function.""" 
    return sigmoid(z)*(1-sigmoid(z)) 

training_data, validation_data, test_data = load_data_wrapper() 
net = Network([784, 30, 10]) 
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

Additional information:

I would, however, recommend using an existing framework such as Keras rather than reinventing the wheel.

I also verified it with Python 3.6.
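The listing above is Python 2 (cPickle, xrange, print statements). As a rough sketch of what a Python 3 run typically needs, assuming the same data file: the pickle has to be read with an explicit encoding, and the results of zip() have to be turned into lists because SGD calls len() and random.shuffle() on them. The helper name pair_up is hypothetical and only there for illustration; xrange would likewise become range and the print statements would become print() calls.

```python
import gzip
import pickle
import random

def load_data(path='../data/mnist.pkl.gz'):
    """Python 3 variant of load_data; the path is the one used in the listing."""
    with gzip.open(path, 'rb') as f:
        # encoding='latin1' lets Python 3 read NumPy arrays pickled by Python 2
        return pickle.load(f, encoding='latin1')

def pair_up(inputs, results):
    """zip() is lazy in Python 3; SGD needs len() and shuffle(), so build a list.

    Hypothetical helper, not part of the original code.
    """
    return list(zip(inputs, results))

# Tiny stand-in data, just to show that the list supports len() and shuffle()
training_data = pair_up([[0.0], [0.5], [1.0]], [0, 1, 2])
random.shuffle(training_data)
print(len(training_data))  # prints 3
```

load_data is not called here because it depends on the data file being present at that path.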

Source

2017-10-21 13:11:19

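Thank you. Also, the only reason I'm trying to get this code working is that I want to understand the actual mechanics behind neural networks and why they work. I also have to be able to explain it to a data science club I'm part of. Once this is working, I'll probably move on to an existing framework such as TensorFlow. – Eli

Okay, I looked at the code again, and it's literally all the same apart from variable name changes and updates for Python 3 compatibility (the code is Python 2). This line now fails: delta = np.dot(self.weights[-l+1].transpose(), delta) * sp with ValueError: operands could not be broadcast together with shapes (30,1) (10,1). Would you mind looking at my code to see whether I've just missed something not so obvious? Repo: https://github.com/elijahanderson/DPUDS_Projects/tree/master/Fall_2017/MNIST – Eli

@Eli: I checked the code from your link, and it works correctly, at least in my Python 2.7 environment. I then checked the code with Python 3.6 (I've added a screenshot to my answer, please take a look). I'm not sure what exactly caused the error you mentioned in your environment. It could be a wrong version of one of the packages or a misconfiguration. Could you try upgrading or reinstalling the packages that seem unstable? If that doesn't help, I'd suggest reinstalling your Python environment with all packages. –

Kudos for digging into Nielsen's code. It's a great resource for developing a thorough understanding of NN principles. Too many people leap ahead to Keras without knowing what is going on under the hood.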
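One difference between the backpropagate method posted in the question and the reference backprop is the line z = zs[-1] inside the backward loop, where the reference uses z = zs[-l]. The sketch below suggests this could produce exactly the (30,1) (10,1) broadcast error for a [784, 30, 10] network. It is only an illustration: backward_deltas and the use_last_z flag are hypothetical names, the values are random, and the code in the linked repo was not checked.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def backward_deltas(weights, zs, output_delta, use_last_z=False):
    """Propagate delta backwards; use_last_z=True mimics `z = zs[-1]`."""
    deltas = [output_delta]
    delta = output_delta
    num_layers = len(weights) + 1
    for l in range(2, num_layers):
        # The reference code indexes zs with -l, i.e. the layer being processed
        z = zs[-1] if use_last_z else zs[-l]
        delta = np.dot(weights[-l + 1].transpose(), delta) * sigmoid_prime(z)
        deltas.insert(0, delta)
    return deltas

rng = np.random.default_rng(0)
sizes = [784, 30, 10]
weights = [rng.standard_normal((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]
zs = [rng.standard_normal((30, 1)), rng.standard_normal((10, 1))]
output_delta = rng.standard_normal((10, 1))

# With zs[-l], the hidden-layer delta has the hidden layer's shape
print([d.shape for d in backward_deltas(weights, zs, output_delta)])

# With zs[-1], a (30, 1) product meets a (10, 1) sigmoid_prime and NumPy refuses
try:
    backward_deltas(weights, zs, output_delta, use_last_z=True)
except ValueError as err:
    print(err)
```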

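Each training example does not get its own weights. Each of the 784 features does. If each example had its own weights, each set of weights would overfit to its corresponding training example. Also, if you later used the trained network to run inference on a single test example, what would it do with 50,000 sets of weights when presented with just one handwritten digit? Instead, each of the 30 neurons in your hidden layer learns a set of 784 weights, one per pixel, that offers high predictive accuracy when generalized to any handwritten digit.

Import network.py and instantiate a Network class like this, without modifying any code: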
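A small sketch of that point, using random numbers in place of trained weights and real images: the hidden layer's weight matrix has one row of 784 weights per neuron, and the same matrix is applied to however many images are presented. Stacking several images as columns is only for illustration here; the reference code processes one (784, 1) image at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# One row of 784 weights per hidden neuron, plus one bias per neuron
hidden_weights = rng.standard_normal((30, 784))
hidden_biases = rng.standard_normal((30, 1))

one_image = rng.random((784, 1))     # a single flattened 28x28 image
many_images = rng.random((784, 5))   # five images stacked as columns

# The same weights serve one image or many; only the output width changes
z_single = np.dot(hidden_weights, one_image) + hidden_biases
z_batch = np.dot(hidden_weights, many_images) + hidden_biases

print(hidden_weights.shape)  # (30, 784)
print(z_single.shape)        # (30, 1)
print(z_batch.shape)         # (30, 5)
```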

net = network.Network([784, 30, 10])

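...to yield a network with 784 input neurons, 30 hidden neurons, and 10 output neurons. The weight matrices will have dimensions [30, 784] and [10, 30], respectively. When you feed the network an input array of dimensions [784, 1], the matrix multiplication that gave you an error becomes valid, because dim 1 of the weight matrix equals dim 0 of the input array (both 784).

Your problem is not the backprop implementation but rather setting up a network architecture appropriate for the shape of your input data. If memory serves, Nielsen leaves backprop as a black box in chapter 1 and doesn't dive into it until chapter 2.

Source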
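The shapes can be checked without training anything. The sketch below derives the weight shapes from a sizes list the same way Network.__init__ does, then compares [784, 30, 10] against a first layer of 50,000. That second configuration is an assumption: the (30,50000) shape in the traceback suggests the input layer was sized by the number of training images, but the post does not confirm it. weight_shapes is a hypothetical helper name.

```python
import numpy as np

def weight_shapes(sizes):
    """Shapes of the weight matrices that Network(sizes) would create."""
    return [(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

print(weight_shapes([784, 30, 10]))    # [(30, 784), (10, 30)]
print(weight_shapes([50000, 30, 10]))  # [(30, 50000), (10, 30)]

image = np.zeros((784, 1))
good = np.zeros(weight_shapes([784, 30, 10])[0])
bad = np.zeros(weight_shapes([50000, 30, 10])[0])

# Columns of the weight matrix match rows of the image: the product is valid
print(np.dot(good, image).shape)       # (30, 1)

# 50000 columns against 784 rows: NumPy raises a ValueError about alignment
try:
    np.dot(bad, image)
except ValueError as err:
    print(err)
```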

2017-10-22 18:04:19 jklaus
