Can I (selectively) invert Theano gradients during backpropagation?

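I'm keen to use the architecture proposed in the recent paper "Unsupervised Domain Adaptation by Backpropagation" in the Lasagne/Theano framework.

What makes this paper a bit unusual is that it incorporates a "gradient reversal layer", which inverts the gradient during backpropagation. (In the paper's architecture figure, the arrows along the bottom are the backpropagation paths whose gradient is inverted.)

The authors claim that the approach "can be implemented using any deep learning package", and indeed they provide a version made in caffe.

However, I am using the Lasagne/Theano framework, for various reasons.

Is it possible to create such a gradient reversal layer in Lasagne/Theano? I haven't seen any examples where one can apply custom scalar transforms to gradients like this. If it is possible, can I do it by creating a custom layer in Lasagne?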

Source

2015-11-23 Bill Cheatham

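Here's a sketch implementation using plain Theano. It can be integrated into Lasagne easily enough.

You need to create a custom operation that acts as an identity operation in the forward pass but reverses the gradient in the backward pass.

Here's a suggestion for how this could be implemented. It is not tested, and I'm not 100% certain I've understood everything correctly, but you should be able to verify and fix it as required.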

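import theano


class ReverseGradient(theano.gof.Op):
    """Identity in the forward pass; scales gradients by -hp_lambda in the backward pass."""

    # The output is a view of the input, so no copy is made.
    view_map = {0: [0]}

    # Ops with the same hp_lambda compare and hash as equal.
    __props__ = ('hp_lambda',)

    def __init__(self, hp_lambda):
        super(ReverseGradient, self).__init__()
        self.hp_lambda = hp_lambda

    def make_node(self, x):
        return theano.gof.graph.Apply(self, [x], [x.type.make_variable()])

    def perform(self, node, inputs, output_storage):
        # Forward pass: hand the input through unchanged.
        xin, = inputs
        xout, = output_storage
        xout[0] = xin

    def grad(self, inputs, output_gradients):
        # Backward pass: flip the sign of the gradient and scale it by hp_lambda.
        return [-self.hp_lambda * output_gradients[0]]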
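
To make the intended behaviour concrete without depending on Theano, here is a minimal pure-Python sketch of the same idea: the forward pass returns its input unchanged, and the backward pass multiplies the incoming gradient by -hp_lambda. The names `reverse_forward` and `reverse_backward` are made up for this illustration and are not part of Theano or Lasagne.

```python
def reverse_forward(x):
    """Forward pass: identity, the values pass through unchanged."""
    return list(x)


def reverse_backward(output_gradient, hp_lambda):
    """Backward pass: scale the incoming gradient by -hp_lambda."""
    return [-hp_lambda * g for g in output_gradient]


x = [1.0, -2.0, 3.0]
upstream = [0.5, 0.25, -1.0]  # gradient arriving from the layers above

y = reverse_forward(x)
downstream = reverse_backward(upstream, hp_lambda=0.5)
```

Here `y` equals `x`, while every entry of `downstream` has the opposite sign of the matching entry in `upstream` and half its magnitude.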

Using the paper's notation and naming conventions, here's a simple Theano implementation of the complete general model it proposes.

import numpy 
import theano 
import theano.tensor as tt 


# Feature extractor G_f: a stack of tanh layers.
def g_f(z, theta_f):
    for w_f, b_f in theta_f:
        z = tt.tanh(theano.dot(z, w_f) + b_f)
    return z


# Label predictor G_y: tanh layers followed by a softmax output layer.
def g_y(z, theta_y):
    for w_y, b_y in theta_y[:-1]:
        z = tt.tanh(theano.dot(z, w_y) + b_y)
    w_y, b_y = theta_y[-1]
    z = tt.nnet.softmax(theano.dot(z, w_y) + b_y)
    return z


# Domain classifier G_d: tanh layers followed by a sigmoid output layer.
def g_d(z, theta_d):
    for w_d, b_d in theta_d[:-1]:
        z = tt.tanh(theano.dot(z, w_d) + b_d)
    w_d, b_d = theta_d[-1]
    z = tt.nnet.sigmoid(theano.dot(z, w_d) + b_d)
    return z


def l_y(z, y): 
    return tt.nnet.categorical_crossentropy(z, y).mean() 


def l_d(z, d): 
    return tt.nnet.binary_crossentropy(z, d).mean() 


def mlp_parameters(input_size, layer_sizes):
    parameters = []
    previous_size = input_size
    for layer_size in layer_sizes:
        parameters.append((theano.shared(numpy.random.randn(previous_size, layer_size).astype(theano.config.floatX)),
                           theano.shared(numpy.zeros(layer_size, dtype=theano.config.floatX))))
        previous_size = layer_size
    return parameters, previous_size


def compile(input_size, f_layer_sizes, y_layer_sizes, d_layer_sizes, hp_lambda, hp_mu): 
    r = ReverseGradient(hp_lambda) 

    theta_f, f_size = mlp_parameters(input_size, f_layer_sizes) 
    theta_y, _ = mlp_parameters(f_size, y_layer_sizes) 
    theta_d, _ = mlp_parameters(f_size, d_layer_sizes) 

    xs = tt.matrix('xs') 
    xs.tag.test_value = numpy.random.randn(9, input_size).astype(theano.config.floatX) 
    xt = tt.matrix('xt') 
    xt.tag.test_value = numpy.random.randn(10, input_size).astype(theano.config.floatX) 
    ys = tt.ivector('ys') 
    ys.tag.test_value = numpy.random.randint(y_layer_sizes[-1], size=9).astype(numpy.int32) 

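    fs = g_f(xs, theta_f)
    # Label loss on the source data plus domain loss on both domains (source = 0, target = 1).
    # r(...) reverses the domain-loss gradient before it reaches the feature extractor.
    e = l_y(g_y(fs, theta_y), ys) + l_d(g_d(r(fs), theta_d), 0) + l_d(g_d(r(g_f(xt, theta_f)), theta_d), 1)

    # Plain gradient descent on every parameter, with learning rate hp_mu.
    updates = [(p, p - hp_mu * theano.grad(e, p)) for theta in theta_f + theta_y + theta_d for p in theta]
    train = theano.function([xs, xt, ys], outputs=e, updates=updates)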

    return train 


def main(): 
    theano.config.compute_test_value = 'raise' 
    numpy.random.seed(1) 
    compile(input_size=2, f_layer_sizes=[3, 4], y_layer_sizes=[7, 8], d_layer_sizes=[5, 6], hp_lambda=.5, hp_mu=.01) 


main()
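
To illustrate what the reversal does to the parameter updates, here is a hand-derived sketch for a toy model with a single feature weight and a single domain-classifier weight. It is pure Python and independent of the Theano code above. The function `domain_gradients` and the scalar setup are assumptions made for this illustration only; passing `hp_lambda=-1.0` is used here purely as a trick to switch the reversal off for comparison.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def domain_gradients(x, d, w_f, w_d, hp_lambda):
    """Gradients of the domain loss for a one-weight toy model.

    f = w_f * x                 (feature extractor)
    p = sigmoid(w_d * R(f))     (domain classifier; R is the reversal layer)
    loss = binary cross-entropy between p and the domain label d
    """
    f = w_f * x
    p = sigmoid(w_d * f)          # R is the identity in the forward pass
    grad_logit = p - d            # d(loss)/d(logit) for sigmoid + cross-entropy
    grad_w_d = grad_logit * f     # the classifier receives the ordinary gradient
    grad_f = grad_logit * w_d     # gradient arriving at the output of R
    grad_f = -hp_lambda * grad_f  # R flips and scales it on the way back
    grad_w_f = grad_f * x
    return grad_w_d, grad_w_f


# With reversal (hp_lambda = 0.5) and, for comparison, without it (hp_lambda = -1.0).
grad_w_d, grad_w_f = domain_gradients(x=2.0, d=1.0, w_f=0.5, w_d=-1.0, hp_lambda=0.5)
plain_w_d, plain_w_f = domain_gradients(x=2.0, d=1.0, w_f=0.5, w_d=-1.0, hp_lambda=-1.0)
```

The classifier gradient is the same in both cases, while the feature-weight gradient becomes -hp_lambda times the plain one. Under gradient descent the domain classifier therefore still moves to reduce the domain loss, whereas the feature extractor moves to increase it.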

This is untested, but the following may allow this custom op to be used as a Lasagne layer:

import lasagne


class ReverseGradientLayer(lasagne.layers.Layer):
    def __init__(self, incoming, hp_lambda, **kwargs):
        super(ReverseGradientLayer, self).__init__(incoming, **kwargs)
        # Wrap the custom op defined above.
        self.op = ReverseGradient(hp_lambda)

    def get_output_for(self, input, **kwargs):
        return self.op(input)

Source

2015-11-24 09:28:20

This is very cool, and it at least seems to compile on my machine. I'll adapt and convert this into Lasagne format and run some data through it. Many thanks!

Has anyone tested this? @BillCheatham, did it work for you? – pir
