Why does simple gradient descent diverge?

This is my second attempt at implementing gradient descent in one variable, and it always diverges. Any ideas?

This is simple linear regression in one variable, minimizing the residual sum of squares.

def gradient_descent_wtf(xvalues, yvalues):
    tolerance = 0.1

    # y = mx + b
    # some line to predict y values from x values
    m = 1.
    b = 1.

    # a predicted y-value has value mx + b

    for i in range(0, 10):

        # calculate y-value predictions for all x-values
        predicted_yvalues = list()
        for x in xvalues:
            predicted_yvalues.append(m*x + b)

        # predicted_yvalues holds the predicted y-values

        # now calculate the residuals = y-value - predicted y-value for each point
        residuals = list()
        number_of_points = len(yvalues)
        for n in range(0, number_of_points):
            residuals.append(yvalues[n] - predicted_yvalues[n])

        # calculate the residual sum of squares from the residuals, that is,
        # square each residual and add them all up. we will try to minimize
        # the residual sum of squares later.
        residual_sum_of_squares = 0.
        for r in residuals:
            residual_sum_of_squares += r**2
        print("RSS = %s" % residual_sum_of_squares)

        # now make a version of the residuals which is multiplied by the x-values
        residuals_times_xvalues = list()
        for n in range(0, number_of_points):
            residuals_times_xvalues.append(residuals[n] * xvalues[n])

        # now create the sums for the residuals and for the residuals times the x-values
        residuals_sum = sum(residuals)

        residuals_times_xvalues_sum = sum(residuals_times_xvalues)

        # now multiply the sums by a positive scalar and add each to m and b.

        residuals_sum *= 0.1
        residuals_times_xvalues_sum *= 0.1

        b += residuals_sum
        m += residuals_times_xvalues_sum

        # and repeat until convergence.
        # convergence occurs when ||sum vector|| < some tolerance.
        # ||sum vector|| = sqrt(residuals_sum**2 + residuals_times_xvalues_sum**2)

        # check for convergence
        magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        if magnitude_of_sum_vector < tolerance:
            break

    return (b, m)

Result:

gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567]) 
RSS = 370433.0 
RSS = 300170125.7 
RSS = 4.86943013045e+11 
RSS = 7.90447409339e+14 
RSS = 1.28312217794e+18 
RSS = 2.08287421094e+21 
RSS = 3.38110045417e+24 
RSS = 5.48849288217e+27 
RSS = 8.90939341376e+30 
RSS = 1.44624932026e+34 
Out[108]: 
(-3.475524066284303e+16, -2.4195981188763203e+17)

Source

2016-12-28

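The gradient is huge, so you are following a large vector for a long distance (0.1 times a large number is still large). Find a unit vector in the appropriate direction instead. Something like this (with comprehensions replacing your loops):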

def gradient_descent_wtf(xvalues, yvalues):
    tolerance = 0.1

    m = 1.
    b = 1.

    for i in range(0, 10):
        predicted_yvalues = [m*x + b for x in xvalues]

        residuals = [y - y_hat for y, y_hat in zip(yvalues, predicted_yvalues)]

        residual_sum_of_squares = sum(r**2 for r in residuals)  # only needed for debugging purposes
        print("RSS = %s" % residual_sum_of_squares)

        residuals_times_xvalues = [r*x for r, x in zip(residuals, xvalues)]

        residuals_sum = sum(residuals)

        residuals_times_xvalues_sum = sum(residuals_times_xvalues)

        # (residuals_sum, residuals_times_xvalues_sum) is a vector which points in the negative
        # gradient direction. *Find a unit vector which points in same direction*

        magnitude = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5

        residuals_sum /= magnitude
        residuals_times_xvalues_sum /= magnitude

        b += residuals_sum * (0.1)
        m += residuals_times_xvalues_sum * (0.1)

        # check for convergence -- this needs work!
        magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        if magnitude_of_sum_vector < tolerance:
            break

    return (b, m)

For example:

>>> gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567]) 
RSS = 370433.0 
RSS = 368732.1655050716 
RSS = 367039.18363896786 
RSS = 365354.0543519137 
RSS = 363676.7775934381 
RSS = 362007.3533123621 
RSS = 360345.7814567845 
RSS = 358692.061974069 
RSS = 357046.1948108295 
RSS = 355408.17991291644 
(1.1157111313023558, 1.9932828425473605)

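Which is certainly much more plausible.

Writing a numerically stable gradient descent algorithm is not a trivial matter. You might want to consult a decent textbook on numerical analysis.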
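To make "the gradient is huge" concrete, here is a small sketch that evaluates the first step for the data in the question, starting from m = b = 1. The names `xs`, `ys`, `unscaled_step_length` and `unit_step_length` are introduced only for this illustration; the other names follow the code above.

```python
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [6, 23, 8, 56, 3, 24, 234, 76, 59, 567]
m, b = 1.0, 1.0  # starting values used in the question

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

# Components of the vector that points in the negative gradient direction
residuals_sum = sum(residuals)                                   # 991.0
residuals_times_xvalues_sum = sum(r * x for r, x in zip(residuals, xs))  # 8466.0

# Length of that vector: roughly 8524
magnitude = (residuals_sum**2 + residuals_times_xvalues_sum**2) ** 0.5

# Step length when the raw vector is multiplied by 0.1: roughly 852
unscaled_step_length = 0.1 * magnitude

# Step length after normalizing to a unit vector first: 0.1 (up to rounding)
unit_step_length = 0.1 * (
    (residuals_sum / magnitude) ** 2
    + (residuals_times_xvalues_sum / magnitude) ** 2
) ** 0.5

print(residuals_sum, residuals_times_xvalues_sum, magnitude)
print(unscaled_step_length, unit_step_length)
```

With the raw vector, the very first update moves (b, m) by a distance of several hundred, while the normalized version moves it by only 0.1 per iteration.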

Source

2016-12-28 02:41:53

First, your code is right.

But when you do linear regression, there is something about the math you need to consider.

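For example, if the residual is -205.8 and your learning rate is 0.1, you get a huge descent step of -20.58.

That step is so large that you cannot get back to the correct m and b. You need to make your steps smaller.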

There are two ways to make the gradient descent step reasonable:

1. Initialize a small learning rate, such as 0.001 or 0.0003.
2. Divide the step by the sum of the input values.
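Both suggestions can be tried on the data from the question. The following is only a sketch: the function name `gradient_descent` and its parameters `learning_rate`, `step_divisor` and `iterations` are hypothetical, and reading "the sum of the input values" as `sum(xvalues)` is an assumption about what the second suggestion means.

```python
def gradient_descent(xvalues, yvalues, learning_rate, step_divisor=1.0, iterations=10):
    """Same update rule as in the question, with a configurable step size.

    Returns (b, m, rss_history), where rss_history holds the RSS measured
    at the start of each iteration.
    """
    m, b = 1.0, 1.0
    rss_history = []
    step = learning_rate / step_divisor  # effective step size
    for _ in range(iterations):
        residuals = [y - (m * x + b) for x, y in zip(xvalues, yvalues)]
        rss_history.append(sum(r ** 2 for r in residuals))
        # Compute both sums from the same residuals before updating b and m
        residuals_sum = sum(residuals)
        residuals_times_xvalues_sum = sum(r * x for r, x in zip(residuals, xvalues))
        b += step * residuals_sum
        m += step * residuals_times_xvalues_sum
    return b, m, rss_history


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [6, 23, 8, 56, 3, 24, 234, 76, 59, 567]

# Option 1: a small learning rate
_, _, rss_small = gradient_descent(xs, ys, learning_rate=0.001)

# Option 2 (assumed reading): keep 0.1 but divide the step by sum(xs)
_, _, rss_scaled = gradient_descent(xs, ys, learning_rate=0.1, step_divisor=sum(xs))

print("small rate: RSS %s -> %s" % (rss_small[0], rss_small[-1]))
print("scaled step: RSS %s -> %s" % (rss_scaled[0], rss_scaled[-1]))
```

For this data set both variants should give an RSS that shrinks from one iteration to the next instead of blowing up. How small the step has to be depends on the x-values, so these particular numbers are not guaranteed to work for other inputs.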

Source

2016-12-28 02:51:09 Jing
