How do I perform calculations on DataFrames or Series with different indexes in pandas?

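I have two Series of the same length and dtype; both are float64. The only difference is the index: both are indexed by date, but one uses dates at the beginning of each month and the other uses dates at the end of each month. How can I compute things like the correlation or covariance between the two Series?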

import numpy as np
import pandas as pd
import Quandl

# Download the two datasets (replace "api key" with a real token)
IPO = Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir = Quandl.get("FRBC/REALRT", authtoken="api key")

# Slice both to the same length (398 rows) and pick one column from each
ipo_splice = IPO[264:662]
new_ipo = ipo_splice['Gross Number of IPOs']

ir_splice = ir[0:398]
new_ir = ir_splice['RR 1 Month']

# The two date indexes do not match (month start vs. month end)
new_ipo.corr(new_ir)

Source

2016-06-17 Jeffrey Derose

Use reset_index(drop=True) to discard both indexes, so that the values line up by position instead of by label.

s1 = pd.DataFrame(np.random.rand(10), list('abcdefghij'), columns=['s1']) 
s2 = pd.DataFrame(np.random.rand(10), list('ABCDEFGHIJ'), columns=['s2']) 

print(pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr())


          s1        s2
s1  1.000000 -0.437945
s2 -0.437945  1.000000
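
To see why dropping the index matters, here is a minimal sketch of the same idea using Series rather than one-column DataFrames. It assumes the two Series are already in the same row order; the seeded random data and the variable names are illustrative only. Series.corr aligns the two inputs on their index labels first, so with no labels in common there is nothing left to correlate.

```python
import numpy as np
import pandas as pd

# Illustrative data: two Series with completely different index labels
rng = np.random.default_rng(0)
s1 = pd.Series(rng.random(10), index=list("abcdefghij"), name="s1")
s2 = pd.Series(rng.random(10), index=list("ABCDEFGHIJ"), name="s2")

# corr() aligns on labels; no labels are shared, so the result is NaN
corr_by_label = s1.corr(s2)

# Dropping the index leaves a default 0..n-1 index on both sides,
# so the values are paired by position
a = s1.reset_index(drop=True)
b = s2.reset_index(drop=True)
corr_by_position = a.corr(b)
cov_by_position = a.cov(b)

print(corr_by_label, corr_by_position, cov_by_position)
```

This only gives a meaningful answer when row i of one Series really corresponds to row i of the other, which is the case for two monthly series covering the same months in the same order.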

Source

2016-06-17 01:11:33 piRSquared

You can use the resample() function to resample one of your indexes. The goal is for both indexes to use the same convention, either beginning of month (BOM) or end of month (EOM).

Data:

In [63]: df_bom 
Out[63]: 
            val
2015-01-01   76
2015-02-01   27
2015-03-01   65
2015-04-01   71
2015-05-01    9
2015-06-01   23
2015-07-01   52
2015-08-01   10
2015-09-01   62
2015-10-01   25

In [64]: df_eom 
Out[64]: 
            val
2015-01-31   87
2015-02-28   16
2015-03-31   85
2015-04-30    4
2015-05-31   37
2015-06-30   63
2015-07-31    3
2015-08-31   73
2015-09-30   81
2015-10-31   69

Solution:

In [61]: df_eom.resample('MS').mean() + df_bom
Out[61]:
              val
2015-01-01  163.0
2015-02-01   43.0
2015-03-01  150.0
2015-04-01   75.0
2015-05-01   46.0
2015-06-01   86.0
2015-07-01   55.0
2015-08-01   83.0
2015-09-01  143.0
2015-10-01   94.0

In [62]: df_eom.resample('MS').mean().join(df_bom, lsuffix='_lft')
Out[62]:
            val_lft  val
2015-01-01     87.0   76
2015-02-01     16.0   27
2015-03-01     85.0   65
2015-04-01      4.0   71
2015-05-01     37.0    9
2015-06-01     63.0   23
2015-07-01      3.0   52
2015-08-01     73.0   10
2015-09-01     81.0   62
2015-10-01     69.0   25
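
Putting the resample approach together with the original question (correlation and covariance), a self-contained sketch might look like the following. The values are copied from the Data section above; the end-of-month index is spelled out as explicit dates, and the suffixes `_eom` and `_bom` and the variable names are my own choices for illustration. An explicit `.mean()` is used because resample() needs an aggregation in pandas 0.18 and later.

```python
import pandas as pd

# Values copied from the Data section; the indexes differ only in the day
df_bom = pd.DataFrame(
    {"val": [76, 27, 65, 71, 9, 23, 52, 10, 62, 25]},
    index=pd.date_range("2015-01-01", periods=10, freq="MS"),
)
eom_dates = [
    "2015-01-31", "2015-02-28", "2015-03-31", "2015-04-30", "2015-05-31",
    "2015-06-30", "2015-07-31", "2015-08-31", "2015-09-30", "2015-10-31",
]
df_eom = pd.DataFrame(
    {"val": [87, 16, 85, 4, 37, 63, 3, 73, 81, 69]},
    index=pd.to_datetime(eom_dates),
)

# Re-label the end-of-month rows with month-start dates.
# Each monthly bin holds exactly one row, so mean() returns that row's value.
eom_on_ms = df_eom.resample("MS").mean()

# Both frames now share the same index, so arithmetic aligns row by row
total = eom_on_ms + df_bom

# Side-by-side columns, then the statistics asked about in the question
joined = eom_on_ms.join(df_bom, lsuffix="_eom", rsuffix="_bom")
corr = joined["val_eom"].corr(joined["val_bom"])
cov = joined["val_eom"].cov(joined["val_bom"])

print(joined)
print(corr, cov)
```

Because `.mean()` returns floats, the resampled column displays as 87.0, 16.0 and so on, while the untouched `df_bom` column stays integer.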

Another approach: merge the DataFrames on the year and month parts of their indexes:

In [69]: %paste 
(pd.merge(df_bom, df_eom, 
      left_on=[df_bom.index.year, df_bom.index.month], 
      right_on=[df_eom.index.year, df_eom.index.month], 
      suffixes=('_bom','_eom'))) 
## -- End pasted text -- 
Out[69]: 
   key_0  key_1  val_bom  val_eom
0   2015      1       76       87
1   2015      2       27       16
2   2015      3       65       85
3   2015      4       71        4
4   2015      5        9       37
5   2015      6       23       63
6   2015      7       52        3
7   2015      8       10       73
8   2015      9       62       81
9   2015     10       25       69
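
A variation on the merge approach, sketched here under the assumption that named key columns are easier to read than `key_0` and `key_1`: add explicit `year` and `month` columns and merge on those, then compute the correlation and covariance on the merged frame. The helper `with_year_month` is hypothetical and not part of pandas; the values are copied from the Data section above.

```python
import pandas as pd

# Values copied from the Data section above
df_bom = pd.DataFrame(
    {"val": [76, 27, 65, 71, 9, 23, 52, 10, 62, 25]},
    index=pd.date_range("2015-01-01", periods=10, freq="MS"),
)
eom_dates = [
    "2015-01-31", "2015-02-28", "2015-03-31", "2015-04-30", "2015-05-31",
    "2015-06-30", "2015-07-31", "2015-08-31", "2015-09-30", "2015-10-31",
]
df_eom = pd.DataFrame(
    {"val": [87, 16, 85, 4, 37, 63, 3, 73, 81, 69]},
    index=pd.to_datetime(eom_dates),
)


def with_year_month(df):
    """Hypothetical helper: expose the index's year and month as columns."""
    return df.assign(year=df.index.year, month=df.index.month)


# Rows are matched on (year, month), so the day of the month no longer matters
merged = pd.merge(
    with_year_month(df_bom),
    with_year_month(df_eom),
    on=["year", "month"],
    suffixes=("_bom", "_eom"),
)

corr = merged["val_bom"].corr(merged["val_eom"])
cov = merged["val_bom"].cov(merged["val_eom"])

print(merged)
print(corr, cov)
```

Unlike reset_index(drop=True), this keeps working when one frame has extra or missing months, since the default inner merge keeps only the months present in both.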

Setup:

In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS')) 

In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))

Source

2016-06-17 08:09:55 MaxU
