Breaking a column out by group in pandas

If I have a DataFrame like this:

type  value  group
a        10  one
b        45  one
a       224  two
b       119  two
a        33  three
b        44  three

How do I turn it into this?

type  one  two  three
a      10  224     33
b      45  119     44

I thought pivot_table would do it, but that just gives me a regrouped list.

Source:

2016-05-03 futuraprime

I think you need pivot, together with rename_axis (new in pandas 0.18.0) and reset_index:

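# Use a new name so the original df stays available
wide = df.pivot(index='type', columns='group', values='value').rename_axis(None, axis=1)

# Select the columns explicitly to control their order
print(wide[['one', 'two', 'three']].reset_index())
  type  one  two  three
0    a   10  224     33
1    b   45  119     44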
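For reference, a self-contained version of this approach might look like the sketch below. It rebuilds the question's data by hand, and instead of hard-coding the column list it takes the order from the first appearance of each label in `group`. That ordering step, and the names `order` and `wide`, are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# The question's data, rebuilt by hand
df = pd.DataFrame({
    "type": ["a", "b", "a", "b", "a", "b"],
    "value": [10, 45, 224, 119, 33, 44],
    "group": ["one", "one", "two", "two", "three", "three"],
})

# Group labels in order of first appearance: one, two, three
order = df["group"].unique()

wide = (
    df.pivot(index="type", columns="group", values="value")
      .rename_axis(None, axis=1)   # drop the "group" label from the columns axis
      .reindex(columns=order)      # pivot sorts the labels; restore the original order
      .reset_index()               # turn "type" back into an ordinary column
)
print(wide)
```

Taking the order from the data avoids editing the column list by hand whenever a new group appears.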

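Without the explicit column selection, the columns come out in alphabetical order, so select them as above if the order matters:

print(df.pivot(index='type', columns='group', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   10     33  224
1    b   45     44  119

EDIT:

With your real data you can get an error: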

print(df.pivot(index='type', columns='group', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

ValueError: Index contains duplicate entries, cannot reshape

# The data that triggers the error
print(df)
  type  value  group
0    a     10    one
1    a     20    one
2    b     45    one
3    a    224    two
4    b    119    two
5    a     33  three
6    b     44  three
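One way to find the rows behind this error is to look for repeated type/group pairs before pivoting. The following is a minimal sketch, assuming the data shown above; the names `dupes` and `can_pivot` are illustrative only.

```python
import pandas as pd

# The data with a repeated (type, group) pair, rebuilt by hand
df = pd.DataFrame({
    "type": ["a", "a", "b", "a", "b", "a", "b"],
    "value": [10, 20, 45, 224, 119, 33, 44],
    "group": ["one", "one", "one", "two", "two", "three", "three"],
})

# keep=False marks every row of a repeated pair, not only the later ones
dupes = df[df.duplicated(subset=["type", "group"], keep=False)]
print(dupes)

# pivot can only reshape when each (index, columns) pair occurs once
can_pivot = dupes.empty
print(can_pivot)
```

If `can_pivot` is false, either remove the repeated rows or switch to pivot_table with an aggregation function.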

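The problem is the second row: for index value a and column one you get two values, 10 and 20. In this case the function pivot_table aggregates the data. The default aggregation function is np.mean, but you can change it with the aggfunc parameter.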

import numpy as np

# Duplicates are averaged: (10 + 20) / 2 = 15
print(df.pivot_table(index='type', columns='group', values='value', aggfunc=np.mean)
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   15     33  224
1    b   45     44  119

# Keep only the first value of each duplicate pair
print(df.pivot_table(index='type', columns='group', values='value', aggfunc='first')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   10     33  224
1    b   45     44  119

# Duplicates are added together: 10 + 20 = 30
print(df.pivot_table(index='type', columns='group', values='value', aggfunc=sum)
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   30     33  224
1    b   45     44  119
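To compare the three aggregations side by side, the snippets above can be folded into one runnable script. This is only a sketch: `widen` is a hypothetical helper name, and the aggregations are passed as the string names 'mean', 'first' and 'sum', which pivot_table accepts in place of the function objects.

```python
import pandas as pd

# The data with a repeated (type, group) pair, rebuilt by hand
df = pd.DataFrame({
    "type": ["a", "a", "b", "a", "b", "a", "b"],
    "value": [10, 20, 45, 224, 119, 33, 44],
    "group": ["one", "one", "one", "two", "two", "three", "three"],
})


def widen(frame, aggfunc):
    """Hypothetical helper: one column per group, duplicates combined by aggfunc."""
    return (
        frame.pivot_table(index="type", columns="group", values="value", aggfunc=aggfunc)
             .rename_axis(None, axis=1)
             .reset_index()
    )


# Print the wide table once per aggregation
for name in ("mean", "first", "sum"):
    print(name)
    print(widen(df, name))
```

Only the cell for type a and group one differs between the three results, because that is the only pair with more than one value.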

Source:

2016-05-03 07:23:52 jezrael
