Count the number of occurrences per day

I have a dataset in a pandas DataFrame that looks like this:

        score 
timestamp         
2013-06-29 00:52:28+00:00  -0.420070 
2013-06-29 00:51:53+00:00  -0.445720 
2013-06-28 16:40:43+00:00   0.508161 
2013-06-28 15:10:30+00:00   0.921474 
2013-06-28 15:10:17+00:00   0.876710

I want a count of the occurrences per day and do not care about the sentiment (score) column, so I am looking for something like this:

        count 
    timestamp 
    2013-06-29      2 
    2013-06-28      3

That is, I need the count of measurements that occur per day.

Source

2013-07-17 myusuf3

[Duplicate](http://stackoverflow.com/questions/17288636/fast-way-to-groupby-time-of-day-in-pandas)? – TomAugspurger

If your timestamp index is a DatetimeIndex:

import io
import pandas as pd

content = '''\
timestamp                  score
2013-06-29 00:52:28+00:00  -0.420070
2013-06-29 00:51:53+00:00  -0.445720
2013-06-28 16:40:43+00:00   0.508161
2013-06-28 15:10:30+00:00   0.921474
2013-06-28 15:10:17+00:00   0.876710
'''

# Columns are separated by two or more spaces. A regex separator needs the
# python engine, and a str (not bytes) buffer is read through io.StringIO.
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python',
                 parse_dates=[0], index_col=[0])

print(df)

so that df looks like this:

     score 
timestamp      
2013-06-29 00:52:28 -0.420070 
2013-06-29 00:51:53 -0.445720 
2013-06-28 16:40:43 0.508161 
2013-06-28 15:10:30 0.921474 
2013-06-28 15:10:17 0.876710 

print(type(df.index))
# <class 'pandas.tseries.index.DatetimeIndex'>  (the module path differs in newer pandas versions)

then you can use:

print(df.groupby(df.index.date).count())

  score 
2013-06-28  3 
2013-06-29  2

Note the importance of the parse_dates parameter. Without it, the index would just be a pandas.core.index.Index object, and you could not use df.index.date.
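To illustrate the note above, here is a minimal sketch of what happens when parse_dates is omitted and how the index can be converted afterwards. The comma-separated layout and the variable names (raw, counts) are assumptions made for the example; pd.to_datetime is a standard pandas function, but this is one possible approach rather than the answer author's own code.

```python
import io
import pandas as pd

# Same sample rows as above, written as CSV for simplicity (an assumption).
content = """\
timestamp,score
2013-06-29 00:52:28+00:00,-0.420070
2013-06-29 00:51:53+00:00,-0.445720
2013-06-28 16:40:43+00:00,0.508161
2013-06-28 15:10:30+00:00,0.921474
2013-06-28 15:10:17+00:00,0.876710
"""

# Without parse_dates the index holds plain strings, not timestamps.
raw = pd.read_csv(io.StringIO(content), index_col=0)
was_datetime_index = isinstance(raw.index, pd.DatetimeIndex)

# Convert the string index to a DatetimeIndex after the fact.
raw.index = pd.to_datetime(raw.index)

# Now .date is available and can be used as the grouping key.
counts = raw.groupby(raw.index.date)["score"].count()
print(counts)
```

Checking isinstance(df.index, pd.DatetimeIndex) before grouping is a cheap way to find out which situation applies.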

So the answer depends on type(df.index), which the question does not show.

Source

2013-07-17 17:27:32 unutbu

In [145]: df 
Out[145]: 
timestamp 
2013-06-29 00:52:28 -0.420070 
2013-06-29 00:51:53 -0.445720 
2013-06-28 16:40:43 0.508161 
2013-06-28 15:10:30 0.921474 
2013-06-28 15:10:17 0.876710 
Name: score, dtype: float64 

In [160]: df.groupby(lambda x: x.date).count() 
Out[160]: 
2013-06-28 3 
2013-06-29 2 
dtype: int64
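In current pandas versions, date is a method on Timestamp, so a grouping function generally has to call it. The following is a sketch under that assumption; the Series is rebuilt by hand from the values shown above, and the names s and per_day are made up for the example.

```python
import pandas as pd

# Hand-built copy of the Series from the session above (for illustration only).
s = pd.Series(
    [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710],
    index=pd.to_datetime([
        "2013-06-29 00:52:28",
        "2013-06-29 00:51:53",
        "2013-06-28 16:40:43",
        "2013-06-28 15:10:30",
        "2013-06-28 15:10:17",
    ]),
    name="score",
)

# A function passed to groupby is applied to each index label (a Timestamp);
# calling .date() reduces every label to its calendar day.
per_day = s.groupby(lambda ts: ts.date()).count()
print(per_day)
```

Grouping on s.index.date directly gives the same result without a Python-level function call per label.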

Source

2013-07-17 17:21:47 TomAugspurger

`x.date` works as a property, with no () –

Hmm. Do you know why `df.index[0].date` returns ''? – TomAugspurger

Hmm, I don't. @Andy? –

Alternatively, use the resample function:

In [419]: df 
Out[419]: 
timestamp 
2013-06-29 00:52:28 -0.420070 
2013-06-29 00:51:53 -0.445720 
2013-06-28 16:40:43 0.508161 
2013-06-28 15:10:30 0.921474 
2013-06-28 15:10:17 0.876710 
Name: score, dtype: float64 

In [420]: df.resample('D', how={'score':'count'}) 

Out[420]: 
2013-06-28 3 
2013-06-29 2 
dtype: int64

UPDATE: as @jbochi pointed out, resample with how is deprecated in pandas 0.18+. Use this instead:

df.resample('D').apply({'score': 'count'})

Source

2015-04-09 15:24:56

Resampling with `how` is deprecated. You should use `df.resample('D').apply({'score': 'count'})` – jbochi
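For reference, a minimal sketch of the resample approach without the how argument. It assumes the data is a DataFrame with a score column (in the session above it is a Series) and selects that column before counting; the names df and daily are illustrative.

```python
import pandas as pd

# Hand-built DataFrame with the sample values (an assumption for the example).
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=pd.to_datetime([
        "2013-06-29 00:52:28",
        "2013-06-29 00:51:53",
        "2013-06-28 16:40:43",
        "2013-06-28 15:10:30",
        "2013-06-28 15:10:17",
    ]),
)
df.index.name = "timestamp"

# Sort chronologically, bucket the rows into calendar days, and count per bucket.
daily = df.sort_index().resample("D")["score"].count()
print(daily)
```

One difference from the groupby approach is that resample also emits days with no rows, with a count of 0, whereas groupby only lists days that occur in the data.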
