Pandas "diff()" with string

How can I flag a row in a dataframe every time a column changes its string value?

Example:

Input

ColumnA ColumnB 
1   Blue 
2   Blue 
3   Red 
4   Red 
5   Yellow 


# diff won't work here with strings; it only works on numerical values
dataframe['changed'] = dataframe['ColumnB'].diff()


ColumnA ColumnB  changed 
1   Blue   0 
2   Blue   0 
3   Red   1 
4   Red   0 
5   Yellow  1

Source

2016-10-31 guilhermecgs

Performance note: it may be better to simply use the 'np.bool' type instead of integers. 'np.bool' uses 1 byte. I suppose you could use 'np.int8', but by default I think 'np.int32' or 'np.int64' (whatever a C long is on your system) would be used... –
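To make the dtype remark above concrete, here is a minimal sketch that compares the buffer sizes of the same flag column stored as bool, 'np.int8' and 'np.int64'. The variable names are illustrative, and the 'int64' cast is written out explicitly because the default integer width can differ between platforms.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ColumnB': ['Blue', 'Blue', 'Red', 'Red', 'Yellow']})

# The comparison itself already yields a boolean Series.
as_bool = df['ColumnB'].ne(df['ColumnB'].shift().bfill())

# Explicit integer casts, so the element width does not depend on the platform.
as_int8 = as_bool.astype(np.int8)
as_int64 = as_bool.astype(np.int64)

for name, s in [('bool', as_bool), ('int8', as_int8), ('int64', as_int64)]:
    # nbytes counts only the underlying data buffer, not the index.
    print(name, s.dtype, s.to_numpy().nbytes, 'bytes')
```

For five rows the bool and int8 buffers take 5 bytes each and the int64 buffer takes 40, so keeping the flag as bool (or casting to a narrow integer type) saves memory on large frames.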

I get better performance with ne than with an actual != comparison:

df['changed'] = df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
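For anyone who wants to try the one-liner above, here is a self-contained sketch on the data from the question. The intermediate name 'previous' is only for illustration. Back-filling the shifted column makes the first row compare against itself, so it is flagged 0.

```python
import pandas as pd

df = pd.DataFrame({
    'ColumnA': [1, 2, 3, 4, 5],
    'ColumnB': ['Blue', 'Blue', 'Red', 'Red', 'Yellow'],
})

# Value of the previous row; the first row has none, so bfill copies
# the first real value into that slot.
previous = df['ColumnB'].shift().bfill()

# 1 where the value differs from the previous row, 0 otherwise.
df['changed'] = df['ColumnB'].ne(previous).astype(int)
print(df)
```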

Timings

Using the following setup to create a larger dataframe:

df = pd.concat([df]*10**5, ignore_index=True)

I get the following timings:

%timeit df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int) 
10 loops, best of 3: 38.1 ms per loop 

%timeit (df.ColumnB != df.ColumnB.shift()).astype(int) 
10 loops, best of 3: 77.7 ms per loop 

%timeit df['ColumnB'] == df['ColumnB'].shift(1).fillna(df['ColumnB']) 
10 loops, best of 3: 99.6 ms per loop 

%timeit (df.ColumnB.ne(df.ColumnB.shift())).astype(int) 
10 loops, best of 3: 19.3 ms per loop
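The numbers above come from IPython's %timeit. As a rough sketch for repeating the comparison in a plain script, the standard-library timeit module can be used instead. The labels, the 'results' dict and the smaller repetition factor (10**4 rather than 10**5) are choices made here to keep the run short. Absolute times will differ by machine and pandas version.

```python
import timeit

import pandas as pd

small = pd.DataFrame({'ColumnB': ['Blue', 'Blue', 'Red', 'Red', 'Yellow']})
# Smaller than the 10**5 used in the answer, to keep the run short.
df = pd.concat([small] * 10**4, ignore_index=True)

candidates = {
    'ne + shift + bfill': lambda: df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int),
    '!= + shift': lambda: (df['ColumnB'] != df['ColumnB'].shift()).astype(int),
    'ne + shift': lambda: df['ColumnB'].ne(df['ColumnB'].shift()).astype(int),
}

results = {}
for label, fn in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    results[label] = best
    print(f'{label}: {best * 1e3:.2f} ms per loop')
```

Note that the variants without bfill flag the very first row as 1, because the shifted value there is NaN. They agree with the bfill variant on every other row.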

Source

2016-10-31 19:06:41 root

Can you add timings for '(df.ColumnB.ne(df.ColumnB.shift())).astype(int)'? – jezrael

@jezrael: I've added the timing. Using 'ix' to set the first row to 0 adds ~1 ms to the timing, so it still looks like the fastest. – root

Use .shift and compare:

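# != marks rows that differ from the previous row (== would mark the unchanged ones);
# fillna makes the first row compare against itself, so it is not flagged
dataframe['changed'] = dataframe['ColumnB'] != dataframe['ColumnB'].shift(1).fillna(dataframe['ColumnB'])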

Source

2016-10-31 18:47:13 Kartik

Very clean answer – guilhermecgs

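What works for me is comparing with shift, then replacing the first value with 0, because it has no previous value (the shifted column is NaN there):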

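df['diff'] = (df.ColumnB != df.ColumnB.shift()).astype(int)
# The first row has no previous value, so reset its flag to 0
# (.loc replaces .ix, which was removed in pandas 1.0)
df.loc[0, 'diff'] = 0
print(df)
   ColumnA ColumnB  diff
0        1    Blue     0
1        2    Blue     0
2        3     Red     1
3        4     Red     0
4        5  Yellow     1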

Edit: according to the timings in another answer, the fastest is to use ne:

df['diff'] = (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
# .loc replaces .ix, which was removed in pandas 1.0
df.loc[0, 'diff'] = 0
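Resetting the first row by the label 0 assumes a default integer index, and '.ix' itself was removed in pandas 1.0. As a hedged sketch for frames whose index is something else, the same reset can be done by position with iloc. The letter index below is a made-up example and is not part of the original question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'ColumnA': [1, 2, 3, 4, 5],
        'ColumnB': ['Blue', 'Blue', 'Red', 'Red', 'Yellow'],
    },
    index=list('abcde'),  # hypothetical non-default index
)

# Flag rows whose value differs from the previous row.
df['diff'] = df['ColumnB'].ne(df['ColumnB'].shift()).astype(int)

# The first row compares against NaN and comes out as 1; reset it by
# position so the code does not depend on the index labels.
df.iloc[0, df.columns.get_loc('diff')] = 0
print(df)
```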

Source

2016-10-31 18:49:00 jezrael

I wonder, is there any performance difference between this approach and just using '!='? –

@juanpa.arrivillaga - Yes, thank you. – jezrael
