Extracting strings from a pandas DataFrame column

My dataframe has a column 'a' whose values may contain 'apple' or 'orange'. I want to extract those words where they are present and label the row 'others' otherwise.

I could simply loop over the rows and extract them. I have seen numpy.where() used for a similar purpose, but only with two categories:

result = numpy.where(df['a'].str.contains('apple'), 'apple', 'others')

Can this be applied to the three-category case? That is, result should contain the entries 'apple', 'orange', or 'others'.

Is there a better way to do this than simply looping?

2016-07-12 nos

Use str.extract:

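import pandas as pd

df = pd.DataFrame({'a': ['orange','apple','a']})
print (df)
        a
0  orange
1   apple
2       a

# Extract the first match of either word; rows with no match become NaN,
# which fillna then replaces with 'others'.
df['new'] = df.a.str.extract('(orange|apple)', expand=False).fillna('others')
print (df)
        a     new
0  orange  orange
1   apple   apple
2       a  others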
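
To see what each step of that chain contributes, here is a minimal sketch that splits it in two. The variable name `extracted` is only for illustration and is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['orange', 'apple', 'a']})

# Step 1: the capture group yields the matched word, or NaN when nothing matches.
# expand=False makes a single capture group come back as a Series.
extracted = df['a'].str.extract('(orange|apple)', expand=False)

# Step 2: the unmatched rows are NaN, so fillna supplies the fallback label.
df['new'] = extracted.fillna('others')

print(extracted.isna().tolist())
print(df['new'].tolist())
```

Only the last row fails to match, so it is the only one that picks up the 'others' label.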

2016-07-12 21:02:40 jezrael

I want the result to be one of three possibilities: 'apple', 'orange', or 'others'. – nos

I have edited the answer, please check it. – jezrael

You could simply look for the items that are apple or orange with np.in1d to create a boolean mask, which can then be used with np.where to set the rest as others. So we would have -

df['b'] = np.where(np.in1d(df.a,['apple','orange']),df.a,'others')
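
For anyone who wants to run this as a script, here is a small self-contained sketch of the same idea. It assumes a made-up four-row frame and swaps in np.isin, the newer spelling of np.in1d (NumPy 2.0 deprecates np.in1d). Note that the mask only catches exact matches, so 'apple-shake' ends up as 'others'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({'a': ['apple', 'orange', 'mango', 'apple-shake']})

# Boolean mask: True only where the whole cell equals one of the wanted labels.
mask = np.isin(df['a'], ['apple', 'orange'])

# Keep the original value where the mask is True, otherwise use 'others'.
df['b'] = np.where(mask, df['a'], 'others')

print(df['b'].tolist())
```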

If you are looking to work with strings that have those names as part of a larger string, you can use str.extract (I caught this idea from @jezrael's solution, I hope that's okay!) and then use np.where, like so -

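# expand=False returns a Series; newer pandas versions default to expand=True, which returns a DataFrame
strings = df.a.str.extract('(orange|apple)', expand=False)
df['b'] = np.where(np.in1d(strings,['apple','orange']),strings,'others')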

Sample run -

In [294]: df
Out[294]:
             a
0  apple-shake
1       orange
2  apple-juice
3        apple
4        mango
5       orange
6       banana

In [295]: strings = df.a.str.extract('(orange|apple)', expand=False)

In [296]: df['b'] = np.where(np.in1d(strings,['apple','orange']),strings,'others')

In [297]: df
Out[297]:
             a       b
0  apple-shake   apple
1       orange  orange
2  apple-juice   apple
3        apple   apple
4        mango  others
5       orange  orange
6       banana  others
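
The sample run can also be reproduced as a standalone script. This is only a sketch: it uses Series.isin in place of np.in1d (my substitution, not what the session above ran) and checks that the result agrees with simply filling the unmatched rows via fillna.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['apple-shake', 'orange', 'apple-juice', 'apple',
                         'mango', 'orange', 'banana']})

# Pull out the first 'orange' or 'apple' found in each cell (NaN if neither occurs).
strings = df['a'].str.extract('(orange|apple)', expand=False)

# Keep the extracted word where it is one of the two labels, else 'others'.
df['b'] = np.where(strings.isin(['apple', 'orange']), strings, 'others')

# After extraction the only non-matching values are NaN, so fillna gives the same labels.
same_as_fillna = df['b'].tolist() == strings.fillna('others').tolist()
print(df)
print(same_as_fillna)
```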

2016-07-12 21:11:29 Divakar
