Create a new (more detailed) DataFrame in pandas based on an index DataFrame

Apologies for the newbie question, but I'm having trouble wrapping my head around pandas DataFrames. I have a DataFrame with a detailed list of products and their formats, like this:

df_details:
Product                Title
100000                 Sample main product
100000-Format-English  Sample product details
100000-Format-Spanish  Sample product details
100000-Format-French   Sample product details
110000                 Another sample main product
110000-Format-English  Another sample details
110000-Format-Spanish  Another sample details
120000                 Yet another sample main product
120000-Format-English  Yet another sample details
120000-Format-Spanish  Yet another sample details
...
200000                 Non-consecutive main sample
200000-Format-English  Non-consecutive sample details
200000-Format-Spanish  Non-consecutive sample details

I also have a second, shorter DataFrame that lists only some of the main products, like this:

df_index:
Product  Title
100000   Sample main product
200000   Non-consecutive main sample

I want to create a new DataFrame based on df_details, but only for the products that appear in df_index. Ideally, it would look something like this:

new_df:
Product                Title
100000                 Sample main product
100000-Format-English  Sample product details
100000-Format-Spanish  Sample product details
100000-Format-French   Sample product details
200000                 Non-consecutive main sample
200000-Format-English  Non-consecutive sample details
200000-Format-Spanish  Non-consecutive sample details

Here is what I have tried so far:

new_df = df_details[df_details['Product'][0:5] == df_index['Product'][0:5]]

That gives me an error:

ValueError: Can only compare identically-labeled Series objects
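
A note on why this likely fails, as a minimal sketch on made-up sample data (the values below are illustrative, not the asker's real data): slicing a Series with [0:5] selects the first five rows, not the first five characters of each value. The two sides then carry different row labels, and == refuses to compare them. The .str accessor slices each string instead.

```python
import pandas as pd

# Illustrative sample data (assumed, modeled on the tables in the question)
df_details = pd.DataFrame({
    "Product": ["100000", "100000-Format-English", "110000", "200000",
                "200000-Format-Spanish", "210000"],
})
df_index = pd.DataFrame({"Product": ["100000", "200000"]})

# [0:5] on a Series slices ROWS: the first five rows, labels 0..4
row_slice = df_details["Product"][0:5]

# .str[0:6] slices CHARACTERS within each value
char_slice = df_details["Product"].str[0:6]

# Comparing Series whose labels differ raises the error from the question
try:
    row_slice == df_index["Product"][0:5]
    comparison_failed = False
except ValueError:
    comparison_failed = True
```

Note that the product numbers in the sample tables are six characters long, so a character slice would need [0:6] rather than [0:5].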

I have also tried:

new_df = pd.merge(df_index, df_details, 
    left_on=['Product'[0:5]], right_index=True, how='left')
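
A side note on the merge call above: 'Product'[0:5] slices the Python string literal itself and evaluates to 'Produ', so it never refers to the first characters of the column values. One possible way to make a merge work is to derive a key column first. The sketch below assumes the product number is everything before the first hyphen; the sample data and the helper column name base are hypothetical.

```python
import pandas as pd

# Illustrative sample data (assumed, modeled on the tables in the question)
df_details = pd.DataFrame({
    "Product": ["100000", "100000-Format-English", "110000",
                "110000-Format-English", "200000", "200000-Format-Spanish"],
    "Title": ["Sample main product", "Sample product details",
              "Another sample main product", "Another sample details",
              "Non-consecutive main sample", "Non-consecutive sample details"],
})
df_index = pd.DataFrame({
    "Product": ["100000", "200000"],
    "Title": ["Sample main product", "Non-consecutive main sample"],
})

# Slicing the literal gives part of the word "Product", not column data
literal_slice = "Product"[0:5]

# Hypothetical helper column: the part of Product before the first hyphen
keyed = df_details.assign(base=df_details["Product"].str.split("-").str[0])

# Inner merge keeps only detail rows whose base number is listed in df_index
keys = df_index[["Product"]].rename(columns={"Product": "base"})
new_df = keyed.merge(keys, on="base", how="inner").drop(columns="base")
```

Merging on the derived key, rather than on the full Product value, is what lets the 100000-Format-English style rows survive the join.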

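The merge attempt does not include the detail rows that carry the format information.

Source

2016-12-22 nathan.hunt

You should be able to use .isin(), like this:

new_df = df_details[df_details['Product'].isin(df_index['Product'])]

This builds a mask that keeps only the rows whose Product value also appears in df_index.

EDIT: This only works if both columns hold exactly the same strings. To get around that, you can use str.contains():

import re

# Build a regex that matches any product number listed in df_index
pat = '|'.join(map(re.escape, df_index['Product']))

# Use the pattern as a mask on df_details
new_df = df_details[df_details['Product'].str.contains(pat)]

This works if the columns are formatted as strings.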
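
One caveat with str.contains() is that it matches the pattern anywhere in the string, so a product number that happens to appear in the middle of another value would also be selected. A possible refinement, sketched here on made-up data and assuming detail rows are always named "<number>-...", is to anchor the pattern at the start and require a hyphen or the end of the string right after the number:

```python
import re
import pandas as pd

# Illustrative sample data (assumed); includes two near-miss values
df_details = pd.DataFrame({
    "Product": ["100000", "100000-Format-English", "1000001",
                "200000-Format-Spanish", "9-200000"],
})
df_index = pd.DataFrame({"Product": ["100000", "200000"]})

# Alternation of the escaped product numbers, as in the answer above
alternatives = "|".join(map(re.escape, df_index["Product"]))

# Anchor at the start and require a hyphen or end of string right after.
# Non-capturing groups keep pandas from warning about match groups.
pat = rf"^(?:{alternatives})(?:-|$)"

new_df = df_details[df_details["Product"].str.contains(pat)]
```

With this pattern, "1000001" and "9-200000" are left out, while the main rows and their format rows are kept.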

Source

2016-12-22 18:13:03

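Nice. I was just writing the same solution.

It works, sort of. But it doesn't give me the 200000-Format-English style rows. Maybe because they aren't an exact match for a row like 200000?

@nathan.hunt Yes, you are right. I assumed all the rows had the same format... this only finds rows with an identical format... let me think about a more general solution...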

I'm not sure how I pulled this off, and I'm sure it's not the fastest way to accomplish it, but it works.

I used pandas .iterrows() with a few for and if statements to loop through the DataFrames row by row:

import pandas as pd

# Create a list of product numbers from the 'Product' column of df_index
index_list = []
for _, row in df_index.iterrows():
    index_list.append(str(row['Product']))

# Collect the rows of df_details whose first six characters are found in index_list
matched_rows = []
for _, row in df_details.iterrows():
    prod_num_detail = str(row['Product'])[0:6]
    if prod_num_detail in index_list:
        matched_rows.append(row)

# DataFrame.append was removed in pandas 2.0, so build the new frame in one step
new_df = pd.DataFrame(matched_rows, columns=df_details.columns)
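
The same first-six-characters rule can probably be expressed without an explicit loop. This is only a sketch under the assumption that every Product value starts with a six-character product number; the sample data is made up to mirror the tables in the question.

```python
import pandas as pd

# Illustrative sample data (assumed, modeled on the tables in the question)
df_details = pd.DataFrame({
    "Product": ["100000", "100000-Format-English", "110000",
                "110000-Format-English", "200000", "200000-Format-Spanish"],
})
df_index = pd.DataFrame({"Product": ["100000", "200000"]})

# First six characters of each Product value, as strings
prefixes = df_details["Product"].astype(str).str[:6]

# True where the prefix is one of the product numbers in df_index
mask = prefixes.isin(df_index["Product"].astype(str))

new_df = df_details[mask]
```

Because the mask is computed for the whole column at once, this avoids appending rows one at a time, which tends to be the slow part of the loop version.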

Source

2016-12-23 17:42:13
