I am using the iris.scale dataset for analysis. However, after reading the data file, the column values come out with their feature numbers still attached:

df = pd.read_csv("../Data/iris.scale.csv", sep=' ', header=None, names=['class','S.lenght','S.width','P.lenght','P.width']) 
print(df.head(3)) 

    class  S.lenght  S.width  P.lenght  P.width 
    1  1:-0.555556 2:0.25  3:-0.864407  4:-0.916667 
    1  1:-0.666667 2:-0.166667 3:-0.864407  4:-0.916667 
    1  1:-0.833333 2:-0.08333 3:-0.830508  4:-0.916667

But it should look like this:

    class  S.lenght  S.width  P.lenght  P.width 
    1  -0.555556 0.25  -0.864407  -0.916667 
    1  -0.666667 -0.166667 -0.864407  -0.916667 
    1  -0.833333 -0.08333 -0.830508  -0.916667

so that I can process it. How can I get these sliced columns, without the feature numbers?

Source

2017-01-27 Zafar Mahmood

How big is `iris.scale.csv`? Could you add a few lines of the CSV file? – wwii

It's a very small dataset, commonly used as a test bed: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale

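You can use set_index to move class out of the way, split each remaining value on : and keep the part after it, and finally cast the output to float to get a DataFrame with the plain values: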

df = df.set_index('class').apply(lambda x: x.str.split(':').str[1]).astype(float).reset_index()
print(df)
    class S.lenght S.width P.lenght P.width 
0  1 -0.555556 0.250000 -0.864407 -0.916667 
1  1 -0.666667 -0.166667 -0.864407 -0.916667 
2  1 -0.833333 -0.083330 -0.830508 -0.916667

Another solution, using str.extract:

df = df.set_index('class').apply(lambda x: x.str.extract(':(.*)', expand=False)).astype(float).reset_index() 
print(df)
    class S.lenght S.width P.lenght P.width 
0  1 -0.555556 0.250000 -0.864407 -0.916667 
1  1 -0.666667 -0.166667 -0.864407 -0.916667 
2  1 -0.833333 -0.083330 -0.830508 -0.916667
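
For anyone who wants to try both variants without the file on disk, here is a minimal sketch. The in-memory `raw` string is an assumption that stands in for `../Data/iris.scale.csv` (its three rows are copied from the question), and the names `features`, `by_split` and `by_extract` are only for illustration.

```python
import io

import pandas as pd

# Assumption: three rows copied from the question, standing in for the real file
raw = (
    "1 1:-0.555556 2:0.25 3:-0.864407 4:-0.916667\n"
    "1 1:-0.666667 2:-0.166667 3:-0.864407 4:-0.916667\n"
    "1 1:-0.833333 2:-0.08333 3:-0.830508 4:-0.916667\n"
)
cols = ['class', 'S.lenght', 'S.width', 'P.lenght', 'P.width']
df = pd.read_csv(io.StringIO(raw), sep=' ', header=None, names=cols)

# Keep 'class' in the index so only the prefixed feature columns are touched
features = df.set_index('class')

# Variant 1: split on ':' and keep the second piece
by_split = (features.apply(lambda x: x.str.split(':').str[1])
            .astype(float)
            .reset_index())

# Variant 2: capture everything after the ':' with a regular expression
by_extract = (features.apply(lambda x: x.str.extract(':(.*)', expand=False))
              .astype(float)
              .reset_index())

print(by_split)
print(by_split.equals(by_extract))
```

Both variants should produce the same frame on this sample, so which one to use is mostly a matter of taste.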

Source

2017-01-27 19:57:17 jezrael

`pandas`

filter to focus on the right columns
stack + str.split + unstack
update

Code

df.update(
    df.filter(regex='S|P').stack().str.split(':').str[1].astype(float).unstack()) 
df 

    class S.lenght S.width P.lenght P.width 
0  1 -0.555556  0.25 -0.864407 -0.916667 
1  1 -0.666667 -0.166667 -0.864407 -0.916667 
2  1 -0.833333 -0.08333 -0.830508 -0.916667
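
Depending on the pandas version, update may leave the feature columns with their original non-numeric dtype even though the values are now floats. A minimal sketch of the same filter / stack / split / unstack chain that assigns the result back instead, assuming float columns are what you want. The in-memory `raw` sample and the name `converted` are assumptions for illustration, not part of the answer above.

```python
import io

import pandas as pd

# Assumption: three rows copied from the question, standing in for the real file
raw = (
    "1 1:-0.555556 2:0.25 3:-0.864407 4:-0.916667\n"
    "1 1:-0.666667 2:-0.166667 3:-0.864407 4:-0.916667\n"
    "1 1:-0.833333 2:-0.08333 3:-0.830508 4:-0.916667\n"
)
cols = ['class', 'S.lenght', 'S.width', 'P.lenght', 'P.width']
df = pd.read_csv(io.StringIO(raw), sep=' ', header=None, names=cols)

converted = (
    df.filter(regex='S|P')      # only the prefixed feature columns
      .stack()                  # one long Series indexed by (row, column)
      .str.split(':').str[1]    # drop the "n:" prefix
      .astype(float)
      .unstack()                # back to the original wide shape
)

# Plain column assignment (swapped in for update) replaces the columns,
# so they take the float dtype of the converted block
df[converted.columns] = converted
print(df.dtypes)
```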

`numpy`

split the whole array at once
build a new array
slice assignment

Code

# np.char.split is the public name of np.core.defchararray.split
s = np.char.split(df.values[:, 1:].astype(str), ':').tolist() 
df.iloc[:, 1:] = np.array(s)[:, :, 1].astype(float) 

    class S.lenght S.width P.lenght P.width 
0  1 -0.555556  0.25 -0.864407 -0.916667 
1  1 -0.666667 -0.166667 -0.864407 -0.916667 
2  1 -0.833333 -0.08333 -0.830508 -0.916667

Source

2017-01-27 20:44:42 piRSquared

You could remove the excess before feeding the data to pandas:

import io
import re

with open("../Data/iris.scale.csv") as f:
    data = f.read()

# Strip the feature-number prefixes "1:" to "4:"
p = r'[1-4]:'
data = re.sub(p, '', data)

Then you could write the data to a new file before feeding it to pandas, or put it in a file-like object and feed that to pandas:

# Python 3.x
data = io.StringIO(data)
# Python 2.7
# data = io.BytesIO(data)
# sep=r'\s+' splits on any run of whitespace (same effect as delim_whitespace=True, which newer pandas deprecates)
df = pd.read_csv(data, sep=r'\s+', index_col=False, names=['class','S.lenght','S.width','P.lenght','P.width'])
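
To try the preprocessing route end to end without the file, here is a minimal sketch. The in-memory `raw` string is an assumption that stands in for the contents of `../Data/iris.scale.csv`, and `cleaned` is just an illustrative name. It uses `sep=r'\s+'`, which splits on any run of whitespace.

```python
import io
import re

import pandas as pd

# Assumption: three rows copied from the question, standing in for the real file
raw = (
    "1 1:-0.555556 2:0.25 3:-0.864407 4:-0.916667\n"
    "1 1:-0.666667 2:-0.166667 3:-0.864407 4:-0.916667\n"
    "1 1:-0.833333 2:-0.08333 3:-0.830508 4:-0.916667\n"
)
cols = ['class', 'S.lenght', 'S.width', 'P.lenght', 'P.width']

# Remove the "1:" to "4:" prefixes before pandas ever sees the text
cleaned = re.sub(r'[1-4]:', '', raw)

# Wrap the cleaned text in a file-like object and parse it
df = pd.read_csv(io.StringIO(cleaned), sep=r'\s+', header=None, names=cols)
print(df.dtypes)
```

Because the prefixes are gone before parsing, pandas reads the feature columns as numbers directly and no later string handling is needed.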

Source

2017-01-27 23:39:31 wwii
