2016-09-06

IndexError while fitting a stacked generalization model

I get an error while fitting a stacking and blending model: IndexError: indices are out-of-bounds. Any guidance on this would be helpful. Thanks.

First I read the datasets:

import pandas as pd 
import numpy as np 
from stacked_generalizer import StackedGeneralizer 
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier 
from sklearn.linear_model import LogisticRegression 

#Load cleaned data : 
train = pd.read_csv('train1.csv') 
test = pd.read_csv('test1.csv') 

Then I selected the variables, a subset of all the columns in the train data:

target='Y1' 
ID = 'ID' 
predictors1= ['Marks_SA','Marks_PA', 
     'Marks_CA','Feat2','Experience', 'Feat6','Feat1', 
     'Feat5','Feat4'] 

Now I blend the models:

base_models = [RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='gini'), 
      RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='entropy'), 
      ExtraTreesClassifier(n_estimators=100, n_jobs=-1, criterion='gini')] 


# define blending model 
blending_model = LogisticRegression() 
VERBOSE = True 
N_FOLDS = 5 

# initialize multi-stage model 
sg = StackedGeneralizer(base_models, blending_model, 
        n_folds=N_FOLDS, verbose=VERBOSE) 

# fit model 
sg.fit(train[predictors1],train[target]) 

I get the following error:

Fitting Base Models... 
Fitting model 01: RandomForestClassifier(bootstrap=True, class_weight=None,  criterion='gini', 
     max_depth=None, max_features='auto', max_leaf_nodes=None, 
     min_samples_leaf=1, min_samples_split=2, 
     min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=-1, 
     oob_score=False, random_state=None, verbose=0, 
     warm_start=False) 

Fold 1 

IndexError        Traceback (most recent call last) 
<ipython-input-47-dd6152e11339> in <module>() 
    1 # fit model 
    2 #sg.fit(X[:n_train],y[:n_train]) 
    ----> 3 sg.fit(train[columns],train[target]) 

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit(self, X, y) 
211 
212   def fit(self, X, y): 
--> 213     X_blend = self.fit_transform_base_models(X, y) 
214     self.fit_blending_model(X_blend, y) 
215 

c:\users\src\stacked-generalization\stacked_generalizer.pyc in  fit_transform_base_models(self, X, y) 
159 
160   def fit_transform_base_models(self, X, y): 
--> 161     self.fit_base_models(X, y) 
162     return self.transform_base_models(X) 
163 

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_base_models(self, X, y) 
129           print('Fold %d' % (j + 1)) 
130 
--> 131         X_train = X[train_idx] 
132         y_train = y[train_idx] 
133 

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1984   if isinstance(key, (Series, np.ndarray, Index, list)):
1985    # either boolean or fancy integer index
-> 1986    return self._getitem_array(key)
1987   elif isinstance(key, DataFrame):
1988    return self._getitem_frame(key)

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
2029   else:
2030    indexer = self.ix._convert_to_indexer(key, axis=1)
-> 2031    return self.take(indexer, axis=1, convert=True)
2032
2033  def _getitem_multilevel(self, key):

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\generic.pyc in take(self, indices, axis, convert, is_copy)
1626   new_data = self._data.take(indices,
1627          axis=self._get_block_manager_axis(axis),
-> 1628         convert=True, verify=True)
1629   result = self._constructor(new_data).__finalize__(self)
1630

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\internals.pyc in take(self, indexer, axis, verify, convert)
3635   n = self.shape[axis]
3636   if convert:
-> 3637    indexer = maybe_convert_indices(indexer, n)
3638
3639   if verify:

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\indexing.pyc in maybe_convert_indices(indices, n)
1808  mask = (indices >= n) | (indices < 0)
1809  if mask.any():
-> 1810   raise IndexError("indices are out-of-bounds")
1811  return indices
1812

IndexError: indices are out-of-bounds 
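
The failing step in the traceback, X[train_idx], can probably be reproduced without the stacking library at all. The sketch below is only an illustration: the toy DataFrame, its column names, and the hand-written train_idx are made up, standing in for train[predictors1] and the fold indices. The exception type depends on the pandas version, so both are caught.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train[predictors1]: 10 rows, 3 columns (names are made up).
X = pd.DataFrame(np.arange(30).reshape(10, 3), columns=["a", "b", "c"])

# Row positions such as a K-fold splitter might produce (hypothetical values).
train_idx = np.array([0, 1, 2, 3, 4, 5, 6, 7])

try:
    X[train_idx]  # on a DataFrame, [] with an integer array is a column lookup
    failed = False
except (IndexError, KeyError) as exc:
    # Older pandas raises IndexError here; newer releases raise KeyError.
    failed = True
    print(type(exc).__name__)
```

The row positions are being interpreted as column selectors, which is why they fall outside the valid range for a frame with only a handful of columns.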

Answer

Just change this line:

sg.fit(train[predictors1],train[target]) 

to:

sg.fit(train[predictors1].values,train[target].values) 

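The fit function in stacked_generalizer expects ndarrays as input. Internally it runs X[train_idx] with the row indices of each fold; on a pandas DataFrame that expression selects columns (axis=1 in the traceback) rather than rows, so the indices end up out of bounds.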
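
Here is a minimal sketch of what the conversion changes, assuming a small made-up DataFrame in place of train1.csv and a hand-written index array in place of the library's fold indices. It is not the library's code, just the same indexing pattern applied to ndarrays.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the real training set.
train = pd.DataFrame({
    "Feat1": np.arange(10.0),
    "Feat2": np.arange(10.0) * 2,
    "Y1": [0, 1] * 5,
})
predictors = ["Feat1", "Feat2"]

# .values returns plain NumPy arrays, which index by row position.
X = train[predictors].values
y = train["Y1"].values

# Hypothetical fold indices: the first eight rows.
train_idx = np.array([0, 1, 2, 3, 4, 5, 6, 7])

X_train = X[train_idx]  # selects rows, as the fold loop intends
y_train = y[train_idx]
print(X_train.shape, y_train.shape)

# If the DataFrame has to be kept, .iloc is the positional equivalent.
X_train_df = train[predictors].iloc[train_idx]
```

The .iloc line is only there for comparison. Since the library indexes with plain brackets, passing .values into sg.fit is the change that matters.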

I'll take a look and try this when I have some free time.

Thanks @Pranav Waila – Harry
