2022-11-08
How to create DataFrame with feature importance from XGBClassifier made by GridSearchCV?
I use scikit-learn's GridSearchCV to find the best parameters for my XGBClassifier model, with code like this:
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# hyperparameter grid to search
grid_params = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}
est = xgb.XGBClassifier()
grid_xgb = GridSearchCV(param_grid=grid_params, estimator=est, scoring='roc_auc', cv=4, verbose=0)
grid_xgb.fit(X_train, y_train)

print('best estimator:', grid_xgb.best_estimator_)
print('best AUC:', grid_xgb.best_score_)
print('best parameters:', grid_xgb.best_params_)
I need a feature importance DataFrame listing my variables and their importance, something like this:
variable | importance
---------|-----------
x1       | 12.456
x2       | 3.4509
x3       | 1.4456
...      | ...
How can I build this DataFrame from my XGBClassifier fitted via GridSearchCV?
I tried something like this:
f_imp_xgb = grid_xgb.get_booster().get_score(importance_type='gain')
keys = list(f_imp_xgb.keys())
values = list(f_imp_xgb.values())
df_f_imp_xgb = pd.DataFrame(data=values, index=keys, columns=['score']).sort_values(by='score', ascending=False)
But I get an error:
AttributeError: 'GridSearchCV' object has no attribute 'get_booster'
What can I do?
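GridSearchCV only wraps your model, so it has no get_booster method of its own. The refitted XGBClassifier is stored in its best_estimator_ attribute, so call get_booster on that instead (clf here is the fitted GridSearchCV object, grid_xgb in your code):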
clf.best_estimator_.get_booster().get_score(importance_type='gain')
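To see why the call has to go through best_estimator_, here is a minimal sketch. It swaps in scikit-learn's LogisticRegression and its coef_ attribute as a stand-in for XGBClassifier and get_booster (an assumption made only so the snippet runs without xgboost). The behaviour it shows depends on GridSearchCV's default refit=True; with refit=False there is no best_estimator_ at all.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# LogisticRegression stands in for XGBClassifier here (assumption).
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={'C': [0.1, 1.0]},
    scoring='roc_auc',
    cv=4,
)
search.fit(X, y)

# The search object does not expose estimator-specific attributes.
print(hasattr(search, 'coef_'))

# The refitted best model is an ordinary estimator with all of them.
print(type(search.best_estimator_).__name__)
print(hasattr(search.best_estimator_, 'coef_'))
```

The same pattern applies to any estimator-specific method or attribute, including get_booster on an XGBClassifier.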
Example:
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

np.random.seed(42)

# generate some dummy data
df = pd.DataFrame(data=np.random.normal(loc=0, scale=1, size=(100, 3)), columns=['x1', 'x2', 'x3'])
df['y'] = np.where(df.mean(axis=1) > 0, 1, 0)

# find the best model
X = df.drop(labels=['y'], axis=1)
y = df['y']

parameters = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}

clf = GridSearchCV(
    param_grid=parameters,
    estimator=XGBClassifier(random_state=42),
    scoring='roc_auc',
    cv=4,
    verbose=0
)
clf.fit(X, y)

# get the feature importances
importances = clf.best_estimator_.get_booster().get_score(importance_type='gain')
importances = pd.DataFrame(importances, index=[0]).transpose().rename(columns={0: 'importance'})
print(importances)
#     importance
# x1    1.782590
# x2    1.420949
# x3    1.500457
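The output above keeps the feature names in the index. To get the two-column variable | importance layout asked for in the question, sorted by importance, one option is to build the frame from the dict's items. This is a minimal sketch, not the only way to do it. The names gain_scores and importance_df are illustrative, and the dict is hard-coded with the values printed above so the snippet runs without xgboost. In practice it would be the dict returned by get_score.

```python
import pandas as pd

# Stand-in for the dict returned by
# clf.best_estimator_.get_booster().get_score(importance_type='gain').
# Values are copied from the printed output above (assumption for illustration).
gain_scores = {'x1': 1.782590, 'x2': 1.420949, 'x3': 1.500457}

# One row per feature, two named columns, highest importance first.
importance_df = (
    pd.DataFrame(list(gain_scores.items()), columns=['variable', 'importance'])
    .sort_values(by='importance', ascending=False)
    .reset_index(drop=True)
)
print(importance_df)
```

Resetting the index after sorting keeps row labels running 0, 1, 2 in ranked order, which is convenient when slicing the top features.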