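Feature Selection in Python with Feature-engine's DropFeatures, RecursiveFeatureElimination, and SelectByShuffling
Preparation:
1. Make sure a Python environment is installed.
2. Install the Feature-engine library with the following command:
pip install feature-engine
3. Import the required classes and functions: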
python
# Note: load_boston was removed in scikit-learn 1.2, so this import needs scikit-learn < 1.2
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from feature_engine.selection import DropFeatures, RecursiveFeatureElimination, SelectByShuffling
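Dependencies:
- Feature-engine: provides feature engineering methods, including feature selection, variable transformation, and discretization.
Sample data:
This example uses the Boston housing dataset bundled with scikit-learn. The dataset has 13 feature variables that are used to predict house prices. Note that load_boston was removed in scikit-learn 1.2, so the code below needs an older scikit-learn release.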
python
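import pandas as pd
# Load the dataset
boston = load_boston()
# Build a DataFrame so that the features keep their column names
# (DropFeatures refers to columns by name, e.g. 'CRIM')
X = pd.DataFrame(boston.data, columns=boston.feature_names)
# Keep the target as a Series so its index stays aligned with X after the split
y = pd.Series(boston.target)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)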
Complete code example:
python
import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs scikit-learn < 1.2
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from feature_engine.selection import DropFeatures, RecursiveFeatureElimination, SelectByShuffling
# Load the dataset
boston = load_boston()
# Build a DataFrame so that the features keep their column names
X = pd.DataFrame(boston.data, columns=boston.feature_names)
# Keep the target as a Series so its index stays aligned with X after the split
y = pd.Series(boston.target)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Drop features by name with DropFeatures
drop_feats = DropFeatures(features_to_drop=['CRIM', 'ZN', 'INDUS'])
X_train_drop = drop_feats.fit_transform(X_train)
# Recursive feature elimination. Feature-engine's class has no n_features_to_select;
# it removes a feature when the drop in the cross-validated score stays below `threshold`.
rfe = RecursiveFeatureElimination(estimator=LinearRegression(), scoring='r2', cv=3, threshold=0.01)
X_train_rfe = rfe.fit_transform(X_train, y_train)
# Feature selection by shuffling
sel_shuffle = SelectByShuffling(estimator=LinearRegression(), scoring='r2', cv=10, random_state=42)
X_train_shuffle = sel_shuffle.fit_transform(X_train, y_train)
# Print the features that remain after each transformer
print("Selected Features using DropFeatures:", list(X_train_drop.columns))
print("Selected Features using RecursiveFeatureElimination:", list(X_train_rfe.columns))
print("Selected Features using SelectByShuffling:", list(X_train_shuffle.columns))
# Train a model on each feature subset and score it on the test set
lr = LinearRegression()
lr.fit(X_train_drop, y_train)
X_test_drop = drop_feats.transform(X_test)
y_pred = lr.predict(X_test_drop)
print("R-Squared (DropFeatures):", lr.score(X_test_drop, y_test))
lr.fit(X_train_rfe, y_train)
X_test_rfe = rfe.transform(X_test)
y_pred = lr.predict(X_test_rfe)
print("R-Squared (RecursiveFeatureElimination):", lr.score(X_test_rfe, y_test))
lr.fit(X_train_shuffle, y_train)
X_test_shuffle = sel_shuffle.transform(X_test)
y_pred = lr.predict(X_test_shuffle)
print("R-Squared (SelectByShuffling):", lr.score(X_test_shuffle, y_test))
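The recursive elimination step in the listing above can feel like a black box, so here is a minimal pure-Python sketch of the general idea: rank the features by importance, try removing the least important one, refit, and keep the removal only if the score barely changes. This is not Feature-engine's implementation. The helper names (fit_linear, r2_of_fit, recursive_elimination), the toy data, the gradient-descent linear fit that stands in for the estimator, and scoring on the training data instead of cross-validation are all assumptions made for illustration.

```python
import random


def fit_linear(rows, y, lr=0.5, epochs=300):
    """Fit y ~ w.x + b by batch gradient descent (stand-in for any estimator)."""
    n, k = len(rows), len(rows[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        errors = [sum(wi * xi for wi, xi in zip(w, row)) + b - t
                  for row, t in zip(rows, y)]
        w = [wj - lr * sum(e * row[j] for e, row in zip(errors, rows)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b


def r2_of_fit(rows, y, cols):
    """Fit on the chosen columns; return (R^2 on the same data, weights)."""
    sub = [[row[j] for j in cols] for row in rows]
    w, b = fit_linear(sub, y)
    preds = [sum(wi * xi for wi, xi in zip(w, r)) + b for r in sub]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot, w


def recursive_elimination(rows, y, names, threshold=0.01):
    """Drop a feature when removing it costs no more than `threshold` in R^2."""
    kept = list(range(len(names)))
    baseline, weights = r2_of_fit(rows, y, kept)
    # Importance here is the absolute coefficient; least important comes first.
    order = sorted(kept, key=lambda j: abs(weights[j]))
    for j in order:
        candidate = [c for c in kept if c != j]
        if not candidate:
            break  # never remove the last remaining feature
        score, _ = r2_of_fit(rows, y, candidate)
        if baseline - score <= threshold:
            kept = candidate   # the feature was not needed
            baseline = score   # later comparisons use the reduced model
    return [names[j] for j in kept]


# Toy data (hypothetical): the target depends on "a" and "c" but not on "b".
rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * r[0] - 2.0 * r[2] for r in rows]
names = ["a", "b", "c"]

selected = recursive_elimination(rows, y, names)
print("Kept features:", selected)
```

In this sketch the threshold plays the same role as the threshold argument of a performance-based selector: a larger value tolerates a bigger loss in score and therefore removes more features.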
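Summary:
This article used three classes from the Feature-engine library for feature selection. DropFeatures removes features that you name explicitly. RecursiveFeatureElimination removes features one at a time, starting from the least important, and keeps a removal only when model performance barely changes. SelectByShuffling shuffles the values of one feature at a time and keeps the features whose shuffling causes a clear drop in performance. Comparing the test-set R-squared of the resulting feature subsets helps you choose a subset that suits the dataset and model, which can in turn help model performance.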
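To make the shuffling idea concrete, the following pure-Python sketch fits a small linear model, shuffles one feature at a time, and records how much the R-squared score drops. It is an illustration under stated assumptions, not the library's code: the helper names (fit_linear, predict, r2_score, shuffle_drops), the toy data, the gradient-descent fit, the absence of cross-validation, and the cutoff at the mean drop are all choices of this sketch.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination, computed by hand."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(rows, y, lr=0.5, epochs=300):
    """Fit y ~ w.x + b by batch gradient descent (stand-in for any estimator)."""
    n, k = len(rows), len(rows[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        errors = [sum(wi * xi for wi, xi in zip(w, row)) + b - t
                  for row, t in zip(rows, y)]
        w = [wj - lr * sum(e * row[j] for e, row in zip(errors, rows)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b


def predict(model, rows):
    w, b = model
    return [sum(wi * xi for wi, xi in zip(w, row)) + b for row in rows]


def shuffle_drops(model, rows, y, names, seed=0):
    """Return the drop in R^2 caused by shuffling each feature in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, predict(model, rows))
    drops = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        drops[name] = baseline - r2_score(y, predict(model, shuffled))
    return drops


# Toy data (hypothetical): only the first feature carries information.
rng = random.Random(42)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in rows]
names = ["signal", "noise"]

model = fit_linear(rows, y)
drops = shuffle_drops(model, rows, y, names)

# Sketch-level rule: keep features whose drop is at least the mean drop.
cutoff = sum(drops.values()) / len(drops)
selected = [name for name in names if drops[name] >= cutoff]
print("Score drop per feature:", drops)
print("Kept features:", selected)
```

Shuffling the informative feature destroys the model's predictions, while shuffling the irrelevant one changes almost nothing, which is why the size of the drop works as an importance signal.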