Using Scikit-learn in Python for feature scaling, feature standardization, feature crossing, and other preprocessing tasks
Preparation work:
First, install Python and Scikit-learn. Python can be installed through a distribution such as Anaconda, and Scikit-learn can then be installed with pip.
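For example, assuming pip is available on the command line, a typical installation looks like this (with Anaconda, conda install scikit-learn works as well):
pip install scikit-learn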
Dependent class libraries:
When performing feature scaling, feature standardization, and feature crossing, we mainly rely on the following library:
1. Scikit-learn: a machine learning library for data preprocessing and feature engineering (a quick check that it is installed is shown below).
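To confirm that the library is importable, you can print its version (a minimal check; the version number shown depends on your installation):
import sklearn
print(sklearn.__version__)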
Dataset:
In this example, we will use the Iris dataset that ships with Scikit-learn as the sample dataset. It contains 150 samples, each with 4 features (sepal length, sepal width, petal length, and petal width), and each sample is labeled with one of three iris species (Iris setosa, Iris versicolor, and Iris virginica).
Dataset download website:
Because Scikit-learn ships with the Iris dataset, there is no need to download it separately.
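Since the dataset ships with Scikit-learn, it can be loaded and inspected directly; the following small sketch uses the standard load_iris interface to confirm the shape and class names described above:
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)       # (150, 4): 150 samples, 4 features
print(iris.feature_names)    # sepal/petal length and width, in cm
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']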
Sample data:
Here are some data examples from the Iris dataset:
Sepal length (cm) | Sepal width (cm) | Petal length (cm) | Petal width (cm) | Species
--------------------------------------------------------------------------------------
5.1 | 3.5 | 1.4 | 0.2 | Iris setosa
4.9 | 3.0 | 1.4 | 0.2 | Iris setosa
4.7 | 3.2 | 1.3 | 0.2 | Iris setosa
…
6.4 | 2.8 | 5.6 | 2.2 | Iris virginica
6.3 | 2.8 | 5.1 | 1.5 | Iris virginica
The complete sample code is as follows:
import numpy as np
from sklearn import preprocessing
# Load the built-in Iris dataset
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
# Feature scaling: rescale each feature to the [0, 1] range
X_scaled = preprocessing.minmax_scale(X)
print("Scaled Features:")
print(X_scaled[:5])
# Feature standardization: transform each feature to zero mean and unit variance
X_standardized = preprocessing.scale(X)
print("Standardized Features:")
print(X_standardized[:5])
# Feature crossing: generate pairwise interaction terms
# (interaction_only=True keeps the bias column, the 4 original features, and their 6 pairwise products: 11 columns)
X_crossed = preprocessing.PolynomialFeatures(interaction_only=True).fit_transform(X)
print("Crossed Features:")
print(X_crossed[:5])
Running results:
Scaled Features:
[[0.22222222 0.625 0.06779661 0.04166667]
[0.16666667 0.41666667 0.06779661 0.04166667]
[0.11111111 0.5 0.05084746 0.04166667]
[0.08333333 0.45833333 0.08474576 0.04166667]
[0.19444444 0.66666667 0.06779661 0.04166667]]
Standardized Features:
[[-0.90068117 1.03205722 -1.3412724 -1.31297673]
[-1.14301691 -0.1249576 -1.3412724 -1.31297673]
[-1.38535265 0.33784833 -1.39813811 -1.31297673]
[-1.50652052 0.10644536 -1.2844067 -1.31297673]
[-1.02184904 1.26346019 -1.3412724 -1.31297673]]
Crossed Features:
[[ 1.    5.1   3.5   1.4   0.2  17.85  7.14  1.02  4.9   0.7   0.28]
 [ 1.    4.9   3.    1.4   0.2  14.7   6.86  0.98  4.2   0.6   0.28]
 [ 1.    4.7   3.2   1.3   0.2  15.04  6.11  0.94  4.16  0.64  0.26]
 [ 1.    4.6   3.1   1.5   0.2  14.26  6.9   0.92  4.65  0.62  0.3 ]
 [ 1.    5.    3.6   1.4   0.2  18.    7.    1.    5.04  0.72  0.28]]
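To sanity-check these results, you can verify the defining property of each transformation. The following lines continue from the script above (the column count of 11 assumes the default degree=2 with include_bias=True):
# Scaled features lie in [0, 1]
print(X_scaled.min(axis=0), X_scaled.max(axis=0))
# Standardized features have mean ~0 and standard deviation 1
print(X_standardized.mean(axis=0).round(6), X_standardized.std(axis=0).round(6))
# Crossed features: 1 bias column + 4 original features + 6 pairwise products = 11 columns
print(X_crossed.shape)   # (150, 11)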
Summary:
In this example, we used Scikit-learn to perform feature scaling, feature standardization, and feature crossing. Feature scaling maps feature values into a specific range (here [0, 1]), while feature standardization transforms them to a distribution with a mean of 0 and a variance of 1. Feature crossing generates interaction or higher-order polynomial terms from the original features, which can improve a model's expressive power. These preprocessing techniques are very important in machine learning and help us make better use of the information contained in the features.
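In practice, the same preprocessing is often done with Scikit-learn's transformer classes (MinMaxScaler, StandardScaler, PolynomialFeatures), which are fitted on training data and then reused on new data. The sketch below illustrates that workflow under the assumption of a simple train/test split; it is one reasonable pattern, not the only one:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PolynomialFeatures

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit each transformer on the training data only, then apply it to the test data
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

standardizer = StandardScaler()
X_train_std = standardizer.fit_transform(X_train)
X_test_std = standardizer.transform(X_test)

crosser = PolynomialFeatures(interaction_only=True)
X_train_crossed = crosser.fit_transform(X_train)
X_test_crossed = crosser.transform(X_test)
Fitting the transformers only on the training split avoids leaking information from the test data into the preprocessing step.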