Handling categorical variables in Python with Feature-engine's RareLabelCategoricalEncoder, OrdinalCategoricalEncoder, and OneHotCategoricalEncoder (renamed RareLabelEncoder, OrdinalEncoder, and OneHotEncoder in Feature-engine 1.0)
Preparation:
1. Install the Feature-engine library with the command pip install feature-engine.
2. Prepare data: any dataset that contains categorical variables can serve as an example.
Dependencies:
1. pandas: used for data processing and analysis.
2. feature_engine.encoding: the module of the Feature-engine library that contains the encoders.
Sample data:
To demonstrate the encoding process, we use an example dataset that contains categorical variables. It has the following columns:
- Sex: gender, either 'male' or 'female'
- City: city of residence, one of 'New York', 'London', 'Paris', or 'Tokyo'
- Age: age, an integer
The complete code is as follows:
```python
import pandas as pd
# Feature-engine 1.0 and later use these class names; earlier releases called them
# RareLabelCategoricalEncoder, OrdinalCategoricalEncoder, and OneHotCategoricalEncoder.
from feature_engine.encoding import RareLabelEncoder, OrdinalEncoder, OneHotEncoder

# Create sample data
data = {
    'Sex': ['female', 'male', 'female', 'female', 'male'],
    'City': ['New York', 'London', 'Paris', 'London', 'Tokyo'],
    'Age': [25, 45, 10, 35, 60],
}
df = pd.DataFrame(data)

# Group rare City labels: categories seen in fewer than 3% of the rows are
# replaced by 'Rare', but only if City has more than n_categories distinct values
rare_encoder = RareLabelEncoder(tol=0.03, n_categories=4, variables=['City'])
df_encoded = rare_encoder.fit_transform(df)
print("Rare Label Encoded DataFrame:")
print(df_encoded)
print()

# Replace each category in Sex and City with an arbitrarily assigned integer
ordinal_encoder = OrdinalEncoder(encoding_method='arbitrary', variables=['Sex', 'City'])
df_encoded = ordinal_encoder.fit_transform(df_encoded)
print("Ordinal Encoded DataFrame:")
print(df_encoded)
print()

# One-hot encode Sex and City. Both columns hold integers after the ordinal
# step, so ignore_format=True (available in recent Feature-engine releases)
# tells the encoder to accept non-object columns
onehot_encoder = OneHotEncoder(variables=['Sex', 'City'], ignore_format=True)
df_encoded = onehot_encoder.fit_transform(df_encoded)
print("OneHot Encoded DataFrame:")
print(df_encoded)
```
The output is as follows:
```
Rare Label Encoded DataFrame:
      Sex      City  Age
0  female  New York   25
1    male    London   45
2  female     Paris   10
3  female    London   35
4    male     Tokyo   60

Ordinal Encoded DataFrame:
   Sex  City  Age
0    0     0   25
1    1     1   45
2    0     2   10
3    0     1   35
4    1     3   60

OneHot Encoded DataFrame:
   Age  Sex_0  Sex_1  City_0  City_1  City_2  City_3
0   25      1      0       1       0       0       0
1   45      0      1       0       1       0       0
2   10      1      0       0       0       1       0
3   35      1      0       0       1       0       0
4   60      0      1       0       0       0       1
```
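In the output above the City column is unchanged by the rare-label step, because every city accounts for at least 20% of the rows, well above tol=0.03. To show what grouping looks like when the threshold does apply, here is a minimal standard-library sketch. The helper group_rare_labels, its "frequency below tol" rule, and the threshold of 0.3 are assumptions made for illustration; this is not Feature-engine's implementation.

```python
from collections import Counter

def group_rare_labels(values, tol=0.3, replace_with="Rare"):
    """Replace labels whose relative frequency is below `tol`.

    Hypothetical helper for illustration only.
    """
    counts = Counter(values)
    total = len(values)
    # A label is kept when its share of the rows reaches the threshold
    frequent = {label for label, n in counts.items() if n / total >= tol}
    return [v if v in frequent else replace_with for v in values]

cities = ["New York", "London", "Paris", "London", "Tokyo"]

# London covers 40% of the rows and is kept; the other cities (20% each) are grouped
print(group_rare_labels(cities, tol=0.3))

# With the threshold from the example, no label is rare and the list is unchanged
print(group_rare_labels(cities, tol=0.03))
```

With the larger threshold, only London survives as its own category, which is the kind of result the rare-label step is meant to produce on data with many infrequent labels.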
Summary:
In the example above, we used three encoders from the Feature-engine library to process categorical variables. The rare-label encoder groups infrequent categories under a single replacement label; in this small dataset no city falls below the 3% threshold, so that step leaves the data unchanged. The ordinal encoder maps each category to an integer. The one-hot encoder converts each category into its own binary (0/1) column. Each encoder handles categorical variables differently, so choose the one that fits the model and the data at hand.
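The two mappings described in the summary can also be sketched without the library, which may help when checking an encoder's output by hand. The helpers ordinal_mapping and one_hot are hypothetical names introduced here, and numbering labels by first appearance is an assumption chosen to match the City column in the output shown earlier.

```python
def ordinal_mapping(values):
    """Map each distinct label to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        # len(mapping) is the next unused integer when v is seen for the first time
        mapping.setdefault(v, len(mapping))
    return mapping

def one_hot(values, prefix):
    """Return one 0/1 indicator list per distinct label, keyed by column name."""
    labels = list(dict.fromkeys(values))  # distinct labels, first-seen order
    return {f"{prefix}_{label}": [int(v == label) for v in values] for label in labels}

cities = ["New York", "London", "Paris", "London", "Tokyo"]

mapping = ordinal_mapping(cities)
codes = [mapping[c] for c in cities]
print(mapping)
print(codes)

for name, column in one_hot(cities, prefix="City").items():
    print(name, column)
```

The integer codes carry no inherent order, so they suit models that do not read magnitude into the numbers, while the indicator columns avoid that issue at the cost of one column per category.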