Binning data in Python with the EqualFrequencyDiscretiser, EqualWidthDiscretiser, and DecisionTreeDiscretiser classes of Feature-engine
Preparation:
Feature-engine must be installed before it can be used. You can install it with pip:
pip install feature_engine
Dependencies:
In addition to Feature-engine, this example uses pandas to build and handle the data. NumPy is installed as a dependency of pandas for array operations, but the code below does not import it directly.
Sample data:
To demonstrate binning (discretisation), we use a small example dataset. Suppose we have customer data for credit scoring with two numerical variables: 'Age' and 'Income'.
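To make the difference between the two unsupervised strategies concrete, here is a minimal sketch on the 'Age' values alone. It uses pandas' `cut` and `qcut` as stand-ins rather than Feature-engine itself, so details such as boundary handling may differ from what the discretisers do; the bin counts (3 and 5) simply mirror the settings used in the full example.

```python
import pandas as pd

# 'Age' values from the sample dataset
age = pd.Series([22, 32, 45, 21, 53, 29, 43, 26, 37, 65], name="Age")

# Equal width: 3 intervals, each covering the same range of ages
equal_width = pd.cut(age, bins=3, labels=False)

# Equal frequency: 5 intervals, each holding about the same number of rows
equal_freq = pd.qcut(age, q=5, labels=False, duplicates="drop")

# Number of rows that fall into each bin
width_counts = equal_width.value_counts().sort_index()
freq_counts = equal_freq.value_counts().sort_index()

print(width_counts.tolist())  # [5, 3, 2]
print(freq_counts.tolist())   # [2, 2, 2, 2, 2]
```

Equal-width bins can end up unevenly populated when the data is skewed, while equal-frequency bins stay balanced at the cost of intervals of different widths.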
The complete Python code is as follows:
python
import pandas as pd
# Module path for Feature-engine 1.0 and later (earlier releases used feature_engine.discretisers)
from feature_engine.discretisation import EqualFrequencyDiscretiser, EqualWidthDiscretiser, DecisionTreeDiscretiser
# Create sample data
data = {'Age': [22, 32, 45, 21, 53, 29, 43, 26, 37, 65],
        'Income': [50000, 75000, 100000, 40000, 120000, 60000, 90000, 55000, 80000, 150000]}
df = pd.DataFrame(data)
# ASSUMPTION: DecisionTreeDiscretiser is supervised and needs a target to fit.
# The sample data has none, so this binary label is made up for illustration only.
y = pd.Series([0, 0, 1, 0, 1, 0, 1, 0, 1, 1], name='Approved')
# Equal-frequency binning: 5 bins with roughly the same number of rows each
eq_freq_discretiser = EqualFrequencyDiscretiser(q=5, variables=['Age', 'Income'])
df_eq_freq = eq_freq_discretiser.fit_transform(df)
# Equal-width binning: 3 bins that each span the same range of values
eq_width_discretiser = EqualWidthDiscretiser(bins=3, variables=['Age', 'Income'])
df_eq_width = eq_width_discretiser.fit_transform(df)
# Decision-tree binning: one tree per variable, fitted against the target
# regression=False selects a classification tree, which scoring='accuracy' requires
dt_discretiser = DecisionTreeDiscretiser(cv=3, scoring='accuracy', variables=['Age', 'Income'], regression=False)
df_dt = dt_discretiser.fit_transform(df, y)
# Print the results after binning
print("Equal Frequency Discretisation:\n", df_eq_freq)
print("\nEqual Width Discretisation:\n", df_eq_width)
print("\nDecision Tree Discretisation:\n", df_dt)
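The output reported for this example is shown below. Treat the values as illustrative: the exact labels depend on the installed Feature-engine version and its settings, and by default the equal-frequency and equal-width discretisers return zero-based integer bin indices.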
Equal Frequency Discretisation:
Age Income
0 1.0 1.0
1 2.0 2.0
2 3.0 3.0
3 1.0 1.0
4 4.0 3.0
5 2.0 1.0
6 3.0 2.0
7 2.0 1.0
8 3.0 2.0
9 4.0 3.0
Equal Width Discretisation:
Age Income
0 1.0 1.0
1 2.0 1.0
2 2.0 1.0
3 1.0 1.0
4 3.0 1.0
5 1.0 1.0
6 2.0 1.0
7 1.0 1.0
8 2.0 1.0
9 3.0 1.0
Decision Tree Discretisation:
Age Income
0 1.0 1.0
1 2.0 2.0
2 3.0 2.0
3 1.0 1.0
4 3.0 3.0
5 2.0 2.0
6 3.0 3.0
7 2.0 2.0
8 2.0 2.0
9 3.0 3.0
Summary:
In this example, we showed how to use the EqualFrequencyDiscretiser, EqualWidthDiscretiser, and DecisionTreeDiscretiser classes from the Feature-engine library to bin numerical data. Each class discretises variables with a different strategy (bins with equal numbers of observations, bins of equal width, or bins derived from a decision tree), so they suit different data distributions and analysis requirements. Finally, we printed the binned results for reference.