Practical Application of Scikit-learn Decision Tree in Python
Environment setup and preparation:
1. Make sure Python and pip are installed; Python 3.x is recommended.
2. Install the scikit-learn library with the following command:
pip install -U scikit-learn
3. Import the required libraries:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
Dataset introduction and download link:
This example uses the Iris dataset, a commonly used classification benchmark of 150 samples divided evenly into 3 classes (50 samples each). Each sample has 4 features: sepal length, sepal width, petal length, and petal width.
Dataset download link: https://archive.ics.uci.edu/ml/datasets/iris
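If the UCI download link is unreachable, the same data also ships with scikit-learn itself. The following is a minimal sketch (assuming only a standard scikit-learn and pandas installation) that builds an equivalent DataFrame with the load_iris helper instead of downloading the file:
# Alternative: load the Iris data bundled with scikit-learn (no download needed)
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris(as_frame=True)                     # returns the data as pandas objects
dataset = iris.frame                                # 150 rows: 4 feature columns plus 'target'
dataset['class'] = iris.target_names[iris.target]   # map class indices 0/1/2 to species names
print(dataset.head())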
Sample code implementation:
# Read the dataset from the UCI repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)
# Split the dataset into features and the target variable
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
# Split the dataset into training and testing sets (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
# Create the decision tree classifier
clf = DecisionTreeClassifier()
# Train the model on the training set
clf.fit(X_train, y_train)
# Predict on the test set
y_pred = clf.predict(X_test)
# Calculate the accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
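Beyond a single accuracy number, scikit-learn can also print a per-class report and a text rendering of the learned tree, which helps show how the splits use the four features. The following is a minimal sketch that continues from the variables defined above (clf, y_test, y_pred, names):
# Per-class precision, recall, and F1 score on the test set
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
# Human-readable view of the fitted tree's split rules
from sklearn.tree import export_text
print(export_text(clf, feature_names=names[:-1]))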
Summary:
This article walked through a practical classification example using the decision tree algorithm from Python's scikit-learn library. It first covered environment setup, including installing scikit-learn and importing the required libraries, then introduced the Iris dataset and provided a download link. A complete code example followed, covering reading the dataset, splitting it into features and the target variable, training the model, making predictions, and computing the accuracy.