Practical Application of Scikit-learn Decision Tree in Python

Environment setup and preparation:

1. Make sure Python and pip are installed; Python 3.x is recommended.
2. Install the scikit-learn library with the following command: pip install -U scikit-learn
3. Import the required libraries:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
```

Dataset introduction and download link:

This example uses the Iris dataset, a classic classification benchmark of 150 samples divided into 3 classes with 50 samples each. Each sample has 4 features: sepal length, sepal width, petal length, and petal width.

Dataset download link: https://archive.ics.uci.edu/ml/datasets/iris

Sample code implementation:

```python
# Read the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

# Split the dataset into features and the target variable
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Train the model on the training set
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Compute the accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Summary:

This article walked through using the decision tree algorithm from Python's scikit-learn library for a classification task. It first covered environment setup, including installing scikit-learn and importing the required libraries, then introduced the Iris dataset and gave its download link. A complete code example followed, covering reading the dataset, splitting it into features and target variable and into training and test sets, training the model, making predictions, and calculating accuracy.
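If downloading from the UCI repository is inconvenient, scikit-learn also ships a bundled copy of the Iris dataset. The short sketch below is an optional alternative to the pd.read_csv step above, not part of the original example; it uses sklearn.datasets.load_iris.

```python
from sklearn.datasets import load_iris

# Load the copy of the Iris dataset bundled with scikit-learn
# (an alternative to downloading iris.data from UCI; no network access needed).
iris = load_iris()
X = iris.data    # 150 x 4 array of sepal/petal measurements
y = iris.target  # integer class labels 0, 1, 2

# The rest of the workflow (train_test_split, fit, predict) stays the same.
```

Note that load_iris returns integer-encoded labels rather than the string class names found in the CSV file, which does not affect training or the accuracy calculation.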
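A useful optional step after training, not shown in the sample code, is to inspect the rules the tree actually learned. The sketch below uses scikit-learn's export_text helper and assumes the clf and names variables defined in the sample code above.

```python
from sklearn.tree import export_text

# Print the learned decision rules as indented if/else text.
# `clf` must already be fitted; `names[:-1]` passes the four feature names.
rules = export_text(clf, feature_names=names[:-1])
print(rules)
```

For a graphical view of the same tree, sklearn.tree.plot_tree serves the same purpose if matplotlib is installed.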
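Accuracy summarizes performance in a single number; to see how the model behaves on each of the three classes, the metrics module imported earlier also provides a per-class report and a confusion matrix. This sketch assumes the y_test and y_pred variables from the sample code.

```python
# Per-class precision, recall, and F1-score for the three Iris classes.
print(metrics.classification_report(y_test, y_pred))

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count misclassified test samples.
print(metrics.confusion_matrix(y_test, y_pred))
```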