Using a Random Forest Classifier with Scikit-learn in Python
Preparation:
1. Environment setup: install Python and the scikit-learn library.
2. Dependencies: numpy, pandas, matplotlib.
Dataset introduction:
We will use the Iris dataset that ships with scikit-learn. It contains 150 records, each with 4 features: sepal length, sepal width, petal length, and petal width. Each record belongs to one of three classes: Setosa, Versicolor, or Virginica.
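To make the dataset description concrete, the following is a minimal sketch that loads the dataset and inspects its shape, feature names, and class names. The variable name counts is introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# 150 records, 4 features per record
print(iris.data.shape)

# Names of the four measured features
print(iris.feature_names)

# Names of the three classes, indexed by the integer labels in iris.target
print(iris.target_names)

# Number of records per class
counts = np.bincount(iris.target)
print(counts)
```

Each class has the same number of records, so the dataset is balanced.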
Code implementation:
# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and test sets
np.random.seed(0)
indices = np.random.permutation(len(X))
X_train = X[indices[:-30]]
y_train = y[indices[:-30]]
X_test = X[indices[-30:]]
y_test = y[indices[-30:]]
# Create the random forest classifier model
model = RandomForestClassifier(n_estimators=10)
# Train the model
model.fit(X_train, y_train)
# Predict on the test set
predicted = model.predict(X_test)
# Print the predicted labels
print("Predicted result:", predicted)
# Print the true labels
print("Real result:", y_test)
# Compute and print the accuracy
accuracy = np.mean(predicted == y_test)
print("Accuracy:", accuracy)
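A single accuracy number does not show which classes are confused with each other. As a hedged sketch, the same split can be evaluated with a confusion matrix from sklearn.metrics. The random_state=0 argument on the classifier is an assumption added here so the trees are repeatable; it is not part of the code above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same manual split as the example: shuffle, hold out the last 30 samples
np.random.seed(0)
indices = np.random.permutation(len(X))
X_train, y_train = X[indices[:-30]], y[indices[:-30]]
X_test, y_test = X[indices[-30:]], y[indices[-30:]]

# Assumption: random_state=0 is added for repeatable trees
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, predicted, labels=[0, 1, 2])
print(cm)
```

The diagonal entries count correct predictions, so their sum divided by 30 equals the accuracy.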
Code description:
1. Import the necessary libraries.
2. Load the Iris dataset, storing the features in X and the target labels in y.
3. Use numpy's np.random.permutation function to shuffle the sample indices, then hold out the last 30 samples as the test set and use the remaining 120 for training.
4. Create a random forest classifier and set the n_estimators parameter to 10, meaning the forest uses 10 decision trees.
5. Train the model on the training set.
6. Predict labels for the test set.
7. Print the predicted and true labels.
8. Calculate and print the accuracy, which is the proportion of predictions that match the true labels.
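The manual shuffle in step 3 and the accuracy calculation in step 8 can also be written with scikit-learn's own helpers. The following is a minimal sketch, not a replacement for the code above: the stratify=y option and both random_state=0 values are assumptions chosen for illustration.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hold out 30 samples, as in the manual split.
# Assumption: stratify=y keeps the three classes equally represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, stratify=y, random_state=0
)

# Assumption: random_state=0 makes the forest repeatable between runs
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)

predicted = model.predict(X_test)

# Equivalent to np.mean(predicted == y_test)
accuracy = accuracy_score(y_test, predicted)
print("Accuracy:", accuracy)
```

With an integer test_size, train_test_split treats the value as an absolute number of test samples, so the split sizes match the original 120/30 division.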
Summary:
This example uses the random forest classifier in scikit-learn to classify the Iris dataset. It imports the necessary libraries, loads the dataset, and splits it into training and test sets. It then creates and trains the model, predicts on the test set, and prints the predicted labels, the true labels, and the accuracy. Random forests are a widely used ensemble method: scikit-learn provides RandomForestClassifier for classification and RandomForestRegressor for regression. They typically give good accuracy with little tuning and can handle large, high-dimensional datasets.
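Since the summary mentions regression, here is a hedged sketch of the regression counterpart. It assumes scikit-learn's bundled diabetes dataset (load_diabetes) as a stand-in regression task; the dataset, the 80/20 split, and random_state=0 are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Assumption: the bundled diabetes dataset stands in for a regression problem
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Same ensemble idea as the classifier, but tree outputs are averaged
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X_train, y_train)

predicted = regressor.predict(X_test)

# score() returns the coefficient of determination (R^2) on the test set
r2 = regressor.score(X_test, y_test)
print("R^2 on the test set:", r2)
```

For regression, accuracy is replaced by a metric such as R^2, because the targets are continuous values and not class labels.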