Practical Application of K-Means Using Scikit-learn in Python
Environmental construction and preparation work:
1. Install Python environment: Scikit learn is a machine learning library based on Python, and Python needs to be installed first. You can access it from the official website( https://www.python.org/ )Download the latest version of Python and install it.
2. Install the Scikit learn library: Use the pip command to install the Scikit learn library, as follows:
pip install scikit-learn
Dependent class libraries:
In this example, only the Scikit learn library is used.
Dataset introduction:
This example takes the Iris Iris dataset as an example, which is one of the very classic datasets in the field of machine learning, used for multi classification problems. The dataset contains 150 samples, each with 4 features, divided into 3 categories.
Dataset download website:
The Iris dataset is an example dataset that comes with the Scikit learn library and can be loaded using the following code:
python
from sklearn.datasets import load_iris
iris = load_iris()
Sample data:
The samples in the Iris dataset contain four features, namely calyx length, calyx width, petal length, and petal width. Each sample comes with a label, which represents the variety of flowers.
The complete sample code is as follows:
python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
#Load Iris dataset
iris = load_iris()
X = iris.data
#Create a KMeans model and set the number of cluster centers to 3
kmeans = KMeans(n_clusters=3)
#Training model
kmeans.fit(X)
#Forecast Category
labels = kmeans.predict(X)
#Output the category of each sample
for i in range(len(X)):
print("Sample:", X[i], " Label:", labels[i])
Running the above code can obtain the characteristics of each sample and the clustering category they belong to.
Summary:
This example introduces the practical application of the K-Means algorithm using the Scikit learn library in Python. Firstly, the environment construction and preparation work were carried out, and then the relevant class libraries and datasets were introduced. The sample code loads the Iris dataset, clusters the dataset using the K-Means algorithm, and outputs the clustering category for each sample. Finally, by running the code, the category of each sample can be obtained.