Practical Application of Scikit-learn Linear Regression in Python
Preparation and environment setup:
1. Install Python: Download the Python version for your operating system from the official Python website (https://www.python.org/downloads/) and install it.
2. Install scikit-learn: Open a command prompt and run the following command:
pip install -U scikit-learn
3. Install the other required libraries: This exercise also uses numpy, pandas, and matplotlib. Install them with the following command:
pip install numpy pandas matplotlib
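To confirm that the installation steps above worked, a short check along these lines can report which libraries are importable and at what version. This is a minimal sketch: `installed_versions` is a hypothetical helper name introduced here for illustration. Note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`.

```python
import importlib


def installed_versions(names=("sklearn", "numpy", "pandas", "matplotlib")):
    """Return {module name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.import_module(name).__version__
        except ImportError:
            # The library is missing from the current environment
            found[name] = None
    return found


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any library reported as NOT INSTALLED can be installed with the matching pip command above.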
Dependencies:
1. numpy: used for numerical computation and array operations.
2. pandas: used for data preprocessing and analysis.
3. matplotlib: used for data visualization.
4. scikit-learn: used to build and train machine learning models.
Dataset introduction:
This exercise uses the Boston Housing dataset, a classic dataset for regression problems. It was bundled with scikit-learn as `load_boston` until that loader was deprecated in version 1.0 and removed in version 1.2.
The dataset contains 506 samples, each with 13 features such as the per-capita crime rate and the average number of rooms per dwelling. The target variable is the median value of owner-occupied homes in the area, in thousands of dollars.
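For reference, with 13 features per sample, ordinary least squares linear regression (the method behind scikit-learn's `LinearRegression`) models the target as a weighted sum of the features plus an intercept, and picks the weights that minimize the sum of squared errors over the training samples. The symbols below are notation chosen for this sketch, not names from the dataset:

```latex
\hat{y}_i = w_0 + \sum_{j=1}^{13} w_j \, x_{ij},
\qquad
(w_0, \dots, w_{13}) = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Here $x_{ij}$ is feature $j$ of sample $i$, $y_i$ is the observed median home value, and $n$ is the number of training samples.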
Dataset download:
No manual download is required; the sample code below loads the dataset directly.
Sample data and code:
The following is a complete code sample for linear regression with scikit-learn:
python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
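from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Load the Boston housing dataset.
# load_boston was removed in scikit-learn 1.2, so read the original file,
# following the snippet given in scikit-learn's deprecation notice.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each sample spans two rows in the raw file: 11 values, then 3 values
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
df = pd.DataFrame(data, columns=feature_names)
df['PRICE'] = target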
# Separate the features from the target variable
X = df.drop('PRICE', axis=1).values
y = df['PRICE'].values
# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the linear regression model
model = LinearRegression()
# Train the model on the training set
model.fit(X_train, y_train)
# Predict prices for the test set
y_pred = model.predict(X_test)
# Evaluate the predictions with mean squared error
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
# Plot predicted prices against true prices
plt.scatter(y_test, y_pred)
# The dashed diagonal marks perfect predictions (predicted == true)
plt.plot([y.min(), y.max()], [y.min(), y.max()], '--', color='red', linewidth=2)
plt.xlabel('True Price')
plt.ylabel('Predicted Price')
plt.title('Boston Housing Dataset - Linear Regression')
plt.show()
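Running the code above fits the model, prints the mean squared error on the test set, and displays a scatter plot of predicted against true prices. Points close to the dashed red diagonal are accurate predictions.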
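The same split, fit, predict, and evaluate steps can also be tried offline on synthetic data, which makes it possible to check the result against known weights. The sketch below is an illustration under stated assumptions: the data are random numbers shaped like the dataset described earlier (506 samples, 13 features), and `true_coef` and `true_intercept` are hypothetical values chosen here, not housing data. It also computes the mean squared error by hand and adds two related metrics, RMSE and R^2, that the original sample does not use.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: random values, not real housing data)
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
true_coef = np.arange(1, 14, dtype=float)  # hypothetical weights 1..13
true_intercept = 5.0                       # hypothetical intercept
y = X @ true_coef + true_intercept + rng.normal(scale=0.5, size=506)

# Same split settings as the sample code
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE is the mean of the squared residuals; compute it by hand and compare
mse_manual = np.mean((y_test - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse_sklearn)    # same units as the target
r2 = r2_score(y_test, y_pred)  # fraction of variance explained

coef_error = np.max(np.abs(model.coef_ - true_coef))

print("Train/test sizes:", len(y_train), len(y_test))
print("Manual MSE matches sklearn:", np.isclose(mse_manual, mse_sklearn))
print("RMSE:", rmse)
print("R^2:", r2)
print("Largest coefficient error:", coef_error)
```

Because the synthetic target really is linear in the features, the fitted `model.coef_` and `model.intercept_` should land close to the chosen weights. On real data such as the housing dataset, no true weights are known, so the test-set metrics are the main check.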
Summary:
This exercise showed how to use scikit-learn for linear regression, using the Boston housing dataset to train a model, make predictions, and evaluate them. The example covers the basic workflow for building a machine learning model with scikit-learn, along with the supporting libraries used for data handling and visualization.