Linear regression analysis in Python using statsmodels
Environment setup:
1. Install Python (version 3.x is recommended)
2. Install the statsmodels library: run 'pip install statsmodels' from the command line
Dependencies:
- statsmodels: used to perform the linear regression analysis
- pandas: used to load the CSV data (installed automatically as a dependency of statsmodels)
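To confirm that the installation worked, the installed package versions can be checked from Python. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is chosen here for illustration and is not part of statsmodels or pandas.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the status of the two libraries this example relies on
for name, ver in installed_versions(["statsmodels", "pandas"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If either package is reported as not installed, rerun the pip command from step 2.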
Dataset introduction:
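The dataset used in this example is the Boston Housing Dataset, which contains data on housing prices and related features in the Boston area of the United States. It is not bundled with statsmodels, so this example loads it from a CSV file. The dataset contains 506 observations and 13 feature variables, plus the target variable MEDV (the median value of owner-occupied homes, in thousands of dollars).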
Dataset download website: https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv
Sample data:
The first five rows of the dataset are shown below (the CHAS column is omitted):
      CRIM     ZN  INDUS    NOX     RM    AGE     DIS  RAD     TAX  PTRATIO       B  LSTAT   MEDV
0  0.00632  18.00  2.310  0.538  6.575  65.20  4.0900    1  296.00    15.30  396.90   4.98  24.00
1  0.02731   0.00  7.070  0.469  6.421  78.90  4.9671    2  242.00    17.80  396.90   9.14  21.60
2  0.02729   0.00  7.070  0.469  7.185  61.10  4.9671    2  242.00    17.80  392.83   4.03  34.70
3  0.03237   0.00  2.180  0.458  6.998  45.80  6.0622    3  222.00    18.70  394.63   2.94  33.40
4  0.06905   0.00  2.180  0.458  7.147  54.20  6.0622    3  222.00    18.70  396.90   5.33  36.20
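A quick way to get a feel for these columns is to load a few rows into a pandas DataFrame and inspect them. The sketch below reproduces the five sample rows inline (without the index column) so that it runs without a network connection; the variable names `sample_text` and `sample` are chosen here for illustration.

```python
import io

import pandas as pd

# The five sample rows from above, whitespace-separated, index column left out
sample_text = """\
CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0.00632 18.00 2.310 0.538 6.575 65.20 4.0900 1 296.00 15.30 396.90 4.98 24.00
0.02731 0.00 7.070 0.469 6.421 78.90 4.9671 2 242.00 17.80 396.90 9.14 21.60
0.02729 0.00 7.070 0.469 7.185 61.10 4.9671 2 242.00 17.80 392.83 4.03 34.70
0.03237 0.00 2.180 0.458 6.998 45.80 6.0622 3 222.00 18.70 394.63 2.94 33.40
0.06905 0.00 2.180 0.458 7.147 54.20 6.0622 3 222.00 18.70 396.90 5.33 36.20
"""

# Parse the text, treating any run of whitespace as a column separator
sample = pd.read_csv(io.StringIO(sample_text), sep=r"\s+")

print(sample.shape)             # (number of rows, number of columns)
print(sample.dtypes)            # data type inferred for each column
print(sample["MEDV"].describe())  # summary statistics for the target variable
```

The same calls (`shape`, `dtypes`, `describe()`) work on the full dataset once it has been loaded with `pd.read_csv`.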
The complete code is as follows:
python
import pandas as pd
import statsmodels.api as sm

# Download the dataset
url = 'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'
data = pd.read_csv(url)

# Normalize column names to upper case so the selection below works
# even if the CSV header uses lower-case names
data.columns = data.columns.str.upper()

# Extract the features and the target variable
X = data[['CRIM', 'ZN', 'INDUS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']]
y = data['MEDV']

# Add a constant column so that the model includes an intercept term
X = sm.add_constant(X)

# Fit an ordinary least squares (OLS) linear regression model
model = sm.OLS(y, X)
results = model.fit()

# Print the regression results
print(results.summary())
Running the above code prints a summary of the regression results, including the coefficient, standard error, t-statistic, and p-value for each feature, along with overall fit statistics such as R-squared.