Logistic Regression in Python with Scikit-learn
Preparation:
1. Install Python: download and install the latest version of Python from the official website (https://www.python.org/downloads/).
2. Install Scikit-learn: open a command-line window and run the following command:
pip install scikit-learn
Dependencies:
- Pandas: used for data processing and analysis. The installation command is `pip install pandas`
- NumPy: used for numerical calculations and array operations. The installation command is `pip install numpy`
- Matplotlib: used for visualization. The installation command is `pip install matplotlib`
- Seaborn: a data visualization library built on Matplotlib. The installation command is `pip install seaborn`
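After installing, it can help to confirm that every package is visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `find_missing` is made up for this illustration, and note that Scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the libraries this tutorial relies on.
# Scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["sklearn", "pandas", "numpy", "matplotlib", "seaborn"]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If any name is reported as missing, rerun the corresponding `pip install` command in the same environment.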
Dataset introduction:
This hands-on exercise uses the Titanic passenger dataset, which contains feature information about each passenger (such as age, gender, and passenger class) together with a survival label. The dataset consists of two files, the training set (train.csv) and the test set (test.csv), which can be downloaded from the Kaggle website (https://www.kaggle.com/c/titanic/data).
Sample data:
Some of the data in the training set are as follows:
   PassengerId  Survived  Pclass  ...     Fare Cabin Embarked
0            1         0       3  ...   7.2500   NaN        S
1            2         1       1  ...  71.2833   C85        C
2            3         1       3  ...   7.9250   NaN        S
3            4         1       1  ...  53.1000  C123        S
4            5         0       3  ...   8.0500   NaN        S
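Missing values such as the NaN entries in the Cabin column matter for the preprocessing later on. As a small sketch, the five rows shown above can be rebuilt by hand (only the columns visible in the excerpt) to count the missing values per column. With the real file you would run the same calls on the result of `pd.read_csv('train.csv')`.

```python
import numpy as np
import pandas as pd

# The five sample rows from the excerpt; columns hidden behind "..." are omitted.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Fare": [7.2500, 71.2833, 7.9250, 53.1000, 8.0500],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Embarked": ["S", "C", "S", "S", "S"],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Share of passengers in this small excerpt who survived
survival_rate = sample["Survived"].mean()
print("Survival rate in the excerpt:", survival_rate)
```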
The complete sample code is as follows:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Read the training set
train_data = pd.read_csv('train.csv')

# Preprocess: keep the needed columns and drop rows with missing values
train_data = train_data[['Survived', 'Pclass', 'Sex', 'Age', 'Fare']]
train_data = train_data.dropna().copy()  # .copy() avoids chained-assignment warnings below
# Encode the gender column as numbers
train_data['Sex'] = train_data['Sex'].map({'female': 0, 'male': 1})

# Separate features and labels
X = train_data[['Pclass', 'Sex', 'Age', 'Fare']]
y = train_data['Survived']

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the logistic regression model
# (max_iter is raised from the default of 100 as a precaution against
# convergence warnings on unscaled features)
model = LogisticRegression(max_iter=1000)

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Compute and print the accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
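Accuracy is a single number, but a fitted `LogisticRegression` also exposes class probabilities through `predict_proba` and one coefficient per feature through `coef_`. The sketch below illustrates both on a small synthetic dataset. The data is randomly generated stand-in data, not the Titanic file, and the column meanings are only assumed for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 rows whose columns mimic Pclass, Sex, Age, Fare
X = np.column_stack([
    rng.integers(1, 4, size=200),   # class 1..3
    rng.integers(0, 2, size=200),   # 0 = female, 1 = male
    rng.uniform(1, 80, size=200),   # age
    rng.uniform(5, 100, size=200),  # fare
])

# Synthetic label loosely tied to the "Sex" column, with 20% of labels flipped
y = (X[:, 1] == 0).astype(int)
flip = rng.random(200) < 0.2
y = np.where(flip, 1 - y, y)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Each row holds [P(class 0), P(class 1)] for one sample
proba = model.predict_proba(X[:5])
print(proba)

# One coefficient per feature; the sign shows the direction of the effect
print(model.coef_)
```

On the real data, the same two calls can be made on the `model` trained above, for example `model.predict_proba(X_test)`.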
Code Description:
1. Import the required libraries and classes.
2. Use Pandas to read the training-set CSV file and preprocess the data: select the required feature columns, delete rows containing null values, and convert the gender column to a numerical representation.
3. Separate the features from the labels.
4. Use train_test_split to divide the dataset into training and test sets.
5. Create a LogisticRegression object, that is, the logistic regression model.
6. Train the model.
7. Use the trained model to make predictions on the test set.
8. Use accuracy_score to calculate the prediction accuracy.
9. Print the accuracy.
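The preprocessing in steps 2 and 3 can be tried out without downloading anything. Below is a minimal sketch on a hand-made four-row frame. The values and the helper name `preprocess` are hypothetical and exist only for this illustration; the column names match those used in the sample code.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame; "Name" stands in for the columns that get discarded
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 13.0],
    "Name": ["A", "B", "C", "D"],
})


def preprocess(df):
    """Select the needed columns, drop rows with nulls, and encode Sex as 0/1."""
    cols = ["Survived", "Pclass", "Sex", "Age", "Fare"]
    out = df[cols].dropna().copy()
    out["Sex"] = out["Sex"].map({"female": 0, "male": 1})
    return out


clean = preprocess(raw)

# Step 3: separate features and labels
X = clean[["Pclass", "Sex", "Age", "Fare"]]
y = clean["Survived"]
print(clean)
```

The row with the missing age is dropped, so three of the four rows remain, and the Sex column contains only 0 and 1 afterwards.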
Summary:
In this exercise, the logistic regression model from the Scikit-learn library was used to make predictions on the Titanic passenger data, and the prediction accuracy was calculated. The example shows how to use Scikit-learn to build a model and make predictions for a machine learning task, together with basic data preprocessing and feature selection.