Pandas is a Python library that supports common data aggregation and summary statistics, including count, sum, mean, median, variance, and standard deviation.
Preparation:
1. Install Python and Pandas: Download and install Python from the official Python website (https://www.python.org/downloads/), then run pip install pandas to install Pandas.
2. Import the Pandas library: Import Pandas in your Python code to use its functions and classes.
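To confirm that the installation step worked, a minimal check is to import both libraries and print their versions. The variable names here are only illustrative, and the version numbers printed depend on your environment.

```python
# Confirm that Pandas and NumPy can be imported after installation
import numpy as np
import pandas as pd

pandas_version = pd.__version__
numpy_version = np.__version__

print('Pandas version:', pandas_version)
print('NumPy version:', numpy_version)
```

If either import fails with ModuleNotFoundError, the package is not installed in the Python environment you are running.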
Required libraries:
1. Pandas: Used for data processing and analysis.
2. NumPy: Used for mathematical calculations and array operations. It is installed automatically as a dependency of Pandas.
Dataset introduction:
We will use a dataset called 'sales.csv'. It contains information about sales orders, including order ID, customer ID, product ID, order date, and sales volume.
Dataset download website:
You can download the 'sales.csv' dataset from the following website: https://example.com/sales.csv
Sample data:
The following rows show the structure of the 'sales.csv' dataset:
|Order ID | Customer ID | Product ID | Order Date | Sales|
|----------|-------------|------------|-------------|-------|
|1 | A001 | P001 | 2020-01-01 | 100|
|2 | A002 | P002 | 2020-01-02 | 200|
|3 | A003 | P003 | 2020-01-02 | 300|
|4 | A001 | P002 | 2020-01-03 | 150|
|5 | A002 | P001 | 2020-01-03 | 250|
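If you do not have the file at hand, the five sample rows can be loaded directly from a string. This is a minimal sketch: it assumes the CSV header names match the table columns exactly, and the names sample_csv and sample are hypothetical, introduced only for illustration.

```python
import io

import pandas as pd

# The five sample rows from the table, written out as CSV text.
# Assumption: the real file uses these exact column names.
sample_csv = """Order ID,Customer ID,Product ID,Order Date,Sales
1,A001,P001,2020-01-01,100
2,A002,P002,2020-01-02,200
3,A003,P003,2020-01-02,300
4,A001,P002,2020-01-03,150
5,A002,P001,2020-01-03,250
"""

# read_csv accepts any file-like object, so StringIO stands in for the file;
# parse_dates converts the 'Order Date' column to datetime values
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=['Order Date'])

print(sample)
print(sample.dtypes)
```

Parsing the order date up front is optional for the statistics in this example, but it makes later date-based grouping or filtering easier.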
The complete example code is as follows:
```python
# Import the required libraries
import pandas as pd
import numpy as np

# Read the dataset
data = pd.read_csv('sales.csv')

# Count: number of non-missing values in the column
count = data['Order ID'].count()
print('Count:', count)

# Sum
sum_sales = data['Sales'].sum()
print('Sum:', sum_sales)

# Mean
mean_sales = data['Sales'].mean()
print('Mean:', mean_sales)

# Median
median_sales = data['Sales'].median()
print('Median:', median_sales)

# Variance (sample variance; Pandas uses ddof=1 by default)
var_sales = data['Sales'].var()
print('Variance:', var_sales)

# Standard deviation (sample standard deviation, ddof=1)
std_sales = data['Sales'].std()
print('Standard Deviation:', std_sales)
```
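For the five sample rows shown above, the code outputs the following results. Pandas computes the sample variance and standard deviation, dividing by n - 1:
```
Count: 5
Sum: 1000
Mean: 200.0
Median: 200.0
Variance: 6250.0
Standard Deviation: 79.05694150420949
```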
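The variance and standard deviation depend on which divisor is used. Pandas defaults to the sample versions (ddof=1, dividing by n - 1), while NumPy's np.var and np.std default to the population versions (ddof=0, dividing by n). The sketch below recomputes both for the five sample Sales values; the variable names are illustrative only.

```python
import pandas as pd

# The five Sales values from the sample data
sales = pd.Series([100, 200, 300, 150, 250], name='Sales')

# Sample statistics: the Pandas default, ddof=1 (divide by n - 1)
sample_var = sales.var()
sample_std = sales.std()

# Population statistics: ddof=0 (divide by n)
population_var = sales.var(ddof=0)
population_std = sales.std(ddof=0)

print('Sample variance:', sample_var)              # 6250.0
print('Population variance:', population_var)      # 5000.0
print('Sample std (rounded):', round(sample_std, 2))          # 79.06
print('Population std (rounded):', round(population_std, 2))  # 70.71
```

The squared deviations from the mean of 200 sum to 25000, so the sample variance is 25000 / 4 = 6250 and the population variance is 25000 / 5 = 5000.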
This completes the example of using Pandas to compute common aggregations and summary statistics.
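As a possible next step, the same statistics can be computed in a single call with agg, and per group with groupby. This is a sketch that rebuilds the sample rows in memory rather than reading 'sales.csv'; grouping by customer is an assumption about what a reader might want, not something the original example does.

```python
import pandas as pd

# The sample rows, rebuilt in memory so the example runs without the CSV file
data = pd.DataFrame({
    'Order ID': [1, 2, 3, 4, 5],
    'Customer ID': ['A001', 'A002', 'A003', 'A001', 'A002'],
    'Product ID': ['P001', 'P002', 'P003', 'P002', 'P001'],
    'Order Date': ['2020-01-01', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03'],
    'Sales': [100, 200, 300, 150, 250],
})

# Several statistics for one column in a single call
summary = data['Sales'].agg(['count', 'sum', 'mean', 'median', 'var', 'std'])
print(summary)

# The same idea per customer: one row per 'Customer ID'
per_customer = data.groupby('Customer ID')['Sales'].agg(['count', 'sum', 'mean'])
print(per_customer)
```

With the sample data, customer A001 has two orders totalling 250, A002 has two orders totalling 450, and A003 has one order of 300.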