DataFrame Aggregation and Grouping
Pandas aggregate() function is used to apply some aggregation across one or more column.
df.aggregate(['sum', 'min'])
Above statement will apply aggregation across all the columns in a DataFrame and calculate sum and min will be found for each numeric type column.
Lets create a DataFrame..
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill','Jim','Harry','Ben']
df['Age'] = [14, 12, 14,11,12,14]
df['Category'] = ['A', 'E', 'B','E','C','D']
df['Height'] = [145, 152,167,136,149,161]
df['Weight'] = [34, 54,38,39,44,51]
df
Name Age Category Height Weight
0 John 14 A 145 34
1 Doe 12 E 152 54
2 Bill 14 B 167 38
3 Jim 11 E 136 39
4 Harry 12 C 149 44
5 Ben 14 D 161 51
The following operation will apply aggregation across all the columns in a DataFrame and calculate minimum and maximum will be found for each numeric type column.
df.aggregate(['min', 'max'])
Name Age Category Height Weight
min Ben 11 A 136 34
max John 14 E 167 54
DataFrame aggregation across different columns
df.aggregate({"Age":['min', 'max'],
"Height":['mean', 'max'],
"Weight":['mean', 'min']})
Age Height Weight
max 14.0 167.000000 NaN
mean NaN 151.666667 43.333333
min 11.0 NaN 34.000000
Aggregation with groupby
df.groupby('Age').agg({"Height": "sum"})
Height
Age
11 136
12 301
14 473
For multiple functions applied for one column use a list of tuples - names of new columns and aggregated functions:
df.groupby(['Age'])['Height'].agg([('Average','mean'),('Total','sum')])
Average Total
Age
11 136.000000 136
12 150.500000 301
14 157.666667 473
If you want to pass multiple functions is possible pass list of tuples.
df.groupby(['Age', 'Category']).agg([('Average','mean'),('Total','sum')])
Height Weight
Average Total Average Total
Age Category
11 E 136 136 39 39
12 C 149 149 44 44
E 152 152 54 54
14 A 145 145 34 34
B 167 167 38 38
D 161 161 51 51
Instead of an aggregation function it is possible to pass list, tuple, set for converting column.
df.groupby('Age')['Category'].agg(list).reset_index()
Age Category
0 11 [E]
1 12 [E, C]
2 14 [A, B, D]
For converting to strings with separator use .join only if string column.
df.groupby('Age')['Category'].agg(','.join).reset_index()
Age Category
0 11 E
1 12 E,C
2 14 A,B,D
Some common aggregating functions are tabulated below:
Function | Description |
---|---|
mean() | Compute mean of groups |
sum() | Compute sum of group values |
size() | Compute group sizes |
count() | Compute count of group |
std() | Standard deviation of groups |
first() | Compute first of group values |
last() | Compute last of group values |
min() | Compute min of group values |
max() | Compute max of group values |
Related Topics
- Pandas DataFrame: GroupBy Examples
- How to Sort Pandas DataFrame
- Pandas DataFrame: query() function
- Finding and removing duplicate rows in Pandas DataFrame
- How to Replace NaN Values With Zeros in Pandas DataFrame
- How to read CSV File using Pandas DataFrame.read_csv()
- How to Convert Pandas DataFrame to NumPy Array
- How to shuffle a DataFrame rows
- Import multiple csv files into one pandas DataFrame
- Create new column in DataFrame based on the existing columns
- New Pandas dataframe column based on if-else condition
- How to Convert a Dictionary to Pandas DataFrame
- Rename Pandas columns/index names (labels)