DataFrame Aggregation and Grouping

Pandas aggregate() function is used to apply some aggregation across one or more column.
df.aggregate(['sum', 'min'])
Above statement will apply aggregation across all the columns in a DataFrame and calculate sum and min will be found for each numeric type column.

Lets create a DataFrame..

import pandas as pd import numpy as np df = pd.DataFrame() df['Name'] = ['John', 'Doe', 'Bill','Jim','Harry','Ben'] df['Age'] = [14, 12, 14,11,12,14] df['Category'] = ['A', 'E', 'B','E','C','D'] df['Height'] = [145, 152,167,136,149,161] df['Weight'] = [34, 54,38,39,44,51] df
Name Age Category Height Weight 0 John 14 A 145 34 1 Doe 12 E 152 54 2 Bill 14 B 167 38 3 Jim 11 E 136 39 4 Harry 12 C 149 44 5 Ben 14 D 161 51

The following operation will apply aggregation across all the columns in a DataFrame and calculate minimum and maximum will be found for each numeric type column.

df.aggregate(['min', 'max'])
Name Age Category Height Weight min Ben 11 A 136 34 max John 14 E 167 54

DataFrame aggregation across different columns

df.aggregate({"Age":['min', 'max'], "Height":['mean', 'max'], "Weight":['mean', 'min']})
Age Height Weight max 14.0 167.000000 NaN mean NaN 151.666667 43.333333 min 11.0 NaN 34.000000

Aggregation with groupby

df.groupby('Age').agg({"Height": "sum"})
Height Age 11 136 12 301 14 473

For multiple functions applied for one column use a list of tuples - names of new columns and aggregated functions:

df.groupby(['Age'])['Height'].agg([('Average','mean'),('Total','sum')])
Average Total Age 11 136.000000 136 12 150.500000 301 14 157.666667 473

If you want to pass multiple functions is possible pass list of tuples.

df.groupby(['Age', 'Category']).agg([('Average','mean'),('Total','sum')])
Height Weight Average Total Average Total Age Category 11 E 136 136 39 39 12 C 149 149 44 44 E 152 152 54 54 14 A 145 145 34 34 B 167 167 38 38 D 161 161 51 51

Instead of an aggregation function it is possible to pass list, tuple, set for converting column.

df.groupby('Age')['Category'].agg(list).reset_index()
Age Category 0 11 [E] 1 12 [E, C] 2 14 [A, B, D]

For converting to strings with separator use .join only if string column.

df.groupby('Age')['Category'].agg(','.join).reset_index()
Age Category 0 11 E 1 12 E,C 2 14 A,B,D

Some common aggregating functions are tabulated below:

Function Description
mean() Compute mean of groups
sum() Compute sum of group values
size() Compute group sizes
count() Compute count of group
std() Standard deviation of groups
first() Compute first of group values
last() Compute last of group values
min() Compute min of group values
max() Compute max of group values