Pandas dataframe.groupby()

The DataFrame groupby method groups rows that have the same values in one or more columns into summary rows, for example "find the number of apples Steve has". It is often used with aggregate functions (sum, count, mean, min, max, etc.) to summarize the output by one or more columns.

Let's create a DataFrame with some values.

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['Name'] = ['Doe', 'Doe', 'Mike', 'Steve', 'Doe', 'Doe', 'Mike', 'Doe', 'Mike']
df['Fruit'] = ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Orange', 'Grapes', 'Grapes']
df['Count'] = [20, 10, 20, 30, 10, 40, 50, 10, 30]
df

    Name   Fruit  Count
0    Doe   Apple     20
1    Doe   Apple     10
2   Mike   Apple     20
3  Steve  Orange     30
4    Doe  Orange     10
5    Doe  Orange     40
6   Mike  Orange     50
7    Doe  Grapes     10
8   Mike  Grapes     30

Here you can see that three people (Doe, Mike and Steve) have different kinds of fruit (Apple, Orange and Grapes). You can summarize this table in several ways using the DataFrame groupby method.
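
As a rough illustration of how groupby combines with the aggregate functions mentioned earlier, the sketch below applies several of them at once to the Count column of each person. It rebuilds the same sample data so it can run on its own; the variable name summary is only an illustrative choice.

```python
import pandas as pd

# Same sample data as the DataFrame built above.
df = pd.DataFrame({
    'Name': ['Doe', 'Doe', 'Mike', 'Steve', 'Doe', 'Doe', 'Mike', 'Doe', 'Mike'],
    'Fruit': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Orange', 'Grapes', 'Grapes'],
    'Count': [20, 10, 20, 30, 10, 40, 50, 10, 30],
})

# Apply several aggregate functions to the Count column of each Name group.
# "summary" is a hypothetical name used only for this example.
summary = df.groupby('Name')['Count'].agg(['sum', 'mean', 'min', 'max', 'count'])
print(summary)
```

Each aggregate function becomes one column of the result, with one row per name.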


Pandas dataframe.groupby() examples

Let's try to get the total count of each fruit for each person from the above DataFrame using a groupby operation.

df.groupby(['Name', 'Fruit']).sum()

              Count
Name  Fruit        
Doe   Apple      30
      Grapes     10
      Orange     50
Mike  Apple      20
      Grapes     30
      Orange     50
Steve Orange     30

Apply reset_index()

df.groupby(['Name', 'Fruit'])['Count'].sum().reset_index()

    Name   Fruit  Count
0    Doe   Apple     30
1    Doe  Grapes     10
2    Doe  Orange     50
3   Mike   Apple     20
4   Mike  Grapes     30
5   Mike  Orange     50
6  Steve  Orange     30
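
One possible alternative to calling reset_index() afterwards is to pass as_index=False to groupby, which keeps the group keys as ordinary columns from the start. The sketch below compares the two approaches on the same sample data; flat_a and flat_b are hypothetical names chosen for this example.

```python
import pandas as pd

# Same sample data as the DataFrame built above.
df = pd.DataFrame({
    'Name': ['Doe', 'Doe', 'Mike', 'Steve', 'Doe', 'Doe', 'Mike', 'Doe', 'Mike'],
    'Fruit': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Orange', 'Grapes', 'Grapes'],
    'Count': [20, 10, 20, 30, 10, 40, 50, 10, 30],
})

# Approach 1: group, aggregate, then move the group keys out of the index.
flat_a = df.groupby(['Name', 'Fruit'])['Count'].sum().reset_index()

# Approach 2: ask groupby not to put the keys into the index at all.
flat_b = df.groupby(['Name', 'Fruit'], as_index=False)['Count'].sum()

print(flat_a)
print(flat_b)
```

Both results are flat tables with the columns Name, Fruit and Count, so the choice is mostly a matter of taste.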

You get the same totals in a different layout if you change the order of the groupby columns:

df.groupby(['Fruit', 'Name']).sum()

              Count
Fruit  Name        
Apple  Doe       30
       Mike      20
Grapes Doe       10
       Mike      30
Orange Doe       50
       Mike      50
       Steve     30

Pivot Table

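You can use the pivot functionality to arrange the data in a grid with one row per name and one column per fruit. In pandas 2.0 and later the pivot arguments must be passed by keyword.

df.groupby(['Name', 'Fruit'], as_index=False).sum().pivot(index='Name', columns='Fruit', values='Count').fillna(0)

Fruit  Apple  Grapes  Orange
Name                        
Doe     30.0    10.0    50.0
Mike    20.0    30.0    50.0
Steve    0.0     0.0    30.0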
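
As a possible shortcut, pivot_table can do the grouping, summing and pivoting in a single call. This is only a sketch of an alternative, not a replacement for the approach above; the variable name wide is an illustrative choice.

```python
import pandas as pd

# Same sample data as the DataFrame built above.
df = pd.DataFrame({
    'Name': ['Doe', 'Doe', 'Mike', 'Steve', 'Doe', 'Doe', 'Mike', 'Doe', 'Mike'],
    'Fruit': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Orange', 'Grapes', 'Grapes'],
    'Count': [20, 10, 20, 30, 10, 40, 50, 10, 30],
})

# Sum Count for every (Name, Fruit) pair and lay the result out as a grid.
# fill_value=0 fills name/fruit combinations that do not occur in the data.
# "wide" is a hypothetical name used only for this example.
wide = df.pivot_table(index='Name', columns='Fruit', values='Count',
                      aggfunc='sum', fill_value=0)
print(wide)
```

Because the missing combinations are filled inside pivot_table, no separate fillna(0) step is needed here.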

Find the total count of fruits by person

df.groupby(['Name']).sum(numeric_only=True)

       Count
Name        
Doe       90
Mike     100
Steve     30

Find the total count of fruits

df.groupby(['Fruit']).sum(numeric_only=True)

        Count
Fruit        
Apple      50
Grapes     40
Orange    130

How many rows does each fruit have in the table?

df.groupby(['Fruit']).size().reset_index()

    Fruit  0
0   Apple  3
1  Grapes  2
2  Orange  4
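
The row-count column in the result above is labelled 0 because size() returns an unnamed Series. A small sketch of one way to give it a readable label is to pass name to reset_index(); the label Rows and the variable rows_per_fruit are hypothetical choices for this example.

```python
import pandas as pd

# Same sample data as the DataFrame built above.
df = pd.DataFrame({
    'Name': ['Doe', 'Doe', 'Mike', 'Steve', 'Doe', 'Doe', 'Mike', 'Doe', 'Mike'],
    'Fruit': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Orange', 'Grapes', 'Grapes'],
    'Count': [20, 10, 20, 30, 10, 40, 50, 10, 30],
})

# size() counts the rows in each group; name= labels the resulting column.
rows_per_fruit = df.groupby('Fruit').size().reset_index(name='Rows')
print(rows_per_fruit)
```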