New dataframe column based on a given condition
There are times when you would like to add a new DataFrame column based on some condition. There is no single obvious Pandas function for this, so it is usually done by combining a few NumPy and Pandas operations. Suppose you have a DataFrame like this:
Name A B
0 John 2 2
1 Doe 3 1
2 Bill 1 3
You want to create a new column "Result" based on the following condition:
- A == B: 0
- A > B: 1
- A < B: -1
So, after applying the above conditions, the DataFrame should be:
Name A B Result
0 John 2 2 0
1 Doe 3 1 1
2 Bill 1 3 -1
How can you achieve this with Pandas DataFrame operations?
Let's create a DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill']
df['A'] = [2,3,1]
df['B'] = [2, 1, 3]
df
Name A B
0 John 2 2
1 Doe 3 1
2 Bill 1 3
Vectorized Version
df['Result'] = np.where(
    df['A'] == df['B'], 0, np.where(
        df['A'] > df['B'], 1, -1))
Full Source
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill']
df['A'] = [2,3,1]
df['B'] = [2, 1, 3]
df
df['Result'] = np.where(
    df['A'] == df['B'], 0, np.where(
        df['A'] > df['B'], 1, -1))
df

Name A B Result
0 John 2 2 0
1 Doe 3 1 1
2 Bill 1 3 -1
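Nested np.where calls get harder to read as the number of conditions grows. As an alternative to the approach above (not a replacement for it), here is a minimal sketch using np.select, which takes a list of conditions and a matching list of values. The names conditions and choices are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill'],
                   'A': [2, 3, 1],
                   'B': [2, 1, 3]})

# Conditions are checked in order; the first one that matches wins.
conditions = [df['A'] == df['B'], df['A'] > df['B']]
choices = [0, 1]

# Rows matching neither condition (here: A < B) get the default value.
df['Result'] = np.select(conditions, choices, default=-1)
print(df)
```

Adding another case then only means appending one entry to each list.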
Using if..else
def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val
df['Result'] = df.apply(f, axis=1)
Full Source
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill']
df['A'] = [2,3,1]
df['B'] = [2, 1, 3]
def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val
df['Result'] = df.apply(f, axis=1)
df
Name A B Result
0 John 2 2 0
1 Doe 3 1 1
2 Bill 1 3 -1
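With axis=1, apply calls the Python function once per row, so it is easy to read but tends to be slower than the vectorized version on large DataFrames. The sketch below is only a sanity check that the two approaches give the same answer; the DataFrame big and its random contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 1000 rows of small random integers.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 5, size=(1000, 2)), columns=['A', 'B'])

def f(row):
    if row['A'] == row['B']:
        return 0
    elif row['A'] > row['B']:
        return 1
    return -1

# Row-wise version: one Python function call per row.
row_wise = big.apply(f, axis=1)

# Vectorized version: whole columns are compared at once.
vectorized = np.where(big['A'] == big['B'], 0,
                      np.where(big['A'] > big['B'], 1, -1))

# Both approaches should agree on every row.
same = bool((row_wise.to_numpy() == vectorized).all())
print(same)
```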
Operation on Single column
Suppose you have a DataFrame like this:
Marks
0 82
1 38
2 44
3 51
4 67
You would like to add one more column for Result based on certain conditions.
- Marks <= 39 : Failed
- Marks >= 40 and <=49 : Passed
- Marks >= 50 and <=59 : Second Class
- Marks >= 60 and <=79 : First Class
- Marks >= 80 and <=100 : Top
df.loc[df['Marks'] <= 39, 'Result'] = 'Failed'
df.loc[(df['Marks'] >= 40) & (df['Marks'] <= 49) , 'Result'] = 'Passed'
df.loc[(df['Marks'] >= 50) & (df['Marks'] <= 59) , 'Result'] = 'Second Class'
df.loc[(df['Marks'] >= 60) & (df['Marks'] <= 79) , 'Result'] = 'First Class'
df.loc[(df['Marks'] >= 80) & (df['Marks'] <= 100) , 'Result'] = 'Top'
Full Source
import pandas as pd
numbers = {'Marks': [82,38,44,51,67]}
df = pd.DataFrame(numbers,columns=['Marks'])
df.loc[df['Marks'] <= 39, 'Result'] = 'Failed'
df.loc[(df['Marks'] >= 40) & (df['Marks'] <= 49) , 'Result'] = 'Passed'
df.loc[(df['Marks'] >= 50) & (df['Marks'] <= 59) , 'Result'] = 'Second Class'
df.loc[(df['Marks'] >= 60) & (df['Marks'] <= 79) , 'Result'] = 'First Class'
df.loc[(df['Marks'] >= 80) & (df['Marks'] <= 100) , 'Result'] = 'Top'
print(df)

Marks Result
0 82 Top
1 38 Failed
2 44 Passed
3 51 Second Class
4 67 First Class
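When the conditions are consecutive numeric ranges like these, pd.cut can assign all the labels in one call. This is an alternative sketch rather than the method used above, and it assumes the marks are whole numbers between 0 and 100; the names bins and labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Marks': [82, 38, 44, 51, 67]})

# Bin edges are right-inclusive by default:
# (-1, 39], (39, 49], (49, 59], (59, 79], (79, 100]
bins = [-1, 39, 49, 59, 79, 100]
labels = ['Failed', 'Passed', 'Second Class', 'First Class', 'Top']

# Each mark is labelled by the bin it falls into.
df['Result'] = pd.cut(df['Marks'], bins=bins, labels=labels)
print(df)
```

Note that pd.cut returns a categorical column, and any value outside the bin edges becomes NaN.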