Insert a new column in an existing DataFrame

In pandas, a DataFrame is a two-dimensional, labeled data structure that behaves much like an ordered dictionary of columns: each column is a Series, and all columns share the same row index. To add a new column to an existing DataFrame, you can assign values to a new column name with bracket notation or the .loc accessor, place the column at a chosen position with insert(), or build a new DataFrame with assign(). The new column can hold raw data or values computed from existing columns. First, let's create a DataFrame with some values.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben']
df['TotalMarks'] = [82, 38, 63, 22, 55, 40]
df['Grade'] = ['A', 'E', 'B', 'E', 'C', 'D']
df['Promoted'] = [True, False, True, False, True, True]
df
```
```
    Name  TotalMarks  Grade  Promoted
0   John          82      A      True
1    Doe          38      E     False
2   Bill          63      B      True
3    Jim          22      E     False
4  Harry          55      C      True
5    Ben          40      D      True
```

Using [] accessor

```python
df['Age'] = [12, 12, 13, 12, 13, 12]  # Adding column 'Age'
df
```
```
    Name  TotalMarks  Grade  Promoted  Age
0   John          82      A      True   12
1    Doe          38      E     False   12
2   Bill          63      B      True   13
3    Jim          22      E     False   12
4  Harry          55      C      True   13
5    Ben          40      D      True   12
```
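Besides a list, the same [] syntax accepts a scalar, which pandas broadcasts to every row, or an expression computed from existing columns. The following is a minimal sketch; the column names School and Label and the value 'Springfield High' are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
})

# A scalar is broadcast to every row (hypothetical column and value)
df['School'] = 'Springfield High'

# Element-wise string concatenation of two existing columns
df['Label'] = df['Name'] + ' (' + df['Grade'] + ')'

print(df)
```

Both assignments append the new column at the end of the DataFrame, just like the list assignment above.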

When you assign with square brackets, pandas treats the DataFrame like a dictionary: the column name is the key and the assigned data becomes the column. A plain list, as in the example above, is assigned positionally and must contain exactly one value per row; otherwise pandas raises a ValueError. A Series is aligned on its index instead: each row of the DataFrame receives the Series value with the matching index label, rows with no matching label get NaN, and Series labels that do not appear in the DataFrame's index are dropped. In join terms this behaves like a left join on the DataFrame's index, not an outer join, so assigning a column never adds rows.
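A small sketch of this alignment, using a hypothetical ages Series whose index only partly overlaps the DataFrame's index. It also assigns the same Series through .loc with a full-row slice, which aligns the same way, and shows that a list of the wrong length is rejected.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill']})  # default index: 0, 1, 2

# Hypothetical Series: labels 1 and 2 match df, label 3 does not
ages = pd.Series([12, 13, 14], index=[1, 2, 3])

# [] aligns on the index: row 0 has no match -> NaN, label 3 is dropped
df['Age'] = ages

# .loc with a full-row slice adds the column with the same alignment
df.loc[:, 'AgeViaLoc'] = ages

print(df)

# A plain list is positional and must match the number of rows
try:
    df['Short'] = [1, 2]
except ValueError as err:
    print('ValueError:', err)
```

Because row 0 receives NaN, the new columns are stored as floats rather than integers.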

Using insert() method

```python
df.insert(loc, column, value)
```

The insert() method adds a column at a specific position instead of appending it at the end. loc is the zero-based integer position of the new column, column is its name, and value is the data to fill it with (a scalar, list, array, or Series). insert() modifies the DataFrame in place and returns None. By default it raises a ValueError if a column with the same name already exists; pass allow_duplicates=True to override this.

```python
df.insert(1, "Age", [12, 12, 13, 12, 13, 12])
df
```
```
    Name  Age  TotalMarks  Grade  Promoted
0   John   12          82      A      True
1    Doe   12          38      E     False
2   Bill   13          63      B      True
3    Jim   12          22      E     False
4  Harry   13          55      C      True
5    Ben   12          40      D      True
```

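Here the column 'Age' is inserted at position 1, that is, as the second column. This example starts from the original four-column df: if you already added 'Age' with [] as shown earlier, insert() raises a ValueError because the column already exists.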
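A minimal sketch of a few insert() details: it returns None because it works in place, loc=0 places a column first while loc=len(df.columns) places it last, and reusing an existing name fails by default. The RollNo column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill'],
    'TotalMarks': [82, 38, 63],
})

# insert() works in place and returns None
result = df.insert(1, 'Age', [12, 12, 13])
print(result)            # None
print(list(df.columns))  # ['Name', 'Age', 'TotalMarks']

# loc=0 puts the column first; loc=len(df.columns) puts it last
df.insert(0, 'RollNo', [1, 2, 3])  # hypothetical column
df.insert(len(df.columns), 'Grade', ['A', 'E', 'B'])

# Re-inserting an existing column name raises ValueError by default
try:
    df.insert(0, 'Age', [12, 12, 13])
except ValueError as err:
    print('ValueError:', err)
```

Since insert() returns None, writing df = df.insert(...) would replace the DataFrame with None; call it as a statement instead.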

Using assign() method

DataFrame.assign() returns a new DataFrame with the requested columns added (or replaced, if a column of that name already exists) and leaves the original DataFrame unchanged. New columns are passed as keyword arguments whose values can be scalars, list-likes, or callables that receive the DataFrame and return the column. This makes assign() well suited to derived columns and to method chaining, where you want to keep the original data intact.

```python
# Starting again from the original four-column df
new_df = df.assign(Age=[12, 12, 13, 12, 13, 12])
new_df
```
```
    Name  TotalMarks  Grade  Promoted  Age
0   John          82      A      True   12
1    Doe          38      E     False   12
2   Bill          63      B      True   13
3    Jim          22      E     False   12
4  Harry          55      C      True   13
5    Ben          40      D      True   12
```
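Because values can be callables, assign() can compute columns from the DataFrame it is building. In recent pandas versions, keyword arguments are applied in order, so a later one can refer to a column created by an earlier one in the same call. A minimal sketch, with the hypothetical column names MeanMarks and AboveAverage:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
})

new_df = df.assign(
    # A callable receives the DataFrame; a scalar result is broadcast
    MeanMarks=lambda d: d['TotalMarks'].mean(),
    # Later keywords can use columns created earlier in the same call
    AboveAverage=lambda d: d['TotalMarks'] > d['MeanMarks'],
)

print(new_df)
print(list(df.columns))  # ['Name', 'TotalMarks'] (original unchanged)
```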

Conclusion

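You can add a new column to an existing pandas DataFrame with [] (or .loc) assignment, the insert() method, or the assign() method. [] assignment and insert() both modify the DataFrame in place: [] appends the column at the end, while insert() places it at a chosen position. assign() leaves the original untouched and returns a new DataFrame, which suits method chaining. Choose based on whether you need a specific column position and whether you want to modify the DataFrame in place or keep the original unchanged.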