Combine multiple CSV files into one DataFrame

Combining multiple CSV files into one DataFrame is a common data integration task, especially when a large dataset is split across several files. Pandas handles this with the concat() function. Older code often uses the DataFrame.append() method instead, which current pandas versions no longer provide. Let's look at both in detail, with examples:

Using concat() function

Suppose we have two CSV files, "data1.csv" and "data2.csv," with the following contents:

data1.csv:

Name,Age,Salary
Willaim,25,50000
Smith,30,45000

data2.csv:

Name,Age,Salary
Clipper,22,60000
Warner,28,55000

We can use the concat() function to combine these two CSV files into one DataFrame:

import pandas as pd

# Reading the CSV files
df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')

# Combining the DataFrames using concat()
combined_df = pd.concat([df1, df2])
print(combined_df)

Output:

      Name  Age  Salary
0  Willaim   25   50000
1    Smith   30   45000
0  Clipper   22   60000
1   Warner   28   55000

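In this example, concat() stacks the rows of both DataFrames vertically (its default, axis=0) and returns a new DataFrame, combined_df, with all the rows from "data1.csv" followed by all the rows from "data2.csv". Each DataFrame keeps its original row labels, which is why the index in the output reads 0, 1, 0, 1.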
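The same pattern extends to any number of files. The following is a minimal sketch, assuming the CSV files share the same columns and sit in one directory. The helper name combine_csv_files, the temporary directory, and the demo files are hypothetical and exist only for this illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_csv_files(directory, pattern="*.csv"):
    """Read every CSV in `directory` matching `pattern` and stack the rows.

    Hypothetical helper for illustration; assumes all files share the same columns.
    """
    # Sort the paths so the row order is the same on every run.
    paths = sorted(Path(directory).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True gives the result a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Demo with throwaway files that mirror data1.csv and data2.csv.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data1.csv").write_text("Name,Age,Salary\nWillaim,25,50000\nSmith,30,45000\n")
    Path(tmp, "data2.csv").write_text("Name,Age,Salary\nClipper,22,60000\nWarner,28,55000\n")
    combined_all = combine_csv_files(tmp)

print(combined_all)
```

Reading the files in a list comprehension and calling concat() once keeps the code short and avoids building the result piece by piece.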

Using append() method

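The append() method was an alternative way to get the same result in older versions of pandas. Like concat(), it returns a new DataFrame containing the rows of the calling DataFrame followed by the rows of the DataFrame passed in. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the code below runs only on older versions; on current versions, use pd.concat([df1, df2]) instead.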

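import pandas as pd

# Reading the CSV files
df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')

# Combining the DataFrames using append()
# (works only on pandas < 2.0; on 2.0+ this raises AttributeError)
combined_df = df1.append(df2)
# Equivalent on any pandas version:
# combined_df = pd.concat([df1, df2])
print(combined_df)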

The output of this example is the same as the output of the concat() example above.


Note: append() does not modify df1 in place. It returns a new DataFrame, and the original DataFrames df1 and df2 remain unchanged. The same is true of concat().
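Because each call returns a new DataFrame, growing a result by calling append() (or concat()) inside a loop copies all previously accumulated rows on every iteration. A common alternative, sketched below under the assumption that the pieces arrive one at a time, is to collect the DataFrames in a plain Python list and concatenate once at the end. The in-memory CSV strings are made-up stand-ins for real files.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for files read one at a time.
csv_chunks = [
    "Name,Age,Salary\nWillaim,25,50000\nSmith,30,45000\n",
    "Name,Age,Salary\nClipper,22,60000\nWarner,28,55000\n",
]

frames = []
for chunk in csv_chunks:
    # pd.read_csv accepts a file-like object, so StringIO stands in for a file.
    frames.append(pd.read_csv(io.StringIO(chunk)))  # list.append, not DataFrame.append

# One concat call at the end copies the data a single time.
combined_once = pd.concat(frames)

# The inputs are untouched: each still has its own two rows.
print(len(frames[0]), len(frames[1]), len(combined_once))  # 2 2 4
```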

Both concat() and append() let you control how columns and indexes are handled. For example, passing ignore_index=True to either one discards the original row labels and numbers the rows of the resulting DataFrame from 0 to n-1.
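To make the index and column handling concrete, here is a small sketch using in-memory DataFrames. The Department column and its value are invented for the illustration; ignore_index and join are existing parameters of pd.concat().

```python
import pandas as pd

df_a = pd.DataFrame({"Name": ["Willaim", "Smith"], "Age": [25, 30]})
# df_b carries an extra, made-up column that df_a lacks.
df_b = pd.DataFrame({"Name": ["Clipper"], "Age": [22], "Department": ["Sales"]})

# Default: original row labels are kept, so the label 0 appears twice.
kept = pd.concat([df_a, df_b])
print(kept.index.tolist())          # [0, 1, 0]

# ignore_index=True relabels the rows 0..n-1.
renumbered = pd.concat([df_a, df_b], ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2]

# Default join="outer" keeps the union of columns and fills gaps with NaN.
print(renumbered.columns.tolist())  # ['Name', 'Age', 'Department']

# join="inner" keeps only the columns present in every input.
shared = pd.concat([df_a, df_b], join="inner", ignore_index=True)
print(shared.columns.tolist())      # ['Name', 'Age']
```

Resetting the index matters when later code looks rows up by label, since duplicate labels make such a lookup return several rows.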

Conclusion

Combining multiple CSV files into one DataFrame with Pandas is straightforward: read each file with read_csv() and stack the results with concat(). The append() method does the same job in older code, but it has been removed from current pandas versions, so prefer concat() for new work.