Combine multiple CSV files into one DataFrame
Combining multiple CSV files into one DataFrame is a common data integration task, especially when a large dataset is split across several files. Pandas supports this with the concat() function and, in versions before 2.0, with the DataFrame.append() method (deprecated in pandas 1.4 and removed in 2.0). Let's explore both methods in detail with examples.
Using concat() function
Suppose we have two CSV files, "data1.csv" and "data2.csv," with the following contents:
data1.csv:
data2.csv:
We can use the concat() function to combine these two CSV files into one DataFrame:
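A minimal sketch of this step follows. The file contents are not reproduced here, so the two small tables (columns id and name) are hypothetical stand-ins, read from in-memory strings so that the snippet runs without files on disk. With real files, pass the paths "data1.csv" and "data2.csv" to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the contents of data1.csv and data2.csv
csv1 = io.StringIO("id,name\n1,Alice\n2,Bob\n")
csv2 = io.StringIO("id,name\n3,Carol\n4,Dave\n")

# Read each source into its own DataFrame
df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)

# Stack the rows of df2 below the rows of df1
combined_df = pd.concat([df1, df2])
print(combined_df)
```

Each input keeps its own row labels by default, so the combined index here is 0, 1, 0, 1.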
In this example, the concat() function stacks the two DataFrames vertically (row-wise, the default axis=0) and creates a new DataFrame, combined_df, with all the rows from "data1.csv" followed by all the rows from "data2.csv."
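Because concat() accepts a list of any length, the same pattern extends to a whole folder of files. The sketch below is illustrative only: the file names (part0.csv and so on), the columns, and the temporary directory are assumptions made so the snippet is self-contained.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create three small hypothetical CSV files to combine
    for i in range(3):
        part = pd.DataFrame({"id": [i], "source": [f"part{i}"]})
        part.to_csv(os.path.join(tmp_dir, f"part{i}.csv"), index=False)

    # Collect the matching paths in a stable order
    paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

    # Read every file, then concatenate once at the end
    all_df = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

print(all_df)
```

Reading all files into a list and calling concat() once is generally cheaper than concatenating inside the loop, since each concat() call copies the data.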
Using append() method
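The append() method offers an alternative way to achieve the same result in older versions of pandas. It is called on one DataFrame and takes the other as an argument, returning a new DataFrame with the second DataFrame's rows added after the first's. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions concat() is the method to use.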
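A hedged sketch of the append() call is shown below. The sample data is hypothetical, and because DataFrame.append() no longer exists in pandas 2.0 and later, the snippet falls back to the equivalent concat() call when the method is missing.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files
df1 = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Carol", "Dave"]})

if hasattr(df1, "append"):
    # pandas < 2.0 only (emits a FutureWarning from 1.4 onward)
    combined_df = df1.append(df2)
else:
    # pandas >= 2.0: append() was removed; concat() is the equivalent
    combined_df = pd.concat([df1, df2])

print(combined_df)
```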
The output of this example is the same as the output of the concat() example.
Note: When using the append() method, it is important to remember that a new DataFrame is returned, and the original DataFrames df1 and df2 remain unchanged.
Both methods (concat() and append()) offer flexibility in handling the columns and indexes. For example, you can use the ignore_index=True parameter with concat() or append() to reset the index of the resulting DataFrame.
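The effect of ignore_index=True can be seen in a short sketch with concat(). The two DataFrames are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data
df1 = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Carol", "Dave"]})

# Default: each input keeps its original row labels
kept = pd.concat([df1, df2])

# ignore_index=True: the result gets a fresh 0..n-1 index
reset = pd.concat([df1, df2], ignore_index=True)

print(list(kept.index))   # [0, 1, 0, 1]
print(list(reset.index))  # [0, 1, 2, 3]
```

Resetting the index avoids duplicate row labels, which matters if rows are later selected by label with .loc.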
Conclusion
Combining multiple CSV files into one DataFrame with Pandas is a straightforward process: read each file with read_csv() and stack the results with concat(), or with append() on pandas versions before 2.0. The combined DataFrame can then be analyzed as a single dataset.