Importing Data with pandas.read_csv()
The simplest way to read data from a CSV file is:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Specifying Delimiter
pd.read_csv('data.csv', sep='\t')
Reading specific columns only
pd.read_csv('data.csv', usecols=['Name', 'Age'])
Read CSV without headers
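pd.read_csv('data.csv', header=None)
With header=None, pandas treats the file as having no header row: the first row is read as data and the columns are labelled 0, 1, 2, and so on. To supply your own column names, pass them with the names argument.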
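To make the effect of header=None concrete, here is a small sketch. The sample data and the column names are made up for illustration, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical headerless data: every line, including the first, is a record.
raw = "Alice,30\nBob,25\n"

# header=None keeps the first line as data and labels the columns 0 and 1.
df_default = pd.read_csv(io.StringIO(raw), header=None)
print(list(df_default.columns))

# names=... supplies the column labels for a headerless file.
df_named = pd.read_csv(io.StringIO(raw), header=None, names=["Name", "Age"])
print(df_named)
```

Without header=None, the line "Alice,30" would be consumed as the header and lost from the data.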
Skiprows
skiprows lets you skip lines at the start of the file. Pass an integer to skip that many lines, or a list of 0-based line numbers to skip specific lines.
df = pd.read_csv('data.csv', skiprows=3)
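The following sketch shows both forms of skiprows on made-up data (the preamble lines and the units row are assumptions for illustration).

```python
import io
import pandas as pd

# Hypothetical export with three preamble lines before the real header.
raw = "# exported by tool\n# version 1\n# units: years\nName,Age\nAlice,30\nBob,25\n"

# An integer skips that many lines from the top of the file.
df = pd.read_csv(io.StringIO(raw), skiprows=3)
print(df)

# A list skips specific 0-based line numbers, here the units row under the header.
raw_units = "Name,Age\n(text),(years)\nAlice,30\n"
df_units = pd.read_csv(io.StringIO(raw_units), skiprows=[1])
print(df_units)
```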
Use a specific encoding (e.g. 'utf-8')
pd.read_csv('data.csv', encoding='utf-8')
Parsing date columns
pd.read_csv('data.csv', parse_dates=['date'])
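As a quick check that parse_dates did its job, you can inspect the column's dtype and use the .dt accessor. A minimal sketch with invented data:

```python
import io
import pandas as pd

# Hypothetical data with ISO-formatted dates.
raw = "date,value\n2021-01-15,10\n2021-02-20,12\n"

df = pd.read_csv(io.StringIO(raw), parse_dates=["date"])

# The column is now a datetime type rather than plain strings.
print(df.dtypes)

# Datetime columns expose the .dt accessor, e.g. to extract the month.
months = df["date"].dt.month.tolist()
print(months)
```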
Specify dtype
import numpy as np
df = pd.read_csv('data.csv', usecols=['Height'], dtype={'Height': np.float32})
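A short sketch of the dtype argument in its dictionary form, which maps column names to types. The data is made up; only the Height column is given an explicit type and the other column is left to pandas to infer.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical data; Height would be read as float64 by default.
raw = "Name,Height\nAlice,1.65\nBob,1.80\n"

# Request 32-bit floats for Height only.
df = pd.read_csv(io.StringIO(raw), dtype={"Height": np.float32})
print(df.dtypes)
```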
Multi-character separator
By default, pandas read_csv() uses the C parser engine for high performance. The C engine only handles single-character separators (plus the special case '\s+'). If your file uses a multi-character separator, pandas interprets sep as a regular expression, and you need to use the 'python' engine. For example, for fields separated by a pipe with optional surrounding whitespace:
pd.read_csv('data.csv', sep=r'\s*\|\s*', engine='python')
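Here is a small sketch of reading multi-character separators with the python engine. Both sample strings and the separators "::" and " | " are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical data using a literal two-character separator.
raw_colons = "Name::Age\nAlice::30\nBob::25\n"
df_colons = pd.read_csv(io.StringIO(raw_colons), sep="::", engine="python")
print(df_colons)

# Hypothetical data using a pipe padded with spaces; sep is a regular expression,
# so the pipe must be escaped and \s* absorbs the surrounding whitespace.
raw_pipes = "Name | Age\nAlice | 30\nBob | 25\n"
df_pipes = pd.read_csv(io.StringIO(raw_pipes), sep=r"\s*\|\s*", engine="python")
print(df_pipes)
```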

UnicodeDecodeError while read_csv()
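UnicodeDecodeError occurs when the file was saved in one encoding but is read with a different, incompatible one (read_csv() assumes UTF-8 by default). The most reliable fix is to pass the file's actual encoding, for example 'latin-1' or 'cp1252' for many files exported on Windows:
pd.read_csv('data.csv', encoding='latin-1')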
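The mismatch can be reproduced in a few lines. This sketch assumes a file whose bytes were written as Latin-1 (the names in it are invented); reading those bytes as UTF-8 should fail, while passing the matching encoding succeeds.

```python
import io
import pandas as pd

# Hypothetical file contents, encoded as Latin-1 rather than UTF-8.
data = "name,city\nRené,Orléans\n".encode("latin-1")

# Reading with the wrong encoding is expected to raise UnicodeDecodeError.
try:
    pd.read_csv(io.BytesIO(data), encoding="utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc)

# Reading with the encoding the file was actually written in works.
df = pd.read_csv(io.BytesIO(data), encoding="latin-1")
print(df)
```

If you do not know the encoding, the tool that produced the file is usually the best source for it.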
"Unnamed: 0" while read_csv()
"Unnamed: 0" occurs when a DataFrame with an un-named index is saved to CSV and then re-read after. To solve this error, what you have to do is to specify an index_col=[0] argument to read_csv() function, then it reads in the first column as the index.
pd.read_csv('data.csv', index_col=[0])
Alternatively, avoid the problem at the source by not writing the index when you save the file:
df.to_csv('data.csv', index=False)
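The round trip below is a sketch of where "Unnamed: 0" comes from and how each fix behaves. The DataFrame contents are made up, and to_csv() is called without a path so that it returns the CSV text instead of writing a file.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})

# By default the index is written as a first column with an empty header.
csv_with_index = df.to_csv()

# Reading it back naively turns that column into "Unnamed: 0".
naive = pd.read_csv(io.StringIO(csv_with_index))
print(naive.columns.tolist())

# Fix while reading: treat the first column as the index.
restored = pd.read_csv(io.StringIO(csv_with_index), index_col=0)
print(restored.columns.tolist())

# Fix while writing: leave the index out of the file.
csv_without_index = df.to_csv(index=False)
clean = pd.read_csv(io.StringIO(csv_without_index))
print(clean.columns.tolist())
```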
Error tokenizing data while read_csv()
In most cases the cause is either (1) a delimiter that does not match the one used in the file, or (2) header or column rows that confuse the parser. Solution:
pd.read_csv('data.csv', sep='your_delimiter', header=None)
The code above sets the delimiter explicitly and tells pandas that the source data has no row for headers/column titles.
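Or
pd.read_csv('data.csv', on_bad_lines='skip')
The code above causes the offending lines to be skipped. (on_bad_lines is available from pandas 1.3; it replaces error_bad_lines=False, which was deprecated and then removed in pandas 2.0.)
To get information about the rows that cause the error, use on_bad_lines='warn', which skips them and emits a warning for each one (the replacement for combining error_bad_lines=False with warn_bad_lines=True):
pd.read_csv('data.csv', on_bad_lines='warn')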
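The tokenizing error and the skip behaviour can be reproduced with a row that has too many fields. This is a sketch on invented data and assumes pandas 1.3 or later, where the on_bad_lines argument is available.

```python
import io
import pandas as pd

# Hypothetical file: the second data row has four fields instead of three.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# With default settings the malformed row raises a ParserError.
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    print("parse failed:", exc)

# on_bad_lines='skip' drops the malformed row and keeps the rest.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

Skipping rows silently discards data, so it is worth checking how many rows were dropped before relying on the result.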
FileNotFoundError
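A common cause on Windows is an unescaped backslash in the path, because \ starts an escape sequence inside a normal string. Put r before the opening quote of the path:
pd.read_csv(r'D:\Users\Desktop\data.csv')
Here the r prefix marks a raw string, in which backslashes are kept as literal characters.
Another way is to write \\ in the string so that each backslash is escaped: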
pd.read_csv('C:\\Users\\mylab\\Desktop\\data.csv')
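The difference between the three spellings is easy to see without touching the file system. In this sketch the path C:\temp\new.csv is hypothetical and was chosen because \t and \n are real escape sequences.

```python
from pathlib import Path

plain = 'C:\temp\new.csv'      # \t and \n are turned into a tab and a newline
raw = r'C:\temp\new.csv'       # raw string: backslashes stay as typed
escaped = 'C:\\temp\\new.csv'  # each backslash escaped by hand

# The plain string is shorter because two escape pairs collapsed to one character each.
print(len(plain), len(raw))
print(raw == escaped)

# Checking the path before reading gives a clearer message than a traceback.
path = Path(raw)
if not path.exists():
    print(f"File not found: {path}")
```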

MemoryError
Memory errors happen a lot with Python when using the 32-bit Windows version, because 32-bit processes only get 2 GB of memory to work with by default. One remedy is the dtype option of pandas.read_csv(), which tells pandas what types the columns in your CSV contain. For example, dtype={'age': int} tells pandas that age should be interpreted as an integer, so it does not have to infer the type. Choosing a compact type (for example 'int8' or 'float32' instead of the 64-bit defaults, or 'category' for repeated strings) can save a lot of memory.
pd.read_csv('data.csv', dtype={'age': int})
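To see how much a compact dtype can save, compare the memory used by the same column under the default type and under a smaller one. This is a sketch on generated data; the choice of int8 assumes every age fits in the range -128 to 127.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical data: 1000 ages between 0 and 89.
raw = "age\n" + "\n".join(str(i % 90) for i in range(1000)) + "\n"

# Default inference reads integers as 64-bit.
default = pd.read_csv(io.StringIO(raw))
default_bytes = default["age"].memory_usage(index=False)

# An explicit 8-bit integer type uses one byte per value.
small = pd.read_csv(io.StringIO(raw), dtype={"age": np.int8})
small_bytes = small["age"].memory_usage(index=False)

print(default["age"].dtype, default_bytes)
print(small["age"].dtype, small_bytes)
```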
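If the file is still too large, read it in chunks with the chunksize option so that only part of it is in memory at a time. Note that low_memory=False does not reduce memory use; it makes pandas parse the whole file in one pass to avoid mixed-type inference.
chunks = pd.read_csv('data.csv', sep='\t', chunksize=100000)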
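Reading the file in pieces is another way to limit memory use. A minimal sketch, assuming the chunksize argument of read_csv(), which makes it return an iterator of DataFrames instead of one large DataFrame. The data and the chunk size of 4 are made up to keep the example small.

```python
import io
import pandas as pd

# Hypothetical data: the numbers 0 to 9 in a single column.
raw = "age\n" + "\n".join(str(i) for i in range(10)) + "\n"

total = 0
rows = 0

# Each chunk is an ordinary DataFrame holding at most 4 rows.
for chunk in pd.read_csv(io.StringIO(raw), chunksize=4):
    total += int(chunk["age"].sum())
    rows += len(chunk)

print(rows, total)
```

This pattern fits aggregations that can be updated chunk by chunk; operations that need the whole table at once still require enough memory for it.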