Pandas CParserError: Error tokenizing data
In most cases, the error is caused by:
- the delimiters in your data, or
- the parser being confused by the headers/columns of the file.
pandas.read_csv(fileName, sep='your_delimiter', header=None)
In the above code, the sep parameter defines your delimiter (e.g. '\t'), and header=None tells pandas that your source data has no row of headers/column titles.
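Here is a minimal sketch of what sep='\t' and header=None do. It assumes a small tab-separated sample held in memory. The data and the column names passed to names= are made up for illustration.

```python
import io

import pandas as pd

# Made-up tab-separated data with no header row
raw = "1\tAnn\t90\n2\tBob\t85\n"

# sep="\t" splits on tabs; header=None keeps the first line as data
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
print(df.columns.tolist())  # [0, 1, 2]: pandas numbers the columns itself

# Without header=None, the first data row is consumed as column titles
wrong = pd.read_csv(io.StringIO(raw), sep="\t")
print(wrong.columns.tolist())  # ['1', 'Ann', '90']

# The names parameter supplies your own column titles instead
named = pd.read_csv(
    io.StringIO(raw), sep="\t", header=None, names=["id", "name", "score"]
)
print(named)
```

If your file does have a header row, leave header at its default and only set sep.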
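An alternative solution is to skip the lines that cannot be parsed:
pd.read_csv('file.csv', on_bad_lines='skip')
The above code causes the offending lines to be skipped. The on_bad_lines parameter was added in pandas 1.3. In older versions, use error_bad_lines=False instead. That parameter was deprecated in 1.3 and removed in pandas 2.0.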
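Skipping silently discards data, so it can be worth listing the affected lines first. The sketch below uses a made-up in-memory CSV. find_bad_rows is a hypothetical helper built on Python's standard csv module, not part of pandas. It reports every row whose field count differs from the header's, and the sketch then compares that list with what on_bad_lines='skip' keeps.

```python
import csv
import io

import pandas as pd

# Made-up CSV: line 3 has one field more than the header
raw = "id,name,score\n1,Ann,90\n2,Bob,85,extra\n3,Cid,70\n"


def find_bad_rows(text, delimiter=","):
    """Hypothetical helper: return (line_number, field_count) for every row
    whose field count differs from the header's. Blank lines are ignored."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = len(next(reader))  # field count of the header row
    return [
        (reader.line_num, len(row))  # line_num is 1-based, header included
        for row in reader
        if row and len(row) != expected
    ]


print(find_bad_rows(raw))  # [(3, 4)]

# on_bad_lines="skip" (pandas >= 1.3) drops line 3 and keeps the rest
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df["id"].tolist())  # [1, 3]
```

pandas only raises for rows with more fields than expected. Rows with fewer fields are padded with NaN, so this helper may also report lines that pandas would accept. Once you know which lines are affected, you can fix them by hand instead of skipping them.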
Fix it manually
The "Error tokenizing data" error may arise when you use a separator (e.g. a comma ',') as the delimiter and a row contains more separators than expected. In other words, the error row has more fields than are defined in the header. You then need to either remove the additional field or remove the extra separator if it is there by mistake. The better solution is to investigate the offending file and fix it manually, so that you do not need to skip the error lines.
pandas.to_csv()
In some cases, pandas.parser.CParserError (pandas.errors.ParserError in current pandas versions) is raised when reading a file written by pandas.to_csv(). This can happen when a column name contains a carriage return ('\r'). In that case, to_csv() writes the subsequent column names into the first column of the data frame, so the number of columns differs across the first X rows. This difference is one cause of the CParserError.
skiprows
Sometimes the parser is confused by the column header of the file. The parser reads the first row and infers the number of columns from that row. However, the first row (the column headers) may not be representative of the actual data in the file, for example when the error row has more columns than the header defines. In such cases, you can use skiprows, which skips the first n rows of the file.
pd.read_csv('myFile.csv', skiprows=1)
skiprows=1 skips the first line and starts reading from the second line.
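Here is a minimal sketch using a made-up export that has two banner lines above the real header, so skiprows=2 is needed here rather than 1. The first banner line has a single field. pandas infers one field per row from it and raises a ParserError when it reaches the three-field header line.

```python
import io

import pandas as pd

# Made-up export: two banner lines sit above the real header
raw = (
    "Exported report\n"
    "Generated by some tool\n"
    "id,name,score\n"
    "1,Ann,90\n"
    "2,Bob,85\n"
)

# Without skiprows, pandas infers 1 column from the first banner line
try:
    pd.read_csv(io.StringIO(raw))
    raised = False
except pd.errors.ParserError as exc:
    raised = True
    print("Without skiprows:", exc)

# skiprows=2 drops both banner lines, so "id,name,score" becomes the header
df = pd.read_csv(io.StringIO(raw), skiprows=2)
print(df.columns.tolist())  # ['id', 'name', 'score']
```

skiprows also accepts a list of 0-based line numbers (e.g. skiprows=[0, 1]), which is useful when the lines to drop are not all at the top of the file.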