UnicodeDecodeError: 'utf-8' codec can't decode byte
Switching to the Python parsing engine can work around UnicodeDecodeError in some cases:
import pandas as pd
df = pd.read_csv('file_name.csv', engine='python')
Or
- Open the CSV file in Sublime Text or VS Code.
- Save the file with UTF-8 encoding.
Then use:
df = pd.read_csv('file_name.csv', encoding='utf-8')
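The re-save step can also be scripted instead of done in an editor. The following is a minimal sketch using only the Python standard library; the function name convert_to_utf8 and the default source encoding of cp1252 are assumptions for illustration, so substitute the encoding the file was really saved in.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file from source_encoding to UTF-8.

    source_encoding is an assumption: it must match how the file was saved.
    """
    # newline="" leaves line endings untouched, including newlines
    # inside quoted CSV fields.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

After converting, the new file can be read with encoding='utf-8'. This sketch reads the whole file into memory, which is fine for typical CSV sizes but not for very large files.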
Files store bytes, so all Unicode text has to be encoded into bytes before it can be written to a file, and decoded with the same encoding when it is read back. On Windows, many editors assume the default ANSI code page (CP1252 on US and Western European systems) instead of UTF-8 when there is no BOM (byte order mark) at the start of the file. A file saved that way can contain bytes that are not valid UTF-8, which is what triggers the error. The pandas read_csv() function takes an encoding option to deal with files in different encodings, so pass the encoding the file was actually saved in, such as 'utf-8' or 'cp1252'.
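To make the byte-level cause concrete, here is a small sketch using only the standard library. It shows that the CP1252 bytes for "café" are not valid UTF-8, and then tries a list of candidate encodings on a file. The helper name pick_encoding and the candidate list are assumptions for illustration, not part of pandas.

```python
# "é" is the single byte 0xE9 in CP1252, which is not a complete UTF-8 sequence.
raw = "café".encode("cp1252")

try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False  # this is the same error read_csv reports


def pick_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: a successful decode does not prove the encoding
    is the right one, only that the bytes are valid in it.
    """
    with open(path, "rb") as f:
        data = f.read()
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None  # none of the candidates fit
```

The result could then be passed along as pd.read_csv(path, encoding=pick_encoding(path)). UTF-8 is tried first because it is strict: bytes written in another encoding rarely decode as valid UTF-8 by accident.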
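Also, if the data is already loaded, you can escape the non-ASCII characters in a problematic column of strings. Each such character is replaced with an ASCII escape sequence such as \xe9, and the result is decoded back to a regular string.
df['column-name'] = df['column-name'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))
This can avoid encoding problems later on, but note that it changes the stored text rather than fixing the encoding of the file.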
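To see what the escape step does to a single value, here is a short standard-library sketch. The function names escape_non_ascii and unescape are hypothetical and exist only for this illustration.

```python
def escape_non_ascii(text):
    """Replace non-ASCII characters with backslash escape sequences."""
    # The escaped bytes are pure ASCII, so decoding them as UTF-8 cannot fail.
    return text.encode("unicode-escape").decode("utf-8")


def unescape(text):
    """Reverse escape_non_ascii for text that was produced by it."""
    return text.encode("ascii").decode("unicode-escape")


escaped = escape_non_ascii("café")   # 'caf\\xe9': backslash, x, e, 9
restored = unescape(escaped)         # 'café'
```

The escaped value is longer than the original and no longer displays as the accented character, so this is best treated as a workaround for downstream tools that only accept ASCII.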