Python: \ufeff solution

Have you ever opened a csv file with Python and had it fail to recognize a column name, even though you are 3000% sure the column is there?

Example csv data, saved as my_example.csv:

id;name;age
1;aaa;11
2;bbb;12
3;ccc;13

So you try to open it with Python like this:

import csv

with open('my_example.csv') as csv_file:
    reader = csv.DictReader(csv_file, delimiter=";")
    # print the id column of the first few rows
    for row in list(reader)[:5]:
        print(row['id'])

But then you are surprised, because it throws KeyError: 'id' on row['id']. So I changed the code from print(row['id']) to print(row), just because I was curious how Python unpacks the csv file. Surprisingly, it shows this:

{'\ufeffid': '1', 'name': 'aaa', 'age': '11'}
{'\ufeffid': '2', 'name': 'bbb', 'age': '12'}
{'\ufeffid': '3', 'name': 'ccc', 'age': '13'}

What is this \ufeff?

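This is called a BOM (Byte Order Mark), the Unicode character U+FEFF. It is used to tell the difference between big- and little-endian UTF-16 encoding. In a UTF-8 file there is no byte order to mark, so it only acts as a signature: the three bytes EF BB BF at the very start of the file. If the file is decoded as plain utf-8, that character stays glued to the front of the first column name, which is why the key is '\ufeffid' and not 'id'. When is this issue most likely to occur? Imagine you create a csv file in Excel and save it as UTF-8. Excel by default puts this signature on the file and, afaik, there is no way to avoid it.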
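To see the signature for yourself, here is a small sketch that builds the header line in memory instead of reading a real file. The variable names are only for illustration.

```python
import codecs

# The UTF-8 signature (BOM) is the three bytes EF BB BF.
raw = codecs.BOM_UTF8 + "id;name;age\n".encode("utf-8")

# Plain utf-8 keeps the BOM as the character U+FEFF at the start of the text.
as_utf8 = raw.decode("utf-8")

# utf-8-sig drops the signature while decoding.
as_utf8_sig = raw.decode("utf-8-sig")

print(raw[:3])            # b'\xef\xbb\xbf'
print(repr(as_utf8))      # '\ufeffid;name;age\n'
print(repr(as_utf8_sig))  # 'id;name;age\n'
```

The second print shows the same '\ufeffid' that ended up as the dictionary key.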

Okay, so it is an encoding problem, and we can solve it by passing an encoding when we open the csv file.

import csv

# utf-8-sig strips the BOM, so the first column name is plain 'id'
with open('my_example.csv', encoding='utf-8-sig') as csv_file:
    reader = csv.DictReader(csv_file, delimiter=";")
    for row in list(reader)[:5]:
        print(row['id'])

And voila, it works again! But will it still work if the file we are opening does not have a BOM? It will! When decoding, utf-8-sig skips the BOM if it is there, so it reads both utf-8-sig-encoded text and text in the standard utf-8 encoding.
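If you want to check this yourself, here is a minimal sketch that writes the example data twice into a temporary folder, once with a BOM and once without, and reads both back with utf-8-sig. The helper read_ids and the file names are made up for this test.

```python
import csv
import os
import tempfile


def read_ids(path):
    # Hypothetical helper: return the 'id' column of a ;-separated csv file.
    # newline="" is what the csv module docs recommend for csv files.
    with open(path, encoding="utf-8-sig", newline="") as csv_file:
        reader = csv.DictReader(csv_file, delimiter=";")
        return [row["id"] for row in reader]


data = "id;name;age\n1;aaa;11\n2;bbb;12\n3;ccc;13\n"

with tempfile.TemporaryDirectory() as tmp:
    with_bom = os.path.join(tmp, "with_bom.csv")
    without_bom = os.path.join(tmp, "without_bom.csv")

    # Writing with utf-8-sig puts the BOM at the start of the file.
    with open(with_bom, "w", encoding="utf-8-sig", newline="") as f:
        f.write(data)
    # Writing with plain utf-8 does not.
    with open(without_bom, "w", encoding="utf-8", newline="") as f:
        f.write(data)

    ids_with_bom = read_ids(with_bom)
    ids_without_bom = read_ids(without_bom)

print(ids_with_bom)     # ['1', '2', '3']
print(ids_without_bom)  # ['1', '2', '3']
```

Both files give the same ids, so it should be safe to use utf-8-sig for reading UTF-8 csv files whether or not they came from Excel.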
