Introduction
In this tutorial, we are going to manage CSV(comma-separated values) files using Python Language. There are two python tools, csv, and pandas that offer CSV parsing(performing several operations with CSV files). But here, we will use the built-in library, csv to deal with CSV.
I know many of you have an idea about the file ending with .csv but I suggest you go through the next step to find something interesting here.
What is a CSV file?
A quick recap: CSV file or comma-separated file is a plain text file where values are separated by a comma(',') which is also called a delimiter.
Not only a comma but there are more characters which are also known as delimiters of values for a CSV file.
Examples of delimiter: comma(','), semicolon(';'), tab('\t'), pipe('|'), and space(' '). Among these separator choices, comma(',') is most frequently used.
Values are stored in a CSV file in rows. The very first row specifies as header or name of the column. The next lines represent values or data. See the image below for a better understanding.
How to read this Tutorial?
I have divided this tutorial into two parts, Basic and Advanced. Most people just find the code example and if this is you, visit the Basic part.
People who want to dig the topic more deeper, go through both Basic and Advanced parts. I have all done it.
Read and Write CSV files in Python: Basic Methods
In this section, we will perform several operations with CSV files using simple methods. The reason you will get shortly why I'm using the word simple.
Throughout this section, I will use this 'employee.csv' file which looks like the following.
ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,OConnell,650-507-9833,21-JUN-07,SH_CLERK
200,Jennifer,Whalen,515-123-4444,17-SEP-03,AD_ASST
206,William,Gietz,515-123-8181,07-JUN-02,AC_ACCOUNT
So, let's get started with the reading operation.
Read a CSV file in Python with csv.reader()
Normally, reading a simple CSV file is too much easy. You just need to follow these simple steps below:
1. Import the csv module
2. Open the CSV file in read mode(by default it opens in the same mode)
3. Declare a csv.reader object
4. Iterate through the reader object and print the rows
Important Note |
---|
Remember one thing, the reader object is iterable object; so you can read data from it using a for loop. |
Code
import csv
# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
# reader object
reader = csv.reader(f)
# header of the columns
header = next(reader)
print(header,'\n')
# printing all the rows of values
for row in reader:
print(row)
Output
['ID', 'Name', 'Surname', 'Contact', 'Hire_Date', 'Job_ID']
['198', 'Donald', 'OConnell', '650-507-9833', '21-JUN-07', 'SH_CLERK']
['200', 'Jennifer', 'Whalen', '515-123-4444', '17-SEP-03', 'AD_ASST']
['206', 'William', 'Gietz', '515-123-8181', '07-JUN-02', 'AC_ACCOUNT']
['ID', 'Name', 'Surname', 'Contact', 'Hire_Date', 'Job_ID']
Read a CSV file in Python with csv.DictReader()
csv.reader() returns an iterable object which contains the values in list format. But there is a class, csv.DictReader(), which offers reading CSV file data into a python dictionary.
The main advantage of using this process is that you get an ordered collection of data to work with.
Code
import csv
# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
# DictReader object
reader = csv.DictReader(f)
# printing all the rows of values
for row in reader:
print(row)
Output
{'ID': '198', 'Name': 'Donald', 'Surname': 'OConnell', 'Contact': '650-507-9833', 'Hire_Date': '21-JUN-07', 'Job_ID': 'SH_CLERK'}
{'ID': '200', 'Name': 'Jennifer', 'Surname': 'Whalen', 'Contact': '515-123-4444', 'Hire_Date': '17-SEP-03', 'Job_ID': 'AD_ASST'}
{'ID': '206', 'Name': 'William', 'Surname': 'Gietz', 'Contact': '515-123-8181', 'Hire_Date': '07-JUN-02', 'Job_ID': 'AC_ACCOUNT'}
{'ID': '208', 'Name': 'Alex', 'Surname': 'Jones', 'Contact': '-7793', 'Hire_Date': '07-JAN-02', 'Job_ID': 'AD_ASST'}
{'ID': '209', 'Name': 'Peter', 'Surname': 'Morgan', 'Contact': '-7769', 'Hire_Date': '08-FEB-04', 'Job_ID': 'AD_ASST'}
Writing to CSV file in Python with csv.writer()
Here, we will write data or values to a CSV file using python. Look a glimpse at 'employee.csv' above. We'll add two data rows just next to the existing.
In the code below, you'll see the CSV file has opened in append('a') mode. In this case, the new data will be added next to the existing ones. If you want to rewrite the existing data with new, use write('w') mode instead of append mode('a').
Code
import csv
# declaring the headers or column names
headers = ['ID','Name','Surname','Contact','Hire_Date','Job_ID']
# values, which will be added
rows = [[208, 'Alex', 'Jones',512-123-8182,'07-JAN-02','AD_ASST'],
[209, 'Peter', 'Morgan',650-231-8188,'08-FEB-04','AD_ASST']
]
# opening the CSV file in append mode
with open('employee.csv','a') as f:
# writer object
f_csv = csv.writer(f)
# writing the headings
# f_csv.writerow(headers)
# writing the rows or values
f_csv.writerows(rows)
Output
After adding the data to the CSV file('employees.csv') looks like this.
ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,OConnell,650-507-9833,21-JUN-07,SH_CLERK
200,Jennifer,Whalen,515-123-4444,17-SEP-03,AD_ASST
206,William,Gietz,515-123-8181,07-JUN-02,AC_ACCOUNT
208,Alex,Jones,-7793,07-JAN-02,AD_ASST
209,Peter,Morgan,-7769,08-FEB-04,AD_ASST
Look, the last two lines are newly added.
See the yellow line in the above code. I've commented this line. Let me tell you one interesting fact about when you are writing to a CSV file using csv module.
.writerow() method is used for adding only one row of data and .writerows() adds multiple rows presented in a python list or dictionary or tuple, etc.
Since we applied the append mode(on the 'employee.csv' file which is not empty), there is no need to add the header rows more than one time. That is why I commented there(the yellow line).
If you are thinking about creating a new file or rewriting the data again, you must add the header rows. In this case, remove the comment from there(the yellow line).
Write dictionary to CSV with csv.DictWriter()
In the previous example, we write a list of data in a CSV file(without heading) using python. We used append mode there to open the CSV.
In this example, we are gonna write a dictionary of data to a CSV file with headers included. In this case, we will save data to a new file('employee_dictcsv') and use write('w') method instead of append('a'). Let's see what changes are happened.
Code
import csv
# declaring the headers or column names
headers = ['ID','Name','Surname','Contact','Hire_Date','Job_ID']
# values, which will be added
rows = [{'ID':198, 'Name':'Donald', 'Surname':'Trump',
'Contact':'650-507-9833', 'Hire_Date':'21-JUN-07', 'Job_ID':'EX_PRESIDENT'},
{'ID':200, 'Name':'Joe', 'Surname':'Biden',
'Contact':'515-123-4444', 'Hire_Date':'21-JAN-21', 'Job_ID':'PRESIDENT'},
{'ID':206, 'Name':'William', 'Surname':'Gietz',
'Contact':'515-123-8181', 'Hire_Date':'20-JAN-17', 'Job_ID':'AC_ACCOUNT'},
]
# opening the CSV file in write mode
with open('employee_dict.csv','w') as f:
# dictwriter object
writer = csv.DictWriter(f, headers)
# writing the header
writer.writeheader()
# writing the rows or values
writer.writerows(rows)
Output
After adding the data to the CSV file('employee_dict.csv') looks like this.
ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,Trump,"650-507-9833","21-JUN-07",EX_PRESIDENT
200,Joe,Biden,"515-123-4444","21-JAN-21",PRESIDENT
206,William,Gietz,"515-123-8181","20-JAN-17",AC_ACCOUNT
Working with CSV files in Python: Advanced Methods
The reason why I classified this section from the previous and gave the name Advanced Methods, you'll soon get the idea of it.
In this section, I'm gonna cover some major tasks that come under CSV file handling with csv module and it will be useful when you're handling a CSV file consisting of a variety and a large amount of data.
Throughout this section, I will use this 'Python_Trends.csv' file which consists of data related to several keywords on python programming, which are searched by people on google.
Here is the data of 'Python_Trends.csv' file.
CSV file of Keyword Research on Python Language |
On my Linux machine, I tried to open this file using 'utf-8' encoding technique (which is by default) but the program was giving such an error.
"UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte"
Many applications produce utf-16 encoded file as needed because not every character are listed in the utf-8 encoding.
The problem can be solved by using 'utf-16' encoding technique while opening the file through programming. Let's see how it's possible.
Read a CSV file using UTF-16 encoding in Python
The code will be the same as the previous reading method but will have to define an additional parameter such as "encoding = 'utf-16'", when opening the CSV file using the 'open ()' method.
Here is the code for you.
Code
import csv
# Opening the CSV file in read mode(by-default)
with open('Python_Trends.csv', encoding='utf-16') as f:
# csv.reader object
reader = csv.reader(f)
# column names
headers = next(reader)
# printing the column names
print(headers, '\n')
# printing the data from the reader object
for row in reader:
print(row)
Output
['Keyword', 'Avg. monthly searches', 'Competition', 'CPC(Low)', 'CPC(High)']
['python programming', '500000', 'Low', '7.15', '186.04']
['python', '500000', 'Low', '9.64', '197.46']
['learn python', '50000', 'Medium', '30.55', '486.5']
['python ide', '50000', 'Low', '86.4', '295.79']
['guido van rossum', '50000', 'Low', '', '']
['python language', '50000', 'Low', '4.67', '35.96']
['python gui', '50000', 'Low', '3.89', '272.44']
['python coding', '50000', 'Low', '16.35', '382.19']
['python course', '50000', 'Medium', '22.57', '272.44']
['python certification', '50000', 'Medium', '41.84', '760.5']
['python basics', '50000', 'Low', '5.45', '82.51']
['python 3.7', '5000', 'Low', '5.14', '203.94']
['fibonacci series in python', '50000', 'Low', '3.11', '6.23']
['python for beginners', '50000', 'Medium', '11.68', '273.22']
['automate the boring stuff with python', '50000', 'Low', '28.02', '153.34']
['python ide online', '50000', 'Low', '91.07', '296.57']
['python excel', '5000', 'Low', '8.23', '115.98']
['python developer', '50000', 'Low', '12.45', '467.04']
['learn python the hard way', '5000', 'Low', '27', '565.9']
See, the data looks pretty complex. This will be further complicated when the amount of data with different variations increases.
In data science, we often use complex data, in which case, we must use the 'utf-16' encoding technique.
Reading CSV using csv with optional arguments
Remember, at first we worked with a CSV file named 'employee.csv'. I made a minor change there for showing you some interesting facts. Let's see the data stored there.
ID Name Surname Contact Hire_Date
198 "Donald" OConnell 650-507-9833 21-JUN, 2007
200 Jennifer's Whalen 515-123-4444 17-SEP, 2003
Look, I marked every change I made there. Now, in this file, the space(' ') is delimiter(which means two values are separated by a space(' ') character).
The orange mark: "Donald" is a name(or value) that is double-quoted.
The green mark: an apostrophe(') is present in the word "Jennifer's"
The yellow mark: There are two dates where the year is separated by a comma from day and month.
Now we will tell the program to do the following task.
1. Take the space(' ') character as the delimiter by passing " delimiter=' ' " parameter. Comma is the default delimiter. You can specify a semicolon(';'), tab('\t'), or pipe('|') too there, if needed.
2. Ignore the comma(' , ') in the date section, by passing " escapechar=',' " parameter.
Code
import csv
# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
reader = csv.reader(f, delimiter=' ', escapechar=',')
# header of the columns
header = next(reader)
print(header,'\n')
# printing all the rows of values
for row in reader:
print(row)
Output
['ID', 'Name', 'Surname', 'Contact', 'Hire_Date']
['198', 'Donald', 'OConnell', '650-507-9833', '21-JUN 2007']
['200', "Jennifer's", 'Whalen', '515-123-4444', '17-SEP 2003']
Try to find out what has changed in the output.
Summary
Throughout this tutorial, we have learned how to read and write CSV files in Python. We talked over basic and advanced methods with different examples about this topic.
Some applications create UTF-16 encoded CSV files as needed. In this situation, we have to use UTF-16 encoding instead of UTF-8.
There are different ways to parse CSV files. These libraries can be used when data will become complex and huge: pandas, ANTLR(a powerful parser generator), PLY(parsing tool), PyPlus(A library that is a collection of python extensions), etc.
If string manipulation(for reading and writing methods) is becoming complex and simple methods are not working, regular expression is the only way to get rid of that situation.
Thanks for reading!💙
PySeek