Introduction
In this tutorial, we are going to work with CSV (comma-separated values) files using Python Language. There are two python tools, csv, and pandas that offer CSV parsing (performing several operations with CSV files). But here, we will use the built-in library, 'csv' to deal with CSV files.
If you are not so familiar with File Handling in Python, I'd recommend brushing up on your basics a bit.
What is a CSV file?
A quick recap: CSV file or Comma-Separated file is a plain text file where values are separated by a comma (',') which is also called a delimiter.
Not only a comma but there are many more characters which are also work as delimiters for CSV files.
Examples of delimiter
comma(','), semicolon(';'), tab('\t'), pipe('|'), and space(' '). Among these separator choices, comma(',') is most frequently used.
Values are stored in a CSV file in rows. The very first row specifies as header or name of the column. The next lines represent values or data. See the image below for a better understanding.
How to read this Tutorial?
I have divided this tutorial into two parts, Basic and Advanced. Most people just find the code example and if this is you, visit the Basic part.
People who want to dig the topic more deeper, go through both Basic and Advanced parts. Both choices are offered.
Learn Also: Working with PDF Files in Python - Complete Tutorial
Read and Write CSV files in Python: Basic Methods
In this section, we will perform several operations with CSV files using simple methods. The reason you will get shortly why I'm using the word simple.
Throughout this section, I will use this 'employee.csv' file which contains some data and it look like the following.
ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,OConnell,650-507-9833,21-JUN-07,SH_CLERK
200,Jennifer,Whalen,515-123-4444,17-SEP-03,AD_ASST
206,William,Gietz,515-123-8181,07-JUN-02,AC_ACCOUNT
Read a CSV file in Python using csv.reader()
Reading a CSV file is pretty straightforward. You just need to follow these simple steps below:
- Import the csv module.
- Open the CSV file in read mode (by default).
- Declare a csv.reader object.
- Iterate through the reader object and print the rows.
Important Note |
---|
Remember one thing, the reader object is iterable object; so you can read data from it using a for loop. |
Code
import csv
# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
# reader object
reader = csv.reader(f)
# header of the columns
header = next(reader)
print(header,'\n')
# printing all the rows of values
for row in reader:
print(row)
Output
['ID', 'Name', 'Surname', 'Contact', 'Hire_Date', 'Job_ID']
['198', 'Donald', 'OConnell', '650-507-9833', '21-JUN-07', 'SH_CLERK']
['200', 'Jennifer', 'Whalen', '515-123-4444', '17-SEP-03', 'AD_ASST']
['206', 'William', 'Gietz', '515-123-8181', '07-JUN-02', 'AC_ACCOUNT']
['ID', 'Name', 'Surname', 'Contact', 'Hire_Date', 'Job_ID']
Read a CSV file in Python using csv.DictReader()
csv.reader() returns an iterable object which contains the values in list format. But there is a class, csv.DictReader(), which offers reading CSV file data into a Python Dictionary.
The main advantage of using the second method is that you get an ordered collection of data to work with.
Code
import csv
# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
# DictReader object
reader = csv.DictReader(f)
# printing all the rows of values
for row in reader:
print(row)
Output
{'ID': '198', 'Name': 'Donald', 'Surname': 'OConnell', 'Contact': '650-507-9833', 'Hire_Date': '21-JUN-07', 'Job_ID': 'SH_CLERK'}
{'ID': '200', 'Name': 'Jennifer', 'Surname': 'Whalen', 'Contact': '515-123-4444', 'Hire_Date': '17-SEP-03', 'Job_ID': 'AD_ASST'}
{'ID': '206', 'Name': 'William', 'Surname': 'Gietz', 'Contact': '515-123-8181', 'Hire_Date': '07-JUN-02', 'Job_ID': 'AC_ACCOUNT'}
{'ID': '208', 'Name': 'Alex', 'Surname': 'Jones', 'Contact': '-7793', 'Hire_Date': '07-JAN-02', 'Job_ID': 'AD_ASST'}
{'ID': '209', 'Name': 'Peter', 'Surname': 'Morgan', 'Contact': '-7769', 'Hire_Date': '08-FEB-04', 'Job_ID': 'AD_ASST'}
Writing to CSV file in Python using csv.writer()
Here, we will write data or values to a CSV file using python. Look a glimpse at the data in 'employee.csv' above. We'll add two more data rows right next to the existing one.
In the code below, you'll see the CSV file has opened in append ('a') mode. In this case, new data will be added next to the existing data.
If you want to rewrite the existing data with new, use write('w') mode instead of append mode ('a').
Code
import csv
# declaring the headers or column names
headers = ['ID','Name','Surname','Contact','Hire_Date','Job_ID']
# values, which will be added
rows = [[208, 'Alex', 'Jones',512-123-8182,'07-JAN-02','AD_ASST'],
[209, 'Peter', 'Morgan',650-231-8188,'08-FEB-04','AD_ASST']
]
# opening the CSV file in append mode
with open('employee.csv','a') as f:
# writer object
f_csv = csv.writer(f)
# writing the headings
# f_csv.writerow(headers)
# writing the rows or values
f_csv.writerows(rows)
Output
ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,OConnell,650-507-9833,21-JUN-07,SH_CLERK
200,Jennifer,Whalen,515-123-4444,17-SEP-03,AD_ASST
206,William,Gietz,515-123-8181,07-JUN-02,AC_ACCOUNT
208,Alex,Jones,-7793,07-JAN-02,AD_ASST
209,Peter,Morgan,-7769,08-FEB-04,AD_ASST
Look, the last two lines are newly added.
See the yellow line in the above code. I've commented this line. Let me tell you one interesting fact about when you are writing to a CSV file using the csv module.
.writerow() method is used for adding only one row of data and .writerows() adds multiple rows presented in a python list or dictionary or tuple, etc.
Since we applied the append mode (on the 'employee.csv' file which is not empty), there is no need to add the header rows more than one time. That is why I commented there (the yellow line).
If you are thinking about creating a new file or rewriting the data again, you must add the header rows. In this case, remove the hash tag ('#') from there (the yellow line).
Write Python Dictionary to CSV using csv.DictWriter()
In the previous example, we write a list of data in a CSV file (without heading) using python. We used append mode there to open the CSV.
In this example, we are gonna write a dictionary of data to a CSV file with headers included. In this case, we will save data to a new file ('employee_dict.csv') and use write ('w') method instead of append ('a').
Code
import csv
# declaring the headers or column names
headers = ['ID','Name','Surname','Contact','Hire_Date','Job_ID']
# values, which will be added
rows = [{'ID':198, 'Name':'Donald', 'Surname':'Trump',
'Contact':'650-507-9833', 'Hire_Date':'21-JUN-07', 'Job_ID':'EX_PRESIDENT'},
{'ID':200, 'Name':'Joe', 'Surname':'Biden',
'Contact':'515-123-4444', 'Hire_Date':'21-JAN-21', 'Job_ID':'PRESIDENT'},
{'ID':206, 'Name':'William', 'Surname':'Gietz',
'Contact':'515-123-8181', 'Hire_Date':'20-JAN-17', 'Job_ID':'AC_ACCOUNT'},
]
# opening the CSV file in write mode
with open('employee_dict.csv','w') as f:
# dictwriter object
writer = csv.DictWriter(f, headers)
# writing the header
writer.writeheader()
# writing the rows or values
writer.writerows(rows)
Output
After adding the data to the CSV file ('employee_dict.csv') looks like this.
ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,Trump,"650-507-9833","21-JUN-07",EX_PRESIDENT
200,Joe,Biden,"515-123-4444","21-JAN-21",PRESIDENT
206,William,Gietz,"515-123-8181","20-JAN-17",AC_ACCOUNT
Working with CSV files in Python: Advanced Methods
The reason why I classified this section from the previous and gave the name Advanced Methods, you'll soon get the idea of it.
In this section, I'm gonna cover some major tasks that come under CSV file handling using the 'csv' module and it will be useful when you're handling a CSV file consisting of a variety and a large amount of data.
Throughout this section, I will use this 'Python_Trends.csv' file which consists of data related to several keywords on python programming, which are searched by people on Google.
Here is the data of 'Python_Trends.csv' file.
Python_Trends.csv |
On my Linux machine, I tried to open this file using 'utf-8' encoding technique (which is by default) but the program was giving such an error.
"UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte"
Many applications produce utf-16 encoded file as needed because not every character are listed in the utf-8 encoding. The problem can be solved by using 'utf-16' encoding technique while opening the file through programming. Let's see how it's possible.
Read a CSV file using UTF-16 encoding in Python
The code will be the same as the previous reading method but will have to define an additional parameter such as "encoding = 'utf-16'", when opening the CSV file using the 'open ()' method.
Here is the code for you.
Code
import csv
# Opening the CSV file in read mode(by-default)
with open('Python_Trends.csv', encoding='utf-16') as f:
# csv.reader object
reader = csv.reader(f)
# column names
headers = next(reader)
# printing the column names
print(headers, '\n')
# printing the data from the reader object
for row in reader:
print(row)
Output
['Keyword', 'Avg. monthly searches', 'Competition', 'CPC(Low)', 'CPC(High)']
['python programming', '500000', 'Low', '7.15', '186.04']
['python', '500000', 'Low', '9.64', '197.46']
['learn python', '50000', 'Medium', '30.55', '486.5']
['python ide', '50000', 'Low', '86.4', '295.79']
['guido van rossum', '50000', 'Low', '', '']
['python language', '50000', 'Low', '4.67', '35.96']
['python gui', '50000', 'Low', '3.89', '272.44']
['python coding', '50000', 'Low', '16.35', '382.19']
['python course', '50000', 'Medium', '22.57', '272.44']
['python certification', '50000', 'Medium', '41.84', '760.5']
['python basics', '50000', 'Low', '5.45', '82.51']
['python 3.7', '5000', 'Low', '5.14', '203.94']
['fibonacci series in python', '50000', 'Low', '3.11', '6.23']
['python for beginners', '50000', 'Medium', '11.68', '273.22']
['automate the boring stuff with python', '50000', 'Low', '28.02', '153.34']
['python ide online', '50000', 'Low', '91.07', '296.57']
['python excel', '5000', 'Low', '8.23', '115.98']
['python developer', '50000', 'Low', '12.45', '467.04']
['learn python the hard way', '5000', 'Low', '27', '565.9']
See, the data looks pretty complex. This will be further complicated when the amount of data with different variations increases.
In Data Science, we often use complex data, in which case, we must use the 'utf-16' encoding technique.
Reading CSV using csv with optional arguments
Remember, at first we worked with a CSV file named 'employee.csv'. I made a minor change there for showing you some interesting facts. Let's see the data stored there.
ID Name Surname Contact Hire_Date
198 "Donald" OConnell 650-507-9833 21-JUN, 2007
200 Jennifer's Whalen 515-123-4444 17-SEP, 2003
Look, I marked every change I made there. Now, in this file, the space(' ') is delimiter (which means two values are separated by a space (' ') character).
The orange mark: "Donald" is a name (or value) that is double-quoted.
The green mark: an apostrophe (') is present in the word "Jennifer's"
The yellow mark: There are two dates where the year is separated by a comma from day and month.
Now we will tell the program to do the following task.
1. Take the space(' ') character as the delimiter by passing " delimiter=' ' " parameter. Comma is the default delimiter. You can also specify a semicolon(';'), tab('\t'), or pipe('|') there, if needed.
2. Ignore the comma(' , ') in the date section, by passing " escapechar=',' " parameter.
Code
import csv
# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
reader = csv.reader(f, delimiter=' ', escapechar=',')
# header of the columns
header = next(reader)
print(header,'\n')
# printing all the rows of values
for row in reader:
print(row)
Output
['ID', 'Name', 'Surname', 'Contact', 'Hire_Date']
['198', 'Donald', 'OConnell', '650-507-9833', '21-JUN 2007']
['200', "Jennifer's", 'Whalen', '515-123-4444', '17-SEP 2003']
Try to find out what has changed in the output.
Learn Also: Regular Expression in Python - Complete Tutorial
Summary
Throughout this tutorial, we have learned how to read and write CSV files using Python. We talked over Basic and Advanced methods with different programming examples about this topic.
Some applications create 'UTF-16' encoded CSV files as needed. In this situation, we have to use 'UTF-16' encoding instead of 'UTF-8'.
There are different ways to parse CSV files. These libraries can be used when data will become complex and huge: pandas, ANTLR (a powerful parser generator), PLY (parsing tool), PyPlus (A library that is a collection of python extensions), etc.
If String Manipulation (for reading and writing methods) is becoming complex and simple methods are not working, regular expression is the only way to get rid of that situation.
Thanks for reading!💙
PySeek