Working with CSV files in Python - Complete Tutorial

Introduction

In this tutorial, we are going to work with CSV (comma-separated values) files using Python Language. There are two python tools, csv, and pandas that offer CSV parsing (performing several operations with CSV files). But here, we will use the built-in library, 'csv' to deal with CSV files.

If you are not so familiar with File Handling in Python, I'd recommend brushing up on your basics a bit.

What is a CSV file? 

A quick recap: CSV file or Comma-Separated file is a plain text file where values are separated by a comma (',') which is also called a delimiter.

Not only a comma but there are many more characters which are also work as delimiters for CSV files. 

Examples of delimiter

comma(','), semicolon(';'), tab('\t'), pipe('|'), and space(' '). Among these separator choices, comma(',') is most frequently used.

Values are stored in a CSV file in rows. The very first row specifies as header or name of the column. The next lines represent values or data. See the image below for a better understanding.

An example of CSV file
A CSV File Example

How to read this Tutorial?

I have divided this tutorial into two parts, Basic and Advanced. Most people just find the code example and if this is you, visit the Basic part.

People who want to dig the topic more deeper, go through both Basic and Advanced parts. Both choices are offered.

Learn Also: Working with PDF Files in Python - Complete Tutorial

Read and Write CSV files in Python: Basic Methods

In this section, we will perform several operations with CSV files using simple methods. The reason you will get shortly why I'm using the word simple.

Throughout this section, I will use this 'employee.csv' file which contains some data and it look like the following.

ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,OConnell,650-507-9833,21-JUN-07,SH_CLERK
200,Jennifer,Whalen,515-123-4444,17-SEP-03,AD_ASST
206,William,Gietz,515-123-8181,07-JUN-02,AC_ACCOUNT

Read a CSV file in Python using csv.reader()

Reading a CSV file is pretty straightforward. You just need to follow these simple steps below:

  1. Import the csv module.
  2. Open the CSV file in read mode (by default).
  3. Declare a csv.reader object.
  4. Iterate through the reader object and print the rows.
Important Note
Remember one thing, the reader object is iterable object; so you can read data from it using a for loop.

Code


import csv

# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
# reader object
reader = csv.reader(f)
# header of the columns
header = next(reader)
print(header,'\n')

# printing all the rows of values
for row in reader:
print(row)

Output

['ID', 'Name', 'Surname', 'Contact', 'Hire_Date', 'Job_ID']

['198', 'Donald', 'OConnell', '650-507-9833', '21-JUN-07', 'SH_CLERK']
['200', 'Jennifer', 'Whalen', '515-123-4444', '17-SEP-03', 'AD_ASST']
['206', 'William', 'Gietz', '515-123-8181', '07-JUN-02', 'AC_ACCOUNT']
['ID', 'Name', 'Surname', 'Contact', 'Hire_Date', 'Job_ID']

Read a CSV file in Python using csv.DictReader()

csv.reader() returns an iterable object which contains the values in list format. But there is a class, csv.DictReader(), which offers reading CSV file data into a Python Dictionary.

The main advantage of using the second method is that you get an ordered collection of data to work with.

Code


import csv

# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
# DictReader object
reader = csv.DictReader(f)

# printing all the rows of values
for row in reader:
print(row)

Output

{'ID': '198', 'Name': 'Donald', 'Surname': 'OConnell', 'Contact': '650-507-9833', 'Hire_Date': '21-JUN-07', 'Job_ID': 'SH_CLERK'}
{'ID': '200', 'Name': 'Jennifer', 'Surname': 'Whalen', 'Contact': '515-123-4444', 'Hire_Date': '17-SEP-03', 'Job_ID': 'AD_ASST'}
{'ID': '206', 'Name': 'William', 'Surname': 'Gietz', 'Contact': '515-123-8181', 'Hire_Date': '07-JUN-02', 'Job_ID': 'AC_ACCOUNT'}
{'ID': '208', 'Name': 'Alex', 'Surname': 'Jones', 'Contact': '-7793', 'Hire_Date': '07-JAN-02', 'Job_ID': 'AD_ASST'}
{'ID': '209', 'Name': 'Peter', 'Surname': 'Morgan', 'Contact': '-7769', 'Hire_Date': '08-FEB-04', 'Job_ID': 'AD_ASST'}

Writing to CSV file in Python using csv.writer()

Here, we will write data or values to a CSV file using python. Look a glimpse at the data in 'employee.csv' above. We'll add two more data rows right next to the existing one.

In the code below, you'll see the CSV file has opened in append ('a') mode. In this case, new data will be added next to the existing data. 

If you want to rewrite the existing data with new, use write('w') mode instead of append mode ('a').

Code


import csv

# declaring the headers or column names
headers = ['ID','Name','Surname','Contact','Hire_Date','Job_ID']
# values, which will be added
rows = [[208, 'Alex', 'Jones',512-123-8182,'07-JAN-02','AD_ASST'],
[209, 'Peter', 'Morgan',650-231-8188,'08-FEB-04','AD_ASST']
]

# opening the CSV file in append mode
with open('employee.csv','a') as f:
# writer object
f_csv = csv.writer(f)
# writing the headings
# f_csv.writerow(headers)
# writing the rows or values
f_csv.writerows(rows)

Output

ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,OConnell,650-507-9833,21-JUN-07,SH_CLERK
200,Jennifer,Whalen,515-123-4444,17-SEP-03,AD_ASST
206,William,Gietz,515-123-8181,07-JUN-02,AC_ACCOUNT
208,Alex,Jones,-7793,07-JAN-02,AD_ASST
209,Peter,Morgan,-7769,08-FEB-04,AD_ASST

Look, the last two lines are newly added.

See the yellow line in the above code. I've commented this line. Let me tell you one interesting fact about when you are writing to a CSV file using the csv module.

.writerow() method is used for adding only one row of data and .writerows() adds multiple rows presented in a python list or dictionary or tuple, etc.

Since we applied the append mode (on the 'employee.csv' file which is not empty), there is no need to add the header rows more than one time. That is why I commented there (the yellow line). 

If you are thinking about creating a new file or rewriting the data again, you must add the header rows. In this case, remove the hash tag ('#') from there (the yellow line).

Write Python Dictionary to CSV using csv.DictWriter()

In the previous example, we write a list of data in a CSV file (without heading) using python. We used append mode there to open the CSV.

In this example, we are gonna write a dictionary of data to a CSV file with headers included. In this case, we will save data to a new file ('employee_dict.csv') and use write ('w') method instead of append ('a').

Code


import csv

# declaring the headers or column names
headers = ['ID','Name','Surname','Contact','Hire_Date','Job_ID']
# values, which will be added
rows = [{'ID':198, 'Name':'Donald', 'Surname':'Trump',
'Contact':'650-507-9833', 'Hire_Date':'21-JUN-07', 'Job_ID':'EX_PRESIDENT'},
{'ID':200, 'Name':'Joe', 'Surname':'Biden',
'Contact':'515-123-4444', 'Hire_Date':'21-JAN-21', 'Job_ID':'PRESIDENT'},
{'ID':206, 'Name':'William', 'Surname':'Gietz',
'Contact':'515-123-8181', 'Hire_Date':'20-JAN-17', 'Job_ID':'AC_ACCOUNT'},
]

# opening the CSV file in write mode
with open('employee_dict.csv','w') as f:
# dictwriter object
writer = csv.DictWriter(f, headers)
# writing the header
writer.writeheader()
# writing the rows or values
writer.writerows(rows)

Output

After adding the data to the CSV file ('employee_dict.csv') looks like this.

ID,Name,Surname,Contact,Hire_Date,Job_ID
198,Donald,Trump,"650-507-9833","21-JUN-07",EX_PRESIDENT
200,Joe,Biden,"515-123-4444","21-JAN-21",PRESIDENT
206,William,Gietz,"515-123-8181","20-JAN-17",AC_ACCOUNT

Working with CSV files in Python: Advanced Methods

The reason why I classified this section from the previous and gave the name Advanced Methods, you'll soon get the idea of it.

In this section, I'm gonna cover some major tasks that come under CSV file handling using the 'csv' module and it will be useful when you're handling a CSV file consisting of a variety and a large amount of data.

Throughout this section, I will use this 'Python_Trends.csv' file which consists of data related to several keywords on python programming, which are searched by people on Google.

Here is the data of 'Python_Trends.csv' file.

A CSV file of keyword research on python programming language - PySeek
Python_Trends.csv

On my Linux machine, I tried to open this file using 'utf-8' encoding technique (which is by default) but the program was giving such an error.

"UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte"

Many applications produce utf-16 encoded file as needed because not every character are listed in the utf-8 encoding. The problem can be solved by using 'utf-16' encoding technique while opening the file through programming. Let's see how it's possible.

Read a CSV file using UTF-16 encoding in Python

The code will be the same as the previous reading method but will have to define an additional parameter such as "encoding = 'utf-16'", when opening the CSV file using the 'open ()' method.

Here is the code for you.

Code


import csv

# Opening the CSV file in read mode(by-default)
with open('Python_Trends.csv', encoding='utf-16') as f:
# csv.reader object
reader = csv.reader(f)
# column names
headers = next(reader)
# printing the column names
print(headers, '\n')

# printing the data from the reader object
for row in reader:
print(row)

Output

['Keyword', 'Avg. monthly searches', 'Competition', 'CPC(Low)', 'CPC(High)']

['python programming', '500000', 'Low', '7.15', '186.04']
['python', '500000', 'Low', '9.64', '197.46']
['learn python', '50000', 'Medium', '30.55', '486.5']
['python ide', '50000', 'Low', '86.4', '295.79']
['guido van rossum', '50000', 'Low', '', '']
['python language', '50000', 'Low', '4.67', '35.96']
['python gui', '50000', 'Low', '3.89', '272.44']
['python coding', '50000', 'Low', '16.35', '382.19']
['python course', '50000', 'Medium', '22.57', '272.44']
['python certification', '50000', 'Medium', '41.84', '760.5']
['python basics', '50000', 'Low', '5.45', '82.51']
['python 3.7', '5000', 'Low', '5.14', '203.94']
['fibonacci series in python', '50000', 'Low', '3.11', '6.23']
['python for beginners', '50000', 'Medium', '11.68', '273.22']
['automate the boring stuff with python', '50000', 'Low', '28.02', '153.34']
['python ide online', '50000', 'Low', '91.07', '296.57']
['python excel', '5000', 'Low', '8.23', '115.98']
['python developer', '50000', 'Low', '12.45', '467.04']
['learn python the hard way', '5000', 'Low', '27', '565.9']

See, the data looks pretty complex. This will be further complicated when the amount of data with different variations increases.

In Data Science, we often use complex data, in which case, we must use the 'utf-16' encoding technique.

Reading CSV using csv with optional arguments

Remember, at first we worked with a CSV file named 'employee.csv'. I made a minor change there for showing you some interesting facts. Let's see the data stored there.

ID Name Surname Contact Hire_Date
198 "Donald" OConnell 650-507-9833 21-JUN, 2007
200 Jennifer's Whalen 515-123-4444 17-SEP, 2003

Look, I marked every change I made there. Now, in this file, the space(' ') is delimiter (which means two values are separated by a space (' ') character).

The orange mark: "Donald" is a name (or value) that is double-quoted.

The green mark: an apostrophe (') is present in the word "Jennifer's"

The yellow mark: There are two dates where the year is separated by a comma from day and month.

Now we will tell the program to do the following task.

1. Take the space(' ') character as the delimiter by passing " delimiter=' ' " parameter. Comma is the default delimiter. You can also specify a semicolon(';'), tab('\t'), or pipe('|') there, if needed.

2. Ignore the comma(' , ') in the date section, by passing " escapechar=',' " parameter.

Code


import csv

# opening the CSV file in read mode(by-default)
with open('employee.csv') as f:
reader = csv.reader(f, delimiter=' ', escapechar=',')
# header of the columns
header = next(reader)
print(header,'\n')

# printing all the rows of values
for row in reader:
print(row)

Output

['ID', 'Name', 'Surname', 'Contact', 'Hire_Date']

['198', 'Donald', 'OConnell', '650-507-9833', '21-JUN 2007']
['200', "Jennifer's", 'Whalen', '515-123-4444', '17-SEP 2003']

Try to find out what has changed in the output.

Learn AlsoRegular Expression in Python - Complete Tutorial

Summary

Throughout this tutorial, we have learned how to read and write CSV files using Python. We talked over Basic and Advanced methods with different programming examples about this topic.

Some applications create 'UTF-16' encoded CSV files as needed. In this situation, we have to use 'UTF-16' encoding instead of 'UTF-8'.

There are different ways to parse CSV files. These libraries can be used when data will become complex and huge: pandas, ANTLR (a powerful parser generator), PLY (parsing tool), PyPlus (A library that is a collection of python extensions), etc.

If String Manipulation (for reading and writing methods) is becoming complex and simple methods are not working, regular expression is the only way to get rid of that situation.

Thanks for reading!💙

PySeek

Subhankar Rakshit

Meet Subhankar Rakshit, a Computer Science postgraduate (M.Sc.) and the creator of PySeek. Subhankar is a programmer, specializes in Python language. With a several years of experience under his belt, he has developed a deep understanding of software development. He enjoys writing blogs on various topics related to Computer Science, Python Programming, and Software Development.

Post a Comment (0)
Previous Post Next Post