Learn How to Get Data from Wikipedia using Python


Introduction

Wikipedia is an online encyclopedia and one of the most popular websites for looking up information. Few internet users have never heard of it.

Most of us have used it in a browser, but here you will learn how to extract information about a specific topic from Wikipedia using Python. To do so, we will use a Python library called wikipedia, which makes it easy to pull data from the site. So without further ado, let's get to the main topic.


Requirements

Install wikipedia: pip install wikipedia

Get the introduction part

Let's start by getting the summary of a topic (provided the queried title exists on Wikipedia). The program below calls the summary() function from the wikipedia module and passes it two arguments.

1. title: The title of the topic.

2. sentences: The number of sentences to return. Here we pass 2, which asks for the first two sentences of the topic's summary. The sentence splitting happens on Wikipedia's side and is not always exact, as the output below shows.


import wikipedia

# request the first two sentences of the article's summary
result = wikipedia.summary(title="Kevin Mitnick", sentences=2)

print(result)

Output

Kevin David Mitnick (born August 6, 1963) is an American computer security consultant, author, and convicted hacker. He is best known for his high-profile 1995 arrest and five years in prison for various computer and communications-related crimes.Mitnick's pursuit, arrest, trial, and sentence along with the associated journalism, books, and films were all controversial.He now runs the security firm Mitnick Security Consulting, LLC. He is also the Chief Hacking Officer and part owner of the security awareness training company KnowBe4, as well as an active advisory board member at Zimperium, a firm that develops a mobile intrusion prevention system.
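A lookup like this can fail: the title may not exist, or it may match several pages. Here is a minimal sketch of handling both cases, assuming the library's PageError and DisambiguationError exceptions. The function name safe_summary and its wiki parameter are made up for this example; the parameter only exists so a stand-in module can be passed in while testing.

```python
def safe_summary(title, sentences=2, wiki=None):
    """Return a summary, or a short message when the lookup fails.

    `wiki` is a hypothetical hook for passing in a stand-in module;
    when it is omitted, the real wikipedia library is imported.
    """
    if wiki is None:
        import wikipedia as wiki

    try:
        return wiki.summary(title, sentences=sentences)
    except wiki.DisambiguationError as err:
        # err.options holds the candidate page titles
        choices = ", ".join(err.options[:5])
        return f"'{title}' is ambiguous. Try one of: {choices}"
    except wiki.PageError:
        return f"No Wikipedia page found for '{title}'."
```

Calling print(safe_summary("Kevin Mitnick")) should print the summary as before, while an ambiguous or unknown title gives a readable message instead of a traceback.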

Search the title and get the suggested names

Here we pass the query and the number of results we want to the search() function. The program returns a list of matching page titles for that query, much like the suggestions you see on the Wikipedia website.


import wikipedia

# search() takes the search text as "query" (not "title")
result = wikipedia.search(query="London", results=5)

print(result)

Output

 ['London', 'Greater London', 'Lauren London', 'London, Ontario', 'London Underground']
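The search() function also accepts a suggestion flag. The sketch below assumes that with suggestion=True the call returns a pair (the list of titles and a suggested query, or None) rather than a plain list. The name search_with_hint and the wiki parameter are hypothetical and only used for illustration.

```python
def search_with_hint(query, results=5, wiki=None):
    """Return (titles, suggestion) for a query.

    `wiki` is a hypothetical hook for passing in a stand-in module;
    when it is omitted, the real wikipedia library is imported.
    """
    if wiki is None:
        import wikipedia as wiki

    # With suggestion=True, search() returns a tuple instead of a list:
    # the matching titles plus a suggested query (or None).
    titles, suggestion = wiki.search(query, results=results, suggestion=True)
    return titles, suggestion
```

The second value is handy when the first one comes back empty, because it may hold a better spelling to search for.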

Get the list of page links on a Wikipedia page

Here, we will get the titles of the Wikipedia pages that a given page links to.


import wikipedia

# wikipedia page object
page_object = wikipedia.page(title="London")

# print page title
print(page_object.original_title)

# printing links on the page object
print(page_object.links[0:10])

Output

London
['.london', '101 Dalmatians (1996 film)', '10 Downing Street', '122 Leadenhall Street', '15 February 2003 anti-war protests', '1854 Broad Street cholera outbreak', '1896 Summer Olympics', '18th-century London', '1900 Summer Olympics', '1904 Summer Olympics']
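The page object carries more than links. As a rough sketch, the helper below collects a few of its attributes (title, url, summary and links) into a dictionary. The helper name describe_page, the dictionary keys and the wiki parameter are assumptions made for this example, not part of the library.

```python
def describe_page(title, max_links=5, wiki=None):
    """Collect a few attributes of a Wikipedia page into a dict.

    `wiki` is a hypothetical hook for passing in a stand-in module;
    when it is omitted, the real wikipedia library is imported.
    """
    if wiki is None:
        import wikipedia as wiki

    page = wiki.page(title)
    return {
        "title": page.title,
        "url": page.url,
        # keep only the first 200 characters of the summary
        "summary_start": page.summary[:200],
        "link_count": len(page.links),
        "first_links": page.links[:max_links],
    }
```

For example, describe_page("London")["url"] should give the address of the article, which is useful if you want to cite the source of the text.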

Change the language of the Wikipedia page

Now we will get the summary of 'London' from Wikipedia, but in French instead of English. To do so, we pass the language code (in our case "fr" for French) to the set_lang() function before requesting the summary.


import wikipedia

# setting language to French
wikipedia.set_lang("fr")

# printing the first five sentences of the summary
print(wikipedia.summary(title="London", sentences=5))

Output

Londres (/lɔ̃dʁ/  ; en anglais : London, /ˈlʌndən/ ) est la capitale et plus grande ville d'Angleterre et du Royaume-Uni,. La ville est située près de l'estuaire de la Tamise dans le sud-est de l'Angleterre. Londinium est fondée par les Romains il y a presque 2 000 ans. La Cité de Londres, le noyau historique de Londres avec une superficie de seulement 1,12 miles carrés (2,9 km2) conserve des frontières qui suivent de près ses limites médiévales. Londres est gouvernée par le maire de Londres et l'Assemblée de Londres.
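Keep in mind that set_lang() changes the language for every later call in the same program. One possible way to limit the change to a few lines is a small context manager, sketched below. It assumes the language in use before the switch was English ("en"); the name language and the default and wiki parameters are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def language(code, default="en", wiki=None):
    """Switch the Wikipedia language temporarily, then switch back.

    Assumes `default` is the language that was active before.
    `wiki` is a hypothetical hook for passing in a stand-in module;
    when it is omitted, the real wikipedia library is imported.
    """
    if wiki is None:
        import wikipedia as wiki

    wiki.set_lang(code)
    try:
        yield wiki
    finally:
        # restore the assumed previous language, even after an error
        wiki.set_lang(default)


# Example use (needs an internet connection):
# with language("fr") as wiki:
#     print(wiki.summary("London", sentences=2))
```

After the with block ends, later calls go to the English Wikipedia again.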

Suggestion for Spelling Mistake

If you misspell a query, the result may not be what you wanted. The suggest() function helps here: it returns Wikipedia's suggested spelling for a query. The program below contains a spelling mistake. Let's see how it returns the correct name as a result.


import wikipedia

results = wikipedia.suggest(query="mortgag")

print(results)

Output

mortgage
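When the query is already spelled correctly, suggest() may have nothing to offer and return None. A small sketch of falling back to the original query in that case; best_spelling and its wiki parameter are names invented for this example.

```python
def best_spelling(query, wiki=None):
    """Return the suggested spelling, or the query itself if there is none.

    `wiki` is a hypothetical hook for passing in a stand-in module;
    when it is omitted, the real wikipedia library is imported.
    """
    if wiki is None:
        import wikipedia as wiki

    suggestion = wiki.suggest(query)  # None when there is no suggestion
    return suggestion or query
```

The returned value can then be passed to summary() or page() without first checking for None.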

Summary

Today, we learned how to get data about a topic from Wikipedia using Python, without even opening a browser. We covered several examples above. This is the most conventional way to do the task. It is also possible to collect more detailed data with web scraping, which we can talk about later.

That's all for today. If you have any questions, leave a comment below. You will get a reply soon.

Thanks for reading!💙

PySeek

Subhankar Rakshit

Meet Subhankar Rakshit, a Computer Science postgraduate (M.Sc.) and the creator of PySeek. Subhankar is a programmer who specializes in Python. With several years of experience under his belt, he has developed a deep understanding of software development. He enjoys writing blogs on various topics related to Computer Science, Python Programming, and Software Development.
