Introduction
Wikipedia is an online encyclopedia and one of the most popular sites for looking up information. Few internet users have never heard of it.
Most of us have used it in a browser, but here you are going to learn how to extract information about a specific topic from Wikipedia using Python. To do so, we will use a Python library called wikipedia, which makes it easy to pull data from the site. So, without further ado, let's get to the main topic.
Requirements
Install wikipedia: pip install wikipedia
Get the introduction part
Let's start by getting the summary of a title (provided the queried topic is available on Wikipedia). Here I call the summary() function from the wikipedia module and pass it two arguments.
1. title: The title of the topic.
2. sentences: The number of sentences to extract. In the program we pass 2, which asks for the first two sentences of the summary.
import wikipedia
result = wikipedia.summary(title="Kevin Mitnick", sentences=2)
print(result)
Output
Kevin David Mitnick (born August 6, 1963) is an American computer security consultant, author, and convicted hacker. He is best known for his high-profile 1995 arrest and five years in prison for various computer and communications-related crimes.Mitnick's pursuit, arrest, trial, and sentence along with the associated journalism, books, and films were all controversial.He now runs the security firm Mitnick Security Consulting, LLC. He is also the Chief Hacking Officer and part owner of the security awareness training company KnowBe4, as well as an active advisory board member at Zimperium, a firm that develops a mobile intrusion prevention system.
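Not every title resolves as cleanly as this one. Some titles match several pages, and some match none. Here is a minimal sketch of how you might guard against both cases. The helper name safe_summary and its optional wiki parameter are my own, made up for illustration (the parameter only exists so you can pass a stand-in object while testing). DisambiguationError and PageError are the exception classes defined in wikipedia.exceptions.

```python
def safe_summary(title, sentences=2, wiki=None):
    """Return a short summary, or a readable message if the lookup fails.

    `safe_summary` and the `wiki` parameter are illustrative names, not part
    of the wikipedia library. Leave `wiki` unset to use the real module.
    """
    if wiki is None:
        import wikipedia as wiki  # requires: pip install wikipedia

    try:
        return wiki.summary(title, sentences=sentences)
    except wiki.exceptions.DisambiguationError as err:
        # The title matches several pages; err.options lists the candidates.
        choices = ", ".join(err.options[:5])
        return f"'{title}' is ambiguous. Try one of: {choices}"
    except wiki.exceptions.PageError:
        # No page matched the title.
        return f"No Wikipedia page found for '{title}'."
```

You would call it the same way as before, for example print(safe_summary("Kevin Mitnick")). Returning a message instead of raising keeps a small script running, but in a larger program you may prefer to let the exception propagate.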
Search the title and get the suggested names
In this case, we pass the query and the number of suggestions we want as arguments to the search() function. The program returns a list of up to that many page titles matching the query (the same as the search results on the website).
import wikipedia
result = wikipedia.search(query="London", results=5)
print(result)
Output
['London', 'Greater London', 'Lauren London', 'London, Ontario', 'London Underground']
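Since search() gives back a plain list of titles, you can feed one of them straight into summary(). The sketch below does that for the first hit. The function name summarize_first_hit and the wiki parameter are hypothetical names of mine (again, the parameter is only there so the function can be tried with a stand-in object). It passes auto_suggest=False so that summary() looks up the exact title that search() returned.

```python
def summarize_first_hit(query, sentences=2, wiki=None):
    """Search for `query` and summarize the first matching title.

    Returns a (title, summary) tuple, or None if the search finds nothing.
    `summarize_first_hit` and `wiki` are illustrative names only.
    """
    if wiki is None:
        import wikipedia as wiki  # requires: pip install wikipedia

    titles = wiki.search(query, results=5)
    if not titles:
        return None

    first = titles[0]
    # auto_suggest=False asks for this exact title rather than a guessed one.
    return first, wiki.summary(first, sentences=sentences, auto_suggest=False)
```

Keep in mind that the first hit can itself be an ambiguous title, in which case summary() raises a DisambiguationError, so this is only a starting point.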
Get the list of links on a Wikipedia page
Here, we will get the titles of the Wikipedia pages that a given page links to.
import wikipedia
# wikipedia page object
page_object = wikipedia.page(title="London")
# print page title
print(page_object.original_title)
# printing links on the page object
print(page_object.links[0:10])
Output
London
['.london', '101 Dalmatians (1996 film)', '10 Downing Street', '122 Leadenhall Street', '15 February 2003 anti-war protests', '1854 Broad Street cholera outbreak', '1896 Summer Olympics', '18th-century London', '1900 Summer Olympics', '1904 Summer Olympics']
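The links attribute is an ordinary Python list of strings, so you can slice it (as above) or filter it like any other list. As a small sketch, here is one way to keep only the links that mention a keyword. The helper name links_containing is made up for this example, and the sample list is simply the ten titles printed above, so the snippet runs without any network access.

```python
def links_containing(links, keyword):
    """Return the link titles that contain `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [title for title in links if keyword in title.lower()]


# The first ten links from the sample output above.
sample_links = [
    ".london",
    "101 Dalmatians (1996 film)",
    "10 Downing Street",
    "122 Leadenhall Street",
    "15 February 2003 anti-war protests",
    "1854 Broad Street cholera outbreak",
    "1896 Summer Olympics",
    "18th-century London",
    "1900 Summer Olympics",
    "1904 Summer Olympics",
]

print(links_containing(sample_links, "olympics"))
# ['1896 Summer Olympics', '1900 Summer Olympics', '1904 Summer Olympics']
```

With a real page you would pass page_object.links instead of sample_links.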
Change the language of the Wikipedia page
Now we will get the summary of 'London' from Wikipedia, but in French instead of English. To do so, we pass the short code of the language we want (in our case "fr" for French) as an argument to the set_lang() function.
import wikipedia
# setting language to French
wikipedia.set_lang("fr")
# printing the summary
print(wikipedia.summary(title="London", sentences=5))
Output
Londres (/lɔ̃dʁ/ ; en anglais : London, /ˈlʌndən/ ) est la capitale et plus grande ville d'Angleterre et du Royaume-Uni,. La ville est située près de l'estuaire de la Tamise dans le sud-est de l'Angleterre. Londinium est fondée par les Romains il y a presque 2 000 ans. La Cité de Londres, le noyau historique de Londres avec une superficie de seulement 1,12 miles carrés (2,9 km2) conserve des frontières qui suivent de près ses limites médiévales. Londres est gouvernée par le maire de Londres et l'Assemblée de Londres.
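One thing to keep in mind is that set_lang() changes the language for every call that follows, not just the next one. If you want summaries in several languages, a rough sketch could look like the one below. The function name summaries_by_language and the wiki parameter are hypothetical, and switching back to "en" at the end assumes English was the language in use before.

```python
def summaries_by_language(title, languages, sentences=1, wiki=None):
    """Fetch a summary of `title` in each language code in `languages`.

    Returns a dict mapping language code to summary text.
    `summaries_by_language` and `wiki` are illustrative names only.
    """
    if wiki is None:
        import wikipedia as wiki  # requires: pip install wikipedia

    results = {}
    try:
        for code in languages:
            wiki.set_lang(code)  # applies to all later calls
            results[code] = wiki.summary(title, sentences=sentences)
    finally:
        # Assumption: English was the language in use before this function ran.
        wiki.set_lang("en")
    return results
```

For example, summaries_by_language("London", ["fr", "de"]) would request the French and German summaries. A title may not exist under the same name in every language edition, so a lookup can still fail for some codes.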
Get a suggestion for a spelling mistake
Suppose you misspell the query in the program. The result may not be what you wanted. There is a function named suggest() that helps find the correct name for a query. Look at the program below, where I deliberately misspelled a word. Let's see how our program returns the correct name as a result.
import wikipedia
results = wikipedia.suggest(query="mortgag")
print(results)
Output
mortgage
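When the library has nothing better to offer, suggest() returns None, so it is worth falling back to the original text in that case. Here is a minimal sketch of that idea. The function name corrected_query and the wiki parameter are my own illustrative names, not part of the library.

```python
def corrected_query(query, wiki=None):
    """Return the suggested spelling for `query`, or `query` itself if none.

    `corrected_query` and `wiki` are illustrative names only.
    """
    if wiki is None:
        import wikipedia as wiki  # requires: pip install wikipedia

    suggestion = wiki.suggest(query)
    # suggest() returns None when there is no suggestion for the query.
    return suggestion if suggestion else query
```

You could then pass the return value on to search() or summary() without checking for None each time.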
Summary
Today, we learned how to get data about a topic from Wikipedia using Python, without even visiting the website. We covered several examples above using the wikipedia library, which is the most conventional way to do the task. We can also get richer data using another method, web scraping. We can talk about this later.
That's all for today. If you have any questions, leave a comment below. You will get a reply soon.
Thanks for reading!💙
PySeek