This is the third task for T276329, Synchronising Wikidata and Wikipedias using pywikibot, aimed at getting you familiar with parsing information from Wikipedia articles.
- You should already have a Wikimedia account and set up pywikibot (if not, do Tasks 1 and 2 first).
- Set up a script that will connect to Wikipedia, and load the contents of one of the pages you identified in Task 1 (just one for now).
- Parse through the article text to extract the statements you manually found in Task 1. Use whichever tool you would like for this (e.g., 're', or searching for template parameter names in the infobox, etc.). (Yes - this is the first tricky part!)
- Print out the information (extracted from the article, not Wikidata!) alongside the property name (e.g., "P31 = radio telescope"). Code this for at least 6 items in the current article.
- Try for a few other pages as well - how well does your parsing work, and what changes do you need to make for the other pages?
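As a rough starting point, the steps above could be sketched like this. It is only one possible approach, using 're' on simple one-line "| name = value" infobox entries. The parameter-to-property mapping, the helper names and the sample wikitext are all made up for illustration, so adjust them to the infobox your article actually uses. Multi-line values and nested templates are not handled.

```python
import re

# Illustrative mapping from infobox parameter names to Wikidata property IDs.
# This is an assumption for the sketch: check the parameter names in your own article.
PARAM_TO_PROPERTY = {
    "organization": "P137",  # operator
    "location": "P131",      # located in the administrative territorial entity
    "built": "P571",         # inception
    "diameter": "P2386",     # diameter
    "altitude": "P2044",     # elevation above sea level
    "website": "P856",       # official website
}

# Made-up wikitext so the parsing can be tried without a network connection.
SAMPLE_WIKITEXT = """{{Infobox telescope
| name = Example Telescope
| organization = [[Example Observatory]]
| location = [[Exampleshire|Exampleshire, England]]
| built = 1957<ref name="src">Made-up source</ref>
| diameter = 76 m
| altitude = 77 m
| website = https://example.org/?a=b
| wavelength =
}}"""


def load_page_text(title, lang="en"):
    """Fetch the wikitext of a Wikipedia article (needs pywikibot set up)."""
    import pywikibot  # imported here so the parsing helpers work without it

    site = pywikibot.Site(lang, "wikipedia")
    page = pywikibot.Page(site, title)
    return page.text


def extract_infobox_params(wikitext):
    """Return {parameter: raw value} for one-line '| name = value' entries."""
    pattern = re.compile(
        r"^[ \t]*\|[ \t]*([\w -]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
    )
    params = {}
    for match in pattern.finditer(wikitext):
        name, value = match.group(1).lower(), match.group(2)
        if value:  # skip parameters that are present but empty
            params[name] = value
    return params


def clean_value(value):
    """Strip references and wikilink markup from a raw parameter value."""
    value = re.sub(r"<ref[^>]*?/>", "", value)  # self-closing <ref ... />
    value = re.sub(r"<ref[^>]*>.*?</ref>", "", value, flags=re.DOTALL)
    # [[Target|Label]] -> Label, [[Target]] -> Target
    value = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", value)
    return value.strip()


def extract_statements(wikitext):
    """Return a list of (property ID, cleaned value) found in the wikitext."""
    params = extract_infobox_params(wikitext)
    return [
        (pid, clean_value(params[name]))
        for name, pid in PARAM_TO_PROPERTY.items()
        if name in params
    ]


if __name__ == "__main__":
    # Swap SAMPLE_WIKITEXT for load_page_text("Your article title") once
    # pywikibot is configured.
    for pid, value in extract_statements(SAMPLE_WIKITEXT):
        print(f"{pid} = {value}")
```

When you try other pages, the first things likely to need changing are the parameter names in the mapping and the clean-up rules in clean_value, since different infoboxes format the same information differently.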
Bonus: print out the corresponding values from Wikidata as well, if they are available.
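For the bonus, one way to fetch the Wikidata side is sketched below. It assumes the article has a linked Wikidata item (ItemPage.fromPage raises an error otherwise). The function names and the example values at the bottom are hypothetical. Non-item values such as dates and quantities are only printed in their rough string form, so you may want to format them more carefully.

```python
def wikidata_values(title, property_ids, lang="en"):
    """Return {property ID: [printable values]} from the article's Wikidata item."""
    import pywikibot  # needs a working pywikibot setup and network access

    site = pywikibot.Site(lang, "wikipedia")
    page = pywikibot.Page(site, title)
    item = pywikibot.ItemPage.fromPage(page)  # errors if there is no linked item
    item.get()  # load labels, claims, etc.

    values = {}
    for pid in property_ids:
        printable = []
        for claim in item.claims.get(pid, []):
            target = claim.getTarget()
            if target is None:  # 'unknown value' or 'no value'
                printable.append("(no value)")
            elif isinstance(target, pywikibot.ItemPage):
                target.get()
                # Fall back to the Q-id if there is no label in this language
                printable.append(target.labels.get(lang, target.getID()))
            else:
                printable.append(str(target))  # rough form for dates, quantities, ...
        values[pid] = printable
    return values


def side_by_side(extracted, from_wikidata):
    """Pair each (property ID, article value) with whatever Wikidata holds."""
    lines = []
    for pid, article_value in extracted:
        wd = from_wikidata.get(pid) or ["(not on Wikidata)"]
        lines.append(f"{pid} = {article_value} | Wikidata: {', '.join(wd)}")
    return lines


if __name__ == "__main__":
    # Hypothetical values standing in for what you parsed from the article;
    # replace the dict with wikidata_values("Your article title", ["P571", "P137"]).
    extracted = [("P571", "1957"), ("P137", "Example Observatory")]
    for line in side_by_side(extracted, {"P571": ["1957"]}):
        print(line)
```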
Save your code to a repository, or create a page like https://www.wikidata.org/wiki/User:Mike_Peel/Outreachy_2 under your own username, changing the ending to '3'.
Once you are happy, send me a link to your page (by email, on my talk page, or by replying to this ticket, as you prefer). Make sure to also register it as a contribution on the Outreachy website (https://www.outreachy.org/outreachy-may-2021-internship-round/communities/wikimedia/synchronising-wikidata-and-wikipedias-using-pywiki/contributions/)!
- You can find examples in https://bitbucket.org/mikepeel/wikicode/src/master/example.py
- And more at https://www.mediawiki.org/wiki/Manual:Pywikibot/Create_your_own_script
- And more complicated examples at https://bitbucket.org/mikepeel/wikicode/src/master/wir_newpages.py
- This might also be useful: https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial