This is the first task for T290718, "Automatically matching new Wikipedia articles with Wikidata items using Python". It is aimed at getting you familiar with Wikidata and with how properties work in structured data.
- You should register a Wikimedia account if you don't already have one. You can do so at https://www.wikidata.org/w/index.php?title=Special:CreateAccount
- Pick the language edition of Wikipedia that you are most familiar with. Any language is OK (see https://www.wikipedia.org/ for a complete list), but Wikipedias with more articles will have more content to work with.
- Pick a type of article. This could be books, ships, arthropods, authors, sports, castles, bridges, chemists, museums, rivers, trees: anything you are interested in. Find a few articles of that type (say, 6-12), as varied as you can.
- Have a look at how key facts are stored in the article, particularly in the infobox, but also in the categories and the text. Start thinking about what you could define as very simple statements about the topic ('this is a human', 'this was published in 2021', 'this is made of stone', etc.).
- Have a look at the Wikidata item for each article. You can find it by clicking the 'Wikidata item' link in the left-hand sidebar of the article (or the equivalent link on other language Wikipedias). Each item has a 'Q' number, which is shown at the top of the item page.
- You will see a set of properties with information about the topic. See how well they compare to the statements you were thinking of earlier. Are there obvious matches?
- Wikidata identifies properties by 'P' numbers (for example, 'instance of' is P31); you can find these by hovering over the property label. Start collecting the ones that are used for your type of article. Each property is linked to a value (a date, number, text string, Q-number, filename, etc.).
- Start a page like https://www.wikidata.org/wiki/User:Mike_Peel/Outreachy_1 (change 'Mike_Peel' to your username). Follow the rough format there to document what you're seeing, e.g.:
* {{P|P31}}: [[:en:Lovell Telescope]] is a {{Q|Q184356}} in the infobox
The first part is the property; the second links to the article you were looking at (change 'en' to the relevant language code); and the 'Q' number is the value that the property links to (here, 'radio telescope'; if the value doesn't have a Q-number, give the string, date, filename, etc. instead). Finally, say where in the article that piece of information is stored.
- See how many statements you can match to properties, and note which ones you can't find (list those anyway, and we can come back to them). You should aim for around 15-20 different properties across at least 6 articles.
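Since the project is about doing this with Python, the same lookups can also be scripted. The sketch below is a minimal, unofficial example built on two public endpoints. The first is the MediaWiki Action API `pageprops` query, where the `wikibase_item` page property holds an article's Q-number. The second is Wikidata's `Special:EntityData/<Q>.json` export. The sketch flattens each statement's main value into a short string and formats a line in the rough user-page style shown above. The helper names (`wikidata_item_for_article`, `fetch_entity`, `snak_value`, `property_values`, `doc_line`) and the `USER_AGENT` string are hypothetical. Only the network-free functions are meant to be tried without internet access.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical User-Agent: Wikimedia asks API clients to identify themselves.
USER_AGENT = "outreachy-t290718-sketch/0.1 (replace with your username or email)"


def _get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def item_from_pageprops(response: dict) -> Optional[str]:
    """Read the Q-number from a formatversion=2 pageprops query response."""
    for page in response.get("query", {}).get("pages", []):
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None  # missing page, or an article not linked to Wikidata yet


def wikidata_item_for_article(lang: str, title: str) -> Optional[str]:
    """Look up the item linked to an article, e.g. ('en', 'Lovell Telescope')."""
    query = urllib.parse.urlencode({
        "action": "query", "prop": "pageprops", "ppprop": "wikibase_item",
        "titles": title, "format": "json", "formatversion": "2",
    })
    return item_from_pageprops(_get_json(f"https://{lang}.wikipedia.org/w/api.php?{query}"))


def fetch_entity(qid: str) -> dict:
    """Download an item's full JSON (labels, claims, sitelinks, ...)."""
    data = _get_json(f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json")
    return next(iter(data["entities"].values()))  # key may differ after a redirect


def snak_value(snak: dict) -> str:
    """Flatten a snak's value into a short string for note-taking."""
    if snak.get("snaktype") != "value":
        return snak.get("snaktype", "?")  # 'somevalue' (unknown) or 'novalue'
    value, kind = snak["datavalue"]["value"], snak["datavalue"]["type"]
    if kind == "wikibase-entityid":
        return value["id"]  # another item, e.g. 'Q184356'
    if kind == "time":
        return value["time"]  # Wikibase time string, e.g. '+2021-00-00T00:00:00Z'
    if kind == "quantity":
        return value["amount"]
    if kind == "monolingualtext":
        return value["text"]
    if kind == "globecoordinate":
        return f"{value['latitude']},{value['longitude']}"
    return str(value)  # 'string' values: text, filenames, external identifiers


def property_values(entity: dict) -> dict:
    """Map each property ID (e.g. 'P31') to the main values of its statements."""
    return {pid: [snak_value(st["mainsnak"]) for st in statements]
            for pid, statements in entity.get("claims", {}).items()}


def doc_line(pid: str, lang: str, title: str, value: str, where: str,
             relation: str = "is a") -> str:
    """Format one line in the rough user-page style; Q-numbers get {{Q|...}}."""
    shown = "{{Q|%s}}" % value if re.fullmatch(r"Q\d+", value) else value
    return "* {{P|%s}}: [[:%s:%s]] %s %s in %s" % (pid, lang, title, relation, shown, where)
```

For example, `doc_line("P31", "en", "Lovell Telescope", "Q184356", "the infobox")` reproduces the example line above. `property_values(fetch_entity(qid))` gives a starting list of properties to compare against the infobox, categories and text by hand.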
Bonus 1: You will see that some statements have qualifiers (extra property-value pairs attached to an individual statement). Document those too, and make sure you understand how they work.
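In an item's JSON, qualifiers belong to an individual statement rather than to the property as a whole. Alongside its `mainsnak`, each statement may carry a `qualifiers` map from property IDs to lists of snaks. The sketch below is a minimal example under the assumption that the entity dict follows the `Special:EntityData` JSON layout. The function names are hypothetical, and values are flattened crudely (item ID, time string, amount or text) just for documenting what you see.

```python
from typing import Dict, List


def snak_value(snak: dict) -> str:
    """Short string for a snak: item ID, time string, amount, text or plain value."""
    if snak.get("snaktype") != "value":
        return snak.get("snaktype", "?")  # 'somevalue' or 'novalue'
    value = snak["datavalue"]["value"]
    if isinstance(value, dict):
        # Items carry 'id', times 'time', quantities 'amount', monolingual text 'text'.
        for key in ("id", "time", "amount", "text"):
            if key in value:
                return str(value[key])
    return str(value)


def qualifiers_of(statement: dict) -> Dict[str, List[str]]:
    """Map each qualifier property (e.g. P580, 'start time') to its values."""
    return {pid: [snak_value(s) for s in snaks]
            for pid, snaks in statement.get("qualifiers", {}).items()}


def describe_statements(entity: dict) -> List[str]:
    """One line per statement: main value, then any qualifiers in brackets."""
    lines = []
    for pid, statements in entity.get("claims", {}).items():
        for statement in statements:
            main = snak_value(statement["mainsnak"])
            quals = ", ".join(f"{q}={v}"
                              for q, values in qualifiers_of(statement).items()
                              for v in values)
            lines.append(f"{pid}: {main}" + (f" ({quals})" if quals else ""))
    return lines
```

A statement such as 'instance of: radio telescope' with a start-time qualifier would come out as a line like `P31: Q184356 (P580=...)`. That format makes it easy to see which part is the main value and which parts refine it.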
Bonus 2: Look for additional properties that could be used but currently aren't (see https://www.wikidata.org/wiki/Wikidata:List_of_properties ), and document those.
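Besides browsing the list of properties, one rough way to spot candidates is to compare your items with each other: a property used by most of them but missing from one may be worth checking. The sketch below assumes the items are entity dicts in the `Special:EntityData` JSON layout. The function names and the `min_share` threshold are hypothetical. It only finds gaps relative to properties your items already use, so it does not replace reading the list.

```python
from collections import Counter
from typing import Dict, List


def property_coverage(entities: List[dict]) -> Counter:
    """Count how many of the given items have at least one statement per property."""
    counts = Counter()
    for entity in entities:
        counts.update(set(entity.get("claims", {})))
    return counts


def candidate_gaps(entities: List[dict], min_share: float = 0.5) -> Dict[str, List[str]]:
    """For each item, list properties used by at least `min_share` of the items but missing from it."""
    if not entities:
        return {}
    counts = property_coverage(entities)
    common = {pid for pid, n in counts.items() if n / len(entities) >= min_share}
    return {entity["id"]: sorted(common - set(entity.get("claims", {})))
            for entity in entities}
```

A flagged gap is only a suggestion. Check that the property really applies to that item, and that the article supports a value, before documenting or adding it.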
Bonus 3: Add new statements to the Wikidata items you have been looking at. At the bottom of the page, click 'add statement'; you can then enter a property (by P-number or text) and a value (by Q-number or text). If you can't see the link, or it doesn't work, check the top right of the page for a padlock. This means that the item is protected, so you will need to edit other items instead. You will be able to edit this item once your account is autoconfirmed, so note it down anyway and come back to it later!
Once you are happy, send me a link to your page (by email, on my talk page, or by replying to this ticket, as you prefer). Make sure you also register it as a contribution on the Outreachy website ( https://www.outreachy.org/outreachy-december-2021-internship-round/communities/wikimedia/automatically-matching-new-wikipedia-articles-with/contributions/ )! I'll reply on the contribution's talk page to say whether or not it has been accepted.