Synchronising Wikidata and Wikipedias using pywikibot - Task 5
Closed, ResolvedPublic

Description

This is the fifth task for T276329, Synchronising Wikidata and Wikipedias using pywikibot, aimed at getting you familiar with searching for Wikidata items and finding the QIDs.

  1. You should already have a Wikimedia account and set up pywikibot (if not, do Tasks 1 and 2 first).
  1. Find some terms to search for. These could be name strings identified in previous tasks, or article titles (e.g., those not yet connected to Wikidata).
  1. Set up a script that connects to Wikidata, searches for the term, and returns the QID. Make sure it is the correct QID!
  1. Bonus: Explore how to identify the correct item when multiple terms are returned.
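
Step 3 above could be sketched roughly as follows. This is a minimal sketch rather than a reference solution: it assumes pywikibot is already configured (Tasks 1 and 2), it uses the `search_entities` method of pywikibot's Wikidata site object, and it treats "the first result whose label equals the search term" as the correct one, which is only one possible rule. The helper names `first_exact_match` and `search_qid` are made up for illustration.

```python
def first_exact_match(results, term):
    """Return the QID of the first result whose label equals `term`.

    `results` is an iterable of dicts shaped like wbsearchentities hits
    (each has an 'id' and usually a 'label'). The comparison ignores case.
    Returns None if no label matches exactly.
    """
    wanted = term.casefold()
    for result in results:
        # 'label' can be missing when the item has no label in the language.
        if result.get("label", "").casefold() == wanted:
            return result["id"]
    return None


def search_qid(term, language="en", limit=10):
    """Search Wikidata for `term` and return a QID, or None (needs network)."""
    # Imported here so first_exact_match() can be tried without pywikibot set up.
    import pywikibot

    repo = pywikibot.Site("wikidata", "wikidata")
    results = repo.search_entities(term, language, total=limit)
    return first_exact_match(results, term)
```

Calling `search_qid("Douglas Adams")` queries Wikidata live. Whatever it returns, open the item and confirm that it really is the one you meant before using the QID.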

Save your code to a repository, or create a page like https://www.wikidata.org/wiki/User:Mike_Peel/Outreachy_2 (under your own username, with the ending changed to '5'). Add the links to the edits at the end of the code as a comment.

Once you are happy, send me a link to your page (by email, on my talk page, or replying to this ticket as you prefer). Make sure to also register it as a contribution on the Outreachy website (https://www.outreachy.org/outreachy-may-2021-internship-round/communities/wikimedia/synchronising-wikidata-and-wikipedias-using-pywiki/contributions/)!

Hints:

Event Timeline

@Mike_Peel, for the bonus (explore how to identify the correct item when multiple terms are returned), could we take every QID returned by the title search and narrow the list down to the correct one by checking each item for a specific property? For instance, if my title is "Harry Potter", the search returns the Harry Potter movie, book, and character, but only the book will have the property "Language of the work", "author", or "publication date".

It's an open-ended question that I don't know the best answer to, feel free to try different things. :-)
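
One way to try the property-based idea is sketched below. It is an illustration under assumptions, not a recommended method: the property set is a guess at what distinguishes a book (P50 author, P577 publication date, P407 language of work or name), other kinds of item can carry some of these properties too, and the function names are hypothetical. The scoring works on plain sets of property IDs, so it can be tested without touching Wikidata.

```python
# Properties assumed to be typical of a book; adjust for your own search terms.
BOOK_PROPERTIES = {"P50", "P577", "P407"}


def best_candidate(candidates, expected):
    """Pick the QID whose claims overlap most with `expected`.

    `candidates` maps each QID to the property IDs present on that item.
    Returns None when nothing overlaps or when the top score is tied,
    since in both cases the choice is not clear-cut.
    """
    expected = set(expected)
    scores = {qid: len(set(props) & expected) for qid, props in candidates.items()}
    best = max(scores.values(), default=0)
    winners = [qid for qid, score in scores.items() if score == best and score > 0]
    return winners[0] if len(winners) == 1 else None


def load_property_ids(qids):
    """Fetch the property IDs used on each item (needs network and pywikibot)."""
    import pywikibot

    repo = pywikibot.Site("wikidata", "wikidata")
    found = {}
    for qid in qids:
        item = pywikibot.ItemPage(repo, qid)
        item.get()  # loads labels, claims, and sitelinks
        found[qid] = set(item.claims)  # keys of item.claims are property IDs
    return found
```

With the QIDs from a search in hand, `best_candidate(load_property_ids(qids), BOOK_PROPERTIES)` returns a single QID only when one item clearly stands out. A `None` result means the candidates need a closer look by hand.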

Hi @Mike_Peel and @MSGJ ,

In the task description, does "those not yet connected to Wikidata" mean that we have to search for things that have not yet been allotted a Q number? If not, please explain.

It's up to you. If you want a challenge, look through the items from UnconnectedPages and try to find the right QIDs for them (if they exist). Or if you want something easier, search for items that already have QIDs, and see if you recover the same one.
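
The easier variant (search for something that already has a QID and see whether the search recovers it) might look like the sketch below. It assumes an English Wikipedia article that is already connected to Wikidata. `rank_of_known_qid` and `check_article` are illustrative names, and `ItemPage.fromPage` raises an error if the page has no item, which this sketch does not handle.

```python
def rank_of_known_qid(known_qid, results):
    """Return the 1-based position of `known_qid` in the search results.

    `results` is a list of dicts shaped like wbsearchentities hits.
    Returns None if the known QID was not recovered at all.
    """
    for position, result in enumerate(results, start=1):
        if result.get("id") == known_qid:
            return position
    return None


def check_article(title, lang="en", limit=10):
    """Compare an article's linked QID with a search for its title (needs network)."""
    import pywikibot

    wiki = pywikibot.Site(lang, "wikipedia")
    page = pywikibot.Page(wiki, title)
    # QID of the item the article is already connected to.
    known_qid = pywikibot.ItemPage.fromPage(page).getID()
    repo = wiki.data_repository()
    results = list(repo.search_entities(title, lang, total=limit))
    return known_qid, rank_of_known_qid(known_qid, results)
```

A rank of 1 means the search put the right item first; a larger number or `None` shows how often a naive "take the first hit" rule would have gone wrong.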

@Mike_Peel @MSGJ for unconnected pages, do we have to look for only articles, or for all types of pages viz. Template, Wikipedia, Category, Help, Module, etc.?

Thanks

Go for articles - or categories if you want something slightly different. The others are more complex as not all of them get connected to Wikidata.

I have submitted my task 5 code through email.

@MSGJ, @Mike_Peel, can I hard-code the terms that I need to search for, or do I have to find them using code? For example, if I want to get the QID for 'human', can I just pass it to search_entities, or do I need to go to the Wikidata page and parse it to get the title 'human'?

Hi @MSGJ , @Mike_Peel ,
link - https://www.wikidata.org/wiki/User:Pushp24/Outreachy_5
Above is the link for my outreachy task5.
I look forward to your suggestions and feedback.

Thanks and regards
Pushpanjali Kumari

Hello @Mike_Peel, @MSGJ,

I would like to submit my contribution for this task -> task 5.

Also, I would like to know if you have any specific instructions about the internship timeline.
Should we follow the points listed in the project page on the Outreachy website?
Any feedback would be highly appreciated.

Thank you in advance,
Nina

@MSGJ, @Mike_Peel, can I hard-code the terms that I need to search for, or do I have to find them using code? For example, if I want to get the QID for 'human', can I just pass it to search_entities, or do I need to go to the Wikidata page and parse it to get the title 'human'?

For this task, you need to search for the QIDs, not hard-code them. The exception is when you use them only as a cross-check of the search results, in which case you can hard-code general topic QIDs such as 'human' or 'book'.
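
A cross-check of that kind could be sketched as follows, assuming the check is done against the item's "instance of" (P31) values. Q5 (human) and Q571 (book) are the only hard-coded QIDs, and the function names are hypothetical. Note that a direct P31 comparison does not follow subclasses, so an item typed with a more specific class than the one you expect would fail this check.

```python
# Hard-coded general topic QIDs, used only to cross-check search results.
HUMAN = "Q5"
BOOK = "Q571"


def is_instance_of(instance_of_qids, expected_qid):
    """True if `expected_qid` is among the item's 'instance of' values."""
    return expected_qid in set(instance_of_qids)


def instance_of_qids(qid):
    """Return the P31 target QIDs of an item (needs network and pywikibot)."""
    import pywikibot

    repo = pywikibot.Site("wikidata", "wikidata")
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    targets = (claim.getTarget() for claim in item.claims.get("P31", []))
    # getTarget() is None for 'unknown value' / 'no value' statements.
    return [target.getID() for target in targets if target is not None]
```

For a QID found by searching for a person's name, `is_instance_of(instance_of_qids(qid), HUMAN)` gives a quick sanity check that the search did not land on, say, a film or a place with the same name.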

Also, I would like to know if you have any specific instructions about the internship timeline.
Should we follow the points listed in the project page on the Outreachy website?
Any feedback would be highly appreciated.

I put guidance at T276329 under the task list for the timeline.