
Using Structured Data on Commons in file pages
Open, Needs Triage, Public

Description

Brief summary

Wikimedia Commons, the media repository for the Wikimedia projects, now holds multilingual structured data about the media files it stores. However, it also still stores that information in text/template form on the file pages.

This project will investigate and implement methods for displaying structured data on file pages, and for migrating information currently stored in file-page wikitext to structured data. To do this, it will build on https://commons.wikimedia.org/wiki/Template:Structured_Data, https://commons.wikimedia.org/wiki/Template:Geograph_from_structured_data, and similar templates. It may also involve Python/pywikibot bot tasks.

Skills required

Knowledge of Lua is an advantage, although it can be learnt during the project. The same applies to Python.

Possible mentor(s)

  • @Mike_Peel: Postdoctoral researcher at the Instituto de Astrofísica de Canarias, creator of the Commons Wikidata Infobox, and programmer in Python (but not Lua).

Microtasks

  • Convert some file pages to use the existing structured data templates manually
  • As above, but using a Python script
  • Expand an existing template to support additional Wikidata properties
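As a rough illustration of the scripted microtask, the Python sketch below shows where a file's structured data lives: each file page has a MediaInfo entity whose ID is "M" followed by the page ID, and the `wbgetentities` API action returns it as JSON. The page ID 12345, the placeholder item Q42 and the sample response are hypothetical, hand-written only to mimic the shape of a real reply; an actual bot would fetch the URL (or go through pywikibot) instead.

```python
"""Sketch: reading the structured data (MediaInfo entity) of a Commons file page.

Assumption: SAMPLE below is hand-written to mimic a wbgetentities reply;
the page ID 12345 and the value Q42 are placeholders, not real data.
"""
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def mediainfo_id(page_id: int) -> str:
    """A file page's MediaInfo entity ID is 'M' followed by its page ID."""
    return f"M{page_id}"


def wbgetentities_url(page_id: int) -> str:
    """Build the API URL that returns the file's structured data as JSON."""
    params = {"action": "wbgetentities", "ids": mediainfo_id(page_id), "format": "json"}
    return f"{API}?{urlencode(params)}"


def item_values(entity: dict, prop: str) -> list:
    """Return the item IDs used as values of `prop` (e.g. P180, 'depicts')."""
    # MediaInfo JSON uses "statements"; also accept "claims" defensively.
    statements = entity.get("statements") or entity.get("claims") or {}
    values = []
    for statement in statements.get(prop, []):
        snak = statement.get("mainsnak", {})
        # Skip "somevalue"/"novalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            values.append(snak["datavalue"]["value"]["id"])
    return values


# Hypothetical response for a hypothetical page ID.
SAMPLE = {
    "entities": {
        "M12345": {
            "id": "M12345",
            "statements": {
                "P180": [
                    {"mainsnak": {"snaktype": "value", "property": "P180",
                                  "datavalue": {"type": "wikibase-entityid",
                                                "value": {"entity-type": "item", "id": "Q42"}}}},
                    {"mainsnak": {"snaktype": "somevalue", "property": "P180"}},
                ]
            },
        }
    }
}

if __name__ == "__main__":
    print(wbgetentities_url(12345))
    print(item_values(SAMPLE["entities"]["M12345"], "P180"))
```

A display template would do the equivalent lookup in Lua at render time; a bot works with the same JSON from outside the wiki.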

Event Timeline

This is based on my initial thoughts at T270429. @Multichill, might you be interested in being a co-mentor of this project?

@Mike_Peel Sounds like an interesting project for Outreachy! The same quick question here too: how much time do you think it would take an intern with beginner-level skills on the topic to complete this project? Is it a sufficient amount of work for 3 months? Also, could you elaborate a bit on what you mean by file pages?

@srishakatux Same as for the other proposal: I think it's about a week's worth of coding for an experienced developer, so 3 months for a beginner is about right. Again, it's flexible. This is a harder task than T273109, as the minimum viable product level is higher, but if a student does well there is a lot of potential for expanding the project. For file pages, see https://commons.wikimedia.org/wiki/File:Jodrell_Bank_Mark_II_5.jpg as a live but not-quite-working-right example.

Also pinging @RexxS and @Jarekt as other potential co-mentors.

@Mike_Peel Scope-wise it sounds good! My comment is the same here as for the other project task: if it is not purely a coding project and the contributions would stay on the wiki, this project would fit under Outreachy.

(As per the discussion in the other task, this one is reserved for Outreachy only.)

I obviously like getting more structured data and better usage of it, but I see some issues with this task. The scope seems to be very broad; is this intentional? What kind of time investment is expected? It might have a higher chance of making a difference if it were more tightly scoped. There are several steps in the process, and possible tasks related to each:

  • Convert existing data from wikitext to structured data. This is already happening for a small subset of the data, though at a large scale. It is not controversial as long as you don't remove any data. There is still a ton of work to do here, and for quite a few things it is not yet clear how to model them. Building some kind of workflow that extracts knowledge from the category tree and shows it to users for approval would be extremely useful.
  • Show the structured data in a pretty way. You already mentioned some examples; others are https://commons.wikimedia.org/wiki/Module:Artwork and https://commons.wikimedia.org/wiki/Module:Information . The current approach here is to make incremental improvements without changing the look and feel from what it used to be. You can also take a radically different approach, like skins: you make another implementation for the same data with a completely different look and feel. With some logic, logged-in users could enable this new template skin. That way you can experiment quite freely without disturbing the Monobook people. This task would be fun for someone working at the boundary between development and design.
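For the conversion step, a minimal sketch of a non-destructive conversion might look like the following: it proposes an inception (P571) statement from a plain ISO date in the {{Information}} `|date=` field, only when the file has no P571 statement yet, and it never touches the wikitext. Mapping `|date=` to P571, the regex and the function name are assumptions for illustration. Anything that is not a bare YYYY-MM-DD date is left for human review, since many modelling questions are still open.

```python
"""Sketch: proposing a structured-data statement from {{Information}} wikitext.

Assumptions: only bare ISO dates (YYYY-MM-DD) in |date= are handled, mapping
that field to P571 (inception) is illustrative, and the wikitext is never edited.
"""
import re

# Wikibase time values need a calendar model; Q1985727 is the proleptic Gregorian calendar.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"

# A bare YYYY-MM-DD date, followed by the next parameter, the template end, or a line end.
DATE_FIELD = re.compile(r"\|\s*date\s*=\s*(\d{4})-(\d{2})-(\d{2})\s*(?=\||\}\}|\n|$)")


def date_statement(wikitext: str, existing_statements: dict):
    """Return a P571 statement dict to add, or None if nothing should be proposed."""
    if existing_statements.get("P571"):
        return None  # already has structured data: never overwrite it
    match = DATE_FIELD.search(wikitext)
    if match is None:
        return None  # missing, free text, a template, or a timestamp: leave for a human
    year, month, day = match.groups()
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P571",
            "datavalue": {
                "type": "time",
                "value": {
                    "time": f"+{year}-{month}-{day}T00:00:00Z",
                    "timezone": 0,
                    "before": 0,
                    "after": 0,
                    "precision": 11,  # 11 = day precision
                    "calendarmodel": GREGORIAN,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


if __name__ == "__main__":
    page = "{{Information\n|description=Test\n|date=2019-05-04\n|author=Someone\n}}"
    print(date_statement(page, {}))
```

Keeping the proposal as a plain dict makes it easy to show it to a user for approval before anything is written.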

@Multichill Thanks for the comments! I deliberately kept the scope wide so that it can later be adapted to the student's skillset, but there's a balance between that and having a clearly defined project, which we can iterate on. It's a 3-month student project.

I was mostly thinking about work on Lua templates to display existing structured data, perhaps through one of the current templates, perhaps through a new one. I don't like the skin approach so much; it's better to sort it out properly for all users. We currently have very few cases where most of the info on the file page is displayed from structured data, so there's a lot of scope there.

Converting from wikitext to structured data is more of a Python task, and is the second level I was thinking about - or perhaps they would be better as separate projects.

Would you be interested in co-mentoring? I believe that two mentors are required. Happy to continue iterating here or by other means (Zoom, Google Docs, etc.).

@Mike_Peel Whenever you are ready (but before the deadline, which is March 14th), follow Step 3 at https://www.mediawiki.org/wiki/Outreachy/Mentors#_Before_the_program to upload a proposal on the Outreachy site. Let me know if you need any help with this.

I can help with this task, but I agree that display of structured data and conversion from wikitext to structured data are two separate tasks. I also share @Multichill's concern that the scope is so broad that it is hard to pin down what the project is, or to tell when its goal has been achieved. Some possible tasks:

  • https://commons.wikimedia.org/wiki/Module:Information already fills the {{information}} infobox with data from SDC for template fields that are missing in the wikitext. That can be expanded if more data models are agreed on. Not a good task for a newcomer.
  • We could also create other templates to show other data stored in SDC, which would have to be called from wikitext. It is unclear whether these are needed or would be used.
  • We could try to tackle creating license templates based on SDC data. A good understanding of the Commons license templates (more than 1000 of them) would be important.
  • I like Multichill's idea of an alternative SDC-based skin, but I do not know much about skins.
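The behaviour described for Module:Information (wikitext fields win, structured data fills the gaps) can be sketched in Python as below. The real module is written in Lua and this is not its code; the field names and the flat `sdc_fields` dictionary are hypothetical stand-ins for values already looked up from the file's MediaInfo entity.

```python
"""Sketch of a "wikitext first, structured data as fallback" merge.

Not Module:Information's actual (Lua) code; field names and the flat
sdc_fields mapping are hypothetical.
"""


def merge_fields(wikitext_fields: dict, sdc_fields: dict) -> dict:
    """Map each field name to (value, source), preferring non-empty wikitext."""
    merged = {}
    for name in wikitext_fields.keys() | sdc_fields.keys():
        text = (wikitext_fields.get(name) or "").strip()
        if text:
            # Existing wikitext is kept as-is, so nothing already on the page is lost.
            merged[name] = (text, "wikitext")
        elif sdc_fields.get(name):
            # Field missing or empty in wikitext: fall back to structured data.
            merged[name] = (sdc_fields[name], "structured data")
    return merged


if __name__ == "__main__":
    page = {"description": "A telescope", "date": ""}
    sdc = {"date": "2019-05-04", "author": "Q42"}
    for field, (value, source) in sorted(merge_fields(page, sdc).items()):
        print(f"{field}: {value} ({source})")
```

Recording the source of each value could also let a template flag fields that so far exist only in wikitext, which is one possible way to track migration progress.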

OK, I think it's best to park this for now, and we can think about it some more for the next round of Outreachy (and potentially split it out into several different projects).