
Investigation: Assistance with structured data on Commons
Closed, Resolved | Public | 5 Estimated Story Points

Description

The overall question for this investigation: Is the work that we could do on this project over the next six months going to make a version of "structured data on Commons" happen faster?

If the Wikidata team understands and agrees on how to break the problem down, and we can pick up some of the pieces that they don't have time/resources to do, then that's a good situation for us to help with. But if they're still figuring out how to do some of the foundational work, then we might just be getting in their way.

Essentially: is this a project where adding more people makes it go faster, or slower?


Related to the #6 item on the Wishlist Survey: T120451: Allow categories in Commons in all languages

Translating Commons category names will add extra layers of complexity to an already hard-to-use category system. The best way for Community Tech to contribute towards this goal is to support Wikidata's work on supporting structured data on Commons:

T68108: [Epic] Store media information for files on Wikimedia Commons as structured data
T125822: [Epic] Basic first prototype for structured data support for Commons

In March, Lydia gave us the following list of tasks that would support this work. Some of these have already had some work, although none of them are closed.

This investigation ticket is to determine: What can our team actually work on, to help make structured data support happen?

Directly helping:

Indirectly helping, comment on these RFCs:

Update, Niharika's meeting with Lydia at Wikimania:

The main thing they want is a new entity type on Wikidata. Right now there are only Items and Properties; they want a new type for storing media info. I asked her whether it would be similar to an Item, and she said it will include some of the Item properties plus some new ones, which is why it needs to be a new type.

  • Ability to use items/properties from Wikidata to make statements on other wikis: T76007

If I upload a picture of a mango tree to Commons, I should be able to pick what kind of tree it is, what color, etc. from Wikidata options (something like an auto-complete interface for the specific properties the user chooses). If the value is something new, the data first has to go on Wikidata, and only then can it be used on the other wikis.
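To make the auto-complete idea concrete, here is a minimal sketch of how a client could ask Wikidata for item suggestions. It uses the existing wbsearchentities API module; the helper names (build_search_url, extract_suggestions) and the sample response are made up for illustration, and the IDs in the sample are placeholders, not real items. This is not how the eventual Commons UI is planned to work, only a picture of the lookup step.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=5):
    """Build a wbsearchentities request URL that asks for item suggestions."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_suggestions(response):
    """Return (id, label) pairs from a decoded wbsearchentities response."""
    return [(hit["id"], hit.get("label", "")) for hit in response.get("search", [])]


# Hand-written sample shaped like an API response (placeholder IDs, not real items).
sample = {
    "search": [
        {"id": "Q-PLACEHOLDER-1", "label": "mango tree"},
        {"id": "Q-PLACEHOLDER-2", "label": "mango (fruit)"},
    ]
}

print(build_search_url("mango tree"))
print(extract_suggestions(sample))
```

The sketch only builds the request and reads a response that has already been decoded, so it runs without network access; a real interface would fetch the URL as the user types.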

  • A new Wikibase datatype for smart URIs: T127929

Wikibase (the software that Wikidata runs on) currently supports these data types: https://www.wikidata.org/wiki/Special:ListDatatypes They want support for a new data type: https://phabricator.wikimedia.org/T127929 -- Basically, we would accept the user's profile link (for a bunch of possible sources) and display only the relevant handle, while retaining the full URI link underneath.
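A rough sketch of the "smart URI" behaviour described above: store the full link, show only the handle. The two URL patterns and the names SmartUri and parse_smart_uri are assumptions for illustration; the actual list of supported sources and the Wikibase implementation are not decided here.

```python
import re
from typing import NamedTuple


class SmartUri(NamedTuple):
    uri: str      # full link, kept as the stored value
    display: str  # short handle shown to readers


# Hypothetical patterns; the real set of supported sources is not specified here.
PATTERNS = [
    re.compile(r"^https?://(?:www\.)?twitter\.com/(?P<handle>[A-Za-z0-9_]+)/?$"),
    re.compile(r"^https?://(?:www\.)?github\.com/(?P<handle>[A-Za-z0-9-]+)/?$"),
]


def parse_smart_uri(uri: str) -> SmartUri:
    """Keep the URI as-is and derive a display handle when a pattern matches."""
    for pattern in PATTERNS:
        match = pattern.match(uri)
        if match:
            return SmartUri(uri=uri, display=match.group("handle"))
    # Unknown source: fall back to showing the full URI.
    return SmartUri(uri=uri, display=uri)


value = parse_smart_uri("https://twitter.com/example_user")
print(value.display, "->", value.uri)
```

Falling back to the full URI for unrecognised sources means nothing is lost when a link does not match any known pattern.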

  • Multi-Content Revisions T107595 (in close collaboration with Daniel)

Need to talk more to Daniel about this one.

  • Thoughts/concepts on integration of query and search in the context of multimedia meta data

Ability to run complex searches from the wiki itself. For example:
{{dog:white|male|poodle}}
should turn up all images of dogs with those specifications. The syntax and logistics of this task are still up in the air and possibly dependent on the first task being completed.

Event Timeline

DannyH raised the priority of this task from to Medium.
DannyH updated the task description.
DannyH moved this task to Older: Team Work on the Community-Tech board.
DannyH subscribed.

Danny: Will you be at the developer summit in January? If so I think it'd be good to talk about this for 15 minutes there in person.

DannyH renamed this task from Investigation: Allow categories in Commons in all languages to Investigation: Assistance with structured data on Commons. Jun 13 2016, 10:49 PM
DannyH updated the task description.
kaldari set the point value for this task to 5. Jun 30 2016, 5:33 PM
Investigation details (including discussion details from Wikimania):
  • Ability to use items/properties from Wikidata to make statements on other wikis: T76007
    • What this task is about: If I upload a picture of a Mango tree on Commons, I should be able to pick what kind of tree, what color etc. from Wikidata options (sort of like an auto-complete interface for specific properties the user chooses). If it's something new, the data has to first go on Wikidata and then can be used on the wikis.
    • Blocked on T133381: Add support for foreign entities to EntityId. Daniel: "Adding support for this to the EntityId class itself should not be hard. Adding prefix mapping for serializers and deserializers, plus the necessary config, is more work"
    • Both of these tasks are tricky and non-trivial for someone without prior Wikibase familiarity.
  • A new Wikibase datatype for smart URIs: T127929
    • What this task is about: Wikibase (the software that Wikidata runs on) supports these data types as of now: https://www.wikidata.org/wiki/Special:ListDatatypes They want support for a new data type: T127929 -- Basically that we accept the user's profile link (for a bunch of possible sources) and display only the relevant handle while retaining the URI link underneath.
    • Estimated time to complete this for someone with Wikibase familiarity: 1 week (as per Daniel's estimate) plus another week for feedback and consultations. UI part is tricky and non-trivial. For someone with no Wikibase experience (like our team), this task could possibly stretch up to a month of work.
  • T107595: [RFC] Multi-Content Revisions (in close collaboration with Daniel)
    • This RFC proposes a fundamental change to how MediaWiki stores and renders content. The idea is to have different "data streams" for every page, held in separate "slots" that are saved in the database so they can be retrieved immediately on demand. For example, "wikitext", "html", "diff", "blamemap" etc. are all different slots. The latter two fall in the category of Derived slots and the former two are Primary slots. Depending on which part of the page changes, the affected derived data streams get updated, while the slots untouched by the change are kept in the database as is. The RFC goes into in-depth implementation details for the project.
    • For the purposes of this investigation, we can probably help with getting involved in the discussions for the RFC but it's unlikely that the implementation will start any time soon.
  • Thoughts/concepts on integration of query and search in the context of multimedia meta data
    • What this task is about: Ability to run complex searches from the wiki itself. For example: {{dog:white|male|poodle}} should turn up all images of dogs with those specifications. The syntax and logistics of this task are still up in the air and possibly dependent on the first task being completed.
    • This task is mainly dependent on T68108: [Epic] Store media information for files on Wikimedia Commons as structured data. Looking at the subtasks, this is again not something we can help with in the immediate future.
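For the Multi-Content Revisions item above, a toy model of the slot idea may help when commenting on the RFC. This is only a sketch under assumptions: the Revision class, save_slot, the DERIVERS table, and the "html_size" slot are invented for illustration, and the "diff" here is a trivial stand-in (a character-count change), not a real diff. It follows the Primary/Derived split as described in the notes above and does not reflect the RFC's actual storage design.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Revision:
    primary: Dict[str, str] = field(default_factory=dict)
    derived: Dict[str, str] = field(default_factory=dict)


# Hypothetical table: derived slot -> (primary slot it depends on, compute function).
DERIVERS = {
    "diff": ("wikitext", lambda old, new: f"{len(new) - len(old):+d} chars"),
    "html_size": ("html", lambda old, new: str(len(new))),
}


def save_slot(rev, slot, new_content, derivers=DERIVERS):
    """Return a new Revision in which one primary slot has changed.

    Only derived slots that depend on the changed slot are recomputed;
    every other slot is carried over unchanged.
    """
    old = rev.primary.get(slot, "")
    primary = {**rev.primary, slot: new_content}
    derived = dict(rev.derived)
    for name, (source, compute) in derivers.items():
        if source == slot:
            derived[name] = compute(old, new_content)
    return Revision(primary, derived)


rev1 = Revision(
    primary={"wikitext": "abc", "html": "<p>abc</p>"},
    derived={"diff": "", "html_size": "10"},
)
rev2 = save_slot(rev1, "wikitext", "abcde")
print(rev2.derived)
```

In this run only "diff" is recomputed; "html" and "html_size" are carried over untouched, which is the property the RFC summary emphasises.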
Investigation outcomes

We can help with:

Given how daunting the Wikibase code is (2500 files before running Composer), the feedback from Daniel about how long it would take a new dev to get up to speed, and the vague timeline on getting all these pieces into place, I would prefer that we leave these coding tasks to the Wikidata developers, as our time will be better spent on other projects, IMO. It would be useful if we add comments to some of these RfCs though. @DannyH: What are your thoughts?

Yes, I agree. I want structured data on Commons, and I wish we could make that process go faster, but it doesn't sound like we can help right now.