
[Spike 1 day] Figure out wb_terms redesign, and what that means for MediaInfo
Closed, Declined · Public

Description

Question we are trying to answer

In T223792 (parent task) I suggest to start writing to wb_terms, so that Wikibase's Lua support can find MediaInfo entities.
IIRC, wb_terms is in the process of being redesigned, though, and there appears to be migration code in Wikibase already.
We need to figure out:

  • what is going to change and where in the process we are
  • whether it makes sense to start writing to wb_terms at this point (and go through the pain - if any - of migrating), or if it's better to wait until the migration is done
  • if we're going to have to go through the migration, we need to figure out what steps to take
  • if we're waiting until migration is done, we might need to write different code to write to the new, redesigned, solution

Acceptance Criteria

  • Decide whether https://gerrit.wikimedia.org/r/522355 can be merged, or should be changed
  • Create a new ticket with follow-up steps (after merging that patch or shooting it down), if any

Event Timeline

what is going to change and where in the process we are

The end goal is to replace wb_terms completely with the new store (normalized tables, fwiw). Since Wikidata has a very large volume of data in wb_terms, we will have to go through a lengthy migration period. This means that wb_terms will continue to be supported code-wise for the time being, and MediaInfo can certainly rely on it if that is necessary at the moment. The wb_terms table and related interfaces will not be dropped without an announcement and a buffer period for users of the store to switch to the new one.

whether it makes sense to start writing to wb_terms at this point (and go through the pain - if any - of migrating), or if it's better to wait until migration done

That depends on the gain of writing to wb_terms now versus waiting. If the current use cases for writing to wb_terms are not urgent, I think it makes more sense to wait, to avoid another lengthy migration on Commons in case the volume of terms there grows large. To give a sense of the timeline: we have already migrated property terms in Wikidata production, and should soon be reading them from the new store as well. Item terms will take longer; we hope it will be a matter of a couple of months to migrate all of them, which would put us in a position to move away from wb_terms.

If your use case cannot wait a couple of months, it might be worth starting to use wb_terms for now.

if we're going to have to go through the migration, we need to figure out what steps to take

Yes. If you really decide to go with it, we can certainly help by sharing more details on how we are doing it for the Wikidata case. The good news: if Commons never grows so large that migrating in one go becomes risky, the migration will be done automatically through the db update script once we switch to the new store entirely (a Wikibase release will introduce that as an automatic update, and it will be announced as well).

if we're waiting until migration is done, we might need to write different code to write to the new, redesigned, solution

That might be the case, but I have to look into what changes are needed in MediaInfo itself in order to use the old or the new term store. Wikibase will contain interfaces for the new store, so it shouldn't make much difference beyond using a different interface at some level.


Sorry for the short, less useful answers for the moment. I hope this helps you get closer to a decision, and I hope to get back to you soon with better, more detailed answers once I'm back at the office next week :)

I don't think wb_terms should be used at all for media info.
A custom system or new table or set of tables should be decided on, created, and populated.
Temporarily using wb_terms just to remove it would cause more trouble than it is worth and could just lead to it still existing in a few years and us having to do another massive migration.

I don't think wb_terms should be used at all for media info.

The non-normalized version? Completely agree.

A custom system or new table or set of tables should be decided on, created, and populated.

Depends on what you call custom. Have a look at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/499142/11/repo/sql/AddNormalizedTermsTablesDDL.sql . It already mentions how to add a different type. So we could create a "wbt_mediainfo_terms" table, which is just a renamed version of wbt_item_terms.
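
To make the "renamed version of wbt_item_terms" idea concrete, here is a rough sketch using Python's sqlite3 with an in-memory database. It is only an illustration under assumptions: the shared tables are modelled loosely on the normalized layout, the column names may not match the linked DDL, and the wbmt_* columns and the add_term helper are hypothetical names invented for this example.

```python
import sqlite3

# Simplified, assumed schema: shared text/type tables plus a per-entity-type
# link table. Column names are illustrative and may differ from the real DDL.
SCHEMA = """
CREATE TABLE wbt_type (
    wby_id   INTEGER PRIMARY KEY,
    wby_name TEXT NOT NULL UNIQUE
);
CREATE TABLE wbt_text (
    wbx_id   INTEGER PRIMARY KEY,
    wbx_text TEXT NOT NULL UNIQUE
);
CREATE TABLE wbt_text_in_lang (
    wbxl_id       INTEGER PRIMARY KEY,
    wbxl_language TEXT NOT NULL,
    wbxl_text_id  INTEGER NOT NULL REFERENCES wbt_text (wbx_id),
    UNIQUE (wbxl_text_id, wbxl_language)
);
CREATE TABLE wbt_term_in_lang (
    wbtl_id              INTEGER PRIMARY KEY,
    wbtl_type_id         INTEGER NOT NULL REFERENCES wbt_type (wby_id),
    wbtl_text_in_lang_id INTEGER NOT NULL REFERENCES wbt_text_in_lang (wbxl_id),
    UNIQUE (wbtl_text_in_lang_id, wbtl_type_id)
);
-- Hypothetical table: same shape as an item link table, renamed for MediaInfo.
CREATE TABLE wbt_mediainfo_terms (
    wbmt_id              INTEGER PRIMARY KEY,
    wbmt_mediainfo_id    INTEGER NOT NULL,
    wbmt_term_in_lang_id INTEGER NOT NULL REFERENCES wbt_term_in_lang (wbtl_id),
    UNIQUE (wbmt_term_in_lang_id, wbmt_mediainfo_id)
);
"""


def open_store():
    """Create an in-memory database holding the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def _get_or_create(db, table, id_col, values):
    """Return the id of the row matching `values`, inserting it if missing."""
    params = tuple(values.values())
    where = " AND ".join(f"{col} = ?" for col in values)
    row = db.execute(
        f"SELECT {id_col} FROM {table} WHERE {where}", params
    ).fetchone()
    if row:
        return row[0]
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    cur = db.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", params)
    return cur.lastrowid


def add_term(db, mediainfo_id, term_type, language, text):
    """Link one term to a MediaInfo entity, reusing shared rows where they exist."""
    type_id = _get_or_create(db, "wbt_type", "wby_id", {"wby_name": term_type})
    text_id = _get_or_create(db, "wbt_text", "wbx_id", {"wbx_text": text})
    text_in_lang_id = _get_or_create(
        db, "wbt_text_in_lang", "wbxl_id",
        {"wbxl_language": language, "wbxl_text_id": text_id},
    )
    term_in_lang_id = _get_or_create(
        db, "wbt_term_in_lang", "wbtl_id",
        {"wbtl_type_id": type_id, "wbtl_text_in_lang_id": text_in_lang_id},
    )
    return _get_or_create(
        db, "wbt_mediainfo_terms", "wbmt_id",
        {"wbmt_mediainfo_id": mediainfo_id, "wbmt_term_in_lang_id": term_in_lang_id},
    )
```

In this sketch, calling add_term(db, 42, "label", "en", "Sunset") and then add_term(db, 43, "label", "en", "Sunset") stores the text once and adds only two small link rows, which is the point of keeping the per-type table thin.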

Temporarily using wb_terms just to remove it would cause more trouble than it is worth and could just lead to it still existing in a few years and us having to do another massive migration.

Agree.

It looks like we may not need wb_terms (or the redesigned store) at this time. Declining ticket for now.